Non Blocking IO

{My notes on blocking IO, non-blocking IO, and epoll edge triggered IO}

What is blocking IO, non-blocking IO, select and epoll? When to think about using them? The below notes are on the controversial topic of thread versus connections, and some parts of these notes are derived from

Blocking IO

A call to read may come when no data is available in the input buffer, but the calling thread is expecting more data in future. Similarly, an application may attempt to write, when the network is not ready to accept the data, because the output buffer is already full since the client is very slow.

The application does not care about these issues. So the network driver must, by default, put the calling thread to sleep until the request can be processed. When ready, the driver must interrupt the thread and resume it. The application would usually invoke the read/write operations in a loop, where the driver blocks each invocation of read/write. The application breaks out of the loop, when it discovers that it has read/written fully.

“All network IO is blocking by default.”

Blocking IO Behavior:

  • If a thread calls read, but no data is available in the input buffer, then the thread is put to sleep. Its awakened when the data arrives in the input buffer. The data is returned to the caller (thread), even if there is less data than requested by the “count” argument of read call.
  • If a thread calls write, but the output buffer is full, then the thread is put to sleep. When some data has been written to the network, then the output buffer becomes free (may be partially). Then the thread is awakened and the write call succeeds; although the data may only be partially written in cases when there was not enough room in the output buffer as compared to the count bytes that were requested to be written.

Non-Blocking IO

There are times when the calling thread can indicate that it doesn’t want to be blocked, whether or not its IO can make any progress at all. The calling thread indicates this by passing a “open_non_blocking” flag to the read or write operations of the driver.

Non-Blocking IO Behavior:

  • If a thread calls read with a “open_non_blocking” flag, and no data is available, then the call returns “try_again”.
  • If a thread calls write with a “open_non_blocking” flag, and the output buffer is full, then the call returns “try_again”.

The calling thread is thus free to do other stuff, and try the read/write again later

Blocking IO vs Non-Blocking IO

In case of blocking IO, one thread is created per client connection. Whereas in case of non-blocking IO, a handful of threads can multiplex across a large number of connections. But how is this important? Why can’t I just create as many threads as the connections I need to handle. Well, there are two aspects of threads:

  • The kernel needs to schedule threads in and out of the CPU. With large number of threads, some time is spent in the scheduling task.
  • Each thread requires memory for its stack frame. It could be between 256K to 1M of memory per thread. With large number of threads, the memory requirement grows linearly.

Thus, even when the kernel can handle scheduling of threads fairly well, the memory requirements can grow pretty fast with one thread per connection model. While its okay to have a blocking IO for a few hundred connections, one must think of using non-blocking IO as they start moving towards thousand concurrent connections (C10K Problem).

Select (a.k.a. Poll) Call

Select takes non-blocking IO to the next level. It built on top of non-blocking IO, and offloads the task of checking stream readiness to the driver. The driver is asked to check a set of streams for readiness, and it return to the calling thread a bit mask for each of the set of streams. The bit mask indicates to the thread on what streams are ready. This allows the calling thread to multiplex many active streams using a single thread by leveraging the readiness information returned by the driver. The select call used by application servers to handle large numbers of clients, and scale for high volume.

Select operation works by blocking the application thread until something happens on a set of file descriptors (socket is represented as a file descriptor). What is that something? Until one of the file descriptors has data ready to be read or written. Mostly, the select() based application servers would perform the below:

  • Populate the fd_set data structure with the file descriptors that they want to read in.
  • Populate the fd_set data structure with the file descriptors that they want to write to.
  • Call the select().
  • The network driver blocks the call, until something.
  • When one of the file descriptor’s status has changed, the network driver awakens the thread, and returns the call.
  • Once select() returns, the application thread can find what file descriptors are serviceable by inspecting a bit mask for each file descriptor returned by the select() call. The thread can service the ready FDs by performing reads, writes, or hangups.
  • The application thread can repeat the same process.

Epoll Edge trigger

Epoll is the next generation of select call introduced in Linux 2.6. Key differences with select:

  • Select system call is O(n) algorithm that uses linear search over a list of watched file descriptors. Upon return of a select call, you need to linearly search through the list of watched FDs. The epoll is an O(1) algorithm that uses callbacks in the kernel file structure. Epoll only returns changed FDs. This allows the driver to scale better when the number of watched file descriptors increase.
  • Select has a restriction of 1024 FDs to be watched. Epoll has no such restriction.
  • Epoll can be edge triggered or level triggered:
    • Originally, epoll was an edge triggered interface. It would only return a FD as being ready after a change has happened on that FD. If you ask epoll to watch a particular socket for readability, and if a certain amount of data is already available for that socket, epoll will still block. It will flag that socket as being readable only when new data shows up and when changes occur on monitored FD.
    • Level triggered epoll will always return the FD as ready when some data is readable or writable. This  is good, but can return FDs that have really insignificant data for the application to process.
    • The current epoll implementations allow for switching between edge and level triggered interfaces.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>