My notes from Bert Belder's talk on libuv

Introduction

These are my notes based on a talk by Bert Belder. Bert discusses how the event loop works and why libev and libeio were removed from Node.js and replaced with libuv. He also presents a simple use case, demonstrates its implementation in libuv, and explains what requests and handles are.

The Old Node.js Architecture

Let's explain each element one by one:

Node standard library -> This is the API layer available to JavaScript developers for building applications. In this layer we write only JavaScript, against the interfaces described in the Node.js documentation.

Node bindings -> These connect the JavaScript layer to native code. For example, when we call fs.readFile(), the call is bound to C/C++ functions that carry out the operation.

V8 -> This is the JavaScript engine developed by Google.

Libev -> This library provides functions for monitoring child processes/PID identifiers, periodic timers based on wall clock time (absolute time), and timers with relative timeouts. It also supports epoll, kqueue, event ports, inotify, eventfd, signalfd, efficient timer handling, time-jump detection and correction, and easy configuration.

Bert's Explanation of How the Event Loop Works

The basic operation of the event loop is as follows. If an event is pending, its callback is executed immediately; otherwise, the event loop goes to sleep and waits for new events.

Libev

In the context of Node.js, libev is used only for implementing the event loop. Libev is essentially a wrapper around the select() system call and its counterparts epoll and kqueue.
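To make the "sleep until something happens" idea concrete, here is a minimal sketch of a single loop iteration built directly on select(). It is not libev or libuv code: the names loop_iteration, ready_cb, and count_event are made up for this illustration, and it watches only one file descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical callback type: invoked when fd becomes readable. */
typedef void (*ready_cb)(int fd, void *data);

/*
 * One iteration of a toy event loop: sleep in select() until fd is
 * readable or timeout_ms elapses. Returns 1 if the callback ran,
 * 0 on timeout, and -1 on error.
 */
int loop_iteration(int fd, int timeout_ms, ready_cb cb, void *data) {
    fd_set readable;
    struct timeval tv;
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* The process sleeps here until there is something to do. */
    n = select(fd + 1, &readable, NULL, NULL, &tv);
    if (n < 0) {
        return -1;
    }
    if (n == 0) {
        return 0;
    }
    cb(fd, data);
    return 1;
}

/* Example callback: consume one byte and count the event. */
void count_event(int fd, void *data) {
    char byte;
    if (read(fd, &byte, 1) == 1) {
        ++*(int *)data;
    }
}
```

A real loop would call something like this repeatedly, watch many descriptors at once, and also track timers. The sketch only shows the core cycle: wait, then dispatch.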

Libeio

In the context of Node.js, libeio deals only with asynchronous I/O processing. It provides a thread pool implementation and convenient asynchronous versions of operations on file descriptors.
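The following sketch shows the general idea of moving blocking file work off the loop thread. It is not libeio code: it uses a single pthread worker instead of a pool, and file_job, run_file_job, and file_size_via_worker are hypothetical names. The worker signals completion through a pipe, which is the kind of descriptor an event loop can watch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

typedef struct {
    const char *path;  /* input: file to inspect */
    long size;         /* output: size in bytes, or -1 on failure */
    int notify_fd;     /* write end of the pipe the waiting side reads */
} file_job;

/* Runs on the worker thread: the blocking file I/O happens here. */
static void *run_file_job(void *arg) {
    file_job *job = arg;
    FILE *f = fopen(job->path, "rb");
    char done = 1;

    job->size = -1;
    if (f != NULL) {
        if (fseek(f, 0, SEEK_END) == 0) {
            job->size = ftell(f);
        }
        fclose(f);
    }
    /* Wake the waiting thread by making the pipe readable. */
    if (write(job->notify_fd, &done, 1) != 1) {
        job->size = -1;
    }
    return NULL;
}

/* Hands the job to a worker thread, then waits for the completion byte. */
long file_size_via_worker(const char *path) {
    int fds[2];
    pthread_t worker;
    file_job job;
    char done;
    int got_signal;

    assert(path != NULL);
    if (pipe(fds) != 0) {
        return -1;
    }
    job.path = path;
    job.size = -1;
    job.notify_fd = fds[1];

    if (pthread_create(&worker, NULL, run_file_job, &job) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    /* A real event loop would watch fds[0] next to its sockets
       instead of blocking in read() like this. */
    got_signal = (read(fds[0], &done, 1) == 1);
    pthread_join(worker, NULL);
    close(fds[0]);
    close(fds[1]);
    return got_signal ? job.size : -1;
}
```

The design point is that the thread waiting on the pipe never touches the file itself, so a slow disk cannot stall it.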

Implicit Assumptions of Node.js

The select model is considered the best model for scalable network applications.

It’s good, but it’s not perfect!

System calls like read(), write(), accept() work the same across all operating systems.

This is not true. Windows provides a completely different set of system calls for I/O.

File operations must be performed in a thread pool.

Updated Node.js Architecture

To enable Node.js to run on various operating systems, architectural changes were necessary. libeio and libev were replaced with libuv. Thanks to this, an I/O implementation for Windows could be added.

libuv

Description of libuv:

The library exposes abstract operations rather than raw events.
It supports different models of non-blocking I/O.
Its primary focus is on performance and embeddability.

Example

require('net').connect(80, 'nayan.cat').pipe(process.stdout);

This simple code written in Node.js accomplishes a few things:

It connects to the website at nayan.cat on port 80.
It pipes whatever the server sends back to standard output, that is, to the console.

What Does the Implementation Look Like in libuv?

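#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <uv.h>

/* Callback defined in the sections below. */
void after_getaddrinfo(uv_getaddrinfo_t* gai_req, int status, struct addrinfo* ai);

int main(int argc, char* argv[]) {
  /* Request object for the asynchronous address lookup. */
  uv_getaddrinfo_t* gai_req = malloc(sizeof(uv_getaddrinfo_t));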
  
  uv_getaddrinfo(uv_default_loop(),
                      gai_req,
                      after_getaddrinfo,
                      "nayan.cat",
                      "80",
                      NULL);

  uv_run(uv_default_loop());

  return 0;
}

Explanation of the program steps:

1 -> In the line uv_getaddrinfo_t* gai_req = malloc(sizeof(uv_getaddrinfo_t));, we allocate memory for the address lookup. uv_getaddrinfo_t is a libuv type known as a request.

2 -> Next, we call the function uv_getaddrinfo with the following parameters: the event loop, obtained from uv_default_loop(); the request gai_req allocated above; the callback after_getaddrinfo, which receives the lookup status and the resolved address information; the host name nayan.cat; the port 80, passed as a string; and NULL for the optional lookup hints.

3 -> The event loop is started with the uv_run function, which keeps running until there is no more pending work. These notes follow the libuv API as shown in the talk; current libuv releases also expect a run mode argument here, such as UV_RUN_DEFAULT.

Request

A request is libuv's abstraction for a single operation. It runs in the background and behaves much like a promise: nothing about it changes while it is in progress, and when it finishes, its callback is called with the outcome.

What Happens When Data Returns?

In this case, the callback after_getaddrinfo is called.

void after_getaddrinfo(uv_getaddrinfo_t* gai_req,
                       int status, 
                       struct addrinfo* ai) {
  uv_tcp_t* tcp_handle;
  uv_connect_t* connect_req;
  
  if (status < 0) {
    abort(); /* handle error */
  }

  tcp_handle = malloc(sizeof(uv_tcp_t));
  uv_tcp_init(uv_default_loop(), tcp_handle);
  
  connect_req = malloc(sizeof(uv_connect_t));
  uv_tcp_connect(connect_req, 
                 tcp_handle, 
                 *(struct sockaddr_in*) ai->ai_addr, 
                 after_connect);

  free(gai_req);
  uv_freeaddrinfo(ai);
}

Explanation of the program steps:

1 -> In the line if (status < 0), we check the status. If it’s less than 0, we need to handle this error. In this simple implementation, it’s handled with abort().

2 -> In the line tcp_handle = malloc(sizeof(uv_tcp_t));, we allocate memory for the TCP handle, which uv_tcp_init then registers with the event loop. No connection has been made yet!

3 -> In the line connect_req = malloc(sizeof(uv_connect_t));, we allocate memory for the connect request.

4 -> At this point, we start the TCP connection by calling uv_tcp_connect, with parameters: the connect request connect_req, the TCP handle tcp_handle, the address stored in the ai structure, and the callback after_connect.

5 -> We free the memory for gai_req and release ai with uv_freeaddrinfo.

Handle

A handle functions similarly to an event emitter or a stream. Unlike a request, which represents one operation, a handle is long-lived and emits events on its own, possibly many times.
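The difference between the two concepts can be sketched in plain C. The types below are hypothetical and are not libuv's own structures. They only model the behavior described in these notes: a request delivers its callback once, while a handle keeps delivering events until it is closed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; these are not libuv structs. */
typedef struct toy_req toy_req;
typedef struct toy_handle toy_handle;
typedef void (*toy_req_cb)(toy_req *req, int status);
typedef void (*toy_handle_cb)(toy_handle *handle, int value);

struct toy_req {
    toy_req_cb cb;
    int pending;   /* 1 until the single completion is delivered */
    void *data;
};

struct toy_handle {
    toy_handle_cb cb;
    int active;    /* 1 until the owner closes the handle */
    void *data;
};

/* Completes a request. The callback fires at most once.
   Returns 1 if it fired, 0 if the request was already done. */
int toy_req_complete(toy_req *req, int status) {
    assert(req != NULL && req->cb != NULL);
    if (!req->pending) {
        return 0;
    }
    req->pending = 0;
    req->cb(req, status);
    return 1;
}

/* Delivers an event on a handle. The callback fires every time
   while the handle is active. Returns 1 if it fired. */
int toy_handle_emit(toy_handle *handle, int value) {
    assert(handle != NULL && handle->cb != NULL);
    if (!handle->active) {
        return 0;
    }
    handle->cb(handle, value);
    return 1;
}

/* Closing a handle stops all further events. */
void toy_handle_close(toy_handle *handle) {
    assert(handle != NULL);
    handle->active = 0;
}

/* Example callbacks that count invocations through the data pointer. */
void count_req(toy_req *req, int status) {
    (void)status;
    ++*(int *)req->data;
}

void count_handle(toy_handle *handle, int value) {
    (void)value;
    ++*(int *)handle->data;
}
```

In the program from the talk, gai_req, connect_req, and write_req play the request role, while tcp_handle plays the handle role: its read callback can run many times before the connection is closed.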

What Happens When a TCP Connection is Successfully Established?

After a successful connection, the callback function after_connect is executed.

void after_connect(uv_connect_t* connect_req, int status) {
  uv_write_t* write_req;
  uv_buf_t buf;

  if (status < 0) {
    abort(); /* handle error */
  }
  
  write_req = malloc(sizeof(uv_write_t));

  buf.base = "GET / HTTP/1.0\r\n"
              "Host: nayan.cat\r\n"
              "\r\n";
  buf.len = strlen(buf.base);

  uv_write(write_req, connect_req->handle, &buf, 1, after_write);

  uv_read_start(connect_req->handle, on_alloc, on_read);

  free(connect_req);
}

Explanation of the program steps:

1 -> If an error occurs, we handle it.

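2 -> In the line write_req = malloc(sizeof(uv_write_t));, we allocate memory for the write request.

3 -> We fill buf with a minimal HTTP GET request and set its length.

4 -> Using uv_write(write_req, connect_req->handle, &buf, 1, after_write);, we send the request to the server over the TCP connection. The callback after_write, which is not shown in these notes, runs once the write has completed.

5 -> The special line uv_read_start(connect_req->handle, on_alloc, on_read); starts reading from the connection. From now on, whenever data arrives, libuv calls on_alloc to obtain a buffer and then on_read to deliver the data.

6 -> free(connect_req); releases the memory of the connect request, which is no longer needed.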
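The request in step 3 is a fixed string literal, which works because it is never modified and stays valid for the whole run of the program. If the host name varied, the request would have to be built at run time. Below is a minimal sketch of that, assuming a caller-provided buffer; build_get_request is a hypothetical helper and not part of libuv or of the talk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes a minimal HTTP/1.0 GET request for "/" into out.
 * Returns the request length in bytes, or -1 if the request
 * (plus the terminating NUL) does not fit into cap bytes.
 */
int build_get_request(char *out, size_t cap, const char *host) {
    int n;

    assert(out != NULL && host != NULL);

    n = snprintf(out, cap, "GET / HTTP/1.0\r\nHost: %s\r\n\r\n", host);
    if (n < 0 || (size_t)n >= cap) {
        return -1;
    }
    return n;
}
```

When such a buffer is handed to uv_write, its memory has to stay valid until the write callback has run, so it should live on the heap or in static storage rather than on the stack of after_connect.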

How Do We Read the Data That Arrives from the Server?

First, libuv needs a buffer to read into. It asks for one by calling on_alloc, where we allocate the memory.

uv_buf_t on_alloc(uv_handle_t* handle, size_t suggested_size) {
  uv_buf_t buf;

  buf.base = malloc(suggested_size);
  buf.len = suggested_size;

  return buf;
}

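Additionally, we must implement the on_read callback. A negative nread signals an error or the end of the stream; a positive value is the number of bytes received.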

void on_read(uv_stream_t* tcp_handle,
             ssize_t nread, 
             uv_buf_t buf) {

  if (nread < 0) {
    /* Error or end of file */
    if (uv_last_error(uv_default_loop()).code == UV_EOF) {
      /* no more data. Close the connection. */
      uv_close((uv_handle_t*) tcp_handle, on_close);
    } else {
      /* That's an error. */
      abort();
    }
  }

  if (nread > 0) {
    /* print it! FTW!!!1 */
    fwrite(buf.base, 1, nread, stdout);
  }

  free(buf.base);
}

We also need to handle closing the TCP connection:

void on_close(uv_handle_t* handle) {
  free(handle);
}

Sources

Bert Belder's talk on libuv