It wasn't very long ago that getting a single web server set up to support 10,000 concurrent connections was a feat of greatness. Many factors made it possible to develop web servers, such as nginx, that could handle more connections with greater efficiency than their predecessors. One of the biggest was the arrival, in most operating systems, of constant-time (O(1)) polling mechanisms for monitoring file descriptors.
In the No Starch Press book, The Linux Programming Interface, section 63.4.5 provides a table of observations that describes the time it takes to check different quantities of file descriptors via some of the most common polling methods.
As that table shows, the performance benefits of epoll are large enough to matter with as few as 10 descriptors. As the number of descriptors increases, regular poll() or select() becomes a very unattractive option compared to epoll.
This tutorial will run through some of the basics of using epoll on Linux 2.6.27+.
Prerequisite knowledge
This tutorial assumes you’re familiar and comfortable with Linux, the syntax of C and the usage of file descriptors in UNIX-like systems.
Getting started
Make a new directory to work out of for this tutorial. Here’s the Makefile we’re using:
all: epoll_example
epoll_example: epoll_example.c
	gcc -Wall -Werror -o $@ epoll_example.c
clean:
	@rm -v epoll_example
Throughout this post I’ll be using functionality described by the following headers:
#include <stdio.h> // for fprintf()
#include <unistd.h> // for close(), read()
#include <sys/epoll.h> // for epoll_create1(), epoll_ctl(), struct epoll_event
#include <string.h> // for strncmp()
Step 1: Create epoll file descriptor
First I’ll go through the process of just creating and closing an epoll instance.
#include <stdio.h> // for fprintf()
#include <unistd.h> // for close()
#include <sys/epoll.h> // for epoll_create1()
int main()
{
    int epoll_fd = epoll_create1(0);
    if (epoll_fd == -1) {
        fprintf(stderr, "Failed to create epoll file descriptor\n");
        return 1;
    }
    if (close(epoll_fd)) {
        fprintf(stderr, "Failed to close epoll file descriptor\n");
        return 1;
    }
    return 0;
}
Running this should work and display no output. If you do get errors, you’re probably either running a very old Linux kernel or your system needs real help.
This first example uses epoll_create1() to create a file descriptor to a new epoll instance given to us by the mighty kernel. While it doesn’t do anything with it quite yet we should still make sure to clean it up before the program terminates. Since it’s like any other Linux file descriptor we can just use close() for this.
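If the call does fail, errno says why, which is more useful than a generic message. Here is a minimal sketch of that idea; the helper name create_epoll_or_report is my own invention for illustration, and the only flag epoll_create1() accepts besides 0 is EPOLL_CLOEXEC.

```c
#include <errno.h>     // for errno
#include <stdio.h>     // for fprintf()
#include <string.h>    // for strerror()
#include <sys/epoll.h> // for epoll_create1(), EPOLL_CLOEXEC

// Hypothetical helper (not part of the epoll API): returns a new epoll file
// descriptor, or -1 after printing the reason taken from errno.
int create_epoll_or_report(int flags)
{
    int epoll_fd = epoll_create1(flags);

    if (epoll_fd == -1)
        fprintf(stderr, "epoll_create1 failed: %s\n", strerror(errno));
    return epoll_fd;
}
```

Calling create_epoll_or_report(EPOLL_CLOEXEC) instead of epoll_create1(0) also asks the kernel to close the descriptor automatically across exec(), which is usually what you want in a program that spawns children.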
Level triggered and edge triggered event notifications
Level-triggered and edge-triggered are terms borrowed from electrical engineering, and when we’re using epoll the difference is important. In edge-triggered mode we only receive events when the state of a watched file descriptor changes, whereas in level-triggered mode we continue to receive events until the underlying file descriptor is no longer in a ready state. Level-triggered is the default and is easier to use, so it’s what I’ll use for this tutorial, though it’s good to know edge-triggered mode is available.
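To make the difference concrete, here is a small sketch that writes one byte to a pipe and then polls twice without ever reading it. The helper count_notifications is hypothetical, made up for this illustration; passing 0 keeps the default level-triggered behavior and passing EPOLLET switches the registration to edge-triggered.

```c
#include <stdint.h>    // for uint32_t
#include <sys/epoll.h> // for epoll_create1(), epoll_ctl(), epoll_wait()
#include <unistd.h>    // for pipe(), write(), close()

// Hypothetical helper: writes once to a pipe, then polls twice without
// reading. Returns how many of the two polls reported the read end as
// ready, or -1 on a setup error.
int count_notifications(uint32_t extra_flags)
{
    int pipe_fds[2], ready_polls = 0, i;
    struct epoll_event event, fired;
    int epoll_fd = epoll_create1(0);

    if (epoll_fd == -1)
        return -1;
    if (pipe(pipe_fds)) {
        close(epoll_fd);
        return -1;
    }

    event.events = EPOLLIN | extra_flags; // pass EPOLLET for edge-triggered
    event.data.fd = pipe_fds[0];
    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, pipe_fds[0], &event) ||
        write(pipe_fds[1], "x", 1) != 1) {
        ready_polls = -1;
    } else {
        for (i = 0; i < 2; i++) {
            // A timeout of 0 makes epoll_wait() return immediately.
            if (epoll_wait(epoll_fd, &fired, 1, 0) == 1)
                ready_polls++;
        }
    }

    close(pipe_fds[0]);
    close(pipe_fds[1]);
    close(epoll_fd);
    return ready_polls;
}
```

With count_notifications(0) both polls report the pipe as ready, because the unread byte keeps it in a ready state. With count_notifications(EPOLLET) only the first poll does, because nothing changed between the two calls.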
Step 2: Add file descriptors for epoll to watch
The next thing to do is tell epoll what file descriptors to watch and what kinds of events to watch for. In this example I’ll use one of my favorite file descriptors in Linux, good ol' file descriptor 0 (also known as Standard Input).
#include <stdio.h> // for fprintf()
#include <unistd.h> // for close()
#include <sys/epoll.h> // for epoll_create1(), epoll_ctl(), struct epoll_event
int main()
{
    struct epoll_event event;
    int epoll_fd = epoll_create1(0);
    if (epoll_fd == -1) {
        fprintf(stderr, "Failed to create epoll file descriptor\n");
        return 1;
    }
    event.events = EPOLLIN;
    event.data.fd = 0;
    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, 0, &event)) {
        fprintf(stderr, "Failed to add file descriptor to epoll\n");
        close(epoll_fd);
        return 1;
    }
    if (close(epoll_fd)) {
        fprintf(stderr, "Failed to close epoll file descriptor\n");
        return 1;
    }
    return 0;
}
Here I’ve added an instance of an epoll_event structure and used epoll_ctl() to add the file descriptor 0 to our epoll instance epoll_fd. The event structure we pass in for the last argument lets epoll know we’re looking to watch only input events, EPOLLIN, and lets us provide some user-defined data that will be returned for events.
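The user-defined data doesn’t have to be the file descriptor itself. The data member is a union, so instead of data.fd you can store a pointer in data.ptr and get your own record back with each event. Here is a rough sketch of that; struct watched_source and both helper functions are hypothetical names I made up for illustration.

```c
#include <stddef.h>    // for NULL
#include <sys/epoll.h> // for epoll_ctl(), epoll_wait(), struct epoll_event

// Hypothetical per-descriptor record; epoll never looks inside it.
struct watched_source {
    int fd;
    const char *label;
};

// Registers source->fd and stores the record's address as the user data.
// Returns 0 on success, -1 on failure (same convention as epoll_ctl()).
int watch_source(int epoll_fd, struct watched_source *source)
{
    struct epoll_event event;

    event.events = EPOLLIN;
    event.data.ptr = source; // data is a union: set ptr OR fd, not both
    return epoll_ctl(epoll_fd, EPOLL_CTL_ADD, source->fd, &event);
}

// Waits up to timeout_ms for one event and returns the record that was
// registered with it, or NULL on timeout or error.
struct watched_source *next_ready_source(int epoll_fd, int timeout_ms)
{
    struct epoll_event fired;

    if (epoll_wait(epoll_fd, &fired, 1, timeout_ms) != 1)
        return NULL;
    return fired.data.ptr;
}
```

Because the record carries the descriptor along with anything else you need, the event loop can go straight from an event to its own bookkeeping. The record must stay alive for as long as the descriptor is registered.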
Step 3: Profit
That’s right! We’re almost there. Now let epoll do its magic.
#define MAX_EVENTS 5
#define READ_SIZE 10
#include <stdio.h> // for fprintf(), printf()
#include <unistd.h> // for close(), read()
#include <sys/epoll.h> // for epoll_create1(), epoll_ctl(), epoll_wait(), struct epoll_event
#include <string.h> // for strncmp()
int main()
{
    int running = 1, event_count, i;
    ssize_t bytes_read; // read() returns ssize_t; -1 signals an error
    char read_buffer[READ_SIZE + 1];
    struct epoll_event event, events[MAX_EVENTS];
    int epoll_fd = epoll_create1(0);
    if (epoll_fd == -1) {
        fprintf(stderr, "Failed to create epoll file descriptor\n");
        return 1;
    }
    event.events = EPOLLIN;
    event.data.fd = 0;
    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, 0, &event)) {
        fprintf(stderr, "Failed to add file descriptor to epoll\n");
        close(epoll_fd);
        return 1;
    }
    while (running) {
        printf("\nPolling for input...\n");
        event_count = epoll_wait(epoll_fd, events, MAX_EVENTS, 30000);
        if (event_count == -1) {
            fprintf(stderr, "Failed to wait for events\n");
            break;
        }
        printf("%d ready events\n", event_count);
        for (i = 0; i < event_count; i++) {
            printf("Reading file descriptor '%d' -- ", events[i].data.fd);
            bytes_read = read(events[i].data.fd, read_buffer, READ_SIZE);
            if (bytes_read <= 0) {
                // Error (-1) or end of input (0): stop rather than index the buffer with -1
                printf("nothing left to read.\n");
                running = 0;
                break;
            }
            printf("%zd bytes read.\n", bytes_read);
            read_buffer[bytes_read] = '\0';
            printf("Read '%s'\n", read_buffer);
            if (!strncmp(read_buffer, "stop\n", 5))
                running = 0;
        }
    }
    if (close(epoll_fd)) {
        fprintf(stderr, "Failed to close epoll file descriptor\n");
        return 1;
    }
    return 0;
}
Finally we’re getting down to business!
I added a few new variables here to support and expose what I’m doing. I also added a while loop that keeps reading from the watched file descriptors until one of them says ‘stop’. I used epoll_wait() to wait for events from the epoll instance; the results are stored in the events array, up to MAX_EVENTS of them, with a timeout of 30 seconds (30000 milliseconds). The return value of epoll_wait() indicates how many members of the events array were filled with event data. Beyond that it’s just printing out what it got and doing some basic logic to close things out!
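The return value of epoll_wait() has two other cases worth knowing about: 0 means the timeout expired with nothing ready, and -1 means an error, with errno set to EINTR if a signal simply interrupted the wait. A minimal sketch of a wrapper that retries in that last case could look like this; wait_for_events is a hypothetical name, not something epoll provides.

```c
#include <errno.h>     // for errno, EINTR
#include <sys/epoll.h> // for epoll_wait(), struct epoll_event

// Hypothetical wrapper: like epoll_wait(), but retries when a signal
// interrupts the call. Returns the number of ready events, 0 on timeout,
// or -1 on a real error. Note that each retry starts the timeout over.
int wait_for_events(int epoll_fd, struct epoll_event *events,
                    int max_events, int timeout_ms)
{
    int event_count;

    do {
        event_count = epoll_wait(epoll_fd, events, max_events, timeout_ms);
    } while (event_count == -1 && errno == EINTR);
    return event_count;
}
```

In the example program this could stand in for the direct epoll_wait() call, with the loop treating 0 as "nothing happened in the last 30 seconds" and -1 as a reason to bail out.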
Here’s the example in action:
$ ./epoll_example
Polling for input...
hello!
1 ready events
Reading file descriptor '0' -- 7 bytes read.
Read 'hello!
'
Polling for input...
this is too long for the buffer we made
1 ready events
Reading file descriptor '0' -- 10 bytes read.
Read 'this is to'
Polling for input...
1 ready events
Reading file descriptor '0' -- 10 bytes read.
Read 'o long for'
Polling for input...
1 ready events
Reading file descriptor '0' -- 10 bytes read.
Read ' the buffe'
Polling for input...
1 ready events
Reading file descriptor '0' -- 10 bytes read.
Read 'r we made
'
Polling for input...
stop
1 ready events
Reading file descriptor '0' -- 5 bytes read.
Read 'stop
'
First I gave it a small string that fits in the buffer; it works fine and continues iterating over the loop. The second input was too long for the read buffer, and this is where level triggering helped us out: events continued to arrive until the program had read everything left in the buffer. In edge-triggered mode we would have received only one notification, and the application as-is would not progress until more was written to the file descriptor being watched.
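If you do want edge-triggered mode, the usual approach is to make the descriptor non-blocking and keep reading after each notification until read() fails with EAGAIN. Here is a sketch of that pattern under those assumptions; make_nonblocking and drain_fd are hypothetical helper names, and the 10-byte chunk just mirrors READ_SIZE from the example.

```c
#include <errno.h>  // for errno, EAGAIN, EWOULDBLOCK, EINTR
#include <fcntl.h>  // for fcntl(), F_GETFL, F_SETFL, O_NONBLOCK
#include <unistd.h> // for read()

// Hypothetical helper: switches fd to non-blocking mode. Returns 0 or -1.
int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

// Hypothetical helper: reads from a non-blocking fd until the kernel reports
// nothing is left (EAGAIN/EWOULDBLOCK) or the writer closed (read() == 0).
// Returns the total number of bytes consumed, or -1 on any other error.
long drain_fd(int fd)
{
    char chunk[10];
    long total = 0;
    ssize_t bytes_read;

    for (;;) {
        bytes_read = read(fd, chunk, sizeof chunk);
        if (bytes_read > 0) {
            total += bytes_read; // a real program would process chunk here
        } else if (bytes_read == 0) {
            return total; // end of input
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return total; // drained; safe to wait for the next edge
        } else if (errno != EINTR) {
            return -1; // a real error
        }
    }
}
```

With this in place, one edge-triggered notification for the long line above would be enough: drain_fd() would consume all 40 bytes in four reads before going back to epoll_wait().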
I hope this helped you get some bearings on how to use epoll()!