Networking in C · intermediate · ~25 min

epoll — modern Linux I/O multiplexing

Build a single-threaded server that scales to thousands of connections with epoll.

Overview

epoll is Linux's event-driven I/O multiplexer. You register file descriptors once with epoll_ctl; each epoll_wait call then returns only the fds that are ready, so the cost is O(ready), not O(registered). It scales to tens of thousands of concurrent connections on one thread.

Why it matters

Production C network code is almost always epoll-based on Linux. Reading nginx or HAProxy's source means reading epoll loops. Knowing the semantics — edge-trigger vs level-trigger, EPOLLONESHOT, the meaning of data.ptr — is table stakes.

Core concepts

epoll_create1 returns an epoll instance (itself an fd). Use EPOLL_CLOEXEC so it doesn't leak into exec'd children.
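As a minimal sketch of the creation step with an error check: the helper names make_epoll and has_cloexec are hypothetical, introduced here only for illustration. The second helper uses fcntl to confirm that EPOLL_CLOEXEC really set the close-on-exec flag.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper (not from the lesson): create an epoll instance.
 * Returns the epoll fd, or -1 on failure after printing the reason. */
int make_epoll(void)
{
    int ep = epoll_create1(EPOLL_CLOEXEC);
    if (ep == -1)
        perror("epoll_create1");   /* e.g. EMFILE when out of descriptors */
    return ep;
}

/* Returns 1 if fd has the close-on-exec flag set, 0 if not, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

The epoll fd is an ordinary descriptor, so close(ep) releases it when the server shuts down.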

epoll_ctl adds, modifies, or removes fds (EPOLL_CTL_ADD, EPOLL_CTL_MOD, EPOLL_CTL_DEL). The events field is a bitmask of EPOLLIN, EPOLLOUT, EPOLLET (edge-trigger), EPOLLONESHOT, EPOLLRDHUP.
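The three operations can be sketched as thin wrappers. The names watch_add, watch_mod and watch_del are hypothetical; the epoll_ctl calls and errno values are the standard ones.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical wrappers: each returns 0 on success, -1 with errno set. */

/* Start watching fd. Fails with EEXIST if fd is already registered. */
int watch_add(int ep, int fd, uint32_t events)
{
    struct epoll_event ev = { .events = events, .data.fd = fd };
    return epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);
}

/* Replace the event mask of an fd that is already registered. */
int watch_mod(int ep, int fd, uint32_t events)
{
    struct epoll_event ev = { .events = events, .data.fd = fd };
    return epoll_ctl(ep, EPOLL_CTL_MOD, fd, &ev);
}

/* Stop watching fd. Fails with ENOENT if fd was never registered.
 * The event argument is ignored for DEL; NULL is accepted since Linux 2.6.9. */
int watch_del(int ep, int fd)
{
    return epoll_ctl(ep, EPOLL_CTL_DEL, fd, NULL);
}
```

A typical use of watch_mod is adding EPOLLOUT only while a write is pending, then switching back to EPOLLIN once the buffer is flushed.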

epoll_wait blocks until one or more fds are ready (or the timeout fires). Returns the count; fills events[] with {events, data} for each ready fd.
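One detail worth sketching is the return value: 0 means the timeout expired, a positive value is the number of filled entries, and -1 with errno set to EINTR just means a signal arrived. The wrapper name wait_events is hypothetical, and this simple retry restarts the full timeout after each signal, which may or may not suit a real server.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical wrapper: wait for events, retrying when a signal interrupts.
 * Returns 0 on timeout, >0 for the number of ready entries, -1 on a real error.
 * Note: each retry waits for the full timeout_ms again. */
int wait_events(int ep, struct epoll_event *out, int max, int timeout_ms)
{
    int n;
    do {
        n = epoll_wait(ep, out, max, timeout_ms);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

A timeout of -1 blocks indefinitely and 0 returns immediately, which is handy for polling inside tests.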

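Edge-triggered vs level-triggered. Level (the default) keeps notifying you while data is available. Edge fires once per readiness transition, so the fd must be non-blocking and you MUST drain it until read returns EAGAIN; otherwise the leftover data is never reported again and the connection stalls.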
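The drain rule can be sketched as follows. The helper names set_nonblocking, watch_edge and drain_fd are hypothetical, and this version simply discards what it reads; a real handler would process buf inside the loop.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Put fd into non-blocking mode; without this the drain loop would block. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Register fd for edge-triggered read notifications. */
int watch_edge(int ep, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = fd };
    return epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical drain helper: read until the kernel reports EAGAIN.
 * Returns the number of bytes consumed once the fd is empty,
 * or -1 on EOF or a real error (the caller should then close the fd). */
ssize_t drain_fd(int fd)
{
    char buf[4096];
    ssize_t total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;            /* real code would process buf here */
            continue;
        }
        if (n == 0)
            return -1;             /* peer closed the connection */
        if (errno == EINTR)
            continue;              /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;          /* fully drained: safe to wait again */
        return -1;                 /* genuine read error */
    }
}
```

Calling drain_fd every time epoll_wait reports the fd keeps the edge-triggered contract: the next notification only arrives after new data shows up.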

Pentester mindset. epoll bugs that swallow events lead to half-open connections and DoS. When auditing, look for: (a) ET without a drain loop, (b) ONESHOT without re-arm, (c) missing EPOLLRDHUP for half-close.
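For audit item (c), a minimal sketch of half-close detection might look like this. The names watch_with_rdhup and peer_half_closed are hypothetical. EPOLLRDHUP is set when the peer shuts down its writing side; data already queued may still be readable, so a handler would normally read first and close afterwards.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Register a stream socket for reads plus peer half-close notification. */
int watch_with_rdhup(int ep, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP, .data.fd = fd };
    return epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);
}

/* Returns 1 if the reported mask says the peer shut down its write side. */
int peer_half_closed(uint32_t events)
{
    return (events & EPOLLRDHUP) != 0;
}
```

Without EPOLLRDHUP in the mask, the same condition only shows up as a read that returns 0, which is easy to miss in a handler that treats 0 as "nothing yet".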

Defensive coding habit. Use level-triggered until you really need ET. Always inspect events[i].events rather than assuming it matches what you registered: the kernel reports EPOLLERR and EPOLLHUP whether or not you asked for them.
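One possible way to act on the reported mask is sketched below. The enum and the function first_step are hypothetical, and the ordering is a policy choice rather than a rule: errors close immediately, readable data is consumed before honouring a hang-up, and writes come last.

```c
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical decision type for a single ready event. */
enum next_step { STEP_CLOSE, STEP_READ, STEP_WRITE, STEP_NONE };

/* Decide what to do first with a reported event mask.
 * EPOLLERR and EPOLLHUP can appear even if only EPOLLIN was requested. */
enum next_step first_step(uint32_t events)
{
    if (events & EPOLLERR)
        return STEP_CLOSE;   /* socket error: nothing more to salvage */
    if (events & EPOLLIN)
        return STEP_READ;    /* read pending data even if HUP is also set */
    if (events & EPOLLHUP)
        return STEP_CLOSE;   /* hang-up with nothing left to read */
    if (events & EPOLLOUT)
        return STEP_WRITE;   /* room to send */
    return STEP_NONE;
}
```

In an event loop this would be called as first_step(ready[i].events) before touching the fd.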

Syntax notes

#include <sys/epoll.h>
int epoll_create1(int flags);                       /* EPOLL_CLOEXEC */
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout_ms);

Lesson

select and poll work, but they scan every fd on every call: O(n) per wakeup. epoll (Linux) is event-driven: the kernel remembers which fds you registered, and epoll_wait returns only the ones that are ready. It is the Linux counterpart to BSD's kqueue and the foundation of many modern C network servers and event libraries on Linux, including nginx, redis, and libuv.

Code examples

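/* Fragment: needs <errno.h>, <stdio.h>, <stdlib.h>, <sys/epoll.h>.
 * listen_fd, accept_new and handle_client are defined elsewhere. */
int ep = epoll_create1(EPOLL_CLOEXEC);
if (ep == -1) { perror("epoll_create1"); exit(EXIT_FAILURE); }

struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
if (epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev) == -1) { perror("epoll_ctl"); exit(EXIT_FAILURE); }

struct epoll_event ready[64];
for (;;) {
    int n = epoll_wait(ep, ready, 64, -1);   /* -1: block with no timeout */
    if (n == -1) {
        if (errno == EINTR) continue;        /* interrupted by a signal: retry */
        perror("epoll_wait");
        break;
    }
    for (int i = 0; i < n; i++) {
        int fd = ready[i].data.fd;
        if (fd == listen_fd) accept_new();   /* accept and register the new client */
        else handle_client(fd);              /* service an existing client */
    }
}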

Line by line

int ep = epoll_create1(EPOLL_CLOEXEC);   /* CLOEXEC stops fd leak on exec */
struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);   /* watch for readability */
struct epoll_event out[64];
int n = epoll_wait(ep, out, 64, /*ms=*/ -1);    /* block until ready */

Common mistakes

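  • Mixing edge-triggered and level-triggered fds in one epoll instance and then handling them all the same way; each fd must be handled according to the mode it was registered with.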
  • Forgetting to re-arm EPOLLONESHOT after handling.
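The re-arm step from the second bullet can be sketched like this. The names watch_oneshot and rearm_oneshot are hypothetical. After one event is delivered, the fd stays registered but disabled until EPOLL_CTL_MOD sets a new mask.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd so that it reports at most one event, then goes quiet. */
int watch_oneshot(int ep, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = fd };
    return epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);
}

/* Re-enable a one-shot fd after its event has been handled.
 * Must use MOD, not ADD: the fd is still registered, only disabled. */
int rearm_oneshot(int ep, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = fd };
    return epoll_ctl(ep, EPOLL_CTL_MOD, fd, &ev);
}
```

If data is still pending when rearm_oneshot runs, the next epoll_wait reports the fd again right away, so nothing is lost by re-arming late.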

Debugging tips

strace -e trace=epoll_ctl,epoll_wait ./prog shows the lifecycle. ss -ltnp lists listening TCP sockets and their owning processes (drop -l to see established connections instead). For 'stuck server' debugging, gdb -p PID then bt shows you which call you're blocked on.

Memory safety

data.ptr is a void * you control. Pointing it at a heap-allocated per-connection struct is normal — but you MUST free it on close, or every disconnect leaks.
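A minimal sketch of that ownership pattern, assuming a hypothetical struct conn and helpers conn_register and conn_close. Note that data is a union, so data.ptr and data.fd cannot both be used; the fd therefore lives inside the struct.

```c
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-connection state; real servers add buffers, timers, etc. */
struct conn {
    int    fd;
    size_t bytes_seen;
};

/* Allocate state for fd and register it. Returns NULL on failure,
 * in which case nothing is leaked and fd is left open for the caller. */
struct conn *conn_register(int ep, int fd)
{
    struct conn *c = calloc(1, sizeof *c);
    if (c == NULL)
        return NULL;
    c->fd = fd;
    struct epoll_event ev = { .events = EPOLLIN, .data.ptr = c };
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) == -1) {
        free(c);
        return NULL;
    }
    return c;
}

/* Single teardown path: unregister, close, then free. */
void conn_close(int ep, struct conn *c)
{
    epoll_ctl(ep, EPOLL_CTL_DEL, c->fd, NULL);  /* explicit, in case fd was dup'd */
    close(c->fd);
    free(c);
}
```

In the event loop the state comes back as struct conn *c = ready[i].data.ptr. Routing every disconnect through one function like conn_close is what keeps the free from being forgotten.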

Real-world uses

nginx workers, HAProxy, redis, libuv (the event loop under Node.js), libevent, and many messaging brokers written in C.

Practice tasks

  1. Build a level-triggered echo server with epoll.
  2. Switch it to edge-triggered and add the drain-to-EAGAIN loop.
  3. Add EPOLLRDHUP and close cleanly on half-close.

Summary

epoll = kernel-side event registry. Level-triggered for safety, edge-triggered for max throughput (with drain loops). The bedrock of most modern Linux C servers.
