This document discusses the epoll function in Linux, which provides efficient event-driven and non-blocking I/O. It allows servers to handle thousands of requests using only a handful of threads by monitoring file descriptors for read/write readiness and notifying the server immediately. This enables highly scalable and asynchronous web servers like Nginx to handle high volumes of traffic with low resource usage.
Linux epoll - Evented I/O for High Performance Servers
1. epoll - The I/O Hero
Evented I/O in Linux
Mohsin Shafeeque MS080400037
2. Operating System
●Processor blindly executes instructions.
●It's the OS that adds the structure that enables a lot more
oThings like multitasking
oInteraction with I/O devices
o Providing APIs and utility functions
●In essence, lots of services!
●Without them, there is no screen, no keyboard, no hard disk.
●But today I'll talk about a single service that has had a great
impact.
●Or rather, a single facility that made the Linux kernel a great deal
more valuable.
●And that is, evented I/O.
3. A little about I/O
●All I/O devices are slow
oDisks
oNetworks
●Processors are much faster.
oTake a 10 ms disk operation.
oIn that time, the processor can execute millions of instructions.
●Two modes of I/O
oBlocking
Process blocks until operation completes
oNon blocking
Process continues. Asynchronous.
●Non-blocking has many implementations
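The difference between the two modes can be seen on a single descriptor. Below is a minimal sketch, assuming a Linux/POSIX system and using an empty pipe as a stand-in for a slow device. The helper names `set_nonblocking` and `read_empty_pipe_nonblocking` are made up for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode. Returns 0 on success. */
int set_nonblocking(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical demo: read from an empty pipe in non-blocking mode.
 * Returns the errno set by read(), 0 if read() unexpectedly succeeded,
 * or -1 if the setup failed. */
int read_empty_pipe_nonblocking(void)
{
    int fds[2];
    char buf[16];
    int result = -1;

    if (pipe(fds) == -1)
        return -1;

    if (set_nonblocking(fds[0]) == 0) {
        /* In blocking mode this read() would sleep until data arrives.
         * In non-blocking mode it returns at once with an error code. */
        ssize_t n = read(fds[0], buf, sizeof buf);
        result = (n == -1) ? errno : 0;
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Here `read_empty_pipe_nonblocking()` returns `EAGAIN`: the process is told "nothing yet" and continues, instead of being put to sleep.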
4. Non blocking I/O
●There are many hardware/software implementations.
●Polling
oContinuous looping to check status
oWastes CPU cycles
●Signals
oOS generated interrupts.
oMay interrupt a process at an arbitrary point and leave its state inconsistent.
●Callbacks
oPointer to functions.
oStack-deepening issue (a callback that itself issues I/O).
●Interrupts
o Hardware interrupts in kernel mode.
5. Web servers - I/O hungry!
●It's not just disk defragmentation and file copy
programs that are I/O hungry.
●Even more hungry are the web servers!
●In the age of Internet, web server performance is
critical.
●And all of it relies on throughput!
●Number of requests/clients served per second.
● There are many models around it.
●But one particular service/function called epoll has
accelerated all this.
6. How does a web server work?
●Before we look at epoll(), let's look at servers.
●They open a socket.
●Wait for incoming connections.
●There are three models here:
oOne process per connection (Apache?)
oOne thread per connection.
●In the first case, a new process is spawned for every request.
●In the second case, a new thread is created for each request.
●Both aren't very scalable.
● Third option:
oOne thread, multiple connections!
●Let's see how it works!
7. Create sockets, then select!
●A single thread creates many sockets.
●Each socket is a file descriptor, so the server holds an array of them.
●Server code calls the select function given below.
oint select(int nfds, fd_set *readfds, fd_set
*writefds, fd_set *exceptfds, struct timeval
*timeout);
●nfds - the highest-numbered file descriptor in any of the three sets, plus 1.
●readfds - those to watch for reading.
●writefds - those to watch for writing.
●exceptfds - those to watch for exceptional conditions (e.g. out-of-band data).
●timeout - maximum time to block.
● The program calls this function, which puts it to sleep.
●The call returns only when some descriptor is ready or the timeout expires.
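The call described above can be tried on a single descriptor. This is a minimal sketch, assuming Linux/POSIX, with a pipe standing in for a socket. `wait_readable` and `select_demo` are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int ready;

    assert(fd >= 0 && fd < FD_SETSIZE); /* select() cannot watch larger fds */

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* nfds is the highest watched descriptor plus one. */
    ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready <= 0)
        return ready;
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}

/* Hypothetical demo: check a pipe before and after one byte is written.
 * Stores both wait_readable() results; returns 0 on success, -1 on error. */
int select_demo(int *before, int *after)
{
    int fds[2];
    int rc = -1;

    if (pipe(fds) == -1)
        return -1;

    *before = wait_readable(fds[0], 0);       /* nothing written yet */
    if (write(fds[1], "x", 1) == 1) {
        *after = wait_readable(fds[0], 1000); /* data is waiting */
        rc = 0;
    }

    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

With an empty pipe the first call times out and stores 0 in `before`; once a byte has been written the second call stores 1 in `after`.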
8. select - Zooming in
●From the man page:
select() and pselect() allow a program to monitor multiple file
descriptors, waiting until one or more of the file descriptors become
"ready" for some class of I/O operation (e.g., input possible). A file
descriptor is considered ready if it is possible to perform the corresponding
I/O operation (e.g., read(2)) without blocking.
9. Problem with select
●It takes O(n) time!
●That is, if 500 file descriptors (sockets) are being
watched,
●every call makes the kernel check all 500, and the caller may then
take up to 500 steps to find the fd that's ready for I/O.
●And that's a problem!
●Already we have kernel to user mode switching overhead.
●And then this O(n)
● Solution....?
●epoll - a family of system calls introduced in Linux 2.5.44.
●Waiting costs depend on the number of ready descriptors, not on how many are watched. That's fast!
●Let's have a look.
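To make the O(n) cost of select concrete before turning to epoll, here is a minimal sketch, assuming Linux/POSIX and using pipes in place of sockets. It watches n descriptors, makes only the last one readable, and counts how many the caller has to inspect after select returns. The function name `scan_steps_with_select` and the limit `MAX_PIPES` are made up for this illustration.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

#define MAX_PIPES 64

/* Hypothetical demo: returns the number of FD_ISSET checks needed to find
 * the one ready descriptor among n watched ones, or -1 on error. */
int scan_steps_with_select(int n)
{
    int rd[MAX_PIPES], wr[MAX_PIPES];
    int created = 0, steps = -1, maxfd = -1;
    fd_set readfds;

    if (n < 1 || n > MAX_PIPES)
        return -1;

    FD_ZERO(&readfds);
    for (int i = 0; i < n; i++) {
        int p[2];
        if (pipe(p) == -1)
            goto out;
        rd[i] = p[0];
        wr[i] = p[1];
        created++;
        if (rd[i] >= FD_SETSIZE)
            goto out;
        FD_SET(rd[i], &readfds);
        if (rd[i] > maxfd)
            maxfd = rd[i];
    }

    /* Only the last pipe gets data. */
    if (write(wr[n - 1], "x", 1) != 1)
        goto out;

    /* select() reports how many descriptors are ready, not which ones. */
    if (select(maxfd + 1, &readfds, NULL, NULL, NULL) != 1)
        goto out;

    /* The caller walks the watched descriptors to find the ready one. */
    steps = 0;
    for (int i = 0; i < n; i++) {
        steps++;
        if (FD_ISSET(rd[i], &readfds))
            break;
    }
out:
    for (int i = 0; i < created; i++) {
        close(rd[i]);
        close(wr[i]);
    }
    return steps;
}
```

In this worst case `scan_steps_with_select(8)` returns 8: every watched descriptor is inspected before the ready one is found, and the kernel performs a similar walk inside the call.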
10. epoll details
int epoll_create(int size);
Creates an epoll object and returns its file descriptor. (Since Linux 2.6.8 the size argument is ignored, but it must be greater than zero.)
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
Controls (configures) which file descriptors are watched by this object, and for which events.
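int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
Waits for events on the watched file descriptors, fills events with the ready ones, and returns how many there are. A timeout of -1 blocks indefinitely.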
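The three calls fit together as follows. This is a minimal sketch, assuming Linux, that watches the read end of a pipe rather than a real socket. The function name `epoll_demo` is hypothetical.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: watch the read end of a pipe with epoll.
 * Returns 1 if epoll_wait reports that descriptor, 0 if it reports
 * nothing within one second, -1 on error. */
int epoll_demo(void)
{
    int fds[2];
    int epfd = -1, n, result = -1;
    struct epoll_event ev = {0}, events[4];

    if (pipe(fds) == -1)
        return -1;

    /* Step 1: create the epoll object (size is only a hint, must be > 0). */
    epfd = epoll_create(1);
    if (epfd == -1)
        goto out;

    /* Step 2: register the descriptor and the events of interest. */
    ev.events = EPOLLIN;   /* "readable" */
    ev.data.fd = fds[0];   /* handed back to us by epoll_wait */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == -1)
        goto out;

    if (write(fds[1], "x", 1) != 1)
        goto out;

    /* Step 3: wait; only ready descriptors are returned. */
    n = epoll_wait(epfd, events, 4, 1000);
    if (n == 1) {
        assert(events[0].events & EPOLLIN);
        result = (events[0].data.fd == fds[0]);
    } else if (n == 0) {
        result = 0;
    }
out:
    if (epfd != -1)
        close(epfd);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Unlike select, the caller does not scan anything afterwards: `events[0]` already names the ready descriptor, so `epoll_demo()` returns 1.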
11. How does it work?
●epoll_wait is called by the server code.
●It will return any fds that are ready (i.e. have data).
●Two modes of operation
oEdge-triggered.
oLevel-triggered (the default).
●In edge-triggered mode, the process is notified only once per
new arrival.
oE.g. 2 KB received, process reads only 1 KB: the next epoll_wait
call blocks until further data arrives, even though 1 KB is still
in the buffer.
●In level-triggered mode, the process is notified as long as the
buffer is not empty.
oE.g. 2 KB received, 1 KB read: the next epoll_wait call won't block
and returns the same descriptor.
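The 2 KB / 1 KB example can be reproduced directly. The sketch below assumes Linux and uses a pipe in place of a socket. The second epoll_wait uses a timeout of 0 so that it reports "nothing ready" instead of actually blocking. The function name `second_wait_after_partial_read` is made up for this illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: returns what a second, non-blocking epoll_wait
 * reports after a partial read (0 or 1 ready descriptors), or -1 on error. */
int second_wait_after_partial_read(int edge_triggered)
{
    int fds[2];
    int epfd = -1, result = -1;
    char data[2048], half[1024];
    struct epoll_event ev = {0}, out;

    if (pipe(fds) == -1)
        return -1;
    epfd = epoll_create(1);
    if (epfd == -1)
        goto done;

    /* EPOLLET selects edge-triggered mode; without it, level-triggered. */
    ev.events = EPOLLIN | (edge_triggered ? EPOLLET : 0);
    ev.data.fd = fds[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == -1)
        goto done;

    /* 2 KB arrives. */
    memset(data, 'x', sizeof data);
    if (write(fds[1], data, sizeof data) != (ssize_t)sizeof data)
        goto done;

    /* First wait: both modes report the new arrival. */
    if (epoll_wait(epfd, &out, 1, 1000) != 1)
        goto done;
    assert(out.data.fd == fds[0]);

    /* The process reads only 1 KB. */
    if (read(fds[0], half, sizeof half) != (ssize_t)sizeof half)
        goto done;

    /* Second wait: is the descriptor reported again? */
    result = epoll_wait(epfd, &out, 1, 0);
done:
    if (epfd != -1)
        close(epfd);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

`second_wait_after_partial_read(1)` returns 0 (edge-triggered: no new arrival, so nothing is reported), while `second_wait_after_partial_read(0)` returns 1 (level-triggered: 1 KB is still buffered). This is why edge-triggered code usually reads in a loop until read() fails with EAGAIN.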
12. So what?
●This model has enabled servers to handle thousands
of requests with a handful of threads.
●The server creates a listening socket; for each incoming connection, a new
file descriptor is created and monitored.
●Event driven!
●Nginx! The second largest web server on the Internet uses this
model.
●Written by a Russian. Handles 70 million calls a day.
●Nginx is used by WordPress! Many others!
● Node.js - The new hotness. Based on the V8 JavaScript
engine.
●Uses epoll to handle thousands of requests per
thread. Apache, in comparison, uses one thread per request.
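The one-thread, many-connections model can be sketched as a small event loop. This is only an illustration under stated assumptions, not how Nginx or Node.js is actually implemented: it assumes Linux, uses socketpair() to stand in for real network clients, and echoes a single message per client. The names `serve_clients_single_thread` and `MAX_CLIENTS` are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_CLIENTS 16

/* Hypothetical demo: one thread echoes one message for each of n clients.
 * Returns how many clients got their message back, or -1 on error. */
int serve_clients_single_thread(int n)
{
    int client[MAX_CLIENTS], server[MAX_CLIENTS];
    int created = 0, served = 0, echoed = -1, epfd;
    const char msg[] = "ping";
    struct epoll_event ev = {0}, events[MAX_CLIENTS];

    if (n < 1 || n > MAX_CLIENTS)
        return -1;
    epfd = epoll_create(1);
    if (epfd == -1)
        return -1;

    for (int i = 0; i < n; i++) {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
            goto out;
        client[i] = sv[0];
        server[i] = sv[1];
        created++;
        /* Each connection is one more descriptor in the epoll object. */
        ev.events = EPOLLIN;
        ev.data.fd = server[i];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, server[i], &ev) == -1)
            goto out;
        /* The "client" sends its request. */
        if (write(client[i], msg, sizeof msg) != (ssize_t)sizeof msg)
            goto out;
    }

    /* Event loop: the single thread services whichever sockets are ready. */
    while (served < n) {
        int ready = epoll_wait(epfd, events, MAX_CLIENTS, 1000);
        if (ready <= 0)
            goto out;
        for (int i = 0; i < ready; i++) {
            char buf[64];
            int fd = events[i].data.fd;
            ssize_t len = read(fd, buf, sizeof buf);
            if (len <= 0 || write(fd, buf, (size_t)len) != len)
                goto out;
            served++;
        }
    }

    /* Check that every client received its echo. */
    echoed = 0;
    for (int i = 0; i < n; i++) {
        char buf[64];
        ssize_t len = read(client[i], buf, sizeof buf);
        if (len == (ssize_t)sizeof msg && memcmp(buf, msg, sizeof msg) == 0)
            echoed++;
    }
out:
    for (int i = 0; i < created; i++) {
        close(client[i]);
        close(server[i]);
    }
    close(epfd);
    assert(echoed <= n);
    return echoed;
}
```

`serve_clients_single_thread(4)` returns 4: all four connections are handled without creating a single extra thread or process. A real server would add a listening socket, non-blocking descriptors, and handling for partial reads and writes.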
13. Conclusion
●Operating systems do provide services.
●But sometimes, even a single function call can open up a
new world of possibilities.
● epoll() is such an example.
●I/O is the most critical portion of an OS.
●Even on a single-tasking system, efficient I/O is what
the system is all about.
●And epoll() adds event-driven I/O to Linux.
●Interesting applications are being developed on top of
epoll()
14. References
●epoll() official man page.
●Linux Device Drivers - poll epoll
● Node.js - Evented I/O for Javascript
● Nginx - The Russian Webserver