4. The Call Stack
Keeps track of subroutine execution (call + return)
Dynamic; grows up or down depending on the machine architecture
Composed of stack frames, one per active call
Each frame frequently includes local data storage, params, the return address and other saved state
Optimized for a single action: push / pop of the top frame
Ordered (LIFO)
Usually a single call stack per process, except ...
09/12/09 4
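The frame-per-call structure is easy to observe from Ruby itself: Kernel#caller returns the active frames above the current one, newest first. A minimal sketch (method names are illustrative):

```ruby
# Each nested call pushes a frame; Kernel#caller exposes the
# frames above the current one as "file:line:in 'method'" strings.
def inner
  caller # the stack that called us, top frame first
end

def outer
  inner
end

frames = outer
puts frames.first # our immediate caller: the frame for `outer`
```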
6. Context Switches
A stack per Thread / Fiber
Context-specific data
Transient, long-running contexts are memory hungry
Guard against transient threads with a pre-initialized Thread pool
Threaded full-stack Web Apps == expensive context switches
clean_backtrace ?
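The pre-initialized pool guard can be sketched in a few lines of plain Ruby: a fixed set of workers pulling jobs off a Queue, instead of paying for a transient thread per unit of work. Class and method names here are illustrative:

```ruby
# A pre-initialized pool of N workers draining a shared job queue.
class ThreadPool
  def initialize(size)
    @jobs = Queue.new
    @workers = size.times.map do
      Thread.new do
        # A nil job is the shutdown sentinel; anything else is callable.
        while (job = @jobs.pop)
          job.call
        end
      end
    end
  end

  def schedule(&block)
    @jobs << block
  end

  def shutdown
    @workers.size.times { @jobs << nil } # one sentinel per worker
    @workers.each(&:join)
  end
end

pool = ThreadPool.new(2)
done = Queue.new
4.times { |i| pool.schedule { done << i } }
pool.shutdown
puts done.size # => 4
```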
9. Ruby Threads
Scheduled with a timer (SIGVTALRM) every 10ms
Each thread is allowed a 10ms time slice
Not the most efficient scheduler
Coupled with select for IO multiplexing and portability
Can wait for: an fd (via select), a PID, a sleep state, a join
MRI 1.8: green threads, cheap to spawn + switch
MRI 1.9: native OS threads with a GIL, more expensive to spawn
JRuby: Ruby thread == Java thread
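The wait states matter in practice: a thread parked in a sleep or a join costs no CPU, and runnable threads proceed in the meantime. A small illustration:

```ruby
# A thread blocked in a sleep state is descheduled; a runnable
# thread gets the time slices in the meantime.
order = Queue.new

sleeper = Thread.new do
  sleep 0.05        # parked in a sleep state, no CPU used
  order << :sleeper
end

worker = Thread.new do
  order << :worker  # runs immediately while sleeper waits
end

[worker, sleeper].each(&:join) # the main thread waits on a join
p order.size # => 2
```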
12. Fibers
A resumable execution state
Computes a partial result – generator
Yields back to its caller
Caller resumes
Facilities for data exchange
Initial 4k stack size and very fast context switches
MRI 1.9 and JRuby only
Cooperative scheduling for IO
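All of the bullets above show up in a few lines: a Fiber computes a partial result, yields it to the caller, and data flows both ways, since Fiber.yield's return value is the argument of the next resume.

```ruby
# A Fiber as a generator with two-way data exchange.
squares = Fiber.new do |start|
  n = start
  loop do
    step = Fiber.yield(n * n) # hand back a partial result…
    n += step                 # …and receive data from the caller
  end
end

p squares.resume(2) # => 4   (first resume supplies the block arg)
p squares.resume(1) # => 9   (n = 3)
p squares.resume(2) # => 25  (n = 5)
```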
15. Reactor Pattern
Main loop with a tick quantum (10 to 100ms)
Operations register themselves with the reactor
Process forked, fd readable, cmd finished, timer fired, IO timeout etc.
Callbacks and errbacks
Reactor notified by lower-level subsystems: select, epoll, kqueue etc.
Twisted (Python), EventMachine (Ruby, C++, Java)
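EventMachine implements this for real; purely as a sketch of the mechanics, here is a toy reactor in plain Ruby with IO.select as the notification subsystem and a 10ms tick quantum (all names are illustrative):

```ruby
# Toy reactor: callbacks register interest in readable IOs and
# timers; the main loop ticks via IO.select.
class TinyReactor
  Timer = Struct.new(:fire_at, :callback)

  def initialize(quantum = 0.01) # 10ms tick quantum
    @quantum = quantum
    @readers = {} # io => callback
    @timers  = []
    @running = false
  end

  def watch_readable(io, &cb)
    @readers[io] = cb
  end

  def add_timer(delay, &cb)
    @timers << Timer.new(Time.now + delay, cb)
  end

  def stop
    @running = false
  end

  def run
    @running = true
    while @running
      ready, = IO.select(@readers.keys, nil, nil, @quantum)
      (ready || []).each { |io| @readers[io].call(io) }
      due, @timers = @timers.partition { |t| t.fire_at <= Time.now }
      due.each { |t| t.callback.call }
    end
  end
end

r, w = IO.pipe
reactor = TinyReactor.new
got = nil
reactor.watch_readable(r) { |io| got = io.read_nonblock(64); reactor.stop }
reactor.add_timer(0.02) { w.write("tick") }
reactor.run
puts got # => "tick"
```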
17. Reactor and Threads
Operations fire on the reactor thread
Enumerated and invoked FIFO
Blocking operations block the reactor
Defer: schedule an operation on a background thread
Schedule: push a deferred context back to the reactor thread
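The defer / schedule split can be sketched in plain Ruby: blocking work runs on a background thread, and its result is pushed back onto a queue that the loop thread drains FIFO. EventMachine's EM.defer / EM.schedule work along these lines; this class is only an illustration:

```ruby
# Blocking work happens off-loop; results come back on the loop thread.
class DeferringLoop
  def initialize
    @scheduled = Queue.new
    @running = false
  end

  # Defer: run op on a background thread, then hand its result
  # to callback back on the loop thread.
  def defer(op, callback)
    Thread.new do
      result = op.call # may block; the loop stays responsive
      schedule { callback.call(result) }
    end
  end

  # Schedule: push a block back onto the loop thread, FIFO.
  def schedule(&block)
    @scheduled << block
  end

  def stop
    schedule { @running = false }
  end

  def run
    @running = true
    @scheduled.pop.call while @running # invoked on the loop thread
  end
end

loop_ = DeferringLoop.new
answer = nil
slow = -> { sleep 0.01; 21 * 2 }
loop_.defer(slow, ->(r) { answer = r; loop_.stop })
loop_.run
puts answer # => 42
```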
22. Syscalls and the Kernel
Function calls into the OS: read, write, fork, sbrk etc.
User vs Kernel space context switch, much more expensive than function calls within a process
Usually implies data transfer between User and Kernel space
Important to reduce syscalls in high-throughput environments
Some syscalls lift workloads ...
sendfile: request a file to be served directly from the kernel without User space overhead
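From Ruby, IO.copy_stream is the easy way at this idea: it hands the copy to the kernel where the platform allows (sendfile(2) and friends on Linux), avoiding the read/write round trips through user space:

```ruby
require "tempfile"

# IO.copy_stream delegates to in-kernel copy mechanisms where
# available, instead of shuttling each chunk through user space.
src = Tempfile.new("src")
src.write("x" * 65_536) # 64k payload
src.flush
src.rewind

dst = Tempfile.new("dst")
copied = IO.copy_stream(src, dst)
dst.rewind
puts copied # => 65536
```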
24. POSIX Realtime (AIO)
Introduced in Linux Kernel 2.6.x
Floating spec for a number of years, currently defined in POSIX.1b
Implementation support resembles browser compat: coverage varies by platform
Falls back to blocking operations in most implementations
Powers popular reverse proxies like Squid, Nginx, Varnish etc.
28. AIO Control Blocks
File descriptor with proper r/w mode set
Buffer region for read / write
Type of operation: read / write
Priority: higher priority == faster execution
Offset: Position to read from / write to
Bytes: Amount of data to transfer
Callback mechanism: no op, thread or signal
Best wrapped in a custom struct for embedding domain logic specific to the use case
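To make the field list concrete, here are the same slots mirrored as a plain Ruby Struct with a bit of domain validation bolted on, in the "wrap it in a custom struct" spirit. The names are illustrative; the real control block is struct aiocb from <aio.h>:

```ruby
# A Ruby mirror of an AIO control block's fields, plus domain logic.
ControlBlock = Struct.new(
  :fd,        # file descriptor, opened with the proper r/w mode
  :buffer,    # region to read into / write from
  :operation, # :read or :write
  :priority,  # higher priority == scheduled sooner
  :offset,    # position to read from / write to
  :bytes,     # amount of data to transfer
  :callback   # completion mechanism: no-op, thread or signal
) do
  def validate!
    raise ArgumentError, "unknown op" unless %i[read write].include?(operation)
    raise ArgumentError, "negative offset" if offset.negative?
    self
  end
end

cb = ControlBlock.new(3, "\0" * 4096, :read, 0, 0, 4096, nil).validate!
puts cb.bytes # => 4096
```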
31. AIO Operations on a single fd
aio_read: sync / async read
aio_write: sync / async write
aio_error: error state, if any, for an operation
Polling aio_error for EINPROGRESS can simulate a blocking operation
aio_cancel: cancel a submitted job
aio_suspend: wait for one of a set of in-progress operations to complete
aio_fsync: forcefully sync a write op to disk
aio_return: final return status of a completed operation
Uniform API, single AIO Control Block as arg
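The aio_error / EINPROGRESS polling pattern translates to Ruby shape-for-shape; the class below is a pure-Ruby analogue backed by a thread, not a binding to the POSIX API, so every name is illustrative:

```ruby
# Pure-Ruby analogue of polling aio_error until the operation
# leaves "in progress", then collecting the aio_return value.
class AsyncOp
  EINPROGRESS = :einprogress

  def initialize(&work)
    @result = nil
    @thread = Thread.new { @result = work.call }
  end

  def error # aio_error analogue
    @thread.alive? ? EINPROGRESS : 0
  end

  def return_value # aio_return analogue
    @thread.join
    @result
  end
end

op = AsyncOp.new { sleep 0.01; "payload" }
sleep 0.001 while op.error == AsyncOp::EINPROGRESS # simulated blocking wait
puts op.return_value # => "payload"
```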
34. AIO list operations
The previously mentioned APIs still have a syscall-per-call overhead
lio_listio: submit a batch of control blocks with a single syscall
Modes: blocking (LIO_WAIT), non-blocking (LIO_NOWAIT); individual ops can be no-ops (LIO_NOP)
Takes an array of control blocks, the number of operations and an optional callback as arguments
Callback fires when all operations are done
Callbacks from individual control blocks still fire
Useful for app-specific correlation
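The callback choreography is the interesting part: each control block's own callback still fires, and the aggregate callback fires once every operation in the batch is done. A pure-Ruby sketch of that shape (the real call is one syscall over an array of struct aiocb; these names are illustrative):

```ruby
# Batch submission: per-operation callbacks fire as each finishes,
# then one aggregate callback fires after all have completed.
def submit_batch(operations, on_all_done)
  threads = operations.map do |op|
    Thread.new do
      result = op[:work].call
      op[:callback]&.call(result) # individual callback still fires
      result
    end
  end
  results = threads.map(&:value) # all operations done…
  on_all_done.call(results)      # …then the aggregate callback
  results
end

seen = []
ops = [
  { work: -> { 1 + 1 }, callback: ->(r) { seen << r } },
  { work: -> { 2 * 2 }, callback: ->(r) { seen << r } },
]
submit_batch(ops, ->(rs) { seen << rs.sum })
puts seen.inspect # aggregate value (6) always arrives last
```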
39. Revisit Threads and Fibers
Concept from James “raggi” Tucker
Cheap switching of MRI green threads
Let's embrace this …
Stopped threads don't have scheduler overhead
42. Fibered IO Interpreter
Thread#sleep and Thread#wakeup for pooled or transient threads
Stopped threads are excluded by the scheduler, saving a 10ms time slice per stopped thread when IO bound
Model fits very well with existing threaded servers like Mongrel
No need for an IO reactor: we delegate this to the OS and syscalls
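The park-and-wake mechanic is just Thread.stop plus Thread#wakeup: a stopped worker costs the scheduler nothing until something wakes it. A minimal sketch:

```ruby
# A worker parks itself with Thread.stop; the scheduler skips it
# until Thread#wakeup makes it runnable again.
log = Queue.new

worker = Thread.new do
  log << :parked
  Thread.stop     # excluded by the scheduler while stopped
  log << :resumed # runs again only after wakeup
end

sleep 0.01 until worker.status == "sleep" # wait for it to park
worker.wakeup
worker.join
puts log.size # => 2
```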
44. Links and References
A few related projects
http://github.com/eventmachine/eventmachine
The EventMachine repository
http://github.com/methodmissing/aio
Work-in-progress AIO extension for MRI; API in flux, but usable
http://github.com/methodmissing/callback
A native MRI callback object
http://github.com/methodmissing/channel
Fixed-size pub/sub channels for MRI