this talk is going to get technical, so feel free to interrupt if you have any questions
differs on OS and platforms, but usually includes..
each one has pros and cons, different use cases where they make sense. i’ll show pictures for each one. let’s dive into differences
solaris older than version 9 used hybrid threads too
switch to aman
syscalls are calls to numbered kernel functions. each call switches from user mode to kernel mode. strace doesn’t show userland functions, but you can look for gaps
look for system calls that took a while. look for gaps that indicate userland activity
lots of other options, trace network related or fd related calls, etc
so what’s the deal with ruby threads? let’s strace to find out
straced a production ruby.. lots of SIGVTALRMs. wtf?
ruby uses setitimer and signals to schedule green threads
setitimer tells the kernel to send a SIGVTALRM signal every 10ms. the signal interrupts the process and invokes catch_timer, which sets rb_thread_pending to let the interpreter know it needs to switch threads.
rb_thread_start uses thread_init to keep track of whether it needs to start the timer or not. rb_thread_start calls rb_thread_start_timer (.. or pthread_create later)
but our code isn’t using threads!
turns out Net::HTTP and Net::SMTP use timeout, which uses threads. and the first time a thread is spawned, the timer is started.. and it never stops!
let’s fix it.
remember the thread_init variable from before?
thread_remove() removes the thread from the linked list. if only the main_thread is left, we simply stop the timer, and make sure to set thread_init=0 so the timer is started up again next time a new thread is spawned.
switch over to JOE. talk about running debian ruby in production
we noticed ruby on debian is pretty slow. we googled debian ruby issues, and it turns out the sigprocmask calls are related to --enable-pthread
using a pthread for timing doesn’t make it slower.. what does?
let’s see what ./configure --enable-pthread actually does. diff’ed generated config.h.
hmm, getcontext/setcontext??
turns out you don’t really need ucontext to use pthreads (maybe on some obscure platforms?)
let’s strace it!
.. 3.5 million sigprocmask calls are gone! ruby is 30% faster!
switch to aman
two threads, each allocating a large stack frame (50kb)
each does some computation, then calls Thread.pass to switch to the other thread
really.. memcpy? let’s make sure
ok, it’s calling memcpy. what is it copying? it’s copying the thread stacks to the heap. let’s take a step back and talk about the difference between stacks and heaps
func3() has an 8-byte stack frame, twice as big as the other two
the bigger the stack frames, the more it has to memcpy and the longer it takes.
starts out with main() like any C program. calls ruby_run right away to start the ruby vm
int_dotimes in numeric.c: this code calls 5000.times{}. rb_yield is yielding to the block
but, the most common stack frame is rb_eval. 1.8’s vm represents ruby code using nodes, and nodes are evaluated using rb_eval. also notice that rb_eval is recursive.. rails for instance would show many dozens of nested rb_eval
each rb_eval stack frame is almost 1k! (mention mbari patches)
switch to joe
rb_thread_start allocates a new heap, sets the stack pointer using assembly
then thread_save/restore just call setjmp and longjmp like normal, which takes care of saving and restoring where the stack pointer was pointing!
normally the kernel extends the stack automatically. mmap is an alternative to malloc that gives you a big region of memory
each thread decrements a number and then pauses itself. basically tests 50 million thread context switches across 500 threads with 20 ruby method frames in each thread stack
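A rough sketch of what a benchmark like this could look like, scaled down so it finishes quickly. The method names (`with_nested_frames`, `context_switch_benchmark`), the default sizes, and the Mutex are assumptions made for illustration; this is not the script used in the talk.

```ruby
require 'benchmark'

# Calls the block from `depth` nested method frames, so each thread has a deep stack.
def with_nested_frames(depth, &work)
  return work.call if depth.zero?
  with_nested_frames(depth - 1, &work)
end

# `threads` threads share a counter that starts at `switches`.
# Each thread decrements it and then passes control to another thread.
# Returns the counter value once every thread has finished.
def context_switch_benchmark(threads: 50, switches: 10_000, depth: 20)
  remaining = switches
  lock = Mutex.new # assumption: added so the sketch is safe on any Ruby
  workers = Array.new(threads) do
    Thread.new do
      with_nested_frames(depth) do
        loop do
          finished = lock.synchronize do
            if remaining > 0
              remaining -= 1
              false
            else
              true
            end
          end
          break if finished
          Thread.pass # pause this thread and let another one run
        end
      end
    end
  end
  workers.each(&:join)
  remaining
end

elapsed = Benchmark.realtime { context_switch_benchmark }
puts format("%.3fs for 10,000 decrements across 50 threads", elapsed)
```

Raising `threads`, `switches`, and `depth` moves it closer to the sizes mentioned above; the deeper the stack in each thread, the more a stack-copying implementation has to move on every switch.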
Threaded Awesome
(that’s an oxymoron)
Joe Damato and Aman Gupta
About Joe Damato
From NJ, Godfather II is actually my
Biography
CMU/VMWare alum
http://timetobleed.com
@joedamato
About Aman Gupta
EventMachine, amqp
Ruby Hero 2009
github.com/tmm1
@tmm1
What is a thread?
source: wikipedia
A thread is just a set of execution state
This state usually includes:
instruction & stack pointers
scheduling priority
other CPU state
Threading Models
Green threads (1:N)
Native Threads (1:1)
Hybrid (M:N)
Green Threads (1:N)
“Green” because they are light weight
Kernel doesn’t know they exist
Implementation is in userland
Pros
Create lots of them cheaply (10,000s)
Switch between them cheaply (Ruby doesn’t)
Schedule them however you want
Cons
A blocking call in one blocks ALL
Kernel doesn’t know about them
Can’t take advantage of SMP
Green Threads (1:N)
(pics or it didn’t happen)
Ruby 1.8 uses Green Threads
(and does it wrong)
Native Threads (1:1)
Native Threads
Kernel knows they exist
Some userland code (libpthread)
Pros
Take advantage of SMP
Shared memory
Blocking in one thread doesn’t block everyone
Don’t have to write a scheduler
Cons
Overhead limits how many you can create
Bugs (glibc, more threads = slower creation time)
Don’t have fine grained scheduling control
Native Threads (1:1)
Ruby 1.9 uses Native Threads
(but.. they don’t execute in parallel)
Hybrid Threads (M:N)
Hybrid threads
Almost best of both worlds
Pros
Take advantage of SMP
Cheap setup and teardown
Blocking in one thread doesn’t block everyone
Cons
Need 2 schedulers (userland + kernel)
Need to make them actually work together
All green threads backed by the same native thread can be blocked
Hybrid Threads (M:N)
Erlang uses Hybrid Threads
Ruby 1.9, too (with fibers)
Preemptive
Multitasking
Outside event (timer) signals end of CPU slice
Handle important events quickly
Can help ensure everyone gets to execute
But..
Need to build a smart scheduler
Can yield non-deterministic execution order
Cooperative
Multitasking
Threads voluntarily release the CPU
Give up the CPU when it is “optimal”
Can guarantee deterministic execution order
Very simple “scheduler”
But..
Badly written code can hang all threads
So, what is a fiber?
In Ruby fibers are green threads
with cooperative multitasking.
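A minimal sketch of cooperative multitasking with fibers, assuming Ruby 1.9 or later. The method name `ping_pong` and the log strings are made up for illustration. Nothing switches until a fiber calls `Fiber.yield`, and the "scheduler" is simply the order in which the fibers are resumed.

```ruby
# Runs two fibers that hand control back explicitly
# and returns the order in which their steps ran.
def ping_pong
  log = []

  ping = Fiber.new do
    log << "ping 1"
    Fiber.yield # voluntarily give up the CPU
    log << "ping 2"
  end

  pong = Fiber.new do
    log << "pong 1"
    Fiber.yield
    log << "pong 2"
  end

  # No timer or signal is involved: we decide who runs next.
  [ping, pong, ping, pong].each(&:resume)
  log
end

puts ping_pong.join(", ")
# prints "ping 1, pong 1, ping 2, pong 2"
```

Because each switch happens at a known point, the execution order is deterministic. The flip side is that a fiber that never yields keeps the CPU for itself.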
So what’s the deal
with ruby threads?
strace
google-perftools
ltrace
gdb
strace
trace system calls and signals
strace -cp <pid>
strace -ttTp <pid> -o <file>
strace -cp <pid>
-c
Count time, calls, and errors for each system call and report a
summary on program exit.
-p pid
Attach to the process with the process ID pid and begin tracing.
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.39    0.000064           0      1197       592 read
 34.65    0.000044           0       609           writev
 14.96    0.000019           0      1226           epoll_ctl
  0.00    0.000000           0         4           close
  0.00    0.000000           0         1           select
  0.00    0.000000           0         4           socket
  0.00    0.000000           0         4         4 connect
  0.00    0.000000           0      1057           epoll_wait
------ ----------- ----------- --------- --------- ----------------
100.00    0.000127                  4134       596 total
strace -ttTp <pid> -o <file>
-t
Prefix each line of the trace with the time of day.
-tt
If given twice, the time printed will include the microseconds.
-T
Show the time spent in system calls. This records the time
difference between the beginning and the end of each system call.
-o filename
Write the trace output to the file filename rather than to stderr.
01:09:11.266949 epoll_wait(9, {{EPOLLIN, {u32=68841296, u64=68841296}}}, 4096, 50) = 1 <0.033109>
01:09:11.300102 accept(10, {sa_family=AF_INET, sin_port=38313, sin_addr="127.0.0.1"}, [1226]) = 22 <0.000014>
01:09:11.300190 fcntl(22, F_GETFL) = 0x2 (flags O_RDWR) <0.000007>
01:09:11.300237 fcntl(22, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000008>
01:09:11.300277 setsockopt(22, SOL_TCP, TCP_NODELAY, [1], 4) = 0 <0.000008>
01:09:11.300489 accept(10, 0x7fff5d9c07d0, [1226]) = -1 EAGAIN <0.000014>
01:09:11.300547 epoll_ctl(9, EPOLL_CTL_ADD, 22, {EPOLLIN, {u32=108750368, u64=108750368}}) = 0 <0.000009>
01:09:11.300593 epoll_wait(9, {{EPOLLIN, {u32=108750368, u64=108750368}}}, 4096, 50) = 1 <0.000007>
01:09:11.300633 read(22, "GET / HTTP/1.1\r"..., 16384) = 772 <0.000012>
01:09:11.301727 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000007>
01:09:11.302095 poll([{fd=5, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) <0.000008>
01:09:11.302144 write(5, "1000000-0003SELECT * FROM `table`"..., 56) = 56 <0.000023>
01:09:11.302221 read(5, "25101,20x234m"..., 16384) = 284 <1.300897>
ruby uses setitimer and signals
to schedule green threads*
The first time a new thread is created, ruby
calls:
setitimer(ITIMER_VIRTUAL, 10ms): tell the
kernel to send the process a SIGVTALRM
every 10ms
posix_signal(SIGVTALRM, catch_timer): bind
the catch_timer function to the signal
* when compiled without --enable-pthread
static void
catch_timer(sig)
    int sig;
{
    if (!rb_thread_critical) {
        rb_thread_pending = 1;
    }
    /* cause EINTR */
}

void
rb_thread_start_timer()
{
    struct itimerval tval;

    if (!thread_init) return;
    tval.it_interval.tv_sec = 0;
    tval.it_interval.tv_usec = 10000;
    tval.it_value = tval.it_interval;
    setitimer(ITIMER_VIRTUAL, &tval, NULL);
}

static VALUE
rb_thread_start_0(fn, arg, th)
    VALUE (*fn)();
    void *arg;
    rb_thread_t th;
{
    if (!thread_init) {
        thread_init = 1;
        posix_signal(SIGVTALRM, catch_timer);
        rb_thread_start_timer();
    }
    /* ... */
}
static void
catch_timer(sig)
int sig;
{
if (!rb_thread_critical) { static VALUE
rb_thread_pending = 1; rb_thread_start_0(fn, arg, th)
} VALUE (*fn)();
/* cause EINTR */ void *arg;
} rb_thread_t th;
{
void if (!thread_init) {
rb_thread_start_timer() thread_init = 1;
{ posix_signal(SIGVTALRM, catch_timer);
struct itimerval tval; rb_thread_start_timer();
}
if (!thread_init) return;
tval.it_interval.tv_sec = 0; /* ... */
tval.it_interval.tv_usec = 10000; }
tval.it_value = tval.it_interval;
setitimer(ITIMER_VIRTUAL, &tval, NULL);
}
static void
catch_timer(sig)
int sig;
{
if (!rb_thread_critical) { static VALUE
rb_thread_pending = 1; rb_thread_start_0(fn, arg, th)
} VALUE (*fn)();
/* cause EINTR */ void *arg;
} rb_thread_t th;
{
void if (!thread_init) {
rb_thread_start_timer() thread_init = 1;
{ posix_signal(SIGVTALRM, catch_timer);
struct itimerval tval; rb_thread_start_timer();
}
if (!thread_init) return;
tval.it_interval.tv_sec = 0; /* ... */
tval.it_interval.tv_usec = 10000; }
tval.it_value = tval.it_interval;
setitimer(ITIMER_VIRTUAL, &tval, NULL);
}
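A minimal sketch of the same timer mechanism outside of ruby, assuming a POSIX system. It installs a SIGVTALRM handler, arms ITIMER_VIRTUAL at the 10ms interval from the slide, and counts the signals that arrive while the process burns CPU. The names on_vtalrm, set_virtual_timer and count_ticks are made up for illustration and do not come from the ruby source; sigaction stands in for ruby's posix_signal.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Written by the signal handler; sig_atomic_t is safe to update there. */
static volatile sig_atomic_t tick_count = 0;

/* Hypothetical handler, playing the role of ruby's catch_timer. */
static void on_vtalrm(int sig)
{
    (void)sig;
    tick_count++;
}

/* Arm the virtual timer (interval_usec > 0) or disarm it (interval_usec == 0). */
static int set_virtual_timer(long interval_usec)
{
    struct itimerval tval;
    memset(&tval, 0, sizeof(tval));
    tval.it_interval.tv_sec = 0;
    tval.it_interval.tv_usec = interval_usec;
    tval.it_value = tval.it_interval;
    return setitimer(ITIMER_VIRTUAL, &tval, NULL);
}

/* Spin in user mode until `wanted` signals have arrived; returns the count seen. */
static int count_ticks(int wanted)
{
    struct sigaction sa;

    assert(wanted > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_vtalrm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGVTALRM, &sa, NULL) != 0) return -1;

    tick_count = 0;
    if (set_virtual_timer(10000) != 0) return -1;  /* 10ms, as on the slide */

    while (tick_count < wanted) {
        /* busy loop: ITIMER_VIRTUAL only counts user-mode CPU time */
    }

    set_virtual_timer(0);                          /* disarm the timer */
    return tick_count;
}
```

Calling count_ticks(3) should return after roughly 30ms of user-mode CPU time. A process that sleeps instead of spinning would not see the timer fire, since ITIMER_VIRTUAL only advances while the process is executing in user mode.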
But I’m not using threads!
begin
  # require 'net/http'
  # Net::HTTP.new(host, port).request(...)        (uses timeout)
  # require 'net/smtp'
  # Net::SMTP.new('localhost').send_message(...)
  require 'timeout'
  Timeout.timeout(0.1) do                         # uses threads
    1+2*3/4 while true
  end
rescue Timeout::Error
end
500_000_000.times{ |i| i * 2 }
Thread.new, Timeout.timeout and Net::* all use threads
and start the thread timer
Once the timer is started, it will interrupt your
process every 10ms, even if all threads are killed
PATCH: stop the thread timer
@@ -10518,6 +10520,15 @@ rb_thread_remove(th)
     rb_thread_die(th);
     th->prev->next = th->next;
     th->next->prev = th->prev;
+
+    /* if this is the last ruby thread, stop timer signals */
+    if (th->next == th->prev && th->next == main_thread) {
+        rb_thread_stop_timer();
+        thread_init = 0;
+    }
 }
What is --enable-pthread anyway?
uses a pthread for timing instead of setitimer()
useful for compatibility with external libs that use pthreads or signals (like ruby-tk)

--- config.h.nopthread
+++ config.h
@@ -173,6 +173,12 @@
 #define FILE_READEND _IO_read_end
 #define HAVE__SC_CLK_TCK 1
 #define STACK_GROW_DIRECTION -1
+#define _REENTRANT 1
+#define _THREAD_SAFE 1
+#define HAVE_LIBPTHREAD 1
+#define HAVE_NANOSLEEP 1
+#define HAVE_GETCONTEXT 1
+#define HAVE_SETCONTEXT 1
 #define DEFAULT_KCODE KCODE_NONE
 #define USE_ELF 1
 #define DLEXT_MAXLEN 3

#ifdef _THREAD_SAFE
    pthread_create(&time_thread, 0, thread_timer, 0);
#else
    rb_thread_start_timer();
#endif

but.. it also enables getcontext/setcontext??

#if defined(HAVE_GETCONTEXT) && defined(HAVE_SETCONTEXT)
#include <ucontext.h>
#define USE_CONTEXT
#endif
ucontext?
ruby can use either setjmp/longjmp or setcontext/getcontext in its threading implementation and for exception handling
setjmp/longjmp save and restore the current cpu registers
setcontext/getcontext are an advanced version of setjmp/longjmp, but they also call sigprocmask to save/restore the signal mask before each jump
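To make the save-and-restore idea concrete, here is a small sketch using only the standard setjmp/longjmp pair (not getcontext/setcontext). The function names dive and jump_demo are invented for this example. setjmp records the current execution state in a jmp_buf, and a later longjmp from a deeper call unwinds straight back to that point.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf saved;        /* execution state captured by setjmp */
static int depth_reached;    /* static storage, so its value survives the jump */

/* Recurse a few levels, then jump back to the setjmp call site. */
static void dive(int depth)
{
    depth_reached = depth;
    if (depth == 3)
        longjmp(saved, 1);   /* restore saved state: setjmp "returns" again, nonzero */
    dive(depth + 1);
}

/* Returns the recursion depth at which the jump happened. */
static int jump_demo(void)
{
    if (setjmp(saved) == 0) {   /* 0 means: state was just saved */
        dive(0);
        return -1;              /* not reached: dive() always jumps back */
    }
    /* We arrive here via longjmp, skipping the returns of every dive() frame. */
    assert(depth_reached == 3);
    return depth_reached;
}
```

jump_demo() returns 3 without any of the nested dive() calls returning normally, which is the same kind of non-local transfer a green thread switch relies on.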
PATCH: --disable-ucontext
--- a/configure.in
+++ b/configure.in
@@ -368,6 +368,10 @@
+AC_ARG_ENABLE(ucontext,
+       [  --disable-ucontext      do not use getcontext()/setcontext().],
+       [disable_ucontext=yes], [disable_ucontext=no])
+
 AC_ARG_ENABLE(pthread,
        [  --enable-pthread        use pthread library.],
        [enable_pthread=$enableval], [enable_pthread=no])
@@ -1038,7 +1042,8 @@
-if test x"$ac_cv_header_ucontext_h" = xyes; then
+if test x"$ac_cv_header_ucontext_h" = xyes && test x"$disable_ucontext" = xno; then
   if test x"$rb_with_pthread" = xyes; then
     AC_CHECK_FUNCS(getcontext setcontext)
   fi

./configure --enable-pthread --disable-ucontext
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
   nan    0.000000           0        13           read
   nan    0.000000           0        21        10 open
   nan    0.000000           0        11           close
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                    45        10 total
EventMachine + threads = slow??
EventMachine allocates large buffers on the stack to read/write from the network
Using threads with EM made ruby extremely slow..

require 'cext'
(1..2).map{
  Thread.new{
    CExt.bigstack{
      100_000.times{
        1*2+3/4
        Thread.pass
      }
    }
  }
}.map{ |t| t.join }

#include "ruby.h"

VALUE bigstack(VALUE self)
{
  char buffer[ 50 * 1024 ]; /* large stack frame */
  if (rb_block_given_p()) rb_yield(Qnil);
  return Qnil;
}

void Init_cext()
{
  VALUE CExt = rb_define_module("CExt");
  rb_define_singleton_method(CExt, "bigstack", bigstack, 0);
}

...profile?
OK, it’s calling memcpy()
but what is it copying?

static void
rb_thread_save_context(th)
    rb_thread_t th;
{
    VALUE *pos;
    int len;

    ruby_setjmp(th->context);           /* 1. save cpu registers */
    len = ruby_stack_length(&pos);
    th->stk_pos = pos;
    th->stk_len = len;
    MEMCPY(th->stk_ptr,                 /* 2. save stack frames */
           th->stk_pos, VALUE, th->stk_len);
    th->frame = ruby_frame;             /* 3. save vm globals */
    th->scope = ruby_scope;
    /* ... */
}

static void
rb_thread_restore_context(th)
    rb_thread_t th;
{
    ruby_frame = th->frame;             /* 4. restore vm globals */
    ruby_scope = th->scope;
    /* ... */
    MEMCPY(th->stk_pos,                 /* 5. restore stack frames */
           th->stk_ptr, VALUE, th->stk_len);
    ruby_longjmp(th->context);          /* 6. restore cpu registers */
}
it’s copying the stacks to the heap!
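A rough simulation of why a large stack frame makes every switch expensive. This is not ruby's code: saved_stack, save_stack and restore_stack are hypothetical names, the "machine stack" is just a byte array passed in by the caller, and lengths are counted in bytes rather than in VALUEs. The point is only that each save and each restore copies the whole live stack, so a 50KB buffer in one frame means about 50KB moved per memcpy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the stk_ptr/stk_len fields of a thread. */
typedef struct {
    unsigned char *stk_ptr;   /* heap copy of the thread's stack */
    size_t         stk_len;   /* number of bytes saved */
} saved_stack;

/* Copy `len` bytes of a (simulated) machine stack to the heap. Returns 0 on success. */
static int save_stack(saved_stack *s, const unsigned char *stack, size_t len)
{
    unsigned char *p;

    assert(s != NULL && stack != NULL);
    p = realloc(s->stk_ptr, len ? len : 1);   /* grow or reuse the heap buffer */
    if (p == NULL) return -1;
    s->stk_ptr = p;
    s->stk_len = len;
    memcpy(s->stk_ptr, stack, len);           /* cost is proportional to len */
    return 0;
}

/* Copy the saved bytes back over the (simulated) machine stack. */
static void restore_stack(const saved_stack *s, unsigned char *stack)
{
    assert(s != NULL && stack != NULL);
    memcpy(stack, s->stk_ptr, s->stk_len);
}
```

With two threads passing control back and forth 100,000 times each, as in the test script, a scheme like this would copy the full stack on every pass, which is consistent with memcpy dominating the profile.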
Stack vs. Heap

Stack:
  Storage for local vars
  Only valid while stack frame is on the stack!
  Keeping track of function calls

Heap:
  Storage for vars that persist across function calls.
  Managed by malloc

Example (each call pushes a new frame on the stack):
  func1()  void *data;                  4 bytes on the stack
           func2();
  func2()  char *string = malloc(10);   4 bytes on the stack, 10 bytes on the heap
           func3();
  func3()  char buffer[8];              8 bytes on the stack

When func3() returns, its 8 bytes are popped off the stack; the 10 bytes on the heap stay allocated.
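The difference can be sketched in a few lines of C. The function names make_greeting and use_local_buffer are invented for this example; the sizes mirror the 10-byte malloc and 8-byte buffer from the slide.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Heap: the allocation outlives this call; the caller must free() it. */
static char *make_greeting(void)
{
    char *string = malloc(10);          /* 10 bytes on the heap */
    if (string != NULL)
        strcpy(string, "hi there");     /* 8 chars + NUL fits in 10 bytes */
    return string;                      /* only the pointer itself lived on the stack */
}

/* Stack: buffer exists only while this function's frame is on the stack. */
static size_t use_local_buffer(void)
{
    char buffer[8];                     /* 8 bytes on the stack */
    assert(strlen("local") < sizeof buffer);
    strcpy(buffer, "local");            /* 5 chars + NUL fits in 8 bytes */
    return strlen(buffer);              /* return a value, never the address of buffer */
}
```

The string returned by make_greeting() is still valid after the function returns and stays valid until free() is called. Returning the address of buffer from use_local_buffer() would hand back a pointer into a frame that no longer exists.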
memcpy()ing the thread stacks
[Figure, three panels: During execution / Saving current thread / Restoring next thread]
so, what’s on these thread stacks?
gdb
the GNU debugger
gdb <program>
gdb <program> <pid>
Be sure to build with:
-ggdb
-O0
gdb walkthrough
% gdb ./test-it                                      <-- start gdb
(gdb) b average                                      <-- set breakpoint on function named average
Breakpoint 1 at 0x1f8e: file test-it.c, line 3.
(gdb) run                                            <-- run program
Starting program: /Users/joe/test-it
Reading symbols for shared libraries ++. done
Breakpoint 1, average (x=5, y=6) at test-it.c:3      <-- hit breakpoint!
3         int sum = x + y;
(gdb) bt                                             <-- show backtrace
#0  average (x=5, y=6) at test-it.c:3                <-- function stack
#1  0x00001fec in main () at test-it.c:12
(gdb) s                                              <-- single step
4         double avg = sum / 2.0;
(gdb) s
5         return avg;
(gdb) p avg                                          <-- print variables
$1 = 5.5
(gdb) p sum
$2 = 11
What’s on the ruby stack?
(gdb) where
#0 0x0002a55e in rb_call (klass=1386800, recv=5056455, mid=42, argc=1, argv=0xbfffe5c0,
scope=0, self=1403220) at eval.c:6125
#1 0x000226ef in rb_eval (self=1403220, n=0x1461e4) at eval.c:3493
#2 0x00026d01 in rb_yield_0 (val=5056455, self=1403220, klass=0, flags=0, avalue=0) at
eval.c:5083
#3 0x000270e8 in rb_yield (val=5056455) at eval.c:5168
#4 0x0005c30c in int_dotimes (num=1000000001) at numeric.c:2946
#5 0x00029be3 in call_cfunc (func=0x5c2a0 <int_dotimes>, recv=1000000001, len=0, argc=0,
argv=0x0) at eval.c:5759
#6 0x00028fd4 in rb_call0 (klass=1387580, recv=1000000001, id=5785, oid=5785, argc=0,
argv=0x0, body=0x152b24, flags=0) at eval.c:5911
#7 0x0002a7a7 in rb_call (klass=1387580, recv=1000000001, mid=5785, argc=0, argv=0x0,
scope=0, self=1403220) at eval.c:6158
#8 0x000226ef in rb_eval (self=1403220, n=0x146284) at eval.c:3493
#9 0x000213e3 in rb_eval (self=1403220, n=0x1461a8) at eval.c:3223
#10 0x0001ceea in eval_node (self=1403220, node=0x1461a8) at eval.c:1437
#11 0x0001d60f in ruby_exec_internal () at eval.c:1642
#12 0x0001d660 in ruby_exec () at eval.c:1662
#13 0x0001d68e in ruby_run () at eval.c:1672
#14 0x000023dc in main (argc=2, argv=0xbffff7c4, envp=0xbffff7d0) at main.c:48
What’s on the ruby stack?
(gdb) where
#0 0x0002a55e in rb_call (klass=1386800, recv=5056455, mid=42, argc=1, argv=0xbfffe5c0,
scope=0, self=1403220) at eval.c:6125
#1 0x000226ef in rb_eval (self=1403220, n=0x1461e4) at eval.c:3493
#2 0x00026d01 in rb_yield_0 (val=5056455, self=1403220, klass=0, flags=0, avalue=0) at
eval.c:5083
#3 0x000270e8 in rb_yield (val=5056455) at eval.c:5168
#4 0x0005c30c in int_dotimes (num=1000000001) at numeric.c:2946
#5 0x00029be3 in call_cfunc (func=0x5c2a0 <int_dotimes>, recv=1000000001, len=0, argc=0,
argv=0x0) at eval.c:5759
#6 0x00028fd4 in rb_call0 (klass=1387580, recv=1000000001, id=5785, oid=5785, argc=0,
argv=0x0, body=0x152b24, flags=0) at eval.c:5911
#7 0x0002a7a7 in rb_call (klass=1387580, recv=1000000001, mid=5785, argc=0, argv=0x0,
scope=0, self=1403220) at eval.c:6158
#8 0x000226ef in rb_eval (self=1403220, n=0x146284) at eval.c:3493
#9 0x000213e3 in rb_eval (self=1403220, n=0x1461a8) at eval.c:3223
#10 0x0001ceea in eval_node (self=1403220, node=0x1461a8) at eval.c:1437
#11 0x0001d60f in ruby_exec_internal () at eval.c:1642
#12 0x0001d660 in ruby_exec () at eval.c:1662
#13 0x0001d68e in ruby_run () at eval.c:1672
#14 0x000023dc in main (argc=2, argv=0xbffff7c4, envp=0xbffff7d0) at main.c:48
rb_eval recursively executes ruby code in 1.8
How big is the stack?
#8 rb_eval at eval.c:3493
(gdb) p $ebp - $esp
$1 = 968
base - stack ptr = frame size: each rb_eval stack frame is almost 1k!
#0 rb_thread_save_context at eval.c:10597
(gdb) p (void*)rb_gc_stack_start - $esp
$1 = 10572
10.5k stack will be memcpy()’d
50 method calls * 1k ≈ 50k stack
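The arithmetic on this slide can be checked directly. A minimal sketch, assuming the 968-byte frame measured above is typical of every rb_eval frame and that each nested method call costs roughly one such frame. The constant and method names are illustrative and do not come from the Ruby source:

```ruby
# Back-of-the-envelope stack arithmetic using the gdb measurements above.
RB_EVAL_FRAME_BYTES = 968     # (gdb) p $ebp - $esp, taken inside rb_eval
SAVED_STACK_BYTES   = 10_572  # (gdb) p (void*)rb_gc_stack_start - $esp

# Assumption: one rb_eval-sized frame per nested method call.
def estimated_stack_bytes(method_calls, frame_bytes = RB_EVAL_FRAME_BYTES)
  method_calls * frame_bytes
end

deep_stack = estimated_stack_bytes(50)
puts "50 nested calls: about #{deep_stack} bytes"
puts "measured stack:  #{SAVED_STACK_BYTES} bytes"
```

Whatever the exact frame size, the amount of memory to copy grows linearly with call depth, which is why deep stacks make the copying approach expensive.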
Recap: How do Ruby threads work?
Each thread has its own execution context:
saved cpu registers (setjmp/longjmp)
copy of vm globals (current frame, scope, block)
stack (memcpy)
How does Ruby switch between threads?
Thread executes until time is up (SIGVTALRM)
rb_thread_save_context() saves state (memcpy)
rb_thread_schedule() picks the next thread
rb_thread_restore_context() restores new thread state (memcpy)
But I thought you said...
The whole point of green threads is
that they are fast and cheap (at the
loss of SMP).
That much copying is neither fast nor
cheap.
So how can we fix it?
Stop copying stuff!
A stack is just a region of memory.
Why not just point the CPU at a
region on the heap?
Then, each switch we just swap
registers and do no copying at all!
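To make the difference concrete, here is a toy model in Ruby. It is only an illustration: the real switch happens in C inside the interpreter, and the class names, method names and addresses below are hypothetical. It counts how many bytes each strategy moves over 1,000 switches of a thread whose stack is the 10,572 bytes measured earlier.

```ruby
# Toy model only: real context switches happen in C inside the interpreter.
class CopyingThread
  attr_reader :bytes_copied

  def initialize(stack_bytes)
    @live_stack   = "x" * stack_bytes  # stands in for the machine stack
    @saved_stack  = nil
    @bytes_copied = 0
  end

  # Save: copy the live stack into a heap buffer (the memcpy on save).
  def save_context
    @saved_stack = @live_stack.dup
    @bytes_copied += @saved_stack.bytesize
  end

  # Restore: copy the heap buffer back onto the machine stack.
  def restore_context
    @live_stack = @saved_stack.dup
    @bytes_copied += @live_stack.bytesize
  end
end

class SwappingThread
  attr_reader :bytes_copied, :stack_pointer

  def initialize(stack_base)
    @stack_pointer = stack_base  # the stack lives in its own region and never moves
    @bytes_copied  = 0
  end

  # Save and restore only record or reload a pointer; nothing is copied.
  def save_context(current_sp)
    @stack_pointer = current_sp
  end

  def restore_context
    @stack_pointer
  end
end

copying = CopyingThread.new(10_572)
1_000.times { copying.save_context; copying.restore_context }

swapping = SwappingThread.new(0x1000_0000)
1_000.times { swapping.save_context(0x1000_0000 - 968); swapping.restore_context }

puts "copying model:  #{copying.bytes_copied} bytes moved"
puts "swapping model: #{swapping.bytes_copied} bytes moved"
```

In the copying model the cost of a switch scales with stack depth; in the swapping model it is constant.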
Tradeoffs!
Switching will be really, really
fast, but...
Can’t grow stacks anymore
What happens if/when you fall
off the edge?
How big should they be?
Heap or mmap area?
If they go on the heap..
use malloc/free (easier than mmap!)
malloc overhead, though
could try to grow stacks with realloc
need to make sure realloc returns the
same address!
overflow and you’ll corrupt the heap
won’t know until it’s too late!
If they are mmaped...
mmap-family of functions can be hard to
use
could try to grow stacks with mremap
put guard pages under stacks to catch
overflow
We decided to...
mmap the stacks
use guard pages to protect from
overflow
do not grow stacks
stacks are 1MB, but provide a tuning
knob for advanced users
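The layout this implies can be sketched with plain address arithmetic. This is a hypothetical model, not the patch itself: it assumes a 4096-byte page, a stack that grows downward, and a single guard page placed directly below each 1MB stack. StackRegion and its methods are made-up names.

```ruby
PAGE_SIZE  = 4096         # assumed page size; real code should ask the OS
STACK_SIZE = 1024 * 1024  # 1MB per thread, as on the slide

# Hypothetical layout: [guard page][stack ...............]
# The stack grows downward toward the guard page, so running off the end
# hits an inaccessible page instead of silently corrupting other memory.
StackRegion = Struct.new(:base) do
  def guard_range
    base...(base + PAGE_SIZE)
  end

  def stack_range
    (base + PAGE_SIZE)...(base + PAGE_SIZE + STACK_SIZE)
  end

  # Highest address of the region; pushes move downward from here.
  def initial_stack_pointer
    stack_range.end
  end

  def classify(address)
    return :guard_page_fault if guard_range.cover?(address)
    return :ok if stack_range.cover?(address)
    :outside_region
  end
end

region = StackRegion.new(0x7000_0000)
sp = region.initial_stack_pointer
puts region.classify(sp - 968)             # one rb_eval-sized frame down
puts region.classify(sp - STACK_SIZE - 1)  # one byte past the bottom of the stack
```

With a heap-allocated stack there is no equivalent of the guard range, which is the "won’t know until it’s too late" problem described above.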
0-copy thread switches
[Figure: Main Thread, Thread Switch, Thread Switch]
Benchmark
Computer Language Benchmark Game:
http://shootout.alioth.debian.org/
Let’s use the thread-ring benchmark
Let’s also grow the program stack
a bit to illustrate the speed
boost.
def grow_stack n=0, &blk
  unless n > 20
    grow_stack n+1, &blk
  else
    yield
  end
end
thread-ring
number = 50_000_000
threads = []
for i in 1..503
  threads << Thread.new(i) do |thr_num|  # create 503 threads
    grow_stack do                        # increase the thread stacks
      while true
        Thread.stop                      # pause the thread
        if number > 0
          number -= 1                    # when resumed, decrement number
        else
          puts thr_num
          exit 0
        end
      end
    end
  end
end

# schedule each thread until number == 0
prev_thread = threads.last
while true
  for thread in threads
    Thread.pass until prev_thread.stop?
    thread.run
    prev_thread = thread
  end
end
rb_thread_schedule() sucks
Thread switching might be fast now, but the scheduler is still pretty bad.
Loops over each thread 5+ times:
FOREACH_THREAD_FROM(curr, th) {
    /* */
}
END_FOREACH_FROM(curr, th);
(the slide shows this loop five times)
Complexity theory says constants don’t matter, but...
What now?
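To put a rough number on why the constant matters here, the sketch below multiplies it out for the thread-ring benchmark. It assumes exactly five passes over the thread list per call (the slide says "5+", so this is a lower bound) and one scheduler call per decrement of number. Both are simplifying assumptions, and the names are illustrative.

```ruby
THREADS         = 503         # threads in the thread-ring benchmark
PASSES_PER_CALL = 5           # "5+" on the slide; five is assumed as a lower bound
SWITCHES        = 50_000_000  # assumed: one scheduler call per decrement of number

# Total thread-list entries visited by the scheduler over the whole run.
def thread_visits(threads, passes, switches)
  threads * passes * switches
end

visits = thread_visits(THREADS, PASSES_PER_CALL, SWITCHES)
puts "entries visited per scheduler call: #{THREADS * PASSES_PER_CALL}"
puts "entries visited over the whole run: #{visits}"
```

Under these assumptions the scheduler walks thousands of list entries just to wake a single thread, and does so on every switch.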
Rewrite the scheduler.
Remove the scheduler!
Fibers!
We’re backporting the Fibers API to MRI
Behind the scenes:
Create a thread
Don’t add to schedule list
“Schedule” manually with yield and
resume
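For a feel of what manual scheduling looks like, here is a small ring of fibers passing a countdown along, written against the standard Fiber API that ships with Ruby 1.9 and later (Fiber.new, Fiber.yield, Fiber#resume). It is a sketch of the idea, not the backport described here, and run_fiber_ring is a made-up helper name.

```ruby
# A ring of fibers sharing a countdown, scheduled entirely by hand.
def run_fiber_ring(ring_size, token)
  winner = nil

  fibers = (1..ring_size).map do |fiber_num|
    Fiber.new do
      loop do
        if token > 0
          token -= 1
          Fiber.yield          # hand control back to whoever resumed us
        else
          winner = fiber_num   # this fiber saw the counter reach zero
          break                # leave the loop; the fiber finishes
        end
      end
    end
  end

  # The "scheduler" is just this loop: resume each fiber in turn.
  fibers.cycle do |fiber|
    fiber.resume
    break if winner
  end

  winner
end

puts run_fiber_ring(503, 1_000)
```

No timer signal and no walk over a thread list is involved: the only code that runs between two fibers is the cycle loop above.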
Ruby Enterprise Edition
Based on Ruby 1.8.6
Thread timer fix is in the current
release.
0-copy threading patch will be in the
next release.
Next release also merges MBARI for
smaller rb_eval stack frames.
http://www.rubyenterpriseedition.com/
Questions?
@joedamato @tmm1
timetobleed.com github.com/tmm1
Thanks for listening!
1. 1:N threads, i.e. coroutines
2. 1:1 native threads (Python 2.5+ and Ruby 1.9+ both implement these, but their threads cannot actually run in parallel)
3. M:N, as implemented by Erlang