Threaded Awesome

Notes on slide 1

this talk is going to get technical, so feel free to interrupt if you have any questions

differs on OS and platforms, but usually includes..

each one has pros and cons, different use cases where they make sense.
i’ll show pictures for each one.
let’s dive into differences

solaris older than version 9 used hybrid threads too

switch to aman

syscalls are calls to kernel functions
numbered functions
switches from usermode to kernel mode
strace doesn’t show userland functions, but you can look for gaps between syscalls

look for system calls that took a while
look for gaps that indicate userland activity

lots of other options, trace network related or fd related calls, etc

so what’s the deal with ruby threads? lets strace to find out

straced a production ruby.. lots of vtalrms. wtf?

ruby uses setitimer and signals to schedule green threads

setitimer tells the kernel to send a SIGVTALRM signal every 10ms. the signal interrupts the process and invokes catch_timer, which sets rb_thread_pending so the interpreter knows it needs to switch threads.
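
a rough model of that handshake, runnable on ruby 1.9+: fibers stand in for green threads and a step counter stands in for the 10ms timer. TICK_EVERY, the log array and the loop itself are made up for illustration, this is not how eval.c is written. the point is only that the "signal handler" does nothing but set a flag, and the switch happens later when the "interpreter" checks it.

```ruby
# toy model of flag-based preemption; every name here is hypothetical.
TICK_EVERY = 3        # assumption: the "timer" fires every 3 interpreter steps
pending = false       # plays the role of rb_thread_pending
log = []

# two workers, each doing 6 "interpreter steps"
workers = %w[a b].map do |name|
  Fiber.new do
    6.times do |i|
      log << "#{name}#{i}"
      Fiber.yield :step       # one step done, hand control back to the loop
    end
    :done
  end
end

steps = 0
current = 0
until workers.empty?
  result = workers[current].resume
  if result == :done
    workers.delete_at(current)
    current = 0 if current >= workers.size
    next
  end
  steps += 1
  pending = true if steps % TICK_EVERY == 0   # the "signal handler" only sets a flag
  if pending                                  # the "interpreter" notices at a safe point
    pending = false
    current = (current + 1) % workers.size    # switch to the next worker
  end
end

puts log.join(" ")
```

the log comes out as a0 a1 a2 b0 b1 b2 a3 a4 a5 b3 b4 b5: each worker runs until a tick lands, then the other one gets a turn.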

rb_thread_start uses thread_init to keep track of whether it needs to start the timer or not.
rb_thread_start calls rb_thread_start_timer (.. or pthread_create later)

but our code isn’t using threads!

turns out Net::HTTP and Net::SMTP use Timeout, which uses threads. and the first time a thread is spawned, the timer is started.. and it never stops!
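
a quick way to see the hidden thread, as a sketch: count live threads before and inside a Timeout.timeout block. the variable names are mine, and the exact count depends on the ruby version (newer versions may keep one shared helper thread around instead of spawning one per call), so only the "more than one thread" part is the point.

```ruby
require 'timeout'

before = Thread.list.size
inside = nil

Timeout.timeout(1) do
  # the block itself runs in the calling thread;
  # the watchdog that enforces the timeout is a separate thread
  inside = Thread.list.size
end

puts "threads before: #{before}, inside timeout block: #{inside}"
```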

let’s fix it.

remember the thread_init variable from before?

thread_remove() removes the thread from the linked list. if only the main_thread is left, we simply stop the timer, and make sure to set thread_init=0 so the timer is started up again next time a new thread is spawned.
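
the same bookkeeping as a toy ruby model, just to make the start/stop logic concrete. TimerModel and its methods are hypothetical. only the thread_init and thread_remove names echo the notes, and the real patch is C inside the interpreter.

```ruby
# toy model of the timer start/stop logic; not the actual patch.
class TimerModel
  attr_reader :timer_running, :threads

  def initialize
    @threads = [:main]      # the main thread always exists
    @thread_init = false    # has the timer been started yet?
    @timer_running = false
  end

  def thread_start(name)
    unless @thread_init     # first extra thread: start the timer once
      @thread_init = true
      @timer_running = true
    end
    @threads << name
  end

  def thread_remove(name)
    @threads.delete(name)
    if @threads == [:main]  # only main is left: stop the timer, reset the flag
      @timer_running = false
      @thread_init = false
    end
  end
end

m = TimerModel.new
m.thread_start(:t1)
m.thread_start(:t2)
m.thread_remove(:t1)
still_running = m.timer_running   # one extra thread remains, timer stays on
m.thread_remove(:t2)
stopped = !m.timer_running        # back to just main, timer is off
m.thread_start(:t3)
restarted = m.timer_running       # the reset flag lets the timer start again
```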

switch over to JOE. talk about running debian ruby in production

we noticed ruby on debian is pretty slow
we googled debian ruby issues, and it turns out the sigprocmask calls are related to --enable-pthread

using a pthread for timing doesn’t make it slower.. what does?

let’s see what ./configure --enable-pthread actually does. diff’ed generated config.h.

hmm, getcontext/setcontext??

turns out you don’t really need ucontext to use pthreads (maybe on some obscure platforms?)

let’s strace it!

.. 3.5 million sigprocmask are gone! ruby is 30% faster!

switch to aman

two threads
each allocates large stack frame (50kb)

does some computation, then calls Thread.pass to switch to the other thread
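
roughly what that test looks like in plain ruby, as a sketch. assumption: deep recursion (with_deep_stack, a made-up helper) stands in for the 50kb stack frame, since plain ruby can't size a C stack frame directly. the loop count is arbitrary.

```ruby
# hypothetical helper: pile up stack frames before doing the real work
def with_deep_stack(depth, &work)
  return work.call if depth.zero?
  with_deep_stack(depth - 1, &work)
end

counts = [0, 0]   # one slot per thread, so no locking is needed

threads = 2.times.map do |i|
  Thread.new do
    with_deep_stack(50) do
      1_000.times do
        counts[i] += 1    # "some computation"
        Thread.pass       # hand the cpu to the other thread
      end
    end
  end
end

threads.each(&:join)
puts counts.inspect
```

wrap the join in Benchmark.realtime and vary the depth to see how the cost of a switch tracks the stack size on a given ruby.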

really.. memcpy? let’s make sure

ok, it’s calling memcpy. what is it copying?
it’s copying the thread stacks to the heap.
let’s take a step back and talk about the difference between stacks and heaps

func3() has an 8-byte stack frame, twice as big as the other two

the bigger the stack frames, the more it has to memcpy and the longer it takes.

starts out with main() like any C program
calls ruby_run right away to start the ruby vm

int_dotimes in numeric.c, this code calls 5000.times{}
rb_yield is yielding to the block

but, the most common stack frame is rb_eval. 1.8’s vm represents ruby code using nodes, and nodes are evaluated using rb_eval. also notice that rb_eval is recursive.. rails for instance would show many dozens of nested rb_eval

each rb_eval stack frame is almost 1k!
(mention mbari patches)

switch to joe

rb_thread_start allocates a new stack on the heap and sets the stack pointer to it using assembly

then thread_save/restore just call setjmp and longjmp like normal, which takes care of saving and restoring where the stack pointer was pointing!

normally the kernel extends the stack automatically
mmap is an alternative to malloc that gives you a big region of memory

each thread decrements number and then pauses itself. basically tests 50 million thread context switches across 500 threads with 20 ruby method frames in each thread stack
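
a scaled-down sketch of a benchmark in that shape. the constants, the nest helper and the use of Thread.pass as the "pause" are my assumptions, not the script used for the talk. the notes' numbers are 500 threads and 50 million switches.

```ruby
require 'thread'   # Mutex on ruby 1.8; harmless on newer versions

THREADS  = 50      # the notes use 500
FRAMES   = 20      # ruby method frames on each thread's stack
SWITCHES = 5_000   # the notes use 50 million

remaining = SWITCHES
lock = Mutex.new

# hypothetical helper: put `frames` method frames under the block
def nest(frames, &body)
  return body.call if frames.zero?
  nest(frames - 1, &body)
end

threads = THREADS.times.map do
  Thread.new do
    nest(FRAMES) do
      loop do
        done = lock.synchronize do
          remaining -= 1 if remaining > 0   # decrement the shared number
          remaining.zero?
        end
        break if done
        Thread.pass                         # "pause" so another thread can run
      end
    end
  end
end

threads.each(&:join)
puts "remaining: #{remaining}"
```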

Threaded Awesome - Presentation Transcript

  1. Threaded Awesome (that’s an oxymoron) Joe Damato and Aman Gupta
  2. About Joe Damato
      - From NJ, Godfather II is actually my Biography
      - CMU/VMWare alum
      - http://timetobleed.com
      - @joedamato
  3. About Aman Gupta
      - EventMachine, amqp
      - Ruby Hero 2009
      - github.com/tmm1
      - @tmm1
  4. What is a thread? source: wikipedia
  10. What is a thread?
      A thread is just a set of execution state. This state usually includes:
      - instruction & stack pointers
      - scheduling priority
      - other CPU state
  11. Threading Models
      - Green threads (1:N)
      - Native Threads (1:1)
      - Hybrid (M:N)
  23. Green Threads (1:N)
      - “Green” because they are light weight
      - Kernel doesn’t know they exist
      - Implementation is in userland
      Pros
      - Create lots of them cheaply (10,000s)
      - Switch between them cheaply (Ruby doesn’t)
      - Schedule them however you want
      Cons
      - A blocking call in one blocks ALL
      - Kernel doesn’t know about them
      - Can’t take advantage of SMP
  24. Green Threads (1:N) (pics or it didn’t happen)
  25. Ruby 1.8 uses Green Threads (and does it wrong)
  38. Native Threads (1:1)
      - Native Threads
      - Kernel knows they exist
      - Some userland code (libpthread)
      Pros
      - Take advantage of SMP
      - Shared memory
      - Blocking in one thread doesn’t block everyone
      - Don’t have to write a scheduler
      Cons
      - Overhead limits how many you can create
      - Bugs (glibc, more threads = slower creation time)
      - Don’t have fine grained scheduling control
  39. Native Threads (1:1)
  40. Ruby 1.9 uses Native Threads (but.. they don’t execute in parallel)
  51. Hybrid Threads (M:N)
      - Hybrid threads
      - Almost best of both worlds
      Pros
      - Take advantage of SMP
      - Cheap setup and teardown
      - Blocking in one thread doesn’t block everyone
      Cons
      - Need 2 schedulers (userland + kernel)
      - Need to make them actually work together
      - All green threads backed by same native thread can be blocked
  52. Hybrid Threads (M:N)
  53. Erlang uses Hybrid Threads Ruby 1.9, too (with fibers)
  54. Multitasking Types
      - Preemptive Multitasking
      - Cooperative Multitasking
  61. Preemptive Multitasking
      - Outside event (timer) signals end of CPU slice
      - Handle important events quickly
      - Can help ensure everyone gets to execute
      But..
      - Need to build a smart scheduler
      - Can yield non-deterministic execution order
  68. Cooperative Multitasking
      - Threads voluntarily release the CPU
      - Give up the CPU when it is “optimal”
      - Can guarantee deterministic execution order
      - Very simple “scheduler”
      But..
      - Badly written code can hang all threads
  69. So, what is a fiber? In Ruby fibers are green threads with cooperative multitasking.
  70. So what’s the deal with ruby threads?
      - strace
      - google-perftools
      - ltrace
      - gdb
  71. strace
      trace system calls and signals
      - strace -cp <pid>
      - strace -ttTp <pid> -o <file>
  72. strace -cp <pid>
      -c      Count time, calls, and errors for each system call and report a summary on program exit.
      -p pid  Attach to the process with the process ID pid and begin tracing.
      % time     seconds  usecs/call     calls    errors syscall
      ------ ----------- ----------- --------- --------- ----------------
       50.39    0.000064           0      1197       592 read
       34.65    0.000044           0       609           writev
       14.96    0.000019           0      1226           epoll_ctl
        0.00    0.000000           0         4           close
        0.00    0.000000           0         1           select
        0.00    0.000000           0         4           socket
        0.00    0.000000           0         4         4 connect
        0.00    0.000000           0      1057           epoll_wait
      ------ ----------- ----------- --------- --------- ----------------
      100.00    0.000127                  4134       596 total
  73. strace -ttTp <pid> -o <file>
      -t           Prefix each line of the trace with the time of day.
      -tt          If given twice, the time printed will include the microseconds.
      -T           Show the time spent in system calls. This records the time difference between the beginning and the end of each system call.
      -o filename  Write the trace output to the file filename rather than to stderr.
      01:09:11.266949 epoll_wait(9, {{EPOLLIN, {u32=68841296, u64=68841296}}}, 4096, 50) = 1 <0.033109>
      01:09:11.300102 accept(10, {sa_family=AF_INET, sin_port=38313, sin_addr="127.0.0.1"}, [1226]) = 22 <0.000014>
      01:09:11.300190 fcntl(22, F_GETFL) = 0x2 (flags O_RDWR) <0.000007>
      01:09:11.300237 fcntl(22, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000008>
      01:09:11.300277 setsockopt(22, SOL_TCP, TCP_NODELAY, [1], 4) = 0 <0.000008>
      01:09:11.300489 accept(10, 0x7fff5d9c07d0, [1226]) = -1 EAGAIN <0.000014>
      01:09:11.300547 epoll_ctl(9, EPOLL_CTL_ADD, 22, {EPOLLIN, {u32=108750368, u64=108750368}}) = 0 <0.000009>
      01:09:11.300593 epoll_wait(9, {{EPOLLIN, {u32=108750368, u64=108750368}}}, 4096, 50) = 1 <0.000007>
      01:09:11.300633 read(22, "GET / HTTP/1.1r"..., 16384) = 772 <0.000012>
      01:09:11.301727 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000007>
      01:09:11.302095 poll([{fd=5, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) <0.000008>
      01:09:11.302144 write(5, "1000000-0003SELECT * FROM `table`"..., 56) = 56 <0.000023>
      01:09:11.302221 read(5, "25101,20x234m"..., 16384) = 284 <1.300897>
  77. Let’s strace ruby..
  78. Let’s strace ruby.. 15:45:51.658164 --- SIGVTALRM (Virtual timer expired) @ 0 (0) --- 15:45:51.658244 rt_sigreturn(0x1a) = 2207807 <0.000009> 15:45:51.678208 --- SIGVTALRM (Virtual timer expired) @ 0 (0) --- 15:45:51.678271 rt_sigreturn(0x1a) = 0 <0.000009> 15:45:51.698161 --- SIGVTALRM (Virtual timer expired) @ 0 (0) --- 15:45:51.698216 rt_sigreturn(0x1a) = 140734552062624 <0.000009> 15:45:51.718154 --- SIGVTALRM (Virtual timer expired) @ 0 (0) --- 15:45:51.718192 rt_sigreturn(0x1a) = 140734552066688 <0.000009> 15:45:51.738185 --- SIGVTALRM (Virtual timer expired) @ 0 (0) --- 15:45:51.738221 rt_sigreturn(0x1a) = 11333952 <0.000008> 15:45:51.758162 --- SIGVTALRM (Virtual timer expired) @ 0 (0) --- 15:45:51.758216 rt_sigreturn(0x1a) = 0 <0.000009> 15:45:51.778223 --- SIGVTALRM (Virtual timer expired) @ 0 (0) --- 15:45:51.778296 rt_sigreturn(0x1a) = 0 <0.000009> 15:45:51.798170 --- SIGVTALRM (Virtual timer expired) @ 0 (0) --- 15:45:51.798244 rt_sigreturn(0x1a) = 2298980 <0.000009> 15:45:51.818168 --- SIGVTALRM (Virtual timer expired) @ 0 (0) --- 15:45:51.819817 rt_sigreturn(0x1a) = 1 <0.000010> 15:45:51.838196 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
  79. Let’s strace ruby..
      15:45:51.658164 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      15:45:51.658244 rt_sigreturn(0x1a) = 2207807 <0.000009>
      15:45:51.678208 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      15:45:51.678271 rt_sigreturn(0x1a) = 0 <0.000009>
      15:45:51.698161 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      15:45:51.698216 rt_sigreturn(0x1a) = 140734552062624 <0.000009>
      15:45:51.718154 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      15:45:51.718192 rt_sigreturn(0x1a) = 140734552066688 <0.000009>
      15:45:51.738185 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      15:45:51.738221 rt_sigreturn(0x1a) = 11333952 <0.000008>
      15:45:51.758162 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      15:45:51.758216 rt_sigreturn(0x1a) = 0 <0.000009>
      15:45:51.778223 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      15:45:51.778296 rt_sigreturn(0x1a) = 0 <0.000009>
      15:45:51.798170 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      15:45:51.798244 rt_sigreturn(0x1a) = 2298980 <0.000009>
      15:45:51.818168 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      15:45:51.819817 rt_sigreturn(0x1a) = 1 <0.000010>
      15:45:51.838196 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---

      wtf is SIGVTALRM?
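To see how regular these signals are, the timestamps can be diffed. The sketch below does that for the first few lines of the trace above; `signal_times` and `intervals_ms` are hypothetical helper names. In this particular trace the signals land roughly 20ms apart in wall-clock time. ITIMER_VIRTUAL counts only CPU time the process spends in user mode, so the wall-clock spacing does not have to equal the configured timer interval.

```ruby
# Sketch: measure wall-clock spacing between signal deliveries in strace -tt output.
# Helper names are hypothetical.

SIGNAL_TRACE = <<'EOS'
15:45:51.658164 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
15:45:51.658244 rt_sigreturn(0x1a) = 2207807 <0.000009>
15:45:51.678208 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
15:45:51.678271 rt_sigreturn(0x1a) = 0 <0.000009>
15:45:51.698161 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
15:45:51.698216 rt_sigreturn(0x1a) = 140734552062624 <0.000009>
15:45:51.718154 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
EOS

# Seconds-since-midnight for every line that reports delivery of `signal`.
def signal_times(text, signal)
  pattern = /\A(\d+):(\d+):(\d+\.\d+) --- #{Regexp.escape(signal)} /
  text.each_line.map do |line|
    m = pattern.match(line)
    next unless m
    m[1].to_i * 3600 + m[2].to_i * 60 + m[3].to_f
  end.compact
end

# Milliseconds between consecutive deliveries.
def intervals_ms(times)
  times.each_cons(2).map { |a, b| ((b - a) * 1000).round(3) }
end

intervals_ms(signal_times(SIGNAL_TRACE, 'SIGVTALRM')).each do |ms|
  puts "#{ms} ms since previous SIGVTALRM"
end
```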
  80. ruby uses setitimer and signals to schedule green threads*
      The first time a new thread is created, ruby calls:
      setitimer(ITIMER_VIRTUAL, 10ms): tell the kernel to send the process a SIGVTALRM every 10ms
      posix_signal(SIGVTALRM, catch_timer): bind the catch_timer function to the signal
      * when compiled without --enable-pthread
  84. static void
      catch_timer(sig)
          int sig;
      {
          if (!rb_thread_critical) {
              rb_thread_pending = 1;
          }
          /* cause EINTR */
      }

      static VALUE
      rb_thread_start_0(fn, arg, th)
          VALUE (*fn)();
          void *arg;
          rb_thread_t th;
      {
          if (!thread_init) {
              thread_init = 1;
              posix_signal(SIGVTALRM, catch_timer);
              rb_thread_start_timer();
          }
          /* ... */
      }

      void
      rb_thread_start_timer()
      {
          struct itimerval tval;

          if (!thread_init) return;
          tval.it_interval.tv_sec = 0;
          tval.it_interval.tv_usec = 10000;
          tval.it_value = tval.it_interval;
          setitimer(ITIMER_VIRTUAL, &tval, NULL);
      }

      strace -e trace=setitimer ruby threaded.rb
      setitimer(ITIMER_VIRTUAL, {it_interval={0, 10000}, it_value={0, 10000}}, NULL) = 0
      --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
  88. But I’m not using threads!
      begin
        # require 'net/http'
        # Net::HTTP.new(host, port).request(...)          <- uses timeout
        # require 'net/smtp'
        # Net::SMTP.new('localhost').send_message(...)
        require 'timeout'
        Timeout.timeout(0.1) do                            <- uses threads
          1+2*3/4 while true
        end
      rescue Timeout::Error
      end

      500_000_000.times{ |i| i * 2 }

      Thread.new, Timeout.timeout and Net::* all use threads and start the thread timer.
      Once the timer is started, it will interrupt your process every 10ms, even if all threads are killed.
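A quick way to check the "uses threads" claim for yourself is to count threads from inside a Timeout block. This is only a sketch: `threads_during_timeout` is a made-up helper, and current Ruby versions implement Timeout differently from the 1.8 interpreter discussed in these slides, although as far as I know they still rely on a helper thread.

```ruby
require 'timeout'

# Hypothetical helper: number of live threads while a Timeout block runs.
# The 5 second limit is arbitrary; the block returns immediately.
def threads_during_timeout
  Timeout.timeout(5) { Thread.list.size }
end

before = Thread.list.size
during = threads_during_timeout
puts "threads before: #{before}, inside Timeout.timeout: #{during}"
```

If the second number is larger than one, the script is using threads even though it never calls Thread.new itself.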
  90. PATCH: stop the thread timer
      @@ -10518,6 +10520,15 @@ rb_thread_remove(th)
           rb_thread_die(th);
           th->prev->next = th->next;
           th->next->prev = th->prev;
      +
      +    /* if this is the last ruby thread, stop timer signals */
      +    if (th->next == th->prev && th->next == main_thread) {
      +        rb_thread_stop_timer();
      +        thread_init = 0;
      +    }
       }

      strace -e trace=setitimer ruby threaded.rb
      setitimer(ITIMER_VIRTUAL, {it_interval={0, 10000}, it_value={0, 10000}}, NULL) = 0
      --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      setitimer(ITIMER_VIRTUAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
  94. Why are our debian servers so slow?
      strace -c ruby threaded.rb
      % time     seconds  usecs/call     calls    errors syscall
      ------ ----------- ----------- --------- --------- ----------------
      100.00    0.326334           0   3568567           rt_sigprocmask
        0.00    0.000000           0         9           read
        0.00    0.000000           0        10           open
        0.00    0.000000           0        10           close
        0.00    0.000000           0         9           fstat
        0.00    0.000000           0        25           mmap
      ------ ----------- ----------- --------- --------- ----------------
      100.00    0.326334               3568685         0 total

      strace -ttT ruby threaded.rb
      18:42:39.566788 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000006>
      18:42:39.566836 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000006>
      18:42:39.567083 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000006>
      18:42:39.567131 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000006>
      18:42:39.567415 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000006>

      3.5 million sigprocmasks.. wtf?
  97. What is --enable-pthread anyway?
      --- config.h.nopthread
      +++ config.h
      @@ -173,6 +173,12 @@
       #define FILE_READEND _IO_read_end
       #define HAVE__SC_CLK_TCK 1
       #define STACK_GROW_DIRECTION -1
      +#define _REENTRANT 1
      +#define _THREAD_SAFE 1
      +#define HAVE_LIBPTHREAD 1
      +#define HAVE_NANOSLEEP 1
      +#define HAVE_GETCONTEXT 1
      +#define HAVE_SETCONTEXT 1
       #define DEFAULT_KCODE KCODE_NONE
       #define USE_ELF 1
       #define DLEXT_MAXLEN 3

      uses a pthread for timing instead of setitimer()
      useful for compatibility with external libs that use pthreads or signals (like ruby-tk)

      #ifdef _THREAD_SAFE
          pthread_create(&time_thread, 0, thread_timer, 0);
      #else
          rb_thread_start_timer();
      #endif

      but.. it also enables getcontext/setcontext??

      #if defined(HAVE_GETCONTEXT) && defined(HAVE_SETCONTEXT)
      #include <ucontext.h>
      #define USE_CONTEXT
      #endif
  101. ucontext?
      ruby can use either setjmp/longjmp or setcontext/getcontext in its threading implementation and for exception handling
      setjmp/longjmp save and restore the current cpu registers
      setcontext/getcontext are a more advanced version of setjmp/longjmp, but they also call sigprocmask to save/restore the signal mask on every jump
  103. PATCH: --disable-ucontext
      --- a/configure.in
      +++ b/configure.in
      @@ -368,6 +368,10 @@
      +AC_ARG_ENABLE(ucontext,
      +       [  --disable-ucontext      do not use getcontext()/setcontext().],
      +       [disable_ucontext=yes], [disable_ucontext=no])
      +
       AC_ARG_ENABLE(pthread,
              [  --enable-pthread        use pthread library.],
              [enable_pthread=$enableval], [enable_pthread=no])
      @@ -1038,7 +1042,8 @@
      -if test x"$ac_cv_header_ucontext_h" = xyes; then
      +if test x"$ac_cv_header_ucontext_h" = xyes && test x"$disable_ucontext" = xno; then
         if test x"$rb_with_pthread" = xyes; then
           AC_CHECK_FUNCS(getcontext setcontext)
         fi

      ./configure --enable-pthread --disable-ucontext

      % time     seconds  usecs/call     calls    errors syscall
      ------ ----------- ----------- --------- --------- ----------------
         nan    0.000000           0        13           read
         nan    0.000000           0        21        10 open
         nan    0.000000           0        11           close
      ------ ----------- ----------- --------- --------- ----------------
      100.00    0.000000                    45        10 total
  110. EventMachine + threads = slow??
      EventMachine allocates large buffers on the stack to read/write from the network
      Using threads with EM made ruby extremely slow..

      #include "ruby.h"

      VALUE bigstack(VALUE self)
      {
        char buffer[ 50 * 1024 ]; /* large stack frame */
        if (rb_block_given_p()) rb_yield(Qnil);
        return Qnil;
      }

      void Init_cext()
      {
        VALUE CExt = rb_define_module("CExt");
        rb_define_singleton_method(CExt, "bigstack", bigstack, 0);
      }

      require 'cext'
      (1..2).map{
        Thread.new{
          CExt.bigstack{
            100_000.times{
              1*2+3/4
              Thread.pass
            }
          }
        }
      }.map{ |t| t.join }

      ...profile?
  111. google-perftools
      Google’s CPU profiler
      export LD_PRELOAD=libprofiler.so
      export DYLD_INSERT_LIBRARIES=libprofiler.dylib
      CPUPROFILE=/tmp/myprof ./myapp
      pprof ./myapp /tmp/myprof
  117. download:
      wget http://google-perftools.googlecode.com/files/google-perftools-1.3.tar.gz
      tar zxvf google-perftools-1.3.tar.gz
      cd google-perftools-1.3

      compile:
      ./configure --prefix=/opt
      make
      sudo make install

      setup:
      # for linux
      export LD_PRELOAD=/opt/lib/libprofiler.so
      # for osx
      export DYLD_INSERT_LIBRARIES=/opt/lib/libprofiler.dylib

      profile:
      CPUPROFILE=/tmp/ruby.prof ruby -e' 5_000_000.times{ "hello world" } '

      report:
      pprof `which ruby` --text /tmp/ruby.prof
  118. pprof ruby ruby.prof --text
      pprof ruby ruby.prof --gif

      Total: 103 samples
        20  19.4%  19.4%   95  92.2%  rb_yield_0
        11  10.7%  30.1%  103 100.0%  rb_eval
         8   7.8%  37.9%   12  11.7%  gc_sweep
         3   2.9%  68.9%   52  50.5%  rb_str_new3
         3   2.9%  74.8%    3   2.9%  obj_free
         3   2.9%  77.7%  103 100.0%  int_dotimes
         3   2.9%  80.6%   12  11.7%  gc_mark
  124. Profiling EM + threads
      Total: 3763 samples
        2764  73.5%  catch_timer
         989  26.3%  memcpy
           3   0.1%  st_lookup
           2   0.1%  rb_thread_schedule
           1   0.0%  rb_eval
           1   0.0%  rb_newobj
           1   0.0%  rb_gc_force_recycle

      rb_thread_save_context
      rb_thread_restore_context
      memcpy???
      really? memcpy?
  125. ltrace
      trace library calls
      ltrace -cp <pid>
      ltrace -ttTp <pid> -o <file>
  129. ltrace -c ruby cext_test.rb
      % time     seconds  usecs/call     calls      function
      ------ ----------- ----------- --------- --------------------
       48.65   11.741295         617     19009 memcpy        <- really
       30.16    7.279634         831      8751 longjmp
        9.78    2.359889         135     17357 _setjmp
        8.91    2.150565         285      7540 malloc
        1.10    0.265946          20     13021 memset
        0.81    0.195272          19     10105 __ctype_b_loc
        0.35    0.084575          19      4361 strcmp
        0.19    0.046163          19      2377 strlen
        0.03    0.006272          23       265 realloc
      ------ ----------- ----------- --------- --------------------
      100.00   24.134999                 82999 total

      ltrace -ttT -e memcpy ruby cext_test.rb
      01:24:48.769408 --- SIGVTALRM (Virtual timer expired) ---
      01:24:48.769616 memcpy(0x1216000, "", 1086328) = 0x1216000 <0.000578>
      01:24:48.770555 memcpy(0x6e32670, "240&343v", 1086328) = 0x6e32670 <0.000418>
      01:24:49.899414 --- SIGVTALRM (Virtual timer expired) ---
      01:24:49.899490 memcpy(0x1320000, "", 1082584) = 0x1320000 <0.000628>
      01:24:49.900474 memcpy(0x6e32670, "", 1086328) = 0x6e32670 <0.000479>
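Some back-of-the-envelope arithmetic on these numbers shows why the copies hurt. The constants below are copied from the ltrace output above, except for the two marked as assumptions: that every thread switch does one copy out and one copy back, and that a switch happens on every 10ms timer tick. Treat the resulting bandwidth as a rough upper bound under those assumptions, not a measurement.

```ruby
# Numbers copied from the ltrace output.
MEMCPY_BYTES   = 1_086_328   # size argument of the traced memcpy calls
MEMCPY_CALLS   = 19_009
MEMCPY_SECONDS = 11.741295
TOTAL_SECONDS  = 24.134999

# Assumptions (not from the trace): two copies per switch, one switch per tick.
COPIES_PER_SWITCH   = 2
SWITCHES_PER_SECOND = 100

def mib(bytes)
  bytes / (1024.0 * 1024.0)
end

# Fraction of all traced library time spent inside memcpy.
def memcpy_share
  MEMCPY_SECONDS / TOTAL_SECONDS
end

# Average microseconds per memcpy call.
def usecs_per_memcpy
  MEMCPY_SECONDS * 1_000_000 / MEMCPY_CALLS
end

# Bytes moved for a single thread switch under the assumptions above.
def bytes_per_switch
  MEMCPY_BYTES * COPIES_PER_SWITCH
end

puts format('memcpy share of traced time: %.2f%%', memcpy_share * 100)
puts format('average memcpy call: %.0f usecs', usecs_per_memcpy)
puts format('copied per switch: %.2f MiB', mib(bytes_per_switch))
puts format('copied per second at %d switches/s: %.1f MiB',
            SWITCHES_PER_SECOND, mib(bytes_per_switch) * SWITCHES_PER_SECOND)
```

Each traced copy is a little over 1 MiB, so a single switch moves about 2 MiB of stack around before any ruby code gets to run.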
  144. OK, it’s calling memcpy() but what is it copying?
      static void
      rb_thread_save_context(th)
          rb_thread_t th;
      {
          VALUE *pos;
          int len;

          ruby_setjmp(th->context);                                  <- 1. save cpu registers
          len = ruby_stack_length(&pos);
          th->stk_pos = pos;
          th->stk_len = len;
          MEMCPY(th->stk_ptr, th->stk_pos, VALUE, th->stk_len);      <- 2. save stack frames
          th->frame = ruby_frame;                                    <- 3. save vm globals
          th->scope = ruby_scope;
          /* ... */
      }

      static void
      rb_thread_restore_context(th)
          rb_thread_t th;
      {
          ruby_frame = th->frame;                                    <- 4. restore vm globals
          ruby_scope = th->scope;
          /* ... */
          MEMCPY(th->stk_pos, th->stk_ptr, VALUE, th->stk_len);      <- 5. restore stack frames
          ruby_longjmp(th->context);                                 <- 6. restore cpu registers
      }

      it’s copying the stacks to the heap!
  157. Stack vs. Heap func2() 4 bytes char *string = malloc(10); 10 bytes func3(); func1() 4 bytes void *data; func2(); Stack: Heap: Storage for local vars Storage for vars that persist across function Only valid while stack calls. frame is on the stack! Managed by malloc Keeping track of function calls
  158. Stack vs. Heap func3() char buffer[8]; func2() 4 bytes char *string = malloc(10); 10 bytes func3(); func1() 4 bytes void *data; func2(); Stack: Heap: Storage for local vars Storage for vars that persist across function Only valid while stack calls. frame is on the stack! Managed by malloc Keeping track of function calls
  159. Stack vs. Heap func3() 8 bytes char buffer[8]; func2() 4 bytes char *string = malloc(10); 10 bytes func3(); func1() 4 bytes void *data; func2(); Stack: Heap: Storage for local vars Storage for vars that persist across function Only valid while stack calls. frame is on the stack! Managed by malloc Keeping track of function calls
  160. Stack vs. Heap func3() char buffer[8]; func2() 4 bytes char *string = malloc(10); 10 bytes func3(); func1() 4 bytes void *data; func2(); Stack: Heap: Storage for local vars Storage for vars that persist across function Only valid while stack calls. frame is on the stack! Managed by malloc Keeping track of function calls
  161. Stack vs. Heap func2() 4 bytes char *string = malloc(10); 10 bytes func3(); func1() 4 bytes void *data; func2(); Stack: Heap: Storage for local vars Storage for vars that persist across function Only valid while stack calls. frame is on the stack! Managed by malloc Keeping track of function calls
  166. memcpy()ing the thread stacks
       [diagram: During execution, Saving current thread, Restoring next thread]
       So, what’s on these thread stacks?
  167. gdb: the GNU debugger
       gdb <program>
       gdb <program> <pid>
       Be sure to build with: -ggdb -O0
  185. gdb walkthrough

       % gdb ./test-it                                      <- start gdb
       (gdb) b average                                      <- set breakpoint on function named average
       Breakpoint 1 at 0x1f8e: file test-it.c, line 3.
       (gdb) run                                            <- run program
       Starting program: /Users/joe/test-it
       Reading symbols for shared libraries ++. done

       Breakpoint 1, average (x=5, y=6) at test-it.c:3      <- hit breakpoint!
       3         int sum = x + y;
       (gdb) bt                                             <- show backtrace
       #0  average (x=5, y=6) at test-it.c:3                <- function stack
       #1  0x00001fec in main () at test-it.c:12
       (gdb) s                                              <- single step
       4         double avg = sum / 2.0;
       (gdb) s
       5         return avg;
       (gdb) p avg                                          <- print variables
       $1 = 5.5
       (gdb) p sum
       $2 = 11
  195. What’s on the ruby stack?

       (gdb) where
       #0  0x0002a55e in rb_call (klass=1386800, recv=5056455, mid=42, argc=1, argv=0xbfffe5c0, scope=0, self=1403220) at eval.c:6125
       #1  0x000226ef in rb_eval (self=1403220, n=0x1461e4) at eval.c:3493
       #2  0x00026d01 in rb_yield_0 (val=5056455, self=1403220, klass=0, flags=0, avalue=0) at eval.c:5083
       #3  0x000270e8 in rb_yield (val=5056455) at eval.c:5168
       #4  0x0005c30c in int_dotimes (num=1000000001) at numeric.c:2946
       #5  0x00029be3 in call_cfunc (func=0x5c2a0 <int_dotimes>, recv=1000000001, len=0, argc=0, argv=0x0) at eval.c:5759
       #6  0x00028fd4 in rb_call0 (klass=1387580, recv=1000000001, id=5785, oid=5785, argc=0, argv=0x0, body=0x152b24, flags=0) at eval.c:5911
       #7  0x0002a7a7 in rb_call (klass=1387580, recv=1000000001, mid=5785, argc=0, argv=0x0, scope=0, self=1403220) at eval.c:6158
       #8  0x000226ef in rb_eval (self=1403220, n=0x146284) at eval.c:3493
       #9  0x000213e3 in rb_eval (self=1403220, n=0x1461a8) at eval.c:3223
       #10 0x0001ceea in eval_node (self=1403220, node=0x1461a8) at eval.c:1437
       #11 0x0001d60f in ruby_exec_internal () at eval.c:1642
       #12 0x0001d660 in ruby_exec () at eval.c:1662
       #13 0x0001d68e in ruby_run () at eval.c:1672
       #14 0x000023dc in main (argc=2, argv=0xbffff7c4, envp=0xbffff7d0) at main.c:48

       rb_eval recursively executes ruby code in 1.8
  202. How big is the stack?

       #8  rb_eval at eval.c:3493
       (gdb) p $ebp - $esp                        <- base - stack ptr = frame size
       $1 = 968

       Each rb_eval stack frame is almost 1k!

       #0  rb_thread_save_context at eval.c:10597
       (gdb) p (void*)rb_gc_stack_start - $esp
       $1 = 10572

       10.5k stack will be memcpy()’d
       50 method calls * 1k ≈ 50k stack
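The arithmetic on this slide can be sketched in a few lines of Ruby. This is a rough estimate only: it assumes each nested Ruby method call costs one rb_eval frame of 968 bytes (the figure gdb printed) and ignores the other C frames in the backtrace. The constant and method names are made up for illustration.

```ruby
# Back-of-the-envelope estimate of how much stack a 1.8 green thread
# has to copy on a context switch.
# Assumption: one rb_eval frame (968 bytes, from the gdb session) per
# nested Ruby method call; all other C frames are ignored.
RB_EVAL_FRAME_BYTES = 968

def estimated_stack_bytes(nested_calls)
  nested_calls * RB_EVAL_FRAME_BYTES
end

# A switch saves the outgoing thread's stack and restores the incoming
# one, so both stacks get copied.
def bytes_copied_per_switch(outgoing_calls, incoming_calls)
  estimated_stack_bytes(outgoing_calls) + estimated_stack_bytes(incoming_calls)
end

puts estimated_stack_bytes(50)         # 48400, i.e. roughly 50k
puts bytes_copied_per_switch(50, 50)   # 96800
```

Under these assumptions, two threads that are each 50 calls deep move close to 100k of memory on every switch.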
  206. Recap: How do Ruby threads work?

       Each thread has its own execution context:
         saved cpu registers (setjmp/longjmp)
         copy of vm globals (current frame, scope, block)
         stack (memcpy)

       How does Ruby switch between threads?
         Thread executes until time is up (SIGVTALRM)
         rb_thread_save_context() saves state
         rb_thread_schedule() picks the next thread
         rb_thread_restore_context() restores new thread state

       Both the save and the restore memcpy the thread’s stack.
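The save / schedule / restore sequence can be modelled with a toy in plain Ruby. This is not MRI's code: arrays stand in for raw stack memory, the scheduler is simple round-robin, and ToyThread and ToyScheduler are hypothetical names. It only shows why every switch costs two copies.

```ruby
# Toy model of a copying context switch. NOT MRI's implementation:
# arrays stand in for stack memory and scheduling is plain round-robin.
ToyThread = Struct.new(:name, :saved_stack)

class ToyScheduler
  attr_reader :machine_stack, :values_copied

  def initialize(threads)
    @threads = threads
    @current = 0
    # The first thread starts out running on the machine stack.
    @machine_stack = threads.first.saved_stack.dup
    @values_copied = 0
  end

  def current
    @threads[@current]
  end

  # One switch: save the running thread, pick the next one, restore it.
  def switch
    outgoing = current
    outgoing.saved_stack = @machine_stack.dup    # like rb_thread_save_context
    @values_copied += outgoing.saved_stack.size

    @current = (@current + 1) % @threads.size    # like rb_thread_schedule

    incoming = current
    @machine_stack = incoming.saved_stack.dup    # like rb_thread_restore_context
    @values_copied += incoming.saved_stack.size
    incoming
  end
end

a = ToyThread.new("a", [:main, :rb_eval, :rb_call])
b = ToyThread.new("b", [:main, :rb_eval, :rb_call, :rb_eval, :rb_yield])
sched = ToyScheduler.new([a, b])
sched.switch
puts sched.current.name     # b
puts sched.values_copied    # 3 saved + 5 restored = 8
```

The copy count grows with stack depth, which is the cost the rest of the talk sets out to remove.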
  210. But I thought you said...
       The whole point of green threads is that they are fast and cheap (at the loss of SMP).
       That much copying is neither fast nor cheap.
       So how can we fix it?
  214. Stop copying stuff!
       A stack is just a region of memory.
       Why not just point the CPU at a region on the heap?
       Then, on each switch, we just swap registers and do no copying at all!
  215. Tradeoffs!
       Switching will be really, really fast, but...
         Can’t grow stacks anymore
         What happens if/when you fall off the edge?
         How big should they be?
         Heap or mmap area?
  216. If they go on the heap...
       use malloc/free (easier than mmap!)
       malloc overhead, though
       could try to grow stacks with realloc
         need to make sure realloc returns the same address!
       overflow and you’ll corrupt the heap
         won’t know until it’s too late!
  217. If they are mmaped...
       mmap family of functions can be hard to use
       could try to grow stacks with mremap
       put guard pages under stacks to catch overflow
  218. We decided to...
       mmap the stacks
       use guard pages to protect from overflow
       do not grow stacks
       stacks are 1MB, but provide a tuning knob for advanced users
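To make the guard-page idea concrete, here is a small Ruby sketch of the address arithmetic for one fixed-size stack. Nothing is actually mapped. It assumes 4096-byte pages, a stack that grows toward lower addresses (as on x86), and a made-up base address; StackLayout, stack_layout and overflow? are hypothetical helpers, not part of the patch.

```ruby
# Address arithmetic for one fixed-size thread stack with a guard page.
# Assumptions: 4096-byte pages, a downward-growing stack (as on x86),
# and an invented base address. No memory is mapped here.
PAGE_SIZE  = 4096
STACK_SIZE = 1024 * 1024   # the 1MB default from the slide

StackLayout = Struct.new(:guard, :usable, :initial_sp)

def stack_layout(base, size = STACK_SIZE, page = PAGE_SIZE)
  raise ArgumentError, "size must be a multiple of the page size" unless (size % page).zero?
  raise ArgumentError, "need room for a guard page" unless size > page

  guard  = base...(base + page)            # lowest page: would be mapped with no access
  usable = (base + page)...(base + size)   # readable/writable stack memory
  StackLayout.new(guard, usable, base + size)   # stack pointer starts at the top
end

# A stack pointer that lands in the guard page means the thread overflowed.
def overflow?(layout, stack_pointer)
  layout.guard.cover?(stack_pointer)
end

layout = stack_layout(0x1000_0000)
puts layout.usable.size                         # 1044480 usable bytes
puts overflow?(layout, layout.initial_sp - 8)   # false
puts overflow?(layout, 0x1000_0000 + 100)       # true
```

With a real mapping the guard page is inaccessible, so running off the end of the stack faults immediately instead of silently corrupting neighbouring memory.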
  221. 0-copy threading patch

        static void
        rb_thread_save_context(th)
            rb_thread_t th;
        {
            VALUE *pos;
            int len;

            ruby_setjmp(th->context);
            len = ruby_stack_length(&pos);
            th->stk_pos = pos;
            th->stk_len = len;
       -    MEMCPY(th->stk_ptr,
       -           th->stk_pos, VALUE, th->stk_len);
            th->frame = ruby_frame;
            th->scope = ruby_scope;
            /* ... */
        }

        static void
        rb_thread_restore_context(th)
            rb_thread_t th;
        {
            ruby_frame = th->frame;
            ruby_scope = th->scope;
            /* ... */
       -    MEMCPY(th->stk_pos,
       -           th->stk_ptr, VALUE, th->stk_len);
            ruby_longjmp(th->context);
        }

        static VALUE
        rb_thread_start_0(fn, arg, th)
            VALUE (*fn)();
            void *arg;
            rb_thread_t th;
        {
            /* ... */
       +    th->stk_ptr = mmap(NULL, stack_size);      /* allocate stack */
       +    __asm__ ("movl %0, %%esp"                  /* update stack pointer */
       +             :: "r" (th->stk_ptr));
        }
  225. 0-copy thread switches
       [diagram: Main Thread, Thread Switch, Thread Switch]
  227. Benchmark
       Computer Language Benchmark Game: http://shootout.alioth.debian.org/
       Let’s use the thread-ring benchmark.
       Let’s also grow the program stack a bit to illustrate the speed boost.

       def grow_stack n=0, &blk
         unless n > 20
           grow_stack n+1, &blk
         else
           yield
         end
       end
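A quick way to see what grow_stack does is to count how many of its frames are on the stack when the block finally runs. The sketch below uses caller_locations, which is available on current Rubies (not on 1.8), and it only sees Ruby-level frames. The rb_eval / rb_call C frames that 1.8 adds for each call, as in the backtrace shown earlier, are not visible from Ruby.

```ruby
# grow_stack exactly as on the slide.
def grow_stack n=0, &blk
  unless n > 20
    grow_stack n+1, &blk
  else
    yield
  end
end

# Count the grow_stack frames sitting below the block when it runs.
depth = nil
grow_stack do
  depth = caller_locations.count { |loc| loc.base_label == "grow_stack" }
end
puts depth   # 22: one frame for each call with n = 0 through 21
```

So every benchmark thread runs its loop with 22 extra Ruby-level calls on its stack, which makes each copied stack correspondingly larger on unpatched 1.8.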
  234. thread-ring

           number = 50_000_000
           threads = []
           for i in 1..503
             threads << Thread.new(i) do |thr_num|   # create 503 threads
               grow_stack do                         # increase the thread stacks
                 while true
                   Thread.stop                       # pause the thread
                   if number > 0
                     number -= 1                     # when resumed, decrement number
                   else
                     puts thr_num
                     exit 0
                   end
                 end
               end
             end
           end

           # schedule each thread until number == 0
           prev_thread = threads.last
           while true
             for thread in threads
               Thread.pass until prev_thread.stop?
               thread.run
               prev_thread = thread
             end
           end
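
A scaled-down version of the same ring is handy for experimenting. The sketch below is an adaptation, not the benchmark itself: thread_ring is a hypothetical wrapper, it returns the winning thread number instead of calling exit, it drops grow_stack, and it waits for every thread to park before scheduling (an assumption needed on Rubies with native threads, where a new thread may not have reached Thread.stop yet).

```ruby
# Scaled-down thread ring. Returns the number of the thread that sees
# the token reach zero. thread_ring is a made-up name for this sketch.
def thread_ring(ring_size, token)
  winner = nil

  threads = (1..ring_size).map do |thr_num|
    Thread.new do
      loop do
        Thread.stop                # pause until the scheduler loop wakes us
        if token > 0
          token -= 1               # when resumed, decrement the token
        else
          winner = thr_num
          break
        end
      end
    end
  end

  # Wait until every thread has parked itself in Thread.stop.
  Thread.pass until threads.all?(&:stop?)

  # Wake one thread at a time, in ring order, until someone wins.
  threads.cycle do |thread|
    thread.run
    Thread.pass until thread.stop?   # true once it sleeps again or exits
    break if winner
  end

  threads.each(&:kill)
  threads.each(&:join)
  winner
end

winner = thread_ring(5, 12)
puts "winning thread: #{winner}"
```

Each wake-up either decrements the token or, once it is zero, declares a winner, so the winner is thread (token mod ring size) + 1.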
  240. Results (token = 50_000_000)
       Ruby 1.8.6                  7493.50s   ~2.081 hours!
       Ruby 1.8.6 w/ thread fix     799.52s   ~13 mins!
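
The two timings are easier to compare as a ratio. A quick sketch of the arithmetic, using only the numbers from the slide (the variable names are mine):

```ruby
stock_seconds   = 7493.50   # Ruby 1.8.6
patched_seconds = 799.52    # Ruby 1.8.6 w/ thread fix

speedup         = stock_seconds / patched_seconds
stock_hours     = stock_seconds / 3600
patched_minutes = patched_seconds / 60

puts format("stock:   %.2f hours", stock_hours)
puts format("patched: %.1f minutes", patched_minutes)
puts format("speedup: %.2fx", speedup)
```

That works out to a bit over a 9x speedup on this benchmark.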
  246. rb_thread_schedule() sucks
       Thread switching might be fast now, but the scheduler is still pretty bad.
       Loops over each thread 5+ times.
       Complexity theory says constants don’t matter, but...
       What now?

           FOREACH_THREAD_FROM(curr, th) {
               /* */
           }
           END_FOREACH_FROM(curr, th);

       (the slide shows this loop five times in a row)
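
Why the constant matters can be sketched by counting how many thread records a scheduler touches. This is a toy model, not rb_thread_schedule(): visits_per_schedule is a hypothetical helper, and the total assumes one scheduler run per token pass, which is likely an undercount for the real benchmark.

```ruby
# Toy model: each pass walks the whole thread list once, the way one
# FOREACH_THREAD_FROM loop would. Not MRI's actual scheduler.
def visits_per_schedule(thread_count, passes)
  visits = 0
  passes.times do
    thread_count.times { visits += 1 }
  end
  visits
end

thread_count  = 503
schedule_runs = 50_000_000   # assumption: one scheduler run per token pass

five_pass = visits_per_schedule(thread_count, 5)
one_pass  = visits_per_schedule(thread_count, 1)

puts "visits per run, five passes: #{five_pass}"
puts "visits per run, one pass:    #{one_pass}"
puts "extra visits over the benchmark: #{(five_pass - one_pass) * schedule_runs}"
```

Both versions are O(n) per scheduler run, but at this scale the factor of five is tens of billions of extra list visits.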
  249. Rewrite the scheduler. Remove the scheduler!
  255. Fibers!
       We’re backporting the Fibers API to MRI
       Behind the scenes:
         • Create a thread
         • Don’t add to schedule list
         • “Schedule” manually with yield and resume
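
Manual scheduling with yield and resume can be sketched as a fiber version of the ring. This uses the standard Fiber API as it ships with Ruby 1.9 and later, not the backport from the talk, and fiber_ring is a hypothetical name for this example.

```ruby
# Fiber ring: no scheduler involved. The caller decides who runs next
# with resume, and each fiber hands control back with Fiber.yield.
def fiber_ring(ring_size, token)
  winner = nil

  fibers = (1..ring_size).map do |num|
    Fiber.new do
      loop do
        if token > 0
          token -= 1
          Fiber.yield          # give control back to whoever resumed us
        else
          winner = num
          break
        end
      end
    end
  end

  # "Schedule" manually: resume each fiber in ring order.
  fibers.cycle do |fiber|
    fiber.resume
    break if winner
  end

  winner
end

winner = fiber_ring(5, 12)
puts "winning fiber: #{winner}"
```

Nothing here runs unless it is resumed explicitly, which is the point of keeping fibers off the scheduler's list.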
  256. Where can I get all this awesome stuff?
  257. GitHub
       http://github.com/ice799/matzruby
         • heap_stacks branch
         • heap_stacks_186 branch
       http://github.com/tmm1/ruby187
         • fibers branch
  258. Ruby Enterprise Edition
       Based on Ruby 1.8.6
       Thread timer fix is in the current release.
       0-copy threading patch will be in the next release.
       Next release also merges MBARI for smaller rb_eval stack frames.
       http://www.rubyenterpriseedition.com/
  259. Questions?
       @joedamato        @tmm1
       timetobleed.com   github.com/tmm1
       Thanks for listening!

Aman Gupta, 2 months ago

An in-depth look at threads in Ruby 1.8

© All Rights Reserved
