Threaded Awesome
   (that’s an oxymoron)

 Joe Damato and Aman Gupta
About Joe Damato
From NJ, Godfather II is actually my
Biography

CMU/VMWare alum

http://timetobleed.com

@joedamato
About Aman Gupta
EventMachine, amqp

Ruby Hero 2009

github.com/tmm1

@tmm1
What is a thread?




   source: wikipedia
What is a thread?
A thread is just a bundle of execution
state

This state usually includes:

  instruction & stack pointers

  scheduling priority

  other CPU state
Threading Models
  Green threads (1:N)

  Native Threads (1:1)

  Hybrid (M:N)
Green Threads (1:N)
“Green” because they are lightweight
  Kernel doesn’t know they exist
  Implementation is in userland

Pros
  Create lots of them cheaply (10,000s)
  Switch between them cheaply (Ruby doesn’t)
  Schedule them however you want

Cons
  A blocking call in one blocks ALL
  Kernel doesn’t know about them
  Can’t take advantage of SMP
Green Threads (1:N)
  (pics or it didn’t happen)
Ruby 1.8 uses Green Threads
     (and does it wrong)
Native Threads (1:1)
Native Threads
  Kernel knows they exist
  Some userland code (libpthread)

Pros
  Take advantage of SMP
  Shared memory
  Blocking in one thread doesn’t block everyone
  Don’t have to write a scheduler

Cons
  Overhead limits how many you can create
  Bugs (glibc, more threads = slower creation time)
  Don’t have fine grained scheduling control
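
A minimal sketch of two of the pros above (shared memory, and one waiting thread not stalling the others). `sleep` is only a stand-in for a blocking call and the 0.3s figure is an arbitrary choice for illustration; neither comes from the talk.

```ruby
results = Queue.new          # thread-safe queue shared by every thread
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

threads = 2.times.map do |i|
  Thread.new do
    sleep 0.3                # stand-in for a blocking call (I/O, lock wait)
    results << i             # write into memory the main thread can see
  end
end
threads.each(&:join)

elapsed   = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
collected = Array.new(results.size) { results.pop }.sort

puts "collected #{collected.inspect} in #{elapsed.round(2)}s"
```

If the two waits overlap, the elapsed time should sit near one wait rather than the sum of both. CPU-bound work is a different story, as the next slide points out.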
Native Threads (1:1)
Ruby 1.9 uses Native Threads
(but.. they don’t execute in parallel)
Hybrid Threads (M:N)
Hybrid threads
  Almost best of both worlds

Pros
  Take advantage of SMP
  Cheap setup and teardown
  Blocking in one thread doesn’t block everyone

Cons
  Need 2 schedulers (userland + kernel)
  Need to make them actually work together
  All green threads backed by same native thread
  can be blocked
Hybrid Threads (M:N)
Erlang uses Hybrid Threads
   Ruby 1.9, too (with fibers)
Multitasking Types

  Preemptive Multitasking

  Cooperative Multitasking
Preemptive Multitasking
Outside event (timer) signals end of CPU slice
  Handle important events quickly
  Can help ensure everyone gets to execute

But..
  Need to build a smart scheduler
  Can yield non-deterministic execution order
Cooperative Multitasking
Threads voluntarily release the CPU
 Give up the CPU when it is “optimal”
 Can guarantee deterministic execution order
 Very simple “scheduler”

But..
 Badly written code can hang all threads
So, what is a fiber?

In Ruby fibers are green threads
 with cooperative multitasking.
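
A small sketch of what that looks like in practice, assuming a Ruby with `Fiber` (1.9 or later). The worker names and the round-robin loop are made up for illustration; they are not from the talk.

```ruby
require "fiber"              # needed for Fiber#alive? on older Rubies

log = []

# Each fiber runs until it calls Fiber.yield, then hands control back.
workers = %w[a b].map do |name|
  Fiber.new do
    3.times do |step|
      log << "#{name}#{step}"
      Fiber.yield            # voluntarily release the CPU
    end
  end
end

# A very simple round-robin "scheduler": resume each live fiber in turn.
until workers.empty?
  workers.each(&:resume)
  workers.select!(&:alive?)
end

p log
```

Because nothing switches until a fiber yields, the interleaving is the same on every run. The flip side is the "But.." from the cooperative slide: a fiber that never yields would starve the rest.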
So what’s the deal
with ruby threads?
     strace

     google-perftools

     ltrace

     gdb
strace
trace system calls and signals


       strace -cp <pid>

 strace -ttTp <pid> -o <file>
strace -cp <pid>
-c
Count time, calls, and errors for each system call and report a
summary on program exit.

-p pid
Attach to the process with the process ID pid and begin tracing.




        % time     seconds usecs/call      calls    errors syscall
        ------ ----------- ----------- --------- --------- ----------------
         50.39    0.000064           0      1197       592 read
         34.65    0.000044           0       609           writev
         14.96    0.000019           0      1226           epoll_ctl
          0.00    0.000000           0         4           close
          0.00    0.000000           0         1           select
          0.00    0.000000           0         4           socket
          0.00    0.000000           0         4         4 connect
          0.00    0.000000           0      1057           epoll_wait
        ------ ----------- ----------- --------- --------- ----------------
        100.00    0.000127                  4134       596 total
strace -ttTp <pid> -o <file>
-t
Prefix each line of the trace with the time of day.

-tt
If given twice, the time printed will include the microseconds.

-T
Show the time spent in system calls. This records the time
difference between the beginning and the end of each system call.

-o filename
Write the trace output to the file filename rather than to stderr.

01:09:11.266949   epoll_wait(9, {{EPOLLIN, {u32=68841296, u64=68841296}}}, 4096, 50) = 1 <0.033109>
01:09:11.300102   accept(10, {sa_family=AF_INET, sin_port=38313, sin_addr="127.0.0.1"}, [1226]) = 22 <0.000014>
01:09:11.300190   fcntl(22, F_GETFL) = 0x2 (flags O_RDWR) <0.000007>
01:09:11.300237   fcntl(22, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000008>
01:09:11.300277   setsockopt(22, SOL_TCP, TCP_NODELAY, [1], 4) = 0 <0.000008>
01:09:11.300489   accept(10, 0x7fff5d9c07d0, [1226]) = -1 EAGAIN <0.000014>
01:09:11.300547   epoll_ctl(9, EPOLL_CTL_ADD, 22, {EPOLLIN, {u32=108750368, u64=108750368}}) = 0 <0.000009>
01:09:11.300593   epoll_wait(9, {{EPOLLIN, {u32=108750368, u64=108750368}}}, 4096, 50) = 1 <0.000007>
01:09:11.300633   read(22, "GET / HTTP/1.1r"..., 16384) = 772 <0.000012>
01:09:11.301727   rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000007>
01:09:11.302095   poll([{fd=5, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) <0.000008>
01:09:11.302144   write(5, "1000000-0003SELECT * FROM `table`"..., 56) = 56 <0.000023>
01:09:11.302221   read(5, "25101,20x234m"..., 16384) = 284 <1.300897>
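
One way to find the slow call in a trace like this, sketched in Ruby. The sample lines are copied from the output above; the regular expression and variable names are assumptions for illustration, not part of strace.

```ruby
trace = <<~'TRACE'
  01:09:11.300633   read(22, "GET / HTTP/1.1r"..., 16384) = 772 <0.000012>
  01:09:11.301727   rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000007>
  01:09:11.302095   poll([{fd=5, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) <0.000008>
  01:09:11.302144   write(5, "1000000-0003SELECT * FROM `table`"..., 56) = 56 <0.000023>
  01:09:11.302221   read(5, "25101,20x234m"..., 16384) = 284 <1.300897>
TRACE

# Timestamp (-tt), syscall name, then the -T duration in angle brackets.
line_re = /\A(\S+)\s+(\w+)\(.*<(\d+\.\d+)>\s*\z/

calls = trace.each_line.map do |line|
  m = line_re.match(line.chomp)
  { at: m[1], name: m[2], seconds: m[3].to_f } if m
end.compact

slowest = calls.max_by { |c| c[:seconds] }
puts format("%s at %s took %.6fs", slowest[:name], slowest[:at], slowest[:seconds])
```

On these lines the outlier is the last `read` on fd 5, which sat waiting for 1.3 seconds after the query was written.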
Let’s strace ruby..
  15:45:51.658164   --- SIGVTALRM (Virtual   timer expired) @ 0 (0) ---
  15:45:51.658244   rt_sigreturn(0x1a)        = 2207807 <0.000009>
  15:45:51.678208   --- SIGVTALRM (Virtual   timer expired) @ 0 (0) ---
  15:45:51.678271   rt_sigreturn(0x1a)        = 0 <0.000009>
  15:45:51.698161   --- SIGVTALRM (Virtual   timer expired) @ 0 (0) ---
  15:45:51.698216   rt_sigreturn(0x1a)        = 140734552062624 <0.000009>
  15:45:51.718154   --- SIGVTALRM (Virtual   timer expired) @ 0 (0) ---
  15:45:51.718192   rt_sigreturn(0x1a)        = 140734552066688 <0.000009>
  15:45:51.738185   --- SIGVTALRM (Virtual   timer expired) @ 0 (0) ---
  15:45:51.738221   rt_sigreturn(0x1a)        = 11333952 <0.000008>
  15:45:51.758162   --- SIGVTALRM (Virtual   timer expired) @ 0 (0) ---
  15:45:51.758216   rt_sigreturn(0x1a)        = 0 <0.000009>
  15:45:51.778223   --- SIGVTALRM (Virtual   timer expired) @ 0 (0) ---
  15:45:51.778296   rt_sigreturn(0x1a)        = 0 <0.000009>
  15:45:51.798170   --- SIGVTALRM (Virtual   timer expired) @ 0 (0) ---
  15:45:51.798244   rt_sigreturn(0x1a)        = 2298980 <0.000009>
  15:45:51.818168   --- SIGVTALRM (Virtual   timer expired) @ 0 (0) ---
  15:45:51.819817   rt_sigreturn(0x1a)        = 1 <0.000010>
  15:45:51.838196   --- SIGVTALRM (Virtual   timer expired) @ 0 (0) ---



                                    wtf is SIGVTALRM?
ruby uses setitimer and signals
   to schedule green threads*

 The first time a new thread is created, ruby
 calls:

   setitimer(ITIMER_VIRTUAL, 10ms): tell the
   kernel to send the process a SIGVTALRM
   every 10ms

   posix_signal(SIGVTALRM, catch_timer): bind
   the catch_timer function to the signal



                       * when compiled without --enable-pthread
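
A rough simulation of the idea in plain Ruby: something outside the running code sets a "pending" flag on a timer, and the running code checks the flag at safe points and switches. This is only a sketch. A helper thread stands in for the kernel timer and SIGVTALRM, fibers stand in for green threads, and every name below is hypothetical.

```ruby
require "fiber"

pending = false              # plays the role of a "switch requested" flag
counts  = Hash.new(0)

# Stand-in for the timer signal: tries to set the flag about every 10ms
# (actual timing depends on the interpreter's own thread scheduling).
timer = Thread.new do
  loop do
    sleep 0.01
    pending = true
  end
end

tasks = %i[a b].map do |name|
  Fiber.new do
    loop do
      counts[name] += 1      # "run some Ruby code"
      if pending             # check the flag at a safe point
        pending = false
        Fiber.yield          # hand control back so another task can run
      end
    end
  end
end

# Keep resuming tasks in turn for about half a second, then stop.
deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 0.5
tasks.cycle do |task|
  break if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
  task.resume
end
timer.kill

puts counts.inspect
```

Neither task ever decides to stop on its own; each one gets switched out only because the flag was set from outside.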
static void
catch_timer(sig)
    int sig;
{
    if (!rb_thread_critical) {
        rb_thread_pending = 1;
    }
    /* cause EINTR */
}

void
rb_thread_start_timer()
{
    struct itimerval tval;

    if (!thread_init) return;
    tval.it_interval.tv_sec = 0;
    tval.it_interval.tv_usec = 10000;
    tval.it_value = tval.it_interval;
    setitimer(ITIMER_VIRTUAL, &tval, NULL);
}

static VALUE
rb_thread_start_0(fn, arg, th)
    VALUE (*fn)();
    void *arg;
    rb_thread_t th;
{
    if (!thread_init) {
        thread_init = 1;
        posix_signal(SIGVTALRM, catch_timer);
        rb_thread_start_timer();
    }

    /* ... */
}


    strace -e trace=setitimer ruby threaded.rb
    setitimer(ITIMER_VIRTUAL, {it_interval={0, 10000}, it_value={0, 10000}}, NULL) = 0
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
But I’m not using threads!
begin
  # require 'net/http'
  # Net::HTTP.new(host, port).request(...)          # uses timeout

  # require 'net/smtp'
  # Net::SMTP.new('localhost').send_message(...)

  require 'timeout'
  Timeout.timeout(0.1) do                           # uses threads
    1+2*3/4 while true
  end
rescue Timeout::Error
end

500_000_000.times{ |i| i * 2 }




      Thread.new, Timeout.timeout and Net::* all use threads
      and start the thread timer

      Once the timer is started, it will interrupt your
      process every 10ms, even if all threads are killed
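
A quick way to see the hidden thread for yourself. This is a sketch and the variable names are made up. On a current Ruby the internals of `timeout` differ from 1.8, so treat the counts it prints as something to observe rather than a fixed result.

```ruby
require "timeout"

before = Thread.list.size
during = nil

begin
  Timeout.timeout(0.1) do
    during = Thread.list.size   # sampled while the timeout is armed
    sleep                       # block until the timeout fires
  end
rescue Timeout::Error
  puts "timed out"
end

puts "threads before: #{before}, during: #{during}"
```

No `Thread.new` appears anywhere in the script, yet more than one thread exists while the block runs.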
PATCH: stop the thread timer
@@ -10518,6 +10520,15 @@ rb_thread_remove(th)
     rb_thread_die(th);
     th->prev->next = th->next;
     th->next->prev = th->prev;
+
+    /* if this is the last ruby thread, stop timer signals */
+    if (th->next == th->prev && th->next == main_thread) {
+       rb_thread_stop_timer();
+       thread_init = 0;
+    }
 }

strace -e trace=setitimer ruby threaded.rb
setitimer(ITIMER_VIRTUAL, {it_interval={0, 10000}, it_value={0, 10000}}, NULL) = 0
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
setitimer(ITIMER_VIRTUAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
Why are our Debian servers so slow?
            strace -c ruby threaded.rb
     % time     seconds usecs/call      calls    errors syscall
     ------ ----------- ----------- --------- --------- ----------------
     100.00    0.326334           0   3568567           rt_sigprocmask
       0.00    0.000000           0         9           read
       0.00    0.000000           0        10           open
       0.00    0.000000           0        10           close
       0.00    0.000000           0         9           fstat
       0.00    0.000000           0        25           mmap
     ------ ----------- ----------- --------- --------- ----------------
     100.00    0.326334               3568685         0 total



          strace -ttT ruby threaded.rb
   18:42:39.566788   rt_sigprocmask(SIG_BLOCK, NULL,   [], 8) =   0   <0.000006>
   18:42:39.566836   rt_sigprocmask(SIG_BLOCK, NULL,   [], 8) =   0   <0.000006>
   18:42:39.567083   rt_sigprocmask(SIG_SETMASK, [],   NULL, 8)   =   0 <0.000006>
   18:42:39.567131   rt_sigprocmask(SIG_BLOCK, NULL,   [], 8) =   0   <0.000006>
   18:42:39.567415   rt_sigprocmask(SIG_BLOCK, NULL,   [], 8) =   0   <0.000006>



                       3.5 million sigprocmasks.. wtf?
What is --enable-pthread anyway?
 uses a pthread for timing instead of setitimer()

 useful for compatibility with external libs that use pthreads or
 signals (like ruby-tk)

 but.. it also enables getcontext/setcontext??

    --- config.h.nopthread
    +++ config.h
    @@ -173,6 +173,12 @@
     #define FILE_READEND _IO_read_end
     #define HAVE__SC_CLK_TCK 1
     #define STACK_GROW_DIRECTION -1
    +#define _REENTRANT 1
    +#define _THREAD_SAFE 1
    +#define HAVE_LIBPTHREAD 1
    +#define HAVE_NANOSLEEP 1
    +#define HAVE_GETCONTEXT 1
    +#define HAVE_SETCONTEXT 1
     #define DEFAULT_KCODE KCODE_NONE
     #define USE_ELF 1
     #define DLEXT_MAXLEN 3

    #ifdef _THREAD_SAFE
      pthread_create(&time_thread, 0,
                     thread_timer, 0);
    #else
      rb_thread_start_timer();
    #endif

    #if defined(HAVE_GETCONTEXT) && defined(HAVE_SETCONTEXT)
    #include <ucontext.h>
    #define USE_CONTEXT
    #endif
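The two timing strategies named on this slide can be sketched side by side. This is a minimal illustration, not Ruby's actual code: `tick_pending`, `TICK_USEC`, and every function name below are hypothetical, and the 10ms interval is an assumed value. Both variants only set a flag that a scheduler could poll.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

#define TICK_USEC 10000                 /* 10ms; illustrative value */

static atomic_int tick_pending;         /* set by either timer, polled elsewhere */
static atomic_int timer_stop;           /* tells the timer thread to exit */

/* Variant 1: setitimer() asks the kernel to deliver SIGVTALRM periodically.
 * The handler only stores to a lock-free atomic flag. */
static void on_vtalrm(int sig)
{
    (void)sig;
    atomic_store(&tick_pending, 1);
}

int start_signal_timer(void)
{
    struct sigaction sa;
    struct itimerval tv;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_vtalrm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGVTALRM, &sa, NULL) != 0) return -1;

    tv.it_interval.tv_sec = 0;
    tv.it_interval.tv_usec = TICK_USEC;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_VIRTUAL, &tv, NULL);
}

/* Variant 2: a dedicated pthread sleeps and sets the same flag.
 * No signal is delivered to the process. */
static void *timer_thread_main(void *arg)
{
    struct timespec req = { 0, TICK_USEC * 1000L };
    (void)arg;
    while (!atomic_load(&timer_stop)) {
        nanosleep(&req, NULL);
        atomic_store(&tick_pending, 1);
    }
    return NULL;
}

int start_thread_timer(pthread_t *thread)
{
    atomic_store(&timer_stop, 0);
    return pthread_create(thread, NULL, timer_thread_main, NULL);
}

void stop_thread_timer(pthread_t thread)
{
    atomic_store(&timer_stop, 1);
    pthread_join(thread, NULL);
}
```

The thread variant leaves SIGVTALRM alone, which is one plausible reason it coexists better with external libraries that install their own signal handlers.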
ucontext?
ruby can use either setjmp/longjmp or
setcontext/getcontext in its
threading implementation and for
exception handling

setjmp/longjmp save and restore the
current cpu registers

setcontext/getcontext are an advanced
version of setjmp/longjmp, but they
also call sigprocmask to save/restore
the signal mask before each jump
PATCH: --disable-ucontext
--- a/configure.in
+++ b/configure.in
@@ -368,6 +368,10 @@
+AC_ARG_ENABLE(ucontext,
+       [ --disable-ucontext       do not use getcontext()/setcontext().],
+       [disable_ucontext=yes], [disable_ucontext=no])
+

 AC_ARG_ENABLE(pthread,
        [ --enable-pthread         use pthread library.],
        [enable_pthread=$enableval], [enable_pthread=no])
@@ -1038,7 +1042,8 @@
-if test x"$ac_cv_header_ucontext_h" = xyes; then
+if test x"$ac_cv_header_ucontext_h" = xyes && test x"$disable_ucontext" = xno; then
     if test x"$rb_with_pthread" = xyes; then
        AC_CHECK_FUNCS(getcontext setcontext)
     fi



        ./configure --enable-pthread --disable-ucontext

     % time     seconds usecs/call      calls    errors syscall
     ------ ----------- ----------- --------- --------- ----------------
        nan    0.000000           0        13           read
        nan    0.000000           0        21        10 open
        nan    0.000000           0        11           close
     ------ ----------- ----------- --------- --------- ----------------
     100.00    0.000000                    45        10 total
EventMachine + threads = slow??
  EventMachine allocates large buffers on the
  stack to read/write from the network

  Using threads with EM made ruby extremely
  slow..

  C extension:

    #include "ruby.h"

    VALUE bigstack(VALUE self)
    {
      char buffer[ 50 * 1024 ]; /* large stack frame */
      if (rb_block_given_p()) rb_yield(Qnil);
      return Qnil;
    }

    void Init_cext()
    {
      VALUE CExt = rb_define_module("CExt");
      rb_define_singleton_method(CExt, "bigstack", bigstack, 0);
    }

  Ruby:

    require 'cext'

    (1..2).map{
      Thread.new{
        CExt.bigstack{
          100_000.times{
            1*2+3/4
            Thread.pass
          }
        }
      }
    }.map{ |t| t.join }

                                                            ...profile?
google-perftools
       Google’s CPU profiler


       export LD_PRELOAD=libprofiler.so

export DYLD_INSERT_LIBRARIES=libprofiler.dylib


       CPUPROFILE=/tmp/myprof ./myapp


          pprof ./myapp /tmp/myprof
# download
wget http://google-perftools.googlecode.com/files/google-perftools-1.3.tar.gz
tar zxvf google-perftools-1.3.tar.gz
cd google-perftools-1.3

# compile
./configure --prefix=/opt
make
sudo make install

# setup
# for linux
export LD_PRELOAD=/opt/lib/libprofiler.so

# for osx
export DYLD_INSERT_LIBRARIES=/opt/lib/libprofiler.dylib

# profile
CPUPROFILE=/tmp/ruby.prof ruby -e'
  5_000_000.times{ "hello world" }
'

# report
pprof `which ruby` --text /tmp/ruby.prof
pprof ruby ruby.prof --text

Total: 103 samples
  20  19.4%  19.4%    95  92.2% rb_yield_0
  11  10.7%  30.1%   103 100.0% rb_eval
   8   7.8%  37.9%    12  11.7% gc_sweep
   3   2.9%  68.9%    52  50.5% rb_str_new3
   3   2.9%  74.8%     3   2.9% obj_free
   3   2.9%  77.7%   103 100.0% int_dotimes
   3   2.9%  80.6%    12  11.7% gc_mark

pprof ruby ruby.prof --gif     (same profile rendered as a call graph image)
Profiling EM + threads
           Total: 3763 samples
            2764 73.5% catch_timer
             989 26.3% memcpy
               3   0.1% st_lookup
               2   0.1% rb_thread_schedule
               1   0.0% rb_eval
               1   0.0% rb_newobj
               1   0.0% rb_gc_force_recycle



                rb_thread_save_context

                rb_thread_restore_context

                memcpy???



                 really? memcpy?
ltrace
    trace library calls


      ltrace -cp <pid>

ltrace -ttTp <pid> -o <file>
ltrace -c ruby cext_test.rb
         % time     seconds usecs/call      calls       function
         ------ ----------- ----------- --------- --------------------
          48.65   11.741295         617     19009 memcpy             really
          30.16    7.279634         831      8751 longjmp
           9.78    2.359889         135     17357 _setjmp
           8.91    2.150565         285      7540 malloc
           1.10    0.265946          20     13021 memset
           0.81    0.195272          19     10105 __ctype_b_loc
           0.35    0.084575          19      4361 strcmp
           0.19    0.046163          19      2377 strlen
           0.03    0.006272          23       265 realloc
         ------ ----------- ----------- --------- --------------------
         100.00   24.134999                 82999 total


    ltrace -ttT -e memcpy ruby cext_test.rb
01:24:48.769408 --- SIGVTALRM (Virtual timer expired) ---
01:24:48.769616 memcpy(0x1216000, "", 1086328)   = 0x1216000 <0.000578>
01:24:48.770555 memcpy(0x6e32670, "240&343v", 1086328) = 0x6e32670 <0.000418>

01:24:49.899414 --- SIGVTALRM (Virtual timer expired) ---
01:24:49.899490 memcpy(0x1320000, "", 1082584)   = 0x1320000 <0.000628>
01:24:49.900474 memcpy(0x6e32670, "", 1086328) = 0x6e32670 <0.000479>
OK, it’s calling memcpy()
                 but what is it copying?

static void
rb_thread_save_context(th)
    rb_thread_t th;
{
    VALUE *pos;
    int len;

    ruby_setjmp(th->context);                    /* 1. save cpu registers */

    len = ruby_stack_length(&pos);               /* 2. save stack frames */
    th->stk_pos = pos;
    th->stk_len = len;
    MEMCPY(th->stk_ptr,
           th->stk_pos, VALUE, th->stk_len);

    th->frame = ruby_frame;                      /* 3. save vm globals */
    th->scope = ruby_scope;

    /* ... */
}

static void
rb_thread_restore_context(th)
    rb_thread_t th;
{
    ruby_frame = th->frame;                      /* 4. restore vm globals */
    ruby_scope = th->scope;

    /* ... */

    MEMCPY(th->stk_pos,                          /* 5. restore stack frames */
           th->stk_ptr, VALUE, th->stk_len);
    ruby_longjmp(th->context);                   /* 6. restore cpu registers */
}

    it’s copying the stacks to the heap!
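To make the cost concrete, here is a toy model of the save/restore pair. It is a sketch under assumptions, not Ruby's implementation: `toy_thread`, `toy_save_context`, and `toy_restore_context` are hypothetical names, `VALUE` is a stand-in typedef, and the "stack" is whatever memory region the caller passes in. The point is that every switch copies the whole live region out to the heap and later copies it back.

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned long VALUE;   /* stand-in for Ruby's machine-word VALUE */

typedef struct {
    VALUE  *stk_ptr;   /* heap copy of the stack region */
    VALUE  *stk_pos;   /* where the live region starts */
    size_t  stk_len;   /* region length in VALUE words */
} toy_thread;

/* Copy the live region [pos, pos + len) to the heap.
 * Returns the number of bytes copied, or 0 if len is 0 or allocation fails. */
size_t toy_save_context(toy_thread *th, VALUE *pos, size_t len)
{
    VALUE *buf;

    if (len == 0) return 0;
    buf = realloc(th->stk_ptr, len * sizeof(VALUE));
    if (buf == NULL) return 0;

    th->stk_ptr = buf;
    th->stk_pos = pos;
    th->stk_len = len;
    memcpy(th->stk_ptr, th->stk_pos, len * sizeof(VALUE));
    return len * sizeof(VALUE);
}

/* Copy the saved region back over the live one. Returns bytes copied. */
size_t toy_restore_context(const toy_thread *th)
{
    if (th->stk_len == 0) return 0;
    memcpy(th->stk_pos, th->stk_ptr, th->stk_len * sizeof(VALUE));
    return th->stk_len * sizeof(VALUE);
}
```

In this model the work per switch grows linearly with stack depth, so a 50 KB buffer sitting in a frame adds at least 50 KB to every save and every restore. That matches the ltrace output above, where each memcpy moves about 1 MB (1086328 bytes).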
Stack vs. Heap


                     func2()
 4 bytes                  char *string = malloc(10);
                          func3();

                     func1()
 4 bytes                  void *data;
                          func2();


Stack:                                       Heap:

   Storage for local vars                      Storage for vars that
                                               persist across function
         Only valid while stack                calls.
         frame is on the stack!
                                               Managed by malloc
   Keeping track of function calls
Stack vs. Heap


                     func2()
 4 bytes                  char *string = malloc(10);       10 bytes
                          func3();

                     func1()
 4 bytes                  void *data;
                          func2();


Stack:                                       Heap:

   Storage for local vars                      Storage for vars that
                                               persist across function
         Only valid while stack                calls.
         frame is on the stack!
                                               Managed by malloc
   Keeping track of function calls
Stack vs. Heap
                     func3()
                         char buffer[8];




                     func2()
 4 bytes                  char *string = malloc(10);       10 bytes
                          func3();

                     func1()
 4 bytes                  void *data;
                          func2();


Stack:                                       Heap:

   Storage for local vars                      Storage for vars that
                                               persist across function
         Only valid while stack                calls.
         frame is on the stack!
                                               Managed by malloc
   Keeping track of function calls
Stack vs. Heap

Stack:                                                     Heap:

                     func3()
 8 bytes                  char buffer[8];

                     func2()
 4 bytes                  char *string = malloc(10);       10 bytes
                          func3();

                     func1()
 4 bytes                  void *data;
                          func2();
Stack vs. Heap

After func3() returns, its frame (and buffer) is gone:

Stack:                                                     Heap:

                     func2()
 4 bytes                  char *string = malloc(10);       10 bytes
                          func3();

                     func1()
 4 bytes                  void *data;
                          func2();
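
The stack/heap distinction above can be approximated in C. This is a minimal sketch, not code from the talk: the copy_to_heap helper and its truncation behavior are assumptions made for illustration. It shows a malloc()'d allocation outliving the stack frame that created it, while a local buffer does not.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies a string into heap memory obtained from
 * malloc(). The returned pointer stays valid after this function
 * returns; the caller must free() it. */
char *copy_to_heap(const char *src) {
    char buffer[8];                    /* stack: gone when we return */
    size_t len = strlen(src);
    if (len >= sizeof buffer) {
        len = sizeof buffer - 1;       /* truncate to fit the local buffer */
    }
    memcpy(buffer, src, len);
    buffer[len] = '\0';

    char *string = malloc(len + 1);    /* heap: survives the return */
    if (string == NULL) {
        return NULL;
    }
    memcpy(string, buffer, len + 1);   /* copy including the terminator */
    return string;                     /* returning `buffer` would be a bug */
}
```

Returning a pointer to buffer instead would hand the caller memory that is only valid while this function's frame is on the stack.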
memcpy()ing the thread stacks

[Figure: three stack diagrams labeled "During execution", "Saving current thread", and "Restoring next thread"]



    so, what’s on these thread stacks?
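
The save and restore steps can be sketched in C. This is a simplified model under stated assumptions, not Ruby's eval.c: the toy_thread struct and the toy_save/toy_restore functions are hypothetical, and a plain byte array stands in for the machine stack so the example stays well-defined. Only the field names stk_ptr and stk_len echo the code on the earlier slide.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a green thread's saved state. */
typedef struct {
    unsigned char *stk_ptr;  /* heap copy of the thread's stack */
    size_t stk_len;          /* number of bytes saved */
} toy_thread;

/* Save: copy the live part of the shared stack onto the heap.
 * Returns 0 on success, -1 if the allocation fails. */
int toy_save(toy_thread *th, const unsigned char *stack, size_t used) {
    unsigned char *copy = realloc(th->stk_ptr, used ? used : 1);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, stack, used);
    th->stk_ptr = copy;
    th->stk_len = used;
    return 0;
}

/* Restore: copy the saved bytes back over the shared stack.
 * Returns the number of bytes now in use. */
size_t toy_restore(const toy_thread *th, unsigned char *stack) {
    if (th->stk_len > 0) {
        memcpy(stack, th->stk_ptr, th->stk_len);
    }
    return th->stk_len;
}
```

Every switch between two threads pays for one toy_save and one toy_restore, so the cost grows with how deep each thread's stack is.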
gdb
 the GNU debugger

   gdb <program>
   gdb <program> <pid>

   Be sure to build with:
       -ggdb
       -O0
gdb walkthrough
% gdb ./test-it                                    <-- start gdb
(gdb) b average                                    <-- set breakpoint on function named average
Breakpoint 1 at 0x1f8e: file test-it.c, line 3.
(gdb) run                                          <-- run program
Starting program: /Users/joe/test-it
Reading symbols for shared libraries ++. done
Breakpoint 1, average (x=5, y=6) at test-it.c:3    <-- hit breakpoint!
3	    int sum = x + y;
(gdb) bt                                           <-- show backtrace
#0 average (x=5, y=6) at test-it.c:3               <-- function stack
#1 0x00001fec in main () at test-it.c:12
(gdb) s                                            <-- single step
4	    double avg = sum / 2.0;
(gdb) s
5	    return avg;
(gdb) p avg                                        <-- print variables
$1 = 5.5
(gdb) p sum
$2 = 11
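
The slides do not show test-it.c itself. A minimal sketch consistent with the session might look like the following; only the three statements on lines 3 to 5 come from the gdb output, while the return type and the opening comment are assumptions.

```c
/* test-it.c (reconstructed sketch; only lines 3-5 appear in the session) */
double average(int x, int y) {
    int sum = x + y;            /* test-it.c:3, where the breakpoint stops */
    double avg = sum / 2.0;     /* test-it.c:4, 2.0 forces floating-point division */
    return avg;                 /* test-it.c:5 */
}
```

A main() that calls average(5, 6) on line 12, as the backtrace indicates, would complete the program; with those arguments sum is 11 and avg is 5.5, matching the printed values.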
What’s on the ruby stack?
(gdb) where
#0  0x0002a55e in rb_call (klass=1386800, recv=5056455, mid=42, argc=1, argv=0xbfffe5c0,
    scope=0, self=1403220) at eval.c:6125
#1  0x000226ef in rb_eval (self=1403220, n=0x1461e4) at eval.c:3493
#2  0x00026d01 in rb_yield_0 (val=5056455, self=1403220, klass=0, flags=0, avalue=0)
    at eval.c:5083
#3  0x000270e8 in rb_yield (val=5056455) at eval.c:5168
#4  0x0005c30c in int_dotimes (num=1000000001) at numeric.c:2946
#5  0x00029be3 in call_cfunc (func=0x5c2a0 <int_dotimes>, recv=1000000001, len=0, argc=0,
    argv=0x0) at eval.c:5759
#6  0x00028fd4 in rb_call0 (klass=1387580, recv=1000000001, id=5785, oid=5785, argc=0,
    argv=0x0, body=0x152b24, flags=0) at eval.c:5911
#7  0x0002a7a7 in rb_call (klass=1387580, recv=1000000001, mid=5785, argc=0, argv=0x0,
    scope=0, self=1403220) at eval.c:6158
#8  0x000226ef in rb_eval (self=1403220, n=0x146284) at eval.c:3493
#9  0x000213e3 in rb_eval (self=1403220, n=0x1461a8) at eval.c:3223
#10 0x0001ceea in eval_node (self=1403220, node=0x1461a8) at eval.c:1437
#11 0x0001d60f in ruby_exec_internal () at eval.c:1642
#12 0x0001d660 in ruby_exec () at eval.c:1662
#13 0x0001d68e in ruby_run () at eval.c:1672
#14 0x000023dc in main (argc=2, argv=0xbffff7c4, envp=0xbffff7d0) at main.c:48


rb_eval recursively executes ruby code in 1.8
How big is the stack?
#8   rb_eval at eval.c:3493

(gdb) p $ebp - $esp                        <-- base - stack ptr = frame size
$1 = 968                                   <-- each rb_eval stack frame is almost 1k!

#0   rb_thread_save_context at eval.c:10597

(gdb) p (void*)rb_gc_stack_start - $esp
$1 = 10572                                 <-- 10.5k stack will be memcpy()’d

 50 method calls * 1k ≈ 50k stack
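
The arithmetic on this slide can be written out as a small C helper. This is only an illustrative estimate: the function name is hypothetical, and it assumes every frame is the same size, which real call stacks are not.

```c
#include <stddef.h>

/* Hypothetical estimate of how many bytes a context switch has to
 * memcpy() if the stack holds `depth` frames of `frame_size` bytes. */
size_t estimated_stack_bytes(size_t depth, size_t frame_size) {
    return depth * frame_size;
}
```

With the 968-byte rb_eval frame measured above, estimated_stack_bytes(50, 968) gives 48400 bytes, which is the slide's roughly 50k.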
Recap: How do Ruby threads work?

 Each thread has its own execution context:

   saved CPU registers (setjmp/longjmp)

   copy of VM globals (current frame, scope, block)

   stack (memcpy)
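
The first item, saving registers with setjmp/longjmp, can be illustrated in isolation. This sketch is not taken from Ruby's source: the toy_context struct and run_with_context function are hypothetical, the field types are assumptions, and only the regs field is actually used. The jump here only returns to a frame that is still active, which is the portable use of these calls; the eval.c slide earlier suggests Ruby 1.8 also copies the saved stack back before jumping.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical per-thread context mirroring the three items above. */
typedef struct {
    jmp_buf regs;            /* saved CPU registers, filled in by setjmp() */
    void *frame;             /* placeholder for the saved VM frame */
    void *scope;             /* placeholder for the saved VM scope */
    unsigned char *stk_ptr;  /* placeholder for the heap copy of the stack */
    size_t stk_len;          /* placeholder for the size of that copy */
} toy_context;

/* Saves the CPU state once, then jumps back to it `times` times.
 * Returns the number of times execution resumed at the setjmp(). */
int run_with_context(int times) {
    toy_context ctx;
    volatile int resumes = 0;   /* volatile: changed between setjmp and longjmp */

    if (setjmp(ctx.regs) != 0) {
        resumes++;              /* we arrived here via longjmp() */
    }
    if (resumes < times) {
        longjmp(ctx.regs, 1);   /* restore the saved registers */
    }
    return resumes;
}
```

The counter is declared volatile because the C standard leaves non-volatile locals modified after setjmp() indeterminate once longjmp() returns to it.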
 
Systemd: the modern Linux init system you will learn to love
Systemd: the modern Linux init system you will learn to loveSystemd: the modern Linux init system you will learn to love
Systemd: the modern Linux init system you will learn to love
 
LXC Containers and AUFs
LXC Containers and AUFsLXC Containers and AUFs
LXC Containers and AUFs
 
.ppt
.ppt.ppt
.ppt
 
Lecture #2 threading, networking &amp; permissions final version #2
Lecture #2  threading, networking &amp; permissions final version #2Lecture #2  threading, networking &amp; permissions final version #2
Lecture #2 threading, networking &amp; permissions final version #2
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 

Threaded Awesome

  • 1. Threaded Awesome (that’s an oxymoron) Joe Damato and Aman Gupta
  • 2. About Joe Damato From NJ, Godfather II is actually my Biography CMU/VMWare alum http://timetobleed.com @joedamato
  • 3. About Aman Gupta EventMachine, amqp Ruby Hero 2009 github.com/tmm1 @tmm1
  • 4. What is a thread? source: wikipedia
  • 10. What is a thread? A thread is just a set of execution state. This state usually includes: instruction & stack pointers; scheduling priority; other CPU state.
  • 11. Threading Models Green threads (1:N) Native Threads (1:1) Hybrid (M:N)
  • 23. Green Threads (1:N): “Green” because they are lightweight. The kernel doesn’t know they exist; the implementation is in userland. Pros: create lots of them cheaply (10,000s); switch between them cheaply (Ruby doesn’t); schedule them however you want. Cons: a blocking call in one blocks ALL; the kernel doesn’t know about them; can’t take advantage of SMP.
  • 24. Green Threads (1:N) (pics or it didn’t happen)
  • 25. Ruby 1.8 uses Green Threads (and does it wrong)
  • 38. Native Threads (1:1): The kernel knows they exist; some userland code (libpthread). Pros: take advantage of SMP; shared memory; blocking in one thread doesn’t block everyone; don’t have to write a scheduler. Cons: overhead limits how many you can create; bugs (glibc, more threads = slower creation time); no fine-grained scheduling control.
  • 40. Ruby 1.9 uses Native Threads (but.. they don’t execute in parallel)
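One consequence of native threads that is easy to observe is that a blocking call in one thread does not stall the others, even though the interpreter only runs one thread's Ruby code at a time. The sketch below is an illustration, not something from the slides: `run_sleepers` is a hypothetical helper, and it assumes an MRI-style interpreter that releases its lock while a thread is blocked in `sleep`.

```ruby
# Start `count` threads that each block for `seconds`, then report the
# total wall-clock time. run_sleepers is a hypothetical helper name.
def run_sleepers(count, seconds)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  threads = Array.new(count) { Thread.new { sleep(seconds) } }
  threads.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

elapsed = run_sleepers(4, 0.2)
# If blocking in one thread blocked everyone, this would take about 0.8s.
puts format("4 sleepers finished in %.2fs", elapsed)
```

The blocked threads overlap, so the total is expected to be close to a single sleep. CPU-bound work would not show the same speedup under the slide's "don't execute in parallel" caveat.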
  • 51. Hybrid Threads (M:N): Almost the best of both worlds. Pros: take advantage of SMP; cheap setup and teardown; blocking in one thread doesn’t block everyone. Cons: need 2 schedulers (userland + kernel); need to make them actually work together; all green threads backed by the same native thread can be blocked.
  • 53. Erlang uses Hybrid Threads Ruby 1.9, too (with fibers)
  • 54. Multitasking Types Preemptive Multitasking Cooperative Multitasking
  • 61. Preemptive Multitasking: An outside event (timer) signals the end of a CPU slice. Handle important events quickly; can help ensure everyone gets to execute. But: need to build a smart scheduler; can yield non-deterministic execution order.
  • 68. Cooperative Multitasking: Threads voluntarily release the CPU, giving it up when it is “optimal”. Can guarantee deterministic execution order; very simple “scheduler”. But: badly written code can hang all threads.
  • 69. So, what is a fiber? In Ruby fibers are green threads with cooperative multitasking.
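Cooperative multitasking with fibers can be sketched in a few lines. The example below is an illustration rather than code from the talk: `make_worker` and `run_round_robin` are hypothetical names, and the "scheduler" is the simplest possible round-robin loop. Each fiber gives up the CPU explicitly with `Fiber.yield`, which is why the execution order is deterministic.

```ruby
# Build a fiber that records `steps` entries in `log`, yielding after each.
# make_worker is a hypothetical helper name.
def make_worker(name, steps, log)
  Fiber.new do
    steps.times do |i|
      log << "#{name}:#{i}"
      Fiber.yield # voluntarily release the CPU
    end
  end
end

# A very simple "scheduler": resume every live fiber in turn until all finish.
def run_round_robin(fibers)
  until fibers.empty?
    fibers = fibers.select do |fiber|
      fiber.resume
      fiber.alive?
    end
  end
end

log = []
run_round_robin([make_worker("a", 2, log), make_worker("b", 2, log)])
p log # → ["a:0", "b:0", "a:1", "b:1"]
```

A worker that never calls `Fiber.yield` would keep the CPU until it finished, which is the "badly written code can hang all threads" problem in miniature.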
  • 70. So what’s the deal with ruby threads? strace google-perftools ltrace gdb
  • 71. strace trace system calls and signals strace -cp <pid> strace -ttTp <pid> -o <file>
  • 72. strace -cp <pid>
      -c      Count time, calls, and errors for each system call and report a summary on program exit.
      -p pid  Attach to the process with the process ID pid and begin tracing.

      % time     seconds  usecs/call     calls    errors syscall
      ------ ----------- ----------- --------- --------- ----------------
       50.39    0.000064           0      1197       592 read
       34.65    0.000044           0       609           writev
       14.96    0.000019           0      1226           epoll_ctl
        0.00    0.000000           0         4           close
        0.00    0.000000           0         1           select
        0.00    0.000000           0         4           socket
        0.00    0.000000           0         4         4 connect
        0.00    0.000000           0      1057           epoll_wait
      ------ ----------- ----------- --------- --------- ----------------
      100.00    0.000127                  4134       596 total
  • 73. strace -ttTp <pid> -o <file>
      -t           Prefix each line of the trace with the time of day.
      -tt          If given twice, the time printed will include the microseconds.
      -T           Show the time spent in system calls. This records the time difference between the beginning and the end of each system call.
      -o filename  Write the trace output to the file filename rather than to stderr.

      01:09:11.266949 epoll_wait(9, {{EPOLLIN, {u32=68841296, u64=68841296}}}, 4096, 50) = 1 <0.033109>
      01:09:11.300102 accept(10, {sa_family=AF_INET, sin_port=38313, sin_addr="127.0.0.1"}, [1226]) = 22 <0.000014>
      01:09:11.300190 fcntl(22, F_GETFL) = 0x2 (flags O_RDWR) <0.000007>
      01:09:11.300237 fcntl(22, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000008>
      01:09:11.300277 setsockopt(22, SOL_TCP, TCP_NODELAY, [1], 4) = 0 <0.000008>
      01:09:11.300489 accept(10, 0x7fff5d9c07d0, [1226]) = -1 EAGAIN <0.000014>
      01:09:11.300547 epoll_ctl(9, EPOLL_CTL_ADD, 22, {EPOLLIN, {u32=108750368, u64=108750368}}) = 0 <0.000009>
      01:09:11.300593 epoll_wait(9, {{EPOLLIN, {u32=108750368, u64=108750368}}}, 4096, 50) = 1 <0.000007>
      01:09:11.300633 read(22, "GET / HTTP/1.1r"..., 16384) = 772 <0.000012>
      01:09:11.301727 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000007>
      01:09:11.302095 poll([{fd=5, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) <0.000008>
      01:09:11.302144 write(5, "1000000-0003SELECT * FROM `table`"..., 56) = 56 <0.000023>
      01:09:11.302221 read(5, "25101,20x234m"..., 16384) = 284 <1.300897>
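A trace written with `-T` ends each line with the time spent in that call, so the slow calls can be picked out mechanically. The following is a minimal sketch, not a tool from the talk: `slowest_calls` and the `DURATION` pattern are hypothetical names, and it assumes the `-ttT` line layout shown above (timestamp, call, trailing `<seconds>`). The sample lines are copied from that trace.

```ruby
# Matches the trailing "<seconds>" that strace -T appends to each call.
DURATION = /<(\d+\.\d+)>\s*\z/

# Return up to `limit` [seconds, syscall, line] entries, slowest first.
# Lines without a duration (for example signal deliveries) are skipped.
def slowest_calls(lines, limit = 3)
  timed = lines.filter_map do |line|
    match = DURATION.match(line)
    next unless match

    syscall = line[/\A\S+\s+(\w+)\(/, 1] # name after the timestamp
    [match[1].to_f, syscall, line.strip]
  end
  timed.max_by(limit) { |seconds, _syscall, _line| seconds }
end

sample = [
  '01:09:11.266949 epoll_wait(9, {{EPOLLIN, {u32=68841296, u64=68841296}}}, 4096, 50) = 1 <0.033109>',
  '01:09:11.300102 accept(10, {sa_family=AF_INET, sin_port=38313, sin_addr="127.0.0.1"}, [1226]) = 22 <0.000014>',
  '01:09:11.302221 read(5, "25101,20x234m"..., 16384) = 284 <1.300897>'
]

slowest_calls(sample, 2).each do |seconds, syscall, _line|
  puts format("%-12s %.6fs", syscall, seconds)
end
```

On the sample, the `read` on fd 5 that follows the SQL `write` stands out at about 1.3 seconds. In practice the input would be the file passed to `-o`, for example `File.readlines(path)`.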
  • 79. Let’s strace ruby..
      15:45:51.658164 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      15:45:51.658244 rt_sigreturn(0x1a) = 2207807 <0.000009>
      15:45:51.678208 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      15:45:51.678271 rt_sigreturn(0x1a) = 0 <0.000009>
      15:45:51.698161 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      15:45:51.698216 rt_sigreturn(0x1a) = 140734552062624 <0.000009>
      15:45:51.718154 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      15:45:51.718192 rt_sigreturn(0x1a) = 140734552066688 <0.000009>
      15:45:51.738185 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      15:45:51.738221 rt_sigreturn(0x1a) = 11333952 <0.000008>
      15:45:51.758162 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      15:45:51.758216 rt_sigreturn(0x1a) = 0 <0.000009>
      15:45:51.778223 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      15:45:51.778296 rt_sigreturn(0x1a) = 0 <0.000009>
      15:45:51.798170 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      15:45:51.798244 rt_sigreturn(0x1a) = 2298980 <0.000009>
      15:45:51.818168 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
      15:45:51.819817 rt_sigreturn(0x1a) = 1 <0.000010>
      15:45:51.838196 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---

      wtf is SIGVTALRM?
  • 80. ruby uses setitimer and signals to schedule green threads* The first time a new thread is created, ruby calls: setitimer(ITIMER_VIRTUAL, 10ms): tell the kernel to send the process a SIGVTALRM every 10ms posix_signal(SIGVTALRM, catch_timer): bind the catch_timer function to the signal * when compiled without --enable-pthread
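The pattern described here (a periodic timer marks a switch as pending, and the interpreter acts on it later at a safe point) can be imitated in plain Ruby. This is only an analogy under stated assumptions, not how the interpreter is implemented: Ruby has no `setitimer` binding in its core API, so a background thread stands in for the SIGVTALRM signal, and `TickScheduler`, `start_timer`, `stop_timer`, and `checkpoint` are hypothetical names.

```ruby
# Analogy for timer-driven scheduling: a timer sets a "pending" flag and the
# main loop notices it at a safe point. TickScheduler is a hypothetical class.
class TickScheduler
  attr_reader :switches

  def initialize(interval)
    @interval = interval
    @pending = false
    @switches = 0
  end

  # Stand-in for the periodic signal: flag a pending switch every interval.
  def start_timer
    @timer = Thread.new do
      loop do
        sleep(@interval)
        @pending = true
      end
    end
  end

  def stop_timer
    @timer.kill
    @timer.join
  end

  # Called at safe points; a real scheduler would switch threads here.
  def checkpoint
    return unless @pending

    @pending = false
    @switches += 1
  end
end

sched = TickScheduler.new(0.01) # 10ms, the interval used in the slide
sched.start_timer
deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 0.3
while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
  sleep(0.001) # simulated unit of work between safe points
  sched.checkpoint
end
sched.stop_timer
puts "switch points observed: #{sched.switches}"
```

The timer itself does almost nothing, as in the handler that only sets a pending flag. The actual switch is deferred to code that runs outside the handler.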
  • 84. static void
        catch_timer(sig)
            int sig;
        {
            if (!rb_thread_critical) {
                rb_thread_pending = 1;
            }
            /* cause EINTR */
        }

        void
        rb_thread_start_timer()
        {
            struct itimerval tval;

            if (!thread_init) return;
            tval.it_interval.tv_sec = 0;
            tval.it_interval.tv_usec = 10000;
            tval.it_value = tval.it_interval;
            setitimer(ITIMER_VIRTUAL, &tval, NULL);
        }

        static VALUE
        rb_thread_start_0(fn, arg, th)
            VALUE (*fn)();
            void *arg;
            rb_thread_t th;
        {
            if (!thread_init) {
                thread_init = 1;
                posix_signal(SIGVTALRM, catch_timer);
                rb_thread_start_timer();
            }
            /* ... */
        }

        strace -e trace=setitimer ruby threaded.rb
        setitimer(ITIMER_VIRTUAL, {it_interval={0, 10000}, it_value={0, 10000}}, NULL) = 0
        --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
        --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
        --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
        --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
  • 88. But I’m not using threads!
        begin
          # require 'net/http'
          # Net::HTTP.new(host, port).request(...)         # uses timeout
          # require 'net/smtp'
          # Net::SMTP.new('localhost').send_message(...)   # uses timeout
          require 'timeout'
          Timeout.timeout(0.1) do                          # uses threads
            1+2*3/4 while true
          end
        rescue Timeout::Error
        end

        500_000_000.times{ |i| i * 2 }

        Thread.new, Timeout.timeout and Net::* all use threads and start the thread timer
        Once the timer is started, it will interrupt your process every 10ms, even if all threads are killed
  • 90. PATCH: stop the thread timer
        @@ -10518,6 +10520,15 @@ rb_thread_remove(th)
             rb_thread_die(th);
             th->prev->next = th->next;
             th->next->prev = th->prev;
        +
        +    /* if this is the last ruby thread, stop timer signals */
        +    if (th->next == th->prev && th->next == main_thread) {
        +        rb_thread_stop_timer();
        +        thread_init = 0;
        +    }
         }

        strace -e trace=setitimer ruby threaded.rb
        setitimer(ITIMER_VIRTUAL, {it_interval={0, 10000}, it_value={0, 10000}}, NULL) = 0
        --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
        --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
        --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
        --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
        --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
        setitimer(ITIMER_VIRTUAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
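The last setitimer line in that strace is the whole trick: an all-zero `itimerval` disarms the timer. A small sketch of that, assuming a POSIX system. The helper names (`start_vtimer`, `stop_vtimer`, `vtimer_is_armed`) are invented for illustration and are not what ruby calls them.

```c
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Arm a periodic virtual timer. SIGVTALRM is ignored here so the demo
 * process is not killed by the signal's default action. */
int start_vtimer(long usec)
{
    struct itimerval tval;

    assert(usec > 0 && usec < 1000000);
    if (signal(SIGVTALRM, SIG_IGN) == SIG_ERR) return -1;

    tval.it_interval.tv_sec = 0;
    tval.it_interval.tv_usec = usec;
    tval.it_value = tval.it_interval;
    return setitimer(ITIMER_VIRTUAL, &tval, NULL);
}

/* Disarm the timer: every field zero, as in the final strace line. */
int stop_vtimer(void)
{
    struct itimerval off;

    memset(&off, 0, sizeof(off));
    return setitimer(ITIMER_VIRTUAL, &off, NULL);
}

/* Returns 1 if ITIMER_VIRTUAL is armed, 0 if not, -1 on error. */
int vtimer_is_armed(void)
{
    struct itimerval cur;

    if (getitimer(ITIMER_VIRTUAL, &cur) != 0) return -1;
    return cur.it_value.tv_sec != 0 || cur.it_value.tv_usec != 0;
}
```

After `start_vtimer(10000)` the timer reports as armed; after `stop_vtimer()` it does not, and no further SIGVTALRMs are delivered.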
  • 95. Why are our debian servers so slow?
        strace -c ruby threaded.rb
        % time     seconds  usecs/call     calls    errors syscall
        ------ ----------- ----------- --------- --------- ----------------
        100.00    0.326334           0   3568567           rt_sigprocmask
          0.00    0.000000           0         9           read
          0.00    0.000000           0        10           open
          0.00    0.000000           0        10           close
          0.00    0.000000           0         9           fstat
          0.00    0.000000           0        25           mmap
        ------ ----------- ----------- --------- --------- ----------------
        100.00    0.326334               3568685         0 total

        strace -ttT ruby threaded.rb
        18:42:39.566788 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000006>
        18:42:39.566836 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000006>
        18:42:39.567083 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000006>
        18:42:39.567131 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000006>
        18:42:39.567415 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000006>

        3.5 million sigprocmasks.. wtf?
  • 98. What is --enable-pthread anyway?
        --- config.h.nopthread
        +++ config.h
        @@ -173,6 +173,12 @@
         #define FILE_READEND _IO_read_end
         #define HAVE__SC_CLK_TCK 1
         #define STACK_GROW_DIRECTION -1
        +#define _REENTRANT 1
        +#define _THREAD_SAFE 1
        +#define HAVE_LIBPTHREAD 1
        +#define HAVE_NANOSLEEP 1
        +#define HAVE_GETCONTEXT 1
        +#define HAVE_SETCONTEXT 1
         #define DEFAULT_KCODE KCODE_NONE
         #define USE_ELF 1
         #define DLEXT_MAXLEN 3

        uses a pthread for timing instead of setitimer()
        useful for compatibility with external libs that use pthreads or signals (like ruby-tk)

        #ifdef _THREAD_SAFE
            pthread_create(&time_thread, 0, thread_timer, 0);
        #else
            rb_thread_start_timer();
        #endif

        but.. it also enables getcontext/setcontext??

        #if defined(HAVE_GETCONTEXT) && defined(HAVE_SETCONTEXT)
        #include <ucontext.h>
        #define USE_CONTEXT
        #endif
  • 102. ucontext?
         ruby can use either setjmp/longjmp or setcontext/getcontext in its threading implementation and for exception handling
         setjmp/longjmp save and restore the current cpu registers
         setcontext/getcontext are an advanced version of setjmp/longjmp, but they also call sigprocmask to save/restore the signal mask before each jump
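A rough way to compare the two pairs is to bounce through each of them N times and watch the syscalls. The sketch below assumes Linux with glibc, where `getcontext`/`setcontext` are available. The function names are made up for this example, and it is not how ruby itself uses these calls.

```c
#include <assert.h>
#include <setjmp.h>
#include <ucontext.h>

/* Jump back to the setjmp() point n times; returns the number of jumps.
 * The counter is volatile so its value survives each longjmp(). */
int setjmp_round_trips(int n)
{
    jmp_buf env;
    volatile int trips = 0;

    assert(n >= 0);
    setjmp(env);                /* longjmp() resumes here */
    if (trips < n) {
        trips++;
        longjmp(env, 1);
    }
    return trips;
}

/* Same loop, but with getcontext()/setcontext(). */
int ucontext_round_trips(int n)
{
    ucontext_t ctx;
    volatile int trips = 0;

    assert(n >= 0);
    getcontext(&ctx);           /* setcontext() resumes here */
    if (trips < n) {
        trips++;
        setcontext(&ctx);
    }
    return trips;
}
```

If the slides are right, running a `main()` that calls `ucontext_round_trips(1000)` under `strace -c` should show rt_sigprocmask calls piling up, while `setjmp_round_trips(1000)` should not add any on glibc.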
  • 104. PATCH: --disable-ucontext
         --- a/configure.in
         +++ b/configure.in
         @@ -368,6 +368,10 @@
         +AC_ARG_ENABLE(ucontext,
         +       [  --disable-ucontext      do not use getcontext()/setcontext().],
         +       [disable_ucontext=yes], [disable_ucontext=no])
         +
          AC_ARG_ENABLE(pthread,
                 [  --enable-pthread        use pthread library.],
                 [enable_pthread=$enableval], [enable_pthread=no])
         @@ -1038,7 +1042,8 @@
         -if test x"$ac_cv_header_ucontext_h" = xyes; then
         +if test x"$ac_cv_header_ucontext_h" = xyes && test x"$disable_ucontext" = xno; then
              if test x"$rb_with_pthread" = xyes; then
                 AC_CHECK_FUNCS(getcontext setcontext)
              fi

         ./configure --enable-pthread --disable-ucontext

         % time     seconds  usecs/call     calls    errors syscall
         ------ ----------- ----------- --------- --------- ----------------
            nan    0.000000           0        13           read
            nan    0.000000           0        21        10 open
            nan    0.000000           0        11           close
         ------ ----------- ----------- --------- --------- ----------------
         100.00    0.000000                    45        10 total
  • 112. EventMachine + threads = slow??
         EventMachine allocates large buffers on the stack to read/write from the network
         Using threads with EM made ruby extremely slow..

         #include "ruby.h"

         VALUE bigstack(VALUE self)
         {
           char buffer[ 50 * 1024 ]; /* large stack frame */
           if (rb_block_given_p()) rb_yield(Qnil);
           return Qnil;
         }

         void Init_cext()
         {
           VALUE CExt = rb_define_module("CExt");
           rb_define_singleton_method(CExt, "bigstack", bigstack, 0);
         }

         require 'cext'
         (1..2).map{
           Thread.new{
             CExt.bigstack{
               100_000.times{
                 1*2+3/4
                 Thread.pass
               }
             }
           }
         }.map{ |t| t.join }

         ...profile?
  • 113. google-perftools
         Google’s CPU profiler
         export LD_PRELOAD=libprofiler.so
         export DYLD_INSERT_LIBRARIES=libprofiler.dylib
         CPUPROFILE=/tmp/myprof ./myapp
         pprof ./myapp /tmp/myprof
  • 119. # download
         wget http://google-perftools.googlecode.com/files/google-perftools-1.3.tar.gz
         tar zxvf google-perftools-1.3.tar.gz
         cd google-perftools-1.3

         # compile
         ./configure --prefix=/opt
         make
         sudo make install

         # setup
         # for linux
         export LD_PRELOAD=/opt/lib/libprofiler.so
         # for osx
         export DYLD_INSERT_LIBRARIES=/opt/lib/libprofiler.dylib

         # profile
         CPUPROFILE=/tmp/ruby.prof ruby -e'
           5_000_000.times{ "hello world" }
         '

         # report
         pprof `which ruby` --text /tmp/ruby.prof
  • 120. pprof ruby ruby.prof --text
         Total: 103 samples
               20  19.4%  19.4%       95  92.2% rb_yield_0
               11  10.7%  30.1%      103 100.0% rb_eval
                8   7.8%  37.9%       12  11.7% gc_sweep
                3   2.9%  68.9%       52  50.5% rb_str_new3
                3   2.9%  74.8%        3   2.9% obj_free
                3   2.9%  77.7%      103 100.0% int_dotimes
                3   2.9%  80.6%       12  11.7% gc_mark

         pprof ruby ruby.prof --gif
  • 126. Profiling EM + threads
         Total: 3763 samples
             2764  73.5% catch_timer
              989  26.3% memcpy
                3   0.1% st_lookup
                2   0.1% rb_thread_schedule
                1   0.0% rb_eval
                1   0.0% rb_newobj
                1   0.0% rb_gc_force_recycle

         rb_thread_save_context
         rb_thread_restore_context
         memcpy???
         really? memcpy?
  • 127. ltrace
         trace library calls
         ltrace -cp <pid>
         ltrace -ttTp <pid> -o <file>
  • 131. ltrace -c ruby cext_test.rb
         % time     seconds  usecs/call     calls      function
         ------ ----------- ----------- --------- --------------------
          48.65   11.741295         617     19009 memcpy        <- really
          30.16    7.279634         831      8751 longjmp
           9.78    2.359889         135     17357 _setjmp
           8.91    2.150565         285      7540 malloc
           1.10    0.265946          20     13021 memset
           0.81    0.195272          19     10105 __ctype_b_loc
           0.35    0.084575          19      4361 strcmp
           0.19    0.046163          19      2377 strlen
           0.03    0.006272          23       265 realloc
         ------ ----------- ----------- --------- --------------------
         100.00   24.134999                 82999 total

         ltrace -ttT -e memcpy ruby cext_test.rb
         01:24:48.769408 --- SIGVTALRM (Virtual timer expired) ---
         01:24:48.769616 memcpy(0x1216000, "", 1086328) = 0x1216000 <0.000578>
         01:24:48.770555 memcpy(0x6e32670, "240&343v", 1086328) = 0x6e32670 <0.000418>
         01:24:49.899414 --- SIGVTALRM (Virtual timer expired) ---
         01:24:49.899490 memcpy(0x1320000, "", 1082584) = 0x1320000 <0.000628>
         01:24:49.900474 memcpy(0x6e32670, "", 1086328) = 0x6e32670 <0.000479>
  • 132-146. OK, it’s calling memcpy() but what is it copying?
         static void
         rb_thread_save_context(th)
             rb_thread_t th;
         {
             VALUE *pos;
             int len;

             ruby_setjmp(th->context);                 /* 1. save cpu registers */

             len = ruby_stack_length(&pos);
             th->stk_pos = pos;
             th->stk_len = len;
             MEMCPY(th->stk_ptr, th->stk_pos,
                    VALUE, th->stk_len);               /* 2. save stack frames */

             th->frame = ruby_frame;                   /* 3. save vm globals */
             th->scope = ruby_scope;
             /* ... */
         }

         static void
         rb_thread_restore_context(th)
             rb_thread_t th;
         {
             ruby_frame = th->frame;                   /* 4. restore vm globals */
             ruby_scope = th->scope;
             /* ... */

             MEMCPY(th->stk_pos, th->stk_ptr,
                    VALUE, th->stk_len);               /* 5. restore stack frames */

             ruby_longjmp(th->context);                /* 6. restore cpu registers */
         }

         it’s copying the stacks to the heap!
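To make the cost of steps 2 and 5 concrete, here is a toy model of the save/restore pair. It is only a sketch: `toy_thread`, `toy_save_context` and `toy_restore_context` are hypothetical names, the "stack" is any buffer the caller hands in, and lengths are counted in bytes where ruby's MEMCPY counts VALUEs. The point is that every switch copies `stk_len` bytes twice, so a 50KB frame parked on the stack gets copied on every context switch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, stripped-down stand-in for the stack fields of rb_thread_t. */
typedef struct {
    char  *stk_ptr;   /* heap copy of the stack region */
    char  *stk_pos;   /* where the live stack region starts */
    size_t stk_len;   /* size of the region in bytes */
} toy_thread;

/* Step 2: copy the live stack region out to the heap. Returns 0 or -1. */
int toy_save_context(toy_thread *th, char *pos, size_t len)
{
    char *copy;

    assert(th != NULL && pos != NULL);
    copy = realloc(th->stk_ptr, len ? len : 1);
    if (copy == NULL) return -1;

    th->stk_ptr = copy;
    th->stk_pos = pos;
    th->stk_len = len;
    memcpy(th->stk_ptr, th->stk_pos, th->stk_len);
    return 0;
}

/* Step 5: copy the saved bytes from the heap back over the stack region. */
void toy_restore_context(const toy_thread *th)
{
    assert(th != NULL && th->stk_ptr != NULL);
    memcpy(th->stk_pos, th->stk_ptr, th->stk_len);
}
```

A caller would start with a zeroed `toy_thread`, save a buffer, let something else scribble over it, then restore it and `free(th.stk_ptr)` when done.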
  • 149-163. Stack vs. Heap
         Stack:
           Storage for local vars
           Only valid while stack frame is on the stack!
           Keeping track of function calls
         Heap:
           Storage for vars that persist across function calls.
           Managed by malloc

         Walkthrough (each call pushes a frame, each return pops it):
           func1() { void *data; func2(); }
             stack: 4 bytes (data)
           func2() { char *string = malloc(10); func3(); }
             stack: 4 bytes (string)    heap: 10 bytes
           func3() { char buffer[8]; }
             stack: 8 bytes (buffer)
           func3() returns: its frame and the 8 bytes are gone;
             func2’s 4 bytes, func1’s 4 bytes and the 10 heap bytes remain
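The same idea as compilable C, as a sketch only. `func3_len` and `func2_string` are invented variations on the slide's func3/func2, changed so that they return something checkable. The 8-byte buffer lives in the callee's frame and is gone after the return, while the 10 malloc'd bytes stay valid until the caller frees them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Like func3: buffer lives in this call's stack frame. It is only valid
 * until the function returns, so return a value, never a pointer to it. */
size_t func3_len(const char *src)
{
    char buffer[8];

    assert(src != NULL);
    strncpy(buffer, src, sizeof(buffer) - 1);   /* copy at most 7 chars */
    buffer[sizeof(buffer) - 1] = '\0';
    return strlen(buffer);
}

/* Like func2: the pointer itself is a local, but the 10 bytes it points
 * to come from malloc and outlive the call. The caller must free them. */
char *func2_string(void)
{
    char *string = malloc(10);

    if (string != NULL)
        strcpy(string, "on heap");              /* 7 chars + NUL fits in 10 */
    return string;
}
```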
  • 165. memcpy()ing the thread stacks During execution
  • 166. memcpy()ing the thread stacks During execution Saving current thread
  • 167. memcpy()ing the thread stacks During execution Saving current thread Restoring next thread
  • 168. memcpy()ing the thread stacks During execution Saving current thread Restoring next thread so, what’s on these thread stacks?
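The save/restore cycle in these builds can be modelled in a few lines of Ruby. This is only a toy sketch, not MRI's C code: `ToyThread`, `switch_threads` and the array standing in for the machine stack are hypothetical names, and `Array#dup` and `Array#replace` stand in for memcpy(). The point it illustrates is that the amount copied per switch grows with the depth of both stacks.

```ruby
# Toy model of copy-based green-thread switching (not MRI's implementation).
# One shared array plays the role of the single machine stack.
ToyThread = Struct.new(:name, :saved_stack)

# Save the running thread's stack to the "heap", then copy the next
# thread's saved stack back. Returns the number of slots copied.
def switch_threads(machine_stack, current, nxt)
  current.saved_stack = machine_stack.dup   # "memcpy" stack -> heap
  machine_stack.replace(nxt.saved_stack)    # "memcpy" heap -> stack
  current.saved_stack.size + nxt.saved_stack.size
end

machine_stack = [:main, :rb_eval, :rb_eval, :rb_call]  # thread A is running
a = ToyThread.new("A", [])
b = ToyThread.new("B", [:main, :rb_eval])              # B was saved earlier

copied = switch_threads(machine_stack, a, b)
puts "copied #{copied} slots; stack now #{machine_stack.inspect}"
```

A real implementation copies raw bytes rather than symbols, but the cost model is the same: deeper stacks mean more to copy on every switch.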
  • 169. gdb: the GNU debugger. Usage: gdb <program> or gdb <program> <pid>. Be sure to build with: -ggdb -O0
  • 171-187. gdb walkthrough (callouts from the individual slide builds are shown after "<-")
      % gdb ./test-it                                    <- start gdb
      (gdb) b average                                    <- set breakpoint on function named average
      Breakpoint 1 at 0x1f8e: file test-it.c, line 3.
      (gdb) run                                          <- run program
      Starting program: /Users/joe/test-it
      Reading symbols for shared libraries ++. done
      Breakpoint 1, average (x=5, y=6) at test-it.c:3    <- hit breakpoint!
      3 int sum = x + y;
      (gdb) bt                                           <- show backtrace
      #0 average (x=5, y=6) at test-it.c:3               <- function stack
      #1 0x00001fec in main () at test-it.c:12
      (gdb) s                                            <- single step
      4 double avg = sum / 2.0;
      (gdb) s
      5 return avg;
      (gdb) p avg                                        <- print variables
      $1 = 5.5
      (gdb) p sum
      $2 = 11
  • 188-197. What’s on the ruby stack?
      (gdb) where
      #0 0x0002a55e in rb_call (klass=1386800, recv=5056455, mid=42, argc=1, argv=0xbfffe5c0, scope=0, self=1403220) at eval.c:6125
      #1 0x000226ef in rb_eval (self=1403220, n=0x1461e4) at eval.c:3493
      #2 0x00026d01 in rb_yield_0 (val=5056455, self=1403220, klass=0, flags=0, avalue=0) at eval.c:5083
      #3 0x000270e8 in rb_yield (val=5056455) at eval.c:5168
      #4 0x0005c30c in int_dotimes (num=1000000001) at numeric.c:2946
      #5 0x00029be3 in call_cfunc (func=0x5c2a0 <int_dotimes>, recv=1000000001, len=0, argc=0, argv=0x0) at eval.c:5759
      #6 0x00028fd4 in rb_call0 (klass=1387580, recv=1000000001, id=5785, oid=5785, argc=0, argv=0x0, body=0x152b24, flags=0) at eval.c:5911
      #7 0x0002a7a7 in rb_call (klass=1387580, recv=1000000001, mid=5785, argc=0, argv=0x0, scope=0, self=1403220) at eval.c:6158
      #8 0x000226ef in rb_eval (self=1403220, n=0x146284) at eval.c:3493
      #9 0x000213e3 in rb_eval (self=1403220, n=0x1461a8) at eval.c:3223
      #10 0x0001ceea in eval_node (self=1403220, node=0x1461a8) at eval.c:1437
      #11 0x0001d60f in ruby_exec_internal () at eval.c:1642
      #12 0x0001d660 in ruby_exec () at eval.c:1662
      #13 0x0001d68e in ruby_run () at eval.c:1672
      #14 0x000023dc in main (argc=2, argv=0xbffff7c4, envp=0xbffff7d0) at main.c:48
      rb_eval recursively executes ruby code in 1.8
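To see why nested Ruby code shows up as nested rb_eval frames, here is a toy tree-walking evaluator in Ruby. It is only a sketch of the general technique, not the 1.8 interpreter: the node format (`[:lit, n]`, `[:add, left, right]`) and the names `toy_eval` and `$max_depth` are made up for illustration. Each nested node costs one more level of native recursion, which is the pattern the backtrace shows.

```ruby
# Toy AST evaluator: every nested node means one more recursive call,
# much like each nested node in 1.8 means another rb_eval frame.
$max_depth = 0

def toy_eval(node, depth = 1)
  $max_depth = depth if depth > $max_depth
  case node[0]
  when :lit then node[1]
  when :add then toy_eval(node[1], depth + 1) + toy_eval(node[2], depth + 1)
  else raise ArgumentError, "unknown node type: #{node[0].inspect}"
  end
end

# (1 + (2 + (3 + 4))) written as nested nodes
tree = [:add, [:lit, 1], [:add, [:lit, 2], [:add, [:lit, 3], [:lit, 4]]]]
result = toy_eval(tree)
puts "result=#{result} deepest recursion=#{$max_depth}"
```

In this model the recursion depth tracks the nesting of the source expression, so deeply nested application code translates directly into a deeper native stack.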
  • 198-204. How big is the stack? (callouts from the individual slide builds are shown after "<-")
      #8 rb_eval at eval.c:3493
      (gdb) p $ebp - $esp                        <- base - stack ptr = frame size
      $1 = 968
      each rb_eval stack frame is almost 1k!
      #0 rb_thread_save_context at eval.c:10597
      (gdb) p (void*)rb_gc_stack_start - $esp
      $1 = 10572
      10.5k stack will be memcpy()’d
      50 method calls * 1k ≈ 50k stack
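The estimate on the last build is simple arithmetic. Here is a sketch in Ruby using the two numbers gdb printed (968 bytes per rb_eval frame, 10572 bytes of live stack). The 50-call depth is the slide's round figure, and treating every nested call as one rb_eval-sized frame is a simplifying assumption; the constant and method names are made up for this example.

```ruby
# Back-of-the-envelope stack size, using the numbers from the gdb session.
RB_EVAL_FRAME_BYTES  = 968     # p $ebp - $esp in an rb_eval frame
MEASURED_STACK_BYTES = 10_572  # rb_gc_stack_start - $esp at save time

# Assumes one rb_eval-sized frame per nested call (a simplification).
def estimated_stack_bytes(nested_calls, frame_bytes = RB_EVAL_FRAME_BYTES)
  nested_calls * frame_bytes
end

estimate = estimated_stack_bytes(50)
puts "50 nested calls -> ~#{estimate} bytes (~#{(estimate / 1024.0).round(1)} KiB) copied per save"
puts "measured stack  -> #{MEASURED_STACK_BYTES} bytes (~#{(MEASURED_STACK_BYTES / 1024.0).round(1)} KiB)"
```

Real call chains mix rb_eval with smaller frames such as rb_call and rb_yield, so this is an order-of-magnitude figure rather than a measurement.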
  • 205. Recap: How do Ruby threads work?
  • 206. Recap: How do Ruby threads work? Each thread has its own execution context: saved CPU registers (setjmp/longjmp); copy of VM globals (current frame, scope, block); stack (memcpy)
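A small Ruby script in the spirit of the benchmark described in the speaker notes: two threads that grow their stacks and then hand off with Thread.pass. The names `deep_call`, `DEPTH` and `PASSES`, and the specific counts, are illustrative assumptions rather than the talk's actual code. On 1.8 each hand-off would save and restore the whole stack as recapped above; Ruby 1.9 and later use native threads, so the script still runs there but does not incur that copy.

```ruby
# Two threads that build a deep call stack, then repeatedly yield to
# each other. Depth and iteration counts are arbitrary illustration values.
DEPTH  = 50
PASSES = 100

# Recurse `levels` times, then run the block at the bottom of the stack.
def deep_call(levels, &block)
  return block.call if levels.zero?
  deep_call(levels - 1, &block)
end

counts = [0, 0]
threads = 2.times.map do |i|
  Thread.new do
    deep_call(DEPTH) do
      PASSES.times do
        counts[i] += 1  # each thread writes only its own slot
        Thread.pass     # ask the scheduler to run another thread
      end
    end
  end
end
threads.each(&:join)
puts "passes per thread: #{counts.inspect}"
```

Timing this script with different values of `DEPTH` on a 1.8 interpreter is one way to observe how switch cost scales with stack depth.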

Editor's Notes

  1. this talk is going to get technical, so feel free to interrupt if you have any questions
  2. differs on OS and platforms, but usually includes..
  3. differs on OS and platforms, but usually includes..
  4. differs on OS and platforms, but usually includes..
  5. differs on OS and platforms, but usually includes..
  6. differs on OS and platforms, but usually includes..
  7. each one has pros and cons, different use cases where they make sense. i&amp;#x2019;ll show pictures for each one. let&amp;#x2019;s dive into differences
  8. solaris older than version 9 used hybrid threads too
  9. switch to aman
  10. syscalls are calls to kernel functions numbered functions switches from usermode to kernel mode doesn&amp;#x2019;t show userland functions, but you can look for gaps
  11. look for system calls that took a while look for gaps that indicate userland activity lots of other options, trace network related or fd related calls, etc
  12. look for system calls that took a while look for gaps that indicate userland activity lots of other options, trace network related or fd related calls, etc
  13. look for system calls that took a while look for gaps that indicate userland activity lots of other options, trace network related or fd related calls, etc
  14. so what&amp;#x2019;s the deal with ruby threads? lets strace to find out straced a production ruby.. lots of vtalrms. wtf?
  15. so what&amp;#x2019;s the deal with ruby threads? lets strace to find out straced a production ruby.. lots of vtalrms. wtf?
  16. ruby uses setitimer and signals to schedule green threads setitimer tells the kernel to send a VTALRM signal every 10ms. signal interrupts the process and invokes catch_timer to set rb_thread_pending, which lets the interpreter know it needs to switch threads. rb_thread_start uses thread_init to keep track of whether it needs to start the timer or not. rb_thread_start calls rb_thread_start_timer (.. or pthread_create later)
  17. ruby uses setitimer and signals to schedule green threads setitimer tells the kernel to send a VTALRM signal every 10ms. signal interrupts the process and invokes catch_timer to set rb_thread_pending, which lets the interpreter know it needs to switch threads. rb_thread_start uses thread_init to keep track of whether it needs to start the timer or not. rb_thread_start calls rb_thread_start_timer (.. or pthread_create later)
  18. ruby uses setitimer and signals to schedule green threads setitimer tells the kernel to send a VTALRM signal every 10ms. signal interrupts the process and invokes catch_timer to set rb_thread_pending, which lets the interpreter know it needs to switch threads. rb_thread_start uses thread_init to keep track of whether it needs to start the timer or not. rb_thread_start calls rb_thread_start_timer (.. or pthread_create later)
  19. but our code isn&amp;#x2019;t using threads! turns out net::http and smtp use timeout, which uses threads. and the first time a thread is spawned, the timer is started.. and it never stops! let&amp;#x2019;s fix it.
  20. but our code isn&amp;#x2019;t using threads! turns out net::http and smtp use timeout, which uses threads. and the first time a thread is spawned, the timer is started.. and it never stops! let&amp;#x2019;s fix it.
  21. but our code isn&amp;#x2019;t using threads! turns out net::http and smtp use timeout, which uses threads. and the first time a thread is spawned, the timer is started.. and it never stops! let&amp;#x2019;s fix it.
  22. remember the thread_init variable from before? thread_remove() removes the thread from the linked list. if only the main_thread is left, we simply stop the timer, and make sure to set thread_init=0 so the timer is started up again next time a new thread is spawned.
  23. switch over to JOE. talk about running debian ruby in production
  24. we noticed ruby on debian is pretty slow we googled debian ruby issues, and it turns out sigprocmask is related to enable pthread
  25. we noticed ruby on debian is pretty slow we googled debian ruby issues, and it turns out sigprocmask is related to enable pthread
  26. we noticed ruby on debian is pretty slow we googled debian ruby issues, and it turns out sigprocmask is related to enable pthread
  27. using a pthread for timing doesn&amp;#x2019;t make it slower.. what does? let&amp;#x2019;s see what ./configure --enable-pthread actually does. diff&amp;#x2019;ed generated config.h. hmm, getcontext/setcontext??
  28. using a pthread for timing doesn&amp;#x2019;t make it slower.. what does? let&amp;#x2019;s see what ./configure --enable-pthread actually does. diff&amp;#x2019;ed generated config.h. hmm, getcontext/setcontext??
  29. turns out you don&amp;#x2019;t really need ucontext to use pthreads (maybe on some obscure platforms?) let&amp;#x2019;s strace it! .. 3.5 million sigprocmask are gone! ruby is 30% faster!
  30. switch to aman
  31. two threads each allocates large stack frame (50kb) does some computation, then calls thread pass to switch to the other thread
  32. two threads each allocates large stack frame (50kb) does some computation, then calls thread pass to switch to the other thread
  33. two threads each allocates large stack frame (50kb) does some computation, then calls thread pass to switch to the other thread
  34. two threads each allocates large stack frame (50kb) does some computation, then calls thread pass to switch to the other thread
  35. two threads each allocates large stack frame (50kb) does some computation, then calls thread pass to switch to the other thread
  36. two threads each allocates large stack frame (50kb) does some computation, then calls thread pass to switch to the other thread
  37. really.. memcpy? let&amp;#x2019;s make sure
  38. really.. memcpy? let&amp;#x2019;s make sure
  39. really.. memcpy? let&amp;#x2019;s make sure
  40. really.. memcpy? let&amp;#x2019;s make sure
  41. really.. memcpy? let&amp;#x2019;s make sure
  42. ok, its calling memcpy. what is it copying? it&amp;#x2019;s copying the thread stacks to the heap. let&amp;#x2019;s take a step back and talk about the difference between stacks and heaps
  43. ok, its calling memcpy. what is it copying? it&amp;#x2019;s copying the thread stacks to the heap. let&amp;#x2019;s take a step back and talk about the difference between stacks and heaps
  44. ok, its calling memcpy. what is it copying? it&amp;#x2019;s copying the thread stacks to the heap. let&amp;#x2019;s take a step back and talk about the difference between stacks and heaps
  45. ok, its calling memcpy. what is it copying? it&amp;#x2019;s copying the thread stacks to the heap. let&amp;#x2019;s take a step back and talk about the difference between stacks and heaps
  46. ok, its calling memcpy. what is it copying? it&amp;#x2019;s copying the thread stacks to the heap. let&amp;#x2019;s take a step back and talk about the difference between stacks and heaps
  47. ok, its calling memcpy. what is it copying? it&amp;#x2019;s copying the thread stacks to the heap. let&amp;#x2019;s take a step back and talk about the difference between stacks and heaps
  48. ok, its calling memcpy. what is it copying? it&amp;#x2019;s copying the thread stacks to the heap. let&amp;#x2019;s take a step back and talk about the difference between stacks and heaps
  49. ok, its calling memcpy. what is it copying? it&amp;#x2019;s copying the thread stacks to the heap. let&amp;#x2019;s take a step back and talk about the difference between stacks and heaps
  50. ok, its calling memcpy. what is it copying? it&amp;#x2019;s copying the thread stacks to the heap. let&amp;#x2019;s take a step back and talk about the difference between stacks and heaps
  51. ok, its calling memcpy. what is it copying? it&amp;#x2019;s copying the thread stacks to the heap. let&amp;#x2019;s take a step back and talk about the difference between stacks and heaps
  52. ok, its calling memcpy. what is it copying? it&amp;#x2019;s copying the thread stacks to the heap. let&amp;#x2019;s take a step back and talk about the difference between stacks and heaps
  53. ok, its calling memcpy. what is it copying? it&amp;#x2019;s copying the thread stacks to the heap. let&amp;#x2019;s take a step back and talk about the difference between stacks and heaps
  54. ok, its calling memcpy. what is it copying? it&amp;#x2019;s copying the thread stacks to the heap. let&amp;#x2019;s take a step back and talk about the difference between stacks and heaps
  55. ok, its calling memcpy. what is it copying? it&amp;#x2019;s copying the thread stacks to the heap. let&amp;#x2019;s take a step back and talk about the difference between stacks and heaps
  56-71. func3() has an 8-byte stack frame, twice as big as the other two. The bigger the stack frames, the more it has to memcpy and the longer it takes.
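To make the cost concrete, here is a minimal sketch of the arithmetic. The frame sizes come from the note above (func3() at 8 bytes, the other two at half that); the names func1 and func2 and the helper `bytes_to_copy` are made up for illustration and are not part of Ruby.

```ruby
# Frame sizes in bytes, per the note: func3() is twice as big as the others.
# func1/func2 are assumed names for "the other two".
FRAME_BYTES = { 'func1' => 4, 'func2' => 4, 'func3' => 8 }

# Hypothetical helper: total bytes that must be memcpy'd to save a stack
# made up of the named frames.
def bytes_to_copy(call_stack, frame_bytes = FRAME_BYTES)
  call_stack.inject(0) { |total, name| total + frame_bytes.fetch(name) }
end

puts bytes_to_copy(%w[func1 func2])        # two small frames
puts bytes_to_copy(%w[func1 func2 func3])  # adding func3() doubles the copy
```

The copy grows with every frame on the stack, so deeper or fatter stacks make every save more expensive.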
  72. Syscalls are calls to numbered kernel functions. Each one switches from user mode to kernel mode. The trace doesn’t show userland functions, but you can look for gaps.
  73-81. Starts out with main() like any C program, and calls ruby_run right away to start the Ruby VM. int_dotimes is in numeric.c; this code calls 5000.times{}. rb_yield is yielding to the block. But the most common stack frame is rb_eval: 1.8’s VM represents Ruby code using nodes, and nodes are evaluated using rb_eval. Also notice that rb_eval is recursive. Rails, for instance, would show many dozens of nested rb_eval frames.
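A rough way to watch the stack grow from inside Ruby is to count frames with `caller`. This is only a sketch: it counts Ruby-level method frames, not the C-level rb_eval frames the slide shows, and `depth_at` is a made-up helper name.

```ruby
# Recurse n levels deep, then report how many frames are on the
# Ruby-level call stack at the bottom of the recursion.
def depth_at(n)
  return caller.size if n.zero?
  depth_at(n - 1)
end

base   = depth_at(0)
nested = depth_at(10)
puts "extra frames from 10 nested calls: #{nested - base}"
```

Each nested method call adds a frame. On 1.8, each of those also costs at least one recursive rb_eval call on the C stack.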
  82-87. Each rb_eval stack frame is almost 1 KB! (mention MBARI patches) switch to joe
  88-90. rb_thread_start allocates a new stack on the heap and sets the stack pointer using assembly. Then thread_save/restore just call setjmp and longjmp like normal, which takes care of saving and restoring where the stack pointer was pointing!
  91. Normally the kernel extends the stack automatically. mmap is an alternative to malloc that gives you a big region of memory.
  92-97. Each thread decrements a number and then pauses itself. Basically, this tests 50 million thread context switches across 500 threads, with 20 Ruby method frames in each thread stack.
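A scaled-down sketch of a benchmark with that shape is below. It is not the code from the slides: the names `nest` and `run_benchmark` are made up, `Thread.pass` stands in for "pauses itself", a Mutex guards the shared counter, and the default sizes are far smaller than the 500 threads and 50 million switches described above.

```ruby
THREADS  = 50      # the talk's run uses 500
FRAMES   = 20      # Ruby method frames on each thread's stack
SWITCHES = 5_000   # the talk's run totals 50 million

# Recurse so that `depth` nested method frames sit on the stack
# before the work block runs.
def nest(depth, &work)
  return work.call if depth <= 1
  nest(depth - 1, &work)
end

# Start `threads` threads that share one counter. Each thread decrements
# the counter, then yields the CPU, until the counter reaches zero.
# Returns the final counter value.
def run_benchmark(threads, frames, switches)
  remaining = switches
  lock = Mutex.new
  workers = Array.new(threads) do
    Thread.new do
      nest(frames) do
        loop do
          done = lock.synchronize do
            if remaining > 0
              remaining -= 1
              false
            else
              true
            end
          end
          break if done
          Thread.pass # hand control to another thread
        end
      end
    end
  end
  workers.each(&:join)
  remaining
end

started = Time.now
left = run_benchmark(THREADS, FRAMES, SWITCHES)
puts "counter left at #{left} after #{Time.now - started}s"
```

Raising FRAMES is the interesting knob: with a threading implementation that copies stacks on every switch, deeper stacks should make the same number of switches take longer.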