In Search of the Perfect Global Interpreter Lock

Presentation on the Python/Ruby Global Interpreter Lock at RuPy 2011. October 14, 2011. Poznan, Poland.

  1. In Search of the Perfect Global Interpreter Lock
     David Beazley, http://www.dabeaz.com, @dabeaz
     October 15, 2011. Presented at RuPy 2011, Poznan, Poland
     Copyright (C) 2010, David Beazley, http://www.dabeaz.com
  2. Introduction
     • As many programmers know, Python and Ruby feature a Global Interpreter Lock (GIL)
     • More precisely: CPython and MRI, the reference implementations
     • It limits thread performance on multicore machines
     • In theory, it restricts code to a single CPU
  3. An Experiment
     • Consider a trivial CPU-bound function:
         def countdown(n):
             while n > 0:
                 n -= 1
     • Run it once with a lot of work:
         COUNT = 100000000   # 100 million
         countdown(COUNT)
     • Now, divide the work across two threads:
         from threading import Thread

         t1 = Thread(target=countdown, args=(COUNT//2,))
         t2 = Thread(target=countdown, args=(COUNT//2,))
         t1.start(); t2.start()
         t1.join(); t2.join()
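To rerun the experiment yourself, the script below is a minimal sketch. It assumes Python 3; the helper names `time_sequential` and `time_threaded` are hypothetical, and `COUNT` is reduced from the slide's 100 million so runs stay short. Timings depend heavily on the interpreter version, OS and core count, so expect numbers that differ from the results slides.

```python
import time
from threading import Thread

def countdown(n):
    # Pure-Python CPU-bound loop: it never blocks, so it only gives up
    # the GIL when the interpreter forces a switch.
    while n > 0:
        n -= 1

def time_sequential(count):
    start = time.perf_counter()
    countdown(count)
    return time.perf_counter() - start

def time_threaded(count, nthreads=2):
    # Same total amount of work, split evenly across nthreads threads.
    threads = [Thread(target=countdown, args=(count // nthreads,))
               for _ in range(nthreads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    COUNT = 10_000_000  # smaller than the slide's 100 million
    print(f"Sequential           : {time_sequential(COUNT):.2f}s")
    print(f"Threaded (2 threads) : {time_threaded(COUNT):.2f}s")
```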
  4. An Experiment
     • Some Ruby:
         def countdown(n)
           while n > 0
             n -= 1
           end
         end
     • Sequential:
         COUNT = 100000000   # 100 million
         countdown(COUNT)
     • Subdivided across threads:
         t1 = Thread.new { countdown(COUNT/2) }
         t2 = Thread.new { countdown(COUNT/2) }
         t1.join
         t2.join
  5. Expectations
     • Sequential and threaded versions perform the same amount of work (same number of calculations)
     • There is the GIL... so no parallelism
     • Performance should be about the same
  6-8. Results
     • Ruby 1.9 on OS-X (4 cores)
         Sequential           : 2.46s
         Threaded (2 threads) : 2.55s   (~ same)
     • Python 2.7
         Sequential           : 6.12s
         Threaded (2 threads) : 9.28s   (1.5x slower!)
     • Question: Why does it get slower in Python?
  9-11. Results
     • Ruby 1.9 on Windows Server 2008 (2 cores)
         Sequential           : 3.32s
         Threaded (2 threads) : 3.45s   (~ same)
     • Python 2.7
         Sequential           : 6.9s
         Threaded (2 threads) : 63.0s   (9.1x slower!)
     • Why does it get that much slower on Windows?
  12. Experiment: Messaging
     • A request/reply server for size-prefixed messages
         Client  <-->  Server
     • Each message: a size header + payload
     • Similar in spirit to ZeroMQ
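One way to implement the size-prefixed framing is sketched below. The slide doesn't specify the header format, so the 4-byte big-endian length (`struct` format `"!I"`) and the helper names `recv_exactly`, `send_msg` and `recv_msg` are assumptions.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # assumed: 4-byte big-endian unsigned size prefix

def recv_exactly(sock, n):
    # recv() may return fewer bytes than requested, so loop until n bytes arrive.
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise EOFError("connection closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def send_msg(sock, payload):
    # Header and payload are written separately, mirroring send(size); send(msg).
    sock.sendall(HEADER.pack(len(payload)))
    sock.sendall(payload)

def recv_msg(sock):
    (size,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, size)

if __name__ == "__main__":
    a, b = socket.socketpair()
    send_msg(a, b"hello")
    print(recv_msg(b))  # b'hello'
    a.close()
    b.close()
```

`recv_exactly` is the important part: a single `recv()` on a stream socket can return a partial read, so both the header and the payload have to be collected in a loop.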
  13-14. An Experiment: Messaging
     • A simple test: message echo (pseudocode)
         def client(nummsg, msg):
             while nummsg > 0:
                 send(msg)
                 resp = recv()
                 sleep(0.001)
                 nummsg -= 1

         def server():
             while True:
                 msg = recv()
                 send(msg)
     • To be less evil, it's throttled (<1000 msg/sec)
     • Not a messaging stress test
  15. An Experiment: Messaging
     • A test: send/receive 1000 8K messages
     • Scenario 1: Unloaded server
         Client  <-->  Server
     • Scenario 2: Server competing with one CPU-bound thread
         Client  <-->  Server + CPU-bound thread
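Both scenarios can be approximated in one process with the hypothetical harness below: the echo server runs in one thread, the client in the main thread, and (for Scenario 2) an extra busy-looping thread competes for the GIL. It assumes Python 3, a local `socket.socketpair()` connection instead of a real network, and the same assumed 4-byte size header; `run`, `spin` and the other names are illustrative only.

```python
import socket
import struct
import threading
import time

HEADER = struct.Struct("!I")  # assumed 4-byte big-endian size prefix

def recv_exactly(sock, n):
    # Loop because recv() may return fewer bytes than asked for.
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("connection closed")
        buf += chunk
    return bytes(buf)

def server(sock):
    # Echo size-prefixed messages until the client hangs up.
    try:
        while True:
            size = recv_exactly(sock, HEADER.size)  # blocking call: GIL released
            msg = recv_exactly(sock, HEADER.unpack(size)[0])
            sock.sendall(size)
            sock.sendall(msg)
    except EOFError:
        pass

def client(sock, nummsg, msg):
    for _ in range(nummsg):
        sock.sendall(HEADER.pack(len(msg)) + msg)
        (size,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
        if recv_exactly(sock, size) != msg:
            raise RuntimeError("echo mismatch")
        time.sleep(0.001)  # throttle, as in the slide's client

def spin(stop):
    # Scenario 2: a CPU-bound thread that never blocks.
    while not stop.is_set():
        pass

def run(nummsg=1000, size=8192, cpu_thread=False):
    """Return the seconds taken to echo nummsg messages of size bytes."""
    csock, ssock = socket.socketpair()
    stop = threading.Event()
    threads = [threading.Thread(target=server, args=(ssock,))]
    if cpu_thread:
        threads.append(threading.Thread(target=spin, args=(stop,)))
    for t in threads:
        t.start()
    start = time.perf_counter()
    client(csock, nummsg, b"x" * size)
    elapsed = time.perf_counter() - start
    stop.set()
    csock.close()  # server sees EOF and exits
    for t in threads:
        t.join()
    ssock.close()
    return elapsed
```

Compare `run()` with `run(cpu_thread=True)`. On an interpreter with a GIL the second is expected to be slower, but the ratio will depend on the Python version and platform and is unlikely to match the slides.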
  16-17. Results
     • Messaging with no threads (OS-X, 4 cores)
         C          : 1.26s
         Python 2.7 : 1.29s
         Ruby 1.9   : 1.29s
     • Messaging with one CPU-bound thread*
         C          : 1.16s   (~8% faster!?)
         Python 2.7 : 12.3s   (10x slower)
         Ruby 1.9   : 42.0s   (33x slower)
     • Hmmm. Curious.
     * On Ruby, the CPU-bound thread was also given lower priority
  18-20. Results
     • Messaging with no threads (Linux, 8 CPUs)
         C          : 1.13s
         Python 2.7 : 1.18s
         Ruby 1.9   : 1.18s
     • Messaging with one CPU-bound thread
         C          : 1.11s     (same)
         Python 2.7 : 1.60s     (1.4x slower) - better than on OS-X
         Ruby 1.9   : 5839.4s   (~5000x slower) - worse!
     • 5000x slower? Really? Why?
  21. The Mystery Deepens
     • Disable all but one CPU core
     • CPU-bound threads (OS-X)
         Python 2.7 (4 cores + hyperthreading) : 9.28s
         Python 2.7 (1 core)                   : 7.9s    (faster!)
     • Messaging with one CPU-bound thread
         Ruby 1.9 (4 cores + hyperthreading)   : 42.0s
         Ruby 1.9 (1 core)                     : 10.5s   (much faster!)
     • ?!?!?!?!?!?
  22. Better is Worse
     • Change software versions
     • Let's upgrade to Python 3 (OS-X)
         Python 2.7 (Messaging) : 12.3s
         Python 3.2 (Messaging) : 20.1s   (1.6x slower)
     • Let's downgrade to Ruby 1.8 (OS-X)
         Ruby 1.9   (Messaging) : 42.0s
         Ruby 1.8.7 (Messaging) : 10.0s   (4x faster)
     • So much for progress (sigh)
  23. What's Happening?
     • The GIL does far more than limit cores
     • It can make performance much worse
     • Better performance by turning off cores?
     • 5000x performance hit on Linux?
     • Why?
  24. Why You Might Care
     • Must you abandon Python/Ruby for concurrency?
     • Having threads restricted to one CPU core might be okay if it were sane
     • Analogy: a multitasking operating system (e.g., Linux) runs fine on a single CPU
     • Plus, threads get used a lot behind the scenes (even in thread alternatives, e.g., async)
  25. Why I Care
     • It's an interesting little systems problem
     • How do you make a better GIL?
     • It's fun.
  26. Some Background
     • I have been discussing some of these issues in the Python community since 2009
         http://www.dabeaz.com/GIL
     • I'm less familiar with Ruby, but I've looked at its GIL implementation and experimented
     • Very interested in commonalities/differences
  27. A Tale of Two GILs
  28. Thread Implementation
     • Python: system threads (e.g., pthreads), managed by the OS; concurrent
       execution of the Python interpreter (written in C)
     • Ruby: system threads (e.g., pthreads), managed by the OS; concurrent
       execution of the Ruby VM (written in C)
  29. Alas, the GIL
     • Parallel execution is forbidden
     • There is a "global interpreter lock"
     • The GIL ensures that only one thread runs in the interpreter at once
     • Simplifies many low-level details (memory management, callouts to C extensions, etc.)
  30. GIL Implementation
     • Simple mutex lock:
         mutex_t gil;

         void gil_acquire() {
             mutex_lock(gil);
         }

         void gil_release() {
             mutex_unlock(gil);
         }
     • Condition variable:
         int gil_locked = 0;
         mutex_t gil_mutex;
         cond_t gil_cond;

         void gil_acquire() {
             mutex_lock(gil_mutex);
             while (gil_locked)
                 cond_wait(gil_cond, gil_mutex);  /* unlocks gil_mutex while waiting */
             gil_locked = 1;
             mutex_unlock(gil_mutex);
         }

         void gil_release() {
             mutex_lock(gil_mutex);
             gil_locked = 0;
             cond_notify(gil_cond);
             mutex_unlock(gil_mutex);
         }
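The condition-variable version can be modeled in Python for experimentation. This is a sketch, not CPython's or MRI's actual code: `CondVarGIL` is a hypothetical name, and `threading.Condition` stands in for the slide's `mutex_t`/`cond_t` pair.

```python
import threading

class CondVarGIL:
    """Model of the condition-variable GIL: a flag guarded by a mutex,
    with waiters sleeping on a condition variable."""

    def __init__(self):
        self._locked = False               # gil_locked
        self._cond = threading.Condition() # gil_mutex + gil_cond

    def acquire(self):
        with self._cond:                   # mutex_lock(gil_mutex)
            while self._locked:
                self._cond.wait()          # cond_wait(gil_cond, gil_mutex)
            self._locked = True            # gil_locked = 1

    def release(self):
        with self._cond:
            self._locked = False           # gil_locked = 0
            self._cond.notify()            # cond_notify(gil_cond)
```

The simple-mutex column corresponds to a plain `threading.Lock`. Note that `notify()` only makes a waiter runnable; it does not decide which thread gets the lock next, which is where the trouble on the following slides starts.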
  31. Thread Execution Model
     • The GIL results in cooperative multitasking
       [Diagram: Threads 1, 2 and 3 take turns running; each releases the GIL when it
       blocks, and the next thread acquires it]
     • When a thread is running, it holds the GIL
     • GIL released on blocking (e.g., I/O operations)
  32. Threads for I/O
     • For I/O it works great
     • GIL is never held very long
     • Most threads just sit around sleeping
     • Life is good
  33. Threads for Computation
     • You may actually want to compute something!
         • Fibonacci numbers
         • Image/audio processing
         • Parsing
     • The CPU will be busy
     • And it won't give up the GIL on its own
  34. CPU-Bound Switching
     • Python: releases and reacquires the GIL every 100 "ticks"
         1 tick ~= 1 interpreter instruction
     • Ruby: a background thread generates a timer interrupt every 10ms
         GIL released and reacquired by current thread on interrupt
  35. Python Thread Switching
     [Diagram: a CPU-bound thread runs 100 ticks, releases and reacquires the GIL,
     runs 100 more ticks, and so on]
     • Every 100 VM instructions, the GIL is dropped, allowing other threads to run if they want
     • Not time-based: the switching interval depends on the kind of instructions executed
  36. Ruby Thread Switching
     [Diagram: a timer thread fires every 10ms; on each tick the CPU-bound thread
     releases and reacquires the GIL]
     • Loosely mimics the time slice of the OS
     • Every 10ms, the GIL is released/acquired
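The timer-driven scheme can be sketched with a background thread that sets a flag every 10ms, which the CPU-bound loop polls. `TimerInterrupt` and `cpu_bound` are hypothetical names; in a real VM the flag check would release and reacquire the GIL rather than just count requests.

```python
import threading
import time

class TimerInterrupt:
    """Background thread that requests a GIL release every `interval` seconds."""

    def __init__(self, interval=0.010):  # 10ms, as on the slide
        self.interval = interval
        self.must_release = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait() returns False on timeout, i.e. once per interval.
        while not self._stop.wait(self.interval):
            self.must_release.set()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

def cpu_bound(timer, duration):
    # Busy loop that counts how often it is asked to drop the GIL.
    requests = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        if timer.must_release.is_set():
            timer.must_release.clear()
            requests += 1  # a real VM would release/reacquire the GIL here
    return requests
```

For example, `timer = TimerInterrupt(); timer.start(); n = cpu_bound(timer, 0.2); timer.stop()` should yield on the order of 20 requests, fewer on a loaded machine.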
  37. A Common Theme
     • Both Python and Ruby have C code like this:
         void execute() {
             while (inst = next_instruction()) {
                 // Run the VM instruction
                 ...
                 if (must_release_gil) {
                     GIL_release();
                     /* Other threads may run now */
                     GIL_acquire();
                 }
             }
         }
     • Exact details vary, but the concept is the same
     • Each thread has a periodic release/acquire in the VM to allow other threads to run
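The same loop can be written as a small Python model, using a tick counter in the style of Python 2's 100-tick check. `SimpleGIL`, `execute` and `CHECK_INTERVAL` are hypothetical; the model only shows that `release()` followed immediately by `acquire()` gives other threads an opportunity to run, not a guarantee.

```python
import threading

class SimpleGIL:
    # Minimal condition-variable lock (same idea as the slide-30 GIL).
    def __init__(self):
        self._locked = False
        self._cond = threading.Condition()

    def acquire(self):
        with self._cond:
            while self._locked:
                self._cond.wait()
            self._locked = True

    def release(self):
        with self._cond:
            self._locked = False
            self._cond.notify()

CHECK_INTERVAL = 100  # "ticks" between forced releases

def execute(gil, instructions, trace, name):
    # Run `instructions` fake VM instructions, recording who ran each one.
    gil.acquire()
    ticks = 0
    for _ in range(instructions):
        trace.append(name)               # "run the VM instruction"
        ticks += 1
        if ticks >= CHECK_INTERVAL:      # must_release_gil
            ticks = 0
            gil.release()
            # Other threads *may* run now...
            gil.acquire()
    gil.release()
```

After running two threads, `sum(1 for a, b in zip(trace, trace[1:]) if a != b)` counts how often ownership actually changed; that number varies with the OS, core count and timing, which is exactly the problem described next.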
  38. Question
     • What can go wrong with this bit of code?
         if (must_release_gil) {
             GIL_release();
             /* Other threads may run now */
             GIL_acquire();
         }
     • Short answer: Everything!
  39. Pathology
  40. Thread Switching
     • Suppose you have two threads
     • Thread 1: Running
     • Thread 2: Ready (waiting for GIL)
  41. Thread Switching
     • Easy case: Thread 1 performs I/O (read/write)
       [Diagram: Thread 1 releases the GIL and becomes BLOCKED on I/O; pthreads/OS
       schedules Thread 2, which acquires the GIL and goes from READY to running]
     • Thread 1: releases GIL and blocks for I/O
     • Thread 2: gets scheduled, starts running
  42. Thread Switching
     • Tricky case: Thread 1 runs until preempted
       [Diagram: Thread 1 is preempted and releases the GIL while Thread 2 is READY.
       Which thread does pthreads/OS run next? ???]
  43. Thread Switching
     • You might expect that Thread 2 will run
       [Diagram: Thread 1 is preempted, releases the GIL and becomes READY; pthreads/OS
       schedules Thread 2, which acquires the GIL and runs]
     • But that assumes the GIL plays nice...
  44. Thread Switching
     • What might actually happen on multicore
       [Diagram: Thread 1 is preempted, releases the GIL, immediately reacquires it and
       keeps running; pthreads/OS schedules Thread 2, but its acquire fails (GIL locked)
       and it stays READY]
     • Both threads attempt to run simultaneously
     • ... but only one will succeed (depends on timing)
  45. Fallacy
     • This code doesn't actually switch threads:
         if (must_release_gil) {
             GIL_release();
             /* Other threads may run now */
             GIL_acquire();
         }
     • It might switch threads, but it depends on:
         • What operating system
         • # cores
         • Lock scheduling policy (if any)
  46. Fallacy
     • This doesn't force switching either (sleeping):
         if (must_release_gil) {
             GIL_release();
             sleep(0);
             /* Other threads may run now */
             GIL_acquire();
         }
     • It might switch threads, but it depends on:
         • What operating system
         • # cores
         • Lock scheduling policy (if any)
  47. Fallacy
     • Neither does this (calling the scheduler):
         if (must_release_gil) {
             GIL_release();
             sched_yield();
             /* Other threads may run now */
             GIL_acquire();
         }
     • It might switch threads, but it depends on:
         • What operating system
         • # cores
         • Lock scheduling policy (if any)
  48. A Conflict
     • There are conflicting goals
         • Python/Ruby: wants to run on a single CPU, but doesn't want to do thread
           scheduling (i.e., let the OS do it)
         • OS: "Oooh. Multiple cores." Schedules as many runnable tasks as possible at any instant
     • Result: threads fight with each other
  49. Multicore GIL Battle
     • Python 2.7 on OS-X (4 cores)
         Sequential           : 6.12s
         Threaded (2 threads) : 9.28s   (1.5x slower!)
       [Diagram: Thread 1 runs 100 ticks, is preempted, releases and immediately reacquires
       the GIL, over and over; each time pthreads/OS schedules Thread 2, its acquire fails
       and it stays READY, until it eventually runs]
     • Millions of failed GIL acquisitions
  50. Multicore GIL Battle
     • You can see it! (2 CPU-bound threads)
       [CPU monitor screenshot] Why >100%?
     • Comment: in Python, it's very rapid
     • GIL is released every few microseconds!
  51. I/O Handling
     • If there is a CPU-bound thread, I/O-bound threads have a hard time getting the GIL
         Thread 1 (CPU 1)        Thread 2 (CPU 2)
         run                     sleep
                                 <network packet arrives>
         preempt
         run                     acquire GIL (fails)
         preempt
         run                     acquire GIL (fails)     <- might repeat 100s-1000s of times
         preempt
         run                     acquire GIL (fails)
         preempt                 acquire GIL (success)
                                 run
  52. Messaging Pathology
     • Messaging on Linux (8 cores)
         Ruby 1.9 (no threads)   : 1.18s
         Ruby 1.9 (1 CPU thread) : 5839.4s
     • Locks in Linux have no fairness
     • Consequence: really hard to steal the GIL
     • And Ruby only retries every 10ms
  53. Let's Talk Fairness
     • Fair locking means that locks have some notion of priorities, arrival order, queuing, etc.
         Before release:  running: t0 (holds lock)    waiting: t1 t2 t3 t4 t5
         After release:   running: t1 (holds lock)    waiting: t2 t3 t4 t5 t0
     • Releasing means you go to the end of the line
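A fair (FIFO) lock can be built from a ticket counter, which makes "go to the end of the line" literal. This hypothetical `FairLock` only illustrates the policy; it is not how any particular OS implements fair mutexes.

```python
import threading

class FairLock:
    """Ticket lock: waiters are served strictly in arrival order, so a thread
    that releases and immediately reacquires goes to the back of the line."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0   # next ticket to hand out
        self._now_serving = 0   # ticket currently allowed to hold the lock

    def acquire(self):
        with self._cond:
            my_ticket = self._next_ticket
            self._next_ticket += 1
            while my_ticket != self._now_serving:
                self._cond.wait()

    def release(self):
        with self._cond:
            self._now_serving += 1
            # Wake everyone; only the holder of the next ticket proceeds.
            self._cond.notify_all()
```

Because every release hands the lock to the oldest waiter, a thread doing `release()` then `acquire()` always waits behind everyone else. That guarantees the I/O thread its turn, at the price of a context switch on every release.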
  54-57. Effect of Fair Locking
     • Ruby 1.9 (multiple cores)
         Messages + 1 CPU thread (OS-X)  : 42.0s     (fair)
         Messages + 1 CPU thread (Linux) : 5839.4s
     • Benefit: I/O threads get their turn (yay!)
     • Python 2.7 (multiple cores)
         2 CPU-bound threads (OS-X)    : 9.28s
         2 CPU-bound threads (Windows) : 63.0s   (fair)
     • Problem: too much context switching
  58. Fair Locking - Bah!
     • In reality, you don't want fairness
     • Messaging revisited (OS-X, 4 cores)
         Ruby 1.9 (no threads)         : 1.29s
         Ruby 1.9 (1 CPU-bound thread) : 42.0s   (33x slower)
     • Why is it still 33x slower?
     • Answer: fair locking! (and convoying)
  59. Messaging Revisited
     • Go back to the messaging server:
         def server():
             while True:
                 msg = recv()
                 send(msg)
  60. Messaging Revisited
     • The actual implementation (size-prefixed messages):
         def server():
             while True:
                 size = recv(4)
                 msg = recv(size)
                 send(size)
                 send(msg)
  61. Performance Explained
     • What actually happens under the covers:
         def server():
             while True:
                 size = recv(4)      # GIL release
                 msg = recv(size)    # GIL release
                 send(size)          # GIL release
                 send(msg)           # GIL release
     • Why? Each operation might block
     • Catch: each release passes control back to the CPU-bound thread
  62. Performance Illustrated
     [Timeline: a timer fires every 10ms and the CPU-bound thread runs between ticks; after
     data arrives, the I/O thread gets one operation per tick: recv, recv, send, send, done]
     • Each message has a 40ms response cycle
     • 1000 messages x 40ms = 40s (42.0s measured)
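The 40s figure is straightforward multiplication. The hypothetical helper below makes the model explicit: it assumes every blocking call costs one full switching interval and ignores the client's 1ms sleep and the actual I/O time. With the same four blocking calls per message and a 5ms interval it predicts about 20s, in line with the Python 3.2 measurement reported later in the deck.

```python
def convoy_estimate(nummsg, blocking_ops, interval):
    """Total seconds if each blocking I/O call hands the GIL to a CPU-bound
    thread and the I/O thread gets it back only after a full interval."""
    return nummsg * blocking_ops * interval

# The server makes 4 blocking calls per message: recv(4), recv(size), send(size), send(msg)
print(round(convoy_estimate(1000, 4, 0.010), 1))  # Ruby 1.9, 10ms timer -> 40.0
print(round(convoy_estimate(1000, 4, 0.005), 1))  # Python 3.2, 5ms interval -> 20.0
```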
  63. Despair
  64. A Solution? Don't use threads!
     • Yes, yes, everyone hates threads
     • However, that's only because they're useful!
     • Threads are used for all sorts of things
     • Even if they're hidden behind the scenes
  65. A Better Solution: Make the GIL better
     • It's probably not going away (very difficult)
     • However, does it have to thrash wildly?
     • Question: Can you do anything?
  66. GIL Efforts in Python 3
     • Python 3.2 has a new GIL implementation
     • It's imperfect; in fact, it has a lot of problems
     • However, people are experimenting with it
  67. Python 3 GIL
     • GIL acquisition is now based on timeouts
       [Diagram: Thread 1 is running; data arrives for Thread 2, which goes from IOWAIT to
       READY and calls wait(gil, TIMEOUT); after 5ms it times out and sets drop_request,
       Thread 1 releases the GIL, and Thread 2 runs]
     • Involves waiting on a condition variable
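The timeout scheme can be modeled roughly as follows. This is a sketch loosely after the design on the slide, not CPython's C implementation: `TimeoutGIL`, `yield_gil` and `worker` are hypothetical names. A waiter sleeps on a condition variable with a 5ms timeout; if the GIL is still held when the timeout expires, it sets `drop_request`, and the holder hands the GIL over and waits until another thread has actually taken it. CPython 3 does expose the real interval through `sys.getswitchinterval()` and `sys.setswitchinterval()`.

```python
import sys
import threading

class TimeoutGIL:
    def __init__(self, interval=0.005):
        self.interval = interval      # 5ms, as on the slide
        self.locked = False
        self.drop_request = False
        self.switches = 0             # bumped on every successful acquire
        self.cond = threading.Condition()

    def acquire(self):
        with self.cond:
            while self.locked:
                timed_out = not self.cond.wait(self.interval)
                if timed_out and self.locked:
                    self.drop_request = True  # ask the holder to let go
            self.locked = True
            self.drop_request = False
            self.switches += 1
            self.cond.notify_all()            # wake a holder blocked in yield_gil()

    def release(self):
        with self.cond:
            self.locked = False
            self.cond.notify_all()

    def yield_gil(self):
        # Called by the holder when drop_request is set: release, then block
        # until some other thread has really taken the GIL (forced switch).
        with self.cond:
            seen = self.switches
            self.locked = False
            self.cond.notify_all()
            while self.switches == seen:
                self.cond.wait()
        self.acquire()

def worker(gil, steps, counts, name):
    gil.acquire()
    for _ in range(steps):
        counts[name] += 1             # stand-in for one VM instruction
        if gil.drop_request:          # polled between "instructions"
            gil.yield_gil()
    gil.release()

print(sys.getswitchinterval())  # 0.005 unless changed with sys.setswitchinterval()
```

The model shares the convoy problem from the slides: an I/O thread that blocks and wakes up again still has to wait out a full timeout before a CPU-bound holder is asked to let go.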
  68. Problem: Convoying
     • CPU-bound threads significantly degrade I/O
       [Diagram: each time data arrives, Thread 2 goes READY and waits 5ms before
       Thread 1 releases the GIL; Thread 2 runs briefly, then Thread 1 resumes]
     • This is the same problem as in Ruby
     • Just a shorter time delay (5ms)
  69. Problem: Convoying
     • You can directly observe the delays (messaging)
         Python/Ruby (no threads) : 1.29s   (no delays)
         Python 3.2 (1 thread)    : 20.1s   (5ms delays)
         Ruby 1.9 (1 thread)      : 42.0s   (10ms delays)
     • Still not great, but the problem is understood
  70. Promise
  71. Priorities
     • Best promise: priority scheduling
     • Earlier versions of Ruby had it
     • It works (OS-X, 4 cores)
         Ruby 1.9 (1 thread)                   : 42.0s
         Ruby 1.8.7 (1 thread)                 : 40.2s
         Ruby 1.8.7 (1 thread, lower priority) : 10.0s
     • Comment: Ruby 1.9 allows thread priorities to be set in pthreads, but it doesn't
       seem to have much (if any) effect
  72. Priorities
     • Experimental Python 3.2 with a priority scheduler
     • Also features immediate preemption
     • Messages (OS-X, 4 cores)
         Python 3.2 (no threads)            : 1.29s
         Python 3.2 (1 thread)              : 20.2s
         Python 3.2 + priorities (1 thread) : 1.21s   (faster?)
     • That's a lot more promising!
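A priority-ordered lock can be sketched with a heap of waiters. `PriorityGIL` is a hypothetical illustration, not the experimental Python 3.2 scheduler from the slide: a lower number means higher priority, ties are served in arrival order, and there is no immediate preemption (a low-priority holder keeps the lock until it releases).

```python
import heapq
import itertools
import threading

class PriorityGIL:
    """On release, the waiting thread with the lowest priority number goes next."""

    def __init__(self):
        self._cond = threading.Condition()
        self._locked = False
        self._waiters = []               # heap of (priority, seq) entries
        self._seq = itertools.count()    # tie-breaker: FIFO within a priority

    def acquire(self, priority):
        with self._cond:
            me = (priority, next(self._seq))
            heapq.heappush(self._waiters, me)
            # Proceed only when the lock is free and we are first in the heap.
            while self._locked or self._waiters[0] != me:
                self._cond.wait()
            heapq.heappop(self._waiters)
            self._locked = True

    def release(self):
        with self._cond:
            self._locked = False
            self._cond.notify_all()      # the best-ranked waiter will proceed
```

In the messaging scenario, the I/O thread would acquire with a small number (say 0) and the CPU-bound thread with a larger one, so the I/O thread wins every contended handoff. Starvation of the CPU-bound thread is then possible, which is one of the new problems listed next.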
  73. New Problems
     • Priorities bring new challenges
         • Starvation
         • Priority inversion
         • Implementation complexity
     • Do you have to write a full OS scheduler?
     • Hopefully not, but it's an open question
  74. Final Words
     • Implementing a GIL is a lot trickier than it looks
     • Even work with priorities has problems
     • Good example of how multicore is diabolical
  75. Thanks for Listening!
     • I hope you learned at least one new thing
     • I'm always interested in feedback
     • Follow me on Twitter (@dabeaz)