Python threads: Dive into GIL!PyCon India 2011Pune, Sept 16-18Vishal Kanaujia & Chetan Giridhar
Python: A multithreading example
Setting up the context!!Hmm…My threads should be twice as faster on dual cores!57 % dip in Execution Time on dual core!!
Getting into the problem space!!Python v2.7Getting in to the Problem Space!!GIL
Python ThreadsReal system threads (POSIX/ Windows threads)Python VM has no intelligence of thread management No thread priorities, pre-emptionNative operative system supervises thread schedulingPython interpreter just does the per-thread bookkeeping.
What’s wrong with Py threads?Each ‘running’ thread requires exclusive access to data structures in Python interpreterGlobal interpreter lock (GIL) provides the synchronization (bookkeeping) GIL is necessary, mainly because CPython's memory management is not thread-safe.
GIL: Code DetailsA thread create request in Python is just a pthread_create() callThe function Py_Initialize() creates the GILGIL is simply a synchronization primitive, and can be implemented with a semaphore/ mutex.../Python/ceval.cstatic PyThread_type_lockinterpreter_lock = 0; /* This is the GIL */A “runnable” thread acquires this lock and start execution
GIL ManagementHow does Python manages GIL?Python interpreter regularly performs a    check on the running threadAccomplishes thread switching and signal handlingWhat is a “check”?A counter of ticks; ticks decrement as a thread runs
A tick maps to a Python VM’s byte-code instructions
Check interval can be set with sys.setcheckinterval (interval)
Check dictate CPU time-slice available to a threadGIL Management: ImplementationInvolves two global varibales:PyAPI_DATA(volatile int) _Py_Ticker;PyAPI_DATA(int) _Py_CheckInterval;As soon as ticks reach zero:Refill the ticker
active thread releases the GIL
Signals sleeping threads to wake up
Everyone competes for GILFile: ../ceval.cPyEval_EvalFrameEx () {           --_Py_Ticker;  /* Decrement ticker */          /* Refill it */           _Py_Ticker = _Py_CheckInterval;      if (interpreter_lock) {PyThread_release_lock(interpreter_lock);                /* Other threads may run now */PyThread_acquire_lock(interpreter_lock, 1);      }  }
 Two CPU bound threads on single core machineGIL releasedwaiting for GILthread1RunningSuspendedSignals thread2GIL acquiredwaiting for GILthread2SuspendedWakeupRunningtimecheck1check2
GIL impactThere is considerable time lag with Communication (Signaling)Thread wake-upGIL acquisitionGIL Management: Independent of host/native OS schedulingResultSignificant overheadThread waits if GIL in unavailableThreads run sequentially, rather than concurrentlyTry Ctrl + C. Does it stop execution?
Curious case of multicore systemthread1, core0threadthread2, core1time Conflicting goals of OS scheduler and Python interpreter
 Host OS can schedule threads concurrently on multi-core
 GIL battleThe ‘Priority inversion’In a [CPU, I/O]-bound mixed application, I/O bound thread may starve! “cache-hotness” may influence the new GIL owner; usually the recent owner!Preferring CPU thread over I/O threadPython presents a priority inversion on multi-core systems.
Understanding the new story!!Python v3.2Getting in to the Problem Space!!GIL
New GIL: Python v3.2Regular “check” are discontinuedWe have new time-out mechanism.Default time-out= 5msConfigurable through sys.setswitchinterval()For every time-out, the current GIL holder is forced to release GIL It then signals the other waiting threadsWaits for a signal from new GIL owner (acknowledgement).A sleeping thread wakes up, acquires the GIL, and signals the last owner.
Curious Case of Multicore: Python v3.2CPU Thread core0timeWaiting for the GILGIL releasedRunningWaitSuspendedlangissignalTime Outwaiting for GILGIL acquiredSuspendedWakeupRunningCPU Thread core1
Positive impact with new GILBetter GIL arbitrationEnsures that a thread runs only for 5msLess context switching and fewer signalsMulticore perspective: GIL battle eliminated!More responsive threads (fair scheduling)All iz well
I/O threads in PythonAn interesting optimization by interpreterI/O calls are assumed blockingPython I/O extensively exercise this optimization  with file, socket ops (e.g. read, write, send, recv calls)./Python3.2.1/Include/ceval.hPy_BEGIN_ALLOW_THREADS         Do some blocking I/O operation ...Py_END_ALLOW_THREADSI/O thread always releases the GIL
Convoy effect: Fallout of I/O optimizationWhen an I/O thread releases the GIL, another ‘runnable’ CPU bound thread can acquire it (remember we are on multiple cores).It leaves the I/O thread waiting for another time-out (default: 5ms)!Once CPU thread releases GIL, I/O thread acquires and releases it again This cycle goes on => performance suffers 
Convoy “in” effect	Convoy effect- observed in an application comprising I/O-bound  and CPU-bound threadsGIL releasedGIL releasedI/O Thread core0GIL acquiredTime OutRunningWaitRunningWaitSuspendedwaiting for GILGIL acquiredSuspendedRunningWaitSuspendedCPU Thread core1time
Performance measurements!Curious to know how convoy effect translates into performance numbersWe performed following tests with Python3.2:An application with a CPU and a I/O threadExecuted on a dual core machineCPU thread spends less than few seconds (<10s)!
Comparing: Python 2.7 & Python 3.2Performance dip still observed in dual cores 
Getting into the problem space!!Python v2.7 / v3.2Getting in to the Problem Space!!GILSolution space!!
GIL free world: JythonJython is free of GIL It can fully exploit multiple cores, as per our experimentsExperiments with Jython2.5Run with two CPU threads in tandemExperiment shows performance improvement on a multi-core system
Avoiding GIL impact with multiprocessingmultiprocessing — Process-based “threading” interface“multiprocessing” module spawns a new Python interpreter instance for a process.Each process is independent, and GIL is irrelevant; Utilizes multiple cores better than threads.Shares API with “threading” module.Cool! 40 % improvement in Execution Time on dual core!! 
ConclusionMulti-core systems are becoming ubiquitousPython applications should exploit this abundant powerCPython inherently suffers the GIL limitationAn intelligent awareness of Python interpreter behavior is helpful in developing multi-threaded applicationsUnderstand and use 
QuestionsThank you for your time and attention Please share your feedback/ comments/ suggestions to us at:cjgiridhar@gmail.com ,                http://technobeans.comvishalkanaujia@gmail.com,        http://freethreads.wordpress.com
ReferencesUnderstanding the Python GIL, http://dabeaz.com/talks.htmlGlobalInterpreterLock, http://wiki.python.org/moin/GlobalInterpreterLockThread State and the Global Interpreter Lock, http://docs.python.org/c-api/init.html#threadsPython v3.2.2 and v2.7.2 documentation, http://docs.python.org/Concurrency and Python, http://drdobbs.com/open-source/206103078?pgno=3
Backup slides

Pycon11: Python threads: Dive into GIL!

  • 1.
    Python threads: Diveinto GIL!PyCon India 2011Pune, Sept 16-18Vishal Kanaujia & Chetan Giridhar
  • 2.
  • 3.
    Setting up thecontext!!Hmm…My threads should be twice as faster on dual cores!57 % dip in Execution Time on dual core!!
  • 4.
    Getting into theproblem space!!Python v2.7Getting in to the Problem Space!!GIL
  • 5.
    Python ThreadsReal systemthreads (POSIX/ Windows threads)Python VM has no intelligence of thread management No thread priorities, pre-emptionNative operative system supervises thread schedulingPython interpreter just does the per-thread bookkeeping.
  • 6.
    What’s wrong withPy threads?Each ‘running’ thread requires exclusive access to data structures in Python interpreterGlobal interpreter lock (GIL) provides the synchronization (bookkeeping) GIL is necessary, mainly because CPython's memory management is not thread-safe.
  • 7.
    GIL: Code DetailsAthread create request in Python is just a pthread_create() callThe function Py_Initialize() creates the GILGIL is simply a synchronization primitive, and can be implemented with a semaphore/ mutex.../Python/ceval.cstatic PyThread_type_lockinterpreter_lock = 0; /* This is the GIL */A “runnable” thread acquires this lock and start execution
  • 8.
    GIL ManagementHow doesPython manages GIL?Python interpreter regularly performs a check on the running threadAccomplishes thread switching and signal handlingWhat is a “check”?A counter of ticks; ticks decrement as a thread runs
  • 9.
    A tick mapsto a Python VM’s byte-code instructions
  • 10.
    Check interval canbe set with sys.setcheckinterval (interval)
  • 11.
    Check dictate CPUtime-slice available to a threadGIL Management: ImplementationInvolves two global varibales:PyAPI_DATA(volatile int) _Py_Ticker;PyAPI_DATA(int) _Py_CheckInterval;As soon as ticks reach zero:Refill the ticker
  • 12.
  • 13.
  • 14.
    Everyone competes forGILFile: ../ceval.cPyEval_EvalFrameEx () { --_Py_Ticker; /* Decrement ticker */ /* Refill it */ _Py_Ticker = _Py_CheckInterval; if (interpreter_lock) {PyThread_release_lock(interpreter_lock); /* Other threads may run now */PyThread_acquire_lock(interpreter_lock, 1); } }
  • 15.
    Two CPUbound threads on single core machineGIL releasedwaiting for GILthread1RunningSuspendedSignals thread2GIL acquiredwaiting for GILthread2SuspendedWakeupRunningtimecheck1check2
  • 16.
    GIL impactThere isconsiderable time lag with Communication (Signaling)Thread wake-upGIL acquisitionGIL Management: Independent of host/native OS schedulingResultSignificant overheadThread waits if GIL in unavailableThreads run sequentially, rather than concurrentlyTry Ctrl + C. Does it stop execution?
  • 17.
    Curious case ofmulticore systemthread1, core0threadthread2, core1time Conflicting goals of OS scheduler and Python interpreter
  • 18.
    Host OScan schedule threads concurrently on multi-core
  • 19.
    GIL battleThe‘Priority inversion’In a [CPU, I/O]-bound mixed application, I/O bound thread may starve! “cache-hotness” may influence the new GIL owner; usually the recent owner!Preferring CPU thread over I/O threadPython presents a priority inversion on multi-core systems.
  • 20.
    Understanding the newstory!!Python v3.2Getting in to the Problem Space!!GIL
  • 21.
    New GIL: Pythonv3.2Regular “check” are discontinuedWe have new time-out mechanism.Default time-out= 5msConfigurable through sys.setswitchinterval()For every time-out, the current GIL holder is forced to release GIL It then signals the other waiting threadsWaits for a signal from new GIL owner (acknowledgement).A sleeping thread wakes up, acquires the GIL, and signals the last owner.
  • 22.
    Curious Case ofMulticore: Python v3.2CPU Thread core0timeWaiting for the GILGIL releasedRunningWaitSuspendedlangissignalTime Outwaiting for GILGIL acquiredSuspendedWakeupRunningCPU Thread core1
  • 23.
    Positive impact withnew GILBetter GIL arbitrationEnsures that a thread runs only for 5msLess context switching and fewer signalsMulticore perspective: GIL battle eliminated!More responsive threads (fair scheduling)All iz well
  • 24.
    I/O threads inPythonAn interesting optimization by interpreterI/O calls are assumed blockingPython I/O extensively exercise this optimization with file, socket ops (e.g. read, write, send, recv calls)./Python3.2.1/Include/ceval.hPy_BEGIN_ALLOW_THREADS Do some blocking I/O operation ...Py_END_ALLOW_THREADSI/O thread always releases the GIL
  • 25.
    Convoy effect: Falloutof I/O optimizationWhen an I/O thread releases the GIL, another ‘runnable’ CPU bound thread can acquire it (remember we are on multiple cores).It leaves the I/O thread waiting for another time-out (default: 5ms)!Once CPU thread releases GIL, I/O thread acquires and releases it again This cycle goes on => performance suffers 
  • 26.
    Convoy “in” effect Convoyeffect- observed in an application comprising I/O-bound and CPU-bound threadsGIL releasedGIL releasedI/O Thread core0GIL acquiredTime OutRunningWaitRunningWaitSuspendedwaiting for GILGIL acquiredSuspendedRunningWaitSuspendedCPU Thread core1time
  • 27.
    Performance measurements!Curious toknow how convoy effect translates into performance numbersWe performed following tests with Python3.2:An application with a CPU and a I/O threadExecuted on a dual core machineCPU thread spends less than few seconds (<10s)!
  • 28.
    Comparing: Python 2.7& Python 3.2Performance dip still observed in dual cores 
  • 29.
    Getting into theproblem space!!Python v2.7 / v3.2Getting in to the Problem Space!!GILSolution space!!
  • 30.
    GIL free world:JythonJython is free of GIL It can fully exploit multiple cores, as per our experimentsExperiments with Jython2.5Run with two CPU threads in tandemExperiment shows performance improvement on a multi-core system
  • 31.
    Avoiding GIL impactwith multiprocessingmultiprocessing — Process-based “threading” interface“multiprocessing” module spawns a new Python interpreter instance for a process.Each process is independent, and GIL is irrelevant; Utilizes multiple cores better than threads.Shares API with “threading” module.Cool! 40 % improvement in Execution Time on dual core!! 
  • 32.
    ConclusionMulti-core systems arebecoming ubiquitousPython applications should exploit this abundant powerCPython inherently suffers the GIL limitationAn intelligent awareness of Python interpreter behavior is helpful in developing multi-threaded applicationsUnderstand and use 
  • 33.
    QuestionsThank you foryour time and attention Please share your feedback/ comments/ suggestions to us at:cjgiridhar@gmail.com , http://technobeans.comvishalkanaujia@gmail.com, http://freethreads.wordpress.com
  • 34.
    ReferencesUnderstanding the PythonGIL, http://dabeaz.com/talks.htmlGlobalInterpreterLock, http://wiki.python.org/moin/GlobalInterpreterLockThread State and the Global Interpreter Lock, http://docs.python.org/c-api/init.html#threadsPython v3.2.2 and v2.7.2 documentation, http://docs.python.org/Concurrency and Python, http://drdobbs.com/open-source/206103078?pgno=3
  • 35.

Editor's Notes