Talk given at the June 2008 meeting of the New Zealand Python User Group in Auckland.
Outline: An overview of approaches to parallel/concurrent programming in Python.
Code demonstrated in the presentation can be found here:
http://www.kloss-familie.de/moin/TalksPresentations
This document provides an introduction to parallel programming using Python. It discusses the motivation for parallel programming by utilizing multiple CPU cores simultaneously. The two main approaches in Python are forking processes using os.fork and spawning threads. It provides examples of forking processes and using threads via the _thread and threading modules. It also discusses challenges like synchronizing access to shared objects and introduces solutions like the multiprocessing module for interprocess communication.
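The fork-based approach mentioned above can be sketched as follows. This is a minimal illustration, not code from the talk; note that os.fork is POSIX-only and will not work on Windows:

```python
import os

def forked_work():
    """Fork a child process; parent and child then run concurrently."""
    pid = os.fork()  # returns 0 in the child, the child's pid in the parent
    if pid == 0:
        # Child process: do some work, then exit without parent-side cleanup
        print("child: pid", os.getpid())
        os._exit(0)
    else:
        # Parent process: wait for the child to finish, then return its pid
        os.waitpid(pid, 0)
        return pid

if __name__ == "__main__":
    child_pid = forked_work()
    print("parent: reaped child", child_pid)
```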
This document provides an overview of concurrency in Python using multiprocessing and threading. It begins by introducing the speaker and defining key terms like concurrency, threads, and processes. It then discusses the benefits and use cases of threads versus processes. The document also covers the Global Interpreter Lock (GIL) in Python and how multiprocessing can help avoid it. It provides an example benchmark showing multiprocessing can significantly outperform threading for CPU-bound tasks. Finally, it discusses key aspects of Python's multiprocessing module like Process, Queue, Pool, and Manager classes.
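As a hedged sketch of the Pool usage mentioned above (a hypothetical CPU-bound task, not the talk's own benchmark): each task runs in a separate process, so the GIL does not serialize the work.

```python
from multiprocessing import Pool

def cpu_bound(n):
    """A CPU-bound task: sum of squares below n."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # map distributes the tasks across the worker processes
        results = pool.map(cpu_bound, [10_000] * 8)
    print(results[0])
```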
Concurrency and parallelism in Python are always hot topics. This talk will look at the variety of forms of concurrency and parallelism. In particular, it will give an overview of the forms of message-passing concurrency that have become popular in languages like Scala and Go. A Python library called python-csp, which implements similar ideas in a Pythonic way, will be introduced, and we will look at how this style of programming can be used to avoid deadlocks, race hazards and "callback hell".
Do more than one thing at the same time, the Python way (Jaime Buelta)
The document discusses doing more than one thing at a time in Python using threads and processes. It describes how to create threads using the threading module and processes using the multiprocessing module. While threads are easier to use, the Global Interpreter Lock (GIL) in Python prevents true parallelism. Processes can better utilize multiple CPUs but require more work for communication. Asynchronous programming is recommended for I/O-bound tasks while processes are better for CPU-bound work. The talk cautions that threading should be used carefully in Python due to the GIL.
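A minimal sketch of creating threads with the threading module, assuming a simulated I/O-bound task (time.sleep stands in for network latency; real code might call urllib or requests):

```python
import threading
import time

def fetch(url, results, index):
    """Simulated I/O-bound task writing its result into a shared list slot."""
    time.sleep(0.1)  # stand-in for waiting on the network
    results[index] = f"done: {url}"

def run_all(urls):
    results = [None] * len(urls)
    threads = [threading.Thread(target=fetch, args=(u, results, i))
               for i, u in enumerate(urls)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the waits overlap, so total time is ~0.1 s, not 0.1 s per URL
    return results

if __name__ == "__main__":
    print(run_all(["a", "b", "c"]))
```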
This document summarizes Gavin M. Roy's presentation on concurrency with multiprocessing in Python. It discusses using threads via the threading module, issues with the Global Interpreter Lock (GIL) in Python, and how to use the multiprocessing module to achieve true parallelism across multiple processes. It provides examples of creating threads and processes that run concurrently and examples of how to share objects between processes using connections, queues, pipes, managers and reduction tools.
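One of the sharing mechanisms listed above, a multiprocessing.Queue, can be sketched as follows (an illustrative toy, not an example from the presentation):

```python
from multiprocessing import Process, Queue

def worker(q, n):
    """Child process: push a result back to the parent through the shared queue."""
    q.put(n * n)

if __name__ == "__main__":
    q = Queue()
    procs = [Process(target=worker, args=(q, n)) for n in range(4)]
    for p in procs:
        p.start()
    # Order of arrival is not deterministic, so sort for a stable result
    results = sorted(q.get() for _ in procs)
    for p in procs:
        p.join()
    print(results)
```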
Presentation given on Monday 10 September at the ROOT Users' Workshop 2018 in Sarajevo. Progress update on the Automated Parallel Computation of Collaborative Statistical Models project, a collaboration between the Netherlands eScience Center and Nikhef.
We present an update on our recent efforts to further parallelize RooFit. We have performed extensive benchmarks and identified at least three bottlenecks that will benefit from parallelization. To tackle these and possible future bottlenecks, we designed a parallelization layer that allows us to parallelize existing classes with minimal effort, but with high performance and retaining as much of the existing class's interface as possible. The high-level parallelization model is a task-stealing approach. The implementation is currently based on the bi-directional memory mapped pipe (BidirMMapPipe), but could in the future be replaced by other modes of communication between processes.
This presentation summarizes the talks about TensorFlow.Data and TensorFlow.Hub from TensorFlow Dev Summit 2018, and was presented at TensorFlow Dev Summit Extended Seoul '18, held on April 14, 2018 in Seoul.
Writing concurrent code is becoming more and more important to leverage the parallelism of multicore architectures. The C++11 library introduced futures and promises as a first step towards task-based programming. However, C++ support for concurrency is still very limited. Other languages, like C# and Python, provide some form of resumable functions or coroutines, and in C# the async/await pattern makes it possible to write functions that suspend their execution while waiting for a computation or I/O to complete. This talk will describe a proposal for the addition of resumable functions and async/await to C++17. We will focus on the implementation of resumable functions on Windows, and we'll play with a first prototype of their implementation in the Visual Studio 2015 Preview. Finally, we will see how resumable functions can also be used to implement (lazy) generators, similar to the ones provided by "yield" statements in C#.
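Python's generators already provide the lazy, yield-based suspension the talk compares against; a minimal sketch of the idea in Python:

```python
from itertools import islice

def fibonacci():
    """A lazy generator: execution suspends at each yield and resumes on demand."""
    a, b = 0, 1
    while True:
        yield a          # suspend here; resume when the caller asks for the next value
        a, b = b, a + b

# Values are computed only as they are consumed
first_ten = list(islice(fibonacci(), 10))
print(first_ten)
```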
Dark Silicon, Mobile Devices, and Possible Open-Source Solutions (Koan-Sin Tan)
This document summarizes a presentation about dark silicon in mobile devices and possible open source solutions. It discusses how power and thermal constraints are more severe for mobile devices due to limited battery progress and no fans. It also covers big.LITTLE scheduling, thread-level parallelism challenges, and user-level threading libraries like AsyncTask. Finally, it notes that while some open source parallel programming frameworks exist, fully utilizing parallelism on mobile and addressing dark silicon remain challenges with no widely adopted solutions.
Some of the biggest issues at the center of analyzing large amounts of data are query flexibility, latency, and fault tolerance. Modern technologies that build upon the success of “big data” platforms, such as Apache Hadoop, have made it possible to spread the load of data analysis to commodity machines, but these analyses can still take hours to run and do not respond well to rapidly-changing data sets.
A new generation of data processing platforms -- which we call “stream architectures” -- have converted data sources into streams of data that can be processed and analyzed in real-time. This has led to the development of various distributed real-time computation frameworks (e.g. Apache Storm) and multi-consumer data integration technologies (e.g. Apache Kafka). Together, they offer a way to do predictable computation on real-time data streams.
In this talk, we will give an overview of these technologies and how they fit into the Python ecosystem. As part of this presentation, we also released streamparse, a new Python library that makes it easy to debug and run large Storm clusters.
Links:
* http://parse.ly/code
* https://github.com/Parsely/streamparse
* https://github.com/getsamsa/samsa
The document discusses attention mechanisms and their implementation in TensorFlow. It begins with an overview of attention mechanisms and their use in neural machine translation. It then reviews the code implementation of an attention mechanism for neural machine translation from English to French using TensorFlow. Finally, it briefly discusses pointer networks, an attention mechanism variant, and code implementation of pointer networks for solving sorting problems.
This document discusses using TensorFlow on Android. It begins by introducing TensorFlow and how it works as a dataflow graph. It then discusses efforts to optimize TensorFlow for mobile and embedded devices through techniques like quantization and models like MobileNet that use depthwise separable convolutions. It shares experiences building and running TensorFlow models on Android, including benchmarking an Inception model and building a label_image demo. It also compares TensorFlow mobile efforts to other mobile deep learning frameworks like CoreML and the upcoming Android Neural Networks API.
Java and the machine - Martijn Verburg and Kirk Pepperdine (JAX London)
In Terminator 3 - Rise of the Machines, bare metal comes back to haunt humanity, ruthlessly crushing all resistance. This keynote is here to warn you that the same thing is happening to Java and the JVM! Java was designed in a world where there were a wide range of hardware platforms to support. Its premise of Write Once Run Anywhere (WORA) proved to be one of the compelling reasons behind Java's dominance (even if the reality didn't quite meet the marketing hype). However, this WORA property means that Java and the JVM struggled to utilise specialist hardware and operating system features that could make a massive difference in the performance of your application. This problem has recently gotten much, much worse. Due to the rise of multi-core processors, massive increases in main memory and enhancements to other major hardware components (e.g. SSD), the JVM is now distant from utilising that hardware, causing some major performance and scalability issues! Kirk Pepperdine and Martijn Verburg will take you through the complexities of where Java meets the machine and loses. They'll give up some of their hard-won insights on how to work around these issues so that you can plan to avoid termination, unlike some of the poor souls that ran into the T-800...
Python Training in Bangalore | Multi threading | Learnbay.in (Learnbayin)
Learn multithreading in Python: how to create threads in Python, and about race conditions.
Learnbay provides Python training in Bangalore for network automation.
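A standard illustration of the race condition mentioned above and its fix with threading.Lock (a generic textbook example, not Learnbay's course material):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    """Each 'counter += 1' is a read-modify-write; without the lock, concurrent
    threads can interleave those steps and lose updates (a race condition)."""
    global counter
    for _ in range(n):
        with lock:       # remove this lock to observe lost updates
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # with the lock this is always 400000
```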
Essentials of Multithreaded System Programming in C++ (Shuo Chen)
This document discusses challenges in multithreaded system programming in C++. It covers topics such as thread safety of libraries, RAII and fork(), signals and threads, and operating file descriptors in threads. The document is intended for C++ programmers familiar with threads and aims to explain interactions between threads and system calls/libraries to avoid common issues.
This document provides an overview and introduction to Muduo, a C++ network programming library for Linux. Some key points:
- Muduo is a non-blocking, event-driven, multi-core ready C++ network library that aims to provide high performance and modern features.
- The document discusses challenges with network programming using sockets APIs directly and how a library like Muduo can help abstract away complexity.
- It covers core concepts in non-blocking and event-driven network programming used by Muduo like the event loop, callbacks, and lifetime management of connection objects.
- Examples are provided of how Muduo implements patterns like chat servers, and comparisons are made to other libraries.
The document discusses intra-machine parallelism and threaded programming. It introduces key concepts like threads, processes, synchronization constructs (locks and condition variables), and challenges like overhead and Amdahl's law. An example of domain decomposition for parallel rendering is presented to demonstrate how to divide a problem into independent tasks and assign them to threads.
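The domain-decomposition idea can be sketched in Python: split the data into independent chunks, one per thread, and combine the partial results at the end (a toy summation rather than the rendering example from the talk):

```python
import threading

def partial_sum(data, lo, hi, out, i):
    """Each thread works on its own slice; the only shared write is out[i]."""
    out[i] = sum(data[lo:hi])

def decomposed_sum(data, nthreads=4):
    # Domain decomposition: split the index range into contiguous chunks
    chunk = (len(data) + nthreads - 1) // nthreads
    out = [0] * nthreads
    threads = []
    for i in range(nthreads):
        lo, hi = i * chunk, min((i + 1) * chunk, len(data))
        t = threading.Thread(target=partial_sum, args=(data, lo, hi, out, i))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return sum(out)  # combine the independent partial results

print(decomposed_sum(list(range(1000))))
```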
[Harvard CS264] 02 - Parallel Thinking, Architecture, Theory & Patternsnpinto
This document outlines the topics that will be covered in the course on massively parallel computing, including computational thinking skills for parallel programming, hardware limitations and constraints on algorithms, and common parallel programming patterns. The topics include thinking in parallel, computer architecture, programming models, theoretical concepts, and parallel programming patterns. The goal is to provide students with the skills needed to design efficient parallel algorithms that maximize performance on modern parallel hardware.
Efficient logging in multithreaded C++ server (Shuo Chen)
This document discusses efficient logging in multithreaded C++ servers. It describes the muduo logging library which can log over 1 million messages per second with low latency. The key aspects are an efficient LogStream frontend, asynchronous backend using double buffering to pass log messages from threads to a log writer thread without blocking, and writing to local files for performance and reliability.
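The double-buffering idea can be sketched in Python rather than C++ (in muduo, a dedicated writer thread swaps and flushes the buffers on a timer or when a buffer fills; here flush is called explicitly to keep the sketch small):

```python
import threading

class AsyncLogger:
    """Sketch of double buffering: producers append to a front buffer; the
    writer swaps in an empty buffer and flushes the full one in one batch."""

    def __init__(self):
        self.front = []              # buffer currently receiving messages
        self.flushed = []            # stands in for the log file
        self.lock = threading.Lock()

    def log(self, msg):
        with self.lock:              # producers hold the lock only to append
            self.front.append(msg)

    def flush(self):
        with self.lock:              # swap buffers under the lock...
            batch, self.front = self.front, []
        self.flushed.extend(batch)   # ...then write the batch outside it
```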
In this talk, I briefly look back at TensorFlow's development over the past year, then discuss the overall development direction of machine learning frameworks, with an introduction to features that will be added to TensorFlow later on according to its roadmap, and touch on framework trends for 2017 and 2018.
Published on 11 May 2018
Chainer is a deep learning framework which is flexible, intuitive, and powerful.
This slide introduces some unique features of Chainer and its additional packages such as ChainerMN (distributed learning), ChainerCV (computer vision), ChainerRL (reinforcement learning), Chainer Chemistry (biology and chemistry), and ChainerUI (visualization).
This document discusses Zurg, a distributed process management system with a master-slave architecture. The Zurg slave runs on each host and can run commands, start and monitor applications, and collect performance data. It communicates with the Zurg master. Some challenges discussed include reliably detecting when processes exit, limiting output, and ensuring processes are properly restarted if the slave crashes. The master will store status information accessible via web interfaces.
Suggestions:
1) For best quality, download the PDF before viewing.
2) Open at least two windows: One for the Youtube video, one for the screencast (link below), and optionally one for the slides themselves.
3) The Youtube video is shown on the first page of the slide deck, for slides, just skip to page 2.
Screencast: http://youtu.be/VoL7JKJmr2I
Video recording: http://youtu.be/CJRvb8zxRdE (Thanks to Al Friedrich!)
In this talk, we take Deep Learning to task with real world data puzzles to solve.
Data:
- Higgs binary classification dataset (10M rows, 29 cols)
- MNIST 10-class dataset
- Weather categorical dataset
- eBay text classification dataset (8500 cols, 500k rows, 467 classes)
- ECG heartbeat anomaly detection
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
This document discusses designing for concurrency in Golang. It provides two examples of concurrent systems and how they can be modeled using Goroutines. Goroutines are lightweight threads that allow implementing one goroutine per concurrent activity. This approach avoids sharing memory and uses message passing with channels for communication. The document concludes that Goroutines are cheap to create and fast to schedule, enabling designs that don't share memory and have many concurrent Goroutines. Code samples and references for further reading on Golang concurrency patterns are also provided.
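A rough Python analogue of the goroutine-and-channel pattern described above, using queue.Queue in the role of a channel and a None sentinel in place of closing it (Go's real channels and scheduler are far cheaper than OS threads):

```python
import threading
import queue

def producer(ch, items):
    """One worker per concurrent activity; communicate by messages, not shared state."""
    for item in items:
        ch.put(item)
    ch.put(None)  # sentinel: no more items

def consumer(ch, results):
    while True:
        item = ch.get()
        if item is None:
            break
        results.append(item * 2)

ch = queue.Queue()  # plays the role of a Go channel
results = []
threads = [threading.Thread(target=producer, args=(ch, [1, 2, 3])),
           threading.Thread(target=consumer, args=(ch, results))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```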
Caffe (Convolutional Architecture for Fast Feature Embedding) is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors.
Caffe’s expressive architecture encourages application and innovation. Models and optimization are defined by configuration without hard-coding. Switch between CPU and GPU by setting a single flag to train on a GPU machine, then deploy to commodity clusters or mobile devices. Caffe’s extensible code fosters active development. In Caffe’s first year, it has been forked by over 1,000 developers and had many significant changes contributed back. Thanks to these contributors the framework tracks the state-of-the-art in both code and models. Speed makes Caffe perfect for research experiments and industry deployment. Caffe can process over 60M images per day with a single NVIDIA K40 GPU*. That’s 1 ms/image for inference and 4 ms/image for learning. We believe that Caffe is the fastest convnet implementation available. Caffe already powers academic research projects, startup prototypes, and even large-scale industrial applications in vision, speech, and multimedia. Join our community of brewers on the caffe-users group and Github.
This tutorial is designed to equip researchers and developers with the tools and know-how needed to incorporate deep learning into their work. Both the ideas and implementation of state-of-the-art deep learning models will be presented. While deep learning and deep features have recently achieved strong results in many tasks, a common framework and shared models are needed to advance further research and applications and reduce the barrier to entry. To this end we present the Caffe framework, public reference models, and working examples for deep learning. Join our tour from the 1989 LeNet for digit recognition to today’s top ILSVRC14 vision models. Follow along with do-it-yourself code notebooks. While focusing on vision, general techniques are covered.
This document provides an introduction to parallel programming using Python. It discusses the motivation for parallel programming being to utilize idle CPU capacity. The two main ways to run tasks in parallel in Python are using process forks and spawned threads. It then covers forking processes, threads, interprocess communication, and the multiprocessing module in Python for parallel programming.
This document provides an introduction to concurrency in Python using threads. It discusses how threads allow programs to perform multiple tasks simultaneously by sharing system resources like memory. The document covers basic threading concepts like creating and launching threads, as well as challenges like accessing shared data between threads, which can be non-deterministic due to thread scheduling. It aims to provide an overview of concurrency support in the Python standard library beyond just the user manual.
Dark Silicon, Mobile Devices, and Possible Open-Source SolutionsKoan-Sin Tan
This document summarizes a presentation about dark silicon in mobile devices and possible open source solutions. It discusses how power and thermal constraints are more severe for mobile devices due to limited battery progress and no fans. It also covers big.LITTLE scheduling, thread-level parallelism challenges, and user-level threading libraries like AsyncTask. Finally, it notes that while some open source parallel programming frameworks exist, fully utilizing parallelism on mobile and addressing dark silicon remain challenges with no widely adopted solutions.
Some of the biggest issues at the center of analyzing large amounts of data are query flexibility, latency, and fault tolerance. Modern technologies that build upon the success of “big data” platforms, such as Apache Hadoop, have made it possible to spread the load of data analysis to commodity machines, but these analyses can still take hours to run and do not respond well to rapidly-changing data sets.
A new generation of data processing platforms -- which we call “stream architectures” -- have converted data sources into streams of data that can be processed and analyzed in real-time. This has led to the development of various distributed real-time computation frameworks (e.g. Apache Storm) and multi-consumer data integration technologies (e.g. Apache Kafka). Together, they offer a way to do predictable computation on real-time data streams.
In this talk, we will give an overview of these technologies and how they fit into the Python ecosystem. As part of this presentation, we also released streamparse, a new Python that makes it easy to debug and run large Storm clusters.
Links:
* http://parse.ly/code
* https://github.com/Parsely/streamparse
* https://github.com/getsamsa/samsa
The document discusses attention mechanisms and their implementation in TensorFlow. It begins with an overview of attention mechanisms and their use in neural machine translation. It then reviews the code implementation of an attention mechanism for neural machine translation from English to French using TensorFlow. Finally, it briefly discusses pointer networks, an attention mechanism variant, and code implementation of pointer networks for solving sorting problems.
This document discusses using TensorFlow on Android. It begins by introducing TensorFlow and how it works as a dataflow graph. It then discusses efforts to optimize TensorFlow for mobile and embedded devices through techniques like quantization and models like MobileNet that use depthwise separable convolutions. It shares experiences building and running TensorFlow models on Android, including benchmarking an Inception model and building a label_image demo. It also compares TensorFlow mobile efforts to other mobile deep learning frameworks like CoreML and the upcoming Android Neural Networks API.
Java and the machine - Martijn Verburg and Kirk PepperdineJAX London
In Terminator 3 - Rise of the Machines, bare metal comes back to haunt humanity, ruthlessly crushing all resistance. This keynote is here to warn you that the same thing is happening to Java and the JVM! Java was designed in a world where there were a wide range of hardware platforms to support. Its premise of Write Once Run Anywhere (WORA) proved to be one of the compelling reasons behind Java's dominance (even if the reality didn't quite meet the marketing hype). However, this WORA property means that Java and the JVM struggled to utilise specialist hardware and operating system features that could make a massive difference in the performance of your application. This problem has recently gotten much, much worse. Due to the rise of multi-core processors, massive increases in main memory and enhancements to other major hardware components (e.g. SSD), the JVM is now distant from utilising that hardware, causing some major performance and scalability issues! Kirk Pepperdine and Martijn Verburg will take you through the complexities of where Java meets the machine and loses. They'll give up some of their hard-won insights on how to work around these issues so that you can plan to avoid termination, unlike some of the poor souls that ran into the T-800...
Python Training in Bangalore | Multi threading | Learnbay.inLearnbayin
Learn Multi threading in Python .How to create threads in Python.
About Race Condition.
Learnbay provides python training in Bangalore for network automation.
Essentials of Multithreaded System Programming in C++Shuo Chen
This document discusses challenges in multithreaded system programming in C++. It covers topics such as thread safety of libraries, RAII and fork(), signals and threads, and operating file descriptors in threads. The document is intended for C++ programmers familiar with threads and aims to explain interactions between threads and system calls/libraries to avoid common issues.
This document provides an overview and introduction to Muduo, a C++ network programming library for Linux. Some key points:
- Muduo is a non-blocking, event-driven, multi-core ready C++ network library that aims to provide high performance and modern features.
- The document discusses challenges with network programming using sockets APIs directly and how a library like Muduo can help abstract away complexity.
- It covers core concepts in non-blocking and event-driven network programming used by Muduo like the event loop, callbacks, and lifetime management of connection objects.
- Examples are provided of how Muduo implements patterns like chat servers and comparisons are made to other libraries
The document discusses intra-machine parallelism and threaded programming. It introduces key concepts like threads, processes, synchronization constructs (locks and condition variables), and challenges like overhead and Amdahl's law. An example of domain decomposition for parallel rendering is presented to demonstrate how to divide a problem into independent tasks and assign them to threads.
[Harvard CS264] 02 - Parallel Thinking, Architecture, Theory & Patternsnpinto
This document outlines the topics that will be covered in the course on massively parallel computing, including computational thinking skills for parallel programming, hardware limitations and constraints on algorithms, and common parallel programming patterns. The topics include thinking in parallel, computer architecture, programming models, theoretical concepts, and parallel programming patterns. The goal is to provide students with the skills needed to design efficient parallel algorithms that maximize performance on modern parallel hardware.
Efficient logging in multithreaded C++ serverShuo Chen
This document discusses efficient logging in multithreaded C++ servers. It describes the muduo logging library which can log over 1 million messages per second with low latency. The key aspects are an efficient LogStream frontend, asynchronous backend using double buffering to pass log messages from threads to a log writer thread without blocking, and writing to local files for performance and reliability.
이 발표에서는 TensorFlow의 지난 1년을 간단하게 돌아보고, TensorFlow의 차기 로드맵에 따라 개발 및 도입될 예정인 여러 기능들을 소개합니다. 또한 2017년 및 2018년의 머신러닝 프레임워크 개발 트렌드와 방향에 대한 이야기도 함께 합니다.
In this talk, I look back the TensorFlow development over the past year. Then discusses the overall development direction of machine learning frameworks, with an introduction to features that will be added to TensorFlow later on.
Published on 11 may, 2018
Chainer is a deep learning framework which is flexible, intuitive, and powerful.
This slide introduces some unique features of Chainer and its additional packages such as ChainerMN (distributed learning), ChainerCV (computer vision), ChainerRL (reinforcement learning), Chainer Chemistry (biology and chemistry), and ChainerUI (visualization).
This document discusses Zurg, a distributed process management system with a master-slave architecture. The Zurg slave runs on each host and can run commands, start and monitor applications, and collect performance data. It communicates with the Zurg master. Some challenges discussed include reliably detecting when processes exit, limiting output, and ensuring processes are properly restarted if the slave crashes. The master will store status information accessible via web interfaces.
Suggestions:
1) For best quality, download the PDF before viewing.
2) Open at least two windows: One for the Youtube video, one for the screencast (link below), and optionally one for the slides themselves.
3) The Youtube video is shown on the first page of the slide deck, for slides, just skip to page 2.
Screencast: http://youtu.be/VoL7JKJmr2I
Video recording: http://youtu.be/CJRvb8zxRdE (Thanks to Al Friedrich!)
In this talk, we take Deep Learning to task with real world data puzzles to solve.
Data:
- Higgs binary classification dataset (10M rows, 29 cols)
- MNIST 10-class dataset
- Weather categorical dataset
- eBay text classification dataset (8500 cols, 500k rows, 467 classes)
- ECG heartbeat anomaly detection
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
This document discusses designing for concurrency in Golang. It provides two examples of concurrent systems and how they can be modeled using Goroutines. Goroutines are lightweight threads that allow implementing one goroutine per concurrent activity. This approach avoids sharing memory and uses message passing with channels for communication. The document concludes that Goroutines are cheap to create and fast to schedule, enabling designs that don't share memory and have many concurrent Goroutines. Code samples and references for further reading on Golang concurrency patterns are also provided.
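Sticking to Python for this page's examples, the goroutine-plus-channel pattern maps onto threads communicating over `queue.Queue`; a minimal sketch of "don't share memory, pass messages":

```python
import queue
import threading

def producer(ch):
    for i in range(5):
        ch.put(i)                 # send a message on the "channel"
    ch.put(None)                  # sentinel: no more messages

def consumer(ch, out):
    while (item := ch.get()) is not None:
        out.append(item * item)   # only the consumer ever touches `out`

ch = queue.Queue()                # plays the role of a Go channel
out = []
threads = [
    threading.Thread(target=producer, args=(ch,)),
    threading.Thread(target=consumer, args=(ch, out)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(out)  # [0, 1, 4, 9, 16]
```

Python threads are far heavier than goroutines, so this mirrors the communication style rather than the one-goroutine-per-activity scalability.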
Caffe (Convolutional Architecture for Fast Feature Embedding) is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors.
Caffe’s expressive architecture encourages application and innovation. Models and optimization are defined by configuration without hard-coding. Switch between CPU and GPU by setting a single flag to train on a GPU machine, then deploy to commodity clusters or mobile devices. Caffe’s extensible code fosters active development. In Caffe’s first year, it has been forked by over 1,000 developers and had many significant changes contributed back. Thanks to these contributors, the framework tracks the state of the art in both code and models. Speed makes Caffe perfect for research experiments and industry deployment. Caffe can process over 60M images per day with a single NVIDIA K40 GPU*. That’s 1 ms/image for inference and 4 ms/image for learning. We believe that Caffe is the fastest convnet implementation available. Caffe already powers academic research projects, startup prototypes, and even large-scale industrial applications in vision, speech, and multimedia. Join our community of brewers on the caffe-users group and Github.
This tutorial is designed to equip researchers and developers with the tools and know-how needed to incorporate deep learning into their work. Both the ideas and implementation of state-of-the-art deep learning models will be presented. While deep learning and deep features have recently achieved strong results in many tasks, a common framework and shared models are needed to advance further research and applications and reduce the barrier to entry. To this end we present the Caffe framework, public reference models, and working examples for deep learning. Join our tour from the 1989 LeNet for digit recognition to today’s top ILSVRC14 vision models. Follow along with do-it-yourself code notebooks. While focusing on vision, general techniques are covered.
This document provides an introduction to parallel programming using Python. It discusses the motivation for parallel programming: utilizing otherwise idle CPU capacity. The two main ways to run tasks in parallel in Python are using process forks and spawned threads. It then covers forking processes, threads, interprocess communication, and the multiprocessing module in Python for parallel programming.
This document provides an introduction to concurrency in Python using threads. It discusses how threads allow programs to perform multiple tasks simultaneously by sharing system resources like memory. The document covers basic threading concepts like creating and launching threads, as well as challenges like accessing shared data between threads, which can be non-deterministic due to thread scheduling. It aims to provide an overview of concurrency support in the Python standard library beyond just the user manual.
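A minimal sketch of the shared-data hazard the document mentions, and the standard fix with a lock:

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:        # without this lock the read-modify-write below
            counter += 1  # can interleave between threads and lose updates

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 -- deterministic only because of the lock
```

Dropping the `with lock:` line turns the result non-deterministic, which is exactly the scheduling-dependent behaviour the document warns about.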
It's not your mother's C++ anymore. Manual memory management, tedious loops, and difficult-to-use STL algorithms are all a thing of the past now. The new C++ 11 standard contains a huge number of improvements to the C++ core language and standard library, and can help C++ developers be more productive.
In this session we will discuss the major features of C++ 11, including lambda functions, type inference for local variables, range-based for loops, smart pointers, and more. We will see how to use these features effectively to modernize your existing C++ programs and how to develop in the modern C++ style.
The document summarizes new features in C++ 11 and C++ 14, including language features like auto variables, lambda functions, rvalue references, and move semantics. It discusses new library features like smart pointers, the concurrency library, and user-defined literals. The presentation covers status and compiler support, best practices for modern C++, and what to expect in upcoming standards like C++ 14 with further language and library improvements.
The hair dryer works by using a heating element and fan. When plugged in, electric current heats the filaments in the heating element. The current also powers a motor that turns a fan. The fan blows air through the heating element, which warms the air. The warm air is then blown out of the dryer to dry hair by evaporating its water content faster. Safety features include a thermal fuse, temperature sensors, and insulation to prevent overheating.
Multithreading with modern C++ is hard: undefined variables, deadlocks, livelocks, race conditions, spurious wakeups, the double-checked locking pattern, and so on. And at the base is the new memory model, which does not make life any easier. The list of things that can go wrong is very long. In this talk I give you a tour through the things that can go wrong and show how you can avoid them.
The document discusses multithreading concepts like concurrency and threading, how to create and control threads including setting priorities and states, and how to safely share resources between threads using synchronization, locks, and wait/notify methods to avoid issues like deadlocks. It also covers deprecated thread methods and increased threading support in JDK 1.5.
This document discusses various places where CPU cycles are wasted in production systems based on insights from continuous profiling of large compute clusters. Some examples of wasted cycles include misconfiguration of the underlying OS, suboptimal choices of dependencies, and excessive serialization/deserialization. Specific issues highlighted include gettimeofday calls on older AWS instances, kubelet directory walking, exception throwing in xgboost, and high CPU usage from zlib. Continuous fleet-wide profiling is becoming an important tool for identifying such performance issues.
CT Brown - Doing next-gen sequencing analysis in the cloud (Jan Aerts)
This document summarizes work on digital normalization, a technique for reducing sequencing data size prior to assembly. Digital normalization works by discarding redundant reads whose median k-mer abundance already exceeds a cutoff, based on analysis of k-mer abundances across the dataset. It can remove over 95% of data in a single pass with fixed memory. This makes genome and metagenome assembly scalable to larger datasets using cloud computing resources. The work is done in an open science manner, with all code, data, and manuscripts openly accessible online.
Talk at Bioinformatics Open Source Conference, 2012 (c.titus.brown)
This document summarizes work on digital normalization, a technique for reducing sequencing data size prior to assembly. Digital normalization works by discarding redundant reads whose median k-mer abundance already exceeds a cutoff, based on analysis of k-mer frequencies in the de Bruijn graph. It can remove over 95% of data in a single pass with fixed memory. Digital normalization enables assembly of large datasets in the cloud by reducing data size and memory requirements. The document acknowledges collaborators and funding sources and provides links for code, blogs, papers, and future events.
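A toy sketch of the normalization idea in Python: a read is kept only while its k-mers are still rare (median abundance below a cutoff), so highly redundant reads are discarded in a single pass. The real tool uses fixed-memory probabilistic counting; `k` and `cutoff` here are illustrative values.

```python
from collections import Counter
from statistics import median

def kmers(read, k):
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def digital_normalization(reads, k=4, cutoff=3):
    counts = Counter()
    kept = []
    for read in reads:                        # a single pass over the data
        km = kmers(read, k)
        if median(counts[x] for x in km) < cutoff:
            kept.append(read)                 # still novel: keep and count it
            counts.update(km)                 # discarded reads are never counted
    return kept

# Ten identical reads collapse to the few needed to reach the cutoff:
print(len(digital_normalization(["ACGTACGT"] * 10)))  # 3
```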
The document introduces Parallel Pixie Dust (PPD), a cross-platform thread library that aims to guarantee deadlock-free and race-condition free schedules that are optimal. It discusses the need for multiple threads due to factors like the memory wall. Current threading models are problematic because testing and debugging threaded code is difficult. PPD uses futures and thread pools to simulate data flow and generate tree-like thread schedules. It provides parallel versions of functions and thread-safe containers to enable multi-threaded standard library algorithms. The goal is to make writing correct multi-threaded programs easier.
What's new with JavaScript in GNOME: The 2020 edition (GUADEC 2020) (Igalia)
By Philip Chimento.
This talk is about all the improvements made in GNOME's JavaScript platform in the past year. If you are writing code for a GNOME app or shell extension that uses JavaScript and you want to know how to modernize your code or use new language features, this talk will be interesting for you. If you are curious about the progress made on the garbage collection bug, and what needs to happen before it can be fixed, this talk will be interesting for you. And if you are interested in working on a JavaScript engine and want some ideas for projects to get started with, from beginner through expert level, this talk will definitely be interesting for you!
(c) GUADEC 2020
July 22nd - 28th, 2020
https://2020.guadec.org
This document provides an agenda and overview for a hands-on introduction to multi-threaded programming and Pthreads. The tutorial will cover fundamental concepts of concurrency and multi-threading, and illustrate these concepts through simple C programs that utilize Pthreads. Attendees will learn about thread creation, synchronization methods like mutexes and barriers, and how to compile and run basic Pthreads applications.
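Keeping to Python for this page's examples, the same primitives the tutorial covers (threads, mutexes, barriers) map directly onto the `threading` module; a minimal sketch:

```python
import threading

N = 3
barrier = threading.Barrier(N)       # counterpart of pthread_barrier_t
lock = threading.Lock()              # counterpart of pthread_mutex_t
events = []

def worker(i):
    with lock:                       # mutex-protected shared list
        events.append(("phase-1", i))
    barrier.wait()                   # nobody proceeds until all N arrive
    with lock:
        events.append(("phase-2", i))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The barrier guarantees every phase-1 entry precedes every phase-2 entry.
assert [p for p, _ in events[:N]] == ["phase-1"] * N
```

The compile-and-run mechanics differ (no `-lpthread`, no explicit attribute structs), but the synchronization reasoning carries over unchanged.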
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2pjvrpW.
Joe Duffy talks about the concurrency's explosion onto the mainstream over the past 15 years. He looks at some of today's hottest trends (Cloud, IoT, Microservices) and attempts to predict what lies ahead not only for concurrent programming, but also distributed, from now to 15 years into the future. Filmed at qconlondon.com.
Joe Duffy is Director of Engineering for the Compiler and Language Group at Microsoft. He leads the teams building C++, C#, VB, and F# languages, compilers, and static analysis platforms, across many architectures and platforms.
Cloud computing: evolution or redefinition (PET Computação)
Field of knowledge: focus on complex and object-oriented modelling. Field of human communication: conceptual guidelines for complex modelling of human communication, mental representational channels, styles of learning and teaching; a test to identify representational channels; a test to identify personal styles; practical examples of applying complex knowledge to human communication; working with diversity and its potential in interpersonal communication to positively affect the workplace; conclusions and practical guidance for developing the competencies of the 21st-century professional.
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S (Brandon Liu)
Implementing a fixed point int16_t integer matrix vector multiplication kernel for Intel processors with AVX-512 and the Xbyak just-in-time compiler (what Intel MKL jit_cgemm uses)
Parallelformers is a tool for efficiently parallelizing large language models across multiple GPUs. It was created to address the challenges of deploying and using very large models, which require extensive engineering and expensive hardware. Parallelformers uses model parallelism techniques inspired by Megatron-LM to split models across GPUs for efficient distributed processing and inference. The key design principles of Parallelformers are efficient model parallelism, scalability to support many models, simplicity of use, and enabling easy deployment of large models.
Course: "Introductory course to HLS FPGA programming" (Mirko Mariotti)
Slides of the course: "Introductory course to HLS FPGA programming", Nov 27 – 30, 2023. ICSC National research center on HPC, big data and Quantum Computing
This document provides a tutorial on multi-core CPUs, computer clusters, and grid computing for economists. It begins by discussing trends in microprocessor development such as increasing transistor counts and clock speeds. Future improvements will come from multi-core CPUs rather than increased clock speed. It also discusses increases in network speeds that enable computer clusters and grids. The document then provides suggestions for optimizing code performance on a single CPU before parallelization. This includes minimizing branches, inlining subroutines, and using high-performance libraries. The rest of the document discusses programming multi-core CPUs, clusters, and grids.
The genesis of clusterlib - An open source library to tame your favourite sup... (Arnaud Joly)
The presentation tells the story of clusterlib, an open source package, from the problem statement to a first-rate open source library. Useful tools for software projects are also presented.
The goal of the clusterlib is to ease the creation, launch and management of embarrassingly parallel jobs on supercomputers with schedulers such as SLURM and SGE.
Towards a Systematic Study of Big Data Performance and Benchmarking (Saliya Ekanayake)
This document summarizes a Ph.D. dissertation defense presented by Saliya Ekanayake on September 28, 2016. The dissertation studied big data performance and benchmarking, focusing on parallel machine learning. It evaluated performance factors like thread models, affinity, communication mechanisms, and optimizations for high-level languages. It also presented SPIDAL Java, a library for scalable parallel machine learning applications, and evaluated its performance on use cases like gene sequence clustering and stock data analysis using up to 48 nodes.
Why Cloud Computing has to go the FOSS way (Ahmed Mekkawy)
This presentation traces trends in the software industry to reach the conclusion that cloud computing as a concept is inevitable, and that having open clouds is inevitable as well.
This document summarizes challenges in assembling large DNA sequence data sets and strategies to address them.
1. The cost to generate DNA sequence data is decreasing rapidly, creating data sets too large for most computers to assemble. Hundreds to thousands of such data sets are generated each year.
2. Techniques like streaming compression and low-memory probabilistic data structures allow assembly memory usage to scale linearly with the sample size rather than the total data, enabling assembly of larger datasets.
3. Benchmarking different computational platforms revealed that while some platforms have faster processors, the ability to store large amounts of data locally is also important for assembly tasks. Scaling algorithms, rather than just optimizing code, is key to addressing these challenges.
Software and the Concurrency Revolution: Notes (Subhajit Sahu)
Highlighted notes of article while studying Concurrent Data Structures, CSE:
Software and the Concurrency Revolution
Herb Sutter
Software Architect, Microsoft
Software Development Consultant, www.gotw.ca/training
Herb Sutter is a prominent C++ expert. He is also a book author and was a columnist for Dr. Dobb's Journal. He joined Microsoft in 2002 as a platform evangelist for Visual C++ .NET, rising to lead software architect for C++/CLI.
A Survey on in-a-box parallel computing and its implications on system softwa... (ChangWoo Min)
1) The document surveys research on parallel computing using multicore CPUs and GPUs, and its implications for system software.
2) It discusses parallel programming models like OpenMP, Intel TBB, CUDA, and OpenCL. It also covers research on optimizing memory allocation, reducing system call overhead, and revisiting OS architecture for manycore systems.
3) The document reviews work on supporting GPUs in virtualized environments through techniques like GPU virtualization. It also summarizes projects that utilize the GPU in middleware for tasks like network packet processing.
Similar to Beating the (sh** out of the) GIL - Multithreading vs. Multiprocessing (20)
Kauri ID - A Self-Sovereign, Blockchain-based Identity System (Guy K. Kloss)
Presented on Friday, 13 July 2018 at the ITP Conference in Wellington, New Zealand
Kiwis can't express their identity digitally and securely across cultural backgrounds and across competitive boundaries. So far this remains an ongoing, unsolved problem. Yes, there are existing options, e.g. RealMe, Google federated identity, etc., but they all have their "warts". Some are expensive or cumbersome to use from an organisation's perspective. Others leak metadata to corporates whose goal is to use your information to sell to you more effectively, thus making the end user The Product (TM). Yet others lack the critical mass among the population to be successful.
A Bloomberg Intelligence report ("The Year Ahead 2018") quotes the cost to the US banking sector of KYC (Know Your Customer) and AML (Anti-Money Laundering) breaches at a total of US$16.1 billion from 2008 to 2015. The same report cites the Royal Bank of Scotland as employing 2,000 staff (early 2017) exclusively to comply with KYC rules, with the expectation of lowering this headcount by 95% given a viable digital solution.
Due to the magnitude of this problem, a local major bank has kicked off an initiative with the local community to venture into solution opportunities.
This paper presents the background to the problem statement, the goal definition and particularly the approach taken for the system. Design decisions and evaluations will be discussed for this system under the project title of "Kauri ID", a self-sovereign, blockchain-based identity infrastructure. It puts the user at the centre, and no company or organisation owns identity information or acts as a (formal) guardian.
Kauri ID employs privacy by design, enabling fine-granular, selective and confidential data sharing. Authenticity is implemented via a web of trust, attesting identity attribute claims.
Even though Kauri ID is inherently self-sovereign, sovereign aspects can be catered for via governmental attribute endorsements, thus building a bridge between New Zealand's RealMe system and Kauri ID.
Qrious about Insights - Big Data in the Real World (Guy K. Kloss)
Presentation for the Data Science Research Group Workshop on 7 February 2017 at AUT. The talk centres around the problem in Big Data analytics, tools for overcoming these problems, and the way the company Qrious leverages these to build solutions.
This document provides an explanation of blockchain technology through the example of building a digital currency called Infocoin. It explains how blockchain addresses key issues like double spending by using cryptographic signatures, serial numbers, proof-of-work, and maintaining a shared public ledger in the form of a blockchain. It then discusses potential applications of blockchain beyond digital currencies, as well as challenges regarding scalability, energy use, and choosing which blockchain to use.
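The proof-of-work mechanism described above can be sketched in a few lines of Python. This is a toy version; real blockchains hash structured block headers against much harder targets:

```python
import hashlib

def proof_of_work(block_data: bytes, difficulty: int = 3) -> int:
    """Find a nonce so that SHA-256(block_data + nonce) begins with
    `difficulty` hex zeros: expensive to find, trivial to verify."""
    nonce = 0
    target = "0" * difficulty
    while True:
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

# Any participant can verify the work with a single hash:
nonce = proof_of_work(b"Infocoin block #1")
check = hashlib.sha256(b"Infocoin block #1" + str(nonce).encode()).hexdigest()
assert check.startswith("000")
```

Raising `difficulty` by one multiplies the expected search effort by 16, which is how the asymmetry between producing and checking a block is tuned.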
Building a (Really) Secure Cloud Product (Guy K. Kloss)
Guest lecture to Master of Information Security and Digital Forensics students at Auckland University of Technology (AUT) on the development of the MEGAchat Cloud application.
Representational State Transfer (REST) and HATEOAS (Guy K. Kloss)
This document outlines Representational State Transfer (REST) and HATEOAS (Hypermedia as the Engine of Application State). It discusses the principles of REST including identification of resources, manipulation of resources through HTTP methods, self-descriptive messages, and HATEOAS. An example scenario of a flight booking API is provided to illustrate how HATEOAS links indicate state transitions within a REST API.
Introduction to LaTeX (For Word users) (Guy K. Kloss)
This document introduces LaTeX, an open-source document preparation system. LaTeX uses TeX as its typesetting engine and allows authors to focus on the content instead of formatting. It offers advantages over Word like portability, flexibility, precise control over formatting, and high quality output, especially for mathematical formulas. The document discusses what LaTeX is, how to pronounce it, its advantages over Word for large projects, and its ability to produce higher quality documents than Word.
MataNui - Building a Grid Data Infrastructure that "doesn't suck!" (Guy K. Kloss)
This document discusses the development of a grid data infrastructure called MataNui to manage large amounts of observational astronomical data and metadata from a collaboration between researchers in New Zealand and Japan. The infrastructure uses existing open-source tools like MongoDB, GridFTP, and the DataFinder GUI client to allow distributed storage and access of data while meeting requirements like handling large data volumes, metadata, and remote access. This approach provides a robust, reusable, and user-friendly system to address common data management challenges in scientific collaborations.
Operations Research and Optimization in Python using PuLP (Guy K. Kloss)
This document discusses mathematical optimization and summarizes a presentation given by Dr. Stuart Mitchell on the topic. It provides an example problem of optimizing wedding guest seating assignments. The key points are:
1) Mathematical optimization provides a precise way to formulate problems with objectives and constraints to find optimal solutions.
2) The wedding seating problem involves assigning guests to tables to maximize happiness based on relationships while ensuring each guest has a seat.
3) The problem is formulated as a set partitioning problem and implemented in Python code using the PuLP library to find the optimal seating arrangement.
Python Data Plotting and Visualisation Extravaganza (Guy K. Kloss)
This document is a presentation on Python data plotting and visualization tools. It outlines 2D and 3D plotting tools for Python, including Gnuplot, matplotlib, Mayavi, Visual Python, and the Mayavi "visual" module. The presentation was given by Guy K. Kloss at the first ever Kiwi PyCon in Christchurch, New Zealand on November 7, 2009.
Lecture "Open Source and Open Content" (Guy K. Kloss)
The document discusses open source software, education, content, standards and licenses. It provides examples of the growth and adoption of open source software like Linux and Apache. It also discusses open education initiatives like MIT OpenCourseWare and Wikipedia that provide free access to knowledge. Open standards and licenses like Creative Commons are presented as allowing open reuse and collaboration on intellectual property.
Largely based on Vishnu Gopal's presentation http://www.slideshare.net/vishnu/basic-source-control-with-subversion
Used for a quick SVN introduction in a Software Engineering course at Massey University.
Thinking Hybrid - Python/C++ Integration (Guy K. Kloss)
The document discusses integrating Python and C++ by using Python where possible for its simplicity and readability, but using C++ for performance-critical parts where needed. It provides tips on a real-world example of hybrid Python/C++ development. The example quote is "Python where we can, C++ where we must" from Alex Martelli, a senior Google developer.
Thinking Hybrid - Python/C++ Integration (Guy K. Kloss)
Talk given at the January 2008 meeting of the New Zealand Python User Group in Auckland.
Outline: Talk on integrating native C++ sensibly into Python for ease of use of the code base. Inheriting from C++ classes, overriding functionality, automatically generating the bindings using Py++ and SCons.
Code demonstrated in the presentation can be found here:
http://www.kloss-familie.de/moin/TalksPresentations
Gaining Colour Stability in Live Image Capturing (Guy K. Kloss)
The document discusses gaining colour stability in live image capturing. It describes how camera sensors interpret colour differently than the human eye based on lighting conditions. Traditional colour management using ICC profiles requires static calibration and does not work for changing environments. The document proposes adapting current colour constancy methods and exploiting slow background changes to create a processing loop that can automatically adapt to varying conditions in live image capturing.
This document provides an introduction to LaTeX for Word users. It summarizes what LaTeX is, the benefits of using LaTeX over Word, how to produce a simple LaTeX document, and how to install LaTeX on Windows. The presentation includes slides on document structure in LaTeX and common file types.
Thinking Hybrid - Python/C++ Integration (Guy K. Kloss)
Talk on integrating native C++ sensibly into Python for ease of use of the code base. Inheriting from C++ classes, overriding functionality, automatically generating the bindings using Py++ and SCons.
Code demonstrated in the presentation can be found here:
http://www.kloss-familie.de/moin/TalksPresentations
Beating the (sh** out of the) GIL - Multithreading vs. Multiprocessing
1. Threading Theory Multiprocessing Others Conclusion Finalise
Beating the (sh** out of the) GIL
Multithreading vs. Multiprocessing
Hair dryer, 1920s
Dark Roasted Blend: http://www.darkroastedblend.com/2007/01/retro-technology-update.html
Guy K. Kloss | Multithreading vs. Multiprocessing 1/36
Guy K. Kloss
Computer Science
Massey University, Albany
New Zealand Python User Group Meeting
Auckland, 12 June 2008
Outline
1 Threading
2 Theory
3 Multiprocessing
4 Others
5 Conclusion
Source: http://blog.snaplogic.org/?cat=29
What People Think Now
Threading and shared memory are common
(thanks to Windows and Java)
Python supports threads (Yay!)
Python also supports easy forking (Yay!)
The GIL . . . is a problem for pure Python,
non I/O bound applications
Lots of people “understand” threads . . .
. . . and fail at them (to do them properly)
What People Think Now
Blog post by Mark Ramm, 14 May 2008
A multi-threaded system is particularly important for people
who use Windows, which makes multi-process computing
much more memory intensive than it needs to be. As my
grandma always said, Windows can't fork worth a damn. ;)
[. . .]
So, really it's kinda like shared-memory optimized
micro-processes running inside larger OS level processes, and
that makes multi-threaded applications a lot more
reasonable to wrap your brain around. Once you start down
the path of lock management the non-deterministic character
of the system can quickly overwhelm your brain.
Simple Threading Example
import time
from threading import Thread
from stuff import expensiveFunction  # helper module from the talk's demo code
import config                        # demo configuration, provides config.segments

class MyClass(Thread):
    def __init__(self, argument):
        self.argument = argument
        Thread.__init__(self)  # Initialise the thread

    def run(self):
        self.value = expensiveFunction(self.argument)

callObjects = []
for i in range(config.segments):
    callObjects.append(MyClass(i))
for item in callObjects:
    item.start()
# Do something else.
time.sleep(15.0)
for item in callObjects:
    item.join()
    print item.value
Our Example with Threading
Our fractal example
now with threading.
Just a humble hair-dryer from the 30s: “One of the first machines used for permanent wave hairstyling back in the 1920’s and 1930’s.”
Dark Roasted Blend: http://www.darkroastedblend.com/2007/05/mystery-devices-issue-2.html
The GIL
Global Interpreter Lock
What is it for?
Cooperative multitasking
Interpreter knows when it’s “good to switch”
Often more efficient than preemptive multi–tasking
Can be released from native (C) code extensions
(done for I/O intensive operations)
Is it good?
Easy coding
Easy modules/extensions
Large base of available modules already
Speed improvement by factor 2
(for single–threaded applications)
Keeps code safe
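To see why the GIL mainly bites pure-Python, CPU-bound code, here is a minimal sketch (not from the talk's demo code). Threads blocked in time.sleep(), a stand-in for I/O during which the interpreter releases the GIL, still overlap just fine:

```python
import threading
import time

def blocking_task():
    # sleep() releases the GIL, just like real I/O would
    time.sleep(0.2)

start = time.time()
threads = [threading.Thread(target=blocking_task) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
# The four 0.2 s sleeps overlap, so the total is roughly 0.2 s, not 0.8 s.
print("elapsed: %.2f s" % elapsed)
```

A CPU-bound loop in place of the sleep would show no such speedup under CPython.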
The GIL
Alternatives
Other implementations
(C) Python uses it
Jython doesn’t
IronPython doesn’t
They use their own/internal threading mechanisms
Is it a design flaw?
Maybe . . . but . . .
Fierce/intense discussions about changing the code base
Solutions that offer other benefits:
Processes create fewer inherent deadlock situations
Processes also scale to multi-host scenarios
Doug Hellmann in Python Magazine 10/2007:
Techniques using low-level, operating system-specific
libraries for process management are as passé as using
compiled languages for CGI programming. I don’t have time
for this low-level stuff any more, and neither do you. Let’s
look at some modern alternatives.
GIL–less Python
There was an attempt/patch “way back then ...”
There’s a new project now by Adam Olsen:
Python 3000 with “free threading” [1]
Uses monitors to isolate state
Design focus: usability
(for common cases, maintainable code)
Optional at compile time using --with-freethread
Sacrifices single-threaded performance
(60-65 %, but equivalent to threaded CPython)
Automatic deadlock detection
(detection/breaking, giving exceptions/stack trace)
Runs on Linux and OS X
Parallelisation in General
CPU vs. I/O bottlenecks
Threading: Good for I/O constraints
This talk targets CPU constraints
Threads vs. Processes
Threads: Within a process on one host
Processes: Independent at the OS level
Processes:
Are heavier in memory/overhead
Have their own namespace and memory
Involve fewer problems with competing access
to resources and their management
But:
On UN*X/Linux: Process overhead is very low
(C)Python is inefficient in handling threads
Stackless Python is much more efficient at threading
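The "own namespace and memory" point can be sketched quickly with the standard library's multiprocessing module (the modern descendant of the processing package discussed below); the bump() helper is made up for illustration:

```python
import multiprocessing

counter = 0

def bump():
    global counter
    counter += 1  # mutates only the child's copy of the module state

if __name__ == "__main__":
    p = multiprocessing.Process(target=bump)
    p.start()
    p.join()
    # The parent's counter is untouched: processes do not share memory.
    print(counter)
```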
Abstraction Level vs. Control
Abstraction levels for parallel computing models [7]
Level   Parallelism   Communication   Synchronisation
4       implicit      implicit        implicit
3       explicit      implicit        implicit
2       explicit      explicit        implicit
1       explicit      explicit        explicit

Explicit: The programmer specifies it in the parallel program
Implicit: A compiler/runtime system derives it from other information
Abstraction Level vs. Control
Low level: Close to hardware
Must specify parallelism
. . . communication
. . . and synchronisation
→ Best means for performance tuning
→ Premature optimisation?
High level: Highest machine independence
More/all handled by computing model
Up to automatic parallelisation approaches
Neither extreme has been very successful to date
Most developments now:
Level 3 for specific purposes
Level 1 for general programming
(esp. in the scientific community)
With Python, a consistent level 2 is possible
Common for Parallel Computing
Message Passing Interface (MPI)
for distributed memory
OpenMP
shared memory multi–threading
The two need not be categorised this strictly
Art by “Teknika Molodezhi,” Russia 1966
Dark Roasted Blend: http://www.darkroastedblend.com/2008/01/retro-future-mind-boggling.html
Processing around the GIL
Smart multi–processing
Smart task farming
(py)Processing module
By R. Oudkerk [2]
Written in C (really fast!)
Allows multiple cores and multiple hosts/clusters
Data synchronisation through managers
Easy “upgrade path”:
Drop-in replacement (mostly) for the threading module
Transparent to the user
Forks processes, but uses the Thread API
Supports queues, pipes, locks,
managers (for sharing state), worker pools
VERY fast, see PEP 371 [3]
Jesse Noller proposed pyprocessing for inclusion in core Python:
benchmarks available, awesome results!
The PEP is officially accepted: Thanks, Guido!
(py)Processing module
(continued)
Some details
Producer/consumer style system
– workers pull jobs
Hides most details of communication
– usable default settings
Communication is tweakable
(to improve performance or meet certain requirements)
(py)Processing module
Let’s see it!
Parallel Python module
By Vitalii Vanovschi [4]
Pure Python
Full “batteries included” approach:
Spawns automatically across detected cores,
and can spawn to clusters
Uses some thread module methods under the hood
More of a “task farming” approach
(potentially requires rethinking/restructuring)
Automatically deploys code and data,
no difficult/multiple installs
Fault tolerance, secure inter-node communication,
runs everywhere
Very active community,
good documentation, good support
Parallel Python module
Let’s see it!
Honourable Mentions
pprocess [5]
IPython for parallel computing [6]
Bulk Synchronous Parallel (BSP) Model [7]
sequence of super steps
(computation, communication, barrier synch)
Reactor based architectures, through Twisted [8]
“Don’t call us, we call you”
MPI (pyMPI, Pypar, MPI for Python, pypvm)
requires a constant number of processors for the
computation’s duration
Pyro (distributed object system)
Linda (PyLinda)
Scientific Python (master/slave computing model)
data distribution through call parameters/replication
Things to Note
Which approach is best?
Can’t say!
Many of the approaches are complementary
You need to evaluate what to use when
All, however, save you a lot of time over the alternative
of writing everything yourself with low–level libraries.
What an age to be alive!
Problems can arise when objects cannot be pickled
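A tiny sketch of that pickling caveat: objects pickle cannot look up by name, such as lambdas, fail to serialise and therefore cannot be shipped between processes, while plain module-level functions work fine:

```python
import pickle

square = lambda x: x * x

try:
    pickle.dumps(square)
    failed = False
except Exception as exc:  # PicklingError (or AttributeError, by version)
    failed = True
    print("cannot pickle a lambda:", exc)

def square_fn(x):
    return x * x

# A def at module level pickles fine (by reference), which is why
# worker functions handed to other processes are written this way.
data = pickle.dumps(square_fn)
print(failed, len(data) > 0)
```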
Conclusion
Removing the GIL is not necessarily the best solution:
Less efficient (single-threaded) runtime
Problems with shared memory access
Various approaches exist to beat the GIL
Solutions are complementary in many ways;
many scale beyond a local machine/memory system
Questions?
G.Kloss@massey.ac.nz
Slides and code available here:
http://www.kloss-familie.de/moin/TalksPresentations
References I
[1] A. Olsen,
Python 3000 with Free Threading project,
[Online]
http://code.google.com/p/python-safethread/
[2] R. Oudkerk,
Processing Package,
[Online] http://pypi.python.org/pypi/processing/
[3] J. Noller,
PEP-371,
[Online] http://www.python.org/dev/peps/pep-0371/
References II
[4] V. Vanovschi,
Parallel Python,
[Online] http://parallelpython.com/
[5] P. Boddie,
pprocess,
[Online] http://pypi.python.org/pypi/pprocess/
[6] Project Website,
IPython,
[Online] http://ipython.scipy.org/doc/ipython1/html/parallel_intro.html
[7] K. Hinsen,
Parallel Scripting with Python
Computing in Science & Engineering, Nov/Dec 2007
References III
[8] B. Eckel,
Concurrency with Python, Twisted, and Flex,
[Online] http://www.artima.com/weblogs/viewpost.jsp?thread=230001