SlideShare a Scribd company logo
1 of 31
Download to read offline
Concurrency
A very brief overview, focusing on threading.
Jonathan Wagoner
2016-07-29
Basic Concepts
Process
Thread
Python concurrency modules
A running program is called a process. Each process has the following:
It's own system state
Memory
Lists of open files
A program counter keeping track of instructions to execute
The system stack
etc.
Normally, processes execute statements one after another in a single sequence of control flow, which is otherwise referred to as the main thread. This
means, at any given time, the program is just doing one thing.
Programs can create new processes via commands found in the os or subprocess modules, such as os.fork() or subprocess.call(), etc. These are
known as subprocesses because they are run with their own private system state and main thread of execution. Because subprocesses are
independent, they can execute concurrently with the original process.
Processes can communicate with one another via Interprocess Communication (IPC). This protocol uses primatives such as send() and recv() to
transmit and receive messages through and I/O channel such as a pipe or network socket.
A thread is a similar to a process in that it has its own control flow and execution stack. However, the fundamental difference is a thread runs inside the
process that created it! This means that a thread can share all the data and system resources within a process.
When multiple processes or threads are used, the host Operating System (OS) will schedule their work. This is done by a concept known as time-
slicing, in which each process or thread is given a small time slice and the OS will rapidly cycle between all of the active tasks.
Why Use Concurrency?
Compelling Reasons:
Performance
Responsiveness
Non-Blocking / Yielding
Under the Hood:
Preemption
Inter and Intra-Process Communication
Running several threads is similar to running several different programs at the same time (i.e. concurrently), though with the following benefits:
1. Multiple threads within a process share the same data space with the main thread and can therefore share information or communicate with
each other more easily than if they were separate processes.
2. Threads sometimes called light-weight processes and they do not require much memory overhead; they are cheaper than processes.
3. A thread has a beginning, an execution sequence, and a conclusion. It has an instruction pointer that keeps track of where within its context it
is currently running.
4. It can be pre-empted (interrupted)
5. It can temporarily be put on hold (also known as sleeping) while other threads are running - this is called yielding.
Threading does NOT mean:
1. That the programs are executed on different CPUs.
2. That your program is automatically made faster, especially if it already uses 100% CPU time.
For the above instances on what threading does not mean, look into using multiprocessing, which will be discussed in more detail on a different
training session.
The Global Interpreter Lock (GIL)
Only a single Python thread may run at any moment, independent of CPU cores
When to use threads
When to use processes
The Global Interpreter Lock (GIL)
Python threading DOES NOT work as you would expect it to, especially if you are not a Python developer and are coming from other languages such
as C++ or Java. One can write code in Python that uses threads and actually obtain worse performance, especially if the program uses almost all the
CPU.
While Python is minimally thread-safe, the Python interpreter uses the Global Interpreter Lock, or GIL. The GIL allows only a single Python thread to
execute at any moment. In other words, this restricts Python programs to run only on a single processor, no matter how many CPU cores might be
available.
The GIL is a heated topic of debate in the Python community, and it's unlikely to be going anywhere soon.
So when should one use threading and when should one use multiprocessing? If your application is mostly I/O bound, then using threads in Python
are fine. However, if there are applications that require heavy amounts of CPU processing, then using threads to subdivide work doesn't provide any
benefit at all. In fact, your program will actually run slower.
A best practice is to profile any threaded code for performance! Unfortunately, we don't have time to go in-depth into profiling, so let's save that for
next time.
In [1]: import random
import threading
import logging
import time
reload(logging) # Needed because Jupyter mucks with the logging module.
reload(threading) # Ditto. Jupyter also mucks with the threading module.
logging.basicConfig(level=logging.INFO, format='%(asctime)s (%(threadName)-10s) %(message)s',
datefmt='%M:%S')
This entire training module focuses solely on the threading library. Let's save discussions on other concurrency libraries and constructs, such as
multiprocessing, queues, etc. for a later talk.
class threading.Thread(group=None, target=None, name=None, args=(), kwargs={}) This constructor should always be called with keyword arguments.
Arguments are:
group should be None; reserved for future extension when a ThreadGroup class is implemented.
target is the callable object to be invoked by the run() method. Defaults to None, meaning nothing is cal
led.
name is the thread name. By default, a unique name is constructed of the form “Thread-N” where N is a sma
ll decimal number.
args is the argument tuple for the target invocation. Defaults to ().
kwargs is a dictionary of keyword arguments for the target invocation. Defaults to {}.
Once a thread object is created, its activity must be started by calling the thread's start() method. This invokes the run() method in a separate thread of
control.
Once the thread's activity is started, the thread is considered 'alive'. It stops being alive when its run() method terminates - either normally, or by
raising an unhandled exception. The is_alive() method tests whether the thread is alive.
Why should start() be used instead of run()?
start()
Start the thread’s activity.
It must be called at most once per thread object. It arranges for the object’s run() method to be inv
oked in a separate thread of control.
This method will raise a RuntimeError if called more than once on the same thread object.
run()
Method representing the thread’s activity.
You may override this method in a subclass. The standard run() method invokes the callable object pas
sed to the object’s constructor as the target argument, if any, with sequential and keyword arguments tak
en from the args and kwargs arguments, respectively.
The simpliest way to start a thread is to instantiate it with the target method and let it start working. Make sure you use start() to activate the thread
instead of run()!
In [2]: def _worker(fruit):
'''Perform some "work" by eating a piece of `fruit`.'''
logging.info('Eating %s ...' % repr(fruit)) # Indicates "work" is being performed.
logging.info('*Burp* That was a good %s!' % repr(fruit)) # Indicates the "work" is done.
def serial_example():
'''Executes 5 `_worker` methods serially, not using threads.'''
for fruit in ('apple', 'banana', 'grape', 'cherry', 'strawberry'):
_worker(fruit)
if __name__ == '__main__':
serial_example()
The Basics
"Why did the multithreaded Chicken cross the road?"
to To other side. get the
Concurrent vs Serial
Re-entrant vs Non-reentrant
Thread Safe
threading.Thread
run() vs start()
13:06 (MainThread) Eating 'apple' ...
13:06 (MainThread) *Burp* That was a good 'apple'!
13:06 (MainThread) Eating 'banana' ...
13:06 (MainThread) *Burp* That was a good 'banana'!
13:06 (MainThread) Eating 'grape' ...
13:06 (MainThread) *Burp* That was a good 'grape'!
13:06 (MainThread) Eating 'cherry' ...
13:06 (MainThread) *Burp* That was a good 'cherry'!
13:06 (MainThread) Eating 'strawberry' ...
13:06 (MainThread) *Burp* That was a good 'strawberry'!
In [3]: def _worker():
'''A basic "worker" method which logs to stdout, indicating that it did some work.'''
logging.info('Doing Work ...')
def basic_example():
'''Executes 5 `_worker` methods on threads.'''
for _ in range(5):
thread_inst = threading.Thread(target=_worker)
thread_inst.start()
if __name__ == '__main__':
basic_example()
13:06 (Thread-1 ) Doing Work ...
13:06 (Thread-2 ) Doing Work ...
13:06 (Thread-3 ) Doing Work ...
13:06 (Thread-4 ) Doing Work ...
13:06 (Thread-5 ) Doing Work ...
In [4]: def _worker(fruit):
'''Perform some "work" by eating a piece of `fruit` yet finishing after a random amount of tim
e.'''
logging.info('Eating %s ...' % repr(fruit)) # Indicates "work" is being perfored.
seconds_to_sleep = random.randint(1, 10) / 10.
logging.debug('This will take %s seconds to eat' % seconds_to_sleep)
time.sleep(seconds_to_sleep)
logging.info('*Burp* That was a good %s!' % repr(fruit)) # Indicates the "work" is done.
def asynchronous_example():
'''Executes 5 `_worker` methods on threads.'''
threads = list()
for fruit in ('apple', 'banana', 'grape', 'cherry', 'strawberry'):
thread_inst = threading.Thread(target=_worker, args=(fruit,))
threads.append(thread_inst)
[thread_inst.start() for thread_inst in threads] # pylint: disable-msg=expression-not-assigned
[thread_inst.join() for thread_inst in threads] # Needed for Jupyter
if __name__ == '__main__':
asynchronous_example()
13:06 (Thread-6 ) Eating 'apple' ...
13:06 (Thread-7 ) Eating 'banana' ...
13:06 (Thread-8 ) Eating 'grape' ...
13:06 (Thread-9 ) Eating 'cherry' ...
13:06 (Thread-10 ) Eating 'strawberry' ...
13:06 (Thread-8 ) *Burp* That was a good 'grape'!
13:06 (Thread-6 ) *Burp* That was a good 'apple'!
13:06 (Thread-10 ) *Burp* That was a good 'strawberry'!
13:06 (Thread-9 ) *Burp* That was a good 'cherry'!
13:07 (Thread-7 ) *Burp* That was a good 'banana'!
Daemon vs Non-Daemon Threads
join()
currentThread()
Threads are non-daemon by default
Daemon threads die when the main program dies
Useful applications
This example illustrates some property differences betweeen daemon and non-daemon threads.
Up to this point, all of the example programs have implicitly waited to exit until all the threads have completed their work. Sometimes a program will
spawn a thread as a daemon that runs without blocking the main program from exiting.
These daemon threads are best designed to do background tasks, such as sending keepalive packets, performing periodic garbage collection, etc.
These are only useful when the main program is running and it's okay to kill them off once the other, non-daemon, threads have exited. They are also
useful for services where there may not be an easy way to interrupt the thread or where letting the thread die in the middle of its work does not lose or
corrupt data (for example, a thread that generates keepalive packets for a service monitoring tool).
Without daemon threads, you'd have to keep track of them, and tell them to exit, before your program can completely quit. By setting them as
daemon threads, you can let them run and forget about them, and when your program quits, any daemon threads are killed automatically.
The default is for threads to not be daemons.
This example also illustrates that each threading.Thread instance has a name with a default value that can be assigned during thread instantiation.
Naming threads is useful in server process applications, for example, in which there are multiple threads handling different operations. Also, as you've
seen from now, the logging module supports embedding the thread name in every log message. logging is also thread-safe, so messages from
different threads are kept distinct in the output.
In [5]: def _worker(seconds_to_sleep=.1):
'''A "worker" method that indicates when work is starting, waits a static amount of time, then in
dicates when work is done.'''
logging.info('Starting %s' % threading.currentThread().getName())
time.sleep(seconds_to_sleep)
logging.info('Exiting %s' % threading.currentThread().getName())
def daemon_vs_non_daemon_example():
daemon_thread_inst = threading.Thread(target=_worker, name='Daemon Worker', args=(3,))
daemon_thread_inst.setDaemon(True)
non_daemon_thread_inst = threading.Thread(target=_worker, name='Non-daemon Worker')
daemon_thread_inst.start()
non_daemon_thread_inst.start()
# Remove these two lines and you will not see "Exiting Daemon Worker".
daemon_thread_inst.join()
non_daemon_thread_inst.join()
if __name__ == '__main__':
daemon_vs_non_daemon_example()
Advanced Features
currentThread()
enumerate()
13:07 (Daemon Worker) Starting Daemon Worker
13:07 (Non-daemon Worker) Starting Non-daemon Worker
13:07 (Non-daemon Worker) Exiting Non-daemon Worker
13:10 (Daemon Worker) Exiting Daemon Worker
When using daemon threads, it isn't necessary to retain an explicit handle to all of them to ensure they were completed before exiting the main
process thread. The threading module provides an enumerate() method, which returns a list of active threading.Thread instances.
Warning: This list includes the main thread, and joining that will introduce deadlock situation, so care must be taken to skip the joining of the main
thread.
In [18]: def _worker(fruit):
'''Perform some "work" by eating a piece of `fruit` yet finishing after a random amount of tim
e.'''
logging.info('Eating %s ...' % repr(fruit)) # Indicates "work" is being perfored.
seconds_to_sleep = random.randint(5, 15)
logging.debug('This will take %s seconds to eat' % seconds_to_sleep)
time.sleep(seconds_to_sleep)
logging.info('*Burp* That was a good %s!' % repr(fruit)) # Indicates the "work" is done.
def enumeration_example():
for fruit in ('apple', 'banana', 'grape', 'cherry', 'strawberry'):
thread_inst = threading.Thread(target=_worker, name='%s' % fruit, args=(fruit,))
thread_inst.setDaemon(True)
thread_inst.start()
main_thread = threading.currentThread()
for thread_inst in threading.enumerate():
logging.debug('Active threads: %s' % repr(', '.join(sorted([thread_inst.name for thread_inst
in threading.enumerate()]))))
if thread_inst is main_thread:
continue
logging.info('Joining on %s Worker ...' % repr(thread_inst.getName()))
thread_inst.join(16) # Needed for Jupyter
if __name__ == '__main__':
enumeration_example()
14:19 (apple ) Eating 'apple' ...
14:19 (banana ) Eating 'banana' ...
14:19 (grape ) Eating 'grape' ...
14:19 (cherry ) Eating 'cherry' ...
14:19 (strawberry) Eating 'strawberry' ...
14:19 (MainThread) Joining on 'strawberry' Worker ...
14:25 (grape ) *Burp* That was a good 'grape'!
14:26 (banana ) *Burp* That was a good 'banana'!
14:27 (cherry ) *Burp* That was a good 'cherry'!
14:28 (strawberry) *Burp* That was a good 'strawberry'!
14:31 (apple ) *Burp* That was a good 'apple'!
Subclassing
Can overload the run() method
Useful for extending functionality
In [7]: class _CustomThread(threading.Thread):
def __init__(self, foo, *args, **kwargs):
super(_CustomThread, self).__init__(*args, **kwargs)
self.foo_ = foo
self.args = args
self.kwargs = kwargs
def run(self):
logging.info('Executing Foo %s with args %s and kwargs %s' % (repr(self.foo_),
repr(self.args), repr(self.kwargs)))
def subclassing_example():
for index in range(1, 6):
thread_inst = _CustomThread(foo='Fighter', name='Worker %d' % index)
thread_inst.start()
if __name__ == '__main__':
subclassing_example()
13:21 (Worker 1 ) Executing Foo 'Fighter' with args () and kwargs {'name': 'Worker 1'}
13:21 (Worker 2 ) Executing Foo 'Fighter' with args () and kwargs {'name': 'Worker 2'}
13:21 (Worker 3 ) Executing Foo 'Fighter' with args () and kwargs {'name': 'Worker 3'}
13:21 (Worker 4 ) Executing Foo 'Fighter' with args () and kwargs {'name': 'Worker 4'}
13:21 (Worker 5 ) Executing Foo 'Fighter' with args () and kwargs {'name': 'Worker 5'}
class StoppableThread(threading.Thread):
def __init__(self, *args, **kwargs):
super(StoppableThread, self).__init__(*args, **kwargs)
self._terminate = False
self._suspend_lock = threading.Lock()
def terminate(self):
self._terminate = True
def suspend(self):
self._suspend_lock.acquire()
def resume(self):
self._suspend_lock.release()
def run(self):
while True:
if self._terminate:
break
self._suspend_lock.acquire()
self._suspend_lock.release()
# statements go here
Timer
Starts after some delay
cancel()
Useful applications
In [19]: def _worker():
'''A very basic "worker" method which logs to stdout, indicating that it did some work.'''
logging.info('Doing Work ...')
def timer_example():
thread_inst_1 = threading.Timer(3, _worker)
thread_inst_1.setName('Worker 1')
thread_inst_2 = threading.Timer(3, _worker)
thread_inst_2.setName('Worker 2')
logging.info('Starting timer threads ...')
thread_inst_1.start()
thread_inst_2.start()
logging.info('Waiting before canceling %s ...' % repr(thread_inst_2.getName()))
time.sleep(2.)
logging.info('Canceling %s ...' % repr(thread_inst_2.getName()))
thread_inst_2.cancel()
logging.info('Done!')
if __name__ == '__main__':
timer_example()
Signaling
threading.Event
One thread signals an event; the others wait for it
Useful applications
14:42 (MainThread) Starting timer threads ...
14:42 (MainThread) Waiting before canceling 'Worker 2' ...
14:44 (MainThread) Canceling 'Worker 2' ...
14:44 (MainThread) Done!
14:45 (Worker 1 ) Doing Work ...
class threading.Event
The internal flag is initially false.
is_set()
isSet()
Return true if and only if the internal flag is true.
set()
Set the internal flag to true. All threads waiting for it to become true are awakened. Threads that c
all wait() once the flag is true will not block at all.
clear()
Reset the internal flag to false. Subsequently, threads calling wait() will block until set() is call
ed to set the internal flag to true again.
wait([timeout])
Block until the internal flag is true. If the internal flag is true on entry, return immediately. Oth
erwise, block until another thread calls set() to set the flag to true, or until the optional timeout occ
urs.
When the timeout argument is present and not None, it should be a floating point number specifying a time
out for the operation in seconds (or fractions thereof).
This method returns the internal flag on exit, so it will always return True except if a timeout is given
and the operation times out.
In [9]: EVENT = threading.Event()
def _blocking_event():
logging.info('Starting')
event_is_set = EVENT.wait()
logging.info('Event is set: %s' % repr(event_is_set))
def _non_blocking_event(timeout):
while not EVENT.isSet():
logging.info('Starting')
event_is_set = EVENT.wait(timeout)
logging.info('Event is set: %s' % repr(event_is_set))
if event_is_set:
logging.info('Processing event ...')
else:
logging.info('Waiting ...')
In [20]: def signaling_example():
blocking_thread_inst = threading.Thread(target=_blocking_event, name='Blocking Event')
non_blocking_thread_inst = threading.Thread(target=_non_blocking_event, name='Non-Blocking
Event', args=(2.,))
blocking_thread_inst.start()
non_blocking_thread_inst.start()
logging.info('Waiting until Event.set() is called ...')
time.sleep(5.)
EVENT.set()
logging.info('Event is set')
if __name__ == '__main__':
signaling_example()
14:52 (Blocking Event) Starting
14:52 (MainThread) Waiting until Event.set() is called ...
14:52 (Blocking Event) Event is set: True
14:57 (MainThread) Event is set
Controlling Resources
threading.Lock
acquire() and release()
Useful applications
Controlling access to shared resources (such as physical hardware, test equipment, memory, Disk I/O, etc.) is important to prevent corruption or
missed data.
In Python, not all data structures are equal with regards to being thread-safe. So what data structures are thread-safe and which are not? Python’s
built-in data structures (Lists, Dictionaries, etc.) are thread-safe as a side-effect of having atomic byte-codes for manipulating them (the GIL is not
released in the middle of an update). Other data structures implemented in Python, or simpler types like Integers and Floats, don’t have that
protection. To guard against simultaneous access to an object, use a threading.Lock object.
A primitive lock is in one of two states, “locked” or “unlocked”. It is created in the unlocked state. It has two basic methods, acquire() and release().
When the state is unlocked, acquire() changes the state to locked and returns immediately. When the state is locked, acquire() blocks until a call to
release() in another thread changes it to unlocked, then the acquire() call resets it to locked and returns. The release() method should only be called in
the locked state; it changes the state to unlocked and returns immediately. If an attempt is made to release an unlocked lock, a ThreadError will be
raised.
When more than one thread is blocked in acquire() waiting for the state to turn to unlocked, only one thread proceeds when a release() call resets the
state to unlocked; which one of the waiting threads proceeds is not defined, and may vary across implementations.
All methods are executed atomically.
Lock.acquire([blocking]) Acquire a lock, blocking or non-blocking.
When invoked with the blocking argument set to True (the default), block until the lock is unlocked, then set it to locked and return True.
When invoked with the blocking argument set to False, do not block. If a call with blocking set to True would block, return False immediately;
otherwise, set the lock to locked and return True.
Lock.release() Release a lock.
When the lock is locked, reset it to unlocked, and return. If any other threads are blocked waiting for the lock to become unlocked, allow exactly one
of them to proceed.
When invoked on an unlocked lock, a ThreadError is raised.
There is no return value.
In [11]: LOCK = threading.Lock()
def _worker():
logging.info('Waiting for lock ...')
LOCK.acquire()
logging.info('Acquired lock')
try:
logging.info('Doing work on shared resource ...')
time.sleep(random.randint(30, 50) / 10.)
finally:
LOCK.release()
logging.info('Released lock')
In [12]: def locking_example():
threads = list()
for index in range(1, 3):
thread_inst = threading.Thread(target=_worker, name='Worker %d' % index)
threads.append(thread_inst)
random.shuffle(threads)
[thread_inst.start() for thread_inst in threads] # pylint: disable-msg=expression-not-assigned
logging.info('Waiting for worker threads to finish ...')
main_thread = threading.currentThread()
for thread_inst in threading.enumerate():
if thread_inst is not main_thread:
thread_inst.join(10) # Needed for Jupyter since they overload the threading module!
if __name__ == '__main__':
locking_example()
13:28 (Worker 1 ) Waiting for lock ...
13:28 (Worker 2 ) Waiting for lock ...
13:28 (MainThread) Waiting for worker threads to finish ...
13:28 (Worker 1 ) Acquired lock
13:28 (Worker 1 ) Doing work on shared resource ...
13:28 (Non-Blocking Event) Event is set: True
13:28 (Non-Blocking Event) Processing event ...
13:31 (Worker 1 ) Released lock
13:31 (Worker 2 ) Acquired lock
13:31 (Worker 2 ) Doing work on shared resource ...
13:34 (Worker 2 ) Released lock
Reentrant Locks
threading.RLock
Useful applications
Normal threading.Lock objects cannot be acquired more than once, even by the same thread. This can produce undesirable side-effects, especially if
a lock is accessed by more than one method in the call chain.
A reentrant lock is a synchronization primitive that may be acquired multiple times by the same thread. Internally, it uses the concepts of “owning
thread” and “recursion level” in addition to the locked/unlocked state used by primitive locks. In the locked state, some thread owns the lock; in the
unlocked state, no thread owns it.
To lock the lock, a thread calls its acquire() method; this returns once the thread owns the lock. To unlock the lock, a thread calls its release() method.
acquire()/release() call pairs may be nested; only the final release() (the release() of the outermost pair) resets the lock to unlocked and allows another
thread blocked in acquire() to proceed.
RLock.acquire([blocking=1]) Acquire a lock, blocking or non-blocking.
When invoked without arguments: if this thread already owns the lock, increment the recursion level by one, and return immediately. Otherwise, if
another thread owns the lock, block until the lock is unlocked. Once the lock is unlocked (not owned by any thread), then grab ownership, set the
recursion level to one, and return. If more than one thread is blocked waiting until the lock is unlocked, only one at a time will be able to grab
ownership of the lock. There is no return value in this case.
When invoked with the blocking argument set to true, do the same thing as when called without arguments, and return true.
When invoked with the blocking argument set to false, do not block. If a call without an argument would block, return false immediately; otherwise, do
the same thing as when called without arguments, and return true.
RLock.release() Release a lock, decrementing the recursion level. If after the decrement it is zero, reset the lock to unlocked (not owned by any thread),
and if any other threads are blocked waiting for the lock to become unlocked, allow exactly one of them to proceed. If after the decrement the
recursion level is still nonzero, the lock remains locked and owned by the calling thread.
Only call this method when the calling thread owns the lock. A RuntimeError is raised if this method is called when the lock is unlocked.
There is no return value.
In [13]: logging.basicConfig(level=logging.INFO, format='%(asctime)s (%(threadName)-10s) %(message)s')
def rlocking_example():
lock = threading.Lock()
logging.info('First try to acquire a threading.Lock object: %s' % bool(lock.acquire()))
logging.info('Second try to acquire a threading.Lock object: %s' % bool(lock.acquire(0)))
rlock = threading.RLock()
logging.info('First try to acquire a threading.RLock object: %s' % bool(rlock.acquire()))
logging.info('Second try to acquire a threading.RLock object: %s' % bool(rlock.acquire(0)))
if __name__ == '__main__':
rlocking_example()
Using threading.Lock as Context Managers
Eliminates try/finally block
Cleaner, more readable and safer code
13:34 (MainThread) First try to acquire a threading.Lock object: True
13:34 (MainThread) Second try to acquire a threading.Lock object: False
13:34 (MainThread) First try to acquire a threading.RLock object: True
13:34 (MainThread) Second try to acquire a threading.RLock object: True
In [14]: LOCK = threading.Lock()
def _worker():
logging.info('Waiting for lock ...')
with LOCK:
logging.info('Acquired lock')
logging.info('Doing work on shared resource ...')
time.sleep(random.randint(30, 50) / 10.)
logging.info('Released lock')
def locking_example():
threads = list()
for index in range(1, 3):
thread_inst = threading.Thread(target=_worker, name='Worker %d' % index)
threads.append(thread_inst)
random.shuffle(threads)
[thread_inst.start() for thread_inst in threads] # pylint: disable-msg=expression-not-assigned
logging.info('Waiting for worker threads to finish ...')
main_thread = threading.currentThread()
for thread_inst in threading.enumerate():
if thread_inst is not main_thread:
thread_inst.join(10) # Needed for Jupyter since they overload the threading module!
if __name__ == '__main__':
locking_example()
13:34 (Worker 2 ) Waiting for lock ...
13:34 (Worker 1 ) Waiting for lock ...
13:34 (MainThread) Waiting for worker threads to finish ...
13:34 (Worker 2 ) Acquired lock
13:34 (Worker 2 ) Doing work on shared resource ...
13:39 (Worker 2 ) Released lock
13:39 (Worker 1 ) Acquired lock
13:39 (Worker 1 ) Doing work on shared resource ...
13:44 (Worker 1 ) Released lock
Synchronization
threading.Condition
wait() and notify()
Useful applications
In addition to using threading.Event, another way of synchronizing threads is through using a threading.Condition object. Because the
threading.Condition uses a threading.Lock, it can be tied to a shared resource. This allows threads to wait for the resource to be updated. In this
example, the _consumer() threads wait() for the threading.Condition to be set before continuing. The _producer() thread is responsible for setting the
condition and notifying the other threads that they can continue.
A condition variable is always associated with some kind of lock; this can be passed in or one will be created by default. (Passing one in is useful
when several condition variables must share the same lock.)
A condition variable has acquire() and release() methods that call the corresponding methods of the associated lock. It also has a wait() method, and
notify() and notifyAll() methods. These three must only be called when the calling thread has acquired the lock, otherwise a RuntimeError is raised.
The wait() method releases the lock, and then blocks until it is awakened by a notify() or notifyAll() call for the same condition variable in another
thread. Once awakened, it re-acquires the lock and returns. It is also possible to specify a timeout.
The notify() method wakes up one of the threads waiting for the condition variable, if any are waiting. The notifyAll() method wakes up all threads
waiting for the condition variable.
Note: the notify() and notifyAll() methods don’t release the lock; this means that the thread or threads awakened will not return from their wait() call
immediately, but only when the thread that called notify() or notifyAll() finally relinquishes ownership of the lock.
In [15]: CONDITION = threading.Condition()
def _consumer():
logging.info('Starting consumer thread')
with CONDITION:
CONDITION.wait()
logging.info('Resource is available to consumer')
def _producer():
logging.info('Starting producer thread')
with CONDITION:
logging.info('Making resource available')
CONDITION.notifyAll()
def synchronization_example():
thread_inst_consumer_1 = threading.Thread(target=_consumer, name='Consumer 1')
thread_inst_consumer_2 = threading.Thread(target=_consumer, name='Consumer 2')
thread_inst_consumer_3 = threading.Thread(target=_consumer, name='Consumer 3')
thread_inst_producer = threading.Thread(target=_producer, name='Producer')
for thread_inst in (thread_inst_consumer_1, thread_inst_consumer_2, thread_inst_consumer_3):
thread_inst.start()
time.sleep(2)
thread_inst_producer.start()
if __name__ == '__main__':
synchronization_example()
13:44 (Consumer 1) Starting consumer thread
13:44 (Consumer 2) Starting consumer thread
13:44 (Consumer 3) Starting consumer thread
13:46 (Producer ) Starting producer thread
13:46 (Producer ) Making resource available
13:46 (Consumer 2) Resource is available to consumer
13:46 (Consumer 3) Resource is available to consumer
13:46 (Consumer 1) Resource is available to consumer
Resource Management
threading.Semaphore
Blocks when internal counter reaches 0
Useful applications
Sometimes it is useful to allow more than one worker access to a resource at a time, while still limiting the overall number. For example, a connection
pool might support a fixed number of simultaneous connections, or a network application might support a fixed number of concurrent downloads. A
threading.Semaphore is one way to manage those connections.
class threading.Semaphore([value]) The optional argument gives the initial value for the internal counter; it defaults to 1. If the value given is less than
0, ValueError is raised.
acquire([blocking]) Acquire a semaphore.
When invoked without arguments: if the internal counter is larger than zero on entry, decrement it by one and return immediately. If it is zero on entry,
block, waiting until some other thread has called release() to make it larger than zero. This is done with proper interlocking so that if multiple acquire()
calls are blocked, release() will wake exactly one of them up. The implementation may pick one at random, so the order in which blocked threads are
awakened should not be relied on. There is no return value in this case.
When invoked with blocking set to true, do the same thing as when called without arguments, and return true.
When invoked with blocking set to false, do not block. If a call without an argument would block, return false immediately; otherwise, do the same
thing as when called without arguments, and return true.
release() Release a semaphore, incrementing the internal counter by one. When it was zero on entry and another thread is waiting for it to become
larger than zero again, wake up that thread.
In [16]: LOCK = threading.Lock()
SEMAPHORE = threading.Semaphore(2)
def _worker():
logging.info('Waiting for lock ...')
with LOCK:
logging.info('Acquired lock')
logging.info('Doing work on shared resource ...')
time.sleep(random.randint(30, 50) / 10.)
logging.info('Released lock')
def arbiter():
logging.info('Waiting to join the pool ...')
with SEMAPHORE:
logging.info('Now part of the pool party!')
_worker()
In [17]: def resource_management_example():
for index in range(1, 4):
thread_inst = threading.Thread(target=arbiter, name='Worker %d' % index)
thread_inst.start()
if __name__ == '__main__':
resource_management_example()
Problems
Added complexity
Deadlock
Livelock
Starvation
13:46 (Worker 1 ) Waiting to join the pool ...
13:46 (Worker 2 ) Waiting to join the pool ...
13:46 (Worker 1 ) Now part of the pool party!
13:46 (Worker 3 ) Waiting to join the pool ...
13:46 (Worker 2 ) Now part of the pool party!
13:46 (Worker 1 ) Waiting for lock ...
13:46 (Worker 2 ) Waiting for lock ...
13:46 (Worker 1 ) Acquired lock
13:46 (Worker 1 ) Doing work on shared resource ...
13:50 (Worker 1 ) Released lock
13:50 (Worker 2 ) Acquired lock
13:50 (Worker 3 ) Now part of the pool party!
13:50 (Worker 2 ) Doing work on shared resource ...
13:50 (Worker 3 ) Waiting for lock ...
13:54 (Worker 2 ) Released lock
13:54 (Worker 3 ) Acquired lock
13:54 (Worker 3 ) Doing work on shared resource ...
13:58 (Worker 3 ) Released lock
Closing
What we covered
Know when and when not to use threads
The tip of the iceburg

More Related Content

What's hot (20)

Multithreading in java
Multithreading in javaMultithreading in java
Multithreading in java
 
Multithreading in java
Multithreading in javaMultithreading in java
Multithreading in java
 
Threads
ThreadsThreads
Threads
 
Java Multithreading
Java MultithreadingJava Multithreading
Java Multithreading
 
Concurrency in java
Concurrency in javaConcurrency in java
Concurrency in java
 
Multithreading 101
Multithreading 101Multithreading 101
Multithreading 101
 
Multithreading
MultithreadingMultithreading
Multithreading
 
Java
JavaJava
Java
 
.NET Multithreading/Multitasking
.NET Multithreading/Multitasking.NET Multithreading/Multitasking
.NET Multithreading/Multitasking
 
Java Thread & Multithreading
Java Thread & MultithreadingJava Thread & Multithreading
Java Thread & Multithreading
 
Ppl for students unit 4 and 5
Ppl for students unit 4 and 5Ppl for students unit 4 and 5
Ppl for students unit 4 and 5
 
Java
JavaJava
Java
 
Multithread Programing in Java
Multithread Programing in JavaMultithread Programing in Java
Multithread Programing in Java
 
multi threading
multi threadingmulti threading
multi threading
 
MULTI THREADING IN JAVA
MULTI THREADING IN JAVAMULTI THREADING IN JAVA
MULTI THREADING IN JAVA
 
Multithreading
MultithreadingMultithreading
Multithreading
 
Md09 multithreading
Md09 multithreadingMd09 multithreading
Md09 multithreading
 
Multithreading Introduction and Lifecyle of thread
Multithreading Introduction and Lifecyle of threadMultithreading Introduction and Lifecyle of thread
Multithreading Introduction and Lifecyle of thread
 
Java concurrency
Java concurrencyJava concurrency
Java concurrency
 
Multithreading programming in java
Multithreading programming in javaMultithreading programming in java
Multithreading programming in java
 

Similar to concurrency

Java Performance, Threading and Concurrent Data Structures
Java Performance, Threading and Concurrent Data StructuresJava Performance, Threading and Concurrent Data Structures
Java Performance, Threading and Concurrent Data StructuresHitendra Kumar
 
Multithreaded_Programming_in_Python.pdf
Multithreaded_Programming_in_Python.pdfMultithreaded_Programming_in_Python.pdf
Multithreaded_Programming_in_Python.pdfgiridharsripathi
 
Ppl for students unit 4 and 5
Ppl for students unit 4 and 5Ppl for students unit 4 and 5
Ppl for students unit 4 and 5Akshay Nagpurkar
 
Multi t hreading_14_10
Multi t hreading_14_10Multi t hreading_14_10
Multi t hreading_14_10Minal Maniar
 
Chapter 5 - THREADING & REGULAR exp - MAULIK BORSANIYA
Chapter 5 - THREADING & REGULAR exp - MAULIK BORSANIYAChapter 5 - THREADING & REGULAR exp - MAULIK BORSANIYA
Chapter 5 - THREADING & REGULAR exp - MAULIK BORSANIYAMaulik Borsaniya
 
Multithreading in java
Multithreading in javaMultithreading in java
Multithreading in javaKavitha713564
 
Multithreading in java
Multithreading in javaMultithreading in java
Multithreading in javaKavitha713564
 
OOPS object oriented programming UNIT-4.pptx
OOPS object oriented programming UNIT-4.pptxOOPS object oriented programming UNIT-4.pptx
OOPS object oriented programming UNIT-4.pptxArulmozhivarman8
 
Java multithreading
Java multithreadingJava multithreading
Java multithreadingMohammed625
 
Generators & Decorators.pptx
Generators & Decorators.pptxGenerators & Decorators.pptx
Generators & Decorators.pptxIrfanShaik98
 
Slot02 concurrency1
Slot02 concurrency1Slot02 concurrency1
Slot02 concurrency1Viên Mai
 
Multithreading
MultithreadingMultithreading
Multithreadingsagsharma
 
maXbox Starter 42 Multiprocessing Programming
maXbox Starter 42 Multiprocessing Programming maXbox Starter 42 Multiprocessing Programming
maXbox Starter 42 Multiprocessing Programming Max Kleiner
 
Here comes the Loom - Ya!vaConf.pdf
Here comes the Loom - Ya!vaConf.pdfHere comes the Loom - Ya!vaConf.pdf
Here comes the Loom - Ya!vaConf.pdfKrystian Zybała
 

Similar to concurrency (20)

Java Performance, Threading and Concurrent Data Structures
Java Performance, Threading and Concurrent Data StructuresJava Performance, Threading and Concurrent Data Structures
Java Performance, Threading and Concurrent Data Structures
 
Multithreaded_Programming_in_Python.pdf
Multithreaded_Programming_in_Python.pdfMultithreaded_Programming_in_Python.pdf
Multithreaded_Programming_in_Python.pdf
 
Ppl for students unit 4 and 5
Ppl for students unit 4 and 5Ppl for students unit 4 and 5
Ppl for students unit 4 and 5
 
Multi t hreading_14_10
Multi t hreading_14_10Multi t hreading_14_10
Multi t hreading_14_10
 
Java multi thread programming on cmp system
Java multi thread programming on cmp systemJava multi thread programming on cmp system
Java multi thread programming on cmp system
 
MultiThreading in Python
MultiThreading in PythonMultiThreading in Python
MultiThreading in Python
 
Chapter 5 - THREADING & REGULAR exp - MAULIK BORSANIYA
Chapter 5 - THREADING & REGULAR exp - MAULIK BORSANIYAChapter 5 - THREADING & REGULAR exp - MAULIK BORSANIYA
Chapter 5 - THREADING & REGULAR exp - MAULIK BORSANIYA
 
Multithreading in java
Multithreading in javaMultithreading in java
Multithreading in java
 
Multithreading in java
Multithreading in javaMultithreading in java
Multithreading in java
 
multithreading
multithreadingmultithreading
multithreading
 
OOPS object oriented programming UNIT-4.pptx
OOPS object oriented programming UNIT-4.pptxOOPS object oriented programming UNIT-4.pptx
OOPS object oriented programming UNIT-4.pptx
 
Java multithreading
Java multithreadingJava multithreading
Java multithreading
 
Generators & Decorators.pptx
Generators & Decorators.pptxGenerators & Decorators.pptx
Generators & Decorators.pptx
 
Slot02 concurrency1
Slot02 concurrency1Slot02 concurrency1
Slot02 concurrency1
 
Multithreading
MultithreadingMultithreading
Multithreading
 
Multithreading
MultithreadingMultithreading
Multithreading
 
Java Threads
Java ThreadsJava Threads
Java Threads
 
maXbox Starter 42 Multiprocessing Programming
maXbox Starter 42 Multiprocessing Programming maXbox Starter 42 Multiprocessing Programming
maXbox Starter 42 Multiprocessing Programming
 
Here comes the Loom - Ya!vaConf.pdf
Here comes the Loom - Ya!vaConf.pdfHere comes the Loom - Ya!vaConf.pdf
Here comes the Loom - Ya!vaConf.pdf
 
Threadnotes
ThreadnotesThreadnotes
Threadnotes
 

concurrency

  • 1. Concurrency A very brief overview, focusing on threading. Jonathan Wagoner 2016-07-29 Basic Concepts Process Thread Python concurrency modules
  • 2. A running program is called a process. Each process has the following: It's own system state Memory Lists of open files A program counter keeping track of instructions to execute The system stack etc. Normally, processes execute statements one after another in a single sequence of control flow, which is otherwise referred to as the main thread. This means, at any given time, the program is just doing one thing. Programs can create new processes via commands found in the os or subprocess modules, such as os.fork() or subprocess.call(), etc. These are known as subprocesses because they are run with their own private system state and main thread of execution. Because subprocesses are independent, they can execute concurrently with the original process. Processes can communicate with one another via Interprocess Communication (IPC). This protocol uses primatives such as send() and recv() to transmit and receive messages through and I/O channel such as a pipe or network socket. A thread is a similar to a process in that it has its own control flow and execution stack. However, the fundamental difference is a thread runs inside the process that created it! This means that a thread can share all the data and system resources within a process. When multiple processes or threads are used, the host Operating System (OS) will schedule their work. This is done by a concept known as time- slicing, in which each process or thread is given a small time slice and the OS will rapidly cycle between all of the active tasks.
  • 3. Why Use Concurrency? Compelling Reasons: Performance Responsiveness Non-Blocking / Yielding Under the Hood: Preemption Inter and Intra-Process Communication Running several threads is similar to running several different programs at the same time (i.e. concurrently), though with the following benefits: 1. Multiple threads within a process share the same data space with the main thread and can therefore share information or communicate with each other more easily than if they were separate processes. 2. Threads sometimes called light-weight processes and they do not require much memory overhead; they are cheaper than processes. 3. A thread has a beginning, an execution sequence, and a conclusion. It has an instruction pointer that keeps track of where within its context it is currently running. 4. It can be pre-empted (interrupted) 5. It can temporarily be put on hold (also known as sleeping) while other threads are running - this is called yielding. Threading does NOT mean: 1. That the programs are executed on different CPUs. 2. That your program is automatically made faster, especially if it already uses 100% CPU time. For the above instances on what threading does not mean, look into using multiprocessing, which will be discussed in more detail on a different training session.
  • 4. The Global Interpreter Lock (GIL) Only a single Python thread may run at any moment, independent of CPU cores When to use threads When to use processes The Global Interpreter Lock (GIL) Python threading DOES NOT work as you would expect it to, especially if you are not a Python developer and are coming from other languages such as C++ or Java. One can write code in Python that uses threads and actually obtain worse performance, especially if the program uses almost all the CPU. While Python is minimally thread-safe, the Python interpreter uses the Global Interpreter Lock, or GIL. The GIL allows only a single Python thread to execute at any moment. In other words, this restricts Python programs to run only on a single processor, no matter how many CPU cores might be available. The GIL is a heated topic of debate in the Python community, and it's unlikely to be going anywhere soon. So when should one use threading and when should one use multiprocessing? If your application is mostly I/O bound, then using threads in Python are fine. However, if there are applications that require heavy amounts of CPU processing, then using threads to subdivide work doesn't provide any benefit at all. In fact, your program will actually run slower. A best practice is to profile any threaded code for performance! Unfortunately, we don't have time to go in-depth into profiling, so let's save that for next time.
  • 5. In [1]: import random import threading import logging import time reload(logging) # Needed because Jupyter mucks with the logging module. reload(threading) # Ditto. Jupyter also mucks with the threading module. logging.basicConfig(level=logging.INFO, format='%(asctime)s (%(threadName)-10s) %(message)s', datefmt='%M:%S')
  • 6. This entire training module focuses solely on the threading library. Let's save discussions on other concurrency libraries and constructs, such as multiprocessing, queues, etc. for a later talk. class threading.Thread(group=None, target=None, name=None, args=(), kwargs={}) This constructor should always be called with keyword arguments. Arguments are: group should be None; reserved for future extension when a ThreadGroup class is implemented. target is the callable object to be invoked by the run() method. Defaults to None, meaning nothing is cal led. name is the thread name. By default, a unique name is constructed of the form “Thread-N” where N is a sma ll decimal number. args is the argument tuple for the target invocation. Defaults to (). kwargs is a dictionary of keyword arguments for the target invocation. Defaults to {}. Once a thread object is created, its activity must be started by calling the thread's start() method. This invokes the run() method in a separate thread of control. Once the thread's activity is started, the thread is considered 'alive'. It stops being alive when its run() method terminates - either normally, or by raising an unhandled exception. The is_alive() method tests whether the thread is alive. Why should start() be used instead of run()?
  • 7. start() Start the thread’s activity. It must be called at most once per thread object. It arranges for the object’s run() method to be inv oked in a separate thread of control. This method will raise a RuntimeError if called more than once on the same thread object. run() Method representing the thread’s activity. You may override this method in a subclass. The standard run() method invokes the callable object pas sed to the object’s constructor as the target argument, if any, with sequential and keyword arguments tak en from the args and kwargs arguments, respectively. The simpliest way to start a thread is to instantiate it with the target method and let it start working. Make sure you use start() to activate the thread instead of run()!
  • 8. In [2]: def _worker(fruit): '''Perform some "work" by eating a piece of `fruit`.''' logging.info('Eating %s ...' % repr(fruit)) # Indicates "work" is being performed. logging.info('*Burp* That was a good %s!' % repr(fruit)) # Indicates the "work" is done. def serial_example(): '''Executes 5 `_worker` methods serially, not using threads.''' for fruit in ('apple', 'banana', 'grape', 'cherry', 'strawberry'): _worker(fruit) if __name__ == '__main__': serial_example() The Basics "Why did the multithreaded Chicken cross the road?" to To other side. get the Concurrent vs Serial Re-entrant vs Non-reentrant Thread Safe threading.Thread run() vs start() 13:06 (MainThread) Eating 'apple' ... 13:06 (MainThread) *Burp* That was a good 'apple'! 13:06 (MainThread) Eating 'banana' ... 13:06 (MainThread) *Burp* That was a good 'banana'! 13:06 (MainThread) Eating 'grape' ... 13:06 (MainThread) *Burp* That was a good 'grape'! 13:06 (MainThread) Eating 'cherry' ... 13:06 (MainThread) *Burp* That was a good 'cherry'! 13:06 (MainThread) Eating 'strawberry' ... 13:06 (MainThread) *Burp* That was a good 'strawberry'!
  • 9. In [3]: def _worker(): '''A basic "worker" method which logs to stdout, indicating that it did some work.''' logging.info('Doing Work ...') def basic_example(): '''Executes 5 `_worker` methods on threads.''' for _ in range(5): thread_inst = threading.Thread(target=_worker) thread_inst.start() if __name__ == '__main__': basic_example() 13:06 (Thread-1 ) Doing Work ... 13:06 (Thread-2 ) Doing Work ... 13:06 (Thread-3 ) Doing Work ... 13:06 (Thread-4 ) Doing Work ... 13:06 (Thread-5 ) Doing Work ...
  • 10. In [4]: def _worker(fruit): '''Perform some "work" by eating a piece of `fruit` yet finishing after a random amount of tim e.''' logging.info('Eating %s ...' % repr(fruit)) # Indicates "work" is being perfored. seconds_to_sleep = random.randint(1, 10) / 10. logging.debug('This will take %s seconds to eat' % seconds_to_sleep) time.sleep(seconds_to_sleep) logging.info('*Burp* That was a good %s!' % repr(fruit)) # Indicates the "work" is done. def asynchronous_example(): '''Executes 5 `_worker` methods on threads.''' threads = list() for fruit in ('apple', 'banana', 'grape', 'cherry', 'strawberry'): thread_inst = threading.Thread(target=_worker, args=(fruit,)) threads.append(thread_inst) [thread_inst.start() for thread_inst in threads] # pylint: disable-msg=expression-not-assigned [thread_inst.join() for thread_inst in threads] # Needed for Jupyter if __name__ == '__main__': asynchronous_example() 13:06 (Thread-6 ) Eating 'apple' ... 13:06 (Thread-7 ) Eating 'banana' ... 13:06 (Thread-8 ) Eating 'grape' ... 13:06 (Thread-9 ) Eating 'cherry' ... 13:06 (Thread-10 ) Eating 'strawberry' ... 13:06 (Thread-8 ) *Burp* That was a good 'grape'! 13:06 (Thread-6 ) *Burp* That was a good 'apple'! 13:06 (Thread-10 ) *Burp* That was a good 'strawberry'! 13:06 (Thread-9 ) *Burp* That was a good 'cherry'! 13:07 (Thread-7 ) *Burp* That was a good 'banana'!
  • 11. Daemon vs Non-Daemon Threads join() currentThread() Threads are non-daemon by default Daemon threads die when the main program dies Useful applications This example illustrates some property differences betweeen daemon and non-daemon threads. Up to this point, all of the example programs have implicitly waited to exit until all the threads have completed their work. Sometimes a program will spawn a thread as a daemon that runs without blocking the main program from exiting. These daemon threads are best designed to do background tasks, such as sending keepalive packets, performing periodic garbage collection, etc. These are only useful when the main program is running and it's okay to kill them off once the other, non-daemon, threads have exited. They are also useful for services where there may not be an easy way to interrupt the thread or where letting the thread die in the middle of its work does not lose or corrupt data (for example, a thread that generates keepalive packets for a service monitoring tool). Without daemon threads, you'd have to keep track of them, and tell them to exit, before your program can completely quit. By setting them as daemon threads, you can let them run and forget about them, and when your program quits, any daemon threads are killed automatically. The default is for threads to not be daemons. This example also illustrates that each threading.Thread instance has a name with a default value that can be assigned during thread instantiation. Naming threads is useful in server process applications, for example, in which there are multiple threads handling different operations. Also, as you've seen from now, the logging module supports embedding the thread name in every log message. logging is also thread-safe, so messages from different threads are kept distinct in the output.
  • 12. In [5]: def _worker(seconds_to_sleep=.1): '''A "worker" method that indicates when work is starting, waits a static amount of time, then in dicates when work is done.''' logging.info('Starting %s' % threading.currentThread().getName()) time.sleep(seconds_to_sleep) logging.info('Exiting %s' % threading.currentThread().getName()) def daemon_vs_non_daemon_example(): daemon_thread_inst = threading.Thread(target=_worker, name='Daemon Worker', args=(3,)) daemon_thread_inst.setDaemon(True) non_daemon_thread_inst = threading.Thread(target=_worker, name='Non-daemon Worker') daemon_thread_inst.start() non_daemon_thread_inst.start() # Remove these two lines and you will not see "Exiting Daemon Worker". daemon_thread_inst.join() non_daemon_thread_inst.join() if __name__ == '__main__': daemon_vs_non_daemon_example() Advanced Features currentThread() enumerate() 13:07 (Daemon Worker) Starting Daemon Worker 13:07 (Non-daemon Worker) Starting Non-daemon Worker 13:07 (Non-daemon Worker) Exiting Non-daemon Worker 13:10 (Daemon Worker) Exiting Daemon Worker
  • 13. When using daemon threads, it isn't necessary to retain an explicit handle to all of them to ensure they were completed before exiting the main process thread. The threading module provides an enumerate() method, which returns a list of active threading.Thread instances. Warning: This list includes the main thread, and joining that will introduce deadlock situation, so care must be taken to skip the joining of the main thread.
  • 14. In [18]: def _worker(fruit): '''Perform some "work" by eating a piece of `fruit` yet finishing after a random amount of tim e.''' logging.info('Eating %s ...' % repr(fruit)) # Indicates "work" is being perfored. seconds_to_sleep = random.randint(5, 15) logging.debug('This will take %s seconds to eat' % seconds_to_sleep) time.sleep(seconds_to_sleep) logging.info('*Burp* That was a good %s!' % repr(fruit)) # Indicates the "work" is done. def enumeration_example(): for fruit in ('apple', 'banana', 'grape', 'cherry', 'strawberry'): thread_inst = threading.Thread(target=_worker, name='%s' % fruit, args=(fruit,)) thread_inst.setDaemon(True) thread_inst.start() main_thread = threading.currentThread() for thread_inst in threading.enumerate(): logging.debug('Active threads: %s' % repr(', '.join(sorted([thread_inst.name for thread_inst in threading.enumerate()])))) if thread_inst is main_thread: continue logging.info('Joining on %s Worker ...' % repr(thread_inst.getName())) thread_inst.join(16) # Needed for Jupyter if __name__ == '__main__': enumeration_example() 14:19 (apple ) Eating 'apple' ... 14:19 (banana ) Eating 'banana' ... 14:19 (grape ) Eating 'grape' ... 14:19 (cherry ) Eating 'cherry' ... 14:19 (strawberry) Eating 'strawberry' ... 14:19 (MainThread) Joining on 'strawberry' Worker ... 14:25 (grape ) *Burp* That was a good 'grape'! 14:26 (banana ) *Burp* That was a good 'banana'! 14:27 (cherry ) *Burp* That was a good 'cherry'! 14:28 (strawberry) *Burp* That was a good 'strawberry'! 14:31 (apple ) *Burp* That was a good 'apple'!
  • 15. Subclassing Can overload the run() method Useful for extending functionality In [7]: class _CustomThread(threading.Thread): def __init__(self, foo, *args, **kwargs): super(_CustomThread, self).__init__(*args, **kwargs) self.foo_ = foo self.args = args self.kwargs = kwargs def run(self): logging.info('Executing Foo %s with args %s and kwargs %s' % (repr(self.foo_), repr(self.args), repr(self.kwargs))) def subclassing_example(): for index in range(1, 6): thread_inst = _CustomThread(foo='Fighter', name='Worker %d' % index) thread_inst.start() if __name__ == '__main__': subclassing_example() 13:21 (Worker 1 ) Executing Foo 'Fighter' with args () and kwargs {'name': 'Worker 1'} 13:21 (Worker 2 ) Executing Foo 'Fighter' with args () and kwargs {'name': 'Worker 2'} 13:21 (Worker 3 ) Executing Foo 'Fighter' with args () and kwargs {'name': 'Worker 3'} 13:21 (Worker 4 ) Executing Foo 'Fighter' with args () and kwargs {'name': 'Worker 4'} 13:21 (Worker 5 ) Executing Foo 'Fighter' with args () and kwargs {'name': 'Worker 5'}
  • 16. class StoppableThread(threading.Thread): def __init__(self, *args, **kwargs): super(StoppableThread, self).__init__(*args, **kwargs) self._terminate = False self._suspend_lock = threading.Lock() def terminate(self): self._terminate = True def suspend(self): self._suspend_lock.acquire() def resume(self): self._suspend_lock.release() def run(self): while True: if self._terminate: break self._suspend_lock.acquire() self._suspend_lock.release() # statements go here Timer Starts after some delay cancel() Useful applications
  • 17. In [19]: def _worker(): '''A very basic "worker" method which logs to stdout, indicating that it did some work.''' logging.info('Doing Work ...') def timer_example(): thread_inst_1 = threading.Timer(3, _worker) thread_inst_1.setName('Worker 1') thread_inst_2 = threading.Timer(3, _worker) thread_inst_2.setName('Worker 2') logging.info('Starting timer threads ...') thread_inst_1.start() thread_inst_2.start() logging.info('Waiting before canceling %s ...' % repr(thread_inst_2.getName())) time.sleep(2.) logging.info('Canceling %s ...' % repr(thread_inst_2.getName())) thread_inst_2.cancel() logging.info('Done!') if __name__ == '__main__': timer_example() Signaling threading.Event One thread signals an event; the others wait for it Useful applications 14:42 (MainThread) Starting timer threads ... 14:42 (MainThread) Waiting before canceling 'Worker 2' ... 14:44 (MainThread) Canceling 'Worker 2' ... 14:44 (MainThread) Done! 14:45 (Worker 1 ) Doing Work ...
  • 18. class threading.Event The internal flag is initially false. is_set() isSet() Return true if and only if the internal flag is true. set() Set the internal flag to true. All threads waiting for it to become true are awakened. Threads that c all wait() once the flag is true will not block at all. clear() Reset the internal flag to false. Subsequently, threads calling wait() will block until set() is call ed to set the internal flag to true again. wait([timeout]) Block until the internal flag is true. If the internal flag is true on entry, return immediately. Oth erwise, block until another thread calls set() to set the flag to true, or until the optional timeout occ urs. When the timeout argument is present and not None, it should be a floating point number specifying a time out for the operation in seconds (or fractions thereof). This method returns the internal flag on exit, so it will always return True except if a timeout is given and the operation times out.
  • 19. In [9]: EVENT = threading.Event() def _blocking_event(): logging.info('Starting') event_is_set = EVENT.wait() logging.info('Event is set: %s' % repr(event_is_set)) def _non_blocking_event(timeout): while not EVENT.isSet(): logging.info('Starting') event_is_set = EVENT.wait(timeout) logging.info('Event is set: %s' % repr(event_is_set)) if event_is_set: logging.info('Processing event ...') else: logging.info('Waiting ...') In [20]: def signaling_example(): blocking_thread_inst = threading.Thread(target=_blocking_event, name='Blocking Event') non_blocking_thread_inst = threading.Thread(target=_non_blocking_event, name='Non-Blocking Event', args=(2.,)) blocking_thread_inst.start() non_blocking_thread_inst.start() logging.info('Waiting until Event.set() is called ...') time.sleep(5.) EVENT.set() logging.info('Event is set') if __name__ == '__main__': signaling_example() 14:52 (Blocking Event) Starting 14:52 (MainThread) Waiting until Event.set() is called ... 14:52 (Blocking Event) Event is set: True 14:57 (MainThread) Event is set
  • 20. Controlling Resources threading.Lock acquire() and release() Useful applications
  • 21. Controlling access to shared resources (such as physical hardware, test equipment, memory, Disk I/O, etc.) is important to prevent corruption or missed data. In Python, not all data structures are equal with regards to being thread-safe. So what data structures are thread-safe and which are not? Python’s built-in data structures (Lists, Dictionaries, etc.) are thread-safe as a side-effect of having atomic byte-codes for manipulating them (the GIL is not released in the middle of an update). Other data structures implemented in Python, or simpler types like Integers and Floats, don’t have that protection. To guard against simultaneous access to an object, use a threading.Lock object. A primitive lock is in one of two states, “locked” or “unlocked”. It is created in the unlocked state. It has two basic methods, acquire() and release(). When the state is unlocked, acquire() changes the state to locked and returns immediately. When the state is locked, acquire() blocks until a call to release() in another thread changes it to unlocked, then the acquire() call resets it to locked and returns. The release() method should only be called in the locked state; it changes the state to unlocked and returns immediately. If an attempt is made to release an unlocked lock, a ThreadError will be raised. When more than one thread is blocked in acquire() waiting for the state to turn to unlocked, only one thread proceeds when a release() call resets the state to unlocked; which one of the waiting threads proceeds is not defined, and may vary across implementations. All methods are executed atomically. Lock.acquire([blocking]) Acquire a lock, blocking or non-blocking. When invoked with the blocking argument set to True (the default), block until the lock is unlocked, then set it to locked and return True. When invoked with the blocking argument set to False, do not block. If a call with blocking set to True would block, return False immediately; otherwise, set the lock to locked and return True. Lock.release() Release a lock. When the lock is locked, reset it to unlocked, and return. If any other threads are blocked waiting for the lock to become unlocked, allow exactly one of them to proceed. When invoked on an unlocked lock, a ThreadError is raised. There is no return value.
  • 22. In [11]: LOCK = threading.Lock() def _worker(): logging.info('Waiting for lock ...') LOCK.acquire() logging.info('Acquired lock') try: logging.info('Doing work on shared resource ...') time.sleep(random.randint(30, 50) / 10.) finally: LOCK.release() logging.info('Released lock') In [12]: def locking_example(): threads = list() for index in range(1, 3): thread_inst = threading.Thread(target=_worker, name='Worker %d' % index) threads.append(thread_inst) random.shuffle(threads) [thread_inst.start() for thread_inst in threads] # pylint: disable-msg=expression-not-assigned logging.info('Waiting for worker threads to finish ...') main_thread = threading.currentThread() for thread_inst in threading.enumerate(): if thread_inst is not main_thread: thread_inst.join(10) # Needed for Jupyter since they overload the threading module! if __name__ == '__main__': locking_example() 13:28 (Worker 1 ) Waiting for lock ... 13:28 (Worker 2 ) Waiting for lock ... 13:28 (MainThread) Waiting for worker threads to finish ... 13:28 (Worker 1 ) Acquired lock 13:28 (Worker 1 ) Doing work on shared resource ... 13:28 (Non-Blocking Event) Event is set: True 13:28 (Non-Blocking Event) Processing event ... 13:31 (Worker 1 ) Released lock 13:31 (Worker 2 ) Acquired lock 13:31 (Worker 2 ) Doing work on shared resource ... 13:34 (Worker 2 ) Released lock
  • 23. Reentrant Locks threading.RLock Useful applications Normal threading.Lock objects cannot be acquired more than once, even by the same thread. This can produce undesirable side-effects, especially if a lock is accessed by more than one method in the call chain. A reentrant lock is a synchronization primitive that may be acquired multiple times by the same thread. Internally, it uses the concepts of “owning thread” and “recursion level” in addition to the locked/unlocked state used by primitive locks. In the locked state, some thread owns the lock; in the unlocked state, no thread owns it. To lock the lock, a thread calls its acquire() method; this returns once the thread owns the lock. To unlock the lock, a thread calls its release() method. acquire()/release() call pairs may be nested; only the final release() (the release() of the outermost pair) resets the lock to unlocked and allows another thread blocked in acquire() to proceed. RLock.acquire([blocking=1]) Acquire a lock, blocking or non-blocking. When invoked without arguments: if this thread already owns the lock, increment the recursion level by one, and return immediately. Otherwise, if another thread owns the lock, block until the lock is unlocked. Once the lock is unlocked (not owned by any thread), then grab ownership, set the recursion level to one, and return. If more than one thread is blocked waiting until the lock is unlocked, only one at a time will be able to grab ownership of the lock. There is no return value in this case. When invoked with the blocking argument set to true, do the same thing as when called without arguments, and return true. When invoked with the blocking argument set to false, do not block. If a call without an argument would block, return false immediately; otherwise, do the same thing as when called without arguments, and return true. RLock.release() Release a lock, decrementing the recursion level. If after the decrement it is zero, reset the lock to unlocked (not owned by any thread), and if any other threads are blocked waiting for the lock to become unlocked, allow exactly one of them to proceed. If after the decrement the recursion level is still nonzero, the lock remains locked and owned by the calling thread. Only call this method when the calling thread owns the lock. A RuntimeError is raised if this method is called when the lock is unlocked. There is no return value.
  • 24. In [13]: logging.basicConfig(level=logging.INFO, format='%(asctime)s (%(threadName)-10s) %(message)s') def rlocking_example(): lock = threading.Lock() logging.info('First try to acquire a threading.Lock object: %s' % bool(lock.acquire())) logging.info('Second try to acquire a threading.Lock object: %s' % bool(lock.acquire(0))) rlock = threading.RLock() logging.info('First try to acquire a threading.RLock object: %s' % bool(rlock.acquire())) logging.info('Second try to acquire a threading.RLock object: %s' % bool(rlock.acquire(0))) if __name__ == '__main__': rlocking_example() Using threading.Lock as Context Managers Eliminates try/finally block Cleaner, more readable and safer code 13:34 (MainThread) First try to acquire a threading.Lock object: True 13:34 (MainThread) Second try to acquire a threading.Lock object: False 13:34 (MainThread) First try to acquire a threading.RLock object: True 13:34 (MainThread) Second try to acquire a threading.RLock object: True
  • 25. In [14]: LOCK = threading.Lock() def _worker(): logging.info('Waiting for lock ...') with LOCK: logging.info('Acquired lock') logging.info('Doing work on shared resource ...') time.sleep(random.randint(30, 50) / 10.) logging.info('Released lock') def locking_example(): threads = list() for index in range(1, 3): thread_inst = threading.Thread(target=_worker, name='Worker %d' % index) threads.append(thread_inst) random.shuffle(threads) [thread_inst.start() for thread_inst in threads] # pylint: disable-msg=expression-not-assigned logging.info('Waiting for worker threads to finish ...') main_thread = threading.currentThread() for thread_inst in threading.enumerate(): if thread_inst is not main_thread: thread_inst.join(10) # Needed for Jupyter since they overload the threading module! if __name__ == '__main__': locking_example() 13:34 (Worker 2 ) Waiting for lock ... 13:34 (Worker 1 ) Waiting for lock ... 13:34 (MainThread) Waiting for worker threads to finish ... 13:34 (Worker 2 ) Acquired lock 13:34 (Worker 2 ) Doing work on shared resource ... 13:39 (Worker 2 ) Released lock 13:39 (Worker 1 ) Acquired lock 13:39 (Worker 1 ) Doing work on shared resource ... 13:44 (Worker 1 ) Released lock
  • 26. Synchronization threading.Condition wait() and notify() Useful applications In addition to using threading.Event, another way of synchronizing threads is through using a threading.Condition object. Because the threading.Condition uses a threading.Lock, it can be tied to a shared resource. This allows threads to wait for the resource to be updated. In this example, the _consumer() threads wait() for the threading.Condition to be set before continuing. The _producer() thread is responsible for setting the condition and notifying the other threads that they can continue. A condition variable is always associated with some kind of lock; this can be passed in or one will be created by default. (Passing one in is useful when several condition variables must share the same lock.) A condition variable has acquire() and release() methods that call the corresponding methods of the associated lock. It also has a wait() method, and notify() and notifyAll() methods. These three must only be called when the calling thread has acquired the lock, otherwise a RuntimeError is raised. The wait() method releases the lock, and then blocks until it is awakened by a notify() or notifyAll() call for the same condition variable in another thread. Once awakened, it re-acquires the lock and returns. It is also possible to specify a timeout. The notify() method wakes up one of the threads waiting for the condition variable, if any are waiting. The notifyAll() method wakes up all threads waiting for the condition variable. Note: the notify() and notifyAll() methods don’t release the lock; this means that the thread or threads awakened will not return from their wait() call immediately, but only when the thread that called notify() or notifyAll() finally relinquishes ownership of the lock.
  • 27. In [15]: CONDITION = threading.Condition() def _consumer(): logging.info('Starting consumer thread') with CONDITION: CONDITION.wait() logging.info('Resource is available to consumer') def _producer(): logging.info('Starting producer thread') with CONDITION: logging.info('Making resource available') CONDITION.notifyAll() def synchronization_example(): thread_inst_consumer_1 = threading.Thread(target=_consumer, name='Consumer 1') thread_inst_consumer_2 = threading.Thread(target=_consumer, name='Consumer 2') thread_inst_consumer_3 = threading.Thread(target=_consumer, name='Consumer 3') thread_inst_producer = threading.Thread(target=_producer, name='Producer') for thread_inst in (thread_inst_consumer_1, thread_inst_consumer_2, thread_inst_consumer_3): thread_inst.start() time.sleep(2) thread_inst_producer.start() if __name__ == '__main__': synchronization_example() 13:44 (Consumer 1) Starting consumer thread 13:44 (Consumer 2) Starting consumer thread 13:44 (Consumer 3) Starting consumer thread 13:46 (Producer ) Starting producer thread 13:46 (Producer ) Making resource available 13:46 (Consumer 2) Resource is available to consumer 13:46 (Consumer 3) Resource is available to consumer 13:46 (Consumer 1) Resource is available to consumer
  • 28. Resource Management threading.Semaphore Blocks when internal counter reaches 0 Useful applications Sometimes it is useful to allow more than one worker access to a resource at a time, while still limiting the overall number. For example, a connection pool might support a fixed number of simultaneous connections, or a network application might support a fixed number of concurrent downloads. A threading.Semaphore is one way to manage those connections. class threading.Semaphore([value]) The optional argument gives the initial value for the internal counter; it defaults to 1. If the value given is less than 0, ValueError is raised. acquire([blocking]) Acquire a semaphore. When invoked without arguments: if the internal counter is larger than zero on entry, decrement it by one and return immediately. If it is zero on entry, block, waiting until some other thread has called release() to make it larger than zero. This is done with proper interlocking so that if multiple acquire() calls are blocked, release() will wake exactly one of them up. The implementation may pick one at random, so the order in which blocked threads are awakened should not be relied on. There is no return value in this case. When invoked with blocking set to true, do the same thing as when called without arguments, and return true. When invoked with blocking set to false, do not block. If a call without an argument would block, return false immediately; otherwise, do the same thing as when called without arguments, and return true. release() Release a semaphore, incrementing the internal counter by one. When it was zero on entry and another thread is waiting for it to become larger than zero again, wake up that thread.
  • 29. In [16]: LOCK = threading.Lock() SEMAPHORE = threading.Semaphore(2) def _worker(): logging.info('Waiting for lock ...') with LOCK: logging.info('Acquired lock') logging.info('Doing work on shared resource ...') time.sleep(random.randint(30, 50) / 10.) logging.info('Released lock') def arbiter(): logging.info('Waiting to join the pool ...') with SEMAPHORE: logging.info('Now part of the pool party!') _worker()
  • 30. In [17]: def resource_management_example(): for index in range(1, 4): thread_inst = threading.Thread(target=arbiter, name='Worker %d' % index) thread_inst.start() if __name__ == '__main__': resource_management_example() Problems Added complexity Deadlock Livelock Starvation 13:46 (Worker 1 ) Waiting to join the pool ... 13:46 (Worker 2 ) Waiting to join the pool ... 13:46 (Worker 1 ) Now part of the pool party! 13:46 (Worker 3 ) Waiting to join the pool ... 13:46 (Worker 2 ) Now part of the pool party! 13:46 (Worker 1 ) Waiting for lock ... 13:46 (Worker 2 ) Waiting for lock ... 13:46 (Worker 1 ) Acquired lock 13:46 (Worker 1 ) Doing work on shared resource ... 13:50 (Worker 1 ) Released lock 13:50 (Worker 2 ) Acquired lock 13:50 (Worker 3 ) Now part of the pool party! 13:50 (Worker 2 ) Doing work on shared resource ... 13:50 (Worker 3 ) Waiting for lock ... 13:54 (Worker 2 ) Released lock 13:54 (Worker 3 ) Acquired lock 13:54 (Worker 3 ) Doing work on shared resource ... 13:58 (Worker 3 ) Released lock
  • 31. Closing What we covered Know when and when not to use threads The tip of the iceburg