6. Data Buffering in the Multiprocessing Module
Although users of multiprocessing and
ProcessPoolExecutor in Python can solve
parallelizable problems without resorting to
synchronization mechanisms such as locks,
these mechanisms are used internally to
transport data through buffers and pipes in
order to accomplish inter-process
communication.
7. Advantages of using the PP module
   1. Automatic detection of the number of
      processors, improving load balance
   2. The number of processors allocated can
      be changed at runtime
   3. Load balancing at runtime
   4. Auto-discovery of resources throughout
      the network
8. Advantages of using Celery
   1. An improved version of the
      multiprocessing pool (the pool as a
      service)
   2. Best suited for shared arrays
9. Distributing Tasks with Celery
Celery is a framework that offers
mechanisms to lessen the difficulties of
creating distributed systems. It works with
the concept of distributing units of work
(tasks) by exchanging messages among
machines interconnected as a network, or
among local workers.
The task is the key concept in Celery; any
sort of job we want to distribute must first
be encapsulated in a task.
10. Parallel Algorithms
The divide-and-conquer technique
Data decomposition
(we used it to create low-cost work units;
not a good match for real-world workloads)
Decomposing tasks with a pipeline
Processing and mapping
11. Data Decomposition problem
We used data decomposition to solve the
matrix problem, where each operation
necessary to reach the final result was
executed by a single worker, and each
worker executed the same number of
operations. In the real world, however, there
is an asymmetry between the number of
workers and the quantity of data being
decomposed, and this directly affects the
performance of the solution.
12. Processing and mapping solution
1. Identify the tasks that require data
   exchange
2. Group the tasks that communicate
   constantly into a single worker; this can
   enhance performance
This is true when there is a heavy load of data
communication, as it helps reduce the
overhead of exchanging information between
the tasks.
13. Resources/Books
1. Jan Palach, Parallel Programming with Python, Packt Publishing, June 2014
2. Francesco Pierfederici, Distributed Computing with Python, Packt Publishing, April 2016
3. P. Miller, Parallel, Distributed Scripting with Python, Center for Applied Scientific Computing