If the Data Cannot Come to the Algorithm...
Many Cores with Java
copyright 2013 Robert Burrell Donkin robertburrelldonkin.name
this work is licensed under a Creative Commons Attribution 3.0 Unported License
Pre-emptive multi-tasking operating
systems use involuntary context switching
to provide the illusion of parallel processes
even when the hardware supports only a
single thread of execution.
Take Away from Session One
Even on a single core,
there's no escaping parallelism.
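Here is a minimal Java sketch of why this matters, assuming Java 8 or later. Threads that update one shared counter interleave under a pre-emptive scheduler whether or not there is more than one core. A plain `count++` is a read, an add and a write, so it can lose updates. The hypothetical class `SharedCounter` uses `java.util.concurrent.atomic.AtomicInteger` instead, because its `incrementAndGet()` is atomic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration class, not part of any library.
class SharedCounter {
    // Starts `threads` threads that each increment one shared counter `perThread` times.
    // incrementAndGet() is an atomic read-modify-write, so no increment is lost
    // however the scheduler interleaves the threads, on one core or many.
    static int countWithThreads(int threads, int perThread) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.incrementAndGet();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join(); // wait for every worker to finish
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countWithThreads(4, 100_000)); // prints 400000
    }
}
```

Replacing the `AtomicInteger` with a plain `int` field and `count++` would make the total unpredictable. This is the "no escaping parallelism" point in code.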
Take Away from Session Two
Take Away from Session Three
Code executing on different cores uses copies held
in registers and caches, so memory shared is likely
to be incoherent unless the program plays by the
rules of the software platform.
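For Java, the rules of the platform are the Java Memory Model. The sketch below shows one of them and assumes Java 9 or later for `Thread.onSpinWait()`. A write to a `volatile` field happens-before every subsequent read of that field. A reader that sees `ready == true` therefore also sees the `payload` written before it. `SafePublication` and its methods are illustrative names, not a library API, and the busy wait is only for demonstration.

```java
// Hypothetical illustration of safe publication through a volatile flag.
class SafePublication {
    private int payload;            // ordinary field, written before the flag
    private volatile boolean ready; // the volatile write/read pair orders the accesses

    void publish(int value) {
        payload = value; // (1) plain write
        ready = true;    // (2) volatile write: (1) becomes visible to any reader that sees true
    }

    // Spins until the flag is seen, then reads the payload.
    int awaitPayload() {
        while (!ready) {
            Thread.onSpinWait(); // busy-wait hint; a real program would block instead
        }
        return payload; // guaranteed to see the value written before ready = true
    }

    static int demo(int value) throws InterruptedException {
        SafePublication shared = new SafePublication();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = shared.awaitPayload());
        reader.start();
        shared.publish(value);
        reader.join(); // join() makes the reader's write to seen[0] visible here
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo(42)); // prints 42
    }
}
```

Without `volatile` on `ready`, the reader could spin forever or read a stale `payload`. The copies in registers and caches are never reconciled unless the program follows the rules.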
S(p) = p - a (p-1)
● S(p) is the speedup for p processors
● a is the non-parallelizable fraction
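The formula is simple enough to evaluate directly. This hypothetical helper computes S(p). For example, with a = 0.1, eight processors give 8 - 0.1 x 7 = 7.3.

```java
// Hypothetical helper evaluating Gustafson's scaled speedup.
class Gustafson {
    // S(p) = p - a(p - 1), where p is the number of processors
    // and a is the non-parallelizable fraction (0 <= a <= 1).
    static double scaledSpeedup(int p, double a) {
        if (p < 1 || a < 0.0 || a > 1.0) {
            throw new IllegalArgumentException("need p >= 1 and 0 <= a <= 1");
        }
        return p - a * (p - 1);
    }

    public static void main(String[] args) {
        for (int p : new int[] {1, 2, 8, 64}) {
            // with a = 0.1: 1.00, 1.90, 7.30, 57.70
            System.out.printf("p=%d S(p)=%.2f%n", p, scaledSpeedup(p, 0.1));
        }
    }
}
```

The speedup grows almost linearly in p while a stays small. This is why a growing problem can keep absorbing processors.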
"in practice, the problem size scales with the number of
processors" John L. Gustafson
● Think about Gustafson's Law...
● The quantity of data processed...
● ...scales linearly as processors are added.
● Throwing processors at the problem
● ...at least sometimes.
Scales and Scaling
Divide and Conquer
● Back to the future
● Partition the data...
○ ...apply the same algorithm to each part and then
○ ...collate the answers.
● Natural to parallelise
● No contended shared memory
● When the algorithm is small
○ it's more efficient
■ to bring the algorithm to the data
■ than the data to the algorithm
● Whether the data is in
○ caches on cores in a many-core computer, or in
○ disk storage in a distributed data store
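On a single many-core JVM, the JDK fork/join framework (`java.util.concurrent.RecursiveTask` and `ForkJoinPool`, Java 7+) is one way to express this pattern. The minimal sketch below sums an array by partitioning it. The class name and the threshold are illustrative assumptions.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Divide and conquer: split the range in half until it is small,
// apply the same summing loop to each part, then collate the partial sums.
class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // illustrative cut-off, not tuned
    private final long[] data;
    private final int from, to; // half-open range [from, to)

    ParallelSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i]; // same algorithm on each part
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ParallelSum left = new ParallelSum(data, from, mid);
        ParallelSum right = new ParallelSum(data, mid, to);
        left.fork();                     // run the left half asynchronously
        long rightSum = right.compute(); // compute the right half in this thread
        return left.join() + rightSum;   // collate the answers
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data)); // prints 500000500000
    }
}
```

Each task reads only its own slice, so there is no contended shared memory. The only coordination is `fork()`/`join()` at the boundaries.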
Map and Reduce
● Partition the data
● The map algorithm
○ works in parallel
○ on local data
● The reduce algorithm
○ collates output from map algorithms
● More complex systems are built from these blocks
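The minimal word-count sketch below shows the two phases in plain Java and assumes no Hadoop or other framework. Each partition is mapped to local counts in parallel, and then a reduce step collates them. `WordCountMapReduce` and its methods are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class WordCountMapReduce {
    // Map phase: count words in one partition, touching only that partition's data.
    static Map<String, Integer> map(List<String> partition) {
        Map<String, Integer> local = new HashMap<>();
        for (String line : partition) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    local.merge(word, 1, Integer::sum);
                }
            }
        }
        return local;
    }

    // Reduce phase: collate the per-partition counts into one result.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    static Map<String, Integer> wordCount(List<List<String>> partitions) {
        List<Map<String, Integer>> partials = partitions.parallelStream()
                .map(WordCountMapReduce::map) // one map task per partition, run in parallel
                .collect(Collectors.toList());
        return reduce(partials);
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("the cat sat", "on the mat"),
                Arrays.asList("the dog sat"));
        System.out.println(wordCount(partitions).get("the")); // prints 3
    }
}
```

In a distributed store, each `map` call would run on the node holding its partition. Only the small per-partition maps would travel to the reducer.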
As a Query Language
● A popular alternative to SQL
○ for distributed data stores
○ Easy to read and write
○ Rich and full programming model
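The hypothetical sketch below illustrates the correspondence with SQL. It expresses `SELECT dept, SUM(salary) FROM employees GROUP BY dept` with Java 8 streams: the grouping key plays the part of the map output, and the per-key sum plays the reduce. The `Employee` type and its fields are assumptions. A distributed store would run the same shape of computation across nodes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupBySum {
    // Hypothetical row type standing in for one row of an "employees" table.
    static final class Employee {
        final String dept;
        final int salary;

        Employee(String dept, int salary) {
            this.dept = dept;
            this.salary = salary;
        }
    }

    // Equivalent of: SELECT dept, SUM(salary) FROM employees GROUP BY dept
    // On a parallel stream, each chunk builds a partial map and the partial maps are merged.
    static Map<String, Integer> totalSalaryByDept(List<Employee> employees) {
        return employees.parallelStream()
                .collect(Collectors.groupingBy(
                        (Employee e) -> e.dept,                            // key: like the map output
                        Collectors.summingInt((Employee e) -> e.salary))); // per-key sum: like the reduce
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee("eng", 100),
                new Employee("eng", 200),
                new Employee("ops", 50));
        System.out.println(totalSalaryByDept(employees).get("eng")); // prints 300
    }
}
```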
Crunching Big Data
● Commodity hardware
● Scales to terabytes and petabytes
○ smoothly by adding new nodes
● Map-Reduce platforms typically provide
○ fault tolerance, e.g. retries
○ redundant data storage
● Statistical resilience
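Retry is the simplest of these mechanisms. The minimal single-process sketch below assumes that a failed task can safely be re-run, i.e. that it is idempotent. `runWithRetry` is a hypothetical helper, not part of any Map-Reduce platform's API.

```java
import java.util.concurrent.Callable;

class RetryingTaskRunner {
    // Hypothetical helper: re-run a failed task up to maxAttempts times,
    // in the spirit of the retries Map-Reduce platforms perform for failed tasks.
    static <T> T runWithRetry(Callable<T> task, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed; rethrow the last failure
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // A task that fails twice (standing in for a flaky node) and then succeeds.
        String result = runWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("node unavailable");
            }
            return "done after " + calls[0] + " attempts";
        }, 5);
        System.out.println(result); // prints "done after 3 attempts"
    }
}
```

Real platforms typically reschedule the task on another node that holds a replica of the data. This is where redundant storage and retries work together.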
When you want to be able to process big data
tomorrow by adding cores or computers, adopt
an appropriate architecture today.