The term Massively Parallel BigData Processor denotes a recent advance in bigdata processing technology: a shift from the traditional distributed processing represented by Hadoop to modern parallel processing on multicore and manycore architectures. The core problem in massively parallel systems is irregularity, which arises when a single item -- which can be referred to as a blackswan -- incurs much more processing cost than the majority of items. This paper discusses the respective countermeasures, divided into (1) distribution models and (2) runtime models and optimizations. The differences between the multicore and manycore cases are also discussed, where the latter is represented by the currently dominant design based on tiles and n-neighbor switching.
Irregularity Countermeasures in Massively Parallel BigData Processors
2. Risk Management for BigData
• hardware is moving towards regular designs -- manycore
• but execution environments are becoming more irregular
◦ even physics [14], modeling and other HPC are becoming irregular
• the question: what is the risk from running irregular apps on regular structures?
◦ other questions: which hardware platforms are better and which can be improved?
• this talk presents the concept of massively multicore as a possible answer
[14] S.Jarp+2 "The future of commodity computing and many-core versus the interests of HEP software" Journal of Physics (2012)
M.Zhanikeev -- maratishe@gmail.com Irregularity Countermeasures in Massively Parallel BigData Processors -- bit.do/151016 2/22
3. The Story of Massively Multicore
[Figure: the replay architecture -- a Manager maps Jobs onto per-batch Buffers along a Time axis (buffer head = Now, with a tail and per-job positions); a Controller (1) replays at a scale and (2) kills, reports, and manages jobs in realtime, one replay batch at a time.]
• traditional Hadoop has reached its limits [07][08]
• massively multicore: many cores, but connected in the multicore (not manycore) design
• many uses, but specifically for BigData Replay on Multicore [01]
• merits: can pack jobs in batches, optimize batches at runtime, etc.
• irregularity is in the variance in playback positions across jobs
[01] myself+0 "Streaming Algorithms for Big Data Processing on Multicore" Big Data: Algorithms, ... CRC (2015)
[07] K.Shvachko+0 "HDFS Scalability: the Limits to Growth" the Magazine of USENIX, vol.35, no.2 (2012)
[08] A.Rowstron+4 "Nobody ever got fired for using Hadoop on a cluster" 1st Int. Work. on Hot Topics in Cloud Data Processing (2012)
4. Implementing Massively Parallel
[Figure: a Manager and job processes, each attached to shmap regions which all reside in the same DRAM on traditional hardware.]
• shmap is the best option
• lockfree shmap is even better [02]
• recently discussed in MPI as the true one-sided communication method [15]
• on traditional hardware, all shmaps share the same DRAM
• ... but there is hope for non-traditional hardware in the near future (working on it)
[02] myself+0 "A lock-free shared memory design for high-throughput multicore packet traffic capture" IJNM (2014)
[15] S.Potluri+4 "Optimizing MPI One Sided Communication on Multi-core InfiniBand Clusters..." 18th EuroMPI (2011)
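The lockfree shmap idea can be sketched in a few lines. This is an illustrative single-producer, single-consumer design over an anonymous mmap, not the implementation from [02]; all names and sizes below are assumptions. With exactly one writer advancing the head and one reader advancing the tail, each counter has a single owner and no locks are needed.

```python
import mmap
import struct

# Minimal sketch (assumptions mine, not the paper's implementation):
# a single-producer, single-consumer ring buffer over an mmap'ed region.
# One writer owns the head, one reader owns the tail -- no locks needed.

SLOT = 8          # one 64-bit record per slot
NSLOTS = 1024
HDR = 16          # head (8 bytes) + tail (8 bytes)

def shmap_create():
    """Allocate the shared region; real code would back this with a file."""
    return mmap.mmap(-1, HDR + NSLOTS * SLOT)

def shmap_push(m, value):
    """Writer side: append one record, then publish the new head."""
    head, tail = struct.unpack_from("qq", m, 0)
    if head - tail >= NSLOTS:
        return False                       # ring full
    struct.pack_into("q", m, HDR + (head % NSLOTS) * SLOT, value)
    struct.pack_into("q", m, 0, head + 1)  # publish after the data write
    return True

def shmap_pop(m):
    """Reader side: consume one record, then advance the tail."""
    head, tail = struct.unpack_from("qq", m, 0)
    if tail == head:
        return None                        # ring empty
    (value,) = struct.unpack_from("q", m, HDR + (tail % NSLOTS) * SLOT)
    struct.pack_into("q", m, 8, tail + 1)
    return value
```

In a real multi-process setup the mmap would be file-backed and the head/tail updates would rely on aligned 64-bit stores being atomic on the target hardware.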
6. Benchmark : Parameter Space
• shmap size in bytes
• batchcount : number of batches = shmap regions
• batchsize : number of jobs in each batch
• experimental setup : commodity 4-core hardware
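The three parameters above form a nested sweep. A sketch of such a benchmark driver (the concrete value lists are assumptions; the slide only names the parameters):

```python
from itertools import product

# Hypothetical sweep over the slide's parameter space; outer-to-inner
# order is size, batchcount, batchsize.
sizes = [100_000, 1_000_000, 10_000_000]   # shmap size in bytes
batchcounts = [1, 2, 5, 10, 25, 50]        # number of batches = shmap regions
batchsizes = [4, 8, 16]                    # jobs per batch (assumed values)

def run_benchmark(run_one):
    """run_one(size, batchcount, batchsize) -> execution time in us."""
    results = {}
    for size, bc, bs in product(sizes, batchcounts, batchsizes):
        results[(size, bc, bs)] = run_one(size, bc, bs)
    return results
```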
7. Benchmark : Results (1)
[Figure: execution time (us, 0 to 4,000,000) across the permutation sequence of setups on X, ordered size, batchcount, batchsize, with size ∈ {100000, 1000000, 10000000} and batchcount ∈ {1, 2, 5, 10, 25, 50}.]
• visualizing a multidimensional space via a permutation sequence on X
• the order is key -- the first parameter is the outer loop, and so on
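The permutation sequence on X can be generated mechanically: itertools.product keeps the first parameter as the outermost loop, which is exactly the ordering rule above. The function name is mine; the label format mirrors the figure.

```python
from itertools import product

# Flatten a multidimensional parameter space into one ordered label
# sequence; the first parameter is the outermost loop.
def permutation_axis(params):
    """params: ordered list of (name, values) pairs."""
    names = [n for n, _ in params]
    grids = [v for _, v in params]
    return ["".join(f"{n}#{v}" for n, v in zip(names, combo))
            for combo in product(*grids)]

labels = permutation_axis([("size", [100_000, 1_000_000]),
                           ("batchcount", [1, 2, 5])])
# labels[0] is "size#100000batchcount#1", matching the figure's X axis
```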
12. Grow vs Drop Models
[Figure: the replay architecture from slide 3 -- Manager, per-batch Buffers with head (Now), tail, and per-job positions, and a Controller that replays at a scale and kills, reports, and manages jobs in realtime.]
• practical problem: how to manage batches with high variance across jobs?
• grow: let the batch grow in size, with no need to kill/remap jobs
• drop: size is fixed, lagging jobs are killed and possibly remapped to other batches
• ... these are basic models; other variants are possible
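The two models can be stated as one-step policies over the jobs' playback positions. This is a toy formulation under my own assumptions (a fixed drag window for the drop model), not the paper's exact rules:

```python
# Toy model: each job reports its playback position within the buffer.

def grow_step(positions):
    """Grow model: the batch stretches to cover all jobs; nothing is
    killed.  Returns the buffer span the batch now needs."""
    return max(positions) - min(positions)

def drop_step(positions, window):
    """Drop model: the batch size is fixed; jobs lagging more than
    `window` behind the head are killed (to be remapped elsewhere).
    Returns (survivors, dropped_count)."""
    head = max(positions)
    survivors = [p for p in positions if head - p <= window]
    return survivors, len(positions) - len(survivors)
```

The trade-off is visible directly: grow pays in buffer (shmap) space, drop pays in kill/remap churn.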
13. Analysis Setup
• start with 100 cores, one job per core, run for some time collecting statistics
• use hotspot distribution to describe processing time per data unit
14. Hotspot Distribution
The Hotspot Distribution [06] consists of normal, popular, and hot/flash sets.
[Figure: log(value) over 100 items in decreasing order, with tail-fatness classes A through E marked on the curve.]
• CDN example: normal are almost never watched videos, popular are watched sometimes, and only hot/flash are the videos which are hot normally but also experience Flash Crowds (go viral)
• additional classification: assign a letter to the curve based on the fatness of its tail (size of its head)
[06] myself+1 "Popularity-Based Modeling of Flash Events in Synthetic Packet Traces" IEICE CQ Technical Committee (2012)
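A hotspot-style per-unit processing cost can be sampled as a three-set mixture. The weights and cost ranges below are illustrative assumptions for demonstration, not the fitted parameters from [06]:

```python
import random

# Illustrative hotspot sampler: most items are "normal" (cheap), a
# minority are "popular", and a thin hot/flash tail (the blackswans)
# costs orders of magnitude more.  All weights/ranges are assumptions.
def hotspot_cost(rng):
    r = rng.random()
    if r < 0.90:                       # normal set
        return rng.uniform(1.0, 2.0)
    if r < 0.99:                       # popular set
        return rng.uniform(5.0, 20.0)
    return rng.uniform(100.0, 1000.0)  # hot/flash set

rng = random.Random(1)
costs = sorted((hotspot_cost(rng) for _ in range(1000)), reverse=True)
# plotted in decreasing order on a log scale, this yields the
# head-and-tail curve of the figure
```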
15. Elasticity of the Two Models
[Figure: six panels of normalized drag window (Y) vs normalized drop count (X), both on 0 to 1.05 scales, for size ∈ {100000, 1000000} and batchcount ∈ {5, 10, 25}.]
• the same setups are run for each model
• metrics: shmap size for grow vs drop count for drop
• figure: plots normalized distributions of outcomes
• figure: the drop model is much more flexible
16. The ManyCore Design
ManyCore is about Tiles ... where each tile has a CPU + L1/L2 cache + switch.
[Figure: a manycore device as a grid of tiles with I/O at the edge; a Manager maps one batch -- jobs plus shmap -- onto a group of tiles.]
• wormhole routing is common [13][11]
• against intuition, the wormhole method is low-latency and high-throughput but not contention-free
• hardware makers offer various tricks [13] in this area, but do not resolve the key problem
[11] J.Duato+3 "...Router Architectures for Virtual Cut-Through and Wormhole Switching in a NOW Environment" 13th IPPS/SPDP (1999)
[13] D.Wentzlaff+9 "On-chip interconnection architecture of the tile processor" IEEE Micro, vol.27, issue 5 (2007)
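The contention point can be illustrated with a toy model of XY (dimension-order) routing on a tile mesh, as commonly paired with wormhole switching: two worms whose routes share a link contend, and a blocked worm holds its links in place rather than being buffered whole. Everything below is a sketch under that simplified model:

```python
from collections import Counter

def xy_route(src, dst):
    """Links visited going X-first then Y, as (tile, tile) pairs."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def contended_links(flows):
    """Links used by more than one flow -- the contention hotspots."""
    use = Counter(link for s, d in flows for link in xy_route(s, d))
    return {link: n for link, n in use.items() if n > 1}

# two flows whose X legs overlap contend on the shared horizontal links
hot = contended_links([((0, 0), (3, 0)), ((1, 0), (3, 1))])
```

Even this minimal model shows that low latency does not imply freedom from contention: deterministic XY routes concentrate traffic on shared links.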
17. ManyCore Parameters
• each batch is a spatial area on the chip; areas compete for space
• heterometric: a measure of irregularity = variance in batchsize, hotspots, etc.
• performance metric: failure to map a new job to an existing batch
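The two metrics above can be sketched concretely; the formulas are my assumptions (variance of per-batch sizes for the heterometric, greedy first-fit for job mapping), since the slide only names the metrics:

```python
from statistics import pvariance

def heterometric(batch_sizes):
    """Irregularity as the variance of per-batch sizes (one assumed
    instantiation of the slide's heterometric)."""
    return pvariance(batch_sizes)

def mapping_failure_rate(free_slots_per_batch, new_jobs):
    """Greedy first-fit of new jobs into batch areas with free tile
    slots; the failure rate is the share of jobs that fit nowhere."""
    free = list(free_slots_per_batch)
    failures = 0
    for _ in range(new_jobs):
        for i, slots in enumerate(free):
            if slots > 0:
                free[i] -= 1
                break
        else:
            failures += 1
    return failures / new_jobs
```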
21. Wrapup
• ManyCore may remain for regular apps
◦ scientific modeling, the Earth Simulator, etc.
• irregular apps perform better on Massively Multicore
• towards new platforms: virtualization techniques for DRAM on standard multicore [16]?
[16] R.Brightwell+0 "Lightweight Kernel Support for Direct Shared Memory Access..." W. on Managed Many-Core Systems (2008)
22. That’s all, thank you ...