Kuali OLE is an open source library services platform developed by librarians for flexibility and integration. It has 66 members from 10 institutions and is funded by partners and the Mellon Foundation. The platform has four modules and provides selection/acquisition, ERM and linked data functionality. It offers hosted, local or hybrid implementation options and seeks to expand consortial support and full ERM functions.
The document summarizes a presentation on artificial general intelligence (AGI) given at the IntelliFest 2012 conference. It discusses the limitations of narrow AI and the constructivist approach needed for AGI. This involves self-constructing systems that can learn new tasks and adapt. The presentation highlights the HUMANOBS project, which uses a new architecture and programming language called Replicode to develop humanoid robots that can learn social skills through observation. Attention and temporal grounding are also identified as important issues for developing practical AGI systems.
Cloud computing: concepts, technologies, and mechanisms for tackling problems in the cloud.
This document discusses cloud computing concepts, technologies, and business implications. It provides an introduction to cloud models like IaaS, PaaS, and SaaS and demonstrates cloud capabilities through examples of Amazon AWS, Google App Engine, and Windows Azure. The document also discusses enabling technologies for cloud computing like virtualization and programming models for big data like MapReduce and Hadoop.
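The MapReduce model mentioned above can be illustrated with a minimal single-process sketch: map emits key/value pairs, a shuffle groups them by key, and reduce aggregates each group. The function names here are illustrative, not any framework's API; real systems like Hadoop distribute these same three phases across a cluster.

```python
from collections import defaultdict

def map_phase(documents):
    """Emit (word, 1) for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Aggregate each key's values; here, sum the counts."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["cloud computing in the cloud", "computing at scale"]
counts = reduce_phase(shuffle(map_phase(docs)))
```

Because each map and reduce call is independent, the framework can run them on different machines and rerun failed tasks, which is what makes the model attractive for big data in the cloud.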
The document discusses Spark usage at Zillow for data analytics and machine learning. It describes setting up a data lake to store disparate data in various formats for ML scenarios. It discusses using Spark SQL to partition data by region for downstream jobs. Two stumbling blocks of ingesting data from S3 and partitioning output are presented along with solutions. Use cases like historical data storage, user segmentation modeling, and Zestimates home valuation are also summarized.
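The partition-by-region idea from the Zillow summary can be sketched in plain Python (this is not Zillow's actual pipeline, and it uses a dict rather than Spark): rows are bucketed by a region column so each downstream job reads only the partition it needs, analogous to `DataFrameWriter.partitionBy("region")` in Spark SQL.

```python
from collections import defaultdict

def partition_by(rows, key):
    """Split rows into one bucket per distinct value of `key`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)

# Hypothetical listing rows; field names are made up for illustration.
listings = [
    {"region": "WA", "price": 550_000},
    {"region": "CA", "price": 900_000},
    {"region": "WA", "price": 610_000},
]
by_region = partition_by(listings, "region")
```

In Spark the same grouping is written to separate directories per region, so a downstream job for one region never touches the others' data.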
Cloud Presentation and OpenStack case studies -- Harvard University (Barton George)
The presentation walks through the forces affecting IT in higher education today, the value of a cloud brokerage model and case studies of OpenStack-based clouds in higher education. Presented at the Harvard University IT summit.
The Impact of Cloud, Mobile, and Managing the Changing Platforms of Digital Collections presented by Carl Grant, Associate Dean, Knowledge Services & Chief Technology Officer, University of Oklahoma Libraries for the October 16, 2013 NISO Virtual Conference: Revolution or Evolution: The Organizational Impact of Electronic Content.
OpenEBS: asymmetrical block layer in user-space breaking the million IOPS bar... (MayaData)
Presented at FOSDEM 2019
K8s as a universal control plane to deploy containerised applications • Public cloud is moving on premises (GKE, Outpost) • K8s is capable of doing more than containers thanks to controllers (VMs)
Current State of Affairs – Cloud Computing - IndicThreads Cloud Computing Con... (IndicThreads)
Session presented at the 2nd IndicThreads.com Conference on Cloud Computing held in Pune, India on 3-4 June 2011.
http://CloudComputing.IndicThreads.com
Abstract: Cloud computing has seen phenomenal growth over the past year and continues to entrench itself in all facets of IT. Cloud computing is definitely more than a buzzword or a passing trend. Now heavyweights like IBM, HP, and SAP are ready to lock horns with existing players like Amazon, Salesforce, and Microsoft, whose offerings have matured over time. Besides these big players, many startups are coming up with innovative offerings in this space.
The talk is about the current state of affairs in cloud computing. It covers the products, services, and offerings that have been making a lot of noise in the cloud computing space.
Following are the main points that will be covered in the talk:
1. New players: Many enterprise giants are now joining the cloud party, offering infrastructure and platform services. IBM has come out with its SmartCloud for private as well as public clouds, and Oracle has released its cloud-in-a-box solution. The talk will cover the new offerings from these enterprise giants.
2. Old players, new offerings: Amazon, the leader in the cloud infrastructure space, has rolled out many new products and services, strengthening its hold on the market and expanding into the PaaS segment; AWS Elastic Beanstalk, AWS CloudFormation, and EC2 Dedicated Instances most notably have the potential to be game changers. Salesforce, the leader in the cloud SaaS space, released Database.com, an enterprise cloud database, and VMforce.com, a PaaS offering similar to GAE. This section will cover the new offerings from these players.
3. Interesting players in the cloud ecosystem: Many new players are leveraging the cloud to build exciting products such as scalable API platforms, cloud-based logging, and Java in the cloud, e.g. Apigee, PiCloud, Loggly, Cumulogic, and CloudBees. This section will cover the platforms and technologies these companies are working on.
4. Current trends and the future: This section will cover the current trends (where many startups are investing) and what the future may look like in the cloud space.
Finally, the talk aims to arm developers and architects with the latest cutting-edge platforms, products, and technologies in the cloud made available over the last year, helping them leverage the cloud and make better choices that lead to higher ROI and lower TCO.
Speaker:
Chirag Jog is the CTO at Clogeny Technologies, where the main focus is on innovation in the cloud computing, scalable applications, and storage space. He is the chief geek at Clogeny who talks "cloud" and architects exciting ideas in the cloud space. He has previously spoken at IndicThreads, CloudCamp, and other cloud-related events.
Cosmos is a large-scale data processing system used by thousands at Microsoft to process exabytes of data across clusters of over 50,000 servers. It provides a SQL-like language and allows teams to easily share and join data. This drives huge scalability requirements. The Apollo scheduler was developed to maximize cluster utilization while minimizing latency for heterogeneous workloads at cloud scale. Later, JetScope was created to support lower latency interactive queries through intermediate result streaming and gang scheduling while maintaining fault tolerance.
Cloud Architecture Tutorial - Why and What (1 of 3) (Adrian Cockcroft)
Introduction to the Netflix Cloud Architecture Tutorial - discusses the why and what of cloud, including the thinking behind Netflix's choice of AWS and the product features that Netflix runs in the cloud.
In this session you will learn:
Spring framework overview and its salient features
Spring concepts (IoC container / DI)
Spring-AOP basics
Spring ORM / Spring DAO overview
Spring Web / MVC overview
For more information, visit: https://www.mindsmapped.com/courses/software-development/java-developer-training-for-beginners/
Exascale Computing Project - Driving a HUGE Change in a Changing World (inside-BigData.com)
In this video from the OpenFabrics Workshop in Austin, Al Geist from ORNL presents: Exascale Computing Project - Driving a HUGE Change in a Changing World.
"In this keynote, Mr. Geist will discuss the need for future Department of Energy supercomputers to solve emerging data science and machine learning problems in addition to running traditional modeling and simulation applications. In August 2016, the Exascale Computing Project (ECP) was approved to support a huge lift in the trajectory of U.S. High Performance Computing (HPC). The ECP goals are intended to enable the delivery of capable exascale computers in 2022 and one early exascale system in 2021, which will foster a rich exascale ecosystem and work toward ensuring continued U.S. leadership in HPC. He will also share how the ECP plans to achieve these goals and the potential positive impacts for OFA."
Learn more: https://exascaleproject.org/
and
https://www.openfabrics.org/index.php/abstracts-agenda.html
Sign up for our insideHPC Newsletter: https://www.openfabrics.org/index.php/abstracts-agenda.html
The document provides an overview of cloud computing concepts, technologies, and business implications. It discusses cloud models including IaaS, PaaS, and SaaS. It demonstrates cloud capabilities through examples on Amazon AWS, Google App Engine, and Windows Azure. It also covers MapReduce and graph processing as cloud programming models and provides a case study on using cloud computing for a predictive quality project.
The document discusses cloud computing concepts and technologies. It provides an introduction to cloud models like IaaS, PaaS and SaaS and demonstrates cloud capabilities through examples on Amazon AWS, Google App Engine and Windows Azure. It also discusses the Hadoop distributed file system and MapReduce programming model for large scale data processing in the cloud.
The document discusses cloud computing concepts and technologies. It provides an introduction to cloud models like IaaS, PaaS and SaaS and demonstrates cloud capabilities through examples on Amazon AWS, Google App Engine and Windows Azure. It also discusses the Hadoop distributed file system and MapReduce programming model for large scale data processing in the cloud.
This document summarizes a presentation on emerging technologies given by Robert McDonald. It discusses bleeding edge vs leading edge technologies, highlights several technologies on Gartner's 2011 education hype cycle including cloud computing and mobile learning, and explores trends in areas like business intelligence and future technologies for higher education. The presentation provides an overview of new initiatives and considerations for emerging technologies.
The document provides an overview of the Spark framework for lightning fast cluster computing. It discusses how Spark addresses limitations of MapReduce-based systems like Hadoop by enabling interactive queries and iterative jobs through caching data in-memory across clusters. Spark allows loading datasets into memory and querying them repeatedly for interactive analysis. The document covers Spark's architecture, use of resilient distributed datasets (RDDs), and how it provides a unified programming model for batch, streaming, and interactive workloads.
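The caching benefit described above can be made concrete with a small sketch (illustrative names, not Spark's API): the dataset is materialized once on first access, and every later iteration reuses the in-memory copy instead of re-reading from disk as a MapReduce-style pipeline would.

```python
class CachedDataset:
    """Toy stand-in for an in-memory cached dataset."""

    def __init__(self, loader):
        self._loader = loader        # expensive load (e.g. from disk/HDFS)
        self._data = None
        self.loads = 0               # how many times the loader actually ran

    def collect(self):
        if self._data is None:       # first access: pay the load cost once
            self._data = self._loader()
            self.loads += 1
        return self._data

ds = CachedDataset(lambda: list(range(1000)))
# Five "iterations" over the same dataset; the loader runs only once.
totals = [sum(ds.collect()) for _ in range(5)]
```

This is the essence of why iterative algorithms (e.g. gradient descent over the same training set) run much faster on Spark than on systems that reload input on every pass.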
20141206 4Q14 Data Conference: I Am Your DB (Hyeongchae Lee)
The document discusses scaling databases and provides an overview of different database scaling techniques. It begins with introductions to the presenter and databases that scale before covering techniques like read caching, write coalescing, connection scaling, master-slave replication, vertical and horizontal partitioning. Specific databases that scale like Amazon Aurora are also mentioned. Real-world examples of scaling stories and the presenter's experience scaling MySQL are provided.
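One of the read-scaling techniques listed above, read caching, can be sketched as a read-through cache (names are illustrative; the "database" is simulated by a dict): reads are served from the cache when possible and fall back to the database only on a miss.

```python
class ReadThroughCache:
    """Serve reads from a cache, falling back to the database on a miss."""

    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.db_reads = 0            # count of actual database round trips

    def get(self, key):
        if key not in self.cache:    # miss: fetch from the database once
            self.cache[key] = self.db[key]
            self.db_reads += 1
        return self.cache[key]       # hit: no database round trip

db = {"user:1": "alice", "user:2": "bob"}
store = ReadThroughCache(db)
values = [store.get("user:1") for _ in range(3)]  # three reads, one DB hit
```

Production caches (e.g. Redis or memcached in front of MySQL) add eviction and invalidation on writes, but the read path follows this same shape.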
browserCloud.js - M.Sc. Thesis Defense Deck (David Dias)
The document describes a federated community cloud called browserCloud.js that uses a peer-to-peer overlay network on the web platform. It discusses the motivation for the project which is the exponential growth of user generated data and demand for computing power. The objectives are to have a decentralized infrastructure that enables flexible job types and efficient lookups. The architecture uses a Chord-based distributed architecture with membership management, message routing and job scheduling. It is implemented as a browser module, signaling server, testing framework and ray tracing module.
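The Chord-based lookup mentioned above can be sketched with a simplified hash ring (this is not browserCloud.js's actual implementation): peers and keys are hashed onto the same ring, and each key is owned by the first peer clockwise from the key's position.

```python
import hashlib
from bisect import bisect_right

def ring_hash(name, ring_size=2**16):
    """Deterministically place a name on the ring."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % ring_size

def owner(peers, key):
    """Return the peer responsible for `key`: first peer clockwise."""
    points = sorted((ring_hash(p), p) for p in peers)
    idx = bisect_right([pt for pt, _ in points], ring_hash(key))
    return points[idx % len(points)][1]   # wrap around the ring

peers = ["peer-a", "peer-b", "peer-c"]
holder = owner(peers, "job-42")           # always the same peer for this key
```

Because lookups depend only on hashes, adding or removing a peer remaps only the keys adjacent to it on the ring, which is what makes the scheme efficient for churn-heavy browser peers.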
The Tale of Two Deployments: Greenfield and Monolith Apps with Docker Enterpr... (Docker, Inc.)
This document summarizes the experiences of two software engineering teams at Cornell University in migrating their applications to Docker containers. The first team dockerized the university's central financial system (KFS) to enable easier local development and automated testing/deployment. The second team built a new research analytics dashboard from the ground up using Docker to containerize the front-end, API, and data processing components. Both projects saw significant benefits from standardized environments and workflows using Docker, including faster setup for new developers, consistent environments, and easier continuous integration/deployment.
Latest (storage IO) patterns for cloud-native applications (OpenEBS)
Applying microservice patterns to storage gives each workload its own Container Attached Storage (CAS) system. This puts the DevOps persona in full control of storage requirements and brings data agility to k8s persistent workloads. We will go over the concept and implementation of CAS, as well as its orchestration.
Managing Large Flask Applications On Google App Engine (GAE) (Emmanuel Olowosulu)
There are a number of issues production applications need to solve to be scalable and fault tolerant. In this talk, we explore some tips for efficiently running Python apps, particularly with Flask, on App Engine. We also share some collective experience and best practices on GAE.
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15 (MLconf)
Sparking Data in the Cloud: Data isn't useful until it's used to drive decision-making. Companies like Pinterest are using machine learning to build data-driven recommendation engines and perform advanced cluster analysis. In this talk, Praveen Seluka will cover best practices for running Spark in the cloud and common challenges in iterative design and interactive analysis.
Powering Data Science and AI with Apache Spark, Alluxio, and IBM (Alluxio, Inc.)
This document discusses achieving separation of compute and storage in a cloud world. It introduces Spectrum Computing which provides a storage-independent compute platform called Spectrum Conductor. Spectrum Conductor uses intelligent workload scheduling to maximize Spark performance and increase throughput compared to other resource managers like YARN and Mesos. It also allows flexible sharing of resources across workloads while maintaining service level agreements. The document also discusses how Spectrum Conductor can burst workloads to external cloud providers and provide a multi-tenant shared infrastructure for running Spark and other analytics frameworks at scale.
Utilising Cloud Computing for Research through Infrastructure, Software and D... (David Wallom)
This document discusses using cloud computing for research through Infrastructure as a Service (IaaS), Software as a Service (SaaS), and Desktop as a Service (DaaS). For IaaS, it describes the EGI Federated Cloud which provides cloud services from multiple public and private sector providers. For SaaS, it discusses Hub for managing the research lifecycle and data, and Chipster for bioinformatics analysis. For DaaS, it covers EOSCloud which provides virtual desktops for bioinformatics research through the JASMIN cloud. Overall it promotes cloud computing for enabling flexible infrastructure, services, and environments to support diverse research needs.
The Education Cloud is a cloud computing infrastructure as a service for UK higher and further education institutions. It is designed to address concerns about data sovereignty and long-term sustainability. The cloud is operated by Eduserv and built on their existing community cloud infrastructure and lessons learned from the University Modernisation Fund cloud pilot. It offers compute, storage, and networking resources on a pay-as-you-go or reserved virtual datacenter model.
Accelerating distributed joins in Apache Hive: Runtime filtering enhancements (Panagiotis Garefalakis)
Apache Hive is an open-source relational database system that is widely adopted by several organizations for big data analytic workloads. It combines traditional MPP (massively parallel processing) techniques with more recent cloud computing concepts to achieve the increased scalability and high performance needed by modern data intensive applications. Even though it was originally tailored towards long running data warehousing queries, its architecture recently changed with the introduction of the LLAP (Live Long and Process) layer. Instead of regular containers, LLAP utilizes long-running executors to exploit data sharing and caching possibilities within and across queries. Executors eliminate unnecessary disk IO overhead and thus reduce the latency of interactive BI (business intelligence) queries by orders of magnitude. However, as container startup cost and IO overhead is now minimized, the need to effectively utilize memory and CPU resources across long-running executors in the cluster is becoming increasingly essential. For instance, in a variety of production workloads, we noticed that the memory bandwidth cost of eagerly decoding all table columns for every row, even when the row is dropped later on, is starting to dominate single-query performance. In this talk, we focus on some of the optimizations we introduced in Hive 4.0 to increase CPU efficiency and save memory allocations. In particular, we describe the lazy decoding (or row-level filtering) and composite bloom-filter optimizations that greatly improve the performance of queries containing broadcast joins, reducing their runtime by up to 50%. Over several production and synthetic workloads, we show the benefit of the newly introduced optimizations as part of Cloudera's cloud-native Data Warehouse engine. At the same time, the community can directly benefit from the presented features as they are 100% open-source!
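The bloom-filter runtime-filtering idea above can be illustrated with a toy sketch (this is the concept only, not Hive's implementation): the small build side of a broadcast join populates the filter, and probe-side rows whose keys definitely cannot match are dropped before any further decoding work is spent on them.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: no false negatives, possible false positives."""

    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = [False] * size

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))

build_keys = {"k1", "k7"}            # small (broadcast) side of the join
bf = BloomFilter()
for k in build_keys:
    bf.add(k)

probe_rows = [("k1", 10), ("k3", 20), ("k7", 30), ("k9", 40)]
# Drop probe rows that definitely cannot join before decoding other columns.
survivors = [row for row in probe_rows if bf.might_contain(row[0])]
```

Because a Bloom filter never yields false negatives, every matching row survives the pre-filter; the join itself still runs afterwards to discard any false positives.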
Neptune: Scheduling Suspendable Tasks for Unified Stream/Batch Applications (Panagiotis Garefalakis)
This document discusses Neptune, a framework for scheduling suspendable tasks for unified stream and batch applications. It introduces coroutines to implement suspendable tasks that can pause and resume efficiently. It also includes a pluggable scheduling layer that can satisfy the diverse latency and throughput requirements of stream and batch jobs through policies like prioritizing stream jobs. The implementation extends Spark to support suspendable tasks and job priorities, showing it can efficiently share resources while meeting latency goals for stream workloads.
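The pause/resume mechanism behind suspendable tasks can be sketched with Python generators (Neptune itself uses coroutines inside Spark executors; this only illustrates the mechanism). A batch task yields at safe points so a scheduler can suspend it when a latency-sensitive stream task arrives, then resume it without losing progress.

```python
def batch_task(chunks):
    """A batch job that yields a snapshot of its progress at safe points."""
    processed = []
    for chunk in chunks:
        processed.append(chunk * 2)   # some unit of batch work
        yield list(processed)         # safe suspension point

task = batch_task([1, 2, 3])
partial = next(task)                  # run one unit of batch work...
# ...scheduler preempts here to run a latency-sensitive stream task...
stream_result = sum([10, 20])         # stand-in for the stream job
resumed = list(task)[-1]              # ...then resumes the batch task
```

The key property is that suspension is cheap: the generator keeps its own state, so no work done before the yield is redone on resume, matching the paper's goal of sharing resources without hurting stream latency.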
Similar to Pgaref Piccolo Building Fast, Distributed Programs with Partitioned Tables (20)
Accelerating distributed joins in Apache Hive: Runtime filtering enhancementsPanagiotis Garefalakis
Apache Hive is an open-source relational database system that is widely adopted by several organizations for big data analytic workloads. It combines traditional MPP (massively parallel processing) techniques with more recent cloud computing concepts to achieve the increased scalability and high performance needed by modern data intensive applications. Even though it was originally tailored towards long running data warehousing queries, its architecture recently changed with the introduction of the LLAP (Live Long and Process) layer. Instead of regular containers, LLAP utilizes long-running executors to exploit data sharing and caching possibilities within and across queries. Executors eliminate unnecessary disk IO overhead and thus reduce the latency of interactive BI (business intelligence) queries by orders of magnitude. However, as container startup cost and IO overhead is now minimized, the need to effectively utilize memory and CPU resources across long-running executors in the cluster is becoming increasingly essential. For instance, in a variety of production workloads, we noticed that the memory bandwidth cost of eagerly decoding all table columns for every row, even when the row is dropped later on, is starting to overwhelm the performance of single query execution. In this talk, we focus on some of the optimizations we introduced in Hive 4.0 to increase CPU efficiency and save memory allocations. In particular, we describe the lazy decoding (or row-level filtering) and composite bloom-filter optimizations that greatly improve the performance of queries containing broadcast joins, reducing their runtime by up to 50%. Over several production and synthetic workloads, we show the benefit of the newly introduced optimizations as part of Cloudera’s cloud-native Data Warehouse engine. At the same time, the community can directly benefit from the presented features as they are 100% open-source!
Neptune: Scheduling Suspendable Tasks for Unified Stream/Batch ApplicationsPanagiotis Garefalakis
This document discusses Neptune, a framework for scheduling suspendable tasks for unified stream and batch applications. It introduces coroutines to implement suspendable tasks that can pause and resume efficiently. It also includes a pluggable scheduling layer that can satisfy the diverse latency and throughput requirements of stream and batch jobs through policies like prioritizing stream jobs. The implementation extends Spark to support suspendable tasks and job priorities, showing it can efficiently share resources while meeting latency goals for stream workloads.
Medea: Scheduling of Long Running Applications in Shared Production ClustersPanagiotis Garefalakis
MEDEA: Scheduling of Long Running Applications in Shared Production Clusters
EuroSys'18
https://lsds.doc.ic.ac.uk/sites/default/files/medea-eurosys18.pdf
- The document discusses a thesis presentation on bridging the gap between serving and analytics in scalable web applications.
- It outlines challenges with resource efficiency and isolation in typical web app designs that separate online and offline tasks.
- The presentation proposes an in-memory web objects model to express both serving and analytics logic as a single distributed dataflow graph to improve resource utilization while maintaining service level objectives.
This document summarizes work on strengthening consistency in the Cassandra distributed key-value store. The researchers replaced Cassandra's replication mechanism with strongly consistent alternatives like Oracle BDB to improve data consistency. They also implemented a new membership protocol to rapidly propagate changes to clients, replacing Cassandra's gossip-based approach. An initial implementation on a cluster of 6 Cassandra nodes showed performance comparable to Cassandra for Yahoo's YCSB benchmark. Future work involves further evaluation of scalability and availability and adding elasticity capabilities.
This master's thesis proposes a distributed key-value store based on replicated LSM trees. The main contributions are a high-performance data replication primitive that combines the ZAB protocol with LSM tree implementation, and a technique for changing replication group leaders prior to heavy compactions to improve write throughput by up to 60%. Evaluation shows the system outperforms Apache Cassandra and Oracle NoSQL. Future work includes adding elasticity and optimizing Zookeeper load balancing.
The document provides an overview of Nagios, an open source network monitoring software. It discusses storage management challenges, what Nagios is, and provides tutorial topics on how to start a Nagios server, write storage service monitoring code, monitor local and remote storage, and handle events. The tutorial covers installing and configuring Nagios, defining hosts and services, writing check commands, installing NRPE for remote monitoring, and using event handlers to automate responses. Additional Nagios resources are also listed.
The document discusses using a wireless sensor network to improve data center management operations. It aims to automatically determine server locations, notify administrators of location changes, and determine server status even if the network is down. The proposed solution uses an auto-configuring Zigbee wireless sensor network and the open-source Nagios distributed monitoring system extended with a wireless sensor plugin to integrate sensor data and correlate events. An evaluation in an office and data center environment found the system could accurately detect server movement and identify failures even during network partitions.
3. Motivation
• This is the age of big data, and distributed data processing frameworks are key to analyzing it
• Companies such as Google (MapReduce) and Microsoft (Naiad), and open-source communities such as Apache (Hadoop, Spark), have proposed such frameworks
– These frameworks require developers to follow a functional programming model
Garefalakis, Panagiotis, et al. "ACaZoo: A Distributed Key-Value Store based on Replicated LSM-Trees."
6. Motivating Example
Power, Russell, and Jinyang Li. "Piccolo: Building Fast, Distributed Programs with Partitioned Tables." OSDI 2010.
7. PageRank in Map-Reduce
Dataflow models do not expose global state!
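The global-state problem this slide points at can be made concrete with a small in-memory simulation of one PageRank iteration in MapReduce style. The graph, damping factor, and function names below are illustrative, not from the deck: because the dataflow model exposes no shared state, every record must carry the link structure through each map/reduce round just so the next round can see it.

```python
# Toy in-memory sketch of one PageRank iteration in MapReduce style.
# Graph, damping factor, and names are made up for illustration.
from collections import defaultdict

DAMPING = 0.85

def map_phase(records):
    """Emit a rank contribution to every outgoing link."""
    for page, (rank, links) in records.items():
        yield page, ("links", links)  # re-emit the structure: no shared state!
        for target in links:
            yield target, ("rank", rank / len(links))

def reduce_phase(mapped):
    """Sum contributions per page and rebuild records for the next round."""
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    out = {}
    for page, values in grouped.items():
        links, total = [], 0.0
        for kind, payload in values:
            if kind == "links":
                links = payload
            else:
                total += payload
        out[page] = ((1 - DAMPING) + DAMPING * total, links)
    return out

graph = {"a": (1.0, ["b", "c"]), "b": (1.0, ["c"]), "c": (1.0, ["a"])}
graph = reduce_phase(map_phase(graph))  # one full iteration
```

Note how half the records shuffled each round exist only to keep the adjacency lists flowing through the pipeline.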
8. PageRank with RPC/MPI
9. Piccolo’s Goal: Distributed Shared State
• Expose this state to the programmer in a useful form, without making them deal with communication
• Interact with state and graph data, not with machines
10. Piccolo programming model
• Need an easy and efficient way to access and represent the state
• We need the right level of abstraction
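As an illustration of the abstraction these slides are driving at, here is a toy Python sketch of a partitioned in-memory key-value table that a kernel addresses as ordinary imperative code. The class and method names are invented for illustration; the real Piccolo API is a C++ library and differs in detail.

```python
# Illustrative sketch of the Piccolo idea: kernels read and write a
# partitioned key-value table instead of threading data through a dataflow.
class PartitionedTable:
    def __init__(self, num_partitions):
        self.parts = [dict() for _ in range(num_partitions)]

    def _part(self, key):
        # Keys are hashed to partitions; partitions live on workers.
        return self.parts[hash(key) % len(self.parts)]

    def get(self, key, default=0):
        return self._part(key).get(key, default)

    def put(self, key, value):
        self._part(key)[key] = value

# A "kernel" is plain imperative code against globally addressable state:
# here, counting incoming links for each page of a tiny graph.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
in_degree = PartitionedTable(4)
for page, targets in links.items():
    for t in targets:
        in_degree.put(t, in_degree.get(t) + 1)
```

The point is that the programmer sees one logical table, while the runtime handles which machine actually holds each partition.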
11. PageRank with Piccolo
12. Piccolo - Locality
• Communication between machines is slow!
13. Piccolo - Locality
• We need to exploit locality!
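One way to exploit locality is to co-partition related tables with the same partition function and run each kernel instance on the worker that owns the partition it touches, so its reads and writes stay local. A minimal sketch of that placement idea follows; the function names and worker list are made up for illustration.

```python
# Sketch of co-partitioning for locality: any two tables that use the same
# partition function place a given key's data on the same worker, so a
# kernel scheduled there never crosses the network for that key.
NUM_PARTS = 4

def partition_of(key):
    return hash(key) % NUM_PARTS

def owner_of(partition, workers):
    # Simple static assignment of partitions to workers.
    return workers[partition % len(workers)]

workers = ["w0", "w1"]
pages = ["a", "b", "c", "d"]
# If both the adjacency table and the rank table use partition_of, a page's
# links and its rank always land on the same worker.
placement = {p: owner_of(partition_of(p), workers) for p in pages}
```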
14. PageRank with Piccolo Updated
15. Piccolo - Synchronization
Avoid write conflicts with accumulation functions
• NewValue = Accum(OldValue, Update)
• e.g. sum, product, min, max
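The accumulation rule above can be demonstrated directly: when Accum is commutative and associative (such as sum), concurrent updates to the same key commute, so any delivery order produces the same final value and there is no write conflict to resolve. A minimal sketch, where `accum_update` is a made-up helper rather than Piccolo's API:

```python
import itertools
import operator

# Sketch of the slide's rule: NewValue = Accum(OldValue, Update).
def accum_update(table, key, update, accum):
    table[key] = accum(table[key], update) if key in table else update

# Because addition is commutative and associative, every arrival order of
# the "concurrent" updates yields the same final value.
updates = [0.2, 0.5, 0.3]
for order in itertools.permutations(updates):
    table = {}
    for u in order:
        accum_update(table, "rank", u, operator.add)
    assert abs(table["rank"] - 1.0) < 1e-9
```

This is exactly why PageRank fits the model: rank contributions are merged with sum, regardless of which worker's update arrives first.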
16. PageRank with Piccolo Updated
17. Piccolo - Failure Recovery
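The slide text gives no mechanism here; the Piccolo paper describes global checkpoint/restore of table state (Chandy-Lamport-style snapshots) with recomputation since the last checkpoint. Below is a deliberately simplified, made-up sketch of the checkpoint/restore idea for a single table partition, not the actual protocol:

```python
import pickle

# Toy sketch of checkpoint/restore for one table partition. The real system
# coordinates a consistent global snapshot across workers; everything here
# is a single-node simplification for illustration.
class CheckpointedTable:
    def __init__(self):
        self.data = {}
        self._snapshot = None

    def put(self, key, value):
        self.data[key] = value

    def checkpoint(self):
        # Persist a consistent copy of the partition's state.
        self._snapshot = pickle.dumps(self.data)

    def restore(self):
        # On failure, reload the last snapshot; work since then is replayed.
        self.data = pickle.loads(self._snapshot)

table = CheckpointedTable()
table.put("a", 1.0)
table.checkpoint()
table.put("a", 2.0)   # update after the checkpoint is lost on failure
table.restore()       # back to the checkpointed state
```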
18. PageRank with Piccolo Updated
19. Piccolo Evaluation
• 12-node cluster, 64 cores
• 100M-page graph
20. Piccolo Evaluation
• EC2 cluster – linearly scaled the amount of data in proportion with the number of workers