17-11-2014 © Imperial College London
Piccolo: Building Fast, Distributed
Programs with Partitioned Tables
Presenter: Panagiotis Garefalakis
Course 590 - Academic Writing
Russell Power and Jinyang Li - New York University
Outline
• Motivation
• Background
• Piccolo
– Challenges
– Contribution
– Evaluation
• Conclusion
• Discussion
Motivation
• This is the age of big data, and distributed data processing
frameworks are key to analyzing it
• Companies such as Google (MapReduce) and Microsoft (Naiad),
and open-source communities such as Apache (Hadoop, Spark),
have proposed such frameworks
– These frameworks require developers to follow a functional programming model
Garefalakis, Panagiotis, et al. "ACaZoo: A Distributed Key-Value Store based on Replicated LSM-Trees."
Motivation
• Scaling out: Processing data is quick, I/O is very slow
– 1 HDD = 75 MB/sec
– 1000 HDDs = 75 GB/sec
• For data-intensive workloads, a large number of
commodity servers is preferred over a small number
of high-end servers
– Cost of super-computers is not linear
– But datacenter efficiency is a difficult problem to solve
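The disk figures above can be turned into a rough scan-time estimate. This is a back-of-the-envelope sketch only: the 1 TB dataset size and the helper name `scan_seconds` are assumptions made for illustration, and it ignores network and coordination overhead.

```python
# Rough scan times using the 75 MB/s per-disk figure from the slide.
MB = 10**6
TB = 10**12

def scan_seconds(dataset_bytes, disks, per_disk_bytes_per_sec=75 * MB):
    """Time to read a dataset striped evenly over `disks` drives read in parallel."""
    return dataset_bytes / (disks * per_disk_bytes_per_sec)

one_disk = scan_seconds(1 * TB, disks=1)           # assumed 1 TB dataset
thousand_disks = scan_seconds(1 * TB, disks=1000)

print(f"1 disk: {one_disk / 3600:.1f} h")          # 1 disk: 3.7 h
print(f"1000 disks: {thousand_disks:.1f} s")       # 1000 disks: 13.3 s
```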
MapReduce
• Partition a large problem into smaller sub-problems
• Independent sub-problems executed in parallel
• Combine intermediate results from each individual node (worker)
Parallel problems which are independent (shared nothing)
Computations depend on fragments of the dataset
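The three steps above (partition, run independently, combine) can be sketched with a toy word count. This is a single-machine illustration using a thread pool in place of cluster workers; it is not the API of any MapReduce framework, and the function names and input chunks are made up for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_phase(chunk):
    """Map: count words in one independent fragment of the dataset."""
    counts = defaultdict(int)
    for word in chunk.split():
        counts[word] += 1
    return counts

def reduce_phase(partials):
    """Reduce: combine the intermediate results from every worker."""
    total = defaultdict(int)
    for partial in partials:
        for word, n in partial.items():
            total[word] += n
    return dict(total)

# The "large problem" already partitioned into three fragments.
chunks = ["a b a", "b c", "a c c"]

# Each fragment is processed without looking at any other fragment.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(map_phase, chunks))

word_counts = reduce_phase(partials)
print(word_counts)  # {'a': 3, 'b': 2, 'c': 3}
```

Note that no map task can see another task's counts while it runs; the only exchange of state happens in the combine step.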
Motivating Example
Power, Russell, and Jinyang Li. "Piccolo: Building Fast, Distributed Programs with Partitioned Tables." OSDI 2010.
PageRank in Map-Reduce
Dataflow models do not expose global state!
PageRank with RPC/MPI
Piccolo’s Goal: Distributed Shared State
• Expose this state to the programmer in a useful form, without making the programmer deal with communication
• Programs interact with state and graph data, not with machines
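One way to picture "interact with state, not with machines" is a table whose keys are spread over partitions behind a plain get/put interface. The sketch below is a toy, in-process stand-in: the class name `PartitionedTable`, its methods, and the modulo placement policy are assumptions for illustration and are not taken from the Piccolo API.

```python
class PartitionedTable:
    """Toy in-process stand-in for a table sharded over several workers."""

    def __init__(self, num_partitions):
        # One dict per partition; in a distributed setting each would
        # live in the memory of a different machine.
        self.partitions = [{} for _ in range(num_partitions)]

    def partition_of(self, key):
        # Assumed placement policy: integer keys are spread by modulo.
        return key % len(self.partitions)

    def put(self, key, value):
        self.partitions[self.partition_of(key)][key] = value

    def get(self, key):
        return self.partitions[self.partition_of(key)][key]

# The caller only names keys; which partition holds them stays hidden.
ranks = PartitionedTable(num_partitions=3)
for page_id in range(6):
    ranks.put(page_id, 1.0 / 6)

print(ranks.partition_of(4))  # 1
```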
Piccolo programming model
• We need an easy way to represent and access the state that also performs well
• We need the right level of abstraction
PageRank with Piccolo
Piccolo - Locality
• Communication between machines is slow!
Piccolo - Locality
• We need to exploit locality!
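A small experiment shows why placement matters. The six-page graph below is hypothetical, and both placement policies (`by_id`, `by_site`) are assumptions of this sketch; it simply counts how many links would cross a partition boundary, and therefore need communication, under each policy.

```python
# Hypothetical six-page web graph: page id -> (site, outgoing links).
pages = {
    0: ("a.com", [1, 2]),
    1: ("a.com", [0, 2]),
    2: ("a.com", [0, 3]),
    3: ("b.com", [4, 5]),
    4: ("b.com", [3, 5]),
    5: ("b.com", [3, 0]),
}
SITES = sorted({site for site, _ in pages.values()})

def by_id(page):
    """Spread pages over two partitions by id, ignoring their site."""
    return page % 2

def by_site(page):
    """Keep all pages of one site in the same partition."""
    return SITES.index(pages[page][0])

def remote_links(partition_of):
    """Count links whose source and target land in different partitions."""
    return sum(
        1
        for src, (_, targets) in pages.items()
        for dst in targets
        if partition_of(src) != partition_of(dst)
    )

print(remote_links(by_id), remote_links(by_site))  # 8 2
```

With the same data and the same number of partitions, grouping pages by site leaves 2 of 12 links remote instead of 8.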
PageRank with Piccolo Updated
Piccolo - Synchronization
Avoid write conflicts with accumulation functions
• NewValue = Accum(OldValue, Update)
• Examples: sum, product, min, max
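The rule NewValue = Accum(OldValue, Update) avoids conflicts when the accumulator gives the same result regardless of the order in which updates arrive. The sketch below checks this for sum and max over every arrival order and contrasts it with plain overwrite; the helper `apply_updates` and the sample updates are made up for the illustration.

```python
import itertools
import operator

def apply_updates(old_value, updates, accum):
    """Fold updates into a value: new_value = accum(old_value, update)."""
    value = old_value
    for update in updates:
        value = accum(value, update)
    return value

def overwrite(old, new):
    """Plain write: the last update to arrive wins."""
    return new

updates = [3, 1, 2]
orders = list(itertools.permutations(updates))

# Collect the distinct final values over all 6 arrival orders.
sums = {apply_updates(0, order, operator.add) for order in orders}
maxes = {apply_updates(0, order, max) for order in orders}
overwrites = {apply_updates(0, order, overwrite) for order in orders}

print(sums, maxes, sorted(overwrites))  # {6} {3} [1, 2, 3]
```

Sum and max each end in a single value, so concurrent writers need no coordination; overwrite can end in any of three values depending on timing.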
PageRank with Piccolo Updated
Piccolo - Failure Recovery
PageRank with Piccolo Updated
Piccolo Evaluation
• 12-node cluster, 64 cores
• 100M-page graph
Piccolo Evaluation
• EC2 cluster: the amount of data was scaled linearly in proportion to the
number of workers
Conclusion
• Parallel in-memory applications may need to access
and share intermediate state that resides on
different machines
• Piccolo provides a programming model built on
distributed shared tables
• It provides user-specified policies for
– Effective use of locality
– Efficient synchronization
– Robust failure recovery
Limitations??
• Aggregate functions are not always an
option
• The shared state must fit in memory
• If a node fails, all nodes must be restored
to the last checkpoint
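The last limitation can be made concrete with a toy simulation of global rollback. The `Cluster` class, its fields, and the three-worker setup are assumptions for illustration only; the point is that work done after the checkpoint is lost on every worker, not just the failed one.

```python
import copy

class Cluster:
    """Toy model: every worker holds a slice of state and a shared checkpoint."""

    def __init__(self, num_workers):
        self.state = [{"iteration": 0, "value": 0} for _ in range(num_workers)]
        self.checkpoint = copy.deepcopy(self.state)

    def step(self):
        # One iteration of (made-up) work on every worker.
        for worker in self.state:
            worker["iteration"] += 1
            worker["value"] += 10

    def save_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.state)

    def recover(self):
        # Global rollback: every worker returns to the checkpoint,
        # including those that did not fail.
        self.state = copy.deepcopy(self.checkpoint)

cluster = Cluster(num_workers=3)
cluster.step()
cluster.step()
cluster.save_checkpoint()  # checkpoint taken after iteration 2
cluster.step()             # iteration 3 completes, then one worker fails
cluster.recover()

print([w["iteration"] for w in cluster.state])  # [2, 2, 2]
```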
Paper Comments
• The Piccolo paper is clear and concise, with an extensive evaluation
• It was published in 2010 and presented at a top-tier
systems conference (OSDI), collocated with the USENIX annual
conference
• It has been cited 100 times according to Google Scholar
• The reason: It introduces a new programming model for sharing
mutable state in parallel applications
• MapReduce, which can be considered a de facto standard for
parallel execution, does not support sharing state
• It continues to attract attention because this is an open research area
Panagiotis Garefalakis 17/11/2014
Review Presentation
Course 590 - Academic Writing
Backup - LB
