SlideShare a Scribd company logo
1 of 31
Download to read offline
Pregel: A System for Large-
                  Scale Graph Processing

                              Written by G. Malewicz et al. at SIGMOD 2010
                                        Presented by Chris Bunch
                                       Tuesday, October 12, 2010


                                                   1
Wednesday, October 13, 2010
Graphs are hard

                   • Poor locality of memory access
                   • Very little work per vertex
                   • Changing degree of parallelism
                   • Running over many machines
                              makes the problem worse


                                           2
Wednesday, October 13, 2010
State of the Art Today
                   • Write your own infrastructure
                    • Substantial engineering effort
                   • Use MapReduce
                    • Inefficient - must store graph state in
                              each stage, too much communication
                              between stages


                                              3
Wednesday, October 13, 2010
State of the Art Today

                   • Use a single-computer graph library
                    • Not scalable ☹
                   • Use existing parallel graph systems
                    • No fault tolerance ☹

                                        4
Wednesday, October 13, 2010
Bulk Synchronous
                                    Parallel
                   • Series of iterations (supersteps)
                   • Each vertex invokes a function in parallel
                   • Can read messages sent in previous
                              superstep
                   • Can send messages, to be read at the next
                              superstep
                   • Can modify state of outgoing edges
                                          5
Wednesday, October 13, 2010
Compute Model

                   • You give Pregel a directed graph
                   • It runs your computation at each vertex
                   • Do this until every vertex votes to halt
                   • Pregel gives you a directed graph back

                                        6
Wednesday, October 13, 2010
Primitives

                   • Vertices - first class
                   • Edges - not
                   • Both can be dynamically created and
                              destroyed



                                              7
Wednesday, October 13, 2010
Vertex State Machine




                                8
Wednesday, October 13, 2010
C++ API

                   • Your code subclasses Vertex, writes a
                              Compute method
                   • Can get/set vertex value
                   • Can get/set outgoing edges values
                   • Can send/receive messages

                                               9
Wednesday, October 13, 2010
C++ API

                   • Message passing:
                   • No guaranteed message delivery order
                   • Messages are delivered exactly once
                   • Can send messages to any node
                   • If dest doesn’t exist, user’s function is called
                                          10
Wednesday, October 13, 2010
C++ API
                   • Combiners (off by default):
                   • User specifies a way to reduce many
                              messages into one value (ala Reduce in MR)
                   • Must be commutative and associative
                   • Exceedingly useful in certain contexts (e.g.,
                              4x speedup on shortest-path compuation)


                                                11
Wednesday, October 13, 2010
C++ API

                   • Aggregators:
                   • User specifies a function
                   • Each vertex sends it a value
                   • Each vertex receives aggregate(vals)
                   • Can be used for statistics or coordination
                                        12
Wednesday, October 13, 2010
C++ API
                   • Topology mutations:
                   • Vertices can create / destroy vertices at will
                   • Resolving conflicting requests:
                    • Partial ordering: E Remove,V Remove,V
                              Add, E Add
                         • User-defined handlers:You fix the
                              conflicts on your own

                                             13
Wednesday, October 13, 2010
C++ API

                   • Input and output:
                    • Text file
                    • Vertices in a relational DB
                    • Rows in BigTable
                    • Custom - subclass Reader/Writer classes
                                       14
Wednesday, October 13, 2010
Implementation

                   • Executable is copied to many machines
                   • One machine becomes the Master
                    • Coordinates activities
                   • Other machines become Workers
                    • Performs computation
                                       15
Wednesday, October 13, 2010
Implementation
                   • Master partitions the graph
                   • Master partitions the input
                    • If a Worker receives input that is not for
                              their vertices, they pass it along
                   • Supersteps begin
                   • Master can tell Workers to save graphs
                                                16
Wednesday, October 13, 2010
Fault Tolerance
                   • At each superstep S:
                    • Workers checkpoint V, E, and Messages
                    • Master checkpoints Aggregators
                   • If a node fails, everyone starts over at S
                   • Confined recovery is under development
                   • what happens if the Master fails?
                                         17
Wednesday, October 13, 2010
The Worker

                   • Keeps graph in memory
                   • Message queues for supersteps S and S+1
                   • Remote messages are buffered
                   • Combiner is used when messages are sent
                              or received (save network and disk)


                                                18
Wednesday, October 13, 2010
The Master
                   • Master keeps track of which Workers own
                              each partition
                         • Not who owns each Vertex
                   • Coordinates all operations (via barriers)
                   • Maintains statistics and runs a HTTP server
                              for users to view info on


                                                 19
Wednesday, October 13, 2010
Aggregators

                   • Worker passes values to its aggregator
                   • Aggregator uses tree structure to reduce
                              vals w/ other aggregators
                         • Better parallelism than chain pipelining
                   • Final value is sent to Master

                                                 20
Wednesday, October 13, 2010
PageRank in Pregel




                                      21
Wednesday, October 13, 2010
Shortest Path in Pregel




                              22
Wednesday, October 13, 2010
Evaluation
                   • 300 multicore commodity PCs used
                   • Only running time is counted
                    • Checkpointing disabled
                   • Measures scalability of Worker tasks
                   • Measures scalability w.r.t. # of Vertices
                    • in binary trees and log-normal trees
                                          23
Wednesday, October 13, 2010
24
Wednesday, October 13, 2010
25
Wednesday, October 13, 2010
26
Wednesday, October 13, 2010
Current / Future Work

                   • Graph must fit in RAM - working on spilling
                              over to / from disk
                   • Assigning vertices to machines to optimize
                              traffic is an open problem
                         • Want to investigate dynamic re-
                                partitioning


                                                    27
Wednesday, October 13, 2010
Conclusions
                   • Pregel is production-ready and in use
                   • Usable after a short learning curve
                    • Vertex centric is not always easy to do
                   • Pregel works best on sparse graphs w /
                              communication over edges
                   • Can’t change the API - too many people
                              using it!

                                               28
Wednesday, October 13, 2010
Related Work

                   • Hama - from the Apache Hadoop team
                   • BSP model but not vertex centric ala Pregel
                   • Appears not to be ready for real use:
                   •

                                        29
Wednesday, October 13, 2010
Related Work
                   • Phoebus, released last week on github
                   • Runs on Mac OS X
                   • Cons (as of this writing):
                    • Doesn’t work on Linux
                    • Must write code in Erlang (since Phoebus
                              is written in it)

                                                  30
Wednesday, October 13, 2010
Thanks!

                   • To my advisor, Chandra Krintz
                   • To Google for this paper
                   • To all of you for coming!

                                       31
Wednesday, October 13, 2010

More Related Content

Viewers also liked

Machine learning with Apache Hama
Machine learning with Apache HamaMachine learning with Apache Hama
Machine learning with Apache HamaTommaso Teofili
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentationMurat Çakal
 
Automated and Adaptive Infrastructure Monitoring using Chef, Nagios and Graphite
Automated and Adaptive Infrastructure Monitoring using Chef, Nagios and GraphiteAutomated and Adaptive Infrastructure Monitoring using Chef, Nagios and Graphite
Automated and Adaptive Infrastructure Monitoring using Chef, Nagios and GraphitePratima Singh
 
Xây dựng chương trình đào tạo thương mại điện tử tại trường đại học
Xây dựng chương trình đào tạo thương mại điện tử tại trường đại họcXây dựng chương trình đào tạo thương mại điện tử tại trường đại học
Xây dựng chương trình đào tạo thương mại điện tử tại trường đại họcCat Van Khoi
 
6 Dean Google
6 Dean Google6 Dean Google
6 Dean GoogleFrank Cai
 
Riak Search - Erlang Factory London 2010
Riak Search - Erlang Factory London 2010Riak Search - Erlang Factory London 2010
Riak Search - Erlang Factory London 2010Rusty Klophaus
 
Chloe and the Realtime Web
Chloe and the Realtime WebChloe and the Realtime Web
Chloe and the Realtime WebTrotter Cashion
 
Blazes: coordination analysis for distributed programs
Blazes: coordination analysis for distributed programsBlazes: coordination analysis for distributed programs
Blazes: coordination analysis for distributed programspalvaro
 
Hyperdex - A closer look
Hyperdex - A closer lookHyperdex - A closer look
Hyperdex - A closer lookDECK36
 
LXC, Docker, and the future of software delivery | LinuxCon 2013
LXC, Docker, and the future of software delivery | LinuxCon 2013LXC, Docker, and the future of software delivery | LinuxCon 2013
LXC, Docker, and the future of software delivery | LinuxCon 2013dotCloud
 
ElasticSearch - index server used as a document database
ElasticSearch - index server used as a document databaseElasticSearch - index server used as a document database
ElasticSearch - index server used as a document databaseRobert Lujo
 
(Functional) reactive programming (@pavlobaron)
(Functional) reactive programming (@pavlobaron)(Functional) reactive programming (@pavlobaron)
(Functional) reactive programming (@pavlobaron)Pavlo Baron
 
Complex Legacy System Archiving/Data Retention with MongoDB and Xquery
Complex Legacy System Archiving/Data Retention with MongoDB and XqueryComplex Legacy System Archiving/Data Retention with MongoDB and Xquery
Complex Legacy System Archiving/Data Retention with MongoDB and XqueryDATAVERSITY
 
Spring Cleaning for Your Smartphone
Spring Cleaning for Your SmartphoneSpring Cleaning for Your Smartphone
Spring Cleaning for Your SmartphoneLookout
 
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...Spark Summit
 
Web-Oriented Architecture (WOA)
Web-Oriented Architecture (WOA)Web-Oriented Architecture (WOA)
Web-Oriented Architecture (WOA)thetechnicalweb
 
Scalable XQuery Processing with Zorba on top of MongoDB
Scalable XQuery Processing with Zorba on top of MongoDBScalable XQuery Processing with Zorba on top of MongoDB
Scalable XQuery Processing with Zorba on top of MongoDBWilliam Candillon
 

Viewers also liked (20)

Machine learning with Apache Hama
Machine learning with Apache HamaMachine learning with Apache Hama
Machine learning with Apache Hama
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
 
Automated and Adaptive Infrastructure Monitoring using Chef, Nagios and Graphite
Automated and Adaptive Infrastructure Monitoring using Chef, Nagios and GraphiteAutomated and Adaptive Infrastructure Monitoring using Chef, Nagios and Graphite
Automated and Adaptive Infrastructure Monitoring using Chef, Nagios and Graphite
 
Xây dựng chương trình đào tạo thương mại điện tử tại trường đại học
Xây dựng chương trình đào tạo thương mại điện tử tại trường đại họcXây dựng chương trình đào tạo thương mại điện tử tại trường đại học
Xây dựng chương trình đào tạo thương mại điện tử tại trường đại học
 
6 Dean Google
6 Dean Google6 Dean Google
6 Dean Google
 
GraphLab
GraphLabGraphLab
GraphLab
 
Riak Search - Erlang Factory London 2010
Riak Search - Erlang Factory London 2010Riak Search - Erlang Factory London 2010
Riak Search - Erlang Factory London 2010
 
Chloe and the Realtime Web
Chloe and the Realtime WebChloe and the Realtime Web
Chloe and the Realtime Web
 
Blazes: coordination analysis for distributed programs
Blazes: coordination analysis for distributed programsBlazes: coordination analysis for distributed programs
Blazes: coordination analysis for distributed programs
 
Hyperdex - A closer look
Hyperdex - A closer lookHyperdex - A closer look
Hyperdex - A closer look
 
Brunch With Coffee
Brunch With CoffeeBrunch With Coffee
Brunch With Coffee
 
LXC, Docker, and the future of software delivery | LinuxCon 2013
LXC, Docker, and the future of software delivery | LinuxCon 2013LXC, Docker, and the future of software delivery | LinuxCon 2013
LXC, Docker, and the future of software delivery | LinuxCon 2013
 
ElasticSearch - index server used as a document database
ElasticSearch - index server used as a document databaseElasticSearch - index server used as a document database
ElasticSearch - index server used as a document database
 
(Functional) reactive programming (@pavlobaron)
(Functional) reactive programming (@pavlobaron)(Functional) reactive programming (@pavlobaron)
(Functional) reactive programming (@pavlobaron)
 
Complex Legacy System Archiving/Data Retention with MongoDB and Xquery
Complex Legacy System Archiving/Data Retention with MongoDB and XqueryComplex Legacy System Archiving/Data Retention with MongoDB and Xquery
Complex Legacy System Archiving/Data Retention with MongoDB and Xquery
 
Spring Cleaning for Your Smartphone
Spring Cleaning for Your SmartphoneSpring Cleaning for Your Smartphone
Spring Cleaning for Your Smartphone
 
NkSIP: The Erlang SIP application server
NkSIP: The Erlang SIP application serverNkSIP: The Erlang SIP application server
NkSIP: The Erlang SIP application server
 
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
 
Web-Oriented Architecture (WOA)
Web-Oriented Architecture (WOA)Web-Oriented Architecture (WOA)
Web-Oriented Architecture (WOA)
 
Scalable XQuery Processing with Zorba on top of MongoDB
Scalable XQuery Processing with Zorba on top of MongoDBScalable XQuery Processing with Zorba on top of MongoDB
Scalable XQuery Processing with Zorba on top of MongoDB
 

Similar to Pregel: A System for Large-Scale Graph Processing

Introduction to CouchDB
Introduction to CouchDBIntroduction to CouchDB
Introduction to CouchDBJohn Wood
 
Challenges in Maintaining a High Performance Search Engine Written in Java
Challenges in Maintaining a High Performance Search Engine Written in JavaChallenges in Maintaining a High Performance Search Engine Written in Java
Challenges in Maintaining a High Performance Search Engine Written in Javalucenerevolution
 
Playing between the clouds - Better Software 2010
Playing between the clouds - Better Software 2010Playing between the clouds - Better Software 2010
Playing between the clouds - Better Software 2010Stefano Linguerri
 
HBase @ Hadoop Day Seattle
HBase @ Hadoop Day SeattleHBase @ Hadoop Day Seattle
HBase @ Hadoop Day Seattleamansk
 
Database Scalability Patterns
Database Scalability PatternsDatabase Scalability Patterns
Database Scalability PatternsRobert Treat
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
 
Gaelyk - SpringOne2GX - 2010 - Guillaume Laforge
Gaelyk - SpringOne2GX - 2010 - Guillaume LaforgeGaelyk - SpringOne2GX - 2010 - Guillaume Laforge
Gaelyk - SpringOne2GX - 2010 - Guillaume LaforgeGuillaume Laforge
 
Ops for Developers
Ops for DevelopersOps for Developers
Ops for DevelopersMojo Lingo
 
Multi Handset Development - ETE 2010
Multi Handset Development - ETE 2010Multi Handset Development - ETE 2010
Multi Handset Development - ETE 2010Kevin Griffin
 
Abstractions at Scale – Our Experiences at Twitter
Abstractions at Scale – Our Experiences at TwitterAbstractions at Scale – Our Experiences at Twitter
Abstractions at Scale – Our Experiences at TwitterLeonidas Tsementzis
 
Odnoklassniki.ru Architecture
Odnoklassniki.ru ArchitectureOdnoklassniki.ru Architecture
Odnoklassniki.ru ArchitectureDmitry Buzdin
 
Puppet buero20 presentation
Puppet buero20 presentationPuppet buero20 presentation
Puppet buero20 presentationMartin Alfke
 
Built-in Replication in PostgreSQL
Built-in Replication in PostgreSQLBuilt-in Replication in PostgreSQL
Built-in Replication in PostgreSQLMasao Fujii
 
Designing for Massive Scalability at BackType #bigdatacamp
Designing for Massive Scalability at BackType #bigdatacampDesigning for Massive Scalability at BackType #bigdatacamp
Designing for Massive Scalability at BackType #bigdatacampMichael Montano
 
Doing data science with Clojure
Doing data science with ClojureDoing data science with Clojure
Doing data science with ClojureSimon Belak
 
Yet Another Replication Tool: RubyRep
Yet Another Replication Tool: RubyRepYet Another Replication Tool: RubyRep
Yet Another Replication Tool: RubyRepDenish Patel
 

Similar to Pregel: A System for Large-Scale Graph Processing (20)

Introduction to CouchDB
Introduction to CouchDBIntroduction to CouchDB
Introduction to CouchDB
 
Challenges in Maintaining a High Performance Search Engine Written in Java
Challenges in Maintaining a High Performance Search Engine Written in JavaChallenges in Maintaining a High Performance Search Engine Written in Java
Challenges in Maintaining a High Performance Search Engine Written in Java
 
Playing between the clouds - Better Software 2010
Playing between the clouds - Better Software 2010Playing between the clouds - Better Software 2010
Playing between the clouds - Better Software 2010
 
HBase @ Hadoop Day Seattle
HBase @ Hadoop Day SeattleHBase @ Hadoop Day Seattle
HBase @ Hadoop Day Seattle
 
Database Scalability Patterns
Database Scalability PatternsDatabase Scalability Patterns
Database Scalability Patterns
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
Gaelyk - SpringOne2GX - 2010 - Guillaume Laforge
Gaelyk - SpringOne2GX - 2010 - Guillaume LaforgeGaelyk - SpringOne2GX - 2010 - Guillaume Laforge
Gaelyk - SpringOne2GX - 2010 - Guillaume Laforge
 
Processing
ProcessingProcessing
Processing
 
Ops for Developers
Ops for DevelopersOps for Developers
Ops for Developers
 
Multi Handset Development - ETE 2010
Multi Handset Development - ETE 2010Multi Handset Development - ETE 2010
Multi Handset Development - ETE 2010
 
Abstractions at Scale – Our Experiences at Twitter
Abstractions at Scale – Our Experiences at TwitterAbstractions at Scale – Our Experiences at Twitter
Abstractions at Scale – Our Experiences at Twitter
 
Odnoklassniki.ru Architecture
Odnoklassniki.ru ArchitectureOdnoklassniki.ru Architecture
Odnoklassniki.ru Architecture
 
Reef - ESUG 2010
Reef - ESUG 2010Reef - ESUG 2010
Reef - ESUG 2010
 
Verification with LoLA: 1 Basics
Verification with LoLA: 1 BasicsVerification with LoLA: 1 Basics
Verification with LoLA: 1 Basics
 
Noit ocon-2010
Noit ocon-2010Noit ocon-2010
Noit ocon-2010
 
Puppet buero20 presentation
Puppet buero20 presentationPuppet buero20 presentation
Puppet buero20 presentation
 
Built-in Replication in PostgreSQL
Built-in Replication in PostgreSQLBuilt-in Replication in PostgreSQL
Built-in Replication in PostgreSQL
 
Designing for Massive Scalability at BackType #bigdatacamp
Designing for Massive Scalability at BackType #bigdatacampDesigning for Massive Scalability at BackType #bigdatacamp
Designing for Massive Scalability at BackType #bigdatacamp
 
Doing data science with Clojure
Doing data science with ClojureDoing data science with Clojure
Doing data science with Clojure
 
Yet Another Replication Tool: RubyRep
Yet Another Replication Tool: RubyRepYet Another Replication Tool: RubyRep
Yet Another Replication Tool: RubyRep
 

More from Chris Bunch

AppScale at SB Cloud Meetup
AppScale at SB Cloud MeetupAppScale at SB Cloud Meetup
AppScale at SB Cloud MeetupChris Bunch
 
A Pluggable Autoscaling System @ UCC
A Pluggable Autoscaling System @ UCCA Pluggable Autoscaling System @ UCC
A Pluggable Autoscaling System @ UCCChris Bunch
 
AppScale + Neptune @ HPCDB
AppScale + Neptune @ HPCDBAppScale + Neptune @ HPCDB
AppScale + Neptune @ HPCDBChris Bunch
 
AppScale @ LA.rb
AppScale @ LA.rbAppScale @ LA.rb
AppScale @ LA.rbChris Bunch
 
AppScale Talk at SBonRails
AppScale Talk at SBonRailsAppScale Talk at SBonRails
AppScale Talk at SBonRailsChris Bunch
 
Active Cloud DB at CloudComp '10
Active Cloud DB at CloudComp '10Active Cloud DB at CloudComp '10
Active Cloud DB at CloudComp '10Chris Bunch
 
Designing the Call of Cthulhu app with Google App Engine
Designing the Call of Cthulhu app with Google App EngineDesigning the Call of Cthulhu app with Google App Engine
Designing the Call of Cthulhu app with Google App EngineChris Bunch
 
Presentation on Large Scale Data Management
Presentation on Large Scale Data ManagementPresentation on Large Scale Data Management
Presentation on Large Scale Data ManagementChris Bunch
 
Appscale at CLOUDCOMP '09
Appscale at CLOUDCOMP '09Appscale at CLOUDCOMP '09
Appscale at CLOUDCOMP '09Chris Bunch
 

More from Chris Bunch (11)

AppScale at SB Cloud Meetup
AppScale at SB Cloud MeetupAppScale at SB Cloud Meetup
AppScale at SB Cloud Meetup
 
Ph.D. Defense
Ph.D. DefensePh.D. Defense
Ph.D. Defense
 
A Pluggable Autoscaling System @ UCC
A Pluggable Autoscaling System @ UCCA Pluggable Autoscaling System @ UCC
A Pluggable Autoscaling System @ UCC
 
AppScale + Neptune @ HPCDB
AppScale + Neptune @ HPCDBAppScale + Neptune @ HPCDB
AppScale + Neptune @ HPCDB
 
Neptune @ SoCal
Neptune @ SoCalNeptune @ SoCal
Neptune @ SoCal
 
AppScale @ LA.rb
AppScale @ LA.rbAppScale @ LA.rb
AppScale @ LA.rb
 
AppScale Talk at SBonRails
AppScale Talk at SBonRailsAppScale Talk at SBonRails
AppScale Talk at SBonRails
 
Active Cloud DB at CloudComp '10
Active Cloud DB at CloudComp '10Active Cloud DB at CloudComp '10
Active Cloud DB at CloudComp '10
 
Designing the Call of Cthulhu app with Google App Engine
Designing the Call of Cthulhu app with Google App EngineDesigning the Call of Cthulhu app with Google App Engine
Designing the Call of Cthulhu app with Google App Engine
 
Presentation on Large Scale Data Management
Presentation on Large Scale Data ManagementPresentation on Large Scale Data Management
Presentation on Large Scale Data Management
 
Appscale at CLOUDCOMP '09
Appscale at CLOUDCOMP '09Appscale at CLOUDCOMP '09
Appscale at CLOUDCOMP '09
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Pregel: A System for Large-Scale Graph Processing

  • 1. Pregel: A System for Large- Scale Graph Processing Written by G. Malewicz et al. at SIGMOD 2010 Presented by Chris Bunch Tuesday, October 12, 2010 1 Wednesday, October 13, 2010
  • 2. Graphs are hard • Poor locality of memory access • Very little work per vertex • Changing degree of parallelism • Running over many machines makes the problem worse 2 Wednesday, October 13, 2010
  • 3. State of the Art Today • Write your own infrastructure • Substantial engineering effort • Use MapReduce • Inefficient - must store graph state in each stage, too much communication between stages 3 Wednesday, October 13, 2010
  • 4. State of the Art Today • Use a single-computer graph library • Not scalable ☹ • Use existing parallel graph systems • No fault tolerance ☹ 4 Wednesday, October 13, 2010
  • 5. Bulk Synchronous Parallel • Series of iterations (supersteps) • Each vertex invokes a function in parallel • Can read messages sent in previous superstep • Can send messages, to be read at the next superstep • Can modify state of outgoing edges 5 Wednesday, October 13, 2010
  • 6. Compute Model • You give Pregel a directed graph • It runs your computation at each vertex • Do this until every vertex votes to halt • Pregel gives you a directed graph back 6 Wednesday, October 13, 2010
  • 7. Primitives • Vertices - first class • Edges - not • Both can be dynamically created and destroyed 7 Wednesday, October 13, 2010
  • 8. Vertex State Machine 8 Wednesday, October 13, 2010
  • 9. C++ API • Your code subclasses Vertex, writes a Compute method • Can get/set vertex value • Can get/set outgoing edges values • Can send/receive messages 9 Wednesday, October 13, 2010
  • 10. C++ API • Message passing: • No guaranteed message delivery order • Messages are delivered exactly once • Can send messages to any node • If dest doesn’t exist, user’s function is called 10 Wednesday, October 13, 2010
  • 11. C++ API • Combiners (off by default): • User specifies a way to reduce many messages into one value (ala Reduce in MR) • Must be commutative and associative • Exceedingly useful in certain contexts (e.g., 4x speedup on shortest-path compuation) 11 Wednesday, October 13, 2010
  • 12. C++ API • Aggregators: • User specifies a function • Each vertex sends it a value • Each vertex receives aggregate(vals) • Can be used for statistics or coordination 12 Wednesday, October 13, 2010
  • 13. C++ API • Topology mutations: • Vertices can create / destroy vertices at will • Resolving conflicting requests: • Partial ordering: E Remove,V Remove,V Add, E Add • User-defined handlers:You fix the conflicts on your own 13 Wednesday, October 13, 2010
  • 14. C++ API • Input and output: • Text file • Vertices in a relational DB • Rows in BigTable • Custom - subclass Reader/Writer classes 14 Wednesday, October 13, 2010
  • 15. Implementation • Executable is copied to many machines • One machine becomes the Master • Coordinates activities • Other machines become Workers • Performs computation 15 Wednesday, October 13, 2010
  • 16. Implementation • Master partitions the graph • Master partitions the input • If a Worker receives input that is not for their vertices, they pass it along • Supersteps begin • Master can tell Workers to save graphs 16 Wednesday, October 13, 2010
  • 17. Fault Tolerance • At each superstep S: • Workers checkpoint V, E, and Messages • Master checkpoints Aggregators • If a node fails, everyone starts over at S • Confined recovery is under development • what happens if the Master fails? 17 Wednesday, October 13, 2010
  • 18. The Worker • Keeps graph in memory • Message queues for supersteps S and S+1 • Remote messages are buffered • Combiner is used when messages are sent or received (save network and disk) 18 Wednesday, October 13, 2010
  • 19. The Master • Master keeps track of which Workers own each partition • Not who owns each Vertex • Coordinates all operations (via barriers) • Maintains statistics and runs a HTTP server for users to view info on 19 Wednesday, October 13, 2010
  • 20. Aggregators • Worker passes values to its aggregator • Aggregator uses tree structure to reduce vals w/ other aggregators • Better parallelism than chain pipelining • Final value is sent to Master 20 Wednesday, October 13, 2010
  • 21. PageRank in Pregel 21 Wednesday, October 13, 2010
  • 22. Shortest Path in Pregel 22 Wednesday, October 13, 2010
  • 23. Evaluation • 300 multicore commodity PCs used • Only running time is counted • Checkpointing disabled • Measures scalability of Worker tasks • Measures scalability w.r.t. # of Vertices • in binary trees and log-normal trees 23 Wednesday, October 13, 2010
  • 27. Current / Future Work • Graph must fit in RAM - working on spilling over to / from disk • Assigning vertices to machines to optimize traffic is an open problem • Want to investigate dynamic re- partitioning 27 Wednesday, October 13, 2010
  • 28. Conclusions • Pregel is production-ready and in use • Usable after a short learning curve • Vertex centric is not always easy to do • Pregel works best on sparse graphs w / communication over edges • Can’t change the API - too many people using it! 28 Wednesday, October 13, 2010
  • 29. Related Work • Hama - from the Apache Hadoop team • BSP model but not vertex centric ala Pregel • Appears not to be ready for real use: • 29 Wednesday, October 13, 2010
  • 30. Related Work • Phoebus, released last week on github • Runs on Mac OS X • Cons (as of this writing): • Doesn’t work on Linux • Must write code in Erlang (since Phoebus is written in it) 30 Wednesday, October 13, 2010
  • 31. Thanks! • To my advisor, Chandra Krintz • To Google for this paper • To all of you for coming! 31 Wednesday, October 13, 2010