SlideShare a Scribd company logo
1
Twitter Storm

1st Mentioning of Big Data
Michael Cox and David Ellsworth publish “Application-controlled
demand paging for out-of-core visualization” in the Proceedings of the
IEEE 8th conference on Visualization, 1997

http://www.gartner.com/it-glossary/big-data/

Pregel
Map Reduce
2
• Scalable platform for large scale graph analytics
in commodity clusters and cloud.
• Simple yet scalable programming abstraction for
large scale graph processing
• More than 10x improvements in some graph
algorithms over traditional graph processing
systems
• Components
– GoFS : Sub-graph centric Distributed Graph Storage
– Gopher : Distributed BSP-based programming
abstraction for Large Scale graph analytics framework
3
• Iterative vertex centric programming model
based on Bulk synchronous parallel model.
• In each iteration vertex can
1. Receive messages sent to it in previous iteration
2. Send messages to other vertices
3. Modify its own state

• Vertex centric programming mode- Think like
a vertex
• Very simple,
• High communication overhead
• Pregel, http://giraph.apache.org/
4
• Partition graph in to set of sets of vertices and edges
• Sub-graphs – Connected components in a partition.
• User logic operates on a sub graph
– Independent unit of computation

• Resource allocation
– Single Partition → Single Machine | Single Sub-graph → Single CPU

• Data loading
– Entire partition is loaded into memory before computation
– Tasks retain sub-graphs in memory within the task scope
Sub-graph-Task 1

Sub-graph-Task 2

Sub-graph-Task 3

5
• Algorithm : Seqn of super-steps separated
by global barrier synchronization
• Super Step i:
1.
2.
3.
4.

Sub-graphs compute in parallel
Receive messages sent to it in super step i-1
Execute same user defined function
Send messages to other sub graphs (to be
received in super step i+1)
5. Can vote to halt : I’m done / de-activate

http://www.ibm.com/developerworks/opensource/library/os-giraph/

• Global Vote to Halt check

BSP Manager

Control Channels

– termination → all sub-graphs voted to halt

• Active sub graphs participate in every
computation
• De-activated sub-graphs will not get
executed/activated unless it get new
messages

P1

P1

Data Channnels

P1

6
• Algorithm – Max vertex value
• Input – Connected graph with different vertex values {n1,n2…. ni}
• Output – Each vertex with the Max ({n1,n2…. ni})

• Compared to Vertex centric
model
• Less communication
• Fast convergence
7
8
• Cluster
– 12 Nodes, 8-Core Intel Xeon CPU (each)
– 16GB RAM (each), 1TB HDD (each)

• Network
– Gigabit Ethernet

• Apache Giraph latest version from trunk
– Includes Performance improvements from Facebook !
Data Set

Vertices

Edges

Diameter

RN: California Road
Network

1,965,206

2,766,607

849

TR: Internet path graph 19,442,778
from Tracesroutes

22,782,842

25

LJ: LiveJournal
Social Network

68,475,391

10

4,847,571

9
• Connected components
– 81x improvement using California
Road Network (RN) dataset
– 21x improvement using a Trace
route path network of a CDN. (TR)
dataset

Page Rank
4x improvement using RN dataset
1.5x improvement using TR dataset
Not an ideal algorithm for sub-graph
centric programming model

• Single Source Shortest Path
– 32x improvement using RN dataset
– 10x improvement using TR dataset

10
• Connected components
– ~79x reduction using RN dataset.
– ~4x reduction using TR dataset
– ~2x reduction using Live Journal dataset (LJ)

• Single source shortest path
– ~38x reduction using RN data set
– 4x reduction using TR dataset
– ~1.6x reduction using Live journal dataset

11
• Introduced a sub-graph centric programming abstraction
for large scale graph analytics on distributed systems
– Simple
– Enable using shared memory algorithms at sub-graph level.

• Sub-graph centric algorithms and performance results
– Connected Components
– Single Source Shortest Path
– Page Rank

• Issues and Future work
– Sub-graph aware partitioning
– Sub-graph centric algorithms

• Try now
– https://github.com/usc-cloud/goffish
12
http://thesciencepresenter.wordpress.com/category/behaviour-management/

13

More Related Content

What's hot

What's new in IP 4.4
What's new in IP 4.4What's new in IP 4.4
What's new in IP 4.4
Lloyd's Register Energy
 
Benchmarking tool for graph algorithms
Benchmarking tool for graph algorithmsBenchmarking tool for graph algorithms
Benchmarking tool for graph algorithms
Yash Khandelwal
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
Ahmad El Tawil
 
Whats new in IC 2016?
Whats new in IC 2016?Whats new in IC 2016?
Whats new in IC 2016?
Lloyd's Register Energy
 
Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications
Humoyun Ahmedov
 
Mapreduce
MapreduceMapreduce
Mapreduce
Humera Shaikh
 
Scaling graphite to handle a zerg rush
Scaling graphite to handle a zerg rushScaling graphite to handle a zerg rush
Scaling graphite to handle a zerg rush
Daniel Ben-Zvi
 
Sky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor ComputationSky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor Computation
EUDAT
 
Hopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open WorkshopHopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open Workshop
ExtremeEarth
 
MapReduce
MapReduceMapReduce
MapReduce
Surinder Kaur
 
Partitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph ExecutionPartitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph Execution
Chen Wu
 
Hadoop MapReduce joins
Hadoop MapReduce joinsHadoop MapReduce joins
Hadoop MapReduce joins
Shalish VJ
 
Introduction to yarn
Introduction to yarnIntroduction to yarn
Introduction to yarn
Bhupesh Chawda
 
Apache Giraph
Apache GiraphApache Giraph
Apache Giraph
Ahmet Emre Aladağ
 
Dive in with Databases – FME Summer Camp 2018
Dive in with Databases  – FME Summer Camp 2018Dive in with Databases  – FME Summer Camp 2018
Dive in with Databases – FME Summer Camp 2018
Safe Software
 
LWA 2015: The Apache Flink Platform (Poster)
LWA 2015: The Apache Flink Platform (Poster)LWA 2015: The Apache Flink Platform (Poster)
LWA 2015: The Apache Flink Platform (Poster)
Jonas Traub
 
MapReduce Algorithm Design
MapReduce Algorithm DesignMapReduce Algorithm Design
MapReduce Algorithm Design
Gabriela Agustini
 
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on HadoopApache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
guest20d395b
 
Highly Available Graphite
Highly Available GraphiteHighly Available Graphite
Highly Available Graphite
Matthew Barlocker
 
Map reduce in Hadoop
Map reduce in HadoopMap reduce in Hadoop
Map reduce in Hadoop
ishan0019
 

What's hot (20)

What's new in IP 4.4
What's new in IP 4.4What's new in IP 4.4
What's new in IP 4.4
 
Benchmarking tool for graph algorithms
Benchmarking tool for graph algorithmsBenchmarking tool for graph algorithms
Benchmarking tool for graph algorithms
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
Whats new in IC 2016?
Whats new in IC 2016?Whats new in IC 2016?
Whats new in IC 2016?
 
Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications
 
Mapreduce
MapreduceMapreduce
Mapreduce
 
Scaling graphite to handle a zerg rush
Scaling graphite to handle a zerg rushScaling graphite to handle a zerg rush
Scaling graphite to handle a zerg rush
 
Sky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor ComputationSky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor Computation
 
Hopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open WorkshopHopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open Workshop
 
MapReduce
MapReduceMapReduce
MapReduce
 
Partitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph ExecutionPartitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph Execution
 
Hadoop MapReduce joins
Hadoop MapReduce joinsHadoop MapReduce joins
Hadoop MapReduce joins
 
Introduction to yarn
Introduction to yarnIntroduction to yarn
Introduction to yarn
 
Apache Giraph
Apache GiraphApache Giraph
Apache Giraph
 
Dive in with Databases – FME Summer Camp 2018
Dive in with Databases  – FME Summer Camp 2018Dive in with Databases  – FME Summer Camp 2018
Dive in with Databases – FME Summer Camp 2018
 
LWA 2015: The Apache Flink Platform (Poster)
LWA 2015: The Apache Flink Platform (Poster)LWA 2015: The Apache Flink Platform (Poster)
LWA 2015: The Apache Flink Platform (Poster)
 
MapReduce Algorithm Design
MapReduce Algorithm DesignMapReduce Algorithm Design
MapReduce Algorithm Design
 
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on HadoopApache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
 
Highly Available Graphite
Highly Available GraphiteHighly Available Graphite
Highly Available Graphite
 
Map reduce in Hadoop
Map reduce in HadoopMap reduce in Hadoop
Map reduce in Hadoop
 

Similar to GoFFish - A Sub-graph centric framework for large scale graph analytics

Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
balmanme
 
Ling liu part 02:big graph processing
Ling liu part 02:big graph processingLing liu part 02:big graph processing
Ling liu part 02:big graph processing
jins0618
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learning
DataWorks Summit
 
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
NETWAYS
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
Ganesan Narayanasamy
 
The Challenges of SDN/OpenFlow in an Operational and Large-scale Network
The Challenges of SDN/OpenFlow in an Operational and Large-scale NetworkThe Challenges of SDN/OpenFlow in an Operational and Large-scale Network
The Challenges of SDN/OpenFlow in an Operational and Large-scale Network
Open Networking Summits
 
try
trytry
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
Sagar Dolas
 
Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...
balmanme
 
Exascale Deep Learning for Climate Analytics
Exascale Deep Learning for Climate AnalyticsExascale Deep Learning for Climate Analytics
Exascale Deep Learning for Climate Analytics
inside-BigData.com
 
Update on Trinity System Procurement and Plans
Update on Trinity System Procurement and PlansUpdate on Trinity System Procurement and Plans
Update on Trinity System Procurement and Plans
inside-BigData.com
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
Sagar Dolas
 
Ocpeu14
Ocpeu14Ocpeu14
Ocpeu14
KALRAY
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
HPCC Systems
 
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale FrontierMultiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
inside-BigData.com
 
LISP and NSH in Open vSwitch
LISP and NSH in Open vSwitchLISP and NSH in Open vSwitch
LISP and NSH in Open vSwitch
mestery
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big Graphs
Petr Novotný
 
Development and Applications of Distributed IoT Sensors for Intermittent Conn...
Development and Applications of Distributed IoT Sensors for Intermittent Conn...Development and Applications of Distributed IoT Sensors for Intermittent Conn...
Development and Applications of Distributed IoT Sensors for Intermittent Conn...
InfluxData
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
Reynold Xin
 

Similar to GoFFish - A Sub-graph centric framework for large scale graph analytics (20)

Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
 
Ling liu part 02:big graph processing
Ling liu part 02:big graph processingLing liu part 02:big graph processing
Ling liu part 02:big graph processing
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learning
 
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
 
The Challenges of SDN/OpenFlow in an Operational and Large-scale Network
The Challenges of SDN/OpenFlow in an Operational and Large-scale NetworkThe Challenges of SDN/OpenFlow in an Operational and Large-scale Network
The Challenges of SDN/OpenFlow in an Operational and Large-scale Network
 
try
trytry
try
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...
 
Exascale Deep Learning for Climate Analytics
Exascale Deep Learning for Climate AnalyticsExascale Deep Learning for Climate Analytics
Exascale Deep Learning for Climate Analytics
 
Update on Trinity System Procurement and Plans
Update on Trinity System Procurement and PlansUpdate on Trinity System Procurement and Plans
Update on Trinity System Procurement and Plans
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
Ocpeu14
Ocpeu14Ocpeu14
Ocpeu14
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale FrontierMultiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
 
LISP and NSH in Open vSwitch
LISP and NSH in Open vSwitchLISP and NSH in Open vSwitch
LISP and NSH in Open vSwitch
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big Graphs
 
Development and Applications of Distributed IoT Sensors for Intermittent Conn...
Development and Applications of Distributed IoT Sensors for Intermittent Conn...Development and Applications of Distributed IoT Sensors for Intermittent Conn...
Development and Applications of Distributed IoT Sensors for Intermittent Conn...
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
Link_NwkingforDevOps
Link_NwkingforDevOpsLink_NwkingforDevOps
Link_NwkingforDevOps
 

Recently uploaded

Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
Nguyen Thanh Tu Collection
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
RaedMohamed3
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
EduSkills OECD
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
DhatriParmar
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 

Recently uploaded (20)

Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 

GoFFish - A Sub-graph centric framework for large scale graph analytics

  • 1. 1
  • 2. Twitter Storm 1st Mentioning of Big Data Michael Cox and David Ellsworth publish “Application-controlled demand paging for out-of-core visualization” in the Proceedings of the IEEE 8th conference on Visualization, 1997 http://www.gartner.com/it-glossary/big-data/ Pregel Map Reduce 2
  • 3. • Scalable platform for large scale graph analytics in commodity clusters and cloud. • Simple yet scalable programming abstraction for large scale graph processing • More than 10x improvements in some graph algorithms over traditional graph processing systems • Components – GoFS : Sub-graph centric Distributed Graph Storage – Gopher : Distributed BSP-based programming abstraction for Large Scale graph analytics framework 3
  • 4. • Iterative vertex centric programming model based on Bulk synchronous parallel model. • In each iteration vertex can 1. Receive messages sent to it in previous iteration 2. Send messages to other vertices 3. Modify its own state • Vertex centric programming mode- Think like a vertex • Very simple, • High communication overhead • Pregel, http://giraph.apache.org/ 4
  • 5. • Partition graph in to set of sets of vertices and edges • Sub-graphs – Connected components in a partition. • User logic operates on a sub graph – Independent unit of computation • Resource allocation – Single Partition → Single Machine | Single Sub-graph → Single CPU • Data loading – Entire partition is loaded into memory before computation – Tasks retain sub-graphs in memory within the task scope Sub-graph-Task 1 Sub-graph-Task 2 Sub-graph-Task 3 5
  • 6. • Algorithm : Seqn of super-steps separated by global barrier synchronization • Super Step i: 1. 2. 3. 4. Sub-graphs compute in parallel Receive messages sent to it in super step i-1 Execute same user defined function Send messages to other sub graphs (to be received in super step i+1) 5. Can vote to halt : I’m done / de-activate http://www.ibm.com/developerworks/opensource/library/os-giraph/ • Global Vote to Halt check BSP Manager Control Channels – termination → all sub-graphs voted to halt • Active sub graphs participate in every computation • De-activated sub-graphs will not get executed/activated unless it get new messages P1 P1 Data Channnels P1 6
  • 7. • Algorithm – Max vertex value • Input – Connected graph with different vertex values {n1,n2…. ni} • Output – Each vertex with the Max ({n1,n2…. ni}) • Compared to Vertex centric model • Less communication • Fast convergence 7
  • 8. 8
  • 9. • Cluster – 12 Nodes, 8-Core Intel Xeon CPU (each) – 16GB RAM (each), 1TB HDD (each) • Network – Gigabit Ethernet • Apache Giraph latest version from trunk – Includes Performance improvements from Facebook ! Data Set Vertices Edges Diameter RN: California Road Network 1,965,206 2,766,607 849 TR: Internet path graph 19,442,778 from Tracesroutes 22,782,842 25 LJ: LiveJournal Social Network 68,475,391 10 4,847,571 9
  • 10. • Connected components – 81x improvement using California Road Network (RN) dataset – 21x improvement using a Trace route path network of a CDN. (TR) dataset Page Rank 4x improvement using RN dataset 1.5x improvement using TR dataset Not an ideal algorithm for sub-graph centric programming model • Single Source Shortest Path – 32x improvement using RN dataset – 10x improvement using TR dataset 10
  • 11. • Connected components – ~79x reduction using RN dataset. – ~4x reduction using TR dataset – ~2x reduction using Live Journal dataset (LJ) • Single source shortest path – ~38x reduction using RN data set – 4x reduction using TR dataset – ~1.6x reduction using Live journal dataset 11
  • 12. • Introduced a sub-graph centric programming abstraction for large scale graph analytics on distributed systems – Simple – Enable using shared memory algorithms at sub-graph level. • Sub-graph centric algorithms and performance results – Connected Components – Single Source Shortest Path – Page Rank • Issues and Future work – Sub-graph aware partitioning – Sub-graph centric algorithms • Try now – https://github.com/usc-cloud/goffish 12