SlideShare a Scribd company logo
1 of 1
Download to read offline
Sub-graph centric programming framework for
                                      large scale time series graphs
                                                                           Charith Wickramaarachchi,Alok Kumbhare
                                                                   Advisors: Yogesh Simmhan & Viktor Prasanna

                             1. Introduction                                                                                                 2. Motivation
                                                             R         R
                                                                                                                Sub graph centric graph algorithms
        Pregel                                                                    Map reduce
                                                         M       M    M    M
                                                                                                                • Operates on sub-graphs which will run in parallel.
                                                                                                                • Sub graphs can communicate with each other in each super
                                                                                                                  step.



                                                                 Sub graph centric



                               Barrier Synchronization
                 BSP




• Limitations of Map Reduce[1] and Pregel[2] models for graph
  algorithms.
  • Map reduce – Do not consider the topology and locality of graphs.
  • Pregel - Costs large number of coordination steps for some class of
     graph algorithms due to vertex centric nature
• Bulk Synchronous parallel - Abstract computer model for designing
  data parallel algorithms.
  • Computation proceeds in super steps which consists of
     • Computation
     • Communication
     • Barrier Synchronization

                                 Sub-graph centric programming model

                                  • Sub-graph G’ =(V’,E’)
  G’1                  G’2          • connected subset of vertices of a
                                      Graph G = (V,E)
                                          V '  V , E'  E
                                  • Sub-Graph centric programming                                                Sub graph centric Minimum Spanning Tree algorithm
                                    model
                                    • Operates on a sub-graphs                                                   • Build Min spanning forest in each sub -graph locally with
                                      concurrently                                                                 Borůvka's algorithm
                                    • Communicate between sub-                                                   • Extends Karloff et al. [3] s MST algorithm using a sub-graph
                                      graphs                                                                       centric approach and reduce the graph size.



                             3. Time Series Graphs                                                                                                  4. GoFFish
                                                                                                                 Analytics & Time-series Graph
                                                                                                                          Applications
   • Fixed graph of event sources                                                       Data
                                                                                     Analyst
        • Known relationships between
          sources E.g. pathway                                                                                               Gopher
                                                                                                                   Graph-oriented Programming
        • E.g. “license plate detected”                                                                                    Framework
                                                                      Analytics
          event generated by a camera
   • Streams of events form graph time-                                 Cloud                                                 GoFS
                                                                      Storage                                     Graph-oriented Distributed File
     series
                                                                                                                             System
        • Snapshots of graphs over time                                 Graph
                                                                            s
   • Graph Template GT=(V,E)
   • Graph instance Gi = (Vt,Et,ti)                                                                                 •  Graph-oriented File System
   • Time Series Graph TG= (GT,                                                                                          • Relationships between event sources and across time
     G1,G2,..Gk)                                                                                                            are captured in the data model
                                                                                                                         • Layout is key: How can we maximize parallelism &
                                                                                                                            minimize disk access for analytics
                                                                                                                    • Graph Programming Framework
                                              Time series graph applications                                          • Abstractions sensitive to event relationships & time-series
                                                                                                                      • Sub-graph centric operations that span graph instances
                                              • License plate recognition                                             • Analytics composed as a dataflow
                                                systems scan the license                                              • Framework aware of distributed layout
                                                plates of moving or parked                                            • Limit coordination overhead
                                                vehicles.                                                           • Works on Intersection of time-series event data & graph data
                                                • Large amount of time                                                structures
                                                   series data
                                                • Applications :                                             References
                                                  • Finding hot routes / spots                               [1] MapReduce: Simplified Data Processing on Large Clusters , Jeffrey Dean and
                                                     for Auto theft.                                         Sanjay Ghemawat , http://dl.acm.org/citation.cfm?id=1327492
                                                                                                             [2] Pregel: a system for large-scale graph processing , Grzegorz Malewicz et al.
                                                                                                             http://dl.acm.org/citation.cfm?id=1807184
                                                                                                             [3] A Model of Computation for MapReduce,Karloff et al.
                                                                                                             ,http://dl.acm.org/citation.cfm?id=1873677



                                    This work is supported by Defense Advanced Research Projects Agency - XDATA program and the National Science foundation Co – PI s :
http://ganges.usc.edu                                                     Viktor Prasanna , Yogesh Simmhan , Raghu Raghavendra
                                                         The views of authors expressed herein do not necessarily reflect those of the sponsors.

More Related Content

Similar to Gopher A Sub-graph centric framework for large scale graphs

A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat DetectionJen Aman
 
Geographic Information System for Egyptian Railway System(GIS)
Geographic Information System for Egyptian Railway System(GIS)Geographic Information System for Egyptian Railway System(GIS)
Geographic Information System for Egyptian Railway System(GIS)Ismail El Gayar
 
Sparksummit2016 share
Sparksummit2016 shareSparksummit2016 share
Sparksummit2016 sharePing Yan
 
Nebula Graph nMeetup in Shanghai - Meet with Graph Technology Enthusiasts
Nebula Graph nMeetup in Shanghai - Meet with Graph Technology EnthusiastsNebula Graph nMeetup in Shanghai - Meet with Graph Technology Enthusiasts
Nebula Graph nMeetup in Shanghai - Meet with Graph Technology EnthusiastsNebula Graph
 
Graph Based Machine Learning with Applications to Media Analytics
Graph Based Machine Learning with Applications to Media AnalyticsGraph Based Machine Learning with Applications to Media Analytics
Graph Based Machine Learning with Applications to Media AnalyticsNYC Predictive Analytics
 
GraphChi big graph processing
GraphChi big graph processingGraphChi big graph processing
GraphChi big graph processinghuguk
 
Gis capabilities on Big Data Systems
Gis capabilities on Big Data SystemsGis capabilities on Big Data Systems
Gis capabilities on Big Data SystemsAhmad Jawwad
 
The Case for a Signal Oriented Data Stream Management System
The Case for a Signal Oriented Data Stream Management SystemThe Case for a Signal Oriented Data Stream Management System
The Case for a Signal Oriented Data Stream Management SystemReza Rahimi
 
Challenges on Distributed Machine Learning
Challenges on Distributed Machine LearningChallenges on Distributed Machine Learning
Challenges on Distributed Machine Learningjie cao
 
An Integrated Framework for Parameter-based Optimization of Scientific Workflows
An Integrated Framework for Parameter-based Optimization of Scientific WorkflowsAn Integrated Framework for Parameter-based Optimization of Scientific Workflows
An Integrated Framework for Parameter-based Optimization of Scientific Workflowsvijayskumar
 
NIAR_VRC_2010
NIAR_VRC_2010NIAR_VRC_2010
NIAR_VRC_2010fftoledo
 
아이폰강의(7) pdf
아이폰강의(7) pdf아이폰강의(7) pdf
아이폰강의(7) pdfsunwooindia
 
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic RepartitioningHandling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic RepartitioningSpark Summit
 
Trace Visualization
Trace VisualizationTrace Visualization
Trace VisualizationPTIHPA
 
Hanna bosc2010
Hanna bosc2010Hanna bosc2010
Hanna bosc2010BOSC 2010
 

Similar to Gopher A Sub-graph centric framework for large scale graphs (20)

A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
 
Geographic Information System for Egyptian Railway System(GIS)
Geographic Information System for Egyptian Railway System(GIS)Geographic Information System for Egyptian Railway System(GIS)
Geographic Information System for Egyptian Railway System(GIS)
 
Sensor Data Management
Sensor Data ManagementSensor Data Management
Sensor Data Management
 
Sparksummit2016 share
Sparksummit2016 shareSparksummit2016 share
Sparksummit2016 share
 
Delay Tolerant Streaming Services, Thomas Plagemann, UiO
Delay Tolerant Streaming Services, Thomas Plagemann, UiODelay Tolerant Streaming Services, Thomas Plagemann, UiO
Delay Tolerant Streaming Services, Thomas Plagemann, UiO
 
Nebula Graph nMeetup in Shanghai - Meet with Graph Technology Enthusiasts
Nebula Graph nMeetup in Shanghai - Meet with Graph Technology EnthusiastsNebula Graph nMeetup in Shanghai - Meet with Graph Technology Enthusiasts
Nebula Graph nMeetup in Shanghai - Meet with Graph Technology Enthusiasts
 
Graph Based Machine Learning with Applications to Media Analytics
Graph Based Machine Learning with Applications to Media AnalyticsGraph Based Machine Learning with Applications to Media Analytics
Graph Based Machine Learning with Applications to Media Analytics
 
GraphChi big graph processing
GraphChi big graph processingGraphChi big graph processing
GraphChi big graph processing
 
Gis capabilities on Big Data Systems
Gis capabilities on Big Data SystemsGis capabilities on Big Data Systems
Gis capabilities on Big Data Systems
 
The Case for a Signal Oriented Data Stream Management System
The Case for a Signal Oriented Data Stream Management SystemThe Case for a Signal Oriented Data Stream Management System
The Case for a Signal Oriented Data Stream Management System
 
Challenges on Distributed Machine Learning
Challenges on Distributed Machine LearningChallenges on Distributed Machine Learning
Challenges on Distributed Machine Learning
 
An Integrated Framework for Parameter-based Optimization of Scientific Workflows
An Integrated Framework for Parameter-based Optimization of Scientific WorkflowsAn Integrated Framework for Parameter-based Optimization of Scientific Workflows
An Integrated Framework for Parameter-based Optimization of Scientific Workflows
 
NIAR_VRC_2010
NIAR_VRC_2010NIAR_VRC_2010
NIAR_VRC_2010
 
아이폰강의(7) pdf
아이폰강의(7) pdf아이폰강의(7) pdf
아이폰강의(7) pdf
 
Giraph
GiraphGiraph
Giraph
 
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic RepartitioningHandling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
 
Trace Visualization
Trace VisualizationTrace Visualization
Trace Visualization
 
Spark Technology Center IBM
Spark Technology Center IBMSpark Technology Center IBM
Spark Technology Center IBM
 
Project Progress
Project ProgressProject Progress
Project Progress
 
Hanna bosc2010
Hanna bosc2010Hanna bosc2010
Hanna bosc2010
 

Gopher A Sub-graph centric framework for large scale graphs

  • 1. Sub-graph centric programming framework for large scale time series graphs Charith Wickramaarachchi,Alok Kumbhare Advisors: Yogesh Simmhan & Viktor Prasanna 1. Introduction 2. Motivation R R Sub graph centric graph algorithms Pregel Map reduce M M M M • Operates on sub-graphs which will run in parallel. • Sub graphs can communicate with each other in each super step. Sub graph centric Barrier Synchronization BSP • Limitations of Map Reduce[1] and Pregel[2] models for graph algorithms. • Map reduce – Do not consider the topology and locality of graphs. • Pregel - Costs large number of coordination steps for some class of graph algorithms due to vertex centric nature • Bulk Synchronous parallel - Abstract computer model for designing data parallel algorithms. • Computation proceeds in super steps which consists of • Computation • Communication • Barrier Synchronization Sub-graph centric programming model • Sub-graph G’ =(V’,E’) G’1 G’2 • connected subset of vertices of a Graph G = (V,E) V '  V , E'  E • Sub-Graph centric programming Sub graph centric Minimum Spanning Tree algorithm model • Operates on a sub-graphs • Build Min spanning forest in each sub -graph locally with concurrently Borůvka's algorithm • Communicate between sub- • Extends Karloff et al. [3] s MST algorithm using a sub-graph graphs centric approach and reduce the graph size. 3. Time Series Graphs 4. GoFFish Analytics & Time-series Graph Applications • Fixed graph of event sources Data Analyst • Known relationships between sources E.g. pathway Gopher Graph-oriented Programming • E.g. “license plate detected” Framework Analytics event generated by a camera • Streams of events form graph time- Cloud GoFS Storage Graph-oriented Distributed File series System • Snapshots of graphs over time Graph s • Graph Template GT=(V,E) • Graph instance Gi = (Vt,Et,ti) • Graph-oriented File System • Time Series Graph TG= (GT, • Relationships between event sources and across time G1,G2,..Gk) are captured in the data model • Layout is key: How can we maximize parallelism & minimize disk access for analytics • Graph Programming Framework Time series graph applications • Abstractions sensitive to event relationships & time-series • Sub-graph centric operations that span graph instances • License plate recognition • Analytics composed as a dataflow systems scan the license • Framework aware of distributed layout plates of moving or parked • Limit coordination overhead vehicles. • Works on Intersection of time-series event data & graph data • Large amount of time structures series data • Applications : References • Finding hot routes / spots [1] MapReduce: Simplified Data Processing on Large Clusters , Jeffrey Dean and for Auto theft. Sanjay Ghemawat , http://dl.acm.org/citation.cfm?id=1327492 [2] Pregel: a system for large-scale graph processing , Grzegorz Malewicz et al. http://dl.acm.org/citation.cfm?id=1807184 [3] A Model of Computation for MapReduce,Karloff et al. ,http://dl.acm.org/citation.cfm?id=1873677 This work is supported by Defense Advanced Research Projects Agency - XDATA program and the National Science foundation Co – PI s : http://ganges.usc.edu Viktor Prasanna , Yogesh Simmhan , Raghu Raghavendra The views of authors expressed herein do not necessarily reflect those of the sponsors.