SlideShare a Scribd company logo
Asymmetry in Large-Scale
Graph Analysis, Explained
2nd International Workshop on
Graph Data Management Experiences and Systems (GRADES 2014)
Vasiliki Kalavri1
, Stephan Ewen2
, Kostas Tzoumas3
,
Vladimir Vlassov4
, Volker Markl5
, Seif Haridi6
1, 4, 6
KTH Royal Institute of Technology, 2, 3, 5
Technical University of Berlin
1, 4, 6
{kalavri, vladv, haridi}@kth.se
2, 3, 5
{firstname.lastname}@tu-berlin.de
Motivation
● Many of large-scale data processing
applications include
fixed point iterations
○ social network analysis
○ web graph analysis
○ machine learning
2
Asymmetrical Convergence
● Often, in fixed point
iterations, some
elements converge
faster than others
● Not all elements
require an update in
every iteration
3
Can we detect the elements that
require recomputation and
avoid redundant computations?
4
Contributions
● A categorization of optimizations for fixed point
iterative graph processing
● Necessary conditions under which, it is safe to
apply optimizations
● A mapping of existing techniques to graph
processing abstractions
● An implementation of template execution plans
Optimized algorithms yield order of
magnitude gains!
5
Asymmetry in Connected Components
6
Iterative Plans - Bulk
● In each iteration, all
elements are computed
● Always applicable
7
Iterative Plans - Dependency
8
● In each iteration, only
elements whose at least
one neighbor has
changed are computed
● The state is computed
using the values of all
neighbors
● Always applicable
Iterative Plans - Incremental
9
● In each iteration, only elements
whose at least one neighbor
has changed are computed
● The state is computed using only
the values of updated
neighbors
● Applicable when the update
function is idempotent and
weakly monotonic (e.g. min)
Iterative Plans - Delta
10
● In each iteration, only elements
whose at least one neighbor
has changed are computed
● The state is computed using only
the deltas of updated
neighbors
● Applicable when the update
function is linear over the
composition operator (e.g.
sum)
Iteration Techniques Support in Graph
Processing Systems
X : provided by default
X : can be easily implemented
X : possible, but non-intuitive
System Bulk Dependency Incremental Delta
Pregel X X X X
GraphLab X X X X
GraphX X X X X
Powergraph X X X X
Stratosphere X X X X
11
Performance - Connected Components
12
Performance - PageRank
13
Conclusions & Future Work
● Exploiting asymmetrical convergence can lead to
order of magnitude performance gains
● In the future, we plan to
○ Use cost-based optimization, to automatically select the
most efficient iterative plan, at runtime.
○ Implement a set of representative applications and
compare performance with iterative and graph-processing
systems.
14

More Related Content

What's hot

Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Vasia Kalavri
 
MapReduce: Optimizations, Limitations, and Open Issues
MapReduce: Optimizations, Limitations, and Open IssuesMapReduce: Optimizations, Limitations, and Open Issues
MapReduce: Optimizations, Limitations, and Open Issues
Vasia Kalavri
 
Mikio Braun – Data flow vs. procedural programming
Mikio Braun – Data flow vs. procedural programming Mikio Braun – Data flow vs. procedural programming
Mikio Braun – Data flow vs. procedural programming
Flink Forward
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Jen Aman
 
Deep Stream Dynamic Graph Analytics with Grapharis - Massimo Perini
Deep Stream Dynamic Graph Analytics with Grapharis -  Massimo PeriniDeep Stream Dynamic Graph Analytics with Grapharis -  Massimo Perini
Deep Stream Dynamic Graph Analytics with Grapharis - Massimo Perini
Flink Forward
 
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward
 
Demystifying Distributed Graph Processing
Demystifying Distributed Graph ProcessingDemystifying Distributed Graph Processing
Demystifying Distributed Graph Processing
Vasia Kalavri
 
Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015Andra Lungu
 
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning GroupMachine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Till Rohrmann
 
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
Kostas Tzoumas
 
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Flink Forward
 
SICS: Apache Flink Streaming
SICS: Apache Flink StreamingSICS: Apache Flink Streaming
SICS: Apache Flink Streaming
Turi, Inc.
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward
 
Data Stream Analytics - Why they are important
Data Stream Analytics - Why they are importantData Stream Analytics - Why they are important
Data Stream Analytics - Why they are important
Paris Carbone
 
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward
 
Pregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph ProcessingPregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph ProcessingRiyad Parvez
 
Apache flink
Apache flinkApache flink
Apache flink
pranay kumar
 
Flink Forward Berlin 2017: Francesco Versaci - Integrating Flink and Kafka in...
Flink Forward Berlin 2017: Francesco Versaci - Integrating Flink and Kafka in...Flink Forward Berlin 2017: Francesco Versaci - Integrating Flink and Kafka in...
Flink Forward Berlin 2017: Francesco Versaci - Integrating Flink and Kafka in...
Flink Forward
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System Overview
Flink Forward
 

What's hot (20)

Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
 
MapReduce: Optimizations, Limitations, and Open Issues
MapReduce: Optimizations, Limitations, and Open IssuesMapReduce: Optimizations, Limitations, and Open Issues
MapReduce: Optimizations, Limitations, and Open Issues
 
Mikio Braun – Data flow vs. procedural programming
Mikio Braun – Data flow vs. procedural programming Mikio Braun – Data flow vs. procedural programming
Mikio Braun – Data flow vs. procedural programming
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
 
Deep Stream Dynamic Graph Analytics with Grapharis - Massimo Perini
Deep Stream Dynamic Graph Analytics with Grapharis -  Massimo PeriniDeep Stream Dynamic Graph Analytics with Grapharis -  Massimo Perini
Deep Stream Dynamic Graph Analytics with Grapharis - Massimo Perini
 
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
 
Demystifying Distributed Graph Processing
Demystifying Distributed Graph ProcessingDemystifying Distributed Graph Processing
Demystifying Distributed Graph Processing
 
Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015
 
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning GroupMachine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning Group
 
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
 
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
 
SICS: Apache Flink Streaming
SICS: Apache Flink StreamingSICS: Apache Flink Streaming
SICS: Apache Flink Streaming
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
 
Data Stream Analytics - Why they are important
Data Stream Analytics - Why they are importantData Stream Analytics - Why they are important
Data Stream Analytics - Why they are important
 
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
 
Pregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph ProcessingPregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph Processing
 
Apache flink
Apache flinkApache flink
Apache flink
 
Flink Forward Berlin 2017: Francesco Versaci - Integrating Flink and Kafka in...
Flink Forward Berlin 2017: Francesco Versaci - Integrating Flink and Kafka in...Flink Forward Berlin 2017: Francesco Versaci - Integrating Flink and Kafka in...
Flink Forward Berlin 2017: Francesco Versaci - Integrating Flink and Kafka in...
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System Overview
 

Viewers also liked

Like a Pack of Wolves: Community Structure of Web Trackers
Like a Pack of Wolves: Community Structure of Web TrackersLike a Pack of Wolves: Community Structure of Web Trackers
Like a Pack of Wolves: Community Structure of Web Trackers
Vasia Kalavri
 
The shortest path is not always a straight line
The shortest path is not always a straight lineThe shortest path is not always a straight line
The shortest path is not always a straight line
Vasia Kalavri
 
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Vasia Kalavri
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
Vasia Kalavri
 
A Skype case study (2011)
A Skype case study (2011)A Skype case study (2011)
A Skype case study (2011)
Vasia Kalavri
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
Slim Baltagi
 

Viewers also liked (6)

Like a Pack of Wolves: Community Structure of Web Trackers
Like a Pack of Wolves: Community Structure of Web TrackersLike a Pack of Wolves: Community Structure of Web Trackers
Like a Pack of Wolves: Community Structure of Web Trackers
 
The shortest path is not always a straight line
The shortest path is not always a straight lineThe shortest path is not always a straight line
The shortest path is not always a straight line
 
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
 
A Skype case study (2011)
A Skype case study (2011)A Skype case study (2011)
A Skype case study (2011)
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 

Similar to Asymmetry in Large-Scale Graph Analysis, Explained

Pregel
PregelPregel
Pregel
Weiru Dai
 
Ad Click Prediction - Paper review
Ad Click Prediction - Paper reviewAd Click Prediction - Paper review
Ad Click Prediction - Paper review
Mazen Aly
 
Enabling Application Integrated Proactive Fault Tolerance
Enabling Application Integrated Proactive Fault ToleranceEnabling Application Integrated Proactive Fault Tolerance
Enabling Application Integrated Proactive Fault Tolerance
Dai Yang
 
Improving Resource Utilization in Cloud using Application Placement Heuristics
Improving Resource Utilization in Cloud using Application Placement HeuristicsImproving Resource Utilization in Cloud using Application Placement Heuristics
Improving Resource Utilization in Cloud using Application Placement Heuristics
AtakanAral
 
Pregel - Paper Review
Pregel - Paper ReviewPregel - Paper Review
Pregel - Paper Review
Maria Stylianou
 
Uber Business Metrics Generation and Management Through Apache Flink
Uber Business Metrics Generation and Management Through Apache FlinkUber Business Metrics Generation and Management Through Apache Flink
Uber Business Metrics Generation and Management Through Apache Flink
Wenrui Meng
 
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
LDBC council
 
Online advertising and large scale model fitting
Online advertising and large scale model fittingOnline advertising and large scale model fitting
Online advertising and large scale model fitting
Wush Wu
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning Infrastructure
SigOpt
 
IRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CLIRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CL
IRJET Journal
 
M3AT: Monitoring Agents Assignment Model for the Data-Intensive Applications
M3AT: Monitoring Agents Assignment Model for the Data-Intensive ApplicationsM3AT: Monitoring Agents Assignment Model for the Data-Intensive Applications
M3AT: Monitoring Agents Assignment Model for the Data-Intensive Applications
VladislavKashansky
 
Big Data, Bigger Analytics
Big Data, Bigger AnalyticsBig Data, Bigger Analytics
Big Data, Bigger Analytics
Itzhak Kameli
 
Gatling
Gatling Gatling
Gatling
Gaurav Shukla
 
Performance Test Automation With Gatling
Performance Test Automation  With GatlingPerformance Test Automation  With Gatling
Performance Test Automation With Gatling
Knoldus Inc.
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
LEGATO project
 
Towards an Incremental Schema-level Index for Distributed Linked Open Data G...
Towards an Incremental Schema-level Index  for Distributed Linked Open Data G...Towards an Incremental Schema-level Index  for Distributed Linked Open Data G...
Towards an Incremental Schema-level Index for Distributed Linked Open Data G...
Till Blume
 
Memory Efficient Graph Convolutional Network based Distributed Link Prediction
Memory Efficient Graph Convolutional Network based Distributed Link PredictionMemory Efficient Graph Convolutional Network based Distributed Link Prediction
Memory Efficient Graph Convolutional Network based Distributed Link Prediction
miyurud
 
Toronto meetup 20190917
Toronto meetup 20190917Toronto meetup 20190917
Toronto meetup 20190917
Bill Liu
 
Distributed Feature Selection for Efficient Economic Big Data Analysis
Distributed Feature Selection for Efficient Economic Big Data AnalysisDistributed Feature Selection for Efficient Economic Big Data Analysis
Distributed Feature Selection for Efficient Economic Big Data Analysis
IRJET Journal
 

Similar to Asymmetry in Large-Scale Graph Analysis, Explained (20)

Pregel
PregelPregel
Pregel
 
Ad Click Prediction - Paper review
Ad Click Prediction - Paper reviewAd Click Prediction - Paper review
Ad Click Prediction - Paper review
 
Enabling Application Integrated Proactive Fault Tolerance
Enabling Application Integrated Proactive Fault ToleranceEnabling Application Integrated Proactive Fault Tolerance
Enabling Application Integrated Proactive Fault Tolerance
 
Improving Resource Utilization in Cloud using Application Placement Heuristics
Improving Resource Utilization in Cloud using Application Placement HeuristicsImproving Resource Utilization in Cloud using Application Placement Heuristics
Improving Resource Utilization in Cloud using Application Placement Heuristics
 
Pregel - Paper Review
Pregel - Paper ReviewPregel - Paper Review
Pregel - Paper Review
 
Kineograph
KineographKineograph
Kineograph
 
Uber Business Metrics Generation and Management Through Apache Flink
Uber Business Metrics Generation and Management Through Apache FlinkUber Business Metrics Generation and Management Through Apache Flink
Uber Business Metrics Generation and Management Through Apache Flink
 
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
 
Online advertising and large scale model fitting
Online advertising and large scale model fittingOnline advertising and large scale model fitting
Online advertising and large scale model fitting
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning Infrastructure
 
IRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CLIRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CL
 
M3AT: Monitoring Agents Assignment Model for the Data-Intensive Applications
M3AT: Monitoring Agents Assignment Model for the Data-Intensive ApplicationsM3AT: Monitoring Agents Assignment Model for the Data-Intensive Applications
M3AT: Monitoring Agents Assignment Model for the Data-Intensive Applications
 
Big Data, Bigger Analytics
Big Data, Bigger AnalyticsBig Data, Bigger Analytics
Big Data, Bigger Analytics
 
Gatling
Gatling Gatling
Gatling
 
Performance Test Automation With Gatling
Performance Test Automation  With GatlingPerformance Test Automation  With Gatling
Performance Test Automation With Gatling
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
 
Towards an Incremental Schema-level Index for Distributed Linked Open Data G...
Towards an Incremental Schema-level Index  for Distributed Linked Open Data G...Towards an Incremental Schema-level Index  for Distributed Linked Open Data G...
Towards an Incremental Schema-level Index for Distributed Linked Open Data G...
 
Memory Efficient Graph Convolutional Network based Distributed Link Prediction
Memory Efficient Graph Convolutional Network based Distributed Link PredictionMemory Efficient Graph Convolutional Network based Distributed Link Prediction
Memory Efficient Graph Convolutional Network based Distributed Link Prediction
 
Toronto meetup 20190917
Toronto meetup 20190917Toronto meetup 20190917
Toronto meetup 20190917
 
Distributed Feature Selection for Efficient Economic Big Data Analysis
Distributed Feature Selection for Efficient Economic Big Data AnalysisDistributed Feature Selection for Efficient Economic Big Data Analysis
Distributed Feature Selection for Efficient Economic Big Data Analysis
 

Recently uploaded

Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 

Recently uploaded (20)

Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 

Asymmetry in Large-Scale Graph Analysis, Explained

  • 1. Asymmetry in Large-Scale Graph Analysis, Explained 2nd International Workshop on Graph Data Management Experiences and Systems (GRADES 2014) Vasiliki Kalavri1 , Stephan Ewen2 , Kostas Tzoumas3 , Vladimir Vlassov4 , Volker Markl5 , Seif Haridi6 1, 4, 6 KTH Royal Institute of Technology, 2, 3, 5 Technical University of Berlin 1, 4, 6 {kalavri, vladv, haridi}@kth.se 2, 3, 5 {firstname.lastname}@tu-berlin.de
  • 2. Motivation ● Many of large-scale data processing applications include fixed point iterations ○ social network analysis ○ web graph analysis ○ machine learning 2
  • 3. Asymmetrical Convergence ● Often, in fixed point iterations, some elements converge faster than others ● Not all elements require an update in every iteration 3
  • 4. Can we detect the elements that require recomputation and avoid redundant computations? 4
  • 5. Contributions ● A categorization of optimizations for fixed point iterative graph processing ● Necessary conditions under which, it is safe to apply optimizations ● A mapping of existing techniques to graph processing abstractions ● An implementation of template execution plans Optimized algorithms yield order of magnitude gains! 5
  • 6. Asymmetry in Connected Components 6
  • 7. Iterative Plans - Bulk ● In each iteration, all elements are computed ● Always applicable 7
  • 8. Iterative Plans - Dependency 8 ● In each iteration, only elements whose at least one neighbor has changed are computed ● The state is computed using the values of all neighbors ● Always applicable
  • 9. Iterative Plans - Incremental 9 ● In each iteration, only elements whose at least one neighbor has changed are computed ● The state is computed using only the values of updated neighbors ● Applicable when the update function is idempotent and weakly monotonic (e.g. min)
  • 10. Iterative Plans - Delta 10 ● In each iteration, only elements whose at least one neighbor has changed are computed ● The state is computed using only the deltas of updated neighbors ● Applicable when the update function is linear over the composition operator (e.g. sum)
  • 11. Iteration Techniques Support in Graph Processing Systems X : provided by default X : can be easily implemented X : possible, but non-intuitive System Bulk Dependency Incremental Delta Pregel X X X X GraphLab X X X X GraphX X X X X Powergraph X X X X Stratosphere X X X X 11
  • 12. Performance - Connected Components 12
  • 14. Conclusions & Future Work ● Exploiting asymmetrical convergence can lead to order of magnitude performance gains ● In the future, we plan to ○ Use cost-based optimization, to automatically select the most efficient iterative plan, at runtime. ○ Implement a set of representative applications and compare performance with iterative and graph-processing systems. 14