distributed graph algorithms
Generalized Architecture For Some Graph Problems
Abhilash Kumar and Saurav Kumar
November 10, 2015
Indian Institute of Technology Kanpur
problem statement
∙ Compute all connected sub-graphs of a given graph,
in a distributed environment
∙ Develop a generalized architecture to solve similar
graph problems
motivation
∙ Exponential number of connected sub-graphs of a
given graph
∙ Need for distributed systems that utilize the plethora of
distributed resources available worldwide
approach
Insights
∙ Connected sub-graphs exhibit sub-structure
∙ Extend smaller sub-graphs by adding an outgoing edge to
generate larger sub-graphs
∙ Base cases are the single-edge sub-graphs, one for each edge of
the graph
Algorithm to compute all connected sub-graphs
∙ Initialize:
  ∙ Queue Q
  ∙ For each edge in G
    ∙ Create a sub-graph G’ representing the edge
    ∙ Push G’ to Q
∙ Process:
  ∙ While Q is not empty
    ∙ S = Q.pop()
    ∙ Save S
    ∙ For each outgoing edge E of S (an edge of G not in S that touches a vertex of S)
      ∙ S’ = S ∪ {E}
      ∙ If S’ has not been seen yet, push S’ to Q
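The steps above can be sketched sequentially as follows. This is a minimal single-machine illustration, not the authors' implementation: the function name `connected_subgraphs`, the representation of a sub-graph as a frozenset of edges, and the exact `seen` set (in place of the Bloom filter used later in the slides) are all assumptions made for the example.

```python
from collections import deque


def connected_subgraphs(edges):
    """Return every connected sub-graph with at least one edge, as frozensets of edges."""
    # Normalise undirected edges so (2, 1) and (1, 2) are the same edge.
    edges = [tuple(sorted(e)) for e in edges]
    seen = set()      # exact set here; the slides use a Bloom filter instead
    queue = deque()

    # Initialize: one single-edge sub-graph per edge of the input graph.
    for e in edges:
        sub = frozenset([e])
        if sub not in seen:
            seen.add(sub)
            queue.append(sub)

    # Process: pop a sub-graph, save it, and extend it by one outgoing edge.
    result = []
    while queue:
        sub = queue.popleft()
        result.append(sub)
        vertices = {v for e in sub for v in e}
        for e in edges:
            # Outgoing edge: not in the sub-graph yet, but touching one of its vertices.
            if e not in sub and (e[0] in vertices or e[1] in vertices):
                bigger = sub | {e}
                if bigger not in seen:
                    seen.add(bigger)
                    queue.append(bigger)
    return result


print(len(connected_subgraphs([(1, 2), (2, 3), (1, 3)])))  # 7 for a triangle
```

Representing a sub-graph by its edge set makes the "seen" check independent of the order in which edges were added.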
Figure: Generating initial sub-graphs from a given graph
Figure: Extending a sub-graph to generate new sub-graphs
Figure: Consider only unique sub-graphs generated for further processing
architecture
Master-Slave Architecture
∙ Commonly used approach for parallel and distributed
applications
∙ Message passing to communicate over TCP
∙ Master assigns tasks to slaves and finally collects the results
∙ A Task object represents a sub-graph and contains all the
information needed to process that sub-graph
∙ A slave may request a task from other slaves when its own task
queue is empty; processing ends when all task queues are empty
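The queue behaviour described above can be imitated in a single process. The sketch below is only an illustration under stated assumptions: there is no TCP messaging, slaves take turns in a loop, initial tasks are dealt round-robin rather than randomly, and an idle slave takes a task from the peer with the longest queue (the slides do not say which peer is asked). The names `run_with_stealing` and `split` are hypothetical.

```python
from collections import deque


def run_with_stealing(initial_tasks, process, n_slaves=4):
    """Simulate slaves that each own a task queue and take work from peers when idle."""
    queues = [deque() for _ in range(n_slaves)]
    for i, task in enumerate(initial_tasks):
        queues[i % n_slaves].append(task)  # round-robin keeps the sketch deterministic

    processed = [0] * n_slaves  # tasks handled per slave
    results = []
    # Processing ends when every task queue is empty.
    while any(queues):
        for slave_id, queue in enumerate(queues):
            if not queue:
                donor = max(queues, key=len)  # assumption: ask the busiest peer
                if not donor:
                    continue  # nobody has work left
                queue.append(donor.pop())
            task = queue.popleft()
            results.append(task)
            processed[slave_id] += 1
            queue.extend(process(task))  # newly generated tasks stay local
    return results, processed


def split(n):
    """Toy process function: a task n > 0 spawns two tasks n - 1."""
    return [n - 1, n - 1] if n > 0 else []


results, processed = run_with_stealing([3], split, n_slaves=3)
print(len(results), processed)
```

The duplicate check is left out here to keep the queue logic visible.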
Task, Queue and Bloom filter
∙ A task holds the following information:
  ∙ A list of vertices that are already in the sub-graph
  ∙ A list of edges that can be extended in the next step
∙ Task Queue
  ∙ Each slave has a task queue
  ∙ A slave picks up a task from its task queue and processes it
  ∙ Newly generated unique tasks are pushed into the task queue
∙ Bloom filter
  ∙ We use a Bloom filter to check the uniqueness of newly generated
    tasks (i.e. sub-graphs)
  ∙ The Bloom filter is also distributed, so that no single server is
    overloaded
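A task of this shape might look like the sketch below. It is hedged on several points: the `edges` field is an addition (the slides list only the vertices and the extendable edges) so that two tasks for the same sub-graph can be recognised, and `task_key` and `filter_owner` are hypothetical helpers showing one possible way to route a uniqueness check to one of several filter servers.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    vertices: frozenset    # vertices already in the sub-graph
    extendable: frozenset  # edges that can be added in the next step
    edges: frozenset       # assumption: edges chosen so far, used to identify the sub-graph


def make_task(graph_edges, sub_edges):
    """Build the task for the sub-graph made of sub_edges inside graph_edges."""
    sub = frozenset(tuple(sorted(e)) for e in sub_edges)
    vertices = frozenset(v for e in sub for v in e)
    extendable = frozenset(
        e
        for e in (tuple(sorted(g)) for g in graph_edges)
        if e not in sub and (e[0] in vertices or e[1] in vertices)
    )
    return Task(vertices, extendable, sub)


def task_key(task):
    """Canonical bytes: equal sub-graphs give equal keys regardless of edge order."""
    return repr(sorted(task.edges)).encode()


def filter_owner(task, n_servers):
    """Hypothetical routing rule: index of the server whose filter shard checks this task."""
    digest = hashlib.sha256(task_key(task)).digest()
    return int.from_bytes(digest[:8], "big") % n_servers


graph = [(1, 2), (2, 3), (3, 4)]
task = make_task(graph, [(2, 3)])
print(sorted(task.extendable))
```

Hashing a canonical key means every slave sends checks for the same sub-graph to the same server.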
Bloom Filter Vs Hashing
∙ We use a Bloom filter because it is very space-efficient
∙ The space required for n elements at error probability p is
$-\frac{n \ln p}{(\ln 2)^2}$ bits
∙ The error probability can be reduced with very little extra space:
each halving of p costs about 1.44 more bits per element
∙ Hashing can be used to make the algorithm deterministic
∙ A Bloom filter can also be parallelized, whereas hashing cannot be.
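The sizing formula can be checked with a small Bloom filter. This is an illustrative sketch rather than the filter used in the project: the class and function names are made up for the example, the number of hash functions is chosen as (m/n)·ln 2 (the standard optimum), and the k bit positions are derived from one SHA-256 digest by double hashing.

```python
import hashlib
import math


def bloom_size_bits(n, p):
    """Bits needed for n elements at false-positive probability p: -n ln p / (ln 2)^2."""
    return math.ceil(-n * math.log(p) / (math.log(2) ** 2))


class BloomFilter:
    def __init__(self, n, p):
        self.m = bloom_size_bits(n, p)                    # number of bits
        self.k = max(1, round(self.m / n * math.log(2)))  # number of hash functions
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key):
        # Double hashing: combine two 64-bit values taken from one digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # May report a false positive, never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


bf = BloomFilter(n=1000, p=0.01)
bf.add(b"sub-graph-1")
print(bf.m, bf.k, b"sub-graph-1" in bf)
```

For n = 1000 and p = 0.01 the formula gives 9586 bits, a little over 1 KB.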
How to use this architecture?
∙ Two functions required: initialize and process
∙ Initialize generates initial tasks. Master randomly assigns these
tasks to the slaves.
∙ Process defines a procedure that will generate new tasks from a
given task (extend sub-graph in our case)
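To illustrate the two-function contract, here is a sequential stand-in for the framework together with a plug-in problem. Everything named here is an assumption for the example: `run` replaces the master and slaves with one loop and an exact `seen` set, and the plug-in enumerates cliques (one of the other problems the slides list) on a small made-up graph `ADJ`.

```python
from collections import deque


def run(initialize, process):
    """Sequential stand-in for master and slaves: expand tasks until none are left."""
    seen = set()
    queue = deque()
    for task in initialize():
        if task not in seen:
            seen.add(task)
            queue.append(task)
    finished = []
    while queue:
        task = queue.popleft()
        finished.append(task)
        for new_task in process(task):
            if new_task not in seen:  # the slides use a Bloom filter for this check
                seen.add(new_task)
                queue.append(new_task)
    return finished


# Hypothetical example graph as adjacency sets: a triangle 1-2-3 plus the edge 3-4.
ADJ = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}


def initialize():
    """Initial tasks: every single vertex is a clique."""
    return [frozenset([v]) for v in ADJ]


def process(clique):
    """New tasks: the clique extended by any vertex adjacent to all its members."""
    for v in ADJ:
        if v not in clique and all(v in ADJ[u] for u in clique):
            yield clique | {v}


cliques = run(initialize, process)
print(len(cliques))  # 9: four vertices, four edges, one triangle
```

Only `initialize` and `process` change from problem to problem; the loop in `run` stays the same.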
Fitting the connected sub-graph problem
∙ Initialize creates all the tasks (sub-graphs) with one edge.
∙ Process takes a connected sub-graph and extends it by adding
each extendable edge, one at a time
simulation
Simulation for testing
∙ Used 2 machines, H and L.
∙ H: 24 cores, 200 GB, Xeon E5645 @ 2.40GHz
∙ L: 4 cores, 8 GB, i5-3230M CPU @ 2.60GHz
∙ Opened multiple ports (6 on H, 2 on L) to mimic 8 slave servers.
∙ Used various combinations of the number of slaves on H and L
∙ Used 2 tree graphs, G(14, 13) and G(16, 15), for which results are
easy to verify
∙ Collected the number of tasks processed by each slave and the
number of hash-check queries made by each slave.
∙ Collected total running times for both graphs, including runs
with network faults.
results
Figure: Number of hash check queries vs number of slaves for G(14, 13)
Figure: Distribution of number of tasks processed by slaves for G(14, 13)
Figure: Distribution of number of tasks processed by slaves for G(14, 13)
Figure: Distribution of number of tasks processed by slaves for G(14, 13)
Figure: Number of hash check queries vs number of slaves for G(16, 15)
Figure: Distribution of number of tasks processed by slaves for G(16, 15)
Figure: Distribution of number of tasks processed by slaves for G(16, 15)
Figure: Distribution of number of tasks processed by slaves for G(16, 15)
Actual Running Time
∙ Network faults occurred, especially because of the small number
of physical machines
∙ The architecture recovers from these faults, but recovery
consumes a lot of time
∙ For G(14, 13), running time ranged from 15 s to 91 s
∙ For G(15, 14), running time ranged from 255 s to 447 s
∙ These times are for runs in which the process function does no
additional computation per sub-graph.
Figure: Running time when process does additional computation (10 ms)
Advantages
∙ Highly scalable
∙ More slaves can be added easily
∙ Performance increases with the number of slaves
∙ Balanced distribution of tasks: faster machines process more tasks
∙ Architecture is very reusable
∙ Many other problems can be solved using this architecture
∙ Only need to provide 2 functions: initialize and process
∙ Network fault tolerant
Other problems that can be solved using this paradigm
∙ Generating all cliques, paths, cycles, sub-trees, spanning
sub-trees
∙ Can also solve a few classical NP problems, such as finding all
maximal cliques and TSP
future works
Further improvements
∙ Implement a parallelized Bloom filter
∙ Solve tasks in parallel within a slave (on powerful servers)
∙ Handle slave/master failures
∙ Use file I/O to store the task queue for large problems
∙ Explore this paradigm to solve other problems
Conclusion
∙ The algorithm is very efficient: total computation is at most
m × T, where T is the minimum computation required to find all
sub-graphs and m is the number of edges.
∙ In practice the time complexity is c × T, where c is much smaller
than m. The bound on c can be improved to min(m, log T).
∙ Because we want all connected sub-graphs, the approach is
practical only when T is not very large.
∙ The architecture helps us solve this problem in a much more
scalable manner and significantly reduces computation time,
given good infrastructure and a better implementation.
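One way to see where a bound of the form m × T can come from is sketched below. It rests on an assumption not stated in the slides: that T is at least N, the number of connected sub-graphs, since each one must be produced at least once. The symbols N and E(S) are introduced only for this sketch.

```latex
% N    = number of connected sub-graphs of the input graph (assumed T >= N)
% E(S) = edge set of a connected sub-graph S, so |E(S)| <= m
% S is generated as a candidate once for every edge e in E(S) whose removal
% leaves a connected sub-graph, hence at most |E(S)| times.
\[
  \#\{\text{generated candidates}\}
  \;\le\; \sum_{S} |E(S)|
  \;\le\; m \cdot N
  \;\le\; m \cdot T .
\]
```

Each candidate costs one uniqueness check, so the duplicate work is bounded by the same factor m.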
Questions?
Implementation of the algorithm and the architecture is available at
github.com/abhilak/DGA
Slides created using Beamer(mtheme) and plot.ly on ShareLaTeX
Thank You
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 

Distributed Graph Algorithms

  • 1. distributed graph algorithms Generalized Architecture For Some Graph Problems Abhilash Kumar and Saurav Kumar November 10, 2015 Indian Institute of Technology Kanpur
  • 4. problem statement ∙ Compute all connected sub-graphs of a given graph in a distributed environment ∙ Develop a generalized architecture to solve similar graph problems
  • 7. motivation ∙ A given graph can have exponentially many connected sub-graphs ∙ Distributed systems are needed to make use of the many distributed resources available worldwide
  • 11. approach Insights ∙ Connected sub-graphs exhibit sub-structure ∙ Extend smaller sub-graphs by adding an outgoing edge to generate larger sub-graphs ∙ The base cases are the single-edge sub-graphs, one for each edge of the graph
  • 23. approach Algorithm to compute all connected sub-graphs ∙ Initialize: ∙ Queue Q ∙ For each edge in G ∙ Create a sub-graph G’ representing the edge ∙ Push G’ to Q ∙ Process: ∙ While Q is not empty ∙ S = Q.pop() ∙ Save S ∙ For each outgoing edge E of S: G’ = S ∪ {E}; if G’ has not been seen yet, push G’ to Q
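
The queue-based procedure on this slide can be sketched in a single process as follows. This is a minimal illustration, not the authors' implementation: the function name `connected_subgraphs`, the representation of a sub-graph as a `frozenset` of edges, and the reading of "outgoing edge" as any edge outside the sub-graph that touches one of its vertices are all assumptions. An exact `set` stands in for the "seen" check.

```python
from collections import deque


def connected_subgraphs(edges):
    """Return every connected sub-graph with at least one edge, as frozensets of edges."""
    # Normalise undirected edges so (u, v) and (v, u) denote the same edge.
    edges = {tuple(sorted(e)) for e in edges}

    # Base cases: one single-edge sub-graph per edge of the graph.
    queue = deque(frozenset([e]) for e in sorted(edges))
    seen = set(queue)
    result = []

    while queue:
        sub = queue.popleft()
        result.append(sub)  # "save" the sub-graph
        vertices = {v for e in sub for v in e}
        # Assumed meaning of "outgoing": not in the sub-graph, but touching it.
        for e in edges - sub:
            if e[0] in vertices or e[1] in vertices:
                bigger = sub | {e}
                if bigger not in seen:
                    seen.add(bigger)
                    queue.append(bigger)
    return result


if __name__ == "__main__":
    path = [(1, 2), (2, 3), (3, 4)]
    for sub in connected_subgraphs(path):
        print(sorted(sub))
```

Marking a sub-graph as seen at push time, rather than at pop time, keeps the same sub-graph from being queued twice when it is reachable from several smaller sub-graphs.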
  • 24. approach Figure: Generating initial sub-graphs from a given graph
  • 25. approach Figure: Extending a sub-graph to generate new sub-graphs
  • 26. approach Figure: Consider only unique sub-graphs generated for further processing
  • 32. architecture Master-Slave Architecture ∙ Commonly used approach for parallel and distributed applications ∙ Message passing to communicate over TCP ∙ Master assigns tasks to slaves and finally collects the results ∙ A Task object represents a sub-graph and contains all the information needed to process it ∙ A slave whose task queue is empty may request a task from other slaves; processing ends when all task queues are empty
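
The task-request behaviour described on this slide can be imitated in one process. The sketch below is only a simulation under stated assumptions: there is no TCP or message passing, the slaves take turns in a loop, an idle slave takes a task from the currently longest queue, and the name `simulate` and the toy `process` function used in the demo are hypothetical. It also omits the uniqueness check.

```python
from collections import deque
import random


def simulate(num_slaves, initial_tasks, process, seed=0):
    """Run tasks on simulated slaves; return how many tasks each slave processed."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(num_slaves)]

    # The master assigns the initial tasks to slaves at random.
    for task in initial_tasks:
        rng.choice(queues).append(task)

    processed = [0] * num_slaves
    # Processing ends when every task queue is empty.
    while any(queues):
        for i, queue in enumerate(queues):
            if not queue:
                # An idle slave requests a task from another slave
                # (assumption: it asks the one with the longest queue).
                donor = max(queues, key=len)
                if donor:
                    queue.append(donor.pop())
            if queue:
                new_tasks = process(queue.popleft())
                processed[i] += 1
                queue.extend(new_tasks)
    return processed


if __name__ == "__main__":
    # Toy task: an integer n spawns two tasks n - 1 until it reaches 0.
    counts = simulate(3, [3], lambda n: [n - 1, n - 1] if n > 0 else [])
    print(counts, sum(counts))
```

In a real deployment the termination test is harder than `any(queues)`, because a task may be in flight between two slaves while every queue looks empty.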
  • 42. architecture Task, Queue and Bloom filter ∙ A task holds the following information: ∙ A list of vertices that are already in the sub-graph ∙ A list of edges that can be extended in the next step ∙ Task Queue ∙ Each slave has a task queue ∙ A slave picks up a task from its task queue and processes it ∙ Newly generated unique tasks are pushed into the task queue ∙ Bloom filter ∙ A Bloom filter is used to check the uniqueness of newly generated tasks (i.e. sub-graphs) ∙ The Bloom filter is also distributed so that no single server is overloaded
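
A uniqueness check of the kind this slide describes could look like the sketch below. It is a single-node illustration under assumptions: the class `BloomFilter`, the method `add_if_new`, the helper `subgraph_key`, and the use of SHA-256 with double hashing are not taken from the slides, and the distribution of the filter across servers is not shown.

```python
import hashlib


class BloomFilter:
    """Bit-array Bloom filter with k positions derived from one SHA-256 digest."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: position_i = h1 + i * h2 (mod number of bits).
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add_if_new(self, key):
        """Insert key; return True only if it was definitely not present before."""
        is_new = False
        for pos in self._positions(key):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                is_new = True
                self.bits[byte] |= 1 << bit
        return is_new


def subgraph_key(edges):
    """Canonical byte string for a sub-graph given as an iterable of (u, v) edges."""
    canonical = sorted(tuple(sorted(e)) for e in edges)
    return repr(canonical).encode()


if __name__ == "__main__":
    seen = BloomFilter(num_bits=1 << 16, num_hashes=7)
    print(seen.add_if_new(subgraph_key([(1, 2), (2, 3)])))  # first insertion
    print(seen.add_if_new(subgraph_key([(3, 2), (2, 1)])))  # same sub-graph again
```

A Bloom filter gives no false negatives but can give false positives, so a new sub-graph is occasionally reported as already seen and is then skipped.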
  • 47. architecture Bloom Filter vs. Hashing ∙ A Bloom filter is used because it is very space efficient ∙ The space required for n elements at error probability p is $-\frac{n \ln p}{(\ln 2)^2}$ bits ∙ The error probability can be reduced with very little extra space ∙ Hashing can be used to make the algorithm deterministic ∙ A Bloom filter can also be parallelized, whereas hashing cannot
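
To make the space formula concrete, the sketch below evaluates it for a given number of elements n and error probability p. The function names `bloom_bits` and `bloom_hashes` are hypothetical, and the hash-count formula k = (m / n) ln 2 is the standard companion result for Bloom filters, not something stated on the slide.

```python
import math


def bloom_bits(n, p):
    """Bits needed to hold n elements at false-positive probability p."""
    return math.ceil(-n * math.log(p) / math.log(2) ** 2)


def bloom_hashes(n, m):
    """Number of hash functions that minimises the error for m bits and n elements."""
    return max(1, round(m / n * math.log(2)))


if __name__ == "__main__":
    n = 1_000_000
    for p in (0.01, 0.001):
        m = bloom_bits(n, p)
        print(f"p={p}: {m} bits, {m / n:.2f} bits per element, k={bloom_hashes(n, m)}")
```

For one million elements this comes to roughly 9.6 bits per element at p = 0.01 and roughly 14.4 bits per element at p = 0.001, which illustrates the slide's point: each tenfold reduction in error costs only about 4.8 extra bits per element.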
  • 52. architecture How to use this architecture? ∙ Two functions are required: initialize and process ∙ Initialize generates the initial tasks; the master randomly assigns these tasks to the slaves ∙ Process defines a procedure that generates new tasks from a given task (extending a sub-graph, in our case)
  • 54. architecture Fitting the connected sub-graph problem ∙ Initialize creates all the tasks (sub-graphs) with one edge ∙ Process takes a connected sub-graph and extends it by adding each extendable edge, one at a time
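
The two user-supplied functions could be written along these lines for the connected sub-graph problem. Everything here is a sketch under assumptions: the `Task` fields, the helper `make_task`, and the single-process driver `run` (which stands in for the master and slaves and uses an exact set instead of a Bloom filter) are hypothetical, and the `edges` field is added so that a task can be identified uniquely.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    vertices: frozenset  # vertices already in the sub-graph
    edges: frozenset     # edges already in the sub-graph (assumed extra field)
    frontier: frozenset  # edges that can be added in the next step


def make_task(graph_edges, sub_edges):
    vertices = frozenset(v for e in sub_edges for v in e)
    frontier = frozenset(
        e for e in graph_edges - sub_edges
        if e[0] in vertices or e[1] in vertices
    )
    return Task(vertices, frozenset(sub_edges), frontier)


def initialize(graph_edges):
    """Create one task per edge of the graph."""
    return [make_task(graph_edges, frozenset([e])) for e in sorted(graph_edges)]


def process(graph_edges, task):
    """Extend the sub-graph by each extendable edge, one at a time."""
    return [make_task(graph_edges, task.edges | {e}) for e in sorted(task.frontier)]


def run(graph_edges, init_fn, process_fn):
    """Single-process stand-in for the architecture: queue plus uniqueness check."""
    graph_edges = frozenset(tuple(sorted(e)) for e in graph_edges)
    queue = deque(init_fn(graph_edges))
    seen = {t.edges for t in queue}
    done = []
    while queue:
        task = queue.popleft()
        done.append(task)
        for new_task in process_fn(graph_edges, task):
            if new_task.edges not in seen:
                seen.add(new_task.edges)
                queue.append(new_task)
    return done


if __name__ == "__main__":
    tasks = run([(1, 2), (2, 3), (3, 4)], initialize, process)
    print(len(tasks), "connected sub-graphs")
```

Keeping the problem-specific logic in `initialize` and `process` is what lets the same driver be reused for other enumeration problems: only these two functions change.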
  • 59. simulation Simulation for testing ∙ Used 2 machines, H and L ∙ H: 24 cores, 200 GB, Xeon E5645 @ 2.40 GHz ∙ L: 4 cores, 8 GB, i5-3230M @ 2.60 GHz ∙ Opened multiple ports (6 on H, 2 on L) to mimic 8 slave servers
  • 63. simulation Simulation for testing ∙ Used various combinations of the number of slaves on H and L ∙ Used 2 tree graphs, G(14, 13) and G(16, 15), whose results are easy to verify ∙ Collected the number of tasks processed and the number of hash-check queries made by each slave ∙ Collected the total running time for both graphs, including runs with network faults
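
The slides do not say how the results were matched. One possible check, sketched here as an assumption, is to count the connected sub-graphs of a tree directly with a small dynamic program and compare that number with the number of sub-graphs the system reports. The function name `count_tree_subgraphs` is hypothetical, and the input is assumed to be a non-empty tree given as an edge list.

```python
from collections import defaultdict


def count_tree_subgraphs(edges):
    """Count connected sub-graphs with at least one edge in a tree."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Pre-order traversal from an arbitrary root, recording each vertex's parent.
    root = next(iter(adj))
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for w in adj[v]:
            if w != parent[v]:
                parent[w] = v
                stack.append(w)

    # top[v] = number of connected sub-trees whose topmost vertex is v.
    # Each child is either left out or contributes one of its own sub-trees.
    top = {}
    for v in reversed(order):
        ways = 1
        for w in adj[v]:
            if w != parent[v]:
                ways *= 1 + top[w]
        top[v] = ways

    # Discard the single-vertex sub-trees, which contain no edge.
    return sum(top.values()) - len(top)


if __name__ == "__main__":
    print(count_tree_subgraphs([(1, 2), (2, 3), (3, 4)]))  # path on 4 vertices
    print(count_tree_subgraphs([(0, 1), (0, 2), (0, 3)]))  # star with 3 leaves
```

The count depends on the shape of the tree, not only on its size: a path on n vertices has n(n - 1)/2 such sub-graphs, while a star with k leaves has 2^k - 1.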
  • 65. results Figure: Number of hash check queries vs number of slaves for G(14, 13)
  • 66. results Figure: Distribution of number of tasks processed by slaves for G(14, 13)
  • 67. results Figure: Distribution of number of tasks processed by slaves for G(14, 13)
  • 68. results Figure: Distribution of number of tasks processed by slaves for G(14, 13)
  • 69. results Figure: Number of hash check queries vs number of slaves for G(16, 15)
  • 70. results Figure: Distribution of number of tasks processed by slaves for G(16, 15)
  • 71. results Figure: Distribution of number of tasks processed by slaves for G(16, 15)
  • 72. results Figure: Distribution of number of tasks processed by slaves for G(16, 15)
  • 77. results Actual Running Time ∙ Network faults occurred, especially because few physical machines were used ∙ The architecture recovers from these faults, but recovery consumes a lot of time ∙ For G(14, 13), running time ranged from 15 s to 91 s ∙ For G(15, 14), running time ranged from 255 s to 447 s ∙ These times are for runs in which the process function does no additional computation per sub-graph
  • 78. results Figure: Running time when process does additional computation (10 ms)
  • 87. advantages Advantages ∙ Highly scalable ∙ More slaves can be added easily ∙ Performance increases with the number of slaves ∙ Balanced distribution of tasks: faster machines process more tasks ∙ Architecture is very reusable ∙ Many other problems can be solved using this architecture ∙ Only 2 functions need to be provided: initialize and process ∙ Tolerant of network faults
  • 89. advantages Other problems that can be solved using this paradigm ∙ Generating all cliques, paths, cycles, sub-trees, spanning sub-trees ∙ Can also solve a few classical NP problems, such as finding all maximal cliques and TSP
  • 95. future works Further improvements ∙ Implement a parallelized Bloom filter ∙ Solve tasks in parallel within a slave (on powerful servers) ∙ Handle slave/master failures ∙ Use file I/O to store the task queue for large problems ∙ Explore this paradigm to solve other problems
  • 100. conclusion Conclusion ∙ The algorithm is efficient: total computation is at most m * T, where T is the minimum computation required to find all sub-graphs and m is the number of edges ∙ In practice the time complexity is c * T, where c is much smaller than m; the bound on c can be improved to min(m, log T) ∙ Since we want to find all connected sub-graphs, T had better not be very large ∙ The architecture helps us solve this problem in a much more scalable manner and significantly reduces computation time, given good infrastructure and a better implementation
  • 101. Questions? Implementation of the algorithm and the architecture available at github.com/abhilak/DGA Slides created using Beamer (mtheme) and plot.ly on ShareLaTeX