Finding Small Dense Subgraphs
Big Data Computing 2014 – 2015
Pavel Popa – Rui Madeira
Algorithm
• Three rounds of MapReduce
• Based on graph pruning, which eliminates the edges that contribute least to the best solution
• Pruning is done in parallel, on partitions of the graph
• Data structures are chosen to give optimal access time to nodes
Round 1 – Computing graph density
• Mapper:
  - Reads the whole graph and sends every edge to the same reducer
• Reducer:
  - Scans the received edge list, counting edges and nodes to compute the graph density
  - Outputs the complete list of edges together with the graph density
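A minimal single-machine sketch of this round, simulating the shuffle with a dictionary. It assumes density means number of edges divided by number of distinct nodes, which the slides do not define explicitly; the function names and the constant key "all" are hypothetical.

```python
from collections import defaultdict

def round1_map(edges):
    """Emit every edge under one constant key so a single reducer sees them all."""
    for u, v in edges:
        yield ("all", (u, v))

def round1_reduce(key, edges):
    """Count edges and distinct nodes, then return the edge list and the density."""
    edge_list = list(edges)
    nodes = set()
    for u, v in edge_list:
        nodes.add(u)
        nodes.add(v)
    # Assumption: density = |E| / |V| (not spelled out in the slides).
    density = len(edge_list) / len(nodes) if nodes else 0.0
    return edge_list, density

def run_round1(edges):
    # Simulate the shuffle phase: group mapper output by key.
    groups = defaultdict(list)
    for key, edge in round1_map(edges):
        groups[key].append(edge)
    return round1_reduce("all", groups["all"])
```

Sending everything to one reducer makes this round sequential, which is the price of computing a single global number.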
Round 2 – Partitioning and Pruning the graph
• Mapper:
  - Goes through all the edges and, for each one, outputs a <SubgraphID, edge> key-value pair
  - The SubgraphID is chosen at random in [0, graph density)
• Reducer:
  - Each reducer handles one subgraph
  - Goes through the edge list and outputs an edge only if both of its endpoints have degree > graph density
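The partition-and-prune step could look roughly like the sketch below. Two points are assumptions rather than something the slides state: node degrees are counted inside each subgraph (a reducer only sees its own edges), and the number of partitions is the integer part of the density, with a minimum of one. All names are hypothetical.

```python
import random
from collections import defaultdict

def round2_map(edges, num_partitions, rng=random):
    """Tag each edge with a subgraph ID drawn uniformly from [0, num_partitions)."""
    for edge in edges:
        yield (rng.randrange(num_partitions), edge)

def round2_reduce(subgraph_id, edges, threshold):
    """Keep only edges whose endpoints both have degree > threshold in this subgraph."""
    edge_list = list(edges)
    degree = defaultdict(int)
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    return [(u, v) for u, v in edge_list
            if degree[u] > threshold and degree[v] > threshold]

def run_round2(edges, density, seed=0):
    rng = random.Random(seed)
    # Assumption: IDs come from [0, density), so use int(density) partitions, at least 1.
    num_partitions = max(1, int(density))
    groups = defaultdict(list)
    for subgraph_id, edge in round2_map(edges, num_partitions, rng):
        groups[subgraph_id].append(edge)
    survivors = []
    for subgraph_id, part in groups.items():
        survivors.extend(round2_reduce(subgraph_id, part, density))
    return survivors
```

Because each reducer sees only a random fraction of a node's edges, more partitions make pruning more aggressive, which is the loss of information that the implementation notes mention.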
[Figure: Round 2 – Partitioning and Pruning the graph, showing Subgraph 0 and Subgraph 1]
Round 3 – Finding the smallest subgraph
• Mapper:
  - Outputs all the edges to the same reducer
• Reducer:
  - Iterates until there are no nodes left in the graph
  - At each step, removes the node with minimum degree
  - Recomputes the graph density and the number of nodes
  - If the result is better than the best found so far, stores the number of nodes and the graph
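The reducer's peeling loop can be sketched as follows. The slides do not say what "better" means; this sketch assumes it means a smaller subgraph whose density (edges divided by nodes) is still at least a target rho. The function name and return shape are hypothetical, and the minimum-degree node is found by a plain linear scan to keep the example short.

```python
from collections import defaultdict

def peel(edges, rho):
    """Greedy peeling: repeatedly drop a minimum-degree node and remember the
    smallest intermediate subgraph whose density (edges / nodes) is >= rho."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best = None  # (number of nodes, number of edges, node set)
    while adj:
        if num_edges / len(adj) >= rho and (best is None or len(adj) < best[0]):
            best = (len(adj), num_edges, set(adj))
        # Linear scan for a minimum-degree node; a heap would avoid this cost.
        victim = min(adj, key=lambda n: len(adj[n]))
        for nbr in adj[victim]:
            adj[nbr].discard(victim)
        num_edges -= len(adj[victim])
        del adj[victim]
    return best
```

For example, on a complete graph of five nodes with a two-edge tail attached, `peel(edges, 2)` strips the tail first and reports the five-node clique with its ten edges.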
Implementation details
• Importance of the number of graph partitions
  - Trade-off between more parallel processing and loss of information
• Importance of the pruning threshold
  - Trade-off between processing time and loss of information
• In the last round, a min-heap gives optimal access to the graph nodes: the minimum-degree node is easy to extract, and when it is removed the heap is reordered by decreasing the keys (the node degrees) of its neighbors
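A sketch of such a degree-keyed min-heap, using Python's standard `heapq` module. `heapq` offers no decrease-key operation, so this version swaps in lazy invalidation: every degree change pushes a fresh entry and outdated entries are skipped when popped. That is a substitute technique, not necessarily what the authors implemented, and the class name `DegreeHeap` is hypothetical. Node labels must be mutually comparable (for instance all integers or all strings) so that ties on degree can be broken.

```python
import heapq

class DegreeHeap:
    """Min-heap keyed by node degree, with decrease-key emulated lazily."""

    def __init__(self, degrees):
        self.degree = dict(degrees)  # current degree of every live node
        self.heap = [(d, n) for n, d in self.degree.items()]
        heapq.heapify(self.heap)

    def decrease(self, node):
        """Lower a node's degree by one (one of its neighbors was removed)."""
        self.degree[node] -= 1
        heapq.heappush(self.heap, (self.degree[node], node))

    def pop_min(self):
        """Remove and return (node, degree) for the smallest current degree."""
        while self.heap:
            d, n = heapq.heappop(self.heap)
            # Skip entries for removed nodes or for outdated degrees.
            if self.degree.get(n) == d:
                del self.degree[n]
                return n, d
        raise KeyError("heap is empty")
```

Each `decrease` and `pop_min` costs logarithmic time in the heap size, at the expense of keeping stale entries around until they surface.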
Running times on Amazon AWS

Dataset        Round 1      Round 2      Round 3
as-skitter     1 minute     1 minute     48 seconds
web-BerkStan   8 minutes    1 minute     44 seconds
loc-gowalla    57 seconds   58 seconds   42 seconds

The running time of the last round depends on rho, but it varies only minimally from one rho value to another.
Results

Dataset        rho = 2    rho = 3    rho = 4
as-skitter     5 - 10     7 - 21     9 - 36
web-BerkStan   5 - 10     7 - 21     9 - 36
loc-gowalla    5 - 10     7 - 21     9 - 36

These results were obtained with the first correct run of the algorithm. No further improvement was made, since an optimal solution was found.
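One way to sanity-check the table, under assumptions the slides do not state: each cell is read as "nodes - edges", density is edges divided by nodes, and graphs are simple (no self-loops or parallel edges). Under that reading every cell describes a complete graph, and since a complete graph on n nodes has density (n - 1) / 2, no simple graph on fewer nodes can reach the same rho. The helper names below are hypothetical.

```python
from math import comb

def complete_graph_density(n):
    """Density (edges / nodes) of the complete graph on n nodes, i.e. (n - 1) / 2."""
    return comb(n, 2) / n

def smallest_order_for_density(rho):
    """Smallest n whose complete graph reaches density rho.
    No simple graph on fewer nodes can have density >= rho."""
    n = 1
    while complete_graph_density(n) < rho:
        n += 1
    return n

# Table cells, read as (nodes, edges) per rho; this reading is an assumption.
reported = {2: (5, 10), 3: (7, 21), 4: (9, 36)}
for rho, (nodes, edges) in reported.items():
    assert edges == comb(nodes, 2)                    # the subgraph is a clique
    assert edges / nodes == rho                       # density equals rho exactly
    assert smallest_order_for_density(rho) == nodes   # no smaller graph qualifies
```

If that reading is right, it matches the claim that the solution found is optimal.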
