References:
- Navlakha, S., et al. (2008). Graph summarization with bounded error. In Proc. of the SIGMOD International Conference on Management of Data. ACM.
- Khan, K., et al. (2015). Set-based approximate approach for lossless graph summarization. Computing, 97(12).
- Liu, X., et al. (2014). Distributed graph summarization. In Proc. of the 23rd ACM International Conference on Information and Knowledge Management (CIKM).
Aftab Alam
12 Jun 2017
Department of Computer Engineering, Kyung Hee University
Distributed graph summarization
Contents
1. Introduction
2. Graph Summarization with Bounded Error
3. Distributed Graph Summarization
4. Challenges in DGS
5. Solution
6. Experimental Evaluation
7. Conclusion
Data & Knowledge Engineering Lab, Department of Computer Engineering, Kyung Hee University, Korea.
Graph Summarization with Bounded Error
• Many interactions can be represented as graphs
– Webgraphs:
o search engines, etc.
– Social networks:
o mining user communities, viral marketing
– Email exchanges:
o security: virus spread, spam detection
– Market basket data:
o customer profiles, targeted advertising
– Netflow graphs (which IPs talk to each other):
o traffic patterns, security, worm attacks
• Need to compress and understand these graphs
– Webgraph: ~50 billion edges; social networks: a few million edges, growing quickly
– Compression reduces size to one-tenth (webgraphs)
• Graph summarization is NP-hard
Large Graphs
Our Approach
• Graph Compression (reference encoding)
– Not applicable to all graphs: use urls, node labels for compression
– Resulting structure is hard to visualize/interpret
• Graph Clustering
– Nice summary, works for generic graphs
– No compression: needs the same memory to store the graph itself
• MDL-based representation R = (S,C)
– S is a high-level summary graph:
o compact, highlights dominant trends, easy to visualize
– C is a set of edge corrections:
o help in reconstructing the graph
– Compression based on the MDL principle:
o minimize the cost of S + C
o information-theoretic; parameter-free; applicable to any graph
– Novel Approximate Representation:
o reconstructs graph with bounded error (є);
o results in better compression
How do we compress?
• Compression possible (S)
– Many nodes with similar neighborhoods
o Communities in social networks
o link-copying in webpages
– Collapse
o such nodes into supernodes
o and the edges between them into superedges
o A bipartite subgraph becomes two supernodes and a superedge
o A clique becomes a supernode with a “self-edge”
• Need to correct mistakes (C)
– Most superedges are not complete
o Nodes don’t have exactly the same neighbors:
 e.g., friends in social networks
– Remember edge-corrections
o Edges not present in superedges
 (-ve corrections)
o Extra edges not counted in superedges
 (+ve corrections)
• Minimize overall storage cost = S+C
• Summary S = (VS, ES)
– Each supernode v represents a set of nodes Av
– Each superedge (u,v) represents all pairs of edges πuv = Au × Av
• Corrections C: {(a,b) : a and b are nodes of G}
• Supernodes are the key; superedges/corrections are easy
– Euv: actual edges of G between Au and Av
– Cost with (u,v) = 1 + |πuv – Euv|
– Cost without (u,v) = |Euv|
– Choosing the minimum decides whether superedge (u,v) is in S
Representation Structure R=(S,C)
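The superedge cost rule above can be sketched as follows. This is a minimal illustrative helper, not the paper's code; `superedge_cost` and its argument layout are hypothetical, assuming an undirected graph given as a set of node pairs and two distinct supernodes:

```python
def superedge_cost(A_u, A_v, E):
    """Decide whether to keep superedge (u, v) in the summary S.

    A_u, A_v : node sets of two distinct supernodes u and v.
    E        : edge set of G as (a, b) pairs (undirected).
    pi_uv = |A_u x A_v| is the number of node pairs the superedge covers;
    E_uv is the number of actual edges of G between A_u and A_v.
    """
    pi_uv = len(A_u) * len(A_v)
    E_uv = sum(1 for a in A_u for b in A_v if (a, b) in E or (b, a) in E)
    cost_with = 1 + (pi_uv - E_uv)   # one superedge + negative corrections
    cost_without = E_uv              # positive corrections only
    keep = cost_with <= cost_without
    return keep, min(cost_with, cost_without)
```

For a complete bipartite pattern (e.g. {1,2} fully connected to {3,4}) the superedge wins with cost 1; for a sparse pattern the corrections-only option wins.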
• Reconstructing the graph from R
– For all superedges (u,v) in S, insert all pairs of edges πuv
– For all +ve corrections +(a,b), insert edge (a,b)
– For all -ve corrections -(a,b), delete edge (a,b)
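The three reconstruction steps above can be sketched as one function. This is a hypothetical helper, assuming supernode membership is given as a dict of node sets and corrections as signed triples:

```python
def reconstruct(S_edges, supernode_members, corrections):
    """Rebuild the original edge set from R = (S, C)."""
    edges = set()
    # 1. expand every superedge (u, v) in S into all pairs A_u x A_v
    for u, v in S_edges:
        for a in supernode_members[u]:
            for b in supernode_members[v]:
                if a != b:  # also handles a self-edge (u, u) as a clique
                    edges.add(frozenset((a, b)))
    # 2./3. apply corrections: '+' inserts an edge, '-' deletes one
    for sign, a, b in corrections:
        if sign == '+':
            edges.add(frozenset((a, b)))
        else:
            edges.discard(frozenset((a, b)))
    return edges
```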
• Compressed graph
– MDL representation R=(S,C); є-representation
• Computing R=(S,C)
– GREEDY
– RANDOMIZED
Outline
• Cost of merging supernodes u and v into a single supernode w
– Recall: cost of a superedge (u,x):
o c(u,x) = min{|πux – Eux| + 1, |Eux|}
– cu = sum of the costs of all its edges = Σx c(u,x)
– s(u,v) = (cu + cv – cw)/(cu + cv)
• Main idea:
– recursive bottom-up merging of supernodes
– If s(u,v) > 0, merging u and v reduces the cost of the representation
– Normalizing the cost removes the bias towards high-degree nodes
– Making supernodes is the key:
o superedges and corrections can be computed later
GREEDY
• Recall: s(u,v) = (cu + cv – cw)/(cu + cv)
• GREEDY algorithm
– Start with S=G
– At every step, pick the pair with max s(.) value, merge them
– If no pair has positive s(.) value, stop
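A minimal sequential sketch of GREEDY follows. The `cost` callback is a hypothetical stand-in for the MDL bookkeeping (it returns the total superedge cost c of a candidate supernode), and for brevity the sketch scans all pairs rather than only 2-hop pairs:

```python
def greedy_summarize(nodes, cost):
    """GREEDY: start with S = G, repeatedly merge the pair with max s(u, v),
    stop when no pair has positive s(.)."""
    supernodes = [frozenset([n]) for n in nodes]
    while True:
        best, best_pair = 0.0, None
        for i in range(len(supernodes)):
            for j in range(i + 1, len(supernodes)):
                u, v = supernodes[i], supernodes[j]
                cu, cv, cw = cost(u), cost(v), cost(u | v)
                s = (cu + cv - cw) / (cu + cv) if cu + cv else 0.0
                if s > best:
                    best, best_pair = s, (i, j)
        if best_pair is None:          # no pair with positive s(.): stop
            return supernodes
        i, j = best_pair               # merge the globally best pair
        merged = supernodes[i] | supernodes[j]
        supernodes = [sn for k, sn in enumerate(supernodes) if k not in (i, j)]
        supernodes.append(merged)
```

With a toy cost function that charges 1 per supernode, every merge has s = 0.5, so the sketch collapses all nodes into a single supernode.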
• GREEDY is slow
– Need to find the pair with the (globally) max s(.) value
– Need to process all pairs of nodes within 2 hops of each other
– Every merge changes the costs of all pairs involving w’s neighbors Nw
• Main idea: light weight randomized procedure
– Instead of choosing the globally best pair,
– Choose (randomly) a node u
– Merge the best pair containing u
RANDOMIZED
• Unfinished set U = VG
• At every step,
– randomly pick a node u from U
– find the node v with max s(u,v) value
– If s(u,v) > 0, merge u and v into w, and put w in U
– Else remove u from U
• Repeat until U is empty
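The randomized loop above can be sketched as follows. `two_hop_pairs` and `s_value` are hypothetical caller-supplied stand-ins for the neighborhood and s(u,v) bookkeeping the slides leave implicit:

```python
import random

def randomized_summarize(nodes, two_hop_pairs, s_value, seed=0):
    """RANDOMIZED: pick a random node u from U, merge its best pair if
    s(u, v) > 0, otherwise retire u; repeat until U is empty."""
    rng = random.Random(seed)
    U = set(nodes)                          # unfinished set U = V_G
    merged_into = {}                        # records merges for inspection
    while U:
        u = rng.choice(sorted(U, key=str))  # randomly pick a node u from U
        candidates = [v for v in two_hop_pairs(u) if v in U and v != u]
        best = max(candidates, key=lambda v: s_value(u, v), default=None)
        if best is not None and s_value(u, best) > 0:
            w = (u, best)                   # merge u and best into w
            U.discard(u)
            U.discard(best)
            U.add(w)
            merged_into[u] = merged_into[best] = w
        else:
            U.discard(u)                    # no profitable merge: finish u
    return merged_into
```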
• CNR:
– web-graph dataset
• Routeview:
– autonomous systems topology of the internet
• Wordnet:
– English words, edges between related words (synonym, similar, etc.)
• Facebook:
– social networking
Experimental set-up
Cost Reduction (CNR dataset)
Comparison with other schemes & Cost Breakup
– ~80% of the representation cost is due to corrections
– The proposed techniques give much better compression
Distributed Graph Summarization
• All existing works on graph summarization are single-process solutions,
– and as a result cannot scale to large graphs.
• This work introduces three distributed graph summarization algorithms:
– DistGreedy
– DistRandom
– DistLSH
• Nodes and edges are distributed across different machines
– requires message passing and
– careful coordination across multiple nodes
• Fully distributed graph summarization for better parallelization
– computation should be fully distributed across machines for efficient parallelization
• Minimizing computation and communication costs
– smart techniques are needed to avoid unnecessary communication & computation
Challenges in Distributed Summarization
• Proposed three distributed algorithms for large-scale graph summarization
• Implemented on top of Apache Giraph
– an open-source distributed graph processing platform
• DistGreedy
– examines all pairs of nodes within 2-hop distance,
– thus incurring a large computation and communication cost.
• DistRandom
– reduces the number of examined node pairs using random selection,
– but randomness negatively affects the effectiveness of the algorithm.
• DistLSH
Solution
• Input: graph G = (V, E)
• Summary graph for G: S(G) = (VS, ES)
• The summary S(G) is an aggregated graph, in which
– VS = {V1, ..., Vk} is a partition of the nodes in V
• Each Vi is a super-node,
– representing an aggregation of a subset of the original nodes.
– V(v) denotes the super-node that an original node v belongs to.
• Superedge:
– Each (Vi, Vj) ∈ ES is called a superedge,
– representing all-to-all connections between nodes in Vi and nodes in Vj
• Errors in the summary graph
– The connection error between each pair of super-nodes Vi and Vj counts the node pairs whose adjacency under the summary differs from G.
Preliminaries
• Given a graph G
– and a desired number of super-nodes k,
– compute a summary graph S(G) with k super-nodes
– such that the summary error is minimized.
• Graph summarization is NP-hard
– The difficult part is determining the super-nodes VS
– Once the super-nodes are decided,
o constructing the super-edges with minimum summary error can be achieved in polynomial time.
Preliminaries > Graph Summarization Problem
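The polynomial step above (fixed super-nodes, optimal super-edges) can be sketched as follows. The helper name and the pair-counting error definition are illustrative, assuming the error counts mis-represented node pairs:

```python
def build_superedges(partition, edges):
    """Given fixed super-nodes, choose super-edges with minimum error.

    Each super-node pair is decided independently: keep the super-edge
    iff representing all-to-all connections mis-states fewer node pairs
    than omitting it.
    """
    # count actual edges between every pair of super-nodes
    node2sn = {n: i for i, sn in enumerate(partition) for n in sn}
    conn = {}
    for a, b in edges:
        key = tuple(sorted((node2sn[a], node2sn[b])))
        conn[key] = conn.get(key, 0) + 1
    super_edges, error = set(), 0
    for i in range(len(partition)):
        for j in range(i, len(partition)):
            pairs = (len(partition[i]) * len(partition[j]) if i != j
                     else len(partition[i]) * (len(partition[i]) - 1) // 2)
            e = conn.get((i, j), 0)
            if pairs - e < e:            # all-to-all mis-states fewer pairs
                super_edges.add((i, j))
                error += pairs - e       # missing edges become errors
            else:
                error += e               # every real edge becomes an error
    return super_edges, error
```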
• Giraph is an open-source implementation of Pregel
• Supports
– iterative algorithms and
– vertex-to-vertex communication in a distributed graph
• A Giraph program consists of
– an input step (graph initialization),
– followed by a sequence of iterations (called supersteps),
– and an output step
• Vertex-centric model
– Each vertex
o is considered an independent computing unit,
o has a unique id and a set of outgoing edges,
o plus application-dependent attributes of the vertex and its edges
GIRAPH OVERVIEW
• Distributed graph summarization
– uses the same iterative merging mechanism as the centralized algorithm,
– starting from the original graph as the summary:
o each node is a super-node, and
o super-nodes are iteratively merged until k super-nodes remain.
– In centralized algorithms this is easy:
o a single process with shared memory
o decides which pairs of super-nodes are good merge candidates and
o performs these merge operations.
– In the Giraph distributed environment,
o all decisions and operations have to be done in a distributed way
o through message passing and synchronization.
o To fully utilize the parallelization,
 we need to find multiple pairs of nodes to merge, and
 simultaneously merge them in each iteration.
Main idea
• Two challenges define two crucial tasks:
– Candidates-Find task
o decides on the pairs of super-nodes to be merged.
– Merge task
o executes these merges.
• Propose three distributed graph summarization algorithms:
– DistGreedy,
– DistRandom, and
– DistLSH
• The three algorithms share the same operations in the Merge task
• They differ in how merge candidates are selected.
Challenges
• Each Giraph vertex
– has three attributes associated with the vertex:
o owner-id: points to which other super-node this super-node has been merged into.
o size: records the number of original-graph nodes contained in this super-node.
o selfconn: the number of edges connecting the nodes inside this super-node.
– and two attributes associated with each edge:
o size: caches the number of nodes in the other adjacent super-node of the edge, to avoid an additional round of queries for this value.
o conn: the number of edges in the original graph between this super-node and the neighbor.
Giraph vertex’s Data structure
• Super-steps:
– Candidates-Find task &
– Merge task
• ExecutionPhase (aggregator)
– indicates the current phase to the COMPUTE() function.
– Based on the previous value of ExecutionPhase, the right value can be set for this aggregator in the PRESUPERSTEP function before each superstep starts.
• ActiveNodes (aggregator)
– keeps track of the number of super-nodes in the current summary.
– When the summary size is less than or equal to the required size k, the value of ExecutionPhase is set to DONE.
– In that case, in the COMPUTE() function, every vertex votes to halt, and the whole program finishes.
Overview
• How to find pairs of super-nodes as candidates to merge in
– DistGreedy
– DistRandom
– DistLSH.
• FindCandidates(msgs)
FINDING MERGE CANDIDATES
• DistGreedy
– based on the centralized GREEDY algorithm.
– looks at super-nodes that are 2 hops away from each other and
– strives to find the pairs with minimum error increase.
FINDING MERGE CANDIDATES > DistGreedy
– To control the number of super-node pairs merged in each iteration,
o a threshold called ErrorThreshold is used
o as the cutoff for which pairs qualify as merge candidates.
– Every pair with error increase < ErrorThreshold
o becomes a merge candidate.
– At the start, ErrorThreshold = 0 (no error)
– When the number of merge candidates falls below 5% of the current summary size,
o the algorithm increases ErrorThreshold by a controllable parameter,
o called ThresholdIncrease, for the subsequent iterations.
• Major task
– computing the actual error increase for each pair of 2-hop-away super-nodes
• Simple in the centralized GREEDY
• More complex in the distributed environment,
– as the information needed to compute the error increase is distributed in different places.
FINDING MERGE CANDIDATES > DistGreedy (Cont’d)
• The error increase for merging a pair of super-nodes Vi and Vj can be decomposed into 3 parts:
– Common Neighbor Error Increase
– Unique Neighbor Error Increase
– Self Error Increase
FINDING MERGE CANDIDATES > DistGreedy (Cont’d)
• Common Neighbor Error Increase
– requires the error increase associated with the connections of Vi and Vj to all their common neighbors.
– For a common neighbor, say Vp:
o compare the error before the merge with the error after the merge;
o their difference is the error increase of merging Vi and Vj w.r.t. the common neighbor Vp.
o It is computed collectively over all common neighbors.
FINDING MERGE CANDIDATES > DistGreedy (Cont’d)
• Unique Neighbor Error Increase
– requires only the unique neighbors of each super-node.
– Vi and Vj can independently compute this part of the error increase.
– For the unique neighbor Vq in the figure,
o the error increase associated with Vi’s unique neighbors is computed at Vi;
o similarly for Vj.
– The total is a simple sum of the two.
FINDING MERGE CANDIDATES > DistGreedy (Cont’d)
• Self Error Increase
– requires collaboration between Vi and Vj
• Between the two super-nodes,
– the one with the larger id, say Vj,
– sends its selfconn to Vi.
– Then, at Vi, the self-loop error is computed.
FINDING MERGE CANDIDATES > DistGreedy (Cont’d)
• Finally:
– the three parts of the error increase are aggregated at the super-node with the smaller id, Vi in our example.
– This requires messages from
o common neighbors,
o unique neighbors, and
o self connections.
• Then Vi can simply test whether
– the total error increase is below ErrorThreshold
– to decide whether the two super-nodes should be merged.
FINDING MERGE CANDIDATES > DistGreedy (Cont’d)
• DistGreedy
– Algorithm 2
o DistGreedy’s FindCandidates function.
– There are three phases for this function.
– A Giraph vertex plays different roles in the computation.
– The aggregator ExecutionPhase
o indicates the current superstep phase.
– First phase:
o the Giraph vertex plays the role of a common neighbor,
o Vp, to a potential merge candidate pair Vi and Vj;
o neighbors of Vp are all two hops away from each other,
o so Vp computes the common-neighbor error increase for all pairs of its neighbors Vi and Vj
o and sends it to the super-node in the pair with the smaller id, Vi.
• DistGreedy - Time complexity
– Let d = average number of neighbors of a vertex;
– then the average number of 2-hop-away neighbors of a vertex is d².
– Identifying all the different 2-hop-away neighbors
o has complexity O(d²·N),
o where N is the total number of vertices.
– The computation phase
o iterates through each 1-hop neighbor Vq to compute the error increase for every 2-hop neighbor Vj,
o and thus has a time complexity of O(d³·N).
– The overall DistGreedy time complexity is therefore O(d³·N).
FINDING MERGE CANDIDATES > DistGreedy (Cont’d)
• DistRandom
– DistGreedy blindly examines all super-node pairs within 2 hops:
o a large amount of computation and
o network messages.
– DistRandom instead randomly selects some super-node pairs to examine.
– DistRandom also has the following three supersteps:
o A super-node randomly selects one neighbor and
 sends a message to this neighbor, including its
» size, selfconn, and all neighbors’ sizes and conns.
o The neighbor receives the message and forwards it to a randomly chosen neighbor with an id smaller than the sender’s.
o The 2-hop-away neighbor receives this message and uses it to compute the error increase. If the error increase is below ErrorThreshold, a merge decision is made.
– Time complexity is O(d·N)
FINDING MERGE CANDIDATES > DistRandom
• After the Candidates-Find task,
– the super-nodes to be merged are known.
• How do we merge super-nodes in a distributed way?
• For every vertex merge,
– instead of creating a new merged super-node,
– we always reuse the super-node
o with the smaller id as the merged super-node.
• The super-node with the larger id sets its owner-id to the merged super-node
– and calls VOTETOHALT()
– to become inactive.
MERGING SUPER-NODES
• Issue?
– Suppose we merge super-nodes Vi and Vj into Vi;
– the issue is that there could be another merge decision that requires Vi to be merged into Vg.
– To efficiently merge multiple super-node pairs in a distributed way,
– we introduce a repeatable merge-decision propagation phase to ensure all super-nodes know whom they should eventually be merged into.
– This design decision is essential to save overall supersteps and messages,
– as a vertex id is much cheaper to propagate than real vertex data.
• Phases:
– Decision Propagation Phase
– Connection Switch Phase
– Connection Merge Phase
– State Update Phase
• Decision Propagation Phase
– Vi notifies Vj, and Vg notifies Vi.
• Connection Switch Phase
– Each super-node to be merged
– notifies its neighbors to update their neighbor information:
o self.size, self.conn,
o and all neighbors’ nbr.sizes and nbr.conns.
• Connection Merge Phase
– Receivers of the connection-switch messages update their neighbor lists with the new neighbor ids.
• State Update Phase
– Performs the actual merge by updating all the attributes.
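The Decision Propagation Phase can be viewed as pointer chasing over owner-ids. The sketch below is a sequential stand-in: a Giraph version would do each round as a superstep of messages, and the function name and dict layout are hypothetical:

```python
def propagate_merge_decisions(owner):
    """Resolve chained merge decisions such as Vj -> Vi -> Vg.

    owner[v] holds the super-node v was told to merge into (or v
    itself).  Repeated rounds of pointer jumping ensure every
    super-node learns the final super-node it ends up in.
    """
    changed = True
    while changed:                           # one loop round ~ one superstep
        changed = False
        for v in owner:
            if owner[owner[v]] != owner[v]:
                owner[v] = owner[owner[v]]   # jump one step closer to the root
                changed = True
    return owner
```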
• Environment
– 16-node cluster (IBM System x iDataPlex dx340)
– 32 GB RAM,
– Ubuntu Linux, Java 1.6, Giraph trunk version
• Datasets:
EXPERIMENTAL EVALUATION
• Log-scaled graph summary error histograms
– across different graph summary sizes for three real datasets.
• Log-scaled running time histograms
– across different graph summary sizes for three real datasets.
Conclusion and Future work
• Presented a highly compact two-part representation
– R = (S, C) of the input graph G,
– based on the MDL principle.
• Presented three algorithms: Greedy, Random, and LSH-based;
– the same have been implemented in a distributed environment.
THANK YOU!
Distributed Graph Transformations Supported By Multi-Agent Systems
adamsedziwy
 
Hypergraph for consensus optimization
Hypergraph for consensus optimizationHypergraph for consensus optimization
Hypergraph for consensus optimization
Hershel Safer
 
Ijciet 10 01_183
Ijciet 10 01_183Ijciet 10 01_183
Ijciet 10 01_183
IAEME Publication
 
Vskills Certified CAD Sample Material
Vskills Certified CAD Sample MaterialVskills Certified CAD Sample Material
Vskills Certified CAD Sample Material
Vskills
 
240520_Thanh_LabSeminar[G-MSM: Unsupervised Multi-Shape Matching with Graph-b...
240520_Thanh_LabSeminar[G-MSM: Unsupervised Multi-Shape Matching with Graph-b...240520_Thanh_LabSeminar[G-MSM: Unsupervised Multi-Shape Matching with Graph-b...
240520_Thanh_LabSeminar[G-MSM: Unsupervised Multi-Shape Matching with Graph-b...
thanhdowork
 
Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks
Leveraging Multiple GPUs and CPUs for  Graphlet Counting in Large Networks Leveraging Multiple GPUs and CPUs for  Graphlet Counting in Large Networks
Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks
Ryan Rossi
 
SVD and the Netflix Dataset
SVD and the Netflix DatasetSVD and the Netflix Dataset
SVD and the Netflix Dataset
Ben Mabey
 
UNIT II GEOMETRIC MODELING (COMPUTER AIDED DESIGN AND MANUFACTURING )
UNIT II GEOMETRIC MODELING (COMPUTER AIDED DESIGN AND MANUFACTURING )UNIT II GEOMETRIC MODELING (COMPUTER AIDED DESIGN AND MANUFACTURING )
UNIT II GEOMETRIC MODELING (COMPUTER AIDED DESIGN AND MANUFACTURING )
ravis205084
 

Similar to Distributed graph summarization (20)

SCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATION
SCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATIONSCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATION
SCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATION
 
Compressing Graphs and Indexes with Recursive Graph Bisection
Compressing Graphs and Indexes with Recursive Graph Bisection Compressing Graphs and Indexes with Recursive Graph Bisection
Compressing Graphs and Indexes with Recursive Graph Bisection
 
Start From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize AlgorithmStart From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize Algorithm
 
[20240513_LabSeminar_Huy]GraphFewShort_Transfer.pptx
[20240513_LabSeminar_Huy]GraphFewShort_Transfer.pptx[20240513_LabSeminar_Huy]GraphFewShort_Transfer.pptx
[20240513_LabSeminar_Huy]GraphFewShort_Transfer.pptx
 
Sigmod11 outsource shortest path
Sigmod11 outsource shortest pathSigmod11 outsource shortest path
Sigmod11 outsource shortest path
 
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph KernelsDDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
 
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and Modeling
 
Carved visual hulls for image based modeling
Carved visual hulls for image based modelingCarved visual hulls for image based modeling
Carved visual hulls for image based modeling
 
Parallel Batch-Dynamic Graphs: Algorithms and Lower Bounds
Parallel Batch-Dynamic Graphs: Algorithms and Lower BoundsParallel Batch-Dynamic Graphs: Algorithms and Lower Bounds
Parallel Batch-Dynamic Graphs: Algorithms and Lower Bounds
 
Parallel Batch-Dynamic Graphs: Algorithms and Lower Bounds
Parallel Batch-Dynamic Graphs: Algorithms and Lower BoundsParallel Batch-Dynamic Graphs: Algorithms and Lower Bounds
Parallel Batch-Dynamic Graphs: Algorithms and Lower Bounds
 
Distributed Graph Transformations Supported By Multi-Agent Systems
Distributed Graph Transformations Supported By Multi-Agent SystemsDistributed Graph Transformations Supported By Multi-Agent Systems
Distributed Graph Transformations Supported By Multi-Agent Systems
 
Hypergraph for consensus optimization
Hypergraph for consensus optimizationHypergraph for consensus optimization
Hypergraph for consensus optimization
 
Ijciet 10 01_183
Ijciet 10 01_183Ijciet 10 01_183
Ijciet 10 01_183
 
Vskills Certified CAD Sample Material
Vskills Certified CAD Sample MaterialVskills Certified CAD Sample Material
Vskills Certified CAD Sample Material
 
240520_Thanh_LabSeminar[G-MSM: Unsupervised Multi-Shape Matching with Graph-b...
240520_Thanh_LabSeminar[G-MSM: Unsupervised Multi-Shape Matching with Graph-b...240520_Thanh_LabSeminar[G-MSM: Unsupervised Multi-Shape Matching with Graph-b...
240520_Thanh_LabSeminar[G-MSM: Unsupervised Multi-Shape Matching with Graph-b...
 
Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks
Leveraging Multiple GPUs and CPUs for  Graphlet Counting in Large Networks Leveraging Multiple GPUs and CPUs for  Graphlet Counting in Large Networks
Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks
 
SVD and the Netflix Dataset
SVD and the Netflix DatasetSVD and the Netflix Dataset
SVD and the Netflix Dataset
 
HalifaxNGGs
HalifaxNGGsHalifaxNGGs
HalifaxNGGs
 
UNIT II GEOMETRIC MODELING (COMPUTER AIDED DESIGN AND MANUFACTURING )
UNIT II GEOMETRIC MODELING (COMPUTER AIDED DESIGN AND MANUFACTURING )UNIT II GEOMETRIC MODELING (COMPUTER AIDED DESIGN AND MANUFACTURING )
UNIT II GEOMETRIC MODELING (COMPUTER AIDED DESIGN AND MANUFACTURING )
 

Recently uploaded

Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
thanhdowork
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
Pipe Restoration Solutions
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
AafreenAbuthahir2
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
BrazilAccount1
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
TeeVichai
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
AhmedHussein950959
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
ongomchris
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
Kerry Sado
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
FluxPrime1
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
seandesed
 

Recently uploaded (20)

Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
 

Distributed graph summarization

• 1. Distributed Graph Summarization
References:
– Navlakha, S., et al. (2008, June). Graph summarization with bounded error. SIGMOD International Conference on Management of Data. ACM.
– Khan, K., et al. (2015). Set-based approximate approach for lossless graph summarization. Computing 97(12).
– Liu, X., et al. (2014, November). Distributed graph summarization. 23rd ACM International Conference on Information and Knowledge Management. ACM.
Aftab Alam, 12 Jun 2017
Department of Computer Engineering, Kyung Hee University
• 2. Distributed Graph Summarization — Contents
1. Introduction
2. Graph Summarization with Bounded Error
3. Distributed Graph Summarization
4. Challenges in DGS
5. Solution
6. Experimental Evaluation
7. Conclusion
• 3. Graph Summarization with Bounded Error: Large Graphs
Data & Knowledge Engineering Lab, Department of Computer Engineering, Kyung Hee University, Korea.
• Many interactions can be represented as graphs
– Webgraphs: search engines, etc.
– Social networks: mining user communities, viral marketing
– Email exchanges: security — virus spread, spam detection
– Market basket data: customer profiles, targeted advertising
– Netflow graphs (which IPs talk to each other): traffic patterns, security, worm attacks
• Need to compress and understand
– Webgraphs: ~50 billion edges; social networks: a few million nodes, growing quickly
– Compression reduces webgraph size to one-tenth
• Graph summarization is NP-hard
• 4. Our Approach
• Graph compression (reference encoding)
– Not applicable to all graphs: uses URLs/node labels for compression
– Resulting structure is hard to visualize/interpret
• Graph clustering
– Nice summary, works for generic graphs
– No compression: needs the same memory as storing the graph itself
• MDL-based representation R = (S, C)
– S is a high-level summary graph: compact, highlights dominant trends, easy to visualize
– C is a set of edge corrections: help in reconstructing the graph
– Compression based on the MDL principle: minimize the cost of S + C; information-theoretic, parameterless, applicable to any graph
– Novel approximate representation: reconstructs the graph with bounded error (ε), giving better compression
• 5. How do we compress?
• Compression is possible (S)
– Many nodes have similar neighborhoods
o Communities in social networks
o Link-copying in webpages
– Collapse such nodes into supernodes and their edges into superedges
o A bipartite subgraph becomes two supernodes and a superedge
o A clique becomes a supernode with a "self-edge"
• 6. How do we compress? (cont'd)
• Need to correct mistakes (C)
– Most superedges are not complete
o Nodes don't have exactly the same neighbors (e.g., friends in social networks)
– Remember edge corrections
o Edges implied by a superedge but absent in G (negative corrections)
o Extra edges in G not covered by any superedge (positive corrections)
• Minimize overall storage cost = S + C
• 7. Representation structure R = (S, C)
• Summary S = (VS, ES)
– Each supernode v represents a set of nodes Av
– Each superedge (u, v) represents all pairs of edges πuv = Au × Av
• Corrections C: {(a, b) : a and b are nodes of G}
• Supernodes are the key; superedges and corrections are easy
– Auv: actual edges of G between Au and Av
– Cost with superedge (u, v) = 1 + |πuv − Auv|
– Cost without = |Auv|
– Choose the minimum; this decides whether edge (u, v) is in S
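The cost rule above (superedge plus negative corrections versus positive corrections alone) can be checked with a minimal Python sketch; the function name and interface are illustrative, not from the paper:

```python
def superedge_cost(n_u, n_v, actual_edges):
    """Cost of representing the connections between supernodes u and v.

    n_u, n_v: number of original nodes inside each supernode (|A_u|, |A_v|).
    actual_edges: |A_uv|, edges of G between A_u and A_v.
    """
    pi = n_u * n_v                                 # |pi_uv|: all possible pairs
    cost_with_superedge = 1 + (pi - actual_edges)  # 1 superedge + negative corrections
    cost_without = actual_edges                    # positive corrections only
    return min(cost_with_superedge, cost_without)

# A dense near-bipartite block (4x5 nodes, 18 of 20 edges present):
# keeping the superedge costs 1 + 2 = 3, versus 18 corrections without it.
print(superedge_cost(4, 5, 18))  # -> 3
```

Dense blocks favor the superedge; sparse blocks (e.g., 2 of 20 edges) are cheaper to list as positive corrections.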
• 8. Representation structure R = (S, C) (cont'd)
• Reconstructing the graph from R
– For each superedge (u, v) in S, insert all pairs of edges πuv
– For each positive correction +(a, b), insert edge (a, b)
– For each negative correction −(a, b), delete edge (a, b)
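The three reconstruction steps translate directly into a short Python sketch (data layout is assumed, not prescribed by the paper):

```python
def reconstruct(summary_edges, supernodes, corrections_pos, corrections_neg):
    """Rebuild the original edge set from R = (S, C).

    summary_edges:   list of superedges (u, v) over supernode ids
    supernodes:      dict supernode id -> list of original nodes A_v
    corrections_pos: edges to insert; corrections_neg: edges to delete
    """
    edges = set()
    for (u, v) in summary_edges:          # expand each superedge to all pairs pi_uv
        for a in supernodes[u]:
            for b in supernodes[v]:
                if a != b:
                    edges.add(frozenset((a, b)))
    for e in corrections_pos:             # +(a, b): insert the missing edge
        edges.add(frozenset(e))
    for e in corrections_neg:             # -(a, b): delete the spurious edge
        edges.discard(frozenset(e))
    return edges
```

For example, expanding one superedge between {1, 2} and {3, 4}, adding +(1, 2) and removing −(2, 4) yields the four edges 1-3, 1-4, 2-3, 1-2.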
• 9. Outline
• Compressed graph
– MDL representation R = (S, C); ε-representation
• Computing R = (S, C)
– GREEDY
– RANDOMIZED
• 10. GREEDY
• Cost of merging supernodes u and v into a single supernode w
– Recall: cost of a superedge (u, x): c(u, x) = min{|πux − Aux| + 1, |Aux|}
– cu = sum of the costs of all of u's edges = Σx c(u, x)
– s(u, v) = (cu + cv − cw) / (cu + cv)
• Main idea
– Recursive bottom-up merging of supernodes
– If s(u, v) > 0, merging u and v reduces the cost of the representation
– Normalizing the cost removes bias towards high-degree nodes
– Making supernodes is the key: superedges and corrections can be computed later
• 11. GREEDY (cont'd)
• Recall: s(u, v) = (cu + cv − cw) / (cu + cv)
• GREEDY algorithm
– Start with S = G
– At every step, pick the pair with the maximum s(·) value and merge it
– If no pair has a positive s(·) value, stop
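The merge-gain formula and the stopping rule above can be sketched as follows; `merge_gain` and `best_merge` are illustrative names, and the supernode costs are taken as given rather than computed from a graph:

```python
def merge_gain(c_u, c_v, c_w):
    """s(u, v): fraction of the pair's cost saved by merging u and v into w."""
    return (c_u + c_v - c_w) / (c_u + c_v)

def best_merge(candidates):
    """candidates: dict (u, v) -> (c_u, c_v, c_w).

    Return the pair with the maximum positive s(.) value, or None if no
    pair has positive gain (GREEDY's stopping condition).
    """
    best, best_s = None, 0.0
    for pair, (cu, cv, cw) in candidates.items():
        s = merge_gain(cu, cv, cw)
        if s > best_s:
            best, best_s = pair, s
    return best
```

If merging halves the combined cost (c_u = c_v = 3, c_w = 3), the gain is 0.5; a merge that saves nothing (c_w = c_u + c_v) has gain 0 and is never chosen.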
• 12. RANDOMIZED
• GREEDY is slow
– Needs to find the pair with the (globally) maximum s(·) value
– Needs to process all pairs of nodes within 2 hops
– Every merge changes the costs of all pairs involving neighbors of the new supernode w
• Main idea: a lightweight randomized procedure
– Instead of choosing the globally best pair,
– choose a node u at random and
– merge the best pair containing u
• 13. RANDOMIZED (cont'd)
• Unfinished set U = VG
• At every step, randomly pick a node u from U
• Find the node v with the maximum s(u, v) value
• If s(u, v) > 0, merge u and v into w and put w in U
• Else remove u from U
• Repeat until U is empty
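The loop above can be sketched in a few lines of Python. This is a toy simplification under assumed interfaces (`gain` and `candidates_of` are hypothetical callables): merging is abstracted as absorbing v into u, whereas the real algorithm would also update the costs of affected pairs.

```python
import random

def randomized_summarize(nodes, gain, candidates_of):
    """Sketch of RANDOMIZED: repeatedly pick a random unfinished node u and
    merge it with its best partner while the gain s(u, v) is positive.

    gain(u, v)       -> s(u, v), normalized cost reduction of merging u and v
    candidates_of(u) -> nodes within 2 hops of u (the only pairs worth checking)
    """
    unfinished = set(nodes)   # U: nodes still considered for merging
    survivors = set(nodes)    # current supernodes of the summary
    while unfinished:
        u = random.choice(sorted(unfinished))
        pool = [v for v in candidates_of(u) if v in unfinished and v != u]
        best = max(pool, key=lambda v: gain(u, v), default=None)
        if best is not None and gain(u, best) > 0:
            # merge best into u: u stays in U as the new (larger) supernode
            unfinished.discard(best)
            survivors.discard(best)
        else:
            unfinished.discard(u)   # no profitable merge: u is finished
    return survivors
```

Because only the locally best partner of a random u is examined, each step is cheap, at the cost of possibly missing the globally best merge.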
• 14. Experimental setup
• CNR: web-graph dataset
• Routeview: autonomous-systems topology of the Internet
• WordNet: English words, with edges between related words (synonym, similar, etc.)
• Facebook: social networking
• 15. Cost reduction (CNR dataset)
• 16. Comparison with other schemes & cost breakup
• 80% of the representation cost is due to corrections
• The proposed techniques give much better compression
• 17. Distributed Graph Summarization
• All existing works on graph summarization are single-process solutions and, as a result, cannot scale to large graphs.
• This work introduces three distributed graph summarization algorithms:
– DistGreedy
– DistRandom
– DistLSH
• 18. Challenges in distributed summarization
• Nodes and edges are distributed across different machines
– requires message passing and careful coordination across multiple nodes
• Fully distributed graph summarization to achieve better parallelization
– computation should be fully distributed across machines for efficient parallelization
• Minimizing computation and communication costs
– smart techniques are needed to avoid unnecessary communication and computation
• 19. Solution
• Proposes three distributed algorithms for large-scale graph summarization
• Implemented on top of Apache Giraph
– an open-source distributed graph-processing platform
• DistGreedy
– examines all pairs of nodes within a 2-hop distance,
– which causes a large amount of computation and communication cost
• DistRandom
– reduces the number of examined node pairs using random selection,
– but the randomness negatively affects the effectiveness of the algorithm
• DistLSH
• 20. Preliminaries
• Input graph G = (V, E)
• Summary graph for G: S(G) = (VS, ES)
• The summary S(G) is an aggregated graph in which VS is a partition of the nodes in V
– each Vi ∈ VS is a supernode, representing an aggregation of a subset of the original nodes
– V(v) denotes the supernode that an original node v belongs to
• Superedge: each (Vi, Vj) ∈ ES is called a superedge, representing all-to-all connections between nodes in Vi and nodes in Vj
• Errors in the summary graph
– The connection error for a pair of supernodes Vi and Vj is the number of node pairs in Vi × Vj whose connectivity differs between the original graph and the graph reconstructed from the summary
• 21. Preliminaries > The graph summarization problem
• Given a graph G and a desired number of supernodes k, compute a summary graph S(G) with k supernodes such that the summary error is minimized.
• Graph summarization is NP-hard
– The difficult part is determining the supernodes VS
– Once the supernodes are decided, constructing the superedges with minimum summary error can be done in polynomial time
• 22. Giraph overview
• Giraph is an open-source implementation of Pregel
• Supports iterative algorithms and vertex-to-vertex communication in a distributed graph
• A Giraph program consists of
– an input step (graph initialization),
– followed by a sequence of iterations (called supersteps),
– and an output step
• Vertex-centric model
– Each vertex is considered an independent computing unit
– It has a unique id, a set of outgoing edges, and application-dependent attributes of the vertex and its edges
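The superstep/vertex-centric model can be illustrated with a toy synchronous loop in Python. This is only in the spirit of Pregel/Giraph, not the real Giraph Java API: each active vertex runs a compute function once per superstep, reads the messages sent in the previous superstep, sends new messages, and may vote to halt; a halted vertex is reactivated when it receives a message.

```python
def run_supersteps(graph, compute, max_steps=30):
    """Toy synchronous vertex-centric loop (illustrative, not Giraph's API).

    graph:   dict vertex -> list of out-neighbors
    compute: called per active vertex per superstep as compute(v, messages, send);
             returns True to vote to halt.
    """
    inbox = {v: [] for v in graph}
    active = set(graph)                       # all vertices start active
    steps = 0
    while active and steps < max_steps:
        outbox = {v: [] for v in graph}
        staying = set()
        for v in sorted(active):
            votes_halt = compute(v, inbox[v],
                                 lambda dst, msg: outbox[dst].append(msg))
            if not votes_halt:
                staying.add(v)
        # messages wake halted vertices up for the next superstep
        active = staying | {v for v, msgs in outbox.items() if msgs}
        inbox = outbox
        steps += 1
    return steps

# Example: propagate the maximum vertex id through the path graph 1-2-3.
graph = {1: [2], 2: [1, 3], 3: [2]}
values = {v: v for v in graph}
seen_first_step = set()

def compute(v, msgs, send):
    new = max([values[v]] + msgs)
    if new > values[v] or v not in seen_first_step:
        seen_first_step.add(v)
        values[v] = new
        for nb in graph[v]:
            send(nb, new)
        return False        # stay active
    return True             # vote to halt

run_supersteps(graph, compute)
# values is now {1: 3, 2: 3, 3: 3}
```

The program terminates when every vertex has voted to halt and no messages are in flight, which is exactly Pregel's global termination condition.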
• 23. Main idea
• Distributed graph summarization uses the same iterative merging mechanism as the centralized algorithm
– starting from the original graph as the summary (each node is a supernode) and
– iteratively merging supernodes until k supernodes are left
• In centralized algorithms this is easy
– a single process with shared memory can decide which pairs of supernodes are good merge candidates and perform the merge operations
• In the distributed Giraph environment
– all decisions and operations have to be done in a distributed way, through message passing and synchronization
– to fully utilize parallelization, multiple pairs of nodes must be found and merged simultaneously in each iteration
• 24. Challenges
• The two challenges define two crucial tasks:
– Candidates-Find task: decides on the pairs of supernodes to be merged
– Merge task: executes these merges
• Three distributed graph summarization algorithms are proposed: DistGreedy, DistRandom, and DistLSH
– All three share the same operations in the Merge task
– They differ in how merge candidates are selected
• 25. Giraph vertex data structure
• Each Giraph vertex has three attributes associated with vertices:
– owner-id: points to which other supernode this supernode has been merged into
– size: records the number of original-graph nodes contained in this supernode
– selfconn: the number of edges connecting the nodes inside this supernode
• and two attributes associated with edges:
– size: caches the number of nodes in the other adjacent supernode of the edge, avoiding an additional round of queries for this value
– conn: the number of original-graph edges between this supernode and the neighbor
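The per-vertex state above can be mirrored in a small Python dataclass; field names follow the slide, but the class itself and its layout are an illustrative sketch, not Giraph's actual value types:

```python
from dataclasses import dataclass, field

@dataclass
class SuperNode:
    """State each Giraph vertex would carry for summarization (sketch)."""
    node_id: int
    owner_id: int = -1   # supernode this one has been merged into (-1: itself)
    size: int = 1        # original-graph nodes contained in this supernode
    selfconn: int = 0    # original edges among the nodes inside this supernode
    # per-edge state: neighbor id -> (size of the neighbor supernode,
    #                                 conn = original edges to that neighbor)
    edges: dict = field(default_factory=dict)
```

Caching the neighbor's size on the edge is what lets a vertex evaluate merge costs locally without an extra round of messages.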
• 26. Overview
• Supersteps alternate between the Candidates-Find task and the Merge task
• ExecutionPhase (aggregator)
– indicates to the COMPUTE() function the current phase
– based on its previous value, the right value is set in the PRESUPERSTEP function before each superstep starts
• ActiveNodes (aggregator)
– keeps track of the number of supernodes in the current summary
– when the summary size is less than or equal to the required size k, ExecutionPhase is set to DONE
– in that case every vertex votes to halt in COMPUTE(), and the program finishes
• 27. Finding merge candidates
• How to find pairs of supernodes as merge candidates in DistGreedy, DistRandom, and DistLSH
• FindCandidates(msgs)
• 28. Finding merge candidates > DistGreedy
• DistGreedy is based on the centralized Greedy algorithm
– it looks at supernodes that are 2 hops away from each other and
– strives to find the pairs with minimum error increase
• To control the number of supernode pairs merged in each iteration,
– a threshold called ErrorThreshold is the cutoff for which pairs qualify as merge candidates
– every pair with error increase < ErrorThreshold becomes a merge candidate
– initially, ErrorThreshold = 0 (no error)
– when the number of merge candidates falls below 5% of the current summary size, the algorithm increases ErrorThreshold by a controllable parameter, ThresholdIncrease, for subsequent iterations
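The threshold schedule described above fits in a few lines; `next_threshold` is an illustrative helper name, and only the 5% rule and ThresholdIncrease come from the slide:

```python
def next_threshold(threshold, n_candidates, summary_size, threshold_increase):
    """Raise ErrorThreshold when too few merge candidates qualify.

    If the candidates found fall below 5% of the current summary size,
    bump the threshold by ThresholdIncrease for subsequent iterations;
    otherwise keep it unchanged.
    """
    if n_candidates < 0.05 * summary_size:
        return threshold + threshold_increase
    return threshold
```

Starting at 0, the threshold only grows when merging stalls, so early iterations perform exclusively error-free merges.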
• 29. Finding merge candidates > DistGreedy (cont'd)
• The major task is to compute the actual error increase for each pair of 2-hop-away supernodes
– simple in the centralized Greedy
– more complex in the distributed environment, as the information needed to compute the error increase is spread across different places
  • 30. FINDING MERGE CANDIDATES > DistGreedy (Cont’d)
 • The error increase for merging a pair of super-nodes Vi and Vj can be decomposed into 3 parts:
 – Common Neighbor Error Increase
 – Unique Neighbor Error Increase
 – Self Error Increase
  • 31. FINDING MERGE CANDIDATES > DistGreedy (Cont’d)
 • Common Neighbor Error Increase
 – requires the error increase associated with the connections of Vi and Vj to all their common neighbors.
 – For a common neighbor, say Vp:
 o the error before the merge is c(Vi, Vp) + c(Vj, Vp),
 o and after the merge it is c(Vi ∪ Vj, Vp),
 o where c(u, v) = min(E_uv, π_uv − E_uv + 1) is the MDL cost of the connection between super-nodes u and v
 o (E_uv is the number of actual edges and π_uv = |u|·|v| the number of potential edges between them).
 o Thus the error increase of merging Vi and Vj w.r.t. the common neighbor Vp is c(Vi ∪ Vj, Vp) − c(Vi, Vp) − c(Vj, Vp).
 o This part is collectively computed by the common neighbors.
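A minimal sketch of this computation, assuming the bounded-error connection cost c(u, v) = min(E_uv, π_uv − E_uv + 1) of Navlakha et al. (either list every edge as a correction, or add one super-edge plus the missing-edge corrections); the function names are illustrative.

```python
def conn_cost(edges, size_u, size_v):
    """MDL cost of representing `edges` actual edges between two super-nodes
    of sizes size_u and size_v: min(corrections-only, superedge + corrections)."""
    potential = size_u * size_v
    return min(edges, potential - edges + 1)

def common_neighbor_increase(e_ip, e_jp, s_i, s_j, s_p):
    """Error increase w.r.t. a common neighbor Vp when merging Vi and Vj.
    e_ip / e_jp: edge counts to Vp; s_i, s_j, s_p: super-node sizes."""
    before = conn_cost(e_ip, s_i, s_p) + conn_cost(e_jp, s_j, s_p)
    after = conn_cost(e_ip + e_jp, s_i + s_j, s_p)
    return after - before
```

Note the increase can be negative: merging two small super-nodes that are both densely connected to Vp replaces two super-edges with one.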
  • 32. FINDING MERGE CANDIDATES > DistGreedy (Cont’d)
 • Unique Neighbor Error Increase
 – Its computation requires only the unique neighbors of each super-node,
 – so Vi and Vj can compute this part of the error increase independently.
 – For a unique neighbor Vq of Vi, the connection cost changes because the merged super-node is larger:
 o the error increase is c(Vi ∪ Vj, Vq) − c(Vi, Vq).
 – The error increase for Vj’s unique neighbors is computed similarly.
 – The total is a simple sum of the two.
  • 33. FINDING MERGE CANDIDATES > DistGreedy (Cont’d)
 • Self Error Increase
 – requires collaboration between Vi and Vj,
 – since it depends on the self-connections of both super-nodes and on the connection between them, if any.
 • Between the two super-nodes,
 – the one with the larger id, say Vj,
 – sends its self-connection information to Vi.
 • Then Vi computes the self-loop error.
  • 34. FINDING MERGE CANDIDATES > DistGreedy (Cont’d)
 • Finally:
 – the three parts of the error increase are aggregated at the super-node with the smaller id, Vi in our example.
 – This requires messages from:
 o common neighbors,
 o unique neighbors, and
 o self connections.
 • Vi then simply tests whether
 – the total error increase is below ErrorThreshold
 – to decide whether the two super-nodes should be merged.
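Putting the three parts together, the final decision at Vi might look like the following sketch (hypothetical names; in Giraph the common-neighbor contributions arrive as messages, while the unique and self parts are computed locally):

```python
def decide_merge(common_msgs, unique_increase, self_increase, error_threshold):
    """Vi sums the error-increase contributions it received (one per common
    neighbor) plus the locally computed unique-neighbor and self parts,
    and tests the merge condition against ErrorThreshold."""
    total = sum(common_msgs) + unique_increase + self_increase
    return total < error_threshold
```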
  • 35. FINDING MERGE CANDIDATES
 • DistGreedy
 – Algorithm 2 gives DistGreedy’s FindCandidates function.
 – The function has three phases,
 – and a Giraph vertex plays a different role in each phase.
 – The ExecutionPhase aggregator
 o indicates the phase of the current superstep.
 – First phase:
 o a Giraph vertex takes the role of a common neighbor Vp of a potential merge candidate pair Vi and Vj.
 o Since all neighbors of Vp are two hops away from each other,
 o Vp computes the common-neighbor error increase for every pair of its neighbors Vi and Vj,
 o and sends it to the super-node in the pair with the smaller id, Vi.
  • 36. FINDING MERGE CANDIDATES > DistGreedy (Cont’d)
 • DistGreedy: time complexity
 – Let d = average number of neighbors of a vertex;
 – then the average number of 2-hop-away neighbors of a vertex is d².
 – Enumerating all the distinct 2-hop-away neighbors
 – has complexity O(d²N),
 o where N is the total number of vertices.
 – The computation phase
 o iterates through each 1-hop neighbor Vq to compute the error increase for every 2-hop neighbor Vj,
 o and thus has a time complexity of O(d³N).
 – The overall DistGreedy time complexity is therefore O(d³N).
  • 37. FINDING MERGE CANDIDATES > DistRandom
 • DistRandom
 – DistGreedy blindly examines all super-node pairs within 2 hops, which incurs
 o a large amount of computation and
 o many network messages.
 – DistRandom instead randomly selects some super-node pairs to examine.
 – It also consists of three supersteps:
 o each super-node randomly selects one neighbor and
  sends it a message, including its
 » size, self-connection, and all neighbors’ sizes and connections.
 o The neighbor receives the message and forwards it to a randomly chosen neighbor with an id smaller than the sender’s.
 o The 2-hop-away neighbor receives this message and uses it to compute the error increase. If the error increase is below ErrorThreshold, a merge decision is made.
 – Time complexity is O(dN).
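The three supersteps can be imitated on an in-memory adjacency map, as below; the real DistRandom runs as Giraph message passing, and pick_2hop_pair is a hypothetical name.

```python
import random

def pick_2hop_pair(adj, u, rng=random):
    """Superstep 1: u picks a random neighbor w.
    Superstep 2: w forwards to a randomly chosen neighbor v with id < u.
    Superstep 3 (not shown): v computes the error increase for the pair (v, u)."""
    if not adj[u]:
        return None
    w = rng.choice(sorted(adj[u]))
    # Only forward to a 2-hop neighbor with a smaller id than the original sender,
    # so each candidate pair is evaluated at exactly one end.
    smaller = [v for v in adj[w] if v < u]
    if not smaller:
        return None
    v = rng.choice(sorted(smaller))
    return (v, u)
```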
  • 38. MERGING SUPER-NODES
 • After the Candidates-Find task,
 – we have the super-nodes to be merged.
 • How do we merge super-nodes distributedly?
 • For every merge,
 – instead of creating a new merged super-node,
 – we always reuse the super-node
 o with the smaller id as the merged super-node.
 • The super-node with the larger id sets its owner-id to the merged super-node
 – and calls VOTETOHALT()
 – to turn itself inactive.
  • 39. MERGING SUPER-NODES
 • Issue?
 – Suppose we merge super-nodes Vi and Vj into Vi;
 – the issue is that there could be another merge decision that requires Vi to be merged into Vg.
 • To efficiently merge multiple super-node pairs distributedly,
 – we introduce a repeatable merge-decision propagation phase to ensure that all super-nodes know whom they should eventually be merged into.
 – This design decision is essential to save overall supersteps and messages,
 – since a vertex id is much cheaper to propagate than real vertex data.
 • The merge consists of four phases:
 – Decision Propagation Phase
 – Connection Switch Phase
 – Connection Merge Phase
 – State Update Phase
  • 40. MERGING SUPER-NODES
 • Decision Propagation Phase
 – Vi notifies Vj and Vg notifies Vi, so each super-node learns its eventual merge target.
 • Connection Switch Phase
 – each super-node to be merged
 – notifies its neighbors to update their neighbor information:
 o self.size, self.conn, and
 o all neighbors’ nbr.sizes and nbr.conns.
 • Connection Merge Phase
 – receivers of the connection-switch messages update their neighbor lists with the new neighbor ids.
 • State Update Phase
 – performs the actual merge by updating all the attributes.
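The Decision Propagation phase amounts to chasing owner-ids until a fixed point; a single-machine sketch (resolve_owner is an illustrative name; in Giraph this chain is resolved by repeated notification supersteps rather than a loop):

```python
def resolve_owner(owner, v):
    """Repeatedly propagate the merge decision: if Vj -> Vi and Vi -> Vg,
    then Vj's eventual owner is Vg. `owner` maps a super-node id to the
    (smaller) id it was decided to merge into; absent keys own themselves."""
    while owner.get(v, v) != v:
        v = owner[v]
    return v
```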
  • 41. EXPERIMENTAL EVALUATION
 • Environment
 – 16-node cluster (IBM System x iDataPlex dx340),
 – 32GB RAM,
 – Ubuntu Linux, Java 1.6, Giraph trunk version.
 • Datasets: three real graph datasets.
  • 42. EXPERIMENTAL EVALUATION
 • Log-scaled graph summary error histograms
 – across different graph summary sizes for the three real datasets.
  • 43. EXPERIMENTAL EVALUATION
 • Log-scaled running time histograms
 – across different graph summary sizes for the three real datasets.
  • 44. EXPERIMENTAL EVALUATION
  • 45. Conclusion and Future work
 • Presented a highly compact two-part representation
 – R = (S, C) of the input graph G,
 – based on the MDL principle.
 • Three candidate-finding variants:
 – Greedy, Random, and LSH-based.
 • All of these have been implemented in a distributed environment.

Editor's Notes

  1. Common Neighbor Error Increase: associated with the connections to the common neighbors of the two super-nodes.
  2. Unique Neighbor Error Increase: captures the error increase brought by the connections to the unique neighbors of the two super-nodes.
  3. Self Error Increase: this last part of the error increase comes from the self-connections of the two super-nodes, as well as the connection between the two super-nodes, if there is any.