Antonio Maccioni & Daniel Abadi
SCALABLE PATTERN MATCHING OVER COMPRESSED
GRAPHS VIA DE-DENSIFICATION
22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 13-17, 2016 | San Francisco, California
Aftab Alam
Department of Computer Engineering, Kyung Hee University
Contents
1 Background
2 Problem Statement
3 Proposed Solution
4 Dedensification
5 Graph Pattern Matching
6 Empirical Evaluation
7 Conclusion and Future Work
Data & Knowledge Engineering Laboratory
Department of Computer Engineering, Kyung Hee
Background
• Graph database over a graph G = (V, E, W):
– Uses graph structures with nodes, edges and properties to represent and store data, and answers semantic queries over them.
• Graph query language (the example is written in SPARQL):
– SELECT ?subject ?predicate ?object
WHERE {?subject ?predicate ?object}
LIMIT 100
• One of the most common operations on a graph database is graph pattern matching.
• Example triples:
subject  predicate  object
s1       p1         o1
s2       p2         o2
Problem Statement
• Modern social networks such as Facebook and Twitter are large and contain dense areas.
• High-degree (HD) nodes make query processing over graphs challenging for a database.
– E.g., almost 1/15 of registered Twitter users follow @B. Obama.
• Scaling queries over graphs is harder than scaling queries over relational data.
• Relational data:
o Set-oriented.
o Partitioning, replication and indexing can be applied.
o Multiple cores/servers can be used, each operating on its partition independently.
Problem Statement (Cont’d)
• Real-world graph query operations are less partitionable.
– Real-world graphs follow a power-law degree distribution.
– E.g., 10% of Twitter accounts follow the same five users.
– The graph is dense around these high-degree (HD) nodes.
• Techniques such as partitioning (by node or by edge), replication and indexing
– are not enough to solve the problem raised by these dense areas.
• Parallelization is complicated, and almost impossible in some situations.
• Processing time is extremely skewed: high-degree nodes take far longer to process than low-degree nodes.
Proposed Solution
Graph G → Graph G`
• Dedensification is a lossless compression technique
– that reduces the number of adjacencies of HD nodes.
• HD nodes are surrounded by redundant information
– that can be synthesized and eliminated.
• Identify clusters of low-degree nodes connected to the same HD nodes.
– Insert special nodes, called compressor nodes, into the graph.
– Each compressor node represents the common connections of a cluster of related nodes to HD nodes.
– Remove the redundant edges.
Coming Next
• A non-expansive strategy for dedensification
• Query Answering for graph pattern matching
• Experiments on real and synthetic graphs
DEDENSIFICATION
Features and Parameters
• Setting the high-degree (HD) threshold:
– The threshold indicates the minimum number of adjacencies that a node must have to count as HD.
– A node is HD if it has at least that many incoming edges sharing the same label.
– The threshold divides the N nodes into two sets, i.e.
o High-degree nodes Hh
o Low-degree nodes Hi
• Dedensification
– Lossless compression technique.
– Reduces the number of adjacencies of HD nodes.
– Applicable to both undirected and directed graphs.
– Can be performed on both the incoming and the outgoing edges of the graph.
• In real-world graphs (social-network domains), incoming edges are the bigger problem.
• Directed labeled graph G = (N, E), where N is a set of unique nodes and E is a set of non-unique edges.
– Twitter: "follow" labels.
– Facebook: "follow" or "like" labels.
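The threshold rule above can be sketched in Python. This is only an illustration, not the authors' implementation: the function name `split_by_degree`, the `(source, label, target)` triple representation and the toy edges are assumptions made for the example.

```python
from collections import Counter

def split_by_degree(edges, threshold):
    """Split nodes into high-degree (HD) and low-degree sets.

    edges: iterable of (source, label, target) triples (assumed representation).
    A node is HD if at least `threshold` incoming edges share one label.
    """
    # Count incoming edges per (target node, label) pair.
    incoming = Counter((target, label) for _, label, target in edges)
    nodes = {n for s, _, t in edges for n in (s, t)}
    high = {target for (target, _), count in incoming.items() if count >= threshold}
    return high, nodes - high

# Hypothetical toy graph: three users follow A, one of them also follows B.
edges = [(1, "follows", "A"), (2, "follows", "A"), (3, "follows", "A"), (3, "follows", "B")]
high, low = split_by_degree(edges, threshold=3)
print(sorted(high), sorted(low, key=str))  # ['A'] [1, 2, 3, 'B']
```

Counting per (node, label) pair rather than per node follows the "sharing the same label" wording above.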
DEDENSIFICATION (Cont’d)
Working & Example
• Often a cluster of related nodes is connected to the same group of HD nodes.
• Add compressor nodes that summarize multiple connections of the same kind to high-degree nodes.
• This process is called "dedensification".
• Example:
– Figure (a) shows 6 low-degree nodes (white) and 3 HD nodes (red).
– The low-degree nodes have outgoing edges to this same set of three HD nodes.
– Remove the edges that connect the white nodes to this set of HD nodes, and instead create a single edge from each white node to a new yellow node (the compressor node), which in turn keeps one edge to each HD node.
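A much-simplified sketch of this step, for a graph like the one in the example, is given below. It is not the paper's Algorithm 1: edge labels and Constraint 1 are ignored, and the function name `dedensify`, the tuple used to name a compressor node and the "compress only if it saves edges" rule are assumptions made for illustration.

```python
from collections import defaultdict

def dedensify(edges, high_degree):
    """Simplified dedensification of unlabeled directed edges (source, target).

    Sources that point to the same set of HD nodes share one compressor node.
    Assumption: a group is compressed only if that reduces the edge count.
    """
    hd_targets = defaultdict(set)   # source -> HD nodes it points to
    kept = []                       # edges that do not touch HD nodes
    for source, target in edges:
        if target in high_degree:
            hd_targets[source].add(target)
        else:
            kept.append((source, target))

    groups = defaultdict(list)      # set of HD nodes -> sources pointing to exactly that set
    for source, targets in hd_targets.items():
        groups[frozenset(targets)].append(source)

    for hd_set, sources in groups.items():
        if len(sources) * len(hd_set) > len(sources) + len(hd_set):
            compressor = ("compressor", tuple(sorted(hd_set)))
            kept += [(s, compressor) for s in sources]
            kept += [(compressor, h) for h in sorted(hd_set)]
        else:                       # not worth compressing: keep the direct edges
            kept += [(s, h) for s in sources for h in sorted(hd_set)]
    return kept

# Six white nodes, each linked to the HD nodes A, B and C.
original = [(w, h) for w in range(1, 7) for h in "ABC"]
compressed = dedensify(original, high_degree={"A", "B", "C"})
print(len(original), len(compressed))  # 18 9
```

Each white node ends up with one outgoing edge to the compressor, and each HD node with one incoming edge from it.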
DEDENSIFICATION (Cont’d)
Advantages and Constraint
• Advantages
– If 1,000 nodes are each connected to the same set of 3 HD nodes,
– then 3,000 edges enter the three HD nodes.
– These are replaced with 1,000 edges into a single compressor node (plus 3 edges from the compressor node to the HD nodes).
– This reduces congestion around the HD nodes.
• The goals are to accelerate query performance and to optimize for compression,
• so constraints are placed on how and when dedensification occurs.
• Dedensification creates a new compressor node if CONSTRAINT 1 holds on a set of HD nodes H and a set of other nodes M.
• If M and H overlap, Constraint 1 is still valid.
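The saving in the 1,000-node example can be written as a short calculation. The notation |M| and |H| for the sizes of the two node sets, and the assumption that a single compressor node serves the whole group, are introduced here for illustration; the count on the right includes the edges from the compressor to the HD nodes.

```latex
\[
\underbrace{|M| \cdot |H|}_{\text{direct edges}} = 1000 \cdot 3 = 3000
\quad\longrightarrow\quad
\underbrace{|M| + |H|}_{\text{edges via one compressor}} = 1000 + 3 = 1003
\]
```

Under these assumptions each HD node keeps a single incoming edge from the compressor instead of 1,000 direct ones.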
DEDENSIFICATION
Algorithm
• Algorithm 1: Dedensification of a graph.
– Works on the HD nodes H and the nodes M (the white nodes).
• Lines 4-11 find the node sets H and M.
• Lines 12-19 compute the actual dedensification over H and M.
• The initial graph G can be reconstructed from G`
– by iterating over each compressor node nc ∈ Nc,
– connecting each node with an edge into nc to all the outgoing nodes of nc,
– and finally emptying and removing Nc.
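The reconstruction steps can be sketched as follows. This is a minimal illustration under stated assumptions: edges are unlabeled `(source, target)` pairs, compressor nodes are never connected to each other, and the name `reconstruct` and the sample data are hypothetical.

```python
def reconstruct(edges, compressors):
    """Rebuild the original edge list from a dedensified one.

    edges: list of (source, target) pairs; compressors: set of compressor nodes.
    """
    incoming = {c: [] for c in compressors}   # nodes pointing into each compressor
    outgoing = {c: [] for c in compressors}   # nodes each compressor points to
    restored = []
    for source, target in edges:
        if target in compressors:
            incoming[target].append(source)
        elif source in compressors:
            outgoing[source].append(target)
        else:
            restored.append((source, target))  # untouched edge
    # Connect every incoming node of a compressor to all its outgoing nodes.
    for c in compressors:
        restored += [(s, t) for s in incoming[c] for t in outgoing[c]]
    return restored

compressed = [(1, "c1"), (2, "c1"), ("c1", "A"), ("c1", "B"), (1, 9)]
print(sorted(reconstruct(compressed, {"c1"}), key=str))
```

Because the compressor nodes are dropped from the output, no separate removal step is needed in this sketch.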
GRAPH PATTERN MATCHING
Query Pattern
• How are graph pattern matching queries processed over the compressed graph (CG)?
• A query pattern is itself a graph, consisting of
– a set of nodes and edges, each of which can carry either a label or a variable.
• Example:
– Fig. (d) consists of two nodes and a link:
o a constant node (6), a variable link ?v3 and a variable node ?v2.
o It returns all possible values of ?v3 and ?v2 connected to node (6).
• A query pattern is a composition of triples:
– Each triple has the form node-edge-node.
– 1st node: the source or subject "s".
– Link: the label, edge or predicate "p".
– 2nd node: the destination or object "o".
• The query in Fig. (b) can be decomposed as:
– (?v1, ?v3, ?v4),
– (?v1, ?v3, A), and
– (?v1, ?v3, 6).
GRAPH PATTERN MATCHING (Cont’d)
Query Pattern Matching
• There is a significant number of algorithms for processing pattern matching queries.
• The approach builds on the most prevalent one, which proceeds as follows:
– Each triple pattern t1, t2, …, tn of query q corresponds to
– a selection on the graph database G, i.e. t1(G), t2(G), …, tn(G).
– A common node leads to a join operation between two sets of triples,
– e.g. t1(G) ⋈ t2(G) if t1 and t2 have a node in common.
– After the selections and joins, the output is the complete answer set to query q, i.e. q(G).
– Such joins are commutative; however, the source node is commonly the one used to connect them.
• Star query: a query in which a node of the query graph has multiple edges emerging from it.
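The selection-then-join procedure can be illustrated with a small star query. This is a sketch on an uncompressed toy graph, not a database implementation: the helper names `select` and `join`, the convention that strings starting with "?" are variables, and the sample triples are all assumptions.

```python
def select(graph, pattern):
    """Return one variable binding per triple in `graph` that matches `pattern`.

    Terms starting with '?' are variables; everything else is a constant.
    """
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if isinstance(term, str) and term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results

def join(left, right):
    """Natural join: merge bindings that agree on their shared variables."""
    return [{**a, **b} for a in left for b in right
            if all(a[v] == b[v] for v in a.keys() & b.keys())]

graph = [(1, "follows", "A"), (1, "follows", "B"), (2, "follows", "A")]
# Star query: which ?s follows both A and B?
t1 = select(graph, ("?s", "follows", "A"))
t2 = select(graph, ("?s", "follows", "B"))
print(join(t1, t2))  # [{'?s': 1}]
```

The two selections share the source variable ?s, so the join keeps only the sources that appear in both.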
GRAPH PATTERN MATCHING (Cont’d)
• The proposed solution processes a query q over G
– by rewriting q to a new query pattern q` over the compressed graph G`,
– such that q(G) = q`(G`).
• The underlying system does not need to be aware of the original q(G).
• It simply needs to optimize the processing of q` over G`.
• In the dedensified G`,
– there are no direct connections from low-degree nodes to HD nodes;
– compressor nodes are always present between them.
GRAPH PATTERN MATCHING (Cont’d)
• The flowchart shows the computation of all types of star joins over G`.
• Queries are submitted to the graph database G`
– as a set of edges (s, o).
• The algorithm's input is the graph G` and the query q.
• Parts of q that reference constant low-degree node labels
– are less important here,
– because for them there is no difference between the graphs G` and G.
• The focus is therefore only on the parts of q involving constant HD nodes:
– HD = {h1, h2, …, hn} and VAR = {s, v1, v2, …, vm}, where s is the source.
– Variables can be matched either to low-degree nodes or to HD nodes.
GRAPH PATTERN MATCHING (Cont’d)
Three Micro Cases of Star Queries
1. Stars formed by only HD nodes in their fan-out.
– The left branch of the flowchart.
2. Stars formed by a mixed fan-out of variables and HD nodes.
– The central branch of the flowchart.
3. Stars formed by only variables in their fan-out.
– The right branch of the flowchart.
GRAPH PATTERN MATCHING (Cont’d)
1. HD Nodes Only.
• When VAR = {s}, the query contains a fan-out with HD nodes only.
• This is the easiest case to compute,
– as incoming edges to HD nodes come only from compressor nodes.
– We simply need to search for the compressor nodes
– that connect to all the HD nodes h1, h2, …, hn in the query.
• If no such compressor nodes are found, the empty set is returned.
• If some are found,
– then all nodes with an outgoing edge to one of these compressor nodes form the result set for q`(G`).
• Block A is responsible for these computations.
• Example: q1 = (HD = {A, B}, VAR = {s}) over the graph in Fig. (b):
– Find the compressors AB and ABC, which are connected to both A and B.
– The solutions are the incoming nodes to those compressors, namely nodes 2, 3, 4 and 5.
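The HD-only case can be sketched as a lookup over compressor nodes. The compressed graph below is a hypothetical reconstruction chosen to match the stated answer (the figure itself is not available here), edges are unlabeled, and the sketch assumes that each source reaches its HD nodes through a single compressor; `hd_only_star` is an illustrative name.

```python
def hd_only_star(edges, compressors, hd_query):
    """Answer a star query whose fan-out contains only HD constants.

    edges: (source, target) pairs of the compressed graph.
    Returns the nodes that reach every node in `hd_query` via one compressor.
    """
    targets = {c: set() for c in compressors}
    for source, target in edges:
        if source in compressors:
            targets[source].add(target)
    # Compressors connected to all HD nodes named in the query.
    matching = {c for c, hds in targets.items() if hd_query <= hds}
    # The answers are the nodes with an outgoing edge to a matching compressor.
    return {s for s, t in edges if t in matching}

edges = [(2, "AB"), (3, "AB"), (4, "ABC"), (5, "ABC"),
         ("AB", "A"), ("AB", "B"),
         ("ABC", "A"), ("ABC", "B"), ("ABC", "C")]
print(sorted(hd_only_star(edges, {"AB", "ABC"}, {"A", "B"})))  # [2, 3, 4, 5]
```

The query touches only the two compressor nodes and their neighbours, never the individual edges into A and B that the original graph would contain.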
GRAPH PATTERN MATCHING (Cont’d)
2. Mix of HD and Variable Nodes.
• When the query contains both HD constants and variables in its fan-out:
• First search for the partial stars containing the HD nodes that are specified in the query
– (push high-degree nodes down),
– assigning them to the set y.
• If y is empty (Null), there is no need to search further,
– as the final result set q`(G`) is Null (block O).
• If y is not empty,
– Blocks C, D, E and F are computed in sequence
– to enrich the partial answers in y.
• The goal of Blocks C-F is to refine y by matching the variable parts of the query.
• This is complicated by the fact that variables can correspond to both low-degree (LD) and HD nodes.
GRAPH PATTERN MATCHING (Cont’d)
3. Variable Nodes Only.
• When HD = Null:
– Equivalent to case 2 (mix of HD and variable nodes),
– except that there is no filter to perform on the HD constants.
• Block G = Block C:
– As before, it finds the HD nodes that are potential matches for object variables in the query.
• Block H = Block D and Block E:
– It finds all the LD nodes that are potential matches for variables in the query, i.e. all nodes that have incoming edges from non-compressor nodes.
• Block I = Block F,
– finally outputting q`(G`).
EMPIRICAL EVALUATION
• Goals: to understand
– how the proposed approach compares with running normal queries over uncompressed graphs,
– the scalability of the approach on real-world graphs, and
– whether the approach can complement existing indexing approaches.
• A graph database system prototype was created
– that can generate the compressed graph (G`) and
– implements the proposed algorithms.
• Cold- and warm-cache experiments were run.
• Datasets used for testing: Twitter, Google, LiveJournal and synthetic Barabasi graphs.
EMPIRICAL EVALUATION (Cont’d)
Star Queries & Patterns
• Star queries are used to evaluate the proposed algorithms.
– Five classes of queries:
o Pattern A = only HD nodes.
o Pattern B = mix of variables and HD nodes.
o Pattern C = variables only.
o Patterns D and E = compositions of multiple stars through HD nodes.
EMPIRICAL EVALUATION (Cont’d)
Results
• The focus is on query performance rather than data compression.
• The threshold is set large enough to create 100 compressor nodes per dataset:
o threshold = 2,500 for Twitter
o threshold = 5,000 for Google
o threshold = 10,000 for LiveJournal
o threshold = 7,000-28,000 for Barabasi
• Performance was checked both
o without indexing (ni) and with indexing (in).
• dd = dedensified graph; or = original graph.
EMPIRICAL EVALUATION (Cont’d)
Evaluation with Evolving Graphs
• How do different pattern-matching techniques scale as the graph grows?
• Use the Barabasi model to generate graphs and measure the performance with
– 100,000 nodes and 2,000,000 edges,
– 200,000 nodes and 4,000,000 edges,
– 300,000 nodes and 6,000,000 edges, and
– 400,000 nodes and 8,000,000 edges,
– where the threshold was set to 7,000; 14,000; 21,000 and 28,000 respectively.
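Graphs of this shape can be generated with a preferential-attachment sketch like the one below. It is not the generator used in the experiments: the function name `barabasi_albert_edges` is hypothetical, and the choice of 20 edges per new node is an assumption inferred from the 2,000,000 / 100,000 edge-to-node ratio above. The demo uses a smaller graph so that it runs quickly.

```python
import random

def barabasi_albert_edges(n, m, seed=0):
    """Generate directed edges (new_node, existing_node) by preferential attachment.

    n: total number of nodes; m: edges added with each new node (assumed value).
    """
    rng = random.Random(seed)
    edges = []
    # One entry per seed node, plus one entry per edge endpoint added later,
    # so rng.choice(pool) favours nodes that already have many edges.
    pool = list(range(m))
    for new in range(m, n):
        chosen = set()
        while len(chosen) < m:          # pick m distinct existing nodes
            chosen.add(rng.choice(pool))
        edges.extend((new, target) for target in chosen)
        pool.extend(chosen)
        pool.extend([new] * m)
    return edges

edges = barabasi_albert_edges(n=1000, m=20)
print(len(edges))  # 19600
```

With n = 100,000 and m = 20 the same function yields 1,999,600 edges, close to the first configuration listed above; early nodes accumulate many incoming edges and play the role of HD nodes.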
Conclusion
• Introduced dedensification for graph databases
– to make query performance scale in the presence of HD nodes.
– It removes redundancy in graphs using compressor nodes.
– It improves query performance.
THANK YOU!
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdfAldoGarca30
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 

Recently uploaded (20)

HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
 
Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To Curves
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
Ghuma $ Russian Call Girls Ahmedabad ₹7.5k Pick Up & Drop With Cash Payment 8...
Ghuma $ Russian Call Girls Ahmedabad ₹7.5k Pick Up & Drop With Cash Payment 8...Ghuma $ Russian Call Girls Ahmedabad ₹7.5k Pick Up & Drop With Cash Payment 8...
Ghuma $ Russian Call Girls Ahmedabad ₹7.5k Pick Up & Drop With Cash Payment 8...
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Ground Improvement Technique: Earth Reinforcement
Ground Improvement Technique: Earth ReinforcementGround Improvement Technique: Earth Reinforcement
Ground Improvement Technique: Earth Reinforcement
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 

SCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATION

  • 1. Antonio Maccioni & Daniel Abadi SCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATION 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining August 13-17, 2016 | San Francisco, California Aftab Alam Department of Computer Engineering, Kyung Hee University
  • 2. Scalable Pattern Matching over Compressed Graphs via Dedensification Contents: Background; Problem Statement; Proposed Solution; Dedensification; Graph Pattern Matching; Empirical Evaluation; Conclusion and Future Work
  • 3. Data & Knowledge Engineering Laboratory Department of Computer Engineering, Kyung Hee Background • Graph G (V, E, W) database: – Uses graph structures for semantic queries, with nodes, edges and properties to represent and store data. • Graph query language: – SELECT ?subject ?predicate ?object WHERE {?subject ?predicate ?object} LIMIT 100 – Example triples (subject, predicate, object): (s1, p1, o1), (s2, p2, o2) • One of the common operations on a graph DB is – graph pattern matching
  • 4. Problem Statement • Modern social networks such as Facebook, Twitter and many more are large and contain dense areas. • High-degree (HD) nodes make query processing over graphs challenging for databases. • Scaling queries over graphs is harder than scaling queries over relational data. • Example: almost 1/15 of registered Twitter users follow @B. Obama. • Relational data: o Set-oriented. o Partitioning, replication, and indexing can be applied. o Multiple cores/servers can be used, o each operating on its partition independently.
  • 5. Problem Statement (Cont’d) • Real-world graph query operations are less partitionable – because real-world graphs follow a power law. – E.g.: 10% of Twitter accounts follow the same five users. – The graph is dense around these high-degree (HD) nodes. • Techniques like partitioning (node or edge), replication, and indexing – are not enough to solve the problem raised by these dense areas. • Parallelization: complicated, almost impossible in some situations. • Processing is extremely skewed (processing time for high-degree nodes >> low-degree nodes).
  • 6. Proposed Solution: Graph G → Graph G` • Dedensification (a lossless compression technique) – reduces the number of adjacencies of HD nodes. • HD nodes are surrounded by redundant information – that can be synthesized and eliminated. • Identify clusters of low-degree nodes connected to HD nodes. – Insert special nodes, called compressor nodes, into the graph. – Each compressor node represents the common connections of a cluster of related nodes to HD nodes. – Remove the redundant edges.
  • 7. Coming Next • A non-expansive strategy for dedensification • Query answering for graph pattern matching • Experiments on real and synthetic graphs
  • 8. DEDENSIFICATION Features and Parameters • How is the high-degree (HD) threshold set? – The threshold indicates the minimum number of adjacencies that a node should have. – A node is HD if it has at least that many incoming edges sharing the same label. – The threshold divides the N nodes into two sets, i.e. o High-degree nodes Hh o Low-degree nodes Hi • Dedensification – Lossless compression technique – Reduces the number of adjacencies of HD nodes – Applicable to both undirected and directed graphs – Can be performed on both incoming and outgoing edges of the graph • Real-world graphs (social network domains) have more problems with incoming edges • Directed labeled graphs G = (N = set of unique nodes, E = set of non-unique edges) • Twitter: follow labels. Facebook: follow or like labels.
  • 9. DEDENSIFICATION (Cont’d) Working & Example • Often a cluster of related nodes is connected to the same group of HD nodes. • Add a compressor node that summarizes multiple connections of the same kind to high-degree nodes. • This process is called "dedensification". • Example: – Figure (a): 6 low-degree nodes (white) and 3 HD nodes (red). – The low-degree nodes have outgoing edges to this same set of three HD nodes. – Remove the edges that connect the white nodes to this set of HD nodes, and instead create a single edge from each white node to the new yellow node (the compressor node).
  • 10. DEDENSIFICATION (Cont’d) Advantages and Constraint • Advantages – If 1000 nodes are connected to a set of 3 HD nodes, – then 3000 edges are incoming to the three HD nodes. – These are replaced with 1000 edges to the compressor node. – This reduces congestion around the HD nodes. • The goal is to accelerate query performance and optimize for compression, • so constraints are placed on how and when dedensification occurs. • Dedensification creates a new compressor node if CONSTRAINT 1 holds on a set of HD nodes H and other nodes M. • If M and H overlap, Constraint 1 is still valid.
  • 11. DEDENSIFICATION Algorithm • Algorithm 1: Dedensification of a graph. – H (HD nodes) & M (white nodes) • Lines 4-11 find the node sets H and M. • Lines 12-19 compute the actual dedensification over H and M. • We can reconstruct the initial graph G from G` – by iterating over each compressor nc∈Nc – and connecting each node with an edge into nc to all the outgoing nodes of nc, – and finally emptying and removing Nc.
  • 12. GRAPH PATTERN MATCHING Query Pattern • How are graph pattern matching queries processed over the compressed graph (CG)? • A query pattern is itself a graph, consisting of – a set of nodes and edges, which can have either labels or variables. • Example: – Fig. (d) consists of two nodes and a link: o constant node (6), variable link ?v3 and variable node ?v2. o It returns all possible values of ?v3 and ?v2 connected to node (6). • A query pattern is a composition of triples: – Node-edge-node: a triple – 1st node: source or subject “s” – Link: label, edge or predicate “p” – 2nd node: destination or object “o” • The Fig. (b) query can be decomposed as: – (?v1, ?v3, ?v4), – (?v1, ?v3, A), and – (?v1, ?v3, 6).
  • 13. GRAPH PATTERN MATCHING (Cont’d) Query Pattern Matching • There is a significant number of algorithms for processing pattern matching queries. • The approach is based on the most prevalent one, which proceeds as follows: – Each triple pattern t1, t2, … tn of query q corresponds to – a selection on the graph database G (i.e. t1(G), t2(G), … tn(G)), and – a common node leads to a join operation between two sets of triples, – e.g., t1(G) ⋈ t2(G) if t1 & t2 have a node in common. – After the selections and joins, the output is the complete answer set to query q, i.e., q(G) = t1(G) ⋈ t2(G) ⋈ … ⋈ tn(G). – Such joins are commutative; however, the source node is commonly used to connect. • Star query: a query in which a node in the query graph has multiple edges emerging from it.
  • 14. GRAPH PATTERN MATCHING (Cont’d) • The proposed solution processes a query q over G – by rewriting q to a new query pattern q` over the compressed graph G`, – such that q(G) = q`(G`). • The underlying system does not need to be aware of the original q(G). • It simply needs to optimize the processing of q` over G`. • In the dedensified G`, – there are no direct connections from low-degree nodes to HD nodes, and – compressor nodes are always present.
  • 15. GRAPH PATTERN MATCHING (Cont’d) • The flowchart shows the computation of all – types of star joins over G`. • Queries are submitted to the graph database G` – as a set of edges (s, o). • The algorithm's input is the graph G` and the query q. • If q contains any reference to constant low-degree node labels, – low-degree nodes are less important, and – there is no difference between G` and G. • So the focus is only on the parts of q involving constant HD nodes: – HD = {h1, h2, … hn} & VAR = {s, v1, v2, … vm} (where s = source), – whose variables are matched either to low-degree nodes or to HD nodes.
  • 16. GRAPH PATTERN MATCHING (Cont’d) Three Macro Cases of Star Queries 1. Stars formed by only HD nodes in their fan-out. – The left branch of the figure. 2. Stars formed by a mixed fan-out of variables and HD nodes. – The central branch of the figure. 3. Stars formed by only variables in their fan-out. – The right branch of the figure.
  • 17. GRAPH PATTERN MATCHING (Cont’d) 1. HD Nodes Only. • The query contains a fan-out with HD nodes only. • Easiest case to compute, – as incoming edges to HD nodes only come from compressor nodes. – We simply need to search for compressor nodes – that connect to the HD nodes h1, h2, …, hn in the query. • If no such nodes are found, the empty set can be returned. • If such nodes are found, – then all nodes with an outgoing edge to one of these compressor nodes form the result set for q`(G`). • Block A is responsible for such computations. • Example: q1 (HD = {A, B}, VAR = {s}) over the graph in Fig. (b): • find the compressors AB and ABC, • which are connected to both A and B. • The solutions are the incoming nodes to those compressors, • namely nodes 2, 3, 4 and 5.
  • 18. GRAPH PATTERN MATCHING (Cont’d) 2. Mix of HD and Variable Nodes. • The fan-out contains both HD nodes and variables. • First search for the partial stars containing the HD nodes that are specified in the query – (push high-degree nodes down), • assigning them to the set y. – If y = Null, there is no need to search further, – as the final result set q`(G`) = Null (in block O). • If y is not empty, – we compute Blocks C, D, E and F in sequence – to enrich the partial answers in y. • The goal of Blocks C-F is to refine y by matching the variable parts of the query. • This is complicated by the fact that variables can correspond to both low-degree (LD) and HD nodes.
  • 19. GRAPH PATTERN MATCHING (Cont’d) 3. Variable Nodes Only. • When HD = Null: – equivalent to case 2 (mix of HD and variable nodes), – except that there is no filter to perform on the HD constants. • Block G = Block C – As before, it finds the HD nodes that are potential matches for object variables in the query. • Block H = Block D and Block E – It finds all the LD nodes that are potential matches for variables in the query, i.e. all nodes that have incoming edges from non-compressor nodes. • Block I = Block F, • finally outputting q`(G`).
  • 20. EMPIRICAL EVALUATION • To understand – how the proposed approach compares with running normal queries over uncompressed graphs, – the scalability of the approach on real-world graphs, and – whether the approach can complement existing indexing approaches. • Created a graph database system prototype – that can generate the compressed graph (G`) and – implements the proposed algorithms. • Cold and warm cache experiments. • The following datasets were used for testing.
  • 21. EMPIRICAL EVALUATION (Cont’d) Star Queries & Patterns • Star queries are used to evaluate the proposed algorithms. – Five classes of queries: Pattern A = only HD nodes; Pattern B = mix of variables and HD nodes; Pattern C = variables only; Patterns D & E = compositions of multiple stars through HD nodes.
  • 22. EMPIRICAL EVALUATION (Cont’d) Results • Focus on query performance rather than data compression. • The threshold was set large, to create 100 compressor nodes per dataset: o 2,500 for Twitter o 5,000 for Google o 10,000 for LiveJournal o 7,000-28,000 for Barabasi • Performance was checked both o without indexing (ni) & with indexing (in). • dd = dedensified graph & or = original graph.
  • 23. EMPIRICAL EVALUATION (Cont’d) Evaluation with Evolving Graphs • How do different pattern-matching techniques scale as the graph grows? • Use the Barabasi model to generate a graph and measure the performance with – 100,000 nodes and 2,000,000 edges, – 200,000 nodes and 4,000,000 edges, – 300,000 nodes and 6,000,000 edges, and – 400,000 nodes and 8,000,000 edges, – where the threshold was set to 7,000; 14,000; 21,000 and 28,000 respectively.
  • 24. Conclusion • Introduced dedensification for graph databases – to improve query scalability in the presence of HD nodes. – It removes redundancy in graphs using compressor nodes. – It improves query performance.
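
The dedensification and reconstruction steps described in slides 9-11 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Algorithm 1: it assumes a single edge label, uses in-degree against a threshold `tau` to pick HD nodes, and replaces the slides' CONSTRAINT 1 (not spelled out in the transcript) with a simple "only compress when edges are saved" test. The function names `dedensify` and `reconstruct` and the compressor identifiers are hypothetical.

```python
from collections import defaultdict


def dedensify(edges, tau):
    """Return (compressed_edges, compressors) for a set of (source, target) edges.

    compressors maps each compressor id to the frozenset of HD nodes it stands for.
    Assumption: a node is HD when its in-degree is at least tau.
    """
    in_degree = defaultdict(int)
    for _, target in edges:
        in_degree[target] += 1
    hd_nodes = {n for n, d in in_degree.items() if d >= tau}

    # For each source, collect the HD nodes it points to.
    hd_targets = defaultdict(set)
    for source, target in edges:
        if target in hd_nodes:
            hd_targets[source].add(target)

    # Cluster sources (the set M) by the exact set of HD nodes (the set H) they share.
    clusters = defaultdict(set)
    for source, targets in hd_targets.items():
        clusters[frozenset(targets)].add(source)

    compressed = set(edges)
    compressors = {}
    for h_set, members in clusters.items():
        # Assumed stand-in for CONSTRAINT 1: |M|*|H| edges must exceed |M|+|H|.
        if len(members) * len(h_set) <= len(members) + len(h_set):
            continue
        c = ("compressor", len(compressors))  # hypothetical compressor id
        compressors[c] = h_set
        for m in members:
            for h in h_set:
                compressed.discard((m, h))  # remove the redundant direct edge
            compressed.add((m, c))          # one edge from each member to the compressor
        for h in h_set:
            compressed.add((c, h))          # one edge from the compressor to each HD node
    return compressed, compressors


def reconstruct(compressed, compressors):
    """Rebuild the original edge set by expanding every compressor node."""
    original = {(s, t) for s, t in compressed
                if s not in compressors and t not in compressors}
    for source, target in compressed:
        if target in compressors:
            original.update((source, h) for h in compressors[target])
    return original


if __name__ == "__main__":
    # The slide 10 scenario: 1000 nodes, each connected to the same 3 HD nodes.
    g = {(i, h) for i in range(1000) for h in "ABC"}
    g_prime, comp = dedensify(g, tau=1000)
    print(len(g), len(g_prime))  # 3000 1003
    print(reconstruct(g_prime, comp) == g)
```

In this sketch the 3000 original edges become 1000 member-to-compressor edges plus 3 compressor-to-HD edges, and expanding the compressor recovers the original graph, which is what makes the compression lossless.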

Editor's Notes

  1. A power law is a functional relationship between two quantities, where a relative change in one quantity results in a proportional relative change in the other quantity, independent of the initial size of those quantities: one quantity varies as a power of another. Example: considering the area of a square in terms of the length of its side, if the length is doubled, the area is multiplied by a factor of four.
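
The note above can be written out as a short worked equation. The symbols $a$, $k$ and $c$ are introduced here only for illustration: $a$ is a constant, $k$ the exponent, and $c$ the scaling factor applied to $x$.

```latex
y = a\,x^{k}
\quad\Longrightarrow\quad
a\,(c\,x)^{k} = c^{k}\,\bigl(a\,x^{k}\bigr) = c^{k}\,y .
% Square example: area A = s^2 (a = 1, k = 2); doubling the side (c = 2) gives
(2s)^{2} = 4\,s^{2} = 4A .
```

The factor $c^{k}$ does not depend on the starting value of $x$, which is the scale-independence the note describes.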