gSpan: Graph-Based Substructure Pattern Mining
Presented By: Sadik Mussah
University of Vermont
CS 332 – Data Mining
- Algorithm -
Outline
• Background
• Problem Definition
• Authors' Contribution
• Concepts Behind gSpan
• Experimental Results
• Conclusion
Background
• Frequent Subgraph Mining Is An Extension Of Existing Frequent Pattern Mining Algorithms
• A Major Challenge Is To Count How Many Instances Of A Pattern Are In The Dataset
• Counting Instances Might Be Easy For Sets, But Is Subtle For Graphs
• Graph Isomorphism Problem
Background
Theorem
Given two graphs G and G’, G is isomorphic to G’ iff min(G) = min(G’), where min(·) is the minimum DFS code.
04/12/16 Sadik Mussah
Background
• Two Graphs Are Isomorphic If One Can Find A Mapping Of Nodes Of The First Graph To The Second Graph Such That Labels On Nodes And Edges Are Preserved.
[Figure: two isomorphic graphs, (a) G1=(V1,E1,L1) and (b) G2=(V2,E2,L2), each with five vertices numbered 1 to 5 and labeled U, V, W, X, Y, with their mapping function (c)]
(c) Mapping function f:
f(V1.1) = V2.2
f(V1.2) = V2.5
f(V1.3) = V2.3
f(V1.4) = V2.4
f(V1.5) = V2.1
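The definition above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: it tests whether a given vertex mapping is a bijection that preserves vertex labels and labeled edges. The mapping `f` is the one listed in (c); the assignment of labels to vertex numbers and both edge lists are hypothetical, since the figure's edges are not reproduced in the text, and the function name `is_isomorphism` is illustrative.

```python
def is_isomorphism(f, labels1, edges1, labels2, edges2):
    """Check that f maps graph 1 onto graph 2, preserving all labels."""
    # f must be a bijection from the vertices of graph 1 to those of graph 2.
    if set(f) != set(labels1) or set(f.values()) != set(labels2):
        return False
    if len(set(f.values())) != len(f):
        return False
    # Vertex labels must be preserved.
    if any(labels1[v] != labels2[f[v]] for v in labels1):
        return False
    # Labeled, undirected edges must be preserved in both directions.
    mapped = {(frozenset((f[u], f[v])), lab) for u, v, lab in edges1}
    target = {(frozenset((u, v)), lab) for u, v, lab in edges2}
    return mapped == target

# Mapping from (c): V1.1 -> V2.2, V1.2 -> V2.5, V1.3 -> V2.3, V1.4 -> V2.4, V1.5 -> V2.1
f = {1: 2, 2: 5, 3: 3, 4: 4, 5: 1}

# Hypothetical labels and edges (assumed for illustration only).
labels1 = {1: "X", 2: "W", 3: "U", 4: "Y", 5: "V"}
edges1 = [(1, 2, "a"), (2, 3, "a"), (3, 4, "b"), (4, 5, "a"), (5, 1, "b")]
labels2 = {2: "X", 5: "W", 3: "U", 4: "Y", 1: "V"}
edges2 = [(2, 5, "a"), (5, 3, "a"), (3, 4, "b"), (4, 1, "a"), (1, 2, "b")]

ok = is_isomorphism(f, labels1, edges1, labels2, edges2)
identity = {v: v for v in labels1}
not_ok = is_isomorphism(identity, labels1, edges1, labels2, edges2)
print(ok, not_ok)  # True False
```

Verifying one given mapping is cheap; the hard part, which gSpan addresses with DFS codes, is deciding whether any such mapping exists.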
Problem: Finding Frequent Subgraphs
• Problem Setting: Similar To Finding Frequent Itemsets For Association Rule Discovery
• Input: Database Of Graph Transactions
• Undirected Simple Graphs (No Multiple Edges)
• Each Graph Transaction Has Labeled Edges/Vertices.
• Transactions May Not Be Connected
• Minimum Support Threshold
• Output: Frequent Subgraphs That Satisfy The Support Threshold, Where Each Frequent Subgraph Is Connected.
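The support of a pattern is the number of graph transactions that contain it at least once. The sketch below, which is an illustration and not code from the slides, counts support for the simplest case of 1-edge patterns. It assumes a hypothetical database format in which each transaction is a pair of a vertex-label dict and a list of `(u, v, edge_label)` triples; the name `frequent_edges` and the toy data are made up for the example.

```python
from collections import Counter

def frequent_edges(database, min_support):
    """Return the 1-edge patterns contained in at least min_support transactions."""
    support = Counter()
    for vertex_labels, edges in database:
        patterns_here = set()
        for u, v, edge_label in edges:
            # Undirected edge: sort the two vertex labels so (X, Y) equals (Y, X).
            a, b = sorted((vertex_labels[u], vertex_labels[v]))
            patterns_here.add((a, edge_label, b))
        # A transaction counts once, however often the pattern occurs in it.
        support.update(patterns_here)
    return {p: s for p, s in support.items() if s >= min_support}

# Hypothetical toy database of three graph transactions.
database = [
    ({0: "X", 1: "Y", 2: "Z"}, [(0, 1, "a"), (1, 2, "b"), (0, 2, "c")]),
    ({0: "X", 1: "Y", 2: "Y"}, [(0, 1, "a"), (0, 2, "a")]),
    ({0: "Y", 1: "Z"}, [(0, 1, "b")]),
]
result = frequent_edges(database, min_support=2)
print(result)
```

The second transaction contains the X-a-Y edge twice but still contributes only 1 to its support, which is the transaction-based counting the problem setting implies.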
Finding Frequent Subgraphs
Authors' Contribution
• Representing Graphs As Strings (Like TreeMiner)
• No Candidate Generation!
• “It Combines The Growing And Checking Of Frequent Subgraphs Into One Procedure, Thus Accelerates The Mining Process.”
• Really Fast, Still A Standard Baseline System That Most Rivals Compare Their Systems To.
Concepts Behind gSpan
• The Idea Is To Produce A Depth-First Search (DFS) Code For Each Edge In A Graph
• Edges Are Sorted According To The Lexicographic Order Of Their Codes
• Yan And Han Proved That Graph Isomorphism Can Be Tested For Two Graphs Annotated With DFS Codes
• Starting With Small Graph Patterns Containing 1 Edge, Patterns Are Expanded Systematically By The DFS Search
• Employs The Anti-Monotonic Property Of Graph Frequency: A Pattern Cannot Be More Frequent Than Any Of Its Subgraphs
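The anti-monotonic property is what makes pruning safe: once a pattern falls below the support threshold, every extension of it can be skipped. A small Python sketch of this, under a simplifying assumption that is not from the slides: vertex labels are unique within each graph, so a graph can be stored as a set of `(vertex_label, edge_label, vertex_label)` edges and "pattern is contained in graph" reduces to a subset test. The general case needs subgraph isomorphism instead.

```python
def support(pattern, database):
    """Number of transactions whose edge set contains every edge of the pattern."""
    return sum(1 for graph in database if pattern <= graph)

# Hypothetical toy database; each graph is a frozenset of labeled edges.
database = [
    frozenset({("A", "x", "B"), ("B", "y", "C"), ("C", "z", "D")}),
    frozenset({("A", "x", "B"), ("B", "y", "C")}),
    frozenset({("A", "x", "B"), ("C", "z", "D")}),
]

# Grow a pattern one edge at a time.
p1 = frozenset({("A", "x", "B")})
p2 = p1 | {("B", "y", "C")}
p3 = p2 | {("C", "z", "D")}

supports = [support(p, database) for p in (p1, p2, p3)]
print(supports)  # [3, 2, 1]

# With min_support = 2, p3 is infrequent, so no extension of p3 needs checking.
min_support = 2
frequent = [p for p in (p1, p2, p3) if support(p, database) >= min_support]
```

Support never increases along the growth path p1, p2, p3, which is the property the DFS search relies on.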
Lexicographic Ordering In Graph
• It Can Tell Us The Order Of Two Graphs.
• The Design Can Help Us Build A Similar Hierarchy.
• The Design Should Guarantee Easy Growing From One Level To The Next Lower Level And Easy Rolling Up From A Lower Level To A Higher Level.
• For Graphs, It May Be Difficult To Find A Design In Which No Two Nodes Of This Tree Are The Same.
• It Can Tell Us Whether A Graph Has Already Been Discovered.
• Most Importantly, If A Graph Has Been Discovered, All Its Children Nodes In The Hierarchy Must Have Been Discovered.
Lexicographic Ordering In Graph
[Figure: search tree of graph patterns arranged by level: 1-edge, 2-edge, 3-edge, ...]
DFS Code And Minimum DFS Code
• We Use A 5-Tuple (vi, vj, L(vi), L(vj), L(vi,vj)) To Represent An Edge. (It May Be Redundant, But It Is Much Easier To Understand.)
• Turn A Graph Into A Sequence Whose Basic Element Is A 5-Tuple. Form The Sequence In This Order:
• To Extend By One New Node, Add The Forward Edge That Connects One Node In The Old Graph With This New Node.
• Add All Backward Edges That Connect This New Node To Other Nodes In The Old Graph.
• Repeat This Procedure.
DFS code
[Figure: example graph with vertices v0 (X), v1 (Y), v2 (X), v3 (Z), v4 (Z), grown one edge at a time; edge labels a, a, b, b, c, d]
e0: (0,1,x,y,a)
e1: (1,2,y,x,b)
e2: (2,0,x,x,a)
e3: (2,3,x,z,c)
e4: (3,1,z,y,b)
e5: (1,4,y,z,d)
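The construction rule (one forward edge to a new node, then all backward edges from that node, repeat) can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the function name `dfs_code`, the input format, and the choice to visit neighbors in ascending vertex id are assumptions. Vertex labels are taken from the example graph (v0=x, v1=y, v2=x, v3=z, v4=z), and the traversal yields the six edges in the order e0 to e5.

```python
from collections import defaultdict

def dfs_code(vertex_labels, edges, start):
    """Build one DFS code as a list of 5-tuples (i, j, L(i), L(j), L(i,j))."""
    adj = defaultdict(dict)
    for u, v, edge_label in edges:
        adj[u][v] = edge_label
        adj[v][u] = edge_label

    index = {}  # vertex -> discovery index
    code = []

    def visit(u, parent):
        index[u] = len(index)
        # Backward edges: from the new vertex to already discovered vertices,
        # skipping the tree edge back to its parent.
        back = sorted((w for w in adj[u] if w in index and w != parent), key=index.get)
        for w in back:
            code.append((index[u], index[w], vertex_labels[u], vertex_labels[w], adj[u][w]))
        # Forward edges: grow to each undiscovered neighbor, then recurse.
        for w in sorted(adj[u]):
            if w not in index:
                code.append((index[u], len(index), vertex_labels[u], vertex_labels[w], adj[u][w]))
                visit(w, u)

    visit(start, None)
    return code

# Example graph, with vertex ids chosen to match v0..v4.
vertex_labels = {0: "x", 1: "y", 2: "x", 3: "z", 4: "z"}
edges = [(0, 1, "a"), (1, 2, "b"), (2, 0, "a"), (2, 3, "c"), (3, 1, "b"), (1, 4, "d")]

code = dfs_code(vertex_labels, edges, start=0)
for k, edge in enumerate(code):
    print(f"e{k}: {edge}")
```

Starting from a different vertex, or visiting neighbors in a different order, gives a different DFS code for the same graph.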
DFS Code And Minimum DFS Code
[Figure: Depth First Tree And Forward/Backward Edge Set]
Minimum DFS code
Each graph may have many DFS codes (why? each depth-first tree of the graph gives one);
the lexicographically smallest one is its Minimum DFS Code.

Edge no.   (B)           (C)           (D)
0          (0,1,x,y,a)   (0,1,y,x,a)   (0,1,x,x,a)
1          (1,2,y,x,b)   (1,2,x,x,a)   (1,2,x,y,b)
2          (2,0,x,x,a)   (2,0,x,y,b)   (2,0,y,x,a)
3          (2,3,x,z,c)   (2,3,x,z,c)   (2,3,y,z,b)
4          (3,1,z,y,b)   (3,0,z,y,b)   (3,1,z,x,c)
5          (1,4,y,z,d)   (0,4,y,z,d)   (2,4,y,z,d)
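Graph Parent And Its Children
[Figure: example graph (vertex labels X, Y, X, Z, Z; edge labels a, b, c, a) with candidate one-edge extensions marked "?"]
Given a DFS code c0 = (e0, e1, …, en) and a DFS code c1 = (e0, e1, …, en, ex) that extends it by one edge, so that c0 < c1:
c0 is c1’s parent,
c1 is c0’s child.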
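In code, the parent/child relation is a one-edge prefix relation between edge sequences. A minimal sketch, with the hypothetical helper name `is_child`; it checks only the prefix structure and does not verify that either sequence is a valid DFS code:

```python
def is_child(c0, c1):
    """True if DFS code c1 extends DFS code c0 by exactly one edge."""
    return len(c1) == len(c0) + 1 and c1[:len(c0)] == c0

# First two and first three edges of the example DFS code.
c0 = [(0, 1, "x", "y", "a"), (1, 2, "y", "x", "b")]
c1 = c0 + [(2, 0, "x", "x", "a")]

print(is_child(c0, c1), is_child(c1, c0))  # True False
```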
Theorem
• 1. Given Two Graphs G0 And G1, G0 Is Isomorphic To G1 Iff min_dfs_code(G0) = min_dfs_code(G1).
• 2. The DFS Code Tree Covers All Graphs, Although Some Tree Nodes May Represent The Same Graph.
• 3. Given A Node In The DFS Code Tree, If Its DFS Code Is Not Its Minimum DFS Code, Pruning This Node And All Its Descendants Does Not Change The Covering.
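Part 1 of the theorem turns isomorphism testing into a comparison of canonical codes. The Python sketch below illustrates that idea with a brute-force canonical form: the smallest relabeled vertex and edge listing over all vertex permutations. This is a stand-in chosen for brevity, not the minimum DFS code of Yan and Han, and it is only practical for very small graphs. The function names are hypothetical; the three inputs are three DFS codes of the same five-vertex example graph.

```python
from itertools import permutations

def graph_from_code(code):
    """Rebuild (vertex labels, edge set) from 5-tuples (i, j, L(i), L(j), L(i,j))."""
    labels, edges = {}, set()
    for i, j, li, lj, lij in code:
        labels[i], labels[j] = li, lj
        edges.add((min(i, j), max(i, j), lij))
    return labels, edges

def canonical_form(labels, edges):
    """Smallest relabeled description over all permutations of vertices 0..n-1."""
    best = None
    for perm in permutations(range(len(labels))):
        vertex_part = tuple(sorted((perm[v], lab) for v, lab in labels.items()))
        edge_part = tuple(sorted(
            (min(perm[u], perm[v]), max(perm[u], perm[v]), lab) for u, v, lab in edges
        ))
        candidate = (vertex_part, edge_part)
        if best is None or candidate < best:
            best = candidate
    return best

# Three DFS codes of the same example graph (different depth-first trees).
code_b = [(0, 1, "x", "y", "a"), (1, 2, "y", "x", "b"), (2, 0, "x", "x", "a"),
          (2, 3, "x", "z", "c"), (3, 1, "z", "y", "b"), (1, 4, "y", "z", "d")]
code_c = [(0, 1, "y", "x", "a"), (1, 2, "x", "x", "a"), (2, 0, "x", "y", "b"),
          (2, 3, "x", "z", "c"), (3, 0, "z", "y", "b"), (0, 4, "y", "z", "d")]
code_d = [(0, 1, "x", "x", "a"), (1, 2, "x", "y", "b"), (2, 0, "y", "x", "a"),
          (2, 3, "y", "z", "b"), (3, 1, "z", "x", "c"), (2, 4, "y", "z", "d")]

forms = [canonical_form(*graph_from_code(c)) for c in (code_b, code_c, code_d)]
same_graph = forms[0] == forms[1] == forms[2]

# Changing one edge label gives a non-isomorphic graph and a different form.
code_other = code_b[:-1] + [(1, 4, "y", "z", "a")]
different = canonical_form(*graph_from_code(code_other)) != forms[0]
print(same_graph, different)  # True True
```

The brute force tries all n! relabelings; the point of the minimum DFS code is to reach a canonical code without that enumeration.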
DFS Code Tree
[Figure: DFS code tree arranged by level (1-edge, 2-edge, 3-edge, ...) with one branch marked as pruned]
FSG: two substructure patterns and their
potential candidates.
AGM: two substructures joined by two chains
Algorithm
Algorithm: AprioriGraph
Algorithm: gSpan
04/12/16 Xifeng Yan
Experimental Results
Conclusion
• No Candidate Generation And No False-Positive Testing
• Space Saving From Depth-First Search
• Good Performance: Using A “Memory Pool” And One Major Counting Improvement, Performance Seems Likely To Improve About 5 Times Further (But This Needs More Testing).
Questions
Q1) What Two Major Costs Of Apriori-Like Frequent Substructure Mining Algorithms Did gSpan Aim To Reduce/Avoid?
• Answer:
1) The Creation Of Size K+1 Candidate Subgraphs From Size K Frequent Subgraphs Is More Complicated And Costly Than Standard Apriori Large Itemset Generation.
2) Pruning False Positives Is An Expensive Process. The Subgraph Isomorphism Problem Is NP-Complete.
Security Graph 3D Visualization
• https://www.youtube.com/watch?v=JsEm-CDj4qM
Questions (cont.)
• Q2) Which DFS Tree Does The DFS Code Below Belong To?
[Figure: graph with vertices v0 to v4, vertex labels Y, X, X, Z, Z and edge labels a, a, b, b, c, d]
Answer: Tree (c)
Questions
• Q3) What Does gSpan Compare When Testing For Isomorphism Between Two Graphs, And Why?
• Answer: gSpan Compares The Minimum DFS Codes Of The Two Graphs. Given Two Graphs G And G’, G Is Isomorphic To G’ Iff min(G) = min(G’). This Theorem Allows For A Simple String Comparison Of More Complicated Graphs. If Two Nodes Contain The Same Graph But Different DFS Codes, We Can Prune The Sub-Branch Of The Rightmost Of The Two Nodes. This Greatly Decreases The Problem Size.
Questions?


Editor's Notes

  1. Isomorphism: The graph isomorphism problem is the computational problem of determining whether two finite graphs are isomorphic. It is in NP, and it is one of a very small number of problems belonging to NP that are neither known to be solvable in polynomial time nor known to be NP-complete.