gSpan: Graph-Based Substructure Pattern Mining
- Algorithm -
Presented By: Sadik Mussah
University of Vermont
CS 332 – Data Mining
Outline
• Background
• Problem Definition
• Authors' Contribution
• Concepts Behind gSpan
• Experimental Results
• Conclusion
Background
• Frequent Subgraph Mining Extends Existing Frequent Pattern Mining Algorithms To Graph Data
• A Major Challenge Is To Count How Many Instances Of A Pattern Occur In The Dataset
• Counting Instances Is Easy For Sets, But Subtle For Graphs
• Counting Graph Patterns Runs Into The Graph Isomorphism Problem
Background
Theorem
Given two graphs G and G’ (G prime), G is isomorphic to G’ iff min(G) = min(G’), where min(G) denotes the minimum DFS code of G.
04/12/16Sadik Mussah
Background
[Figure: two isomorphic graphs (a) and (b), G1 = (V1, E1, L1) and G2 = (V2, E2, L2), each with five vertices numbered 1 to 5 and labeled U, V, W, X, Y, with their mapping function (c)]
• Two Graphs Are Isomorphic If One Can Find A One-To-One Mapping Of The Nodes Of The First Graph To The Nodes Of The Second Graph Such That Edges And The Labels On Nodes And Edges Are Preserved.
(c) Mapping function f:
f(V1.1) = V2.2
f(V1.2) = V2.5
f(V1.3) = V2.3
f(V1.4) = V2.4
f(V1.5) = V2.1
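Checking that a given mapping f is an isomorphism is easy; the hard part is finding one. The sketch below only performs the check. The slide's edges were not recoverable, so the two graphs here are hypothetical (a labeled 5-cycle); only the mapping f is taken from the slide. The function name `is_isomorphism` and the graph encoding (a vertex-label dict plus an edge-label dict keyed by `frozenset`) are assumptions made for illustration.

```python
def is_isomorphism(g1, g2, f):
    """Return True if the mapping f (dict: V1 -> V2) is a label-preserving isomorphism."""
    vlabels1, elabels1 = g1
    vlabels2, elabels2 = g2
    # f must be a bijection between the two vertex sets
    if set(f) != set(vlabels1) or set(f.values()) != set(vlabels2):
        return False
    if len(set(f.values())) != len(f):
        return False
    # vertex labels must be preserved
    if any(vlabels1[v] != vlabels2[f[v]] for v in vlabels1):
        return False
    # every edge, with its label, must map onto an edge of g2, and none may be left over
    mapped = {frozenset(f[v] for v in edge): lab for edge, lab in elabels1.items()}
    return mapped == elabels2


# Hypothetical graphs: the labels U..Y come from the slide, the edges are invented.
g1 = (
    {1: "X", 2: "W", 3: "U", 4: "Y", 5: "V"},
    {frozenset({1, 2}): "a", frozenset({2, 3}): "b", frozenset({3, 4}): "a",
     frozenset({4, 5}): "b", frozenset({5, 1}): "c"},
)
g2 = (
    {2: "X", 5: "W", 3: "U", 4: "Y", 1: "V"},
    {frozenset({2, 5}): "a", frozenset({5, 3}): "b", frozenset({3, 4}): "a",
     frozenset({4, 1}): "b", frozenset({1, 2}): "c"},
)
f = {1: 2, 2: 5, 3: 3, 4: 4, 5: 1}          # mapping (c) from the slide
identity = {v: v for v in g1[0]}             # a mapping that does not preserve labels
```

With these inputs, `is_isomorphism(g1, g2, f)` accepts the slide's mapping while the identity mapping is rejected because vertex labels do not match.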
Problem: Finding Frequent Subgraphs
• Problem Setting: Similar To Finding Frequent Itemsets For Association Rule Discovery
• Input: Database Of Graph Transactions
• Each Transaction Is An Undirected Simple Graph (No Multiple Edges)
• Each Graph Transaction Has Labeled Edges And Vertices
• Transactions May Not Be Connected
• Minimum Support Threshold
• Output: Frequent Subgraphs That Satisfy The Support Threshold, Where Each Frequent Subgraph Is Connected.
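The support threshold can be written as a formula. This is a sketch of the usual definition; the symbols D (the graph database), minSup (the threshold), and the notation g ⊆ G_i (g is isomorphic to a subgraph of G_i) are assumed here and do not appear on the slides.

```latex
\[
  \mathrm{support}(g) \;=\; \bigl|\{\, G_i \in D \;:\; g \subseteq G_i \,\}\bigr| ,
  \qquad
  g \text{ is frequent} \iff \mathrm{support}(g) \ge \mathit{minSup}.
\]
```

Each transaction counts at most once, no matter how many times the pattern occurs inside it.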
Finding Frequent Subgraphs
Authors' Contribution
• Representing Graphs As Strings (Like TreeMiner)
• No Candidate Generation!
• “It Combines The Growing And Checking Of Frequent Subgraphs Into One Procedure, Thus Accelerates The Mining Process.”
• Fast In Practice, And Still A Standard Baseline That Most Competing Systems Are Compared Against.
Concepts Behind gSpan
• The Idea Is To Give Each Graph A Depth-First Search (DFS) Code: A Sequence With One Entry Per Edge
• Edges Are Listed In DFS Order, And DFS Codes Are Compared In Lexicographic Order
• Yan And Han Proved That Graph Isomorphism Can Be Tested By Comparing The Minimum DFS Codes Of Two Graphs
• Starting From Small Patterns Containing 1 Edge, Patterns Are Expanded Systematically By Depth-First Search
• Employs The Anti-Monotone Property Of Graph Frequency: A Graph Cannot Be More Frequent Than Any Of Its Subgraphs
Lexicographic Ordering In Graph
• It Can Tell Us The Order Of Two Graphs.
• The Design Can Help Us Build A Similar Hierarchy.
• The Design Should Make It Easy To Grow From One Level To The Next Lower Level And To Roll Up From A Lower Level To A Higher Level.
• For Graphs, It May Be Difficult To Design A Tree In Which No Two Nodes Represent The Same Graph.
• It Can Tell Us Whether A Graph Has Already Been Discovered.
• Most Importantly, If A Graph Has Already Been Discovered, All Its Child Nodes In The Hierarchy Must Have Been Discovered As Well.
Lexicographic Ordering in Graph
[Figure: search tree of graph patterns, with levels for 1-edge, 2-edge, and 3-edge patterns]
DFS Code And Minimum DFS Code
• We Use A 5-Tuple (vi, vj, L(vi), L(vj), L(vi,vj)) To Represent An Edge. (It May Be Redundant, But It Is Much Easier To Understand.)
• Turn A Graph Into A Sequence Whose Basic Element Is A 5-Tuple. Form The Sequence In This Order:
• To Extend By One New Node, Add The Forward Edge That Connects One Node In The Old Graph With This New Node.
• Add All Backward Edges That Connect This New Node To Other Nodes In The Old Graph.
• Repeat This Procedure.
DFS code
[Figure: example graph with vertices v0 (X), v1 (Y), v2 (X), v3 (Z), v4 (Z) and edge labels a, a, b, b, c, d, with its DFS code built up one edge at a time]
e0: (0,1,x,y,a)
e1: (1,2,y,x,b)
e2: (2,0,x,x,a)
e3: (2,3,x,z,c)
e4: (3,1,z,y,b)
e5: (1,4,y,z,d)
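The construction can be sketched in a few lines: walk the graph depth-first, emit a forward edge each time a new vertex is discovered, then emit the backward edges from that vertex to vertices discovered earlier. This is a minimal sketch, not the authors' implementation. The function name `dfs_code`, the adjacency-list encoding, and the rule that neighbors are visited in the order they are listed are all assumptions; the example graph is the one from the figure, with e4 and e5 using the labels of v3 (z) and v1 (y).

```python
def dfs_code(vlabels, adj, elabel, start):
    """Build one DFS code, visiting neighbors in the order they appear in adj."""
    index = {start: 0}   # vertex -> discovery number
    code = []

    def visit(u):
        for w in adj[u]:
            if w in index:
                continue
            index[w] = len(index)
            # forward edge: from the old graph to the newly discovered vertex
            code.append((index[u], index[w], vlabels[u], vlabels[w],
                         elabel[frozenset((u, w))]))
            # backward edges: from the new vertex to earlier vertices (not its parent)
            back = sorted((index[x], x) for x in adj[w] if x in index and x != u)
            for k, x in back:
                code.append((index[w], k, vlabels[w], vlabels[x],
                             elabel[frozenset((w, x))]))
            visit(w)

    visit(start)
    return code


vlabels = {"v0": "x", "v1": "y", "v2": "x", "v3": "z", "v4": "z"}
adj = {
    "v0": ["v1", "v2"],
    "v1": ["v0", "v2", "v3", "v4"],
    "v2": ["v1", "v0", "v3"],
    "v3": ["v2", "v1"],
    "v4": ["v1"],
}
elabel = {
    frozenset(("v0", "v1")): "a", frozenset(("v1", "v2")): "b",
    frozenset(("v2", "v0")): "a", frozenset(("v2", "v3")): "c",
    frozenset(("v3", "v1")): "b", frozenset(("v1", "v4")): "d",
}
code = dfs_code(vlabels, adj, elabel, "v0")
```

Starting from a different vertex, or listing the neighbors in a different order, gives a different DFS code for the same graph.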
DFS Code And Minimum DFS Code
Depth First Tree And Forward/Backward Edge Set
Minimum DFS code
Each graph may have many DFS codes (why? each choice of start vertex and visiting order gives a different DFS tree).
The lexicographically smallest one is its Minimum DFS Code.
Edge no.   (B)            (C)            (D)
0          (0,1,x,y,a)    (0,1,y,x,a)    (0,1,x,x,a)
1          (1,2,y,x,b)    (1,2,x,x,a)    (1,2,x,y,b)
2          (2,0,x,x,a)    (2,0,x,y,b)    (2,0,y,x,a)
3          (2,3,x,z,c)    (2,3,x,z,c)    (2,3,y,z,b)
4          (3,1,z,y,b)    (3,0,z,y,b)    (3,1,z,x,c)
5          (1,4,y,z,d)    (0,4,y,z,d)    (2,4,y,z,d)
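Comparing two DFS codes can be sketched as follows. The edge rules are the DFS lexicographic order as it is usually stated for gSpan: at the first position where two codes differ, a backward edge precedes a forward edge, backward edges are ordered by target and then edge label, and forward edges are ordered by deeper source first and then by labels. Treat this as an illustrative reading of that definition, not a reference implementation. The names `edge_less` and `compare_codes` are assumptions, and B, C, D are three DFS codes of the example graph with labels made consistent with it.

```python
from functools import cmp_to_key


def edge_less(a, b):
    """True if edge a precedes edge b, assuming both extend the same code prefix."""
    (ia, ja, lia, lja, lea), (ib, jb, lib, ljb, leb) = a, b
    a_fwd, b_fwd = ia < ja, ib < jb
    if not a_fwd and b_fwd:        # backward edges come before forward edges
        return True
    if a_fwd and not b_fwd:
        return False
    if not a_fwd:                  # both backward: smaller target, then edge label
        return (ja, lea) < (jb, leb)
    # both forward: deeper source first, then source label, edge label, target label
    return (-ia, lia, lea, lja) < (-ib, lib, leb, ljb)


def compare_codes(alpha, beta):
    """Return -1, 0, or 1, like a classic three-way comparison."""
    for a, b in zip(alpha, beta):
        if a != b:
            return -1 if edge_less(a, b) else 1
    # one code is a prefix of the other: the shorter one is smaller
    return (len(alpha) > len(beta)) - (len(alpha) < len(beta))


B = ((0, 1, "x", "y", "a"), (1, 2, "y", "x", "b"), (2, 0, "x", "x", "a"),
     (2, 3, "x", "z", "c"), (3, 1, "z", "y", "b"), (1, 4, "y", "z", "d"))
C = ((0, 1, "y", "x", "a"), (1, 2, "x", "x", "a"), (2, 0, "x", "y", "b"),
     (2, 3, "x", "z", "c"), (3, 0, "z", "y", "b"), (0, 4, "y", "z", "d"))
D = ((0, 1, "x", "x", "a"), (1, 2, "x", "y", "b"), (2, 0, "y", "x", "a"),
     (2, 3, "y", "z", "b"), (3, 1, "z", "x", "c"), (2, 4, "y", "z", "d"))

ordered = sorted([B, C, D], key=cmp_to_key(compare_codes))
```

Under this order the three codes sort as D, B, C, because they already differ in the labels of their first edge. That makes D the smallest of these three, which does not by itself make it the minimum DFS code of the graph.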
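Graph Parent And Its Children
[Figure: a pattern graph with vertex labels X, Y, X, Z, Z and edge labels a, b, c, a; its possible one-edge extensions are marked with "?"]
Given a DFS code c0 = (e0, e1, …, en) and a DFS code c1 = (e0, e1, …, en, ex) that extends it by one edge, with c0 < c1:
c0 is c1’s parent,
c1 is c0’s child.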
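In code, the parent and child relation is a prefix test on DFS codes. This is a small sketch with assumed helper names (`parent`, `is_child`), using a prefix of the example code from the earlier slide as data.

```python
def parent(code):
    """The parent of a DFS code is the same code without its last edge."""
    return code[:-1]


def is_child(c1, c0):
    """True if c1 extends c0 by exactly one edge."""
    return len(c1) == len(c0) + 1 and c1[:-1] == c0


c0 = ((0, 1, "x", "y", "a"), (1, 2, "y", "x", "b"))
c1 = c0 + ((2, 0, "x", "x", "a"),)   # c0 grown by one backward edge
```

Because a child is its parent plus one edge, walking from any code back through `parent` reaches a 1-edge code, which is why the codes form a tree.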
Theorem
• 1. Given Two Graphs G0 And G1, G0 Is Isomorphic To G1 Iff min_dfs_code(G0) = min_dfs_code(G1).
• 2. The DFS Code Tree Covers All Graphs, Although Some Tree Nodes May Represent The Same Graph.
• 3. Given A Node In The DFS Code Tree Whose DFS Code Is Not The Minimum DFS Code Of Its Graph, Pruning This Node And All Its Descendants Does Not Change The “Covering”: Every Graph Is Still Represented In The Tree.
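Point 1 can be tried out on small graphs by brute force: enumerate every DFS code of each graph, take the smallest, and compare. The sketch below is not how gSpan computes minimum codes, and it makes two simplifications that should be read as assumptions. It uses Python's plain tuple order instead of the DFS lexicographic order (any fixed total order on codes gives a canonical form, because isomorphic graphs have the same set of DFS codes), and it enumerates all traversals, which is exponential. The names `dfs_codes`, `min_dfs_code`, and `are_isomorphic` are hypothetical.

```python
def dfs_codes(vlabels, edges):
    """Yield every DFS code of a connected, labeled, undirected simple graph."""
    adj = {v: {} for v in vlabels}
    for (u, w), lab in edges.items():
        adj[u][w] = lab
        adj[w][u] = lab

    def grow(code, index, stack):
        # backtrack to the deepest vertex that still has an undiscovered neighbor
        while stack:
            u = stack[-1]
            fresh = [w for w in adj[u] if w not in index]
            if fresh:
                break
            stack = stack[:-1]
        else:
            yield tuple(code)          # nothing left to discover
            return
        for w in fresh:                # branch on every possible next vertex
            j = len(index)
            new_index = {**index, w: j}
            step = [(index[u], j, vlabels[u], vlabels[w], adj[u][w])]
            back = sorted((index[x], x) for x in adj[w] if x in index and x != u)
            step += [(j, k, vlabels[w], vlabels[x], adj[w][x]) for k, x in back]
            yield from grow(code + step, new_index, stack + [w])

    for start in vlabels:
        yield from grow([], {start: 0}, [start])


def min_dfs_code(vlabels, edges):
    # plain tuple order, swapped in for the DFS lexicographic order
    return min(dfs_codes(vlabels, edges))


def are_isomorphic(g0, g1):
    return min_dfs_code(*g0) == min_dfs_code(*g1)


labels = {"v0": "x", "v1": "y", "v2": "x", "v3": "z", "v4": "z"}
edges = {("v0", "v1"): "a", ("v1", "v2"): "b", ("v2", "v0"): "a",
         ("v2", "v3"): "c", ("v3", "v1"): "b", ("v1", "v4"): "d"}
rename = {"v0": "p", "v1": "q", "v2": "r", "v3": "s", "v4": "t"}
same = ({rename[v]: l for v, l in labels.items()},
        {(rename[u], rename[w]): l for (u, w), l in edges.items()})
other = (labels, {**edges, ("v1", "v4"): "c"})   # one edge label changed
```

Here `are_isomorphic((labels, edges), same)` holds because renaming vertices does not change the set of DFS codes, while `other` differs in one edge label and gets a different minimum code.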
DFS Code Tree
[Figure: DFS code tree with levels for 1-edge, 2-edge, and 3-edge patterns; one branch is marked as pruned]
FSG: two substructure patterns and their
potential candidates.
AGM: two substructures joined by two chains
Algorithm
Algorithm: AprioriGraph
Algorithm: gSpan
04/12/16Xifeng Yan
Experimental Results
Conclusion
• No Candidate Generation And No False-Positive Testing
• Space Saving From Depth-First Search
• Good Performance: Using A “Memory Pool” And One Major Counting Improvement, Performance Is Expected To Improve By About 5 Times More (More Testing Is Needed).
Questions
Q1) What Two Major Costs Of Apriori-Like Frequent Substructure Mining Algorithms Did gSpan Aim To Reduce Or Avoid?
Answer:
1) Creating Size-(K+1) Candidate Subgraphs From Size-K Frequent Subgraphs Is More Complicated And Costly Than Standard Apriori Large-Itemset Generation.
2) Pruning False Positives Is An Expensive Process, Because The Subgraph Isomorphism Problem Is NP-Complete.
Security Graph 3D Visualization
• https://www.youtube.com/watch?v=JsEm-CDj4qM
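Questions (cont.)
• Q2) Which DFS Tree Does The DFS Code Below Belong To?
[Figure: candidate DFS trees over vertices v0 to v4 with vertex labels X, Y, Z and edge labels a, b, c, d; the DFS code itself was not recovered]
• Answer: Tree (c)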
Questions
• Q3) What Does gSpan Compare When Testing For Isomorphism Between Two Graphs, And Why?
• Answer: gSpan Compares The Minimum DFS Codes Of The Two Graphs. Given Two Graphs G And G’, G Is Isomorphic To G’ Iff min(G) = min(G’). This Theorem Reduces The Comparison Of Complicated Graphs To A Simple String Comparison. If Two Nodes Of The DFS Code Tree Contain The Same Graph But Different DFS Codes, We Can Prune The Sub-Branch Of The Rightmost Of The Two Nodes, Whose Code Is Not The Minimum. This Greatly Decreases The Problem Size.
Questions?


Editor's Notes

  1. Isomorphism: The graph isomorphism problem is the computational problem of determining whether two finite graphs are isomorphic. It is in NP, and it is one of a very small number of problems in NP that are neither known to be solvable in polynomial time nor known to be NP-complete.