SlideShare a Scribd company logo
Graph Mining - Motivation,
Applications and Algorithms
Presented By
Tallal Ahmed Farooq (16094119-071)
Adnan Ali (16094119-062)
Mustafa Shareef (16094119-058)
Outline
• Introduction
• Motivation and applications of Graph Mining
• Mining Frequent Subgraphs – Transaction setting
– BFS/Apriori Approach (FSG and others)
– DFS Approach (gSpan and others)
– Greedy Approach
• Mining Frequent Subgraphs – Single graph setting
– The support issue
– The path-based algorithm
What Graphs are good for?
• Most of existing data mining algorithms are based on
transaction representation, i.e., sets of items.
• Datasets with structures, layers, hierarchy and/or
geometry often do not fit well in this transaction
setting. For e.g.
– Numerical simulations
– 3D protein structures
– Chemical Compounds
– Generic XML files.
Graph Based Data Mining
• Graph Mining is the problem of discovering repetitive
subgraphs occurring in the input graphs.
• Motivation:
– finding subgraphs capable of compressing the data by
abstracting instances of the substructures.
– identifying conceptually interesting patterns
Why Graph Mining
• Graphs are everywhere
– Chemical compounds (Cheminformatics)
– Protein structures, biological pathways/networks (Bioinformactics)
– Program control flow, traffic flow, and workflow analysis
– XML databases, Web, and social network analysis
• Graph is a general model
– Trees, lattices, sequences, and items are degenerated graphs
• Diversity of graphs
– Directed vs. undirected, labeled vs. unlabeled (edges & vertices), weighted, with
angles & geometry (topological vs. 2-D/3-D)
• Complexity of algorithms: many problems are of high complexity (NP
complete or even P-SPACE !)
Graphs, Graphs, Everywhere
Aspirin Yeast protein interaction network
Internet
Co-author network
Graph Pattern Mining
• Frequent subgraphs
– A (sub)graph is frequent if its support (occurrence
frequency) in a given dataset is no less than a minimum
support threshold
• Applications of graph pattern mining:
– Mining biochemical structures
– Program control flow analysis
– Mining XML structures or Web communities
– Building blocks for graph classification, clustering,
compression, comparison, and correlation analysis
Example 1
GRAPH DATASET
FREQUENT PATTERNS
(MIN SUPPORT IS 2)
(T1) (T2) (T3)
(1) (2)
Example 2
GRAPH DATASET
FREQUENT PATTERNS
(MIN SUPPORT IS 2)
Graph Mining Algorithms
• Simple path patterns
• Generalized path patterns
• Simple tree patterns
• Tree-like patterns
• General graph patterns
Graph mining methods
• Apriori-based approach
• Pattern-growth approach
Apriori-Based Approach
…
G
G1
G2
Gn
k-graph
(k+1)-graph
G’
G’’
join
Pattern Growth Method
…
G
G1
G2
Gn
k-graph
(k+1)-graph
…
(k+2)-graph
…
duplicate
graphs
Transaction Setting
 Input: (D, minSup)
 Set of labeled-graphs transactions D={T1, T2, …, TN}
 Minimum support minSup
 Output: (All frequent subgraphs).
 A subgraph is frequent if it is a subgraph of at least
minSup|D| (or #minSup) different transactions in D.
 Each subgraph is connected.
Single graph setting
 Input: (D, minSup)
 A single graph D (e.g. the Web or DBLP or an XML file)
 Minimum support minSup
 Output: (All frequent subgraphs).
 A subgraph is frequent if the number of its occurrences in D
is above an admissible support measure (measure that
satisfies the downward closure property).
Graph Mining: Transaction Setting
Finding Frequent Subgraphs:
Input and Output
Input
– Database of graph transactions.
– Undirected simple graph
(no loops, no multiples edges).
– Each graph transaction has labels
associated with its vertices and edges.
– Transactions may not be connected.
– Minimum support threshold σ.
Output
– Frequent subgraphs that satisfy the
minimum support constraint.
– Each frequent subgraph is connected.
Support = 100%
Support = 66%
Support = 66%
Input: Graph Transactions Output: Frequent Connected Subgraphs
Different Approaches for GM
• Apriori Approach
– FSG
– Path Based
• DFS Approach
– gSpan
• Greedy Approach
– Subdue
FSG Algorithm
Notation: k-subgraph is a subgraph with k edges.
Init: Scan the transactions to find F1, the set of all frequent
1-subgraphs and 2-subgraphs, together with their counts;
For (k=3; Fk-1   ; k++)
1. Candidate Generation - Ck, the set of candidate k-subgraphs, from Fk-
1, the set of frequent (k-1)-subgraphs;
2. Candidates pruning - a necessary condition of candidate to be
frequent is that each of its (k-1)-subgraphs is frequent.
3. Frequency counting - Scan the transactions to count the occurrences
of subgraphs in Ck;
4. Fk = { c CK | c has counts no less than #minSup }
5. Return F1  F2  …… Fk (= F )
Candidates generation (join) based
on core detection
+
+
+
part1:
Define the Tree Search Space (TSS)
Part2:
Find all frequent graphs
by exploring TSS
gSpan outline
gSpan Algorithm
gSpan(D, F, g)
1: if g  min(g)
return;
2: F  F  { g }
3: children(g)  [generate all g’ potential children with one edge growth]*
4: Enumerate(D, g, children(g))
5: for each c  children(g)
if support(c)  #minSup
SubgraphMining (D, F, c)
___________________________
* gSpan improve this line
aaa
a
c a
a
a
bb bb
b
b
c c c
c a
c
c
c
T2 T3T1
Given: database D
Task: Mine all frequent subgraphs with support  2 (#minSup)
Example
g Span
X
Y
X
Z
Z
a
a
b
c
b d
v0
v1
v2
v3
v4
X
Y
X
Z
Z
a
a
b
c
b
d
Y
X
X
Z
Za
b
a
c
d
v0
v1
v2
v3
v4
b
X
X
Y
Z
Z
a
b
a
b
d
v0
v1
v2
v3
v4
c
(a)
(b) (c)
(c)(b)(a)
(0, 1, X, a, X)(0, 1, Y, a, X)(0, 1, X, a, Y)1
(1, 2, X, a, Y)(1, 2, X, a, X)(1, 2, Y, b, X)2
(2, 0, Y, b, X)(2, 0, X, b, Y)(2, 0, X, a, X)3
(2, 3, Y, b, Z)(2, 3, X, c, Z)(2, 3, X, c, Z)4
(3, 0, Z, c, X)(3, 0, Z, b, Y)(3, 1, Z, b, Y)5
(2, 4, Y, d, Z)(0, 4, Y, d, Z)(1, 4, Y, d, Z)6
Min
DFS-Code G
gSpan Performance
• On synthetic datasets it was 6-10 times faster than
FSG.
• On Chemical compounds datasets it was 15-100
times faster!
• But this was comparing to OLD versions of FSG!
Different Approaches for GM
• Apriori Approach
– FSG
– Path Based
• DFS Approach
– gSpan
• Greedy Approach
– Subdue
D. J. Cook and L. B. Holder
Graph-Based Data Mining
Tech. report, Department of CS
Engineering, 1998
Graph Pattern Explosion Problem
• If a graph is frequent, all of its subgraphs are frequent ─ the
Apriori property
• An n-edge frequent graph may have 2n subgraphs.
• Among 422 chemical compounds which are confirmed to be
active in an AIDS antiviral screen dataset, there are 1,000,000
frequent graph patterns if the minimum support is 5%.
Subdue algorithm
• A greedy algorithm for finding some of the most
prevalent subgraphs.
• This method is not complete, i.e. it may not obtain all
frequent subgraphs, although it pays in fast
execution.
Subdue algorithm (Cont.)
• It discovers substructures that compress the original
data and represent structural concepts in the data.
• Based on Beam Search - like BFS it progresses level by
level. Unlike BFS, however, beam search moves
downward only through the best W nodes at each
level. The other nodes are ignored.
Step 1: Create substructure for each unique vertex label
circle
rectangle
left
triangle
square
on
on
triangle
square
on
on
triangle
square
on
on
triangle
square
on
on
left
left left
left
Substructures:
triangle (4)
square (4)
circle (1)
rectangle (1)
Subdue algorithm: step 1
DB:
Subdue Algorithm: step 2
Step 2: Expand best substructure by an edge or edge and
neighboring vertex
circle
rectangle
left
triangle
square
on
triangle
square
on
on
triangle
square
on
on
triangle
square
on
on
left
left left
left
triangle
square
on
on
circle
triangle
square
circleleft
square
rectangle
square
on
rectangle
triangle
on
Substructures:DB:
Step 3: Keep only best substructures on queue (specified by
beam width).
Step 4: Terminate when queue is empty or when the number
of discovered substructures is greater than or equal to the
limit specified.
Step 5: Compress graph and repeat to generate hierarchical
description.
Subdue Algorithm: steps 3-5
Single Graph Setting vs Transaction
Setting
Most existing algorithms use a transaction setting
approach.
That is, if a pattern appears in a transaction even
multiple times it is counted as 1 (FSG, gSPAN ).
What if the entire database is a single graph?
This is called single graph setting.
Single graph setting - Motivation
 Often the input is a single large graph.
 Examples:
 The web or portions of it.
 A social network (e.g. a network of users communicating by
email at BGU).
 A large XML database such as DBLP or Movies database.
 Mining large graph databases is very useful.

More Related Content

What's hot

Asymptotic notations
Asymptotic notationsAsymptotic notations
Asymptotic notations
Nikhil Sharma
 
Bfs and Dfs
Bfs and DfsBfs and Dfs
Bfs and Dfs
Masud Parvaze
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.
Institute of Technology Telkom
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data Mining
Kamal Acharya
 
unit-4-dynamic programming
unit-4-dynamic programmingunit-4-dynamic programming
unit-4-dynamic programming
hodcsencet
 
Chapter8
Chapter8Chapter8
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
Mainul Hassan
 
Max flow min cut
Max flow min cutMax flow min cut
Max flow min cut
Mayank Garg
 
Association rule mining and Apriori algorithm
Association rule mining and Apriori algorithmAssociation rule mining and Apriori algorithm
Association rule mining and Apriori algorithm
hina firdaus
 
Data Mining: Association Rules Basics
Data Mining: Association Rules BasicsData Mining: Association Rules Basics
Data Mining: Association Rules Basics
Benazir Income Support Program (BISP)
 
Assosiate rule mining
Assosiate rule miningAssosiate rule mining
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
Ashis Kumar Chanda
 
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
Ashikapokiya12345
 
Dfs presentation
Dfs presentationDfs presentation
Dfs presentation
Alizay Khan
 
Fp growth
Fp growthFp growth
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
mrizwan969
 
Algorithms Lecture 7: Graph Algorithms
Algorithms Lecture 7: Graph AlgorithmsAlgorithms Lecture 7: Graph Algorithms
Algorithms Lecture 7: Graph Algorithms
Mohamed Loey
 
RECURSIVE DESCENT PARSING
RECURSIVE DESCENT PARSINGRECURSIVE DESCENT PARSING
RECURSIVE DESCENT PARSING
Jothi Lakshmi
 
Breadth First Search & Depth First Search
Breadth First Search & Depth First SearchBreadth First Search & Depth First Search
Breadth First Search & Depth First Search
Kevin Jadiya
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
Gopal Sakarkar
 

What's hot (20)

Asymptotic notations
Asymptotic notationsAsymptotic notations
Asymptotic notations
 
Bfs and Dfs
Bfs and DfsBfs and Dfs
Bfs and Dfs
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data Mining
 
unit-4-dynamic programming
unit-4-dynamic programmingunit-4-dynamic programming
unit-4-dynamic programming
 
Chapter8
Chapter8Chapter8
Chapter8
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Max flow min cut
Max flow min cutMax flow min cut
Max flow min cut
 
Association rule mining and Apriori algorithm
Association rule mining and Apriori algorithmAssociation rule mining and Apriori algorithm
Association rule mining and Apriori algorithm
 
Data Mining: Association Rules Basics
Data Mining: Association Rules BasicsData Mining: Association Rules Basics
Data Mining: Association Rules Basics
 
Assosiate rule mining
Assosiate rule miningAssosiate rule mining
Assosiate rule mining
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
 
Dfs presentation
Dfs presentationDfs presentation
Dfs presentation
 
Fp growth
Fp growthFp growth
Fp growth
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Algorithms Lecture 7: Graph Algorithms
Algorithms Lecture 7: Graph AlgorithmsAlgorithms Lecture 7: Graph Algorithms
Algorithms Lecture 7: Graph Algorithms
 
RECURSIVE DESCENT PARSING
RECURSIVE DESCENT PARSINGRECURSIVE DESCENT PARSING
RECURSIVE DESCENT PARSING
 
Breadth First Search & Depth First Search
Breadth First Search & Depth First SearchBreadth First Search & Depth First Search
Breadth First Search & Depth First Search
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 

Similar to Graph mining ppt

Graph mining seminar_2009
Graph mining seminar_2009Graph mining seminar_2009
Graph mining seminar_2009
Houw Liong The
 
graph_mining_seminar_2009.ppt
graph_mining_seminar_2009.pptgraph_mining_seminar_2009.ppt
graph_mining_seminar_2009.ppt
Venkateswara Rao Katevarapu
 
Data Mining Seminar - Graph Mining and Social Network Analysis
Data Mining Seminar - Graph Mining and Social Network AnalysisData Mining Seminar - Graph Mining and Social Network Analysis
Data Mining Seminar - Graph Mining and Social Network Analysis
vwchu
 
Lgm saarbrucken
Lgm saarbruckenLgm saarbrucken
Lgm saarbrucken
Yasuo Tabei
 
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Salah Amean
 
gSpan algorithm
 gSpan algorithm gSpan algorithm
gSpan algorithm
Sadik Mussah
 
gSpan algorithm
gSpan algorithmgSpan algorithm
gSpan algorithm
Sadik Mussah
 
LalitBDA2015V3
LalitBDA2015V3LalitBDA2015V3
LalitBDA2015V3
Lalit Kumar
 
List intersection for web search: Algorithms, Cost Models, and Optimizations
List intersection for web search: Algorithms, Cost Models, and OptimizationsList intersection for web search: Algorithms, Cost Models, and Optimizations
List intersection for web search: Algorithms, Cost Models, and Optimizations
Sunghwan Kim
 
5.3 mining sequential patterns
5.3 mining sequential patterns5.3 mining sequential patterns
5.3 mining sequential patterns
Krish_ver2
 
Survey of Graph Indexing
Survey of Graph IndexingSurvey of Graph Indexing
Survey of Graph Indexing
Kisung Kim
 
Finding similar items in high dimensional spaces locality sensitive hashing
Finding similar items in high dimensional spaces  locality sensitive hashingFinding similar items in high dimensional spaces  locality sensitive hashing
Finding similar items in high dimensional spaces locality sensitive hashing
Dmitriy Selivanov
 
Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...
Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...
Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...
Mail.ru Group
 
Using CNTK's Python Interface for Deep LearningDave DeBarr -
Using CNTK's Python Interface for Deep LearningDave DeBarr - Using CNTK's Python Interface for Deep LearningDave DeBarr -
Using CNTK's Python Interface for Deep LearningDave DeBarr -
PyData
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming Applications
Debasish Ghosh
 
Frequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth methodFrequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth method
Shani729
 
Cs501 mining frequentpatterns
Cs501 mining frequentpatternsCs501 mining frequentpatterns
Cs501 mining frequentpatterns
Kamal Singh Lodhi
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and Correlations
Justin Cletus
 
My6asso
My6assoMy6asso
My6asso
ketan533
 
ODSC 2019: Sessionisation via stochastic periods for root event identification
ODSC 2019: Sessionisation via stochastic periods for root event identificationODSC 2019: Sessionisation via stochastic periods for root event identification
ODSC 2019: Sessionisation via stochastic periods for root event identification
Kuldeep Jiwani
 

Similar to Graph mining ppt (20)

Graph mining seminar_2009
Graph mining seminar_2009Graph mining seminar_2009
Graph mining seminar_2009
 
graph_mining_seminar_2009.ppt
graph_mining_seminar_2009.pptgraph_mining_seminar_2009.ppt
graph_mining_seminar_2009.ppt
 
Data Mining Seminar - Graph Mining and Social Network Analysis
Data Mining Seminar - Graph Mining and Social Network AnalysisData Mining Seminar - Graph Mining and Social Network Analysis
Data Mining Seminar - Graph Mining and Social Network Analysis
 
Lgm saarbrucken
Lgm saarbruckenLgm saarbrucken
Lgm saarbrucken
 
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
 
gSpan algorithm
 gSpan algorithm gSpan algorithm
gSpan algorithm
 
gSpan algorithm
gSpan algorithmgSpan algorithm
gSpan algorithm
 
LalitBDA2015V3
LalitBDA2015V3LalitBDA2015V3
LalitBDA2015V3
 
List intersection for web search: Algorithms, Cost Models, and Optimizations
List intersection for web search: Algorithms, Cost Models, and OptimizationsList intersection for web search: Algorithms, Cost Models, and Optimizations
List intersection for web search: Algorithms, Cost Models, and Optimizations
 
5.3 mining sequential patterns
5.3 mining sequential patterns5.3 mining sequential patterns
5.3 mining sequential patterns
 
Survey of Graph Indexing
Survey of Graph IndexingSurvey of Graph Indexing
Survey of Graph Indexing
 
Finding similar items in high dimensional spaces locality sensitive hashing
Finding similar items in high dimensional spaces  locality sensitive hashingFinding similar items in high dimensional spaces  locality sensitive hashing
Finding similar items in high dimensional spaces locality sensitive hashing
 
Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...
Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...
Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...
 
Using CNTK's Python Interface for Deep LearningDave DeBarr -
Using CNTK's Python Interface for Deep LearningDave DeBarr - Using CNTK's Python Interface for Deep LearningDave DeBarr -
Using CNTK's Python Interface for Deep LearningDave DeBarr -
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming Applications
 
Frequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth methodFrequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth method
 
Cs501 mining frequentpatterns
Cs501 mining frequentpatternsCs501 mining frequentpatterns
Cs501 mining frequentpatterns
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and Correlations
 
My6asso
My6assoMy6asso
My6asso
 
ODSC 2019: Sessionisation via stochastic periods for root event identification
ODSC 2019: Sessionisation via stochastic periods for root event identificationODSC 2019: Sessionisation via stochastic periods for root event identification
ODSC 2019: Sessionisation via stochastic periods for root event identification
 

Recently uploaded

1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
MadhavJungKarki
 
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
sydezfe
 
5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf
AlvianRamadhani5
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
VANDANAMOHANGOUDA
 
This study Examines the Effectiveness of Talent Procurement through the Imple...
This study Examines the Effectiveness of Talent Procurement through the Imple...This study Examines the Effectiveness of Talent Procurement through the Imple...
This study Examines the Effectiveness of Talent Procurement through the Imple...
DharmaBanothu
 
FULL STACK PROGRAMMING - Both Front End and Back End
FULL STACK PROGRAMMING - Both Front End and Back EndFULL STACK PROGRAMMING - Both Front End and Back End
FULL STACK PROGRAMMING - Both Front End and Back End
PreethaV16
 
Object Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOADObject Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOAD
PreethaV16
 
Bituminous road construction project based learning report
Bituminous road construction project based learning reportBituminous road construction project based learning report
Bituminous road construction project based learning report
CE19KaushlendraKumar
 
P5 Working Drawings.pdf floor plan, civil
P5 Working Drawings.pdf floor plan, civilP5 Working Drawings.pdf floor plan, civil
P5 Working Drawings.pdf floor plan, civil
AnasAhmadNoor
 
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
OKORIE1
 
Open Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surfaceOpen Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surface
Indrajeet sahu
 
Power Electronics- AC -AC Converters.pptx
Power Electronics- AC -AC Converters.pptxPower Electronics- AC -AC Converters.pptx
Power Electronics- AC -AC Converters.pptx
Poornima D
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
aryanpankaj78
 
SCALING OF MOS CIRCUITS m .pptx
SCALING OF MOS CIRCUITS m                 .pptxSCALING OF MOS CIRCUITS m                 .pptx
SCALING OF MOS CIRCUITS m .pptx
harshapolam10
 
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
Paris Salesforce Developer Group
 
Levelised Cost of Hydrogen (LCOH) Calculator Manual
Levelised Cost of Hydrogen  (LCOH) Calculator ManualLevelised Cost of Hydrogen  (LCOH) Calculator Manual
Levelised Cost of Hydrogen (LCOH) Calculator Manual
Massimo Talia
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
uqyfuc
 
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICSUNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
vmspraneeth
 
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
PriyankaKilaniya
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
Roger Rozario
 

Recently uploaded (20)

1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
 
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
 
5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
 
This study Examines the Effectiveness of Talent Procurement through the Imple...
This study Examines the Effectiveness of Talent Procurement through the Imple...This study Examines the Effectiveness of Talent Procurement through the Imple...
This study Examines the Effectiveness of Talent Procurement through the Imple...
 
FULL STACK PROGRAMMING - Both Front End and Back End
FULL STACK PROGRAMMING - Both Front End and Back EndFULL STACK PROGRAMMING - Both Front End and Back End
FULL STACK PROGRAMMING - Both Front End and Back End
 
Object Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOADObject Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOAD
 
Bituminous road construction project based learning report
Bituminous road construction project based learning reportBituminous road construction project based learning report
Bituminous road construction project based learning report
 
P5 Working Drawings.pdf floor plan, civil
P5 Working Drawings.pdf floor plan, civilP5 Working Drawings.pdf floor plan, civil
P5 Working Drawings.pdf floor plan, civil
 
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
 
Open Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surfaceOpen Channel Flow: fluid flow with a free surface
Open Channel Flow: fluid flow with a free surface
 
Power Electronics- AC -AC Converters.pptx
Power Electronics- AC -AC Converters.pptxPower Electronics- AC -AC Converters.pptx
Power Electronics- AC -AC Converters.pptx
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
 
SCALING OF MOS CIRCUITS m .pptx
SCALING OF MOS CIRCUITS m                 .pptxSCALING OF MOS CIRCUITS m                 .pptx
SCALING OF MOS CIRCUITS m .pptx
 
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
 
Levelised Cost of Hydrogen (LCOH) Calculator Manual
Levelised Cost of Hydrogen  (LCOH) Calculator ManualLevelised Cost of Hydrogen  (LCOH) Calculator Manual
Levelised Cost of Hydrogen (LCOH) Calculator Manual
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
 
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICSUNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
 
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
 

Graph mining ppt

  • 1. Graph Mining - Motivation, Applications and Algorithms Presented By Tallal Ahmed Farooq (16094119-071) Adnan Ali (16094119-062) Mustafa Shareef (16094119-058)
  • 2. Outline • Introduction • Motivation and applications of Graph Mining • Mining Frequent Subgraphs – Transaction setting – BFS/Apriori Approach (FSG and others) – DFS Approach (gSpan and others) – Greedy Approach • Mining Frequent Subgraphs – Single graph setting – The support issue – The path-based algorithm
  • 3. What Graphs are good for? • Most of existing data mining algorithms are based on transaction representation, i.e., sets of items. • Datasets with structures, layers, hierarchy and/or geometry often do not fit well in this transaction setting. For e.g. – Numerical simulations – 3D protein structures – Chemical Compounds – Generic XML files.
  • 4. Graph Based Data Mining • Graph Mining is the problem of discovering repetitive subgraphs occurring in the input graphs. • Motivation: – finding subgraphs capable of compressing the data by abstracting instances of the substructures. – identifying conceptually interesting patterns
  • 5. Why Graph Mining • Graphs are everywhere – Chemical compounds (Cheminformatics) – Protein structures, biological pathways/networks (Bioinformactics) – Program control flow, traffic flow, and workflow analysis – XML databases, Web, and social network analysis • Graph is a general model – Trees, lattices, sequences, and items are degenerated graphs • Diversity of graphs – Directed vs. undirected, labeled vs. unlabeled (edges & vertices), weighted, with angles & geometry (topological vs. 2-D/3-D) • Complexity of algorithms: many problems are of high complexity (NP complete or even P-SPACE !)
  • 6. Graphs, Graphs, Everywhere Aspirin Yeast protein interaction network Internet Co-author network
  • 7. Graph Pattern Mining • Frequent subgraphs – A (sub)graph is frequent if its support (occurrence frequency) in a given dataset is no less than a minimum support threshold • Applications of graph pattern mining: – Mining biochemical structures – Program control flow analysis – Mining XML structures or Web communities – Building blocks for graph classification, clustering, compression, comparison, and correlation analysis
  • 8. Example 1 GRAPH DATASET FREQUENT PATTERNS (MIN SUPPORT IS 2) (T1) (T2) (T3) (1) (2)
  • 9. Example 2 GRAPH DATASET FREQUENT PATTERNS (MIN SUPPORT IS 2)
  • 10. Graph Mining Algorithms • Simple path patterns • Generalized path patterns • Simple tree patterns • Tree-like patterns • General graph patterns
  • 11. Graph mining methods • Apriori-based approach • Pattern-growth approach
  • 14. Transaction Setting  Input: (D, minSup)  Set of labeled-graphs transactions D={T1, T2, …, TN}  Minimum support minSup  Output: (All frequent subgraphs).  A subgraph is frequent if it is a subgraph of at least minSup|D| (or #minSup) different transactions in D.  Each subgraph is connected.
  • 15. Single graph setting  Input: (D, minSup)  A single graph D (e.g. the Web or DBLP or an XML file)  Minimum support minSup  Output: (All frequent subgraphs).  A subgraph is frequent if the number of its occurrences in D is above an admissible support measure (measure that satisfies the downward closure property).
  • 17. Finding Frequent Subgraphs: Input and Output Input – Database of graph transactions. – Undirected simple graph (no loops, no multiples edges). – Each graph transaction has labels associated with its vertices and edges. – Transactions may not be connected. – Minimum support threshold σ. Output – Frequent subgraphs that satisfy the minimum support constraint. – Each frequent subgraph is connected. Support = 100% Support = 66% Support = 66% Input: Graph Transactions Output: Frequent Connected Subgraphs
  • 18. Different Approaches for GM • Apriori Approach – FSG – Path Based • DFS Approach – gSpan • Greedy Approach – Subdue
  • 19. FSG Algorithm Notation: k-subgraph is a subgraph with k edges. Init: Scan the transactions to find F1, the set of all frequent 1-subgraphs and 2-subgraphs, together with their counts; For (k=3; Fk-1   ; k++) 1. Candidate Generation - Ck, the set of candidate k-subgraphs, from Fk- 1, the set of frequent (k-1)-subgraphs; 2. Candidates pruning - a necessary condition of candidate to be frequent is that each of its (k-1)-subgraphs is frequent. 3. Frequency counting - Scan the transactions to count the occurrences of subgraphs in Ck; 4. Fk = { c CK | c has counts no less than #minSup } 5. Return F1  F2  …… Fk (= F )
  • 20. Candidates generation (join) based on core detection + + +
  • 21. part1: Define the Tree Search Space (TSS) Part2: Find all frequent graphs by exploring TSS gSpan outline
  • 22. gSpan Algorithm gSpan(D, F, g) 1: if g  min(g) return; 2: F  F  { g } 3: children(g)  [generate all g’ potential children with one edge growth]* 4: Enumerate(D, g, children(g)) 5: for each c  children(g) if support(c)  #minSup SubgraphMining (D, F, c) ___________________________ * gSpan improve this line
  • 23. aaa a c a a a bb bb b b c c c c a c c c T2 T3T1 Given: database D Task: Mine all frequent subgraphs with support  2 (#minSup) Example
  • 24. g Span X Y X Z Z a a b c b d v0 v1 v2 v3 v4 X Y X Z Z a a b c b d Y X X Z Za b a c d v0 v1 v2 v3 v4 b X X Y Z Z a b a b d v0 v1 v2 v3 v4 c (a) (b) (c) (c)(b)(a) (0, 1, X, a, X)(0, 1, Y, a, X)(0, 1, X, a, Y)1 (1, 2, X, a, Y)(1, 2, X, a, X)(1, 2, Y, b, X)2 (2, 0, Y, b, X)(2, 0, X, b, Y)(2, 0, X, a, X)3 (2, 3, Y, b, Z)(2, 3, X, c, Z)(2, 3, X, c, Z)4 (3, 0, Z, c, X)(3, 0, Z, b, Y)(3, 1, Z, b, Y)5 (2, 4, Y, d, Z)(0, 4, Y, d, Z)(1, 4, Y, d, Z)6 Min DFS-Code G
  • 25. gSpan Performance • On synthetic datasets it was 6-10 times faster than FSG. • On Chemical compounds datasets it was 15-100 times faster! • But this was comparing to OLD versions of FSG!
  • 26. Different Approaches for GM • Apriori Approach – FSG – Path Based • DFS Approach – gSpan • Greedy Approach – Subdue D. J. Cook and L. B. Holder Graph-Based Data Mining Tech. report, Department of CS Engineering, 1998
  • 27. Graph Pattern Explosion Problem • If a graph is frequent, all of its subgraphs are frequent ─ the Apriori property • An n-edge frequent graph may have 2n subgraphs. • Among 422 chemical compounds which are confirmed to be active in an AIDS antiviral screen dataset, there are 1,000,000 frequent graph patterns if the minimum support is 5%.
  • 28. Subdue algorithm • A greedy algorithm for finding some of the most prevalent subgraphs. • This method is not complete, i.e. it may not obtain all frequent subgraphs, although it pays in fast execution.
  • 29. Subdue algorithm (Cont.) • It discovers substructures that compress the original data and represent structural concepts in the data. • Based on Beam Search - like BFS it progresses level by level. Unlike BFS, however, beam search moves downward only through the best W nodes at each level. The other nodes are ignored.
  • 30. Step 1: Create substructure for each unique vertex label circle rectangle left triangle square on on triangle square on on triangle square on on triangle square on on left left left left Substructures: triangle (4) square (4) circle (1) rectangle (1) Subdue algorithm: step 1 DB:
  • 31. Subdue Algorithm: step 2 Step 2: Expand best substructure by an edge or edge and neighboring vertex circle rectangle left triangle square on triangle square on on triangle square on on triangle square on on left left left left triangle square on on circle triangle square circleleft square rectangle square on rectangle triangle on Substructures:DB:
  • 32. Step 3: Keep only best substructures on queue (specified by beam width). Step 4: Terminate when queue is empty or when the number of discovered substructures is greater than or equal to the limit specified. Step 5: Compress graph and repeat to generate hierarchical description. Subdue Algorithm: steps 3-5
  • 33. Single Graph Setting vs Transaction Setting Most existing algorithms use a transaction setting approach. That is, if a pattern appears in a transaction even multiple times it is counted as 1 (FSG, gSPAN ). What if the entire database is a single graph? This is called single graph setting.
  • 34. Single graph setting - Motivation  Often the input is a single large graph.  Examples:  The web or portions of it.  A social network (e.g. a network of users communicating by email at BGU).  A large XML database such as DBLP or Movies database.  Mining large graph databases is very useful.