Lecture 11:
Graph Data Mining

Slides are modified from Jiawei Han & Micheline Kamber
Graph Data Mining
• DNA sequence
• RNA
• Compounds
• Texts

Outline
• Graph Pattern Mining
  - Mining Frequent Subgraph Patterns
  - Graph Indexing
  - Graph Similarity Search
• Graph Classification
  - Graph pattern-based approach
  - Machine Learning approaches
• Graph Clustering
  - Link-density-based approach

Graph Pattern Mining
• Frequent subgraphs
  - A (sub)graph is frequent if its support (occurrence frequency) in a given dataset is no less than a minimum support threshold
  - The support of a graph g is the percentage of graphs in the dataset G that contain g as a subgraph
• Applications of graph pattern mining
  - Mining biochemical structures
  - Program control flow analysis
  - Mining XML structures or Web communities
  - Building blocks for graph classification, clustering, compression, comparison, and correlation analysis

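The support definition can be made concrete with a small brute-force sketch. It assumes undirected graphs with vertex labels only (no edge labels). Each graph is stored as a pair `(labels, adj)`, where `labels` maps a vertex to its label and `adj` maps a vertex to its set of neighbors. The names `is_subgraph` and `support` are hypothetical. The containment test is a naive backtracking search for a label-preserving (non-induced) subgraph isomorphism, so it is exponential in the worst case.

```python
def is_subgraph(pattern, graph):
    """True if `pattern` maps injectively into `graph`, preserving labels and
    edges (non-induced subgraph isomorphism), found by backtracking."""
    p_labels, p_adj = pattern
    g_labels, g_adj = graph
    order = list(p_labels)  # fixed order in which pattern vertices are mapped

    def extend(mapping, used):
        if len(mapping) == len(order):
            return True  # every pattern vertex has been mapped
        u = order[len(mapping)]
        for v in g_labels:
            if v in used or g_labels[v] != p_labels[u]:
                continue
            # each already-mapped neighbor of u must map to a neighbor of v
            if all(mapping[x] in g_adj[v] for x in p_adj[u] if x in mapping):
                mapping[u] = v
                if extend(mapping, used | {v}):
                    return True
                del mapping[u]  # backtrack
        return False

    return extend({}, frozenset())


def support(pattern, dataset):
    """Fraction of graphs in `dataset` that contain `pattern` as a subgraph."""
    return sum(is_subgraph(pattern, g) for g in dataset) / len(dataset)


def triangle():
    return ({0: "C", 1: "C", 2: "O"}, {0: {1, 2}, 1: {0, 2}, 2: {0, 1}})

def path_cco():
    return ({0: "C", 1: "C", 2: "O"}, {0: {1}, 1: {0, 2}, 2: {1}})

def edge_co():
    return ({0: "C", 1: "O"}, {0: {1}, 1: {0}})

dataset = [triangle(), path_cco(), edge_co()]
print(support(edge_co(), dataset))   # 1.0 (all 3 graphs)
print(support(path_cco(), dataset))  # 0.666... (2 of 3 graphs)
print(support(triangle(), dataset))  # 0.333... (1 of 3 graphs)
```

With an absolute threshold such as "min support is 2", compare the count `sum(...)` instead of the fraction.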
Example: Frequent Subgraphs
[Figure: a graph dataset of three graphs (A), (B), (C), and the frequent patterns (1) and (2) found with minimum support 2]

Example
[Figure: a second graph dataset and its frequent patterns, minimum support 2]

Graph Mining Algorithms
• Incomplete beam search: greedy (Subdue)
• Inductive logic programming (WARMR)
• Graph theory-based approaches
  - Apriori-based approach
  - Pattern-growth approach

Properties of Graph Mining Algorithms
• Search order: breadth-first vs. depth-first
• Generation of candidate subgraphs: Apriori vs. pattern growth
• Elimination of duplicate subgraphs: passive vs. active
• Support calculation: whether or not embeddings are stored
• Discovery order of patterns: path → tree → graph

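Apriori-Based Approach
[Figure: frequent k-edge subgraphs G1, G2, …, Gn are joined pairwise into (k+1)-edge candidates G, G′, G″, …]
• Join: combine two frequent k-edge subgraphs into a (k+1)-edge candidate
• Prune: check the frequency of each candidate, which requires a subgraph isomorphism test (NP-complete)
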
Apriori-Based, Breadth-First Search
• Methodology: breadth-first search, joining two graphs
• AGM (Inokuchi et al.)
  - generates new graphs with one more node
• FSG (Kuramochi and Karypis)
  - generates new graphs with one more edge

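The level-wise (breadth-first) idea can be sketched on the simplest pattern class: label paths, the first step of the path → tree → graph order. AGM and FSG work on general graphs, so this is a simplification. Assumptions:

- Each graph is a pair `(labels, adj)` with vertex labels and neighbor sets.
- A pattern is a tuple of vertex labels in a canonical orientation.
- Support is an absolute count.
- All function names are hypothetical.

Two frequent k-edge paths are joined when they overlap on k-1 edges. Each candidate's support is then counted by scanning every graph.

```python
from itertools import product


def canon(seq):
    """Canonical form: a label path and its reverse are the same undirected pattern."""
    return min(seq, seq[::-1])


def contains_path(graph, seq):
    """True if `graph` has a simple path whose vertex labels spell `seq`."""
    labels, adj = graph

    def dfs(v, i, visited):
        if i == len(seq) - 1:
            return True
        return any(dfs(w, i + 1, visited | {w})
                   for w in adj[v] if w not in visited and labels[w] == seq[i + 1])

    return any(dfs(v, 0, {v}) for v in adj if labels[v] == seq[0])


def count_support(graphs, seq):
    return sum(contains_path(g, seq) for g in graphs)


def apriori_paths(graphs, min_sup, max_edges):
    """Level-wise (breadth-first) mining of frequent label paths."""
    result, level = {}, set()
    # level 1: every distinct labeled edge that reaches min_sup
    for p in {canon((lab[u], lab[v])) for lab, adj in graphs for u in adj for v in adj[u]}:
        s = count_support(graphs, p)
        if s >= min_sup:
            level.add(p)
            result[p] = s
    k = 1
    while level and k < max_edges:
        # join: two frequent k-edge paths overlapping on k-1 edges give a (k+1)-edge candidate
        candidates = set()
        for p, q in product(level, repeat=2):
            for po in (p, p[::-1]):
                for qo in (q, q[::-1]):
                    if po[1:] == qo[:-1]:
                        candidates.add(canon(po + qo[-1:]))
        # frequency check: scan the database once per candidate
        level = set()
        for c in candidates:
            s = count_support(graphs, c)
            if s >= min_sup:
                level.add(c)
                result[c] = s
        k += 1
    return result


graphs = [
    ({0: "C", 1: "C", 2: "O"}, {0: {1}, 1: {0, 2}, 2: {1}}),                     # C-C-O
    ({0: "C", 1: "O", 2: "N"}, {0: {1}, 1: {0, 2}, 2: {1}}),                     # C-O-N
    ({0: "C", 1: "C", 2: "O", 3: "N"}, {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}),  # C-C-O-N
]
print(apriori_paths(graphs, min_sup=2, max_edges=3))
```

On this toy data, the sketch reports C-C, N-O, C-C-O and C-O-N with support 2, and C-O with support 3. The 3-edge path C-C-O-N occurs only once and is dropped.

For paths, the join already guarantees that both k-edge sub-paths of a candidate are frequent. For general graphs, a candidate's other k-edge subgraphs must also be checked before counting.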
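Pattern Growth Method
[Figure: a k-edge graph G is extended one edge at a time into (k+1)-edge graphs G1, G2, …, Gn and then into (k+2)-edge graphs; different extension orders can produce the same (duplicate) graph]
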
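Pattern growth can be sketched on the same simplified pattern class, label paths, rather than general graphs. Assumptions:

- Each graph is a pair `(labels, adj)`.
- Each pattern keeps its embeddings: the "store embeddings" option for support calculation.
- Every path is grown one edge at a time at its last vertex.
- Names are hypothetical.

A path and its reverse are reached along different growth orders, which is the duplicate problem in the figure. This sketch uses passive elimination: duplicates are generated and only the first copy is recorded. Skipping a reversed orientation outright can lose patterns when paths only grow at one end. gSpan's minimum DFS code is what makes active pruning safe for general graphs.

```python
from collections import defaultdict


def canon(seq):
    """Canonical form: a label path and its reverse are the same undirected pattern."""
    return min(seq, seq[::-1])


def pattern_growth_paths(graphs, min_sup, max_edges):
    """Depth-first pattern growth over label paths, keeping embeddings per pattern."""
    found = {}  # canonical label path -> support count

    def grow(seq, embeddings):
        support = len({gid for gid, _ in embeddings})
        if support < min_sup:
            return  # no extension of an infrequent path can be frequent
        key = canon(seq)
        if key not in found:  # passive elimination: record only the first copy
            found[key] = support
        if len(seq) - 1 == max_edges:
            return
        # extend every stored embedding by one edge at its last vertex
        extensions = defaultdict(list)
        for gid, nodes in embeddings:
            labels, adj = graphs[gid]
            for w in adj[nodes[-1]]:
                if w not in nodes:  # simple paths only
                    extensions[seq + (labels[w],)].append((gid, nodes + (w,)))
        for new_seq, new_embeddings in extensions.items():
            grow(new_seq, new_embeddings)

    # seeds: every directed edge is an embedding of a 1-edge path
    seeds = defaultdict(list)
    for gid, (labels, adj) in enumerate(graphs):
        for u in adj:
            for v in adj[u]:
                seeds[(labels[u], labels[v])].append((gid, (u, v)))
    for seq, embeddings in seeds.items():
        grow(seq, embeddings)
    return found


graphs = [
    ({0: "C", 1: "C", 2: "O"}, {0: {1}, 1: {0, 2}, 2: {1}}),                     # C-C-O
    ({0: "C", 1: "O", 2: "N"}, {0: {1}, 1: {0, 2}, 2: {1}}),                     # C-O-N
    ({0: "C", 1: "C", 2: "O", 3: "N"}, {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}),  # C-C-O-N
]
print(pattern_growth_paths(graphs, min_sup=2, max_edges=3))
```

Support comes straight from the stored embeddings, so no database rescan is needed. The cost is memory for the embedding lists.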
Graph Pattern Explosion Problem
• If a graph is frequent, all of its subgraphs are frequent
  - the Apriori property
• An n-edge frequent graph may have 2^n subgraphs
• Among 422 chemical compounds that are confirmed to be active in an AIDS antiviral screen dataset, there are 1,000,000 frequent graph patterns if the minimum support is 5%

Closed Frequent Graphs
• A frequent graph G is closed if there exists no supergraph of G that carries the same support as G
• If some of G's subgraphs have the same support as G, it is unnecessary to output these subgraphs
  - they are nonclosed graphs
• Lossless compression
  - Still ensures that the mining result is complete

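The closedness test can be applied as a filter over already-mined patterns. Assumptions:

- Input is a dict from pattern to support, plus a caller-supplied containment predicate `is_subgraph(g, h)` (hypothetical).
- The dict holds all frequent patterns. Any equal-support supergraph is then also frequent, so it is present.
- In the toy example, patterns are frozensets of labeled edges and set inclusion stands in for subgraph containment. This is only a simplification of real graph containment.

```python
def closed_patterns(supports, is_subgraph):
    """Keep each pattern that has no other containing pattern with equal support."""
    return {
        g: s
        for g, s in supports.items()
        if not any(h != g and t == s and is_subgraph(g, h)
                   for h, t in supports.items())
    }


# Toy patterns as frozensets of labeled edges; subset stands in for "is a subgraph of".
e1, e2, e3 = ("C", "C"), ("C", "O"), ("O", "N")
supports = {
    frozenset({e1}): 3,
    frozenset({e1, e2}): 3,
    frozenset({e1, e2, e3}): 2,
}
closed = closed_patterns(supports, lambda a, b: a <= b)
print(closed)  # {e1} is dropped: its supergraph {e1, e2} has the same support, 3
```

The compression is lossless because the support of any dropped pattern equals the largest support among the closed patterns that contain it.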
Graph Search
• Querying graph databases: given a graph database and a query graph, find all the graphs that contain the query graph
[Figure: a query graph and a graph database]

Scalability Issue
• Naïve solution
  - Sequential scan (disk I/O)
  - Subgraph isomorphism test (NP-complete)
• Problem: scalability is a big issue
• An indexing mechanism is needed

Indexing Strategy
[Figure: a query graph Q, a substructure of Q, and a database graph G]
• If graph G contains query graph Q, then G must also contain every substructure of Q
• Remarks: index substructures of a query graph to prune graphs that do not contain these substructures

Indexing Framework
• Two steps in processing graph queries
• Step 1. Index construction
  - Enumerate structures in the graph database, and build an inverted index between structures and graphs
• Step 2. Query processing
  - Enumerate structures in the query graph
  - Calculate the candidate graphs containing these structures
  - Prune the false positive answers by performing subgraph isomorphism tests

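A minimal sketch of the two-step framework. Assumptions:

- Graphs are `(labels, adj)` pairs.
- The indexed structures are single labeled edges. The slides index frequent substructures, so this is a deliberate simplification.
- Function names are hypothetical.
- `brute_force_contains` is an exponential stand-in for a real subgraph isomorphism test.

Graph 3 in the toy database shows why verification is needed. It has both a C-O edge and an O-N edge, but on different O vertices, so it passes the filter without containing the query.

```python
from collections import defaultdict
from itertools import permutations


def edge_features(graph):
    """Structures of a graph: its distinct labeled edges as sorted label pairs."""
    labels, adj = graph
    return {tuple(sorted((labels[u], labels[v]))) for u in adj for v in adj[u]}


def build_index(db):
    """Step 1: inverted index from each structure to the ids of graphs containing it."""
    index = defaultdict(set)
    for gid, graph in enumerate(db):
        for f in edge_features(graph):
            index[f].add(gid)
    return index


def candidates(db, index, query):
    """Step 2 (filter): ids of graphs that contain every structure of the query."""
    cand = set(range(len(db)))
    for f in edge_features(query):
        cand &= index.get(f, set())
    return cand


def brute_force_contains(query, graph):
    """Exact but exponential containment check, fine only for tiny examples."""
    q_labels, q_adj = query
    g_labels, g_adj = graph
    qv = list(q_labels)
    for image in permutations(g_labels, len(qv)):
        m = dict(zip(qv, image))
        if all(q_labels[u] == g_labels[m[u]] for u in qv) and \
           all(m[w] in g_adj[m[u]] for u in qv for w in q_adj[u]):
            return True
    return False


def answer(db, index, query, verify):
    """Step 2 (verify): drop false positives with an exact containment test."""
    return {gid for gid in candidates(db, index, query) if verify(query, db[gid])}


db = [
    ({0: "C", 1: "O"}, {0: {1}, 1: {0}}),                              # C-O
    ({0: "C", 1: "C"}, {0: {1}, 1: {0}}),                              # C-C
    ({0: "C", 1: "O", 2: "N"}, {0: {1}, 1: {0, 2}, 2: {1}}),           # C-O-N
    ({0: "C", 1: "O", 2: "O", 3: "N"}, {0: {1}, 1: {0}, 2: {3}, 3: {2}}),  # C-O and O-N, disconnected
]
index = build_index(db)
query = ({0: "C", 1: "O", 2: "N"}, {0: {1}, 1: {0, 2}, 2: {1}})
print(candidates(db, index, query))                       # {2, 3}
print(answer(db, index, query, brute_force_contains))    # {2}
```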
Why Frequent Structures?
• We cannot index (or even search) all substructures
• Large structures will likely be indexed well by their substructures
• Size-increasing support threshold
[Figure: minimum support threshold plotted against structure size; the threshold increases with size]

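A size-increasing support threshold can be written as a non-decreasing function ψ(size), checked before a fragment is added to the index. The slide does not fix the shape of ψ. The linear ramp below and the names `size_increasing_threshold`, `should_index`, `max_size` and `theta` are assumptions, chosen only to show how small fragments are always indexed while large ones must be frequent.

```python
def size_increasing_threshold(size, max_size=10, theta=100):
    """Hypothetical psi(size): minimum support count required to index a fragment.

    Size-1 fragments are always indexed (threshold 1); the threshold then rises
    linearly and stays at `theta` for fragments of `max_size` edges or more.
    """
    if size <= 1:
        return 1.0
    size = min(size, max_size)
    return 1.0 + (theta - 1.0) * (size - 1) / (max_size - 1)


def should_index(fragment_size, support_count, max_size=10, theta=100):
    return support_count >= size_increasing_threshold(fragment_size, max_size, theta)


for size in (1, 4, 10, 15):
    print(size, size_increasing_threshold(size))  # 1 1.0, 4 34.0, 10 100.0, 15 100.0
```

Because large fragments must clear a higher bar, the index stays small. A large query structure is still filtered through its smaller, more often indexed substructures.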
Structure Similarity Search
• Chemical compounds: (a) caffeine, (b) diurobromine, (c) sildenafil
• Query graph
[Figure: structures of the three compounds and the query graph]

Substructure Similarity Measure
• Feature-based similarity measure
  - Each graph is represented as a feature vector X = {x1, x2, …, xn}
  - Similarity is defined by the distance between the corresponding vectors
  - Advantages: easy to index, fast
  - Drawback: only a rough measure

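A minimal sketch of a feature-based measure. Assumptions:

- Graphs are `(labels, adj)` pairs.
- Features are labeled edges counted as a sparse vector.
- Distance is L1.
- The feature choice, the distance, and the function names are all hypothetical stand-ins for whatever features an index would use.

```python
from collections import Counter


def feature_vector(graph):
    """Count each labeled edge (sorted label pair) once per undirected edge."""
    labels, adj = graph
    seen = Counter(tuple(sorted((labels[u], labels[v]))) for u in adj for v in adj[u])
    # every undirected edge was visited from both endpoints, so halve the counts
    return Counter({f: n // 2 for f, n in seen.items()})


def l1_distance(x, y):
    """Distance between two sparse feature vectors (missing features count as 0)."""
    return sum(abs(x[f] - y[f]) for f in set(x) | set(y))


triangle = ({0: "C", 1: "C", 2: "O"}, {0: {1, 2}, 1: {0, 2}, 2: {0, 1}})
path_cco = ({0: "C", 1: "C", 2: "O"}, {0: {1}, 1: {0, 2}, 2: {1}})
print(l1_distance(feature_vector(triangle), feature_vector(path_cco)))  # 1
```

The comparison never runs a subgraph isomorphism test, which is why it is fast and easy to index. However, two structurally different graphs can share the same edge-label counts, which is why it is only a rough measure.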
Some “Straightforward” Methods
• Method 1: directly compute the similarity between the graphs in the DB and the query graph
  - Sequential scan
  - Subgraph similarity computation
• Method 2: form a set of subgraph queries from the original query graph and use exact subgraph search
  - Costly: if we allow 3 edges to be missed in a 20-edge query graph, deleting exactly 3 edges already generates C(20, 3) = 1,140 subgraph queries

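The 1,140 figure is the binomial coefficient C(20, 3): the number of ways to choose which 3 of the 20 query edges to delete. In the sketch below, edges are named by index as a hypothetical stand-in for real query edges. It enumerates the relaxed queries explicitly.

```python
from itertools import combinations
from math import comb

query_edges = list(range(20))  # hypothetical 20-edge query; edges named by index

# every relaxed query obtained by deleting exactly 3 edges
relaxed = [
    [e for e in query_edges if e not in dropped]
    for dropped in combinations(query_edges, 3)
]
print(len(relaxed), comb(20, 3))  # 1140 1140

# allowing 1, 2, or 3 missing edges gives even more queries
print(sum(comb(20, k) for k in range(1, 4)))  # 1350
```

Each relaxed query would need its own exact subgraph search. Some of them may even be disconnected, so the cost grows quickly with the number of tolerated missing edges.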
Index: Precise vs. Approximate Search
• Precise search
  - Use frequent patterns as indexing features
  - Select features in the database space based on their selectivity
  - Build the index
• Approximate search
  - Hard to build indices covering similar subgraphs, because of the explosive number of subgraphs in databases
  - Idea: (1) keep the index structure, (2) select features in the query space

Substructure-Based Graph Classification
• Basic idea
  - Extract graph substructures F = {g1, ..., gn}
  - Represent a graph with a feature vector x = {x1, ..., xn}, where xi is the frequency of gi in that graph
  - Build a classification model
• Different features and representative work
  - Fingerprint
  - MACCS keys
  - Tree and cyclic patterns [Horvath et al.]
  - Minimal contrast subgraph [Ting and Bailey]
  - Frequent subgraphs [Deshpande et al.; Liu et al.]
  - Graph fragments [Wale and Karypis]

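A minimal sketch of the basic idea. Assumptions:

- Graphs are `(labels, adj)` pairs.
- The feature set F is a hypothetical list of labeled edges, rather than mined substructures.
- xi counts occurrences of gi.
- The classification model is a deliberately simple nearest-centroid classifier, swapped in only to close the loop.
- Data, labels and function names are illustrative.

```python
from collections import Counter


def edge_counts(graph):
    """Occurrences of each labeled edge (sorted label pair) in an undirected graph."""
    labels, adj = graph
    seen = Counter(tuple(sorted((labels[u], labels[v]))) for u in adj for v in adj[u])
    return {f: n // 2 for f, n in seen.items()}  # each edge is visited from both ends


def to_vector(graph, features):
    """x = (x1, ..., xn), where xi is the frequency of feature gi in the graph."""
    counts = edge_counts(graph)
    return [counts.get(f, 0) for f in features]


def train_centroids(vectors, labels):
    """A deliberately simple model: the mean feature vector of each class."""
    sums, sizes = {}, Counter(labels)
    for x, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, xi in enumerate(x):
            acc[i] += xi
    return {y: [s / sizes[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((c - xi) ** 2 for c, xi in zip(centroids[y], x)))


F = [("C", "C"), ("C", "O"), ("N", "O")]  # hypothetical feature set
train = [
    (({0: "C", 1: "O", 2: "N"}, {0: {1}, 1: {0, 2}, 2: {1}}), "active"),
    (({0: "O", 1: "N"}, {0: {1}, 1: {0}}), "active"),
    (({0: "C", 1: "C"}, {0: {1}, 1: {0}}), "inactive"),
    (({0: "C", 1: "C", 2: "O"}, {0: {1}, 1: {0, 2}, 2: {1}}), "inactive"),
]
model = train_centroids([to_vector(g, F) for g, _ in train], [y for _, y in train])
query = ({0: "N", 1: "O", 2: "N"}, {0: {1}, 1: {0, 2}, 2: {1}})
print(predict(model, to_vector(query, F)))  # active
```

Once graphs are vectors, any standard classifier (SVM, decision tree, and so on) can replace the centroid model. Only `to_vector` depends on the graph features.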
Direct Mining of Discriminative Patterns
• Avoid mining the whole set of patterns
  - Harmony [Wang and Karypis]
  - DDPMine [Cheng et al.]
  - LEAP [Yan et al.]
  - MbT [Fan et al.]
• Find the most discriminative pattern
  - A search problem?
  - An optimization problem?
• Extensions
  - Mining top-k discriminative patterns
  - Mining approximate/weighted discriminative patterns

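"Most discriminative" needs a score. The slide leaves the objective open; information gain of the binary feature "graph contains the pattern" is one common choice, used here as an assumption. The sketch scores patterns whose containment per graph is already known. The cited methods aim to find high-scoring patterns directly instead of scoring a fully enumerated set. Names and data are hypothetical.

```python
from math import log2


def entropy(pos, neg):
    """Binary entropy (in bits) of a class distribution with pos/neg counts."""
    total = pos + neg
    h = 0.0
    for n in (pos, neg):
        if n:
            p = n / total
            h -= p * log2(p)
    return h


def info_gain(contains, labels):
    """Information gain of the feature "graph contains the pattern".

    contains[i] is True if graph i contains the pattern; labels[i] is 1 or 0.
    """
    n = len(labels)
    gain = entropy(sum(labels), n - sum(labels))
    for flag in (True, False):
        part = [y for c, y in zip(contains, labels) if c == flag]
        if part:
            gain -= len(part) / n * entropy(sum(part), len(part) - sum(part))
    return gain


labels = [1, 1, 0, 0]
patterns = {"t1": [True, True, False, False], "t2": [True, False, True, False]}
print(info_gain(patterns["t1"], labels))  # 1.0: splits the classes perfectly
print(info_gain(patterns["t2"], labels))  # 0.0: says nothing about the class
best = max(patterns, key=lambda t: info_gain(patterns[t], labels))
print(best)  # t1
```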
Graph Kernels
• Motivation
  - Kernel-based learning methods do not need to access the data points explicitly
  - They rely only on the kernel function between pairs of data points
  - They can be applied to any complex structure, provided a kernel function can be defined on it
• Basic idea
  - Map each graph to some significant set of patterns
  - Define a kernel on the corresponding sets of patterns

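Kernel-based Classification
• Random walk
  - Basic idea: count the matching random walks between the two graphs
• Marginalized kernels
  - Gärtner ’02, Kashima et al. ’02, Mahé et al. ’04
  - $K(G_1, G_2) = \sum_{h_1} \sum_{h_2} p(h_1)\, p(h_2)\, K_L(l(h_1), l(h_2))$
  - $h_1$ and $h_2$ are paths in graphs $G_1$ and $G_2$
  - $p(h_1)$ and $p(h_2)$ are probability distributions on paths
  - $K_L(l, l')$ is a kernel between paths, e.g., $K_L(l, l') = 1$ if $l = l'$ and $0$ otherwise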
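"Count the matching random walks" can be sketched with a direct product graph:

- A pair of vertices (u, v) with equal labels is a product vertex.
- (u, v) → (u', v') is a product edge when u–u' and v–v' are both edges.

The number of length-k walks the two graphs share is then the sum of the entries of A^k, where A is the product adjacency matrix. This is a truncated, geometrically down-weighted walk count in the spirit of the random-walk kernel. It is not an implementation of the marginalized kernel. `max_len`, `decay` and the function names are assumptions, and graphs are `(labels, adj)` pairs.

```python
def product_adjacency(g1, g2):
    """Adjacency matrix of the direct product graph of two vertex-labeled graphs."""
    (lab1, adj1), (lab2, adj2) = g1, g2
    # product vertices: pairs of vertices with identical labels
    nodes = [(u, v) for u in lab1 for v in lab2 if lab1[u] == lab2[v]]
    index = {pair: i for i, pair in enumerate(nodes)}
    A = [[0] * len(nodes) for _ in nodes]
    for (u, v), i in index.items():
        # product edge when both graphs have an edge at matching positions
        for u2 in adj1[u]:
            for v2 in adj2[v]:
                j = index.get((u2, v2))
                if j is not None:
                    A[i][j] = 1
    return A


def random_walk_kernel(g1, g2, max_len=3, decay=0.5):
    """Sum over k = 0..max_len of decay**k times the number of matching walks of length k."""
    A = product_adjacency(g1, g2)
    n = len(A)
    walks = [1] * n   # walks[j]: matching walks of the current length ending at j
    total = float(n)  # k = 0: every product vertex is a walk of length 0
    for k in range(1, max_len + 1):
        walks = [sum(walks[i] * A[i][j] for i in range(n)) for j in range(n)]
        total += decay ** k * sum(walks)
    return total


edge_co = ({0: "C", 1: "O"}, {0: {1}, 1: {0}})
path_cco = ({0: "C", 1: "C", 2: "O"}, {0: {1}, 1: {0, 2}, 2: {1}})
print(random_walk_kernel(edge_co, edge_co))   # 3.75
print(random_walk_kernel(edge_co, path_cco))  # 4.75
```

Truncating at `max_len` avoids the convergence condition that the infinite geometric sum would impose on `decay`.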
Boosting in Graph Classification
• Decision stumps
  - Simple classifiers in which the final decision is made by a single feature
• A rule is a tuple ⟨t, y⟩
  - If a molecule contains substructure t, it is classified as y
• Gain
• Applying boosting
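A minimal sketch of stump selection. Assumptions:

- Labels are in {+1, -1}.
- A stump for rule ⟨t, y⟩ predicts y when a graph contains substructure t and -y otherwise.
- Gain is the weighted sum Σ_i d_i·y_i·h(x_i), with boosting weights d_i. This is a common formulation for boosting over substructure stumps; treat it as an assumption.
- Containment of each candidate substructure is precomputed as booleans.
- Names and data are hypothetical.

```python
def stump(contains, y):
    """Decision stump for rule <t, y>: predict y if the graph contains t, else -y."""
    return y if contains else -y


def weighted_gain(contains, labels, weights, y):
    """Assumed gain of rule <t, y>: sum_i d_i * y_i * h(x_i), labels in {+1, -1}."""
    return sum(d * yi * stump(c, y) for c, yi, d in zip(contains, labels, weights))


def best_rule(candidates, labels, weights):
    """Pick the (substructure, class) rule with the largest weighted gain."""
    rules = [(t, y) for t in candidates for y in (+1, -1)]
    return max(rules, key=lambda r: weighted_gain(candidates[r[0]], labels, weights, r[1]))


labels = [+1, +1, -1, -1]
weights = [0.25] * 4  # uniform weights, as in a first boosting round
candidates = {"t1": [True, True, False, False], "t2": [True, False, True, False]}
print(best_rule(candidates, labels, weights))  # ('t1', 1)
```

In a full boosting loop, the weights d_i would be increased on misclassified graphs after each round and the search repeated; that update is not shown here.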
Graph Compression
• Extract common subgraphs and simplify graphs by condensing these subgraphs into nodes

Graph/Network Clustering Problem
• Networks made up of the mutual relationships of data elements usually have an underlying structure
  - Because the relationships are complex, it is difficult to discover these structures
  - How can the structure be made clear?
• Given simple information about who associates with whom, could one identify clusters of individuals with common interests or special relationships?
  - E.g., families, cliques, terrorist cells…

An Example of Networks
• How many clusters?
• What size should they be?
• What is the best partitioning?
• Should some points be segregated?

A Social Network Model
• Individuals in a tight social group, or clique, know many of the same people
  - regardless of the size of the group
• Individuals who are hubs know many people in different groups but belong to no single group
  - E.g., politicians bridge multiple groups
• Individuals who are outliers reside at the margins of society
  - E.g., hermits know few people and belong to no group

The Neighborhood of a Vertex
• Define Γ(v) as the immediate neighborhood of a vertex v
  - i.e., the set of people that an individual knows

Structure Similarity
• The desired features tend to be captured by a measure called structural similarity:

  $$\sigma(v, w) = \frac{|\Gamma(v) \cap \Gamma(w)|}{\sqrt{|\Gamma(v)|\,|\Gamma(w)|}}$$

• Structural similarity is large for members of a clique and small for hubs and outliers

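The structural similarity σ(v, w) can be computed directly from adjacency sets. The sketch assumes an undirected graph stored as a dict from vertex to neighbor set, and the names are hypothetical. By default it uses the closed-neighborhood convention of the SCAN algorithm, where Γ(v) also contains v itself. Passing `closed=False` gives the open neighborhood ("the people an individual knows") as worded on the slide.

```python
from math import sqrt


def neighborhood(adj, v, closed=True):
    """Gamma(v): neighbors of v, plus v itself when `closed` (SCAN-style convention)."""
    return adj[v] | {v} if closed else set(adj[v])


def structural_similarity(adj, v, w, closed=True):
    """sigma(v, w) = |Gamma(v) & Gamma(w)| / sqrt(|Gamma(v)| * |Gamma(w)|)."""
    gv, gw = neighborhood(adj, v, closed), neighborhood(adj, w, closed)
    return len(gv & gw) / sqrt(len(gv) * len(gw))


# a triangle a-b-c (a clique) plus an outlier o attached to c
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "o"}, "o": {"c"}}
print(structural_similarity(adj, "a", "b"))  # 1.0 (clique members)
print(structural_similarity(adj, "c", "o"))  # about 0.707 (outlier)
```

With open neighborhoods, two adjacent clique members never count each other, so σ(a, b) drops to 0.5. This is one reason the closed convention is often preferred.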
Graph Mining
• Frequent Subgraph Mining (FSM)
  - Apriori-based: AGM, FSG, PATH
  - Pattern-growth-based: gSpan, MoFa, GASTON, FFSM, SPIN
• Variant Subgraph Pattern Mining
  - Closed subgraph mining: CloseGraph
  - Coherent subgraph mining: CSA, CLAN
  - Dense subgraph mining: CloseCut, Splat, CODENSE
  - Approximate methods: SUBDUE, GBI
• Applications of Frequent Subgraph Mining
  - Indexing and search: GraphGrep, Daylight, gIndex, Grafil
  - Clustering
  - Classification: kernel methods (graph kernels)

Nicholas Montgomery
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
Israel Genealogy Research Association
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
adhitya5119
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
thanhdowork
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
Dr. Mulla Adam Ali
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
Celine George
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
tarandeep35
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
heathfieldcps1
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
Academy of Science of South Africa
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
Celine George
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
PECB
 
The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
History of Stoke Newington
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Dr. Vinod Kumar Kanvaria
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
mulvey2
 

Recently uploaded (20)

clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
 
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
 
DRUGS AND ITS classification slide share
DRUGS AND ITS classification slide shareDRUGS AND ITS classification slide share
DRUGS AND ITS classification slide share
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
 
The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
 

Lect12 graph mining

  • 1. Lecture 11: Graph Data Mining
      - Slides are modified from Jiawei Han & Micheline Kamber
  • 2. Graph Data Mining
      - DNA sequence
      - RNA
  • 3. Graph Data Mining
      - Compounds
      - Texts
  • 4. Outline
      - Graph Pattern Mining
          - Mining Frequent Subgraph Patterns
          - Graph Indexing
          - Graph Similarity Search
      - Graph Classification
          - Graph pattern-based approach
          - Machine Learning approaches
      - Graph Clustering
          - Link-density-based approach
  • 5. Graph Pattern Mining
      - Frequent subgraphs
          - A (sub)graph is frequent if its support (occurrence frequency) in a given dataset is no less than a minimum support threshold
          - The support of a graph g is the percentage of graphs in the dataset G that contain g as a subgraph (it can equivalently be stated as an absolute count, as in the next example)
      - Applications of graph pattern mining
          - Mining biochemical structures
          - Program control flow analysis
          - Mining XML structures or Web communities
          - Building blocks for graph classification, clustering, compression, comparison, and correlation analysis
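To make the support definition concrete, here is a minimal Python sketch. It assumes undirected graphs with vertex labels only (no edge labels) and treats "g is a subgraph of G" as a label-preserving injective vertex mapping that sends every edge of g to an edge of G. The toy molecules and all function names are hypothetical. The brute-force backtracking test is exponential in the worst case, which is why subgraph isomorphism shows up as the expensive step on later slides.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A graph is (vertex labels, undirected edges); each edge is a frozenset {u, v}.
Graph = Tuple[Dict[int, str], Set[FrozenSet[int]]]


def make_graph(labels: Dict[int, str], edges: List[Tuple[int, int]]) -> Graph:
    return labels, {frozenset(e) for e in edges}


def contains(big: Graph, small: Graph) -> bool:
    """True if `small` embeds into `big`: an injective, label-preserving vertex
    mapping that sends every edge of `small` to an edge of `big` (backtracking)."""
    big_labels, big_edges = big
    small_labels, small_edges = small
    order = list(small_labels)  # pattern vertices, matched one at a time

    def extend(mapping: Dict[int, int]) -> bool:
        if len(mapping) == len(order):
            return True
        u = order[len(mapping)]
        used = set(mapping.values())
        for v, label in big_labels.items():
            if v in used or label != small_labels[u]:
                continue
            # every already-mapped pattern neighbor of u must be adjacent to v
            if all(frozenset((mapping[w], v)) in big_edges
                   for w in mapping if frozenset((u, w)) in small_edges):
                mapping[u] = v
                if extend(mapping):
                    return True
                del mapping[u]  # backtrack
        return False

    return extend({})


def support(pattern: Graph, dataset: List[Graph]) -> float:
    """Fraction of graphs in `dataset` that contain `pattern` as a subgraph."""
    return sum(contains(g, pattern) for g in dataset) / len(dataset)


# Hypothetical toy dataset: two of the three graphs contain a C-O edge.
g1 = make_graph({0: "C", 1: "O", 2: "N"}, [(0, 1), (1, 2)])
g2 = make_graph({0: "C", 1: "O"}, [(0, 1)])
g3 = make_graph({0: "N", 1: "N"}, [(0, 1)])
co_edge = make_graph({0: "C", 1: "O"}, [(0, 1)])
print(support(co_edge, [g1, g2, g3]))  # 0.6666666666666666
```

Multiplying the returned fraction by the number of graphs gives the absolute count (here 2), which is how the minimum support is stated in the example that follows.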
  • 6. Example: Frequent Subgraphs
      - [Figure: a graph dataset of three graphs, (A), (B), and (C), and the frequent patterns (1) and (2) found with a minimum support of 2]
  • 8. Graph Mining Algorithms
      - Incomplete beam search – Greedy (Subdue)
      - Inductive logic programming (WARMR)
      - Graph theory-based approaches
          - Apriori-based approach
          - Pattern-growth approach
  • 9. Properties of Graph Mining Algorithms
      - Search order: breadth-first vs. depth-first
      - Generation of candidate subgraphs: Apriori vs. pattern growth
      - Elimination of duplicate subgraphs: passive vs. active
      - Support calculation: whether or not embeddings are stored
      - Discovery order of patterns: path → tree → graph
  • 11. Apriori-Based, Breadth-First Search
      - Methodology: breadth-first search, joining two graphs
      - AGM (Inokuchi et al.): generates new graphs with one more node
      - FSG (Kuramochi and Karypis): generates new graphs with one more edge
  • 13. Graph Pattern Explosion Problem
      - If a graph is frequent, all of its subgraphs are frequent (the Apriori property)
      - An n-edge frequent graph may have up to 2^n subgraphs
      - Among 422 chemical compounds that are confirmed to be active in an AIDS antiviral screen dataset, there are on the order of 1,000,000 frequent graph patterns if the minimum support is 5%
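The Apriori property is what lets breadth-first miners prune a candidate before counting its support. The sketch below assumes small vertex-labelled undirected graphs and uses a brute-force canonical code (the minimum over all vertex permutations) as a stand-in for the canonical labelling a real miner such as FSG would use; it is only practical for a handful of vertices. It checks that every connected (k-1)-edge subgraph of a k-edge candidate is already frequent. The 2^n bound above simply counts edge subsets: each of the n edges is either kept or dropped.

```python
from itertools import permutations
from typing import FrozenSet, Optional, Set, Tuple

Edge = Tuple[int, int]                          # (a, b) with a < b
Code = Tuple[Tuple[str, ...], Tuple[Edge, ...]]


def canonical_code(labels: Tuple[str, ...], edges: FrozenSet[Edge]) -> Code:
    """Brute-force canonical form: the smallest (labels, sorted edges) pair over
    all vertex relabellings, so isomorphic graphs get identical codes."""
    best: Optional[Code] = None
    n = len(labels)
    for perm in permutations(range(n)):         # perm[old] = new vertex id
        new_labels = [""] * n
        for old, new in enumerate(perm):
            new_labels[new] = labels[old]
        new_edges = tuple(sorted(
            (min(perm[a], perm[b]), max(perm[a], perm[b])) for a, b in edges))
        code = (tuple(new_labels), new_edges)
        if best is None or code < best:
            best = code
    assert best is not None
    return best


def drop_edge(labels, edges, edge):
    """Remove one edge, then drop vertices left isolated and renumber the rest."""
    rest = edges - {edge}
    kept = sorted({v for e in rest for v in e})
    new_id = {old: new for new, old in enumerate(kept)}
    return (tuple(labels[v] for v in kept),
            frozenset((new_id[a], new_id[b]) for a, b in rest))


def is_connected(n: int, edges) -> bool:
    if n == 0:
        return False
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n


def passes_apriori(labels, edges, frequent: Set[Code]) -> bool:
    """Apriori pruning: every connected (k-1)-edge subgraph of a k-edge
    candidate must already be known to be frequent."""
    for edge in edges:
        sub_labels, sub_edges = drop_edge(labels, edges, edge)
        if not is_connected(len(sub_labels), sub_edges):
            continue                            # only connected patterns are mined
        if canonical_code(sub_labels, sub_edges) not in frequent:
            return False
    return True


# Hypothetical level-1 result: start with only the C-O edge frequent.
co = canonical_code(("C", "O"), frozenset({(0, 1)}))
cc = canonical_code(("C", "C"), frozenset({(0, 1)}))
path = (("C", "C", "O"), frozenset({(0, 1), (1, 2)}))   # C-C-O candidate
print(passes_apriori(*path, {co}))       # False: its C-C edge is infrequent
print(passes_apriori(*path, {co, cc}))   # True
```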
  • 14. Closed Frequent Graphs
      - A frequent graph G is closed if there exists no supergraph of G that carries the same support as G
      - If some of G’s subgraphs have the same support as G, it is unnecessary to output these subgraphs (nonclosed graphs)
      - Lossless compression: still ensures that the mining result is complete
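A small illustration of the closedness test. As a hypothetical simplification, every pattern is named by the set of its distinct labelled edges, so "supergraph" reduces to proper set inclusion; real closed-graph miners need subgraph isomorphism tests instead. The nonclosed pattern {C-C} is dropped, yet its support can still be read off its closed supergraph {C-C, C-O}, which is why the compression is lossless.

```python
from typing import Dict, FrozenSet, Set

Pattern = FrozenSet[str]  # hypothetical: a pattern named by its distinct edges


def closed_patterns(supports: Dict[Pattern, int]) -> Set[Pattern]:
    """Keep each frequent pattern that has no proper super-pattern with the
    same support; `g < h` is the proper-subset test on frozensets."""
    return {
        g for g, sup in supports.items()
        if not any(g < h and supports[h] == sup for h in supports)
    }


frequent = {
    frozenset({"C-C"}): 3,
    frozenset({"C-C", "C-O"}): 3,          # same support as {C-C}
    frozenset({"C-C", "C-O", "O-H"}): 2,
}
print(sorted(sorted(p) for p in closed_patterns(frequent)))
# [['C-C', 'C-O'], ['C-C', 'C-O', 'O-H']]
```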
  • 15. Graph Search
      - Querying graph databases: given a graph database and a query graph, find all the graphs containing this query graph
      - [Figure: a query graph and a graph database]
  • 16. Scalability Issue
      - Naïve solution: sequential scan (disk I/O) plus a subgraph isomorphism test (NP-complete) for every graph
      - Problem: scalability is a big issue
      - An indexing mechanism is needed
  • 17. Indexing Strategy
      - [Figure: a query graph (Q), a graph (G), and a substructure of Q]
      - If graph G contains query graph Q, G must contain every substructure of Q
      - Remark: index substructures of a query graph to prune graphs that do not contain these substructures
  • 18. Indexing Framework
      - Two steps in processing graph queries
      - Step 1. Index construction
          - Enumerate structures in the graph database and build an inverted index between structures and graphs
      - Step 2. Query processing
          - Enumerate structures in the query graph
          - Calculate the candidate graphs containing these structures
          - Prune the false positive answers by performing subgraph isomorphism tests
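The two-step framework can be sketched as follows. As a simplifying assumption, the index features here are just the label pairs of single edges (hypothetical; the slides index mined substructures), and verification uses a plain backtracking subgraph isomorphism test. Graph 1 shares the query's only feature but is rejected during verification, i.e., it is a false positive removed in the last step.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Graph = Tuple[Dict[int, str], Set[FrozenSet[int]]]   # (vertex labels, edges)
Feature = Tuple[str, ...]


def features(graph: Graph) -> Set[Feature]:
    """Hypothetical index feature: the label pair of each edge."""
    labels, edges = graph
    return {tuple(sorted(labels[v] for v in e)) for e in edges}


def build_index(db: List[Graph]) -> Dict[Feature, Set[int]]:
    """Step 1: inverted index from each feature to the ids of graphs containing it."""
    index: Dict[Feature, Set[int]] = {}
    for gid, graph in enumerate(db):
        for f in features(graph):
            index.setdefault(f, set()).add(gid)
    return index


def contains(big: Graph, small: Graph) -> bool:
    """Exact, label-preserving subgraph isomorphism test by backtracking."""
    (bl, be), (sl, se) = big, small
    order = list(sl)

    def extend(m: Dict[int, int]) -> bool:
        if len(m) == len(order):
            return True
        u = order[len(m)]
        for v in bl:
            if v in m.values() or bl[v] != sl[u]:
                continue
            if all(frozenset((m[w], v)) in be for w in m if frozenset((u, w)) in se):
                m[u] = v
                if extend(m):
                    return True
                del m[u]
        return False

    return extend({})


def query(db: List[Graph], index: Dict[Feature, Set[int]], q: Graph) -> List[int]:
    """Step 2: intersect the posting lists, then verify each candidate."""
    candidates = set(range(len(db)))
    for f in features(q):
        candidates &= index.get(f, set())
    return sorted(gid for gid in candidates if contains(db[gid], q))


def g(labels: Dict[int, str], edges: List[Tuple[int, int]]) -> Graph:
    return labels, {frozenset(e) for e in edges}


db = [g({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)]),   # C-O-C
      g({0: "C", 1: "O"}, [(0, 1)]),                   # has a C-O edge, only one C
      g({0: "N", 1: "N"}, [(0, 1)])]
idx = build_index(db)
print(query(db, idx, g({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)])))  # [0]
```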
  • 19. Why Frequent Structures?
      - We cannot index (or even search) all substructures
      - Large structures will likely be indexed well by their substructures
      - Size-increasing support threshold
      - [Figure: the minimum support threshold plotted against pattern size, increasing with size]
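The slide does not give the exact shape of the size-increasing threshold, so the schedule below is a hypothetical linear one: a one-edge fragment needs support 1, and the requirement rises to the full threshold `theta` at `max_size` edges. Small fragments are therefore always indexed, while large ones are kept only if they are frequent; any non-decreasing function of size would fit the idea equally well.

```python
def size_increasing_min_support(size: int, max_size: int, theta: int) -> int:
    """Hypothetical schedule psi(size): 1 for one-edge fragments, growing
    linearly to the full threshold `theta` at `max_size` edges."""
    if size <= 1:
        return 1
    if size >= max_size:
        return theta
    return round(1 + (theta - 1) * (size - 1) / (max_size - 1))


def keep_as_index_feature(size: int, support: int, max_size: int, theta: int) -> bool:
    """A fragment is indexed only if it is frequent enough for its size."""
    return support >= size_increasing_min_support(size, max_size, theta)


print([size_increasing_min_support(s, 10, 100) for s in range(1, 11)])
# [1, 12, 23, 34, 45, 56, 67, 78, 89, 100]
```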
  • 20. Structure Similarity Search
      - Chemical compounds: (a) caffeine, (b) diurobromine, (c) sildenafil
      - Query graph
      - [Figure: the three compound structures and a query graph]
  • 21. Substructure Similarity Measure
      - Feature-based similarity measure
          - Each graph is represented as a feature vector X = {x1, x2, …, xn}
          - Similarity is defined by the distance between the corresponding vectors
          - Advantages: easy to index; fast
          - Caveat: it is only a rough measure
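A minimal sketch of the feature-based measure, assuming each graph has already been reduced to a list of named substructure features (the feature names and the choice of Euclidean distance are illustrative). The roughness is visible in the representation itself: two structurally different graphs with the same feature counts get distance 0.

```python
import math
from collections import Counter
from typing import List, Sequence


def feature_vector(features: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """x_i = number of times vocabulary feature i occurs in the graph."""
    counts = Counter(features)
    return [counts[f] for f in vocabulary]


def distance(x: Sequence[int], y: Sequence[int]) -> float:
    """Euclidean distance; a smaller distance means more similar graphs."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def rank_by_similarity(query: Sequence[int], database: List[List[int]]) -> List[int]:
    """Database indices ordered from most to least similar to the query."""
    return sorted(range(len(database)), key=lambda i: distance(query, database[i]))


vocab = ["C-C", "C-O", "C-N"]                      # hypothetical feature set
db = [feature_vector(["C-C", "C-N", "C-N"], vocab),
      feature_vector(["C-C", "C-O"], vocab)]
q = feature_vector(["C-C", "C-O", "C-O"], vocab)
print(rank_by_similarity(q, db))  # [1, 0]
```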
  • 22. Some “Straightforward” Methods
      - Method 1: directly compute the similarity between the graphs in the DB and the query graph
          - Sequential scan
          - Subgraph similarity computation
      - Method 2: form a set of subgraph queries from the original query graph and use exact subgraph search
          - Costly: if we allow 3 edges to be missed in a 20-edge query graph, it may generate 1,140 subgraphs
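The 1,140 figure is the binomial coefficient C(20, 3), the number of ways to choose which 3 of the 20 query edges are missing. A quick check (requires Python 3.8+ for `math.comb`); if the relaxations with 0, 1, or 2 missing edges are generated as well, the count grows to 1,351.

```python
import math

QUERY_EDGES = 20   # size of the query graph on the slide
MISSING = 3        # number of edges allowed to be missed

exactly = math.comb(QUERY_EDGES, MISSING)                             # drop exactly 3 edges
up_to = sum(math.comb(QUERY_EDGES, k) for k in range(MISSING + 1))    # drop 0..3 edges
print(exactly, up_to)  # 1140 1351
```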
  • 23. Index: Precise vs. Approximate Search
      - Precise search
          - Use frequent patterns as indexing features
          - Select features in the database space based on their selectivity
          - Build the index
      - Approximate search
          - Hard to build indices covering similar subgraphs: there is an explosive number of subgraphs in databases
          - Idea: (1) keep the index structure; (2) select features in the query space
  • 24. Outline
      - Graph Pattern Mining
          - Mining Frequent Subgraph Patterns
          - Graph Indexing
          - Graph Similarity Search
      - Graph Classification
          - Graph pattern-based approach
          - Machine Learning approaches
      - Graph Clustering
          - Link-density-based approach
  • 25. Substructure-Based Graph Classification
      - Basic idea
          - Extract graph substructures $F = \{g_1, \dots, g_n\}$
          - Represent a graph with a feature vector $\mathbf{x} = \{x_1, \dots, x_n\}$, where $x_i$ is the frequency of $g_i$ in that graph
          - Build a classification model
      - Different features and representative work
          - Fingerprint
          - MACCS keys
          - Tree and cyclic patterns [Horvath et al.]
          - Minimal contrast subgraph [Ting and Bailey]
          - Frequent subgraphs [Deshpande et al.; Liu et al.]
          - Graph fragments [Wale and Karypis]
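A toy version of the substructure-based pipeline. It assumes the substructures g_i are single labelled edges (a hypothetical simplification; the slide's examples use fingerprints, frequent subgraphs, and so on) and uses a nearest-centroid classifier purely as one example of "build a classification model"; any vector classifier could take its place.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

LabelPair = Tuple[str, str]  # an edge written as its two endpoint labels


def featurize(edges: Sequence[LabelPair], features: Sequence[LabelPair]) -> List[int]:
    """x_i = how often feature g_i (here: one labelled edge type) occurs."""
    counts = Counter(tuple(sorted(e)) for e in edges)
    return [counts[tuple(sorted(f))] for f in features]


def train_centroids(X: List[List[int]], y: List[str]) -> Dict[str, List[float]]:
    """Nearest-centroid model: the mean feature vector of each class."""
    sums: Dict[str, List[float]] = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
    n = Counter(y)
    return {label: [v / n[label] for v in acc] for label, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], x: List[int]) -> str:
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))


F = [("C", "C"), ("C", "O"), ("C", "N")]                 # hypothetical substructures
train = [([("C", "O"), ("C", "C")], "active"), ([("O", "C")], "active"),
         ([("C", "N")], "inactive"), ([("N", "C"), ("C", "C")], "inactive")]
model = train_centroids([featurize(e, F) for e, _ in train], [lbl for _, lbl in train])
print(predict(model, featurize([("C", "O")], F)))  # active
```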
  • 26. Direct Mining of Discriminative Patterns
      - Avoid mining the whole set of patterns
          - Harmony [Wang and Karypis]
          - DDPMine [Cheng et al.]
          - LEAP [Yan et al.]
          - MbT [Fan et al.]
      - Find the most discriminative pattern
          - A search problem?
          - An optimization problem?
      - Extensions
          - Mining top-k discriminative patterns
          - Mining approximate/weighted discriminative patterns
  • 27. Graph Kernels
      - Motivation
          - Kernel-based learning methods do not need to access the data points directly
          - They rely only on the kernel function between pairs of data points
          - They can be applied to any complex structure, provided a kernel function can be defined on it
      - Basic idea
          - Map each graph to some significant set of patterns
          - Define a kernel on the corresponding sets of patterns
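  • 28. Kernel-based Classification
      - Random walk
          - Basic idea: count the matching random walks between the two graphs
      - Marginalized kernels
          - Gärtner ’02, Kashima et al. ’02, Mahé et al. ’04
          - $K(G_1, G_2) = \sum_{h_1} \sum_{h_2} p_1(h_1)\, p_2(h_2)\, K_L\big(l(h_1), l(h_2)\big)$
          - $h_1$ and $h_2$ are paths in graphs $G_1$ and $G_2$, and $l(h)$ is the label sequence of path $h$
          - $p_1$ and $p_2$ are probability distributions on paths
          - $K_L$ is a kernel between paths, e.g., $K_L(l, l') = 1$ if the label sequences $l$ and $l'$ are identical and 0 otherwise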
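The random-walk idea can be computed on the direct product of the two graphs, where each walk corresponds to a pair of label-matching walks in the original graphs. The sketch below is a truncated geometric random-walk kernel in the spirit of Gärtner et al.: it assumes vertex labels only, counts walks of up to `max_len` steps, and down-weights a walk of length k by `decay**k` (`max_len` and `decay` are illustrative parameters). It is not the marginalized kernel above, which weights paths by probabilities instead.

```python
from typing import Dict, List, Tuple

# A graph is (vertex labels, undirected edge list).
LabeledGraph = Tuple[Dict[int, str], List[Tuple[int, int]]]


def _adjacency(graph: LabeledGraph) -> Dict[int, set]:
    labels, edges = graph
    adj = {v: set() for v in labels}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def product_graph(g1: LabeledGraph, g2: LabeledGraph) -> List[List[int]]:
    """Direct product graph: nodes are same-label vertex pairs (u, v); (u, v) and
    (u2, v2) are adjacent iff u-u2 is an edge of g1 and v-v2 an edge of g2."""
    labels1, labels2 = g1[0], g2[0]
    nodes = [(u, v) for u in labels1 for v in labels2 if labels1[u] == labels2[v]]
    index = {node: i for i, node in enumerate(nodes)}
    adj1, adj2 = _adjacency(g1), _adjacency(g2)
    neighbors: List[List[int]] = [[] for _ in nodes]
    for (u, v), i in index.items():
        for u2 in adj1[u]:
            for v2 in adj2[v]:
                j = index.get((u2, v2))
                if j is not None:
                    neighbors[i].append(j)
    return neighbors


def random_walk_kernel(g1: LabeledGraph, g2: LabeledGraph,
                       max_len: int = 3, decay: float = 0.5) -> float:
    """sum_{k=0..max_len} decay**k * (number of length-k walks in the product
    graph), i.e. 1^T A^k 1 summed with geometric weights."""
    neighbors = product_graph(g1, g2)
    walks = [1.0] * len(neighbors)     # walks of length 0 starting at each node
    total = float(len(neighbors))      # k = 0 term
    for k in range(1, max_len + 1):
        walks = [sum(walks[j] for j in neighbors[i]) for i in range(len(neighbors))]
        total += decay ** k * sum(walks)
    return total


co = ({0: "C", 1: "O"}, [(0, 1)])                       # C-O
coc = ({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)])      # C-O-C
print(random_walk_kernel(co, co))    # 3.75
print(random_walk_kernel(co, coc))   # 7.5
```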
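  • 29. Boosting in Graph Classification
      - Decision stumps
          - Simple classifiers in which the final decision is made by a single feature
      - A rule is a tuple $\langle t, y \rangle$: if a molecule contains substructure $t$, it is classified as $y$; otherwise as $-y$, i.e., $h_{\langle t, y \rangle}(\mathbf{x}) = y$ if $t \subseteq \mathbf{x}$ and $-y$ otherwise
      - Gain: $gain(\langle t, y \rangle) = \sum_{i=1}^{n} y_i\, h_{\langle t, y \rangle}(\mathbf{x}_i)$
      - Applying boosting: each training graph carries a weight $d_i$, and the gain becomes $gain(\langle t, y \rangle) = \sum_{i=1}^{n} d_i\, y_i\, h_{\langle t, y \rangle}(\mathbf{x}_i)$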
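A sketch of the stump and its weighted gain, assuming each molecule has already been reduced to the set of substructure ids it contains (the ids and function names are hypothetical). The weights d_i are uniform here, as in a first boosting round; a full boosting loop (not shown) would re-weight the training graphs after each round and search for a new rule.

```python
from typing import AbstractSet, Sequence, Tuple

Rule = Tuple[str, int]  # <t, y>: substructure id t and class label y in {+1, -1}


def stump(rule: Rule, x: AbstractSet[str]) -> int:
    """h_<t,y>(x): predict y if the molecule contains substructure t, else -y."""
    t, y = rule
    return y if t in x else -y


def gain(rule: Rule, X: Sequence[AbstractSet[str]],
         labels: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted gain: sum_i d_i * y_i * h_<t,y>(x_i)."""
    return sum(d * y_i * stump(rule, x) for x, y_i, d in zip(X, labels, weights))


def best_rule(candidates: Sequence[str], X: Sequence[AbstractSet[str]],
              labels: Sequence[int], weights: Sequence[float]) -> Rule:
    """Pick the rule <t, y> with the largest gain over all t and both y."""
    rules = [(t, y) for t in candidates for y in (+1, -1)]
    return max(rules, key=lambda r: gain(r, X, labels, weights))


X = [{"ring", "OH"}, {"ring"}, {"NH2"}, {"NH2", "ring"}]  # hypothetical molecules
y = [+1, +1, -1, -1]
d = [0.25] * 4                                           # uniform first-round weights
print(best_rule(["ring", "OH", "NH2"], X, y, d))         # ('NH2', -1)
```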
  • 30. Outline
      - Graph Pattern Mining
          - Mining Frequent Subgraph Patterns
          - Graph Indexing
          - Graph Similarity Search
      - Graph Classification
          - Graph pattern-based approach
          - Machine Learning approaches
      - Graph Clustering
          - Link-density-based approach
  • 31. Graph Compression
      - Extract common subgraphs and simplify graphs by condensing these subgraphs into nodes
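A minimal sketch of the condensing step, assuming the vertex set of one occurrence of the common subgraph has already been found (finding it is the hard part and is not shown). The function and node names are hypothetical.

```python
from typing import FrozenSet, Hashable, Set

Edge = FrozenSet[Hashable]


def condense(edges: Set[Edge], occurrence: Set[Hashable], new_node: Hashable) -> Set[Edge]:
    """Replace the vertices of one subgraph occurrence by a single node.

    Edges inside the occurrence vanish; edges leaving it are re-attached to
    `new_node` (parallel edges collapse, because edges are kept in a set)."""
    condensed: Set[Edge] = set()
    for edge in edges:
        a, b = tuple(edge)
        a = new_node if a in occurrence else a
        b = new_node if b in occurrence else b
        if a != b:
            condensed.add(frozenset((a, b)))
    return condensed


# A triangle 1-2-3 with a tail 3-4-5; condensing the triangle leaves S-4 and 4-5.
graph = {frozenset(e) for e in [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)]}
print(condense(graph, {1, 2, 3}, "S"))
```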
  • 32. Graph/Network Clustering Problem
      - Networks made up of the mutual relationships of data elements usually have an underlying structure
      - Because the relationships are complex, it is difficult to discover these structures
      - How can the structure be made clear?
      - Given simple information of who associates with whom, could one identify clusters of individuals with common interests or special relationships?
          - E.g., families, cliques, terrorist cells…
  • 33. An Example of Networks
      - How many clusters?
      - What size should they be?
      - What is the best partitioning?
      - Should some points be segregated?
  • 34. A Social Network Model
      - Individuals in a tight social group, or clique, know many of the same people, regardless of the size of the group
      - Individuals who are hubs know many people in different groups but belong to no single group
          - E.g., politicians bridge multiple groups
      - Individuals who are outliers reside at the margins of society
          - E.g., hermits know few people and belong to no group
  • 35. The Neighborhood of a Vertex
      - Define Γ(v) as the immediate neighborhood of a vertex v, i.e., the set of people that an individual knows
  • 36. Structural Similarity
      - The desired features tend to be captured by a measure called structural similarity:
        $\sigma(v, w) = \dfrac{|\Gamma(v) \cap \Gamma(w)|}{\sqrt{|\Gamma(v)|\,|\Gamma(w)|}}$
      - Structural similarity is large for members of a clique and small for hubs and outliers
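A small sketch of the structural similarity. It assumes the closed neighborhood (a vertex together with its neighbors), as used in the SCAN algorithm; with that convention, two members of an isolated clique score exactly 1. In the hypothetical toy network, b and c sit in the same triangle, while a and the hub h share only each other.

```python
import math
from typing import Dict, Hashable, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    adj: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def gamma(adj: Dict[Hashable, Set[Hashable]], v: Hashable) -> Set[Hashable]:
    """Closed neighborhood: v together with the vertices it is linked to."""
    return adj[v] | {v}


def structural_similarity(adj: Dict[Hashable, Set[Hashable]], v: Hashable, w: Hashable) -> float:
    """sigma(v, w) = |Gamma(v) & Gamma(w)| / sqrt(|Gamma(v)| * |Gamma(w)|)."""
    gv, gw = gamma(adj, v), gamma(adj, w)
    return len(gv & gw) / math.sqrt(len(gv) * len(gw))


# Two triangles joined through a hub "h" (hypothetical toy network).
adj = build_adjacency([("a", "b"), ("b", "c"), ("c", "a"), ("a", "h"),
                       ("h", "d"), ("d", "e"), ("e", "f"), ("f", "d")])
print(structural_similarity(adj, "b", "c"))            # 1.0
print(round(structural_similarity(adj, "a", "h"), 3))  # 0.577
```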
  • 37. Graph Mining (overview figure)
      - Frequent Subgraph Mining (FSM)
          - Apriori-based: AGM, FSG, PATH
          - Pattern-growth-based: gSpan, MoFa, GASTON, FFSM, SPIN
      - Variant Subgraph Pattern Mining: coherent, closed, and dense subgraph mining, and approximate methods (CSA, CLAN, CloseGraph, CloseCut, Splat, CODENSE, SUBDUE, GBI)
      - Applications of Frequent Subgraph Mining
          - Indexing and search: GraphGrep, Daylight, gIndex, Grafil
          - Clustering
          - Classification: kernel methods (graph kernels)

Editor's Notes

  1. Apriori:
     Step 1: Join two (k-1)-edge graphs (which share the same (k-2)-edge subgraph) to generate a k-edge graph.
     Step 2: Join the tid-lists of these two (k-1)-edge graphs, then check whether the count is at least the minimum support.
     Step 3: Check all (k-1)-edge subgraphs of this k-edge graph to see whether all of them are frequent.
     Step 4: After G successfully passes Steps 1-3, compute the support of G in the graph dataset to see whether it is really frequent.
     gSpan:
     Step 1: Rightmost-extend a (k-1)-edge graph to several k-edge graphs.
     Step 2: Enumerate the occurrences of this (k-1)-edge graph in the graph dataset, counting these k-edge graphs at the same time.
     Step 3: Output those k-edge graphs whose support is at least the minimum support.
     Pros: (1) gSpan avoids costly candidate generation and the testing of some infrequent subgraphs. (2) It needs no complicated graph operations, such as joining two graphs and calculating their (k-1)-edge subgraphs. (3) gSpan is very simple.
     The key is how to do rightmost extension efficiently in a graph; we invented the DFS code for graphs for this purpose.
  2. Our work, like all previous work, follows this indexing strategy.