Max Planck Institute @ Tübingen, Feb. 25th, 2010
Mining Frequent Subgraphs from Linear Graphs
Yasuo Tabei, Computational Biology Research Center, AIST
Joint work with Daisuke Okanohara (Univ. of Tokyo), Shuichi Hirose (AIST), Koji Tsuda (AIST)

Outline
- The need for a frequent subgraph mining algorithm
- What is a linear graph?
- Subgraph enumeration algorithm from a linear graph
- Extension to a frequent subgraph mining algorithm
- Motif extraction from protein 3D structures in molecular biology
- Phrase extraction from predicate-argument structures in NLP

Frequent Subgraph Mining
- Enumerate all frequent subgraphs in a graph database
- Input: graph database G={g1,g2,…,gN}
- Output: frequent subgraphs appearing in at least m graphs

gSpan algorithm (Yan et al., 2002)
- Rightmost pattern extension
- Duplication can happen
- Minimum DFS code checking
- Time exponential in pattern size

Linear Graph (Davydov et al., 2004)
- Labeled graph whose vertices are totally ordered
- Linear graph g=(V,E,LV,LE)
  - V⊂N: ordered vertex set
  - E⊆V×V: edge set
  - LV: V→ΣV: vertex labeling
  - LE: E→ΣE: edge labeling
- Ex) RNA, protein, alternative splicing forms, PAS
[Figure: linear graph with vertices 1 to 6 labeled A B A B C A and edges labeled a, b, c]

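The definition g=(V,E,LV,LE) can be sketched as a small data structure. This is only an illustration: the class name `LinearGraph`, the convention of storing each edge as a pair (i, j) with i < j, and the edge endpoints in the example are assumptions, since the slide gives the vertex labels but the figure's edges did not survive.

```python
from dataclasses import dataclass, field


@dataclass
class LinearGraph:
    """Labeled graph whose vertices are totally ordered (vertex ids 1..n)."""

    # vertex_labels[i - 1] is the label LV(i) of vertex i
    vertex_labels: list
    # maps an edge (i, j), stored with i < j (an assumption), to its label LE
    edges: dict = field(default_factory=dict)

    def add_edge(self, i, j, label):
        if not (1 <= i < j <= len(self.vertex_labels)):
            raise ValueError("expected 1 <= i < j <= number of vertices")
        self.edges[(i, j)] = label

    def sorted_edges(self):
        # edges in increasing order: left endpoint first, then right endpoint
        return sorted(self.edges)


# Vertex labels follow the slide (A B A B C A); the edges are hypothetical.
g = LinearGraph(list("ABABCA"))
g.add_edge(1, 3, "a")
g.add_edge(2, 5, "b")
g.add_edge(1, 2, "c")
print(g.sorted_edges())
```

Keeping the vertices as consecutive integers makes the total order implicit in the ids, so no separate ordering structure is needed.
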
Linear Graph (Davydov et al., 2004)
- Labeled graph whose vertices are totally ordered
- Many types of data can be represented as linear graphs
  - Alternative splicing forms
  - RNA secondary structures
  - Predicate-argument structure

Linear Subgraph Relation
- g1 is a linear subgraph of g2 ⇔
  - i) The ordinary subgraph condition holds
    - the vertex labels are matched
    - all edges of g1 also exist in g2 with the correct labels
  - ii) The order of vertices is conserved
[Figure: Ex) g1 ⊂ g2, with vertex labels drawn from C, G, A, T]

Example of Not Linear Subgraph
- g1 is not a linear subgraph of g2
  - vertex labels are matched
  - all edges of g1 also exist in g2 with the correct labels
  - the order of vertices is not conserved
[Figure: Ex) g1 ⊄ g2, with vertex labels A, B and edge labels b, c]

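The two conditions can be checked directly by brute force. The sketch below is not the talk's algorithm: it simply tries every increasing choice of positions in g2, which conserves vertex order by construction, and then tests labels and edges. The function name, the (labels, edges) representation, and the example graphs are assumptions made for illustration.

```python
from itertools import combinations


def is_linear_subgraph(labels1, edges1, labels2, edges2):
    """Return True if g1 is a linear subgraph of g2.

    labels*: list of vertex labels, index 0 holds vertex 1.
    edges*:  dict mapping (i, j) with i < j to an edge label.
    """
    n1, n2 = len(labels1), len(labels2)
    # combinations() yields increasing tuples, so vertex order is conserved
    for positions in combinations(range(1, n2 + 1), n1):
        # condition i-a: vertex labels must match
        if any(labels1[a] != labels2[p - 1] for a, p in enumerate(positions)):
            continue
        # condition i-b: every edge of g1 exists in g2 with the same label
        if all(
            (positions[i - 1], positions[j - 1]) in edges2
            and edges2[(positions[i - 1], positions[j - 1])] == label
            for (i, j), label in edges1.items()
        ):
            return True
    return False


# Hypothetical example: g1 is an A-B pair joined by an edge labeled "c".
g1 = (["A", "B"], {(1, 2): "c"})
# Same edge exists in this graph, but B comes before A: order not conserved.
g2_reversed = (["B", "A", "B"], {(1, 2): "c"})
# Here A (vertex 2) precedes B (vertex 3), so the order is conserved.
g2_ordered = (["B", "A", "B"], {(2, 3): "c"})

print(is_linear_subgraph(*g1, *g2_reversed))
print(is_linear_subgraph(*g1, *g2_ordered))
```

The exhaustive search is exponential in the worst case and is meant only to make the definition concrete.
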
Total order among edges in a linear graph
- Compare the left nodes first. If they are identical, look at the right nodes
- ∀ e1=(i,j), e2=(k,l) ∈ Eg: e1 <ₑ e2 if and only if (i) i<k, or (ii) i=k and j<l
[Figure: Ex) edges e1=(i,j) and e2=(k,l) on vertices 1 to 4]

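A short sketch of this order, assuming edges are stored as pairs (i, j) of vertex ids. The function name `edge_less` is made up for illustration. The order coincides with Python's built-in lexicographic comparison of tuples, so `sorted` and `max` can be used on edges directly.

```python
def edge_less(e1, e2):
    """e1 < e2 in the edge order: compare left nodes, then right nodes."""
    (i, j), (k, l) = e1, e2
    return i < k or (i == k and j < l)


# Hypothetical edge set on vertices 1..4.
edges = [(2, 3), (1, 4), (1, 2), (2, 4)]

# Tuple comparison is lexicographic, which is the same rule.
assert all(edge_less(a, b) == (a < b) for a in edges for b in edges)

print(sorted(edges))  # edges from smallest to largest
print(max(edges))     # the largest edge
```
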
Disconnected Patterns
- Linear graph: sequence + graph
- In sequence mining, gapped patterns are considered
- Need to mine disconnected patterns as well
- Data represented as disconnected patterns
  - RNA secondary structure
  - Alternative splicing

Outline
- The need for a frequent subgraph mining algorithm
- Linear graph
- Subgraph mining algorithm from a linear graph
- Frequent subgraph mining algorithm from linear graphs
- Motif extraction from protein 3D structures in molecular biology
- Phrase extraction from predicate-argument structures in NLP

Enumeration of All Linear Subgraphs of a Linear Graph
- Before considering a mining algorithm, we have to solve the problem of subgraph enumeration first
- How can all subgraphs of a given linear graph be enumerated without duplication?

Search Lattice of All Subgraphs
[Figure: lattice of all subgraphs, arranged by number of edges (level), starting from the empty graph]

Reverse Search (Avis and Fukuda, 1993)
- All subgraphs can be enumerated by traversing the search lattice
  - Preventing duplication is difficult
- Need to define a search tree in the search lattice
- Reduction map f
  - Mapping from a child to its parent
  - Remove the largest edge

Search Tree induced by the reduction map
- By applying the reduction map to each element, a search tree can be induced
[Figure: search tree rooted at the empty graph]

Inverting the reduction map f⁻¹
- In traversing the tree from the root, child nodes are created on demand
- Consider all child candidates
- Keep only the candidates that the reduction map sends back to the current node
- However, in this particular case, the reduction map can be inverted explicitly
- Can derive the pattern extension rule (from parent to children)

Pattern Extension Rule
- 0-vertex addition: (A-1)
- 1-vertex addition: (B-1) to (B-7)
- 2-vertex addition: (C-1) to (C-6)
[Figure: parent graph and its extensions for each case; legend: the largest edge, newly added edge]

Traversing the search tree from the root
- Depth-first traversal, for its memory efficiency
[Figure: search tree rooted at the empty graph; legend: the largest edge, newly added edge]

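The idea of reverse search can be sketched on a simplified problem: enumerating every edge subset of one fixed linear graph. This is not the pattern extension rule of the slides (cases A, B, C), only an illustration under the assumption that a subgraph is identified with its sorted tuple of edges. The reduction map removes the largest edge, so its inverse adds any edge larger than the current largest one. All names are hypothetical.

```python
def parent(subgraph):
    """Reduction map f: remove the largest edge (None for the empty graph)."""
    if not subgraph:
        return None
    return subgraph[:-1]  # edges are kept sorted, so the last one is largest


def children(subgraph, all_edges):
    """Inverse of f: add one edge larger than the current largest edge."""
    largest = subgraph[-1] if subgraph else None
    for e in all_edges:
        if largest is None or e > largest:
            yield subgraph + (e,)


def enumerate_subgraphs(all_edges):
    """Depth-first traversal of the search tree, starting from the empty graph."""
    all_edges = sorted(all_edges)
    stack = [()]
    while stack:
        g = stack.pop()
        yield g
        stack.extend(children(g, all_edges))


# Hypothetical linear graph with three edges.
edges = [(1, 2), (1, 3), (2, 4)]
result = list(enumerate_subgraphs(edges))
print(len(result), len(set(result)))  # same count: no subgraph is visited twice
```

Because every non-empty subgraph has exactly one parent, each one is generated exactly once and no duplicate check is needed. The depth-first stack only holds the pending siblings along the current path, which is the memory advantage the slide refers to.
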
Frequent Subgraph Mining
- Basic idea: find all possible extensions of the current pattern in the graph database, and extend the pattern
- Occurrence list LG(g)
  - Record every occurrence of a pattern g in the graph database G
  - Calculate the support of a pattern g from the occurrence list
  - […] for pruning
- Search tree pruning

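The roles of the occurrence list and of pruning can be sketched as follows. This is a toy under loud assumptions: the support is taken to be the number of distinct graphs in the occurrence list, a pattern below the minimum support is not extended, and the `toy_extend` function treats each graph as a plain set of items instead of implementing the pattern extension rule. All names and the database are hypothetical.

```python
def support(occurrences):
    """Number of distinct graphs in which the pattern occurs."""
    return len({graph_id for graph_id, _ in occurrences})


def mine(pattern, occurrences, extend, minsup, out):
    """Depth-first mining with support-based pruning."""
    sup = support(occurrences)
    if sup < minsup:
        return  # prune: extensions occur in a subset of these graphs
    out.append((pattern, sup))
    for child, child_occurrences in extend(pattern, occurrences):
        mine(child, child_occurrences, extend, minsup, out)


# Toy database: graph id -> set of items (a stand-in for real linear graphs).
DB = {0: {"a", "b"}, 1: {"a", "b", "c"}, 2: {"a", "c"}}
ITEMS = sorted(set().union(*DB.values()))


def toy_extend(pattern, occurrences):
    """Add one item larger than the last one; filter the occurrence list."""
    for item in ITEMS:
        if pattern and item <= pattern[-1]:
            continue
        # the embedding slot is unused in this toy and stays None
        occ = [(gid, emb) for gid, emb in occurrences if item in DB[gid]]
        yield pattern + (item,), occ


found = []
mine((), [(gid, None) for gid in DB], toy_extend, 2, found)
for pattern, sup in found:
    print(pattern, sup)
```

The child occurrence list is always computed from the parent's list rather than from the whole database, which is what makes recording occurrences worthwhile.
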
Outline
- The need for a frequent subgraph mining algorithm
- Linear graph
- Subgraph mining algorithm from a linear graph
- Frequent subgraph mining algorithm from linear graphs
- Motif extraction from protein 3D structures in molecular biology
- Phrase extraction from predicate-argument structures in NLP

Motif extraction from protein 3D structures
- Pairs of homologous proteins in thermophilic and mesophilic organisms
- Construct a linear graph from a protein
  - Assign vertex labels from {1,…,6} according to its property (Mirny, 1999)
  - No edge labels
- Rank the patterns by statistical significance (p-values)
  - Association with the thermophilic/mesophilic label
  - Fisher's exact test

Applying gSpan
- We want to compare the execution time of our algorithm with that of gSpan
- gSpan is not directly applicable
  - Contact maps are not always connected
  - Made 1-gap and 2-gap linear graphs

Runtime comparison
- Execution time of LGM is reasonable
- gSpan does not work on the 2-gap linear graph dataset even if the minimum support threshold is 50

- Minimum support = 10
- 103 patterns whose p-value < 0.001
- Thermophilic (TATA), Mesophilic (pol II)

Mapping motifs in 3D structure
Phrase extraction from predicate-argument structures
- Internet movie review dataset (Pang et al., 2004)
- Sentiment dataset: 5331 positive and 5331 negative opinions
- 5000 subjective and 5000 objective sentences
- Extract characteristic phrases (subgraph patterns)

Methods in comparison
- PAS+gSpan: predicate-argument structure + gSpan
  - No edges added
- Dep+FREQT: dependency tree (KSDep) + FREQT (tree miner)
- Modified PrefixSpan (sequence miner)

Classification Accuracy
- The accuracy of LGM is better than that of gSpan
- PAS representation is comparable to the other representations

Phrase structure extraction from predicate-argument structures
- Only simple sequential patterns are extracted

Phrase structure extraction from predicate-argument structures
- Phrase structures were extracted

Other topics
- Alignment algorithms for RNA sequences
  - Ph.D. study
- All pairs similarity search method
  - Nearest neighbor graphs

Q & A
Data represented as linear graphs
- DNA, RNA, protein 3D structure, predicate-argument structure
  - Reference point: 5′ end (DNA, RNA), N-terminus (protein)
[Figure: Ex) RNA (5′ to 3′, bases G U G C) and protein (N to C, residues A R N D; edge: 5Å), each with vertices 1 to 4]

Data NOT represented as linear graphs
- Chemical compounds, gene co-expression networks, social networks, etc.
- Four vertices v1, v2, v3, v4 can be ordered in 4! ways (v1 v2 v3 v4, v1 v2 v4 v3, …)


Rightmost pattern extension
- A graph is extended from a vertex on the rightmost path
[Figure: graphs on vertices v1 to v4 with the rightmost path marked]

What is a code for an edge
- A code assigned to an edge in a graph
  - a set of vertex ids, vertex labels, and an edge label
- Ex) (vertex id1, vertex id2, vertex id1 label, vertex id2 label, edge label)

Motif extraction
- To extract protein 3D motifs, we use Fisher's exact test
- The p-value is computed as the sum of the probabilities of all tables that are at least as extreme as the observed table
- The frequent subgraphs are ranked according to their p-values
- Focused on a pair of proteins: TATA-binding protein and human pol II promoter protein
- Table 1: 2×2 contingency table

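The p-value computation can be sketched with the standard library alone. Several things here are assumptions rather than facts from the talk: the 2×2 table is taken to cross "pattern present / absent" with "thermophilic / mesophilic", the test is taken to be two-sided, and "at least as extreme" is read as "no more probable than the observed table under fixed margins". The counts in the example are invented.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of the table [[a, b], [c, d]] given its margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def fisher_exact_two_sided(a, b, c, d):
    """Sum the probabilities of all tables no more probable than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = table_probability(a, b, c, d)
    p_value = 0.0
    # x ranges over every feasible top-left cell for the fixed margins
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
        if p <= p_observed * (1 + 1e-9):  # small tolerance for float ties
            p_value += p
    return p_value


# Invented counts: the pattern occurs in 3 of 4 thermophilic proteins
# and in 1 of 4 mesophilic proteins.
print(fisher_exact_two_sided(3, 1, 1, 3))
```

Patterns would then be sorted by this value in increasing order, so the most significant ones come first.
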
[Figure: annotated data (Ex: DNA, protein, RNA, etc.) are used to learn a model (Ex: HMM, SCFG, etc.), which predicts annotations for unannotated data, with feedback]

- Algorithms for prediction and learning are based on dynamic programming (DP)
- The ordering in linear graphs is useful for designing DP algorithms