The 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD2011)
25 May 2011



         LGM: Mining Frequent Subgraphs
              from Linear Graphs

                                Yasuo Tabei
                           ERATO Minato Project
                    Japan Science and Technology Agency
                               joint work with
                 Daisuke Okanohara (Preferred Infrastructure),
                           Shuichi Hirose (AIST),
                              Koji Tsuda (AIST)


Outline
• Introduction to linear graph
  ★   Linear subgraph relation
  ★   Total order among edges
• Frequent subgraph mining from a set of
  linear graphs
• Experiments
  ★   Motif extraction from protein 3D
      structures
Linear graph (Davydov et al., 2004)
 • Labeled graph whose vertices are totally
   ordered
 • Linear graph g = (V, E, L_V, L_E)
   ‣ V ⊂ N : ordered vertex set
   ‣ E ⊆ V × V : edge set
   ‣ L_V : V → Σ_V : vertex labels
   ‣ L_E : E → Σ_E : edge labels

Example:
   [Figure: a linear graph on vertices 1-6 with vertex labels A, B, A, B, C, A and four edges labeled a, a, b, c]
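The definition above can be held in a small container. This is only a minimal sketch: the class name `LinearGraph`, its fields, and the convention of storing each edge once with the smaller endpoint first are assumptions made for illustration. The vertex labels follow the example, but the edge endpoints are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LinearGraph:
    """Labeled graph whose vertices 1..n are totally ordered."""
    vertex_labels: list                              # vertex_labels[i - 1] = L_V(i)
    edge_labels: dict = field(default_factory=dict)  # {(i, j): L_E((i, j))}

    def __post_init__(self):
        n = len(self.vertex_labels)
        for (i, j) in self.edge_labels:
            # Assumption: each edge is stored once, smaller endpoint first.
            if not (1 <= i < j <= n):
                raise ValueError(f"edge {(i, j)} must satisfy 1 <= i < j <= {n}")

    def edges(self):
        """Edges sorted by left endpoint, then right endpoint."""
        return sorted(self.edge_labels)


# Vertex labels as in the example; edge endpoints are hypothetical.
g = LinearGraph(
    vertex_labels=["A", "B", "A", "B", "C", "A"],
    edge_labels={(1, 3): "a", (2, 4): "b", (4, 6): "a", (3, 6): "c"},
)
print(g.edges())
```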
Linear subgraph relation
•   g1 is a linear subgraph of g2
      i) Conventional subgraph condition
        ★ Vertex labels are matched
        ★ All edges of g1 exist in g2 with the correct labels
       ii) Order of vertices is conserved
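Example:
   [Figure: g1 ⊂ g2, where g1 has vertices 1-3 labeled A, B, A with edges labeled a and b, and g2 has vertices 1-6 labeled A, A, B, B, C, A with edges labeled a, a, b, c]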
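The two conditions can be checked by brute force. The sketch below assumes a graph is given as a list of vertex labels plus a dict `{(i, j): label}` with i < j; the function name `is_linear_subgraph` is hypothetical. Because the vertex order must be conserved, only strictly increasing vertex mappings are tried. The search is exponential and is meant to illustrate the definition, not the LGM algorithm.

```python
from itertools import combinations


def is_linear_subgraph(labels1, edges1, labels2, edges2):
    """Return True if g1 = (labels1, edges1) is a linear subgraph of g2."""
    n1, n2 = len(labels1), len(labels2)
    # combinations() yields increasing tuples, so image[i - 1] is an
    # order-preserving placement of vertex i of g1 inside g2.
    for image in combinations(range(1, n2 + 1), n1):
        # Condition i-a: vertex labels are matched.
        if any(labels1[i] != labels2[image[i] - 1] for i in range(n1)):
            continue
        # Condition i-b: every edge of g1 exists in g2 with the same label.
        if all(edges2.get((image[i - 1], image[j - 1])) == lab
               for (i, j), lab in edges1.items()):
            return True
    return False


# A subgraph in the conventional sense, but the vertex order is reversed.
print(is_linear_subgraph(["A", "B"], {(1, 2): "a"}, ["B", "A"], {(1, 2): "a"}))
# An order-preserving match exists (vertices 1 and 3 of the larger graph).
print(is_linear_subgraph(["A", "B"], {(1, 2): "a"}, ["A", "A", "B"], {(1, 3): "a"}))
```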
Subgraph but not linear
              subgraph
•   g1 is a subgraph of g2
    ★ vertex labels are matched
    ★ all edges in g1 also exist in g2 with
       correct labels
•   g1 is not a linear subgraph of g2
    ★   the order of vertices is not conserved
    [Figure: g1 has vertices 1-3 labeled A, A, B with edges labeled b and c; g2 has vertices 1-4 labeled A, A, B, A with edges labeled b, c, a]
Total order among edges in a
             linear graph
• Compare the left vertices first. If they
    are identical, look at the right vertices
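•   ∀ e1 = (i, j), e2 = (k, l) ∈ E_g : e1 <_e e2
    if and only if (i) i < k or (ii) i = k and j < l
Example:
    [Figure: two edges e1 = (i, j) and e2 = (k, l); a linear graph on vertices 1-4 whose edges are numbered 1, 2, 3 in this order]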
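The order can be written directly as a comparison on endpoint pairs. In this sketch the names `edge_less` and `compare_edges` are hypothetical and the edge list is made up. The rule coincides with Python's built-in lexicographic ordering of tuples, which the last line checks.

```python
from functools import cmp_to_key


def edge_less(e1, e2):
    """e1 <_e e2: compare left vertices first, then right vertices."""
    (i, j), (k, l) = e1, e2
    return i < k or (i == k and j < l)


def compare_edges(e1, e2):
    """Three-way comparison built on edge_less, usable with sorted()."""
    if edge_less(e1, e2):
        return -1
    if edge_less(e2, e1):
        return 1
    return 0


edges = [(2, 4), (1, 3), (1, 2)]           # hypothetical edges
ordered = sorted(edges, key=cmp_to_key(compare_edges))
print(ordered)
print(ordered == sorted(edges))            # same as plain tuple ordering
```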
Outline
• Introduction to linear graph
  ★   linear subgraph relation
  ★   Total order among edges
• Frequent subgraph mining from a set of
  linear graphs
• Experiments
  ★   Motif extraction from protein 3D
      structures
Frequent subgraph mining
               from linear graphs
• Enumerate all frequent subgraphs from a set of
    linear graphs
     ★ Subgraphs included in a set of linear graphs at
        least τ times (minimum support threshold)
    ★  Enumerate connected and disconnected subgraphs
       with a unified framework
     ★ Use reverse search for an efficient enumeration
       (Avis and Fukuda, 1993)
•   Polynomial delay
     ★ gSpan = exponential delay
Enumeration of all linear
  subgraphs of a linear graph
• Before considering a mining
  algorithm, we first have to solve the
  problem of subgraph enumeration
• How to enumerate all subgraphs of
  the following linear graph without
  duplication?
Search lattice of all subgraphs
  [Figure: search lattice over all subgraphs of a linear graph]
Reverse search (Avis and Fukuda, 1993)
  • To enumerate all subgraphs without
    duplication, we need to define a search tree
    in the search lattice

  • Reduction map f
   ★ Mapping from a child to its parent
   ★ Remove the largest edge


   [Figure: f maps a linear graph on vertices 1-4 with edges 1, 2, 3 to its parent on vertices 1-3 with edges 1, 2]
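A minimal sketch of the reduction map follows. It assumes a pattern is a list of vertex labels plus a dict `{(i, j): label}`, and that removing the largest edge also drops any vertex left without an edge and renumbers the rest, which is how the example above goes from four vertices to three. The function name `reduce_pattern` and the sample pattern are hypothetical.

```python
def reduce_pattern(vertex_labels, edges):
    """Map a child pattern to its parent by removing the largest edge."""
    if not edges:
        raise ValueError("the empty pattern is the root and has no parent")
    largest = max(edges)  # tuple comparison: left vertex first, then right
    remaining = {e: lab for e, lab in edges.items() if e != largest}
    # Assumption: vertices no longer touched by an edge are dropped.
    used = sorted({v for e in remaining for v in e})
    renumber = {old: new for new, old in enumerate(used, start=1)}
    parent_labels = [vertex_labels[v - 1] for v in used]
    parent_edges = {(renumber[i], renumber[j]): lab
                    for (i, j), lab in remaining.items()}
    return parent_labels, parent_edges


# Hypothetical child with three edges on four vertices.
child_labels = ["A", "B", "A", "C"]
child_edges = {(1, 2): "a", (1, 3): "b", (3, 4): "c"}
parent_labels, parent_edges = reduce_pattern(child_labels, child_edges)
print(parent_labels, parent_edges)
```

Applying the map repeatedly always ends at the empty pattern, which is what makes it usable as the parent pointer of a search tree.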
Search tree induced by the
        reduction map
• By applying the reduction map to each
  element, a search tree can be induced
Inverting the reduction map: f⁻¹

• When traversing the tree from the root,
  child nodes are created on demand
• In most cases, inverting the reduction
  map takes the following two steps:
  ★   Consider all candidate children
  ★   Keep the ones that the reduction map sends back to the parent

• However, in this particular case, the
  reduction map can be inverted explicitly
  ★   Can derive the pattern extension rule
      (parent to children)
Pattern extension rule




Traversing search tree from root
• Depth-first traversal, for its memory efficiency
  [Figure: depth-first traversal of the search tree from the root]
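The idea can be illustrated on the simplest case: the edge subsets of one fixed linear graph. With "remove the largest edge" as the reduction map, the children of a subset are obtained by adding one edge larger than its current largest edge, so a depth-first traversal visits every subset exactly once. This is only a sketch under that simplification; it is not the LGM pattern extension rule, and the name `enumerate_edge_subsets` is hypothetical.

```python
def enumerate_edge_subsets(edges):
    """Yield every subset of `edges` exactly once, depth first."""
    order = sorted(edges)  # total order: left vertex first, then right

    def dfs(chosen, start):
        yield tuple(chosen)
        # Children: add one edge that is larger than every chosen edge,
        # so removing the largest edge of a child returns to `chosen`.
        for idx in range(start, len(order)):
            chosen.append(order[idx])
            yield from dfs(chosen, idx + 1)
            chosen.pop()

    yield from dfs([], 0)


# Hypothetical graph with three edges.
subsets = list(enumerate_edge_subsets([(1, 2), (2, 3), (1, 3)]))
for s in subsets:
    print(s)
```

Only the current path from the root is kept in memory, which is the reason for preferring depth-first over breadth-first traversal here.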
Frequent subgraph mining
• Basic idea: find all possible extensions of the
    current pattern in the graph database, and
    extend the pattern
• Occurrence list L_G(g)
  ★   Records every occurrence of a pattern g in
      the graph database G
  ★   The support of a pattern g is calculated from
      its occurrence list
• Use the anti-monotonicity of
  support for pruning
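A rough sketch of how support might be read off an occurrence list follows. It assumes an occurrence is a pair (graph id, matched vertex positions) and that support counts distinct graphs; both are assumptions, since the slides only say "at least τ times". The names `support` and `prune_infrequent` and the sample data are hypothetical.

```python
def support(occurrences):
    """Number of distinct database graphs containing the pattern (assumed definition)."""
    return len({graph_id for graph_id, _positions in occurrences})


def prune_infrequent(occurrence_lists, min_support):
    """Keep only the candidate patterns whose support reaches the threshold."""
    return {pattern: occ
            for pattern, occ in occurrence_lists.items()
            if support(occ) >= min_support}


# Hypothetical occurrence lists for two candidate extensions.
candidates = {
    "pattern_x": [(0, (1, 2)), (0, (2, 3)), (1, (1, 4))],
    "pattern_y": [(2, (1, 2))],
}
frequent = prune_infrequent(candidates, min_support=2)
print(sorted(frequent))
```

Because an extension can never occur in more graphs than the pattern it extends, a pattern dropped here need not be extended further.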
Outline
• Introduction to linear graph
  ★   linear subgraph relation
  ★   Total order among edges
• Frequent subgraph mining from a set of
  linear graphs
• Experiments
  ★   Motif extraction from protein 3D
      structures
Motif extraction from protein
            3D structures
•   Pairs of homologous proteins from thermophilic
     and mesophilic organisms
•   Construct a linear graph from a protein
     ★ Use vertex order from N- to C-terminal
     ★ Assign vertex labels from {1,...,6}
     ★ Draw an edge between pairs of amino acid
       residues whose distance is 5Å
•   # of data: 742, avg. # of vertices: 371, avg. # of edges: 496
•   Rank the enumerated patterns by statistical
    significance (p-value)
     ★ Association to thermophilic/mesophilic labels
     ★ Fisher's exact test
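The construction steps might look like the sketch below. It assumes one 3D coordinate per residue, listed from the N- to the C-terminal, precomputed residue labels in {1,...,6}, and an edge whenever the distance is at most a cutoff (5 Å by default). The choice of representative atom, the labeling scheme, and the "at most" reading of the threshold are all assumptions, and the name `build_linear_graph` is hypothetical.

```python
import math


def build_linear_graph(coords, labels, cutoff=5.0):
    """Return (vertex_labels, edges) for one protein chain.

    coords: list of (x, y, z) per residue, in N- to C-terminal order
    labels: list of residue labels, same length as coords
    """
    if len(coords) != len(labels):
        raise ValueError("need exactly one label per residue")
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            # Assumption: residues within the cutoff are connected.
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.append((i + 1, j + 1))  # vertices are numbered from 1
    return list(labels), edges


# Hypothetical three-residue chain.
vertex_labels, edges = build_linear_graph(
    coords=[(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    labels=[1, 4, 2],
)
print(vertex_labels, edges)
```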
Runtime comparison
• Compared to gSpan
• Made gapped linear graphs and ran gSpan on them
• LGM is faster than gSpan




• Minimum support = 10
• 103 patterns whose p-value < 0.001
• Thermophilic (TATA), Mesophilic (pol II)
  ★ Share the function as DNA-binding
    proteins, but their thermostability is
    different
Mapping motifs in 3D structure

• Thermophilic (TATA), Mesophilic (pol II)




Summary

• Efficient subgraph mining algorithm for
  linear graphs
• Search tree is defined by the reverse search
  principle
• Patterns include disconnected subgraphs
• Enumeration runs with polynomial delay
• Interesting patterns from proteins

WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 

Lgm pakdd2011 public

  • 1. The 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD2011), 25 May 2011 • LGM: Mining Frequent Subgraphs from Linear Graphs • Yasuo Tabei, ERATO Minato Project, Japan Science and Technology Agency • Joint work with Daisuke Okanohara (Preferred Infrastructure), Shuichi Hirose (AIST), Koji Tsuda (AIST)
  • 2. Outline • Introduction to linear graph ★ Linear subgraph relation ★ Total order among edges • Frequent subgraph mining from a set of linear graphs • Experiments ★ Motif extraction from protein 3D structures
  • 3. Linear graph (Davydov et al., 2004) • Labeled graph whose vertices are totally ordered • Linear graph g = (V, E, L_V, L_E) ‣ V ⊂ ℕ : ordered vertex set ‣ E ⊆ V × V : edge set ‣ L_V : V → Σ_V : vertex labels ‣ L_E : E → Σ_E : edge labels • Example (figure): six vertices numbered 1 to 6 with vertex labels A, B, A, B, C, A and edges labeled a, b, c
  • 4. Linear subgraph relation • g1 is a linear subgraph of g2 if (i) the conventional subgraph condition holds ★ Vertex labels are matched ★ All edges of g1 exist in g2 with the correct labels, and (ii) the order of vertices is conserved • Example (figure): g1 on three vertices ⊂ g2 on six vertices
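
The two conditions on this slide can be sketched as a brute-force check. This is only an illustration, not the procedure used by LGM: the representation (a list of vertex labels in vertex order plus a dict from 0-based edge pairs `(i, j)` with `i < j` to edge labels) and the name `is_linear_subgraph` are assumptions made here.

```python
from itertools import combinations

def is_linear_subgraph(labels1, edges1, labels2, edges2):
    """Return True if g1 embeds into g2 with the vertex order conserved.

    labels*: vertex labels listed in vertex order (assumed representation).
    edges*:  dict mapping a 0-based pair (i, j), i < j, to an edge label.
    """
    n1, n2 = len(labels1), len(labels2)
    # combinations() yields index tuples in increasing order, so every
    # candidate mapping automatically conserves the vertex order.
    for image in combinations(range(n2), n1):
        # Condition (i), part 1: vertex labels must match.
        if any(labels1[v] != labels2[image[v]] for v in range(n1)):
            continue
        # Condition (i), part 2: every edge of g1 exists in g2 with its label.
        if all(edges2.get((image[i], image[j])) == label
               for (i, j), label in edges1.items()):
            return True
    return False

g2_labels, g2_edges = ["A", "A", "B"], {(1, 2): "a"}
print(is_linear_subgraph(["A", "B"], {(0, 1): "a"}, g2_labels, g2_edges))
print(is_linear_subgraph(["B", "A"], {(0, 1): "a"}, g2_labels, g2_edges))
```

The second call uses the same labeled edge with its endpoints in the opposite order. It would match as an ordinary undirected subgraph, but it fails here because no order-preserving mapping exists. The check is exponential in the worst case and is meant only to make the definition concrete.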
  • 5. Subgraph but not linear subgraph • g1 is a subgraph of g2 ★ Vertex labels are matched ★ All edges in g1 also exist in g2 with the correct labels • g1 is not a linear subgraph of g2 ★ The order of vertices is not conserved • Example (figure): g1 on three vertices and g2 on four vertices
  • 6. Total order among edges in a linear graph • Compare the left vertices first; if they are identical, compare the right vertices • For all e1 = (i, j), e2 = (k, l) ∈ E_g: e1 <_e e2 if and only if (i) i < k, or (ii) i = k and j < l • Example (figure): two edges e1 = (i, j) and e2 = (k, l), and three numbered edges over vertices 1 to 4
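
As a small sketch of this ordering, assuming an edge is stored as a pair `(left, right)` of vertex indices (the function name `edge_less` is hypothetical):

```python
def edge_less(e1, e2):
    """True if e1 <_e e2: compare left vertices, then right vertices."""
    (i, j), (k, l) = e1, e2
    return i < k or (i == k and j < l)

edges = [(2, 3), (1, 4), (1, 2)]
# The rule is a lexicographic comparison, which is also how Python orders
# tuples, so sorted() arranges the edges according to <_e.
print(sorted(edges))
print(edge_less((1, 4), (2, 3)))
```

Because the order is total, every nonempty edge set has a unique largest edge, which is what the reduction map relies on.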
  • 7. Outline • Introduction to linear graph ★ Linear subgraph relation ★ Total order among edges • Frequent subgraph mining from a set of linear graphs • Experiments ★ Motif extraction from protein 3D structures
  • 8. Frequent subgraph mining from linear graphs • Enumerate all frequent subgraphs from a set of linear graphs ★ Subgraphs included in a set of linear graphs at least τ times (τ: minimum support threshold) ★ Enumerate connected and disconnected subgraphs within a unified framework ★ Use reverse search for efficient enumeration (Avis and Fukuda, 1993) • Polynomial delay ★ gSpan: exponential delay
  • 9. Enumeration of all linear subgraphs of a linear graph • Before considering a mining algorithm, we have to solve the problem of subgraph enumeration first • How to enumerate all subgraphs of the following linear graph without duplication? (figure)
  • 10. Search lattice of all subgraphs (figure)
  • 11. Reverse search (Avis and Fukuda, 1993) • To enumerate all subgraphs without duplication, we need to define a search tree in the search lattice • Reduction map f ★ Mapping from a child to its parent ★ Remove the largest edge • Example (figure): f maps a pattern on four vertices to its parent on three vertices
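
A minimal sketch of the reduction map, assuming a pattern is represented only by its set of edges `(left, right)`. The renumbering of vertices that the figure suggests after an edge is removed is omitted here, and the name `reduce_pattern` is hypothetical.

```python
def reduce_pattern(edges):
    """Parent of a nonempty pattern: the same edge set minus its largest edge."""
    if not edges:
        raise ValueError("the empty pattern is the root and has no parent")
    largest = max(edges)  # tuple comparison: left vertex first, then right
    return frozenset(edges) - {largest}

child = {(1, 2), (2, 3), (1, 4)}
print(sorted(reduce_pattern(child)))
```

Since every nonempty pattern has exactly one largest edge, it has exactly one parent, so repeated application of the map leads every pattern back to the empty root along a unique path.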
  • 12. Search tree induced by the reduction map • By applying the reduction map to each element, a search tree can be induced (figure)
  • 13. Inverting the reduction map (f⁻¹) • When traversing the tree from the root, child nodes are created on demand • In most cases, inverting the reduction map takes the following two steps: ★ Consider all child candidates ★ Keep the ones that the reduction map sends back to the current node • However, in this particular case, the reduction map can be inverted explicitly ★ The pattern extension rule (parent to children) can be derived
  • 15. Traversing the search tree from the root • Depth-first traversal for its memory efficiency (figure)
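
The depth-first traversal can be sketched for the simpler task from slide 9, enumerating every edge subset of one fixed linear graph. It assumes that the parent of a pattern is obtained by removing its largest edge, so a child is obtained by adding an edge larger than all edges already present. This is an illustration of the reverse-search idea under those assumptions, not the LGM extension rule itself, and `enumerate_subgraphs` is a hypothetical name.

```python
def enumerate_subgraphs(graph_edges):
    """Yield every edge subset of a linear graph exactly once, depth first."""
    edges = sorted(graph_edges)  # total order: left vertex, then right vertex

    def visit(pattern, start):
        yield tuple(pattern)
        # Only edges larger than the current largest edge are appended, so
        # removing the last edge of a child always gives back this pattern.
        for idx in range(start, len(edges)):
            pattern.append(edges[idx])
            yield from visit(pattern, idx + 1)
            pattern.pop()

    yield from visit([], 0)

for pattern in enumerate_subgraphs([(1, 2), (1, 3), (2, 4)]):
    print(pattern)
```

Only the current root-to-node path is kept in memory, which reflects the memory argument for depth-first traversal; no table of already visited patterns is needed to avoid duplicates.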
  • 16. Frequent subgraph mining • Basic idea: find all possible extensions of the current pattern in the graph database, and extend the pattern • Occurrence list L_G(g) ★ Record every occurrence of a pattern g in the graph database G ★ Calculate the support of a pattern g from the occurrence list • Use the anti-monotonicity of support for pruning (figure)
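
A rough sketch of support counting and pruning from occurrence lists. Two assumptions are made here that the slide does not state: an occurrence is stored as a pair `(graph_id, embedding)`, and support is the number of distinct database graphs containing the pattern. The function names are hypothetical.

```python
def support(occurrences):
    """Number of distinct database graphs in which the pattern occurs."""
    return len({graph_id for graph_id, _embedding in occurrences})

def frequent_children(children, tau):
    """Keep only the extensions whose support reaches the threshold tau."""
    return {pattern: occ for pattern, occ in children.items()
            if support(occ) >= tau}

# Toy occurrence lists for two candidate extensions of one parent pattern.
children = {
    "p+e1": [(0, (1, 2, 3)), (0, (1, 2, 5)), (2, (4, 6, 7))],
    "p+e2": [(1, (2, 3, 8))],
}
print(sorted(frequent_children(children, tau=2)))
```

Every occurrence of a child contains an occurrence of its parent, so a child's support cannot exceed the parent's. Once a pattern falls below τ, its whole subtree in the search tree can therefore be skipped.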
  • 17. Outline • Introduction to linear graph ★ Linear subgraph relation ★ Total order among edges • Frequent subgraph mining from a set of linear graphs • Experiments ★ Motif extraction from protein 3D structures
  • 18. Motif extraction from protein 3D structures • Pairs of homologous proteins from thermophilic and mesophilic organisms • Construct a linear graph from a protein ★ Use the vertex order from the N- to the C-terminal ★ Assign vertex labels from {1,...,6} ★ Draw an edge between pairs of amino acid residues whose distance is within 5 Å • # of data: 742, avg. # of vertices: 371, avg. # of edges: 496 • Rank the enumerated patterns by statistical significance (p-value) ★ Association with thermophilic/mesophilic labels ★ Fisher exact test
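
The ranking step can be illustrated with a small standard-library implementation of the two-sided Fisher exact test on a 2×2 table. The table layout (pattern present/absent by thermophilic/mesophilic) and the counts below are made up for illustration; the slides do not say which variant of the test was used.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = pattern present / absent,
                    columns = thermophilic / mesophilic.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with all
        # row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at most as likely as the observed
    # one; the small tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))

# Hypothetical counts: the pattern occurs in 8 of 9 thermophilic proteins
# and in 2 of 7 mesophilic proteins.
print(fisher_exact_two_sided(8, 2, 1, 5))
```

Patterns would then be sorted by this p-value, with smaller values indicating a stronger association with one of the two labels.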
  • 19. Runtime comparison • Compared to gSpan • Made gapped linear graphs and ran gSpan on them • LGM is faster than gSpan
  • 20. Minimum support = 10 • 103 patterns whose p-value < 0.001 • Thermophilic (TATA), Mesophilic (pol II) ★ Share the function as DNA-binding proteins, but their thermostability differs
  • 21. Mapping motifs in 3D structure • Thermophilic (TATA), Mesophilic (pol II)
  • 22. Summary • Efficient subgraph mining algorithm for linear graphs • Search tree is defined by the reverse search principle • Patterns include disconnected subgraphs • Enumeration runs with polynomial delay • Interesting patterns from proteins