Frequent subgraph discovery for a single large graph

120808

  1. Frequent subgraph discovery for a single large graph
  2. Agenda
     • Motivation
     • Summary of existing approaches
     • Support computations
     • Comparison and Evaluation
  3. Background
     • Frequent subgraph mining
       – Graph-transaction setting (for graph datasets)
         • Many small graphs
       – Single-graph setting
         • One big graph
     • New problem for the single-graph setting
       – Definition of support
  4. Challenge
     • Defining support in a single large graph is difficult
       – The anti-monotone property is required for pruning the search space
     • Anti-monotone
       – $A \subseteq B \Rightarrow \mathrm{sup}(A) \geq \mathrm{sup}(B)$
  5. Subgraph Support
     • The most intuitive definition
       – Count of embeddings in the input graph
         • Not anti-monotone
     (Figure: example patterns with embedding counts 1, 2, 2, 5)
  6. Motivation
     • Suggest a new definition of subgraph support such that
       – The resulting support is anti-monotone
       – The support can be computed efficiently
     • Three support computation algorithms
       – Overlap based (2)
       – Minimum image based (1)
  7. Agenda
     • Motivation
     • Summary of existing approaches
     • Support computations
       – Simple overlap (overlap based method)
       – Harmful overlap (overlap based method)
       – Minimum image
     • Comparison and Evaluation
  8. Overlap based support
     • The size of a maximum independent set (MIS)
       – Find the overlaps
       – Find the size of a maximum independent node set
  9. Overlap
     • Two embeddings overlap if they share at least one node
     • $V_1 \cap V_2 \neq \emptyset$ ($V_1$, $V_2$: node sets of the two embeddings)
     • An embedding is an occurrence of the pattern
  10. Overlap Graph
     • $O = (V_O, E_O)$
       – $V_O$: the set of embeddings, used as the node set
       – $E_O = \{(f_1, f_2) \mid f_1, f_2 \in V_O \wedge f_1 \not\equiv f_2 \wedge V_1 \cap V_2 \neq \emptyset\}$, where $V_1$ and $V_2$ are the node sets of $f_1$ and $f_2$
       – If two embeddings share at least one node, the corresponding nodes of the overlap graph are connected
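The overlap graph construction can be sketched in a few lines of Python. This is only an illustration: the function name `overlap_graph` and the representation of each embedding as a plain set of data-graph node ids are assumptions, not something the slides specify.

```python
from itertools import combinations


def overlap_graph(embeddings):
    """Build the overlap graph of a list of embeddings.

    Each embedding is given as a set of data-graph node ids (an assumed
    representation). Overlap-graph nodes are the embedding indices; an
    edge (i, j) is added when embeddings i and j share at least one node.
    """
    node_sets = [frozenset(e) for e in embeddings]
    edges = {
        (i, j)
        for i, j in combinations(range(len(node_sets)), 2)
        if node_sets[i] & node_sets[j]  # non-empty intersection = overlap
    }
    return list(range(len(node_sets))), edges


# Hypothetical example: embeddings 0 and 1 share node 2, embedding 2 is disjoint.
nodes, edges = overlap_graph([{1, 2}, {2, 3}, {4, 5}])
print(nodes, edges)
```

Comparing every pair of embeddings is quadratic in the number of embeddings, which is the cost the minimum image approach later avoids.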
  11. Maximum Independent Set Support
     • Independent node set of a graph $G = (V, E)$
       – $I \subseteq V$ with $\forall u, v \in I : (u, v) \notin E$
       – A maximum independent node set need not be unique
     (Figure: two example graphs whose maximum independent node sets have size 1 and size 2)
     • MIS-support = size of a maximum independent node set
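As a rough illustration of MIS-support, the following brute-force sketch returns the size of a maximum independent node set. The function name `mis_size` and the edge-list input are assumptions made for this example. The search is exponential, so it is only usable on very small overlap graphs; it is not the method used in the experiments.

```python
from itertools import combinations


def mis_size(num_nodes, edges):
    """Size of a maximum independent node set, found by exhaustive search.

    Nodes are 0 .. num_nodes - 1; edges is an iterable of node pairs.
    Candidate subsets are tried from largest to smallest, so the first
    independent subset found has maximum size.
    """
    edge_set = {frozenset(e) for e in edges}
    for k in range(num_nodes, 0, -1):
        for subset in combinations(range(num_nodes), k):
            if all(frozenset(pair) not in edge_set
                   for pair in combinations(subset, 2)):
                return k
    return 0


# Hypothetical inputs: a triangle and a three-node path.
print(mis_size(3, [(0, 1), (1, 2), (0, 2)]))  # triangle
print(mis_size(3, [(0, 1), (1, 2)]))          # path 0 - 1 - 2
```

In the triangle every pair of nodes is adjacent, so at most one node can be chosen; in the path the two end nodes are independent.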
  12. Harmful Overlap Support (1/3)
     • MIS-support
       – Treats any overlap as harmful
     • An overlap is not necessarily harmful
       – What matters is the anti-monotone property
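  13. Harmful Overlap Support (2/3)
     • Harmful overlap graph $H = (V_H, E_H)$
       – $V_H$: the set of embeddings, used as the node set
       – $E_H = \{(f_1, f_2) \mid f_1, f_2 \in V_H \wedge f_1 \not\equiv f_2 \wedge (V_1 = V_2 \vee \text{ancestors of } V_1 = \text{ancestors of } V_2)\}$, where $V_1$ and $V_2$ are the node sets of $f_1$ and $f_2$
     • HO-support = size of a maximum independent node set of $H$
     (Figure: example in which MIS-support = 1 and HO-support = 2)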
  14. Harmful Overlap Support (3/3)
     • Satisfies the anti-monotone property
     (Figure: example with #A = 2, #B = 3, #AB = 2, #BAB = 2)
  15. Note
     • Harmful overlap is a weaker concept than simple overlap
       – HO-support is never lower than MIS-support
  16. Experiment
     • Support computation implemented as part of the MoSS (Molecular Substructure Miner) program
       – IC93 dataset [7]
         • 1283 molecules, each forming a connected component
       – Tic-Tac-Toe win dataset
         • Consists of 626 connected components
  17. Result
     • Vertical axis: number of frequent subgraphs whose support exceeds the threshold
     • Horizontal axis: number of nodes (of the pattern?)
     • For IC93
       – Up to 30% more
         • Due to heavy overlap among carbon atoms
     • For Tic-Tac-Toe
       – Around 5% more
  18. Agenda
     • Motivation
     • Summary of existing approaches
     • Support computations
       – Simple overlap
       – Harmful overlap
       – Minimum image
     • Comparison and Evaluation
  19. Minimum image based definition
     • Minimum image based support of p in g
       – For each pattern node, count the unique data-graph nodes it is mapped to across all embeddings; the support is the minimum of these counts
     (Figure: table of embeddings and the number of unique nodes mapped to each pattern node)
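A minimal sketch of the minimum image based support, assuming each embedding is available as a dictionary from pattern nodes to data-graph nodes. The function name `minimum_image_support` and the example data are hypothetical and do not reproduce the table on the slide.

```python
def minimum_image_support(pattern_nodes, embeddings):
    """Minimum, over pattern nodes, of the number of distinct images.

    pattern_nodes: non-empty iterable of pattern node ids.
    embeddings: list of dicts mapping every pattern node to a data-graph node.
    """
    if not embeddings:
        return 0
    return min(
        len({phi[v] for phi in embeddings})  # distinct images of pattern node v
        for v in pattern_nodes
    )


# Hypothetical example with a two-node pattern (nodes "a" and "b").
embeddings = [
    {"a": 1, "b": 2},
    {"a": 1, "b": 3},
    {"a": 4, "b": 2},
]
# "a" is mapped to {1, 4} and "b" to {2, 3}, so the support is 2.
print(minimum_image_support(["a", "b"], embeddings))
```

No pairwise comparison of embeddings is needed here: one pass over the embeddings per pattern node is enough.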
  20. Benefits
     I. An $O(N)$ data set instead of $O(N^2)$ overlaps
     II. No NP-complete MIS problem
     III. Not necessary to compute all occurrences, only the images of all nodes
  21. Agenda
     • Motivation
     • Summary of existing approaches
     • Support computations
       – Simple overlap
       – Harmful overlap
       – Minimum image
     • Comparison and Evaluation
  22. Embedding of a Pattern
     • Pattern $p = (V_p, E_p, \lambda_p)$
     • Data graph $g = (V_g, E_g, \lambda_g)$
     • Embedding function $\varphi : V_p \to V_g$
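  23. Three support measures
     • Simple overlap
       – Occurrences $\varphi$ and $\varphi'$ of pattern $p$ have a simple overlap if $\varphi(V_p) \cap \varphi'(V_p) \neq \emptyset$
     • Harmful overlap
       – Occurrences $\varphi$ and $\varphi'$ of pattern $p$ have a harmful overlap if $\exists v \in V_p : \varphi(v), \varphi'(v) \in \varphi(V_p) \cap \varphi'(V_p)$
     • Minimum image based support of $p$ in $g$
       – $\sigma_3(p, g) = \min_{v \in V_p} |\{\varphi_i(v) : \varphi_i \text{ is an embedding of } p \text{ in } g\}|$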
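The two overlap conditions above can be written as small predicates. This is a sketch under the assumption that each occurrence is a dictionary from pattern nodes to data-graph nodes; the names `simple_overlap` and `harmful_overlap` and the example occurrences are illustrative only.

```python
def simple_overlap(phi1, phi2):
    """True if the two occurrences share at least one data-graph node."""
    return bool(set(phi1.values()) & set(phi2.values()))


def harmful_overlap(phi1, phi2):
    """True if some pattern node v has both of its images, phi1[v] and
    phi2[v], inside the set of nodes shared by the two occurrences."""
    shared = set(phi1.values()) & set(phi2.values())
    return any(phi1[v] in shared and phi2[v] in shared for v in phi1)


# Hypothetical occurrences of a three-node pattern (nodes x, y, z).
phi_a = {"x": 1, "y": 2, "z": 3}
phi_b = {"x": 3, "y": 4, "z": 5}
# They share data node 3, but no pattern node has both images in {3}.
print(simple_overlap(phi_a, phi_b), harmful_overlap(phi_a, phi_b))

# Two occurrences of a two-node pattern covering the same nodes, swapped.
phi_c = {"x": 1, "y": 2}
phi_d = {"x": 2, "y": 1}
print(simple_overlap(phi_c, phi_d), harmful_overlap(phi_c, phi_d))
```

The first pair shows an overlap that is simple but not harmful, which is why a support based on harmful overlap can be larger than one based on simple overlap.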
  24. Comparison
     • In the slide's example: $\sigma_1 = 1 < \sigma_2 = 2 < \sigma_3 = 3$
       – $\sigma_1$: overlap; $\sigma_2$: harmful overlap; $\sigma_3$: minimum image
  25. Experimental Setting
     • Comparison of image-based and overlap-based algorithms
     • Dataset
       – WebKB dataset (4 large graphs representing the structure of web pages)
  26. Experiment Result
  27. Conclusion
     • Overlap based support measures that are anti-monotone
     • Minimum image based algorithm that is more efficient than previous ones
