# 120808

Published in: Technology

### 120808

1. Frequent subgraph discovery for a single large graph
2. Agenda
   - Motivation
   - Summary of existing approaches
   - Support computations
   - Comparison and evaluation
3. Background
   - Frequent subgraph mining
     - Graph-transaction setting (for graph datasets): many small graphs
     - Single-graph setting: one big graph
   - New problem for the single-graph setting
     - Definition of support
4. Challenge
   - Defining support in a large graph is difficult
     - The support must be anti-monotone so that it can be used to prune the search space
   - Anti-monotone
     - $A \subseteq B \Rightarrow \mathrm{sup}(A) \geq \mathrm{sup}(B)$
5. Subgraph Support
   - The most intuitive definition: the count of embeddings in the input graph
     - Not anti-monotone
   - (Figure: example patterns with embedding counts 1, 2, 2, 5.)
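
The failure of anti-monotonicity for raw embedding counts can be reproduced on a toy graph. The following is a minimal sketch and is not taken from the slides: the star-shaped data graph, the `embeddings` helper, and the node and label names are all illustrative assumptions, and the brute-force search is only practical for tiny inputs.

```python
from itertools import permutations


def embeddings(p_labels, p_edges, g_labels, g_edges):
    """Brute-force every injective, label- and edge-preserving map
    from the pattern nodes to the data-graph nodes (undirected graphs)."""
    g_adj = {frozenset(e) for e in g_edges}
    p_nodes = sorted(p_labels)
    found = []
    for image in permutations(sorted(g_labels), len(p_nodes)):
        phi = dict(zip(p_nodes, image))
        labels_ok = all(p_labels[v] == g_labels[phi[v]] for v in p_nodes)
        edges_ok = all(frozenset((phi[u], phi[v])) in g_adj for u, v in p_edges)
        if labels_ok and edges_ok:
            found.append(phi)
    return found


# Hypothetical data graph: one A-labeled center connected to three B-labeled leaves.
g_labels = {0: "A", 1: "B", 2: "B", 3: "B"}
g_edges = [(0, 1), (0, 2), (0, 3)]

# Pattern "A" (one node) and its supergraph "A-B" (one edge).
single_a = embeddings({"a": "A"}, [], g_labels, g_edges)
edge_ab = embeddings({"a": "A", "b": "B"}, [("a", "b")], g_labels, g_edges)

print(len(single_a), len(edge_ab))
```

In this toy graph the one-node pattern has a single embedding while the larger pattern has three, so the count grows as the pattern grows, which is exactly what an anti-monotone measure must rule out.
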
6. Motivation
   - Suggest a new definition of subgraph support such that
     - the resulting support is anti-monotone
     - the support can be computed efficiently
   - Three support computation algorithms
     - Overlap based (2)
     - Minimum image based (1)
7. Agenda
   - Motivation
   - Summary of existing approaches
   - Support computations
     - Simple overlap (overlap-based method)
     - Harmful overlap (overlap-based method)
     - Minimum image
   - Comparison and evaluation
8. Overlap-based support
   - The size of a maximum independent set (MIS)
     - Find the overlaps
     - Find the size of a maximum independent node set
9. Overlap
   - Two embeddings overlap if they share at least one node
   - $V_1 \cap V_2 \neq \emptyset$, where $V_1, V_2$ are the node sets of the two embeddings
   - An embedding is an occurrence of the pattern
10. Overlap Graph
    - $O = (V_O, E_O)$
      - $V_O$: the set of embeddings, used as the node set
      - $E_O = \{(f_1, f_2) \mid f_1, f_2 \in V_O \land f_1 \neq f_2 \land V_1 \cap V_2 \neq \emptyset\}$, where $V_i$ is the node set of embedding $f_i$
      - If two embeddings share at least one node, the corresponding nodes of the overlap graph are connected
11. Maximum Independent Set Support
    - Independent node set of a graph $G = (V, E)$
      - $I \subseteq V$ with $\forall u, v \in I: (u, v) \notin E$
      - A maximum independent node set need not be unique
    - MIS-support = size of a maximum independent node set
    - (Figure: two example graphs whose maximum independent node sets have sizes 1 and 2.)
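
The overlap graph and MIS-support described on slides 8 to 11 can be sketched as follows. This is an illustrative brute-force version, assuming each embedding is given simply as a set of data-graph node ids; the names `overlap_graph` and `mis_support` are hypothetical. The exhaustive subset search is exponential in the number of embeddings, so it is only meant for small examples.

```python
from itertools import combinations


def overlap_graph(embeddings):
    """Edges (i, j), i < j, between embeddings that share at least one node."""
    return {
        (i, j)
        for i, j in combinations(range(len(embeddings)), 2)
        if embeddings[i] & embeddings[j]
    }


def mis_support(embeddings):
    """Size of a maximum independent node set of the overlap graph."""
    edges = overlap_graph(embeddings)
    n = len(embeddings)
    # Try the largest subsets first; the first independent one is maximum.
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all((i, j) not in edges for i, j in combinations(subset, 2)):
                return size
    return 0


# Hypothetical example: the three embeddings of a single-edge pattern
# in the unlabeled path 1 - 2 - 3 - 4, given as node sets.
path_edges = [{1, 2}, {2, 3}, {3, 4}]

print(overlap_graph(path_edges))
print(mis_support(path_edges))
```

The middle embedding overlaps both of its neighbors, so only the two outer embeddings can be counted together.
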
12. Harmful Overlap Support (1/3)
    - MIS-support
      - Treats any overlap as harmful
    - An overlap is not necessarily harmful
      - What matters is that the anti-monotone property is preserved
13. Harmful Overlap Support (2/3)
    - Harmful overlap graph $H = (V_H, E_H)$
      - $V_H$: the set of embeddings, used as the node set
      - $E_H = \{(f_1, f_2) \mid f_1, f_2 \in V_H \land f_1 \neq f_2 \land (V_1 = V_2 \lor \text{ancestors of } V_1 = \text{ancestors of } V_2)\}$, where $V_i$ is the node set of embedding $f_i$
    - HO-support
    - (Figure: an example in which MIS-support = 1 and HO-support = 2.)
14. Harmful Overlap Support (3/3)
    - Completing the anti-monotone property
    - (Figure: example with #A = 2, #B = 3, #AB = 2, #BAB = 2.)
15. Note
    - Harmful overlap is a weaker concept than simple overlap
      - HO-support is never lower than MIS-support
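
The difference between simple and harmful overlap can be made concrete with a small sketch. It uses the harmful-overlap condition stated later in the deck (slide 23): two occurrences overlap harmfully if some pattern node is mapped by both of them into the shared node set. The function names, the dict representation of embeddings, and the two example embeddings are assumptions made for illustration only.

```python
from itertools import combinations


def simple_overlap(phi1, phi2):
    """True if the two embeddings share at least one data-graph node."""
    return bool(set(phi1.values()) & set(phi2.values()))


def harmful_overlap(phi1, phi2):
    """True if some pattern node v has both phi1[v] and phi2[v]
    inside the set of nodes shared by the two embeddings."""
    shared = set(phi1.values()) & set(phi2.values())
    return any(phi1[v] in shared and phi2[v] in shared for v in phi1)


def independent_set_support(embeddings, conflict):
    """Largest number of embeddings that are pairwise free of `conflict`
    (brute force, exponential; for tiny examples only)."""
    for size in range(len(embeddings), 0, -1):
        for subset in combinations(embeddings, size):
            if not any(conflict(a, b) for a, b in combinations(subset, 2)):
                return size
    return 0


# Two hypothetical embeddings of a three-node pattern (p1, p2, p3)
# that touch only at data node 3.
phi1 = {"p1": 1, "p2": 2, "p3": 3}
phi2 = {"p1": 3, "p2": 4, "p3": 5}

mis = independent_set_support([phi1, phi2], simple_overlap)
ho = independent_set_support([phi1, phi2], harmful_overlap)
print(mis, ho)
```

For this pair of embeddings the shared node is the image of different pattern nodes, so the overlap counts as simple but not harmful, and the harmful-overlap variant keeps both embeddings.
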
16. Experiment
    - Support computation implemented as part of the MoSS (Molecular Substructure Miner) program
      - IC93 dataset [7]
        - 1283 molecules, each forming a connected component
      - Tic-Tac-Toe win dataset
        - Consists of 626 connected components
17. Result
    - Vertical axis: number of frequent subgraphs whose support exceeds the threshold
    - Horizontal axis: number of nodes (of the pattern?)
    - For IC93
      - Up to 30% more frequent subgraphs
        - Due to heavy overlapping of carbon atoms
    - For Tic-Tac-Toe
      - Around 5% more frequent subgraphs
18. Agenda
    - Motivation
    - Summary of existing approaches
    - Support computations
      - Simple overlap
      - Harmful overlap
      - Minimum image
    - Comparison and evaluation
19. Minimum image based definition
    - Minimum image based support of p in g
      - The minimum, over the pattern nodes, of the number of unique graph nodes that the pattern node is mapped to
    - (Figure: example table with columns "Embeddings" and "Unique".)
20. Benefits
    - I. Instead of $O(N^2)$ overlaps, only $O(N)$ data
    - II. No NP-complete MIS problem to solve
    - III. It is not necessary to compute all occurrences, only the nodes each pattern node maps to
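
The minimum image based support is simple enough to sketch in a few lines. The version below is illustrative only: it assumes the embeddings are already available as dicts from pattern node to data-graph node, and the name `minimum_image_support` and the example data are hypothetical. For clarity it iterates over a full list of embeddings, although benefit III above notes that listing every occurrence is not strictly required.

```python
def minimum_image_support(embeddings):
    """Minimum, over pattern nodes, of the number of distinct
    data-graph nodes that the pattern node is mapped to."""
    if not embeddings:
        return 0
    pattern_nodes = list(embeddings[0])
    return min(len({phi[v] for phi in embeddings}) for v in pattern_nodes)


# Hypothetical embeddings of an edge pattern (a, b) in a star graph:
# node 0 is the center, nodes 1 to 3 are the leaves.
star = [{"a": 0, "b": 1}, {"a": 0, "b": 2}, {"a": 0, "b": 3}]

support = minimum_image_support(star)
print(support)
```

No overlap graph is built and no independent set is searched: one pass collects the image set of each pattern node. In the star example the pattern node `a` always maps to the center, so the support is 1 even though there are three embeddings.
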
21. Agenda
    - Motivation
    - Summary of existing approaches
    - Support computations
      - Simple overlap
      - Harmful overlap
      - Minimum image
    - Comparison and evaluation
22. Embedding of a Pattern
    - Pattern $p = (V_p, E_p, \lambda_p)$
    - Data graph $g = (V_g, E_g, \lambda_g)$
    - Embedding function $\varphi: V_p \to V_g$
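23. Three support measures
    - Simple overlap
      - A simple overlap of occurrences $\varphi$ and $\varphi'$ of pattern $p$ exists if $\varphi(V_p) \cap \varphi'(V_p) \neq \emptyset$
    - Harmful overlap
      - A harmful overlap of occurrences $\varphi$ and $\varphi'$ of pattern $p$ exists if $\exists v \in V_p: \varphi(v), \varphi'(v) \in \varphi(V_p) \cap \varphi'(V_p)$
    - Minimum image based support of $p$ in $g$
      - $\sigma_3(p, g) = \min_{v \in V_p} |\{\varphi_i(v) : \varphi_i \text{ is an embedding of } p \text{ in } g\}|$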
24. Comparison
    - $\sigma_1 = 1 < \sigma_2 = 2 < \sigma_3 = 3$
    - $\sigma_1$: overlap, $\sigma_2$: harmful overlap, $\sigma_3$: minimum image
25. Experimental Setting
    - Comparison of image-based and overlap-based algorithms
    - Dataset
      - WebKB dataset (4 large graphs of the structure of web pages)
26. Experiment Result
27. Conclusion
    - An overlap-based support measure that is anti-monotone
    - A minimum image based algorithm that is more efficient than previous ones