- 1. Frequent subgraph discovery for a single large graph
- 2. Agenda
  - Motivation
  - Summary of existing approaches
  - Support computations
  - Comparison and Evaluation
- 3. Background
  - Frequent subgraph mining
    - Graph-transaction setting (graph datasets): many small graphs
    - Single-graph setting: one big graph
  - New problem in the single-graph setting: how to define support
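- 4. Challenge
  - Defining support in a single large graph is difficult
    - Support must be anti-monotone so that the search space can be pruned
  - Anti-monotone: $A \subseteq B \Rightarrow \mathrm{sup}(A) \ge \mathrm{sup}(B)$, i.e. a pattern never has higher support than any of its subpatterns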
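- 5. Subgraph Support
  - The most intuitive definition: the number of embeddings of the pattern in the input graph
    - Not anti-monotone: a larger pattern can have more embeddings than its subpattern
  - (Figure: embedding counts 1, 2, 2, 5 of the example patterns)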
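To make the failure of anti-monotonicity concrete, here is a minimal brute-force sketch (not taken from the slides). It assumes undirected graphs with node labels only, given as a label dictionary plus an edge list, and treats an embedding as an injective map that preserves labels and edges. The function `embeddings` and the example graph are hypothetical. In a star whose centre is labelled A and whose three leaves are labelled B, the single-node pattern A has one embedding, while its superpattern A-B has three.

```python
from itertools import permutations


def embeddings(p_labels, p_edges, g_labels, g_edges):
    """Enumerate all embeddings of pattern p in data graph g by brute force.

    Graphs are undirected: *_labels maps node -> label, *_edges lists node pairs.
    An embedding is an injective map that preserves node labels and edges.
    """
    g_adj = {frozenset(e) for e in g_edges}
    p_nodes = list(p_labels)
    found = []
    # Try every injective assignment of pattern nodes to data-graph nodes.
    for image in permutations(g_labels, len(p_nodes)):
        phi = dict(zip(p_nodes, image))
        labels_ok = all(p_labels[v] == g_labels[phi[v]] for v in p_nodes)
        edges_ok = all(frozenset((phi[u], phi[v])) in g_adj for u, v in p_edges)
        if labels_ok and edges_ok:
            found.append(phi)
    return found


# Data graph: a star with centre 0 labelled "A" and leaves 1-3 labelled "B".
g_labels = {0: "A", 1: "B", 2: "B", 3: "B"}
g_edges = [(0, 1), (0, 2), (0, 3)]

single_a = embeddings({"x": "A"}, [], g_labels, g_edges)
edge_ab = embeddings({"x": "A", "y": "B"}, [("x", "y")], g_labels, g_edges)
print(len(single_a), len(edge_ab))  # 1 3: the larger pattern has more embeddings
```

Trying every node assignment is exponential and only meant for toy graphs; a practical miner would use a proper subgraph-isomorphism search instead.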
- 6. Motivation
  - Suggest a new definition of subgraph support such that
    - the resulting support is anti-monotone
    - the support can be computed efficiently
  - Three support computation methods
    - Overlap based (2)
    - Minimum image based (1)
- 7. Agenda
  - Motivation
  - Summary of existing approaches
  - Support computations
    - Overlap based methods: simple overlap, harmful overlap
    - Minimum image
  - Comparison and Evaluation
- 8. Overlap based support
  - Support = size of a maximum independent set (MIS) of the overlap graph
    - Find the overlaps between embeddings
    - Find the size of a maximum independent node set
- 9. Overlap
  - Two embeddings overlap if they share at least one node: $V_1 \cap V_2 \neq \emptyset$ ($V_1, V_2$: node sets of the two embeddings)
  - An embedding is an occurrence of the pattern
- 10. Overlap Graph
  - $O = (V_O, E_O)$
    - $V_O$: the set of embeddings, used as the node set
    - $E_O = \{(f_1, f_2) \mid f_1, f_2 \in V_O \wedge f_1 \not\equiv f_2 \wedge V_1 \cap V_2 \neq \emptyset\}$, where $V_1, V_2$ are the node sets of $f_1, f_2$
  - If two embeddings share at least one node, their nodes in the overlap graph are connected by an edge
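A minimal sketch of building the overlap graph, assuming each embedding is given as a Python dict from pattern nodes to data-graph nodes and that the list holds only distinct embeddings, so every pair of list indices is a candidate edge. The function name `overlap_graph` is hypothetical. In the usage example, three embeddings of an A-B edge that all use data node 0 form a triangle, while a fourth embedding on other nodes stays isolated.

```python
from itertools import combinations


def overlap_graph(embeddings):
    """Build the overlap graph O = (V_O, E_O) of a list of embeddings.

    Each embedding is a dict from pattern nodes to data-graph nodes, and the
    list is assumed to hold distinct embeddings. Nodes of O are list indices;
    (i, j) is an edge when embeddings i and j share at least one data node.
    """
    node_sets = [set(f.values()) for f in embeddings]
    nodes = list(range(len(embeddings)))
    edges = {
        (i, j)
        for i, j in combinations(nodes, 2)
        if node_sets[i] & node_sets[j]  # the two node sets intersect
    }
    return nodes, edges


# Three embeddings of an A-B edge that all use data node 0, plus one that does not.
embs = [{"x": 0, "y": 1}, {"x": 0, "y": 2}, {"x": 0, "y": 3}, {"x": 4, "y": 5}]
nodes, edges = overlap_graph(embs)
print(sorted(edges))  # [(0, 1), (0, 2), (1, 2)]
```

Checking all pairs is what makes this step quadratic in the number of embeddings.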
- 11. Maximum Independent Set Support
  - Independent node set of a graph $G = (V, E)$: $I \subseteq V$ with $\forall u, v \in I: (u, v) \notin E$
  - A maximum independent node set need not be unique
  - (Figure: two examples whose maximum independent node sets have sizes 1 and 2)
  - MIS-support = size of a maximum independent node set of the overlap graph
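MIS-support needs the size of a maximum independent set of the overlap graph. Because that problem is NP-hard, the sketch below (hypothetical function `mis_size`) simply tries node subsets from the largest size downwards; it is only suitable for very small overlap graphs.

```python
from itertools import combinations


def mis_size(nodes, edges):
    """Size of a maximum independent set of an undirected graph.

    Exhaustive search from the largest subset size downwards; exponential
    time, so only usable on very small overlap graphs.
    """
    adjacent = {frozenset(e) for e in edges}
    for k in range(len(nodes), 0, -1):
        for subset in combinations(nodes, k):
            # Independent: no two chosen nodes are joined by an edge.
            if all(frozenset(pair) not in adjacent for pair in combinations(subset, 2)):
                return k
    return 0


# Three pairwise-overlapping embeddings (a triangle): MIS-support 1.
triangle = mis_size([0, 1, 2], [(0, 1), (0, 2), (1, 2)])
# Four embeddings overlapping in a chain 0-1-2-3: MIS-support 2.
chain = mis_size([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)])
print(triangle, chain)  # 1 2
```

For the chain, both $\{0, 2\}$ and $\{1, 3\}$ are maximum independent sets, which illustrates that the set itself need not be unique even though its size is.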
- 12. Harmful Overlap Support (1/3)
  - MIS-support treats every overlap as harmful
  - But an overlap is not necessarily harmful
    - What matters is preserving the anti-monotone property
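- 13. Harmful Overlap Support (2/3)
  - Harmful overlap graph $H = (V_H, E_H)$
    - $V_H$: the set of embeddings, used as the node set
    - $E_H = \{(f_1, f_2) \mid f_1, f_2 \in V_H \wedge f_1 \not\equiv f_2 \wedge (V_1 = V_2 \vee \text{ancestors of } V_1 = \text{ancestors of } V_2)\}$, where $V_1, V_2$ are the node sets of $f_1, f_2$
  - HO-support = size of a maximum independent node set of $H$
  - In the example figure, MIS-support = 1 while HO-support = 2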
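The two overlap notions can be compared directly on pairs of embeddings. This sketch follows the harmful-overlap condition as stated on slide 23 (some pattern node $v$ has both $\varphi(v)$ and $\varphi'(v)$ in the shared node set). Embeddings are assumed to be dicts from pattern nodes to data nodes, and the function names and example are hypothetical: the pattern is a single edge a-b, mapped into a path 1-2-3 whose nodes all carry the same label.

```python
def simple_overlap(phi, psi):
    """phi(V_p) and psi(V_p) share at least one data-graph node."""
    return bool(set(phi.values()) & set(psi.values()))


def harmful_overlap(phi, psi):
    """Some pattern node v has both phi(v) and psi(v) in the shared node set."""
    shared = set(phi.values()) & set(psi.values())
    return any(phi[v] in shared and psi[v] in shared for v in phi)


# Pattern: an edge a-b; embeddings into a path 1-2-3 (all nodes with the same label).
phi1 = {"a": 1, "b": 2}
phi2 = {"a": 2, "b": 3}
phi3 = {"a": 3, "b": 2}
print(simple_overlap(phi1, phi2), harmful_overlap(phi1, phi2))  # True False
print(simple_overlap(phi1, phi3), harmful_overlap(phi1, phi3))  # True True
```

`phi1` and `phi2` share node 2, but it is the image of different pattern nodes, so the overlap is not harmful; `phi1` and `phi3` both map `b` to node 2, so theirs is.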
- 14. Harmful Overlap Support (3/3)
  - The anti-monotone property is preserved
  - Supports in the example figure: #A = 2, #B = 3, #AB = 2, #BAB = 2 (support never increases as the pattern grows)
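- 15. Note
  - Harmful overlap is a weaker concept than simple overlap
    - Every harmful overlap is also a simple overlap, so $H$ has a subset of the edges of $O$ on the same node set
    - Hence HO-support is never lower than MIS-support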
- 16. Experiment
  - Support computation implemented as part of the MoSS (Molecular Substructure Miner) program
  - IC93 dataset [7]
    - 1283 molecules, each forming a connected component
  - Tic-Tac-Toe win dataset
    - 626 connected components
- 17. Result
  - Vertical axis: number of frequent subgraphs whose support exceeds the threshold
  - Horizontal axis: number of nodes of the pattern
  - IC93: up to 30% more frequent subgraphs with HO-support than with MIS-support, because embeddings overlap heavily on carbon atoms
  - Tic-Tac-Toe: around 5% more
- 18. Agenda
  - Motivation
  - Summary of existing approaches
  - Support computations
    - Simple overlap
    - Harmful overlap
    - Minimum image
  - Comparison and Evaluation
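- 19. Minimum image based definition
  - Minimum image based support of $p$ in $g$: for each pattern node, count the unique data-graph nodes it is mapped to over all embeddings; the support is the minimum of these counts
  - (Figure: table of embeddings and the number of unique nodes mapped to each pattern node)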
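A direct sketch of the minimum image based support, assuming the embeddings have already been enumerated as dicts from pattern nodes to data-graph nodes (the function name `min_image_support` is hypothetical). For an A-B edge pattern whose three embeddings all map the A node to the same data node, the image of that node has size 1, so the support is 1 even though there are three embeddings.

```python
def min_image_support(embeddings):
    """Minimum image based support of a pattern.

    embeddings: list of dicts mapping pattern nodes to data-graph nodes
    (an empty list has support 0). For each pattern node v, collect the
    distinct images phi(v) over all embeddings; the support is the
    smallest of these image sizes.
    """
    if not embeddings:
        return 0
    images = {v: set() for v in embeddings[0]}
    for phi in embeddings:
        for v, u in phi.items():
            images[v].add(u)
    return min(len(image) for image in images.values())


# Three embeddings of an A-B edge that all map the A node "x" to data node 0.
star = [{"x": 0, "y": 1}, {"x": 0, "y": 2}, {"x": 0, "y": 3}]
print(min_image_support(star))  # 1: "x" has a single image, "y" has three
```

No overlap graph and no independent-set search are involved: one pass over the embeddings suffices.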
- 20. Benefits
  - I. Instead of $O(N^2)$ pairwise overlaps, only $O(N_{\text{dataset}})$
  - II. No NP-complete MIS problem has to be solved
  - III. Not all occurrences need to be computed, only enough to determine the images of all pattern nodes
- 21. Agenda
  - Motivation
  - Summary of existing approaches
  - Support computations
    - Simple overlap
    - Harmful overlap
    - Minimum image
  - Comparison and Evaluation
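- 22. Embedding of a Pattern
  - Pattern $p = (V_p, E_p, \lambda_p)$
  - Data graph $g = (V_g, E_g, \lambda_g)$
  - Embedding function $\varphi: V_p \to V_g$, an injective map that preserves labels and maps every edge of $p$ to an edge of $g$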
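The following sketch checks whether a given mapping $\varphi$ is an embedding, assuming the usual conditions in subgraph mining: $\varphi$ is defined on all of $V_p$, is injective, preserves node labels, and maps every pattern edge to an edge of $g$. Graphs are undirected, edge labels are ignored, and the function name `is_embedding` is hypothetical.

```python
def is_embedding(phi, p_labels, p_edges, g_labels, g_edges):
    """Check whether phi: V_p -> V_g is an embedding of pattern p in graph g.

    Undirected graphs with node labels only (edge labels are ignored).
    Conditions assumed: phi is defined on all of V_p, is injective,
    preserves node labels, and maps every pattern edge onto an edge of g.
    """
    if set(phi) != set(p_labels):
        return False
    injective = len(set(phi.values())) == len(phi)
    labels_ok = all(p_labels[v] == g_labels.get(phi[v]) for v in p_labels)
    g_adj = {frozenset(e) for e in g_edges}
    edges_ok = all(frozenset((phi[u], phi[v])) in g_adj for u, v in p_edges)
    return injective and labels_ok and edges_ok


p_labels, p_edges = {"x": "A", "y": "B"}, [("x", "y")]
g_labels = {0: "A", 1: "B", 2: "B"}
g_edges = [(0, 1), (0, 2)]
print(is_embedding({"x": 0, "y": 2}, p_labels, p_edges, g_labels, g_edges))  # True
print(is_embedding({"x": 1, "y": 2}, p_labels, p_edges, g_labels, g_edges))  # False: node 1 is labelled B
```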
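- 23. Three support measures
  - Simple overlap: occurrences $\varphi$ and $\varphi'$ of pattern $p$ overlap if $\varphi(V_p) \cap \varphi'(V_p) \neq \emptyset$
  - Harmful overlap: occurrences $\varphi$ and $\varphi'$ of pattern $p$ overlap harmfully if $\exists v \in V_p: \varphi(v), \varphi'(v) \in \varphi(V_p) \cap \varphi'(V_p)$
  - Minimum image based support of $p$ in $g$: $\sigma_3(p, g) = \min_{v \in V_p} \left|\{\varphi_i(v) : \varphi_i \text{ is an embedding of } p \text{ in } g\}\right|$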
- 24. Comparison
  - In the example figure: $\sigma_1 = 1 < \sigma_2 = 2 < \sigma_3 = 3$ for simple overlap, harmful overlap, and minimum image, respectively
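An ordering like the one on this slide can be reproduced on a small hypothetical example (not necessarily the one in the slide's figure): the single-edge pattern a-b, both nodes labelled A, in a path 1-2-3 of A-labelled nodes. Its four embeddings all contain node 2, so they pairwise overlap ($\sigma_1 = 1$); only some of those overlaps are harmful ($\sigma_2 = 2$); and each pattern node can be mapped to all three data nodes ($\sigma_3 = 3$). All function names are hypothetical, and the brute-force MIS search is only meant for toy inputs.

```python
from itertools import combinations


def mis_size(n, edges):
    """Maximum independent set size of a graph on nodes 0..n-1 (brute force)."""
    for k in range(n, 0, -1):
        for subset in combinations(range(n), k):
            if not any(pair in edges for pair in combinations(subset, 2)):
                return k
    return 0


def overlap(phi, psi):
    """Simple overlap: the two embeddings share a data node."""
    return bool(set(phi.values()) & set(psi.values()))


def harmful(phi, psi):
    """Harmful overlap: some pattern node v has phi(v) and psi(v) both shared."""
    shared = set(phi.values()) & set(psi.values())
    return any(phi[v] in shared and psi[v] in shared for v in phi)


def conflict_support(embs, conflict):
    """MIS size of the graph joining embeddings i < j whenever they conflict."""
    edges = {(i, j) for i, j in combinations(range(len(embs)), 2)
             if conflict(embs[i], embs[j])}
    return mis_size(len(embs), edges)


def min_image(embs):
    """Smallest number of distinct images of any pattern node."""
    return min(len({phi[v] for phi in embs}) for v in embs[0])


# Hypothetical example: pattern a-b (both labelled A) in a path 1-2-3 (all A).
embs = [{"a": 1, "b": 2}, {"a": 2, "b": 1}, {"a": 2, "b": 3}, {"a": 3, "b": 2}]
sigma1 = conflict_support(embs, overlap)  # every pair shares node 2
sigma2 = conflict_support(embs, harmful)
sigma3 = min_image(embs)
print(sigma1, sigma2, sigma3)  # 1 2 3
```

Here the harmful overlap graph is a 4-cycle, so its maximum independent set has size 2, while the simple overlap graph is complete and admits only one embedding.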
- 25. Experimental Setting
  - Comparison of image-based and overlap-based algorithms
  - Dataset: WebKB (4 large graphs describing the structure of web pages)
- 26. Experiment Result
- 27. Conclusion
  - Overlap-based support measures that are anti-monotone
  - A minimum image based algorithm that is more efficient than previous ones
