Diversiweb2011 07 Approximate subgraph matching - Mitja Trampus

851 views

Published on

Published in: Technology, News & Politics
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
851
On SlideShare
0
From Embeds
0
Number of Embeds
119
Actions
Shares
0
Downloads
6
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Diversiweb2011 07 Approximate subgraph matching - Mitja Trampus

  1. 1. Approximate subgraph matching for detection of topic variations Mitja Trampuš Dunja Mladenić AI Lab, Jožef Stefan Institute, Slovenia ailab.ijs.si
  2. 2. Mining Diversity• Web content varies in many aspects, e.g. o Topical o Social (author, target audience, people written about) o Geographical (publisher, places written about) o Sentiment (positive/negative) o Writing style (structure, vocabulary) o Coverage bias• This work: (micro-)topical diversity o Macroscopic = largely solved o Microscopic = challenge AI Lab, Jozef Stefan Institute ailab.ijs.si
  3. 3. Task:Given a collection of texts on a topic,• identify a common template• align texts to the template AI Lab, Jozef Stefan Institute ailab.ijs.si
  4. 4. Template representation• Syntactic o info1: X people were killed / killed X people / resulted in X casualties o info2: blew up Y / destroyed Y / attacked in a Y• Semantic o kill(bomber, people); count(people, X) o destroy(bomber, Y) X count kill bomber destroy Y people AI Lab, Jozef Stefan Institute ailab.ijs.si
  5. 5. car drive 12 count kill suicide destroychurch believers bomber exit run wear vestSemantic Graph treatment execute attackPrerequisite: receive withstand 100 count kill terroristdemolish hospital patients 2 count police slaughter blow uppolice bomber officer station AI Lab, Jozef Stefan Institute ailab.ijs.si
  6. 6. Mining Templates • Template := subgraph with frequent specializations o Specializations implied by background taxonomy (WordNet) o Threshold frequency manually defined X count kill bomberdestroy building people12 tear count suicide down 2 believers kill church count police slaughter police blow up bomber bomber officer station AI Lab, Jozef Stefan Institute ailab.ijs.si
  7. 7. Semantic Graph Construction1. Data: Google News crawl2. HTML cleanup3. Named entity tagging4. Pronoun resolution (he/she/him/her)5. Named entity consolidation (Barack Obama vs President Obama)6. Parsing, triple/fact/assertion extraction (for now: subj-verb-obj only)7. Ontology/taxonomy alignment8. Merging triples into a graph AI Lab, Jozef Stefan Institute ailab.ijs.si
  8. 8. Approximate subgraph matching number count kill person destroy location people FREQUENT SUBTREE MINING countnumber people kill person destroylocation number count destroy kill person location people GENERALIZE 12 count tear suicide down 2 count slaughter blow up believers kill church police police bomber bomber officer station AI Lab, Jozef Stefan Institute ailab.ijs.si
  9. 9. Approximate subgraph matching number count kill person destroy location people SPECIALIZE number count kill bomberdestroy building people12 tear count suicide down 2 believers kill church count police slaughter police blow up bomber bomber officer station AI Lab, Jozef Stefan Institute ailab.ijs.si
  10. 10. Preliminary results• 5 test domains; for each: o ~10 graphs, ~10000 nodes o 10-60 seconds• At min. support 30% o 20 maximal patterns, 9 manually judged as interesting AI Lab, Jozef Stefan Institute ailab.ijs.si
  11. 11. AI Lab, Jozef Stefan Institute ailab.ijs.si
  12. 12. Conclusion• Future work: o Mapping text -> semantics • Other ontologies? o Interestingness measure for assertions and patterns o Evaluation (precision, recall; multiple domains) o Alternative approaches to generalizing subgraphs• Template extraction is achievable, but not easy• Human filtering of results hard to avoid• Current approach reasonably fast AI Lab, Jozef Stefan Institute ailab.ijs.si
  13. 13. Thank you.AI Lab, Jozef Stefan Institute ailab.ijs.si
  14. 14. Can we extract all relations? Kind of …Thousands of small quakesresumed 18 months ago andcontinue to rattle MammothLakes, June Lake and other MonoCounty resort towns. Thetemblors, most measuring 1 to 3on the Richter scale, startedbeneath Mammoth Mountain.Subject – Verb – Object Triplets Semantic Graph
  15. 15. AI Lab, Jozef Stefan Institute ailab.ijs.si
  16. 16. Templates - why• Interpret content o news archives: structure/annotate old texts, enable semantic search o wikipedia: suggestions for infobox entries• Generate content o wikipedia: a starting point for new articles / a checklist of information to be included• No normative definition of “good Jozef Stefanailab.ijs.si AI Lab, template” Institute
  17. 17. Evaluation• Qualitative o Usage-specific o Not useful for tuning algorithms• Quantitative o Precision o Recall AI Lab, Jozef Stefan Institute ailab.ijs.si

×