Hierarchical taxonomy extraction

Slides for the presentation of the paper "Hierarchical Taxonomy Extraction by Mining Topical Query Sessions", published in the proceedings of the 1st International Conference on Knowledge Discovery and Information Retrieval (KDIR), held in Madeira in 2009.

  1. KDIR'09, International Conference on Knowledge Discovery and Information Retrieval 2009. Hierarchical taxonomy extraction by mining topical query sessions. Miguel Fernández Fernández and Daniel Gayo Avello.
  2-3. A stream of queries: brittany spears, www.wikipedia.org, horse jumping, auto restoration, auto repair, classic car repair, car supplies, classic car batteries, vintageparts.com, low cost airlines, cheap flights, easyjet.com. A session is "... a series of interactions by the user toward addressing a single information need..." (Jansen et al. 2007).
  4-6. Unfortunately, not all queries are equally effective (Wang and Zhai 2008. Mining term association patterns from search logs for effective query reformulation). Misspecification: different people use different words to describe the same thing! Underspecification: the user has shallow knowledge about what he is looking for.
  7. How can they (underspecification, misspecification) be mitigated?
  8-11. Misspecification (typo): query suggestion; query expansion; approaches based on clustering.
  12. Wuh! Pretty cool, but... jargon, slang and vague domain knowledge ...are still in the game.
  13-14. Not semantic query suggestion & expansion.
  15-16. How?
  17-22. hyponym |ˈhīpəˌnim|: a word of more specific meaning than a general or superordinate term applicable to it. Properties of hyponymy |hīˈpänəmē|: transitivity ➞ deductive power (Socrates is mortal); hyponym semantic equivalence (synsets) (Ferrari and Lamborghini are luxury cars).
  23-26. Semantic data sources, from least to most complex and semantically rich: taxonomies (hyponymy); thesauri (synonymy, hyponymy); wordnets (synonymy, hyponymy, meronymy, troponymy, entailment, [...]); ontologies (ANY relation).
  27-30. Miller and Fellbaum 1990. WordNet, an online lexical database. However, it is hard to maintain (despite Hearst ’92), language specific, and lacks proper names, jargon and slang (M. ’99; Gabrilovich & Markovitch ’07).
  31. Our proposal for KDIR’09
  32. Automatically build hyponym taxonomies that capture not only formal lexical semantics, but also relations between the terms actually used by search engine users. Do it without needing any source of information other than the query log itself.
  33. Marti A. Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. Caraballo. 1999. Automatic construction of a hypernym-labeled noun hierarchy from text. Girju, Badulescu and Moldovan. 2003. Learning semantic constraints for the automatic discovery of part-whole relations. [...]
  34-35. Baeza-Yates and Tiberi. 2007. Extracting semantic relations from query logs. Shen et al. 2008. Mining web query hierarchies from clickthrough data. See also Paşca ’07, Sekine and Suzuki ’07, Mika ’07, Schmitz ’06, Komachi and Suzuki ’08.
  36. What we did
  37-40. (1) Reveal topical sessions; (2) filter noisy information; (3) identify generalization/specialization patterns; (4) extract hyponymy relations from the patterns.
  41-42. Log sessionization: AOL 2006 log, > 30M queries.
  43. Daniel Gayo-Avello. 2009. "A survey on session detection methods in query logs and a proposal for future evaluation".
  44. A user's queries in chronological order:
      summer collection briefs (17:46:48)
      speedo summer collection (17:48:33)
      madonna get into the groove (17:55:47)
      madonna get into the groove (17:57:29)
      videogames cheats and codes (18:02:56)
      cheatsandcodes.com (18:10:27)
      madonna get into the groove (18:11:40)
      getintothegroovelyrics (18:12:27)
  45. The same queries regrouped by topic:
      summer collection briefs (17:46:48), speedo summer collection (17:48:33)
      madonna get into the groove (17:55:47, 17:57:29, 18:11:40), getintothegroovelyrics (18:12:27)
      videogames cheats and codes (18:02:56), cheatsandcodes.com (18:10:27)
  46. Noise filtering
  47-52. The sessionized log of slide 45 is filtered, citing Jim Jansen and Amanda Spink. 2008. Determining the informational, navigational and transactional intent of queries.
  53. Remaining after filtering: summer collection briefs (17:46:48), speedo summer collection (17:48:33).
  54. Specialization identification
  55-58. Three cases: fish food → tropical fish food: terms added (trivial); formula one pilots → Fernando Alonso: the queries don't share any term (out of scope); summer collection briefs → speedo summer collection: some terms added, others removed.
  59-60. Relation extraction
  61-64. Relation extraction: specialization with reformulation. summer collection briefs (35,000,000) vs. speedo summer collection (163,000), hence summer collection briefs ⊇ speedo summer collection, hence briefs ← speedo ✓.
  65-73. Relation extraction: trivial specialization, from fish food to tropical fish food ✓. Candidate relations between their terms and sub-phrases: fish ← tropical fish ✓; fish ← fish food ✗; fish ← tropical fish food ✗; food ← fish food ✓; food ← tropical fish food ✓; fish food ← tropical fish food ✓.
  74. Preliminary results
  75-80. Preliminary results on 3000 instances:
      Overall: correct 62.67%, wrong 37.33%
      Correct: present in WordNet 70.22%, not present in WordNet 29.78%
      Wrong: co-hyponyms 53.75%, unrelated terms 46.25%
      Correct examples: eventing ← jumping; underwear ← briefs ← speedo; celtic ← irish
      Co-hyponym examples: yellow ← white; honda ← kawasaki
      Unrelated-term examples: fish food ← fish; scandal ← election
  81. Work in progress
  82-85. Work in progress: machine-learning specialization detection (Paolo Boldi et al. 2009. From dango to japanese cakes), for pairs such as qi: formula one pilots, qj: Fernando Alonso; multi-word term identification (Rosie Jones et al. 2006. Generating query substitutions), for queries such as golden globe awards or new york maps.
  86-90. Next future work: finish ongoing work; evaluation framework; relevance ranking; suggestions?
  91. research@miguelfernandez.info
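
Slides 41-45 defer the sessionization method to the cited survey. The sketch below only illustrates topical grouping and is not the authors' method: each query joins the most recently opened session it is related to, otherwise it opens a new one. The relatedness test (a shared content word, or a run-together query or URL containing at least two content words of the other query), the 30-minute `MAX_GAP` and the `STOPWORDS` list are hypothetical choices.

```python
import re
from datetime import datetime, timedelta

# Hypothetical parameters, chosen for illustration only (not from the paper).
MAX_GAP = timedelta(minutes=30)
STOPWORDS = {"the", "and", "of", "a", "to", "in", "for"}

def terms(query):
    """Content words of a query: lowercase whitespace tokens minus stopwords."""
    return {t for t in query.lower().split() if t not in STOPWORDS}

def compact(query):
    """Query without a trailing .com/.org/.net and without non-alphanumerics."""
    q = re.sub(r"\.(com|org|net)$", "", query.lower())
    return re.sub(r"[^a-z0-9]", "", q)

def related(q1, q2):
    """Hypothetical topical test: a shared content word, or one query's compact
    form (e.g. a URL) containing two or more content words of the other."""
    if terms(q1) & terms(q2):
        return True
    for a, b in ((q1, q2), (q2, q1)):
        hits = sum(1 for t in terms(b) if len(t) >= 3 and t in compact(a))
        if hits >= 2:
            return True
    return False

def sessionize(log):
    """Group (query, 'HH:MM:SS') pairs, given in time order, into topical sessions."""
    sessions = []  # each entry: {"queries": [...], "last": time of its last query}
    for query, stamp in log:
        t = datetime.strptime(stamp, "%H:%M:%S")
        # Try the most recently opened sessions first.
        for s in reversed(sessions):
            if t - s["last"] <= MAX_GAP and any(related(query, q) for q in s["queries"]):
                s["queries"].append(query)
                s["last"] = t
                break
        else:  # no related open session: start a new one
            sessions.append({"queries": [query], "last": t})
    return [s["queries"] for s in sessions]

LOG = [
    ("summer collection briefs", "17:46:48"),
    ("speedo summer collection", "17:48:33"),
    ("madonna get into the groove", "17:55:47"),
    ("madonna get into the groove", "17:57:29"),
    ("videogames cheats and codes", "18:02:56"),
    ("cheatsandcodes.com", "18:10:27"),
    ("madonna get into the groove", "18:11:40"),
    ("getintothegroovelyrics", "18:12:27"),
]

for session in sessionize(LOG):
    print(session)
```

On the slide 44 log this returns the three groups of slide 45. On a real log such as AOL's, both the gap and the relatedness test would likely need tuning.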
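
Slides 46-53 show the sessionized log shrinking to the single summer collection session and cite Jansen and Spink (2008) on query intent, but the filtering rules themselves are not spelled out. The sketch below is one hypothetical rule set that reproduces that outcome: drop navigational queries, collapse repeated queries, and discard sessions left with fewer than two distinct queries. The `is_navigational` test (a domain-like string or a single run-together token) is a crude placeholder of ours, not Jansen and Spink's classifier.

```python
import re

# Hypothetical stand-in for intent classification: a query is treated as
# navigational if it looks like a domain or is a single run-together token.
DOMAIN = re.compile(r"^www\.|\.(com|org|net)\b")

def is_navigational(query):
    q = query.strip().lower()
    return bool(DOMAIN.search(q)) or " " not in q

def filter_noise(sessions):
    """Drop navigational and repeated queries, then drop sessions that are
    left with fewer than two distinct queries (no reformulation to mine)."""
    kept = []
    for queries in sessions:
        distinct = []
        for q in queries:
            if not is_navigational(q) and q not in distinct:
                distinct.append(q)
        if len(distinct) >= 2:
            kept.append(distinct)
    return kept

SESSIONS = [
    ["summer collection briefs", "speedo summer collection"],
    ["madonna get into the groove"] * 3 + ["getintothegroovelyrics"],
    ["videogames cheats and codes", "cheatsandcodes.com"],
]
print(filter_noise(SESSIONS))
```

Keeping only sessions with at least two distinct queries reflects that the later steps need a query and its reformulation; a single query gives no pattern to mine.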
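
The three cases on slides 55-58 can be told apart by comparing the term sets of two consecutive queries. This minimal sketch assumes plain whitespace tokenization; the "generalization" label (terms only removed) is our addition, suggested by step 3 of the method ("identify generalization/specialization patterns") but not shown on these slides.

```python
def classify(q1, q2):
    """Label the move from query q1 to a later query q2 in the same session
    by comparing their term sets (hypothetical labels for slides 55-58)."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    if not a & b:
        return "no shared terms"          # e.g. formula one pilots -> Fernando Alonso
    if a == b:
        return "same terms"
    if a < b:
        return "trivial specialization"   # terms only added
    if b < a:
        return "generalization"           # terms only removed
    return "reformulation"                # some terms added, others removed

print(classify("fish food", "tropical fish food"))
print(classify("formula one pilots", "Fernando Alonso"))
print(classify("summer collection briefs", "speedo summer collection"))
```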
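
Slides 61-64 order the two queries of a reformulation by the numbers shown next to them (35,000,000 vs. 163,000) and read the swapped terms as a hyponymy pair. The sketch below assumes those numbers are result counts supplied by the caller (the slides do not say where they come from) and, as a simplification of ours, handles only reformulations that swap exactly one term.

```python
def reformulation_relation(q1, hits1, q2, hits2):
    """Return a (hypernym, hyponym) pair for a single-term swap, using result
    counts to decide which query is narrower; None if no decision is possible."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    removed, added = a - b, b - a
    if len(removed) != 1 or len(added) != 1:
        return None  # hypothetical restriction: exactly one term swapped
    old, new = next(iter(removed)), next(iter(added))
    if hits2 < hits1:    # q2 matches fewer documents, so it is the narrower query
        return (old, new)
    if hits1 < hits2:
        return (new, old)
    return None          # equal counts: no direction can be inferred

print(reformulation_relation("summer collection briefs", 35_000_000,
                             "speedo summer collection", 163_000))
```

The direction is decided by the counts rather than by query order, so the same pair is returned whichever query the user typed first.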
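
Slides 65-73 enumerate candidate pairs for a trivial specialization and mark each ✓ or ✗. The sketch below pairs every sub-phrase of the first query with every longer sub-phrase of the second that contains it, which yields exactly the six candidates listed for fish food and tropical fish food. The acceptance rule is our assumption, not stated on the slides: keep a pair only when the general phrase ends the specific one (English noun phrases are usually head-final). On this example it reproduces the slides' ✓/✗ marks.

```python
def ngrams(words):
    """All contiguous sub-phrases of a word list, as tuples."""
    return [tuple(words[i:j]) for i in range(len(words))
            for j in range(i + 1, len(words) + 1)]

def contains(big, small):
    """True if tuple `small` occurs contiguously inside tuple `big`."""
    n = len(small)
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))

def trivial_relations(q1, q2):
    """Candidate (general, specific, accepted) triples for q1 -> q2, where q2
    only adds terms. Hypothetical acceptance rule: the general phrase must be
    the final words (the head) of the specific one."""
    w1, w2 = q1.lower().split(), q2.lower().split()
    out = []
    for x in ngrams(w1):
        for y in ngrams(w2):
            if len(x) < len(y) and contains(y, x):
                accepted = y[-len(x):] == x
                out.append((" ".join(x), " ".join(y), accepted))
    return out

for general, specific, ok in trivial_relations("fish food", "tropical fish food"):
    print(f"{general} <- {specific}", "✓" if ok else "✗")
```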
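
Reading the three breakdowns on slides 75-80 as nested (the second splits the correct instances, the third the wrong ones), which is our interpretation of the charts, each cell's share of all 3000 instances is the product of the two percentages:

```latex
\begin{aligned}
P(\text{correct} \wedge \text{in WordNet})     &= 0.6267 \times 0.7022 \approx 0.4401\\
P(\text{correct} \wedge \text{not in WordNet}) &= 0.6267 \times 0.2978 \approx 0.1866\\
P(\text{wrong} \wedge \text{co-hyponyms})      &= 0.3733 \times 0.5375 \approx 0.2006\\
P(\text{wrong} \wedge \text{unrelated})        &= 0.3733 \times 0.4625 \approx 0.1727
\end{aligned}
```

The four shares sum to 1.0000. Under that reading, roughly 1320 of the 3000 instances are correct relations already in WordNet, and about 560 are correct relations that WordNet lacks.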
