Hierarchical taxonomy extraction

Slides for the presentation of the paper "Hierarchical Taxonomy Extraction by Mining Topical Query Sessions", published in the proceedings of the 1st International Conference on Knowledge Discovery and Information Retrieval (KDIR), held in Madeira in 2009

  1. 1. Hierarchical taxonomy extraction by mining topical query sessions. KDIR09: International Conference on Knowledge Discovery and Information Retrieval 2009. Miguel Fernández Fernández and Daniel Gayo Avello
  2. 2. brittany spears www.wikipedia.org horse jumping auto restoration auto repair classic car repair car supplies classic car batteries vintageparts.com low cost airlines cheap flights easyjet.com
  3. 3. brittany spears www.wikipedia.org horse jumping auto restoration auto repair classic car repair car supplies classic car batteries vintageparts.com low cost airlines cheap flights easyjet.com “... a series of interactions by the user toward addressing a single information need...” Jansen et al. 2007
  4. 4. Unfortunately, not all queries are equally effective. Wang and Zhai 2008. Mining term association patterns from search logs for effective query reformulation.
  5. 5. Unfortunately, not all queries are equally effective. Wang and Zhai 2008. Mining term association patterns from search logs for effective query reformulation. Misspecification: different people use different words to describe the same thing!
  6. 6. Unfortunately, not all queries are equally effective. Wang and Zhai 2008. Mining term association patterns from search logs for effective query reformulation. Misspecification: different people use different words to describe the same thing! Underspecification: the user has shallow knowledge about what he is looking for
  7. 7. How can they be mitigated? Underspecification. Misspecification.
  8. 8. misspecification (typo): query suggestion
  9. 9. misspecification (typo): query suggestion, query expansion
  10. 10. misspecification (typo): query suggestion, query expansion based on clustering
  11. 11. misspecification (typo): query suggestion, query expansion based on clustering
  12. 12. Wow! Pretty cool, but... jargon, slang and vague domain knowledge ...are still in the game
  13. 13. Not
  14. 14. Not semantic query suggestion & expansion
  15. 15. How?
  16. 16. How?
  17. 17. hyponym |ˈhīpəˌnim| a word of more specific meaning than a general or superordinate term applicable to it.
  18. 18. hyponym |ˈhīpəˌnim| (hyponymy |hīˈpänəmē|) a word of more specific meaning than a general or superordinate term applicable to it.
  19. 19. hyponym |ˈhīpəˌnim| (hyponymy |hīˈpänəmē|) a word of more specific meaning than a general or superordinate term applicable to it. Transitivity ➞ deductive power
  20. 20. hyponym |ˈhīpəˌnim| (hyponymy |hīˈpänəmē|) a word of more specific meaning than a general or superordinate term applicable to it. Transitivity ➞ deductive power (Socrates is mortal)
  21. 21. hyponym |ˈhīpəˌnim| (hyponymy |hīˈpänəmē|) a word of more specific meaning than a general or superordinate term applicable to it. Transitivity ➞ deductive power (Socrates is mortal). Hyponym semantic equivalence (synsets)
  22. 22. hyponym |ˈhīpəˌnim| (hyponymy |hīˈpänəmē|) a word of more specific meaning than a general or superordinate term applicable to it. Transitivity ➞ deductive power (Socrates is mortal). Hyponym semantic equivalence (synsets) (Ferrari and Lamborghini are luxury cars)
  23. 23. Semantic data sources (axis: complexity, semantic richness): Taxonomies (hyponymy)
  24. 24. Semantic data sources (axis: complexity, semantic richness): Taxonomies (hyponymy); Thesauri (synonymy, hyponymy)
  25. 25. Semantic data sources (axis: complexity, semantic richness): Taxonomies (hyponymy); Thesauri (synonymy, hyponymy); Wordnets (synonymy, hyponymy, meronymy, troponymy, entailment, [...])
  26. 26. Semantic data sources (axis: complexity, semantic richness): Taxonomies (hyponymy); Thesauri (synonymy, hyponymy); Wordnets (synonymy, hyponymy, meronymy, troponymy, entailment, [...]); Ontologies (ANY)
  27. 27. Miller and Fellbaum 1990. WordNet, an online Lexical Database
  28. 28. Miller and Fellbaum 1990. WordNet, an online Lexical Database. Hard to maintain (despite Hearst ‘92)
  29. 29. Miller and Fellbaum 1990. WordNet, an online Lexical Database. Hard to maintain (despite Hearst ‘92). Language specific
  30. 30. Miller and Fellbaum 1990. WordNet, an online Lexical Database. Hard to maintain (despite Hearst ‘92). Language specific. Absence of proper names, jargon, slang. Mandala et al. ’99, Gabrilovich & Markovitch ‘07
  31. 31. Our proposal for KDIR’09
  32. 32. Automatically build hyponym taxonomies that capture not only formal lexicon semantics, but also the relations between the terms actually used by search engine users. Do it without needing any source of information other than the query log itself.
  33. 33. Marti A. Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. Caraballo. 1999. Automatic construction of a hypernym-labeled noun hierarchy from text. Girju, Badulescu and Moldovan. 2003. Learning semantic constraints for the automatic discovery of part-whole relations. [...]
  34. 34. Baeza-Yates and Tiberi. 2007. Extracting semantic relations from query logs. Shen et al. 2008. Mining web query hierarchies from clickthrough data Paşca ʻ07 Sekine and Suzuki ʼ07 Mika ʼ07 Schmitz ʼ06 Komachi and Suzuki ʼ08
  35. 35. Baeza-Yates and Tiberi. 2007. Extracting semantic relations from query logs. Shen et al. 2008. Mining web query hierarchies from clickthrough data Paşca ʻ07 Sekine and Suzuki ʼ07 Mika ʼ07 Schmitz ʼ06 Komachi and Suzuki ʼ08
  36. 36. What we did
  37. 37. 1. Reveal topical sessions
  38. 38. 1. Reveal topical sessions 2. Filter noisy information
  39. 39. 1. Reveal topical sessions 2. Filter noisy information 3. Identify Generalization / Specialization patterns
  40. 40. 1. Reveal topical sessions 2. Filter noisy information 3. Identify Generalization / Specialization patterns 4. Extract hyponymy relations from patterns
  41. 41. Log sessionization
  42. 42. Log sessionization. AOL 2006, > 30M queries
  43. 43. Daniel Gayo-Avello. 2009. “A survey on session detection methods in query logs and a proposal for future evaluation”
  44. 44. summer collection briefs 17:46:48; speedo summer collection 17:48:33; madonna get into the groove 17:55:47; madonna get into the groove 17:57:29; videogames cheats and codes 18:02:56; cheatsandcodes.com 18:10:27; madonna get into the groove 18:11:40; getintothegroovelyrics 18:12:27. Daniel Gayo-Avello. 2009. “A survey on session detection methods in query logs and a proposal for future evaluation”
  45. 45. summer collection briefs 17:46:48; speedo summer collection 17:48:33; madonna get into the groove 17:55:47; madonna get into the groove 17:57:29; madonna get into the groove 18:11:40; getintothegroovelyrics 18:12:27; videogames cheats and codes 18:02:56; cheatsandcodes.com 18:10:27. Daniel Gayo-Avello. 2009. “A survey on session detection methods in query logs and a proposal for future evaluation”
  46. 46. Noise filtering
  47. 47. summer collection briefs 17:46:48; speedo summer collection 17:48:33; madonna get into the groove 17:55:47; madonna get into the groove 17:57:29; madonna get into the groove 18:11:40; getintothegroovelyrics 18:12:27; videogames cheats and codes 18:02:56; cheatsandcodes.com 18:10:27
  48. 48. summer collection briefs 17:46:48; speedo summer collection 17:48:33; madonna get into the groove 17:55:47; madonna get into the groove 17:57:29; madonna get into the groove 18:11:40; getintothegroovelyrics 18:12:27; videogames cheats and codes 18:02:56; cheatsandcodes.com 18:10:27
  49. 49. summer collection briefs 17:46:48; speedo summer collection 17:48:33; madonna get into the groove 17:55:47; madonna get into the groove 17:57:29; madonna get into the groove 18:11:40; getintothegroovelyrics 18:12:27; videogames cheats and codes 18:02:56; cheatsandcodes.com 18:10:27. Jim Jansen and Amanda Spink. 2008. Determining the informational, navigational and transactional intent of queries.
  50. 50. summer collection briefs 17:46:48; speedo summer collection 17:48:33; madonna get into the groove 17:55:47; madonna get into the groove 17:57:29; madonna get into the groove 18:11:40; getintothegroovelyrics 18:12:27; videogames cheats and codes 18:02:56; cheatsandcodes.com 18:10:27. Jim Jansen and Amanda Spink. 2008. Determining the informational, navigational and transactional intent of queries.
  51. 51. summer collection briefs 17:46:48; speedo summer collection 17:48:33; madonna get into the groove 17:55:47; madonna get into the groove 17:57:29; madonna get into the groove 18:11:40; getintothegroovelyrics 18:12:27; videogames cheats and codes 18:02:56; cheatsandcodes.com 18:10:27
  52. 52. summer collection briefs 17:46:48; speedo summer collection 17:48:33; madonna get into the groove 17:55:47; madonna get into the groove 17:57:29; madonna get into the groove 18:11:40; getintothegroovelyrics 18:12:27; videogames cheats and codes 18:02:56; cheatsandcodes.com 18:10:27
  53. 53. summer collection briefs 17:46:48; speedo summer collection 17:48:33
  54. 54. Specialization identification
  55. 55. fish food → tropical fish food: terms added (trivial)
  56. 56. fish food → tropical fish food: terms added (trivial); formula one pilots → Fernando Alonso: queries don’t share any term
  57. 57. fish food → tropical fish food: terms added (trivial); formula one pilots → Fernando Alonso: queries don’t share any term (out of scope)
  58. 58. fish food → tropical fish food: terms added (trivial); formula one pilots → Fernando Alonso: queries don’t share any term (out of scope); summer collection briefs → speedo summer collection: some terms added, others removed
  59. 59. Relation extraction
  60. 60. Relation extraction
  61. 61. Relation extraction: Specialization w/reformulation
  62. 62. Relation extraction: Specialization w/reformulation. summer collection briefs: 35,000,000; speedo summer collection: 163,000
  63. 63. Relation extraction: Specialization w/reformulation. summer collection briefs ⊇ speedo summer collection
  64. 64. Relation extraction: Specialization w/reformulation. briefs ← speedo ✓
  65. 65. Relation extraction: Trivial specialization. fish food → tropical fish food
  66. 66. Relation extraction: Trivial specialization. fish food → tropical fish food ✓
  67. 67. Relation extraction: Trivial specialization. fish food → tropical fish food ✓ [figure: candidate term pairs built from the two queries]
  68. 68. Relation extraction: Trivial specialization. fish food → tropical fish food ✓ [figure: candidate term pairs built from the two queries]
  69. 69. Relation extraction: Trivial specialization. fish food → tropical fish food ✓ [figure: candidate term pairs built from the two queries]
  70. 70. Relation extraction: Trivial specialization. fish food → tropical fish food ✓ [figure: candidate term pairs built from the two queries]
  71. 71. Relation extraction: Trivial specialization. fish food → tropical fish food ✓ [figure: candidate term pairs built from the two queries]
  72. 72. Relation extraction: Trivial specialization. fish food → tropical fish food ✓ [figure: candidate term pairs built from the two queries]
  73. 73. Relation extraction: Trivial specialization. fish food → tropical fish food ✓ Candidate pairs: fish ← tropical fish ✓; fish ← fish food ✗; fish ← tropical fish food ✗; food ← fish food ✓; food ← tropical fish food ✓; fish food ← tropical fish food ✓
  74. 74. Preliminary results
  75. 75. Preliminary results. 3000 instances. Overall: correct 62.67%, wrong 37.33%
  76. 76. Preliminary results. 3000 instances. Overall: correct 62.67%, wrong 37.33%. Correct: present in WordNet / not present in WordNet (70.22% / 29.78%)
  77. 77. Preliminary results. 3000 instances. Overall: correct 62.67%, wrong 37.33%. Correct: present in WordNet / not present in WordNet (70.22% / 29.78%). Examples: eventing ← jumping; underwear ← briefs ← speedo; celtic ← irish
  78. 78. Preliminary results. 3000 instances. Overall: correct 62.67%, wrong 37.33%. Correct: present in WordNet / not present in WordNet (70.22% / 29.78%). Wrong: co-hyponyms / unrelated terms (53.75% / 46.25%). Examples: eventing ← jumping; underwear ← briefs ← speedo; celtic ← irish
  79. 79. Preliminary results. 3000 instances. Overall: correct 62.67%, wrong 37.33%. Correct: present in WordNet / not present in WordNet (70.22% / 29.78%). Wrong: co-hyponyms / unrelated terms (53.75% / 46.25%). Examples: eventing ← jumping; underwear ← briefs ← speedo; yellow ← white; celtic ← irish; honda ← kawasaki
  80. 80. Preliminary results. 3000 instances. Overall: correct 62.67%, wrong 37.33%. Correct: present in WordNet / not present in WordNet (70.22% / 29.78%). Wrong: co-hyponyms / unrelated terms (53.75% / 46.25%). Examples: fish food ← fish; scandal ← election; eventing ← jumping; underwear ← briefs ← speedo; yellow ← white; celtic ← irish; honda ← kawasaki
  81. 81. Work in progress
  82. 82. Work in progress. Machine learning specialization detection: Paolo Boldi et al. 2009. From 'dango' to 'japanese cakes'
  83. 83. Work in progress. Machine learning specialization detection: Paolo Boldi et al. 2009. From 'dango' to 'japanese cakes' (qi: Formula one pilots, qj: Fernando Alonso)
  84. 84. Work in progress. Machine learning specialization detection: Paolo Boldi et al. 2009. From 'dango' to 'japanese cakes' (qi: Formula one pilots, qj: Fernando Alonso). Multi-word term identification: Rosie Jones et al. 2006. Generating query substitutions
  85. 85. Work in progress. Machine learning specialization detection: Paolo Boldi et al. 2009. From 'dango' to 'japanese cakes' (qi: Formula one pilots, qj: Fernando Alonso). Multi-word term identification: Rosie Jones et al. 2006. Generating query substitutions (golden globe awards, new york maps)
  86. 86. Future work
  87. 87. Future work: finish ongoing work
  88. 88. Future work: evaluation framework; finish ongoing work
  89. 89. Future work: relevance ranking; evaluation framework; finish ongoing work
  90. 90. Future work: suggestions? Relevance ranking; evaluation framework; finish ongoing work
  91. 91. research@miguelfernandez.info
  92. 92. Hierarchical taxonomy extraction by mining topical query sessions. KDIR09: International Conference on Knowledge Discovery and Information Retrieval 2009. Miguel Fernández Fernández and Daniel Gayo Avello
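
Slides 54 to 73 describe how consecutive queries in a topical session are compared and how candidate hyponymy pairs are drawn from a trivial specialization. The following Python sketch illustrates that idea on the slides' own examples. It is not the authors' implementation: the function and class names are hypothetical, queries are tokenized naively on whitespace, and the acceptance rule (the general term must be the word-level suffix of the specific term) is an assumption chosen only because it reproduces the ✓/✗ marks on slide 73. The slides do not state the actual rule, and the count-based check on slide 62 is not modeled.

```python
from enum import Enum


class Transition(Enum):
    TRIVIAL_SPECIALIZATION = "terms added"
    SPECIALIZATION_WITH_REFORMULATION = "some terms added, others removed"
    NO_SHARED_TERMS = "queries don't share any term"
    OTHER = "other"


def classify_transition(first_query, second_query):
    """Compare the term sets of two consecutive queries of a session."""
    first_terms = set(first_query.lower().split())
    second_terms = set(second_query.lower().split())
    if not first_terms & second_terms:
        return Transition.NO_SHARED_TERMS
    added = second_terms - first_terms
    removed = first_terms - second_terms
    if added and not removed:
        return Transition.TRIVIAL_SPECIALIZATION
    if added and removed:
        return Transition.SPECIALIZATION_WITH_REFORMULATION
    return Transition.OTHER


def ngrams(query):
    """All contiguous word n-grams of a query, as tuples of words."""
    words = query.lower().split()
    return [tuple(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, len(words) + 1)]


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run of words in `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter
               for i in range(len(longer) - n + 1))


def trivial_specialization_pairs(general_query, specific_query):
    """Return (general term, specific term, accepted) candidates.

    Candidates pair each n-gram of the general query with every longer
    n-gram of the specific query that contains it.
    ASSUMPTION (not stated in the slides): a candidate is accepted when
    the general term is the word-level suffix of the specific term.
    """
    results = []
    for general in ngrams(general_query):
        for specific in ngrams(specific_query):
            if len(specific) > len(general) and contains(specific, general):
                accepted = specific[-len(general):] == general
                results.append(
                    (" ".join(general), " ".join(specific), accepted))
    return results


if __name__ == "__main__":
    print(classify_transition("fish food", "tropical fish food"))
    for general, specific, ok in trivial_specialization_pairs(
            "fish food", "tropical fish food"):
        print(f"{general} <- {specific}: {'accepted' if ok else 'rejected'}")
```

Under these assumptions, "fish food" followed by "tropical fish food" yields six candidate pairs, four accepted and two rejected, matching the pairs listed on slide 73. Real query logs would need sturdier tokenization and, presumably, the frequency evidence hinted at on slide 62.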