KDIR09: International Conference on Knowledge Discovery and Information Retrieval 2009
Hierarchical taxonomy extraction by mining topical query sessions
Miguel Fernández Fernández and Daniel Gayo Avello
brittany spears
www.wikipedia.org
horse jumping
auto restoration
auto repair
classic car repair
car supplies
classic car batteries
vintageparts.com
low cost airlines
cheap flights
easyjet.com
“... a series of interactions by the user toward addressing a single information need...”
Jansen et al. 2007
Unfortunately, not all queries are equally effective
Wang and Zhai 2008. Mining term association patterns from search logs for effective query reformulation.

Misspecification
  different people use different words to describe the same thing!
Underspecification
  user has shallow knowledge about what he is looking for
How can they be mitigated?
misspecification (typo)
query suggestion
query expansion
based on clustering
Wow! Pretty cool, but...
Jargon
Slang
Vague domain knowledge
...are still in the game
Not semantic query suggestion & expansion
How?
hyponym |ˈhīpəˌnim|
a word of more specific meaning than a
general or superordinate term applicable to it.
hyponymy |hīˈpänəmē|

Transitivity ➞ deductive power
    Socrates is mortal
Hyponym semantic equivalence (synsets)
    Ferrari and Lamborghini are luxury cars
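
The "deductive power" of transitivity can be illustrated with a small sketch: if direct hyponym-to-hypernym links are known, every indirect hypernym follows by chaining them. The function name `hypernym_closure` and the edge representation are illustrative assumptions, not part of the talk; the example chain is the one listed in the results slides (underwear ← briefs ← speedo).

```python
from collections import defaultdict

def hypernym_closure(direct_edges):
    """Map each hyponym to every hypernym reachable by chaining direct links.

    direct_edges: iterable of (hyponym, hypernym) pairs (assumed format).
    """
    parents = defaultdict(set)
    for hyponym, hypernym in direct_edges:
        parents[hyponym].add(hypernym)

    def ancestors(term, seen):
        # Depth-first walk; `seen` also protects against accidental cycles.
        for parent in parents.get(term, ()):
            if parent not in seen:
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    return {term: ancestors(term, set()) for term in list(parents)}

edges = [("speedo", "briefs"), ("briefs", "underwear")]
closure = hypernym_closure(edges)
print(sorted(closure["speedo"]))  # ['briefs', 'underwear']
```

Only two direct links are stored, yet "speedo is a kind of underwear" is derived without ever being observed.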
Semantic data sources (ordered by increasing complexity and semantic richness)

Taxonomies    hyponymy
Thesauri      hyponymy, synonymy
Wordnets      hyponymy, synonymy, meronymy, troponymy, entailment, [...]
Ontologies    ANY
Miller and Fellbaum 1990
WordNet, an online Lexical Database
hard to maintain (despite Hearst ‘92)
language specific
absence of proper names, jargon, [...], slang
M . ’99
Gabrilovich & Markovitch ‘07
Our proposal for KDIR’09
Automatically build hyponym taxonomies that capture not only formal lexicon semantics, but also relations between the terms actually used by search engine users

Do it without any source of information other than the query log itself
Marti A. Hearst. 1992. Automatic acquisition of hyponyms from large text corpora.
Caraballo. 1999. Automatic construction of a hypernym-labeled noun hierarchy from text.
Girju, Badulescu and Moldovan. 2003. Learning semantic constraints for the automatic discovery of part-whole relations.
[...]
Baeza-Yates and Tiberi. 2007. Extracting semantic relations from query logs.
Shen et al. 2008. Mining web query hierarchies from clickthrough data.
Paşca ’07, Sekine and Suzuki ’07, Mika ’07, Schmitz ’06, Komachi and Suzuki ’08
What we did
1. Reveal topical sessions
2. Filter noisy information
3. Identify Generalization / Specialization patterns
4. Extract hyponymy relations from patterns
Log sessionization
AOL 2006, > 30M queries
Daniel Gayo-Avello. 2009. “A survey on session detection methods in query logs and a proposal for future evaluation”
summer collection briefs                                 17:46:48
speedo summer collection                                 17:48:33
madonna get into the groove                              17:55:47
madonna get into the groove                              17:57:29
videogames cheats and codes                              18:02:56
cheatsandcodes.com                                       18:10:27
madonna get into the groove                              18:11:40
getintothegroovelyrics                                   18:12:27


summer collection briefs                                 17:46:48
speedo summer collection                                 17:48:33

madonna get into the groove                              17:55:47
madonna get into the groove                              17:57:29
madonna get into the groove                              18:11:40
getintothegroovelyrics                                   18:12:27

videogames cheats and codes                              18:02:56
cheatsandcodes.com                                       18:10:27
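
The regrouping of the interleaved log into topical sessions can be sketched as follows. The slides do not describe the session detection method itself, so this is only an illustrative stand-in: queries are compared by character-trigram Jaccard similarity (so that "getintothegroovelyrics" can still match "madonna get into the groove"), and the 0.3 threshold, the function names, and the greedy assignment are all assumptions.

```python
import re

def char_trigrams(query):
    """Character trigrams of the query with case, spaces and punctuation removed."""
    text = re.sub(r"[^a-z0-9]", "", query.lower())
    return {text[i:i + 3] for i in range(len(text) - 2)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def topical_sessions(queries, threshold=0.3):
    """Greedily attach each query to the most similar existing session.

    threshold is an arbitrary illustrative value, not taken from the talk.
    """
    sessions = []  # each session is a list of query strings, in log order
    for query in queries:
        grams = char_trigrams(query)
        best, best_score = None, threshold
        for session in sessions:
            score = max(jaccard(grams, char_trigrams(q)) for q in session)
            if score >= best_score:
                best, best_score = session, score
        if best is None:
            sessions.append([query])  # no session is similar enough: new topic
        else:
            best.append(query)
    return sessions

log = [
    "summer collection briefs",
    "speedo summer collection",
    "madonna get into the groove",
    "madonna get into the groove",
    "videogames cheats and codes",
    "cheatsandcodes.com",
    "madonna get into the groove",
    "getintothegroovelyrics",
]
sessions = topical_sessions(log)
for session in sessions:
    print(session)
```

On this log the sketch yields the same three groups shown above: the summer collection queries, the madonna queries, and the videogames queries.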
Noise filtering
summer collection briefs      17:46:48
speedo summer collection      17:48:33

madonna get into the groove   17:55:47
madonna get into the groove   17:57:29
madonna get into the groove   18:11:40
getintothegroovelyrics        18:12:27

videogames cheats and codes   18:02:56
cheatsandcodes.com            18:10:27

Jim Jansen and Amanda Spink. 2008. Determining the informational, navigational and transactional intent of queries.

summer collection briefs   17:46:48
speedo summer collection   17:48:33
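
A minimal sketch of one way such filtering could work, consistent with the before and after lists above. The slides do not state the exact rules, so everything here is an assumption: repeated queries are dropped, queries that look navigational (a bare domain name, or a single long unspaced token) are dropped, and a session is kept only if at least two distinct queries remain. The names `looks_navigational`, `filter_session` and `filter_sessions` and the length cutoff of 15 are hypothetical.

```python
import re

# Hypothetical pattern for a bare domain name such as "cheatsandcodes.com".
DOMAIN_LIKE = re.compile(r"\S+\.[a-z]{2,}")

def looks_navigational(query):
    """Assumed heuristic: a bare domain name or one long unspaced token."""
    q = query.strip().lower()
    return bool(DOMAIN_LIKE.fullmatch(q)) or (" " not in q and len(q) >= 15)

def filter_session(session):
    """Drop navigational queries and repeats, preserving first-seen order."""
    kept = []
    for query in session:
        if not looks_navigational(query) and query not in kept:
            kept.append(query)
    return kept

def filter_sessions(sessions):
    """Keep only sessions that still hold at least two distinct queries."""
    filtered = (filter_session(s) for s in sessions)
    return [s for s in filtered if len(s) >= 2]

raw_sessions = [
    ["summer collection briefs", "speedo summer collection"],
    ["madonna get into the groove", "madonna get into the groove",
     "madonna get into the groove", "getintothegroovelyrics"],
    ["videogames cheats and codes", "cheatsandcodes.com"],
]
clean_sessions = filter_sessions(raw_sessions)
print(clean_sessions)
# [['summer collection briefs', 'speedo summer collection']]
```

A session reduced to a single query carries no reformulation to learn from, which is why the sketch discards it.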
Specialization identification
fish food
tropical fish food            Terms added (trivial)

formula one pilots
Fernando Alonso               Queries don’t share any term (out of scope)

speedo summer collection
summer collection briefs      Some terms added, others removed
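
The three cases above can be told apart by comparing the term sets of two consecutive queries. The sketch below assumes whitespace tokenization and case-insensitive matching; the function name and the returned labels are illustrative, and the generalization branch is added only because the pipeline overview mentions generalization patterns as well.

```python
def classify_pair(first, second):
    """Label the change from `first` to `second` by comparing their term sets."""
    a = set(first.lower().split())
    b = set(second.lower().split())
    if not a & b:
        return "no shared terms (out of scope)"
    if a == b:
        return "same terms"
    if a < b:
        return "terms added (trivial specialization)"
    if b < a:
        return "terms removed (generalization)"
    return "some terms added, others removed (reformulation)"

print(classify_pair("fish food", "tropical fish food"))
# terms added (trivial specialization)
print(classify_pair("formula one pilots", "Fernando Alonso"))
# no shared terms (out of scope)
print(classify_pair("speedo summer collection", "summer collection briefs"))
# some terms added, others removed (reformulation)
```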
Relation extraction
Relation extraction: Specialization w/reformulation

summer collection briefs    35,000,000
speedo summer collection       163,000

summer collection briefs ⊇ speedo summer collection

briefs ← speedo ✓
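
A hedged sketch of the step above: the slides do not say what the two figures measure, so the code assumes they are some breadth score for each query (for example a result count), treats the query with the larger figure as the more general one, and proposes the terms unique to it as hypernym of the terms unique to the other. The function name and return format are hypothetical.

```python
def relation_from_reformulation(query_a, count_a, query_b, count_b):
    """Return (hypernym_terms, hyponym_terms), or None if nothing is proposed.

    count_a / count_b: assumed breadth scores; larger means more general.
    """
    terms_a, terms_b = set(query_a.split()), set(query_b.split())
    if not terms_a & terms_b or count_a == count_b:
        return None  # unrelated queries, or no evidence of which is broader
    if count_a < count_b:
        terms_a, terms_b = terms_b, terms_a  # make terms_a the broader query
    general_only = terms_a - terms_b
    specific_only = terms_b - terms_a
    if not general_only or not specific_only:
        return None  # not a reformulation: terms were only added or removed
    return sorted(general_only), sorted(specific_only)

relation = relation_from_reformulation(
    "summer collection briefs", 35_000_000,
    "speedo summer collection", 163_000,
)
print(relation)  # (['briefs'], ['speedo'])
```

The shared terms ("summer collection") act as fixed context, so only the differing terms enter the proposed relation briefs ← speedo.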
Relation extraction: Trivial specialization

fish food ← tropical fish food ✓

Terms of each query:
 fish food             fish, food, fish food
 tropical fish food    tropical, fish, food, tropical fish, fish food, tropical fish food

Candidate pairs:
 fish   tropical             food   tropical             fish food   tropical
 fish   fish                 food   fish                 fish food   fish
 fish   food                 food   food                 fish food   food
 fish   tropical fish        food   tropical fish        fish food   tropical fish
 fish   fish food            food   fish food            fish food   fish food
 fish   tropical fish food   food   tropical fish food   fish food   tropical fish food

Pairs are discarded in three passes: those in which neither term contains the other, those with identical terms, and those whose left term is the longer one.

Remaining pairs:
 fish ← tropical fish ✓
 fish ← fish food ✗
 fish ← tropical fish food ✗
 food ← fish food ✓
 food ← tropical fish food ✓
 fish food ← tropical fish food ✓
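
The candidate generation and pruning shown above can be sketched in a few lines. This is an illustration under assumptions rather than the authors' implementation: terms are taken to be contiguous word n-grams, and a pair is kept when the left term is a proper contiguous part of the right one, which reproduces the six remaining pairs. The function names are hypothetical.

```python
def ngrams(query):
    """All contiguous word n-grams of a query, shortest first."""
    words = query.split()
    return [tuple(words[i:i + n])
            for n in range(1, len(words) + 1)
            for i in range(len(words) - n + 1)]

def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run of words inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

def candidate_pairs(general, specific):
    """(hypernym candidate, hyponym candidate) pairs for a trivial specialization."""
    pairs = []
    for left in ngrams(general):
        for right in ngrams(specific):
            # Keep only pairs where the left term is properly inside the right.
            if left != right and contains(right, left):
                pairs.append((" ".join(left), " ".join(right)))
    return pairs

pairs = candidate_pairs("fish food", "tropical fish food")
for hypernym, hyponym in pairs:
    print(f"{hypernym} <- {hyponym}")
```

The sketch stops at the six candidates; it does not reproduce the ✓ and ✗ marks, which reject "fish ← fish food" and "fish ← tropical fish food".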
Preliminary results

3000 instances

Overall (Correct, Wrong): 62.67%, 37.33%
Correct (Present in Wordnet, Not present in Wordnet): 70.22%, 29.78%
Wrong (co-hyponyms, unrelated terms): 53.75%, 46.25%

Correct examples:
 eventing ← jumping
 underwear ← briefs ← speedo
 celtic ← irish

Wrong examples (co-hyponyms):
 yellow ← white
 honda ← kawasaki

Wrong examples (unrelated terms):
 fish food ← fish
 scandal ← election
Work in progress

Machine Learning specialization detection
Paolo Boldi et al. 2009. From 'dango' to 'japanese cakes'
  qi: Formula one pilots
  qj: Fernando Alonso

Multi-word term identification
Rosie Jones et al. 2006. Generating query substitutions
  golden globe awards
  new york maps
Future work
Finish ongoing work
Evaluation framework
Relevance ranking
Suggestions?
research@miguelfernandez.info
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Why Agile? - A handbook behind Agile Evolution
Why Agile? - A handbook behind Agile EvolutionWhy Agile? - A handbook behind Agile Evolution
Why Agile? - A handbook behind Agile Evolution
 
Dynamical Context introduction word sensibility orientation
Dynamical Context introduction word sensibility orientationDynamical Context introduction word sensibility orientation
Dynamical Context introduction word sensibility orientation
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 

Hierarchical taxonomy extraction

  • 1. KDIR09, International Conference on Knowledge Discovery and Information Retrieval 2009. Hierarchical taxonomy extraction by mining topical query sessions. Miguel Fernández Fernández and Daniel Gayo Avello
  • 3. brittany spears www.wikipedia.org horse jumping auto restoration auto repair classic car repair car supplies classic car batteries vintageparts.com low cost airlines cheap flights easyjet.com
  • 4. brittany spears www.wikipedia.org horse jumping auto restoration auto repair classic car repair car supplies classic car batteries vintageparts.com low cost airlines cheap flights easyjet.com “... a series of interactions by the user toward addressing a single information need...” (Jansen et al. 2007)
  • 5. Unfortunately, not all queries are equally effective. (Wang and Zhai 2008. Mining term association patterns from search logs for effective query reformulation.)
  • 6. Unfortunately, not all queries are equally effective. (Wang and Zhai 2008. Mining term association patterns from search logs for effective query reformulation.) Misspecification: different people use different words to describe the same thing!
  • 7. Unfortunately, not all queries are equally effective. (Wang and Zhai 2008. Mining term association patterns from search logs for effective query reformulation.) Misspecification: different people use different words to describe the same thing! Underspecification: the user has shallow knowledge about what he is looking for
  • 8. How can they be mitigated? Underspecification. Misspecification.
  • 12. misspecification (typo); query suggestion; query expansion based on clustering
  • 14. Wow! Pretty cool, but... jargon, slang and vague domain knowledge... are still in the game
  • 15. Not
  • 16. Not semantic query suggestion & expansion
  • 17. How?
  • 19. hyponym |ˈhīpəˌnim| a word of more specific meaning than a general or superordinate term applicable to it.
  • 20. hyponym |ˈhīpəˌnim|, hyponymy |hīˈpänəmē|: a word of more specific meaning than a general or superordinate term applicable to it.
  • 21. hyponym |ˈhīpəˌnim|, hyponymy |hīˈpänəmē|: a word of more specific meaning than a general or superordinate term applicable to it. Transitivity ➞ deductive power
  • 22. hyponym |ˈhīpəˌnim|, hyponymy |hīˈpänəmē|: a word of more specific meaning than a general or superordinate term applicable to it. Transitivity ➞ deductive power (Socrates is mortal)
  • 23. hyponym |ˈhīpəˌnim|, hyponymy |hīˈpänəmē|: a word of more specific meaning than a general or superordinate term applicable to it. Transitivity ➞ deductive power (Socrates is mortal). Hyponym semantic equivalence (synsets)
  • 24. hyponym |ˈhīpəˌnim|, hyponymy |hīˈpänəmē|: a word of more specific meaning than a general or superordinate term applicable to it. Transitivity ➞ deductive power (Socrates is mortal). Hyponym semantic equivalence (synsets): Ferrari and Lamborghini are luxury cars
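The "transitivity ➞ deductive power" point can be illustrated with a small sketch. It assumes hyponymy relations are stored as (hyponym, hypernym) pairs and reuses the underwear ← briefs ← speedo chain that appears later in the results slides; the function name `hypernym_closure` is hypothetical and not part of the authors' system.

```python
from collections import defaultdict


def hypernym_closure(edges):
    """Return, for every hyponym, all hypernyms reachable by transitivity."""
    direct = defaultdict(set)
    for hyponym, hypernym in edges:
        direct[hyponym].add(hypernym)

    closure = {}
    for term in list(direct):
        seen = set()
        stack = list(direct[term])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                # Follow the chain upwards; terms without hypernyms end it.
                stack.extend(direct.get(current, ()))
        closure[term] = seen
    return closure


# (hyponym, hypernym) pairs taken from the chain underwear <- briefs <- speedo.
edges = [("speedo", "briefs"), ("briefs", "underwear")]
closure = hypernym_closure(edges)
print(closure["speedo"])
```

Although only two direct relations are stored, "speedo is a kind of underwear" can be deduced from them, which is the kind of inference a flat list of related queries would not support.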
  • 25. Semantic data sources, ordered by complexity and semantic richness. Taxonomies: hyponymy
  • 26. Semantic data sources, ordered by complexity and semantic richness. Taxonomies: hyponymy. Thesauri: synonymy, hyponymy
  • 27. Semantic data sources, ordered by complexity and semantic richness. Taxonomies: hyponymy. Thesauri: synonymy, hyponymy. Wordnets: hyponymy, synonymy, meronymy, troponymy, entailment, [...]
  • 28. Semantic data sources, ordered by complexity and semantic richness. Taxonomies: hyponymy. Thesauri: synonymy, hyponymy. Wordnets: hyponymy, synonymy, meronymy, troponymy, entailment, [...]. Ontologies: ANY
  • 29. Miller and Fellbaum 1990. WordNet, an online Lexical Database
  • 30. Miller and Fellbaum 1990. WordNet, an online Lexical Database. Hard to maintain (despite Hearst '92)
  • 31. Miller and Fellbaum 1990. WordNet, an online Lexical Database. Hard to maintain (despite Hearst '92). Language specific
  • 32. Miller and Fellbaum 1990. WordNet, an online Lexical Database. Hard to maintain (despite Hearst '92). Language specific. Absence of proper names, jargon, slang (Mandala et al. '99; Gabrilovich & Markovitch '07)
  • 33. Our proposal for KDIR'09
  • 34. Automatically build hyponym taxonomies that capture not only formal lexicon semantics, but also the relations between the terms actually used by search engine users. Do it without needing any source of information other than the query log itself
  • 35. Marti A. Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. Caraballo. 1999. Automatic construction of a hypernym-labeled noun hierarchy from text. Girju, Badulescu and Moldovan. 2003. Learning semantic constraints for the automatic discovery of part-whole relations. [...]
  • 36. Baeza-Yates and Tiberi. 2007. Extracting semantic relations from query logs. Shen et al. 2008. Mining web query hierarchies from clickthrough data. Paşca '07; Sekine and Suzuki '07; Mika '07; Schmitz '06; Komachi and Suzuki '08
  • 37. Baeza-Yates and Tiberi. 2007. Extracting semantic relations from query logs. Shen et al. 2008. Mining web query hierarchies from clickthrough data. Paşca '07; Sekine and Suzuki '07; Mika '07; Schmitz '06; Komachi and Suzuki '08 [overlaid annotation garbled in extraction]
  • 40. 1. Reveal topical sessions
  • 41. 1. Reveal topical sessions 2. Filter noisy information
  • 42. 1. Reveal topical sessions 2. Filter noisy information 3. Identify Generalization / Specialization patterns
  • 43. 1. Reveal topical sessions 2. Filter noisy information 3. Identify Generalization / Specialization patterns 4. Extract hyponymy relations from patterns
  • 45. AOL Log 2006, > 30M queries: sessionization
  • 46. Daniel Gayo-Avello. 2009. “A survey on session detection methods in query logs and a proposal for future evaluation”
  • 47. summer collection briefs (17:46:48); speedo summer collection (17:48:33); madonna get into the groove (17:55:47); madonna get into the groove (17:57:29); videogames cheats and codes (18:02:56); cheatsandcodes.com (18:10:27); madonna get into the groove (18:11:40); getintothegroovelyrics (18:12:27). Daniel Gayo-Avello. 2009. “A survey on session detection methods in query logs and a proposal for future evaluation”
  • 48. summer collection briefs (17:46:48); speedo summer collection (17:48:33). madonna get into the groove (17:55:47); madonna get into the groove (17:57:29); madonna get into the groove (18:11:40); getintothegroovelyrics (18:12:27). videogames cheats and codes (18:02:56); cheatsandcodes.com (18:10:27). Daniel Gayo-Avello. 2009. “A survey on session detection methods in query logs and a proposal for future evaluation”
  • 50. summer collection briefs (17:46:48); speedo summer collection (17:48:33). madonna get into the groove (17:55:47); madonna get into the groove (17:57:29); madonna get into the groove (18:11:40); getintothegroovelyrics (18:12:27). videogames cheats and codes (18:02:56); cheatsandcodes.com (18:10:27)
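The regrouping shown above, where the interleaved queries end up in three topical sessions, can be approximated with a minimal sketch. This is not the session detection method of the cited survey: it assumes a simple character-trigram overlap between queries, and the function names and the 0.5 threshold are hypothetical choices made for illustration.

```python
import re


def trigrams(query):
    """Character trigrams of the query, ignoring case and non-alphanumerics."""
    text = re.sub(r"[^a-z0-9]", "", query.lower())
    return {text[i:i + 3] for i in range(len(text) - 2)}


def similarity(a, b):
    """Overlap coefficient between the trigram sets of two queries."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))


def topical_sessions(log, threshold=0.5):
    """Group (query, time) pairs by topic rather than by adjacency in time."""
    sessions = []
    for query, time in log:
        best, best_score = None, 0.0
        for session in sessions:
            score = max(similarity(query, other) for other, _ in session)
            if score > best_score:
                best, best_score = session, score
        if best is not None and best_score >= threshold:
            best.append((query, time))
        else:
            sessions.append([(query, time)])  # start a new topical session
    return sessions


# The log fragment from the slides, in chronological order.
log = [
    ("summer collection briefs", "17:46:48"),
    ("speedo summer collection", "17:48:33"),
    ("madonna get into the groove", "17:55:47"),
    ("madonna get into the groove", "17:57:29"),
    ("videogames cheats and codes", "18:02:56"),
    ("cheatsandcodes.com", "18:10:27"),
    ("madonna get into the groove", "18:11:40"),
    ("getintothegroovelyrics", "18:12:27"),
]
for session in topical_sessions(log):
    print([query for query, _ in session])
```

Character trigrams are used instead of whole words so that run-together queries such as getintothegroovelyrics still match their spaced-out counterparts.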
  • 52. summer collection briefs (17:46:48); speedo summer collection (17:48:33). madonna get into the groove (17:55:47); madonna get into the groove (17:57:29); madonna get into the groove (18:11:40); getintothegroovelyrics (18:12:27). videogames cheats and codes (18:02:56); cheatsandcodes.com (18:10:27). Jim Jansen and Amanda Spink. 2008. Determining the informational, navigational and transactional intent of queries.
  • 56. summer collection briefs (17:46:48); speedo summer collection (17:48:33)
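The slides do not spell out the filtering rules, only that the three sessions shrink to the first one. A minimal sketch consistent with that outcome, assuming that navigational queries and repeated queries are dropped and that sessions left with fewer than two queries are discarded, could look as follows. The `is_navigational` heuristic and its length cut-off are hypothetical stand-ins for a real intent classifier.

```python
def is_navigational(query, min_length=15):
    """Hypothetical heuristic: a single token that looks like a site name."""
    return " " not in query and ("." in query or len(query) >= min_length)


def filter_session(queries):
    """Drop navigational queries and repeats, keeping first occurrences."""
    kept = []
    for query in queries:
        if not is_navigational(query) and query not in kept:
            kept.append(query)
    return kept


def filter_sessions(sessions):
    """Keep only sessions that still contain at least two distinct queries."""
    filtered = (filter_session(session) for session in sessions)
    return [session for session in filtered if len(session) >= 2]


# The three topical sessions from the slides.
sessions = [
    ["summer collection briefs", "speedo summer collection"],
    ["madonna get into the groove", "madonna get into the groove",
     "madonna get into the groove", "getintothegroovelyrics"],
    ["videogames cheats and codes", "cheatsandcodes.com"],
]
print(filter_sessions(sessions))
```

A session needs at least two different queries to expose a generalization or specialization step, which is why single-query sessions carry no usable signal here.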
  • 58. fish food / tropical fish food: terms added (trivial)
  • 59. fish food / tropical fish food: terms added (trivial). formula one pilots / Fernando Alonso: queries don't share any term
  • 60. fish food / tropical fish food: terms added (trivial). formula one pilots / Fernando Alonso: queries don't share any term (out of scope)
  • 61. fish food / tropical fish food: terms added (trivial). formula one pilots / Fernando Alonso: queries don't share any term (out of scope). speedo summer collection / summer collection briefs: some terms added, others removed
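The three reformulation patterns can be told apart by comparing the term sets of two consecutive queries. The sketch below assumes whitespace tokenization and case-insensitive matching; the function name and the returned labels are illustrative only.

```python
def classify_pair(first, second):
    """Label the reformulation pattern between two queries of a session."""
    a = set(first.lower().split())
    b = set(second.lower().split())
    if a == b:
        return "same terms"
    if not a & b:
        return "no shared term (out of scope)"
    if a < b or b < a:
        # One query only adds terms to the other.
        return "terms added (trivial)"
    return "some terms added, others removed"


print(classify_pair("fish food", "tropical fish food"))
print(classify_pair("formula one pilots", "Fernando Alonso"))
print(classify_pair("speedo summer collection", "summer collection briefs"))
```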
  • 65. Relation extraction: Specialization w/reformulation. summer collection briefs: 35,000,000; speedo summer collection: 163,000
  • 66. Relation extraction: Specialization w/reformulation summer collection briefs ⊇ speedo summer collection
  • 67. Relation extraction: Specialization w/reformulation briefs speedo ✓
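One way to read the three slides above is sketched below. It assumes that the figures next to each query are result counts, that the query with the larger count is the more general one, and that the terms not shared by the two queries form the extracted relation. These are interpretations of the slides, not a description of the authors' implementation, and all names are hypothetical.

```python
def specialization_with_reformulation(first, second, hits):
    """Return (general_term, specific_term) for a reformulated pair, or None."""
    a, b = first.lower().split(), second.lower().split()
    only_a = [term for term in a if term not in b]
    only_b = [term for term in b if term not in a]
    # Both queries must contribute a distinct term and share at least one.
    if not only_a or not only_b or len(only_a) == len(a):
        return None
    if hits[first] == hits[second]:
        return None  # no evidence about which query is more general
    if hits[first] > hits[second]:
        return " ".join(only_a), " ".join(only_b)
    return " ".join(only_b), " ".join(only_a)


# Figures shown on the slide, assumed here to be result counts.
hits = {
    "summer collection briefs": 35_000_000,
    "speedo summer collection": 163_000,
}
relation = specialization_with_reformulation(
    "summer collection briefs", "speedo summer collection", hits
)
print(relation)
```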
  • 68. Relation extraction: Trivial specialization fish food tropical fish food
  • 69. Relation extraction: Trivial specialization fish food tropical fish food ✓
  • 76. Relation extraction: Trivial specialization. fish food / tropical fish food ✓. fish ← tropical fish ✓; fish ← fish food ✗; fish ← tropical fish food ✗; food ← fish food ✓; food ← tropical fish food ✓; fish food ← tropical fish food ✓
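The ✓ and ✗ marks above are consistent with a head-final rule: a candidate is accepted when the specific term ends with the general term. The sketch below assumes that rule, which is inferred from the marks rather than stated on the slides, and enumerates the candidate pairs from contiguous word n-grams. Function names are hypothetical.

```python
def ngrams(query):
    """All contiguous word n-grams of a query, as tuples of words."""
    words = query.lower().split()
    return [
        tuple(words[i:j])
        for i in range(len(words))
        for j in range(i + 1, len(words) + 1)
    ]


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def candidate_pairs(general_query, specific_query):
    """Pair each n-gram of the general query with longer n-grams containing it."""
    pairs = []
    for general in ngrams(general_query):
        for specific in ngrams(specific_query):
            if len(specific) > len(general) and contains(specific, general):
                pairs.append((" ".join(general), " ".join(specific)))
    return pairs


def is_hyponym(general, specific):
    """Assumed head-final rule: the specific term must end with the general one."""
    return specific.endswith(" " + general)


pairs = candidate_pairs("fish food", "tropical fish food")
accepted = [pair for pair in pairs if is_hyponym(*pair)]
for general, specific in accepted:
    print(f"{general} <- {specific}")
```

Under this rule "fish food" is rejected as a kind of "fish" because its head is "food", while "tropical fish" is accepted.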
  • 78. Preliminary results (3000 instances). Overall, correct / wrong: 62.67% / 37.33%
  • 79. Preliminary results (3000 instances). Overall, correct / wrong: 62.67% / 37.33%. Correct, present in WordNet / not present in WordNet: 70.22% / 29.78%
  • 80. Preliminary results (3000 instances). Overall, correct / wrong: 62.67% / 37.33%. Correct, present in WordNet / not present in WordNet: 70.22% / 29.78%. Examples: eventing ← jumping; underwear ← briefs ← speedo; celtic ← irish
  • 81. Preliminary results (3000 instances). Overall, correct / wrong: 62.67% / 37.33%. Correct, present in WordNet / not present in WordNet: 70.22% / 29.78%. Wrong, co-hyponyms / unrelated terms: 53.75% / 46.25%. Examples: eventing ← jumping; underwear ← briefs ← speedo; celtic ← irish
  • 82. Preliminary results (3000 instances). Overall, correct / wrong: 62.67% / 37.33%. Correct, present in WordNet / not present in WordNet: 70.22% / 29.78%. Wrong, co-hyponyms / unrelated terms: 53.75% / 46.25%. Examples: eventing ← jumping; underwear ← briefs ← speedo; celtic ← irish; yellow ← white; honda ← kawasaki
  • 83. Preliminary results (3000 instances). Overall, correct / wrong: 62.67% / 37.33%. Correct, present in WordNet / not present in WordNet: 70.22% / 29.78%. Wrong, co-hyponyms / unrelated terms: 53.75% / 46.25%. Examples: eventing ← jumping; underwear ← briefs ← speedo; celtic ← irish; yellow ← white; honda ← kawasaki; fish food ← fish; scandal ← election
  • 85. Work in progress. Machine learning specialization detection (Paolo Boldi et al. 2009. From 'dango' to 'japanese cakes')
  • 86. Work in progress. Machine learning specialization detection (Paolo Boldi et al. 2009. From 'dango' to 'japanese cakes'). qi: Formula one pilots; qj: Fernando Alonso
  • 87. Work in progress. Machine learning specialization detection (Paolo Boldi et al. 2009. From 'dango' to 'japanese cakes'). qi: Formula one pilots; qj: Fernando Alonso. Multi-word term identification (Rosie Jones et al. 2006. Generating query substitutions)
  • 88. Work in progress. Machine learning specialization detection (Paolo Boldi et al. 2009. From 'dango' to 'japanese cakes'). qi: Formula one pilots; qj: Fernando Alonso. Multi-word term identification (Rosie Jones et al. 2006. Generating query substitutions). golden globe awards; new york maps
  • 90. Near-future work: finish ongoing work
  • 91. Near-future work: finish ongoing work; evaluation framework
  • 92. Near-future work: finish ongoing work; evaluation framework; relevance ranking
  • 93. Near-future work: finish ongoing work; evaluation framework; relevance ranking. Suggestions?