Folksonomy-Based Adaptive
        Query Expansion
Claudio Biancalana, Fabio Gasparetti, Alessandro Micarelli,
         Alfonso Miola, and Giuseppe Sansonetti

          Department of Computer Science and Automation
         Artificial Intelligence Laboratory, Roma Tre University
            Via della Vasca Navale, 79, 00146 Rome, Italy




             SRS 2012 – Montreal, Canada, July 17, 2012
State of the Art
•  1993 - Web Search Engines
•  Popular techniques to improve their performance
    Explicit Relevance Feedback and (Automatic) Query Expansion
     (Maron and Kuhns 1960, Rocchio 1971)
    PageRank (1998)
    (Implicitly built) User Profiles (2004)
     •  e.g., Google Personalized
    Exploiting Social Networks or Signals (2010)
     •  Facebook, YouTube, Twitter
    Implicitly Understanding User Actions

Query Expansion
The process of expanding a user query with additional
related words and phrases
    Original Query Q:             {q1, q2,…, qk, qk+1,…, qn}
    Terms to Add Q+:              {e1, e2,..., em}
    Terms to Remove Q-:           {qk+1,..., qn}

                   Expanded Query
                 EQ = (Q ∪ Q+) - Q-
                {q1,q2,...,qk,e1,e2,...,em}
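
The set expression above can be sketched in a few lines of Python. This is only an illustration of EQ = (Q ∪ Q+) - Q-; the function name `expand_query` and the sample terms are assumptions made for the example and do not come from the slides.

```python
def expand_query(query, to_add, to_remove):
    """Return EQ = (Q ∪ Q+) - Q-, keeping the original term order first."""
    removed = set(to_remove)
    expanded = []
    for term in list(query) + list(to_add):
        # Skip removed terms and avoid duplicates (set semantics, stable order).
        if term not in removed and term not in expanded:
            expanded.append(term)
    return expanded


# Hypothetical example: drop a stop word, add two related terms.
original = ["jaguar", "speed", "the"]
print(expand_query(original, to_add=["car", "engine"], to_remove=["the"]))
```

A list is used instead of a set so that the original query terms keep their position ahead of the added ones.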

Building a Co-Occ Matrix
For each document, a co-occurrence matrix is generated;
the per-document matrices are then summed into a single matrix
    Usually a POS tagger extracts nouns, proper nouns, and
     adjectives

                  t1    t2    t3    t4    t5
          t1      0.0   1.0   0.0   2.0   1.0
          t2      1.0   0.0   3.0   2.0   0.0
          t3      0.0   3.0   0.0   9.0   0.0
          t4      2.0   2.0   9.0   0.0   4.0
          t5      1.0   0.0   0.0   4.0   0.0

                 Co-Occurrence Matrix
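
A minimal sketch of how such a matrix might be accumulated follows. The slides do not say which co-occurrence window is used, so this example assumes the whole document is the window and counts each pair of distinct terms once per document. POS tagging is skipped, and the function names and sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def document_cooccurrence(terms):
    """Count each unordered pair of distinct terms once for one document."""
    counts = Counter()
    for a, b in combinations(sorted(set(terms)), 2):
        counts[(a, b)] += 1
    return counts


def corpus_cooccurrence(documents):
    """Sum the per-document matrices into a single sparse matrix."""
    total = Counter()
    for terms in documents:
        total.update(document_cooccurrence(terms))
    return total


def cooccurrence(matrix, a, b):
    """Symmetric lookup; the diagonal is 0, as in the matrix above."""
    if a == b:
        return 0
    return matrix[tuple(sorted((a, b)))]


# Hypothetical documents, already reduced to their extracted terms.
docs = [["bank", "river", "water"], ["bank", "money"], ["river", "bank"]]
matrix = corpus_cooccurrence(docs)
print(cooccurrence(matrix, "river", "bank"))
```

Storing only the sorted pair keeps the matrix symmetric by construction and avoids allocating the many zero cells of a dense table.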
Limits of Co-Occ Matrices
•  Furnas’ Vocabulary problem (1987)
    Polysemy and Homonymy
     •  Mouth (river-sea; cave entrance; body part)
     •  River Bank or Financial Bank



•  Corpus-dependent
    Small corpora yield sparse statistics
    Relevant concepts may be missing



Research Question


Is it possible to combine Query Expansion,
Social Web, Semantic Search, and User
Personalization in traditional Web search tools?




Nereau




Nereau, Master of Spiders, is the name of a divinity worshipped in the Nauru
islands, in Micronesia. It is a foremost figure in many myths, some of which
give it a specific role, that of endowing the mad with rationality and the mute
with speech, thus making them complete human beings.
Nereau Co-Occurrence Matrix
•  Extension of the co-occurrence matrix:
    Semantic metadata as a 3rd dimension
    The user matrix is built on usage data
•  Use of Social Bookmarking Services for metadata
  retrieval:
    e.g., Delicious, StumbleUpon, Digg




Nereau Co-Occurrence Matrix
•  Tags attached to visited URLs are collected
  and linked to the (stemmed) keywords extracted
  from the page content.




•  Each co-occ matrix is associated with a tag
                <t1, t2, tag, co-occ>
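
One way to picture the tag as a third dimension is a dictionary of sparse matrices, one per tag, which flattens to the <t1, t2, tag, co-occ> tuples above. The sketch below assumes each visited URL has already been reduced to a keyword list and a tag list. Stemming and the bookmarking-service lookup are left out, and every name in it is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_tagged_cooccurrence(visits):
    """visits: iterable of (keywords, tags) pairs, one pair per visited URL.

    Returns {tag: {(t1, t2): count}}, i.e. one co-occurrence matrix per tag.
    """
    tagged = defaultdict(lambda: defaultdict(float))
    for keywords, tags in visits:
        pairs = list(combinations(sorted(set(keywords)), 2))
        for tag in set(tags):
            for pair in pairs:
                tagged[tag][pair] += 1.0
    return tagged


def as_tuples(tagged):
    """Flatten the structure into sorted <t1, t2, tag, co-occ> tuples."""
    return sorted(
        (t1, t2, tag, count)
        for tag, matrix in tagged.items()
        for (t1, t2), count in matrix.items()
    )


# Hypothetical usage data: keywords from page content, tags from bookmarks.
visits = [
    (["python", "snake"], ["animals"]),
    (["python", "code"], ["programming"]),
    (["python", "code", "snake"], ["programming"]),
]
for row in as_tuples(build_tagged_cooccurrence(visits)):
    print(row)
```

Keeping a separate matrix per tag is what lets an ambiguous keyword such as "python" carry different neighbours under different tags.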
Nereau Co-Occurrence Matrix
•  The expansion follows similar steps: each term of
  the query retrieves multiple co-occ matrices
  associated with different tags




•  The tag occurrences are summed over all the
  query terms, yielding a weighted set of tags.


Nereau Co-Occurrence Matrix
The co-occurring keywords associated with the most relevant
tags compose the new query
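
The expansion steps on these slides can be sketched as follows: weight each tag by the co-occurrence mass attached to the query terms, keep the top tags, and extend the query with the strongest co-occurring keywords under each of them. The exact weighting used by Nereau is not given in the slides, so the plain sums and the cut-offs `top_tags` and `terms_per_tag` are assumptions, as are the sample data.

```python
from collections import Counter


def expand_with_tags(query_terms, tagged, top_tags=1, terms_per_tag=2):
    """tagged: {tag: {(t1, t2): count}}. Returns (tag weights, expansions)."""
    query = set(query_terms)

    # Sum, for every tag, the co-occurrence values over all query terms.
    tag_weights = Counter()
    for tag, matrix in tagged.items():
        for term in query:
            for (t1, t2), count in matrix.items():
                if term in (t1, t2):
                    tag_weights[tag] += count

    # For the most relevant tags, add the strongest co-occurring keywords.
    expansions = {}
    for tag, _ in tag_weights.most_common(top_tags):
        candidates = Counter()
        for (t1, t2), count in tagged[tag].items():
            if t1 in query and t2 not in query:
                candidates[t2] += count
            elif t2 in query and t1 not in query:
                candidates[t1] += count
        best = [term for term, _ in candidates.most_common(terms_per_tag)]
        expansions[tag] = list(query_terms) + best
    return tag_weights, expansions


# Hypothetical tag-indexed co-occurrence data.
tagged = {
    "animals": {("python", "snake"): 3.0, ("python", "zoo"): 1.0},
    "programming": {
        ("code", "python"): 5.0,
        ("library", "python"): 2.0,
        ("code", "library"): 4.0,
    },
}
weights, expansions = expand_with_tags(["python"], tagged)
print(weights)
print(expansions)
```

Raising `top_tags` yields one expanded query per tag, which is one way an ambiguous query could be split into alternative senses.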




Experimental Evaluations




         Does it work?




Experimental Evaluations
•  Three kinds of evaluations
    TREC corpus-based (500K docs, 249 queries)
     •  RF vs CoOcc vs Google vs Nereau
    ODP corpus-based
     •  RF vs CoOcc vs Google vs Nereau
    Web user-based
     •  Google vs PersGoogle vs RF vs Nereau



•  nDCG, P@n, MAP
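
For reference, the three measures can be computed as in the sketch below. It uses common textbook definitions, which may differ in detail from the ones used in the evaluation. In particular, average precision here assumes every relevant document appears in the ranked list, and the ideal ranking for nDCG is taken from the same list. The input is a list of relevance grades in rank order.

```python
import math


def precision_at(relevances, n):
    """P@n: fraction of the top n results with relevance > 0."""
    return sum(1 for r in relevances[:n] if r > 0) / n


def average_precision(relevances):
    """Mean of the precision values at each relevant rank (0.0 if none)."""
    hits, total = 0, 0.0
    for rank, r in enumerate(relevances, start=1):
        if r > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(runs):
    """MAP: average precision averaged over queries."""
    return sum(average_precision(r) for r in runs) / len(runs)


def dcg_at(relevances, n):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(
        r / math.log2(rank + 1)
        for rank, r in enumerate(relevances[:n], start=1)
    )


def ndcg_at(relevances, n):
    """DCG normalised by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at(sorted(relevances, reverse=True), n)
    return dcg_at(relevances, n) / ideal if ideal > 0 else 0.0


# Hypothetical graded judgments for one ranked result list.
ranking = [3, 0, 2, 0, 1]
print(precision_at(ranking, 5), average_precision(ranking), ndcg_at(ranking, 5))
```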

Experimental Evaluations
Web corpus
42 users on real Web sessions
nDCG@{1,5,10}


Conclusions and Future Work
•  Nereau: a search engine that combines:
       Traditional Query Expansion

       Social Web

       Semantic Spaces

       Basic User Personalization

•  Suitable to be included in traditional search engines:
       Complexity O(n²K)

       n = training docs

       K = keywords extracted

•  Future Work
       Including more social data (e.g., networks, user authority)

       Addressing the dynamics of folksonomies

       Automatically assigning tags when no social data is available

