Dynamic Collective Entity
Representations for Entity Ranking
David Graus, Manos Tsagkias, Wouter Weerkamp, Edgar Meij, Maarten de Rijke
4
Entity search?
 Index = Knowledge Base (= Wikipedia)
 Documents = Entities
 “Real world entities” have a single representation
(in KB)
5
Representation is not static
 People talk about entities all the time
 Associations between words and entities change
over time
6
Example 1: News events
7
Example 2: Social media chatter
8
Dynamic Collective Entity
Representations
 Use “collective intelligence” to mine entity
descriptions to enrich representation.
 Is like document expansion (add terms found
through explicit links)
 Is not query expansion (terms found through
predicted links)
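A minimal sketch of the document-expansion idea above, assuming a bag-of-words entity document and a toy term-overlap score. The class and function names and the KB text are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


class EntityDocument:
    """Hypothetical entity document: a bag of terms kept in the index."""

    def __init__(self, name, kb_text):
        self.name = name
        self.terms = Counter(tokenize(kb_text))

    def expand(self, description):
        """Document expansion: add terms from an explicitly linked description."""
        self.terms.update(tokenize(description))


def overlap_score(query, doc):
    """Toy retrieval score: summed term frequency of the query terms."""
    return sum(doc.terms[t] for t in tokenize(query))


# Illustrative KB text, not taken from the actual knowledge base.
doc = EntityDocument("Anthropornis", "Anthropornis extinct penguin genus Eocene")
score_before = overlap_score("biggest penguin", doc)

# A description linked to the entity (here: a query that led to a click on it).
doc.expand("biggest penguin")
score_after = overlap_score("biggest penguin", doc)
```

The query itself is never rewritten; only the indexed document changes, which is what separates this from query expansion.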
9
Advantages
 Cheap: Change document in index, leverage tried &
tested retrieval algorithms
 Free “smoothing”: dynamic sources (e.g., tweets) may capture newly
evolving word associations (e.g., “Ferguson shooting”) and
incorporate out-of-document terms
 “move relevant documents closer to queries” (= close
the gap between searcher vocabulary & docs in index)
10
Haven’t we seen this before?
 Anchors & queries in particular have been shown to
improve retrieval [1]
 Tweets have been shown to be similar to anchors [2]
 Social tags as well [3]
 But so far:
 only in batch (i.e., add data, then see how it affects retrieval)
 only from a single source
[1] T. Westerveld, W. Kraaij, and D. Hiemstra. Retrieving web pages using content, links, URLs and anchors. TREC 2001.
[2] G. Mishne and J. Lin. Twanchor text: A preliminary study of the value of tweets as anchor text. SIGIR ’12.
[3] C.-J. Lee and W. B. Croft. Incorporating social anchors for ad hoc retrieval. OAIR ’13.
11
Description sources
[Figure: description sources for the example entity Anthropornis nordenskjoeldi]
 Sources: KB Anchors, KB Categories, KB Redirects, KB Links, Web Anchors, Tags, Tweets, Queries
 Example description terms: Anthropornis, Nordenskjoeld's Giant Penguin, emperor penguin, Eocene, Oligocene, Animal, Chordate, Aves, Sphenisciformes, Spheniscidae, Eocene birds, Oligocene birds, Extinct penguins, Oligocene extinctions, Bird genera, megafauna, biggest penguin, extinct penguin, prehistoric birds
12
Challenge
 Heterogeneity
1. Description sources
2. Entities
 Dynamic nature
 Content changes over time
13
Method: Adaptive ranking
 Supervised single-field weighting model
 Features:
 field similarity: retrieval score per field.
 field “importance”: length, novel terms, etc.
 entity “importance”: time since last update.
 (Re-)learn optimal weights from clicks
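The feature groups above can be sketched as a linear ranker whose weights are nudged by clicks. This is a sketch under stated assumptions: the slides do not name the learner, so a pairwise perceptron-style update stands in for it, and the field names, entity layout, and learning rate are hypothetical.

```python
FIELDS = ("kb_text", "queries")  # hypothetical field names


def features(query_terms, entity, now):
    """Per-field similarity and length, plus time since the entity's last update."""
    feats = []
    for field in FIELDS:
        terms = entity["fields"].get(field, [])
        feats.append(float(sum(terms.count(t) for t in query_terms)))  # field similarity
        feats.append(float(len(terms)))  # field "importance": length
    feats.append(float(now - entity["last_update"]))  # entity "importance"
    return feats


def score(weights, feats):
    return sum(w * f for w, f in zip(weights, feats))


def rank(weights, query_terms, entities, now):
    return sorted(entities,
                  key=lambda e: score(weights, features(query_terms, e, now)),
                  reverse=True)


def update_from_click(weights, query_terms, ranking, clicked_id, now, lr=0.5):
    """Stand-in learner: prefer the clicked entity over every entity ranked above it."""
    clicked = next(e for e in ranking if e["id"] == clicked_id)
    f_click = features(query_terms, clicked, now)
    new_weights = list(weights)
    for entity in ranking:
        if entity["id"] == clicked_id:
            break
        f_other = features(query_terms, entity, now)
        new_weights = [w + lr * (fc - fo)
                       for w, fc, fo in zip(new_weights, f_click, f_other)]
    return new_weights


# Toy data (illustrative only).
entities = [
    {"id": "A", "fields": {"kb_text": ["giant", "penguin"], "queries": []},
     "last_update": 0},
    {"id": "B", "fields": {"kb_text": ["anthropornis"],
                           "queries": ["biggest", "penguin"]},
     "last_update": 8},
]
query = ["biggest", "penguin"]
weights = [1.0, 0.0, 0.0, 0.0, 0.0]  # start by trusting only KB text similarity

before = [e["id"] for e in rank(weights, query, entities, now=10)]
weights = update_from_click(weights, query, rank(weights, query, entities, now=10),
                            "B", now=10)
after = [e["id"] for e in rank(weights, query, entities, now=10)]
```

In this toy run, a click on B shifts weight toward the "queries" field, so B moves to the top once the weights are re-learned.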
14
Experimental setup
1. Data:
 MSN Query log (62,841 queries + clicks (on entities))
 Each query is treated as a time unit
 For each query:
 Produce ranking
 Observe click
 Evaluate ranking (MAP/P@1)
 Expand entities (w/ dynamic descriptions)
 [re-train ranker]
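The per-query loop above can be sketched as follows, assuming the clicked entity is the only relevant one for each query and that expansion adds the query's terms to that entity. The index layout and the overlap ranker are hypothetical placeholders, not the experimental system.

```python
def average_precision(ranking, relevant):
    """AP for one query; with a single clicked entity this equals 1 / rank."""
    hits, total = 0, 0.0
    for position, entity in enumerate(ranking, start=1):
        if entity in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


def precision_at_1(ranking, relevant):
    return 1.0 if ranking and ranking[0] in relevant else 0.0


def rank(index, query):
    """Toy ranker: sort entities by term overlap, ties broken by name."""
    q = set(query.lower().split())
    return sorted(index, key=lambda e: (-len(q & index[e]), e))


def run_log(index, log):
    """Treat each logged query as one time unit."""
    ap_scores, p1_scores = [], []
    for query, clicked in log:
        ranking = rank(index, query)                 # produce ranking
        relevant = {clicked}                         # observe click
        ap_scores.append(average_precision(ranking, relevant))  # evaluate
        p1_scores.append(precision_at_1(ranking, relevant))
        index[clicked] |= set(query.lower().split())  # expand entity
        # [re-train ranker] would go here; this toy ranker has no parameters.
    return sum(ap_scores) / len(ap_scores), sum(p1_scores) / len(p1_scores)


# Illustrative index and log, not real data.
index = {
    "Anthropornis": {"anthropornis", "nordenskjoeldi"},
    "Emperor penguin": {"emperor", "penguin"},
}
log = [("biggest penguin", "Anthropornis"), ("biggest penguin", "Anthropornis")]
mean_ap, mean_p1 = run_log(index, log)
```

Each ranking is evaluated before the entity is expanded with that query, so a query never benefits from its own click.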
15
Main results
 Comparing effectiveness of different description
sources
 Comparing adaptive vs. non-adaptive ranker
performance
16
Description sources
[Plot: MAP vs. no. of queries]
17
Feature weights over time
[Plot: relative feature importance vs. no. of queries]
18
Non-adaptive vs. adaptive ranking
19
In summary
 Expanding entity representations with different
sources enables better matching of queries to
entities
 As new content comes in, it is beneficial to retrain
the ranker
 Informing ranker of “expansion state” further
improves performance
20
Thank you
 (Also, thank you WSDM & SIGIR travel grants)

Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
 

Dynamic Collective Entity Representations for Entity Ranking

  • 1. Dynamic Collective Entity Representations for Entity Ranking. David Graus, Manos Tsagkias, Wouter Weerkamp, Edgar Meij, Maarten de Rijke
  • 4. Entity search? Index = Knowledge Base (= Wikipedia); documents = entities; “real world entities” have a single representation (in KB).
  • 5. Representation is not static: people talk about entities all the time; associations between words and entities change over time.
  • 7. Example 2: Social media chatter
  • 8. Dynamic Collective Entity Representations: use “collective intelligence” to mine entity descriptions to enrich the representation. This is like document expansion (add terms found through explicit links); it is not query expansion (terms found through predicted links).
  • 9. Advantages. Cheap: change the document in the index, leverage tried & tested retrieval algorithms. Free “smoothing”: external descriptions (e.g., tweets) may capture ‘newly evolving’ word associations (Ferguson shooting) and incorporate out-of-document terms. “Move relevant documents closer to queries” (= close the gap between searcher vocabulary & docs in index).
  • 10. Haven’t we seen this before? Anchors & queries in particular have been shown to improve retrieval [1]; tweets have been shown to be similar to anchors [2]; social tags, same [3]. But: in batch (i.e., add data, see how it affects retrieval) and single source. [1] T. Westerveld, W. Kraaij, and D. Hiemstra. Retrieving web pages using content, links, urls and anchors. TREC 2001. [2] G. Mishne and J. Lin. Twanchor text: A preliminary study of the value of tweets as anchor text. SIGIR ’12. [3] C.-J. Lee and W. B. Croft. Incorporating social anchors for ad hoc retrieval. OAIR ’13.
  • 11. Description sources (figure; example entity: Anthropornis nordenskjoeldi). Sources shown: KB Anchors, KB Categories, KB Redirects, KB Links, Web Anchors, Tags, Tweets, Queries. Example terms shown: Anthropornis; Nordenskjoeld's Giant Penguin; Eocene; Oligocene; Animal; Chordate; Aves; Sphenisciformes; Spheniscidae; ...; emperor penguin; Eocene birds; Oligocene birds; Extinct penguins; Oligocene extinctions; Bird genera; megafauna; biggest penguin; anthropornis; extinct penguin; prehistoric birds.
  • 12. Challenge. Heterogeneity: (1) description sources, (2) entities. Dynamic nature: content changes over time.
  • 13. Method: Adaptive ranking. Supervised single-field weighting model. Features: field similarity (retrieval score per field); field “importance” (length, novel terms, etc.); entity “importance” (time since last update). (Re-)learn optimal weights from clicks.
  • 14. Experimental setup. Data: MSN Query log (62,841 queries + clicks (on entities)). Each query is treated as a time unit. For each query: produce ranking; observe click; evaluate ranking (MAP/P@1); expand entities (w/ dynamic descriptions); [re-train ranker].
  • 15. Main results: comparing effectiveness of different description sources; comparing adaptive vs. non-adaptive ranker performance.
  • 17. Feature weights over time (plot: relative feature importance vs. no. of queries)
  • 19. In summary: expanding entity representations with different sources enables better matching of queries to entities; as new content comes in, it is beneficial to retrain the ranker; informing the ranker of the “expansion state” further improves performance.
  • 20. Thank you (also, thank you WSDM & SIGIR travel grants).
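
The per-query loop from the experimental setup (produce ranking, observe click, evaluate, expand entities) can be sketched roughly as below. This is a toy illustration, not the authors' implementation: the term-overlap scorer, the field names `kb` and `queries`, the fixed weights, and the two-entity data are all assumptions made up for the example, and the re-training step is left out. With a single clicked entity per query, average precision reduces to the reciprocal rank of the click, which is what the sketch uses.

```python
from collections import Counter

def field_score(query_terms, field_terms):
    """Toy retrieval score: number of query-term occurrences in the field."""
    counts = Counter(field_terms)
    return sum(counts[t] for t in query_terms)

def rank(query, entities, weights):
    """Rank entity ids by the weighted sum of per-field scores, best first."""
    q = query.lower().split()
    scored = []
    for eid, fields in entities.items():
        s = sum(weights.get(name, 0.0) * field_score(q, terms)
                for name, terms in fields.items())
        scored.append((s, eid))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by id
    return [eid for _, eid in scored]

def run_stream(log, entities, weights):
    """Treat each (query, clicked entity) pair as one time unit."""
    p_at_1, avg_prec = [], []
    for query, clicked in log:
        ranking = rank(query, entities, weights)                 # produce ranking
        p_at_1.append(1.0 if ranking[0] == clicked else 0.0)     # evaluate P@1
        avg_prec.append(1.0 / (ranking.index(clicked) + 1))      # AP, one click
        # Expand the clicked entity with the query terms (dynamic description).
        entities[clicked].setdefault("queries", []).extend(query.lower().split())
        # Re-training the ranker would happen here; omitted in this sketch.
    return sum(p_at_1) / len(p_at_1), sum(avg_prec) / len(avg_prec)

# Hypothetical toy data, loosely based on the penguin example in the slides.
entities = {
    "anthropornis": {"kb": ["anthropornis", "nordenskjoeldi", "extinct", "penguin"]},
    "emperor_penguin": {"kb": ["emperor", "penguin", "largest", "living", "penguin"]},
}
weights = {"kb": 1.0, "queries": 1.0}
log = [("biggest penguin", "anthropornis"), ("biggest penguin", "anthropornis")]

p1, mean_ap = run_stream(log, entities, weights)
print(p1, mean_ap)
```

In this toy run the first query misses at rank 1, the clicked entity is then expanded with the query terms, and the repeated query hits at rank 1, which is the kind of effect the streaming setup is designed to measure.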

Editor's Notes

  1. First entities & structure; I get to show the mandatory entity search example
  2. You are not interested in documents but in things: e.g., the person/artist Kendrick Lamar, referred to by his former stage name
  3. so it is like web search, but the units of retrieval are real life entities, so we can collect data for them
  4. This is what we try to leverage in this work
  5. July 31st, then after August 7th -> added content, new word associations
  6. This looks a bit extreme because there’s swearing, but there’s a serious intuition here: the vocabulary gap (formal KB, informal chatter)
  7. Our method aims to leverage this to enrich the representation + close the gap
  8. of collective intelligence / description sources
  9. we look at a scenario where the expansions come in a streaming manner
  10. Fielded document representation
  11. You could do vanilla retrieval, but two challenges arise. (1) Heterogeneity: description sources differ along several dimensions (e.g., volume, quality, novelty), and head entities are likely to receive a larger number of external descriptions than tail entities. (2) Dynamic nature: content changes over time, so expansions may accumulate and “swamp” the representation.
  12. Our solution is to dynamically learn how to combine fields into a single representation. Features (more detail in paper): field similarity features (per field) = query–field similarity scores; field importance features (per field), to inform the ranker of the status of the field at that time (i.e., more and novel content); entity importance (to favor “recently” updated entities). (What about experimental setup?)
  13. Took all queries that yield Wiki clicks. Top-k retrieval, extract features. Allows us to track performance over time.
  14. in this talk I focus on the contribution of sources and adaptive vs. static ranker
  15. 1. Each source contributes to better ranking; tags/web anchors do best, tweets are significantly > KB. 2. Dynamic sources have higher “learning rates” (suggests that newly incoming data is successfully incorporated). 3. Tags start under web but approach it; new tags improve. [NEXT] To see the effect of incoming data: feature weights.
  16. Static go down, dynamic go up (suggests retraining is important w/ dynamic expansions). Tweets only marginally, but as we know KB+Tweets > KB, the tweets do help. Not shown: static expansions stay roughly the same. [NEXT] Increasing field weight + increased performance suggests retraining is needed; next:
  17. 1. [LEFT] Lower performance overall (more data w/o more training queries). 2. [LEFT] Dynamic ones have higher slopes, so newly incoming data does help even in static. 3. [RIGHT] Same patterns, but tags+web do comparatively better (because of swamping?). [END] Higher performance: retraining increases the ranker’s ability to optimally combine descriptions into a single representation.
  18. More data helps, but to optimally benefit you need to inform your ranker
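
As a rough illustration of the three feature groups mentioned in the notes (field similarity, field importance, entity importance) and of re-learning weights from clicks, here is a minimal sketch. It is an assumption-laden toy: the field names, the term-count similarity, field length as the only importance signal, the recency formula, and the pairwise perceptron-style update are all stand-ins chosen for brevity; the talk does not say which learner or exact features were used (it points to the paper for details).

```python
FIELDS = ("kb", "tweets")  # hypothetical field names for this sketch

def features(query_terms, entity, now):
    """Build one feature vector for a (query, entity) pair."""
    vec = []
    for name in FIELDS:  # field similarity: query-term occurrences per field
        terms = entity["fields"].get(name, [])
        vec.append(float(sum(terms.count(t) for t in query_terms)))
    for name in FIELDS:  # field "importance": here only the field length
        vec.append(float(len(entity["fields"].get(name, []))))
    # entity "importance": larger when the entity was updated more recently
    vec.append(1.0 / (1.0 + now - entity["updated"]))
    return vec

def score(w, x):
    """Linear score of a feature vector under weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))

def update(w, clicked_x, other_xs, lr=0.1):
    """Pairwise perceptron-style step: push the clicked entity above the rest."""
    for x in other_xs:
        if score(w, clicked_x) <= score(w, x):
            w = [wi + lr * (c - o) for wi, c, o in zip(w, clicked_x, x)]
    return w

# Hypothetical entities: one with fresh tweet descriptions, one without.
a = {"fields": {"kb": ["anthropornis", "extinct", "penguin"],
                "tweets": ["biggest", "penguin", "ever"]}, "updated": 9}
b = {"fields": {"kb": ["emperor", "penguin"], "tweets": []}, "updated": 2}

query = ["biggest", "penguin"]
xa, xb = features(query, a, now=10), features(query, b, now=10)
w = update([0.0] * len(xa), clicked_x=xa, other_xs=[xb])  # click on entity a
print(score(w, xa) > score(w, xb))
```

Starting from all-zero weights, a single click on the first entity moves weight onto the features where it differs from the second (tweet similarity, field lengths, recency), while the weight of the KB similarity feature, on which both entities tie, stays at zero.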