Dynamic Collective Entity Representations for Entity Ranking
David Graus, Manos Tsagkias, Wouter Weerkamp, Edgar Meij, Maarten de Rijke
2
3
4
Entity search?
• Index = knowledge base (= Wikipedia)
• Documents = entities
• “Real-world entities” have a single representation (in the KB)
5
Representation is not static
• People talk about entities all the time
• Associations between words and entities change over time
6
Example 1: News events
7
Example 2: Social media chatter
8
Dynamic Collective Entity Representations
• Use “collective intelligence” to mine entity descriptions and enrich the entity representation.
• This is like document expansion (adding terms found through explicit links).
• This is not query expansion (adding terms found through predicted links).
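The expansion idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class `EntityRepresentation`, the field names in `DYNAMIC_FIELDS`, and the toy KB text are all assumptions made for the example. Each description that explicitly links to an entity is appended to a per-source field of that entity's document.

```python
from collections import defaultdict

# Hypothetical field names, loosely following the description sources in the slides.
DYNAMIC_FIELDS = ("web_anchors", "tags", "tweets", "queries")


class EntityRepresentation:
    """A fielded entity document: one list of terms per description source."""

    def __init__(self, entity_id, kb_text):
        self.entity_id = entity_id
        self.fields = defaultdict(list)
        self.fields["kb"] = kb_text.lower().split()

    def expand(self, source, description):
        """Append the terms of a description that explicitly links to this entity."""
        if source not in DYNAMIC_FIELDS:
            raise ValueError(f"unknown description source: {source}")
        self.fields[source].extend(description.lower().split())

    def vocabulary(self):
        """All distinct terms across all fields."""
        return {term for terms in self.fields.values() for term in terms}


# Toy KB text, made up for the example.
penguin = EntityRepresentation("Anthropornis", "Anthropornis is a genus of giant penguin")
before = penguin.vocabulary()
penguin.expand("queries", "biggest penguin")
penguin.expand("tags", "megafauna")
new_terms = penguin.vocabulary() - before
print(sorted(new_terms))  # ['biggest', 'megafauna']
```

Only the document in the index changes; the retrieval algorithm that scores queries against fields can stay as it is.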
9
Advantages
• Cheap: change the document in the index and leverage tried-and-tested retrieval algorithms
• Free “smoothing”: dynamic sources (e.g., tweets) may capture newly evolving word associations (e.g., the Ferguson shooting) and incorporate out-of-document terms
• “Move relevant documents closer to queries” (= close the gap between searcher vocabulary and documents in the index)
10
Haven’t we seen this before?
• Anchors and queries in particular have been shown to improve retrieval [1]
• Tweets have been shown to be similar to anchors [2]
• Social tags likewise [3]
• But:
  • in batch (i.e., add data, then see how it affects retrieval)
  • from a single source
[1] T. Westerveld, W. Kraaij, and D. Hiemstra. Retrieving web pages using content, links, URLs and anchors. TREC 2001.
[2] G. Mishne and J. Lin. Twanchor text: A preliminary study of the value of tweets as anchor text. SIGIR ’12.
[3] C.-J. Lee and W. B. Croft. Incorporating social anchors for ad hoc retrieval. OAIR ’13.
11
Description sources
[Figure: description sources for the example entity Anthropornis nordenskjoeldi. Sources shown: KB anchors, KB categories, KB redirects, KB links, web anchors, tags, tweets, queries. Terms shown in the figure include: Anthropornis, Nordenskjoeld's Giant Penguin, emperor penguin, Eocene, Oligocene, Animal, Chordate, Aves, Sphenisciformes, Spheniscidae, Eocene birds, Oligocene birds, Extinct penguins, Oligocene extinctions, Bird genera, megafauna, biggest penguin, extinct penguin, prehistoric birds.]
12
Challenge
• Heterogeneity
  1. Description sources
  2. Entities
• Dynamic nature
  • Content changes over time
13
Method: Adaptive ranking
• Supervised single-field weighting model
• Features:
  • field similarity: retrieval score per field
  • field “importance”: length, novel terms, etc.
  • entity “importance”: time since last update
• (Re-)learn optimal weights from clicks
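A rough sketch of field weighting learned from clicks is given below. It makes several assumptions: a linear scoring function, a toy term-overlap similarity, and a pairwise perceptron-style update stand in for whatever supervised model and retrieval scores the authors actually use, and only the field-similarity features are included (field and entity “importance” are omitted). All function names and the toy data are hypothetical.

```python
def field_similarity(query_terms, field_terms):
    """Toy retrieval score: fraction of query terms present in the field."""
    if not query_terms:
        return 0.0
    present = set(field_terms)
    return sum(term in present for term in query_terms) / len(query_terms)


def features(query, entity, fields):
    """One similarity feature per field, in a fixed field order."""
    terms = query.lower().split()
    return [field_similarity(terms, entity.get(f, [])) for f in fields]


def score(weights, feats):
    """Linear combination of per-field scores."""
    return sum(w * x for w, x in zip(weights, feats))


def rank(query, entities, weights, fields):
    """Entity ids sorted by descending weighted score (ties broken by id)."""
    scored = {eid: score(weights, features(query, e, fields)) for eid, e in entities.items()}
    return sorted(scored, key=lambda eid: (-scored[eid], eid))


def update_from_click(weights, query, entities, clicked, ranking, fields, lr=0.5):
    """Move weights toward the clicked entity and away from entities ranked above it."""
    clicked_feats = features(query, entities[clicked], fields)
    new_weights = list(weights)
    for eid in ranking:
        if eid == clicked:
            break
        other = features(query, entities[eid], fields)
        new_weights = [w + lr * (c - o) for w, c, o in zip(new_weights, clicked_feats, other)]
    return new_weights


FIELDS = ("kb", "queries")
entities = {
    "emperor_penguin": {"kb": ["emperor", "penguin", "biggest", "living"], "queries": []},
    "anthropornis": {"kb": ["anthropornis", "extinct", "giant", "penguin"],
                     "queries": ["biggest", "penguin"]},
}
weights = [1.0, 0.0]  # start by trusting only the KB field
query = "biggest penguin"

ranking_before = rank(query, entities, weights, FIELDS)
weights = update_from_click(weights, query, entities, "anthropornis", ranking_before, FIELDS)
ranking_after = rank(query, entities, weights, FIELDS)
print(ranking_before[0], "->", ranking_after[0])
```

In this toy run a single click shifts weight onto the `queries` field, so the clicked entity moves to the top for the same query.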
14
Experimental setup
1. Data:
  • MSN query log (62,841 queries with clicks on entities)
  • Each query is treated as a time unit
  • For each query:
    • Produce ranking
    • Observe click
    • Evaluate ranking (MAP/P@1)
    • Expand entities (with dynamic descriptions)
    • [Re-train ranker]
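The per-query loop above can be sketched as follows, under stated assumptions: the ranker is a toy term-overlap ranker with no learned weights, the clicked query itself is used as the dynamic description, and the two-query log is made up for illustration. With one clicked entity per query, average precision reduces to the reciprocal rank of that entity.

```python
def average_precision(ranking, clicked):
    """With a single relevant item, AP equals 1 / rank of that item."""
    return 1.0 / (ranking.index(clicked) + 1) if clicked in ranking else 0.0


def precision_at_1(ranking, clicked):
    """1.0 if the clicked entity is ranked first, else 0.0."""
    return 1.0 if ranking and ranking[0] == clicked else 0.0


def rank(query, entities):
    """Toy ranker: order entities by the number of query terms they contain."""
    terms = query.lower().split()

    def overlap(eid):
        return sum(t in entities[eid] for t in terms)

    return sorted(entities, key=lambda eid: (-overlap(eid), eid))


def run_stream(log, entities):
    """Process (query, clicked entity) pairs in order; return (MAP, mean P@1)."""
    if not log:
        return 0.0, 0.0
    ap_scores, p1_scores = [], []
    for query, clicked in log:                       # each query is one time unit
        ranking = rank(query, entities)              # produce ranking
        ap_scores.append(average_precision(ranking, clicked))  # observe click, evaluate
        p1_scores.append(precision_at_1(ranking, clicked))
        entities[clicked].update(query.lower().split())        # expand the clicked entity
        # A learned ranker would be re-trained here; the toy ranker has nothing to re-train.
    return sum(ap_scores) / len(log), sum(p1_scores) / len(log)


# Hypothetical entity term sets and query log.
entities = {
    "anthropornis": {"anthropornis", "extinct", "giant", "penguin"},
    "emperor_penguin": {"emperor", "penguin", "biggest", "living"},
}
log = [("biggest penguin", "anthropornis"),
       ("biggest extinct penguin", "anthropornis")]

mean_ap, mean_p1 = run_stream(log, entities)
print(mean_ap, mean_p1)  # 0.75 0.5
```

Evaluation happens before expansion, so each query is scored against a representation that has only seen earlier queries.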
15
Main results
• Comparing effectiveness of different description sources
• Comparing adaptive vs. non-adaptive ranker performance
16
Description sources
[Plot: MAP vs. no. of queries]
17
Feature weights over time
[Plot: relative feature importance vs. no. of queries]
18
Non-adaptive vs. adaptive ranking
19
In summary
• Expanding entity representations with different sources enables better matching of queries to entities
• As new content comes in, it is beneficial to retrain the ranker
• Informing the ranker of the “expansion state” further improves performance
20
Thank you
• (Also, thank you WSDM & SIGIR travel grants)

Collaborative Team Recommendation for Skilled Users: Objectives, Techniques, ...Collaborative Team Recommendation for Skilled Users: Objectives, Techniques, ...
Collaborative Team Recommendation for Skilled Users: Objectives, Techniques, ...
 
20240710 ACMJ Diagrams Set 3.docx . Apache, Csharp, Mysql, Javascript stack a...
20240710 ACMJ Diagrams Set 3.docx . Apache, Csharp, Mysql, Javascript stack a...20240710 ACMJ Diagrams Set 3.docx . Apache, Csharp, Mysql, Javascript stack a...
20240710 ACMJ Diagrams Set 3.docx . Apache, Csharp, Mysql, Javascript stack a...
 
The cryptoterrestrial hypothesis: A case for scientific openness to a conceal...
The cryptoterrestrial hypothesis: A case for scientific openness to a conceal...The cryptoterrestrial hypothesis: A case for scientific openness to a conceal...
The cryptoterrestrial hypothesis: A case for scientific openness to a conceal...
 
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary TrackThe Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
 
Introduction-to-the-Recent-Floods-in-Dubai [Autosaved].pptx
Introduction-to-the-Recent-Floods-in-Dubai [Autosaved].pptxIntroduction-to-the-Recent-Floods-in-Dubai [Autosaved].pptx
Introduction-to-the-Recent-Floods-in-Dubai [Autosaved].pptx
 
Unveiling the Stability of Supermassive Black Hole Spin: Principal Component ...
Unveiling the Stability of Supermassive Black Hole Spin: Principal Component ...Unveiling the Stability of Supermassive Black Hole Spin: Principal Component ...
Unveiling the Stability of Supermassive Black Hole Spin: Principal Component ...
 
Computer aided biopharmaceutical characterization
Computer aided biopharmaceutical characterizationComputer aided biopharmaceutical characterization
Computer aided biopharmaceutical characterization
 
SUBJECT SPECIFIC ETHICAL ISSUES IN STUDY
SUBJECT SPECIFIC ETHICAL ISSUES IN STUDYSUBJECT SPECIFIC ETHICAL ISSUES IN STUDY
SUBJECT SPECIFIC ETHICAL ISSUES IN STUDY
 
Hydrogen sulfide and metal-enriched atmosphere for a Jupiter-mass exoplanet
Hydrogen sulfide and metal-enriched atmosphere for a Jupiter-mass exoplanetHydrogen sulfide and metal-enriched atmosphere for a Jupiter-mass exoplanet
Hydrogen sulfide and metal-enriched atmosphere for a Jupiter-mass exoplanet
 
Active and Passive Surveillance of pharmacovigillance
Active and Passive Surveillance of pharmacovigillanceActive and Passive Surveillance of pharmacovigillance
Active and Passive Surveillance of pharmacovigillance
 
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptxSCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
 
2. Osmotic pressure, osmotic potential, turgor pressure, wall pressure, water...
2. Osmotic pressure, osmotic potential, turgor pressure, wall pressure, water...2. Osmotic pressure, osmotic potential, turgor pressure, wall pressure, water...
2. Osmotic pressure, osmotic potential, turgor pressure, wall pressure, water...
 

Dynamic Collective Entity Representations for Entity Ranking

  • 1. Dynamic Collective Entity Representations for Entity Ranking. David Graus, Manos Tsagkias, Wouter Weerkamp, Edgar Meij, Maarten de Rijke
  • 4. Entity search?
      • Index = Knowledge Base (= Wikipedia)
      • Documents = Entities
      • “Real world entities” have a single representation (in KB)
  • 5. Representation is not static
      • People talk about entities all the time
      • Associations between words and entities change over time
  • 7. Example 2: Social media chatter
  • 8. Dynamic Collective Entity Representations
      • Use “collective intelligence” to mine entity descriptions to enrich representation.
      • Is like document expansion (add terms found through explicit links)
      • Is not query expansion (terms found through predicted links)
  • 9. Advantages
      • Cheap: Change document in index, leverage tried & tested retrieval algorithms
      • Free “smoothing”: (e.g., tweets) may capture ‘newly evolving’ word associations (Ferguson shooting) and incorporate out-of-document terms
      • “move relevant documents closer to queries” (= close the gap between searcher vocabulary & docs in index)
  • 10. Haven’t we seen this before?
      • Anchors & queries in particular have been shown to improve retrieval [1]
      • Tweets have been shown to be similar to anchors [2]
      • Social tags, same [3]
      • But:
          • in batch (i.e., add data, see how it affects retrieval)
          • single source
      [1] T. Westerveld, W. Kraaij, and D. Hiemstra. Retrieving web pages using content, links, urls and anchors. TREC 2001
      [2] G. Mishne and J. Lin. Twanchor text: A preliminary study of the value of tweets as anchor text. SIGIR ’12
      [3] C.-J. Lee and W. B. Croft. Incorporating social anchors for ad hoc retrieval. OAIR ’13
  • 11. Description sources
      [Figure: the example entity Anthropornis nordenskjoeldi with descriptions drawn from eight sources: KB anchors, KB categories, KB redirects, KB links, web anchors, tags, tweets, and queries. Example terms shown include “Nordenskjoeld's Giant Penguin”, “emperor penguin”, “Eocene birds”, “Oligocene birds”, “Extinct penguins”, “Oligocene extinctions”, “Bird genera”, “megafauna”, “biggest penguin”, “extinct penguin”, and “prehistoric birds”.]
  • 12. Challenge
      • Heterogeneity
          1. Description sources
          2. Entities
      • Dynamic nature
          • Content changes over time
  • 13. Method: Adaptive ranking
      • Supervised single-field weighting model
      • Features:
          • field similarity: retrieval score per field.
          • field “importance”: length, novel terms, etc.
          • entity “importance”: time since last update.
      • (Re-)learn optimal weights from clicks
  • 14. Experimental setup
      1. Data:
          • MSN Query log (62,841 queries + clicks (on entities))
          • Each query is treated as a time unit
          • For each query:
              • Produce ranking
              • Observe click
              • Evaluate ranking (MAP/P@1)
              • Expand entities (w/ dynamic descriptions)
              • [re-train ranker]
  • 15. Main results
      • Comparing effectiveness of diff. description sources
      • Comparing adaptive vs. non-adaptive ranker performance
  • 17. Feature weights over time
      [Plot; axis labels: relative feature importance, no. of queries]
  • 19. In summary
      • Expanding entity representations with different sources enables better matching of queries to entities
      • As new content comes in, it is beneficial to retrain the ranker
      • Informing ranker of “expansion state” further improves performance
  • 20. Thank you
      • (Also, thank you WSDM & SIGIR travel grants)
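
The per-query loop from the experimental setup (produce ranking, observe click, evaluate, expand, re-train) can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the field names, the term-overlap similarity, the perceptron-style update, and the log format `(query, clicked_entity, expansion)` are all assumptions made for the example. With a single clicked entity per query, average precision reduces to reciprocal rank.

```python
FIELDS = ["kb", "tags", "tweets"]  # hypothetical field names

def field_score(query, terms):
    """Toy query-field similarity: fraction of query terms found in the field."""
    q = query.lower().split()
    if not q:
        return 0.0
    vocab = set(terms)
    return sum(t in vocab for t in q) / len(q)

def features(query, entity):
    """One similarity feature per field."""
    return [field_score(query, entity[f]) for f in FIELDS]

def rank(query, entities, weights):
    """Score each entity as a weighted sum of its field similarities."""
    scored = []
    for name, entity in entities.items():
        score = sum(w * x for w, x in zip(weights, features(query, entity)))
        scored.append((score, name))
    # Descending score; ties broken by name so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]

def precision_at_1(ranking, clicked):
    return 1.0 if ranking and ranking[0] == clicked else 0.0

def average_precision(ranking, clicked):
    # One relevant (clicked) entity per query, so AP equals reciprocal rank.
    return 1.0 / (ranking.index(clicked) + 1) if clicked in ranking else 0.0

def update_weights(weights, query, entities, ranking, clicked, lr=0.1):
    """Assumed stand-in learner: pairwise perceptron-style update on a miss."""
    top = ranking[0]
    if top == clicked:
        return weights
    good = features(query, entities[clicked])
    bad = features(query, entities[top])
    return [w + lr * (g - b) for w, g, b in zip(weights, good, bad)]

def run(log, entities, weights, adaptive=True):
    """Treat each query as one time unit; return (P@1, MAP, final weights)."""
    p1, ap = [], []
    for query, clicked, expansion in log:
        ranking = rank(query, entities, weights)         # produce ranking
        p1.append(precision_at_1(ranking, clicked))      # evaluate against click
        ap.append(average_precision(ranking, clicked))
        if expansion is not None:                        # expand entity
            name, field_name, terms = expansion
            entities[name][field_name].extend(terms)
        if adaptive:                                     # re-train ranker
            weights = update_weights(weights, query, entities, ranking, clicked)
    n = max(len(p1), 1)
    return sum(p1) / n, sum(ap) / n, weights
```

Calling `run(..., adaptive=False)` keeps the weights fixed while the entities still grow, which mirrors the adaptive vs. non-adaptive comparison named on the results slide.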

Editor's Notes

  1. First entities & structure; I get to show the mandatory entity search example.
  2. You are not interested in documents but in things, e.g., the person/artist Kendrick Lamar, referred to by his former stage name.
  3. So it is like web search, but the units of retrieval are real-life entities, so we can collect data for them.
  4. This is what we try to leverage in this work.
  5. July 31st, after August 7th -> added content, new word associations
  6. This looks a bit extreme because there’s swearing, but there’s a serious intuition here: the vocabulary gap (formal KB, informal chatter).
  7. Our method aims to leverage this to enrich the representation and close the gap.
  8. of collective int/descr sources
  9. We look at a scenario where the expansions come in a streaming manner.
  10. Fielded document representation
  11. You could do vanilla retrieval, but two challenges arise. Heterogeneity: description sources differ along several dimensions (e.g., volume, quality, novelty), and head entities are likely to receive a larger number of external descriptions than tail entities. Dynamic nature: content changes over time, so expansions may accumulate and “swamp” the representation.
  12. Our solution is to dynamically learn how to combine fields into a single representation. Features (more detail in the paper): field similarity features (per field) = query–field similarity scores; field importance features (per field), to inform the ranker of the status of the field at that time (i.e., more and novel content); entity importance (to favor “recently” updated entities). (What about the experimental setup?)
  13. Took all queries that yield Wikipedia clicks. Top-k retrieval, then extract features. This allows us to track performance over time.
  14. in this talk I focus on the contribution of sources and adaptive vs. static ranker
  15. (1) Each source contributes to better ranking; tags/web anchors do best, tweets are significantly > KB. (2) Dynamic sources have higher “learning rates” (suggests that newly incoming data is successfully incorporated). (3) Tags start below web anchors but approach them; new tags improve performance. [NEXT] To see the effect of incoming data: feature weights.
  16. Static fields' weights go down, dynamic ones go up (suggests retraining is important w/ dynamic expansions). Tweets only marginally, but as we know KB+Tweets > KB, the tweets do help. Not shown: static expansions stay roughly the same. [NEXT] Increasing field weight + increased performance suggests retraining is needed; next:
  17. (1) [LEFT] Lower performance overall (more data w/o more training queries). (2) [LEFT] Dynamic ones have higher slopes, so newly incoming data does help even in the static setting. (3) [RIGHT] Same patterns, but tags+web do comparatively better (because of swamping?). [END] Higher performance: retraining increases the ranker’s ability to optimally combine descriptions into a single representation.
  18. More data helps, but to benefit optimally you need to inform your ranker.
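
As a rough illustration of the “expansion state” features mentioned in the notes (field length and novel content per field, plus time since an entity's last update), one could track them as below. The class and feature names are hypothetical, and counting "novel terms" as previously unseen distinct terms in the latest expansion is an assumption; the paper's exact definitions may differ. Time is measured in query indices, following the setup where each query is a time unit.

```python
from dataclasses import dataclass, field

@dataclass
class Field:
    terms: list = field(default_factory=list)
    novel_in_last_update: int = 0  # distinct unseen terms in the latest expansion

@dataclass
class Entity:
    fields: dict
    last_update: int = 0  # time unit = query index (assumption)

    def expand(self, name, new_terms, now):
        """Append new description terms to a field and record the update time."""
        f = self.fields.setdefault(name, Field())
        seen = set(f.terms)
        f.novel_in_last_update = len(set(new_terms) - seen)
        f.terms.extend(new_terms)
        self.last_update = now

def importance_features(entity, now):
    """Field 'importance' (length, novel terms) and entity 'importance' (recency)."""
    feats = {}
    for name, f in entity.fields.items():
        feats[f"{name}_length"] = len(f.terms)
        feats[f"{name}_novel"] = f.novel_in_last_update
    feats["time_since_update"] = now - entity.last_update
    return feats
```

Features like these would be fed to the ranker alongside the per-field similarity scores, so that it can discount a field that has been "swamped" or favor recently updated entities.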