Dynamic Collective Entity Representations for Entity Ranking

David Graus, Manos Tsagkias, Wouter Weerkamp, Edgar Meij, Maarten de Rijke
Entity search?
• Index = Knowledge Base (= Wikipedia)
• Documents = Entities
• “Real world entities” have a single representation (in KB)
Representation is not static
• Associations between words and entities change over time
• “ferguson shooting” -> Ferguson, Missouri
• People talk about entities all the time
Dynamic Collective Entity Representations
• Use “collective intelligence” to mine entity descriptions and enrich the representation.
• Is like document expansion (add terms found through explicit links)
• Is not query expansion (terms found through predicted links)
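The document-expansion idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `EntityDocument` class, the field names (`kb_text`, `tweets`), and the example texts are all assumptions made up for this sketch.

```python
from collections import defaultdict


class EntityDocument:
    """An entity stored as a fielded document (field names are hypothetical)."""

    def __init__(self, entity_id, description):
        self.entity_id = entity_id
        # The original KB text lives in its own field.
        self.fields = defaultdict(list)
        self.fields["kb_text"].append(description)

    def expand(self, source, text):
        """Append a description that explicitly links to this entity."""
        self.fields[source].append(text)

    def terms(self, source):
        """Lower-cased list of terms for one field."""
        return [t for text in self.fields[source] for t in text.lower().split()]


# Made-up KB text and tweet, for illustration only.
entity = EntityDocument("Ferguson, Missouri", "Ferguson is a city in Missouri")
entity.expand("tweets", "ferguson shooting protests continue")

query_terms = set("ferguson shooting".split())
matched_before = query_terms & set(entity.terms("kb_text"))
matched_after = matched_before | (query_terms & set(entity.terms("tweets")))
```

In this toy case the KB text matches only one of the two query terms, while the expanded representation matches both. Only descriptions with an explicit link to the entity are added, which is what separates this from query expansion.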
Advantages
• Cheap: change the document in the index, leverage tried & tested retrieval algorithms
• Free “smoothing”: dynamic sources (e.g., tweets) may capture “newly evolving” word associations (Ferguson shooting) and incorporate out-of-document terms
• “Move relevant documents closer to queries” (= close the gap between searcher vocabulary & docs in index)
Haven’t we seen this before?
• Anchors & queries in particular have been shown to improve retrieval [1]
• Tweets have been shown to be similar to anchors [2]
• Social tags, same [3]
• But: in batch (i.e., add data, see if/how it improves retrieval)
[1] T. Westerveld, W. Kraaij, and D. Hiemstra. Retrieving web pages using content, links, urls and anchors. TREC 2001.
[2] G. Mishne and J. Lin. Twanchor text: A preliminary study of the value of tweets as anchor text. SIGIR ’12.
[3] C.-J. Lee and W. B. Croft. Incorporating social anchors for ad hoc retrieval. OAIR ’13.
Description sources
Static sources
• KB: Wikipedia dump (Aug ’14). 57M descriptions for 4.8M entities.
• Web anchors: anchors from the Google WikiLinks corpus. 9.8M descriptions for 876,063 entities.
Dynamic sources
• Tweets: tweets w/ links to Wikipedia pages (2011-2014). 52,631 descriptions for 38,269 entities.
• Queries: queries from MSN query logs that yield Wikipedia clicks. 47,002 descriptions for 18,724 entities.
• Social tags: Delicious tags for Wiki pages, from the SocialBM0311 corpus. 4.4M descriptions for 289,015 entities.
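To make the heterogeneity of the sources concrete, the figures above can be turned into average descriptions per entity. This is plain arithmetic on the slide's numbers. The "M" values are rounded on the slide, so the ratios are approximate.

```python
# (descriptions, entities) per source, copied from the slide.
# The "M" figures are rounded, so the ratios are approximate.
sources = {
    "KB": (57_000_000, 4_800_000),
    "Web anchors": (9_800_000, 876_063),
    "Tweets": (52_631, 38_269),
    "Queries": (47_002, 18_724),
    "Social tags": (4_400_000, 289_015),
}

per_entity = {name: d / e for name, (d, e) in sources.items()}
for name, ratio in per_entity.items():
    print(f"{name}: {ratio:.1f} descriptions per entity")
```

Tweets and queries contribute only one to three descriptions per covered entity, while the KB, web anchors, and social tags contribute more than ten.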
Original entity representation
Tupac Shakur (original entity description):
Tupac Amaru Shakur (Previously known as Lesane Parish Crooks) (too-pahk shə-koor;[1] June 16, 1971 – September 13, 1996), also known by his stage names 2Pac and (briefly) Makaveli, was an American rapper, author, actor, and poet.[2] As of 2007, Shakur has sold over 75 million records worldwide, making him one of the best-selling music artists of all time.[3] His double disc albums All Eyez on Me and his Greatest Hits are among the [...]
Static description sources
• KB Anchors: 2Pac; Tupac; Makaveli
• KB Linked entities: The Notorious B.I.G.; Black Panther Party; Muammar Gaddafi
• KB Redirects: 2pac Shakur; Thug Immortal
• KB Categories: Murdered Rappers; Death Row Record Artists; American deists
• Web Anchors: What job did Tupac have before he was a rapper; Tupac; Tupac is arguably more influential; Tupac Amaru Shakur; Tupac Shakur-style drive-by shooting; Tupac Shakur; Tupac Shakur reciting Shakespeare at art school
Dynamic description sources
Dynamic expansions (from queries, tags, and tweets):
• tupac and the law
• hiphop/icons
• dead rappers
• people influenced by tupac
• awesomeartist rapd
• Happy Birthday Tupac!!! 2Pac Gemini
• RT: Las cenizas de Tupac, el mejor rapero de la historia, fueron mezcladas con marihuana y fumadas por miembros de Outlawz (English: The ashes of Tupac, the best rapper in history, were mixed with marijuana and smoked by members of Outlawz)
• Even more crazy that this was announced just one day before what would have been Pac’s 40th birthday.
Challenge
• Heterogeneity
  1. Description sources
  2. Entities
• Dynamic nature
  • Content changes over time
Adaptive ranking
• Supervised single-field weighting model
• Features:
  • field similarity: retrieval score per field.
  • field “importance”: length, novel terms, etc.
  • entity “importance”: time since last update.
• Learn optimal field weights from clicks
Supervised single-field weighting model: each field’s contribution towards the final score is individually weighted, and the weights are learned from clicks at set intervals.
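A minimal sketch of the single-field weighting idea, under stated assumptions: the term-overlap similarity stands in for a real per-field retrieval score, the weights are made-up constants rather than values learned from clicks, and the field and entity “importance” features are left out. The function names are hypothetical.

```python
def field_similarity(query, field_texts):
    """Toy retrieval score: fraction of query terms found in the field."""
    query_terms = query.lower().split()
    if not query_terms:
        return 0.0
    field_terms = {t for text in field_texts for t in text.lower().split()}
    return sum(t in field_terms for t in query_terms) / len(query_terms)


def entity_score(query, fields, weights):
    """Weighted sum of per-field scores, one weight per field."""
    return sum(
        weights.get(name, 0.0) * field_similarity(query, texts)
        for name, texts in fields.items()
    )


# Field contents taken from the Tupac Shakur example in the slides.
fields = {
    "kb_text": ["Tupac Amaru Shakur was an American rapper"],
    "kb_anchors": ["2Pac", "Tupac", "Makaveli"],
    "tags": ["dead rappers"],
}
# Made-up weights; in the talk these are learned from clicks.
weights = {"kb_text": 0.5, "kb_anchors": 0.3, "tags": 0.2}

score = entity_score("dead rappers", fields, weights)
```

Here the query matches only the tags field, so the final score depends entirely on how much weight that field carries. Learning the weights from clicks lets the ranker decide how far to trust each source.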
Experimental setup
1. Data:
  • MSN Query log (62,841 queries that yield entity clicks)
  • For each query:
    • Produce ranking
    • Observe click
    • Evaluate ranking (MAP/P@1)
    • Expand entities (w/ descriptions from dynamic sources)
    • [re-train ranker]
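The per-query loop above can be sketched as a replay of the click log. This is an illustration under assumptions, not the authors' experimental code: `run`, `evaluate`, and `overlap_rank` are hypothetical names, expansions are keyed by query position rather than by timestamp, and the entity texts are made up. With a single clicked entity per query, average precision reduces to 1/rank.

```python
def evaluate(ranking, clicked):
    """Return (average precision, P@1) for one query with one clicked entity."""
    if clicked not in ranking:
        return 0.0, 0.0
    rank = ranking.index(clicked) + 1
    return 1.0 / rank, float(rank == 1)


def run(query_log, entities, rank_fn, expansions, retrain_fn=None, interval=1000):
    """Replay a click log in order, expanding entities as descriptions arrive."""
    ap_scores, p1_scores = [], []
    for i, (query, clicked) in enumerate(query_log, start=1):
        ranking = rank_fn(query, entities)             # produce ranking
        ap, p1 = evaluate(ranking, clicked)            # observe click, evaluate
        ap_scores.append(ap)
        p1_scores.append(p1)
        for entity_id, text in expansions.get(i, []):  # expand entities
            entities[entity_id].append(text)
        if retrain_fn is not None and i % interval == 0:
            retrain_fn(entities)                       # optional re-training
    n = len(ap_scores) or 1
    return sum(ap_scores) / n, sum(p1_scores) / n


def overlap_rank(query, entities):
    """Toy ranker: order entity ids by number of matching query terms."""
    q = set(query.lower().split())

    def score(entity_id):
        terms = {t for text in entities[entity_id] for t in text.lower().split()}
        return len(q & terms)

    return sorted(entities, key=score, reverse=True)


# Made-up entity texts and log entries, for illustration only.
entities = {
    "Ferguson, Missouri": ["city in missouri"],
    "Alex Ferguson": ["ferguson scottish football manager"],
}
log = [("ferguson shooting", "Ferguson, Missouri")] * 2
expansions = {1: [("Ferguson, Missouri", "ferguson shooting")]}

mean_ap, p_at_1 = run(log, entities, overlap_rank, expansions)
```

In this toy replay the first query ranks the clicked entity second. After the expansion arrives, the second query ranks it first, so both MAP and P@1 improve over the run.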
Results
• Comparing effectiveness of different description sources
• Comparing adaptive vs. non-adaptive ranker performance
Description sources
[Figure: y-axis from 0.50 to 0.60, x-axis from 0 to 30,000]
Feature weights over time
Adaptive vs. non-adaptive ranking
[Figure: y-axis from 0.50 to 0.60, x-axis from 0 to 30,000]
In summary
• Expanding entity representations with different sources enables better matching of queries to entities
• As new content comes in, it is beneficial to retrain the ranker
• Informing the ranker of the “expansion state” further improves performance
Thank you