Dynamic Collective Entity Representations for Entity Ranking

Talk at DIR 2015

Published in: Science

  1. Dynamic Collective Entity Representations for Entity Ranking
     David Graus, Manos Tsagkias, Wouter Weerkamp, Edgar Meij, Maarten de Rijke
  4. Entity search?
       • Index = Knowledge Base (= Wikipedia)
       • Documents = Entities
       • “Real world entities” have a single representation (in KB)
  5. Representation is not static
       • Associations between words and entities change over time
       • “ferguson shooting” -> Ferguson, Missouri
       • People talk about entities all the time
  6. *****
  7. Dynamic Collective Entity Representations
       • Use “collective intelligence” to mine entity descriptions to enrich representation.
       • Is like document expansion (add terms found through explicit links)
       • Is not query expansion (terms found through predicted links)
  8. Advantages
       • Cheap: Change document in index, leverage tried & tested retrieval algorithms
       • Free “smoothing”: (e.g., tweets) may capture ‘newly evolving’ word associations (Ferguson shooting) and incorporate out-of-document terms
       • “move relevant documents closer to queries” (= close the gap between searcher vocabulary & docs in index)
  9. Haven’t we seen this before?
       • Anchors & queries in particular have been shown to improve retrieval [1]
       • Tweets have been shown to be similar to anchors [2]
       • Social tags, same [3]
       • But: in batch (i.e., add data, see if/how it improves retrieval)
     [1] T. Westerveld, W. Kraaij, and D. Hiemstra. Retrieving web pages using content, links, urls and anchors. TREC 2001
     [2] G. Mishne and J. Lin. Twanchor text: A preliminary study of the value of tweets as anchor text. SIGIR ’12
     [3] C.-J. Lee and W. B. Croft. Incorporating social anchors for ad hoc retrieval. OAIR ’13
  10. Description sources
       • Static sources
           • KB: Wikipedia dump (Aug ‘14). 57M descriptions for 4.8M entities.
           • Web anchors: Anchors from Google WikiLinks corpus. 9.8M descriptions for 876,063 entities.
       • Dynamic sources
           • Tweets: Tweets w/ links to Wikipedia pages (2011-2014). 52,631 descriptions for 38,269 entities.
           • Queries: Queries from MSN query logs that yield Wikipedia clicks. 47,002 descriptions for 18,724 entities.
           • Social tags: Delicious tags for Wiki pages, from the SocialBM0311 corpus. 4.4M descriptions for 289,015 entities.
  11. Original entity representation
       • Entity: Tupac Shakur
       • Original entity description: Tupac Amaru Shakur (Previously known as Lesane Parish Crooks) (too-pahk shə-koor;[1] June 16, 1971 – September 13, 1996), also known by his stage names 2Pac and (briefly) Makaveli, was an American rapper, author, actor, and poet.[2] As of 2007, Shakur has sold over 75 million records worldwide, making him one of the best-selling music artists of all time.[3] His double disc albums All Eyez on Me and his Greatest Hits are among the [...]
  12. Static description sources
       • KB Anchors: 2Pac; Tupac; Makaveli
       • KB Linked entities: The Notorious B.I.G.; Black Panther Party; Muammar Gaddafi
       • KB Redirects: 2pac Shakur Thug Immortal
       • KB Categories: Murdered Rappers; Death Row Record Artists; American deists
       • Web Anchors: What job did Tupac have before he was a rapper Tupac Tupac is arguably more influential Tupac Amaru Shakur Tupac Shakur-style drive-by shooting Tupac Shakur Tupac Shakur reciting Shakespeare at art school
  13. Dynamic description sources
       • Dynamic expansions (Tweets, Tags, Queries):
           tupac and the law
           hiphop/icons
           dead rappers
           people influenced by tupac
           awesomeartist rapd
           Happy Birthday Tupac!!! 2Pac Gemini
           RT: Las cenizas de Tupac, el mejor rapero de la historia, fueron mezcladas con marihuana y fumadas por miembros de Outlawz
           Even more crazy that this was announced just one day before what would have been Pac’s 40th birthday.
  14. Challenge
       • Heterogeneity
           1. Description sources
           2. Entities
       • Dynamic nature
       • Content changes over time
  15. Adaptive ranking
       • Supervised single-field weighting model
       • Features:
           • field similarity: retrieval score per field.
           • field “importance”: length, novel terms, etc.
           • entity “importance”: time since last update.
       • Learn optimal field weights from clicks
     Supervised single-field weighting model: each field’s contribution towards the final score is individually weighted, with weights learned from clicks at set intervals.
  16. Experimental setup
       1. Data:
           • MSN Query log (62,841 queries that yield entity clicks)
           • For each query:
               • Produce ranking
               • Observe click
               • Evaluate ranking (MAP/P@1)
               • Expand entities (w/ descriptions from dynamic sources)
               • [re-train ranker]
  17. Results
       • Comparing effectiveness of different description sources
       • Comparing adaptive vs. non-adaptive ranker performance
  18. Description sources [plot: vertical axis from 0.50 to 0.60, horizontal axis from 0 to 30000]
  19. Feature weights over time
  20. Adaptive vs. non-adaptive ranking [plot: vertical axis from 0.50 to 0.60, horizontal axis from 0 to 30000]
  21. In summary
       • Expanding entity representations with different sources enables better matching of queries to entities
       • As new content comes in, it is beneficial to retrain the ranker
       • Informing ranker of “expansion state” further improves performance
  22. Thank you
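
The loop on the experimental setup slide (produce ranking, observe click, evaluate, expand entity) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names, the fixed weights in `WEIGHTS`, the term-overlap `field_score`, and the two toy entities are all assumptions made for illustration, and the re-training step is left out. With a single clicked entity per query, average precision reduces to the reciprocal rank of that entity, which is what the sketch records alongside P@1.

```python
import re
from collections import defaultdict

# Hypothetical field names and fixed weights. The slides describe weights
# learned from clicks at set intervals; that learning step is omitted here.
FIELDS = ("kb", "anchors", "tweets", "queries", "tags")
WEIGHTS = {"kb": 1.0, "anchors": 0.5, "tweets": 0.5, "queries": 0.5, "tags": 0.5}


def tokens(text):
    """Lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


class Entity:
    """An entity with one list of descriptions per field."""

    def __init__(self, name, kb_text):
        self.name = name
        self.fields = defaultdict(list)
        self.fields["kb"].append(kb_text)

    def expand(self, source, description):
        """Document expansion: append a new description to one field."""
        self.fields[source].append(description)


def field_score(query, descriptions):
    """Toy similarity: fraction of query terms that occur in the field."""
    q = tokens(query)
    vocab = {t for d in descriptions for t in tokens(d)}
    return sum(t in vocab for t in q) / len(q) if q else 0.0


def score(query, entity, weights):
    """Weighted sum of per-field scores."""
    return sum(weights[f] * field_score(query, entity.fields[f]) for f in FIELDS)


def rank(query, entities, weights):
    """Entities sorted by descending score (ties keep their input order)."""
    return sorted(entities, key=lambda e: score(query, e, weights), reverse=True)


def run(query_log, entities, weights):
    """For each (query, clicked entity): rank, evaluate, then expand."""
    by_name = {e.name: e for e in entities}
    results = []
    for query, clicked in query_log:
        ranking = [e.name for e in rank(query, entities, weights)]
        pos = ranking.index(clicked) + 1
        results.append((1.0 / pos, int(pos == 1)))  # (AP, P@1) for one click
        by_name[clicked].expand("queries", query)   # expand after evaluating
    return results


# Toy data (invented for this sketch).
entities = [
    Entity("Alex Ferguson", "Sir Alex Ferguson is a former football manager."),
    Entity("Ferguson, Missouri", "Ferguson is a city in St. Louis County, Missouri."),
]
query_log = [("ferguson shooting", "Ferguson, Missouri")] * 2
results = run(query_log, entities, WEIGHTS)
print(results)
```

On the first query both toy entities match only the term "ferguson" in their KB text, so the clicked entity is not guaranteed the top position. After the query is added to its "queries" field, the same query matches that entity on both terms and it moves to rank 1.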
