4. Entity search?
• Index = Knowledge Base (= Wikipedia)
• Documents = Entities
• “Real-world entities” have a single representation (in the KB)
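To make the “documents = entities” view concrete, here is a minimal sketch that treats each entity's KB text as one document and ranks entities for a query. The toy KB, the entity IDs, and the TF-IDF scoring are illustrative assumptions, not the retrieval model used in this work.

```python
import math
from collections import Counter

# Hypothetical toy KB: the index holds exactly one document per entity.
KB = {
    "Tupac_Shakur": "tupac amaru shakur american rapper actor poet",
    "Ferguson,_Missouri": "ferguson city in st louis county missouri",
    "The_Notorious_B.I.G.": "american rapper from brooklyn new york",
}

def tokenize(text):
    return text.lower().split()

def rank_entities(query, kb):
    """Return entity ids sorted by a simple TF-IDF score (illustrative only)."""
    docs = {entity: Counter(tokenize(text)) for entity, text in kb.items()}
    n_docs = len(docs)
    scores = {}
    for entity, tf in docs.items():
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for d in docs.values() if term in d)  # document frequency
            if df:
                score += tf[term] * math.log(1 + n_docs / df)
        scores[entity] = score
    return sorted(scores, key=scores.get, reverse=True)

print(rank_entities("tupac rapper", KB)[0])  # Tupac_Shakur
```

Querying “ferguson shooting” against this static KB matches only on “ferguson”: the word “shooting” occurs in no entity document, which is the kind of vocabulary gap that dynamic descriptions are meant to close.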
5. Representation is not static
• Associations between words and entities change over time
  • “ferguson shooting” -> Ferguson, Missouri
• People talk about entities all the time
7. Dynamic Collective Entity Representations
• Use “collective intelligence” to mine entity descriptions that enrich the entity representation
• Similar to document expansion: add terms found through explicit links
• Unlike query expansion, which adds terms found through predicted links
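A minimal sketch of the document-expansion idea, assuming descriptions arrive as (entity, text) pairs that were found through explicit links (e.g., anchor text, or a tweet containing the entity's Wikipedia URL). The function name `expand_documents` and the toy data are hypothetical. Only the entity documents change; the query is left untouched, which is what separates this from query expansion.

```python
# Hypothetical toy KB: one text representation per entity.
KB = {
    "Tupac_Shakur": "tupac amaru shakur american rapper actor poet",
    "Ferguson,_Missouri": "ferguson city in st louis county missouri",
}

def expand_documents(kb, linked_descriptions):
    """Document expansion: append each description to the entity it explicitly links to.

    kb: dict mapping entity id -> original text
    linked_descriptions: iterable of (entity_id, description_text) pairs
    """
    expanded = {entity: [text] for entity, text in kb.items()}
    for entity, description in linked_descriptions:
        # Descriptions that link to entities outside the KB are ignored.
        if entity in expanded:
            expanded[entity].append(description)
    return {entity: " ".join(parts) for entity, parts in expanded.items()}

# A tweet linking to the Ferguson, Missouri page adds "shooting" to that
# entity's document, so the unchanged query "ferguson shooting" can now match it.
expanded = expand_documents(KB, [("Ferguson,_Missouri", "ferguson shooting protests")])
print(expanded["Ferguson,_Missouri"])
# ferguson city in st louis county missouri ferguson shooting protests
```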
8. Advantages
• Cheap: only the documents in the index change, so tried-and-tested retrieval algorithms can be reused
• Free “smoothing”: dynamic sources (e.g., tweets) may capture newly evolving word associations (“ferguson shooting”) and incorporate out-of-document terms
• “Move relevant documents closer to queries”, i.e., close the gap between the searcher’s vocabulary and the documents in the index
9. Haven’t we seen this before?
• Anchors and queries in particular have been shown to improve retrieval [1]
• Tweets have been shown to be similar to anchors [2]
• The same holds for social tags [3]
• But: these studies work in batch (i.e., add the data once, then measure whether and how it improves retrieval)
[1] T. Westerveld, W. Kraaij, and D. Hiemstra. Retrieving web pages using content, links, urls and anchors. In TREC 2001.
[2] G. Mishne and J. Lin. Twanchor text: A preliminary study of the value of tweets as anchor text. In SIGIR 2012.
[3] C.-J. Lee and W. B. Croft. Incorporating social anchors for ad hoc retrieval. In OAIR 2013.
10. Description sources
Static sources
• KB: Wikipedia dump (Aug ’14); 57M descriptions for 4.8M entities
• Web anchors: anchors from the Google WikiLinks corpus; 9.8M descriptions for 876,063 entities
Dynamic sources
• Tweets: tweets with links to Wikipedia pages (2011-2014); 52,631 descriptions for 38,269 entities
• Queries: queries from MSN query logs that yield Wikipedia clicks; 47,002 descriptions for 18,724 entities
• Social tags: Delicious tags for Wikipedia pages, from the SocialBM0311 corpus; 4.4M descriptions for 289,015 entities
11. Original entity representation
Tupac Shakur
Tupac Amaru Shakur (Previously known as Lesane Parish Crooks) (too-pahk shə-koor;[1] June 16, 1971 – September 13, 1996), also known by his stage names 2Pac and (briefly) Makaveli, was an American rapper, author, actor, and poet.[2] As of 2007, Shakur has sold over 75 million records worldwide, making him one of the best-selling music artists of all time.[3] His double disc albums All Eyez on Me and his Greatest Hits are among the [...]
(Figure: original entity description)
12. Static description sources
• KB anchors: 2Pac, Tupac, Makaveli
• KB linked entities: The Notorious B.I.G., Black Panther Party, Muammar Gaddafi
• KB redirects: 2pac Shakur, Thug Immortal
• KB categories: Murdered Rappers, Death Row Record Artists, American deists
• Web anchors:
  • What job did Tupac have before he was a rapper
  • Tupac
  • Tupac is arguably more influential
  • Tupac Amaru Shakur
  • Tupac Shakur-style drive-by shooting
  • Tupac Shakur
  • Tupac Shakur reciting Shakespeare at art school
13. Dynamic description sources
Dynamic expansions
• Tags and queries: “tupac and the law”, “hiphop/icons”, “dead rappers”, “people influenced by tupac”, “awesomeartist rapd”
• Tweets:
  • Happy Birthday Tupac!!! 2Pac Gemini
  • RT: The ashes of Tupac, the best rapper in history, were mixed with marijuana and smoked by members of Outlawz (translated from Spanish)
  • Even more crazy that this was announced just one day before what would have been Pac’s 40th birthday.
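The static and dynamic sources above map naturally onto a fielded document: one field per source, with static fields filled once from the KB and web anchors, and dynamic fields growing as tweets, queries, and tags arrive. The sketch below assumes that layout; the field names and the `EntityRepresentation` class are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical field names, one per description source listed on the slides.
STATIC_FIELDS = ("text", "kb_anchors", "kb_linked_entities", "kb_redirects",
                 "kb_categories", "web_anchors")
DYNAMIC_FIELDS = ("tweets", "queries", "tags")

def _empty_fields():
    return {name: [] for name in STATIC_FIELDS + DYNAMIC_FIELDS}

@dataclass
class EntityRepresentation:
    """One entity = one document with a separate field per description source."""
    entity_id: str
    fields: dict = field(default_factory=_empty_fields)

    def add(self, field_name, description):
        """Append a description to a field; unknown field names are rejected."""
        if field_name not in self.fields:
            raise KeyError(f"unknown field: {field_name}")
        self.fields[field_name].append(description)

    def field_text(self, field_name):
        """Concatenated text of one field, e.g. to score it with a retrieval model."""
        return " ".join(self.fields[field_name])

    def dynamic_count(self):
        """Number of descriptions received from dynamic sources so far."""
        return sum(len(self.fields[name]) for name in DYNAMIC_FIELDS)

tupac = EntityRepresentation("Tupac_Shakur")
tupac.add("kb_anchors", "2Pac")
tupac.add("kb_anchors", "Makaveli")
tupac.add("tags", "dead rappers")
print(tupac.field_text("kb_anchors"), tupac.dynamic_count())  # 2Pac Makaveli 1
```

Keeping each source in its own field, rather than concatenating everything into one text, is what allows a ranker to weight sources differently per field.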
15. Adaptive ranking
• Supervised single-field weighting model
• Features:
  • field similarity: retrieval score per field
  • field “importance”: length, novel terms, etc.
  • entity “importance”: time since last update
• Learn optimal field weights from clicks
Supervised single-field weighting model: each field’s contribution towards the final score is individually weighted, with the weights learned from clicks at set intervals.
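One way to read the single-field weighting model is as a linear score, score(q, e) = Σ_i w_i · x_i(q, e), where the x_i are per-field retrieval scores plus field and entity importance features, and the weights w_i are refit on the clicks collected so far. The sketch below assumes exactly that form and swaps in plain logistic regression trained with stochastic gradient descent as the learner; the slides do not specify the learning algorithm, so treat both choices, and the function names, as hypothetical.

```python
import math
import random

def score(weights, features):
    """Linear model: weighted sum of per-field (and importance) features."""
    return sum(w * x for w, x in zip(weights, features))

def train_from_clicks(examples, n_features, epochs=200, lr=0.1, seed=0):
    """Fit weights with logistic regression (SGD) on (features, clicked) pairs.

    clicked is 1 for the entity the user clicked and 0 for shown, unclicked ones.
    """
    rng = random.Random(seed)
    weights = [0.0] * n_features
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for features, clicked in data:
            prob = 1.0 / (1.0 + math.exp(-score(weights, features)))
            error = prob - clicked  # gradient of the log loss w.r.t. the score
            weights = [w - lr * error * x for w, x in zip(weights, features)]
    return weights

# Hypothetical features per (query, entity) pair: [sim_text_field, sim_tweets_field]
clicks = [([0.2, 0.9], 1), ([0.8, 0.1], 0), ([0.3, 0.8], 1), ([0.7, 0.2], 0)]
weights = train_from_clicks(clicks, n_features=2)
```

In this toy click log the clicked entities match mainly on the tweets field, so the learned weight for that field ends up larger. Re-running `train_from_clicks` on the growing click log at set intervals mirrors the periodic retraining described above.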
16. Experimental setup
1. Data:
• MSN query log (62,841 queries that yield entity clicks)
• For each query:
  • Produce ranking
  • Observe click
  • Evaluate ranking (MAP/P@1)
  • Expand entities (with descriptions from dynamic sources)
  • [Re-train ranker]
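The per-query loop above can be simulated roughly as follows. This is a minimal sketch under stated assumptions: a toy term-overlap ranker stands in for the actual ranker, the clicked entity is treated as the only relevant one (so AP reduces to the reciprocal rank of the click), expansion uses only the query text itself as the new description, and re-training is left as a comment. All function names and data are hypothetical.

```python
def average_precision(ranking, relevant):
    """AP of one ranking against a set of relevant entity ids."""
    hits, total = 0, 0.0
    for position, entity in enumerate(ranking, start=1):
        if entity in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0

def precision_at_1(ranking, relevant):
    return 1.0 if ranking and ranking[0] in relevant else 0.0

def rank(query, docs):
    """Toy ranker: number of query terms present in each entity's term set."""
    terms = set(query.lower().split())
    return sorted(docs, key=lambda entity: len(terms & docs[entity]), reverse=True)

def simulate(query_log, base_docs):
    """Replay (query, clicked_entity) pairs in time order; return (MAP, P@1)."""
    if not query_log:
        return 0.0, 0.0
    docs = {entity: set(text.lower().split()) for entity, text in base_docs.items()}
    ap_scores, p1_scores = [], []
    for query, clicked in query_log:
        ranking = rank(query, docs)                              # produce ranking
        relevant = {clicked}                                     # observe click
        ap_scores.append(average_precision(ranking, relevant))   # evaluate
        p1_scores.append(precision_at_1(ranking, relevant))
        # Expand the clicked entity with the query as a new description.
        docs.setdefault(clicked, set()).update(query.lower().split())
        # [re-train ranker] would happen here, e.g. at set intervals.
    n = len(query_log)
    return sum(ap_scores) / n, sum(p1_scores) / n

# Hypothetical data: two entities share the term "ferguson".
base_docs = {
    "Alex_Ferguson": "alex ferguson football manager",
    "Ferguson,_Missouri": "ferguson city missouri",
}
log = [("ferguson shooting", "Ferguson,_Missouri")] * 2
print(simulate(log, base_docs))  # (0.75, 0.5)
```

The first time “ferguson shooting” is issued, the two entities tie and the unclicked one is ranked first; after the clicked entity is expanded with the query terms, the same query ranks it first.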
21. In summary
• Expanding entity representations with descriptions from different sources enables better matching of queries to entities
• As new content comes in, it is beneficial to retrain the ranker
• Informing the ranker of the “expansion state” further improves performance