Entity Search
The Last Decade and the Next
Krisztian Balog
University of Stavanger
@krisztianbalog
10th Russian Summer School in Information Retrieval (RuSSIR 2016) | Saratov, Russia, 2016
WHAT IS AN ENTITY?
• An entity is an "object" or "thing" in the real world that can be distinctly identified and is characterized by the following properties:
  • unique identifier(s)
  • name(s)
  • type(s)
  • attributes (or description)
  • (typed) relationships to other entities
Examples: people, products, organizations, locations
OUTLINE
1. Past (now-10y)
2. Present
3. Future (+10y)
PART 1: THE PAST
The core problem of entity ranking and its investigation at various benchmarking evaluation campaigns
EVALUATION CYCLE
IDEA
01. Task definition
02. Experimental design
03. Method development
04. Experimental evaluation
05. Reporting
REVISION
ENTITY RANKING TASK
search query
retrieval
method
search results
EVALUATION CAMPAIGNS
2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
TREC Enterprise
TREC Entity
INEX Entity Ranking
SemSearch
INEX
Question Answering over Linked Data
EVALUATION CAMPAIGNS: TREC Enterprise
Task: expert finding
Input: keyword query
Data collection: enterprise intranet
Entity ID: email address
ontology engineering climate change
TREC ENTERPRISE EXPERT FINDING
• How to rank entities that have no direct representations?
• Idea: Look at co-occurrences of entities and query terms in documents
[Figure: documents in which query terms co-occur with an entity mention]
PROFILE-BASED METHODS
• Build a direct term-based entity representation based on associated language usage
  • "You shall know a word by the company it keeps." [Firth, 1957]
• Use document retrieval techniques for ranking entity profile documents
[Figure: the query q is matched against profile documents, one per entity e]
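The profile-based idea can be sketched as follows. This is a minimal illustration, assuming a toy corpus in which each document already lists the entities associated with it. The function names, the whitespace tokenizer, and the Jelinek-Mercer smoothed query likelihood scorer are assumptions made for the example, not the specific models evaluated at TREC Enterprise.

```python
import math
from collections import Counter, defaultdict

def build_profiles(docs):
    """Pool the terms of all documents associated with each entity."""
    profiles = defaultdict(Counter)
    for text, entities in docs:
        terms = text.lower().split()
        for entity in entities:
            profiles[entity].update(terms)
    return profiles

def rank_by_profile(query, docs, lam=0.1):
    """Rank entities by smoothed query likelihood of their profile document."""
    profiles = build_profiles(docs)
    collection = Counter(t for text, _ in docs for t in text.lower().split())
    coll_len = sum(collection.values())
    # Query terms that never occur in the collection are ignored.
    q_terms = [t for t in query.lower().split() if collection[t] > 0]
    ranking = []
    for entity, profile in profiles.items():
        length = sum(profile.values())
        score = 0.0
        for t in q_terms:
            p_profile = profile[t] / length
            p_coll = collection[t] / coll_len
            score += math.log((1 - lam) * p_profile + lam * p_coll)
        ranking.append((entity, score))
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data: (document text, associated entities).
docs = [
    ("ontology engineering methods", ["alice"]),
    ("climate change report", ["bob"]),
    ("ontology design", ["alice", "bob"]),
]
ranking = rank_by_profile("ontology engineering", docs)
```

Once the profiles exist, any document ranker can be reused unchanged, which is the main appeal of this family of methods.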
DOCUMENT-BASED METHODS
• First rank documents (or document snippets)
• Then aggregate evidence for the associated entities
[Figure: the query q retrieves documents; evidence from the retrieved documents is aggregated for the entities e they mention]
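The two steps of the document-based approach can be sketched like this. It is a simplified illustration under assumed choices: documents are scored with a smoothed query likelihood, and each retrieved document splits its score evenly among the entities associated with it. Other association weights and aggregation functions are possible; all names here are hypothetical.

```python
from collections import Counter, defaultdict

def doc_score(q_terms, doc_terms, collection, coll_len, lam=0.1):
    """Smoothed query likelihood of a single document."""
    tf = Counter(doc_terms)
    score = 1.0
    for t in q_terms:
        p_doc = tf[t] / len(doc_terms) if doc_terms else 0.0
        p_coll = collection[t] / coll_len
        score *= (1 - lam) * p_doc + lam * p_coll
    return score

def rank_by_documents(query, docs, top_k=10, lam=0.1):
    """Step 1: rank documents. Step 2: aggregate scores per entity."""
    q_terms = query.lower().split()
    tokenized = [(text.lower().split(), entities) for text, entities in docs]
    collection = Counter(t for terms, _ in tokenized for t in terms)
    coll_len = sum(collection.values())
    scored = [
        (doc_score(q_terms, terms, collection, coll_len, lam), entities)
        for terms, entities in tokenized
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    entity_scores = defaultdict(float)
    for score, entities in scored[:top_k]:
        for entity in entities:
            # Uniform document-entity association (an assumption).
            entity_scores[entity] += score / len(entities)
    return sorted(entity_scores.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data: (document text, associated entities).
docs = [
    ("ontology engineering methods", ["alice"]),
    ("climate change report", ["bob"]),
    ("ontology design", ["alice", "bob"]),
]
ranking = rank_by_documents("ontology engineering", docs)
```

Unlike the profile-based variant, no per-entity representation is built in advance; entities are scored only through the documents retrieved for the query.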
EVALUATION CAMPAIGNS: INEX Entity Ranking
Task: entity ranking in Wikipedia
Input: keyword++ query (target types/examples)
Data collection: Wikipedia
Entity ID: Wikipedia article ID
Movies with eight or more Academy Awards
+category: best picture oscar
+category: british films
+category: american films
INEX ENTITY RANKING
Term-based representation: Movies with eight or more Academy Awards
Category-based representation:
+category: best picture oscar
+category: british films
+category: american films
EVALUATION CAMPAIGNS: TREC Entity
Task: related entity finding
Input: keyword++ query (input entity, target type)
Data collection: Web
Entity ID: entity homepage URL
airlines that currently use Boeing-747 planes
+entity: Boeing-747 (clueweb09-..292)
+target type: organization
EVALUATION CAMPAIGNS: SemSearch
Task: entity search in the Web of Data
Input: keyword query
Data collection: RDF triples
Entity ID: URI
nokia e73
boroughs of New York City
disney orlando
FIELDED DOCUMENT REPRESENTATION FROM RDF TRIPLES
Entity: dbpedia:Audi_A4
Triple patterns: subject-predicate-object, subject-predicate-literal

foaf:name                        Audi A4
rdfs:label                       Audi A4
rdfs:comment                     The Audi A4 is a compact executive car produced since late 1994 by the German car manufacturer Audi, a subsidiary of the Volkswagen Group. The A4 has been built [...]
dbpprop:production               1994, 2001, 2005, 2008
rdf:type                         dbpedia-owl:MeanOfTransportation, dbpedia-owl:Automobile
dbpedia-owl:manufacturer         dbpedia:Audi
dbpedia-owl:class                dbpedia:Compact_executive_car
owl:sameAs                       freebase:Audi A4
is dbpedia-owl:predecessor of    dbpedia:Audi_A5
is dbpprop:similar of            dbpedia:Cadillac_BLS
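A fielded document of this kind can be assembled from triples roughly as follows. This is a sketch under simplifying assumptions: triples arrive as `(subject, predicate, object, is_literal)` tuples with prefixed names, URI objects are turned into text by stripping the prefix and replacing underscores, and incoming links become `is <predicate> of` fields. The helper names are hypothetical.

```python
from collections import defaultdict

def local_name(term):
    """Turn 'dbpedia:Compact_executive_car' into 'Compact executive car'."""
    return term.split(":", 1)[-1].replace("_", " ")

def fielded_document(entity, triples):
    """Group the triples about an entity into predicate-named fields."""
    fields = defaultdict(list)
    for subject, predicate, obj, is_literal in triples:
        if subject == entity:
            # Outgoing triple: the predicate is the field name.
            fields[predicate].append(obj if is_literal else local_name(obj))
        elif obj == entity and not is_literal:
            # Incoming triple: keep it as a reverse field.
            fields["is " + predicate + " of"].append(local_name(subject))
    return dict(fields)

triples = [
    ("dbpedia:Audi_A4", "rdfs:label", "Audi A4", True),
    ("dbpedia:Audi_A4", "dbpprop:production", "1994", True),
    ("dbpedia:Audi_A4", "dbpprop:production", "2001", True),
    ("dbpedia:Audi_A4", "dbpedia-owl:manufacturer", "dbpedia:Audi", False),
    ("dbpedia:Audi_A5", "dbpedia-owl:predecessor", "dbpedia:Audi_A4", False),
]
doc = fielded_document("dbpedia:Audi_A4", triples)
```

In practice the many predicates are often folded into a handful of fields (for example names, attributes, types, outgoing and incoming relations) before fielded retrieval models are applied; that grouping is omitted here.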
EVALUATION CAMPAIGNS: Question Answering over Linked Data
Task: question answering over RDF data
Input: natural language query
Data collection: RDF triples
Entity ID: URI
Which German cities have more than 250000 inhabitants?
Who is the youngest Pulitzer Prize winner?
EVALUATION CAMPAIGNS: INEX
Task: ad-hoc entity retrieval
Input: keyword query
Data collection: Wikipedia + RDF triples
Entity ID: Wikipedia article ID
NASA country German
DATA EVOLUTION
[Figure: the campaigns (TREC Enterprise, TREC Entity, INEX Entity Ranking, SemSearch, INEX, Question Answering over Linked Data) on the 2005-2015 timeline, grouped by data type: unstructured, semistructured, structured]
• Clear trend moving towards structured data
• No meaningful/successful attempt at combining unstructured and structured data
QUERY EVOLUTION
[Figure: the campaigns (TREC Enterprise, TREC Entity, INEX Entity Ranking, SemSearch, INEX, Question Answering over Linked Data) on the 2005-2015 timeline, grouped by query type: keyword, keyword++, natural language]
• Keyword queries are still the most common way to search
• From providing explicit semantic annotations to natural language questions
WHAT HAVE WE BEEN DOING?
• Core focus has been on retrieval models, and more specifically on entity representations
  • In terms of associated language usage, description, types, attributes
• Richer query representations (i.e., query annotations) were taken for granted
image source: https://www.pinterest.com/pin/382946774535857111/
THE BIGGER PICTURE
Understanding information needs
Data source(s)
Result presentation & user interaction
Retrieval method
PART 2: THE PRESENT
Current research themes on various aspects of entity search.
DATA
J. Benetka, K. Balog, and K. Nørvåg. Towards Building a Knowledge Base of Monetary Transactions from a News Collection. JCDL'17.
KNOWLEDGE BASES
• Modern entity-oriented search features are fueled by knowledge bases, which need continuous updating
• Critical to be able to verify the validity of data
  • Supply provenance information for each statement
  • Validity check (still) needs to be performed by a human
• Can we help human editors to maintain and expand knowledge bases?
BUILDING A KNOWLEDGE BASE OF MONETARY TRANSACTIONS
[Screenshot: editor interface for finding events. Subject entity: Oracle. Predicate filter (financial event): acquisition. Object entities (counterparts): Hyperion Solutions, Siebel Systems, Retek, PeopleSoft. Extracted information (event attributes): value, year, confidence, source (NYT), snippet, link. Sample snippets: "…Oracle finally acquired PeopleSoft for…", "… which acquired PeopleSoft last year …", "… from the PeopleSoft purchase …"]
Example source text ("A Boom in Merger Activity"): In December 2004, after a battle for control that grew nasty, Oracle finally acquired PeopleSoft for about $10.3 billion, becoming the second-largest maker of business-management software.
APPROACH
Semantic annotation of sentences
• Monetary value recognition
• Economic event recognition
• Entity recognition
• Date extraction
• Semantic role labeling
Clustering events
• Grouping sentences that discuss the same economic event
Event representation
• Generate all possible event interpretations (quintuples)
Supervised learning
• Assigning confidence score to each interpretation
[Figure: sentences s#1 to s#5 are annotated with entity pairs; s#1, s#2, s#5 (A, B) form event e#2 ([A] <rel> [B]) and s#3, s#4 (C, D) form event e#1 ([C] <rel> [D]); candidate interpretations receive confidence scores (0.85, 0.65, 0.91, 0.43, 0.45, 0.77)]
RESULTS
[Bar chart: F1 (0 to 0.4) for Events, Attributes (strict), and Attributes (relaxed); compared methods: First reporting, Last reporting, Most frequent, Supervised learning]
SUMMARY
• Building a domain-specific knowledge base
  • NLP pipeline for information extraction
  • ML for establishing confidence for human processing
• Open research problems
  • Long-tail entities
  • Entities "not worthy" of a Wikipedia page
  • What are the attributes that matter?
UNDERSTANDING INFORMATION NEEDS
F. Hasibi, K. Balog, and S. E. Bratsberg. Exploiting Entity Linking in Queries for Entity Retrieval. ICTIR'16.
ANNOTATING QUERIES WITH ENTITIES
• Semantic annotations of queries were taken for granted so far
• How can automatic entity annotations of queries be leveraged to improve entity retrieval?
Example query: barack obama parents
APPROACH
Query: barack obama parents
• Query terms: barack obama parents (term-based matching against the term-based representation D)
• Annotations: <Barack_Obama>, obtained by automatic entity linking (entity-based matching against the entity-based representation D̂)

Term-based representation (D)
<rdfs:label>: Ann Dunham
<dbo:abstract>: Stanley Ann Dunham the mother Barack Obama, was an American anthropologist who …
<dbo:birthPlace>: Honolulu Hawaii …
<dbo:child>: Barack Obama
<dbo:wikiPageWikiLink>: United States Family Barack Obama

Entity-based representation (D̂)
<dbo:birthPlace>: [<Honolulu>, <Hawaii>]
<dbo:child>: <Barack_Obama>
<dbo:wikiPageWikiLink>: [<United_States>, <Family_of_Barack_Obama>, …]
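The interplay of term-based and entity-based matching can be illustrated with a deliberately simplified sketch. This is not the ELR model from the cited paper: it assumes a plain linear interpolation between a term-overlap score and a confidence-weighted entity-overlap score, and all function names, the weight `lam`, and the linker confidence value are hypothetical.

```python
def term_score(query_terms, term_fields):
    """Fraction of query terms that occur in the term-based representation."""
    bag = {
        token.lower()
        for values in term_fields.values()
        for value in values
        for token in value.split()
    }
    if not query_terms:
        return 0.0
    return sum(t.lower() in bag for t in query_terms) / len(query_terms)

def entity_score(annotations, entity_fields):
    """Confidence-weighted share of linked entities found in URI-valued fields."""
    uris = {uri for values in entity_fields.values() for uri in values}
    total = sum(annotations.values())
    if total == 0:
        return 0.0
    matched = sum(conf for uri, conf in annotations.items() if uri in uris)
    return matched / total

def combined_score(query_terms, annotations, term_fields, entity_fields, lam=0.5):
    """Interpolate term-based and entity-based evidence (assumed combination)."""
    return ((1 - lam) * term_score(query_terms, term_fields)
            + lam * entity_score(annotations, entity_fields))

query_terms = ["barack", "obama", "parents"]
annotations = {"<Barack_Obama>": 0.9}  # hypothetical entity linker output

ann_dunham = combined_score(
    query_terms, annotations,
    term_fields={"rdfs:label": ["Ann Dunham"], "dbo:child": ["Barack Obama"]},
    entity_fields={"dbo:child": ["<Barack_Obama>"]},
)
unrelated = combined_score(
    query_terms, annotations,
    term_fields={"rdfs:label": ["Parents magazine"]},
    entity_fields={},
)
```

The point of the entity-based component is that a candidate linked to the annotated entity is rewarded even when the surface terms only partly match.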
RESULTS
[Bar chart: MAP (0 to 0.22) for LM, MLM-tc, MLM-all, PRMS, SDM, and FSDM; each model shown as baseline and +ELR]
ANALYSIS
SUMMARY
• Automatically annotating queries with entities can significantly improve retrieval performance
• Open research problem:
  • How should a query be answered (list, fact, table, etc.)?
ENTITY SUMMARIES
F. Hasibi, K. Balog, and S. E. Bratsberg. Dynamic Factual Summaries for Entity Cards. SIGIR'17.
ENTITY SUMMARIES
• Summaries serve a dual purpose
  • Synopsis of the entity
  • Provide evidence why the entity is a good answer for the given query
• How to generate dynamic entity summaries that can directly address users' information needs?
• Two subtasks
  • Fact ranking: What should be in the summary?
  • Summary generation: How should it be presented?
EXAMPLE
Query: einstein awards

Static (query-independent) summary
Born: March 14, 1879, Ulm, Germany
Died: April 18, 1955, Princeton, New Jersey, United States
Influenced by: Isaac Newton, Mahatma Gandhi, more
Spouse: Elsa Einstein, Mileva Marić
Children: Eduard Einstein, Lieserl Einstein, Hans A. Einstein

Dynamic (query-dependent) summary
Born: March 14, 1879, Ulm, Germany
Died: April 18, 1955, Princeton, New Jersey, United States
Awards: Barnard Medal, Nobel Prize in Physics, more
Education: Swiss Federal Polytechnic, University of Zurich
Influenced by: Isaac Newton, Mahatma Gandhi, more
FACT RANKING
• Ranking entity facts according to various "goodness" criteria
  • Importance: how well it describes the entity
  • Relevance: how well it supports/explains why the entity is a relevant result for the given query (information need)
  • Utility: combines importance and relevance
• Learning-to-rank approach with specific features designed for capturing importance and relevance
SUMMARY GENERATION
• A summary is more than a ranked list of facts
  • Semantically identical predicates
  • Multi-valued predicates
  • Presentation (human-readable labels, size constraints)

Ranked facts:
<dbo:capital> <dbpedia:Oslo>
<dbo:currency> <dbpedia:Norwegian_krone>
<dbo:leader> <dbpedia:Harald_V_of_Norway>
<dbp:establishedDate> 1814-05-17
<dbp:leaderName> <dbpedia:Harald_V_of_Norway>
<foaf:homepage> <http://www.norway.no/>
<dbo:language> <dbpedia:Norwegian_language>
<dbo:language> <dbpedia:Romani_language>
<dbo:language> <dbpedia:Scandoromani_language>
<dbp:website> <http://www.norway.no/>
<dbo:leaderTitle> President of the Storting
<dbp:areaKm> 385178

vs. summary:
Capital: Oslo
Currency: Norwegian krone
Leader: Harald V of Norway
Homepage: http://www.norway.no/
Language: Norwegian, Romani, more
SUMMARY GENERATION ALGORITHM
[Figure: entity card layout; line_i consists of heading_i and value_i; the card is limited to height τ_h and width τ_w]
1. Selecting line headings
  • Recognizing semantically identical predicates
  • Mapping predicates to human-readable labels
2. Collecting line values
  • Grouping values for multi-valued predicates
  • Adhering to size constraints
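The two steps can be sketched as follows. This is an illustrative approximation, not the algorithm from the paper: it assumes facts arrive already ranked, that semantically identical predicates are resolved by a hand-made label table, that the height constraint is a maximum number of lines, and that the width constraint is a maximum number of characters per line. All names and the label table are hypothetical.

```python
def generate_summary(ranked_facts, labels, max_lines=5, max_width=40):
    """Build summary lines from ranked (predicate, value) facts."""
    # Step 1: select headings. Predicates mapped to the same label are merged;
    # predicates without a label are skipped. Dict order keeps the rank order.
    grouped = {}
    for predicate, value in ranked_facts:
        heading = labels.get(predicate)
        if heading is None:
            continue
        values = grouped.setdefault(heading, [])
        if value not in values:
            values.append(value)

    # Step 2: collect values per line while respecting the size constraints.
    lines = []
    for heading, values in list(grouped.items())[:max_lines]:
        line = f"{heading}: {values[0]}"  # the first value is always shown
        for i, value in enumerate(values[1:], start=1):
            is_last = i == len(values) - 1
            candidate = f"{line}, {value}"
            reserve = 0 if is_last else len(", more")
            if len(candidate) + reserve > max_width:
                line += ", more"
                break
            line = candidate
        lines.append(line)
    return lines

labels = {
    "dbo:capital": "Capital", "dbo:currency": "Currency",
    "dbo:leader": "Leader", "dbp:leaderName": "Leader",
    "foaf:homepage": "Homepage", "dbp:website": "Homepage",
    "dbo:language": "Language",
}
ranked_facts = [
    ("dbo:capital", "Oslo"),
    ("dbo:currency", "Norwegian krone"),
    ("dbo:leader", "Harald V of Norway"),
    ("dbp:leaderName", "Harald V of Norway"),
    ("foaf:homepage", "http://www.norway.no/"),
    ("dbo:language", "Norwegian"),
    ("dbo:language", "Romani"),
    ("dbo:language", "Scandoromani"),
    ("dbp:website", "http://www.norway.no/"),
]
summary = generate_summary(ranked_facts, labels, max_lines=5, max_width=34)
```

With these inputs the duplicate leader and homepage facts collapse into single lines, and the language line is truncated with "more" once the width budget is used up.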
FACT RANKING EVALUATION
[Bar chart, all facts: NDCG@10 (0 to 0.8) for Importance and Utility; methods: RELIN, DynES/imp, DynES]
[Bar chart, facts with URI-only objects: NDCG@10 (0 to 0.9) for Importance and Utility; methods: RELIN, LinkSUM, SUMMARUM, DynES/imp, DynES]
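END-TO-END (SUMMARY) EVALUATION
• How do static and dynamic summaries compare against each other?
[Stacked bar chart (0 to 100%): share of cases where the dynamic summary wins vs. the static summary wins, under oracle (perfect) fact ranking and under automatic fact ranking; extracted values, assignment to bars not recoverable: 31, 37, 23, 16, 46, 47]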
SUMMARY
• Addressed the problem of generating dynamic (query-dependent) entity summaries
• Open research problems
  • What should be on the entity card?
  • Other forms of result presentation (tables, lists, graphs, etc.)
ANTICIPATING INFORMATION NEEDS
J. Benetka, K. Balog, and K. Nørvåg. Anticipating Information Needs Based on Check-in Activity. WSDM'17.
ZERO-QUERY SEARCH
• Proactive instead of reactive search
  • "Anticipate user needs and respond with information appropriate to the current context without the user having to enter a query" (Allan et al., SIGIR Forum 2012)
• Using a person's check-in activity as context, can we anticipate her information needs, and respond with a set of information cards that directly address those needs?
[Example information cards: Terminal, Weather (21ºC), Traffic]
INFORMATION NEEDS FOR ACTIVITIES
• What are relevant information needs in the context of a given activity?
  • Use POI categories (Foursquare) to represent activities
  • Mine information needs from search suggestions
ANTICIPATING INFORMATION NEEDS
• Maximize the likelihood of satisfying the user's information needs by considering each possible activity that might follow next
• Transition probabilities are estimated based on historical check-in data
[Figure: Activity A followed by one of the possible next activities B, C, or D, with transition probabilities 45%, 34%, and 21%]
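The idea of weighting information needs by what is likely to happen next can be sketched as below. This is a minimal illustration, assuming first-order transitions estimated by counting consecutive check-ins and a given table of need relevance per activity; the function names and the toy data are hypothetical, and the paper's models include further components such as temporal sensitivity.

```python
from collections import Counter, defaultdict

def transition_probabilities(checkin_sequences):
    """Estimate P(next activity | last activity) from consecutive check-ins."""
    counts = defaultdict(Counter)
    for sequence in checkin_sequences:
        for prev, nxt in zip(sequence, sequence[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(counter.values()) for nxt, c in counter.items()}
        for prev, counter in counts.items()
    }

def anticipate_needs(last_activity, transitions, need_relevance, top_k=3):
    """Score each need by summing P(next | last) * relevance(need | next)."""
    scores = defaultdict(float)
    for nxt, prob in transitions.get(last_activity, {}).items():
        for need, relevance in need_relevance.get(nxt, {}).items():
            scores[need] += prob * relevance
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# Hypothetical historical check-ins (one list per user session).
history = [
    ["airport", "hotel"],
    ["airport", "hotel"],
    ["airport", "restaurant"],
    ["hotel", "restaurant"],
]
# Hypothetical relevance of information needs per activity.
need_relevance = {
    "hotel": {"check-in time": 0.8, "weather": 0.2},
    "restaurant": {"menu": 0.9, "weather": 0.1},
}
transitions = transition_probabilities(history)
cards = anticipate_needs("airport", transitions, need_relevance)
```

A need that is moderately relevant to several likely next activities can outrank one that is highly relevant to a single unlikely activity, which is the intended effect of marginalizing over what might follow.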
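EVALUATION METHODOLOGY
[Figure: the check-in dataset (User 1, User 2, User 3, ...) is divided into train and test portions at the 80% mark; predictions are shown as information cards such as Terminal, Weather (21ºC), Traffic]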
RESULTS
[Bar chart: NDCG@5 (0 to 0.9) at the top level and second level for models M0 to M3]
M0: Most frequent information needs, regardless of the last activity
M1: Consider information needs for all possible upcoming activities
M2: In addition, consider the information needs relevant to the past activity (fixed weight for all info needs)
M3: Consider the temporal sensitivity of each information need individually
SUMMARY
• Identifying information needs that are relevant in the context of a given activity and proactively presenting information cards addressing those needs
• Open research problems
  • Other contexts
  • (Access to data, privacy...)
PART 3: THE FUTURE
Making the right information available to the right person at the right time.
IMAGINARY SCENARIO WITH AN INTELLIGENT PERSONAL ASSISTANT
Assistant: I see you're wasting time away on Facebook. Do you have time now to talk about your holiday plans?
User: Sure. I want an active holiday with the family in beautiful nature.
Assistant: It sounds like you would definitely love Norway. A cabin in the mountains maybe?
User: Could be. But I want to go kayaking and also catch some fish. And not too much rain, please.
Assistant: And something fun for the kids nearby, I suppose?
User: Of course.
Assistant: How does Oltedal sound? People have been quite successful with catching lake trout based on what I found on Instagram. There is also a theme park and horse riding, both within 50 km.
User: And what about the weather?
Assistant: You know we're talking about Norway, right…? Anyway, based on statistics from the past 30 years, this is one of the areas with the least amount of rain if you go in August.
User: I see. What about accommodation?
Assistant: Here is a list of places that I think you might like.
User: Any opinions on this one?
Assistant: According to the reviews that I can find on the web, the cabins are well equipped, the staff is nice and they even allow guests to borrow their kayaks.
User: OK. Let's find a date that works for everyone.
Assistant: According to your wife's calendar, her parents will be visiting you in the first week of August. School starts for the kids on the week of Aug 22. So there is a two week window between Aug 8 and 21, assuming that I can cancel the regular weekly meetings with your PhD students.
User: That's fine. The students won't mind. Write them an email to upload their holiday plans to the group wiki, and add summer planning to the next group meeting's agenda.
[Email draft] To: XXX, YYY, ZZZ
Guys,
What are your plans for the summer? Please upload your away times to the group wiki.
-Kr
[Send]
[Notification] Agenda item "Summer planning" added
Assistant: In the meantime, I called the cabin to check availability. Their online booking system is down at the moment. They still have some cabins available. Do you want to see them?
User: No, I had enough of this for today. Mail the pictures to my wife with some kind words.
Assistant: Anything else I can do for you?
User: Order a water filter for my espresso machine. I just found out that it'll need to be replaced soon.
[Email draft] To: Wife
Darling,
You will love the place I found for us for a vacation in August. It is by the water; at night we will hear the waves. We will be able to take our morning breakfasts on the balcony, which ...
[Send]
FUTURE RESEARCH THEMES
UNDERSTANDING INFORMATION NEEDS
• Natural language conversational interface
• Anticipating information needs
• Proactive recommendations
"It sounds like you would definitely love Norway. A cabin in the mountains maybe?"
"And something fun for the kids nearby, I suppose?"
"I see you're wasting time away on Facebook. Do you have time now to talk about your holiday plans?"
DATA
• Long-tail entities
• On-the-fly information extraction
• "Personal" knowledge base
  • "Wife", "My students", "my group", "my espresso machine", ... entities I care about
"Here is a list of places that I think you might like."
"According to the reviews that I can find on the web, ..."
"Order a water filter for my espresso machine. I just found out that it'll need to be replaced soon."
[Product: Breville BES860XL Barista Express Espresso Machine]
RESULT PRESENTATION & USER INTERACTION
• Providing evidence
• "Actionable" entities
  • Make booking, order item, write email, ...
• Helping the user to get things done
  • Support for task completion
"... based on statistics from the past 30 years, ..."
"According to your wife's calendar, ..."
"Agenda item Summer planning added"
"Write them an email to upload their holiday plans to the group wiki, and add summer planning to the next group meeting's agenda."
SUMMARY
Understanding information needs
• Semantic annotations
• Anticipating info needs
• Natural language conversational interfaces
Data source(s)
• Long tail entities
• Personal knowledge base
• On-the-fly information extraction
Retrieval method
• Hybrid approaches
Result presentation & user interaction
• Entity cards
• Actionable entities
• Support for task completion
ACKNOWLEDGMENTS
• Joint work with
  • Faegheh Hasibi
  • Jan Benetka
  • Darío Garigliotti
  • Kjetil Nørvåg
  • Svein Erik Bratsberg
QUESTIONS?
@krisztianbalog
krisztianbalog.com

 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 

Entity Search: The Last Decade and the Next

  • 1. Entity Search: The Last Decade and the Next. Krisztian Balog, University of Stavanger, @krisztianbalog. 10th Russian Summer School in Information Retrieval (RuSSIR 2016) | Saratov, Russia, 2016
  • 2. WHAT IS AN ENTITY? • An entity is an "object" or "thing" in the real world that can be distinctly identified and is characterized by the following properties: • unique identifier(s) • name(s) • type(s) • attributes (or description) • (typed) relationships to other entities • Examples: people, products, organizations, locations
  • 3.
  • 4.
  • 6. THE PAST (Part 1) • The core problem of entity ranking and its investigation at various benchmarking evaluation campaigns
  • 7. EVALUATION CYCLE (diagram) • Idea • 01. Task definition • 02. Experimental design • 03. Method development • 04. Experimental evaluation • 05. Reporting • Revision
  • 8. ENTITY RANKING TASK (diagram) • search query → retrieval method → search results
  • 9. EVALUATION CAMPAIGNS (timeline, 2005 to 2015) • TREC Enterprise • TREC Entity • INEX Entity Ranking • SemSearch • INEX • Question Answering over Linked Data
  • 10. EVALUATION CAMPAIGNS: TREC Enterprise • Task: expert finding • Input: keyword query • Data collection: enterprise intranet • Entity ID: email address • Example queries: ontology engineering; climate change
  • 11. TREC ENTERPRISE EXPERT FINDING • How to rank entities that have no direct representations? • Idea: Look at co-occurrences of entities and query terms in documents (figure: documents containing query terms and entity mentions)
  • 12. PROFILE-BASED METHODS • Build a direct term-based entity representation based on associated language usage • "You shall know a word by the company it keeps." [Firth, 1957] • Use document retrieval techniques for ranking entity profile documents (figure: the text associated with each entity e is collected into a profile document, and the profiles are matched against the query q)
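The profile-based idea can be sketched in a few lines of Python. This is a minimal illustration, not the method from the talk: the function names, the whitespace tokenizer, and the Dirichlet-smoothed query likelihood with an add-one collection estimate are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter


def build_profiles(docs, doc_entities):
    """Pool the terms of every document associated with an entity into one profile."""
    profiles = {}
    for doc_id, text in docs.items():
        for entity in doc_entities.get(doc_id, []):
            profiles.setdefault(entity, Counter()).update(text.lower().split())
    return profiles


def rank_profiles(query, profiles, mu=10.0):
    """Rank entity profiles by smoothed query log-likelihood (assumed scoring)."""
    collection = Counter()
    for profile in profiles.values():
        collection.update(profile)
    total = sum(collection.values())
    vocab = len(collection)
    scores = {}
    for entity, profile in profiles.items():
        length = sum(profile.values())
        score = 0.0
        for term in query.lower().split():
            # Add-one collection estimate keeps the log defined for unseen terms.
            p_coll = (collection[term] + 1) / (total + vocab + 1)
            score += math.log((profile[term] + mu * p_coll) / (length + mu))
        scores[entity] = score
    return sorted(scores, key=scores.get, reverse=True)
```

Any document retrieval model could replace the scoring function; the point is only that each entity is first turned into a pseudo-document and then ranked like one.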
  • 13. DOCUMENT-BASED METHODS • First rank documents (or document snippets) • Then aggregate evidence for the associated entities (figure: the query q is matched against documents, and the document scores are aggregated over the entities e mentioned in them)
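The two steps of the document-based approach can be sketched as follows. The term-count document score and the uniform split of a document's score among its entities are assumptions made for illustration; any document ranker and association weighting could be plugged in.

```python
from collections import Counter, defaultdict


def score_document(query, text):
    """Toy document score: number of query term occurrences in the text."""
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query.lower().split())


def rank_entities(query, docs, doc_entities, top_k=10):
    """Step 1: rank documents. Step 2: aggregate their scores per entity."""
    doc_scores = {doc_id: score_document(query, text) for doc_id, text in docs.items()}
    top_docs = sorted(doc_scores, key=doc_scores.get, reverse=True)[:top_k]
    entity_scores = defaultdict(float)
    for doc_id in top_docs:
        entities = doc_entities.get(doc_id, [])
        for entity in entities:
            # Assumed association weight: each document splits its score evenly.
            entity_scores[entity] += doc_scores[doc_id] / len(entities)
    return sorted(entity_scores, key=entity_scores.get, reverse=True)
```

Unlike the profile-based variant, no entity representation is built up front; entities only receive evidence through the documents retrieved for the query.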
  • 14. EVALUATION CAMPAIGNS: INEX Entity Ranking • Task: entity ranking in Wikipedia • Input: keyword++ query (target types/examples) • Data collection: Wikipedia • Entity ID: Wikipedia article ID • Example query: Movies with eight or more Academy Awards +category: best picture oscar +category: british films +category: american films
  • 15. INEX ENTITY RANKING • Example query: Movies with eight or more Academy Awards +category: best picture oscar +category: british films +category: american films • Term-based representation • Category-based representation
  • 16. EVALUATION CAMPAIGNS: TREC Entity • Task: related entity finding • Input: keyword++ query (input entity, target type) • Data collection: Web • Entity ID: entity homepage URL • Example query: airlines that currently use Boeing-747 planes +entity: Boeing-747 (clueweb09-..292) +target type: organization
  • 17. EVALUATION CAMPAIGNS: SemSearch • Task: entity search in the Web of Data • Input: keyword query • Data collection: RDF triples • Entity ID: URI • Example queries: nokia e73; boroughs of New York City; disney orlando
  • 18. FIELDED DOCUMENT REPRESENTATION FROM RDF TRIPLES • Triples have the form (subject, predicate, object) or (subject, predicate, literal) • Example entity: dbpedia:Audi_A4 • foaf:name: Audi A4 • rdfs:label: Audi A4 • rdfs:comment: The Audi A4 is a compact executive car produced since late 1994 by the German car manufacturer Audi, a subsidiary of the Volkswagen Group. The A4 has been built [...] • dbpprop:production: 1994, 2001, 2005, 2008 • rdf:type: dbpedia-owl:MeanOfTransportation, dbpedia-owl:Automobile • dbpedia-owl:manufacturer: dbpedia:Audi • dbpedia-owl:class: dbpedia:Compact_executive_car • owl:sameAs: freebase:Audi A4 • is dbpedia-owl:predecessor of: dbpedia:Audi_A5 • is dbpprop:similar of: dbpedia:Cadillac_BLS
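A rough sketch of how such a fielded document might be assembled from triples is given below. The tuple layout (subject, predicate, object, is_literal), the one-field-per-predicate choice, and the helper names are assumptions for illustration; practical systems usually fold the many predicates into a handful of fields.

```python
from collections import defaultdict


def local_name(uri):
    """Turn 'dbpedia:Compact_executive_car' into 'Compact executive car'."""
    return uri.split(":", 1)[-1].replace("_", " ")


def fielded_document(subject, triples):
    """Group the triples that mention `subject` into fields keyed by predicate."""
    fields = defaultdict(list)
    for s, p, o, is_literal in triples:
        if s == subject:
            # Outgoing edge: keep literals as-is, resolve URIs to readable names.
            fields[p].append(o if is_literal else local_name(o))
        elif o == subject and not is_literal:
            # Incoming edge: stored under an inverse field, as on the slide.
            fields[f"is {p} of"].append(local_name(s))
    return dict(fields)
```

The resulting dictionary can be indexed as a multi-field document, so that standard fielded retrieval models can be applied to entities.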
  • 19. EVALUATION CAMPAIGNS: Question Answering over Linked Data • Task: question answering over RDF data • Input: natural language query • Data collection: RDF triples • Entity ID: URI • Example queries: Which German cities have more than 250000 inhabitants? Who is the youngest Pulitzer Prize winner?
  • 20. EVALUATION CAMPAIGNS: INEX • Task: ad-hoc entity retrieval • Input: keyword query • Data collection: Wikipedia + RDF triples • Entity ID: Wikipedia article ID • Example: NASA country German
  • 21. EVALUATION CAMPAIGNS (timeline as on slide 9)
  • 22. DATA EVOLUTION (campaign timeline labeled by data type: unstructured, semistructured, structured) • Clear trend moving towards structured data • No meaningful/successful attempt at combining unstructured and structured data
  • 23. QUERY EVOLUTION (campaign timeline labeled by query type: keyword, keyword++, natural language) • Keyword queries are still the most common way to search • From providing explicit semantic annotations to natural language questions
  • 24. WHAT HAVE WE BEEN DOING? • Core focus has been on retrieval models, and more specifically on entity representations • In terms of associated language usage, description, types, attributes • Richer query representations (i.e., query annotations) were taken for granted
  • 26. THE BIGGER PICTURE • Understanding information needs • Data source(s) • Retrieval method • Result presentation & user interaction
  • 27. THE PRESENT (Part 2) • Current research themes on various aspects of entity search.
  • 28. DATA • J. Benetka, K. Balog, and K. Nørvåg. Towards Building a Knowledge Base of Monetary Transactions from a News Collection. JCDL’17.
  • 29. KNOWLEDGE BASES • Modern entity-oriented search features are fueled by knowledge bases, which need continuous updating • Critical to be able to verify the validity of data • Supply provenance information for each statement • Validity check (still) needs to be performed by a human • Can we help human editors to maintain and expand knowledge bases?
  • 30. BUILDING A KNOWLEDGE BASE OF MONETARY TRANSACTIONS (interface screenshot: the editor selects a subject entity (Oracle), a predicate filter (financial event: acquisition), and an object entity; the system lists extracted candidate events with counterpart (Hyperion Solutions, Siebel Systems, Retek, PeopleSoft), value in USD, year, confidence, and a supporting NYT snippet with link) • Example source text, "A Boom in Merger Activity": In December 2004, after a battle for control that grew nasty, Oracle finally acquired PeopleSoft for about $10.3 billion, becoming the second-largest maker of business-management software.
  • 31. APPROACH • Semantic annotation of sentences: monetary value recognition, economic event recognition, entity recognition, date extraction, semantic role labeling • Clustering events: grouping sentences that discuss the same economic event • Event representation: generate all possible event interpretations (quintuples) • Supervised learning: assigning confidence score to each interpretation (figure: sentences s#1 to s#5 are annotated, grouped into events e#1 and e#2, expanded into candidate interpretations, and scored)
  • 32. RESULTS (bar chart: F1, axis from 0 to 0.4, for Events, Attributes (strict), and Attributes (relaxed); methods compared: First reporting, Last reporting, Most frequent, Supervised learning)
  • 33. SUMMARY • Building a domain-specific knowledge base • NLP pipeline for information extraction • ML for establishing confidence for human processing • Open research problems • Long-tail entities • Entities "not worthy" of a Wikipedia page • What are the attributes that matter?
  • 34. UNDERSTANDING INFORMATION NEEDS • F. Hasibi, K. Balog, and S. E. Bratsberg. Exploiting Entity Linking in Queries for Entity Retrieval. ICTIR’16.
  • 35. ANNOTATING QUERIES WITH ENTITIES • Semantic annotations of queries were taken for granted so far • How can automatic entity annotations of queries be leveraged to improve entity retrieval? • Example query: barack obama parents
  • 36. APPROACH (figure) • Query terms: barack obama parents • Entity linking adds the (automatic) entity annotation <Barack_Obama> • Term-based matching compares the query terms with the term-based representation (D) of a candidate entity: <rdfs:label>: Ann Dunham; <dbo:abstract>: Stanley Ann Dunham the mother Barack Obama, was an American anthropologist who …; <dbo:birthPlace>: Honolulu Hawaii …; <dbo:child>: Barack Obama; <dbo:wikiPageWikiLink>: United States Family Barack Obama • Entity-based matching compares the annotations with the entity-based representation (D̂): <dbo:birthPlace>: [<Honolulu>, <Hawaii>]; <dbo:child>: <Barack_Obama>; <dbo:wikiPageWikiLink>: [<United_States>, <Family_of_Barack_Obama>, …]
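To make the two matching components concrete, here is a deliberately simplified sketch. It is not the retrieval model from the cited paper: the overlap-based scores, the linear interpolation, and the weight `lam` are assumptions used only to show how term-based and entity-based evidence could be combined.

```python
def term_score(query_terms, term_fields):
    """Fraction of query terms found anywhere in the term-based representation."""
    if not query_terms:
        return 0.0
    bag = {
        word.lower()
        for values in term_fields.values()
        for value in values
        for word in value.split()
    }
    return sum(term.lower() in bag for term in query_terms) / len(query_terms)


def entity_score(annotations, entity_fields):
    """Fraction of linked query entities found in the entity-based representation."""
    if not annotations:
        return 0.0
    linked = {uri for values in entity_fields.values() for uri in values}
    return sum(uri in linked for uri in annotations) / len(annotations)


def combined_score(query_terms, annotations, term_fields, entity_fields, lam=0.5):
    """Interpolate the two components (assumed combination, weight lam)."""
    return (1 - lam) * term_score(query_terms, term_fields) + lam * entity_score(
        annotations, entity_fields
    )
```

For the query on the slide, a candidate such as Ann Dunham matches "barack" and "obama" on the term side and <Barack_Obama> on the entity side, so the annotation adds evidence that plain term matching would treat as two unrelated words.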
  • 39. SUMMARY • Automatically annotating queries with entities can significantly improve retrieval performance • Open research problem: • How should a query be answered (list, fact, table, etc.)?
  • 40. ENTITY SUMMARIES • F. Hasibi, K. Balog, and S. E. Bratsberg. Dynamic Factual Summaries for Entity Cards. SIGIR’17.
  • 41. ENTITY SUMMARIES • Summaries serve a dual purpose • Synopsis of the entity • Provide evidence why the entity is a good answer for the given query • How to generate dynamic entity summaries that can directly address users’ information needs? • Two subtasks • Fact ranking: What should be in the summary? • Summary generation: How should it be presented?
  • 42. EXAMPLE • Query: einstein awards • Static (query-independent) summary: Born: March 14, 1879, Ulm, Germany; Died: April 18, 1955, Princeton, New Jersey, United States; Influenced by: Isaac Newton, Mahatma Gandhi, more; Spouse: Elsa Einstein, Mileva Marić; Children: Eduard Einstein, Lieserl Einstein, Hans A. Einstein • Dynamic (query-dependent) summary: Born: March 14, 1879, Ulm, Germany; Died: April 18, 1955, Princeton, New Jersey, United States; Awards: Barnard Medal, Nobel Prize in Physics, more; Education: Swiss Federal Polytechnic, University of Zurich; Influenced by: Isaac Newton, Mahatma Gandhi, more
  • 43. FACT RANKING • Ranking entity facts according to various "goodness" criteria • Importance: how well it describes the entity • Relevance: how well it supports/explains why the entity is a relevant result for the given query (information need) • Utility: combines importance and relevance • Learning-to-rank approach with specific features designed for capturing importance and relevance
  • 44. SUMMARY GENERATION • A summary is more than a ranked list of facts • Challenges: semantically identical predicates; multi-valued predicates; presentation (human-readable labels, size constraints) • Ranked facts: <dbo:capital> <dbpedia:Oslo>; <dbo:currency> <dbpedia:Norwegian_krone>; <dbo:leader> <dbpedia:Harald_V_of_Norway>; <dbp:establishedDate> 1814-05-17; <dbp:leaderName> <dbpedia:Harald_V_of_Norway>; <foaf:homepage> <http://www.norway.no/>; <dbo:language> <dbpedia:Norwegian_language>; <dbo:language> <dbpedia:Romani_language>; <dbo:language> <dbpedia:Scandoromani_language>; <dbp:website> <http://www.norway.no/>; <dbo:leaderTitle> President of the Storting; <dbp:areaKm> 385178 • vs. summary: Capital: Oslo; Currency: Norwegian krone; Leader: Harald V of Norway; Homepage: http://www.norway.no/; Language: Norwegian, Romani, more
  • 45. SUMMARY GENERATION ALGORITHM (figure: a card made of lines, each line_i consisting of heading_i and value_i, bounded by height τ_h and width τ_w) • 1. Selecting line headings • Recognizing semantically identical predicates • Mapping predicates to human readable labels • 2. Collecting line values • Grouping values for multi-valued predicates • Adhering to size constraints
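The two steps can be illustrated with a small sketch. It assumes that facts arrive already ranked as (predicate, value) pairs with readable values, that a hand-written `labels` mapping stands in for predicate recognition, and that the height and width constraints are simplified to a maximum number of lines and of values per line; none of these details come from the slides.

```python
def generate_summary(ranked_facts, labels, max_lines=5, max_values=2):
    """Build card lines from ranked (predicate, value) facts."""
    lines = {}  # heading -> values; dicts keep insertion order, i.e. rank order
    for predicate, value in ranked_facts:
        # Step 1: semantically identical predicates share one human-readable heading.
        heading = labels.get(predicate, predicate)
        if heading not in lines:
            if len(lines) == max_lines:
                continue  # height constraint reached: no new lines
            lines[heading] = []
        # Step 2: group values of multi-valued predicates, skipping duplicates.
        if value not in lines[heading]:
            lines[heading].append(value)
    summary = []
    for heading, values in lines.items():
        shown = ", ".join(values[:max_values])
        suffix = ", more" if len(values) > max_values else ""
        summary.append(f"{heading}: {shown}{suffix}")
    return summary
```

Run on the Norway facts from the previous slide (with the established date and leader title left out of the ranking), this yields the five card lines shown there, including "Language: Norwegian, Romani, more".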
  • 46. FACT RANKING EVALUATION (two bar charts of NDCG@10 for the Importance and Utility criteria) • All facts (axis from 0 to 0.8): RELIN, DynES/imp, DynES • Facts with URI-only objects (axis from 0 to 0.9): RELIN, LinkSUM, SUMMARUM, DynES/imp, DynES
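The metric on these charts is normalized discounted cumulative gain at rank 10 (NDCG@10). The sketch below shows one common formulation with linear gains and a log2 rank discount; the exact gain function used in the evaluation is not stated on the slide, and the ideal ranking here is computed from the same list of judged gains, which is an assumption.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain over the top-k graded gains."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg(gains, k=10):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0
```

Here `gains` holds the graded judgments of the ranked facts in the order the system returned them.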
  • 47. END-TO-END (SUMMARY) EVALUATION • How do static and dynamic summaries compare against each other? (bar chart, axis from 0 to 100, showing "Dynamic summary wins" and "Static summary wins" under oracle (perfect) fact ranking and under automatic fact ranking; plotted values: 31, 37, 23, 16, 46, 47)
  • 48. SUMMARY • Addressed the problem of generating dynamic (query-dependent) entity summaries • Open research problems • What should be on the entity card? • Other forms of result presentation (tables, lists, graphs, etc.)
  • 49. ANTICIPATING INFORMATION NEEDS • J. Benetka, K. Balog, and K. Nørvåg. Anticipating Information Needs Based on Check-in Activity. WSDM’17.
  • 50. ZERO-QUERY SEARCH • Proactive instead of reactive search • "Anticipate user needs and respond with information appropriate to the current context without the user having to enter a query" (Allan et al., SIGIR Forum 2012) • Using a person's check-in activity as context, can we anticipate her information needs, and respond with a set of information cards that directly address those needs? (example cards: Terminal, Weather 21ºC, Traffic)
  • 51. INFORMATION NEEDS FOR ACTIVITIES • What are relevant information needs in the context of a given activity? • Use POI categories (Foursquare) to represent activities • Mine information needs from search suggestions
  • 52. ANTICIPATING INFORMATION NEEDS • Maximize the likelihood of satisfying the user's information needs by considering each possible activity that might follow next • Transition probabilities are estimated based on historical check-in data (figure: from Activity A, transitions to Activities B, C, and D are labeled 45%, 34%, and 21%; the next activity is unknown)
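A minimal sketch of this idea follows: transition probabilities are estimated by counting consecutive check-ins, and each information need is scored by its expected relevance over the possible next activities. The function names, the first-order (last activity only) assumption, and the `need_relevance` table are illustrative assumptions rather than the paper's exact model.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next activity | current activity) from check-in sequences."""
    counts = defaultdict(Counter)
    for sequence in sequences:
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    return {
        current: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for current, nexts in counts.items()
    }


def anticipate_needs(last_activity, transitions, need_relevance, k=3):
    """Score needs by expected relevance over all possible next activities."""
    scores = Counter()
    for activity, p_next in transitions.get(last_activity, {}).items():
        for need, relevance in need_relevance.get(activity, {}).items():
            scores[need] += p_next * relevance
    return [need for need, _ in scores.most_common(k)]
```

The top-k needs would then be turned into information cards. Weighting by the transition probability is what lets a likely next activity dominate without ignoring the less likely ones.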
  • 53. EVALUATION METHODOLOGY (figure: the check-in dataset of User 1, User 2, and User 3 is split into train and test portions, with an 80% mark; example cards: Terminal, Weather 21ºC, Traffic)
  • 54. RESULTS (bar chart: NDCG@5, axis from 0 to 0.9, at top level and second level) • M0: Most frequent information needs, regardless of the last activity • M1: Consider information needs for all possible upcoming activities • M2: In addition, consider the information needs relevant to the past activity (fixed weight for all info needs) • M3: Consider the temporal sensitivity of each information need individually
  • 55. SUMMARY • Identifying information needs that are relevant in the context of a given activity and proactively presenting information cards addressing those needs • Open research problems • Other contexts • (Access to data, privacy...)
  • 56. THE FUTURE (Part 3) • Making the right information available to the right person at the right time.
  • 57. IMAGINARY SCENARIO WITH AN INTELLIGENT PERSONAL ASSISTANT
  • 58. I see you're wasting time away on Facebook. Do you have time now to talk about your holiday plans? Sure. I want an active holiday with the family in beautiful nature. It sounds like you would definitely love Norway. A cabin in the mountains maybe? Could be. But I want to go kayaking and also catch some fish. And not too much rain, please. And something fun for the kids nearby, I suppose? Of course. How does Oltedal sound? People have been quite successful with catching lake trout based on what I found on Instagram. There is also a theme park and horse riding, both within 50 km.
  • 59. And what about the weather? You know we’re talking about Norway, right…? Anyway, based on statistics from the past 30 years, this is one of the areas with the least amount of rain if you go in August. I see. What about accommodation? Here is a list of places that I think you might like. Any opinions on this one? According to the reviews that I can find on the web, the cabins are well equipped, the staff is nice and they even allow guests to borrow their kayaks.
  • 60. OK. Let’s find a date that works for everyone. According to your wife's calendar, her parents will be visiting you in the first week of August. School starts for the kids on the week of Aug 22. So there is a two week window between Aug 8 and 21, assuming that I can cancel the regular weekly meetings with your PhD students. That's fine. The students won't mind. Write them an email to upload their holiday plans to the group wiki, and add summer planning to the next group meeting's agenda. (email draft, To: XXX, YYY, ZZZ) Guys, What are your plans for the summer? Please upload your away times to the group wiki. -Kr (notification) Agenda item "Summer planning" added
  • 61. In the meantime, I called the cabin to check availability. Their online booking system is down at the moment. They still have some cabins available. Do you want to see them? No, I had enough of this for today. Mail the pictures to my wife with some kind words. (email draft, To: Wife) Darling, You will love the place I found for us for a vacation in August. It is by the water; at night we will hear the waves. We will be able to take our morning breakfasts on the balcony, which ... Anything else I can do for you? Order a water filter for my espresso machine. I just found out that it'll need to be replaced soon.
  • 63. UNDERSTANDING INFORMATION NEEDS • Natural language conversational interface • Anticipating information needs • Proactive recommendations • Examples from the scenario: "It sounds like you would definitely love Norway. A cabin in the mountains maybe?"; "And something fun for the kids nearby, I suppose?"; "I see you're wasting time away on Facebook. Do you have time now to talk about your holiday plans?"
  • 64. DATA • Long-tail entities • On-the-fly information extraction • "Personal" knowledge base • "Wife", "My students", "my group", "my espresso machine", ... entities I care about • Examples from the scenario: "Here is a list of places that I think you might like."; "According to the reviews that I can find on the web, ..."; "Order a water filter for my espresso machine. I just found out that it'll need to be replaced soon." (Breville BES860XL Barista Express Espresso Machine)
  • 65. RESULT PRESENTATION & USER INTERACTION • Providing evidence • "Actionable" entities • Make booking, order item, write email, ... • Helping the user to get things done • Support for task completion • Examples from the scenario: "... based on statistics from the past 30 years, ..."; "According to your wife's calendar, ..."; "Write them an email to upload their holiday plans to the group wiki, and add summer planning to the next group meeting's agenda."; Agenda item "Summer planning" added
  • 66. SUMMARY • Understanding information needs: semantic annotations; anticipating info needs; natural language conversational interfaces • Data source(s): long tail entities; personal knowledge base; on-the-fly information extraction • Retrieval method: hybrid approaches • Result presentation & user interaction: entity cards; actionable entities; support for task completion
  • 67. ACKNOWLEDGMENTS • Joint work with • Faegheh Hasibi • Jan Benetka • Darío Garigliotti • Kjetil Nørvåg • Svein Erik Bratsberg