Entity Search: The Last Decade and the Next


Keynote talk given at the 10th Russian Summer School in Information Retrieval (RuSSIR ’16), Saratov, Russia, August 2016.

Note: part of the work is still under review; those slides are not yet included.


  1. Entity Search: The Last Decade and the Next
     Krisztian Balog, University of Stavanger, @krisztianbalog
     10th Russian Summer School in Information Retrieval (RuSSIR 2016) | Saratov, Russia, 2016
  2. WHAT IS AN ENTITY?
     • An entity is an "object" or "thing" in the real world that can be distinctly identified and is characterized by the following properties:
       • unique identifier(s)
       • name(s)
       • type(s)
       • attributes (or description)
       • (typed) relationships to other entities
     [Figure labels: people, products, organizations, locations]
  3. OUTLINE
     Part 1: Past (-10y); Part 2: Present (now); Part 3: Future (+10y)
  4. THE PAST (Part 1)
     The core problem of entity ranking and its investigation at various benchmarking evaluation campaigns
  5. EVALUATION CYCLE
     [Cycle diagram: IDEA; 01. Task definition; 02. Experimental design; 03. Method development; 04. Experimental evaluation; 05. Reporting; REVISION]
  6. ENTITY RANKING TASK
     [Diagram: search query, retrieval method, search results]
  7. EVALUATION CAMPAIGNS
     [Timeline, 2005 to 2015: TREC Enterprise, TREC Entity, INEX Entity Ranking, SemSearch, INEX, Question Answering over Linked Data]
  8. EVALUATION CAMPAIGNS (timeline as on slide 7)
     Task: expert finding
     Input: keyword query
     Data collection: enterprise intranet
     Entity ID: email address
     Example queries: ontology engineering; climate change
  9. TREC ENTERPRISE EXPERT FINDING
     • How to rank entities that have no direct representations?
     • Idea: Look at co-occurrences of entities and query terms in documents
     [Figure: documents in which query terms and entity mentions are highlighted]
  10. PROFILE-BASED METHODS
     • Build a direct term-based entity representation based on associated language usage
       • "You shall know a word by the company it keeps." [Firth, 1957]
     • Use document retrieval techniques for ranking entity profile documents
     [Figure: query q matched against profile documents, one per entity e]
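The profile-based idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not something taken from the slides: the toy documents, the `mentions` mapping from documents to entities, and the choice of Dirichlet-smoothed query likelihood as the document retrieval technique.

```python
from collections import Counter
import math

def build_profiles(docs, mentions):
    """Build one term-count profile per entity from the documents that mention it."""
    profiles = {}
    for doc_id, text in docs.items():
        for entity in mentions.get(doc_id, []):
            profiles.setdefault(entity, Counter()).update(text.lower().split())
    return profiles

def rank_profiles(query, profiles, mu=10.0):
    """Rank entity profiles by Dirichlet-smoothed query log-likelihood."""
    collection = Counter()
    for profile in profiles.values():
        collection.update(profile)
    total = sum(collection.values())
    scores = {}
    for entity, profile in profiles.items():
        length = sum(profile.values())
        score = 0.0
        for term in query.lower().split():
            # Add-one smoothing of the collection model keeps the log defined.
            p_coll = (collection[term] + 1) / (total + len(collection))
            score += math.log((profile[term] + mu * p_coll) / (length + mu))
        scores[entity] = score
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: three documents and the entities mentioned in each.
docs = {
    "d1": "ontology engineering workshop notes",
    "d2": "climate change report",
    "d3": "ontology design patterns",
}
mentions = {"d1": ["alice"], "d2": ["bob"], "d3": ["alice", "bob"]}
ranking = rank_profiles("ontology engineering", build_profiles(docs, mentions))
```

Here `alice` ranks above `bob` because her profile contains both query terms. Any other document ranking function could be swapped in for the scoring step.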
  11. DOCUMENT-BASED METHODS
     • First rank documents (or document snippets)
     • Then aggregate evidence for the associated entities
     [Figure: query q matched against documents; evidence from the ranked documents is aggregated for each entity e]
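A minimal Python sketch of the two steps, under stated assumptions: the term-overlap scorer, the summation used for aggregation, and the toy data are all hypothetical and chosen only to keep the example short.

```python
from collections import defaultdict

def score_document(query, text):
    """Toy relevance score: number of query-term occurrences in the document."""
    terms = text.lower().split()
    return sum(terms.count(t) for t in query.lower().split())

def rank_entities(query, docs, mentions, top_k=10):
    """Rank documents first, then sum document scores per associated entity."""
    doc_scores = {d: score_document(query, text) for d, text in docs.items()}
    top_docs = sorted(doc_scores, key=doc_scores.get, reverse=True)[:top_k]
    entity_scores = defaultdict(float)
    for d in top_docs:
        for entity in mentions.get(d, []):
            entity_scores[entity] += doc_scores[d]
    return sorted(entity_scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: three documents and the entities mentioned in each.
docs = {
    "d1": "ontology engineering workshop notes",
    "d2": "climate change report",
    "d3": "ontology design patterns",
}
mentions = {"d1": ["alice"], "d2": ["bob"], "d3": ["alice", "bob"]}
result = rank_entities("ontology engineering", docs, mentions)
```

Unlike the profile-based approach, no entity representation is built in advance; entities inherit evidence from whichever documents score well for the query.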
  12. EVALUATION CAMPAIGNS (timeline as on slide 7)
     Task: entity ranking in Wikipedia
     Input: keyword++ query (target types/examples)
     Data collection: Wikipedia
     Entity ID: Wikipedia article ID
     Example query: Movies with eight or more Academy Awards +category: best picture oscar +category: british films +category: american films
  13. INEX ENTITY RANKING
     Example query: Movies with eight or more Academy Awards +category: best picture oscar +category: british films +category: american films
     [Figure: term-based representation and category-based representation]
  14. EVALUATION CAMPAIGNS (timeline as on slide 7)
     Task: related entity finding
     Input: keyword++ query (input entity, target type)
     Data collection: Web
     Entity ID: entity homepage URL
     Example query: airlines that currently use Boeing-747 planes +entity: Boeing-747 (clueweb09-..292) +target type: organization
  15. EVALUATION CAMPAIGNS (timeline as on slide 7)
     Task: entity search in the Web of Data
     Input: keyword query
     Data collection: RDF triples
     Entity ID: URI
     Example queries: nokia e73; boroughs of New York City; disney orlando
  16. FIELDED DOCUMENT REPRESENTATION FROM RDF TRIPLES
     [Legend: subject, predicate, object; subject, predicate, literal]
     Entity: dbpedia:Audi_A4
       foaf:name: Audi A4
       rdfs:label: Audi A4
       rdfs:comment: The Audi A4 is a compact executive car produced since late 1994 by the German car manufacturer Audi, a subsidiary of the Volkswagen Group. The A4 has been built [...]
       dbpprop:production: 1994 2001 2005 2008
       rdf:type: dbpedia-owl:MeanOfTransportation, dbpedia-owl:Automobile
       dbpedia-owl:manufacturer: dbpedia:Audi
       dbpedia-owl:class: dbpedia:Compact_executive_car
       owl:sameAs: freebase:Audi A4
       is dbpedia-owl:predecessor of: dbpedia:Audi_A5
       is dbpprop:similar of: dbpedia:Cadillac_BLS
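One way to read this slide is as a grouping of triples by predicate. The sketch below is a simplified assumption rather than the method behind the slide: the function name `fielded_document` is hypothetical, and only a handful of the triples shown above are used. Incoming triples, where the entity is the object, become "is ... of" fields.

```python
from collections import defaultdict

def fielded_document(entity, triples):
    """Group the triples about `entity` into fields named after their predicates."""
    fields = defaultdict(list)
    for s, p, o in triples:
        if s == entity:
            fields[p].append(o)              # outgoing: entity is the subject
        elif o == entity:
            fields[f"is {p} of"].append(s)   # incoming: entity is the object
    return dict(fields)

triples = [
    ("dbpedia:Audi_A4", "foaf:name", "Audi A4"),
    ("dbpedia:Audi_A4", "rdf:type", "dbpedia-owl:MeanOfTransportation"),
    ("dbpedia:Audi_A4", "rdf:type", "dbpedia-owl:Automobile"),
    ("dbpedia:Audi_A4", "dbpedia-owl:manufacturer", "dbpedia:Audi"),
    ("dbpedia:Audi_A5", "dbpedia-owl:predecessor", "dbpedia:Audi_A4"),
]
doc = fielded_document("dbpedia:Audi_A4", triples)
```

Each field can then be indexed separately, so that a fielded retrieval model can weight, for example, names differently from types.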
  17. EVALUATION CAMPAIGNS (timeline as on slide 7)
     Task: question answering over RDF data
     Input: natural language query
     Data collection: RDF triples
     Entity ID: URI
     Example queries: Which German cities have more than 250000 inhabitants? Who is the youngest Pulitzer Prize winner?
  18. EVALUATION CAMPAIGNS (timeline as on slide 7)
     Task: ad-hoc entity retrieval
     Input: keyword query
     Data collection: Wikipedia + RDF triples
     Entity ID: Wikipedia article ID
     Example query: NASA country German
  19. EVALUATION CAMPAIGNS (timeline as on slide 7)
  20. DATA EVOLUTION
     [Timeline as on slide 7, with each campaign's data marked as unstructured, semistructured, or structured]
     • Clear trend moving towards structured data
     • No meaningful/successful attempt at combining unstructured and structured data
  21. QUERY EVOLUTION
     [Timeline as on slide 7, with each campaign's queries marked as keyword, keyword++, or natural language]
     • Keyword queries are still the most common way to search
     • From providing explicit semantic annotations to natural language questions
  22. WHAT HAVE WE BEEN DOING?
     • Core focus has been on retrieval models, and more specifically on entity representations
       • In terms of associated language usage, description, types, attributes
     • Richer query representations (i.e., query annotations) were taken for granted
  23. image source: https://www.pinterest.com/pin/382946774535857111/
  24. THE BIGGER PICTURE
     [Diagram: Understanding information needs; Data source(s); Retrieval method; Result presentation & user interaction]
  25. THE PRESENT (Part 2)
     Current research themes on various aspects of entity search.
  26. DATA
     J. Benetka, K. Balog, and K. Nørvåg. Towards Building a Knowledge Base of Monetary Transactions from a News Collection. JCDL’17.
  27. KNOWLEDGE BASES
     • Modern entity-oriented search features are fueled by knowledge bases, which need continuous updating
     • Critical to be able to verify the validity of data
       • Supply provenance information for each statement
       • Validity check (still) needs to be performed by a human
     • Can we help human editors to maintain and expand knowledge bases?
  28. BUILDING A KNOWLEDGE BASE OF MONETARY TRANSACTIONS
     [Screenshot of the event search interface. Subject entity: Oracle. Predicate filter (financial event): acquisition. Object entities (counterparts): Hyperion Solutions, Siebel Systems, Retek, PeopleSoft. Extracted information per event: value (USD), year, confidence, source (NYT), snippet, link.]
     Example source text: "A Boom in Merger Activity. In December 2004, after a battle for control that grew nasty, Oracle finally acquired PeopleSoft for about $10.3 billion, becoming the second-largest maker of business-management software."
  29. APPROACH
     • Event representation: generate all possible event interpretations (quintuples)
     • Semantic annotation of sentences: monetary value recognition, economic event recognition, entity recognition, date extraction, semantic role labeling
     • Clustering events: grouping sentences that discuss the same economic event
     • Supervised learning: assigning a confidence score to each interpretation
     [Diagram: sentences s#1 to s#5 are annotated, grouped into events (e#1: [C] <rel> [D]; e#2: [A] <rel> [B]), and assigned confidence scores]
  30. RESULTS
     [Bar chart: F1 (axis 0 to 0.4) for Events, Attributes (strict), and Attributes (relaxed); methods compared: First reporting, Last reporting, Most frequent, Supervised learning]
  31. SUMMARY
     • Building a domain-specific knowledge base
       • NLP pipeline for information extraction
       • ML for establishing confidence for human processing
     • Open research problems
       • Long-tail entities
       • Entities "not worthy" of a Wikipedia page
       • What are the attributes that matter?
  32. UNDERSTANDING INFORMATION NEEDS
     F. Hasibi, K. Balog, and S. E. Bratsberg. Exploiting Entity Linking in Queries for Entity Retrieval. ICTIR’16.
  33. ANNOTATING QUERIES WITH ENTITIES
     • Semantic annotations of queries were taken for granted so far
     • How can automatic entity annotations of queries be leveraged to improve entity retrieval?
     Example query: barack obama parents
  34. APPROACH
     Query terms: barack obama parents
     Annotations (automatic entity annotation, obtained by entity linking): <Barack_Obama>
     Query terms are matched against the term-based representation (D) of each entity (term-based matching); annotations are matched against its entity-based representation (D̂) (entity-based matching).
     Term-based representation of the example entity:
       <rdfs:label>: Ann Dunham
       <dbo:abstract>: Stanley Ann Dunham the mother Barack Obama, was an American anthropologist who …
       <dbo:birthPlace>: Honolulu Hawaii …
       <dbo:child>: Barack Obama
       <dbo:wikiPageWikiLink>: United States Family Barack Obama
     Entity-based representation of the example entity:
       <dbo:birthPlace>: [<Honolulu>, <Hawaii>]
       <dbo:child>: <Barack_Obama>
       <dbo:wikiPageWikiLink>: [<United_States>, <Family_of_Barack_Obama>, …]
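As a rough illustration of combining the two kinds of matching, the Python sketch below interpolates a term-based score with an entity-based score. It is a deliberately simplified assumption and not the model from the cited paper: the overlap-based scoring functions, the interpolation weight `lam`, and the function names are all hypothetical.

```python
def term_score(query_terms, term_fields):
    """Fraction of query terms that occur in the term-based representation."""
    tokens = " ".join(term_fields.values()).lower().split()
    return sum(t in tokens for t in query_terms) / len(query_terms)

def entity_score(linked_entities, entity_fields):
    """Fraction of linked query entities that occur in the entity-based representation."""
    if not linked_entities:
        return 0.0
    uris = {uri for values in entity_fields.values() for uri in values}
    return sum(e in uris for e in linked_entities) / len(linked_entities)

def combined_score(query_terms, linked_entities, term_fields, entity_fields, lam=0.5):
    """Linear interpolation of term-based and entity-based matching (assumed form)."""
    return ((1 - lam) * term_score(query_terms, term_fields)
            + lam * entity_score(linked_entities, entity_fields))

# Representations of the example entity, abridged from the slide.
term_fields = {
    "rdfs:label": "Ann Dunham",
    "dbo:child": "Barack Obama",
}
entity_fields = {
    "dbo:birthPlace": ["<Honolulu>", "<Hawaii>"],
    "dbo:child": ["<Barack_Obama>"],
}
score = combined_score(["barack", "obama", "parents"], ["<Barack_Obama>"],
                       term_fields, entity_fields)
```

In this toy setting the entity-based component rewards the example entity for being linked to `<Barack_Obama>`, even though the query term "parents" never appears in its text.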
  35. RESULTS
     [Bar chart: MAP (axis 0.00 to 0.22) for LM, MLM-tc, MLM-all, PRMS, SDM, FSDM; baseline vs. +ELR]
  36. ANALYSIS
  37. SUMMARY
     • Automatically annotating queries with entities can significantly improve retrieval performance
     • Open research problem:
       • How should a query be answered (list, fact, table, etc.)?
  38. ENTITY SUMMARIES
     F. Hasibi, K. Balog, and S. E. Bratsberg. Dynamic Factual Summaries for Entity Cards. SIGIR’17.
  39. ENTITY SUMMARIES
     • Summaries serve a dual purpose
       • Synopsis of the entity
       • Provide evidence why the entity is a good answer for the given query
     • How to generate dynamic entity summaries that can directly address users’ information needs?
     • Two subtasks
       • Fact ranking: What should be in the summary?
       • Summary generation: How should it be presented?
  40. EXAMPLE
     Query: einstein awards
     Static (query-independent) summary:
       Born: March 14, 1879, Ulm, Germany
       Died: April 18, 1955, Princeton, New Jersey, United States
       Influenced by: Isaac Newton, Mahatma Gandhi, more
       Spouse: Elsa Einstein, Mileva Marić
       Children: Eduard Einstein, Lieserl Einstein, Hans A. Einstein
     Dynamic (query-dependent) summary:
       Born: March 14, 1879, Ulm, Germany
       Died: April 18, 1955, Princeton, New Jersey, United States
       Awards: Barnard Medal, Nobel Prize in Physics, more
       Education: Swiss Federal Polytechnic, University of Zurich
       Influenced by: Isaac Newton, Mahatma Gandhi, more
  41. FACT RANKING
     • Ranking entity facts according to various "goodness" criteria
       • Importance: how well it describes the entity
       • Relevance: how well it supports/explains why the entity is a relevant result for the given query (information need)
       • Utility: combines importance and relevance
     • Learning-to-rank approach with specific features designed for capturing importance and relevance
  42. SUMMARY GENERATION
     • A summary is more than a ranked list of facts
     [Callouts: semantically identical predicates; multi-valued predicates; presentation (human-readable labels, size constraints)]
     Ranked facts:
       <dbo:capital> <dbpedia:Oslo>
       <dbo:currency> <dbpedia:Norwegian_krone>
       <dbo:leader> <dbpedia:Harald_V_of_Norway>
       <dbp:establishedDate> 1814-05-17
       <dbp:leaderName> <dbpedia:Harald_V_of_Norway>
       <foaf:homepage> <http://www.norway.no/>
       <dbo:language> <dbpedia:Norwegian_language>
       <dbo:language> <dbpedia:Romani_language>
       <dbo:language> <dbpedia:Scandoromani_language>
       <dbp:website> <http://www.norway.no/>
       <dbo:leaderTitle> President of the Storting
       <dbp:areaKm> 385178
     vs. summary:
       Capital: Oslo
       Currency: Norwegian krone
       Leader: Harald V of Norway
       Homepage: http://www.norway.no/
       Language: Norwegian, Romani, more
  43. SUMMARY GENERATION ALGORITHM
     [Diagram: an entity card of height τ_h and width τ_w; line_i consists of heading_i and value_i]
     1. Selecting line headings
       • Recognizing semantically identical predicates
       • Mapping predicates to human-readable labels
     2. Collecting line values
       • Grouping values for multi-valued predicates
       • Adhering to size constraints
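The two steps can be sketched in Python as follows. This is a hedged illustration, not the algorithm from the cited paper: the `labels` mapping, the `max_lines` and `max_values` limits (standing in for the height and width constraints), and the use of plain strings instead of URIs for values are all assumptions.

```python
def generate_summary(ranked_facts, labels, max_lines=5, max_values=2):
    """Turn ranked (predicate, value) facts into at most `max_lines` card lines."""
    lines = {}  # heading -> values, in order of the best-ranked fact per heading
    for predicate, value in ranked_facts:
        # Step 1: predicates that share a label are treated as semantically identical.
        heading = labels.get(predicate)
        if heading is None:
            continue  # no human-readable label available
        if heading not in lines:
            if len(lines) == max_lines:
                continue  # height constraint reached
            lines[heading] = []
        # Step 2: group values of multi-valued predicates, skipping duplicates.
        if value not in lines[heading]:
            lines[heading].append(value)
    summary = []
    for heading, values in lines.items():
        shown = ", ".join(values[:max_values])  # width constraint
        if len(values) > max_values:
            shown += ", more"
        summary.append(f"{heading}: {shown}")
    return summary

# Hypothetical label mapping and facts, modeled on the Norway example.
labels = {
    "dbo:capital": "Capital", "dbo:currency": "Currency",
    "dbo:leader": "Leader", "dbp:leaderName": "Leader",
    "foaf:homepage": "Homepage", "dbp:website": "Homepage",
    "dbo:language": "Language",
}
ranked_facts = [
    ("dbo:capital", "Oslo"),
    ("dbo:currency", "Norwegian krone"),
    ("dbo:leader", "Harald V of Norway"),
    ("dbp:leaderName", "Harald V of Norway"),
    ("foaf:homepage", "http://www.norway.no/"),
    ("dbo:language", "Norwegian"),
    ("dbo:language", "Romani"),
    ("dbo:language", "Scandoromani"),
    ("dbp:website", "http://www.norway.no/"),
]
card = generate_summary(ranked_facts, labels)
```

With these inputs, `card` reproduces the five lines of the example summary, ending in "Language: Norwegian, Romani, more".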
  44. FACT RANKING EVALUATION
     [Bar charts: NDCG@10 with respect to Importance and Utility. Panel "All facts": RELIN, DynES/imp, DynES. Panel "Facts with URI-only objects": RELIN, LinkSUM, SUMMARUM, DynES/imp, DynES.]
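For readers unfamiliar with the metric, normalized discounted cumulative gain at rank k can be computed as in the Python sketch below. It assumes linear gains and assumes that the ranked list contains every judged fact, so that the ideal ordering can be derived from the same list; the evaluation reported on the slide may differ in both respects.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: gain at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranked_gains, k=10):
    """DCG of the top-k ranking, normalized by the DCG of the ideal ordering."""
    ideal = sorted(ranked_gains, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    return dcg(ranked_gains[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0
```

For example, `ndcg_at_k([3, 2, 1])` is 1.0 because the facts are already in ideal order, while swapping the first two gains lowers the score.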
  45. END-TO-END (SUMMARY) EVALUATION
     • How do static and dynamic summaries compare against each other?
     [Bar chart, scale 0 to 100: "Dynamic summary wins" vs. "Static summary wins", under oracle (perfect) fact ranking and under automatic fact ranking; data labels: 31, 37, 23, 16, 46, 47]
  46. SUMMARY
     • Addressed the problem of generating dynamic (query-dependent) entity summaries
     • Open research problems
       • What should be on the entity card?
       • Other forms of result presentation (tables, lists, graphs, etc.)
  47. ANTICIPATING INFORMATION NEEDS
     J. Benetka, K. Balog, and K. Nørvåg. Anticipating Information Needs Based on Check-in Activity. WSDM’17.
  48. ZERO-QUERY SEARCH
     • Proactive instead of reactive search
       • "Anticipate user needs and respond with information appropriate to the current context without the user having to enter a query" (Allan et al., SIGIR Forum 2012)
     • Using a person's check-in activity as context, can we anticipate her information needs, and respond with a set of information cards that directly address those needs?
     [Example information cards: Terminal; Weather 21ºC; Traffic]
  49. INFORMATION NEEDS FOR ACTIVITIES
     • What are relevant information needs in the context of a given activity?
       • Use POI categories (Foursquare) to represent activities
       • Mine information needs from search suggestions
  50. ANTICIPATING INFORMATION NEEDS
     • Maximize the likelihood of satisfying the user's information needs by considering each possible activity that might follow next
       • Transition probabilities are estimated based on historical check-in data
     [Diagram: from Activity A, transitions to Activity B, Activity C, and Activity D with probabilities 45%, 34%, and 21%; the next activity is unknown (?)]
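The idea on this slide can be sketched in Python as an expectation over possible next activities. The sketch rests on assumptions that are not in the slides: transition probabilities are simple relative frequencies of consecutive check-ins, `need_relevance` is a hypothetical table of how relevant each information need is to each activity, and the activity names and numbers are invented for illustration.

```python
from collections import Counter, defaultdict

def transition_probabilities(checkin_sequences):
    """Estimate P(next activity | current activity) from consecutive check-ins."""
    counts = defaultdict(Counter)
    for seq in checkin_sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {a: {b: c / sum(nxts.values()) for b, c in nxts.items()}
            for a, nxts in counts.items()}

def anticipate(last_activity, transitions, need_relevance, top_n=3):
    """Score each need by its expected relevance over the possible next activities."""
    scores = Counter()
    for activity, p in transitions.get(last_activity, {}).items():
        for need, rel in need_relevance.get(activity, {}).items():
            scores[need] += p * rel
    return [need for need, _ in scores.most_common(top_n)]

# Hypothetical check-in histories and need relevance values.
history = [
    ["airport", "hotel"],
    ["airport", "hotel"],
    ["airport", "hotel"],
    ["airport", "restaurant"],
]
need_relevance = {
    "hotel": {"check-in time": 0.9, "wifi": 0.5},
    "restaurant": {"menu": 0.8, "wifi": 0.2},
}
transitions = transition_probabilities(history)
cards = anticipate("airport", transitions, need_relevance)
```

After an airport check-in, the sketch ranks "check-in time" first because a hotel is the most likely next activity, and "wifi" second because it is somewhat relevant to both candidates.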
  51. EVALUATION METHODOLOGY
     [Diagram: the check-in dataset (User 1, User 2, User 3) is divided into train and test portions (80%); example information cards: Terminal; Weather 21ºC; Traffic]
  52. RESULTS
     [Bar chart: NDCG@5 (axis 0.00 to 0.90) at the top level and second level for methods M0 to M3]
     • M0: Most frequent information needs, regardless of the last activity
     • M1: Consider information needs for all possible upcoming activities
     • M2: In addition, consider the information needs relevant to the past activity (fixed weight for all info needs)
     • M3: Consider the temporal sensitivity of each information need individually
  53. SUMMARY
     • Identifying information needs that are relevant in the context of a given activity and proactively presenting information cards addressing those needs
     • Open research problems
       • Other contexts
       • (Access to data, privacy...)
  54. THE FUTURE (Part 3)
     Making the right information available to the right person at the right time.
  55. IMAGINARY SCENARIO WITH AN INTELLIGENT PERSONAL ASSISTANT
  56. "I see you're wasting time away on Facebook. Do you have time now to talk about your holiday plans?"
     "Sure. I want an active holiday with the family in beautiful nature."
     "It sounds like you would definitely love Norway. A cabin in the mountains maybe?"
     "Could be. But I want to go kayaking and also catch some fish. And not too much rain, please."
     "And something fun for the kids nearby, I suppose?"
     "Of course."
     "How does Oltedal sound? People have been quite successful with catching lake trout based on what I found on Instagram. There is also a theme park and horse riding, both within 50 km."
  57. "And what about the weather?"
     "You know we’re talking about Norway, right…? Anyway, based on statistics from the past 30 years, this is one of the areas with the least amount of rain if you go in August."
     "I see. What about accommodation?"
     "Here is a list of places that I think you might like."
     "Any opinions on this one?"
     "According to the reviews that I can find on the web, the cabins are well equipped, the staff is nice and they even allow guests to borrow their kayaks."
  58. "OK. Let’s find a date that works for everyone."
     "According to your wife's calendar, her parents will be visiting you in the first week of August. School starts for the kids on the week of Aug 22. So there is a two-week window between Aug 8 and 21, assuming that I can cancel the regular weekly meetings with your PhD students."
     "That's fine. The students won't mind. Write them an email to upload their holiday plans to the group wiki, and add summer planning to the next group meeting's agenda."
     [Drafted email. To: XXX, YYY, ZZZ. "Guys, What are your plans for the summer? Please upload your away times to the group wiki. -Kr" Button: Send]
     [Notification: Agenda item "Summer planning" added]
  59. "In the meantime, I called the cabin to check availability. Their online booking system is down at the moment. They still have some cabins available. Do you want to see them?"
     "No, I had enough of this for today. Mail the pictures to my wife with some kind words."
     "Anything else I can do for you?"
     "Order a water filter for my espresso machine. I just found out that it'll need to be replaced soon."
     [Drafted email. To: Wife. "Darling, You will love the place I found for us for a vacation in August. It is by the water; at night we will hear the waves. We will be able to take our morning breakfasts on the balcony, which ..." Button: Send]
  60. FUTURE RESEARCH THEMES
  61. UNDERSTANDING INFORMATION NEEDS
     • Natural language conversational interface
     • Anticipating information needs
     • Proactive recommendations
     Examples from the scenario:
       "It sounds like you would definitely love Norway. A cabin in the mountains maybe?"
       "And something fun for the kids nearby, I suppose?"
       "I see you're wasting time away on Facebook. Do you have time now to talk about your holiday plans?"
  62. DATA
     • Long-tail entities
     • On-the-fly information extraction
     • "Personal" knowledge base
       • "Wife", "My students", "my group", "my espresso machine", ... entities I care about
     Examples from the scenario:
       "Here is a list of places that I think you might like."
       "According to the reviews that I can find on the web, ..."
       "Order a water filter for my espresso machine. I just found out that it'll need to be replaced soon."
       [Product: Breville BES860XL Barista Express Espresso Machine]
  63. RESULT PRESENTATION & USER INTERACTION
     • Providing evidence
     • "Actionable" entities
       • Make booking, order item, write email, ...
     • Helping the user to get things done
       • Support for task completion
     Examples from the scenario:
       "... based on statistics from the past 30 years, ..."
       "According to your wife's calendar, ..."
       [Notification: Agenda item "Summer planning" added]
       "Write them an email to upload their holiday plans to the group wiki, and add summer planning to the next group meeting's agenda."
  64. SUMMARY
     Understanding information needs
       • Semantic annotations
       • Anticipating info needs
       • Natural language conversational interfaces
     Data source(s)
       • Long-tail entities
       • Personal knowledge base
       • On-the-fly information extraction
     Retrieval method
       • Hybrid approaches
     Result presentation & user interaction
       • Entity cards
       • Actionable entities
       • Support for task completion
  65. ACKNOWLEDGMENTS
     • Joint work with
       • Faegheh Hasibi
       • Jan Benetka
       • Darío Garigliotti
       • Kjetil Nørvåg
       • Svein Erik Bratsberg
  66. QUESTIONS?
     @krisztianbalog
     krisztianbalog.com
