
Semantic Search and Result Presentation with Entity Cards

Slides from the Search Engines Amsterdam (SEA) talk:
https://www.meetup.com/SEA-Search-Engines-Amsterdam/events/239659928/

Modern search engines have moved from the traditional “ten blue links” environment towards understanding searchers' intent and providing them with focused responses; a paradigm referred to as “semantic search”. Semantic search is an umbrella term that encompasses various techniques, including, but not limited to, query understanding, entity retrieval, and result presentation. In this talk, I will give a brief overview of each of these tasks, and then focus on the result presentation aspect. Specifically, I will present methods for generating content for “entity cards”, the informational panels presented in the right column of search engine results pages. I will end the talk by introducing a practical toolkit and a dataset that are meant to foster research in this area.

Semantic Search and Result Presentation with Entity Cards

  1. 1. SEMANTIC SEARCH AND RESULT PRESENTATION WITH ENTITY CARDS FAEGHEH HASIBI | SEARCH ENGINES AMSTERDAM | JUNE 30, 2017
  2. 2. SEMANTIC SEARCH AND RESULT PRESENTATION WITH ENTITY CARDS FAEGHEH HASIBI | SEARCH ENGINES AMSTERDAM | JUNE 30, 2017
  3. 3. SEMANTIC SEARCH
  4. 4. SEMANTIC SEARCH “Search with Meaning”
  5. 5. KNOWLEDGE BASE (Core data-enabling component of semantic search)
  6. 6. Albert Einstein ENTITY <dbr:Albert_Einstein, foaf:name, Albert Einstein> <dbr:Albert_Einstein, dbo:birthDate, 1879-03-14> <dbr:Albert_Einstein, dbo:birthPlace, dbr:Ulm> <dbr:Albert_Einstein, dbo:birthPlace, dbr:German_Empire> <dbr:Albert_Einstein, dbp:description, dbr:Physicist> …
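As a concrete, hedged illustration of how such entity facts can be pulled from a knowledge base: the sketch below queries the public DBpedia SPARQL endpoint for a handful of dbr:Albert_Einstein triples using the SPARQLWrapper package. The endpoint and the package are choices made for this example only; the talk does not prescribe them.

```python
# Minimal sketch: fetch a few (predicate, object) facts for an entity from DBpedia.
# Assumes the public DBpedia SPARQL endpoint and the SPARQLWrapper package.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?p ?o WHERE {
        <http://dbpedia.org/resource/Albert_Einstein> ?p ?o .
    }
    LIMIT 20
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    # Each row is one fact of the entity, e.g. dbo:birthDate -> 1879-03-14
    print(row["p"]["value"], row["o"]["value"])
```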
  7. 7. “Search with Meaning” SEMANTIC SEARCH
  8. 8. SEMANTIC SEARCH “Search with Meaning” an umbrella term that encompasses various techniques
  9. 9. BUILDING BLOCKS ‣ Knowledge acquisition and curation ‣ Query understanding ‣ Answer retrieval ‣ Result presentation ‣ …
  10. 10. OVERVIEW Result Presentation + Resources
  11. 11. RESULT PRESENTATION Summarizing Entities for Entity Cards. F. Hasibi, K. Balog, and S. E. Bratsberg. “Dynamic Factual Summaries for Entity Cards”. In Proceedings of SIGIR ’17.
  12. 12. ENTITY CARDS Entity Summary
  13. 13. ENTITY SUMMARIZATION Albert Einstein … and ~700 more facts dbo:almaMater dbr:ETH_Zurich dbo:almaMater dbr:University_of_Zurich dbo:award dbr:Max_Planck_Medal dbo:award dbr:Nobel_Prize_in_Physics dbo:birthDate 1879-03-14 dbo:birthPlace dbr:Ulm dbo:birthPlace dbr:German_Empire dbo:citizenship dbr:Austria-Hungary dbo:children dbr:Eduard_Einstein dbo:children dbr:Hans_Albert_Einstein dbo:deathDate 1955-04-18 dbo:deathPlace dbr:Princeton,_New_Jersey dbo:spouse dbr:Elsa_Einstein dbo:spouse dbr:Mileva_Marić dbp:influenced dbr:Leo_Szilard
  14. 14. ENTITY SUMMARIES for the example queries “einstein awards” and “einstein family”
  15. 15. Other applications ‣ News search • hovering over an entity in entity-annotated documents ‣ Job search • company descriptions for a given topic ENTITY SUMMARIES
  16. 16. ENTITY SUMMARIES Question: How to generate query-dependent entity summaries that can directly address users’ information needs?
  17. 17. METHOD Fact ranking: ranking a set of entity facts (given a search query) with respect to some criterion. Summary generation: constructing an entity summary from the ranked entity facts, for a given size.
  18. 18. RANKING CRITERIA Importance: the general importance of the fact in describing the entity, irrespective of any particular information need. Relevance: the relevance of a fact to the query, i.e., how well the fact supports the information need underlying the query.
  19. 19. RANKING CRITERIA Utility: the utility of a fact combines its general importance and its relevance to the query into a single number.
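One simple way to write such a combination down (a sketch only, not necessarily the exact formulation used in the paper) is a weighted interpolation, where the weight expresses the bias towards importance or relevance mentioned on the next slide:

```latex
\mathrm{utility}(f, q) \;=\; \alpha \cdot \mathrm{importance}(f) \;+\; (1-\alpha) \cdot \mathrm{relevance}(f, q),
\qquad \alpha \in [0, 1].
```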
  20. 20. FACT RANKING (importance and relevance features) ‣ Supervised ranking with fact-query pairs as learning instances ‣ Learning is optimized on utility with different weights • more bias towards either importance or relevance
  21. 21. FACT RANKING ‣ Knowledge base statistics as ingredients for importance features • in the absence of query logs
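To make the supervised setup more tangible, here is a minimal pointwise learning-to-rank sketch in Python. The two features (predicate frequency in the knowledge base as an importance signal, query-object term overlap as a relevance signal) and the scikit-learn regressor are illustrative assumptions; the paper uses a richer feature set and its own learner.

```python
# Minimal pointwise fact-ranking sketch; features and learner are illustrative,
# not the exact ones from the paper.
from sklearn.ensemble import GradientBoostingRegressor

def features(fact, query, pred_freq):
    """fact: (predicate, object) pair; pred_freq: predicate -> frequency in the KB."""
    predicate, obj = fact
    query_terms = set(query.lower().split())
    return [
        pred_freq.get(predicate, 0),                       # importance: KB statistic
        len(query_terms & set(str(obj).lower().split())),  # relevance: query-object term overlap
    ]

def train_fact_ranker(training_data, pred_freq):
    """training_data: list of ((fact, query), utility_label) instances."""
    X = [features(f, q, pred_freq) for (f, q), _ in training_data]
    y = [label for _, label in training_data]
    return GradientBoostingRegressor().fit(X, y)

def rank_facts(model, facts, query, pred_freq):
    """Return the facts sorted by predicted utility, best first."""
    scores = model.predict([features(f, query, pred_freq) for f in facts])
    return [f for _, f in sorted(zip(scores, facts), key=lambda t: t[0], reverse=True)]
```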
  22. 22. METHOD Fact ranking: ranking a set of entity facts (given a search query) with respect to some criterion. Summary generation: constructing an entity summary from the ranked entity facts, for a given size.
  23. 23. SUMMARY GENERATION 1. dbo:birthDate 1879-03-14 2. dbp:placeOfBirth Ulm 3. dbo:birthPlace dbr:Ulm 4. dbo:deathDate 1955-04-18 5. dbo:award dbr:Nobel_Prize_in_Physics 6. dbo:deathPlace dbr:Princeton,_New_Jersey 7. dbo:birthPlace dbr:German_Empire 8. dbo:almaMater dbr:ETH_Zurich 9. dbo:award dbr:Max_Planck_Medal 10. dbp:influenced dbr:Nathan_Rosen 11. dbo:almaMater dbr:University_of_Zurich … multi-valued predicates (e.g., dbo:birthPlace, dbo:award, dbo:almaMater)
  24. 24. SUMMARY GENERATION 1. dbo:birthDate 1879-03-14 2. dbp:placeOfBirth Ulm 3. dbo:birthPlace dbr:Ulm 4. dbo:deathDate 1955-04-18 5. dbo:award dbr:Nobel_Prize_in_Physics 6. dbo:deathPlace dbr:Princeton,_New_Jersey 7. dbo:birthPlace dbr:German_Empire 8. dbo:almaMater dbr:ETH_Zurich 9. dbo:award dbr:Max_Planck_Medal 10. dbp:influenced dbr:Nathan_Rosen 11. dbo:almaMater dbr:University_of_Zurich … identical facts (e.g., dbp:placeOfBirth Ulm and dbo:birthPlace dbr:Ulm)
  25. 25. SUMMARY GENERATION Algorithm 1: Summary generation
      Input: ranked facts Fe, max height h, max width w
      Output: entity summary lines
      M ← Predicate-Name-Mapping(Fe)
      headings ← []                              ▷ determine line headings
      for f in Fe do
          pname ← M[fp]
          if pname ∉ headings and size(headings) < h then
              headings.add((fp, pname))
          end if
      end for
      values ← []                                ▷ determine line values
      for f in Fe do
          if fp ∈ headings then
              values[fp].add(fo)
          end if
      end for
      lines ← []                                 ▷ construct lines
      for (fp, pname) in headings do
          line ← pname + ‘:’
          for v in values[fp] do
              if len(line) + len(v) ≤ w then
                  line ← line + v                ▷ add comma if needed
              end if
          end for
          lines.add(line)
      end for
      ‣ Creates a summary of a given size (height and width) ‣ Resolves identical facts (RF feature) ‣ Groups multi-valued predicates (GF feature)
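Below is a runnable Python sketch of this summary generation step: it walks down the ranked facts, keeps at most h distinct predicate headings, groups objects of a selected predicate under a single heading, and fills each line up to the maximum width w. The predicate-name mapping is reduced to a plain dictionary here, so this is an approximation of the algorithm, not its exact implementation.

```python
def generate_summary(ranked_facts, pred_names, h, w):
    """ranked_facts: list of (predicate, object) pairs, best first;
    pred_names: predicate URI -> human-readable heading (collapses identical facts);
    h: max number of summary lines; w: max line width in characters."""
    headings, values = [], {}

    # Determine line headings: first h distinct (mapped) predicate names
    for pred, _ in ranked_facts:
        name = pred_names.get(pred, pred)
        if len(headings) < h and name not in [n for _, n in headings]:
            headings.append((pred, name))

    # Determine line values: group all objects of the chosen predicates
    heading_preds = {p for p, _ in headings}
    for pred, obj in ranked_facts:
        if pred in heading_preds:
            values.setdefault(pred, []).append(obj)

    # Construct lines, respecting the maximum line width
    lines = []
    for pred, name in headings:
        line = name + ":"
        for val in values.get(pred, []):
            if len(line) + len(val) <= w:
                line += (" " if line.endswith(":") else ", ") + val
        lines.append(line)
    return lines
```

With a predicate-name mapping that sends, e.g., both dbp:placeOfBirth and dbo:birthPlace to “birth place”, the same heading is never repeated, and multiple objects of a selected predicate (such as the two dbo:award values) end up grouped on one line.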
  26. 26. EVALUATION
  27. 27. QUERIES Query types: Keyword, List search, Natural language, Named entity • “madrid” • “brooklyn bridge” • “vietnam war facts” • “eiffel” • “states that border oklahoma” • “What is the second highest mountain?” Taken from the DBpedia-entity collection. K. Balog and R. Neumayer. “A Test Collection for Entity Search in DBpedia”. In Proceedings of SIGIR ’13.
  28. 28. EVALUATION (FACT RANKING) Benchmark construction via crowdsourcing experiments ‣ rate the importance of the fact w.r.t. the entity ‣ rate the relevance of the fact to the query for the given entity. Example question: “How important is this fact for the given entity?” (Very important / Important / Not important)
  29. 29. EVALUATION (FACT RANKING) Benchmark construction via crowdsourcing experiments ‣ Collected judgments for ~4K facts ‣ 5 judgments per record ‣ Fleiss’ Kappa of 0.52 and 0.41 for importance and relevance, respectively (moderate agreement)
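For reference, agreement figures of this kind can be computed from the raw crowd labels with statsmodels; the tiny label matrix below is made up purely for illustration.

```python
# Sketch: Fleiss' Kappa over crowdsourced labels (rows = facts, columns = 5 judges,
# values = label index: 0 = not important, 1 = important, 2 = very important).
# The judgments below are toy data; the real benchmark has ~4K facts.
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

judgments = [
    [2, 2, 1, 2, 2],
    [0, 1, 0, 0, 1],
    [1, 1, 2, 1, 1],
]

table, _ = aggregate_raters(judgments)   # facts x categories count table
print(fleiss_kappa(table))               # the talk reports 0.52 (importance) and 0.41 (relevance)
```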
  30. 30. RESULTS (FACT RANKING) Comparison of fact ranking against state-of-the-art approaches with URI-only objects (Table 2 of the paper):

      Model       Importance NDCG@5   Importance NDCG@10   Utility NDCG@5   Utility NDCG@10
      RELIN       0.6368              0.7130               0.6300           0.7066
      LinkSum     0.7018              0.7031               0.6504           0.6648
      SUMMARUM    0.7181              0.7412               0.6719           0.7111
      DynES/imp   0.8354              0.8604               0.7645           0.8117
      DynES       0.8291              0.8652               0.8164           0.8569

      Improvements of DynES/imp and DynES over all baselines are statistically significant. Feature ablation (Table 4 of the paper, NDCG@10): DynES with all features 0.7873; removing NEFp 0.7757 (-1.16%, p = 0.08); removing TypeImp 0.7760 (-1.13%, p = 0.14).

      Take-away: 16% improvement over the best baseline.
  31. 31. EVALUATION (SUMMARY GENERATION) ‣ Users consume all facts displayed in the summary ‣ The quality of the whole summary should be assessed ‣ Side-by-side evaluation of factual summaries by humans
  32. 32. RESULTS (SUMMARY GENERATION) [Figure 4 of the paper: boxplots of the distribution of user preferences per query type for DynES vs. DynES/imp and DynES vs. DynES/rel]

      Table 5: Side-by-side evaluation of summaries for different fact ranking methods.
      Model                        Win   Loss   Tie   RI
      DynES vs. DynES/imp          46    23     31    0.23
      DynES vs. DynES/rel          75    12     13    0.63
      DynES vs. RELIN              95    5      0     0.90
      Utility vs. Importance       47    16     37    0.31

      Table 6: Side-by-side evaluation of summaries for different summary generation algorithms.
      Model                        Win   Loss   Tie   RI
      DynES vs. DynES(-GF)(-RF)    84    1      15    0.83
      DynES vs. DynES(-GF)         74    0      26    0.74
      DynES vs. DynES(-RF)         46    2      52    0.44

      • Users preferred the utility-based summaries (DynES) over the others, and especially over the relevance-only ones
      • Grouping of multi-valued predicates (GF) is perceived as more important by users than the resolution of identical facts (RF)
  33. 33. RESOURCES 1. Entity search toolkit 2. Test collection
  34. 34. SEMANTIC SEARCH “Search with Meaning” an umbrella term that encompasses various techniques
  35. 35. SEMANTIC SEARCH TOOLKIT Functionalities: Entity retrieval: returns a ranked list of entities in response to a query. Entity linking: identifies entities in a query and links them to the corresponding entries in the knowledge base. Target type identification: detects the target types (or categories) of a query.
  36. 36. SEMANTIC SEARCH TOOLKIT Highlights: • Web interface, API, and command-line usage • 3-tier architecture • Online source code and documentation
  37. 37. NORDLYS http://nordlys.cc/
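As a hedged usage sketch, the snippet below calls the toolkit's REST API with the requests package for the three functionalities listed earlier. The base URL and the /er, /el, /tti endpoint names are assumptions that should be checked against the documentation at http://nordlys.cc/.

```python
# Hedged sketch of calling the Nordlys REST API; host, endpoint names, and
# parameters are assumptions to be verified against the online documentation.
import requests

BASE = "http://api.nordlys.cc"   # assumed API host
query = "einstein awards"

entities = requests.get(f"{BASE}/er",  params={"q": query}).json()  # entity retrieval
linked   = requests.get(f"{BASE}/el",  params={"q": query}).json()  # entity linking
types    = requests.get(f"{BASE}/tti", params={"q": query}).json()  # target type identification

for result in (entities, linked, types):
    print(result)
```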
  38. 38. NORDLYS
  39. 39. DBPEDIA-ENTITY V2 Details: • (37 + 17) runs + old qrels • Pool size: 150K • 3-point Likert scale • 5 judgments per query-entity pair
  40. 40. DBPEDIA-ENTITY V2 Retrieval results (NDCG), broken down into query subtypes:

      Model       SemSearch ES        INEX-LD             ListSearch          QALD-2              Total
                  @10      @100       @10      @100       @10      @100       @10      @100       @10      @100
      BM25        0.2497   0.4110     0.1828   0.3612     0.0627   0.3302     0.2751   0.3366     0.2558   0.3582
      PRMS        0.5340   0.6108     0.3590   0.4295     0.3684   0.4436     0.3151   0.4026     0.3905   0.4688
      MLM-all     0.5528   0.6247     0.3752   0.4493     0.3712   0.4577     0.3249   0.4208     0.4021   0.4852
      LM          0.5555   0.6475     0.3999   0.4745     0.3925   0.4723     0.3412   0.4338     0.4182   0.5036
      SDM         0.5535   0.6672     0.4030   0.4911     0.3961   0.4900     0.3390   0.4274     0.4185   0.5143
      LM+ELR      0.5557   0.6477     0.4013   0.4763     0.4037   0.4885     0.3464   0.4377     0.4228   0.5093
      SDM+ELR     0.5533   0.6676     0.4097   0.4975     0.4142   0.5058     0.3434   0.4350     0.4257   0.5220
      MLM-CA      0.6247   0.6854     0.4029   0.4796     0.4021   0.4786     0.3365   0.4301     0.4365   0.5143
      BM25-CA     0.5858   0.6883     0.4120   0.5050     0.4220   0.5142     0.3566   0.4426     0.4399   0.5329
      FSDM        0.6521   0.7220     0.4214   0.5043     0.4196   0.4952     0.3401   0.4358     0.4524   0.5342
      BM25F-CA    0.6281   0.7200     0.4394   0.5296     0.4252   0.5106     0.3689   0.4614     0.4605   0.5505
      FSDM+ELR    0.6568   0.7260     0.4397   0.5144     0.4246   0.5011     0.3467   0.4450     0.4607   0.5416

      Baseline runs (generative models) are available on Nordlys.
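NDCG@10 and NDCG@100 figures like the ones in this table can be computed from a run and the graded qrels with, for instance, the pytrec_eval package. The tiny inline qrels/run below and their IDs are made up for illustration; in practice both would be read from the collection's files.

```python
# Sketch: computing NDCG@10 / NDCG@100 for a run against graded qrels.
# The query and entity IDs below are illustrative, not taken from the collection.
import pytrec_eval

qrels = {"q1": {"<dbpedia:Albert_Einstein>": 2, "<dbpedia:Ulm>": 0, "<dbpedia:Eduard_Einstein>": 1}}
run   = {"q1": {"<dbpedia:Albert_Einstein>": 9.1, "<dbpedia:Eduard_Einstein>": 4.5, "<dbpedia:Ulm>": 3.2}}

evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"ndcg_cut"})
scores = evaluator.evaluate(run)
print(scores["q1"]["ndcg_cut_10"], scores["q1"]["ndcg_cut_100"])
```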
  41. 41. DBPEDIA-ENTITY COLLECTIONS Entity Retrieval: 468 queries, 19K relevant entities • Entity Summarization: 100 queries, 4K entity facts • Target Type Identification: 485 queries, ~900 types. Built with DBpedia 2015-10. https://github.com/iai-group/DBpedia-Entity
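A small loader sketch for the collection's graded relevance judgments, assuming the usual TREC qrels layout ("query_id 0 entity_id grade" per line); the actual file names and layout should be taken from the GitHub repository linked above.

```python
# Sketch: parse TREC-style qrels into {query_id: {entity_id: grade}}.
# The file name below is hypothetical; consult the repository for the real one.
from collections import defaultdict

def load_qrels(path):
    qrels = defaultdict(dict)
    with open(path) as f:
        for line in f:
            query_id, _, entity_id, grade = line.split()
            qrels[query_id][entity_id] = int(grade)
    return qrels

# qrels = load_qrels("qrels-v2.txt")  # hypothetical file name
```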
  42. 42. Nordlys: Krisztian Balog, Dario Garigliotti, Shuo Zhang, Heng Ding DBpedia-Entity Collection: Fedor Nikolaev, Chenyan Xiong, Svein Erik Bratsberg, Krisztian Balog, Alexander Kotov, Jamie Callan ACKNOWLEDGEMENT
  43. 43. THANK YOU
