Recent Trends in Semantic Search Technologies



A talk given by Peter Mika and Thanh Tran at SemTechBiz 2013

  • Mobile: Google interactive voice search (conversation), Siri (Peter); Facebook's Graph Search (Thanh); Knowledge Graph (infoboxes): from entity search (“tom cruise actor”) to list/category queries (“tom cruise spouses”) to question-answering (“tom cruise height”) (Thanh); Spark (Yahoo!): related entity recommendation (Peter); Thanh's search engine: auto-complete based on the schema/data, from entity search to relational search using YAGO data (Thanh); Glimmer: RDF search engine (Peter)
  • Semantic search can be seen as a retrieval paradigm centered on the use of semantics: a system that incorporates the semantics entailed by the query and/or the resources into the matching process essentially performs semantic search.
  • Facebook was invited to join, but continues to pursue OGP.
  • We implemented the search paradigms and integrated them as separate search modules into a demonstrator system of the Information Workbench, which has been developed as a showcase for interaction with the Web of data. In particular, keyword search is implemented according to the design and technologies employed by standard Semantic Web search engines: like Sindice and FalconS, we use an inverted index to store and retrieve RDF resources based on terms. Faceted search is also implemented on the inverted index, based on the techniques discussed in [25]. Result completion is based on recent work discussed for the TASTIER system [8]. For computing join graphs, we use the top-k procedure elaborated in [9]; this technique is also used for computing top-k interpretations, i.e. to support query completion. We choose to display the top-6 queries and the top-25 results, respectively.

    1. Semantic Search on the Rise. Peter Mika | Yahoo! Research, Spain | pmika@yahoo-inc.com; Thanh Tran | Semsolute, Germany | Tran@semsolute.com
    2. About the speakers. Peter Mika: Senior Research Scientist, Head of the Semantic Search group at Yahoo! Labs; expertise: Semantic Search, Web Object Retrieval, Natural Language Processing. Tran Duc Thanh: CEO of Semsolute, a Semantic Search Technologies company; served as Assistant Professor for Karlsruhe Institute of Technology and Stanford University; expertise: Semantic Search, Semantic / Linked Data Management.
    3. Agenda: Why Semantic Search; What is Semantic Search; Innovative Semantic Search Applications; Behind the Scenes; Questions
    4. Why Semantic Search?
    5. Why Semantic Search? I. “We are at the beginning of search.” (Marissa Mayer). Search has solved large classes of queries, e.g. navigational ones. The remaining queries are hard: they are not solvable by brute force and require a deep understanding of the world and of human cognition, e.g. ambiguous searches (paris hilton), imprecise or overly precise searches, and searches for descriptions (34 year old computer scientist living in barcelona). Background knowledge and metadata can help to address poorly solved queries. Many of these queries would not be asked by users, who have learned over time what search technology can and cannot do.
    6. Why Semantic Search? II. The Semantic Web is now a reality: large amounts of data are published in RDF (Linked Data, metadata in HTML, Facebook's Open Graph Protocol). Casual users don't know SPARQL and are unaware of the schema of the data. Searching data instead of, or in addition to, searching documents enables innovative search applications / tasks.
    7. What is Semantic Search?
    8. Semantic Search: Using Semantic Models for Search. Semantic search is a retrieval paradigm that exploits the semantics of the data or explicit background knowledge to understand user intent and the meaning of content, and incorporates the intent of the query and the meaning of content into the search process (semantic models).
    9. Semantic Search: Different Kinds / Different Uses of Semantic Models. There is a wide range of semantic search systems. They employ different semantic models, possibly at different steps of the search process and in order to support different tasks: query formulation; query processing / understanding; ranking; result presentation; result / query refinement.
    10. Semantic models. Semantics is concerned with the meaning of the resources made available for search, and there are various representations of meaning. Word-level models capture relationships among words: taxonomies, thesauri and dictionaries of entity names, with inference along linguistic relations, e.g. broader/narrower terms. Concept-level models capture relationships among objects: ontologies capture entities in the world and their relationships, with inference along domain-specific relations.
    11. Graph-based Conceptual Models. Graphs are the core of the W3C standards for knowledge representation and data exchange: RDF, OWL. A large amount of data / knowledge on the Web is available as graphs: Linked Data (hundreds of interconnected datasets capturing domain-independent and domain-specific knowledge); metadata in HTML (RDFa, microdata, Facebook's OGP); private graphs such as Google's Knowledge Graph, the Facebook Graph, Yahoo's Knowledge Base (presented in a talk yesterday) and Microsoft's Satori.
    12. Linked Data
    13. Where can you find Linked Data? Downloads: DBpedia data dumps. SPARQL access: LOD cache by OpenLink (51 billion triples). Keyword search: Sindice by SindiceTech.
    14. Google Knowledge Graph. Started with Freebase's database, which had 12 million entities. As of June 2012, the Knowledge Graph had 500 million entities and over 3.5 billion relationships between those entities. Prioritize properties based on what users were most
    15. Facebook's Open Graph Protocol. The 'Like' button provides publishers with a way to promote their content on Facebook and build communities: likes show up in profiles and the news feed, and site owners can later reach users who have liked an object. The Facebook Graph API allows 3rd-party developers to access the data. The Open Graph Protocol is an RDFa-based format that allows publishers to describe the object that the user 'Likes'.
    16. Facebook's Open Graph Protocol. An RDF vocabulary to be used in conjunction with RDFa; it simplifies the work of developers by restricting the freedom in RDFa. Types: Activities, Businesses, Groups, Organizations, People, Places, Products and Entertainment. Only the HTML <head> is accepted:
        <html xmlns:og="">
        <head>
        <title>The Rock (1996)</title>
        <meta property="og:title" content="The Rock" />
        <meta property="og:type" content="movie" />
        <meta property="og:url" content="" />
        <meta property="og:image" content="" />
        …
        </head>
        ...
    17. Semantic Web markup: agreement on a shared set of schemas for common types of web content. Use a single format to communicate the same information to all three search engines: Bing, Google and Yahoo! (June 2011), later joined by Yandex (Nov 2011). Microdata and RDFa support. Schemas for the most common web content: business listings, images/video, recipes, reviews, products, jobs… Community.
    18.
    19. Current state of metadata on the Web. Analysis of the Bing/Yahoo! search crawl (US crawl, January 2012): 31% of webpages and 5% of domains contain some metadata (P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012). Data extracted from a public crawl: February 2012 results show 11% of URLs with metadata, compared to 5% in the 2009/2010 data, and 7.3 billion triples are available for download (H. Mühleisen, C. Bizer. Web Data Commons - Extracting Structured Data from Two Large Web Corpora, LDOW 2012). There is a large increase in RDFa and microdata adoption compared to microformats.
    20. Where can you find HTML metadata? Web Data Commons; Glimmer: an online index of the data in Web Data Commons.
    21. Innovative Semantic Search Applications
    22. Innovative Semantic Search Applications. Entity search: entity/entities as results. Factual search: direct answers, facts (about entities). Relational search: complex relationships between entities. Semantic auto-completion: suggesting queries based on the intent of the provided inputs. Results aggregation / analysis / prediction: apply computational models. Semantic log analysis: understanding user behavior in terms of objects. Semantic profiling: recommendations based on particular interests. Semantic context: contextual model of users / interests. Support for complex tasks, e.g. booking a vacation using a combination of services. Conversational search.
    23. Entity Search: Entity-based Disambiguation
    24. Entity Search: Entity Summary
    25. Entity Search: Entity-based Navigation / Exploration
    26. Factual Search
    27. Relational Search
    28. Semantic Auto-completion: Facebook Graph Search
    29. Semantic Auto-completion: Semsolute's semantic search engine. Screenshot labels: Keywords; Syntactic Completions; Semantic Completions. Footer: Lecture Knowledge Discovery - Institute AIFB
    30. Results Aggregation
    31. Contextual (pervasive, ambient) search. Yahoo! Connected TV: widget engine embedded into the TV. Yahoo! IntoNow: recognize audio and show related content.
    32. Interactive Voice Search: Siri. Question-answering, using a variety of backend sources including Wolfram Alpha and various Yahoo! services. Task completion, e.g. schedule an event.
    33. Conversational Search: Google's Interactive Voice Search
    34. Conversational Search: Parlance EU project. Complex dialogs around a set of objects, e.g. restaurants described by area, price range and type of cuisine. Complete system: Automated Speech Recognition (ASR), Spoken Language Understanding (SLU), Interaction Management, Knowledge Base, Natural Language Generation (NLG), Text-to-Speech (TTS). Video. Commercial alternatives from Nuance.
    35. Behind the Scenes
    36. Main Technological Building Blocks. Query interpretation: spelling correction, query segmentation, entity recognition, query intent, interpretation for semantic auto-completion. Ranking: entity ranking, relationship ranking. Aggregation: result fusion, rank / score aggregation. Result presentation: summary generation, visualization.
    37. Semsolute's Building Blocks - Keyword / Key Phrase Interpretation (diagram layer: Entity; example query: “address company san francisco”). Semantic entity index: an inverted index for entities / triples that returns entities / entities' relationships as results to keys. Semantic entity ranking: a structured language model with one language model for every attribute; it returns the entities whose LMs most likely generate the keywords, i.e. the entity descriptions that best match.
    38. Semsolute's Building Blocks - Semantic Graph Construction (diagram layers: Entity, Relationships / Structure; example query: “address company san francisco”). Offline component: a query-independent schema graph, obtained by reusing the schema or by pseudo-schema construction (all possible connections between classes of entities, e.g. friendships between users). Online component: query-specific keyword-matching elements; connect the keyword-matching elements / entities to the classes they belong to.
    39. Semsolute's Building Blocks - Graph Exploration (diagram layers: Entity, Relationships / Structure; example query: “address company san francisco”). Top-k graph exploration: a shortest-path based algorithm that finds the top-k graphs connecting keyword-matching elements. Top-k graph ranking: language-model based, using an aggregated model that combines the LMs of the entities matching the keywords.
    40. Semsolute's Building Blocks - Query Generation & Processing (diagram layers: Entity, Relationships / Structure, Triple; example query: “address company san francisco”, interpreted as “Address of companies located in San Francisco?”). Graph to query mapping: translation rules that map top-ranked graphs to structured queries (SQL, SPARQL), and translation rules that map structured queries to natural language questions. Graph matching: a triple index (a cover index supporting different triple patterns); various join implementations.
    41. Yahoo! Spark: Entity Recommendation in Search. There are different use cases in Web search. Some users are short on time and need direct answers: query expansion, question-answering, information boxes, rich results… Other users want to explore long-term interests such as sports, celebrities, movies and music, or long-running tasks such as travel planning. Spark is a search assistance tool for exploration: it recommends related entities given the user's current query, based on explicit relations in a Knowledge Base.
    42. Example user sessions
    43. Spark example I.
    44. Spark example II.
    45. High-Level Architecture View (diagram components: entity data, data preprocessing, entity graph, feature sources, feature extraction, features, editorial judgements, model learning, ranking model, datapack, ranking and disambiguation)
    46. Spark challenges. Interpretation and disambiguation: Obama and Toyota are places in Japan, but maybe the user is not looking for them; the popularity of “obama” is not a sign of the popularity of a Japanese town. Ranking: “Release me” by Engelbert Humperdinck should rank higher than “Lesbian Seagull”, which only appeared on the soundtrack of a Beavis and Butthead episode; editorial relevance vs. what people click. Large-scale data processing and ML: the Knowledge Base is built from Wikipedia, Yahoo! data and Web extraction; features are extracted from query logs, Flickr and Twitter data.
    47. Contact: Peter Mika (@pmika); Tran Duc Thanh
    48. Resources
    49. Resources. Detailed information: Peter Mika. Entity Search on the Web, keynote at the Web of Linked Entities workshop; Peter Mika, Thanh Tran. Semantic search tutorial, SemTech 2012. Books: Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. ACM Press, 2011. Survey papers: Thanh Tran, Peter Mika. Survey of Semantic Search Approaches. Under submission, 2012. Conferences and workshops: ISWC, ESWC, WWW, SIGIR, CIKM, SemTech; Semantic Search workshop series; Exploiting Semantic Annotations in Information Retrieval (ESAIR); Entity-oriented Search (EOS) workshop; Web of Linked Entities (WoLE) workshop.
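
The Open Graph Protocol slides describe og: properties written as <meta property="og:..." content="..."> tags that are only accepted in the HTML <head>. The following minimal sketch, using only Python's standard html.parser module, shows how a consumer might collect those properties. The class and function names are hypothetical, and ignoring og: tags outside <head> is our reading of "Only HTML <head> accepted", not Facebook's documented parser behaviour.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* <meta> properties, but only inside the HTML <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags such as <meta ... /> here.
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            prop = attributes.get("property") or ""
            if prop.startswith("og:"):
                self.properties[prop] = attributes.get("content")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def extract_open_graph(html):
    """Return a dict mapping og:* property names to their content values."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


page = (
    '<html xmlns:og=""><head><title>The Rock (1996)</title>'
    '<meta property="og:title" content="The Rock" />'
    '<meta property="og:type" content="movie" />'
    '</head><body><meta property="og:title" content="ignored" /></body></html>'
)
print(extract_open_graph(page))
# prints {'og:title': 'The Rock', 'og:type': 'movie'}
```

The second og:title tag sits in <body>, so it is skipped; a more lenient consumer could simply drop the in_head check.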
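
Query segmentation and entity recognition appear among the query-interpretation building blocks of slide 36. A common, simple approach, sketched here as an assumption rather than as the presenters' method, is dynamic programming over token positions: keep known multi-word names such as "san francisco" together and fall back to single tokens elsewhere. The phrases set is a hypothetical stand-in for a real dictionary of entity names (one of the word-level models of slide 10).

```python
def segment_query(query, phrases):
    """Split a keyword query into the fewest segments.

    Any single token is a valid segment; a multi-token segment is only allowed
    if it appears in `phrases` (a hypothetical dictionary of entity names).
    """
    tokens = query.lower().split()
    # best[i] holds (segment count, segments) for the first i tokens.
    best = [(0, [])] + [None] * len(tokens)
    for end in range(1, len(tokens) + 1):
        for start in range(end):
            candidate = " ".join(tokens[start:end])
            if end - start > 1 and candidate not in phrases:
                continue
            count = best[start][0] + 1
            if best[end] is None or count < best[end][0]:
                best[end] = (count, best[start][1] + [candidate])
    return best[len(tokens)][1]


print(segment_query("address company san francisco", {"san francisco"}))
# prints ['address', 'company', 'san francisco']
```

Minimising the number of segments is only one possible objective; systems often score segments by phrase frequency in query logs instead.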
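
Slide 37 and the speaker notes on the Information Workbench describe keyword search over RDF through an inverted index that maps terms to entities. The sketch below is a toy, in-memory version under stated assumptions: it indexes the object values of hypothetical ex: triples under their subjects, treats every object as text, and answers conjunctive keyword queries. Tokenization, ranking and on-disk posting lists, as used by engines such as Sindice, are omitted.

```python
from collections import defaultdict


def build_entity_index(triples):
    """Inverted index: lower-cased term -> set of subjects whose values
    contain that term. Every object is treated as plain text here."""
    index = defaultdict(set)
    for subject, _predicate, obj in triples:
        for term in obj.lower().split():
            index[term].add(subject)
    return index


def keyword_search(index, query):
    """Conjunctive retrieval: entities whose descriptions contain every term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)


triples = [
    ("ex:acme", "ex:name", "Acme Corp"),
    ("ex:acme", "ex:city", "San Francisco"),
    ("ex:globex", "ex:name", "Globex"),
    ("ex:globex", "ex:city", "Springfield"),
]
index = build_entity_index(triples)
print(keyword_search(index, "acme san francisco"))
# prints {'ex:acme'}
```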
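
The structured language model of slide 37 keeps one language model per attribute and ranks entities by how likely their models are to generate the keywords. One plausible instantiation, assumed here, is a mixture of unigram models, each smoothed with the collection model (Jelinek-Mercer), with equal attribute weights. The smoothing value of 0.5, the uniform weights and the sample entities are illustrative choices, not values from the talk.

```python
import math
from collections import Counter


def rank_entities(entities, query, smoothing=0.5):
    """Rank entities by log P(query | entity).

    Each entity is a dict attribute -> text. Every attribute gets its own
    unigram language model, smoothed with the collection model
    (Jelinek-Mercer), and the attribute models are mixed with equal weights.
    Assumes every entity has at least one attribute.
    """
    collection = Counter()
    for attributes in entities.values():
        for text in attributes.values():
            collection.update(text.lower().split())
    collection_size = sum(collection.values()) or 1

    scores = {}
    for entity, attributes in entities.items():
        models = [Counter(text.lower().split()) for text in attributes.values()]
        weight = 1.0 / len(models)
        score = 0.0
        for term in query.lower().split():
            background = collection[term] / collection_size
            p = 0.0
            for model in models:
                length = sum(model.values())
                p_attr = model[term] / length if length else 0.0
                p += weight * ((1 - smoothing) * p_attr + smoothing * background)
            if p > 0:  # terms unseen in the whole collection are skipped
                score += math.log(p)
        scores[entity] = score
    return sorted(scores, key=scores.get, reverse=True)


entities = {
    "ex:acme": {"name": "acme corp", "address": "1 main street san francisco"},
    "ex:globex": {"name": "globex", "address": "12 elm street springfield"},
}
print(rank_entities(entities, "company san francisco"))
# prints ['ex:acme', 'ex:globex']
```

Learning per-attribute weights, for example giving names more weight than addresses, is a natural refinement that the uniform mixture leaves out.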
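
Slide 39 describes a shortest-path based, top-k exploration that finds subgraphs connecting the elements matched by each keyword. The sketch below swaps in a simple, well-known variant (distinct-root semantics with unit edge weights): a breadth-first search from each keyword's matches gives every node its distance to the nearest match, each candidate root is scored by the sum of those distances, and the k cheapest roots are returned together with the shortest paths that connect them. The schema-style graph is hypothetical, and the ranking here uses path length only, whereas the slide ranks graphs with aggregated language models.

```python
import heapq
from collections import deque


def bfs_from_matches(graph, sources):
    """Multi-source BFS over an undirected adjacency dict.

    Returns each reachable node's distance to its nearest source and a parent
    pointer that leads one step back towards that source.
    """
    dist = {node: 0 for node in sources}
    parent = {node: None for node in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                parent[neighbour] = node
                queue.append(neighbour)
    return dist, parent


def top_k_subgraphs(graph, keyword_matches, k=3):
    """Return up to k (cost, root, edges) tuples, cheapest first.

    keyword_matches holds one set of matching nodes per keyword. A root's cost
    is the sum of its distances to the nearest match of every keyword; edges
    are the shortest paths from the root to those matches.
    """
    searches = [bfs_from_matches(graph, matches) for matches in keyword_matches]
    candidates = [
        (sum(dist[root] for dist, _ in searches), root)
        for root in graph
        if all(root in dist for dist, _ in searches)
    ]
    results = []
    for cost, root in heapq.nsmallest(k, candidates):
        edges = set()
        for _dist, parent in searches:
            node = root
            while parent[node] is not None:
                edges.add((node, parent[node]))
                node = parent[node]
        results.append((cost, root, edges))
    return results


# Hypothetical schema-level graph for the query "address company san francisco".
graph = {
    "address": ["Company"],
    "Company": ["address", "City", "Person"],
    "City": ["Company", "SanFrancisco"],
    "SanFrancisco": ["City"],
    "Person": ["Company"],
}
keyword_matches = [{"address"}, {"Company"}, {"SanFrancisco"}]
for cost, root, edges in top_k_subgraphs(graph, keyword_matches, k=1):
    print(cost, root, sorted(edges))
# prints 3 Company [('City', 'SanFrancisco'), ('Company', 'City'), ('Company', 'address')]
```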
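
Slide 40 maps top-ranked graphs to structured queries through translation rules. The sketch below assumes three simple rules: every class node becomes a typed variable, every edge becomes a triple pattern, and keyword-matched values are copied in as constants. The ex: vocabulary (with the placeholder namespace http://example.org/), the node names and the use of rdfs:label for the city name are hypothetical.

```python
def graph_to_sparql(class_nodes, edges, select, prefixes):
    """Translate a matched query graph into a SPARQL SELECT query.

    class_nodes: node id -> class IRI; each class node becomes a variable ?<id>.
    edges: (source, property, target) triples; node ids become variables, and any
           other value (a ?variable, prefixed IRI or quoted literal) is copied verbatim.
    """
    def term(value):
        return f"?{value}" if value in class_nodes else value

    patterns = [f"?{node} a {cls} ." for node, cls in class_nodes.items()]
    patterns += [f"{term(s)} {p} {term(o)} ." for s, p, o in edges]
    header = "".join(f"PREFIX {name}: <{iri}>\n" for name, iri in prefixes.items())
    body = "\n  ".join(patterns)
    return f"{header}SELECT {' '.join(select)} WHERE {{\n  {body}\n}}"


query = graph_to_sparql(
    class_nodes={"company": "ex:Company", "city": "ex:City"},
    edges=[
        ("company", "ex:address", "?address"),
        ("company", "ex:locatedIn", "city"),
        ("city", "rdfs:label", '"San Francisco"'),
    ],
    select=["?address"],
    prefixes={"ex": "http://example.org/", "rdfs": "http://www.w3.org/2000/01/rdf-schema#"},
)
print(query)
```

The same graph could be verbalized with a template per edge type to obtain the natural language question shown on the slide; that second set of rules is not sketched here.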
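
The "cover index supporting different triple patterns" on slide 40 can be illustrated with the classic technique of storing triples in several permutations. With the three orders SPO, POS and OSP, the bound positions of any of the eight triple patterns form a prefix of at least one order, so every pattern is answered by a prefix lookup. This in-memory sketch illustrates the general technique under that assumption; it is not Semsolute's implementation, and the sample triples are hypothetical.

```python
from collections import defaultdict

# Three permutations of (subject, predicate, object). For every triple pattern,
# the bound positions form a prefix of at least one of these orders.
ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}


def _bound_prefix(key):
    """True if no bound value (not None) follows an unbound one."""
    seen_unbound = False
    for value in key:
        if value is None:
            seen_unbound = True
        elif seen_unbound:
            return False
    return True


class TripleIndex:
    def __init__(self):
        self.indexes = {name: defaultdict(lambda: defaultdict(set)) for name in ORDERS}

    def add(self, s, p, o):
        triple = (s, p, o)
        for name, order in ORDERS.items():
            first, second, third = (triple[i] for i in order)
            self.indexes[name][first][second].add(third)

    def match(self, s=None, p=None, o=None):
        """Yield (s, p, o) triples matching the pattern; None means unbound."""
        pattern = (s, p, o)
        for name, order in ORDERS.items():
            key = [pattern[i] for i in order]
            if _bound_prefix(key):
                break
        index = self.indexes[name]
        for first in ([key[0]] if key[0] is not None else list(index)):
            level2 = index.get(first, {})
            for second in ([key[1]] if key[1] is not None else list(level2)):
                level3 = level2.get(second, set())
                for third in ([key[2]] if key[2] is not None else list(level3)):
                    if third in level3:
                        # Put the components back into subject, predicate, object order.
                        triple = [None, None, None]
                        for position, value in zip(order, (first, second, third)):
                            triple[position] = value
                        yield tuple(triple)


idx = TripleIndex()
idx.add("ex:acme", "ex:locatedIn", "ex:sf")
idx.add("ex:globex", "ex:locatedIn", "ex:springfield")
idx.add("ex:acme", "ex:name", '"Acme"')
print(sorted(idx.match(p="ex:locatedIn", o="ex:sf")))
# prints [('ex:acme', 'ex:locatedIn', 'ex:sf')]
```

The price of covering every pattern is storing each triple three times; the joins mentioned on the slide would then combine the results of several such lookups.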
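
Slide 46 mentions feature extraction from query logs and the ranking challenge of preferring "Release me" over "Lesbian Seagull" for Engelbert Humperdinck. The sketch below computes two standard co-occurrence statistics over query sessions, conditional probability and pointwise mutual information (PMI). The session data are made up for illustration, and Spark's actual features and learned ranking model are not described on the slides.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_features(sessions):
    """Co-occurrence features for ordered entity pairs (a, b).

    Each session is the list of entities recognised in one user's queries.
    joint: number of sessions containing both entities
    conditional: P(b in session | a in session)
    pmi: log of P(a, b) / (P(a) * P(b)), estimated over sessions
    """
    entity_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        entities = set(session)
        entity_counts.update(entities)
        for a, b in combinations(sorted(entities), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    total = len(sessions)
    return {
        (a, b): {
            "joint": joint,
            "conditional": joint / entity_counts[a],
            "pmi": math.log(joint * total / (entity_counts[a] * entity_counts[b])),
        }
        for (a, b), joint in pair_counts.items()
    }


sessions = [
    ["engelbert humperdinck", "release me"],
    ["engelbert humperdinck", "release me"],
    ["engelbert humperdinck", "lesbian seagull"],
    ["beavis and butthead"],
]
features = cooccurrence_features(sessions)
for candidate in ("release me", "lesbian seagull"):
    f = features[("engelbert humperdinck", candidate)]
    print(candidate, round(f["conditional"], 2), round(f["pmi"], 2))
# prints:
# release me 0.67 0.29
# lesbian seagull 0.33 0.29
```

In this toy data the conditional probability prefers "release me", while PMI gives both pairs the same score because it rewards rarer entities. This is one reason a system like the one in the slide 45 architecture might combine many such features in a model learned from editorial judgements rather than rely on a single statistic.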