Recent Trends in Semantic Search Technologies
 

A talk given by Peter Mika and Thanh Tran at SemTechBiz 2013

  • Mobile: Google interactive voice search (conversation), Siri (Peter). Facebook’s Graph Search (Thanh). Knowledge Graph (infoboxes)... entity search (“tom cruise actor”) to list/category queries (“tom cruise spouses”) to question-answering (“tom cruise height”) (Thanh). Spark (Yahoo!): related entity recommendation (Peter). Thanh’s search engine: auto-complete based on the schema/data, entity search to relational search using Yago data (Thanh). Glimmer: RDF search engine (Peter).
  • Semantic search can be seen as a retrieval paradigm centered on the use of semantics: when a system incorporates the semantics entailed by the query and (or) the resources into the matching process, it essentially performs semantic search.
  • Facebook invited, but continues to pursue OGP
  • We implemented the search paradigms and integrated them as separate search modules into a demonstrator system of the Information Workbench that has been developed as a showcase for interaction with the Web of data. In particular, keyword search is implemented according to the design and technologies employed by standard Semantic Web search engines. Like Sindice and FalconS, we use an inverted index to store and retrieve RDF resources based on terms. Also using the inverted index, faceted search is implemented based on the techniques discussed in [25]. Result completion is based on recent work discussed for the TASTIER system [8]. For computing join graphs, we use the top-k procedure elaborated in [9]. This technique is also used for computing top-k interpretations, i.e. to support query completion. We choose to display the top-6 queries and the top-25 results respectively.

Presentation Transcript

  • Semantic Search on the Rise. Peter Mika | Yahoo! Research, Spain | pmika@yahoo-inc.com. Thanh Tran | Semsolute, Germany | Tran@semsolute.com
  • About the speakers. Peter Mika: Senior Research Scientist; Head of Semantic Search group at Yahoo! Labs; expertise: Semantic Search, Web Object Retrieval, Natural Language Processing. Tran Duc Thanh: CEO of Semsolute, Semantic Search Technologies Company; served as Assistant Professor for Karlsruhe Institute of Technology and Stanford University; expertise: Semantic Search, Semantic / Linked Data Management.
  • Agenda: Why Semantic Search; What is Semantic Search; Innovative Semantic Search Applications; Behind the Scene; Questions
  • Why Semantic Search?
  • Why Semantic Search? I. “We are at the beginning of search.” (Marissa Mayer). Solved large classes of queries, e.g. navigational. Remaining queries are hard, not solvable by brute force, and require deep understanding of the world and human cognition, e.g. ambiguous searches (paris hilton), imprecise or overly precise searches, searches for descriptions (34 year old computer scientist living in barcelona). Background knowledge and metadata can help to address poorly solved queries. Many of these queries would not be asked by users, who learned over time what search technology can and cannot do.
  • Why Semantic Search? II. The Semantic Web is now a reality. Large amounts of data published in RDF: Linked Data; metadata in HTML (Facebook’s Open Graph Protocol, Schema.org). Casual users don’t know SPARQL and are unaware of the schema of the data. Searching data instead of, or in addition to, searching documents. Enable innovative search applications / tasks.
  • What is Semantic Search?
  • Semantic Search: Using Semantic Models for Search. Semantic search is a retrieval paradigm that (1) exploits the semantics of the data or explicit background knowledge to understand user intent and the meaning of content, and (2) incorporates the intent of the query and the meaning of content into the search process (semantic models).
  • Semantic Search: Different Kinds / Different Uses of Semantic Models. Wide range of semantic search systems. Employ different semantic models, possibly at different steps of the search process and in order to support different tasks: query formulation; query processing / understanding; ranking; result presentation; result / query refinement.
  • Semantic models. Semantics is concerned with the meaning of the resources made available for search. Various representations of meaning. Word-level models: models of relationships among words; taxonomies, thesauri, dictionaries of entity names; inference along linguistic relations, e.g. broader/narrower terms. Concept-level models: models of relationships among objects; ontologies capture entities in the world and their relationships; inference along domain-specific relations.
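Inference along broader/narrower terms in a word-level model can be illustrated with a small query expansion sketch. The thesaurus fragment and the function name `expand_query` are hypothetical and only serve the illustration; a real system would load a full thesaurus or taxonomy.

```python
# Hypothetical thesaurus fragment: term -> narrower terms.
NARROWER = {
    "vehicle": ["car", "bicycle"],
    "car": ["convertible", "suv"],
}


def expand_query(terms, narrower, depth=1):
    """Add narrower terms up to `depth` levels below each query term."""
    expanded = list(terms)
    frontier = list(terms)
    for _ in range(depth):
        next_frontier = []
        for term in frontier:
            for child in narrower.get(term, []):
                if child not in expanded:
                    expanded.append(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return expanded


print(expand_query(["red", "vehicle"], NARROWER))
print(expand_query(["red", "vehicle"], NARROWER, depth=2))
```

Limiting the depth keeps the expansion from drifting too far from the original query terms.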
  • Graph-based Conceptual Models. Core of W3C standards for knowledge representation and data exchange: RDF, OWL. Large amount of data / knowledge on the Web available as graphs. Linked Data: hundreds of interconnected datasets capturing domain-independent and domain-specific knowledge. Metadata in HTML: RDFa, microdata, Facebook’s OGP. Private graphs: Google’s Knowledge Graph, Facebook Graph, Yahoo’s Knowledge Base (talk yesterday), Microsoft’s Satori.
  • Linked Data
  • Where can you find Linked Data? Downloads: DBpedia data dumps. SPARQL access: LOD cache by OpenLink (51 billion triples). Keyword search: Sindice by SindiceTech.
  • Google Knowledge Graph. Start with Freebase’s database, which had 12 million entities. As of June 2012, Knowledge Graph has 500 million entities and over 3.5 billion relationships between those entities. Prioritize properties based on what users were most
  • Facebook’s Open Graph Protocol. The ‘Like’ button provides publishers with a way to promote their content on Facebook and build communities. Shows up in profiles and news feed. Site owners can later reach users who have liked an object. Facebook Graph API allows 3rd party developers to access the data. Open Graph Protocol is an RDFa-based format for describing the object that the user ‘Likes’.
  • Facebook’s Open Graph Protocol. RDF vocabulary to be used in conjunction with RDFa. Simplify the work of developers by restricting the freedom in RDFa. Object types: Activities, Businesses, Groups, Organizations, People, Places, Products and Entertainment. Only HTML <head> accepted. http://opengraphprotocol.org/
    <html xmlns:og="http://opengraphprotocol.org/schema/">
    <head>
    <title>The Rock (1996)</title>
    <meta property="og:title" content="The Rock" />
    <meta property="og:type" content="movie" />
    <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
    <meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
    …
    </head>
    ...
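As an illustration of how a consumer might read the Open Graph markup on the slide above, here is a minimal sketch using Python's standard `html.parser`. The class `OpenGraphParser` and the function `extract_open_graph` are illustrative names, not part of any Facebook API, and the sample page is a shortened copy of the slide's example.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs found in <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            name = attributes.get("property") or ""
            if name.startswith("og:") and "content" in attributes:
                self.properties[name] = attributes["content"]

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def extract_open_graph(html_text):
    parser = OpenGraphParser()
    parser.feed(html_text)
    return parser.properties


SAMPLE = """<html xmlns:og="http://opengraphprotocol.org/schema/">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
</head>
<body><meta property="og:title" content="ignored outside head" /></body>
</html>"""

og = extract_open_graph(SAMPLE)
print(og["og:title"], og["og:type"])
```

The parser ignores `og:` properties outside `<head>`, mirroring the slide's note that only the HTML head is accepted.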
  • Semantic Web markup: schema.org. Agreement on a shared set of schemas for common types of web content. Use a single format to communicate the same information to all three search engines: Bing, Google, and Yahoo! (June, 2011), plus Yandex (Nov, 2011). Microdata and RDFa support. Schemas for most common web content: business listings, images/video, recipes, reviews, products, jobs… Community: public-vocabs@w3.org
  • Schema.org
  • Current state of metadata on the Web. Analysis of the Bing/Yahoo! Search Crawl: US crawl, January, 2012; 31% of webpages, 5% of domains contain some metadata (P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012). WebDataCommons.org: data extracted from a public crawl (commoncrawl.org); February, 2012 results show 11% of URLs with metadata compared to 5% in 2009/2010 data; 7.3 billion triples available for download (H. Mühleisen, C. Bizer. Web Data Commons - Extracting Structured Data from Two Large Web Corpora, LDOW 2012). Large increase in RDFa and microdata adoption compared to microformats.
  • Where can you find HTML metadata? Web Data Commons. Glimmer (glimmer.research.yahoo.com): online index of the schema.org data in Web Data Commons.
  • Innovative Semantic Search Applications
  • Innovative Semantic Search Applications Entity search: entity/entities as results Factual search: direct answers, facts (about entities) Relational search: complex relationships between entities Semantic auto-completion: suggesting queries based onthe intent of the provided inputs Results aggregation / analysis / prediction: applycomputational models Semantic log analysis: understanding user behavior interms of objects Semantic profiling: recommendations based on particularinterests Semantic context: contextual model of users / interests Support for complex tasks, e.g. booking a vacation using acombination of services Conversational search
  • Entity Search: Entity-basedDisambiguation
  • Entity Search: Entity Summary
  • Entity Search: Entity-based Navigation / Exploration
  • Factual Search
  • Relational Search
  • Semantic auto-completion: Facebook Graph Search
  • Semantic Auto-completion: Semsolute’s semantic search engine (figure labels: Keywords, Syntactic Completions, Semantic Completions)
  • Results Aggregation
  • Contextual (pervasive, ambient) search. Yahoo! Connected TV: widget engine embedded into the TV. Yahoo! IntoNow: recognize audio and show related content.
  • Interactive Voice Search: Siri. Question-Answering: variety of backend sources including Wolfram Alpha and various Yahoo! services. Task completion: e.g. schedule an event.
  • Conversational Search: Google’s Interactive Voice Search
  • Conversational Search: Parlance EU project. Complex dialogs around a set of objects, e.g. restaurant: area, price range, type of cuisine. Complete system: Automated Speech Recognition (ASR), Spoken Language Understanding (SLU), Interaction Management, Knowledge Base, Natural Language Generation (NLG), Text-to-Speech (TTS). Video. Commercial alternatives from Nuance.
  • Behind the Scene
  • Main Technological Building Blocks. Query Interpretation: Spelling Correction; Query Segmentation; Entity Recognition; Query Intent Interpretation for Semantic Auto-Completion. Ranking: Entity Ranking; Relationship Ranking. Aggregation: Result Fusion; Rank / Score Aggregation. Result Presentation: Summary Generation; Visualization.
  • Semsolute’s Building Blocks - Keyword / Key Phrase Interpretation (example query: “address company san francisco”; figure label: Entity). Semantic entity index: inverted index for entities / triples; returns entities / entities’ relationships as results to keys. Semantic entity ranking: structured language model, with one language model for every attribute; returns the entities’ LMs that most likely generate the keywords, i.e. the entity descriptions that best match.
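The combination of an inverted entity index and per-attribute language models can be sketched as follows. This is a toy illustration under stated assumptions, not Semsolute's implementation: the entity data, the equal mixing of attribute models, and the `background_weight` smoothing parameter are all hypothetical choices.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: entity id -> attribute -> text value.
ENTITIES = {
    "acme": {"name": "Acme Software", "type": "company",
             "address": "1 Market Street San Francisco"},
    "bolt": {"name": "Bolt Logistics", "type": "company",
             "address": "5 Main Street Boston"},
    "sf": {"name": "San Francisco", "type": "city"},
}


def tokenize(text):
    return text.lower().split()


def build_index(entities):
    """Inverted index: term -> ids of entities whose attributes mention it."""
    index = defaultdict(set)
    for entity_id, attributes in entities.items():
        for value in attributes.values():
            for term in tokenize(value):
                index[term].add(entity_id)
    return index


def score(entity, keywords, background, background_weight=0.5):
    """Log-likelihood that the entity's attribute models generate the keywords.

    Each attribute gets its own unigram model; the models are mixed with equal
    weights and interpolated with a smoothed collection-wide background model.
    """
    models = [Counter(tokenize(value)) for value in entity.values()]
    total = sum(background.values())
    log_likelihood = 0.0
    for term in keywords:
        p_entity = sum(m[term] / sum(m.values()) for m in models) / len(models)
        p_background = (background[term] + 1) / (total + len(background))
        log_likelihood += math.log((1 - background_weight) * p_entity
                                   + background_weight * p_background)
    return log_likelihood


def search(query, entities):
    keywords = tokenize(query)
    index = build_index(entities)
    background = Counter(term for attributes in entities.values()
                         for value in attributes.values()
                         for term in tokenize(value))
    # Candidates are all entities that match at least one keyword.
    candidates = set().union(*(index.get(k, set()) for k in keywords))
    return sorted(candidates,
                  key=lambda eid: score(entities[eid], keywords, background),
                  reverse=True)


print(search("company boston", ENTITIES))
```

Keeping one model per attribute lets a match in a short field such as `type` count for more than the same match buried in a long field.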
  • Semsolute’s Building Blocks - Semantic Graph Construction (example query: “address company san francisco”; figure labels: Entity, Relationships / Structure). Offline component: query-independent schema graph; reuse schema; pseudo-schema construction: all possible connections between classes of entities, e.g. friendships between users. Online component: query-specific keyword matching elements; connect keyword matching elements / entities to the classes they belong to.
  • Semsolute’s Building Blocks - Graph Exploration (example query: “address company san francisco”; figure labels: Entity, Relationships / Structure). Top-k graph exploration: shortest-path based algorithm that finds top-k graphs connecting keyword matching elements. Top-k graph ranking: language model based; aggregated model that combines the LMs of entities matching the keywords.
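A much simplified version of shortest-path based exploration is sketched below: every node is tried as a root, the connecting graph is the union of the shortest paths from the root to each keyword matching element, and candidates are ordered by total path length. The toy graph, the hop-count cost, and the function names are assumptions for illustration; the ranking on the slide is language model based, which this sketch does not attempt.

```python
from collections import deque

# Hypothetical schema/data graph as an undirected adjacency list.
GRAPH = {
    "Company": ["address", "locatedIn", "headquarteredIn"],
    "address": ["Company"],
    "locatedIn": ["Company", "City"],
    "headquarteredIn": ["Company", "City"],
    "City": ["locatedIn", "headquarteredIn", "san francisco"],
    "san francisco": ["City"],
}


def bfs_paths(graph, source):
    """Shortest paths (by hop count) from source to every reachable node."""
    paths = {source: [source]}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in paths:
                paths[neighbor] = paths[node] + [neighbor]
                queue.append(neighbor)
    return paths


def top_k_connecting_graphs(graph, keyword_elements, k=3):
    """Return up to k distinct (cost, root, edges) candidates, cheapest first."""
    per_element = [bfs_paths(graph, element) for element in keyword_elements]
    candidates = []
    for root in graph:
        if all(root in paths for paths in per_element):
            edges, cost = set(), 0
            for paths in per_element:
                path = paths[root]
                cost += len(path) - 1
                edges.update(frozenset(pair) for pair in zip(path, path[1:]))
            candidates.append((cost, root, edges))
    candidates.sort(key=lambda c: (c[0], c[1]))
    results, seen = [], set()
    for cost, root, edges in candidates:
        key = frozenset(edges)
        if key not in seen:  # different roots can yield the same graph
            seen.add(key)
            results.append((cost, root, edges))
        if len(results) == k:
            break
    return results


results = top_k_connecting_graphs(
    GRAPH, ["address", "Company", "san francisco"], k=3)
for cost, root, edges in results:
    print(cost, root, sorted(sorted(edge) for edge in edges))
```

On this toy graph the two surviving candidates correspond to two interpretations of the query: companies located in the city and companies headquartered in it.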
  • Semsolute’s Building Blocks - Query Generation & Processing (example query: “address company san francisco”, rendered as “Address of companies located in San Francisco?”; figure labels: Entity, Relationships / Structure, Triple). Graph to query mapping: translation rules that map top ranked graphs to structured queries (SQL, SPARQL); translation rules that map structured queries to natural language questions. Graph matching: triple index (cover index supporting different triple patterns); various join implementations.
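To make the graph to query mapping concrete, the sketch below renders a list of triple patterns as a SPARQL SELECT string. The `ex:` vocabulary, the pattern list, and the function `graph_to_sparql` are hypothetical; real translation rules would also need to handle literals, filters and escaping.

```python
# Prefix declarations; the "ex" namespace is a hypothetical vocabulary.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ex": "http://example.org/",
}

# Hypothetical interpretation of "address company san francisco".
PATTERNS = [
    ("?company", "rdf:type", "ex:Company"),
    ("?company", "ex:locatedIn", "?city"),
    ("?city", "ex:name", '"San Francisco"'),
    ("?company", "ex:address", "?address"),
]


def graph_to_sparql(triple_patterns, select_vars, prefixes):
    """Render triple patterns as a SPARQL SELECT query string.

    Terms are emitted verbatim, so callers pass variables as '?name',
    literals already quoted, and other terms as prefixed names.
    """
    header = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    body = [f"  {s} {p} {o} ." for s, p, o in triple_patterns]
    select = "SELECT " + " ".join(select_vars) + " WHERE {"
    return "\n".join(header + [select] + body + ["}"])


query = graph_to_sparql(PATTERNS, ["?address"], PREFIXES)
print(query)
```

Each edge of a top ranked graph becomes one triple pattern, and the keyword that names an attribute (here the address) becomes the projected variable.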
  • Yahoo! Spark: Entity Recommendation in Search. Different use cases in Web Search. Some users are short on time: need direct answers; query expansion, question-answering, information boxes, rich results… Other users want to explore: long term interests such as sports, celebrities, movies and music; long running tasks such as travel planning. Spark is a search assistance tool for exploration: recommend related entities given the user’s current query, based on explicit relations in a Knowledge Base.
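The idea of recommending related entities from explicit knowledge base relations can be sketched as below. This is not Spark's method: the triples, the session data, and the ranking by simple co-occurrence counts are hypothetical stand-ins for a learned ranking model.

```python
from collections import Counter

# Hypothetical knowledge base triples and query-log sessions.
KB = [
    ("Tom Cruise", "starred_in", "Top Gun"),
    ("Tom Cruise", "starred_in", "The Firm"),
    ("Tom Cruise", "starred_in", "Mission: Impossible"),
    ("Top Gun", "directed_by", "Tony Scott"),
]
SESSIONS = [
    ["Tom Cruise", "Mission: Impossible"],
    ["Tom Cruise", "Top Gun"],
    ["Tom Cruise", "Mission: Impossible", "Top Gun"],
    ["Tom Cruise", "Mission: Impossible"],
]


def related_entities(entity, kb):
    """Entities directly connected to `entity`, with the connecting relation."""
    related = {}
    for subject, relation, obj in kb:
        if subject == entity:
            related[obj] = relation
        elif obj == entity:
            related[subject] = relation
    return related


def recommend(entity, kb, sessions, k=3):
    """Rank KB neighbours by how often they co-occur with the entity in sessions."""
    candidates = related_entities(entity, kb)
    cooccurrence = Counter()
    for session in sessions:
        if entity in session:
            cooccurrence.update(e for e in set(session) if e in candidates)
    ranked = sorted(candidates, key=lambda e: (-cooccurrence[e], e))
    return [(e, candidates[e], cooccurrence[e]) for e in ranked[:k]]


for name, relation, count in recommend("Tom Cruise", KB, SESSIONS):
    print(name, relation, count)
```

Restricting candidates to knowledge base neighbours keeps the recommendations explainable, since each one comes with the relation that links it to the query entity.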
  • Example user sessions
  • Spark example I.
  • Spark example II.
  • High-Level Architecture View (figure labels: Entity graph, Data preprocessing, Feature extraction, Model learning, Feature sources, Editorial judgements, Datapack, Ranking model, Ranking and disambiguation, Entity data, Features)
  • Spark challenges. Interpretation and disambiguation: Obama and Toyota are places in Japan, but maybe the user is not looking for them; the popularity of “obama” is not a sign of the popularity of a Japanese town. Ranking: “Release me” from Engelbert Humperdinck should rank higher than “Lesbian Seagull”, which only appeared on the soundtrack of a Beavis and Butthead episode; editorial relevance vs. what people click. Large-scale data processing and ML: Knowledge Base built from Wikipedia, Yahoo! data, Web extraction; feature extraction from query logs, Flickr and Twitter data.
  • Contact. Peter Mika: pmika@yahoo-inc.com, @pmika. Tran Duc Thanh: thanh.tran@semsolute.com
  • Resources
  • Resources Detailed information Peter Mika. Entity Search on the Web, Keynote at Web ofLinked Entities WS Peter Mika, Thanh Tran. Semantic search tutorialSemTech2012 Books Ricardo Baeza-Yates and Berthier Ribeiro-Neto. ModernInformation Retrieval. ACM Press. 2011 Survey papers Thanh Tran, Peter Mika. Survey of Semantic SearchApproaches. Under submission, 2012. Conferences and workshops ISWC, ESWC, WWW, SIGIR, CIKM, SemTech Semantic Search workshop series Exploiting Semantic Annotations in Information Retrieval(ESAIR) Entity-oriented Search (EOS) workshop Web of Linked Entities (WoLE) workshop