Mobile: Google interactive voice search (conversation), Siri (Peter); Facebook’s Graph Search (Thanh); Knowledge Graph (infoboxes)... entity search (“tom cruise actor”) to list/category queries (“tom cruise spouses”) to question-answering (“tom cruise height”) (Thanh); Spark (Yahoo!): related entity recommendation (Peter); Thanh’s search engine: auto-complete based on the schema/data, entity search to relational search using Yago data (Thanh); Glimmer: RDF search engine (Peter)
Semantic search can be seen as a retrieval paradigm centered on the use of semantics: a system that incorporates the semantics entailed by the query and (or) the resources into the matching process essentially performs semantic search.
Facebook invited, but continues to pursue OGP
We implemented the search paradigms and integrated them as separate search modules into a demonstrator system of the Information Workbench, which has been developed as a showcase for interaction with the Web of data. In particular, keyword search is implemented according to the design and technologies employed by standard Semantic Web search engines. Like Sindice and FalconS, we use an inverted index to store and retrieve RDF resources based on terms. Faceted search is also implemented on top of the inverted index, using techniques from prior work. Result completion is based on recent work on the TASTIER system. For computing join graphs, we use a previously published top-k procedure. This technique is also used for computing top-k interpretations, i.e. to support query completion. We choose to display the top-6 queries and the top-25 results, respectively.
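As a rough illustration of the term-based inverted index mentioned above, the following Python sketch indexes RDF resources by the terms in their literal values and answers conjunctive keyword queries. It is a minimal sketch, not the demonstrator's implementation: the `ex:` resources, the predicates, and the helper names `build_index` and `keyword_search` are all hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase a string and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(triples):
    """Map every term found in an object value to the subjects it describes."""
    index = defaultdict(set)
    for subject, _predicate, obj in triples:
        for term in tokenize(obj):
            index[term].add(subject)
    return index

def keyword_search(index, query):
    """Return the resources whose descriptions contain all query terms."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)

# Hypothetical toy data, for illustration only.
TRIPLES = [
    ("ex:Acme", "rdfs:label", "Acme Corporation"),
    ("ex:Acme", "ex:city", "San Francisco"),
    ("ex:Globex", "rdfs:label", "Globex Corporation"),
    ("ex:Globex", "ex:city", "Springfield"),
]

INDEX = build_index(TRIPLES)
print(keyword_search(INDEX, "corporation francisco"))
```

Faceted search and result completion could reuse the same postings, but that is beyond what this sketch shows.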
Transcript of "Recent Trends in Semantic Search Technologies"
Peter Mika | Yahoo! Research, Spain, pmika@yahoo-inc.com; Thanh Tran | Semsolute, Germany, Tran@semsolute.com; Semantic Search on the Rise
About the speakers Peter Mika Senior Research Scientist Head of Semantic Search group at Yahoo! Labs Expertise: Semantic Search, Web Object Retrieval, Natural Language Processing Tran Duc Thanh CEO of Semsolute, Semantic Search Technologies Company Served as Assistant Professor for Karlsruhe Institute of Technology and Stanford University Expertise: Semantic Search, Semantic / Linked Data Management
Agenda Why Semantic Search What is Semantic Search Innovative Semantic Search Applications Behind the Scene Questions
Why Semantic Search? I. “We are at the beginning of search.” (Marissa Mayer) Solved large classes of queries, e.g. navigational Remaining queries are hard, not solvable by brute force, require deep understanding of the world and human cognition, e.g. Ambiguous searches: paris hilton Imprecise or overly precise searches Searches for descriptions: 34 year old computer scientist living in barcelona Background knowledge and metadata can help to address poorly solved queries Many of these queries would not be asked by users, who learned over time what search technology can and cannot do.
Why Semantic Search? II. The Semantic Web is now a reality Large amounts of data published in RDF Linked Data Metadata in HTML Facebook’s Open Graph Protocol Schema.org Casual users Don’t know SPARQL Unaware of the schema of the data Searching data instead of or in addition to searching documents Enable innovative search applications / tasks
Semantic Search: Using Semantic Models for Search Semantic search is a retrieval paradigm that Exploits the semantics of the data or explicit background knowledge to understand user intent and the meaning of content Incorporates the intent of the query and the meaning of content into the search process (semantic models)
Semantic Search: Different Kinds / Different Uses of Semantic Models Wide range of semantic search systems Employ different semantic models, possibly at different steps of the search process and in order to support different tasks Query formulation Query processing / understanding Ranking Result presentation Result / query refinement
Semantic models Semantics is concerned with the meaning of the resources made available for search Various representations of meaning Word-level models: models of relationships among words Taxonomies, thesauri, dictionaries of entity names Inference along linguistic relations, e.g. broader/narrower terms Concept-level models: models of relationships among objects Ontologies capture entities in the world and their relationships Inference along domain-specific relations
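To make "inference along linguistic relations" concrete, here is a small Python sketch of query expansion over broader/narrower terms. It is only an illustration under stated assumptions: the toy thesaurus and the function names `narrower_terms` and `expand_query` are hypothetical and do not come from the talk.

```python
# Hypothetical thesaurus: each term maps to its direct narrower terms.
THESAURUS = {
    "vehicle": ["car", "bicycle"],
    "car": ["sedan", "convertible"],
}

def narrower_terms(term, thesaurus):
    """Collect all narrower terms of `term`, following the relation transitively."""
    found = []
    stack = list(thesaurus.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(thesaurus.get(current, []))
    return found

def expand_query(terms, thesaurus):
    """Add every narrower term to the query, keeping the original terms first."""
    expanded = []
    for term in terms:
        for candidate in [term] + narrower_terms(term, thesaurus):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

print(expand_query(["vehicle"], THESAURUS))
```

A document that mentions only "sedan" can then match a query for "vehicle", which is the kind of word-level inference the slide refers to.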
Graph-based Conceptual Models Core of W3C standards for knowledge representation and data exchange: RDF, OWL Large amount of data / knowledge on the Web available as graphs Linked Data: hundreds of interconnected datasets capturing domain-independent and domain-specific knowledge Metadata in HTML RDFa, microdata, Facebook’s OGP Private graphs Google’s Knowledge Graph Facebook Graph Yahoo’s Knowledge Base (talk yesterday) Microsoft’s Satori
Where can you find Linked Data? Downloads DBpedia data dumps SPARQL access LOD cache by OpenLink: 51 billion triples Keyword search Sindice by SindiceTech
Google Knowledge Graph Start with Freebase’s database, which had 12 million entities As of June 2012, Knowledge Graph has 500 million entities and over 3.5 billion relationships between those entities Prioritize properties based on what users were most
Facebook’s Open Graph Protocol The ‘Like’ button provides publishers with a way to promote their content on Facebook and build communities Shows up in profiles and news feed Site owners can later reach users who have liked an object Facebook Graph API allows 3rd party developers to access the data Open Graph Protocol is an RDFa-based format that allows publishers to describe the object that the user ‘Likes’
Facebook’s Open Graph Protocol RDF vocabulary to be used in conjunction with RDFa Simplify the work of developers by restricting the freedom in RDFa Activities, Businesses, Groups, Organizations, People, Places, Products and Entertainment Only HTML <head> accepted http://opengraphprotocol.org/
<html xmlns:og="http://opengraphprotocol.org/schema/">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
…
</head>
...
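Because Open Graph metadata is restricted to `<meta>` elements in the page head, a consumer can read it with a very small parser. The Python sketch below uses the standard library's `html.parser` to collect `og:` properties from a page modelled on the slide's example. The class name `OpenGraphParser` and the function `extract_open_graph` are hypothetical, and the sketch ignores repeated properties and the rest of RDFa.

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta property="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here by default.
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:") and "content" in attributes:
            self.properties[prop] = attributes["content"]

def extract_open_graph(html):
    """Return a dict of Open Graph properties found in the given HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties

# Sample page based on the slide's example (elided parts left out).
PAGE = """<html xmlns:og="http://opengraphprotocol.org/schema/">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
</head>
</html>"""

print(extract_open_graph(PAGE))
```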
Semantic Web markup: schema.org Agreement on a shared set of schemas for common types of web content Use a single format to communicate the same information to all three search engines Bing, Google, and Yahoo! (June, 2011), Yandex (Nov, 2011) Microdata and RDFa support Schemas for most common web content Business listings, images/video, recipes, reviews, products, jobs… Community firstname.lastname@example.org
Current state of metadata on the Web Analysis of the Bing/Yahoo! Search Crawl US crawl, January, 2012 31% of webpages, 5% of domains contain some metadata P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012 WebDataCommons.org Data extracted from a public crawl (commoncrawl.org) February, 2012 results show 11% of URLs with metadata compared to 5% in 2009/2010 data 7.3 billion triples available for download H. Mühleisen, C. Bizer. Web Data Commons - Extracting Structured Data from Two Large Web Corpora, LDOW 2012 Large increase in RDFa and microdata adoption compared to microformats
Where can you find HTML metadata? Web Data Commons Glimmer: glimmer.research.yahoo.com Online index of the schema.org data in Web Data Commons
Innovative Semantic Search Applications Entity search: entity/entities as results Factual search: direct answers, facts (about entities) Relational search: complex relationships between entities Semantic auto-completion: suggesting queries based on the intent of the provided inputs Results aggregation / analysis / prediction: apply computational models Semantic log analysis: understanding user behavior in terms of objects Semantic profiling: recommendations based on particular interests Semantic context: contextual model of users / interests Support for complex tasks, e.g. booking a vacation using a combination of services Conversational search
Conversational Search Parlance EU project Complex dialogs around a set of objects Restaurant Area Price range Type of cuisine Complete system Automated Speech Recognition (ASR) Spoken Language Understanding (SLU) Interaction Management Knowledge Base Natural Language Generation (NLG) Text-to-Speech (TTS) Video Commercial alternatives from Nuance
Main Technological Building Blocks Query Interpretation Spelling Correction Query Segmentation Entity Recognition Query Intent Interpretation for Semantic Auto-Completion Ranking Entity Ranking Relationship Ranking Aggregation Result Fusion Rank / Score Aggregation Result Presentation Summary Generation Visualization
Semsolute’s Building Blocks - Keyword / Key Phrase Interpretation [Diagram: Entity; example query “address company san francisco”] Semantic entity index Inverted index for entities / triples Return entities / entities’ relationships as results to keys Semantic entity ranking Structured language model: one language model for every attribute Returns entities’ LMs that most likely generate the keywords, i.e. the entity descriptions that best match
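The structured language model idea (one language model per attribute, with entities ranked by how likely their models are to generate the keywords) can be sketched as follows. This is a minimal illustration, not Semsolute's implementation. The Jelinek-Mercer smoothing, the fixed attribute weights, the toy entities, and the names `score_entity` and `rank` are all assumptions made for the example.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def build_background(entities):
    """Collection-wide term probabilities, used for smoothing (an assumption)."""
    counts = Counter()
    for attributes in entities.values():
        for value in attributes.values():
            counts.update(tokenize(value))
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}

def score_entity(attributes, query, weights, background, lam=0.1):
    """Log-probability that the entity's per-attribute LMs generate the query."""
    log_prob = 0.0
    for term in tokenize(query):
        p_term = 0.0
        for attribute, weight in weights.items():
            tokens = tokenize(attributes.get(attribute, ""))
            p_attr = Counter(tokens)[term] / len(tokens) if tokens else 0.0
            # Mix the attribute model with the background model.
            smoothed = (1 - lam) * p_attr + lam * background.get(term, 1e-6)
            p_term += weight * smoothed
        log_prob += math.log(p_term)
    return log_prob

def rank(entities, query, weights):
    """Return (entity, score) pairs, best match first."""
    background = build_background(entities)
    scored = [(uri, score_entity(attrs, query, weights, background))
              for uri, attrs in entities.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical entities and attribute weights.
ENTITIES = {
    "ex:Acme": {"name": "Acme Software", "city": "San Francisco"},
    "ex:Globex": {"name": "Globex Energy", "city": "Springfield"},
}
WEIGHTS = {"name": 0.5, "city": 0.5}

print(rank(ENTITIES, "san francisco", WEIGHTS))
```

Keeping a separate model per attribute lets the weights express that a match in, say, a name should count differently from a match in a city field.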
Semsolute’s Building Blocks – Semantic Graph Construction [Diagram: Relationships / Structure, Entity; example query “address company san francisco”] Offline component: query-independent schema graph Reuse schema Pseudo-schema construction: all possible connections between classes of entities, e.g. friendships between users Online component: query-specific keyword matching elements Connect keyword matching elements / entities to the classes they belong to
Semsolute’s Building Blocks – Graph Exploration [Diagram: Relationships / Structure, Entity; example query “address company san francisco”] Top-k graph exploration Shortest-path based algorithm that finds top-k graphs connecting keyword matching elements Top-k graph ranking Language model based Aggregated model that combines the LMs of entities matching the keywords
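A heavily simplified way to picture shortest-path based exploration: run a breadth-first search from the elements matching each keyword, then rank every node that all searches reach by the sum of its distances. The Python sketch below does exactly that. It is a stand-in for illustration, not the top-k algorithm used by Semsolute, and it ranks by path length rather than by language models. The toy schema graph and the name `top_k_connections` are hypothetical.

```python
from collections import deque

def bfs_distances(graph, sources):
    """Hop distance from the nearest source to every reachable node."""
    dist = {source: 0 for source in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def top_k_connections(graph, keyword_matches, k=3):
    """Rank connecting nodes by the total distance to all keyword elements."""
    per_keyword = [bfs_distances(graph, nodes)
                   for nodes in keyword_matches.values()]
    candidates = []
    for node in graph:
        if all(node in dist for dist in per_keyword):
            cost = sum(dist[node] for dist in per_keyword)
            candidates.append((cost, node))
    return sorted(candidates)[:k]

# Hypothetical undirected schema graph with one keyword-matching value node.
GRAPH = {
    "Company": ["Address", "City", "Person"],
    "Address": ["Company"],
    "City": ["Company", "san francisco"],
    "Person": ["Company"],
    "san francisco": ["City"],
}
MATCHES = {
    "address": ["Address"],
    "company": ["Company"],
    "san francisco": ["san francisco"],
}

print(top_k_connections(GRAPH, MATCHES, k=3))
```

Each returned node is the root of a candidate graph: the shortest paths from it to the keyword elements form one interpretation of the query.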
Semsolute’s Building Blocks – Query Generation & Processing [Diagram: Triple, Relationships / Structure, Entity; example query “address company san francisco”; question “Address of companies located in San Francisco?”] Graph to query mapping Translation rules that map top ranked graphs to structured queries (SQL, SPARQL) Translation rules that map structured queries to natural language questions Graph matching Triple index: cover index supporting different triple patterns Various join implementations
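The graph-to-query mapping can be illustrated with a single translation rule: every edge of a top-ranked graph becomes one SPARQL triple pattern. The Python sketch below applies that rule to the slide's example question. The `ex:` vocabulary, the prefix IRIs for it, and the function name `graph_to_sparql` are assumptions made for illustration, not the rules Semsolute uses.

```python
# Hypothetical prefixes; only the rdf: namespace IRI is a real W3C namespace.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ex": "http://example.org/",
}

def graph_to_sparql(edges, select, prefixes):
    """Translate graph edges into a SPARQL SELECT query.

    Each edge is a (subject, predicate, object) tuple; terms that start
    with '?' are variables, everything else is a prefixed name.
    """
    header = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    patterns = [f"  {s} {p} {o} ." for s, p, o in edges]
    body = ["SELECT " + " ".join(select) + " WHERE {"] + patterns + ["}"]
    return "\n".join(header + body)

# "Address of companies located in San Francisco?" as a small query graph.
EDGES = [
    ("?company", "rdf:type", "ex:Company"),
    ("?company", "ex:address", "?address"),
    ("?company", "ex:locatedIn", "ex:San_Francisco"),
]

QUERY = graph_to_sparql(EDGES, ["?address"], PREFIXES)
print(QUERY)
```

A corresponding rule set for SQL would map classes to tables and edges to join conditions instead of triple patterns.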
Yahoo! Spark: Entity Recommendation in Search Different use cases in Web Search Some users are short on time Need direct answers Query expansion, question-answering, information boxes, rich results… Other users want to explore Long term interests such as sports, celebrities, movies and music Long running tasks such as travel planning Spark is a search assistance tool for exploration Recommend related entities given the user’s current query Based on explicit relations in a Knowledge Base
Spark challenges Interpretation and disambiguation Obama and Toyota are places in Japan, but maybe the user is not looking for them The popularity of “obama” is not a sign of the popularity of a Japanese town Ranking “Release me” from Engelbert Humperdinck should rank higher than “Lesbian Seagull” which only appeared on the soundtrack of a Beavis and Butthead episode Editorial relevance vs. what people click Large-scale data processing and ML Knowledge Base built from Wikipedia, Yahoo! data, Web extraction Feature extraction from query logs, Flickr and Twitter data [Diagram: Entity graph, Data preprocessing, Feature extraction, Model learning, Feature sources, Editorial judgements, Data pack, Ranking model, Ranking and disambiguation, Entity data, Features]
Contact Peter Mika email@example.com @pmika Tran Duc Thanh firstname.lastname@example.org