GContext: A context-based query construction service for Google
 

  • Impact of large-scale web search engines in information seeking. According to Alexa: Google, Facebook, YouTube, Yahoo!, Baidu, Wikipedia, Windows Live, Twitter, QQ, Amazon


  • GContext: A context-based query construction service for Google
    Ioannis Apostolatos and Ioannis Papadakis, Ionian University, Greece
  • Presentation outline
    • Introduction
    • Rationale
    • Proposed approach
    • Usage scenarios
    • Discussion
  • Introduction At the web, information about virtually anything can be found, provided that a searcher knows where to look Searchers largely rely on large-scale web search engines – SE in order to get assistance in locating useful resources The quality of the search results depends on the ability of the searchers to accurately express their information needs as keywords in the search engines input box How do SE aid their users in creating successful queries?
  • Rationale The query construction phase of a search session is crucial to the fulfillment of the searchers‟ information needs During the query construction phase, a searcher has to express his information needs according to the specific dialect (i.e. keywords-based) of the underlying SE The searcher has to guess the words that the SE has chosen to index the web resources that correspond to such needs
  • Rationale Spoken languages have certain features that should be taken under consideration:  Polysemy of words  Polysemy occurs when a word has more than one sense  A query that consists of an ambiguous word without further information that correctly disambiguates it may result in a search results list with completely useless information  Synonymy of words  Synonymy occurs when two or more words share the same meaning  The probability of two persons using the same term in describing the same thing is less than 20%
  • Proposed approach A query construction/refinement service on top of Google SE that is powered by the LOD cloud and especially DBpedia The proposed service is a two-step process: 1. Initially, it provides autosuggest functionality by reacting to the corresponding keystrokes of a searcher  Prefix search is performed to an index that is comprised of words and/or phrases originating from Wikipedia and made available through Dbpedia („article titles‟ dataset)  Such functionality facilitates query disambiguations, since Wikipedias disambiguations follow a pattern that is promoted by prefix search  i.e. <ambiguous word> (disambiguation info)) e.g. bass (fish)  DBpedia‟s suggestions are appended to Google‟s original suggestions
  • Proposed approach The proposed service is a two-step process: (continue…) 2. Upon selection of a suggestion, the searcher is offered the chance to refine the initial query through the appropriate interactions that are provided by the service (i.e. query replacements and refinements)  Query replacements and refinements derive from the results of SPARQL queries that are addressed to DBpedias endpoint Every interaction results to the construction of an appropriate query that is addressed to Googles Custom Search, which, in turn, provides the corresponding search results
  • Proposed approach – under the hood: Query replacements
    • Words or phrases that are alternatives to the suggestion the user has chosen from the search box
    • They are actually Wikipedia's redirections to the title of the article that the user selected from the search box
    • The SPARQL query revolves around the <http://dbpedia.org/ontology/wikiPageRedirects> predicate
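A query of the kind described above might look like the following sketch. It only builds the SPARQL text and does not contact any endpoint. The exact query used by the service is not shown in the slides, so the use of `rdfs:label`, the language filter, and the function name `redirects_query` are assumptions.

```python
REDIRECTS = "http://dbpedia.org/ontology/wikiPageRedirects"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def redirects_query(resource_uri, lang="en"):
    """Build a SPARQL query listing the labels of pages that redirect to
    `resource_uri`. Fetching labels and filtering by language are assumptions."""
    return (
        "SELECT DISTINCT ?label WHERE {\n"
        f"  ?redirect <{REDIRECTS}> <{resource_uri}> .\n"
        f"  ?redirect <{RDFS_LABEL}> ?label .\n"
        f'  FILTER (lang(?label) = "{lang}")\n'
        "}"
    )


# Example resource URI, chosen for illustration only.
print(redirects_query("http://dbpedia.org/resource/Jaguar"))
```

Each returned label is a candidate query replacement, i.e. an alternative wording for the selected article title.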
  • Proposed approach – under the hood: Query refinements
    • Query refinements are keywords that a user can add to the initial query in order to refine it semantically. They are organized in three groups:
      • Categories
      • WordNet categories
      • Context words
    • The Categories group is populated with the categories of the Wikipedia article that the user selected from the search box
      • The corresponding SPARQL query revolves around the <http://purl.org/dc/terms/subject> predicate
    • The WordNet categories group is populated with the WordNet categories of the Wikipedia title that the user selected from the search box
      • The corresponding SPARQL query revolves around the <http://dbpedia.org/property/wordnet_type> predicate
    • The Context words group is populated with information deriving from the infobox of the corresponding Wikipedia article
      • The corresponding SPARQL query revolves around the <http://dbpedia.org/property/.*> predicate, along with numerous "FILTER" clauses
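The three refinement groups could be fetched with queries along the lines of the sketch below, which only builds the SPARQL strings. The function names are hypothetical, and the single `FILTER regex` on the predicate namespace is a stand-in assumption for the "numerous FILTER clauses" mentioned above, whose actual content is not given in the slides.

```python
DCT_SUBJECT = "http://purl.org/dc/terms/subject"
WORDNET_TYPE = "http://dbpedia.org/property/wordnet_type"
PROPERTY_NS = "http://dbpedia.org/property/"


def fixed_predicate_query(resource_uri, predicate_uri):
    """Values of one known predicate (Categories or WordNet categories)."""
    return (
        "SELECT DISTINCT ?value WHERE { "
        f"<{resource_uri}> <{predicate_uri}> ?value . "
        "}"
    )


def infobox_query(resource_uri):
    """All infobox-derived properties. The regex filter on the predicate
    namespace is an assumption standing in for the service's real filters."""
    return (
        "SELECT DISTINCT ?p ?value WHERE { "
        f"<{resource_uri}> ?p ?value . "
        f'FILTER regex(str(?p), "^{PROPERTY_NS}") '
        "}"
    )


def refinement_queries(resource_uri):
    """One SPARQL query per refinement group named on the slide."""
    return {
        "categories": fixed_predicate_query(resource_uri, DCT_SUBJECT),
        "wordnet_categories": fixed_predicate_query(resource_uri, WORDNET_TYPE),
        "context_words": infobox_query(resource_uri),
    }
```

Keeping the two fixed-predicate groups as plain triple patterns avoids regex filtering where it is not needed; only the infobox group has to match a whole predicate namespace.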
  • Usage scenarios: Autosuggestions
    Dealing with ambiguous queries: Jaguar, the hero from Archie Comics
  • Usage scenarios: Query replacements
  • Usage scenarios: Query refinements
  • Discussion  So, can we compete Google? Certainly not:  Linked data is full of „noise‟  Things could improve if we all put some effort into it: http://pedantic-web.org/  SPARQL endpoints are often too slow to respond  Unions are expensive  “FILTER regex” clauses take forever to resolve  Maybe the Database community provides solutions that will speed things up  Size matters  Google‟s index size is far greater and fresher  And much more…
  • Discussion  Then, why bother?  We believe that GContext can be seamlessly integrated with any major search engine that provides access to it‟s search box  What about the „knowledge graph‟?  Too early to jump to any conclusions. It was announced on May 16th, so far only partially deployed  A proof that we are on the right tracks:  “… go deeper and broader” i.e. infoboxes from DBpedia  “… Find the right thing” i.e. PageRedirects from DBpedia
  • Discussion  Thank you very much,  Questions?