Improving Semantic Search Using Query Log Analysis


Despite the attention Semantic Search is continuously gaining, several challenges affecting tool performance and user experience remain unsolved. Among these are matching user terms with the search space, adopting view-based interfaces in the Open Web, and supporting users while they build their queries. This paper proposes an approach that moves a step towards tackling these challenges by creating models of the usage of Linked Data concepts and properties, extracted from semantic query logs as a source of collaborative knowledge. We use two sets of query logs from the USEWOD workshops to create our models and show their potential in the areas mentioned.

Published in: Technology, Education

  1. Improving Semantic Search Using Query Log Analysis
     Khadija Elbedweihy, Stuart N. Wrigley and Fabio Ciravegna
     OAK Research Group, Department of Computer Science, University of Sheffield, UK
  2. Outline
     • Introduction
     • Semantic Query Logs Analysis
       - Query-Concepts Model
       - Concept-Predicates Model
       - Instance-Types Model
     • Results Augmentation
     • Data Visualisation
  4. Motivation
     • Little work on results returned (answers) and presentation style.
       – Users want direct answers augmented with more information for a richer experience [1]
       – Users want a more user-friendly and attractive results presentation format [1]
     • Semantic query logs: logs of queries issued to repositories containing RDF data.
     [1] See our paper from this morning’s IWEST 2012 workshop
  5. Related Work
     Semantic query logs analysis:
     • Moller et al. identified patterns of Linked Data usage with respect to different types of agents.
     • Arias et al. analysed the structure of the SPARQL queries to identify the most frequent language elements.
     • Luczak-Rösch et al. analysed query logs to detect errors and weaknesses in LD ontologies and support their maintenance.
  6. Related Work (cont’d)
     How our work is different:
     Analyse semantic query logs to produce models capturing different patterns of information needs on Linked Data:
     • Concepts used together in a query: query-concepts model
     • Predicates used with a concept: concept-predicates model
     • Concepts used as types of an LD entity: instance-types model
     The models make use of the “collaborative knowledge” inherent in the logs to enhance the search process.
  8. Extraction
     • Query log entries follow the Combined Log Format (CLF).
     • Extract the SPARQL query from each entry:
         SELECT DISTINCT ?genre, ?instrument WHERE {
           <……/Ringo_Starr> ?rel <……/The_Beatles>.
           <……/Ringo_Starr> dbpedia:genre ?genre.
           <……/Ringo_Starr> dbpedia:instrument ?instrument.
         }
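The extraction step could be sketched roughly as follows. This is only an illustration: it assumes the query travels in a URL parameter named `query` (as in the SPARQL protocol), and the log entry, host and user agent in `sample` are invented for the example rather than taken from the USEWOD data.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Matches the request part of a Combined Log Format entry: "METHOD path PROTOCOL"
REQUEST_RE = re.compile(r'"(?:GET|POST) (?P<path>\S+) HTTP/[\d.]+"')

def extract_sparql(log_line):
    """Return the URL-decoded SPARQL query from a log line, or None if absent."""
    match = REQUEST_RE.search(log_line)
    if match is None:
        return None
    query_string = urlsplit(match.group("path")).query
    # Assumption: the query is carried in a parameter called "query".
    values = parse_qs(query_string).get("query")
    return values[0] if values else None

# Hypothetical log entry, for illustration only.
sample = ('127.0.0.1 - - [01/Jan/2011:00:00:00 +0000] '
          '"GET /sparql?query=SELECT+%3Fs+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D HTTP/1.1" '
          '200 1234 "-" "example-agent"')
print(extract_sparql(sample))
```

Entries without a recognisable request or without a `query` parameter simply yield `None`, so they can be skipped during analysis.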
  9. Analysis
         SELECT DISTINCT ?genre, ?instrument WHERE {
           <……/Ringo_Starr> ?rel <……/The_Beatles>.
           <……/Ringo_Starr> dbpedia:genre ?genre.
           <……/Ringo_Starr> dbpedia:instrument ?instrument.
         }
     • For each bound resource (subject or object) -> query the endpoint for the type of the resource
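A minimal sketch of this analysis step, assuming the triple patterns have already been parsed into (subject, predicate, object) tuples and that bound resources appear as IRIs in angle brackets. The `example.org` IRIs are placeholders, not the elided DBpedia paths from the slide, and the function names are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def is_bound_resource(term):
    """Treat a term as a bound resource if it is an IRI in angle brackets."""
    return term.startswith("<") and term.endswith(">")

def bound_resources(triple_patterns):
    """Collect bound subjects and objects, keeping first-seen order."""
    seen = []
    for subject, _predicate, obj in triple_patterns:
        for term in (subject, obj):
            if is_bound_resource(term) and term not in seen:
                seen.append(term)
    return seen

def type_query(resource):
    """Build the SPARQL query that asks the endpoint for a resource's types."""
    return f"SELECT DISTINCT ?type WHERE {{ {resource} <{RDF_TYPE}> ?type }}"

# Placeholder IRIs standing in for the resources on the slide.
patterns = [
    ("<http://example.org/Ringo_Starr>", "?rel", "<http://example.org/The_Beatles>"),
    ("<http://example.org/Ringo_Starr>", "dbpedia:genre", "?genre"),
]
for resource in bound_resources(patterns):
    print(type_query(resource))
```

Sending the generated queries to an endpoint is left out here; only the query construction is shown.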
  10. Query-Concepts Model
         SELECT DISTINCT ?genre, ?instrument WHERE {
           <……/Ringo_Starr> ?rel <……/The_Beatles>.
           <……/Ringo_Starr> dbpedia:instrument ?instrument.
         }
     1) Retrieve types of resources in the query:
        Ringo_Starr type dbpedia-owl:MusicalArtist, umbel:MusicalPerformer
        The_Beatles type dbpedia-owl:Band, schema:MusicGroup
     2) Increment the co-occurrence of each concept in the first list with each concept in the second:
        (MusicalArtist, Band)
        (MusicalPerformer, MusicGroup)
        (MusicalArtist, MusicGroup)
        (MusicalPerformer, Band)
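The co-occurrence update in step 2 might look like the sketch below. It assumes the model is a simple counter keyed by unordered concept pairs; the slides do not say whether pairs are ordered, so that choice, and the function name, are assumptions.

```python
from collections import Counter
from itertools import product

def update_query_concepts(model, types_a, types_b):
    """Increment the count of every cross pair of concepts from two resources."""
    for pair in product(types_a, types_b):
        # Assumption: pairs are unordered, so store them in sorted order.
        model[tuple(sorted(pair))] += 1
    return model

query_concepts = Counter()
update_query_concepts(
    query_concepts,
    ["MusicalArtist", "MusicalPerformer"],  # types of Ringo_Starr
    ["Band", "MusicGroup"],                 # types of The_Beatles
)
for pair, count in sorted(query_concepts.items()):
    print(pair, count)
```

Running the update once per analysed query accumulates the counts across the whole log.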
  11. Concept-Predicates Model
         SELECT DISTINCT ?genre, ?instrument WHERE {
           <……/Ringo_Starr> ?rel <……/The_Beatles>.
           <……/Ringo_Starr> dbpedia:genre ?genre.
           <……/Ringo_Starr> dbpedia:instrument ?instrument.
         }
     1) Retrieve types of resources used as subjects in the query:
        Ringo_Starr type dbpedia-owl:MusicalArtist, umbel:MusicalPerformer
     2) Identify bound predicates (dbpedia:genre, dbpedia:instrument)
     3) Increment the co-occurrence of each type with the predicate used in the same triple pattern:
        (MusicalPerformer, genre)
        (MusicalPerformer, instrument)
        (MusicalArtist, genre)
        (MusicalArtist, instrument)
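These three steps can be sketched as follows, assuming triple patterns are available as (subject, predicate, object) tuples and that a lookup table `types_of` already holds the types retrieved from the endpoint. Both the data layout and the names are illustrative assumptions.

```python
from collections import Counter

def update_concept_predicates(model, triple_patterns, types_of):
    """Count each (subject type, bound predicate) pair found in one triple pattern."""
    for subject, predicate, _obj in triple_patterns:
        if predicate.startswith("?"):
            continue  # unbound predicate (a variable), nothing to record
        # Subjects that are variables have no entry in types_of and are skipped.
        for concept in types_of.get(subject, ()):
            model[(concept, predicate)] += 1
    return model

patterns = [
    ("Ringo_Starr", "?rel", "The_Beatles"),
    ("Ringo_Starr", "dbpedia:genre", "?genre"),
    ("Ringo_Starr", "dbpedia:instrument", "?instrument"),
]
types_of = {"Ringo_Starr": ["MusicalArtist", "MusicalPerformer"]}

concept_predicates = update_concept_predicates(Counter(), patterns, types_of)
for pair, count in sorted(concept_predicates.items()):
    print(pair, count)
```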
  12. Instance-Types Model
         SELECT DISTINCT ?genre, ?instrument WHERE {
           <……/Ringo_Starr> ?rel <……/The_Beatles>.
           <……/Ringo_Starr> dbpedia:instrument ?instrument.
         }
     1) Retrieve types of resources in the query:
        Ringo_Starr type dbpedia-owl:MusicalArtist, umbel:MusicalPerformer
        The_Beatles type dbpedia-owl:Band, schema:MusicGroup
     2) Increment the co-occurrence of concepts found as types for the same instance:
        (MusicalArtist, MusicalPerformer)
        (Band, MusicGroup)
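A possible sketch of the instance-types update, again assuming a `types_of` lookup that maps each instance to the types returned by the endpoint. Storing each pair in sorted order is an assumption made here so that (A, B) and (B, A) share one counter entry.

```python
from collections import Counter
from itertools import combinations

def update_instance_types(model, types_of):
    """Count every pair of concepts that appear as types of the same instance."""
    for types in types_of.values():
        # sorted(set(...)) removes duplicates and fixes the order within a pair.
        for pair in combinations(sorted(set(types)), 2):
            model[pair] += 1
    return model

types_of = {
    "Ringo_Starr": ["MusicalArtist", "MusicalPerformer"],
    "The_Beatles": ["Band", "MusicGroup"],
}
instance_types = update_instance_types(Counter(), types_of)
print(dict(instance_types))
```

Unlike the query-concepts model, no pair is formed across different instances: only types of the same instance are combined.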
  14. Dataset
     • Two sets of DBpedia query logs made available at the USEWOD2011 and USEWOD2012 workshops.
     • The logs contained around 5 million queries issued to DBpedia over a time period spanning almost 2 years.

                                               USEWOD2012   USEWOD2011
     Number of analyzed queries                   8866028      4951803
     Number of unique triple patterns             4095011      2641098
     Number of unique bound triple patterns       3619216      2571662
  15. Results Enhancement
     • Google, Yahoo!, Bing, etc. enhance search results using structured data.
     • FalconS and VisiNav return extra information together with each entity in the answers (e.g. type, label).
     • Evaluation of Semantic Search showed that augmenting answers with extra information provides a richer user experience [2].
     [2] See our paper from this morning’s IWEST 2012 workshop
  16. FalconS Results
     Query: “population of New York city”
     • The information shown depends on a manually (randomly) predefined set.
  17. Motivation for proposed approach
     • Utilize query logs as a source of collaborative knowledge able to capture implicit associations between Linked Data entities and properties.
     • Use this to select which information to show the user.
     • Two recent studies [3] analyzed semantic query logs and observed that a class of entities is usually queried with similar relations and concepts.
     [3] Luczak-Rösch et al.; Elbedweihy et al.
  18. Two Related Types of Result Augmentation
     1. Additional result-related information.
        – More details about each result item
        – Provides better understanding of the answer.
     2. Additional query-related information.
        – More results related to the query entities
        – Assists users in discovering useful findings (serendipity)
  19. Return additional result-related information
     Steps
     1) For each result item, find the types of the instance.
     2) Extract the most frequently queried predicates associated with those types from the concept-predicates model.
     3) Generate queries with each pair (instance, predicate), e.g. (<……/Ringo_Starr>, genre).
     4) Show aggregated results to the user.
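The first three steps above could be sketched like this. The counts in `model` are made up for illustration, and summing counts across all types of an instance is one possible aggregation that the slides do not specify; the function names are hypothetical.

```python
from collections import Counter

def top_predicates(model, concepts, k=3):
    """Rank predicates by how often they were queried with any of the concepts."""
    totals = Counter()
    for (concept, predicate), count in model.items():
        if concept in concepts:
            # Assumption: counts are summed over all types of the instance.
            totals[predicate] += count
    return [predicate for predicate, _ in totals.most_common(k)]

def augmentation_pairs(instance, instance_types, model, k=3):
    """Pair the instance with each top predicate; each pair drives one follow-up query."""
    return [(instance, p) for p in top_predicates(model, set(instance_types), k)]

# Illustrative counts only, not figures from the USEWOD logs.
model = Counter({
    ("MusicalArtist", "genre"): 5,
    ("MusicalArtist", "instrument"): 3,
    ("MusicalArtist", "birthDate"): 1,
    ("Band", "genre"): 4,
})
print(augmentation_pairs("Ringo_Starr", ["MusicalArtist"], model, k=2))
```

The parameter `k` caps how much extra information is fetched per result item, which keeps the augmented answer readable.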
  20. Return additional result-related information
     • MusicalArtist -> genre, associatedBand, occupation, instrument, birthDate, birthPlace, hometown, prop:yearsActive, foaf:surname, prop:associatedActs, …
     Query: “Who played drums for the Beatles?”
     Result: Ringo Starr
        Pop music, Rock music (genre)
        Keyboard, Drum, Acoustic guitar (instrument)
        The Beatles, Plastic Ono Band, Rory Storm (assoc. Band)
  21. Return additional query-related information
     Steps
     1) Extract all concepts from the query.
     2) For any instances, find their types.
     3) For each query concept, find the most frequently co-occurring concepts from the query-concepts model.
     4) For each related concept, query for instances that have a relation with the originating instance.
     5) Show aggregated results to the user.
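Steps 3 and 4 might be sketched as below. The counts are invented, the `example.org` IRIs are placeholders, and matching the relation in either direction with a UNION is an assumption, since the slides only say the instances "have a relation" with the originating instance.

```python
from collections import Counter

def related_concepts(model, concept, k=3):
    """Return the k concepts that co-occur most often with the given concept."""
    scores = Counter()
    for (a, b), count in model.items():
        if a == concept and b != concept:
            scores[b] += count
        elif b == concept and a != concept:
            scores[a] += count
    return [c for c, _ in scores.most_common(k)]

def related_instances_query(instance, concept):
    """Build a SPARQL query for instances of `concept` linked to `instance`."""
    # Assumption: the link may point in either direction, hence the UNION.
    return (f"SELECT DISTINCT ?related WHERE {{ "
            f"?related a {concept} . "
            f"{{ {instance} ?p ?related }} UNION {{ ?related ?p {instance} }} }}")

# Illustrative counts only, not figures from the USEWOD logs.
model = Counter({
    ("City", "Person"): 7,
    ("Book", "City"): 9,
    ("City", "University"): 2,
    ("Band", "Person"): 4,
})
print(related_concepts(model, "City", k=2))
print(related_instances_query("<http://example.org/Sheffield>",
                              "<http://example.org/University>"))
```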
  22. Return additional query-related information
     • City -> Book, Person, Country, Organisation, SportsTeam, MusicGroup, Film, RadioStation, River, University, SoccerPlayer, Hospital, ...
     Query: “Where is the University of Sheffield located?”
     Result: Sheffield, UK
        Nick Clegg, Clive Betts, David Blunkett (Person)
        Sheffield United F.C., Sheffield Wednesday (SportsTeam)
        Hallam FM, Real Radio, BBC Radio Sheffield (RadioStn.)
        Jessop Hosp., Northern General, Royal Hallamshire (Hospital)
        Uni. of Sheffield, Sheffield Hallam Uni. (University)
  24. Data Visualization
     • View-based interfaces (e.g. Semantic Crystal and Smeagol) support users in query formulation by showing the underlying data and connections.
     • Helpful for users, especially those unfamiliar with the search domain.
     • They try to bridge the gap between user terms and tool terms (the habitability problem).
     • They face the challenge of visualizing large datasets without cluttering the view and degrading the user experience.
  25. Data Visualization: Proposed approach
     • Visualizing large datasets (especially heterogeneous ones) is a challenge.
     • To overcome this, we need to select and visualize specific parts of the data.
     • Exploit collaborative knowledge in query logs to derive the selection of concepts and predicates added to the user’s subgraph of interest.
  26. Data Visualization: Proposed approach
     Steps
     1) User enters NL query
     2) Return best-attempt results
     3) Identify query instances and find their types
     4) For each type:
        • Extract the most queried predicates associated with it from the concept-predicates model.
        • Extract the most queried concepts associated with it from the query-concepts model.
     5) Add these to the user’s query graph (see next slide)
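Step 5 could be pictured with the small sketch below, which represents the query graph as a set of (source, label, target) edges. The representation, the placeholder variable names and the example suggestions are all assumptions made for illustration; the slides do not describe the graph structure used by the tool.

```python
def extend_query_graph(edges, instance, predicates, concepts):
    """Add suggested predicates and concepts around a query instance."""
    for predicate in predicates:
        # A suggested predicate points to a not-yet-bound value.
        edges.add((instance, predicate, f"?{predicate}"))
    for index, concept in enumerate(concepts):
        # A suggested concept is attached through a not-yet-chosen relation.
        edges.add((instance, f"?rel{index}", concept))
    return edges

# Hypothetical suggestions for an instance of type "Country".
graph = extend_query_graph(
    set(),
    "Egypt",
    predicates=["capital", "population"],
    concepts=["City"],
)
for edge in sorted(graph):
    print(edge)
```

Limiting the suggestions to the most queried predicates and concepts is what keeps the displayed subgraph small enough to avoid clutter.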
  27. Example
     Query: “What is the capital of Egypt?”
     Best-attempt results
        Answer: Cairo
     Result-related information
        ➔ latitude: 30.058056
        ➔ depiction:
        ➔ longitude: 31.228889
        ➔ population: 6758581
        ➔ area: 453000000
        ➔ time zone: Eastern European Time
        ➔ subdivision: Governorates of Egypt
        ➔ page:
        ➔ nickname: The City of a Thousand Minarets, Capital of the Arab World
  28. Example
     Query: “What is the capital of Egypt?”
     Answer: Cairo
     Query-related information
        ➔ Cairo Uni., Ain Shams Uni., German Uni., British Uni. (University)
        ➔ Ittihad El Shorta, El Shams Club, AlNasr Egypt (SportsTeam)
        ➔ Orascom Telecom, HSBC Bank, EgyptAir, Olympic Grp (Organisation)
        ➔ Nile River (River)
        ➔ Al Azhar Park (Park)
        ➔ Hani Shaker, Sherine, Umm Kulthum, Amr Diab (MusicalArtist)
        ➔ Nile TV, AL Nile, Al-Baghdadia TV (BroadCaster)
        ➔ Egyptian Museum, Museum of Islamic Art (Museum)
  29. Data Visualization: Proposed approach
     Step 5: Add concepts and predicates to the user’s query graph
     [Figure: query graph centred on the query instance, showing the most queried predicates with “Country” and the most queried concepts with “Country”]
  30. Questions
     Thank You
     Questions?