Improving Semantic Search Using Query Log Analysis

Khadija Elbedweihy, Stuart N. Wrigley and Fabio Ciravegna
OAK Research Group,
Department of Computer Science,
University of Sheffield, UK

Outline

• Introduction
• Semantic Query Logs Analysis
- Query-Concepts Model
- Concepts-Predicates Model
- Instance-Types Model
• Results Augmentation
• Data Visualisation

Motivation

• Little work on results returned (answers) and
presentation style.
– Users want direct answers augmented with more
information for a richer experience¹
– Users want a more user-friendly and attractive results
presentation format¹

• Semantic query logs: logs of queries issued to repositories
containing RDF data.

1. See our paper from this morning’s IWEST 2012 workshop

Related Work
Semantic query logs analysis:
• Möller et al. identified patterns of Linked Data usage with
respect to different types of agents.

• Arias et al. analysed the structure of the SPARQL queries
to identify most frequent language elements.

• Luczak-Rösch et al. analysed query logs to detect errors
and weaknesses in LD ontologies and support their
maintenance.

Related Work (cont’d)

How our work is different:
Analyze semantic query logs to produce models capturing
different patterns of information needs on Linked Data:

• Concepts used together in a query: query-concepts model
• Predicates used with a concept: concept-predicates model
• Concepts used as types of a LD entity: instance-types model

The models make use of the “collaborative knowledge”
inherent in the logs to enhance the search process.

Extraction
• Query log entries follow the Combined Log Format (CLF):

Extract SPARQL query

SELECT DISTINCT ?genre, ?instrument WHERE
{
<…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>.
<…dbpedia.org…/Ringo_Starr> dbpedia:genre ?genre.
<…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument.
}
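
The extraction step can be sketched in Python as follows. This is a minimal illustration, not the authors' tooling: it assumes each entry is a GET request carrying the query in the `query` URL parameter (as in the SPARQL protocol), and the sample log line, the regular expression and the function name `extract_sparql` are made up for the example.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Combined Log Format: host ident user [time] "request" status bytes "referer" "agent"
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+)(?: \S+)?" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

def extract_sparql(log_line):
    """Return the decoded SPARQL query from one log entry, or None."""
    match = CLF_PATTERN.match(log_line)
    if match is None:
        return None
    # parse_qs percent-decodes values and turns '+' into spaces.
    params = parse_qs(urlsplit(match.group("target")).query)
    queries = params.get("query")
    return queries[0] if queries else None

# Hypothetical log entry, invented for illustration (not taken from the USEWOD data).
sample = (
    '0.0.0.0 - - [01/Jan/2011:00:00:00 +0000] '
    '"GET /sparql?query=SELECT+%3Fs+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D HTTP/1.1" '
    '200 1024 "-" "ExampleAgent/1.0"'
)
print(extract_sparql(sample))
```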

Analysis
{
<…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>.
<…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument.
}

• For each bound resource (subject or object) ->
query endpoint for the type of the resource

http://dbpedia.org/resource/Ringo_Starr  type  http://dbpedia.org/ontology/MusicalArtist
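
A possible sketch of this lookup step, assuming triple patterns are already parsed into (subject, predicate, object) string tuples in which variables start with `?` and every other term is a URI (literals are ignored here). The helper names and the `example.org` URIs are hypothetical; only the query text is built, no endpoint is contacted.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def is_bound(term):
    """A term is bound when it is not a ?variable."""
    return not term.startswith("?")

def bound_resources(triple_patterns):
    """Collect bound subjects and objects, preserving first-seen order."""
    seen = []
    for subject, _predicate, obj in triple_patterns:
        for term in (subject, obj):
            if is_bound(term) and term not in seen:
                seen.append(term)
    return seen

def type_query(resource_uri):
    """Build the SPARQL query that asks the endpoint for a resource's types."""
    return f"SELECT DISTINCT ?type WHERE {{ <{resource_uri}> <{RDF_TYPE}> ?type }}"

RINGO = "http://dbpedia.org/resource/Ringo_Starr"
patterns = [
    (RINGO, "?rel", "http://example.org/resource/SomeBand"),          # hypothetical object URI
    (RINGO, "http://example.org/property/instrument", "?instrument"),  # hypothetical predicate URI
]
for resource in bound_resources(patterns):
    print(type_query(resource))
```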

Query-Concepts Model

{ <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>.
<…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument. }

1) Retrieve types of resources in the query:
Ringo_Starr type dbpedia-owl:MusicalArtist, umbel:MusicalPerformer
The_Beatles type dbpedia-owl:Band, schema:MusicGroup

2) Increment the co-occurrence of each concept in the first list
with each concept in the second:

(MusicalArtist, Band)   (MusicalPerformer, MusicGroup)
(MusicalArtist, MusicGroup)   (MusicalPerformer, Band)
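
The counting in step 2 could look roughly like this. It is a sketch under assumptions: the model is held in a `Counter`, and each pair is stored in sorted order so that (A, B) and (B, A) share one counter; the slides do not say how the counts are stored.

```python
from collections import Counter
from itertools import product

def update_query_concepts(model, subject_types, object_types):
    """Increment the co-occurrence count of every (subject type, object type) pair."""
    for pair in product(subject_types, object_types):
        # Assumption: co-occurrence is symmetric, so pairs are stored sorted.
        model[tuple(sorted(pair))] += 1

query_concepts = Counter()
update_query_concepts(
    query_concepts,
    ["MusicalArtist", "MusicalPerformer"],  # types of Ringo_Starr
    ["Band", "MusicGroup"],                 # types of The_Beatles
)
for pair, count in sorted(query_concepts.items()):
    print(pair, count)
```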

Concept-Predicates Model


1) Retrieve types of resources used as subjects in the query:

2) Identify bound predicates (dbpedia:genre, dbpedia:instrument)

3) Increment the co-occurrence of each type with the predicate used in
the same triple pattern:

(MusicalPerformer, genre)   (MusicalPerformer, instrument)
(MusicalArtist, genre)   (MusicalArtist, instrument)
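
A minimal sketch of these three steps, assuming the types of each subject have already been retrieved into a dictionary (`types_of`, a hypothetical name) and that resources and predicates are written as short names for readability.

```python
from collections import Counter, defaultdict

def update_concept_predicates(model, triple_patterns, types_of):
    """Count each bound predicate against every type of its subject."""
    for subject, predicate, _obj in triple_patterns:
        if predicate.startswith("?"):
            continue  # unbound predicate such as ?rel: nothing to count
        for concept in types_of.get(subject, ()):
            model[concept][predicate] += 1

concept_predicates = defaultdict(Counter)
types_of = {"Ringo_Starr": ["MusicalArtist", "MusicalPerformer"]}
patterns = [
    ("Ringo_Starr", "?rel", "The_Beatles"),
    ("Ringo_Starr", "genre", "?genre"),
    ("Ringo_Starr", "instrument", "?instrument"),
]
update_concept_predicates(concept_predicates, patterns, types_of)
print(dict(concept_predicates["MusicalArtist"]))
```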

Instance-Types Model


1) Retrieve types of resources in the query:
The_Beatles type dbpedia-owl:Band, schema:MusicGroup

2) Increment the co-occurrence of concepts found as types for the
same instance:

(MusicalArtist, MusicalPerformer)
(Band, MusicGroup)
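
One way to sketch step 2, assuming the types per instance are available as a dictionary and that each unordered pair of types of the same instance is counted once per query.

```python
from collections import Counter
from itertools import combinations

def update_instance_types(model, types_by_instance):
    """Count every unordered pair of concepts that type the same instance."""
    for types in types_by_instance.values():
        for pair in combinations(sorted(set(types)), 2):
            model[pair] += 1

instance_types = Counter()
update_instance_types(
    instance_types,
    {
        "Ringo_Starr": ["MusicalArtist", "MusicalPerformer"],
        "The_Beatles": ["Band", "MusicGroup"],
    },
)
print(instance_types)
```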

Dataset
• Two sets of DBpedia query logs made available at the
USEWOD2011 and USEWOD2012 workshops.

• The logs contained around 5 million queries issued to
DBpedia over a time period spanning almost 2 years

                                        USEWOD2012  USEWOD2011
Number of analyzed queries                 8866028     4951803
Number of unique triple patterns           4095011     2641098
Number of unique bound triple patterns     3619216     2571662

Results Enhancement
• Google, Yahoo!, Bing, etc. enhance search
results using structured data

• FalconS and VisiNav return extra information together
with each entity in the answers (e.g. type, label)

• Evaluation of Semantic Search showed that augmenting
answers with extra information provides a richer user
experience².
2. See our paper from this morning’s IWEST 2012 workshop

FalconS Results
Query: “population of New York city”

• The information shown depends on a manually (randomly)
predefined set.

Motivation for proposed approach
• Utilizing query logs as a source of collaborative knowledge
able to capture implicit associations between Linked Data
entities and properties.

• Use this to select which information to show the user.

• Two recent studies³ analyzed semantic query logs and
observed that a class of entities is usually queried with
similar relations and concepts.

3. Luczak-Rösch et al.; Elbedweihy et al.

Two Related Types of Result Augmentation
1. Additional result-related information.
– More details about each result item
– Provides better understanding of the answer.

2. Additional query-related information.
– More results related to the query entities
– Assists users in discovering useful findings
(serendipity)

Return additional result-related information
Steps
1) For each result item, find types of instance.

2) Most frequently queried predicates associated with them
are extracted from the concept-predicates model.

3) Generate queries with each pair (instance, predicate).
e.g. (<…dbpedia.org…/Ringo_Starr> , genre)

4) Show aggregated results to the user.
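
The steps above can be sketched as follows. Several things here are assumptions rather than details from the slides: counts are summed across all types of the instance, the cut-off `k` is arbitrary, and the predicate URIs and counts are invented for illustration.

```python
from collections import Counter

def top_predicates(concept_predicates, instance_types, k=3):
    """Merge predicate counts over all types of the instance; keep the k most queried."""
    merged = Counter()
    for concept in instance_types:
        merged.update(concept_predicates.get(concept, {}))
    return [predicate for predicate, _count in merged.most_common(k)]

def augmentation_queries(instance_uri, predicates):
    """Build one SPARQL query per (instance, predicate) pair."""
    return [
        f"SELECT ?value WHERE {{ <{instance_uri}> <{predicate}> ?value }}"
        for predicate in predicates
    ]

# Hypothetical model contents (counts and predicate URIs are made up).
model = {
    "MusicalArtist": {"http://example.org/genre": 50,
                      "http://example.org/instrument": 30,
                      "http://example.org/birthDate": 10},
    "MusicalPerformer": {"http://example.org/genre": 5,
                         "http://example.org/instrument": 20},
}
chosen = top_predicates(model, ["MusicalArtist", "MusicalPerformer"], k=2)
queries = augmentation_queries("http://dbpedia.org/resource/Ringo_Starr", chosen)
for query in queries:
    print(query)
```

Each generated query would then be sent to the endpoint and the values grouped by predicate before display.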

Return additional result-related information
• MusicalArtist -> genre, associatedBand, occupation, instrument,
birthDate, birthPlace, hometown, prop:yearsActive, foaf:surname,
prop:associatedActs, …

Query: “Who played drums for the Beatles?”

Result: Ringo Starr
Pop music, Rock music (genre)
Keyboard, Drum, Acoustic guitar (instrument)
The Beatles, Plastic Ono Band, Rory Storm (assoc. Band)

Return additional query-related information
Steps
1) Extract all concepts from query.

2) For any instances, find their types.

3) For each query concept, find most frequently occurring
concepts from the query-concepts model.

4) For each related concept, query for instances that have a
relation with the originating instance.

5) Show aggregated results to the user.
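
Steps 3 and 4 might be sketched like this. The model layout (a dictionary keyed by concept pairs), the counts, the cut-off `k` and the `example.org` URIs are all hypothetical, and matching a relation in either direction with `UNION` is an assumption about what "a relation with the originating instance" means.

```python
from collections import Counter

def related_concepts(query_concepts, concept, k=3):
    """Rank the concepts that co-occur most often with `concept` in the model."""
    scores = Counter()
    for (first, second), count in query_concepts.items():
        if first == concept:
            scores[second] += count
        elif second == concept:
            scores[first] += count
    return [name for name, _count in scores.most_common(k)]

def related_instances_query(instance_uri, concept_uri):
    """Ask for instances of a concept linked to the instance in either direction."""
    return (
        "SELECT DISTINCT ?related WHERE { "
        f"?related a <{concept_uri}> . "
        f"{{ ?related ?p <{instance_uri}> }} UNION {{ <{instance_uri}> ?p ?related }} "
        "}"
    )

# Hypothetical co-occurrence counts, made up for illustration.
model = {
    ("City", "Person"): 40,
    ("Book", "City"): 60,
    ("City", "Country"): 25,
    ("Band", "Person"): 99,
}
ranked = related_concepts(model, "City", k=2)
query = related_instances_query("http://example.org/resource/SomeCity",
                                "http://example.org/ontology/Person")
print(ranked)
print(query)
```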

Return additional query-related information
• City -> Book, Person, Country, Organisation, SportsTeam, MusicGroup,
Film, RadioStation, River, University, SoccerPlayer, Hospital, ...

Query: “Where is the University of Sheffield located?”

Result: Sheffield, UK
Nick Clegg, Clive Betts, David Blunkett (Person)
Sheffield United F.C., Sheffield Wednesday (SportsTeam)
Hallam FM, Real Radio, BBC Radio Sheffield (RadioStn.)
Jessop Hosp., Northern General, Royal Hallamshire (Hospital)
Uni. of Sheffield, Sheffield Hallam Uni. (University)

Data Visualization
• View-based interfaces (e.g. Semantic Crystal and Smeagol)
support users in query formulation by showing the
underlying data and connections.

• Helpful for users, especially those unfamiliar with the
search domain.

• Try to bridge the gap between user terms and tool terms
(habitability problem)

• Face the challenge of visualizing large datasets without
cluttering the view and degrading the user experience.

Data Visualization: Proposed approach
• Visualizing large datasets (especially heterogeneous ones)
is a challenge.

• To overcome this, we need to select and visualize specific
parts of the data.

• Exploit collaborative knowledge in query logs to select the
concepts and predicates added to the user’s subgraph of
interest.

Steps
1) User enters NL query
2) Return best-attempt results
3) Identify query instances and find their types
4) For each type:
• Extract most queried predicates associated with it from
concept-predicates model.
• Extract most queried concepts associated with it from
query-concepts model.
5) Add these to the user’s query graph (see next slide)
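
Steps 4 and 5 could be sketched as below. The graph representation (a plain dictionary per instance node), the cut-off `k` and all counts are assumptions for illustration; both models are assumed to be available as a mapping from a concept to the counts of its associated predicates or concepts.

```python
def extend_query_graph(graph, instance, instance_types,
                       concept_predicates, query_concepts, k=2):
    """Attach the k most queried predicates and concepts of each type to the instance node."""
    node = graph.setdefault(instance, {"predicates": [], "concepts": []})
    for concept in instance_types:
        for key, model in (("predicates", concept_predicates),
                           ("concepts", query_concepts)):
            ranked = sorted(model.get(concept, {}).items(),
                            key=lambda item: item[1], reverse=True)
            for name, _count in ranked[:k]:
                if name not in node[key]:
                    node[key].append(name)
    return graph

# Hypothetical model contents (names and counts are made up).
concept_predicates = {"Country": {"capital": 120, "population": 90, "currency": 15}}
query_concepts = {"Country": {"City": 70, "Person": 55, "River": 8}}

graph = extend_query_graph({}, "Egypt", ["Country"], concept_predicates, query_concepts)
print(graph)
```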

Example
Query: “What is the capital of Egypt?”
Answer: Cairo (best-attempt results)

Result-related information:
➔ latitude: 30.058056
➔ longitude: 31.228889
➔ depiction: [image]
➔ population: 6758581
➔ area: 453000000
➔ time zone: Eastern European Time
➔ subdivision: Governorates of Egypt
➔ page: http://www.cairo.gov.eg/default.aspx
➔ nickname: The City of a Thousand Minarets, Capital of the
Arab World

Example
Query: “What is the capital of Egypt?”
Answer: Cairo

Query-related information:

➔ Cairo Uni., Ain Shams Uni., German Uni., British Uni. (University)
➔ Ittihad El Shorta, El Shams Club, AlNasr Egypt (SportsTeam)
➔ Orascom Telecom, HSBC Bank, EgyptAir, Olympic Grp (Organisation)
➔ Nile River (River)
➔ Al Azhar Park (Park)
➔ Hani Shaker, Sherine, Umm Kulthum, Amr Diab (MusicalArtist)
➔ Nile TV, AL Nile, Al-Baghdadia TV (BroadCaster)
➔ Egyptian Museum, Museum of Islamic Art (Museum)

Step 5: Add concepts and predicates to user’s query graph

[Figure: the query instance in the query graph, linked to the most queried predicates with “Country” and the most queried concepts with “Country”]

Questions

Thank You

Questions?
