Evaluating Semantic Search Query Approaches with Expert and Casual Users


Khadija Elbedweihy, Stuart N. Wrigley and Fabio Ciravegna
OAK Research Group,
Department of Computer Science,
University of Sheffield, UK

Outline
• Motivation
• Research Question
• Evaluation Design
• Evaluation Setup
• Findings
• Conclusions

Motivation – Semantic Search

• Wikipedia states that Semantic Search “seeks to improve
search accuracy by understanding searcher intent and the
contextual meaning of terms as they appear in the
searchable dataspace, whether on the Web or within a
closed system, to generate more relevant results”.

• Covers broad category of applications in Semantic Web:
– Search engines (e.g., Swoogle, FalconS, Sindice)
– Closed-domain query interfaces (e.g., AquaLog, Querix)
– Open-domain query interfaces (e.g., PowerAqua)

Motivation - Evaluations
• Evaluation of software is critical.

• Large-scale evaluations foster research and development.

• Semantic search evaluations (SemSearch, TREC ELC,
QALD) focused on assessing retrieval performance.

Assessing usability of tools and user satisfaction is
important in Semantic Search.

Research Question
How do different types of users perceive the usability
of different query approaches?

• Method
- Assess usability and user satisfaction of:
* Free-NL, Controlled-NL, Form-based, Graph-based

- from the perspective of
* expert users and casual users

Query Approaches
• Free-NL: natural language queries
– e.g., “What is the capital of Alabama?” or the keywords “capital Alabama”
• Controlled-NL: specific vocabulary
– e.g., “Which state has” followed by suggested terms (river, capital, lake, mountain, a, any)
• Form-based: visualize the search space
• Graph-based: visualize the search space

Evaluation Design: Dataset
• Mooney Natural Language Learning Data
- simple and well-known domain (geography)
- used by other studies within the search community
- questions already available (877 NL questions)

• Geography Dataset:
– Concepts: State, City, Lake, Mountain, Capital, River, etc
– Properties: population of state, length of river, etc
– Relations linking concepts: State ‘hasCity’ City

Evaluation Design: Data Collected
• Objective data:
1) Input time
2) Number of attempts
3) Success rate

• Subjective data, collected using:
1) Questionnaires (e.g., System Usability Scale ‘SUS’)
2) Ranking of the tools (w.r.t: system, query approach,
results content, results presentation)
3) Observations
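
The SUS questionnaire listed above is conventionally scored on a 0 to 100 scale: each of the ten items is rated from 1 to 5, odd-numbered items contribute (rating - 1), even-numbered items contribute (5 - rating), and the sum is multiplied by 2.5. The following is a minimal sketch of that standard scoring. The function name `sus_score` and the sample ratings are illustrative assumptions, not data from this study.

```python
def sus_score(ratings):
    """Return the 0-100 SUS score for ten item ratings on a 1-5 scale."""
    if len(ratings) != 10:
        raise ValueError("SUS uses exactly ten items")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("each rating must be between 1 and 5")
    total = 0
    for item_number, rating in enumerate(ratings, start=1):
        if item_number % 2 == 1:
            # Odd items are positively worded: higher rating is better.
            total += rating - 1
        else:
            # Even items are negatively worded: lower rating is better.
            total += 5 - rating
    return total * 2.5


# Made-up ratings for one participant and one tool (not study data).
example = [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]
print(sus_score(example))  # 75.0
```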

Evaluation Setup
• 20 subjects
– 10 casual users, 10 expert users
– 12 females, 8 males

• Within-subjects: allows direct comparison.
• Randomising tool order: normalize learning or tiredness
effects.
• Randomising question order: normalize learning effects.
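
The slides do not say how the randomisation was carried out. One plausible way to give each subject an independent, reproducible tool order is sketched below. The function name `assign_tool_orders`, the seed value and the subject numbering are assumptions; the tool names are those evaluated in this study.

```python
import random

# The five tools evaluated in the study.
TOOLS = ["NLP-Reduce", "Ginseng", "K-Search", "Semantic-Crystal", "Affective Graphs"]


def assign_tool_orders(subject_ids, tools, seed=0):
    """Give every subject an independently shuffled copy of the tool list."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    orders = {}
    for subject in subject_ids:
        order = list(tools)    # copy, so the original list is left untouched
        rng.shuffle(order)
        orders[subject] = order
    return orders


# 20 subjects, numbered 1..20 (hypothetical identifiers).
orders = assign_tool_orders(range(1, 21), TOOLS, seed=42)
for subject in (1, 2, 3):
    print(subject, orders[subject])
```

The same function could be reused for question order by passing the question list instead of the tool list.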

Results
• Evaluated tools:

– Free-NL: NLP-Reduce

– Controlled-NL: Ginseng

– Form-based: K-Search

– Graph-based:
• Semantic-Crystal (Graph-based 1)
• Affective Graphs (Graph-based 2)

Results for expert users
• Expert users prefer the graph-based and form-based approaches.
• View-based approaches allow more complex queries than NL-based ones.
[Chart: query language rank, from 0 (worst) to 1 (best), for Graph-based1, Graph-based2, Form-based, Controlled-NL and Free-NL]

Results for casual users
• Casual users prefer the form-based query approach.
• It required less input time than the graph-based approaches.
[Chart: input time in seconds (0 to 100) and query language rank, from 0 (worst) to 1 (best), for Graph-based1, Graph-based2, Form-based, Controlled-NL and Free-NL]

• Visualizing the entire ontology supports query formulation
– Semantic Crystal: shows the entire ontology.
– Affective Graphs: shows selected concepts & relations.

• Not showing the entire ontology made query formulation more complex for casual users:
– Semantic Crystal received higher scores.
– Affective Graphs was perceived as complex and difficult to use.
• 50% of the users found that it increased complexity and difficulty.

• Controlled-NL was very restrictive for expert users (least liked).
• It had the highest query input time.
[Chart: input time in seconds (0 to 120) and query language rank, from 0 (worst) to 1 (best), for Graph-based1, Graph-based2, Form-based, Controlled-NL and Free-NL]

• Controlled-NL provided most support for casual users.

• Users’ positive feedback for controlled-NL:
– allow only correct queries (50%)
– suggestions and guidance to formulate queries (40%)

Example: “Although Ginseng is limited to specific vocabulary, I
knew that I will get answers once I can do the query because it
only allows the correct ones and thus I didn't keep trying a lot
of queries that I wasn't sure about.”

RESULTS INDEPENDENT OF USER TYPE

Free-NL approach
+ simplest and most natural
- suffers from the habitability problem.

• Feedback: “I have to guess the right words”
– Example: ‘run through’ works with ‘river’ but ‘traverse’ does not.

• NLP-Reduce:
– lowest success rate: 20%
– highest number of attempts: 4.2
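
Objective measures such as the success rate and number of attempts reported here can be aggregated from per-question logs. The sketch below assumes a hypothetical log format (one dictionary per question with `answered`, `attempts` and `input_time_sec` keys). The records are made up for illustration and are not study data.

```python
from statistics import mean


def summarise(records):
    """Aggregate per-question logs into the three objective measures."""
    if not records:
        raise ValueError("no records to summarise")
    return {
        # Fraction of questions for which the user found an answer.
        "success_rate": sum(1 for r in records if r["answered"]) / len(records),
        # Average number of query attempts per question.
        "mean_attempts": mean(r["attempts"] for r in records),
        # Average time spent formulating the query, in seconds.
        "mean_input_time_sec": mean(r["input_time_sec"] for r in records),
    }


# Made-up log for one user and one tool (hypothetical values).
log = [
    {"answered": True, "attempts": 1, "input_time_sec": 10.0},
    {"answered": False, "attempts": 3, "input_time_sec": 30.0},
    {"answered": True, "attempts": 2, "input_time_sec": 20.0},
    {"answered": True, "attempts": 2, "input_time_sec": 20.0},
]
print(summarise(log))
```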

Negation
• Tell me which rivers do not traverse the state with the
capital Nashville?
[Chart: answer found rate (0 to 1) for expert users and casual users, for Graph-based1, Graph-based2, Form-based, Controlled-NL and Free-NL]

Negation
Tell me which states the river Mississippi does not
traverse.

• “Closed world assumption (CWA): presumption that what
is not currently known to be true is false”.
<Mississippi, traverse, Louisiana>

• “Open world assumption (OWA): assumption that the
truth-value of a statement is independent of whether or
not it is known by any single observer or agent to be true”.
<Mississippi, not_traverse, Alabama>
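
To make the distinction concrete for the negation question, the sketch below answers it in two ways over a tiny triple set. Under a closed-world reading, any state without a `traverse` triple counts as not traversed. Under an open-world reading, only explicitly asserted `not_traverse` triples count. The triple set, the `STATES` list and both function names are illustrative assumptions; the data is deliberately incomplete and is not the Mooney dataset.

```python
# Illustrative, deliberately incomplete knowledge base (not the Mooney data).
TRIPLES = {
    ("Mississippi", "traverse", "Louisiana"),
    ("Mississippi", "traverse", "Tennessee"),
    ("Mississippi", "not_traverse", "Alabama"),
}
STATES = {"Louisiana", "Tennessee", "Alabama", "Texas"}


def not_traversed_cwa(river, triples, states):
    """Closed world: whatever is not known to be traversed is not traversed."""
    traversed = {o for s, p, o in triples if s == river and p == "traverse"}
    return states - traversed


def not_traversed_owa(river, triples, states):
    """Open world: only explicitly stated negative facts are returned."""
    return {
        o for s, p, o in triples
        if s == river and p == "not_traverse" and o in states
    }


print(sorted(not_traversed_cwa("Mississippi", TRIPLES, STATES)))  # ['Alabama', 'Texas']
print(sorted(not_traversed_owa("Mississippi", TRIPLES, STATES)))  # ['Alabama']
```

Texas appears only in the closed-world answer because nothing is stated about it either way.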

Formal Query
• Formal Query (e.g., SPARQL)

Formal Query
• Benefit of showing formal query depends on user type.

• Formal query perceived by:
– Casual users: not understandable and confusing

– Expert users: increased confidence

Also, performing direct changes to the formal query
increased the expressiveness of the query language.

Results presentation
• Results presentation and format affected usability and user
satisfaction.
– Unless users are very familiar with the data, presenting URIs
alone is not very helpful.

– Example: A query for rivers returns one of the answers:
http://www.mooney.net/geo#tennesse2
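
One simple way to make such an answer more readable is to derive a display label from the URI itself. The sketch below is a heuristic under stated assumptions: it takes the fragment (or the last path segment), drops trailing digits and capitalises the words. The function name `readable_label` is hypothetical, and a real system would more likely use a label stored in the dataset when one is available.

```python
import re
from urllib.parse import urlparse


def readable_label(uri):
    """Derive a human-readable label from a URI (heuristic, not a standard)."""
    parsed = urlparse(uri)
    # Prefer the fragment after '#'; otherwise fall back to the last path segment.
    local = parsed.fragment or parsed.path.rstrip("/").rsplit("/", 1)[-1]
    local = re.sub(r"\d+$", "", local)               # drop a trailing numeric suffix
    words = re.sub(r"[_\-]+", " ", local).strip()    # separators become spaces
    return words.title() if words else uri           # fall back to the raw URI


# The spelling of the fragment is kept exactly as it appears in the URI.
print(readable_label("http://www.mooney.net/geo#tennesse2"))  # Tennesse
```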

Results Content
• Results should be augmented with associated information
to provide a ‘richer’ user experience.

• Users’ feedback:
– Maybe a ‘mouse over’ function to show more
information.
– Perhaps related information with the results.
– Results very limited, would be good to have more
context.

Research Question & Approach
How do different types of users perceive the usability
of different query approaches?

- Assess usability and user satisfaction of:
* Free-NL, Controlled-NL, Form-based, Graph-based

- from the perspective of
* expert users and casual users

Conclusions

Expert Users
• Graph-based most preferred
- Intuitive
- Support complex queries
• Controlled-NL least preferred
- Very restrictive.
- Limited expressiveness
• Prefer flexibility of free-NL
• Formal query provides confidence
- Ability to change query increases expressiveness.

Casual Users
• Form-based mid-point
- Allow more complex queries than NL.
- Easier than graph-based
- Faster than graph-based
• Controlled-NL most supportive
- Only valid queries: Confidence
- Vocabulary suggestions: guidance
• Formal Query not understandable and confusing.

• Users want search results to be augmented with more
information to have a better understanding of the answers.

Recommendations
Cater to both expert and casual users:

• Hybridized query approach: Combine a view-based
approach (visualize search space) with a NL-input feature
(balance difficulty and speed) while including optional
suggestions for the NL input (provide guidance).

• Results Content: Augment results with ‘extra’ and ‘related’
information.
– extra information: for ‘State’: capital, area, population.
– related information: for ‘State’: rivers, lakes, mountains.
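
The recommendation above can be sketched as a lookup that attaches ‘extra’ and ‘related’ properties to each result. The property names follow the ‘State’ example on this slide; the function name `augment`, the dictionary layout and the small fact table are assumptions made for illustration only.

```python
# Which properties count as 'extra' and 'related' for each concept
# (taken from the 'State' example; other concepts would need their own entries).
EXTRA_PROPERTIES = {"State": ["capital", "area", "population"]}
RELATED_PROPERTIES = {"State": ["rivers", "lakes", "mountains"]}

# Tiny illustrative fact table (hypothetical layout, intentionally incomplete).
FACTS = {
    "Alabama": {
        "capital": "Montgomery",
        "rivers": ["Alabama River", "Tennessee River"],
    },
}


def augment(result, concept, facts):
    """Attach whatever extra and related information is known about a result."""
    known = facts.get(result, {})
    extra = {p: known[p] for p in EXTRA_PROPERTIES.get(concept, []) if p in known}
    related = {p: known[p] for p in RELATED_PROPERTIES.get(concept, []) if p in known}
    return {"result": result, "extra": extra, "related": related}


print(augment("Alabama", "State", FACTS))
```

Properties that are missing from the fact table are simply omitted, so a result is never blocked by incomplete data.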

Limitations & Future work
• Limitation: Small size of the dataset.

• Assess learnability of different query approaches.

• Assess how interaction with the search tools affects the
information seeking process: usefulness.

– Use questions with an overall goal and compare users'
knowledge before and after the search task.
