Evaluating Semantic Search Query
Approaches with Expert and Casual Users

Khadija Elbedweihy, Stuart N. Wrigley and Fabio Ciravegna
OAK Research Group, Department of Computer Science,
University of Sheffield, UK
Outline
•   Motivation
•   Research Question
•   Evaluation Design
•   Evaluation Setup
•   Findings
•   Conclusions
Motivation – Semantic Search

• Wikipedia states that Semantic Search “seeks to improve
  search accuracy by understanding searcher intent and the
  contextual meaning of terms as they appear in the
  searchable dataspace, whether on the Web or within a
  closed system, to generate more relevant results”.

• Covers a broad category of applications in the Semantic Web:
   – Search engines (e.g., Swoogle, FalconS, Sindice)
   – Closed-domain query interfaces (e.g., AquaLog, Querix)
   – Open-domain query interfaces (e.g., PowerAqua)
Motivation - Evaluations
• Evaluation of software is critical.

• Large-scale evaluations foster research and development.

• Semantic search evaluations (SemSearch, TREC ELC,
  QALD) focused on assessing retrieval performance.

  Assessing usability of tools and user satisfaction is
  important in Semantic Search.
Research Question
How do different types of users perceive the usability
of different query approaches?

• Method
  - Assess usability and user satisfaction of:
     * Free-NL, Controlled-NL, Form-based, Graph-based

  - from the perspective of
      * expert users and casual users
Query Approaches
• Free-NL: natural language queries
   – Examples: “What is the capital of Alabama?” or the keyword form “capital Alabama”
• Controlled-NL: specific vocabulary
   – Example: “Which state has ...” followed by suggested completions
     (river, capital, lake, mountain, a, any)
• Form-based: visualize the search space
• Graph-based: visualize the search space
Evaluation Design: Dataset
• Mooney Natural Language Learning Data
   - simple and well-known domain (geography)
   - used by other studies within the search community
   - questions already available (877 NL questions)

• Geography Dataset:
  – Concepts: State, City, Lake, Mountain, Capital, River, etc
  – Properties: population of state, length of river, etc
  – Relations linking concepts: State ‘hasCity’ City
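
A minimal sketch of how concepts, properties and relations like these can be held as subject-predicate-object triples. Only ‘hasCity’ comes from the slide; the other predicate names (`type`, `hasCapital`, `statePopulation`), the population value and the `objects` helper are hypothetical and used only for illustration.

```python
# Toy triples: (subject, predicate, object).
# 'hasCity' is the relation named on the slide; every other predicate
# name and the population value are hypothetical placeholders.
TRIPLES = [
    ("Alabama", "type", "State"),
    ("Montgomery", "type", "City"),
    ("Alabama", "hasCity", "Montgomery"),
    ("Alabama", "hasCapital", "Montgomery"),
    ("Alabama", "statePopulation", 1000),  # placeholder, not real data
]


def objects(triples, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(TRIPLES, "Alabama", "hasCity"))
```

All four query approaches ultimately have to translate user input into lookups of this kind.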
Evaluation Design: Data Collected
• Objective data:
  1) Input time
  2) Number of attempts
  3) Success rate

• Subjective data, collected using:
   1) Questionnaires (e.g., System Usability Scale ‘SUS’)
   2) Ranking of the tools (w.r.t: system, query approach,
      results content, results presentation)
   3) Observations
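
The measures above can be sketched in a few lines. The SUS function follows the standard scoring rule (ten items answered on a 1 to 5 scale; odd items contribute response minus 1, even items contribute 5 minus response; the sum is multiplied by 2.5). The trial record layout (`input_time_s`, `attempts`, `success`) and the sample values are assumptions made for illustration, not data from the study.

```python
from statistics import mean


def sus_score(responses):
    """Standard SUS scoring for ten responses on a 1-5 scale (result in 0-100)."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    total = 0
    for item, response in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (response - 1) if item % 2 == 1 else (5 - response)
    return total * 2.5


def summarise(trials):
    """Aggregate the three objective measures over a list of trial records."""
    return {
        "mean_input_time_s": mean(t["input_time_s"] for t in trials),
        "mean_attempts": mean(t["attempts"] for t in trials),
        "success_rate": sum(t["success"] for t in trials) / len(trials),
    }


# Hypothetical trial records, not results from the evaluation.
example_trials = [
    {"input_time_s": 30.0, "attempts": 2, "success": True},
    {"input_time_s": 50.0, "attempts": 4, "success": False},
]
print(sus_score([3] * 10))
print(summarise(example_trials))
```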
Evaluation Setup
• 20 subjects
   – 10 casual users, 10 expert users
   – 12 females, 8 males

• Within-subjects: allows direct comparison.
• Randomising tool order: normalize learning or tiredness
  effects.
• Randomising question order: normalize learning effects.
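
One way to randomise the orders described above is a seeded shuffle per participant. The slides do not say how the randomisation was actually done, so this is only a sketch: the tool names are the ones evaluated in the study, while the question IDs, the `session_plan` function, the seeding scheme and the choice to reshuffle questions for every tool are assumptions.

```python
import random

# Tool names are those evaluated in the study; question IDs are placeholders.
TOOLS = ["NLP-Reduce", "Ginseng", "K-Search", "Semantic-Crystal", "Affective Graphs"]
QUESTIONS = ["Q1", "Q2", "Q3", "Q4", "Q5"]


def session_plan(participant_id, tools=TOOLS, questions=QUESTIONS, seed=0):
    """Return [(tool, question_order), ...] for one participant.

    A per-participant generator makes the plan reproducible while still
    giving each participant a different tool and question order.
    """
    rng = random.Random(f"{seed}-{participant_id}")
    tool_order = rng.sample(tools, k=len(tools))
    # Assumption: questions are reshuffled independently for each tool.
    return [(tool, rng.sample(questions, k=len(questions))) for tool in tool_order]


for pid in range(1, 4):
    print(pid, [tool for tool, _ in session_plan(pid)])
```

A plain shuffle spreads order effects across participants on average; a Latin square would balance them exactly, at the cost of constraining the number of participants.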
Results
• Evaluated tools:

   – Free-NL: NLP-Reduce

   – Controlled-NL: Ginseng

   – Form-based: K-Search

   – Graph-based:
      • Semantic-Crystal (Graph-based 1)
      • Affective Graphs (Graph-based 2)
QUERY APPROACH
Results for expert users
     • Expert users prefer the graph-based and form-based approaches.
     • View-based approaches allow more complex queries than NL-based ones.
[Chart: query language rank (0 = worst, 1 = best) for Graph-based1, Graph-based2, Form-based, Controlled-NL and Free-NL]
Results for casual users
         • Casual users prefer form-based query approach.
         • Required less input time than graph-based approach.
[Chart: input time (sec, 0 to 100) and query language rank (0 = worst, 1 = best) for Graph-based1, Graph-based2, Form-based, Controlled-NL and Free-NL]
ONTOLOGY VISUALIZATION
Results for expert users
• Visualizing the entire ontology supports query formulation
   – Semantic Crystal: shows the entire ontology.
   – Affective Graphs: shows selected concepts & relations.
Results for casual users
• Not showing the entire ontology made querying more complex for casual users:
  – Semantic Crystal received higher scores.
  – Affective Graphs perceived as complex and difficult to use
     • 50% of the users found it to increase complexity and difficulty
CONTROLLED-NL APPROACH
Results for expert users
    • Controlled-NL very restrictive for expert users (least-liked)
    • Highest query input time
[Chart: input time (sec, 0 to 120) and query language rank (0 = worst, 1 = best) for Graph-based1, Graph-based2, Form-based, Controlled-NL and Free-NL]
Results for casual users
• Controlled-NL provided most support for casual users.

• Users’ positive feedback for controlled-NL:
   – allows only correct queries (50%)
   – suggestions and guidance to formulate queries (40%)

   Example: Although Ginseng is limited to specific vocabulary, I
   knew that I will get answers once I can do the query because it
   only allows the correct ones and thus I didn't keep trying a lot
   of queries that I wasn't sure about.
RESULTS INDEPENDENT OF USER TYPE
Free-NL approach
  + simplest and most natural
  - suffers from the habitability problem.

• Feedback:     “I have to guess the right words”
  – Example: ‘run through’ works with ‘river’ but ‘traverse’ does not.

• NLP-Reduce:
  – lowest success rate: 20%
  – highest number of attempts: 4.2
Negation
• Tell me which rivers do not traverse the state with the
  capital Nashville?
[Chart: answer found rate (0 to 1) for Graph-based1, Graph-based2, Form-based, Controlled-NL and Free-NL, grouped by expert users and casual users]
Negation
  Tell me which states does the river Mississippi does not
  traverse.


• “Closed world assumption (CWA): presumption that what
  is not currently known to be true is false”.
   <Mississippi, traverse, Louisiana>


• “Open world assumption (OWA): assumption that the
  truth-value of a statement is independent of whether or
  not it is known by any single observer or agent to be true”.
   <Mississippi, not_traverse, Alabama>
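
The difference between the two assumptions can be sketched over a deliberately incomplete toy knowledge base. The two triples are the ones shown above; the set of candidate states, the inclusion of Tennessee with no statement at all, and both function names are assumptions made for illustration.

```python
# Toy, deliberately incomplete knowledge base. Tennessee has no statement
# about it either way, which is what separates the two assumptions.
STATES = {"Louisiana", "Alabama", "Tennessee"}
FACTS = {("Mississippi", "traverse", "Louisiana")}
NEGATIVE_FACTS = {("Mississippi", "not_traverse", "Alabama")}


def not_traversed_cwa(river, states, facts):
    """Closed world: anything not stated to be traversed counts as not traversed."""
    return {s for s in states if (river, "traverse", s) not in facts}


def not_traversed_owa(river, states, negative_facts):
    """Open world: only explicitly stated negatives count; the rest is unknown."""
    return {s for s in states if (river, "not_traverse", s) in negative_facts}


print(sorted(not_traversed_cwa("Mississippi", STATES, FACTS)))
print(sorted(not_traversed_owa("Mississippi", STATES, NEGATIVE_FACTS)))
```

Under the closed world assumption the missing Tennessee statement is read as a negative answer, while under the open world assumption it stays unknown, so the same negation query returns different result sets.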
Formal Query
• Formal Query (e.g., SPARQL)
Formal Query
• Benefit of showing formal query depends on user type.

• Formal query perceived by:
   – Casual users: not understandable and confusing

   – Expert users: increased confidence

      Also, performing direct changes to the formal query
      increased the expressiveness of the query language.
Results presentation
• Results presentation and format affected usability and user
  satisfaction.
   – Unless users are very familiar with the data, presenting URIs
     alone is not very helpful.




   – Example: A query for rivers returns one of the answers:
      http://www.mooney.net/geo#tennesse2
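
A small sketch of one way to turn such a URI into something more readable when no label is stored with the data. The `display_label` function and its rules (take the fragment, drop trailing digits, title-case) are assumptions for illustration; a real tool would more likely read a label property from the dataset.

```python
from urllib.parse import urldefrag


def display_label(uri):
    """Derive a readable label from a URI's fragment (or last path segment)."""
    fragment = urldefrag(uri).fragment
    local_name = fragment or uri.rstrip("/").rsplit("/", 1)[-1]
    # Drop trailing disambiguation digits and tidy up the casing.
    return local_name.rstrip("0123456789").replace("_", " ").title()


# The spelling of the label follows the URI exactly as it appears in the data.
print(display_label("http://www.mooney.net/geo#tennesse2"))
```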
Results Content
• Results should be augmented with associated information
  to provide a ‘richer’ user experience.

• Users’ feedback:
   – Maybe a ‘mouse over’ function to show more
     information.
   – Perhaps related information with the results.
   – Results very limited, would be good to have more
     context.
CONCLUSIONS
Research Question & Approach
How do different types of users perceive the usability
of different query approaches?

  - Assess usability and user satisfaction of:
     * Free-NL, Controlled-NL, Form-based, Graph-based

  - from the perspective of
      * expert users and casual users
Conclusions

Expert Users
• Graph-based most preferred
  - Intuitive
  - Supports complex queries
• Controlled-NL least preferred
  - Very restrictive
  - Limited expressiveness
• Prefer flexibility of free-NL
• Formal query provides confidence
  - Ability to change query increases expressiveness

Casual Users
• Form-based mid-point
  - Allows more complex queries than NL
  - Easier than graph-based
  - Faster than graph-based
• Controlled-NL most supportive
  - Only valid queries: confidence
  - Vocabulary suggestions: guidance
• Formal query not understandable and confusing

    • Users want search results to be augmented with more
      information to have a better understanding of the answers.
Recommendations
Cater to both expert and casual users:

• Hybridized query approach: Combine a view-based
  approach (visualize search space) with a NL-input feature
  (balance difficulty and speed) while including optional
  suggestions for the NL input (provide guidance).

• Results Content: Augment results with ‘extra’ and ‘related’
  information.
   – extra information: for ‘State’: capital, area, population.
   – related information: for ‘State’: rivers, lakes, mountains.
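
The results-content recommendation can be sketched as a lookup that attaches ‘extra’ and ‘related’ fields to each answer. The two field lists for ‘State’ come from the slide; the `augment` function, the store layout and the sample entries are hypothetical, and fields missing from the store are simply omitted.

```python
# Field lists per concept, taken from the slide for 'State'.
EXTRA_PROPERTIES = {"State": ["capital", "area", "population"]}
RELATED_CONCEPTS = {"State": ["rivers", "lakes", "mountains"]}

# Hypothetical store keyed by (entity, field); deliberately sparse.
STORE = {
    ("Alabama", "capital"): "Montgomery",
    ("Alabama", "rivers"): ["Tennessee"],
}


def augment(entity, concept, store):
    """Attach whatever extra and related information the store holds."""
    extra = {
        field: store[(entity, field)]
        for field in EXTRA_PROPERTIES.get(concept, [])
        if (entity, field) in store
    }
    related = {
        field: store[(entity, field)]
        for field in RELATED_CONCEPTS.get(concept, [])
        if (entity, field) in store
    }
    return {"entity": entity, "extra": extra, "related": related}


print(augment("Alabama", "State", STORE))
```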
Limitations & Future work
• Limitation: Small size of the dataset.

• Assess learnability of different query approaches.

• Assess how interaction with the search tools affects the
  information seeking process: usefulness.

   – Use questions with an overall goal and compare users'
     knowledge before and after the search task.
Questions?

Evaluating Semantic Search Query Approaches with Expert and Casual Users

  • 1. Evaluating Semantic Search Query Approaches with Expert and Casual Users Khadija Elbedweihy, Stuart N. Wrigley and Fabio Ciravegna OAK Research Group, Department of Computer Science, University of Sheffield, UK
  • 2. Outline • Motivation • Research Question • Evaluation Design • Evaluation Setup • Findings • Conclusions
  • 3. Motivation – Semantic Search • Wikipedia states that Semantic Search “seeks to improve search accuracy by understanding searcher intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, to generate more relevant results”. • Covers broad category of applications in Semantic Web: – Search engines (e.g., Swoogle, FalconS, Sindice) – Closed-domain query interfaces (e.g., AquaLog, Querix) – Open-domain query interfaces (e.g., PowerAqua)
  • 4. Motivation - Evaluations • Evaluation of software is critical. • Large-scale evaluations foster research and development. • Semantic search evaluations (SemSearch, TREC ELC, QALD) focused on assessing retrieval performance. Assessing usability of tools and user satisfaction is important in Semantic Search.
  • 5. Research Question How do different types of users perceive the usability of different query approaches? • Method - Assess usability and user satisfaction of: * Free-NL, Controlled-NL, Form-based, Graph-based - from the perspective of * expert users and casual users
  • 6. Query Approaches • Free-NL: natural language queries, e.g., “What is the capital of Alabama?” • Controlled-NL: specific vocabulary, e.g., “Which state has” completed from suggested terms (river, capital, lake, mountain) • Form-based: visualize the search space • Graph-based: visualize the search space
  • 7. Evaluation Design: Dataset • Mooney Natural Language Learning Data - simple and well-known domain (geography) - used by other studies within the search community - questions already available (877 NL questions) • Geography Dataset: – Concepts: State, City, Lake, Mountain, Capital, River, etc. – Properties: population of state, length of river, etc. – Relations linking concepts: State ‘hasCity’ City
  • 8. Evaluation Design: Data Collected • Objective data: 1) Input time 2) Number of attempts 3) Success rate • Subjective data, collected using: 1) Questionnaires (e.g., System Usability Scale ‘SUS’) 2) Ranking of the tools (w.r.t: system, query approach, results content, results presentation) 3) Observations
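
As an illustration of how the SUS questionnaire mentioned above is usually turned into a single number, here is a minimal sketch of the standard SUS scoring rule (ten items rated 1 to 5, odd items positively worded, even items negatively worded). The function name `sus_score` is an assumption made for this example; the slides do not show how the study computed its scores.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for one participant.

    `responses` holds the ten item ratings (1 = strongly disagree,
    5 = strongly agree) in questionnaire order.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 item ratings")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each rating must be between 1 and 5")

    total = 0
    for item, rating in enumerate(responses, start=1):
        if item % 2 == 1:
            # Odd (positively worded) items contribute rating - 1.
            total += rating - 1
        else:
            # Even (negatively worded) items contribute 5 - rating.
            total += 5 - rating
    # The 0-40 raw sum is scaled to the 0-100 range.
    return total * 2.5
```

A participant who answers 3 to every item lands exactly at the midpoint of the scale.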
  • 9. Evaluation Setup • 20 subjects – 10 casual users, 10 expert users – 12 females, 8 males • Within-subjects: allows direct comparison. • Randomising tool order: normalize learning or tiredness effects. • Randomising question order: normalize learning effects.
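
The randomisation described above could be sketched as follows. This is only an assumed procedure (an independent shuffle of tools and of questions per subject, with a fixed seed for reproducibility); the slides do not say how the orders were actually generated, and `assign_orders` is a hypothetical helper.

```python
import random

# Tool names as listed on the "Results" slide of this deck.
TOOLS = ["NLP-Reduce", "Ginseng", "K-Search",
         "Semantic Crystal", "Affective Graphs"]


def assign_orders(subject_ids, tools, questions, seed=0):
    """Give every subject an independently shuffled tool and question order."""
    rng = random.Random(seed)  # seeded so the plan can be regenerated
    plan = {}
    for subject in subject_ids:
        plan[subject] = {
            # sample(..., k=len) returns a shuffled copy, leaving inputs intact
            "tools": rng.sample(tools, k=len(tools)),
            "questions": rng.sample(questions, k=len(questions)),
        }
    return plan


if __name__ == "__main__":
    demo = assign_orders(range(1, 4), TOOLS, ["Q1", "Q2", "Q3"])
    for subject, order in demo.items():
        print(subject, order["tools"], order["questions"])
```

Because the design is within-subjects, every subject still sees every tool; only the order varies.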
  • 10. Results • Evaluated tools: – Free-NL: NLP-Reduce – Controlled-NL: Ginseng – Form-based: K-Search – Graph-based: • Semantic Crystal (Graph-based 1) • Affective Graphs (Graph-based 2)
  • 12. Results for expert users • Expert users prefer graph- and form-based approaches. • View-based approaches allow more complex queries than NL-based ones. [Chart: Query Language Rank (0 = worst, 1 = best) for Graph-based 1, Graph-based 2, Form-based, Controlled-NL and Free-NL]
  • 13. Results for casual users • Casual users prefer the form-based query approach. • It required less input time than the graph-based approaches. [Chart: Input Time (sec) and Query Language Rank (0 = worst, 1 = best) for Graph-based 1, Graph-based 2, Form-based, Controlled-NL and Free-NL]
  • 15. Results for expert users • Visualizing the entire ontology supports query formulation – Semantic Crystal: shows the entire ontology. – Affective Graphs: shows selected concepts & relations.
  • 16. Results for casual users • Not showing the entire ontology made querying more complex for casual users: – Semantic Crystal received higher scores. – Affective Graphs was perceived as complex and difficult to use • 50% of the users found it to increase complexity and difficulty
  • 18. Results for expert users • Controlled-NL was very restrictive for expert users (least liked). • It had the highest query input time. [Chart: Input Time (sec) and Query Language Rank (0 = worst, 1 = best) for Graph-based 1, Graph-based 2, Form-based, Controlled-NL and Free-NL]
  • 19. Results for casual users • Controlled-NL provided most support for casual users. • Users’ positive feedback for controlled-NL: – allow only correct queries (50%) – suggestions and guidance to formulate queries (40%) Example: Although Ginseng is limited to specific vocabulary, I knew that I will get answers once I can do the query because it only allows the correct ones and thus I didn't keep trying a lot of queries that I wasn't sure about.
  • 21. Free-NL approach + simplest and most natural - suffers from the habitability problem. • Feedback: “I have to guess the right words” – Example: ‘run through’ works with ‘river’ but ‘traverse’ does not. • NLP-Reduce: – lowest success rate: 20% – highest number of attempts: 4.2
  • 22. Negation • Tell me which rivers do not traverse the state with the capital Nashville? [Chart: Answer Found Rate (0 to 1) for Graph-based 1, Graph-based 2, Form-based, Controlled-NL and Free-NL, grouped by Expert Users and Casual Users]
  • 23. Negation Tell me which states does the river Mississippi does not traverse. • “Closed world assumption (CWA): presumption that what is not currently known to be true is false”. <Mississippi, traverse, Louisiana> • “Open world assumption (OWA): assumption that the truth-value of a statement is independent of whether or not it is known by any single observer or agent to be true”. <Mississippi, not_traverse, Alabama>
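
The difference between the two assumptions can be made concrete with a small sketch over the two triples shown on this slide. The function names, the in-memory triple set and the three-state universe are assumptions for illustration only; they are not taken from any of the evaluated tools.

```python
# Triples from the slide, stored as (subject, predicate, object).
TRIPLES = {
    ("Mississippi", "traverse", "Louisiana"),
    ("Mississippi", "not_traverse", "Alabama"),
}

# Hypothetical universe of states known to the system.
ALL_STATES = {"Louisiana", "Alabama", "Texas"}


def not_traversed_cwa(river, triples, all_states):
    """Closed world: any state without a 'traverse' triple counts as not traversed."""
    traversed = {o for s, p, o in triples if s == river and p == "traverse"}
    return set(all_states) - traversed


def not_traversed_owa(river, triples):
    """Open world: only explicitly asserted 'not_traverse' triples count."""
    return {o for s, p, o in triples if s == river and p == "not_traverse"}


if __name__ == "__main__":
    print(sorted(not_traversed_cwa("Mississippi", TRIPLES, ALL_STATES)))
    print(sorted(not_traversed_owa("Mississippi", TRIPLES)))
```

Under the closed world reading, Texas is returned even though nothing is stated about it; under the open world reading, only Alabama is returned because only that fact is asserted.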
  • 24. Formal Query • Formal Query (e.g., SPARQL)
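
For readers unfamiliar with formal queries, the sketch below builds the kind of SPARQL text a tool might display for “What is the capital of Alabama?”. The namespace is the one that appears in the results example later in this deck; the class and property names (`geo:State`, `geo:hasName`, `geo:hasCapital`) and the helper `capital_query` are assumptions, not the actual Mooney ontology vocabulary.

```python
# Namespace taken from the result URI shown later in the deck.
GEO_NS = "http://www.mooney.net/geo#"


def capital_query(state_name):
    """Return SPARQL text asking for the capital of the named state.

    The property names used here are hypothetical placeholders.
    """
    if '"' in state_name or "\\" in state_name:
        # Keep the sketch simple: refuse input that would need escaping.
        raise ValueError("state name must not contain quotes or backslashes")
    return (
        f"PREFIX geo: <{GEO_NS}>\n"
        "SELECT ?capital WHERE {\n"
        "  ?state a geo:State ;\n"
        f'         geo:hasName "{state_name}" ;\n'
        "         geo:hasCapital ?capital .\n"
        "}"
    )


if __name__ == "__main__":
    print(capital_query("Alabama"))
```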
  • 25. Formal Query • Benefit of showing formal query depends on user type. • Formal query perceived by: – Casual users: not understandable and confusing – Expert users: increased confidence Also, performing direct changes to the formal query increased the expressiveness of the query language.
  • 26. Results presentation • Results presentation and format affected usability and user satisfaction. – Unless users are very familiar with the data, presenting URIs alone is not very helpful. – Example: A query for rivers returns one of the answers: http://www.mooney.net/geo#tennesse2
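
One simple way to soften the URI problem is to derive a display label from the URI when no proper label is available. The heuristic below (take the fragment or last path segment, drop a trailing numeric suffix, capitalise) is an assumption for illustration; a real system would more likely look up a label property in the data.

```python
import re
from urllib.parse import urlsplit


def readable_label(uri):
    """Derive a human-readable label from a result URI (heuristic sketch)."""
    parts = urlsplit(uri)
    # Prefer the fragment after '#', else the last path segment.
    local = parts.fragment or parts.path.rstrip("/").rsplit("/", 1)[-1]
    # Drop a trailing disambiguation number such as the '2' in 'tennesse2'.
    local = re.sub(r"\d+$", "", local)
    label = local.replace("_", " ").title()
    # Fall back to the raw URI if nothing usable is left.
    return label or uri


if __name__ == "__main__":
    print(readable_label("http://www.mooney.net/geo#tennesse2"))
```

The label keeps whatever spelling the URI uses, so the example from the slide is shown as "Tennesse".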
  • 27. Results Content • Results should be augmented with associated information to provide a ‘richer’ user experience. • Users’ feedback: – Maybe a ‘mouse over’ function to show more information. – Perhaps related information with the results. – Results very limited, would be good to have more context.
  • 29. Research Question & Approach How do different types of users perceive the usability of different query approaches? - Assess usability and user satisfaction of: * Free-NL, Controlled-NL, Form-based, Graph-based - from the perspective of * expert users and casual users
  • 30. Conclusions Expert Users: • Graph-based most preferred - Intuitive - Support complex queries • Controlled-NL least preferred - Very restrictive - Limited expressiveness • Prefer flexibility of free-NL • Formal query provides confidence - Ability to change query increases expressiveness. Casual Users: • Form-based mid-point - Allow more complex queries than NL - Easier than graph-based - Faster than graph-based • Controlled-NL most supportive - Only valid queries: confidence - Vocabulary suggestions: guidance • Formal query not understandable and confusing. All users: • Users want search results to be augmented with more information to have a better understanding of the answers.
  • 31. Recommendations Cater to both expert and casual users: • Hybridized query approach: Combine a view-based approach (visualize search space) with a NL-input feature (balance difficulty and speed) while including optional suggestions for the NL input (provide guidance). • Results Content: Augment results with ‘extra’ and ‘related’ information. – extra information: for ‘State’: capital, area, population. – related information: for ‘State’: rivers, lakes, mountains.
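
The results-content recommendation can be sketched as a small lookup that splits a result record into ‘extra’ and ‘related’ information, using the field lists given for ‘State’ on this slide. The record layout, the `augment_result` helper and the toy Alabama record are assumptions made for this example.

```python
# Field lists for 'State' as given on the slide.
EXTRA_FIELDS = {"State": ["capital", "area", "population"]}
RELATED_FIELDS = {"State": ["rivers", "lakes", "mountains"]}


def augment_result(concept, record):
    """Group the available fields of a result into extra and related info."""
    extra = {f: record[f] for f in EXTRA_FIELDS.get(concept, []) if f in record}
    related = {f: record[f] for f in RELATED_FIELDS.get(concept, []) if f in record}
    return {"label": record.get("label"), "extra": extra, "related": related}


if __name__ == "__main__":
    # Toy record for illustration; fields missing from it are simply skipped.
    alabama = {
        "label": "Alabama",
        "capital": "Montgomery",
        "rivers": ["Alabama", "Tennessee"],
    }
    print(augment_result("State", alabama))
```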
  • 32. Limitations & Future work • Limitation: Small size of the dataset. • Assess learnability of different query approaches. • Assess how interaction with the search tools affects the information seeking process: usefulness. – Use questions with an overall goal and compare users' knowledge before and after the search task.