SlideShare a Scribd company logo
1 of 35
Methodology and Campaign Design for the Evaluation of Semantic Search Tools Stuart N. Wrigley1, Dorothee Reinhard2, Khadija Elbedweihy1, Abraham Bernstein2, Fabio Ciravegna1 4/27/10 1 1University of Sheffield, UK 2University of Zurich, Switzerland
Outline SEALS initiative Evaluation design Criteria Two phase approach API Workflow Data Results and Analyses Conclusions 4/27/10 2
Seals Initiative 4/27/10 3
SEALS goals Develop and diffuse best practices in evaluation of  semantic technologies Create a lasting reference infrastructure for semantic technology evaluation This infrastructure will be the SEALS Platform Organise two worldwide Evaluation Campaigns One this summer Next in late 2011 / early 2012 Facilitate the continuous evaluation of semantic technologies Allow easy access to both:  evaluation results (for developers and researchers)  technology roadmaps (for non-technical adopters) Transfer all infrastructure to the community 4/27/10 4
Targeted technologies Five different types of semantic technologies: Ontology Engineering tools Ontology Storage and Reasoning Systems Ontology Matching tools Semantic Web Service tools Semantic Search tools 4/27/10 5
What’s our general approach? Low overhead to the participant Automate as far as possible We provide the compute We initiate the actual evaluation run We perform the analysis Provide infrastructure for more than simply running high profile evaluation campaigns reuse existing evaluations for your personal testing create new ones evaluations  store / publish / download test data sets Encourage participation in evaluation campaign definitions and design Open Source (Apache 2.0) 4/27/10 6
SEALS Platform 4/27/10 7 Technology Developers Evaluation Organisers Technology Users SEALS Service Manager Runtime Evaluation Service Result Repository Service Tool Repository Service Test Data Repository Service Evaluation Repository Service
Search Evaluation Design 4/27/10 8
What do we want to do? Evaluate / benchmark semantic search tools with respect to their semantic peers. Allow as wide a range of interface styles as possible Assess tools on basis of a number of criteria including usability Automate (part) of it 4/27/10 9
Evaluation criteria User-centred search methodologies will be evaluated according to the following criteria: Query expressiveness Usability (effectiveness, efficiency, satisfaction) Scalability Quality of documentation Performance 4/27/10 10 ,[object Object]
How complex can the queries be?
Is it easy to understand?
Is it well structured?
How easy is the tool to use?
How easy is it to formulate the queries?
How easy is it to work with the answers?
Ability to cope with a large ontology
Ability to query a large repository in a reasonable time
Ability to cope with a large amount of results returnedResource consumption: ,[object Object]
CPU load
memory required,[object Object]
Evaluation criteria Each phase will address a different subset of criteria. Automated evaluation: query expressiveness, scalability, performance, quality of documentation User-in-the-loop: usability, query expressiveness 4/27/10 12
Running the Evaluation 4/27/10 13
Automated evaluation 4/27/10 14 Tools uploaded to platform. Includes: wrapper implementing API supporting libraries Test data and questions stored on platform Workflow specifies details of evaluation sequence Evaluation executed offline in batch mode Results stored on platform Analyses performed and stored on platform SEALS Platform API Runtime Evaluation Service Search tool
User in the loop evaluation 4/27/10 15 Performed at tool provider site All materials provided Controller software Instructions (leader and subjects) Questionnaires Data downloaded from platform Results uploaded to platform Over the web API Tool provider machine SEALSPlatform Search tool Controller
API A range of information needs to be acquired from the tool in both phases In automated phase, the tool has to be executed and interrogated with no human assistance. Interface between the SEALS platform and the tool must be formalised 4/27/10 16
API – common Load ontology success / failure informs the interoperability Determine result type ranked list or set? Results ready? used to determine execution time Get results list of URIs number of results to be determined by developer 4/27/10 17
API – user in the loop User query input complete? used to determine input time Get user query String representation of user’s query if NL interface, same as text inputted Get internal query String representation of the internal query for use with… 4/27/10 18
API – automated Execute query mustn’t constrain tool type to particular format tool provider given questions shortly before evaluation is executed tool provider converts those questions into some form of ‘internal representation’ which can be serialised as a String serialised internal representation passed to this method 4/27/10 19
Data 4/27/10 20
Data set – user in the loop Mooney Natural Language Learning Data used by previous semantic search evaluation simple and well-known domain using geography subset 9 classes 11 datatype properties 17 object properties and  697 instances 877 questions already available 4/27/10 21
Data set – automated EvoOnt set of object-oriented software source code ontologies easy to create different ABox sizes given a TBox 5 data set sizes: 1k, 10k, 100k, 1M, 10M triples questions generated by software engineers 4/27/10 22
Results and Analyses 4/27/10 23
Questionnaires 3 questionnaires: SUS questionnaire Extended questionnaire similar to SUS in terms of type of question but more detailed Demographics questionnaire 4/27/10 24
System Usability Scale (SUS) score SUS is a Likert scale 10-item questionnaire Each question has 5 levels (strongly disagree to strongly agree) SUS scores have a range of 0 to 100. A score of around 60 and above is generally considered as an indicator of good usability. 4/27/10 25

More Related Content

What's hot

The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...
Ali Ouni
 
Understanding Eclipse Plug-in Test Suites @ The Eclipse Testing Day 2011
Understanding Eclipse Plug-in Test Suites @ The Eclipse Testing Day 2011Understanding Eclipse Plug-in Test Suites @ The Eclipse Testing Day 2011
Understanding Eclipse Plug-in Test Suites @ The Eclipse Testing Day 2011
Michaela Greiler
 
Smart debugger
Smart debuggerSmart debugger
Smart debugger
Tao He
 
Source code comprehension on evolving software
Source code comprehension on evolving softwareSource code comprehension on evolving software
Source code comprehension on evolving software
Sung Kim
 

What's hot (7)

The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...
 
Understanding Eclipse Plug-in Test Suites @ The Eclipse Testing Day 2011
Understanding Eclipse Plug-in Test Suites @ The Eclipse Testing Day 2011Understanding Eclipse Plug-in Test Suites @ The Eclipse Testing Day 2011
Understanding Eclipse Plug-in Test Suites @ The Eclipse Testing Day 2011
 
Performance Evaluation of Open Source Data Mining Tools
Performance Evaluation of Open Source Data Mining ToolsPerformance Evaluation of Open Source Data Mining Tools
Performance Evaluation of Open Source Data Mining Tools
 
Smart debugger
Smart debuggerSmart debugger
Smart debugger
 
Partitioning Composite Code Changes to Facilitate Code Review (MSR2015)
Partitioning Composite Code Changes to Facilitate Code Review (MSR2015)Partitioning Composite Code Changes to Facilitate Code Review (MSR2015)
Partitioning Composite Code Changes to Facilitate Code Review (MSR2015)
 
Software Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled DatasetsSoftware Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled Datasets
 
Source code comprehension on evolving software
Source code comprehension on evolving softwareSource code comprehension on evolving software
Source code comprehension on evolving software
 

Viewers also liked

Volatile Classification of Point of Interests based on Social Activity Streams
Volatile Classification of Point of Interests based on Social Activity StreamsVolatile Classification of Point of Interests based on Social Activity Streams
Volatile Classification of Point of Interests based on Social Activity Streams
Amparo Elizabeth Cano Basave
 
Asterid: Linked Data Asterisms
Asterid: Linked Data AsterismsAsterid: Linked Data Asterisms
Asterid: Linked Data Asterisms
Gregoire Burel
 
SEO, SEM & Social - Intro, Best Practices & Campaign Tools
SEO, SEM & Social - Intro, Best Practices & Campaign ToolsSEO, SEM & Social - Intro, Best Practices & Campaign Tools
SEO, SEM & Social - Intro, Best Practices & Campaign Tools
Luke Freeman
 

Viewers also liked (15)

Sensing 
Presence
(PreSense)
Ontology
–
 
User 
Modelling
 in 
the 
Semantic ...
Sensing 
Presence
(PreSense)
Ontology
–
 
User 
Modelling
 in 
the 
Semantic ...Sensing 
Presence
(PreSense)
Ontology
–
 
User 
Modelling
 in 
the 
Semantic ...
Sensing 
Presence
(PreSense)
Ontology
–
 
User 
Modelling
 in 
the 
Semantic ...
 
Exploring the similarity between Social Knowledge Sources and Twitter for Cro...
Exploring the similarity between Social Knowledge Sources and Twitter for Cro...Exploring the similarity between Social Knowledge Sources and Twitter for Cro...
Exploring the similarity between Social Knowledge Sources and Twitter for Cro...
 
Lodie overview
Lodie overviewLodie overview
Lodie overview
 
Volatile Classification of Point of Interests based on Social Activity Streams
Volatile Classification of Point of Interests based on Social Activity StreamsVolatile Classification of Point of Interests based on Social Activity Streams
Volatile Classification of Point of Interests based on Social Activity Streams
 
Web Scale Information Extraction tutorial ecml2013
Web Scale Information Extraction tutorial ecml2013Web Scale Information Extraction tutorial ecml2013
Web Scale Information Extraction tutorial ecml2013
 
SEALS @ WWW2012
SEALS @ WWW2012SEALS @ WWW2012
SEALS @ WWW2012
 
Mapping Keywords to
Mapping Keywords to Mapping Keywords to
Mapping Keywords to
 
Asterid: Linked Data Asterisms
Asterid: Linked Data AsterismsAsterid: Linked Data Asterisms
Asterid: Linked Data Asterisms
 
SEO vs SEM
SEO vs SEMSEO vs SEM
SEO vs SEM
 
SEO, SEM & Social - Intro, Best Practices & Campaign Tools
SEO, SEM & Social - Intro, Best Practices & Campaign ToolsSEO, SEM & Social - Intro, Best Practices & Campaign Tools
SEO, SEM & Social - Intro, Best Practices & Campaign Tools
 
e-governance in India
e-governance in Indiae-governance in India
e-governance in India
 
Smart Cities and E-governance
Smart Cities and E-governanceSmart Cities and E-governance
Smart Cities and E-governance
 
E governance
E governanceE governance
E governance
 
Role of technology in SMART governance “Smart City, Safe City"
Role of technology in SMART governance “Smart City, Safe City"Role of technology in SMART governance “Smart City, Safe City"
Role of technology in SMART governance “Smart City, Safe City"
 
E governance
E governanceE governance
E governance
 

Similar to Methodology and Campaign Design for the Evaluation of Semantic Search Tools

Ui Design And Usability For Everybody
Ui Design And Usability For EverybodyUi Design And Usability For Everybody
Ui Design And Usability For Everybody
Empatika
 
7. evalution of interactive system
7. evalution of interactive system7. evalution of interactive system
7. evalution of interactive system
Kh Ravy
 
Developing of a web-based application to facilitate patient treatment adheren...
Developing of a web-based application to facilitate patient treatment adheren...Developing of a web-based application to facilitate patient treatment adheren...
Developing of a web-based application to facilitate patient treatment adheren...
Gunther Eysenbach
 
Developing of a web-based application to facilitate patient treatment adheren...
Developing of a web-based application to facilitate patient treatment adheren...Developing of a web-based application to facilitate patient treatment adheren...
Developing of a web-based application to facilitate patient treatment adheren...
Gunther Eysenbach
 
The Planets Testbed
The Planets TestbedThe Planets Testbed
The Planets Testbed
Max Kaiser
 
2004 01 10 Chef Sa V01
2004 01 10 Chef Sa V012004 01 10 Chef Sa V01
2004 01 10 Chef Sa V01
jiali zhang
 
WS-* Protocol Workshop Process Overview
WS-* Protocol Workshop Process OverviewWS-* Protocol Workshop Process Overview
WS-* Protocol Workshop Process Overview
Jorgen Thelin
 
Using Qualtrics to Create Automated Online Trainings
Using Qualtrics to Create Automated Online TrainingsUsing Qualtrics to Create Automated Online Trainings
Using Qualtrics to Create Automated Online Trainings
Shalin Hai-Jew
 

Similar to Methodology and Campaign Design for the Evaluation of Semantic Search Tools (20)

CREW VRE Release 5 - 2009 May
CREW VRE Release 5 - 2009 MayCREW VRE Release 5 - 2009 May
CREW VRE Release 5 - 2009 May
 
Ui Design And Usability For Everybody
Ui Design And Usability For EverybodyUi Design And Usability For Everybody
Ui Design And Usability For Everybody
 
7. evalution of interactive system
7. evalution of interactive system7. evalution of interactive system
7. evalution of interactive system
 
Replicating FLOSS Research as eResearch
Replicating FLOSS Research as eResearchReplicating FLOSS Research as eResearch
Replicating FLOSS Research as eResearch
 
Developing of a web-based application to facilitate patient treatment adheren...
Developing of a web-based application to facilitate patient treatment adheren...Developing of a web-based application to facilitate patient treatment adheren...
Developing of a web-based application to facilitate patient treatment adheren...
 
Developing of a web-based application to facilitate patient treatment adheren...
Developing of a web-based application to facilitate patient treatment adheren...Developing of a web-based application to facilitate patient treatment adheren...
Developing of a web-based application to facilitate patient treatment adheren...
 
Preservation Planning using Plato, by Hannes Kulovits and Andreas Rauber
Preservation Planning using Plato, by Hannes Kulovits and Andreas RauberPreservation Planning using Plato, by Hannes Kulovits and Andreas Rauber
Preservation Planning using Plato, by Hannes Kulovits and Andreas Rauber
 
Presentation.pptx
Presentation.pptxPresentation.pptx
Presentation.pptx
 
WDES 2015 paper: A Systematic Mapping on the Relations between Systems-of-Sys...
WDES 2015 paper: A Systematic Mapping on the Relations between Systems-of-Sys...WDES 2015 paper: A Systematic Mapping on the Relations between Systems-of-Sys...
WDES 2015 paper: A Systematic Mapping on the Relations between Systems-of-Sys...
 
The Planets Testbed
The Planets TestbedThe Planets Testbed
The Planets Testbed
 
Management of Metadata in Linguistic Fieldwork: Experience from the ACLA Pro...
Management of Metadata in Linguistic Fieldwork: Experience from the ACLA Pro...Management of Metadata in Linguistic Fieldwork: Experience from the ACLA Pro...
Management of Metadata in Linguistic Fieldwork: Experience from the ACLA Pro...
 
The recommendations system for source code components retrieval
The recommendations system for source code components retrievalThe recommendations system for source code components retrieval
The recommendations system for source code components retrieval
 
Mobile Healthcare App
Mobile Healthcare AppMobile Healthcare App
Mobile Healthcare App
 
2004 01 10 Chef Sa V01
2004 01 10 Chef Sa V012004 01 10 Chef Sa V01
2004 01 10 Chef Sa V01
 
WS-* Protocol Workshop Process Overview
WS-* Protocol Workshop Process OverviewWS-* Protocol Workshop Process Overview
WS-* Protocol Workshop Process Overview
 
Chapter04
Chapter04Chapter04
Chapter04
 
ESSENSE
ESSENSEESSENSE
ESSENSE
 
Data for Science Service Portfolio
Data for Science Service PortfolioData for Science Service Portfolio
Data for Science Service Portfolio
 
Using Qualtrics to Create Automated Online Trainings
Using Qualtrics to Create Automated Online TrainingsUsing Qualtrics to Create Automated Online Trainings
Using Qualtrics to Create Automated Online Trainings
 
Chapter 7)
Chapter 7)Chapter 7)
Chapter 7)
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Methodology and Campaign Design for the Evaluation of Semantic Search Tools

  • 1. Methodology and Campaign Design for the Evaluation of Semantic Search Tools Stuart N. Wrigley1, Dorothee Reinhard2, Khadija Elbedweihy1, Abraham Bernstein2, Fabio Ciravegna1 4/27/10 1 1University of Sheffield, UK 2University of Zurich, Switzerland
  • 2. Outline SEALS initiative Evaluation design Criteria Two phase approach API Workflow Data Results and Analyses Conclusions 4/27/10 2
  • 4. SEALS goals Develop and diffuse best practices in evaluation of semantic technologies Create a lasting reference infrastructure for semantic technology evaluation This infrastructure will be the SEALS Platform Organise two worldwide Evaluation Campaigns One this summer Next in late 2011 / early 2012 Facilitate the continuous evaluation of semantic technologies Allow easy access to both: evaluation results (for developers and researchers) technology roadmaps (for non-technical adopters) Transfer all infrastructure to the community 4/27/10 4
  • 5. Targeted technologies Five different types of semantic technologies: Ontology Engineering tools Ontology Storage and Reasoning Systems Ontology Matching tools Semantic Web Service tools Semantic Search tools 4/27/10 5
  • 6. What’s our general approach? Low overhead to the participant Automate as far as possible We provide the compute We initiate the actual evaluation run We perform the analysis Provide infrastructure for more than simply running high profile evaluation campaigns reuse existing evaluations for your personal testing create new ones evaluations store / publish / download test data sets Encourage participation in evaluation campaign definitions and design Open Source (Apache 2.0) 4/27/10 6
  • 7. SEALS Platform 4/27/10 7 Technology Developers Evaluation Organisers Technology Users SEALS Service Manager Runtime Evaluation Service Result Repository Service Tool Repository Service Test Data Repository Service Evaluation Repository Service
  • 9. What do we want to do? Evaluate / benchmark semantic search tools with respect to their semantic peers. Allow as wide a range of interface styles as possible Assess tools on basis of a number of criteria including usability Automate (part) of it 4/27/10 9
  • 10.
  • 11. How complex can the queries be?
  • 12. Is it easy to understand?
  • 13. Is it well structured?
  • 14. How easy is the tool to use?
  • 15. How easy is it to formulate the queries?
  • 16. How easy is it to work with the answers?
  • 17. Ability to cope with a large ontology
  • 18. Ability to query a large repository in a reasonable time
  • 19.
  • 21.
  • 22. Evaluation criteria Each phase will address a different subset of criteria. Automated evaluation: query expressiveness, scalability, performance, quality of documentation User-in-the-loop: usability, query expressiveness 4/27/10 12
  • 24. Automated evaluation 4/27/10 14 Tools uploaded to platform. Includes: wrapper implementing API supporting libraries Test data and questions stored on platform Workflow specifies details of evaluation sequence Evaluation executed offline in batch mode Results stored on platform Analyses performed and stored on platform SEALS Platform API Runtime Evaluation Service Search tool
  • 25. User in the loop evaluation 4/27/10 15 Performed at tool provider site All materials provided Controller software Instructions (leader and subjects) Questionnaires Data downloaded from platform Results uploaded to platform Over the web API Tool provider machine SEALSPlatform Search tool Controller
  • 26. API A range of information needs to be acquired from the tool in both phases In automated phase, the tool has to be executed and interrogated with no human assistance. Interface between the SEALS platform and the tool must be formalised 4/27/10 16
  • 27. API – common Load ontology success / failure informs the interoperability Determine result type ranked list or set? Results ready? used to determine execution time Get results list of URIs number of results to be determined by developer 4/27/10 17
  • 28. API – user in the loop User query input complete? used to determine input time Get user query String representation of user’s query if NL interface, same as text inputted Get internal query String representation of the internal query for use with… 4/27/10 18
  • 29. API – automated Execute query mustn’t constrain tool type to particular format tool provider given questions shortly before evaluation is executed tool provider converts those questions into some form of ‘internal representation’ which can be serialised as a String serialised internal representation passed to this method 4/27/10 19
  • 31. Data set – user in the loop Mooney Natural Language Learning Data used by previous semantic search evaluation simple and well-known domain using geography subset 9 classes 11 datatype properties 17 object properties and 697 instances 877 questions already available 4/27/10 21
  • 32. Data set – automated EvoOnt set of object-oriented software source code ontologies easy to create different ABox sizes given a TBox 5 data set sizes: 1k, 10k, 100k, 1M, 10M triples questions generated by software engineers 4/27/10 22
  • 33. Results and Analyses 4/27/10 23
  • 34. Questionnaires 3 questionnaires: SUS questionnaire Extended questionnaire similar to SUS in terms of type of question but more detailed Demographics questionnaire 4/27/10 24
  • 35. System Usability Scale (SUS) score SUS is a Likert scale 10-item questionnaire Each question has 5 levels (strongly disagree to strongly agree) SUS scores have a range of 0 to 100. A score of around 60 and above is generally considered as an indicator of good usability. 4/27/10 25
  • 36. Demographics Age Gender Profession Number of years in education Highest qualification Number of years in employment knowledge of informatics knowledge of linguistics knowledge of formal query languages knowledge of English … 4/27/10 26
  • 37. Automated Results Execution success (OK / FAIL / PLATFORM ERROR) Triples returned Time to execute each query CPU load, memory usage Analyses Ability to load ontology and query (interoperability) Precision and Recall (search accuracy and query expressiveness) Tool robustness: ratio of all benchmarks executed to number of failed executions 4/27/10 27
  • 38. User in the loop Results (other than core results similar to automated phase) Query captured by the tool Underlying query (e.g., SPARQL) Is answer in result set? (user may try a number of queries before being successful) time required to obtain answer number of queries required to answer question Analyses Precision and Recall Correlations between results and SUS scores, demographics, etc 4/27/10 28
  • 39. Dissemination Results browsable on the SEALS portal Split into three areas: performance usability comparison between tools 4/27/10 29
  • 41.
  • 43. Get involved! First Evaluation Campaign in all SEALS technology areas this Summer Get involved – your input and participation is crucial Workshop planned for ISWC 2010 after campaign Find out more (and take part!) at: http://www.seals-project.eu or talk to me, or email me (s.wrigley@dcs.shef.ac.uk) 4/27/10 33
  • 44. Timeline May 2010: Registration opens May-June 2010: Evaluation materials and documentation are provided to participants July 2010: Participants upload their tools August 2010: Evaluation scenarios are executed September 2010: Evaluation results are analysed November 2010: Evaluation results are discussed at ISWC 2010 workshop (tbc) 4/27/10 34
  • 45. Best paper award SEALS is proud to be sponsoring the best paper award here at SemSearch2010 Congratulations to the winning authors! 4/27/10 35