The evolution of semantic technology evaluation in my own flesh (The 15 tips for technology evaluation)

Slides of my talk given at IMATI-CNR on October 15th 2013.

If you like them, I am available for gigs!

Abstract:
In this talk I will describe how semantic technology evaluation has evolved over the last ten years, focusing on my own research and experiences. The talk starts with evaluation as a one-time, one-user activity and traces the progress towards mature evaluations that are community-driven and supported by rich methods and infrastructures. Over the course of the talk, I will unveil the 15 tips for technology evaluation, which should be of interest to anyone working on this topic.

Transcript

  • 1. The evolution of semantic technology evaluation in my own flesh (The 15 tips for technology evaluation). Raúl García-Castro, Ontology Engineering Group, Universidad Politécnica de Madrid, Spain. rgarcia@fi.upm.es. Talk at IMATI-CNR, October 15th, Genova, Italy.
  • 2. Index: Self-awareness; Crawling (Graduation Project); Walking (Ph.D. Thesis); Cruising (Postdoctoral Research); Insight.
  • 3. Who am I? Assistant Professor at the Ontology Engineering Group, Computer Science School, Universidad Politécnica de Madrid (UPM). Research lines: evaluation and benchmarking of semantic technologies (conformance and interoperability of ontology engineering tools, evaluation infrastructures); ontological engineering (sensors, ALM, energy efficiency, context, software evaluation); application integration. http://www.garcia-castro.com/
  • 4. Semantic Web technologies. The Semantic Web is "an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation" [Berners-Lee et al., 2001]: a common framework for data sharing and reuse across applications. Distinctive characteristics: use of W3C standards, use of ontologies as data models, inference of new information, and the open world assumption. High heterogeneity: different functionalities (both in general and in particular) and different KR formalisms (different expressivity, different reasoning capabilities). [Diagram: a component-based framework of semantic technologies grouped into ontology development & management, customization, evolution, alignment, instance generation, querying and reasoning, Semantic Web services, and data management; components range from ontology editors, browsers, matchers, reasoners and annotation tools to service discoverers and distributed repositories.] García-Castro, R.; Muñoz-García, O.; Gómez-Pérez, A.; Nixon, L. "Towards a component-based framework for developing Semantic Web applications". 3rd Asian Semantic Web Conference (ASWC 2008). 2-5 February 2009. Bangkok, Thailand.
  • 5. Ontology engineering tools allow the creation and management of ontologies: ontology editors (user oriented) and ontology language APIs (programming oriented).
  • 6. Index: Self-awareness; Crawling (Graduation Project); Walking (Ph.D. Thesis); Cruising (Postdoctoral Research); Insight. http://www.phdcomics.com/comics/archive.php?comicid=1012
  • 7. Evaluation goal. The GQM paradigm: any software measurement activity should be preceded by (1) the identification of a software engineering goal, (2) which leads to questions, (3) which in turn lead to actual metrics. Goal: to improve the performance (latency) and the scalability of the methods provided by the ontology management APIs of ontology development tools. Questions: What is the actual performance of the API methods? Is the performance of the methods stable? Are there any anomalies in the performance of the methods? Do changes in a method's parameters affect its performance? Does tool load affect the performance of the methods? Corresponding metrics: the execution time of each method; the variance of the execution times of each method; the percentage of execution times out of range in each method's sample; the execution time with parameter A versus parameter B; and the tool load versus execution time relationship. Overall metric: execution times of the API methods with different load factors.
  • 8. Evaluation data. Atomic operations of the ontology management API; multiple benchmarks defined for each method according to changes in its parameters; benchmarks parameterised according to the number of consecutive executions of the method. Example method: insertConcept(String ontology, String concept), alongside insertRelation, insertClassAttribute, insertInstanceAttribute, insertConstant, insertReasoningElement, insertInstance, updateConcept, updateRelation, ... (72 methods in total). Example benchmarks: benchmark1_1_08(N) "Inserts N concepts in 1 ontology"; benchmark1_1_09(N) "Inserts 1 concept in N ontologies" (128 benchmarks in total).
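As a concrete illustration of such a parameterised benchmark, here is a minimal Java sketch; the `OntologyManagementApi` interface and its `insertConcept` method are hypothetical stand-ins for the evaluated API, not the actual tool code, and the dummy implementation exists only so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a parameterised benchmark: times N consecutive calls of one API method. */
public class Benchmark1_1_08 {

    /** Hypothetical stand-in for the ontology management API under test. */
    interface OntologyManagementApi {
        void insertConcept(String ontology, String concept);
    }

    /** Inserts N concepts into a single ontology and returns one wall-clock time (ms) per call. */
    static List<Long> run(OntologyManagementApi api, int n) {
        List<Long> executionTimesMs = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            api.insertConcept("Ontology_1", "Concept_" + i);
            executionTimesMs.add((System.nanoTime() - start) / 1_000_000);
        }
        return executionTimesMs;
    }

    public static void main(String[] args) {
        // Dummy implementation so the sketch runs stand-alone.
        OntologyManagementApi dummy = (ontology, concept) -> { /* no-op tool stub */ };
        System.out.println(run(dummy, 400));
    }
}
```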
  • 9. Workload generator. Generates and inserts into the tool synthetic ontologies according to a load factor (X), which defines the size of the ontology data, and an ontology structure that depends on the benchmarks. For example: benchmark1_1_08 (inserts N concepts in an ontology) needs 1 ontology; benchmark1_1_09 (inserts a concept in N ontologies) needs N ontologies; benchmark1_3_20 (removes N concepts from an ontology) needs 1 ontology with N concepts; benchmark1_3_21 (removes a concept from N ontologies) needs N ontologies with 1 concept. To execute all the benchmarks, the ontology structure includes the execution needs of all of them.
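A sketch of the workload-generation idea, under the assumption that the generated structure is driven by each benchmark's execution needs and by the load factor X; the class and method names below are illustrative, not the actual generator's API.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: builds the synthetic workload needed to run a benchmark at load factor X. */
public class WorkloadGeneratorSketch {

    record SyntheticOntology(String name, List<String> concepts) {}

    /** e.g. benchmark1_3_20 needs one ontology preloaded with X concepts. */
    static SyntheticOntology ontologyWithConcepts(String name, int loadFactor) {
        List<String> concepts = new ArrayList<>(loadFactor);
        for (int i = 0; i < loadFactor; i++) {
            concepts.add("Concept_" + i);
        }
        return new SyntheticOntology(name, concepts);
    }

    /** e.g. benchmark1_3_21 needs X ontologies, each with a single concept. */
    static List<SyntheticOntology> ontologiesWithOneConcept(int loadFactor) {
        List<SyntheticOntology> ontologies = new ArrayList<>(loadFactor);
        for (int i = 0; i < loadFactor; i++) {
            ontologies.add(new SyntheticOntology("Ontology_" + i, List.of("Concept_1")));
        }
        return ontologies;
    }

    public static void main(String[] args) {
        System.out.println(ontologyWithConcepts("Ontology_1", 5000).concepts().size()); // 5000
        System.out.println(ontologiesWithOneConcept(5000).size());                      // 5000
    }
}
```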
  • 10. Evaluation infrastructure. Components: Benchmark Suite Executor, Performance Benchmark Suite, Workload Generator, Measurement Data Library, Statistical Analyser, and the ontology development tool under test; the infrastructure has to be instantiated for each tool. http://knowledgeweb.semanticweb.org/wpbs/
  • 11. Statistical analyser (BenchStats). Takes the raw measurements stored in the Measurement Data Library (e.g. benchmark1_1_08: 400 measurements of 2134 ms, 2300 ms, 2242 ms, 2809 ms, ...) and, using statistical software, summarises each benchmark. Example output (load X=5000, N=400 measurements per benchmark):
    Benchmark / UQ / LQ / IQR / Median / % Outliers / Trend function
    benchmark1_1_08: 60 / 60 / 0 / 60 / 1.25 / y=62.0-0.009x
    benchmark1_1_09: 912 / 901 / 11 / 911 / 1.75 / y=910.25-0.003x
    benchmark1_3_20: 160 / 150 / 10 / 150 / 1.25 / y=155.25-0.003x
    benchmark1_3_21: 160 / 150 / 10 / 151 / 0.25 / y=154.96-0.001x
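The statistics involved are simple enough to sketch in a few lines. The following is a minimal version of the per-benchmark summary (quartiles, IQR, percentage of box-plot outliers, and a least-squares trend line); it is not the actual BenchStats code, the measurements are invented, and the 1.5 x IQR outlier rule is an assumption.

```java
import java.util.Arrays;

/** Sketch of the per-benchmark statistics: quartiles, IQR, % outliers, linear trend. */
public class BenchStatsSketch {

    static double percentile(double[] sorted, double p) {
        // Simple linear interpolation between ranks; statistical packages may differ slightly.
        double idx = p * (sorted.length - 1);
        int lo = (int) Math.floor(idx), hi = (int) Math.ceil(idx);
        return sorted[lo] + (idx - lo) * (sorted[hi] - sorted[lo]);
    }

    public static void main(String[] args) {
        double[] timesMs = {2134, 2300, 2242, 2809, 2180, 2101, 2154, 2230}; // one benchmark's sample
        double[] sorted = timesMs.clone();
        Arrays.sort(sorted);

        double lq = percentile(sorted, 0.25), median = percentile(sorted, 0.50), uq = percentile(sorted, 0.75);
        double iqr = uq - lq;

        // Outliers: the usual box-plot rule (outside 1.5 * IQR from the quartiles).
        long outliers = Arrays.stream(timesMs)
                .filter(t -> t < lq - 1.5 * iqr || t > uq + 1.5 * iqr)
                .count();
        double outlierPct = 100.0 * outliers / timesMs.length;

        // Least-squares slope of execution time vs. iteration index (trend within the run).
        double n = timesMs.length, sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (int i = 0; i < timesMs.length; i++) {
            sumX += i; sumY += timesMs[i]; sumXY += i * timesMs[i]; sumXX += (double) i * i;
        }
        double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
        double intercept = (sumY - slope * sumX) / n;

        System.out.printf("LQ=%.1f median=%.1f UQ=%.1f IQR=%.1f outliers=%.2f%% y=%.2f%+.4fx%n",
                lq, median, uq, iqr, outlierPct, intercept, slope);
    }
}
```

The same slope computation, applied to the medians obtained at increasing load factors, gives the scalability metric used two slides below.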
  • 12. Result analysis - Latency (N=400, X=5000). Metric for the execution time: the median of the execution times of a method; 8 methods had execution times above 800 ms. Metric for anomalies in the execution times: the percentage of outliers in the execution times of a method; 2 methods had more than 5% outliers. Metric for the variability of the execution time: the interquartile range of the execution times of a method; 3 methods had an IQR above 11 ms. Effect of changes in method parameters: comparison of the medians of the execution times of the benchmarks that use the same method; 5 methods showed differences in execution times above 60 ms.
  • 13. Result analysis - Scalability. Effect of changes in WebODE's load: the slope of the function estimated by simple linear regression of the medians of the execution times from a minimum load (X=500) to a maximum one (X=5000); 8 methods had a slope above 0.1 ms (N=400, X=500..5000).
  • 14. Limitations. Evaluating other tools is expensive: the whole infrastructure (benchmark suite executor, workload generator, performance benchmark suite, measurement data library, statistical analyser) must be instantiated for each ontology development tool. Analysis of results was difficult: the evaluation was executed 10 times with different load factors (500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500 and 5000), and 128 benchmarks x 10 executions = 1280 files with results. García-Castro, R.; Gómez-Pérez, A. "Guidelines for Benchmarking the Performance of Ontology Management APIs". 4th International Semantic Web Conference (ISWC 2005), LNCS 3729. November 2005. Galway, Ireland.
  • 15. The 15 tips for technology evaluation (so far): Know the technology; Support different types of technology; Automate the evaluation framework; Expect reproducibility; Beware of result analysis; Learn statistics; Plan for evaluation requirements.
  • 16. Index: Self-awareness; Crawling (Graduation Project); Walking (Ph.D. Thesis); Cruising (Postdoctoral Research); Insight. "KHAAAAN!" http://www.phdcomics.com/comics/archive.php?comicid=500
  • 17. Interoperability in the Semantic Web. Interoperability is the ability of Semantic Web technologies to interchange ontologies and use them: at the information level, not at the system level; in terms of knowledge reuse, not information integration. In the real world it is not feasible to use a single system or a single formalism, and interchanges behave differently across formalisms: within the same formalism an interchange can be lossless, while across different formalisms information can be lost (in the slide's example, a disjointness between subclasses is either dropped or re-encoded through an ad-hoc myDisjoint construct).
  • 18. Evaluation goal: to evaluate and improve the interoperability of Semantic Web technologies using RDF(S) and OWL as interchange languages.
  • 19. Evaluation workflow - Manual. Import: ontologies Oi in RDF(S)/OWL are imported into tool X, yielding Oi'. Export: tool X exports its ontologies Oi to RDF(S)/OWL, yielding Oi'. In both cases Oi = Oi' plus the information added minus the information lost (denoted α - α' and β - β' on the slide). Interoperability: tool X exports Oi to RDF(S)/OWL and tool Y imports the result, yielding Oi'', so that Oi = Oi'' + α - α' + β - β'.
  • 20. Evaluation workflow - Automatic. Existing ontologies O1..On in the interchange language are processed in two steps. Step 1 (import + export by tool X): O1 is imported into tool X and exported back to RDF(S)/OWL as O1'', with O1 = O1'' + α - α'. Step 2 (import + export by tool Y): O1'' is imported into tool Y and exported back to RDF(S)/OWL as O1'''', with O1'' = O1'''' + β - β'. For the whole interchange, O1 = O1'''' + α - α' + β - β'.
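A skeleton of this two-step interchange experiment, assuming each tool is wrapped behind a hypothetical importer/exporter interface (real wrappers are tool-specific); the files written at each step are what the triple-level comparison on the following slides operates on, and the identity wrapper exists only so the skeleton can be exercised stand-alone.

```java
import java.io.File;

/** Sketch of one automatic interoperability experiment: X -> interchange language -> Y -> interchange language. */
public class InterchangeExperimentSketch {

    /** Hypothetical wrapper: imports an ontology file into the tool and exports it back to RDF(S)/OWL. */
    interface ToolWrapper {
        File importAndExport(File ontology, File targetDir) throws Exception;
    }

    static void runExperiment(File o1, ToolWrapper toolX, ToolWrapper toolY, File workDir) throws Exception {
        // Step 1: tool X imports the original ontology O1 and exports O1'' in the interchange language.
        File o1DoublePrime = toolX.importAndExport(o1, new File(workDir, "step1"));

        // Step 2: tool Y imports O1'' and exports O1'''' in the interchange language.
        File o1QuadruplePrime = toolY.importAndExport(o1DoublePrime, new File(workDir, "step2"));

        // The evaluation then compares O1 with O1'' (tool X), O1'' with O1'''' (tool Y),
        // and O1 with O1'''' (the whole interchange).
        System.out.println("Compare " + o1 + " with " + o1DoublePrime + " and " + o1QuadruplePrime);
    }

    public static void main(String[] args) throws Exception {
        // Identity wrapper (just copies the file) so the skeleton runs without any real tool.
        ToolWrapper identity = (ontology, dir) -> {
            dir.mkdirs();
            File out = new File(dir, ontology.getName());
            java.nio.file.Files.copy(ontology.toPath(), out.toPath(),
                    java.nio.file.StandardCopyOption.REPLACE_EXISTING);
            return out;
        };
        runExperiment(new File(args[0]), identity, identity, new File("work"));
    }
}
```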
  • 21. Evaluation data - OWL Lite Import Test Suite. Tests combine component combinations with RDF/XML syntax variants (e.g. <rdf:Description rdf:about="#class1"><rdf:type rdf:resource="&rdfs;Class"/></rdf:Description> is equivalent to <rdfs:Class rdf:about="#class1"></rdfs:Class>). Test groups (number of tests): class hierarchies (17); class equivalences (12); classes defined with set operators (2); property hierarchies (4); properties with domain and range (10); relations between properties (3); global cardinality constraints and logical property characteristics (5); single individuals (3); named individuals and properties (5); anonymous individuals and properties (3); individual identity (3); syntax and abbreviation (15); total: 82. Component combinations cover subclass of class, subclass of restriction, value constraints, set operators, and cardinality with object and datatype properties. David, S.; García-Castro, R.; Gómez-Pérez, A. "Defining a Benchmark Suite for Evaluating the Import of OWL Lite Ontologies". Second International Workshop OWL: Experiences and Directions 2006 (OWL2006). November 2006. Athens, Georgia, USA.
  • 22. Evaluation criteria. Execution informs about the correct execution: OK (no execution problem), FAIL (some execution problem), Comparer Error (C.E., comparer exception), or Not Executed (N.E., second step not executed). Information added or lost is measured in terms of triples (Oi = Oi' + α - α'). Interchange informs whether the ontology has been interchanged correctly, with no addition or loss of information: SAME if Execution is OK and both Information added and Information lost are void; DIFFERENT if Execution is OK but Information added or Information lost is not void; NO if Execution is FAIL, N.E. or C.E.
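The actual benchmarking relied on external ontology comparers, but the triple-level Information added / Information lost criteria can be roughly approximated with Apache Jena's model operations; this is only an illustrative sketch (it assumes Jena on the classpath, and blank nodes make a plain set difference approximate).

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

/** Sketch: compares the original ontology Oi with the interchanged one Oi' at the triple level. */
public class TripleComparisonSketch {

    public static void main(String[] args) {
        Model original = ModelFactory.createDefaultModel().read(args[0]);      // Oi
        Model interchanged = ModelFactory.createDefaultModel().read(args[1]);  // Oi'

        Model added = interchanged.difference(original); // triples only in Oi' (information added)
        Model lost = original.difference(interchanged);  // triples only in Oi (information lost)

        String verdict = (added.isEmpty() && lost.isEmpty()) ? "SAME" : "DIFFERENT";
        System.out.printf("added=%d lost=%d -> %s%n", added.size(), lost.size(), verdict);
    }
}
```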
  • 23. Evaluation campaigns. RDF(S) Interoperability Benchmarking: 6 tools (3 ontology repositories and 3 ontology development tools), including frame-based tools. OWL Interoperability Benchmarking: 9 tools (1 ontology-based annotation tool, 5 ontology development tools and 3 ontology repositories), including SemTalk and covering both frame-based and OWL-based tools.
  • 24. Evaluation infrastructure - IRIBA. http://knowledgeweb.semanticweb.org/iriba/
  • 25. Evaluation infrastructure - IBSE. Workflow: (1) describe benchmarks (the OWL Lite Import Benchmark Suite is described with a benchmarkOntology); (2) execute the benchmarks against the tools, producing execution results described with a resultOntology; (3) generate reports (HTML, SVG). IBSE automatically executes experiments between all the tools, allows configuring different execution parameters, uses ontologies to represent benchmarks and results, and depends on external ontology comparers (KAON2 OWL Tools and RDFutils). http://knowledgeweb.semanticweb.org/benchmarking_interoperability/ibse/ García-Castro, R.; Gómez-Pérez, A.; Prieto-González, J. "IBSE: An OWL Interoperability Evaluation Infrastructure". Third International Workshop OWL: Experiences and Directions 2007 (OWL2007). June 2007. Innsbruck, Austria.
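One of the listed design decisions, representing benchmark executions as ontology instances so that results stay machine-processable, can be sketched with Jena; the namespace, class and property names below are invented for illustration and are not the actual IBSE result ontology.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;

/** Sketch: records one benchmark execution as RDF so that results stay machine-processable. */
public class ResultOntologySketch {

    // Hypothetical namespace; the real IBSE/SEALS ontologies live elsewhere.
    static final String NS = "http://example.org/evaluation#";

    public static void main(String[] args) {
        Model results = ModelFactory.createDefaultModel();
        results.setNsPrefix("ev", NS);

        Resource execution = results.createResource(NS + "execution-001")
                .addProperty(RDF.type, results.createResource(NS + "BenchmarkExecution"))
                .addProperty(results.createProperty(NS, "benchmark"), "benchmark1_1_08")
                .addProperty(results.createProperty(NS, "tool"), "ToolX")
                .addProperty(results.createProperty(NS, "interchange"), "DIFFERENT");

        results.write(System.out, "TURTLE"); // serialise the result graph
    }
}
```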
  • 26. Evaluation results - Variability. There is high variability in the evaluation results. For tool import/export, a tool may model a component and execute, not model it and still execute, model it and fail, not model it and fail, or not execute at all; for ontology comparison, the outcome can be same information, more information, less information, tool fails, comparer fails, or not a valid ontology. Results can be analysed from different perspectives: per tool or pair of tools, per component, as result evolution over time, and so on. [The slide shows a per-component matrix of combinations (classes, metaclasses, class hierarchies with and without cycles, datatype and object properties with different domains and ranges, instances and their relations) for each pair of tools, a bar chart of result evolution between April 2005 and January 2006, and origin/destination matrices with the percentage of correct interchanges between Jena, Protégé-OWL, SWI-Prolog, KAON2, GATE, SemTalk, WebODE and Protégé-Frames.]
  • 27. Evaluation results - Interoperability. The evaluations give a clear picture of the interoperability between different tools: interoperability is low and there are few clusters of interoperable tools. Interoperability depends on ontology translation (the tool knowledge model), specification (development decisions), robustness (tool defects), and the tools participating in the interchange (each behaves differently). Tools have improved, but the involvement of tool developers is needed: developers have been informed, and tool improvement is out of our scope. Results are expected to change, so continuous evaluation is needed. García-Castro, R.; Gómez-Pérez, A. "Interoperability results for Semantic Web technologies using OWL as the interchange language". Web Semantics: Science, Services and Agents on the World Wide Web. ISSN: 1570-8268. Elsevier. Volume 8, number 4, pp. 278-291. November 2010. García-Castro, R.; Gómez-Pérez, A. "RDF(S) Interoperability Results for Semantic Web Technologies". International Journal of Software Engineering and Knowledge Engineering. ISSN: 0218-1940. Volume 19, number 8, pp. 1083-1108. December 2009.
  • 28. Benchmarking interoperability. A method for benchmarking interoperability that is common to different Semantic Web technologies and problem-focused instead of tool-focused. Manual vs automatic experiments: the choice depends on the specific needs of the benchmarking; automatic experiments are cheaper, more flexible and extensible, while manual experiments yield higher-quality results. Resources for benchmarking interoperability: all the benchmark suites (RDF(S) Import, RDF(S) Export, RDF(S) Interoperability, OWL Lite Import), software (rdfsbs, IRIBA, IBSE) and results are publicly available and are independent of the interchange language and the input ontologies. García-Castro, R. "Benchmarking Semantic Web technology". Studies on the Semantic Web vol. 3. AKA Verlag - IOS Press. ISBN: 978-3-89838-622-7. January 2010.
  • 29. Limitations. The number of results to analyse increased exponentially: 2168 executions in the RDF(S) benchmarking activity and 6642 in the OWL one. It was hard to support and maintain different test data and tools, and every tool to be evaluated had to be deployed on the same computer.
  • 30. The 15 tips for technology evaluation (so far): Know the technology; Support different test data; Support different types of technology; Use machine-processable descriptions of evaluation resources; Automate the evaluation framework; Expect reproducibility; Beware of result analysis; Learn statistics; Plan for evaluation requirements; Organize (or join) evaluation campaigns.
  • 31. Index: Self-awareness; Crawling (Graduation Project); Walking (Ph.D. Thesis); Cruising (Postdoctoral Research); Insight. http://www.phdcomics.com/comics/archive.php?comicid=570
  • 32. The SEALS Project (RI-238975). http://www.seals-project.eu/ Project coordinator: Asunción Gómez-Pérez <asun@fi.upm.es>. EC contribution: 3,500,000 €. Duration: June 2009 - June 2012. Partners: Universidad Politécnica de Madrid, Spain (coordinator); University of Sheffield, UK; University of Mannheim, Germany; Forschungszentrum Informatik, Germany; University of Zurich, Switzerland; University of Innsbruck, Austria; STI International, Austria; Institut National de Recherche en Informatique et en Automatique, France; Open University, UK; Oxford University, UK.
  • 33. Semantic technology evaluation @ SEALS: the SEALS Platform, the SEALS Evaluation Campaigns, the SEALS Evaluation Services and the SEALS Community. Wrigley, S.; García-Castro, R.; Nixon, L. "Semantic Evaluation At Large Scale (SEALS)". 21st International World Wide Web Conference (WWW 2012), European projects track, pp. 299-302. Lyon, France. 16-20 April 2012.
  • 34. The SEALS entities: Tools (ontology engineering, storage and reasoning, ontology matching, semantic search, semantic web services), Evaluations, Test Data, and Results (raw results and interpretations).
  • 35. Structure of the SEALS entities. Each entity combines metadata (described with the SEALS ontologies, enabling discovery, validation and exploitation) and data (e.g. Java binaries, shell scripts, bundles, BPEL documents, ontologies). http://www.seals-project.eu/ontologies/ García-Castro, R.; Esteban-Gutiérrez, M.; Kerrigan, M.; Grimm, S. "An Ontology Model to Support the Automatic Evaluation of Software". 22nd International Conference on Software Engineering and Knowledge Engineering (SEKE 2010), pp. 129-134. Redwood City, USA. 1-3 July 2010.
  • 36. SEALS logical architecture. Evaluation organisers, technology providers and technology adopters use the SEALS Portal, while software agents use the Runtime Evaluation Service; both interact through the SEALS Service Manager with the SEALS Repositories: the Test Data Repository Service, the Tools Repository Service, the Results Repository Service and the Evaluation Descriptions Repository Service. García-Castro, R.; Esteban-Gutiérrez, M.; Gómez-Pérez, A. "Towards an Infrastructure for the Evaluation of Semantic Technologies". eChallenges e-2010 Conference (e-2010), pp. 1-8. Warsaw, Poland. 27-29 October 2010.
  • 37. Challenges. Tool heterogeneity (hardware and software requirements) and reproducibility (ensuring that the execution environment offers the same initial status). Virtualization is the technology enabler: each execution node runs the tool inside a virtual machine on a processing node, using VMware Server 2.0.2, VMware vSphere 4, and Amazon EC2 (in progress) as virtualization solutions.
  • 38. Evaluation campaign methodology. The SEALS Methodology for Evaluation Campaigns is SEALS-independent and includes actors, process, recommendations, alternatives, terms of participation and use rights. Phases: initiation, involvement, preparation & execution, and finalization, with dissemination throughout. García-Castro, R.; Martín-Recuerda, F.; Wrigley, S. "SEALS Deliverable 3.8: SEALS Methodology for Evaluation Campaigns v2". Technical report, SEALS project. July 2011.
  • 39. Current SEALS evaluation services.
    Ontology engineering: evaluations of conformance, interoperability and scalability; test data for conformance and interoperability (RDF(S); OWL Lite, DL and Full; OWL 2 Expressive x3; OWL 2 Full) and for scalability (real-world, LUBM, real-world+, LUBM+).
    Ontology reasoning: DL reasoning evaluations (classification, class satisfiability, ontology satisfiability, entailment, non-entailment, instance retrieval) and RDF reasoning (conformance); test data from the Gardiner test suite, the Wang et al. repository, versions of GALEN, ontologies from EU projects, instance retrieval test data, and OWL 2 Full data for RDF reasoning.
    Ontology matching: matching accuracy (including multilingual) and scalability (ontology size, number of CPUs); test data: Benchmark, Anatomy, Conference, MultiFarm, Large Biomed (supported by SEALS).
    Semantic search: search accuracy and efficiency (automated) and usability and satisfaction (user-in-the-loop); test data: EvoOnt and MusicBrainz (from QALD-1) for the automated phase, Mooney and Mooney+ for the user-in-the-loop phase.
    Semantic web services: SWS discovery; test data: OWLS-TC 4.0, SAWSDL-TC 3.0, WSMO-LITE-TC.
  • 40. New evaluation data - Conformance and interoperability. OWL DL test suite: a keyword-driven approach in which tests are manually defined in CSV/spreadsheet form using a keyword library; a preprocessor expands the test suite definition script and an interpreter generates the test suite (metadata plus ontology01.owl, ontology02.owl, ...). OWL2EG: http://knowledgeweb.semanticweb.org/benchmarking_interoperability/OWL2EG/ OWL 2 test suite: ontologies of increasing expressiveness are generated automatically, using ontologies on the Web (ontology search plus ontology module extraction) and maximizing expressiveness, yielding original, expressive and full-expressive test suites with metadata. OWLDLGenerator: http://knowledgeweb.semanticweb.org/benchmarking_interoperability/OWLDLGenerator/ García-Castro, R.; Gómez-Pérez, A. "A Keyword-driven Approach for Generating Ontology Language Conformance Test Data". Engineering Applications of Artificial Intelligence. ISSN: 0952-1976. Elsevier. Grangel-González, I.; García-Castro, R. "Automatic Conformance Test Data Generation Using Existing Ontologies in the Web". Second International Workshop on Evaluation of Semantic Technologies (IWEST 2012). 28 May 2012. Heraklion, Greece.
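The keyword-driven idea (test definitions written as rows of keywords that a small library expands into ontology fragments) can be sketched as follows; the keyword names, separators and generated axioms are illustrative, not the OWL2EG library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Sketch: a tiny keyword library that expands CSV-style test definitions into ontology fragments. */
public class KeywordLibrarySketch {

    // Each keyword maps an argument list to a Turtle-like ontology fragment (illustrative only).
    static final Map<String, Function<String[], String>> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("Class", a -> ":" + a[0] + " a owl:Class .");
        KEYWORDS.put("SubClassOf", a -> ":" + a[0] + " rdfs:subClassOf :" + a[1] + " .");
        KEYWORDS.put("ObjectProperty", a -> ":" + a[0] + " a owl:ObjectProperty .");
    }

    /** Expands one test definition line, e.g. "Class;A | Class;B | SubClassOf;A;B". */
    static String expand(String testDefinition) {
        StringBuilder ontology = new StringBuilder();
        for (String step : testDefinition.split("\\|")) {
            String[] parts = step.trim().split(";");
            String[] arguments = java.util.Arrays.copyOfRange(parts, 1, parts.length);
            ontology.append(KEYWORDS.get(parts[0]).apply(arguments)).append('\n');
        }
        return ontology.toString();
    }

    public static void main(String[] args) {
        System.out.print(expand("Class;A | Class;B | SubClassOf;A;B"));
    }
}
```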
  • 41. 1st Evaluation Campaign: 29 tools from 8 countries. Ontology engineering: Jena, Sesame, Protégé 4, Protégé OWL, NeOn Toolkit, OWL API. Reasoning: HermiT, jcel, FaCT++. Matching: AROMA, ASMOV, Falcon-AO, Lily, RiMOM, MapPSO, CODI, AgreementMaker, GeRoMe*, Ef2Match. Semantic search: K-Search, Ginseng, NLP-Reduce, PowerAqua, Jena ARQ. Semantic web services: 4 OWLS-MX variants. Providers included HP Labs, Aduna, Stanford, the NeOn Foundation, the Universities of Manchester, Oxford, Zurich and Dresden, INRIA, INFOTECH Soft, Nantes University, Southeast University, Tsinghua University, FZI, the University of Mannheim, RWTH Aachen, Nanyang Technological University, K-Now, the Open University, Talis and DFKI, spanning the UK, the Netherlands, the USA, Germany, France, China, Switzerland and other countries. Nixon, L.; García-Castro, R.; Wrigley, S.; Yatskevich, M.; Trojahn dos Santos, C.; Cabral, L. "The state of semantic technology today – overview of the first SEALS evaluation campaigns". 7th International Conference on Semantic Systems (I-SEMANTICS 2011). Graz, Austria. 7-9 September 2011.
  • 42. 2nd Evaluation Campaign W P Tool 10 Jena HP Labs UK W Tool Provider Country Sesame Aduna Netherlands P Protégé 4 University of Stanford USA 12 AgrMaker University of Illinois at Chicago Protégé OWL Stanford USA USA Country W Tool University of Provider Aroma INRIA Grenoble Rhone-Alpes France NeOn toolkit NeOn Foundation Europe P AUTOMSv2 University of Manchester VTT Technical Research Centre Finland OWL API UK 13 K-Search K-Now Ltd CIDER Universidad Politecnica de UK Spain UK HermiT University of Oxford Ginseng University of Zurich Switzerland Madrid jcel Technischen Universitat Dresden Germany University of Zurich Switzerland CODI NLP-Reduce Universitat Mannheim FaCT++ University of Manchester UK Germany KMi, Open University UK CSA PowerAqua University of Ho Chi Minh City Vietnam WSReasoner University of New Brunswick Canada Jena Arq v2.8.2 HP Labs, Talis UK GOMMA Universitat Leipzig Germany Jena Arq v2.9.0 HP Labs, Talis UK Hertuda Technische Universitat Germany rdfQuery v0.5.1University of Southampton UK Darmstadt LDOA beta Tunis-El Manar University Tunisia University of Zurich Lily Semantic Crystal Southeast University China Switzerland Affective Graphs University of Sheffield LogMap University of Oxford UK UK 14 WSMO-LITE-OU KMi, Open University LogMapLt University of Oxford UK UK SAWSDL-OU Maastricht University KMi, Open University UK MaasMtch Netherlands OWLS-URJC FZI Forschungszentrum Juan Carlos University of Rey Spain MapEVO Germany OWLS-M0 DFKI Germany Informatik MapPSO FZI Forschungszentrum Germany Informatik MapSSS Wright State University USA Optima University of Georgia USA WeSeEMtch Technische Universitat Germany Darmstadt YAM++ LIRMM France 11 © Raúl García Castro Provider 41 tools from 13 countries Country Talk at IMATI-CNR. 15th October 2013 42
  • 43. Evaluation services. Users can register their own tools, test data and evaluations (or reuse and update the existing ones), execute evaluations on the platform, and exploit the resulting evaluation results.
  • 44. Quality model for semantic technologies. Number of measures per tool type (raw results / interpretations / quality measures / quality sub-characteristics): ontology engineering tools 7/20/8/6; ontology matching tools 1/4/4/2; reasoning systems 11/0/16/5; semantic search tools 12/8/18/7; semantic web service tools 5/9/10/2; total 34/41/55/17. Radulovic, F.; García-Castro, R. "Extending Software Quality Models - A Sample in the Domain of Semantic Technologies". 23rd International Conference on Software Engineering and Knowledge Engineering (SEKE 2011). Miami, USA. July 2011.
  • 45. Semantic technology recommendation. A user states quality requirements ("I need a robust ontology engineering tool and a semantic search tool with the highest precision"); the SEALS Platform, using the semantic technology quality model, the Tools Repository Service and the Results Repository Service, returns a recommendation ("You should use Sesame v2.6.5 and ARQ v2.9.0. The reason for this is... Alternatively, you can use..."). Radulovic, F.; García-Castro, R. "Semantic Technology Recommendation Based on the Analytic Network Process". 24th International Conference on Software Engineering and Knowledge Engineering (SEKE 2012). Redwood City, CA, USA. 1-3 July 2012. 3rd Best Paper Award.
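The published recommendation approach is based on the Analytic Network Process; as a much simpler stand-in, the sketch below ranks tools by a plain weighted sum of quality scores against the user's stated requirements. All tool names, weights and scores are invented, and this is not the ANP method itself.

```java
import java.util.Comparator;
import java.util.Map;

/** Sketch: ranks tools against weighted quality requirements (a plain weighted sum, NOT the ANP method). */
public class RecommendationSketch {

    public static void main(String[] args) {
        // User requirements: quality characteristic -> weight (invented numbers).
        Map<String, Double> weights = Map.of("robustness", 0.7, "precision", 0.3);

        // Evaluation results per tool: quality characteristic -> normalised score in [0,1] (invented).
        Map<String, Map<String, Double>> toolScores = Map.of(
                "ToolA", Map.of("robustness", 0.9, "precision", 0.6),
                "ToolB", Map.of("robustness", 0.5, "precision", 0.95));

        toolScores.entrySet().stream()
                .map(e -> Map.entry(e.getKey(),
                        weights.entrySet().stream()
                                .mapToDouble(w -> w.getValue() * e.getValue().getOrDefault(w.getKey(), 0.0))
                                .sum()))
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
                .forEach(e -> System.out.printf("%s -> %.2f%n", e.getKey(), e.getValue()));
    }
}
```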
  • 46. You can use the SEALS Platform. It facilitates: comparing tools under common settings; reproducibility of evaluations; reusing evaluation resources, completely or partially, or defining new ones; managing evaluation resources using platform services; and computational resources for demanding evaluations. Don't start your evaluation from scratch!
  • 47. The 15 tips for technology evaluation: Know the technology; Support different test data; Facilitate test data definition; Support different types of technology; Define declarative evaluation workflows; Use machine-processable descriptions of evaluation resources; Automate the evaluation framework; Expect reproducibility; Beware of result analysis; Learn statistics; Plan for evaluation requirements; Use a quality model; Organize (or join) evaluation campaigns; Share evaluation resources; Exploit evaluation results.
  • 48. Index: Self-awareness; Crawling (Graduation Project); Walking (Ph.D. Thesis); Cruising (Postdoctoral Research); Insight. insight, noun [mass noun] (Psychiatry): awareness by a mentally ill person that their mental experiences are not based in external reality.
  • 49. Evolution towards maturity: the Software Evaluation Technology Maturity Model (SET-MM). Five maturity levels (Initial, Repeatable, Reusable, Integrated, Optimized) are characterised along six themes: formalization of the evaluation workflow (from ad-hoc workflows informally defined, through technology-specific and generic workflows, to machine-processable workflows built by reusing common parts and evaluation resources built upon shared principles); software support to the evaluation (from manual evaluation without software support, through ad-hoc and reusable evaluation software, to evaluation infrastructures and federations of autonomous infrastructures that interchange evaluation resources under data access and use policies and satisfy any software or hardware requirement); applicability to multiple software types (from a small number of software products of the same type to generic access to multiple software products of different types); usability of test data (from informally defined, to machine-processable, reused across evaluations, and finally customizable, optimized and curated); exploitability of results (from informally defined and not verifiable, to machine-processable results combined for many software products of different types with high availability and quality); and representativeness of participants (from one team to several teams, stakeholders, and the whole community). [The slide maps the earlier activities (UPM-FBI, the RDF(S) and OWL interoperability benchmarking suites, rdfsbs, IRIBA and IBSE) onto these levels.] García-Castro, R. "SET-MM - A Software Evaluation Technology Maturity Model". 23rd International Conference on Software Engineering and Knowledge Engineering (SEKE 2011), pp. 660-665. Miami Beach, USA. 7-9 July 2011.
  • 50. The 15 tips for technology evaluation (recap): Know the technology; Support different test data; Facilitate test data definition; Support different types of technology; Define declarative evaluation workflows; Use machine-processable descriptions of evaluation resources; Automate the evaluation framework; Expect reproducibility; Beware of result analysis; Learn statistics; Plan for evaluation requirements; Use a quality model; Organize (or join) evaluation campaigns; Share evaluation resources; Exploit evaluation results.
  • 51. Thank you for your attention! Speaker: Raúl García-Castro Talk at IMATI-CNR, October 15th, Genova, Italy