Seals 2nd campaign results: Presentation Transcript

  • Results of the second worldwide evaluation campaign for semantic tools. © the SEALS Project, http://www.seals-project.eu/
  • 2nd SEALS Yardsticks for Ontology Management
  • 2nd SEALS Yardsticks for Ontology Management
    – Conformance and interoperability results
    – Scalability results
    – Conclusions
  • Conformance evaluation
    • Ontology language conformance
      – The ability to adhere to existing ontology language specifications
    • Goal: to evaluate the conformance of semantic technologies with regard to ontology representation languages
    (Diagram: Tool X performs Step 1, Import + Export, turning O1 into O1''; O1 = O1'' + α - α')
  • Metrics
    • Execution informs about the correct execution:
      – OK: no execution problem
      – FAIL: some execution problem
      – Platform Error (P.E.): platform exception
    • Information added or lost, in terms of triples, axioms, etc.: Oi = Oi' + α - α'
    • Conformance informs whether the ontology has been processed correctly, with no addition or loss of information (Oi = Oi'?):
      – SAME if Execution is OK and Information added and Information lost are void
      – DIFFERENT if Execution is OK but Information added or Information lost are not void
      – NO if Execution is FAIL or P.E.
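The import/export check above maps naturally onto code. Below is a minimal sketch using the OWL API 3 (the library several of the evaluated tools are built on); the file paths are placeholders, and in the actual campaign the SEALS harness drives the tool under test rather than calling the OWL API directly.

```java
import java.util.HashSet;
import java.util.Set;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;

// Sketch of the round-trip conformance check Oi = Oi' + α - α',
// counting added and lost axioms after an import + export cycle.
public class ConformanceCheck {

    public static void main(String[] args) throws Exception {
        OWLOntologyManager m = OWLManager.createOWLOntologyManager();

        // Step 1: import the test ontology and export it again.
        OWLOntology original = m.loadOntologyFromOntologyDocument(
                IRI.create("file:///tmp/Oi.owl"));               // placeholder path
        m.saveOntology(original, IRI.create("file:///tmp/Oi-roundtrip.owl"));

        // Reload the exported ontology with a fresh manager.
        OWLOntologyManager m2 = OWLManager.createOWLOntologyManager();
        OWLOntology roundtrip = m2.loadOntologyFromOntologyDocument(
                IRI.create("file:///tmp/Oi-roundtrip.owl"));

        // α = information added, α' = information lost (here: axiom sets).
        Set<OWLAxiom> added = new HashSet<OWLAxiom>(roundtrip.getAxioms());
        added.removeAll(original.getAxioms());
        Set<OWLAxiom> lost = new HashSet<OWLAxiom>(original.getAxioms());
        lost.removeAll(roundtrip.getAxioms());

        // SAME if nothing was added or lost, otherwise DIFFERENT;
        // NO would be reported if execution failed with an exception.
        String verdict = (added.isEmpty() && lost.isEmpty()) ? "SAME" : "DIFFERENT";
        System.out.println(verdict + " (+" + added.size() + ", -" + lost.size() + " axioms)");
    }
}
```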
  • Interoperability evaluation
    • Ontology language interoperability
      – The ability to interchange ontologies and use them
    • Goal: to evaluate the interoperability of semantic technologies in terms of their ability to interchange ontologies and use them
    (Diagram: Tool X performs Step 1, Import + Export: O1 = O1'' + α - α'; Tool Y performs Step 2, Import + Export: O1'' = O1'''' + β - β'; Interchange: O1 = O1'''' + α - α' + β - β')
  • Metrics
    • Execution informs about the correct execution:
      – OK: no execution problem
      – FAIL: some execution problem
      – Platform Error (P.E.): platform exception
      – Not Executed (N.E.): second step not executed
    • Information added or lost, in terms of triples, axioms, etc.: Oi = Oi' + α - α'
    • Interchange informs whether the ontology has been interchanged correctly, with no addition or loss of information (Oi = Oi'?):
      – SAME if Execution is OK and Information added and Information lost are void
      – DIFFERENT if Execution is OK but Information added or Information lost are not void
      – NO if Execution is FAIL, N.E., or P.E.
  • Test suites used

    Name | Definition | Nº Tests
    RDF(S) Import Test Suite | Manual | 82
    OWL Lite Import Test Suite | Manual | 82
    OWL DL Import Test Suite | Keyword-driven generator | 561
    OWL Full Import Test Suite | Manual | 90
    OWL Content Pattern | Expressive generator | 81
    OWL Content Pattern Expressive | Expressive generator | 81
    OWL Content Pattern Full Expressive | Expressive generator | 81
  • Tools evaluated
    (Figure: the tools participating in the 1st and 2nd Evaluation Campaigns)
  • Evaluation Execution
    • Evaluations automatically performed with the SEALS Platform (http://www.seals-project.eu/)
    • Evaluation materials available:
      – Test data
      – Results
      – Metadata
    (Diagram: test suites feed the SEALS Platform, which produces raw results and interpretations for conformance, interoperability, and scalability)
  • Dynamic result visualization
  • RDF(S) conformance results
    • Jena and Sesame behave identically (no problems)
    • The behaviour of the OWL API-based tools (NeOn Toolkit, OWL API and Protégé 4) has changed significantly
      – They transform ontologies to OWL 2
      – Some problems remain, fewer in newer versions
    • Protégé OWL improves
  • OWL Lite conformance results
    • Jena and Sesame behave identically (no problems)
    • The OWL API-based tools (NeOn Toolkit, OWL API and Protégé 4) improve
      – They transform ontologies to OWL 2
    • Protégé OWL improves
  • OWL DL conformance results
    • Jena and Sesame behave identically (no problems)
    • OWL API and Protégé 4 improve
    • NeOn Toolkit worsens
    • Protégé OWL behaves identically
    • Robustness increases
  • Content pattern conformance results
    • New issues identified in the OWL API-based tools (NeOn Toolkit, OWL API and Protégé 4)
    • New issue identified in Protégé 4
    • No new issues
  • Interoperability results (1st vs. 2nd Evaluation Campaign)
    • Same analysis as for conformance
    • OWL DL: new issue found in interchanges from Protégé 4 to Protégé OWL
    • Conclusions:
      – RDF-based tools have no interoperability problems
      – OWL-based tools have no interoperability problems with OWL Lite, but have some with OWL DL
      – Tools based on the OWL API cannot interoperate using RDF(S) (they convert ontologies into OWL 2)
  • 2nd SEALS Yardsticks for Ontology Management
    – Conformance and interoperability results
    – Scalability results
    – Conclusions
  • Scalability evaluation
    (Diagram: Tool X performs Step 1, Import + Export; O1 = O1'' + α - α')
  • Execution settings
    • Test suites:
      – Real World: complex ontologies from biological and medical domains
      – Real World NCI: Thesaurus subsets (1.5-2 times bigger)
      – LUBM: synthetic ontologies
    • Execution environment:
      – Win7 64-bit, Intel Core 2 Duo CPU, 2.40 GHz, 4.00 GB RAM (Real World ontologies test collections)
      – WinServer 64-bit, AMD Dual Core, 2.60 GHz (4 processors), 8.00 GB RAM (LUBM ontologies test collection)
    • Constraint: 30 min threshold per test case (a sketch of how such a threshold can be enforced follows)
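The following is an illustrative sketch of enforcing a per-test-case time limit, not the SEALS Platform's actual mechanism; runTestCase stands in for one import + export run of the tool under test.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: run a single test case, giving up after a fixed threshold.
public class TimedTestCase {

    static boolean runWithTimeout(Runnable testCase, long minutes) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(testCase);
        try {
            future.get(minutes, TimeUnit.MINUTES); // block until done or timeout
            return true;                           // finished within the threshold
        } catch (TimeoutException e) {
            future.cancel(true);                   // abort the hanging test
            return false;                          // no result is recorded
        } catch (Exception e) {
            return false;                          // execution error
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Runnable runTestCase = new Runnable() {    // placeholder for import + export
            public void run() { /* invoke the tool under test here */ }
        };
        System.out.println(runWithTimeout(runTestCase, 30));
    }
}
```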
  • Real World Scalability Test Suite (times in seconds; "-" = no result)

    Test | Size (MB) | Triples | Protégé OWL | Protégé 4 v4.1 | Protégé 4 v4.2 | OWL API v3.1.0 | OWL API v3.2.4 | NTK v2.3.2 | NTK v2.5.2 | Jena v2.7.0 | Sesame v2.6.5
    RO1 | 0.2 | 3K | 5 | 2 | 2 | 2 | 2 | 3 | 2 | 3 | 2
    RO2 | 0.6 | 4K | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 1
    RO3 | 1 | 11K | 11 | 3 | 4 | 12 | 5 | 7 | 7 | 8 | 2
    RO4 | 3 | 31K | 4 | 5 | 5 | 5 | 4 | 5 | 5 | 5 | 3
    RO5 | 4 | 82K | 8 | 8 | 10 | 7 | 7 | 12 | 7 | 8 | 4
    RO6 | 6 | 92K | 8 | 9 | 12 | 9 | 9 | 11 | 14 | 9 | 4
    RO7 | 10 | 135K | 10 | 11 | 11 | 11 | 10 | 13 | 11 | 10 | 4
    RO8 | 10 | 167K | 14 | 9 | 8 | 8 | 9 | 11 | 11 | 12 | 4
    RO9 | 20 | 270K | 22 | 20 | 24 | 18 | 16 | 19 | 19 | 18 | 7
    R10 | 24 | 315K | 68 | 21 | 24 | 19 | 18 | 26 | 20 | 19 | 8
    R11 | 26 | 346K | 162 | 25 | 19 | 22 | 21 | 27 | 22 | 22 | 9
    R12 | 40 | 407K | - | 24 | 22 | 26 | 23 | 28 | 30 | 26 | 9
    R13 | 44 | 646K | - | 36 | 33 | 35 | 34 | 44 | 40 | 37 | 13
    R14 | 46 | 671K | - | 30 | 27 | 28 | 28 | 35 | 37 | 41 | 13
    R15 | 84 | 864K | - | 34 | 26 | 32 | 26 | 36 | 33 | 69 | 21
    R16 | 117 | 1623K | - | - | - | - | - | - | - | 102 | 33
  • Real World NCI Scalability Test Suite (times in seconds; "-" = no result)

    Test | Size (MB) | Triples | Protégé OWL | Protégé 4 v4.1 | Protégé 4 v4.2 | OWL API v3.1.0 | OWL API v3.2.4 | NTK v2.3.2 | NTK v2.5.2 | Jena v2.7.0 | Sesame v2.6.5
    NO1 | 0.5 | 3.6K | 10 | 5 | 6 | 4 | 3 | 4 | 4 | 4 | 2
    NO2 | 0.6 | 4.3K | 4 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2
    NO3 | 1 | 11K | 5 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 2
    NO4 | 4 | 31K | 9 | 5 | 8 | 5 | 5 | 6 | 5 | 5 | 3
    NO5 | 11 | 82K | 13 | 7 | 10 | 8 | 8 | 9 | 8 | 9 | 5
    NO6 | 14 | 109K | 17 | 8 | 10 | 9 | 10 | 10 | 10 | 10 | 5
    NO7 | 18 | 135K | 19 | 9 | 12 | 10 | 10 | 12 | 12 | 11 | 5
    NO8 | 23 | 167K | 23 | 10 | 14 | 11 | 11 | 13 | 13 | 14 | 7
    NO9 | 38 | 270K | 37 | 15 | 16 | 15 | 13 | 18 | 17 | 20 | 9
    N10 | 44 | 314K | 74 | 16 | 18 | 16 | 17 | 21 | 19 | 23 | 10
    N11 | 48 | 347K | 136 | 17 | 19 | 16 | 18 | 21 | 20 | 24 | 10
    N12 | 56 | 407K | - | 20 | 22 | 19 | 19 | 26 | 24 | 30 | 13
    N13 | 89 | 646K | - | 29 | 28 | 28 | 29 | 39 | 35 | 47 | 18
    N14 | 92 | 671K | - | 28 | 32 | 28 | 29 | 39 | 35 | 49 | 21
    N15 | 118 | 864K | - | 34 | 36 | 34 | 36 | 48 | 45 | 63 | 26
    N16 | 211 | 1540K | - | 61 | 61 | 62 | 71 | 83 | 100 | 282 | 41
  • LUBM Test Suite (times in seconds; XMYY = X min YY s; "-" = no result)

    Test | Size (MB) | Protégé OWL | Protégé 4 v4.1 | Protégé 4 v4.2 | OWL API v3.1.0 | OWL API v3.2.4 | NTK v2.3.2 | NTK v2.5.2 | Jena v2.7.0 | Sesame v2.6.5
    LO1 | 8 | 29 | 20 | 25 | 15 | 29 | 11 | 16 | 17 | 5
    LO2 | 19 | 1M52 | 19 | 30 | 18 | 30 | 16 | 22 | 30 | 8
    LO3 | 28 | 2M59 | 17 | 28 | 27 | 40 | 20 | 26 | 42 | 10
    LO4 | 39 | 4M05 | 24 | 33 | 33 | 41 | 28 | 39 | 47 | 12
    LO5 | 51 | 17M27 | 36 | 40 | - | 54 | - | 54 | 59 | 14
    LO6 | 60 | 22M43 | 41 | 45 | - | 60 | - | 1M04 | 1M03 | 16
    LO7 | 72 | 26M32 | 1M1 | 53 | - | 1M18 | - | 1M28 | 1M17 | 19
    LO8 | 82 | - | 1M16 | 59 | - | 1M3 | - | - | 1M27 | 20
    LO9 | 92 | - | 1M37 | 1M8 | - | 2M12 | - | - | 1M39 | 23
    L10 | 105 | - | 2M2 | 1M31 | - | 2M53 | - | - | 1M48 | 27
    L11 | 116 | - | 3M18 | - | - | - | - | - | 2M02 | 33
    L12 | 129 | - | 4M59 | - | - | - | - | - | 2M15 | 35
    L13 | 143 | - | 7M21 | - | - | - | - | - | 2M33 | 40
    L14 | 153 | - | 9M07 | - | - | - | - | - | 2M4 | 42
    L15 | 162 | - | 11M23 | - | - | - | - | - | 2M52 | 43
    L16 | 174 | - | 14M09 | - | - | - | - | - | 3M02 | 44
    L17 | 184 | - | 17M | - | - | - | - | - | 3M2 | 46
    L18 | 197 | - | 23M05 | - | - | - | - | - | 3M34 | 51
    L19 | 251 | - | 27M21 | - | - | - | - | - | 3M49 | 1M12
  • LUBM Test Suite (II)

    Test | Size (MB) | Protégé 4 v4.1 | Jena v2.7.0 | Sesame v2.6.5
    L20 | 263 | - | 4M05 | 1M11
    L21 | 284 | - | 4M17 | 1M03
    L22 | 242 | - | 4M18 | 1M07
    L23 | 251 | - | 4M36 | 1M03
    L24 | 263 | - | 4M56 | 1M07
    L25 | 284 | - | 5M31 | 1M17
    L26 | 297 | - | 5M35 | 1M18
    L27 | 307 | - | 5M46 | 1M22
    L28 | 317 | - | 6M09 | 1M27
    L29 | 330 | - | 6M13 | 1M3
    L30 | 340 | - | 6M23 | 1M3
    L31 | 354 | - | 8M03 | 1M35
    L32 | 363 | - | 8M07 | 1M31
    L33 | 375 | - | 9M19 | 1M33
    L34 | 386 | - | - | 1M3
    L35 | 399 | - | - | 1M39

    Test | Size (MB) | Sesame v2.6.5
    L36 | 412 | 1M44
    L37 | 421 | 1M45
    L38 | 430 | 1M49
    L39 | 441 | 1M49
    L40 | 453 | 1M55
    L41 | 467 | 2M05
    L42 | 480 | 2M04
    L43 | 489 | 2M14
    L44 | 498 | 2M13
    L45 | 510 | 2M23

    LUBM Extended Test Suite:
    Test | Size (MB) | Sesame v2.6.5
    Le46 | 598 | 2M49
    Le47 | 705 | 16M58
    Le48 | 802 | -
    Le49 | 906 | -
    Le50 | 1,001 | -
    Le51 | 1,105 | -
    Le52 | 1,205 | -
    Le53 | 1,302 | -
    Le54 | 1,404 | -
    Le55 | 1,514 | -
  • 2nd SEALS Yardsticks for Ontology Management
    – Conformance and interoperability results
    – Scalability results
    – Conclusions
  • Conclusions - Test data
    • Test suites are not exhaustive
      – The new test suites helped detect new issues
    • A more expressive test suite does not imply detecting more issues
    • We used existing ontologies as input for the test data generator
      – This requires a prior analysis of the ontologies to detect defects
      – We found ontologies with issues that we had to correct
  • Conclusions - Results
    • Tools have improved their conformance, interoperability, and robustness
    • Development decisions have a high influence
      – The OWL API radically changed the way it deals with RDF ontologies
      – We need tools for easy evaluation
      – We need stronger regression testing
    • The automated generator defined test cases that a person would never have thought of, but which identified new tool issues
    • Using bigger ontologies for conformance and interoperability testing makes it much more difficult to find problems in the tools
  • Evaluating Storage and Reasoning Systems
  • Index
    • Evaluation scenarios
    • Evaluation descriptions
    • Test data
    • Tools
    • Results
    • Conclusion
  • Advanced reasoning systems
    • Description logic based systems (DLBSs)
    • Standard reasoning services:
      – Classification
      – Class satisfiability
      – Ontology satisfiability
      – Logical entailment
  • Existing evaluations
    • Datasets
      – Synthetic generation
      – Hand-crafted ontologies
      – Real-world ontologies
    • Evaluations
      – KRSS benchmark
      – TANCS benchmark
      – Gardiner dataset
  • Evaluation criteria
    • Interoperability
      – The capability of the software product to interact with one or more specified systems
      – A system must:
        • conform to the standard input formats
        • be able to perform standard inference services
    • Performance
      – The capability of the software to provide appropriate performance, relative to the amount of resources used, under stated conditions
  • Evaluation metrics
    • Interoperability
      – Number of tests passed without parsing errors
      – Number of inference tests passed
    • Performance
      – Loading time
      – Inference time
  • Class satisfiability evaluation
    • Standard inference service that is widely used in ontology engineering
    • Goal: to assess both DLBSs' interoperability and performance
    • Input
      – OWL ontology
      – One or several class IRIs
    • Output
      – TRUE: the evaluation outcome coincides with the expected result
      – FALSE: the evaluation outcome differs from the expected result
      – ERROR: indicates an I/O error
      – UNKNOWN: indicates that the system is unable to compute the inference in the given timeframe
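A harness can map a single reasoner call onto these four outcomes roughly as sketched below, assuming the OWL API 3 OWLReasoner interface; the expected value comes from the test case, and the exception handling shown is illustrative rather than the SEALS implementation.

```java
import org.semanticweb.owlapi.model.OWLClass;
import org.semanticweb.owlapi.reasoner.OWLReasoner;
import org.semanticweb.owlapi.reasoner.TimeOutException;

// Sketch: map one class-satisfiability call onto TRUE/FALSE/ERROR/UNKNOWN.
public class SatisfiabilityOutcome {

    static String evaluate(OWLReasoner reasoner, OWLClass cls, boolean expected) {
        try {
            boolean actual = reasoner.isSatisfiable(cls);
            return (actual == expected) ? "TRUE" : "FALSE";
        } catch (TimeOutException e) {
            return "UNKNOWN"; // could not compute the inference in the given timeframe
        } catch (Exception e) {
            return "ERROR";   // I/O or reasoner failure
        }
    }
}
```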
  • Class satisfiability evaluation
  • Ontology satisfiability evaluation
    • Standard inference service typically carried out before performing any other reasoning task
    • Goal: to assess both DLBSs' interoperability and performance
    • Input
      – OWL ontology
    • Output
      – TRUE: the evaluation outcome coincides with the expected result
      – FALSE: the evaluation outcome differs from the expected result
      – ERROR: indicates an I/O error
      – UNKNOWN: indicates that the system is unable to compute the inference in the given timeframe
  • Ontology satisfiability evaluation
  • Classification evaluation
    • Inference service typically carried out after testing ontology satisfiability and prior to performing any other reasoning task
    • Goal: to assess both DLBSs' interoperability and performance
    • Input
      – OWL ontology
    • Output
      – OWL ontology (the classified ontology)
      – ERROR: indicates an I/O error
      – UNKNOWN: indicates that the system is unable to compute the inference in the given timeframe
  • Classification evaluation
  • Logical entailment evaluation
    • Standard inference service that is the basis for query answering
    • Goal: to assess both DLBSs' interoperability and performance
    • Input
      – Two OWL ontologies
    • Output
      – TRUE: the evaluation outcome coincides with the expected result
      – FALSE: the evaluation outcome differs from the expected result
      – ERROR: indicates an I/O error
      – UNKNOWN: indicates that the system is unable to compute the inference in the given timeframe
  • Logical entailment
  • Storage and reasoning systems evaluation component
    • The SRS component is intended to evaluate description logic based systems (DLBSs)
      – implementing the OWL API 3, the de-facto standard for DLBSs
      – implementing the SRS SEALS DLBS interface
    • SRS supports test data in all syntactic formats supported by the OWL API 3
    • SRS saves the evaluation results and interpretations in MathML 3 format
  • DLBS interface
    • Java methods to be implemented by system developers (an illustrative wrapper follows):
      – OWLOntology loadOntology(IRI iri)
      – boolean isSatisfiable(OWLOntology onto, OWLClass class)
      – boolean isSatisfiable(OWLOntology onto)
      – OWLOntology classifyOntology(OWLOntology onto)
      – URI saveOntology(OWLOntology onto, IRI iri)
      – boolean entails(OWLOntology onto1, OWLOntology onto2)
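As an illustration, a wrapper backed by HermiT's OWL API binding might look as follows. This is one possible implementation sketch, not the SEALS reference code; in particular, building a fresh Reasoner per call and materializing only the subclass hierarchy in classifyOntology are simplifications.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import org.semanticweb.HermiT.Reasoner;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;
import org.semanticweb.owlapi.util.InferredAxiomGenerator;
import org.semanticweb.owlapi.util.InferredOntologyGenerator;
import org.semanticweb.owlapi.util.InferredSubClassAxiomGenerator;

public class HermiTWrapper {

    private final OWLOntologyManager manager = OWLManager.createOWLOntologyManager();

    public OWLOntology loadOntology(IRI iri) throws OWLOntologyCreationException {
        return manager.loadOntologyFromOntologyDocument(iri);
    }

    public boolean isSatisfiable(OWLOntology onto, OWLClass cls) {
        return new Reasoner(onto).isSatisfiable(cls);
    }

    public boolean isSatisfiable(OWLOntology onto) {
        return new Reasoner(onto).isConsistent();
    }

    public OWLOntology classifyOntology(OWLOntology onto) throws OWLOntologyCreationException {
        // Materialize the inferred class hierarchy into a fresh ontology.
        List<InferredAxiomGenerator<? extends OWLAxiom>> gens =
                new ArrayList<InferredAxiomGenerator<? extends OWLAxiom>>();
        gens.add(new InferredSubClassAxiomGenerator());
        OWLOntology inferred = manager.createOntology();
        new InferredOntologyGenerator(new Reasoner(onto), gens).fillOntology(manager, inferred);
        return inferred;
    }

    public URI saveOntology(OWLOntology onto, IRI iri) throws OWLOntologyStorageException {
        manager.saveOntology(onto, iri);
        return iri.toURI();
    }

    public boolean entails(OWLOntology onto1, OWLOntology onto2) {
        // onto1 entails onto2 iff every logical axiom of onto2 is entailed.
        return new Reasoner(onto1).isEntailed(onto2.getLogicalAxioms());
    }
}
```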
  • Testing data
    • The ontologies from the Gardiner evaluation suite
      – Over 300 ontologies of varying expressivity and size
    • Various versions of the GALEN ontology
    • Various ontologies that have been created in EU-funded projects, such as SEMINTEC, VICODI and AEO
    • 155 entailment tests from the OWL 2 test cases repository
  • Evaluation setup•  3  DLBSs   –  FaCT++  C++  implementa3on  of  FaCT  OWL  DL  reasoner   –  HermiT  Java  based  OWL  DL  reasoner  u3lizing  novel  hypertableau   algorithms   –  Jcel  Java  based  OWL  2  EL  reasoner   –  FaCT++C    evaluated  without  OWL  prepareReasoner()  call   –  HermiTC  evaluated  without  OWL  prepareReasoner()  call  •  2  AMD  Athlon(tm)  64  X2  Dual  Core  Processor  4600+  machines   with  2GB  of  main  memory     –  DLBSs  were  allowed  to  allocate  up  to  1  GB  
  • Evaluation results: Classification

    Metric | FaCT++ | HermiT | jcel
    ALT (avg. loading time), ms | 68 | 506 | 856
    ART (avg. reasoning time), ms | 15320 | 167808 | 2144
    TRUE | 160 | 145 | 16
    FALSE | 0 | 0 | 0
    ERROR | 47 | 33 | 4
    UNKNOWN | 3 | 32 | 0
  • Evaluation results: Class satisfiability

    Metric | FaCT++ | HermiT | jcel
    ALT, ms | 1047 | 255 | 438
    ART, ms | 21376 | 517043 | 1113
    TRUE | 157 | 145 | 15
    FALSE | 1 | 0 | 0
    ERROR | 36 | 35 | 5
    UNKNOWN | 16 | 30 | 0
  • Evaluation results: Ontology satisfiability

    Metric | FaCT++ | HermiT | jcel
    ALT, ms | 1315 | 410 | 708
    ART, ms | 25175 | 249802 | 1878
    TRUE | 134 | 146 | 16
    FALSE | 0 | 0 | 0
    ERROR | 45 | 33 | 4
    UNKNOWN | 0 | 31 | 0
  • Evaluation results: Entailment

    Metric | FaCT++ | HermiT
    ALT, ms | 14 | 33
    ART, ms | 1 | 20673
    TRUE | 46 | 119
    FALSE | 67 | 14
    ERROR | 34 | 9
    UNKNOWN | 0 | 3
  • Evaluation results: Non-entailment

    Metric | FaCT++ | HermiT
    ALT, ms | 47 | 92
    ART, ms | 5 | 127936
    TRUE | 7 | 7
    FALSE | 0 | 1
    ERROR | 3 | 1
    UNKNOWN | 0 | 1
  • Comparative evaluation: Classification

    Metric | FaCT++C | HermiTC
    ALT, ms | 309 | 207
    ART, ms | 3994 | 2272
    TRUE | 112 | 112
  • Comparative evaluation: Class satisfiability

    Metric | FaCT++C | HermiTC
    ALT, ms | 333 | 225
    ART, ms | 216 | 391
    TRUE | 113 | 113
  • Comparative evaluation: Ontology satisfiability

    Metric | FaCT++C | HermiTC
    ALT, ms | 333 | 225
    ART, ms | 216 | 391
    TRUE | 113 | 113
  • Comparative evaluation: Entailment

    Metric | FaCT++C | HermiTC
    ALT, ms | 7 | 7
    ART, ms | 2 | 24
    TRUE | 1 | 1
  • Comparative evaluation: Non-entailment

    Metric | FaCT++C | HermiTC
    ALT, ms | 22 | 18
    ART, ms | 2 | 43
    TRUE | 4 | 4
  • Comparative evaluation: Classification

    Metric | FaCT++C | HermiTC | FaCT++ | HermiT | jcel
    ALT, ms | 398 | 355 | 1471 | 771 | 856
    ART, ms | 11548 | 1241 | 36650 | 2817 | 2144
    TRUE | 16 | 16 | 16 | 16 | 16
  • Comparative evaluation: Class satisfiability

    Metric | FaCT++C | HermiTC | FaCT++ | HermiT | jcel
    ALT, ms | 382 | 342 | 532 | 1062 | 438
    ART, ms | 159 | 223 | 7603 | 3437 | 1113
    TRUE | 15 | 15 | 15 | 15 | 15
  • Comparative evaluation: Ontology satisfiability

    Metric | FaCT++C | HermiTC | FaCT++ | HermiT | jcel
    ALT, ms | 360 | 365 | 1389 | 1262 | 708
    ART, ms | 11548 | 202 | 36650 | 4790 | 1878
    TRUE | 16 | 16 | 16 | 16 | 16
  • Challenging ontologies: Classification

    Ontology | Mosquito-anatomy | GALEN | mged | go | worm-anatomy
    Classes | 1864 | 2749 | 229 | 19528 | 6731
    Relations | 2 | 413 | 102 | 1 | 5
    FaCT++C, LT ms | 3760 | 663 | 189 | 4362 | 783
    FaCT++C, RT ms | 9568 | 9970 | 355 | 28041 | 45739
    HermiTC, LT ms | 510 | 609 | 273 | 4328 | 973
    HermiTC, RT ms | 944 | 12623 | 27974 | 12698 | 2491
  • Challenging ontologies: Classification (continued)

    Ontology | plans | information | human | Fly-anatomy | emap
    Classes | 118 | 121 | 8342 | 6326 | 13731
    Relations | 263 | 197 | 1 | 3 | 1
    FaCT++C, LT ms | 67 | 106 | 3186 | 662 | 1965
    FaCT++C, RT ms | 661 | 126 | 132607 | 5016 | 156714
    HermiTC, LT ms | 67 | 95 | 1192 | 746 | 1311
    HermiTC, RT ms | 115576 | 7064 | 3842 | 6564 | 7097
  • Challenging ontologies: Class satisfiability

    Ontology | not-GALEN | GALEN | mged | go | plans
    Class | Digestion | Trimethoprim | Thing | GO_0042447 | schedule
    Classes | 3087 | 2749 | 229 | 19528 | 118
    Relations | 413 | 413 | 102 | 1 | 263
    FaCT++C, LT ms | 1130 | 652 | 174 | 4351 | 78
    FaCT++C, RT ms | 3215 | 1065 | 160 | 1465 | 79
    HermiTC, LT ms | 1087 | 680 | 358 | 3961 | 67
    HermiTC, RT ms | 11210 | 9108 | 4333 | 2776 | 3459
  • Challenging ontologies: Ontology satisfiability

    Ontology | not-GALEN | GALEN | mged | go | plans
    Classes | 3087 | 2749 | 229 | 19528 | 118
    Relations | 413 | 413 | 102 | 1 | 263
    FaCT++C, LT ms | 992 | 618 | 189 | 4383 | 67
    FaCT++C, RT ms | 3047 | 1057 | 170 | 1413 | 74
    HermiTC, LT ms | 1166 | 590 | 346 | 4371 | 69
    HermiTC, RT ms | 11562 | 9408 | 3197 | 2687 | 1827
  • Conclusion
    • Errors:
      – datatypes not supported by the systems
      – syntax-related: a system was unable to register a role or a concept
      – expressivity errors
    • Execution time is dominated by a small number of hard problems
  • SEALS Ontology Matching Evaluation campaign, also known as OAEI 2011.5
  • Ontology Matching
    (Figure: two conference-domain ontologies, with classes such as Person/People, Author, Reviewer, Document/Doc and Paper and the relations writes and reviews, connected by an alignment)
    Example correspondences:
    < Author, Author, =, 0.97 >
    < Paper, Paper, =, 0.94 >
    < reviews, reviews, =, 0.91 >
    < writes, writes, =, 0.7 >
    < Person, People, =, 0.8 >
    < Document, Doc, =, 0.7 >
    < Reviewer, Review, =, 0.6 >
    ...
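Each correspondence is a tuple of two entities, a relation, and a confidence value. The small class below is an illustrative container for such tuples; OAEI tools actually exchange alignments in the Alignment API's RDF format, not in this ad-hoc form.

```java
// Illustrative holder for a correspondence <entity1, entity2, relation, confidence>.
public class Correspondence {

    final String entity1;    // e.g. "Author" in the first ontology
    final String entity2;    // e.g. "Author" in the second ontology
    final String relation;   // e.g. "=" for equivalence
    final double confidence; // e.g. 0.97

    Correspondence(String entity1, String entity2, String relation, double confidence) {
        this.entity1 = entity1;
        this.entity2 = entity2;
        this.relation = relation;
        this.confidence = confidence;
    }

    @Override
    public String toString() {
        return "< " + entity1 + ", " + entity2 + ", " + relation + ", " + confidence + " >";
    }
}
```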
  • OAEI & SEALS
    • OAEI: Ontology Alignment Evaluation Initiative
      – Organized as an annual campaign from 2005 to 2012
      – Included in the Ontology Matching workshop at ISWC
      – Different tracks (evaluation scenarios) organized by different researchers
    • Starting in 2010: support from SEALS
      – OAEI 2010, OAEI 2011, and OAEI 2011.5
  • OAEI 2011.5 participants
    (Figure: the participating systems)
  • OAEI tracks (Jose Aguirre, Jérôme Euzenat, INRIA Grenoble)
    • Benchmark
      – Matching different versions of the same ontology
      – Scalability: size vs. runtimes
    • Conference
    • MultiFarm
    • Anatomy
    • Large BioMed
  • OAEI tracks (Ondřej Šváb-Zamazal, Vojtěch Svátek, Prague University of Economics)
    • Benchmark
    • Conference
      – Same domain, different ontologies
      – Manually generated reference alignment
    • MultiFarm
    • Anatomy
    • Large BioMed
  • OAEI tracks (Christian Meilicke, Cassia Trojahn; University of Mannheim, INRIA Grenoble)
    • Benchmark
    • Conference
    • MultiFarm: Multilingual Ontology Matching
      – Based on Conference
      – Test cases for Spanish, German, French, Russian, Portuguese, Czech, Dutch, Chinese
    • Anatomy
    • Large BioMed
  • OAEI tracks (Christian Meilicke, Heiner Stuckenschmidt, University of Mannheim)
    • Benchmark
    • Conference
    • MultiFarm
    • Anatomy
      – Matching mouse anatomy to human anatomy
      – Runtimes
    • Large BioMed
  • OAEI tracks (Ernesto Jiménez-Ruiz, Bernardo Cuenca Grau, Ian Horrocks, University of Oxford)
    • Benchmark
    • Conference
    • MultiFarm
    • Anatomy
    • Large BioMed
      – Very large dataset (FMA-NCI)
      – Includes coherence analysis
  • Detailed results: http://oaei.ontologymatching.org/2011.5/results/index.html
  • Questions? Write a mail to Christian Meilicke, christian@informatik.uni-mannheim.de
  • Semantic Search Systems Evaluation Campaign (IWEST 2012 workshop, located at ESWC 2012)
  • Two-phase approach
    • Semantic search tool evaluation demands a user-in-the-loop phase
      – usability criterion
    • Two phases:
      – User-in-the-loop
      – Automated
  • Evaluation criteria by phase
    Each phase addresses a different subset of criteria:
    • Automated phase: query expressiveness, scalability, performance
    • User-in-the-loop phase: usability, query expressiveness
  • Participants

    Tool | Description | UITL | Auto
    K-Search | Form-based | x | x
    Ginseng | Natural language with constrained vocabulary and grammar | x |
    NLP-Reduce | Natural language for full English questions, sentence fragments, and keywords | x |
    Jena Arq | SPARQL query engine; automated-phase baseline | | x
    RDF.Net Query | SPARQL-based | | x
    Semantic Crystal | Graph-based | x |
    Affective Graphs | Graph-based | x |
  • Usability Evaluation Setup
    • Data: Mooney Natural Language Learning Data
    • Subjects: 20 (10 expert users; 10 casual users)
      – Each subject evaluated the 5 participating tools
    • Task: formulate 5 questions in each tool's interface
    • Data collected: success rate, input time, number of attempts, response time, user satisfaction questionnaires, demographics
  • Questions
    1) Give me all the capitals of the USA? (1 concept, 1 relation)
    2) What are the cities in states through which the Mississippi runs? (2 concepts, 2 relations)
    3) Which states have a city named Columbia with a city population over 50,000? (comparative)
    4) Which lakes are in the state with the highest point? (superlative)
    5) Tell me which rivers do not traverse the state with the capital Nashville? (negation)
  • Automated Evaluation Setup
    • Data: EvoOnt dataset
      – Five sizes: 1K, 10K, 100K, 1M, 10M triples
    • Task: answer 10 questions per dataset size
    • Data collected: ontology load time, query time, number of results, result list
    • Analyses: precision, recall, f-measure, mean query time, mean time per result, etc.
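Precision, recall and F-measure are computed per question from the returned result list against the expected one. The sketch below uses the standard definitions; it is not SEALS code, and result identifiers are assumed to be plain strings.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: precision, recall and F-measure over one query's results.
public class RetrievalMetrics {

    static double[] prf(Set<String> returned, Set<String> expected) {
        Set<String> correct = new HashSet<String>(returned);
        correct.retainAll(expected);                        // true positives
        double p = returned.isEmpty() ? 0.0 : (double) correct.size() / returned.size();
        double r = expected.isEmpty() ? 0.0 : (double) correct.size() / expected.size();
        double f = (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r); // harmonic mean of p and r
        return new double[] { p, r, f };
    }
}
```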
  • Configuration
    • All tools executed on the SEALS Platform
    • Each tool executed within a virtual machine:

      | Linux | Windows
      OS | Ubuntu 10.10 (64-bit) | Windows 7 (64-bit)
      Num CPUs | 2 | 4
      Memory (GB) | 4 | 4
      Tools | Arq v2.8.2 and Arq v2.9.0 | RDF Query v0.5.1-beta
  • FINDINGS - USABILITY
  • Graph-based tools most liked (highest ranks and average SUS scores)
    (Chart: System Usability Scale "SUS" questionnaire scores by tool and user type)
    • Perceived by expert users as intuitive, allowing them to easily formulate more complex queries
    • Casual users enjoyed the fun and visually appealing interfaces, which created a pleasant search experience
  • Form-based approach most liked by casual users
    (Chart: "The system's query language was easy to understand and use" scores by tool and user type)
    • Perceived by casual users as a midpoint between NL and graph-based approaches
    • Allows more complex queries than NL does
    • Less complicated, and less query input time, than the graph-based approach
    • Together with graph-based: most liked by expert users
  • Casual users liked the controlled-NL approach
    (Chart: SUS questionnaire scores by tool and user type)
    • Casual users:
      – liked guidance through suggestions
      – prefer to be 'controlled' by the language model, allowing only valid queries
    • Expert users:
      – found it restrictive and frustrating
      – prefer to have more flexibility and expressiveness rather than support and restriction
  • Free-NL challenge: the habitability problem
    (Chart: answer-found rate by tool and user type)
    • Free NL was liked for its simplicity, familiarity, naturalness and the low query input time required
    • It faces the habitability problem: a mismatch between the users' query terms and the tools' ones
    • This led to the lowest success rate, the highest number of trials to get a satisfying answer, and in turn very low user satisfaction
  • FINDINGS - AUTOMATED
  • Overview
    • K-Search couldn't load the ontologies
      – external ontology import not supported
      – cyclic relations with concepts in remote ontologies not supported
    • Non-NL tools transform queries a priori
    • Native SPARQL tools exhibit differences in query approach (see load and query times)
  • Ontology load time
    (Chart: ontology load time vs. dataset size for Arq v2.8.2, Arq v2.9.0, and RDF Query v0.5.1-beta)
    • RDF Query loads the ontology on-the-fly; load times are therefore independent of dataset size
    • Arq loads the ontology into memory
  • Query time
    (Chart: mean query time vs. dataset size for Arq v2.8.2, Arq v2.9.0, and RDF Query v0.5.1-beta)
    • RDF Query loads the ontology on-the-fly; query times therefore incorporate load time
      – Expensive for more than one query in a session
    • Arq loads the ontology into memory; query times are largely independent of dataset size
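The design difference matters for the measurements above: Arq pays the load cost once and then queries in memory, whereas RDF Query reloads per query. A minimal sketch of the "load once, query many" pattern with the Jena 2.x API follows; the dataset path and query are placeholders.

```java
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class ArqStyleQuerying {

    public static void main(String[] args) {
        // Load time is paid once; subsequent queries run against memory.
        Model model = ModelFactory.createDefaultModel();
        model.read("file:data/dataset.rdf");                      // placeholder dataset

        String sparql = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10";  // placeholder query
        QueryExecution qe = QueryExecutionFactory.create(sparql, model);
        try {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                System.out.println(results.nextSolution());
            }
        } finally {
            qe.close();
        }
    }
}
```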
  • SEALS Semantic Web Service Tools Evaluation Campaign 2011
    Semantic Web Service Discovery Evaluation Results
  • Evaluation of SWS Discovery
    • Finding Web Services based on their semantic descriptions
    • For a given goal and a given set of service descriptions, the tool returns the match degree between the goal and each service
    • Measurement services are provided via the SEALS Platform to measure the rate of matching correctness
  • Campaign Overview
    http://www.seals-project.eu/seals-evaluation-campaigns/2nd-seals-evaluation-campaigns/semantic-web-service-tools-evaluation-campaign-2011
    • Goal
      – Which ontology/annotation is the best: WSMO-Lite, OWL-S or SAWSDL?
    • Assumptions:
      – Same corresponding test collections (TCs)
      – Same corresponding matchmaking algorithms (tools)
      – The corresponding tools will belong to the same provider
      – The level of performance of a tool for a specific TC is of secondary importance
  • Campaign Overview
    Given that a tool T can apply the same corresponding matchmaking algorithm M to corresponding test collections, say TC1, TC2 and TC3, we would like to compare the performance (e.g. precision, recall) among M(TC1), M(TC2) and M(TC3)
  • Background: S3 Challenge
    http://www-ags.dfki.uni-sb.de/~klusch/s3/index.html
    (Diagram: tools T1...Tn with matchmakers M1...Mn evaluated on TCa, e.g. OWL-S; tools TI...TXV with matchmakers MI...MXV evaluated on TCb, e.g. SAWSDL; ...)
  • Background: S3 Challenge, 1st Evaluation Campaign (2010)
    http://www-ags.dfki.uni-sb.de/~klusch/s3/index.html
    (Same setup as above: each tool applies its own matchmaker to the test collection matching its formalism)
  • Background: SWS Challenge
    http://sws-challenge.org/wiki/index.php/Scenario:_Shipment_Discovery
    (Diagram: tools T1, TI, Ta with matchmakers M1, MI, Ma over different formalisms, e.g. OCML, OWL-S, ..., with goal descriptions in plain text)
  • SEALS 2nd SWS Discovery Evaluation
    (Diagram: tools T1, T2, T3, ... apply the same matchmaking algorithm M to TC1, e.g. OWL-S, TC2, e.g. SAWSDL, and TC3, e.g. WSMO-Lite)
  • SEALS Test Collections
    • WSMO-LITE-TC (1080 services, 42 goals)
      http://seals.sti2.at/tdrs-web/testdata/persistent/WSMO-LITE-TC-SWRL/1.0-4b
      http://seals.sti2.at/tdrs-web/testdata/persistent/WSMO-LITE-TC-SWRL/1.0-4g
    • SAWSDL-TC (1080 services, 42 goals)
      http://seals.sti2.at/tdrs-web/testdata/persistent/SAWSDL-TC/3.0-1b
      http://seals.sti2.at/tdrs-web/testdata/persistent/SAWSDL-TC/3.0-1g
    • OWLS-TC (1083 services, 42 goals)
      http://seals.sti2.at/tdrs-web/testdata/persistent/OWLS-TC/4.0-11b
      http://seals.sti2.at/tdrs-web/testdata/persistent/OWLS-TC/4.0-11g
  • Metrics – Galago (1)
  • Metrics – Galago (2)
  • SWS Discovery Evaluation Workflow
  • SWS Tool Deployment: wrapper for the SEALS Platform
  • Tools

    Test collection | Tools
    WSMO-LITE-TC | WSMO-LITE-OU (1)
    SAWSDL-TC | SAWSDL-OU (1), SAWSDL-URJC (2), SAWSDL-M0 (3)
    OWLS-TC | OWLS-URJC (2), OWLS-M0 (3)

    (1) Ning Li, The Open University
    (2) Ziji Cong et al., University of Rey Juan Carlos
    (3) Matthias Klusch et al., German Research Center for Artificial Intelligence
  • Evaluation Execution
    • The evaluation workflow was executed on the SEALS Platform
    • All tools were executed within a virtual machine:

      OS | Windows 7 (64-bit)
      Num CPUs | 4
      Memory (GB) | 4
      Tools | WSMO-LITE-OU, SAWSDL-OU
  • Partial Evaluation Results: WSMO-LITE vs. SAWSDL
    (Diagram: the same matchmaking algorithm M applied as WSMO-LITE-OU over WSMO-LITE-TC and as SAWSDL-OU over SAWSDL-TC)
  • Per-goal results (* this table only shows the results that are different; the table itself is not preserved in the transcript)
  • Analysis
    • Out of 42 goals, only 19 have different results in terms of precision and recall
    • In 17 of those 19 cases, WSMO-Lite improves discovery precision over SAWSDL by specializing service semantics
    • WSMO-Lite performs worse than SAWSDL on discovery recall in 6 of the 19 cases, and the same in the other 13
  • Analysis
    • Goal #17: novel_author_service.wsdl (Education domain)
      http://seals.sti2.at/tdrs-web/testdata/persistent/WSMO-LITE-TC-SWRL/1.0-4b/suite/17/component/GoalDocument/
    • Services chosen by SAWSDL but not WSMO-Lite (Economy domain):
      – romanticnovel_authormaxprice_service.wsdl
      – romanticnovel_authorprice_service.wsdl
      – romanticnovel_authorrecommendedprice_service
      – short-story_authorprice_service.wsdl
      – science-fiction-novel_authorprice_service.wsdl
      – sciencefictionbook_authorrecommendedprice_service.wsdl
      – ...
  • Lessons Learned
    • WSMO-LITE-OU tends to perform better than SAWSDL-OU in terms of precision, but slightly worse in recall
    • The only feature of WSMO-Lite used against SAWSDL was the service category (based on TC domains)
      – Services were filtered by service category in WSMO-LITE-OU and not in SAWSDL-OU
    • Further tests with additional tools and measures are needed for any conclusive results about WSMO-Lite vs. SAWSDL (many tools are not available yet)
  • Conclusions
    • This has been the first SWS evaluation campaign in the community focusing on the impact of the service ontology/annotation on performance
    • The comparison has been facilitated by the generation of WSMO-LITE-TC as a counterpart of SAWSDL-TC and OWLS-TC in the SEALS repository
    • The current comparison only involves 2 ontologies/annotations (WSMO-Lite and SAWSDL)
    • Raw and interpretation results are available in RDF via the SEALS repository (public access)