
Ontology Development Kit: Bio-Ontologies 2019


Presentation on the Ontology Development Kit (ODK), given 2019-07-24 at the Bio-Ontologies COSI at ISMB/ECCB 2019

Published in: Software


  2. CONTINUED GROWTH OF ONTOLOGY PROJECTS
     • OBO ontologies registered:
       • 2015: 104 ontologies
       • 2019: 215 ontologies
     • Observation: ontology development in 2019 == bioinformatics/Perl coding circa 1999
     • State of development of many ontologies:
       • Lack of modularity or reuse
       • Lack of testing frameworks or continuous integration
       • Little or no leveraging of reasoning
       • Version control practice poor or non-existent
     • Ontologies frequently contain errors:
       • Semantic, e.g. duplicate definitions
       • Structural
       • Syntactic
       • Lexical
     [Figure: number of ontologies in BioPortal. Source: Amina Annane and Clement Jonquet]
  3. [Timeline figure, 1970s to 2010s]
     • 1965-85 “Software Crisis”
     • 1972 Source Code Control System (SCCS)
     • 1974 Liskov “strong typing”
     • 1975 Modula
     • 1986 CVS
     • 1987 Pattern Languages
     • 1996 eXtreme Programming (XP); Test Driven Development (TDD)
     • 1999 SourceForge
     • 2000 SVN
     • 2001 Cruise Control
     • 2005 Git
     • 2008 GitHub
     • 2011 Travis
  4. ONTOLOGIES AND OBO: OPEN BIO-ONTOLOGIES
     • 1950s-60s Semantic Networks
     • 1960s-1980s Knowledge Representation
     • 1980s-1990s Description Logics, BioCyc, Medical Knowledge Bases, DAML+OIL
     • 1998 Gene Ontology created
     • 2002 Open Bio-Ontologies (OBO) formed, OBO principles
     • 2003 OBO Format
     • 2003 Rector paper on normalization and modularity
     • 2003 OWL (Web Ontology Language)
     • 2004 Relation Ontology
     • 2007 OBO PURLs (Permanent URLs)
     • 2011-Present OBO Operations Volunteers
     OBO principles:
     • Open (CC BY or CC0)
     • Standard syntax and semantics (OWL)
     • Standard PURLs for classes
     • Versioning
     • Well-defined scope
     • Classes should be defined
     • Use of standard relations (RO/BFO)
     • Documentation
     • Documented plurality of users
     • Commitment to collaboration
     • Locus of authority
     • Naming conventions
     • Maintenance
  5. ODK: ONTOLOGY DEVELOPMENT KIT
     [Diagram: the ODK container and its components]
     • Kernel (seed): seeds an ontology project by creating a GitHub repository with workflows in place
     • ROBOT: ontology operations (command line)
     • Make: workflows; chains together operations
     • dosdp-tools: build ontologies rapidly from Design Pattern templates
     • Reasoners: includes Elk, HermiT, Konclude
     • fastobo: validation of OBO format files (Rust)
     • Complements ODEs (Protégé)
  6. ODK GETTING STARTED: SEEDING AN ONTOLOGY PROJECT
     • Organizing a project in GitHub is not trivial
     • ODK provides a seed utility to start a new project
     • Sets you up with a GitHub structure
     • Can be seeded from a YAML project specification OR command line options
     [Diagram: Project.yaml + Jinja2 templates (hand-authored) → seed → generated repository]
     - .travis.yml
     - .github
       - ISSUE_TEMPLATE
     - src
       - ontology
         - myont-edit.owl
         - Makefile
         - myont-idspaces.owl
       - template
         - mytemplate1.yaml
     - myont.obo
     - myont.owl
  7. ONTOLOGY WORKFLOWS: MAKE AND ROBOT
     • ROBOT: ROBOT is an OBO Tool
     • Commands can be chained together
     • ODK will seed your repo with a Makefile-based workflow
     • edit → release
     ROBOT commands:
     • annotate: add metadata assertions onto the ontology
     • reason: use a reasoner to detect incoherency and assert inferred links
     • diff: compare two ontologies
     • template: generate portions of an ontology from templates and tabular data
     • report: complete QA/QC report
     • extract: extract submodules for imports
     • convert: convert between OWL syntaxes, OBO format, OBO-JSON
     • <more..>
  8. MODULAR ONTOLOGY DEVELOPMENT: EXTRACT
     • Don’t develop monoliths!
     • Reuse existing ontologies
     • OBO was constructed in part to facilitate ontology reuse
     • OWLAPI provides algorithms for extracting ‘modules’ (SLME)
     • Also: MIREOT
     • ROBOT provides an easy wrapper for these:
       robot extract -i chebi.owl -t myont.terms -o chebi_import.owl
     [Diagram: terms from myont.owl + chebi.owl → extract → chebi_import.owl, which myont.owl brings in via owl:import]
     the-rector-normalization-technique/
     Reference: Rector (2003), Modularisation of Domain Ontologies Implemented in Description Logics and related formalisms including OWL. Proceedings of the 2nd International Conference on Knowledge Capture.
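The extract step on slide 8 could be scripted roughly as follows. This is a minimal sketch, not the ODK's own Makefile rule: the helper names `extract_command` and `run_extract` are hypothetical, and the long-form options (`--method`, `--input`, `--term-file`, `--output`) and the `BOT` method come from ROBOT's documentation rather than from the slide, so check them against your installed version.

```python
import shutil
import subprocess


def extract_command(source, term_file, output, method="BOT"):
    """Build the argument list for a `robot extract` call (hypothetical helper)."""
    return [
        "robot", "extract",
        "--method", method,        # module type; BOT is one of the SLME variants
        "--input", source,         # upstream ontology, e.g. chebi.owl
        "--term-file", term_file,  # seed terms, one per line
        "--output", output,        # import module to write
    ]


def run_extract(source, term_file, output, method="BOT"):
    """Run the extraction if ROBOT is on the PATH; otherwise return None."""
    cmd = extract_command(source, term_file, output, method)
    if shutil.which("robot") is None:
        return None
    return subprocess.run(cmd, check=True)


# File names are the ones used on the slide.
cmd = extract_command("chebi.owl", "myont.terms", "chebi_import.owl")
print(" ".join(cmd))
```

Keeping command construction separate from execution means the argument list can be inspected or logged on machines where ROBOT is not installed.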
  9. REASONING
     • Why use reasoning?
       • Semantic validation of the ontology
         • e.g. disjoints, domain/range
         • unintended equivalence
       • Automatic classification, modular development
     • ROBOT provides a simple wrapper onto standard OWL reasoners
     • ODK Docker container includes reasoners that are awkward to install (e.g. Konclude)
     ontologies-using-owl-reasoning-part-1-basics-and-disjoint-classes-axioms/
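To make the "disjoints" bullet concrete, here is a toy sketch of the kind of error a reasoner catches: a class placed under two classes that are declared disjoint. It is plain graph traversal over asserted subclass links, not an OWL reasoner, and the class names and data structures are invented for illustration.

```python
def ancestors(cls, parents):
    """Return cls plus every class reachable through subclass links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def unsatisfiable(parents, disjoint_pairs):
    """List classes that fall under both members of some disjoint pair."""
    bad = set()
    for cls in parents:
        ups = ancestors(cls, parents)
        if any(a in ups and b in ups for a, b in disjoint_pairs):
            bad.add(cls)
    return sorted(bad)


# Hypothetical asserted hierarchy: child -> list of direct parents.
parents = {
    "nucleus": ["organelle"],
    "organelle": ["cellular component"],
    "mitosis": ["biological process"],
    "nuclear division": ["biological process", "nucleus"],  # modelling error
}
disjoint_pairs = [("cellular component", "biological process")]

print(unsatisfiable(parents, disjoint_pairs))  # ['nuclear division']
```

A real reasoner such as Elk or HermiT also works from inferred axioms (equivalences, domain/range), which this sketch ignores.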
  10. OBO CONVENTIONS: ROBOT REPORT
     • Reason command provides semantic validation
     • ROBOT report validates against a checklist of criteria
       • Implemented via SPARQL (and soon ShEx)
       • Ensure classes have labels and textual definitions (cardinality 1)
       • No two classes should share the same text definition
       • Labels and exact synonyms should not clash
       • …many more
     • Many criteria are ‘OBO-esque’
       • Can be configured
       • Criteria can potentially be expanded

     Level  Rule Name                     Subject                              Property                  Value
     ERROR  duplicate_definition          head-mantle fusion [CEPH:0000129]    definition [IAO:0000115]  .
     ERROR  duplicate_definition          tentacle thickness [CEPH:0000261]    definition [IAO:0000115]  .
     ERROR  missing_label                 anatomical entity [UBERON:0001062]   label [rdfs:label]
     ERROR  missing_ontology_description  ceph.owl                             dc11:description
     ERROR  duplicate_label               leucophore [CEPH:0000284]            label [rdfs:label]        leucophore
     ERROR  duplicate_label               leucophore [CEPH:0001077]            label [rdfs:label]        leucophore
     ERROR  missing_ontology_license      ceph.owl                             dc:license
     ERROR  missing_ontology_title        ceph.owl                             dc11:title
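A report like the one on slide 10 is easier to triage once the violations are counted per rule. The sketch below assumes the report is tab-separated with the five column headers shown on the slide; the `summarize` helper is hypothetical, and the sample rows are copied from the slide.

```python
import csv
import io
from collections import Counter


def summarize(report_tsv):
    """Count report rows per (level, rule name) pair."""
    reader = csv.DictReader(io.StringIO(report_tsv), delimiter="\t")
    return Counter((row["Level"], row["Rule Name"]) for row in reader)


# Four rows from the slide, rebuilt as tab-separated text.
REPORT = "\n".join([
    "Level\tRule Name\tSubject\tProperty\tValue",
    "ERROR\tduplicate_definition\thead-mantle fusion [CEPH:0000129]\tdefinition [IAO:0000115]\t.",
    "ERROR\tduplicate_label\tleucophore [CEPH:0000284]\tlabel [rdfs:label]\tleucophore",
    "ERROR\tduplicate_label\tleucophore [CEPH:0001077]\tlabel [rdfs:label]\tleucophore",
    "ERROR\tmissing_ontology_license\tceph.owl\tdc:license\t",
])

counts = summarize(REPORT)
for (level, rule), n in counts.most_common():
    print(f"{level}\t{rule}\t{n}")
```

In a continuous integration job, a summary like this could decide whether the build fails, for example on any row at ERROR level.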
  11. TEMPLATE-DRIVEN ONTOLOGY DEVELOPMENT
     • Dead Simple OWL Design Patterns (DOSDPs)
     • ROBOT Templates
     • Allow encoding of common ontology patterns in structured form
     • Ontology documentation and validation
     • Generation (“compilation”) of ontology portions from TSVs
     [Diagram: TSV/Excel + Design Pattern Template → dosdp-tools → OWL]
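The "compilation" idea on slide 11 can be sketched as filling a text pattern from the rows of a TSV. The example is loosely modelled on how DOSDP patterns pair printf-style text with a list of variables and key rows by a `defined_class` column; the pattern, identifiers, and the `compile_rows` helper are all invented for illustration, and the output is plain dictionaries rather than OWL.

```python
import csv
import io

# Hypothetical pattern: text templates plus the TSV columns that fill them.
PATTERN = {
    "name": "%s of %s",
    "def": "The %s of a %s.",
    "vars": ["trait", "part"],
}


def compile_rows(pattern, tsv_text):
    """Generate one term record per TSV row by filling the pattern text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    terms = []
    for row in reader:
        fillers = tuple(row[var] for var in pattern["vars"])
        terms.append({
            "id": row["defined_class"],
            "label": pattern["name"] % fillers,
            "definition": pattern["def"] % fillers,
        })
    return terms


# Hypothetical input table; EX: identifiers are placeholders.
TSV = "\n".join([
    "defined_class\ttrait\tpart",
    "EX:0000001\tthickness\ttentacle",
    "EX:0000002\tlength\tarm",
])

terms = compile_rows(PATTERN, TSV)
for term in terms:
    print(term["id"], term["label"], "|", term["definition"])
```

Because labels and definitions come from a single pattern, correcting the pattern corrects every generated term at once, which is the main appeal of the template-driven approach.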
  12. CONTINUOUS INTEGRATION
     • Runs the ODK Docker container
     • Reasoner checks
     • Structural checks using robot report + custom SPARQL
     [Figure: .travis.yml]
  13. RELEASE MANAGEMENT
     • robot diff is used to create Markdown summarizing changes from the last release
     • Leverages the GitHub release mechanism
     • ODK creates standard release products
       • complete inferred releases (obo, owl, json)
       • base files, simple files
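As an illustration of a Markdown change summary, the sketch below compares the term identifiers and labels of two releases. It is a toy under stated assumptions and does not reproduce the output format of robot diff: the releases are plain dictionaries mapping identifier to label, and the `release_notes` helper and `EX:` identifiers are hypothetical.

```python
def release_notes(previous, current):
    """Render a Markdown summary of term-level changes between two releases."""
    old_ids, new_ids = set(previous), set(current)
    sections = [
        ("Added", sorted(new_ids - old_ids)),
        ("Removed", sorted(old_ids - new_ids)),
        ("Relabelled", sorted(t for t in old_ids & new_ids
                              if previous[t] != current[t])),
    ]
    lines = ["## Changes since last release"]
    for heading, ids in sections:
        if ids:  # skip empty sections entirely
            lines.append("")
            lines.append(f"### {heading}")
            for term_id in ids:
                label = current.get(term_id, previous.get(term_id))
                lines.append(f"- {term_id} ({label})")
    return "\n".join(lines)


# Hypothetical releases: identifier -> label.
previous = {"EX:0000001": "tentacle", "EX:0000002": "mantel"}
current = {"EX:0000001": "tentacle", "EX:0000002": "mantle",
           "EX:0000003": "leucophore"}

notes = release_notes(previous, current)
print(notes)
```

The resulting text could be pasted into the description of a GitHub release.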
  14. UPTAKE OF ODK BY COMMUNITY
     • 90+ NEW repositories created, e.g.: biobanking, single-cell ontology, eupathdb, worm anatomy, phipo, obcs, credit ontology, upheno, pho, eexp, eating, zebrafish phenotype, allen-neuron-types, sickle cell, monochrom, …
     • Adopted by multiple existing ontologies: GO, HP, MP, Uberon, ENVO, CL, Worm Anatomy, Fly Phenotype, …
     • Migrating ontologies detected multiple errors
       • malformed xrefs
       • duplicate definitions, labels
       • …
     • Multiple improvements to ontology quality
  15. ONTOBOT
     • Agent (bot) for operating on ontologies
     • Will make Pull Requests on your ontologies
     • E.g. on change of an upstream ontology (e.g. chebi):
       • Rebuild imports (robot extract)
       • Create a semantic diff
       • Make a PR
     • Relies on ODK-compliant GitHub structure
     • Currently only running for GO
     • Future plans:
       • Command OntoBot via GitHub tickets
  16. ODK AND OBO
     • ODK is in principle generic
     • But we encourage OBO conventions and principles by default
       • Identifiers
       • Versioning
       • License
       • Definitions
       • …
     • Historically, OBO principles have been “thrown over the wall”; we haven’t done a good job of helping people implement them
     • OntoTips:
       • Definitions: textual-definitions/
  17. SUMMARY/CONCLUSIONS
     • ODK is recommended for
       • Starting a new ontology project
       • Retrofitting into an existing project
     • Frees the ontology developer from multiple technical tasks
     • Makes it easier to follow OBO principles
     • Flexible, and we are open to modifications
     • Many benefits
       • Conventions come with benefits
       • No need to roll your own
     Coming soon:
     • Protégé support
     • OntoBot improvements
  18. ACKNOWLEDGMENTS
     Developers and Contributors:
     • Nico Matentzoglu
     • David Osumi-Sutherland
     • Eric Douglass
     • Seth Carbon
     • Jim Balhoff
     • Bjoern Peters
     • Matt Horridge
     • Rebecca Jackson (Tauber)
     • James Overton
     Testers:
     • Simon Jupp
     • Erik Segerdell
     • Sebastian Koehler
     • James Seager
     • Leigh Carmody
     • Sofia Robb
     • Chris Grove
     • Raymond Lee
     • Alliance of Genome Resources curators
     • Citlalli Mejía Almonte
     Funding: NIH U01HG009453 (INCA), NIH R24HG010032 (OBO)
  19. CHALLENGES / FUTURE DEVELOPMENT
     • Unified OBO development guide