
Ontologies and Continuous Integration

Software engineering methodologies also work for ontology engineering. This presentation from Bio-Ontologies 2012 describes how we are using Jenkins CI in GO and other ontologies.

  • Therac-25 radiation therapy machine. The engineer had reused software from older models. These models had hardware interlocks that masked their software defects. Those hardware safeties had no way of reporting that they had been triggered, so there was no indication of the existence of faulty software commands.
  • Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily, leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly.
  • Client/server
  • Client/server
  • Magic 10 minutes. Example:
    – Commit build: use EL reasoner over main ontology, plus external disjointness axioms; basic structural checks. Can be executed painlessly in client ODE. Fast, immediate feedback, BUT doesn’t catch all issues
    – Integrated ontology build: bring in external ontologies
    – Integrated system build: check consequences of commits against all gene associations
  • User-friendliness to the point of anthropomorphization
  • See Erik Clarke’s talk
  • Recommended for anyone who uses ontologies
  • Integration, once attained, is easily lost

    1. Continuous Integration of Open Biological Ontology Libraries
       Chris Mungall, Lawrence Berkeley National Laboratory
    2. Outline
       • What is Continuous Integration and why we need it for ontologies
       • A build tool for ontologies: Oort
       • Example workflows: GO and HPO
       • Lessons learned
    3. Reuse and modularization of ontologies
       • Re-use, don’t re-invent
         – OBO Foundry (http://obofoundry.org)
       • Modularize
         – Ontologies should not be monolithic standalone entities
         – Apply the Rector normalization pattern
       • Building block approach
         – Analogous to software engineering
       Rector A. Modularisation of domain ontologies implemented in Description Logics and related formalisms including OWL. Proceedings of the 2nd International Conference on Knowledge Capture (2003).
    4. Examples of ontology re-use
       • GO is re-using the CHEBI classification of chemical entities
         – Using the GONG* methodology
         – Automated classification
       • The Human Phenotype (HP) ontology is re-using the FMA classification of anatomical structures
       • The GFF3 format re-uses SO for genome feature types and validation
       [Figure: CHEBI ‘carotenoid’ and ‘xanthophyll’ alongside GO ‘carotenoid biosynth’ and ‘xanthophyll biosynth’]
       *Wroe, C. J., Stevens, R., Goble, C. A., & Ashburner, M. (2003). A methodology to migrate the gene ontology to a description logic environment using DAML+OIL. Pac Symp Biocomput, 624-35.
    5. Reuse is not problem-free
       • Modules which are tested in one context may not work in another
         – Example: fatal errors of the Therac-25 radiation therapy machine
         – Causes of failure were complex
           • Software tested and used on previous models was re-used
         – Most software engineers are personally familiar with less lethal examples
       • Lesson:
         – Not an excuse to re-implement de novo
         – Integration testing is vital
         – This applies to ontologies too
           • Inter-ontology integration
           • Integration between ontologies and software systems
    6. Integration testing in software engineering
       • Traditional waterfall model
         – Integration testing at the end
         – Deferral = pain
       • Agile, test-driven model
         – Automated continuous integration (CI) testing
         – Immediate feedback
       http://martinfowler.com/articles/continuousIntegration.html
    7. Example CI Server Architecture
       [Figure: developers with local IDEs update from and commit to a VCS code repository (java, perl); a CI server with a web UI builds from the repository and deploys releases to the production environment]
    8. Jenkins-CI
       • A popular, extensible open source continuous integration server
       • Easy to set up and administer
       • Multiple plugins
       • Large, helpful user base
       • Powerful, clean web-based dashboard
       • Integrates with most version control systems (VCSs)
       http://jenkins-ci.org/
    9. What’s this got to do with ontologies?
       Software engineering → Ontology engineering
       • Source code (.java, .pm) → Ontology (.owl, .obo)
       • Version control system → Version control system
       • Builds/Releases → Builds/Releases
       • IDE (Eclipse, NetBeans, …) → ODE (Protégé, OBO-Edit)
       • Bugs → ‘true path’ violations, inconsistencies
       • JUnit/xUnit tests → OWL logical axioms; structural constraints; terminology checks
       • Build tool (ant, maven) → ???
       • Integration tests → ???
       • Integration server → Integration server
    10. Oort: A build tool for ontologies
       • What does it do?
         – Runs ‘ontology unit tests’ and creates releases
         – Logical tests:
           • No unsatisfiable classes
           • No inferred equivalencies between named classes
         – Other tests:
           • ≤ 1 textual definition per class
           • ≤ 1 RDFS label per class
       • How does it work?
         – Built on top of OWL-API
           • Most OWL reasoners are available
         – GUI report
           • For end-users
         – Command line
           • For use in CI server
       [Figure: .obo/.owl inputs (and .gaf) pass through obo2owl into Oort, which combines the OWL API, a reasoner and verifications; outputs are reports and .obo/.owl releases via owl2obo]
       http://code.google.com/p/owltools/wiki/OortIntro
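As a rough illustration of what ‘ontology unit tests’ can look like, the Python sketch below runs the two structural checks listed on this slide (at most one `name`, i.e. RDFS label, and at most one `def` per class) over OBO-format text, plus a crude stand-in for the ‘no inferred equivalencies between named classes’ test. It is not Oort’s implementation: Oort works through the OWL API and a reasoner, whereas this sketch assumes the asserted is_a edges stand in for inferred subsumptions and treats two named classes that are each other’s ancestors as equivalent. All function names are hypothetical.

```python
"""Toy 'ontology unit tests' over OBO-format text (illustrative only)."""


def parse_obo_terms(text):
    """Return {term_id: {tag: [values]}} for every [Term] stanza."""
    terms, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Only [Term] stanzas are collected; [Typedef] etc. are skipped.
            current = {} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        current.setdefault(tag, []).append(value.strip())
        if tag == "id":
            terms[value.strip()] = current
    return terms


def structural_violations(terms):
    """At most one 'name' (rdfs:label) and one 'def' per class."""
    problems = []
    for term_id, tags in sorted(terms.items()):
        for tag in ("name", "def"):
            if len(tags.get(tag, [])) > 1:
                problems.append(f"{term_id}: more than one '{tag}' tag")
    return problems


def is_a_graph(terms):
    """Map each term to its is_a parents, dropping '! label' comments."""
    return {tid: [v.split("!")[0].strip() for v in tags.get("is_a", [])]
            for tid, tags in terms.items()}


def ancestors(graph, node):
    """All ancestors of node reachable through parent links."""
    seen, stack = set(), list(graph.get(node, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, ()))
    return seen


def inferred_equivalences(graph):
    """Pairs of distinct named classes that subsume each other."""
    pairs = set()
    for a in graph:
        for b in ancestors(graph, a):
            if b != a and a in ancestors(graph, b):
                pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


SAMPLE = """\
[Term]
id: X:1
name: carotenoid biosynthesis
name: carotenoid biosynthetic process
is_a: X:2 ! biosynthesis

[Term]
id: X:2
name: biosynthesis
is_a: X:1 ! carotenoid biosynthesis
"""

terms = parse_obo_terms(SAMPLE)
print(structural_violations(terms))              # → ["X:1: more than one 'name' tag"]
print(inferred_equivalences(is_a_graph(terms)))  # → [('X:1', 'X:2')]
```

A real check would ask the reasoner for equivalences rather than relying on an is_a cycle, but the pass/fail shape (a list of problems, empty means success) is what a CI job needs.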
    11. Example basic workflow
       • Client:
         – Make local modifications using OBO-Edit
         – Commit changes to SVN
         – (Optionally) check the dashboard in a web browser
    12. Example basic workflow
       • Client:
         – Make local modifications using OBO-Edit
         – Commit changes to SVN
         – (Optionally) check the dashboard in a web browser
       • Server:
         – Jenkins polls SVN
         – An external commit triggers Jenkins to launch the build-go job (using Oort)
         – build-go job:
           • Load main ontology
           • Import external disjointness axioms
           • Launch HermiT
           • Write reasoner report
           • Fail if unsatisfiable classes are found
           • Run additional perl checks, ensure external xrefs resolve, etc.
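The server side of this workflow rests on a simple contract: a Jenkins shell build step that exits with a non-zero status marks the build as failed. The Python sketch below assumes each build-go check is wrapped as a function returning a list of problems; the check names and the lambdas are hypothetical placeholders, not the actual GO scripts. It turns the results into a report plus an exit code.

```python
"""Sketch of a CI job driver: run checks, write a report, return an exit code."""


def run_checks(checks):
    """checks: list of (name, fn); each fn returns a list of problem strings."""
    report, failed = [], False
    for name, check in checks:
        problems = check()
        failed = failed or bool(problems)
        report.append(f"[{'FAIL' if problems else 'ok'}] {name}")
        report.extend(f"    {p}" for p in problems)
    return report, (1 if failed else 0)


def main(checks):
    """Print the report; the return value is meant to be passed to sys.exit()."""
    report, code = run_checks(checks)
    print("\n".join(report))
    return code


# Hypothetical stand-ins for the real build-go steps.
checks = [
    ("unsatisfiable classes (reasoner)", lambda: []),
    ("external xrefs resolve", lambda: ["X:1: xref target not found"]),
]
exit_code = main(checks)  # prints the report; exit_code is 1 here
```

In a Jenkins shell step the script would end with `sys.exit(main(checks))`, so any FAIL line also turns the build red and starts the failure branch shown on the next slide.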
    13. Example basic workflow
       • Write reasoner report
       • FAIL:
         – Jenkins sends email alert to mail list
         – GO editor debugs, fixes, then recommits
       • SUCCESS:
         – If the previous build failed, Jenkins sends a ‘service resumed’ email
         – Downstream jobs are triggered (e.g. bigger integrated builds, deployment)
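The two branches on this slide amount to a small decision table over the previous and current build status. In practice Jenkins handles this through its own email-notification and downstream-trigger settings rather than user code, so the Python sketch below is only a model of the behaviour described; the job names are hypothetical.

```python
"""Toy model of the post-build decisions on this slide (not Jenkins code)."""


def post_build_actions(previous, current, downstream_jobs):
    """previous/current are 'SUCCESS' or 'FAIL'; returns actions in order."""
    if current == "FAIL":
        # Failure: alert the mailing list; the editor debugs and recommits.
        return ["email mailing list: build failed"]
    actions = []
    if previous == "FAIL":
        # First success after a failure.
        actions.append("email mailing list: service resumed")
    # Only a successful build triggers the bigger downstream builds.
    actions.extend(f"trigger {job}" for job in downstream_jobs)
    return actions


print(post_build_actions("FAIL", "SUCCESS", ["go-integration", "go-deploy"]))
# → ['email mailing list: service resumed', 'trigger go-integration', 'trigger go-deploy']
```

Gating downstream jobs on success is what keeps a broken basic build from wasting CPU on the larger integrated builds.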
    14. OBO Jenkins dashboard
       [Screenshot: Cell Ontology (cl) build in progress; red ball = FAIL; weather icons give each job’s ‘outlook’]
       http://build.berkeleybop.org/
    15. Why we need this for GO
       • GO is gradually moving towards leveraging external ontologies and automated reasoning
         – E.g. new metabolism terms come in via TermGenie
           • User simply selects a CHEBI class
         – Automated graph placement (Elk)
       [Figure: CHEBI ‘carotenoid’ with subclass ‘xanthophyll’; GO ‘carotenoid biosynth’ with subclass ‘xanthophyll biosynth’]
       ‘carotenoid biosynthesis’ EquivalentTo biosynthesis and ‘has output’ some carotenoid
       ‘xanthophyll biosynthesis’ EquivalentTo biosynthesis and ‘has output’ some xanthophyll
       http://go.termgenie.org
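Why does selecting a CHEBI class place the new GO term automatically? Both definitions share the genus ‘biosynthesis’ and differ only in the ‘has output’ filler, so ‘xanthophyll biosynthesis’ is subsumed by ‘carotenoid biosynthesis’ whenever xanthophyll is subsumed by carotenoid in CHEBI. The Python sketch below hard-codes just that one pattern (genus and ‘has output’ some X). It is a toy standing in for an EL reasoner such as Elk, not a reasoner itself, and the dictionary layout is an assumption.

```python
"""Toy placement of GO terms defined as: genus and 'has output' some X."""


def reflexive_ancestors(is_a, cls):
    """cls plus everything above it in an is_a parent map."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def place_terms(go_defs, chebi_is_a):
    """go_defs: {GO label: (genus, output chemical)}.
    Returns, for each term, the defined terms that subsume it: same genus,
    and its output is subsumed by theirs in CHEBI."""
    placement = {}
    for term, (genus, output) in go_defs.items():
        output_ancestors = reflexive_ancestors(chebi_is_a, output)
        placement[term] = sorted(
            other for other, (other_genus, other_output) in go_defs.items()
            if other != term and other_genus == genus
            and other_output in output_ancestors)
    return placement


chebi_is_a = {"xanthophyll": ["carotenoid"]}
go_defs = {
    "carotenoid biosynthesis": ("biosynthesis", "carotenoid"),
    "xanthophyll biosynthesis": ("biosynthesis", "xanthophyll"),
}
print(place_terms(go_defs, chebi_is_a))
# → {'carotenoid biosynthesis': [], 'xanthophyll biosynthesis': ['carotenoid biosynthesis']}
```

The sketch returns all subsumers among the defined terms, not only direct parents; a reasoner would also compute the transitive reduction before writing is_a links.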
    16. Why we need this for GO
       • Automated quality control using reasoning
         – Taxon constraints
         – Useful for catching false function predictions
       [Figure: CHEBI, GO and NCBITaxon; ‘carotenoid biosynth’ linked by ‘never in’ to Metazoa, shown with ‘in taxon’ some Metazoa and ‘in taxon’ some Viridiplantae]
       ‘carotenoid biosynthesis’ DisjointWith ‘in taxon’ some Metazoa
       Deegan, J., Dimmer, E., & Mungall, C. J. (2010). Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. BMC Bioinformatics, 11(1), 530. doi:10.1186/1471-2105-11-530
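In OWL the constraint is the DisjointWith axiom above, and a reasoner reports any annotation that forces a gene product into both classes. The Python sketch below reproduces only the effect for the slide’s example using plain graph lookups: a ‘never in’ constraint on a GO class applies to all of its subclasses, and it is broken by a gene whose taxon falls under the forbidden taxon. The data structures and the heavily abbreviated taxonomy are illustrative assumptions.

```python
"""Toy 'never in taxon' check for a single gene annotation."""


def reflexive_ancestors(is_a, node):
    """node plus everything above it in an is_a parent map."""
    seen, stack = {node}, [node]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def taxon_violations(go_is_a, never_in, taxon_is_a, term, taxon):
    """(GO class, forbidden taxon) pairs broken by annotating a gene
    from `taxon` to `term`; constraints are inherited down is_a."""
    lineage = reflexive_ancestors(taxon_is_a, taxon)
    return sorted(
        (cls, forbidden)
        for cls in reflexive_ancestors(go_is_a, term)
        for forbidden in never_in.get(cls, ())
        if forbidden in lineage)


go_is_a = {"xanthophyll biosynthesis": ["carotenoid biosynthesis"]}
never_in = {"carotenoid biosynthesis": ["Metazoa"]}
taxon_is_a = {"Mus musculus": ["Mammalia"], "Mammalia": ["Metazoa"],
              "Arabidopsis thaliana": ["Viridiplantae"]}
print(taxon_violations(go_is_a, never_in, taxon_is_a,
                       "xanthophyll biosynthesis", "Mus musculus"))
# → [('carotenoid biosynthesis', 'Metazoa')]
```

The same annotation for a plant gene (e.g. Arabidopsis thaliana) yields no violation, which is exactly the kind of false function prediction the constraint is meant to catch.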
    17. Errors propagate in an integrated environment
       [Figure: CHEBI, GO, MGI and NCBITaxon combined: ‘carotenoid biosynth’ is never in Metazoa; carotenoid, xanthophyll and xanthine with their biosynthesis classes; the gene Ada linked ‘in taxon’ to Mus musculus and to ‘xanthine biosynth’; two links are marked X]
       Inference (propagation of errors): Ada SubClassOf owl:Nothing
    18. Server-side integration tests are vital
       [Figure: as on slide 17; inference (propagation of errors): Ada SubClassOf owl:Nothing]
       • Problem may not be apparent in the developer’s local environment
         – Manifests when GO is integrated with gene associations
       • With CI, errors can be fixed at source
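The point of these two slides is that an upstream change can break a gene annotation that no single ontology’s own build would flag. A server-side integration test can catch this by re-checking all annotations against the integrated hierarchy before and after a change and reporting only the new violations. The Python sketch below uses a toy ‘never in’ model; the extra edge placing ‘xanthine biosynthesis’ under ‘xanthophyll biosynthesis’ and the Ada annotation are hypothetical data chosen to mirror the figure, not actual GO or MGI content.

```python
"""Toy integration test: which annotations does an upstream change break?"""


def reflexive_ancestors(is_a, node):
    """node plus everything above it in an is_a parent map."""
    seen, stack = {node}, [node]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def violations(go_is_a, never_in, taxon_is_a, annotations):
    """annotations: iterable of (gene, taxon, GO term) triples."""
    found = set()
    for gene, taxon, term in annotations:
        lineage = reflexive_ancestors(taxon_is_a, taxon)
        for cls in reflexive_ancestors(go_is_a, term):
            if any(t in lineage for t in never_in.get(cls, ())):
                found.add((gene, term))
    return found


def new_violations(before_is_a, after_is_a, never_in, taxon_is_a, annotations):
    """Violations present after the change but not before it."""
    return sorted(violations(after_is_a, never_in, taxon_is_a, annotations)
                  - violations(before_is_a, never_in, taxon_is_a, annotations))


never_in = {"carotenoid biosynthesis": ["Metazoa"]}
taxon_is_a = {"Mus musculus": ["Metazoa"]}
annotations = [("Ada", "Mus musculus", "xanthine biosynthesis")]
before = {"xanthophyll biosynthesis": ["carotenoid biosynthesis"]}
# Hypothetical upstream error propagated into GO's inferred hierarchy.
after = {**before, "xanthine biosynthesis": ["xanthophyll biosynthesis"]}
print(new_violations(before, after, never_in, taxon_is_a, annotations))
# → [('Ada', 'xanthine biosynthesis')]
```

Reporting the difference, rather than every violation, points the editor straight at the commit (or external import) that caused the problem, so it can be fixed at source.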
    19. Staged builds
       • Fowler principle: ‘Keep the build fast’
       • Staged builds
         – Balance the needs of bug finding and speed
       [Figure: three stages, from fastest/low CPU to most complete/high CPU: Basic build (GO, disjoints) → Ontology integration build (CHEBI, Uberon, CL, Taxon, PR) → System integration build (Annotations)]
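The staging logic itself is simple: run the cheapest stage first and spend CPU on the more complete builds only when the earlier ones pass. In the setup described here this is expressed as Jenkins jobs triggering downstream jobs; the Python sketch below models it as a single loop, with stage names taken from the slide and the stage functions as hypothetical placeholders.

```python
"""Sketch of a staged build: cheap stages first, stop at the first failure."""


def run_pipeline(stages):
    """stages: list of (name, fn), fastest first; fn returns True on success.
    Returns (names of stages that passed, name of failing stage or None)."""
    passed = []
    for name, stage in stages:
        if not stage():
            return passed, name  # later, more expensive stages never run
        passed.append(name)
    return passed, None


ran = []


def stage(name, ok):
    """Hypothetical stage that records that it ran and returns `ok`."""
    def run():
        ran.append(name)
        return ok
    return name, run


result = run_pipeline([
    stage("basic build (GO + disjoints)", True),
    stage("ontology integration build", False),
    stage("system integration build", True),
])
print(result)
print(ran)
```

Here the system integration build never runs, because the ontology integration build failed first; that is the trade-off between bug finding and speed the slide describes.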
    20. User experience
       • Previous environment:
         – Daily cron job, monolithic perl scripts
       • Informal survey results:
         – Gene Ontology developers love Jenkins
       • Popular features:
         – Transparency of build process
         – Direct feedback
         – User-friendliness
         – ‘Build lights’
       • Particularly useful for obo/owl hybrid workflows
    21. Human Phenotype Ontology is deployed using CI
       • HPO: ~10k classes
       • Logical definitions have dependencies on:
         – FMA, PATO, Uberon, GO, CL
       • Annotations
         – Link OMIM disorders to HPO classes
       • Validation
         – Oort and GULO
       • Uses Hudson rather than Jenkins
       Koehler S et al. (2011) Improving ontologies by automatic reasoning and evaluation of logical definitions. BMC Bioinformatics 12(1).
    22. CI best practice: use a VCS
       • Ontologies are source code
         – Always use a version control system to manage your source code
           • Sorry, this is non-negotiable
       • CI server integration with VCSs is a great feature
         – Polling
         – Commit metadata coupled with builds
       • Downside of VCSs:
         – OWL syntaxes are almost always preferable to obo format, except:
           • They suck with VCSs (spurious diffs)
           • We’re working on a solution
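One common way to reduce spurious diffs, assumed here for illustration and not necessarily the solution the slide alludes to, is to compare (or write) axioms in a deterministic order, so that a tool reordering axioms produces no change. The Python sketch below assumes a line-oriented serialization with one axiom per line, as OWL functional syntax typically is for simple axioms, and diffs two versions after sorting.

```python
"""Compare two ontology files axiom-by-axiom, ignoring axiom order."""
import difflib


def canonical(text):
    """Assumes one axiom per line: drop blank lines and sort the rest."""
    return sorted(line.strip() for line in text.splitlines() if line.strip())


def axiom_diff(old, new):
    """Added (+) and removed (-) axioms after canonical ordering."""
    diff = difflib.unified_diff(canonical(old), canonical(new), lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]


old = "SubClassOf(:B :A)\nSubClassOf(:C :A)\n"
reordered = "SubClassOf(:C :A)\nSubClassOf(:B :A)\n"
extended = old + "SubClassOf(:D :A)\n"
print(axiom_diff(old, reordered))  # → []
print(axiom_diff(old, extended))   # → ['+SubClassOf(:D :A)']
```

Multi-line constructs (e.g. axioms with nested annotations) break the one-axiom-per-line assumption, so a real tool would need to canonicalize at the axiom level rather than the text level.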
    23. Future enhancements
       • Migrate OBO-Edit verification checks to OWL API
       • Phase out perl and OBO-Format validation scripts and move to OWL API plus OPPL2 for scripting
       • Extend GO validation pipeline to include term enrichment gold standard sets
         – E.g. after an ontology change, does the p-value of angiogenesis change in the glioblastoma gene set?
           • (Example stolen from Erik Clarke’s talk)
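A gold-standard enrichment check could recompute p-values for known gene sets after each ontology change and flag large shifts. The Python sketch below assumes a one-sided hypergeometric test, a common choice for term enrichment (tools differ), and a threshold measured in orders of magnitude. The slide names angiogenesis and a glioblastoma gene set but gives no numbers, so the counts are hypothetical toy values; in practice an ontology change alters them by moving annotations up or down the graph.

```python
"""Sketch of an enrichment regression check on a gold-standard gene set."""
from math import comb, log10


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric P(X >= k): k of the n study genes carry
    the term and K of the N background genes carry it (assumes k <= min(n, K))."""
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def flag_shifts(before, after, n, N, max_log10_shift=1.0):
    """before/after map term -> (k, K) under the old and new ontology.
    Returns terms whose p-value moved by more than max_log10_shift
    orders of magnitude."""
    flagged = []
    for term in sorted(before.keys() & after.keys()):
        k_old, K_old = before[term]
        k_new, K_new = after[term]
        p_old = enrichment_p(k_old, n, K_old, N)
        p_new = enrichment_p(k_new, n, K_new, N)
        if abs(log10(p_new) - log10(p_old)) > max_log10_shift:
            flagged.append(term)
    return flagged


# Hypothetical toy counts: (study genes with term, background genes with term).
before = {"angiogenesis": (2, 2)}
after = {"angiogenesis": (1, 2)}
print(flag_shifts(before, after, n=2, N=4))                       # → []
print(flag_shifts(before, after, n=2, N=4, max_log10_shift=0.5))  # → ['angiogenesis']
```

Comparing on a log scale keeps the threshold meaningful across very small p-values; the right tolerance would have to be tuned against the gold-standard sets themselves.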
    24. Availability
       • Oort:
         – http://code.google.com/p/owltools/wiki/OortIntro
       • OBO build server:
         – http://build.berkeleybop.org
         – You can request to have your ontology and custom build pipeline added: obo-admin@obofoundry.org
         – Easy to clone our config and set up your own server
    25. Conclusions
       • What works for software can work for ontologies
         – Ontology engineering should become more like software engineering
       • Ontology re-use can be hard
         – A CI server is vital for staying integrated
       • Simple = good
         – Admin: Jenkins is easy to set up and maintain
         – Users: +1
       • Successful for GO, HPO
         – Now being extended to other ontologies
         – May be a vital component in OBO Foundry infrastructure
       • CI will be integral as information systems evolve to depend more on ontologies
    26. Acknowledgments
       • Tanya Berardini, Rebecca Foulger, David Hill, Jane Lomax, Paola Roncaglia, Midori Harris, Ramona Walls, Laurel Cooper (beta testers)
       • Heiko Dietze (Oort)
       • Sebastian Bauer (HPO)
       • Seth Carbon, Amelia Ireland (Jenkins wrangling)
       • GO PIs
       • Jenkins
