Slides from the Ontology Access Kit (OAK) workshop, https://incatools.github.io/ontology-access-kit/
OAK is a pluralistic Python library for accessing a variety of ontologies, using either the command line or the Python library
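As a quick taste of both modes, here is a minimal sketch (assuming a recent oaklib release, where get_adapter is the entry point; earlier releases used get_implementation_from_shorthand, and the "sqlite:obo:" selector syntax is explained later in the deck):

from oaklib import get_adapter

adapter = get_adapter("sqlite:obo:envo")  # fetches and caches a prebuilt ENVO SQLite file
for curie in adapter.basic_search("forest"):
    print(curie, adapter.label(curie))

The equivalent lookup from the command line: runoak -i sqlite:obo:envo search forest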
2. Agenda
Part 1 (first hour)
● Background (these slides):
○ Motivation & Background
○ Key Concepts
● Demo/Tutorial
○ Installation
○ Command Line Usage
● Example apps:
○ Mapping Walker
○ …
Part 2 (second hour)
● Roadmap and milestones
https://github.com/INCATools/ontology-access-kit/milestones
● Using SQLite
● Code walkthrough
○ LexMatch
○ SemSim
● Design Decisions
○ Nomenclature
○ Architecture
3. Why would I want a generic ontology library?
● To build data infrastructure
○ Data repositories
○ Knowledge bases
● To clean and analyze data
○ Data annotation and alignment
○ Data interpretation and discovery
■ Gene set enrichment and pathway analysis
■ Semantic similarity and knowledge base embedding
■ Rolling up data
● To explore and build ontologies
○ Visualization, lookup, mapping, quality control
4. Why would I want ANOTHER generic ontology library?
“I already use X, it’s great!”
I assume if you’re here, then X isn’t a great fit for all your problems
5. Methods of ontology access (i.e. computational use of ontologies)
External API or query server
REST-ish API
● BioPortal / OntoPortal
● OLS
Query Interface (SPARQL)
● Ubergraph
● Ontobee
6. Methods of ontology access (i.e. computational use of ontologies)
External API or query server
REST-ish API
● BioPortal / OntoPortal
● OLS
Query Interface (SPARQL)
● Ubergraph
● Ontobee
Local File
● RDF/OWL
● OBO Format
● OBO JSON
Libraries
● Pronto/fastobo
● OWLAPI
● FunOWL
● OwlReady
● Obonet
● Or simply: Curl/requests
7. Advantages and disadvantages of different methods
External API or query server
Advantages:
● No local download necessary
● Minimal local compute
Disadvantages:
● Doesn’t have my ontology/version
● Doesn’t do the thing I need
○ Many operations are more suited to in-memory processing
○ Encourages the many-high-latency-calls antipattern
● Occasional downtime
[counterpoint: run service locally]
8. Advantages and disadvantages of different methods
External API or query server
Advantages:
● No local download necessary
● Minimal local compute
Disadvantages:
● Doesn’t have my ontology/version
● Doesn’t do the thing I need
○ Many operations are more suited to in-memory processing
○ Encourages the many-high-latency-calls antipattern
● Occasional downtime
[counterpoint: run service locally]
Local File
Advantages
● Control over ontology/version
● Efficient operations (e.g. graph/recursive)
Disadvantages
● Doesn’t scale for long-tail sized ontologies
○ PRO, CHEBI, NCBITaxon
Increased effort / increased control
9. The format/datamodel landscape
OWL, RDF
● Great for communicating to OWL reasoners
● OWL and RDF are two datamodels, multiple formats…
● Wrong abstraction for many problems
○ Triples != Edges
○ Axioms != Edges
○ Most code using OWL written by non-OWL gurus is dangerously wrong
● Poor historic support outside JVM
● Poor scaling for long-tail ontologies
○ Even mid-size ontologies like Uberon are slow to parse with rdflib
https://www.w3.org/TR/owl2-primer
Use of OWL within the Gene Ontology
Christopher J Mungall, Heiko Dietze, David Osumi-Sutherland (OWLED)
doi: https://doi.org/10.1101/010090
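To make "Triples != Edges" concrete, here is a small sketch using rdflib (CURIEs invented for illustration): one conceptual edge, finger part_of some hand, serializes to four RDF triples.

from rdflib import Graph

ttl = """
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <https://example.org/> .
ex:finger rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty ex:part_of ;
    owl:someValuesFrom ex:hand
] .
"""
g = Graph()
g.parse(data=ttl, format="turtle")
print(len(g))  # -> 4: a single conceptual part_of edge needs a blank-node restriction plus three supporting triples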
10. The format/datamodel landscape
OBO Format
● Poor for long-tail of expressivity
○ Good enough for 95% of purposes
○ Many rarely used OWL constructs can’t be expressed
○ Poor internationalization
● High hidden legacy cost of some design decisions
○ E.g. identifiers
● Parsing
○ Easy to do quick hacky parsers
○ Surprisingly hard to write a robust parser
● Not used outside biology
● Impossible to wean bioinformaticians off it
https://owlcollab.github.io/oboformat/doc/obo-syntax.html
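For flavor, a minimal OBO stanza (IDs invented for illustration); the line-oriented syntax is what makes quick hacky parsers tempting, while escaping rules, trailing qualifiers, and header clauses are what make robust parsing surprisingly hard:

[Term]
id: EX:0000002
name: finger
is_a: EX:0000003 ! digit
relationship: part_of EX:0000001 ! hand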
11. The format/datamodel landscape
OBO JSON
● Designed for the long-tail
○ Core structures intended to serve 95% of purposes in an easy fashion
○ Additional constructs can be added for remaining 5%
● Low uptake
https://github.com/geneontology/obographs
https://douroucouli.wordpress.com/2016/10/04/a-developer-friendly-json-exchange-format-for-ontologies
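The core datamodel is small enough to sketch inline as a Python dict (a hand-written example following the structures in the obographs repository; IDs invented):

minimal_obograph = {
    "graphs": [{
        "id": "https://example.org/ex.json",
        "nodes": [
            {"id": "EX:0000001", "lbl": "hand"},
            {"id": "EX:0000002", "lbl": "finger"},
        ],
        "edges": [
            # BFO:0000050 is the standard part-of relation
            {"sub": "EX:0000002", "pred": "BFO:0000050", "obj": "EX:0000001"},
        ],
    }]
}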
13. Existing Ontology Libraries
● There is a multitude of ontology libraries
● For brevity we will highlight a few key ones
○ Emphasis on Python
14. OWLAPI (Java)
● Complete support for all OWL specifications
○ This is highly non-trivial!!!
● Only way to directly communicate with multiple reasoners
● Has been indispensable for ontology development cycle
○ ROBOT, Protege
● Less widely used for ontology application cycle
● Challenges:
○ JVM
○ Long-tail of ontology sizes: local file / memory bound, no DBMS access
○ OWL is not the right level of abstraction for many problems
https://github.com/owlcs/owlapi
15. FunOWL (Python)
● Partial support for OWL specifications
○ Read: Functional
○ Write: Functional, RDF
○ Not intended for communication with reasoners
● Less well supported
○ But does what is in scope quite well
https://github.com/hsolbrig/funowl
16. Horned OWL (Rust)
● Rust OWL library
○ Implements all of OWL2
○ Faster than OWL API
● Experimental python bindings
○ https://github.com/jannahastings/py-horned-owl
https://github.com/phillord/horned-owl
17. Pronto (Python, with Rust bindings)
● Support for OBO-Format and OBO-Format OWL profile
● Pros: Python, Fast (fastobo), and robust
● Cons (all related to coupling with OBO-Format):
○ Most ontologies don’t conform to its strict profile (fixable)
○ long-tail of expressivity (hard to fix)
https://github.com/althonos/pronto
https://github.com/fastobo/fastobo
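A minimal pronto sketch (assuming pronto 2.x; from_obo_library fetches from the OBO Library, and ms.obo / MS:1000031 is the example used in pronto's own documentation):

import pronto

ont = pronto.Ontology.from_obo_library("ms.obo")
term = ont["MS:1000031"]
print(term.id, term.name)
# Immediate subclasses (includes the term itself by default)
for sub in term.subclasses(distance=1):
    print("  ", sub.id, sub.name)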
20. owlery
● Web API for OWL
● Supports reasoning
● Easy to stand up
https://github.com/phenoscape/owlery
21. BioPortal
● Comprehensive and broad
○ Nearly 1k ontologies
○ Up-to-date
● Robust, well-documented APIs
○ Term access
○ Automated Mappings
○ Annotator
● API key required
● Broader than biomedicine: OntoPortal instances
○ MatPortal
○ AgroPortal
○ EcoPortal
https://data.bioontology.org/documentation
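A sketch of the term-access pattern via requests (assuming the documented /search endpoint; you must supply your own API key):

import requests

API_KEY = "..."  # register at BioPortal to obtain a key

resp = requests.get(
    "https://data.bioontology.org/search",
    params={"q": "melanoma", "apikey": API_KEY},
)
for hit in resp.json()["collection"]:
    print(hit["@id"], hit.get("prefLabel"))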
22. OLS: Ontology Lookup Service
● Less comprehensive but curated for quality
○ 275 ontologies
○ Up-to-date
● Robust, well-documented APIs
○ Term access
○ Curated Mappings (OxO)
○ Annotator (ZOOMA)
● Many local Docker installations
https://www.ebi.ac.uk/ols/docs/api
23. OntoBee SPARQL endpoint
● OBO plus others
● SPARQL endpoint
○ More expressive than an API
○ But sometimes more work to express yourself!
● Main drawback:
○ RDF/SPARQL not the right level of abstraction for many tasks
■ E.g. impossible to get part-ancestors
https://www.ontobee.org/sparql
25. Ubergraph SPARQL endpoint uses Relation Graph
● Edges (direct and indirect) as simple triples
https://github.com/balhoff/relation-graph
[Figure: example query returning parts of organs that are parts of abdomens]
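A sketch of what this buys you (assuming the public endpoint and the redundant named graph documented in the Ubergraph/relation-graph READMEs; UBERON:0002107 is the liver, BFO:0000050 is part-of): entailed ancestors become a single triple pattern, with no property paths or query-time reasoning.

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://ubergraph.apps.renci.org/sparql")
sparql.setQuery("""
SELECT ?ancestor WHERE {
  GRAPH <http://reasoner.renci.org/redundant> {
    # entailed (direct and indirect) part-of edges, pre-materialized as plain triples
    <http://purl.obolibrary.org/obo/UBERON_0002107>
        <http://purl.obolibrary.org/obo/BFO_0000050> ?ancestor .
  }
}
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["ancestor"]["value"])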
26. Relational Database: Semantic SQL
● Uses rdftab (Rust) for fast loading into triple tables
● Uses relation-graph (Scala) for entailed edge tables
● Uses SQL views to provide higher level constructs (OWL and RG)
● Python ORM for those that like that sort of thing
https://github.com/INCATools/semantic-sql/
27. Relational Database: Semantic SQL
● Uses rdftab (Rust) for fast loading into triple tables
● Uses relation-graph (Scala) for entailed edge tables
● Uses SQL views to provide higher level constructs (OWL and RG)
● Python ORM for those that like that sort of thing
https://github.com/INCATools/semantic-sql/
[Figure: example query finding terms with mappings to the Allen Brain Atlas that are not part of the brain]
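Because it is plain SQLite, no special client is needed; a sketch of a graph query over the entailed_edge table (assuming a prebuilt uberon.db, obtainable as described later in the deck, and the standard semantic-sql column names):

import sqlite3

con = sqlite3.connect("uberon.db")
# All entailed part-of ancestors of the liver (UBERON:0002107);
# BFO:0000050 is the part-of relation
rows = con.execute(
    "SELECT object FROM entailed_edge WHERE subject = ? AND predicate = ?",
    ("UBERON:0002107", "BFO:0000050"),
)
for (ancestor,) in rows:
    print(ancestor)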
30. Just tell me which one to use to do X
Historically we have said: IT DEPENDS ON X (and Y and Z)
● End result: multiple square pegs in round holes
31. OAK: The People’s Pluralistic Python Ontology Library
Support multiple conceptualizations of what an ontology is:
● A relational graph for data analysis
● A collection of logical statements
● A vocabulary for text mining
● A collection of concepts and mappings for tagging data
● Terminological units plus rich metadata (e.g. conforming to the OMO datamodel)
These are all to some extent interlocking
32. OAK: The People’s Pluralistic Python Ontology Library
Support multiple conceptualizations of what an ontology is:
● A relational graph for data analysis
● A collection of logical statements
● A vocabulary for text mining
● A collection of concepts and mappings for tagging data
● Terminological units plus rich metadata (e.g. conforming to the OMO datamodel)
These are all to some extent interlocking
Support multiple modes of access
● Local files
○ obo, json, rdf, owl
● Remote API services
○ Ontology portals
○ Large scale annotation
● Local or remote database
○ SPARQL
○ SQL
33. OAK: The People’s Pluralistic Python Ontology Library
Support multiple conceptualizations of what an ontology is:
● A relational graph for data analysis
● A collection of logical statements
● A vocabulary for text mining
● A collection of concepts and mappings for tagging data
● Terminological units plus rich metadata (e.g. conforming to the OMO datamodel)
These are all to some extent interlocking
Support multiple modes of access
● Local files
○ obo, json, rdf, owl
● Remote API services
○ Ontology portals
○ Large scale annotation
● Local or remote database
○ SPARQL
○ SQL
Mid-term goal: Speed
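In code, the pluralism amounts to swapping a selector string (a sketch assuming a recent oaklib; the bioportal: selector needs an API key configured, and UBERON:0000955 is the brain):

from oaklib import get_adapter

for selector in ["sqlite:obo:uberon", "ubergraph:", "bioportal:"]:
    adapter = get_adapter(selector)  # same interface, different backend
    print(selector, "->", adapter.label("UBERON:0000955"))  # -> brain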
47. Demo / Tutorial
Part 1: Command Line
● Basic lookup/search
● Different implementations
○ OBO
○ Ubergraph
○ OWL
○ BioPortal
● Using SQLite
● OboGraphViz
Part 2: Python usage
https://incatools.github.io/ontology-access-kit/intro
48. [Switch to demo / tutorial here]
https://incatools.github.io/ontology-access-kit/intro
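For readers following along from the slides alone, this is the flavor of the commands demoed (illustrative; see the intro page above for exact syntax):

runoak -i sqlite:obo:obi search assay
runoak -i ubergraph: ancestors -p i,p UBERON:0002107
runoak -i sqlite:obo:go viz GO:0005634 -o nucleus.png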
49. Using SQLite
Why use the SQLite backend?
● No long-tail expressivity loss
● It’s fast
○ No parse penalty: once downloaded, access is instantaneous
○ entailed_edge pre-loaded, for fast graph queries
○ SQLite is in general fast
○ Further optimizations are easy, e.g. concretizing (materializing) views
51. Behind the scenes
$ chebi -vv ancestors -p i CHEBI:15356
DEBUG:root:Ancestors query:
SELECT entailed_edge.subject AS entailed_edge_subject, entailed_edge.predicate AS entailed_edge_predicate,
entailed_edge.object AS entailed_edge_object
FROM entailed_edge
WHERE entailed_edge.subject IN (__[POSTCOMPILE_subject_1]) AND entailed_edge.predicate IN (__[POSTCOMPILE_predicate_1])
52. How to use SQLite
Method 1: Use ready-made SQLite files for OBO ontologies
● Protocol A
○ Download from S3
■ https://s3.amazonaws.com/bbop-sqlite/hp.db
● Protocol B
○ Download using semsql
■ semsql download obi -o obi.db
● Protocol C
○ Use the obo sqlite selector
■ runoak -i sqlite:obo:obi COMMAND
Method 2: build your own
53. How to use SQLite
Method 1: Use ready-made files
Method 2: build your own
● Protocol A:
○ Install rdftab
○ Install relation-graph
○ Prepare an RDF/XML (pre-merged) file
■ E.g. obi.owl
○ semsql make obi.db
● Protocol B:
○ Use the ODK Docker image
○ semsql make --docker obi.db
■ Requires ODK v1.3.1
54. Plans to make this easier
Rdftab.rs
● Job: load “statements” table from RDF
● Currently only accepts RDF/XML
Make it easier to install:
● In Rust, so easy to bind via PyO3
● OR could write in Python
○ Don’t need the “stanza” functionality
Relation-graph
● Job: load the entailed_edge table using a reasoner
○ Only requires a tiny OWL profile (SubC, Some, Transitivity, Property Chain)
● Written in Scala
○ Possible Soufflé rewrite
Can we make it part of the Python install?
● Proposed path:
○ Write the reasoner using https://github.com/ekzhang/crepe
■ (~10 lines of Datalog)
○ Provide PyO3 bindings
https://github.com/INCATools/semantic-sql/issues/41