Data integration is intrinsic to how modern research is undertaken in areas such as genomics, drug development and personalised medicine. To better enable this integration a large number of biomedical ontologies have been developed to provide standard semantics for describing metadata. There are now several hundred biomedical ontologies in widespread use that describe concepts such as genes, molecules, drugs and diseases. This amounts to millions of terms that are interconnected via relationships that naturally form a graph of biomedical terminology.
The Ontology Lookup Service (OLS) (http://www.ebi.ac.uk/ols) integrates over 160 ontologies and provides a central point for the biomedical community to query and visualise ontologies. OLS also provides a RESTful API over the ontologies that is used in high-throughput data annotation pipelines. OLS is built on top of a Neo4j database that provides efficient indexes for extracting ontological relationships. We have developed generic tools for loading RDF/OWL ontologies into Neo4j, where the indexes are optimised for serving common ontology queries. We are now moving to adopt graph databases more widely in applications relating to ontology mapping prediction and recommendation systems for data annotation.
Building a Biomedical Ontology Repository with Neo4j
1. Building a repository of biomedical
ontologies with Neo4j
Simon Jupp
Samples, Phenotypes and Ontologies Team
European Bioinformatics Institute
Cambridge, UK.
2. The challenge - thousands of data
attributes…
• European Archive for molecular data
• ENA, EVA, EGA, BioSample, ArrayExpress
• How do we make sense of the data?
• SPOT team builds tools to support the mapping of this data to ontologies
and other standards
3. Why we need terminology standards (or
ontologies)
Dyschromatopsia
7. The ontology of color blindness
HP:0011518 (Dichromacy)
HP:0011518 (Eye)
HP:0000551 (Abnormality of color vision )
HP:0007641 (Dyschromatopsia)
Is-a
Is-a
Disease-location
8. Ontology powered applications
Query expansion in the Gene Expression Atlas – searching “eye disease” finds
genes expressed in “Turner syndrome”
https://www.ebi.ac.uk/gxa/home
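Query expansion of this kind can be sketched in a few lines. This is only an illustration of the idea, not the Atlas implementation: the hierarchy below is toy data that simply places "Turner syndrome" beneath "eye disease" as the slide implies, and `descendants` and `expand_query` are hypothetical helper names.

```python
from collections import deque

# Toy child -> parents table (illustrative only; the real hierarchy lives in the ontology).
PARENTS = {
    "Turner syndrome": ["eye disease"],
    "retinoblastoma": ["eye disease"],
    "eye disease": ["disease"],
}


def descendants(term, parents=PARENTS):
    """Return every term that sits below `term` in the hierarchy."""
    # Invert the child -> parents table into parent -> children.
    children = {}
    for child, term_parents in parents.items():
        for parent in term_parents:
            children.setdefault(parent, []).append(child)

    seen, queue = set(), deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def expand_query(term):
    """Expand a search term to itself plus all of its descendants."""
    return {term} | descendants(term)
```

A search for "eye disease" would then be run against every term in `expand_query("eye disease")`, which is how annotations made with a more specific term can be found.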
10. Ontology powered applications
SNP – trait
associations in the
GWAS catalog
All traits mapped to
disease, phenotype
and measurements
in EFO
https://www.ebi.ac.uk/gwas/
Cardiovascular disease traits
11. Ontologies for life sciences
Genotype Phenotype
Sequence
Proteins
Gene products Transcript
Pathways
Cell type
BRENDA tissue / enzyme source
Development
Anatomy
Phenotype
Plasmodium life cycle
-Sequence types and features
-Genetic Context
-Molecule role
-Molecular Function
-Biological process
-Cellular component
-Protein covalent bond
-Protein domain
-UniProt taxonomy
-Pathway ontology
-Event (INOH pathway ontology)
-Systems Biology
-Protein-protein interaction
-Arabidopsis development
-Cereal plant development
-Plant growth and developmental stage
-C. elegans development
-Drosophila development (FBdv)
-Human developmental anatomy, abstract version
-Human developmental anatomy, timed version
-Mosquito gross anatomy
-Mouse adult gross anatomy
-Mouse gross anatomy and development
-C. elegans gross anatomy
-Arabidopsis gross anatomy
-Cereal plant gross anatomy
-Drosophila gross anatomy
-Dictyostelium discoideum anatomy
-Fungal gross anatomy (FAO)
-Plant structure
-Maize gross anatomy
-Medaka fish anatomy and development
-Zebrafish anatomy and development
-NCI Thesaurus
-Mouse pathology
-Human disease
-Cereal plant trait
-PATO (attribute and value)
-Mammalian phenotype
-Human phenotype
-Habronattus courtship
-Loggerhead nesting
-Animal natural history and life history
eVOC (Expressed Sequence Annotation for Humans)
12. Ontology Lookup Service
• Ontology search engine
• Ontology term history tracking
• Ontology visualisation
• Powerful RESTful API
Repository of over 160 pre-selected biomedical ontologies (4.5 million terms, 11
million relationships)
http://www.ebi.ac.uk/ols
• Provides unified mechanism to access
multiple ontologies
• Large community of users (~5000 per month, hundreds
of millions of hits per month)
• Open source and dockerised
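As a rough sketch of how a pipeline might address the RESTful API, the helpers below only build request URLs and make no network calls. The base URL is derived from the address on the slide; the `/search` and `/ontologies/{id}/terms/{iri}` paths, the double URL-encoding of the term IRI, and the function names are assumptions that should be checked against the current OLS API documentation.

```python
from urllib.parse import quote, urlencode

# Base URL assumed from the OLS address given on the slide.
OLS_API = "http://www.ebi.ac.uk/ols/api"


def search_url(query, ontology=None):
    """Build a free-text search URL, optionally restricted to one ontology."""
    params = {"q": query}
    if ontology:
        params["ontology"] = ontology
    return f"{OLS_API}/search?{urlencode(params)}"


def term_url(ontology, iri):
    """Build the URL for a single term (assumes the IRI is double URL-encoded)."""
    encoded = quote(quote(iri, safe=""), safe="")
    return f"{OLS_API}/ontologies/{ontology}/terms/{encoded}"
```

The resulting strings can be passed to any HTTP client; keeping URL construction separate makes it easy to test annotation code without hitting the live service.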
14. Build process
Nightly crawl of
all registered
ontologies
Multiple indexes created
with standalone Spring Boot
applications
API and website
run with Spring data
https://ebispot.github.io
Open Source Software
15. Loading ontologies into Neo4j
• Ontologies usually published in W3C
OWL format
• RDF based (so already a graph)
• …but not a very friendly graph for our
use-cases (more on this this afternoon)
• Primary OLS use-cases for a graph
• Term hierarchy (parent/child)
• Simple view over other relationships
• Part of, develops from
• Extracting subgraphs/subsets
• e.g. taxon specific subsets
16. OWL to Neo4j schema
Every term is a node with a label for each ontology
Each relationship and subset relation is labeled (is-a, part-of, develops-from etc.)
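A minimal sketch of what such a loader could emit is shown below. It is not the OLS loader itself: the helper names, the `label` property and the use of `$name` parameters are assumptions, while the `Class` label and the `iri` and `ontology_name` properties are taken from the query on the following slide. Labels and relationship types cannot be passed as Cypher parameters, so the sketch validates them before interpolating.

```python
import re


def _safe_label(name):
    """Allow only identifier-like names, since labels are interpolated into Cypher."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"unsafe label: {name!r}")
    return name


def node_statement(ontology, iri, label):
    """Cypher and parameters for one term node, labelled Class plus its ontology."""
    cypher = (
        f"MERGE (n:Class:{_safe_label(ontology)} {{iri: $iri}}) "
        "SET n.ontology_name = $ontology_name, n.label = $label"
    )
    params = {"iri": iri, "ontology_name": ontology, "label": label}
    return cypher, params


def relationship_statement(rel_type, child_iri, parent_iri):
    """Cypher and parameters for one typed relationship between two terms."""
    cypher = (
        "MATCH (c:Class {iri: $child}), (p:Class {iri: $parent}) "
        f"MERGE (c)-[:{_safe_label(rel_type)}]->(p)"
    )
    return cypher, {"child": child_iri, "parent": parent_iri}
```

Each returned pair can be handed to a Neo4j driver session; using `MERGE` rather than `CREATE` keeps a repeated nightly load from duplicating nodes.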
17. Powerful yet simple queries
• Get the transitive closure for “heart” following parent and
partonomy relations from the UBERON anatomy ontology
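// {0} and {1} are positional parameters: the ontology name and the term IRI.
MATCH path = (n:Class)-[r:SUBCLASSOF|RelatedTree*]->(parent)<-[r2:SUBCLASSOF|RelatedTree]-(sibling:Class)
WHERE n.ontology_name = {0} AND n.iri = {1}
// The slide omits the RETURN clause; returning the matched paths is an assumption.
RETURN path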
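To make the idea of a transitive closure over selected relation types concrete, here is a small in-memory sketch. The edge list is toy data rather than UBERON content, `part_of` stands in for the partonomy relation, and `closure` is a hypothetical helper rather than anything from OLS.

```python
# Toy edges as (child, relation, parent); illustrative only.
EDGES = [
    ("heart", "SUBCLASSOF", "organ"),
    ("heart", "part_of", "cardiovascular system"),
    ("cardiovascular system", "SUBCLASSOF", "anatomical system"),
    ("organ", "SUBCLASSOF", "anatomical structure"),
    ("heart", "develops_from", "heart primordium"),
]


def closure(start, edges, relations):
    """Return every term reachable from `start` by following only `relations`."""
    # Index outgoing edges, keeping only the relation types we want to follow.
    outgoing = {}
    for child, relation, parent in edges:
        if relation in relations:
            outgoing.setdefault(child, []).append(parent)

    reached, stack = set(), [start]
    while stack:
        current = stack.pop()
        for parent in outgoing.get(current, []):
            if parent not in reached:
                reached.add(parent)
                stack.append(parent)
    return reached
```

Calling `closure("heart", EDGES, {"SUBCLASSOF", "part_of"})` follows parent and partonomy edges but ignores `develops_from`, which mirrors restricting the relationship types in the graph pattern.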
18. Ontology Mappings
• We now have too many ontologies, with overlapping scope
• Millions of mappings exist to interlink the ontologies
Datasource 1 Datasource 2
Human
Phenotype
Ontology
SNOMED-CT
Mappings
Xref
19. Ontology Mapping Service (OxO)
• New database of mappings built with Neo4j
• Crawls OLS ontologies and UMLS for mappings and provides UI and
API to access all known mappings
http://www.ebi.ac.uk/spot/oxo (went live March 2017)
20. Exploring the Xref graph
• We build a graph in Neo4j of known xrefs
• Direct mappings to NCIt “Retinoblastoma” from Disease
ontology (DO) and EFO
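Exploring an xref graph amounts to a bounded breadth-first search from a term. The sketch below is an in-memory illustration, not the OxO implementation: the identifiers are placeholders rather than real accessions, the MeSH xref is invented to show a two-step mapping, and `mappings_within` is a hypothetical helper.

```python
from collections import deque

# Placeholder xref pairs (not real accessions); xrefs are treated as undirected.
XREFS = [
    ("DO:retinoblastoma", "NCIT:retinoblastoma"),
    ("EFO:retinoblastoma", "NCIT:retinoblastoma"),
    ("EFO:retinoblastoma", "MESH:retinoblastoma"),
]


def mappings_within(term, xrefs, max_distance=1):
    """Return {mapped term: number of xref hops} up to `max_distance` hops away."""
    neighbours = {}
    for a, b in xrefs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    distances = {term: 0}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if distances[current] == max_distance:
            continue  # do not expand beyond the requested distance
        for nxt in neighbours.get(current, ()):
            if nxt not in distances:
                distances[nxt] = distances[current] + 1
                queue.append(nxt)

    del distances[term]  # the start term is not a mapping to itself
    return distances
```

With `max_distance=1` only direct mappings are returned; raising it surfaces indirect mappings that pass through an intermediate term.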
22. Problems with mappings
• But the xref graph also exposes inconsistencies in public mappings
• We use this as a basis for fixing and confirming mappings
23. Conclusion
• Neo4j is being adopted in multiple projects across this
institute
• Liked because it provides a simple and effective solution to some of
our data modelling challenges
• Neo4j is a good fit for working with ontologies and
taxonomic data
• Excellent developer integration for building applications
e.g. Spring-data-neo4j
24. Ontology team
Helen Parkinson
Tony Burdett
Sira Sarntivijai
Olga Vrousgou
Thomas Liener
Funding
• EMBL
• CORBEL This project receives funding from the
European Union’s Horizon 2020 research and
innovation programme under grant agreement No
654248.
• EXCELERATE ELIXIR-EXCELERATE is funded by
the European Commission within the Research
Infrastructures programme of Horizon 2020, grant
agreement number 676559.
25. Predicting annotation
• We do a lot of data curation with ontologies
• Need better support for mapping prediction
• E.g. samples like these are usually annotated with these
terms
• Need species specificity e.g. only mapping plant samples
with plant ontology terms
Input from submission  |  Ontology class
2’-deoxy-5-azacytidine  |  5-aza-2’-deoxycytidine
Ovarian Cancer  |  ovarian carcinoma
Anterior tibialis  |  tibialis anterior
Endothelium, Vascula  |  cardiovascular system endothelium
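One simple baseline for this kind of mapping prediction is token overlap combined with a scope filter. The sketch below is an assumption-laden illustration, not the team's method: the candidate list, the `scope` field, the 0.3 threshold and the helper names are all hypothetical, and Jaccard similarity is swapped in as a stand-in for whatever scoring a real system would use.

```python
import re


def tokens(text):
    """Lower-case a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-set Jaccard similarity between two strings (0.0 if either is empty)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical candidate ontology classes, each tagged with a coarse scope.
CANDIDATES = [
    {"label": "tibialis anterior", "scope": "animal"},
    {"label": "ovarian carcinoma", "scope": "animal"},
    {"label": "leaf", "scope": "plant"},
]


def suggest(value, scope, candidates=CANDIDATES, threshold=0.3):
    """Rank candidate classes in the same scope whose similarity meets the threshold."""
    scored = [
        (jaccard(value, c["label"]), c["label"])
        for c in candidates
        if c["scope"] == scope  # e.g. only plant terms for plant samples
    ]
    return [label for score, label in sorted(scored, reverse=True) if score >= threshold]
```

Word order does not matter to this score, so "Anterior tibialis" matches "tibialis anterior" exactly, while synonym pairs such as the two azacytidine spellings would need a synonym table or curated mappings on top.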
26. Tagging with ontologies
• We have built a large corpus of known mappings
between “data values” and ontology terms
• Piloting building a recommendation engine for our
curation tools with Neo4j
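The core of such a recommendation engine can be sketched as frequency counts over the corpus of known mappings. This in-memory version stands in for the graph store and is only an illustration: `MappingCorpus`, its methods and the normalisation rule are hypothetical, and any values fed to it in examples are made up.

```python
from collections import Counter, defaultdict


class MappingCorpus:
    """Counts how often each data value has been mapped to each ontology term."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    @staticmethod
    def _normalise(value):
        # Case-fold and collapse whitespace so trivial variants share one key.
        return " ".join(value.lower().split())

    def add(self, value, term):
        """Record one curated mapping from a data value to an ontology term."""
        self._counts[self._normalise(value)][term] += 1

    def recommend(self, value, limit=3):
        """Return up to `limit` terms, most frequently used first."""
        counts = self._counts.get(self._normalise(value), Counter())
        return [term for term, _ in counts.most_common(limit)]
```

A curation tool would call `add` for every confirmed annotation and `recommend` when a curator meets a new value, so suggestions improve as the corpus grows.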