Building a repository of biomedical
ontologies with Neo4j
Simon Jupp
Samples, Phenotypes and Ontologies Team
European Bioinformatics Institute
Cambridge, UK.
Outline
• Why we care about ontologies in biology
• Why we need a repository of ontologies
• Building a new Ontology Lookup Service (OLS) at the
EBI
• Index OWL ontologies in Neo4j
• OLS Infrastructure
• Challenges with Neo4j
• Neo4j and Linked Open Data
What is EMBL-EBI?
• Part of the European
Molecular Biology
Laboratory
• International, non-profit
research institute
• Europe’s hub for
biological data services
and research
• Based in Hinxton,
Cambridge
Data resources at EMBL-EBI
• Genes, genomes & variation: European Nucleotide Archive, 1000 Genomes, Ensembl, Ensembl Genomes, European Genome-phenome Archive, Metagenomics portal
• Gene, protein & metabolite expression: ArrayExpress, Expression Atlas, MetaboLights, PRIDE
• Protein sequences, families & motifs: InterPro, Pfam, UniProt
• Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
• Chemical biology: ChEMBL, ChEBI
• Reactions, interactions & pathways: IntAct, Reactome, MetaboLights
• Systems: BioModels, Enzyme Portal, BioSamples
• Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology
Biological data heavily interlinked
[Figure: genome, proteome and metabolome data from the same tissue (mRNA and miRNA arrays, antibody arrays, LC-MS/MS and CE-MS spectra) linked to pathways, protein interactions and drug targets]
We have a lot of data silos
• A lot of public data
• Heterogeneous semantics, formats, identifiers
• EBI and other institutes invest heavily in cross-linking
resources
We need terminology standards
Canine / Dog
Different Words Same Concept
One Identity for each entity
• Mouse or Mus or mice = NCBITaxon_10088
• …but not all mice are equal
Building ontologies
• Put things into categories
• Helps organise the data
• Allows us to generalise over data
• Capture the relations between things
• Anatomical parts
[Figure: example class hierarchy]
Biopolymer: Nucleic Acid, Polypeptide
Nucleic Acid: DNA, RNA
Polypeptide: Enzyme
RNA: tRNA, mRNA, smRNA
Web Ontology Language (OWL)
• W3C standard vocabulary for describing
ontologies
• OWL is based on a description logic
• We can use it to describe sets of things based
on their properties
• A subClassOf B implies that all things of type A are
also things of type B
• “heart” part-of “Cardiovascular System”
• Powerful knowledge representation
‘mitochondrial chromosome’ ‘equivalent to’
chromosome and ‘part of’ some mitochondrion
Using a DL reasoner to infer classification
[Figure: a relatively flat asserted view, passed through an OWL reasoner, becomes an inferred polyhierarchy]
Genotype Phenotype
Sequence
Proteins
Gene products Transcript
Pathways
Cell type
BRENDA tissue /
enzyme source
Development
Anatomy
Phenotype
Plasmodium
life cycle
-Sequence types
and features
-Genetic Context
- Molecule role
- Molecular Function
- Biological process
- Cellular component
-Protein covalent bond
-Protein domain
-UniProt taxonomy
-Pathway ontology
-Event (INOH pathway
ontology)
-Systems Biology
-Protein-protein
interaction
-Arabidopsis development
-Cereal plant development
-Plant growth and developmental stage
-C. elegans development
-Drosophila development
-Human developmental anatomy, abstract
version
-Human developmental anatomy, timed version
-Mosquito gross anatomy
-Mouse adult gross anatomy
-Mouse gross anatomy and development
-C. elegans gross anatomy
-Arabidopsis gross anatomy
-Cereal plant gross anatomy
-Drosophila gross anatomy
-Dictyostelium discoideum anatomy
-Fungal gross anatomy
-Plant structure
-Maize gross anatomy
-Medaka fish anatomy and development
-Zebrafish anatomy and development
-NCI Thesaurus
-Mouse pathology
-Human disease
-Cereal plant trait
-PATO attribute and value
-Mammalian phenotype
-Habronattus courtship
-Loggerhead nesting
-Animal natural history and life history
eVOC (Expressed
Sequence Annotation
for Humans)
Ontologies for life sciences
We do a lot of tagging
CL:CL_0000071 (blood vessel endothelial cell)
obo:CHEBI_39867 (valproic acid)
NCBITaxon:NCBITaxon_9606 (Homo sapiens)
Ontologies add value
Smarter searching
Data visualisation
Data analysis
Data integration
Summary so far…
• Ontologies provide a “semantic glue” for integrating
biological data
• There are a lot of ontologies about
• The biological community needs ontology infrastructure
and services
• Ontologies can be complex
• Ontologies can be big
• Ontologies can change
Ontologies as Graphs
• OWL ontologies aren’t graphs, but…
… can be represented as an RDF graph
… people want to use them as graphs
• Plenty of RDF databases around
• But incomplete w.r.t. OWL semantics
• SPARQL is an acquired taste
Ontology repository use-cases
• Search for ontology terms
• labels, synonyms, descriptions
• Querying the structure
• Get parent/child terms
• Querying transitive closure
• Get ancestor/descendant terms
• Querying across relations
• Partonomy or development stages
• A graph database and search index should satisfy
these requirements
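The structural use-cases above (parent/child lookups and transitive closure) can be sketched on a small in-memory graph. This is a minimal illustration in Python, not the OLS implementation; the toy hierarchy is loosely based on the Biopolymer example earlier in the talk, and the names `SUBCLASS_OF`, `closure`, `ancestors` and `descendants` are assumptions made for this sketch.

```python
from collections import defaultdict

# Hypothetical toy data: (child, parent) pairs, loosely based on the Biopolymer example.
SUBCLASS_OF = [
    ("Nucleic Acid", "Biopolymer"),
    ("Polypeptide", "Biopolymer"),
    ("Enzyme", "Polypeptide"),
    ("DNA", "Nucleic Acid"),
    ("RNA", "Nucleic Acid"),
    ("tRNA", "RNA"),
    ("mRNA", "RNA"),
]

# Direct parent and child lookups (the "querying the structure" use-case).
parents = defaultdict(set)
children = defaultdict(set)
for child, parent in SUBCLASS_OF:
    parents[child].add(parent)
    children[parent].add(child)


def closure(term, step):
    """Return every term reachable from `term` by repeatedly following `step`."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for nxt in step.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def ancestors(term):
    """Transitive closure over the parent relation."""
    return closure(term, parents)


def descendants(term):
    """Transitive closure over the child relation."""
    return closure(term, children)
```

A graph database answers the same questions with a variable-length path query, which is why it fits these requirements more naturally than recursive SQL.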
The old Ontology Lookup Service
• EBI has been hosting a repository of over 100 biomedical
ontologies for the past 10 years
• SOAP services for programmatic access
• Up to 25 million requests per month (mostly API).
http://www.ebi.ac.uk/ontology-lookup
Why we need a new OLS
• Old codebase (10+ years old in
places)
• Needs updating to work with OWL (not
just OBO)
• Uses an Oracle RDBMS and SQL
for querying ontology structure
(suboptimal)
• Ditch SOAP/XML in favour of
REST/JSON
OLS 3.0
• Rebuilt from scratch
• Polls ontologies by URL
• Server side checksum for detecting changes in files
• Uses Java OWL API for loading (still supports OBO)
• Infer relations with reasoner
• RESTful API built with Spring Data
• Multiple indexes for scalable querying
• SOLR server – text queries
• Embedded Neo4j – graph queries (drives REST API)
• Virtuoso server – SPARQL for Advanced users
OLS 3 beta is now live
• http://www.ebi.ac.uk/ols/beta/
• 140 ontologies
• Neo4j version 2.2
• Runs in embedded mode
• Inside Tomcat container
• 7 million nodes
• 11 million edges
• ~10 GB on disk
• Generic ontology infrastructure
• Can load any OWL or SKOS file
• Built with standard technologies
• Solr, Neo4j, Spring IO, Thymeleaf,
Bootstrap, jQuery
• Includes stand-alone Spring-Boot app for
loading ontologies into Neo4j
• Open-source project
https://github.com/EBISPOT/OLS
REST API
• Search across any field in one or more ontologies (SOLR)
• /search
• Get ontology and term meta data (Neo4j)
• /ontologies
• /ontologies/{name}
• /ontologies/{name}/terms
• /ontologies/{name}/terms/{termid}
• Get related terms and navigate ontology structure (Neo4j)
• /ontologies/{name}/terms/{termid}/parents
• /ontologies/{name}/terms/{termid}/children
• /ontologies/{name}/terms/{termid}/descendants
• /ontologies/{name}/terms/{termid}/ancestors
• /ontologies/{name}/terms/{termid}/{relation} e.g. part_of
• Get JSON for common visualisation libraries (Neo4j)
• /ontologies/{name}/terms/{termid}/tree
• /ontologies/{name}/terms/{termid}/graph
http://www.ebi.ac.uk/ols/beta/api
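The URL patterns above can be assembled programmatically. The following Python sketch only builds URLs and makes no network calls. The base URL comes from the slide; the helper name `term_url`, the example identifier and the assumption that a term id must be percent-encoded into a single path segment are illustrative and not confirmed by the talk.

```python
from urllib.parse import quote

# Base URL from the slide; everything else here is an illustrative assumption.
API_BASE = "http://www.ebi.ac.uk/ols/beta/api"


def term_url(ontology, term_id, relation=None):
    """Build a URL following the /ontologies/{name}/terms/{termid}[/{relation}] pattern."""
    # Assumption: ids are percent-encoded so that each one stays a single path segment.
    url = (
        f"{API_BASE}/ontologies/{quote(ontology, safe='')}"
        f"/terms/{quote(term_id, safe='')}"
    )
    if relation is not None:
        # e.g. "children", "ancestors" or a named relation such as "part_of"
        url += "/" + quote(relation, safe="")
    return url


# Usage with a hypothetical term identifier:
print(term_url("uberon", "UBERON_0000948", "children"))
```

Keeping every navigation step under the same term URL is what lets a client walk the ontology without knowing anything about the backing store.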
OWL to Neo4j schema
• Label every node by type (e.g. class, property or individual) and ontology id
• Label every relation by name
• Include an additional index for “special relations” like partonomy and subsets
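The labelling rules above can be illustrated with plain dictionaries standing in for nodes and relationships. This is a sketch under stated assumptions, not the OLS loader: the function names, the property names other than `iri`, `label` and `ontology_name` (which appear in the Cypher later in the talk), and the upper-casing of relation names are all hypothetical.

```python
def to_node(ontology_id, entity_type, iri, label):
    """Map one ontology entity to a node labelled by its type and its ontology id."""
    return {
        "labels": [entity_type.capitalize(), ontology_id],
        "properties": {"iri": iri, "label": label, "ontology_name": ontology_id},
    }


def to_relationship(source_iri, relation_name, target_iri):
    """Map one relation to an edge whose type is derived from the relation name."""
    # Assumption: relation names are normalised to upper case with underscores.
    rel_type = relation_name.replace(" ", "_").upper()
    return {"source": source_iri, "type": rel_type, "target": target_iri}


# Usage with hypothetical IRIs:
heart = to_node("uberon", "class", "http://example.org/heart", "heart")
edge = to_relationship("http://example.org/heart", "part of", "http://example.org/cvs")
```

Putting the ontology id on the node as a label means a query can be scoped to one ontology without scanning the nodes of the others.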
Nightly Neo4j build process
• Nightly crawl of all >140 registered ontologies
• Download each file and create a checksum
• If the file is new, drop the ontology from the Neo4j index
• Use the Java OWL API and a reasoner to classify the ontology (get the inferred classification)
• Use the Neo4j BatchInserter to update the Neo4j index
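The change-detection step in this build process can be sketched as follows. The talk does not say which checksum algorithm OLS uses, so SHA-256 is an assumption here, as are the function names and the in-memory dictionary standing in for wherever the previous checksums are stored.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a hex digest identifying the downloaded file content (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


def needs_reload(ontology_id, data, known_checksums):
    """Return True and record the new checksum if the file changed since the last run."""
    digest = checksum(data)
    if known_checksums.get(ontology_id) == digest:
        return False  # unchanged: skip the expensive classify-and-load steps
    known_checksums[ontology_id] = digest
    return True


# Usage: only the first and third downloads would trigger a reload.
known = {}
for content in (b"version 1", b"version 1", b"version 2"):
    print(needs_reload("efo", content, known))
```

Skipping unchanged files is presumably what keeps a typical nightly run far shorter than a full rebuild.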
OLS 3.0 Infrastructure
2 x Load balanced Tomcat servers
Two data centers
Data center 1 (8GB VM) Data center 2 (8GB VM)
Why Neo4j?
• Our primary use-case required a graph store
• OWL mapping to RDF graph is complex (lots of blank
nodes)
• We wanted Spring Data and Spring Data Rest
• Less code for us to maintain
• Didn’t want to write our own DAO using SPARQL
• (We’ve tried this on another project)
• We wanted something that we could rely on with
community behind it
• Neo4j was quick to pick up
• 1 day GraphAware course 4 months ago
• Working pilot for new OLS + Neo4j 1 month later
Powerful yet simple queries
• Get the transitive closure for “heart” following parent and
partonomy relations from the UBERON anatomy ontology
MATCH path = (n:Class)-[r:SUBCLASSOF|RelatedTree*]->(parent)<-[r2:SUBCLASSOF|RelatedTree]-(sibling:Class)
WHERE n.ontology_name = {0} AND n.iri = {1}
RETURN path // assumed: the RETURN clause is not shown on the slide
Generating visualisations
MATCH path = (n:Class)-[r:SUBCLASSOF|Related]-(parent)
WHERE n.ontology_name = {0} AND n.iri = {1}
RETURN {nodes: collect(distinct {iri: parent.iri, label: parent.label}),
edges: collect(distinct {source: startNode(r).iri, target: endNode(r).iri,
label: r.label, uri: r.uri})} as result
Generating common JSON representations directly from Cypher is very powerful
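For readers less familiar with Cypher, the shape of the result can be reproduced in a few lines of Python. This is only an illustration of the nodes-and-edges document that visualisation libraries typically expect; the function name, the tuple layout of the input and the `ex:` identifiers are assumptions.

```python
import json


def to_graph_json(edges):
    """Build a {nodes, edges} document from ((iri, label), relation, (iri, label)) tuples."""
    nodes = {}
    links = []
    seen = set()
    for source, relation, target in edges:
        for iri, label in (source, target):
            # Each node appears once, keyed by IRI.
            nodes.setdefault(iri, {"iri": iri, "label": label})
        key = (source[0], relation, target[0])
        if key not in seen:  # mirrors Cypher's collect(distinct ...)
            seen.add(key)
            links.append({"source": source[0], "target": target[0], "label": relation})
    return json.dumps({"nodes": list(nodes.values()), "edges": links})


# Usage with hypothetical identifiers:
heart = ("ex:heart", "heart")
organ = ("ex:organ", "organ")
print(to_graph_json([(heart, "is a", organ)]))
```

Doing this inside the query, as the slide does, avoids a second pass over the results in application code.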
Challenges
• Wanted to utilise Spring for our REST API
• We had a REST resource hierarchy that we wanted
api/ontologies/{name}/terms/{termid}/parents
api/ontologies/{name}/terms/{termid}/children
• Too hard to get this to work using just an object model
and SDN alone
• No matter what we tried always ended up sending Neo4j
into a spin
@NodeEntity
@TypeAlias(value = "Class")
public class Term {
    @RelatedToVia(direction = Direction.OUTGOING, type = "SUBCLASSOF")
    @Fetch Set<Term> parents;
    @RelatedToVia(direction = Direction.INCOMING, type = "SUBCLASSOF")
    @Fetch Set<Term> children;
}
…but it was easy enough to achieve what we
wanted with some Spring magic
Repository interface with custom Cypher
Define our own controllers
Custom Resource Assemblers for HAL links
Challenges
• We need dynamic fields
• Neo4j is driving the REST API
• Each ontology term has metadata where we don’t know the
field names up front (e.g. ‘created by’ or ‘comment’)
• To get the right set of dependencies we currently use
SDN 3.4.0
• Dynamic fields not supported in SDN 4.0
• We are forced to run in embedded mode
• Is this true?
• Scaling tips for running inside Tomcat, please
Challenges
• Full index rebuild takes up to 20 hours
• Most nights the update runs in ~2 hours
• We have one master Neo4j db
• If an ontology needs updating we take it out and then reload
• Built on machine with 128GB memory + SSD
• There’s always a chance we might trash the entire index
• We’d like to build an index for each ontology
independently.
• Have a final stage where we merge all the successfully built
indexes
• Other suggestions?
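The proposed "build independently, then merge" approach can be sketched as below. This is a hypothetical outline of the idea on the slide, not something OLS is described as doing: plain dictionaries stand in for per-ontology indexes, and `build_all`, `merge` and `build_one` are names invented for illustration.

```python
def build_all(ontology_ids, build_one):
    """Build each ontology index on its own; collect failures instead of aborting."""
    built, failed = {}, {}
    for ontology_id in ontology_ids:
        try:
            built[ontology_id] = build_one(ontology_id)
        except Exception as error:  # one bad ontology must not affect the others
            failed[ontology_id] = str(error)
    return built, failed


def merge(master, built):
    """Return a new master index with only the successfully built ontologies swapped in."""
    merged = dict(master)
    merged.update(built)
    return merged


# Usage with a hypothetical builder that fails for one ontology:
def fake_build(ontology_id):
    if ontology_id == "broken":
        raise ValueError("parse error")
    return f"index-of-{ontology_id}"


built, failed = build_all(["efo", "broken"], fake_build)
new_master = merge({"efo": "old", "broken": "old-broken"}, built)
```

Because the merge only replaces entries that built cleanly, a failed ontology keeps its previous index and the master is never left half-updated.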
Things we’d like to do
• Extract subsets from a graph
• Some nodes are tagged as being in a subset
• Helps to give a broad overview of annotated datasets
• May require us to infer relations
[Figure: master graph and the extracted subset graph]
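One way to read "may require us to infer relations" is that a term in the subset should be linked to its nearest ancestor that is also in the subset, skipping any untagged terms in between. The Python sketch below illustrates that interpretation only; the function name and the toy data are assumptions.

```python
def extract_subset(parents, subset):
    """Link each subset term to its nearest ancestors that are also in the subset."""
    edges = set()
    for term in subset:
        stack = list(parents.get(term, ()))
        seen = set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current in subset:
                edges.add((term, current))  # inferred edge; stop climbing this branch
            else:
                stack.extend(parents.get(current, ()))  # skip the untagged term
    return edges


# Usage with a hypothetical hierarchy in which RNA and Polypeptide are not tagged:
parents = {
    "mRNA": ["RNA"],
    "RNA": ["Nucleic Acid"],
    "Nucleic Acid": ["Biopolymer"],
    "Enzyme": ["Polypeptide"],
    "Polypeptide": ["Biopolymer"],
}
subset = {"mRNA", "Enzyme", "Nucleic Acid", "Biopolymer"}
print(sorted(extract_subset(parents, subset)))
```

The extracted graph then contains only tagged terms, with edges that summarise the paths through the terms that were left out.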
Calculating shortest paths
• B cells: IGJ, IGHA1, LRRN3, SYT11, DSC1, SVIL, IGLC3, DPP4, MAN1C1
• liver cancer: GNA01, CEP57, ASB1, PNPLA4, FA2H, NR4A1, IFNA2, TNPO1
• epithelial cells: DST, FBLN1, BCL2, WDR1, METTL7A, CYB561, FGFR2, SPARC, EMC1
Where do these nodes intersect?
How can we enrich these datasets using the ontologies?
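Finding where two annotated terms meet comes down to a shortest-path search over the ontology graph. The sketch below uses a plain breadth-first search in Python to show the idea; in Neo4j itself this would more likely be expressed with Cypher's shortestPath function. The graph and the function name are hypothetical.

```python
from collections import deque


def shortest_path(neighbours, start, goal):
    """Breadth-first search; returns the shortest list of nodes from start to goal, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the predecessor links back to the start.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nxt in neighbours.get(current, ()):
            if nxt not in previous:
                previous[nxt] = current
                queue.append(nxt)
    return None


# Usage with a hypothetical undirected graph of cell types:
graph = {
    "B cell": ["lymphocyte"],
    "lymphocyte": ["B cell", "cell"],
    "cell": ["lymphocyte", "epithelial cell"],
    "epithelial cell": ["cell"],
}
print(shortest_path(graph, "B cell", "epithelial cell"))
```

The intermediate nodes on the returned path are the candidate terms under which the two datasets could be grouped.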
Recap
• The EBI Ontology Lookup Service provides access to the
ontologies for biological researchers and database
curators
• Main priority is providing a scalable API for external services
to develop against
• Pilot of Neo4j quickly turned into our primary index for
driving the REST API
• There is no one-size-fits-all solution for the backend; there is
always some compromise
• So we make the most of frameworks like Spring Data Solr
and Spring Data Neo4j to make creating multiple indexes
simpler
• Neo4j has been easy to get to grips with and has scaled well for
our setup with pretty much out-of-the-box configuration
A word on Linked Data
• We have many years' experience working with RDF and
Semantic Web technologies
• The EBI RDF platform: EBI data that has been converted to
RDF (billions of triples)
• The ontologies and the data in one big federated graph
• http://www.ebi.ac.uk/rdf - powerful data integration platform
• Semantic Web technologies have struggled to get
mainstream adoption
• Reasons: Hype, Complexity, Baggage, Poor
implementations
• Remain relevant in the life sciences
• A lot of public data out there that needs to be integrated
Life sciences rely on Linked Open Data
• Linked data is a rebranding of the Semantic Web
• Core principles address our data integration needs
• Use URIs to identify things
• Type things with ontology terms
• Make sure URIs resolve (self describing documents)
• Link documents together
• We would see some major wins if Neo4j were more linked-data
friendly
• This doesn’t have to mean supporting SPARQL
• A general feeling of tension between Neo4j and the RDF
community
Final thoughts – Neo4j and JSON-LD?
• A lot of frameworks now make it trivial to produce good
APIs
• What’s currently missing is how to integrate data from two
or more independent APIs
• Hard to crawl independent datasets for connections without
a human to interpret semantics
• Still a need to express a schema alongside the data
• W3C standards like RDF/RDFS/SKOS/OWL provide the
basic vocabularies and semantics for expressing data
schemas
• JSON-LD is bridging the gap from JSON to RDF
Be open
• We are committed to making life science data public and
freely available
• Likewise the tools and software we develop to work with
the data are open
• We always strive to use products that are open and freely
available
• We can only use Neo4j while it continues to be made
available in this model
• Vendor lock-in for our products is very bad for us
• Graph databases have great potential for biology
• But we need open standards for these databases
Acknowledgements
• Samples, Phenotypes and Ontologies Team: Tony
Burdett, James Malone, Dani Welter, Catherine Leroy,
Sira Sarntivijai, Ilinca Tudose, Helen Parkinson
• Matt Pearce – Flax (BioSOLR project)
• Michal Bachman and GraphAware team
• Funding
• European Molecular Biology Laboratory (EMBL)
• European Union projects: DIACHRON, BioMedBridges and
CORBEL

Hydrogen sulfide and metal-enriched atmosphere for a Jupiter-mass exoplanetHydrogen sulfide and metal-enriched atmosphere for a Jupiter-mass exoplanet
Hydrogen sulfide and metal-enriched atmosphere for a Jupiter-mass exoplanet
Sérgio Sacani
 
Complementary interstellar detections from the heliotail
Complementary interstellar detections from the heliotailComplementary interstellar detections from the heliotail
Complementary interstellar detections from the heliotail
Sérgio Sacani
 
SOFIA/HAWC+ FAR-INFRARED POLARIMETRIC LARGE-AREA CMZ EXPLORATION (FIREPLACE) ...
SOFIA/HAWC+ FAR-INFRARED POLARIMETRIC LARGE-AREA CMZ EXPLORATION (FIREPLACE) ...SOFIA/HAWC+ FAR-INFRARED POLARIMETRIC LARGE-AREA CMZ EXPLORATION (FIREPLACE) ...
SOFIA/HAWC+ FAR-INFRARED POLARIMETRIC LARGE-AREA CMZ EXPLORATION (FIREPLACE) ...
Sérgio Sacani
 
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptxSCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
WALTONMARBRUCAL
 
Potential of Marine renewable and Non renewable energy.pptx
Potential of Marine renewable and Non renewable energy.pptxPotential of Marine renewable and Non renewable energy.pptx
Potential of Marine renewable and Non renewable energy.pptx
J. Bovas Joel BFSc
 
Deploying DAPHNE Computational Intelligence on EuroHPC Vega for Benchmarking ...
Deploying DAPHNE Computational Intelligence on EuroHPC Vega for Benchmarking ...Deploying DAPHNE Computational Intelligence on EuroHPC Vega for Benchmarking ...
Deploying DAPHNE Computational Intelligence on EuroHPC Vega for Benchmarking ...
University of Maribor
 
seed drying lecture, different types of dryers
seed drying lecture, different types of dryersseed drying lecture, different types of dryers
seed drying lecture, different types of dryers
Rammehargahlot1
 
BIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptx
BIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptxBIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptx
BIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptx
alishyt102010
 
Gasification and Pyrolyssis of plastic Waste under a Circular Economy perpective
Gasification and Pyrolyssis of plastic Waste under a Circular Economy perpectiveGasification and Pyrolyssis of plastic Waste under a Circular Economy perpective
Gasification and Pyrolyssis of plastic Waste under a Circular Economy perpective
Recupera
 
Adjusted NuGOweek 2024 Ghent programme flyer
Adjusted NuGOweek 2024 Ghent programme flyerAdjusted NuGOweek 2024 Ghent programme flyer
Adjusted NuGOweek 2024 Ghent programme flyer
pablovgd
 

Recently uploaded (20)

Collaborative Team Recommendation for Skilled Users: Objectives, Techniques, ...
Collaborative Team Recommendation for Skilled Users: Objectives, Techniques, ...Collaborative Team Recommendation for Skilled Users: Objectives, Techniques, ...
Collaborative Team Recommendation for Skilled Users: Objectives, Techniques, ...
 
Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...
Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...
Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...
 
Transmission Spectroscopy of the Habitable Zone Exoplanet LHS 1140 b with JWS...
Transmission Spectroscopy of the Habitable Zone Exoplanet LHS 1140 b with JWS...Transmission Spectroscopy of the Habitable Zone Exoplanet LHS 1140 b with JWS...
Transmission Spectroscopy of the Habitable Zone Exoplanet LHS 1140 b with JWS...
 
How Does TaskTrain Integrate Workflow and Project Management Efficiently.pdf
How Does TaskTrain Integrate Workflow and Project Management Efficiently.pdfHow Does TaskTrain Integrate Workflow and Project Management Efficiently.pdf
How Does TaskTrain Integrate Workflow and Project Management Efficiently.pdf
 
Lunar Mobility Drivers and Needs - Artemis
Lunar Mobility Drivers and Needs - ArtemisLunar Mobility Drivers and Needs - Artemis
Lunar Mobility Drivers and Needs - Artemis
 
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary TrackThe Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
 
Probing the northern Kaapvaal craton root with mantle-derived xenocrysts from...
Probing the northern Kaapvaal craton root with mantle-derived xenocrysts from...Probing the northern Kaapvaal craton root with mantle-derived xenocrysts from...
Probing the northern Kaapvaal craton root with mantle-derived xenocrysts from...
 
AN EMPIRE ACROSS THE THREE CONTINENTS.pptx
AN EMPIRE ACROSS THE THREE CONTINENTS.pptxAN EMPIRE ACROSS THE THREE CONTINENTS.pptx
AN EMPIRE ACROSS THE THREE CONTINENTS.pptx
 
Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...
Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...
Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...
 
Review Article:- A REVIEW ON RADIOISOTOPES IN CANCER THERAPY
Review Article:- A REVIEW ON RADIOISOTOPES IN CANCER THERAPYReview Article:- A REVIEW ON RADIOISOTOPES IN CANCER THERAPY
Review Article:- A REVIEW ON RADIOISOTOPES IN CANCER THERAPY
 
Hydrogen sulfide and metal-enriched atmosphere for a Jupiter-mass exoplanet
Hydrogen sulfide and metal-enriched atmosphere for a Jupiter-mass exoplanetHydrogen sulfide and metal-enriched atmosphere for a Jupiter-mass exoplanet
Hydrogen sulfide and metal-enriched atmosphere for a Jupiter-mass exoplanet
 
Complementary interstellar detections from the heliotail
Complementary interstellar detections from the heliotailComplementary interstellar detections from the heliotail
Complementary interstellar detections from the heliotail
 
SOFIA/HAWC+ FAR-INFRARED POLARIMETRIC LARGE-AREA CMZ EXPLORATION (FIREPLACE) ...
SOFIA/HAWC+ FAR-INFRARED POLARIMETRIC LARGE-AREA CMZ EXPLORATION (FIREPLACE) ...SOFIA/HAWC+ FAR-INFRARED POLARIMETRIC LARGE-AREA CMZ EXPLORATION (FIREPLACE) ...
SOFIA/HAWC+ FAR-INFRARED POLARIMETRIC LARGE-AREA CMZ EXPLORATION (FIREPLACE) ...
 
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptxSCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
 
Potential of Marine renewable and Non renewable energy.pptx
Potential of Marine renewable and Non renewable energy.pptxPotential of Marine renewable and Non renewable energy.pptx
Potential of Marine renewable and Non renewable energy.pptx
 
Deploying DAPHNE Computational Intelligence on EuroHPC Vega for Benchmarking ...
Deploying DAPHNE Computational Intelligence on EuroHPC Vega for Benchmarking ...Deploying DAPHNE Computational Intelligence on EuroHPC Vega for Benchmarking ...
Deploying DAPHNE Computational Intelligence on EuroHPC Vega for Benchmarking ...
 
seed drying lecture, different types of dryers
seed drying lecture, different types of dryersseed drying lecture, different types of dryers
seed drying lecture, different types of dryers
 
BIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptx
BIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptxBIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptx
BIOPHYSICS Interactions of molecules in 3-D space-determining binding and.pptx
 
Gasification and Pyrolyssis of plastic Waste under a Circular Economy perpective
Gasification and Pyrolyssis of plastic Waste under a Circular Economy perpectiveGasification and Pyrolyssis of plastic Waste under a Circular Economy perpective
Gasification and Pyrolyssis of plastic Waste under a Circular Economy perpective
 
Adjusted NuGOweek 2024 Ghent programme flyer
Adjusted NuGOweek 2024 Ghent programme flyerAdjusted NuGOweek 2024 Ghent programme flyer
Adjusted NuGOweek 2024 Ghent programme flyer
 

Building a repository of biomedical ontologies with Neo4j

  • 1. Building a repository of biomedical ontologies with Neo4j Simon Jupp Samples, Phenotypes and Ontologies Team European Bioinformatics Institute Cambridge, UK.
  • 2. Outline • Why we care about ontologies in biology • Why we need a repository of ontologies • Building a new Ontology Lookup Service (OLS) at the EBI • Index OWL ontologies in Neo4j • OLS Infrastructure • Challenges with Neo4j • Neo4j and Linked Open Data
  • 3. What is EMBL-EBI? • Part of the European Molecular Biology Laboratory • International, non-profit research institute • Europe’s hub for biological data services and research • Based in Hinxton, Cambridge
  • 4. Data resources at EMBL-EBI Genes, genomes & variation ArrayExpress Expression Atlas Metabolights PRIDE InterPro Pfam UniProt ChEMBL ChEBI Literature & ontologies Europe PubMed Central Gene Ontology Experimental Factor Ontology Molecular structures Protein Data Bank in Europe Electron Microscopy Data Bank European Nucleotide Archive 1000 Genomes Gene, protein & metabolite expression Protein sequences, families & motifs Chemical biology Reactions, interactions & pathways IntAct Reactome MetaboLights Systems BioModels Enzyme Portal BioSamples Ensembl Ensembl Genomes European Genome-phenome Archive Metagenomics portal
  • 5. Biological data heavily interlinked Proteome Metabolome Genome tissue CE-MS antibody array LC-MS/MS [Figure: mass spectrum, intensity vs. m/z, with labelled b- and y-ion peaks] miRNA array mRNA array Pathways Protein Interaction Drug targets
  • 6. We have a lot of data silos • A lot of public data • Heterogeneous semantics, formats, identifiers • EBI and other institutes invest heavily in cross-linking resources
  • 7. We need terminology standards Canine / Dog: different words, same concept
  • 8. One Identity for each entity • Mouse or Mus or mice = NCBITaxon_10088 • …but not all mice are equal
  • 9. Building ontologies • Put things into categories • Helps organise the data • Allows us to generalise over data • Capture the relations between things • Anatomical parts Biopolymer Nucleic Acid Polypeptide Enzyme DNA RNA tRNA mRNA smRNA
  • 10. Web Ontology Language (OWL) • W3C standard vocabulary for describing ontologies • OWL is based on a description logic • We can use it to describe sets of things based on their properties • A subclassOf B implies that all things of type A are also things of type B • “heart” part-of “Cardiovascular System” • Powerful knowledge representation ‘mitochondrial chromosome’ ‘equivalent to’ chromosome and ‘part of’ some mitochondrion
  • 11. Using a DL reasoner to infer classification Relatively flat asserted view → OWL reasoner → Inferred polyhierarchy
  • 12. Ontologies for life sciences Genotype Phenotype Sequence Proteins Gene products Transcript Pathways Cell type BRENDA tissue / enzyme source Development Anatomy Phenotype Plasmodium life cycle - Sequence types and features - Genetic Context - Molecule role - Molecular Function - Biological process - Cellular component - Protein covalent bond - Protein domain - UniProt taxonomy - Pathway ontology - Event (INOH pathway ontology) - Systems Biology - Protein-protein interaction - Arabidopsis development - Cereal plant development - Plant growth and developmental stage - C. elegans development - Drosophila development (FBdv) - Human developmental anatomy, abstract version - Human developmental anatomy, timed version - Mosquito gross anatomy - Mouse adult gross anatomy - Mouse gross anatomy and development - C. elegans gross anatomy - Arabidopsis gross anatomy - Cereal plant gross anatomy - Drosophila gross anatomy - Dictyostelium discoideum anatomy - Fungal gross anatomy (FAO) - Plant structure - Maize gross anatomy - Medaka fish anatomy and development - Zebrafish anatomy and development - NCI Thesaurus - Mouse pathology - Human disease - Cereal plant trait - PATO attribute and value - Mammalian phenotype - Habronattus courtship - Loggerhead nesting - Animal natural history and life history - eVOC (Expressed Sequence Annotation for Humans)
  • 13. We do a lot of tagging CL:CL_0000071 (blood vessel endothelial cell) obo:CHEBI_39867 (valproic acid) NCBITaxon:NCBITaxon_9606 (Homo sapiens)
  • 14. Ontologies add value Smarter searching Data visualisation Data analysis Data integration
  • 15. Summary so far… • Ontologies provide a “semantic glue” for integrating biological data • There are a lot of ontologies about • The biological community needs ontology infrastructure and services • Ontologies can be complex • Ontologies can be big • Ontologies can change
  • 16. Ontologies as Graphs • OWL ontologies aren’t graphs, but… … can be represented as an RDF graph … people want to use them as graphs • Plenty of RDF databases around • But incomplete w.r.t. OWL semantics • SPARQL is an acquired taste
  • 17. Ontology repository use-cases • Search for ontology terms • labels, synonyms, descriptions • Querying the structure • Get parent/child terms • Querying transitive closure • Get ancestor/descendant terms • Querying across relations • Partonomy or development stages • A graph database and search index should satisfy these requirements
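The structural use-cases above (parent/child lookups and transitive closure) can be illustrated with a minimal in-memory sketch. This is not how OLS implements them (OLS delegates these queries to Neo4j); the `PARENTS` mapping is a toy hierarchy assumed from the biopolymer example on slide 9, and all function names are hypothetical.

```python
from collections import deque

# Toy is-a hierarchy (child -> direct parents). The exact shape is an
# assumption based on the biopolymer example; it is illustrative only.
PARENTS = {
    "tRNA": ["RNA"],
    "mRNA": ["RNA"],
    "RNA": ["Nucleic Acid"],
    "DNA": ["Nucleic Acid"],
    "Nucleic Acid": ["Biopolymer"],
    "Enzyme": ["Polypeptide"],
    "Polypeptide": ["Biopolymer"],
    "Biopolymer": [],
}


def children(term):
    """Direct children: every term that lists `term` as a direct parent."""
    return sorted(c for c, ps in PARENTS.items() if term in ps)


def ancestors(term):
    """Transitive closure over the parent relation (breadth-first)."""
    seen = set()
    queue = deque(PARENTS.get(term, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(PARENTS.get(current, []))
    return seen


def descendants(term):
    """Every term that has `term` somewhere among its ancestors."""
    return {t for t in PARENTS if term in ancestors(t)}
```

Here `ancestors("tRNA")` walks up through RNA and Nucleic Acid to Biopolymer, which is the kind of query a graph database answers with a variable-length path match.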
  • 18. The old Ontology Lookup Service • EBI has been hosting a repository of over 100 biomedical ontologies for the past 10 years • SOAP services for programmatic access • Up to 25 million requests per month (mostly API). http://www.ebi.ac.uk/ontology-lookup
  • 20. Why we need a new OLS • Old codebase (+10 years in places) • Updated to work with OWL (not OBO) • Uses Oracle RDBMS and SQL for querying ontology structure (suboptimal) • Ditch SOAP/XML in favour of REST/JSON
  • 21. OLS 3.0 • Rebuilt from scratch • Polls ontologies by URL • Server side checksum for detecting changes in files • Uses Java OWL API for loading (still supports OBO) • Infer relations with reasoner • RESTful API built with Spring Data • Multiple indexes for scalable querying • SOLR server – text queries • Embedded Neo4j – graph queries (drives REST API) • Virtuoso server – SPARQL for Advanced users
  • 22. OLS 3 beta is now live • http://www.ebi.ac.uk/ols/beta/ • 140 ontologies • Neo4j version 2.2 • Runs in embedded mode • Inside Tomcat container • 7 million nodes • 11 million edges • ~10 GB on disk • Generic ontology infrastructure • Can load any OWL or SKOS file • Built with standard technologies • Solr, Neo4j, Spring IO, Thymeleaf, Bootstrap, jQuery • Includes stand-alone Spring Boot app for loading ontologies into Neo4j • Open-source project https://github.com/EBISPOT/OLS
  • 23. REST API • Search across any field in one or more ontologies (SOLR) • /search • Get ontology and term meta data (Neo4j) • /ontologies • /ontologies/{name} • /ontologies/{name}/terms • /ontologies/{name}/terms/{termid} • Get related terms and navigate ontology structure (Neo4j) • /ontologies/{name}/terms/{termid}/parent • /ontologies/{name}/terms/{termid}/children • /ontologies/{name}/terms/{termid}/descendants • /ontologies/{name}/terms/{termid}/ancestors • /ontologies/{name}/terms/{termid}/{relation} e.g. part_of • Get JSON for common visualisation libraries (Neo4j) • /ontologies/{name}/terms/{termid}/tree • /ontologies/{name}/terms/{termid}/graph http://www.ebi.ac.uk/ols/beta/api
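The resource hierarchy above maps naturally onto a small URL builder. The sketch below only constructs URLs from the path templates listed on the slide and makes no network calls. Percent-encoding the term identifier is an assumption (the slide does not say how `{termid}` is encoded), `EXAMPLE_0000001` is a placeholder identifier, and the function names are hypothetical.

```python
from urllib.parse import quote

# Base URL as given on the slide.
BASE_URL = "http://www.ebi.ac.uk/ols/beta/api"


def ontology_url(name, base=BASE_URL):
    """URL for /ontologies/{name}."""
    return f"{base}/ontologies/{quote(name, safe='')}"


def term_url(name, term_id, view=None, base=BASE_URL):
    """URL for /ontologies/{name}/terms/{termid}[/{view}].

    `view` may be one of the listed sub-resources (parent, children,
    descendants, ancestors, tree, graph) or a relation such as part_of.
    Encoding the identifier with quote() is an assumption of this sketch.
    """
    url = f"{ontology_url(name, base)}/terms/{quote(term_id, safe='')}"
    if view is not None:
        url += f"/{quote(view, safe='')}"
    return url


print(term_url("uberon", "EXAMPLE_0000001", "ancestors"))
```

Keeping URL construction in one place makes it easier to adapt a client if the service changes how identifiers are encoded.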
  • 24. OWL to Neo4j schema • Label every node by type (e.g. class, property or individual) and ontology id • Label every relation by name • Include an additional index for “special relations” like partonomy and subsets
  • 25. Nightly Neo4j build process • Nightly crawl of all >140 registered ontologies • Download file, create checksum • If the file is new: drop ontology from Neo4j index • Use the Java OWL API and reasoner to classify ontology (get the inferred classification) • Use Neo4j BatchInserter to update Neo4j index
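The change-detection step of the nightly build can be sketched as follows. The slides only say that a checksum is created for each downloaded file; the use of SHA-256, the in-memory `known_checksums` store, and the function names are assumptions made for illustration.

```python
import hashlib


def checksum(content: bytes) -> str:
    """Hex digest of a downloaded ontology file (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def plan_nightly_update(downloads, known_checksums):
    """Return the ontologies whose file changed since the last run.

    `downloads` maps ontology name -> file bytes from tonight's crawl.
    `known_checksums` maps ontology name -> digest from the previous run
    and is updated in place. Only the returned ontologies would need to be
    dropped from the index, re-classified and re-loaded.
    """
    to_reload = []
    for name, content in downloads.items():
        digest = checksum(content)
        if known_checksums.get(name) != digest:
            to_reload.append(name)
            known_checksums[name] = digest
    return to_reload
```

Skipping unchanged files is what would let a typical nightly run finish far faster than a full rebuild.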
  • 26. OLS 3.0 Infrastructure 2 x Load balanced Tomcat servers Two data centers Data center 1 (8GB VM) Data center 2 (8GB VM)
  • 27. Why Neo4j? • Our primary use-case required a graph store • OWL mapping to RDF graph is complex (lots of blank nodes) • We wanted Spring Data and Spring Data Rest • Less code for us to maintain • Didn’t want to write our own DAO using SPARQL • (We’ve tried this on another project) • We wanted something that we could rely on with community behind it • Neo4j was quick to pick up • 1 day GraphAware course 4 months ago • Working pilot for new OLS + Neo4j 1 month later
  • 28. Powerful yet simple queries • Get the transitive closure for “heart” following parent and partonomy relations from the UBERON anatomy ontology MATCH path = (n:Class)-[r:SUBCLASSOF|RelatedTree*]->(parent)<-[r2:SUBCLASSOF|RelatedTree]-(sibling:Class) WHERE n.ontology_name = {0} AND n.iri = {1} RETURN path
  • 29. Generating visualisations MATCH path = (n:Class)-[r:SUBCLASSOF|Related]-(parent) WHERE n.ontology_name = {0} AND n.iri = {1} UNWIND nodes(path) AS p UNWIND rels(path) AS r1 RETURN {nodes: collect(distinct {iri: p.iri, label: p.label}), edges: collect(distinct {source: startNode(r1).iri, target: endNode(r1).iri, label: r1.label, uri: r1.uri})} AS result Generating common JSON representations directly from Cypher is very powerful
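For readers unfamiliar with Cypher's `collect(distinct …)`, the shape of the result can be mimicked in plain Python. This is a sketch of the output structure only (a `nodes` list and an `edges` list with duplicates removed), not of how OLS produces it; the `ex:` identifiers and the function name are hypothetical.

```python
import json


def to_graph_json(edges, labels):
    """Build a {nodes, edges} dict from (source, target, label) triples.

    Duplicate nodes and edges are dropped, mirroring collect(distinct ...).
    `labels` maps an IRI to a display label; the IRI is used as a fallback.
    """
    nodes, seen_nodes = [], set()
    out_edges, seen_edges = [], set()
    for source, target, label in edges:
        for iri in (source, target):
            if iri not in seen_nodes:
                seen_nodes.add(iri)
                nodes.append({"iri": iri, "label": labels.get(iri, iri)})
        key = (source, target, label)
        if key not in seen_edges:
            seen_edges.add(key)
            out_edges.append({"source": source, "target": target, "label": label})
    return {"nodes": nodes, "edges": out_edges}


# Hypothetical example data.
example = to_graph_json(
    [("ex:heart", "ex:organ", "is a"),
     ("ex:heart", "ex:cvs", "part of"),
     ("ex:heart", "ex:organ", "is a")],
    {"ex:heart": "heart", "ex:organ": "organ", "ex:cvs": "cardiovascular system"},
)
print(json.dumps(example, indent=2))
```

A structure like this can be handed to most JavaScript graph-drawing libraries with little or no reshaping.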
  • 30. Challenges • Wanted to utilise Spring for our REST API • We had a REST resource hierarchy that we wanted api/ontologies/{name}/terms/{termid}/parents api/ontologies/{name}/terms/{termid}/children • Too hard to get this to work using just an object model and SDN alone • No matter what we tried always ended up sending Neo4j into a spin @NodeEntity @TypeAlias(value = "Class") public class Term { @RelatedToVia(direction = Direction.OUTGOING, type = "SUBCLASSOF") @Fetch Set<Term> parents; @RelatedToVia(direction = Direction.INCOMING, type = "SUBCLASSOF") @Fetch Set<Term> children; }
  • 31. …but it was easy enough to achieve what we wanted with some Spring magic Repository interface with custom Cypher Define our own controllers Custom Resource Assemblers for HAL links
  • 32. Challenges • We need dynamic fields • Neo4j is driving the REST API • Each ontology term has metadata where we don’t know the field names up front (e.g. ‘created by’ or ‘comment’) • To get the right set of dependencies we currently use SDN 3.4.0 • Dynamic fields not supported in SDN 4.0 • We are forced to run in embedded mode • Is this true? • Scaling tips for running inside Tomcat, please
  • 33. Challenges • Full index rebuild takes up to 20 hours • Most nights the update runs in ~2 hours • We have one master Neo4j db • If an ontology needs updating we take it out and then reload • Built on machine with 128GB memory + SSD • There’s always a chance we might trash the entire index • We’d like to build an index for each ontology independently. • Have a final stage where we merge all the successfully built indexes • Other suggestions?
  • 34. Things we’d like to do • Extract subsets from a graph • Some nodes are tagged as being in a subset • Helps to give a broad overview of annotated datasets • May require us to infer relations Master graph Extracted subset graph
  • 35. Calculating shortest paths: where do these nodes intersect? How can we enrich these datasets using the ontologies? B cells IGJ IGHA1 LRRN3 SYT11 DSC1 SVIL IGLC3 DPP4 MAN1C1 liver cancer GNA01 CEP57 ASB1 PNPLA4 FA2H NR4A1 IFNA2 TNPO1 epithelial cells DST FBLN1 BCL2 WDR1 METTL7A CYB561 FGFR2 SPARC EMC1
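The shortest-path question can be illustrated with a breadth-first search over a small undirected graph. This is a generic sketch rather than anything taken from OLS; the toy edges linking "B cell" and "epithelial cell" through "cell" are invented for illustration and are not real ontology content.

```python
from collections import deque


def undirected(edges):
    """Adjacency map for an undirected graph given as (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue
            previous[neighbour] = node
            if neighbour == goal:
                # Walk back from the goal to the start, then reverse.
                path = [neighbour]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical toy graph, for illustration only.
GRAPH = undirected([
    ("B cell", "lymphocyte"),
    ("lymphocyte", "cell"),
    ("epithelial cell", "cell"),
])
```

In this toy graph the two cell types meet at "cell", which is the kind of intersection node the slide asks about.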
  • 36. Recap • The EBI Ontology Lookup Service provides access to the ontologies for biological researchers and database curators • Main priority is providing a scalable API for external services to develop against • Pilot of Neo4j quickly turned into our primary index for driving the REST API • There is no one-size-fits-all solution for the backend, always some compromise • So we make the most of frameworks like Spring Data Solr and Spring Data Neo4j to make creating multiple indexes simpler • Neo4j has been easy to get to grips with and scaled well for our setup with pretty much out-of-the-box configuration
  • 37. A word on Linked Data • We have many years of experience working with RDF and Semantic Web technologies • The EBI RDF platform: EBI data that has been converted to RDF (billions of triples) • The ontologies and the data in one big federated graph • http://www.ebi.ac.uk/rdf - powerful data integration platform • Semantic Web technologies have struggled to get mainstream adoption • Reasons: Hype, Complexity, Baggage, Poor implementations • Remain relevant in the life sciences • A lot of public data out there that needs to be integrated
  • 38. Life sciences rely on Linked Open Data • Linked data is a rebranding of the Semantic Web • Core principles address our data integration needs • Use URIs to identify things • Type things with ontology terms • Make sure URIs resolve (self describing documents) • Link documents together • We see some major wins if Neo4j was more linked data friendly • This doesn’t have to mean supporting SPARQL • A general feeling of tension between Neo4j and the RDF community
  • 39. Final thoughts – Neo4j and JSON-LD? • A lot of frameworks now make it trivial to produce good APIs • What’s currently missing is how to integrate data from two or more independent APIs • Hard to crawl independent datasets for connections without a human to interpret semantics • Still a need to express a schema alongside the data • W3C standards like RDF/RDFS/SKOS/OWL provide the basic vocabularies and semantics for expressing data schemas • JSON-LD is bridging the gap from JSON to RDF
  • 40. Be open • We are committed to making life science data public and freely available • Likewise the tools and software we develop to work with the data are open • We always strive to use products that are open and freely available • We can only use Neo4j while it continues to be made available in this model • Vendor lock-in for our products is very bad for us • Graph databases have great potential for biology • But we need open standards for these databases
  • 41. Acknowledgements • Samples, Phenotypes and Ontologies Team - Tony Burdett, James Malone, Dani Welter, Catherine Leroy, Sira Sarntivijai, Ilinca Tudose, Helen Parkinson • Matt Pearce – Flax (BioSOLR project) • Michal Bachman and GraphAware team • Funding • European Molecular Biology Laboratory (EMBL) • European Union projects: DIACHRON, BioMedBridges and CORBEL