Towards FAIRer Biological Knowledge Networks Using a Hybrid Linked Data and Graph Database approach
Presented at Integrative Bioinformatics Conference (IB2018, Harpenden, 2018).
We describe how to use Semantic Web Technologies and graph databases like Neo4j to serve life science data and address the FAIR data principles.
Harpenden, 5/6/2018
Marco Brandizi <marco.brandizi@rothamsted.ac.uk>
KnetMiner-inspired Artwork
by Hugo Dalton (hugodalton.com)
Behind the Scenes
• Starting point: graph data model
• With concepts, relations between concepts, and hierarchies of concept classes and relation types
• => There are standardised ways to do this
• Make app development easier
• independent components on top of a unified data model
• clear separation between data access and apps
• Serve third-party applications, making their data access no different from ours
• Simplify the way we ingest data:
• ease conversions from multiple formats into a unified model
• relax the high memory requirements (e.g., by using a backing data store)
• prepare for scalability (e.g., cloud stores, big data stores)
The Semantic Web Way
• It’s for networked knowledge (semantic networks)
• Focuses on sharing via web technologies and principles (e.g., sharing resolvable URIs)
• Rich ‘schema’ language, already widely used in the life sciences (i.e., ontologies, derived from frames and first-order logic)
• Protocol + a standard query language (SPARQL)
Modelling data with OWL:
Promises & Wishes
• Rich semantics, but also very formal, so that powerful automated reasoning is possible
• On the messy web ocean?!
• What about performance?!
• Very formal semantics is not very easy:
• SomeValuesFrom Restriction?!
• My blood sample derives from some Skolem human, not from NCBI:HomoSapiens?
• Ontologies defined for the whole world and then harmoniously and lovingly shared
• Fairly reasonable: all those vocabularies are a headache, and I don’t have the expertise
• Reasonable: I have a different point of view
• Not always reasonable: possibly, yours is complicated/wrong/stupid/idiotic/worse
• Likely not reasonable: Not Invented Here
• No comment: if I reinvent it, I can publish it
• Just joking (maybe…): Your ontology is good, but I’d rather stab you in the back
In fact, they did this
Simplifying Views in BioKNO
obo:GO_0030015
  a owl:Class ;
  rdfs:label "CCR4-NOT core complex"^^xsd:string ;
  rdfs:subClassOf obo:GO_0044424, obo:GO_0043234,
    [
      a owl:Restriction ;
      owl:onProperty <http://purl.obolibrary.org/obo/BFO_0000050> ; # 'part of'
      owl:someValuesFrom obo:GO_0030014 # CCR4-NOT complex
    ] ;
  oboInOwl:id "GO:0030015"^^xsd:string ;
  obo:IAO_0000115 "The core of the CCR4-NOT complex. In Saccharomyces the CCR4-NOT..." ;
  oboInOwl:hasOBONamespace "cellular_component"^^xsd:string .

obo:GO_0030014 a bk:GeneOntologyTerm ;
  dc:identifier obo:GO_0030014_acc ;
  bk:is_a obo:GO_0044424 , obo:GO_0043234 ;
  bk:prefName "CCR4-NOT complex" .

obo:GO_0030015 a bk:GeneOntologyTerm ;
  bk:prefName "CCR4-NOT core complex" ;
  bk:is_a obo:GO_0044424, obo:GO_0043234 ;
  bk:part_of obo:GO_0030014 ;
  dc:identifier obo:GO_0030015_acc .

obo:GO_0044424 a bk:GeneOntologyTerm ;
  bk:prefName "intracellular part" ;
• OWL is simplified by mixing classes with SKOS-style concepts
• More suitable for simpler, less formal taxonomies
• OWL 2 punning keeps this consistent
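The simplification shown in the listings above can be sketched in a few lines of Python. This is only an illustration, not the actual BioKNO conversion code: triples are plain tuples of prefixed names, the blank-node label `_:r1` and the `simplify` function are made up for the example, and the superclasses are those given in the simplified listing.

```python
# Triples are (subject, predicate, object) tuples; blank nodes start with "_:".
PART_OF = "obo:BFO_0000050"  # 'part of', as in the OWL listing

def simplify(triples, property_map):
    """Turn `C rdfs:subClassOf [owl:onProperty P; owl:someValuesFrom D]`
    into the direct triple `C <mapped P> D`, and plain named superclasses
    into bk:is_a links. Everything else is dropped in this sketch."""
    on_property = {s: o for s, p, o in triples if p == "owl:onProperty"}
    some_values = {s: o for s, p, o in triples if p == "owl:someValuesFrom"}
    result = []
    for s, p, o in triples:
        if p != "rdfs:subClassOf":
            continue
        if o in on_property and o in some_values:
            mapped = property_map.get(on_property[o])
            if mapped is not None:
                result.append((s, mapped, some_values[o]))
        elif not o.startswith("_:"):
            result.append((s, "bk:is_a", o))
    return result

owl_triples = [
    ("obo:GO_0030015", "rdf:type", "owl:Class"),
    ("obo:GO_0030015", "rdfs:subClassOf", "obo:GO_0044424"),
    ("obo:GO_0030015", "rdfs:subClassOf", "obo:GO_0043234"),
    ("obo:GO_0030015", "rdfs:subClassOf", "_:r1"),  # hypothetical blank node
    ("_:r1", "rdf:type", "owl:Restriction"),
    ("_:r1", "owl:onProperty", PART_OF),
    ("_:r1", "owl:someValuesFrom", "obo:GO_0030014"),
]

simplified = simplify(owl_triples, {PART_OF: "bk:part_of"})
for triple in simplified:
    print(*triple)
```

The existential restriction is deliberately flattened into a class-to-class link, which is the kind of less formal, SKOS-style reading the bullets above describe.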
The BioKNO Ontology (and the Rest of the World)
BioKNO | External Ontologies | Mapping Type
bk:Concept | skos:Concept | Subclass
bk:Relation; bk:relFrom, bk:relTypeRef, bk:relTo | rdf:Statement; rdf:subject, rdf:predicate, rdf:object | Subclass; Subproperties (i.e., mapping to RDF reified statements)
bk:Path, bk:Participant, bk:Interaction, bk:Transport, bk:Protein, bk:Gene | Classes with same names in BioPAX and SIO | Equivalent Class
bk:participates_in, bk:has_participant | Relation Ontology (RO) properties with same names; biopax:participant (as sub-property) | Equivalent property
bk:produces, bk:produced_by, bk:consumes, bk:consumed_by | biopax:product (as sub-property); RO properties with same names | Equivalent property
bk:regulates, bk:positively_regulates, bk:negatively_regulates | RO properties with same names | Equivalent property
bk:is_a; bk:part_of, bk:has_part; bk:occurs_in, bk:co_occurs_with | skos:broader; Basic Formal Ontology (BFO)/RO properties with same names | Equivalent property
bk:Publication | schema:CreativeWork | Subclass
bka:abstract, bka:title (also known as AbstractHeader), bka:authors | dcterms:description, dcterms:title, dc:creator | Sub-property
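As a rough illustration of what the subclass and sub-property mappings above buy a consumer, the sketch below materialises the external-vocabulary view of a record. It assumes the three bka: properties pair up with the three external properties in the order listed, uses a made-up publication IRI (`bkr:pub_1`) and made-up literal values, and the function name is hypothetical. A real setup would more likely leave this to an RDFS/OWL reasoner or a SPARQL CONSTRUCT.

```python
# Mappings taken from the table; the pairing order is an assumption.
SUB_PROPERTY_OF = {
    "bka:abstract": "dcterms:description",
    "bka:title": "dcterms:title",
    "bka:authors": "dc:creator",
}
SUBCLASS_OF = {
    "bk:Concept": "skos:Concept",
    "bk:Publication": "schema:CreativeWork",
}

def add_external_views(triples):
    """Return the input triples plus the ones entailed by the mappings:
    a sub-property statement implies the super-property statement,
    and an instance of a subclass is also an instance of the superclass."""
    out = set(triples)
    for s, p, o in triples:
        if p in SUB_PROPERTY_OF:
            out.add((s, SUB_PROPERTY_OF[p], o))
        if p == "rdf:type" and o in SUBCLASS_OF:
            out.add((s, "rdf:type", SUBCLASS_OF[o]))
    return out

# Hypothetical record, for illustration only.
publication = [
    ("bkr:pub_1", "rdf:type", "bk:Publication"),
    ("bkr:pub_1", "bka:title", "An example title"),
    ("bkr:pub_1", "bka:authors", "A. Author"),
]

expanded = add_external_views(publication)
for triple in sorted(expanded):
    print(*triple)
```

A client that only knows Dublin Core or schema.org terms can then query the expanded data without knowing anything about the bk: vocabulary.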
Extraction, Loading, Transformation
SPARQL/TARQL Example
CONSTRUCT {
  ?protIri bk:expressed_by ?sampleIri.
  ?degRelIri
    a bk:Relation;
    bka:PVALUE ?pValue;
    bk:evidence bkev:EXP;          # Inferred from experiment
    bk:relFrom ?protIri;           # Details defined by UniProt info
    bk:relTo ?sampleIri;           # Details defined by sample_degs_2.tsv
    bk:relTypeRef bk:expressed_by.
}
WHERE {
  # Some IDs and IRIs to be defined above
  BIND ( LCASE ( REPLACE ( ?Sample, ' ', '_' ) ) AS ?sampleId )
  BIND ( IRI ( CONCAT ( STR ( bkr: ), ?Gene_Symbol ) ) AS ?protIri )
  BIND ( IRI ( CONCAT ( STR ( bkr: ), 'degex_', ?sampleId ) ) AS ?sampleIri )
  BIND ( IRI ( CONCAT ( STR ( bkr: ), 'degex_', ?sampleId, '_', LCASE ( ?Gene_Symbol ) ) )
    AS ?degRelIri )
  BIND ( xsd:double ( ?p_value ) AS ?pValue )
}
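To make the BIND expressions in the TARQL mapping above easier to follow, here is a plain-Python walk-through of what they compute for each input row. The namespace URI bound to `bkr:` is not given in the slides, so `BKR` below is a placeholder, and the sample row is invented; only the column names (`Sample`, `Gene_Symbol`, `p_value`) come from the mapping.

```python
import csv
import io

BKR = "http://example.org/resource/"  # placeholder for the bkr: namespace (assumption)

def row_to_bindings(row):
    """Mirror the BIND clauses of the mapping for one tabular row."""
    sample_id = row["Sample"].replace(" ", "_").lower()   # LCASE(REPLACE(?Sample, ' ', '_'))
    gene = row["Gene_Symbol"]
    return {
        "sampleId": sample_id,
        "protIri": BKR + gene,
        "sampleIri": BKR + "degex_" + sample_id,
        "degRelIri": BKR + "degex_" + sample_id + "_" + gene.lower(),
        "pValue": float(row["p_value"]),                   # xsd:double(?p_value)
    }

# Invented tab-separated input, standing in for a file such as sample_degs_2.tsv.
SAMPLE_TSV = "Sample\tGene_Symbol\tp_value\nLeaf Stage 1\tABC1\t0.003\n"

bindings = [
    row_to_bindings(row)
    for row in csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t")
]
for b in bindings:
    print(b)
```

Each dictionary corresponds to one solution that the CONSTRUCT template would then turn into triples, including the reified bk:Relation carrying the p-value.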
SPARQL/RDF for ELT
• RDF-to-RDF translation via CONSTRUCT (or SPARQL Update, formerly SPARUL)
• TARQL: using SPARQL to convert tabular CSV files to RDF
• RDF/XML can be transformed via XSL
• We have done this for bio-specific ontology definitions in Ondex
• Programmatic conversions
• Using RDF frameworks, e.g., Jena, RDF4J (formerly Sesame), rdflib for Python
• See also java2rdf (https://github.com/EBIBioSamples/java2rdf)
• We have used it for the Ondex->RDF converter
Issues
https://lod-cloud.net/
• Still not very popular (especially in more commercial contexts)
• It’s (perceived as) difficult (SPARQL in particular)
• Bad reputation
• Performance can still be an issue
• e.g., optimising SPARQL can be hard
• Specific issues
• e.g., I need contextualised properties, i.e., attributes attached to relations
• and I don’t fancy reified relations…
Triple Stores vs Prop Graphs
Aspect | Neo4j, Cypher DBs, Graph DBs | Semantic Web/Triple Stores
Data exchange format | - No official one, just Cypher; support for GraphML, RDF; +/- Focus on backing applications | + Focus on data sharing standards
Data model | + Relations with properties; - Metadata/schemas/ontologies management | - Relations cannot have properties (reification required); + Metadata/schemas/ontologies as first-class citizens, and standardised OWL
Performance | + Complex graph traversals | + Comparable in most cases
Query language | + Cypher is easier (e.g., compact, implicit elements)?; - Expressivity issues (unions); - No standard QL (but efforts in progress, e.g., OpenCypher) | - SPARQL is harder? (URIs, namespaces, verbosity); + SPARQL is more expressive
Standardisation, openness | +/- (TinkerPop is open, Neo4j isn’t); + Commercial support; + More alive and up to date (e.g., support for Hadoop, nice Neo4j browser, easy installation) | + Natively open, many open implementations; - Instability and many short-lived prototypes; - Advancement seems to be slowing down; + Some nice open and commercial browsers (LODEStar, ...)
Scalability, big data | +/- Commercial support for clustering/clouds for Neo4j; + Open support in TinkerPop | + Load balancing/cluster solutions, commercial cloud support (e.g., GraphDB); + SPARQL over TinkerPop (via SAIL interface)
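The data-model row above (relations with properties versus reification) can be made concrete with a small sketch: it collapses a reified bk:Relation, as produced by the CONSTRUCT example earlier in the deck, into a single property-graph-style edge. This is only an illustration of the idea and not how rdf2neo or any other converter is actually implemented; the resource names and the `reified_to_edges` function are invented.

```python
STRUCTURAL = {"rdf:type", "bk:relFrom", "bk:relTypeRef", "bk:relTo"}

def reified_to_edges(triples):
    """Collapse every bk:Relation node into one edge dict whose remaining
    statements become edge properties (first value wins, for simplicity)."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {}).setdefault(p, []).append(o)
    edges = []
    for props in by_subject.values():
        if "bk:Relation" not in props.get("rdf:type", []):
            continue
        edges.append({
            "from": props["bk:relFrom"][0],
            "type": props["bk:relTypeRef"][0],
            "to": props["bk:relTo"][0],
            "properties": {p: v[0] for p, v in props.items() if p not in STRUCTURAL},
        })
    return edges

# Invented resources following the pattern of the CONSTRUCT template.
triples = [
    ("bkr:degex_leaf_abc1", "rdf:type", "bk:Relation"),
    ("bkr:degex_leaf_abc1", "bk:relFrom", "bkr:abc1"),
    ("bkr:degex_leaf_abc1", "bk:relTypeRef", "bk:expressed_by"),
    ("bkr:degex_leaf_abc1", "bk:relTo", "bkr:degex_leaf"),
    ("bkr:degex_leaf_abc1", "bka:PVALUE", 0.003),
    ("bkr:degex_leaf_abc1", "bk:evidence", "bkev:EXP"),
    ("bkr:abc1", "bk:expressed_by", "bkr:degex_leaf"),  # the plain, unreified triple
]

edges = reified_to_edges(triples)
print(edges)
```

Six triples on the RDF side become one edge with two properties on the property-graph side, which is the trade-off the table summarises.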
KnetMiner UI Overview
Search, Select, Explore
Addressing FAIR
• Findable (or, Semantic Web is still useful)
• SPARQL endpoint
• which powers URI resolution
• Dataset-level metadata (e.g., VoID)
• Mapping to standard ontologies
• Interested in contributing to existing standards (e.g., Bioschemas)
• API/JSON-Schema formalisation
• Accessible
• Multiple means of access (SPARQL, URIs, JSON APIs, Cypher)
• Triple stores and property graphs are complementary, not alternatives
• Data dumps
• Interoperable (or, Semantic Web is still useful)
• Unified model encoded in at least one common syntax (RDF)
• URIs are reused
• Mappings to ontologies
• Reusable
• All of the above, plus multiple interfaces under a unified model
• Support for common graph languages (e.g., Cytoscape.js)
• Converters (e.g., our RDF conversion scripts/tools, rdf2neo)
• Open Data licences