Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and Graph Database approach

Bioinformatics Specialist
Jun. 15, 2018

  1. Towards FAIRer Biological Knowledge Networks Using a Hybrid Linked Data and Graph Database approach
 Harpenden, 5/6/2018
 Marco Brandizi <marco.brandizi@rothamsted.ac.uk>
 Find these slides on SlideShare
 KnetMiner-inspired artwork by Hugo Dalton (hugodalton.com)
  2. Can we do More with KnetMiner Data? (and better)
  3. Behind the Scenes
 • Starting point: graph data model
   • With concepts, relations between concepts, and hierarchies of concept classes and relation types
   • => There are standardised ways of doing this
 • Make app development easier
   • independent components on top of a unified data model
   • clear separation between data access and apps
 • Serve third-party applications, making their data access no different from ours
 • Simplify the way we ingest data:
   • ease conversions from multiple formats into a unified model
   • relax the high-memory requirements (e.g., by using a backing data store)
   • prepare for scalability (e.g., cloud stores, big data stores)
  4. Putting it on a Bigger Picture
  5. The Semantic Web Way
 • It’s for networked knowledge (semantic networks)
 • Focuses on sharing via web technologies and principles (e.g., sharing resolvable URIs)
 • Rich ‘schema’ language, already much used in life sciences (i.e., ontologies, coming from frames and first-order logic)
 • A protocol plus a standard query language (SPARQL)
  22. Modelling data with OWL:
 Promises & Wishes
 • Rich semantics, but also very formal, so that powerful automated reasoning is possible
   • On the messy web ocean?!
   • What about performance?!
 • Very formal semantics is not very easy:
   • SomeValuesFrom Restriction?!
   • My blood sample derives from some Skolem human, not from NCBI:HomoSapiens?
 • Ontologies defined for the whole world and then harmoniously and lovingly shared
   • Fairly reasonable: all those vocabularies are headaches, I don’t have expertise
   • Reasonable: I have a different point of view
   • Not always reasonable: possibly, yours is complicated/wrong/stupid/idiotic/worse
   • Likely not reasonable: Not Invented Here
   • No comment: if I reinvent it, I can publish it
   • Just joking (maybe…): Your ontology is good, but I’d rather stab you in the back
 In fact, they did this
  24. Simplifying Views in BioKNO
     obo:GO_0030015 a owl:Class ;
       rdfs:label "CCR4-NOT core complex"^^xsd:string ;
       rdfs:subClassOf obo:GO_0044424, obo:GO_0044424,
         [ a owl:Restriction ;
           owl:onProperty <http://purl.obolibrary.org/obo/BFO_0000050> ; # 'part of'
           owl:someValuesFrom obo:GO_0030014 # CCR4-NOT complex
         ] ;
       oboInOwl:id "GO:0030015"^^xsd:string ;
       obo:IAO_0000115 "The core of the CCR4-NOT complex. In Saccharomyces the CCR4-NOT..." ;
       oboInOwl:hasOBONamespace "cellular_component"^^xsd:string .

     obo:GO_0030014 a bk:GeneOntologyTerm ;
       dc:identifier obo:GO_0030014_acc ;
       bk:is_a obo:GO_0044424, obo:GO_0043234 ;
       bk:prefName "CCR4-NOT complex" .

     obo:GO_0030015 a bk:GeneOntologyTerm ;
       bk:prefName "CCR4-NOT core complex" ;
       bk:is_a obo:GO_0044424, obo:GO_0043234 ;
       bk:part_of obo:GO_0030014 ;
       dc:identifier obo:GO_0030015_acc .

     obo:GO_0044424 a bk:GeneOntologyTerm ;
       bk:prefName "intracellular part" .
 • OWL is simplified by mixing classes with SKOS-style concepts
 • More suitable for less formal, simpler taxonomies
 • OWL-2 punning makes it consistent
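
The simplification on this slide can be illustrated with a small sketch. It is not BioKNO's actual conversion code: triples are plain Python tuples, the blank-node label `_:r1` and the `PROPERTY_MAP` lookup are assumptions made for the example, and only `rdfs:subClassOf` axioms like the ones shown above are handled.

```python
# Sketch only: triples are (subject, predicate, object) tuples.
# The blank-node label "_:r1" is invented for this example.
OWL_TRIPLES = [
    ("obo:GO_0030015", "rdf:type", "owl:Class"),
    ("obo:GO_0030015", "rdfs:subClassOf", "obo:GO_0044424"),
    ("obo:GO_0030015", "rdfs:subClassOf", "_:r1"),
    ("_:r1", "rdf:type", "owl:Restriction"),
    ("_:r1", "owl:onProperty", "obo:BFO_0000050"),  # 'part of'
    ("_:r1", "owl:someValuesFrom", "obo:GO_0030014"),
]

# Hypothetical lookup from OWL properties to BioKNO relation names.
PROPERTY_MAP = {"obo:BFO_0000050": "bk:part_of"}


def simplify(triples, property_map=PROPERTY_MAP):
    """Turn subClassOf axioms into direct bk:is_a / bk:part_of style links."""
    restrictions = {
        s for s, p, o in triples if p == "rdf:type" and o == "owl:Restriction"
    }
    on_property = {s: o for s, p, o in triples if p == "owl:onProperty"}
    filler = {s: o for s, p, o in triples if p == "owl:someValuesFrom"}

    simplified = []
    for s, p, o in triples:
        if p != "rdfs:subClassOf":
            continue
        if o in restrictions:
            # Existential restriction: link the class straight to the filler,
            # but only when the property has a known BioKNO counterpart.
            prop = on_property.get(o)
            if prop in property_map and o in filler:
                simplified.append((s, property_map[prop], filler[o]))
        else:
            # Plain named superclass.
            simplified.append((s, "bk:is_a", o))
    return simplified


SIMPLE_TRIPLES = simplify(OWL_TRIPLES)
```

The direct `bk:part_of` link drops the formal "some values from" reading, which is the trade-off the bullets above describe: less precise semantics in exchange for a graph that is easier to traverse.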
  25. The BioKNO Ontology
 (and the Rest of the World)
 BioKNO | External Ontologies | Mapping Type
 bk:Concept | skos:Concept | Subclass
 bk:Relation, bk:relFrom, bk:relTypeRef, bk:relTo | rdf:Statement, rdf:subject, rdf:predicate, rdf:object | Subclass, subproperties (i.e., mapping to RDF reified statements)
 bk:Path, bk:Participant, bk:Interaction, bk:Transport, bk:Protein, bk:Gene | Classes with same names in BioPAX and SIO | Equivalent class
 bk:participates_in, bk:has_participant | Relation Ontology (RO) properties with same names; biopax:participant (as sub-property) | Equivalent property
 bk:produces, bk:produced_by, bk:consumes, bk:consumed_by | biopax:product (as sub-property); RO properties with same names | Equivalent property
 bk:regulates, bk:positively_regulates, bk:negatively_regulates | RO properties with same names | Equivalent property
 bk:is_a, bk:part_of, bk:has_part, bk:occurs_in, bk:co_occurs_with | skos:broader; Basic Formal Ontology (BFO)/RO properties with same names | Equivalent property
 bk:Publication | schema:CreativeWork | Subclass
 bka:abstract, bka:title (also known as AbstractHeader), bka:authors | dcterms:description, dcterms:title, dc:creator | Sub-property
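
The second row of the mapping (bk:Relation mirroring rdf:Statement) can be made concrete with a short sketch. It assumes triples are plain tuples; the function name `reify` and the relation IRI `bkr:rel_tob1_id1` are hypothetical, while the predicates and the TOB1/pathway values come from the deck itself.

```python
def reify(rel_iri, source, rel_type, target, attributes=None):
    """Emit a plain triple plus its bk:Relation node, in the style of rdf:Statement.

    bk:relFrom, bk:relTypeRef and bk:relTo play the roles of rdf:subject,
    rdf:predicate and rdf:object. Attributes hang off the relation node.
    """
    triples = [
        (source, rel_type, target),            # the direct, attribute-free link
        (rel_iri, "rdf:type", "bk:Relation"),
        (rel_iri, "bk:relFrom", source),
        (rel_iri, "bk:relTypeRef", rel_type),
        (rel_iri, "bk:relTo", target),
    ]
    for name, value in (attributes or {}).items():
        triples.append((rel_iri, name, value))
    return triples


# Usage: a protein-to-pathway link carrying an evidence attribute.
EXAMPLE = reify(
    "bkr:rel_tob1_id1",                       # hypothetical relation IRI
    "bkr:TOB1",
    "bk:participates_in",
    "http://www.wikipathways.org/id1",
    {"bk:evidence": "bkev:IMPD"},
)
```

One relation with one attribute already takes six triples, which gives a sense of why reified relations are felt to be verbose.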
  27. Putting it on a Bigger Picture
  29. Accessing RDF through SPARQL
  32. Extraction, Loading, Transformation
 SPARQL/TARQL Example
     CONSTRUCT {
       ?protIri bk:expressed_by ?sampleIri.
       ?degRelIri a bk:Relation;
         bka:PVALUE ?pValue;
         bk:evidence bkev:EXP;     # Inferred from experiment
         bk:relFrom ?protIri;      # Details defined by UniProt info
         bk:relTo ?sampleIri;      # Details defined by sample_degs_2.tsv
         bk:relTypeRef bk:expressed_by.
     }
     WHERE {
       # Some IDs and IRIs to be defined above
       BIND ( LCASE ( REPLACE ( ?Sample, ' ', '_' ) ) AS ?sampleId )
       BIND ( IRI ( CONCAT ( STR ( bkr: ), ?Gene_Symbol ) ) AS ?protIri )
       BIND ( IRI ( CONCAT ( STR ( bkr: ), 'degex_', ?sampleId ) ) AS ?sampleIri )
       BIND ( IRI ( CONCAT ( STR ( bkr: ), 'degex_', ?sampleId, '_', LCASE ( ?Gene_Symbol ) ) ) AS ?degRelIri )
       BIND ( xsd:double ( ?p_value ) AS ?pValue )
     }
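
To make the BIND expressions easier to follow, here is a rough Python equivalent of what they compute for a single table row. It is only an illustration of the string logic, not a replacement for TARQL. The sample row is invented, and the `bkr:` namespace is taken from the JSON-LD context shown later in the deck.

```python
import csv
import io

# bkr: namespace, as declared in the deck's JSON-LD context.
BKR = "http://www.ondex.org/bioknet/resources/"


def row_to_bindings(row):
    """Mirror the query's BIND clauses for one row of the input table."""
    # LCASE ( REPLACE ( ?Sample, ' ', '_' ) )
    sample_id = row["Sample"].replace(" ", "_").lower()
    gene = row["Gene_Symbol"]
    return {
        "protIri": BKR + gene,
        "sampleIri": BKR + "degex_" + sample_id,
        "degRelIri": BKR + "degex_" + sample_id + "_" + gene.lower(),
        "pValue": float(row["p_value"]),  # xsd:double ( ?p_value )
    }


# Made-up input with the column names the query expects.
SAMPLE_TSV = "Sample\tGene_Symbol\tp_value\nLeaf Day 1\tTOB1\t0.003\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
bindings = row_to_bindings(rows[0])
```

TARQL exposes each column header as a SPARQL variable of the same name, which is why `?Sample`, `?Gene_Symbol` and `?p_value` appear in the query without being matched by any triple pattern.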
  33. SPARQL/RDF for ELT
 • RDF-to-RDF translation via CONSTRUCT (or SPARUL)
 • TARQL: using SPARQL to convert tabular/CSV files to RDF
 • RDF/XML can be transformed via XSL
   • We have done it for bio-specific ontology definitions in Ondex
 • Programmatic conversions
   • Using RDF frameworks, e.g., Jena, RDF4J (formerly Sesame), rdflib for Python
   • See also java2rdf (https://github.com/EBIBioSamples/java2rdf)
   • We have used it for the Ondex->RDF converter
  36. Issues
 https://lod-cloud.net/
 • Still not so popular (especially in more commercial contexts)
   • It’s (perceived as) difficult (in particular, SPARQL)
   • Bad reputation
 • Performance can still be an issue
   • e.g., optimising SPARQL can be hard
 • Specific issues
   • e.g., I need contextualised/attribute-attached properties
   • and I don’t fancy reified relations…
  37. Another Graph Database World
 Property Graphs
  38. Neo4j on Top of RDF
  39. Application to Semantic Motif Search
  40. The rdf2neo Tool https://github.com/Rothamsted/rdf2neo
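
As a rough idea of what moving from RDF to a property graph involves, the sketch below collapses reified bk:Relation nodes into edges that carry their own properties. It is not how rdf2neo is implemented. The tuple representation, the function name and the example IRIs `bkr:rel1` and `bkr:degex_sample1` are assumptions made for illustration, and each relation is assumed to have exactly one value per predicate.

```python
from collections import defaultdict

# Predicates that describe the relation's structure rather than its attributes.
STRUCTURAL = {"rdf:type", "bk:relFrom", "bk:relTo", "bk:relTypeRef"}


def relations_to_edges(triples):
    """Turn each bk:Relation node into one property-graph edge (a dict)."""
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        by_subject[s][p] = o  # single-valued assumption: later values overwrite

    edges = []
    for iri, props in by_subject.items():
        if props.get("rdf:type") != "bk:Relation":
            continue  # ordinary nodes are not handled in this sketch
        edges.append({
            "iri": iri,
            "from": props["bk:relFrom"],
            "to": props["bk:relTo"],
            # Use the local name as the edge type, e.g. "expressed_by".
            "type": props["bk:relTypeRef"].split(":", 1)[-1],
            "properties": {
                p: o for p, o in props.items() if p not in STRUCTURAL
            },
        })
    return edges


TRIPLES = [
    ("bkr:rel1", "rdf:type", "bk:Relation"),
    ("bkr:rel1", "bk:relFrom", "bkr:TOB1"),
    ("bkr:rel1", "bk:relTypeRef", "bk:expressed_by"),
    ("bkr:rel1", "bk:relTo", "bkr:degex_sample1"),
    ("bkr:rel1", "bka:PVALUE", 0.003),
    ("bkr:TOB1", "rdf:type", "bk:Protein"),
]

EDGES = relations_to_edges(TRIPLES)
```

The four structural triples disappear into the edge itself and the attribute becomes an edge property, which is the "relations with properties" advantage of property graphs.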
  41. Triple Stores vs Prop Graphs
 • Data exchange format
   • Neo4j, Cypher DBs, Graph DBs:
     - No official one, just Cypher; support for GraphML, RDF
     +/- Focus on backing applications
   • Semantic Web/Triple Stores:
     + Focus on data sharing standards
 • Data model
   • Neo4j, Cypher DBs, Graph DBs:
     + Relations with properties
     - Metadata/schemas/ontologies management
   • Semantic Web/Triple Stores:
     - Relations cannot have properties (reification required)
     + Metadata/schemas/ontologies as first-class citizens, standardised in OWL
 • Performance
   • Neo4j, Cypher DBs, Graph DBs:
     + Complex graph traversals
   • Semantic Web/Triple Stores:
     + Comparable in most cases
 • Query language
   • Neo4j, Cypher DBs, Graph DBs:
     + Cypher is easier (e.g., compact, implicit elements)?
     - Expressivity issues (unions)
     - No standard QL (but efforts in progress, e.g., OpenCypher)
   • Semantic Web/Triple Stores:
     - SPARQL is harder? (URIs, namespaces, verbosity)
     + SPARQL is more expressive
 • Standardisation, openness
   • Neo4j, Cypher DBs, Graph DBs:
     +/- (TinkerPop is open, Neo4j isn’t)
     + Commercial support
     + More alive and up to date (e.g., support for Hadoop, nice Neo4j browser, easy installation)
   • Semantic Web/Triple Stores:
     + Natively open, many open implementations
     - Instability and many short-lived prototypes
     - Advancements seem to be slowing down
     + Some nice open and commercial browsers (LODEStar, …)
 • Scalability, big data
   • Neo4j, Cypher DBs, Graph DBs:
     +/- Commercial support for clustering/clouds for Neo4j
     + Open support in TinkerPop
   • Semantic Web/Triple Stores:
     + Load balancing/cluster solutions, commercial cloud support (e.g., GraphDB)
     + SPARQL over TinkerPop (via SAIL interface)
  42. Bridging to RDF: JSON-LD
     {
       "@context": {
         "bk": "http://www.ondex.org/bioknet/terms/",
         "bka": "http://www.ondex.org/bioknet/terms/attributes/",
         "bkds": "http://www.ondex.org/bioknet/terms/dataSources/",
         "bkev": "http://www.ondex.org/bioknet/terms/evidences/",
         "bkr": "http://www.ondex.org/bioknet/resources/",
         "dcterms": "http://purl.org/dc/terms/",
         "obo": "http://purl.obolibrary.org/obo/",
         "xsd": "http://www.w3.org/2001/XMLSchema#",
         "@vocab": "http://www.ondex.org/bioknet/terms/",
         "dcterms:identifier": { "@type": "xsd:string" },
         "evidence": { "@type": "@id" }
       },
       …
       "@id": "bkr:TOB1",
       "@type": "bk:Protein",
       "prefName": "TOB1 Human",
       "dcterms:identifier": "TOB1",
       "is_annotated_by": "obo:GO_0030014",
       "participates_in": {
         "@id": "http://www.wikipathways.org/id1",
         "@type": "bk:Pathway",
         "evidence": "bkev:IMPD",
         "prefName": "Bone Morphogenic Protein (BMP) Signalling and Regulation"
       }
     }
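
The role of the `@context` can be shown with a minimal sketch of how compact names map to full IRIs. This covers only prefix and `@vocab` expansion, a small part of what a real JSON-LD processor does. The function name `expand` is made up for the example, and the context is a subset of the one on the slide.

```python
# Subset of the slide's @context, kept to the entries this sketch uses.
CONTEXT = {
    "bk": "http://www.ondex.org/bioknet/terms/",
    "bkr": "http://www.ondex.org/bioknet/resources/",
    "obo": "http://purl.obolibrary.org/obo/",
    "@vocab": "http://www.ondex.org/bioknet/terms/",
}


def expand(term, context=CONTEXT):
    """Expand a compact IRI or bare term to a full IRI (simplified rules)."""
    if "://" in term:
        return term  # already an absolute IRI
    prefix, sep, local = term.partition(":")
    if sep:
        # Compact IRI such as "bkr:TOB1". Unknown prefixes are left alone.
        return context[prefix] + local if prefix in context else term
    # Bare term such as "prefName": resolved against @vocab.
    return context["@vocab"] + term


EXPANDED = {name: expand(name) for name in ("bkr:TOB1", "bk:Protein", "prefName")}
```

With these rules the short keys that a JSON consumer sees (`prefName`, `participates_in`) and the IRIs that an RDF consumer needs are two readings of the same document, which is what lets one API response serve both audiences.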
  43. KnetMiner UI Overview (Search, Select, Explore)
 Addressing FAIR
 • Findable (or, Semantic Web is still useful)
   • SPARQL endpoint
     • which powers URI resolution
   • Dataset-level metadata (e.g., VoID)
   • Mapping to standard ontologies
   • Interested in contributing to existing standards (e.g., Bioschemas)
   • API/JSON-Schema formalisation
 • Accessible
   • Multiple means of access (SPARQL, URIs, JSON APIs, Cypher)
   • Triple stores and property graphs are complementary, not alternatives
   • Data dumps
 • Interoperable (or, Sem Web is still useful)
   • Unified model encoded in at least one common syntax (RDF)
   • URIs are reused
   • Mappings to ontologies
 • Reusable
   • All of the above, plus multiple interfaces under a unified model
   • Support for common graph languages (e.g., Cytoscape.js)
   • Converters (e.g., our RDF conversion scripts/tools, rdf2neo)
   • Open Data licences