Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowledge Graphs
Harpenden, 3/6/2018

Marco Brandizi <marco.brandizi@rothamsted.ac.uk>
Find these slides on SlideShare
KnetMiner-inspired Artwork by Hugo Dalton (hugodalton.com)
Behind the scenes of KnetMiner
Putting It in a Bigger Picture
<concept>
<id>1</id>
<pid>Q75WV3</pid>
<description/>
<elementOf>
<idRef>UNIPROTKB-SwissProt</idRef>
</elementOf>
<ofType>
<idRef>Protein</idRef>
</ofType>
<evidences>
<evidence>
<idRef>IMPD</idRef>
</evidence>
</evidences>
<conames>
<concept_name>
<name>Probable trehalose-phosphate phosphatase 1</name>
<isPreferred>true</isPreferred>
</concept_name>
…
<cc>
<id>Protein</id>
<fullname>Protein</fullname>
<description>
A protein is comprised of one or more Polypeptides
and potentially other molecules.
</description>
<specialisationOf>
<idRef>MolCmplx</idRef>
</specialisationOf>
</cc>
<relation>
<fromConcept>1</fromConcept>
<toConcept>3</toConcept>
<ofType>
<idRef>participates_in</idRef>
</ofType>
<evidences>
<evidence>
<idRef>ECO:0000316</idRef>
</evidence>
</evidences>
<relgds/>
</relation>
<concept>
<id>3</id>
<pid>GO:0009651</pid>
<description>response to salt stress</description>
<ofType><idRef>BioProc</idRef></ofType>
<coaccessions>
<concept_accession>
<accession>GO:0009651</accession>
<elementOf><idRef>GO</idRef></elementOf>
<ambiguous>false</ambiguous>
</concept_accession>
</coaccessions>
</concept>
Is XML/OXL Enough?
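In OXL, relations point to concepts through file-local integer ids (`<fromConcept>1</fromConcept>`), so every consumer must resolve those ids itself. A minimal sketch of that resolution with the JDK's built-in DOM parser follows; the `<ondex>` root element, the `OxlRelations` class and its output format are assumptions for illustration, and only the few tags shown in the excerpt above are handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: turns OXL relations into "pidFrom relType pidTo" strings. */
class OxlRelations {

  /** Trimmed text of the first descendant element named tag, or null if absent. */
  private static String text(Element parent, String tag) {
    NodeList nl = parent.getElementsByTagName(tag);
    return nl.getLength() == 0 ? null : nl.item(0).getTextContent().trim();
  }

  static List<String> relations(String oxl) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(oxl.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Cannot parse OXL", e);
    }

    // First pass: map each concept's local <id> to its <pid>
    Map<String, String> pidById = new HashMap<>();
    NodeList concepts = doc.getElementsByTagName("concept");
    for (int i = 0; i < concepts.getLength(); i++) {
      Element c = (Element) concepts.item(i);
      pidById.put(text(c, "id"), text(c, "pid"));
    }

    // Second pass: resolve <fromConcept>/<toConcept> through that map
    List<String> out = new ArrayList<>();
    NodeList rels = doc.getElementsByTagName("relation");
    for (int i = 0; i < rels.getLength(); i++) {
      Element r = (Element) rels.item(i);
      Element ofType = (Element) r.getElementsByTagName("ofType").item(0);
      out.add(pidById.get(text(r, "fromConcept")) + " "
          + text(ofType, "idRef") + " "
          + pidById.get(text(r, "toConcept")));
    }
    return out;
  }
}
```

The ids only mean something inside one file, which is part of why the following slides move to globally resolvable URIs.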
A Brief History of Data Models/Formats
The Semantic Web Approach: RDF
URI Resolution
@prefix bkr: <http://www.ondex.org/bioknet/resources/> .
@prefix bk: <http://www.ondex.org/bioknet/terms/> .
@prefix bka: <http://www.ondex.org/bioknet/terms/attributes/> .
bkr:TOB1 a bk:Protein ;
bk:participates_in <http://www.wikipathways.org/id1> ;
bk:prefName "TOB1";
bk:published_in bkr:23236473.

The Turtle Syntax:
https://www.w3.org/TR/turtle/
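How a prefixed name such as `bkr:TOB1` resolves to a full URI can be sketched in plain Java. This is a simplified model of the Turtle `@prefix` mechanism, not a Turtle parser: the `PrefixResolver` class is hypothetical and ignores escapes, `@base` and relative IRIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of Turtle prefix expansion ("prefix:local" -> namespace + local). */
class PrefixResolver {
  private final Map<String, String> prefixes = new LinkedHashMap<>();

  /** Equivalent of a Turtle "@prefix name: <namespace> ." declaration. */
  void declare(String prefix, String namespace) {
    prefixes.put(prefix, namespace);
  }

  /** Expands a prefixed name; a full IRI written as <...> is simply unwrapped. */
  String expand(String term) {
    if (term.startsWith("<") && term.endsWith(">")) {
      return term.substring(1, term.length() - 1);
    }
    int colon = term.indexOf(':');
    if (colon < 0) {
      throw new IllegalArgumentException("Not a prefixed name: " + term);
    }
    String ns = prefixes.get(term.substring(0, colon));
    if (ns == null) {
      throw new IllegalArgumentException("Undeclared prefix in: " + term);
    }
    return ns + term.substring(colon + 1);
  }
}
```

With the three declarations above, `bkr:TOB1` expands to `http://www.ondex.org/bioknet/resources/TOB1`, which is the URI a client would then try to resolve.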
Schema/Ontologies
(Figure: data store and schema store)
Sharing Identifiers via URIs
(Figure: data store, schema store, WikiPathways)
Mapping Data for Interoperability
Our Data Model: The BioKNO Ontology
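BioKNO: Biological Entities
@prefix bk: <http://www.ondex.org/bioknet/terms/> .
@prefix bkr: <http://www.ondex.org/bioknet/resources/> .
@prefix bkev: <http://www.ondex.org/bioknet/terms/evidences/> .
@prefix bkds: <http://www.ondex.org/bioknet/terms/dataSources/> .
@prefix wp: <http://www.wikipathways.org/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix obo: <http://purl.obolibrary.org/obo/> .

wp:id1
  a bk:Path ; # a subclass of bk:Concept
  bk:evidence bkev:IMPD ; # Imported from database, a predefined resource type.
  bk:prefName "Bone Morphogenic Protein (BMP) Signalling and Regulation" .

bkr:TOB1 a bk:Protein ;
  dc:identifier bkr:TOB1_acc ;
  bk:prefName "TOB1 HUMAN" ;
  # A simplified link, hiding the BioPAX chain:
  # pathwayComponent -> BiochemicalReaction|Complex -> Protein
  bk:participates_in wp:id1 ;
  bk:is_annotated_by obo:GO_0030014 . # Same URI as the OBO Gene Ontology term.

# Structured accession, allowing the identifier to be linked to its context.
bkr:TOB1_acc a bk:Accession ;
  dcterms:identifier "TOB1" ;
  # Instance of bk:DataSource, another predefined entity.
  bk:dataSource bkds:UNIPROTKB .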
Attributes in Reified Relations
# For practical reasons, we expect the plain (straight) triple to be
# always asserted, with the reified version optionally added to it.
bkr:TOB1 bk:published_in bkr:15489334 .

bkr:citation_TOB1_15489334 a bk:Relation ;
  # the same properties that are used for regular relations
  bk:relTypeRef bk:published_in ;
  bk:relFrom bkr:TOB1 ;
  bk:relTo bkr:15489334 ;
  # An attribute
  bka:score 0.95 ;
  # Both attributes and object properties can be linked to a
  # reified relation.
  bk:evidence bkev:TextMining .
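The convention above (plain triple always asserted, reified `bk:Relation` added on top of it) can be sketched as a small generator. The `Reifier` class and its string-based triple format are hypothetical; real code would use an RDF framework such as Jena, and the identifier of the reified node is supplied by the caller rather than minted by any documented rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: emits a straight triple plus its bk:Relation reification. */
class Reifier {

  /** Formats one statement in Turtle-like "subject predicate object ." form. */
  static String triple(String s, String p, String o) {
    return s + " " + p + " " + o + " .";
  }

  static List<String> assertWithReification(
      String relationId, String from, String relType, String to,
      Map<String, String> attributes) {
    List<String> out = new ArrayList<>();
    // The straight triple is always asserted...
    out.add(triple(from, relType, to));
    // ...and the reified version is added, reusing the same relation type
    out.add(triple(relationId, "a", "bk:Relation"));
    out.add(triple(relationId, "bk:relTypeRef", relType));
    out.add(triple(relationId, "bk:relFrom", from));
    out.add(triple(relationId, "bk:relTo", to));
    // Attributes and object properties (e.g. bk:evidence) hang off the reified node
    attributes.forEach((prop, value) -> out.add(triple(relationId, prop, value)));
    return out;
  }
}
```

Keeping the straight triple means simple queries never need to know about reification, while queries interested in scores or evidence go through the `bk:Relation` node.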
Talking to the Rest of The World
BioKNO → External ontologies (mapping type):
• bk:Concept → skos:Concept (subclass)
• bk:Relation → rdf:Statement (subclass); bk:relFrom, bk:relTypeRef, bk:relTo → rdf:subject, rdf:predicate, rdf:object (subproperties, i.e., mapping to RDF reified statements)
• bk:Path, bk:Participant, bk:Interaction, bk:Transport, bk:Protein, bk:Gene → classes with the same names in BioPAX and SIO (equivalent class)
• bk:participates_in, bk:has_participant → Relation Ontology (RO) properties with the same names (equivalent property); biopax:participant (as sub-property)
• bk:produces, bk:produced_by, bk:consumes, bk:consumed_by → RO properties with the same names (equivalent property); biopax:product (as sub-property)
• bk:regulates, bk:positively_regulates, bk:negatively_regulates → RO properties with the same names (equivalent property)
• bk:is_a; bk:part_of, bk:has_part; bk:occurs_in, bk:co_occurs_with → skos:broader; Basic Formal Ontology (BFO)/RO properties with the same names (equivalent property)
• bk:Publication → schema:CreativeWork (subclass)
• bka:abstract, bka:title (also known as AbstractHeader), bka:authors → dcterms:description, dcterms:title, dc:creator (sub-property)
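Such mappings pay off when a client reasons over them: once `bk:Publication` is declared a subclass of `schema:CreativeWork`, a query for creative works also matches KnetMiner publications. The sketch below computes the `rdfs:subClassOf*` closure in plain Java; it is a toy stand-in for an RDFS reasoner or a SPARQL property path, and the `SubClassReasoner` class is hypothetical, as is the `bk:Protein` → `bk:Concept` link used in its example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of transitive rdfs:subClassOf reasoning. */
class SubClassReasoner {
  // Direct rdfs:subClassOf assertions: class -> its direct superclasses
  private final Map<String, Set<String>> parents = new HashMap<>();

  void subClassOf(String sub, String sup) {
    parents.computeIfAbsent(sub, k -> new LinkedHashSet<>()).add(sup);
  }

  /** All classes reachable via rdfs:subClassOf*, including the class itself. */
  Set<String> superClasses(String cls) {
    Set<String> seen = new LinkedHashSet<>();
    Deque<String> todo = new ArrayDeque<>();
    todo.push(cls);
    while (!todo.isEmpty()) {
      String c = todo.pop();
      if (!seen.add(c)) continue; // already visited, which also guards against cycles
      for (String p : parents.getOrDefault(c, Set.of())) todo.push(p);
    }
    return seen;
  }

  /** Under RDFS semantics, an instance of cls is also an instance of every superclass. */
  boolean isInstanceOf(String cls, String target) {
    return superClasses(cls).contains(target);
  }
}
```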
How to Serve and Query RDF?
Typical RDF (and Data) Architecture
How to Use it, Concretely?
Playground: SPARQL Browsers
How to Use it, Concretely?
Programmatically: RDF Frameworks (Jena in this case)
// Requires Apache Jena (ARQ) 3.x
import org.apache.jena.query.*;
import org.apache.jena.rdf.model.Literal;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.sparql.engine.http.QueryEngineHTTP;

String service = "http://localhost:3030/ds/query";
String sparql =
  "PREFIX bk: <http://www.ondex.org/bioknet/terms/>\n" +
  …
  "\n" +
  "SELECT DISTINCT ?pmid ?title ?year ?pub\n" +
  "{\n" +
  "  ?prot a bk:Protein;\n" +
  "    bk:prefName 'TOB1'.\n" +
  "\n" +
  "  ?pubRel a bk:Relation;\n" +
  "    bk:relTypeRef bk:published_in;\n" +
  "    bk:relFrom ?prot;\n" +
  "    bk:relTo ?pub;\n" +
  "    bka:Score ?score.\n" +
  "\n" +
  "  FILTER ( ?score > 0.90 )\n" +
  "\n" +
  "  ?pub\n" +
  "    bka:PMID ?pmid;\n" +
  "    bka:YEAR ?dyear;\n" +
  "    bka:abstractHeader ?title.\n" +
  "\n" +
  "  BIND ( xsd:int ( ?dyear ) AS ?year )\n" +
  "}\n" +
  "LIMIT 1000";

Query query = QueryFactory.create ( sparql );
// try-with-resources releases the HTTP connection when done
try ( QueryEngineHTTP qexec = QueryExecutionFactory.createServiceRequest ( service, query ) )
{
  ResultSet results = qexec.execSelect ();
  results.forEachRemaining ( ( QuerySolution soln ) ->
  {
    Resource pubNode = soln.getResource ( "pub" );
    String uri = pubNode.getURI ();
    Literal titleNode = soln.getLiteral ( "title" );
    String title = titleNode.getString ();
    String titleLang = titleNode.getLanguage ();
    Literal yearNode = soln.getLiteral ( "year" );
    int year = yearNode.getInt ();
    System.out.format (
      "Publication ID: <%s>, title: %s (in %s), year: %d%n",
      uri, title, titleLang, year
    );
  });
}
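A framework is not strictly required to talk to an endpoint such as `http://localhost:3030/ds/query`: the SPARQL 1.1 Protocol lets a client POST the query as a URL-encoded `query` form parameter. A minimal sketch with the JDK's `java.net.http` client (Java 11+) follows; `SparqlHttp` is a hypothetical helper, and parsing the returned SPARQL JSON results is left to a JSON library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for the SPARQL 1.1 Protocol "query via POST, URL-encoded" operation. */
class SparqlHttp {

  /** The form body: query=<URL-encoded SPARQL text>. */
  static String formBody(String sparql) {
    return "query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
  }

  static HttpRequest buildRequest(String endpoint, String sparql) {
    return HttpRequest.newBuilder(URI.create(endpoint))
        .header("Content-Type", "application/x-www-form-urlencoded")
        .header("Accept", "application/sparql-results+json")
        .POST(HttpRequest.BodyPublishers.ofString(formBody(sparql)))
        .build();
  }

  /** Sends a SELECT query and returns the raw SPARQL JSON results document. */
  static String select(String endpoint, String sparql) throws IOException, InterruptedException {
    HttpResponse<String> resp = HttpClient.newHttpClient()
        .send(buildRequest(endpoint, sparql), HttpResponse.BodyHandlers.ofString());
    if (resp.statusCode() != 200) {
      throw new IllegalStateException("SPARQL endpoint returned HTTP " + resp.statusCode());
    }
    return resp.body();
  }
}
```

The trade-off is that the client receives raw JSON instead of Jena's typed `QuerySolution` objects, so literal datatypes and language tags must be read from the result document by hand.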
SPARQL for Extraction, Loading, Transformation
(The Simpler-than-Ondex Way)
CONSTRUCT {
  ?path a bk:Path;
    bk:prefName ?pathName;
    bk:evidence bkev:IMPD.
  ?bkProt a bk:Protein;
    dc:identifier ?bkProtAccUri;
    bk:prefName ?protName;
    bk:participates_in ?path.
  ?bkProtAccUri a bk:Accession;
    dcterms:identifier ?protName;
    bk:dataSource bkds:UNIPROTKB.
}
WHERE
{
  ?path a bp:Pathway;
    bp:displayName ?pathName;
    bp:pathwayComponent ?comp.
  {
    ?comp a bp:BiochemicalReaction;
      bp:left|bp:right ?protein.
  }
  UNION {
    # Bound to ?comp, so that complexes are also linked to their pathway
    ?comp a bp:Complex;
      bp:component ?protein.
  }
  ?protein a bp:Protein;
    bp:displayName ?protName.
  # ENCODE_FOR_URI prevents names containing spaces from producing invalid IRIs
  BIND ( IRI ( CONCAT ( STR ( bkr: ), ENCODE_FOR_URI ( ?protName ) ) ) AS ?bkProt )
  BIND ( IRI ( CONCAT ( STR ( ?bkProt ), "_acc" ) ) AS ?bkProtAccUri )
}
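The same transformation can be mimicked in plain Java to show what the CONSTRUCT produces for each matched protein. This is a sketch under stated assumptions: the input is a hypothetical, simplified structure (one list of protein display names per pathway component) instead of a real BioPAX graph, the output uses the Turtle prefixes of the slides, the percent-encoding of names is our own choice, and quotes inside names are not escaped.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the BioPAX -> BioKNO mapping performed by the CONSTRUCT query. */
class BioPaxToBioKno {
  static final String BKR = "http://www.ondex.org/bioknet/resources/";

  /** Mints a protein URI from its display name (spaces become %20). */
  static String proteinUri(String name) {
    // URLEncoder uses form encoding ('+' for space), so switch to '%20' for URIs
    return BKR + URLEncoder.encode(name, StandardCharsets.UTF_8).replace("+", "%20");
  }

  /**
   * componentProteins: one list per pathway component, holding the names of the proteins
   * on the left/right side of a reaction or among the components of a complex.
   */
  static List<String> convert(String pathUri, String pathName, List<List<String>> componentProteins) {
    Set<String> out = new LinkedHashSet<>(); // as in CONSTRUCT output, duplicates collapse
    out.add("<" + pathUri + "> a bk:Path ; bk:prefName \"" + pathName + "\" ; bk:evidence bkev:IMPD .");
    for (List<String> proteins : componentProteins) {
      for (String name : proteins) {
        String prot = proteinUri(name);
        String acc = prot + "_acc";
        out.add("<" + prot + "> a bk:Protein ; dc:identifier <" + acc + "> ; bk:prefName \"" + name + "\" .");
        out.add("<" + acc + "> a bk:Accession ; dcterms:identifier \"" + name + "\" ; bk:dataSource bkds:UNIPROTKB .");
        out.add("<" + prot + "> bk:participates_in <" + pathUri + "> .");
      }
    }
    return new ArrayList<>(out);
  }
}
```

A protein appearing in several reactions of the same pathway yields its triples only once, matching the set semantics of an RDF graph.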
SPARQL/RDF for ELT
• TARQL: using SPARQL to convert tabular CSV files to RDF
• RDF/XML can be transformed via XSLT
  • We have done it for bio-specific ontology definitions in Ondex
• Programmatic conversions
  • Using RDF frameworks, e.g., Jena, RDF4J (formerly Sesame), rdflib for Python
  • See also java2rdf (https://github.com/EBIBioSamples/java2rdf)
  • We have used it for the Ondex->RDF converter
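A programmatic conversion of tabular data, the task TARQL addresses declaratively, fits in a few lines of plain Java writing N-Triples. The two-column `id,name` layout and the `CsvToNTriples` class are assumptions for illustration; the naive comma split does not handle quoted fields, for which a proper CSV library would be needed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: "id,name" CSV rows -> N-Triples typed as bk:Protein. */
class CsvToNTriples {
  static final String BK = "http://www.ondex.org/bioknet/terms/";
  static final String BKR = "http://www.ondex.org/bioknet/resources/";
  static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

  /** Escapes a string for an N-Triples "..." literal (backslash first, then the rest). */
  static String literal(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"")
        .replace("\n", "\\n").replace("\r", "\\r") + "\"";
  }

  /** Expects a header row first; every following row becomes two triples. */
  static List<String> convert(List<String> csvLines) {
    List<String> out = new ArrayList<>();
    for (String line : csvLines.subList(1, csvLines.size())) { // skip the header row
      String[] cols = line.split(",", -1);
      String subj = "<" + BKR + cols[0].trim() + ">";
      out.add(subj + " <" + RDF_TYPE + "> <" + BK + "Protein> .");
      out.add(subj + " <" + BK + "prefName> " + literal(cols[1].trim()) + " .");
    }
    return out;
  }
}
```

N-Triples is a convenient target because every line is a complete triple, so the output can be streamed, concatenated and bulk-loaded into a triple store.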
The Bigger Picture
(Figures from https://www.economist.com/node/21521548 and https://goo.gl/n4m5xL)
The Bigger Picture: Linked Open Data
(Figure: the LOD cloud, https://lod-cloud.net/)
In the Life Sciences
Another Graph Database World
The Cypher Query/DML Language
Proteins->Reactions->Pathways:
// chain of paths, node selection via property (exploits indices)
MATCH (prot:Protein) - [csby:consumed_by] -> (:Reaction) - [:part_of] -> (pway:Path{ title: 'apoptosis' })
// further conditions, not always so performant
WHERE prot.name =~ '(?i)^DNA.+'
// Usual projection and post-selection operators
RETURN prot.name, pway
// Relations can have properties
ORDER BY csby.pvalue
LIMIT 1000

Proteins->Reactions->Pathways:
// Single-path (or same-direction branching) easy to write
MATCH (prot:Protein) - [:produced_by|consumed_by] -> (:Reaction) - [:part_of*1..3] -> (pway:Path)
RETURN ID(prot), ID(pway) LIMIT 1000
// Very compact forms available, depending on the data
MATCH (prot:Protein) -- (pway:Path) RETURN pway
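What the second query asks for, in particular the variable-length `[:part_of*1..3]` step, can be illustrated with a bounded breadth-first expansion over an in-memory graph. The `PathMatcher` class is a hypothetical sketch: unlike Cypher it ignores node labels, returns a set of end nodes rather than one row per path, and does not enforce Cypher's rule that a relationship is traversed at most once per path (the hop limit still guarantees termination).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of Cypher-like pattern matching on a directed, typed graph. */
class PathMatcher {
  // node -> relationship type -> target nodes
  private final Map<String, Map<String, List<String>>> edges = new HashMap<>();

  void addEdge(String from, String type, String to) {
    edges.computeIfAbsent(from, k -> new HashMap<>())
        .computeIfAbsent(type, k -> new ArrayList<>()).add(to);
  }

  private List<String> next(String node, String type) {
    return edges.getOrDefault(node, Map.of()).getOrDefault(type, List.of());
  }

  /** Nodes reachable from start via minHops..maxHops edges of the given type. */
  Set<String> reach(String start, String type, int minHops, int maxHops) {
    Set<String> result = new LinkedHashSet<>();
    List<String> frontier = List.of(start);
    for (int hop = 1; hop <= maxHops && !frontier.isEmpty(); hop++) {
      List<String> nextFrontier = new ArrayList<>();
      for (String n : frontier) nextFrontier.addAll(next(n, type));
      if (hop >= minHops) result.addAll(nextFrontier);
      frontier = nextFrontier;
    }
    return result;
  }

  /** (prot) -[:produced_by|consumed_by]-> (reaction) -[:part_of*1..3]-> (path) */
  Set<String> pathwaysOf(String protein) {
    Set<String> paths = new LinkedHashSet<>();
    for (String relType : List.of("produced_by", "consumed_by")) {
      for (String reaction : next(protein, relType)) {
        paths.addAll(reach(reaction, "part_of", 1, 3));
      }
    }
    return paths;
  }
}
```

Writing even this single pattern by hand shows why a declarative language that compiles such traversals for you is attractive.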
Cypher as Semantic Motif Language
The rdf2neo Tool
SELECT ?iri
{
  ?label rdfs:subClassOf* bk:Concept.
  ?iri a ?label.
}

SELECT ?label
{
  {
    ?iri a ?label.
    ?label rdfs:subClassOf* bk:Concept.
  }
  UNION {
    # it's always an instance of bk:Concept
    BIND ( bk:Concept AS ?label )
    BIND ( ?iri AS ?iri )
  }
}

SELECT ?name ?value
{
  {
    ?iri ?name ?value.
    VALUES ( ?name ) {
      (dcterms:identifier)
      (dcterms:description)
      (rdfs:comment)
      (bk:prefName)
      (bk:altName)
    }
  }
  UNION {
    ?iri ?name ?value.
    ?name rdfs:subPropertyOf* bk:attribute.
  }
}
The rdf2neo Tool
https://github.com/Rothamsted/rdf2neo
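The core idea of turning RDF into a property graph can be sketched in plain Java under simple, fixed rules: `rdf:type` values become node labels, literal-valued properties become node properties, and IRI-valued properties become relationships. This is not rdf2neo's code: rdf2neo is driven by configurable SPARQL queries like those above, whereas this hypothetical `TriplesToPropertyGraph` toy hard-codes the rules, strips CURIE prefixes instead of handling full IRIs, and does not handle reified `bk:Relation` nodes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of an RDF-to-property-graph mapping. */
class TriplesToPropertyGraph {

  /** A property-graph node: IRI, labels and (multi-valued) literal properties. */
  static final class Node {
    final String iri;
    final Set<String> labels = new LinkedHashSet<>();
    final Map<String, List<String>> props = new LinkedHashMap<>();
    Node(String iri) { this.iri = iri; }
  }

  final Map<String, Node> nodes = new LinkedHashMap<>();
  final List<String[]> relationships = new ArrayList<>(); // {fromIri, type, toIri}

  private Node node(String iri) {
    return nodes.computeIfAbsent(iri, Node::new);
  }

  /** Adds one triple; isLiteral tells whether the object is a literal or an IRI. */
  void add(String s, String p, String o, boolean isLiteral) {
    Node subj = node(s);
    if (p.equals("rdf:type")) {
      subj.labels.add(localName(o)); // type -> label
    } else if (isLiteral) {
      subj.props.computeIfAbsent(localName(p), k -> new ArrayList<>()).add(o); // literal -> property
    } else {
      node(o); // make sure the target node exists
      relationships.add(new String[] { s, localName(p), o }); // IRI -> relationship
    }
  }

  /** "bk:prefName" -> "prefName". */
  static String localName(String curie) {
    return curie.substring(curie.indexOf(':') + 1);
  }
}
```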
How to Use it, Concretely?
Playground: The Neo4j Browser
How to Use it, Concretely?
Programmatically: The Neo4j Drivers (for Java in this case)
// Requires the Neo4j Java driver 1.x (package org.neo4j.driver.v1)
import org.neo4j.driver.v1.*;

AuthToken auth = AuthTokens.basic ( "neo4j", "test" );
try (
  Driver neodb = GraphDatabase.driver ( "bolt://127.0.0.1:7687", auth );
  Session session = neodb.session ();
)
{
  String cypher =
    "MATCH (prot:Protein{ prefName:'TOB1' }) - [r:published_in] -> (pub)\n" +
    "WHERE toFloat ( r.Score ) > 0.9\n" +
    "RETURN pub.PMID, pub.AbstractHeader, pub.YEAR\n" +
    "ORDER BY pub.YEAR DESC\n" +
    "LIMIT 30";
  Statement stmt = new Statement ( cypher );
  StatementResult rs = session.run ( stmt );
  rs.forEachRemaining ( rec -> {
    String pmid = rec.get ( "pub.PMID" ).asString ();
    String title = rec.get ( "pub.AbstractHeader" ).asString ();
    String year = rec.get ( "pub.YEAR" ).asString ();
    System.out.format (
      "PMID: %s, Title: \"%s\", year: %s%n",
      pmid, title, year
    );
  });
}
Triple Stores vs Property Graphs
Data exchange format
• Neo4j, Cypher DBs, Graph DBs: - no official one, just Cypher; support for GraphML, RDF; +/- focus on backing applications
• Semantic Web/Triple Stores: + focus on data-sharing standards
Data model
• Neo4j, Cypher DBs, Graph DBs: + relations with properties; - metadata/schemas/ontologies management
• Semantic Web/Triple Stores: - relations cannot have properties (reification required); + metadata/schemas/ontologies as first-class citizens, standardised via OWL
Performance
• Neo4j, Cypher DBs, Graph DBs: + complex graph traversals
• Semantic Web/Triple Stores: + comparable in most cases
Query language
• Neo4j, Cypher DBs, Graph DBs: + Cypher is easier? (e.g., compact, implicit elements); - expressivity issues (unions); - no standard QL (but efforts in progress, e.g., openCypher)
• Semantic Web/Triple Stores: - SPARQL is harder? (URIs, namespaces, verbosity); + SPARQL is more expressive
Standardisation, openness
• Neo4j, Cypher DBs, Graph DBs: +/- TinkerPop is open, Neo4j isn't; + commercial support; + more alive and up to date (e.g., support for Hadoop, nice Neo4j browser, easy installation)
• Semantic Web/Triple Stores: + natively open, many open implementations; - instability and many short-lived prototypes; - advancements seem to be slowing down; + some nice open and commercial browsers (LODEStar, …)
Scalability, big data
• Neo4j, Cypher DBs, Graph DBs: +/- commercial support for clustering/clouds for Neo4j; + open support in TinkerPop
• Semantic Web/Triple Stores: + load balancing/cluster solutions, commercial cloud support (e.g., GraphDB); + SPARQL over TinkerPop (via the SAIL interface)
Supporting Web APIs via JSON
{
"type": "Protein",
"id": "TOB1",
"prefName": "TOB1 Human",
"participates_in":
{
"type": "Pathway",
"id": "id1",
"evidence": "IMPD",
"prefName": "Bone Morphogenic Protein (BMP) Signalling and Regulation"
},
"is_annotated_by": "GO_0030014"
}
• Designed to be compatible with web browsers, i.e., JavaScript
• Language of choice for web APIs, browser-side consumption and dynamic web interfaces (e.g., AJAX)
• Conceptually similar to XML (trees, nested structures)
• Often used in a lightweight way, without many schema constraints
Bridging to RDF: JSON-LD
{
  "@context": {
    "bk": "http://www.ondex.org/bioknet/terms/",
    "bka": "http://www.ondex.org/bioknet/terms/attributes/",
    "bkds": "http://www.ondex.org/bioknet/terms/dataSources/",
    "bkev": "http://www.ondex.org/bioknet/terms/evidences/",
    "bkr": "http://www.ondex.org/bioknet/resources/",
    "dcterms": "http://purl.org/dc/terms/",
    "obo": "http://purl.obolibrary.org/obo/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "@vocab": "http://www.ondex.org/bioknet/terms/",
    "dcterms:identifier": { "@type": "xsd:string" },
    "evidence": { "@type": "@id" },
    "is_annotated_by": { "@type": "@id" }
  },
  "@id": "bkr:TOB1",
  "@type": "bk:Protein",
  "prefName": "TOB1 Human",
  "dcterms:identifier": "TOB1",
  "is_annotated_by": "obo:GO_0030014",
  "participates_in": {
    "@id": "http://www.wikipathways.org/id1",
    "@type": "bk:Path",
    "evidence": "bkev:IMPD",
    "prefName": "Bone Morphogenic Protein (BMP) Signalling and Regulation"
  }
}
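How a JSON-LD processor turns keys and values into RDF terms can be sketched in a simplified form: plain keys such as `prefName` are appended to `@vocab`, compact IRIs such as `dcterms:identifier` expand through their prefix, and a string value becomes an IRI only for terms declared with `"@type": "@id"`, like `evidence`. The `JsonLdContextSketch` class is hypothetical and covers only these rules; a real processor (for example one conforming to the W3C JSON-LD API) handles far more.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, heavily simplified model of JSON-LD context processing. */
class JsonLdContextSketch {
  final Map<String, String> prefixes = new HashMap<>();
  final Set<String> idCoercedTerms = new HashSet<>(); // terms declared with "@type": "@id"
  String vocab;

  /** prefix:local -> namespace + local; unknown prefix -> treated as an absolute IRI. */
  private String expandIri(String s) {
    int colon = s.indexOf(':');
    String ns = prefixes.get(s.substring(0, colon));
    return ns != null ? ns + s.substring(colon + 1) : s;
  }

  /** Keys: compact IRIs expand via their prefix, bare terms via @vocab. */
  String expandKey(String key) {
    return key.indexOf(':') > 0 ? expandIri(key) : vocab + key;
  }

  /** Values: IRIs only for id-coerced terms; otherwise they stay string literals. */
  String expandValue(String key, String value) {
    if (!idCoercedTerms.contains(key)) return "\"" + value + "\"";
    return "<" + (value.indexOf(':') > 0 ? expandIri(value) : value) + ">";
  }
}
```

The last rule is the one most easily overlooked: without coercion, an IRI-looking string such as `"obo:GO_0030014"` silently becomes a plain literal rather than a link.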
JSON Schemas Babylon (and Our Focus)
Take-Home Messages
• From a small data-integration farm to sharing with the rest of the world => FAIR principles
• The Semantic Web has pros and cons
  • Still useful for data model and schema governance, identifiers, complex models (namely, ontologies)
• Alternative data-sharing approaches, property graphs (PG) in particular
  • A more alive area, can be simpler (blends better into existing industrial software)
  • LOD/FAIR principles not addressed much
• Integrating the two is useful
• APIs are a useful alternative/complementary approach
  • LOD/FAIR principles to be addressed there as well
• On our radar:
  • Completing the work: publishing SPARQL, Neo4j access, APIs
  • Integrating similar projects in the agrifood field (e.g., BrAPI, DFW)
  • Contributing to standardisation efforts like Bioschemas