A Preliminary survey of RDF/Neo4j as backends for KnetMiner
O N D E X & G R A P H D B S
M A R C O B R A N D I Z I , 1 6 / 1 0 / 2 0 1 7
G O A L S
• Evaluate graph databases (GDBs)/frameworks/etc. in relation to ONDEX needs
• Assess GDBs as kNetMiner/ONDEX backends
• Evaluate a new architecture where raw data access is entirely based on a GDB
• Evaluate a new data exchange format, possibly integrated with one of the GDBs
• and hence, evaluate the data models too
• Assess data query/manipulation languages (expressivity, ease of use, speed)
• Assess whether performance fits ONDEX needs
T E S T D A T A
Trait Ontology (TO) 1500 nodes, is-a and part-of relations (i.e., mostly tree)
Gene Ontology (GO) Tree with 46k nodes
AraCyc/BioPAX Heterogeneous net, 23k nodes, 40k relations
Ara-kNet Heterogeneous net, 350k nodes, 1.15M relations
T E S T S E T T I N G S ( R D F )
T E S T S E T T I N G S ( N E O 4 J )
R D F
R D F / L I N K E D D A T A
E S S E N T I A L S
• Simple, Fine-Grained Data Model: Property/Value Pairs & Typed Links
• Designed for Data Integration:
• Universal Identifiers, W3C Standards
• Strong (even too much) emphasis on knowledge modelling via schemas/ontologies
• Designed for the Web: Resolvable URIs, Web APIs
R D F / L I N K E D D A T A E S S E N T I A L S
Integration as native citizen, strong emphasis on knowledge modelling, schemas, ontologies
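A minimal sketch of the "property/value pairs and typed links" idea, using plain Python tuples as triples. The `odx:`, `odxc:`, `odxcc:` and `odxrt:` prefixes are the ones used in the example queries of this deck; the concept identifiers (`odxc:prot1`, `odxc:react1`) and the name value are invented for illustration.

```python
# Each statement is a (subject, predicate, object) triple.
# Identifiers such as odxc:prot1 are hypothetical, for illustration only.
triples = [
    ("odxc:prot1", "rdf:type", "odxcc:Protein"),           # typed node
    ("odxc:prot1", "odx:conceptName", "example protein"),  # property/value pair
    ("odxc:react1", "rdf:type", "odxcc:Reaction"),
    ("odxc:prot1", "odxrt:cs_by", "odxc:react1"),          # typed link
]


def values(subject, predicate):
    """Return all objects stated for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects_of_type(rdf_class):
    """Return all subjects declared as instances of rdf_class."""
    return [s for s, p, o in triples if p == "rdf:type" and o == rdf_class]
```

Everything, including the type of a node, is just another triple, which is what makes the model fine-grained and easy to merge across sources.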
D A T A M O D E L : O N D E X I N R D F
E X A M P L E Q U E R I E S
Count concepts (classes) in Trait Ontology:
select (count(distinct ?c) as ?count) WHERE {
?c a odxcc:TO_TERM.
}
Parts of membrane (transitively):
select distinct ?csup ?supName ?c ?name
WHERE {
?csup odx:conceptName ?supName.
FILTER ( ?supName = "cellular membrane" )
?c odxrt:part_of* ?csup.
?c odx:conceptName ?name.
}
LIMIT 1000
Proteins related to pathways:
select distinct ?prot ?pway {
?prot odxrt:pd_by|odxrt:cs_by ?react;
a odxcc:Protein.
?react a odxcc:Reaction.
?react odxrt:part_of ?pway.
?pway a odxcc:Path.
}
LIMIT 1000
Callouts: triple patterns listed in an optimised order; '|' expresses alternatives in property paths
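To make the `part_of*` path in the "Parts of membrane" query concrete, here is a small sketch of what a zero-or-more transitive path computes, written as a breadth-first walk over an in-memory edge list. The edge data is invented (only "cellular membrane" comes from the query), and this is not how a triple store evaluates the path internally.

```python
from collections import deque

# Hypothetical part_of edges (child -> parents); names other than
# "cellular membrane" are invented for illustration.
PART_OF = {
    "membrane raft": ["cellular membrane"],
    "raft protein complex": ["membrane raft"],
    "nucleus": ["cell"],
}


def parts_of(target, part_of=PART_OF):
    """Return every term that reaches `target` via zero or more part_of steps."""
    # Invert the edges so we can walk from the target down to its parts.
    children = {}
    for child, parents in part_of.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    found, queue = {target}, deque([target])  # '*' includes the zero-length path
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found
```

Note that the result contains the target itself, because `*` also matches the zero-length path; `+` would exclude it.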
E X A M P L E Q U E R I E S
Proteins related to pathways:
prefix odx: <http://ondex.sourceforge.net/ondex-core#>
prefix odxcc: <http://www.ondex.org/ex/conceptClass/>
prefix odxc: <http://www.ondex.org/ex/concept/>
prefix odxrt: <http://www.ondex.org/ex/relationType/>
prefix odxr: <http://www.ondex.org/ex/relation/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?prot ?pway
where {
{
# Branch 1
?prot odxrt:pd_by|odxrt:cs_by ?react.
?prot a odxcc:Protein.
?react a odxcc:Reaction.
?react odxrt:part_of ?pway.
?pway a odxcc:Path.
}
union {
# Branch 2
?prot ^odxrt:ac_by|odxrt:is_a ?enz.
?prot a odxcc:Protein.
?enz a odxcc:Enzyme.
{
# Branch 2.1
?enz odxrt:ac_by|odxrt:in_by ?comp.
?comp a odxcc:Compound.
?comp odxrt:cs_by|odxrt:pd_by ?trns.
?trns a odxcc:Transport.
}
union {
# Branch 2.2
?enz ^odxrt:ca_by ?trns.
?trns a odxcc:Transport.
}
?trns odxrt:part_of ?pway.
?pway a odxcc:Path.
}
}
LIMIT 1000
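The semantics of the branching query can be sketched in plain Python: each branch yields a set of (protein, pathway) pairs, `|` is a choice of relation types, `^` walks an edge backwards, and `union` plus `distinct` is a set union. The toy edges and node names are invented, and Branch 2.1 is left out for brevity, so this is only an illustration of the query logic, not of the real data.

```python
# Toy typed edges: (source, relation, target). All node names are hypothetical.
EDGES = {
    ("p1", "cs_by", "r1"), ("r1", "part_of", "w1"),   # Branch 1 data
    ("e1", "ac_by", "p2"), ("t1", "ca_by", "e1"),     # Branch 2.2 data
    ("t1", "part_of", "w2"),
}
TYPES = {"p1": "Protein", "p2": "Protein", "r1": "Reaction",
         "e1": "Enzyme", "t1": "Transport", "w1": "Path", "w2": "Path"}


def step(node, relations, inverse=False):
    """Follow any of `relations` from `node` (the '|' operator); '^' is inverse."""
    if inverse:
        return {s for s, r, t in EDGES if t == node and r in relations}
    return {t for s, r, t in EDGES if s == node and r in relations}


def of_type(nodes, type_name):
    """Keep only the nodes whose concept class is `type_name`."""
    return {n for n in nodes if TYPES.get(n) == type_name}


def branch1(prot):
    reacts = of_type(step(prot, {"pd_by", "cs_by"}), "Reaction")
    return {w for r in reacts for w in of_type(step(r, {"part_of"}), "Path")}


def branch2_2(prot):
    # ?prot ^ac_by|is_a ?enz : inverse ac_by, or forward is_a
    enzs = of_type(step(prot, {"ac_by"}, inverse=True) | step(prot, {"is_a"}), "Enzyme")
    trns = {t for e in enzs
            for t in of_type(step(e, {"ca_by"}, inverse=True), "Transport")}
    return {w for t in trns for w in of_type(step(t, {"part_of"}), "Path")}


def proteins_to_pathways():
    """UNION of the branches; using a set gives the DISTINCT semantics."""
    prots = of_type(TYPES, "Protein")
    return {(p, w) for p in prots for w in branch1(p) | branch2_2(p)}
```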
R D F P E R F O R M A N C E
Simple, common queries (Fuseki)
R D F P E R F O R M A N C E
Queries over ONDEX paths (Fuseki)
R D F P E R F O R M A N C E
Queries over ONDEX paths, Virtuoso
N E O 4 J
N E O 4 J E S S E N T I A L S
• Designed as a backend for applications
• Much less about standards or Web-based sharing
• Very little to manage schemas (more later)
• No native data format (except Cypher; support for GraphML, RDF)
• Initially based on API only, now Cypher available
• Compact, easy, no URIs (though URIs can be used as strings)
• Very performant
• Offers little for clustering/federation, but Cypher can be used in TinkerPop
• More commercial (not necessarily good)
• Cool management interface
• Probably easier to use for the average Java developer
Image credits: https://goo.gl/YLhCXG
N E O 4 J D A T A M O D E L
Both nodes and relations can have attributes
Nodes & relations have labels
(i.e., string-based types)
Cool management interface
(SPARQL version might be a student project)
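A rough sketch of the property-graph model described here, as Python dataclasses: nodes carry string labels and attributes, and relations carry a string type and attributes of their own. The class and field names are my own choice; `P53`, `apoptosis`, `participates_in` and `pvalue` are borrowed from the Cypher examples in this deck, while the value `0.01` is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                       # string-based types, e.g. {"Protein"}
    properties: dict = field(default_factory=dict)


@dataclass
class Relation:
    start: Node
    type: str                         # string-based relation type
    end: Node
    properties: dict = field(default_factory=dict)


# Hypothetical example data
prot = Node({"Protein"}, {"name": "P53"})
pway = Node({"Path"}, {"title": "apoptosis"})
rel = Relation(prot, "participates_in", pway, {"pvalue": 0.01})
```

The attribute on `rel` is the point of contrast with RDF, where a plain triple has nowhere to hold it without reification.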
C Y P H E R Q U E R Y / D M L A N G U A G E
Proteins->Reactions->Pathways:
// chain of paths, node selection via property (exploits indices)
MATCH (prot:Protein) - [csby:consumed_by] -> (:Reaction) - [:part_of] -> (pway:Path{ title: 'apoptosis' })
// further conditions, but often not performant
WHERE prot.name =~ '(?i)^DNA.+'
// Usual projection and post-selection operators
RETURN prot.name, pway
// Relations can have properties
ORDER BY csby.pvalue
LIMIT 1000
Single-path (or same-direction branching) easy to write:
MATCH (prot:Protein) - [:pd_by|cs_by] -> (:Reaction) - [:part_of*1..3] -> (pway:Path)
RETURN ID(prot), ID(pway) LIMIT 1000
// Very compact forms available, depending on the data
MATCH (prot:Protein) -- (pway:Path) RETURN pway
C Y P H E R Q U E R Y / D M L A N G U A G E
DML features:
MATCH (prot:Protein{ name:'P53' }), (pway:Path{ title:'apoptosis' })
CREATE (prot) - [:participates_in] -> (pway)
DML features, embeddable in Java/Python/etc:
UNWIND $rows AS row // $rows set by the invoker, programmatically
MATCH (prot:Protein{ id: row.protId }), (pway:Path{ id:row.pathId })
CREATE (prot) - [relation:participates_in] -> (pway)
SET relation = row.relationAttributes
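A sketch of how the `$rows` parameter might be set programmatically from Python. The batching helper is plain Python; `load` uses the official `neo4j` Python driver, which is imported lazily so the helper can be used without it. The connection URI, credentials and batch size are placeholders, not values from the deck.

```python
UNWIND_QUERY = """
UNWIND $rows AS row
MATCH (prot:Protein { id: row.protId }), (pway:Path { id: row.pathId })
CREATE (prot) - [relation:participates_in] -> (pway)
SET relation = row.relationAttributes
"""


def batches(rows, size):
    """Split rows into lists of at most `size` items, one per UNWIND call."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def load(rows, uri="bolt://localhost:7687", auth=("neo4j", "password"), size=1000):
    """Send each batch as the $rows parameter. Connection details are placeholders."""
    from neo4j import GraphDatabase  # official Python driver, imported lazily
    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session() as session:
            for batch in batches(rows, size):
                session.run(UNWIND_QUERY, rows=batch).consume()
```

Each row is expected to be a dictionary with `protId`, `pathId` and `relationAttributes` keys, matching the names used in the query text.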
C Y P H E R / N E O 4 J P E R F O R M A N C E
Simple, common queries
C Y P H E R / N E O 4 J P E R F O R M A N C E
Path Queries
S O U N D S G O O D , B U T …
select distinct ?prot ?pway
where {
{
# Branch 1
…
}
union {
# Branch 2
…
{
# Branch 2.1
}
union {
# Branch 2.2
}
…
}
}
• In Cypher?!
• I couldn't find a decent way, although it might be possible (https://goo.gl/Rpa9SM)
• Partially possible in a straightforward way, but redundantly, e.g., Branch 2 (via 2.2):
MATCH (prot:Protein) <- [:ac_by] - (:Enzyme) <- [:ca_by] - (:Transport) - [:part_of] -> (pway:Path)
RETURN prot, pway LIMIT 100
UNION
MATCH (prot:Protein) - [:is_a] -> (:Enzyme) <- [:ca_by] - (:Transport) - [:part_of] -> (pway:Path)
RETURN prot, pway LIMIT 100
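The redundancy can be made visible by generating the UNION text: every combination of alternative segments needs its own full MATCH clause. The sketch below assumes the relation directions of the SPARQL version of Branch 2.2 and introduces an `enz` variable to join the two fragments; the helper name `union_query` is hypothetical.

```python
from itertools import product

# Alternative segments of Branch 2 (via 2.2), written as Cypher pattern fragments.
HEADS = ["(prot:Protein) <- [:ac_by] - (enz:Enzyme)",
         "(prot:Protein) - [:is_a] -> (enz:Enzyme)"]
TAILS = ["(enz) <- [:ca_by] - (:Transport) - [:part_of] -> (pway:Path)"]


def union_query(heads, tails, limit=100):
    """Emit one MATCH per head/tail combination, joined by UNION."""
    parts = [f"MATCH {h}, {t}\nRETURN prot, pway LIMIT {limit}"
             for h, t in product(heads, tails)]
    return "\nUNION\n".join(parts)
```

With two heads and one tail this yields two MATCH clauses; adding Branch 2.1 as a second tail would already give four, which is the growth the slide calls redundant.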
A D D E N D U M
select distinct ?prot ?pway
where {
{
# Branch 1
…
}
union {
# Branch 2
…
{
# Branch 2.1
}
union {
# Branch 2.2
}
…
}
}
• In Cypher?!
Unions+branches partially possible by means of paths in WHERE:
// Branch 2
MATCH (prot:Protein), (enz:Enzyme), (tns:Transport) - [:part_of] -> (path:Path)
WHERE ( (enz) - [:ac_by|:in_by] -> (:Comp) - [:pd_by|:cs_by] -> (tns) // Branch 2.1
OR (tns) - [:ca_by] -> (enz) ) // Branch 2.2
AND ( (prot) - [:is_a] -> (enz) OR (prot) <- [:ac_by] - (enz) ) // Branch 2, protein-enzyme link
RETURN prot, path LIMIT 30
UNION
// Branch 1
MATCH (prot:Protein) - [:pd_by|:cs_by] -> (:Reaction) - [:part_of] -> (path:Path)
RETURN prot, path LIMIT 30
• However:
• It took 41249 ms to execute against the wheat net
• It generates cartesian products and can easily explode
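A back-of-the-envelope sketch of why the comma-separated MATCH can explode: patterns that share no relationship are combined as a cartesian product before the WHERE clause filters them, so the number of candidate rows is bounded by the product of the individual match counts. The counts below are invented for illustration and are not measurements from the test data.

```python
from math import prod


def candidate_rows(match_counts):
    """Upper bound on candidate rows for disconnected MATCH patterns."""
    return prod(match_counts.values())


# Hypothetical counts, for illustration only (not measured on the test data).
counts = {"Protein": 1_000, "Enzyme": 200, "Transport-part_of-Path": 50}
```

With these made-up numbers `candidate_rows(counts)` is ten million rows to filter, even though each individual pattern is small.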
S O U N D S G O O D , B U T …
• What about schemas/metadata/ontologies?
• Nodes can have multiple labels and relations a single type, and both are just strings. Rich schema operations are not so easy:
• Select any kind of protein, including enzymes, cytokines
• Select any type of 'interacts with', including 'catalysed by', 'consumed by', 'produced by' (might require 'inverse of')
• Basically, it takes a relational-oriented view of schemas
S O U N D S G O O D , B U T …
• Basically, it's relational-oriented about schemas
• We might still be OK with metadata modelled as graphs, however:
• MATCH (molecule:Molecule),
(molType:Class)-[:is_a*]->(:Class{ name:'Protein' })
WHERE molType.label IN LABELS(molecule)
• It's expensive to compute (doesn't exploit indexes)
• MATCH (molecule:Molecule:$additionalLabel) CREATE …
• Parameterising on labels is not possible
• Requires a non-parametric Cypher string => UNWIND-based bulk loading impossible
• => bad performance
• Programmatic approach possible, but it brings a lot of problems with things like Lucene version mismatches (one reason being that ONDEX would require a review and a proper plug-in architecture)
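A sketch of the non-parametric string building the slide refers to: because a label cannot be passed as a `$parameter`, it has to be spliced into the query text, so rows with different labels cannot share a single UNWIND statement. The function names and the `label`/`props` row layout are hypothetical; the label is validated before interpolation to avoid injecting arbitrary Cypher.

```python
import re

_LABEL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_query(extra_label):
    """Build the query text for one label; the label is validated, not parameterised."""
    if not _LABEL.fullmatch(extra_label):
        raise ValueError(f"unsafe label: {extra_label!r}")
    return f"CREATE (m:Molecule:{extra_label}) SET m = $props"


def queries_for(rows):
    """Pair each row with its own query text, since labels differ per row."""
    return [(create_query(r["label"]), {"props": r["props"]}) for r in rows]
```

Only the properties travel as parameters; the query text itself changes with every distinct label, which is what defeats a single bulk statement.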
F L A T , R D F - L I K E M O D E L
Code for both converters:
github:/marco-brandizi/odx_neo4j_converter_test
F L A T M O D E L I M P A C T O N C Y P H E R
Structured model:
MATCH (prot:Protein{ id: '250169' }) - [:cs_by] -> (react:Reaction) - [:part_of] -> (pway:Path)
RETURN * LIMIT 100
Flat model:
MATCH (prot:Concept {id: '250169', ccName: 'Protein'})
<- [:from] - (csby:Relation {name: 'cs_by' })
- [:to] -> (react:Concept { ccName: 'Reaction'})
<- [:from] - (partof:Relation {name:'part_of'}) - [:to]
-> (pway:Concept {ccName:'Path'})
RETURN * LIMIT 100
Rich schema-based queries
MATCH (mol:Concept) <- [:conceptClass] - (cc:ConceptClass),
(cc) <- [:specializationOf*] - (:ConceptClass{name:'Protein'})
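The step from the structured to the flat model can be sketched as a small conversion: every relation becomes a node of its own, connected to its endpoints through `from` and `to` links, as in the flat-model query. This is an illustration only, not the code of the converters mentioned earlier, and the `relN` identifier scheme is made up.

```python
from itertools import count


def flatten(edges):
    """Turn (source, relation_name, target) edges into the flat, RDF-like form.

    Each relation becomes a node of its own, linked to its endpoints by
    'from' and 'to' links.
    """
    ids = count(1)
    relation_nodes, links = [], []
    for source, name, target in edges:
        rel_id = f"rel{next(ids)}"  # hypothetical identifier scheme
        relation_nodes.append({"id": rel_id, "label": "Relation", "name": name})
        links.append((rel_id, "from", source))
        links.append((rel_id, "to", target))
    return relation_nodes, links
```

One structured edge turns into one extra node and two links, which is why the flat Cypher pattern is roughly twice as long as the structured one.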
F L A T M O D E L P E R F O R M A N C E
Simple, common queries
F L A T M O D E L P E R F O R M A N C E
Typical ONDEX Graph Queries
I M P A C T O N C Y P H E R
Rich schema-based queries
From:
MATCH (molecule:Molecule), (molType:Class)-[:is_a*]->(:Class{ name:'Protein' })
WHERE molType.label IN LABELS(molecule)
To:
MATCH (mol:Concept) <- [:conceptClass] - (cc:ConceptClass),
(cc) <- [:specializationOf*] - (:ConceptClass{name:'Protein'})
Now it's efficient enough (especially with length restrictions)
However…
From: MATCH (react:Reaction) - [:part_of] -> (pway:Path)
To: MATCH (react:Concept {ccName: 'Reaction'})
<- [:from] - (partof:Relation {name:'part_of'})
- [:to] -> (pway:Concept {ccName:'Path'})
What if we want variable-length part_of?
Not currently possible in Cypher (nor in SPARQL); maybe in future (https://github.com/neo4j/neo4j/issues/88)
=> Having both models, redundantly, would probably be worthwhile
=> That makes it not so different from RDF
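To show what a variable-length `part_of` means over the flat model, here is a sketch of the traversal that application code would have to do itself: each hop goes through a Relation node (a `from` link followed by a `to` link) and is repeated until nothing new is reached. This is one possible programmatic fallback under my own assumptions about the data layout, not something the deck proposes.

```python
def reachable(start, links, relation_names, name="part_of"):
    """Nodes reachable from `start` through one or more `name` relations (flat model)."""
    # Index the flat links: relation id -> source node / target node.
    src = {rel: node for rel, kind, node in links if kind == "from"}
    dst = {rel: node for rel, kind, node in links if kind == "to"}
    found, frontier = set(), {start}
    while frontier:
        nxt = {dst[rel] for rel, node in src.items()
               if node in frontier and relation_names[rel] == name and rel in dst}
        frontier = nxt - found  # only expand nodes not seen before
        found |= frontier
    return found
```

Here `links` is a list of `(relation_id, "from" | "to", node)` tuples and `relation_names` maps each relation id to its name; both layouts are hypothetical.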
O T H E R I S S U E S
• Data Exchange format?
• None, except Cypher
• DML not so performant
• In particular, no standard data exchange format
• Could be combined with RDF
• Is Neo4j Open Source?
• Produced by a company, only the Community Edition is OSS
• OpenCypher is available
• Cypher backed by Gremlin/TinkerPop
• Apache project, more reliable from an OSS perspective
• Performance comparable with Neo4j (https://goo.gl/NK1tn2)
• More choice of implementations
• Alternative QL (Gremlin), but more complicated IMHO (Cypher supported)
Image credits: https://goo.gl/ysBFF2
C O N C L U S I O N S
Criterion | Neo4J/GraphDBs | Virtuoso/Triple Stores
Data exchange format | - | +
Data model | + Relations with properties; - Metadata management | - Relations cannot have properties (req. reification); + Metadata as first citizen
Performance | + | - (comparable)
QL | + Easier (e.g., compact, omissions)?; - Expressivity for some patterns (unions, DML) | - Harder? (URIs, namespaces, verbosity); + More expressive
Standardisation, openness | - | +
Scalability, big data | - TinkerPop probably better | LB/Cluster solutions; over TinkerPop (via SAIL implementation)
W H Y ?
• Graph + APIs
• Clearer architecture, open to more applications, not only kNetMiner
• QL makes it easier to develop further components/analyses/applications
• Standard Data model and format
• Don’t reinvent the wheel
• Data sharing
• Data and app integration
C O N C L U S I O N S

More Related Content

What's hot

Introduction to Haskell: 2011-04-13
Introduction to Haskell: 2011-04-13Introduction to Haskell: 2011-04-13
Introduction to Haskell: 2011-04-13Jay Coskey
 
Python Interview Questions | Python Interview Questions And Answers | Python ...
Python Interview Questions | Python Interview Questions And Answers | Python ...Python Interview Questions | Python Interview Questions And Answers | Python ...
Python Interview Questions | Python Interview Questions And Answers | Python ...Simplilearn
 
Data translation with SPARQL 1.1
Data translation with SPARQL 1.1Data translation with SPARQL 1.1
Data translation with SPARQL 1.1andreas_schultz
 
Babar: Knowledge Recognition, Extraction and Representation
Babar: Knowledge Recognition, Extraction and RepresentationBabar: Knowledge Recognition, Extraction and Representation
Babar: Knowledge Recognition, Extraction and RepresentationPierre de Lacaze
 
Tackling repetitive tasks with serial or parallel programming in R
Tackling repetitive tasks with serial or parallel programming in RTackling repetitive tasks with serial or parallel programming in R
Tackling repetitive tasks with serial or parallel programming in RLun-Hsien Chang
 
Lambdas And Streams Hands On Lab, JavaOne 2014
Lambdas And Streams Hands On Lab, JavaOne 2014Lambdas And Streams Hands On Lab, JavaOne 2014
Lambdas And Streams Hands On Lab, JavaOne 2014Simon Ritter
 
JavaParser - A tool to generate, analyze and refactor Java code
JavaParser - A tool to generate, analyze and refactor Java codeJavaParser - A tool to generate, analyze and refactor Java code
JavaParser - A tool to generate, analyze and refactor Java codeFederico Tomassetti
 
Manipulating string data with a pattern in R
Manipulating string data with  a pattern in RManipulating string data with  a pattern in R
Manipulating string data with a pattern in RLun-Hsien Chang
 
SQL Server Select Topics
SQL Server Select TopicsSQL Server Select Topics
SQL Server Select TopicsJay Coskey
 
Python 3.6 Features 20161207
Python 3.6 Features 20161207Python 3.6 Features 20161207
Python 3.6 Features 20161207Jay Coskey
 
Java Input Output (java.io.*)
Java Input Output (java.io.*)Java Input Output (java.io.*)
Java Input Output (java.io.*)Om Ganesh
 
Lambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon Ritter
Lambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon RitterLambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon Ritter
Lambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon RitterJAXLondon2014
 
Lambdas And Streams Hands On Lab
Lambdas And Streams Hands On LabLambdas And Streams Hands On Lab
Lambdas And Streams Hands On LabSimon Ritter
 
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdfDatabase & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdfInSync2011
 

What's hot (17)

Introduction to Haskell: 2011-04-13
Introduction to Haskell: 2011-04-13Introduction to Haskell: 2011-04-13
Introduction to Haskell: 2011-04-13
 
Python Interview Questions | Python Interview Questions And Answers | Python ...
Python Interview Questions | Python Interview Questions And Answers | Python ...Python Interview Questions | Python Interview Questions And Answers | Python ...
Python Interview Questions | Python Interview Questions And Answers | Python ...
 
Data translation with SPARQL 1.1
Data translation with SPARQL 1.1Data translation with SPARQL 1.1
Data translation with SPARQL 1.1
 
Babar: Knowledge Recognition, Extraction and Representation
Babar: Knowledge Recognition, Extraction and RepresentationBabar: Knowledge Recognition, Extraction and Representation
Babar: Knowledge Recognition, Extraction and Representation
 
XML and XPath details
XML and XPath detailsXML and XPath details
XML and XPath details
 
Tackling repetitive tasks with serial or parallel programming in R
Tackling repetitive tasks with serial or parallel programming in RTackling repetitive tasks with serial or parallel programming in R
Tackling repetitive tasks with serial or parallel programming in R
 
Lambdas And Streams Hands On Lab, JavaOne 2014
Lambdas And Streams Hands On Lab, JavaOne 2014Lambdas And Streams Hands On Lab, JavaOne 2014
Lambdas And Streams Hands On Lab, JavaOne 2014
 
JavaParser - A tool to generate, analyze and refactor Java code
JavaParser - A tool to generate, analyze and refactor Java codeJavaParser - A tool to generate, analyze and refactor Java code
JavaParser - A tool to generate, analyze and refactor Java code
 
Manipulating string data with a pattern in R
Manipulating string data with  a pattern in RManipulating string data with  a pattern in R
Manipulating string data with a pattern in R
 
SQL Server Select Topics
SQL Server Select TopicsSQL Server Select Topics
SQL Server Select Topics
 
Lz77 by ayush
Lz77 by ayushLz77 by ayush
Lz77 by ayush
 
Java stream
Java streamJava stream
Java stream
 
Python 3.6 Features 20161207
Python 3.6 Features 20161207Python 3.6 Features 20161207
Python 3.6 Features 20161207
 
Java Input Output (java.io.*)
Java Input Output (java.io.*)Java Input Output (java.io.*)
Java Input Output (java.io.*)
 
Lambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon Ritter
Lambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon RitterLambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon Ritter
Lambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon Ritter
 
Lambdas And Streams Hands On Lab
Lambdas And Streams Hands On LabLambdas And Streams Hands On Lab
Lambdas And Streams Hands On Lab
 
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdfDatabase & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
 

Similar to A Preliminary survey of RDF/Neo4j as backends for KnetMiner

Open-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitOpen-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitGreg Landrum
 
Scalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMScalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMfnothaft
 
Neo4j_Cypher.pdf
Neo4j_Cypher.pdfNeo4j_Cypher.pdf
Neo4j_Cypher.pdfJaberRad1
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandrarantav
 
Design for Scalability in ADAM
Design for Scalability in ADAMDesign for Scalability in ADAM
Design for Scalability in ADAMfnothaft
 
Protein threading using context specific alignment potential ismb-2013
Protein threading using context specific alignment potential ismb-2013Protein threading using context specific alignment potential ismb-2013
Protein threading using context specific alignment potential ismb-2013Sheng Wang
 
Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...
Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...
Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...Mariano Rodriguez-Muro
 
PGQL: A Language for Graphs
PGQL: A Language for GraphsPGQL: A Language for Graphs
PGQL: A Language for GraphsJean Ihm
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistrybaoilleach
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistryguest5929fa7
 
Rdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimationRdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimationINRIA-OAK
 
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Rothamsted Research, UK
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeNational Institute of Informatics
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRathachai Chawuthai
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMfnothaft
 
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Databricks
 
Creating the PromQL Transpiler for Flux by Julius Volz, Co-Founder | Prometheus
Creating the PromQL Transpiler for Flux by Julius Volz, Co-Founder | PrometheusCreating the PromQL Transpiler for Flux by Julius Volz, Co-Founder | Prometheus
Creating the PromQL Transpiler for Flux by Julius Volz, Co-Founder | PrometheusInfluxData
 

Similar to A Preliminary survey of RDF/Neo4j as backends for KnetMiner (20)

Open-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitOpen-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKit
 
Scalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMScalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAM
 
BioSD Tutorial 2014 Editition
BioSD Tutorial 2014 EdititionBioSD Tutorial 2014 Editition
BioSD Tutorial 2014 Editition
 
Neo4j_Cypher.pdf
Neo4j_Cypher.pdfNeo4j_Cypher.pdf
Neo4j_Cypher.pdf
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandra
 
Design for Scalability in ADAM
Design for Scalability in ADAMDesign for Scalability in ADAM
Design for Scalability in ADAM
 
User biglm
User biglmUser biglm
User biglm
 
Protein threading using context specific alignment potential ismb-2013
Protein threading using context specific alignment potential ismb-2013Protein threading using context specific alignment potential ismb-2013
Protein threading using context specific alignment potential ismb-2013
 
Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...
Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...
Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...
 
PGQL: A Language for Graphs
PGQL: A Language for GraphsPGQL: A Language for Graphs
PGQL: A Language for Graphs
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistry
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistry
 
Rdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimationRdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimation
 
Knetminer Backend Training, Nov 2018
Knetminer Backend Training, Nov 2018Knetminer Backend Training, Nov 2018
Knetminer Backend Training, Nov 2018
 
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
 
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
 
Creating the PromQL Transpiler for Flux by Julius Volz, Co-Founder | Prometheus
Creating the PromQL Transpiler for Flux by Julius Volz, Co-Founder | PrometheusCreating the PromQL Transpiler for Flux by Julius Volz, Co-Founder | Prometheus
Creating the PromQL Transpiler for Flux by Julius Volz, Co-Founder | Prometheus
 

More from Rothamsted Research, UK

FAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use CaseFAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use CaseRothamsted Research, UK
 
Interoperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use CasesInteroperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use CasesRothamsted Research, UK
 
AgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with BioschemasAgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with BioschemasRothamsted Research, UK
 
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
Publishing and Consuming FAIR DataA Case in the Agri-Food DomainPublishing and Consuming FAIR DataA Case in the Agri-Food Domain
Publishing and Consuming FAIR Data A Case in the Agri-Food DomainRothamsted Research, UK
 
AgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use CasesAgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use CasesRothamsted Research, UK
 
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...Rothamsted Research, UK
 
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...Rothamsted Research, UK
 
graph2tab, a library to convert experimental workflow graphs into tabular for...
graph2tab, a library to convert experimental workflow graphs into tabular for...graph2tab, a library to convert experimental workflow graphs into tabular for...
graph2tab, a library to convert experimental workflow graphs into tabular for...Rothamsted Research, UK
 
myEquivalents, aka a new cross-reference service
myEquivalents, aka a new cross-reference servicemyEquivalents, aka a new cross-reference service
myEquivalents, aka a new cross-reference serviceRothamsted Research, UK
 
BioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS TutorialBioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS TutorialRothamsted Research, UK
 

More from Rothamsted Research, UK (20)

FAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use CaseFAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use Case
 
Interoperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use CasesInteroperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use Cases
 
AgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with BioschemasAgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with Bioschemas
 
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
Publishing and Consuming FAIR DataA Case in the Agri-Food DomainPublishing and Consuming FAIR DataA Case in the Agri-Food Domain
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
 
Continuos Integration @Knetminer
Continuos Integration @KnetminerContinuos Integration @Knetminer
Continuos Integration @Knetminer
 
Better Data for a Better World
Better Data for a Better WorldBetter Data for a Better World
Better Data for a Better World
 
AgriSchemas Progress Report
AgriSchemas Progress ReportAgriSchemas Progress Report
AgriSchemas Progress Report
 
AgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use CasesAgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use Cases
 
Notes about SWAT4LS 2018
Notes about SWAT4LS 2018Notes about SWAT4LS 2018
Notes about SWAT4LS 2018
 
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
 
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
 
graph2tab, a library to convert experimental workflow graphs into tabular for...
graph2tab, a library to convert experimental workflow graphs into tabular for...graph2tab, a library to convert experimental workflow graphs into tabular for...
graph2tab, a library to convert experimental workflow graphs into tabular for...
 
Interoperable Open Data: Which Recipes?
Interoperable Open Data: Which Recipes?Interoperable Open Data: Which Recipes?
Interoperable Open Data: Which Recipes?
 
Linked Data with the EBI RDF Platform
Linked Data with the EBI RDF PlatformLinked Data with the EBI RDF Platform
Linked Data with the EBI RDF Platform
 
BioSD Linked Data: Lessons Learned
BioSD Linked Data: Lessons LearnedBioSD Linked Data: Lessons Learned
BioSD Linked Data: Lessons Learned
 
myEquivalents, aka a new cross-reference service
myEquivalents, aka a new cross-reference servicemyEquivalents, aka a new cross-reference service
myEquivalents, aka a new cross-reference service
 
Dev 2014 LOD tutorial
Dev 2014 LOD tutorialDev 2014 LOD tutorial
Dev 2014 LOD tutorial
 
BioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS TutorialBioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS Tutorial
 
Semic 2013
Semic 2013Semic 2013
Semic 2013
 
Uk onto net_2013_notes_brandizi
Uk onto net_2013_notes_brandiziUk onto net_2013_notes_brandizi
Uk onto net_2013_notes_brandizi
 



A Preliminary survey of RDF/Neo4j as backends for KnetMiner

  • 1. ONDEX & GRAPH DBS. Marco Brandizi, 16/10/2017
  • 2. GOALS
      • Evaluate graph databases (GDBs), frameworks, etc. in relation to ONDEX needs
      • Assess GDBs as kNetMiner/ONDEX backends
      • Evaluate a new architecture where raw data access is entirely based on a GDB
      • Evaluate a new data exchange format, possibly integrated with one of the GDBs
      • and hence, evaluate the data models too
      • Assess data query/manipulation languages (expressivity, ease of use, speed)
      • Assess whether performance fits ONDEX needs
  • 3. TEST DATA
      Trait Ontology (TO): 1500 nodes, is-a and part-of relations (i.e., mostly a tree)
      Gene Ontology (GO): tree with 46k nodes
      AraCyc/BioPAX: heterogeneous net, 23k nodes, 40k relations
      Ara-kNet: heterogeneous net, 350k nodes, 1.150M relations
  • 4. TEST SETTINGS (RDF)
  • 5. TEST SETTINGS (NEO4J)
  • 7. RDF/LINKED DATA ESSENTIALS
      • Simple, fine-grained data model: property/value pairs and typed links
      • Designed for data integration:
          • Universal identifiers, W3C standards
          • Strong (even too much) emphasis on knowledge modelling via schemas/ontologies
      • Designed for the Web: resolvable URIs, Web APIs
  • 8. RDF/LINKED DATA ESSENTIALS
      Integration as a native citizen, strong emphasis on knowledge modelling, schemas, ontologies
  • 9. DATA MODEL: ONDEX IN RDF
  • 10. EXAMPLE QUERIES
      Count concepts (classes) in Trait Ontology:
        select (count(distinct ?c) as ?n) WHERE {
          ?c a odxcc:TO_TERM.
        }
      Parts of membrane (transitively):
        select distinct ?csup ?supName ?c ?name
        WHERE {
          ?csup odx:conceptName ?supName.
          FILTER ( ?supName = "cellular membrane" )
          ?c odxrt:part_of* ?csup.
          ?c odx:conceptName ?name.
        }
        LIMIT 1000
      Proteins related to pathways:
        select distinct ?prot ?pway {
          ?prot odxrt:pd_by|odxrt:cs_by ?react;
            a odxcc:Protein.
          ?react a odxcc:Reaction.
          ?react odxrt:part_of ?pway.
          ?pway a odxcc:Path.
        }
        LIMIT 1000
      Slide annotations: optimised order; ‘|’ for property paths
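      What the transitive path `odxrt:part_of*` in the second query computes can be sketched in plain Python, as a rough illustration only. The triples below are hypothetical toy data (not taken from the test datasets), and `parts_of` is a name made up for this sketch; a triple store would evaluate the path itself.

```python
from collections import deque

# Hypothetical toy data: (child, "part_of", parent) triples, for illustration only.
TRIPLES = [
    ("lipid bilayer", "part_of", "cellular membrane"),
    ("membrane raft", "part_of", "lipid bilayer"),
    ("nucleus", "part_of", "cell"),
]


def parts_of(target, triples, relation="part_of"):
    """Return every node that reaches `target` via zero or more `relation` steps,
    mirroring the SPARQL path `?c odxrt:part_of* ?csup`."""
    # Index incoming edges: parent -> direct children.
    children = {}
    for subj, pred, obj in triples:
        if pred == relation:
            children.setdefault(obj, []).append(subj)

    seen = {target}  # '*' includes the zero-length path, so the target matches itself
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


print(sorted(parts_of("cellular membrane", TRIPLES)))
# ['cellular membrane', 'lipid bilayer', 'membrane raft']
```

      Using `part_of+` instead of `part_of*` would correspond to dropping the target itself from the result.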
  • 11. EXAMPLE QUERIES
      Proteins related to pathways:
        prefix odx: <http://ondex.sourceforge.net/ondex-core#>
        prefix odxcc: <http://www.ondex.org/ex/conceptClass/>
        prefix odxc: <http://www.ondex.org/ex/concept/>
        prefix odxrt: <http://www.ondex.org/ex/relationType/>
        prefix odxr: <http://www.ondex.org/ex/relation/>
        prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        select distinct ?prot ?pway
        where {
          { # Branch 1
            ?prot odxrt:pd_by|odxrt:cs_by ?react.
            ?prot a odxcc:Protein.
            ?react a odxcc:Reaction.
            ?react odxrt:part_of ?pway.
            ?pway a odxcc:Path.
          }
          # to be continued…
          # part 2
          union
          { # Branch 2
            ?prot ^odxrt:ac_by|odxrt:is_a ?enz.
            ?prot a odxcc:Protein.
            ?enz a odxcc:Enzyme.
            { # Branch 2.1
              ?enz odxrt:ac_by|odxrt:in_by ?comp.
              ?comp a odxcc:Compound.
              ?comp odxrt:cs_by|odxrt:pd_by ?trns.
              ?trns a odxcc:Transport.
            }
            union
            { # Branch 2.2
              ?enz ^odxrt:ca_by ?trns.
              ?trns a odxcc:Transport.
            }
            ?trns odxrt:part_of ?pway.
            ?pway a odxcc:Path.
          }
        }
        LIMIT 1000
  • 12. RDF PERFORMANCE: simple, common queries (Fuseki)
  • 13. RDF PERFORMANCE: queries over ONDEX paths (Fuseki)
  • 14. RDF PERFORMANCE: queries over ONDEX paths (Virtuoso)
  • 15. NEO4J
  • 16. NEO4J ESSENTIALS
      • Designed to back applications
      • much less about standards or Web-based sharing
      • Very little for managing schemas (more later)
      • No native data format (except Cypher; support for GraphML, RDF)
      • Initially API-only, Cypher now available
      • Compact, easy, no URIs (they can be used as strings)
      • Very performant
      • Not much for clustering/federation, but Cypher can be used in TinkerPop
      • More commercial (not necessarily good)
      • Cool management interface
      • Probably easier to use for the average Java developer
      Image credits: https://goo.gl/YLhCXG
  • 17. NEO4J DATA MODEL
      Both nodes and relations can have attributes
      Nodes and relations have labels (i.e., string-based types)
      Cool management interface (a SPARQL version might be a student project)
  • 18. CYPHER QUERY/DM LANGUAGE
      Proteins->Reactions->Pathways:
        // chain of paths, node selection via property (exploits indices)
        MATCH (prot:Protein) - [csby:consumed_by] -> (:Reaction) - [:part_of] -> (pway:Path{ title: 'apoptosis' })
        // further conditions, but often not performant
        WHERE prot.name =~ '(?i)^DNA.+'
        // Usual projection and post-selection operators
        RETURN prot.name, pway
        // Relations can have properties
        ORDER BY csby.pvalue
        LIMIT 1000
      Single-path (or same-direction branching) easy to write:
        MATCH (prot:Protein) - [:pd_by|cs_by] -> (:Reaction) - [:part_of*1..3] -> (pway:Path)
        RETURN ID(prot), ID(pway) LIMIT 1000
        // Very compact forms available, depending on the data
        MATCH (prot:Protein) -- (pway:Path) RETURN pway
  • 19. CYPHER QUERY/DM LANGUAGE
      DML features:
        MATCH (prot:Protein{ name:'P53' }), (pway:Path{ title:'apoptosis' })
        CREATE (prot) - [:participates_in] -> (pway)
      DML features, embeddable in Java/Python/etc:
        UNWIND $rows AS row // $rows set by the invoker, programmatically
        MATCH (prot:Protein{ id: row.protId }), (pway:Path{ id:row.pathId })
        CREATE (prot) - [relation:participates_in] -> (pway)
        SET relation = row.relationAttributes
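      How an invoker might prepare `$rows` for the UNWIND statement can be sketched as follows. This is only an illustration: `to_rows`, `batches` and the sample links are hypothetical, and the actual driver call is left as a comment because it needs a running Neo4j instance.

```python
# Same statement as on the slide; $rows is supplied by the caller.
BULK_CREATE = """
UNWIND $rows AS row
MATCH (prot:Protein { id: row.protId }), (pway:Path { id: row.pathId })
CREATE (prot) - [relation:participates_in] -> (pway)
SET relation = row.relationAttributes
"""


def to_rows(links):
    """Turn (protein id, pathway id, attributes) tuples into the list of maps bound to $rows."""
    return [
        {"protId": prot_id, "pathId": path_id, "relationAttributes": dict(attrs)}
        for prot_id, path_id, attrs in links
    ]


def batches(rows, size):
    """Yield fixed-size slices, so that each statement carries a bounded number of rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Hypothetical sample links, for illustration only.
links = [("P1", "PW1", {"pvalue": 0.01}), ("P2", "PW1", {"pvalue": 0.2}), ("P3", "PW2", {})]

for batch in batches(to_rows(links), size=2):
    # With the Neo4j Python driver this would be roughly (assumption, not run here):
    #   session.run(BULK_CREATE, rows=batch)
    print(len(batch), "rows in this batch")
```

      The query text stays constant across batches; only the parameter changes, which is what makes this form suitable for bulk loading.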
  • 20. CYPHER/NEO4J PERFORMANCE: simple, common queries
  • 21. CYPHER/NEO4J PERFORMANCE: path queries
  • 22. SOUNDS GOOD, BUT…
        select distinct ?prot ?pway where {
          { # Branch 1 … }
          union
          { # Branch 2 …
            { # Branch 2.1 }
            union
            { # Branch 2.2 }
            …
          }
        }
      • In Cypher?!
      • I couldn’t find a decent way, although it might be possible (https://goo.gl/Rpa9SM)
      • Partially possible in a straightforward way, but redundantly, e.g., Branch 2:
        MATCH (prot:Protein) <- [:ac_by] - (:Enzyme) <- [:ca_by] - (:Transport) - [:part_of] -> (pway:Path)
        RETURN prot, pway LIMIT 100
        UNION
        MATCH (prot:Protein) - [:is_a] -> (:Enzyme) <- [:ca_by] - (:Transport) - [:part_of] -> (pway:Path)
        RETURN prot, pway LIMIT 100
  • 23. ADDENDUM
        select distinct ?prot ?pway where {
          { # Branch 1 … }
          union
          { # Branch 2 …
            { # Branch 2.1 }
            union
            { # Branch 2.2 }
            …
          }
        }
      • In Cypher?! Unions+branches partially possible by means of paths in WHERE:
        // Branch 2
        MATCH (prot:Protein), (enz:Enzyme), (tns:Transport) - [:part_of] -> (path:Path)
        WHERE ( (enz) - [:ac_by|:in_by] -> (:Comp) - [:pd_by|:cs_by] -> (tns) // Branch 2.1
          OR (tns) - [:ca_by] -> (enz) ) // Branch 2.2 (pt1)
          AND ( (prot) - [:is_a] -> (enz) OR (prot) <- [:ac_by] - (enz) ) // Branch 2.2 (pt2)
        RETURN prot, path LIMIT 30
        UNION
        // Branch 1
        MATCH (prot:Protein) - [:pd_by|:cs_by] -> (:Reaction) - [:part_of] -> (path:Path)
        RETURN prot, path LIMIT 30
      • However,
      • 41249 ms to execute against the wheat net.
      • it generates cartesian products and can easily explode
  • 24. SOUNDS GOOD, BUT…
      • What about schemas/metadata/ontologies?
      • Nodes and relations can only have labels attached (possibly multiple), which are just strings. Rich schema operations are not so easy:
      • Select any kind of protein, including enzymes, cytokines
      • Select any type of ‘interacts with’, including ‘catalysed by’, ‘consumed by’, ‘produced by’ (might require ‘inverse of’)
      • Basically, it has a relational-oriented view of schemas
  • 25. SOUNDS GOOD, BUT…
      • Basically, it’s relational-oriented about schemas
      • We might still be OK with metadata modelled as graphs, however:
      • MATCH (molecule:Molecule), (molType:Class)-[:is_a*]->(:Class{ name:'Protein' }) WHERE molType.label IN LABELS(molecule)
      • It’s expensive to compute (doesn’t exploit indexes)
      • MATCH (molecule:Molecule:$additionalLabel) CREATE …
      • Parameterising on labels is not possible
      • Requires a non-parametric Cypher string => UNWIND-based bulk loading impossible
      • => bad performance
      • A programmatic approach is possible, but there are a lot of problems with things like Lucene version mismatches (one reason being that ONDEX would require a review and a proper plug-in architecture)
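      The label limitation described on this slide can be illustrated with a small sketch: since labels cannot be passed as parameters, they end up spliced into the query text, while properties can still go through `$props`. The function name and the validation pattern are assumptions made for this example, not part of ONDEX or of any Neo4j API.

```python
import re

# Conservative pattern for labels we are willing to splice into query text
# (an assumption for this sketch, stricter than what Cypher itself accepts).
_LABEL = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_node_query(labels):
    """Build a CREATE statement whose labels are part of the query text.

    Labels cannot be bound as $parameters, so only the properties are parameterised.
    """
    for label in labels:
        if not _LABEL.match(label):
            raise ValueError(f"unsafe label: {label!r}")
    label_part = "".join(f":{label}" for label in labels)
    return f"CREATE (n{label_part}) SET n = $props"


print(create_node_query(["Molecule", "Protein"]))
# CREATE (n:Molecule:Protein) SET n = $props
print(create_node_query(["Molecule", "Enzyme"]))
# CREATE (n:Molecule:Enzyme) SET n = $props
```

      Each distinct label combination yields a different query string, which is why a single UNWIND statement cannot cover rows with mixed labels.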
  • 26. FLAT, RDF-LIKE MODEL
      Code for both converters: github:/marco-brandizi/odx_neo4j_converter_test
  • 27. FLAT MODEL IMPACT ON CYPHER
      Structured model:
        MATCH (prot:Protein{ id: '250169' }) - [:cs_by] -> (react:Reaction) - [:part_of] -> (pway:Path)
        RETURN * LIMIT 100
      Flat model:
        MATCH (prot:Concept {id: '250169', ccName: 'Protein'})
          <- [:from] - (csby:Relation {name: 'cs_by' }) - [:to] -> (react:Concept { ccName: 'Reaction'})
          <- [:from] - (partof:Relation {name:'part_of'}) - [:to] -> (pway:Concept {ccName:'Path'})
        RETURN * LIMIT 100
      Rich schema-based queries:
        MATCH (mol:Concept) <- [:conceptClass] - (cc:ConceptClass),
          (cc) <- [:specializationOf*] - (:ConceptClass{name:'Protein'})
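      The difference between the two shapes can be sketched as a conversion from structured edges to the flat form, where every relation becomes a `Relation` node linked to its endpoints via `from` and `to`. This is an illustration only, not the converter code referenced in the deck; `to_flat`, the `key` property and all identifiers except '250169' are hypothetical.

```python
def to_flat(concepts, relations):
    """Convert a structured graph into the flat, RDF-like shape used in the flat-model query.

    concepts:  {concept id: concept class name}
    relations: [(from id, relation type, to id)]
    Returns (nodes, edges): nodes are (label, properties) pairs,
    edges are (source key, edge type, target key) triples.
    """
    nodes = [("Concept", {"id": cid, "ccName": cc}) for cid, cc in concepts.items()]
    edges = []
    for index, (src, rel_type, dst) in enumerate(relations):
        rel_key = f"rel{index}"  # hypothetical key, only to tie edges to this Relation node
        nodes.append(("Relation", {"key": rel_key, "name": rel_type}))
        # A Relation node points at both endpoints: (c1) <-[:from]- (r) -[:to]-> (c2)
        edges.append((rel_key, "from", src))
        edges.append((rel_key, "to", dst))
    return nodes, edges


concepts = {"250169": "Protein", "r1": "Reaction", "p1": "Path"}
relations = [("250169", "cs_by", "r1"), ("r1", "part_of", "p1")]
nodes, edges = to_flat(concepts, relations)
print(len(nodes), len(edges))  # 5 4
```

      Two structured edges become two extra nodes and four edges, which matches the longer pattern needed in the flat-model query.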
  • 28. FLAT MODEL PERFORMANCE: simple, common queries
  • 29. FLAT MODEL PERFORMANCE: typical ONDEX graph queries
  • 30. IMPACT ON CYPHER
      Rich schema-based queries
      From:
        MATCH (molecule:Molecule), (molType:Class)-[:is_a*]->(:Class{ name:'Protein' })
        WHERE molType.label IN LABELS(molecule)
      To:
        MATCH (mol:Concept) <- [:conceptClass] - (cc:ConceptClass),
          (cc) <- [:specializationOf*] - (:ConceptClass{name:'Protein'})
      Now it’s efficient enough (especially with length restrictions). However…
  • 31. IMPACT ON CYPHER
      Rich schema-based queries:
        MATCH (mol:Concept) <- [:conceptClass] - (cc:ConceptClass),
          (cc) <- [:specializationOf*] - (:ConceptClass{name:'Protein'})
      Now it’s efficient enough (especially with length restrictions). However…
      From:
        MATCH (react:Reaction) - [:part_of] -> (pway:Path)
      To:
        MATCH (react:Concept {ccName: 'Reaction'}) <- [:from] - (partof:Relation {name:'part_of'}) - [:to] -> (pway:Concept {ccName:'Path'})
      What if we want variable-length part_of? Not currently possible in Cypher (nor in SPARQL), maybe in future (https://github.com/neo4j/neo4j/issues/88)
      => Having both models, redundantly, would probably be worth it
      => which makes it not so different from RDF
  • 32. OTHER ISSUES
      • Data exchange format?
      • None, except Cypher
      • DML not so performant
      • In particular, no standard data exchange format
      • Could be combined with RDF
      • Is Neo4j Open Source?
      • Produced by a company, only the Community Edition is OSS
      • OpenCypher is available
      • Cypher backed by Gremlin/TinkerPop
      • Apache project, more reliable OSS-wise
      • Performance comparable with Neo4j (https://goo.gl/NK1tn2)
      • More choice of implementations
      • Alternative QL, but more complicated IMHO (Cypher supported)
      Image credits: https://goo.gl/ysBFF2
  • 33. CONCLUSIONS (Neo4J/GraphDBs vs Virtuoso/Triple Stores)
      Data exchange format
          Neo4J/GraphDBs: -
          Virtuoso/Triple Stores: +
      Data model
          Neo4J/GraphDBs: + relations with properties; - metadata management
          Virtuoso/Triple Stores: - relations cannot have properties (requires reification); + metadata as first citizen
      Performance
          Neo4J/GraphDBs: +
          Virtuoso/Triple Stores: - (comparable)
      QL
          Neo4J/GraphDBs: + easier (e.g., compact, omissions)?; - expressivity for some patterns (unions, DML)
          Virtuoso/Triple Stores: - harder? (URIs, namespaces, verbosity); + more expressive
      Standardisation, openness
          Neo4J/GraphDBs: -
          Virtuoso/Triple Stores: +
      Scalability, big data
          Neo4J/GraphDBs: - (TinkerPop probably better)
          Virtuoso/Triple Stores: LB/cluster solutions; over TinkerPop (via SAIL implementation)
  • 34. CONCLUSIONS
  • 35. CONCLUSIONS
  • 36. CONCLUSIONS
  • 37. WHY?
      • Graph + APIs
      • Clearer architecture, open to more applications, not only kNetMiner
      • QL makes it easier to develop further components/analyses/applications
      • Standard data model and format
      • Don’t reinvent the wheel
      • Data sharing
      • Data and app integration
  • 38. CONCLUSIONS