SlideShare a Scribd company logo
O N D E X & G R A P H D B S
M A R C O B R A N D I Z I , 1 6 / 1 0 / 2 0 1 7
G O A L S
• Evaluate graph databases (GDBs)/frameworkd/etc in relation to ONDEX needs
• Assess GDBs as kNetMiner/ONDEX backends
• Evaluate a new architecture where raw data access is entirely based on a GDB
• Evaluate a new data exchange format, possibly integrated with one GDBs
• and hence, evaluate the data models too
• Assess data query/manipulation languages (expressivity, ease of use, speed)
• Assess that performance fits to ONDEX needs
T E S T D A T A
Trait Ontology (TO) 1500 nodes, is-a and part-of relations (i.e., mostly tree)
Gene Ontology (GO) Tree with 46k nodes
AraCyc/BioPAX Heterogeneous net, 23k nodes, 40k relations
Ara-kNet Heterogeneous net, 350k nodes 1.150M relations
T E S T S E T T I N G S ( R D F )
T E S T S E T T I N G S ( N E O 4 J )
R D F
R D F / L I N K E D D A T A
E S S E N T I A L S
• Simple, Fine-Grained Data
Model: Property/Value Pairs &
Typed Links
• Designed for Data Integration:
• Universal Identifiers, W3C
Standards
• Strong (even too much)
emphasis on knowledge
modelling via
schemas/ontologies
• Designed for the Web:
Resolvable URIs, Web APIs
R D F / L I N K E D D A T A E S S E N T I A L S
Integration as native citizen, strong emphasis on knowledge modelling, schemas, ontologies
D A T A M O D E L : O N D E X I N R D F
E X A M P L E Q U E R I E S
Count concepts (classes) in Trait Ontology:
select count (distinct ?c) WHERE {
?c a odxcc:TO_TERM.
}
Parts of membrane (transitively):
select distinct ?csup ?supName ?c ?name
WHERE {
?csup odx:conceptName ?supName.
FILTER ( ?supName = "cellular membrane" )
?c odxrt:part_of* ?csup.
?c odx:conceptName ?name.
}
LIMIT 1000
Proteins related to pathways:
select distinct ?prot ?pway {
?prot odxrt:pd_by|odxrt:cs_by ?react;
a odxcc:Protein.
?react a odxcc:Reaction.
?react odxrt:part_of ?pway.
?pway a odxcc:Path.
}
LIMIT 1000
optimised order
‘|’ for property paths
E X A M P L E Q U E R I E S
# part 2
union {
# Branch 2
?prot ^odxrt:ac_by|odxrt:is_a ?enz.
?prot a odxcc:Protein.
?enz a odxcc:Enzyme.
{
# Branch 2.1
?enz odxrt:ac_by|odxrt:in_by ?comp.
?comp a odxcc:Compound.
?comp odxrt:cs_by|odxrt:pd_by ?trns
?trns a odxcc:Transport
}
union {
# Branch 2.2
?enz ^odxrt:ca_by ?trns.
?trns a odxcc:Transport
}
?trns odxrt:part_of ?pway.
?pway a odxcc:Path.
}
} LIMIT 1000
prefix odx: <http://ondex.sourceforge.net/ondex-core#>
prefix odxcc: <http://www.ondex.org/ex/conceptClass/>
prefix odxc: <http://www.ondex.org/ex/concept/>
prefix odxrt: <http://www.ondex.org/ex/relationType/>
prefix odxr: <http://www.ondex.org/ex/relation/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?prot ?pway {
where {
# Branch 1
?prot odxrt:pd_by|odxrt:cs_by ?react.
?prot a odxcc:Protein.
?react a odxcc:Reaction.
?react odxrt:part_of ?pway.
?pway a odxcc:Path.
}
# to be continued…
Proteins related to pathways:
R D F P E R F O R M A N C E
Simple, common queries (Fuseki)
R D F P E R F O R M A N C E
Queries over ONDEX paths (Fuseki)
R D F P E R F O R M A N C E
Queries over ONDEX paths, Virtuoso
N E O 4 J
N E O 4 J E S S E N T I A L S
• Designed to backup applications
• much less about standards or Web-based sharing
• Very little to manage schemas (more later)
• No native data format (except Cypher, support for
GraphML, RDF)
• Initially based on API only, now Cypher available
• Compact, easy, no URIs (can be used as strings)
• Very performant
• Hasn’t much for clustering/federation, but Cypher can be
used in TinkerPop
• More commercial (not necessarily good)
• Cool management interface
• Probably easier to use for the average Java developer
Image credits: https://goo.gl/YLhCXG
N E O 4 J D A T A M O D E L
Both nodes and relations can have attributes
Nodes & relations have labels
(i.e., string-based types)
Cool management interface
(SPARQL version might be a student project)
C Y P H E R Q U E R Y / D M
L A N G U A G E
Proteins->Reactions->Pathways:
// chain of paths, node selection via property (exploits indices)
MATCH (prot:Protein) - [csby:consumed_by] -> (:Reaction) - [:part_of] -> (pway:Path{ title: ‘apoptosis’ })
// further conditions, but often not performant
WHERE prot.name =~ ‘(?i)^DNA.+’
// Usual projection and post-selection operators
RETURN prot.name, pway
// Relations can have properties
ORDER BY csby.pvalue
LIMIT 1000
Single-path (or same-direction branching) easy to write:
MATCH (prot:Protein) - [:pd_by|cs_by] -> (:Reaction) - [:part_of*1..3] ->
(pway:Path)
RETURN ID(prot), ID(pway) LIMIT 1000
// Very compact forms available, depending on the data
MATCH (prot:Protein) - (pway:Path) RETURN pway
C Y P H E R Q U E R Y / D M
L A N G U A G E
DML features:
MATCH (prot:Protein{ name:’P53’ }), (pway:Path{ title:’apoptosis’})
CREATE (prot) - [:participates_in] -> (pway)
DML features, embeddable in Java/Python/etc:
UNWIND $rows AS row // $rows set by the invoker, programmatically
MATCH (prot:Protein{ id: row.protId }), (pway:Path{ id:row.pathId })
CREATE (prot) - [relation:participates_in] -> (pway)
SET relation = row.relationAttributes
C Y P H E R / N E O 4 J P E R F O R M A N C E
Simple, common queries
C Y P H E R / N E O 4 J P E R F O R M A N C E
Path Queries
S O U N D S G O O D , B U T …
select distinct ?prot ?pway {
where {
# Branch 1
…
}
union {
# Branch 2
…
{
# Branch 2.1
}
union {
# Branch 2.2
}
…
}
}
• In Cypher?!
• I couldn’t find a decent way, although it might be possible (https://goo.gl/Rpa9SM)
• Partially possible in straightforward way, but redundantly, e.g., Branch 2:
MATCH (prot:Protein) <- [:ac_by] - (:Enzyme) <- [:ca_by] - (:Transport) <- [:part_of] -
(pway:Path)
RETURN prot, pway LIMIT 100
UNION
MATCH (prot:Protein) - [:is_a] -> (:Enzyme) <- [:ca_by] - (:Transport) <- [:part_of] -
(pway:Path)
RETURN prot, pway LIMIT 100
A D D E N D U M
select distinct ?prot ?pway {
where {
# Branch 1
…
}
union {
# Branch 2
…
{
# Branch 2.1
}
union {
# Branch 2.2
}
…
}
}
• In Cypher?!
Unions+branches partially possible by means of paths in WHERE:
// Branch 2
MATCH (prot:Protein), (enz:Enzyme), (tns:Transport) - [:part_of] -> (path:Path)
WHERE ( (enz) - [:ac_by|:in_by] -> (:Comp) - [:pd_by|:cs_by] -> (tns) // Branch 2.1
OR (tns) - [:ca_by] -> (enz) ) //Branch 2.2 (pt1)
AND ( (prot) - [:is_a] -> (enz) OR (prot) <- [:ac_by] - (enz) ) // Branch 2.2 (pt2)
RETURN prot, path LIMIT 30
UNION
// Branch1
MATCH (prot:Protein) - [:pd_by|:cs_by] -> (:Reaction) - [:part_of] -> (path:Path)
RETURN prot, path LIMIT 30
• However,
• 41249ms to execute against wheat net.
• it generates cartesian products and can
easily explode
S O U N D S G O O D , B U T …
• What about schemas/metadata/ontologies?
• Node and relations can only have multiple labels attached, which are just
strings. Rich schema-operations not so easy:
• Select any kind of protein, including enzymes, cytokines
• Select any type of ‘interacts with’, including ‘catalysed by’, ‘consumed by’,
‘produced by’ (might require ‘inverse of’)
• Basically, has a relational-oriented view about the schemas
S O U N D S G O O D , B U T …
• Basically, it’s relational-oriented about schemas
• we might still be OK with metadata modelled as graphs, however:
• MATCH (molecule:Molecule),
(molType:Class)-[:is_a*]->(:Class{ name:’Protein’ })
WHERE LABELS molType IN LABELS (molecule)
• It’s expensive to compute (doesn’t exploit indexes)
• MATCH (molecule:Molecule:$additionalLabel) CREATE …
• Parameterising on labels not possible
• Requires non parametric Cypher string => UNWIND-based bulk loading impossible
• => bad performance
• Programmatic approach possible, but a lot of problems with things like Lucene version mismatches (one reason
being that ONDEX would require review and proper plug-in architecture)
F L A T , R D F - L I K E M O D E L
Code for both converters:
github:/marco-brandizi/odx_neo4j_converter_test
F L A T M O D E L I M P A C T O N
C Y P H E R
Structured model:
MATCH (prot:Protein{ id: '250169' }) - [:cs_by] -> (react:Reaction) - [:part_of] -> (pway:Path)
RETURN * LIMIT 100
Flat model:
MATCH (prot:Concept {id: '250169', ccName: 'Protein'})
<- [:from] - (csby:Relation {name: 'cs_by' })
- [:to] -> (react:Concept { ccName: 'Reaction'})
<- [:from] - (partof:Relation {name:'part_of'}) - [:to]
-> (pway:Concept {ccName:'Path'})
RETURN * LIMIT 100
Rich schema-based queries
MATCH (mol:{Concept}) <- [:conceptClass] - (cc:ConceptClass),
(cc) <- [:specializationOf*] - (:ConceptClass{name:’Protein’}
F L A T M O D E L P E R F O R M A N C E
Simple, common queries
F L A T M O D E L P E R F O R M A N C E
Typical ONDEX Graph Queries
I M P A C T O N C Y P H E R
Rich schema-based queries
From:
MATCH (molecule:Molecule), (molType:Class)-[:is_a*]->(:Class{ name:’Protein’ })
WHERE molType.label IN LABELS (molecule)
To:
MATCH (mol:{Concept}) <- [:conceptClass] - (cc:ConceptClass),
(cc) <- [:specializationOf*] - (:ConceptClass{name:’Protein’}
now it’s efficient-enough (especially with length restrictions)
However…
I M P A C T O N C Y P H E R
Rich schema-based queries
MATCH (mol:{Concept}) <- [:conceptClass] - (cc:ConceptClass),
(cc) <- [:specializationOf*] - (:ConceptClass{name:’Protein’}
now it’s efficient-enough (especially with length restrictions)
However…
from: MATCH (react:Reaction) - [:part_of] -> (pway:Path)
to: MATCH (react:Concept {ccName: ‘Reaction’})
<- [:from] - (partof:Relation {name:'part_of'})
- [:to] -> (pway:Concept {ccName:'Path'})
What if we want variable-length part_of?
Not currently possible in Cypher (nor in SPARQL),
maybe in future (https://github.com/neo4j/neo4j/issues/88)
=> Having both model, redundantly, would probably be worth
=> makes it not so different than RDF
O T H E R I S S U E S
• Data Exchange format?
• None, except Cypher
• DML not so performant
• In particular, no standard data exchange format
• Could be combined with RDF
• Is Neo4j Open Source?
• Produced by a company, only the Community Edition is OSS
• OpenCypher is available
• Cypher backed by Gremlin/TinkerPop
• Apache project, more reliable OSS-wide
• Performance comparable with Neo4j (https://goo.gl/NK1tn2)
• More choice of implementations
• Alternative QL, but more complicated IMHO (Cypher supported)
Image credits: https://goo.gl/ysBFF2
C O N C L U S I O N S
Neo4J/GraphDBs Virtuoso/Triple Stores
Data X format - +
Data model
+ Relations with properties
- Metadata management
- Relations cannot have properties (req. reification)
+ Metadata as first citizen
Performance + - (comparable)
QL
+ Easier (eg, compact, omissions)? - Expressivity
for some patterns (unions, DML)
- Harder? (URIs, namespaces, verbosity) + More
expressive
Standardisation,
openness
- +
Scalability, big data - TinkerPop probably better
LB/Cluster solutions Over TinkerPop (via SAIL
implementation)
C O N C L U S I O N S
C O N C L U S I O N S
C O N C L U S I O N S
W H Y ?
• Graph + APIs
• Clearer architecture, open to more
applications, not only kNetMiner
• QL makes it easier to develop further
components/analyses/applications
• Standard Data model and format
• Don’t reinvent the wheel
• Data sharing
• Data and app integration
C O N C L U S I O N
S

More Related Content

What's hot

Introduction to Haskell: 2011-04-13
Introduction to Haskell: 2011-04-13Introduction to Haskell: 2011-04-13
Introduction to Haskell: 2011-04-13
Jay Coskey
 
Python Interview Questions | Python Interview Questions And Answers | Python ...
Python Interview Questions | Python Interview Questions And Answers | Python ...Python Interview Questions | Python Interview Questions And Answers | Python ...
Python Interview Questions | Python Interview Questions And Answers | Python ...
Simplilearn
 
Data translation with SPARQL 1.1
Data translation with SPARQL 1.1Data translation with SPARQL 1.1
Data translation with SPARQL 1.1andreas_schultz
 
Babar: Knowledge Recognition, Extraction and Representation
Babar: Knowledge Recognition, Extraction and RepresentationBabar: Knowledge Recognition, Extraction and Representation
Babar: Knowledge Recognition, Extraction and Representation
Pierre de Lacaze
 
XML and XPath details
XML and XPath detailsXML and XPath details
XML and XPath details
DSK Chakravarthy
 
Tackling repetitive tasks with serial or parallel programming in R
Tackling repetitive tasks with serial or parallel programming in RTackling repetitive tasks with serial or parallel programming in R
Tackling repetitive tasks with serial or parallel programming in R
Lun-Hsien Chang
 
Lambdas And Streams Hands On Lab, JavaOne 2014
Lambdas And Streams Hands On Lab, JavaOne 2014Lambdas And Streams Hands On Lab, JavaOne 2014
Lambdas And Streams Hands On Lab, JavaOne 2014Simon Ritter
 
JavaParser - A tool to generate, analyze and refactor Java code
JavaParser - A tool to generate, analyze and refactor Java codeJavaParser - A tool to generate, analyze and refactor Java code
JavaParser - A tool to generate, analyze and refactor Java code
Federico Tomassetti
 
Manipulating string data with a pattern in R
Manipulating string data with  a pattern in RManipulating string data with  a pattern in R
Manipulating string data with a pattern in R
Lun-Hsien Chang
 
SQL Server Select Topics
SQL Server Select TopicsSQL Server Select Topics
SQL Server Select Topics
Jay Coskey
 
Lz77 by ayush
Lz77 by ayushLz77 by ayush
Lz77 by ayush
Ayush Gupta
 
Java stream
Java streamJava stream
Java stream
Arati Gadgil
 
Python 3.6 Features 20161207
Python 3.6 Features 20161207Python 3.6 Features 20161207
Python 3.6 Features 20161207
Jay Coskey
 
Java Input Output (java.io.*)
Java Input Output (java.io.*)Java Input Output (java.io.*)
Java Input Output (java.io.*)
Om Ganesh
 
Lambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon Ritter
Lambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon RitterLambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon Ritter
Lambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon Ritter
JAXLondon2014
 
Lambdas And Streams Hands On Lab
Lambdas And Streams Hands On LabLambdas And Streams Hands On Lab
Lambdas And Streams Hands On Lab
Simon Ritter
 
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdfDatabase & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdfInSync2011
 

What's hot (17)

Introduction to Haskell: 2011-04-13
Introduction to Haskell: 2011-04-13Introduction to Haskell: 2011-04-13
Introduction to Haskell: 2011-04-13
 
Python Interview Questions | Python Interview Questions And Answers | Python ...
Python Interview Questions | Python Interview Questions And Answers | Python ...Python Interview Questions | Python Interview Questions And Answers | Python ...
Python Interview Questions | Python Interview Questions And Answers | Python ...
 
Data translation with SPARQL 1.1
Data translation with SPARQL 1.1Data translation with SPARQL 1.1
Data translation with SPARQL 1.1
 
Babar: Knowledge Recognition, Extraction and Representation
Babar: Knowledge Recognition, Extraction and RepresentationBabar: Knowledge Recognition, Extraction and Representation
Babar: Knowledge Recognition, Extraction and Representation
 
XML and XPath details
XML and XPath detailsXML and XPath details
XML and XPath details
 
Tackling repetitive tasks with serial or parallel programming in R
Tackling repetitive tasks with serial or parallel programming in RTackling repetitive tasks with serial or parallel programming in R
Tackling repetitive tasks with serial or parallel programming in R
 
Lambdas And Streams Hands On Lab, JavaOne 2014
Lambdas And Streams Hands On Lab, JavaOne 2014Lambdas And Streams Hands On Lab, JavaOne 2014
Lambdas And Streams Hands On Lab, JavaOne 2014
 
JavaParser - A tool to generate, analyze and refactor Java code
JavaParser - A tool to generate, analyze and refactor Java codeJavaParser - A tool to generate, analyze and refactor Java code
JavaParser - A tool to generate, analyze and refactor Java code
 
Manipulating string data with a pattern in R
Manipulating string data with  a pattern in RManipulating string data with  a pattern in R
Manipulating string data with a pattern in R
 
SQL Server Select Topics
SQL Server Select TopicsSQL Server Select Topics
SQL Server Select Topics
 
Lz77 by ayush
Lz77 by ayushLz77 by ayush
Lz77 by ayush
 
Java stream
Java streamJava stream
Java stream
 
Python 3.6 Features 20161207
Python 3.6 Features 20161207Python 3.6 Features 20161207
Python 3.6 Features 20161207
 
Java Input Output (java.io.*)
Java Input Output (java.io.*)Java Input Output (java.io.*)
Java Input Output (java.io.*)
 
Lambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon Ritter
Lambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon RitterLambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon Ritter
Lambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon Ritter
 
Lambdas And Streams Hands On Lab
Lambdas And Streams Hands On LabLambdas And Streams Hands On Lab
Lambdas And Streams Hands On Lab
 
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdfDatabase & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
 

Similar to A Preliminary survey of RDF/Neo4j as backends for KnetMiner

Open-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitOpen-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKit
Greg Landrum
 
Scalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMScalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAM
fnothaft
 
BioSD Tutorial 2014 Editition
BioSD Tutorial 2014 EdititionBioSD Tutorial 2014 Editition
BioSD Tutorial 2014 Editition
Rothamsted Research, UK
 
Neo4j_Cypher.pdf
Neo4j_Cypher.pdfNeo4j_Cypher.pdf
Neo4j_Cypher.pdf
JaberRad1
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandrarantav
 
Design for Scalability in ADAM
Design for Scalability in ADAMDesign for Scalability in ADAM
Design for Scalability in ADAMfnothaft
 
User biglm
User biglmUser biglm
User biglm
johnatan pladott
 
Protein threading using context specific alignment potential ismb-2013
Protein threading using context specific alignment potential ismb-2013Protein threading using context specific alignment potential ismb-2013
Protein threading using context specific alignment potential ismb-2013
Sheng Wang
 
Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...
Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...
Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...
Mariano Rodriguez-Muro
 
PGQL: A Language for Graphs
PGQL: A Language for GraphsPGQL: A Language for Graphs
PGQL: A Language for Graphs
Jean Ihm
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistrybaoilleach
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistryguest5929fa7
 
Rdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimationRdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimation
INRIA-OAK
 
Knetminer Backend Training, Nov 2018
Knetminer Backend Training, Nov 2018Knetminer Backend Training, Nov 2018
Knetminer Backend Training, Nov 2018
Rothamsted Research, UK
 
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Rothamsted Research, UK
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
National Institute of Informatics
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
Rathachai Chawuthai
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
fnothaft
 
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Databricks
 
Creating the PromQL Transpiler for Flux by Julius Volz, Co-Founder | Prometheus
Creating the PromQL Transpiler for Flux by Julius Volz, Co-Founder | PrometheusCreating the PromQL Transpiler for Flux by Julius Volz, Co-Founder | Prometheus
Creating the PromQL Transpiler for Flux by Julius Volz, Co-Founder | Prometheus
InfluxData
 

Similar to A Preliminary survey of RDF/Neo4j as backends for KnetMiner (20)

Open-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitOpen-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKit
 
Scalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMScalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAM
 
BioSD Tutorial 2014 Editition
BioSD Tutorial 2014 EdititionBioSD Tutorial 2014 Editition
BioSD Tutorial 2014 Editition
 
Neo4j_Cypher.pdf
Neo4j_Cypher.pdfNeo4j_Cypher.pdf
Neo4j_Cypher.pdf
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandra
 
Design for Scalability in ADAM
Design for Scalability in ADAMDesign for Scalability in ADAM
Design for Scalability in ADAM
 
User biglm
User biglmUser biglm
User biglm
 
Protein threading using context specific alignment potential ismb-2013
Protein threading using context specific alignment potential ismb-2013Protein threading using context specific alignment potential ismb-2013
Protein threading using context specific alignment potential ismb-2013
 
Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...
Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...
Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...
 
PGQL: A Language for Graphs
PGQL: A Language for GraphsPGQL: A Language for Graphs
PGQL: A Language for Graphs
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistry
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistry
 
Rdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimationRdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimation
 
Knetminer Backend Training, Nov 2018
Knetminer Backend Training, Nov 2018Knetminer Backend Training, Nov 2018
Knetminer Backend Training, Nov 2018
 
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
 
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
 
Creating the PromQL Transpiler for Flux by Julius Volz, Co-Founder | Prometheus
Creating the PromQL Transpiler for Flux by Julius Volz, Co-Founder | PrometheusCreating the PromQL Transpiler for Flux by Julius Volz, Co-Founder | Prometheus
Creating the PromQL Transpiler for Flux by Julius Volz, Co-Founder | Prometheus
 

More from Rothamsted Research, UK

FAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use CaseFAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use Case
Rothamsted Research, UK
 
Interoperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use CasesInteroperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use Cases
Rothamsted Research, UK
 
AgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with BioschemasAgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with Bioschemas
Rothamsted Research, UK
 
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
Publishing and Consuming FAIR DataA Case in the Agri-Food DomainPublishing and Consuming FAIR DataA Case in the Agri-Food Domain
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
Rothamsted Research, UK
 
Continuos Integration @Knetminer
Continuos Integration @KnetminerContinuos Integration @Knetminer
Continuos Integration @Knetminer
Rothamsted Research, UK
 
Better Data for a Better World
Better Data for a Better WorldBetter Data for a Better World
Better Data for a Better World
Rothamsted Research, UK
 
AgriSchemas Progress Report
AgriSchemas Progress ReportAgriSchemas Progress Report
AgriSchemas Progress Report
Rothamsted Research, UK
 
AgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use CasesAgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use Cases
Rothamsted Research, UK
 
Notes about SWAT4LS 2018
Notes about SWAT4LS 2018Notes about SWAT4LS 2018
Notes about SWAT4LS 2018
Rothamsted Research, UK
 
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
Rothamsted Research, UK
 
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Rothamsted Research, UK
 
graph2tab, a library to convert experimental workflow graphs into tabular for...
graph2tab, a library to convert experimental workflow graphs into tabular for...graph2tab, a library to convert experimental workflow graphs into tabular for...
graph2tab, a library to convert experimental workflow graphs into tabular for...
Rothamsted Research, UK
 
Interoperable Open Data: Which Recipes?
Interoperable Open Data: Which Recipes?Interoperable Open Data: Which Recipes?
Interoperable Open Data: Which Recipes?
Rothamsted Research, UK
 
Linked Data with the EBI RDF Platform
Linked Data with the EBI RDF PlatformLinked Data with the EBI RDF Platform
Linked Data with the EBI RDF Platform
Rothamsted Research, UK
 
BioSD Linked Data: Lessons Learned
BioSD Linked Data: Lessons LearnedBioSD Linked Data: Lessons Learned
BioSD Linked Data: Lessons Learned
Rothamsted Research, UK
 
myEquivalents, aka a new cross-reference service
myEquivalents, aka a new cross-reference servicemyEquivalents, aka a new cross-reference service
myEquivalents, aka a new cross-reference service
Rothamsted Research, UK
 
Dev 2014 LOD tutorial
Dev 2014 LOD tutorialDev 2014 LOD tutorial
Dev 2014 LOD tutorial
Rothamsted Research, UK
 
BioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS TutorialBioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS Tutorial
Rothamsted Research, UK
 
Semic 2013
Semic 2013Semic 2013
Uk onto net_2013_notes_brandizi
Uk onto net_2013_notes_brandiziUk onto net_2013_notes_brandizi
Uk onto net_2013_notes_brandizi
Rothamsted Research, UK
 

More from Rothamsted Research, UK (20)

FAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use CaseFAIR Agronomy, where are we? The KnetMiner Use Case
FAIR Agronomy, where are we? The KnetMiner Use Case
 
Interoperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use CasesInteroperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use Cases
 
AgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with BioschemasAgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with Bioschemas
 
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
Publishing and Consuming FAIR DataA Case in the Agri-Food DomainPublishing and Consuming FAIR DataA Case in the Agri-Food Domain
Publishing and Consuming FAIR Data A Case in the Agri-Food Domain
 
Continuos Integration @Knetminer
Continuos Integration @KnetminerContinuos Integration @Knetminer
Continuos Integration @Knetminer
 
Better Data for a Better World
Better Data for a Better WorldBetter Data for a Better World
Better Data for a Better World
 
AgriSchemas Progress Report
AgriSchemas Progress ReportAgriSchemas Progress Report
AgriSchemas Progress Report
 
AgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use CasesAgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use Cases
 
Notes about SWAT4LS 2018
Notes about SWAT4LS 2018Notes about SWAT4LS 2018
Notes about SWAT4LS 2018
 
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
Towards FAIRer Biological Knowledge Networks 
Using a Hybrid Linked Data 
and...
 
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
 
graph2tab, a library to convert experimental workflow graphs into tabular for...
graph2tab, a library to convert experimental workflow graphs into tabular for...graph2tab, a library to convert experimental workflow graphs into tabular for...
graph2tab, a library to convert experimental workflow graphs into tabular for...
 
Interoperable Open Data: Which Recipes?
Interoperable Open Data: Which Recipes?Interoperable Open Data: Which Recipes?
Interoperable Open Data: Which Recipes?
 
Linked Data with the EBI RDF Platform
Linked Data with the EBI RDF PlatformLinked Data with the EBI RDF Platform
Linked Data with the EBI RDF Platform
 
BioSD Linked Data: Lessons Learned
BioSD Linked Data: Lessons LearnedBioSD Linked Data: Lessons Learned
BioSD Linked Data: Lessons Learned
 
myEquivalents, aka a new cross-reference service
myEquivalents, aka a new cross-reference servicemyEquivalents, aka a new cross-reference service
myEquivalents, aka a new cross-reference service
 
Dev 2014 LOD tutorial
Dev 2014 LOD tutorialDev 2014 LOD tutorial
Dev 2014 LOD tutorial
 
BioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS TutorialBioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS Tutorial
 
Semic 2013
Semic 2013Semic 2013
Semic 2013
 
Uk onto net_2013_notes_brandizi
Uk onto net_2013_notes_brandiziUk onto net_2013_notes_brandizi
Uk onto net_2013_notes_brandizi
 

Recently uploaded

Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
e20449
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 

Recently uploaded (20)

Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 

A Preliminary survey of RDF/Neo4j as backends for KnetMiner

  • 1. O N D E X & G R A P H D B S M A R C O B R A N D I Z I , 1 6 / 1 0 / 2 0 1 7
  • 2. G O A L S • Evaluate graph databases (GDBs)/frameworkd/etc in relation to ONDEX needs • Assess GDBs as kNetMiner/ONDEX backends • Evaluate a new architecture where raw data access is entirely based on a GDB • Evaluate a new data exchange format, possibly integrated with one GDBs • and hence, evaluate the data models too • Assess data query/manipulation languages (expressivity, ease of use, speed) • Assess that performance fits to ONDEX needs
  • 3. T E S T D A T A Trait Ontology (TO) 1500 nodes, is-a and part-of relations (i.e., mostly tree) Gene Ontology (GO) Tree with 46k nodes AraCyc/BioPAX Heterogeneous net, 23k nodes, 40k relations Ara-kNet Heterogeneous net, 350k nodes 1.150M relations
  • 4. T E S T S E T T I N G S ( R D F )
  • 5. T E S T S E T T I N G S ( N E O 4 J )
  • 7. R D F / L I N K E D D A T A E S S E N T I A L S • Simple, Fine-Grained Data Model: Property/Value Pairs & Typed Links • Designed for Data Integration: • Universal Identifiers, W3C Standards • Strong (even too much) emphasis on knowledge modelling via schemas/ontologies • Designed for the Web: Resolvable URIs, Web APIs
  • 8. R D F / L I N K E D D A T A E S S E N T I A L S Integration as native citizen, strong emphasis on knowledge modelling, schemas, ontologies
  • 9. D A T A M O D E L : O N D E X I N R D F
  • 10. E X A M P L E Q U E R I E S Count concepts (classes) in Trait Ontology: select count (distinct ?c) WHERE { ?c a odxcc:TO_TERM. } Parts of membrane (transitively): select distinct ?csup ?supName ?c ?name WHERE { ?csup odx:conceptName ?supName. FILTER ( ?supName = "cellular membrane" ) ?c odxrt:part_of* ?csup. ?c odx:conceptName ?name. } LIMIT 1000 Proteins related to pathways: select distinct ?prot ?pway { ?prot odxrt:pd_by|odxrt:cs_by ?react; a odxcc:Protein. ?react a odxcc:Reaction. ?react odxrt:part_of ?pway. ?pway a odxcc:Path. } LIMIT 1000 optimised order ‘|’ for property paths
  • 11. E X A M P L E Q U E R I E S # part 2 union { # Branch 2 ?prot ^odxrt:ac_by|odxrt:is_a ?enz. ?prot a odxcc:Protein. ?enz a odxcc:Enzyme. { # Branch 2.1 ?enz odxrt:ac_by|odxrt:in_by ?comp. ?comp a odxcc:Compound. ?comp odxrt:cs_by|odxrt:pd_by ?trns ?trns a odxcc:Transport } union { # Branch 2.2 ?enz ^odxrt:ca_by ?trns. ?trns a odxcc:Transport } ?trns odxrt:part_of ?pway. ?pway a odxcc:Path. } } LIMIT 1000 prefix odx: <http://ondex.sourceforge.net/ondex-core#> prefix odxcc: <http://www.ondex.org/ex/conceptClass/> prefix odxc: <http://www.ondex.org/ex/concept/> prefix odxrt: <http://www.ondex.org/ex/relationType/> prefix odxr: <http://www.ondex.org/ex/relation/> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> select distinct ?prot ?pway { where { # Branch 1 ?prot odxrt:pd_by|odxrt:cs_by ?react. ?prot a odxcc:Protein. ?react a odxcc:Reaction. ?react odxrt:part_of ?pway. ?pway a odxcc:Path. } # to be continued… Proteins related to pathways:
  • 12. R D F P E R F O R M A N C E Simple, common queries (Fuseki)
  • 13. R D F P E R F O R M A N C E Queries over ONDEX paths (Fuseki)
  • 14. R D F P E R F O R M A N C E Queries over ONDEX paths, Virtuoso
  • 15. N E O 4 J
  • 16. N E O 4 J E S S E N T I A L S • Designed to backup applications • much less about standards or Web-based sharing • Very little to manage schemas (more later) • No native data format (except Cypher, support for GraphML, RDF) • Initially based on API only, now Cypher available • Compact, easy, no URIs (can be used as strings) • Very performant • Hasn’t much for clustering/federation, but Cypher can be used in TinkerPop • More commercial (not necessarily good) • Cool management interface • Probably easier to use for the average Java developer Image credits: https://goo.gl/YLhCXG
  • 17. N E O 4 J D A T A M O D E L Both nodes and relations can have attributes Nodes & relations have labels (i.e., string-based types) Cool management interface (SPARQL version might be a student project)
  • 18. C Y P H E R Q U E R Y / D M L A N G U A G E Proteins->Reactions->Pathways: // chain of paths, node selection via property (exploits indices) MATCH (prot:Protein) - [csby:consumed_by] -> (:Reaction) - [:part_of] -> (pway:Path{ title: ‘apoptosis’ }) // further conditions, but often not performant WHERE prot.name =~ ‘(?i)^DNA.+’ // Usual projection and post-selection operators RETURN prot.name, pway // Relations can have properties ORDER BY csby.pvalue LIMIT 1000 Single-path (or same-direction branching) easy to write: MATCH (prot:Protein) - [:pd_by|cs_by] -> (:Reaction) - [:part_of*1..3] -> (pway:Path) RETURN ID(prot), ID(pway) LIMIT 1000 // Very compact forms available, depending on the data MATCH (prot:Protein) - (pway:Path) RETURN pway
  • 19. C Y P H E R Q U E R Y / D M L A N G U A G E DML features: MATCH (prot:Protein{ name:’P53’ }), (pway:Path{ title:’apoptosis’}) CREATE (prot) - [:participates_in] -> (pway) DML features, embeddable in Java/Python/etc: UNWIND $rows AS row // $rows set by the invoker, programmatically MATCH (prot:Protein{ id: row.protId }), (pway:Path{ id:row.pathId }) CREATE (prot) - [relation:participates_in] -> (pway) SET relation = row.relationAttributes
  • 20. C Y P H E R / N E O 4 J P E R F O R M A N C E Simple, common queries
  • 21. C Y P H E R / N E O 4 J P E R F O R M A N C E Path Queries
  • 22. S O U N D S G O O D , B U T … select distinct ?prot ?pway { where { # Branch 1 … } union { # Branch 2 … { # Branch 2.1 } union { # Branch 2.2 } … } } • In Cypher?! • I couldn’t find a decent way, although it might be possible (https://goo.gl/Rpa9SM) • Partially possible in straightforward way, but redundantly, e.g., Branch 2: MATCH (prot:Protein) <- [:ac_by] - (:Enzyme) <- [:ca_by] - (:Transport) <- [:part_of] - (pway:Path) RETURN prot, pway LIMIT 100 UNION MATCH (prot:Protein) - [:is_a] -> (:Enzyme) <- [:ca_by] - (:Transport) <- [:part_of] - (pway:Path) RETURN prot, pway LIMIT 100
  • 23. A D D E N D U M select distinct ?prot ?pway { where { # Branch 1 … } union { # Branch 2 … { # Branch 2.1 } union { # Branch 2.2 } … } } • In Cypher?! Unions+branches partially possible by means of paths in WHERE: // Branch 2 MATCH (prot:Protein), (enz:Enzyme), (tns:Transport) - [:part_of] -> (path:Path) WHERE ( (enz) - [:ac_by|:in_by] -> (:Comp) - [:pd_by|:cs_by] -> (tns) // Branch 2.1 OR (tns) - [:ca_by] -> (enz) ) //Branch 2.2 (pt1) AND ( (prot) - [:is_a] -> (enz) OR (prot) <- [:ac_by] - (enz) ) // Branch 2.2 (pt2) RETURN prot, path LIMIT 30 UNION // Branch1 MATCH (prot:Protein) - [:pd_by|:cs_by] -> (:Reaction) - [:part_of] -> (path:Path) RETURN prot, path LIMIT 30 • However, • 41249ms to execute against wheat net. • it generates cartesian products and can easily explode
  • 24. S O U N D S G O O D , B U T … • What about schemas/metadata/ontologies? • Node and relations can only have multiple labels attached, which are just strings. Rich schema-operations not so easy: • Select any kind of protein, including enzymes, cytokines • Select any type of ‘interacts with’, including ‘catalysed by’, ‘consumed by’, ‘produced by’ (might require ‘inverse of’) • Basically, has a relational-oriented view about the schemas
  • 25. S O U N D S G O O D , B U T … • Basically, it’s relational-oriented about schemas • we might still be OK with metadata modelled as graphs, however: • MATCH (molecule:Molecule), (molType:Class)-[:is_a*]->(:Class{ name:’Protein’ }) WHERE LABELS molType IN LABELS (molecule) • It’s expensive to compute (doesn’t exploit indexes) • MATCH (molecule:Molecule:$additionalLabel) CREATE … • Parameterising on labels not possible • Requires non parametric Cypher string => UNWIND-based bulk loading impossible • => bad performance • Programmatic approach possible, but a lot of problems with things like Lucene version mismatches (one reason being that ONDEX would require review and proper plug-in architecture)
  • 26. F L A T , R D F - L I K E M O D E L Code for both converters: github:/marco-brandizi/odx_neo4j_converter_test
  • 27. F L A T M O D E L I M P A C T O N C Y P H E R Structured model: MATCH (prot:Protein{ id: '250169' }) - [:cs_by] -> (react:Reaction) - [:part_of] -> (pway:Path) RETURN * LIMIT 100 Flat model: MATCH (prot:Concept {id: '250169', ccName: 'Protein'}) <- [:from] - (csby:Relation {name: 'cs_by' }) - [:to] -> (react:Concept { ccName: 'Reaction'}) <- [:from] - (partof:Relation {name:'part_of'}) - [:to] -> (pway:Concept {ccName:'Path'}) RETURN * LIMIT 100 Rich schema-based queries MATCH (mol:{Concept}) <- [:conceptClass] - (cc:ConceptClass), (cc) <- [:specializationOf*] - (:ConceptClass{name:’Protein’}
  • 28. F L A T M O D E L P E R F O R M A N C E Simple, common queries
  • 29. F L A T M O D E L P E R F O R M A N C E Typical ONDEX Graph Queries
  • 30. I M P A C T O N C Y P H E R Rich schema-based queries From: MATCH (molecule:Molecule), (molType:Class)-[:is_a*]->(:Class{ name:’Protein’ }) WHERE molType.label IN LABELS (molecule) To: MATCH (mol:{Concept}) <- [:conceptClass] - (cc:ConceptClass), (cc) <- [:specializationOf*] - (:ConceptClass{name:’Protein’} now it’s efficient-enough (especially with length restrictions) However…
  • 31. I M P A C T O N C Y P H E R Rich schema-based queries MATCH (mol:{Concept}) <- [:conceptClass] - (cc:ConceptClass), (cc) <- [:specializationOf*] - (:ConceptClass{name:’Protein’} now it’s efficient-enough (especially with length restrictions) However… from: MATCH (react:Reaction) - [:part_of] -> (pway:Path) to: MATCH (react:Concept {ccName: ‘Reaction’}) <- [:from] - (partof:Relation {name:'part_of'}) - [:to] -> (pway:Concept {ccName:'Path'}) What if we want variable-length part_of? Not currently possible in Cypher (nor in SPARQL), maybe in future (https://github.com/neo4j/neo4j/issues/88) => Having both model, redundantly, would probably be worth => makes it not so different than RDF
  • 32. O T H E R I S S U E S • Data Exchange format? • None, except Cypher • DML not so performant • In particular, no standard data exchange format • Could be combined with RDF • Is Neo4j Open Source? • Produced by a company, only the Community Edition is OSS • OpenCypher is available • Cypher backed by Gremlin/TinkerPop • Apache project, more reliable OSS-wide • Performance comparable with Neo4j (https://goo.gl/NK1tn2) • More choice of implementations • Alternative QL, but more complicated IMHO (Cypher supported) Image credits: https://goo.gl/ysBFF2
  • 33. C O N C L U S I O N S Neo4J/GraphDBs Virtuoso/Triple Stores Data X format - + Data model + Relations with properties - Metadata management - Relations cannot have properties (req. reification) + Metadata as first citizen Performance + - (comparable) QL + Easier (eg, compact, omissions)? - Expressivity for some patterns (unions, DML) - Harder? (URIs, namespaces, verbosity) + More expressive Standardisation, openness - + Scalability, big data - TinkerPop probably better LB/Cluster solutions Over TinkerPop (via SAIL implementation)
  • 34. C O N C L U S I O N S
  • 35. C O N C L U S I O N S
  • 36. C O N C L U S I O N S
  • 37. W H Y ? • Graph + APIs • Clearer architecture, open to more applications, not only kNetMiner • QL makes it easier to develop further components/analyses/applications • Standard Data model and format • Don’t reinvent the wheel • Data sharing • Data and app integration
  • 38. C O N C L U S I O N S