Neo4j GraphStore for
LinkedData
May 13, 2013 Arka Pattanayak
Overview
 Connected (or Linked) Data and the NO-SQL
movement.
 Modeling Linked Data:
 Using the Graph data structure.
 Property Graphs.
 Graph-based Database:
 Neo4j
 Querying a graph-store, two schools of thought:
 Traversal-based
 Pattern-matched
Linked Data
Linked Data According to the W3C
 Large scale integration of, and reasoning on,
data on the Web.
 Standardized format:
 Resource Description Framework (RDF)
 Access to data:
 XML, XHTML, etc.
 Published relationships between data.
 Query and access technologies:
 RDF, GRDDL, POWDER, RDFa, R2RML, RIF, SPARQL (the query language)
@see: http://www.w3.org/standards/semanticweb/data
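To make the RDF bullet concrete, here is a minimal sketch of data held as subject-predicate-object triples, with a wildcard lookup that mimics the idea behind a SPARQL triple pattern. The `example.org` URIs and the `match` helper are hypothetical, and plain tuples stand in for a real RDF library.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# The example.org URIs are made up for illustration.
KNOWS = "http://xmlns.com/foaf/0.1/knows"
NAME = "http://xmlns.com/foaf/0.1/name"

triples = [
    ("http://example.org/alice", KNOWS, "http://example.org/bob"),
    ("http://example.org/bob", KNOWS, "http://example.org/carol"),
    ("http://example.org/alice", NAME, "Alice"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "Whom does Alice know?" expressed as a pattern with the object left open.
known_by_alice = [o for _, _, o in match(triples, s="http://example.org/alice", p=KNOWS)]
```

Because every relationship is an explicit triple, the same helper answers the reverse question (who knows Bob?) by fixing the object instead of the subject.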
Modeling Linked Data: Graphs
 “A Graph —records data in→ Nodes —which
have→ Properties”
 Neo4j Graph: An object that contains vertices
and edges.
 Element: An object that can have any number of
key/value pairs associated with it (i.e. properties)
 Vertex: An object that has incoming and outgoing
edges.
 Edge: An object that has a tail and head vertex.
@see: https://github.com/tinkerpop/blueprints/wiki/property-graph-model
Modeling Linked Data: Property Graphs
 A property graph has these elements:
 a set of vertices
 each vertex has a unique identifier.
 each vertex has a set of outgoing edges.
 each vertex has a set of incoming edges.
 each vertex has a collection of properties defined by a map from
key to value.
 a set of edges
 each edge has a unique identifier.
 each edge has an outgoing tail vertex.
 each edge has an incoming head vertex.
 each edge has a label that denotes the type of relationship
between its two vertices.
 each edge has a collection of properties defined by a map from
key to value.
@see: https://github.com/tinkerpop/blueprints/wiki/property-graph-model
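The property graph definition can be sketched as a small data structure. This is an illustrative in-memory model in Python, not the Blueprints or Neo4j API; the class names, `add_vertex`, `add_edge`, and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# eq=False and repr=False: vertices and edges reference each other, so
# identity comparison and the default repr avoid unbounded recursion.
@dataclass(eq=False, repr=False)
class Vertex:
    id: int                                               # unique identifier
    properties: Dict[str, Any] = field(default_factory=dict)
    out_edges: List["Edge"] = field(default_factory=list)  # outgoing edges
    in_edges: List["Edge"] = field(default_factory=list)   # incoming edges


@dataclass(eq=False, repr=False)
class Edge:
    id: int        # unique identifier
    label: str     # type of relationship between the two vertices
    tail: Vertex   # vertex the edge leaves
    head: Vertex   # vertex the edge enters
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.vertices: Dict[int, Vertex] = {}
        self.edges: Dict[int, Edge] = {}

    def add_vertex(self, **properties) -> Vertex:
        v = Vertex(id=len(self.vertices), properties=properties)
        self.vertices[v.id] = v
        return v

    def add_edge(self, tail: Vertex, label: str, head: Vertex, **properties) -> Edge:
        e = Edge(id=len(self.edges), label=label, tail=tail, head=head,
                 properties=properties)
        self.edges[e.id] = e
        tail.out_edges.append(e)   # keep both endpoints' edge sets in sync
        head.in_edges.append(e)
        return e


g = PropertyGraph()
alice = g.add_vertex(name="Alice")
bob = g.add_vertex(name="Bob")
knows = g.add_edge(alice, "KNOWS", bob, since=2010)
```

Each bullet of the definition maps to one field: identifiers, incoming and outgoing edge sets, a label on the edge, and a key/value map on both vertices and edges.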
Modeling Linked Data: Property Graphs in
Neo4j
@see: https://github.com/tinkerpop/blueprints/wiki/property-graph-model
Modeling Linked Data: Why not RDBMS?
 Property Graph Model vs. RDBMS Anti-patterns.
@see: http://www.scribd.com/doc/2670985/SQL-Antipatterns
@see: https://github.com/tinkerpop/blueprints/wiki/property-graph-model
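One of the anti-patterns at issue, relationships packed into a comma-separated column, can be contrasted with explicit edges in a short sketch. The rows, field names, and helper functions below are hypothetical and only illustrate the shape of the problem.

```python
# Anti-pattern: a list of relationships packed into one comma-separated column.
rows = [
    {"id": 1, "name": "Alice", "friend_ids": "2,3"},
    {"id": 2, "name": "Bob", "friend_ids": "3"},
    {"id": 3, "name": "Carol", "friend_ids": ""},
]


def friends_csv(rows, person_id):
    """Answering "who are X's friends?" requires parsing a string."""
    row = next(r for r in rows if r["id"] == person_id)
    return [int(x) for x in row["friend_ids"].split(",") if x]


# Graph-style alternative: one explicit, labeled edge per relationship.
edges = [(1, "FRIEND", 2), (1, "FRIEND", 3), (2, "FRIEND", 3)]


def friends_graph(edges, person_id):
    return [head for tail, label, head in edges
            if tail == person_id and label == "FRIEND"]


def friend_of(edges, person_id):
    # The reverse question is the same kind of lookup on edges; with the
    # comma-separated column it means scanning and parsing every row.
    return [tail for tail, label, head in edges
            if head == person_id and label == "FRIEND"]
```

Both representations hold the same facts, but only the edge list lets a relationship carry its own label and be followed in either direction without string handling.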
NO-SQL (Not-only SQL) Movement
 NO-SQL DEFINITION:
 Next Generation Databases mostly addressing
some of the points: being non-relational,
distributed, open-source and horizontally
scalable.
 Began in early 2009 and is growing rapidly.
 Characterized by:
 schema-free, ✔Neo4j
 easy replication support, ✔Neo4j
 simple API, ✔Neo4j
 eventually consistent / BASE (not ACID), ✗Neo4j conforms to ACID!
 a huge amount of data
 …and more.
@see: http://nosql-database.org/
Modeling Linked Data: NO-SQL
Neo4j
“An embedded, disk-based, fully
transactional Java persistence engine
that stores data structured in graphs
rather than in tables.”
– Neo Technology
Neo4j: Introduction
 Open-source codebase.
 Baked-in licensing flexibility:
 GPL: “If your app is free, Neo4j is free. If not,
there is a fee”.
 Feb 2010 – v1.0 released.
 Neo Technology
 CEO: Emil Eifrem (@see: http://www.youtube.com/watch?v=q9m_5xiGrf4)
Neo4j: Understanding the Architecture
@see: http://docs.neo4j.org/chunked/stable/tutorials.html
Neo4j: An Accessible API
Updating operations:
Transaction Wrapper (ACID):
@see: http://docs.neo4j.org/chunked/stable/tutorials.html
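The slide's code screenshot is not part of this transcript. As a stand-in, the sketch below shows the general idea of wrapping updating operations in a transaction so that a failure rolls everything back. It is a hypothetical in-memory store written in Python, not Neo4j's API, and it illustrates atomicity only (not isolation or durability).

```python
import copy
from contextlib import contextmanager


class TinyGraphStore:
    """Hypothetical in-memory store used only to illustrate the wrapper."""

    def __init__(self):
        self.nodes = {}   # node id -> property dict
        self.edges = []   # (tail id, label, head id)

    @contextmanager
    def transaction(self):
        # Snapshot the state up front so a failure can restore it.
        snapshot = (copy.deepcopy(self.nodes), list(self.edges))
        try:
            yield self
        except Exception:
            self.nodes, self.edges = snapshot   # roll back every change
            raise


store = TinyGraphStore()

# All updating operations happen inside the wrapper and commit together.
with store.transaction() as tx:
    tx.nodes[1] = {"name": "Alice"}
    tx.nodes[2] = {"name": "Bob"}
    tx.edges.append((1, "KNOWS", 2))

# A failure part-way through leaves no trace of the partial update.
try:
    with store.transaction() as tx:
        tx.nodes[3] = {"name": "Carol"}
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass
```

After the second block, the store still contains only the two nodes and one edge written by the first transaction.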
Neo4j: Querying via Pattern Matching
 “A Traversal —navigates→ a Graph; it —
identifies→ Paths —which order→ Nodes”
 Functional front-end manager application:
 HTTP console (uses REST)
 Cypher (a declarative graph query language)
 Gremlin (an imperative, XPath-oriented, Turing-complete graph programming language)
 Result visualizations
 Query framework plugins for specific use cases:
 e.g. SPARQL
@see: http://docs.neo4j.org/chunked/stable/tutorials.html
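The two schools of querying can be contrasted on a toy graph. In this sketch, `traverse` walks outward from a start node step by step (the imperative, traversal-based style), while `match_two_hop` describes a shape and collects every place it occurs, roughly what a declarative pattern such as (a)-[:KNOWS]->(b)-[:KNOWS]->(c) expresses. The edge list and both function names are hypothetical; neither is Cypher or Gremlin.

```python
from collections import deque

# Hypothetical graph stored as (tail, label, head) edges.
edges = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("carol", "KNOWS", "dave"),
    ("alice", "WORKS_AT", "acme"),
]


def traverse(edges, start, label, max_depth):
    """Traversal-based: breadth-first walk from a start node along edges
    with the given label, returning nodes in the order they are reached."""
    seen, order = {start}, []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue   # do not expand beyond the depth limit
        for tail, lbl, head in edges:
            if tail == node and lbl == label and head not in seen:
                seen.add(head)
                order.append(head)
                queue.append((head, depth + 1))
    return order


def match_two_hop(edges, label):
    """Pattern-matched: find every binding of a -> b -> c over the given
    label, wherever it occurs in the graph."""
    return [
        (a, b, c)
        for a, l1, b in edges if l1 == label
        for b2, l2, c in edges if l2 == label and b2 == b
    ]


friends_within_two_hops = traverse(edges, "alice", "KNOWS", max_depth=2)
two_hop_chains = match_two_hop(edges, "KNOWS")
```

The traversal needs a starting node and says how to move; the pattern match needs no starting node and says only what the result should look like.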
Neo4j: Querying with Cypher (in Java!)
@see: http://docs.neo4j.org/chunked/stable/tutorials-cypher-java.html
Neo4j: The Good
 Server-side Plugins
 SPARQL plugin for Semantic Pattern Matching of
RDF triples.
 Caching plugins
 Visualization plugins
 Big-Data plugins
 many more and growing…
Neo4j: The Bad
 RESTful calls to the standalone server are slow compared with the embedded Java API.
 Cypher is a whole new language.
 SPARQL support is in its infancy.
 Scaling up requires some imaginative re-tooling
of the Property Graph model.
 Indexing is limited to Nodes and Relationships.
Neo4j: Visualization of Node-space Partition
:::Live Demonstration & Discussions:::


Editor's Notes

  1. SQL anti-patterns (http://www.scribd.com/doc/2670985/SQL-Antipatterns): Comma Separated Columns, Multi-Attribute Tables, Entity Attribute Value.
  2. ACID:
     Atomic: Everything in a transaction succeeds, or the entire transaction is rolled back.
     Consistent: A transaction cannot leave the database in an inconsistent state.
     Isolated: Transactions cannot interfere with each other.
     Durable: Completed transactions persist, even when servers restart.
     BASE:
     Basically Available, Soft state, Eventual consistency. Rather than requiring consistency after every transaction, it is enough for the database to eventually be in a consistent state. (Accounting systems do this all the time; it is called “closing out the books.”) It is acceptable to use stale data and to give approximate answers.
  3. Not Only SQL : Pick the right tool for the job.