Graph Databases
&
Neo4j
Girish Khanzode
Graph Databases
• Graph Based NoSQL Database
• Property Graph Model
• Neo4j
• Neo4j Architecture
• Data Storage
• Programmatic Data Access
• Core API
• Lucene
• Auto Index lifecycle
• Traversers API
• Cypher
• Graph Algorithms
• Neo4j HA
• Cache Sharding
• References
Graphs
• A collection of nodes (things) and edges (relationships) that
connect pairs of nodes
– Suitable for any data that is related
• Can attach properties (key-value pairs) on nodes and
relationships
• Relationships connect two nodes and both nodes and
relationships can hold an arbitrary amount of key-value pairs
Graph Relations are Universal
Graph
Graphs
• Well-understood patterns and algorithms
– Studied since Leonhard Euler's Seven Bridges of Königsberg (1736)
– Codd's Relational Model (1970)
• Knowledge graph - beyond links, search is smarter when considering how things
are related
• Facebook graph search – people interested in finding things in their part of the
world
• Bing + Britannica: referencing and cross-referencing
• People - relationships to people, to organizations, to places, to things - personal
graph
A Graph Database
• Relationships are first-class citizens
• NoSQL database optimized for connected data
– Social networking, logistics networks, recommendation engines
– Relationships are as important as the records
– Up to 1000 times faster than an RDBMS for connected-data queries (see Performance Experiment)
• Uses graph structures with nodes, edges and properties to store data
• Graph databases - Neo4j, InfoGrid and OrientDB (open source), InfiniteGraph (commercial)
• Very fast querying across records
Graph Database
A Graph Database
• Transactional with the usual operations
• RDBMS - can tell sales in last year
• Graph database – can tell customer which book to buy next
• Index-free adjacency
– Every node holds direct pointers to its adjacent elements, so traversal needs no index lookup
• Edges hold most of the important information and relations
– nodes to other nodes
– nodes to properties
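A minimal sketch of index-free adjacency, assuming a simplified in-memory model. The class and method names (AdjacencySketch, connect, neighbours) are hypothetical and only illustrate the idea; they are not Neo4j's implementation or API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model for illustration; not Neo4j's implementation.
final class AdjacencySketch {

    static final class Node {
        final Map<String, Object> properties = new HashMap<>();
        // Direct references to attached relationships: no index lookup needed.
        final List<Rel> relationships = new ArrayList<>();
    }

    static final class Rel {
        final Node start;
        final Node end;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Rel(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Creates a directed relationship and registers it on both end points.
    static Rel connect(Node start, Node end, String type) {
        Rel rel = new Rel(start, end, type);
        start.relationships.add(rel);
        if (end != start) {
            end.relationships.add(rel);
        }
        return rel;
    }

    // Finds adjacent nodes by following the references held by the node itself.
    static List<Node> neighbours(Node node) {
        List<Node> result = new ArrayList<>();
        for (Rel rel : node.relationships) {
            result.add(rel.start == node ? rel.end : rel.start);
        }
        return result;
    }

    public static void main(String[] args) {
        Node doctor = new Node();
        doctor.properties.put("name", "the Doctor");
        Node rose = new Node();
        rose.properties.put("first name", "Rose");
        connect(rose, doctor, "COMPANION_OF");
        System.out.println(neighbours(doctor).get(0).properties.get("first name"));
    }
}
```

The cost of finding a node's neighbours here depends only on how many relationships that node has, not on the total size of the graph.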
Graph Based NoSQL Database
• No rigid SQL schema or table-and-column representation
• Uses a flexible graphical representation - addresses scalability concerns
• Data can be easily transformed from one model to the other using a
graph based NoSQL database
• Nodes are organised by some relationships with one another represented
by edges between the nodes
• Both nodes and the relationships have some defined properties
Graph Based NoSQL Database
• Labelled, directed, attributed multi-graph - the graph contains labelled
nodes that carry properties, and the relationships between nodes are
shown as directional edges
• While a relational database model can replicate a graph model, every
edge traversal requires a join, which is costly
Advantages
• Easier Relationships Analysis
• Very fast for associative data sets
– Like social networks
• Map more directly to object oriented applications
– Object classification and Parent->Child relationships
Disadvantages
• If the data is mostly tabular with few relationships between
records, graph databases do not fare well
• OLAP support for graph databases is not mature
Performance Experiment
• Check whether a path exists between two people in a social network
• 1000 persons
• Average 50 friends per person
• pathExists(a, b) limited to depth 4
Database              # persons    Query time
Relational database   1000         2000 ms
Neo4j                 1000         2 ms
Neo4j                 1000000      2 ms
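The pathExists(a, b) check from the experiment can be sketched as a depth-limited breadth-first search. This is an illustrative version over a plain adjacency map, not the code used in the experiment; the names PathExistsSketch and addFriendship are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: depth-limited breadth-first search over friendships.
final class PathExistsSketch {

    // Records an undirected friendship between two person ids.
    static void addFriendship(Map<Integer, List<Integer>> friends, int a, int b) {
        friends.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        friends.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Returns true if b is reachable from a in at most maxDepth hops.
    static boolean pathExists(Map<Integer, List<Integer>> friends,
                              int a, int b, int maxDepth) {
        if (a == b) {
            return true;
        }
        Set<Integer> visited = new HashSet<>();
        visited.add(a);
        List<Integer> frontier = Collections.singletonList(a);
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<Integer> next = new ArrayList<>();
            for (int person : frontier) {
                for (int friend : friends.getOrDefault(person, Collections.<Integer>emptyList())) {
                    if (friend == b) {
                        return true;
                    }
                    if (visited.add(friend)) {
                        next.add(friend); // expand this person at the next depth
                    }
                }
            }
            frontier = next;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> friends = new HashMap<>();
        for (int i = 0; i < 5; i++) {
            addFriendship(friends, i, i + 1); // chain 0-1-2-3-4-5
        }
        System.out.println(pathExists(friends, 0, 4, 4));
        System.out.println(pathExists(friends, 0, 5, 4));
    }
}
```

Each step only touches the friends of people already reached, which is why the work depends on the neighbourhood explored rather than on the total number of persons.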
Property Graph Model
name: the Doctor
age: 907
species: Time Lord
first name: Rose
last name: Tyler
vehicle: Skoda
model: Type 40
Graphs - Whiteboard-friendly
• No decomposition, ER design, or normalization / denormalization,
as an RDBMS requires
Neo4j
• A Graph Database
• A Property Graph containing Nodes, Relationships with Properties on
both
• Manage complex, highly connected data
• Scalable - High-performance with High-Availability
– Traverse 1,000,000+ relationships / second on commodity hardware
• Server with REST API, or Embeddable on the JVM
Neo4j
• Full ACID transactions
• Schema free, bottom-up data model design
• Stable
• Easier than RDBMS since no need for normalization
• Implemented in Java
• Open Source
Neo4j
• Schema free – Data does not have to adhere to any convention
• Support for a wide variety of languages - Java, Python, Perl, Scala, plus the Cypher query language
• A graph database can be thought of as a key-value store, with full support
for relationships.
• Graph databases don’t avoid design efforts
• Good design still requires effort
Why Neo4j?
• The internet is a network of pages connected to each other.
What is a better way to model that than in graphs?
• No time lost fighting with less expressive data-stores
• Easy to implement experimental features
• A single instance of Neo4j can house at most 34 billion nodes,
34 billion relationships and 68 billion properties
Neo4j Architecture
[Figure: Software Architecture. Components shown: language clients (Java, Ruby, Clojure…), REST API, JVM Language Bindings, Graph Matching, Traversal Framework, Core API, Caches, Memory-Mapped (N)IO, Filesystem]
Data Storage
• Neo4j stores graph data in a number of different store files
• Each store file contains the data for a specific part of the
graph
– neostore.nodestore.db
– neostore.relationshipstore.db
– neostore.propertystore.db
– neostore.propertystore.db.index
– neostore.propertystore.db.strings
– neostore.propertystore.db.arrays
Node Store
• Size: 9 bytes
– 1st byte - in-use flag
– Next 4 bytes - ID of first relationship
– Last 4 bytes - ID of first property of node
• Fixed size records enable fast lookups
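Why fixed-size records make lookups fast can be sketched as follows: the byte offset of a record is just the node ID times the record size, so no search is needed. The field order follows the slide; the byte order, the int-sized IDs and the class name NodeRecordSketch are assumptions for illustration, not Neo4j's actual on-disk encoding.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a 9-byte node record; not Neo4j's real file format.
final class NodeRecordSketch {

    static final int RECORD_SIZE = 9; // 1 flag byte + 4 bytes + 4 bytes

    // A record's position is computed directly from the node id.
    static long offsetOf(long nodeId) {
        return nodeId * RECORD_SIZE;
    }

    // Packs one record: in-use flag, first relationship id, first property id.
    static byte[] encode(boolean inUse, int firstRelId, int firstPropId) {
        return ByteBuffer.allocate(RECORD_SIZE)
                .put((byte) (inUse ? 1 : 0))
                .putInt(firstRelId)
                .putInt(firstPropId)
                .array();
    }

    static boolean inUse(byte[] store, int nodeId) {
        return ByteBuffer.wrap(store).get((int) offsetOf(nodeId)) == 1;
    }

    static int firstRelationshipId(byte[] store, int nodeId) {
        return ByteBuffer.wrap(store).getInt((int) offsetOf(nodeId) + 1);
    }

    static int firstPropertyId(byte[] store, int nodeId) {
        return ByteBuffer.wrap(store).getInt((int) offsetOf(nodeId) + 5);
    }

    public static void main(String[] args) {
        byte[] store = new byte[2 * RECORD_SIZE];
        System.arraycopy(encode(true, 7, 11), 0, store, (int) offsetOf(0), RECORD_SIZE);
        System.arraycopy(encode(true, 42, 43), 0, store, (int) offsetOf(1), RECORD_SIZE);
        System.out.println(offsetOf(100));
        System.out.println(firstRelationshipId(store, 1));
    }
}
```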
Relationship store
• neostore.relationshipstore.db
• Size: 33 bytes
• 1st byte - In use flag
• Next 8 bytes - IDs of the nodes at the start and end of the relationship
• 4 bytes - Pointer to the relationship type
• 16 bytes - pointers to the next and previous relationship records for each of the start and end nodes (the
relationship chain)
• 4 bytes - next property id
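The field sizes above add up to the 33-byte record, and the next-relationship pointers let all relationships of a node be found by walking a chain. A minimal sketch, assuming an in-memory array instead of the store file; RelationshipChainSketch, RelRecord and the NONE marker are hypothetical names, and only the "next" pointers are modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of walking a relationship chain; not Neo4j's code.
final class RelationshipChainSketch {

    static final int NONE = -1; // assumed marker for "no further record"

    static final class RelRecord {
        final int startNode;
        final int endNode;
        final int startNext; // next relationship in the start node's chain
        final int endNext;   // next relationship in the end node's chain

        RelRecord(int startNode, int endNode, int startNext, int endNext) {
            this.startNode = startNode;
            this.endNode = endNode;
            this.startNext = startNext;
            this.endNext = endNext;
        }
    }

    // Field sizes from the slide: flag, node ids, type, chain pointers, property.
    static int recordSize() {
        return 1 + 8 + 4 + 16 + 4;
    }

    // Collects the ids of all relationships attached to nodeId by following
    // the chain that starts at the node's first relationship.
    static List<Integer> relationshipsOf(int nodeId, int firstRelId, RelRecord[] store) {
        List<Integer> ids = new ArrayList<>();
        int rel = firstRelId;
        while (rel != NONE) {
            ids.add(rel);
            RelRecord record = store[rel];
            rel = (record.startNode == nodeId) ? record.startNext : record.endNext;
        }
        return ids;
    }

    public static void main(String[] args) {
        RelRecord[] store = {
            new RelRecord(0, 1, 1, 2),       // rel 0: node 0 -> node 1
            new RelRecord(0, 2, NONE, 2),    // rel 1: node 0 -> node 2
            new RelRecord(1, 2, NONE, NONE)  // rel 2: node 1 -> node 2
        };
        System.out.println(recordSize());
        System.out.println(relationshipsOf(1, 0, store));
    }
}
```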
Relationships Storage
Data Size
nodes                2^35 (∼ 34 billion)
relationships        2^35 (∼ 34 billion)
properties           2^36 to 2^38 depending on property types (maximum ∼ 274 billion, always at least ∼ 68 billion)
relationship types   2^15 (∼ 32 000)
Neo4j API – Logical View
Programmatic Data Access
• Java APIs - JVM languages bind to the same APIs
• JRuby, Jython, Clojure, Scala…
• Manage nodes and relationships
• Indexing – find data without traversal
• Traversing
• Path finding
• Pattern matching
Core API
• Deals with graphs in terms of their fundamentals
• Nodes - properties
– KV Pairs
• Relationships
– Start node
– End node
– Properties
• KV Pairs
Create Node
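import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

// Open (or create) an embedded database stored at /tmp/neo
GraphDatabaseService db = new EmbeddedGraphDatabase("/tmp/neo");
Transaction tx = db.beginTx();
try {
    Node theDoctor = db.createNode();
    theDoctor.setProperty("character", "the Doctor");
    tx.success();   // mark the transaction as successful
} finally {
    tx.finish();    // commits if success() was called, otherwise rolls back
}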
Create Relationships
Transaction tx = db.beginTx();
try {
    Node theDoctor = db.createNode();
    theDoctor.setProperty("character", "the Doctor");
    Node susan = db.createNode();
    susan.setProperty("firstname", "Susan");
    susan.setProperty("lastname", "Campbell");
    // Directed relationship: susan -[COMPANION_OF]-> theDoctor
    susan.createRelationshipTo(theDoctor, DynamicRelationshipType.withName("COMPANION_OF"));
    tx.success();
} finally {
    tx.finish();
}
Index a Graph
• Graphs themselves are indexes
• Can create short-cuts to well-known nodes
• In program, keep a reference to any interesting node
• Indexes offer flexibility in what constitutes an “interesting
node”
Lucene
• The default index implementation for Neo4j
– Default implementation for IndexManager
• Supports many indexes per database
• Each index supports nodes or relationships
• Supports exact and query-based (e.g. wildcard) matching
• Supports scoring
– Number of hits in the index for a given item
– Great for recommendations
Create a Node Index
GraphDatabaseService db = …
// Index<Node> is the index type, "planets" is the index name.
// forNodes() creates the index, or retrieves it if it already exists.
Index<Node> planets = db.index().forNodes("planets");
Create a Relationship Index
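GraphDatabaseService db = …
// Index<Relationship> is the index type, "enemies" is the index name.
// forRelationships() creates the index, or retrieves it if it already exists.
Index<Relationship> enemies = db.index().forRelationships("enemies");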
Exact Matches
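GraphDatabaseService db = …
Index<Node> actors = db.index().forNodes("actors");
// get(key, value): "actor" is the key, "Roger Delgado" is the value to match.
// getSingle() returns the single matching node, or null if there is none.
Node rogerDelgado = actors.get("actor", "Roger Delgado").getSingle();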
Query Matches
GraphDatabaseService db = …
Index<Node> species = db.index().forNodes("species");
// query(key, query): "species" is the key, "S*n" is the wildcard query.
IndexHits<Node> speciesHits = species.query("species", "S*n");
Transactions to Mutate Indexes
• Mutating access is still protected by transactions, which cover both the index and the graph
GraphDatabaseService db = …
Transaction tx = db.beginTx();
try {
    Node nixon = db.createNode();
    nixon.setProperty("character", "Richard Nixon");
    // Add the node to the "characters" index under the key "character"
    db.index().forNodes("characters").add(nixon,
        "character", nixon.getProperty("character"));
    tx.success();
} finally {
    tx.finish();
}
Auto Index lifecycle
• Auto Index - stays consistent with the graph data
• Specify the property names to index when configuring the auto indexer
• If a node, relationship or property is removed from the graph, it is also
removed from the index
• If the database is started with auto indexing enabled but with different
auto-indexed properties than in the last run, already auto-indexed entities
are removed from the index as they are worked upon
• Re-indexing is a manual operation
– Existing properties are not indexed unless touched
Auto Index lifecycle
AutoIndexer<Node> nodeAutoIndex = graphDb.index().getNodeAutoIndexer();
nodeAutoIndex.startAutoIndexingProperty("species");
nodeAutoIndex.setEnabled( true );
ReadableIndex<Node> autoNodeIndex = graphDb.index().getNodeAutoIndexer().getAutoIndex();
Node and Relationship Auto Indexes Supported
Core API
• Basic (nodes, relationships)
• Fast
• Imperative
• Flexible - Easily intermix mutating operations
Traversers API
• Mechanism for querying the graph by navigating from a starting node to
related nodes according to an algorithm
• Expressive
• Fast
• Declarative (mostly)
• Opinionated
Cypher - A Graph Query Language
• Query Language for Neo4j
• A declarative graph pattern matching language
– SQL for graphs
– Tabular results
• aggregation, ordering and limits
• Mutating operations
• CRUD
• Easy to formulate queries based on relationships
• Many features stem from improving pain points of SQL like join tables
Cypher - A Graph Query Language
Cypher Query
• Query:
MATCH (n:Crew)-[r:KNOWS*]-(m)
WHERE n.name = 'Neo'
RETURN n AS Neo, r, m
Operations
• Aggregation - COUNT, SUM, AVG, MAX, MIN, COLLECT
• Where clause
start doctor=node:characters(name = 'Doctor')
match (doctor)<-[:PLAYED]-(actor)-[:APPEARED_IN]->(episode)
where actor.actor = 'Tom Baker' and episode.title =~ /.*Dalek.*/
return episode.title
• Ordering
– order by <property>
– order by <property> desc
Graph Algorithms
• Neo4j has built-in algorithms
• Callable through JVM and REST APIs
• Higher level of abstraction
• Graph Matching
– Look for patterns in a data set - retail analytics
– Higher-level abstraction than raw traversers
• REST API
– Access the server
• Binary protocol
– JSON as default format
Neo4j HA - High Availability Cluster
• A scalability package known as high availability or HA that
uses a master-slave cluster architecture
– Full data redundancy
– Service fault tolerance
– Linear read scalability
– Master-slave replication
• Single data-centre or global zones
– tolerance for high-latency
Neo4j HA
• Redundancy - improved uptime
– automatic failover
• In a Neo4j HA cluster the full graph (the entire dataset) is replicated to
each instance in the cluster
• Read operations can be done locally on each slave
• Read capacity of the HA cluster increases linearly with the number of
servers
Neo4j HA
HA Cluster Architecture
• Cluster performs automatic master election
• Supports master-slave replication for clustering and DR
across sites
HA Cluster Architecture
Write to a Master
• All write operations are co-ordinated by the master
• Writes to the master are fast
• Slaves eventually catch up
Write to a Master
Write to a Slave
• Writes to a slave cause a synchronous transaction
with the master
• Other slaves eventually catch up
Write to a Slave
Server Overload Problem
• Unlike other classes of NOSQL database, a graph does not
have predictable lookup since it is a highly mutable structure
• We want to co-locate related nodes for traversal
performance, but we don’t want to place so many connected
nodes on the same database that it becomes heavily loaded
• The black-hole problem - popular nodes get lumped together
on a single instance, but there is low point cut
Server Overload Problem
Thinly Spread Network
• The opposite also holds: connected nodes should not be spread too widely
across database instances, because traversals that cross the (relatively
high-latency) network incur a substantial performance penalty at runtime
• Load-leveling alone can lead to many relationships crossing instances
• These are very expensive to traverse, since network hops are many orders of
magnitude slower than in-memory traversals
Thinly Spread Network
Minimal Point Cut
• The best approach is to balance a graph across database instances by
creating a minimum point cut for a graph, where graph nodes are placed
such that there are few relationships that span shards
• Good strategy is to take a local view of the graph (no global locks) and
work incrementally (short bursts)
• Take into account use patterns
• Unlike other NoSQL stores, graphs are not predictable, so we cannot use
techniques like consistent hashing for scale out
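How good a placement is can be measured by counting the relationships that span shards. The sketch below only evaluates a given node-to-shard assignment; it does not compute an optimal partition. The names CutSizeSketch, cutSize and shardOf are hypothetical.

```java
// Hypothetical sketch: counts relationships whose end points live on different shards.
final class CutSizeSketch {

    // relationships[i] = {startNodeId, endNodeId}; shardOf[nodeId] = shard number.
    static int cutSize(int[][] relationships, int[] shardOf) {
        int cut = 0;
        for (int[] rel : relationships) {
            if (shardOf[rel[0]] != shardOf[rel[1]]) {
                cut++; // traversing this relationship crosses the network
            }
        }
        return cut;
    }

    public static void main(String[] args) {
        // Two triangles (0,1,2) and (3,4,5) joined by the single relationship 2-3.
        int[][] rels = {{0, 1}, {1, 2}, {0, 2}, {3, 4}, {4, 5}, {3, 5}, {2, 3}};
        int[] clustered = {0, 0, 0, 1, 1, 1};   // each triangle on its own shard
        int[] alternating = {0, 1, 0, 1, 0, 1}; // load-levelled without regard to structure
        System.out.println(cutSize(rels, clustered));
        System.out.println(cutSize(rels, alternating));
    }
}
```

Both assignments put three nodes on each shard, yet the clustered one leaves far fewer relationships crossing instances.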
Minimal Point Cut
Cache Sharding
• A strategy for large data sets of terabyte scale
• Mandates consistent request routing
• For instance, requests for user A are always sent to server 1,
while requests for user B are always sent to server 2 and so on
• The key assumption is that requests for user A typically touch
parts of the graph around user A, such as his or her friends,
preferences, likes and so on
Cache Sharding
• This means that the neighbourhood of the graph around user
A will be cached on server 1, while the neighbourhood around
user B will be cached on server 2
• By employing consistent routing of requests, the caches of all
servers in the HA cluster can be utilized maximally
• Strategy is highly effective for managing a large graph that
does not fit in RAM
Consistent Routing
• Always try to route related requests to the same server to hopefully
benefit from warm caches
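A minimal sketch of consistent request routing, assuming a fixed number of servers and a string user ID. It uses plain modulo hashing, which is the simplest way to send the same user to the same server; the names ConsistentRoutingSketch and serverFor are hypothetical, and a real deployment would normally do this in a load balancer.

```java
// Hypothetical sketch: deterministic routing of a user's requests to one server.
final class ConsistentRoutingSketch {

    // Returns a server number in [0, serverCount) that is stable for a given user id.
    static int serverFor(String userId, int serverCount) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(userId.hashCode(), serverCount);
    }

    public static void main(String[] args) {
        int servers = 3;
        // Repeated requests for the same user land on the same server,
        // so that server's cache stays warm for the user's neighbourhood.
        System.out.println(serverFor("userA", servers) == serverFor("userA", servers));
    }
}
```

If the number of servers changes, most users are routed to a different server under this scheme, so caches have to warm up again.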
Domain Specific Sharding
• Graphs are not as easy to shard as documents or KV stores
• High-performance graph databases are limited to the data set size that
a single machine can handle
• Replicas speed up reads and improve availability, but the data set size is
still limited to a single machine's disk/memory
• No perfect algorithm exists, but an expert's domain insight helps
Domain Specific Sharding
• Some domains can shard easily (geo, most web apps) using consistent
routing approach and cache sharding
– Geo - where the connections between cities are few compared with the
connections within the cities. So can place cities or countries on different
nodes
• Eventually, petabyte-scale data cannot practically be replicated
• Need to shard data across machines
References
1. http://www.neo4j.org
2. http://www.neo4j.org/learn/cypher
3. Bachman, Michal (2013). GraphAware - Towards Online Analytical Processing in Graph Databases.
http://graphaware.com/assets/bachman-msc-thesis.pdf
4. Hunger, Michael (2012). Cypher and Neo4j. http://vimeo.com/83797381
5. Mistry, Deep. Neo4j: A Developer's Perspective.
http://osintegrators.com/opensoftwareintegrators%7Cneo4jadevelopersperspective
6. MapGraph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs
7. Parallel Breadth First Search on GPU Clusters
8. DB-Engines Ranking of Graph DBMS
Thank You
Check Out My LinkedIn Profile at
https://in.linkedin.com/in/girishkhanzode

Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 

Recently uploaded (20)

Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 

Graph Databases

  • 2. Graph Databases • Graph Based NoSQL Database • Property Graph Model • Neo4j • Neo4j Architecture • Data Storage • Programmatic Data Access • Core API • Lucene • Auto Index lifecycle • Traversers API • Cypher • Graph Algorithms • Neo4j HA • Cache Sharding • References
  • 3. Graphs • A collection of nodes (things) and edges (relationships) that connect pairs of nodes – Suitable for any data that is related • Properties (key-value pairs) can be attached to both nodes and relationships • Each relationship connects two nodes; nodes and relationships can each hold an arbitrary number of key-value pairs
  • 6. Graphs • Well-understood patterns and algorithms – Studied since Leonhard Euler's Seven Bridges of Königsberg (1736) – Codd's Relational Model (1970) • Knowledge graph - beyond links, search is smarter when it considers how things are related • Facebook graph search – people interested in finding things in their part of the world • Bing + Britannica: referencing and cross-referencing • People - relationships to people, to organizations, to places, to things - personal graph
  • 7. A Graph Database • Relationships are first-class citizens • NoSQL database optimized for connected data – Social networking, logistics networks, recommendation engines – Relationships are as important as the records – Can be up to about 1000 times faster than an RDBMS for connected-data queries (see the performance experiment on slide 14) • Uses graph structures with nodes, edges and properties to store data • Graph databases - Neo4j, OrientDB and InfoGrid (open source), InfiniteGraph (commercial) • Very fast querying across records
  • 9. A Graph Database • Transactional with the usual operations • RDBMS - can tell you last year's sales • Graph database – can tell a customer which book to buy next • Index-free adjacency – Every node holds direct pointers to its adjacent elements, so no index lookup is needed to follow a relationship • Edges hold most of the important information and relations – nodes to other nodes – nodes to properties
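Index-free adjacency can be illustrated with a small in-memory structure. The sketch below is only an illustration under assumptions: the class and method names (`AdjacencySketch`, `connect`, `neighbours`) are hypothetical and this is not how Neo4j itself is implemented. It shows each node keeping direct references to its relationships, so finding neighbours needs no separate index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory property graph; illustrative only, not Neo4j code. */
final class AdjacencySketch {

    static final class Node {
        final Map<String, Object> properties = new HashMap<>();
        // Direct references to attached relationships: the "index-free" part.
        final List<Relationship> relationships = new ArrayList<>();
    }

    static final class Relationship {
        final Node start;
        final Node end;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Relationship(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Creates a directed relationship and registers it on both end points. */
    static Relationship connect(Node start, Node end, String type) {
        Relationship r = new Relationship(start, end, type);
        start.relationships.add(r);
        if (end != start) {
            end.relationships.add(r);
        }
        return r;
    }

    /** Neighbours are found by following the node's own references. */
    static List<Node> neighbours(Node n) {
        List<Node> result = new ArrayList<>();
        for (Relationship r : n.relationships) {
            result.add(r.start == n ? r.end : r.start);
        }
        return result;
    }
}
```

The cost of `neighbours` depends only on how many relationships the node has, not on the total size of the graph.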
  • 10. Graph Based NoSQL Database • No rigid format of SQL or the tables and columns representation • Uses a flexible graph representation - addresses scalability concerns • Data can be easily transformed from one model to the other using a graph based NoSQL database • Nodes are related to one another through relationships, represented by edges between the nodes • Both nodes and relationships can carry defined properties
  • 11. Graph Based NoSQL Database • Labelled, directed, attributed multi-graph - the graph contains nodes that are labelled and carry properties, and the relationships between nodes are shown as directed edges • While a relational database can model a graph, following each edge requires a join, which is costly
  • 12. Advantages • Easier Relationships Analysis • Very fast for associative data sets – Like social networks • Map more directly to object oriented applications – Object classification and Parent->Child relationships
  • 13. Disadvantages • If data is just tabular with not much relationship between the data, graph databases do not fare well • OLAP support for graph databases not mature
  • 14. Performance Experiment • Determine whether a path exists between two people in a social network • 1000 persons • Average 50 friends per person • pathExists(a, b) limited to depth 4
      Database              # persons   Query time
      Relational database   1,000       2000 ms
      Neo4j                 1,000       2 ms
      Neo4j                 1,000,000   2 ms
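The slide does not show how `pathExists(a, b)` was implemented in either system. As a rough sketch of what a depth-limited check could look like, the following uses a breadth-first search over an adjacency map. The class name, the map representation and the `maxDepth` parameter are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Hypothetical depth-limited reachability check; not the code used in the experiment. */
final class PathExistsSketch {

    /** Returns true if b can be reached from a by following at most maxDepth friendships. */
    static boolean pathExists(Map<Integer, List<Integer>> friends, int a, int b, int maxDepth) {
        if (a == b) {
            return true;
        }
        Set<Integer> visited = new HashSet<>();
        visited.add(a);
        Queue<Integer> frontier = new ArrayDeque<>();
        frontier.add(a);
        // Expand one level of the friendship graph per iteration.
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Queue<Integer> next = new ArrayDeque<>();
            for (int person : frontier) {
                List<Integer> personFriends = friends.getOrDefault(person, Collections.emptyList());
                for (int friend : personFriends) {
                    if (friend == b) {
                        return true;
                    }
                    if (visited.add(friend)) {
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return false;
    }
}
```

With an average of 50 friends per person, each extra level multiplies the frontier, which is why the depth limit matters for both systems.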
  • 15. Property Graph Model (figure; example property sets) • name: the Doctor, age: 907, species: Time Lord • first name: Rose, last name: Tyler • vehicle: Skoda, model: Type 40
  • 16. Graphs - Whiteboard-friendly • No decomposition, ER design, normalization / denormalization as needed with RDBMS
  • 17. Neo4j • A Graph Database • A Property Graph containing Nodes, Relationships with Properties on both • Manage complex, highly connected data • Scalable - High-performance with High-Availability – Traverse 1,000,000+ relationships / second on commodity hardware • Server with REST API, or Embeddable on the JVM
  • 18. Neo4j • Full ACID transactions • Schema free, bottom-up data model design • Stable • Easier than RDBMS since no need for normalization • Implemented in Java • Open Source
  • 19. Neo4j • Schema free – Data does not have to adhere to any convention • Support for a wide variety of languages - Java, Python, Perl, Scala, plus the Cypher query language • A graph database can be thought of as a key-value store with full support for relationships • Graph databases do not remove the need for design - good design still requires effort
  • 20. Why Neo4J? • The internet is a network of pages connected to each other. What is a better way to model that than in graphs? • No time lost fighting with less expressive data-stores • Easy to implement experimental features • A single instance of Neo4j can house about 34 billion nodes, 34 billion relationships and at least 68 billion properties (see slide 27)
  • 21. Neo4j Architecture (diagram) • Language bindings - Java, Ruby, Clojure… • REST API • JVM Language Bindings • Core API • Traversal Framework • Graph Matching • Caches • Memory-Mapped (N)IO • Filesystem
  • 23. Data Storage • Neo4j stores graph data in a number of different store files • Each store file contains the data for a specific part of the graph – neostore.nodestore.db – neostore.relationshipstore.db – neostore.propertystore.db – neostore.propertystore.db.index – neostore.propertystore.db.strings – neostore.propertystore.db.arrays
  • 24. Node Store • Size: 9 bytes – 1st byte - in-use flag – Next 4 bytes - ID of first relationship – Last 4 bytes - ID of first property of node • Fixed size records enable fast lookups
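Fixed-size records make lookups fast because a record's position is simply its ID multiplied by the record size. The sketch below follows the simplified 9-byte layout given on the slide (1 in-use byte, 4-byte first relationship ID, 4-byte first property ID). The class and method names are hypothetical, and the real store file format may pack these fields differently.

```java
import java.nio.ByteBuffer;

/** Hypothetical encoder/decoder for the 9-byte node record described on the slide. */
final class NodeRecordSketch {

    static final int RECORD_SIZE = 9;

    /** Byte offset of a node record: no search needed, just arithmetic. */
    static long offsetOf(long nodeId) {
        return nodeId * RECORD_SIZE;
    }

    /** Packs one record: in-use flag, first relationship ID, first property ID. */
    static byte[] encode(boolean inUse, int firstRelationshipId, int firstPropertyId) {
        ByteBuffer buffer = ByteBuffer.allocate(RECORD_SIZE);
        buffer.put((byte) (inUse ? 1 : 0));
        buffer.putInt(firstRelationshipId);
        buffer.putInt(firstPropertyId);
        return buffer.array();
    }

    static boolean inUse(byte[] store, long nodeId) {
        return store[(int) offsetOf(nodeId)] == 1;
    }

    static int firstRelationshipId(byte[] store, long nodeId) {
        // Skip the 1-byte in-use flag.
        return ByteBuffer.wrap(store).getInt((int) offsetOf(nodeId) + 1);
    }

    static int firstPropertyId(byte[] store, long nodeId) {
        // Skip the flag (1 byte) and the relationship ID (4 bytes).
        return ByteBuffer.wrap(store).getInt((int) offsetOf(nodeId) + 5);
    }
}
```

For example, the record for node 100 starts at byte 900 of the store in this layout.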
  • 25. Relationship store • neostore.relationshipstore.db • Size: 33 bytes • 1st byte - In use flag • Next 8 bytes - IDs of the nodes at the start and end of the relationship • 4 bytes - Pointer to the relationship type • 16 bytes - pointers to the next and previous relationship records for each of the start and end nodes (the relationship chain) • 4 bytes - ID of the next property record (start of the property chain)
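The next/previous pointers turn each node's relationships into a linked chain that can be walked without any index. The following sketch is a simplified model of that idea. The `Rel` class, the field names and the `NONE` sentinel are assumptions for illustration, not Neo4j's actual record classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the per-node relationship chain described on the slide. */
final class RelationshipChainSketch {

    static final int NONE = -1;

    // 1 flag + 2 node IDs (4 bytes each) + type (4) + 4 chain pointers (4 each) + property (4)
    static final int RECORD_SIZE = 1 + 4 + 4 + 4 + 4 * 4 + 4;

    static final class Rel {
        final int startNode;
        final int endNode;
        int startPrev = NONE;
        int startNext = NONE;
        int endPrev = NONE;
        int endNext = NONE;

        Rel(int startNode, int endNode) {
            this.startNode = startNode;
            this.endNode = endNode;
        }
    }

    /** Collects the IDs of all relationships of nodeId by following "next" pointers. */
    static List<Integer> relationshipsOf(Rel[] store, int nodeId, int firstRelationshipId) {
        List<Integer> ids = new ArrayList<>();
        int current = firstRelationshipId;
        while (current != NONE) {
            ids.add(current);
            Rel r = store[current];
            // Each record sits in two chains; pick the one belonging to nodeId.
            current = (r.startNode == nodeId) ? r.startNext : r.endNext;
        }
        return ids;
    }
}
```

The first relationship ID comes from the node record, which is how the node store and the relationship store link together.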
  • 27. Data Size • Nodes: 2^35 (∼ 34 billion) • Relationships: 2^35 (∼ 34 billion) • Properties: 2^36 to 2^38 depending on property types (maximum ∼ 274 billion, always at least ∼ 68 billion) • Relationship types: 2^15 (∼ 32 000)
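The rounded figures on this slide follow directly from the number of bits available for each kind of ID. A minimal sketch of the arithmetic, with a hypothetical helper name:

```java
/** Hypothetical helper that turns an ID width in bits into a maximum record count. */
final class CapacitySketch {

    /** Number of distinct IDs representable with the given number of bits. */
    static long maxIds(int bits) {
        return 1L << bits;
    }
}
```

Here `maxIds(35)` is 34,359,738,368 (about 34 billion), `maxIds(36)` is 68,719,476,736, `maxIds(38)` is 274,877,906,944 and `maxIds(15)` is 32,768.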
  • 28. Neo4j API – Logical View
  • 29. Programmatic Data Access • Java APIs - JVM languages bind to the same APIs • JRuby, Jython, Clojure, Scala… • Manage nodes and relationships • Indexing – find data without traversal • Traversing • Path finding • Pattern matching
  • 30. Core API • Deals with graphs in terms of their fundamentals • Nodes - properties – KV Pairs • Relationships – Start node – End node – Properties • KV Pairs
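  • 31. Create Node
      import org.neo4j.graphdb.GraphDatabaseService;
      import org.neo4j.graphdb.Node;
      import org.neo4j.graphdb.Transaction;
      import org.neo4j.kernel.EmbeddedGraphDatabase;

      // Open (or create) an embedded database at the given path
      GraphDatabaseService db = new EmbeddedGraphDatabase("/tmp/neo");
      Transaction tx = db.beginTx();
      try {
          Node theDoctor = db.createNode();
          theDoctor.setProperty("character", "the Doctor");
          tx.success();   // mark the transaction as successful
      } finally {
          tx.finish();    // commits if success() was called, otherwise rolls back
      }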
  • 32. Create Relationships
      Transaction tx = db.beginTx();
      try {
          Node theDoctor = db.createNode();
          theDoctor.setProperty("character", "The Doctor");
          Node susan = db.createNode();
          susan.setProperty("firstname", "Susan");
          susan.setProperty("lastname", "Campbell");
          // Directed relationship: susan -[COMPANION_OF]-> theDoctor
          susan.createRelationshipTo(theDoctor, DynamicRelationshipType.withName("COMPANION_OF"));
          tx.success();
      } finally {
          tx.finish();
      }
  • 33. Index a Graph • Graphs themselves are indexes • Can create short-cuts to well-known nodes • In program, keep a reference to any interesting node • Indexes offer flexibility in what constitutes an “interesting node”
  • 34. Lucene • The default index implementation for Neo4j – Default implementation for IndexManager • Supports many indexes per database • Each index supports nodes or relationships • Supports exact matching and query-based (pattern) matching using Lucene query syntax • Supports scoring – Number of hits in the index for a given item – Great for recommendations
  • 35. Create a Node Index
      GraphDatabaseService db = …
      // Index<Node> is the index type, "planets" is the index name;
      // forNodes() creates the index or retrieves it if it already exists
      Index<Node> planets = db.index().forNodes("planets");
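  • 36. Create a Relationship Index
      GraphDatabaseService db = …
      // Index<Relationship> is the index type, "enemies" is the index name;
      // forRelationships() creates the index or retrieves it if it already exists
      Index<Relationship> enemies = db.index().forRelationships("enemies");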
  • 37. Exact Matches
      GraphDatabaseService db = …
      Index<Node> actors = db.index().forNodes("actors");
      // get(key, value to match); getSingle() returns the single match, or null if there is none
      Node rogerDelgado = actors.get("actor", "Roger Delgado").getSingle();
  • 38. Query Matches
      GraphDatabaseService db = …
      Index<Node> species = db.index().forNodes("species");
      // query(key, query); "S*n" is a wildcard query for values that start with S and end with n
      IndexHits<Node> speciesHits = species.query("species", "S*n");
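To make the difference between exact and query matches concrete, here is a toy key/value index written with the Java standard library only. It is not Lucene and does not reproduce Lucene's analysis or scoring. The class `ToyIndexSketch` and its methods are hypothetical, and the wildcard handling (`*` for any run of characters, `?` for one character) is an assumption modelled on the `"S*n"` example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical toy index: key -> value -> entity IDs. Illustrative only. */
final class ToyIndexSketch {

    private final Map<String, Map<String, List<Long>>> entries = new HashMap<>();

    void add(long entityId, String key, String value) {
        entries.computeIfAbsent(key, k -> new HashMap<>())
               .computeIfAbsent(value, v -> new ArrayList<>())
               .add(entityId);
    }

    private Map<String, List<Long>> valuesFor(String key) {
        Map<String, List<Long>> values = entries.get(key);
        return values == null ? new HashMap<String, List<Long>>() : values;
    }

    /** Exact match: the stored value must equal the requested value. */
    List<Long> get(String key, String value) {
        List<Long> ids = valuesFor(key).get(value);
        return ids == null ? new ArrayList<Long>() : ids;
    }

    /** Wildcard match: every stored value for the key is tested against the pattern. */
    List<Long> query(String key, String wildcard) {
        Pattern pattern = Pattern.compile(toRegex(wildcard));
        List<Long> hits = new ArrayList<>();
        for (Map.Entry<String, List<Long>> entry : valuesFor(key).entrySet()) {
            if (pattern.matcher(entry.getKey()).matches()) {
                hits.addAll(entry.getValue());
            }
        }
        return hits;
    }

    /** Translates '*' and '?' wildcards into a regular expression, quoting everything else. */
    static String toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }
}
```

In this toy version, a query for `"S*n"` under the key `"species"` would match stored values such as "Silurian" and "Sontaran" but not "Dalek".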
  • 39. Transactions to Mutate Indexes • Mutating access is still protected by transactions, which cover both the index and the graph
      GraphDatabaseService db = …
      Transaction tx = db.beginTx();
      try {
          Node nixon = db.createNode();
          nixon.setProperty("character", "Richard Nixon");
          // Add the node to the "characters" index under key "character"
          db.index().forNodes("characters").add(nixon, "character", nixon.getProperty("character"));
          tx.success();
      } finally {
          tx.finish();
      }
  • 40. Auto Index lifecycle • Auto Index - stays consistent with the graph data • Specify the property names to index when setting up auto indexing • If a node, relationship or property is removed from the graph, it is removed from the index • If the database is started with auto indexing enabled but with different auto-indexed properties than the last run, then already auto-indexed entities are removed from the index as they are worked upon • Re-indexing is manual – Existing properties are not indexed unless touched
  • 41. Auto Index lifecycle • Node and relationship auto indexes are supported
      AutoIndexer<Node> nodeAutoIndex = graphDb.index().getNodeAutoIndexer();
      nodeAutoIndex.startAutoIndexingProperty("species");
      nodeAutoIndex.setEnabled(true);
      // Read-only view of the automatically maintained node index
      ReadableIndex<Node> autoNodeIndex = graphDb.index().getNodeAutoIndexer().getAutoIndex();
  • 42. Core API • Basic (nodes, relationships) • Fast • Imperative • Flexible - Easily intermix mutating operations
  • 43. Traversers API • Mechanism for querying the graph by navigating from a starting node to related nodes according to an algorithm • Expressive • Fast • Declarative (mostly) • Opinionated
  • 44. Cypher - A Graph Query Language • Query Language for Neo4j • A declarative graph pattern matching language – SQL for graphs – Tabular results • aggregation, ordering and limits • Mutating operations • CRUD • Easy to formulate queries based on relationships • Many features stem from improving pain points of SQL like join tables
  • 45. Cypher - A Graph Query Language
  • 48. Operations • Aggregation - COUNT, SUM, AVG, MAX, MIN, COLLECT • Where clause (example query below) • Ordering – order by <property> – order by <property> desc
      start doctor=node:characters(name = 'Doctor')
      match (doctor)<-[:PLAYED]-(actor)-[:APPEARED_IN]->(episode)
      where actor.actor = 'Tom Baker' and episode.title =~ /.*Dalek.*/
      return episode.title
  • 49. Graph Algorithms • Neo4j has built-in algorithms • Callable through JVM and REST APIs • Higher level of abstraction • Graph Matching – Look for patterns in a data set - retail analytics – Higher-level abstraction than raw traversers • REST API – Access the server • Binary protocol – JSON as default format
  • 50. Neo4j HA - High Availability Cluster • A scalability package known as high availability or HA that uses a master-slave cluster architecture – Full data redundancy – Service fault tolerance – Linear read scalability – Master-slave replication • Single data-centre or global zones – tolerance for high-latency
  • 51. Neo4j HA • Redundancy - improved uptime – automatic failover • In a Neo4j HA cluster the full graph is replicated to every instance (server) in the cluster • Read operations can be done locally on each slave • Read capacity of the HA cluster increases linearly with the number of servers
  • 53. HA Cluster Architecture • Cluster performs automatic master election • Supports master-slave replication for clustering and DR across sites
  • 55. Write to a Master • All write operations are co-ordinated by the master • Writes to the master are fast • Slaves eventually catch up
  • 56. Write to a Master
  • 57. Write to a Slave • Writes to a slave cause a synchronous transaction with the master • Other slaves eventually catch up
  • 58. Write to a Slave
  • 59. Server Overload Problem • Unlike other classes of NoSQL database, a graph does not have predictable lookup since it is a highly mutable structure • We want to co-locate related nodes for traversal performance, but we don’t want to place so many connected nodes on the same database that it becomes heavily loaded • The black-hole problem - popular nodes get lumped together on a single instance, which keeps the point cut low but overloads that instance
  • 61. Thinly Spread Network • The opposite is also true: we don’t want connected nodes spread too widely across different database instances, since traversals that cross the (relatively high-latency) network incur a substantial performance penalty at runtime • Load-leveling alone can lead to many relationships crossing instances • These are very expensive to traverse, since network hops are many orders of magnitude slower than in-memory traversals
  • 63. Minimal Point Cut • The best approach is to balance a graph across database instances by creating a minimum point cut for the graph, where graph nodes are placed such that few relationships span shards • A good strategy is to take a local view of the graph (no global locks) and work incrementally (short bursts) • Take usage patterns into account • Unlike other NoSQL stores, graphs are not predictable, so we cannot use techniques like consistent hashing for scale out
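One way to compare candidate placements is to count how many relationships would span shards under each. The sketch below does only that counting. It does not search for a minimum cut, and the names (`CutSketch`, `crossShardRelationships`, the `shardOf` array) are hypothetical.

```java
/** Hypothetical helper that scores a node-to-shard assignment; not a partitioning algorithm. */
final class CutSketch {

    /**
     * Counts relationships whose end points live on different shards.
     * edges[i] holds {startNode, endNode}; shardOf[node] is the shard of that node.
     */
    static int crossShardRelationships(int[][] edges, int[] shardOf) {
        int count = 0;
        for (int[] edge : edges) {
            if (shardOf[edge[0]] != shardOf[edge[1]]) {
                count++;
            }
        }
        return count;
    }
}
```

For two triangles joined by a single relationship, placing each triangle on its own shard leaves one cross-shard relationship, while alternating the nodes between two shards leaves five.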
  • 65. Cache Sharding • A strategy for large data sets of terabyte scale • Mandates consistent request routing • For instance, requests for user A are always sent to server 1, while requests for user B are always sent to server 2 and so on • The key assumption is that requests for user A typically touch parts of the graph around user A, such as his or her friends, preferences, likes and so on
  • 66. Cache Sharding • This means that the neighbourhood of the graph around user A will be cached on server 1, while the neighbourhood around user B will be cached on server 2 • By employing consistent routing of requests, the caches of all servers in the HA cluster can be utilized maximally • Strategy is highly effective for managing a large graph that does not fit in RAM
  • 67. Consistent Routing • Always try to route related requests to the same server to hopefully benefit from warm caches
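A simple way to send the same user's requests to the same server is to derive the server from the user ID. The sketch below uses a hash modulo the server count. This is only one possible routing rule and an assumption on my part, since the slides do not say how routing is implemented; `RoutingSketch` and `serverFor` are hypothetical names.

```java
/** Hypothetical request router: the same user ID always maps to the same server index. */
final class RoutingSketch {

    /** Returns a server index in the range [0, serverCount). */
    static int serverFor(String userId, int serverCount) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(userId.hashCode(), serverCount);
    }
}
```

Because every server in the HA cluster holds the full graph, any server can answer any request. The routing only decides whose cache gets warmed for which users. With this rule, changing the number of servers remaps most users, so caches would need to warm up again.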
  • 68. Domain Specific Sharding • Graphs are not as easy to shard as documents or KV stores • High performance graph databases are limited in the data set size that a single machine can handle • Replicas speed up reads and improve availability, but the data set size is still limited to a single machine’s disk/memory • No perfect algorithm exists, but an expert's domain insight helps
  • 69. Domain Specific Sharding • Some domains can shard easily (geo, most web apps) using the consistent routing approach and cache sharding – Geo - the connections between cities are few compared with the connections within the cities, so cities or countries can be placed on different nodes • Eventually, at petabyte scale, the full data set cannot practically be replicated • Need to shard data across machines
  • 70. References
      1. http://www.neo4j.org
      2. http://www.neo4j.org/learn/cypher
      3. Bachman, Michal (2013). GraphAware: Towards Online Analytical Processing in Graph Databases. http://graphaware.com/assets/bachman-msc-thesis.pdf
      4. Hunger, Michael (2012). Cypher and Neo4j. http://vimeo.com/83797381
      5. Mistry, Deep. Neo4j: A Developer’s Perspective. http://osintegrators.com/opensoftwareintegrators%7Cneo4jadevelopersperspective
      6. MapGraph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs
      7. Parallel Breadth First Search on GPU Clusters
      8. DB-Engines Ranking of Graph DBMS
  • 71. Thank You • Check out my LinkedIn profile at https://in.linkedin.com/in/girishkhanzode