Graph Databases & OrientDB

Arpit Poladia

Data and relationships
• Data by itself has little value.
• It is the relationships between the data that give it incredible value.
[Figure: example graph. Raj (Customer) Makes Order #134 (Order); the order Has three Products (Keyboard, Monitor 40”, Mouse); Croma and GameZone (Providers) are linked to the products by Sells edges.]

NoSQL Primer
• Key-Value databases
• Document databases
• Column family databases
• Graph databases

Key-Value databases
Data Model:
• Global key-value mapping
• Big scalable HashMap
Examples:
• Redis, Riak
Pros:
• Simple data model
• Scalable
Cons:
• Create your own foreign keys
• Poor for complex data
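To make "create your own foreign keys" concrete, here is a minimal sketch that uses a plain Java HashMap as a stand-in for a key-value store such as Redis. The key names (customer:raj, order:134) are a hypothetical naming convention borrowed from the order example at the start of the deck; the store knows nothing about them, so the application has to resolve and maintain every reference itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A HashMap standing in for a key-value store: opaque values under string keys.
class KeyValueDemo {
    static final Map<String, String> store = new HashMap<>();

    // The store has no notion of relations, so we encode a "foreign key"
    // ourselves: the order's value is simply the key of its customer.
    static void load() {
        store.put("customer:raj", "Raj");
        store.put("order:134", "customer:raj");
    }

    // Following the relation is a second lookup driven by application code.
    static String customerNameOfOrder(String orderKey) {
        String customerKey = store.get(orderKey);
        return customerKey == null ? null : store.get(customerKey);
    }

    // The reverse direction ("which orders did Raj make?") has no index:
    // every key has to be scanned.
    static List<String> ordersOf(String customerKey) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : store.entrySet()) {
            if (e.getKey().startsWith("order:") && customerKey.equals(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        load();
        System.out.println(customerNameOfOrder("order:134")); // prints Raj
        System.out.println(ordersOf("customer:raj"));        // prints [order:134]
    }
}
```

The full scan in ordersOf is what "poor for complex data" looks like in practice: every extra relationship needs either another hand-maintained key or another scan.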

Document databases
Data Model:
• A collection of documents
• A document is a collection of key-value pairs
Examples:
• CouchDB, MongoDB
Pros:
• Simple, powerful data model
• Scalable
Cons:
• Poor for interconnected data
• Query limited to keys and indexes

Column family databases
Data Model:
• A big table, with column
families
Examples:
• HBase, Hypertable, Cassandra
Pros:
• Supports Semi-Structured Data
• Naturally Indexed (columns)
• Scalable
Cons:
• Poor for interconnected data

Graph databases
Data Model:
• Nodes and Relations
Examples:
• Neo4j, OrientDB
Pros:
• Powerful data model
• Easy to query
Cons:
• Sharding is hard
• Scales UP (vertically) reasonably well, scaling out is harder

Representing relations
What are the problems with JOIN?
Why do NoSQL products avoid relations?
Why graph relations rock!

Rings a bell?
Customer
ID | Name
10 | Ned
11 | Theon
24 | Arya
28 | Jon

CustomerAddress
ID | Address
10 | 24
10 | 33
32 | 44

Address
ID | Location
24 | Winterfell
33 | Riverrun
18 | Pyke
18 | Eyrie
44 | Dorne

Problem with JOIN
• A JOIN means searching for a key in another table.
• Indexing the key speeds up searches, but slows down inserts, updates and deletes.
• So in the best case a JOIN is a lookup into an index, and that cost is paid again for every single JOIN.
• If you traverse hundreds of relationships, you’re executing hundreds of JOINs.
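A minimal sketch of the Customer → CustomerAddress → Address resolution from the tables above, counting index probes. Java's TreeMap (a balanced red-black tree) stands in for a database index, which is an assumption of this sketch; real databases typically use B-trees. Only the Address rows that take part in the join are loaded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Relational-style resolution: every hop across a relationship is an index probe.
class JoinDemo {
    static final TreeMap<Integer, String> customer =
            new TreeMap<>(Map.of(10, "Ned", 11, "Theon", 24, "Arya", 28, "Jon"));
    // CustomerAddress rows grouped by customer ID (Ned has two addresses).
    static final TreeMap<Integer, List<Integer>> customerAddress =
            new TreeMap<>(Map.of(10, List.of(24, 33), 32, List.of(44)));
    static final TreeMap<Integer, String> address =
            new TreeMap<>(Map.of(24, "Winterfell", 33, "Riverrun", 44, "Dorne"));

    static int indexLookups;

    static List<String> locationsOf(int customerId) {
        indexLookups = 0;
        List<String> locations = new ArrayList<>();
        indexLookups++; // JOIN #1: probe the CustomerAddress index
        for (int addressId : customerAddress.getOrDefault(customerId, List.of())) {
            indexLookups++; // JOIN #2: one Address index probe per matching row
            String location = address.get(addressId);
            if (location != null) locations.add(location);
        }
        return locations;
    }

    public static void main(String[] args) {
        // prints "Ned: [Winterfell, Riverrun] after 3 index lookups"
        System.out.println(customer.get(10) + ": " + locationsOf(10)
                + " after " + indexLookups + " index lookups");
    }
}
```

Each probe costs O(log n) in its table, so a traversal that crosses many relationships multiplies that cost by the number of hops.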

• Most index algorithms are similar and based on balanced trees.
• This lookup took 5 steps. With millions of indexed records the tree gets deeper, so every lookup takes more steps.
• A JOIN is executed every time you cross a relationship, so query performance degrades as the database grows.
Index Lookup
[Figure: index lookup for "Leo": A-Z → A-L → E-L → H-L → K-L → Leo, splitting the range at each level (A-L/M-Z, A-D/E-L, E-G/H-L, H-J/K-L).]
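How fast does the lookup depth grow? A minimal sketch, assuming a binary search over sorted keys as the stand-in for descending a balanced binary tree: the number of steps grows with log2(n), so it keeps rising as the table grows, and every JOIN pays it again.

```java
// Counts the comparisons a balanced binary search needs to find a key among
// n sorted keys; this mirrors descending one level of a balanced tree per step.
class IndexDepthDemo {
    static int lookupSteps(int[] sortedKeys, int target) {
        int lo = 0, hi = sortedKeys.length - 1, steps = 0;
        while (lo <= hi) {
            steps++;
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
            if (sortedKeys[mid] == target) return steps;
            if (sortedKeys[mid] < target) lo = mid + 1; else hi = mid - 1;
        }
        return steps; // key not present
    }

    // Keys 0..n-1, already sorted.
    static int[] keys(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        return a;
    }

    public static void main(String[] args) {
        for (int n : new int[] {32, 1_000, 1_000_000}) {
            System.out.println(n + " keys -> " + lookupSteps(keys(n), n - 1) + " steps");
        }
    }
}
```

Database indexes usually use B-trees with a high fan-out, so they are shallower than a binary tree; the growth is still logarithmic in the number of records.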

A better way to manage relations?
A graph database is any storage system
that provides index-free adjacency
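Index-free adjacency means a vertex holds direct references to its neighbors. A minimal in-memory sketch, using the Anderson/Red pill example that follows; the Vertex class is hypothetical and only illustrates the idea, not how any particular database lays out its storage.

```java
import java.util.ArrayList;
import java.util.List;

class AdjacencyDemo {
    // A vertex keeps direct references to its outgoing neighbors, so crossing
    // a relationship is a pointer dereference, not a search in a global index.
    static final class Vertex {
        final String name;
        final List<Vertex> out = new ArrayList<>();

        Vertex(String name) { this.name = name; }

        void connectTo(Vertex target) { out.add(target); }
    }

    public static void main(String[] args) {
        Vertex anderson = new Vertex("Anderson");
        Vertex redPill = new Vertex("Red pill");
        anderson.connectTo(redPill); // Anderson --Chooses--> Red pill (label omitted here)
        for (Vertex v : anderson.out) {
            System.out.println(anderson.name + " -> " + v.name); // prints "Anderson -> Red pill"
        }
    }
}
```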

A very simple graph
[Figure: Anderson --Chooses--> Red pill]

Different kinds of graph
• Undirected Graph
• Directed Graph
• Pseudo Graph
• Multi Graph
• Weighted Graph
• Labeled Graph
• Property Graph

Property Graph
• All vertices and edges have
unique identifiers.
• Both vertices and edges can have
properties.
• Edges are directed.
• An edge connects only 2 vertices.
• Multiple edges are used to
represent 1-N and N-M
relationships.
[Figure: example property graph with a vertex "Los Angeles" (Population: 3M) and a vertex "Arnold" (Quote: "I’ll be back").]
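The property-graph rules above can be sketched as two small classes: unique IDs on vertices and edges, properties on both, and directed edges that connect exactly two vertices. This is a hypothetical in-memory model, not OrientDB's API; the edge label VISITS is invented, since the slide does not say how the two vertices are connected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PropertyGraphDemo {
    static long nextId = 1; // shared counter: every vertex and edge gets a unique id

    static final class Vertex {
        final long id = nextId++;
        final Map<String, Object> props = new HashMap<>();
        final List<Edge> out = new ArrayList<>(); // 1-N: many edges may leave a vertex
        final List<Edge> in = new ArrayList<>();
    }

    static final class Edge {
        final long id = nextId++;
        final String label;
        final Vertex from, to; // directed, exactly two endpoints
        final Map<String, Object> props = new HashMap<>();

        Edge(String label, Vertex from, Vertex to) {
            this.label = label;
            this.from = from;
            this.to = to;
            from.out.add(this);
            to.in.add(this);
        }
    }

    public static void main(String[] args) {
        Vertex la = new Vertex();
        la.props.put("name", "Los Angeles");
        la.props.put("population", "3M");
        Vertex arnold = new Vertex();
        arnold.props.put("name", "Arnold");
        arnold.props.put("quote", "I'll be back");
        // Hypothetical edge: the label is invented for illustration.
        Edge e = new Edge("VISITS", arnold, la);
        System.out.println(arnold.props.get("name") + " -" + e.label + "-> "
                + e.to.props.get("name") + " (edge id " + e.id + ")");
    }
}
```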

Graph database
A graph database is any storage system
that provides index-free adjacency

Features of graph databases
• A database with an explicit graph structure.
• Each node knows its adjacent nodes.
• Optimized for connections and traversing connected data.
• Creates the relationship just once when the edge is
created.
• Traversing time is not affected by database size.
• As the number of nodes increases, the cost of a local step
(or hop) remains the same.
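A minimal sketch of why the cost of a hop stays local: a breadth-first expansion that only reads the adjacency lists of the vertices reached so far. The graph reuses the Makes and Has edges from the order example at the start of the deck, followed in their stored direction; the adjacency map is a hypothetical in-memory stand-in for a graph store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TraversalDemo {
    // Breadth-first expansion, one hop at a time. Each hop touches only the
    // adjacency lists of the current frontier, never the rest of the graph.
    static Set<String> withinHops(Map<String, List<String>> adj, String start, int maxHops) {
        Set<String> seen = new HashSet<>();
        seen.add(start);
        List<String> frontier = List.of(start);
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            List<String> next = new ArrayList<>();
            for (String v : frontier) {
                for (String neighbor : adj.getOrDefault(v, List.of())) {
                    if (seen.add(neighbor)) next.add(neighbor); // first visit only
                }
            }
            frontier = next;
        }
        seen.remove(start);
        return seen;
    }

    // Raj --Makes--> Order #134 --Has--> three products.
    static Map<String, List<String>> sample() {
        Map<String, List<String>> adj = new HashMap<>();
        adj.put("Raj", List.of("Order #134"));
        adj.put("Order #134", List.of("Keyboard", "Monitor 40\"", "Mouse"));
        return adj;
    }

    public static void main(String[] args) {
        System.out.println(withinHops(sample(), "Raj", 2));
    }
}
```

Adding millions of unrelated vertices to the map would not change the work done here, because they never enter the frontier.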

OrientDB
2nd generation, multi-model
(graph, document and key-value engines)
NoSQL database

How does OrientDB manage relations?
[Figure: Anderson --Chooses--> Red pill, stored as three records]
• Vertex, RID = #13:1: label : ‘Customer’, name : ‘Anderson’, out : [#14:1]
• Edge, RID = #14:1: label : ‘Chooses’, out: [#13:1], in: [#13:2]
• Vertex, RID = #13:2: label : ‘Product’, name : ‘Red pill’, in: [#14:1]
• The RID (#cluster:position) is the physical position of the record.
• The RID of the edge is stored as out in the source vertex and as in in the target vertex.
• The RIDs of the vertices are stored as out and in in the edge.
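A minimal sketch of how these three records link to each other purely through RIDs. A HashMap keyed by RID stands in for storage, which is an assumption of this sketch: in OrientDB the RID locates the record's position directly, which is the point of storing RIDs instead of looking up foreign keys in an index. The Rec record and targets method are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RidDemo {
    // One record per vertex or edge; out/in hold RIDs, never copies of other records.
    record Rec(String label, String name, List<String> out, List<String> in) {}

    static final Map<String, Rec> storage = new HashMap<>();

    static {
        storage.put("#13:1", new Rec("Customer", "Anderson", List.of("#14:1"), List.of()));
        storage.put("#14:1", new Rec("Chooses", null, List.of("#13:1"), List.of("#13:2")));
        storage.put("#13:2", new Rec("Product", "Red pill", List.of(), List.of("#14:1")));
    }

    // vertex.out -> edge RID -> edge.in -> target vertex RID, for edges with the given label.
    static List<String> targets(String vertexRid, String label) {
        List<String> names = new ArrayList<>();
        for (String edgeRid : storage.get(vertexRid).out()) {
            Rec edge = storage.get(edgeRid);
            if (!label.equals(edge.label())) continue;
            for (String targetRid : edge.in()) names.add(storage.get(targetRid).name());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(targets("#13:1", "Chooses")); // prints [Red pill]
    }
}
```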

Features
• Supports fast “lightweight edges” (edges without properties, stored as direct links between vertices).
• Fully object oriented – classes with properties and inheritance.
• Can store up to 400,000 records per second.
• Supports users and roles with fine grained privileges.
• Permissive and commercial-friendly Apache 2 license.

Schema hybrid
Schema-less mode : Schema is not mandatory.
• Relaxed model, collect heterogeneous documents all together.
Schema-full mode: Schema with constraints on fields and validation rules.
• Customer.age > 17
• Customer.address not null
• Customer.surname is mandatory
• Customer.email matches '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b'
Schema-mixed mode:
• Schema with mandatory and optional fields + constraints.
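What the schema-full rules above actually check can be sketched as hand-rolled validation. This is illustrative only: OrientDB enforces constraints declared in its schema by itself, and the violations method is hypothetical. Case-insensitive matching of the email pattern is an assumption, since the rule lists only A-Z.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class CustomerValidation {
    // Same pattern as the Customer.email rule; CASE_INSENSITIVE is an assumption.
    static final Pattern EMAIL = Pattern.compile(
            "\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b", Pattern.CASE_INSENSITIVE);

    // Mirrors the constraints listed above. Simplification: in this sketch a missing
    // field and a null field look the same (both null), unlike a real schema,
    // which distinguishes "mandatory" from "not null".
    static List<String> violations(Integer age, String address, String surname, String email) {
        List<String> errors = new ArrayList<>();
        if (age != null && age <= 17) errors.add("age must be > 17");
        if (address == null) errors.add("address must not be null");
        if (surname == null) errors.add("surname is mandatory");
        if (email != null && !EMAIL.matcher(email).matches()) errors.add("email does not match pattern");
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(violations(30, "Bengaluru", "Sharma", "raj@example.com")); // prints []
        System.out.println(violations(16, null, null, "not-an-email").size());        // prints 4
    }
}
```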

ACID
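db.begin();
try {
    // your code
    ...
    db.commit();   // make all changes of the transaction durable at once
} catch (Exception e) {
    db.rollback(); // undo every change made since db.begin()
    throw e;       // propagate the failure after rolling back
}

Complex types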
• Supports all basic types like bool,
int, long, double, datetime, string,
binary, decimal etc.
• Supports complex types like List,
Set, Map, Link List, Link Map etc.
• Supports embedded documents.
• Supports custom data types.

Scalability
• High Availability
• Auto-Discovery
• Multi-Master Replication
• Synchronous, Asynchronous
and Read-Only replication
• Sharding
[Figure: two Master Nodes and an Auto-Discovered Node, with clients (C) connected to the nodes.]

Extended SQL
An extended SQL dialect, similar to standard SQL.
Easy to pick up and easy to use.
• Relations - select from Account where address.city.country.name = 'Italy'
• Strings - select from City where country.name.substring(1,3).toUpperCase() = 'TAL'
• Regex - select from Agenda where email matches '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b'
• Schemaless - select from Profile where any() like '%Jay%'
• Collections - select from Tree where children contains ( married = true )
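The "Relations" query navigates links with dots instead of JOINs. A minimal Java sketch of what address.city.country.name = 'Italy' asks for, assuming hypothetical Account, Address, City and Country records and invented sample data; it illustrates the semantics of link navigation, not how OrientDB executes the query.

```java
import java.util.List;
import java.util.stream.Collectors;

class LinkPathDemo {
    record Country(String name) {}
    record City(String name, Country country) {}
    record Address(City city) {}
    record Account(String owner, Address address) {}

    // Each dot in address.city.country.name follows a link stored in the record.
    // A missing link anywhere on the path simply fails the filter.
    static List<String> ownersIn(List<Account> accounts, String countryName) {
        return accounts.stream()
                .filter(a -> a.address() != null
                        && a.address().city() != null
                        && a.address().city().country() != null
                        && countryName.equals(a.address().city().country().name()))
                .map(Account::owner)
                .collect(Collectors.toList());
    }

    // Hypothetical sample data for illustration only.
    static List<Account> sample() {
        Country italy = new Country("Italy");
        Country france = new Country("France");
        return List.of(
                new Account("alice", new Address(new City("Rome", italy))),
                new Account("bob", new Address(new City("Paris", france))),
                new Account("carol", null));
    }

    public static void main(String[] args) {
        System.out.println(ownersIn(sample(), "Italy")); // prints [alice]
    }
}
```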

Meta graphs
Decorate graph with additional information
to make retrieval faster

Location meta graph
[Figure: Hotel 1, Hotel 2 and Hotel 3 are linked to City vertices (Bengaluru, Mysuru); both cities link to State Karnataka, which links to Country India.]
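With the location hierarchy in place, "all hotels in Karnataka" becomes a walk down from one vertex instead of a scan over every hotel. A minimal sketch; which hotel belongs to which city is an assumption, since the figure does not preserve it, and the contains map and hotelsIn method are hypothetical. The time-based meta graph that follows can be walked the same way (Year → Month → Day → Trip).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LocationMetaGraph {
    // "contains" edges of the meta graph; the hotel-to-city assignment is assumed.
    static final Map<String, List<String>> contains = Map.of(
            "India", List.of("Karnataka"),
            "Karnataka", List.of("Bengaluru", "Mysuru"),
            "Bengaluru", List.of("Hotel 1", "Hotel 2"),
            "Mysuru", List.of("Hotel 3"));
    static final Set<String> HOTELS = Set.of("Hotel 1", "Hotel 2", "Hotel 3");

    // Depth-first walk down from a place, collecting the Hotel vertices reached.
    static List<String> hotelsIn(String place) {
        List<String> hotels = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(place);
        while (!stack.isEmpty()) {
            String v = stack.pop();
            if (HOTELS.contains(v)) hotels.add(v);
            contains.getOrDefault(v, List.of()).forEach(stack::push);
        }
        Collections.sort(hotels);
        return hotels;
    }

    public static void main(String[] args) {
        System.out.println(hotelsIn("Karnataka")); // prints [Hotel 1, Hotel 2, Hotel 3]
    }
}
```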

Time based meta graph
[Figure: Trip 1, Trip 2 and Trip 3 are linked to Day vertices (5, 12); the days link to Month March, which links to Year 2016.]

Summing up…
• Graph databases allow you to tell a story. They allow you to
connect the dots.
• Graph databases offer simplicity and speed while permitting
relations to maintain a first-class status.
• Graph databases are designed to store connected data and
make it easy to make sense of that data.
• Graph databases make it easy to evolve the data set.
