Graph Databases & OrientDB
Arpit Poladia
Data and relationships
• Data by itself has little value.
• It is the relationships between the data that give it incredible value.
[Figure: a graph linking Raj (Customer), Order #134 (Order), the products Keyboard, Monitor 40” and Mouse, and the providers Croma and GameZone through Has, Makes and Sells edges]
NoSQL Primer
Key-Value databases
Document databases
Column family databases
Graph databases
Key-Value databases
Data Model:
• Global key-value mapping
• Big scalable HashMap
Examples:
• Redis, Riak
Pros:
• Simple data model
• Scalable
Cons:
• Create your own foreign keys
• Poor for complex data
Document databases
Data Model:
• A collection of documents
• A document is a key-value collection
Examples:
• CouchDB, MongoDB
Pros:
• Simple, powerful data model
• Scalable
Cons:
• Poor for interconnected data
• Query limited to keys and indexes
Column family databases
Data Model:
• A big table, with column families
Examples:
• HBase, HyperTable, Cassandra
Pros:
• Supports Semi-Structured Data
• Naturally Indexed (columns)
• Scalable
Cons:
• Poor for interconnected data
Graph databases
Data Model:
• Nodes and Relations
Examples:
• Neo4j, OrientDB
Pros:
• Powerful data model
• Easy to query
Cons:
• Sharding is hard (although they scale up reasonably well)
Representing relations
What are the problems with JOIN?
Why do NoSQL products avoid relations?
Why graph relations rock!
Rings a bell?
Customer
ID   Name
10   Ned
11   Theon
24   Arya
28   Jon
CustomerAddress
ID   Address
10   24
10   33
32   44
Address
ID   Location
24   Winterfell
33   Riverrun
18   Pyke
18   Eyrie
44   Dorne
Problem with JOIN
• A JOIN means searching for a key in another table.
• Indexing the key speeds up searches, but slows down inserts, updates and deletes.
• So in the best case a JOIN is a lookup into an index, and that lookup is repeated for every single join.
• If you traverse hundreds of relationships, you’re executing hundreds of JOINs.
• Index algorithms are all similar and based on balanced trees.
• This lookup took 5 steps. The number of steps grows with the logarithm of the index size, so an index over millions of records needs more steps for every lookup.
• Joins are executed every time you cross relationships. Query performance suffers as the database increases in size.
[Figure: Index Lookup. Finding "Leo" in a balanced tree by narrowing the key range at each level: A-Z, A-L, E-L, H-L, K-L]
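The cost of one index lookup can be sketched in Java. This is only an illustration: it assumes a sorted array searched by binary search as a stand-in for a balanced-tree index, and the names `IndexLookupSketch`, `lookupSteps` and `keys` are made up for this example.

```java
final class IndexLookupSketch {
    // Counts the comparisons a binary search makes before it finds `key`.
    // Each comparison plays the role of one level in a balanced index tree.
    static int lookupSteps(int[] sortedKeys, int key) {
        int lo = 0, hi = sortedKeys.length - 1, steps = 0;
        while (lo <= hi) {
            steps++;
            int mid = (lo + hi) >>> 1;
            if (sortedKeys[mid] == key) return steps;
            if (sortedKeys[mid] < key) lo = mid + 1; else hi = mid - 1;
        }
        return steps; // key absent: comparisons made before giving up
    }

    // Builds the sorted keys 0..n-1 (hypothetical data for the sketch).
    static int[] keys(int n) {
        int[] k = new int[n];
        for (int i = 0; i < n; i++) k[i] = i;
        return k;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 1_000_000}) {
            int steps = lookupSteps(keys(n), n - 1);
            System.out.println(n + " keys: " + steps + " steps per lookup");
        }
    }
}
```

Every JOIN pays this per-lookup cost again, and the cost rises as the table grows.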
A better way to manage relations?
A graph database is any storage system
that provides index-free adjacency
Graph Theory
Crash Course
A very simple graph
Anderson --Chooses--> Red pill
Different kinds of graph
Undirected Graph
Directed Graph
Pseudo Graph
Multi Graph
Weighted Graph
Labeled Graph
Property Graph
Property Graph
• All vertices and edges have unique identifiers.
• Both vertices and edges can have properties.
• Edges are directed.
• An edge connects exactly two vertices.
• Multiple edges are used to represent 1-N and N-M relationships.
[Figure: two vertices with properties: Los Angeles (Population: 3M) and Arnold (Quote: I’ll be back)]
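The property graph rules listed on this slide can be sketched as a small Java data structure. This is an illustrative model only: the class and method names are invented here, and the edge labels `livesIn` and `worksIn` are hypothetical because the slide does not name the edge between the two vertices.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PropertyGraphSketch {
    static final class Vertex {
        final int id;                                          // unique identifier
        final Map<String, Object> properties = new LinkedHashMap<>();
        final List<Edge> out = new ArrayList<>();              // edges leaving this vertex
        final List<Edge> in = new ArrayList<>();               // edges arriving here
        Vertex(int id) { this.id = id; }
    }

    static final class Edge {
        final int id;                                          // edges have identifiers too
        final String label;
        final Vertex from;                                     // directed: from -> to
        final Vertex to;
        final Map<String, Object> properties = new LinkedHashMap<>();
        Edge(int id, String label, Vertex from, Vertex to) {
            this.id = id; this.label = label; this.from = from; this.to = to;
        }
    }

    private int nextId = 1; // one counter keeps vertex and edge ids distinct

    Vertex addVertex(String key, Object value) {
        Vertex v = new Vertex(nextId++);
        v.properties.put(key, value);
        return v;
    }

    // An edge connects exactly two vertices; 1-N and N-M relationships
    // are expressed by calling this several times.
    Edge addEdge(String label, Vertex from, Vertex to) {
        Edge e = new Edge(nextId++, label, from, to);
        from.out.add(e);
        to.in.add(e);
        return e;
    }

    public static void main(String[] args) {
        PropertyGraphSketch g = new PropertyGraphSketch();
        Vertex arnold = g.addVertex("Quote", "I'll be back");
        Vertex losAngeles = g.addVertex("Population", "3M");
        g.addEdge("livesIn", arnold, losAngeles);  // hypothetical label
        g.addEdge("worksIn", arnold, losAngeles);  // hypothetical label
        System.out.println("Arnold has " + arnold.out.size() + " outgoing edges");
    }
}
```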
Graph database
A graph database is any storage system
that provides index-free adjacency
Features of graph databases
• A database with an explicit graph structure.
• Each node knows its adjacent nodes.
• Optimized for connections and traversing connected data.
• The relationship is resolved just once, when the edge is created.
• Traversing time is not affected by database size.
• As the number of nodes increases, the cost of a local step (or hop) remains the same.
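Index-free adjacency can be sketched in a few lines of Java. The sketch assumes each node simply holds references to its neighbours; the names `AdjacencySketch`, `Node`, `linkTo` and `walk` are invented for illustration, and the direction Raj to Order #134 to Keyboard is an assumption based on the opening example.

```java
import java.util.ArrayList;
import java.util.List;

final class AdjacencySketch {
    // A node keeps direct references to its neighbours, so following an
    // edge is a pointer dereference rather than a search in a global index.
    static final class Node {
        final String name;
        final List<Node> out = new ArrayList<>();
        Node(String name) { this.name = name; }

        // Adds a directed edge and returns the target for chaining.
        Node linkTo(Node target) {
            out.add(target);
            return target;
        }
    }

    // Follows the first outgoing edge up to `hops` times.
    // The work per hop does not depend on how many nodes exist overall.
    static Node walk(Node start, int hops) {
        Node current = start;
        for (int i = 0; i < hops && !current.out.isEmpty(); i++) {
            current = current.out.get(0);
        }
        return current;
    }

    public static void main(String[] args) {
        Node raj = new Node("Raj");
        Node order = raj.linkTo(new Node("Order #134"));
        order.linkTo(new Node("Keyboard"));
        System.out.println(walk(raj, 2).name);
    }
}
```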
OrientDB
2nd generation, multi-model NoSQL database
(graph, document and key-value engines)
How does OrientDB manage relations?
Vertex "Anderson", RID = #13:1
  label : 'Customer'
  name : 'Anderson'
  out : [#14:1]
Edge "Chooses", RID = #14:1
  label : 'Chooses'
  out : [#13:1]
  in : [#13:2]
Vertex "Red pill", RID = #13:2
  label : 'Product'
  name : 'Red pill'
  in : [#14:1]
• The RID is the physical position of the record.
• The RID of the edge is stored as out on the source vertex and as in on the target vertex.
• The RIDs of the vertices are stored as out and in on the edge.
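The idea of linking records by physical position can be imitated with a toy Java model. This is not OrientDB's implementation: it assumes each cluster is an in-memory list and that a RID is the text "#cluster:position", with positions starting at 0, so the RIDs it produces differ from those on the slide. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RidSketch {
    static final class StoredRecord {
        final String rid;
        final String label;
        final List<String> out = new ArrayList<>(); // RIDs this record points to
        final List<String> in = new ArrayList<>();  // RIDs pointing at this record
        StoredRecord(String rid, String label) { this.rid = rid; this.label = label; }
    }

    // cluster id -> records, where the list index is the record position
    private final Map<Integer, List<StoredRecord>> clusters = new HashMap<>();

    StoredRecord create(int cluster, String label) {
        List<StoredRecord> records = clusters.computeIfAbsent(cluster, c -> new ArrayList<>());
        StoredRecord r = new StoredRecord("#" + cluster + ":" + records.size(), label);
        records.add(r);
        return r;
    }

    // Loading by RID jumps straight to a position; nothing is searched by key.
    StoredRecord load(String rid) {
        String[] parts = rid.substring(1).split(":");
        return clusters.get(Integer.parseInt(parts[0])).get(Integer.parseInt(parts[1]));
    }

    // Creates an edge record and stores the RIDs on both sides.
    StoredRecord connect(int edgeCluster, String label, StoredRecord from, StoredRecord to) {
        StoredRecord edge = create(edgeCluster, label);
        edge.out.add(from.rid);
        edge.in.add(to.rid);
        from.out.add(edge.rid);
        to.in.add(edge.rid);
        return edge;
    }

    // vertex -> first outgoing edge -> the vertex that edge arrives at
    StoredRecord follow(StoredRecord vertex) {
        StoredRecord edge = load(vertex.out.get(0));
        return load(edge.in.get(0));
    }

    public static void main(String[] args) {
        RidSketch db = new RidSketch();
        StoredRecord anderson = db.create(13, "Customer");
        StoredRecord redPill = db.create(13, "Product");
        db.connect(14, "Chooses", anderson, redPill);
        System.out.println(anderson.rid + " leads to " + db.follow(anderson).rid);
    }
}
```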
Features
• Supports fast “lightweight edges”.
• Fully object oriented: classes with properties and inheritance.
• Can store up to 400,000 records per second.
• Supports users and roles with fine-grained privileges.
• Permissive and commercial-friendly Apache 2 license.
Schema hybrid
Schema-less mode: Schema is not mandatory.
• Relaxed model, collect heterogeneous documents all together.
Schema-full mode: Schema with constraints on fields and validation rules.
• Customer.age > 17
• Customer.address not null
• Customer.surname is mandatory
• Customer.email matches '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b'
Schema-mixed mode:
• Schema with mandatory and optional fields + constraints.
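The schema-full constraints listed on this slide can be mimicked in plain Java to show what such validation checks. This sketch does not use the OrientDB API: it assumes a customer is a simple map of fields, that the email pattern is applied case-insensitively, and the names `SchemaSketch` and `validate` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class SchemaSketch {
    // The email rule from the slide; case-insensitive matching is an assumption.
    private static final Pattern EMAIL = Pattern.compile(
            "\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns one message per violated constraint; an empty list means valid.
    static List<String> validate(Map<String, Object> customer) {
        List<String> errors = new ArrayList<>();

        Object age = customer.get("age");
        if (age instanceof Integer && (Integer) age <= 17) {
            errors.add("age must be > 17");
        }
        if (customer.get("address") == null) {
            errors.add("address must not be null");
        }
        if (!customer.containsKey("surname")) {
            errors.add("surname is mandatory");
        }
        Object email = customer.get("email");
        if (email != null && !EMAIL.matcher(email.toString()).matches()) {
            errors.add("email is malformed");
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, Object> customer = new java.util.HashMap<>();
        customer.put("age", 16);
        customer.put("email", "not-an-email");
        System.out.println(validate(customer));
    }
}
```

In schema-less mode none of these checks would run; in schema-mixed mode only the declared fields would be checked.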
ACID
db.begin();
try {
    // your code
    ...
    db.commit();    // apply all changes made since begin() as one unit
} catch (Exception e) {
    db.rollback();  // discard every change made since begin()
}
Complex types
• Supports all basic types like bool, int, long, double, datetime, string, binary, decimal, etc.
• Supports complex types like List, Set, Map, Link List, Link Map, etc.
• Supports embedded documents.
• Supports custom data types.
Scalability
• High Availability
• Auto-Discovery
• Multi-Master Replication
• Synchronous, Asynchronous and Read-Only replication
• Sharding
[Figure: two master nodes serving clients (C), joined by an auto-discovered node]
Extended SQL
OrientDB's query language extends SQL, so it is easy to pick up and easy to use.
• Relations - select from Account where address.city.country.name = 'Italy'
• Strings - select from City where country.name.substring(1,3).toUpperCase() = 'TAL'
• Regex - select from Agenda where email matches '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b'
• Schemaless - select from Profile where any() like '%Jay%'
• Collections - select from Tree where children contains ( married = true )
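The Relations query navigates from one record to the next with a dotted path instead of a JOIN. A rough sketch of that idea in plain Java, assuming documents are nested maps; `PathSketch` and `resolve` are hypothetical names and not part of OrientDB's API:

```java
import java.util.HashMap;
import java.util.Map;

final class PathSketch {
    // Walks a dotted path such as "address.city.country.name" through
    // nested maps, one field per step, and returns null if a step is missing.
    @SuppressWarnings("unchecked")
    static Object resolve(Map<String, Object> document, String path) {
        Object current = document;
        for (String field : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<String, Object>) current).get(field);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> country = new HashMap<>();
        country.put("name", "Italy");
        Map<String, Object> city = new HashMap<>();
        city.put("country", country);
        Map<String, Object> address = new HashMap<>();
        address.put("city", city);
        Map<String, Object> account = new HashMap<>();
        account.put("address", address);

        System.out.println(resolve(account, "address.city.country.name"));
    }
}
```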
Demo
OrientDB Studio
Meta graphs
Decorate graph with additional information
to make retrieval faster
Location meta graph
[Figure: Hotel 1, Hotel 2 and Hotel 3 link to City vertices (Bengaluru, Mysuru), which link to State Karnataka, which links to Country India]
Time based meta graph
[Figure: Trip 1, Trip 2 and Trip 3 link to Day vertices (5, 12), which link to Month March, which links to Year 2016]
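A time-based meta graph can be sketched in Java as a small Year, Month, Day hierarchy with trips attached to day vertices. This is only an illustration: the class and method names are invented, and which trip falls on which day is an assumption, since the slide does not make that clear.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TimeMetaGraphSketch {
    static final class Node {
        final String name;
        final Map<String, Node> children = new LinkedHashMap<>();
        final List<String> trips = new ArrayList<>(); // filled on day nodes only
        Node(String name) { this.name = name; }
        Node child(String key) { return children.computeIfAbsent(key, Node::new); }
    }

    final Node root = new Node("time");

    // Attaches the trip under year -> month -> day, creating nodes as needed.
    void addTrip(String trip, int year, String month, int day) {
        root.child("" + year).child(month).child("" + day).trips.add(trip);
    }

    // Retrieval walks three edges instead of scanning every trip.
    List<String> tripsOn(int year, String month, int day) {
        Node node = root;
        for (String key : new String[] {"" + year, month, "" + day}) {
            node = node.children.get(key);
            if (node == null) return new ArrayList<>();
        }
        return node.trips;
    }

    // Collects every trip below a node, e.g. all trips in one month.
    static List<String> tripsUnder(Node node) {
        List<String> all = new ArrayList<>(node.trips);
        for (Node child : node.children.values()) {
            all.addAll(tripsUnder(child));
        }
        return all;
    }

    List<String> tripsIn(int year, String month) {
        Node y = root.children.get("" + year);
        Node m = (y == null) ? null : y.children.get(month);
        return (m == null) ? new ArrayList<String>() : tripsUnder(m);
    }

    public static void main(String[] args) {
        TimeMetaGraphSketch g = new TimeMetaGraphSketch();
        g.addTrip("Trip 1", 2016, "March", 5);   // day assignment is assumed
        g.addTrip("Trip 2", 2016, "March", 12);
        g.addTrip("Trip 3", 2016, "March", 12);
        System.out.println(g.tripsOn(2016, "March", 12));
    }
}
```

The location meta graph works the same way, with Country, State and City in place of Year, Month and Day.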
Summing up…
• Graph databases allow you to tell a story. They allow you to connect the dots.
• Graph databases offer simplicity and speed while permitting relations to maintain a first-class status.
• Graph databases are designed to store connected data and make it easy to make sense of that data.
• Graph databases make it easy to evolve the data set.
Thank You!
Any questions?
