OrientDB: Unlock the Value of Document Data Relationships
Fabrizio Fortino

@fabriziofortino
11th April 2016
#HUGIreland

@boistartups
The world is changing
• Unstructured Data
• Big Data Explosion
• Connected Data
• Mobile, IoT
http://destinhaus.com/internet-of-things-the-rise-of-smart-manufacturing/
Rethink how we store data
“… starting a new strategic enterprise application you should no longer be assuming that your persistence should be relational. The relational option might be the right one - but you should seriously look at other alternatives.”
Martin Fowler, Polyglot Persistence [2011]
A Polyglot Persistence example
E-commerce Application
Primary Store +
• Financial Data (RDBMS)
• Recommendations (Graph)
• Products Catalog (Document)
• User Sessions (Key-Value)
ETL Jobs / Data Synchronisation
More flexibility, at what price?
• Hire experts for each database type
• No standards between NoSQL products
• Increased overall complexity
• High TCO
• Write and maintain ETL and data synchronisation
• Hard to refactor
• Testing can be tough
Entering Multi-Model Databases
• Graph
• Document
• Object
• Key/Value
• Full-Text
• Spatial
Multi-Model represents the intersection of multiple models in a single product
Product Positioning Quadrant
[Figure: quadrant with axes Relationship Complexity and Data Complexity, positioning Relational, Key Value, Column, Graph, Document and Multi-Model databases]
OrientDB at a Glance
• First Multi-Model DBMS with a Graph Engine
• Community Edition FREE (Apache v2 License)
• Enterprise Edition (profiler, live monitor, telereporter, etc.)
• Vibrant community (≈ 100 contributors, ≈ 15K commits)
• Easy to install and use
• Zero configuration Multi-Master Architecture
• ACID
• Reactive (Live Queries)
Quite a long journey
[Timeline figure, 1998 to 2016. Milestones: R&D; Orient ODBMS, first ever ODBMS with index-free adjacency; OrientDB, first ever multi-model DBMS released as Open Source; OrientDB Enterprise launch. Downloads / month axis: 0, 200, 1K, 3K, 12K, 70K]
Under the hood
User Application
• Document API: Handles Records as Documents
• Graph API: TinkerPop Blueprints Implementation
• Object API: POJO to Document mapping
Storage
• Memory: Works in Memory Only (Ideal for Integration Testing)
• PLocal: Write/Read to/from File System
• Remote: Delegates all Operations to a Remote Server
Deployment options
• Embedded (in-process)
• Single, Standalone Node
• Multi-Master Replica
• Mixed
Document API
• Lowest level API
• A document (record) is the unit of storage
• An immutable id (ORID) is automatically assigned to each document
• Documents can contain key-value pairs or nested/embedded documents (no ORID)
• Transaction support (optimistic mode with MVCC)
• Classes are logical sets of documents
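The optimistic MVCC bullet can be illustrated with a small sketch. This is not OrientDB's API: `ToyStore`, `Document` and `ConcurrentModificationError` are hypothetical names, and the store is a plain in-memory dictionary. It only shows the idea that each document carries a version, and that an update is rejected when the writer's snapshot is older than the stored record.

```python
from dataclasses import dataclass, field


class ConcurrentModificationError(Exception):
    """Raised when a writer holds a stale version of a document."""


@dataclass
class Document:
    rid: str                      # record ID, assigned once and never changed
    class_name: str               # logical set the document belongs to
    fields: dict = field(default_factory=dict)
    version: int = 0              # bumped on every successful update


class ToyStore:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self):
        self._records = {}
        self._next_position = {}  # next free position per cluster id

    def create(self, cluster_id, class_name, fields):
        position = self._next_position.get(cluster_id, 0)
        self._next_position[cluster_id] = position + 1
        doc = Document(f"#{cluster_id}:{position}", class_name, dict(fields))
        self._records[doc.rid] = doc
        return doc

    def read(self, rid):
        doc = self._records[rid]
        # Hand out a copy so each caller works on its own snapshot.
        return Document(doc.rid, doc.class_name, dict(doc.fields), doc.version)

    def update(self, snapshot):
        current = self._records[snapshot.rid]
        if current.version != snapshot.version:
            raise ConcurrentModificationError(snapshot.rid)
        current.fields = dict(snapshot.fields)
        current.version += 1
        return current.version


store = ToyStore()
user = store.create(12, "user", {"name": "Fabrizio"})

first = store.read(user.rid)
second = store.read(user.rid)

first.fields["city"] = "Dublin"
store.update(first)               # accepted: snapshot version matches

second.fields["city"] = "Cork"
try:
    store.update(second)          # rejected: snapshot is one version behind
    conflict = False
except ConcurrentModificationError:
    conflict = True
```

In this sketch the losing writer is expected to re-read the document and retry, which is the usual way to handle an optimistic conflict.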
Schema-less, Schema-full or Hybrid?
• Schema-less: relaxed model, the type of each field is inferred for each document
• Schema-full: strict model, schema with constraints on fields and validation rules
• Hybrid: mixed model, schema with mandatory and optional fields with constraints and validation rules
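A rough way to picture the three modes is a single validation function with two switches. The sketch below is an assumption-laden illustration, not OrientDB code: `Property` and `validate` are hypothetical, and "schema-full" is modelled simply as "undeclared fields are rejected".

```python
from dataclasses import dataclass


@dataclass
class Property:
    type: type
    mandatory: bool = False


def validate(document, schema=None, strict=False):
    """Return a list of problems; an empty list means the document is accepted.

    schema=None            -> schema-less: everything is accepted
    schema + strict=True   -> schema-full: undeclared fields are rejected
    schema + strict=False  -> hybrid: undeclared fields are allowed
    """
    if schema is None:
        return []
    problems = []
    for name, prop in schema.items():
        if name not in document:
            if prop.mandatory:
                problems.append(f"missing mandatory field: {name}")
        elif not isinstance(document[name], prop.type):
            problems.append(f"wrong type for field: {name}")
    if strict:
        for name in document:
            if name not in schema:
                problems.append(f"undeclared field: {name}")
    return problems


user_schema = {"name": Property(str, mandatory=True), "age": Property(int)}
doc = {"name": "Fabrizio", "nickname": "fab"}

schema_less = validate(doc)                          # no schema at all
hybrid = validate(doc, user_schema)                  # extra field tolerated
schema_full = validate(doc, user_schema, strict=True)  # extra field rejected
```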
Class concept is taken from OOP
• A class can inherit from other classes, creating a tree (similar to RDF Schema)
• A sub-class inherits all the schema fields from its parents
• An abstract class is used as the foundation for other classes (it cannot have records)
• Class hierarchies allow native polymorphic queries
• 1 to 1 mapping with domain objects
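The inheritance rules above can be sketched in a few lines. `SchemaClass` and `select_from` are hypothetical names invented for this illustration, and the `person` / `user` / `speaker` hierarchy is made up. The sketch shows inherited fields, an abstract class that refuses records, and a query on a parent class that also returns records of its sub-classes.

```python
class SchemaClass:
    """Hypothetical schema class; names are illustrative only."""

    def __init__(self, name, parent=None, abstract=False, properties=()):
        self.name = name
        self.parent = parent
        self.abstract = abstract
        self.own_properties = list(properties)
        self.records = []

    def all_properties(self):
        # A sub-class sees its parents' fields followed by its own.
        inherited = self.parent.all_properties() if self.parent else []
        return inherited + self.own_properties

    def is_a(self, other):
        cls = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.parent
        return False

    def insert(self, record):
        if self.abstract:
            raise ValueError(f"abstract class {self.name} cannot have records")
        self.records.append(record)


def select_from(target, classes):
    """Polymorphic query: records of the target class and all its sub-classes."""
    return [r for c in classes if c.is_a(target) for r in c.records]


person = SchemaClass("person", abstract=True, properties=["name"])
user = SchemaClass("user", parent=person, properties=["email"])
speaker = SchemaClass("speaker", parent=user, properties=["talks"])

user.insert({"name": "Uli"})
speaker.insert({"name": "Fabrizio"})

everyone = select_from(person, [person, user, speaker])
only_speakers = select_from(speaker, [person, user, speaker])
```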
Let’s create a Document
{
  "@rid": "#12:216",
  "@class": "user",
  "name": "Fabrizio",
  "meetups": [
    {
      "name": "HUG Ireland",
      "city": "Dublin",
      "since": "14-03-2014"
    }
  ],
  "details": {
    "@type": "d",
    "@class": "user_details",
    "city": "Dublin",
    "nationality": "IT"
  }
}
• @rid: Immutable Record ID
• @class: Logical set
• name: Property
• meetups: Array of objects
• details: Embedded document
With a traditional Document DB you have to duplicate your data to some degree. The degree depends on how complex the interdependencies of the application domain are.

OrientDB combines the unique flexibility of documents with the power of graphs to unlock the business value of Document Data Relationships.
Graphs: everything old is new again
https://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg
What is a Graph Database?
“A Graph Database is any storage system that provides index-free adjacency”
Marko A. Rodriguez, The Graph Traversal Pattern [2010]
G = (V, E): a Graph G is a set of Vertices V and a set of Edges E
What’s wrong with joins?

User
name      id
Fabrizio  10
Uli       12
John      13
Eddie     88

member
user_id  meetup_id
10       18
10       24
13       18
88       66

Meetup
id  name
18  HUG Ireland
57  AWS Users
24  Microservices
66  Scala

• Given a User (Fabrizio)
• Find Fabrizio (id=10) in member table O(log n)
• Find 18 and 24 (HUG Ireland & Microservices) in Meetup table O(log n)
• Joins are computed every time you cross relationships
• Time complexity grows with data: O(log n)
• Joining 3-4 tables with millions of records could create billions of combinations
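The lookups described above can be sketched with the slide's own rows. This is a simplification under stated assumptions: sorted Python lists searched with `bisect` stand in for the B-tree indexes a relational engine would use, and `meetups_of` is a hypothetical helper. The point is that every relationship crossed costs another O(log n) index search.

```python
from bisect import bisect_left

# Rows from the slide's tables.
users = [("Fabrizio", 10), ("Uli", 12), ("John", 13), ("Eddie", 88)]
member = [(10, 18), (10, 24), (13, 18), (88, 66)]      # (user_id, meetup_id)
meetups = [(18, "HUG Ireland"), (57, "AWS Users"),
           (24, "Microservices"), (66, "Scala")]

# Sorted lists stand in for indexes.
member_index = sorted(member)                           # ordered by user_id
meetup_index = sorted(meetups)                          # ordered by id
meetup_ids = [row[0] for row in meetup_index]


def meetups_of(user_name):
    # Linear scan kept for brevity; a real table would index names too.
    user_id = next(uid for name, uid in users if name == user_name)
    # O(log n) search for the first member row of this user.
    start = bisect_left(member_index, (user_id,))
    names = []
    for uid, meetup_id in member_index[start:]:
        if uid != user_id:
            break
        # One more O(log n) search per relationship crossed.
        pos = bisect_left(meetup_ids, meetup_id)
        names.append(meetup_index[pos][1])
    return names


joined = meetups_of("Fabrizio")
```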
The Graph as an Index
• Given a User (Fabrizio)
• Traverse the member edges to reach HUG Ireland O(1) & Microservices O(1)
• Fabrizio is the index to reach the linked Meetups!
• Every vertex and edge is “hard wired” to its adjacent vertex or edge
• Traversing an edge does not require complex computation, near O(1)
• The traversal time is not affected by the database size
[Figure: Fabrizio -member-> HUG Ireland, Fabrizio -member-> Microservices]
Easier to sketch!
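Index-free adjacency can be sketched with plain object references. `Vertex`, `Edge` and `out` are hypothetical names for this illustration only. Each vertex holds direct references to its own edges, so reaching a neighbour means following a pointer rather than searching an index, and the cost depends on the vertex's own edges, not on how many vertices exist overall.

```python
class Vertex:
    def __init__(self, name):
        self.name = name
        self.out_edges = []       # direct references, no index involved


class Edge:
    def __init__(self, label, source, target):
        self.label = label
        self.source = source
        self.target = target
        source.out_edges.append(self)   # "hard wire" the edge to its vertex


def out(vertex, label):
    """Follow outgoing edges with the given label to the adjacent vertices."""
    return [e.target for e in vertex.out_edges if e.label == label]


fabrizio = Vertex("Fabrizio")
hug = Vertex("HUG Ireland")
micro = Vertex("Microservices")
scala = Vertex("Scala")           # unrelated vertex, never visited below

Edge("member", fabrizio, hug)
Edge("member", fabrizio, micro)

reached = [v.name for v in out(fabrizio, "member")]
```

Adding more unrelated vertices such as `scala` does not change the work done by `out(fabrizio, "member")`.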
Combine Documents with Graphs
{
  "@rid": "12:216",
  "@class": "user",
  "name": "Fabrizio",
  "details": {
    "@type": "d",
    "@class": "user_detail",
    "city": "Dublin",
    "nationality": "IT"
  }
}
{
  "@rid": "13:12",
  "@class": "meetup",
  "name": "HUG Ireland",
  "city": "Dublin"
}
{
  "@rid": "14:32",
  "@class": "member",
  "since": "14-03-2014",
  "in": "12:216",
  "out": "13:12"
}
out_member=14:32 in_member=14:32
{
  "@rid": "15:79",
  "@class": "talk",
  "title": "OrientDB",
  "on": "11-04-2016",
  "in": "12:216",
  "out": "13:12"
}
out_talk=15:79
in_talk=15:79
Multi-relational Document Graph
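The records on this slide can be walked as a small graph of documents. The sketch below is an illustration, not OrientDB's storage format or API: `records` and `neighbours` are hypothetical, and attaching `out_member` / `out_talk` to the user and `in_member` / `in_talk` to the meetup is an assumption about what the slide's annotations point at. Because edges are documents themselves, they carry their own properties (`since`, `title`).

```python
records = {
    "12:216": {"@class": "user", "name": "Fabrizio",
               "details": {"city": "Dublin", "nationality": "IT"},
               "out_member": ["14:32"], "out_talk": ["15:79"]},
    "13:12": {"@class": "meetup", "name": "HUG Ireland", "city": "Dublin",
              "in_member": ["14:32"], "in_talk": ["15:79"]},
    "14:32": {"@class": "member", "since": "14-03-2014",
              "in": "12:216", "out": "13:12"},
    "15:79": {"@class": "talk", "title": "OrientDB", "on": "11-04-2016",
              "in": "12:216", "out": "13:12"},
}


def neighbours(rid, edge_class):
    """Yield (edge, vertex) pairs reached from `rid` through `edge_class` edges."""
    vertex = records[rid]
    edge_rids = (vertex.get(f"out_{edge_class}", [])
                 + vertex.get(f"in_{edge_class}", []))
    for edge_rid in edge_rids:
        edge = records[edge_rid]
        # Take whichever end of the edge is not the vertex we started from.
        other = edge["out"] if edge["in"] == rid else edge["in"]
        yield edge, records[other]


memberships = [(e["since"], v["name"]) for e, v in neighbours("12:216", "member")]
talks = [(e["title"], v["name"]) for e, v in neighbours("12:216", "talk")]
members_of_hug = [v["name"] for _, v in neighbours("13:12", "member")]
```

The same two vertices are connected by two different kinds of edge, which is what makes the graph multi-relational.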
Would you believe me if I said you can query documents and graphs with SQL-like syntax?
Show me something now! OK, time for a quick demo.
http://www.sharegoodstuffs.com/2011_12_12_archive.html
Use Case: raise standards in Irish Public Office
The challenges
• Aggressive deadline
• Large amount of data from different sources with different formats
• Messy, dirty data
• Connecting records from different sources that represent the same thing without a common identifier
• Multi-step traversal of fixed and inferred links to identify disparate entities connected by a path
The solution
OrientDB
Fuzzy Inference Engine
Technical Details
• Main Language: Groovy
• Database Type: OrientDB Embedded
• Fuzzy Inference Engine: Duke
• minHash proximity index based on Lucene to avoid a Cartesian product
• probabilistic model with configurable statistical algorithms (Levenshtein, NGram, Soundex, Custom, etc.) to identify the same entities despite differences
• End-To-End Process Time < 10 min
• Deliverable: Database
• Preset of queries to answer the main questions (analysts can add or modify WHERE conditions on their own)
• GraphView to visually search and visualise data
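To give a feel for the fuzzy matching step, here is a minimal sketch of one of the comparators named above, Levenshtein distance, used with a similarity threshold. It is not Duke's API (Duke is a Java library). The function names, the 0.8 threshold and the sample names are all made up for illustration. The nested loop in `link_records` compares every pair, which is exactly the Cartesian product that the minHash index is there to avoid.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalise the distance to a score in [0, 1]; 1.0 means identical."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def link_records(source_a, source_b, threshold=0.8):
    """Pair up names from two sources whose similarity reaches the threshold."""
    return [(a, b) for a in source_a for b in source_b
            if similarity(a, b) >= threshold]


# Made-up sample names, standing in for two data sources with no shared key.
register = ["John Murphy", "Mary O'Brien"]
filings = ["Jon Murphy", "Marie O Brien", "Patrick Walsh"]
links = link_records(register, filings)
```

Each linked pair could then be stored as an inferred edge between the two records, next to the fixed links that come straight from the data.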
What people from home perceived
≈ 20K tweets
Top hashtag in Ireland for 24 hours: #rteinvestigates
OK but what about Big Data?
“While we’ve long understood the value of Big Data to better understand how people interact with us, we’ve noticed an alarming trend of Big Data envy: organizations using complex tools to handle “not-really-that-big” Data. Distributed map-reduce algorithms are a handy technique for large data sets, but many data sets we see could easily fit in a single node relational or graph database. Even if you do have more data than that, usually the best thing to do is to first pick out the data you need, which can often then be processed on such a single node”
ThoughtWorks Technology Radar, 5 April 2016
Begin the journey!
https://www.udemy.com/orientdb-getting-started/
Resources
• http://martinfowler.com/bliki/PolyglotPersistence.html
• https://en.wikipedia.org/wiki/Multi-model_database
• http://orientdb.com/
• https://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg
• http://arxiv.org/pdf/1004.1001.pdf
• https://www.udemy.com/orientdb-getting-started/
• http://www.rte.ie/news/investigations-unit/2015/1207/751833-rte-investigates/
• https://github.com/larsga/Duke
• https://www.thoughtworks.com/radar
Q & A
Thank you!
