Many complex applications scale by using several different databases, selecting the best DBMS for each use case. This tends to complicate the architecture with many products from different vendors, no standards, and a lot of ETL, which ultimately causes unpredictable results and a lot of headaches. Multi-Model DBMSs were created to make your life easier, giving you the option of using one NoSQL product with powerful multi-purpose engines capable of handling complex domains. Could one DBMS handle all your needs, including speed and scalability, in the age of Big Data? Luca will walk you through the benefits and trade-offs of multi-model DBMSs and will show you how easy it is to set up one open source database to handle many different use cases, saving you time and money.
Presented at Data Day Texas - Austin (TX) - USA
3. The World Has Changed
Then: Structured Data, Small Datasets, Few Relationships, Waterfall Approach, Scale Up, CIO
Now: Unstructured Data, Large Volume, Connected Data, Agile Approach, Scale Out, Developers
Relational (1970), NoSQL (2009)
A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Motivations for this approach include: simplicity of design, "horizontal" scaling, which is a problem for relational databases, and finer control over availability.
What’s Next?
5. Polyglot Persistence
Polyglot Persistence is a fancy term to
describe that when storing data, it is best to use
multiple data storage technologies, chosen
based upon the way data is being used by
individual applications or components.
http://www.jamesserra.com/archive/2015/07/what-is-polyglot-persistence/
7. Multi-Model
A multi-model database is designed to support
multiple data models against a single, integrated
backend.
Multi-model databases are intended to offer the
data modeling advantages of polyglot
persistence without its disadvantages. Complexity,
in particular, is reduced.
https://en.wikipedia.org/wiki/Multi-model_database
8. What’s a Multi-Model DBMS?
Multi-Model represents the intersection of multiple models in just one product:
• Graph
• Document
• Object
• Key/Value
• Full-Text
• Spatial
12. OrientDB
• First Multi-Model DBMS with a Graph-Engine
• Open Source Apache2 license
• Data Models are built into the core engine
• The Graph Database engine allows O(1) performance on
traversing relationships, against O(LogN) of RDBMS and
any other Multi-Model DBMS built as layers
• Schema-less, Schema-full and Schema-mixed
• Use of Apache Lucene for Full-Text and Spatial
• Written in Java (runs on every platform)
• Zero-config HA
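The O(1) versus O(log N) claim above can be pictured with a toy comparison. This is only a minimal sketch, not OrientDB's actual storage engine: the `Vertex` class, the sorted `edge_index` list, and the sample data are all hypothetical. It contrasts following a direct reference stored on the record with resolving a relationship through a sorted index, which needs a binary search per hop.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Vertex:
    name: str
    # Direct references to neighbours: following one is a constant-time hop.
    out: list = field(default_factory=list)


def neighbours_direct(vertex):
    """Graph-style traversal: the links live on the record itself."""
    return [v.name for v in vertex.out]


def neighbours_via_index(edge_index, source):
    """Join-style traversal: binary-search a sorted (source, target) index."""
    start = bisect.bisect_left(edge_index, (source, ""))
    result = []
    for src, dst in edge_index[start:]:
        if src != source:
            break
        result.append(dst)
    return result


alice, bob, carol = Vertex("alice"), Vertex("bob"), Vertex("carol")
alice.out.extend([bob, carol])

# The same relationships stored as a sorted index (hypothetical layout),
# the way a foreign-key lookup would resolve them.
edge_index = sorted([("alice", "bob"), ("alice", "carol"), ("bob", "carol")])

print(neighbours_direct(alice))
print(neighbours_via_index(edge_index, "alice"))
```

Both calls return the same neighbours. The difference is the cost of finding where they start: the direct version never searches, while the indexed version pays a logarithmic lookup on every hop.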
19. Polyglot Persistence in Action
Example: Hotel Booking Application
• KEY/VALUE: User Sessions. Rapid access for reads and writes. No need to be durable.
• RELATIONAL: Financial Data. Needs transactional updates. It will manage orders and payments.
• GRAPH: Recommendations. Rapidly traverse links between friends, product purchases, and ratings.
• DOCUMENT: Product Catalog. Lots of reads, infrequent writes. Products make natural aggregates.
• SEARCH: Search Engine. Full-text search. Support for faceted search and suggestions.
21. Multi-Model in Action
Example: Hotel Booking Application
• User Sessions. Rapid access for reads and writes. No need to be durable.
• Financial Data. Needs transactional updates. It will manage orders and payments.
• Recommendations. Rapidly traverse links between friends, product purchases, and ratings.
• Product Catalog. Lots of reads, infrequent writes. Products make natural aggregates.
• Search Engine. Full-text search. Support for faceted search and suggestions.
23. Deployment
Multi-Model
• Only 1 product to learn
• Only 1 server to configure and deploy
• Only 1 vendor in case of support
Polyglot
• 5 products to learn
• 5 servers to configure and deploy
• 5 vendors in case of support
24. Polyglot Deployment
• 5 PRODUCTS TO LEARN
No standard, all products are different. Even in the same category, they
have different APIs (e.g. MongoDB and CouchDB). Every developer has to
learn multiple products, or you have to hire multiple developers with specific
skills for every product.
• 5 SERVERS TO CONFIGURE AND DEPLOY
Usually it’s a bad idea to put multiple databases on the same machine due to
their aggressive use of resources such as RAM and disk.
• 5 VENDORS IN CASE OF SUPPORT
This means 5 contracts with 5 different vendors.
29. Domain Design
Multi-Model
• The entire domain is represented in just one model in the same database
• All data is interconnected and easy to access
• Easy to refactor
Polyglot
• 5 different designs, each reproducing part of the data on one product
• Application-level management of relationships between data held in different datasets and represented in different ways
• Hard to refactor
31. Polyglot: Sequence Diagram
APPLICATION
(1) Request Product Detail Page
(2) Get Product Details
(3) Get Recommendation for the current product
(4) Get basic information for each recommended product
(5) Get orders to check availability
(6) Check concurrent user activity on the same product
(7) Update current user activity (in background)
32. Polyglot: Performance
APPLICATION
(1) Request Product Detail Page
(2) Get Product Details
(3) Get Recommendation for the current product
(4) Get basic information for each recommended product
(5) Get orders to check availability
(6) Check concurrent user activity on the same product
(7) Update current user activity (in background)
Step timings: 10ms, 50ms, 200ms, 150ms, 20ms, 10ms, 100ms
Total Time = 530ms
33. Multi-Model: Sequence Diagram
APPLICATION
(1) Request Product Detail Page
(2) Get Product Details
(3) Get Recommendation for the current product
(4) Get basic information for each recommended product
(5) Get orders to check availability
(6) Check concurrent users activity on the same product
(7) Update concurrent user activity (in background)
34. Multi-Model: Performance
APPLICATION
(1) Request Product Detail Page = 10ms
Steps (2) to (7) together = 290ms
(2) Get Product Details
(3) Get Recommendation for the current product
(4) Get basic information for each recommended product
(5) Get orders to check availability
(6) Check concurrent users activity on the same product
(7) Update concurrent user activity (in background)
Total Time = 300ms
35. Caching to the Rescue
(1) Request Product Detail Page
(2) Get Product Details
(3) Get Recommendation for the current product
(4) Get basic information for each recommended product
(5) Get orders to check availability
(6) Check concurrent users activity on the same product
(7) Update current user activity (in background)
Step timings: 10ms, 50ms, 200ms, 150ms, 20ms, 10ms, 100ms
If product descriptions don’t change very often, they can be cached.
Caching recommendations means losing the ability to recommend per user, leaving only per-product recommendations.
36. Polyglot: Parallel Async Execution
APPLICATION
(1) Request Product Detail Page
(2) Get Product Details
(3) Get Recommendation for the current product
(4) Get basic information for each recommended product
(5) Get orders to check availability
(6) Check concurrent users activity on the same product
(7) Update current user activity
Step timings: 10ms, 50ms, 200ms, 150ms, 20ms, 10ms, 100ms
Total Time = 310ms
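The 530ms and 310ms totals can be reproduced with a little arithmetic. The sketch below rests on an assumption: the slide text does not preserve which duration belongs to which step, so the pairing used here is hypothetical. Both totals come out the same as long as the recommendation query (200ms) and the follow-up lookup of the recommended products (100ms) form the longest dependent chain, and the background update (10ms) is left off the response path.

```python
# Step durations in milliseconds. The pairing of durations with steps is an
# assumption: the slide text lists the numbers without saying which is which.
REQUEST_MS = 10
DB_CALLS_MS = {
    "product_details": 50,
    "recommendations": 200,
    "recommended_product_info": 100,
    "orders": 150,
    "concurrent_activity": 20,
}
# The background update of user activity (10 ms) is not on the response path.


def sequential_total(request_ms, calls):
    """Every call waits for the previous one to finish."""
    return request_ms + sum(calls.values())


def parallel_total(request_ms, calls):
    """Independent calls run concurrently; only the longest chain counts.

    Fetching info for the recommended products has to wait for the
    recommendations themselves, so those two form one chain.
    """
    chains = [
        calls["product_details"],
        calls["recommendations"] + calls["recommended_product_info"],
        calls["orders"],
        calls["concurrent_activity"],
    ]
    return request_ms + max(chains)


print(sequential_total(REQUEST_MS, DB_CALLS_MS))  # 530
print(parallel_total(REQUEST_MS, DB_CALLS_MS))    # 310
```

Parallel execution cannot go below the longest dependent chain, which is why the polyglot version stays at 310ms here.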
37. Performance
With complex domains, Multi-Model is faster than Polyglot.
But when the domain is simple, using specific products could give you better performance.
38. Performance continued...
• With OrientDB, we have many stories about users that
switched from a pure Graph Database to OrientDB. In
all the cases, they had comparable or better
performance.
• On the other hand, we don’t have many stories about users who switched from a Key-Value store to OrientDB.
• Performance depends on the Multi-Model product.
• With Multi-Model, it’s very important to have the models built into the engine. If they are just layers, you’ll face a lot of compromises in terms of flexibility and performance.
40. Features
Multi-Model
Even if Multi-Model products are feature-rich, you may not find the feature you need.
Polyglot
You can choose from 300 products, giving you access to all the available features.
43. Polyglot: Synchronization by ETL
[Diagram: DOCUMENT, GRAPH, and RELATIONAL databases linked by ETL]
In order to use the Recommendation engine, you have to develop the ETL to pump data into the Graph Database every hour/day, mixing data of products and sales. The Search Engine, instead, only needs data from the Product Catalog.
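The ETL job described on this slide could look roughly like the following. It is a minimal sketch with plain Python lists and dictionaries standing in for the document, relational, and graph stores. The record shapes and the function names `extract`, `transform`, and `load` are hypothetical.

```python
def extract(products, sales):
    """Pull raw rows from the document and relational stores (here: plain lists)."""
    return list(products), list(sales)


def transform(products, sales):
    """Mix products and sales into (user, product) 'bought' edges for the graph."""
    known_products = {p["id"] for p in products}
    return sorted(
        {(s["user"], s["product_id"]) for s in sales if s["product_id"] in known_products}
    )


def load(graph, edges):
    """Write the edges into the graph store (here: an adjacency dict)."""
    for user, product_id in edges:
        graph.setdefault(user, set()).add(product_id)
    return graph


products = [{"id": "p1", "name": "Sea View Room"}, {"id": "p2", "name": "Suite"}]
sales = [
    {"user": "ann", "product_id": "p1"},
    {"user": "ann", "product_id": "p2"},
    {"user": "bob", "product_id": "p1"},
    {"user": "bob", "product_id": "p9"},  # unknown product, dropped by transform
]

graph = load({}, transform(*extract(products, sales)))
print(graph)
```

Between two runs of such a job, the graph lags behind the other stores, so recommendations are only as fresh as the last hourly or daily load.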
44. Polyglot: Synchronization by App
[Diagram: APPLICATION writing to the DOCUMENT, GRAPH, and RELATIONAL databases]
You can avoid ETL if the application is responsible for populating all the DBMSs and keeping them in sync.
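Application-level synchronization means every write has to reach every store, and the application has to decide what to do when one of them fails. The sketch below is hypothetical throughout: `InMemoryStore` stands in for real database clients, and the undo step is a best-effort compensation, not a distributed transaction.

```python
class InMemoryStore:
    """Hypothetical stand-in for a real database client."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.records = {}

    def save(self, key, value):
        if self.fail:
            raise ConnectionError(f"{self.name} is unavailable")
        self.records[key] = value

    def delete(self, key):
        self.records.pop(key, None)


def save_everywhere(stores, key, value):
    """Write to every store; undo the writes already made if one fails."""
    written = []
    try:
        for store in stores:
            store.save(key, value)
            written.append(store)
    except ConnectionError:
        for store in written:
            store.delete(key)  # best-effort compensation, not a real transaction
        return False
    return True


document, graph = InMemoryStore("document"), InMemoryStore("graph")
relational = InMemoryStore("relational", fail=True)

ok = save_everywhere([document, graph, relational], "p1", {"name": "Suite"})
print(ok, document.records, graph.records)
```

With the relational store down, the write is rolled back from the other two stores, so none of them keeps a record the others lack. Handling this correctly for updates, retries, and partial failures is the extra work the application takes on when it replaces ETL.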
45. Let’s put everything in High Availability (HA)
47. Redis in HA
[Diagram: Server A + Sentinel A, Server B + Sentinel B, Server C + Sentinel C]
Suggested Configuration:
Deploy at least 3 Redis Servers + Redis Sentinel on 3 separate boxes
http://redis.io/topics/sentinel
48. Neo4j in HA
Suggested Configuration:
Deploy at least 3 Neo4j Servers
http://neo4j.com/docs/stable/ha-architecture.html
49. MongoDB in HA
[Diagram: Primary, Secondary 1, Secondary 2]
Suggested Configuration:
Deploy at least 3 MongoDB Servers (1 Primary and 2 Secondary Servers)
https://docs.mongodb.org/manual/core/replica-set-members/
50. ElasticSearch in HA
Suggested Configuration:
Deploy at least 2 ElasticSearch Servers
https://www.elastic.co/guide/en/elasticsearch/guide/current/_add_failover.html
51. MySQL in HA
Sorry, but the ways to put MySQL in HA are too many…
I found this configuration with 2 master servers that should be
the minimum for HA.
55. Multi-Model in HA
APPLICATION
Servers = 3
• OrientDB supports Multi-Master replication with flexible sharding
• Zero-config cluster deployment lets you create a cluster of servers in a few minutes
• When a new server connects to the cluster, the database is automatically shared
• All the clients are always notified about new servers, so in case of a crash, the client can automatically switch to another available server with no failure at the application level
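The client-side failover described on this slide can be pictured as a loop over the servers the client knows about. This is a generic sketch, not the OrientDB client API: `send`, `ServerDown`, the node names, and the `down` parameter used to simulate a crash are all hypothetical.

```python
class ServerDown(Exception):
    """Raised by the stand-in transport when a server cannot be reached."""


def send(server, query, down):
    """Hypothetical transport: fails for servers listed in `down`."""
    if server in down:
        raise ServerDown(server)
    return f"{server} answered: {query}"


def query_with_failover(servers, query, down=frozenset()):
    """Try each known server in turn and return the first successful answer."""
    for server in servers:
        try:
            return send(server, query, down)
        except ServerDown:
            continue  # move on to the next server the client knows about
    raise ServerDown("no server available")


servers = ["node1", "node2", "node3"]
print(query_with_failover(servers, "SELECT FROM Product", down={"node1"}))
```

The application code only sees the answer from `node2`. The crash of `node1` is absorbed inside the client, which matches the "no failure at application level" behaviour the slide describes.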
58. OrientDB At a Glance
• 70,000 downloads per month from 200+ countries
• 100+ code contributors on GitHub and 15,000+ commits
• 1,000s of users, from SMBs to Fortune 10 companies
• 17+ years of product research
• Global coverage and 24x7 support
59. Awards and Press Coverage
2015 Bossie Award Winner
OrientDB is an interesting hybrid in the NoSQL world,
combining features from a document database and a graph
database.
A new breed of database hopes to blend the best
of NoSQL and RDBMS
Multi-model databases may help tame the growing
complexity of enterprise data.
11 cutting-edge databases worth exploring now
OrientDB packages itself as a "second-generation graph
database." In other words, the nodes in the graphs are
documents waiting for arbitrary key-value pairs.
60. A Bright Future
Graph DBMS increased their popularity
by 500% within the last 2 years.
Document DBMS are the 3rd fastest
growing category.
Forrester estimates that over 25 percent of enterprises will
use graph databases by 2017.
Among the top 50, OrientDB is the technology with the
largest year-on-year growth (+22 positions).
61. Don’t miss my presentation
Tomorrow, at GraphDay, 10:00am:
Working Towards an Unbreakable Graph Database that Scales