Polyglot Persistence
{ Name: 'Bryan Reinero',
  Title: 'Developer Advocate',
  Twitter: '@blimpyacht',
  Email: 'bryan@mongodb.com' }
What is Polyglot Persistence?
• Using multiple database technologies in a given application
• Using the right tool for the right job
Derived from "polyglot programming": applications written in a mix of languages.
Why Polyglot Persistence?
• Relational has been the dominant model
• Higher performance requirements
• Increasingly large datasets
• Use of IaaS and commodity hardware
Vertical Scaling
Horizontal Scaling
Availability
Requirements
• Maximize uptime
• Minimize time to recover
Causes of downtime
• Hardware failures
• Network partitions
• Data center failures
• Maintenance
• Operations
Business-critical systems require automatic fault detection and failover.
Variant Data Models
Key-Value Store
ID      Name
58842   Eratosthenes
45647   Democritus
52320   Hypatia
88237   Shemp
78932   Euripides
Variant Data Models
Graph Databases
Nodes: Eratosthenes, Democritus, Hypatia, Shemp, Euripides
Variant Data Models
Document Databases
{
  maker : "Agusta",
  type : "sportbike",
  rake : 7,
  trail : 3.93,
  engine : {
    type : "internal combustion",
    layout : "inline",
    cylinders : 4,
    displacement : 750
  },
  transmission : {
    type : "cassette",
    speeds : 6,
    pattern : "sequential",
    ratios : [ 2.7, 1.94, 1.34, 1, 0.83, 0.64 ]
  }
}
Polyglot Persistence
Application servers connect to:
• Key/Value: session data, shopping carts
• MongoDB: product catalog, user accounts, domain objects
• RDBMS: payment systems, reporting
• Graph: social data, recommendations
What are your requirements?
• Availability
• Scalability
• Performance
• Access Patterns
• Data Model
Key Value Stores
ID      Name
58842   Eratosthenes
45647   Democritus
52320   Hypatia
88237   Shemp
78932   Euripides
Used for
• Session data
• Cookies
• Shopping carts
Key Value Stores
• Fast, if in memory
• Single access pattern
• Complex data parsed in the client
Key Value Store
"{
  maker : 'Agusta',
  type : 'sportbike',
  rake : 7,
  trail : 3.93,
  engine : {
    type : 'internal combustion',
    layout : 'inline',
    cylinders : 4,
    displacement : 750
  },
  transmission : {
    type : 'cassette',
    speeds : 6,
    pattern : 'sequential',
    ratios : [ 2.7, 1.94, 1.34, 1, 0.83, 0.64 ]
  }
}"
The entire document is stored as a single opaque string value.
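To see why "complex data parsed in client" matters, here is a minimal sketch (plain JavaScript, with a `Map` standing in for a hypothetical key-value store): the store only gets and sets whole blobs, so any partial read means fetching everything and deserializing it in the application.

```javascript
// Hypothetical key-value store simulated with a Map:
// values are opaque strings, so structure is invisible to the store.
const kvStore = new Map();

const vehicleJson = JSON.stringify({
  maker: "Agusta",
  type: "sportbike",
  engine: { type: "internal combustion", cylinders: 4, displacement: 750 }
});

// The store can only set/get whole values by key...
kvStore.set("vehicle:78234974", vehicleJson);

// ...so even reading one nested field requires fetching the full blob
// and parsing it client-side.
const blob = kvStore.get("vehicle:78234974");
const vehicle = JSON.parse(blob);
console.log(vehicle.engine.displacement); // 750
```

A document database, by contrast, understands the structure of the value and can query or project into it on the server.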
Data Model
RDBMS MongoDB
Table, View ➜ Collection
Row ➜ Document
Index ➜ Index
Join ➜ Embedded Document
Foreign Key ➜ Reference
Partition ➜ Shard
MongoDB
{
  _id : 78234974,
  maker : "Agusta",
  type : "sportbike",
  rake : 7,
  trail : 3.93,
  engine : {
    type : "internal combustion",
    layout : "inline",
    cylinders : 4,
    displacement : 750
  },
  transmission : {
    type : "cassette",
    speeds : 6,
    pattern : "sequential",
    ratios : [ 2.7, 1.94, 1.34, 1, 0.83, 0.64 ]
  }
}
• Self-defining schema
• Nested objects
• Array types
• Primary key (_id), automatically indexed
Multiple Access Patterns
• Secondary indexes on any field, including fields in nested documents
Projections
db.vehicles.find(
  { _id: 78234974 },
  { engine: 1, _id: 0 }
)
Returns only the engine subdocument.
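What the server does with `{ engine: 1, _id: 0 }` can be mimicked client-side. This is an illustrative sketch only (the `project` helper is hypothetical, covering just the simple include/exclude form shown on the slide), not MongoDB's actual implementation:

```javascript
// Illustrative: apply a MongoDB-style projection to a plain object.
// Handles top-level include (1) / exclude (0) specs only.
function project(doc, projection) {
  const result = {};
  for (const field of Object.keys(projection)) {
    if (projection[field] === 1 && field in doc) result[field] = doc[field];
  }
  // _id is returned by default unless explicitly excluded.
  if (projection._id !== 0 && "_id" in doc) result._id = doc._id;
  return result;
}

const vehicle = {
  _id: 78234974,
  maker: "Agusta",
  engine: { type: "internal combustion", cylinders: 4, displacement: 750 }
};

const projected = project(vehicle, { engine: 1, _id: 0 });
console.log(projected); // only the engine subdocument remains
```

Projections reduce network transfer and client-side parsing: only the requested fields leave the server.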
Flexible Schemas
{
  maker : "M.V. Agusta",
  type : "sportsbike",
  engine : {
    type : "internal combustion",
    cylinders : 4,
    displacement : 750
  },
  rake : 7,
  trail : 3.93
}
{
  maker : "M.V. Agusta",
  type : "Helicopter",
  engine : {
    type : "turboshaft",
    layout : "axial",
    massflow : 1318
  },
  blades : 4,
  undercarriage : "fixed"
}
Flexible Schemas
• Discriminator field: "type" distinguishes the document kinds stored in the same collection
• Shared indexing strategy: fields common to both shapes (e.g. "maker") can share an index
• Polymorphic attributes: the "engine" subdocument differs in shape per type
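The application side of the discriminator pattern can be sketched as follows. This is a hypothetical handler (the `describeEngine` name and output strings are illustrative), showing how code dispatches on the "type" field when one collection holds differently shaped documents:

```javascript
// Hypothetical handler dispatching on the "type" discriminator field.
function describeEngine(doc) {
  switch (doc.type) {
    case "sportsbike":
      return `${doc.engine.cylinders}-cylinder, ${doc.engine.displacement}cc`;
    case "Helicopter":
      return `${doc.engine.type}, mass flow ${doc.engine.massflow}`;
    default:
      // Unknown types are tolerated rather than rejected: the schema is flexible.
      return "unknown engine";
  }
}

const bike = {
  maker: "M.V. Agusta",
  type: "sportsbike",
  engine: { type: "internal combustion", cylinders: 4, displacement: 750 }
};
const heli = {
  maker: "M.V. Agusta",
  type: "Helicopter",
  engine: { type: "turboshaft", layout: "axial", massflow: 1318 }
};

console.log(describeEngine(bike)); // "4-cylinder, 750cc"
console.log(describeEngine(heli)); // "turboshaft, mass flow 1318"
```

The same pattern works for queries: filtering on the discriminator (e.g. all documents with type "Helicopter") uses the shared index while each shape keeps its own attributes.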
Tao of MongoDB
• Model data for use, not storage
• Avoid ad-hoc queries
• Index effectively, index efficiently
What About the Real World?
Systems of Engagement
• Context-rich, user-relevant interactions
• Integrates data from many systems
• Integrates analytics
Customer Single View
• Understand customer relationships
• Improve customer experience
• Develop effective customer marketing
• Improve products
Requirements
• High performance requirements
• Increasingly large datasets
• High Availability
Architecture
Systems of Record (master data, raw data: record, record, record, …) → ETL → Data Services (integrated data) → Systems of Engagement
Aggregating a Single View
The single customer view document combines:
• Common data
• Source metadata
• Source Data A, Source Data B: e.g. shopping cart, purchase history, prescriptions, medical history

Common data, e.g. an address:
{
  _id: <hash>,
  address: {
    num: 860,
    street: "Grove",
    city: "San Francisco",
    state: "CA",
    zip:
  }
}

Source metadata records provenance:
{
  sources: [
    {
      source: "URI",
      updated: ISODate(),
    },
    …
  ]
}
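A minimal sketch of the aggregation step, assuming a hypothetical `buildSingleView` helper (the function name, source URIs, and merge policy are all illustrative): each source system contributes fields to the view, and a "sources" array records where each update came from and when.

```javascript
// Illustrative sketch: merge documents from several source systems into a
// single-view document, recording provenance in a "sources" array.
function buildSingleView(customerId, sourceDocs) {
  const view = { _id: customerId, sources: [] };
  for (const { uri, data } of sourceDocs) {
    // Naive merge policy: later sources win on conflicting fields.
    Object.assign(view, data);
    view.sources.push({ source: uri, updated: new Date() });
  }
  return view;
}

const view = buildSingleView("cust-123", [
  { uri: "crm://accounts", data: { name: "Ada", address: { city: "San Francisco" } } },
  { uri: "shop://carts", data: { cart: ["milk", "eggs"] } }
]);

console.log(view.name);           // "Ada"
console.log(view.sources.length); // 2
```

In practice the merge policy (which source wins per field, how conflicts are flagged) is the hard part of a single-view project, not the mechanics of copying fields.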
Strong Consistency vs. Eventual Consistency
• Availability
• Failover
Analytics
Hadoop
A framework for distributed processing of large data sets
• Terabyte and petabyte datasets
• Data warehousing
• Advanced analytics
• No indexes
• Batch processing
Use Cases
• Behavioral analytics
• Segmentation
• Fraud detection
• Prediction
• Pricing analytics
• Sales analytics
Typical Implementations
Application Server
MongoDB as an Operational Store
Application Server
Data Flows
Hadoop
Connector
BSON Files
MapReduce & HDFS
Cluster
Client → mongos → Shard A, Shard B, Shard C, Shard D
Hadoop / Spark Trade-offs
Plus
• Access to analytics libraries
• Processes unstructured data
• Handles petabyte data sets
Minus
• Overhead of a separate distributed system
• Writing MapReduce is not for the faint of heart
• Designed for batch-oriented processing
Relational for Reporting & Business Intelligence
Plus
• Existing ecosystem of BI tools
• Lower overhead than Hadoop clusters
• Large pool of expertise and talent
Capture Data Changes
Systems of Engagement and Data Services publish changes to a bus (Apache Kafka) for Data Processing (integration, analytics, etc.) and propagation back to the Systems of Record (master data, raw data, integrated data, record, record, record, …), alongside the existing ETL flow.
Integrations & ETL
• Primary → oplog replication → ETL → RDBMS
• Primary → oplog replication → Mongo Connector → Lucene
Integrations with search solutions
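The essence of a connector like Mongo Connector is tailing a change stream and replaying each operation against a downstream system. A minimal sketch, assuming a plain array standing in for the oplog and a Map standing in for the search index (event field names here are illustrative, not the real oplog format):

```javascript
// Illustrative: replay oplog-like change events into a downstream
// search index (simulated with a Map keyed by _id).
const searchIndex = new Map();

function applyChange(event) {
  switch (event.op) {
    case "insert":
    case "update":
      searchIndex.set(event._id, event.doc);
      break;
    case "delete":
      searchIndex.delete(event._id);
      break;
  }
}

// Simulated oplog entries, applied in order.
const oplog = [
  { op: "insert", _id: 1, doc: { maker: "Agusta", type: "sportbike" } },
  { op: "update", _id: 1, doc: { maker: "M.V. Agusta", type: "sportbike" } },
  { op: "insert", _id: 2, doc: { maker: "Agusta", type: "Helicopter" } },
  { op: "delete", _id: 2 }
];
oplog.forEach(applyChange);

console.log(searchIndex.size);         // 1
console.log(searchIndex.get(1).maker); // "M.V. Agusta"
```

Because operations are applied in oplog order, the downstream index converges to the same state as the source, which is exactly the eventual-consistency trade-off discussed earlier.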
Considerations
• Increased system complexity
• Operations overhead
• Increased expertise required
Thanks!
{ Name: 'Bryan Reinero',
  Title: 'Developer Advocate',
  Twitter: '@blimpyacht',
  Email: 'bryan@mongodb.com' }

Editor's Notes

  • #38 Source http://www.experian.co.uk/assets/about-us/white-papers/single-customer-view-whitepaper.pdf
  • #40 So let's add a component that will propagate changes from the system of engagement back to the systems of record In addition to the previous components we put in place for the single view, we need some sort of message processing component to receive and publish data changes back to the source systems. For this example, we will use Apache Kafka as it is pretty commonly used these days. We'll show changing the integrated data in the system of engagement database and propagating that back to the systems of record
  • #58 Immutable data