MongoDB Schema Design:
Practical Applications and
Implications
Austin Zellner
Austin.zellner@mongodb.com
MongoDB Solution Architect
Ohio Valley Region
Agenda
• What the heck is a Mongo?
• MongoDB Data Structure 101: Documents
• Object Relationships
• Enterprise Design Patterns
• Enterprise Smells and Solutions
• If Nothing Else...
What the heck is a
Mongo?
So what is the MongoDB
Architecture?
Mongo Server
C++ Codebase
Javascript Engine
BSON
Memory
RAM / Disk
Java
Python
Perl
Ruby
Haskell
JavaScript
Native Drivers
Common Interface
> db.collection.insert({company: "10gen", product: "MongoDB"})
>
> db.collection.findOne()
{
"_id": ObjectId("5106c1c2fc629bfe52792e86"),
"company": "10gen",
"product": "MongoDB"
}
Command Line Shell
Javascript Interpreter
Database
Collection
MongoDB contains Facts
Document – BSON File
_id
Facts
Index
_idFileLoc
BSON Read / Write Machine
Server
Index
_idFileLoc
Listener
Executes
Javascript
C++
Node
Nodes come together to form a Replica Set
Node
Node Node
Primary
Secondary Secondary
Replica Set
OpLog OpLog
Replica Sets form Sharded Clusters
Sharded Cluster
Node
Primary
Secondary
Node
Secondary
Node
Node
Primary
Secondary
Node
Secondary
Node
Node
Primary
Secondary
Node
Secondary
Node
Sharding allows expansion
Key 1-99 Key 100-199 Key 200-299 Key 300-399 Key 400-499 Key 400-499
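The key ranges on this slide can be read as a routing table from shard key to shard. Below is a minimal sketch in plain JavaScript, assuming an integer shard key. The shard names and the `routeByKey` helper are hypothetical, and only the first five ranges from the slide are used. It illustrates the idea of range-based routing, not how the server implements it.

```javascript
// Hypothetical routing table: each entry maps an inclusive key range to a shard.
// The ranges mirror the first five on the slide; the shard names are made up.
const chunkRanges = [
  { min: 1, max: 99, shard: "shard1" },
  { min: 100, max: 199, shard: "shard2" },
  { min: 200, max: 299, shard: "shard3" },
  { min: 300, max: 399, shard: "shard4" },
  { min: 400, max: 499, shard: "shard5" },
];

// Return the shard whose range contains the key, or null if no range matches.
function routeByKey(key, ranges) {
  const hit = ranges.find((r) => key >= r.min && key <= r.max);
  return hit ? hit.shard : null;
}

console.log(routeByKey(42, chunkRanges));  // shard1
console.log(routeByKey(250, chunkRanges)); // shard3
console.log(routeByKey(900, chunkRanges)); // null
```

Expanding the cluster then amounts to adding ranges (or splitting existing ones) and pointing them at new shards.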
Deployment from local to global
Single Data Center Deployment
Deploy Global, Access Local
Ops Manager – Production Admin Tool
• Automate rollouts and deployments
• Index suggestions to improve your query
performance
• Backups, utility functions, security
Drivers & Frameworks
Morphia
MEAN Stack
Built in Functions
Expressive Queries
• Find anyone with phone # “1-212…”
• Check if the person with number “555…” is on the “do not
call” list
Geospatial
• Find the best offer for the customer at geo coordinates of 42nd
St. and 6th Ave
Text Search • Find all tweets that mention the firm within the last 2 days
Aggregation • Count and sort number of customers grouped by city
Native Binary
JSON support
• Add an additional phone number to Mark Smith’s record without
rewriting the document
• Select just the mobile phone number in the list
• Sort on the modified date
Left outer join
($lookup)
• Query for all San Francisco residents, look up their
transactions, and sum the amount by person
Graph queries
($graphLookup)
• Query for all people within 3 degrees of separation from Mark
CRUD Operations
• Find
• Update
• Insert
• Delete
Data Modification
• Aggregation Pipeline
Utility Functions
• Geospatial
• Graph
• Text
• Etc.
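The "count and sort number of customers grouped by city" item maps onto a two-stage aggregation pipeline. The sketch below writes that pipeline as plain data, then computes the same result in memory so that it runs without a server. The `customers` collection name, the `city` field, the `countByCity` helper, and the sample data are all assumptions made for illustration.

```javascript
// Pipeline as it could be passed to db.customers.aggregate(pipeline);
// "customers" and "city" are assumed names, not taken from the slides.
const pipeline = [
  { $group: { _id: "$city", count: { $sum: 1 } } }, // one bucket per city
  { $sort: { count: -1 } },                         // largest bucket first
];

// In-memory equivalent of the two stages, for illustration only.
function countByCity(docs) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.city, (counts.get(doc.city) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([city, count]) => ({ _id: city, count }))
    .sort((a, b) => b.count - a.count);
}

// Made-up sample data.
const sampleCustomers = [
  { name: "Ann", city: "Columbus" },
  { name: "Bob", city: "Dayton" },
  { name: "Cy", city: "Columbus" },
];

console.log(JSON.stringify(pipeline));
console.log(countByCity(sampleCustomers));
```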
From Facts to Global Deployments
Database
Collection
MongoDB contains Facts
Document – BSON File
_id
Facts
Index
_idFileLoc
BSON Read / Write Machine
Server
Index
_idFileLoc
Listener
Executes
Javascript
C++
Node
Nodes come together to form a Replica Set
Node
Node Node
Primary
Secondary Secondary
Replica Set
OpLog OpLog
Replica Sets form Sharded Clusters
Sharded Cluster
Node
Primary
Secondary
Node
Secondary
Node
Node
Primary
Secondary
Node
Secondary
Node
Node
Primary
Secondary
Node
Secondary
Node
Sharding allows expansion
Key 1-99 Key 100-199 Key 200-299 Key 300-399 Key 400-499 Key 400-499
Deployment from local to global
Single Data Center Deployment
Deploy Global, Access Local
Ops Manager – Production Admin Tool
• Automate rollouts and deployments
• Index suggestions to improve your query
performance
• Backups, utility functions, security
MongoDB Data
Structure 101:
Documents
Terminology
RDBMS      MongoDB
Database   Database
Table      Collection
Index      Index
Row        Document
Join       Embedding & Linking
How would we model
this in MongoDB?
Size: 12"
Infield/Outfield/Pitcher model
2-Piece Web pattern
Most popular MLB® pattern among pitchers
Pro Stock® American steerhide leather offers rugged durability and a superior feel
Dual-Welting™ on "exposed edges" of the fingers helps maintain pocket shape and durability
Pro Stock™ hand-designed pattern for unbeatable craftsmanship
Dri-Lex® ultra-breathable wrist lining repels moisture from your hand
Black leather with rich brown embellishments
Pattern: B212
Model: WTA2000BBB212
Wilson
We use a document
{
category: "glove",
model: "PRO112PT",
name: "Air Elite",
brand: "Rawlings",
price: 229.99,
available: ISODate("2013-03-31")
}
Fields
Values
Field values
are typed
Documents are rich structures
{
category: "glove",
model: "PRO112PT",
name: "Air Elite",
brand: "Rawlings",
price: 229.99,
available: ISODate("2013-03-31"),
position: ["infield", "outfield", "pitcher"]
}
Fields can contain arrays
Documents are rich structures
{
category: "glove",
model: "PRO112PT",
name: "Air Elite",
brand: "Rawlings",
price: 229.99,
available: ISODate("2013-03-31"),
position: ["infield", "outfield", "pitcher"],
endorsed: {name: "Ryan Howard",
team: "Phillies",
position: "first base"},
history: [{date: ISODate("2013-03-31"), price: 279.99},
{date: ISODate("2013-06-01"), price: 259.79},
{date: ISODate("2013-08-15"), price: 229.99}]
}
Fields can contain
an array of sub-
documents
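Because the price history travels inside the document, application code can work with it directly once the document is loaded. A small sketch in plain JavaScript, with the slide's glove document written as an object literal; `new Date(...)` stands in for the database date type, and the `latestPrice` helper is hypothetical.

```javascript
// The glove document from the slide, written as a plain JavaScript object.
const glove = {
  category: "glove",
  model: "PRO112PT",
  name: "Air Elite",
  brand: "Rawlings",
  price: 229.99,
  available: new Date("2013-03-31"),
  position: ["infield", "outfield", "pitcher"],
  endorsed: { name: "Ryan Howard", team: "Phillies", position: "first base" },
  history: [
    { date: new Date("2013-03-31"), price: 279.99 },
    { date: new Date("2013-06-01"), price: 259.79 },
    { date: new Date("2013-08-15"), price: 229.99 },
  ],
};

// Return the most recent entry of the embedded price history (by date).
function latestPrice(doc) {
  return doc.history.reduce((latest, entry) =>
    entry.date > latest.date ? entry : latest
  );
}

console.log(latestPrice(glove).price);           // 229.99
console.log(glove.position.includes("pitcher")); // true
console.log(glove.endorsed.team);                // Phillies
```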
Document flexibility makes
life easy…
Variation is easy with document model
{
category: "glove",
model: "PRO112PT",
name: "Air Elite",
brand: "Rawlings",
price: 229.99,
size: 11.25,
position: "outfield",
pattern: "Pro taper",
material: "leather",
color: "black"
}
{
category: "ball",
model: "ROML",
name: "MLB",
brand: "Rawlings",
price: 6.99,
cover: "leather",
core: "cork",
color: "white"
}
{
category: "bat",
model: "B1403E",
name: "Air Elite",
brand: "Rip-IT",
price: 399.99,
diameter: "2 5/8",
barrel: "R2 Alloy",
handle: "R2 Composite",
type: "composite"
}
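The three products can sit side by side in one collection even though their fields differ. As a rough illustration, the sketch below holds them in a plain JavaScript array and works out which fields every product shares. Prices are written as numbers here, which is an assumption, and the `sharedFields` helper is hypothetical.

```javascript
// The three product documents from the slide, as plain objects.
const products = [
  { category: "glove", model: "PRO112PT", name: "Air Elite", brand: "Rawlings",
    price: 229.99, size: 11.25, position: "outfield", pattern: "Pro taper",
    material: "leather", color: "black" },
  { category: "ball", model: "ROML", name: "MLB", brand: "Rawlings",
    price: 6.99, cover: "leather", core: "cork", color: "white" },
  { category: "bat", model: "B1403E", name: "Air Elite", brand: "Rip-IT",
    price: 399.99, diameter: "2 5/8", barrel: "R2 Alloy",
    handle: "R2 Composite", type: "composite" },
];

// Fields present in every document, in the order of the first document.
function sharedFields(docs) {
  const [first, ...rest] = docs;
  return Object.keys(first).filter((key) => rest.every((doc) => key in doc));
}

console.log(sharedFields(products));
// [ 'category', 'model', 'name', 'brand', 'price' ]
```

The shared fields are natural candidates for queries and indexes that span all categories, while the remaining fields vary freely per product type.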
Queries are easier to develop
{
category: "glove",
model: "PRO112PT",
name: "Air Elite",
brand: "Rawlings",
price: 229.99,
available: ISODate("2013-03-31"),
position: ["infield", "outfield", "pitcher"],
endorsed: {name: "Ryan Howard",
team: "Phillies",
position: "first base"}
}
> db.products.find( { "position" : "infield",
"endorsed.team" : "Phillies" } )
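The query above leans on two matching rules: an equality condition on an array field matches when any element equals the value, and dot notation reaches into an embedded document. The sketch below imitates just those two rules in memory. The `getPath` and `matches` helpers are hypothetical, not part of MongoDB, and cover only simple equality conditions; the second catalog entry is made up.

```javascript
// Follow a dotted path such as "endorsed.team" into nested objects.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// True when every condition holds: scalar fields must equal the value,
// array fields must contain it.
function matches(doc, query) {
  return Object.entries(query).every(([path, expected]) => {
    const actual = getPath(doc, path);
    return Array.isArray(actual)
      ? actual.includes(expected)
      : actual === expected;
  });
}

const catalog = [
  { model: "PRO112PT",
    position: ["infield", "outfield", "pitcher"],
    endorsed: { name: "Ryan Howard", team: "Phillies" } },
  // Made-up second product to show a non-match.
  { model: "EXAMPLE-2",
    position: ["catcher"],
    endorsed: { name: "Example Player", team: "Phillies" } },
];

const query = { position: "infield", "endorsed.team": "Phillies" };
console.log(catalog.filter((doc) => matches(doc, query)).map((doc) => doc.model));
// [ 'PRO112PT' ]
```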
MongoDB Compass
• Visualize & explore
your schema with an
intuitive GUI
• Gain quick insights
about your data with
easy-to-read
histograms
• Build queries with a
few clicks
• Drill down to view
individual documents
in your collection
• Understand and
resolve performance
issues with visual
explain plans
• Check index
utilization
Debug &
Optimize
Visualize &
Explore
The GUI for MongoDB
Visual explain plans and full CRUD functionality are currently in beta.
• Insert new
documents or clone
existing documents
• Modify documents in
place using the
powerful visual editor
• Delete documents in
just a few clicks
Insert, Modify, &
Delete
MongoDB Connector for BI
Visualize and explore multi-dimensional
documents using SQL-based BI tools. The
connector does the following:
• Provides the BI tool with the schema of the
MongoDB collection to be visualized
• Translates SQL statements issued by the BI tool
into equivalent MongoDB queries that are sent to
MongoDB for processing
• Converts the results into the tabular format
expected by the BI tool, which can then visualize
the data based on user requirements
Location & Flow of Data
MongoDB
BI
Connector
Mapping meta-data Application data
{name: "Andrew", address: {street: …}}
Document Table Analytics & visualization
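Going from a document to a table means turning nested fields into flat columns. The sketch below shows one simple way to do that, joining nested keys with dots. It illustrates the general idea only: the slides do not describe how the BI Connector actually maps documents, so the `flatten` helper and its column naming are assumptions, and the street value is a placeholder because the slide elides it.

```javascript
// True for plain nested objects (not arrays, dates, or null).
function isPlainObject(value) {
  return value !== null &&
    typeof value === "object" &&
    Object.getPrototypeOf(value) === Object.prototype;
}

// Flatten nested objects into one level, joining keys with dots.
// Arrays and other values are copied as-is.
function flatten(doc, prefix = "") {
  const row = {};
  for (const [key, value] of Object.entries(doc)) {
    const column = prefix ? `${prefix}.${key}` : key;
    if (isPlainObject(value)) {
      Object.assign(row, flatten(value, column));
    } else {
      row[column] = value;
    }
  }
  return row;
}

// "<street>" is a placeholder value, not data from the slide.
const person = { name: "Andrew", address: { street: "<street>" } };
console.log(flatten(person));
// { name: 'Andrew', 'address.street': '<street>' }
```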
Object Relationships
1-1
Referencing & Embedding
https://docs.mongodb.com/manual/core/data-modeling-introduction/
1-1: General Recommendations
• Embed
• No additional data duplication
• Can query or index on embedded
field
• e.g., “result.type”
• Exceptional cases…
• Embedding results in large
documents
• Set of infrequently accessed fields
{
"_id": 333,
"date": "2003-02-09T05:00:00",
"hospital": "County Hills",
"patient": "John Doe",
"physician": "Stephen Smith",
"type": "Chest X - ray",
"result": {
"type": "txt",
"size": 12,
"content": {
"value1": 343,
"value2": "abc"
}
}
}
1-M
{
_id: 2,
first: "Joe",
last: "Patient",
addr: { …},
procedures: [
{
id: 12345,
date: ISODate("2015-02-15"),
type: "Cat scan",
…},
{
id: 12346,
date: ISODate("2015-02-15"),
type: "blood test",
…}]
}
Patients
Embed
1-M
Modeled in 2 possible ways
{
_id: 2,
first: "Joe",
last: "Patient",
addr: { …},
procedures: [12345, 12346]}
{
_id: 12345,
date: ISODate("2015-02-15"),
type: "Cat scan",
…}
{
_id: 12346,
date: ISODate("2015-02-15"),
type: "blood test",
…}
Patients
Reference
Procedures
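With the reference model, the patient document only holds procedure ids, so something has to fetch the referenced documents. Against a server that is a second query or a `$lookup` stage; the sketch below does the equivalent in memory. The `resolveProcedures` helper is hypothetical, the elided fields from the slide are simply left out, and dates are kept as strings for brevity.

```javascript
// Patients collection: procedures are stored as an array of ids.
const patients = [
  { _id: 2, first: "Joe", last: "Patient", procedures: [12345, 12346] },
];

// Procedures collection: one document per procedure.
const procedures = [
  { _id: 12345, date: "2015-02-15", type: "Cat scan" },
  { _id: 12346, date: "2015-02-15", type: "blood test" },
];

// Return a copy of the patient with ids replaced by procedure documents.
// Ids without a matching procedure are dropped.
function resolveProcedures(patient, allProcedures) {
  const byId = new Map(allProcedures.map((p) => [p._id, p]));
  return {
    ...patient,
    procedures: patient.procedures
      .map((id) => byId.get(id))
      .filter(Boolean),
  };
}

const resolved = resolveProcedures(patients[0], procedures);
console.log(resolved.procedures.map((p) => p.type));
// [ 'Cat scan', 'blood test' ]
```

The embedded model avoids this extra step, which is the "access all information in a single query" benefit.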
1-M : General Recommendations
• Embed, when possible
• Many are weak entities
• Access all information in a single query
• Take advantage of update atomicity
• No additional data duplication
• Can query or index on any field
• e.g., { “phones.type”: “mobile” }
• Exceptional cases:
• 16 MB document size
• Large number of infrequently accessed fields
{
_id: 2,
first: "Joe",
last: "Patient",
addr: { …},
procedures: [
{
id: 12345,
date: ISODate("2015-02-15"),
type: "Cat scan",
…},
{
id: 12346,
date: ISODate("2015-02-15"),
type: "blood test",
…}]
}
M-M
{
_id: 1,
name: "Oak Valley Hospital",
city: "New York",
beds: 131,
physicians: [
{
id: 12345,
name: "Joe Doctor",
address: {…},
…},
{
id: 12346,
name: "Mary Well",
address: {…},
…}]
}
M-M
Embedding Physicians in Hospitals collection
{
_id: 2,
name: "Plainmont Hospital",
city: "Omaha",
beds: 85,
physicians: [
{
id: 63633,
name: "Harold Green",
address: {…},
…},
{
id: 12345,
name: "Joe Doctor",
address: {…},
…}]
}
Data Duplication
…
is ok!
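Duplication has a price on the write side: when an embedded physician changes, every hospital document that carries a copy has to be updated. The sketch below makes that cost visible in memory by counting the copies touched. The `renamePhysician` helper and the new name are hypothetical, and the elided fields from the slides are left out.

```javascript
// Hospitals with physicians embedded; physician 12345 appears in both.
const hospitals = [
  { _id: 1, name: "Oak Valley Hospital", city: "New York", beds: 131,
    physicians: [
      { id: 12345, name: "Joe Doctor" },
      { id: 12346, name: "Mary Well" },
    ] },
  { _id: 2, name: "Plainmont Hospital", city: "Omaha", beds: 85,
    physicians: [
      { id: 63633, name: "Harold Green" },
      { id: 12345, name: "Joe Doctor" },
    ] },
];

// Rename a physician in every hospital that embeds a copy.
// Returns the number of embedded copies that were updated.
function renamePhysician(allHospitals, physicianId, newName) {
  let touched = 0;
  for (const hospital of allHospitals) {
    for (const physician of hospital.physicians) {
      if (physician.id === physicianId) {
        physician.name = newName;
        touched += 1;
      }
    }
  }
  return touched;
}

console.log(renamePhysician(hospitals, 12345, "Joseph Doctor")); // 2
```

If such updates are rare compared with reads, paying this cost occasionally is usually a fair trade for single-document reads.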
M-M : General Recommendation
• Use case determines whether to reference or
embed:
1. Data Duplication
• Embedding may result in data duplication
• Duplication may be okay if reads
dominate updates
• Of the two, which one changes the
least?
2. Referencing may be required if many
related items
3. Hybrid approach
• Potentially do both .. It’s ok!
{
_id: 1,
name: "Oak Valley Hospital",
city: "New York",
beds: 131,
physicians: [12345, 12346]}
{
_id: 12345,
name: "Joe Doctor",
address: {…},
…}
{
_id: 12346,
name: "Mary Well",
address: {…},
…}
Hospitals Physicians
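With references, the many-to-many link can be walked in either direction from the id arrays. The sketch below answers "which hospitals does this physician work at?" in memory. The `hospitalsForPhysician` helper is hypothetical, and the reference list for Plainmont Hospital is an assumption derived from the embedded version of that document.

```javascript
// Hospitals hold arrays of physician ids instead of embedded copies.
const hospitalRefs = [
  { _id: 1, name: "Oak Valley Hospital", physicians: [12345, 12346] },
  // Assumed reference list, derived from the embedded Plainmont example.
  { _id: 2, name: "Plainmont Hospital", physicians: [63633, 12345] },
];

// Names of all hospitals whose physicians array contains the given id.
function hospitalsForPhysician(allHospitals, physicianId) {
  return allHospitals
    .filter((hospital) => hospital.physicians.includes(physicianId))
    .map((hospital) => hospital.name);
}

console.log(hospitalsForPhysician(hospitalRefs, 12345));
// [ 'Oak Valley Hospital', 'Plainmont Hospital' ]
console.log(hospitalsForPhysician(hospitalRefs, 63633));
// [ 'Plainmont Hospital' ]
```

Each physician now exists once, so a name change is a single-document update, at the cost of an extra lookup when reading a hospital with its physicians.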
Enterprise Design
Patterns
One Big Document
Use Case A
Use Case B
Use Case C
Object Store
Use Case A
Use Case B
Use Case C
Collection: Use Case A
Collection: Use Case B
Collection: Use Case C
Collection: Utility
Microservice
Service A
Service B
Service C
Collection: Use Case A
Collection: Use Case B
Collection: Use Case C
Collection: Utility
Apps
Apps
Apps
Apps
Apps Apps
Apps
Domain and Utility
Registration
Use Case
Edit
Customer
Profile Use
Case
Collection: Customer
Domain
Collection: Registration
Utility
Read / Write
Write
Read / Write
Order Use
Case
Collection: Order
Domain
Read
Read / Write
Blended
Enterprise
Smells and Solutions
Smell: Aggregation of
disparate data is difficult
Cards
Loans
Deposits
… Data
Warehouse
Batch
Cross-Silo
applications
Issues
• Yesterday’s data
• Details lost
• Inflexible schema
• Slow performance
Datamart
Datamart
Datamart
Batch
Impact
• What happened today?
• Worse customer
satisfaction
• Missed opportunities
• Lost revenue
Batch
Batch
Reporting
Cards
Silo 1
Loans
Silo 2
Deposits
Silo 3
Solution: Using dynamic
schema and easy scaling
Data
Warehouse
Real-time or
Batch
…
Customer-facing
Applications
Regulatory
applications
Operational Single View Benefits
• Real-time
• Complete details
• Agile
• Higher customer
retention
• Increase wallet share
• Proactive exception
handling
Strategic
Reporting
Operational
Reporting
Cards
Loans
Deposits
…
Customer
Accounts
Cards
Silo 1
Loans
Silo 2
Deposits
Silo N
Aligns with Microservices Design Patterns
API Layer
(Microservices, SQL reads,
Spark)
BI User App Data Scientist
Customer Info
Service1
Customer Info
Service2
Customer Info
ServiceN
…
…
…
…
MongoDB BI
Connector
MongoDB BI
Connector
Spark Worker1 Spark Worker2…
Mongos
(Query Router)
Mongos
(Query Router)
Mongos
(Query Router)
Mongos
(Query Router)
CustInfo
Shard 1
Mongos
(Query Router)
Mongos
(Query Router)
DC1
DC2
DC3
CustInfo
Shard 2
CustInfo
Shard N
…
MongoDB Ops Manager
• Monitors
• Backups/restores
• Automates management
• REST API for container
orchestration integration
Smell: Response From Data Warehouse
or Other System is Slow
Cards
Loans
Deposits
… Data
Warehouse
Issues
• Data stored normalized
• Reports slow to generate
• Data updated daily but user
response must be fast
Impact
• Lost productivity
• Dissatisfied users and
business
Reporting
Cards
Silo 1
Loans
Silo 2
Deposits
Silo 3
Solution: Optimize Data Structure
as a Datamart In-memory or On-disk
Cards
Loans
Deposits
…
Data
Warehouse
Solution
• Data stored in optimal
structure for reports
• Optionally in memory
Impact
• Response time is as fast
as possible
• Users and business
satisfied
Fast Reporting
Cards
Silo 1
Loans
Silo 2
Deposits
Silo 3
…
Datamart/Cache
Smell: Siloed operational
applications
Silo 1 Data
Silo 2 Data
Silo N Data
…
Impact
• Views are siloed
• Duplicate management
and data access layer
• Need another layer to
aggregate
Silo 1 systems
Silo 2 Systems
Silo N
Systems
…
Reporting Reporting Reporting
Solution: Unified data service
…
Benefit
• Each application can still
save its own data
• Data is already aggregated
for cross-silo reporting
• One cluster and data access
layer to manage
Silo 1 Systems
Silo 2 Systems
Silo N Systems
…
Reporting
……
Smell: Master data can be hard
to change and distribute
Golden
Copy
Batch
Batch
Batch
Batch
Batch
Batch
Batch
Batch
Common issues
• Hard to change schema
of master data
• Data copied everywhere
and gets out of sync
Impact
• Process breaks from out
of sync data
• Business doesn’t have
data it needs
• Many copies create
more management overhead
Solution: Persistent dynamic
cache replicated globally
Real-time
Real-time Real-time
Real-time
Real-time
Real-time
Real-time
Real-time
Solution:
• Load into primary with
any schema
• Replicate to and read
from secondaries
Benefits
• Easy & fast change at
speed of business
• Easy scale out for one
stop shop for data
• Low TCO
If Nothing Else…
Schema Rules of Thumb
• Let your use case guide your schema design
• Going to read same index all the time? – Embed
• Going to read from different indexes? – Reference
• Rule of 80/20 for your Embed vs Reference
• If arrays are going to grow, Reference
Easiest way to learn: Do!
• Get a Free Atlas Tier and download Compass
• Experiment and feel free to ask questions!
