Jay Runkel
Principal Solutions Architect
WHEN TO USE MONGODB
jay.runkel@mongodb.com
@jayrunkel
AGENDA
• When to use MongoDB? Are we asking the right question?
• Why MongoDB?
• Evaluating Use Case Suitability for MongoDB
• When shouldn’t you use MongoDB?
WHEN TO USE MONGODB?
TRANSPORTATION?
CLEARING A FOREST?
MODERN APPLICATION: RDBMS OR MONGODB?
SHOULD WE USE MONGODB?
TRADITIONAL RDBMS SYSTEMS WEREN’T DESIGNED FOR TODAY’S WORLD
Legacy
• Relational model: rigid schemas resistant to change. Data changes constantly, which fits poorly with a relational model.
• Scale-up: throughput and cost make scale-up impractical. Scale-up clusters were never meant to handle today’s volumes.
Today
• Flexible model (JSON): a flexible, multi-structured schema that is designed to adapt to changes.
• Scale-out: scale out to the end of the world and distribute data where it needs to be.
BEING SUCCESSFUL WITH MONGODB
While the detailed definition of success metrics looks different for each customer, 2 key factors are consistent across all of our engagements:
5x
Productivity*
We help our customers to increase overall output, e.g. in terms of development or ops productivity.
80%
Cost reduction*
We help our customers to dramatically lower their total cost of ownership for data storage and analytics by up to 80%.
* Dependent on type of implementation
SHOULD WE USE MONGODB?
CAN WE USE MONGODB?
• If we get
‒ 5x developer productivity
‒ 80% cost reduction
• Shouldn’t we consider this
alternative first?
[Flow chart: Assess MongoDB fit → MongoDB? → yes: Build in MongoDB; no: Look at alternatives]
SHOULD → CAN WE USE MONGODB?
WHY MONGODB?
RELATIONAL
• Expressive Query Language & Secondary Indexes
• Strong Consistency
• Enterprise Management & Integrations
NOSQL
• Scalability & Performance
• Always On, Global Deployments
• Flexibility
NEXUS ARCHITECTURE
• Scalability & Performance
• Always On, Global Deployments
• Flexibility
• Expressive Query Language & Secondary Indexes
• Strong Consistency
• Enterprise Management & Integrations
THAT’S NICE JAY, BUT…
• Where does the developer productivity come from?
• What about the TCO savings?
DEVELOPER PRODUCTIVITY
DOCUMENT DATA MODEL
Relational vs. MongoDB
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
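To make the relational-versus-document contrast concrete, here is a minimal in-memory Java sketch (plain JDK collections, no MongoDB driver) that takes Paul’s data as it might sit in two hypothetical tables, person and car, and reassembles it into the nested shape shown above. The table layout, the record names, and the extra "Mini" row owned by someone else are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DocumentModelSketch {

    // One row of a hypothetical "person" table.
    record PersonRow(int personId, String firstName, String surname, String city) {}

    // One row of a hypothetical "car" table; personId acts as the foreign key.
    record CarRow(int personId, String model, int year, int value) {}

    // Reconstitute one person's rows into a single nested document (the join-and-nest step).
    static Map<String, Object> toDocument(PersonRow person, List<CarRow> carTable) {
        List<Map<String, Object>> cars = new ArrayList<>();
        for (CarRow car : carTable) {
            if (car.personId() == person.personId()) {
                Map<String, Object> carDoc = new LinkedHashMap<>();
                carDoc.put("model", car.model());
                carDoc.put("year", car.year());
                carDoc.put("value", car.value());
                cars.add(carDoc);
            }
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("first_name", person.firstName());
        doc.put("surname", person.surname());
        doc.put("city", person.city());
        doc.put("cars", cars); // the one-to-many relationship becomes an embedded array
        return doc;
    }

    public static void main(String[] args) {
        PersonRow paul = new PersonRow(1, "Paul", "Miller", "London");
        List<CarRow> carTable = List.of(
                new CarRow(1, "Bentley", 1973, 100000),
                new CarRow(1, "Rolls Royce", 1965, 330000),
                new CarRow(2, "Mini", 1980, 5000)); // hypothetical row owned by someone else
        System.out.println(toDocument(paul, carTable));
    }
}
```

The join-and-nest loop is the "reconstituting" work an application does whenever the storage shape differs from the object shape. With the document model, the stored shape is already the object shape.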
DOCUMENTS ARE RICH DATA STRUCTURES
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: [45.123, 47.232],
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Annotations:
• Fields
• Typed field values
• Fields can contain arrays
• Fields can contain an array of sub-documents
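The annotated kinds of field value can be illustrated with a small Java sketch that labels each field of a document built from JDK maps and lists. kindOf() is a hypothetical helper, not a driver API, and it distinguishes far fewer types than BSON does (there are no dates, ObjectIds, or decimals here).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldKinds {

    // Label a value with the field kinds named on the slide (simplified; BSON defines more types).
    static String kindOf(Object value) {
        if (value == null) return "null";
        if (value instanceof String) return "string";
        if (value instanceof Integer || value instanceof Long) return "integer";
        if (value instanceof Double) return "double";
        if (value instanceof Map) return "sub-document";
        if (value instanceof List<?> list) {
            boolean allSubDocs = !list.isEmpty() && list.stream().allMatch(e -> e instanceof Map);
            return allSubDocs ? "array of sub-documents" : "array";
        }
        return value.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("first_name", "Paul");
        doc.put("cell", 447557505611L); // too large for a Java int, so it is a long
        doc.put("location", List.of(45.123, 47.232));
        doc.put("Profession", List.of("banking", "finance", "trader"));
        doc.put("cars", List.of(
                Map.<String, Object>of("model", "Bentley", "year", 1973, "value", 100000),
                Map.<String, Object>of("model", "Rolls Royce", "year", 1965, "value", 330000)));
        doc.forEach((field, value) -> System.out.println(field + ": " + kindOf(value)));
    }
}
```

The cell number is a good reminder that field values are typed. 447557505611 does not fit in a 32-bit int, so it has to be stored as a 64-bit integer.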
DOCUMENTS ARE FLEXIBLE
Documents in the same product catalog collection in MongoDB
{
  product_name: 'Acme Paint',
  color: ['Red', 'Green'],
  size_oz: [8, 32],
  finish: ['satin', 'eggshell']
}
{
  product_name: 'T-shirt',
  size: ['S', 'M', 'L', 'XL'],
  color: ['Heather Gray' … ],
  material: '100% cotton',
  wash: 'cold',
  dry: 'tumble dry low'
}
{
  product_name: 'Mountain Bike',
  brake_style: 'mechanical disc',
  color: 'grey',
  frame_material: 'aluminum',
  no_speeds: 21,
  package_height: '7.5x32.9x55',
  weight_lbs: 44.05,
  suspension_type: 'dual',
  wheel_size_in: 26
}
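Documents with different fields can still be queried together. The sketch below is plain Java over an in-memory list, not a driver call. It mimics MongoDB’s equality matching: a filter such as { color: 'Red' } matches when the field equals the value or when the field is an array that contains it. The T-shirt’s elided color values are left out rather than guessed, and matchesEquality() is a hypothetical helper that assumes a non-null value.

```java
import java.util.List;
import java.util.Map;

public class FlexibleCatalog {

    // Mimics MongoDB equality matching on a top-level field: a match if the field equals the value,
    // or if the field holds an array containing the value. A missing field never matches.
    static boolean matchesEquality(Map<String, Object> doc, String field, Object value) {
        Object actual = doc.get(field);
        if (actual instanceof List<?> list) {
            return list.contains(value);
        }
        return value.equals(actual);
    }

    // Three differently shaped products in one in-memory "collection".
    static List<Map<String, Object>> catalog() {
        return List.of(
                Map.<String, Object>of("product_name", "Acme Paint", "color", List.of("Red", "Green"),
                        "size_oz", List.of(8, 32), "finish", List.of("satin", "eggshell")),
                Map.<String, Object>of("product_name", "T-shirt", "size", List.of("S", "M", "L", "XL"),
                        "material", "100% cotton", "wash", "cold", "dry", "tumble dry low"),
                Map.<String, Object>of("product_name", "Mountain Bike", "brake_style", "mechanical disc",
                        "color", "grey", "frame_material", "aluminum", "no_speeds", 21,
                        "package_height", "7.5x32.9x55", "weight_lbs", 44.05,
                        "suspension_type", "dual", "wheel_size_in", 26));
    }

    public static void main(String[] args) {
        for (String wanted : List.of("Red", "grey")) {
            for (Map<String, Object> product : catalog()) {
                if (matchesEquality(product, "color", wanted)) {
                    System.out.println(wanted + " -> " + product.get("product_name"));
                }
            }
        }
    }
}
```

The same filter works whether color is a scalar (the bike) or an array (the paint), and products without a color field are simply skipped. No per-product schema is needed.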
DO MORE WITH YOUR DATA
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
• Rich Queries: Find everybody in London with a car built between 1970 and 1980
• Geospatial: Find all of the car owners within 5km of Trafalgar Sq.
• Search: Find all the cars described as having leather seats. Count them by model. (text, facets, collation)
• Aggregation: Calculate the average value of Paul’s car collection
• Graph: Find all the cars owned by Paul’s family (descendants)
• Map Reduce: What is the ownership pattern of colors by geography over time? (is purple trending up in China?)
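Two of these questions can be sketched in memory to show what the database is being asked to do. The comments give the shell filters this sketch assumes would be equivalent. $elemMatch makes both year bounds apply to the same car, and $centerSphere takes a radius in radians, here 5 km divided by the 6378.1 km Earth radius that MongoDB’s documentation uses for that conversion. Trafalgar Square’s coordinates are approximate. The sample document’s location [45.123, 47.232] is not in London, so the example gives Paul a hypothetical London position.

```java
import java.util.List;

public class CarOwnerQueries {

    static final double EARTH_RADIUS_KM = 6378.1;

    record Car(String model, int year, int value) {}

    record Owner(String firstName, String city, double lon, double lat, List<Car> cars) {}

    // Rich query. Assumed shell equivalent:
    //   { city: "London", cars: { $elemMatch: { year: { $gte: 1970, $lte: 1980 } } } }
    static boolean inLondonWithCarFrom1970s(Owner o) {
        return "London".equals(o.city())
                && o.cars().stream().anyMatch(c -> c.year() >= 1970 && c.year() <= 1980);
    }

    // Haversine great-circle distance in km on a sphere.
    static double haversineKm(double lon1, double lat1, double lon2, double lat2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Geospatial. Assumed shell equivalent ([longitude, latitude] order, as in GeoJSON):
    //   { location: { $geoWithin: { $centerSphere: [ [ -0.1281, 51.5080 ], 5 / 6378.1 ] } } }
    static boolean withinKm(Owner o, double lon, double lat, double radiusKm) {
        return haversineKm(o.lon(), o.lat(), lon, lat) <= radiusKm;
    }

    public static void main(String[] args) {
        double trafalgarLon = -0.1281, trafalgarLat = 51.5080; // approximate
        Owner paul = new Owner("Paul", "London", -0.1276, 51.5072, List.of( // hypothetical position
                new Car("Bentley", 1973, 100000),
                new Car("Rolls Royce", 1965, 330000)));
        System.out.println(inLondonWithCarFrom1970s(paul));
        System.out.println(withinKm(paul, trafalgarLon, trafalgarLat, 5.0));
    }
}
```

In the database, a 2dsphere index on location lets the server answer the geospatial question without scanning every owner. The loop here is only the semantic equivalent.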
DRIVERS & ECOSYSTEM
Support for the most popular languages and frameworks, including Java, Python, Perl, Ruby, Morphia, and the MEAN stack
DEVELOPMENT – THE PAST
DEVELOPMENT – WITH MONGODB
NEW DATA FIELDS AND TYPES
• New sensor version → new field
ALTER TABLE device_data
ADD lbs_fuel int;
• 5000 aircraft × 1 year of data × 1 reading per minute > 2B rows
[Figure: device_data table (TailNumber, lbs_fuel, ts, speed); the new lbs_fuel column must be added across 2B rows]
How long will this take?
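The slide’s row estimate is easy to check, assuming a 365-day year and exactly one reading per aircraft per minute. The Java sketch below uses long arithmetic because the result no longer fits in a 32-bit int.

```java
public class SensorRowCount {

    // Rows generated in one year, assuming a 365-day year and a fixed reading rate per aircraft.
    static long readingsPerYear(long aircraft, long readingsPerMinute) {
        long minutesPerYear = 365L * 24 * 60; // 525,600 minutes
        return aircraft * minutesPerYear * readingsPerMinute;
    }

    public static void main(String[] args) {
        long rows = readingsPerYear(5000, 1);
        System.out.println(rows);                     // 2628000000
        System.out.println(rows > Integer.MAX_VALUE); // true: the count does not fit in an int
    }
}
```

That is roughly 2.6 billion rows for the ALTER TABLE to touch. In a document store, readings from the new sensor version can simply carry an lbs_fuel field while older readings stay as they are.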
MONGODB LIFECYCLE
DAY 1: INITIAL EFFORTS FOR BOTH TECHNOLOGIES
SQL
DDL: create table contact ( … )
void init() throws SQLException
{
  contactInsertStmt = connection.prepareStatement(
      "insert into contact ( id, name ) values ( ?,? )");
  fetchStmt = connection.prepareStatement(
      "select id, name from contact where id = ?");
}
void save(Map<String, Object> m) throws SQLException
{
  contactInsertStmt.setString(1, (String) m.get("id"));
  contactInsertStmt.setString(2, (String) m.get("name"));
  contactInsertStmt.execute();
}
Map<String, Object> fetch(String id) throws SQLException
{
  Map<String, Object> m = null;
  fetchStmt.setString(1, id);
  ResultSet rs = fetchStmt.executeQuery();
  if(rs.next()) {
    m = new HashMap<>();
    m.put("id", rs.getString(1));
    m.put("name", rs.getString(2));
  }
  return m;
}
mongoDB
DDL: none
void save(Map<String, Object> m)
{
  collection.insert(new BasicDBObject(m));
}
Map<String, Object> fetch(String id)
{
  Map<String, Object> m = null;
  DBObject dbo = new BasicDBObject();
  dbo.put("id", id);
  DBCursor c = collection.find(dbo);
  if(c.hasNext()) {
    m = c.next().toMap();
  }
  return m;
}
Let’s assume for argument’s sake that both approaches take the same amount of time.
DAY 2: ADD SIMPLE FIELDS
m.put("name", "buzz");
m.put("id", "K1");
m.put("title", "Mr.");
m.put("hireDate", new GregorianCalendar(2011, Calendar.NOVEMBER, 1).getTime());
• Capturing title and hireDate is part of adding a new business feature
• It was pretty easy to add two fields to the structure
• …but now we have to change our persistence code
Brace yourself (again)…
SQL DAY 2 (CHANGES: THE DDL PLUS EVERY LINE THAT HANDLES TITLE OR HIREDATE)
DDL: alter table contact add title varchar(8);
alter table contact add hireDate date;
void init() throws SQLException
{
  contactInsertStmt = connection.prepareStatement(
      "insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
  fetchStmt = connection.prepareStatement(
      "select id, name, title, hiredate from contact where id = ?");
}
void save(Map<String, Object> m) throws SQLException
{
  contactInsertStmt.setString(1, (String) m.get("id"));
  contactInsertStmt.setString(2, (String) m.get("name"));
  contactInsertStmt.setString(3, (String) m.get("title"));
  contactInsertStmt.setDate(4, new java.sql.Date(((Date) m.get("hireDate")).getTime()));
  contactInsertStmt.execute();
}
Map<String, Object> fetch(String id) throws SQLException
{
  Map<String, Object> m = null;
  fetchStmt.setString(1, id);
  ResultSet rs = fetchStmt.executeQuery();
  if(rs.next()) {
    m = new HashMap<>();
    m.put("id", rs.getString(1));
    m.put("name", rs.getString(2));
    m.put("title", rs.getString(3));
    m.put("hireDate", rs.getDate(4));
  }
  return m;
}
Consequences:
1. Code release schedule linked to database upgrade (new code cannot run on old schema)
2. Issues with case sensitivity starting to creep in (many RDBMS are case insensitive for column names, but code is case sensitive)
3. Changes require careful modifications in 4 places
4. Beginning of technical debt
MONGODB DAY 2
void save(Map<String, Object> m)
{
  collection.insert(new BasicDBObject(m));
}
Map<String, Object> fetch(String id)
{
  Map<String, Object> m = null;
  DBObject dbo = new BasicDBObject();
  dbo.put("id", id);
  DBCursor c = collection.find(dbo);
  if(c.hasNext()) {
    m = c.next().toMap();
  }
  return m;
}
Advantages:
1. Zero time and money spent on overhead code
2. Code and database not physically linked
3. New material with more fields can be added into existing collections; backfill is optional
4. Names of fields in the database precisely match key names in the code layer and match directly on name, not indirectly via positional offset
5. No technical debt is created
✔ NO CHANGE
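The "no change" point can be sketched without a database. The hypothetical SchemaFreeStore below is a toy stand-in for a collection, not the MongoDB driver. It stores whole maps, so Day 1 and Day 2 shaped contacts sit side by side, and the unchanged fetch logic returns whatever fields each one has. The Day 1 contact "ann" is made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SchemaFreeStore {

    private final List<Map<String, Object>> docs = new ArrayList<>();

    // Store a copy of every field the caller supplies; no column list to keep in sync.
    void insert(Map<String, Object> doc) {
        docs.add(new HashMap<>(doc));
    }

    // Return the first document whose field equals value, or null (like fetch() above).
    Map<String, Object> findOne(String field, Object value) {
        for (Map<String, Object> d : docs) {
            if (value.equals(d.get(field))) return d;
        }
        return null;
    }

    public static void main(String[] args) {
        SchemaFreeStore contacts = new SchemaFreeStore();
        contacts.insert(Map.of("id", "K0", "name", "ann"));                  // Day 1 shape
        contacts.insert(Map.of("id", "K1", "name", "buzz", "title", "Mr.")); // Day 2 shape
        System.out.println(contacts.findOne("id", "K1").get("title"));
        // Backfill is optional: the Day 1 document simply has no title.
        System.out.println(contacts.findOne("id", "K0").getOrDefault("title", "(none)"));
    }
}
```

Readers that care about the new field decide how to treat its absence, as getOrDefault does here. Nothing in the storage layer had to change.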
DAY 3: ADD LIST OF PHONE NUMBERS
m.put("name", "buzz");
m.put("id", "K1");
m.put("title", "Mr.");
m.put("hireDate", new GregorianCalendar(2011, Calendar.NOVEMBER, 1).getTime());
Map<String, Object> n1 = new HashMap<>();
n1.put("type", "work");
n1.put("number", "1-800-555-1212");
List<Map<String, Object>> list = new ArrayList<>();
list.add(n1);
Map<String, Object> n2 = new HashMap<>();
n2.put("type", "home");
n2.put("number", "1-866-444-3131");
list.add(n2);
m.put("phones", list);
• It was still pretty easy to add this data to the structure
• …but meanwhile, in the persistence code…
REALLY brace yourself…
SQL DAY 3 CHANGES: OPTION 2:
PROPER APPROACH WITH MULTIPLE PHONE NUMBERS
DDL: create table phones ( … )
void init() throws SQLException
{
  contactInsertStmt = connection.prepareStatement(
      "insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
  c2stmt = connection.prepareStatement(
      "insert into phones ( id, type, number ) values ( ?,?,? )");
  fetchStmt = connection.prepareStatement(
      "select contact.id, name, title, hiredate, type, number from contact, phones "
      + "where phones.id = contact.id and contact.id = ?");
}
void save(Map<String, Object> m) throws SQLException
{
  connection.setAutoCommit(false); // start transaction
  contactInsertStmt.setString(1, (String) m.get("id"));
  contactInsertStmt.setString(2, (String) m.get("name"));
  contactInsertStmt.setString(3, (String) m.get("title"));
  contactInsertStmt.setDate(4, new java.sql.Date(((Date) m.get("hireDate")).getTime()));
  contactInsertStmt.execute(); // insert the parent row before its phones
  for (Map<String, Object> onePhone : (List<Map<String, Object>>) m.get("phones")) {
    c2stmt.setString(1, (String) m.get("id"));
    c2stmt.setString(2, (String) onePhone.get("type"));
    c2stmt.setString(3, (String) onePhone.get("number"));
    c2stmt.execute();
  }
  connection.commit(); // end transaction
}
Map<String, Object> fetch(String id) throws SQLException
{
  Map<String, Object> m = null;
  fetchStmt.setString(1, id);
  ResultSet rs = fetchStmt.executeQuery();
  int i = 0;
  List<Map<String, Object>> list = new ArrayList<>();
  while (rs.next()) {
    if(i == 0) {
      m = new HashMap<>();
      m.put("id", rs.getString(1));
      m.put("name", rs.getString(2));
      m.put("title", rs.getString(3));
      m.put("hireDate", rs.getDate(4));
      m.put("phones", list);
    }
    Map<String, Object> onePhone = new HashMap<>();
    onePhone.put("type", rs.getString(5));
    onePhone.put("number", rs.getString(6));
    list.add(onePhone);
    i++;
  }
  return m;
}
This took time and money.
SQL DAY 5: ZOMBIES! (ZERO OR MORE BETWEEN ENTITIES)
void init() throws SQLException
{
  contactInsertStmt = connection.prepareStatement(
      "insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
  c2stmt = connection.prepareStatement(
      "insert into phones ( id, type, number ) values ( ?,?,? )");
  fetchStmt = connection.prepareStatement(
      "select A.id, A.name, A.title, A.hiredate, B.type, B.number from "
      + "contact A left outer join phones B on (A.id = B.id) where A.id = ?");
}
Whoops! And it’s also wrong!
We did not design the query to account for contacts that have no phone number, so we have to change the join to an outer join.
But this ALSO means we have to change the unwind logic.
This took more time and money!
while (rs.next()) {
  if(i == 0) {
    // …
  }
  String s = rs.getString(5);
  if(s != null) {
    Map<String, Object> onePhone = new HashMap<>();
    onePhone.put("type", s);
    onePhone.put("number", rs.getString(6));
    list.add(onePhone);
  }
  i++;
}
…but at least we have a DAL…
right?
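The "unwind logic" is the part that is easy to get wrong, so here it is isolated as a pure Java sketch with no JDBC. It turns the flat rows of a contact LEFT OUTER JOIN phones result into one nested contact. The JoinRow record stands in for a ResultSet row and is a hypothetical simplification with only id, name, type, and number.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OuterJoinUnwind {

    // One row of: contact LEFT OUTER JOIN phones. type and number are null when there are no phones.
    record JoinRow(String id, String name, String type, String number) {}

    static Map<String, Object> unwind(List<JoinRow> rows) {
        if (rows.isEmpty()) return null; // no such contact
        JoinRow first = rows.get(0);     // contact columns repeat on every row; read them once
        Map<String, Object> contact = new LinkedHashMap<>();
        contact.put("id", first.id());
        contact.put("name", first.name());
        List<Map<String, Object>> phones = new ArrayList<>();
        for (JoinRow row : rows) {
            if (row.type() != null) { // skip the null-padded row the outer join emits for zero phones
                Map<String, Object> phone = new LinkedHashMap<>();
                phone.put("type", row.type());
                phone.put("number", row.number());
                phones.add(phone);
            }
        }
        contact.put("phones", phones);
        return contact;
    }

    public static void main(String[] args) {
        // A "zombie" contact with no phones still comes back, with an empty list.
        System.out.println(unwind(List.of(new JoinRow("K1", "buzz", null, null))));
        // {id=K1, name=buzz, phones=[]}
    }
}
```

Both the zero-phone case and the repeated parent columns exist only because the nested object was flattened into rows. A stored document with a phones array needs neither branch.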
COST REDUCTION
DEVELOPER COSTS ON THE RISE
[Chart: storage cost per GB vs. developer salary, 1985 to 2013]
OPTIMIZING FOR ENGINEERING PRODUCTIVITY
[Chart: engineer costs vs. infrastructure costs, 1985 to 2017]
COST REDUCTION
1. Scale out on commodity hardware vs. scale up
2. Cloud
3. Built-in HA
‒ No additional components
‒ Configuration
SCALING RELATIONAL
[Diagram: scale up vs. scale out]
SCALING MONGODB: AUTOMATIC SHARDING
Three types: hash-based, range-based, location-aware
Increase or decrease capacity as you go
Automatic balancing
QUERY ROUTING
Multiple query optimization models
Each sharding option is appropriate for different apps
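The three sharding types and query routing can be sketched as lookups. This is a hypothetical, simplified Java model, not MongoDB’s implementation. In a real cluster, chunk metadata lives on the config servers and the mongos routers consult it. Hashed sharding also splits hashed values into ranges rather than taking a modulo as routeByHash() does here. Shard names such as "shard-default" are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShardRouter {

    // Range-based chunks: inclusive lower bound of each chunk -> owning shard.
    private final TreeMap<Integer, String> chunks = new TreeMap<>();

    void addChunk(int lowerBound, String shard) {
        chunks.put(lowerBound, shard);
    }

    // Range-based: the chunk with the greatest lower bound <= key owns the key.
    String routeByRange(int key) {
        Map.Entry<Integer, String> owner = chunks.floorEntry(key);
        if (owner == null) throw new IllegalArgumentException("no chunk covers " + key);
        return owner.getValue();
    }

    // Hash-based: neighboring keys land on unrelated shards, spreading sequential inserts.
    static String routeByHash(String key, int shardCount) {
        return "shard" + Math.floorMod(key.hashCode(), shardCount);
    }

    // Location-aware (zones): a region tag in the document decides the shard.
    static String routeByZone(String region, Map<String, String> zoneToShard) {
        return zoneToShard.getOrDefault(region, "shard-default");
    }

    // Query routing: a filter on the shard key is targeted; anything else is scatter-gather.
    List<String> targetsFor(Integer shardKeyValueOrNull) {
        if (shardKeyValueOrNull != null) return List.of(routeByRange(shardKeyValueOrNull));
        return new ArrayList<>(new LinkedHashSet<>(chunks.values()));
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter();
        router.addChunk(Integer.MIN_VALUE, "shardA");
        router.addChunk(1000, "shardB");
        System.out.println(router.targetsFor(42));   // targeted: one shard
        System.out.println(router.targetsFor(null)); // scatter-gather: every shard
        System.out.println(routeByHash("TailNumber-N123", 4));
        System.out.println(routeByZone("EU", Map.of("EU", "shardEU", "US", "shardUS")));
    }
}
```

The trade-off shows up in targetsFor(). Range keys keep range queries on few shards but can concentrate sequential inserts, while hashed keys spread writes at the cost of broadcasting range queries.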
CLOUD - ATLAS
Atlas: Database as a Service for MongoDB
• Automated
• Available on demand
• Secure
• Highly available
• Automated backups
• Elastically scalable
RELATIONAL HIGH(?) AVAILABILITY
[Diagram: an application in each of DC1 and DC2, with bolted-on replication components between the data centers]
• Recovery: minutes to hours
• Manual intervention
• Expensive $$$
MONGODB REPLICA SETS
Replica Set – 2 to 50 copies
Self-healing shard
Data Center Aware
Addresses availability considerations:
High Availability
Disaster Recovery
Maintenance
Workload Isolation: operational & analytics
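Automatic failover rests on simple majority arithmetic. A primary can be elected only while a strict majority of the voting members can reach each other. MongoDB allows up to 50 members in a replica set, but at most 7 of them vote. The Java sketch below uses hypothetical helper names to tabulate that rule.

```java
public class ReplicaSetMath {

    // Strict majority of the voting members.
    static int majority(int votingMembers) {
        return votingMembers / 2 + 1;
    }

    // How many voting members can fail while a primary can still be elected.
    static int faultTolerance(int votingMembers) {
        return votingMembers - majority(votingMembers);
    }

    static boolean canElectPrimary(int reachableVoters, int votingMembers) {
        return reachableVoters >= majority(votingMembers);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 7; n++) {
            System.out.printf("%d voting members: majority %d, tolerates %d failure(s)%n",
                    n, majority(n), faultTolerance(n));
        }
    }
}
```

The table explains why odd member counts are the usual choice. Going from 3 to 4 voting members raises the majority from 2 to 3 without tolerating any additional failure.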
WAIT, WHAT ABOUT OTHER NOSQL?
REMEMBER THIS?
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
• Rich Queries: Find everybody in London with a car built between 1970 and 1980
• Geospatial: Find all of the car owners within 5km of Trafalgar Sq.
• Search: Find all the cars described as having leather seats. Count them by model. (text, facets, collation)
• Aggregation: Calculate the average value of Paul’s car collection
• Graph: Find all the cars owned by Paul’s family (descendants)
• Map Reduce: What is the ownership pattern of colors by geography over time? (is purple trending up in China?)
AGGREGATION: POWERFUL ANALYTICS
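Two of the earlier questions map naturally onto aggregation-pipeline stages. The comments show the pipelines this sketch assumes, built from $match, $unwind, $group, $avg, and $sum. The Java code is an in-memory stream analogue of those stages, not a driver call. Returning 0.0 when nothing matches is a choice made here for simplicity.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Assumed shell pipelines:
//   Average value of Paul's cars:
//     [ { $match: { first_name: "Paul" } }, { $unwind: "$cars" },
//       { $group: { _id: null, avgValue: { $avg: "$cars.value" } } } ]
//   Count cars by model:
//     [ { $unwind: "$cars" }, { $group: { _id: "$cars.model", count: { $sum: 1 } } } ]
public class CarAggregation {

    record Car(String model, int year, int value) {}

    record Owner(String firstName, List<Car> cars) {}

    // $unwind: one output element per array entry.
    static List<Car> unwindCars(List<Owner> owners) {
        return owners.stream().flatMap(o -> o.cars().stream()).collect(Collectors.toList());
    }

    // $group by model with a count; the TreeMap keeps output sorted by model.
    static Map<String, Long> countByModel(List<Owner> owners) {
        return unwindCars(owners).stream()
                .collect(Collectors.groupingBy(Car::model, TreeMap::new, Collectors.counting()));
    }

    // $match on first name, then $avg over the unwound car values (0.0 if nothing matches).
    static double averageCarValue(List<Owner> owners, String firstName) {
        return owners.stream()
                .filter(o -> o.firstName().equals(firstName))
                .flatMap(o -> o.cars().stream())
                .mapToInt(Car::value)
                .average()
                .orElse(0.0);
    }

    public static void main(String[] args) {
        List<Owner> owners = List.of(new Owner("Paul", List.of(
                new Car("Bentley", 1973, 100000),
                new Car("Rolls Royce", 1965, 330000))));
        System.out.println(averageCarValue(owners, "Paul")); // 215000.0
        System.out.println(countByModel(owners));            // {Bentley=1, Rolls Royce=1}
    }
}
```

In the database the same stages run server-side, so only the grouped result crosses the network.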
MONGODB CONNECTOR FOR BI
Visualize and explore multi-dimensional
documents using SQL-based BI tools. The
connector does the following:
• Provides the BI tool with the schema of the
MongoDB collection to be visualized
• Translates SQL statements issued by the BI tool
into equivalent MongoDB queries that are sent to
MongoDB for processing
• Converts the results into the tabular format
expected by the BI tool, which can then visualize
the data based on user requirements
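The "tabular format" step can be pictured as flattening a nested document into one row. The Java sketch below is a hypothetical illustration of that idea only, not the BI Connector’s implementation, and the connector’s actual relational mapping (for example, of arrays) may differ. Embedded sub-documents become dot-notation columns, and every other value, arrays included, passes through unchanged. The address sub-document in main() is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BiFlattenSketch {

    // Embedded sub-documents become dot-notation columns; other values (including arrays) pass through.
    static Map<String, Object> flatten(Map<String, ?> doc) {
        Map<String, Object> row = new LinkedHashMap<>();
        flattenInto("", doc, row);
        return row;
    }

    private static void flattenInto(String prefix, Map<?, ?> doc, Map<String, Object> row) {
        for (Map.Entry<?, ?> entry : doc.entrySet()) {
            String column = prefix.isEmpty()
                    ? String.valueOf(entry.getKey())
                    : prefix + "." + entry.getKey();
            if (entry.getValue() instanceof Map<?, ?> sub) {
                flattenInto(column, sub, row); // recurse into the embedded document
            } else {
                row.put(column, entry.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("first_name", "Paul");
        doc.put("address", Map.of("city", "London")); // hypothetical embedded sub-document
        System.out.println(flatten(doc)); // {first_name=Paul, address.city=London}
    }
}
```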
“We reduced 100+ lines of integration code to just a single line after moving to the MongoDB Spark connector.”
- Early Access Tester, Multi-National Banking Group
[Diagram: an analytics application uses Spark’s Scala, Java, Python, and R APIs and its SQL, machine learning, streaming, and graph libraries; Spark workers read from MongoDB through the MongoDB Connector for Spark]
ADVANCED ANALYTICS
MongoDB Connector for Apache Spark
• Native Scala connector, certified by Databricks
• Exposes all Spark APIs & libraries
• Efficient data filtering with predicate pushdown,
secondary indexes, & in-database aggregations
• Locality awareness to reduce data movement
• Updated with Spark 2.0 support
WHAT DOES THIS MEAN?
• Developer productivity
• Wider range of use cases
• Changing requirements
• Complex queries and analytics
MongoDB
WIDE VARIETY OF USE CASES
Single View • Internet of Things • Mobile • Real-Time Analytics • App Modernization • Content Management • Blockchain
DECISION CRITERIA
CRITERIA: WHEN TO USE MONGODB?
• Triage
‒ Building new app
‒ Re-platforming existing app
‒ Evaluating application portfolio
DECISION FLOW CHART
[Flow chart: Existing application? yes → Existing app criteria; no → New app criteria]
EXISTING APPLICATIONS
• Is there a critical requirement that isn’t being met?
‒ Performance/Scalability
‒ Agility
‒ Variable data sources/formats
‒ Availability/Resiliency
‒ Cost
‒ Cloud
• Revision or re-platform?
EXISTING APPLICATION CHALLENGES (REQUIREMENT · CHALLENGES · MONGODB FEATURES)
• Performance/Scalability
‒ Challenges: can’t meet query volume; query latency issues; data volume exceeding server capacity
‒ MongoDB features: document model, WiredTiger, sharding, commodity hardware, Cloud/Atlas
• Availability/Resiliency
‒ Challenges: need automatic failover, with zero downtime on loss of a node, network, or data center, and no engineering effort required to restore service
‒ MongoDB features: replica sets, with automated failover and zero-downtime maintenance
• Cloud migration
‒ Challenges: cloud migration; avoiding cloud provider lock-in
‒ MongoDB features: Atlas
EXISTING APPLICATION CHALLENGES (CONTINUED)
• Agility: shorten time to value
‒ Challenges: feature backlog; developers focused on maintenance instead of innovation
‒ MongoDB features: flexible document model, powerful query language, driver architecture
• Variable data sources/formats
‒ Challenges: new data sources; data format changes continuously
‒ MongoDB features: flexible document model
• Cost
‒ Challenges: mainframe MIPS; RAC clusters; additional expensive components for replication and failover
‒ MongoDB features: commodity hardware, open source, Atlas, replica sets
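The two challenge tables amount to a lookup from unmet requirement to candidate features. The Java sketch below encodes them that way, using only the entries listed on the slides. The enum and method names are hypothetical, and the output is a starting list for evaluation, not a verdict.

```java
import java.util.EnumMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RequirementFeatures {

    enum Requirement { PERFORMANCE_SCALABILITY, AVAILABILITY_RESILIENCY, CLOUD_MIGRATION, AGILITY, VARIABLE_DATA, COST }

    // Requirement -> MongoDB features, as listed in the two tables.
    static final Map<Requirement, List<String>> FEATURES = new EnumMap<>(Requirement.class);

    static {
        FEATURES.put(Requirement.PERFORMANCE_SCALABILITY,
                List.of("Document model", "WiredTiger", "Sharding", "Commodity hardware", "Cloud/Atlas"));
        FEATURES.put(Requirement.AVAILABILITY_RESILIENCY, List.of("Replica sets"));
        FEATURES.put(Requirement.CLOUD_MIGRATION, List.of("Atlas"));
        FEATURES.put(Requirement.AGILITY,
                List.of("Flexible document model", "Powerful query language", "Driver architecture"));
        FEATURES.put(Requirement.VARIABLE_DATA, List.of("Flexible document model"));
        FEATURES.put(Requirement.COST, List.of("Commodity hardware", "Open source", "Atlas", "Replica sets"));
    }

    // Union of the features for every unmet requirement, de-duplicated, in enum order.
    static Set<String> featuresFor(Set<Requirement> unmet) {
        Set<String> result = new LinkedHashSet<>();
        for (Requirement r : Requirement.values()) {
            if (unmet.contains(r)) result.addAll(FEATURES.get(r));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(featuresFor(Set.of(Requirement.AGILITY, Requirement.VARIABLE_DATA)));
        // [Flexible document model, Powerful query language, Driver architecture]
    }
}
```

If no critical requirement is unmet, the set comes back empty, which matches the slide’s first triage question for an existing application.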
CRITERIA FOR ASSESSING MONGODB FIT
• Performance/Scalability
• Availability/Resiliency
• Cloud
• Agility
• Variable Data
• Cost
• Data naturally modeled as
documents?
• Complex queries
• Analytics
• Strong consistency
ADDITIONAL CRITERIA (REQUIREMENT · CHALLENGES · MONGODB FEATURES)
• Data naturally modeled as documents
‒ Challenges: complex code for shredding and reconstituting objects
‒ MongoDB features: flexible document model
• Complex queries and analytics
‒ Challenges: complex application code; complex architectures including search engine, Hadoop, ETL, CDC; limited application functionality; long time to market
‒ MongoDB features: secondary indexes, powerful query language, Aggregation Framework, BI Connector, Spark integration
• Strong consistency
‒ Challenges: users require the most up-to-date view of data; complex application code required to handle edge cases
‒ MongoDB features: strong consistency, read and write concerns
WHEN NOT TO USE MONGODB?
EXISTING APPLICATION
• Performance/Scalability ✓
• Availability/Resiliency ✓
• Cloud ✓
• Agility ✓
• Variable Data ✓
• Cost ✓
NEW APPLICATIONS
• MongoDB makes sense for the vast majority of use cases
• Why not MongoDB?
‒ Fear/comfort level with new technology
‒ Don’t know how to support MongoDB
‒ Don’t want to learn new technology
‒ Expensive enterprise license that is “free” to project team
• Our other solution is good enough
LET’S REVIEW
NEXUS ARCHITECTURE
Scalability & Performance
Always On, Global Deployments
Flexibility
Expressive Query Language & Secondary Indexes
Strong Consistency
Enterprise Management & Integrations
Replica sets
Sharding
WiredTiger
Document Model
Replica Sets
Sharding
Ops & Cloud Mgr
Atlas
MongoDB query language
Secondary indexes
Aggregation Framework
BI Connector
Strong Consistency
Read and Write Concern
BI Connector
Ops & Cloud Mgr.
Spark Connector
Atlas
I DON’T ALWAYS BUILD APPLICATIONS IN MONGODB, BUT WHEN I DO I GET…
• 5x Developer Productivity
• 80% Cost Reduction
USE CASE INDICATORS FOR MONGODB
• Performance/Scalability
• Availability/Resiliency
• Cloud
• Agility
• Variable Data
• Cost
• Data naturally modeled as
documents?
• Complex queries
• Analytics
• Strong consistency
QUESTIONS?
• Jay Runkel
• Principal Solutions Architect
jay.runkel@mongodb.com
@jayrunkel
MANY REASONS FOR MONGODB?
• Strong Consistency
‒ Documents
‒ Indexes
‒ Consistency across multi-data center
deployments
• Expressive Query Language and
Secondary Indexes
‒ More powerful than SQL
‒ Analytics
‒ Dynamic index creation
• Scalability/Performance
‒ PBs of Data
‒ Millions of ops/sec
• High Availability
‒ Automated failover < 2 seconds
‒ Supports Active-Active and Active-
Passive multi-data center
deployments
• Deploy Anywhere
‒ On-Prem, AWS, Azure, Google
• Ease of Management
‒ Best in class operations tooling
‒ Configure once: one cluster spans
multi-data centers
