When to Use MongoDB

Jay Runkel
Principal Solutions Architect
jay.runkel@mongodb.com
@jayrunkel

AGENDA
• When to use MongoDB? Are we asking the right question?
• Why MongoDB?
• Evaluating Use Case Suitability for MongoDB
• When shouldn’t you use MongoDB?

MODERN APPLICATION
RDBMS or MongoDB?

TRADITIONAL RDBMS SYSTEMS WEREN’T DESIGNED FOR TODAY’S WORLD
Legacy
• Relational model: rigid schemas, resistant to change
‒ Data changes constantly, which fits poorly with a relational model
• Scale-up: throughput and cost make scale-up impractical
‒ Scale-up clusters were never meant to handle today’s volumes
Today
• Flexible model (JSON): flexible, multi-structured schema designed to adapt to change
• Scale-out: scale out as far as needed and distribute data where it needs to be

BEING SUCCESSFUL WITH MONGODB
While the detailed definition of success metrics looks different for each customer, two key factors are consistent across all of our engagements:
• 5x productivity*: we help our customers increase overall output, e.g. development or ops productivity.
• 80% cost reduction*: we help our customers lower their total cost of ownership for data storage and analytics by up to 80%.
* Dependent on type of implementation

CAN WE USE MONGODB?
• If we get
‒ 5x developer productivity
‒ 80% cost reduction
• Shouldn’t we consider this alternative first?
Assess MongoDB fit → MongoDB?
‒ yes: Build in MongoDB
‒ no: Look at alternatives

RELATIONAL
• Expressive Query Language & Secondary Indexes
• Strong Consistency
• Enterprise Management & Integrations

NOSQL
• Scalability & Performance
• Always On, Global Deployments
• Flexibility

NEXUS ARCHITECTURE
• Expressive Query Language & Secondary Indexes
• Strong Consistency
• Enterprise Management & Integrations
• Scalability & Performance
• Always On, Global Deployments
• Flexibility

THAT’S NICE JAY, BUT…
• Where does the developer productivity come from?
• What about the TCO savings?

DOCUMENT DATA MODEL
Relational MongoDB
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123,47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}

DOCUMENTS ARE RICH DATA STRUCTURES
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: [45.123,47.232],
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
• Fields
• Typed field values
• Fields can contain arrays
• Fields can contain an array of sub-documents

DOCUMENTS ARE FLEXIBLE
Documents in the same product catalog collection in MongoDB
{
  product_name: 'Acme Paint',
  color: ['Red', 'Green'],
  size_oz: [8, 32],
  finish: ['satin', 'eggshell']
}
{
  product_name: 'T-shirt',
  size: ['S', 'M', 'L', 'XL'],
  color: ['Heather Gray' … ],
  material: '100% cotton',
  wash: 'cold',
  dry: 'tumble dry low'
}
{
  product_name: 'Mountain Bike',
  brake_style: 'mechanical disc',
  color: 'grey',
  frame_material: 'aluminum',
  no_speeds: 21,
  package_height: '7.5x32.9x55',
  weight_lbs: 44.05,
  suspension_type: 'dual',
  wheel_size_in: 26
}
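
To make the point concrete, here is a minimal in-memory sketch (plain Java 9+, not the MongoDB driver) of differently shaped documents living side by side and being matched by one query. The class name FlexibleCatalog and the method productsWith are hypothetical, only a few fields from the slide are copied, and the T-shirt color list is truncated in the source so only the one value shown is used. The matching rule (a field matches if it equals the value or is an array containing it) mirrors how an equality match on an array field behaves in MongoDB.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FlexibleCatalog {
    // Three differently shaped documents in one list, standing in for one collection.
    static final List<Map<String, Object>> CATALOG = List.of(
        Map.of("product_name", "Acme Paint", "color", List.of("Red", "Green")),
        Map.of("product_name", "T-shirt", "color", List.of("Heather Gray"), "wash", "cold"),
        Map.of("product_name", "Mountain Bike", "color", "grey", "no_speeds", 21));

    // A document matches when the field equals the value or is a list containing it.
    // Documents that lack the field are skipped rather than treated as errors.
    static List<String> productsWith(String field, Object value) {
        return CATALOG.stream()
            .filter(d -> {
                Object v = d.get(field);
                return v instanceof List ? ((List<?>) v).contains(value) : value.equals(v);
            })
            .map(d -> (String) d.get("product_name"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(productsWith("color", "Red"));   // array-valued field
        System.out.println(productsWith("color", "grey"));  // scalar field
        System.out.println(productsWith("wash", "cold"));   // field present on one document only
    }
}
```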

DO MORE WITH YOUR DATA
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123,47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Rich Queries
Find everybody in London with a car
built between 1970 and 1980
Geospatial
Find all of the car owners within 5km
of Trafalgar Sq.
Search
Find all the cars described as having
leather seats. Count them by model.
(text, facets, collation)
Aggregation
Calculate the average value of Paul’s
car collection
Graph
Find all the cars owned by Paul’s family
(descendants)
Map Reduce
What is the ownership pattern of
colors by geography over time?
(is purple trending up in China?)
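
As a rough illustration of what the "Rich Queries" and "Aggregation" examples compute, the sketch below evaluates them over an in-memory list of documents using plain Java 9+ streams. It is not driver code: in a real deployment these would be sent to the server as a query and an aggregation pipeline. The class CarQueries, its method names, and the second owner ("Ann" in Paris) are hypothetical and exist only to give the filter something to exclude.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CarQueries {
    // Hypothetical in-memory stand-in for a collection of owner documents.
    static final List<Map<String, Object>> OWNERS = List.of(
        Map.of("first_name", "Paul", "city", "London",
               "cars", List.of(
                   Map.of("model", "Bentley", "year", 1973, "value", 100000),
                   Map.of("model", "Rolls Royce", "year", 1965, "value", 330000))),
        Map.of("first_name", "Ann", "city", "Paris",
               "cars", List.of(
                   Map.of("model", "Citroen", "year", 1975, "value", 20000))));

    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> cars(Map<String, Object> owner) {
        return (List<Map<String, Object>>) owner.get("cars");
    }

    // Rich query: owners in a city with at least one car built in [from, to].
    static List<String> ownersWithCarBuiltBetween(String city, int from, int to) {
        return OWNERS.stream()
            .filter(o -> city.equals(o.get("city")))
            .filter(o -> cars(o).stream().anyMatch(c -> {
                int year = (Integer) c.get("year");
                return year >= from && year <= to;
            }))
            .map(o -> (String) o.get("first_name"))
            .collect(Collectors.toList());
    }

    // Aggregation: average value of one owner's car collection (0.0 if none found).
    static double averageCarValue(String firstName) {
        return OWNERS.stream()
            .filter(o -> firstName.equals(o.get("first_name")))
            .flatMap(o -> cars(o).stream())
            .mapToInt(c -> (Integer) c.get("value"))
            .average()
            .orElse(0.0);
    }

    public static void main(String[] args) {
        System.out.println(ownersWithCarBuiltBetween("London", 1970, 1980)); // [Paul]
        System.out.println(averageCarValue("Paul"));                         // 215000.0
    }
}
```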

DRIVERS & ECOSYSTEM
Support for the most popular languages and frameworks
• Languages: Java, Python, Perl, Ruby
• Frameworks: Morphia, MEAN Stack

NEW DATA FIELDS AND TYPES
• New sensor version → new field
ALTER TABLE device_data
ADD lbs_fuel int;
• 5000 aircraft x 1 year of data x 1 reading per minute > 2B rows
[Table device_data: columns TailNumber, ts, speed, plus the new column lbs fuel; 2B rows]
How long will this take?
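
The row count on this slide can be checked with a few lines of arithmetic. The sketch below assumes a non-leap year and exactly one reading per aircraft per minute; the class and method names are hypothetical.

```java
public class RowEstimate {
    // Rows produced by a fleet reporting once per minute for one non-leap year.
    static long rowsPerYear(long aircraft) {
        long minutesPerYear = 365L * 24 * 60; // 525,600 minutes
        return aircraft * minutesPerYear;
    }

    public static void main(String[] args) {
        long rows = rowsPerYear(5000);
        System.out.println(rows);                     // 2628000000
        System.out.println(rows > Integer.MAX_VALUE); // true: the count needs a 64-bit type
    }
}
```

At roughly 2.6 billion rows, the ALTER TABLE above touches a table whose row count no longer fits in a 32-bit integer.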

DAY 1: INITIAL EFFORTS FOR BOTH TECHNOLOGIES
SQL
DDL: create table contact ( … )
init()
{
    // Prepare one insert and one lookup statement up front
    contactInsertStmt = connection.prepareStatement
        ("insert into contact ( id, name ) values ( ?,? )");
    fetchStmt = connection.prepareStatement
        ("select id, name from contact where id = ?");
}
save(Map m)
{
    contactInsertStmt.setString(1, m.get("id"));
    contactInsertStmt.setString(2, m.get("name"));
    contactInsertStmt.execute();
}
Map fetch(String id)
{
    Map m = null;
    fetchStmt.setString(1, id);
    rs = fetchStmt.executeQuery();
    if(rs.next()) {
        // Columns are copied into the map by positional offset
        m = new HashMap();
        m.put("id", rs.getString(1));
        m.put("name", rs.getString(2));
    }
    return m;
}
MongoDB
DDL: none
save(Map m)
{
    collection.insert(m);
}
Map fetch(String id)
{
    Map m = null;
    DBObject dbo = new BasicDBObject();
    dbo.put("id", id);
    c = collection.find(dbo);
    if(c.hasNext()) {
        m = (Map) c.next();
    }
    return m;
}
Let’s assume for argument’s sake that both
approaches take the same amount of time

DAY 2: ADD SIMPLE FIELDS
m.put("name", "buzz");
m.put("id", "K1");
m.put("title", "Mr.");
m.put("hireDate", new Date(2011, 11, 1));
• Capturing title and hireDate is part of adding a new business feature
• It was pretty easy to add two fields to the structure
• …but now we have to change our persistence code
Brace yourself (again) …..

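SQL DAY 2 (CHANGED LINES MARKED // changed)
DDL: alter table contact add title varchar(8);
alter table contact add hireDate date;
init()
{
    contactInsertStmt = connection.prepareStatement
        ("insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )"); // changed
    fetchStmt = connection.prepareStatement
        ("select id, name, title, hiredate from contact where id = ?"); // changed
}
save(Map m)
{
    contactInsertStmt.setString(1, m.get("id"));
    contactInsertStmt.setString(2, m.get("name"));
    contactInsertStmt.setString(3, m.get("title")); // changed
    contactInsertStmt.setDate(4, m.get("hireDate")); // changed
    contactInsertStmt.execute();
}
Map fetch(String id)
{
    Map m = null;
    fetchStmt.setString(1, id);
    rs = fetchStmt.executeQuery();
    if(rs.next()) {
        m = new HashMap();
        m.put("id", rs.getString(1));
        m.put("name", rs.getString(2));
        m.put("title", rs.getString(3)); // changed
        m.put("hireDate", rs.getDate(4)); // changed
    }
    return m;
}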
Consequences:
1. Code release schedule linked to database
upgrade (new code cannot run on old schema)
2. Issues with case sensitivity starting to creep in
(many RDBMS are case insensitive for column
names, but code is case sensitive)
3. Changes require careful mods in 4 places
4. Beginning of technical debt

MONGODB DAY 2
✔ NO CHANGE
save(Map m)
{
    collection.insert(m);
}
Map fetch(String id)
{
    Map m = null;
    DBObject dbo = new BasicDBObject();
    dbo.put("id", id);
    c = collection.find(dbo);
    if(c.hasNext()) {
        m = (Map) c.next();
    }
    return m;
}
Advantages:
1. Zero time and money spent on overhead code
2. Code and database not physically linked
3. New material with more fields can be added into existing collections; backfill is optional
4. Names of fields in database precisely match key names in code layer and directly match on name, not indirectly via positional offset
5. No technical debt is created

DAY 3: ADD LIST OF PHONE NUMBERS
m.put("name", "buzz");
m.put("id", "K1");
m.put("title", "Mr.");
m.put("hireDate", new Date(2011, 11, 1));
n1.put("type", "work");
n1.put("number", "1-800-555-1212");
list.add(n1);
n2.put("type", "home");
n2.put("number", "1-866-444-3131");
list.add(n2);
m.put("phones", list);
• It was still pretty easy to add this data to the structure
• .. but meanwhile, in the persistence code …
REALLY brace yourself…

SQL DAY 3 CHANGES: OPTION 2:
PROPER APPROACH WITH MULTIPLE PHONE NUMBERS
DDL: create table phones ( … )
init()
{
    contactInsertStmt = connection.prepareStatement
        ("insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
    c2stmt = connection.prepareStatement
        ("insert into phones (id, type, number) values (?, ?, ?)");
    fetchStmt = connection.prepareStatement
        ("select contact.id, name, title, hiredate, type, number from contact, phones where phones.id = contact.id and contact.id = ?");
}
save(Map m)
{
    startTrans();
    contactInsertStmt.setString(1, m.get("id"));
    contactInsertStmt.setString(2, m.get("name"));
    contactInsertStmt.setString(3, m.get("title"));
    contactInsertStmt.setDate(4, m.get("hireDate"));
    contactInsertStmt.execute();
    // One row in phones per phone number
    for(Map onePhone : m.get("phones")) {
        c2stmt.setString(1, m.get("id"));
        c2stmt.setString(2, onePhone.get("type"));
        c2stmt.setString(3, onePhone.get("number"));
        c2stmt.execute();
    }
    endTrans();
}
Map fetch(String id)
{
    Map m = null;
    int i = 0;
    List list = new ArrayList();
    fetchStmt.setString(1, id);
    rs = fetchStmt.executeQuery();
    while (rs.next()) {
        if(i == 0) {
            // Contact columns repeat on every joined row; read them once
            m = new HashMap();
            m.put("id", rs.getString(1));
            m.put("name", rs.getString(2));
            m.put("title", rs.getString(3));
            m.put("hireDate", rs.getDate(4));
            m.put("phones", list);
        }
        Map onePhone = new HashMap();
        onePhone.put("type", rs.getString(5));
        onePhone.put("number", rs.getString(6));
        list.add(onePhone);
        i++;
    }
    return m;
}
This took time and money

SQL DAY 5: ZOMBIES! (ZERO OR MORE BETWEEN ENTITIES)
init()
{
    contactInsertStmt = connection.prepareStatement
        ("insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
    c2stmt = connection.prepareStatement
        ("insert into phones (id, type, number) values (?, ?, ?)");
    fetchStmt = connection.prepareStatement
        ("select A.id, A.name, A.title, A.hiredate, B.type, B.number from contact A left outer join phones B on (A.id = B.id) where A.id = ?");
}
Whoops! And it’s also wrong!
We did not design the query accounting for contacts that have
no phone number. Thus, we have to change the join to an
outer join.
But this ALSO means we have to change the unwind logic
This took more time and money!
while (rs.next()) {
    if(i == 0) {
        // …
    }
    String s = rs.getString(5);
    if(s != null) {
        // The outer join returns null phone columns for contacts without phones
        Map onePhone = new HashMap();
        onePhone.put("type", s);
        onePhone.put("number", rs.getString(6));
        list.add(onePhone);
    }
    i++;
}
…but at least we have a DAL…
right?
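
The unwind logic above is easy to get wrong, so here is a self-contained sketch of it that runs without a database. Rows are modeled as plain Object[] arrays in the column order of the outer-join query (id, name, title, hireDate, type, number); the class UnwindJoin and the helper rows are hypothetical and stand in for a JDBC ResultSet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UnwindJoin {
    // Convenience helper: builds the list of joined rows for the examples.
    static List<Object[]> rows(Object[]... r) {
        return Arrays.asList(r);
    }

    // Folds flat left-outer-join rows back into one nested contact structure.
    static Map<String, Object> unwind(List<Object[]> rows) {
        Map<String, Object> m = null;
        List<Map<String, Object>> phones = new ArrayList<>();
        for (Object[] row : rows) {
            if (m == null) {        // first row carries the contact fields
                m = new HashMap<>();
                m.put("id", row[0]);
                m.put("name", row[1]);
                m.put("title", row[2]);
                m.put("hireDate", row[3]);
                m.put("phones", phones);
            }
            if (row[4] != null) {   // a null type means the contact has no phone rows
                Map<String, Object> onePhone = new HashMap<>();
                onePhone.put("type", row[4]);
                onePhone.put("number", row[5]);
                phones.add(onePhone);
            }
        }
        return m;                   // null when no row matched the contact id
    }

    public static void main(String[] args) {
        System.out.println(unwind(rows(
            new Object[] {"K1", "buzz", "Mr.", null, "work", "1-800-555-1212"},
            new Object[] {"K1", "buzz", "Mr.", null, "home", "1-866-444-3131"})));
        System.out.println(unwind(rows(
            new Object[] {"K2", "ann", "Ms.", null, null, null})));
    }
}
```

With the document model, the nested map is stored and returned as-is, so none of this folding code is needed.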

DEVELOPER COSTS ON THE RISE
[Two charts, 1985 to 2013: Storage Cost per GB and Developer Salary]

OPTIMIZING FOR ENGINEERING PRODUCTIVITY
[Chart, 1985 to 2017: Engineer Costs vs. Infrastructure Costs]

COST REDUCTION
1. Scale out on commodity hardware vs. scale up
2. Cloud
3. Built-in HA
‒ No additional components
‒ Configuration

SCALING RELATIONAL
Scale Up Scale Out

SCALING MONGODB: AUTOMATIC SHARDING
Three types: hash-based, range-based, location-aware
Increase or decrease capacity as you go
Automatic balancing
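
To illustrate how the three sharding strategies differ in where a key lands, here is a deliberately simplified sketch. It is not MongoDB's chunking or balancing implementation: the class ShardChoice, the shard names, the use of String.hashCode, and the example tail numbers are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class ShardChoice {
    // Hash-based: spreads keys evenly; neighbouring keys land on unrelated shards.
    static int hashShard(String key, int shardCount) {
        return Math.floorMod(key.hashCode(), shardCount);
    }

    // Range-based: each shard owns a contiguous key range, identified by its lower bound.
    // Assumes the map contains a lowest bound ("") so every key falls in some range.
    static String rangeShard(TreeMap<String, String> lowerBounds, String key) {
        return lowerBounds.floorEntry(key).getValue();
    }

    // Location-aware: a document's region decides which shard holds it.
    static String zoneShard(Map<String, String> zones, String region) {
        return zones.get(region);
    }

    public static void main(String[] args) {
        TreeMap<String, String> bounds = new TreeMap<>();
        bounds.put("", "shardA");   // keys below "N"
        bounds.put("N", "shardB");  // keys from "N" upward
        System.out.println(hashShard("N512AB", 3));
        System.out.println(rangeShard(bounds, "G-ABCD"));
        System.out.println(rangeShard(bounds, "N512AB"));
        System.out.println(zoneShard(Map.of("EU", "shardA", "US", "shardB"), "EU"));
    }
}
```

Range-based placement keeps neighbouring keys together, which suits range scans; hash-based placement trades that locality for an even write distribution.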

QUERY ROUTING
Multiple query optimization models
Each sharding option appropriate
for different apps

Atlas: Database as a Service for MongoDB
• Automated
• Available On-Demand
• Secure
• Highly Available
• Automated Backups
• Elastically Scalable

RELATIONAL HIGH(?) AVAILABILITY
[Diagram: an application in each of DC1 and DC2, with separate replication and availability components linking the two data centers]
Bolted on Components
• Recovery: min – hours
• Manual intervention
• Expensive $$$

MONGODB REPLICA SETS
Replica Set – 2 to 50 copies
Self-healing shard
Data Center Aware
Addresses availability considerations:
High Availability
Disaster Recovery
Maintenance
Workload Isolation: operational & analytics

REMEMBER THIS?
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123,47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Rich Queries
Find everybody in London with a car
built between 1970 and 1980
Geospatial
Find all of the car owners within 5km
of Trafalgar Sq.
Search
Find all the cars described as having
leather seats. Count them by model.
(text, facets, collation)
Aggregation
Calculate the average value of Paul’s
car collection
Graph
Find all the cars owned by Paul’s family
(descendants)
Map Reduce
What is the ownership pattern of
colors by geography over time?
(is purple trending up in China?)

AGGREGATION: POWERFUL ANALYTICS

MONGODB CONNECTOR FOR BI
Visualize and explore multi-dimensional
documents using SQL-based BI tools. The
connector does the following:
• Provides the BI tool with the schema of the
MongoDB collection to be visualized
• Translates SQL statements issued by the BI tool
into equivalent MongoDB queries that are sent to
MongoDB for processing
• Converts the results into the tabular format
expected by the BI tool, which can then visualize
the data based on user requirements

ADVANCED ANALYTICS
MongoDB Connector for Apache Spark
• Native Scala connector, certified by Databricks
• Exposes all Spark APIs & libraries
• Efficient data filtering with predicate pushdown, secondary indexes, & in-database aggregations
• Locality awareness to reduce data movement
• Updated with Spark 2.0 support
[Diagram: Analytics Application; Scala, Java, Python, R APIs; SQL, Machine Learning Libraries, Streaming, Graph; Spark Workers; MongoDB Connector for Spark]
“We reduced 100+ lines of integration code to just a single line after moving to the MongoDB Spark connector.”
- Early Access Tester, Multi-National Banking Group

WHAT DOES THIS MEAN?
• Developer productivity
• Wider range of use cases
• Changing requirements
• Complex queries and analytics
MongoDB

WIDE VARIETY OF USE CASES
• Single View
• Internet of Things
• Mobile
• Real-Time Analytics
• App Modernization
• Content Management
• Blockchain

CRITERIA: WHEN TO USE MONGODB?
• Triage
‒ Building new app
‒ Re-platforming existing app
‒ Evaluating application portfolio

DECISION FLOW CHART
Existing application?
‒ yes: Existing App Criteria
‒ no: New App Criteria

EXISTING APPLICATIONS
• Is there a critical requirement that isn’t being met?
‒ Performance/Scalability
‒ Agility
‒ Variable data sources/formats
‒ Availability/Resiliency
‒ Cost
‒ Cloud
• Revision or re-platform?

EXISTING APPLICATION CHALLENGES
• Performance/Scalability
‒ Challenges: can’t meet query volume; query latency issues; data volume exceeding server(s) capacity
‒ MongoDB features: Document Model, WiredTiger, Sharding, Commodity Hardware, Cloud/Atlas
• Availability/Resiliency
‒ Challenges: need automatic failover, with zero downtime on loss of a node, network, or data center, and no engineering effort required to restore service
‒ MongoDB features: Replica sets (automated failover, zero downtime maintenance)
• Cloud Migration
‒ Challenges: cloud migration; no cloud provider lock-in
‒ MongoDB features: Atlas

EXISTING APPLICATION CHALLENGES
• Agility (shorten time to value)
‒ Challenges: feature backlog; developers focused on maintenance instead of innovation
‒ MongoDB features: Flexible document model, Powerful query language, Driver architecture
• Variable data sources/formats
‒ Challenges: new data sources; data format changes continuously
• Cost
‒ Challenges: Mainframe MIPS; RAC clusters; additional expensive components for replication, failover
‒ MongoDB features: Commodity Hardware, Open Source, Atlas, Replica sets

CRITERIA FOR ASSESSING MONGODB FIT
• Performance/Scalability
• Availability/Resiliency
• Cloud
• Agility
• Variable Data
• Cost
• Data naturally modeled as
documents?
• Complex queries
• Analytics
• Strong consistency

ADDITIONAL CRITERIA
• Data naturally modeled as documents
‒ Challenges: complex code for shredding and reconstituting objects
• Complex Queries, Analytics
‒ Challenges: complex application code; complex architectures including search engine, Hadoop, ETL, CDC; limited application functionality; long time to market
‒ MongoDB features: Secondary indexes, Powerful query language, Aggregation Framework, BI Connector, Spark integration
• Strong Consistency
‒ Challenges: users require most up-to-date view of data; complex application code required to handle edge cases
‒ MongoDB features: Strong consistency, Read and write concerns

EXISTING APPLICATION
✓ Performance/Scalability
✓ Availability/Resiliency
✓ Cloud
✓ Agility
✓ Variable Data
✓ Cost

NEW APPLICATIONS
• MongoDB makes sense for the vast majority of use cases
• Why not MongoDB?
‒ Fear/comfort level with new technology
‒ Don’t know how to support MongoDB
‒ Don’t want to learn new technology
‒ Expensive enterprise license that is “free” to project team
‒ Our other solution is good enough

NEXUS ARCHITECTURE
• Scalability & Performance: Replica sets, Sharding, WiredTiger
• Flexibility: Document Model
• Always On, Global Deployments: Replica Sets, Sharding, Ops & Cloud Mgr, Atlas
• Expressive Query Language & Secondary Indexes: MongoDB query language, Secondary indexes, Aggregation Framework, BI Connector
• Strong Consistency: Strong Consistency, Read and Write Concern
• Enterprise Management & Integrations: BI Connector, Ops & Cloud Mgr., Spark Connector, Atlas

I DON’T ALWAYS BUILD APPLICATIONS IN MONGODB, BUT WHEN I DO I GET…
• 5x Developer Productivity
• 80% Cost Reduction

USE CASE INDICATORS FOR MONGODB
• Cloud
• Agility
• Variable Data
• Cost
• Data naturally modeled as
documents?
• Complex queries
• Analytics
• Strong consistency

QUESTIONS?
• Jay Runkel
• Principal Solutions Architect
jay.runkel@mongodb.com
@jayrunkel

MANY REASONS FOR MONGODB?
• Strong Consistency
‒ Documents
‒ Indexes
‒ Consistency across multi-data center
deployments
• Expressive Query Language and
Secondary Indexes
‒ More powerful than SQL
‒ Analytics
‒ Dynamic index creation
• Scalability/Performance
‒ PBs of Data
‒ Millions of ops/sec
• High Availability
‒ Automated failover < 2 seconds
‒ Supports Active-Active and Active-
Passive multi-data center
deployments
• Deploy Anywhere
‒ On-Prem, AWS, Azure, Google
• Ease of Management
‒ Best in class operations tooling
‒ Configure once: one cluster spans
multi-data centers
