Production Deployment
with MongoDB 3.0
Agenda
MongoDB Introduction
Data Model
General Production Considerations
Durability, Scalability, Availability
Deployment Architectures & Operations
Demonstration: three servers in two data centers
MongoDB in the Market of Ideas
MongoDB, Inc.
400+ employees · 2,000+ customers
Over $311 million in funding · 13 offices around the world
THE LARGEST ECOSYSTEM
9,000,000+
MongoDB Downloads
250,000+
Online Education Registrants
35,000+
MongoDB User Group Members
35,000+
MongoDB Management Service (MMS) Users
750+
Technology and Services Partners
2,000+
Customers Across All Industries
MongoDB Use Cases
Single View · Internet of Things · Mobile · Real-Time Analytics
Catalog · Personalization · Content Management
Data Model
Document Data Model
Relational vs. MongoDB
{
first_name: 'Paul',
surname: 'Miller',
city: 'London',
location: [45.123,47.232],
cars: [
{ model: 'Bentley',
year: 1973,
value: 100000, … },
{ model: 'Rolls Royce',
year: 1965,
value: 330000, … }
]
}
Documents are Rich Data Structures
{
first_name: 'Paul',
surname: 'Miller',
cell: '+447557505611',
city: 'London',
location: [45.123,47.232],
profession: ['banking', 'finance', 'trader'],
cars: [
{ model: 'Bentley',
year: 1973,
value: 100000, … },
{ model: 'Rolls Royce',
year: 1965,
value: 330000, … }
]
}
• Typed field values
• Fields can contain arrays
• Fields can contain arrays of sub-documents
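A document like this maps directly to a single insert in the mongo shell; a minimal sketch (the people collection name is an assumption):

db.people.insert({
  first_name: 'Paul',
  surname: 'Miller',
  cell: '+447557505611',
  city: 'London',
  location: [45.123,47.232],
  profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000 },
    { model: 'Rolls Royce', year: 1965, value: 330000 }
  ]
})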
Fully Featured Database Queries
Do More With Your Data
MongoDB
{
first_name: 'Paul',
surname: 'Miller',
city: 'London',
location: [45.123,47.232],
cars: [
{ model: 'Bentley',
year: 1973,
value: 100000, … },
{ model: 'Rolls Royce',
year: 1965,
value: 330000, … }
]
}
Rich Queries: find Paul's cars; find everybody in London with a car built between 1970 and 1980
Geospatial: find all of the car owners within 5 km of Trafalgar Sq.
Text Search: find all the cars described as having leather seats
Aggregation: calculate the average value of Paul's car collection
Map Reduce: what is the ownership pattern of colors by geography over time? (Is purple trending up in China?)
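Each of these (Map Reduce aside) can be sketched in the mongo shell against a hypothetical people collection holding documents like Paul's; the index definitions and coordinates below are assumptions, not part of the deck:

// Rich query: find Paul's cars
db.people.find({ first_name: 'Paul' }, { cars: 1 })

// Rich query: Londoners with a car built between 1970 and 1980
db.people.find({ city: 'London', 'cars.year': { $gte: 1970, $lte: 1980 } })

// Geospatial: owners within 5 km of Trafalgar Square
db.people.createIndex({ location: '2d' })  // legacy coordinate pairs
db.people.find({ location: { $near: [-0.128, 51.508], $maxDistance: 5 / 111.12 } })  // distance in degrees

// Text search: cars described as having leather seats (description field assumed)
db.people.createIndex({ 'cars.description': 'text' })
db.people.find({ $text: { $search: 'leather seats' } })

// Aggregation: average value of Paul's car collection
db.people.aggregate([
  { $match: { first_name: 'Paul' } },
  { $unwind: '$cars' },
  { $group: { _id: '$first_name', avg_value: { $avg: '$cars.value' } } }
])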
Morphia
MEAN Stack
Java · Python · Perl · Ruby
Support for the most popular languages and frameworks
Drivers & Ecosystem
General Production Considerations
Expectations
Five-nines (99.999%) high availability with replication
• No scheduled downtime
• Zero-downtime maintenance
Linear scale-out for read and write
• Commodity hardware
• Cloud
– Public
– Private
– Hybrid
Infrastructure
Priorities
1. Storage. It’s all about the IOPS! RAID 10 or 0.
2. RAM. Working set (only) in cache for web-scale reads.
3. CPU. Web-scale writes with WiredTiger storage engine.
4. Network.
Commodity server or virtual instance, best power/price
• Dual-CPU Intel, 128GB+ RAM
• Locally mounted block storage
– Spinning disk
– SSD
– Enterprise storage with guaranteed IOPS
Production in 16 grams
Durability, Scalability, Availability
Standalone
Replica Sets
Replica Set – 2 to 50 copies
Self-healing shard
Data Center Aware
Addresses availability considerations:
High Availability
Disaster Recovery
Maintenance
Workload Isolation: operational & analytics
Write Concern for Durability
Replica Set Failover
Replicate Data Near Users
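Durability is requested per write via a write concern; a minimal shell sketch (collection name assumed), matching the w: "majority" pattern used in the demo later:

db.orders.insert(
  { item: 'abc', qty: 1 },
  { writeConcern: { w: 'majority', wtimeout: 5000 } }
)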
Automatic Sharding
Three types: hash-based, range-based, and tag-aware (location-aware); see the sketch below
Increase or decrease capacity as you go
Automatic balancing
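A minimal sketch of each option in the mongo shell (database, collection, shard, and tag names are assumptions):

sh.enableSharding('mydb')

// Hash-based: uniform distribution of writes
sh.shardCollection('mydb.events', { _id: 'hashed' })

// Range-based: keeps "close" shard key values together, good for range queries
sh.shardCollection('mydb.people', { surname: 1 })

// Tag-aware (location-aware): pin shard key ranges to tagged shards
sh.shardCollection('mydb.customers', { country: 1 })
sh.addShardTag('shard0000', 'EU')
sh.addTagRange('mydb.customers', { country: 'DE' }, { country: 'DF' }, 'EU')  // illustrative range covering 'DE*'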
Query Routing
Multiple query optimization models
Each sharding option appropriate for different apps
Read Global/Write Local
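"Read global" is expressed through read preferences; a shell sketch (drivers expose the same modes):

db.people.find().readPref('nearest')            // lowest-latency member, primary or secondary
db.people.find().readPref('secondaryPreferred') // offload reads to secondaries when available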
Deployment Architectures & Operations
Development Architecture
Laptop
Application
mongod
SSD
127.0.0.1 / wire protocol
SATA / MMAPv1 or WiredTiger
Driver
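A minimal sketch of starting such a single development node (paths are assumptions; --storageEngine is new in MongoDB 3.0):

mongod --dbpath ~/data/dev --storageEngine wiredTiger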
Single Data Center, 3 Racks
Automated failover
Tolerates server failures
Tolerates rack failures
Number of replicas defines failure tolerance
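For example, electing a primary requires a strict majority of voting members, floor(n/2) + 1: a 3-member set survives 1 failure, a 5-member set survives 2.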
DMZ
Ideal: 3 Full Data Centers
App Server
Application
Driver
mongos
DC1
Primary
Storage
DC2
Secondary
Storage
DC3
Secondary
Storage
DMZ
Ideal: 3 Full Data Centers
App Server
Application
Driver
mongos
DC1
Down
Storage
DC2
Primary
Storage
DC3
Secondary
Storage ✗
DMZ
Hybrid Cloud
App Server
Application
Driver
mongos
DC1
Primary
Storage
DC2
Secondary
Storage
The Cloud
Secondary
Storage
3 Data Centers (or servers, racks…)
You can have it all
• Durable commits (w: "majority")
• Automatic failover and recovery
• Lose any server
• Lose any data center
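A minimal sketch of initiating such a set, one member per data center (hostnames are assumptions):

rs.initiate({
  _id: 'rs0',
  members: [
    { _id: 0, host: 'dc1.example.net:27017' },
    { _id: 1, host: 'dc2.example.net:27017' },
    { _id: 2, host: 'dc3.example.net:27017' }
  ]
})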
DMZ
2.1 Data Centers
App Server
Application
Driver
mongos
DC1
Primary
[Anywhere]
Arbiter
DC2
Secondary
DMZ
2.1 Data Centers
App Server
Application
Driver
mongos
DC1
Primary
[Anywhere]
Arbiter
DC2
Primary
✗
Active/Active Data Center
Tolerates server, rack, data center failures, network partitions
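The ".1" is the arbiter: a data-less, vote-only member placed anywhere outside the two data centers. A minimal sketch (hostname assumed):

rs.addArb('arbiter.example.net:27017')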
DMZ
Only 2 Data Centers
App Server
Application
Driver
mongos
DC1
Primary
[Nowhere]
Nothing
DC2
Secondary
DMZ
Only 2 Data Centers
App Server
Application
Driver
mongos
DC1
Primary
DC2
Secondary
2 Data Centers (or 2 servers, racks…)
Can’t have it all with two data centers
• Durable commits (w: "majority")
• Automatic failover and recovery
• Lose any server (OK so far)
• Lose either data center
DMZ
Only 2 Data Centers
App Server
Application
Driver
mongos
DC1
Primary
DC2
Secondary
DMZ
Only 2 Data Centers
App Server
Application
Driver
mongos
DC1
Primary
DC2
Secondary
Secondary
DMZ
Only 2 Data Centers
App Server
Application
Driver
mongos
DC1
Primary
DC2
Secondary
Secondary
✗
DMZ
Only 2 Data Centers
App Server
Application
Driver
mongos
DC1
Down
DC2
Secondary
Primary
✗
DMZ
Only 2 Data Centers
App Server
Application
Driver
mongos
DC1
Primary
DC2
Secondary
Secondary ✗
2 Data Centers
Mutually exclusive
• Durable commits (w: "majority")
• Automatic failover and recovery
• Lose either data center
We need an out-of-band actor
DMZ
Only 2 Data Centers
App Server
Application
Driver
mongos
DC1
Primary
DC2
Secondary
priority:0.5
Secondary
DMZ
DC2 Down
App Server
Application
Driver
mongos
DC1
Primary
DC2
Secondary
priority:0.5
Secondary
✗
DMZ
Only 2 Data Centers
App Server
Application
Driver
mongos
DC2
Secondary
priority:0.5
DC1
Down
Primary
✗
DMZ
DC1 Down
App Server
Application
Driver
mongos
DC1
Primary
DC2
Secondary
priority:0.5
Secondary ✗
DMZ
Ensuring DC1 Servers Stay Down
App Server
Application
Driver
mongos
DC1
Down
DC2
Secondary
priority:0.5
Down
✗
DMZ
Remove DC1 votes (3.0 required)
App Server
Application
Driver
mongos
DC1
votes:0
DC2
Primary
votes:0
✗
DMZ
Remove DC1 votes (3.0 required)
App Server
Application
Driver
mongos
DC1
votes:0
DC2
Secondary
votes:0
✗
DMZ
This must not happen!
App Server
Application
Driver
mongos
DC1
Primary
DC2
Primary
Secondary
DMZ
Recovering DC1
App Server
Application
Driver
mongos
DC1
Secondary
DC2
Primary
Down
DMZ
Recovering DC1
App Server
Application
Driver
mongos
DC1
Primary
DC2
Secondary
Down
DMZ
Recovery Complete
App Server
Application
Driver
mongos
DC1
Primary
DC2
Secondary
Secondary
Single-click provisioning, scaling & upgrades, admin tasks
Monitoring, with charts, dashboards and alerts on 100+ metrics
Backup and restore, with point-in-time recovery, support for sharded clusters
MongoDB Ops Manager
The Best Way to Manage MongoDB In Your Data Center
Up to 95% Reduction in Operational Overhead
How MongoDB Ops Manager helps you
• Best Practices, Automated
• Cut Management Overhead
• Meet SLAs
• Scale Easily
How Ops Manager Works
Ops Manager
mongod mongod mongod
Agent Agent Agent
New Config
Install and Configure two DCs
brew install mongodb
git clone https://github.com/rueckstiess/mtools.git
[[ -d ~/data/replset ]] && rm -rf ~/data/replset
mlaunch init --nodes 3 --replicaset
mongo localhost:27017 #DC1
// mlaunch has already initiated the set; the manual equivalent would be:
// rs.initiate()
rs.status()
// rs.add("cbiow.local:27018") // DC1
// rs.status()
// rs.add("cbiow.local:27019") // simulating DC2
// rs.status()
Reconfigure and Fail Over in DC1
db.mycoll.insert({a:1},{writeConcern: {w: "majority", wtimeout: 5000}}) // durable commit: acknowledged by a majority
db.mycoll.find()
r = rs.config()
r.members[2].priority = 0.5 // make the DC2 node less likely to be elected
rs.reconfig(r)
pkill -f 27017 // kill the DC1 primary; the other DC1 node takes over
mongo localhost:27018
rs.status()
db.mycoll.insert({a:2},{writeConcern: {w: "majority", wtimeout: 5000}})
db.mycoll.find()
db.mycoll.count()
DC1 Down and Recover in DC2
pkill -f 27018 // now all of DC1 is down
mongo localhost:27019 // connect to the surviving DC2 node
rs.status()
db.mycoll.insert({a:3},{writeConcern: {w: "majority", wtimeout: 5000}}) // fails: no primary while a majority of votes is down
r = rs.config()
r.members[0].votes = 0
r.members[0].priority = 0 // non-voting members must have priority 0
r.members[1].votes = 0
r.members[1].priority = 0
rs.reconfig(r, { force: true }) // force: there is no primary to accept the reconfig
rs.status() // the DC2 node, now the sole voter, steps up as primary
db.mycoll.insert({a:4},{writeConcern: {w: "majority", wtimeout: 5000}}) // succeeds again
db.mycoll.find()
DC1 Recovery and Restore
mlaunch start 27017 // bring the first DC1 node back
mongo localhost:27017
rs.status()
db.mycoll.insert({a:5},{writeConcern: {w: "majority", wtimeout: 5000}}) // fails: this node rejoins as a non-voting secondary
rs.slaveOk() // allow reads on a secondary
db.mycoll.find() // the data written in DC2 has replicated here
mlaunch start 27018 // bring the second DC1 node back
mongo localhost:27019 // reconfig (without force) must run on the primary, still in DC2
rs.status()
r = rs.config()
r.members[0].votes = 1
r.members[0].priority = 1 // restore the default priority as well
r.members[1].votes = 1
r.members[1].priority = 1
rs.reconfig(r) // DC1 regains its votes; a higher-priority DC1 member is elected primary
For More Information
Case Studies: mongodb.com/customers
Presentations: mongodb.com/presentations
Free Online Training: education.mongodb.com
Webinars and Events: mongodb.com/events
Documentation: docs.mongodb.org
MongoDB Downloads: mongodb.com/download
Additional Info: info@mongodb.com
Editor's Notes

  • #11 Here we have greatly reduced the relational data model for this application to two tables. In reality no database has just two tables; it is much more common to have hundreds or thousands. And as a developer, where do you begin when you have a complex data model? If you're building an app you're really thinking about just a handful of common things, like products, and these can be represented in a document much more easily than in a complex relational model, where the data is broken up in a way that doesn't reflect the way you think about the data or write an application.
  • #13 Rich queries, text search, geospatial, aggregation, and MapReduce are the kinds of things you can build on the richness of the query model.
  • #21-#24 High Availability – ensure application availability during many types of failures. Disaster Recovery – address the RTO and RPO goals for business continuity. Maintenance – perform upgrades and other maintenance operations with no application downtime. Secondaries can be used for a variety of purposes: failover, hot backup, rolling upgrades, data locality and privacy, and workload isolation.
  • #26, #30 MongoDB provides horizontal scale-out for databases using a technique called sharding, which is transparent to applications. Sharding distributes data across multiple physical partitions called shards, allowing MongoDB deployments to address the hardware limitations of a single server, such as bottlenecks in RAM or disk I/O, without adding complexity to the application. MongoDB supports three types of sharding. Range-based Sharding: documents are partitioned across shards according to the shard key value; documents with shard key values "close" to one another are likely to be co-located on the same shard, which is well suited for applications that need to optimize range-based queries. Hash-based Sharding: documents are uniformly distributed according to an MD5 hash of the shard key value; documents with shard key values "close" to one another are unlikely to be co-located on the same shard. This approach guarantees a uniform distribution of writes across shards, but is less optimal for range-based queries. Tag-aware Sharding: documents are partitioned according to a user-specified configuration that associates shard key ranges with shards; users can optimize the physical location of documents for application requirements such as locating data in specific data centers. MongoDB automatically balances the data in the cluster as the data grows or the size of the cluster increases or decreases.
  • #27 Sharding is transparent to applications; whether there is one or one hundred shards, the application code for querying MongoDB is the same. Applications issue queries to a query router that dispatches the query to the appropriate shards. For key-value queries that are based on the shard key, the query router will dispatch the query to the shard that manages the document with the requested key. When using range-based sharding, queries that specify ranges on the shard key are only dispatched to shards that contain documents with values within the range. For queries that don’t use the shard key, the query router will dispatch the query to all shards and aggregate and sort the results as appropriate. Multiple query routers can be used with a MongoDB system, and the appropriate number is determined based on performance and availability requirements of the application.
  • #61 MMS can do a lot for [ops teams]. Best Practices, Automated. MMS takes best practices for running MongoDB and automates them. So you run ops the way MongoDB engineers would do it. This not only makes it more fool-proof, but it also helps you… Cut Management Overhead. No custom scripting or special setup needed. You can spend less time running and managing manual tasks because MMS takes care of a lot of the work for you, letting you focus on other tasks. Meet SLAs. Automating critical management tasks makes it easier to meet uptime SLAs. This includes managing failover as well as doing rolling upgrades with no downtime. Scale Easily. Provision new nodes and systems with a single click.