Advanced Administration
Monitoring and Backup at Scale
Dr. Jeffrey Berger
Lead Database Engineer - Sailthru
Scale The Universe!
Flat FRW Metric for isotropic cosmological geometry
Scale Factor
Related to the Hubble constant for an expanding universe, this does a great job of actually scaling our universe.
In fact, the rate of expansion is continuing to accelerate!
Scale The Universe!
Uhh maybe just the galaxy
The world is still a lot
Keep zooming in...
‘Big Data’...?
Sailthru
● Extremely early adopter of MongoDB ~2009
● 4 Clusters and 9 Stand-Alone RS
● Largest is 32 shards and 5.5TB with ~1.5 billion profiles
● All production systems are housed in a colo data center on hardware owned and operated by Sailthru
Sailthru
● 4 DB Team Members
○ Me
○ Dr. Joshua Wickman
○ Chandrakant Gopalan
○ Tim Burrington
Sailthru
● Our systems are composed of replica sets of 2 live nodes and 1 arbiter
● Many of our systems are ‘microsharded’
[Diagram: two replica sets, each with a PRIMARY, a SECONDARY, and an ARBITER]
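As a rough illustration of the "2 live nodes and 1 arbiter" layout, the sketch below builds a replica set configuration document as a plain Python dict. The set name and host names are hypothetical, and the slides do not show Sailthru's actual configuration.

```python
def replica_set_config(name, data_hosts, arbiter_host):
    """Build a replica set config document: data-bearing members plus one arbiter."""
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    # The arbiter votes in elections but holds no data.
    members.append({"_id": len(data_hosts), "host": arbiter_host, "arbiterOnly": True})
    return {"_id": name, "members": members}


# Hypothetical hosts, for illustration only.
config = replica_set_config(
    "rs0",
    ["db1.example.com:27017", "db2.example.com:27017"],
    "arb1.example.com:27017",
)
```

A document of this shape is what a replica set initiation command would typically be given; with only two data-bearing nodes, the arbiter is what lets the set keep a voting majority when one node is down.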
Two tales of DBA struggle
No DBEs to DB team:
What do you do when you join an organization which has been using MongoDB without any DBA oversight?
Mass Migration:
What do you do if you have to move data from one data center to another, while moving 17 replica sets into a single sharded cluster with no (minimal) downtime?
Welcome to the DB team
What are the most important things for a DB team to set up?
MONITORING and BACKUPS
Monitoring
Microsharded systems are not easy to monitor!
● Multiple replica sets on a single machine
● Primaries and secondaries often sharing hardware
● Monitoring systems for Mongo work at the instance level, not the server level
[Diagram: one server hosting the SHARD 1 PRIMARY and the SHARD 2 SECONDARY, which share MEMORY, DISK IO, and NETWORK IO]
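One way to bridge the gap between instance-level and server-level monitoring is to roll per-instance numbers up by host. The sketch below assumes a hypothetical sample format (`host`, `instance`, `metrics`) and is not taken from any monitoring tool's API.

```python
from collections import defaultdict


def aggregate_by_host(instance_metrics):
    """Sum per-instance metrics so each physical server gets one combined total."""
    totals = defaultdict(lambda: defaultdict(float))
    for sample in instance_metrics:
        for key, value in sample["metrics"].items():
            totals[sample["host"]][key] += value
    return {host: dict(values) for host, values in totals.items()}


# Hypothetical samples: two mongod instances share server-a.
samples = [
    {"host": "server-a", "instance": "shard1-primary",
     "metrics": {"resident_mb": 4096, "disk_iops": 300}},
    {"host": "server-a", "instance": "shard2-secondary",
     "metrics": {"resident_mb": 2048, "disk_iops": 150}},
    {"host": "server-b", "instance": "shard2-primary",
     "metrics": {"resident_mb": 1024, "disk_iops": 80}},
]
per_host = aggregate_by_host(samples)
```

Alerting on the per-host totals rather than on each instance separately catches the case where two modest instances together exhaust a shared machine.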
Monitoring - MMS
MMS is a great tool for all Mongo deployments
● Built-in user-level permissions
● Automatic topology discovery
● Graphs and time series data
● Breakdown by replica set for clusters
● Pulls a wealth of data
Monitoring - MMS
● Built-in alerting
● Many configurable alerting criteria
● Integration with email, SMS, PagerDuty and more
Monitoring - MMS
MMS is our backup monitoring system
● Alerting time sometimes lags behind issue time
● Organizational decision not to host MMS ourselves, and to use an internal monitoring system as our main monitor
Monitoring - MMS
What we are looking forward to:
● Proactive Support has some great features coming through MMS
● Enhanced monitoring and alerting options
● Logging long queries? Non-indexed queries?
● Perhaps we can run custom scripts and checks against the system eventually!
Monitoring - Zabbix
“Quis custodiet ipsos custodes?” (“Who will guard the guards themselves?”) - ZABBIX
Monitoring - Zabbix
Monitoring mongo with Zabbix
https://github.com/sailthru/mongodb-zabbix
● Number of voting members
● Long query logging
● Chunk distribution in a sharded cluster
● Fsync lock status
● Failover notification
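Two of the checks listed above can be sketched as small pure functions. This is an illustration only, not the code from the sailthru/mongodb-zabbix repository; it assumes the caller has already fetched a replica set config document and the chunk documents from the cluster's config database, and the `max_spread` threshold is a hypothetical parameter.

```python
from collections import Counter


def count_voting_members(rs_config):
    """Count members that vote; a member with no 'votes' field votes by default."""
    return sum(1 for member in rs_config["members"] if member.get("votes", 1) > 0)


def chunks_per_shard(chunk_docs):
    """Tally chunk documents by the shard that currently owns each chunk."""
    return Counter(doc["shard"] for doc in chunk_docs)


def is_balanced(distribution, max_spread):
    """True when the busiest and emptiest shards differ by at most max_spread chunks."""
    counts = list(distribution.values())
    return max(counts) - min(counts) <= max_spread


# Hypothetical inputs shaped like the documents described above.
example_config = {"members": [{"_id": 0}, {"_id": 1}, {"_id": 2, "votes": 0}]}
example_chunks = [{"shard": "shard0000"}, {"shard": "shard0000"}, {"shard": "shard0001"}]
voters = count_voting_members(example_config)
distribution = chunks_per_shard(example_chunks)
```

A monitoring agent could report `voters` and the spread of `distribution` as numeric items and alert when they cross a threshold.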
Monitoring - Zabbix
Custom checks and graphs - cluster monitoring
Monitoring - Zabbix
Long Query Logging
Monitoring - Zabbix
Zabbix does not have any automated topology discovery!
Sailthru has created its own MongoDB topology discovery tool: DB Map
● Python process
● Automatically discovers nodes or config changes
● Outputs all servers and information to a Mongo collection
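DB Map itself is not shown in the slides, so the following is only a guess at the general shape of topology discovery: start from seed hosts and keep following the peers each node reports until nothing new turns up. The `get_peers` callable is hypothetical; in a real tool it might wrap whatever a node reports about the other members of its replica set.

```python
from collections import deque


def discover(seed_hosts, get_peers):
    """Breadth-first walk from the seeds, recording the peers each host reports."""
    seen = {}
    queue = deque(seed_hosts)
    while queue:
        host = queue.popleft()
        if host in seen:
            continue
        peers = get_peers(host)
        seen[host] = sorted(peers)
        for peer in peers:
            if peer not in seen:
                queue.append(peer)
    return seen


# Fake cluster used in place of real network calls (hypothetical hosts).
FAKE_PEERS = {
    "a:27017": ["a:27017", "b:27017", "c:27017"],
    "b:27017": ["a:27017", "b:27017", "c:27017"],
    "c:27017": ["a:27017", "b:27017", "c:27017"],
}
topology = discover(["a:27017"], lambda host: FAKE_PEERS.get(host, []))
```

Running the walk on a schedule and diffing the result against the previous run is one way to notice new nodes or config changes.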
Admin Tool - DB Map
Useful for many processes in our system
● Management scripts
● Execute aggregation queries to pull specific systems
● Keep Zabbix in sync using it as a source of truth
● Exportable for Ansible inventory files or other management software
● Soon to be open sourced
Built by: Dr. Joshua Wickman
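To make the "exportable for Ansible inventory files" idea concrete, here is a minimal sketch that turns server records into an INI-style inventory grouped by replica set. The record fields (`host`, `replica_set`) are assumptions; the real DB Map schema is not given in the slides.

```python
from collections import defaultdict


def to_ansible_inventory(servers):
    """Render server records as an INI-style inventory, one group per replica set."""
    groups = defaultdict(list)
    for server in servers:
        groups[server["replica_set"]].append(server["host"])
    lines = []
    for group in sorted(groups):
        lines.append(f"[{group}]")
        lines.extend(sorted(groups[group]))
        lines.append("")  # blank line between groups
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical records, as they might be read from the discovery collection.
servers = [
    {"host": "db2.example.com", "replica_set": "rs0"},
    {"host": "db1.example.com", "replica_set": "rs0"},
    {"host": "db3.example.com", "replica_set": "rs1"},
]
inventory = to_ansible_inventory(servers)
```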
Backups
Many ways to skin a... cluster?
● Volume snapshots (within our Datacenter)
● Snapshots of cloud secondaries (Hybrid Cloud)
● MMS Backups
Backups - Hybrid Cloud
Sailthru had a hybrid cloud-physical topology.
[Diagram: PRIMARY and SECONDARY in the DATACENTER; a hidden SECONDARY in the CLOUD]
Backups - Hybrid Cloud
There are benefits to a hybrid setup
● Disaster recovery is immediate
● Backups can be taken care of by EC2 snapshotting
Backups - Hybrid Cloud
[Diagram: four shards, each with a PRIMARY and a SECONDARY in the DC and a hidden SECONDARY in the Cloud]
Backups - Hybrid Cloud
[Diagram: two shards, each with a PRIMARY, a SECONDARY, and a hidden SECONDARY]
● Are these secondaries on hardware provisioned equally to the others?
● Is there enough bandwidth?
● Can the disks keep up with bursts of write activity?
● Are the oplogs on these secondaries long enough?
● Is the connection to the cloud secure and stable?
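The oplog question above comes down to simple arithmetic: the time span between the oldest and newest oplog entries has to exceed the longest outage or lag a secondary is expected to survive. A minimal sketch, with hypothetical timestamps and a hypothetical 24-hour requirement:

```python
from datetime import datetime, timedelta


def oplog_window(first_entry_time, last_entry_time):
    """Time span currently covered by the oplog."""
    return last_entry_time - first_entry_time


def window_is_sufficient(first_entry_time, last_entry_time, required):
    """True when the oplog covers at least the required recovery window."""
    return oplog_window(first_entry_time, last_entry_time) >= required


# Hypothetical timestamps of the oldest and newest oplog entries.
first = datetime(2015, 1, 1, 0, 0)
last = datetime(2015, 1, 2, 12, 0)
window = oplog_window(first, last)
ok = window_is_sufficient(first, last, timedelta(hours=24))
```

Because the window shrinks during bursts of write activity, it is worth measuring at peak load rather than on a quiet day.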
Backups - Hybrid Cloud
DO YOU HAVE THE TIME AND RESOURCES TO DO ALL OF THAT WORK??
We all just want backups that are fire-and-forget!
Backups - MMS
● Save on your team’s time
● Save on the provisioned hardware
● Much cheaper than a hybrid cloud solution
Sailthru has saved almost 1 million dollars year over year
Backups - MMS
● UI is easy to use and great for small/individual sets
● Need automation in order to bring up a cluster of any reasonable size
○ Automation tools not yet available out of the box
● Pulls your data across the internet - make sure you allocate time for this!
The Power is Turning Off...
During 2014 Sailthru was forced to move data centers
Additionally we made the infrastructure decision to move from 17+ separate replica sets to a sharded cluster.
Data Migrations
[Diagram: DC1, DC2, and CLOUD]
With limited bandwidth and servers, this becomes an interview brain teaser
Data Migrations - Dumps
[Diagram: DC1 to DC2 using Mongodump, Netcat, or write to file then Mongorestore]
● Lots of combinations, none ended up being fast enough.
● Hampered by disk writes and reads.
● If you touch disk you lose! The floor is lava!
Data Migrations - Mongopipe
Custom multiprocessing Python process to insert without hitting disk
● Using Python, multiprocessing, ZMQ, and some custom C objects
● Got around the 2.4 bulk insert issue by sorting on shard key
● Never touches disk, all processing is done in memory
● Directly inserts into many local mongos instances
● Open source coming soon!
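Mongopipe's source is not in the slides, so the following only illustrates one step the bullets mention: grouping documents into in-memory batches and sorting each batch on the shard key before handing it to a writer. The real tool used multiprocessing and ZMQ; here a single process and a hypothetical `write_batch` callable stand in for them.

```python
from itertools import islice


def sorted_batches(docs, shard_key, batch_size):
    """Yield fixed-size batches, each sorted on the shard key, without touching disk."""
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        batch.sort(key=lambda doc: doc[shard_key])
        yield batch


def pipe(docs, shard_key, batch_size, write_batch):
    """Stream documents from a source to a writer callable, batch by batch."""
    written = 0
    for batch in sorted_batches(docs, shard_key, batch_size):
        write_batch(batch)  # hypothetical: e.g. a bulk insert through a mongos
        written += len(batch)
    return written


# In-memory stand-ins for the source cursor and the target cluster.
source = [{"_id": n} for n in [5, 3, 9, 1, 7]]
received = []
total = pipe(source, "_id", 2, received.append)
```

Sorting within a batch keeps documents bound for the same key range next to each other, which is presumably why it helped with bulk inserts; the slides do not give more detail than that.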
Data Migrations - Mongopipe
[Diagram: three Cursors in DC1 feed three Writers over ZMQ; the Writers send batch inserts through three Mongos instances to the Target Cluster in DC2. Labels: "Sort on Shard Key", "ZMQ", "Batch Inserts"]
Data Migrations - Mongopipe
insert       query     update  delete  getmore  command
64982        25        *0      *0      0        45|0
62484        23        *0      *0      0        50|0
37490        15        *0      *0      0        25|0
-1073585030  -4978381  *0      *0      -163     -5042014|0
197448       70        *0      *0      0        144|0
227440       105       *0      *0      0        181|0
49986        45        *0      *0      0        59|0
Data Migrations - Mongo Connector
● Mongo Connector is a way to mirror MongoDB operations, creating almost a virtual secondary without adding it to a replica set
● Great for data migrations without downtime
https://github.com/10gen-labs/mongo-connector
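The "virtual secondary" idea can be sketched by replaying oplog-style entries against a target. This is a simplified illustration, not Mongo Connector's API: the target is a plain dict keyed by namespace, and only inserts, `$set` or full-replacement updates, and deletes are handled.

```python
def apply_oplog_entry(target, entry):
    """Apply one oplog-style entry to an in-memory target keyed by namespace."""
    collection = target.setdefault(entry["ns"], {})
    op = entry["op"]
    if op == "i":  # insert: 'o' holds the full document
        doc = entry["o"]
        collection[doc["_id"]] = dict(doc)
    elif op == "u":  # update: 'o2' identifies the document, 'o' describes the change
        doc_id = entry["o2"]["_id"]
        change = entry["o"]
        if "$set" in change:
            collection.setdefault(doc_id, {"_id": doc_id}).update(change["$set"])
        else:
            collection[doc_id] = dict(change)  # full-document replacement
    elif op == "d":  # delete: 'o' holds the _id of the removed document
        collection.pop(entry["o"]["_id"], None)
    # Other entry types (no-ops, commands) are ignored in this sketch.


# Hypothetical stream of entries for one namespace.
entries = [
    {"op": "i", "ns": "app.users", "o": {"_id": 1, "name": "Ann"}},
    {"op": "i", "ns": "app.users", "o": {"_id": 2, "name": "Bob"}},
    {"op": "u", "ns": "app.users", "o2": {"_id": 1}, "o": {"$set": {"plan": "pro"}}},
    {"op": "d", "ns": "app.users", "o": {"_id": 2}},
]
mirror = {}
for entry in entries:
    apply_oplog_entry(mirror, entry)
```

For a migration, the same loop would run continuously against the source oplog while the bulk copy finishes, so the target can catch up before cutover.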
Data Migrations - Mongo Connector
[Diagram: MONGO CONNECTOR's oplog manager reads the Mongo oplog (1..., 2..., 3...) and passes operations to several doc managers, which write to a target datastore such as Elasticsearch, Solr, or MongoDB]
Access Patterns - Keystore
● What if I want to do a lot of findOnes on a cluster?
● On many unique fields?
● Am I doomed to many scatter-gathers?
Assume sharded on {_id: hashed}
[Diagram: a MONGOS in front of four SHARDs, receiving findOne({"ssn": X}), findOne({"cell_phone": X}), and findOne({"_id": X})]
Created by: Ian White
Access Patterns - Keystore
Find by SSN
Keystore sharded collection, sharded on {_id: hashed}
Doc:
{
_id: SSN
sid: ObjectId()
}
● Query the keystore on _id (the shard key) to return an ObjectId
● Use the sid that was found to query the _id in the main sharded collection, which is also sharded on {_id: hashed}
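The cost difference can be seen in a small in-memory simulation. `FakeCluster` below is a hypothetical stand-in for a collection sharded on a hashed `_id`; it only counts how many shards each lookup touches and does not talk to MongoDB.

```python
import hashlib


class FakeCluster:
    """In-memory stand-in for a collection sharded on a hashed _id."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]
        self.shard_queries = 0

    def _shard_for(self, _id):
        digest = hashlib.md5(repr(_id).encode()).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def insert(self, doc):
        self._shard_for(doc["_id"])[doc["_id"]] = doc

    def find_by_id(self, _id):
        self.shard_queries += 1  # targeted: exactly one shard is asked
        return self._shard_for(_id).get(_id)

    def find_by_field(self, field, value):
        result = None
        for shard in self.shards:  # scatter-gather: every shard is asked
            self.shard_queries += 1
            for doc in shard.values():
                if result is None and doc.get(field) == value:
                    result = doc
        return result


main = FakeCluster(32)
keystore = FakeCluster(32)
main.insert({"_id": "profile-1", "ssn": "000-00-0000"})
keystore.insert({"_id": "000-00-0000", "sid": "profile-1"})


def find_by_ssn(ssn):
    """Two targeted queries: keystore by _id, then main collection by _id."""
    key_doc = keystore.find_by_id(ssn)
    return main.find_by_id(key_doc["sid"]) if key_doc else None


direct = main.find_by_field("ssn", "000-00-0000")
scatter_cost = main.shard_queries
main.shard_queries = 0
via_keystore = find_by_ssn("000-00-0000")
keystore_cost = keystore.shard_queries + main.shard_queries
```

Both paths return the same document, but the direct lookup touches every shard while the keystore path touches two, at the price of maintaining one extra keystore collection per lookup field.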
Access Patterns - Keystore
2 queries rather than n, where n is your number of shards
** Not useful unless you are sharded out very far **
● Time averaged by keystore: ~30 seconds
● Time averaged by direct lookup: ~170 seconds
** Tests done on a 32-shard cluster **
Other Tools - Mongoexup
Business need to regularly execute mongoexport and uploads
● Cron jobs are unreliable
● Any ‘prototype’ inevitably becomes production
● Constructed a Python scheduler daemon to execute these tasks
● Looking to open source in the future
Built by: Chandrakant Gopalan
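Mongoexup's code is not shown, so this is only a sketch of what a single export-and-upload job might look like inside such a scheduler. The job fields and the `run` and `upload` callables are hypothetical; `run` could wrap a subprocess call and `upload` an S3 client.

```python
def mongoexport_command(host, db, collection, out_path):
    """Build the argument list for one mongoexport invocation."""
    return [
        "mongoexport",
        "--host", host,
        "--db", db,
        "--collection", collection,
        "--out", out_path,
    ]


def run_job(job, run, upload):
    """Export one collection, upload the file, and report a status record."""
    command = mongoexport_command(job["host"], job["db"], job["collection"], job["out_path"])
    try:
        run(command)
        upload(job["out_path"], job["destination"])
    except Exception as error:  # record the failure instead of crashing the daemon
        return {"job": job["name"], "status": "failed", "error": str(error)}
    return {"job": job["name"], "status": "done"}


# Fake runner and uploader so the sketch runs without MongoDB or S3.
calls = []
job = {
    "name": "nightly-users",
    "host": "db1.example.com:27017",
    "db": "app",
    "collection": "users",
    "out_path": "/tmp/users.json",
    "destination": "exports/users.json",
}
status = run_job(job, calls.append, lambda path, dest: calls.append((path, dest)))
```

Returning a status record per job, rather than relying on exit codes, is what makes the job status information queryable afterwards.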
Other Tools - Mongoexup
[Diagram: Mongo → (greenlets) → MongoExUp → (greenlets) → S3, with job status information]
What are we doing next?
● Open source even more of our tools
● Ansible Automation
● Building API layers around all our DBs
○ Tornado - ASYNC RULES
● MongoDB + Other Data Stores
○ Enhancing the Keystore concept
● Upgrading
○ WiredTiger (WT)
○ RocksDB

 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Advanced Administration, Monitoring and Backup

  • 1. Advanced Administration Monitoring and Backup at Scale Dr. Jeffrey Berger Lead Database Engineer - Sailthru
  • 5. Scale The Universe! Flat FRW Metric for isotropic cosmological geometry ● Scale Factor: related to the Hubble constant for an expanding universe, this does a great job of actually scaling our universe. ● In fact, the rate of expansion is continuing to grow and accelerate!
  • 7. Uhh maybe just the galaxy
  • 8. The world is a lot still
  • 12. Sailthru ● Extremely early adopter of MongoDB ~2009 ● 4 Clusters and 9 Stand-Alone RS ● Largest is 32 shards and 5.5TB with ~1.5 billion profiles ● All production systems are housed in a colo data center on hardware owned and operated by Sailthru
  • 13. Sailthru ● 4 DB Team Members ○ Me ○ Dr. Joshua Wickman ○ Chandrakant Gopalan ○ Tim Burrington
  • 14. Sailthru ● Our systems are composed of replica sets of 2 live nodes and 1 arbiter ● Many of our systems are ‘microsharded’ (Diagram: two replica sets, each with a PRIMARY, a SECONDARY and an ARBITER)
  • 15. Two tales of DBA struggle ● No DBEs to DB team: What do you do when you join an organization which has been using MongoDB without any DBA oversight? ● Mass Migration: What do you do if you have to move data from one data center to another, while moving 17 replica sets into a single sharded cluster with no (minimal) downtime?
  • 16. Welcome to the DB team What are the most important things for a DB team to set up? ● MONITORING ● BACKUPS
  • 17. Monitoring Microsharded systems are not easy to monitor! ● Multiple replica sets on a single machine ● Primaries and Secondaries often sharing hardware ● Monitoring systems for Mongo work at an instance level, not a server level (Diagram: SHARD 1 PRIMARY and SHARD 2 SECONDARY sharing MEMORY, DISK IO and NETWORK IO)
  • 18. Monitoring - MMS MMS is a great tool for all Mongo deployments ● Built-in user-level permissions ● Automatic topology discovery ● Graphs and time series data ● Breakdown by replica set for clusters ● Pulls a wealth of data
  • 19. Monitoring - MMS ● Built-in alerting ● Many variable alerting criteria ● Integration with email, SMS, PagerDuty and more
  • 20. Monitoring - MMS MMS is our backup monitoring system ● Alerting time sometimes lags behind issue time ● Organizational decision not to host MMS ourselves, so we need an internal monitoring system as our main monitor
  • 21. Monitoring - MMS What we are looking forward to: ● Proactive Support has some great features coming through MMS ● Enhanced monitoring and alerting options ● Logging long queries? Non-indexed queries? ● Perhaps we can run custom scripts and checks against the system eventually!
  • 22. Monitoring - Zabbix “Quis custodiet ipsos custodes?” - ZABBIX
  • 23. Monitoring - Zabbix Monitoring mongo with Zabbix https://github.com/sailthru/mongodb-zabbix ● Number of voting members ● Long query logging ● Chunk distribution in a sharded cluster ● Fsync lock status ● Failover notification
  • 24. Monitoring - Zabbix Custom checks and graphs - cluster monitoring
  • 25. Monitoring - Zabbix Long Query Logging
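A check such as the "number of voting members" item listed on slide 23 can be sketched in a few lines. This is a minimal illustration, not the code in the sailthru/mongodb-zabbix repository. It assumes the caller has already fetched the replica set configuration document (what `rs.conf()` shows in the shell); the function names and the expected count of 3 (2 live nodes plus 1 arbiter, as on slide 14) are illustrative assumptions.

```python
def count_voting_members(repl_config):
    """Count the members of a replica set configuration that carry a vote."""
    members = repl_config.get("members", [])
    # A member document without a "votes" field has the default of 1 vote.
    return sum(1 for member in members if member.get("votes", 1) > 0)


def voting_check(repl_config, expected=3):
    """Return (count, ok) so a monitoring item can graph one and alert on the other."""
    count = count_voting_members(repl_config)
    return count, count == expected


if __name__ == "__main__":
    # Hypothetical configuration: two data-bearing nodes and one arbiter.
    sample = {
        "_id": "rs0",
        "members": [
            {"_id": 0, "host": "db1.example:27017"},
            {"_id": 1, "host": "db2.example:27017"},
            {"_id": 2, "host": "arb1.example:27017", "arbiterOnly": True},
        ],
    }
    print(voting_check(sample))
```

Returning a plain number keeps the check easy to wire into any monitoring system that polls a script and compares the value against a trigger.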
  • 26. Monitoring - Zabbix Zabbix does not have any automated topology discovery! Sailthru has created its own MongoDB topology discovery tool: DB Map ● Python process ● Automatically discovers nodes or config changes ● Outputs all servers and information to a Mongo collection
  • 27. Admin Tool - DB Map Useful for many processes in our system ● Management scripts ● Execute aggregation queries to pull specific systems ● Keep Zabbix in sync using it as a source of truth ● Exportable for Ansible inventory files or other management software ● Soon to be open sourced ● Built by: Dr. Joshua Wickman
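The slides do not show how DB Map works internally, so the following is only a guess at the general shape of such a discovery pass. It assumes a hypothetical `describe(host)` callable that returns a dict shaped like an `isMaster` reply (`setName`, `hosts`, `arbiters`, `ismaster`, `arbiterOnly`); in a real tool that callable would run the command against the host, and the resulting documents would be written to a collection.

```python
from collections import deque


def discover(seeds, describe):
    """Walk a replica set topology breadth-first, starting from seed hosts.

    `describe(host)` is a hypothetical callable returning a dict shaped
    like an isMaster reply for that host.
    """
    found = {}
    queue = deque(seeds)
    while queue:
        host = queue.popleft()
        if host in found:
            continue  # already visited through another member
        reply = describe(host)
        if reply.get("arbiterOnly"):
            role = "arbiter"
        elif reply.get("ismaster"):
            role = "primary"
        else:
            role = "secondary"
        found[host] = {"_id": host, "set": reply.get("setName"), "role": role}
        # Every member reports its peers, so one reachable seed is enough.
        queue.extend(reply.get("hosts", []) + reply.get("arbiters", []))
    # One document per server, in a stable order.
    return [found[host] for host in sorted(found)]
```

Because each run produces the full list of servers, comparing two runs is enough to spot new nodes or configuration changes.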
  • 28. Backups Many ways to skin a… cluster....? ● Volume snapshots (within our Datacenter) ● Snapshots of cloud secondaries (Hybrid Cloud) ● MMS Backups
  • 29. Backups - Hybrid Cloud Sailthru had a hybrid cloud-physical topology. (Diagram labels: PRIMARY, SECONDARY, SECONDARY (HIDDEN); DATACENTER, CLOUD)
  • 30. Backups - Hybrid Cloud There are benefits to a hybrid setup ● Disaster recovery is immediate ● Backups can be taken care of by EC2 snapshotting
  • 31. Backups - Hybrid Cloud (Diagram labels: 4 × PRIMARY, 4 × SECONDARY, 4 × SECONDARY (hidden); DC, Cloud)
  • 32. Backups - Hybrid Cloud (Diagram labels: 2 × PRIMARY, 2 × SECONDARY, 2 × SECONDARY (hidden)) ● Are these secondaries on hardware provisioned equally to the others? ● Is there enough bandwidth? ● Can the disks keep up with bursts of write activity? ● Are the oplogs on these secondaries long enough? ● Is the connection to the cloud secure and stable?
  • 33. Backups - Hybrid Cloud DO YOU HAVE THE TIME AND RESOURCES TO DO ALL OF THAT WORK?? We all just want backups that are fire-and-forget!
  • 34. Backups - MMS ● Save on your team’s time ● Save on the provisioned hardware ● Much cheaper than hybrid cloud solution Sailthru has saved almost 1 million dollars year over year
  • 35. Backups - MMS ● UI is easy to use and great for small/individual sets ● Need automation in order to bring up a cluster of any reasonable size ○ Automation tools not yet available out of the box ● Pulls your data across the internet - make sure you allocate this time!
  • 36. The Power is Turning Off... During 2014 Sailthru was forced to move data centers. Additionally, we made the infrastructure decision to move from 17+ separate replica sets to a sharded cluster.
  • 37. Data Migrations (Diagram labels: DC1, DC2, CLOUD) With limited bandwidth and servers, this becomes something like an interview brain teaser
  • 38. Data Migrations - Dumps (Diagram: DC1 to DC2 via mongodump, netcat, write to file, then mongorestore) ● Lots of combinations, none ended up being fast enough. ● Hampered by disk writes and reads. ● If you touch disk you lose! The floor is lava!
  • 39. Data Migrations - Mongopipe Custom multiprocessing python process to insert without hitting disk ● Using python, multiprocessing, ZMQ, and some custom C objects ● Got around 2.4 bulk insert issue by sorting on shard key ● Never touches disk, all processing is done in memory ● Directly insert into many local mongos instances ● Open source coming soon!
  • 40. Data Migrations - Mongopipe (Diagram labels: DC1: 3 × Cursor, Sort on Shard Key; ZMQ; DC2: 3 × Writer, Batch Inserts, 3 × Mongos, Target Cluster)
  • 41. Data Migrations - Mongopipe (mongostat output)
        insert       query     update  delete  getmore  command
        64982        25        *0      *0      0        45|0
        62484        23        *0      *0      0        50|0
        37490        15        *0      *0      0        25|0
        -1073585030  -4978381  *0      *0      -163     -5042014|0
        197448       70        *0      *0      0        144|0
        227440       105       *0      *0      0        181|0
        49986        45        *0      *0      0        59|0
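Slide 39 mentions sorting on the shard key before batch inserting, but the slides do not show the code. A minimal sketch of that one step, assuming documents are buffered in memory as dicts: presumably, sorted batches keep neighbouring documents in the same key range, so each batch is routed to few shards. The function name and parameters are illustrative, and this is not the Mongopipe implementation (which the slides say also uses multiprocessing, ZMQ and custom C objects).

```python
from itertools import islice


def sorted_batches(docs, shard_key, batch_size):
    """Yield lists of documents, ordered by shard key, of at most batch_size each."""
    # Sorting happens entirely in memory; nothing is written to disk.
    ordered = sorted(docs, key=lambda doc: doc[shard_key])
    iterator = iter(ordered)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Hypothetical documents keyed on "k".
    docs = [{"k": k} for k in (5, 1, 4, 2, 3)]
    for batch in sorted_batches(docs, "k", 2):
        # In a real pipeline each batch would be handed to a writer process
        # that performs one bulk insert through a local mongos.
        print(batch)
```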
  • 42. Data Migrations - Mongo Connector ● Mongo Connector mirrors MongoDB operations, creating what is almost a virtual secondary without adding it to a replica set ● Great for data migrations without downtime https://github.com/10gen-labs/mongo-connector
  • 43. Data Migrations - Mongo Connector (Diagram: inside MONGO CONNECTOR, the OPLOG MNGR. reads the MONGO OP LOG (1, 2, 3, ...) and DOC MNGRs. write to a TARGET DATASTORE: Elasticsearch, Solr, MongoDB, ...)
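The "virtual secondary" idea comes down to replaying oplog entries against another store. The sketch below is not Mongo Connector's code; it is a minimal illustration that assumes the classic oplog entry layout (`op`, `ns`, `o`, `o2`), handles only inserts, `$set`/`$unset` updates, full-document replacements and deletes, and uses a plain nested dict as a stand-in target datastore.

```python
def apply_oplog_entry(target, entry):
    """Replay one oplog entry onto `target`, a dict of {namespace: {_id: doc}}.

    Returns True if the entry changed data, False if it was skipped.
    """
    op = entry["op"]
    if op not in ("i", "u", "d"):
        return False  # no-ops ("n") and commands ("c") are ignored here
    collection = target.setdefault(entry["ns"], {})
    if op == "i":
        doc = entry["o"]
        collection[doc["_id"]] = dict(doc)
    elif op == "u":
        _id = entry["o2"]["_id"]  # o2 identifies the document to update
        change = entry["o"]
        if "$set" in change or "$unset" in change:
            doc = collection.setdefault(_id, {"_id": _id})
            doc.update(change.get("$set", {}))
            for field in change.get("$unset", {}):
                doc.pop(field, None)
        else:
            # No modifiers: the entry carries a full replacement document.
            collection[_id] = dict(change, _id=_id)
    else:  # op == "d"
        collection.pop(entry["o"]["_id"], None)
    return True
```

Applying entries in oplog order keeps the target converging on the source, which is why this approach suits migrations that cannot stop writes.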
  • 44. Access Patterns - Keystore ● What if I want to do a lot of findOnes on a cluster? ● On many unique fields? ● Am I doomed to many scatter gathers? Assume sharded on _id: hashed, with queries findOne({“ssn”: X}), findOne({“cell_phone”: X}), findOne({“_id”: X}) (Diagram: MONGOS in front of 4 × SHARD) Created by: Ian White
  • 45. Access Patterns - Keystore Find by SSN ● Keystore sharded collection, sharded on {_id: hashed}, with documents of the form { _id: SSN, sid: ObjectId() } ● Query on _id (the shard key) to return an ObjectId ● Main sharded collection, sharded on {_id: hashed}: use the sid that was found to query the _id in the main collection
  • 46. Access Patterns - Keystore 2 queries rather than n, where n is your number of shards ** Not useful unless you are sharded out very far ** ● Time averaged by keystore: ~30 seconds ● Time averaged by direct lookup: ~170 seconds ** Tests done on a 32 shard cluster **
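The "2 queries rather than n" claim can be illustrated with a toy model. This sketch simulates a hash-sharded collection with in-memory dicts and counts how many shards each lookup touches; the class and function names are illustrative, md5 merely stands in for the hashed shard key function, and nothing here talks to a real cluster.

```python
import hashlib


class ToyShardedCollection:
    """In-memory stand-in for a collection sharded on {_id: hashed}."""

    def __init__(self, n_shards):
        self.shards = [dict() for _ in range(n_shards)]
        self.shards_touched = 0  # running count of shard visits

    def _shard_for(self, _id):
        digest = hashlib.md5(repr(_id).encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def insert(self, doc):
        self.shards[self._shard_for(doc["_id"])][doc["_id"]] = doc

    def find_by_id(self, _id):
        # Targeted query: the shard key tells us exactly which shard to ask.
        self.shards_touched += 1
        return self.shards[self._shard_for(_id)].get(_id)

    def find_by_field(self, field, value):
        # Scatter-gather: without the shard key, every shard must be asked.
        match = None
        for shard in self.shards:
            self.shards_touched += 1
            for doc in shard.values():
                if match is None and doc.get(field) == value:
                    match = doc
        return match


def keystore_lookup(keystore, main, key):
    """Two targeted queries: key -> sid in the keystore, then sid -> document."""
    entry = keystore.find_by_id(key)
    if entry is None:
        return None
    return main.find_by_id(entry["sid"])
```

With 32 shards, `keystore_lookup` touches 2 shards in total while `find_by_field` touches all 32, which is the trade-off the slide describes: an extra collection to maintain in exchange for avoiding scatter-gather reads.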
  • 47. Other Tools - Mongoexup Business need to regularly execute mongoexport and uploads ● Cron jobs are unreliable ● Any ‘prototype’ inevitably becomes production ● Constructed a Python scheduler daemon to execute these tasks ● Looking to open source in the future ● Built by: Chandrakant Gopalan
  • 48. Other Tools - Mongoexup Mongo MongoExUp S3 Greenlets Greenlets Job Status Information
  • 49. What are we doing next? ● Open source even more of our tools ● Ansible automation ● Building API layers around all our DBs ○ Tornado - ASYNC RULES ● MongoDB + other data stores ○ Enhancing the Keystore concept ● Upgrading ○ WiredTiger (WT) ○ RocksDB