Advanced Administration, Monitoring and Backup

Advanced Administration
Monitoring and Backup at Scale
Dr. Jeffrey Berger
Lead Database Engineer - Sailthru

Scale The Universe!
Flat FRW Metric for isotropic cosmological geometry

Scale The Universe!
Scale Factor
Related to the Hubble constant for an expanding universe, the scale factor does a great job
of actually scaling our universe.
In fact, the rate of expansion is continuing to grow and accelerate!

Sailthru
● Extremely early adopter of MongoDB (~2009)
● 4 clusters and 9 stand-alone replica sets
● The largest has 32 shards and 5.5 TB, with ~1.5 billion profiles
● All production systems are housed in a colo data center on hardware owned and operated by Sailthru

Sailthru
● 4 DB Team Members
○ Me
○ Dr. Joshua Wickman
○ Chandrakant Gopalan
○ Tim Burrington

Sailthru
● Our systems are composed of replica sets of 2 live nodes and 1 arbiter
● Many of our systems are ‘microsharded’
[Diagram: two replica sets, each with a PRIMARY, a SECONDARY and an ARBITER]

Two tales of DBA struggle
● No DBEs to DB team: What do you do when you join an organization that has been using MongoDB without any DBA oversight?
● Mass Migration: What do you do if you have to move data from one data center to another, while moving 17 replica sets into a single sharded cluster with no (or minimal) downtime?

Welcome to the DB team
What are the most important things for a DB team to set up?
MONITORING and BACKUPS

Monitoring
Microsharded systems are not easy to monitor!
● Multiple replica sets on a single machine
● Primaries and secondaries often sharing hardware
● Monitoring systems for Mongo work at the instance level, not the server level
[Diagram: a SHARD 1 PRIMARY and a SHARD 2 SECONDARY on one server, sharing MEMORY, DISK IO and NETWORK IO]
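Because instance-level tools report each mongod separately, the server-level picture has to be assembled by hand. A minimal sketch, assuming per-instance samples keyed by a "host:port" string; the metric names (`resident_mb`, `disk_write_mb_s`) and helper functions are hypothetical, not MMS or Zabbix fields:

```python
from collections import defaultdict

# Hypothetical per-instance samples. Each mongod is monitored separately and
# identified as "host:port"; the metric names are illustrative only.
samples = [
    {"instance": "db1:27017", "set": "shard1", "state": "PRIMARY", "resident_mb": 6000, "disk_write_mb_s": 40},
    {"instance": "db1:27018", "set": "shard2", "state": "SECONDARY", "resident_mb": 5000, "disk_write_mb_s": 35},
    {"instance": "db2:27017", "set": "shard1", "state": "SECONDARY", "resident_mb": 5500, "disk_write_mb_s": 20},
    {"instance": "db2:27018", "set": "shard2", "state": "PRIMARY", "resident_mb": 6500, "disk_write_mb_s": 45},
]


def roll_up_by_server(samples):
    """Sum per-instance metrics into per-server totals, since co-located
    instances share the same memory, disk and network."""
    totals = defaultdict(lambda: {"resident_mb": 0, "disk_write_mb_s": 0, "instances": []})
    for s in samples:
        host = s["instance"].rsplit(":", 1)[0]  # strip the port
        t = totals[host]
        t["resident_mb"] += s["resident_mb"]
        t["disk_write_mb_s"] += s["disk_write_mb_s"]
        t["instances"].append(f'{s["set"]}/{s["state"]}')
    return dict(totals)


def servers_over_memory(totals, ram_mb):
    """Servers whose co-located instances together exceed physical RAM."""
    return sorted(h for h, t in totals.items() if t["resident_mb"] > ram_mb)
```

With the sample data, `servers_over_memory(roll_up_by_server(samples), 11500)` flags only `db2`, even though neither of its instances would cross that line on its own.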

Monitoring - MMS
MMS is a great tool for all Mongo deployments
● Built-in user-level permissions
● Automatic topology discovery
● Graphs and time series data
● Breakdown by replica set for clusters
● Pulls a wealth of data

Monitoring - MMS
● Built-in alerting
● Many different alerting criteria
● Integration with email, SMS, PagerDuty and more

Monitoring - MMS
MMS is our secondary (fallback) monitoring system
● Alerting time sometimes lags behind issue time
● Organizational decision not to host MMS ourselves, and to use an internal monitoring system as our main monitor

Monitoring - MMS
What we are looking forward to:
● Proactive Support has some great features coming through MMS
● Enhanced monitoring and alerting options
● Logging long queries? Non-indexed queries?
● Perhaps we can run custom scripts and checks against the system eventually!

Monitoring - Zabbix
“Quis custodiet ipsos custodes?” - ZABBIX

Monitoring - Zabbix
Monitoring MongoDB with Zabbix
https://github.com/sailthru/mongodb-zabbix
● Number of voting members
● Long query logging
● Chunk distribution in a sharded cluster
● Fsync lock status
● Failover notification
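The voting-member and failover checks can be expressed as small functions over a replica set's config and status documents. This is a sketch, not the sailthru/mongodb-zabbix code: it assumes dicts shaped like the `replSetGetConfig` `config` document (`members[].host`, optional `votes`, which defaults to 1) and the `replSetGetStatus` reply (`members[].name`, `stateStr`, `health`); the function names are hypothetical.

```python
# Sample documents shaped like replSetGetConfig["config"] and replSetGetStatus.
config = {"_id": "shard1", "version": 3, "members": [
    {"_id": 0, "host": "db1:27017"},
    {"_id": 1, "host": "db2:27017"},
    {"_id": 2, "host": "arb1:27017", "arbiterOnly": True},
]}
status = {"set": "shard1", "members": [
    {"name": "db1:27017", "stateStr": "PRIMARY", "health": 1},
    {"name": "db2:27017", "stateStr": "SECONDARY", "health": 1},
    {"name": "arb1:27017", "stateStr": "(not reachable/healthy)", "health": 0},
]}


def voting_members(config):
    # "votes" is optional in the config and defaults to 1
    return [m["host"] for m in config["members"] if m.get("votes", 1) > 0]


def healthy_voters(config, status):
    voters = set(voting_members(config))
    return [m["name"] for m in status["members"]
            if m["name"] in voters and m.get("health") == 1]


def has_majority(config, status):
    # An election needs a strict majority of all voting members
    return len(healthy_voters(config, status)) > len(voting_members(config)) // 2


def current_primary(status):
    for m in status["members"]:
        if m["stateStr"] == "PRIMARY":
            return m["name"]
    return None


def failover_detected(previous_primary, status):
    # True if the primary changed (or vanished) since the previous poll
    return previous_primary is not None and current_primary(status) != previous_primary
```

A Zabbix item could report `len(healthy_voters(...))` and trigger when `has_majority` turns false, or when `failover_detected` is true between two polls.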

Monitoring - Zabbix
Custom checks and graphs - cluster monitoring
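Chunk distribution can be computed by counting chunk documents per shard. A sketch assuming documents shaped like `config.chunks` entries from the MongoDB 2.x/3.x era, which carry `ns` and `shard` fields (newer releases may identify the collection differently); the helper names are hypothetical:

```python
from collections import Counter

# Sample documents shaped like config.chunks entries (other fields omitted).
chunks = [
    {"ns": "app.profiles", "shard": "shard0000"},
    {"ns": "app.profiles", "shard": "shard0000"},
    {"ns": "app.profiles", "shard": "shard0000"},
    {"ns": "app.profiles", "shard": "shard0001"},
    {"ns": "app.events", "shard": "shard0002"},
]


def chunk_distribution(chunks, ns):
    """Number of chunks per shard for one namespace."""
    return Counter(c["shard"] for c in chunks if c["ns"] == ns)


def chunk_imbalance(dist, shards):
    """Spread between the fullest and emptiest shard (shards with no chunks count as 0)."""
    counts = [dist.get(s, 0) for s in shards]
    return max(counts) - min(counts)
```

Graphing `chunk_distribution` per shard over time shows whether the balancer is keeping up; `chunk_imbalance` is a single number that is easy to alert on.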

Monitoring - Zabbix
Long Query Logging
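Long query logging can be approximated by polling `db.currentOp()` and keeping operations that have run past a threshold. A sketch over a dict shaped like the `currentOp` reply (`inprog[].opid`, `secs_running`, `op`, `ns`); the threshold and function name are illustrative assumptions:

```python
# Sample reply shaped like db.currentOp() (other fields omitted).
current_op = {"inprog": [
    {"opid": 101, "active": True, "secs_running": 45, "op": "query", "ns": "app.profiles"},
    {"opid": 102, "active": True, "secs_running": 2, "op": "update", "ns": "app.profiles"},
    {"opid": 103, "active": False, "op": "getmore", "ns": "local.oplog.rs"},
    {"opid": 104, "active": True, "secs_running": 300, "op": "query", "ns": "app.events"},
]}


def long_running_ops(current_op, threshold_secs):
    """(opid, op, ns, secs_running) for ops at or over the threshold, slowest first."""
    slow = []
    for op in current_op.get("inprog", []):
        # secs_running is absent for operations that are not actively running
        if op.get("secs_running", 0) >= threshold_secs:
            slow.append((op["opid"], op.get("op"), op.get("ns"), op["secs_running"]))
    return sorted(slow, key=lambda t: t[3], reverse=True)
```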

Monitoring - Zabbix
Zabbix does not have any automated topology discovery!
Sailthru has created its own MongoDB topology discovery tool: DB Map
● Python process
● Automatically discovers nodes and config changes
● Outputs all servers and their information to a Mongo collection
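DB Map's source is not shown in the talk, so this is only a sketch of how topology discovery can work: start from seed hosts, ask each one for its `isMaster` reply, and follow the `hosts`, `passives` and `arbiters` lists it advertises until no new members appear. The `is_master` callable is a hypothetical stand-in for a driver call, and hidden members (which `isMaster` does not advertise) would need another source such as the replica set config.

```python
from collections import deque


def _role(reply):
    if reply.get("ismaster"):
        return "PRIMARY"
    if reply.get("arbiterOnly"):
        return "ARBITER"
    if reply.get("secondary"):
        return "SECONDARY"
    return "OTHER"


def discover(seeds, is_master):
    """Breadth-first discovery. is_master(host) returns a dict shaped like the
    isMaster reply, or None if the host is unreachable."""
    seen, queue, topology = set(), deque(seeds), {}
    while queue:
        host = queue.popleft()
        if host in seen:
            continue
        seen.add(host)
        reply = is_master(host)
        if reply is None:
            topology[host] = {"reachable": False}
            continue
        topology[host] = {"reachable": True, "set": reply.get("setName"), "role": _role(reply)}
        # Follow every member this node advertises
        for key in ("hosts", "passives", "arbiters"):
            queue.extend(h for h in reply.get(key, []) if h not in seen)
    return topology


# Fake replies standing in for live servers.
_members = {"hosts": ["db1:27017", "db2:27017"], "arbiters": ["arb1:27017"]}
REPLIES = {
    "db1:27017": {"setName": "shard1", "ismaster": True, "secondary": False, **_members},
    "db2:27017": {"setName": "shard1", "ismaster": False, "secondary": True, **_members},
    "arb1:27017": {"setName": "shard1", "ismaster": False, "secondary": False,
                   "arbiterOnly": True, **_members},
}
```

Seeding with any one member, for example `discover(["db2:27017"], REPLIES.get)`, finds the whole set; each result entry could then be written as one document to the DB Map collection.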

Admin Tool - DB Map
Useful for many processes in our system
● Management scripts
● Execute aggregation queries to pull specific systems
● Keep Zabbix in sync using it as a source of truth
● Exportable for Ansible inventory files or other management software
● Soon to be open sourced
Built by: Dr. Joshua Wickman
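Exporting to an Ansible inventory could look like the sketch below, assuming DB Map documents with hypothetical `host` ("host:port") and `set` fields. Because microsharded servers run several mongod instances, each instance gets its own inventory alias with `ansible_host` pointing at the real machine; `mongod_port` is an illustrative variable name, not an Ansible built-in.

```python
# Hypothetical DB Map documents: one per mongod instance.
docs = [
    {"host": "db1:27017", "set": "shard1"},
    {"host": "db1:27018", "set": "shard2"},
    {"host": "db2:27017", "set": "shard1"},
]


def to_ansible_inventory(docs):
    """Render an INI-style inventory with one group per replica set."""
    groups = {}
    for d in docs:
        groups.setdefault(d["set"], []).append(d["host"])
    lines = []
    for group in sorted(groups):
        lines.append(f"[{group}]")
        for hostport in sorted(groups[group]):
            host, port = hostport.rsplit(":", 1)
            # Alias per instance so two mongods on one server keep separate vars
            lines.append(f"{host}_{port} ansible_host={host} mongod_port={port}")
        lines.append("")
    return "\n".join(lines)
```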

Backups
Many ways to skin a… cluster?
● Volume snapshots (within our Datacenter)
● Snapshots of cloud secondaries (Hybrid Cloud)
● MMS Backups

Backups - Hybrid Cloud
[Diagram: a PRIMARY and a SECONDARY in the DATACENTER; a hidden SECONDARY in the CLOUD]
Sailthru had a hybrid cloud-physical topology.

There are benefits to a hybrid setup:
● Disaster recovery is immediate
● Backups can be handled by EC2 snapshots
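A hidden cloud secondary like the one in this topology is an ordinary replica set member with `hidden: true` and `priority: 0` (MongoDB requires hidden members to have priority 0, so they never become primary). The sketch builds the new configuration document without touching a server; the function name is hypothetical, and the result is the kind of document passed to `rs.reconfig()` / `replSetReconfig`, which expects a higher `version` than the current configuration.

```python
import copy

# Sample replica set configuration (as returned by rs.conf()).
config = {"_id": "shard1", "version": 3, "members": [
    {"_id": 0, "host": "db1:27017"},
    {"_id": 1, "host": "db2:27017"},
    {"_id": 2, "host": "arb1:27017", "arbiterOnly": True},
]}


def add_hidden_member(config, host):
    """Return a new config with `host` added as a hidden, priority-0 member."""
    new = copy.deepcopy(config)  # never mutate the caller's config
    if any(m["host"] == host for m in new["members"]):
        raise ValueError(f"{host} is already a member")
    next_id = max(m["_id"] for m in new["members"]) + 1  # member _ids must be unique
    # Hidden members must have priority 0: they replicate but never become primary
    new["members"].append({"_id": next_id, "host": host, "priority": 0, "hidden": True})
    new["version"] += 1
    return new
```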

[Diagram: four shards, each with a PRIMARY and a SECONDARY in the DC and a hidden SECONDARY in the Cloud]

[Diagram: two shards, each with a PRIMARY, a SECONDARY and a hidden SECONDARY]
● Are these secondaries on hardware provisioned equally to the others?
● Is there enough bandwidth?
● Can the disks keep up with bursts of write activity?
● Are the oplogs on these secondaries long enough?
● Is the connection to the cloud secure and stable?
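Whether an oplog is "long enough" comes down to arithmetic: the window is the time between the first and last oplog entries (the "log length start to end" that `rs.printReplicationInfo()` reports), and under a sustained write burst it shrinks to roughly the oplog size divided by the write rate. A sketch assuming timestamps as epoch seconds; the function names and the safety factor of 2 are hypothetical choices.

```python
def oplog_window_hours(first_ts, last_ts):
    """Current window from the first and last oplog entry times (epoch seconds)."""
    return (last_ts - first_ts) / 3600.0


def projected_window_hours(oplog_size_mb, peak_write_mb_per_hour):
    """Approximate window if the peak write rate were sustained."""
    if peak_write_mb_per_hour <= 0:
        raise ValueError("write rate must be positive")
    return oplog_size_mb / peak_write_mb_per_hour


def oplog_is_long_enough(window_hours, required_hours, safety_factor=2.0):
    """required_hours: how long the secondary may lag or be offline
    (e.g. a cloud link outage) and still catch up without a full resync."""
    return window_hours >= required_hours * safety_factor
```

For example, a 50 GB (51200 MB) oplog under a 2 GB/hour burst gives a 25-hour window, enough for a 12-hour outage with a factor of 2 to spare.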

DO YOU HAVE THE TIME AND RESOURCES
TO DO ALL OF THAT WORK??
We all just want backups that are fire-and-forget!

Backups - MMS
● Save on your team’s time
● Save on provisioned hardware
● Much cheaper than the hybrid cloud solution
Sailthru has saved almost $1 million year over year

Backups - MMS
● UI is easy to use and great for small/individual sets
● Need automation to bring up a cluster of any reasonable size
○ Automation tools not yet available out of the box
● Pulls your data across the internet, so make sure you allocate time for this!

The Power is Turning Off...
During 2014 Sailthru was forced to move data centers.
Additionally, we made the infrastructure decision to move from 17+ separate replica sets to a sharded cluster.

Data Migrations
[Diagram: DC1, DC2 and CLOUD]
With limited bandwidth and servers, this becomes an interview brain teaser.

Data Migrations - Dumps
[Diagram: DC1 → DC2 via mongodump, netcat, or writing to file then mongorestore]
● Lots of combinations; none ended up being fast enough.
● Hampered by disk writes and reads.
● If you touch disk you lose! The floor is lava!

Data Migrations - Mongopipe
A custom multiprocessing Python process that inserts without hitting disk
● Uses Python, multiprocessing, ZMQ, and some custom C objects
● Worked around a MongoDB 2.4 bulk insert issue by sorting on the shard key
● Never touches disk; all processing is done in memory
● Inserts directly into many local mongos instances
● Open source coming soon!
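Mongopipe's code is not shown in the talk, so this is only a sketch of the batching idea: sort documents on the shard key, cut them into fixed-size batches, and spread the batches across writers. Plain lists stand in for the ZMQ sockets and writer processes, the names are hypothetical, and a ranged shard key is assumed (with a hashed key, adjacent key values do not land in the same chunk).

```python
from itertools import islice


def sorted_batches(docs, shard_key, batch_size):
    """Yield fixed-size batches of documents ordered by the shard key."""
    it = iter(sorted(docs, key=lambda d: d[shard_key]))
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def assign_to_writers(batches, n_writers):
    """Round-robin batches across writers (lists stand in for ZMQ queues)."""
    queues = [[] for _ in range(n_writers)]
    for i, batch in enumerate(batches):
        queues[i % n_writers].append(batch)
    return queues
```

Sorting keeps each batch's documents close together in key space, so a batch tends to touch few chunks when a writer sends it through its local mongos.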

[Diagram: DC1 → DC2. Cursor processes → ZMQ batch inserts (sorted on shard key) → Writer processes → Mongos instances → Target Cluster]

insert       query     update  delete  getmore  command
64982        25        *0      *0      0        45|0
62484        23        *0      *0      0        50|0
37490        15        *0      *0      0        25|0
-1073585030  -4978381  *0      *0      -163     -5042014|0
197448       70        *0      *0      0        144|0
227440       105       *0      *0      0        181|0
49986        45        *0      *0      0        59|0

Data Migrations - Mongo Connector
● Mongo Connector is a way to mirror MongoDB operations, effectively creating a virtual secondary without adding it to a replica set
● Great for data migrations without downtime
https://github.com/10gen-labs/mongo-connector

Data Migrations - Mongo Connector
[Diagram: MONGO OPLOG (entries 1…, 2…, 3…) → MONGO CONNECTOR (an OPLOG MNGR. feeding several DOC MNGR.s) → TARGET DATASTORE (Elasticsearch, Solr, MongoDB, …)]
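Mongo Connector's core loop reads oplog entries in order and has a doc manager apply each one to the target. A simplified sketch with a dict standing in for the target datastore: it handles `i`, `u` and `d` entries (`ns`, `o`, `o2` fields), supports only full-replacement updates or top-level `$set`/`$unset` in the older oplog update format, and skips everything else. The function name is hypothetical, not Mongo Connector's API.

```python
def apply_oplog(entries, target):
    """Replay oplog entries into `target`, a dict of namespace -> {_id: doc}."""
    for e in entries:
        op = e["op"]
        if op not in ("i", "u", "d"):
            continue  # no-ops ("n") and commands ("c") are skipped in this sketch
        coll = target.setdefault(e["ns"], {})
        if op == "i":
            coll[e["o"]["_id"]] = dict(e["o"])
        elif op == "d":
            coll.pop(e["o"]["_id"], None)
        else:  # "u": o2 identifies the document, o describes the change
            _id = e["o2"]["_id"]
            change = e["o"]
            if any(k.startswith("$") for k in change):
                doc = coll.setdefault(_id, {"_id": _id})
                doc.update(change.get("$set", {}))  # top-level fields only
                for field in change.get("$unset", {}):
                    doc.pop(field, None)
            else:
                coll[_id] = dict(change, _id=_id)  # full-document replacement
    return target
```

A real doc manager would write to Elasticsearch, Solr or another MongoDB instead of a dict and would checkpoint the last applied `ts` so it can resume.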

Access Patterns - Keystore
● What if I want to do a lot of findOnes on a cluster?
● On many unique fields?
● Am I doomed to many scatter-gathers?
[Diagram: a MONGOS in front of four SHARDs, receiving findOne({“ssn”: X}), findOne({“cell_phone”: X}) and findOne({“_id”: X}). Assume the collection is sharded on {_id: hashed}.]
Created by: Ian White

Find by SSN
● Keystore: a sharded collection, sharded on {_id: hashed}, with documents of the form { _id: SSN, sid: ObjectId() }. Query on _id (the shard key); it returns an ObjectId.
● Main sharded collection, sharded on {_id: hashed}: use the sid that was found to query _id in the main collection.

2 targeted queries rather than n, where n is your number of shards
** Not useful unless you are sharded out very far **
● Average time with keystore: ~30 seconds
● Average time with direct lookup: ~170 seconds
** Tests done on a 32-shard cluster
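The keystore trade-off can be simulated by counting how many shards each lookup touches. In this sketch an MD5-modulo function stands in for MongoDB's hashed sharding (it is not the real hash-to-chunk mapping), and `FakeCluster` and `find_by_ssn` are hypothetical names; a query on `_id` touches one shard, while a query on any other field fans out to all of them.

```python
import hashlib


def shard_for(key, n_shards):
    # Stand-in for hashed sharding, not MongoDB's actual hash or chunk ranges
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


class FakeCluster:
    """In-memory collection sharded on _id that counts shards touched."""

    def __init__(self, n_shards=32):
        self.shards = [dict() for _ in range(n_shards)]
        self.shards_touched = 0

    def insert(self, doc):
        self.shards[shard_for(doc["_id"], len(self.shards))][doc["_id"]] = doc

    def find_by_id(self, _id):
        # Targeted: a query on the shard key goes to exactly one shard
        self.shards_touched += 1
        return self.shards[shard_for(_id, len(self.shards))].get(_id)

    def find_by_field(self, field, value):
        # Scatter-gather: a query on any other field must ask every shard
        self.shards_touched += len(self.shards)
        for shard in self.shards:
            for doc in shard.values():
                if doc.get(field) == value:
                    return doc
        return None


def find_by_ssn(ssn, keystore, main):
    entry = keystore.find_by_id(ssn)  # query 1: keystore, _id is the SSN
    if entry is None:
        return None
    return main.find_by_id(entry["sid"])  # query 2: main collection by _id
```

Looking up a profile by SSN through the keystore touches 2 shards; `main.find_by_field("ssn", …)` touches all 32.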

Other Tools - Mongoexup
Business need: regularly execute mongoexport and upload the results
● Cron jobs are unreliable
● Any ‘prototype’ inevitably becomes production
● Constructed a Python scheduler daemon to execute these tasks
● Looking to open source in the future
Built by: Chandrakant Gopalan

Other Tools - Mongoexup
[Diagram: Mongo → MongoExUp (greenlets on both the export and upload sides) → S3, plus job status information]
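A scheduler in the spirit of Mongoexup might track each job's last run and status and shell out to `mongoexport`. This sketch uses only standard `mongoexport` options (`--host`, `--db`, `--collection`, `--query`, `--out`), omits the S3 upload and the greenlets, and takes a hypothetical `runner` callable so it can be exercised without a database; `ExportJob` and its fields are illustrative, not Mongoexup's.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ExportJob:
    name: str
    host: str
    db: str
    collection: str
    interval_secs: int
    query: Optional[str] = None       # JSON filter passed to --query
    last_run: Optional[float] = None  # epoch seconds of the last attempt
    status: str = "never run"

    def command(self, out_path: str) -> List[str]:
        argv = ["mongoexport", "--host", self.host, "--db", self.db,
                "--collection", self.collection, "--out", out_path]
        if self.query:
            argv += ["--query", self.query]
        return argv


def due_jobs(jobs: List[ExportJob], now: float) -> List[ExportJob]:
    """Jobs that have never run or whose interval has elapsed."""
    return [j for j in jobs if j.last_run is None or now - j.last_run >= j.interval_secs]


def run_due(jobs: List[ExportJob], now: float, runner: Callable[[List[str]], int]) -> None:
    """Run each due job via `runner` (returns an exit code) and record its
    status, so a failure is visible instead of silently lost as with cron."""
    for job in due_jobs(jobs, now):
        exit_code = runner(job.command(f"/tmp/{job.name}-{int(now)}.json"))
        job.last_run = now
        job.status = "ok" if exit_code == 0 else f"failed (exit {exit_code})"
```

In a real daemon, `runner` could be `lambda argv: subprocess.run(argv).returncode`, followed by an upload step for the output file.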

What are we doing next?
● Open source even more of our tools
● Ansible Automation
● Building API layers around all our DBs
○ Tornado - ASYNC RULES
● MongoDB + Other Data Stores
○ Enhancing the Keystore concept
● Upgrading
○ WiredTiger (WT)
○ RocksDB
