21. When should I scale?
• Metrics:
• Business Performance
• Page faults?
• More reads or writes? Access patterns?
• Find the bottleneck!
• Shard when needed
• Do not wait until 75-80% capacity to scale
22. Shard the data!
[Diagram: "Sharded Replicas". Three app servers, each talking to a mongos; below them three shards, each drawn as one "DB" node with two "db" replica nodes.]
23. Shards can be in distinct locations
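[Diagram: "Sharded Replicas". Three app servers, each talking to a mongos; below them three shards whose members are spread across data centers: the "DB" nodes in dc1, the "db" replica nodes in dc2 and dc3.]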
24. Easier to... plan for resources
• Database size vs. working set vs. indexes only
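A minimal sketch of the comparison on this slide: given the RAM on a node, which of the three sizes fits entirely in memory? The function name, the tier labels, and the example numbers are hypothetical, and it assumes "database size" means data plus indexes while "working set" means the frequently accessed data plus the indexes in use.

```python
def what_fits_in_ram(ram_gb, database_gb, working_set_gb, index_gb):
    """Return the largest tier that fits entirely in RAM.

    Assumes database_gb >= working_set_gb >= index_gb (illustrative model).
    """
    if ram_gb >= database_gb:
        return "database"
    if ram_gb >= working_set_gb:
        return "working set"
    if ram_gb >= index_gb:
        return "indexes only"
    return "none"


# Hypothetical node: 64 GB RAM, 500 GB database, 48 GB working set, 20 GB indexes
print(what_fits_in_ram(ram_gb=64, database_gb=500, working_set_gb=48, index_gb=20))
```

If the answer drops from "working set" to "indexes only", page faults will likely climb, which ties back to the metrics on the "When should I scale?" slide.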
26. Things to consider
• More RAM & IOPS
• Write buffering w/o battery backup - bad!
• Config servers are not a replica set
• Sharded cluster: use balancer
• Choose shard key carefully
• Test your backups
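To illustrate why "choose shard key carefully" matters, here is a toy simulation, not MongoDB's actual chunk logic. It assumes a hypothetical three-shard cluster and compares a range-partitioned, ever-increasing key with a hashed key. The modulo placement is a simplification made for the sketch; all names and numbers are illustrative.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 3  # hypothetical cluster size


def range_shard(key, boundaries):
    """Shard i holds keys below boundaries[i]; the last shard holds the rest."""
    for shard, upper in enumerate(boundaries):
        if key < upper:
            return shard
    return len(boundaries)


def hashed_shard(key, num_shards=NUM_SHARDS):
    """Place a key by hashing it (simplified: hash modulo shard count)."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# New documents arrive with ever-increasing ids (e.g. a counter or timestamp).
new_keys = range(10_000, 11_000)
boundaries = [3_000, 6_000]  # ranges were split on older data

by_range = Counter(range_shard(k, boundaries) for k in new_keys)
by_hash = Counter(hashed_shard(k) for k in new_keys)

print("range-partitioned:", dict(by_range))  # every insert lands on the last shard
print("hashed:", dict(by_hash))              # inserts spread across all shards
```

With the increasing key, all new writes hit one shard, so adding shards does not add write capacity until the balancer moves data; the hashed key spreads writes at the cost of efficient range queries.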
27. Online resources
• http://mongodb.org
• http://www.10gen.com
• Bug Tracking: https://jira.mongodb.org
• Book: MongoDB in Action, Kyle Banker
• Slides: Deployment Tips, Jared Rosoff
• Video: Schema Design, Eliot Horowitz
• Many, many more on 10gen site
30. Why MongoDB?
• It’s easier to:
• develop
• operate
• scale
• No, really.
• Try it, I dare you. http://try.mongodb.org/
Editor's Notes
“Who here runs an application using databases in production?”
“Who has ever had an application or service outage due to database problems?”

“I know I have.”
Been up late at night, wondering why replication doesn’t work, manually failing over to a slave.

Since today is about “Discover MongoDB”, some of the parts I talk about may not be relevant to you, and others may.
For the next 30 minutes or so, I will show you some examples of how using Mongo is easier to implement than many other data storage products.

Worked with companies from startup to enterprise, banking to web tech, and everything in between.
From SQL Server and MySQL to Hadoop & Vertica.

Agility:
For Developers:
Really easy to get started. Download, run, use driver, show examples, and it “just works”.

How many people have seen something like this before?
This is a data model diagram of ....
The number of tables and joins is carefully calculated and modeled, and changing that once in production is not an easy task.

JSON-style objects, BSON data format. Almost every modern language speaks JSON, as well as many web APIs.
"all bugs reported by michael"
"all bugs with project website and status open"

Write concern:
Fire and forget
Wait for error
Wait for journal sync
Wait for fsync
Wait for replication

Eventual consistency
In reality, many applications are very useful with eventual consistency.
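The write-concern levels in this note can be sketched as option dictionaries. This is an illustrative mapping, not driver code: the level names and the helper function are hypothetical, and while `w`, `j`, and `fsync` are the commonly documented write-concern fields, how a particular driver version accepts them is an assumption to check against its documentation.

```python
# Hypothetical level names mapped to write-concern style options.
WRITE_CONCERNS = {
    "fire_and_forget": {"w": 0},                  # do not wait for any acknowledgement
    "wait_for_error": {"w": 1},                   # wait for the primary to acknowledge
    "wait_for_journal": {"w": 1, "j": True},      # wait for the journal sync
    "wait_for_fsync": {"w": 1, "fsync": True},    # wait for data files to be flushed
    "wait_for_replication": {"w": 2},             # wait for at least one more member
}


def write_concern_for(level):
    """Return a copy of the options for a named durability level."""
    return dict(WRITE_CONCERNS[level])


print(write_concern_for("wait_for_journal"))
```

Each step down the list trades write latency for stronger durability, so an application can pick a level per operation rather than once for the whole system.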
mongod - primary mongo application, storage system

config - a mongod process used to synchronously replicate the state information of a sharded environment. In a sharded environment, config servers store the metadata of the cluster. Although a config server can run standalone, any production deployment should have exactly three config servers that have copies of the same metadata (for data safety and high availability).

mongos - routing & coordination process that makes the mongod nodes in the cluster look like a single system.
Mongos processes have no persistent state; rather, they keep a cached copy of config server data in memory.
Any changes that occur on the config servers are propagated to each mongos process. Mongos processes may be run on the shard servers themselves, but they are lightweight enough to exist on each application server. Many mongos processes can be run simultaneously since these processes do not coordinate with one another.
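The division of labor described in this note can be sketched with a toy model. This is not how MongoDB is implemented: the classes `ConfigServer` and `Mongos`, the version counter, and the split points are all hypothetical, chosen only to show a stateless router working from a cached copy of cluster metadata.

```python
from bisect import bisect_right


class ConfigServer:
    """Toy stand-in for the config servers: holds the chunk metadata."""

    def __init__(self, split_points, shards):
        # shards[i] owns keys in [split_points[i-1], split_points[i])
        assert len(shards) == len(split_points) + 1
        self.split_points = list(split_points)
        self.shards = list(shards)
        self.version = 1

    def move_chunk(self, index, new_shard):
        """Reassign a chunk and bump the metadata version."""
        self.shards[index] = new_shard
        self.version += 1


class Mongos:
    """Toy router: no persistent state, only a cached copy of the metadata."""

    def __init__(self, config):
        self.config = config
        self.refresh()

    def refresh(self):
        self.split_points = list(self.config.split_points)
        self.shards = list(self.config.shards)
        self.version = self.config.version

    def route(self, key):
        # Reload the cache if the config metadata has changed.
        if self.version != self.config.version:
            self.refresh()
        return self.shards[bisect_right(self.split_points, key)]


config = ConfigServer(split_points=[100, 200], shards=["shard0", "shard1", "shard2"])
routers = [Mongos(config), Mongos(config)]  # routers never talk to each other

print(routers[0].route(50), routers[1].route(250))
```

Because each router only reads metadata from the config side, any number of them can run side by side, which is why placing one on every application server is practical.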
For Operations:
replica set auto failover with reelection
packages for deb, rpm maintained by 10gen; distros like FreeBSD maintain their own ports
the mongo shell has many admin-style helpers for setting up replica set and sharding configuration
in progress: working on Chef cookbooks, Puppet manifests, EC2 CloudFormation, Ubuntu Juju charms, etc.

mongos

Pre-production:
6 month period
unlimited servers
engineer support & health check

Scalability:
Any good data storage mechanism today has scalability, one way or another.
Note: ReplSets max at 7 voting members

In virtualization:
don’t scale up as far as you can before you hit max

L1 cache: sandwich in front of you
L2 cache: Walk to kitchen and make a sandwich
RAM: Go to the store and get ingredients, go home and make a sandwich
Disk: Start a farm, and grow the wheat and vegetables to make a sandwich

mongod database, config:
RAM, Disk, CPU
mongod arbiter:
network
mongos router:
network, CPU