Scaling MongoDB on Amazon Web Services (DAT209) | AWS re:Invent 2013

Over the past year, mobile in-app feedback provider Apptentive has scaled MongoDB on AWS from a single machine to a sharded cluster that holds several hundred gigabytes and serves thousands of operations per second. This session, packed with demos, code, and actual performance numbers, shares the lessons learned along the way. Topics include picking the right tools for the job (instance sizing and selection, I/O choices, and topological choices); using Chef/AWS OpsWorks and AWS CloudFormation to deploy and scale; monitoring with Amazon CloudWatch and MMS; managing backups with Amazon EBS snapshots; and using Amazon Elastic MapReduce alongside MongoDB instances.




    Presentation Transcript

    • DAT209 - Scaling MongoDB on Amazon Web Services Michael Saffitz, CTO & Co-Founder, Apptentive November 15, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
    • Nice to Meet You! Mike Saffitz CTO, Co-Founder, Apptentive Follow at: @msaffitz • Connect at: mike@apptentive.com Apptentive The easiest way for anyone with an app to talk with their customers Follow at: @apptentive • Connect at: info@apptentive.com
    • Apptentive & AWS
    • Apptentive & AWS Route53 CloudFront IAM S3 Web Servers EC2: 6 x c1.medium api.apptentive.com www.apptentive.com (Elastic Load Balancer) apptentive.com/blog Elastic Beanstalk, RDS CloudWatch Elastic MapReduce VPN Server EC2: m1.small Stats & Logging EC2: 2x m1.medium m1.small Sharded MongoDB Cluster EC2: 9 Instances CI & Chef EC2: m1.medium m1.small Redis EC2: m1.medium Virtual Private Cloud
    • Agenda • Why Scale MongoDB on AWS? • Planning • Deploying • Maintaining
    • Why Scale MongoDB on AWS?
    • Why Scale MongoDB on AWS? Supports Diverse Set of Scenarios Rapidly Scale On Demand Simple To Administer Easy Friendly Query Syntax Well Documented Flexible Broad Language Support Competitive TCO Cost Effective Fine Grain Control Over Price & Performance
    • Why Not Scale MongoDB on AWS? Your Data is Predominantly Relational in Nature Don’t Want to Incur the Administrative Costs Consider RDS Hosted Alternatives Consider DynamoDB
    • 1. Planning
    • Planning Checklist • Topologies – MongoDB – AWS • Instance Selection • Storage
    • MongoDB Topologies: Single Server mongod
    • MongoDB Topologies: Single ReplicaSet w/ Arbiter Automatic Failover mongod (primary) mongod (secondary) Contains Full Copy of Data on the Primary – Can be Used for Reads mongod (arbiter) Arbiter Only Participates in Voting to Elect a New Primary (Must Have Odd #)
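The two-data-nodes-plus-arbiter layout on this slide can be written down as a replica set configuration document, the shape that `rs.initiate()` accepts. The following is a minimal Python sketch that only builds and checks that document. The set name and host names are hypothetical, and the odd-member check assumes every member keeps its default single vote.

```python
def build_replica_set_config(name, data_hosts, arbiter_host=None):
    """Build a replica set config document with an optional arbiter.

    Assumes every member keeps the default of one vote, so the member
    count equals the number of voters in an election.
    """
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    if arbiter_host is not None:
        # The arbiter holds no data; it only votes when a primary is elected.
        members.append(
            {"_id": len(members), "host": arbiter_host, "arbiterOnly": True}
        )
    if len(members) % 2 == 0:
        raise ValueError("replica set needs an odd number of voting members")
    return {"_id": name, "members": members}


# Hypothetical hosts, for illustration only.
config = build_replica_set_config(
    "rs0",
    ["db1.example.internal:27017", "db2.example.internal:27017"],
    arbiter_host="arb1.example.internal:27017",
)
```

Dropping the arbiter from this example leaves two voters and raises `ValueError`, which is the situation the "Must Have Odd #" note on the slide warns about.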
    • MongoDB Topologies: Single ReplicaSet Automatic Failover mongod (primary) mongod (secondary) Scale Across Instance Types mongod (secondary) Data Replicated Within ReplicaSet
    • MongoDB Topologies: Sharded Cluster App Server mongos App Server … mongos mongod process config config config Data Partitioned Across Shards mongod (primary) mongod (secondary) mongod (secondary) Data Replicated Within Shard … mongod (primary) mongod (secondary) mongod (secondary)
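To make "Data Partitioned Across Shards" concrete, here is a toy sketch of range-based routing: the shard key space is cut into chunks, each chunk is owned by one shard, and a router looks up the chunk that contains a key. This is an illustration of the idea only, not how `mongos` is implemented. The split points and shard names are made up.

```python
import bisect

# Hypothetical split points on a string shard key. They define four chunks:
# (-inf, "g"), ["g", "n"), ["n", "t"), ["t", +inf)
SPLIT_POINTS = ["g", "n", "t"]
# Hypothetical owner of each chunk, in the same order as the chunks above.
CHUNK_OWNERS = ["shard0", "shard1", "shard0", "shard1"]


def route(shard_key_value):
    """Return the shard that owns the chunk containing the given key."""
    # bisect_right makes each chunk inclusive of its lower bound and
    # exclusive of its upper bound.
    chunk_index = bisect.bisect_right(SPLIT_POINTS, shard_key_value)
    return CHUNK_OWNERS[chunk_index]
```

In this sketch, rebalancing amounts to changing an entry in `CHUNK_OWNERS`; in a real cluster it also means copying the chunk's documents between shards, which is why the deck later warns that rebalancing consumes I/O.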
    • MongoDB Topologies: Picking One • Single Server? Not For Production • Don’t Shard Prematurely – ReplicaSets can take you surprisingly far • … But Don’t Wait Too Long to Shard – Collections over 256GB may have issues migrating to shards – Rebalancing consumes IO and can be very slow • Pick the Right Instance Size for Your Topology… – We’re going to get to this in a moment
    • AWS Topologies: AZs & Regions • Obvious: Distribute Across Availability Zones in a Region – No Single Point of Failure • Distributing Across Regions – Shard per Region versus Shards Across Regions – Considerations: Replication Latency; Data Transfer Costs; Administration Costs; Speedup from Geo-Based Tag Aware Sharding
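One way to sanity-check the "No Single Point of Failure" goal is to place replica set members round-robin across Availability Zones and then confirm that losing any one zone still leaves a voting majority. The sketch below assumes one vote per member; the member names are hypothetical and the zone names are only examples.

```python
from collections import Counter
from itertools import cycle


def assign_zones(member_names, zones):
    """Place members across zones in round-robin order."""
    zone_cycle = cycle(zones)
    return {name: next(zone_cycle) for name in member_names}


def survives_zone_loss(placement):
    """True if a strict majority of members remains after losing any one zone."""
    total = len(placement)
    per_zone = Counter(placement.values())
    return all((total - count) * 2 > total for count in per_zone.values())


members = ["primary", "secondary-1", "secondary-2"]
three_zone = assign_zones(members, ["us-east-1a", "us-east-1b", "us-east-1c"])
two_zone = assign_zones(members, ["us-east-1a", "us-east-1b"])
```

With three zones each member sits alone, so any single zone can fail. With two zones, one zone holds two of the three members and its loss leaves the set without a majority.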
    • Selecting an Instance: Considerations Compute Memory EBS Optimized? Cost
    • Selecting an Instance: Compute • Most Likely Not a Significant Factor – Exceptions: Heavy use of Map/Reduce, Aggregation Framework – Mongo 2.4 added concurrency via V8 – Important! Only run 64-Bit; 32-Bit is limited to ~2GB • Real World Numbers on m1.large:
    • Selecting an Instance: Memory • Estimate Necessary Working Set – db.runCommand( { serverStatus: 1, workingSet: 1 } ) Is pagesInMemory * 4k approaching total RAM? Is overSeconds decreasing / small? – db.stats() • Pick the Instance that Matches • Monitor on MMS – Page Faults (abstract) – Queues (better) – Response Times (best)
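The "pagesInMemory * 4k" check can be scripted once the `workingSet` section of the `serverStatus` output is in hand. The sketch below works on a plain dictionary so it runs without a database; the sample figures and the 7.5 GiB RAM size are made up for illustration.

```python
PAGE_SIZE_BYTES = 4096  # the "4k" page size from the slide


def working_set_report(working_set, total_ram_bytes):
    """Summarize a workingSet document against the instance's RAM."""
    estimated_bytes = working_set["pagesInMemory"] * PAGE_SIZE_BYTES
    return {
        "estimated_bytes": estimated_bytes,
        # A value approaching 1.0 suggests the working set no longer fits in RAM.
        "fraction_of_ram": estimated_bytes / total_ram_bytes,
        # A small or shrinking overSeconds means pages are being turned over quickly.
        "over_seconds": working_set["overSeconds"],
    }


# Made-up sample values, not measurements from the talk.
sample = {"pagesInMemory": 1500000, "overSeconds": 420}
report = working_set_report(sample, total_ram_bytes=7.5 * 1024**3)
```

Tracking `fraction_of_ram` and `over_seconds` over time is more informative than a single reading, which matches the slide's advice to watch the trend in MMS.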
    • Selecting an Instance: EBS Optimization • Run EBS Optimized When Available – Especially with Provisioned IOPS • Volume Config Impacts IO Perf Far More than Instance Selection
    • Storage • Instance Storage – Non-Durable – Fast But Inconsistent Performance – Can’t Use Snapshots for Backups • “Standard” EBS – Slower – More Variable Performance • Provisioned IOPS EBS – Consistent Performance – Don’t Under Provision -- Watch Queue Length
    • Storage • RAID 10? Just use LVM on RAID 0 – More: http://blog.mongohq.com/debunking-myth-of-raid-10-asbest-practice-on-aws/ • Use XFS or Ext4 • Mount with noatime, noexec, nodiratime
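The mount options above usually end up in `/etc/fstab`. As a small sketch, the helper below renders one fstab line with those options for either recommended filesystem. The device path and mount point are hypothetical placeholders.

```python
MOUNT_OPTIONS = ["noatime", "nodiratime", "noexec"]  # options named on the slide


def fstab_entry(device, mount_point, fs_type="xfs"):
    """Render an /etc/fstab line for a MongoDB data volume."""
    if fs_type not in ("xfs", "ext4"):
        raise ValueError("the slide recommends XFS or Ext4")
    options = ",".join(MOUNT_OPTIONS)
    # The trailing "0 0" disables dump and boot-time fsck ordering for this volume.
    return f"{device} {mount_point} {fs_type} {options} 0 0"


# Hypothetical LVM logical volume and data directory.
line = fstab_entry("/dev/vg0/mongodb", "/data")
```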
    • Selecting an Instance: Summary 1. Lead with Working Set Requirements 2. Validate Compute is Sufficient 3. Enable EBS Optimized if Available 4. Use Provisioned IOPS EBS 5. (Confirm Cost is Acceptable)
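The ordering in this summary (memory first, EBS optimization next, cost last) can be sketched as a simple filter-and-rank over a catalog of instance types. Everything in the catalog below is invented for illustration, including the names, sizes, and prices, and the 25% RAM headroom is an assumption rather than a figure from the talk. The compute validation in step 2 is left out because it depends on workload measurements.

```python
def pick_instance(catalog, working_set_gib, headroom=1.25):
    """Pick an instance type: fit the working set, prefer EBS optimized, then cost."""
    needed_gib = working_set_gib * headroom
    candidates = [item for item in catalog if item["ram_gib"] >= needed_gib]
    if not candidates:
        return None
    # False sorts before True, so EBS-optimized types win; cost breaks ties.
    return min(
        candidates,
        key=lambda item: (not item["ebs_optimized"], item["hourly_cost"]),
    )


# Entirely hypothetical catalog; not real instance types or prices.
CATALOG = [
    {"name": "example.medium", "ram_gib": 8, "ebs_optimized": True, "hourly_cost": 0.30},
    {"name": "example.large", "ram_gib": 16, "ebs_optimized": False, "hourly_cost": 0.45},
    {"name": "example.xlarge", "ram_gib": 32, "ebs_optimized": True, "hourly_cost": 0.90},
]

choice = pick_instance(CATALOG, working_set_gib=10)
```

Here a 10 GiB working set needs 12.5 GiB with headroom, which rules out the smallest type; the EBS-optimized type is then preferred even though it costs more, and the final cost check is left to the reader as in step 5.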
    • 2. Deploying
    • It’s Easy. Let me show you.
    • Scaling Deployment • DevOps: Go for ‘bilities: – Reliability, Predictability, Repeatability, and Auditability • The Result is Easy Replaceability and Scalability – Build your infrastructure so it can be treated like an appliance – The impact of your decisions during planning will be significantly mitigated
    • DevOps Tools • AWS Marketplace AMIs – Preconfigured with MongoDB best practices – Do-it-yourself scaling to ReplicaSets / Shards – Helpful, but not a DevOps Solution • AWS CloudFormation – Templates for Resource Setup & Initial Configuration • Chef, Puppet, Ansible, SaltStack, & More – AWS OpsWorks, but limited by chef-solo
    • Security • Run in a VPC – Complications: Cross Region, Multiple Source Ingress • Use KeyFiles & Roles – KeyFiles: Internal authentication for cluster members – Roles allow for user-level fine grain access control • Advanced: – Kerberos support in MongoDB 2.4 – SSL Support in Custom Builds & MongoDB Enterprise
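A keyfile is a shared secret made of base64 characters, between 6 and 1024 characters long, that every cluster member loads. The sketch below generates suitable contents in Python as an alternative to the commonly used `openssl rand -base64` approach. The 741-byte default is a convention that yields 988 characters, not a requirement, and writing the file and restricting its permissions are left out.

```python
import base64
import secrets


def generate_keyfile_contents(random_bytes=741):
    """Return random base64 text sized to fit the 6 to 1024 character keyfile limit."""
    contents = base64.b64encode(secrets.token_bytes(random_bytes)).decode("ascii")
    if not 6 <= len(contents) <= 1024:
        raise ValueError("keyfile contents must be 6 to 1024 characters")
    return contents


keyfile_contents = generate_keyfile_contents()
```

The same contents must be distributed to every member of the replica set or sharded cluster, which is a natural job for the Chef or CloudFormation tooling described earlier.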
    • 3. Maintaining
    • Monitoring: MongoDB Monitoring Service • Very Good, Free Holistic Monitoring – Important: ReplLag, Page Faults, Lock % – Informative: OpCounters, Connections, Queue Lengths • Includes Basic Alerting of Host Failures and Metric Thresholds • Query Profiler Details Slow Queries – db.setProfilingLevel(1)
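Replication lag, the first metric listed as important, can also be derived by hand from replica set status output by comparing each secondary's `optimeDate` with the primary's. The sketch below works on a dictionary shaped like that output; the host names and timestamps are made up, and this is a rough approximation rather than how MMS computes its chart.

```python
from datetime import datetime


def replication_lag_seconds(status):
    """Map each secondary's name to its lag behind the primary, in seconds."""
    primary = next(m for m in status["members"] if m["stateStr"] == "PRIMARY")
    return {
        member["name"]: (primary["optimeDate"] - member["optimeDate"]).total_seconds()
        for member in status["members"]
        if member["stateStr"] == "SECONDARY"  # arbiters hold no data and are skipped
    }


# Made-up status document for illustration.
sample_status = {
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2013, 11, 15, 12, 0, 5)},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2013, 11, 15, 12, 0, 0)},
        {"name": "arb1:27017", "stateStr": "ARBITER"},
    ]
}

lag = replication_lag_seconds(sample_status)
```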
    • Monitoring: Amazon CloudWatch • Detailed Resource Level Monitoring – Important: Queue Length, Read/Write Latencies • Versatile alerting based on Amazon Simple Notification Service (SNS)
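An alert on EBS queue length can be described as a set of CloudWatch alarm parameters. The sketch below only builds that parameter set as a dictionary, with keys following the CloudWatch PutMetricAlarm parameter names, so it runs without AWS credentials. The volume ID, SNS topic ARN, threshold, and evaluation window are all assumptions to tune for a real volume.

```python
def queue_length_alarm(volume_id, sns_topic_arn, threshold=10.0):
    """Build alarm parameters for sustained high queue length on one EBS volume."""
    return {
        "AlarmName": f"ebs-queue-length-{volume_id}",
        "Namespace": "AWS/EBS",
        "MetricName": "VolumeQueueLength",
        "Dimensions": [{"Name": "VolumeId", "Value": volume_id}],
        "Statistic": "Average",
        "Period": 300,           # seconds per datapoint
        "EvaluationPeriods": 3,  # assumed: alarm after three consecutive breaches
        "Threshold": threshold,  # assumed value, not a recommendation from the talk
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic that receives the notification
    }


# Hypothetical identifiers.
alarm = queue_length_alarm(
    "vol-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:mongodb-alerts",
)
```

A dictionary like this can be passed to whichever AWS SDK or CLI wrapper is in use; for a RAID set, one alarm per member volume keeps a single slow volume from hiding in an average.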
    • Backups • Delayed Secondary – Questionable as a primary backup strategy • Dump/Restore – Impractical for larger deployments • MongoDB Service – Managed, Secure, Point in Time. Unclear suitability for larger deployments – Expensive • Snapshots – Fast, Easy, Scalable. Pay Attention to Consistency (RAID, Shards)
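The consistency caveat for snapshots comes down to ordering: writes must be quiesced before any volume in a RAID set is snapshotted, and chunk migrations should be paused while shards are captured. The sketch below only produces an ordered plan as text, using the mongo shell helpers `sh.stopBalancer()`, `db.fsyncLock()`, `db.fsyncUnlock()`, and `sh.startBalancer()`. It is one plausible sequence under those assumptions, not the procedure used by any particular tool, and it omits backing up the config servers. Shard names and volume IDs are hypothetical.

```python
def snapshot_plan(shards):
    """Return ordered backup steps for a mapping of shard name to its EBS volume IDs."""
    steps = ["sh.stopBalancer()"]  # pause chunk migrations across the cluster
    for shard, volume_ids in shards.items():
        # Flush and block writes on the member being backed up, so that all
        # volumes in its RAID set are captured at the same logical point.
        steps.append(f"{shard}: db.fsyncLock()")
        steps.extend(f"{shard}: snapshot {volume_id}" for volume_id in volume_ids)
        steps.append(f"{shard}: db.fsyncUnlock()")
    steps.append("sh.startBalancer()")
    return steps


# Hypothetical shards, each backed by a two-volume RAID set.
plan = snapshot_plan({
    "shard0": ["vol-aaa", "vol-bbb"],
    "shard1": ["vol-ccc", "vol-ddd"],
})
```

EBS snapshots are point-in-time as of the moment they are initiated, so the lock only needs to be held until every snapshot in the set has been started.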
    • Easy Snapshot-Based Backups With Mongolly • Automatic topology detection, snapshotting, and snapshot management for EBS-backed MongoDB Databases • Easy as: $ mongolly backup • https://github.com/msaffitz/mongolly
    • Conclusions • MongoDB + AWS = • Options For All Deployment / Workload Sizes – I/O typically the focal point for optimization • Investing in a DevOps Strategy + Solution Makes It Near Effortless
    • Please give us your feedback on this presentation DAT209 As a thank you, we will select prize winners daily for completed surveys!