Agility and Scalability with MongoDB

MongoDB has taken a clear lead in adoption among the new generation of databases, including the enormous variety of NoSQL offerings. A key reason for this lead has been a unique combination of agility and scalability. Agility gives business units a quick start and the flexibility to maintain development velocity despite changing data and requirements. Scalability maintains that flexibility while providing fast, interactive performance as data volume and usage increase. We'll address the key organizational, operational, and engineering considerations that keep agility and scalability aligned at increasing scale, from small development instances to web-scale applications. We will also survey some key examples of highly scaled customer applications of MongoDB.

Published in: Technology

  1. 1. MongoDB Scalability and Agility Chris.Biow@MongoDB.com
  2. 2. Data Challenge: “I want my data...” • Now • Secure • All varieties • Fast and interactive • Scalable to “Big” • Agile to develop and deploy operationally • Cloud and edge
  3. 3. Scalability with MongoDB (Metric: Meaning; Examples)
     • Operations per Second: concurrent reads and writes per second; > 1 million per second
     • Nodes per Cluster: horizontal scale-out, distributed to multiple data centers worldwide, with high availability, using inexpensive cloud resources; > 1,000 nodes
     • Records / Documents: data objects in any number of schemas or structures; > 10 billion
     • Data Volume: total amount of data (documents × size); > 1 petabyte = 10^15 = 1,000,000,000,000,000 ≈ 2^50 bytes
  4. 4. Key Differentiation
  5. 5. Operational Database Landscape
  6. 6. Document Data Model: Relational vs. MongoDB { first_name: 'Paul', surname: 'Miller', city: 'London', location: [45.123,47.232], cars: [ { model: 'Bentley', year: 1973, value: 100000, … }, { model: 'Rolls Royce', year: 1965, value: 330000, … } ] }
  7. 7. Documents are Rich Data Structures { first_name: 'Paul', surname: 'Miller', cell: '+447557505611', city: 'London', location: [45.123,47.232], Profession: ['banking', 'finance', 'trader'], cars: [ { model: 'Bentley', year: 1973, value: 100000, … }, { model: 'Rolls Royce', year: 1965, value: 330000, … } ] } • Fields • Typed field values • Fields can contain arrays • Fields can contain an array of sub-documents
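
As a minimal sketch of the "rich data structure" idea, the slide's document can be written as a plain Python dict, with no database involved. The fields elided with "…" on the slide are left out, and the helper name `cars_newer_than` is hypothetical. It illustrates reading an array of sub-documents directly, without a join.

```python
# The slide's example document as a plain Python dict.
# Fields the slide elides with "…" are not filled in.
person = {
    "first_name": "Paul",
    "surname": "Miller",
    "cell": "+447557505611",
    "city": "London",
    "location": [45.123, 47.232],                      # array of numbers
    "Profession": ["banking", "finance", "trader"],    # array of strings
    "cars": [                                          # array of sub-documents
        {"model": "Bentley", "year": 1973, "value": 100000},
        {"model": "Rolls Royce", "year": 1965, "value": 330000},
    ],
}


def cars_newer_than(doc, year):
    """Return the models of embedded cars built after `year` (hypothetical helper)."""
    return [car["model"] for car in doc.get("cars", []) if car["year"] > year]


# Everything about the person, including the cars, travels in one object.
total_value = sum(car["value"] for car in person["cars"])
print(cars_newer_than(person, 1970), total_value)
```

In a relational layout the cars would typically live in a second table keyed by owner; here they are read in the same access as the person.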
  8. 8. Document Model Benefits • Agility and flexibility – Data model supports business change – Rapidly iterate to meet new requirements • Intuitive, natural data representation – Eliminates ORM layer – Developers are more productive • Reduces the need for joins, disk seeks – Programming is simpler – Performance delivered at scale
  9. 9. Big Data Tech Interest Comparison j.mp/Ssvpev
  10. 10. Enterprise Adoption Comparison bit.ly/1vAI7rF
  11. 11. Architecture for Availability & Scalability
  12. 12. Replica Sets • Replica Set – two or more copies • Availability solution – High Availability – Disaster Recovery – Maintenance • Deployment Flexibility – Data locality to users – Workload isolation: operational & analytics • Self-healing shard (Diagram labels: Application, Driver, Primary, Secondary, Secondary, Replication)
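
The availability claim rests on elections: a replica set can elect a primary only while a strict majority of its voting members is reachable. The sketch below works through that arithmetic, assuming every member votes; the function names are illustrative, not part of any driver.

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: a strict majority of voting members."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """Members that can be lost while a primary can still be elected."""
    return voting_members - majority(voting_members)


for n in (2, 3, 5):
    print(f"{n} members: majority {majority(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Under this assumption a two-member set tolerates no failures, which is why three members (as in the slide's diagram) is the usual minimum for automatic failover.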
  13. 13. Global Data Distribution (Diagram labels: Primary ×1, Secondary ×7, Real-time ×7)
  14. 14. Automatic Sharding • Sharding types – Range – Hash – Tag-aware • Elastic increase or decrease in capacity • Automatic balancing
  15. 15. Query Routing • Multiple query optimization models • Each sharding option appropriate for different apps
  16. 16. Performance
  17. 17. Drag Strip: straight ahead, quarter-mile, stop
  18. 18. Road Race: stay fast, stay agile, continuous (Nürburgring, Germany)
  19. 19. MongoDB at Scale
  20. 20. CarFax • Large data set
  21. 21. In-depth NoSQL evaluation
     • Baseline – Vehicle History Database – 11 billion records (growing at 1 billion per year) – 30-year-old VMS-based RDBMS – Cumbersome – Costly
     • MongoDB Comparison – Performance: 4x faster than baseline, 10x key-value – Scale out using inexpensive commodity servers – Built-in redundancy – Flexible dynamic schema data model – Strong consistency – Analytics/aggregation
     • Initial Production – MongoDB is primary data store – 50 servers – 10 shards – 5-node replica sets per shard
  22. 22. CARFAX Sharding and Replication • 13 billion+ documents – 1.5 billion documents added every year • 1 vehicle history report is > 200 documents • 12 shards • 9-node replica sets • Replicas distributed across 3 data centers
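
The deployment figures on this slide can be combined with simple arithmetic. The sketch below is a back-of-the-envelope calculation only: it assumes replicas are spread evenly across the data centers and treats "> 200 documents per report" as exactly 200, so the report count is an upper bound.

```python
shards = 12
nodes_per_replica_set = 9
data_centers = 3
documents = 13_000_000_000
docs_per_report = 200  # slide says "> 200", so this is a lower bound per report

# Every shard is a full replica set, so node count multiplies.
total_nodes = shards * nodes_per_replica_set

# Assumption: each replica set is spread evenly over the data centers.
nodes_per_dc_per_shard = nodes_per_replica_set // data_centers

docs_per_shard = documents / shards
max_reports = documents // docs_per_report

print(total_nodes, nodes_per_dc_per_shard, f"{docs_per_shard:.3g}", max_reports)
```

With these inputs each shard holds roughly a billion documents, and each data center holds three copies of every shard.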
  23. 23. CARFAX Replication
  24. 24. (no text on slide)
  25. 25. Foursquare • 50M users • 6B check-ins to date (6M per day growth) • 55M points of interest / venues • 1.7M merchants using the platform for marketing • Operations per second: 300,000 • Documents: 5.5B (~16.5B with replication)*
  26. 26. Foursquare clusters • 11 MongoDB clusters – 8 are sharded • Largest cluster is for check-ins • 15 shards (check-ins) • Shard key: user_id
  27. 27. Facebook / parse.com mobile apps • Persistent database for 270,000 mobile applications • 200M end-user mobile devices • 250% annual growth in client apps • 500% growth in requests • 1.5M collections • Key differentiators: – Document data model – High performance and availability – Geospatial query and index • Charity Majors on operations: j.mp/X3jVRC – Understand your database and your data, and build for them.
  28. 28. Scalability Exercises in the Cloud with Amazon Web Services
  29. 29. Petascale Database • 27x hs1.8xlarge instances – 16x VCPU – 24x 2TB SATA drives, RAID0 – 8x mongod microshards • Modified Yahoo Cloud Serving Benchmark (YCSB) – Long Integer IDs (>2B) – Zipfian-distributed integer fields – Aggregation queries • Load direct to 216 shards, 10 days, $4K • Stats output (commas added): "objects" : 7,170,648,489, "avgObjSize" : 147,438.99952658816, "dataSize" : NumberLong("1,057,240,224,818,640")
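
The reported statistics can be sanity-checked with a few lines of arithmetic. This sketch uses only the numbers on the slide (with the added commas removed); the variable names are illustrative. It checks that objects × average size matches the reported data size, that the total exceeds one petabyte, and that 27 instances × 8 mongod gives the 216 shards mentioned.

```python
objects = 7_170_648_489
avg_obj_size = 147_438.99952658816          # bytes
reported_data_size = 1_057_240_224_818_640  # bytes

# objects x average size should land very close to the reported dataSize.
computed = objects * avg_obj_size
relative_error = abs(computed - reported_data_size) / reported_data_size

petabyte = 10**15   # decimal petabyte
pebibyte = 2**50    # binary equivalent, about 1.126 x 10^15

instances = 27
mongod_per_instance = 8
shards = instances * mongod_per_instance

bytes_per_shard = reported_data_size / shards
bytes_per_instance = reported_data_size / instances

print(f"relative error {relative_error:.2e}")
print(f"{reported_data_size / petabyte:.3f} PB over {shards} shards")
print(f"{bytes_per_shard / 1e12:.2f} TB per shard, {bytes_per_instance / 1e12:.1f} TB per instance")
```

The total is just over 10^15 bytes but below 2^50 bytes, and the per-instance share fits within the 24 × 2 TB of raw disk listed for each instance.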
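  30. 30. CGroup Memory Segregation
      # One memory cgroup and one mongod per instance number 0..3
      for DB in `seq 0 3`; do
        sudo cgcreate -a mongodb:mongodb -t mongodb:mongodb -g memory:mongodb$DB
        # sudo does not apply to a shell redirect, so write the limit through tee
        echo 48G | sudo tee /sys/fs/cgroup/memory/mongodb$DB/memory.limit_in_bytes
        # Run mongod inside its cgroup with memory interleaved across NUMA nodes
        cgexec -g memory:mongodb$DB numactl --interleave=all mongod --config ~/mongod$DB.conf
      done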
  31. 31. Megawrite Ingest • Ingest 250-byte stock quotes at 2M/s • Concurrently run 5 QPS, subsecond/indexed response on timeStamp, accountId, instrumentId, systemKey • 5x r3.4xlarge – 16x VCPU, 1x 320GB SSD, 122GB RAM, 16x mongod – 2.1M insert/second direct to shards • 16x c3.8xlarge – 32x VCPU, 2x 320GB SSD, 60GB RAM, 16x mongod, 4x mongos – 2.1M insert/second via mongos
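
To put the ingest target in perspective, the slide's numbers translate into raw bandwidth and per-instance rates as follows. This is plain arithmetic on the quoted figures; it ignores BSON overhead, indexes, and replication, which the slide does not quantify.

```python
quote_bytes = 250
target_rate = 2_000_000      # quotes per second
ingest_bytes_per_sec = quote_bytes * target_rate

measured_rate = 2_100_000    # inserts per second, both configurations

per_instance_direct = measured_rate / 5    # 5x r3.4xlarge, direct to shards
per_instance_mongos = measured_rate / 16   # 16x c3.8xlarge, via mongos

print(f"{ingest_bytes_per_sec / 1e6:.0f} MB/s of raw quote data")
print(f"{per_instance_direct:,.0f} inserts/s per r3.4xlarge")
print(f"{per_instance_mongos:,.0f} inserts/s per c3.8xlarge")
```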
  32. 32. Java API comparison • 2 threads on c3.8xl • 264 bsonsize object, _id index only • coll.insert(): 15,600 ins / sec • coll.insert(List<DBObject>), listsize = 64: 118,000 ins / sec • Bulk ops API, size = 64: 120,000 ins / sec
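  33. 33. 7x Load with BulkOp
      // Queue inserts in an unordered bulk operation and execute once per listsize documents.
      BulkWriteOperation bo = null;
      for (a = 0; a < this.items && stayAlive; a++) {
          if (bo == null) {
              bo = collection.initializeUnorderedBulkOperation();
          }
          fillMap(this.m);
          BasicDBObject dbObject = new BasicDBObject(this.m);
          bo.insert(dbObject);
          if ((a + 1) % listsize == 0) {   // a full batch has been queued
              BulkWriteResult rc = bo.execute();
              bo = null;
          }
      }
      if (bo != null) {   // flush the final partial batch
          bo.execute();
      }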
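
The batching pattern behind the bulk figures can be sketched independently of any driver. In the sketch below, `flush` is a hypothetical stand-in for a bulk write call, and `insert_in_batches` is an illustrative name; the point is that documents are grouped into fixed-size batches and the last partial batch is flushed as well.

```python
from typing import Callable, Iterable, List


def insert_in_batches(docs: Iterable[dict], batch_size: int,
                      flush: Callable[[List[dict]], None]) -> int:
    """Group docs into batches of batch_size and hand each batch to flush().

    `flush` stands in for a bulk write call (hypothetical here). The final
    partial batch is flushed too, so no documents are dropped.
    """
    batch: List[dict] = []
    sent = 0
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            flush(batch)
            sent += len(batch)
            batch = []
    if batch:  # flush the remainder
        flush(batch)
        sent += len(batch)
    return sent


# Record batch sizes instead of writing to a database.
batches: List[int] = []
total = insert_in_batches(({"n": i} for i in range(150)), 64,
                          lambda b: batches.append(len(b)))

# Throughput ratio from the comparison figures: single inserts vs. bulk ops.
single, bulk = 15_600, 120_000
speedup = bulk / single
print(batches, total, f"{speedup:.1f}x")
```

With 150 documents and a batch size of 64, the sketch sends two full batches and one partial batch. The ratio of the quoted rates is a little under 8, consistent with the "7x" in the slide title.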
  34. 34. How do I Pick A Shard Key?
  35. 35. Shard Key characteristics • A good shard key has: – sufficient cardinality – distributed writes – targeted reads ("query isolation") • Shard key should be in every query if possible – scatter gather otherwise • Choosing a good shard key is important! – affects performance and scalability – changing it later is expensive
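
The difference between a targeted read and scatter gather can be illustrated with a toy router. This is a sketch under stated assumptions, not how mongos is implemented: the chunk boundaries, shard names, and the function `shards_for_query` are all hypothetical, and only equality queries are handled.

```python
import bisect

# Hypothetical chunk map: shard i owns user_id values below UPPER_BOUNDS[i];
# the last shard owns everything from the final bound upward.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]
UPPER_BOUNDS = [250, 500, 750]


def shards_for_query(query: dict, shard_key: str = "user_id") -> list:
    """Return the shards a router would need to contact for an equality query."""
    if shard_key in query:
        # Targeted read: the shard key value identifies a single shard.
        index = bisect.bisect_right(UPPER_BOUNDS, query[shard_key])
        return [SHARDS[index]]
    # No shard key in the query: scatter to every shard and gather the results.
    return list(SHARDS)


print(shards_for_query({"user_id": 300}))       # one shard
print(shards_for_query({"venue": "cafe"}))      # all shards
```

A query that includes the shard key touches one shard regardless of cluster size, while a query without it costs more as shards are added.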
  36. 36. Hashed shard key • Pros: – Evenly distributed writes • Cons: – Random data (and index) updates can be IO intensive – Range-based queries turn into scatter gather (Diagram labels: mongos, Shard 1, Shard 2, Shard 3, Shard N)
  37. 37. Low cardinality shard key • Induces "jumbo chunks" • Examples: boolean field (Diagram labels: mongos, Shard 1, Shard 2, Shard 3, Shard N; chunk [ a, b ))
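
Why low cardinality leads to oversized chunks can be shown with a small count. A chunk cannot be split below a single shard key value, so the number of distinct values caps the number of chunks. The sketch below is a simplified model with made-up data; `chunk_sizes_at_finest_split` is an illustrative name.

```python
from collections import Counter


def chunk_sizes_at_finest_split(key_values):
    """Documents per chunk when every distinct shard key value gets its own chunk.

    A chunk cannot be split below a single key value, so this is the finest
    partitioning available for the given key.
    """
    return sorted(Counter(key_values).values(), reverse=True)


docs = 100_000
boolean_key = [i % 2 == 0 for i in range(docs)]   # low cardinality: 2 values
user_id_key = [i % 5_000 for i in range(docs)]    # higher cardinality: 5,000 values

boolean_chunks = chunk_sizes_at_finest_split(boolean_key)
user_chunks = chunk_sizes_at_finest_split(user_id_key)

print(len(boolean_chunks), max(boolean_chunks))
print(len(user_chunks), max(user_chunks))
```

With a boolean key the data can never occupy more than two chunks, however many shards exist, and each chunk keeps growing with the collection.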
  38. 38. Ascending shard key • Monotonically increasing shard key values cause "hot spots" on inserts • Examples: timestamps, _id (Diagram labels: mongos, Shard 1, Shard 2, Shard 3, Shard N; chunk [ ISODate(…), $maxKey ))
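
The insert hot spot can be reproduced in a toy model that compares range placement with hashed placement for monotonically increasing keys. The chunk boundaries and shard names are hypothetical, and the hash-modulo placement is a simplification of hashed sharding (which divides the hashed key space into chunks); it is used here only to show the scattering effect.

```python
import bisect
import hashlib
from collections import Counter

SHARDS = ["shard0", "shard1", "shard2", "shard3"]
# Hypothetical range chunks over keys already seen: the last shard owns [3000, maxKey).
UPPER_BOUNDS = [1000, 2000, 3000]


def range_shard(key: int) -> str:
    """Place a key by range: the chunk whose interval contains it."""
    return SHARDS[bisect.bisect_right(UPPER_BOUNDS, key)]


def hashed_shard(key: int) -> str:
    """Simplified stand-in for hashed sharding: hash, then modulo shard count."""
    digest = hashlib.md5(str(key).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


new_keys = range(3000, 13000)  # monotonically increasing inserts, e.g. timestamps
range_load = Counter(range_shard(k) for k in new_keys)
hashed_load = Counter(hashed_shard(k) for k in new_keys)

print(dict(range_load))
print(dict(hashed_load))
```

Under range placement every new key falls in the chunk ending at the maximum key, so one shard takes all inserts. Hashing spreads the same keys across shards, at the price that a query over a contiguous key range must then visit several shards.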
  39. 39. Ensuring Success with High Scalability
  40. 40. Success Factors • Storage: random seeks (IOPS) • RAM: working set based on query patterns • Query: indexing • Delete: most expensive operation • Real-time vs. bulk operations • Continuity: HA, DR, backup, restore • Agile process: iterate by powers of 4 • Sharding: shard key and strategy • Resources: don’t go it alone!
