MongoDB Best Practices in AWS

Published in: Technology
  • The things I’m going to talk about are completely interrelated and intertwined. There will be talks that go into much greater detail on these topics. Armed with the information you gather and confident in the skills your team has practiced, you should be able to spot long-term problems well before it’s too late and handle the emergencies that are sure to arise.
  • Add a second process just to illustrate what happens when you have more than one process contending for RAM.
  • Since we’re talking about data stores, specifically MongoDB, before you do anything else at all, you need to understand your data.
      How big is your data set in total?
      How big is your working set, that is, the size of the data and indexes that need to fit in RAM?
      Reads vs. writes? (example and use case)
      Long tail or random access? (example)
    Armed with this knowledge, you can accommodate massive growth spurts without excessive over-provisioning.
      Random access: take a user database.
      Long tail: a Twitter feed.
    If you need to be ready for 1MM users and are asking how to size for them, use collection.stats to extrapolate.
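The extrapolation mentioned in the note above can be sketched in a few lines of plain JavaScript. This is a minimal sketch, assuming data and index sizes grow linearly with document count; the function name `extrapolate` and the sample numbers are hypothetical, and only the field names (`count`, `size`, `totalIndexSize`) come from the collection stats output shown on slide 20.

```javascript
// Extrapolate data and index size to a target document count.
// Assumption: sizes scale linearly with the number of documents.
function extrapolate(stats, targetCount) {
  const dataPerDoc = stats.size / stats.count;
  const indexPerDoc = stats.totalIndexSize / stats.count;
  return {
    dataBytes: Math.round(dataPerDoc * targetCount),
    indexBytes: Math.round(indexPerDoc * targetCount),
  };
}

// Hypothetical sample taken from a collection holding 10,000 users.
const sampleStats = { count: 10000, size: 5000000, totalIndexSize: 1000000 };
const projected = extrapolate(sampleStats, 1000000);
console.log(projected); // { dataBytes: 500000000, indexBytes: 100000000 }
```

Real collections rarely grow perfectly linearly (padding and index structure both play a part), so a result like this is a starting point for sizing rather than a guarantee.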
  • Using standard enterprise spinning disks, you can get about 200 seeks / second. So you want to be thinking about how you can increase your seeks / second.
  • Here, if you can imagine that you’re not pulling all your data from a single partition, you can actually increase your throughput by spreading the load across multiple stripes. So in this case you gain potentially three times the speed.
  • What we typically recommend is running RAID 10 in production, which adds a mirror volume for each stripe. We’ve found that this configuration works out well for most use cases. You get the benefit of increased redundancy and parallelization, despite the cost of writing each update to two volumes.
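The arithmetic behind these three notes can be written down as a small sketch. It assumes ideal parallelism, that a read can be served by either copy of a mirrored stripe, and that every write must go to all copies; the function names are hypothetical, and 200 seeks / second is the per-disk figure quoted above.

```javascript
const SEEKS_PER_DISK = 200; // per-disk figure quoted in the notes

// stripes: number of striped volumes.
// mirrors: copies of each stripe (1 = RAID 0, 2 = RAID 10).
// Assumption: reads spread evenly over every disk in the array.
function readSeeksPerSecond(stripes, mirrors) {
  return SEEKS_PER_DISK * stripes * mirrors;
}

// Assumption: each write lands on every mirror of one stripe,
// so mirroring adds no write capacity.
function writeSeeksPerSecond(stripes) {
  return SEEKS_PER_DISK * stripes;
}

console.log(readSeeksPerSecond(1, 1)); // 200  (single disk)
console.log(readSeeksPerSecond(3, 1)); // 600  (RAID 0, three stripes)
console.log(readSeeksPerSecond(3, 2)); // 1200 (RAID 10, three mirrored stripes)
console.log(writeSeeksPerSecond(3));   // 600  (RAID 10 writes)
```

These are upper bounds under the stated assumptions; real EBS volumes share network bandwidth, so measured numbers will usually be lower.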
  • Transcript

    • 1. MongoDB Best Practices in AWS Chris Harris Email : charris@10gen.com Twitter : cj_harris5
    • 2. Terminology
        RDBMS           MongoDB
        Table           Collection
        Row(s)          JSON Document
        Index           Index
        Join            Embedding & Linking
        Partition       Shard
        Partition Key   Shard Key
    • 3. Here is a “simple” SQL Model
        mysql> select * from book;
        +----+----------------------------------------------------------+
        | id | title                                                    |
        +----+----------------------------------------------------------+
        |  1 | The Demon-Haunted World: Science as a Candle in the Dark |
        |  2 | Cosmos                                                   |
        |  3 | Programming in Scala                                     |
        +----+----------------------------------------------------------+
        3 rows in set (0.00 sec)

        mysql> select * from bookauthor;
        +---------+-----------+
        | book_id | author_id |
        +---------+-----------+
        |       1 |         1 |
        |       2 |         1 |
        |       3 |         2 |
        |       3 |         3 |
        |       3 |         4 |
        +---------+-----------+
        5 rows in set (0.00 sec)

        mysql> select * from author;
        +----+-----------+------------+-------------+-------------+---------------+
        | id | last_name | first_name | middle_name | nationality | year_of_birth |
        +----+-----------+------------+-------------+-------------+---------------+
        |  1 | Sagan     | Carl       | Edward      | NULL        |          1934 |
        |  2 | Odersky   | Martin     | NULL        | DE          |          1958 |
        |  3 | Spoon     | Lex        | NULL        | NULL        |          NULL |
        |  4 | Venners   | Bill       | NULL        | NULL        |          NULL |
        +----+-----------+------------+-------------+-------------+---------------+
        4 rows in set (0.00 sec)
    • 4. The Same Data in MongoDB
        {
          "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
          "title" : "Programming in Scala",
          "author" : [
            {
              "first_name" : "Martin",
              "last_name" : "Odersky",
              "nationality" : "DE",
              "year_of_birth" : 1958
            },
            { "first_name" : "Lex", "last_name" : "Spoon" },
            { "first_name" : "Bill", "last_name" : "Venners" }
          ]
        }
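To make the step from slide 3 to slide 4 concrete, here is a minimal sketch in plain JavaScript that folds the three relational tables into one embedded document per book. The helper names (`toSubdocument`, `embedAuthors`) are hypothetical; the rows are copied from slide 3. NULL columns are simply dropped, which is what the document on slide 4 appears to do, and no `_id` is produced because MongoDB would generate one on insert.

```javascript
// Rows copied from the three SQL tables on slide 3.
const bookRows = [
  { id: 1, title: "The Demon-Haunted World: Science as a Candle in the Dark" },
  { id: 2, title: "Cosmos" },
  { id: 3, title: "Programming in Scala" },
];
const bookAuthorRows = [
  { book_id: 1, author_id: 1 },
  { book_id: 2, author_id: 1 },
  { book_id: 3, author_id: 2 },
  { book_id: 3, author_id: 3 },
  { book_id: 3, author_id: 4 },
];
const authorRows = [
  { id: 1, last_name: "Sagan", first_name: "Carl", middle_name: "Edward", nationality: null, year_of_birth: 1934 },
  { id: 2, last_name: "Odersky", first_name: "Martin", middle_name: null, nationality: "DE", year_of_birth: 1958 },
  { id: 3, last_name: "Spoon", first_name: "Lex", middle_name: null, nationality: null, year_of_birth: null },
  { id: 4, last_name: "Venners", first_name: "Bill", middle_name: null, nationality: null, year_of_birth: null },
];

// Drop the surrogate key and any NULL columns: a document only
// stores the fields that actually have values.
function toSubdocument(row) {
  const doc = {};
  for (const [key, value] of Object.entries(row)) {
    if (key !== "id" && value !== null) doc[key] = value;
  }
  return doc;
}

// Replace the join through bookauthor with an embedded author array.
function embedAuthors(book) {
  const author = bookAuthorRows
    .filter((link) => link.book_id === book.id)
    .map((link) => toSubdocument(authorRows.find((a) => a.id === link.author_id)));
  return { title: book.title, author };
}

const scalaBook = embedAuthors(bookRows[2]);
console.log(JSON.stringify(scalaBook, null, 2));
```

Embedding duplicates Carl Sagan under both of his books; that is the trade-off against linking, which the terminology slide lists as the other replacement for a join.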
    • 5. Cursors
        $gt, $lt, $gte, $lte, $ne, $all, $in, $nin, $or, $not, $mod, $size, $exists, $type, $elemMatch
        > var c = db.test.find({x: 20}).skip(20).limit(10)
        > c.next()
        > c.next()
        ...
        query                  -> first N results + cursor id
        getMore w/ cursor id   -> next N results + cursor id or 0
        ...
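The query / getMore exchange on the cursor slide can be imitated with a toy in-memory server. This is only a sketch of the batching idea, not the MongoDB wire protocol: the class `ToyServer`, the function `fetchAll`, and the fixed batch size are all hypothetical. As on the slide, a cursor id of 0 means the result set is exhausted.

```javascript
// A toy server that hands out results in batches and tracks open cursors.
class ToyServer {
  constructor(docs) {
    this.docs = docs;
    this.cursors = new Map(); // cursor id -> position of the next result
    this.nextId = 1;
  }

  // Initial query: open a cursor and return the first batch.
  query(batchSize) {
    const id = this.nextId++;
    this.cursors.set(id, 0);
    return this.getMore(id, batchSize);
  }

  // Follow-up request: return the next batch for an open cursor.
  getMore(cursorId, batchSize) {
    if (!this.cursors.has(cursorId)) throw new Error("unknown cursor " + cursorId);
    const start = this.cursors.get(cursorId);
    const batch = this.docs.slice(start, start + batchSize);
    const end = start + batch.length;
    if (end >= this.docs.length) {
      this.cursors.delete(cursorId); // exhausted: close the cursor
      return { batch, cursorId: 0 };
    }
    this.cursors.set(cursorId, end);
    return { batch, cursorId };
  }
}

// Client side: keep calling getMore until the server answers with cursor id 0.
function fetchAll(server, batchSize) {
  const results = [];
  let roundTrips = 1;
  let reply = server.query(batchSize);
  results.push(...reply.batch);
  while (reply.cursorId !== 0) {
    reply = server.getMore(reply.cursorId, batchSize);
    results.push(...reply.batch);
    roundTrips++;
  }
  return { results, roundTrips };
}

const toyServer = new ToyServer(Array.from({ length: 25 }, (_, i) => ({ x: i })));
const fetched = fetchAll(toyServer, 10);
console.log(fetched.results.length, fetched.roundTrips); // 25 3
```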
    • 6. Creating Indexes
        An index on _id is automatic.
        For more use ensureIndex:
          db.blogs.ensureIndex({author: 1})
        1 = ascending
        -1 = descending
    • 7. Compound Indexes
        db.blogs.save({
          author: "James",
          ts: new Date()
          ...
        });
        db.blogs.ensureIndex({author: 1, ts: -1})
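What the key pattern `{author: 1, ts: -1}` means for ordering can be shown without a database. The sketch below builds a comparator from a key pattern and sorts a small array with it; it illustrates only the resulting order, not how MongoDB stores an index. The function name `compareByKeyPattern` is hypothetical, and `ts` is a plain number here instead of a Date to keep the example short.

```javascript
// Build a comparator from an index-style key pattern:
// 1 sorts a field ascending, -1 sorts it descending.
function compareByKeyPattern(keyPattern) {
  const fields = Object.entries(keyPattern);
  return (a, b) => {
    for (const [field, direction] of fields) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0; // equal on every indexed field
  };
}

const posts = [
  { author: "James", ts: 2 },
  { author: "Alice", ts: 1 },
  { author: "James", ts: 5 },
];
posts.sort(compareByKeyPattern({ author: 1, ts: -1 }));
console.log(posts.map((p) => p.author + "/" + p.ts).join(", "));
// Alice/1, James/5, James/2
```

Entries are grouped by author first and ordered newest-first within each author, which is why such an index suits a "latest posts by this author" query.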
    • 8. Indexing Embedded Documents
        db.blogs.save({
          title: "My First blog",
          stats : { views: 0, followers: 0 }
        });
        db.blogs.ensureIndex({"stats.followers": -1})
        db.blogs.find({"stats.followers": {$gt: 500}})
    • 9. MongoDB on AWS
    • 10. Four things to think about
        1. Machine Sizing: Disk and Memory
        2. Load Testing and Monitoring
        3. Backup and restore
        4. Ops Play Book
    • 11. Collection 1 Index 1
    • 12. Collection 1 Virtual Address Space 1 Index 1
    • 13. Collection 1 Virtual Address Space 1 Index 1 This is your virtual memory size (mapped)
    • 14. Collection 1 Virtual Address Space 1 Physical RAM Index 1
    • 15. Collection 1 Virtual Address Space 1 Physical RAM Index 1 This is your resident memory size
    • 16. Collection 1 Virtual Disk Address Space 1 Physical RAM Index 1
    • 17. Collection 1 Virtual Disk Address Space 1 Physical RAM Index 1 Virtual Address Space 2
    • 18. Collection 1 Virtual Disk Address Space 1 Physical RAM Index 1 100 ns = 10,000,000 ns =
    • 19. Sizing RAM and Disk
        • Working set
        • Document Size
        • Memory versus disk
        • Data lifecycle patterns
            • Long tail
            • pure random
            • bulk removes
    • 20. Figuring out working Set
        > db.wombats.stats()
        {
          "ns" : "test.wombats",
          "count" : 1338330,
          "size" : 46915928,                  <- Size of data
          "avgObjSize" : 35.05557523181876,   <- Average document size
          "storageSize" : 86092032,           <- Size on disk (and in memory!)
          "numExtents" : 12,
          "nindexes" : 2,
          "lastExtentSize" : 20872960,
          "paddingFactor" : 1,
          "flags" : 0,
          "totalIndexSize" : 99860480,        <- Size of all indexes
          "indexSizes" : {                    <- Size of each index
            "_id_" : 55877632,
            "name_1" : 43982848
          },
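As a worked example of reading these numbers, the sketch below adds the storage size and the total index size from the output above and compares the sum with a RAM budget. It assumes that storage plus indexes is a reasonable upper bound on what this collection could pull into memory; the actual working set is whatever portion is accessed frequently and is usually smaller. The function name `fitsInRam` and the RAM figures are hypothetical.

```javascript
// Values copied from the db.wombats.stats() output on slide 20.
const wombatStats = { storageSize: 86092032, totalIndexSize: 99860480 };

// Upper bound on memory needed if the whole collection and all of its
// indexes were hot. Assumption: nothing else competes for RAM.
function fitsInRam(stats, ramBytes) {
  const neededBytes = stats.storageSize + stats.totalIndexSize;
  return { neededBytes, fits: neededBytes <= ramBytes };
}

const MIB = 1024 * 1024;
console.log(fitsInRam(wombatStats, 128 * MIB));  // { neededBytes: 185952512, fits: false }
console.log(fitsInRam(wombatStats, 1024 * MIB)); // { neededBytes: 185952512, fits: true }
```

Note that the indexes are larger than the data for this collection, so leaving them out of a sizing estimate would understate the requirement by more than half.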
    • 21. Disk configurations
        Single Disk: ~200 seeks / second
    • 22. Disk configurations
        Single Disk: ~200 seeks / second
        RAID 0: ~200 seeks / second on each of 3 stripes
    • 23. Disk configurations
        Single Disk: ~200 seeks / second
        RAID 0: ~200 seeks / second on each of 3 stripes
        RAID 10: ~400 seeks / second on each of 3 mirrored stripes
    • 24. Basic Tips
        • Favor instances with more memory over instances with more CPU cores
        • Use 64-bit instances
        • Use the XFS or ext4 file system
        • Use EBS in RAID: RAID 0 or 10 for the data volume, RAID 1 for configdb
    • 25. Basic Installation Steps
        1. Create your EC2 instance
        2. Attach EBS storage
        3. Make an ext4 file system
           $ sudo mkfs -t ext4 /dev/[connection to volume]
        4. Make a data directory
           $ sudo mkdir -p /data/db
        5. Mount the volume
           $ sudo mount /dev/sdf /data/db
        6. Install MongoDB
           $ curl http://[mongodb download site] > m.tgz
           $ tar xzf m.tgz
        7. Start MongoDB
           $ ./mongod
    • 26. Types of outage
        • Planned
            • Hardware upgrade
            • O/S or file-system tuning
            • Relocation of data to new file-system / storage
            • Software upgrade
        • Unplanned
            • Hardware failure
            • Data center failure
            • Region outage
            • Human error
            • Application corruption
    • 27. How MongoDB Replication works [diagram: Member 1, Member 2, Member 3]
        • Set is made up of 2 or more nodes
    • 28. How MongoDB Replication works [diagram: Member 1, Member 2 (PRIMARY), Member 3]
        • Election establishes the PRIMARY
        • Data replication from PRIMARY to SECONDARY
    • 29. How MongoDB Replication works [diagram: Member 1 and Member 3 negotiate new master; Member 2 DOWN]
        • PRIMARY may fail
        • Automatic election of new PRIMARY if majority exists
    • 30. How MongoDB Replication works [diagram: Member 1, Member 3 (PRIMARY), Member 2 DOWN]
        • New PRIMARY elected
        • Replication Set re-established
    • 31. How MongoDB Replication works [diagram: Member 1, Member 3 (PRIMARY), Member 2 RECOVERING]
        • Automatic recovery
    • 32. How MongoDB Replication works [diagram: Member 1, Member 3 (PRIMARY), Member 2]
        • Replication Set re-established
    • 33. Replica Set 0
        • Two Node?
        • A network failure can split the nodes, which will result in the whole system going read-only
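Why a two-node set ends up read-only follows from the majority rule mentioned on slide 29: a PRIMARY can only be elected by members that can see a strict majority of the set. The one-line check below is a minimal sketch of that rule; the function name `canElectPrimary` is hypothetical, and member priorities, votes, and arbiters are deliberately ignored.

```javascript
// A partition can elect a PRIMARY only if it contains a strict
// majority of all members of the replica set.
function canElectPrimary(totalMembers, reachableMembers) {
  return reachableMembers > totalMembers / 2;
}

console.log(canElectPrimary(2, 1)); // false: either half of a split two-node set
console.log(canElectPrimary(3, 2)); // true:  three-node set with one member down
console.log(canElectPrimary(3, 1)); // false: three-node set with two members down
```

With two nodes, each side of a network split sees only 1 of 2 members, so neither side can hold a PRIMARY and writes stop everywhere.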
    • 34. Replica Set 1
        • Single datacenter
        • Single switch & power
        • Points of failure:
            • Power
            • Network
            • Datacenter
            • Two node failure
        • Automatic recovery of single node crash
    • 35. Replica Set 3 [diagram: AZ:1, AZ:2, AZ:3]
        • Single datacenter
        • Multiple power/network zones
        • Points of failure:
            • Datacenter
            • Two node failure
        • Automatic recovery of single node crash
    • 36. Replica Set 4
        • Multi datacenter
        • DR node for safety
        • Can’t do multi data center durable write safely since only 1 node in distant DC
    • 37. Replica Set 5
        • Three data centers
        • Can survive full data center loss
        • Can do w = { dc : 2 } to guarantee write in 2 data centers
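The intent of `w = { dc : 2 }` can be expressed as a small check: a write counts as safe once the members that acknowledged it span at least two distinct data centers. The sketch below models only that counting rule in plain JavaScript. The function name, the member objects, and the data center labels are hypothetical, and the replica set configuration that a real deployment needs for such a write concern is not shown.

```javascript
// acks: members that have acknowledged the write, each labeled with its data center.
// The write satisfies the concern once the acknowledgements cover
// at least requiredDcs distinct data centers.
function satisfiesDcWriteConcern(acks, requiredDcs) {
  const dataCenters = new Set(acks.map((member) => member.dc));
  return dataCenters.size >= requiredDcs;
}

// Two acknowledgements from the same data center are not enough.
const sameDc = [
  { host: "node-a", dc: "dc-1" },
  { host: "node-b", dc: "dc-1" },
];
console.log(satisfiesDcWriteConcern(sameDc, 2)); // false

// One more acknowledgement from a second data center is.
const twoDcs = sameDc.concat([{ host: "node-c", dc: "dc-2" }]);
console.log(satisfiesDcWriteConcern(twoDcs, 2)); // true
```

This also shows why the layout on slide 36 is weaker: with a single node in the distant data center, losing that one node makes a two-data-center acknowledgement impossible.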
    • 38. Scaling
    • 39. http://community.qlikview.com/cfs-filesystemfile.ashx/__key/CommunityServer.Blogs.Components.WeblogFiles/theqlikviewblog/Cutting-Grass-with-Scissors-_2D00_-2.jpg
    • 40. http://www.bitquill.net/blog/wp-content/uploads/2008/07/pack_of_harvesters.jpg
    • 41. Sharding Across AZs [diagram: AZ:1, AZ:2, AZ:3]
        • Each shard is made up of a replica set
        • Each replica set is distributed across availability zones for HA and data protection
    • 42. Balancing [diagram: mongos, three config servers, and the balancer above Shard 1 to Shard 4] Chunks! Shard 1: chunks 1-12; Shard 2: chunks 13-24; Shard 3: chunks 25-36; Shard 4: chunks 37-48
    • 43. Balancing: Imbalance. Shard 1: chunks 1-12; Shard 2: chunks 21-24; Shard 3: chunks 33-36; Shard 4: chunks 45-48
    • 44. Balancing: Move chunk 1 to Shard 2
    • 45. Balancing [diagram: same chunk layout as slide 43]
    • 46. Balancing. Shard 1: chunks 2-12; Shard 2: chunks 1, 21-24; Shard 3: chunks 33-36; Shard 4: chunks 45-48
    • 47. Balancing: Chunk 1 now lives on Shard 2
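The balancing slides can be replayed with a simplified simulation. The sketch below repeatedly moves one chunk from the shard with the most chunks to the shard with the fewest until the gap drops below a threshold. It is a hedged illustration of the idea only, not MongoDB's actual balancer: the function name `balance`, the threshold of 2, and the tie-breaking rule (first shard wins) are all assumptions. The starting layout is the imbalanced one from slide 43.

```javascript
// Move chunks from the fullest shard to the emptiest shard, one at a time,
// until the difference in chunk counts is below the threshold.
function balance(shards, threshold) {
  if (threshold < 2) throw new Error("threshold must be at least 2 or moves never settle");
  const moves = [];
  for (;;) {
    const names = Object.keys(shards);
    let most = names[0];
    let least = names[0];
    for (const name of names) {
      if (shards[name].length > shards[most].length) most = name;
      if (shards[name].length < shards[least].length) least = name;
    }
    if (shards[most].length - shards[least].length < threshold) return moves;
    const chunk = shards[most].shift(); // take the lowest-numbered chunk
    shards[least].push(chunk);
    moves.push({ chunk, from: most, to: least });
  }
}

// Imbalanced layout from slide 43.
const cluster = {
  "Shard 1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
  "Shard 2": [21, 22, 23, 24],
  "Shard 3": [33, 34, 35, 36],
  "Shard 4": [45, 46, 47, 48],
};
const chunkMoves = balance(cluster, 2);
console.log(chunkMoves[0]);     // { chunk: 1, from: 'Shard 1', to: 'Shard 2' }
console.log(chunkMoves.length); // 6
```

The first move matches slide 44 (chunk 1 goes to Shard 2); after six moves every shard holds six chunks.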
    • 48. Backup
    • 49. Replica Set 3 [diagram: one node labeled “backup”]
        1. Lock the “Backup” node: db.fsyncLock()
        2. Check that it is locked: db.currentOp()
        3. Take an EBS snapshot or mongodump: ec2-create-snapshot -d mybackup vol-nn
        4. Unlock: db.fsyncUnlock()
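One practical point about the four steps above is that the node must be unlocked even when the snapshot step fails, otherwise the backup node stays locked. A minimal sketch of that ordering in JavaScript follows. The wrapper `backupWithLock` is hypothetical, and so are the stub `db` object and the snapshot callback, which stand in for a real shell connection and a real snapshot command; only `fsyncLock` and `fsyncUnlock` are names taken from the slide.

```javascript
// Run a snapshot between lock and unlock, and always unlock afterwards.
function backupWithLock(db, takeSnapshot) {
  db.fsyncLock();
  try {
    return takeSnapshot();
  } finally {
    db.fsyncUnlock(); // runs whether the snapshot succeeded or threw
  }
}

// Stub objects that only record the order of calls (hypothetical).
const callLog = [];
const stubDb = {
  fsyncLock: () => callLog.push("lock"),
  fsyncUnlock: () => callLog.push("unlock"),
};

backupWithLock(stubDb, () => {
  callLog.push("snapshot");
  return "snapshot-done";
});
console.log(callLog.join(" -> ")); // lock -> snapshot -> unlock
```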
    • 50. Monitoring
    • 51. Monitoring Tools
        • mongostat
        • MMS! - http://mms.10gen.com
        • munin, cacti, nagios - http://www.mongodb.org/display/DOCS/Monitoring+and+Diagnostics
    • 52. download at mongodb.org. We’re Hiring! Chris Harris, Email : charris@10gen.com, Twitter : cj_harris5. conferences, appearances, and meetups: http://www.10gen.com/events
