Deployment Preparedness

Transcript of "Deployment Preparedness"

  1. #MongoDBTokyo
     Deployment Preparedness
     Alvin Richards, Technical Director, 10gen
  2. Plan A because there is no Plan B
     http://bit.ly/QlJULZ
  3. Part One: Before you deploy…
  4. [Diagram: Prototype, Ops Playbook, Test, Capacity Planning, Monitor]
     Reinventing the wheel
  5. Essentials
     • Disable NUMA
     • Pick an appropriate file system (xfs, ext4)
     • Pick a 64-bit O/S
       – Recent Linux kernel, Win2k8R2
     • More RAM
       – Spend on RAM, not cores
     • Faster disks
       – SSDs vs. SAN
       – Separate journal and data files
  6. Key things to consider
     • Profiling
       – Baseline/blueprint: understand what should happen
       – Ensure good index usage
     • Monitoring
       – SNMP, munin, Zabbix, cacti, nagios
       – MongoDB Monitoring Service (MMS)
     • Sizing
       – Understand capability (RAM, IOPS)
       – Understand use cases + schema
  7. What is your SLA?
     • High Availability?
       – 24x7x365 operation?
       – Limited maintenance window?
     • Data Protection?
       – Failure of a single node?
       – Failure of a data center?
     • Disaster Recovery?
       – Manual or automatic failover?
       – Data center, region, continent?
  8. Build & Test your Playbook
     • Backups
     • Restores (backups are not enough)
     • Upgrades
     • Replica Set Operations
     • Sharding Operations
  9. Part Two: Under the cover…
  10. How to see metrics
      • mongostat
      • MongoDB plug-ins for
        – munin, Zabbix, cacti, ganglia
      • Hosted services
        – MMS - 10gen
        – Server Density, Cloudkick
      • Profiling
  11. Operation Counters
  12. Metrics in detail: opcounters
      • Counts: Insert, Update, Delete, Query, Commands
      • Operation counters are mostly straightforward: more is better
      • Some operations on a replica set primary are counted differently on a secondary
      • getLastError(), system.status etc. are also counted
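The counters are cumulative, so they are easiest to read as rates between two samples. A minimal sketch, assuming snapshots shaped like `db.serverStatus().opcounters`; the helper name `opcounterRates` and the sample numbers are made up for illustration:

```javascript
// Hypothetical helper: turns two opcounters snapshots into per-second rates.
// The snapshot shape mirrors db.serverStatus().opcounters; the values are invented.
function opcounterRates(before, after, elapsedSeconds) {
  if (elapsedSeconds <= 0) {
    throw new Error("elapsedSeconds must be positive");
  }
  const rates = {};
  for (const op of Object.keys(after)) {
    const previous = before[op] || 0;
    // Counters start again from zero when mongod restarts, so a decrease
    // is treated as a restart and the new value is used as the delta.
    const delta = after[op] >= previous ? after[op] - previous : after[op];
    rates[op] = delta / elapsedSeconds;
  }
  return rates;
}

// Two invented samples taken 60 seconds apart.
const t0 = { insert: 1000, query: 5000, update: 200, delete: 10, getmore: 40, command: 900 };
const t1 = { insert: 1600, query: 8000, update: 260, delete: 10, getmore: 100, command: 1200 };
console.log(opcounterRates(t0, t1, 60));
```

In a live shell the two snapshots would come from calling `db.serverStatus().opcounters` twice; tools such as mongostat already report these as per-second figures.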
  13. Resident Memory counter
  14. Metrics in detail: resident memory
      • Key metric: to a very high degree, the performance of a mongod is a measure of how much data fits in RAM.
      • If this quantity is stably lower than available physical memory, the mongod is likely performing well.
      • Correlated metrics: page faults, B-Tree misses
  15. Page Faults counter
  16. [Diagram: Collection 1 and Index 1 in Virtual Address Space 1, backed by Physical RAM and Disk; access latencies 100 ns and 10,000,000 ns]
  17. Metrics in detail: page faults
      • This measures reads or writes to pages of the data files that aren't resident in memory
      • If this is persistently non-zero, your data doesn't fit in memory.
      • Correlated metrics: resident memory, B-Tree misses, iostats
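To see why even a small page fault rate matters, the two latencies on the diagram slide can be combined into an average access time. This is a rough sketch, assuming 100 ns is the RAM access time and 10,000,000 ns is the disk access time (the transcript does not say which label is which); `averageAccessNs` is a hypothetical helper:

```javascript
// Assumption: 100 ns = RAM access, 10,000,000 ns = disk access (from the diagram slide).
const RAM_ACCESS_NS = 100;
const DISK_ACCESS_NS = 10000000;

// Hypothetical helper: expected access time when a given fraction of
// accesses page-fault and have to be served from disk.
function averageAccessNs(faultRatio) {
  if (faultRatio < 0 || faultRatio > 1) {
    throw new RangeError("faultRatio must be between 0 and 1");
  }
  return (1 - faultRatio) * RAM_ACCESS_NS + faultRatio * DISK_ACCESS_NS;
}

// Compare a fully resident data set with one where 1% of accesses fault.
for (const ratio of [0, 0.01]) {
  const slowdown = averageAccessNs(ratio) / RAM_ACCESS_NS;
  console.log(`fault ratio ${ratio}: about ${Math.round(slowdown)}x RAM latency`);
}
```

Under these assumptions, 1 access in 100 going to disk makes the average access roughly a thousand times slower than pure RAM.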
  18. Working Set
      > db.blogs.stats()
      {
          "ns" : "test.blogs",
          "count" : 1338330,
          "size" : 46915928,                 // Size of data
          "avgObjSize" : 35.05557523181876,  // Average document size
          "storageSize" : 86092032,          // Size on disk (and in memory!)
          "numExtents" : 12,
          "nindexes" : 2,
          "lastExtentSize" : 20872960,
          "paddingFactor" : 1,
          "flags" : 0,
          "totalIndexSize" : 99860480,       // Size of all indexes
          "indexSizes" : {                   // Size of each index
              "_id_" : 55877632,
              "name_1" : 43982848
          },
          "ok" : 1
      }
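The stats output can be turned into a quick upper bound on memory needs. The sketch below copies the numbers from the slide and adds `storageSize` to `totalIndexSize`; the helper name `estimateFootprint` and the 1 GiB RAM figure are assumptions for illustration. The real working set is usually smaller than this total, since only the frequently touched part of the data and indexes has to stay resident.

```javascript
// Values copied from the db.blogs.stats() output on the slide.
const blogsStats = {
  size: 46915928,
  storageSize: 86092032,
  totalIndexSize: 99860480,
};

// Hypothetical helper: upper bound on the memory needed to hold the whole
// collection (as stored on disk) plus all of its indexes.
function estimateFootprint(stats, ramBytes) {
  const dataAndIndexBytes = stats.storageSize + stats.totalIndexSize;
  return {
    dataAndIndexBytes,
    ramFraction: dataAndIndexBytes / ramBytes,
    fitsInRam: dataAndIndexBytes <= ramBytes,
  };
}

// Assumed machine size for the example: 1 GiB of RAM.
const ONE_GIB = 1024 * 1024 * 1024;
console.log(estimateFootprint(blogsStats, ONE_GIB));
```

In a live shell the same object would come straight from `db.blogs.stats()`.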
  19. Lock % counter
  20. Metrics in detail: lock percentage and queues
      • By itself, lock % can be misleading: a high lock percentage just means that writing is happening.
      • But when lock % is high and the number of queued readers or writers is non-zero, the mongod is probably at its write capacity.
      • Correlated metrics: iostats
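The rule on this slide is a conjunction: high lock % alone is not a problem, and neither is a momentary queue. A minimal sketch of that check, where the sample shape, the helper name `looksWriteBound`, and the 50% threshold are all assumptions rather than anything the slides specify:

```javascript
// Hypothetical check for the rule on the slide. The sample fields would be
// filled in from whatever monitoring source is in use (for example the
// lock % and queue columns reported by mongostat).
function looksWriteBound(sample, lockPercentThreshold = 50) {
  const queued = sample.queuedReaders + sample.queuedWriters;
  // Both conditions must hold: heavy locking AND operations waiting.
  return sample.lockPercent >= lockPercentThreshold && queued > 0;
}

// Invented samples: busy but keeping up, then busy and falling behind.
console.log(looksWriteBound({ lockPercent: 80, queuedReaders: 0, queuedWriters: 0 }));
console.log(looksWriteBound({ lockPercent: 80, queuedReaders: 3, queuedWriters: 12 }));
```

A sensible threshold depends on the workload, so the default here is only a placeholder.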
  21. Log file
      Mon Dec 3 15:05:37 [conn81] getmore scaleout.nodes query: { ts: { $lte: new Date(1354547123142) } } cursorid:8607875337747748011 ntoreturn:0 keyUpdates:0 numYields: 216 locks(micros) r:615830 nreturned:27055 reslen:4194349 551ms
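Slow-operation lines like this one are regular enough to pull apart with a few regular expressions. The sketch below is an illustration only: `parseSlowOp` is a hypothetical helper, and the sample line is the one from the slide with whitespace between fields assumed (including the split of the final `reslen` value and the `551ms` duration).

```javascript
// Sample line from the slide; the spacing between fields is an assumption.
const sampleLine =
  "Mon Dec 3 15:05:37 [conn81] getmore scaleout.nodes query: " +
  "{ ts: { $lte: new Date(1354547123142) } } cursorid:8607875337747748011 " +
  "ntoreturn:0 keyUpdates:0 numYields: 216 locks(micros) r:615830 " +
  "nreturned:27055 reslen:4194349 551ms";

// Hypothetical parser: extracts the connection, operation, namespace,
// duration, and a fixed set of numeric counters from one log line.
function parseSlowOp(line) {
  const header = /\[(\w+)\]\s+(\w+)\s+(\S+)/.exec(line);
  const duration = /(\d+)ms\s*$/.exec(line);
  if (!header || !duration) {
    return null; // not a slow-operation line in the expected shape
  }
  const result = {
    connection: header[1],
    operation: header[2],
    namespace: header[3],
    millis: Number(duration[1]),
  };
  for (const name of ["ntoreturn", "keyUpdates", "numYields", "nreturned", "reslen"]) {
    const match = new RegExp("\\b" + name + ":\\s*(\\d+)").exec(line);
    if (match) {
      result[name] = Number(match[1]);
    }
  }
  const readLock = /locks\(micros\).*?\br:(\d+)/.exec(line);
  if (readLock) {
    result.readLockMicros = Number(readLock[1]);
  }
  return result;
}

console.log(parseSlowOp(sampleLine));
```

The cursor id is deliberately left out because it is larger than a JavaScript number can represent exactly.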
  22. explain, hint
      // explain() shows the plan used by the operation
      > db.c.find(<query>).explain()
      // hint() forces a query to use a specific index
      // x_1 is the name of the index from db.c.getIndexes()
      > db.c.find( {x:1} ).hint("x_1")
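The name `x_1` follows MongoDB's default naming for indexes created without an explicit name: each field and its direction joined by underscores. A small sketch of that convention; `defaultIndexName` is a hypothetical helper, not a shell function, and the authoritative name is always whatever `db.c.getIndexes()` reports:

```javascript
// Hypothetical helper: reproduces the default "field_direction" naming
// for an index key specification such as { x: 1 }.
function defaultIndexName(keySpec) {
  return Object.entries(keySpec)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
}

console.log(defaultIndexName({ x: 1 }));        // single-field index
console.log(defaultIndexName({ a: 1, b: -1 })); // compound index
```

The built-in `_id` index is an exception: it is named `_id_`, as the stats output on the Working Set slide shows, so this helper does not apply to it.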
  23. B-Tree Counter
  24. Metrics in detail: B-Tree
      • Indicates B-tree accesses, including page fault service during an index lookup
      • If misses are persistently non-zero, your indexes don't fit in RAM. (You might need to change or drop indexes, or shard your data.)
      • Correlated metrics: resident memory, page faults, iostats
  25. B-Tree strengths
      • B-Tree indexes are designed for range queries over a single dimension
      • Think of a compound index on { A, B } as being an index on the concatenation of the A and B values in documents
      • MongoDB can use its indexes for sorting as well
  26. B-Tree weaknesses
      • Range queries on the first field of a compound index are suboptimal
      • Range queries over multiple dimensions are suboptimal
      • In both these cases, a suboptimal index might be better than nothing, but it is best to see whether you can change the problem
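The "concatenation" picture can be made concrete with a sorted array standing in for a compound index on { A, B }. This is a simplified model, not how mongod scans a B-tree: it only measures how far apart the first and last matching entries are. The helper names and the 4x4 data set are invented for illustration.

```javascript
// Simplified model of a compound index on { a, b }: entries sorted by a, then b.
function buildIndex(docs) {
  return docs
    .map((doc) => [doc.a, doc.b])
    .sort((x, y) => x[0] - y[0] || x[1] - y[1]);
}

// Number of index entries between the first and last match, inclusive.
// A contiguous range query touches exactly as many entries as it returns.
function scannedEntries(index, matches) {
  const positions = [];
  index.forEach((entry, i) => {
    if (matches(entry)) positions.push(i);
  });
  if (positions.length === 0) return 0;
  return positions[positions.length - 1] - positions[0] + 1;
}

// Invented data: every combination of a and b in 1..4.
const docs = [];
for (let a = 1; a <= 4; a++) {
  for (let b = 1; b <= 4; b++) docs.push({ a, b });
}
const index = buildIndex(docs);

// Equality on the first field, range on the second: matches are adjacent.
const equalityThenRange = scannedEntries(index, ([a, b]) => a === 2 && b >= 2 && b <= 3);
// Range on the first field, equality on the second: matches are spread out.
const rangeThenEquality = scannedEntries(index, ([a, b]) => a >= 2 && a <= 3 && b === 2);
console.log({ equalityThenRange, rangeThenEquality });
```

Both queries match two entries, but in this model the second has to span more of the index to find them, which is the weakness the slide describes.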
  27. Indexing dark corners
      • Some functionality can't always use indexes:
        – $where JavaScript clauses
        – $mod, $not, $ne
        – regex
      • Negation may be transformed into a range query
        – Index can be used
      • Complicated regular expressions scan a whole index
  28. Other tricks
  29. Warming the Cache
      > db.c.find( {unused_key: 1} ).explain()
      > db.c.find( {unused_key: 1} )
          .hint( {random_index:1} )
          .explain()
      # cat /data/db/* > /dev/null
      // New in 2.2
      > db.runCommand( { touch: "blogs", data: true, index: true } )
  30. Journal on another disk
      • The journal's write load is very different from that of the data files
        – journal = append-only
        – data files = randomly accessed
      • Putting the journal on a separate disk or RAID (e.g., with a symlink) will minimize any seek-time-related journaling overhead
  31. --directoryperdb
      • Allows storage tiering
        – Different access patterns
        – Different disk types / speeds
      • Use --directoryperdb
      • Add a symlink into the database directory
  32. Dynamically change log level
      // Change logging level to get more info
      > db.adminCommand({ setParameter: 1, logLevel: 1 })
      > db.adminCommand({ setParameter: 1, logLevel: 0 })
  33. Because you now have a Plan B
      http://bit.ly/QlJULZ