Administration
  Part Deux
        Michael DelNegro
 Principal Database Administrator
               AOL

                1
Presentation Overview
• Introduction
• Usage
• Best Practices
• Lessons Learned
• Resources
• Upcoming
                    2
About Me
• DBA at AOL (Dulles) for six years
• Background in Sybase
• Now MySQL, PostgreSQL and NoSQL
• Was: Blogsmith, Uncut Video, Travel, Autos,
  Journals, Real Estate, Ficlets, Shopping
• Currently: Patch, MapQuest, HSS,
  Datalayer, Demand
• I Heart Big Data
                       3
About MongoDB
• “Scalable, high-performance, open source,
  document-oriented database”
• Databases (Databases)
 • Collections (Tables)
    • Documents (Rows)
     • Fields (Columns) - K/V Pairs
• Indexes
• No Joins
 • Favors Embedding Data instead of FKs
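
A minimal sketch of the terminology above, written in plain JavaScript (the language of the mongo shell). The `post` document and all of its field names are invented for illustration, and nothing here talks to a server.

```javascript
// A hypothetical blog post document. In a relational schema the comments
// would sit in their own table with a foreign key back to the post;
// here they are embedded in the parent document.
const post = {
  _id: 1,
  title: "Hello",
  author: { name: "Ann", email: "ann@example.com" }, // embedded sub-document
  comments: [
    { by: "Bob", text: "First!" },
    { by: "Cy", text: "Nice post" }
  ]
};

// One read of the document returns the post and its related data,
// so no join is needed to list the commenters.
function commentAuthors(doc) {
  return doc.comments.map(function (c) { return c.by; });
}

console.log(commentAuthors(post));
```

Each key/value pair (`title`, `author`, `comments`) is a field; the whole object is one document in a collection.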
                     4
MongoDB Support
• Operating Systems
 • Linux, Windows, Mac OS X, Solaris
 • 32bit, 64bit
• Drivers
 • Java(MapQuest), Javascript, Perl,
    Ruby(Patch), Scala, Erlang, C,
    C#(Editions), C++, Haskell, PHP, Python
 • R, Smalltalk, node.js, ColdFusion
                     5
MongoDB Use Cases
• Website Data Store
• Caching Tier
• Document and Content Mgmt Systems
• Event Logging
• Real-time Stats/Analytics
• Archiving
• High Volume Problems
                  6
MongoDB Misuse
• Complex Transactional Systems
• Traditional Business Intelligence
• Small Data and/or Small Traffic
• Should NOT be our default datastore
 • Use MySQL
 • or Use PostgreSQL, Couchbase, Redis,
    Riak, Hive/HBase, Vertica, Netezza, TBD,
    etc.
                    7
Best Practices
• Slaves are a MUST pre-1.8
• Use 64 bit version
 • 32 bit version has 2.5 GB storage limit
• Use xfs or ext4
• Keep eye on oplog size
• Turn off atime & diratime (mount with noatime,nodiratime)
• Consider using getLastError()
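
As a rough illustration of the getLastError() bullet: writes are fire-and-forget unless the client asks for an acknowledgement. The sketch below only builds the command document. The option names (`w`, `wtimeout`, `fsync`) are getLastError options, while the helper `buildGetLastError` is a hypothetical name introduced here.

```javascript
// Build the command document for getLastError.
// The helper is hypothetical; only the option names come from MongoDB.
function buildGetLastError(opts) {
  const cmd = { getlasterror: 1 };
  if (opts.w !== undefined) cmd.w = opts.w;                      // members that must have the write
  if (opts.wtimeout !== undefined) cmd.wtimeout = opts.wtimeout; // ms to wait for replication
  if (opts.fsync) cmd.fsync = true;                              // flush to disk before returning
  return cmd;
}

// In the mongo shell the result would be passed to db.runCommand(...)
// on the same connection, right after the write it should confirm.
const lastErrorCmd = buildGetLastError({ w: 2, wtimeout: 5000 });
console.log(JSON.stringify(lastErrorCmd));
```

Waiting for `w: 2` trades latency for the assurance that at least one secondary has the write.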
                     8
More Best Practices
• Increase File Descriptor Limits
• Do not use kill -9 (pre-1.8 or
  non-journaled)
• At least 3 node replica sets
• db.runCommand("logRotate") (run against the admin database)
• Keep db.<collection>.totalIndexSize() less
  than RAM
• Linux: lower dirty_background_ratio (10->5%)
  and dirty_ratio (40->10%) on kernels pre-2.6.22
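
The index-size rule of thumb can be checked with a few lines of arithmetic. In this sketch the per-collection byte counts are made-up numbers; in the mongo shell each one would come from `db.<collection>.totalIndexSize()`. The helper name `indexesFitInRam` is hypothetical.

```javascript
// Sum per-collection index sizes (in bytes) and compare the total to RAM.
function indexesFitInRam(indexSizes, ramBytes) {
  const total = Object.keys(indexSizes).reduce(function (sum, name) {
    return sum + indexSizes[name];
  }, 0);
  return { totalIndexBytes: total, fits: total < ramBytes };
}

const GB = 1024 * 1024 * 1024;

// Invented figures for two collections on a 16 GB host.
const report = indexesFitInRam({ users: 2 * GB, events: 5 * GB }, 16 * GB);
console.log(report.fits);
```

The working set of data also competes for the same memory, so a total that barely fits is still a warning sign.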
                      9
Even More
• Use --rest (HTTP interface listens on port + 1000)
• Write To a Log
• Take Advantage of 10gen’s MMS
• Use the Shortest Readable Field Names
  Possible
• drop() is Much Faster Than remove()
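
Field names matter because every document stores its own copy of them. The sketch below estimates the bytes saved by renaming fields, assuming ASCII names (one byte per character) and that every document carries every field. The field names and the helper `bytesSaved` are invented for illustration.

```javascript
// renames maps each long field name to its shorter replacement.
// Savings per document is the total number of characters removed.
function bytesSaved(renames, docCount) {
  const perDoc = Object.keys(renames).reduce(function (sum, longName) {
    return sum + (longName.length - renames[longName].length);
  }, 0);
  return perDoc * docCount;
}

// Hypothetical renames applied across 100 million documents.
const saved = bytesSaved(
  { createdTimestamp: "created", userIdentifier: "uid" },
  100000000
);
console.log(saved + " bytes saved");
```

Names like `created` and `uid` stay readable; single-letter names save a little more but cost more in maintenance.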
                    10
Lessons Learned
• Be Careful About Updates
• Choose Shard Key Carefully
• Turn Off Balancer During Peak Periods
• Use Explain
• Aggressively Upgrade Within Major
  Versions
• Choose Embed vs Top-Level Collections
  Carefully
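
One reason the shard key deserves care: with range-based chunks, a key that only ever increases sends every new insert to the last chunk. The toy model below is a deliberate simplification (no chunk splitting or balancer migration), and the split points and key values are invented.

```javascript
// Chunks are contiguous key ranges; splitPoints are the boundaries.
// Returns the index of the chunk whose range contains the key.
function chunkFor(key, splitPoints) {
  let i = 0;
  while (i < splitPoints.length && key >= splitPoints[i]) i++;
  return i;
}

// Count how many writes land in each chunk.
function writesPerChunk(keys, splitPoints) {
  const counts = new Array(splitPoints.length + 1).fill(0);
  keys.forEach(function (k) { counts[chunkFor(k, splitPoints)]++; });
  return counts;
}

const splits = [100, 200, 300];

// A counter or timestamp style key: every new value is past the last split.
console.log(writesPerChunk([301, 302, 303, 304], splits));

// A key whose values are spread across the range.
console.log(writesPerChunk([42, 150, 250, 350], splits));
```

In the first case a single shard absorbs all of the write load; in the second the writes are spread evenly.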
                    11
Top-Level Collections

• Don’t Belong Conceptually To Another
  Collection
• Building Blocks
• Easily Referenceable and Updatable
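
A small sketch of the top-level approach, with two hypothetical collections modeled as plain arrays. Because there are no joins, the application resolves the reference itself with a second lookup. All names here are invented for illustration.

```javascript
// Two top-level "collections": authors stand on their own,
// and posts point at them by _id.
const authors = [
  { _id: "a1", name: "Ann" },
  { _id: "a2", name: "Bob" }
];
const posts = [
  { _id: "p1", title: "Hello", authorId: "a1" },
  { _id: "p2", title: "Again", authorId: "a1" }
];

function findById(collection, id) {
  for (let i = 0; i < collection.length; i++) {
    if (collection[i]._id === id) return collection[i];
  }
  return null;
}

// The "join" happens in application code: one lookup per collection.
function postWithAuthor(postId) {
  const found = findById(posts, postId);
  if (found === null) return null;
  return { title: found.title, author: findById(authors, found.authorId) };
}

// A single update to the author is seen by every post that references it.
findById(authors, "a1").name = "Ann Lee";
console.log(postWithAuthor("p2").author.name);
```

The cost is the extra lookup; the benefit is that shared data lives in exactly one place.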

                    12
Embedding Pros

• Fast Retrieval of Document With Related
  Data
• Atomic Updates
• Ownership is obvious
• Maps Better With Structure of Code

                    13
Embedding Cons
• Harder To Query/Reference
• Harder To Do Mass Queries
• 16MB Limit Per Document
• Err On Side Of Embedding
• Note: Concepts Here Borrowed From
  Fantastic Preso By Ian White of Sailthru
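
The per-document limit is what eventually caps an embedded array. The sketch below gives a back-of-the-envelope ceiling on embedded items. It uses JSON text length as a stand-in for BSON size, which is only an approximation (in the mongo shell, `Object.bsonsize(doc)` reports the real figure). The helper names and sample documents are hypothetical.

```javascript
const MAX_DOC_BYTES = 16 * 1024 * 1024;

// Rough size estimate: JSON length, NOT the true BSON size.
function approxSize(doc) {
  return JSON.stringify(doc).length;
}

// Estimate how many items shaped like sampleItem could be embedded
// in parentDoc before the document reaches the limit.
function maxEmbeddedItems(parentDoc, sampleItem) {
  const base = approxSize(parentDoc);
  const perItem = approxSize(sampleItem) + 1; // +1 for the separating comma
  return Math.floor((MAX_DOC_BYTES - base) / perItem);
}

const emptyPost = { _id: 1, title: "Hello", comments: [] };
const shortComment = { by: "Bob", text: "First!" };
console.log(maxEmbeddedItems(emptyPost, shortComment));
```

If the expected item count is anywhere near this ceiling, the embedded data is a candidate for its own top-level collection.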

                     14
Admin Resources
• mongodb.org
 • Events
 • Forums
 • Presentations
• Mongo Snippets (Github)
• IRC (freenode #mongodb)
                  15
More Admin Resources
• slideshare (Use Time-Based Search)
• GUI Admin Tools
 • MongoVUE (Windows)
 • MongoHub (Mac)
 • Others
• Kristina Chodorow's Blog
                    16
Even More Resources
• Follow @MongoQuestion (StackOverflow)
• MongoDB on Quora (@q_mongodb)
• Books
• Training
• Office Hours in NYC and Silicon Valley
• 10gen Support (Email Me To Be Added)
• DC MongoDB Users Group (@MongoDC)
                  17
New MongoDB Release
• 2.0 (Released Last Week)
 • Journaling is Default
 • Per Collection/Index Compact Command
 • Concurrency Improvements
 • Reduced Default Stack Size
 • Index Performance Enhancements
                  18
More 2.0 Features
• Map Reduce Performance Improvements
• Replica Set Improvements
 • Priorities
 • Data-center Awareness
• Release Notes
• 2.0 Features Presentation
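
To show what priorities and data-center tags look like in practice, here is a hedged sketch of a replica set configuration document. The field names (`_id`, `members`, `host`, `priority`, `tags`) follow the replica set config format; the set name, hostnames, and the `preferredPrimary` helper are invented, and real elections also weigh factors such as how current each member's data is.

```javascript
// Hypothetical three-member set spanning two data centers.
const config = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.com:27017", priority: 2, tags: { dc: "east" } },
    { _id: 1, host: "db2.example.com:27017", priority: 1, tags: { dc: "east" } },
    { _id: 2, host: "db3.example.com:27017", priority: 0, tags: { dc: "west" } }
  ]
};

// Simplified model: priority 0 members can never become primary,
// and among the rest the highest priority is preferred.
function preferredPrimary(cfg) {
  const eligible = cfg.members.filter(function (m) { return m.priority > 0; });
  eligible.sort(function (a, b) { return b.priority - a.priority; });
  return eligible.length > 0 ? eligible[0].host : null;
}

console.log(preferredPrimary(config));
```

Here the west-coast member holds a copy of the data for disaster recovery but is never elected primary.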
                  19
Future Releases
• 2.2 (End of 2011?)
 • New Aggregation Framework
 • More Concurrency Improvements
 • Better Freelist Management
• Beyond
 • Full-Text Search
 • Auto Compaction/Defrag
                  20
Thank You!

• www.slideshare.net/radiocats
• @radiocats on Twitter
• www.linkedin.com/in/mdelnegro
• Humble Plea

                   21

MongoDB Administration 20110922
