MongoUK - Approaching 1 billion documents with MongoDB

Presentation given at the MongoUK conference on 18th June 2010 by David Mytton on approaching 1 billion documents in MongoDB.

  1. Approaching 1 Billion Documents in MongoDB. David Mytton, david@boxedice.com / @davidmytton
  2. Server Density Monitoring: Processing, Database, UI. www.serverdensity.com
  3. Cache / Data Store. Postback; checksLatest; checksHistorical.
  4. db.stats(): Documents 937,393,315; Collections 27,566; Indexes 45,277; Stored data 638GB; Inserts 5000-8000/s. As of 17th Jun 2010.
  5. 13 months ago. Why we moved: http://bit.ly/mysqltomongo
  6. Initial Setup. Replication: Master (DC1, 8GB RAM) to Slave (DC2, 8GB RAM).
  7. Vertical Scaling. Replication: Master (DC1, 72GB RAM) to Slave (DC2, 8GB RAM).
  8. Tip #1: Keep your indexes in memory at all times. db.stats()
  9. I/O not an issue.
  10. Tip #2: Data is flushed to disk every 60s. db.runCommand({fsync:1}); --syncdelay [60]
  11. Sharding solves everything.
  12. Manual Partitioning. Replication: Master A (DC1, 16GB RAM) to Slave A (DC2, 16GB RAM). Replication: Master B (DC1, 16GB RAM) to Slave B (DC2, 16GB RAM).
  13. Sustained Traffic. Master: avg out 2.4Mbit/s, avg in 3.8Mbit/s. Slave: avg out 4.0Mbit/s, avg in 111.2Kbit/s.
  14. Database vs collections. Many databases = many data files (small but quickly get large). Many collections = watch namespace limit.
  15. Namespaces = number of collections + number of indexes.
  16. Tip #3: Monitor the 24,000 namespace limit.
  17. Using Server Density.
  18. Console: db.system.namespaces.count()
  19. Replica Pairs = Failover. Replica Pair: Master A (DC1, 16GB RAM), Slave A (DC2, 16GB RAM). Replica Pair: Master B (DC1, 16GB RAM), Slave B (DC2, 16GB RAM).
  20. Tip #4: Pre-provision your oplog files.
  21. A shell script to generate 75GB oplog files: for i in {0..40}; do echo $i; head -c 2146435072 /dev/zero > local.$i; done
  22. Tip #5: Expect slower performance during initial replica sync.
  23. Tip #6: You can rotate your log files from the console.
  24. Rotating your log files: db.runCommand("logRotate")
  25. Tip #7: Index creation blocks by default. Use background indexing if necessary. MongoDB Manual: http://bit.ly/mongobgindex
  26. Tip #8: Increase your OS file descriptor limit + use persistent connections.
  27. Too many open files! In /etc/security/limits.conf (fields: user, type, limit): mongo hard nofile 10000; mongo soft nofile 10000. In /etc/ssh/sshd_config: UsePAM yes
  28. Space is not reused. Data + indexes: 551GB. Actual disk usage: 638GB. Fixed in 1.1.4 1.3.x 1.5.0 1.5.1 1.5.2 1.5.3 1.5.4? JIRA: SERVER-366
  29. Summary: 1. Keep indexes in memory. 2. Data is flushed to disk every 60s. 3. Monitor the 24k namespace limit. 4. Pre-provision oplog files. 5. Expect slower performance on replica sync. 6. Rotate logs from the console. 7. Index creation blocks by default. 8. OS file descriptor limit + persistent connections.
  30. Slides: blog.boxedice.com/mongodb. David Mytton, david@boxedice.com / @davidmytton
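
Slides 14 to 18 come down to one sum: namespaces = collections + indexes, watched against the 24,000 limit. Below is a minimal sketch of that check in plain JavaScript. The function name namespaceHeadroom, the 80% warning threshold, and the example counts are illustrative assumptions, not figures from the talk.

```javascript
// Namespace limit quoted on slide 16.
const NAMESPACE_LIMIT = 24000;

// Returns usage against the limit. warnRatio (0.8 by default) is an assumed
// threshold for illustration, not a figure from the talk.
function namespaceHeadroom(collections, indexes, warnRatio = 0.8) {
  const used = collections + indexes; // slide 15: collections + indexes
  return {
    used: used,
    remaining: NAMESPACE_LIMIT - used,
    warn: used >= NAMESPACE_LIMIT * warnRatio,
  };
}

// Hypothetical counts for a single database, not taken from the talk.
const example = namespaceHeadroom(9000, 13500);
// example.used → 22500, example.remaining → 1500, example.warn → true
```

On a live server the used figure would presumably come straight from db.system.namespaces.count(), as on slide 18, rather than from adding the two counts by hand.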
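
The loop on slide 21 can be sanity-checked with a little arithmetic. This sketch in plain JavaScript assumes the brace expansion {0..40} is inclusive (41 files) and uses the 2146435072-byte file size from the slide. The function name oplogPreallocation is hypothetical. By this calculation the loop as written allocates roughly 82 GiB (about 88 GB), somewhat more than the 75GB in the slide title.

```javascript
// Figures from slide 21: files local.0 .. local.40, each 2146435072 bytes.
const FIRST_INDEX = 0;
const LAST_INDEX = 40;
const BYTES_PER_FILE = 2146435072;

// Computes how many files the loop writes and how much disk they occupy.
function oplogPreallocation(first, last, bytesPerFile) {
  const fileCount = last - first + 1; // {0..40} includes both endpoints
  const totalBytes = fileCount * bytesPerFile;
  return {
    fileCount: fileCount,
    mibPerFile: bytesPerFile / (1024 * 1024),
    totalBytes: totalBytes,
    totalGiB: totalBytes / Math.pow(1024, 3),
  };
}

const plan = oplogPreallocation(FIRST_INDEX, LAST_INDEX, BYTES_PER_FILE);
// plan.fileCount → 41, plan.mibPerFile → 2047, plan.totalBytes → 88003837952
```

To size the loop for a different oplog target, divide the target size by the per-file size and round up to get the file count, then check that the volume has that much free space before running it.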
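
Slide 25 recommends background indexing without showing the call. In the mongo shell of that period, a background build was requested by passing the background option to ensureIndex. The wrapper below is a minimal sketch of that call. The function name ensureBackgroundIndex and the serverId field are hypothetical, and later MongoDB releases replace ensureIndex with createIndex, so check the manual for the version in use.

```javascript
// Requests a background (non-blocking) index build.
// `collection` is assumed to be any object exposing
// ensureIndex(keys, options), such as a mongo shell collection handle.
function ensureBackgroundIndex(collection, keys) {
  return collection.ensureIndex(keys, { background: true });
}

// In the mongo shell this might be called as (hypothetical field name):
//   ensureBackgroundIndex(db.checksHistorical, { serverId: 1 });
```

Wrapping the call keeps the background flag from being forgotten when indexes are added to a collection that is already taking production traffic.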
