Introducing MongoDB in a multi-site HA environment


We gave this presentation at Mongo Munich on 10 October 2011. It covers the introduction of MongoDB at AutoScout24 and, for the most part, the durability and robustness testing we did before launching a new site.


  1. www.autoscout24.com
     Introducing MongoDB in a HA multi-site Environment
     Munich | 10.10.2011 | Sebastian Geib, Jean-Charles Thomas
  2. Jean-Charles Thomas, Team Lead Unix Systems and Applications
     Sebastian Geib, Database Administrator
     AutoScout24 MongoDB HA environment | Sebastian Geib, J.C. Thomas
  3. Index
     1 About AutoScout24
     2 Why MongoDB?
     3 MongoDB architecture at AutoScout24
     4 Testing MongoDB
     5 MongoDB backup/restore
     6 Monitoring MongoDB
     7 Conclusion
  4. AutoScout24: Who are we?
     5.4 million users
     14 countries
     1.9 million cars
  5. AutoScout24 KPI: numbers, data and facts for the overall AutoScout24 IT
     Two separate data centers in Germany
     >1000 servers, 10 load balancers, 25 firewalls, 60 development servers
     16 storage systems with a raw capacity of 800 TB
     11.2 billion total requests (PI + grabbers and bots) per month
     58 million image files for 1.9 million cars
     2.1 Gbit/s peak traffic
     180 TB data volume per month
     Four broadband providers with 13 Gbit/s in total
     [Network diagram labels: Internet, AS44355, Global Traffic Mgmt.; for each of DC 1 and DC 2: load balancers, firewall, backbone router, application servers, database server]
  7. Why MongoDB at AutoScout24?
     New product 2011: portal for car inspection and services
     Completely new application, developed from scratch for the front and back ends
     Let's use what we dreamed of!
     Initial database requirements
       • Scale for large quantities of data
       • Highly available across data centers
       • Flexible database changes (avoid the DBAs as much as possible!)
       • MapReduce functions
       • Easy management
     MongoDB was chosen as the best product
     Product launch was September 2011
     http://werkstatt.autoscout24.de/
  9. Mongo Architecture
  10. Mongo Architecture
      Replica set across two data centers: primary and secondary.
      All four nodes are actively used by the application.
      The primary data center is split into two fire areas.
      In the primary data center, both nodes (the primary and the secondary) can automatically assume the role of primary.
      In the secondary data center, both secondary nodes can only be promoted to primary manually, to avoid split-brain situations.
      Currently running MongoDB 1.8.1
      All servers virtualized using VMware ESX 4.1
      2 cores, 4 GB RAM, 100 GB HDD per server
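A layout like the one on this slide can be written down as a replica set configuration document. The sketch below builds such a document as a plain Python dict. The field names (`_id`, `members`, `host`, `priority`, `arbiterOnly`) follow MongoDB's replica set configuration format; the set name, the host names and the arbiter are placeholders, and using priority 0 for the secondary data center is an assumption about how "manual promotion only" could be expressed, not the configuration the authors actually used.

```python
import json


def build_replica_set_config(set_name, primary_dc_hosts, secondary_dc_hosts, arbiter_hosts):
    """Return a replica set configuration document as a plain dict."""
    members = []
    for host in primary_dc_hosts:
        # Priority 1 (the default): eligible for automatic election.
        members.append({"host": host, "priority": 1})
    for host in secondary_dc_hosts:
        # Priority 0: holds data and can serve reads, but is never elected automatically.
        members.append({"host": host, "priority": 0})
    for host in arbiter_hosts:
        # Arbiters vote in elections but store no data.
        members.append({"host": host, "arbiterOnly": True})
    for member_id, member in enumerate(members):
        member["_id"] = member_id
    return {"_id": set_name, "members": members}


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    config = build_replica_set_config(
        "rs0",
        ["dc1-node1.example.com:27017", "dc1-node2.example.com:27017"],
        ["dc2-node1.example.com:27017", "dc2-node2.example.com:27017"],
        ["dc1-arbiter.example.com:27017"],
    )
    print(json.dumps(config, indent=2))
```

A document of this shape is what the mongo shell's `rs.initiate()` accepts; the number and placement of arbiters has to be chosen to match the intended majority behavior.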
  12. MongoDB Robustness Testing: on-server tests
      Running out of disk space on the data volume while writing
        • On the primary node only: crashed the whole replica set.
        • On all nodes: led to error messages in the log but no feedback to the client.
      Running out of disk space on the data volume while reading
        • No significant impact.
      Removing the volume while writing
        • Primary switched to another host, and the insert and the replica set were broken.
      Removing the volume while reading
        • The open query didn't return. Further queries were handled by other set members.
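Because a full data volume produced only log messages and no client feedback in these tests, free space is worth checking from outside the database. The following is a minimal sketch using Python's standard `shutil.disk_usage`; the 20% and 10% thresholds and the data path are assumptions for illustration, not values from the talk.

```python
import shutil


def classify_free_fraction(free_fraction, warn_fraction=0.20, crit_fraction=0.10):
    """Map the free share of a volume (0.0 to 1.0) to a status string."""
    if free_fraction < crit_fraction:
        return "CRITICAL"
    if free_fraction < warn_fraction:
        return "WARNING"
    return "OK"


def free_space_status(path, warn_fraction=0.20, crit_fraction=0.10):
    """Check the volume that contains `path` and return a status string."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo file systems report a size of zero.
        return "UNKNOWN"
    return classify_free_fraction(usage.free / usage.total, warn_fraction, crit_fraction)


if __name__ == "__main__":
    # "." stands in for the MongoDB data directory (hypothetical path).
    print(free_space_status("."))
```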
  14. MongoDB HA Testing 1: replica set tests
      Primary node failing while writing
        • Without safe mode enabled, failover takes 9 seconds until a new primary is elected.
        • With safe mode enabled, failover takes 13 seconds.
        • After reboot, the former primary becomes a working secondary.
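With failover times of 9 to 13 seconds, a writing client has to bridge a window without a primary. One way to do that is a retry loop, sketched below. The `write` callable, the use of Python's built-in `ConnectionError` and the 30-second budget are assumptions for illustration; a real driver raises its own exception types, which would have to be caught instead.

```python
import time


def write_with_retry(write, max_wait_seconds=30.0, pause_seconds=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call `write()` until it succeeds or the time budget is used up.

    Returns a tuple (result, attempts). Re-raises the last ConnectionError
    if another pause would exceed the budget.
    """
    deadline = clock() + max_wait_seconds
    attempts = 0
    while True:
        attempts += 1
        try:
            return write(), attempts
        except ConnectionError:
            if clock() + pause_seconds > deadline:
                raise
            # Give the replica set time to elect a new primary.
            sleep(pause_seconds)
```

The budget should comfortably exceed the failover time measured for the deployment. Retrying is only safe for writes that can be repeated without harm, because a failed attempt may already have reached the old primary.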
  17. MongoDB HA Testing 1: replica set tests
      Secondary node failing while reading
        • It takes 9 seconds for the remaining replica set to realize the node is gone.
      Arbiter failing
        • No impact whatsoever.
        • Majority remains intact and the replica set works properly.
  19. MongoDB HA Testing 2: testing the replica set in both data centers
      Primary and secondary nodes failing in the main data center while writing
        • Test tool crashes and cannot write anymore.
        • Cluster remains without a primary.
        • Reads are handled properly.
      Arbiter failing (in both data centers)
        • No impact.
        • Replica set still works fine because a majority remains in place.
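Both results on this slide follow from the majority rule: a primary can only be elected while more than half of all votes in the set are reachable. The sketch below applies that rule per site. The vote distribution in the example (three votes in DC 1, two in DC 2) is hypothetical, since the slides do not state where each vote lives.

```python
def has_majority(reachable_votes, total_votes):
    """True if strictly more than half of all votes are reachable."""
    return reachable_votes > total_votes // 2


def survivors_can_elect(votes_by_site, failed_sites):
    """Check whether the sites that are still up hold a majority of votes."""
    total = sum(votes_by_site.values())
    reachable = sum(votes for site, votes in votes_by_site.items()
                    if site not in failed_sites)
    return has_majority(reachable, total)


if __name__ == "__main__":
    # Hypothetical vote layout, for illustration only.
    layout = {"dc1": 3, "dc2": 2}
    print("dc1 down:", survivors_can_elect(layout, {"dc1"}))
    print("dc2 down:", survivors_can_elect(layout, {"dc2"}))
```

With this layout, losing DC 1 leaves two of five votes, so no primary can be elected, while losing DC 2 leaves three of five and the set keeps working.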
  24. MongoDB Backup: backing up data and getting into trouble
      Testing and preparing backup and restore was the most boring task.
      Long waiting with large sets of test data.
      Different attempts:
        • LVM snapshot: working fine; restore is a bit more complicated.
        • Dump: easier to restore, and easier to extract specific data from a restore.
      For our current data volume, mongodump is the best choice for us.
      Locking is an issue (verify your locks are released after the backup or you'll be in trouble).
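The warning about locks can be turned into a structural guarantee: wrap the backup step so that the unlock always runs, even when the dump fails. Below is a minimal sketch of that pattern. The `lock`, `unlock` and `dump` callables are placeholders for whatever actually locks the node and runs the backup; they are not MongoDB API calls.

```python
from contextlib import contextmanager


@contextmanager
def write_lock(lock, unlock):
    """Hold a lock for the duration of the block and always release it."""
    lock()
    try:
        yield
    finally:
        # Runs on success and on failure, so the node never stays locked.
        unlock()


def run_backup(lock, unlock, dump):
    """Lock, run the dump, and release the lock no matter what happens."""
    with write_lock(lock, unlock):
        return dump()
```

A monitoring check that verifies the node is unlocked after the backup window is still a useful second safeguard, because a killed backup process never reaches the `finally` block.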
  28. MongoDB Restore: testing restores on the full replica set
      Restore of the full database (test size 70 GB)
        • Cannot be restored in one go because the secondaries become stale, although the oplog size had already been increased a lot.
        • A restore split into three 40-minute chunks was possible.
        • A stale secondary could be restored within 30 minutes by removing its data.
      Restore of one secondary/passive after failure or becoming stale
        • No surprises.
        • It took 30 minutes to get it back up and running.
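Why a bulk restore makes secondaries go stale can be estimated with simple arithmetic: the oplog covers only as much time as its size divided by the write rate, and a secondary that falls further behind than that window can no longer catch up. The sketch below is a rough model under stated assumptions. It treats oplog bytes as equal to data bytes, which is only approximately true, and the 10 GB oplog size is hypothetical; only the 70 GB and the three 40-minute chunks come from the slide.

```python
GIB = 1024 ** 3


def oplog_window_seconds(oplog_bytes, write_rate_bytes_per_second):
    """Time span the oplog can hold at a constant write rate."""
    if write_rate_bytes_per_second <= 0:
        return float("inf")
    return oplog_bytes / write_rate_bytes_per_second


def secondary_may_go_stale(oplog_bytes, write_rate_bytes_per_second, lag_seconds):
    """True if a secondary lagging by `lag_seconds` has fallen off the oplog."""
    return lag_seconds > oplog_window_seconds(oplog_bytes, write_rate_bytes_per_second)


if __name__ == "__main__":
    # 70 GB restored in three 40-minute chunks (figures from the slide).
    restore_rate = 70 * GIB / (3 * 40 * 60)
    # Hypothetical oplog size of 10 GB.
    window = oplog_window_seconds(10 * GIB, restore_rate)
    print(f"oplog window during restore: {window / 60:.1f} minutes")
```

Splitting the restore into chunks with pauses in between gives the secondaries time to replay the oplog before the next burst of writes overwrites it.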
  33. MongoDB Monitoring: Monitoring 1.0 (during testing) and some pitfalls
      How?
        • Centreon
        • Nagios-based
        • Combining Nagios and Munin plugins to have nice charting and alerting in the same place.
      What?
        • Basically everything that could be relevant:
          - CPU, load, memory, network, I/O, disks
          - MongoDB-specific: commands, connections, replica set state, flushing, locking, memory consumption, data file size
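A replica set state check in the Nagios style can be sketched as a function that maps a status document to an exit code and a message. The exit codes 0 to 3 follow the Nagios plugin convention, and the `members`, `name`, `health` and `stateStr` fields match the output of MongoDB's `replSetGetStatus` command. How the status document is fetched is left out, and the rules (no primary is critical, any unhealthy member is a warning) are assumptions rather than the checks AutoScout24 used.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_replica_set(status):
    """Return (exit_code, message) for a replSetGetStatus-style dict."""
    members = status.get("members", [])
    if not members:
        return UNKNOWN, "UNKNOWN - no member information in status"
    # health is 1 for reachable members and 0 for unreachable ones.
    down = [m.get("name", "?") for m in members if m.get("health") != 1]
    primaries = [m.get("name", "?") for m in members
                 if m.get("stateStr") == "PRIMARY" and m.get("health") == 1]
    if not primaries:
        return CRITICAL, "CRITICAL - replica set has no primary"
    if down:
        return WARNING, "WARNING - members down: " + ", ".join(down)
    return OK, "OK - primary is " + primaries[0]


if __name__ == "__main__":
    # Hand-written sample status with hypothetical host names.
    sample = {"members": [
        {"name": "dc1-node1:27017", "health": 1, "stateStr": "PRIMARY"},
        {"name": "dc1-node2:27017", "health": 0, "stateStr": "(not reachable/healthy)"},
    ]}
    code, message = check_replica_set(sample)
    print(message)
    raise SystemExit(code)
```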
  38. MongoDB Monitoring: Monitoring 1.1 (after going live)
      How?
        • PRTG
        • XML-driven
        • Windows-based, so it has to make heavy use of Cygwin to watch Linux servers.
        • Integrates with AutoScout24 platform monitoring.
        • Cluster monitoring with checks for overall availability and the like.
      What?
        • System monitoring: CPU, load, memory, network, I/O, swap, disks
        • MongoDB-specific: availability, commands, connections, replica set state, flushing, locking, memory consumption
  46. MongoDB Conclusion: what's been important for us
      Conclusion:
      As long as replica sets are working fine, they are great. Watch their health and you won't get into trouble.
      Overall robustness could be further improved with better error handling and reporting from the MongoDB server.
      The C# driver needs some further tweaking to avoid accesses on arbiters. The current release fixes this but hasn't been introduced in production yet.
      When our primary data center is down, no primary can be elected in the secondary data center due to the missing majority. This was our design choice to have better control over primary election.
      Permissions need to be set in a more atomic fashion. Most of our team members come from an Oracle background, so they expect a lot.
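The arbiter issue mentioned for the C# driver comes down to choosing only data-bearing members as targets for queries. The sketch below shows that filter in Python for illustration. It is not the driver's actual logic; it assumes a status document shaped like the output of `replSetGetStatus`, and the host names in the example are hypothetical.

```python
DATA_BEARING_STATES = {"PRIMARY", "SECONDARY"}


def query_candidates(status):
    """Return the names of healthy members that hold data.

    Arbiters report the state "ARBITER" and are skipped, as are
    members that are down or still recovering.
    """
    return [
        member["name"]
        for member in status.get("members", [])
        if member.get("health") == 1 and member.get("stateStr") in DATA_BEARING_STATES
    ]


if __name__ == "__main__":
    sample = {"members": [
        {"name": "dc1-node1:27017", "health": 1, "stateStr": "PRIMARY"},
        {"name": "dc2-node1:27017", "health": 1, "stateStr": "SECONDARY"},
        {"name": "dc1-arbiter:27017", "health": 1, "stateStr": "ARBITER"},
    ]}
    print(query_candidates(sample))
```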
  47. MongoDB Outlook: what we are looking forward to
      Outlook:
      MongoDB 2.0 looks really promising for us. We are currently waiting for the first bugfix release and will then start our testing.
      Improved data center awareness is a big win for us.
      Replica set configuration with a minority in place is really useful for our failover scenario.
  48. Questions?
      jcthomas@autoscout24.de
      sgeib@autoscout24.de
      Looking for a great job as a DBA in one of the largest internet companies in Europe?
      Great! We are looking to hire DBAs.
      Have a look at our homepage or contact us directly.
