Introducing MongoDB in a multi-site HA environment

We gave this presentation at Mongo Munich on 10 October 2011. It covers the introduction of MongoDB at AutoScout24 and, above all, the durability and robustness testing done before launching a new site.

Published in: Technology

Transcript

  • 1. www.autoscout24.com
    Introducing MongoDB in a HA multi-site Environment
    Munich | 10.10.2011 | Sebastian Geib, Jean-Charles Thomas
  • 2. Jean-Charles Thomas
    Team Lead Unix Systems and Applications
    Sebastian Geib
    Database Administrator
    | Autoscout24 MongoDB HA environment | Sebastian Geib, J.C. Thomas
  • 3. Index
    1 About AutoScout24
    2 Why MongoDB?
    3 MongoDB architecture at AutoScout24
    4 Testing MongoDB
    5 MongoDB backup/restore
    6 Monitoring MongoDB
    7 Conclusion
  • 4. AutoScout24
    Who are we?
    5.4 million users
    14 countries
    1.9 million cars
  • 5. AutoScout24 KPI
    Numbers, data and facts for the overall AutoScout24 IT
    Two separate data centers in Germany
    >1000 servers, 10 load balancers, 25 firewalls, 60 development servers
    16 storage systems with a raw capacity of 800 TB
    11.2 billion total requests (PI + grabbers and bots) per month
    58 million image files for 1.9 million cars
    2.1 Gbit/s peak traffic
    180 TB data volume per month
    Four broadband providers with 13 Gbit/s in total
    [Diagram: network topology of the two data centers (DC 1, DC 2). Components shown: Internet (AS44355), global traffic management, load balancers, firewalls, backbone routers, application servers, database servers]
  • 7. Why MongoDB at AutoScout24?
    New product in 2011: portal for car inspection and services
    Completely new application development from scratch for the front and back ends
    Let's use what we dreamed of!
    Initial database requirements:
    • Scale for large quantities of data
    • Highly available across data centers
    • Flexible database changes (avoid the DBAs as much as possible!)
    • MapReduce functions
    • Easy management
    MongoDB was chosen as the best product
    Product launch was September 2011
    http://werkstatt.autoscout24.de/
  • 9. Mongo Architecture
  • 10. Mongo Architecture
    Replica set across two data centers: primary and secondary.
    All four nodes are actively used by the application.
    Primary data center split into two fire areas.
    In the primary data center, both the primary and the secondary node can assume the role of primary automatically.
    In the secondary data center, both secondary nodes can only be promoted to primary manually, to avoid split-brain situations.
    Currently running MongoDB 1.8.1
    All servers virtualized using VMware ESX 4.1
    2 cores, 4 GB RAM, 100 GB HDD per server
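The election rules on this slide can be expressed through member priorities in the replica set configuration. The following is a minimal sketch, not the configuration actually used at AutoScout24: the host names and the set name `rs0` are hypothetical, and arbiters are left out because the slides do not say where they were placed. It only builds the configuration document as a plain dictionary; members with `priority` 0 are never elected automatically.

```python
# Hypothetical host names; the slides do not give the real ones.
PRIMARY_DC = ["mongo-dc1-a.example:27017", "mongo-dc1-b.example:27017"]
SECONDARY_DC = ["mongo-dc2-a.example:27017", "mongo-dc2-b.example:27017"]


def build_replica_set_config(name, primary_dc, secondary_dc):
    """Build a replica set configuration document as a plain dict."""
    members = []
    for host in primary_dc:
        # priority 1: may be elected primary automatically.
        members.append({"_id": len(members), "host": host, "priority": 1})
    for host in secondary_dc:
        # priority 0: never elected automatically; becoming primary
        # requires a manual reconfiguration by an operator.
        members.append({"_id": len(members), "host": host, "priority": 0})
    return {"_id": name, "members": members}


def electable_hosts(config):
    """Return the hosts that are allowed to win an election."""
    return [m["host"] for m in config["members"] if m.get("priority", 1) > 0]


config = build_replica_set_config("rs0", PRIMARY_DC, SECONDARY_DC)
print(electable_hosts(config))
```

Only the two hosts in the primary data center are listed as electable, which mirrors the design choice of keeping automatic failover inside one data center.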
  • 12. MongoDB Robustness Testing
    On-server tests
    Running out of disk space on data volume while writing
    • On primary node only: crashed the whole replica set.
    • On all nodes: error messages appeared in the log, but the client received no feedback.
    Running out of disk space on data volume while reading
    • No significant impact.
    Removing volume while writing
    • The primary switched to another host; both the insert and the replica set were broken.
    Removing volume while reading
    • The open query didn't return. Further queries were handled by other set members.
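Since a full data volume on the primary took down the whole replica set in these tests, an early warning on free disk space is an obvious safeguard. The sketch below is an illustration only, not the check used by the authors: the thresholds are assumptions, and the return values follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
import shutil

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios plugin return codes


def classify_free_space(free_bytes, total_bytes, warn_pct=20.0, crit_pct=10.0):
    """Map the free-space percentage to a status code (thresholds are assumptions)."""
    free_pct = 100.0 * free_bytes / total_bytes
    if free_pct < crit_pct:
        return CRITICAL
    if free_pct < warn_pct:
        return WARNING
    return OK


def check_volume(path, warn_pct=20.0, crit_pct=10.0):
    """Check the volume that contains path, e.g. the MongoDB data directory."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return classify_free_space(usage.free, usage.total, warn_pct, crit_pct)


# "." is used here so the example runs anywhere; point it at the data volume instead.
print(check_volume("."))
```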
  • 14. MongoDB HA Testing 1
    Replica set tests
    Primary node failing while writing
    • Without safe mode enabled, it takes 9 seconds until a new primary is elected.
    • With safe mode enabled, the failover takes 13 seconds.
    • After reboot, the former primary becomes a working secondary.
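A failover window of 9 to 13 seconds means a client has to tolerate a short period without a primary. One way to do that is to retry failed writes for a bounded time, sketched below. This is a generic pattern, not the authors' client code: `NoPrimaryError` is a hypothetical stand-in for whatever error the driver raises, and the 30-second budget is an assumption chosen to exceed the measured failover times. Retrying is only safe if repeating the write cannot cause duplicates.

```python
import time


class NoPrimaryError(Exception):
    """Hypothetical stand-in for the driver error seen while no primary exists."""


def write_with_retry(write, max_wait_s=30.0, delay_s=1.0, sleep=time.sleep):
    """Call write() until it succeeds or max_wait_s of retry delay is used up."""
    waited = 0.0
    while True:
        try:
            return write()
        except NoPrimaryError:
            if waited >= max_wait_s:
                raise  # give up: the failover took longer than the budget
            sleep(delay_s)
            waited += delay_s


# Demo with a fake write that fails three times, as if an election were running.
attempts = {"count": 0}


def flaky_write():
    attempts["count"] += 1
    if attempts["count"] <= 3:
        raise NoPrimaryError("no primary available")
    return "ok"


# The sleep function is replaced so the demo does not actually wait.
result = write_with_retry(flaky_write, sleep=lambda seconds: None)
print(result, attempts["count"])
```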
  • 17. MongoDB HA Testing 1
    Replica set tests
    Secondary node failing while reading
    • It takes 9 seconds for the remaining replica set to realize the node is gone.
    Arbiter failing
    • No impact whatsoever.
    • Majority remains intact and replica set is working properly.
  • 19. MongoDB HA Testing 2
    Testing replica set in both data centers
    Primary and secondary nodes failing in main data center while writing
    • Test tool crashes and cannot write anymore.
    • Cluster remains without a primary.
    • Reads are handled properly.
    Arbiter failing (in both data centers)
    • No impact.
    • Replica set still works fine because the majority remains in place.
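Both results follow from the majority rule: a primary can only exist while more than half of the voting members can reach each other. The arithmetic can be sketched as follows. The vote distribution used in the example (three voters in the main data center, two in the other) is an assumption for illustration; the slides do not state the exact number of voting members.

```python
def has_majority(total_voters, reachable_voters):
    """A primary can only be elected while a strict majority of voters is reachable."""
    return reachable_voters > total_voters // 2


# Assumed layout: 5 voters in total, 3 in the main data center, 2 in the other.
TOTAL_VOTERS = 5

# Main data center lost: only 2 of 5 voters remain, so no primary can be elected.
print(has_majority(TOTAL_VOTERS, 2))

# A single arbiter lost: 4 of 5 voters remain, so the set keeps working.
print(has_majority(TOTAL_VOTERS, 4))
```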
  • 24. MongoDB Backup
    Backing up data and getting into trouble
    Testing and preparing backup and restore was the most boring task.
    Long waiting with large sets of test data.
    Different attempts:
    • LVM snapshot: working fine. Restore a bit more complicated.
    • Dump: easier to restore and to extract specific data from a restore.
    For our current data volume, mongodump is the best choice for us.
    Locking is an issue (verify that your locks are released after the backup or you'll be in trouble).
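The warning about locks that are not released suggests wrapping the backup step so that the unlock always runs, even when the backup itself fails. The sketch below shows that pattern with a context manager. It is an illustration under assumptions: `lock` and `unlock` are hypothetical callables standing in for whatever locking mechanism the backup uses, and `FakeNode` is an in-memory stand-in, not a driver object.

```python
from contextlib import contextmanager


@contextmanager
def write_lock(lock, unlock):
    """Hold a write lock for the duration of the with-block, always releasing it."""
    lock()
    try:
        yield
    finally:
        unlock()  # runs even if the backup step raises


class FakeNode:
    """In-memory stand-in for a database node; not a real driver object."""

    def __init__(self):
        self.locked = False

    def lock(self):
        self.locked = True

    def unlock(self):
        self.locked = False


node = FakeNode()
try:
    with write_lock(node.lock, node.unlock):
        raise RuntimeError("backup failed halfway")
except RuntimeError:
    pass
print(node.locked)
```

After a backup run, it is still worth checking the lock state on the server itself, as the slide advises.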
  • 28. MongoDB Restore
    Testing restores on a full replica set
    Restore of full database (test size 70 GB)
    • Could not be restored in one go because the secondaries became stale, although the oplog size had already been increased considerably.
    • Splitting the restore into three 40-minute chunks made it possible.
    • A stale secondary could be restored within 30 minutes by removing its data and letting it resync.
    Restore of one secondary/passive after failure or becoming stale
    • No surprises.
    • It took 30 minutes to get it back up and running.
  • 33. MongoDB Monitoring
    Monitoring 1.0 (during testing) and some pitfalls
    How?
    • Centreon
    • Nagios-based
    • Combining Nagios and Munin plugins to have nice charting and alerting in the same place.
    What?
    • Basically everything that could be relevant:
    • CPU, load, memory, network, I/O, disks
    • MongoDB-specific: commands, connections, replica set state, flushing, locking, memory consumption, data file size
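Of the MongoDB-specific checks, replica set state is the one most closely tied to the failure tests above. A minimal sketch of such a check follows. It assumes a status document shaped like the output of the `replSetGetStatus` command (a `members` list with `name`, `health` and `stateStr`); the sample document is hand-written for illustration, not captured from a real server, and the mapping to Nagios-style codes is an assumption rather than the authors' plugin.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # Nagios plugin return codes


def evaluate_replica_set(status):
    """Turn a replSetGetStatus-like document into (code, message)."""
    members = status.get("members", [])
    # A missing health field is treated as healthy.
    down = [m["name"] for m in members if m.get("health", 1) != 1]
    has_primary = any(
        m.get("stateStr") == "PRIMARY" and m.get("health", 1) == 1
        for m in members
    )
    if not has_primary:
        return CRITICAL, "no primary"
    if down:
        return WARNING, "members down: " + ", ".join(down)
    return OK, "all members healthy"


# Hand-written sample with hypothetical host names.
sample = {
    "set": "rs0",
    "members": [
        {"name": "mongo-dc1-a.example:27017", "health": 1, "stateStr": "PRIMARY"},
        {"name": "mongo-dc1-b.example:27017", "health": 1, "stateStr": "SECONDARY"},
        {"name": "mongo-dc2-a.example:27017", "health": 0, "stateStr": "(not reachable/healthy)"},
    ],
}
print(evaluate_replica_set(sample))
```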
  • 38. MongoDB Monitoring
    Monitoring 1.1 (after going live)
    How?
    • PRTG
    • XML-driven
    • Windows-based, so it needs to make heavy use of Cygwin to watch Linux servers.
    • Integrates with AutoScout24 platform monitoring.
    • Cluster monitoring with checks for overall availability and the like.
    What?
    • System monitoring: CPU, load, memory, network, I/O, swap, disks
    • MongoDB-specific: availability, commands, connections, replica set state, flushing, locking, memory consumption
  • 46. MongoDB Conclusion
    What's been important for us
    Conclusion:
    As long as replica sets are working fine, they are great. Watch their health and you won't get into trouble.
    Overall robustness could be further improved with better error handling and reporting from the MongoDB server.
    The C# driver needs some further tweaking to avoid accesses on arbiters. The current release fixes this but hasn't been introduced in production yet.
    When our primary data center is down, no primary can be elected in the secondary data center due to the missing majority. This was our design choice to have better control over primary election.
    Permissions need to be set in a more atomic fashion. Most of our team members come from an Oracle background, so they expect a lot.
  • 47. MongoDB Outlook
    What we are looking forward to
    Outlook:
    MongoDB 2.0 looks really promising for us. We are currently waiting for the first bugfix release and will then start our testing.
    Improved data center awareness is a big win for us.
    Replica set configuration with a minority in place is really useful for our failover scenario.
  • 48. Questions?
    jcthomas@autoscout24.de
    sgeib@autoscout24.de
    Looking for a great job as a DBA in one of the largest internet companies in Europe?
    Great! We are looking to hire DBAs.
    Have a look at our homepage or contact us directly.
