How We Fixed Our MongoDB Problems

  1. Eric Lubow, @elubow, elubow@simplereach.com, #MongoDBDays: How We Fixed Our MongoDB Problems
  2. Overview: The Secret; SimpleReach; Usage Patterns; Tools; Architecture; Implementation; Questions
  3.
  4. The 2 Truths
  5. The Real Truth: Even with the right tools, 80% of the work of building a big data system is acquiring and refining the raw data into usable data.
  6.
  7.
  8. SimpleReach: Millions of URLs per day; over 1.25 billion page views per month; 500 million events per day (~6k events/second); auto-scale 125-160 machines depending on traffic; built a predictive measurement algorithm for the social web
  9. And It Goes Like This... [Diagram: C*, Vertica]
  10.
  11. Why Mongo? Fast and easy prototyping; low barrier to entry; B-tree indexes and range queries; aggregation; everything is JSON; TTLs; MongoID
  12. Goals: High availability; speed; repeatability; data accuracy (across storage engines); clients should need minimal knowledge of the architecture; controlled data flow patterns; control of data set size; restore capabilities for non-ephemeral data
  13. Availability and Speed: Internal service architecture; a mongos on every server that talks to Mongo; server distribution across data centers; the latest version isn't always the greatest version; understand how usage patterns affect Mongo
  14. Repeatability - Sharded Replica Set [Diagram: hosts SHARD0000A and SHARD0000B, each running mongos and mongod (primary and secondary), plus a mongod arbiter; base image layout built in layers: Base AMI (Amazon Linux), Organizational Base (monitoring, users), Application Group (mongod)]
  15. Availability - Architecture Distribution [Diagram: US-EAST-1a: MONGO-SHARD-0001-B, MONGO-SHARD-0000-A, CASSANDRA-0001, CASSANDRA-0010, REDIS-0001A, VERTICA-0001, iAPI-0001; US-EAST-1b: MONGO-SHARD-0002-B, MONGO-SHARD-0001-A, CASSANDRA-0002, CASSANDRA-0011, REDIS-0001B, iAPI-0002; US-EAST-1e: MONGO-SHARD-0002-A, MONGO-SHARD-0000-B, CASSANDRA-0003, CASSANDRA-0012, VERTICA-0003, iAPI-0003; also VERTICA-0002]
  16. The Schrute of the Problem
  17. Releases. Reasons why I update software: because I want the latest version; to get rid of the reminder.
  18. Usage Patterns: Mongos uses TCP-based flow control; separate DBs to deal with DB-level locking; consistent access patterns; schema design; proper indexing; avoid scatter/gather queries and aim for targeted ones
  19. Consistent Access Patterns: realtime_score, ('score', 'realtime'), score.realtime, srt
  20. Schema Design: Randomly pre-populate consistent document structures; use $setOnInsert to pre-populate; shard keys; separate DBs to deal with DB-level locking (volume based); TTL; hashed shard keys; $inc when possible, $set is expensive
  21. Hourly Stats Documents
      {
        "_id": BinData(5, "OWQ5NzQ0ZjgxZGUwYTdmMzM3Y2U0NDkzZGFlMGY0NTc="),
        "account_id": ObjectId("5165905f4240cf9182000069"),
        "hour": ISODate("2013-06-02T23:00:00Z"),
        "content_id": "56250f88530ecc21233be5d2384679b2",
        "totals": {
          "facebook_likes": 0,
          "facebook_shares": 1,
          "facebook_referrals": 0,
          "pageviews": 10134,
          "twitter_tweets": 16,
          "twitter_referrals": 3045,
          "social_actions": 17,
          "social_referrals": 3045
        }
      }
  22. Daily Stats Documents
      {
        "_id": BinData(5, "OWQ5NzQ0ZjgxZGUwYTdmMzM3Y2U0NDkzZGFlMGY0NTc="),
        "account_id": ObjectId("5165905f4240cf9182000069"),
        "day": ISODate("2013-06-02T00:00:00Z"),
        "content_id": "56250f88530ecc21233be5d2384679b2",
        "totals": { "pageviews": 10134, "twitter_tweets": 16, "social_actions": 17 },
        "00": { "pageviews": 283, "twitter_tweets": 10, "social_actions": 10 },
        "01": { "pageviews": 9851, "twitter_tweets": 6, "social_actions": 6 }
      }
  23. Path of a Packet [Diagram: Internet, Fire Hose, EC, SC, API, Queue, Consumers, Internal API; data stores Solr, C*, Mongo, Redis, Vertica]
  24. NSQ by Bit.ly: Distributed and decentralized topology; guaranteed at-least-once delivery; multicast-style message routing; runtime discovery for consumers to find producers; maintenance windows with no downtime; ephemeral channels for testing
  25. Controlled Data Flow [Diagram: Social Event Collector; Social Data; NSQ (multicast); Batch & Write Raw Data; Calculate Score; Write; Batch & Write Processed Data; NSQ between stages]
  26. Problems?
  27. Big Architectures for Big Data (Eric Lubow, @elubow, #Cassandra13): Service Architecture [Diagram: Internal API, Solr, Real-time, C*, C*, Vertica]
  28. Anatomy of an Endpoint [Diagram: endpoints Hourly Content and TenMinute Content, each backed by Mongo, Mongo, Vertica, C*, C*; querying machines reach the stores through Helenus (C*), PyMongo (Mongo), and PyVertica (Vertica)]
  29. Endpoint Breakout Advantages: Availability; consistent access patterns; minimal-downtime changes; smaller code deploys; non-monolithic code base; no async necessary
  30. DevOps: Monitor with Nagios, Statsd, and CloudWatch; manage with Chef, OpsWorks, cSSHx, and Vagrant; know the failure cases; turn off the balancer during backups; restart EVERYTHING on upgrade; extensive use of AWS
  31. Cloud Specifics: blockdev --setra 256; use ephemeral storage, not EBS volumes; use MMS; CloudWatch metrics are important and easily scriptable; don't use spot instances, but always expect instance loss; kernel tuning
  32. Summary: Understand your usage patterns; know the common failure cases; architecture distribution; homogeneous distribution; monitoring & automation
  33. We're Hiring (Ask about Food Coma Fridays)
  34. Questions are guaranteed in life. Answers aren't. Thank you. Eric Lubow, @elubow, elubow@simplereach.com, #Cassandra13
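
The rates on slide 8 can be sanity-checked with a little arithmetic. This minimal Python sketch computes averages only, since real traffic peaks above the mean. The 30-day month used for the page-view rate is our assumption, not a figure from the talk. 500 million events over 86,400 seconds works out to roughly 5,800 events per second, in line with the ~6k/second quoted.

```python
# Back-of-the-envelope average rates for the SimpleReach figures on slide 8.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
DAYS_PER_MONTH = 30             # assumption: a 30-day month


def per_second(count: float, seconds: float) -> float:
    """Average rate if `count` items were spread evenly over `seconds`."""
    return count / seconds


events_per_second = per_second(500_000_000, SECONDS_PER_DAY)
pageviews_per_second = per_second(1_250_000_000, DAYS_PER_MONTH * SECONDS_PER_DAY)

if __name__ == "__main__":
    print(f"events/s:    {events_per_second:,.0f}")     # ~5,787
    print(f"pageviews/s: {pageviews_per_second:,.0f}")  # ~482
```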
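
Slide 18 advises avoiding scatter/gather and aiming for targeted queries. The difference comes down to whether mongos can tell from the filter which shard holds the data. The helper below is a hypothetical, deliberately conservative sketch: it reports a single-shard query only when every shard-key field is matched by plain equality. The function name and the example shard key ("account_id", "content_id") are illustrative assumptions, since the talk does not say which shard keys SimpleReach used.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any


def is_single_shard_query(query: Mapping[str, Any], shard_key: Sequence[str]) -> bool:
    """Conservatively report whether a filter pins down a single shard.

    Returns True only when every shard-key field appears at the top level of
    the filter with a plain value (equality). A missing field or an operator
    document such as {"$gt": ...} returns False.
    """
    for field in shard_key:
        if field not in query:
            return False  # shard key unconstrained: mongos may have to ask every shard
        value = query[field]
        if isinstance(value, Mapping) and any(str(k).startswith("$") for k in value):
            return False  # operator match rather than plain equality (treated conservatively)
    return True


# Hypothetical compound shard key, for illustration only.
SHARD_KEY = ("account_id", "content_id")

if __name__ == "__main__":
    print(is_single_shard_query({"account_id": 42, "content_id": "abc"}, SHARD_KEY))  # True
    print(is_single_shard_query({"content_id": "abc"}, SHARD_KEY))                    # False
```

A False result does not prove a full broadcast. With a ranged shard key, mongos can still route range or prefix filters to a subset of shards.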
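
Slides 20 and 21 pair two ideas: pre-populate a consistent document structure with $setOnInsert, and prefer $inc over $set. One way to combine them in a single upsert is to $inc the counters that changed and $setOnInsert a zero for every other counter. The two operators must never name the same path, or MongoDB rejects the update as a conflict.

This is a sketch under assumptions. The helper name is ours, and so is the choice to key on account_id, content_id, and hour. The slide's _id is a BinData value whose derivation isn't shown. On insert, MongoDB copies the filter's equality fields into the new document.

```python
from datetime import datetime
from typing import Any, Dict, Mapping, Tuple

# Counter names copied from the hourly stats document on slide 21.
HOURLY_COUNTERS = (
    "facebook_likes", "facebook_shares", "facebook_referrals", "pageviews",
    "twitter_tweets", "twitter_referrals", "social_actions", "social_referrals",
)


def hourly_upsert(account_id: Any, content_id: str, when: datetime,
                  increments: Mapping[str, int]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build (filter, update) for an upsert into a hypothetical hourly stats collection.

    Counters that changed are bumped with $inc; every other counter gets a 0
    via $setOnInsert, so a freshly inserted document already has the full
    "totals" structure. No path appears under both operators.
    """
    unknown = set(increments) - set(HOURLY_COUNTERS)
    if unknown:
        raise ValueError(f"unknown counters: {sorted(unknown)}")

    hour = when.replace(minute=0, second=0, microsecond=0)  # truncate to the hour bucket
    query = {"account_id": account_id, "content_id": content_id, "hour": hour}

    update: Dict[str, Any] = {}
    inc = {f"totals.{name}": n for name, n in increments.items()}
    if inc:
        update["$inc"] = inc
    zeros = {f"totals.{name}": 0 for name in HOURLY_COUNTERS if name not in increments}
    if zeros:
        update["$setOnInsert"] = zeros
    return query, update
```

With PyMongo, the pair would be passed to something like collection.update_one(query, update, upsert=True). If the document already exists, $setOnInsert is ignored and only the $inc applies.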
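
The daily document on slide 22 keeps a "totals" sub-document alongside per-hour sub-documents keyed "00" through "23". The minimal sketch below shows how one $inc can keep both in step. The helper name and filter fields are assumptions. Because the update touches a single document, MongoDB applies it atomically, with no read-modify-write in the application.

```python
from datetime import datetime
from typing import Any, Dict, Mapping, Tuple


def daily_increment(account_id: Any, content_id: str, when: datetime,
                    increments: Mapping[str, int]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build (filter, update) that bumps both the daily totals and the hour bucket.

    Mirrors the slide 22 layout: "totals" plus one sub-document per hour,
    keyed "00".."23". Each counter is incremented under both paths.
    """
    day = when.replace(hour=0, minute=0, second=0, microsecond=0)
    bucket = f"{when.hour:02d}"  # e.g. hour 1 -> "01"
    inc: Dict[str, int] = {}
    for name, n in increments.items():
        inc[f"totals.{name}"] = n
        inc[f"{bucket}.{name}"] = n
    query = {"account_id": account_id, "content_id": content_id, "day": day}
    return query, {"$inc": inc}
```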
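
Slide 24 lists multicast-style routing among NSQ's properties. In NSQ, a message published to a topic is copied to every channel on that topic, and each channel's copy goes to one of that channel's consumers.

The toy class below models only that routing rule, in memory. It is not the NSQ client API. It has no network, persistence, or requeue/at-least-once semantics, and it uses simple round-robin where NSQ actually balances by consumer readiness. The class and channel names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[str], None]


class ToyTopic:
    """In-memory model of NSQ-style topic/channel routing (not the NSQ API).

    Every channel receives its own copy of each published message (multicast);
    consumers sharing a channel split that channel's copies, here round-robin.
    """

    def __init__(self) -> None:
        self._channels: Dict[str, List[Handler]] = defaultdict(list)
        self._next: Dict[str, int] = defaultdict(int)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._channels[channel].append(handler)

    def publish(self, message: str) -> None:
        for channel, handlers in self._channels.items():
            i = self._next[channel] % len(handlers)  # pick the next consumer on this channel
            self._next[channel] += 1
            handlers[i](message)
```

For example, subscribing a raw-data writer and a scorer on separate channels gives each one every message. This matches the fan-out in the controlled data flow on slide 25. Adding a second consumer to one channel splits that channel's load instead.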
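
The "Batch & Write" stages on slide 25 buffer messages and write them in groups rather than one at a time. This is a minimal, library-agnostic sketch that assumes a size cap and a maximum age as the flush triggers. The thresholds and the class name are illustrative, not SimpleReach's values. flush_fn stands in for the actual write, for example a PyMongo insert_many call.

```python
import time
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class BatchWriter(Generic[T]):
    """Buffer items and hand them to `flush_fn` in batches.

    Flushes when the buffer reaches `max_size` items, or when an add() finds
    the oldest buffered item is at least `max_age` seconds old.
    """

    def __init__(self, flush_fn: Callable[[List[T]], None], max_size: int = 500,
                 max_age: float = 1.0, clock: Callable[[], float] = time.monotonic) -> None:
        if max_size < 1:
            raise ValueError("max_size must be >= 1")
        self._flush_fn = flush_fn
        self._max_size = max_size
        self._max_age = max_age
        self._clock = clock  # injectable so tests can control time
        self._buf: List[T] = []
        self._started = 0.0

    def add(self, item: T) -> None:
        if not self._buf:
            self._started = self._clock()  # age is measured from the first buffered item
        self._buf.append(item)
        if len(self._buf) >= self._max_size or self._clock() - self._started >= self._max_age:
            self.flush()

    def flush(self) -> None:
        if self._buf:
            batch, self._buf = self._buf, []  # swap first so flush_fn gets a stable list
            self._flush_fn(batch)
```

The age check only runs when an item arrives. A consumer should therefore also call flush() on a timer and at shutdown, so a partial batch is not stranded.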
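
blockdev --setra takes its value in 512-byte sectors. The --setra 256 on slide 31 therefore means 128 KiB of readahead, not 256 bytes or 256 KiB. A trivial conversion helper makes the units explicit:

```python
SECTOR_BYTES = 512  # blockdev --setra / --getra count readahead in 512-byte sectors


def readahead_kib(sectors: int) -> float:
    """Convert a blockdev readahead value (in sectors) to KiB."""
    return sectors * SECTOR_BYTES / 1024


if __name__ == "__main__":
    print(readahead_kib(256))  # 128.0
```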
