Mongodb meetup

Presentation on how Chartbeat uses MongoDB on EC2

Published in: Technology, Business

  1. 1. MongoDB & EC2: A Love Story?
     Eytan Daniyalzade @daniyalzade
     http://bit.ly/cb_mongodb_meetup
  2. 2. Contents
     ● Chartbeat
     ● Architecture
     ● MongoDB & EC2 Challenges
     ● Happy Ending: (MongoDB ? EC2)
     ● Takeaways
  3. 3. chartbeat
  4. 4. Chartbeat: real-time analytics service
     ● 18-person startup in New York
     ● part of Betaworks
     ● peaking at just under 5M concurrents daily
       ○ up from 1M in July 2010
  5. 5. What chartbeat Provides
     ● real-time view of site performance
       ○ top pages
       ○ new/returning visitors
       ○ traffic flow
         ■ where are people coming from
         ■ where are people going to
     ● historic replay for the last 30 days
  6. 6. the architecture
  7. 7. Architecture, Browser
     Part 1:
       <head>
       <script type="text/javascript">var _sf_startpt=(new Date()).getTime()</script>
       ...
     Part 2:
       ...
       function loadChartbeat() {
         // insert script tag
       }
       window.onload = loadChartbeat;
       </body>
     (highly simplified)
     Ping is standard beacon logic, i.e. loading a 1x1 image.
  8. 8. Architecture, Backend
     ● custom libevent-based C backend
       ○ real-time collection and aggregation
     ● real-time system in-memory only
     ● background queue jobs snapshot every x minutes
       ○ Gearman
     ● historical data
       ○ mostly in MongoDB
  9. 9. Why Chartbeat uses MongoDB
     ● Pure JSON all along
       ○ Live API
       ○ Historical data
       ○ No mapping back and forth
     ● Fast Inserts (fire and forget)
     ● Flexible Schema
  10. 10. Why Chartbeat uses EC2
      ● Elastic Capacity
      ● No trips to datacenter
      ● EBS snapshots
  11. 11. Chartbeat & MongoDB & EC2 (1)
      ● 3 Clusters
        ○ 1 for each product
        ○ 1 as a caching layer
        ○ 2-4 instances/cluster
      ● m2.2xlarge
        ○ 34.2 GB memory
        ○ Ubuntu 10.04
        ○ RAID0 x 4 - 1 TB volumes
      ● Dedicated Snapshot Server
        ○ Shared among clusters
        ○ Serves as an arbiter as well
  12. 12. Chartbeat & MongoDB & EC2 (2) Cluster View
  13. 13. MongoDB & EC2 Challenges
      ● Instances disappear
        ○ MongoDB can have long recovery operations
        ○ MongoDB is (was) not ACID compliant. An unclean shutdown could corrupt your data.
      ● Poor IO performance on EBS
        ○ MongoDB has a global read/write lock
      ● Variable IO performance on EBS
        ○ Could cause replication issues
  14. 14. Question: ?? ?
  15. 15. Disappearing Instances
  16. 16. Instances Disappearing - Master/Slave
      ● Down-time :(
      ● Slave-promotion = headache
        ○ New instance
        ○ Copy oplog
        ○ Code change
        ○ Long/manual/error prone
  17. 17. Instances Disappearing - Replica Sets
      ● No down-time :) yay!
      ● Automatic failover on writes
      ● Eventual failover on reads
      ● No code change
  18. 18. Instances Disappearing - Replica Sets (caveats)
      ● pymongo driver reads/writes from primary
        ○ pymongo 2.1 will fix this
      ● chartbeat pymongo driver
        ○ based on MasterSlaveConnection
        ○ writes to primary
        ○ distributes reads among secondaries
        ○ automatic failover
        ○ eventual read re-distribution
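The routing behaviour listed on this slide (writes to the primary, reads spread over secondaries, failover, eventual re-distribution) can be sketched in plain Python. This is not Chartbeat's driver and it does not use pymongo; `ReplicaRouter` and all of its method names are hypothetical, and nodes are represented by plain strings.

```python
class ReplicaRouter:
    """Hypothetical router: writes go to the primary, reads rotate over secondaries."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = list(secondaries)
        self._next = 0  # round-robin cursor for reads

    def node_for_write(self):
        # All writes are sent to the primary.
        return self.primary

    def node_for_read(self):
        # Fall back to the primary when no secondary is available.
        if not self.secondaries:
            return self.primary
        node = self.secondaries[self._next % len(self.secondaries)]
        self._next += 1
        return node

    def mark_down(self, node):
        # Read failover: stop routing reads to a secondary that disappeared.
        if node in self.secondaries:
            self.secondaries.remove(node)

    def mark_up(self, node):
        # Eventual re-distribution: a recovered node rejoins the read rotation.
        if node != self.primary and node not in self.secondaries:
            self.secondaries.append(node)
```

In a real driver the `mark_down` / `mark_up` calls would be triggered by connection errors and health checks; here they are called by hand to keep the sketch self-contained.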
  19. 19. Instances Disappearing - Fact of Life
      ● Accept this fact of life
      ● Always snapshot
        ○ Dedicated snapshot server
        ○ Hidden, i.e. no reads
      ● Automate everything
        ○ puppet
          ■ New instance from scratch within a minute
        ○ python-boto
          ■ Script all EC2 interaction
          ■ new_instance.py
          ■ mount_volumes_from_snap.py -o iid -n iid
          ■ snapshot_mongo.py
  20. 20. Instances Disappearing - Caveats
      ● New volumes - slow!!!
        ○ EBS loads blocks lazily
      ● Warm up EBS & File Cache before use
        ○ Options
          ■ Slowly direct the reads (app by app)
          ■ Run cache warm-up scripts
        ○ Not automated currently
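A cache warm-up script of the kind mentioned above could be as simple as reading every data file once from start to end, so that each lazily loaded EBS block is fetched and the OS file cache is populated. The sketch below is an assumption about what such a script might look like, not the one used at Chartbeat; `warm_file` and `warm_directory` are hypothetical names.

```python
import os


def warm_file(path, chunk_size=1024 * 1024):
    """Read a file sequentially so every block is touched once; return bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def warm_directory(root):
    """Warm every regular file under root, e.g. a database data directory."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += warm_file(os.path.join(dirpath, name))
    return total
```

Running this before an instance joins the read mix trades a one-off sequential read for fewer cold-block stalls later. It only helps the file cache to the extent that the data fits in memory.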
  21. 21. Poor IO Performance on EBS
  22. 22. Poor IO Performance on EBS
      ● XFS & RAIDing help, but:
      ● Disk IO varies over time
      ● MongoDB holds global lock on writes
      ● Query of death
        ○ Grinding halt if not careful
  23. 23. Case Study: Historical Data
      ● For historical data, we store time series.
        {
          key: <key>
          ts: <key>
          values: {metric1: int1, metric2: int2}
          meta: {}
        }
      ● High Insert Rate vs Fast Historical Read
        ○ Optimize reads or writes?
      ● Fast inserts: ~1 MB/sec (through append only)
        ○ No disk-seek
      ● Historical reads: painfully slow
  24. 24. Faster Reads Through Cache DB
      ● Avoid reading from disk
      ● Favor reads over writes
      ● Aim for disk & memory locality
        {
          day_tskey:<key>
          values: {metric1: list(int), metric2: list(int)}
        }
      ● Data for historical reads resides together
      ● .append() to list could cause disk fragmentation
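To illustrate the move from one document per sample to one document per key and day, here is a minimal sketch using plain dictionaries. The exact field names in the cache schema are not recoverable from the slide, so `day_ts` and `key` are assumptions, as are the function names and the choice of UTC day boundaries.

```python
SECONDS_PER_DAY = 86400


def day_start(ts):
    """Round a Unix timestamp down to the start of its UTC day."""
    return ts - ts % SECONDS_PER_DAY


def fold_samples(samples):
    """Group per-sample documents into one document per (key, day).

    Each input looks like {"key": ..., "ts": ..., "values": {metric: int}}.
    Inputs are expected in time order so the appended lists stay ordered.
    """
    docs = {}
    for sample in samples:
        day_ts = day_start(sample["ts"])
        doc = docs.setdefault(
            (sample["key"], day_ts),
            {"key": sample["key"], "day_ts": day_ts, "values": {}},
        )
        for metric, value in sample["values"].items():
            # This append is the growth pattern the slide warns about.
            doc["values"].setdefault(metric, []).append(value)
    return list(docs.values())
```

A historical read for one key and one day then touches a single document instead of one document per sample, which is the locality the slide is aiming for.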
  25. 25. Avoid Fragmentation w/ Preallocation
      ● Fragmentation causes:
        ○ Inefficient disk usage
        ○ Slower writes (due to block allocation)
      ● Preallocate daily arrays instead
        ○ Pros:
          ■ No fragmentation
          ■ Write causes no change in data size
        ○ Cons:
          ■ Wasteful (we don't know keys ahead of time)
          ■ Requires heavy disk IO, ~7 MB/sec (~60 Mbits/sec on EBS)
      ● Conclusion: spread preallocation over 1 hour
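A rough sketch of preallocation follows: each daily array is created at full length, later writes overwrite a slot in place, and document creation is staggered across an hour rather than done in one burst. The sampling interval is not given in the slides, so `interval=300` is an assumption, as are the field names `key` and `day_ts` and every function name.

```python
SECONDS_PER_DAY = 86400


def preallocated_doc(key, day_ts, metrics, interval=300):
    """Build a daily document whose arrays already have their final length."""
    slots = SECONDS_PER_DAY // interval
    return {"key": key, "day_ts": day_ts,
            "values": {m: [0] * slots for m in metrics}}


def record(doc, ts, metric, value, interval=300):
    """Overwrite the slot for ts in place; the document never grows."""
    series = doc["values"][metric]
    slot = (ts - doc["day_ts"]) // interval
    if not 0 <= slot < len(series):
        raise ValueError("timestamp outside this document's day")
    series[slot] = value


def spread_schedule(keys, window=3600):
    """Give each key a start offset in seconds, evenly spaced across window."""
    n = len(keys)
    return {key: (i * window) // n for i, key in enumerate(keys)}
```

Staggering the creation times is what turns the preallocation burst into a steady background load. For scale, 7 MB/sec is 56 Mbit/sec, which the slide rounds to ~60.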
  26. 26. EC2 Performance is Unpredictable
  27. 27. EC2 Unpredictability - Challenges
      ● Resource contention in virtualized environment
      ● EBS and Network IO performance varies drastically
      ● RAID0 over 4 disks = 4 x risk
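The "4 x risk" figure can be checked with a short calculation. Assume, as a simplification not stated in the slides, that each volume fails independently with the same small probability $p$ over some period. A RAID0 array is lost if any one of its four volumes is lost:

```latex
P(\text{array lost}) = 1 - (1 - p)^4 = 4p - 6p^2 + 4p^3 - p^4 \approx 4p \quad \text{for small } p
```

The same reasoning applies to performance: one slow volume drags down the whole stripe, so the chance of hitting a bad volume grows with the number of volumes.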
  28. 28. Heavy Monitoring (1)
      ● Track individual disk performance over time
      ● Create a new instance if disk not getting better
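One way to turn "disk not getting better" into a rule is to keep a sliding window of latency samples per disk and flag the disk only when every sample in a full window is above a threshold. The class below is a hypothetical sketch; the threshold, window size, and metric (latency in milliseconds) are all assumptions rather than values from the talk.

```python
from collections import deque


class DiskHealth:
    """Hypothetical tracker for one disk's recent latency samples."""

    def __init__(self, threshold_ms, window=5):
        self.threshold_ms = threshold_ms
        self.samples = deque(maxlen=window)  # oldest samples fall off

    def add(self, latency_ms):
        self.samples.append(latency_ms)

    def should_replace(self):
        # Decide only once the window is full, and only if the disk
        # was slow for every sample in it (i.e. it has not recovered).
        full = len(self.samples) == self.samples.maxlen
        return full and all(s > self.threshold_ms for s in self.samples)
```

Requiring a full window of bad samples avoids replacing an instance over a brief spike, which matters when disk IO is known to vary over time.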
  29. 29. Heavy Monitoring (2)
      ● Monitor replication lag
      ● Remove from read mix if lag gets too high
        ○ Incorrect data
        ○ Strain on primary
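The "remove from read mix" rule can be sketched as a filter over each member's last applied operation time. The input format, the function name, and the 60-second cutoff are assumptions for illustration; in practice the times would come from the replica set's status output.

```python
def readable_secondaries(optimes, primary, max_lag_seconds=60):
    """Return secondaries whose lag behind the primary is within the limit.

    optimes maps host name -> last applied operation time (Unix seconds).
    """
    primary_optime = optimes[primary]
    return sorted(
        host
        for host, optime in optimes.items()
        if host != primary and primary_optime - optime <= max_lag_seconds
    )
```

For example, with the primary at time 1000, a secondary at 995 stays in the read mix while one at 700 is dropped until it catches up.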
  30. 30. Heavy Monitoring (3)
      ● Track slow queries / opcounts / page faults / IO volume
        ○ Tweak indexes accordingly
        ○ Limit requested data size if you can
  31. 31. Open Issues
      ● More granular page-fault / memory usage information
        ○ Difficult due to mmap
      ● Multi-datacenter usage
      ● Burn-in scripts
      ● Sharding
        ○ Tipping point will be insert volume
        ○ Or inefficient read memory usage
      ● Better understand replication failures
  32. 32. Take-aways (1)
      ● Automate everything
        ○ Instance creation, snapshotting, mount/unmount
      ● Strive for high locality & low fragmentation
      ● Repeatedly revise schema/index
      ● Heavily monitor
        ○ Server: IO/mem/disk
        ○ MongoDB: Opcounts/Index Hits/Slow queries
        ○ Cluster: Replication lag
        ○ Application: CRUD times
  33. 33. Take-aways (2)
  34. 34. Questions?
      Slides: http://bit.ly/cb_mongodb_meetup