MongoDB meetup

Presentation on how Chartbeat uses MongoDB on EC2

  1. MongoDB & EC2: A Love Story?
     Eytan Daniyalzade (@daniyalzade)
     http://bit.ly/cb_mongodb_meetup
  2. Contents
     ● Chartbeat
     ● Architecture
     ● MongoDB & EC2 Challenges
     ● Happy Ending: (MongoDB ? EC2)
     ● Takeaways
  3. chartbeat
  4. Chartbeat: real-time analytics service
     ● 18-person startup in New York
     ● Part of Betaworks
     ● Daily peak of just under 5M concurrent visitors
       ○ Up from 1M in July 2010
  5. What Chartbeat Provides
     ● Real-time view of site performance
       ○ Top pages
       ○ New/returning visitors
       ○ Traffic flow
         ■ Where people are coming from
         ■ Where people are going to
     ● Historical replay of the last 30 days
  6. The Architecture
  7. Architecture, Browser (highly simplified)
     Part 1:
       <head>
       <script type="text/javascript">var _sf_startpt=(new Date()).getTime()</script>
       ...
     Part 2:
       ...
       function loadChartbeat() {
         // insert script tag
       }
       window.onload = loadChartbeat;
       </body>
     The ping uses standard beacon logic, i.e. loading a 1x1 image.
  8. Architecture, Backend
     ● Custom libevent-based C backend
       ○ Real-time collection and aggregation
     ● Real-time system is in-memory only
     ● Background queue jobs snapshot every x minutes
       ○ Gearman
     ● Historical data
       ○ Mostly in MongoDB
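The split between an in-memory real-time layer and periodic snapshot jobs can be sketched in a few lines of Python. This is only an illustration: the actual collector is a libevent-based C server and the snapshot jobs run through Gearman. `RealtimeCounters`, `record_ping` and `snapshot_docs` are hypothetical names, and the document layout is borrowed from the historical-data schema on slide 23.

```python
import time
from collections import defaultdict


class RealtimeCounters:
    """Hypothetical in-memory aggregator standing in for the C backend."""

    def __init__(self):
        # counts[key][metric] -> running total since the last snapshot
        self.counts = defaultdict(lambda: defaultdict(int))

    def record_ping(self, key, metric, amount=1):
        # Real-time path: memory only, no disk IO.
        self.counts[key][metric] += amount

    def snapshot_docs(self, ts=None):
        """Turn the current counters into time-series documents and reset them.

        A background job would call this every x minutes and insert the
        returned documents into MongoDB.
        """
        ts = int(time.time()) if ts is None else ts
        docs = [
            {"key": key, "ts": ts, "values": dict(metrics), "meta": {}}
            for key, metrics in sorted(self.counts.items())
        ]
        self.counts.clear()
        return docs


counters = RealtimeCounters()
counters.record_ping("example.com", "visitors")
counters.record_ping("example.com", "visitors")
counters.record_ping("example.com", "new_visitors")
print(counters.snapshot_docs(ts=1300000000))
# [{'key': 'example.com', 'ts': 1300000000, 'values': {'visitors': 2, 'new_visitors': 1}, 'meta': {}}]
```

Keeping the hot path purely in memory means a slow disk can only delay a snapshot, never an incoming ping.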
  9. Why Chartbeat uses MongoDB
     ● Pure JSON all along
       ○ Live API
       ○ Historical data
       ○ No mapping back and forth
     ● Fast inserts (fire and forget)
     ● Flexible schema
  10. Why Chartbeat uses EC2
      ● Elastic capacity
      ● No trips to the datacenter
      ● EBS snapshots
  11. Chartbeat & MongoDB & EC2 (1)
      ● 3 clusters
        ○ 1 for each product
        ○ 1 as a caching layer
        ○ 2-4 instances per cluster
      ● m2.2xlarge instances
        ○ 34.2 GB memory
        ○ Ubuntu 10.04
        ○ RAID0 over 4 x 1 TB volumes
      ● Dedicated snapshot server
        ○ Shared among clusters
        ○ Also serves as an arbiter
  12. Chartbeat & MongoDB & EC2 (2)
      Cluster View
  13. MongoDB & EC2 Challenges
      ● Instances disappear
        ○ MongoDB can have long recovery operations
        ○ MongoDB is (was) not ACID compliant; an unclean shutdown could corrupt your data
      ● Poor IO performance on EBS
        ○ MongoDB has a global read/write lock
      ● Variable IO performance on EBS
        ○ Could cause replication issues
  14. Question: ?? ?
  15. Disappearing Instances
  16. Instances Disappearing - Master/Slave
      ● Down-time :(
      ● Slave promotion = headache
        ○ New instance
        ○ Copy oplog
        ○ Code change
        ○ Long/manual/error-prone
  17. Instances Disappearing - Replica Sets
      ● No down-time :) yay!
      ● Automatic failover on writes
      ● Eventual failover on reads
      ● No code change
  18. Instances Disappearing - Replica Sets (caveats)
      ● The pymongo driver reads from and writes to the primary only
        ○ pymongo 2.1 will fix this
      ● Chartbeat's own pymongo driver
        ○ Based on MasterSlaveConnection
        ○ Writes to the primary
        ○ Distributes reads among secondaries
        ○ Automatic failover
        ○ Eventual read re-distribution
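The routing policy of the custom driver (writes to the primary, reads spread across secondaries, skip a member that fails) can be sketched independently of pymongo. `ReadDistributor`, `MemberDown`, `FakeMember` and `whoami` are hypothetical; this is neither the Chartbeat driver nor pymongo's `MasterSlaveConnection` API, only an illustration of the idea.

```python
import itertools


class MemberDown(Exception):
    """Raised when a member connection is unreachable."""


class ReadDistributor:
    """Hypothetical router: writes to the primary, round-robin reads over secondaries."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = list(secondaries)
        self._cycle = itertools.cycle(self.secondaries) if self.secondaries else None

    def write(self, op):
        return op(self.primary)

    def read(self, op):
        # Try each secondary at most once, continuing the round-robin order.
        for _ in range(len(self.secondaries)):
            member = next(self._cycle)
            try:
                return op(member)
            except MemberDown:
                continue  # a real driver would also mark the member unhealthy
        # Every secondary failed (or there are none): fall back to the primary.
        return op(self.primary)


class FakeMember:
    def __init__(self, name, up=True):
        self.name, self.up = name, up


def whoami(member):
    if not member.up:
        raise MemberDown(member.name)
    return member.name


router = ReadDistributor(FakeMember("primary"), [FakeMember("s1"), FakeMember("s2", up=False)])
print([router.read(whoami) for _ in range(3)])  # ['s1', 's1', 's1']
print(router.write(whoami))                     # primary
```

Because a down member is simply skipped on each read, it rejoins the rotation as soon as it answers again, which loosely mirrors the "eventual read re-distribution" bullet.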
  19. Instances Disappearing - Fact of Life
      ● Accept this fact of life
      ● Always snapshot
        ○ Dedicated snapshot server
        ○ Hidden, i.e. no reads
      ● Automate everything
        ○ puppet
          ■ New instance from scratch within a minute
        ○ python-boto
          ■ Script all EC2 interaction
          ■ new_instance.py
          ■ mount_volumes_from_snap.py -o iid -n iid
          ■ snapshot_mongo.py
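The heart of a script like `snapshot_mongo.py` is ordering: lock writes on the hidden member, snapshot every RAID0 member so they share one point in time, and always unlock. The sketch below is not the original script; `snapshot_raid_volumes` and the volume IDs are hypothetical, and the three steps are injected as callables so the logic can be tested without AWS. Wiring them to real calls (for example pymongo's `client.admin.command("fsync", lock=True)` and boto's `EC2Connection.create_snapshot`) is left as an assumption.

```python
def snapshot_raid_volumes(volume_ids, lock, unlock, create_snapshot, description):
    """Snapshot every member of a RAID0 array while mongod writes are locked.

    All members must be captured at the same point in time; otherwise the
    restored array would not be consistent.
    """
    snapshot_ids = []
    lock()  # flush to disk and block writes on the hidden snapshot member
    try:
        for volume_id in volume_ids:
            snapshot_ids.append(create_snapshot(volume_id, description))
    finally:
        unlock()  # never leave the member locked, even if a snapshot call fails
    return snapshot_ids


events = []
ids = snapshot_raid_volumes(
    ["vol-a", "vol-b", "vol-c", "vol-d"],  # hypothetical volume IDs
    lock=lambda: events.append("lock"),
    unlock=lambda: events.append("unlock"),
    create_snapshot=lambda vol, desc: "snap-" + vol[4:],
    description="mongo nightly",
)
print(events, ids)  # ['lock', 'unlock'] ['snap-a', 'snap-b', 'snap-c', 'snap-d']
```

Since the snapshot server is hidden and serves no reads, holding the write lock for the duration of the snapshot calls does not affect application traffic.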
  20. Instances Disappearing - Caveats
      ● New volumes - slow!!!
        ○ EBS loads blocks lazily
      ● Warm up EBS & file cache before use
        ○ Options
          ■ Slowly direct the reads (app by app)
          ■ Run cache warm-up scripts
        ○ Not automated currently
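Because EBS fetches blocks lazily, a common warm-up approach is to read every block once before sending traffic, much like `cat file > /dev/null`. A minimal sketch, assuming it is pointed at the MongoDB data files; `warm_up`, `data_files` and the 1 MB chunk size are illustrative choices, not Chartbeat's script. Reading the data files (rather than the raw device) also pulls them into the OS file cache, which covers the second half of the warm-up.

```python
import os


def warm_up(paths, chunk_size=1 << 20):
    """Read each file sequentially and discard the bytes; return total bytes read."""
    total = 0
    for path in paths:
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    return total


def data_files(dbpath):
    """List all files under a MongoDB dbpath (hypothetical helper)."""
    return sorted(
        os.path.join(root, name)
        for root, _dirs, names in os.walk(dbpath)
        for name in names
    )
```

For example, `warm_up(data_files("/var/lib/mongodb"))` would touch every data file once; the path is an assumption and depends on the `dbpath` setting.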
  21. Poor IO Performance on EBS
  22. Poor IO Performance on EBS
      ● XFS & RAIDing help, but:
      ● Disk IO varies over time
      ● MongoDB holds a global lock on writes
      ● Query of death
        ○ Everything grinds to a halt if you're not careful
  23. Case Study: Historical Data
      ● For historical data, we store time series:
          {key: <key>,
           ts: <ts>,
           values: {metric1: int1, metric2: int2},
           meta: {}}
      ● High insert rate vs. fast historical reads
        ○ Optimize reads or writes?
      ● Fast inserts: ~1 MB/sec (append-only)
        ○ No disk seeks
      ● Historical reads: painfully slow
  24. Faster Reads Through Cache DB
      ● Avoid reading from disk
      ● Favor reads over writes
      ● Aim for disk & memory locality
          {day_ts: <day_ts>,
           key: <key>,
           values: {metric1: list(int), metric2: list(int)}}
      ● Data for historical reads resides together
      ● .append() to a list grows the document and could cause disk fragmentation
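The reshaping from append-only time-series documents to one document per key and day can be sketched as a pure function. It assumes `day_ts` is the UTC midnight of each sample, which the slides do not specify; `to_day_docs` is a hypothetical name, while the field names follow the slides.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400


def to_day_docs(samples):
    """Group {key, ts, values} samples into {day_ts, key, values: {metric: [ints]}}.

    Samples are sorted by timestamp so each metric list is in time order;
    a full day for one key then lives in a single document (read locality).
    """
    days = defaultdict(lambda: defaultdict(list))
    for s in sorted(samples, key=lambda s: s["ts"]):
        day_ts = s["ts"] - s["ts"] % SECONDS_PER_DAY  # UTC midnight (assumption)
        for metric, value in s["values"].items():
            days[(s["key"], day_ts)][metric].append(value)
    return [
        {"day_ts": day_ts, "key": key, "values": dict(metrics)}
        for (key, day_ts), metrics in sorted(days.items())
    ]


samples = [
    {"key": "example.com", "ts": 60, "values": {"visitors": 7}, "meta": {}},
    {"key": "example.com", "ts": 0, "values": {"visitors": 5}, "meta": {}},
]
print(to_day_docs(samples))
# [{'day_ts': 0, 'key': 'example.com', 'values': {'visitors': [5, 7]}}]
```

A historical read for one key and day becomes a single document fetch instead of many scattered ones, at the cost of documents that grow as values are appended.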
  25. Avoid Fragmentation w/ Preallocation
      ● Fragmentation causes:
        ○ Inefficient disk usage
        ○ Slower writes (due to block allocation)
      ● Preallocate daily arrays instead
        ○ Pros:
          ■ No fragmentation
          ■ Writes cause no change in data size
        ○ Cons:
          ■ Wasteful (we don't know keys ahead of time)
          ■ Requires heavy disk IO, ~7 MB/sec (~60 Mbit/sec on EBS)
      ● Conclusion: spread preallocation over 1 hour
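A sketch of preallocation with fixed-size arrays, assuming one slot per minute (1,440 per day) and MongoDB's dotted array-index syntax for in-place `$set` updates. The slot granularity, the helper names and the choice of the hour before midnight as the preallocation window are assumptions, not Chartbeat's code. Each write overwrites one slot instead of growing the document, and hashing the key to a stable offset spreads the preallocation IO across the hour.

```python
import zlib

SLOTS_PER_DAY = 1440                       # assumption: one slot per minute
SECONDS_PER_SLOT = 86400 // SLOTS_PER_DAY  # 60
SPREAD_SECONDS = 3600                      # "spread preallocation over 1 hour"


def preallocated_doc(key, day_ts, metrics):
    """Full-size day document, so later updates never change its size."""
    return {
        "day_ts": day_ts,
        "key": key,
        "values": {m: [0] * SLOTS_PER_DAY for m in metrics},
    }


def slot_update(day_ts, ts, metric, value):
    """Build a $set that overwrites one array slot in place."""
    slot = (ts - day_ts) // SECONDS_PER_SLOT
    if not 0 <= slot < SLOTS_PER_DAY:
        raise ValueError("timestamp outside this day")
    return {"$set": {"values.%s.%d" % (metric, slot): value}}


def preallocation_time(key, next_day_ts):
    """Stable per-key start time within the hour before the next day begins."""
    offset = zlib.crc32(key.encode("utf-8")) % SPREAD_SECONDS
    return next_day_ts - SPREAD_SECONDS + offset


doc = preallocated_doc("example.com", 0, ["visitors"])
print(len(doc["values"]["visitors"]))      # 1440
print(slot_update(0, 125, "visitors", 42))  # {'$set': {'values.visitors.2': 42}}
```

`zlib.crc32` is used instead of the built-in `hash()` because string hashing is randomized per process, and every scheduler run should pick the same offset for a given key.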
  26. EC2 Performance is Unpredictable
  27. EC2 Unpredictability - Challenges
      ● Resource contention in a virtualized environment
      ● EBS and network IO performance varies drastically
      ● RAID0 over 4 disks = 4 x risk
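A quick way to see the "4 x risk": assuming each EBS volume fails independently over some period with a small probability p, a RAID0 array is lost as soon as any one of its 4 members fails:

```latex
P(\text{array lost}) = 1 - (1 - p)^4 = 4p - 6p^2 + 4p^3 - p^4 \approx 4p \qquad (p \ll 1)
```

The same reasoning applies to performance: a RAID0 stripe is only as fast as its slowest member, so one degraded volume slows the whole array.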
  28. Heavy Monitoring (1)
      ● Track individual disk performance over time
      ● Create a new instance if a disk is not getting better
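On Linux, one way to track individual disks is to sample `/proc/diskstats` and compute each device's average time per completed IO (the figure `iostat` reports as await). The sketch parses samples and applies a "slow for N intervals in a row" rule; the function names, the threshold and the interval count are hypothetical choices for deciding when to replace an instance.

```python
def parse_diskstats(text):
    """Map device name -> (completed IOs, ms spent on those IOs)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue
        name = parts[2]
        reads, ms_reading = int(parts[3]), int(parts[6])
        writes, ms_writing = int(parts[7]), int(parts[10])
        stats[name] = (reads + writes, ms_reading + ms_writing)
    return stats


def await_ms(before, after):
    """Average milliseconds per IO for each device between two samples."""
    result = {}
    for name, (ios1, ms1) in after.items():
        if name not in before:
            continue
        ios0, ms0 = before[name]
        done = ios1 - ios0
        result[name] = (ms1 - ms0) / done if done else 0.0
    return result


def persistently_slow(history, threshold_ms, intervals):
    """Devices whose await exceeded threshold_ms in each of the last `intervals` samples."""
    if intervals < 1:
        raise ValueError("intervals must be >= 1")
    recent = history[-intervals:]
    if len(recent) < intervals:
        return set()
    names = set.intersection(*(set(h) for h in recent))
    return {n for n in names if all(h[n] > threshold_ms for h in recent)}
```

A monitoring loop would read `/proc/diskstats` every minute, append the `await_ms` result to `history`, and flag the instance for replacement when `persistently_slow` returns a RAID member.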
  29. Heavy Monitoring (2)
      ● Monitor replication lag
      ● Remove a secondary from the read mix if its lag gets too high
        ○ Incorrect (stale) data
        ○ Strain on primary
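Lag can be read from the `replSetGetStatus` command (`rs.status()` in the shell), whose `members` entries include `name`, `stateStr` and `optimeDate`. Below is a sketch of the "remove from read mix" rule as a pure function over that output. The 30-second threshold and the function name are hypothetical, and fetching the status (e.g. with pymongo's `client.admin.command("replSetGetStatus")`) is left out.

```python
from datetime import datetime, timedelta


def readable_secondaries(status, max_lag=timedelta(seconds=30), exclude=()):
    """Names of SECONDARY members lagging the PRIMARY by at most max_lag.

    `status` is a replSetGetStatus-style document. Hidden members (such as a
    snapshot server) also report SECONDARY, so pass them in `exclude`.
    """
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        return []  # nothing to measure lag against; treat no secondary as safe
    return [
        m["name"]
        for m in members
        if m["stateStr"] == "SECONDARY"
        and m["name"] not in exclude
        and primary["optimeDate"] - m["optimeDate"] <= max_lag
    ]


now = datetime(2011, 6, 1, 12, 0, 0)
status = {"members": [
    {"name": "db1:27017", "stateStr": "PRIMARY", "optimeDate": now},
    {"name": "db2:27017", "stateStr": "SECONDARY", "optimeDate": now - timedelta(seconds=2)},
    {"name": "db3:27017", "stateStr": "SECONDARY", "optimeDate": now - timedelta(minutes=5)},
    {"name": "snap:27017", "stateStr": "ARBITER"},
]}
print(readable_secondaries(status))  # ['db2:27017']
```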
  30. Heavy Monitoring (3)
      ● Track slow queries, opcounts, page faults, and IO volume
        ○ Tweak indexes accordingly
        ○ Limit requested data size if you can
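`serverStatus` reports cumulative counters (`opcounters` per operation type and, on Linux, `extra_info.page_faults`), so per-second rates come from differencing two samples. A sketch over two already-fetched documents; the `rates` function, the sampling interval and the trimmed example documents are assumptions.

```python
def rates(before, after, interval_s):
    """Per-second opcounter and page-fault rates between two serverStatus samples."""
    out = {
        op: (after["opcounters"][op] - before["opcounters"][op]) / interval_s
        for op in after["opcounters"]
    }
    faults0 = before.get("extra_info", {}).get("page_faults")
    faults1 = after.get("extra_info", {}).get("page_faults")
    if faults0 is not None and faults1 is not None:
        out["page_faults"] = (faults1 - faults0) / interval_s
    return out


s0 = {"opcounters": {"insert": 1000, "query": 500}, "extra_info": {"page_faults": 10}}
s1 = {"opcounters": {"insert": 1600, "query": 800}, "extra_info": {"page_faults": 70}}
print(rates(s0, s1, 60))  # {'insert': 10.0, 'query': 5.0, 'page_faults': 1.0}
```

A rising page-fault rate at a steady query rate is a hint that the working set no longer fits in memory, which is when tweaking indexes or limiting requested data size pays off.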
  31. Open Issues
      ● More granular page-fault / memory usage information
        ○ Difficult due to mmap
      ● Multi-datacenter usage
      ● Burn-in scripts
      ● Sharding
        ○ Tipping point will be insert volume
        ○ Or inefficient read memory usage
      ● Better understand replication failures
  32. Take-aways (1)
      ● Automate everything
        ○ Instance creation, snapshotting, mount/unmount
      ● Strive for high locality & low fragmentation
      ● Repeatedly revise schema/indexes
      ● Heavily monitor
        ○ Server: IO/mem/disk
        ○ MongoDB: opcounts/index hits/slow queries
        ○ Cluster: replication lag
        ○ Application: CRUD times
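Application-level CRUD timings can be collected without touching the driver by wrapping the data-access functions. A minimal sketch; the `timed` decorator, the in-process `TIMINGS` store and `load_day_doc` are hypothetical, and a real setup would ship these numbers to the monitoring system.

```python
import functools
import time
from collections import defaultdict

TIMINGS = defaultdict(list)  # operation name -> list of durations in seconds


def timed(name):
    """Record how long each call of the wrapped function takes under `name`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even when the call raises, so failures are timed too.
                TIMINGS[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@timed("read.day_doc")
def load_day_doc(key):
    # Placeholder for a real query against the cache DB.
    return {"key": key}
```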
  33. Take-aways (2)
  34. Questions?
      Slides: http://bit.ly/cb_mongodb_meetup
