C* Summit 2013: Time-Series Metrics with Cassandra by Mike Heffner

Librato's Metrics platform relies on Cassandra as its sole data storage platform for time-series data. This session will discuss how we have scaled from a single six-node Cassandra ring two years ago to the multiple storage rings that handle over 150,000 writes/second today. We'll cover the steps we have taken to scale the platform, including the evolution of our underlying schema, operational tricks, and client-library improvements. The session will finish with our suggestions on how Cassandra as a project, and its community, can be improved.

Published in: Technology, Business


  1. #CASSANDRA13: Time-Series Metrics with Cassandra, Mike Heffner
  2. What we do.
  3. October 2011
     • Decision: All measurements in Cassandra
     • Single EC2 Ring: 6 * m1.large
     • Cassandra 0.8.x
     • How does this work?
  4. Today
     • Multiple sharded rings
     • ~250,000 writes / second
     • EC2: m1.xlarge and m2.4xlarge
     • Cassandra 1.1.x
     • Read load: < 1%
  5. Talk Highlights
     • Matching Schema to Storage
     • Optimally Expiring Data
     • Monitor Everything
  6. Matching Schema to Storage
  7. What is a Measurement?
     ( Metric ID, Source )
     (X, Y) => (Time stamp, Value)
  8. Measurement CF
     Example: Select measurements between times [T1, T2]:
  9. Locating Rows
     Let us calculate the maximum row size:
     • 1 minute records
     • 1 week TTL
     • 7 days * 24 hours * 60 minutes => 10,080
     • 3 Longs * 8 bytes * 10k => ~240KB (not bad)
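
The row-size estimate on the "Locating Rows" slide can be reproduced as a quick calculation. This is only a sketch of the arithmetic: it counts the raw payload of three 8-byte longs per record, as the slide does, and ignores any per-column storage overhead. The function and constant names are illustrative, not from the talk.

```python
# Back-of-the-envelope size of one row holding a week of 1-minute records.
MINUTES_PER_WEEK = 7 * 24 * 60   # 1 week TTL, one record per minute
LONGS_PER_RECORD = 3             # per the slide: 3 longs per record
BYTES_PER_LONG = 8


def max_row_bytes(records=MINUTES_PER_WEEK, longs_per_record=LONGS_PER_RECORD):
    """Raw payload bytes for a full row (no per-column overhead included)."""
    return records * longs_per_record * BYTES_PER_LONG


print(MINUTES_PER_WEEK)   # 10080
print(max_row_bytes())    # 241920, i.e. roughly the ~240KB quoted on the slide
```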
  10. Row Storage Over Time
  11. Row Storage Over Time
  12. Seek All The SSTables
  13. Examining CF SSTables
      nodetool cfhistograms Metrics metric_id_epochs_60

      Metrics/metric_id_epochs_60 histograms
      Offset  SSTables
      1       28821
      2       58859
      3       201198
      4       178326
      5       223016
      6       154952
      7       83289
      8       21552
      10      81104
  14. Splitting the Rows
      Retrieve Time Bases for Times 31->45 for metric ID 12:
      mget(Rows: [12, EBase_30], [12, EBase_40], Columns: {31->45})
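
The row split on the "Splitting the Rows" slide can be sketched as follows: each row covers a fixed-width window of time, and a range query touches only the rows whose windows overlap the requested range. This is a minimal sketch, not Librato's client code. The window width of 10 is an assumption inferred from the slide's example (bases 30 and 40 for times 31->45), and `time_bases` and `mget_request` are hypothetical helper names.

```python
def time_bases(t1, t2, width):
    """Start of every fixed-width window that overlaps the range [t1, t2]."""
    first = (t1 // width) * width
    return list(range(first, t2 + 1, width))


def mget_request(metric_id, t1, t2, width):
    """Describe a multi-row read: one (metric_id, base) row key per window,
    plus the column range to slice within each row."""
    rows = [(metric_id, base) for base in time_bases(t1, t2, width)]
    return {"rows": rows, "columns": (t1, t2)}


# The slide's example: times 31->45 for metric ID 12, assuming width 10.
print(mget_request(12, 31, 45, 10))
# {'rows': [(12, 30), (12, 40)], 'columns': (31, 45)}
```

Bounding each row to one window keeps a row's columns from being spread across many SSTables over its lifetime, which is the effect the before/after histograms on the next slide show.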
  15. Examining CF SSTables
      nodetool cfhistograms Metrics metric_id_epochs_60

      Before: Metrics/metric_id_epochs_60
      Offset  SSTables
      1       28821
      2       58859
      3       201198
      4       178326
      5       223016
      6       154952
      7       83289
      8       21552
      10      81104

      After: Metrics/metric_id_epochs_60
      Offset  SSTables
      1       3491820
      2       5389762
      3       4095760
      4       1310741
      5       9976
  16. /graph me
  17. Optimally Expiring Data
  18. TTL Expiration
      • Churn of about 750GB / day
      • 12 TB total
      • 6% of data set
      • gc_grace = 0
      • STC
  19. Synchronized Compactions
  20. (no text on slide)
  21. nodetool compact
  22. * http://hight3ch.com/garbage-truck-crushing-a-car/
  23. nodetool cleanup
  24. Cleanup
      • Not just for topology changes
      • Tombstoned rows (not referenced)
      • Rotated row keys decrease references
      • Cons: Must process every sstable.
  25. Immutable SSTables
  26. Leverage SSTable Mod Time
      • If now - mtime > TTL => all data is expired
      • We can quickly eliminate entire sstables:
        find -mtime +<TTL> -name *.db | xargs rm
      • Fast and low overhead
      • Cons: Rolling restart
      26G 2013-05-17 09:44 Metrics-metrics_60-hf-7209-Data.db
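
The modification-time check on the "Leverage SSTable Mod Time" slide can be sketched as a dry run that only lists candidate files. This is an illustrative sketch under stated assumptions, not the tooling used in the talk: `expired_data_files` is a hypothetical helper, it scans a single directory without recursing (unlike `find`), and it deliberately deletes nothing. As the slide notes, actually removing files this way involved a rolling restart.

```python
import os
import time


def expired_data_files(data_dir, ttl_seconds, now=None):
    """List *.db files in data_dir whose mtime is older than the TTL.

    SSTables are immutable, so if the file was last written more than one
    TTL ago, every TTL'd column inside it has already expired.
    """
    now = time.time() if now is None else now
    expired = []
    for entry in os.scandir(data_dir):
        if entry.is_file() and entry.name.endswith(".db"):
            if now - entry.stat().st_mtime > ttl_seconds:
                expired.append(entry.path)
    return sorted(expired)


if __name__ == "__main__":
    one_week = 7 * 24 * 60 * 60
    # "." is a placeholder; point this at a column family's data directory.
    for path in expired_data_files(".", one_week):
        print(path)
```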
  27. nodetool setcompactionthreshold
  28. Increasing minor compactions
      • By default, STC requires a minimum of 4 ssts
      • Leads to large non-compacted sstables
      • Dropping to 2 can flatten the storage growth:
        nodetool setcompactionthreshold <ks> <cf> 2
      • Cons: CPU/IO increase
  29. Result
  30. Effective Monitoring
  31. Ring Dashboards
  32. Disk Errors => Throw Away
      • If you ever see this, replace!
        end_request: I/O error, dev xvdb, sector 467940617
        end_request: I/O error, dev xvdb, sector 467940617
      • Mark node down, bootstrap new
      • No metric for this?
  33. Cassandra Log Volume
      • Count log lines seen every 10 minutes
      • Track over time
      • Can identify:
        - Unbalanced workloads
        - Schema disagreements
        - Phantom gossip nodes
        - GC activity
      • grep -v .java => exceptions
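
The log-volume idea on the "Cassandra Log Volume" slide can be sketched as a counter keyed by 10-minute window. This is a minimal sketch, not the collector used in the talk. It assumes each log entry carries a `YYYY-MM-DD HH:MM:SS` timestamp somewhere in the line, and it attributes lines without a timestamp (for example stack-trace continuation lines) to the most recent window seen. `count_by_window` and the sample lines are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed timestamp shape; adjust to match the actual log layout.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def count_by_window(lines):
    """Count log lines per 10-minute window, keyed by the window's start."""
    counts = Counter()
    current = None
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            ts = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
            current = ts.replace(minute=ts.minute // 10 * 10, second=0)
        if current is not None:
            counts[current] += 1  # untimestamped lines join the last window
    return counts


sample = [
    " INFO [main] 2013-05-17 09:44:01,123 Example.java (line 1) started",
    "java.lang.RuntimeException: boom",
    " INFO [main] 2013-05-17 09:51:30,000 Example.java (line 2) still running",
]
for window, count in sorted(count_by_window(sample).items()):
    print(window, count)
```

Feeding these per-window counts into a time-series graph gives the "track over time" view the slide describes; a sudden jump in one node's count is the signal to go and read that node's log.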
  34. Q & A
      Mike Heffner, /mheffner, /mheffner
