
Cassandra meetup 20150331

Time Series Metrics with Cassandra


  1. @WrathOfChris blog.wrathofchris.com github.com/WrathOfChris
     Time Series Metrics with Cassandra
  2. About Me
     • Chris Maxwell
     • @WrathOfChris
     • Sr Systems Engineer @ Ubiquiti Networks
     • Cloud Guy
     • DevOps
  3. Mission
     • Metrics service for internal services
     • Deliver 90 60 30 days of system and app metrics
     • Gain experience with Cassandra
  4. History
     Ancient Designs
     Aging Tools
     Pitfalls
     https://flic.kr/p/6pqVnP
  5. Graphite (v1)
     • Single instance
     • carbon-relay + (2-4) carbon-cache processes (=cpu)
  6. Graphite (v1) Problems:
     • Single point of SUCCESS!
     • Can grow to 16-32 cores, but I/O saturation
     • Carbon write-amplifies 10x (flushes every 10s)
  7. Graphite (v2)
     • Frontend: carbon-relay
     • Backend: carbon-relay + 4x carbon-cache
     • m3.2xlarge ephemeral SSD
     • Manual consistent-hash by IP
     • Replication 3
  8. Graphite (v2) Problems:
     • Kind of like a Dynamo, but not
     • Replacing node requires full partition key shuffle
     • Adding 5 nodes took 6 days on 1Gbps to re-replicate ring
     • Less than 50% disk free means pain during reshuffle
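
One way to see why resizing a statically hashed ring is so expensive is to count how many metric paths change owner when nodes are added. The sketch below is an illustration only: it uses naive modulo placement as a stand-in (not carbon-relay's actual hashing), and the metric names and the 10 and 15 node counts are hypothetical.

```python
import hashlib


def owner(metric, node_count):
    """Map a metric path to a node index with naive modulo placement (an assumption)."""
    digest = hashlib.md5(metric.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


def fraction_moved(metrics, old_nodes, new_nodes):
    """Fraction of metrics whose owning node changes when the ring is resized."""
    moved = sum(1 for m in metrics if owner(m, old_nodes) != owner(m, new_nodes))
    return moved / len(metrics)


# Hypothetical metric paths; growing a hypothetical 10-node ring by 5 nodes.
metrics = [f"servers.host{i}.cpu.user" for i in range(10000)]
print(f"{fraction_moved(metrics, 10, 15):.0%} of metrics change owner")
```

With placement like this, most of the data set has to be copied across the network whenever the node count changes, which is consistent with the multi-day reshuffle described on the slide.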
  9. Limitations
     • Cloud Native
     • Avoid Manual Intervention
     • Ephemeral SSD > EBS
     https://flic.kr/p/2hZy6P
  10. Design
      What we set out to build
      https://flic.kr/p/2spiXb
  11. Graphite (v3)
      …it got complicated…
  12. Graphite (v3) Ingest:
      • carbon-c-relay https://github.com/grobian/carbon-c-relay
      • cyanite https://github.com/pyr/cyanite
      • cassandra
  13. Graphite (v3) Retrieval:
      • graphite-api https://github.com/brutasse/graphite-api
      • grafana https://github.com/grafana/grafana
      • cyanite https://github.com/pyr/cyanite
      • elasticsearch (metric path cache)
  14. (no text on slide)
  15. Journey
      Lessons learned along the way
      https://flic.kr/p/hjY15L
  16. Size Tiered Compaction
      • Sorted String Table (SSTable) is an immutable data file
      • New data written to small SSTables
      • Periodically merged into larger SSTables
  17. Size Tiered Compaction
      • Merge 4 similarly sized SSTables into 1 new SSTable
      • Data migrates into larger SSTables that are less regularly compacted
      • Disk space required: Sum of 4 largest SSTables
  18. Size Tiered Compaction
      • Updating a partition frequently may cause it to be spread between SSTables
      • Metrics workload writes to all partitions, every period
  19. Size Tiered Compaction
      • Metrics workload writes to all partitions, every period
      • Range queries that spanned 50+ SSTables !!!
  20. Size Tiered Compaction
      • Getting to the older data…
      • Ingest 25% more data
      • Major Compaction:
      • Requires 50% free space
      • Compacts all SSTables into 1 large SSTable
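
The merge-4 behaviour described on the Size Tiered slides can be sketched as a toy simulation. This is a simplification under stated assumptions: every flush produces an SSTable of exactly one size unit, tiers are matched by exact size rather than by Cassandra's "similar size" buckets, and merged output never shrinks from overwrites or expiry. The function names are made up for the illustration.

```python
from collections import Counter


def flush_and_compact(sstables, new_size=1, min_threshold=4):
    """Add one freshly flushed SSTable, then merge any tier holding min_threshold tables."""
    tables = list(sstables) + [new_size]
    while True:
        counts = Counter(tables)
        # Smallest tier that has enough same-sized SSTables to merge, if any.
        tier = next((s for s, n in sorted(counts.items()) if n >= min_threshold), None)
        if tier is None:
            return tables
        for _ in range(min_threshold):
            tables.remove(tier)
        tables.append(tier * min_threshold)


def simulate(flushes):
    """Return the sorted SSTable sizes left on disk after a number of flushes."""
    tables = []
    for _ in range(flushes):
        tables = flush_and_compact(tables)
    return sorted(tables)


def compaction_headroom(tables, min_threshold=4):
    """Free space needed to merge the largest tables (sum of the 4 largest)."""
    return sum(sorted(tables)[-min_threshold:])


print(simulate(100), compaction_headroom(simulate(100)))
```

In this model the largest SSTables are merged least often, so a partition written every period ends up with fragments in every tier that has not yet been merged.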
  21. Aside: DELETE
      • DELETE is the INSERT of a TOMBSTONE to the end of a partition
      • INSERTs with TTL become tombstones in the future
      • Tombstones live for at least gc_grace_seconds
      • Data is only deleted during compaction
      https://flic.kr/p/35RACf
  22. gc_grace_seconds
      Grace is getting something you don’t deserve
      (time to nodetool repair a node that is down)
  23. gc_grace_seconds
      deleted data reappears!
  24. Time To Live
      • INSERT with TTL becomes tombstone after expiry
      • 10s for 6 hours
      • 60s for 3 days
      • 300s for 30 days
      https://flic.kr/p/6Fxv7M
  25. TTL
      • gc_grace_seconds is 10 days (by default)
      • 10s for 6 hours → 10.25 days
      • 60s for 3 days → 13 days
      • 300s for 30 days → 40 days
      https://flic.kr/p/gBLHYf
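
The day counts on the TTL slide follow from adding gc_grace_seconds to each retention period. A minimal sketch of that arithmetic, assuming the default gc_grace_seconds of 864000 (10 days) and the three rollups listed on the slides; the helper names are made up for the illustration:

```python
GC_GRACE_SECONDS = 864000  # Cassandra default: 10 days
DAY = 86400

# (resolution in seconds, TTL in seconds) for each rollup on the slides
ROLLUPS = [
    (10, 6 * 3600),    # 10s points kept for 6 hours
    (60, 3 * DAY),     # 60s points kept for 3 days
    (300, 30 * DAY),   # 300s points kept for 30 days
]


def min_days_on_disk(ttl_seconds, gc_grace=GC_GRACE_SECONDS):
    """Earliest age at which compaction may purge a TTL'd cell, in days."""
    return (ttl_seconds + gc_grace) / DAY


def points_per_metric(resolution, ttl_seconds):
    """Number of live data points one metric holds at this resolution."""
    return ttl_seconds // resolution


for resolution, ttl in ROLLUPS:
    print(resolution, points_per_metric(resolution, ttl), min_days_on_disk(ttl))
```

The 6-hour rollup is the striking case: its data occupies disk for more than 40 times its useful lifetime unless gc_grace_seconds is lowered.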
  26. https://flic.kr/p/4LNiXg
      https://flic.kr/p/35RACf
      1.4TB Disks
  27. Levelled Compaction
      based on Google’s LevelDB implementation
  28. Levelled Compaction
      • Data is ingested at Level 0
      • Immediately compacted and merged with L1
      • Partitions are merged up to Ln
      • 90% of partition data guaranteed to be in same level
  29. Levelled Compaction
      • Metrics workload writes to all partitions, every period
      • Immediately rolled up to L1
      • Immediately rolled up to L2
      • Immediately rolled up to L3
      • Immediately rolled up to L4
      • Immediately rolled up to L5
  30. Levelled Compaction
      • Metrics workload writes to all partitions, every period
      • 1 batch of writes → 5 writes
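
Following the slides' own model, in which every batch is rewritten once per level it is rolled up through, the disk write rate grows with the number of levels in use even when ingest stays flat. A small sketch of that arithmetic; the ingest figure is hypothetical and the function name is made up for the illustration:

```python
def disk_writes_per_sec(ingest_per_sec, levels_in_use):
    """Each ingested point is rewritten once per level it is rolled up through."""
    return ingest_per_sec * levels_in_use


INGEST = 10_000  # hypothetical constant ingest rate, points per second

for levels in range(1, 6):
    print(f"up to L{levels}: {disk_writes_per_sec(INGEST, levels):,} writes/sec")
```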
  31. Increasing Write rate
      Constant Ingest rate
  32. Increasing Write rate
      Constant Ingest rate
      https://flic.kr/p/4LNiXg
  33. compaction_throughput_mb_per_sec: 128
      …then 0 (unlimited)
  34. Speeding Compactions
      … Don’t Do This …
      multithreaded: true
      cassandra_in_memory_compaction_limit_in_mb: 256M
  35. Date Tiered Compaction
  36. Date Tiered Compaction
      • Written by Björn Hegerfors at Spotify
      • Experimental!
      • Released in 2.0.11 / 2.1.1
      • Group data by time
      • Compact by time
      • Drop expired data by time
  37. Compact SSTables by date window
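
The "group by time, drop by time" idea can be sketched with a few lines of bookkeeping. This is a simplified illustration, not the DateTieredCompactionStrategy algorithm itself: it uses fixed-width windows where the real strategy grows its windows with age, and the SSTable tuples, names, and timestamps are hypothetical.

```python
from collections import defaultdict


def group_by_window(sstables, window_seconds):
    """Bucket SSTables by the time window containing their oldest timestamp."""
    windows = defaultdict(list)
    for name, min_ts, _max_ts in sstables:
        windows[min_ts // window_seconds].append(name)
    return dict(windows)


def droppable(sstables, now, ttl_seconds, gc_grace_seconds=0):
    """SSTables whose newest cell has expired and passed gc_grace: delete the whole file."""
    return [
        name
        for name, _min_ts, max_ts in sstables
        if max_ts + ttl_seconds + gc_grace_seconds <= now
    ]


# Hypothetical SSTables as (name, min_timestamp, max_timestamp), in seconds.
sstables = [("a", 0, 1800), ("b", 1800, 3599), ("c", 3600, 7000), ("d", 7200, 9000)]

print(group_by_window(sstables, window_seconds=3600))
print(droppable(sstables, now=30000, ttl_seconds=21600))
```

Because each SSTable only covers a bounded time range, expired data can be removed by deleting whole files rather than rewriting them, and a range query touches only the windows it overlaps.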
  38. – but the docs say 8GB maximum heap!
      MAX_HEAP_SIZE=16G
      HEAP_NEWSIZE=2048M
  39. – Rick Branson, Instagram
      http://www.slideshare.net/planetcassandra/cassandra-summit-2014-cassandra-at-instagram-2014
      -XX:+CMSScavengeBeforeRemark
      -XX:CMSMaxAbortablePrecleanTime=60000
      -XX:CMSWaitDuration=30000
  40. All systems normal
      Inadvertently tested 30,000 writes/sec during launch
  41. Cloud Native
      http://wattsupwiththat.com/2015/03/17/spaceship-lenticular-cloud-maybe-the-coolest-cloud-picture-evah/
  42. Cloud Native
      Ec2MultiRegionSnitch
  43. Cloud Native
      Ephemeral RAID0
      -Djava.io.tmpdir=/mnt/cassandra/tmp
  44. Disable AutoScaling Terminate Process:
      aws autoscaling suspend-processes --auto-scaling-group-name <group-name> --scaling-processes Terminate
  45. Cloud Native
      This design works to 50 instances per region
  46. Security Groups
      IAM instance-profile role
      Security Group + (per region) Security Group
  47. Management (OpsCenter)
      IAM instance-profile role
      Security Group + (per region) Security Group
  48. Internode Encryption
      server_encryption_options: internode_encryption: all
      • keytool -genkeypair -alias test-cass -keyalg RSA -validity 3650 -keystore test-cass.keystore
      • keytool -export -alias test-cass -keystore test-cass.keystore -rfc -file test-cass.crt
      • keytool -import -alias test-cass -file test-cass.crt -keystore test-cass.truststore
  49. Seeds
      Cheated…
  50. Seeds
      • selects first 3 nodes from each region using Autoscale Group order
      • ignores (self) as a seed for bootstrapping first 3 nodes in each region
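
The seed rule on the slide above can be sketched as a small pure function. This is an illustration of the selection logic only, under stated assumptions: the per-region node lists are taken as already fetched in Autoscale Group order, and the function name, region names, and IP addresses are hypothetical.

```python
def select_seeds(nodes_by_region, self_ip, per_region=3):
    """Pick the first per_region nodes of every region as seeds, excluding this node."""
    seeds = []
    for region in sorted(nodes_by_region):
        # Lists are assumed to be in Autoscale Group order already.
        seeds.extend(nodes_by_region[region][:per_region])
    # A node never lists itself, so the first nodes in a region can still bootstrap.
    return [ip for ip in seeds if ip != self_ip]


# Hypothetical cluster layout.
regions = {
    "us-east-1": ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"],
    "us-west-2": ["10.1.0.1", "10.1.0.2"],
}

print(select_seeds(regions, self_ip="10.0.0.2"))
```

Deriving the list from group membership means every node computes the same seeds without a hand-maintained configuration file.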
  51. General
      • >= 4 Cores per node always
      • >= 8 Cores as soon as feasible
      • EC2 sweet spots:
      • m3.2xlarge (8c/160GB) for small workloads
      • i2.2xlarge (8c/1.6TB) for production
      • Avoid c3.2xlarge - CPU:Mem ratio is too high
  52. Breaking News!
      Dense-storage Instances for EC2
  53. Questions?
  54. d2 instances
      Joining a node - system/network
  55. d2 instances
      Joining a node - disk performance
  56. General
      Metrics
  57. General
      Cassandra Metrics
  58. Metrics
      CPU - DateTiered
  59. Metrics
      JVM - DateTiered
  60. Metrics
      Compaction/CommitLog - DateTiered
