@WrathOfChris github.com/WrathOfChris blog.wrathofchris.com
Time Series Metrics
with Cassandra
Cassandra meetup 20150331

  1. Time Series Metrics with Cassandra
  2. About Me • Chris Maxwell • @WrathOfChris • Sr Systems Engineer @ Ubiquiti Networks • Cloud Guy • DevOps
  3. Mission • Metrics service for internal services • Deliver 90 60 30 days of system and app metrics • Gain experience with Cassandra
  4. History: Ancient Designs, Aging Tools, Pitfalls https://flic.kr/p/6pqVnP
  5. Graphite (v1) • Single instance • carbon-relay + (2-4) carbon-cache processes (=cpu)
  6. Graphite (v1) Problems: • Single point of SUCCESS! • Can grow to 16-32 cores, but I/O saturation • Carbon write-amplifies 10x (flushes every 10s)
  7. Graphite (v2) • Frontend: carbon-relay • Backend: carbon-relay + 4x carbon-cache • m3.2xlarge ephemeral SSD • Manual consistent-hash by IP • Replication 3
  8. Graphite (v2) Problems: • Kind of like a Dynamo, but not • Replacing node requires full partition key shuffle • Adding 5 nodes took 6 days on 1Gbps to re-replicate ring • Less than 50% disk free means pain during reshuffle
  9. Limitations • Cloud Native • Avoid Manual Intervention • Ephemeral SSD > EBS https://flic.kr/p/2hZy6P
  10. Design: What we set out to build https://flic.kr/p/2spiXb
  11. Graphite (v3) …it got complicated…
  12. Graphite (v3) Ingest: • carbon-c-relay https://github.com/grobian/carbon-c-relay • cyanite https://github.com/pyr/cyanite • cassandra
  13. Graphite (v3) Retrieval: • graphite-api https://github.com/brutasse/graphite-api • grafana https://github.com/grafana/grafana • cyanite https://github.com/pyr/cyanite • elasticsearch (metric path cache)
  14. (no text on slide)
  15. Journey: Lessons learned along the way https://flic.kr/p/hjY15L
  16. Size Tiered Compaction • Sorted String Table (SSTable) is an immutable data file • New data written to small SSTables • Periodically merged into larger SSTables
  17. Size Tiered Compaction • Merge 4 similarly sized SSTables into 1 new SSTable • Data migrates into larger SSTables that are less regularly compacted • Disk space required: Sum of 4 largest SSTables
  18. Size Tiered Compaction • Updating a partition frequently may cause it to be spread between SSTables • Metrics workload writes to all partitions, every period
  19. Size Tiered Compaction • Metrics workload writes to all partitions, every period • Range queries that spanned 50+ SSTables !!!
  20. Size Tiered Compaction • Getting to the older data… • Ingest 25% more data • Major Compaction: • Requires 50% free space • Compacts all SSTables into 1 large SSTable
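
The "merge 4 similarly sized SSTables into 1" rule from the Size Tiered Compaction slides can be sketched as a small simulation. This is an illustrative model, not Cassandra's implementation: the function names, the 0.5x to 1.5x "similar size" band, and the assumption that a merged table is exactly the sum of its inputs (no overwrites or expired data) are all assumptions made for the example.

```python
def bucket_by_size(sizes, low=0.5, high=1.5):
    """Group SSTable sizes into buckets whose members are within
    [low, high] times the bucket's running average (assumed band)."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if low * avg <= size <= high * avg:
                bucket.append(size)
                break
        else:
            # No bucket of similar size exists yet, so start a new one.
            buckets.append([size])
    return buckets


def compact_once(sizes, min_threshold=4):
    """Merge min_threshold tables from the first bucket that is full enough.
    Returns the new list of sizes, or None if nothing can be compacted."""
    for bucket in bucket_by_size(sizes):
        if len(bucket) >= min_threshold:
            merged = bucket[:min_threshold]
            remaining = list(sizes)
            for s in merged:
                remaining.remove(s)
            # Assumption: no overlap between inputs, so output = sum of inputs.
            remaining.append(sum(merged))
            return remaining
    return None


def settle(sizes):
    """Run compactions until no bucket has enough tables left."""
    while True:
        nxt = compact_once(sizes)
        if nxt is None:
            return sorted(sizes)
        sizes = nxt


print(settle([10] * 16))  # sixteen 10 MB flushes collapse step by step
print(settle([10] * 3))   # fewer than 4 similar tables: nothing merges
```

In this model the last merge rewrites the four largest tables at once, which is why the slide budgets free disk space equal to the sum of the 4 largest SSTables.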
  21. Aside: DELETE • DELETE is the INSERT of a TOMBSTONE to the end of a partition • INSERTs with TTL become tombstones in the future • Tombstones live for at least gc_grace_seconds • Data is only deleted during compaction https://flic.kr/p/35RACf
  22. gc_grace_seconds: Grace is getting something you don’t deserve (time to nodetool repair a node that is down)
  23. gc_grace_seconds: deleted data reappears!
  24. Time To Live • INSERT with TTL becomes tombstone after expiry • 10s for 6 hours • 60s for 3 days • 300s for 30 days https://flic.kr/p/6Fxv7M
  25. TTL • gc_grace_seconds is 10 days (by default) • 10s for 6 hours → 10.25 days • 60s for 3 days → 13 days • 300s for 30 days → 40 days https://flic.kr/p/gBLHYf
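
The retention figures on the TTL slide follow from adding gc_grace_seconds to each TTL: expired data becomes a tombstone, and the tombstone must survive at least gc_grace_seconds before compaction may drop it. A minimal sketch of that arithmetic (the function and variable names are made up for the example):

```python
from datetime import timedelta

# Default gc_grace_seconds quoted on the slide: 10 days.
GC_GRACE = timedelta(days=10)


def min_on_disk(ttl, gc_grace=GC_GRACE):
    """Earliest point at which compaction may purge data written with this TTL."""
    return ttl + gc_grace


# Resolution -> TTL, as listed on the slide.
tiers = {
    "10s": timedelta(hours=6),
    "60s": timedelta(days=3),
    "300s": timedelta(days=30),
}

for resolution, ttl in tiers.items():
    days = min_on_disk(ttl) / timedelta(days=1)
    print(f"{resolution}: at least {days:g} days on disk")
```

These are lower bounds: the data is only removed when a compaction actually touches the SSTable that holds it.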
  26. https://flic.kr/p/4LNiXg https://flic.kr/p/35RACf 1.4TB Disks
  27. Levelled Compaction: based on Google’s LevelDB implementation
  28. Levelled Compaction • Data is ingested at Level 0 • Immediately compacted and merged with L1 • Partitions are merged up to Ln • 90% of partition data guaranteed to be in same level
  29. Levelled Compaction • Metrics workload writes to all partitions, every period • Immediately rolled up to L1 • Immediately rolled up to L2 • Immediately rolled up to L3 • Immediately rolled up to L4 • Immediately rolled up to L5
  30. Levelled Compaction • Metrics workload writes to all partitions, every period • 1 batch of writes → 5 writes
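
The "1 batch of writes → 5 writes" observation can be turned into a rough model of why the disk write rate climbs while ingest stays constant: as the data set grows, more levels exist, and a workload that touches every partition gets rewritten once per level. The sketch below is a deliberate simplification under stated assumptions (a fan-out of 10 between levels, 160 MB SSTables, one rewrite per level); real levelled compaction rewrites overlapping SSTables too, so actual amplification is likely higher. All names are hypothetical.

```python
def levels_needed(total_mb, sstable_mb=160, fanout=10):
    """Smallest number of levels whose combined capacity holds total_mb,
    assuming level n holds fanout**n SSTables of sstable_mb each."""
    level, capacity = 0, 0
    while capacity < total_mb:
        level += 1
        capacity += sstable_mb * fanout ** level
    return level


def disk_write_mb_per_sec(ingest_mb_per_sec, total_mb):
    """Approximate disk write rate if every batch is rewritten once per level."""
    return ingest_mb_per_sec * levels_needed(total_mb)


# Constant 2 MB/s ingest, growing data set: the write rate rises in steps.
for total_mb in (1_000, 100_000, 1_000_000):
    print(total_mb, "MB stored ->", disk_write_mb_per_sec(2, total_mb), "MB/s written")
```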
  31. Increasing Write rate, Constant Ingest rate
  32. Increasing Write rate, Constant Ingest rate https://flic.kr/p/4LNiXg
  33. compaction_throughput_mb_per_sec: 128 …then 0 (unlimited)
  34. Speeding Compactions … Don’t Do This … multithreaded: true cassandra_in_memory_compaction_limit_in_mb: 256M
  35. Date Tiered Compaction
  36. Date Tiered Compaction • Written by Björn Hegerfors at Spotify • Experimental! • Released in 2.0.11 / 2.1.1 • Group data by time • Compact by time • Drop expired data by time
  37. Compact SSTables by date window
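
The "group by time, compact by time, drop expired data by time" idea can be illustrated with fixed-size windows. This is a simplified sketch and not the DateTieredCompactionStrategy algorithm itself, which sizes its windows differently; the SSTable tuples, function names, and window length are assumptions for the example.

```python
from collections import defaultdict


def group_by_window(sstables, window_seconds):
    """Bucket (name, max_timestamp) pairs into fixed time windows.
    SSTables in the same window are candidates to be compacted together."""
    windows = defaultdict(list)
    for name, max_ts in sstables:
        windows[max_ts // window_seconds].append(name)
    return dict(windows)


def droppable(sstables, now, ttl_seconds, gc_grace_seconds):
    """SSTables whose newest cell has expired and outlived gc_grace_seconds
    can be deleted whole, without rewriting any data."""
    return [
        name
        for name, max_ts in sstables
        if max_ts + ttl_seconds + gc_grace_seconds <= now
    ]


# Hypothetical SSTables: (name, newest write timestamp in seconds).
sstables = [("a", 10), ("b", 3500), ("c", 3700), ("d", 7300)]

print(group_by_window(sstables, window_seconds=3600))
print(droppable(sstables, now=3900, ttl_seconds=100, gc_grace_seconds=200))
```

Because each SSTable only holds data from a narrow time range, a time-range query touches few SSTables, and old data ages out file by file instead of waiting for a large merge.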
  38. “but the docs say 8GB maximum heap!” MAX_HEAP_SIZE=16G HEAP_NEWSIZE=2048M
  39. Rick Branson, Instagram http://www.slideshare.net/planetcassandra/cassandra-summit-2014-cassandra-at-instagram-2014 -XX:+CMSScavengeBeforeRemark -XX:CMSMaxAbortablePrecleanTime=60000 -XX:CMSWaitDuration=30000
  40. All systems normal: Inadvertently tested 30,000 writes/sec during launch
  41. Cloud Native http://wattsupwiththat.com/2015/03/17/spaceship-lenticular-cloud-maybe-the-coolest-cloud-picture-evah/
  42. Cloud Native: Ec2MultiRegionSnitch
  43. Cloud Native: Ephemeral RAID0 -Djava.io.tmpdir=/mnt/cassandra/tmp
  44. Disable AutoScaling Terminate Process: aws autoscaling suspend-processes --scaling-processes Terminate
  45. Cloud Native: This design works to 50 instances per region
  46. Security Groups: IAM instance-profile role, Security Group + (per region) Security Group
  47. Management (OpsCenter): IAM instance-profile role, Security Group + (per region) Security Group
  48. Internode Encryption server_encryption_options: internode_encryption: all • keytool -genkeypair -alias test-cass -keyalg RSA -validity 3650 -keystore test-cass.keystore • keytool -export -alias test-cass -keystore test-cass.keystore -rfc -file test-cass.crt • keytool -import -alias test-cass -file test-cass.crt -keystore test-cass.truststore
  49. Seeds: Cheated….
  50. Seeds • selects first 3 nodes from each region using Autoscale Group order • ignores (self) as a seed for bootstrapping first 3 nodes in each region
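
The seed selection rule on the Seeds slide can be sketched as follows. How the talk's tooling actually obtains the Auto Scaling Group membership is not shown in the slides, so the input here is a plain dictionary of region to instance IPs in group order, and every name and address is hypothetical.

```python
def select_seeds(nodes_by_region, self_ip, per_region=3):
    """Pick the first `per_region` nodes of each region (in the order given)
    as seeds, skipping this node so it never lists itself as a seed."""
    seeds = []
    for region in sorted(nodes_by_region):
        candidates = nodes_by_region[region][:per_region]
        seeds.extend(ip for ip in candidates if ip != self_ip)
    return seeds


# Hypothetical cluster layout, listed in Auto Scaling Group order.
cluster = {
    "us-east-1": ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"],
    "us-west-2": ["10.1.0.1", "10.1.0.2"],
}

# Node 10.0.0.2 is itself one of the first three, so it is left out.
print(select_seeds(cluster, self_ip="10.0.0.2"))
```

Dropping the node's own address matters for the first three nodes in a region: a node that lists itself as a seed skips bootstrapping.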
  51. General • >= 4 Cores per node always • >= 8 Cores as soon as feasible • EC2 sweet spots: • m3.2xlarge (8c/160GB) for small workloads • i2.2xlarge (8c/1.6TB) for production • Avoid c3.2xlarge - CPU:Mem ratio is too high
  52. Breaking News! Dense-storage Instances for EC2
  53. Questions?
  54. d2 instances: Joining a node - system/network
  55. d2 instances: Joining a node - disk performance
  56. General: Metrics
  57. General: Cassandra Metrics
  58. Metrics: CPU - DateTiered
  59. Metrics: JVM - DateTiered
  60. Metrics: Compaction/CommitLog - DateTiered