Austin Cassandra Meetup


Slides used for my presentation to the Austin Cassandra Meetup, where I discuss how Cassandra fits into Rackspace Cloud Monitoring.

Hint: It's just a small part.

Published in: Technology, Business

  • Self service; Extensive dashboard; 12 platforms; Deep analysis
  • Secondary, nice to have, but not critical to monitoring.

    1. How we use Cassandra. Gary Dusbabek, @gdusbabek
    2. We’re hiring
    3. CM Overview: Control Cluster; Data Cluster
    4. CM Overview: Thousands of servers; Pre-existing solutions; Lessons learned from Cloudkick; Internal versus external; Millions of checks
    5. Cassandra is the least interesting part
    6. Terminology: Entity. Something with an IP address or host name
    7. Terminology: Check. Tied to an Entity; Is an action; Produces metrics
    8. Features: Remote checks; Collectors in 5 DCs; Processing in 3 DCs; Alerting; Notifications
    9. Features: All REST all the time; More Ops friendly; Metrics; Agent
    10. Future: Automation; Prediction; Support hooks; Agent expansion; Correlation; Aggregation; Entity Spanning
    11. 1,000 Words
    12. Control Cluster: Metadata; State; Three datacenters; High RF; Wide rows; Easy dump & load
    13. Data Model: Rich but simple; Objects used together stored together; Simple parent-child relations; One row per customer (tenant); Composite column names
    14. Data Model. Good: Single Parent/Child, Acyclic
    15. Data Model. Bad: Complex, Cyclic
    16. Data Model: As Columns; Easy slicing
    17. Y U NO MySQL?
    18. Control Cluster: API server is Node.js; JavaScript ORM library (define object model in JS; read/write entire objects; never think about CQL); node-cassandra-client
    19. Control Cluster
    20. Data Cluster: The fun starts here
    21. Data Cluster. Goal: Fast graphs; Time series data; Fewer data points; OK to shave resolution; Recent data is most important
    22. Locator: Identifies a single metric; check identifier + name; e.g. my:check:id:ttfb
    23. Granularity: Full, 5m, 20m, 60m, 240m, 1440m
    24. Rollup Concepts: Slot (Range)
        • Pegged at 4032 slots
        • One slot is a range of seconds (varies with granularity)
        • metrics_locator CF
        • Key is granularity name + slot num
        • Columns index keys in rollup tables
    25. Keyed by ascii; Bigint column names; Blob column values; JDBC; Rollups
    26. Full Resolution! Arrival
        • time, name, several metrics
        • metric = name, type, value
        • Compute locator and slot
        • Insert metrics: col=timestamp, value=encoded metric
        • Single Cassandra APPLY BATCH;
    27. Rollups
        • Two types
            – Rollup all metrics from timeX to timeY
            – Rollup a single metric from timeX to timeY
            – Times may span multiple slots (ranges)
        • Use rollups to produce rollups
            – E.g.: use 20m data points to create 60m point.
            – Store number of data points with rollup
    28. Rollups
        • Gotchas!
            – Do not want to roll up a coarse range when a finer range that feeds data to it is scheduled for rollup shortly
              60m | | | …
              20m | | | | | | | | …
              5m  ||||||||||||||||||||||||||||| …
            – Mind the “tail” during datapoint queries (calculate rollups on the fly)
    29. It Scales: Rollup operations are idempotent*; Simplifies availability; Rollups are easily parallelized; Hash partition the locator space
    30. But… What if data arrives after rollup is performed? More than 24hrs late: don’t care, forget it. Else treat normally: slots are scheduled for rollups as they age
    31. / HBase @gdusbabek
    32. Image Credits: soccer, bored, clock, car, robot, dish, sand, slots, sushi, airplane, scale, hourglass, street
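
The locator, slot, and rollup ideas from slides 22 to 29 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the talk. The deck gives the 4032-slot count, the granularity names, and the example locator `my:check:id:ttfb`. The slot widths, the modulo slot formula, the `:` and `,` separators, and the `Rollup` fields (count, mean, minimum, maximum) are guesses made for the example, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

SLOTS = 4032  # slot count stated on the "Rollup Concepts" slide

# ASSUMPTION: slot width in seconds per granularity. The deck only says the
# width "varies with granularity"; the 300 s width for "full" is a guess.
SLOT_SECONDS = {
    "full": 300, "5m": 300, "20m": 1200,
    "60m": 3600, "240m": 14400, "1440m": 86400,
}


def locator(check_id: str, metric_name: str) -> str:
    """Check identifier + metric name. ASSUMPTION: joined with ':'."""
    return f"{check_id}:{metric_name}"


def slot_for(timestamp_s: int, granularity: str) -> int:
    """ASSUMPTION: slots wrap around, so the slot is the range index modulo SLOTS."""
    return (timestamp_s // SLOT_SECONDS[granularity]) % SLOTS


def locator_row_key(granularity: str, slot: int) -> str:
    """Granularity name + slot number. ASSUMPTION: joined with ','."""
    return f"{granularity},{slot}"


@dataclass(frozen=True)
class Rollup:
    count: int       # number of raw data points behind this rollup
    mean: float
    minimum: float
    maximum: float


def rollup_raw(values: Iterable[float]) -> Rollup:
    """Roll up full-resolution values into one point."""
    vals = list(values)
    if not vals:
        raise ValueError("cannot roll up an empty range")
    return Rollup(len(vals), sum(vals) / len(vals), min(vals), max(vals))


def rollup_rollups(parts: Iterable[Rollup]) -> Rollup:
    """Build a coarser rollup from finer ones (e.g. 20m points into a 60m point).

    The stored counts let the mean be weighted correctly.
    """
    parts = list(parts)
    if not parts:
        raise ValueError("cannot roll up an empty range")
    total = sum(p.count for p in parts)
    mean = sum(p.mean * p.count for p in parts) / total
    return Rollup(total, mean,
                  min(p.minimum for p in parts),
                  max(p.maximum for p in parts))
```

Both rollup functions are pure, so running one again over the same input yields the same point. That is one way to read the idempotence claim on slide 29. Weighting by the stored count is also why a rollup built from rollups can match one computed directly from the raw points.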