Time Series Data with Apache Cassandra

Published in: Technology

  1. 1. Time Series Data With Apache Cassandra Strangeloop September 19, 2014 Eric Evans eevans@opennms.org @jericevans
  2. 2. Open
  3. 3. Open
  4. 4. Open
  5. 5. Open
  6. 6. Network Management System
  7. 7. OpenNMS: What It Is ● Network Management System ○ Discovery and Provisioning ○ Service monitoring ○ Data collection ○ Event management, notifications ● Java, open source, GPLv3 ● Since 1999
  8. 8. Time series: RRDTool ● Round Robin Database ● First released 1999 ● Time series storage ● File-based, constant-size, self-maintaining ● Automatic, incremental aggregation
  9. 9. … and oh yeah, graphing
  10. 10. Consider ● 5+ IOPS per update (read-modify-write)! ● 100,000s of metrics, 1,000s of IOPS ● 1,000,000s of metrics, 10,000s of IOPS ● 15,000 RPM SAS drive, ~175-200 IOPS
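
A rough back-of-the-envelope check of the numbers on slide 10, sketched in Python. The 300-second collection interval is an assumption (the slides do not state one), 175 IOPS is the low end of the quoted drive range, and `required_iops` and `drives_needed` are illustrative names.

```python
import math


def required_iops(metrics: int, interval_s: float, iops_per_update: float = 5.0) -> float:
    """Sustained IOPS if every metric is updated once per collection interval."""
    return metrics * iops_per_update / interval_s


def drives_needed(iops: float, iops_per_drive: float = 175.0) -> int:
    """Number of drives needed to sustain the given IOPS (no headroom, no RAID overhead)."""
    return math.ceil(iops / iops_per_drive)


if __name__ == "__main__":
    # ASSUMPTION: one update per metric every 300 seconds.
    for metrics in (100_000, 1_000_000):
        iops = required_iops(metrics, interval_s=300)
        print(f"{metrics:>9} metrics: ~{round(iops)} IOPS, {drives_needed(iops)} drives")
```

Under these assumptions, 100,000 metrics need roughly 1,667 IOPS (about 10 drives) and 1,000,000 metrics roughly 16,667 IOPS (about 96 drives), which is consistent with the orders of magnitude on the slide.
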
  11. 11. Hmmm We collect and write a great deal; We read (graph) relatively little. So why are we aggregating everything?
  12. 12. Also ● Not everything is a graph ● Inflexible ● Incremental backups impractical ● Availability subject to filesystem access
  13. 13. TIL Metrics typically appear in groups that are accessed together. Optimizing storage for grouped access is a great idea!
  14. 14. What OpenNMS needs: ● High throughput ● High availability ● Late aggregation ● Grouped storage/retrieval
  15. 15. Cassandra ● Apache top-level project ● Distributed database ● Highly available ● High throughput ● Tunable consistency
  16. 16. SSTables Writes Memtable Commitlog SSTable Memory Disk
  17. 17. Write Properties ● Optimized for write throughput ● Sorted on disk ● Perfect for time series!
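
The write path on slides 16 and 17 can be illustrated with a toy model. This is a minimal sketch, not Cassandra's implementation: `ToyStore` and `flush_threshold` are hypothetical names, and real flushes are triggered by memory pressure rather than by a row count.

```python
class ToyStore:
    """Toy write path: commit log append, memtable update, sorted flush to an SSTable."""

    def __init__(self, flush_threshold: int = 3):
        self.commitlog = []   # append-only log, written sequentially for durability
        self.memtable = {}    # in-memory, mutable view of recent writes
        self.sstables = []    # immutable, sorted runs standing in for files on disk
        self.flush_threshold = flush_threshold  # ASSUMPTION: flush after N distinct keys

    def write(self, key, value):
        # No read-modify-write: a write is an append plus an in-memory update.
        self.commitlog.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # The memtable is written out in key order, so on-disk data is sorted.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commitlog = []  # flushed writes no longer need to be replayed


if __name__ == "__main__":
    store = ToyStore(flush_threshold=3)
    # Keys are (resource, timestamp) pairs arriving out of order.
    for key, value in [(("r1", 3), 0.3), (("r1", 1), 0.1), (("r1", 2), 0.2)]:
        store.write(key, value)
    print(store.sstables[0])  # rows come back ordered by timestamp
```

Because each flushed run is sorted by key, samples keyed by timestamp end up stored in time order, which is the property the slide calls out as a good fit for time series.
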
  18. 18. Partitioning A B C Key: Apple ... Z A
  19. 19. Placement A B C Key: Apple ...
  20. 20. Replication A B C Key: Apple ...
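
Slides 18 through 20 (partitioning, placement, replication) can be sketched as a small hash ring. This is an illustration under stated assumptions: `Ring` is a hypothetical class, tokens are spaced evenly, MD5 stands in for the partitioner's hash function, and replicas are simply the next nodes clockwise, ignoring racks and data centers.

```python
import bisect
import hashlib


class Ring:
    """Toy token ring: each node owns the token range ending at its own token."""

    def __init__(self, nodes, token_space: int = 2**32):
        self.nodes = list(nodes)
        self.token_space = token_space
        step = token_space // len(self.nodes)
        # ASSUMPTION: evenly spaced tokens, one per node, in node order.
        self.tokens = [i * step for i in range(len(self.nodes))]

    def token(self, key: str) -> int:
        # Partitioning: hash the key onto the ring (MD5 is a stand-in hash).
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.token_space

    def replicas(self, key: str, replication_factor: int = 3):
        # Placement: first node whose token is >= the key's token, wrapping around.
        start = bisect.bisect_left(self.tokens, self.token(key)) % len(self.nodes)
        # Replication: that node plus the next ones clockwise.
        count = min(replication_factor, len(self.nodes))
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(count)]


if __name__ == "__main__":
    ring = Ring(["A", "B", "C"])
    print(ring.replicas("Apple", replication_factor=3))
```

Every node runs the same lookup, so any node can work out which replicas hold a key without consulting a central coordinator.
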
  21. 21. CAP Theorem Consistency Availability Partition tolerance
  22. 22. Consistency A B ? W=2
  23. 23. Consistency ? B C R=2 R+W > N
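
The rule R+W > N on slide 23 can be checked by brute force: if every possible write set and read set share at least one replica, a read always sees the latest acknowledged write. A minimal sketch follows; `always_overlaps` is an illustrative name, and N = 3 is assumed from the three-node diagrams.

```python
from itertools import combinations


def always_overlaps(n: int, w: int, r: int) -> bool:
    """True if every set of w written replicas intersects every set of r read replicas."""
    replicas = range(n)
    return all(
        set(written) & set(read)
        for written in combinations(replicas, w)
        for read in combinations(replicas, r)
    )


if __name__ == "__main__":
    n = 3  # ASSUMPTION: three replicas, as in the A/B/C diagrams
    for w, r in [(2, 2), (1, 1), (1, 3), (3, 1)]:
        print(f"N={n} W={w} R={r}: R+W>N is {r + w > n}, overlap guaranteed: {always_overlaps(n, w, r)}")
```

With N=3, the W=2, R=2 combination from the slides always overlaps, while W=1, R=1 does not, which is the trade-off that tunable consistency exposes.
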
  24. 24. Distribution Properties ● Symmetrical ● Linearly scalable ● Redundant ● Highly available
  25. 25. Data Model
  26. 26. Data Model resource
  27. 27. Data Model resource T1 T2 T3
  28. 28. Data Model resource T1 M1 M2 V1 V2 M3 V3 T2 M1 M2 V1 V2 M3 V3 T3 M1 M2 V1 V2 M3 V3
  29. 29. Data Model CREATE TABLE samples ( T timestamp, M text, V double, resource text, PRIMARY KEY(resource, T, M) );
  30. 30. Data model resource T1 M1 V1 T2 M1 V1 T3 M1 V1
  31. 31. Data model resource T1 M1 V1 T2 M1 V1 T3 M1 V1 SELECT * FROM samples WHERE resource = 'resource' AND T >= 'T1' AND T <= 'T3';
  32. 32. Data model resource T1 M1 V1 T2 M1 V1 T3 M1 V1 SELECT * FROM samples WHERE resource = 'resource' AND T >= 'T1' AND T <= 'T3';
  33. 33. Data model resource T1 M1 V1 T2 M1 V1 T3 M1 V1 resource T1 M1 V1 resource T2 M2 V2 resource T3 M3 V3
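
The table on slide 29 uses `resource` as the partition key and `(T, M)` as clustering columns, so all samples for a resource live together, sorted by time. The in-memory model below is a sketch of why the range query on slide 31 is cheap; `Samples` is a hypothetical class, and timestamps are assumed to be integer epoch seconds.

```python
import bisect
from collections import defaultdict


class Samples:
    """In-memory model of PRIMARY KEY(resource, T, M): one sorted partition per resource."""

    def __init__(self):
        self._keys = defaultdict(list)  # resource -> sorted list of (T, M)
        self._values = {}               # (resource, T, M) -> V

    def insert(self, resource: str, t: int, m: str, v: float) -> None:
        # Writing the same primary key again overwrites the value (an upsert).
        if (resource, t, m) not in self._values:
            bisect.insort(self._keys[resource], (t, m))
        self._values[(resource, t, m)] = v

    def select(self, resource: str, t_start: int, t_end: int):
        """Rows with t_start <= T <= t_end for one resource, in (T, M) order."""
        keys = self._keys.get(resource, [])
        # (t,) sorts before every (t, m), so these bounds bracket the time range.
        lo = bisect.bisect_left(keys, (t_start,))
        hi = bisect.bisect_left(keys, (t_end + 1,))  # ASSUMPTION: integer timestamps
        return [(t, m, self._values[(resource, t, m)]) for t, m in keys[lo:hi]]


if __name__ == "__main__":
    samples = Samples()
    samples.insert("resource", 2, "M1", 2.0)
    samples.insert("resource", 1, "M2", 1.5)
    samples.insert("resource", 1, "M1", 1.0)
    samples.insert("resource", 3, "M1", 3.0)
    print(samples.select("resource", 1, 2))
```

The query touches a single partition and reads one contiguous slice of it, and all metrics collected at the same time come back together, which matches the grouped storage and retrieval requirement from slide 14.
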
  34. 34. Newts ● Standalone time series data-store ○ Java API ○ REST interface ● Raw sample storage and retrieval ● Flexible aggregations (computed at read) ○ Rate (from counter types) ○ Pluggable aggregation functions ○ Arbitrary calculations
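
Computing aggregations at read time means raw counter samples are stored untouched and a rate is derived only when someone asks for it. The function below is a generic sketch of that idea, not the Newts API: `rates` is a hypothetical helper, and skipping decreasing values as counter resets is an assumption about how resets should be handled.

```python
def rates(samples):
    """Per-second rates from time-ordered (timestamp_seconds, counter_value) pairs.

    Returns a list of (timestamp, rate) pairs, one per usable consecutive pair.
    """
    out = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0 or v1 < v0:
            # ASSUMPTION: skip duplicate timestamps and counter resets
            continue
        out.append((t1, (v1 - v0) / elapsed))
    return out


if __name__ == "__main__":
    raw = [(0, 100), (300, 400), (600, 1000)]  # raw counter samples as stored
    print(rates(raw))  # [(300, 1.0), (600, 2.0)]
```

Since the raw samples are kept, a different aggregation (an average, a maximum, or an arbitrary calculation) can be applied later without rewriting anything on disk.
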
  35. 35. Newts ● Cassandra-speed ● Resource search indexing (preliminary) ● Approaching “1.0” ● Apache license ● GitHub (http://github.com/OpenNMS/newts) ● http://newts.io
  36. 36. Fin