Time series storage in Cassandra

Presented at Cassandra London (April 7, 2014). Covers the challenges of time-series storage and analytics in OpenNMS, with an introduction to Newts, a new Cassandra-based time-series data store.

  1. Time Series Storage. Cassandra London Meetup, April 7, 2014. Eric Evans, eevans@opennms.org, @jericevans
  2. Open
  3. Open
  4. Network Management System
  5. OpenNMS: What It Is
     ● Network Management System
       ○ Discovery and Provisioning
       ○ Service monitoring
       ○ Data collection
       ○ Event management and notifications
     ● Java, open source, GPLv3
     ● Since 1999
  6. Graph All The Things
  7. RRDTool
     ● Round robin database
     ● First released 1999
     ● Time-series storage
     ● File-based
     ● Constant-size
     ● Automatic, amortized aggregation
  8. Consider
     ● 2 IOPS per update (read-update-write)
     ● 1 RRD per data source (storeByGroup=false)
     ● 100,000s of data sources, 1,000s of IOPS
     ● 1,000,000s of data sources, 10,000s of IOPS
     ● 15,000 RPM SAS drive, ~175-200 IOPS
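The arithmetic behind these figures can be sketched as follows. The 2 IOPS per update and the ~175 IOPS per drive come from the slide; the 300-second collection interval is an assumption made here for illustration (the slides do not state one), and the function names are hypothetical.

```python
import math

IOPS_PER_UPDATE = 2   # read-update-write, per the slide
DRIVE_IOPS = 175      # low end of the 15,000 RPM SAS estimate on the slide


def required_iops(data_sources: int, interval_seconds: int = 300) -> float:
    """Sustained IOPS when each data source (one RRD each) is updated once per interval.

    The 300 s default interval is an assumption, not a figure from the slides.
    """
    return data_sources * IOPS_PER_UPDATE / interval_seconds


def drives_needed(data_sources: int, interval_seconds: int = 300) -> int:
    """Minimum number of drives whose combined IOPS cover the update load."""
    return math.ceil(required_iops(data_sources, interval_seconds) / DRIVE_IOPS)


for sources in (300_000, 3_000_000):
    print(sources, round(required_iops(sources)), drives_needed(sources))
```

Under these assumptions, 300,000 data sources need about 2,000 IOPS (12 drives) and 3,000,000 need about 20,000 IOPS (115 drives), which matches the orders of magnitude on the slide.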
  9. Also
     ● Not everything is a graph
     ● Inflexible
     ● Incremental backups impractical
     ● ...
  10. Observation #1: We collect and write a great deal; we read (graph) relatively little. Yet we are optimized for reading everything, always.
  11. Observation #2: Samples are naturally collected and graphed together, in groups. Grouping samples that are accessed together is an easy optimization.
  12. Project: Newts
      Goals:
      ● Stand-alone time-series data store
      ● High-throughput
      ● Horizontally scalable
      ● Grouped metric storage/retrieval
      ● Late-aggregating
  13. Cassandra
      Why:
      ● Write-optimized
      ● Sorted
      ● Horizontally scalable (linear)
  14. Gist
      ● Samples stored as-is.
      ● Samples can be retrieved as-is.
      ● Measurements are aggregations calculated from samples (at time of query).
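A minimal sketch of the late-aggregation idea: raw samples are kept untouched, and measurements are computed only when a query asks for them. This is not Newts code. The function name, the choice of averaging as the aggregate, and every sample value after the first are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Raw samples stored as-is: (timestamp in seconds, metric name, value).
# Only the first sample comes from the slides; the rest are made up.
samples = [
    (1396289065, "meanTemp", 17.2),
    (1396289365, "meanTemp", 17.6),
    (1396289665, "meanTemp", 18.1),
    (1396289965, "meanTemp", 18.3),
]


def measurements(samples, metric, start, end, resolution):
    """Average one metric into buckets of `resolution` seconds, at query time."""
    buckets = defaultdict(list)
    for ts, name, value in samples:
        if name == metric and start <= ts < end:
            # Align each sample to the start of the bucket it falls in.
            bucket_start = start + ((ts - start) // resolution) * resolution
            buckets[bucket_start].append(value)
    return [(b, mean(vs)) for b, vs in sorted(buckets.items())]


result = measurements(samples, "meanTemp", 1396289065, 1396290265, 600)
for bucket_start, average in result:
    print(bucket_start, round(average, 2))
```

Because nothing is rolled up at write time, the same stored samples can be re-queried at a different resolution or with a different aggregate without rewriting any data.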
  15. Sample
      {
        "resource" : "london",
        "timestamp" : 1396289065,
        "name" : "meanTemp",
        "type" : "GAUGE",
        "value" : 17.2,
        "attributes" : { "units": "celsius" }
      }
  16. Samples
      CREATE TABLE newts.samples (
        resource text,
        collected_at timestamp,
        metric_name text,
        metric_type text,
        value blob,
        attributes map<text, text>,
        PRIMARY KEY(resource, collected_at, metric_name)
      );
  17. Samples
      resource | collected_at        | metric_name  | value
      ---------+---------------------+--------------+-----------
      london   | 2014-03-31 18:04:25 | dewPoint     | 0xc01a0000
      london   | 2014-03-31 18:04:25 | maxTemp      | 0x40280000
      london   | 2014-03-31 18:04:25 | maxWindGust  | 0x7ff80000
      london   | 2014-03-31 18:04:25 | maxWindSpeed | 0x40180000
      london   | 2014-03-31 18:04:25 | meanTemp     | 0xbfe00000
  18. Behind the scenes...
      london
        (2014-03-31 18:04:25, dewPoint): 0xc01a0000
        (2014-03-31 18:04:25, maxTemp): 0x40280000
        ...
      Ascending order
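The layout above can be modelled in a few lines: one partition per resource, with cells kept sorted by (collected_at, metric_name), so that a time-range read is a single contiguous slice. This is an in-memory stand-in that only mimics the ordering, not how Cassandra is implemented. The `write` and `read` helpers are hypothetical names, and the 18:09:25 row is made up to give the range query something to exclude.

```python
from bisect import bisect_left, insort

# partition key (resource) -> sorted list of ((collected_at, metric_name), value)
partitions = {}


def write(resource, collected_at, metric_name, value):
    """Insert a cell, keeping the partition sorted by (collected_at, metric_name)."""
    insort(partitions.setdefault(resource, []), ((collected_at, metric_name), value))


def read(resource, start, end):
    """Return all cells with start <= collected_at < end as one contiguous slice."""
    row = partitions.get(resource, [])
    # A 1-tuple key sorts before every cell that shares its (timestamp, name) prefix.
    lo = bisect_left(row, ((start, ""),))
    hi = bisect_left(row, ((end, ""),))
    return row[lo:hi]


# Writes arrive in arbitrary order; storage order is always ascending.
write("london", "2014-03-31 18:04:25", "meanTemp", "0xbfe00000")
write("london", "2014-03-31 18:04:25", "dewPoint", "0xc01a0000")
write("london", "2014-03-31 18:09:25", "dewPoint", "0xc01a0000")  # made-up later row
write("london", "2014-03-31 18:04:25", "maxTemp", "0x40280000")

for (collected_at, metric_name), value in read(
    "london", "2014-03-31 18:04:00", "2014-03-31 18:05:00"
):
    print(collected_at, metric_name, value)
```

All metrics collected together for a resource sit next to each other in the slice, which is the grouping optimization from Observation #2.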
  19. http://github.com/OpenNMS/newts
