Time series storage in Cassandra

Presented at Cassandra London (April 7, 2014). Covers the challenges of time-series storage and analytics in OpenNMS, and introduces Newts, a new Cassandra-based time-series data store.

Transcript

  • 1. Time Series Storage Cassandra London Meetup April 7, 2014 Eric Evans eevans@opennms.org @jericevans
  • 2. Open
  • 3. Open
  • 4. Network Management System
  • 5. OpenNMS: What It Is ● Network Management System ○ Discovery and Provisioning ○ Service monitoring ○ Data collection ○ Event management and notifications ● Java, open source, GPLv3 ● Since 1999
  • 6. Graph All The Things
  • 7. RRDTool ● Round robin database ● First released 1999 ● Time-series storage ● File-based ● Constant-size ● Automatic, amortized aggregation
  • 8. Consider ● 2 IOPS per update (read-update-write) ● 1 RRD per data source (storeByGroup=false) ● 100,000s of data sources, 1,000s of IOPS ● 1,000,000s of data sources, 10,000s of IOPS ● 15,000 RPM SAS drive, ~175-200 IOPS
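
The arithmetic behind this slide can be sketched as follows. The 2 IOPS per update and the 175-200 IOPS drive figure come from the slide; the 300-second collection interval is an assumption (the slides do not state one), and the function names are illustrative only.

```python
import math

IOPS_PER_UPDATE = 2   # read-update-write, from the slide
STEP_SECONDS = 300    # ASSUMPTION: each data source is updated once every 5 minutes
DRIVE_IOPS = 175      # low end of the slide's ~175-200 range for a 15,000 RPM SAS drive


def required_iops(data_sources: int, step_seconds: int = STEP_SECONDS) -> float:
    """Sustained IOPS needed when every data source is updated once per step."""
    return data_sources * IOPS_PER_UPDATE / step_seconds


def drives_needed(iops: float, drive_iops: int = DRIVE_IOPS) -> int:
    """Number of drives needed to absorb the load, ignoring caching and RAID overhead."""
    return math.ceil(iops / drive_iops)


for sources in (500_000, 5_000_000):
    iops = required_iops(sources)
    print(f"{sources:>9,} data sources -> {iops:,.0f} IOPS, "
          f"about {drives_needed(iops)} drives")
```

Under that assumed interval, hundreds of thousands of data sources land in the thousands of IOPS, which matches the order of magnitude on the slide.
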
  • 9. Also ● Not everything is a graph ● Inflexible ● Incremental backups impractical ● ...
  • 10. Observation #1 We collect and write a great deal; we read (graph) relatively little. We are optimized for reading everything, always.
  • 11. Observation #2 Samples are naturally collected and graphed together, in groups. Grouping samples that are accessed together is an easy optimization.
  • 12. Project: Newts Goals: ● Stand-alone time-series data store ● High-throughput ● Horizontally scalable ● Grouped metric storage/retrieval ● Late-aggregating
  • 13. Cassandra Why: ● Write-optimized ● Sorted ● Horizontally scalable (linear)
  • 14. Gist ● Samples stored as-is. ● Samples can be retrieved as-is. ● Measurements are aggregations calculated from samples (at time of query).
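
A minimal sketch of the late-aggregation idea: raw samples are kept unchanged, and an aggregate is computed only when a query asks for one. This is not the Newts API; the `aggregate` function, its parameters, and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Raw samples as (timestamp_seconds, value). Values are made up for illustration.
samples = [
    (0, 10.0), (300, 12.0), (600, 14.0),
    (900, 16.0), (1200, 18.0), (1500, 20.0),
]


def aggregate(samples, start, end, resolution, fn=mean):
    """Bucket samples in [start, end) by `resolution` seconds and reduce each bucket with `fn`."""
    buckets = defaultdict(list)
    for ts, value in samples:
        if start <= ts < end:
            bucket_start = start + ((ts - start) // resolution) * resolution
            buckets[bucket_start].append(value)
    return [(bucket, fn(values)) for bucket, values in sorted(buckets.items())]


# The same stored samples answer different questions at query time.
print(aggregate(samples, 0, 1800, 900))          # 15-minute averages
print(aggregate(samples, 0, 1800, 900, fn=max))  # 15-minute maxima
```

Because nothing is aggregated at write time, the resolution and the aggregation function are chosen per query rather than fixed when the store is created, in contrast to the RRDTool model described earlier.
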
  • 15. Sample
        {
          "resource": "london",
          "timestamp": 1396289065,
          "name": "meanTemp",
          "type": "GAUGE",
          "value": 17.2,
          "attributes": { "units": "celsius" }
        }
  • 16. Samples
        CREATE TABLE newts.samples (
            resource     text,
            collected_at timestamp,
            metric_name  text,
            metric_type  text,
            value        blob,
            attributes   map<text, text>,
            PRIMARY KEY (resource, collected_at, metric_name)
        );
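
A sketch of how the JSON sample from slide 15 might line up with the columns of this table. The field-to-column mapping is inferred from the names on the slides, not taken from the Newts source, and the timestamp is assumed to be seconds since the Unix epoch in UTC. `to_row` and `primary_key` are hypothetical helpers.

```python
from datetime import datetime, timezone

sample = {
    "resource": "london",
    "timestamp": 1396289065,
    "name": "meanTemp",
    "type": "GAUGE",
    "value": 17.2,
    "attributes": {"units": "celsius"},
}


def to_row(sample: dict) -> dict:
    """Map a sample onto the column names of newts.samples (assumed mapping)."""
    return {
        "resource": sample["resource"],
        # ASSUMPTION: the timestamp is epoch seconds, interpreted as UTC.
        "collected_at": datetime.fromtimestamp(sample["timestamp"], tz=timezone.utc),
        "metric_name": sample["name"],
        "metric_type": sample["type"],
        # The blob encoding is not described in the slides, so the value is left as-is.
        "value": sample["value"],
        "attributes": sample["attributes"],
    }


def primary_key(row: dict) -> tuple:
    """Partition key first (resource), then the two clustering columns."""
    return (row["resource"], row["collected_at"], row["metric_name"])


row = to_row(sample)
print(primary_key(row))
```

Under that assumption the timestamp converts to 2014-03-31 18:04:25 UTC, the same `collected_at` value that appears in the rows on the next slide.
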
  • 17. Samples
         resource | collected_at        | metric_name  | value
        ----------+---------------------+--------------+------------
         london   | 2014-03-31 18:04:25 | dewPoint     | 0xc01a0000
         london   | 2014-03-31 18:04:25 | maxTemp      | 0x40280000
         london   | 2014-03-31 18:04:25 | maxWindGust  | 0x7ff80000
         london   | 2014-03-31 18:04:25 | maxWindSpeed | 0x40180000
         london   | 2014-03-31 18:04:25 | meanTemp     | 0xbfe00000
  • 18. Behind the scenes...
        london
          (2014-03-31 18:04:25, dewPoint): 0xc01a0000
          (2014-03-31 18:04:25, maxTemp):  0x40280000
          ...
        Ascending Order
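
The layout on this slide can be modelled with a toy in-memory structure: one partition per resource, with cells kept sorted by the clustering key `(collected_at, metric_name)`. This is only an illustration of why sorted storage makes time-range reads cheap, not how Cassandra is implemented; the `Partition` class and the stored values are hypothetical.

```python
import bisect
from collections import defaultdict


class Partition:
    """Cells for one resource, ordered by (collected_at, metric_name)."""

    def __init__(self):
        self._keys = []    # clustering keys, kept in ascending order
        self._values = {}  # clustering key -> value

    def insert(self, collected_at, metric_name, value):
        key = (collected_at, metric_name)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def slice(self, start, end):
        """Return cells with start <= collected_at < end, in clustering order."""
        # A 1-tuple sorts before any 2-tuple sharing its first element,
        # so (start,) and (end,) act as range bounds on the timestamp alone.
        lo = bisect.bisect_left(self._keys, (start,))
        hi = bisect.bisect_left(self._keys, (end,))
        return [(key, self._values[key]) for key in self._keys[lo:hi]]


store = defaultdict(Partition)  # partition key (resource) -> Partition

# Timestamps are epoch seconds; values are placeholders, inserted out of order.
store["london"].insert(1396289065, "meanTemp", 1.0)
store["london"].insert(1396289065, "dewPoint", 2.0)
store["london"].insert(1396289365, "meanTemp", 3.0)
store["london"].insert(1396288765, "maxTemp", 4.0)

print(store["london"].slice(1396289065, 1396289365))
```

All metrics collected together for a resource sit next to each other, so a single contiguous slice returns the whole group for a time range.
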
  • 19. http://github.com/OpenNMS/newts