Time Series Storage
Cassandra London Meetup
April 7, 2014
Eric Evans
eevans@opennms.org
@jericevans
Open Network Management System
OpenNMS: What It Is
● Network Management System
○ Discovery and Provisioning
○ Service monitoring
○ Data collection
○ Event management and notifications
● Java, open source, GPLv3
● Since 1999
Graph All The Things
RRDTool
● Round robin database
● First released 1999
● Time-series storage
● File-based
● Constant-size
● Automatic, amortized aggregation
Consider
● 2 I/O operations per update (read-update-write)
● 1 RRD per data source (storeByGroup=false)
● 100,000s of data sources, 1,000s of IOPS
● 1,000,000s of data sources, 10,000s of IOPS
● 15,000 RPM SAS drive, ~175-200 IOPS
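The arithmetic behind these bullets can be sketched as follows. The 300-second collection interval is an assumption (the slide does not state one), as are the function names; the 2 I/Os per update and the 175 IOPS per drive come from the bullets above.

```python
import math

def required_iops(data_sources, ios_per_update=2, interval_seconds=300):
    """Sustained IOPS when every data source is updated once per interval.

    interval_seconds=300 is an assumed collection interval, not a figure
    from the slides.
    """
    return data_sources * ios_per_update / interval_seconds

def drives_needed(iops, iops_per_drive=175):
    """Drives needed to absorb the load, using the low end of ~175-200 IOPS."""
    return math.ceil(iops / iops_per_drive)

for sources in (300_000, 3_000_000):
    iops = required_iops(sources)
    print(f"{sources:>9,} data sources -> {iops:>8,.0f} IOPS -> {drives_needed(iops)} drives")
```

Under these assumptions, a few hundred thousand data sources already need more random I/O than a single 15,000 RPM drive can deliver by an order of magnitude.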
Also
● Not everything is a graph
● Inflexible
● Incremental backups impractical
● ...
Observation #1
We collect and write a great deal; we read
(graph) relatively little.
We are optimized for reading everything,
always.
Observation #2
Samples are naturally collected and graphed
together in groups.
Grouping samples that are accessed together
is an easy optimization.
Project: Newts
Goals:
● Stand-alone time-series data store
● High-throughput
● Horizontally scalable
● Grouped metric storage/retrieval
● Late-aggregating
Cassandra
Why:
● Write-optimized
● Sorted
● Horizontally scalable (linear)
Gist
● Samples stored as-is.
● Samples can be retrieved as-is.
● Measurements are aggregations calculated
from samples (at time of query).
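A minimal sketch of late aggregation, assuming fixed-width time buckets and an average as the default aggregate. This is not the Newts implementation; the sample values and the aggregate function are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Raw samples as (timestamp_seconds, value) pairs; stored as-is, values are made up.
samples = [(0, 10.0), (60, 12.0), (120, 14.0), (300, 20.0), (360, 22.0)]

def aggregate(samples, resolution, func=mean):
    """Compute measurements at query time.

    Groups samples into buckets of `resolution` seconds and reduces each
    bucket with `func`. Nothing is pre-aggregated at write time.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % resolution].append(value)
    return {start: func(values) for start, values in sorted(buckets.items())}

print(aggregate(samples, 300))        # 5-minute averages
print(aggregate(samples, 300, max))   # same samples, different aggregate
```

Because the raw samples are kept, the resolution and the aggregate function can be chosen per query instead of being fixed when the data is written.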
Sample
{
"resource" : "london",
"timestamp" : 1396289065,
"name" : "meanTemp",
"type" : "GAUGE",
"value" : 17.2,
"attributes" : { "units": "celsius" }
}
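A sketch of reading such a sample in Python. The Sample class and its collected_at property are hypothetical names for illustration, not part of any Newts client; the timestamp is assumed to be seconds since the Unix epoch.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Sample:
    """Hypothetical container mirroring the fields of the JSON sample."""
    resource: str
    timestamp: int   # assumed to be seconds since the Unix epoch
    name: str
    type: str
    value: float
    attributes: dict = field(default_factory=dict)

    @property
    def collected_at(self):
        """The timestamp as a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc)

raw = """
{
  "resource": "london",
  "timestamp": 1396289065,
  "name": "meanTemp",
  "type": "GAUGE",
  "value": 17.2,
  "attributes": { "units": "celsius" }
}
"""

sample = Sample(**json.loads(raw))
print(sample.collected_at.isoformat())  # 2014-03-31T18:04:25+00:00
```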
Samples
CREATE TABLE newts.samples (
resource text,
collected_at timestamp,
metric_name text,
metric_type text,
value blob,
attributes map<text, text>,
PRIMARY KEY(resource, collected_at, metric_name)
);
Samples
resource | collected_at | metric_name | value
---------+---------------------+--------------+-----------
london | 2014-03-31 18:04:25 | dewPoint | 0xc01a0000
london | 2014-03-31 18:04:25 | maxTemp | 0x40280000
london | 2014-03-31 18:04:25 | maxWindGust | 0x7ff80000
london | 2014-03-31 18:04:25 | maxWindSpeed | 0x40180000
london | 2014-03-31 18:04:25 | meanTemp | 0xbfe00000
Behind the scenes...
london (2014-03-31 18:04:25, dewPoint):
0xc01a0000
(2014-03-31 18:04:25, maxTemp):
0x40280000
...
Ascending Order
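The layout above can be modelled in a few lines: one partition per resource, with cells kept sorted by (collected_at, metric_name). This is an in-memory illustration of the idea, not Cassandra's storage engine; the class and method names are hypothetical, and values are treated as opaque strings.

```python
import bisect

class SamplesTable:
    """Toy model of a table partitioned by resource, clustered by (collected_at, metric_name)."""

    def __init__(self):
        self._keys = {}    # resource -> sorted list of (collected_at, metric_name)
        self._values = {}  # resource -> values, aligned index-for-index with _keys

    def insert(self, resource, collected_at, metric_name, value):
        keys = self._keys.setdefault(resource, [])
        values = self._values.setdefault(resource, [])
        key = (collected_at, metric_name)
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            values[i] = value          # same primary key: overwrite
        else:
            keys.insert(i, key)        # keep the partition in ascending order
            values.insert(i, value)

    def select(self, resource, start, end):
        """Rows with start <= collected_at <= end, in ascending order."""
        keys = self._keys.get(resource, [])
        values = self._values.get(resource, [])
        lo = bisect.bisect_left(keys, (start,))
        hi = bisect.bisect_left(keys, (end + 1,))
        return [(ts, name, values[i]) for i, (ts, name) in enumerate(keys[lo:hi], lo)]

table = SamplesTable()
t = 1396289065  # 2014-03-31 18:04:25 UTC
table.insert("london", t, "meanTemp", "0xbfe00000")
table.insert("london", t, "dewPoint", "0xc01a0000")
table.insert("london", t + 300, "dewPoint", "0xc01a0000")
table.insert("london", t, "maxTemp", "0x40280000")

for row in table.select("london", t, t):
    print(row)
```

All metrics collected together for a resource sit next to each other, so reading a group for a time range is one contiguous slice of one partition.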
http://github.com/OpenNMS/newts