Time Series Data With 
Apache Cassandra 
Strangeloop 
September 19, 2014 
Eric Evans 
eevans@opennms.org 
@jericevans
Open
Open
Open
Open
Network 
Management 
System
OpenNMS: What It Is 
● Network Management System 
○ Discovery and Provisioning 
○ Service monitoring 
○ Data collection 
○ Event management, notifications 
● Java, open source, GPLv3 
● Since 1999
Time series: RRDTool 
● Round Robin Database 
● First released 1999 
● Time series storage 
● File-based, constant-size, self-maintaining 
● Automatic, incremental aggregation
… and oh yeah, graphing
Consider 
● 5+ IOPs per update (read-modify-write)! 
● 100,000s of metrics, 1,000s IOPS 
● 1,000,000s of metrics, 10,000s IOPS 
● 15,000 RPM SAS drive, ~175-200 IOPS
Hmmm 
We collect and write a great deal; We read 
(graph) relatively little. 
So why are we aggregating everything?
Also 
● Not everything is a graph 
● Inflexible 
● Incremental backups impractical 
● Availability subject to filesystem access
TIL 
Metrics typically appear in groups that are 
accessed together. 
Optimizing storage for grouped access is a 
great idea!
What OpenNMS needs: 
● High throughput 
● High availability 
● Late aggregation 
● Grouped storage/retrieval
Cassandra 
● Apache top-level project 
● Distributed database 
● Highly available 
● High throughput 
● Tunable consistency
SSTables 
Writes 
Memtable 
Commitlog 
SSTable 
Memory 
Disk
Write Properties 
● Optimized for write throughput 
● Sorted on disk 
● Perfect for time series!
Partitioning 
A 
B 
C 
Key: Apple 
... 
Z A
Placement 
A 
B 
C 
Key: Apple 
...
Replication 
A 
B 
C 
Key: Apple 
...
CAP Theorem 
Consistency 
Availability 
Partition tolerance
Consistency 
A 
B 
? 
W=2
Consistency 
? 
B 
C 
R=2 
R+W > N
Distribution Properties 
● Symmetrical 
● Linearly scalable 
● Redundant 
● Highly available
D ata M odel
Data Model 
resource
Data Model 
resource 
T1 T2 T3
Data Model 
resource 
T1 
M1 M2 
V1 V2 
M3 
V3 
T2 
M1 M2 
V1 V2 
M3 
V3 
T3 
M1 M2 
V1 V2 
M3 
V3
Data Model 
CREATE TABLE samples ( 
T timestamp, 
M text, 
V double, 
resource text, 
PRIMARY KEY(resource, T, M) 
);
Data model 
resource T1 M1 V1 T2 M1 V1 T3 M1 V1
Data model 
resource T1 M1 V1 T2 M1 V1 T3 M1 V1 
SELECT * FROM samples 
WHERE resource = ‘resource’ 
AND T >= ‘T1’ AND T <= ‘T3’;
Data model 
resource T1 M1 V1 T2 M1 V1 T3 M1 V1 
SELECT * FROM samples 
WHERE resource = ‘resource’ 
AND T >= ‘T1’ AND T <= ‘T3’;
Data model 
resource T1 M1 V1 T2 M1 V1 T3 M1 V1 
resource T1 M1 V1 
resource T2 M2 V2 
resource T3 M3 V3
Newts 
● Standalone time series data-store 
○ Java API 
○ REST interface 
● Raw sample storage and retrieval 
● Flexible aggregations (computed at read) 
○ Rate (from counter types) 
○ Pluggable aggregation functions 
○ Arbitrary calculations
Newts 
● Cassandra-speed 
● Resource search indexing (preliminary) 
● Approaching “1.0” 
● Apache license 
● Github (http://github.com/OpenNMS/newts) 
● http://newts.io
Fin

Time Series Data with Apache Cassandra

  • 1.
    Time Series DataWith Apache Cassandra Strangeloop September 19, 2014 Eric Evans eevans@opennms.org @jericevans
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
    OpenNMS: What ItIs ● Network Management System ○ Discovery and Provisioning ○ Service monitoring ○ Data collection ○ Event management, notifications ● Java, open source, GPLv3 ● Since 1999
  • 8.
    Time series: RRDTool ● Round Robin Database ● First released 1999 ● Time series storage ● File-based, constant-size, self-maintaining ● Automatic, incremental aggregation
  • 9.
    … and ohyeah, graphing
  • 10.
    Consider ● 5+IOPs per update (read-modify-write)! ● 100,000s of metrics, 1,000s IOPS ● 1,000,000s of metrics, 10,000s IOPS ● 15,000 RPM SAS drive, ~175-200 IOPS
  • 12.
    Hmmm We collectand write a great deal; We read (graph) relatively little. So why are we aggregating everything?
  • 13.
    Also ● Noteverything is a graph ● Inflexible ● Incremental backups impractical ● Availability subject to filesystem access
  • 14.
    TIL Metrics typicallyappear in groups that are accessed together. Optimizing storage for grouped access is a great idea!
  • 15.
    What OpenNMS needs: ● High throughput ● High availability ● Late aggregation ● Grouped storage/retrieval
  • 16.
    Cassandra ● Apachetop-level project ● Distributed database ● Highly available ● High throughput ● Tunable consistency
  • 17.
    SSTables Writes Memtable Commitlog SSTable Memory Disk
  • 18.
    Write Properties ●Optimized for write throughput ● Sorted on disk ● Perfect for time series!
  • 19.
    Partitioning A B C Key: Apple ... Z A
  • 20.
    Placement A B C Key: Apple ...
  • 21.
    Replication A B C Key: Apple ...
  • 22.
    CAP Theorem Consistency Availability Partition tolerance
  • 23.
  • 24.
    Consistency ? B C R=2 R+W > N
  • 25.
    Distribution Properties ●Symmetrical ● Linearly scalable ● Redundant ● Highly available
  • 26.
    D ata Model
  • 27.
  • 28.
  • 29.
    Data Model resource T1 M1 M2 V1 V2 M3 V3 T2 M1 M2 V1 V2 M3 V3 T3 M1 M2 V1 V2 M3 V3
  • 30.
    Data Model CREATETABLE samples ( T timestamp, M text, V double, resource text, PRIMARY KEY(resource, T, M) );
  • 31.
    Data model resourceT1 M1 V1 T2 M1 V1 T3 M1 V1
  • 32.
    Data model resourceT1 M1 V1 T2 M1 V1 T3 M1 V1 SELECT * FROM samples WHERE resource = ‘resource’ AND T >= ‘T1’ AND T <= ‘T3’;
  • 33.
    Data model resourceT1 M1 V1 T2 M1 V1 T3 M1 V1 SELECT * FROM samples WHERE resource = ‘resource’ AND T >= ‘T1’ AND T <= ‘T3’;
  • 34.
    Data model resourceT1 M1 V1 T2 M1 V1 T3 M1 V1 resource T1 M1 V1 resource T2 M2 V2 resource T3 M3 V3
  • 35.
    Newts ● Standalonetime series data-store ○ Java API ○ REST interface ● Raw sample storage and retrieval ● Flexible aggregations (computed at read) ○ Rate (from counter types) ○ Pluggable aggregation functions ○ Arbitrary calculations
  • 36.
    Newts ● Cassandra-speed ● Resource search indexing (preliminary) ● Approaching “1.0” ● Apache license ● Github (http://github.com/OpenNMS/newts) ● http://newts.io
  • 37.