Time Series with Apache Cassandra
Patrick McFadin

Chief Evangelist
@PatrickMcFadin
©2013 DataStax Confidential. Do not distribute without consent.

Quick intro to Cassandra
• Shared nothing
• Masterless peer-to-peer
• Based on Dynamo
Scaling
• Add nodes to scale
• Millions Ops/s

[Chart: throughput (ops/sec) for Cassandra, HBase, Redis, and MySQL]
Uptime
• Built to replicate
• Resilient to failure
• Always on

NONE
Easy to use
• CQL is a familiar syntax
• Friendly to programmers
• Paxos for locking

CREATE TABLE users (
    username varchar,
    firstname varchar,
    lastname varchar,
    email list<varchar>,
    password varchar,
    created_date timestamp,
    PRIMARY KEY (username)
);

INSERT INTO users (username, firstname, lastname,
    email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
    ['patrick@datastax.com'],'ba27e03fd95e507daf2937c937d499ab',
    '2011-06-20 13:50:00');

INSERT INTO users (username, firstname,
    lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
    ['patrick@datastax.com'],
    'ba27e03fd95e507daf2937c937d499ab',
    '2011-06-20 13:50:00')
IF NOT EXISTS;
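
The difference between the two INSERT statements above can be modeled in a few lines of Python. This is a toy in-memory sketch, not the Cassandra driver API: the `ToyTable` class and its method names are hypothetical, and a plain dict stands in for the Paxos round that Cassandra runs across replicas for IF NOT EXISTS.

```python
class ToyTable:
    """Hypothetical in-memory stand-in for a table keyed by its primary key."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, row):
        # A plain INSERT is an upsert: an existing row with the same
        # primary key is silently overwritten.
        self.rows[key] = row

    def insert_if_not_exists(self, key, row):
        # IF NOT EXISTS applies the write only when no row has this key,
        # and reports whether the write was applied.
        if key in self.rows:
            return False
        self.rows[key] = row
        return True


users = ToyTable()
first = users.insert_if_not_exists("pmcfadin", {"firstname": "Patrick"})
second = users.insert_if_not_exists("pmcfadin", {"firstname": "Someone else"})
print(first, second, users.rows["pmcfadin"]["firstname"])
```

The second conditional insert is rejected and the original row survives, whereas a plain `insert` with the same key would have replaced it.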
Time series in production
• It’s all about “What’s happening”
• Data is the new currency

“Sirca, a non-profit university consortium based in Sydney, is the world’s biggest broker of
financial data, ingesting into its database 2 million pieces of information a second from every
major trading exchange.”*
* http://www.theage.com.au/it-pro/business-it/help-poverty-theres-an-app-for-that-20140120-hv948.html
Why Cassandra for Time Series
Scales
Resilient
Good data model
Efficient Storage Model

What about that?
Data Model
CREATE TABLE temperature (
weatherstation_id text,
event_time timestamp,
temperature text,
PRIMARY KEY (weatherstation_id,event_time)
);

• Weather Station Id and Time are unique
• Store as many as needed

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','73F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F');
Storage Model - Logical View
SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';

weatherstation_id   event_time            temperature
1234ABCD            2013-04-03 07:01:00   72F
1234ABCD            2013-04-03 07:02:00   73F
1234ABCD            2013-04-03 07:03:00   73F
1234ABCD            2013-04-03 07:04:00   74F
Storage Model - Disk Layout
SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';

Merged, Sorted and Stored Sequentially

1234ABCD:
    2013-04-03 07:01:00   72F
    2013-04-03 07:02:00   73F
    2013-04-03 07:03:00   73F
    2013-04-03 07:04:00   74F
    2013-04-03 07:05:00   74F
    2013-04-03 07:06:00   75F
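
The "merged, sorted and stored sequentially" idea can be sketched with a small in-memory model. This is only an illustration under stated assumptions: the `partitions` dict and the `insert` helper are hypothetical, and the real storage engine sorts and merges data through its own write path rather than at insert time as done here.

```python
import bisect
from collections import defaultdict

# Hypothetical toy model: partition key -> list of (event_time, temperature)
# kept sorted by event_time, the clustering column.
partitions = defaultdict(list)


def insert(weatherstation_id, event_time, temperature):
    row = partitions[weatherstation_id]
    # 'YYYY-MM-DD HH:MM:SS' strings sort chronologically, so plain string
    # comparison is enough to find the insertion point.
    i = bisect.bisect_left(row, (event_time,))
    if i < len(row) and row[i][0] == event_time:
        row[i] = (event_time, temperature)  # same primary key: overwrite
    else:
        row.insert(i, (event_time, temperature))


# Writes arrive out of order...
insert("1234ABCD", "2013-04-03 07:03:00", "73F")
insert("1234ABCD", "2013-04-03 07:01:00", "72F")
insert("1234ABCD", "2013-04-03 07:04:00", "74F")
insert("1234ABCD", "2013-04-03 07:02:00", "73F")

# ...but the partition reads back in event_time order.
for event_time, temperature in partitions["1234ABCD"]:
    print(event_time, temperature)
```

All readings for one weather station live under one key, which is why a query for that station can read them back as one ordered run.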
Query patterns
SELECT event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';

• Range queries
• “Slice” operation on disk

Single seek on disk
1234ABCD:
    2013-04-03 07:01:00   72F
    2013-04-03 07:02:00   73F
    2013-04-03 07:03:00   73F
    2013-04-03 07:04:00   74F
    2013-04-03 07:05:00   74F
    2013-04-03 07:06:00   75F
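
Because the readings for a station are already sorted by event_time, a range query reduces to finding two positions and returning everything between them. A minimal sketch of that slice, assuming a hypothetical in-memory copy of the partition (the `row` list and `slice_exclusive` helper are illustrative names, not part of Cassandra):

```python
import bisect

# Hypothetical in-memory copy of one partition, already sorted by event_time.
row = [
    ("2013-04-03 07:01:00", "72F"),
    ("2013-04-03 07:02:00", "73F"),
    ("2013-04-03 07:03:00", "73F"),
    ("2013-04-03 07:04:00", "74F"),
    ("2013-04-03 07:05:00", "74F"),
    ("2013-04-03 07:06:00", "75F"),
]
times = [event_time for event_time, _ in row]


def slice_exclusive(start, end):
    """Return rows with start < event_time < end, like the query above."""
    lo = bisect.bisect_right(times, start)  # first entry strictly after start
    hi = bisect.bisect_left(times, end)     # first entry at or after end
    return row[lo:hi]


result = slice_exclusive("2013-04-03 07:01:00", "2013-04-03 07:04:00")
print(result)
```

With the strict `>` and `<` bounds used in the query, the 07:01:00 and 07:04:00 readings themselves are excluded; `>=` and `<=` would include them.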
Query patterns
SELECT event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';

• Range queries
• “Slice” operation on disk

Sorted by event_time

weatherstation_id   event_time            temperature
1234ABCD            2013-04-03 07:01:00   72F
1234ABCD            2013-04-03 07:02:00   73F
1234ABCD            2013-04-03 07:03:00   73F
1234ABCD            2013-04-03 07:04:00   74F

Programmers like this
Ingestion models
• Apache Kafka
• Apache Flume
• Storm
• Custom Applications

Apache Kafka

Your totally killer application
Dealing with data at speed
• 1 million writes per second?
• 1 insert every microsecond
• Collisions?

Your totally killer application

weatherstation_id='5678EFGH'

• Partition key (first part of the primary key) determines node placement
• Random partitioning
• Special data type - TimeUUID

weatherstation_id='1234ABCD'
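
How a key ends up on a node can be sketched as "hash the partition key, then find the owner on a ring". Everything specific below is an assumption made for illustration: the node names, the evenly spaced tokens, and the use of MD5 as the hash (Cassandra's own partitioners differ in detail, and recent versions default to Murmur3).

```python
import bisect
import hashlib

# Hypothetical 4-node ring with evenly spaced tokens.
RING_SIZE = 2 ** 128
NODES = ["node-a", "node-b", "node-c", "node-d"]
TOKENS = [i * RING_SIZE // len(NODES) for i in range(len(NODES))]


def token_for(partition_key):
    # MD5 is only a stand-in hash here; it spreads keys evenly over the ring.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def owner(partition_key):
    # The owner is the node holding the first token at or after the key's
    # token, wrapping around to the first node at the end of the ring.
    i = bisect.bisect_left(TOKENS, token_for(partition_key)) % len(NODES)
    return NODES[i]


for station in ("1234ABCD", "5678EFGH"):
    print(station, "->", owner(station))
```

Because placement depends only on the hash of the key, every write for the same weather station lands on the same node, while different stations scatter across the cluster.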
TimeUUID
Timestamp to Microsecond + UUID = TimeUUID

• Also known as a Version 1 UUID
• Sortable
• Reversible

04d580b0-9412-11e3-baa8-0800200c9a66 = Wednesday, February 12, 2014 6:18:06 PM GMT

http://www.famkruithof.net/uuid/uuidgen
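
The "reversible" property can be checked with Python's standard `uuid` module. This sketch decodes the example value above; the helper name `timeuuid_to_datetime` is hypothetical. A version 1 UUID stores its timestamp as a count of 100-nanosecond intervals since 1582-10-15, so the code subtracts the offset to the Unix epoch before converting.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Number of 100-ns intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_datetime(value):
    u = uuid.UUID(value)
    if u.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    micros = (u.time - UUID_EPOCH_OFFSET) // 10
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=micros)


when = timeuuid_to_datetime("04d580b0-9412-11e3-baa8-0800200c9a66")
print(when.strftime("%Y-%m-%d %H:%M:%S"))  # 2014-02-12 18:18:06

# Two IDs generated back to back are still distinct values.
a, b = uuid.uuid1(), uuid.uuid1()
print(a != b)

# To order by time, sort on the embedded timestamp rather than on the text
# form: the string starts with the low-order timestamp bits.
ordered = sorted([b, a], key=lambda u: u.time)
```

Distinct IDs for writes that share a timestamp are what keep two events for the same station from overwriting each other when the TimeUUID is part of the primary key.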
Way more information
www.planetcassandra.org

• 5 minute interviews
• Use cases
• Free training!
Thank You!

Follow me for more updates all the time: @PatrickMcFadin
