Time Series with Apache Cassandra (Strata)

This talk covers the basics of how Apache Cassandra stores and accesses time series data.

Published in: Technology

  1. Time Series with Apache Cassandra. Patrick McFadin, Chief Evangelist, @PatrickMcFadin. ©2013 DataStax Confidential. Do not distribute without consent.
  2. Quick intro to Cassandra • Shared nothing • Masterless peer-to-peer • Based on Dynamo
  3. Scaling • Add nodes to scale • Millions of ops/s [Chart: throughput (ops/sec) for Cassandra, HBase, Redis, MySQL]
  4. Uptime • Built to replicate • Resilient to failure • Always on
  5. Easy to use • CQL is a familiar syntax • Friendly to programmers • Paxos for locking

         CREATE TABLE users (
           username varchar,
           firstname varchar,
           lastname varchar,
           email list<varchar>,
           password varchar,
           created_date timestamp,
           PRIMARY KEY (username)
         );

         INSERT INTO users (username, firstname, lastname,
           email, password, created_date)
         VALUES ('pmcfadin','Patrick','McFadin',
           ['patrick@datastax.com'],'ba27e03fd95e507daf2937c937d499ab',
           '2011-06-20 13:50:00');

         INSERT INTO users (username, firstname,
           lastname, email, password, created_date)
         VALUES ('pmcfadin','Patrick','McFadin',
           ['patrick@datastax.com'],
           'ba27e03fd95e507daf2937c937d499ab',
           '2011-06-20 13:50:00')
         IF NOT EXISTS;
  6. Time series in production • It’s all about “What’s happening” • Data is the new currency • “Sirca, a non-profit university consortium based in Sydney, is the world’s biggest broker of financial data, ingesting into its database 2 million pieces of information a second from every major trading exchange.”* * http://www.theage.com.au/it-pro/business-it/help-poverty-theres-an-app-for-that-20140120-hv948.html
  7. Why Cassandra for Time Series • Scales • Resilient • Good data model • Efficient Storage Model (What about that?)
  8. Data Model • Weather Station Id and Time are unique • Store as many as needed

         CREATE TABLE temperature (
           weatherstation_id text,
           event_time timestamp,
           temperature text,
           PRIMARY KEY (weatherstation_id,event_time)
         );

         INSERT INTO temperature(weatherstation_id,event_time,temperature)
         VALUES ('1234ABCD','2013-04-03 07:01:00','72F');

         INSERT INTO temperature(weatherstation_id,event_time,temperature)
         VALUES ('1234ABCD','2013-04-03 07:02:00','73F');

         INSERT INTO temperature(weatherstation_id,event_time,temperature)
         VALUES ('1234ABCD','2013-04-03 07:03:00','73F');

         INSERT INTO temperature(weatherstation_id,event_time,temperature)
         VALUES ('1234ABCD','2013-04-03 07:04:00','74F');
  9. Storage Model - Logical View

         SELECT weatherstation_id,event_time,temperature
         FROM temperature
         WHERE weatherstation_id='1234ABCD';

         weatherstation_id | event_time          | temperature
         ------------------+---------------------+------------
         1234ABCD          | 2013-04-03 07:01:00 | 72F
         1234ABCD          | 2013-04-03 07:02:00 | 73F
         1234ABCD          | 2013-04-03 07:03:00 | 73F
         1234ABCD          | 2013-04-03 07:04:00 | 74F
  10. Storage Model - Disk Layout • Merged, Sorted and Stored Sequentially

          SELECT weatherstation_id,event_time,temperature
          FROM temperature
          WHERE weatherstation_id='1234ABCD';

          [Diagram: a single row for 1234ABCD, with one column per event_time]

          1234ABCD | 2013-04-03 07:01:00 | 72F
                   | 2013-04-03 07:02:00 | 73F
                   | 2013-04-03 07:03:00 | 73F
                   | 2013-04-03 07:04:00 | 74F
                   | 2013-04-03 07:05:00 | 74F
                   | 2013-04-03 07:06:00 | 75F
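The "merged, sorted and stored sequentially" idea can be sketched in a few lines of Python. This is a toy in-memory model, not Cassandra's actual storage engine: the class name `WideRowStore` and its methods are hypothetical, and timestamps are kept as ISO-style strings so that string order matches time order.

```python
import bisect
from collections import defaultdict


class WideRowStore:
    """Toy model: partition key -> list of (event_time, value), kept sorted."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, partition_key, event_time, value):
        row = self._partitions[partition_key]
        keys = [k for k, _ in row]
        i = bisect.bisect_left(keys, event_time)
        if i < len(row) and row[i][0] == event_time:
            # Same partition key and event_time: the newer write replaces the old one.
            row[i] = (event_time, value)
        else:
            # Otherwise the cell is placed at its sorted position.
            row.insert(i, (event_time, value))

    def read_partition(self, partition_key):
        return list(self._partitions.get(partition_key, []))


store = WideRowStore()
# Writes arrive out of order on purpose.
for event_time, temp in [
    ("2013-04-03 07:03:00", "73F"),
    ("2013-04-03 07:01:00", "72F"),
    ("2013-04-03 07:04:00", "74F"),
    ("2013-04-03 07:02:00", "73F"),
]:
    store.insert("1234ABCD", event_time, temp)

rows = store.read_partition("1234ABCD")
for event_time, temp in rows:
    print(event_time, temp)
```

Reading the partition back returns the readings in event_time order regardless of arrival order, which is the property the range queries rely on.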
  11. Query patterns • Range queries • “Slice” operation on disk • Single seek on disk

          SELECT event_time,temperature
          FROM temperature
          WHERE weatherstation_id='1234ABCD'
          AND event_time >= '2013-04-03 07:01:00'
          AND event_time <= '2013-04-03 07:04:00';

          [Diagram: a single row for 1234ABCD, with one column per event_time]

          1234ABCD | 2013-04-03 07:01:00 | 72F
                   | 2013-04-03 07:02:00 | 73F
                   | 2013-04-03 07:03:00 | 73F
                   | 2013-04-03 07:04:00 | 74F
                   | 2013-04-03 07:05:00 | 74F
                   | 2013-04-03 07:06:00 | 75F
  12. Query patterns • Range queries • “Slice” operation on disk • Sorted by event_time • Programmers like this

          SELECT event_time,temperature
          FROM temperature
          WHERE weatherstation_id='1234ABCD'
          AND event_time >= '2013-04-03 07:01:00'
          AND event_time <= '2013-04-03 07:04:00';

          weatherstation_id | event_time          | temperature
          ------------------+---------------------+------------
          1234ABCD          | 2013-04-03 07:01:00 | 72F
          1234ABCD          | 2013-04-03 07:02:00 | 73F
          1234ABCD          | 2013-04-03 07:03:00 | 73F
          1234ABCD          | 2013-04-03 07:04:00 | 74F
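A rough Python sketch of why a range query on a sorted partition is a cheap "slice": two binary searches find the boundaries and everything in between is read contiguously. The function name `slice_range` is hypothetical, and both bounds are treated as inclusive here so that all four readings from 07:01 to 07:04 come back.

```python
import bisect

# One partition (weatherstation_id='1234ABCD'), already sorted by event_time.
PARTITION = [
    ("2013-04-03 07:01:00", "72F"),
    ("2013-04-03 07:02:00", "73F"),
    ("2013-04-03 07:03:00", "73F"),
    ("2013-04-03 07:04:00", "74F"),
    ("2013-04-03 07:05:00", "74F"),
    ("2013-04-03 07:06:00", "75F"),
]


def slice_range(rows, start, end):
    """Return the contiguous run of rows with start <= event_time <= end."""
    keys = [event_time for event_time, _ in rows]
    lo = bisect.bisect_left(keys, start)   # first row at or after start
    hi = bisect.bisect_right(keys, end)    # one past the last row at or before end
    return rows[lo:hi]


result = slice_range(PARTITION, "2013-04-03 07:01:00", "2013-04-03 07:04:00")
for event_time, temp in result:
    print(event_time, temp)
```

Because the data is sorted, no row outside the requested window has to be examined beyond locating the two boundaries.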
  13. Ingestion models • Apache Kafka • Apache Flume • Storm • Custom Applications [Diagram: Apache Kafka and "Your totally killer application"]
  14. Dealing with data at speed • 1 million writes per second? • 1 insert every microsecond • Collisions? • Primary Key determines node placement • Random partitioning • Special data type - TimeUUID [Diagram: "Your totally killer application" writing weatherstation_id='1234ABCD' and weatherstation_id='5678EFGH']
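A minimal Python sketch of key-based placement, under loud assumptions: it hashes the partition key (the first component of the primary key, here weatherstation_id) with MD5 and takes the result modulo a fixed node list. Real Cassandra clusters assign token ranges on a ring rather than using a modulo, so this only illustrates that placement depends on the key alone. The node names and the `node_for` function are hypothetical.

```python
import hashlib

# Hypothetical four-node cluster.
NODES = ["node-a", "node-b", "node-c", "node-d"]


def node_for(partition_key, nodes=NODES):
    """Map a partition key to a node by hashing it (toy stand-in for a partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")  # first 8 bytes as an integer token
    return nodes[token % len(nodes)]


for station in ("1234ABCD", "5678EFGH"):
    print(station, "->", node_for(station))
```

Every write for a given weather station hashes to the same node, so all of that station's readings end up in one partition, while different stations spread across the cluster.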
  15. TimeUUID • Timestamp to Microsecond + UUID = TimeUUID • Also known as a Version 1 UUID • Sortable • Reversible • 04d580b0-9412-11e3-baa8-0800200c9a66 = Wednesday, February 12, 2014 6:18:06 PM GMT • http://www.famkruithof.net/uuid/uuidgen
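The "reversible" property can be checked with Python's standard `uuid` module. A version 1 UUID carries a 60-bit timestamp counted in 100-nanosecond intervals since 1582-10-15, so subtracting the offset to the Unix epoch recovers the wall-clock time. The helper name `timeuuid_to_datetime` is hypothetical; the UUID is the one from the slide.

```python
import uuid
from datetime import datetime, timezone

# Number of 100-ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_datetime(value):
    """Recover the UTC timestamp embedded in a version 1 (time-based) UUID."""
    u = uuid.UUID(value)
    if u.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    unix_100ns = u.time - UUID_EPOCH_OFFSET
    seconds, remainder = divmod(unix_100ns, 10_000_000)
    whole = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return whole.replace(microsecond=remainder // 10)


decoded = timeuuid_to_datetime("04d580b0-9412-11e3-baa8-0800200c9a66")
print(decoded.isoformat())  # 2014-02-12 at 18:18:06 UTC, plus a fraction of a second

# Going the other way: uuid1() builds a fresh time-based UUID from the current clock.
fresh = uuid.uuid1()
print(fresh, timeuuid_to_datetime(str(fresh)).isoformat())
```

The decoded value matches the date shown on the slide, and the non-timestamp bits of the UUID are what let two events in the same instant remain distinct.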
  16. Way more information: www.planetcassandra.org • 5 minute interviews • Use cases • Free training!
  17. Thank You! Follow me for more updates all the time: @PatrickMcFadin
