©2013 DataStax Confidential. Do not distribute without consent.
@PatrickMcFadin
Patrick McFadin

Chief Evangelist
Time Series with Apache Cassandra
Quick intro to Cassandra
• Shared nothing
• Masterless peer-to-peer
• Based on Dynamo
Scaling
• Add nodes to scale
• Millions Ops/s
[Chart: throughput (ops/sec) for Cassandra, HBase, Redis, MySQL]
Uptime
• Built to replicate
• Resilient to failure
• Always on
NONE
Easy to use
• CQL is a familiar syntax
• Friendly to programmers
• Paxos for locking
CREATE TABLE users (
username varchar,
firstname varchar,
lastname varchar,
email list<varchar>,
password varchar,
created_date timestamp,
PRIMARY KEY (username)
);
INSERT INTO users (username, firstname, lastname,
email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
['patrick@datastax.com'],'ba27e03fd95e507daf2937c937d499ab',
'2011-06-20 13:50:00');
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
['patrick@datastax.com'],
'ba27e03fd95e507daf2937c937d499ab',
'2011-06-20 13:50:00')
IF NOT EXISTS;
Time series in production
• It’s all about “What’s happening”
• Data is the new currency
Stack Driver
• AWS and Rackspace monitoring
• Quick indexes
• Batch rollup results
MyDrive
• Moved from Mongo to Cassandra
• Queue processing
• Bottleneck was storing the data
“One thing that is not at all obvious
from the graph is that the system was
also under massively heavier strain
after the switch to Cassandra because
of additional bulk processing going on
in the background.”
- Karl Matthias, MyDrive
Paddy Power
• Real-time product and pricing
• Much like stock tickers
• Active-active across two data centers
“Specifically for Cassandra and Datastax, the
ability to process time-series data is something
that lots of companies have done in the past, not
something that we were very aware of, and that was
one of the reasons why we chose this as the first
use case for Cassandra.”
- John Turner, Paddy Power
Internet Of Things
• 15B devices by 2015
• 40B devices by 2020
Why Cassandra for Time Series
Scales
Resilient
Good data model
Efficient Storage Model
What about that?
Example 1: Weather Station
• Weather station collects data
• Cassandra stores in sequence
• Application reads in sequence
Use case
• Store data per weather station
• Store time series in order: first to last
Needed Queries
• Get all data for one weather station
• Get data for a single date and time
• Get data for a range of dates and times
Data Model to support queries
Data Model
• Weather Station Id and Time are unique
• Store as many as needed
CREATE TABLE temperature (
weatherstation_id text,
event_time timestamp,
temperature text,
PRIMARY KEY (weatherstation_id,event_time)
);
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','73F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F');
Storage Model - Logical View
SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';
weatherstation_id  event_time           temperature
1234ABCD           2013-04-03 07:01:00  72F
1234ABCD           2013-04-03 07:02:00  73F
1234ABCD           2013-04-03 07:03:00  73F
1234ABCD           2013-04-03 07:04:00  74F
Storage Model - Disk Layout
SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';
Row Key: 1234ABCD
2013-04-03 07:01:00  72F
2013-04-03 07:02:00  73F
2013-04-03 07:03:00  73F
2013-04-03 07:04:00  74F
2013-04-03 07:05:00  74F
2013-04-03 07:06:00  75F
Merged, Sorted and Stored Sequentially
Query patterns
• Range queries
• “Slice” operation on disk
SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time >= '2013-04-03 07:01:00'
AND event_time <= '2013-04-03 07:04:00';
Row Key: 1234ABCD
2013-04-03 07:01:00  72F
2013-04-03 07:02:00  73F
2013-04-03 07:03:00  73F
2013-04-03 07:04:00  74F
2013-04-03 07:05:00  74F
2013-04-03 07:06:00  75F
Single seek on disk
weatherstation_id  event_time           temperature
1234ABCD           2013-04-03 07:01:00  72F
1234ABCD           2013-04-03 07:02:00  73F
1234ABCD           2013-04-03 07:03:00  73F
1234ABCD           2013-04-03 07:04:00  74F
Programmers like this
Sorted by event_time
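The range query and its "slice" can be sketched in a few lines of Python. This is only an illustration of why a range read over rows kept sorted by event_time is cheap; it is not Cassandra's storage engine, and the names `partition` and `slice_range` are hypothetical.

```python
from bisect import bisect_left, bisect_right

# One partition (weatherstation_id='1234ABCD'), kept sorted by event_time.
# The fixed-width timestamp strings compare correctly as plain strings.
partition = [
    ("2013-04-03 07:01:00", "72F"),
    ("2013-04-03 07:02:00", "73F"),
    ("2013-04-03 07:03:00", "73F"),
    ("2013-04-03 07:04:00", "74F"),
    ("2013-04-03 07:05:00", "74F"),
    ("2013-04-03 07:06:00", "75F"),
]


def slice_range(rows, start, end):
    """Return the contiguous run of rows with start <= event_time <= end."""
    times = [event_time for event_time, _ in rows]
    lo = bisect_left(times, start)   # first row at or after start
    hi = bisect_right(times, end)    # one past the last row at or before end
    return rows[lo:hi]


result = slice_range(partition, "2013-04-03 07:01:00", "2013-04-03 07:04:00")
for event_time, temperature in result:
    print("1234ABCD", event_time, temperature)
```

Because the rows are already in order, the answer is one contiguous run: locate the start, then read forward until the end bound.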
Additional help on the storage engine
SSTable seeks
• Each read minimum 1 seek
• Cache and bloom filter help minimize
Total seek time = Disk Latency * number of seeks
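A quick worked example of the formula, assuming a hypothetical disk latency of 10 ms per seek (the deck gives no figure), shows why skipping SSTables matters:

```python
def total_seek_time_ms(disk_latency_ms, number_of_seeks):
    """Total seek time = disk latency * number of seeks."""
    return disk_latency_ms * number_of_seeks


DISK_LATENCY_MS = 10  # hypothetical value, for illustration only

four_sstables = total_seek_time_ms(DISK_LATENCY_MS, 4)  # read touches 4 SSTables
one_sstable = total_seek_time_ms(DISK_LATENCY_MS, 1)    # filters rule out 3 of them
print(four_sstables, one_sstable)
```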
The key to speed
Use the first part of the primary key to get the node
(data localization)
Minimize seeks for SSTables
(Bloom Filter, Key Cache up-to-date)
Find the data fast in the SSTable
(Indexes)
Min/Max Value Hint
• New since 2.0
• Range index on primary key values per SSTable
• Minimizes seeks on range data
CASSANDRA-5514 if you are interested in details
SELECT event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time >= '2013-04-03 07:01:00'
AND event_time <= '2013-04-03 07:04:00';
SSTable 1 (this one): Row Key 1234ABCD, Min event_time 2013-04-01 00:00:00, Max event_time 2013-04-04 23:59:59
SSTable 2: Row Key 1234ABCD, Min event_time 2013-04-05 00:00:00, Max event_time 2013-04-09 23:59:59
SSTable 3: Row Key 1234ABCD, Min event_time 2013-03-27 00:00:00, Max event_time 2013-03-31 23:59:59
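The idea behind the min/max hint can be sketched as an interval-overlap check. This is a simplified illustration, not the code from CASSANDRA-5514; the SSTable names and the helper `sstables_to_read` are hypothetical, while the min/max values are the ones on the slide.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def ts(text):
    return datetime.strptime(text, FMT)


# Per-SSTable min/max event_time for row key 1234ABCD (values from the slide).
SSTABLES = [
    {"name": "sstable-1", "min": ts("2013-04-01 00:00:00"), "max": ts("2013-04-04 23:59:59")},
    {"name": "sstable-2", "min": ts("2013-04-05 00:00:00"), "max": ts("2013-04-09 23:59:59")},
    {"name": "sstable-3", "min": ts("2013-03-27 00:00:00"), "max": ts("2013-03-31 23:59:59")},
]


def sstables_to_read(sstables, start, end):
    """Keep only SSTables whose [min, max] range overlaps [start, end]."""
    return [s["name"] for s in sstables if s["min"] <= end and s["max"] >= start]


needed = sstables_to_read(SSTABLES, ts("2013-04-03 07:01:00"), ts("2013-04-03 07:04:00"))
print(needed)
```

Only the SSTable whose range covers April 3 has to be read; the other two are skipped without a seek.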
Ingestion models
• Apache Kafka
• Apache Flume
• Storm
• Spark Streaming
• Custom Applications
Apache Kafka
Your totally killer application
Kafka + Storm
• Kafka provides reliable queuing
• Storm processes (rollups, counts)
• Cassandra stores at the same speed
• Storm lookup on Cassandra
[Diagram: Queue (Apache Kafka), Process (Apache Storm), Store (Cassandra)]
Flume
• Source accepts data
• Channel buffers data
• Sink processes and stores
• Popular for log processing
[Diagram: Application, Load Balancer, and Syslog feed the Source; data flows from Source to Channel to Sink]
Dealing with data at speed
• 1 million writes per second?
• 1 insert every microsecond
• Collisions?
• Primary Key determines node placement
• Random partitioning
• Special data type - TimeUUID
Your totally killer application
weatherstation_id='1234ABCD'
weatherstation_id='5678EFGH'
How does data replicate?
Primary key determines placement*
Partitioning
jim age: 36 car: camaro gender: M
carol age: 37 car: subaru gender: F
johnny age: 12 gender: M
suzy age: 10 gender: F
Key Hashing
PK      MD5 Hash
jim     5e02739678...
carol   a9a0198010...
johnny  f4eb27cea7...
suzy    78b421309e...
MD5* hash operation yields a 128-bit number for keys of any size.
The Token Ring
[Diagram: ring of Node A, Node B, Node C, Node D]
jim     5e02739678...
carol   a9a0198010...
johnny  f4eb27cea7...
suzy    78b421309e...
Node  Start            End
A     0xc000000000..1  0x0000000000..0
B     0x0000000000..1  0x4000000000..0
C     0x4000000000..1  0x8000000000..0
D     0x8000000000..1  0xc000000000..0
Replication
[Diagram: ring of Node A, Node B, Node C, Node D; the row "carol a9a0198010..." is copied to additional nodes around the ring]
Replication factor = 3
Consistency is a different topic for later
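Key hashing, token ranges, and replication can be tied together in a short Python sketch. It follows the slide's simplified picture (an unsigned 128-bit MD5 ring split into four equal ranges) rather than any partitioner's exact token math, and it assumes replicas are chosen by walking the ring clockwise from the owning node, ignoring racks and data centers. The names `RING`, `token_for`, `replicas_for_token`, and `replicas_for_key` are hypothetical.

```python
import hashlib
from bisect import bisect_left

# (end token, node), sorted by end token. Each node owns the range that ends
# at its token, as in the Start/End table: A ends at 0x00..0, B at 0x40..0, etc.
RING = [(0x0, "A"), (0x4 << 124, "B"), (0x8 << 124, "C"), (0xC << 124, "D")]
END_TOKENS = [end for end, _ in RING]


def token_for(key):
    """Hash a key of any size to a 128-bit number with MD5."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def replicas_for_token(token, replication_factor=3):
    """Owner of the token plus the next nodes clockwise (assumed placement)."""
    # First range end >= token; past the last end, wrap around to node A.
    first = bisect_left(END_TOKENS, token) % len(RING)
    count = min(replication_factor, len(RING))
    return [RING[(first + i) % len(RING)][1] for i in range(count)]


def replicas_for_key(key, replication_factor=3):
    return replicas_for_token(token_for(key), replication_factor)


# A token beginning a9a0198010 (carol's hash prefix on the slide) falls in
# D's range, so with replication factor 3 the copies land on D, A, and B.
carol_like_token = int("a9a0198010" + "0" * 22, 16)
print(replicas_for_token(carol_like_token))
print(replicas_for_key("jim"))
```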
TimeUUID
• Also known as a Version 1 UUID
• Sortable
• Reversible
Timestamp to Microsecond + UUID = TimeUUID
04d580b0-9412-11e3-baa8-0800200c9a66 = Wednesday, February 12, 2014 6:18:06 PM GMT
http://www.famkruithof.net/uuid/uuidgen
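The "reversible" property can be checked with Python's standard `uuid` module. A version 1 UUID stores its timestamp as a count of 100-nanosecond intervals since 1582-10-15, so subtracting the offset to the Unix epoch recovers the wall-clock time. The helper name `timeuuid_to_datetime` is hypothetical; this is a sketch, not driver code.

```python
import uuid
from datetime import datetime, timedelta, timezone

# 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_datetime(value):
    """Recover the UTC timestamp embedded in a version 1 UUID string."""
    parsed = uuid.UUID(value)
    if parsed.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    microseconds = (parsed.time - UUID_EPOCH_OFFSET) // 10
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=microseconds)


when = timeuuid_to_datetime("04d580b0-9412-11e3-baa8-0800200c9a66")
print(when.strftime("%A, %B %d, %Y %H:%M:%S %Z"))
```

Sorting by the embedded timestamp (`uuid.UUID(...).time`) rather than by the string form is what makes the values time-ordered.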
Example 2: Financial Transactions
• Trading of stocks
• When did they happen?
• Massive speeds and volumes
“Sirca, a non-profit university consortium based in Sydney, is the world’s biggest broker of financial data, ingesting into its database 2 million pieces of information a second from every major trading exchange.”*
* http://www.theage.com.au/it-pro/business-it/help-poverty-theres-an-app-for-that-20140120-hv948.html
Use case
• Store data per symbol and date
• Store time series in reverse order: last to first
• Make sure every transaction is unique
Needed Queries
• Get all trades for symbol and day
• Get trade for a single date and time
• Get last 10 trades for symbol and date
Data Model to support queries
Data Model
• date is int of days since epoch
• timeuuid keeps it unique
• Reverse the times for later queries
CREATE TABLE stock_ticks (
symbol text,
date int,
trade timeuuid,
trade_details text,
PRIMARY KEY ((symbol, date), trade)
) WITH CLUSTERING ORDER BY (trade DESC);
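The `date` column is an int of days since epoch. A minimal Python sketch of that conversion, assuming "epoch" means the Unix epoch (1970-01-01); the helper names `day_bucket` and `bucket_to_date` are hypothetical:

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)  # assumption: the Unix epoch


def day_bucket(day):
    """Days since epoch, used as the `date` part of the partition key."""
    return (day - EPOCH).days


def bucket_to_date(bucket):
    """Inverse of day_bucket."""
    return EPOCH + timedelta(days=bucket)


print(day_bucket(date(2014, 2, 12)))
print(bucket_to_date(340))
```

Putting the day in the partition key together with the symbol keeps each partition to one day's trades for one symbol.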
INSERT INTO stock_ticks(symbol, date, trade, trade_details)
VALUES ('NFLX',340,04d580b0-1431-1e33-baf8-0833200c98a6,'BUY:2000');
INSERT INTO stock_ticks(symbol, date, trade, trade_details)
VALUES ('NFLX',340,05d580b0-6472-1ef3-a3a8-0430200c9a66,'BUY:300');
INSERT INTO stock_ticks(symbol, date, trade, trade_details)
VALUES ('NFLX',340,02d580b0-9412-d223-55a8-0976200c9a25,'SELL:450');
INSERT INTO stock_ticks(symbol, date, trade, trade_details)
VALUES ('NFLX',340,08d580b0-4482-11e3-5fd3-3421200c9a65,'SELL:3000');
Storage Model - Logical View
SELECT trade,trade_details
FROM stock_ticks
WHERE symbol='NFLX' AND date=340;
symbol:date  trade                                 trade_details
NFLX:340     08d580b0-4482-11e3-5fd3-3421200c9a65  SELL:3000
NFLX:340     02d580b0-9412-d223-55a8-0976200c9a25  SELL:450
NFLX:340     05d580b0-6472-1ef3-a3a8-0430200c9a66  BUY:300
NFLX:340     04d580b0-1431-1e33-baf8-0833200c98a6  BUY:2000
Storage Model - Disk Layout
SELECT trade,trade_details
FROM stock_ticks
WHERE symbol='NFLX' AND date=340;
Row Key: NFLX:340
08d580b0-4482-11e3-5fd3-3421200c9a65  SELL:3000  (last thing inserted)
02d580b0-9412-d223-55a8-0976200c9a25  SELL:450
05d580b0-6472-1ef3-a3a8-0430200c9a66  BUY:300
04d580b0-1431-1e33-baf8-0833200c98a6  BUY:2000  (first thing inserted)
Order is from last trade to first
Query patterns
• Limit queries
• Get last X trades
SELECT trade,trade_details
FROM stock_ticks
WHERE symbol='NFLX' AND date=340
LIMIT 3;
Row Key: NFLX:340
08d580b0-4482-11e3-5fd3-3421200c9a65  SELL:3000  (from here)
02d580b0-9412-d223-55a8-0976200c9a25  SELL:450
05d580b0-6472-1ef3-a3a8-0430200c9a66  BUY:300  (to here)
04d580b0-1431-1e33-baf8-0833200c98a6  BUY:2000
Last 3 trades
symbol:date  trade                                 trade_details
NFLX:340     08d580b0-4482-11e3-5fd3-3421200c9a65  SELL:3000
NFLX:340     02d580b0-9412-d223-55a8-0976200c9a25  SELL:450
NFLX:340     05d580b0-6472-1ef3-a3a8-0430200c9a66  BUY:300
Reverse sorted by trade
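Why reverse clustering order makes "last X trades" cheap can be sketched in Python: if the partition is kept newest-first, a LIMIT is just a read of the first rows. The trade times 1 to 4 below are hypothetical stand-ins for the timeuuid values (in the slide's insertion order), and `insert_trade` and `last_trades` are hypothetical helper names.

```python
from bisect import insort

# One partition (symbol='NFLX', date=340). Rows are stored as
# (-trade_time, trade_details) so ascending order means newest first.
partition = []


def insert_trade(trade_time, trade_details):
    """Insert while keeping the partition sorted newest-first."""
    insort(partition, (-trade_time, trade_details))


def last_trades(limit):
    """Equivalent of LIMIT: read the first `limit` rows, no sorting needed."""
    return [details for _, details in partition[:limit]]


for trade_time, details in [(1, "BUY:2000"), (2, "BUY:300"), (3, "SELL:450"), (4, "SELL:3000")]:
    insert_trade(trade_time, details)

print(last_trades(3))
```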
Way more examples
• 5 minute interviews
• Use cases
• Free training!
www.planetcassandra.org
Thank You!
Follow me for more updates all the time: @PatrickMcFadin

More Related Content

What's hot

Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationshadooparchbook
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingVassilis Bekiaris
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra nehabsairam
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinDataStax Academy
 
Apache Calcite overview
Apache Calcite overviewApache Calcite overview
Apache Calcite overviewJulian Hyde
 
MySQL Index Cookbook
MySQL Index CookbookMySQL Index Cookbook
MySQL Index CookbookMYXPLAIN
 
Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022
Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022
Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022InfluxData
 
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...DataStax
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkDongwon Kim
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsDatabricks
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016DataStax
 
How the Postgres Query Optimizer Works
How the Postgres Query Optimizer WorksHow the Postgres Query Optimizer Works
How the Postgres Query Optimizer WorksEDB
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Databricks
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowDataWorks Summit
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101Data Con LA
 
[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우
[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우
[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우PgDay.Seoul
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache CalciteJulian Hyde
 

What's hot (20)

Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series Modeling
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
 
Apache Calcite overview
Apache Calcite overviewApache Calcite overview
Apache Calcite overview
 
MySQL Index Cookbook
MySQL Index CookbookMySQL Index Cookbook
MySQL Index Cookbook
 
Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022
Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022
Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022
 
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmark
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
 
How the Postgres Query Optimizer Works
How the Postgres Query Optimizer WorksHow the Postgres Query Optimizer Works
How the Postgres Query Optimizer Works
 
Cassandra Database
Cassandra DatabaseCassandra Database
Cassandra Database
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우
[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우
[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
 

Similar to Time series with Apache Cassandra - Long version

Time series with apache cassandra strata
Time series with apache cassandra   strataTime series with apache cassandra   strata
Time series with apache cassandra strataPatrick McFadin
 
Cassandra Day SV 2014: Fundamentals of Apache Cassandra Data Modeling
Cassandra Day SV 2014: Fundamentals of Apache Cassandra Data ModelingCassandra Day SV 2014: Fundamentals of Apache Cassandra Data Modeling
Cassandra Day SV 2014: Fundamentals of Apache Cassandra Data ModelingDataStax Academy
 
Data Science Lab Meetup: Cassandra and Spark
Data Science Lab Meetup: Cassandra and SparkData Science Lab Meetup: Cassandra and Spark
Data Science Lab Meetup: Cassandra and SparkChristopher Batey
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014Patrick McFadin
 
Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...
Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...
Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...DataStax Academy
 
Nike Tech Talk: Double Down on Apache Cassandra and Spark
Nike Tech Talk:  Double Down on Apache Cassandra and SparkNike Tech Talk:  Double Down on Apache Cassandra and Spark
Nike Tech Talk: Double Down on Apache Cassandra and SparkPatrick McFadin
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015Patrick McFadin
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataPatrick McFadin
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinChristian Johannsen
 
Getting started with Cassandra 2.1
Getting started with Cassandra 2.1Getting started with Cassandra 2.1
Getting started with Cassandra 2.1Viswanath J
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyDataStax Academy
 
Time Series Analysis
Time Series AnalysisTime Series Analysis
Time Series AnalysisQAware GmbH
 
Time Series Processing with Solr and Spark
Time Series Processing with Solr and SparkTime Series Processing with Solr and Spark
Time Series Processing with Solr and SparkJosef Adersberger
 
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...Lucidworks
 
Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra Matthias Niehoff
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraPatrick McFadin
 
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...it-people
 
C* Summit EU 2013: Keynote by Jonathan Ellis — Cassandra 2.0 & 2.1
C* Summit EU 2013: Keynote by Jonathan Ellis — Cassandra 2.0 & 2.1C* Summit EU 2013: Keynote by Jonathan Ellis — Cassandra 2.0 & 2.1
C* Summit EU 2013: Keynote by Jonathan Ellis — Cassandra 2.0 & 2.1DataStax Academy
 
Cassandra Summit EU 2013
Cassandra Summit EU 2013Cassandra Summit EU 2013
Cassandra Summit EU 2013jbellis
 

Similar to Time series with Apache Cassandra - Long version (20)

Time series with apache cassandra strata
Time series with apache cassandra   strataTime series with apache cassandra   strata
Time series with apache cassandra strata
 
Cassandra Day SV 2014: Fundamentals of Apache Cassandra Data Modeling
Cassandra Day SV 2014: Fundamentals of Apache Cassandra Data ModelingCassandra Day SV 2014: Fundamentals of Apache Cassandra Data Modeling
Cassandra Day SV 2014: Fundamentals of Apache Cassandra Data Modeling
 
Data Science Lab Meetup: Cassandra and Spark
Data Science Lab Meetup: Cassandra and SparkData Science Lab Meetup: Cassandra and Spark
Data Science Lab Meetup: Cassandra and Spark
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014
 
Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...
Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...
Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...
 
Nike Tech Talk: Double Down on Apache Cassandra and Spark
Nike Tech Talk:  Double Down on Apache Cassandra and SparkNike Tech Talk:  Double Down on Apache Cassandra and Spark
Nike Tech Talk: Double Down on Apache Cassandra and Spark
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series data
 
1 Dundee - Cassandra 101
1 Dundee - Cassandra 1011 Dundee - Cassandra 101
1 Dundee - Cassandra 101
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
 
Getting started with Cassandra 2.1
Getting started with Cassandra 2.1Getting started with Cassandra 2.1
Getting started with Cassandra 2.1
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al Tobey
 
Time Series Analysis
Time Series AnalysisTime Series Analysis
Time Series Analysis
 
Time Series Processing with Solr and Spark
Time Series Processing with Solr and SparkTime Series Processing with Solr and Spark
Time Series Processing with Solr and Spark
 
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
 
Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
 
C* Summit EU 2013: Keynote by Jonathan Ellis — Cassandra 2.0 & 2.1
C* Summit EU 2013: Keynote by Jonathan Ellis — Cassandra 2.0 & 2.1C* Summit EU 2013: Keynote by Jonathan Ellis — Cassandra 2.0 & 2.1
C* Summit EU 2013: Keynote by Jonathan Ellis — Cassandra 2.0 & 2.1
 
Cassandra Summit EU 2013
Cassandra Summit EU 2013Cassandra Summit EU 2013
Cassandra Summit EU 2013
 

More from Patrick McFadin

Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast DataPatrick McFadin
 
Open source or proprietary, choose wisely!
Open source or proprietary,  choose wisely!Open source or proprietary,  choose wisely!
Open source or proprietary, choose wisely!Patrick McFadin
 
An Introduction to time series with Team Apache
An Introduction to time series with Team ApacheAn Introduction to time series with Team Apache
An Introduction to time series with Team ApachePatrick McFadin
 
Laying down the smack on your data pipelines
Laying down the smack on your data pipelinesLaying down the smack on your data pipelines
Laying down the smack on your data pipelinesPatrick McFadin
 
Help! I want to contribute to an Open Source project but my boss says no.
Help! I want to contribute to an Open Source project but my boss says no.Help! I want to contribute to an Open Source project but my boss says no.
Help! I want to contribute to an Open Source project but my boss says no.Patrick McFadin
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterprisePatrick McFadin
 
Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced previewPatrick McFadin
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandraPatrick McFadin
 
Introduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraIntroduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraPatrick McFadin
 
Apache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fireApache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the firePatrick McFadin
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valleyPatrick McFadin
 
Making money with open source and not losing your soul: A practical guide
Making money with open source and not losing your soul: A practical guideMaking money with open source and not losing your soul: A practical guide
Making money with open source and not losing your soul: A practical guidePatrick McFadin
 
Cassandra EU - Data model on fire
Cassandra EU - Data model on fireCassandra EU - Data model on fire
Cassandra EU - Data model on firePatrick McFadin
 
Cassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesCassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesPatrick McFadin
 
Cassandra 2.0 better, faster, stronger
Cassandra 2.0   better, faster, strongerCassandra 2.0   better, faster, stronger
Cassandra 2.0 better, faster, strongerPatrick McFadin
 
Building Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache CassandraBuilding Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache CassandraPatrick McFadin
 
The world's next top data model
The world's next top data modelThe world's next top data model
The world's next top data modelPatrick McFadin
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data modelPatrick McFadin
 

Time series with Apache Cassandra - Long version

  • 1. Time Series with Apache Cassandra • Patrick McFadin (@PatrickMcFadin), Chief Evangelist • ©2013 DataStax Confidential. Do not distribute without consent.
  • 2. Quick intro to Cassandra • Shared nothing • Masterless peer-to-peer • Based on Dynamo
  • 3. Scaling • Add nodes to scale • Millions Ops/s • [Chart: throughput (ops/sec) for Cassandra, HBase, Redis, and MySQL]
  • 4. Uptime • Built to replicate • Resilient to failure • Always on
  • 5. Easy to use • CQL is a familiar syntax • Friendly to programmers • Paxos for locking

        CREATE TABLE users (
          username varchar,
          firstname varchar,
          lastname varchar,
          email list<varchar>,
          password varchar,
          created_date timestamp,
          PRIMARY KEY (username)
        );

        INSERT INTO users (username, firstname, lastname,
          email, password, created_date)
        VALUES ('pmcfadin','Patrick','McFadin',
          ['patrick@datastax.com'],'ba27e03fd95e507daf2937c937d499ab',
          '2011-06-20 13:50:00');

        INSERT INTO users (username, firstname,
          lastname, email, password, created_date)
        VALUES ('pmcfadin','Patrick','McFadin',
          ['patrick@datastax.com'],
          'ba27e03fd95e507daf2937c937d499ab',
          '2011-06-20 13:50:00')
        IF NOT EXISTS;
  • 6. Time series in production • It’s all about “What’s happening” • Data is the new currency
  • 7. Stack Driver • AWS and Rackspace monitoring • Quick indexes • Batch rollup results
  • 8. MyDrive • Moved from Mongo to Cassandra • Queue processing • Bound at the storing data
        “One thing that is not at all obvious from the graph is that the system was also under massively heavier strain after the switch to Cassandra because of additional bulk processing going on in the background.” - Karl Matthias, MyDrive
  • 9. Paddy Power • Real-time product and pricing • Much like stock tickers • Active-active across two data centers
        “Specifically for Cassandra and Datastax, the ability to process time-series data is something that lots of companies have done in the past, not something that we were very aware of, and that was one of the reasons why we chose this as the first use case for Cassandra.” - John Turner, Paddy Power
  • 10. Internet Of Things • 15B devices by 2015 • 40B devices by 2020!
  • 11. Why Cassandra for Time Series • Scales • Resilient • Good data model • Efficient Storage Model • What about that?
  • 12. Example 1: Weather Station • Weather station collects data • Cassandra stores in sequence • Application reads in sequence
  • 13. Use case
        Data model to support queries:
          • Store data per weather station
          • Store time series in order: first to last
        Needed queries:
          • Get all data for one weather station
          • Get data for a single date and time
          • Get data for a range of dates and times
  • 14. Data Model • Weather Station Id and Time are unique • Store as many as needed

        CREATE TABLE temperature (
          weatherstation_id text,
          event_time timestamp,
          temperature text,
          PRIMARY KEY (weatherstation_id,event_time)
        );

        INSERT INTO temperature(weatherstation_id,event_time,temperature)
        VALUES ('1234ABCD','2013-04-03 07:01:00','72F');

        INSERT INTO temperature(weatherstation_id,event_time,temperature)
        VALUES ('1234ABCD','2013-04-03 07:02:00','73F');

        INSERT INTO temperature(weatherstation_id,event_time,temperature)
        VALUES ('1234ABCD','2013-04-03 07:03:00','73F');

        INSERT INTO temperature(weatherstation_id,event_time,temperature)
        VALUES ('1234ABCD','2013-04-03 07:04:00','74F');
  • 15. Storage Model - Logical View

        SELECT weatherstation_id,event_time,temperature
        FROM temperature
        WHERE weatherstation_id='1234ABCD';

        weatherstation_id  event_time           temperature
        1234ABCD           2013-04-03 07:01:00  72F
        1234ABCD           2013-04-03 07:02:00  73F
        1234ABCD           2013-04-03 07:03:00  73F
        1234ABCD           2013-04-03 07:04:00  74F
  • 16. Storage Model - Disk Layout • Merged, Sorted and Stored Sequentially

        SELECT weatherstation_id,event_time,temperature
        FROM temperature
        WHERE weatherstation_id='1234ABCD';

        Row key 1234ABCD, columns in stored order:
        2013-04-03 07:01:00  72F
        2013-04-03 07:02:00  73F
        2013-04-03 07:03:00  73F
        2013-04-03 07:04:00  74F
        2013-04-03 07:05:00  74F
        2013-04-03 07:06:00  75F
  • 17. Query patterns • Range queries • “Slice” operation on disk • Single seek on disk

        SELECT weatherstation_id,event_time,temperature
        FROM temperature
        WHERE weatherstation_id='1234ABCD'
        AND event_time >= '2013-04-03 07:01:00'
        AND event_time <= '2013-04-03 07:04:00';

        Row key 1234ABCD, columns in stored order (the slice covers 07:01:00 through 07:04:00):
        2013-04-03 07:01:00  72F
        2013-04-03 07:02:00  73F
        2013-04-03 07:03:00  73F
        2013-04-03 07:04:00  74F
        2013-04-03 07:05:00  74F
        2013-04-03 07:06:00  75F
  • 18. Query patterns • Range queries • “Slice” operation on disk • Programmers like this • Sorted by event_time

        SELECT weatherstation_id,event_time,temperature
        FROM temperature
        WHERE weatherstation_id='1234ABCD'
        AND event_time >= '2013-04-03 07:01:00'
        AND event_time <= '2013-04-03 07:04:00';

        weatherstation_id  event_time           temperature
        1234ABCD           2013-04-03 07:01:00  72F
        1234ABCD           2013-04-03 07:02:00  73F
        1234ABCD           2013-04-03 07:03:00  73F
        1234ABCD           2013-04-03 07:04:00  74F
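
The slice idea can be sketched outside Cassandra. The Python below is only an illustration, not Cassandra's storage engine: it assumes a partition is a list of rows already sorted by event_time and uses binary search to find both ends of the range, so the matching rows come back as one contiguous run. The names `partition` and `slice_rows` are made up for this sketch.

```python
from bisect import bisect_left, bisect_right

# Hypothetical in-memory stand-in for one partition: rows for station
# '1234ABCD', kept sorted by event_time (the clustering column).
partition = [
    ("2013-04-03 07:01:00", "72F"),
    ("2013-04-03 07:02:00", "73F"),
    ("2013-04-03 07:03:00", "73F"),
    ("2013-04-03 07:04:00", "74F"),
    ("2013-04-03 07:05:00", "74F"),
    ("2013-04-03 07:06:00", "75F"),
]


def slice_rows(rows, start, end):
    """Return rows with start <= event_time <= end (both bounds inclusive)."""
    times = [event_time for event_time, _ in rows]
    lo = bisect_left(times, start)   # index of the first row at or after start
    hi = bisect_right(times, end)    # one past the last row at or before end
    return rows[lo:hi]


result = slice_rows(partition, "2013-04-03 07:01:00", "2013-04-03 07:04:00")
for event_time, temperature in result:
    print(event_time, temperature)
```

Because the rows are contiguous and ordered, the range read is a single slice rather than a scan with a filter, which is the property the slide calls a single seek.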
  • 19. Additional help on the storage engine
  • 20. SSTable seeks • Each read minimum 1 seek • Cache and bloom filter help minimize Total seek time = Disk Latency * number of seeks
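
As a rough illustration of the formula on this slide, the Python below multiplies a per-seek latency by a seek count. The 10 ms latency and the seek counts are made-up figures for the example, not measurements from the talk.

```python
def total_seek_time_ms(disk_latency_ms: float, seeks: int) -> float:
    """Total seek time = disk latency * number of seeks."""
    return disk_latency_ms * seeks


# Assumed figures: 10 ms per seek, reading from 1 SSTable versus 3.
one_sstable = total_seek_time_ms(10.0, 1)
three_sstables = total_seek_time_ms(10.0, 3)
print(one_sstable, three_sstables)
```

The cost grows linearly with the number of SSTables touched, which is why the cache and bloom filter matter.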
  • 21. The key to speed • Use the first part of the primary key to get the node (data localization) • Minimize seeks for SSTables (Bloom Filter, Key Cache up-to-date) • Find the data fast in the SSTable (Indexes)
  • 22. Min/Max Value Hint • New since 2.0 • Range index on primary key values per SSTable • Minimizes seeks on range data • CASSANDRA-5514 if you are interested in details

        SELECT event_time,temperature
        FROM temperature
        WHERE weatherstation_id='1234ABCD'
        AND event_time >= '2013-04-03 07:01:00'
        AND event_time <= '2013-04-03 07:04:00';

        Row Key   Min event_time       Max event_time
        1234ABCD  2013-04-01 00:00:00  2013-04-04 23:59:59   <- This one
        1234ABCD  2013-04-05 00:00:00  2013-04-09 23:59:59
        1234ABCD  2013-03-27 00:00:00  2013-03-31 23:59:59
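
A minimal sketch of how such a hint can prune SSTables, assuming each SSTable exposes the minimum and maximum event_time it holds for the row key. This is not Cassandra's implementation (see CASSANDRA-5514 for that); the class name, function name, and SSTable names below are hypothetical, while the time ranges are the three from the slide.

```python
from dataclasses import dataclass


@dataclass
class SSTableHint:
    name: str             # hypothetical label for the SSTable
    min_event_time: str   # smallest event_time stored for the row key
    max_event_time: str   # largest event_time stored for the row key


def candidate_sstables(hints, start, end):
    """Keep only SSTables whose [min, max] range overlaps [start, end]."""
    return [
        h.name
        for h in hints
        if h.min_event_time <= end and h.max_event_time >= start
    ]


hints = [
    SSTableHint("sstable-1", "2013-04-01 00:00:00", "2013-04-04 23:59:59"),
    SSTableHint("sstable-2", "2013-04-05 00:00:00", "2013-04-09 23:59:59"),
    SSTableHint("sstable-3", "2013-03-27 00:00:00", "2013-03-31 23:59:59"),
]

to_read = candidate_sstables(hints, "2013-04-03 07:01:00", "2013-04-03 07:04:00")
print(to_read)
```

Only the SSTable covering April 1 through April 4 overlaps the queried range, so the other two can be skipped without a seek.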
  • 23. Ingestion models • Apache Kafka • Apache Flume • Storm • Spark Streaming • Custom Applications • [Diagram labels: Apache Kafka; Your totally killer application]
  • 24. Kafka + Storm • Kafka provides reliable queuing • Storm processes (rollups, counts) • Cassandra stores at the same speed • Storm lookup on Cassandra • [Diagram labels: Apache Kafka, Apache Storm; Queue, Process, Store]
  • 25. Flume • Source accepts data • Channel buffers data • Sink processes and stores • Popular for log processing • [Diagram labels: Application, Load Balancer, Syslog; Source, Channel, Sink]
  • 26. Dealing with data at speed • 1 million writes per second? • 1 insert every microsecond • Collisions? • Primary Key determines node placement • Random partitioning • Special data type - TimeUUID • [Diagram labels: Your totally killer application; weatherstation_id='1234ABCD'; weatherstation_id='5678EFGH']
  • 27. How does data replicate?
  • 28. Partitioning • Primary key determines placement*

        key     age  car     gender
        jim     36   camaro  M
        carol   37   subaru  F
        johnny  12           M
        suzy    10           F
  • 30. Node A Node D Node C Node B The Token Ring
  • 31-35. (these five slides carry the same text)

        key     hash
        jim     5e02739678...
        carol   a9a0198010...
        johnny  f4eb27cea7...
        suzy    78b421309e...

        node  Start            End
        A     0xc000000000..1  0x0000000000..0
        B     0x0000000000..1  0x4000000000..0
        C     0x4000000000..1  0x8000000000..0
        D     0x8000000000..1  0xc000000000..0
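
The lookup implied by these two tables can be sketched in Python. This is an illustration under stated assumptions, not Cassandra's partitioner: the 10-hex-digit prefixes from the slide stand in for full tokens, each node owns the range ending at its End value (inclusive), and anything above the highest End wraps around to node A. The names `RANGE_ENDS`, `OWNERS`, and `owner_of` are invented for the sketch.

```python
from bisect import bisect_left

# Inclusive end token of each node's range, in ascending order, using the
# truncated prefixes from the slide as stand-ins for full tokens.
RANGE_ENDS = [0x0000000000, 0x4000000000, 0x8000000000, 0xC000000000]
OWNERS = ["A", "B", "C", "D"]


def owner_of(token: int) -> str:
    """Return the node whose (start, end] range contains the token."""
    idx = bisect_left(RANGE_ENDS, token)
    # Tokens above the highest end wrap around into node A's range.
    return OWNERS[idx % len(OWNERS)]


HASHES = {
    "jim": 0x5E02739678,
    "carol": 0xA9A0198010,
    "johnny": 0xF4EB27CEA7,
    "suzy": 0x78B421309E,
}

placement = {name: owner_of(token) for name, token in HASHES.items()}
print(placement)
```

With these ranges, carol's hash falls between 0x80... and 0xc0..., which is node D's range.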
  • 36-38. Replication • Ring of Node A, Node B, Node C, Node D • carol a9a0198010... • Replication factor = 3 • Consistency is a different topic for later
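
One way to picture replication factor = 3 is to find the node that owns the token and then keep walking the ring. The sketch below assumes that simple clockwise placement and the token ranges from the earlier slides; real placement depends on the configured replication strategy, and the names used here are hypothetical.

```python
from bisect import bisect_left

# Assumed ring, taken from the token range table: inclusive range ends in
# ascending order and the node that owns each range.
RANGE_ENDS = [0x0000000000, 0x4000000000, 0x8000000000, 0xC000000000]
RING = ["A", "B", "C", "D"]


def replicas(token: int, replication_factor: int = 3) -> list:
    """Owner of the token plus the next nodes clockwise on the ring."""
    first = bisect_left(RANGE_ENDS, token) % len(RING)
    count = min(replication_factor, len(RING))  # cannot exceed the node count
    return [RING[(first + i) % len(RING)] for i in range(count)]


carol_replicas = replicas(0xA9A0198010, replication_factor=3)
print(carol_replicas)
```

Under these assumptions carol lands on node D first and is then copied to the next two nodes around the ring.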
  • 39. TimeUUID • Also known as a Version 1 UUID • Sortable • Reversible • Timestamp to Microsecond + UUID = TimeUUID
        04d580b0-9412-11e3-baa8-0800200c9a66 = Wednesday, February 12, 2014 6:18:06 PM GMT
        http://www.famkruithof.net/uuid/uuidgen
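
The "reversible" property can be checked with Python's standard uuid module. A version 1 UUID stores a count of 100-nanosecond intervals since 1582-10-15; subtracting the offset to the Unix epoch recovers the wall-clock time. The helper name `timeuuid_to_datetime` is made up for this sketch; the UUID is the one on the slide.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Number of 100 ns intervals between 1582-10-15 (the UUID epoch) and
# 1970-01-01 (the Unix epoch).
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000


def timeuuid_to_datetime(value: uuid.UUID) -> datetime:
    """Recover the UTC timestamp embedded in a version 1 UUID."""
    if value.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    unix_micros = (value.time - GREGORIAN_TO_UNIX_100NS) // 10
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=unix_micros)


slide_uuid = uuid.UUID("04d580b0-9412-11e3-baa8-0800200c9a66")
decoded = timeuuid_to_datetime(slide_uuid)
print(decoded.isoformat())
```

The decoded value falls on February 12, 2014 at 18:18:06 GMT, matching the date shown on the slide.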
  • 40. Example 2: Financial Transactions • Trading of stocks • When did they happen? • Massive speeds and volumes
        “Sirca, a non-profit university consortium based in Sydney, is the world’s biggest broker of financial data, ingesting into its database 2 million pieces of information a second from every major trading exchange.”*
        * http://www.theage.com.au/it-pro/business-it/help-poverty-theres-an-app-for-that-20140120-hv948.html
  • 41. Use case
        Data model to support queries:
          • Store data per symbol and date
          • Store time series in reverse order: last to first
          • Make sure every transaction is unique
        Needed queries:
          • Get all trades for symbol and day
          • Get trade for a single date and time
          • Get last 10 trades for symbol and date
  • 42. Data Model • date is int of days since epoch • timeuuid keeps it unique • Reverse the times for later queries

        CREATE TABLE stock_ticks (
          symbol text,
          date int,
          trade timeuuid,
          trade_details text,
          PRIMARY KEY ((symbol, date), trade)
        ) WITH CLUSTERING ORDER BY (trade DESC);

        INSERT INTO stock_ticks(symbol, date, trade, trade_details)
        VALUES ('NFLX',340,04d580b0-1431-1e33-baf8-0833200c98a6,'BUY:2000');

        INSERT INTO stock_ticks(symbol, date, trade, trade_details)
        VALUES ('NFLX',340,05d580b0-6472-1ef3-a3a8-0430200c9a66,'BUY:300');

        INSERT INTO stock_ticks(symbol, date, trade, trade_details)
        VALUES ('NFLX',340,02d580b0-9412-d223-55a8-0976200c9a25,'SELL:450');

        INSERT INTO stock_ticks(symbol, date, trade, trade_details)
        VALUES ('NFLX',340,08d580b0-4482-11e3-5fd3-3421200c9a65,'SELL:3000');
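
A small sketch of how an application might build the (symbol, date) partition key, assuming "epoch" means the Unix epoch of 1970-01-01. The slide does not say which epoch is meant, and the helper names below are hypothetical.

```python
from datetime import date

# Assumption: the epoch is the Unix epoch.
EPOCH = date(1970, 1, 1)


def days_since_epoch(day: date) -> int:
    """Whole days between the epoch and the given calendar day."""
    return (day - EPOCH).days


def partition_key(symbol: str, day: date) -> tuple:
    """Composite partition key: one partition per symbol per day."""
    return (symbol, days_since_epoch(day))


key = partition_key("NFLX", date(2013, 4, 3))
print(key)
```

Bucketing by day keeps any one partition from growing without bound, while all trades for a symbol on a given day stay together on disk.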
  • 43. Storage Model - Logical View

        SELECT trade,trade_details
        FROM stock_ticks
        WHERE symbol='NFLX' AND date=340;

        symbol:date  trade                                 trade_details
        NFLX:340     08d580b0-4482-11e3-5fd3-3421200c9a65  SELL:3000      (last thing inserted)
        NFLX:340     02d580b0-9412-d223-55a8-0976200c9a25  SELL:450
        NFLX:340     05d580b0-6472-1ef3-a3a8-0430200c9a66  BUY:300
        NFLX:340     04d580b0-1431-1e33-baf8-0833200c98a6  BUY:2000       (first thing inserted)
  • 44. Storage Model - Disk Layout • Order is from last trade to first

        SELECT trade,trade_details
        FROM stock_ticks
        WHERE symbol='NFLX' AND date=340;

        Row key NFLX:340, columns in stored order:
        08d580b0-4482-11e3-5fd3-3421200c9a65  SELL:3000
        02d580b0-9412-d223-55a8-0976200c9a25  SELL:450
        05d580b0-6472-1ef3-a3a8-0430200c9a66  BUY:300
        04d580b0-1431-1e33-baf8-0833200c98a6  BUY:2000
  • 45. Query patterns • Limit queries • Get last X trades

        SELECT trade,trade_details
        FROM stock_ticks
        WHERE symbol='NFLX' AND date=340
        LIMIT 3;

        Row key NFLX:340, columns in stored order:
        08d580b0-4482-11e3-5fd3-3421200c9a65  SELL:3000   <- From here
        02d580b0-9412-d223-55a8-0976200c9a25  SELL:450
        05d580b0-6472-1ef3-a3a8-0430200c9a66  BUY:300     <- to here
        04d580b0-1431-1e33-baf8-0833200c98a6  BUY:2000
  • 46. Query patterns • Limit queries • Get last X trades • Reverse sorted by trade • Last 3 trades

        SELECT trade,trade_details
        FROM stock_ticks
        WHERE symbol='NFLX' AND date=340
        LIMIT 3;

        symbol:date  trade                                 trade_details
        NFLX:340     08d580b0-4482-11e3-5fd3-3421200c9a65  SELL:3000
        NFLX:340     02d580b0-9412-d223-55a8-0976200c9a25  SELL:450
        NFLX:340     05d580b0-6472-1ef3-a3a8-0430200c9a66  BUY:300
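
Why the descending clustering order makes "last X trades" cheap can be shown with a small Python stand-in. This is a sketch, not Cassandra's storage engine: the class `TradePartition` is hypothetical, and an integer sequence number stands in for the timestamp embedded in each timeuuid. Rows are kept newest first, so a LIMIT is simply the first few rows.

```python
from bisect import insort


class TradePartition:
    """Hypothetical stand-in for one (symbol, date) partition."""

    def __init__(self):
        # Sorted ascending on the negated trade time, which is newest first.
        self._rows = []

    def insert(self, trade_time: int, details: str) -> None:
        insort(self._rows, (-trade_time, details))

    def select(self, limit=None):
        """Return trade details in stored order, optionally limited."""
        return [details for _, details in self._rows[:limit]]


nflx = TradePartition()
# Same insertion order as the slide; the integers are made-up trade times.
nflx.insert(1, "BUY:2000")
nflx.insert(2, "BUY:300")
nflx.insert(3, "SELL:450")
nflx.insert(4, "SELL:3000")

last_three = nflx.select(limit=3)
print(last_three)
```

No sorting happens at read time: the order was fixed on write, so the query reads from the head of the partition and stops after three rows.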
  • 47. Way more examples • 5 minute interviews • Use cases • Free training! • www.planetcassandra.org
  • 49. Thank You! Follow me for more updates all the time: @PatrickMcFadin