SlideShare a Scribd company logo
Terror & Hysteria: Cost Effective
Scaling of Time Series Data
with Cassandra
Sam Bisbee, Threat Stack CTO
Typical [time series] problems on C*
● Disk utilization creates a scaling pattern of lighting money on
fire
– Only works for a month or two, even with 90% disk utilization
● Every write up we found focused on schema design for
tracking integers across time
– There are days we wish we only tracked integers
● Data drastically loses value over time, but C*'s design
doesn't acknowledge this
– TTLs only address 0 value states, not partial value
– Ex., 99% of reads are for data in its first day
● Not all sensors are equal
Categories of Time Series Data
Volume of Tx's
Size of Tx's
CRUD, Web 2.0
System Monitoring
(CPU, etc.)
System Monitoring
(CPU, etc.)
Traditional object store
Threat Stack
Categories of Time Series Data
Volume of Tx's
Size of Tx's
CRUD, Web 2.0
System Monitoring
(CPU, etc.)
System Monitoring
(CPU, etc.)
Traditional object store
Threat Stack
Traditional time
series on C*, what
everyone writes about
“We're going to need
a bigger boat. Or disks.”
We care about this thing called margins
(see: we're in Boston, not the Valley)
Data at Threat Stack
● 5 to 10TBs per day of raw data
– Crossed several TB per day in first few months of production with ~4 people
● 80,000 to 150,000 Tx per second, analyzed in real time
– Internal goal of analyzing, persisting, and firing alerts in <1s
● 90% write to 10% read tx
● Pre-compute query results for 70% of queries for UI
– Optimized lookup tables & complex data structures, not just “query & cache”
● 100% AWS, distrust of remote storage in our DNA
– This is not just EBS bashing. This applies to all databases on all platforms,
even a cage in a data center.
● By the way, we're on DSE 4.8.4 (C* 2.1)
Generic data model
● Entire platform assumes that events form a partially ordered, eventually
consistent, write ahead log
– A wonderful C* use case, so long as you only INSERT
● UPDATE is a dirty word and C* counters are “banned”
– We do our big counts elsewhere (“right tool for the right job”)
● No DELETEs, too many key permutations and don't want tombstones
● Duplicate writes will happen
– Legitimate: fully or partially failed batches of writes
– Legitimate: sensor resends data because it doesn't see platform's
acknowledgement of data
– How-do-you-even-computer: people cannot configure NTP, so have fun
constantly receiving data from 1970
● TTL on insert time, store and query on event time
We need to show individual events or slices,
cannot use time granularity rows
(1min, 15min, 30min, 1hr, etc.)
Creating and updating tables' schema
● ALTER TABLE isn't fun, so we support dual writes instead
– Create new schema, performing dual reads for new & old
– Cut writes over to new schema
– After TTL time, DROP TABLE old
● Each step is verifiable with unit tests and metrics
● Maintains insert only data model for temporary disk util
cost
● Allows trivial testing of analysis and A/B'ing of schema
– Just toss a new schema in, gather some insights, and then
feel free to drop it
AWS Instance Types & EBS
● EBS is generally banned on our platform
– Too many of us lived through the great outage
– Too many of us cannot live with unpredictable I/O patterns
– Biggest reason: you cannot RI EBS
● Originally used i2.2xlarge's in 2014/2015
– Considering amount of “learning” we did, we were very
grateful for SSDs due to amount of streaming we had to do
● Moved to d2.xlarge's and d2.2xlarge's in 2015
– RAID 0 the spindles with xfs
– We like the CPU and RAM to disk ratio, especially since
compaction stops after a few hours
$/TB on AWS
i2.2xlarge d2.2xlarge c3.2xlarge +
6 x 2TB io1 EBS
No Prepay $619.04 / 1.6TB
= $386 / TB / month
$586.92 / 12TB
= $49.91 / TB / month
$1,713.16 / 12TB
= $142.77/TB/month
Partial Prepay $530.37 / 1.6TB
= $331.48/TB/month
$502.12 / 12TB
= $41.85 / TB / month
$1,684.59 / 12TB
= $140.39/TB/month
Full Prepay $519.17 / 1.6TB
= $324.85/TB/month
$492 / 12TB
= $41 / TB / month
$1,680.84 / 12TB
= $140.07/TB/month
● Amortizes one-time RI across 1yr, focusing on cost instead of cash out of
pocket
● Does not account for N=3 in cluster, so x3 for each record, then x2 for worst
case compaction headroom (realistically need MUCH LESS)
● c3 column assumes d2 comparison on disk size, not fair versus i2
We only store some raw data in C*
● Deleting data proved too difficult in the early days, even
with DTCS (slides coming on how we solved this)
● Re-streaming due to regular maintenance could take a
week or more
– Dropping instance size doesn't solve throughput problem
since all resources are cut, not just disk size
– Another reason not to use EBS since you'll “never” get close
to 100% disk utilization
● Due to aforementioned C* durability design, cost of data
for day 2..N is too high even if you drop replica count
Tying C* to raw data
● Every query must constrain a minimum of:
– Sensor ID
– Event Day
● Every query result must include a minimum of:
– Sensor ID
– Event Day
– Event ID
● Batches of (sensor_id, event_day, event_id) triples are
then used to look up the raw events from raw data storage
– This isn't always necessary (aggregates, correlations, etc.)
– Even with additional hops, full reads are still <1s
Using triples to batch writes
● Partition key starts with sensor id and event day
– Bonus: you get fresh ring location every day! Helps for
averaging out your schema mistakes over the TTL
● Event batches off of RabbitMQ are already constrained to
a single sensor id and event day
– Allows mapping a single AMQP read to a single C* write
(RabbitMQ is podded, not clustered)
– Flow state of pipeline becomes trivial to understand
● Batch C* writes on partition key, then data size (soft cap at
5120 bytes, C* inner warn)
Compaction woes, STCS & DTCS
● Used STCS in 2014/2015, expired data would get stuck ∞
– “We could rotate tables” → eh, no
– “We could rotate clusters” → oh c'mon, hell no
– “We could generate every historic permutation of keys within
that time bucket with Spark and run DELETEs” →...............
● Used DTCS in 2015, but expired data still got stuck ∞
– When deciding whether an SSTable is too old to compact,
compares “now” versus max timestamp (most recent write)
– If you write constantly (time series), then SSTables will rarely
or never stop compacting
– This means that you never realize the true value of DTCS for
time series, the ability to unlink whole SSTables from disk
Cluster disk states assuming const sensor count
Disk Util
Time
What you want
What you get
Initial build up to
retention period
MTCS, fixing DTCS
https://github.com/threatstack/mtcs
Now compare w/ min time
(oldest write)
MTCS settings
● Never run repairs (never worked on STCS or DTCS anyway)
and hinted handoff is off (great way to kill a cluster anyway)
● max_sstable_age_days = 1
base_time_seconds = 1 hour
● Results in roughly hour bucket sequential SSTables
– Reads are happy due to day or hour resolution, which have to
provide this in the partition key anyway
● Rest of DTCS sub-properties are default
● Not worried about really old and small SSTables since those
are simply unlinked “soon”
MTCS + sstablejanitor.sh
● Even with MTCS, SSTables were still not getting unlinked
● So enters sstablejanitor.sh
– Cron job fires it once per hour
– Iterates over each SSTable on disk for MTCS tables (chef/cron
feeds it a list of tables and their TTLs)
– Uses sstablemetadata to determine max timestamp
– If past TTL, then uses JMX to invoke CompactionManager's
forceUserDefinedCompaction on the table
● Hack? Yes, cron + sed + awk + JMX qualifies as a hack, but
it works like a charm and we don't carry expired data
● Bonus: don't need to reserve half your disks for compaction
Discussion
@threatstack
@sbisbee

More Related Content

What's hot

Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
DataStax
 
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
DataStax
 
Beginning Operations: 7 Deadly Sins for Apache Cassandra Ops
Beginning Operations: 7 Deadly Sins for Apache Cassandra OpsBeginning Operations: 7 Deadly Sins for Apache Cassandra Ops
Beginning Operations: 7 Deadly Sins for Apache Cassandra Ops
DataStax Academy
 
PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) |...
PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) |...PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) |...
PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) |...
DataStax
 
Instaclustr webinar 2017 feb 08 japan
Instaclustr webinar 2017 feb 08   japanInstaclustr webinar 2017 feb 08   japan
Instaclustr webinar 2017 feb 08 japan
Hiromitsu Komatsu
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... Cassandra
Instaclustr
 
Hecuba2: Cassandra Operations Made Easy (Radovan Zvoncek, Spotify) | C* Summi...
Hecuba2: Cassandra Operations Made Easy (Radovan Zvoncek, Spotify) | C* Summi...Hecuba2: Cassandra Operations Made Easy (Radovan Zvoncek, Spotify) | C* Summi...
Hecuba2: Cassandra Operations Made Easy (Radovan Zvoncek, Spotify) | C* Summi...
DataStax
 
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
DataStax
 
Processing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and SparkProcessing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and Spark
Ben Slater
 
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...
DataStax
 
Advanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXAdvanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMX
zznate
 
Cassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityCassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra Community
Hiromitsu Komatsu
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
ScyllaDB
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value StoreSantal Li
 
Load testing Cassandra applications
Load testing Cassandra applicationsLoad testing Cassandra applications
Load testing Cassandra applications
Ben Slater
 
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
DataStax
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
DataStax Academy
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla Cluster
ScyllaDB
 
Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016
Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016
Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016
DataStax
 
Webinar: Getting Started with Apache Cassandra
Webinar: Getting Started with Apache CassandraWebinar: Getting Started with Apache Cassandra
Webinar: Getting Started with Apache Cassandra
DataStax
 

What's hot (20)

Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
 
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
 
Beginning Operations: 7 Deadly Sins for Apache Cassandra Ops
Beginning Operations: 7 Deadly Sins for Apache Cassandra OpsBeginning Operations: 7 Deadly Sins for Apache Cassandra Ops
Beginning Operations: 7 Deadly Sins for Apache Cassandra Ops
 
PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) |...
PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) |...PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) |...
PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) |...
 
Instaclustr webinar 2017 feb 08 japan
Instaclustr webinar 2017 feb 08   japanInstaclustr webinar 2017 feb 08   japan
Instaclustr webinar 2017 feb 08 japan
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... Cassandra
 
Hecuba2: Cassandra Operations Made Easy (Radovan Zvoncek, Spotify) | C* Summi...
Hecuba2: Cassandra Operations Made Easy (Radovan Zvoncek, Spotify) | C* Summi...Hecuba2: Cassandra Operations Made Easy (Radovan Zvoncek, Spotify) | C* Summi...
Hecuba2: Cassandra Operations Made Easy (Radovan Zvoncek, Spotify) | C* Summi...
 
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
 
Processing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and SparkProcessing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and Spark
 
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...
 
Advanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXAdvanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMX
 
Cassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityCassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra Community
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value Store
 
Load testing Cassandra applications
Load testing Cassandra applicationsLoad testing Cassandra applications
Load testing Cassandra applications
 
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla Cluster
 
Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016
Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016
Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016
 
Webinar: Getting Started with Apache Cassandra
Webinar: Getting Started with Apache CassandraWebinar: Getting Started with Apache Cassandra
Webinar: Getting Started with Apache Cassandra
 

Viewers also liked

Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
DataStax
 
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
DataStax
 
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
DataStax
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
DataStax
 
PagerDuty: One Year of Cassandra Failures
PagerDuty: One Year of Cassandra FailuresPagerDuty: One Year of Cassandra Failures
PagerDuty: One Year of Cassandra Failures
DataStax Academy
 
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
DataStax
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
DataStax
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on Cassandra
Robbie Strickland
 

Viewers also liked (8)

Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
 
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
 
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
 
PagerDuty: One Year of Cassandra Failures
PagerDuty: One Year of Cassandra FailuresPagerDuty: One Year of Cassandra Failures
PagerDuty: One Year of Cassandra Failures
 
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on Cassandra
 

Similar to Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra (Sam Bisbee, Threat Stack) | C* Summit 2016

SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
Amazon Web Services
 
Using Time Window Compaction Strategy For Time Series Workloads
Using Time Window Compaction Strategy For Time Series WorkloadsUsing Time Window Compaction Strategy For Time Series Workloads
Using Time Window Compaction Strategy For Time Series Workloads
Jeff Jirsa
 
QuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series databaseQuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series database
javier ramirez
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
Amazon Web Services
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
Amazon Web Services
 
Re-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseRe-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series Database
All Things Open
 
CrowdStrike: Real World DTCS For Operators
CrowdStrike: Real World DTCS For OperatorsCrowdStrike: Real World DTCS For Operators
CrowdStrike: Real World DTCS For Operators
DataStax Academy
 
Cassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingCassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingMatthew Dennis
 
Cassandra in Operation
Cassandra in OperationCassandra in Operation
Cassandra in Operation
niallmilton
 
Forecasting database performance
Forecasting database performanceForecasting database performance
Forecasting database performance
Shenglin Du
 
Experiences building a multi region cassandra operations orchestrator on aws
Experiences building a multi region cassandra operations orchestrator on awsExperiences building a multi region cassandra operations orchestrator on aws
Experiences building a multi region cassandra operations orchestrator on aws
Diego Pacheco
 
Rankwave MOMENT™ (English)
Rankwave MOMENT™ (English)Rankwave MOMENT™ (English)
Rankwave MOMENT™ (English)
HyoungEun Kim
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instances
Amazon Web Services
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
datamantra
 
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
DataStax
 
Building an Amazon Datawarehouse and Using Business Intelligence Analytics Tools
Building an Amazon Datawarehouse and Using Business Intelligence Analytics ToolsBuilding an Amazon Datawarehouse and Using Business Intelligence Analytics Tools
Building an Amazon Datawarehouse and Using Business Intelligence Analytics Tools
Amazon Web Services
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
javier ramirez
 
Observer, a "real life" time series application
Observer, a "real life" time series applicationObserver, a "real life" time series application
Observer, a "real life" time series application
Kévin LOVATO
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloud
Brendan Gregg
 
Kudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataKudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast Data
Ryan Bosshart
 

Similar to Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra (Sam Bisbee, Threat Stack) | C* Summit 2016 (20)

SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Using Time Window Compaction Strategy For Time Series Workloads
Using Time Window Compaction Strategy For Time Series WorkloadsUsing Time Window Compaction Strategy For Time Series Workloads
Using Time Window Compaction Strategy For Time Series Workloads
 
QuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series databaseQuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series database
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Re-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseRe-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series Database
 
CrowdStrike: Real World DTCS For Operators
CrowdStrike: Real World DTCS For OperatorsCrowdStrike: Real World DTCS For Operators
CrowdStrike: Real World DTCS For Operators
 
Cassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingCassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data Modeling
 
Cassandra in Operation
Cassandra in OperationCassandra in Operation
Cassandra in Operation
 
Forecasting database performance
Forecasting database performanceForecasting database performance
Forecasting database performance
 
Experiences building a multi region cassandra operations orchestrator on aws
Experiences building a multi region cassandra operations orchestrator on awsExperiences building a multi region cassandra operations orchestrator on aws
Experiences building a multi region cassandra operations orchestrator on aws
 
Rankwave MOMENT™ (English)
Rankwave MOMENT™ (English)Rankwave MOMENT™ (English)
Rankwave MOMENT™ (English)
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instances
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
 
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
 
Building an Amazon Datawarehouse and Using Business Intelligence Analytics Tools
Building an Amazon Datawarehouse and Using Business Intelligence Analytics ToolsBuilding an Amazon Datawarehouse and Using Business Intelligence Analytics Tools
Building an Amazon Datawarehouse and Using Business Intelligence Analytics Tools
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
Observer, a "real life" time series application
Observer, a "real life" time series applicationObserver, a "real life" time series application
Observer, a "real life" time series application
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloud
 
Kudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataKudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast Data
 

More from DataStax

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?
DataStax
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
DataStax
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
DataStax
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise Graph
DataStax
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
DataStax
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
DataStax
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache Kafka
DataStax
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
DataStax
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0
DataStax
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
DataStax
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
DataStax
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for Dummies
DataStax
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
DataStax
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerce
DataStax
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
DataStax
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
DataStax
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
DataStax
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)
DataStax
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking Applications
DataStax
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
DataStax
 

More from DataStax (20)

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise Graph
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache Kafka
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for Dummies
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerce
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking Applications
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
 

Recently uploaded

Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
e20449
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
informapgpstrackings
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
Tier1 app
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
vrstrong314
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 

Recently uploaded (20)

Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 

Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra (Sam Bisbee, Threat Stack) | C* Summit 2016

  • 1. Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra Sam Bisbee, Threat Stack CTO
  • 2. Typical [time series] problems on C* ● Disk utilization creates a scaling pattern of lighting money on fire – Only works for a month or two, even with 90% disk utilization ● Every write up we found focused on schema design for tracking integers across time – There are days we wish we only tracked integers ● Data drastically loses value over time, but C*'s design doesn't acknowledge this – TTLs only address 0 value states, not partial value – Ex., 99% of reads are for data in its first day ● Not all sensors are equal
  • 3. Categories of Time Series Data Volume of Tx's Size of Tx's CRUD, Web 2.0 System Monitoring (CPU, etc.) System Monitoring (CPU, etc.) Traditional object store Threat Stack
  • 4. Categories of Time Series Data Volume of Tx's Size of Tx's CRUD, Web 2.0 System Monitoring (CPU, etc.) System Monitoring (CPU, etc.) Traditional object store Threat Stack Traditional time series on C*, what everyone writes about “We're going to need a bigger boat. Or disks.”
  • 5. We care about this thing called margins (see: we're in Boston, not the Valley)
  • 6. Data at Threat Stack ● 5 to 10TBs per day of raw data – Crossed several TB per day in first few months of production with ~4 people ● 80,000 to 150,000 Tx per second, analyzed in real time – Internal goal of analyzing, persisting, and firing alerts in <1s ● 90% write to 10% read tx ● Pre-compute query results for 70% of queries for UI – Optimized lookup tables & complex data structures, not just “query & cache” ● 100% AWS, distrust of remote storage in our DNA – This is not just EBS bashing. This applies to all databases on all platforms, even a cage in a data center. ● By the way, we're on DSE 4.8.4 (C* 2.1)
  • 7. Generic data model ● Entire platform assumes that events form a partially ordered, eventually consistent, write ahead log – A wonderful C* use case, so long as you only INSERT ● UPDATE is a dirty word and C* counters are “banned” – We do our big counts elsewhere (“right tool for the right job”) ● No DELETEs, too many key permutations and don't want tombstones ● Duplicate writes will happen – Legitimate: fully or partially failed batches of writes – Legitimate: sensor resends data because it doesn't see platform's acknowledgement of data – How-do-you-even-computer: people cannot configure NTP, so have fun constantly receiving data from 1970 ● TTL on insert time, store and query on event time
  • 8. We need to show individual events or slices, cannot use time granularity rows (1min, 15min, 30min, 1hr, etc.)
  • 9. Creating and updating tables' schema ● ALTER TABLE isn't fun, so we support dual writes instead – Create new schema, performing dual reads for new & old – Cut writes over to new schema – After TTL time, DROP TABLE old ● Each step is verifiable with unit tests and metrics ● Maintains insert only data model for temporary disk util cost ● Allows trivial testing of analysis and A/B'ing of schema – Just toss a new schema in, gather some insights, and then feel free to drop it
  • 10. AWS Instance Types & EBS ● EBS is generally banned on our platform – Too many of us lived through the great outage – Too many of us cannot live with unpredictable I/O patterns – Biggest reason: you cannot RI EBS ● Originally used i2.2xlarge's in 2014/2015 – Considering amount of “learning” we did, we were very grateful for SSDs due to amount of streaming we had to do ● Moved to d2.xlarge's and d2.2xlarge's in 2015 – RAID 0 the spindles with xfs – We like the CPU and RAM to disk ratio, especially since compaction stops after a few hours
  • 11. $/TB on AWS i2.2xlarge d2.2xlarge c3.2xlarge + 6 x 2TB io1 EBS No Prepay $619.04 / 1.6TB = $386 / TB / month $586.92 / 12TB = $49.91 / TB / month $1,713.16 / 12TB = $142.77/TB/month Partial Prepay $530.37 / 1.6TB = $331.48/TB/month $502.12 / 12TB = $41.85 / TB / month $1,684.59 / 12TB = $140.39/TB/month Full Prepay $519.17 / 1.6TB = $324.85/TB/month $492 / 12TB = $41 / TB / month $1,680.84 / 12TB = $140.07/TB/month ● Amortizes one-time RI across 1yr, focusing on cost instead of cash out of pocket ● Does not account for N=3 in cluster, so x3 for each record, then x2 for worst case compaction headroom (realistically need MUCH LESS) ● c3 column assumes d2 comparison on disk size, not fair versus i2
  • 12. We only store some raw data in C* ● Deleting data proved too difficult in the early days, even with DTCS (slides coming on how we solved this) ● Re-streaming due to regular maintenance could take a week or more – Dropping instance size doesn't solve throughput problem since all resources are cut, not just disk size – Another reason not to use EBS since you'll “never” get close to 100% disk utilization ● Due to aforementioned C* durability design, cost of data for day 2..N is too high even if you drop replica count
  • 13. Tying C* to raw data ● Every query must constrain a minimum of: – Sensor ID – Event Day ● Every query result must include a minimum of: – Sensor ID – Event Day – Event ID ● Batches of (sensor_id, event_day, event_id) triples are then used to look up the raw events from raw data storage – This isn't always necessary (aggregates, correlations, etc.) – Even with additional hops, full reads are still <1s
  • 14. Using triples to batch writes ● Partition key starts with sensor id and event day – Bonus: you get fresh ring location every day! Helps for averaging out your schema mistakes over the TTL ● Event batches off of RabbitMQ are already constrained to a single sensor id and event day – Allows mapping a single AMQP read to a single C* write (RabbitMQ is podded, not clustered) – Flow state of pipeline becomes trivial to understand ● Batch C* writes on partition key, then data size (soft cap at 5120 bytes, C* inner warn)
  • 15. Compaction woes, STCS & DTCS ● Used STCS in 2014/2015, expired data would get stuck ∞ – “We could rotate tables” → eh, no – “We could rotate clusters” → oh c'mon, hell no – “We could generate every historic permutation of keys within that time bucket with Spark and run DELETEs” →............... ● Used DTCS in 2015, but expired data still got stuck ∞ – When deciding whether an SSTable is too old to compact, compares “now” versus max timestamp (most recent write) – If you write constantly (time series), then SSTables will rarely or never stop compacting – This means that you never realize the true value of DTCS for time series, the ability to unlink whole SSTables from disk
  • 16. Cluster disk states assuming const sensor count Disk Util Time What you want What you get Initial build up to retention period
  • 17. MTCS, fixing DTCS https://github.com/threatstack/mtcs Now compare w/ min time (oldest write)
  • 18. MTCS settings ● Never run repairs (never worked on STCS or DTCS anyway) and hinted handoff is off (great way to kill a cluster anyway) ● max_sstable_age_days = 1 base_time_seconds = 1 hour ● Results in roughly hour bucket sequential SSTables – Reads are happy due to day or hour resolution, which have to provide this in the partition key anyway ● Rest of DTCS sub-properties are default ● Not worried about really old and small SSTables since those are simply unlinked “soon”
  • 19. MTCS + sstablejanitor.sh ● Even with MTCS, SSTables were still not getting unlinked ● So enters sstablejanitor.sh – Cron job fires it once per hour – Iterates over each SSTable on disk for MTCS tables (chef/cron feeds it a list of tables and their TTLs) – Uses sstablemetadata to determine max timestamp – If past TTL, then uses JMX to invoke CompactionManager's forceUserDefinedCompaction on the table ● Hack? Yes, cron + sed + awk + JMX qualifies as a hack, but it works like a charm and we don't carry expired data ● Bonus: don't need to reserve half your disks for compaction
  • 20.