SlideShare a Scribd company logo
1 of 15
Cassandra at Twitter
                   (one use case)
Ryan King


Cassandra Meetup
January 12, 2011

                                    TM
History
History
‣   Started port of Tweets to Cassandra June 2009
‣   Started other Cassandra projects in 2009 (more on this later)
‣   Abandoned Tweet in 2010
Time Series Data
Use cases
‣   realtime traffic/engagement analytics
‣   systems monitoring
Time Series Data
‣   write heavy
‣   stored temporally
‣   viewed temporarily
‣   hierarchical aggregation
Data Model
‣   Distributed Counters (CASSANDRA-1072)
‣   each time series is a row (or rows) of counters
‣   slice over rows to get recent data
Data Model
‣   An example (not exactly the way we do it):



                              2011-01-12T10:00   2011-01-12T10:01   ...
        host:web1:load1               5                  4          ...
        host:web2:load1               4                  3          ...
     cluster:web:load1:sum          576                505          ...
    cluster:web:load1:count         100                 95          ...
Aggregation
‣   Measured every minute (or continuously)
‣   Rollup to courser granularities
‣   More Counters! (aka, let’s do it live)
Aggregation
        Minutes            2011-01-12T10:00   2011-01-12T10:01   ...
  host:web1:load1:sum              5                  4          ...
host:web1:load1:count             1                  1           ...
  host:web2:load1:sum             4                  3           ...
host:web1:load1:count             1                  1           ...
  cluster:web:load1:sum          576                505          ...
 cluster:web:load1:count         100                 95          ...
         Hours              2011-01-12T10      2011-01-12T11     ...
  host:web1:load1:sum            300                250          ...
host:web1:load1:count            60                 59           ...
  host:web2:load1:sum             4                  3           ...
host:web1:load1:count            61                 60           ...
 cluster:web:load1:sum          3010               2995          ...
cluster:web:load1:count         6000               5990          ...
Aggregation
‣   other dimensions besides time:
‣    clusters
‣    racks / dcs, etc
‣   And combinations of the above
Pros / Cons
‣   Pros
‣    real-time data (average 30s between measurement and visibility)
‣    real time aggregation
‣    flexible data retention (once counters and TTLs work together)
‣   Cons
‣    Storage-intensive
‣    Slow reads
Questions?

ryan@twitter.com
twitter.com/rk

                   TM
Obligatory Plug.
twitter.com/jobs


                   TM

More Related Content

What's hot

Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInChris Riccomini
 
Reactive programming using rx java & akka actors - pdx-scala - june 2014
Reactive programming   using rx java & akka actors - pdx-scala - june 2014Reactive programming   using rx java & akka actors - pdx-scala - june 2014
Reactive programming using rx java & akka actors - pdx-scala - june 2014Thomas Lockney
 
Benchmarking for HTTP/2
Benchmarking for HTTP/2Benchmarking for HTTP/2
Benchmarking for HTTP/2Kit Chan
 
Prometheus casual talk1
Prometheus casual talk1Prometheus casual talk1
Prometheus casual talk1wyukawa
 
moabcon2012 - Transitioning from Grid Engine
moabcon2012 - Transitioning from Grid Enginemoabcon2012 - Transitioning from Grid Engine
moabcon2012 - Transitioning from Grid EngineFrédérick Lefebvre
 
Intro to Functional Programming with RxJava
Intro to Functional Programming with RxJavaIntro to Functional Programming with RxJava
Intro to Functional Programming with RxJavaMike Nakhimovich
 
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...Flink Forward
 
Administrative techniques to reduce Kafka costs | Anna Kepler, Viasat
Administrative techniques to reduce Kafka costs | Anna Kepler, ViasatAdministrative techniques to reduce Kafka costs | Anna Kepler, Viasat
Administrative techniques to reduce Kafka costs | Anna Kepler, ViasatHostedbyConfluent
 
Scaling to Millions of Concurrent SPARQL Queries on the Cloud
Scaling to Millions of Concurrent SPARQL Queries on the CloudScaling to Millions of Concurrent SPARQL Queries on the Cloud
Scaling to Millions of Concurrent SPARQL Queries on the CloudMarin Dimitrov
 
Monitoring Kafka w/ Prometheus
Monitoring Kafka w/ PrometheusMonitoring Kafka w/ Prometheus
Monitoring Kafka w/ Prometheuskawamuray
 
Self Created Load Balancer for MTA on AWS
Self Created Load Balancer for MTA on AWSSelf Created Load Balancer for MTA on AWS
Self Created Load Balancer for MTA on AWSsharu1204
 
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...Codemotion Tel Aviv
 
An introduction to node3
An introduction to node3An introduction to node3
An introduction to node3Vivian S. Zhang
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusTobias Schmidt
 
Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...
Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...
Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...Flink Forward
 
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...Tokuhiro Matsuno
 

What's hot (20)

Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedIn
 
Reactive programming using rx java & akka actors - pdx-scala - june 2014
Reactive programming   using rx java & akka actors - pdx-scala - june 2014Reactive programming   using rx java & akka actors - pdx-scala - june 2014
Reactive programming using rx java & akka actors - pdx-scala - june 2014
 
Benchmarking for HTTP/2
Benchmarking for HTTP/2Benchmarking for HTTP/2
Benchmarking for HTTP/2
 
Prometheus casual talk1
Prometheus casual talk1Prometheus casual talk1
Prometheus casual talk1
 
moabcon2012 - Transitioning from Grid Engine
moabcon2012 - Transitioning from Grid Enginemoabcon2012 - Transitioning from Grid Engine
moabcon2012 - Transitioning from Grid Engine
 
Intro to Functional Programming with RxJava
Intro to Functional Programming with RxJavaIntro to Functional Programming with RxJava
Intro to Functional Programming with RxJava
 
Unc203
Unc203Unc203
Unc203
 
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
Flink Forward SF 2017: Stefan Richter - Improvements for large state and reco...
 
WebSockets
WebSocketsWebSockets
WebSockets
 
Administrative techniques to reduce Kafka costs | Anna Kepler, Viasat
Administrative techniques to reduce Kafka costs | Anna Kepler, ViasatAdministrative techniques to reduce Kafka costs | Anna Kepler, Viasat
Administrative techniques to reduce Kafka costs | Anna Kepler, Viasat
 
Scaling to Millions of Concurrent SPARQL Queries on the Cloud
Scaling to Millions of Concurrent SPARQL Queries on the CloudScaling to Millions of Concurrent SPARQL Queries on the Cloud
Scaling to Millions of Concurrent SPARQL Queries on the Cloud
 
Monitoring Kafka w/ Prometheus
Monitoring Kafka w/ PrometheusMonitoring Kafka w/ Prometheus
Monitoring Kafka w/ Prometheus
 
Self Created Load Balancer for MTA on AWS
Self Created Load Balancer for MTA on AWSSelf Created Load Balancer for MTA on AWS
Self Created Load Balancer for MTA on AWS
 
Reactive cocoa
Reactive cocoaReactive cocoa
Reactive cocoa
 
C100 k and go
C100 k and goC100 k and go
C100 k and go
 
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
 
An introduction to node3
An introduction to node3An introduction to node3
An introduction to node3
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...
Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...
Flink Forward Berlin 2017: Ruben Casado Tejedor - Flink-Kudu connector: an op...
 
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
 

Viewers also liked

Bay area Cassandra Meetup 2011
Bay area Cassandra Meetup 2011Bay area Cassandra Meetup 2011
Bay area Cassandra Meetup 2011mubarakss
 
Twitter with Cassandra
Twitter with CassandraTwitter with Cassandra
Twitter with CassandraAdhish Singla
 
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016StampedeCon
 
Curso de creación de Dashboards Open Source
Curso de creación de Dashboards Open SourceCurso de creación de Dashboards Open Source
Curso de creación de Dashboards Open SourceStratebi
 
Cassandra Tutorial
Cassandra TutorialCassandra Tutorial
Cassandra Tutorialmubarakss
 
69 claves para conocer Big Data
69 claves para conocer Big Data69 claves para conocer Big Data
69 claves para conocer Big DataStratebi
 

Viewers also liked (6)

Bay area Cassandra Meetup 2011
Bay area Cassandra Meetup 2011Bay area Cassandra Meetup 2011
Bay area Cassandra Meetup 2011
 
Twitter with Cassandra
Twitter with CassandraTwitter with Cassandra
Twitter with Cassandra
 
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
 
Curso de creación de Dashboards Open Source
Curso de creación de Dashboards Open SourceCurso de creación de Dashboards Open Source
Curso de creación de Dashboards Open Source
 
Cassandra Tutorial
Cassandra TutorialCassandra Tutorial
Cassandra Tutorial
 
69 claves para conocer Big Data
69 claves para conocer Big Data69 claves para conocer Big Data
69 claves para conocer Big Data
 

Similar to Cassandra at Twitter for Real-Time Analytics

StackWatch: A prototype CloudWatch service for CloudStack
StackWatch: A prototype CloudWatch service for CloudStackStackWatch: A prototype CloudWatch service for CloudStack
StackWatch: A prototype CloudWatch service for CloudStackChiradeep Vittal
 
High Throughput Analytics with Cassandra & Azure
High Throughput Analytics with Cassandra & AzureHigh Throughput Analytics with Cassandra & Azure
High Throughput Analytics with Cassandra & AzureDataStax Academy
 
Finding an unusual cause of max_user_connections in MySQL
Finding an unusual cause of max_user_connections in MySQLFinding an unusual cause of max_user_connections in MySQL
Finding an unusual cause of max_user_connections in MySQLOlivier Doucet
 
Tomcat from a cluster to the cloud on RP3
Tomcat from a cluster to the cloud on RP3Tomcat from a cluster to the cloud on RP3
Tomcat from a cluster to the cloud on RP3Jean-Frederic Clere
 
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...NETWAYS
 
QNIBTerminal: Understand your datacenter by overlaying multiple information l...
QNIBTerminal: Understand your datacenter by overlaying multiple information l...QNIBTerminal: Understand your datacenter by overlaying multiple information l...
QNIBTerminal: Understand your datacenter by overlaying multiple information l...QNIB Solutions
 
Riak add presentation
Riak add presentationRiak add presentation
Riak add presentationIlya Bogunov
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging振东 刘
 
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightHow Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightScyllaDB
 
Apache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorApache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorAljoscha Krettek
 
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaDataWorks Summit
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLNick Dearden
 
Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...Eric Sammer
 
Streaming SQL w/ Apache Calcite
Streaming SQL w/ Apache Calcite Streaming SQL w/ Apache Calcite
Streaming SQL w/ Apache Calcite Hortonworks
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache CalciteJulian Hyde
 
UKOUG, Lies, Damn Lies and I/O Statistics
UKOUG, Lies, Damn Lies and I/O StatisticsUKOUG, Lies, Damn Lies and I/O Statistics
UKOUG, Lies, Damn Lies and I/O StatisticsKyle Hailey
 
Training Slides: Intermediate 202: Performing Cluster Maintenance with Zero-D...
Training Slides: Intermediate 202: Performing Cluster Maintenance with Zero-D...Training Slides: Intermediate 202: Performing Cluster Maintenance with Zero-D...
Training Slides: Intermediate 202: Performing Cluster Maintenance with Zero-D...Continuent
 
Four Ways to Improve ASP .NET Performance and Scalability
 Four Ways to Improve ASP .NET Performance and Scalability Four Ways to Improve ASP .NET Performance and Scalability
Four Ways to Improve ASP .NET Performance and ScalabilityAlachisoft
 

Similar to Cassandra at Twitter for Real-Time Analytics (20)

Cloud Probing
Cloud ProbingCloud Probing
Cloud Probing
 
StackWatch: A prototype CloudWatch service for CloudStack
StackWatch: A prototype CloudWatch service for CloudStackStackWatch: A prototype CloudWatch service for CloudStack
StackWatch: A prototype CloudWatch service for CloudStack
 
High Throughput Analytics with Cassandra & Azure
High Throughput Analytics with Cassandra & AzureHigh Throughput Analytics with Cassandra & Azure
High Throughput Analytics with Cassandra & Azure
 
Finding an unusual cause of max_user_connections in MySQL
Finding an unusual cause of max_user_connections in MySQLFinding an unusual cause of max_user_connections in MySQL
Finding an unusual cause of max_user_connections in MySQL
 
Tomcat from a cluster to the cloud on RP3
Tomcat from a cluster to the cloud on RP3Tomcat from a cluster to the cloud on RP3
Tomcat from a cluster to the cloud on RP3
 
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
 
QNIBTerminal: Understand your datacenter by overlaying multiple information l...
QNIBTerminal: Understand your datacenter by overlaying multiple information l...QNIBTerminal: Understand your datacenter by overlaying multiple information l...
QNIBTerminal: Understand your datacenter by overlaying multiple information l...
 
Riak add presentation
Riak add presentationRiak add presentation
Riak add presentation
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging
 
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightHow Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
 
Apache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorApache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream Processor
 
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQL
 
Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...
 
Streaming SQL w/ Apache Calcite
Streaming SQL w/ Apache Calcite Streaming SQL w/ Apache Calcite
Streaming SQL w/ Apache Calcite
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
 
UKOUG, Lies, Damn Lies and I/O Statistics
UKOUG, Lies, Damn Lies and I/O StatisticsUKOUG, Lies, Damn Lies and I/O Statistics
UKOUG, Lies, Damn Lies and I/O Statistics
 
Training Slides: Intermediate 202: Performing Cluster Maintenance with Zero-D...
Training Slides: Intermediate 202: Performing Cluster Maintenance with Zero-D...Training Slides: Intermediate 202: Performing Cluster Maintenance with Zero-D...
Training Slides: Intermediate 202: Performing Cluster Maintenance with Zero-D...
 
Four Ways to Improve ASP .NET Performance and Scalability
 Four Ways to Improve ASP .NET Performance and Scalability Four Ways to Improve ASP .NET Performance and Scalability
Four Ways to Improve ASP .NET Performance and Scalability
 
Micronaut brainbit
Micronaut brainbitMicronaut brainbit
Micronaut brainbit
 

Cassandra at Twitter for Real-Time Analytics

  • 1. Cassandra at Twitter (one use case) Ryan King Cassandra Meetup January 12, 2011 TM
  • 3. History ‣ Started port of Tweets to Cassandra June 2009 ‣ Started other Cassandra projects in 2009 (more on this later) ‣ Abandoned Tweet in 2010
  • 5. Use cases ‣ realtime traffic/engagement analytics ‣ systems monitoring
  • 6. Time Series Data ‣ write heavy ‣ stored temporally ‣ viewed temporarily ‣ hierarchical aggregation
  • 7. Data Model ‣ Distributed Counters (CASSANDRA-1072) ‣ each time series is a row (or rows) of counters ‣ slice over rows to get recent data
  • 8. Data Model ‣ An example (not exactly the way we do it): 2011-01-12T10:00 2011-01-12T10:01 ... host:web1:load1 5 4 ... host:web2:load1 4 3 ... cluster:web:load1:sum 576 505 ... cluster:web:load1:count 100 95 ...
  • 9. Aggregation ‣ Measured every minute (or continuously) ‣ Rollup to courser granularities ‣ More Counters! (aka, let’s do it live)
  • 10. Aggregation Minutes 2011-01-12T10:00 2011-01-12T10:01 ... host:web1:load1:sum 5 4 ... host:web1:load1:count 1 1 ... host:web2:load1:sum 4 3 ... host:web1:load1:count 1 1 ... cluster:web:load1:sum 576 505 ... cluster:web:load1:count 100 95 ... Hours 2011-01-12T10 2011-01-12T11 ... host:web1:load1:sum 300 250 ... host:web1:load1:count 60 59 ... host:web2:load1:sum 4 3 ... host:web1:load1:count 61 60 ... cluster:web:load1:sum 3010 2995 ... cluster:web:load1:count 6000 5990 ...
  • 11. Aggregation ‣ other dimensions besides time: ‣ clusters ‣ racks / dcs, etc ‣ And combinations of the above
  • 12. Pros / Cons ‣ Pros ‣ real-time data (average 30s between measurement and visibility) ‣ real time aggregation ‣ flexible data retention (once counters and TTLs work together) ‣ Cons ‣ Storage-intensive ‣ Slow reads
  • 13.

Editor's Notes

  1. \n
  2. \n
  3. \n
  4. this is our most successful use case\nwe had a general need for realtime high-scale time series data\n
  5. \n
  6. \n
  7. counters work in trunk\nsome things, like averages can be modeled as several counters that get combined at read time\n
  8. \n
  9. \n
  10. \n
  11. you can aggregate by many dimensions at once\nevery combination is persisted separately\n
  12. \n
  13. \n
  14. \n
  15. \n