SlideShare a Scribd company logo
1 of 33
Scaling Twitter
with Cassandra
Ryan King
Storage Team
bit.ly/chirpcassandra

 ryan@twitter.com

        @rk
Legacy
•   vertically & horiztonally partitioned mysql

•   memcached (rows, indexes and fragments)

•   application managed
Legacy Drawbacks
•   many single-points-of-failure

•   hardware-intensive

•   manpower-intensive

•   tight coupling
Apache Cassandra
•   Apache top level project

•   originally developed at Facebook

•   Rackspace, Digg, SimpleGeo, Twitter, etc.
Why Cassandra?
•   highly available

•   consistent, eventually

•   decentralized

•   fault tolerant

•   elastic

•   flexible schema

•   high write throughput
What is Cassandra?
•   distributed database

•   Google's BigTable's data model

•   Amazon's Dynamo's infrastructure
Cassandra Data Model
•   keyspaces

•   column families

•   columns

•   super columns
Cassandra Infrastructure
•   partitioners

•   storage

•   querying
Partitioners
•   order-preserving

•   random

•   custom
Storage
•   commit log

•   memtables

•   sstables

•   compaction

•   bloom filters

•   indexes

•   key cache

•   row cache
Querying
•   get

•   multiget

•   range

•   slice
Consistency
Consistency
•   N, R, W
Consistency
•   N, R, W

•   N = number of replicas
Consistency
•   N, R, W

•   N = number of replicas

•   R = read replicas
Consistency
•   N, R, W

•   N = number of replicas

•   R = read replicas

•   W = write replicas
Consistency
•   N, R, W

•    N = number of replicas

•    R = read replicas

•    W = write replicas

•   send request, wait for specified number
Consistency
•   N, R, W

•    N = number of replicas

•    R = read replicas

•    W = write replicas

•   send request, wait for specified number

•   wait for others in background and perform read-
    repair
Consistency Levels
•   ZERO

•   ONE

•   QUORUM

•   ALL
Strong Consistency
•   If W + R > N, you will have consistency

    •   W=1, R=N

    •   W=N, R=1

    •   W=Q, R=Q where Q = N / 2 + 1
Eventuality
•   Hinted Handoff

•   Read Repair

•   Proactive Repair (Merkle trees)
Potential Consistency
Potential Consistency
•   causes

    •   write-through caching

    •   master-slave replication failures
Example
Read Repair
•   send read to all replicas

•   if they differ, resolve conflicts and update (in
    background)
Hinted Handoff
•   A wants to write to B

•   B is down

•   A tells C, "when B is back, send them this
    update"
Proactive Repair
•   use Merkle trees to find inconsistencies

•   resolve conflicts

•   send repaired data

•   triggered manually
Parallel Deployment
How we’re moving?
•   parallel deployments

•   incremental traffic shifting
Parallel Deployment
1. build new implementation
2. integrate it alongside existing
3. ...with switches for dynamically move/mirror traffic

4. turn up traffic
5. break something
6. Fix it

7. GOTO 4
?

More Related Content

What's hot

What's hot (20)

Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
 
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
 
Spark Streaming with Cassandra
Spark Streaming with CassandraSpark Streaming with Cassandra
Spark Streaming with Cassandra
 
Meet Up - Spark Stream Processing + Kafka
Meet Up - Spark Stream Processing + KafkaMeet Up - Spark Stream Processing + Kafka
Meet Up - Spark Stream Processing + Kafka
 
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
 
Analytics with Cassandra & Spark
Analytics with Cassandra & SparkAnalytics with Cassandra & Spark
Analytics with Cassandra & Spark
 
Developing a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and SprayDeveloping a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and Spray
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
OLAP with Cassandra and Spark
OLAP with Cassandra and SparkOLAP with Cassandra and Spark
OLAP with Cassandra and Spark
 
Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)
Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)
Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)
 
Real-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stackReal-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stack
 
Stream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaStream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache Kafka
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 
Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon OuelletteTime Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
 
The How and Why of Fast Data Analytics with Apache Spark
The How and Why of Fast Data Analytics with Apache SparkThe How and Why of Fast Data Analytics with Apache Spark
The How and Why of Fast Data Analytics with Apache Spark
 
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
 
Reactive app using actor model & apache spark
Reactive app using actor model & apache sparkReactive app using actor model & apache spark
Reactive app using actor model & apache spark
 

Viewers also liked

Big Data in Real-Time at Twitter
Big Data in Real-Time at TwitterBig Data in Real-Time at Twitter
Big Data in Real-Time at Twitter
nkallen
 

Viewers also liked (16)

NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)
 
My Application is Slow | Best Practices for Troubleshooting and Prevention
My Application is Slow | Best Practices for Troubleshooting and PreventionMy Application is Slow | Best Practices for Troubleshooting and Prevention
My Application is Slow | Best Practices for Troubleshooting and Prevention
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling Twitter
 
Biometric Databases and Hadoop__HadoopSummit2010
Biometric Databases and Hadoop__HadoopSummit2010Biometric Databases and Hadoop__HadoopSummit2010
Biometric Databases and Hadoop__HadoopSummit2010
 
Docker国内外本番環境サービス事例のご紹介
Docker国内外本番環境サービス事例のご紹介Docker国内外本番環境サービス事例のご紹介
Docker国内外本番環境サービス事例のご紹介
 
Building Distributed Systems in Scala
Building Distributed Systems in ScalaBuilding Distributed Systems in Scala
Building Distributed Systems in Scala
 
Machine Data 101 Hands-on
Machine Data 101 Hands-onMachine Data 101 Hands-on
Machine Data 101 Hands-on
 
Aggregates
 Aggregates Aggregates
Aggregates
 
NoSQL databases, the CAP theorem, and the theory of relativity
NoSQL databases, the CAP theorem, and the theory of relativityNoSQL databases, the CAP theorem, and the theory of relativity
NoSQL databases, the CAP theorem, and the theory of relativity
 
Scaling Twitter
Scaling TwitterScaling Twitter
Scaling Twitter
 
BUILDING MATERIALS - SAND
BUILDING MATERIALS - SANDBUILDING MATERIALS - SAND
BUILDING MATERIALS - SAND
 
Facebook architecture presentation: scalability challenge
Facebook architecture presentation: scalability challengeFacebook architecture presentation: scalability challenge
Facebook architecture presentation: scalability challenge
 
Facebook Architecture - Breaking it Open
Facebook Architecture - Breaking it OpenFacebook Architecture - Breaking it Open
Facebook Architecture - Breaking it Open
 
Twitter - Architecture and Scalability lessons
Twitter - Architecture and Scalability lessonsTwitter - Architecture and Scalability lessons
Twitter - Architecture and Scalability lessons
 
facebook architecture for 600M users
facebook architecture for 600M usersfacebook architecture for 600M users
facebook architecture for 600M users
 
Big Data in Real-Time at Twitter
Big Data in Real-Time at TwitterBig Data in Real-Time at Twitter
Big Data in Real-Time at Twitter
 

Similar to Scaling Twitter with Cassandra

No sql solutions - 공개용
No sql solutions - 공개용No sql solutions - 공개용
No sql solutions - 공개용
Byeongweon Moon
 
Cassandra
CassandraCassandra
Cassandra
exsuns
 
04-Introduction-to-CassandraDB-.pdf
04-Introduction-to-CassandraDB-.pdf04-Introduction-to-CassandraDB-.pdf
04-Introduction-to-CassandraDB-.pdf
hothyfa
 
SDEC2011 NoSQL Data modelling
SDEC2011 NoSQL Data modellingSDEC2011 NoSQL Data modelling
SDEC2011 NoSQL Data modelling
Korea Sdec
 

Similar to Scaling Twitter with Cassandra (20)

cassandra_presentation_final
cassandra_presentation_finalcassandra_presentation_final
cassandra_presentation_final
 
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
 
Exploring the Fundamentals of YugabyteDB - Mydbops
Exploring the Fundamentals of YugabyteDB - Mydbops Exploring the Fundamentals of YugabyteDB - Mydbops
Exploring the Fundamentals of YugabyteDB - Mydbops
 
No sql solutions - 공개용
No sql solutions - 공개용No sql solutions - 공개용
No sql solutions - 공개용
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
 
Apache Cassandra @Geneva JUG 2013.02.26
Apache Cassandra @Geneva JUG 2013.02.26Apache Cassandra @Geneva JUG 2013.02.26
Apache Cassandra @Geneva JUG 2013.02.26
 
Data Diffing Based Software Architecture Patterns
Data Diffing Based Software Architecture PatternsData Diffing Based Software Architecture Patterns
Data Diffing Based Software Architecture Patterns
 
Thoughts on Transaction and Consistency Models
Thoughts on Transaction and Consistency ModelsThoughts on Transaction and Consistency Models
Thoughts on Transaction and Consistency Models
 
Cassandra from the trenches: migrating Netflix (update)
Cassandra from the trenches: migrating Netflix (update)Cassandra from the trenches: migrating Netflix (update)
Cassandra from the trenches: migrating Netflix (update)
 
Cassandra
CassandraCassandra
Cassandra
 
MyCassandra (Full English Version)
MyCassandra (Full English Version)MyCassandra (Full English Version)
MyCassandra (Full English Version)
 
Cassandra integrations
Cassandra integrationsCassandra integrations
Cassandra integrations
 
NoSQL overview #phptostart turin 11.07.2011
NoSQL overview #phptostart turin 11.07.2011NoSQL overview #phptostart turin 11.07.2011
NoSQL overview #phptostart turin 11.07.2011
 
The Return of the Living Datalog
The Return of the Living DatalogThe Return of the Living Datalog
The Return of the Living Datalog
 
Real-time Cassandra
Real-time CassandraReal-time Cassandra
Real-time Cassandra
 
Concurrency and Multithreading Demistified - Reversim Summit 2014
Concurrency and Multithreading Demistified - Reversim Summit 2014Concurrency and Multithreading Demistified - Reversim Summit 2014
Concurrency and Multithreading Demistified - Reversim Summit 2014
 
NoSQL_Night
NoSQL_NightNoSQL_Night
NoSQL_Night
 
04-Introduction-to-CassandraDB-.pdf
04-Introduction-to-CassandraDB-.pdf04-Introduction-to-CassandraDB-.pdf
04-Introduction-to-CassandraDB-.pdf
 
SDEC2011 NoSQL Data modelling
SDEC2011 NoSQL Data modellingSDEC2011 NoSQL Data modelling
SDEC2011 NoSQL Data modelling
 
Intro to Graphs for Fedict
Intro to Graphs for FedictIntro to Graphs for Fedict
Intro to Graphs for Fedict
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 

Scaling Twitter with Cassandra

Editor's Notes

  1. * storage team * personal background
  2. * began working on this problem last june * complexity had grown unmanageable * multiple internal customers * error domain grows as data size and complexity grow
  3. * every master db is a SPOF (failover is hard to pull off without strong coordination) * SPOFs lead to expensive hardware * app-managed hosts is tight coupling
  4. * our application is already tolerant of eventual consistency (actually more tolerant...) * in addition to scale, we want more flexibility than relational data models give us
  5. keyspace: database CF: table column: attribute SC: collection of attributes
  6. [insert diagrams of ring + tokens] nodes are arranged on a ring keys are mapped to the ring and written to the next N machines partitioners map keys to the ring
  7. [flow chart of how updates happen]
  8. if OPP, rows are ordered columns are ordered [diagram of range and slice]
  9. insert to mysql insert into memcache replicate to slave update mysql insert into memcache fails replication to slave fails
  10. Launching is shifting from roll back to roll forward