Scaling Twitter
with Cassandra
Ryan King
Storage Team
bit.ly/chirpcassandra

ryan@twitter.com

@rk
Legacy
•   vertically & horizontally partitioned mysql

•   memcached (rows, indexes and fragments)

•   application managed
Legacy Drawbacks
•   many single-points-of-failure

•   hardware-intensive

•   manpower-intensive

•   tight coupling
Apache Cassandra
•   Apache top level project

•   originally developed at Facebook

•   Rackspace, Digg, SimpleGeo, Twitter, etc.
Why Cassandra?
•   highly available

•   consistent, eventually

•   decentralized

•   fault tolerant

•   elastic

•   flexible schema

•   high write throughput
What is Cassandra?
•   distributed database

•   data model from Google's BigTable

•   infrastructure from Amazon's Dynamo
Cassandra Data Model
•   keyspaces

•   column families

•   columns

•   super columns
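The speaker notes later map these onto relational terms (keyspace ≈ database, column family ≈ table, column ≈ attribute, super column ≈ collection of attributes). As a rough Python sketch of that nesting, with made-up keyspace, row, and column names rather than Twitter's real schema:

# keyspace -> column family -> row key -> column name -> (value, timestamp)
keyspace = {
    "Statuses": {                                  # column family (roughly a table)
        "status:12345": {                          # row key
            "text": ("chirp chirp", 1271179260),   # column: name -> (value, timestamp)
            "user_id": ("42", 1271179260),
        }
    },
    "UserTimeline": {                              # CF laid out with super columns
        "user:42": {
            "status:12345": {                      # super column: a named group of columns
                "text": ("chirp chirp", 1271179260),
            }
        }
    },
}

# Looking up one column, the way a get(keyspace, cf, key, column) call would:
value, ts = keyspace["Statuses"]["status:12345"]["text"]
print(value, ts)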
Cassandra Infrastructure
•   partitioners

•   storage

•   querying
Partitioners
•   order-preserving

•   random

•   custom
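The notes describe how partitioning works: nodes sit on a ring, the partitioner maps each key to a point on the ring, and the key is written to the next N nodes. A toy sketch of that mapping, with invented node names, token values, and a 0-99 token space:

import hashlib
from bisect import bisect_right

# nodes and their tokens on the ring (values invented)
RING = sorted([("node-a", 10), ("node-b", 40), ("node-c", 70), ("node-d", 95)],
              key=lambda t: t[1])

def random_partitioner(key):
    # hash the key onto the ring: load spreads evenly, but key order is lost
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 100

def order_preserving_partitioner(key):
    # keep keys in sorted order on the ring: range scans work, hot spots possible
    return ord(key[0]) % 100   # toy stand-in for a real OPP

def replicas(token, n=3):
    # the key is written to the next N nodes clockwise from its token
    tokens = [t for _, t in RING]
    start = bisect_right(tokens, token) % len(RING)
    return [RING[(start + i) % len(RING)][0] for i in range(n)]

print(replicas(random_partitioner("user:42")))
print(replicas(order_preserving_partitioner("user:42")))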
Storage
•   commit log

•   memtables

•   sstables

•   compaction

•   bloom filters

•   indexes

•   key cache

•   row cache
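A condensed sketch of how these pieces combine on the write and read paths; this is only the shape of the design, not Cassandra's actual code:

class ToyStore:
    def __init__(self):
        self.commit_log = []      # sequential, durable append
        self.memtable = {}        # in-memory; sorted when flushed
        self.sstables = []        # immutable, sorted files on disk

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. append to commit log
        self.memtable[key] = value             # 2. update memtable
        if len(self.memtable) > 2:             # 3. flush when full -> new sstable
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def compact(self):
        # merge sstables so each key lives in one place again
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [merged]

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest first; a bloom filter would
            if key in table:                   # let us skip tables that can't hold the key
                return table[key]
        return None

store = ToyStore()
for i in range(5):
    store.write("k%d" % i, i)
print(store.read("k1"))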
Querying
•   get

•   multiget

•   range

•   slice
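Roughly what the four query shapes return, shown against a toy ordered store with invented rows; as the notes point out, range queries over rows assume the order-preserving partitioner, while columns within a row are always ordered:

rows = {
    "user:1": {"name": "ada", "bio": "math"},
    "user:2": {"name": "alan", "bio": "crypto"},
    "user:3": {"name": "grace", "bio": "navy"},
}

def get(key, column):                 # one column of one row
    return rows[key][column]

def multiget(keys, column):           # the same column across several rows
    return {k: rows[k][column] for k in keys}

def range_query(start, end):          # contiguous rows; needs the order-preserving partitioner
    return {k: v for k, v in sorted(rows.items()) if start <= k <= end}

def slice_query(key, start, end):     # contiguous columns within one row
    return {c: v for c, v in sorted(rows[key].items()) if start <= c <= end}

print(get("user:1", "name"))
print(multiget(["user:1", "user:3"], "name"))
print(range_query("user:1", "user:2"))
print(slice_query("user:2", "bio", "name"))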
Consistency
•   N, R, W

•   N = number of replicas

•   R = read replicas

•   W = write replicas

•   send request, wait for specified number

•   wait for others in background and perform read-repair
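A schematic coordinator, as a sketch: fan the request out to all N replicas, unblock the caller once the requested number (R for reads, W for writes) have answered, and let the stragglers finish in the background. Toy code with simulated latency, not real networking:

import threading, queue, random, time

def ask_replica(i, results):
    time.sleep(random.uniform(0.01, 0.1))     # pretend network latency
    results.put((i, "value-from-replica-%d" % i))

def coordinate(n=3, wait_for=2):
    results = queue.Queue()
    for i in range(n):
        threading.Thread(target=ask_replica, args=(i, results), daemon=True).start()
    answers = [results.get() for _ in range(wait_for)]   # block for R (or W) replies
    return answers                                        # the rest arrive later; read repair
                                                          # can compare them in the background

print(coordinate(n=3, wait_for=2))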
Consistency Levels
•   ZERO

•   ONE

•   QUORUM

•   ALL
Strong Consistency
•   If W + R > N, you will have consistency

    •   W=1, R=N

    •   W=N, R=1

    •   W=Q, R=Q where Q = N/2 + 1
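The arithmetic behind the rule, spelled out: any read set of size R must overlap any write set of size W in at least one replica when R + W > N, so a read always touches a replica holding the latest write. A two-line check:

def is_strongly_consistent(n, r, w):
    return r + w > n    # read and write sets must overlap in at least one replica

quorum = lambda n: n // 2 + 1
for (n, r, w) in [(3, 3, 1), (3, 1, 3), (3, quorum(3), quorum(3)), (3, 1, 1)]:
    print(n, r, w, is_strongly_consistent(n, r, w))   # the last one can return stale data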
Eventuality
•   Hinted Handoff

•   Read Repair

•   Proactive Repair (Merkle trees)
Potential Consistency
•   causes

    •   write-through caching

    •   master-slave replication failures
Example
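The example in the speaker notes walks through a write-through cache plus asynchronous replication. A runnable sketch of that sequence, with invented store names and values:

# After step 4 the cache update and the replication both fail,
# so the three copies disagree.
mysql_master, mysql_slave, memcache = {}, {}, {}

mysql_master["status:1"] = "v1"      # 1. insert into mysql
memcache["status:1"] = "v1"          # 2. insert into memcache
mysql_slave["status:1"] = "v1"       # 3. replicate to slave

mysql_master["status:1"] = "v2"      # 4. update mysql
# 5. insert into memcache fails      (cache still holds v1)
# 6. replication to slave fails      (slave still holds v1)

print(mysql_master["status:1"], memcache["status:1"], mysql_slave["status:1"])  # v2 v1 v1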
Read Repair
•   send read to all replicas

•   if they differ, resolve conflicts and update (in
    background)
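A sketch of the resolution step: compare the (value, timestamp) pairs the replicas returned, keep the newest, and write it back to any replica that disagreed. Replica names and contents are invented:

replicas = {
    "A": ("old bio", 100),
    "B": ("new bio", 200),
    "C": ("old bio", 100),
}

winner = max(replicas.values(), key=lambda vt: vt[1])   # newest timestamp wins
for name, (value, ts) in replicas.items():
    if (value, ts) != winner:
        replicas[name] = winner    # this write-back happens in the background
print(replicas)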
Hinted Handoff
•   A wants to write to B

•   B is down

•   A tells C, "when B is back, send them this
    update"
Proactive Repair
•   use Merkle trees to find inconsistencies

•   resolve conflicts

•   send repaired data

•   triggered manually
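A toy Merkle comparison: each replica hashes its key ranges into a small tree, the trees are compared top-down, and only ranges whose hashes differ are shipped for repair. Keys and values are invented, and a real tree would be much deeper:

import hashlib

def h(s):
    return hashlib.sha1(s.encode()).hexdigest()

def merkle(rows):
    keys = sorted(rows)
    leaves = [h(k + rows[k]) for k in keys]
    mid = len(leaves) // 2
    left, right = h("".join(leaves[:mid])), h("".join(leaves[mid:]))
    return {"root": h(left + right), "left": left, "right": right,
            "left_keys": keys[:mid], "right_keys": keys[mid:]}

a = merkle({"k1": "x", "k2": "y", "k3": "z", "k4": "w"})
b = merkle({"k1": "x", "k2": "y", "k3": "STALE", "k4": "w"})

if a["root"] != b["root"]:                       # the replicas disagree somewhere
    for side in ("left", "right"):
        if a[side] != b[side]:                   # only this range needs repair
            print("repair keys:", a[side + "_keys"])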
Parallel Deployment
How we're moving
•   parallel deployments

•   incremental traffic shifting
Parallel Deployment
1. build new implementation
2. integrate it alongside existing
3. ...with switches to dynamically move/mirror traffic
4. turn up traffic
5. break something
6. fix it
7. GOTO 4
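One way the switches in step 3 could look, as a sketch: a dial that mirrors writes to the new store and shifts a fraction of reads onto it, so traffic can be turned up (or back down) without a hard cutover. Percentages and store names are invented:

import random

DARKMODE_WRITE_PCT = 100    # mirror all writes to the new store
READ_FROM_NEW_PCT = 10      # serve 10% of reads from the new store

def handle_write(key, value, legacy_store, new_store):
    legacy_store[key] = value                                  # old path stays authoritative
    if random.randint(1, 100) <= DARKMODE_WRITE_PCT:
        new_store[key] = value                                 # mirrored write

def handle_read(key, legacy_store, new_store):
    if random.randint(1, 100) <= READ_FROM_NEW_PCT:
        return new_store.get(key, legacy_store.get(key))       # fall back if missing
    return legacy_store.get(key)

mysql, cassandra = {}, {}
handle_write("status:1", "chirp", mysql, cassandra)
print(handle_read("status:1", mysql, cassandra))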
?
Speaker Notes
  • * storage team
    * personal background

  • * began working on this problem last june
    * complexity had grown unmanageable
    * multiple internal customers
    * error domain grows as data size and complexity grow
  • * every master db is a SPOF (failover is hard to pull off without strong coordination)
    * SPOFs lead to expensive hardware
    * app-managed hosts mean tight coupling

  • * our application is already tolerant of eventual consistency (actually more tolerant...)
    * in addition to scale, we want more flexibility than relational data models give us


  • keyspace: database
    CF: table
    column: attribute
    SC: collection of attributes

  • [insert diagrams of ring + tokens]

    nodes are arranged on a ring
    keys are mapped to the ring and written to the next N machines
    partitioners map keys to the ring
  • [flow chart of how updates happen]

  • if using the order-preserving partitioner (OPP), rows are ordered
    columns are ordered

    [diagram of range and slice]
  • example of potential inconsistency:
    1. insert into mysql
    2. insert into memcache
    3. replicate to slave
    4. update mysql
    5. insert into memcache fails (cache keeps the old value)
    6. replication to slave fails (slave keeps the old value)





  • Launching is shifting from roll back to roll forward

