Scaling Twitter with Cassandra


  • * storage team
    * personal background

  • * began working on this problem last june
    * complexity had grown unmanageable
    * multiple internal customers
    * error domain grows as data size and complexity grow
  • * every master db is a SPOF (failover is hard to pull off without strong coordination)
    * SPOFs lead to expensive hardware
* app-managed hosts are a form of tight coupling

  • * our application is already tolerant of eventual consistency (actually more tolerant...)
    * in addition to scale, we want more flexibility than relational data models give us


  • keyspace: database
    CF (column family): table
    column: attribute
    SC (super column): collection of attributes
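
    As a rough sketch, that mapping can be pictured as nested Python
    dicts (the names here are invented for illustration, not a client API):

        # keyspace -> column family -> row key -> column name -> value
        data = {
            "Twitter": {                        # keyspace ~ database
                "Statuses": {                   # column family ~ table
                    "status:12345": {           # row key
                        "text": "hello",        # column ~ attribute
                        "user_id": "42",
                    },
                },
                "Timeline": {                   # CF holding super columns
                    "user:42": {
                        "2010-04-13": {         # super column ~ collection
                            "status_id": "12345",   # ...of attributes
                        },
                    },
                },
            },
        }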

  • [insert diagrams of ring + tokens]

    nodes are arranged on a ring
    keys are mapped to the ring and written to the next N machines
    partitioners map keys to the ring
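
    A minimal sketch of the ring, assuming a toy 0-99 token space and
    made-up node names (real partitioners use a much larger token space):

        import hashlib
        from bisect import bisect_left

        ring = sorted([(10, "A"), (40, "B"), (70, "C"), (95, "D")])  # (token, node)
        tokens = [t for t, _ in ring]

        def token_for(key):
            # stand-in for the random partitioner's hash-based token
            return int(hashlib.md5(key.encode()).hexdigest(), 16) % 100

        def replicas(key, n=3):
            # first node whose token >= the key's token, then walk clockwise
            start = bisect_left(tokens, token_for(key)) % len(ring)
            return [ring[(start + i) % len(ring)][1] for i in range(n)]

        print(replicas("user:42"))  # e.g. ['B', 'C', 'D']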
  • [flow chart of how updates happen]

  • with the order-preserving partitioner (OPP), rows are ordered
    columns are always ordered within a row

    [diagram of range and slice]
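
    A sketch of why that ordering matters for queries, using sorted keys
    and columns in plain Python (data and names invented):

        from bisect import bisect_left, bisect_right

        rows = {  # row key -> {column name -> value}
            "user:alice": {"bio": "...", "name": "Alice"},
            "user:bob":   {"bio": "...", "name": "Bob"},
            "user:carol": {"bio": "...", "name": "Carol"},
        }
        keys = sorted(rows)  # only the OPP keeps row keys globally ordered

        def range_query(start, end):
            # rows with start <= key <= end
            return keys[bisect_left(keys, start):bisect_right(keys, end)]

        def slice_query(key, first, last):
            # columns with first <= name <= last within one row
            cols = sorted(rows[key])  # columns are always stored ordered
            return cols[bisect_left(cols, first):bisect_right(cols, last)]

        print(range_query("user:alice", "user:bob"))  # ['user:alice', 'user:bob']
        print(slice_query("user:alice", "a", "m"))    # ['bio']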











  • insert into mysql
    insert into memcache
    replicate to slave
    (happy path: all three stores agree)
    update mysql
    insert into memcache fails
    replication to slave fails
    (failure path: the cache and the slave now serve stale data)
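
    The same failure sequence as a runnable sketch (three plain dicts
    standing in for the master, the cache, and the slave):

        master, cache, slave = {}, {}, {}

        # happy path: all three stores agree
        master["k"] = "v1"; cache["k"] = "v1"; slave["k"] = "v1"

        # failure path: the update lands on the master only
        master["k"] = "v2"
        # cache["k"] = "v2"   <- insert into memcache fails
        # slave["k"] = "v2"   <- replication to slave fails

        print(master["k"], cache["k"], slave["k"])  # v2 v1 v1 -- stale reads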





  • Launching is shifting from roll back to roll forward


  • Scaling Twitter with Cassandra

    1. Scaling Twitter with Cassandra Ryan King Storage Team
    2. bit.ly/chirpcassandra ryan@twitter.com @rk
    3. Legacy • vertically & horizontally partitioned mysql • memcached (rows, indexes and fragments) • application managed
    4. Legacy Drawbacks • many single-points-of-failure • hardware-intensive • manpower-intensive • tight coupling
    5. Apache Cassandra • Apache top level project • originally developed at Facebook • Rackspace, Digg, SimpleGeo, Twitter, etc.
    6. Why Cassandra? • highly available • consistent, eventually • decentralized • fault tolerant • elastic • flexible schema • high write throughput
    7. What is Cassandra? • distributed database • the data model of Google's BigTable • the infrastructure of Amazon's Dynamo
    8. Cassandra Data Model • keyspaces • column families • columns • super columns
    9. Cassandra Infrastructure • partitioners • storage • querying
    10. Partitioners • order-preserving • random • custom
    11. Storage • commit log • memtables • sstables • compaction • bloom filters • indexes • key cache • row cache
    12. Querying • get • multiget • range • slice
    13. Consistency
    14. Consistency • N, R, W
    15. Consistency • N, R, W • N = number of replicas
    16. Consistency • N, R, W • N = number of replicas • R = read replicas
    17. Consistency • N, R, W • N = number of replicas • R = read replicas • W = write replicas
    18. Consistency • N, R, W • N = number of replicas • R = read replicas • W = write replicas • send request, wait for specified number
    19. Consistency • N, R, W • N = number of replicas • R = read replicas • W = write replicas • send request, wait for specified number • wait for others in background and perform read-repair
    20. Consistency Levels • ZERO • ONE • QUORUM • ALL
    21. Strong Consistency • If W + R > N, you will have consistency • W=1, R=N • W=N, R=1 • W=Q, R=Q where Q = N / 2 + 1 (worked sketch after the transcript)
    22. Eventuality • Hinted Handoff • Read Repair • Proactive Repair (Merkle trees)
    23. Potential Consistency
    24. Potential Consistency • causes • write-through caching • master-slave replication failures
    25. Example
    26. Read Repair • send read to all replicas • if they differ, resolve conflicts and update (in background)
    27. Hinted Handoff • A wants to write to B • B is down • A tells C, "when B is back, send them this update"
    28. Proactive Repair • use Merkle trees to find inconsistencies • resolve conflicts • send repaired data • triggered manually
    29. Parallel Deployment
    30. How we’re moving • parallel deployments • incremental traffic shifting
    31. Parallel Deployment 1. build new implementation 2. integrate it alongside the existing one 3. ...with switches to dynamically move/mirror traffic (sketched after the transcript) 4. turn up traffic 5. break something 6. fix it 7. GOTO 4
    32. ?
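
    A worked sketch of the N, R, W arithmetic and the repair paths above,
    on a toy three-node "cluster". Node names, timestamps, and the failure
    scenario are invented for illustration; this is not Cassandra's
    implementation:

        N = 3
        Q = N // 2 + 1                      # quorum: Q = 2 when N = 3

        def consistent(w, r, n=N):
            # strong consistency whenever write and read sets must overlap
            return w + r > n

        assert consistent(1, N) and consistent(N, 1) and consistent(Q, Q)
        assert not consistent(1, 1)         # W=1, R=1 may read stale data

        # each replica holds (timestamp, value)
        replicas = {"A": (1, "old"), "B": (2, "new"), "C": (1, "old")}
        hints = []                          # hinted-handoff queue

        def read(r=Q):
            # wait for r replies, resolve by newest timestamp, repair losers
            replies = {node: replicas[node] for node in list(replicas)[:r]}
            ts, value = max(replies.values())
            for node, reply in replies.items():
                if reply != (ts, value):
                    replicas[node] = (ts, value)    # read repair
            return value

        def write(value, ts, w=Q):
            acked = 0
            for node in replicas:
                if node == "C":                     # pretend C is down
                    hints.append((node, ts, value)) # a peer keeps a hint
                    continue
                replicas[node] = (ts, value)
                acked += 1
            assert acked >= w, "not enough replicas acked the write"

        def replay_hints():
            # when the dead node returns, peers deliver the queued updates
            while hints:
                node, ts, value = hints.pop()
                if ts > replicas[node][0]:
                    replicas[node] = (ts, value)

        print(read())           # 'new' -- A was stale and gets read-repaired
        write("newest", ts=3)   # C misses the write, a hint is queued
        replay_hints()          # C catches up once it is back
        print(replicas)         # all three replicas now agree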
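
    And a hypothetical sketch of the "switches" in step 3 of the parallel
    deployment: runtime flags that mirror writes and ramp reads toward the
    new store (the switch names and stores are invented):

        import random

        switches = {"mirror_writes": True, "read_from_new_pct": 10}

        def write(key, value, old_store, new_store):
            old_store[key] = value              # legacy path stays canonical
            if switches["mirror_writes"]:
                try:
                    new_store[key] = value      # mirrored write; its failures
                except Exception:               # must never break the old path
                    pass

        def read(key, old_store, new_store):
            # shift an adjustable slice of read traffic to the new store
            if random.randrange(100) < switches["read_from_new_pct"]:
                return new_store.get(key, old_store.get(key))
            return old_store.get(key)

        old, new = {}, {}
        write("status:1", "hello", old, new)
        print(read("status:1", old, new))       # 'hello' from either store

        switches["read_from_new_pct"] = 50      # "turn up traffic" (step 4)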
