Spotify Cassandra London

A presentation from the Cassandra Europe conference about Cassandra use at Spotify.

Cassandra at Spotify, 28th of March 2012

About this talk
- An introduction to Spotify, our service and our persistent storage needs
- What Cassandra brings
- What we have learned
- What I would have liked to have known a year ago
- Not a comparison between different NoSQL solutions
- The real reason: yes, we are hiring.

Noa Resare
- Stockholm, Sweden
- Service Reliability Engineering
- noa@spotify.com
- @blippie

Spotify — all music, all the time
- A better user experience than file sharing.
- Native desktop and mobile clients.
- Custom backend, built for performance and scalability.
- 13 markets. More than ten million users.
- 3 datacenters.
- Tens of gigabits of data pushed per datacenter.
- Backend systems that support a large set of innovative features.

Innovative features in practice
- Playlist
- A named list of tracks
- Keep multiple devices in sync
- Support nested playlists
- Offline editing, pubsub
- Scale: more than half a billion lists currently in the system
- About 10 kHz at peak traffic.
- Result: we accidentally implemented a version control system (a sketch follows below).

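The sketch below is a toy illustration (not Spotify's actual schema or code) of why a synced playlist ends up looking like a version control system: a head version plus an append-only log of changes keyed by version, so any device can catch up from the last version it saw. The class and method names are made up for this example.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.SortedMap;
    import java.util.TreeMap;

    class PlaylistHistorySketch {
        static class Change {
            final long version;
            final String description; // e.g. "insert track X at position 3"

            Change(long version, String description) {
                this.version = version;
                this.description = description;
            }
        }

        private long headVersion = 0;
        private final SortedMap<Long, Change> changeLog = new TreeMap<Long, Change>();

        // Every edit bumps the head version and appends to the log.
        void apply(String description) {
            headVersion++;
            changeLog.put(headVersion, new Change(headVersion, description));
        }

        // "Get me all changes since version N" - the core sync operation.
        List<Change> changesSince(long version) {
            return new ArrayList<Change>(changeLog.tailMap(version + 1).values());
        }
    }
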
Suggested solutions
- Flat files
  - We don’t need ACID
  - Linux page cache kicks ass. (Not really)
- SQL
  - Tried and true. Facebook does this
- Simple key-value store
  - Tokyo Cabinet, some experience
- Clustered key-value store
  - Evaluated a lot; the end-game contestants were HBase and Cassandra

Enter Cassandra
- Solves a large subset of storage-related problems
- Sharding, replication
- No single point of failure
- Free software
- Active community, commercial backing
- 66 + 18 + 9 + 28 production nodes
- About twenty nodes for various testing clusters
- Datasets ranging from 8 TB down to a few gigabytes.

Cassandra, winning!
- Major upgrades without service interruptions (in theory)
- Crazy fast writes
- Not just because you have a hardware RAID card that is good at lying to you
- Uses the knowledge that sequential I/O is faster than random I/O
- In case of inconsistencies, knows what to do
- Cross-datacenter replication support
- Tinker friendly
- Readable code

Cassandra flexibility for Playlist
- The main use cases for playlist:
  - Get me all changes since version N of playlist P
  - Apply the following changes on top of version M of playlist Q
- This translates to two column families, head and change
- Asymmetric sizes
- Neat trick: read change with level=ONE, fall back to LOCAL_QUORUM (a sketch follows below)

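Here is a minimal sketch of that fallback trick. It is not Spotify's code and it does not use the real Hector API; PlaylistStore, Change and the Consistency enum are hypothetical stand-ins. The point is only the two-step consistency dance: try a cheap read against a single replica, and only escalate to LOCAL_QUORUM if the answer looks incomplete.

    import java.util.List;

    class ChangeReader {
        enum Consistency { ONE, LOCAL_QUORUM }

        static class Change { }

        // Hypothetical client interface over the change column family.
        interface PlaylistStore {
            List<Change> changesSince(String playlist, long version, Consistency level);
        }

        private final PlaylistStore store;

        ChangeReader(PlaylistStore store) {
            this.store = store;
        }

        List<Change> changesSince(String playlist, long version) {
            // Fast path: a single replica is usually enough, and much cheaper.
            List<Change> changes = store.changesSince(playlist, version, Consistency.ONE);
            if (changes != null && !changes.isEmpty()) {
                return changes;
            }
            // The replica we hit may not have seen the newest writes yet;
            // retry against a local quorum before concluding nothing changed.
            return store.changesSince(playlist, version, Consistency.LOCAL_QUORUM);
        }
    }
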
Let me tell you a story
- Latest stable kernel from Debian Squeeze, 2.6.32-5
- What happens after 209 days of uptime?
- Load average around 120.
- No CPU activity reported by top
- Mattias de Zalenski: log((209 days) / (1 nanosecond)) / log(2) = 54.0034557; 2^54 nanoseconds = 208.499983 days
- Somewhere, nanosecond values are shifted ten bits?
- Downtime for payment
- Downtime for account creation
- No downtime for Cassandra-backed systems

Backups
- A few terabytes of live data, many nodes. Painful.
- Inefficient: a copy of the on-disk structure is at least 3 times the data
- Non-compacted. Possibly a few tens of old versions.
- Pulling data off nodes evicts hot data from the page cache.
- Initially, only full backups (pre 0.8)

Our solution to backups
- NetworkTopologyStrategy is cool
- Separate datacenter for backups with RF=1 (a sketch of the layout follows below)
- Beware: tricky
- Once removed from production performance considerations
- Application-level incremental backups
- This week, Cassandra-level incremental backups
- Still some issues: lots of SSTables

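As a concrete (and entirely illustrative) picture of that layout: NetworkTopologyStrategy takes a replica count per datacenter, and giving a dedicated backup datacenter RF=1 means every row has exactly one copy there to snapshot. The datacenter names below are made up, and the keyspace itself would be created with these options through whatever client or CLI you use; this sketch only prints the layout.

    import java.util.LinkedHashMap;
    import java.util.Map;

    class BackupTopologySketch {
        public static void main(String[] args) {
            Map<String, Integer> replicasPerDatacenter = new LinkedHashMap<String, Integer>();
            replicasPerDatacenter.put("production-1", 3); // serves live traffic
            replicasPerDatacenter.put("production-2", 3); // serves live traffic
            replicasPerDatacenter.put("backup", 1);       // one copy of each row, snapshots only

            int total = 0;
            for (Map.Entry<String, Integer> dc : replicasPerDatacenter.entrySet()) {
                System.out.println(dc.getKey() + ": " + dc.getValue() + " replica(s) of each row");
                total += dc.getValue();
            }
            System.out.println("Copies of each row across the cluster: " + total);
        }
    }
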
Solid state is a game changer
- Asymmetrically sized datasets
- I Can Haz superlarge SSD? No.
- With small disks, on-disk data structure size matters a lot
- Our plan:
  - Leveled compaction strategy, new in 1.0
  - Hack Cassandra to have configurable data directories per keyspace.
  - Our patch is integrated in Cassandra 1.1

Some unpleasant surprises
- Immaturity. Has anyone written nodetool -h ring?
- Broken on-disk bloom filters in 0.8. Very painful upgrade to 1.0
- Small disks, high load: very possible to get into an out-of-disk condition
- Logging is lacking

Lessons learned from backup datacenter
- Asymmetric cluster sizes are painful.
- 60 production nodes, 6 backup nodes
- Repairs that replicate all data 10 times
- The workaround: manual repairs
  - Remove SSTables from the broken node (to free up space)
  - Start it, to have it take writes while repopulating
  - Snapshot and move SSTables from 4 evenly spaced nodes
  - Do a full compaction
  - Do a repair and hope for the best

Spot the bug
- Hector Java Cassandra driver:

    private AtomicInteger counter = new AtomicInteger();

    private Server getNextServer() {
        counter.compareAndSet(16384, 0);
        return servers[counter.getAndIncrement() % servers.length];
    }

- Race condition
- java.lang.ArrayIndexOutOfBoundsException
- After close to 2**31 requests
- Took a few days (a corrected selector is sketched below)

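For reference, one way to make that round-robin selection overflow-safe is sketched below. This is not the actual Hector fix, just an illustration: masking off the sign bit keeps the index non-negative even after the counter wraps past Integer.MAX_VALUE, so the racy compareAndSet reset is no longer needed. The Server stub and servers array stand in for Hector's internals.

    import java.util.concurrent.atomic.AtomicInteger;

    class RoundRobinSketch {
        static class Server { }

        private final Server[] servers = { new Server(), new Server(), new Server() };
        private final AtomicInteger counter = new AtomicInteger();

        Server getNextServer() {
            // Drop the sign bit so the index stays >= 0 even after integer overflow.
            int next = counter.getAndIncrement() & Integer.MAX_VALUE;
            return servers[next % servers.length];
        }
    }
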
Thrift payload size limits
- Communication with Cassandra is based on Thrift
- Large mutations, larger than 15 MiB
- Thrift drops the underlying TCP connection
- Hector considers the connection drop a node-specific problem
- Retries on all Cassandra nodes
- Effectively shutting down all Cassandra traffic (a client-side chunking sketch follows below)

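One client-side mitigation (not described in the talk, just an illustration) is to keep each mutation batch well under the payload limit by splitting it before it ever reaches Thrift. The generic chunker below is a sketch: sizeOf is a caller-supplied estimate of the serialized size of one item, and the 1 MiB default is an arbitrary safety margin far below the ~15 MiB danger zone.

    import java.util.ArrayList;
    import java.util.List;

    class MutationChunker {
        interface SizeEstimator<T> {
            long sizeInBytes(T item);
        }

        static final long MAX_BATCH_BYTES = 1L << 20; // ~1 MiB per batch by default

        static <T> List<List<T>> chunk(List<T> items, SizeEstimator<T> sizeOf) {
            return chunk(items, sizeOf, MAX_BATCH_BYTES);
        }

        // Split items into batches whose estimated serialized size stays under maxBytes.
        static <T> List<List<T>> chunk(List<T> items, SizeEstimator<T> sizeOf, long maxBytes) {
            List<List<T>> batches = new ArrayList<List<T>>();
            List<T> current = new ArrayList<T>();
            long currentBytes = 0;
            for (T item : items) {
                long size = sizeOf.sizeInBytes(item);
                if (!current.isEmpty() && currentBytes + size > maxBytes) {
                    batches.add(current);
                    current = new ArrayList<T>();
                    currentBytes = 0;
                }
                current.add(item);
                currentBytes += size;
            }
            if (!current.isEmpty()) {
                batches.add(current);
            }
            return batches;
        }
    }
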
Conclusions
- In the 0.6-1.0 timeframe, developers and operations engineers are needed
- You need to keep an eye on bugs created, and be part of the community
- Exotic stuff (such as asymmetrically sized datacenters) is tricky
- Lots of things get fixed. You need to keep up with upstream
- You need to integrate with monitoring and graphing
- Consider it a toolkit for constructing solutions.

Questions? Answers.
