Cassandra at Spotify




                   28th of March 2012
About this talk
 An introduction to Spotify, our service, and our persistent storage needs
 What Cassandra brings
 What we have learned
 What I would have liked to have known a year ago
 Not a comparison between different NoSQL solutions
 The real reason: yes, we are hiring.
Noa Resare
 Stockholm, Sweden
 Service Reliability Engineering
 noa@spotify.com
 @blippie
Spotify — all music, all the time
 A better user experience than file sharing.
 Native desktop and mobile clients.
 Custom backend, built for performance and scalability.

 13 markets. More than ten million users.
 3 datacenters.
 Tens of gigabits of data pushed per datacenter.
 Backend systems that support a large set of innovative features.
Innovative features in practice
  Playlist
  A named list of tracks
  Keep multiple devices in sync
  Support nested playlists
  Offline editing, pubsub
  Scale: more than half a billion lists currently in the system
  About 10 kHz (10,000 requests per second) at peak traffic
  Result: we accidentally implemented a version control system
Suggested solutions
 Flat files
   We don’t need ACID
   Linux page cache kicks ass. (Not really)
 SQL
   Tried and true. Facebook does this.
 Simple key-value store
   Tokyo Cabinet; some prior experience
 Clustered key-value store
   Evaluated a lot; the final contenders were HBase and Cassandra
Enter Cassandra
 Solves a large subset of storage-related problems
 Sharding, replication
 No single point of failure
 Free software
 Active community, commercial backing
 66 + 18 + 9 + 28 production nodes
 About twenty nodes for various testing clusters
 Datasets ranging from a few gigabytes to 8 TB
Cassandra, winning!
 Major upgrades without service interruptions (in theory)
 Crazy fast writes
   Not just because you have a hardware RAID card that is good at lying to you
   Exploits the fact that sequential I/O is faster than random I/O
 In case of inconsistencies, knows what to do
 Cross-datacenter replication support
 Tinker friendly
   Readable code
Cassandra flexibility for Playlist
 The main use cases for playlist:
   Get me all changes since version N of playlist P
   Apply the following changes on top of version M of playlist Q
 This translates to two column families: head and change
 Asymmetric sizes
 Neat trick: read change with consistency level ONE, fall back to LOCAL_QUORUM
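The fallback trick above can be sketched as follows. This is a hypothetical illustration, not Spotify's actual code: the Cassandra client call is abstracted behind a `Function`, and `ConsistencyLevel` here is a local stand-in for the driver's enum.

```java
// Hypothetical sketch of the ONE -> LOCAL_QUORUM read fallback.
import java.util.Optional;
import java.util.function.Function;

enum ConsistencyLevel { ONE, LOCAL_QUORUM }

final class FallbackReader {
    // Try the cheap read first; only pay the quorum price when it misses.
    static Optional<String> readWithFallback(
            Function<ConsistencyLevel, Optional<String>> read) {
        Optional<String> fast = read.apply(ConsistencyLevel.ONE);
        return fast.isPresent() ? fast : read.apply(ConsistencyLevel.LOCAL_QUORUM);
    }
}
```

The appeal is that the common case costs a single-replica read, and the quorum read is only issued for the occasional replica miss.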
Let me tell you a story
 Latest stable kernel from Debian Squeeze, 2.6.32-5
 What happens after 209 days of uptime?
 Load average around 120
 No CPU activity reported by top
 Mattias de Zalenski:

 log((209 days) / (1 nanosecond)) / log(2) = 54.0034557

 (2^54) nanoseconds = 208.499983 days

 Somewhere, nanosecond values are shifted ten bits?

 Downtime for payments
 Downtime for account creation
 No downtime for Cassandra-backed systems
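The arithmetic on the slide is easy to verify: a nanosecond counter that effectively loses its top ten bits (54 usable bits out of 64) wraps after roughly 208.5 days, matching the observed ~209-day uptime.

```java
// Check the slide's back-of-the-envelope math: 2^54 nanoseconds in days.
final class OverflowMath {
    static double daysUntilOverflow() {
        double ns = Math.scalb(1.0, 54); // 2^54 nanoseconds
        return ns / 1e9 / 86_400.0;      // ns -> seconds -> days
    }
}
```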
Backups
 A few terabytes of live data, many nodes. Painful.
 Inefficient: a copy of the on-disk structure is at least 3 times the data
 Non-compacted: possibly a few tens of old versions
 Pulling data off nodes evicts hot data from the page cache
 Initially, only full backups (pre-0.8)
Our solution to backups
 NetworkTopologyStrategy is cool
 Separate datacenter for backups, with RF=1
 Beware: tricky
 Once removed from production performance considerations
 Application-level incremental backups
 As of this week, Cassandra-level incremental backups
 Still some issues: lots of SSTables
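Expressed as a keyspace definition, the layout above looks roughly like this (modern CQL syntax shown for clarity; the datacenter names are invented for illustration):

```sql
CREATE KEYSPACE playlist
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'main': 3,    -- production datacenter, full replication
    'backup': 1   -- backup "datacenter", a single copy of each row
  };
```

With RF=1 in the backup datacenter, each row lands on exactly one backup node, so the backup cluster holds one compacted copy of the data instead of three.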
Solid state is a game changer
 Asymmetrically sized datasets
 I Can Haz superlarge SSD? No.
 With small disks, on-disk data structure size matters a lot
 Our plan:
   Leveled compaction strategy, new in 1.0
   Hack Cassandra to support configurable data directories per keyspace
   Our patch is integrated in Cassandra 1.1
Some unpleasant surprises
 Immaturity
   Has anyone written nodetool -h ring?
 Broken on-disk bloom filters in 0.8; a very painful upgrade to 1.0
 Small disks plus high load make an out-of-disk condition very possible
 Logging is lacking
Lessons learned from backup datacenter
 Asymmetric cluster sizes are painful
 60 production nodes, 6 backup nodes
 Repairs that replicate all data 10 times
 The workaround: manual repairs
   Remove SSTables from the broken node (to free up space)
   Start it so it takes writes while repopulating
   Snapshot and move SSTables from 4 evenly spaced nodes
   Do a full compaction
   Do a repair and hope for the best
Spot the bug
 The Hector Java Cassandra driver:
 private AtomicInteger counter = new AtomicInteger();

 private Server getNextServer() {
     counter.compareAndSet(16384, 0);
     return servers[counter.getAndIncrement() % servers.length];
 }


 Race condition
 java.lang.ArrayIndexOutOfBoundsException
 After close to 2**31 requests
 Took a few days
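For contrast, one way to make the selector overflow-safe (a sketch, not the actual Hector fix): drop the racy reset entirely and derive a non-negative index from the wrapping counter.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of an overflow-safe round-robin server selector.
final class RoundRobin {
    private final AtomicInteger counter = new AtomicInteger();
    private final String[] servers;

    RoundRobin(String... servers) { this.servers = servers; }

    String next() {
        // getAndIncrement wraps to Integer.MIN_VALUE after 2^31-1 calls;
        // Math.floorMod stays non-negative even for negative inputs,
        // so the index can never go out of bounds.
        int i = Math.floorMod(counter.getAndIncrement(), servers.length);
        return servers[i];
    }
}
```

The original `compareAndSet(16384, 0)` only resets when it happens to observe exactly 16384; under concurrency the counter can skip past that value, run up to `Integer.MAX_VALUE`, wrap negative, and make `%` return a negative index.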
Thrift payload size limits
 Communication with Cassandra is based on Thrift
 On large mutations (larger than 15 MiB), Thrift drops the underlying TCP connection
 Hector considers the connection drop a node-specific problem
 Retries on all Cassandra nodes
 Effectively shutting down all Cassandra traffic
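A generic defensive measure against this failure mode is to cap mutation payload size on the client side. A minimal sketch, assuming mutations already serialized to byte arrays and a configurable byte limit (both assumptions for illustration; this is not Hector or Spotify code):

```java
import java.util.ArrayList;
import java.util.List;

// Split a batch of serialized mutations into chunks that each stay
// under a transport payload limit (e.g. well below the ~15 MiB that
// caused the Thrift connection drops described above).
final class MutationChunker {
    static List<List<byte[]>> chunk(List<byte[]> mutations, long maxBytes) {
        List<List<byte[]>> chunks = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long size = 0;
        for (byte[] m : mutations) {
            // Flush the current chunk before it would exceed the limit.
            // A single oversize mutation is still sent alone rather than dropped.
            if (!current.isEmpty() && size + m.length > maxBytes) {
                chunks.add(current);
                current = new ArrayList<>();
                size = 0;
            }
            current.add(m);
            size += m.length;
        }
        if (!current.isEmpty()) chunks.add(current);
        return chunks;
    }
}
```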
Conclusions
 In the 0.6–1.0 timeframe, developers and operations engineers are needed
 You need to keep an eye on bugs created, and be part of the community
 Exotic stuff (such as asymmetrically sized datacenters) is tricky
 Lots of things get fixed; you need to keep up with upstream
 You need to integrate with monitoring and graphing
 Consider it a toolkit for constructing solutions.
Questions? Answers.

Edge AI and Vision Alliance
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 

Recently uploaded (20)

Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 

Spotify cassandra london

  • 1. Cassandra at Spotify 28th of March 2012
  • 3. About this talk An introduction to Spotify, our service and our persistent storage needs
  • 4. About this talk An introduction to Spotify, our service and our persistent storage needs What Cassandra brings
  • 5. About this talk An introduction to Spotify, our service and our persistent storage needs What Cassandra brings What we have learned
  • 6. About this talk An introduction to Spotify, our service and our persistent storage needs What Cassandra brings What we have learned What I would have liked to have known a year ago
  • 7. About this talk An introduction to Spotify, our service and our persistent storage needs What Cassandra brings What we have learned What I would have liked to have known a year ago Not a comparison between different NoSQL solutions
  • 8. About this talk An introduction to Spotify, our service and our persistent storage needs What Cassandra brings What we have learned What I would have liked to have known a year ago Not a comparison between different NoSQL solutions The real reason: yes, we are hiring.
  • 11. Noa Resare Stockholm, Sweden Service Reliability Engineering
  • 12. Noa Resare Stockholm, Sweden Service Reliability Engineering noa@spotify.com
  • 13. Noa Resare Stockholm, Sweden Service Reliability Engineering noa@spotify.com @blippie
  • 14. Spotify — all music, all the time
  • 15. Spotify — all music, all the time A better user experience than file sharing.
  • 16. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients.
  • 17. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients. Custom backend, built for performance and scalability.
  • 18. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients. Custom backend, built for performance and scalability.
  • 19. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients. Custom backend, built for performance and scalability. 13 markets. More than ten million users.
  • 20. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients. Custom backend, built for performance and scalability. 13 markets. More than ten million users. 3 datacenters.
  • 21. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients. Custom backend, built for performance and scalability. 13 markets. More than ten million users. 3 datacenters. Tens of gigabits of data pushed per datacenter.
  • 22. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients. Custom backend, built for performance and scalability. 13 markets. More than ten million users. 3 datacenters. Tens of gigabits of data pushed per datacenter. Backend systems that support a large set of innovative features.
  • 24. Innovative features in practice Playlist
  • 25. Innovative features in practice Playlist A named list of tracks
  • 26. Innovative features in practice Playlist A named list of tracks Keep multiple devices in sync
  • 27. Innovative features in practice Playlist A named list of tracks Keep multiple devices in sync Support nested playlists
  • 28. Innovative features in practice Playlist A named list of tracks Keep multiple devices in sync Support nested playlists Offline editing, pubsub
  • 29. Innovative features in practice Playlist A named list of tracks Keep multiple devices in sync Support nested playlists Offline editing, pubsub Scale. More than half a billion lists currently in the system
  • 30. Innovative features in practice Playlist A named list of tracks Keep multiple devices in sync Support nested playlists Offline editing, pubsub Scale. More than half a billion lists currently in the system About 10 kHz at peak traffic.
  • 31. Innovative features in practice Playlist A named list of tracks Keep multiple devices in sync Support nested playlists Offline editing, pubsub Scale. More than half a billion lists currently in the system About 10 kHz at peak traffic. Result: accidentally implemented a VCS
  • 34. Suggested solutions Flat files We don’t need ACID
  • 35. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass.
  • 36. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really)
  • 37. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really) SQL
  • 38. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really) SQL Tried and true. Facebook does this
  • 39. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really) SQL Tried and true. Facebook does this Simple Key-Value store
  • 40. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really) SQL Tried and true. Facebook does this Simple Key-Value store Tokyo cabinet, some experience
  • 41. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really) SQL Tried and true. Facebook does this Simple Key-Value store Tokyo cabinet, some experience Clustered Key-Value store
  • 42. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really) SQL Tried and true. Facebook does this Simple Key-Value store Tokyo cabinet, some experience Clustered Key-Value store Evaluated a lot, end game contestants HBase and Cassandra
  • 44. Enter Cassandra Solves a large subset of storage related problems
  • 45. Enter Cassandra Solves a large subset of storage related problems Sharding, replication
  • 46. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure
  • 47. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Free software
  • 48. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Free software Active community, commercial backing
  • 49. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Free software Active community, commercial backing 66 + 18 + 9 + 28 production nodes
  • 50. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Free software Active community, commercial backing 66 + 18 + 9 + 28 production nodes About twenty nodes for various testing clusters
  • 51. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Free software Active community, commercial backing 66 + 18 + 9 + 28 production nodes About twenty nodes for various testing clusters Datasets ranging from 8T to a few gigs.
  • 53. Cassandra, winning! Major upgrades without service interruptions (in theory)
  • 54. Cassandra, winning! Major upgrades without service interruptions (in theory) Crazy fast writes
  • 55. Cassandra, winning! Major upgrades without service interruptions (in theory) Crazy fast writes Not just because you have a hardware RAID card that is good at lying to you
  • 56. Cassandra, winning! Major upgrades without service interruptions (in theory) Crazy fast writes Not just because you have a hardware RAID card that is good at lying to you Uses the knowledge that sequential I/O is faster than random I/O
  • 57. Cassandra, winning! Major upgrades without service interruptions (in theory) Crazy fast writes Not just because you have a hardware RAID card that is good at lying to you Uses the knowledge that sequential I/O is faster than random I/O In case of inconsistencies, knows what to do
  • 58. Cassandra, winning! Major upgrades without service interruptions (in theory) Crazy fast writes Not just because you have a hardware RAID card that is good at lying to you Uses the knowledge that sequential I/O is faster than random I/O In case of inconsistencies, knows what to do Cross datacenter replication support
  • 59. Cassandra, winning! Major upgrades without service interruptions (in theory) Crazy fast writes Not just because you have a hardware RAID card that is good at lying to you Uses the knowledge that sequential I/O is faster than random I/O In case of inconsistencies, knows what to do Cross datacenter replication support Tinker friendly
  • 60. Cassandra, winning! Major upgrades without service interruptions (in theory) Crazy fast writes Not just because you have a hardware RAID card that is good at lying to you Uses the knowledge that sequential I/O is faster than random I/O In case of inconsistencies, knows what to do Cross datacenter replication support Tinker friendly Readable code
  • 62. Cassandra flexibility for Playlist The main use cases for playlist:
  • 63. Cassandra flexibility for Playlist The main use cases for playlist: Get me all changes since version N of playlist P
  • 64. Cassandra flexibility for Playlist The main use cases for playlist: Get me all changes since version N of playlist P Apply the following changes on top of version M of playlist Q
  • 65. Cassandra flexibility for Playlist The main use cases for playlist: Get me all changes since version N of playlist P Apply the following changes on top of version M of playlist Q This translates to CFs head and change
  • 66. Cassandra flexibility for Playlist The main use cases for playlist: Get me all changes since version N of playlist P Apply the following changes on top of version M of playlist Q This translates to CFs head and change Asymmetric sizes
  • 67. Cassandra flexibility for Playlist The main use cases for playlist: Get me all changes since version N of playlist P Apply the following changes on top of version M of playlist Q This translates to CFs head and change Asymmetric sizes Neat trick: read change with level=ONE, fallback to LOCAL_QUORUM
  • 68. Cassandra flexibility for Playlist The main use cases for playlist: Get me all changes since version N of playlist P Apply the following changes on top of version M of playlist Q This translates to CFs head and change Asymmetric sizes Neat trick: read change with level=ONE, fallback to LOCAL_QUORUM
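The consistency-level trick above can be sketched roughly like this, against a hypothetical one-method client interface (Hector's real API differs): read at consistency level ONE first, and only fall back to the more expensive LOCAL_QUORUM read if the cheap read is behind the version the caller already knows exists.

```java
// Hedged sketch; PlaylistStore, FallbackReader and Playlist are
// illustrative names, not Hector or Spotify classes.
enum Consistency { ONE, LOCAL_QUORUM }

class Playlist {
    final String id;
    final long version;
    Playlist(String id, long version) { this.id = id; this.version = version; }
}

interface PlaylistStore {
    // Single-method store; a lambda works as a stub in tests.
    Playlist read(String id, Consistency level);
}

class FallbackReader {
    private final PlaylistStore store;
    FallbackReader(PlaylistStore store) { this.store = store; }

    Playlist readAtLeast(String id, long minVersion) {
        Playlist p = store.read(id, Consistency.ONE); // fast path: one replica
        if (p != null && p.version >= minVersion) {
            return p;                                 // that replica was fresh enough
        }
        return store.read(id, Consistency.LOCAL_QUORUM); // stronger, slower read
    }
}
```

The point of the trick is that most reads hit a replica that is already up to date, so the quorum round-trip is only paid when the cheap read is demonstrably stale.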
  • 69. Let me tell you a story
  • 70. Let me tell you a story Latest stable kernel from Debian Squeeze 2.6.32-5
  • 71. Let me tell you a story Latest stable kernel from Debian Squeeze 2.6.32-5 What happens after 209 days of uptime?
  • 72. Let me tell you a story Latest stable kernel from Debian Squeeze 2.6.32-5 What happens after 209 days of uptime? Load average around 120.
  • 73. Let me tell you a story Latest stable kernel from Debian Squeeze 2.6.32-5 What happens after 209 days of uptime? Load average around 120. No CPU activity reported by top
  • 74. Let me tell you a story Latest stable kernel from Debian Squeeze 2.6.32-5 What happens after 209 days of uptime? Load average around 120. No CPU activity reported by top Mattias de Zalenski: log((209 days) / (1 nanoseconds)) / log(2) = 54.0034557 (2^54) nanoseconds = 208.499983 days Somewhere nanosecond values are shifted ten bits?
  • 75. Let me tell you a story Latest stable kernel from Debian Squeeze 2.6.32-5 What happens after 209 days of uptime? Load average around 120. No CPU activity reported by top Mattias de Zalenski: log((209 days) / (1 nanoseconds)) / log(2) = 54.0034557 (2^54) nanoseconds = 208.499983 days Somewhere nanosecond values are shifted ten bits? Downtime for payment
  • 76. Let me tell you a story Latest stable kernel from Debian Squeeze 2.6.32-5 What happens after 209 days of uptime? Load average around 120. No CPU activity reported by top Mattias de Zalenski: log((209 days) / (1 nanoseconds)) / log(2) = 54.0034557 (2^54) nanoseconds = 208.499983 days Somewhere nanosecond values are shifted ten bits? Downtime for payment Downtime for account creation
  • 77. Let me tell you a story Latest stable kernel from Debian Squeeze 2.6.32-5 What happens after 209 days of uptime? Load average around 120. No CPU activity reported by top Mattias de Zalenski: log((209 days) / (1 nanoseconds)) / log(2) = 54.0034557 (2^54) nanoseconds = 208.499983 days Somewhere nanosecond values are shifted ten bits? Downtime for payment Downtime for account creation No downtime for cassandra backed systems
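The arithmetic behind the 209-day observation is worth spelling out: if the kernel's nanosecond clock is effectively shifted by ten bits somewhere, trouble starts once bit 54 of the counter is set, and 2^54 nanoseconds is almost exactly 209 days of uptime.

```java
// Verifies the slide's calculation: 2^54 ns converted to days.
class UptimeOverflow {
    public static void main(String[] args) {
        double ns = Math.pow(2, 54);       // 2^54 nanoseconds
        double days = ns / 1e9 / 86400.0;  // ns -> seconds -> days
        System.out.printf("2^54 ns = %.6f days%n", days); // ~208.5 days
    }
}
```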
  • 79. Backups A few terabytes of live data, many nodes. Painful.
  • 80. Backups A few terabytes of live data, many nodes. Painful. Inefficient. Copy of on disk structure, at least 3 times the data
  • 81. Backups A few terabytes of live data, many nodes. Painful. Inefficient. Copy of on disk structure, at least 3 times the data Non-compacted. Possibly a few tens of old versions.
  • 82. Backups A few terabytes of live data, many nodes. Painful. Inefficient. Copy of on disk structure, at least 3 times the data Non-compacted. Possibly a few tens of old versions. Pulling data off nodes evicts hot data from the page cache.
  • 83. Backups A few terabytes of live data, many nodes. Painful. Inefficient. Copy of on disk structure, at least 3 times the data Non-compacted. Possibly a few tens of old versions. Pulling data off nodes evicts hot data from the page cache. Initially, only full backups (pre 0.8)
  • 84. Our solution to backups
  • 85. Our solution to backups NetworkTopologyStrategy is cool
  • 86. Our solution to backups NetworkTopologyStrategy is cool Separate datacenter for backups with RF=1
  • 87. Our solution to backups NetworkTopologyStrategy is cool Separate datacenter for backups with RF=1 Beware: tricky
  • 88. Our solution to backups NetworkTopologyStrategy is cool Separate datacenter for backups with RF=1 Beware: tricky Once removed from production performance considerations
  • 89. Our solution to backups NetworkTopologyStrategy is cool Separate datacenter for backups with RF=1 Beware: tricky Once removed from production performance considerations Application level incremental backups
  • 90. Our solution to backups NetworkTopologyStrategy is cool Separate datacenter for backups with RF=1 Beware: tricky Once removed from production performance considerations Application level incremental backups This week, cassandra level incremental backups
  • 91. Our solution to backups NetworkTopologyStrategy is cool Separate datacenter for backups with RF=1 Beware: tricky Once removed from production performance considerations Application level incremental backups This week, cassandra level incremental backups Still some issues: lots of SSTables
  • 92. Solid state is a game changer
  • 93. Solid state is a game changer Asymmetrically sized datasets
  • 94. Solid state is a game changer Asymmetrically sized datasets I Can Haz superlarge SSD?
  • 95. Solid state is a game changer Asymmetrically sized datasets I Can Haz superlarge SSD? No.
  • 96. Solid state is a game changer Asymmetrically sized datasets I Can Haz superlarge SSD? No. With small disks, on disk data structure size matters a lot
  • 97. Solid state is a game changer Asymmetrically sized datasets I Can Haz superlarge SSD? No. With small disks, on disk data structure size matters a lot Our plan:
  • 98. Solid state is a game changer Asymmetrically sized datasets I Can Haz superlarge SSD? No. With small disks, on disk data structure size matters a lot Our plan: Leveled compaction strategy, new in 1.0
  • 99. Solid state is a game changer Asymmetrically sized datasets I Can Haz superlarge SSD? No. With small disks, on disk data structure size matters a lot Our plan: Leveled compaction strategy, new in 1.0 Hack cassandra to have configurable datadirs per keyspace.
  • 100. Solid state is a game changer Asymmetrically sized datasets I Can Haz superlarge SSD? No. With small disks, on disk data structure size matters a lot Our plan: Leveled compaction strategy, new in 1.0 Hack cassandra to have configurable datadirs per keyspace. Our patch is integrated in Cassandra 1.1
  • 103. Some unpleasant surprises Immaturity. Has anyone written nodetool -h ring?
  • 104. Some unpleasant surprises Immaturity. Has anyone written nodetool -h ring? Broken on disk bloom filters in 0.8. Very painful upgrade to 1.0
  • 105. Some unpleasant surprises Immaturity. Has anyone written nodetool -h ring? Broken on disk bloom filters in 0.8. Very painful upgrade to 1.0 Small disk, high load, very possible to get into an Out Of Disk condition
  • 106. Some unpleasant surprises Immaturity. Has anyone written nodetool -h ring? Broken on disk bloom filters in 0.8. Very painful upgrade to 1.0 Small disk, high load, very possible to get into an Out Of Disk condition Logging is lacking
  • 107. Lessons learned from backup datacenter
  • 108. Lessons learned from backup datacenter Asymmetric cluster sizes are painful.
  • 109. Lessons learned from backup datacenter Asymmetric cluster sizes are painful. 60 production nodes, 6 backup nodes
  • 110. Lessons learned from backup datacenter Asymmetric cluster sizes are painful. 60 production nodes, 6 backup nodes Repairs that replicate all data 10 times
  • 111. Lessons learned from backup datacenter Asymmetric cluster sizes are painful. 60 production nodes, 6 backup nodes Repairs that replicate all data 10 times The workaround: manual repairs
  • 112. Lessons learned from backup datacenter Asymmetric cluster sizes are painful. 60 production nodes, 6 backup nodes Repairs that replicate all data 10 times The workaround: manual repairs Remove sstables from broken node (to free up space)
  • 113. Lessons learned from backup datacenter Asymmetric cluster sizes are painful. 60 production nodes, 6 backup nodes Repairs that replicate all data 10 times The workaround: manual repairs Remove sstables from broken node (to free up space) Start it to have it take writes while repopulating
  • 114. Lessons learned from backup datacenter Asymmetric cluster sizes are painful. 60 production nodes, 6 backup nodes Repairs that replicate all data 10 times The workaround: manual repairs Remove sstables from broken node (to free up space) Start it to have it take writes while repopulating Snapshot and move SSTables from 4 evenly spaced nodes
  • 115. Lessons learned from backup datacenter Asymmetric cluster sizes are painful. 60 production nodes, 6 backup nodes Repairs that replicate all data 10 times The workaround: manual repairs Remove sstables from broken node (to free up space) Start it to have it take writes while repopulating Snapshot and move SSTables from 4 evenly spaced nodes Do a full compaction
  • 116. Lessons learned from backup datacenter Asymmetric cluster sizes are painful. 60 production nodes, 6 backup nodes Repairs that replicate all data 10 times The workaround: manual repairs Remove sstables from broken node (to free up space) Start it to have it take writes while repopulating Snapshot and move SSTables from 4 evenly spaced nodes Do a full compaction Do a repair and hope for the best
  • 118. Spot the bug Hector java cassandra driver:
  • 119. Spot the bug Hector java cassandra driver: private AtomicInteger counter = new AtomicInteger(); private Server getNextServer() { counter.compareAndSet(16384, 0); return servers[counter.getAndIncrement() % servers.length]; }
  • 120. Spot the bug Hector java cassandra driver: private AtomicInteger counter = new AtomicInteger(); private Server getNextServer() { counter.compareAndSet(16384, 0); return servers[counter.getAndIncrement() % servers.length]; } Race condition
  • 121. Spot the bug Hector java cassandra driver: private AtomicInteger counter = new AtomicInteger(); private Server getNextServer() { counter.compareAndSet(16384, 0); return servers[counter.getAndIncrement() % servers.length]; } Race condition java.lang.ArrayIndexOutOfBoundsException
  • 122. Spot the bug Hector java cassandra driver: private AtomicInteger counter = new AtomicInteger(); private Server getNextServer() { counter.compareAndSet(16384, 0); return servers[counter.getAndIncrement() % servers.length]; } Race condition java.lang.ArrayIndexOutOfBoundsException After close to 2**31 requests
  • 123. Spot the bug Hector java cassandra driver: private AtomicInteger counter = new AtomicInteger(); private Server getNextServer() { counter.compareAndSet(16384, 0); return servers[counter.getAndIncrement() % servers.length]; } Race condition java.lang.ArrayIndexOutOfBoundsException After close to 2**31 requests Took a few days
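The bug: the `compareAndSet(16384, 0)` reset is racy, so the counter can skip past 16384, never be reset again, and after about 2^31 increments overflow to a negative value, making the modulo expression negative and indexing the array out of bounds. A race-free sketch of the same round-robin selection (one possible fix, not necessarily the one Hector shipped) is to drop the reset entirely and let `Math.floorMod` map any counter value, including negative ones after wraparound, into the valid index range:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hedged sketch of a race-free round-robin server selector.
class RoundRobin {
    private final AtomicInteger counter = new AtomicInteger();
    private final String[] servers;

    RoundRobin(String[] servers) { this.servers = servers; }

    String next() {
        // getAndIncrement eventually wraps to negative values after ~2^31
        // calls; floorMod maps any int into [0, servers.length), so no
        // explicit (and racy) reset is needed.
        return servers[Math.floorMod(counter.getAndIncrement(), servers.length)];
    }
}
```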
  • 124. Thrift payload size limits Communication with Cassandra is based on Thrift. With large mutations (larger than 15 MiB) Thrift drops the underlying TCP connection. Hector considers the connection drop a node-specific problem and retries on all Cassandra nodes, effectively shutting down all Cassandra traffic.
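One client-side defense against this failure mode (our illustration, not a Hector feature) is to split a large batch into chunks whose payload size stays safely under the limit before handing anything to Thrift:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: split a list of column values into batches that each stay
// below a cap chosen under the ~15 MiB limit from the slide.
class MutationBatcher {
    static final int MAX_BYTES = 14 * 1024 * 1024; // safety margin under 15 MiB

    static List<List<byte[]>> split(List<byte[]> values) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int size = 0;
        for (byte[] v : values) {
            // Start a new batch when adding this value would exceed the cap.
            if (size + v.length > MAX_BYTES && !current.isEmpty()) {
                batches.add(current);
                current = new ArrayList<>();
                size = 0;
            }
            current.add(v);
            size += v.length;
        }
        if (!current.isEmpty()) batches.add(current);
        return batches;
    }
}
```

Note this counts raw value bytes only; real Thrift framing adds per-column overhead, which is why the cap is set with a margin.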
  • 126. Conclusions In the 0.6-1.0 timeframe, developers and operations engineers are needed
  • 127. Conclusions In the 0.6-1.0 timeframe, developers and operations engineers are needed You need to keep an eye on bugs created, be part of the community
  • 128. Conclusions In the 0.6-1.0 timeframe, developers and operations engineers are needed You need to keep an eye on bugs created, be part of the community Exotic stuff (such as asymmetrically sized datacenters) is tricky
  • 129. Conclusions In the 0.6-1.0 timeframe, developers and operations engineers are needed You need to keep an eye on bugs created, be part of the community Exotic stuff (such as asymmetrically sized datacenters) is tricky Lots of things get fixed. You need to keep up with upstream
  • 130. Conclusions In the 0.6-1.0 timeframe, developers and operations engineers are needed You need to keep an eye on bugs created, be part of the community Exotic stuff (such as asymmetrically sized datacenters) is tricky Lots of things get fixed. You need to keep up with upstream You need to integrate with monitoring and graphing
  • 131. Conclusions In the 0.6-1.0 timeframe, developers and operations engineers are needed You need to keep an eye on bugs created, be part of the community Exotic stuff (such as asymmetrically sized datacenters) is tricky Lots of things get fixed. You need to keep up with upstream You need to integrate with monitoring and graphing Consider it a toolkit for constructing solutions.
