Cassandra at Spotify

7th of March 2012
About this talk
  An introduction to Spotify, our service, and our persistent storage needs
  What Cassandra brings
  What we have learned
  What I would have liked to have known a year ago

  Not a comparison between different NoSQL solutions
  Not a hands-on introduction to Cassandra
  We work with physical hardware in production
Noa Resare
  Stockholm, Sweden
  Service Reliability Engineering
  noa@spotify.com
  @blippie
Spotify — all music, all the time
  A better user experience than file sharing.
  Native desktop and mobile clients.
  Custom backend, built for performance and scalability.

  12 markets. More than ten million users.
  3 datacenters.
  Tens of gigabits per second pushed per datacenter.
  Backend systems that support a large set of innovative features.
Innovative features in practice
   Playlist
   Should be simple, right?
   A named list of tracks
   It gets more complicated:
   Keep multiple devices in sync
   Support nested playlists
   Offline editing on multiple devices
   Changes pushed to connected devices
   Scale: more than half a billion lists currently in the system
   About 10 kHz at peak traffic
   Resulting storage requirements (see the sketch below):
   Full history
   Really fast access to latest version number and content
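
One way those two requirements can coexist (a hypothetical Java sketch, not Spotify's actual schema; all names are illustrative): keep an append-only, version-ordered change log per playlist, plus an O(1) head pointer for the latest version.

  import java.util.*;

  // Hypothetical sketch of a versioned playlist store: an append-only change
  // log per playlist (full history) plus a head pointer (fast latest-version
  // lookup). In Cassandra terms, the change log would map naturally onto one
  // row per playlist with one column per version.
  public class PlaylistStoreSketch {
      static final class Playlist {
          final NavigableMap<Long, String> changes = new TreeMap<>(); // version -> change
          long head = 0;
      }

      private final Map<String, Playlist> store = new HashMap<>();

      void append(String playlistId, String change) {
          Playlist p = store.computeIfAbsent(playlistId, id -> new Playlist());
          p.changes.put(++p.head, change); // history is never rewritten
      }

      long latestVersion(String playlistId) { // O(1), no log scan
          Playlist p = store.get(playlistId);
          return p == null ? 0 : p.head;
      }

      SortedMap<Long, String> changesSince(String playlistId, long version) {
          Playlist p = store.get(playlistId); // what an offline client missed
          return p == null ? new TreeMap<>() : p.changes.tailMap(version, false);
      }
  }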
Suggested solutions
  Flat files
    We don’t need ACID
    Linux page cache kicks ass. (Not really)
  SQL
    Tried and true. Facebook does this
  Simple Key-Value store
    Tokyo Cabinet, some experience
  Clustered Key-Value store
    Evaluated a lot; the end-game contestants were HBase and Cassandra
Enter Cassandra
  Solves a large subset of storage-related problems
  Sharding, replication
  No single point of failure
  Ability to make the performance/reliability tradeoff per request (see the sketch below)
  Free software
  Active community, commercial backing

  66 + 18 + 9 + 28 production nodes
  About twenty nodes for various testing clusters
  Datasets ranging from 8 TB down to a few gigabytes
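
That per-request tradeoff is exposed as consistency levels. A minimal sketch using the Hector client (the cluster name, host, and keyspace are illustrative; the ConfigurableConsistencyLevel API is as I recall it from Hector of that era, so check your version):

  import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
  import me.prettyprint.hector.api.Cluster;
  import me.prettyprint.hector.api.HConsistencyLevel;
  import me.prettyprint.hector.api.Keyspace;
  import me.prettyprint.hector.api.factory.HFactory;

  // Sketch: two keyspace handles over the same data, one tuned for speed
  // (ONE: ack from a single replica) and one for safety (QUORUM: a majority
  // of replicas must ack). Choose per request by picking a handle.
  public class ConsistencyTradeoffSketch {
      public static void main(String[] args) {
          Cluster cluster = HFactory.getOrCreateCluster("spotify", "localhost:9160");

          ConfigurableConsistencyLevel fast = new ConfigurableConsistencyLevel();
          fast.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
          fast.setDefaultWriteConsistencyLevel(HConsistencyLevel.ONE);

          ConfigurableConsistencyLevel safe = new ConfigurableConsistencyLevel();
          safe.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
          safe.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);

          Keyspace fastKeyspace = HFactory.createKeyspace("playlist", cluster, fast);
          Keyspace safeKeyspace = HFactory.createKeyspace("playlist", cluster, safe);
          // Use fastKeyspace for latency-critical reads,
          // safeKeyspace where consistency matters more.
      }
  }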
Cassandra key concepts, on a node
  Log-structured storage
  Sorted string table — SSTable
  Immutable files on disk
  Compaction — many-to-one merge sort (see the toy model below)

  [Diagram: writes go to an in-memory Memtable, which is flushed to immutable SSTables on disk]
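
A toy model of that write path, for intuition only (not Cassandra's actual code; commit log, tombstones, and bloom filters are omitted):

  import java.util.*;

  // Toy log-structured store: writes go to a sorted in-memory memtable; a
  // full memtable is flushed to an immutable, sorted "SSTable"; compaction
  // merge-sorts many SSTables into one, with newer values winning.
  public class LogStructuredSketch {
      private static final int MEMTABLE_LIMIT = 4; // tiny, for illustration

      private final NavigableMap<String, String> memtable = new TreeMap<>();
      private final List<NavigableMap<String, String>> sstables = new ArrayList<>();

      void put(String key, String value) {
          memtable.put(key, value); // no random disk I/O on the write path
          if (memtable.size() >= MEMTABLE_LIMIT) {
              sstables.add(new TreeMap<>(memtable)); // flush; never mutated again
              memtable.clear();
          }
      }

      String get(String key) {
          String v = memtable.get(key);
          if (v != null) return v;
          for (int i = sstables.size() - 1; i >= 0; i--) { // newest flush first
              v = sstables.get(i).get(key);
              if (v != null) return v;
          }
          return null; // reads cost at most one lookup per SSTable plus memtable
      }

      void compact() {
          NavigableMap<String, String> merged = new TreeMap<>();
          for (NavigableMap<String, String> t : sstables) {
              merged.putAll(t); // oldest first, so newer values overwrite
          }
          sstables.clear();
          sstables.add(merged);
      }
  }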
Cassandra key concepts, in a cluster
  Clusters of nodes in a ring, by key order
  All data is typically written to several nodes (the replication factor; see the sketch below)
  Rings can be expanded in production
  Gossip detects nodes being up, down, or joining
  Anti-entropy mechanisms
  Many read operations can be done sequentially
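
A toy model of replica placement on the ring (a sketch; real Cassandra derives tokens from the partitioner and applies snitch and strategy rules on top):

  import java.math.BigInteger;
  import java.util.*;

  // Toy ring: each node owns a token; a key is stored on the first node at
  // or after the key's token, plus the next RF-1 distinct nodes clockwise.
  public class RingSketch {
      private final NavigableMap<BigInteger, String> ring = new TreeMap<>();
      private final int replicationFactor;

      RingSketch(int replicationFactor) { this.replicationFactor = replicationFactor; }

      void addNode(String name, BigInteger token) { ring.put(token, name); }

      List<String> replicasFor(BigInteger keyToken) {
          List<String> replicas = new ArrayList<>();
          int wanted = Math.min(replicationFactor, ring.size());
          // Start at the first token >= keyToken, wrapping to the ring start.
          Iterator<String> it = ring.tailMap(keyToken, true).values().iterator();
          while (replicas.size() < wanted) {
              if (!it.hasNext()) it = ring.values().iterator(); // wrap around
              String node = it.next();
              if (!replicas.contains(node)) replicas.add(node);
          }
          return replicas;
      }
  }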
Cassandra, winning!
  Major upgrades without service interruptions (in theory)
  Crazy fast writes
  Not just because you have a hardware RAID card that is good at lying to you
  Somewhat predictable number of seeks needed for a read
  Knows that sequential I/O is faster than random I/O
  In case of inconsistencies, knows what to do
  Replacing broken nodes is straightforward
  Cross-datacenter replication support
  Tinker friendly
  Readable code
Let me tell you a story
  Latest stable kernel from Debian Squeeze, 2.6.32-5
  What happens after 209 days of uptime?
  Load average around 120.
  No CPU activity reported by top

   Mattias de Zalenski:

   log((209 days) / (1 nanosecond)) / log(2) = 54.0034557

   (2^54) nanoseconds = 208.499983 days (see the check below)

   Somewhere, nanosecond values are shifted ten bits?

  Downtime for payment
  Downtime for account creation
  No downtime for Cassandra-backed systems
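
The slide's arithmetic is easy to verify: a 64-bit nanosecond counter that effectively loses its top ten bits wraps at 2^54 ns.

  // Sanity check of the slide's numbers: 2^54 nanoseconds expressed in days.
  public class UptimeOverflowCheck {
      public static void main(String[] args) {
          double nanos = Math.pow(2, 54);
          double days = nanos / 1e9 / 86400; // ns -> s -> days
          System.out.println(days);          // prints ~208.5
      }
  }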
Backups
  A few terabytes of live data, many nodes. Painful.
  Inefficient: a copy of the on-disk structure is at least 3× the data
  Non-compacted: possibly a few tens of old versions
  Initially, only full backups (pre-0.8)
Our solution to backups
  Separate datacenter for backups, with RF=1
  Beware: tricky
  One step removed from production performance considerations
  Application-level incremental backups
  Soon: Cassandra incremental backups
Solid state is a game changer
  Large datasets, light read load
  Small datasets, heavy read load
  I Can Haz superlarge SSD?
  No.
  With small disks, on-disk data structure size matters a lot

  Our plan:
  Leveled compaction strategy, new in 1.0
  Hack Cassandra to support configurable data directories per keyspace
  Our patch is integrated in Cassandra 1.1
Some unpleasant surprises
  Immaturity
  Hector: mutations larger than 15 MB cause connection drops in Thrift
  Broken on-disk bloom filters in 0.8 made for a very painful upgrade to 1.0
  With small disks and high load, it is very possible to get into an out-of-disk condition
  Logging is lacking
Spot the bug
  Hector Java Cassandra driver:
  private AtomicInteger counter = new AtomicInteger();

  private Server getNextServer() {
      counter.compareAndSet(16384, 0);
      return servers[counter.getAndIncrement() % servers.length];
  }


  Race condition (fix sketched below)
  java.lang.ArrayIndexOutOfBoundsException
  After close to 2^31 requests
  Took about 5 days
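
Why it breaks: compareAndSet(16384, 0) only resets the counter if it reads exactly 16384, and under concurrency two threads can increment straight past that value, so the reset never fires again. The counter then climbs to Integer.MAX_VALUE, wraps negative, and a negative % produces a negative array index. One possible fix (a sketch, not Hector's actual patch), in the same context as the snippet above:

  import java.util.concurrent.atomic.AtomicInteger;

  // Overflow still happens, but masking off the sign bit keeps the index
  // non-negative, so round-robin simply continues after the wrap.
  private final AtomicInteger counter = new AtomicInteger();

  private Server getNextServer() {
      int next = counter.getAndIncrement() & 0x7fffffff; // always >= 0
      return servers[next % servers.length];
  }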
Conclusions
  In the 0.6–1.0 timeframe, both development engineers and operations are needed
  You need to keep an eye on filed bugs and be part of the community
  Exotic stuff (such as asymmetrically sized datacenters) is tricky
  Lots of things get fixed; you need to keep up with upstream
  You need to integrate with monitoring and graphing

  Consider it a toolkit for constructing solutions.
Questions? Answers.

More Related Content

Similar to Cassandra nyc

Enlightenment: A Cross Platform Window Manager & Toolkit
Enlightenment: A Cross Platform Window Manager & ToolkitEnlightenment: A Cross Platform Window Manager & Toolkit
Enlightenment: A Cross Platform Window Manager & ToolkitSamsung Open Source Group
 
The Evolution of Spotify Home Architecture - Qcon 2019
The Evolution of Spotify Home Architecture - Qcon 2019The Evolution of Spotify Home Architecture - Qcon 2019
The Evolution of Spotify Home Architecture - Qcon 2019Karthik Murugesan
 
Playlist Recommendations @ Spotify
Playlist Recommendations @ SpotifyPlaylist Recommendations @ Spotify
Playlist Recommendations @ SpotifyNikhil Tibrewal
 
Obvious and Non-Obvious Scalability Issues: Spotify Learnings
Obvious and Non-Obvious Scalability Issues: Spotify LearningsObvious and Non-Obvious Scalability Issues: Spotify Learnings
Obvious and Non-Obvious Scalability Issues: Spotify LearningsDavid Poblador i Garcia
 
Scaling Cassandra in all directions - Jimmy Mardell Spotify
Scaling Cassandra in all directions - Jimmy Mardell SpotifyScaling Cassandra in all directions - Jimmy Mardell Spotify
Scaling Cassandra in all directions - Jimmy Mardell SpotifyEvention
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyNeville Li
 
Basic introduction to SOA
Basic introduction to SOABasic introduction to SOA
Basic introduction to SOAJoaquin Rincon
 
DevOps Naughties Style - How We DevOps at MP3.com in the Early 2000's
DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000'sDevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's
DevOps Naughties Style - How We DevOps at MP3.com in the Early 2000'stechopsguru
 
Maximum Uptime Cluster Orchestration with Ansible
Maximum Uptime Cluster Orchestration with AnsibleMaximum Uptime Cluster Orchestration with Ansible
Maximum Uptime Cluster Orchestration with AnsibleScyllaDB
 
Podcasting on AWS – A Discussion on Everything from Production to Distributio...
Podcasting on AWS – A Discussion on Everything from Production to Distributio...Podcasting on AWS – A Discussion on Everything from Production to Distributio...
Podcasting on AWS – A Discussion on Everything from Production to Distributio...Amazon Web Services
 
AWS Customer Presentation - Melodeo
AWS Customer Presentation - MelodeoAWS Customer Presentation - Melodeo
AWS Customer Presentation - MelodeoAmazon Web Services
 
Melodeo Nutsie is powered by AWS
Melodeo Nutsie is powered by AWSMelodeo Nutsie is powered by AWS
Melodeo Nutsie is powered by AWSguestda111d9
 
Rapid API Development with LoopBack/StrongLoop
Rapid API Development with LoopBack/StrongLoopRapid API Development with LoopBack/StrongLoop
Rapid API Development with LoopBack/StrongLoopRaymond Camden
 
Deliverance and Diazo - Easy Theming For Everyone
Deliverance and Diazo - Easy Theming For EveryoneDeliverance and Diazo - Easy Theming For Everyone
Deliverance and Diazo - Easy Theming For EveryoneRoché Compaan
 
High Performance WordPress - WordCamp Jerusalem 2010
High Performance WordPress - WordCamp Jerusalem 2010High Performance WordPress - WordCamp Jerusalem 2010
High Performance WordPress - WordCamp Jerusalem 2010Barry Abrahamson
 
Free The Enterprise With Ruby & Master Your Own Domain
Free The Enterprise With Ruby & Master Your Own DomainFree The Enterprise With Ruby & Master Your Own Domain
Free The Enterprise With Ruby & Master Your Own DomainKen Collins
 
Preparing an Open Source Documentation Repository for Translations
Preparing an Open Source Documentation Repository for TranslationsPreparing an Open Source Documentation Repository for Translations
Preparing an Open Source Documentation Repository for TranslationsHPCC Systems
 
Last.fm - Lessons from building the World's largest social music platform
Last.fm - Lessons from building the World's largest social music platform Last.fm - Lessons from building the World's largest social music platform
Last.fm - Lessons from building the World's largest social music platform randomfromtheweb
 
Colin Carter - LSPs and APIs
Colin Carter  - LSPs and APIsColin Carter  - LSPs and APIs
Colin Carter - LSPs and APIssconul
 

Similar to Cassandra nyc (20)

sql.pdf
sql.pdfsql.pdf
sql.pdf
 
Enlightenment: A Cross Platform Window Manager & Toolkit
Enlightenment: A Cross Platform Window Manager & ToolkitEnlightenment: A Cross Platform Window Manager & Toolkit
Enlightenment: A Cross Platform Window Manager & Toolkit
 
The Evolution of Spotify Home Architecture - Qcon 2019
The Evolution of Spotify Home Architecture - Qcon 2019The Evolution of Spotify Home Architecture - Qcon 2019
The Evolution of Spotify Home Architecture - Qcon 2019
 
Playlist Recommendations @ Spotify
Playlist Recommendations @ SpotifyPlaylist Recommendations @ Spotify
Playlist Recommendations @ Spotify
 
Obvious and Non-Obvious Scalability Issues: Spotify Learnings
Obvious and Non-Obvious Scalability Issues: Spotify LearningsObvious and Non-Obvious Scalability Issues: Spotify Learnings
Obvious and Non-Obvious Scalability Issues: Spotify Learnings
 
Scaling Cassandra in all directions - Jimmy Mardell Spotify
Scaling Cassandra in all directions - Jimmy Mardell SpotifyScaling Cassandra in all directions - Jimmy Mardell Spotify
Scaling Cassandra in all directions - Jimmy Mardell Spotify
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ Spotify
 
Basic introduction to SOA
Basic introduction to SOABasic introduction to SOA
Basic introduction to SOA
 
DevOps Naughties Style - How We DevOps at MP3.com in the Early 2000's
DevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000'sDevOps Naughties Style - How We  DevOps at MP3.com in the Early 2000's
DevOps Naughties Style - How We DevOps at MP3.com in the Early 2000's
 
Maximum Uptime Cluster Orchestration with Ansible
Maximum Uptime Cluster Orchestration with AnsibleMaximum Uptime Cluster Orchestration with Ansible
Maximum Uptime Cluster Orchestration with Ansible
 
Podcasting on AWS – A Discussion on Everything from Production to Distributio...
Podcasting on AWS – A Discussion on Everything from Production to Distributio...Podcasting on AWS – A Discussion on Everything from Production to Distributio...
Podcasting on AWS – A Discussion on Everything from Production to Distributio...
 
AWS Customer Presentation - Melodeo
AWS Customer Presentation - MelodeoAWS Customer Presentation - Melodeo
AWS Customer Presentation - Melodeo
 
Melodeo Nutsie is powered by AWS
Melodeo Nutsie is powered by AWSMelodeo Nutsie is powered by AWS
Melodeo Nutsie is powered by AWS
 
Rapid API Development with LoopBack/StrongLoop
Rapid API Development with LoopBack/StrongLoopRapid API Development with LoopBack/StrongLoop
Rapid API Development with LoopBack/StrongLoop
 
Deliverance and Diazo - Easy Theming For Everyone
Deliverance and Diazo - Easy Theming For EveryoneDeliverance and Diazo - Easy Theming For Everyone
Deliverance and Diazo - Easy Theming For Everyone
 
High Performance WordPress - WordCamp Jerusalem 2010
High Performance WordPress - WordCamp Jerusalem 2010High Performance WordPress - WordCamp Jerusalem 2010
High Performance WordPress - WordCamp Jerusalem 2010
 
Free The Enterprise With Ruby & Master Your Own Domain
Free The Enterprise With Ruby & Master Your Own DomainFree The Enterprise With Ruby & Master Your Own Domain
Free The Enterprise With Ruby & Master Your Own Domain
 
Preparing an Open Source Documentation Repository for Translations
Preparing an Open Source Documentation Repository for TranslationsPreparing an Open Source Documentation Repository for Translations
Preparing an Open Source Documentation Repository for Translations
 
Last.fm - Lessons from building the World's largest social music platform
Last.fm - Lessons from building the World's largest social music platform Last.fm - Lessons from building the World's largest social music platform
Last.fm - Lessons from building the World's largest social music platform
 
Colin Carter - LSPs and APIs
Colin Carter  - LSPs and APIsColin Carter  - LSPs and APIs
Colin Carter - LSPs and APIs
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Cassandra nyc

  • 1. Cassandra at Spotify 7th of March 2012
  • 3. About this talk An introduction Spotify, to our service and our persistent storage needs
  • 4. About this talk An introduction Spotify, to our service and our persistent storage needs What Cassandra brings
  • 5. About this talk An introduction Spotify, to our service and our persistent storage needs What Cassandra brings What we have learned
  • 6. About this talk An introduction Spotify, to our service and our persistent storage needs What Cassandra brings What we have learned What I would have liked to have known a year ago
  • 7. About this talk An introduction Spotify, to our service and our persistent storage needs What Cassandra brings What we have learned What I would have liked to have known a year ago
  • 8. About this talk An introduction Spotify, to our service and our persistent storage needs What Cassandra brings What we have learned What I would have liked to have known a year ago Not a comparison between different NoSQL solutions
  • 9. About this talk An introduction Spotify, to our service and our persistent storage needs What Cassandra brings What we have learned What I would have liked to have known a year ago Not a comparison between different NoSQL solutions Not a hands on introduction to Cassandra
  • 10. About this talk An introduction Spotify, to our service and our persistent storage needs What Cassandra brings What we have learned What I would have liked to have known a year ago Not a comparison between different NoSQL solutions Not a hands on introduction to Cassandra We work with physical hardware for production
  • 12. Noa Resare Stockholm, Sweden
  • 13. Noa Resare Stockholm, Sweden Service Reliability Engineering
  • 14. Noa Resare Stockholm, Sweden Service Reliability Engineering noa@spotify.com
  • 15. Noa Resare Stockholm, Sweden Service Reliability Engineering noa@spotify.com @blippie
  • 16. Spotify — all music, all the time
  • 17. Spotify — all music, all the time A better user experience than file sharing.
  • 18. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients.
  • 19. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients. Custom backend, built for performance and scalability.
  • 20. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients. Custom backend, built for performance and scalability.
  • 21. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients. Custom backend, built for performance and scalability. 12 markets. More than ten million users.
  • 22. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients. Custom backend, built for performance and scalability. 12 markets. More than ten million users. 3 datacenters.
  • 23. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients. Custom backend, built for performance and scalability. 12 markets. More than ten million users. 3 datacenters. Tens of gigabits of data pushed per datacenter.
  • 24. Spotify — all music, all the time A better user experience than file sharing. Native desktop and mobile clients. Custom backend, built for performance and scalability. 12 markets. More than ten million users. 3 datacenters. Tens of gigabits of data pushed per datacenter. Backend systems that support a large set of innovative features.
  • 26. Innovative features in practice Playlist
  • 27. Innovative features in practice Playlist Should be simple, right?
  • 28. Innovative features in practice Playlist Should be simple, right? A named list of tracks
  • 29. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated
  • 30. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync
  • 31. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync Support nested playlists
  • 32. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync Support nested playlists Offline editing on multiple devices
  • 33. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync Support nested playlists Offline editing on multiple devices Changes pushed to connected devices
  • 34. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync Support nested playlists Offline editing on multiple devices Changes pushed to connected devices Scale. More than half a billion lists currently in the system
  • 35. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync Support nested playlists Offline editing on multiple devices Changes pushed to connected devices Scale. More than half a billion lists currently in the system About 10 khz on peak traffic.
  • 36. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync Support nested playlists Offline editing on multiple devices Changes pushed to connected devices Scale. More than half a billion lists currently in the system About 10 khz on peak traffic. Resulting storage requirements:
  • 37. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync Support nested playlists Offline editing on multiple devices Changes pushed to connected devices Scale. More than half a billion lists currently in the system About 10 khz on peak traffic. Resulting storage requirements: Full history
  • 38. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync Support nested playlists Offline editing on multiple devices Changes pushed to connected devices Scale. More than half a billion lists currently in the system About 10 khz on peak traffic. Resulting storage requirements: Full history Really fast access to latest version number and content
  • 39. Innovative features in practice Playlist Should be simple, right? A named list of tracks It gets more complicated Keep multiple devices in sync Support nested playlists Offline editing on multiple devices Changes pushed to connected devices Scale. More than half a billion lists currently in the system About 10 khz on peak traffic. Resulting storage requirements: Full history Really fast access to latest version number and content
  • 41. Suggested solutions Flat files
  • 42. Suggested solutions Flat files We don’t need ACID
  • 43. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass.
  • 44. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really)
  • 45. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really) SQL
  • 46. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really) SQL Tried and true. Facebook does this
  • 47. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really) SQL Tried and true. Facebook does this Simple Key-Value store
  • 48. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really) SQL Tried and true. Facebook does this Simple Key-Value store Tokyo cabinet, some experience
  • 49. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really) SQL Tried and true. Facebook does this Simple Key-Value store Tokyo cabinet, some experience Clustered Key-Value store
  • 50. Suggested solutions Flat files We don’t need ACID Linux page cache kicks ass. (Not really) SQL Tried and true. Facebook does this Simple Key-Value store Tokyo cabinet, some experience Clustered Key-Value store Evaluated a lot, end game contestants HBase and Cassandra
  • 52. Enter Cassandra Solves a large subset of storage related problems
  • 53. Enter Cassandra Solves a large subset of storage related problems Sharding, replication
  • 54. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure
  • 55. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Ability to make the performance/reliability tradeoff per request
  • 56. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Ability to make the performance/reliability tradeoff per request Free software
  • 57. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Ability to make the performance/reliability tradeoff per request Free software Active community, commercial backing
  • 58. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Ability to make the performance/reliability tradeoff per request Free software Active community, commercial backing
  • 59. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Ability to make the performance/reliability tradeoff per request Free software Active community, commercial backing
  • 60. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Ability to make the performance/reliability tradeoff per request Free software Active community, commercial backing 66 + 18 + 9 + 28 production nodes
  • 61. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Ability to make the performance/reliability tradeoff per request Free software Active community, commercial backing 66 + 18 + 9 + 28 production nodes About twenty nodes for various testing clusters
  • 62. Enter Cassandra Solves a large subset of storage related problems Sharding, replication No single point of failure Ability to make the performance/reliability tradeoff per request Free software Active community, commercial backing 66 + 18 + 9 + 28 production nodes About twenty nodes for various testing clusters Datasets ranging from 8T to a few gigs.
  • 63. Cassandra key concepts, on a node Log structured storage Sorted string table — SSTable Immutable files on disk Compaction — Many to one, merge sort Memtable SSTable SSTable SSTable
  • 64. Cassandra key concepts, In a cluster Clusters of nodes in a ring by key order All data typically written to several nodes, Replication Factor Rings can be expanded in production Gossip, detects nodes being up / down / joining Anti Entropy mechanisms Many read operations can be done sequentially
• 66–76. Cassandra, winning!
  Major upgrades without service interruptions (in theory)
  Crazy fast writes
  Not just because you have a hardware RAID card that is good at lying to you
  Somewhat predictable number of seeks needed for a read
  Knows that sequential I/O is faster than random I/O
  In case of inconsistencies, knows what to do (resolution sketched below)
  Replacing broken nodes is straightforward
  Cross datacenter replication support
  Tinker friendly
  Readable code
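A sketch of the "knows what to do" part, with toy types rather than Cassandra's own: every column carries a timestamp, conflicting versions are resolved last-write-wins, and read repair pushes the winner back to stale replicas.

    import java.util.*;

    public final class ReadRepairSketch {
        static final class Column {
            final String value;
            final long timestamp;
            Column(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
        }

        /** Pick the winning version among replica responses: newest timestamp wins. */
        static Column resolve(List<Column> replicaResponses) {
            return Collections.max(replicaResponses,
                    Comparator.comparingLong((Column c) -> c.timestamp));
        }
    }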
• 77–85. Let me tell you a story
  Latest stable kernel from Debian Squeeze: 2.6.32-5
  What happens after 209 days of uptime?
  Load average around 120. No CPU activity reported by top.
  Mattias de Zalenski's back-of-the-envelope math:
    log((209 days) / (1 nanosecond)) / log(2) = 54.0034557
    (2^54) nanoseconds = 208.499983 days
  Somewhere, nanosecond values are shifted ten bits? A 64-bit nanosecond
  counter shifted left by ten bits wraps after 2^(64-10) = 2^54 ns, which
  is almost exactly 208.5 days (checked in the snippet below).
  Downtime for payment
  Downtime for account creation
  No downtime for Cassandra backed systems
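A quick check of that arithmetic:

    public final class UptimeOverflow {
        public static void main(String[] args) {
            double ns = Math.pow(2, 54);               // 2^54 nanoseconds
            double days = ns / 1e9 / 60 / 60 / 24;     // ns -> s -> days
            System.out.printf("2^54 ns = %.6f days%n", days); // ~208.499983
        }
    }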
• 87–91. Backups
  A few terabytes of live data, many nodes. Painful.
  Inefficient: copying the on-disk structure means at least 3 times the data
  Non-compacted: possibly a few tens of old versions of each value
  Initially, only full backups (pre 0.8)
• 92–97. Our solution to backups
  Separate datacenter for backups, with RF=1
  Beware: tricky
  Once removed from production performance considerations
  Application level incremental backups (sketched below)
  Soon: Cassandra incremental backups
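A sketch of the idea behind application-level incremental backups, with our own names rather than any actual Spotify code: alongside each Cassandra write, the application appends the mutation to a local timestamped changelog, so a backup run only has to ship the log segments written since the last run, and replaying the log rebuilds state.

    import java.io.*;
    import java.nio.charset.StandardCharsets;

    public final class ChangelogBackup implements Closeable {
        private final Writer log;

        ChangelogBackup(File segment) throws IOException {
            log = new BufferedWriter(new OutputStreamWriter(
                    new FileOutputStream(segment, true), StandardCharsets.UTF_8));
        }

        /** Call next to the Cassandra mutation; one tab-separated record per write. */
        synchronized void record(String key, String column, String value) throws IOException {
            log.write(System.currentTimeMillis() + "\t" + key + "\t" + column + "\t" + value + "\n");
        }

        @Override public void close() throws IOException { log.close(); }
    }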
• 98–110. Solid state is a game changer
  Large datasets, light read load
  Small datasets, heavy read load
  I Can Haz superlarge SSD? No.
  With small disks, on-disk data structure size matters a lot (see the headroom math below)
  Our plan:
    Leveled compaction strategy, new in 1.0
    Hack Cassandra to have configurable data directories per keyspace
    Our patch is integrated in Cassandra 1.1
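Why this matters on small SSDs, using rule-of-thumb figures rather than measurements: size-tiered compaction can briefly need room for both its input SSTables and the merged output, roughly doubling the footprint, while leveled compaction rewrites small fixed-size SSTables and needs only modest headroom.

    public final class CompactionHeadroom {
        public static void main(String[] args) {
            double dataGb = 100.0;                 // live data on one node (illustrative)
            double sizeTieredPeak = dataGb * 2.0;  // worst case: inputs + merged output
            double leveledPeak = dataGb * 1.1;     // ~10% headroom, common rule of thumb
            System.out.printf("size-tiered peak: %.0f GB, leveled peak: %.0f GB%n",
                    sizeTieredPeak, leveledPeak);
        }
    }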
• 113–116. Some unpleasant surprises
  Immaturity: Hector with mutations larger than 15 MB, connection drops in Thrift
  Broken on-disk bloom filters in 0.8 made for a very painful upgrade to 1.0
  Small disks plus high load: very possible to get into an Out Of Disk condition
  Logging is lacking
• 118–123. Spot the bug
  Hector Java Cassandra driver:

    private AtomicInteger counter = new AtomicInteger();

    private Server getNextServer() {
        counter.compareAndSet(16384, 0);
        return servers[counter.getAndIncrement() % servers.length];
    }

  Race condition: the compareAndSet reset only fires when the counter is seen
  at exactly 16384, so under concurrency it can be skipped. The counter then
  grows unchecked, wraps past Integer.MAX_VALUE to a negative value, and the
  negative modulo yields a negative array index:
  java.lang.ArrayIndexOutOfBoundsException after close to 2**31 requests.
  Took about 5 days. One possible fix is sketched below.
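A sketch of one fix (not necessarily Hector's actual patch): drop the racy reset entirely and mask off the sign bit, so the index stays non-negative even once the counter wraps. Server and servers are placeholders standing in for the driver's real types.

    import java.util.concurrent.atomic.AtomicInteger;

    final class RoundRobinPool {
        static final class Server { }              // placeholder for the demo
        private final Server[] servers = { new Server(), new Server() };
        private final AtomicInteger counter = new AtomicInteger();

        Server getNextServer() {
            // Masking with Integer.MAX_VALUE keeps the ticket in [0, 2^31),
            // at the cost of one discontinuity in the rotation at wraparound.
            int ticket = counter.getAndIncrement() & Integer.MAX_VALUE;
            return servers[ticket % servers.length];
        }
    }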
• 125–132. Conclusions
  In the 0.6–1.0 timeframe, both development engineers and operations are needed
  You need to keep an eye on bugs filed, and be part of the community
  Exotic stuff (such as asymmetrically sized datacenters) is tricky
  Lots of things get fixed. You need to keep up with upstream.
  You need to integrate with monitoring and graphing (JMX sketch below)
  Consider it a toolkit for constructing solutions.
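A sketch of the monitoring integration point: Cassandra exposes its counters over JMX (port 7199 by default). The MBean and attribute names below match the 1.0-era StorageProxy bean as we remember it; treat them as assumptions and verify against your version with jconsole before wiring this into graphing.

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public final class CassandraPoller {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // Assumed 1.0-era bean/attribute names; check with jconsole.
                ObjectName proxy = new ObjectName("org.apache.cassandra.db:type=StorageProxy");
                Object reads = mbs.getAttribute(proxy, "ReadOperations");
                System.out.println("reads so far: " + reads); // feed this to graphing
            } finally {
                connector.close();
            }
        }
    }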
