Cassandra from the trenches: migrating Netflix (update)

Update talk on Cassandra at Netflix, presented at the Silicon Valley NoSQL meetup on 9 Feb 2012. Includes an introduction to Astyanax, an open source Cassandra client written in Java.

Published in: Technology
  • I have a column family that stores a few columns such as name, salary, and messageTimeStamp; all of these columns are indexed.
    The row key is a UUID. I have a UI where a user wants to search on messageTimeStamp by entering a start
    and an end date. I need to retrieve all the rows that fall between these dates. Because the search comes from the UI, I do not
    know the start key or the end key. All I have is two values: startDate and endDate. I would really appreciate it if you
    could help me with this situation.

Cassandra from the trenches: migrating Netflix (update)

  1. Cassandra from the trenches: migrating Netflix
     Jason Brown, Senior Software Engineer, Netflix
     @jasobrown, jasedbrown@gmail.com, http://www.linkedin.com/in/jasedbrown
  2. History, 2008
     • In the beginning, there was the webapp
       – And a database
       – In one datacenter
     • Then we grew, and grew, and grew
       – More databases, all conjoined
       – Database links, PL/SQL, materialized views
       – Multi-master replication (MMR)
     • Then it melted down
       – Couldn’t ship DVDs for ~3 days
  3. History, 2009
     • Time to rethink everything
       – Abandon our datacenter
       – Ditch the monolithic webapp
       – Migrate single point of failure database to …
  4. History, 2010
     • SimpleDB/S3
       – Managed by Amazon, not us
       – Got us started with NoSQL in the cloud
       – Problems:
         • High latency, rate limiting (throttling)
         • (no) auto-sharding, no backups
  5. Shiny new toy (2011)
     • We switched to Cassandra
       – Similar to SimpleDB, with limits removed
       – Dynamo-model appealed to us
       – Column-based, key-value data model seemed sufficient for most needs
       – Performance looked great (rudimentary tests)
  6. Data Modeling - Where the rubber meets the road
  7. About Netflix’s AB Testing
     • Basic concepts
       – Test: an experiment where several competing behaviors are implemented and compared
       – Cell: different experiences within a test that are being compared against each other
       – Allocation: a customer-specific assignment to a cell within a test
  8. Data Modeling - background
     • AB has two sets of data
       – metadata about tests
       – allocations
  9. AB - allocations
     • Single table to hold allocations
       – Currently at > 1 billion records
       – Plus indices!
     • One record for every test that every customer is allocated into
     • Unique constraint on customer/test
  10. AB – relational model
      • Typical parent-child table relationship
      • Not updated frequently, so service can cache
  11. Data modeling in Cassandra
      • Everywhere I looked, the Internet told me to understand my data use patterns
      • Identify the questions that you need to answer from the data
      • Know how to query your data set and make the persistence model match
  12. Identifying the AB questions that need to be answered
      • High traffic
        – get all allocations for a customer
      • Low traffic
        – get count of customers in test/cell
        – find all customers in a test/cell
        – find all customers in a test who were added within a date range
  13. Modeling allocations in Cassandra
      • Read all allocations for a customer
        – as fast as possible
      • Find all customers in a test/cell
        – reverse index
      • Get count of customers in test/cell
        – count the entries in the reverse index
  14. Denormalization - HOWTO
      • No real world examples
        – ‘Normalization is for sissies’, Pat Helland
      • Denormalize allocations per customer
        – Trivial with a schema-less database
  15. Denormalized allocations
      • normalized data
      • denormalized (sparse) data
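
The two shapes named on slide 15 can be sketched in plain Python, without any Cassandra client. This is only an illustration: the record fields (`cust_id`, `test_id`, `cell`, `enabled`), the sample values, and the helper `denormalize` are assumptions, not the schema used in the talk. Column names follow the `<testId>:<field>` convention given on the composite columns slide.

```python
from collections import defaultdict

# Normalized: one relational-style record per (customer, test) allocation.
normalized = [
    {"cust_id": 1001, "test_id": 42, "cell": 2, "enabled": "Y"},
    {"cust_id": 1001, "test_id": 47, "cell": 0, "enabled": "Y"},
    {"cust_id": 1002, "test_id": 42, "cell": 1, "enabled": "Y"},
]

def denormalize(records):
    """Group allocation records into one sparse row per customer."""
    rows = defaultdict(dict)
    for rec in records:
        row = rows[rec["cust_id"]]
        # Each data point becomes its own column named "<testId>:<field>".
        row[f'{rec["test_id"]}:cell'] = rec["cell"]
        row[f'{rec["test_id"]}:enabled'] = rec["enabled"]
    return dict(rows)

rows = denormalize(normalized)
```

Each customer row only carries columns for the tests that customer is allocated into, which is what makes the data sparse: here customer 1002 has no `47:*` columns at all.
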
  16. Implementing allocations
      • As an allocation for a customer has only a handful of data points, they can logically be grouped together
      • Avoided blobs, JSON or otherwise
      • Using a standard column family, with composite columns
  17. Composite columns
      • Composite columns are sorted by each ‘token’ in name
      • Allocation column naming convention
        – <testId>:<field>
        – 42:cell = 2
        – 42:enabled = Y
        – 47:cell = 0
        – 47:enabled = Y
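
A rough way to see why per-component sorting matters: model each composite column name as a Python tuple, which also compares component by component. This is a sketch of the idea only, not Cassandra's comparator; the test id 100 and the helper `slice_for_test` are made up for illustration.

```python
from bisect import bisect_left

# Composite column names modeled as (test_id, field) tuples. Python compares
# tuples one component at a time, similar in spirit to a composite comparator.
columns = {
    (47, "enabled"): "Y",
    (42, "cell"): 2,
    (100, "cell"): 1,
    (47, "cell"): 0,
    (42, "enabled"): "Y",
}

sorted_names = sorted(columns)

def slice_for_test(names, test_id):
    """Return the contiguous run of column names whose first component is test_id."""
    start = bisect_left(names, (test_id,))
    end = bisect_left(names, (test_id + 1,))
    return names[start:end]

test_47 = slice_for_test(sorted_names, 47)
```

Because the test id is compared as a number, all columns for one test sit next to each other and can be read as a single slice. A plain string sort of names like `100:cell` and `42:cell` would instead place test 100 before test 42.
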
  18. Modeling AB metadata in Cassandra
      • Explored several models, including JSON blobs, spreading across multiple CFs, differing degrees of denormalization
      • Reverse index to identify all tests for loading
  19. Implementing metadata
      • One CF, one row for all of a test’s data
        – Every data point is a column, no blobs
      • Composite columns
        – type:id:field
          • Types = base info, cells, allocation plans
          • Id = cell number, allocation plan (gu)id
          • Field = type-specific
            – Base info = test name, description, enabled
            – Cell’s name / description
            – Plan’s start/end dates, country to allocate to
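
The `type:id:field` layout can be pictured as a flat map keyed by three-part names. In the sketch below the type labels (`"base"`, `"cell"`, `"plan"`), the placeholder id `"plan-a"` standing in for a GUID, and all values are assumptions chosen for illustration; the talk does not give the literal names.

```python
from collections import defaultdict

# One row holds everything about a single test; each data point is a column.
# Type labels, ids, and values here are illustrative assumptions.
test_row = {
    ("base", "", "name"): "Example test",
    ("base", "", "enabled"): "Y",
    ("cell", "0", "name"): "control",
    ("cell", "1", "name"): "variant",
    ("plan", "plan-a", "start"): "2012-01-01",
    ("plan", "plan-a", "country"): "US",
}

def regroup(row):
    """Rebuild a nested type -> id -> field -> value view from flat columns."""
    nested = defaultdict(lambda: defaultdict(dict))
    for (col_type, col_id, field), value in row.items():
        nested[col_type][col_id][field] = value
    return nested

meta = regroup(test_row)
```

Since every data point is its own column, changing one field (say, a cell's name) touches a single column rather than rewriting a serialized blob.
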
  20. Implementing indices
      • Cassandra’s secondary indices vs. hand-built and maintained alternate indices
      • Secondary indices work great on uniform data between rows
      • But sparse column data not easy to index
  21. Hand-built Indices, 1
      • Reverse index
        – Test/cell (key) to custIds (columns)
          • Column value is timestamp
      • Updating index when allocating a customer into test (double write)
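
A minimal in-memory sketch of the double write, using plain dictionaries in place of column families. The names `allocations`, `reverse_index`, `allocate`, and `customers_in` are hypothetical, and the timestamps are passed in explicitly so the example is deterministic.

```python
import time

allocations = {}    # cust_id -> {(test_id, field): value}
reverse_index = {}  # (test_id, cell) -> {cust_id: allocation timestamp}

def allocate(cust_id, test_id, cell, now=None):
    """Write the allocation and its reverse-index entry (the double write)."""
    ts = time.time() if now is None else now
    row = allocations.setdefault(cust_id, {})
    row[(test_id, "cell")] = cell
    row[(test_id, "enabled")] = "Y"
    # Second write: index row keyed by test/cell, one column per customer.
    reverse_index.setdefault((test_id, cell), {})[cust_id] = ts

def customers_in(test_id, cell, start=None, end=None):
    """Customers in a test/cell, optionally filtered by allocation time."""
    entries = reverse_index.get((test_id, cell), {})
    return sorted(
        cust for cust, ts in entries.items()
        if (start is None or ts >= start) and (end is None or ts <= end)
    )

allocate(1001, 42, 2, now=100.0)
allocate(1002, 42, 2, now=200.0)
allocate(1003, 42, 1, now=150.0)
```

Storing the timestamp as the column value is what lets the index answer the date-range question from slide 12. The two writes are separate operations, so a failure between them can leave the index out of step with the allocation rows.
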
  22. Hand-built indices, 2
      • Counter column family
        – Test/cell to count of customers in test columns
        – Mutate on allocating a customer into test
      • Counters are not idempotent!
      • Mutates need to write to every node that hosts that key
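
Why non-idempotent counters are a hazard can be shown with a retried write. The sketch below is an in-memory illustration with made-up names (`counter_cf`, `index_cf`, `allocate_once`); it does not use Cassandra's counter API.

```python
from collections import Counter

counter_cf = Counter()  # (test_id, cell) -> count, incremented blindly
index_cf = {}           # (test_id, cell) -> set of cust_ids

def allocate_once(cust_id, test_id, cell):
    counter_cf[(test_id, cell)] += 1                          # not idempotent
    index_cf.setdefault((test_id, cell), set()).add(cust_id)  # idempotent

# A client that times out cannot tell whether its write landed, so it retries.
allocate_once(1001, 42, 2)
allocate_once(1001, 42, 2)  # retry of the same logical write
```

After the retry the counter reads 2 while the index still holds a single customer. Replaying an insert of the same column is harmless, but replaying an increment is not.
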
  23. Index rebuilding
      • To keep the index consistent, it needs to be rebuilt occasionally
      • Even Oracle needs to have its indices rebuilt
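
One way to picture a rebuild is to recompute the index and counts from the allocation rows, treating those rows as the source of truth. That choice, the function name `rebuild_indices`, and the data layout are assumptions for this sketch; it also drops the per-customer timestamp that the real index stores as the column value.

```python
def rebuild_indices(allocations):
    """Recompute the reverse index and counts from the allocation rows.

    allocations maps cust_id -> {(test_id, field): value}.
    """
    reverse_index = {}
    for cust_id, row in allocations.items():
        for (test_id, field), value in row.items():
            if field == "cell":
                reverse_index.setdefault((test_id, value), set()).add(cust_id)
    counts = {key: len(custs) for key, custs in reverse_index.items()}
    return reverse_index, counts

allocations = {
    1001: {(42, "cell"): 2, (42, "enabled"): "Y", (47, "cell"): 0},
    1002: {(42, "cell"): 2, (42, "enabled"): "Y"},
}
reverse_index, counts = rebuild_indices(allocations)
```

Deriving the counts from the rebuilt index, rather than from a running counter, removes any drift caused by retried increments.
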
  24. Into the real world
  25. Cassandra Java clients
      • Hector
        – github.com/rantav/hector
      • Astyanax
        – Developed at Netflix (Eran Landau)
        – github.com/netflix
      • Cassie (Scala)
        – Developed at Twitter
        – https://github.com/twitter/cassie
  26. Astyanax features
      • Clean object model
      • Node discovery
      • Node quarantine
      • Request failover/retry
      • JMX Monitoring
      • Connection pooling
      • Future execution
  27. Astyanax code example, 1
  28. Astyanax code example, 2
  29. Astyanax code example, 3
  30. Astyanax connection pools, 1
      • Round Robin uses coordinator node
  31. Astyanax connection pooling, 2
      • Token aware knows where the data resides for point reads
  32. Astyanax latency aware
      • Samples response times from Cassandra nodes
      • Favors faster responding nodes in pool
      • Use with token aware connection pooling
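
The three host-selection ideas on slides 30 to 32 can be contrasted with a toy ring. Nothing below is Astyanax code: the host names, the 0 to 99 token space, the SHA-256 stand-in for a partitioner, and all function names are assumptions made for illustration.

```python
import hashlib
from bisect import bisect_left
from itertools import cycle

# Hypothetical three-node ring: each host owns tokens up to and including its own.
RING = [(33, "node-a"), (66, "node-b"), (99, "node-c")]
HOSTS = [host for _, host in RING]

def token_for(key):
    """Map a row key to a token in [0, 100); a stand-in for a real partitioner."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 100

def round_robin():
    """Yield hosts in turn; the chosen host coordinates and may forward the request."""
    return cycle(HOSTS)

def token_aware(key):
    """Pick the host that owns the key's token, saving a hop on point reads."""
    tokens = [token for token, _ in RING]
    return RING[bisect_left(tokens, token_for(key))][1]

def latency_aware(candidates, avg_latency_ms):
    """Among candidate hosts, prefer the one with the lowest sampled latency."""
    return min(candidates, key=lambda host: avg_latency_ms[host])

rr = round_robin()
first_four = [next(rr) for _ in range(4)]
owner = token_aware("customer:1001")
fastest = latency_aware(["node-a", "node-b"], {"node-a": 12.0, "node-b": 3.5})
```

With replication, several hosts own a given token, so the two strategies combine naturally: token awareness narrows the choice to the replicas, and latency awareness picks the currently fastest of them.
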
  33. Allocation mutates
      • AB allocations are immutable, so we need to prevent mutating
      • Oracle - unique table constraint
      • Cassandra - read before write
        – data race!
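
The data race in read-before-write can be reproduced deterministically by interleaving two writers by hand. This is an in-memory sketch with hypothetical names (`store`, `read_allocation`, `write_allocation`); it shows the hazard only and makes no claim about how it was handled in production.

```python
store = {}  # (cust_id, test_id) -> cell

def read_allocation(cust_id, test_id):
    return store.get((cust_id, test_id))

def write_allocation(cust_id, test_id, cell):
    store[(cust_id, test_id)] = cell

# Two writers, A and B, each follow "read, then write only if absent".
# Interleaved by hand: both reads happen before either write.
seen_by_a = read_allocation(1001, 42)  # None, so A decides to write
seen_by_b = read_allocation(1001, 42)  # None, so B also decides to write

if seen_by_a is None:
    write_allocation(1001, 42, cell=1)
if seen_by_b is None:
    write_allocation(1001, 42, cell=2)  # silently replaces A's allocation

final_cell = read_allocation(1001, 42)
```

Both writers saw no existing allocation, so the supposedly immutable allocation changed from cell 1 to cell 2. A unique constraint would have rejected the second insert; a check followed by a separate write cannot.
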
  34. Running Cassandra
      • Compactions happen
        – how Cassandra is maintained
        – Mutations are written to memory (Memtable)
        – Flushed to disk (SSTable) on triggering threshold
        – Eventually, Cassandra merges SSTables as data for individual rows becomes scattered
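
The write path on this slide can be mimicked with a toy store. It is a deliberately simplified sketch: the class `TinyStore`, its column-count flush threshold, and the merge rule (newest timestamp wins) are assumptions for illustration and leave out commit logs, tombstones, and compaction strategies.

```python
class TinyStore:
    """Toy write path: memtable -> flushed SSTables -> compaction merge."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold  # columns held before flushing
        self.memtable = {}   # (row_key, column) -> (timestamp, value)
        self.sstables = []   # flushed tables, oldest first, never modified

    def write(self, row_key, column, value, timestamp):
        self.memtable[(row_key, column)] = (timestamp, value)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def compact(self):
        """Merge all SSTables into one, keeping the newest timestamp per column."""
        merged = {}
        for table in self.sstables:
            for key, (ts, value) in table.items():
                if key not in merged or ts > merged[key][0]:
                    merged[key] = (ts, value)
        self.sstables = [merged]

store = TinyStore(flush_threshold=2)
store.write("cust:1001", "42:cell", 2, timestamp=1)
store.write("cust:1001", "42:enabled", "Y", timestamp=2)  # triggers a flush
store.write("cust:1001", "47:cell", 0, timestamp=3)
store.write("cust:1001", "42:enabled", "N", timestamp=4)  # triggers a flush
tables_before = len(store.sstables)
store.compact()
```

Before compaction, reading row `cust:1001` means consulting two SSTables, and `42:enabled` exists in both. After the merge there is one table with one version of each column, which is the "scattered rows" problem the slide describes being resolved.
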
  35. Compactions, 2
      • Latency spikes happen, especially on read-heavy systems
        – Everything can slow down
        – Throttling in newer Cassandra versions helps
        – Astyanax avoids this problem with latency awareness
  36. Tunings, 1
      • Key and row caches
        – Left unbounded can consume JVM memory needed for normal work
        – Latencies will spike as the JVM fights for free memory
        – Off-heap row cache is better but still maintains data structures on-heap
  37. Tunings, 2
      • mmap() as in-memory cache
        – When the Cassandra process is terminated, mmap pages are returned to the free list
      • Row cache helps at startup
  38. Tunings, 3
      • Sizing memtable flushes for optimizing compactions
        – Easier when writes are uniformly distributed, timewise – easier to reason about flush patterns
        – Best to optimize flushes based on memtable size, not time
  39. Tunings, 4
      • Sharding
        – If a single row has disproportionately high gets/mutates, the nodes holding it will become hot spots
        – If a row grows too large, it can’t fit into memory
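
One common way to shard a hot or oversized row is to append a bucket number to the row key. The talk does not say which scheme was used, so the sketch below is an assumption throughout: the shard count, the CRC32-based bucket function, and the `test:cell:shard` key format are all illustrative.

```python
import zlib

NUM_SHARDS = 4  # assumed shard count; a real value depends on row size and traffic

def shard_for(cust_id):
    """Stable shard number for a customer (crc32 is deterministic across runs)."""
    return zlib.crc32(str(cust_id).encode("ascii")) % NUM_SHARDS

def index_row_key(test_id, cell, cust_id):
    """Row key for the reverse index, split into NUM_SHARDS narrower rows."""
    return f"{test_id}:{cell}:{shard_for(cust_id)}"

def all_row_keys(test_id, cell):
    """Every shard row to read when listing all customers in a test/cell."""
    return [f"{test_id}:{cell}:{shard}" for shard in range(NUM_SHARDS)]

key = index_row_key(42, 2, 1001)
keys = all_row_keys(42, 2)
```

The trade-off is that writes for one test/cell spread across several rows (and usually several nodes), while a full listing now needs one read per shard instead of a single row read.
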
  40. Takeaways
      • Netflix is making all of our components distributed and fault tolerant as we grow domestically and internationally.
      • Cassandra is a core piece of our cloud infrastructure.
      • Netflix is open sourcing its cloud platform, including Cassandra support
  41. 終わり (The End)
      • Q&A
      @jasobrown, jasedbrown@gmail.com, http://www.linkedin.com/in/jasedbrown
  42. References
      • Pat Helland, ‘Normalization Is for Sissies’, http://blogs.msdn.com/b/pathelland/archive/2007/07/23/normalization-is-for-sissies.aspx
