Cassandra from the trenches: migrating Netflix

Slide deck on migrating Netflix to Cassandra in EC2 from a legacy, DC-bound relational database.

Notes

  • Point of departure from the datacenter. Data modeling: relational to non-relational. Implementation(s). Real world: ops, tuning, compactions, gotchas.
  • Background as to why Netflix has moved to the cloud and embraced new databases
  • Circa mid-late 2010, we evaluated a bunch of database systems, primarily focusing on the new NoSQL breed.
  • I lead AB testing, and we'll be using that data set as a model for discussion. I'll describe the legacy Oracle implementation and how I went about moving it to Cassandra.
  • Show example of an AB test (1482) on the homepage
  • Existing data sets in our legacy Oracle database that need to be migrated and transformed
  • LAST SLIDE ON DATA MODELING! Next is running this in prod!
  • Going to share real world issues from design, ops, performance
  • For some systems, as long as one write wins (eventual consistency), all is fine
  • Explain difference between read repair and node repair
  • Makes minor compactions smoother
  • Too large: AB indices ran afoul of this. A problem for reads, compactions, and repairs.

Transcript

  • 1. Cassandra from the trenches: migrating Netflix – Jason Brown, Senior Software Engineer, Netflix – @jasobrown – jasedbrown@gmail.com – http://www.linkedin.com/in/jasedbrown
  • 2. Your host for the evening • Sr. Software Engineer at Netflix > 3 years – Currently lead a team developing and operating AB testing infrastructure in EC2 – Spent time migrating core e-commerce functionality out of PL/SQL and scaling it up • MLB Advanced Media – Ran Ecommerce engineering group • Wandered about in the wireless space (J2ME, BREW)
  • 3. History • In the beginning, there was the webapp – And a database, too – In one datacenter • Then we grew, and grew, and grew – More databases, all conjoined – Database links with PL/SQL and M views – Multi-Master replication
  • 4. History, 2 • Then it melted down (2008) – Oracle MMR between two databases – SPOF – one Oracle instance for website (no backup) • Couldn't ship DVDs for ~3 days
  • 5. History, 3 • Time to rethink everything – Abandon datacenter for EC2 • We're not in the business of building datacenters – Ditch monolithic webapp for distributed systems • Greater independence for all teams/initiatives – Migrate SPOF database to …
  • 6. History, 4 • SimpleDb/S3 – Somebody else manages your database (yeah!) – Tried it out, but didn't quite work well for us – High latency, rate limiting (throttling), (no) auto-sharding, no backup problems • Time to try out one of them (other) newfangled NoSql things…
  • 7. Shiny new toy • We selected Cassandra – Dynamo-model appealed to us – Column-based, key-value data model seemed sufficient for most needs – Performance looked great (rudimentary tests) • Now what? – Put something into it – Run it in EC2 – Sounds easy enough…
  • 8. Data Modeling – Where the rubber meets the road
  • 9. About Netflix's AB Testing • We use it everywhere (no, really) • Basic concepts – Test – An experiment where several competing behaviors are implemented and compared – Cell – different experiences within a test that are being compared against each other – Allocation – a customer-specific assignment to a cell within a test • Customer can only be in one cell of a test at a time • Generally immutable (very important for analysis)
  • 10. Data Modeling – background • AB has two sets of data – metadata about tests – allocations • Both need to be migrated out of Oracle and into Cassandra in the cloud
  • 11. AB – allocations • Single table to hold allocations – Currently at ~950 million records – Plus indices! • One record for every test that every customer is allocated into • Unique constraint on customer/test
  • 12. AB – metadata • Fairly typical parent-child table relationship • Not updated frequently, so service can cache
  • 13. Data modeling in Cassandra • Everywhere I looked, the internets told me to understand my data use patterns – Understand the questions that you need to answer from the data • Meaning: know how to query your data; structure the persistence model to match • There's no free lunch here, apparently
  • 14. Identifying the AB questions that need to be answered • get all allocations for a customer • get count of customers in test/cell • find all customers in a test/cell – So we can kick them out of the test – So we can clean up ancient data – So we can move them to a different cell in test • find all customers allocated to test within a date range – So we can kick them out of the test
  • 15. Modeling allocations in Cassandra • As we're read-heavy, read all allocations for a customer as fast as possible – Denormalize allocations into a single row – But, how do I denormalize? • Find all customers in a test/cell = reverse index • Get count of customers in test/cell = count the entries in the reverse index
  • 16. Denormalization-HOWTO • The internets talk about it, but no real world examples – 'Normalization is for sissies', Pat Helland • Denormalizing allocations per customer – Trivial with a schema-less database
  • 17. Denormalized allocations • Sample normalized data • Sample denormalized data (sparse!)
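
The sample-data slides didn't survive transcription; a hypothetical before/after, borrowing the <testId>:<field> notation from slide 20, might look like this:

    Normalized (Oracle) – one row per customer/test:
        CUST_ID  TEST_ID  CELL  ENABLED
        1234     42       2     Y
        1234     47       0     Y

    Denormalized (Cassandra) – one sparse row per customer, keyed by CUST_ID:
        1234 -> 42:cell = 2, 42:enabled = Y, 47:cell = 0, 47:enabled = Y
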
  • 18. Implementing allocations • As each allocation for a customer has only a handful of data points, they can logically be grouped together • Hello, super columns • Avoided blobs, json or otherwise – data race concerns – BI integration – Serialization alg changes could tank the data
  • 19. Implementing allocations, second round • But, Cassandra devs secretly despise (er, "don't enjoy") super columns • Switched to standard column family, using composite columns • Composite columns are sorted by each 'token' in name – This sorts each allocation's data together (by testId)
  • 20. Composite columns • Allocation column naming convention – <testId>:<field> – 42:cell = 2 – 42:enabled = Y – 47:cell = 0 – 47:enabled = Y • Using terse field names, but still have column name overhead (~15 bytes)
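
A minimal Java sketch of why this naming convention clusters an allocation's columns together: compare the testId token numerically, then the field token lexically. Cassandra's CompositeType performs this comparison server-side; the comparator here only illustrates the resulting order.

    import java.util.Comparator;
    import java.util.TreeMap;

    public class CompositeColumnOrder {
        // Mimics composite-column ordering for "<testId>:<field>" names.
        static final Comparator<String> BY_TOKEN = (a, b) -> {
            String[] ta = a.split(":", 2);
            String[] tb = b.split(":", 2);
            int cmp = Long.compare(Long.parseLong(ta[0]), Long.parseLong(tb[0]));
            return cmp != 0 ? cmp : ta[1].compareTo(tb[1]);
        };

        public static void main(String[] args) {
            TreeMap<String, String> row = new TreeMap<>(BY_TOKEN);
            row.put("47:cell", "0");
            row.put("42:enabled", "Y");
            row.put("47:enabled", "Y");
            row.put("42:cell", "2");
            // Prints 42:cell, 42:enabled, 47:cell, 47:enabled - each test's
            // columns sort contiguously, grouping the allocation's data.
            row.forEach((name, value) -> System.out.println(name + " = " + value));
        }
    }
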
  • 21. Implementing indices • Cassandra's secondary indices vs. hand-built and maintained alternate indices • Secondary indices work great on uniform data between rows • But sparse column data not so easy
  • 22. Hand-built indices, 1 • Reverse index – Test/cell (key) to custIds (columns) • Column value is timestamp • Mutate on allocating a customer into test
  • 23. Hand-built indices, 2 • Counter column family – Test/cell to count of customers in test columns – Mutate on allocating a customer into test • Counters are not idempotent! • Mutates need to write to every node that hosts that key
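
A sketch of the two index mutations fired when a customer is allocated, with plain Java maps standing in for the column families; names like REVERSE_INDEX and allocate() are illustrative, not Netflix's actual code:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicLong;

    public class HandBuiltIndices {
        // Reverse index CF: row key "testId:cell" -> {custId -> allocation timestamp}
        static final Map<String, Map<Long, Long>> REVERSE_INDEX = new ConcurrentHashMap<>();
        // Counter CF: row key "testId:cell" -> customer count
        static final Map<String, AtomicLong> COUNTERS = new ConcurrentHashMap<>();

        static void allocate(long custId, long testId, int cell) {
            String key = testId + ":" + cell;
            REVERSE_INDEX.computeIfAbsent(key, k -> new ConcurrentHashMap<>())
                         .put(custId, System.currentTimeMillis());
            // Not idempotent: replaying this increment double-counts, while
            // rewriting the reverse-index column above is harmless.
            COUNTERS.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
        }

        // The rebuild from the next slide: re-derive the counter's value
        // by counting the reverse index's columns.
        static void rebuildCounter(String key) {
            long count = REVERSE_INDEX.getOrDefault(key, Map.of()).size();
            COUNTERS.put(key, new AtomicLong(count));
        }

        public static void main(String[] args) {
            allocate(1234L, 42L, 2);
            allocate(5678L, 42L, 2);
            rebuildCounter("42:2");
            System.out.println(COUNTERS.get("42:2")); // 2
        }
    }
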
  • 24. Index rebuilding • Yeah, even Oracle needs to have its indices rebuilt • Easy enough to rebuild the reverse index, but how about that counter column? – Read the reverse index for the count and write that as the counter's value
  • 25. Modeling AB metadata in Cassandra • Explored several models, including json blobs, spreading across multiple CFs, differing degrees of denormalization • Reverse index to identify all tests for loading
  • 26. Implementing metadata • One CF, one row for all of a test's data – Every data point is a column – no blobs • Composite columns – type:id:field • Types = base info, cells, allocation plans • Id = cell number, allocation plan (gu)id • Field = type-specific – Base info = test name, description, enabled – Cell's name / description – Plan's start/end dates, country to allocate to
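
Concretely, a single test's metadata row under this type:id:field convention might carry columns like the following (names and values are hypothetical; "9f3a" stands in for a plan guid):

    baseinfo:0:name = Homepage Ordering Test
    baseinfo:0:description = Compare row-ranking variants
    baseinfo:0:enabled = Y
    cell:1:name = Control
    cell:2:name = New ranking
    plan:9f3a:start = 2012-01-15
    plan:9f3a:country = US
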
  • 27. Into the real world … here comes the hurt
  • 28. Allocation mutates • AB allocations are immutable, so how do you prevent mutating? – Oracle – unique constraint on table – Cassandra – read before write • Read before write in a distributed system is a data race
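
A small Java sketch of the check-then-write race, with a map standing in for Cassandra; two clients racing between the read and the write can both observe "absent" and both write:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class AllocationRace {
        static final Map<String, Integer> ALLOCATIONS = new ConcurrentHashMap<>();

        // UNSAFE as a uniqueness guard: between containsKey() and put(),
        // another client can pass the same check, so both writers see
        // "absent" and both write - last timestamp wins, immutability lost.
        static void allocateUnsafe(long custId, long testId, int cell) {
            String key = custId + ":" + testId;
            if (!ALLOCATIONS.containsKey(key)) {   // read ...
                ALLOCATIONS.put(key, cell);        // ... then write: the race window
            }
        }
    }
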
  • 29. Running Cassandra • Compactions happen – Part of the Cassandra lifestyle – Mutations are written to memory (memtable) – Flushed to disk (sstable) on triggering threshold • Time • Size • Operations against column family – Eventually, Cassandra decides to merge sstables as data for individual rows becomes scattered
  • 30. Compactions, 2 • Spikes happen, esp. on read-heavy systems – Everything can slow down – Sometimes, average latency > 95%ile – Throttling in newer Cass versions helps, I think – Affects clients (hector, astyanax)
  • 31. Repairs • Different from read repair! • Fix all the data in a single node by pulling shared ranges from neighbor nodes
  • 32. Repairs, 2 • Replication factor determines number of nodes involved in repair of a single node • Neighbor nodes will perform validation compaction – Pushes disk and network hard depending on data size • Guess what happens when you run a multi-region cluster?
  • 33. Client libraries • Round-robin is not the way to go for connection pooling – Coordinator Cassandra nodes will incorrectly be marked down rather than the slow target node • Token-aware is safer, faster, but harder to implement
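
A minimal sketch of token-aware routing under a simplified ring model (Hector's and Astyanax's real implementations differ in detail): hash the row key and route to the first node whose token is >= the hash, wrapping around the ring.

    import java.util.Map;
    import java.util.TreeMap;

    public class TokenAwareRouter {
        private final TreeMap<Long, String> ring = new TreeMap<>();

        TokenAwareRouter(Map<Long, String> tokenToNode) {
            ring.putAll(tokenToNode);
        }

        // Route to the first node token >= hash(key), wrapping past the
        // highest token back to the lowest.
        String nodeFor(String rowKey) {
            Map.Entry<Long, String> owner = ring.ceilingEntry(hash(rowKey));
            return owner != null ? owner.getValue() : ring.firstEntry().getValue();
        }

        // Stand-in hash; Cassandra's RandomPartitioner derives tokens from
        // an MD5 of the row key.
        private static long hash(String key) {
            return key.hashCode() & 0x7fffffffL;
        }

        public static void main(String[] args) {
            TokenAwareRouter router = new TokenAwareRouter(
                    Map.of(100L, "node-a", 200L, "node-b", 300L, "node-c"));
            System.out.println(router.nodeFor("cust:1234"));
        }
    }
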
  • 34. Tunings, 1 • Key and row caches – Left unbounded, they can chew up JVM memory needed for normal work – Latencies will spike as the JVM needs to fight for memory – Off-heap row cache is better, but still maintains data structures on-heap
  • 35. Tunings, 2 • mmap() as in-memory cache – When the process is terminated, mmap pages are added to the free list
  • 36. Tunings, 3 • Sizing memtable flushes for optimizing compactions – Easier when writes are uniformly distributed, timewise – easier to reason about flush patterns – Best to optimize flushes based on memtable size, not time
  • 37. Tunings, 4 • Sharding – Not dead yet! – If a single row has disproportionately high gets/mutates, the nodes holding it will become hot spots – If a row grows too large, it won't fit into memory
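
A sketch of manual row sharding, one common workaround for hot or oversized rows: append a deterministic shard suffix to the row key so a single logical row spreads over several physical rows (and nodes). NUM_SHARDS and the key format are illustrative.

    public class RowSharding {
        static final int NUM_SHARDS = 16;

        // e.g. the reverse-index row for a huge test: "42:1" -> "42:1#5"
        static String shardedKey(String logicalKey, long custId) {
            return logicalKey + "#" + (custId % NUM_SHARDS);
        }

        public static void main(String[] args) {
            System.out.println(shardedKey("42:1", 123456789L)); // 42:1#5
        }
    }

Reads then fan out over every shard of the logical row and merge, trading read complexity for write and storage distribution.
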
  • 38. Takeaways • Netflix is making all of our components distributed and fault tolerant as we grow domestically and internationally • Cassandra is a core piece of our cloud infrastructure
  • 39. 終わり (The End) • Q&A – @jasobrown – jasedbrown@gmail.com – http://www.linkedin.com/in/jasedbrown
  • 40. References • Pat Helland, "Normalization Is for Sissies", http://blogs.msdn.com/b/pathelland/archive/2007/07/23/normalization-is-for-sissies.aspx • btoddb, "Storage Sizing", http://btoddb-cass-storage.blogspot.com/