Update talk on Cassandra at Netflix, presented at the Silicon Valley NoSQL meetup on 9 Feb 2012. Includes an introduction to Astyanax, an open source Cassandra client written in Java.
Cassandra from the trenches: migrating Netflix (update)
1. Cassandra from the trenches: migrating Netflix
Jason Brown
Senior Software Engineer
Netflix
@jasobrown jasedbrown@gmail.com
http://www.linkedin.com/in/jasedbrown
2. History, 2008
• In the beginning, there was the webapp
– And a database
– In one datacenter
• Then we grew, and grew, and grew
– More databases, all conjoined
– Database links, PL/SQL, Materialized views
– Multi-Master replication (MMR)
• Then it melted down
– Couldn’t ship DVDs for ~3 days
3. History, 2009
• Time to rethink everything
– Abandon our datacenter
– Ditch the monolithic webapp
– Migrate single point of failure database to …
4. History, 2010
• SimpleDB/S3
– Managed by Amazon, not us
– Got us started with NoSQL in the cloud
– Problems:
• High latency, rate limiting (throttling)
• (no) auto-sharding, no backups
5. Shiny new toy (2011)
• We switched to Cassandra
– Similar to SimpleDB, with limits removed
– Dynamo-model appealed to us
– Column-based, key-value data model seemed sufficient for most needs
– Performance looked great (rudimentary tests)
7. About Netflix’s AB Testing
• Basic concepts
– Test – An experiment where several competing behaviors are implemented and compared
– Cell – different experiences within a test that are being compared against each other
– Allocation – a customer-specific assignment to a cell within a test
8. Data Modeling - background
• AB has two sets of data
– metadata about tests
– allocations
9. AB - allocations
• Single table to hold allocations
– Currently at > 1 billion records
– Plus indices!
• One record for every test that every customer is allocated into
• Unique constraint on customer/test
10. AB – relational model
• Typical parent-child table relationship
• Not updated frequently, so service can cache
11. Data modeling in Cassandra
• Everywhere I looked, the Internet told me to understand my data use patterns
• Identify the questions that you need to answer from the data
• Know how to query your data set and make the persistence model match
12. Identifying the AB questions that need to be answered
• High traffic
– get all allocations for a customer
• Low traffic
– get count of customers in test/cell
– find all customers in a test/cell
– find all customers in a test who were added within a date range
13. Modeling allocations in Cassandra
• Read all allocations for a customer
– as fast as possible
• Find all customers in a test/cell
– reverse index
• Get count of customers in test/cell
– count the entries in the reverse index
14. Denormalization - HOWTO
• No real world examples
– ‘Normalization is for sissies’, Pat Helland
• Denormalize allocations per customer
– Trivial with a schema-less database
16. Implementing allocations
• As each allocation for a customer has a handful of data points, they can logically be grouped together
• Avoided blobs, json or otherwise
• Using a standard column family, with composite columns
17. Composite columns
• Composite columns are sorted by each ‘token’ in the name
• Allocation column naming convention
– <testId>:<field>
– 42:cell = 2
– 42:enabled = Y
– 47:cell = 0
– 47:enabled = Y
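Since the deck shows the convention but no client code, here is a minimal Astyanax sketch of writing an allocation this way. The CF name, class name, and serializers are illustrative assumptions, and for brevity the column names are concatenated strings under a string comparator rather than a true CompositeType (which Astyanax supports via AnnotatedCompositeSerializer; a real composite comparator also keeps numeric testIds ordered correctly, where lexical sorting would put 100 before 42).

```java
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.LongSerializer;
import com.netflix.astyanax.serializers.StringSerializer;

public class AllocationWriter {
    // Assumed CF: row key = customerId, column names follow "<testId>:<field>"
    private static final ColumnFamily<Long, String> CF_ALLOCATIONS =
            new ColumnFamily<Long, String>("ab_allocations",
                    LongSerializer.get(), StringSerializer.get());

    private final Keyspace keyspace;

    public AllocationWriter(Keyspace keyspace) {
        this.keyspace = keyspace;
    }

    /** Record that a customer was allocated into a cell of a test. */
    public void allocate(long customerId, int testId, int cell) throws ConnectionException {
        MutationBatch m = keyspace.prepareMutationBatch();
        m.withRow(CF_ALLOCATIONS, customerId)
                .putColumn(testId + ":cell", Integer.toString(cell), null) // e.g. 42:cell = 2
                .putColumn(testId + ":enabled", "Y", null);                // e.g. 42:enabled = Y
        m.execute();
    }
}
```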
18. Modeling AB metadata in Cassandra
• Explored several models, including json blobs, spreading across multiple CFs, differing degrees of denormalization
• Reverse index to identify all tests for loading
19. Implementing metadata
• One CF, one row holding all of a test’s data
– Every data point is a column – no blobs
• Composite columns
– type:id:field
• Types = base info, cells, allocation plans
• Id = cell number, allocation plan (gu)id
• Field = type-specific
– Base info = test name, description, enabled
– Cell’s name / description
– Plan’s start/end dates, country to allocate to
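Reading a test back is then a single-row fetch whose column names get split into their three tokens. Another hedged sketch; the CF name and the exact token layout are assumptions based on the slide above.

```java
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.Column;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.serializers.LongSerializer;
import com.netflix.astyanax.serializers.StringSerializer;

public class TestMetadataReader {
    // Assumed CF: row key = testId, column names follow "type:id:field".
    private static final ColumnFamily<Long, String> CF_TEST_METADATA =
            new ColumnFamily<Long, String>("ab_test_metadata",
                    LongSerializer.get(), StringSerializer.get());

    private final Keyspace keyspace;

    public TestMetadataReader(Keyspace keyspace) {
        this.keyspace = keyspace;
    }

    /** Print every data point of one test by walking its single row. */
    public void dumpTest(long testId) throws ConnectionException {
        ColumnList<String> columns = keyspace.prepareQuery(CF_TEST_METADATA)
                .getKey(testId)
                .execute()
                .getResult();
        for (Column<String> col : columns) {
            String[] parts = col.getName().split(":", 3); // type, id, field
            if (parts.length < 3) continue; // sketch assumes three tokens
            System.out.printf("type=%s id=%s field=%s value=%s%n",
                    parts[0], parts[1], parts[2], col.getStringValue());
        }
    }
}
```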
20. Implementing indices
• Cassandra’s secondary indices vs. hand-built and maintained alternate indices
• Secondary indices work great on uniform data between rows
• But sparse column data not easy to index
21. Hand-built Indices, 1
• Reverse index
– Test/cell (key) to custIds (columns)
• Column value is timestamp
• Updating the index when allocating a customer into a test (double write)
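Both writes fit in one Astyanax MutationBatch, so the allocation row and its reverse-index row travel together. A sketch under assumed layouts: the index row key is "<testId>:<cell>", one column per customer, with the allocation timestamp as the column value.

```java
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.LongSerializer;
import com.netflix.astyanax.serializers.StringSerializer;

public class ReverseIndexWriter {
    // Assumed layouts: allocations keyed by customer, reverse index keyed by "test:cell".
    private static final ColumnFamily<Long, String> CF_ALLOCATIONS =
            new ColumnFamily<Long, String>("ab_allocations",
                    LongSerializer.get(), StringSerializer.get());
    private static final ColumnFamily<String, Long> CF_REVERSE_INDEX =
            new ColumnFamily<String, Long>("ab_reverse_index",
                    StringSerializer.get(), LongSerializer.get());

    private final Keyspace keyspace;

    public ReverseIndexWriter(Keyspace keyspace) {
        this.keyspace = keyspace;
    }

    /** Allocate a customer and update the reverse index in one batch (the "double write"). */
    public void allocate(long customerId, int testId, int cell) throws ConnectionException {
        MutationBatch m = keyspace.prepareMutationBatch();
        m.withRow(CF_ALLOCATIONS, customerId)
                .putColumn(testId + ":cell", Integer.toString(cell), null);
        m.withRow(CF_REVERSE_INDEX, testId + ":" + cell)
                .putColumn(customerId, System.currentTimeMillis(), null); // value = timestamp
        m.execute();
    }
}
```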
22. Hand-built indices, 2
• Counter column family
– Test/cell (key) to count of customers in the test (counter columns)
– Mutate on allocating a customer into a test
• Counters are not idempotent!
• Mutates need to write to every node that hosts that key
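A counter mutate might look like the sketch below (CF and column names are assumptions). The non-idempotence the slide warns about is exactly why a timed-out increment cannot safely be retried: there is no way to know whether the first attempt was applied.

```java
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.StringSerializer;

public class AllocationCounter {
    // Assumed counter CF: row key = "<testId>:<cell>", one counter column "count".
    private static final ColumnFamily<String, String> CF_ALLOCATION_COUNTS =
            new ColumnFamily<String, String>("ab_allocation_counts",
                    StringSerializer.get(), StringSerializer.get());

    private final Keyspace keyspace;

    public AllocationCounter(Keyspace keyspace) {
        this.keyspace = keyspace;
    }

    /** Bump the customer count for a test/cell when allocating. */
    public void increment(int testId, int cell) throws ConnectionException {
        MutationBatch m = keyspace.prepareMutationBatch();
        // NOT idempotent: if this times out and is retried, the count can drift.
        m.withRow(CF_ALLOCATION_COUNTS, testId + ":" + cell)
                .incrementCounterColumn("count", 1L);
        m.execute();
    }
}
```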
23. Index rebuilding
• To keep the index consistent, it needs to be rebuilt occasionally
• Even Oracle needs to have its indices rebuilt
32. Astyanax latency aware
• Samples response times from Cassandra nodes
• Favors faster responding nodes in pool
• Use with token aware connection pooling
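Wiring this up is a matter of Astyanax connection-pool configuration. A hedged sketch follows: cluster, keyspace, and seed values are placeholders, and the SMA latency-score parameters (update/reset intervals, window size, badness threshold) are illustrative rather than recommended settings.

```java
import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolType;
import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
import com.netflix.astyanax.connectionpool.impl.SmaLatencyScoreStrategyImpl;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

public class AstyanaxSetup {
    public static Keyspace connect() {
        AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
                .forCluster("ab_cluster")   // placeholder
                .forKeyspace("ab")          // placeholder
                .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                        .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
                        .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE))
                .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("ab_pool")
                        .setPort(9160)
                        .setMaxConnsPerHost(10)
                        .setSeeds("127.0.0.1:9160") // placeholder
                        // simple-moving-average latency scoring; parameters are illustrative:
                        // update interval (ms), reset interval (ms), window size, badness threshold
                        .setLatencyScoreStrategy(new SmaLatencyScoreStrategyImpl(10000, 10000, 100, 2)))
                .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
                .buildKeyspace(ThriftFamilyFactory.getInstance());
        context.start();
        return context.getClient(); // older Astyanax releases used context.getEntity()
    }
}
```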
33. Allocation mutates
• AB allocations are immutable, so we need to prevent mutating
• Oracle - unique table constraint
• Cassandra - read before write
– data race!
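The race is easy to see in code. In this hedged sketch, nothing stops two concurrent callers from both passing the existence check and then both writing; pre-CAS Cassandra offers no unique constraint to close that window.

```java
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.serializers.LongSerializer;
import com.netflix.astyanax.serializers.StringSerializer;

public class ImmutableAllocator {
    private static final ColumnFamily<Long, String> CF_ALLOCATIONS =
            new ColumnFamily<Long, String>("ab_allocations",
                    LongSerializer.get(), StringSerializer.get());

    private final Keyspace keyspace;

    public ImmutableAllocator(Keyspace keyspace) {
        this.keyspace = keyspace;
    }

    /** Returns false if the customer already had an allocation for this test. */
    public boolean allocateOnce(long customerId, int testId, int cell) throws ConnectionException {
        // Read first (the whole row, for simplicity of the sketch).
        ColumnList<String> row = keyspace.prepareQuery(CF_ALLOCATIONS)
                .getKey(customerId).execute().getResult();
        if (row.getColumnByName(testId + ":cell") != null) {
            return false; // already allocated; allocations are immutable
        }
        // DATA RACE: a concurrent writer can slip in between the read above
        // and the write below; nothing server-side prevents it here.
        MutationBatch m = keyspace.prepareMutationBatch();
        m.withRow(CF_ALLOCATIONS, customerId)
                .putColumn(testId + ":cell", Integer.toString(cell), null);
        m.execute();
        return true;
    }
}
```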
34. Running Cassandra
• Compactions happen
– how Cassandra is maintained
– Mutations are written to memory (Memtable)
– Flushed to disk (SSTable) on triggering threshold
– Eventually, Cassandra merges SSTables as data for individual rows becomes scattered
35. Compactions, 2
• Latency spikes happen, especially on read-heavy systems
– Everything can slow down
– Throttling in newer Cassandra versions helps
– Astyanax avoids this problem with latency awareness
36. Tunings, 1
• Key and row caches
– Left unbounded, they can consume JVM memory needed for normal work
– Latencies will spike as the JVM fights for free memory
– Off-heap row cache is better but still maintains data structures on-heap
37. Tunings, 2
• mmap() as in-memory cache
– When the Cassandra process is terminated, mmap pages are returned to the free list
• Row cache helps at startup
38. Tunings, 3
• Sizing memtable flushes for optimizing compactions
– Easier when writes are uniformly distributed, timewise – easier to reason about flush patterns
– Best to optimize flushes based on memtable size, not time
39. Tunings, 4
• Sharding
– If a single row has disproportionately high gets/mutates, the nodes holding it will become hot spots
– If a row grows too large, it can’t fit into memory
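A common remedy, not shown in the deck, is to split a hot or oversized logical row across a fixed number of physical rows by folding a shard number into the row key; readers then fan out over all shards and merge. A minimal sketch with assumed naming:

```java
/** A sketch of manual row sharding; all names and the shard count are assumptions. */
public final class RowSharder {
    private static final int NUM_SHARDS = 16; // illustrative

    /** Physical row key for one member of the logical row, e.g. "42:1:7". */
    public static String shardedKey(int testId, int cell, long customerId) {
        int shard = (int) (customerId % NUM_SHARDS); // assumes non-negative ids
        return testId + ":" + cell + ":" + shard;
    }

    /** All physical keys a reader must fetch and merge for one logical row. */
    public static String[] allKeys(int testId, int cell) {
        String[] keys = new String[NUM_SHARDS];
        for (int shard = 0; shard < NUM_SHARDS; shard++) {
            keys[shard] = testId + ":" + cell + ":" + shard;
        }
        return keys;
    }
}
```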
40. Takeaways
• Netflix is making all of our components distributed and fault tolerant as we grow domestically and internationally.
• Cassandra is a core piece of our cloud infrastructure.
• Netflix is open sourcing its cloud platform, including Cassandra support