Cloud Data Persistence @ Netflix
Monal Daxini
Senior Software Engineer, Cloud Database Engineering
@monaldax
50M+ Subscribers
Summary 
Netflix OSS 
Microservices 
Me @Netflix Season 1, 2
Cassandra @ Netflix 
Cassandra Best Practices 
Coming Soon…
Start with Zero To Cloud With @NetflixOSS
https://github.com/Netflix-Skunkworks/zerotocloud
Function / OSS Library
Karyon / Governator
RxJava
Hystrix
Ribbon / Eureka
Curator 
EVCache 
Astyanax 
Turbine 
Servo 
Blitz4J 
Archaius
Building Apps and AMIs
App → WAR → AMI → ASG/Cluster → Deploy → Launch Instances
@stonse
NetflixOSS 
Suro Data Pipeline 
Eureka 
Zuul 
Edda
Micro Services 
Microservices DO NOT mean better availability
Need Fault Tolerant Architecture 
Service Dependency View 
Distributed Tracing (Dapper inspired)
Micro Services 
1 response - 1 monolithic service: 99.99% uptime
1 response - 30 microservices, each at 99.99% uptime:
0.9999^30 ≈ 0.997, i.e. ~99.7% overall uptime (20+ hours of downtime per year)
Micro Services 
Actual Scale 
~2 Billion edge requests per day
Result in ~20 Billion fan-out requests to ~100 different microservices
Fault Tolerant Arch 
Dependency Isolation
Aggressive timeouts 
Circuit breakers
MicroServices Container 
Synchronous: Tomcat, ThreadPool (1 thread per request)
Asynchronous: RxNetty (UDP, TCP, WebSockets, SSE), EventLoops
MicroServices Container 
Rx: eases async programming, avoids callback hell
Netty: leverages EventLoops
Rx + Netty = RxNetty
* Courtesy Brendan Gregg
AWS Maintenance
@Netflix Season-1 
Media Cloud Engineering
Encoding PaaS 
Master-Worker Pattern
Decoupled by Priority Queues 
with message lease 
State in Cassandra
Oracle >> Cassandra 
Data Model & Lack of ACID 
Client Cluster Symbiosis 
Embrace Eventual Consistency 
Data Migration 
Shadow Write / Reads
Object To Cassandra Mapping 
/**
 * @author mdaxini
 */
@CColumnFamily(name = "Sequence", shared = true)
@Audited(columnFamily = "sequence_audit")
public class SequenceBean {

    @CId(name = "id")
    private String sequenceName;

    @CColumn(name = "sequenceValue")
    private Long sequenceValue;

    @CColumn(name = "updated")
    @TemporalAutoUpdate
    @JsonProperty("updated")
    private Date updated;
}
Object To Cassandra Mapping 
@JsonAutoDetect(JsonMethod.NONE)
@JsonIgnoreProperties(ignoreUnknown = true)
@CColumnFamily(name = "task")
public class Job {

    @CId
    private JobKey jobKey;
}

public final class TaskKey {

    @CId(order = 0)
    private Long packageId;

    @CId(order = 1)
    private UUID taskId;
}
Priority-Scheduling Queue 
Evolution: 
One SQS queue per priority range
Store and forward (rate-adaptive) to SQS queue
Rule-based priority, leases, RDBMS-based with prefetch
Encoding PaaS Farm 
One command deployment and upgrade 
Self Serve 
Homogeneous View of Windows and Linux 
Pioneered Ubuntu - production since 2011
Innovate Fast 
Build for Pragmatic Scale 
Innovate for Business 
Standardize Later*
@Netflix Season-2 
Cloud Database Engineering 
[CDE]
Platform Big Data / Caching & Services
Cassandra: Astyanax, Priam, CassJMeter
Hadoop Platform as a Service: Genie, Lipstick, Inviso*
Caching
Adapted from a slide by @stonse
CDE Charter 
Cassandra (1.2.x >> 2.0.x): Priam, Astyanax, Skynet*
Dynomite*
Redis
ElasticSearch
Spark*
Solr*
* Under Construction
All OLTP Data in Cassandra… Almost!
Cassandra Prod Footprint 
90+ Clusters 
2700+ Nodes 
4 Datacenters (Amazon Regions) 
>1 Trillion operations per day
Cassandra Best Practices* 
Usage 
*Practices I have found useful, YMMV
Use RandomPartitioner 
Have at least 3 replicas (quorum) 
Same number of replicas - simpler operations 
create keyspace oracle
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {us-west-2 : 3, us-east : 3};
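For reference, the same keyspace can be declared in CQL3; a minimal sketch, assuming the same keyspace name and per-region replication factors shown above:

CREATE KEYSPACE oracle
  WITH replication = { 'class' : 'NetworkTopologyStrategy',
                       'us-east' : 3, 'us-west-2' : 3 };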
Move to CQL3 from Thrift
Codifies best practices
Leverage Collections (albeit with restricted cardinality)
Use Key Caching
By default, turn off Row Caching
Rename all composite columns in one ALTER TABLE statement
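A hedged sketch of the last two points against the events table shown later in this deck (the new column names bucket and offset are hypothetical):

-- rename all clustering (composite) columns in a single statement
ALTER TABLE events RENAME column1 TO bucket AND column2 TO offset;

-- key cache stays on, row cache off for this table (C* 1.2/2.0 syntax)
ALTER TABLE events WITH caching = 'keys_only';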
Watch the length of column names
Use "COMPACT STORAGE" wisely
Cannot use collections - they depend on CompositeType
Non-compact storage costs 2 bytes per internal cell, but is preferred
* Image courtesy Datastax blog
CREATE TABLE events (
  key text,
  column1 int,
  column2 int,
  value text,
  PRIMARY KEY (key, column1, column2)
) WITH COMPACT STORAGE;

cqlsh:test> SELECT * FROM events;

 key    | column1 | column2 | value
--------+---------+---------+---------
 tbomba |       4 |     120 | event 1
 tbomba |       4 |    2500 | event 2
 tbomba |       9 |     521 | event 3
 tbomba |      10 |    3525 | event 4

* Courtesy Datastax blog
Prefer CL_ONE 
data replication completes within 500ms across the region
If using quorum reads and writes, set read_repair_chance to 0.0 or a very low value
Make sure repairs are run often
Eventual Consistency does not mean hopeful consistency
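A minimal sketch of the read_repair_chance tuning, assuming the events table from the earlier example:

ALTER TABLE events
  WITH read_repair_chance = 0.0
  AND dclocal_read_repair_chance = 0.0;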
Avoid secondary indexes for high-cardinality values
In most cases we set gc_grace_seconds = 10 days
Avoid hot rows
Detect them using node-level latency metrics
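As a sketch of both points (the users table and country column are hypothetical; 864000 seconds = 10 days):

-- secondary index on a low-cardinality column; avoid for near-unique values
CREATE INDEX ON users (country);

ALTER TABLE users WITH gc_grace_seconds = 864000;  -- 10 days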
Avoid heavy rows 
Avoid overly wide rows (keep under ~100K columns, smaller if possible)
Don’t use C* as a Queue 
Tombstones will bite you
SizeTieredCompactionStrategy 
write-heavy workloads
unpredictable I/O, needs 2x disk space headroom
LeveledCompactionStrategy
read-heavy workloads
predictable I/O, ~2x the I/O of STCS
SizeTieredCompactionStrategy 
LeveledCompactionStrategy 
* Image courtesy Datastax blog
Guesstimate and then validate sstable_size_in_mb 
Hint: based on write rate and size 
160MB for LeveledCompactionStrategy
SizeTieredCompactionStrategy - C* default 50MB
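A hedged sketch of setting the compaction strategy and SSTable target size per table, reusing the events table from the earlier example and the 160 MB figure above:

ALTER TABLE events WITH compaction =
  { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };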
Atomic batches 
no isolation; writes are only isolated within a single partition key
no automatic rollback
Lightweight transactions
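Hedged CQL sketches of both features, against hypothetical tasks tables (lightweight transactions require C* 2.0+):

BEGIN BATCH
  INSERT INTO tasks (package_id, task_id, state) VALUES (1, 100, 'queued');
  INSERT INTO tasks_by_state (state, package_id, task_id) VALUES ('queued', 1, 100);
APPLY BATCH;

-- lightweight transaction: insert only if the row does not already exist
INSERT INTO tasks (package_id, task_id, state) VALUES (1, 100, 'queued') IF NOT EXISTS;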
Cassandra Best Practices 
Operations 
*Practices we have found useful, YMMV
If your C* cluster footprint is significant:
you must have good automation
and at least a C* semi-expert
Use cstar_perf to validate your initial clusters
We don’t use vnodes
Size each node’s disk to 2x the expected data - ephemeral SSDs, not EBS
Monitoring and alerting 
Read/write latency - coordinator & node level
Compaction stats 
Heap Usage 
Network 
Max & Min Row sizes
Fixed tokens, double the cluster to expand 
Important to size the cluster for app needs up front
The benefits of fixed tokens outweigh vnodes for us
Take backups of all the nodes
to allow for eventual consistency on restores
Note: by default the commitlog fsyncs only every 10 seconds
Run repairs before GCGraceSeconds expires 
Throttle compactions and repairs 
Repairs can take a long time 
run one primary range and one keyspace at a time to avoid performance impact
Schema disagreements - pick the nodes with the older schema date and restart them one at a time
nodetool resetlocalschema is not persistent on 1.2
Recycle nodes in AWS to prevent staleness
Expanding to a new region:
Launch nodes in the new region without bootstrapping
Change the keyspace replication
Run nodetool rebuild on the nodes in the new region (as sketched below)
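A hedged sketch of the replication change, reusing the oracle keyspace from the earlier example and an illustrative eu-west datacenter as the new region:

ALTER KEYSPACE oracle WITH replication =
  { 'class' : 'NetworkTopologyStrategy',
    'us-east' : 3, 'us-west-2' : 3, 'eu-west' : 3 };

-- then, on each node in the new region: nodetool rebuild <existing-dc-name>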
More Info 
http://techblog.netflix.com/ 
http://netflix.github.io/ 
http://slideshare.net/netflix 
https://www.youtube.com/user/NetflixOpenSource 
https://www.youtube.com/user/NetflixIR $$$