Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016

A NETFLIX ORIGINAL SERVICE
Streaming Data Pipeline @ scale in the Cloud
Monal Daxini
Real Time Data Infrastructure
Senior Software Engineer, Netflix
https://www.linkedin.com/in/monaldaxini
@monaldax
#Netflix #Keystone

Netflix Is a Data-Driven Company
● Content
● Product
● Marketing
● Finance
● Business Development
● Talent
● Infrastructure
← Culture of Analytics →

In the Old Days ...
[Diagram: Event Producers → EMR]

Chukwa / Suro + Real-Time Branch
[Diagram components: Event Producer, Suro Proxy, Kafka, Suro Router, EMR, Consumer Kafka, Druid, Stream Consumers]

Why a new pipeline?
● Support at-least-once processing
● Scale, Multi-tenancy, Ease of Operations
● Enable future value adds - Stream Processing As a Service
● Replace dormant open source software - Chukwa

Goal - Migrate events to a new pipeline in flight,
while losing no more than 0.1% of them

Keystone
[Diagram components: Event Producer, KSProxy, Fronting Kafka, Samza Router, EMR, Consumer Kafka, Stream Consumers, Control Plane]

Over 75M Members
190 Countries
125M hours/day → 11B hours / quarter
14,269 years / day → 1,255,707 years / quarter
1000+ devices
37% of Internet traffic at peak

Keystone Scale
700 billion unique events ingested / day
1 trillion unique events ingested / day - Dec 2015
1+ trillion events processed every day

Keystone Scale
11 million events ingested / sec @ peak
24 GB / sec @ peak
Up to 10 MB payload / avg. 3 KB
1.3 PB / day

Keystone Scale
99.99%+ Availability / Four 9s

Want to know more...
Netflix Tech Blog - Pipeline Evolution

Event flow
Keystone Pipeline As a Service

Keystone
[Diagram components: Event Producer, Fronting Kafka, Samza Router, EMR, Consumer Kafka, Stream Consumers, Control Plane]

Event Payload is Immutable
At-least-once semantics*
* Once the event makes it to Kafka, there are disaster scenarios where this breaks.

Injected Event Metadata
● GUID
● Timestamp
● Host
● App
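A minimal sketch of metadata injection, assuming the producer library wraps the opaque, immutable payload in an envelope carrying the four fields listed above. The `Envelope` class and `wrap` helper are hypothetical names for illustration, not Keystone's actual API.

```python
import socket
import time
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Envelope:
    guid: str          # unique event id
    timestamp_ms: int  # producer-side time, epoch milliseconds
    host: str          # host that produced the event
    app: str           # application that produced the event
    payload: bytes     # opaque event body, never modified by the pipeline


def wrap(payload: bytes, app: str) -> Envelope:
    """Hypothetical helper: inject metadata around an unmodified payload."""
    return Envelope(
        guid=str(uuid.uuid4()),
        timestamp_ms=int(time.time() * 1000),
        host=socket.gethostname(),
        app=app,
        payload=payload,
    )
```

One plausible use of the GUID: under at-least-once delivery a consumer may see the same event twice, and a per-event id lets it de-duplicate.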

Custom Extensible Wire Protocol
● Backwards and forwards compatibility
● Support multiple serialization formats
○ JSON and Avro; Protobuf in the works
● Additional metadata
● Efficient - 10 bytes of overhead per message (messages range from hundreds of bytes to 10 MB)
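The slide does not give the header layout, so the following is a hypothetical 10-byte frame (magic, version, serialization format, flags, metadata length, payload length) that shows how such a protocol can stay forwards compatible: a reader that does not understand a newer version can still skip the metadata by its length and reach the payload. All field choices and codes are assumptions.

```python
import struct

# Hypothetical header: magic(1) version(1) format(1) flags(1) meta_len(2) payload_len(4).
# ">" means big-endian with no padding, so the header is exactly 10 bytes.
HEADER = struct.Struct(">BBBBHI")
MAGIC = 0x4B                        # hypothetical marker byte ('K')
FORMATS = {0: "json", 1: "avro"}    # hypothetical serialization codes


def encode(payload: bytes, fmt: int, metadata: bytes = b"", version: int = 1, flags: int = 0) -> bytes:
    header = HEADER.pack(MAGIC, version, fmt, flags, len(metadata), len(payload))
    return header + metadata + payload


def decode(frame: bytes):
    """Return (version, format name, metadata, payload)."""
    magic, version, fmt, _flags, meta_len, payload_len = HEADER.unpack_from(frame, 0)
    if magic != MAGIC:
        raise ValueError("not a framed message")
    start = HEADER.size + meta_len  # skip metadata even if this reader cannot parse it
    payload = frame[start:start + payload_len]
    if len(payload) != payload_len:
        raise ValueError("truncated frame")
    return version, FORMATS.get(fmt, "unknown"), frame[HEADER.size:start], payload
```

For example, `decode(encode(b"abc", fmt=0, metadata=b"m"))` returns `(1, "json", b"m", b"abc")`.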

Netflix Kafka Producer
● Configurable - topic to Kafka clusters routing
● Sticky partitioner
● Prefer dropping events over disrupting the producer app
● Best-effort delivery, acks = 1 (leader-only acknowledgment)
● Buffer size tuning based on traffic
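A minimal sketch of three of the producer behaviours above: a topic-to-cluster routing table, a sticky partitioner, and a bounded buffer that drops instead of blocking the app. The switching rule (stick for a fixed number of messages, then rotate round-robin), the cluster names, and all class names are assumptions; the slide does not describe the actual implementation.

```python
from collections import deque

# Hypothetical topic -> Kafka cluster routing table, normally loaded from config.
ROUTES = {"playback-events": "fronting-kafka-a", "ui-events": "fronting-kafka-b"}
DEFAULT_CLUSTER = "fronting-kafka-default"  # hypothetical fallback


def route(topic: str) -> str:
    return ROUTES.get(topic, DEFAULT_CLUSTER)


class StickyPartitioner:
    """Send `stick_for` consecutive messages to one partition, then move on.
    Sticking lets consecutive messages share a batch."""

    def __init__(self, num_partitions: int, stick_for: int):
        self.num_partitions = num_partitions
        self.stick_for = stick_for
        self.current = 0
        self.sent = 0

    def partition(self) -> int:
        if self.sent == self.stick_for:
            self.current = (self.current + 1) % self.num_partitions
            self.sent = 0
        self.sent += 1
        return self.current


class DroppingBuffer:
    """Bounded send buffer: when full, drop the new event instead of blocking."""

    def __init__(self, capacity: int):
        self.queue = deque()
        self.capacity = capacity
        self.dropped = 0  # export as a metric so drops stay visible

    def offer(self, event) -> bool:
        if len(self.queue) >= self.capacity:
            self.dropped += 1
            return False  # never block or raise into the producer app
        self.queue.append(event)
        return True
```

With 3 partitions and `stick_for=2`, seven calls to `partition()` yield `[0, 0, 1, 1, 2, 2, 0]`.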

Kafka in the Cloud
● Pioneer Tax
● Started with 0.7, went live with 0.8.2
● Done moving to 0.9
● VPC in progress
● Work closely with Confluent to get patches through - OSS contribution

Fronting Kafka Topics
● No dynamic topic creation
● Two copies (replication factor 2)
● Rack / Zone aware partition assignment
● Enable unclean leader election
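With only two copies, zone awareness matters: if both replicas of a partition sit in the same zone, losing that zone loses the partition. Below is a minimal sketch of one way to place replicas in distinct zones; it is an illustrative round-robin scheme, not necessarily Netflix's assignment logic, and the zone and broker ids are made up. The other bullets map to standard Kafka broker settings (`auto.create.topics.enable=false`, `unclean.leader.election.enable=true`); the latter favors availability over the risk of losing un-replicated messages.

```python
from itertools import cycle


def zone_aware_assignment(brokers_by_zone, num_partitions, replicas=2):
    """Return a replica list per partition, each replica in a different zone.

    brokers_by_zone: dict zone -> list of broker ids.
    The first broker in each list is the preferred leader.
    """
    zones = sorted(brokers_by_zone)
    if replicas > len(zones):
        raise ValueError("need at least as many zones as replicas")
    # One round-robin iterator per zone spreads partitions over that zone's brokers.
    broker_iters = {z: cycle(brokers_by_zone[z]) for z in zones}
    assignment = []
    for p in range(num_partitions):
        # Rotate the starting zone so leaders are spread across zones as well.
        chosen = [zones[(p + i) % len(zones)] for i in range(replicas)]
        assignment.append([next(broker_iters[z]) for z in chosen])
    return assignment
```

With zones `a: [1, 2]`, `b: [3, 4]`, `c: [5, 6]`, the first three partitions get `[1, 3]`, `[4, 5]`, `[6, 2]`.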

Scale - Kafka (prod)
● 4,000+ d2.xl brokers for fronting & consumer clusters
● 120 Zookeeper nodes (24 ensembles)
○ Independent Zookeeper cluster per Kafka cluster

Scale - Kafka (prod)
● 24 island clusters, 8 per region
○ 3 ASGs per cluster, 1 ASG per zone
○ 24 warm standby 3-node failover clusters
● 3 router offset checkpoint clusters, 1 per region

Kafka Cluster Size -Tips
● Per cluster, stay under 10K partitions & 200 brokers
● Leave approx. 40% free disk space on each broker
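The two tips can be turned into a simple pre-flight check. This is a minimal sketch that treats the slide's numbers as soft limits; the function names are hypothetical.

```python
import math

# Guidelines from the slide, treated as soft limits in this sketch.
MAX_PARTITIONS_PER_CLUSTER = 10_000
MAX_BROKERS_PER_CLUSTER = 200
MIN_FREE_DISK_FRACTION = 0.40


def check_cluster(partitions, brokers, disk_used, disk_capacity):
    """Return warnings for one cluster; an empty list means within guidelines.

    disk_used / disk_capacity: per-broker byte counts, in the same order.
    """
    warnings = []
    if partitions > MAX_PARTITIONS_PER_CLUSTER:
        warnings.append(f"{partitions} partitions: consider splitting the cluster")
    if brokers > MAX_BROKERS_PER_CLUSTER:
        warnings.append(f"{brokers} brokers: consider splitting the cluster")
    for broker, (used, capacity) in enumerate(zip(disk_used, disk_capacity)):
        free = 1 - used / capacity
        if free < MIN_FREE_DISK_FRACTION:
            warnings.append(f"broker {broker}: only {free:.0%} disk free")
    return warnings


def clusters_needed(total_partitions):
    """Smallest number of clusters keeping each at or below the partition guideline."""
    return math.ceil(total_partitions / MAX_PARTITIONS_PER_CLUSTER)
```

For example, 25,000 partitions call for at least 3 clusters, and a broker at 70% disk usage is flagged because only 30% is free.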

Kafka Kong
At least once a week

[Diagram: Event Producer → Fronting Kafka → Samza Router, with a failure marked X]

Fronting Kafka Failover
Self Service Tool

Kafka Auditor
Open sourcing on the road map

Kafka Auditor - One per cluster
● A service deployable on single or multiple instances
● Broker monitoring
● Consumer monitoring
● Heart-beat & Continuous message latency
● On-demand Broker performance testing
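A minimal sketch of the heartbeat idea: the auditor produces a heartbeat carrying its send time, consumes it back, and records the end-to-end latency; heartbeats that never return within a timeout are reported as missing. Because the same auditor both sends and receives, one local monotonic clock is enough. The class and method names are hypothetical, and the Kafka produce/consume calls are left out.

```python
import time
from dataclasses import dataclass, field


@dataclass
class HeartbeatTracker:
    """Track heartbeats sent to (and consumed back from) each partition."""
    timeout_s: float
    pending: dict = field(default_factory=dict)    # heartbeat id -> send time
    latencies: dict = field(default_factory=dict)  # heartbeat id -> latency in seconds

    def sent(self, hb_id, now=None):
        self.pending[hb_id] = time.monotonic() if now is None else now

    def received(self, hb_id, now=None):
        now = time.monotonic() if now is None else now
        start = self.pending.pop(hb_id, None)
        if start is not None:  # ignore duplicates and unknown ids
            self.latencies[hb_id] = now - start

    def missing(self, now=None):
        """Heartbeats still outstanding after the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(h for h, t in self.pending.items() if now - t > self.timeout_s)
```

The optional `now` argument only exists so the logic can be exercised deterministically.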

Kafka Metadata Visualization
Open sourcing on the road map

Want to know more...
Netflix Tech Blog - Kafka in Keystone Pipeline

Routing Infrastructure
[Diagram: Checkpointing Cluster + Samza 0.9.1 + Go + C language]

[Diagram components: Router Job Manager (Control Plane); ksnode EC2 instances in an ASG, each running several jobs; Zookeeper (instance id assignment); Checkpointing Cluster]

Samza Job Deployment
● Multiple Samza jobs for one Kafka source topic
● Each job processes messages for one sink
● Each job processes partitions only from one topic
● One checkpoint topic per Kafka source topic

POWERFUL!
1 checkpoint topic per sink & source topic,
shared by many Samza jobs

Immutable Config in Running Job

What’s running in ksnode?
[Diagram: a custom Go executor runs several ./runJob processes, attaches volumes, and manages logs and snapshots; a reconcile loop (every 1 min) and health checks; Zookeeper (instance id assignment)]
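The executor itself is written in Go; the Python sketch below only illustrates what a reconcile pass can look like under stated assumptions: the control plane provides the set of jobs desired on this node, and the executor can list running jobs and health-check them. All function names are hypothetical.

```python
import time


def reconcile(desired, running, healthy):
    """One reconcile pass over sets of job ids.

    desired: jobs the control plane assigned to this node
    running: jobs currently running on the node
    healthy: running jobs that passed their health check
    Returns (to_start, to_stop).
    """
    unhealthy = running - healthy
    to_stop = (running - desired) | unhealthy           # extra jobs plus failed ones
    to_start = (desired - running) | (unhealthy & desired)  # missing jobs plus restarts
    return to_start, to_stop


def run_forever(fetch_desired, list_running, check_health, start, stop, interval_s=60):
    """Hypothetical driver: repeat the reconcile pass every `interval_s` seconds."""
    while True:
        desired, running = fetch_desired(), list_running()
        to_start, to_stop = reconcile(desired, running, check_health(running))
        for job in sorted(to_stop):  # stop first so resources are freed before starts
            stop(job)
        for job in sorted(to_start):
            start(job)
        time.sleep(interval_s)
```

Computing the diff as a pure function keeps the loop idempotent: running it twice with the same inputs produces no extra actions once the node has converged.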

Ksnode Tooling
[Diagram: custom Go executor running several ./runJob processes; logs and snapshots on a ZFS volume; a Go Tools Server serving client tools]
● Stream logs
● Browse through rotated logs by date

Yes! You inferred right!
No Mesos & No Yarn

Using ThreadJobFactory in production
job.factory.class=org.apache.samza.job.local.ThreadJobFactory

SAMZA-41 - static partition range assignment
job.systemstreampartition.matcher.class=
org.apache.samza.system.RegexSystemStreamPartitionMatcher
To select partitions [8-10], you need the regex:
^8$|^9$|^10$

SAMZA-41 - static partition range assignment
Simplify...
job.systemstreampartition.matcher.class=
org.apache.samza.system.RangeSystemStreamPartitionMatcher
job.systemstreampartition.matcher.config.ranges=6-10
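To see why the range form is simpler, here is a minimal sketch that expands a range spec and builds the equivalent anchored regex; the regex grows with every partition, while the range spec stays short. The comma-separated syntax accepted here (e.g. `1,3-5`) is an assumption about the spec format, and the helper names are hypothetical, not Samza code.

```python
import re


def parse_ranges(spec):
    """Expand a spec such as "6-10" or "1,3-5" into a set of partition ids.

    Assumption: comma-separated single ids and inclusive low-high ranges.
    """
    partitions = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            partitions.update(range(low, high + 1))
        else:
            partitions.add(int(part))
    return partitions


def regex_for(partitions):
    """Equivalent anchored alternation, e.g. ^8$|^9$|^10$."""
    return "|".join(f"^{p}$" for p in sorted(partitions))


def matches(pattern, partition_id):
    return re.search(pattern, str(partition_id)) is not None
```

`regex_for(parse_ranges("8-10"))` returns `"^8$|^9$|^10$"`; the anchors keep partition 18 from matching `8`.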

Prefetch Buffer - When is it going to OOM?
● Default count based per Samza container
○ (50,000 / # partitions) per topic
○ systems.source.samza.fetch.threshold=50000
● Cannot avoid OOM - variable message size
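A quick calculation shows why a count-based threshold cannot bound memory: the same message count is harmless at the 3 KB average payload but huge near the 10 MB maximum quoted earlier. This sketch assumes every partition's share of the buffer fills up; the function names are hypothetical.

```python
FETCH_THRESHOLD = 50_000  # default systems.source.samza.fetch.threshold (messages)


def per_partition_count(num_partitions, threshold=FETCH_THRESHOLD):
    """Messages that may be buffered per partition: threshold / # partitions."""
    return threshold // num_partitions


def buffered_bytes(num_partitions, avg_message_bytes, threshold=FETCH_THRESHOLD):
    """Approximate prefetch-buffer heap if every partition's share is full."""
    return per_partition_count(num_partitions, threshold) * num_partitions * avg_message_bytes
```

With 10 partitions, 3 KB messages need about 150 MB, while 10 MB messages would need about 500 GB under the same count limit.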

SAMZA-775 - size-based prefetch buffer
● How much of the heap should I use for prefetching?
○ systems.source.samza.fetch.threshold.bytes=200000000 (200MB)
○ per system / stream / partition
○ if > 0, takes precedence over systems.source.samza.fetch.threshold

● Value of systems.source.samza.fetch.threshold.bytes based on
○ Incoming traffic Bps into source Kafka
○ 60 seconds of buffer with region failover traffic
○ Samza in memory data structures (2 x message size)
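The three inputs above combine into a simple product. This is a sketch of one way to apply the heuristic, not the formula in the patch; in particular, `failover_multiplier` (extra traffic a container absorbs when another region fails over) is a hypothetical parameter.

```python
def fetch_threshold_bytes(ingress_bytes_per_sec, failover_multiplier,
                          buffer_seconds=60, overhead_factor=2):
    """Heuristic value for systems.source.samza.fetch.threshold.bytes.

    ingress_bytes_per_sec: traffic into the source Kafka partitions this container reads
    failover_multiplier:   hypothetical factor for extra traffic during region failover
    buffer_seconds:        seconds of traffic to hold in the buffer (60 per the slide)
    overhead_factor:       in-memory size relative to message size (2x per the slide)
    """
    return int(ingress_bytes_per_sec * failover_multiplier * buffer_seconds * overhead_factor)
```

For example, 1 MB/s of ingress with a 1.5x failover multiplier gives 180,000,000 bytes (180 MB); given the worst case of up to ~15% above the configured value, that setting should be budgeted at roughly 207 MB of heap.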

● How does it perform?
○ Per-message overhead is within 0.02% of the heuristic computed in the patch
○ Actual footprint exceeds systems.source.samza.fetch.threshold.bytes by at most 10-15% in the worst case
■ Example: if set to 200MB, the worst case observed was 230MB

SAMZA-655 & SAMZA-540
● Backported from 0.10
○ environment variable configuration rewriter
■ Pass config from RDS to executor to Docker to Samza Job
○ expose latency related metrics in OffsetManager
■ checkpointed offset gauge

Scale - Routing Service
● 14,000+ Docker containers (Samza jobs)
● 1,400+ AWS C3-4XL instances

More Info - Samza Meetup (10/2015)
Samza ver 0.9.1 Contributions

Customer-facing per-topic end-to-end dashboard

Developer-facing infrastructure end-to-end dashboard

Scaling Avenues
1 trillion / day
Now What?

Scaling Up by Scaling Down - Oxymoron?
● Exposed cost attribution per event producer & topic
○ E.g. one producer reduced its throughput 6x
● Automation frees up additional resources

Not DevOps, but move towards NoOps
You build it! You run it!

We build and run what you saw today!
The team has
● No dedicated product or project managers
● No separate devops or operations team

We build and run what you saw today!
● This does not mean we are constantly overworked
○ We make wise and simple choices
○ We lean towards automation & self-healing systems

Future steps
● Data thruway
○ Support for schemas - registry, discovery, validation
● Self Service Tooling

Stream Processing As a Service (SPaaS)
Multi-tenant polyglot support for stream processing engines

Big Data Systems - streaming
Data Pipeline & Stream processing - Keystone - Samza / Flink (poc)
Playback & edge Operations insight - Mantis
Stream Processing - Spark Streaming
* Metrics & monitoring - Atlas *

SPaaS - “Beam Me Up, Scotty!”
Apache Beam
○ Portable API layer for building sophisticated data processing applications
○ Unified model API over bounded and unbounded data sources
○ Google lineage - Dataflow model

SPaaS
[Diagram components: SPaaS UI; Beam API over bounded / unbounded data sources; Dockerized job flow (1. Create, 2. Submit, 3. Launch); runners (Mantis / Flink / Spark) on a container runtime; DSL job submission; running job; Job Dashboard]

Apache Beam + Apache Flink
SPaaS - init( )

More brain food...
Netflix OSS
Samza Meetup Presentation
Netflix Tech Blog
Spark Summit 2015 Talk
