Keystone processes over 700 billion events per day (1 petabyte) with at-least-once processing semantics in the cloud. We will explore in detail how we leverage Kafka, Samza, Docker, and Linux at scale to implement a multi-tenant pipeline in the AWS cloud within a year. We will also share our plans to offer Stream Processing as a Service for all of Netflix to use.
8. ● Support at-least-once processing
● Scale, Multi-tenancy, Ease of Operations
● Enable future value adds - Stream Processing As a Service
● Replace dormant open source software - Chukwa
Why a new pipeline?
9. Goal - Migrate Events to a new Pipeline in flight,
while not losing more than 0.1% of them
13. Over 75M Members
190 Countries
125M hours/day → 11B hours / quarter
14,269 years / day → 1,255,707 years / quarter
1000+ devices
37% of Internet traffic at peak
14. 700 billion unique events ingested / day
1 trillion unique events ingested / day - Dec 2015
1+ trillion events processed every day
Keystone Scale
27. Custom Extensible Wire Protocol
● Backwards and forwards compatibility
● Support multiple serialization formats
○ JSON, AVRO, Protobuf in the works
● Additional metadata
● Efficient - 10 bytes of overhead per message (message sizes range from hundreds of bytes to 10MB)
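To make the idea of a small fixed header in front of a format-tagged payload concrete, here is a minimal sketch. The field layout, the magic byte, the format ids, and the function names are all hypothetical; the slides do not describe the actual Keystone wire format, only that the overhead is 10 bytes and that multiple serialization formats and extra metadata are supported.

```python
import json
import struct

# Hypothetical 10-byte header (NOT the real Keystone layout):
# magic(1) | version(1) | format id(1) | flags(1) | metadata length(2) | payload length(4)
HEADER = struct.Struct(">BBBBHI")
MAGIC = 0x4B                       # assumed marker byte
FORMATS = {"json": 1, "avro": 2}   # assumed format ids


def encode(payload: bytes, fmt: str, metadata: dict, version: int = 1) -> bytes:
    """Prefix the payload with the fixed header and a JSON metadata block."""
    meta = json.dumps(metadata, sort_keys=True).encode("utf-8")
    header = HEADER.pack(MAGIC, version, FORMATS[fmt], 0, len(meta), len(payload))
    return header + meta + payload


def decode(frame: bytes):
    """Read the header, then slice metadata and payload by their declared lengths.

    Bytes after the declared payload and unknown metadata keys are ignored,
    which is one simple way to stay readable by older and newer decoders.
    """
    magic, version, fmt_id, _flags, meta_len, payload_len = HEADER.unpack_from(frame, 0)
    if magic != MAGIC:
        raise ValueError("not an envelope frame")
    meta_start = HEADER.size
    payload_start = meta_start + meta_len
    metadata = json.loads(frame[meta_start:payload_start])
    payload = frame[payload_start:payload_start + payload_len]
    return version, fmt_id, metadata, payload
```

Because the lengths are carried in the header, a decoder never has to parse the payload to find its end, whatever serialization format it uses.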
28. Netflix Kafka Producer
● Configurable - topic to Kafka clusters routing
● Sticky partitioner
● Prefer dropping events over disrupting the producer app
● Best effort delivery, ack = 1
● Buffer size tuning based on traffic
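Two of the behaviours above can be sketched in a few lines: a partitioner that sticks to one partition for a while, and a bounded buffer that drops events rather than blocking the application. This is only an illustration; the class names, the count-based stickiness, and the drop counter are assumptions, not the Netflix producer's actual implementation.

```python
import random
from collections import deque


class StickyPartitioner:
    """Keep using one partition for `stick_count` messages, then pick another.

    Count-based stickiness is an assumption; a time-based window works the same way.
    """

    def __init__(self, num_partitions, stick_count=1000, rng=None):
        self.num_partitions = num_partitions
        self.stick_count = stick_count
        self.rng = rng or random.Random()
        self.current = self.rng.randrange(num_partitions)
        self.sent = 0

    def partition(self):
        if self.sent >= self.stick_count:
            # Window exhausted: choose a new partition at random.
            self.current = self.rng.randrange(self.num_partitions)
            self.sent = 0
        self.sent += 1
        return self.current


class DroppingBuffer:
    """Bounded buffer that rejects new events when full instead of blocking."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.dropped = 0

    def offer(self, event):
        if len(self.queue) >= self.capacity:
            self.dropped += 1   # count the drop so it can be reported as a metric
            return False
        self.queue.append(event)
        return True
```

Sticking to one partition lets a producer fill larger batches per broker connection, and the non-blocking `offer` keeps a slow or unavailable Kafka cluster from stalling the calling application.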
31. ● Pioneer Tax
● Started with 0.7, went live with 0.8.2
● Done moving to 0.9
● VPC in progress
● Work closely with Confluent to get patches through - OSS contribution
Kafka in the Cloud
32. ● No dynamic topic creation
● Two copies
● Rack / Zone aware partition assignment
● Enable unclean leader election
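A rough sketch of what zone-aware placement with two copies means: each partition's replicas are placed on brokers in different zones, so losing one zone never removes both copies. The function name, the input shape, and the round-robin strategy are assumptions for illustration; this is not Kafka's or Netflix's actual assignment algorithm.

```python
from itertools import cycle


def assign_replicas(num_partitions, brokers_by_zone, copies=2):
    """Return {partition: [broker, ...]} with every copy in a different zone."""
    zones = sorted(brokers_by_zone)
    if copies > len(zones):
        raise ValueError("need at least as many zones as copies")
    # Round-robin over the brokers inside each zone to spread load.
    next_broker = {zone: cycle(brokers_by_zone[zone]) for zone in zones}
    assignment = {}
    for partition in range(num_partitions):
        # Consecutive zones starting at a rotating offset are always distinct
        # because copies <= number of zones.
        chosen = [zones[(partition + i) % len(zones)] for i in range(copies)]
        assignment[partition] = [next(next_broker[zone]) for zone in chosen]
    return assignment
```

For example, with brokers `{"a": [1, 2], "b": [3, 4], "c": [5, 6]}`, partition 0 lands on one broker in zone a and one in zone b, partition 1 on zones b and c, and so on.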
Fronting Kafka Topics
34. ● 24 island clusters, 8 per region
○ 3 ASGs per cluster, 1 ASG per zone
○ 24 warm-standby, 3-node failover clusters
● 3 router offset checkpoint clusters, 1 per region
Scale - Kafka (prod)
35. Kafka Cluster Size - Tips
● Per Cluster Stay under 10k partitions & 200 brokers
● Leave approx. 40% free disk space on each broker
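These rules of thumb are easy to turn into an automated check. The sketch below is hypothetical: the function name and input shape are made up for illustration, and only the three limits come from the tips above.

```python
def check_cluster(partitions, brokers, free_disk_by_broker,
                  max_partitions=10_000, max_brokers=200, min_free=0.40):
    """Return a list of human-readable warnings; an empty list means all clear.

    free_disk_by_broker maps broker id -> free disk as a fraction (0.0 to 1.0).
    """
    warnings = []
    if partitions >= max_partitions:
        warnings.append(f"{partitions} partitions, limit is under {max_partitions}")
    if brokers >= max_brokers:
        warnings.append(f"{brokers} brokers, limit is under {max_brokers}")
    for broker_id, free in sorted(free_disk_by_broker.items()):
        if free < min_free:
            warnings.append(f"broker {broker_id} has {free:.0%} free disk, want {min_free:.0%}")
    return warnings
```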
42. Kafka Auditor - One per cluster
● A service deployable on single or multiple instances
● Broker monitoring
● Consumer monitoring
● Heart-beat & Continuous message latency
● On-demand Broker performance testing
51. Samza Job Deployment
● Multiple Samza jobs for one Kafka source topic
● Each job processes messages for one sink
● Each job processes partitions only from one topic
● One checkpoint topic per Kafka source topic
59. SAMZA-41 - static partition range assignment
job.systemstreampartition.matcher.class=
org.apache.samza.system.RegexSystemStreamPartitionMatcher
job.systemstreampartition.matcher.config.ranges=[8-10]
^8$|^9$|^10$
you need
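The two ways of selecting partitions 8 through 10, a range and an anchored regex, can be compared with a small sketch. The helper names are hypothetical and Python's `re.search` is used here; how Samza itself applies the pattern is not shown in the slides.

```python
import re


def expand_ranges(spec):
    """Expand a spec such as '[8-10]' or '1,3-5' into a set of partition ids."""
    ids = set()
    for part in spec.strip("[]").split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            ids.update(range(int(low), int(high) + 1))
        else:
            ids.add(int(part))
    return ids


def regex_partitions(pattern, num_partitions):
    """Return the partition ids whose decimal string the pattern matches."""
    compiled = re.compile(pattern)
    return {p for p in range(num_partitions) if compiled.search(str(p))}
```

With 32 partitions, `expand_ranges("[8-10]")` and `regex_partitions(r"^8$|^9$|^10$", 32)` both select 8, 9 and 10. Under search semantics the `^` and `$` anchors matter: the unanchored pattern `8|9|10` would also pick up 18, 19, 28 and 29.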
61. Prefetch Buffer - When is it going to OOM?
● Default count based per Samza container
○ (50,000 / # partitions) per topic
○ systems.source.samza.fetch.threshold=50000
● Cannot avoid OOM - variable message size
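A quick calculation shows why a count-based threshold cannot bound memory. The function names are made up for this sketch; the 50,000 default comes from the slide, and the message sizes reuse the "hundreds of bytes to 10MB" range quoted earlier in the deck.

```python
def per_partition_threshold(fetch_threshold, num_partitions):
    """Messages buffered per partition under the count-based setting."""
    return fetch_threshold // num_partitions


def buffered_bytes(fetch_threshold, message_bytes):
    """Bytes held if every buffered message has the given size."""
    return fetch_threshold * message_bytes


# 50,000 messages spread over 10 partitions is 5,000 messages each.
per_partition = per_partition_threshold(50_000, 10)

# The same count means very different memory depending on message size.
small = buffered_bytes(50_000, 500)          # 25 MB at 500 bytes per message
large = buffered_bytes(50_000, 10_000_000)   # 500 GB at 10 MB per message
```

The count stays constant while the footprint varies by more than four orders of magnitude, which is the motivation for a byte-based threshold.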
62. SAMZA-775 - size-based Prefetch buffer
● How much of heap should I use for prefetching?
○ systems.source.samza.fetch.threshold.bytes=200000000 (200MB)
○ per system / stream / partition
○ if > 0 precedence over systems.source.samza.fetch.threshold
63. SAMZA-775 - size-based Prefetch buffer
● Value of systems.source.samza.fetch.threshold.bytes based on
○ Incoming traffic Bps into source Kafka
○ 60 seconds of buffer with region failover traffic
○ Samza in memory data structures (2 x message size)
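One way to read the three inputs above as arithmetic is sketched below. This is an interpretation, not the team's actual formula: it assumes "Bps" means bytes per second measured at failover peak, and that the "2 x message size" factor describes heap used per buffered byte. Function names and the example traffic figure are hypothetical.

```python
def fetch_threshold_bytes(peak_bytes_per_sec, buffer_seconds=60):
    """Bytes needed to hold `buffer_seconds` of traffic at failover peak."""
    return peak_bytes_per_sec * buffer_seconds


def heap_needed_bytes(threshold_bytes, in_memory_factor=2):
    """Heap consumed if each buffered byte costs `in_memory_factor` bytes in memory."""
    return threshold_bytes * in_memory_factor


# Example with an assumed peak of 2 MB/s into one container.
threshold = fetch_threshold_bytes(2_000_000)   # 120,000,000 bytes
heap = heap_needed_bytes(threshold)            # 240,000,000 bytes
```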
64. SAMZA-775 - size-based Prefetch buffer
● How does it perform?
○ Per message overhead within 0.02% of computed heuristics in the patch
○ Actual footprint exceeds systems.source.samza.fetch.threshold.bytes by 10-15% at
the most in the worst case.
■ Example: If set to 200MB, worst case observed 230MB
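The worst-case figure can be folded into heap planning with one line of integer arithmetic. The function name is hypothetical; the 15% bound and the 200MB example are taken from the slide.

```python
def worst_case_footprint(threshold_bytes, overhead_percent=15):
    """Upper bound on prefetch memory given the observed worst-case overhead."""
    return threshold_bytes * (100 + overhead_percent) // 100


footprint = worst_case_footprint(200_000_000)   # 230,000,000 bytes
```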
65. SAMZA-655 & SAMZA-540
● Backported from 0.10
○ environment variable configuration rewriter
■ Pass config from RDS to executor to Docker to Samza Job
○ expose latency related metrics in OffsetManager
■ checkpointed offset gauge
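The idea behind an environment variable configuration rewriter can be sketched as follows: configuration passed down through the executor and Docker as environment variables is mapped back onto job config keys, overriding the values in the job's file. The `SAMZA_` prefix and the underscore-to-dot mapping used here are assumptions for illustration, as is the function name.

```python
def rewrite_config(config, environ, prefix="SAMZA_"):
    """Return a copy of `config` with prefixed environment variables applied.

    SAMZA_FOO_BAR=1 becomes foo.bar=1 (assumed naming convention).
    """
    rewritten = dict(config)
    for name, value in environ.items():
        if name.startswith(prefix) and len(name) > len(prefix):
            key = name[len(prefix):].lower().replace("_", ".")
            rewritten[key] = value   # the environment wins over the file value
    return rewritten
```

In a real job the second argument would be `os.environ`; passing a plain dict keeps the sketch easy to test.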
73. ● Exposed cost attribution per event producers & topic
○ E.g. one producer reduced throughput by 600%
● Automation - frees up additional resources
Scaling Up by Scaling Down
Oxymoron?
75. Not DevOps, but move towards NoOps
You build it! You run it!
76. Team has
● No dedicated product or project managers
● No separate devops or operations team
We build and run what you saw today!
77. ● This does not mean we are constantly overworked
○ we make wise and simple choices and
○ lean towards automation & self-healing systems
We build and run what you saw today!
79. ● Data thruway
○ Support for schemas - registry, discovery, validation
● Self Service Tooling
Future steps
80. Stream Processing As a Service (SPaaS)
Multi-tenant polyglot support for stream processing engines
81. Big Data Systems - streaming
Data Pipeline & Stream processing - Keystone - Samza / Flink (poc)
Playback & edge Operations insight - Mantis
Stream Processing - Spark Streaming
* Metrics & monitoring - Atlas *
82. Apache Beam
○ Portable API layer for building sophisticated data processing applications
○ Unified model API over bounded and unbounded data sources
○ Google lineage - Dataflow model
SPaaS - "Beam Me Up, Scotty!"