___________________________________________
Meetup #7 | Session 2 | 21/03/2018 | Taboola
___________________________________________
In this talk, we will present our multi-DC Kafka architecture, and discuss how we tackle sending and handling 10B+ messages per day, with maximum availability and no tolerance for data loss.
Our architecture includes technologies such as Cassandra, Spark, HDFS, and Vertica - with Kafka as the backbone that feeds them all.
6. Data Infrastructure Requirements
• Fresh data
• Fast queries
• Exact (billing)
• Flexible – simple to add data and extend
• Scale faster than the business
• Endure traffic spikes
7. Our Strategy to Scale
• Best-of-breed technologies
• A lot of custom development
• Highly optimized distributions and hardware
• Software designed with scale and self-healing in mind
• Everything is monitored and profiled
• Infrastructure to support extremely agile development cycles
8. Architecture Principles
Any server - and even any data center - can go down at any time without service interruption:
• Share nothing - each server is independent
• Application is stateless - any server can accept any traffic
• Data is fully replicated in real time
• Dynamic load balancing
Data must be exact:
• All processing is idempotent
• “Exactly Once” semantics - never count twice (see the dedup sketch at the end of this slide)
Data infrastructure is fully pluggable:
• Connect into it at any point
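How "never count twice" can be enforced is worth pausing on. A minimal sketch of idempotent counting (class and method names are hypothetical, not Taboola's actual code): every message carries a unique id, and an aggregate is updated only the first time that id is seen, so retries and replays cannot inflate billing numbers.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// A sketch of idempotent, "exactly once" counting - names are hypothetical.
public class IdempotentCounter {
    // In production the seen-id set would live in a replicated store
    // (e.g. Cassandra), not in memory; a map is enough to show the idea.
    private final ConcurrentHashMap<String, Boolean> seenMessageIds = new ConcurrentHashMap<>();
    private final AtomicLong clicks = new AtomicLong();

    /** Returns true if the message was counted, false if it was a duplicate. */
    public boolean countClick(String messageId) {
        if (seenMessageIds.putIfAbsent(messageId, Boolean.TRUE) != null) {
            return false; // already processed - counting again would corrupt billing
        }
        clicks.incrementAndGet();
        return true;
    }

    public long total() {
        return clicks.get();
    }
}
```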
9. Backend Processing
Our code deals with:
• Real-time joining of multiple streams
• Real-time access to data
• Distributed processing
(Architecture diagram: FE Servers, Jade, Cloud Storage, TensorFlow Serving, SQL)
11. High Throughput Using Custom Buffering
Volume
• 50B protobufs / day - billing-related protobufs
• 25B protobufs / day - monitoring-related protobufs
Requirements
• Can’t interrupt the recommendations service
• Can’t lose data
How do we handle this volume?
• Custom message buffering
• Async sending
• Off-heap - no GC pressure, using pre-allocated DirectByteBuffers (sketched below)
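A minimal sketch of that buffering idea (the pool, class, and method names are invented, not Taboola's implementation): serialized protobufs are staged in DirectByteBuffers that are allocated once up front - off-heap memory the GC never scans - and handed to the Kafka producer asynchronously, so the serving path is never blocked.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OffHeapBufferedSender {
    private static final int BUFFER_SIZE = 1 << 20; // 1 MB per slot
    private final BlockingQueue<ByteBuffer> freeBuffers;
    private final KafkaProducer<byte[], byte[]> producer;

    public OffHeapBufferedSender(KafkaProducer<byte[], byte[]> producer, int poolSize) {
        this.producer = producer;
        this.freeBuffers = new ArrayBlockingQueue<>(poolSize);
        // Pre-allocate the whole pool: DirectByteBuffers live off-heap,
        // so the GC never scans or copies this memory.
        for (int i = 0; i < poolSize; i++) {
            freeBuffers.add(ByteBuffer.allocateDirect(BUFFER_SIZE));
        }
    }

    /** Never blocks the caller; returns false if the message could not be staged. */
    public boolean send(String topic, byte[] serializedProtobuf) {
        if (serializedProtobuf.length > BUFFER_SIZE) {
            return false; // oversized - a real system would spill to disk here
        }
        ByteBuffer buf = freeBuffers.poll(); // non-blocking on purpose
        if (buf == null) {
            return false; // pool exhausted - never stall the recommendations service
        }
        buf.clear();
        buf.put(serializedProtobuf);
        buf.flip();
        // Copy on-heap only at the moment of sending; the async callback
        // returns the slot to the pool once Kafka acks (or fails) the record.
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        producer.send(new ProducerRecord<>(topic, payload),
                (metadata, error) -> freeBuffers.offer(buf));
        return true;
    }
}
```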
13. Monitoring
• Messages waiting on the filesystem (should be zero)
• Message producing rate
• Number of buffers being used
• Blocking Queue Size + Dropped Messages (gauge wiring sketched after this list)
• Message Size
• Payload Size
• Send to Kafka times
• Errors
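A hedged illustration of how a few of these could be wired with Codahale (Dropwizard) Metrics - the metric names and the spill-directory convention are assumptions for the sketch, not Taboola's actual setup:

```java
import java.io.File;
import java.util.concurrent.BlockingQueue;
import com.codahale.metrics.Counter;
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

public class BufferingMetrics {
    final Counter droppedMessages;   // incremented when the buffer pool rejects a message
    final Timer kafkaSendTime;       // wrapped around each producer.send()

    BufferingMetrics(MetricRegistry registry, BlockingQueue<?> sendQueue, File spillDir) {
        // Messages waiting on the filesystem - should be zero in steady state.
        registry.register("buffering.files-waiting", (Gauge<Integer>) () -> {
            String[] files = spillDir.list();
            return files == null ? 0 : files.length;
        });
        // Blocking queue size - how far behind the async sender is running.
        registry.register("buffering.queue-size", (Gauge<Integer>) sendQueue::size);
        droppedMessages = registry.counter("buffering.dropped-messages");
        kafkaSendTime = registry.timer("buffering.kafka-send-time");
    }
}
```

The Timer would wrap each producer.send() call, e.g. try (Timer.Context ctx = kafkaSendTime.time()) { ... }, giving the "send to Kafka times" distribution from the slide.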
16. Schema Management
Protostuff with schema evolution
• Separate git repo with strict ALM
• Testing for backward/forward compatibility (see the sketch after this list)
• Feature branches cannot be deployed to production
• Became critical once many developers were adding data
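The slide doesn't show what such a compatibility test looks like; here is a minimal illustration of the property being tested, using Protostuff runtime schemas (the event classes and fields are invented for the example):

```java
import io.protostuff.LinkedBuffer;
import io.protostuff.ProtostuffIOUtil;
import io.protostuff.Schema;
import io.protostuff.Tag;
import io.protostuff.runtime.RuntimeSchema;

public class SchemaEvolutionTest {
    // "v1" of an event, as an old producer would have written it.
    static class EventV1 {
        @Tag(1) String id;
    }

    // "v2" adds a field with a NEW tag number - the only safe kind of change.
    static class EventV2 {
        @Tag(1) String id;
        @Tag(2) long timestampMs;
    }

    public static void main(String[] args) {
        Schema<EventV1> v1Schema = RuntimeSchema.getSchema(EventV1.class);
        Schema<EventV2> v2Schema = RuntimeSchema.getSchema(EventV2.class);

        EventV1 old = new EventV1();
        old.id = "click-42";
        byte[] v1Bytes = ProtostuffIOUtil.toByteArray(old, v1Schema, LinkedBuffer.allocate(256));

        // Forward compatibility: a v2 reader decodes v1 bytes;
        // the missing field simply keeps its default value.
        EventV2 upgraded = v2Schema.newMessage();
        ProtostuffIOUtil.mergeFrom(v1Bytes, upgraded, v2Schema);
        System.out.println(upgraded.id + " / " + upgraded.timestampMs); // click-42 / 0
    }
}
```

The evolution rule this encodes: new fields get new tag numbers and are never required, so old bytes decode under the new schema (missing fields take defaults) and new bytes decode under the old schema (unknown tags are skipped).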
18. Multi DC Deployment
• 6 FE data centers
• 4 BE data centers
• Brokers: 36 FE + 50 BE
• Broker = 7TB NVMe disk, 128GB RAM, 32 CPU cores, 10Gb Ethernet
• Full Data Replication to BE DCs
19. Mirroring With Kafka Mirror Maker
(Diagram: topics 1..N on FE Kafka, each with partitions 1..K, consumed by Mirror Maker consumers C1..Cm and written through a small, fixed set of producers P into topics 1..N on BE Kafka.)
Message size varies: O(10KB) - O(MB)
20. Mirroring With Kafka Mirror Maker
(Diagram: the same FE-to-BE mirroring, now with consumers C1..Ck matching the K partitions, still funneling through the same fixed set of producers P into the BE topics’ partitions 1..K.)
21. Mirroring Done Right - KFC
(Diagram: FE Kafka topics 1..N with partitions 1..K, mirrored by KFC Mirror - consumers C1..Ck, each fronting its own producer pool, as many producers as we need - into BE Kafka topics 1..N with partitions 1..K.)
22. Introducing KFC
Multi-purpose framework for consuming from Kafka and processing messages in parallel, with built-in monitoring.
(Class diagram: TaboolaKafkaConsumer with Consumer, Runnable, MessageProcessor, KafkaConsumerParallelismStrategy, KafkaCommitStrategy. A structural sketch follows.)
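The slide names the parts but not their contracts; the following is a guess at how they might fit together (the interface shapes and the loop are assumptions, not Taboola's actual API):

```java
import java.time.Duration;
import java.util.Collection;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Pluggable pieces named on the slide - their shapes here are assumptions.
interface MessageProcessor {
    void process(ConsumerRecord<byte[], byte[]> record);
}

interface KafkaConsumerParallelismStrategy {
    // Decide how a polled batch is fanned out across worker threads.
    void dispatch(ConsumerRecords<byte[], byte[]> records, MessageProcessor processor);
}

interface KafkaCommitStrategy {
    // Decide when/what to commit (sync, async, per-batch, per-partition...).
    void maybeCommit(KafkaConsumer<byte[], byte[]> consumer);
}

// The consumer loop ties the strategies together.
class TaboolaKafkaConsumerSketch implements Runnable {
    private final KafkaConsumer<byte[], byte[]> consumer;
    private final MessageProcessor processor;
    private final KafkaConsumerParallelismStrategy parallelism;
    private final KafkaCommitStrategy commit;

    TaboolaKafkaConsumerSketch(KafkaConsumer<byte[], byte[]> consumer,
                               Collection<String> topics,
                               MessageProcessor processor,
                               KafkaConsumerParallelismStrategy parallelism,
                               KafkaCommitStrategy commit) {
        this.consumer = consumer;
        this.processor = processor;
        this.parallelism = parallelism;
        this.commit = commit;
        consumer.subscribe(topics);
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
            parallelism.dispatch(records, processor);  // parallel processing
            commit.maybeCommit(consumer);              // pluggable commit policy
        }
    }
}
```

Splitting parallelism and commit policy into strategies is what makes the framework multi-purpose: a billing topic can pair in-order processing with synchronous commits, while a monitoring topic trades ordering for throughput.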
25. Monitoring
● Registering every org.apache.kafka.common.Metric as a Codahale Gauge (sketched below)
● Alerting on poll cycle time:
○ Message processing is stuck
○ Can’t find group coordinator
● Alerting on partition lag:
○ Lag by number of messages
○ Lag by delta from produce time (bursty topics)
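A small sketch of that metric bridge (the naming scheme is an assumption; the Kafka and Codahale calls are standard):

```java
import java.util.Map;
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public class KafkaMetricsBridge {
    /** Expose the Kafka client's internal metrics through the Codahale registry. */
    public static void register(MetricRegistry registry, KafkaConsumer<?, ?> consumer) {
        for (Map.Entry<MetricName, ? extends Metric> entry : consumer.metrics().entrySet()) {
            Metric metric = entry.getValue();
            // e.g. "kafka.consumer-fetch-manager-metrics.records-lag-max"
            String name = MetricRegistry.name("kafka",
                    entry.getKey().group(), entry.getKey().name());
            if (!registry.getGauges().containsKey(name)) {
                registry.register(name, (Gauge<Object>) metric::metricValue);
            }
        }
    }
}
```

Lag alerting can then key off gauges such as records-lag-max, while the delta-from-produce-time variant compares each record's produce timestamp to the wall clock.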
30. Apache Spark as Our Distributed Processing Engine
Serves multiple functions:
• Runs our code that joins data streams into pageviews and sessions (toy sketch below)
• SQL engine for analysts and the algo group
• Raw data feed into the deep learning engine
• 100s of data aggregators constantly running, feeding data to Backstage (in beta)
Fun facts
• ~15K cores + >70TB of RAM (plus >8PB of historic data on disk)
• Clusters will grow by 50% in size by year’s end
• Using Spark in production since the end of 2013
• Contributed multiple critical fixes back to the community
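For the stream-join bullet, a deliberately toy batch version of the idea (paths, schemas, and column names are invented; the real pipeline joins live streams): impressions and clicks are joined on a shared pageview id to produce enriched pageview records.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PageviewJoinSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("pageview-join-sketch")
                .getOrCreate();

        // In production these would be live streams; reading parquet
        // batches keeps the sketch self-contained and runnable.
        Dataset<Row> impressions = spark.read().parquet("/data/impressions");
        Dataset<Row> clicks = spark.read().parquet("/data/clicks");

        // Join the two event types on their shared pageview id
        // (inner join; the key column is deduplicated in the result).
        Dataset<Row> pageviews = impressions.join(clicks, "pageviewId");

        pageviews.write().parquet("/data/pageviews");
        spark.stop();
    }
}
```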