Kafka Practices @ Uber - Seattle Apache Kafka meetup

Real-Time Data Pipeline @ Uber
Mingmin Chen
George Teo
Seattle Apache Kafka Meetup
Jan 18, 2018

Agenda
● Use Cases & Current Scale
● Data Infrastructure @ Uber
● Kafka @ Uber
○ Rest Proxy & Clients
○ Local Agent
○ uReplicator (MirrorMaker)
○ Offset Sync Service
○ Chaperone (Auditing)
○ Cluster Balancing
● Future Work

Real-time Driver-Rider Matching
[Figure: App Views and Vehicle information, KAFKA, and Stream Processing (Driver-Rider Match, ETA)]

A bunch more...
● Fraud Detection
● Share My ETA
● Driver & Rider Signups
● Etc.

Kafka - Use Cases
● General Pub-Sub
● Stream Processing
○ AthenaX - Self-Serve Platform (Samza, Flink)
● Database Changelog Transport
○ Schemaless, Cassandra, MySQL
● Ingestion
○ HDFS, S3
● Logging

Scale
* obligatory show-off slide
● Trillion+ Messages/Day
● ~PBs Data Volume
● Tens of Thousands of Topics
(excluding replication)

Apache Kafka is Uber’s Data Hub

Data Infrastructure @ Uber
[Figure: Kafka sits between PRODUCERS and CONSUMERS. Labels recovered from the diagram: Rider App; Driver App; API / Services; (Internal) Services; Etc.; DATABASES (Cassandra, Schemaless, MySQL); Real-time Analytics, Alerts, Dashboards; Samza / Flink; Applications; Data Science; Analytics; Reporting; Vertica / Hive; Ad-hoc Exploration; ELK; Debugging; Hadoop; AWS S3; Surge; Mobile App]

Requirements
● Scale Horizontally
● API Latency (<5ms typically)
● Availability -> 99.99%
● Durability -> 99.99%; 100% -> Critical Customers
● Multi-DC Replication
● Multi-Language Support
○ Java, Go, Python, Node.js, C++
● Auditing
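
To make the 99.99% availability target concrete, the arithmetic below converts an availability fraction into a downtime budget. This is a worked illustration only; the function name `downtime_budget_minutes` is made up for this note and the deck does not say over which period the target is measured.

```python
def downtime_budget_minutes(availability, days=365):
    """Minutes of allowed downtime over `days` for a given availability fraction."""
    return (1 - availability) * days * 24 * 60


# 99.99% leaves roughly 52.6 minutes per 365-day year,
# or roughly 4.3 minutes per 30-day month.
yearly = downtime_budget_minutes(0.9999)
monthly = downtime_budget_minutes(0.9999, days=30)
print(f"{yearly:.2f} min/year, {monthly:.2f} min/month")
```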

Kafka Clusters
● Running Kafka 0.10.2
● Use Case-based
○ Logging
○ Database Changelogs
○ Highly Isolated & Reliable (e.g. Surge)
○ High Value Data (e.g. Signups)
● Fallback Secondary Clusters
● Global Aggregates
○ Offset Sync Service

Kafka Ecosystem @ Uber
[Figure: two data centers, DC1 and DC2. Components shown in each: Applications [ProxyClient], Kafka REST Proxy, Regional Kafka, uReplicator, and Aggregate Kafka. The diagram also shows a Local Agent, a Secondary Kafka cluster, and the Offset Sync Service.]

Producer Libraries
● High Throughput (average case)
○ Non-blocking, async, batched
● At-least-once (critical use case)
○ Blocking, sync
● Topic Discovery
○ Discovers the Kafka cluster a topic belongs to
○ Able to multiplex to different Kafka clusters
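
A minimal sketch of topic discovery and multiplexing, assuming a static topic-to-cluster map. Every name here (`TOPIC_TO_CLUSTER`, `MultiplexingProducer`, the topic and cluster names) is hypothetical and is not Uber's client API; the per-cluster "producer" is an in-memory list standing in for a real Kafka producer.

```python
from collections import defaultdict

# Hypothetical topic-to-cluster registry. A real client would presumably
# look this up from a discovery service instead of hard-coding it.
TOPIC_TO_CLUSTER = {
    "app-logs": "logging-cluster",
    "db-changelog": "changelog-cluster",
    "signups": "high-value-cluster",
}


class MultiplexingProducer:
    """Routes each message to the cluster that owns its topic."""

    def __init__(self, topic_to_cluster):
        self._topic_to_cluster = dict(topic_to_cluster)
        # One in-memory outbox per cluster stands in for a real producer.
        self.outboxes = defaultdict(list)

    def discover(self, topic):
        """Return the cluster name for a topic, or raise LookupError."""
        try:
            return self._topic_to_cluster[topic]
        except KeyError:
            raise LookupError(f"no cluster registered for topic {topic!r}") from None

    def send(self, topic, payload):
        cluster = self.discover(topic)
        self.outboxes[cluster].append((topic, payload))
        return cluster


producer = MultiplexingProducer(TOPIC_TO_CLUSTER)
producer.send("app-logs", b"GET /health 200")
producer.send("signups", b'{"user": "u1"}')
```

The application only names a topic; which cluster receives the message is decided inside the client.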

Kafka Local Agent
● Producer-side persistence
○ Local storage
● Isolates clients from downstream outages, backpressure
● Controlled backfill upon recovery
○ Prevents overwhelming a recovering cluster
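
The two ideas above (buffer locally during an outage, then backfill at a controlled rate) can be sketched as follows. This is an illustration under assumptions, not the Local Agent's implementation: `LocalAgent`, `backfill_tick`, and `backfill_batch` are hypothetical names, and an in-memory deque stands in for local disk storage.

```python
from collections import deque


class LocalAgent:
    """Buffers messages locally while downstream is down, then drains slowly."""

    def __init__(self, send, backfill_batch=2):
        self._send = send                  # callable(message); raises ConnectionError when down
        self._backlog = deque()            # stands in for on-disk local storage
        self._backfill_batch = backfill_batch

    def produce(self, message):
        # Preserve ordering: once a backlog exists, new messages queue behind it.
        if self._backlog:
            self._backlog.append(message)
            return
        try:
            self._send(message)
        except ConnectionError:
            self._backlog.append(message)

    def backfill_tick(self):
        """Send at most backfill_batch buffered messages; return how many were sent."""
        sent = 0
        while self._backlog and sent < self._backfill_batch:
            try:
                self._send(self._backlog[0])
            except ConnectionError:
                break                      # still down; try again on a later tick
            self._backlog.popleft()
            sent += 1
        return sent

    @property
    def backlog_size(self):
        return len(self._backlog)


delivered = []
cluster = {"up": False}


def send_to_cluster(message):
    if not cluster["up"]:
        raise ConnectionError("cluster unavailable")
    delivered.append(message)


agent = LocalAgent(send_to_cluster, backfill_batch=2)
for i in range(5):
    agent.produce(i)                       # all five land in the local backlog

cluster["up"] = True
first_tick = agent.backfill_tick()         # drains only two messages per tick
```

Capping each tick is what keeps a freshly recovered cluster from receiving the whole backlog at once; a real agent would tune the cap and the tick interval.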

Local Agent in Action
[Figure not included in the source slides]

Why Kafka REST Proxy?
● Simplified Client API
○ Multi-lang Support
● Decouple Clients from Kafka Brokers
○ Thin Clients = Operational Ease
○ Easier Kafka Upgrades
● Enhanced Reliability
○ Quota Management
○ Primary & Secondary Clusters

Kafka REST Proxy: Internals
● Based on Confluent’s open-source REST Proxy
● Performance enhancements
○ Simple HTTP servlets on Jetty instead of Jersey
○ Optimized for binary payloads
○ Performance increase from 7K* to 45K QPS/box
● Caching of topic metadata
● Reliability improvements*
○ Support for Fallback cluster
○ Support for multiple producers (SLA-based segregation)
● Plan to contribute back to community
*Based on benchmarking & analysis done in Jun 2015
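
The slide mentions caching of topic metadata without saying how it is done. One common approach is a time-to-live (TTL) cache, sketched below as an assumption: `TopicMetadataCache`, the 60-second TTL, and the shape of the metadata dictionary are all hypothetical and not taken from the proxy's code.

```python
import time


class TopicMetadataCache:
    """Caches per-topic metadata and refetches it after ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch          # callable(topic) -> metadata (the slow lookup)
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the demo can fake time
        self._entries = {}           # topic -> (expires_at, metadata)

    def get(self, topic):
        now = self._clock()
        entry = self._entries.get(topic)
        if entry is not None and entry[0] > now:
            return entry[1]          # fresh entry: no broker round trip
        metadata = self._fetch(topic)
        self._entries[topic] = (now + self._ttl, metadata)
        return metadata


fetch_calls = []


def fetch_metadata(topic):
    fetch_calls.append(topic)
    return {"topic": topic, "partitions": 4}


fake_now = [0.0]
cache = TopicMetadataCache(fetch_metadata, ttl_seconds=60.0, clock=lambda: fake_now[0])
cache.get("trips")       # miss: fetches
cache.get("trips")       # hit: served from the cache
fake_now[0] = 61.0
cache.get("trips")       # expired: fetches again
```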

Kafka Secondary Cluster
● High availability on regional cluster failure
● REST Proxy produces to the Secondary Cluster on Regional Cluster failure
● uReplicator/MirrorMaker backfills data to the Regional Cluster on recovery
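
The failover and backfill behaviour above can be sketched as follows. This is a simplified illustration, not the proxy's or uReplicator's code: `FallbackProducer` and `backfill` are hypothetical names, Python lists stand in for the two clusters, and a `ConnectionError` stands in for whatever failure signal the real proxy uses.

```python
class FallbackProducer:
    """Tries the regional cluster first and falls back to the secondary."""

    def __init__(self, regional_send, secondary_send):
        self._regional_send = regional_send
        self._secondary_send = secondary_send

    def produce(self, message):
        try:
            self._regional_send(message)
            return "regional"
        except ConnectionError:
            self._secondary_send(message)
            return "secondary"


def backfill(secondary_log, regional_send):
    """After recovery, move everything held by the secondary to the regional cluster."""
    copied = 0
    while secondary_log:
        regional_send(secondary_log[0])    # raises if the regional cluster is still down
        secondary_log.pop(0)
        copied += 1
    return copied


state = {"regional_up": True}
regional, secondary = [], []


def send_regional(message):
    if not state["regional_up"]:
        raise ConnectionError("regional cluster unavailable")
    regional.append(message)


proxy = FallbackProducer(send_regional, secondary.append)
routes = [proxy.produce("m1")]
state["regional_up"] = False               # simulate a regional outage
routes.append(proxy.produce("m2"))
state["regional_up"] = True                # regional cluster recovers
copied = backfill(secondary, send_regional)
```

Because a message is removed from the secondary only after the regional send succeeds, a failure in the middle of a backfill leaves the remaining messages in place for a later retry.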

uReplicator
● In-house Intercluster Replication Solution
○ Apache Helix-based
○ Mirror all traffic between & within DCs
○ Lower rebalance latencies
● Running in Production ~2 Years
● Open Sourced: https://github.com/uber/uReplicator
● Uber Engineering Blog: https://eng.uber.com/ureplicator/

Cluster Balancing
● No Auto Rebalancing
● Manual Placement is Hard
● Auto Plan Generation
○ And execution!
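
The deck does not describe how plans are generated. As one possible approach, the sketch below uses a greedy heuristic: place the largest partitions first, each on the currently least-loaded broker. `generate_plan`, the partition sizes, and the broker names are hypothetical, and replicas and rack awareness are ignored for brevity.

```python
import heapq


def generate_plan(partition_sizes, brokers):
    """Greedy placement: largest partition first, onto the least-loaded broker."""
    heap = [(0, broker) for broker in sorted(brokers)]   # (current load, broker)
    heapq.heapify(heap)
    plan = {}
    # Sort by size descending, then by name so the result is deterministic.
    ordered = sorted(partition_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for partition, size in ordered:
        load, broker = heapq.heappop(heap)
        plan[partition] = broker
        heapq.heappush(heap, (load + size, broker))
    return plan


def broker_loads(plan, partition_sizes):
    loads = {}
    for partition, broker in plan.items():
        loads[broker] = loads.get(broker, 0) + partition_sizes[partition]
    return loads


sizes = {"trips-0": 50, "trips-1": 40, "logs-0": 30, "logs-1": 20, "logs-2": 10}
plan = generate_plan(sizes, ["broker-1", "broker-2"])
loads = broker_loads(plan, sizes)
```

A production planner would also have to limit how much data moves per step, which this sketch does not attempt.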

At-Least-Once
[Figure: Application Process, ProxyClient, Kafka Proxy Server, Regional Kafka, uReplicator, and Aggregate Kafka, with data-flow steps numbered 1 to 8]
● Most of the infrastructure is tuned for high throughput
○ Batching at each stage
○ Ack before being persisted (ack’ed != committed)
● Single node failure in any stage leads to data loss
● Need a reliable pipeline for High Value Data e.g. Payments
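
The distinction between "ack'ed" and "committed" is the core of at-least-once delivery: a stage acknowledges only after persisting, and the sender retries until it sees that acknowledgement, accepting duplicates. The sketch below illustrates the idea with hypothetical names (`FlakyStage`, `send_at_least_once`); it is not the pipeline's actual protocol.

```python
class FlakyStage:
    """Persists every write, but loses the first ack for each message id."""

    def __init__(self):
        self.log = []
        self._seen = set()

    def write(self, msg_id, payload):
        self.log.append((msg_id, payload))   # persisted before any ack is sent
        if msg_id not in self._seen:
            self._seen.add(msg_id)
            return False                     # simulate an ack lost in transit
        return True


def send_at_least_once(stage, msg_id, payload, max_attempts=5):
    """Retry until the stage acknowledges; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if stage.write(msg_id, payload):
            return attempt
    raise RuntimeError(f"no ack for {msg_id!r} after {max_attempts} attempts")


stage = FlakyStage()
attempts = send_at_least_once(stage, "pay-1", b"charge")
```

The retry means the message is never lost, but the log now holds it twice, so consumers of an at-least-once pipeline need to tolerate or deduplicate repeats.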

At-least-once Kafka: Data Flow
[Figure: Application Process, ProxyClient, Kafka Proxy Server, Regional Kafka, uReplicator, and Aggregate Kafka, with data-flow steps numbered 1 to 8]

Consumer
[Figure: in DC1, Applications [ProxyClient], Kafka REST Proxy, Regional Kafka, uReplicator, and Aggregate Kafka, with two consumer applications, one of them labeled "Global View"]

Offset Sync Service
● Used for syncing offsets between aggregate clusters on failover
● MirrorMaker periodically snapshots the regional-to-aggregate offset map to an external datastore
● Uses the offset map to recover a safe consumer offset to resume from in the passive DC
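
A simplified sketch of the lookup step, assuming the offset map for one partition is a sorted list of checkpoints pairing an offset in the cluster the consumer was reading with the offset of the same message in the passive cluster. `OffsetMap` and its methods are hypothetical, and the real service maps through regional offsets rather than directly between aggregates. "Safe" here means resuming at the last checkpoint at or before the committed offset, so some messages may be re-read but none are skipped.

```python
import bisect


class OffsetMap:
    """Checkpoints of (active_offset, passive_offset) for one topic partition."""

    def __init__(self):
        self._active = []
        self._passive = []

    def snapshot(self, active_offset, passive_offset):
        # Checkpoints must arrive in strictly increasing active-offset order.
        if self._active and active_offset <= self._active[-1]:
            raise ValueError("checkpoints must be strictly increasing")
        self._active.append(active_offset)
        self._passive.append(passive_offset)

    def safe_passive_offset(self, committed_active_offset):
        """Passive offset of the last checkpoint at or before the committed offset.

        Returns None when no checkpoint qualifies; the caller would then
        fall back to the earliest available offset.
        """
        i = bisect.bisect_right(self._active, committed_active_offset)
        if i == 0:
            return None
        return self._passive[i - 1]


offsets = OffsetMap()
offsets.snapshot(100, 90)
offsets.snapshot(200, 185)
offsets.snapshot(300, 290)

resume_at = offsets.safe_passive_offset(250)   # falls back to the checkpoint at 200
```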

Chaperone - Track Counts
[Screenshot not included in the source slides]

Chaperone - Track Latency
[Screenshot not included in the source slides]

Chaperone - End to End Auditing
● In-house Auditing Solution for Kafka
● Running in Production for ~2 Years
○ Audit 20k+ topics for 99.99% completeness
● Open Sourced: https://github.com/uber/chaperone
● Uber Engineering Blog: https://eng.uber.com/chaperone/
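
One way to audit completeness is to count messages per time window at two points in the pipeline and compare the counts. The sketch below assumes fixed 600-second windows keyed by message timestamp and a 99.99% threshold; `bucket_counts`, `completeness`, and the sample timestamps are hypothetical and not Chaperone's actual code.

```python
from collections import Counter


def bucket_counts(timestamps, window_seconds=600):
    """Count messages per window, keyed by the window's start time."""
    return Counter(ts - ts % window_seconds for ts in timestamps)


def completeness(upstream, downstream):
    """Fraction of upstream messages seen downstream, per window."""
    return {
        window: downstream.get(window, 0) / sent
        for window, sent in upstream.items()
    }


produced = [0, 10, 599, 600, 700]      # timestamps seen at the producer side
consumed = [0, 10, 599, 600]           # timestamps seen at the consumer side

report = completeness(bucket_counts(produced), bucket_counts(consumed))
incomplete = sorted(w for w, ratio in report.items() if ratio < 0.9999)
```

Keying windows by message timestamp rather than arrival time lets counts from different stages be compared even when messages arrive late.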

Future Work
● Richer consumer semantics for service owners
○ DLQ
○ Per partition competing consumer
● Multi-zone Clusters
○ Durability during DC wide outages
● Chargebacks
● Efficiency Enhancements
○ Intelligent aggregates, automated topic GC, etc.
● uReplicator 2.0
● Open Source

Thank you
Proprietary and confidential © 2016 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or
utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage
or retrieval systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity
to whom it is addressed and contains information that is privileged, confidential or otherwise exempt from disclosure under
applicable law. All recipients of this document are notified that the information contained herein includes proprietary and
confidential information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any of
the enclosed information to any person other than employees of addressee to the extent necessary for consultations with
authorized personnel of Uber.
More open-source projects at eng.uber.com
