
uReplicator: Uber Engineering’s Scalable, Robust Kafka Replicator


Presented at the Stream Processing Meetup (7/19/2018): https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/251481797/
At Uber, we operate 20+ Kafka clusters to collect system and application logs as well as event data from rider and driver apps, and we need a replication solution to move data between Kafka clusters across multiple data centers for different purposes. This talk introduces the history behind uReplicator and its high-level architecture. As the original uReplicator ran into scalability challenges and operational overhead with the growing scale of our Kafka clusters, we built the Federated uReplicator, which addresses those issues and provides an extensible architecture for further scaling.



  1. uReplicator: Uber Engineering’s Scalable, Robust Kafka Replicator. Hongliang Xu, Software Engineer, Streaming Data. Stream Processing Meetup @ LinkedIn, July 19, 2018
  2. Agenda ● Use cases & scale ● Motivation ● High-level design ● Future work
  4. Kafka@Uber (diagram): in each data center (DC1/DC2), applications publish via the ProxyClient to a Kafka REST Proxy into a Regional Kafka cluster; uReplicator replicates the regional clusters into Aggregate Kafka in DC1/2, alongside a Secondary Kafka cluster.
  5. Scale (excluding replication): Trillion+ messages/day, ~PBs of data volume, tens of thousands of topics
  6. Replication - Use Cases ● Aggregation ○ All-active ○ Ingestion ● Migration ○ Allows consumers to move later
  7. Agenda ● Use Cases & Scale ● Motivation ● High-level Design ● Future Work
  8. MirrorMaker ● Expensive rebalancing ● Difficulty adding topics ● Possible data loss ● Metadata sync issues
  9. Requirements ● Stable replication ● Simple operations ● High throughput ● No data loss ● Auditing
  10. Agenda ● Use Cases & Scale ● Motivation ● High-level Design ● Future Work
  11. uReplicator ● uReplicator controller ○ Helix controller ○ Handles adding/deleting topics ○ Handles adding/deleting/failure of uReplicator workers ○ Assigns topic partitions to each worker process
  12. uReplicator (diagram): a new topic or partition is registered through the controller’s REST API, e.g. curl -X POST -d '{"topic":"testTopic", "numPartitions":"4"}' http://localhost:7061/topics. The Helix controller calculates the idealState of the topic partitions, e.g. { "externalView": {topic: testTopic}, "idealState": { "0": ["MM worker1"], "1": ["MM worker2"], "2": ["MM worker3"], "3": ["MM worker1"] } }, and coordinates the Helix agents on Worker1/2/3 (each running worker threads per topic-partition) through Zookeeper.
  13. uReplicator (diagram, continued): following the idealState, testTopic partitions 0 and 3 are assigned to Worker1, partition 1 to Worker2, and partition 2 to Worker3.
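
The idealState shown above is, at its core, a round-robin mapping of topic partitions onto live workers. A minimal sketch of that calculation in plain Java (hypothetical class and method names; the real controller computes and distributes this state through Helix):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the controller's idealState calculation:
// assign each partition of a topic to a live worker, round-robin.
public class IdealStateSketch {
    public static Map<Integer, String> assign(int numPartitions, List<String> liveWorkers) {
        Map<Integer, String> idealState = new LinkedHashMap<>();
        for (int p = 0; p < numPartitions; p++) {
            // Partition p goes to worker (p mod N), matching the example
            // mapping {"0": worker1, "1": worker2, "2": worker3, "3": worker1}.
            idealState.put(p, liveWorkers.get(p % liveWorkers.size()));
        }
        return idealState;
    }

    public static void main(String[] args) {
        System.out.println(assign(4, List.of("MM worker1", "MM worker2", "MM worker3")));
        // => {0=MM worker1, 1=MM worker2, 2=MM worker3, 3=MM worker1}
    }
}
```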
  14. uReplicator ● uReplicator worker ○ Helix agent ○ Dynamic fetcher ○ Kafka producer ○ Commit after flush. (Diagram: the Helix agent receives assignments from the Helix controller via Zookeeper; the FetcherManager runs a FetcherThread per topic-partition, using a SimpleConsumer plus a LeaderFind thread, which feeds a LinkedBlockingQueue drained by the Kafka producer.)
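
"Commit after flush" is the worker's no-data-loss rule: source offsets are committed only after the destination producer has flushed the corresponding messages. A sketch of that ordering with the standard Kafka client API (illustrative wiring only; the slide's worker uses a SimpleConsumer-based fetcher rather than this consumer loop):

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Illustrative commit-after-flush loop: offsets advance on the source
// cluster only after the destination producer has acknowledged the batch.
public class CommitAfterFlushSketch {
    public static void replicate(KafkaConsumer<byte[], byte[]> source,
                                 KafkaProducer<byte[], byte[]> destination) {
        while (true) {
            ConsumerRecords<byte[], byte[]> batch = source.poll(Duration.ofMillis(100));
            for (ConsumerRecord<byte[], byte[]> r : batch) {
                destination.send(new ProducerRecord<>(r.topic(), r.key(), r.value()));
            }
            destination.flush();   // block until every send above completes
            source.commitSync();   // only now is it safe to commit source offsets
        }
    }
}
```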
  17. Performance Issues ● Catch-up time is too long (e.g. after a 4-hour failover) ○ Full-speed phase: 2~3 hours ○ Long-tail phase: 5~6 hours
  18. Problem: Full Speed Phase ● Destination brokers are CPU-bound
  19. Solution 1: Increase Batch Size ● Increase throughput ○ producer.batch.size: 64KB => 128KB ○ producer.linger.ms: 100 => 1000 ● Effect ○ Batch size increases: 10~22KB => 50~90KB ○ Compression ratio (compressed size as a share of original): 40~62% => 27~35%
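
In standard Kafka producer configuration, those two knobs are `batch.size` and `linger.ms`; a minimal sketch of the change (values from the slide, all other settings omitted):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

// Producer settings behind Solution 1: bigger batches, longer linger.
public class BatchConfigSketch {
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 128 * 1024); // 64KB -> 128KB
        props.put(ProducerConfig.LINGER_MS_CONFIG, 1000);        // 100ms -> 1000ms
        // Larger batches also compress better, which is where the
        // 40~62% -> 27~35% compressed-size improvement comes from.
        return props;
    }
}
```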
  20. Solution 2: 1-1 Partition Mapping ● Before (round robin): for a topic with N partitions, N^2 connections and DoS-like traffic to the destination ● After (deterministic partition mapping): N connections and reduced contention. (Diagrams: MM workers 1~4 replicating source partitions p0~p3 to destination partitions p0~p3.)
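
The 1-1 mapping means a message fetched from source partition p is always produced to destination partition p, instead of letting the producer's partitioner spread it around. A sketch of the difference using the two standard ProducerRecord constructors (illustrative, not the actual worker code):

```java
import org.apache.kafka.clients.producer.ProducerRecord;

public class PartitionMappingSketch {
    // Before: no partition given, so the producer's partitioner may route
    // a record to any destination partition -> up to N^2 connections.
    public static ProducerRecord<byte[], byte[]> roundRobin(
            String topic, byte[] key, byte[] value) {
        return new ProducerRecord<>(topic, key, value);
    }

    // After: pin the destination partition to the source partition ->
    // N connections, deterministic routing, less broker contention.
    public static ProducerRecord<byte[], byte[]> oneToOne(
            String topic, int sourcePartition, byte[] key, byte[] value) {
        return new ProducerRecord<>(topic, sourcePartition, key, value);
    }
}
```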
  21. Full Speed Phase Throughput ● First hour: 27.2MB/s => 53.6MB/s per aggregate broker (before/after charts)
  22. Problem 2: Long Tail Phase ● During the full-speed phase, all workers are busy ● Some workers catch up and become idle, but the others are still busy
  23. Solution 1: Dynamic Workload Balance ● Original partition assignment ○ Based only on the number of partitions ○ Heavy partitions can land on the same worker ● Workload-based assignment ○ Total workload when a topic is added ■ Source cluster bytes-in rate ■ Retrieved from Chaperone3 ○ Dynamic rebalance periodically ■ When a worker exceeds 1.5 times the average workload (see the sketch below)
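
A sketch of that rebalance trigger: sum each worker's partition workloads (bytes-in rate, as reported by Chaperone3) and flag any worker above 1.5x the average as a candidate to shed partitions (hypothetical types and names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical rebalance check: a worker is overloaded if its summed
// source bytes-in rate exceeds 1.5x the average across all workers.
public class WorkloadBalanceSketch {
    static final double OVERLOAD_FACTOR = 1.5;

    public static List<String> overloadedWorkers(Map<String, Double> workloadByWorker) {
        double average = workloadByWorker.values().stream()
                .mapToDouble(Double::doubleValue).average().orElse(0.0);
        List<String> overloaded = new ArrayList<>();
        for (Map.Entry<String, Double> e : workloadByWorker.entrySet()) {
            if (e.getValue() > OVERLOAD_FACTOR * average) {
                overloaded.add(e.getKey()); // candidate to shed partitions
            }
        }
        return overloaded;
    }
}
```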
  24. Solution 2: Lag-Feedback Rebalance ● Monitor topic lags ○ Balance the workers based on lags ■ Larger lag = heavier workload ○ Dedicated workers for lagging topics
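
One way to fold lag into the workload metric, purely as an illustration (the slide only states that larger lag means heavier workload; the weighting function here is an assumption, not uReplicator's formula):

```java
// Illustrative only: weight a partition's bytes-in rate by its lag so
// that lagging partitions look heavier and get rebalanced onto idle or
// dedicated workers. The log1p weighting is an assumption for the sketch.
public class LagWeightSketch {
    public static double effectiveWorkload(double bytesInRate, long lagMessages) {
        double lagFactor = 1.0 + Math.log1p((double) lagMessages);
        return bytesInRate * lagFactor;
    }
}
```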
  25. Dynamic Workload Rebalance ● Workload is adjusted periodically (every 10 minutes) ● Multiple lagging topics on the same worker are spread across multiple workers
  26. Issues Again ● Scalability ○ Growing number of partitions ○ Up to 1 hour for a full rebalance during a rolling restart ● Operation ○ Number of deployments to manage ○ Setting up a new cluster ● Sanity ○ Replication loops ○ Double routes
  27. Federated uReplicator ● Deployment group ● 1 manager ● Multiple routes ● Controller/worker pool
  28. Federated uReplicator ● uReplicator Front ○ Whitelists/blacklists topics (topic, src, dst) ○ Persists them in a DB ○ Runs sanity checks ○ Assigns topics to a uReplicator Manager
  29. Federated uReplicator ● uReplicator Manager ○ If the src->dst route does not exist: select a controller and workers from the pool -> new route; add the topic to the route -> done ○ Else: for each existing route, if #partitions < max, add the topic to that route -> done; otherwise select a controller and workers from the pool -> new route; add the topic to the route -> done (runnable sketch below)
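
The same routing decision restated as a runnable sketch (hypothetical Route type and partition cap; controller/worker pool selection and Helix/DB persistence are elided):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the manager's routing decision from the slide.
public class RouteManagerSketch {
    static final int MAX_PARTITIONS_PER_ROUTE = 1000; // illustrative cap

    static class Route {
        int partitionCount;
        List<String> topics = new ArrayList<>();
    }

    // Routes keyed by (src, dst) pair, e.g. "dc1->agg1".
    final Map<String, List<Route>> routes = new HashMap<>();

    void addTopic(String topic, int numPartitions, String src, String dst) {
        List<Route> existing = routes.computeIfAbsent(src + "->" + dst,
                k -> new ArrayList<>());
        // Try to fit the topic into an existing route under the cap.
        for (Route route : existing) {
            if (route.partitionCount + numPartitions <= MAX_PARTITIONS_PER_ROUTE) {
                route.topics.add(topic);
                route.partitionCount += numPartitions;
                return;
            }
        }
        // Otherwise take a controller + workers from the pool -> new route.
        Route fresh = new Route(); // pool selection elided
        fresh.topics.add(topic);
        fresh.partitionCount = numPartitions;
        existing.add(fresh);
    }
}
```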
  30. Federated uReplicator ● uReplicator Controller/Worker ○ Step 1: route assignment (Manager -> Controller/Worker): resource @<src cluster>@<dst cluster>, partition <route id> ○ Step 2: topic assignment (Manager -> Controller): resource <topic name>, partition @<src cluster>@<dst cluster>@<route id>
  31. Federated uReplicator ● Auto-scaling ○ Total workload of the route ○ Total lag in the route ○ Total workload / expected workload per worker ○ Automatically add workers to the route (see the sketch below)
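
A sketch of the auto-scaling check implied by those inputs (illustrative; the exact formula is not given on the slide):

```java
// Illustrative auto-scaling check: if the route's total workload implies
// more workers than it currently has, add the difference from the pool.
public class AutoScaleSketch {
    public static int workersToAdd(double totalRouteWorkload,
                                   double expectedWorkloadPerWorker,
                                   int currentWorkers) {
        int needed = (int) Math.ceil(totalRouteWorkload / expectedWorkloadPerWorker);
        return Math.max(0, needed - currentWorkers);
    }
}
```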
  32. Chaperone Auditing ● Consumes messages from Kafka ● Compares counts from each tier ● https://eng.uber.com/chaperone/ ● https://github.com/uber/chaperone
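
The audit itself boils down to comparing per-tier message counts for the same topic and time bucket; a minimal sketch of that check (hypothetical structure; see the Chaperone links above for the real design):

```java
import java.util.Map;

// Minimal sketch of Chaperone-style auditing: for a given topic and time
// bucket, the counts reported from each tier should match exactly.
public class AuditSketch {
    public static boolean consistent(Map<String, Long> countsByTier) {
        return countsByTier.values().stream().distinct().count() <= 1;
    }

    public static void main(String[] args) {
        System.out.println(consistent(Map.of("regional", 1000L, "aggregate", 1000L))); // true
        System.out.println(consistent(Map.of("regional", 1000L, "aggregate", 997L)));  // false -> possible data loss
    }
}
```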
  33. Agenda ● Use cases & scale ● Motivation ● High-level design ● Future work
  34. Future Work ● Open source ● Extensible framework ○ From anything to anything
  35. Links ● https://github.com/uber/uReplicator ● https://eng.uber.com/ureplicator/
  36. Acknowledgements ● Chinmay Soman, Xiang Fu, Yuanchi Ning, Zhenmin Li, Xiaobing Li, Ying Zheng, Lei Lin & streaming-data@Uber
  37. Thank you. More open-source projects at eng.uber.com
