
Building a company-wide data pipeline on Apache Kafka - engineering for 150 billion messages per day


Yuto Kawamura (LINE Corporation)
LINE Vietnam Opening Day, March 31st 2018



  1. Building a company-wide data pipeline on Apache Kafka - engineering for 150 billion messages per day (Yuto Kawamura, LINE Corp)
  2. Speaker introduction • Yuto Kawamura • Senior software engineer, LINE server development • Based in the Tokyo office • Apache Kafka contributor • Joined April 2015 (about 3 years ago)
  3. About LINE • Messaging service • Over 200 million global monthly active users¹, with top market share in countries such as Japan, Taiwan, and Thailand • Many family services • News • Music • LIVE (video streaming)
     ¹ As of June 2017; sum of 4 countries: Japan, Taiwan, Thailand, and Indonesia.

  4. Agenda • Introducing the LINE server • Data pipeline with Apache Kafka
  5. LINE server engineering is about… • Scalability • Many users, many requests, much data • Reliability • LINE is already a communications infrastructure in several countries

  6. Scale metric: message delivery • The LINE server delivers 25 billion messages/day (API calls: 80 billion/day)
  7. Scale metric: accumulated data (for analysis) • 40 PB
  8. Messaging system architecture overview • LINE apps connect to LEGY gateways (JP, DE, SG), which proxy Thrift RPC/HTTP to talk-server, backed by a distributed data store and distributed async task processing
  9. LEGY • LINE Event Delivery Gateway • API gateway/reverse proxy • Written in Erlang • Features focused on the needs of implementing a messaging service • e.g., zero-latency code hot swapping without closing client connections
  10. talk-server • Java-based web application server • Implements most of the messaging functionality plus some other features • Java 8 + Spring + Thrift RPC + Tomcat 8
  11. Datastore with Redis and HBase • LINE's hybrid datastore = Redis (in-memory DB, home-brew clustering) + HBase (persistent distributed key-value store) • Cascading failure handling • Async writes from a background task processor • Data-correction batch • (Diagram: talk-server dual-writes to Redis as cache/primary and to HBase as primary/backup)
  12. Message delivery (Alice sends "Hello!" to Bob) • X. Bob's client issues a long-polling fetchOps() and waits • 1. Alice's client finds the nearest LEGY • 2. sendMessage("Bob", "Hello!") • 3. LEGY proxies the request to talk-server • 4. talk-server writes the message to storage • 5. talk-server finds the LEGY Bob is connected to and notifies it of the message arrival • 6. That LEGY proxies the request to talk-server • 7. talk-server reads the message from storage • 8. The pending fetchOps() returns with the message
  13. 13. There’re a lot of internal communication processing user’s request talk-server Threat detection system Timeline Server Data Analysis Background Task processing Request
  14. Communication between internal systems • Communication for querying and transactional updates: • Querying authentication/permissions • Synchronous updates • Communication for data synchronization and update notification: • Notifying of a user's relationship update • Synchronizing a data update with another service • (talk-server talks to Auth, Analytics, and other services over HTTP/REST/RPC)
  15. Apache Kafka • A distributed streaming platform • (In a narrow sense) a distributed persistent message queue that supports the pub-sub model • Built-in load distribution • Built-in fail-over on both the server (broker) and client sides
  16. How it works • Producers publish records to topics on the brokers; consumers subscribe to those topics

     AuthEvent event = AuthEvent.newBuilder()
         .setUserId(123)
         .setEventType(AuthEventType.REGISTER)
         .build();
     producer.send(new ProducerRecord<>("events", userId, event));

     Properties config = new Properties();
     config.put("group.id", "group-A");
     KafkaConsumer<Long, AuthEvent> consumer = new KafkaConsumer<>(config);
     consumer.subscribe(Collections.singletonList("events"));
     consumer.poll(100); // => Record(key=123, value=...)
  17. Pub-Sub • Multiple consumer "groups" (e.g., Group A and Group B) can independently consume a single topic • Each group receives the full record stream (Records[A, B, C, …]) from the brokers
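The group semantics on this slide can be sketched without the real Kafka API. A minimal in-memory sketch, where the partition count, keys, and round-robin assignment are illustrative assumptions (Kafka's default partitioner actually uses murmur2 hashing), not LINE's configuration:

```java
import java.util.*;

// Sketch of Kafka's pub-sub semantics: every consumer *group* sees all
// records of a topic, while members within one group split the
// partitions among themselves.
public class PubSubSketch {
    static final int PARTITIONS = 3; // assumed topic size

    // Stand-in for Kafka's key-based partitioning (hash(key) % partitions).
    static int partitionFor(String key) {
        return Math.abs(key.hashCode()) % PARTITIONS;
    }

    // Within one group, each partition is owned by exactly one member
    // (round-robin here); different groups assign independently.
    static Map<Integer, Integer> assign(int groupSize) {
        Map<Integer, Integer> owner = new HashMap<>();
        for (int p = 0; p < PARTITIONS; p++) {
            owner.put(p, p % groupSize);
        }
        return owner;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("alice", "bob", "carol", "dave");

        // Group A has 3 members, group B has 1; both see every record.
        Map<Integer, Integer> groupA = assign(3);
        Map<Integer, Integer> groupB = assign(1);

        for (String k : keys) {
            int p = partitionFor(k);
            System.out.printf("key=%s partition=%d groupA-member=%d groupB-member=%d%n",
                    k, p, groupA.get(p), groupB.get(p));
        }
    }
}
```

The point of the sketch: adding members to one group spreads its partitions for parallelism, while adding a whole new group duplicates the stream, which is why many downstream services can share one topic.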
  18. Example: UserActivityEvent
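The slide presumably illustrates per-user activity events flowing through Kafka. As a hedged sketch (the UserActivityEvent class, its fields, and the partition count below are invented for illustration, not LINE's actual schema), keying each event by user ID routes all of one user's activity to the same partition, which preserves per-user ordering:

```java
import java.util.Arrays;
import java.util.List;

public class UserActivitySketch {
    static final int PARTITIONS = 4; // assumed topic size

    // Hypothetical event shape; the real schema is not shown in the deck.
    static class UserActivityEvent {
        final long userId;
        final String type;
        UserActivityEvent(long userId, String type) {
            this.userId = userId;
            this.type = type;
        }
    }

    // Stand-in for Kafka's default key-based partitioner.
    static int partitionFor(long userId) {
        return (int) (userId % PARTITIONS);
    }

    public static void main(String[] args) {
        List<UserActivityEvent> events = Arrays.asList(
                new UserActivityEvent(123, "LOGIN"),
                new UserActivityEvent(123, "SEND_MESSAGE"),
                new UserActivityEvent(456, "LOGIN"));

        // Both of user 123's events map to one partition, so a consumer
        // of that partition sees them in produced order.
        for (UserActivityEvent e : events) {
            System.out.printf("user=%d %s -> partition %d%n",
                    e.userId, e.type, partitionFor(e.userId));
        }
    }
}
```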
  19. Scale metric: events produced into Kafka • Many services produce 150 billion msgs/day (3 million msgs/sec)
  20. Our Kafka needs to be highly performant • Some usages are sensitive to delivery latency • A broker's latency impacts throughput as well, because a Kafka topic is consumed as a queue
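One way to see why broker latency caps throughput on a queue-like topic is Little's law (throughput ≈ in-flight requests / latency): with a bounded number of in-flight requests, every extra millisecond of broker response time directly reduces requests per second. The numbers below are illustrative assumptions, not LINE measurements:

```java
public class LittleLaw {
    // Little's law rearranged: throughput = in-flight requests / latency.
    static double throughputPerSec(int inFlight, double latencyMs) {
        return inFlight / (latencyMs / 1000.0);
    }

    public static void main(String[] args) {
        // Assumed client with at most 5 in-flight requests.
        System.out.printf("%.0f req/sec at 10 ms%n", throughputPerSec(5, 10.0));   // 500
        System.out.printf("%.0f req/sec at 200 ms%n", throughputPerSec(5, 200.0)); // 25
    }
}
```

Under these assumptions, cutting 99th-percentile latency from 200 ms to 10 ms (the order of improvement reported on the next slide) raises the per-client ceiling by the same factor of 20.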
  21. …which wasn't a built-in property • KAFKA-4614: Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex • 99th-percentile latency of Produce requests: 150~200 ms => 10 ms (x15~x20 faster) • KAFKA-6051: ReplicaFetcherThread should close the ReplicaFetcherBlockingSend earlier on shutdown • Eliminated ~x1000 slower responses while restarting a broker • (Not yet published) Kafka broker performance degradation when a consumer requests to fetch old data • x10~x15 speedup for 99th-percentile responses
  22. Performance engineering Kafka • Application level: • Read and understand the code • Patch it to eliminate bottlenecks • JVM level: • JVM profiling • GC log analysis • JVM parameter tuning • OS level: • Linux perf • Delay accounting • SystemTap
  23. e.g., Investigating slow sendfile(2) • SystemTap: a kernel dynamic-tracing tool • Inject a script to probe in-kernel behavior

     stap -e '
     ...
     probe syscall.sendfile { d[tid()] = gettimeofday_us() }
     probe syscall.sendfile.return {
       if (d[tid()]) {
         st <<< gettimeofday_us() - d[tid()]
         delete d[tid()]
       }
     }
     probe end { print(@hist_log(st)) }
     '
  24. e.g., Investigating slow sendfile(2) • Found that slow sendfile(2) calls were blocking Kafka's event loop • => Patched Kafka to eliminate the blocking sendfile

     stap -e '…'
     value |---------------------------------------- count
         0 |                                             0
         1 |                                            71
         2 |@@@                                       6171
        16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  29472
        32 |@@@                                       3418
      2048 |                                             0
      4096 |                                             1
  25. …and we contribute it back
  26. More interested? • Kafka Summit SF 2017: "One Day, One Data Hub, 100 Billion Messages: Kafka at LINE" • https://youtu.be/X1zwbmLYPZg • Google "kafka summit line"
  27. Summary • Large scale + high reliability = difficult and exciting engineering! • LINE's architecture will keep evolving with OSS • …and there are more challenges ahead • Multi-IDC deployment • More and more performance and reliability improvements
  28. End of presentation. Any questions?
