SlideShare a Scribd company logo
1 of 28
Download to read offline
Building a company-wide data
pipeline upon Apache Kafka -
engineering for 150 billion
messages per day
Yuto Kawamura

LINE Corp
Speaker introduction
• Yuto Kawamura

• Senior software engineer of
LINE server development

• Work at Tokyo office

• Apache Kafka contributor

• Joined: Apr, 2015 (about 3
years)
About LINE
•Messaging service 

•Over 200 million global monthly active users
1
in countries with top
market share like Japan, Taiwan and Thailand

•Many family services

•News 

•Music

•LIVE (Video streaming) 

1
As of June 2017. Sum of 4 countries: Japan, Taiwan, Thailand and Indonesia. 

Agenda
• Introducing LINE server

• Data pipeline w/ Apache Kafka
LINE Server Engineering is
about …
• Scalability

• Many users, many requests, many data

• Reliability

• LINE already is a communication infra
in countries

Scale metrics: message
delivery
LINE Server
25 billion /day
(API call: 80 billion
/ day)
Scale metric: Accumulated
data (for analysis)
40PB
Messaging System
Architecture Overview
LINE Apps
LEGY JP
LEGY DE
LEGY SG
Thrift RPC/HTTP
talk-server
Distributed Data Store
Distributed async
task processing
LEGY
• LINE Event Delivery Gateway

• API Gateway/Reverse Proxy

• Written in Erlang

• Features focused on needs of implementing a messaging
service

• e.g, Zero latency code hot swapping w/o closing client
connections
talk-server
• Java based web application server

• Implements most of messaging functionality + some other
features

• Java8 + Spring + Thrift RPC + Tomcat8
Datastore with Redis and
HBase
• LINE’s hybrid datastore =
Redis(in-memory DB, home-
brew clustering) +
HBase(persistent distributed
key-value store)

• Cascading failure handling

• Async write from background
task processor

• Data correction batch
Primary/
Backup
talk-server
Cache/
Primary
Dual write
Message Delivery
LEGY
LEGY
talk-server
Storage
1. Find nearest LEGY
2. sendMessage(“Bob”, “Hello!”)
3. Proxy request
4. Write to storage
talk-server
X. fetchOps()
6. Proxy request
7. Read message
8. Return fetchOps() with message
5. Find LEGY Bob is connecting,
Notify message arrival
Alice
Bob
There’re a lot of internal communication
processing user’s request
talk-server
Threat
detection
system
Timeline
Server
Data Analysis
Background
Task
processing
Request
Communication between
internal systems
• Communication for querying, transactional
updates:

• Query authentication/permission

• Synchronous updates
• Communication for data synchronization, update
notification:

• Notify user’s relationship update

• Synchronize data update with another service
talk-server
Auth
Analytics
Another
Service
HTTP/REST/RPC
Apache Kafka
• A distributed streaming platform

• (narrow sense) A distributed persistent message queue
which supports Pub-Sub model

• Built-in load distribution

• Built-in fail-over on both server(broker) and client
How it works
Producer
Brokers
Consumer
Topic
Topic
Consumer
Consumer
Producer
AuthEvent event = AuthEvent.newBuilder()
.setUserId(123)
.setEventType(AuthEventType.REGISTER)
.build();
producer.send(new
ProducerRecord(“events", userId, event));
consumer = new KafkaConsumer("group.id" ->
"group-A");
consumer.subscribe("events");
consumer.poll(100)…
// => Record(key=123, value=...)
Consumer GroupA
Pub-Sub
Brokers
Consumer
Topic
Topic
Consumer
Consumer GroupB
Consumer
Consumer
Records[A, B, C…]
Records[A, B, C…]
• Multiple consumer “groups” can
independently consume a single topic
Example: UserActivityEvent
Scale metric: Events
produced into Kafka
Service Service
Service
Service
Service
Service
150 billion
msgs / day
(3 million msgs / sec)
our Kafka needs to be high-
performant
• Usages sensitive for delivery latency

• Broker’s latency impact throughput as well

• because Kafka topic is queue
… wasn’t a built-in property
• KAFKA-4614 Long GC pause harming broker performance
which is caused by mmap objects created for OffsetIndex

• 99th %ile latency of Produce request: 150 ~ 200ms => 10ms
(x15 ~ x20 faster)

• KAFKA-6051 ReplicaFetcherThread should close the
ReplicaFetcherBlockingSend earlier on shutdown

• Eliminated ~x1000 slower response during restarting broker 

• (unpublished yet) Kafka broker performance degradation when
consumer requests to fetch old data

• x10 ~ x15 speedup for 99th %ile response
Performance Engineering
Kafka
• Application Level:

• Read and understand code

• Patch it to eliminate
bottleneck

• JVM Level:

• JVM profiling

• GC log analysis

• JVM parameters tuning
• OS Level:

• Linux perf

• Delay Accounting

• SystemTap
e.g, Investigating slow
sendfile(2)
• SystemTap: A kernel dynamic tracing tool

• Inject script to probe in-kernel behavior
stap —e '
...
probe syscall.sendfile {
d[tid()] = gettimeofday_us()
}
probe syscall.sendfile.return {
if (d[tid()]) {
st <<< gettimeofday_us() - d[tid()]
delete d[tid()]
}
}
probe end {
print(@hist_log(st))
}
'
e.g, Investigating slow
sendfile(2)
• Found that slow sendfile is blocking Kafka’s event-loop

• => patch Kafka to eliminate blocking sendfile
stap -e ‘…’
value |---------------------------------------- count
0 | 0
1 | 71
2 |@@@ 6171
16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 29472
32 |@@@ 3418
2048 | 0
4096 | 1
and we contribute it back
More interested?
• Kafka Summit SF 2017

• One Day, One Data Hub, 100
Billion Messages: Kafka at
LINE

• https://youtu.be/
X1zwbmLYPZg

• Google “kafka summit line”
Summary
• Large scale + high reliability = difficult and exciting
Engineering!

• LINE’s architecture will be keep evolving with OSSs

• … and there are more challenges

• Multi-IDC deployment

• more and more performance and reliability
improvements
End of presentation.
Any questions?

More Related Content

What's hot

Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafka
confluent
 

What's hot (20)

SOAP Monitoring
SOAP MonitoringSOAP Monitoring
SOAP Monitoring
 
Introducción a Stream Processing utilizando Kafka Streams
Introducción a Stream Processing utilizando Kafka StreamsIntroducción a Stream Processing utilizando Kafka Streams
Introducción a Stream Processing utilizando Kafka Streams
 
Web Analytics using Kafka - August talk w/ Women Who Code
Web Analytics using Kafka - August talk w/ Women Who CodeWeb Analytics using Kafka - August talk w/ Women Who Code
Web Analytics using Kafka - August talk w/ Women Who Code
 
How did we move the mountain? - Migrating 1 trillion+ messages per day across...
How did we move the mountain? - Migrating 1 trillion+ messages per day across...How did we move the mountain? - Migrating 1 trillion+ messages per day across...
How did we move the mountain? - Migrating 1 trillion+ messages per day across...
 
Delivering: from Kafka to WebSockets | Adam Warski, SoftwareMill
Delivering: from Kafka to WebSockets | Adam Warski, SoftwareMillDelivering: from Kafka to WebSockets | Adam Warski, SoftwareMill
Delivering: from Kafka to WebSockets | Adam Warski, SoftwareMill
 
Migrating applications to serverless Apache Kafka + KSQL
Migrating applications to serverless Apache Kafka + KSQLMigrating applications to serverless Apache Kafka + KSQL
Migrating applications to serverless Apache Kafka + KSQL
 
Organic Growth and A Good Night Sleep: Effective Kafka Operations at Pinteres...
Organic Growth and A Good Night Sleep: Effective Kafka Operations at Pinteres...Organic Growth and A Good Night Sleep: Effective Kafka Operations at Pinteres...
Organic Growth and A Good Night Sleep: Effective Kafka Operations at Pinteres...
 
Autonomous workload rebalancing in kafka
Autonomous workload rebalancing in kafkaAutonomous workload rebalancing in kafka
Autonomous workload rebalancing in kafka
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
 
ONAP on Vagrant
ONAP on VagrantONAP on Vagrant
ONAP on Vagrant
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
 
Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafka
 
Tale of two streaming frameworks- Apace Storm & Apache Flink
Tale of two streaming frameworks- Apace Storm & Apache FlinkTale of two streaming frameworks- Apace Storm & Apache Flink
Tale of two streaming frameworks- Apace Storm & Apache Flink
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scale
 
[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7
 
GraphQL - A love story
GraphQL -  A love storyGraphQL -  A love story
GraphQL - A love story
 
Microsoft Azure and Windows Application monitoring
Microsoft Azure and Windows Application monitoringMicrosoft Azure and Windows Application monitoring
Microsoft Azure and Windows Application monitoring
 
Kafka connect
Kafka connectKafka connect
Kafka connect
 
4. introduction to Asp.Net MVC - Part II
4. introduction to Asp.Net MVC - Part II4. introduction to Asp.Net MVC - Part II
4. introduction to Asp.Net MVC - Part II
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
 

Similar to Building a company-wide data pipeline on Apache Kafka - engineering for 150 billion messages per day

LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...
kawamuray
 
NoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackNoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePack
Sadayuki Furuhashi
 
NoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePackNoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePack
Sadayuki Furuhashi
 

Similar to Building a company-wide data pipeline on Apache Kafka - engineering for 150 billion messages per day (20)

Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
 
LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
 
NoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackNoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePack
 
NoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePackNoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePack
 
Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Keystone - ApacheCon 2016
Keystone - ApacheCon 2016
 
From a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised LandFrom a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised Land
 
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
 
From a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePerson
 
Ruslan Belkin And Sean Dawson on LinkedIn's Network Updates Uncovered
Ruslan Belkin And Sean Dawson on LinkedIn's Network Updates UncoveredRuslan Belkin And Sean Dawson on LinkedIn's Network Updates Uncovered
Ruslan Belkin And Sean Dawson on LinkedIn's Network Updates Uncovered
 
Liveperson DLD 2015
Liveperson DLD 2015 Liveperson DLD 2015
Liveperson DLD 2015
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paas
 
Kinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-diveKinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-dive
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
 
10135 b 11
10135 b 1110135 b 11
10135 b 11
 
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
 
Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azure
 
You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard
You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard
You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015
 

More from LINE Corporation

More from LINE Corporation (20)

JJUG CCC 2018 Fall 懇親会LT
JJUG CCC 2018 Fall 懇親会LTJJUG CCC 2018 Fall 懇親会LT
JJUG CCC 2018 Fall 懇親会LT
 
Reduce dependency on Rx with Kotlin Coroutines
Reduce dependency on Rx with Kotlin CoroutinesReduce dependency on Rx with Kotlin Coroutines
Reduce dependency on Rx with Kotlin Coroutines
 
Kotlin/NativeでAndroidのNativeメソッドを実装してみた
Kotlin/NativeでAndroidのNativeメソッドを実装してみたKotlin/NativeでAndroidのNativeメソッドを実装してみた
Kotlin/NativeでAndroidのNativeメソッドを実装してみた
 
Use Kotlin scripts and Clova SDK to build your Clova extension
Use Kotlin scripts and Clova SDK to build your Clova extensionUse Kotlin scripts and Clova SDK to build your Clova extension
Use Kotlin scripts and Clova SDK to build your Clova extension
 
The Magic of LINE 購物 Testing
The Magic of LINE 購物 TestingThe Magic of LINE 購物 Testing
The Magic of LINE 購物 Testing
 
GA Test Automation
GA Test AutomationGA Test Automation
GA Test Automation
 
UI Automation Test with JUnit5
UI Automation Test with JUnit5UI Automation Test with JUnit5
UI Automation Test with JUnit5
 
Feature Detection for UI Testing
Feature Detection for UI TestingFeature Detection for UI Testing
Feature Detection for UI Testing
 
LINE 新星計劃介紹與新創團隊分享
LINE 新星計劃介紹與新創團隊分享LINE 新星計劃介紹與新創團隊分享
LINE 新星計劃介紹與新創團隊分享
 
​LINE 技術合作夥伴與應用分享
​LINE 技術合作夥伴與應用分享​LINE 技術合作夥伴與應用分享
​LINE 技術合作夥伴與應用分享
 
LINE 開發者社群經營與技術推廣
LINE 開發者社群經營與技術推廣LINE 開發者社群經營與技術推廣
LINE 開發者社群經營與技術推廣
 
日本開發者大會短講分享
日本開發者大會短講分享日本開發者大會短講分享
日本開發者大會短講分享
 
LINE Chatbot - 活動報名報到設計分享
LINE Chatbot - 活動報名報到設計分享LINE Chatbot - 活動報名報到設計分享
LINE Chatbot - 活動報名報到設計分享
 
在 LINE 私有雲中使用 Managed Kubernetes
在 LINE 私有雲中使用 Managed Kubernetes在 LINE 私有雲中使用 Managed Kubernetes
在 LINE 私有雲中使用 Managed Kubernetes
 
LINE TODAY高效率的敏捷測試開發技巧
LINE TODAY高效率的敏捷測試開發技巧LINE TODAY高效率的敏捷測試開發技巧
LINE TODAY高效率的敏捷測試開發技巧
 
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
 
LINE Things - LINE IoT平台新技術分享
LINE Things - LINE IoT平台新技術分享LINE Things - LINE IoT平台新技術分享
LINE Things - LINE IoT平台新技術分享
 
LINE Pay - 一卡通支付新體驗
LINE Pay - 一卡通支付新體驗LINE Pay - 一卡通支付新體驗
LINE Pay - 一卡通支付新體驗
 
LINE Platform API Update - 打造一個更好的Chatbot服務
LINE Platform API Update - 打造一個更好的Chatbot服務LINE Platform API Update - 打造一個更好的Chatbot服務
LINE Platform API Update - 打造一個更好的Chatbot服務
 
Keynote - ​LINE 的技術策略佈局與跨國產品開發
Keynote - ​LINE 的技術策略佈局與跨國產品開發Keynote - ​LINE 的技術策略佈局與跨國產品開發
Keynote - ​LINE 的技術策略佈局與跨國產品開發
 

Recently uploaded

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
UXDXConf
 

Recently uploaded (20)

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
THE BEST IPTV in GERMANY for 2024: IPTVreel
THE BEST IPTV in  GERMANY for 2024: IPTVreelTHE BEST IPTV in  GERMANY for 2024: IPTVreel
THE BEST IPTV in GERMANY for 2024: IPTVreel
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering Teams
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
The UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, OcadoThe UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, Ocado
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 

Building a company-wide data pipeline on Apache Kafka - engineering for 150 billion messages per day

  • 1. Building a company-wide data pipeline upon Apache Kafka - engineering for 150 billion messages per day Yuto Kawamura LINE Corp
  • 2. Speaker introduction • Yuto Kawamura • Senior software engineer of LINE server development • Work at Tokyo office • Apache Kafka contributor • Joined: Apr, 2015 (about 3 years)
  • 3. About LINE •Messaging service •Over 200 million global monthly active users 1 in countries with top market share like Japan, Taiwan and Thailand
 •Many family services •News •Music •LIVE (Video streaming) 
 1 As of June 2017. Sum of 4 countries: Japan, Taiwan, Thailand and Indonesia. 

  • 4. Agenda • Introducing LINE server • Data pipeline w/ Apache Kafka
  • 5. LINE Server Engineering is about … • Scalability • Many users, many requests, many data • Reliability • LINE already is a communication infra in countries

  • 6. Scale metrics: message delivery LINE Server 25 billion /day (API call: 80 billion / day)
  • 7. Scale metric: Accumulated data (for analysis) 40PB
  • 8. Messaging System Architecture Overview LINE Apps LEGY JP LEGY DE LEGY SG Thrift RPC/HTTP talk-server Distributed Data Store Distributed async task processing
  • 9. LEGY • LINE Event Delivery Gateway • API Gateway/Reverse Proxy • Written in Erlang • Features focused on needs of implementing a messaging service • e.g, Zero latency code hot swapping w/o closing client connections
  • 10. talk-server • Java based web application server • Implements most of messaging functionality + some other features • Java8 + Spring + Thrift RPC + Tomcat8
  • 11. Datastore with Redis and HBase • LINE’s hybrid datastore = Redis(in-memory DB, home- brew clustering) + HBase(persistent distributed key-value store) • Cascading failure handling • Async write from background task processor • Data correction batch Primary/ Backup talk-server Cache/ Primary Dual write
  • 12. Message Delivery LEGY LEGY talk-server Storage 1. Find nearest LEGY 2. sendMessage(“Bob”, “Hello!”) 3. Proxy request 4. Write to storage talk-server X. fetchOps() 6. Proxy request 7. Read message 8. Return fetchOps() with message 5. Find LEGY Bob is connecting, Notify message arrival Alice Bob
  • 13. There’re a lot of internal communication processing user’s request talk-server Threat detection system Timeline Server Data Analysis Background Task processing Request
  • 14. Communication between internal systems • Communication for querying, transactional updates: • Query authentication/permission • Synchronous updates • Communication for data synchronization, update notification: • Notify user’s relationship update • Synchronize data update with another service talk-server Auth Analytics Another Service HTTP/REST/RPC
  • 15. Apache Kafka • A distributed streaming platform • (narrow sense) A distributed persistent message queue which supports Pub-Sub model • Built-in load distribution • Built-in fail-over on both server(broker) and client
  • 16. How it works Producer Brokers Consumer Topic Topic Consumer Consumer Producer AuthEvent event = AuthEvent.newBuilder() .setUserId(123) .setEventType(AuthEventType.REGISTER) .build(); producer.send(new ProducerRecord(“events", userId, event)); consumer = new KafkaConsumer("group.id" -> "group-A"); consumer.subscribe("events"); consumer.poll(100)… // => Record(key=123, value=...)
  • 17. Consumer GroupA Pub-Sub Brokers Consumer Topic Topic Consumer Consumer GroupB Consumer Consumer Records[A, B, C…] Records[A, B, C…] • Multiple consumer “groups” can independently consume a single topic
  • 19. Scale metric: Events produced into Kafka Service Service Service Service Service Service 150 billion msgs / day (3 million msgs / sec)
  • 20. our Kafka needs to be high- performant • Usages sensitive for delivery latency • Broker’s latency impact throughput as well • because Kafka topic is queue
  • 21. … wasn’t a built-in property • KAFKA-4614 Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex • 99th %ile latency of Produce request: 150 ~ 200ms => 10ms (x15 ~ x20 faster) • KAFKA-6051 ReplicaFetcherThread should close the ReplicaFetcherBlockingSend earlier on shutdown • Eliminated ~x1000 slower response during restarting broker • (unpublished yet) Kafka broker performance degradation when consumer requests to fetch old data • x10 ~ x15 speedup for 99th %ile response
  • 22. Performance Engineering Kafka • Application Level: • Read and understand code • Patch it to eliminate bottleneck • JVM Level: • JVM profiling • GC log analysis • JVM parameters tuning • OS Level: • Linux perf • Delay Accounting • SystemTap
  • 23. e.g, Investigating slow sendfile(2) • SystemTap: A kernel dynamic tracing tool • Inject script to probe in-kernel behavior stap —e ' ... probe syscall.sendfile { d[tid()] = gettimeofday_us() } probe syscall.sendfile.return { if (d[tid()]) { st <<< gettimeofday_us() - d[tid()] delete d[tid()] } } probe end { print(@hist_log(st)) } '
  • 24. e.g, Investigating slow sendfile(2) • Found that slow sendfile is blocking Kafka’s event-loop • => patch Kafka to eliminate blocking sendfile stap -e ‘…’ value |---------------------------------------- count 0 | 0 1 | 71 2 |@@@ 6171 16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 29472 32 |@@@ 3418 2048 | 0 4096 | 1
  • 25. and we contribute it back
  • 26. More interested? • Kafka Summit SF 2017 • One Day, One Data Hub, 100 Billion Messages: Kafka at LINE • https://youtu.be/ X1zwbmLYPZg • Google “kafka summit line”
  • 27. Summary • Large scale + high reliability = difficult and exciting Engineering! • LINE’s architecture will be keep evolving with OSSs • … and there are more challenges • Multi-IDC deployment • more and more performance and reliability improvements