MULTI-TENANCY KAFKA CLUSTER FOR LINE SERVICES WITH 250 BILLION DAILY MESSAGES
SPEAKER
● Yuto Kawamura
● Senior Software Engineer
● Lead of the team providing the company-wide Kafka platform
● Active in Apache Kafka Community
● Code contribution
● Speaking at Kafka Summit
AGENDA
● Multitenancy company-wide Kafka platform
APACHE KAFKA
● Middleware for streaming data
● Highly scalable/available
● Data persistency
● Supports Pub-Sub model
KAFKA COMPONENTS
[Diagram: producers, broker cluster, consumers]
KAFKA PLATFORM
● Large-scale Kafka clusters provided for any system or service inside LINE
● Started internally within the messaging server development team
● Expanded to a company-wide platform
USAGE
● Two kinds of usage
● Distributed task queue for buffering and processing business logic asynchronously
● e.g., queueing heavy tasks from a web app server to a background task processor
● "Data Hub" for distributing data to other services
PUB-SUB EXAMPLE
SINGLE CLUSTER SHARED BY MANY SYSTEMS
● Concept of “Data Hub”
● Easy to find and access data
● Operational and management efficiency
● Operational cost does not grow in proportion to the number of users
● Concentrate engineering resources on maximizing reliability/performance
MULTITENANCY
[Diagram: systems sharing the cluster, including Blockchain Platform, Data Analysis System, Timeline]
SCALE
● 250 billion daily messages
● 210TB daily inflow
● 5 million / second
● 4GB / second
● 50+ systems
MULTITENANCY REQUIREMENTS
● Cluster can protect itself against abusive workloads
● An accidental workload doesn't propagate to other users
● We can track which client is sending requests
● Find the source of strange requests
● Certain level of isolation among client workloads
● A slow response for one client doesn't affect another client
PROTECT CLUSTER AGAINST ABUSE
● Managing the number of requests matters more than managing the incoming/outgoing byte rate
● Kafka is remarkably tolerant of large data volumes
● Good design leveraging system functions
● Page cache for caching data
● sendfile(2) for zero-copy data transfer
● Native batching
● The typical danger is clients sending many requests
REQUEST QUOTA
● Use Request Quota
● Limit “Time of broker threads that can be used by each client group”
● Set default quota
● Prevent a single client from accidentally consuming all broker resources
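Kafka's request quota is expressed as a share of broker thread time per client group. The accounting idea can be sketched roughly as below. This is a simplified illustration, not the broker's actual implementation: the class `ThreadTimeQuota`, the single fixed window, and the proportional delay rule are all assumptions made for the example.

```python
from collections import defaultdict


class ThreadTimeQuota:
    """Toy accounting of broker thread time per client group over one window.

    Hypothetical helper for illustration only; real brokers use sampled,
    rolling windows and per-entity configuration.
    """

    def __init__(self, quota_percent, window_ms=1000.0):
        self.quota_percent = quota_percent  # allowed share of thread time
        self.window_ms = window_ms          # length of the accounting window
        self.used_ms = defaultdict(float)   # thread time consumed per group

    def record(self, client_group, thread_time_ms):
        """Add thread time for a group and return a throttle delay in ms."""
        self.used_ms[client_group] += thread_time_ms
        used_percent = 100.0 * self.used_ms[client_group] / self.window_ms
        if used_percent <= self.quota_percent:
            return 0.0
        # Delay in proportion to how far the group is over its quota.
        overshoot = used_percent - self.quota_percent
        return overshoot / self.quota_percent * self.window_ms
```

With a 50% quota and a 1000 ms window, a group that has used 600 ms of thread time is 10 points over quota and would be delayed by about 200 ms in this sketch, while other groups are unaffected.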
ISOLATION AMONG CLIENT WORKLOADS
● When can performance isolation among different clients be violated?
● Let’s look at an example of actual troubleshooting.
DETECTION
● 50x ~ 100x slower 99th percentile Produce response time
● Normal: ~20ms
● Observed: 50ms ~ 200ms
FINDING #1
● A certain amount of disk reads occurred at the same time
FINDING #2
● Network threads' utilization was very high
REQUEST HANDLING IN KAFKA BROKER
● Network Threads: read requests from and write responses to client sockets
● Request Handler Threads: process requests and produce response objects
REQUEST HANDLING - READ REQUEST
REQUEST HANDLING - PROCESS
REQUEST HANDLING - WRITE RESPONSE
NETWORK THREAD RUNS EVENT LOOP
● Multiplexes the client sockets assigned to it and processes them sequentially
● It never blocks awaiting IO completion
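The event-loop pattern described above can be sketched in Python with the standard `selectors` module. This is only an illustration of the pattern, not Kafka's network thread: the function names `run_event_loop_once` and `echo_upper` are hypothetical, and the handler assumes small payloads that fit in the socket buffer.

```python
import selectors
import socket


def run_event_loop_once(sel, timeout=0.1):
    """Handle every socket that is ready right now, one after another.

    Each handler must return quickly: while it runs, no other socket
    registered on this loop makes progress.
    """
    handled = 0
    for key, _events in sel.select(timeout):
        handler = key.data          # callback stored at registration time
        handler(key.fileobj)
        handled += 1
    return handled


def echo_upper(sock):
    """Example handler: read what is available and answer immediately."""
    data = sock.recv(4096)
    if data:
        # Small payload; a real loop would also handle partial writes.
        sock.send(data.upper())
```

A socket is attached with `sel.register(sock, selectors.EVENT_READ, echo_upper)` after `sock.setblocking(False)`; the loop then calls the handler whenever the socket becomes readable.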
WHEN NETWORK THREADS GET BUSY...
● It means one of the following:
● 1. Really busy doing lots of work: many requests/responses to read/write
● 2. Blocked by some operation (which in general should not happen in an event loop)
RESPONSE HANDLING - NORMAL REQUESTS
● When the response is in the queue, all data to be transferred is in memory
RESPONSE HANDLING - FETCH RESPONSE
● When the response is in the queue, topic data segments are not in userspace memory
● => Copy to client socket directly inside the kernel using the sendfile(2) system call
IF TARGET DATA DOES NOT EXIST IN PAGE CACHE
● Target data is in page cache:
● => Just a memory copy. Very fast: ~ 100us
● Target data is NOT in page cache:
● => Needs to load data from disk into page cache first. Can be slow: ~ 50ms (or even slower)
● => What if this happens in the event loop…?
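To see why a single slow call matters in a sequential loop, here is a small worked example. It assumes one thread sends queued responses strictly in order and uses the two latencies from the slide (~100us for a page cache hit, ~50ms for a miss); the function name `completion_times_ms` and the queue of five responses are made up for illustration.

```python
def completion_times_ms(service_times_ms):
    """Return when each queued response finishes if one thread sends them in order."""
    clock_ms = 0.0
    finished = []
    for service_ms in service_times_ms:
        clock_ms += service_ms       # later responses wait for earlier ones
        finished.append(clock_ms)
    return finished


CACHE_HIT_MS = 0.1    # ~100us, from the slide
CACHE_MISS_MS = 50.0  # ~50ms, from the slide

# Five responses, all served from the page cache.
all_hits = completion_times_ms([CACHE_HIT_MS] * 5)

# Same queue, but the first response has to read from disk.
one_miss = completion_times_ms([CACHE_MISS_MS] + [CACHE_HIT_MS] * 4)
```

With these numbers, the four cache-hit responses queued behind the single miss finish at roughly 50.1 ms to 50.4 ms instead of 0.2 ms to 0.5 ms, even though their own data was already in memory.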
SUSPECTING BLOCKING IN SENDFILE(2)
● Inspect the duration of sendfile system calls issued by the broker process
● How?
SYSTEMTAP
● A kernel-layer dynamic tracing tool and scripting language
● Safe to run in production because of low overhead
● Alternatively: DTrace, eBPF, etc…
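The talk measured sendfile(2) durations from the kernel side with SystemTap. As a rough user-space analogue of the same measurement, the sketch below times a single call from Python. The helper name `timed_sendfile` is hypothetical, `os.sendfile` is assumed to be available (Linux), and wall-clock timing from user space is far coarser than kernel tracing.

```python
import os
import time


def timed_sendfile(out_fd, in_fd, offset, count):
    """Call os.sendfile once and return (bytes_sent, elapsed_seconds).

    A cold read (data not in page cache) shows up as a much larger
    elapsed time than a warm one for the same byte range.
    """
    start = time.perf_counter()
    sent = os.sendfile(out_fd, in_fd, offset, count)
    return sent, time.perf_counter() - start
```

Comparing the elapsed time for the same range before and after the file has been read once gives a first impression of the hit/miss gap, though it cannot attribute time inside the kernel the way a tracing tool can.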
INSPECT SENDFILE(2) DURATION
PROBLEM HYPOTHESIS
SOLUTION
● Make sure that data is ready in memory before the response is passed to the network thread
● => Event loop never blocks
WARMUP PAGE CACHE
● Move the blocking part to request handler threads (= single queue and pool of threads)
WARMUP PAGE CACHE
● When the network thread calls sendfile(2) for transferring log data, it's always in page cache
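The hand-off can be sketched as follows: the potentially blocking warm-up runs on a pool of handler threads, and only afterwards is the response placed on the queue that the network thread consumes. The names `handle_fetch` and `process_fetches`, the pool size, and the use of Python's `ThreadPoolExecutor` are illustrative assumptions, not the structure of the actual Kafka patch.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def handle_fetch(request, warm_up, response_queue):
    """Runs on a request handler thread: do the slow part, then hand off."""
    warm_up(request)              # may block on disk; acceptable on this thread
    response_queue.put(request)   # the network thread picks this up later


def process_fetches(requests, warm_up, num_handlers=4):
    """Warm every request on a handler pool and return the filled response queue."""
    response_queue = queue.Queue()
    with ThreadPoolExecutor(max_workers=num_handlers) as pool:
        for request in requests:
            pool.submit(handle_fetch, request, warm_up, response_queue)
    # Leaving the "with" block waits for all submitted work to finish.
    return response_queue
```

Because the pool shares one queue of work, a slow disk read delays only the handler thread that took it, while the other handlers keep serving.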
WARMUP IMPLEMENTATION
● Easiest way: do a synchronous read(2) on the target data
● => Large overhead from copying memory from kernel to userland
● Why is Kafka using sendfile(2) for transferring topic data?
● => To avoid an expensive large memory copy
● How can we warm up the page cache while keeping this property?
TRICK - ZERO COPY PAGE LOAD
● Call sendfile(2) on the target data with /dev/null as the destination
● The /dev/null driver does not actually copy data anywhere
WHY DOES IT HAVE ALMOST NO OVERHEAD?
● The Linux kernel internally uses `splice` to implement sendfile(2)
● The `splice` implementation of /dev/null returns without iterating over the target data
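The trick can be tried from Python on Linux with `os.sendfile`. This is a minimal sketch of the idea, not the Kafka patch itself: the helper name `warm_page_cache` is hypothetical, and it assumes a Linux kernel that accepts a non-socket destination for sendfile(2).

```python
import os


def warm_page_cache(path, offset, length):
    """Ask the kernel to read [offset, offset + length) of `path` into page cache.

    The destination is /dev/null, so no data is copied to user space.
    Returns the number of bytes the kernel reported as transferred.
    """
    total = 0
    with open(path, "rb") as src, open(os.devnull, "wb") as sink:
        while total < length:
            sent = os.sendfile(sink.fileno(), src.fileno(),
                               offset + total, length - total)
            if sent == 0:          # reached end of file
                break
            total += sent
    return total
```

A request handler could call such a helper on the byte range of a Fetch response before enqueueing it, so that the later sendfile(2) to the client socket finds the pages already cached.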
PATCHING KAFKA
IT WORKED
● No response time degradation even when Fetch requests coincide with disk reads
WHY NOT CONTRIBUTE IT BACK?
● KAFKA-7504: Broker performance degradation caused by call of sendfile reading disk in network thread (x50 ~ x100 response time reduction)
● KAFKA-4614: Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex (x100 ~ response time reduction)
● KAFKA-6501: ReplicaFetcherThread should close the ReplicaFetcherBlockingSend earlier on shutdown (eliminates significant latency during broker restart)
CONCLUSION
● Introduced our engineering for operating the company-wide Kafka platform
● Request quota, SystemTap, and patches, all built on a deep understanding of the system
● After fixing some issues, our hosting policy is working well and efficiently, keeping:
● the concept of a single "Data Hub" and
● operational cost that is not proportional to the number of users/usages
● We are contributing to the world through OSS
… AND NEXT
● Kafka platform evolution
● Client standardization and management
● Higher availability
● SRE team
● Planning to roll out a new team for Reliability Engineering
● Share knowledge/tools that are independent of middleware
● Come and ask me more!
THANK YOU
