How LINE Ads Platform Improved the Throughput of an I/O-Intensive Kafka Consumer Application

HARUKI OKADA
About Me
● Haruki Okada (Okada Haruki)
● @ocadaruma
● - 2017/09: previous job
● 2017/10 - : joined LINE
● Working on LINE Ads Platform
LINE family services: Timeline, LINE NEWS, LINE Manga, LINE BLOG
LINE Ads Platform
● LINE Ads Platform relies on the LINE DMP (Data Management Platform)
● ML models such as CTR prediction are used for ad delivery

LINE DMP
● LINE DMP includes a number of Kafka consumer applications
● This talk focuses on the Mobile App Segment application
● Data flow:
● The SDK embedded in mobile apps sends postback events
● A Kafka consumer application processes the postback events
● The processed events are stored as Mobile App Segments
Mobile App Segment
● Single Event Processing
● Each matched postback event is written to HBase / Redis
● => I/O Intensive workload
Mobile App Segment Worker
Initial Implementation
● Using Kafka Streams DSL
KStreamBuilder builder = new KStreamBuilder();
builder
    .stream(keySerde, valSerde, "postback-event")
    .foreach((key, value) -> {
        if (matches(value)) {
            // write to storage synchronously
            writeToStorage(createSegmentData(value));
        }
    });
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
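
The matches, createSegmentData, and writeToStorage helpers are not shown in the slides. A minimal sketch of the synchronous write path, assuming HBase as the backing store (the table name, column layout, and SegmentData shape are hypothetical, not the actual LINE schema):

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical value object; the actual segment schema is not in the slides.
record SegmentData(String userId, String segmentId, byte[] payload) {}

class SegmentStore {
    private final Connection connection; // long-lived HBase connection

    SegmentStore(Connection connection) {
        this.connection = connection;
    }

    // Blocks the calling (stream) thread until HBase acknowledges the write.
    void writeToStorage(SegmentData data) {
        try (Table table = connection.getTable(TableName.valueOf("app_segment"))) {
            Put put = new Put(Bytes.toBytes(data.userId()));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes(data.segmentId()), data.payload());
            table.put(put); // synchronous RPC: its latency caps per-partition throughput
        } catch (IOException e) {
            throw new RuntimeException("HBase write failed", e);
        }
    }
}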
Throughput Issue
KStreamBuilder builder = new KStreamBuilder();
builder
    .stream(keySerde, valSerde, "postback-event")
    .foreach((key, value) -> {
        if (matches(value)) {
            // write to storage synchronously
            writeToStorage(createSegmentData(value));
        }
    });
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
● Throughput is capped by the HBase write latency
● The stream thread is blocked for the full duration of every synchronous write
Async Write ?
KStreamBuilder builder = new KStreamBuilder();
builder
    .stream(keySerde, valSerde, "postback-event")
    .foreach((key, value) -> {
        if (matches(value)) {
            writeToStorageAsync(createSegmentData(value))
                .whenComplete((result, err) -> {
                    // error handling etc
                });
        }
    });
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
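
writeToStorageAsync is likewise not shown in the slides. One plausible implementation, assuming HBase 2.x's asynchronous client and reusing the hypothetical SegmentData above, returns the CompletableFuture that the foreach callback chains on:

import java.util.concurrent.CompletableFuture;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.AsyncConnection;
import org.apache.hadoop.hbase.client.AsyncTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

class AsyncSegmentStore {
    private final AsyncTable<?> table;

    AsyncSegmentStore(AsyncConnection connection) {
        // hypothetical table name, as before
        this.table = connection.getTable(TableName.valueOf("app_segment"));
    }

    // Returns immediately; the future completes when HBase acknowledges the write.
    CompletableFuture<Void> writeToStorageAsync(SegmentData data) {
        Put put = new Put(Bytes.toBytes(data.userId()));
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes(data.segmentId()), data.payload());
        return table.put(put);
    }
}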
Problem
● Data loss possibility (timeline below)
● 1. Messages at offsets [3,4,5] are consumed and async writes are issued
● 2. Kafka Streams commits the offsets before the HBase writes finish
● 3. If the write for the offset=4 record fails => Data loss
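
The hazard spelled out as a timeline (the interleaving is illustrative):

// t0: poll() returns records at offsets 3, 4, 5
// t1: foreach() fires async writes for all three and returns immediately
// t2: Kafka Streams' commit interval elapses -> offset 6 is committed
// t3: writes for offsets 3 and 5 succeed; the write for offset 4 fails
// t4: the worker restarts and resumes from the committed offset 6
//     -> the offset=4 record is never reprocessed: data loss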
Increase Partition & Stream Threads ?
● The num.stream.threads config increases the number of Stream Threads
● Each Stream Thread consumes a subset of the topic's partitions
● Max concurrency = num partitions
● ex) With a storage write latency of 5 ms and a target throughput of 100K events/sec:
● each partition can handle only 200 events/sec
● => 500 partitions would be needed (see the arithmetic below)
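
The arithmetic behind those numbers, assuming one synchronous 5 ms write at a time per partition:

// per-partition throughput: 1 write / 5 ms = 1000 ms / 5 ms = 200 events/sec
// partitions needed for 100K events/sec: 100,000 / 200 = 500 partitions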
Downsides
● More partitions means more load on producers and brokers
● LINE Ads Platform uses the shared multi-tenant IMF Kafka cluster, so adding hundreds of partitions just for this consumer is not acceptable
https://www.slideshare.net/linecorp/multitenancy-kafka-cluster-for-line-services-with-250-billion-daily-messages
※IMF Kafka: LINE's company-wide Kafka cluster handling 250 billion messages / day
Solution
● Stop letting offsets be committed automatically just because they were consumed, as Kafka Streams does; control the commit ourselves
Solution
● Track every offset that has been consumed
● Mark an offset as complete when its asynchronous write finishes
● The offset below which all offsets are complete is the high
watermark (sketch below)
● The consume loop commits only up to the high watermark
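
A minimal sketch of such an offset tracker for one partition (names are illustrative, not Decaton's actual API; completion callbacks may arrive from other threads, hence the concurrent map):

import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Tracks consumed offsets and computes the high watermark: the offset up to
// which every record has completed, i.e. the offset that is safe to commit.
class OffsetTracker {
    // offset -> completed? (sorted so we can scan from the smallest offset)
    private final ConcurrentSkipListMap<Long, Boolean> offsets = new ConcurrentSkipListMap<>();

    // called from the consume loop when a record is dispatched
    void reportConsumed(long offset) {
        offsets.put(offset, false);
    }

    // called from the async-write completion callback (possibly another thread)
    void reportCompleted(long offset) {
        offsets.put(offset, true);
    }

    // called from the consume loop; returns the offset to commit (exclusive),
    // or -1 if nothing new is committable yet
    long highWatermark() {
        long last = -1;
        for (Map.Entry<Long, Boolean> e : offsets.entrySet()) {
            if (!e.getValue()) {
                break; // the first incomplete offset blocks the watermark
            }
            last = e.getKey();
            offsets.remove(e.getKey()); // committed entries need no further tracking
        }
        return last < 0 ? -1 : last + 1; // Kafka commits the next offset to be consumed
    }
}

The consume loop passes the returned value to KafkaConsumer#commitSync or #commitAsync as the OffsetAndMetadata for that partition.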
Solution
● Bound the number of consumed-but-incomplete offsets so pending records cannot pile up without limit
● Use Consumer#pause to stop fetching while the backlog is large (sketch below)
● ↑ These mechanisms are what Decaton provides
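
A minimal sketch of the resulting consume loop with backpressure, reusing the OffsetTracker above and the slides' matches / createSegmentData / writeToStorageAsync helpers; MAX_PENDING and the single-partition simplification are assumptions for brevity:

import java.time.Duration;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

static final int MAX_PENDING = 1000; // hypothetical bound on in-flight writes

static void runLoop(KafkaConsumer<byte[], byte[]> consumer,
                    TopicPartition partition, OffsetTracker tracker) {
    AtomicInteger pending = new AtomicInteger();
    while (true) {
        for (ConsumerRecord<byte[], byte[]> record : consumer.poll(Duration.ofMillis(100))) {
            tracker.reportConsumed(record.offset());
            if (!matches(record.value())) {
                tracker.reportCompleted(record.offset()); // nothing to write
                continue;
            }
            pending.incrementAndGet();
            writeToStorageAsync(createSegmentData(record.value()))
                .whenComplete((result, err) -> {
                    // a real implementation would retry on err before reporting
                    tracker.reportCompleted(record.offset());
                    pending.decrementAndGet();
                });
        }
        // backpressure: stop fetching while too many writes are in flight
        if (pending.get() > MAX_PENDING) {
            consumer.pause(consumer.assignment());
        } else {
            consumer.resume(consumer.paused());
        }
        // commit only up to the high watermark; completions may be out of order
        long hw = tracker.highWatermark();
        if (hw >= 0) {
            consumer.commitAsync(Map.of(partition, new OffsetAndMetadata(hw)), null);
        }
    }
}

Note that poll() keeps being called even while the partitions are paused, which keeps the consumer's heartbeat alive without fetching more records.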
Conclusion
● LINE Ads Platform operates a number of Kafka consumer applications
● Decoupling the I/O from the consume loop is how we improved the throughput of this I/O-Intensive consumer

