How LINE Ads Platform Improved the Throughput of an I/O-Intensive Kafka Consumer Application

HARUKI OKADA
About Me
● Haruki Okada (Okada Haruki)
● @ocadaruma
● - 2017/09: previous job
● 2017/10 - : joined LINE
● Working on LINE Ads Platform
LINE family services: Timeline, LINE NEWS, LINE Manga, LINE BLOG
LINE Ads Platform
● LINE Ads Platform relies on the LINE DMP (Data Management Platform)
● ML models such as CTR prediction are used for ad delivery

LINE DMP
● LINE DMP includes a number of Kafka consumer applications
● This talk focuses on the Mobile App Segment application
● Data flow:
● The SDK embedded in mobile apps sends postback events
● A Kafka consumer application processes the postback events
● The processed events are stored as Mobile App Segments
Mobile App Segment
● Single Event Processing
● Each matched postback event is written to HBase / Redis
● => I/O Intensive workload
Mobile App Segment Worker
Initial Implementation
● Using Kafka Streams DSL
KStreamBuilder builder = new KStreamBuilder();
builder
    .stream(keySerde, valSerde, "postback-event")
    .foreach((key, value) -> {
        if (matches(value)) {
            // write to storage synchronously
            writeToStorage(createSegmentData(value));
        }
    });
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
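
The matches, createSegmentData, and writeToStorage helpers are not shown in the slides. A minimal sketch of the synchronous write path, assuming HBase as the backing store (the table name, column layout, and SegmentData shape are hypothetical, not the actual LINE schema):

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical value object; the actual segment schema is not in the slides.
record SegmentData(String userId, String segmentId, byte[] payload) {}

class SegmentStore {
    private final Connection connection; // long-lived HBase connection

    SegmentStore(Connection connection) {
        this.connection = connection;
    }

    // Blocks the calling (stream) thread until HBase acknowledges the write.
    void writeToStorage(SegmentData data) {
        try (Table table = connection.getTable(TableName.valueOf("app_segment"))) {
            Put put = new Put(Bytes.toBytes(data.userId()));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes(data.segmentId()), data.payload());
            table.put(put); // synchronous RPC: its latency caps per-partition throughput
        } catch (IOException e) {
            throw new RuntimeException("HBase write failed", e);
        }
    }
}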
Throughput Issue
KStreamBuilder builder = new KStreamBuilder();
builder
    .stream(keySerde, valSerde, "postback-event")
    .foreach((key, value) -> {
        if (matches(value)) {
            // write to storage synchronously
            writeToStorage(createSegmentData(value));
        }
    });
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
● Throughput is capped by the HBase write latency
● The stream thread is blocked for the full duration of every synchronous write
Async Write ?
KStreamBuilder builder = new KStreamBuilder();
builder
    .stream(keySerde, valSerde, "postback-event")
    .foreach((key, value) -> {
        if (matches(value)) {
            writeToStorageAsync(createSegmentData(value))
                .whenComplete((result, err) -> {
                    // error handling etc
                });
        }
    });
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
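
writeToStorageAsync is likewise not shown in the slides. One plausible implementation, assuming HBase 2.x's asynchronous client and reusing the hypothetical SegmentData above, returns the CompletableFuture that the foreach callback chains on:

import java.util.concurrent.CompletableFuture;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.AsyncConnection;
import org.apache.hadoop.hbase.client.AsyncTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

class AsyncSegmentStore {
    private final AsyncTable<?> table;

    AsyncSegmentStore(AsyncConnection connection) {
        // hypothetical table name, as before
        this.table = connection.getTable(TableName.valueOf("app_segment"));
    }

    // Returns immediately; the future completes when HBase acknowledges the write.
    CompletableFuture<Void> writeToStorageAsync(SegmentData data) {
        Put put = new Put(Bytes.toBytes(data.userId()));
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes(data.segmentId()), data.payload());
        return table.put(put);
    }
}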
Problem
● Data loss possibility (timeline below)
● 1. Messages at offsets [3,4,5] are consumed and async writes are issued
● 2. Kafka Streams commits the offsets before the HBase writes finish
● 3. If the write for the offset=4 record fails => Data loss
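
The hazard spelled out as a timeline (the interleaving is illustrative):

// t0: poll() returns records at offsets 3, 4, 5
// t1: foreach() fires async writes for all three and returns immediately
// t2: Kafka Streams' commit interval elapses -> offset 6 is committed
// t3: writes for offsets 3 and 5 succeed; the write for offset 4 fails
// t4: the worker restarts and resumes from the committed offset 6
//     -> the offset=4 record is never reprocessed: data loss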
Increase Partition & Stream Threads ?
● The num.stream.threads config increases the number of Stream Threads
● Each Stream Thread consumes a subset of the topic's partitions
● Max concurrency = num partitions
● ex) With a storage write latency of 5 ms and a target throughput of 100K events/sec:
● each partition can handle only 200 events/sec
● => 500 partitions would be needed (see the arithmetic below)
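
The arithmetic behind those numbers, assuming one synchronous 5 ms write at a time per partition:

// per-partition throughput: 1 write / 5 ms = 1000 ms / 5 ms = 200 events/sec
// partitions needed for 100K events/sec: 100,000 / 200 = 500 partitions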
Downsides
● More partitions means more load on producers and brokers
● LINE Ads Platform uses the shared multi-tenant IMF Kafka cluster, so adding hundreds of partitions just for this consumer is not acceptable
https://www.slideshare.net/linecorp/multitenancy-kafka-cluster-for-line-services-with-250-billion-daily-messages
※IMF Kafka: LINE's company-wide Kafka cluster handling 250 billion messages / day
Solution
● Stop letting offsets be committed automatically just because they were consumed, as Kafka Streams does; control the commit ourselves
Solution
● Track every offset that has been consumed
● Mark an offset as complete when its asynchronous write finishes
● The offset below which all offsets are complete is the high
watermark (sketch below)
● The consume loop commits only up to the high watermark
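
A minimal sketch of such an offset tracker for one partition (names are illustrative, not Decaton's actual API; completion callbacks may arrive from other threads, hence the concurrent map):

import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Tracks consumed offsets and computes the high watermark: the offset up to
// which every record has completed, i.e. the offset that is safe to commit.
class OffsetTracker {
    // offset -> completed? (sorted so we can scan from the smallest offset)
    private final ConcurrentSkipListMap<Long, Boolean> offsets = new ConcurrentSkipListMap<>();

    // called from the consume loop when a record is dispatched
    void reportConsumed(long offset) {
        offsets.put(offset, false);
    }

    // called from the async-write completion callback (possibly another thread)
    void reportCompleted(long offset) {
        offsets.put(offset, true);
    }

    // called from the consume loop; returns the offset to commit (exclusive),
    // or -1 if nothing new is committable yet
    long highWatermark() {
        long last = -1;
        for (Map.Entry<Long, Boolean> e : offsets.entrySet()) {
            if (!e.getValue()) {
                break; // the first incomplete offset blocks the watermark
            }
            last = e.getKey();
            offsets.remove(e.getKey()); // committed entries need no further tracking
        }
        return last < 0 ? -1 : last + 1; // Kafka commits the next offset to be consumed
    }
}

The consume loop passes the returned value to KafkaConsumer#commitSync or #commitAsync as the OffsetAndMetadata for that partition.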
Solution
● Bound the number of consumed-but-incomplete offsets so pending records cannot pile up without limit
● Use Consumer#pause to stop fetching while the backlog is large (sketch below)
● ↑ These mechanisms are what Decaton provides
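
A minimal sketch of the resulting consume loop with backpressure, reusing the OffsetTracker above and the slides' matches / createSegmentData / writeToStorageAsync helpers; MAX_PENDING and the single-partition simplification are assumptions for brevity:

import java.time.Duration;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

static final int MAX_PENDING = 1000; // hypothetical bound on in-flight writes

static void runLoop(KafkaConsumer<byte[], byte[]> consumer,
                    TopicPartition partition, OffsetTracker tracker) {
    AtomicInteger pending = new AtomicInteger();
    while (true) {
        for (ConsumerRecord<byte[], byte[]> record : consumer.poll(Duration.ofMillis(100))) {
            tracker.reportConsumed(record.offset());
            if (!matches(record.value())) {
                tracker.reportCompleted(record.offset()); // nothing to write
                continue;
            }
            pending.incrementAndGet();
            writeToStorageAsync(createSegmentData(record.value()))
                .whenComplete((result, err) -> {
                    // a real implementation would retry on err before reporting
                    tracker.reportCompleted(record.offset());
                    pending.decrementAndGet();
                });
        }
        // backpressure: stop fetching while too many writes are in flight
        if (pending.get() > MAX_PENDING) {
            consumer.pause(consumer.assignment());
        } else {
            consumer.resume(consumer.paused());
        }
        // commit only up to the high watermark; completions may be out of order
        long hw = tracker.highWatermark();
        if (hw >= 0) {
            consumer.commitAsync(Map.of(partition, new OffsetAndMetadata(hw)), null);
        }
    }
}

Note that poll() keeps being called even while the partitions are paused, which keeps the consumer's heartbeat alive without fetching more records.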
Conclusion
● LINE Ads Platform operates a number of Kafka consumer applications
● Decoupling the I/O from the consume loop is how we improved the throughput of this I/O-Intensive consumer

