Do you ever feel that your stream processor gets in the way of expressing business requirements? Most processors are frameworks, which impose strong opinions on how apps are designed and implemented. Performing Complex Event Processing invariably leads to calling out to other technologies, but what if that integration didn't require an RPC call, or could be modeled into your stream itself? This talk will explore how to build rich-domain, low-latency, back-pressured, and stateful streaming applications that require very little infrastructure, using Akka Streams and the Alpakka Kafka connector.
We will explore how Alpakka Kafka maps to Kafka features in order to provide a comprehensive understanding of how to build a robust streaming platform. We'll explore transactional message delivery, defensive consumer group rebalancing, stateful stages, and state durability/persistence. Akka Streams is built on top of Akka, an asynchronous message-driven middleware toolkit that can be used to build Erlang-like Actor Systems in Java or Scala. It is used as a JVM library to facilitate common streaming semantics within an existing or standalone application. It's different from other stream processors in several ways. It natively supports back-pressure flow control inside a single JVM instance or across distributed systems to help prevent overloading downstream infrastructure. It's perfect for modeling Complex Event Processing with its easy integration into existing apps and Akka Actor systems. Also, unlike most acyclic stream processors, Akka Streams can support sophisticated pipelines, or Graphs, by allowing the user to model cycles (loops) when there's a need.
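For readers new to Akka Streams, here is a minimal back-pressured stream as a sketch (illustrative only; the rates and names are made up, and Akka 2.6+ is assumed so the ActorSystem provides the materializer):

import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.duration._

object BackPressureSketch extends App {
  implicit val system: ActorSystem = ActorSystem("example")

  // The throttled stage's demand propagates upstream, so the source
  // never emits faster than ten elements per second.
  Source(1 to 1000)
    .map(n => s"event-$n")
    .throttle(10, 1.second) // simulate a rate-limited downstream
    .runWith(Sink.foreach(println))
}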
Watch this talk here: https://www.confluent.io/online-talks/from-zero-to-hero-with-kafka-connect-on-demand
Integrating Apache Kafka® with other systems in a reliable and scalable way is often a key part of a streaming platform. Fortunately, Apache Kafka includes the Connect API that enables streaming integration both in and out of Kafka. Like any technology, understanding its architecture and deployment patterns is key to successful use, as is knowing where to go looking when things aren't working.
This talk will discuss the key design concepts within Apache Kafka Connect and the pros and cons of standalone vs distributed deployment modes. We'll do a live demo of building pipelines with Apache Kafka Connect for streaming data in from databases, and out to targets including Elasticsearch. With some gremlins along the way, we'll go hands-on in methodically diagnosing and resolving common issues encountered with Apache Kafka Connect. The talk will finish off by discussing more advanced topics including Single Message Transforms, and deployment of Apache Kafka Connect in containers.
ksqlDB: A Stream-Relational Database System - Confluent
Speaker: Matthias J. Sax, Software Engineer, Confluent
ksqlDB is a distributed event streaming database system that allows users to express SQL queries over relational tables and event streams. The project was released by Confluent in 2017, is hosted on GitHub, and is developed with an open-source spirit. ksqlDB is built on top of Apache Kafka®, a distributed event streaming platform. In this talk, we discuss ksqlDB's architecture, which is influenced by Apache Kafka and its stream processing library, Kafka Streams. We explain how ksqlDB executes continuous queries while achieving fault tolerance and high availability. Furthermore, we explore ksqlDB's streaming SQL dialect and the different types of supported queries.
Matthias J. Sax is a software engineer at Confluent working on ksqlDB. He mainly contributes to Kafka Streams, Apache Kafka's stream processing library, which serves as ksqlDB's execution engine. Furthermore, he helps evolve ksqlDB's "streaming SQL" language. In the past, Matthias also contributed to Apache Flink and Apache Storm and he is an Apache committer and PMC member. Matthias holds a Ph.D. from Humboldt University of Berlin, where he studied distributed data stream processing systems.
https://db.cs.cmu.edu/events/quarantine-db-talk-2020-confluent-ksqldb-a-stream-relational-database-system/
Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka - Kai Wähner
Spoilt for Choice – Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka:
Apache Kafka is a de facto standard streaming data processing platform and is widely deployed as an event streaming platform. Part of Kafka is its stream processing API, "Kafka Streams". In addition, the Kafka ecosystem now offers KSQL, a declarative, SQL-like stream processing language that lets you define powerful stream-processing applications easily. What once took some moderately sophisticated Java code can now be done at the command line with a familiar and eminently approachable syntax.
This session discusses and demos the pros and cons of Kafka Streams and KSQL to understand when to use which stream processing alternative for continuous stream processing natively on Apache Kafka infrastructures. The end of the session compares the trade-offs of Kafka Streams and KSQL to separate stream processing frameworks such as Apache Flink or Spark Streaming.
MySQL users commonly ask: Here's my table, what indexes do I need? Why aren't my indexes helping me? Don't indexes cause overhead? This talk gives you some practical answers, with a step-by-step method for finding the queries you need to optimize and choosing the best indexes for them.
Testing Kafka components with Kafka for JUnit - Markus Günther
Kafka for JUnit enables developers to start and stop a complete Kafka cluster comprised of Kafka brokers and distributed Kafka Connect workers from within a JUnit test. It also provides a rich set of convenient accessors to interact with such an embedded or external Kafka cluster in a lean and non-obtrusive way.
Kafka for JUnit can be used both to whitebox-test individual Kafka-based components of your application and to blackbox-test applications that offer an incoming and/or outgoing Kafka-based interface.
This presentation gives a brief introduction into Kafka for JUnit, discussing its design principles and code examples to get developers quickly up to speed using the library.
Common issues with Apache Kafka® Producer - Confluent
Badai Aqrandista, Confluent, Senior Technical Support Engineer
This session will be about a common issue in the Kafka Producer: producer batch expiry. We will discuss the Kafka Producer internals, the common causes of batch expiry, such as a slow network or small batching, and how to overcome them. We will also share some examples along the way!
https://www.meetup.com/apache-kafka-sydney/events/279651982/
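As a rough illustration of the tuning involved, the producer settings below are the ones most often implicated in batch expiry (a sketch; the broker address and values are placeholders, but the config keys are standard Kafka Producer settings):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ProducerTuningSketch extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  // A batch expires if it cannot be delivered within delivery.timeout.ms,
  // so raising it (and batching sensibly) mitigates expiry on slow networks.
  props.put("delivery.timeout.ms", "120000") // total time to deliver, incl. retries
  props.put("linger.ms", "50")               // wait briefly to fill larger batches
  props.put("batch.size", "65536")           // bigger batches, fewer requests

  val producer = new KafkaProducer[String, String](props)
  producer.send(new ProducerRecord("demo-topic", "key", "value"))
  producer.close()
}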
GraphQL is a query language for APIs and a runtime for fulfilling those queries. It gives clients the power to ask for exactly what they need, which makes it a great fit for modern web and mobile apps. In this talk, we explain why GraphQL was created, introduce you to the syntax and behavior, and then show how to use it to build powerful APIs for your data. We will also introduce you to AWS AppSync, a GraphQL-powered serverless backend for apps, which you can use to host GraphQL APIs and also add real-time and offline capabilities to your web and mobile apps. You can follow along if you have an AWS account – no GraphQL experience required!
Level: Beginner
Speaker: Rohan Deshpande - Sr. Software Dev Engineer, AWS Mobile Applications
A comparison of different solutions for full-text search in web applications using PostgreSQL and other technology. Presented at the PostgreSQL Conference West, in Seattle, October 2009.
A small attempt to deliver a session introducing GraphQL.
I have prepared a small demo using the following technologies: Spring Boot and an H2 database (server side); Angular 8 and the Apollo GraphQL client.
Apache Calcite (a tutorial given at BOSS '21) - Julian Hyde
Apache Calcite is a dynamic data management framework. Think of it as a toolkit for building databases: it has an industry-standard SQL parser, a validator, and a highly customizable optimizer (with pluggable transformation rules and cost functions, relational algebra, and an extensive library of rules), but it has no preferred storage primitives.
In this tutorial (given at BOSS '21 in Copenhagen as part of VLDB '21), attendees use Apache Calcite to build a fully fledged query processor from scratch with very few lines of code. This processor is a full implementation of SQL over an Apache Lucene storage engine. (Lucene does not support SQL queries and lacks a declarative language for performing complex operations such as joins or aggregations.) Attendees will also learn how to use Calcite as an effective tool for research.
Presenters: Julian Hyde and Stamatis Zampetakis
What to do if Your Kafka Streams App Gets OOMKilled? with Andrey Serebryanskiy - Hosted by Confluent
"Have you ever had your stateful Kafka Streams app killed by Kubernetes with the termination reason ""OOMKilled""? Even if you did set up JVM heap limit, the pod still got killed? This is likely due to your RocksDB off-heap memory usage. This talk will explore ways of diagnosing the problem, including:
- analysis of heap and off-heap memory;
- diving into RocksDB metrics;
- looking at k8s java pod measurements.
It will also show a possible solution to the problem with RocksDB and pod memory tuning. Attendees will get hands-on experience of live-debugging a Kafka Streams app, together with useful tips and tricks on its production usage in Kubernetes. As a Streaming Platform Owner at Raiffeisen Bank and a former Data Engineer with 5+ years of experience, I will share some insights on building scalable and reliable Kafka Streams stateful applications.
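For context, the remedy usually discussed for this problem is bounding RocksDB's off-heap memory via Kafka Streams' RocksDBConfigSetter; a minimal sketch, assuming recent Kafka Streams and RocksDB versions (the cache size is a placeholder):

import java.util.{Map => JMap}
import org.apache.kafka.streams.state.RocksDBConfigSetter
import org.rocksdb.{BlockBasedTableConfig, LRUCache, Options}

// One shared, bounded block cache so all state stores draw on a single budget.
object BoundedMemory {
  val cache = new LRUCache(64 * 1024 * 1024) // 64 MiB, placeholder
}

class BoundedMemoryRocksDBConfig extends RocksDBConfigSetter {
  override def setConfig(storeName: String, options: Options, configs: JMap[String, AnyRef]): Unit = {
    val tableConfig = new BlockBasedTableConfig()
    tableConfig.setBlockCache(BoundedMemory.cache)
    options.setTableFormatConfig(tableConfig)
  }
  override def close(storeName: String, options: Options): Unit = () // cache is shared; don't close it per store
}

// Wire it in via:
// props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, classOf[BoundedMemoryRocksDBConfig])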
(Video of these slides here http://fsharpforfunandprofit.com/rop)
(My response to "this is just Either" here: http://fsharpforfunandprofit.com/rop/#monads)
Many examples in functional programming assume that you are always on the "happy path". But to create a robust real-world application you must deal with validation, logging, network and service errors, and other annoyances.
So, how do you handle all this in a clean functional way? This talk will provide a brief introduction to this topic, using a fun and easy-to-understand railway analogy.
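Stripped of the railway metaphor, the core idea is composing functions that each return a success or a failure; in Scala this is commonly sketched with Either (the names below are illustrative, and the talk's own discussion of how this relates to Either is linked above):

final case class Request(name: String, email: String)

object RailwaySketch extends App {
  // Each step stays on the success track with Right, or switches to the failure track with Left.
  def validateName(r: Request): Either[String, Request] =
    if (r.name.nonEmpty) Right(r) else Left("Name must not be blank")

  def validateEmail(r: Request): Either[String, Request] =
    if (r.email.contains("@")) Right(r) else Left("Email is not valid")

  // The for-comprehension short-circuits at the first Left, like a railway switch.
  def handle(r: Request): Either[String, String] =
    for {
      _ <- validateName(r)
      _ <- validateEmail(r)
    } yield s"Saved ${r.name}"

  println(handle(Request("Alice", "alice@example.com"))) // Right(Saved Alice)
  println(handle(Request("", "nope")))                   // Left(Name must not be blank)
}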
Powering Custom Apps at Facebook using Spark Script Transformation - Databricks
Script Transformation is an important and growing use case for Apache Spark at Facebook. Spark's script transforms allow users to run custom scripts and binaries directly from SQL and serve as an important means of stitching Facebook's custom business logic with existing data pipelines (a minimal example of the syntax follows the list below).
Along with Spark SQL + UDFs, a growing number of our custom pipelines leverage Spark's script transform operator to run user-provided binaries for applications such as indexing, parallel training, and inference at scale. Spawning custom processes from the Spark executors introduces new challenges in production, ranging from external resource allocation/management to structured data serialization and external process monitoring.
In this session, we will talk about the improvements to Spark SQL (and the resource manager) to support running reliable and performant script transformation pipelines. This includes:
1) cgroup v2 containers for CPU, Memory and IO enforcement,
2) Transform jail for processes namespace management,
3) Support for complex types in Row format delimited SerDe,
4) Protocol Buffers for fast and efficient structured data serialization. Finally, we will conclude by sharing our results, lessons learned and future directions (e.g., transform pipelines resource over-subscription).
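To make the operator concrete, here is what a script transform invocation looks like in open-source Spark (a sketch, not Facebook's internal setup; TRANSFORM needs Hive support or a Spark 3.x build, and '/bin/cat' stands in for a custom binary):

import org.apache.spark.sql.SparkSession

object ScriptTransformSketch extends App {
  val spark = SparkSession.builder()
    .appName("script-transform-sketch")
    .master("local[*]")
    .enableHiveSupport() // TRANSFORM requires Hive support (or Spark 3.x row-format support)
    .getOrCreate()

  spark.range(5).createOrReplaceTempView("nums")

  // Pipe each row through an external process and read its stdout back as a column.
  spark.sql(
    """SELECT TRANSFORM(id)
      |USING '/bin/cat'
      |AS (echoed STRING)
      |FROM nums""".stripMargin
  ).show()
}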
[2019] Correctly, Quickly! Spring Kafka Embracing Reactive - NHN FORWARD
※ Download the slides for a clearer version.
An introduction to the Reactive API added in Spring Kafka 2.3.
The talk is built around a real-world case: notifying the people in charge about anomalies detected by a monitoring system.
It shows how to publish and consume messages in a Reactive style, and shares examples that use Rx operators to concisely implement the complex requirements that must be applied to the consumed event messages.
By walking through how Publishers and Subscribers interact, it revisits the pitfalls of integrating with multiple systems and data stores, and in particular proposes solutions to problems that can arise when using Kafka.
Table of contents
1. How to process Kafka messages asynchronously
2. Use cases for the operators provided by ReactiveX
3. The internals of Project Reactor (the processing flow between Publisher and Subscriber)
Intended audience
- Anyone interested in Reactive Programming
- Anyone who wants to increase the message throughput of streaming platforms such as Kafka
■ Related video: https://youtu.be/HzQfJNusnO8
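For a flavor of the reactive consumption style the talk describes, here is a minimal sketch against reactor-kafka, the library that Spring Kafka's reactive support builds on (written in Scala for consistency with this page; the topic, group, and broker are placeholders):

import java.util.Collections
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import reactor.kafka.receiver.{KafkaReceiver, ReceiverOptions}

object ReactiveConsumeSketch extends App {
  val props = new java.util.HashMap[String, AnyRef]()
  props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  props.put(ConsumerConfig.GROUP_ID_CONFIG, "reactive-group")
  props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer])
  props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer])

  val options = ReceiverOptions.create[String, String](props)
    .subscription(Collections.singleton("events"))

  // receive() returns a Flux; demand flows from the subscriber back to the poll loop.
  KafkaReceiver.create(options)
    .receive()
    .doOnNext { record =>
      println(s"${record.key()} -> ${record.value()}")
      record.receiverOffset().acknowledge() // mark the offset for commit
    }
    .blockLast() // block the demo's main thread; a real app would subscribe instead
}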
Watch this talk here: https://www.confluent.io/online-talks/apache-kafka-architecture-and-fundamentals-explained-on-demand
This session explains Apache Kafka’s internal design and architecture. Companies like LinkedIn are now sending more than 1 trillion messages per day to Apache Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
This talk provides a comprehensive overview of Kafka architecture and internal functions, including:
-Topics, partitions and segments
-The commit log and streams
-Brokers and broker replication
-Producer basics
-Consumers, consumer groups and offsets
This session is part 2 of 4 in our Fundamentals for Apache Kafka series.
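As a plain-client illustration of consumers, consumer groups, and offsets (a sketch, not part of the original talk; broker and topic are placeholders):

import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer

object ConsumerGroupSketch extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("group.id", "fundamentals-demo") // members of one group split the partitions
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("enable.auto.commit", "false")   // commit offsets explicitly below

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(Collections.singleton("demo-topic"))

  while (true) {
    val records = consumer.poll(Duration.ofMillis(500))
    records.forEach { r =>
      println(s"partition=${r.partition()} offset=${r.offset()} value=${r.value()}")
    }
    consumer.commitSync() // advance the group's committed offsets
  }
}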
Building distributed systems is challenging. Luckily, Apache Kafka provides a powerful toolkit for putting together big services as a set of scalable, decoupled components. In this talk, I'll describe some of the design tradeoffs when building microservices, and how Kafka's powerful abstractions can help. I'll also talk a little bit about what the community has been up to with Kafka Streams, Kafka Connect, and exactly-once semantics.
Presentation by Colin McCabe, Confluent, Big Data Day LA
Overview of GraphQL
How it is different from REST
When you should consider using it and when you should not
Incremental demos, culminating in calling GraphQL from a React application: https://github.com/bary822/graphQL-techtalk
The Alpakka initiative brings together existing systems and their technologies with the Reactive Streams implementation in Akka. This gives you high-level APIs to model streams of data that form Reactive Enterprise Integrations.
We’ll look at examples where we see Alpakka in action with Kafka, MQTT and JMS.
Apache Kafka - Scalable Message Processing and more! - Guido Schmutz
In the world of sensors and social media streams, the integration and handling of high-volume event streams is more important than ever. Events have to be handled both efficiently and reliably, and often many consumers or systems are interested in all or part of the events. How do we make sure that all these events are accepted and forwarded in an efficient and reliable way? Apache Kafka, a distributed, highly scalable message broker built for exchanging huge amounts of messages between a source and a target, can be of great help in such scenarios.
This session introduces Apache Kafka and its place in a modern architecture, shows its integration with Oracle Stack and presents the Oracle Event Hub cloud service, the managed Kafka service.
DevOps Fest 2020. Сергій Калінець (Serhii Kalinets). Building Data Streaming Platform with Apac... - DevOps_Fest
Apache Kafka is all the hype right now. More and more companies are starting to use it as a message bus. However, Kafka can do much more than act as mere transport. Its real power and beauty show when Kafka becomes the central nervous system of your architecture. It is fast, reliable, and flexible enough for a wide range of use cases.
In this talk, Serhii shares his experience building a data streaming platform. We talk about how Kafka works, how it should be configured, and the traps you can fall into when Kafka is used suboptimally.
With the advent of reliable streaming technologies, real-time data pipelines have become a crucial component of any robust data initiative today. Compared to a traditional Hadoop-centric data hub, these real-time stacks provide high-levels of system availability and data integrity coupled with very low latency queries without incurring the overhead of inflexible schemas and batch analysis lag.
Alex Silva demonstrates how to use Kafka, Spark Streaming, Akka, and Hadoop to orchestrate a real-time stack and explains how data flows through this system. This real-time data platform combines a mix of open source technologies and home-grown services aimed at providing a full end-to-end solution, starting from flexible data-ingestion protocols to fast data analysis and queries.
Topics include:
External message providers, which connect to the platform through a data-ingestion service modeled as a robust actor system using Akka and Scala
Routing to different backend systems, including Kafka and Druid
Spark Streaming, which is used to perform real-time complex analytical and scientific processing on the data
Exporting data for future processing into Hadoop
Querying and visualization
Exactly-once Stream Processing with Kafka Streams - Guozhang Wang
I will present the recent additions to Kafka (as of 0.11.0) that achieve exactly-once semantics within its Streams API for stream processing use cases. This is achieved by leveraging the underlying idempotent and transactional client features. The main focus will be the specific semantics that Kafka distributed transactions enable in Streams and the underlying mechanics that let Streams scale efficiently.
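In application code, enabling this is a single configuration switch (shown with the modern exactly_once_v2 value; the 0.11-era releases the talk covers used "exactly_once"):

import java.util.Properties
import org.apache.kafka.streams.StreamsConfig

object ExactlyOnceConfigSketch {
  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "eos-demo")
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  // Turns on idempotent producers and transactions under the hood, so each
  // consume-transform-produce cycle commits atomically.
  props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2)
}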
Kafka Connect & Kafka Streams/KSQL - the ecosystem around KafkaGuido Schmutz
After a quick overview and introduction of Apache Kafka, this session covers two components which extend the core of Apache Kafka: Kafka Connect and Kafka Streams/KSQL.
Kafka Connect's role is to access data from the outside world and make it available inside Kafka by publishing it into a Kafka topic. On the other hand, Kafka Connect is also responsible for transporting information from inside Kafka to the outside world, which could be a database or a file system. There are many existing connectors for different source and target systems available out of the box, provided either by the community, by Confluent, or by other vendors. You simply configure these connectors and off you go.
Kafka Streams is a lightweight component which extends Kafka with stream processing functionality. With it, Kafka can not only reliably and scalably transport events and messages through the Kafka broker but also analyse and process these events in real time. Interestingly, Kafka Streams does not provide its own cluster infrastructure, and it is also not meant to run on a Kafka cluster. The idea is to run Kafka Streams wherever it makes sense, which can be inside a "normal" Java application, inside a web container, or on a more modern containerized (cloud) infrastructure such as Mesos, Kubernetes, or Docker. Kafka Streams has a lot of interesting features, such as reliable state handling, queryable state, and much more. KSQL is a streaming engine for Apache Kafka, providing a simple and completely interactive SQL interface for processing data in Kafka.
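As a minimal taste of the Kafka Streams API described above, here is the canonical word-count topology in the Scala DSL (a sketch; topic names are placeholders and the Serdes import path varies slightly across Kafka versions):

import java.util.Properties
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.serialization.Serdes._

object WordCountSketch extends App {
  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-sketch")
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")

  val builder = new StreamsBuilder
  builder.stream[String, String]("text-input")
    .flatMapValues(_.toLowerCase.split("\\W+"))
    .groupBy((_, word) => word) // repartition by word
    .count()                    // stateful aggregation per word
    .toStream
    .to("word-counts")

  val streams = new KafkaStreams(builder.build(), props)
  streams.start()
  sys.ShutdownHookThread(streams.close())
}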
The advent of service-oriented architecture (SOA) and microservices brings the challenge of handling messaging reliably between tens, hundreds, or even thousands of software services. These services not only have to be able to respond to events in a timely manner, they also need to be failure-tolerant. Kafka offers a reliable backbone for service messaging. The talk will highlight the key advantages of using Kafka over traditional HTTP-based communication and will also discuss challenges to tackle.
The presentation was part of a talk at the London Java Community meetup organised by RecWorks on 7th Feb, 2018.
Event details: https://www.meetup.com/Londonjavacommunity/events/246897405/
Welcome to Kafka; We're Glad You're Here (Dave Klein, Centene) Kafka Summit 2020 - Confluent
Apache Kafka sits at the center of a technology ecosystem that can be a bit overwhelming to someone just getting started. Fortunately, Apache Kafka is also at the heart of an amazing community that is able and eager to help! So, if you are new, or relatively new, to Apache Kafka, welcome! I’d like to introduce you to the Kafka ecosystem, and present you with a plan for how to learn and be productive with it. I’d also like to introduce you to one of the most helpful and welcoming software communities I’ve ever encountered.
I’ll take you through the basics of Kafka—the brokers, the partitions, the topics—and then on and up into the different APIs and tools that are available to work with it. Consider it a Kafka 101, if you will. We’ll stay at a high level, but we’ll cover a lot of ground, with an emphasis on where and how you can dig in deeper.
I am still learning myself, so I will share with you what and who have helped me in my journey, and then I’ll invite you to continue that journey with me. It’s going to be a great adventure!
Real-time streaming and data pipelines with Apache Kafka - Joe Stein
Get up and running quickly with Apache Kafka http://kafka.apache.org/
* Fast * A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.
* Scalable * Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers
* Durable * Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.
* Distributed by Design * Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features) - Kai Wähner
High level introduction to Confluent REST Proxy and Schema Registry (leveraging Apache Avro under the hood), two components of the Apache Kafka open source ecosystem. See the concepts, architecture and features.
Similar to Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbend) Kafka Summit NYC 2019
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente... - Confluent
In our exclusive webinar, you'll learn why event-driven architecture is the key to unlocking cost efficiency, operational effectiveness, and profitability. Gain insights on how this approach differs from API-driven methods and why it's essential for your organization's success.
Unlocking the Power of IoT: A comprehensive approach to real-time insights - Confluent
In today's data-driven world, the Internet of Things (IoT) is revolutionizing industries and unlocking new possibilities. Join Data Reply, Confluent, and Imply as we unveil a comprehensive solution for IoT that harnesses the power of real-time insights.
Hybrid workshop: Stream Processing with Flink - Confluent
Stream processing is a prerequisite of the data streaming stack, powering real-time applications and pipelines.
It enables greater data portability, optimized resource utilization, and a better customer experience by processing data streams in real time.
In our hands-on hybrid workshop, you will learn how to easily filter, join, and enrich real-time data within Confluent Cloud using our serverless Flink service.
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark... - Confluent
Our talk will explore the transformative impact of integrating Confluent, HiveMQ, and SparkPlug in Industry 4.0, emphasizing the creation of a Unified Namespace.
In addition to the creation of a Unified Namespace, our webinar will also delve into Stream Governance and Scaling, highlighting how these aspects are crucial for managing complex data flows and ensuring robust, scalable IIoT platforms.
You will learn how to ensure data accuracy and reliability, expand your data processing capabilities, and optimize your data management processes.
Don't miss out on this opportunity to learn from industry experts and take your business to the next level.
Event-driven architecture (EDA) will be the heart of MAPFRE's ecosystem. To remain competitive, today's companies depend more and more on real-time data analysis, which lets them obtain insights and faster response times. Doing business on real-time data means building situational awareness: detecting and responding to what is happening in the world right now.
Events and Microservices - Santander TechTalk - Confluent
During this session we will examine how the worlds of events and microservices complement and improve each other, exploring how event-driven patterns let us decompose monoliths in a scalable, resilient, and decoupled way.
The purpose of the session is to take a dive into Apache Kafka, data streaming, and Kafka in the cloud:
- Dive into Apache Kafka
- Data Streaming
- Kafka in the cloud
Build real-time streaming data pipelines to AWS with Confluent - Confluent
Traditional data pipelines often face scalability issues and challenges related to cost, their monolithic design, and reliance on batch data processing. They also typically operate under the premise that all data needs to be stored in a single centralized data source before it's put to practical use. Confluent Cloud on Amazon Web Services (AWS) provides a fully managed cloud-native platform that helps you simplify the way you build real-time data flows using streaming data pipelines and Apache Kafka.
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
No matter whether you are migrating your Kafka cluster to Confluent Cloud, running a cloud-hybrid environment or are in a different situation where data protection and encryption of sensitive information is required, Confluent Service Mesh allows you to transparently encrypt your data without the need to make code changes to your existing applications.
Citi Tech Talk: Event Driven Kafka Microservices - Confluent
Microservices have become a dominant architectural paradigm for building systems in the enterprise, but they are not without their tradeoffs. Learn how to build event-driven microservices with Apache Kafka.
Confluent & GSI Webinars series - Session 3confluent
An in-depth look at how Confluent is being used in the financial services industry. Gain an understanding of how organisations are utilising data in motion to solve common problems and gain benefits from their real-time data capabilities.
It will look more deeply into some specific use cases and show how Confluent technology is used to manage costs and mitigate risks.
This session is aimed at Solutions Architects, Sales Engineers and pre-sales, and also the more technically minded, business-aligned people. Whilst this is not a deeply technical session, a level of knowledge around Kafka would be helpful.
Transforming applications built with traditional messaging solutions such as TIBCO, MQ and Solace to be scalable, reliable and ready for the move to cloud
How can applications built with traditional messaging technologies like TIBCO, Solace and IBM MQ be modernised and made cloud-ready? What are the advantages of event streaming approaches to pub/sub vs traditional message queues? What are the strengths and weaknesses of both approaches, and what use cases and requirements are actually a better fit for messaging than Kafka?
This session will show why the old paradigm does not work and that a new approach to the data strategy needs to be taken. It aims to show how a Data Streaming Platform is integral to the evolution of a company’s data strategy and how Confluent is not just an integration layer but the central nervous system for an organisation
You will also learn how to:
• Build products and features faster using a complete suite of connectors and stream management tools, and connect your environments to data pipelines
• Protect your most critical data and workloads with built-in guarantees for security, governance, and resilience
• Deploy Kafka at scale in minutes while reducing the associated costs and operational burden
Confluent Partner Tech Talk with Synthesis - Confluent
A discussion of the arduous planning process, and a deep dive into the design/architectural decisions.
Learn more about the networking, the RBAC strategies, the automation, and the deployment plan.
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality - Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Key Trends Shaping the Future of Infrastructure.pdf - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I was wondering, as an "infrastructure container Kubernetes guy", how does this fancy AI technology get managed from an infrastructure operations view? Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and give you a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and make it work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could be beneficial for or limiting your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Elevating Tactical DDD Patterns Through Object Calisthenics - Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
2. Who am I?
I’m Sean Glover
• Principal Engineer at Lightbend
• Member of the Lightbend Pipelines team
• Organizer of Scala Toronto (scalator)
• Author and contributor to various projects in the Kafka
ecosystem including Kafka, Alpakka Kafka (reactive-kafka),
Strimzi, Kafka Lag Exporter, DC/OS Commons SDK
@seg1o
3. "The Alpakka project is an initiative to implement a library of integration modules to build stream-aware, reactive pipelines for Java and Scala."
7. "Akka Streams is a library toolkit to provide low latency complex event processing streaming semantics using the Reactive Streams specification, implemented internally with an Akka actor system."
9. Reactive Streams Specification
"Reactive Streams is an initiative to provide a standard for asynchronous stream processing with non-blocking back pressure."
http://www.reactive-streams.org/
11. GraphStage: Akka Actor Concepts
[Diagram: an Actor with its Mailbox and internal State]
1. Constrained Actor Mailbox
2. One message at a time ("Single Threaded Illusion")
3. May contain state

// Message Handler "Receive Block"
def receive = {
  case message: MessageType => // handle one message at a time
}
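To ground these concepts, a complete (if trivial) classic Akka actor might look like this sketch (the names are illustrative):

import akka.actor.{Actor, ActorSystem, Props}

// One message at a time from the mailbox; private mutable state stays
// safe behind the "single threaded illusion".
class CounterActor extends Actor {
  private var count = 0 // actor-private state

  def receive: Receive = {
    case "increment" => count += 1
    case "report"    => sender() ! count
  }
}

object ActorSketch extends App {
  val system = ActorSystem("demo")
  val counter = system.actorOf(Props(new CounterActor), "counter")
  counter ! "increment"
  counter ! "increment"
  counter ! "report" // reply goes to the sender; outside an actor this is dead letters
}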
12. Back Pressure Demo
[Diagram: a Source -> Flow -> Sink pipeline reading from a source Kafka topic and writing to a destination Kafka topic. The Sink signals "I need some messages" and a demand request is sent upstream; the Source loads messages for downstream (e.g. Key: EN, Value: {"message": "Hi Akka!"} and its FR/ES variants) and the demand is satisfied downstream, batch by batch.]
13. Dynamic Push Pull
[Diagram: a fast producer (Source) and a slow consumer (Flow) with a bounded mailbox. The Flow sends a demand request (pull) for at most 5 messages ("I can handle 5 more messages"); the Source sends (pushes) a batch of 5 messages downstream. The Source then cannot send more messages because it has no more demand to fulfill: the Flow's mailbox is full.]
14. Why Back Pressure?
• Prevent cascading failure
• Alternative to using a big buffer (i.e. Kafka)
• Back Pressure flow control can use several strategies
• Slow down until there’s demand (classic back pressure, “throttling”)
• Discard elements
• Buffer in memory to some max, then discard elements
• Shutdown
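These strategies map directly onto Akka Streams' buffer operator; a hedged sketch (sizes and rates are illustrative):

import akka.actor.ActorSystem
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.duration._

object OverflowSketch extends App {
  implicit val system: ActorSystem = ActorSystem("overflow-demo")

  Source(1 to 10000)
    // Buffer up to 100 elements in memory, then discard the oldest;
    // alternatives: dropTail, dropNew, dropBuffer, backpressure, fail.
    .buffer(100, OverflowStrategy.dropHead)
    .throttle(10, 1.second) // a deliberately slow downstream
    .runWith(Sink.foreach(println))
}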
15. Why Back Pressure? A case study.
https://medium.com/@programmerohit/back-pressure-implementation-aws-sqs-polling-from-a-sharded-akka-cluster-running-on-kubernetes-56ee8c67efb
16. Akka Streams Factorial Example

import ...

object Main extends App {
  implicit val system = ActorSystem("QuickStart")
  implicit val materializer = ActorMaterializer()

  val source: Source[Int, NotUsed] = Source(1 to 100)
  val factorials = source.scan(BigInt(1))((acc, next) => acc * next)

  val result: Future[IOResult] =
    factorials
      .map(num => ByteString(s"$num\n"))
      .runWith(FileIO.toPath(Paths.get("factorials.txt")))
}
https://doc.akka.io/docs/akka/2.5/stream/stream-quickstart.html
17. Apache Kafka
"Apache Kafka is a distributed streaming system. It's best suited to support fast, high volume, and fault tolerant data streaming platforms." - Kafka Documentation
19. When to use Alpakka Kafka?
1. To build back pressure aware integrations
2. Complex Event Processing
3. A need to model the most complex of graphs
21. Alpakka Kafka Setup

val consumerClientConfig = system.settings.config.getConfig("akka.kafka.consumer")
val consumerSettings =
  ConsumerSettings(consumerClientConfig, new StringDeserializer, new ByteArrayDeserializer)
    .withBootstrapServers("localhost:9092")
    .withGroupId("group1")
    .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

val producerClientConfig = system.settings.config.getConfig("akka.kafka.producer")
val producerSettings = ProducerSettings(system, new StringSerializer, new ByteArraySerializer)
  .withBootstrapServers("localhost:9092")

Alpakka Kafka config & Kafka Client config can go here (the getConfig(...) blocks)
Set ad-hoc Kafka client config with .withProperty
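One detail the following slides rely on: the committerSettings passed to Committer.sink can be loaded the same way; a minimal sketch, assuming the defaults from the akka.kafka.committer config section:

val committerSettings = CommitterSettings(system)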
22. Anatomy of an Alpakka Kafka App

val control = Consumer
  .committableSource(consumerSettings, Subscriptions.topics(topic1))
  .map { msg =>
    ProducerMessage.single(
      new ProducerRecord(topic1, msg.record.key, msg.record.value),
      passThrough = msg.committableOffset)
  }
  .via(Producer.flexiFlow(producerSettings))
  .map(_.passThrough)
  .toMat(Committer.sink(committerSettings))(Keep.both)
  .mapMaterializedValue(DrainingControl.apply)
  .run()

// Add shutdown hook to respond to SIGTERM and gracefully shut down the stream
sys.ShutdownHookThread {
  Await.result(control.shutdown(), 10.seconds)
}

A small Consume -> Transform -> Produce Akka Streams app using Alpakka Kafka
23. Anatomy of an Alpakka Kafka App

val control = Consumer
  .committableSource(consumerSettings, Subscriptions.topics(topic1))
  .map { msg =>
    ProducerMessage.single(
      new ProducerRecord(topic1, msg.record.key, msg.record.value),
      passThrough = msg.committableOffset)
  }
  .via(Producer.flexiFlow(producerSettings))
  .map(_.passThrough)
  .toMat(Committer.sink(committerSettings))(Keep.both)
  .mapMaterializedValue(DrainingControl.apply)
  .run()

// Add shutdown hook to respond to SIGTERM and gracefully shut down the stream
sys.ShutdownHookThread {
  Await.result(control.shutdown(), 10.seconds)
}

The Committable Source propagates Kafka offset information downstream with consumed messages
24. val control = Consumer
.committableSource(consumerSettings, Subscriptions. topics(topic1))
.map { msg =>
ProducerMessage. single(
new ProducerRecord(topic1, msg.record.key, msg.record.value),
passThrough = msg.committableOffset)
}
.via(Producer. flexiFlow(producerSettings))
.map(_.passThrough)
.toMat(Committer. sink(committerSettings))(Keep. both)
.mapMaterializedValue(DrainingControl. apply)
.run()
// Add shutdown hook to respond to SIGTERM and gracefully shutdown stream
sys.ShutdownHookThread {
Await.result(control.shutdown(), 10.seconds)
}
24
Anatomy of an Alpakka Kafka App
ProducerMessage used to map consumed offset to
transformed results.
One to One (1:1)
ProducerMessage.single(
  new ProducerRecord(topic1, msg.record.key, msg.record.value),
  passThrough = msg.committableOffset)

One to Many (1:M)
ProducerMessage.multi(
  immutable.Seq(
    new ProducerRecord(topic1, msg.record.key, msg.record.value),
    new ProducerRecord(topic2, msg.record.key, msg.record.value)),
  passThrough = msg.committableOffset)

One to None (1:0)
ProducerMessage.passThrough(msg.committableOffset)
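These variants can be chosen per message. For example, a sketch of the .map stage of the app above (isValid is a hypothetical domain check, not from the talk) that produces valid records and skips invalid ones while still committing their offsets:

.map { msg =>
  if (isValid(msg.record.value)) // hypothetical domain check
    ProducerMessage.single(
      new ProducerRecord(topic1, msg.record.key, msg.record.value),
      passThrough = msg.committableOffset)
  else // produce nothing, but still pass the offset downstream to be committed
    ProducerMessage.passThrough(msg.committableOffset)
}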
25. Anatomy of an Alpakka Kafka App
Producer.flexiFlow produces messages to the destination topic. It accepts the new ProducerMessage type and will replace the deprecated Producer.flow in the future.
26. Anatomy of an Alpakka Kafka App
Committer.sink batches the commits of consumed offsets. The passThrough lets us track which messages have been successfully processed, for at-least-once message delivery guarantees.
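How aggressively the sink batches commits is controlled by CommitterSettings; a minimal sketch, with assumed values:

// Sketch: tune offset-commit batching (values are illustrative assumptions)
val committerSettings = CommitterSettings(system)
  .withMaxBatch(1000L)         // commit after at most 1000 offsets
  .withMaxInterval(10.seconds) // or after 10 seconds, whichever comes first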
27. Anatomy of an Alpakka Kafka App
Gracefully shut down the stream:
1. Stop consuming (polling) new messages from the Source
2. Wait for all messages to be successfully committed (when applicable)
3. Wait for all produced messages to be ACKed
28. Anatomy of an Alpakka Kafka App
The shutdown hook gracefully shuts down the stream when SIGTERM is sent to the app (e.g. by the Docker daemon), forcing shutdown after the grace interval.
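As a sketch of an alternative, the materialized DrainingControl also provides drainAndShutdown, which combines the steps above (stop polling, drain in-flight messages, shut down); the timeout value here is an assumption:

sys.ShutdownHookThread {
  Await.result(control.drainAndShutdown()(system.dispatcher), 30.seconds)
}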
30. Why use Consumer Groups?
1. Easy, robust, and performant scaling of consumers to reduce consumer lag
31. Back Pressure: Consumer Group Latency and Offset Lag
[Diagram: n producers write to a topic in the cluster at a throughput of 10 MB/s, while three consumers in a group sustain ~3 MB/s each, ~9 MB/s total, so total offset lag and latency keep growing.]
32. Consumer Group Latency and Offset Lag
[Diagram: a fourth consumer is added and the group rebalances. The consumers can now support a throughput of ~12 MB/s, so offset lag and latency decrease until the consumers are caught up.]
34. Anatomy of a Consumer Group
[Diagram: Clients A, B, and C consume topics T1, T2, and T3 from the cluster, assigned partitions 0,1,2; 3,4,5; and 6,7,8 respectively. A broker acts as the Consumer Group Coordinator and one client as the Consumer Group Leader. The consumer offset log stores the last committed offset per partition, e.g. P0: 100489, P1: 128048, P2: 184082, P3: 596837, P4: 110847, P5: 99472, P6: 148270, P7: 3582785, P8: 182483.]
Important Consumer Group Client Config
Topic Subscription:
  Subscriptions.topics("Topic1", "Topic2", "Topic3")
Kafka Consumer Properties (example values in brackets):
  group.id ["my-group"]
  session.timeout.ms [30000 ms]
  partition.assignment.strategy [RangeAssignor]
  heartbeat.interval.ms [3000 ms]
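A hedged sketch of how these client properties map onto Alpakka's ConsumerSettings in code (the values mirror the examples above):

// Sketch: consumer-group-related Kafka client properties on ConsumerSettings
val groupedConsumerSettings = consumerSettings
  .withGroupId("my-group")
  .withProperty(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000")
  .withProperty(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000")
  .withProperty(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
    classOf[org.apache.kafka.clients.consumer.RangeAssignor].getName)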
35. Consumer Group Rebalance (1/7)
[Diagram: steady state. Clients A, B, and C hold partitions 0,1,2; 3,4,5; and 6,7,8, with the Consumer Group Coordinator and Consumer Group Leader in place.]
36. Consumer Group Rebalance (2/7)
A new Client D with the same group.id sends a request to join the group to the Coordinator.
37. Consumer Group Rebalance (3/7)
The Consumer Group Coordinator requests that the Consumer Group Leader calculate the new Client:Partition assignments.
38. Consumer Group Rebalance (4/7)
The Consumer Group Leader sends the new Client:Partition assignments to the Group Coordinator.
39. Consumer Group Rebalance (5/7)
The Consumer Group Coordinator informs all clients of their new Client:Partition assignments (partitions 0,1; 2,3; 4,5; and 6,7,8 spread across the four clients).
40. Consumer Group Rebalance (6/7)
Clients that had partitions revoked are given the chance to commit their latest processed offsets (partitions to commit: 2; 3,5; and 6,7,8).
41. Consumer Group Rebalance (7/7)
Rebalance complete. Clients begin consuming their newly assigned partitions (0,1; 2,3; 4,5; and 6,7,8) from their last committed offsets.
42. Consumer Group Rebalancing (asynchronous)
class RebalanceListener extends Actor with ActorLogging {
  def receive: Receive = {
    case TopicPartitionsAssigned(sub, assigned) =>
      // react to newly assigned partitions here
    case TopicPartitionsRevoked(sub, revoked) =>
      processRevokedPartitions(revoked)
  }
}

val subscription = Subscriptions.topics("topic1", "topic2")
  .withRebalanceListener(system.actorOf(Props[RebalanceListener]))
val control = Consumer.committableSource(consumerSettings, subscription)
...
Declare an Akka Actor to handle assigned and revoked partition messages asynchronously. This is useful for performing asynchronous actions during a rebalance, but not for blocking operations that you want to happen during the rebalance.
43. Consumer Group Rebalancing (asynchronous)
Add the Actor reference to the topic subscription with withRebalanceListener.
44. Consumer Group Rebalancing (synchronous)
A synchronous partition assignment handler is planned for the next release, by Enno Runne:
https://github.com/akka/alpakka-kafka/pull/761
● Synchronous operations are difficult to model in an async library
● Will allow users to block the Consumer Actor thread (the Consumer.poll thread)
● Provides limited consumer operations:
  ○ Seek to offset
  ○ Synchronous commit
46. Kafka Transactions
"Transactions enable atomic writes to multiple Kafka topics and partitions. All of the messages included in the transaction will be successfully written or none of them will be."
48. Exactly Once Delivery vs Exactly Once Processing
"Exactly-once message delivery is impossible between two parties where failures of communication are possible."
The Two Generals / Byzantine Generals problem
49. Why use Transactions?
1. Zero tolerance for duplicate messages
2. Less boilerplate (deduplication, client offset management)
50. Anatomy of Kafka Transactions
[Diagram: "Consume, Transform, Produce". A client consumes from a subscribed topic, transforms the messages, and produces to a destination topic. The Consumer Group Coordinator maintains the consumer offset log, the Transaction Coordinator maintains the transaction log, and partitions interleave user messages (UM) with control messages (CM). Destination topic partitions get included in the transaction based on the messages that are produced.]
Important Client Config
Topic Subscription:
  Subscriptions.topics("Topic1", "Topic2", "Topic3")
Kafka Consumer Properties:
  group.id: "my-group"
  isolation.level: "read_committed"
  plus other relevant consumer group configuration
Kafka Producer Properties:
  transactional.id: "my-transaction"
  enable.idempotence: "true" (implicit)
  max.in.flight.requests.per.connection: "1" (implicit)
51. Kafka Features That Enable Transactions
1. Idempotent producer
2. Multiple partition atomic writes
3. Consumer read isolation level
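All three features come together in the consume-transform-produce loop. A hedged sketch using the plain Kafka producer API (the broker address and ids are illustrative assumptions):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig}
import org.apache.kafka.common.serialization.StringSerializer

val props = new Properties()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-transaction") // implies idempotence
val producer = new KafkaProducer(props, new StringSerializer, new StringSerializer)

producer.initTransactions()  // registers with the Transaction Coordinator
producer.beginTransaction()
// ... producer.send(...) to multiple partitions, and
// producer.sendOffsetsToTransaction(...) for consume-transform-produce ...
producer.commitTransaction() // atomic across all partitions written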
57. Multiple Partition Atomic Writes
[Diagram: the second phase of the two-phase commit, e.g. KafkaProducer.commitTransaction(). The Transaction Coordinator writes a Transaction Committed marker to the (internal) transaction log, the last offset processed for the consumer subscription is written to the consumer offset log, and Transaction Committed control messages (CM) are written after the user messages (UM) in each user-defined partition. Multiple partitions are committed atomically: "all or nothing".]
58. Consumer Read Isolation Level
[Diagram: a client consumes user-defined partitions that interleave user messages (UM) with control messages (CM); with read_committed isolation it only sees messages from committed transactions.]
Kafka Consumer Properties:
  isolation.level: "read_committed"
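A minimal sketch of this setting applied to Alpakka's ConsumerSettings (broker address and group id are assumptions):

// Sketch: a consumer that only reads messages from committed transactions
val readCommittedSettings =
  ConsumerSettings(system, new StringDeserializer, new ByteArrayDeserializer)
    .withBootstrapServers("localhost:9092")
    .withGroupId("my-group")
    .withProperty(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed")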
61. Transactional GraphStage (1/7)
[State diagram: the stage tracks a Back Pressure Status (resume/suspend demand, waiting for ACK), a Commit Loop, a Transaction Status, and a Mailbox. Initially demand is resumed, the commit loop is waiting, and a transaction is begun.]
62. Transactional GraphStage (2/7)
[Messages are flowing through the transaction flow; the transaction is open; the commit loop waits for the commit interval to elapse.]
63. Transactional GraphStage (3/7)
[The commit interval elapses (100 ms) and a commit-loop "tick" message arrives in the stage's mailbox.]
64. Transactional GraphStage (4/7)
[Demand is suspended, stopping the flow of messages, while the stage waits for outstanding ACKs.]
65. Transactional GraphStage (5/7)
[The consumed offsets are sent to the open transaction.]
66. Transactional GraphStage (6/7)
[The transaction is committed.]
67. Transactional GraphStage (7/7)
[A new transaction begins; demand is resumed and messages flow again.]
68. Alpakka Kafka Transactions
val producerSettings = ProducerSettings(system, new StringSerializer, new ByteArraySerializer)
  .withBootstrapServers("localhost:9092")
  .withEosCommitInterval(100.millis)

val control =
  Transactional
    .source(consumerSettings, Subscriptions.topics("source-topic"))
    .via(transform)
    .map { msg =>
      ProducerMessage.single(
        new ProducerRecord[String, Array[Byte]]("sink-topic", msg.record.value),
        msg.partitionOffset)
    }
    .to(Transactional.sink(producerSettings, "transactional-id"))
    .run()

Optionally provide a transaction commit interval with withEosCommitInterval (the default is 100 ms).
69. Alpakka Kafka Transactions
Use Transactional.source to propagate the necessary info (consumer group id, offsets) to Transactional.sink.
70. Alpakka Kafka Transactions
Call Transactional.sink or Transactional.flow to produce and commit messages.
72. What is Complex Event Processing (CEP)?
"Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances."
Foundations of Complex Event Processing, Cornell
73. Calling into an Akka Actor System
[Diagram: an Akka Streams Source sends each element, via the Ask (?) stage, to a Cluster Router actor in an Akka Cluster of Actor Systems (each in its own JVM), and the reply flows on to the Sink.]
The "Ask pattern" models a non-blocking request and response of Akka messages.
74. Actor System Integration
class ProblemSolverRouter extends Actor {
  def receive = {
    case problem: Problem =>
      val solution = businessLogic(problem)
      sender() ! solution // reply to the ask
  }
}
...
implicit val timeout = Timeout(3.seconds) // the ask (?) pattern needs an implicit Timeout; this value is illustrative
val control = Consumer
  .committableSource(consumerSettings, Subscriptions.topics("topic1", "topic2"))
  ...
  .mapAsync(parallelism = 5)(problem => (problemSolverRouter ? problem).mapTo[Solution])
  ...
  .run()

Transform your stream by processing messages in an Actor System. All you need is an ActorRef.
75. Actor System Integration
Use the Ask pattern (the ? function) on the provided ActorRef to get an asynchronous response.
76. Actor System Integration
The parallelism limits how many messages are in flight at once, so we don't overwhelm the mailbox of the destination Actor and we maintain stream back-pressure.
78. Options for implementing Stateful Streams
1. Provided Akka Streams stages: fold, scan, etc. (see the sketch below)
2. A custom GraphStage
3. Calling into an Akka Actor System
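As a hedged sketch of option 1, a per-key running count built from the provided statefulMapConcat stage (the types and names are illustrative, not from the talk):

import akka.NotUsed
import akka.stream.scaladsl.Flow

// Sketch: a stateful stage that keeps a running count per key
val runningCounts: Flow[String, (String, Long), NotUsed] =
  Flow[String].statefulMapConcat { () =>
    var counts = Map.empty[String, Long] // state lives inside the stage
    key => {
      counts = counts.updated(key, counts.getOrElse(key, 0L) + 1)
      List(key -> counts(key)) // emit the updated count for this key
    }
  }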
79. Persistent Stateful Stages using Event Sourcing
1. Recover state after failure
2. Create an event log
3. Share state
Sound familiar? KTables!
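As a hedged illustration of these points using Akka Persistence (a swapped-in example, not code from the talk; the names and events are illustrative):

import akka.persistence.PersistentActor

// Sketch: an event-sourced counter that recovers its state after failure
class CounterActor extends PersistentActor {
  override def persistenceId: String = "counter-1"
  private var count = 0L

  override def receiveCommand: Receive = {
    case "increment" =>
      persist("incremented") { _ => count += 1 } // append to the event log
    case "get" => sender() ! count
  }

  override def receiveRecover: Receive = {
    case "incremented" => count += 1 // replay the event log to rebuild state
  }
}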
83. Alpakka Kafka 1.0 Release Notes
Released Feb 28, 2019. Highlights:
● Upgraded the Kafka client to version 2.0.0 #544 by @fr3akX
  ○ Support for new APIs from KIP-299: fix Consumer indefinite blocking behaviour in #614 by @zaharidichev
● New Committer.sink for standardised committing #622 by @rtimush
● Commit with metadata #563 and #579 by @johnclara
● Factored out akka.kafka.testkit for internal and external use: see Testing
● Support for merging commit batches #584 by @rtimush
● Reduced risk of message loss for partitioned sources #589
● Expose Kafka errors to stream #617
● Java APIs for all settings classes #616
● Much more comprehensive tests