Apache Kafka 0.8 basic training - VerisignMichael Noll
Apache Kafka 0.8 basic training (120 slides) covering:
1. Introducing Kafka: history, Kafka at LinkedIn, Kafka adoption in the industry, why Kafka
2. Kafka core concepts: topics, partitions, replicas, producers, consumers, brokers
3. Operating Kafka: architecture, hardware specs, deploying, monitoring, P&S tuning
4. Developing Kafka apps: writing to Kafka, reading from Kafka, testing, serialization, compression, example apps
5. Playing with Kafka using Wirbelsturm
Audience: developers, operations, architects
Created by Michael G. Noll, Data Architect, Verisign, https://www.verisigninc.com/
Verisign is a global leader in domain names and internet security.
Tools mentioned:
- Wirbelsturm (https://github.com/miguno/wirbelsturm)
- kafka-storm-starter (https://github.com/miguno/kafka-storm-starter)
Blog post at:
http://www.michael-noll.com/blog/2014/08/18/apache-kafka-training-deck-and-tutorial/
Many thanks to the LinkedIn Engineering team (the creators of Kafka) and the Apache Kafka open source community!
A brief introduction to Apache Kafka and describe its usage as a platform for streaming data. It will introduce some of the newer components of Kafka that will help make this possible, including Kafka Connect, a framework for capturing continuous data streams, and Kafka Streams, a lightweight stream processing library.
Building distributed systems is challenging. Luckily, Apache Kafka provides a powerful toolkit for putting together big services as a set of scalable, decoupled components. In this talk, I'll describe some of the design tradeoffs when building microservices, and how Kafka's powerful abstractions can help. I'll also talk a little bit about what the community has been up to with Kafka Streams, Kafka Connect, and exactly-once semantics.
Presentation by Colin McCabe, Confluent, Big Data Day LA
Kafka's basic terminologies, its architecture, its protocol and how it works.
Kafka at scale, its caveats, guarantees and use cases offered by it.
How we use it @ZaprMediaLabs.
Apache Kafka is the de facto standard for data streaming to process data in motion. With its significant adoption growth across all industries, I get a very valid question every week: When NOT to use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How to qualify Kafka out as it is not the right tool for the job?
This session explores the DOs and DONTs. Separate sections explain when to use Kafka, when NOT to use Kafka, and when to MAYBE use Kafka.
No matter if you think about open source Apache Kafka, a cloud service like Confluent Cloud, or another technology using the Kafka protocol like Redpanda or Pulsar, check out this slide deck.
A detailed article about this topic:
https://www.kai-waehner.de/blog/2022/01/04/when-not-to-use-apache-kafka/
Kappa vs Lambda Architectures and Technology ComparisonKai Wähner
Real-time data beats slow data. That’s true for almost every use case. Nevertheless, enterprise architects build new infrastructures with the Lambda architecture that includes separate batch and real-time layers.
This video explores why a single real-time pipeline, called Kappa architecture, is the better fit for many enterprise architectures. Real-world examples from companies such as Disney, Shopify, Uber, and Twitter explore the benefits of Kappa but also show how batch processing fits into this discussion positively without the need for a Lambda architecture.
The main focus of the discussion is on Apache Kafka (and its ecosystem) as the de facto standard for event streaming to process data in motion (the key concept of Kappa), but the video also compares various technologies and vendors such as Confluent, Cloudera, IBM Red Hat, Apache Flink, Apache Pulsar, AWS Kinesis, Amazon MSK, Azure Event Hubs, Google Pub Sub, and more.
Video recording of this presentation:
https://youtu.be/j7D29eyysDw
Further reading:
https://www.kai-waehner.de/blog/2021/09/23/real-time-kappa-architecture-mainstream-replacing-batch-lambda/
https://www.kai-waehner.de/blog/2021/04/20/comparison-open-source-apache-kafka-vs-confluent-cloudera-red-hat-amazon-msk-cloud/
https://www.kai-waehner.de/blog/2021/05/09/kafka-api-de-facto-standard-event-streaming-like-amazon-s3-object-storage/
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaKai Wähner
Streaming all over the World: Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka.
Learn about various case studies for event streaming with Apache Kafka across industries. The talk explores architectures for real-world deployments from Audi, BMW, Disney, Generali, Paypal, Tesla, Unity, Walmart, William Hill, and more. Use cases include fraud detection, mainframe offloading, predictive maintenance, cybersecurity, edge computing, track&trace, live betting, and much more.
Apache Kafka 0.8 basic training - VerisignMichael Noll
Apache Kafka 0.8 basic training (120 slides) covering:
1. Introducing Kafka: history, Kafka at LinkedIn, Kafka adoption in the industry, why Kafka
2. Kafka core concepts: topics, partitions, replicas, producers, consumers, brokers
3. Operating Kafka: architecture, hardware specs, deploying, monitoring, P&S tuning
4. Developing Kafka apps: writing to Kafka, reading from Kafka, testing, serialization, compression, example apps
5. Playing with Kafka using Wirbelsturm
Audience: developers, operations, architects
Created by Michael G. Noll, Data Architect, Verisign, https://www.verisigninc.com/
Verisign is a global leader in domain names and internet security.
Tools mentioned:
- Wirbelsturm (https://github.com/miguno/wirbelsturm)
- kafka-storm-starter (https://github.com/miguno/kafka-storm-starter)
Blog post at:
http://www.michael-noll.com/blog/2014/08/18/apache-kafka-training-deck-and-tutorial/
Many thanks to the LinkedIn Engineering team (the creators of Kafka) and the Apache Kafka open source community!
A brief introduction to Apache Kafka and describe its usage as a platform for streaming data. It will introduce some of the newer components of Kafka that will help make this possible, including Kafka Connect, a framework for capturing continuous data streams, and Kafka Streams, a lightweight stream processing library.
Building distributed systems is challenging. Luckily, Apache Kafka provides a powerful toolkit for putting together big services as a set of scalable, decoupled components. In this talk, I'll describe some of the design tradeoffs when building microservices, and how Kafka's powerful abstractions can help. I'll also talk a little bit about what the community has been up to with Kafka Streams, Kafka Connect, and exactly-once semantics.
Presentation by Colin McCabe, Confluent, Big Data Day LA
Kafka's basic terminologies, its architecture, its protocol and how it works.
Kafka at scale, its caveats, guarantees and use cases offered by it.
How we use it @ZaprMediaLabs.
Apache Kafka is the de facto standard for data streaming to process data in motion. With its significant adoption growth across all industries, I get a very valid question every week: When NOT to use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How to qualify Kafka out as it is not the right tool for the job?
This session explores the DOs and DONTs. Separate sections explain when to use Kafka, when NOT to use Kafka, and when to MAYBE use Kafka.
No matter if you think about open source Apache Kafka, a cloud service like Confluent Cloud, or another technology using the Kafka protocol like Redpanda or Pulsar, check out this slide deck.
A detailed article about this topic:
https://www.kai-waehner.de/blog/2022/01/04/when-not-to-use-apache-kafka/
Kappa vs Lambda Architectures and Technology ComparisonKai Wähner
Real-time data beats slow data. That’s true for almost every use case. Nevertheless, enterprise architects build new infrastructures with the Lambda architecture that includes separate batch and real-time layers.
This video explores why a single real-time pipeline, called Kappa architecture, is the better fit for many enterprise architectures. Real-world examples from companies such as Disney, Shopify, Uber, and Twitter explore the benefits of Kappa but also show how batch processing fits into this discussion positively without the need for a Lambda architecture.
The main focus of the discussion is on Apache Kafka (and its ecosystem) as the de facto standard for event streaming to process data in motion (the key concept of Kappa), but the video also compares various technologies and vendors such as Confluent, Cloudera, IBM Red Hat, Apache Flink, Apache Pulsar, AWS Kinesis, Amazon MSK, Azure Event Hubs, Google Pub Sub, and more.
Video recording of this presentation:
https://youtu.be/j7D29eyysDw
Further reading:
https://www.kai-waehner.de/blog/2021/09/23/real-time-kappa-architecture-mainstream-replacing-batch-lambda/
https://www.kai-waehner.de/blog/2021/04/20/comparison-open-source-apache-kafka-vs-confluent-cloudera-red-hat-amazon-msk-cloud/
https://www.kai-waehner.de/blog/2021/05/09/kafka-api-de-facto-standard-event-streaming-like-amazon-s3-object-storage/
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaKai Wähner
Streaming all over the World: Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka.
Learn about various case studies for event streaming with Apache Kafka across industries. The talk explores architectures for real-world deployments from Audi, BMW, Disney, Generali, Paypal, Tesla, Unity, Walmart, William Hill, and more. Use cases include fraud detection, mainframe offloading, predictive maintenance, cybersecurity, edge computing, track&trace, live betting, and much more.
ksqlDB: A Stream-Relational Database Systemconfluent
Speaker: Matthias J. Sax, Software Engineer, Confluent
ksqlDB is a distributed event streaming database system that allows users to express SQL queries over relational tables and event streams. The project was released by Confluent in 2017 and is hosted on Github and developed with an open-source spirit. ksqlDB is built on top of Apache Kafka®, a distributed event streaming platform. In this talk, we discuss ksqlDB’s architecture that is influenced by Apache Kafka and its stream processing library, Kafka Streams. We explain how ksqlDB executes continuous queries while achieving fault tolerance and high vailability. Furthermore, we explore ksqlDB’s streaming SQL dialect and the different types of supported queries.
Matthias J. Sax is a software engineer at Confluent working on ksqlDB. He mainly contributes to Kafka Streams, Apache Kafka's stream processing library, which serves as ksqlDB's execution engine. Furthermore, he helps evolve ksqlDB's "streaming SQL" language. In the past, Matthias also contributed to Apache Flink and Apache Storm and he is an Apache committer and PMC member. Matthias holds a Ph.D. from Humboldt University of Berlin, where he studied distributed data stream processing systems.
https://db.cs.cmu.edu/events/quarantine-db-talk-2020-confluent-ksqldb-a-stream-relational-database-system/
Kafka and Confluent are nice, but what about the integration with public clouds like Azure. Or even better, to integrate Kafka and Confluent with a managed API management like Azure API Gateway.
In this talk I will show you how it is possible to integrate an event streaming platform like Confluent into an enterprise API Management and different other services to build up a lambda based data platform architecture.
3 Kafka patterns to deliver Streaming Machine Learning models with Andrea Spi...HostedbyConfluent
The presentation highlights the main technical challenges Radicalbit faced while building a real-time serving engine for streaming Machine Learning algorithms. The speech describes how Kafka has been used to fasten two ML technologies together: River, an open-source suite of streaming machine learning algorithms, and Seldon-core, a DevOps-driven MLOps platform.
In particular, the talk focuses on how Kafka has been used to (1) build a dynamic model serving framework thanks to Kafka Streams joins and the broadcasting pattern (2) implement a Kafka user-given feedback topic by which online models can learn while they generate predictions, and (3) design a models' prediction bus, a particular Kafka bidirectional topic whereby predictions flow at tremendous scale; the prediction bus enabled seldon-core Kubernetes deployment to communicate with Kafka Streams, and as a conclusive subject this speech explains how this unleashed unprecedented performance.
Apache kafka performance(latency)_benchmark_v0.3SANG WON PARK
Apache Kafka를 이용하여 이미지 데이터를 얼마나 빠르게(with low latency) 전달 가능한지 성능 테스트.
최종 목적은 AI(ML/DL) 모델의 입력으로 대량의 실시간 영상/이미지 데이터를 전달하는 메세지 큐로 사용하기 위하여, Drone/제조공정 등의 장비에서 전송된 이미지를 얼마나 빨리 AI Model로 전달 할 수 있는지 확인하기 위함.
그래서 Kafka에서 이미지를 전송하는 간단한 테스트를 진행하였고,
이 과정에서 latency를 얼마나 줄여주는지를 확인해 보았다.(HTTP 프로토콜/Socket과 비교하여)
[현재 까지 결론]
- Apache Kafka는 대량의 요청 처리를 위한 throughtput에 최적화 된 솔루션임.
- 현재는 producer의 몇가지 옵션만 조정하여 테스트한 결과이므로,
- 잠정적인 결과이지만, kafka의 latency를 향상을 위해서는 많은 시도가 필요할 것 같음.
- 즉, 단일 요청의 latency는 확실히 느리지만,
- 대량의 처리를 기준으로 평균 latency를 비교하면 평균적인 latency는 많이 낮아짐.
Test Code : https://github.com/freepsw/kafka-latency-test
Kafka is most popular messaging queue.
Key Areas:
What is Messgaing Queue?
Why Messaging Queue?
Kafka- basic terminologies
Kafka- Architecture (Message Flow)
AWS SQS vs Apache Kafka
Apache Pulsar Development 101 with PythonTimothy Spann
Apache Pulsar Development 101 with Python PS2022_Ecosystem_v0.0
There is always the fear a speaker cannot make it. So just in case, since I was the MC for the ecosystem track I put together a talk just in case.
Here it is. Never seen or presented.
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안SANG WON PARK
Apache Kafak의 빅데이터 아키텍처에서 역할이 점차 커지고, 중요한 비중을 차지하게 되면서, 성능에 대한 고민도 늘어나고 있다.
다양한 프로젝트를 진행하면서 Apache Kafka를 모니터링 하기 위해 필요한 Metrics들을 이해하고, 이를 최적화 하기 위한 Configruation 설정을 정리해 보았다.
[Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안]
Apache Kafka 성능 모니터링에 필요한 metrics에 대해 이해하고, 4가지 관점(처리량, 지연, Durability, 가용성)에서 성능을 최적화 하는 방안을 정리함. Kafka를 구성하는 3개 모듈(Producer, Broker, Consumer)별로 성능 최적화를 위한 …
[Apache Kafka 모니터링을 위한 Metrics 이해]
Apache Kafka의 상태를 모니터링 하기 위해서는 4개(System(OS), Producer, Broker, Consumer)에서 발생하는 metrics들을 살펴봐야 한다.
이번 글에서는 JVM에서 제공하는 JMX metrics를 중심으로 producer/broker/consumer의 지표를 정리하였다.
모든 지표를 정리하진 않았고, 내 관점에서 유의미한 지표들을 중심으로 이해한 내용임
[Apache Kafka 성능 Configuration 최적화]
성능목표를 4개로 구분(Throughtput, Latency, Durability, Avalibility)하고, 각 목표에 따라 어떤 Kafka configuration의 조정을 어떻게 해야하는지 정리하였다.
튜닝한 파라미터를 적용한 후, 성능테스트를 수행하면서 추출된 Metrics를 모니터링하여 현재 업무에 최적화 되도록 최적화를 수행하는 것이 필요하다.
It covers a brief introduction to Apache Kafka Connect, giving insights about its benefits,use cases, motivation behind building Kafka Connect.And also a short discussion on its architecture.
IBM WebSphere MQ: Managing Workloads, Scaling and Availability with MQ ClustersDavid Ware
IBM WebSphere MQ Clustering can be used to solve many problems, from simplified administration and workload management in an MQ network, to horizontal scalability and continuous availability of messaging applications. This session will show the full range of uses of MQ Clusters to solve real problems, highlighting the underlying technology being used.
This has been superseded by http://www.slideshare.net/DavidWare1/ame-2273-mq-clustering-pdf
Watch this talk here: https://www.confluent.io/online-talks/from-zero-to-hero-with-kafka-connect-on-demand
Integrating Apache Kafka® with other systems in a reliable and scalable way is often a key part of a streaming platform. Fortunately, Apache Kafka includes the Connect API that enables streaming integration both in and out of Kafka. Like any technology, understanding its architecture and deployment patterns is key to successful use, as is knowing where to go looking when things aren't working.
This talk will discuss the key design concepts within Apache Kafka Connect and the pros and cons of standalone vs distributed deployment modes. We'll do a live demo of building pipelines with Apache Kafka Connect for streaming data in from databases, and out to targets including Elasticsearch. With some gremlins along the way, we'll go hands-on in methodically diagnosing and resolving common issues encountered with Apache Kafka Connect. The talk will finish off by discussing more advanced topics including Single Message Transforms, and deployment of Apache Kafka Connect in containers.
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...HostedbyConfluent
If you have already worked on various Kafka Streams applications before, then you have probably found yourself in the situation of rewriting the same piece of code again and again.
Whether it's to manage processing failures or bad records, to use interactive queries, to organize your code, to deploy or to monitor your Kafka Streams app, build some in-house libraries to standardize common patterns across your projects seems to be unavoidable.
And, if you're new to Kafka Streams you might be interested to know what are those patterns to use for your next streaming project.
In this talk, I propose to introduce you to Azkarra, an open-source lightweight Java framework that was designed to provide most of that stuffs off-the-shelf by leveraging the best-of-breed ideas and proven practices from the Apache Kafka community.
ksqlDB: A Stream-Relational Database Systemconfluent
Speaker: Matthias J. Sax, Software Engineer, Confluent
ksqlDB is a distributed event streaming database system that allows users to express SQL queries over relational tables and event streams. The project was released by Confluent in 2017 and is hosted on Github and developed with an open-source spirit. ksqlDB is built on top of Apache Kafka®, a distributed event streaming platform. In this talk, we discuss ksqlDB’s architecture that is influenced by Apache Kafka and its stream processing library, Kafka Streams. We explain how ksqlDB executes continuous queries while achieving fault tolerance and high vailability. Furthermore, we explore ksqlDB’s streaming SQL dialect and the different types of supported queries.
Matthias J. Sax is a software engineer at Confluent working on ksqlDB. He mainly contributes to Kafka Streams, Apache Kafka's stream processing library, which serves as ksqlDB's execution engine. Furthermore, he helps evolve ksqlDB's "streaming SQL" language. In the past, Matthias also contributed to Apache Flink and Apache Storm and he is an Apache committer and PMC member. Matthias holds a Ph.D. from Humboldt University of Berlin, where he studied distributed data stream processing systems.
https://db.cs.cmu.edu/events/quarantine-db-talk-2020-confluent-ksqldb-a-stream-relational-database-system/
Kafka and Confluent are nice, but what about the integration with public clouds like Azure. Or even better, to integrate Kafka and Confluent with a managed API management like Azure API Gateway.
In this talk I will show you how it is possible to integrate an event streaming platform like Confluent into an enterprise API Management and different other services to build up a lambda based data platform architecture.
3 Kafka patterns to deliver Streaming Machine Learning models with Andrea Spi...HostedbyConfluent
The presentation highlights the main technical challenges Radicalbit faced while building a real-time serving engine for streaming Machine Learning algorithms. The speech describes how Kafka has been used to fasten two ML technologies together: River, an open-source suite of streaming machine learning algorithms, and Seldon-core, a DevOps-driven MLOps platform.
In particular, the talk focuses on how Kafka has been used to (1) build a dynamic model serving framework thanks to Kafka Streams joins and the broadcasting pattern (2) implement a Kafka user-given feedback topic by which online models can learn while they generate predictions, and (3) design a models' prediction bus, a particular Kafka bidirectional topic whereby predictions flow at tremendous scale; the prediction bus enabled seldon-core Kubernetes deployment to communicate with Kafka Streams, and as a conclusive subject this speech explains how this unleashed unprecedented performance.
Apache kafka performance(latency)_benchmark_v0.3SANG WON PARK
Apache Kafka를 이용하여 이미지 데이터를 얼마나 빠르게(with low latency) 전달 가능한지 성능 테스트.
최종 목적은 AI(ML/DL) 모델의 입력으로 대량의 실시간 영상/이미지 데이터를 전달하는 메세지 큐로 사용하기 위하여, Drone/제조공정 등의 장비에서 전송된 이미지를 얼마나 빨리 AI Model로 전달 할 수 있는지 확인하기 위함.
그래서 Kafka에서 이미지를 전송하는 간단한 테스트를 진행하였고,
이 과정에서 latency를 얼마나 줄여주는지를 확인해 보았다.(HTTP 프로토콜/Socket과 비교하여)
[현재 까지 결론]
- Apache Kafka는 대량의 요청 처리를 위한 throughtput에 최적화 된 솔루션임.
- 현재는 producer의 몇가지 옵션만 조정하여 테스트한 결과이므로,
- 잠정적인 결과이지만, kafka의 latency를 향상을 위해서는 많은 시도가 필요할 것 같음.
- 즉, 단일 요청의 latency는 확실히 느리지만,
- 대량의 처리를 기준으로 평균 latency를 비교하면 평균적인 latency는 많이 낮아짐.
Test Code : https://github.com/freepsw/kafka-latency-test
Kafka is most popular messaging queue.
Key Areas:
What is Messgaing Queue?
Why Messaging Queue?
Kafka- basic terminologies
Kafka- Architecture (Message Flow)
AWS SQS vs Apache Kafka
Apache Pulsar Development 101 with PythonTimothy Spann
Apache Pulsar Development 101 with Python PS2022_Ecosystem_v0.0
There is always the fear a speaker cannot make it. So just in case, since I was the MC for the ecosystem track I put together a talk just in case.
Here it is. Never seen or presented.
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안SANG WON PARK
Apache Kafak의 빅데이터 아키텍처에서 역할이 점차 커지고, 중요한 비중을 차지하게 되면서, 성능에 대한 고민도 늘어나고 있다.
다양한 프로젝트를 진행하면서 Apache Kafka를 모니터링 하기 위해 필요한 Metrics들을 이해하고, 이를 최적화 하기 위한 Configruation 설정을 정리해 보았다.
[Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안]
Apache Kafka 성능 모니터링에 필요한 metrics에 대해 이해하고, 4가지 관점(처리량, 지연, Durability, 가용성)에서 성능을 최적화 하는 방안을 정리함. Kafka를 구성하는 3개 모듈(Producer, Broker, Consumer)별로 성능 최적화를 위한 …
[Apache Kafka 모니터링을 위한 Metrics 이해]
Apache Kafka의 상태를 모니터링 하기 위해서는 4개(System(OS), Producer, Broker, Consumer)에서 발생하는 metrics들을 살펴봐야 한다.
이번 글에서는 JVM에서 제공하는 JMX metrics를 중심으로 producer/broker/consumer의 지표를 정리하였다.
모든 지표를 정리하진 않았고, 내 관점에서 유의미한 지표들을 중심으로 이해한 내용임
[Apache Kafka 성능 Configuration 최적화]
성능목표를 4개로 구분(Throughtput, Latency, Durability, Avalibility)하고, 각 목표에 따라 어떤 Kafka configuration의 조정을 어떻게 해야하는지 정리하였다.
튜닝한 파라미터를 적용한 후, 성능테스트를 수행하면서 추출된 Metrics를 모니터링하여 현재 업무에 최적화 되도록 최적화를 수행하는 것이 필요하다.
It covers a brief introduction to Apache Kafka Connect, giving insights about its benefits,use cases, motivation behind building Kafka Connect.And also a short discussion on its architecture.
IBM WebSphere MQ: Managing Workloads, Scaling and Availability with MQ ClustersDavid Ware
IBM WebSphere MQ Clustering can be used to solve many problems, from simplified administration and workload management in an MQ network, to horizontal scalability and continuous availability of messaging applications. This session will show the full range of uses of MQ Clusters to solve real problems, highlighting the underlying technology being used.
This has been superseded by http://www.slideshare.net/DavidWare1/ame-2273-mq-clustering-pdf
Watch this talk here: https://www.confluent.io/online-talks/from-zero-to-hero-with-kafka-connect-on-demand
Integrating Apache Kafka® with other systems in a reliable and scalable way is often a key part of a streaming platform. Fortunately, Apache Kafka includes the Connect API that enables streaming integration both in and out of Kafka. Like any technology, understanding its architecture and deployment patterns is key to successful use, as is knowing where to go looking when things aren't working.
This talk will discuss the key design concepts within Apache Kafka Connect and the pros and cons of standalone vs distributed deployment modes. We'll do a live demo of building pipelines with Apache Kafka Connect for streaming data in from databases, and out to targets including Elasticsearch. With some gremlins along the way, we'll go hands-on in methodically diagnosing and resolving common issues encountered with Apache Kafka Connect. The talk will finish off by discussing more advanced topics including Single Message Transforms, and deployment of Apache Kafka Connect in containers.
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...HostedbyConfluent
If you have already worked on various Kafka Streams applications before, then you have probably found yourself in the situation of rewriting the same piece of code again and again.
Whether it's to manage processing failures or bad records, to use interactive queries, to organize your code, to deploy or to monitor your Kafka Streams app, build some in-house libraries to standardize common patterns across your projects seems to be unavoidable.
And, if you're new to Kafka Streams you might be interested to know what are those patterns to use for your next streaming project.
In this talk, I propose to introduce you to Azkarra, an open-source lightweight Java framework that was designed to provide most of that stuffs off-the-shelf by leveraging the best-of-breed ideas and proven practices from the Apache Kafka community.
Kafka Connect & Kafka Streams/KSQL - the ecosystem around KafkaGuido Schmutz
After a quick overview and introduction of Apache Kafka, this session cover two components which extend the core of Apache Kafka: Kafka Connect and Kafka Streams/KSQL.
Kafka Connects role is to access data from the out-side-world and make it available inside Kafka by publishing it into a Kafka topic. On the other hand, Kafka Connect is also responsible to transport information from inside Kafka to the outside world, which could be a database or a file system. There are many existing connectors for different source and target systems available out-of-the-box, either provided by the community or by Confluent or other vendors. You simply configure these connectors and off you go.
Kafka Streams is a light-weight component which extends Kafka with stream processing functionality. By that, Kafka can now not only reliably and scalable transport events and messages through the Kafka broker but also analyse and process these event in real-time. Interestingly Kafka Streams does not provide its own cluster infrastructure and it is also not meant to run on a Kafka cluster. The idea is to run Kafka Streams where it makes sense, which can be inside a “normal” Java application, inside a Web container or on a more modern containerized (cloud) infrastructure, such as Mesos, Kubernetes or Docker. Kafka Streams has a lot of interesting features, such as reliable state handling, queryable state and much more. KSQL is a streaming engine for Apache Kafka, providing a simple and completely interactive SQL interface for processing data in Kafka.
Spark Streaming has quickly established itself as one of the more popular Streaming Engines running on the Hadoop Ecosystem. Not only does it provide integration with many type of message brokers and stream sources, but it also provides the ability to leverage other major modules in Spark like Spark SQL and MLib in conjunction. This allows for businesses and developers to make use out of data in ways they couldn’t hope to do in the past.
However, while building a Spark Streaming pipeline, it’s not sufficient to only know how to express your business logic. Operationalizing these pipelines and running the application with high uptime and continuous monitoring has a lot of operational challenges. Fortunately, Spark Streaming makes all that easy as well. In this talk, we’ll go over some of the main steps you’ll need to take to get your Spark Streaming application ready for production, specifically in conjunction with Kafka. This includes steps to gracefully shutdown your application, steps to perform upgrades, monitoring, various useful spark configurations and more.
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...HostedbyConfluent
In our payments platform at Goldman Sachs Transaction Banking, Apache Kafka plays a critical role as the messaging bus in our micro-services architecture. Being a part of the financial service industry we need to ensure high-availability of our platform and quick response time during failures.
In this talk we will explore how we monitor and alert on the health of our Kafka clusters using our heartbeat application and clients using DataDog dashboards. We will see how we consolidate JMX metrics such as error-rates, connection-rates, latencies and consumer lag from all producers and consumers using JMX agent sidecar to provide a live view of the health of our entire infrastructure. We will also discuss our culture of game days where we regularly test the resiliency of all the clients in our infrastructure by simulating various failure scenarios to improve the overall availability of our infrastructure.
Event streaming applications unlock new benefits by combining various data feeds. However, getting actionable insights in a timely fashion has remained a challenge, as the data has been siloed in disparate systems. ksqlDB solves this by providing an interactive SQL interface that can seamlessly combine and transform data from various sources.
In this webinar, we will show how streaming queries of high throughput NoSQL systems can derive insights from various push/pull queries via ksqlDB's User-Defined Functions, Aggregate Functions and Table Functions.
Watch this to learn:
Real-world examples of the benefits of using a streaming database like ksqlDB and seamlessly combining data from Kafka & Cassandra/Scylla (NoSQL).
The functionality of ksqlDB via push/pull queries and UDFs/UDAFs/UDTFs.
The ease with which data stored in a NoSQL database can be transformed using ksqlDB and then persisted back for long-term storage.
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLScyllaDB
Event streaming applications unlock new benefits by combining various data feeds. However, getting actionable insights in a timely fashion has remained a challenge, as the data has been siloed in disparate systems. ksqlDB solves this by providing an interactive SQL interface that can seamlessly combine and transform data from various sources.
In this webinar, we will show how streaming queries of high throughput NoSQL systems can derive insights from various push/pull queries via ksqlDB's User-Defined Functions, Aggregate Functions and Table Functions.
Scalability, fault tolerance, distributed log…these are terms which we hear more and more these days. Make them happen is quite a challenge sometimes especially if our business need to be data intensive, agile and fast to market.
One way to answer to this challenge is microservices. These are small services that communicate to each other to deliver business value. The key word here is _communication_. Without communication all the power of microservices falls apart. And communication is not a trivial fact when involves systems with multiple data systems that are talking to one another over many channels. Each of the channel requiring their own protocol and communication methods. This is where communication can become a bottleneck if not handled properly.
One answer to this problem is Kafka, a distributed messaging system providing fast, highly scalable and redundant message exchange using a publish-subscribe model. And when we talk about fast we talk about one of the fastest messaging systems out there.
This presentation will show you an alternative way of doing microservices with event-driven architecture through Kafka.
Presenters:
Laszlo-Robert Albert (albertlaszlorobert [at] gmail [dot] com)
Dan Balescu (dfbalescu [at] gmail [dot] com)
Apache Kafka - Scalable Message Processing and more!Guido Schmutz
In the world of sensors and social media streams, the integration and handling of high-volume event streams is more important than ever. Events have to be handled both efficiently and reliably and often many consumers or systems are interested in all or part of the events. How do we make sure that all these event are accepted and forwarded in an efficient and reliable way? Apache Kafka, a distributed, highly-scalable messaging broker, build for exchanging huge amount of messages between a source and a target can be of great help in such scenario.
This session introduces Apache Kafka and its place in a modern architecture, shows its integration with Oracle Stack and presents the Oracle Event Hub cloud service, the managed Kafka service.
Introduction to Apache Kafka and Confluent... and why they matterconfluent
Milano Apache Kafka Meetup by Confluent (First Italian Kafka Meetup) on Wednesday, November 29th 2017.
Il talk introduce Apache Kafka (incluse le APIs Kafka Connect e Kafka Streams), Confluent (la società creata dai creatori di Kafka) e spiega perché Kafka è un'ottima e semplice soluzione per la gestione di stream di dati nel contesto di due delle principali forze trainanti e trend industriali: Internet of Things (IoT) e Microservices.
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
Presentation @ Oracle Code Berlin.
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafaka comes into play, a distirbuted, highly-scalable messaging broker, build for exchanging huge amounts of messages between a source and a target. This session will start with an introduction of Apache and presents the role of Apache Kafka in a modern data / information architecture and the advantages it brings to the table.
Introduction to Apache Kafka and Confluent... and why they matter!Paolo Castagna
This is a short introduction to Apache Kafka and Confluent (the company founded by the creator of Kafka). The slides cover Apache Kafka APIs including Kafka Connect and Kafka Streams (part of Apache Kafka). Other open source, ASL licensed, projects are mentioned: #KSQL, Schema Registry, REST Proxy, etc.
Many thanks to Codemotion and Seacom for hosting the event.
Changing landscapes in data integration - Kafka Connect for near real-time da...HostedbyConfluent
In 2019 we presented “Secure Kafka at scale in true multi-tenant environment” at SFO Kafka summit. Back then, kafka was mainly used for event driven architectures, high-throughput pub/sub use cases and as a data-plane for log aggregation and for transporting metadata & metrics. A lot has changed since then - Kafka plant has grown to handle 400B incoming events in a day just in production, introduced stretch cluster pattern in addition to Active-Active cluster replication pattern. Moreover, new use cases have emerged in using Kafka for near real time stream processing and data moving pipelines across the cloud environments. Moving data in near-real time across the system is a hard problem to solve. Kafka Connect is a framework to stream data in/out of kafka reliably and can be used to achieve near-real time data moving pipelines. In this talk, we will present how kafka adoption has evolved over the last couple of years in our space and deep dive into how we approached in providing Managed Kafka Connect, a newest addition to our service portfolio.
Domain Driven Design provides not only the strategic guidelines for decomposing a large system into microservices, but also offers the main tactical pattern that helps in decoupling microservices. The presentation will focus on the way domain events could be implemented using Kafka and the trade-offs between consistency and availability that are supported by Kafka.
https://youtu.be/P6IaxNcn-Ag?t=1466
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Welocme to ViralQR, your best QR code generator.ViralQR
Welcome to ViralQR, your best QR code generator available on the market!
At ViralQR, we design static and dynamic QR codes. Our mission is to make business operations easier and customer engagement more powerful through the use of QR technology. Be it a small-scale business or a huge enterprise, our easy-to-use platform provides multiple choices that can be tailored according to your company's branding and marketing strategies.
Our Vision
We are here to make the process of creating QR codes easy and smooth, thus enhancing customer interaction and making business more fluid. We very strongly believe in the ability of QR codes to change the world for businesses in their interaction with customers and are set on making that technology accessible and usable far and wide.
Our Achievements
Ever since its inception, we have successfully served many clients by offering QR codes in their marketing, service delivery, and collection of feedback across various industries. Our platform has been recognized for its ease of use and amazing features, which helped a business to make QR codes.
Our Services
At ViralQR, here is a comprehensive suite of services that caters to your very needs:
Static QR Codes: Create free static QR codes. These QR codes are able to store significant information such as URLs, vCards, plain text, emails and SMS, Wi-Fi credentials, and Bitcoin addresses.
Dynamic QR codes: These also have all the advanced features but are subscription-based. They can directly link to PDF files, images, micro-landing pages, social accounts, review forms, business pages, and applications. In addition, they can be branded with CTAs, frames, patterns, colors, and logos to enhance your branding.
Pricing and Packages
Additionally, there is a 14-day free offer to ViralQR, which is an exceptional opportunity for new users to take a feel of this platform. One can easily subscribe from there and experience the full dynamic of using QR codes. The subscription plans are not only meant for business; they are priced very flexibly so that literally every business could afford to benefit from our service.
Why choose us?
ViralQR will provide services for marketing, advertising, catering, retail, and the like. The QR codes can be posted on fliers, packaging, merchandise, and banners, as well as to substitute for cash and cards in a restaurant or coffee shop. With QR codes integrated into your business, improve customer engagement and streamline operations.
Comprehensive Analytics
Subscribers of ViralQR receive detailed analytics and tracking tools in light of having a view of the core values of QR code performance. Our analytics dashboard shows aggregate views and unique views, as well as detailed information about each impression, including time, device, browser, and estimated location by city and country.
So, thank you for choosing ViralQR; we have an offer of nothing but the best in terms of QR code services to meet business diversity!
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™UiPathCommunity
In questo evento online gratuito, organizzato dalla Community Italiana di UiPath, potrai esplorare le nuove funzionalità di Autopilot, il tool che integra l'Intelligenza Artificiale nei processi di sviluppo e utilizzo delle Automazioni.
📕 Vedremo insieme alcuni esempi dell'utilizzo di Autopilot in diversi tool della Suite UiPath:
Autopilot per Studio Web
Autopilot per Studio
Autopilot per Apps
Clipboard AI
GenAI applicata alla Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
3. Apache Kafka
Open-source distributed event streaming platform
“event streaming is the practice of
• capturing data in real-time from event sources (like databases, sensors, mobile
devices, cloud services, and software applications) in the form of streams of
events;
• storing these event streams durably for later retrieval;
• manipulating, processing, and reacting to the event streams in real-time as well
as retrospectively; and
• routing the event streams to different destination technologies as needed.”
-- https://kafka.apache.org/documentation/
3
6. Partitioner
Producer will decide target partition to place any message, depending on:
1. Partition id, if it's specified within the message
2. key % num partitions, if no partition id is mentioned
3. Round robin if neither partition id nor message key are available in message,
meaning only value is available
6
8. Kafka vs. Message Broker
Kafka
8
• Dumb-Broker and Smart-Consumer
• Pull
• Topics/Partitions/Groups
• Connectors
• Stronger guarantees of messages
ordering
• Persisted events
Message Broker (e.g. RabbitMQ
• Smart-Broker and Dumb-Consumer
• Push
• Queues/Exchanges/Routing
• Superior support when it comes to
routing and filtering messages for
consumers to use.
• Message Time-To-Live and
delayed/scheduled messages
• Delivery retries and dead-letter
exchanges/queues
15. Overflow management
// Set the max size to 10 and fail if reached
@OnOverflow(value = OnOverflow.Strategy.BUFFER, bufferSize = 10)
@Inject @Channel("channel") Emitter<String> emitterWithBuffer;
// [DANGER ZONE] no limit
@OnOverflow(OnOverflow.Strategy.UNBOUNDED_BUFFER)
@Inject @Channel("channel") Emitter<String> danger;
// Drop the new messages if the size is reached
@OnOverflow(OnOverflow.Strategy.DROP)
@Inject @Channel("channel") Emitter<String> dropping;
// Drop the previously sent messages if the size is reached
@OnOverflow(OnOverflow.Strategy.LATEST)
@Inject @Channel("channel") Emitter<String> dropOldMessages;
https://smallrye.io/smallrye-reactive-messaging/smallrye-reactive-messaging/3.2/emitter/emitter.html#emitter-overflow