Flink Forward Berlin 2018: Ravi Suhag & Sumanth Nakshatrithaya - "Managing Fl..." (Flink Forward)
At GO-JEK, we build products that help millions of Indonesians commute, shop, eat and pay, daily. The Data Engineering team is responsible for creating a reliable data infrastructure across all of GO-JEK's 18+ products. We use Flink extensively to provide real-time streaming aggregation and analytics for billions of data points generated on a daily basis. Working at such a large scale makes it really important to automate operations, from infrastructure to failover and monitoring. This way we can push features faster without causing chaos and disruption to the production environment.
1. Provisioning and deployment: Given the nature of business at GO-JEK, we find ourselves provisioning Flink clusters quite often. Currently we run around 1,000 jobs across 10 clusters for different data streams, with the number of requests increasing day by day. We also provision on-the-fly clusters with custom configuration for load testing, experimentation and chaos engineering. Provisioning this many clusters from the ground up required a lot of man-hours and involved setting up virtual machines, monitoring agents, access management, configuration management, load testing and data stream integration. Our current setup runs Flink over YARN clusters as well as Kubernetes. We use our in-house provisioning tool Odin, built on top of Terraform and Chef, for YARN clusters, and Kubernetes controllers for Kubernetes-based deployments. It enables us to safely and predictably create and modify Flink infrastructure. Odin has helped us reduce provisioning time by 99% despite the increasing number of requests.
2. Isolation and access control: Given the real-time and distributed nature of GO-JEK's services, events are classified into different streams depending on their nature, time and transactional criticality, sensitivity and volume of data. This requires setting up separate clusters based on security concerns, team segregation, job loads and criticality, which comes at the cost of handling large-volume data replication and maintenance.
3. Data quality control: The quality of ingestion events is controlled by a Protobuf-based, version-controlled, strict event type schema with a fully automated deployment pipeline. Deployed jobs are locked to a certain data schema and version, which helps us prevent accidental breaking schema changes and preserve backward compatibility during migration and failover.
4. Monitoring and alerting: All clusters are monitored using a dedicated TICK setup. We monitor clusters for resource utilization, job stats and business impact per job.
5. Failover and upgrading: Failover and upgrade operations are fully automated for YARN cluster failover and input stream failovers, e.g. Kafka failover with stateless job strategies. This lets us move jobs from one cluster to another without any data loss or broken metric flow.
6. Chaos engineering and load testing: Loki is our disaster simulation tool that helps ensure the Flink infrastructure can tolerate failures.
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea..." (Flink Forward)
Stream processing still evolves and changes at a speed that can make it hard to keep up with the developments. Being at the forefront of stream processing technology, the evolution of Apache Flink has mirrored many of these developments and continues to do so.
We will take you on a journey through the major milestones of stream processing technology in past years, diving into the latest additions that Apache Flink and other communities introduced to the stream processing landscape, such as Streaming SQL, time-versioned tables, cluster-library duality, language portability, etc.
We will take a sneak peek into our crystal ball and present what the Flink community is working on next.
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st..." (Flink Forward)
One of the main characteristics of a good streaming pipeline is correctness of event-time processing. The real challenges arise when such a pipeline must be resilient to different types of failures. In this talk, we describe how Criteo runs Flink on one of the biggest YARN clusters in Europe and processes 100k messages per second to account for the revenue of our platform within a delay of 5 minutes. The real-time revenue monitoring system keeps discrepancies under 1% and minimizes business impact in case of revenue anomalies.
Maximize the Business Value of Machine Learning and Data Science with Kafka (... (confluent)
Today, many companies that have lots of data are still struggling to derive value from machine learning (ML) and data science investments. Why? Accessing the data may be difficult. Or maybe it’s poorly labeled. Or vital context is missing. Or there are questions around data integrity. Or standing up an ML service can be cumbersome and complex.
At Nuuly, we offer an innovative clothing rental subscription model and are continually evolving our ML solutions to gain insight into the behaviors of our unique customer base as well as provide personalized services. In this session, I’ll share how we used event streaming with Apache Kafka® and Confluent Cloud to address many of the challenges that may be keeping your organization from maximizing the business value of machine learning and data science. First, you’ll see how we ensure that every customer interaction and its business context is collected. Next, I’ll explain how we can replay entire interaction histories using Kafka as a transport layer as well as a persistence layer and a business application processing layer. Order management, inventory management, logistics, subscription management – all of it integrates with Kafka as the common backbone. These data streams enable Nuuly to rapidly prototype and deploy dynamic ML models to support various domains, including pricing, recommendations, product similarity, and warehouse optimization. Join us and learn how Kafka can help improve machine learning and data science initiatives that may not yet be delivering their full potential.
This is a talk that I gave at the Data Council Berlin Meetup on May 16th, 2019
Abstract:
Stream processing is being rapidly adopted by the enterprise. While in the past, stream processing frameworks mostly provided Java- or Scala-based APIs, stream processing with SQL is growing increasingly popular because it makes stream processing accessible to non-programmers and significantly reduces the effort to solve common tasks.
About three years ago, the Apache Flink community started adding SQL support to process static and streaming data in a unified fashion. Today, Flink SQL powers production systems at Alibaba, Huawei, Lyft, and Uber. Fabian Hueske discusses the current state of Flink’s SQL support and explains the importance of Flink’s unified approach to process static and streaming data. After covering the basics, he shares common real-world use cases ranging from low-latency ETL to pattern detection and demonstrates how easily they can be addressed with Flink SQL.
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action" (Flink Forward)
SQL is the lingua franca of data processing, and everybody working with data knows SQL. Apache Flink provides SQL support for querying and processing batch and streaming data. Flink's SQL support powers large-scale production systems at Alibaba, Huawei, and Uber. Based on Flink SQL, these companies have built systems for their internal users as well as publicly offered services for paying customers. In my talk I will show how to leverage the simplicity and power of SQL on Flink. I’ll explain why unified batch and stream processing is important and what it means to run SQL queries on streams of data. Once we’ve covered the basics, I will spend the remainder of the talk demonstrating the capabilities of Flink SQL. We will explore different use cases that Flink SQL was designed for by running queries on Flink’s SQL shell. In particular, I will demonstrate the unified batch and streaming engine by running the same query on batch and streaming data and show how to build a real-time dashboard that is powered by a streaming SQL query, which continuously updates an external result table.
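A continuously updating dashboard query of that kind can be sketched as follows. This is my illustration, not the talk's SQL-shell demo: it uses the Table API method names of more recent Flink releases, a hypothetical orders table, and Flink's built-in datagen connector standing in for a real Kafka topic.

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class FlinkSqlDashboardSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // Hypothetical source table; 'datagen' produces an unbounded stream of
        // random rows, so the same query text also works on a bounded source.
        tableEnv.executeSql(
            "CREATE TABLE orders (" +
            " product STRING," +
            " amount DOUBLE," +
            " ts AS PROCTIME()" +        // processing-time attribute for windowing
            ") WITH (" +
            " 'connector' = 'datagen'," +
            " 'rows-per-second' = '5'," +
            " 'fields.product.length' = '3'" +
            ")");

        // A continuous query: the result updates as new rows stream in,
        // which is exactly what powers a live dashboard.
        tableEnv.executeSql(
            "SELECT product," +
            " TUMBLE_START(ts, INTERVAL '10' SECOND) AS window_start," +
            " SUM(amount) AS revenue" +
            " FROM orders" +
            " GROUP BY product, TUMBLE(ts, INTERVAL '10' SECOND)")
            .print();
    }
}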
Building Event Streaming Applications with Pac-Man (Ricardo Ferreira, Conflue... (HostedbyConfluent)
Since Pac-Man was originally released in the '80s, it has been a beacon of fun and joy for people of all ages. What few people know is that this game can also be used to inspire developers on how to build event streaming applications. In this near-zero-slides talk, attendees will get to play the game to generate events. As they play, the presenter will write from scratch a scoreboard using ksqlDB -- an open-source event streaming database built for Apache Kafka.
After building the scoreboard, we will discuss different strategies to make the data available elsewhere so that any interested service can leverage it with ease. Example services will be provided that monitor the scoreboard in near real time, revealing who is the most proficient Pac-Man player in the room.
Stream Processing Live Traffic Data with Kafka Streams (Tom Van den Bulck)
In this workshop we will set up a streaming framework which will process real-time data from traffic sensors installed within the Belgian road system.
Starting with the intake of the data, you will learn best practices and the recommended approach to split the information into events in a way that won't come back to haunt you.
With some basic stream operations (count, filter, ... ) you will get to know the data and experience how easy it is to get things done with Spring Boot & Spring Cloud Stream.
But since simple data processing is not enough to fulfill all your streaming needs, we will also let you experience the power of windows.
After this workshop, tumbling, sliding and session windows hold no more mysteries and you will be a true streaming wizard.
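To give a flavor of those windowed operations, here is a minimal sketch in the plain Kafka Streams DSL (the workshop itself uses Spring Boot and Spring Cloud Stream); the topic names and the 5-minute tumbling window are hypothetical, and the API names are from recent Kafka releases.

import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

public class TrafficWindowCounts {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Sensor readings keyed by sensor id; the value is the raw measurement payload.
        KStream<String, String> readings =
            builder.stream("traffic-sensors", Consumed.with(Serdes.String(), Serdes.String()));

        // Count readings per sensor in 5-minute tumbling windows and emit
        // "<sensorId>@<windowStart> -> count" records to an output topic.
        readings
            .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
            .count()
            .toStream()
            .map((window, count) -> KeyValue.pair(
                window.key() + "@" + window.window().startTime(), count.toString()))
            .to("traffic-counts", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "traffic-window-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}

Sliding and session windows follow the same shape: swap TimeWindows for SlidingWindows or SessionWindows in the windowedBy call.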
Presented at Stream Processing Meetup (7/19/2018)(https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/251481797/).
At Uber, we operate 20+ Kafka clusters to collect system and application logs as well as event data from rider and driver apps. We need a Kafka replication solution to replicate data between Kafka clusters across multiple data centers for different purposes. This talk will introduce the history behind uReplicator and its high-level architecture. As the original uReplicator ran into scalability challenges and operational overhead when the scale of the Kafka clusters increased, we built the Federated uReplicator, which addresses the above issues and provides an extensible architecture for further scaling.
On Track with Apache Kafka®: Building a Streaming ETL Solution with Rail Data (confluent)
Watch this talk here: https://www.confluent.io/online-talks/building-a-streaming-etl-solution-with-apache-kafka-rail-data-on-demand
As data engineers, we frequently need to build scalable systems working with data from a variety of sources and with various ingest rates, sizes, and formats. This talk takes an in-depth look at how Apache Kafka can be used to provide a common platform on which to build data infrastructure driving both real-time analytics as well as event-driven applications.
Using a public feed of railway data it will show how to ingest data from message queues such as ActiveMQ with Kafka Connect, as well as from static sources such as S3 and REST endpoints. We'll then see how to use stream processing to transform the data into a form useful for streaming to analytics in tools such as Elasticsearch and Neo4j. The same data will be used to drive a real-time notifications service through Telegram.
If you're wondering how to build your next scalable data platform, how to reconcile the impedance mismatch between stream and batch, and how to wrangle streams of data—this talk is for you!
Kafka, Killer of Point-to-Point Integrations, Lucian Lita (confluent)
With 60+ products and over 24% of the US GDP flowing through it, system integration is a tough problem for Intuit. Seasonality, scale, and massive peaks in products like TurboTax, QuickBooks, and Mint.com add extra layers of difficulty when building shared data services around transaction and user graphs, clickstream processing, a/b testing, and personalization. To reduce complexity and latency, we’ve implemented Kafka as the backbone across these data services. This allows us to asynchronously trigger relevant processing, elegantly scaling up and down as needed around peaks, all without the need for point-to-point integrations.
In this talk, we share what we’ve learned about Kafka at Intuit and describe our data services architecture. We found that Kafka is invaluable in achieving a scalable, clean architecture, allowing engineering teams to focus less on integration and more on product development.
(Krunal Vora, Tinder) Kafka Summit San Francisco 2018
At Tinder, we have been using Kafka for streaming and processing events, data science processes and many other integral jobs. Forming the core of the pipeline at Tinder, Kafka has been accepted as the pragmatic solution to match the ever-increasing scale of users, events and backend jobs. We, at Tinder, are investing time and effort to optimize the usage of Kafka to solve the problems we face in the dating-app context. Kafka forms the backbone for the company's plans to sustain performance through the envisioned scale as the company starts to grow in unexplored markets. Come learn about the implementation of Kafka at Tinder and how Kafka has helped solve the use cases for dating apps. Engage in the success story behind the business case of Kafka at Tinder.
The Future of Streaming: Global Apps, Event Stores and Serverless (Ben Stopford)
Stream processing affects a wide range of industries today: capturing sensor data, connecting microservices, processing the workloads of internet giants and giving us a real-time alternative to batch analytics.
While these use cases are exciting and valuable they are only a taste of what is to come. In this talk we look at three areas that are likely to become more prominent: Global Apps, Event Stores and Serverless Stream Processing
Flink Forward Berlin 2018: Xiaowei Jiang - Keynote: "Unified Engine for Data ..." (Flink Forward)
Flink started with the mission to unify batch and stream processing. We believe that Flink’s architecture is uniquely positioned to be a great engine for streaming, batch and AI workloads at the same time. We will talk about the work we did in this direction.
Building data pipelines is pretty hard! Building a multi-datacenter active-active real time data pipeline for multiple classes of data with different durability, latency and availability guarantees is much harder. Real time infrastructure powers critical pieces of Uber (think Surge) and in this talk we will discuss our architecture, technical challenges, learnings and how a blend of open source infrastructure (Apache Kafka and Flink) and in-house technologies have helped Uber scale.
Flink Forward Berlin 2018: Raj Subramani - "A streaming Quantitative Analytic..." (Flink Forward)
The application of Quantitative Analytics to trades for the generation of Risk and P&L metrics has traditionally followed a batch-based approach. Regulatory changes impose increasing compute demands on financial institutions, along with a growing demand for real-time analytics due to increased volumes in eTrading across all asset classes.
The talk is based on a use case for pricing Interest Rate Swaps, using Apache Beam, with a call to an external C++ analytics process. It describes the performance characteristics when operating in a non-cloud environment using Apache Flink as opposed to Google Cloud Dataflow.
The talk will touch upon the subtle differences when operating across multiple runners. It will make suggestions on approaches to portability when architecting for a multi-runner operational environment.
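As an illustration of that multi-runner portability (my sketch, not the speaker's code), a Beam pipeline is written once against the portable SDK and the runner is selected at launch time, e.g. --runner=FlinkRunner on-premises or --runner=DataflowRunner on Google Cloud. The file paths and the pricing step below are hypothetical stand-ins.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class SwapPricingPipeline {
    public static void main(String[] args) {
        // The runner (Flink, Dataflow, ...) is picked from the command line,
        // e.g. --runner=FlinkRunner; the pipeline code itself stays unchanged.
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline pipeline = Pipeline.create(options);

        pipeline
            .apply("ReadTrades", TextIO.read().from("trades/*.csv"))
            // Stand-in for the call out to the external C++ analytics process.
            .apply("PriceSwap", MapElements
                .into(TypeDescriptors.strings())
                .via((String trade) -> trade + ",priced"))
            .apply("WriteResults", TextIO.write().to("priced-trades"));

        pipeline.run().waitUntilFinish();
    }
}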
Hadoop Summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day (Ankur Bansal)
Building data pipelines is pretty hard! Building a multi-datacenter active-active real time data pipeline for multiple classes of data with different durability, latency and availability guarantees is much harder.
Real time infrastructure powers critical pieces of Uber (think Surge) and in this talk we will discuss our architecture, technical challenges, learnings and how a blend of open source infrastructure (Apache Kafka and Samza) and in-house technologies have helped Uber scale.
Observability for developers (Inny So & Andrew Jones, ThoughtWorks) - Kafka Su... (confluent)
Have you ever tried to debug a production outage, when your system comprises apps your team has written, third-party apps your team runs, with logs going into one system, application performance metrics going into another system, and cloud platform metrics going somewhere else? Did you find yourself switching tabs, trying to correlate metrics with logs and alerts and finding yourself in a huge tangle? It is a nightmare. In the data world, we talk about aggregating all our data so we can derive new insights quickly, but what about our operational data? Observability is your ability to ask questions of your system without having to write new code or grab new data. When you've got an observable system, it feels like you have debugging superpowers, but it can be challenging to even know where to start. And even once you've convinced your colleagues to start, finding the right tools can be challenging. In this talk Inny and Andrew will discuss why monitoring and logging are not sufficient anymore (if they ever were), cover observability basics, and demo an observability platform that you can use to start your observability journey today.
This talk aims to present some data-related microservices concepts: the duality between streams and tables, stream processing concepts and patterns, and code examples using Kafka Streams.
Apache Kafka as Event-Driven Open Source Streaming Platform (Prague Meetup) (Kai Wähner)
From Prague Kafka Meetup in November 2018.
This session introduces Apache Kafka as an event-driven open source streaming platform. Apache Kafka goes far beyond scalable, high-volume messaging. In addition, you can leverage Kafka Connect for integration and the Kafka Streams API for building lightweight stream processing microservices in autonomous teams. The open source Confluent Platform adds further components such as KSQL, Schema Registry, REST Proxy, clients for different programming languages and connectors for different technologies and databases.
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) (Kai Wähner)
Learn the differences between an event-driven streaming platform and middleware like MQ, ETL and ESBs – including best practices and anti-patterns, but also how these concepts and tools complement each other in an enterprise architecture.
Extract-Transform-Load (ETL) is still a widely-used pattern to move data between different systems via batch processing. Due to its challenges in today’s world where real time is the new standard, an Enterprise Service Bus (ESB) is used in many enterprises as integration backbone between any kind of microservice, legacy application or cloud service to move data via SOAP / REST Web Services or other technologies. Stream Processing is often added as its own component in the enterprise architecture for correlation of different events to implement contextual rules and stateful analytics. Using all these components introduces challenges and complexities in development and operations.
This session discusses how teams in different industries solve these challenges by building a native streaming platform from the ground up instead of using ETL and ESB tools in their architecture. This makes it possible to build and deploy independent, mission-critical real-time streaming applications and microservices. The architecture leverages distributed processing and fault tolerance with fast failover, no-downtime rolling deployments and the ability to reprocess events, so you can recalculate output when your code changes. Integration and stream processing remain key functionality but can be realized natively in real time instead of using additional ETL, ESB or stream processing tools.
MongoDB World 2019: Streaming ETL on the Shoulders of Giants (MongoDB)
Life doesn't happen in batch mode which is why application engineers and data architects need to closely cooperate to get the best out of streaming platforms like Apache Kafka and NoSQL data stores such as MongoDB. This session explores ways and means to integrate both worlds in a streaming fashion.
FLiP Into Trino
FLiP into Trino: Flink + Pulsar + Trino
Pulsar SQL (Trino/Presto)
Remember the days when you could wait until your batch data load was done and then run some simple queries or build stale dashboards? Those days are over; today you need instant analytics as the data streams in, in real time. You need universal analytics wherever that data is. I will show you how to do this utilizing the latest cloud-native open source tools. In this talk we will utilize Trino, Apache Pulsar, Pulsar SQL and Apache Flink to instantly analyze data from IoT sensors, transportation systems, logs, REST endpoints, XML, images, PDFs, documents, text, semi-structured data, unstructured data, structured data and a hundred data sources you could never dream of streaming before. I will teach you how to use Pulsar SQL to run analytics on live data.
Tim Spann, Developer Advocate, StreamNative
David Kjerrumgaard, Developer Advocate, StreamNative
https://www.starburst.io/info/trinosummit/
https://github.com/tspannhw/FLiP-Into-Trino/blob/main/README.md
https://github.com/tspannhw/StreamingAnalyticsUsingFlinkSQL/tree/main/src/main/java
-- Pulsar SQL (Trino) query over a live Pulsar topic; 'weather' is a topic in the public/default namespace:
select * from pulsar."public/default"."weather";
Apache Pulsar plus Trino = fast analytics at scale
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ... (confluent)
MQ, ETL and ESB middleware are often used as an integration backbone between legacy applications, modern microservices and cloud services. This introduces several challenges and complexities like point-to-point integrations or non-scalable architectures. This session discusses how to build a completely event-driven streaming platform leveraging Apache Kafka’s open source messaging, integration and streaming components to leverage distributed processing, fault tolerance, rolling upgrades and the ability to reprocess events. Learn the differences between an event-driven streaming platform leveraging Apache Kafka and middleware like MQ, ETL and ESBs – including best practices and anti-patterns, but also how these concepts and tools complement each other in an enterprise architecture.
Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka (Kai Wähner)
Spoilt for Choice – Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka:
Apache Kafka is a de facto standard streaming data processing platform. It is widely deployed as an event streaming platform. Part of Kafka is its stream processing API, “Kafka Streams”. In addition, the Kafka ecosystem now offers KSQL, a declarative, SQL-like stream processing language that lets you define powerful stream-processing applications easily. What once took some moderately sophisticated Java code can now be done at the command line with a familiar and eminently approachable syntax.
This session discusses and demos the pros and cons of Kafka Streams and KSQL to understand when to use which stream processing alternative for continuous stream processing natively on Apache Kafka infrastructures. The end of the session compares the trade-offs of Kafka Streams and KSQL to separate stream processing frameworks such as Apache Flink or Spark Streaming.
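To make the trade-off concrete, here is a hedged sketch (not from the session) of the same job in both styles; the topic names and the amount field are hypothetical. The KSQL version is a single declarative statement, shown as a comment above its Kafka Streams equivalent.

// The KSQL version, typed at the KSQL command line (assuming a 'payments'
// stream has already been declared over the underlying topic):
//
//   CREATE STREAM fraud_payments AS
//     SELECT * FROM payments WHERE amount > 10000;
//
// The equivalent Kafka Streams application:
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class FraudFilter {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // For brevity the payment amount is the whole record value;
        // a real job would deserialize JSON/Avro and filter on a field.
        builder.stream("payments", Consumed.with(Serdes.String(), Serdes.Double()))
               .filter((key, amount) -> amount != null && amount > 10_000)
               .to("fraud_payments", Produced.with(Serdes.String(), Serdes.Double()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "fraud-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}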
Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa... (Michael Noll)
Talk URL: https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/77360
Abstract: Would you cross the street with traffic information that’s a minute old? Certainly not. Modern businesses have the same needs nowadays, whether it’s due to competitive pressure or because their customers have much higher expectations of how they want to interact with a product or service. At the heart of this movement are events: in today’s digital age, events are everywhere. Every digital action—across online purchases to ride-sharing requests to bank deposits—creates a set of events around transaction amount, transaction time, user location, account balance, and much more. The technology that allows businesses to read, write, store, and compute and process these events in real-time are event-streaming platforms, and tens of thousands of companies like Netflix, Audi, PayPal, Airbnb, Uber, and Pinterest have picked Apache Kafka as the de facto choice to implement event-driven architectures and reshape their industries.
Michael Noll explores why and how you can use Apache Kafka and its growing ecosystem to build event-driven architectures that are elastic, scalable, robust, and fault tolerant, whether it’s on-premises, in the cloud, on bare metal machines, or in Kubernetes with Docker containers. Specifically, you’ll look at Kafka as the storage and publish and subscribe layer; Kafka’s Connect framework for integrating external data systems such as MySQL, Elastic, or S3 with Kafka; and Kafka’s Streams API and KSQL as the compute layer to implement event-driven applications and microservices in Java and Scala and streaming SQL, respectively, that process the events flowing through Kafka in real time. Michael provides an overview of the most relevant functionality, both current and upcoming, and shares best practices and typical use cases so you can tie it all together for your own needs.
How to Build Streaming Apps with Confluent II (confluent)
In this interactive session, you’ll access a lab environment that shows you how to build Streaming Applications on top of Kafka, leveraging Confluent's modern tooling.
This is your exclusive opportunity to hear from the thought leaders of Apache Kafka on how event streaming enables you to leverage real-time data processing, with an easy-to-use, yet powerful interactive interface for stream processing, without the need to write code.
We have seen tremendous growth in near real-time ("nearline") processing at LinkedIn in recent years. LinkedIn now uses Apache Samza to process well over a trillion messages every day across thousands of applications. Apache Samza serves as the foundation for several application platforms at LinkedIn, spanning a wide variety of use cases like security, notifications, machine learning, monitoring, search, and more. In this talk we will explore various features of Apache Samza that provide the flexibility and scalability we need to power stream processing at massive scale.
Experiences in Architecting & Implementing Platforms using Serverless (Srushith Repakula)
In this talk, I share our experiences in architecting and implementing platforms such as KonfHub.com using a completely serverless approach from the ground up. We will discuss the benefits and disadvantages of adopting serverless and provide pointers and best practices for those who are planning to adopt serverless architectures.
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen... (confluent)
In this talk we’ll look at the relationship between three of the most disruptive software engineering paradigms: event sourcing, stream processing and serverless. We’ll debunk some of the myths around event sourcing. We’ll look at the inevitability of event-driven programming in the serverless space and we’ll see how stream processing links these two concepts together with a single ‘database for events’. As the story unfolds we’ll dive into some use cases, examine the practicalities of each approach, particularly the stateful elements, and finally extrapolate how their future relationship is likely to unfold. Key takeaways include: the different flavors of event sourcing and where their value lies; the difference between stream processing at application and infrastructure levels; the relationship between stream processors and serverless functions; and the practical limits of storing data in Kafka and stream processors like KSQL.
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ... (apidays)
apidays LIVE India 2021 - Connecting 1.3 billion digital innovators
May 20, 2021
Asynchronous and Broadcasting APIs using Kafka
Rohit Saxena, Software Development Consultant at Guardian Life
Streaming SQL to unify batch and stream processing: Theory and practice with ... (Fabian Hueske)
SQL is the lingua franca for querying and processing data. To this day, it provides non-programmers with a powerful tool for analyzing and manipulating data. But with the emergence of stream processing as a core technology for data infrastructures, can you still use SQL and bring real-time data analysis to a broader audience?
The answer is yes, you can. SQL fits into the streaming world very well and forms an intuitive and powerful abstraction for streaming analytics. More importantly, you can use SQL as an abstraction to unify batch and streaming data processing. Viewing streams as dynamic tables, you can obtain consistent results from SQL evaluated over static tables and streams alike and use SQL to build materialized views as a data integration tool.
Fabian Hueske and Shuyi Chen explore SQL’s role in the world of streaming data and its implementation in Apache Flink and cover fundamental concepts, such as streaming semantics, event time, and incremental results. They also share their experience using Flink SQL in production at Uber, explaining how Uber leverages Flink SQL to solve its unique business challenges and how the unified stream and batch processing platform enables both technical or nontechnical users to process real-time and batch data reliably using the same SQL at Uber scale.
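A small sketch of the dynamic-table idea (my illustration, not from the talk, with hypothetical table and field names and method names from recent Flink releases): the identical GROUP BY query can be evaluated once over a bounded table or maintained incrementally over a stream, depending only on the environment mode.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DynamicTablesSketch {
    public static void main(String[] args) {
        // Switch to EnvironmentSettings.inBatchMode() (with a bounded source,
        // e.g. datagen's 'number-of-rows' option) and the identical SQL below
        // computes one final result instead of a continuously updating one.
        TableEnvironment tableEnv =
            TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        tableEnv.executeSql(
            "CREATE TABLE clicks (user_name STRING, url STRING) " +
            "WITH ('connector' = 'datagen', 'rows-per-second' = '5')");

        // Viewed as a dynamic table, this GROUP BY maintains a materialized,
        // incrementally updated view: each new click may update a user's count.
        tableEnv.executeSql(
            "SELECT user_name, COUNT(url) AS cnt FROM clicks GROUP BY user_name")
            .print();
    }
}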
14. What is Stream Processing?
“[Stream processing] is some kind of computation over a data stream. First and foremost, a data stream is an abstraction representing an unbounded dataset. Unbounded means infinite and ever growing.”
Kafka: The Definitive Guide
16. Stream processing is a programming paradigm...
[Slide diagram: a latency/throughput spectrum placing streaming processing between request-response (low latency) and batch processing (high throughput).]
17. The world always changes, and sometimes we are interested in the events that caused those changes, whereas other times we are interested in the current state of the world…
Stream-Table Duality
20. Systems that allow you to transition back and forth between the two ways of looking at data are more powerful than systems that support just one.
- Neha Narkhede (Kafka: The Definitive Guide)
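In the Kafka Streams DSL that back-and-forth transition looks roughly like this (a hedged sketch with hypothetical topic names): a stream of change events is aggregated into a table, and the table's changelog is turned back into a stream.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class StreamTableDuality {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Events: every individual page view (the changes to the world).
        KStream<String, String> pageViews =
            builder.stream("page-views", Consumed.with(Serdes.String(), Serdes.String()));

        // State: the current view count per user (the state of the world),
        // built by aggregating the stream into a table.
        KTable<String, Long> viewsPerUser = pageViews
            .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
            .count();

        // And back again: the table's changelog is itself a stream of updates.
        viewsPerUser.toStream()
            .to("views-per-user", Produced.with(Serdes.String(), Serdes.Long()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "duality-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}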
21. Kafka vs. Kafka Streams
Kafka:
● Distributed log
● Highly available
● Used by ⅓ of the Fortune 500
● APIs:
○ Producer
○ Consumer
○ Connect
○ Streams
Kafka Streams:
● Part of the Kafka ecosystem
● Just a library
● Simple API
● DSL