SlideShare a Scribd company logo
Serverless Kafka and Spark in a
Multi-Cloud Data Lakehouse Architecture
Kai Waehner
Field CTO
kai.waehner@confluent.io
linkedin.com/in/kaiwaehner
@KaiWaehner
confluent.io
kai-waehner.de
Agenda
• Data Analytics at Rest
• Data Streaming in Motion
• Lakehouse: Data Streaming + Analytics
• A Lakehouse Example: Intelligent Connected Cars
• Cloud-Native vs. Serverless Infrastructure
• Central vs. Hybrid and Global Data Mesh
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Agenda
• Data Analytics at Rest
• Data Streaming in Motion
• Lakehouse: Data Streaming + Analytics
• A Lakehouse Example: Intelligent Connected Cars
• Cloud-Native vs. Serverless Infrastructure
• Central vs. Hybrid and Global Data Mesh
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Storage at Rest
USER
JAY
SUE
FRED
CREDIT_SCORE
695
430
710
V1
V3
V2
Analytics at Rest
SELECT * FROM
DB_TABLE
Active Query: Passive Data:
DB Table
Use Cases for Data at Rest
• Reporting
• Business Intelligence
• Data Engineering
• Big Data Analytics
• Machine Learning
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Apache Spark – The De Facto Standard for Big Data at Rest
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Big Data In Big Data Out
Big Data Storage and Processing
From Historical Data
to Insights
Delta Lake Open-source storage framework and open format for data analytics
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Agenda
• Data Analytics at Rest
• Data Streaming in Motion
• Lakehouse: Data Streaming + Analytics
• A Lakehouse Example: Intelligent Connected Cars
• Cloud-Native vs. Serverless Infrastructure
• Central vs. Hybrid and Global Data Mesh
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Real-time Data beats Slow Data.
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Real-time Data beats Slow Data.
Transportation
Real-time sensor
diagnostics
Driver-rider match
ETA updates
Insurance
Claim processing
Fraud detection
Omnichannel quote
processing
Retail
Real-time inventory
Real-time POS
reporting
Personalization
Entertainment
Real-time
recommendations
Personalized
news feed
In-app purchases
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Data at Rest Data in Motion
SELECT * FROM
DB_TABLE
CREATE TABLE T
AS SELECT * FROM
EVENT_STREAM
Active Query: Passive Data:
DB Table
Active Data: Passive Query:
Event Stream
Tables at
Rest
Streams in
Motion
USER
JAY
SUE
FRED
CREDIT_SCORE
695
430
710
V1
V3
V2
PAYMENTS
42
18
65
...
USER
JAY
SUE
FRED
...
Data Streaming = Data at Rest + Data in Motion
Payments Stream
Credit Score Stream
CREATE TABLE credit_scores AS
SELECT user, updateScore(p.amount)…
Apache Kafka – The De Facto Standard for Data in Motion
Database
CRM
Sensors
Mobile
Customer 360
Real-time
Alerting System
Data
warehouse
Producers
Consumers
Streams of real time events
Stream processing
apps
Connectors
Connectors
Stream processing
apps
Incident
Alert
Forecast
Pricing Customer
Order
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Agenda
• Data Analytics at Rest
• Data Streaming in Motion
• Lakehouse: Data Streaming + Analytics
• A Lakehouse Example: Intelligent Connected Cars
• Cloud-Native vs. Serverless Infrastructure
• Central vs. Hybrid and Global Data Mesh
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Data Lakehouse
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Building the Data Lakehouse
Author: Bill Inmon
Lakehouse is a
logical view, not physical!
Lambda Architecture
Option 1: Unified serving layer
Data
Source
Real-Time Layer
(Data Processing in Motion)
Batch Layer
(Data Processing at Rest)
Serving
Layer
Real-Time App
(Data Processing in Motion)
Batch App
(Data Processing at Rest)
ms
min/hr
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Data
Source
Real-Time Layer
(Data Processing in Motion)
Batch Layer
(Data Processing at Rest)
Real-time Query
Mixed Query
ms
min/hr
Speed
View
Batch
View
Batch Query
Lambda Architecture
Option 2: Separate serving layers
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Data
Source
Real-Time Layer
(Data Processing in Motion)
Real-Time App
(Data Processing in Motion)
Storage
Batch App
(Data Processing at Rest)
Storage
ms
min/hr
Storage
Kappa Architecture
One pipeline for real-time and batch consumers
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Kappa @ Uber
24
kai-waehner.de | @KaiWaehner | Kappa vs. Lambda Architecture
Confluent + Databricks Reference Architecture
Kafka
Connect
On Premises
or any cloud
Kafka Streams
& ksqlDB - real-time
stream processing
and transformations
Databricks Data
Science Workspace
Databricks
Delta Lake
Sink
Connector
for
Confluent
Cloud
(AWS)
Legacy Data Stores:
Netezza, Teradata
Oracle, Mainframes
Databases
IoT
Data Streaming Analytics
Sources
Data Streaming
Platform
built on Kafka
On Premises
or any cloud
Databricks BI
Workspace
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Agenda
• Data Analytics at Rest
• Data Streaming in Motion
• Lakehouse: Data Streaming + Analytics
• A Lakehouse Example: Intelligent Connected Cars
• Cloud-Native vs. Serverless Infrastructure
• Central vs. Hybrid and Global Data Mesh
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Connected Car Infrastructure at Audi
27
• Real Time Data Analysis
• Swarm Intelligence
• Collaboration with Partners
• Predictive AI
• …
https://www.youtube.com/watch?v=yGLKi3TMJv8
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Connected Car Infrastructure at Audi
28
https://www.youtube.com/watch?v=yGLKi3TMJv8
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Kappa Architecture for a Lakehouse with Kafka and Spark
MQTT
Proxy
Spark Core
Storage
Spark SQL
Reporting
Kafka
Cluster
Kafka
Connect
Car Sensors
Kafka Ecosystem
Spark Ecosystem
Other Components
Kafka
Streams
All
Data
Critical
Data
Ingest
Data
Potential
Detect
Spark
MLlib
Model
Training
ksqlDB
Model
Deployment
Preprocess
Data
Consume
Data
Deploy
Analytic
Model
Mobile App
BI Tool
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Machine Learning Model Training
with Spark MLlib
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
https://dev.to/siddhantpatro/spark-mllib-for-big-data-and-machine-learning-330j
“CREATE STREAM AnomalyDetection AS
SELECT sensor_id, detectAnomaly(sensor_values)
FROM car_engine;“
User Defined Function (UDF)
Model Deployment with
Apache Kafka, ksqlDB and Spark MLlib
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
MLlib
Stream Processing with Kafka or Spark?
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Kafka Streams /
ksqlDB
Spark
Streaming
Component of the
data streaming infrastructure
Low latency
Focus on 24/7 operations
Lightweight, decoupled
microservices
Component of the data
analytics infrastructure
Strong integration with the rest
of the Spark ecossytem
Stream and batch
Machine Learning “embedded”
Agenda
• Data Analytics at Rest
• Data Streaming in Motion
• Lakehouse: Data Streaming + Analytics
• A Lakehouse Example: Intelligent Connected Cars
• Cloud-Native vs. Serverless Infrastructure
• Central vs. Hybrid and Global Data Mesh
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Cloud-Native Deployment
à Elastic Infrastructure and Faster Time-to-Market
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
You
Manage
Provider
Managed
Self-managed IaaS Hosted Cloud Service Fully Managed SaaS
Scaling Scaling Scaling
Load balancing Load balancing Load balancing
Partition placement Partition placement Partition placement
Logical Storage Logical Storage Logical Storage
Broker settings Broker settings Broker settings
Zookeeper Zookeeper Zookeeper
Kafka patching Kafka patching Kafka patching
JVM JVM JVM
O/S O/S O/S
VMs VMs VMs
Servers Servers Servers
Provider
managed
features
Product
ease of use
Fully Managed
Partially Managed
Self-Managed
What is a (truly) fully-managed SaaS?
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Agenda
• Data Analytics at Rest
• Data Streaming in Motion
• Lakehouse: Data Streaming + Analytics
• A Lakehouse Example: Intelligent Connected Cars
• Cloud-Native vs. Serverless Infrastructure
• Central vs. Hybrid and Global Data Mesh
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
AWS Cloud Outage hit Disney World Visitors…
https://www.cnet.com/tech/services-and-software/disney-parks-were-already-facing-heat-from-fans-then-an-aws-outage-came-along/
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Disaster Recovery – RPO and RTO
RPO = Recovery Point Objective
RTO = Recovery Time Objective
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Use Cases for Hybrid and Multi-Cloud Data Lakehouses
• Disaster Recovery and High Availability:
Create a disaster recovery cluster, and fail
over to it during an outage.
• Global and Multi-Cloud Replication: Move
and aggregate data across regions and
clouds.
• Data Sharing: Share data with other teams,
lines-of-business, or organizations.
• Data Migration: Migrate data and workloads
from one cluster to another (like from legacy
on-premise data warehouse to cloud-native
data lakehouse).
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
Data Replication
at Rest
or in Motion?
Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
Global Data Lakehouse across Edge and Hybrid Cloud
Streaming Replication between Kafka Clusters
Bridge to Databases, Data Lakes, Apps, APIs, SaaS
Aggregation of Edge
Deployments with
Replication (Aggregation)
Disaster Recovery
Operations with
Multi-Region Clusters
for RPO=0 and RTO~0
Global Data Streaming with
Replication and Cluster Linking
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
A data mesh for decentralized data products
Data
Product
Independent Data Products
for Reporting, Analytics,
Data Streaming
kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
For instance:
A KSQL microservice
Kai Waehner
Field CTO
kai.waehner@confluent.io
@KaiWaehner
confluent.io
kai-waehner.de
linkedin.com/in/kaiwaehner
Questions? Feedback?
Let’s connect!

More Related Content

What's hot

Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
Alluxio, Inc.
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Databricks
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
Databricks
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Flink Forward
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
Modern Data Stack France
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
Alexey Grishchenko
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
Aljoscha Krettek
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
Databricks
 
Snowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at ScaleSnowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at Scale
Adam Doyle
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
StreamNative
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark Summit
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
GetInData
 
Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?
Yaroslav Tkachenko
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
Ryan Blue
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
Timothy Spann
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
Sudheer Kondla
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
Flink Forward
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptx
chennakesava44
 
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Cathrine Wilhelmsen
 

What's hot (20)

Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
 
Snowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at ScaleSnowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at Scale
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
 
Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptx
 
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
 

Similar to Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture

IoT and Event Streaming at Scale with Apache Kafka
IoT and Event Streaming at Scale with Apache KafkaIoT and Event Streaming at Scale with Apache Kafka
IoT and Event Streaming at Scale with Apache Kafka
confluent
 
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, ConfluentCan Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
HostedbyConfluent
 
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...
Kai Wähner
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
Kai Wähner
 
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Kai Wähner
 
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
confluent
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?
Kai Wähner
 
When NOT to Use Apache Kafka? With Kai Waehner | Current 2022
When NOT to Use Apache Kafka? With Kai Waehner | Current 2022When NOT to Use Apache Kafka? With Kai Waehner | Current 2022
When NOT to Use Apache Kafka? With Kai Waehner | Current 2022
HostedbyConfluent
 
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
HostedbyConfluent
 
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?
Kai Wähner
 
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
confluent
 
Connected Vehicles and V2X with Apache Kafka
Connected Vehicles and V2X with Apache KafkaConnected Vehicles and V2X with Apache Kafka
Connected Vehicles and V2X with Apache Kafka
Kai Wähner
 
Set Your Data In Motion - CTO Roundtable
Set Your Data In Motion - CTO RoundtableSet Your Data In Motion - CTO Roundtable
Set Your Data In Motion - CTO Roundtable
confluent
 
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka ArchitecturesEvent Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
Kai Wähner
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
Kai Wähner
 
The State of Serverless Computing | AWS Public Sector Summit 2017
The State of Serverless Computing | AWS Public Sector Summit 2017The State of Serverless Computing | AWS Public Sector Summit 2017
The State of Serverless Computing | AWS Public Sector Summit 2017
Amazon Web Services
 
Apache Kafka® and Analytics in a Connected IoT World
Apache Kafka® and Analytics in a Connected IoT WorldApache Kafka® and Analytics in a Connected IoT World
Apache Kafka® and Analytics in a Connected IoT World
confluent
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?
Kai Wähner
 
GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023
Timothy Spann
 
Unlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeUnlocking the Value of Your Data Lake
Unlocking the Value of Your Data Lake
DATAVERSITY
 

Similar to Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture (20)

IoT and Event Streaming at Scale with Apache Kafka
IoT and Event Streaming at Scale with Apache KafkaIoT and Event Streaming at Scale with Apache Kafka
IoT and Event Streaming at Scale with Apache Kafka
 
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, ConfluentCan Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
 
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
 
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
 
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?
 
When NOT to Use Apache Kafka? With Kai Waehner | Current 2022
When NOT to Use Apache Kafka? With Kai Waehner | Current 2022When NOT to Use Apache Kafka? With Kai Waehner | Current 2022
When NOT to Use Apache Kafka? With Kai Waehner | Current 2022
 
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
 
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?
 
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
 
Connected Vehicles and V2X with Apache Kafka
Connected Vehicles and V2X with Apache KafkaConnected Vehicles and V2X with Apache Kafka
Connected Vehicles and V2X with Apache Kafka
 
Set Your Data In Motion - CTO Roundtable
Set Your Data In Motion - CTO RoundtableSet Your Data In Motion - CTO Roundtable
Set Your Data In Motion - CTO Roundtable
 
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka ArchitecturesEvent Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
 
The State of Serverless Computing | AWS Public Sector Summit 2017
The State of Serverless Computing | AWS Public Sector Summit 2017The State of Serverless Computing | AWS Public Sector Summit 2017
The State of Serverless Computing | AWS Public Sector Summit 2017
 
Apache Kafka® and Analytics in a Connected IoT World
Apache Kafka® and Analytics in a Connected IoT WorldApache Kafka® and Analytics in a Connected IoT World
Apache Kafka® and Analytics in a Connected IoT World
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?
 
GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023
 
Unlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeUnlocking the Value of Your Data Lake
Unlocking the Value of Your Data Lake
 

More from Kai Wähner

Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Kai Wähner
 
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping MetaverseKafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kai Wähner
 
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform MiddlewareApache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Kai Wähner
 
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Kai Wähner
 
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Data Streaming with Apache Kafka in the Defence and Cybersecurity IndustryData Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Kai Wähner
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare Industry
Kai Wähner
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare Industry
Kai Wähner
 
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
Apache Kafka for Real-time Supply Chainin the Food and Retail IndustryApache Kafka for Real-time Supply Chainin the Food and Retail Industry
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
Kai Wähner
 
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Kai Wähner
 
Apache Kafka Landscape for Automotive and Manufacturing
Apache Kafka Landscape for Automotive and ManufacturingApache Kafka Landscape for Automotive and Manufacturing
Apache Kafka Landscape for Automotive and Manufacturing
Kai Wähner
 
The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022
Kai Wähner
 
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Kai Wähner
 
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Kai Wähner
 
Apache Kafka in the Transportation and Logistics
Apache Kafka in the Transportation and LogisticsApache Kafka in the Transportation and Logistics
Apache Kafka in the Transportation and Logistics
Kai Wähner
 
Apache Kafka for Cybersecurity and SIEM / SOAR Modernization
Apache Kafka for Cybersecurity and SIEM / SOAR ModernizationApache Kafka for Cybersecurity and SIEM / SOAR Modernization
Apache Kafka for Cybersecurity and SIEM / SOAR Modernization
Kai Wähner
 
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
Kai Wähner
 
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureServerless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Kai Wähner
 
IBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
IBM Cloud Pak for Integration with Confluent Platform powered by Apache KafkaIBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
IBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
Kai Wähner
 
Apache Kafka in the Insurance Industry
Apache Kafka in the Insurance IndustryApache Kafka in the Insurance Industry
Apache Kafka in the Insurance Industry
Kai Wähner
 
Apache Kafka and MQTT - Overview, Comparison, Use Cases, Architectures
Apache Kafka and MQTT - Overview, Comparison, Use Cases, ArchitecturesApache Kafka and MQTT - Overview, Comparison, Use Cases, Architectures
Apache Kafka and MQTT - Overview, Comparison, Use Cases, Architectures
Kai Wähner
 

More from Kai Wähner (20)

Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
 
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping MetaverseKafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
 
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform MiddlewareApache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
 
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
 
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Data Streaming with Apache Kafka in the Defence and Cybersecurity IndustryData Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare Industry
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare Industry
 
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
Apache Kafka for Real-time Supply Chainin the Food and Retail IndustryApache Kafka for Real-time Supply Chainin the Food and Retail Industry
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
 
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
 
Apache Kafka Landscape for Automotive and Manufacturing
Apache Kafka Landscape for Automotive and ManufacturingApache Kafka Landscape for Automotive and Manufacturing
Apache Kafka Landscape for Automotive and Manufacturing
 
The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022
 
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
 
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
 
Apache Kafka in the Transportation and Logistics
Apache Kafka in the Transportation and LogisticsApache Kafka in the Transportation and Logistics
Apache Kafka in the Transportation and Logistics
 
Apache Kafka for Cybersecurity and SIEM / SOAR Modernization
Apache Kafka for Cybersecurity and SIEM / SOAR ModernizationApache Kafka for Cybersecurity and SIEM / SOAR Modernization
Apache Kafka for Cybersecurity and SIEM / SOAR Modernization
 
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
 
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureServerless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
 
IBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
IBM Cloud Pak for Integration with Confluent Platform powered by Apache KafkaIBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
IBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
 
Apache Kafka in the Insurance Industry
Apache Kafka in the Insurance IndustryApache Kafka in the Insurance Industry
Apache Kafka in the Insurance Industry
 
Apache Kafka and MQTT - Overview, Comparison, Use Cases, Architectures
Apache Kafka and MQTT - Overview, Comparison, Use Cases, ArchitecturesApache Kafka and MQTT - Overview, Comparison, Use Cases, Architectures
Apache Kafka and MQTT - Overview, Comparison, Use Cases, Architectures
 

Recently uploaded

Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
Peter Caitens
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
informapgpstrackings
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
Jelle | Nordend
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
KrzysztofKkol1
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
Tier1 app
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 

Recently uploaded (20)

Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 

Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture

  • 1. Serverless Kafka and Spark in a Multi-Cloud Data Lakehouse Architecture Kai Waehner Field CTO kai.waehner@confluent.io linkedin.com/in/kaiwaehner @KaiWaehner confluent.io kai-waehner.de
  • 2. Agenda • Data Analytics at Rest • Data Streaming in Motion • Lakehouse: Data Streaming + Analytics • A Lakehouse Example: Intelligent Connected Cars • Cloud-Native vs. Serverless Infrastructure • Central vs. Hybrid and Global Data Mesh kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 3. Agenda • Data Analytics at Rest • Data Streaming in Motion • Lakehouse: Data Streaming + Analytics • A Lakehouse Example: Intelligent Connected Cars • Cloud-Native vs. Serverless Infrastructure • Central vs. Hybrid and Global Data Mesh kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 5. Analytics at Rest SELECT * FROM DB_TABLE Active Query: Passive Data: DB Table
  • 6. Use Cases for Data at Rest • Reporting • Business Intelligence • Data Engineering • Big Data Analytics • Machine Learning kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 7. Apache Spark – The De Facto Standard for Big Data at Rest kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe Big Data In Big Data Out Big Data Storage and Processing From Historical Data to Insights
  • 8. Delta Lake Open-source storage framework and open format for data analytics kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 9. Agenda • Data Analytics at Rest • Data Streaming in Motion • Lakehouse: Data Streaming + Analytics • A Lakehouse Example: Intelligent Connected Cars • Cloud-Native vs. Serverless Infrastructure • Central vs. Hybrid and Global Data Mesh kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 10. Real-time Data beats Slow Data. kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 11. Real-time Data beats Slow Data. Transportation Real-time sensor diagnostics Driver-rider match ETA updates Insurance Claim processing Fraud detection Omnichannel quote processing Retail Real-time inventory Real-time POS reporting Personalization Entertainment Real-time recommendations Personalized news feed In-app purchases kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 12. Data at Rest Data in Motion SELECT * FROM DB_TABLE CREATE TABLE T AS SELECT * FROM EVENT_STREAM Active Query: Passive Data: DB Table Active Data: Passive Query: Event Stream
  • 14. Data Streaming = Data at Rest + Data in Motion Payments Stream Credit Score Stream CREATE TABLE credit_scores AS SELECT user, updateScore(p.amount)…
  • 15. Apache Kafka – The De Facto Standard for Data in Motion Database CRM Sensors Mobile Customer 360 Real-time Alerting System Data warehouse Producers Consumers Streams of real time events Stream processing apps Connectors Connectors Stream processing apps Incident Alert Forecast Pricing Customer Order kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 16. Agenda • Data Analytics at Rest • Data Streaming in Motion • Lakehouse: Data Streaming + Analytics • A Lakehouse Example: Intelligent Connected Cars • Cloud-Native vs. Serverless Infrastructure • Central vs. Hybrid and Global Data Mesh kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 17. Data Lakehouse kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe Building the Data Lakehouse Author: Bill Inmon Lakehouse is a logical view, not physical!
  • 18. Lambda Architecture Option 1: Unified serving layer Data Source Real-Time Layer (Data Processing in Motion) Batch Layer (Data Processing at Rest) Serving Layer Real-Time App (Data Processing in Motion) Batch App (Data Processing at Rest) ms min/hr kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 19. Data Source Real-Time Layer (Data Processing in Motion) Batch Layer (Data Processing at Rest) Real-time Query Mixed Query ms min/hr Speed View Batch View Batch Query Lambda Architecture Option 2: Separate serving layers kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 20. Data Source Real-Time Layer (Data Processing in Motion) Real-Time App (Data Processing in Motion) Storage Batch App (Data Processing at Rest) Storage ms min/hr Storage Kappa Architecture One pipeline for real-time and batch consumers kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 21. Kappa @ Uber 24 kai-waehner.de | @KaiWaehner | Kappa vs. Lambda Architecture
  • 22. Confluent + Databricks Reference Architecture Kafka Connect On Premises or any cloud Kafka Streams & ksqlDB - real-time stream processing and transformations Databricks Data Science Workspace Databricks Delta Lake Sink Connector for Confluent Cloud (AWS) Legacy Data Stores: Netezza, Teradata Oracle, Mainframes Databases IoT Data Streaming Analytics Sources Data Streaming Platform built on Kafka On Premises or any cloud Databricks BI Workspace kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 23. Agenda • Data Analytics at Rest • Data Streaming in Motion • Lakehouse: Data Streaming + Analytics • A Lakehouse Example: Intelligent Connected Cars • Cloud-Native vs. Serverless Infrastructure • Central vs. Hybrid and Global Data Mesh kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 24. Connected Car Infrastructure at Audi 27 • Real Time Data Analysis • Swarm Intelligence • Collaboration with Partners • Predictive AI • … https://www.youtube.com/watch?v=yGLKi3TMJv8 kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 25. Connected Car Infrastructure at Audi 28 https://www.youtube.com/watch?v=yGLKi3TMJv8 kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 26. Kappa Architecture for a Lakehouse with Kafka and Spark MQTT Proxy Spark Core Storage Spark SQL Reporting Kafka Cluster Kafka Connect Car Sensors Kafka Ecosystem Spark Ecosystem Other Components Kafka Streams All Data Critical Data Ingest Data Potential Detect Spark MLlib Model Training ksqlDB Model Deployment Preprocess Data Consume Data Deploy Analytic Model Mobile App BI Tool kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 27. Machine Learning Model Training with Spark MLlib kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe https://dev.to/siddhantpatro/spark-mllib-for-big-data-and-machine-learning-330j
  • 28. “CREATE STREAM AnomalyDetection AS SELECT sensor_id, detectAnomaly(sensor_values) FROM car_engine;“ User Defined Function (UDF) Model Deployment with Apache Kafka, ksqlDB and Spark MLlib kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe MLlib
  • 29. Stream Processing with Kafka or Spark? kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe Kafka Streams / ksqlDB Spark Streaming Component of the data streaming infrastructure Low latency Focus on 24/7 operations Lightweight, decoupled microservices Component of the data analytics infrastructure Strong integration with the rest of the Spark ecossytem Stream and batch Machine Learning “embedded”
  • 30. Agenda • Data Analytics at Rest • Data Streaming in Motion • Lakehouse: Data Streaming + Analytics • A Lakehouse Example: Intelligent Connected Cars • Cloud-Native vs. Serverless Infrastructure • Central vs. Hybrid and Global Data Mesh kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 31. Cloud-Native Deployment à Elastic Infrastructure and Faster Time-to-Market kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 32. You Manage Provider Managed Self-managed IaaS Hosted Cloud Service Fully Managed SaaS Scaling Scaling Scaling Load balancing Load balancing Load balancing Partition placement Partition placement Partition placement Logical Storage Logical Storage Logical Storage Broker settings Broker settings Broker settings Zookeeper Zookeeper Zookeeper Kafka patching Kafka patching Kafka patching JVM JVM JVM O/S O/S O/S VMs VMs VMs Servers Servers Servers Provider managed features Product ease of use Fully Managed Partially Managed Self-Managed What is a (truly) fully-managed SaaS? kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 33. Agenda • Data Analytics at Rest • Data Streaming in Motion • Lakehouse: Data Streaming + Analytics • A Lakehouse Example: Intelligent Connected Cars • Cloud-Native vs. Serverless Infrastructure • Central vs. Hybrid and Global Data Mesh kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 34. AWS Cloud Outage hit Disney World Visitors… https://www.cnet.com/tech/services-and-software/disney-parks-were-already-facing-heat-from-fans-then-an-aws-outage-came-along/ kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 35. Disaster Recovery – RPO and RTO RPO = Recovery Point Objective RTO = Recovery Time Objective kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 36. Use Cases for Hybrid and Multi-Cloud Data Lakehouses • Disaster Recovery and High Availability: Create a disaster recovery cluster, and fail over to it during an outage. • Global and Multi-Cloud Replication: Move and aggregate data across regions and clouds. • Data Sharing: Share data with other teams, lines-of-business, or organizations. • Data Migration: Migrate data and workloads from one cluster to another (like from legacy on-premise data warehouse to cloud-native data lakehouse). kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe Data Replication at Rest or in Motion?
  • 37. Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc. Global Data Lakehouse across Edge and Hybrid Cloud Streaming Replication between Kafka Clusters Bridge to Databases, Data Lakes, Apps, APIs, SaaS Aggregation of Edge Deployments with Replication (Aggregation) Disaster Recovery Operations with Multi-Region Clusters for RPO=0 and RTO~0 Global Data Streaming with Replication and Cluster Linking kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe
  • 38. A data mesh for decentralized data products Data Product Independent Data Products for Reporting, Analytics, Data Streaming kai-waehner.de | @KaiWaehner | Serverless Apache Kafka and Spark across the Globe For instance: A KSQL microservice