EDA Meets Data Engineering – What’s the Big Deal?
Guru Sattanathan, Systems Engineer
@avoguru
How are they evolving?
Kafka Events
How did Kafka survive?
https://trends.google.com/trends/explore?date=all&q=%2Fm%2F0zmynvd
How did Kafka survive?
https://trends.google.com/trends/explore?date=all&q=%2Fm%2F0zmynvd,%2Fm%2F0fdjtq
Change drivers
● Cloud friendly
● Process data on the move
● Automation
● Feedback loop
But why?
IT ALL STARTS WITH ONE FUNDAMENTAL ASSUMPTION:
DATA IS PASSIVE
TWO WORLDS WITHIN ENTERPRISE IT
TWO WORLDS OF ENTERPRISE IT
THINGS THAT ARE OPERATIONALLY CRITICAL
THINGS THAT ARE NOT
TWO WORLDS OF ENTERPRISE IT
THINGS THAT HELP RUN THE BUSINESS
THINGS THAT HELP MAKE MORE $$$
TWO WORLDS OF ENTERPRISE IT
World of Enterprise Apps
World of Data & Analytics
TWO WORLDS OF ENTERPRISE IT
Microservices, Enterprise Integration, Databases, Kubernetes, CI/CD, APIs, Websites, etc.
Data warehouse, Analytics, Machine Learning, Neural Nets, BI Dashboards, SQL Nerds, Hadoop, Spark, etc.
TWO WORLDS OF ENTERPRISE IT
Heavy focus on Uptime
Heavy focus on Data computation
TWO WORLDS OF ENTERPRISE IT
Data Acquisition
Data Processing
Data storage wall
TWO WORLDS OF ENTERPRISE IT
You were able to process data only after storing it in a database or a data lake
OR
You had to wait for data to accumulate before you could start processing it
Data storage wall
led to...
TWO WORLDS OF ENTERPRISE IT
World of Enterprise Apps
World of Data & Analytics
Data storage wall
Slow feedback
TWO WORLDS OF ENTERPRISE IT
Real-time
Batch
Data storage wall
Slow feedback
Event-Driven Architecture is NOT NEW!?
True.
Whether we cared about EDA or not, events always existed.
TWO WORLDS OF ENTERPRISE IT
Already Event-Driven
Batch Event Processing
Data storage wall
Slow feedback
What’s missing?
Ability to process data while it is moving.
Ability to respond to events whenever they happen.
Event Stream Processing (ESP)
Again, why?
TWO WORLDS OF ENTERPRISE IT
World of Enterprise Apps
World of Data & Analytics
The wall is blurring
Improved feedback cycle
Event Streaming Platform: what does it look like?
A Streaming Platform is the Underpinning of an Event-driven Architecture
● Ubiquitous connectivity: Globally scalable platform for all event producers and consumers
● Immediate data access: Data accessible to all consumers in real time
● Single system of record: Persistent storage to enable reprocessing of past events
● Continuous queries: Stream processing capabilities for in-line data transformation
Producers: Microservices, DBs, SaaS apps, Mobile
Consumers: Customer 360, Real-time fraud detection, Data warehouse
Streams of real-time events: Database change, Microservices events, SaaS data, Customer experiences
Stream processing apps
Apps from both worlds
Events as a backbone
[Diagram: apps and {faas} functions in Payments, Department 2, Department 3, and Department 4, all connected through events]
Request/Response: 1 input, 1 output. Low latency, poor throughput.
Batch: All inputs, all outputs. Poor latency, high throughput.
Stream Processing: Some inputs, some outputs. Tunable latency & throughput.
How does it work?
The log is a simple idea
Messages are added at the end of the log
Old -> New
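The log idea above can be sketched in a few lines of Java. This is an illustrative in-memory model under stated assumptions, not Kafka's implementation: the class name AppendOnlyLog and its methods are hypothetical. Each message is appended at the end and receives the next offset, and a reader can replay from any offset without removing anything.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative in-memory append-only log (hypothetical; not Kafka's implementation). */
final class AppendOnlyLog {
    private final List<String> messages = new ArrayList<>();

    /** Appends a message at the end of the log and returns its offset. */
    long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    /** Returns a copy of every message from the given offset onward; reading never removes data. */
    List<String> readFrom(long offset) {
        return new ArrayList<>(messages.subList((int) offset, messages.size()));
    }

    public static void main(String[] args) {
        AppendOnlyLog log = new AppendOnlyLog();
        log.append("order-created");
        log.append("order-paid");
        long offset = log.append("order-shipped");
        System.out.println("last offset = " + offset);
        System.out.println("replay from 1 = " + log.readFrom(1));
    }
}
```

Because reads do not consume messages, two consumers can hold different offsets into the same log, which is what makes reprocessing of past events possible.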
Shard data to get scalability
Producer (1), Producer (2), Producer (3)
Messages are sent to different partitions
Cluster of machines
Partitions live on different machines
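One common way to decide which partition a message goes to is to hash its key. The sketch below assumes that approach; the class PartitionedLog and its methods are hypothetical, and it uses String.hashCode() for simplicity, whereas Kafka's own default partitioner hashes the serialized key with murmur2. The point it illustrates is that the same key always lands in the same partition, so per-key ordering is preserved while load spreads across machines.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative key-hash partitioning (hypothetical; Kafka uses a different hash function). */
final class PartitionedLog {
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionedLog(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Maps a key to a partition index; the same key always maps to the same index. */
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    /** Appends the value to the partition chosen for its key and returns that partition index. */
    int send(String key, String value) {
        int partition = partitionFor(key);
        partitions.get(partition).add(value);
        return partition;
    }

    /** Returns the messages stored in one partition, oldest first. */
    List<String> partition(int index) {
        return partitions.get(index);
    }

    public static void main(String[] args) {
        PartitionedLog log = new PartitionedLog(3);
        int first = log.send("card-42", "attempt-1");
        int second = log.send("card-42", "attempt-2");
        System.out.println("same partition: " + (first == second));
        System.out.println("contents: " + log.partition(first));
    }
}
```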
STREAM PROCESSING
● Create and store materialized views
● Filter
● Analyze in-flight
Stream processing modalities with Confluent
● Kafka producer/consumer
● Kafka Streams
● KSQL
Stream processing approach comparison

Kafka producer/consumer:

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

// stateStore, StateStoreException, RetryUtils, and MAX_RETRIES stand in for an
// application-specific state store and retry helper; they are not Kafka APIs.
ConsumerRecords<String, Integer> records = consumer.poll(Duration.ofMillis(100));

// Sum the values per key for this batch.
Map<String, Integer> counts = new HashMap<>();
for (ConsumerRecord<String, Integer> record : records) {
    counts.merge(record.key(), record.value(), Integer::sum);
}

// Add each batch total to the stored total, retrying on store failures.
for (Map.Entry<String, Integer> entry : counts.entrySet()) {
    int attempts = 0;
    while (attempts++ < MAX_RETRIES) {
        try {
            int stateCount = stateStore.getValue(entry.getKey());
            stateStore.setValue(entry.getKey(), entry.getValue() + stateCount);
            break;
        } catch (StateStoreException e) {
            RetryUtils.backoff(attempts);
        }
    }
}

Kafka Streams:

builder
    .stream("input-stream", Consumed.with(Serdes.String(), Serdes.String()))
    .groupBy((key, value) -> value)
    .count()
    .toStream()
    .to("counts", Produced.with(Serdes.String(), Serdes.Long()));

KSQL:

SELECT x, count(*) FROM stream GROUP BY x
EMIT CHANGES;
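All three approaches above compute the same thing: a running count per value that is updated as each event arrives. A minimal in-memory sketch of that behavior follows, with no Kafka dependency. The class RunningCountSketch and its methods are hypothetical names for illustration, and a real deployment would also need the fault-tolerant state that Kafka Streams and KSQL provide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative running count that emits one change per input (hypothetical sketch). */
final class RunningCountSketch {
    private final Map<String, Long> counts = new HashMap<>();

    /** Processes one value and returns its updated count. */
    long process(String value) {
        return counts.merge(value, 1L, Long::sum);
    }

    /** Processes values in arrival order and returns one "value=count" change per input. */
    List<String> changes(List<String> values) {
        List<String> emitted = new ArrayList<>();
        for (String value : values) {
            emitted.add(value + "=" + process(value));
        }
        return emitted;
    }

    public static void main(String[] args) {
        RunningCountSketch sketch = new RunningCountSketch();
        System.out.println(sketch.changes(List.of("a", "b", "a")));
    }
}
```

Unlike a batch query, which returns one final row per group, each incoming event here produces an updated row for its group.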
KSQL Example Use Cases
● Data exploration
● Data enrichment
● Streaming ETL
● Filter, cleanse, mask
● Real-time monitoring
● Anomaly detection
Example: Retail
Stream of shipments that arrive
Stream of purchases from online and physical stores
KSQL joins the two streams in real-time
Inventory on hand
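As a rough illustration of the retail example, the sketch below folds both event streams into a running inventory per item. It assumes that inventory on hand equals shipments received minus purchases, which the slide implies but does not spell out. The class InventorySketch and its methods are hypothetical, and it is a plain in-memory model rather than a KSQL join.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative inventory derived from two event streams (hypothetical sketch). */
final class InventorySketch {
    private final Map<String, Integer> stock = new HashMap<>();

    /** Applies a shipment event and returns the item's updated inventory. */
    int onShipment(String item, int quantity) {
        return stock.merge(item, quantity, Integer::sum);
    }

    /** Applies a purchase event and returns the item's updated inventory. */
    int onPurchase(String item, int quantity) {
        return stock.merge(item, -quantity, Integer::sum);
    }

    /** Current inventory on hand for the item, or 0 if no events were seen. */
    int onHand(String item) {
        return stock.getOrDefault(item, 0);
    }

    public static void main(String[] args) {
        InventorySketch inventory = new InventorySketch();
        inventory.onShipment("widget", 10);
        inventory.onPurchase("widget", 3);
        System.out.println("widget on hand = " + inventory.onHand("widget"));
    }
}
```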
Example: CDC from DB via Kafka to Elastic
Customers
Kafka Connect streams data in
KSQL processes table changes in real-time
Kafka Connect streams data out
Example: IoT, Automotive, Connected Cars
Cars send telemetry data via Kafka API
Kafka Connect streams data in
KSQL joins the two streams in real-time
Kafka Streams application to notify customers
Customers
KSQL for Real-Time Monitoring
CREATE STREAM syslog_invalid_users AS
  SELECT host, message
  FROM syslog
  WHERE message LIKE '%Invalid user%';
● Log data monitoring
● Tracking and alerting
● Syslog data
● Sensor / IoT data
● Application metrics
http://cnfl.io/syslogs-filtering
http://cnfl.io/syslog-alerting
KSQL for Anomaly Detection
CREATE TABLE possible_fraud AS
  SELECT card_number, COUNT(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 SECONDS)
  GROUP BY card_number
  HAVING COUNT(*) > 3;
● Identify patterns or anomalies in real-time data, surfaced in milliseconds
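To make the tumbling-window logic in the query above concrete, here is a minimal in-memory sketch. It is not how KSQL is implemented, and the class TumblingWindowCounter and its methods are hypothetical. A tumbling window is fixed-size and non-overlapping, so each timestamp belongs to exactly one window, and a card is flagged once its count within a single window exceeds the threshold.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative tumbling-window counter (hypothetical; not how KSQL is implemented). */
final class TumblingWindowCounter {
    private final long windowSizeMs;
    // window start timestamp -> (key -> count within that window)
    private final Map<Long, Map<String, Integer>> counts = new HashMap<>();

    TumblingWindowCounter(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    /** Start of the fixed, non-overlapping window that contains the timestamp. */
    long windowStart(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    /** Records one event and returns the key's running count in its window. */
    int record(String key, long timestampMs) {
        return counts
            .computeIfAbsent(windowStart(timestampMs), w -> new HashMap<>())
            .merge(key, 1, Integer::sum);
    }

    /** Keys whose count in the given window exceeds the threshold, like HAVING COUNT(*) > threshold. */
    Set<String> keysOver(long windowStartMs, int threshold) {
        Set<String> result = new TreeSet<>();
        counts.getOrDefault(windowStartMs, Map.of()).forEach((key, count) -> {
            if (count > threshold) {
                result.add(key);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        TumblingWindowCounter counter = new TumblingWindowCounter(5_000);
        for (long t = 0; t < 4_000; t += 1_000) {
            counter.record("card-1", t);
        }
        counter.record("card-2", 100);
        System.out.println("possible fraud in first window: " + counter.keysOver(0, 3));
    }
}
```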
KSQL for Streaming ETL
CREATE STREAM vip_actions AS
  SELECT c.user_id, page, action
  FROM clickstream c
  LEFT JOIN users u
  ON c.user_id = u.user_id
  WHERE u.level = 'Platinum';
● Joining, filtering, and aggregating streams of event data
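The join-and-filter pattern in the query above can be sketched without Kafka as a lookup of each stream event against a table. This is an illustrative model only: the class VipActionsSketch, the Click type, and the map standing in for the users table are hypothetical. Clicks from unknown users get a null level, as in a LEFT JOIN, and the Platinum filter then drops them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative stream-table join with a filter (hypothetical; not how KSQL is implemented). */
final class VipActionsSketch {

    /** One clickstream event; field names mirror the columns in the query. */
    static final class Click {
        final String userId;
        final String page;
        final String action;

        Click(String userId, String page, String action) {
            this.userId = userId;
            this.page = page;
            this.action = action;
        }
    }

    /**
     * Looks up each click's user in the users table (user_id -> level) and keeps
     * the click only when the level is Platinum. Unmatched users yield a null
     * level and are dropped by the filter.
     */
    static List<Click> vipActions(List<Click> clickstream, Map<String, String> userLevels) {
        List<Click> result = new ArrayList<>();
        for (Click click : clickstream) {
            String level = userLevels.get(click.userId);
            if ("Platinum".equals(level)) {
                result.add(click);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> users = Map.of("u1", "Platinum", "u2", "Gold");
        List<Click> clicks = List.of(
            new Click("u1", "/home", "view"),
            new Click("u2", "/cart", "add"),
            new Click("u3", "/home", "view"));
        for (Click click : vipActions(clicks, users)) {
            System.out.println(click.userId + " " + click.page + " " + click.action);
        }
    }
}
```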
Events as a backbone
[Diagram: apps and {faas} functions in Payments, Department 2, Department 3, and Department 4, all connected through events]
Change drivers
● Process data on the move
● Feedback loop
● Cloud friendly
● Automation
It takes time :)
Event Streaming Maturity Model
Data Streaming; typical maturity stages
Axes: Value vs. Maturity (Investment & time)
Stage markers: 1, 2, 3, 4, 5
Capabilities: Pub + Sub, Store, Process
Progression: Projects -> Platform
● Pre-Streaming: Legacy systems. Batch processes. Complex / Slow!
● Developer Interest: Developer downloads Kafka & experiments (15 mins on laptop).
● Enterprise Streaming Pilot / Early Production: LOB Pilot; small teams experimenting with pub/sub / integration. -> 1-3 use cases quickly moved into Production. Fragmented.
● SLA Ready, Integrated Streaming: Multiple mission critical use cases in production, with scale, DR & SLAs. Streaming clearly delivering business value, with C-suite visibility.
● Global Streaming: Streaming Platform managing majority of mission critical data processes, globally, with multi-datacenter replication across on-prem and hybrid clouds.
● Central Nervous System: All data in the organization managed through a single logical streaming platform. -> Digital natives / digital pure players - probably using Machine Learning & AI (Relational databases - redundant)
developer.confluent.io
Guru Sattanathan | @avoguru
