1
Kai Waehner | Technology Evangelist, Confluent
contact@kai-waehner.de | LinkedIn | @KaiWaehner | www.confluent.io | www.kai-waehner.de
Streaming Machine Learning with Apache Kafka and Confluent in the Finance Industry
2
Event Streaming in the Finance Industry
Check past Kafka Summit videos for details about the use cases:
https://kafka-summit.org/past-events/
www.kai-waehner.de | @KaiWaehner
3
Machine Learning to Improve Traditional Use Cases and Build New Ones in the Finance Industry
[Chart: windows of opportunity ranging from seconds to minutes to hours]
Use cases: Short-Sale; Risk Calculation / Trade Approval; Wealth Management; Credit Card Fraud Detection; Next-Best Offer; Know Your Customer (KYC); Customer Service; Inventory Management; Regulatory Reporting; Account Login Fraud Detection; Anomaly Detection Across Assets and Locations; Derivatives Pricing; Compliance; Trading Post-Processing; Strategic Planning and Simulations
4
Use Case: Fraud Detection
“49 percent of the 7,200 companies they surveyed had experienced fraud of some kind”
5
Global Bank Builds Fraud Detection Infrastructure
Digital transformation goals:
• Improve customer experience
• Increase revenue
• Reduce risk
Timeline:
• 3 years ago: project begins
• Today: instant payment infrastructure in production for first use cases
• 2 years in the future: improved processes leveraging machine learning; first use case: payment fraud detection
6
Streaming Analytics for Fraud Detection at Scale
[Diagram components: Payment App; Integration Layer; Streaming Platform; Real Time Alerting System; Big Data Integration Layer; Batch Analytics Platform; BI Dashboard; Other Components. Flows: Ingest Data, All Data, Alert, Human Intelligence.]
7
Machine Learning (ML)
...allows computers to find hidden insights without being explicitly programmed where to look.
Machine
Learning
• Decision Trees
• Naïve Bayes
• Clustering
• Neural Networks
• Etc.
Deep
Learning
• CNN
• RNN
• Transformer
• Autoencoder
• Etc.
8
Main Problem in Machine Learning?
There is an impedance mismatch between data science and mission-critical, scalable, real-time infrastructure!
9
The First Analytic Models
How to deploy the models in production?
…with real-time processing?
…at scale?
…24/7 with zero downtime?
10
Hidden Technical Debt in Machine Learning Systems
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
11
Scalable, Technology-Agnostic Machine Learning Infrastructures
https://www.infoq.com/presentations/netflix-ml-meson
https://eng.uber.com/michelangelo
https://www.infoq.com/presentations/paypal-data-service-fraud
12
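Event Streaming Platform – The Commit Log
[Diagram: a producer (P) appends events to the log over time; consumers C1, C2, and C3 read from it independently.]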
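To make the commit-log picture concrete, here is a minimal in-memory sketch in plain Python. It does not use the Kafka client API; `CommitLog`, `Consumer`, and the payment values are hypothetical. It shows the two properties the slide relies on: records are only appended and addressed by offset, and each consumer keeps its own offset, so reading does not remove data and C1, C2, and C3 can move at their own pace.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Append-only log (hypothetical sketch): records are never modified, only appended."""
    records: list = field(default_factory=list)

    def append(self, value) -> int:
        """Append a record and return its offset (its position in the log)."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Return up to max_records records starting at offset; nothing is removed."""
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Each consumer tracks its own position in the log."""
    name: str
    offset: int = 0

    def poll(self, log: CommitLog, max_records: int = 10) -> list:
        batch = log.read(self.offset, max_records)
        self.offset += len(batch)  # advance only this consumer's position
        return batch


log = CommitLog()
for payment in ["p1", "p2", "p3", "p4"]:
    log.append(payment)

c1, c2 = Consumer("C1"), Consumer("C2", offset=2)
print(c1.poll(log))  # ['p1', 'p2', 'p3', 'p4']
print(c2.poll(log))  # ['p3', 'p4']
```

A consumer that joins later (or restarts at offset 0) still sees the full history, which is what enables reprocessing.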
13
Event Streaming Platform – A Distributed System for 24/7 Operation and Zero Data Loss
[Diagram: Topic1 with four partitions spread across Broker 1 to Broker 4; each partition has three replicas (one leader, two followers) on different brokers.]
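The partition layout in the diagram can be approximated with a small sketch. Two assumptions to flag: key-to-partition mapping uses `zlib.crc32` as a stand-in for the murmur2 hash that Kafka's Java producer uses by default for keyed records, and replica placement is a simplified round-robin (Kafka's real assignment also staggers replicas and can be rack-aware). The function names are hypothetical.

```python
import zlib


def partition_for_key(key: str, num_partitions: int) -> int:
    """Map a record key to a partition. The same key always lands in the same
    partition, which preserves per-key ordering. crc32 stands in for murmur2."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_replicas(num_partitions: int, brokers: list, replication_factor: int) -> dict:
    """Simplified round-robin placement: replica i of partition p goes to
    broker (p + i) mod N. The first replica is the leader, the rest are followers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        replicas = [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        assignment[p] = {"leader": replicas[0], "followers": replicas[1:]}
    return assignment


brokers = ["broker1", "broker2", "broker3", "broker4"]
layout = assign_replicas(num_partitions=4, brokers=brokers, replication_factor=3)
for partition, replicas in layout.items():
    print(f"Topic1 partition{partition + 1}: {replicas}")

# Deterministic: the same key maps to the same partition every time
print(partition_for_key("smartphone-42", 4) == partition_for_key("smartphone-42", 4))  # True
```

With four brokers and a replication factor of three, every broker hosts three replicas and leads exactly one partition, so losing any single broker still leaves two copies of each partition.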
14
A Streaming Platform is the Underpinning of an Event-driven Architecture
[Diagram: producers (microservices, DBs, SaaS apps, mobile) and consumers (Customer 360, real-time fraud detection, data warehouse) connected through streams of real-time events (database change, microservices events, SaaS data, customer experiences) via connectors and stream processing apps.]
15
Apache Kafka at Scale at Tech Giants
> 7 trillion messages / day, > 6 petabytes / day
“You name it”
* Kafka is not just used by tech giants
** Kafka is not just used for big data
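The headline numbers translate into sustained rates as follows. For this estimate only, assume a uniform rate over the day and decimal units (1 PB = 10^15 bytes); real traffic peaks higher than these averages.

```latex
\[
\frac{7\times10^{12}\ \text{messages}}{86\,400\ \text{s}} \approx 8.1\times10^{7}\ \text{messages/s},
\qquad
\frac{6\times10^{15}\ \text{bytes}}{86\,400\ \text{s}} \approx 6.9\times10^{10}\ \text{bytes/s} \approx 69\ \text{GB/s}
\]
```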
16
Business Value per Use Case
Business value: Improve Customer Experience (CX); Increase Revenue (make money, $↑); Decrease Costs (save money, $↓); Mitigate Risk (protect money, $)
Key drivers / strategic objectives (sample): Core Business Platform; Increase Operational Efficiency; Migrate to Cloud; Regulatory; Digital Transformation
Example use cases: Fraud Detection; IoT sensor ingestion; Digital replatforming / Mainframe Offload; Customer 360; Faster transactional processing / analysis incl. Machine Learning / AI; Microservices Architecture; Online Fraud Detection; Online Security (syslog, log aggregation, Splunk replacement); Middleware replacement; Website / Core Operations (Central Nervous System); Real-time app updates
Example case studies (of many):
• Connected Car: Navigation & improved in-car experience: Audi
• Simplifying Omni-channel Retail at Scale: Target
• Mainframe Offload: RBC
• Application Modernization: Multiple Examples
• The [Silicon Valley] Digital Natives: LinkedIn, Netflix, Uber, Yelp...
• Predictive Maintenance: Audi
• Streaming Platform in a regulated environment (e.g. Electronic Medical Records): Celmatix
• Real Time Streaming Platform for Communications and Beyond: Capital One
• Developer Velocity - Building Stateful Financial Applications with Kafka Streams: Funding Circle
• Detect Fraud & Prevent Fraud in Real Time: PayPal
• Kafka as a Service - A Tale of Security and Multi-Tenancy: Apple
17
Apache Kafka’s Open Ecosystem as Infrastructure for ML
18
Apache Kafka’s Open Ecosystem as Infrastructure for ML
[Diagram: Kafka Streams / ksqlDB, Kafka Connect, REST Proxy, Schema Registry, Go / .NET Kafka producer, ksqlDB, Python consumer]
19
Streaming Analytics for Fraud Detection at Scale
[Architecture diagram. Components: payment mobile app; edge gateway with edge connector and real-time edge computing (Model Lite); event streaming platform with integration and stream processing; database integration; consumer; real-time application with an embedded model or a model server called via RPC; analytics department (model training, BI); fraud department; other components.]
Data flow:
(1) Ingest data
(2) Preprocess data
(3) Read data
(4) Train fraud model
(5) Deploy fraud model
(6a) Consume payment data
(6b) All data
(7) Potential fraud
(8a) Alert user
(8b) Alert fraud department (e.g. mobile app)
20
Ingestion of Payment Data
[Diagram components: payment app; Kafka Connect; replication with MirrorMaker 2 / Confluent Replicator.]
21
Data Preprocessing
[Diagram: stream processing (Streams) preprocesses the data (filter, transform, anonymize, extract features) so that it is ready for model training.]
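A sketch of the four preprocessing steps applied to a single payment event. The field names (`smartphone_id`, `amount`, `timestamp`), the salted SHA-256 pseudonymization, and the chosen features are illustrative assumptions, not the schema used in the talk; in a real deployment this logic would run inside Kafka Streams or ksqlDB rather than a Python loop.

```python
import hashlib
from datetime import datetime, timezone

SALT = b"demo-salt"  # hypothetical; in practice a managed secret


def anonymize(value: str) -> str:
    """Pseudonymize an identifier with a salted SHA-256 hash (truncated for readability)."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()[:16]


def preprocess(event: dict):
    """Filter, transform, anonymize, and extract features from one payment event.
    Returns None for events that are filtered out."""
    # Filter: drop malformed or non-positive payments
    if event.get("amount") is None or event["amount"] <= 0:
        return None
    # Transform: convert the epoch-seconds timestamp to a UTC datetime
    ts = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
    return {
        # Anonymize: the raw device identifier never leaves this step
        "device": anonymize(event["smartphone_id"]),
        # Extract features
        "amount": float(event["amount"]),
        "hour_of_day": ts.hour,
        "is_night": ts.hour < 6,
    }


events = [
    {"payment_id": 1, "smartphone_id": "phone-a", "amount": 25.0, "timestamp": 1_600_000_000},
    {"payment_id": 2, "smartphone_id": "phone-b", "amount": -5.0, "timestamp": 1_600_000_100},
]
features = [f for f in (preprocess(e) for e in events) if f is not None]
print(features)
```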
22
SELECT p.payment_id, p.smartphone_id, p.payment_details
FROM payment p
LEFT JOIN user_database u ON p.smartphone_id = u.smartphone_id
WHERE u.payment_type = 'Apple Pay'
EMIT CHANGES;
Preprocessing with ksqlDB
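The query on this slide joins a payment stream against a user table and then filters on a column from the table. The plain-Python emulation below uses hypothetical data and treats the table as a static snapshot, whereas ksqlDB looks up the table state current at each event. It makes one subtlety visible: because the `WHERE` clause tests a column from the right-hand side, payments with no matching user are dropped, so the `LEFT JOIN` ends up behaving like an inner join.

```python
def left_join_and_filter(payments, users_by_phone):
    """Emulate: payment p LEFT JOIN user_database u ON smartphone_id
    WHERE u.payment_type = 'Apple Pay' (hypothetical sketch)."""
    results = []
    for p in payments:
        # Table lookup by key; None means "no matching row" (NULL columns)
        u = users_by_phone.get(p["smartphone_id"])
        # LEFT JOIN keeps unmatched payments with NULL user columns ...
        payment_type = u["payment_type"] if u is not None else None
        # ... but the WHERE clause on a user column then filters them out
        if payment_type == "Apple Pay":
            results.append({
                "payment_id": p["payment_id"],
                "smartphone_id": p["smartphone_id"],
                "payment_details": p["payment_details"],
            })
    return results


payments = [
    {"payment_id": 1, "smartphone_id": "phone-a", "payment_details": "25.00 EUR"},
    {"payment_id": 2, "smartphone_id": "phone-b", "payment_details": "12.50 EUR"},
    {"payment_id": 3, "smartphone_id": "phone-c", "payment_details": "99.00 EUR"},  # no user row
]
users_by_phone = {
    "phone-a": {"payment_type": "Apple Pay"},
    "phone-b": {"payment_type": "Credit Card"},
}
result = left_join_and_filter(payments, users_by_phone)
print(result)
```

If unmatched payments should be kept, the filter would need to allow NULLs (for example `u.payment_type IS NULL OR ...`).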
23
Data Ingestion into a Data Store for Model Training (and Consumption by Other Decoupled Applications)
[Diagram: preprocessed data flows through Kafka Connect to consumers operating in batch, near-real-time, and real-time mode.]
24
Model Training Using an Elastic Infrastructure in the Cloud
Extreme scale using TensorFlow and TPUs in the cloud!
[Diagram: analytic model]
25
TensorFlow Model: Autoencoder for Anomaly Detection (Fraudulent Payments)
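TensorFlow is outside the scope of a short standalone snippet, so this sketch swaps in a linear stand-in for the autoencoder: a one-dimensional bottleneck fitted as the first principal component (a linear autoencoder trained with squared error learns the same subspace). The idea carried over is the one the slide relies on: learn to reconstruct normal payments, then flag inputs whose reconstruction error exceeds a threshold. The two-feature data points and the "3x the worst training error" threshold are hypothetical choices.

```python
import math


def fit_linear_autoencoder(points, iterations=100):
    """Fit a 1-D linear 'autoencoder' (first principal component) to 2-D points
    using power iteration on the covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    vx, vy = 1.0, 1.0  # starting direction for power iteration
    for _ in range(iterations):
        vx, vy = sxx * vx + sxy * vy, sxy * vx + syy * vy
        norm = math.hypot(vx, vy)
        if norm == 0.0:  # degenerate data: all points identical
            break
        vx, vy = vx / norm, vy / norm
    return (mx, my), (vx, vy)


def reconstruction_error(point, model):
    """Encode to one number, decode back to 2-D, and return the squared error."""
    (mx, my), (vx, vy) = model
    dx, dy = point[0] - mx, point[1] - my
    code = dx * vx + dy * vy        # encoder: 2-D -> 1-D bottleneck
    rx, ry = code * vx, code * vy   # decoder: 1-D -> 2-D
    return (dx - rx) ** 2 + (dy - ry) ** 2


# Hypothetical "normal" payments: two strongly correlated features
normal = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0), (5.0, 10.1)]
model = fit_linear_autoencoder(normal)
threshold = 3 * max(reconstruction_error(p, model) for p in normal)

suspicious = (3.0, -4.0)  # breaks the learned correlation
print(reconstruction_error(suspicious, model) > threshold)
```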
26
Streaming Ingestion and Model Training with TensorFlow IO
Direct streaming ingestion for model training with TensorFlow I/O + Kafka plugin (no additional data storage like S3 or HDFS required!)
[Diagram: a producer writes to the distributed commit log; Model A and Model B consume from it over time.]
https://github.com/tensorflow/io
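The point of this slide is that training can read straight from the log instead of from a copy in S3 or HDFS. The following plain-Python sketch of that pattern does not use TensorFlow I/O; `batches`, `EwmaModel`, and the amounts are hypothetical. Each model reads the same log in mini-batches from its own offset, so Model A and Model B train independently, here with different smoothing hyperparameters and batch sizes.

```python
def batches(log, start_offset=0, batch_size=32):
    """Yield (next_offset, batch) pairs read directly from the log."""
    offset = start_offset
    while offset < len(log):
        batch = log[offset:offset + batch_size]
        offset += len(batch)
        yield offset, batch


class EwmaModel:
    """Toy online 'model': an exponentially weighted moving average of the
    payment amount, updated batch by batch while the stream is read."""

    def __init__(self, alpha):
        self.alpha = alpha    # hyperparameter: how fast the model adapts
        self.mean = None
        self.offset = 0       # position in the log this model has consumed up to

    def train(self, log, batch_size):
        for next_offset, batch in batches(log, self.offset, batch_size):
            for amount in batch:
                if self.mean is None:
                    self.mean = amount
                else:
                    self.mean = self.alpha * amount + (1 - self.alpha) * self.mean
            self.offset = next_offset  # record progress after each batch


amounts = [10.0, 12.0, 11.0, 250.0, 9.0, 10.5, 11.5, 12.5]  # hypothetical payment amounts
model_a = EwmaModel(alpha=0.1)  # slow-adapting
model_b = EwmaModel(alpha=0.5)  # fast-adapting
model_a.train(amounts, batch_size=3)
model_b.train(amounts, batch_size=4)
print(model_a.mean, model_b.mean)
```

Because each model keeps its own offset, a new model (or a retrained one with different hyperparameters) can start from offset 0 at any time without affecting the others.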
27
Long-Term Storage in Kafka?
We use a data lake for long-term storage!
28
Today, Kafka works well for recent events, short-horizon storage, and manual data balancing
Kafka’s present-day design offers extraordinarily low messaging latency by storing topic data on fast disks that are collocated with brokers. This is usually good. But sometimes, you need to store a huge amount of data for a long time.
[Diagram: apps use Kafka, whose brokers combine processing (transactions, auth, quota enforcement, compaction, ...) with local storage.]
29
Tiered Storage for Kafka
[Diagram: Kafka brokers handle processing (transactions, auth, quota enforcement, compaction, ...) with local storage plus remote storage in an object store; apps connect to Kafka.]
Store Forever: Older data is offloaded to inexpensive object storage, permitting it to be consumed at any time.
Save $$$: Storage limitations, like capacity and duration, are effectively uncapped.
Instantaneously scale up and down: Your Kafka clusters will be able to automatically self-balance load and hence elastically scale.
(Only available in Confluent Platform)
30
Cloud-Native, Scalable and Elastic Kafka
[Diagram: a broker before and after a rebalance; each broker handles processing (transactions, auth, quota enforcement, compaction, ...) with local and remote storage and serves clients.]
(Only available in Confluent Platform)
31
Tiered Storage User Experience
$ bin/kafka-topics --bootstrap-server localhost:9092 \
    --create \
    --topic trades \
    --partitions 6 \
    --replication-factor 3 \
    --config confluent.tier.enable=true \
    --config confluent.tier.local.hotset.ms=60000 \
    --config retention.ms=-1
(Only available in Confluent Platform)
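The effect of the two tiering settings in the command can be illustrated with a simplified placement policy. This is a sketch under stated assumptions, not Confluent's implementation: it assumes closed segments are copied to the object store, keeps a local copy only while a segment is younger than `hotset_ms`, always keeps the active segment local, and treats `retention_ms=-1` as "never delete". The `Segment` type and `placement` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base_offset: int
    max_timestamp_ms: int   # newest record timestamp in the segment
    active: bool = False    # the segment currently being written to


def placement(segments, now_ms, hotset_ms, retention_ms=-1):
    """Return where each segment lives under the simplified tiering policy."""
    result = {}
    for seg in segments:
        age = now_ms - seg.max_timestamp_ms
        if seg.active:
            result[seg.base_offset] = "local"           # still being written
        elif retention_ms >= 0 and age > retention_ms:
            result[seg.base_offset] = "deleted"         # only with finite retention
        elif age <= hotset_ms:
            result[seg.base_offset] = "local+remote"    # hot: served from local disk
        else:
            result[seg.base_offset] = "remote"          # cold: served from object store
    return result


segments = [
    Segment(0, max_timestamp_ms=100_000),
    Segment(500, max_timestamp_ms=950_000),
    Segment(900, max_timestamp_ms=999_000, active=True),
]
where = placement(segments, now_ms=1_000_000, hotset_ms=60_000)  # retention.ms=-1
print(where)
```

Consumers can still read the "remote" segments; only the local disk footprint shrinks, which is why `retention.ms=-1` becomes affordable.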
32
Simplified Data Lake Architecture
● No need to build your own pipeline between Kafka and yet another data lake (like HDFS or AWS S3)
● Kafka is the central source of truth / system of record
○ Event-based
○ Guaranteed order
○ Preserves offsets
● Cost reduction
○ No need for another data store (i.e., a separate data lake)
● Potential: Kafka as long-term “backup”?
○ Define what “backup” means (same for HDFS etc.)
○ E.g. AWS S3 gives you SLAs for HA and data loss. Is this sufficient?
33
Reprocessing of Events
● New Consumer
○ e.g. a completely new microservice or a replacement for an existing application
● Error-Handling
○ Reprocessing of data in case of an error: fix the error and process the events again
● Compliance / Regulatory Processing
○ Reprocessing of already processed data for legal reasons
○ Could be very old data (e.g. pharma: 10 years old)
● Query and Analysis of Existing Events
○ No need for another data store / data lake
○ Kafka Client Consumer for offset- or timestamp-based consumption of old events
○ ksqlDB (for simple pull queries)
○ Kafka-native analytics tool (e.g. Rockset with a Kafka connector and ANSI SQL support for Tableau et al.)
● Model Training
○ Consume events for model training with a) the same ML framework and different hyperparameters or b) different ML frameworks
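For the offset- or timestamp-based replay mentioned above, the Java consumer provides `offsetsForTimes()` (earliest offset whose timestamp is greater than or equal to the target) followed by `seek()`. The lookup can be sketched in plain Python with a binary search, assuming timestamps that never decrease along the log (true for log-append time, not guaranteed for producer-set create time). The function name and data are hypothetical.

```python
import bisect


def offset_for_timestamp(timestamps, target_ms):
    """Return the earliest offset whose timestamp is >= target_ms,
    or None if every record is older than target_ms.
    Assumes non-decreasing timestamps, which makes binary search valid."""
    i = bisect.bisect_left(timestamps, target_ms)
    return i if i < len(timestamps) else None


# Offsets are list positions; values are record timestamps in ms (hypothetical)
timestamps = [1000, 2000, 2000, 3500, 5000]
events = ["a", "b", "c", "d", "e"]

start = offset_for_timestamp(timestamps, 3000)
replay = events[start:] if start is not None else []  # "seek" and reprocess from there
print(start, replay)  # 3 ['d', 'e']
```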
34
Is Apache Kafka a Database?
https://www.kai-waehner.de/blog/2020/03/12/can-apache-kafka-replace-database-acid-storage-transactions-sql-nosql-data-lake/
35
Separation of Model Training and Model Inference
[Diagram: model training in the cloud; model deployment at the edge, where the analytic model makes local predictions.]
36
Stream Processing with External Model and RPC
[Diagram: an application (Streams) receives an input event, sends a prediction request over gRPC / HTTP to a model server (TensorFlow Serving) that hosts the model, and receives the response.]
37
TensorFlow + TF Serving + Kafka Streams
1) Import Kafka and TensorFlow Serving API
2) Configure Kafka Streams Application
3) RPC to TensorFlow Serving (and catch exceptions)
4) Start Kafka Streams App
[Diagram: filter and map steps; request/response between the application and the model server hosting the model]
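The "RPC to TensorFlow Serving (and catch exceptions)" step is the part most likely to go wrong in production. A minimal sketch of the pattern, with the gRPC/HTTP call replaced by a hypothetical local function (`remote_predict`) that can be told to fail: the per-event step catches the failure and forwards the event unscored instead of crashing the stream processor. The scoring rule and the fallback policy are illustrative assumptions.

```python
class ModelServerUnavailable(Exception):
    """Raised when the (simulated) model server cannot be reached."""


def remote_predict(features, fail=False):
    """Hypothetical stand-in for a gRPC/HTTP call to a model server.
    Returns a fraud score in [0, 1], or raises to simulate a timeout."""
    if fail:
        raise ModelServerUnavailable("model server timed out")
    return min(1.0, features["amount"] / 1000.0)


def score_event(event, fail=False):
    """Per-event step inside the stream processor (e.g. a map() call)."""
    try:
        score = remote_predict(event, fail=fail)
        return {**event, "fraud_score": score, "scored": True}
    except ModelServerUnavailable:
        # Fallback: keep the event flowing unscored, e.g. to a retry or dead-letter topic
        return {**event, "fraud_score": None, "scored": False}


print(score_event({"payment_id": 7, "amount": 250.0}))
print(score_event({"payment_id": 8, "amount": 250.0}, fail=True))
```

This is exactly the coupling the comparison slide warns about: the stream processor's availability now depends on the model server, so the failure path has to be designed explicitly.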
38
Stream Processing with Embedded Model
[Diagram: a stream processing application (Streams) receives an input event, calls doPrediction() on the embedded model, and uses the return value as the prediction.]
39
Client Application with Embedded Model
[Diagram: a client application uses a Kafka client (or the REST Proxy) to receive an input event, calls doPrediction() on the embedded model, and uses the return value as the prediction.]
40
TensorFlow + Kafka Streams
1) Import Kafka and TensorFlow API
2) Load TensorFlow Model
3) Configure Kafka Streams Application
4) Apply TensorFlow Model to Streaming Data
5) Start Kafka Streams App
[Diagram: filter and map steps]
41
CREATE STREAM FraudDetection AS
  SELECT payment_id,
         detectAnomaly(payment_values) AS anomaly_score
  FROM payment_table;
User Defined Function (UDF)
Model Deployment with Apache Kafka, ksqlDB and TensorFlow
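To show how a UDF such as `detectAnomaly` embeds the model in the query engine, here is a plain-Python sketch of the idea. Everything in it is hypothetical: the registry, the decorator, the tiny logistic "model", and its weights. Real ksqlDB UDFs are Java classes annotated with `@UdfDescription` and `@Udf`. The model is created once and evaluated in-process for every row, so no RPC is involved.

```python
import math

UDFS = {}  # hypothetical registry: SQL-visible name -> Python function


def udf(name):
    """Register a function under a query-visible name (case-insensitive lookup here)."""
    def register(fn):
        UDFS[name.lower()] = fn
        return fn
    return register


# "Model" created once when the function is registered (hypothetical weights)
WEIGHTS = [0.002, 0.5]
BIAS = -1.0


@udf("detectAnomaly")
def detect_anomaly(payment_values):
    """Embedded model: a tiny logistic score computed locally, no network call."""
    z = BIAS + sum(w * v for w, v in zip(WEIGHTS, payment_values))
    return 1.0 / (1.0 + math.exp(-z))


def run_query(rows, fn_name, arg_column, key_column):
    """Emulate SELECT key, fn(arg) AS anomaly_score FROM source, row by row."""
    fn = UDFS[fn_name.lower()]
    for row in rows:
        yield {key_column: row[key_column], "anomaly_score": fn(row[arg_column])}


rows = [
    {"payment_id": 1, "payment_values": [0.0, 2.0]},
    {"payment_id": 2, "payment_values": [0.0, 4.0]},
]
results = list(run_query(rows, "DETECTANOMALY", "payment_values", "payment_id"))
print(results)
```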
42
Stream Processing with Model Server vs. Embedded Model
Why use a model server and RPC?
• Simple integration with existing technologies and organizational processes
• Easier to understand if you come from the non-streaming world
• Later migration to real streaming is also possible
• Model management built in for different models, versioning, and A/B testing
• Monitoring built in
Why embed the model into the streaming app?
• Better latency, because inference is local instead of a remote call
• Offline inference (devices, edge processing, etc.)
• No coupling of the availability, scalability, and latency/throughput of your Kafka Streams application to the SLAs of the RPC interface
• No side effects (e.g., in case of failure); everything is covered by Kafka processing (e.g., exactly-once)
43
Streaming Analytics for Fraud Detection at Scale
[Architecture diagram with concrete technologies. Components: payment mobile app (MQTT over WebSockets); MQTT broker; MQTT connector (Kafka Connect, Confluent Proxy, or HiveMQ plugin); Kafka ecosystem with Kafka cluster, Kafka Connect, KSQL, and tiered storage; MySQL DB integrated via Kafka Connect CDC; Python client; TensorFlow I/O and TensorFlow for model training; real-time Kafka Streams application (Java / Scala) using TensorFlow or TensorFlow Serving via gRPC; real-time edge computing with a Kafka app (C / librdkafka) and TensorFlow Lite; Elasticsearch; Grafana; fraud department; other components.]
Data flow:
(1) Ingest data
(2) Preprocess data
(3) Read data
(4) Train fraud model
(5) Deploy fraud model
(6a) Consume payment data
(6b) All data
(7) Potential fraud
(8a) Alert user
(8b) Alert fraud department (e.g. mobile app)
44
Machine Learning + Apache Kafka
→ Examples on GitHub
https://github.com/kaiwaehner
45
Streaming Machine Learning with Apache Kafka and Tiered Storage
https://www.confluent.io/blog/streaming-machine-learning-with-tiered-storage/
46
One More Thing…
How to deploy this globally?
47
In 2019, We Made Clusters Stretch
Multi-Region Cluster:
• Automate disaster recovery
• Sync or async replication per topic
• Offset preserving
• Automated client failover with no custom code
(Only available in Confluent Platform)
48
Example of a Multi-Region Cluster in a Bank
Large FinServ Customer
[Diagram: Topics 1, 2, and 3 present in both regions; Topic 1 replicated synchronously, Topics 2 and 3 asynchronously.]
● Topic 1 transactions enter from us-east and us-west with fully synchronous replication
● Topics 2 and 3 in the same cluster use asynchronous replication to optimize for latency
● Automated disaster recovery
Result: Clearing time from ‘deposit’ to ‘available’ goes from 5 days to 5 seconds (including security checks)
(Only available in Confluent Platform)
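The per-topic choice between synchronous and asynchronous replication comes down to when a write is acknowledged. A toy model (hypothetical classes; it ignores Confluent's actual replica placement configuration) shows the trade-off: a synchronous topic acknowledges only after both regions have the record, so nothing acknowledged is lost if a region fails, while an asynchronous topic acknowledges after the local write and can lose whatever has not been replicated yet.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    log: list = field(default_factory=list)


class StretchedTopic:
    """Toy model of one topic in a multi-region cluster.
    synchronous=True: acknowledge only after every region has the record.
    synchronous=False: acknowledge after the local write; remote regions catch up later."""

    def __init__(self, regions, synchronous):
        self.regions = regions
        self.synchronous = synchronous
        self.pending = []  # acknowledged records not yet copied to remote regions

    def produce(self, record):
        local, *remote = self.regions
        local.log.append(record)
        if self.synchronous:
            for region in remote:
                region.log.append(record)  # replicate before acknowledging
        else:
            self.pending.append(record)    # replicate later, acknowledge now
        return "ack"

    def replicate(self):
        """Background catch-up for asynchronous topics."""
        _, *remote = self.regions
        for record in self.pending:
            for region in remote:
                region.log.append(record)
        self.pending.clear()

    def records_lost_if_local_region_fails(self):
        """Acknowledged records that exist only in the local region (the RPO exposure)."""
        return len(self.pending)


sync_topic = StretchedTopic([Region("us-east"), Region("us-west")], synchronous=True)
async_topic = StretchedTopic([Region("us-east"), Region("us-west")], synchronous=False)
for tx in ["tx1", "tx2", "tx3"]:
    sync_topic.produce(tx)
    async_topic.produce(tx)

print(sync_topic.records_lost_if_local_region_fails())   # 0
lost_before = async_topic.records_lost_if_local_region_fails()
print(lost_before)                                        # 3
async_topic.replicate()
print(async_topic.records_lost_if_local_region_fails())  # 0
```

The synchronous topic pays for RPO=0 with cross-region latency on every write, which is why the bank in the example uses it only for Topic 1.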
49
Confluent Global Eventing Platform
• Aggregate small-footprint edge deployments with replication (aggregation)
• Simplify disaster recovery operations with Multi-Region Clusters with RPO=0 and RTO=0
• Stream data globally with replication
(Coming in CP 6.0 in Q3 2020)
51
Key Takeaways
• Don’t underestimate the hidden technical debt in machine learning systems
• Leverage the Apache Kafka open source ecosystem as a scalable and flexible event streaming platform
• Use streaming machine learning with Kafka, Tiered Storage, and TensorFlow IO to simplify your big data architecture
52
Questions?
Let’s connect...
Kai Waehner
Technology Evangelist
kai.waehner@confluent.io
@KaiWaehner
www.confluent.io
www.kai-waehner.de
LinkedIn

Apache Kafka and Deep Learning in Banking and Financial Services

  • 1. 1 Kai Waehner | Technology Evangelist, Confluent contact@kai-waehner.de | LinkedIn | @KaiWaehner | www.confluent.io | www.kai-waehner.de Streaming Machine Learning with Apache Kafka and Confluent in the Finance Industry
  • 2. 2 Event Streaming in Finance Industry Check past Kafka Summit videos for details about the use cases: https://kafka-summit.org/past-events/ www.kai-waehner.de | @KaiWaehner
  • 3. 3 Machine Learning to Improve Traditional and to Build New Use Cases in the Finance Industry Seconds Minutes Hours Windows of Opportunity Short-Sale Risk Calculation / Trade Approval Wealth Management Credit Card Fraud Detection Next-Best Offer Know Your Customer (KYC) Customer Service Inventory ManagementRegulatory Reporting Account Login Fraud Detection Anomaly Detection Across Assets and Locations Derivatives Pricing Compliance Trading Post- Processing Strategic Planning and Simulations www.kai-waehner.de | @KaiWaehner
  • 4. 4 Use Case: Fraud Detection “49 percent of the 7,200 companies they surveyed had experienced fraud of some kind” www.kai-waehner.de | @KaiWaehner
  • 5. 5 Global Bank Builds Fraud Detection Infrastructure Digital Transformation • Improve customer experience • Increase revenue • Reduce risk Time Today 2 years in the future3 years ago Project begins Instant payment infrastructure in production for first use cases Improved processes leveraging machine learning – first use case: Payment Fraud Detection www.kai-waehner.de | @KaiWaehner
  • 6. 6 Streaming Analytics for Fraud Detection at Scale Integration Layer Batch Analytics Platform BI Dashboard Streaming Platform Big Data Integration Layer Payment App Streaming Platform Other Components Real Time Alerting System All Data Alert Ingest Data Human Intelligence www.kai-waehner.de | @KaiWaehner
  • 7. 7 Machine Learning (ML) ...allows computers to find hidden insights without being explicitly programmed where to look. Machine Learning • Decision Trees • Naïve Bayes • Clustering • Neural Networks • Etc. Deep Learning • CNN • RNN • Transformer • Autoencoder • Etc. www.kai-waehner.de | @KaiWaehner
  • 8. 8 Main Problem in Machine Learning? www.kai-waehner.de | @KaiWaehner There is an impedance mismatch between data science and mission-critical, scalable, real time infrastructure!
  • 9. 9 The First Analytic Models How to deploy the models in production? …real-time processing? …at scale? …24/7 zero uptime? www.kai-waehner.de | @KaiWaehner
  • 10. 10 Hidden Technical Debt in Machine Learning Systems https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf www.kai-waehner.de | @KaiWaehner
  • 11. 11 Scalable, Technology-Agnostic Machine Learning Infrastructures https://www.infoq.com/presentations/netflix-ml-meson https://eng.uber.com/michelangelo https://www.infoq.com/presentations/paypal-data-service-fraudwww.kai-waehner.de | @KaiWaehner
  • 12. 12 Event Streaming Platform – The Commit Log Time P C1 C2 C3 www.kai-waehner.de | @KaiWaehner
  • 13. 13 Event Streaming Platform – A Distributed System for 24/7 and Zero Data Loss Broker 1 Topic1 partition1 Broker 2 Broker 3 Broker 4 Topic1 partition1 Topic1 partition1 Leader Follower Topic1 partition2 Topic1 partition2 Topic1 partition2 Topic1 partition3 Topic1 partition4 Topic1 partition3 Topic1 partition3 Topic1 partition4 Topic1 partition4 www.kai-waehner.de | @KaiWaehner
  • 14. 14 A Streaming Platform is the Underpinning of an Event-driven Architecture Microservices DBs SaaS apps Mobile Customer 360 Real-time fraud detection Data warehouse Producers Consumers Database change Microservices events SaaS data Customer experiences Streams of real time events Stream processing apps Connectors Connectors Stream processing apps www.kai-waehner.de | @KaiWaehner
  • 15. Apache Kafka at Scale at Tech Giants: > 7 trillion messages / day; > 6 petabytes / day; “You name it”. * Kafka is not just used by tech giants. ** Kafka is not just used for big data.
  • 16. Business Value per Use Case. Key drivers: Improve Customer Experience (CX); Increase Revenue (make money); Decrease Costs (save money); Mitigate Risk (protect money). Strategic objectives and example use cases (sample): Core Business Platform; Increase Operational Efficiency; Migrate to Cloud; Digital Transformation; Regulatory; Fraud Detection; Online Fraud Detection; IoT sensor ingestion; Digital replatforming / Mainframe Offload; Customer 360; Faster transactional processing / analysis incl. Machine Learning / AI; Microservices Architecture; Online Security (syslog, log aggregation, Splunk replacement); Middleware replacement; Real-time app updates. Example case studies (of many): Connected Car: Navigation & improved in-car experience (Audi); Simplifying Omni-channel Retail at Scale (Target); Mainframe Offload (RBC); Application Modernization (multiple examples); Website / Core Operations as Central Nervous System (the [Silicon Valley] digital natives: LinkedIn, Netflix, Uber, Yelp...); Predictive Maintenance (Audi); Streaming Platform in a regulated environment, e.g. Electronic Medical Records (Celmatix); Real Time Streaming Platform for Communications and Beyond (Capital One); Developer Velocity: Building Stateful Financial Applications with Kafka Streams (Funding Circle); Detect Fraud & Prevent Fraud in Real Time (PayPal); Kafka as a Service: A Tale of Security and Multi-Tenancy (Apple)
  • 17. Apache Kafka’s Open Ecosystem as Infrastructure for ML
  • 18. Apache Kafka’s Open Ecosystem as Infrastructure for ML (components: Kafka Streams / ksqlDB, Kafka Connect, REST Proxy, Schema Registry, Go / .NET Kafka producer, ksqlDB, Python consumer)
  • 19. Streaming Analytics for Fraud Detection at Scale (diagram steps: (1) Ingest Data, (2) Preprocess Data, (3) Read Data, (4) Train Fraud Model, (5) Deploy Fraud Model, (6a) Consume payment data, (6b) All Data, (7) Potential Fraud, (8a) Alert User, (8b) Alert Fraud Department (e.g. Mobile App); components: Payment Mobile App, Edge Gateway, Database, Integration, Event Streaming Platform, Stream Processing, Consumer, Analytics Department, BI, Model Training, Model Server (RPC), Real Time Application, Real Time Edge Computing with Edge Connector and Model Lite, Fraud Department)
  • 20. Ingestion of Payment Data (diagram: Payment App, Kafka Connect, and replication with MirrorMaker 2 / Confluent Replicator)
  • 21. Data Preprocessing: filter, transform, anonymize and extract features with Kafka Streams, producing data ready for model training
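One way to picture the four preprocessing steps is a per-event function. In the architecture this logic would run in Kafka Streams or ksqlDB; the Python sketch below only shows the shape of the logic. The field names (`customer_id`, `amount`, `country`, `home_country`), the salt and the chosen features are hypothetical, and salted hashing is pseudonymization rather than full anonymization.

```python
import hashlib
from datetime import datetime

SALT = "demo-salt"  # hypothetical; in practice a managed secret


def anonymize(value: str) -> str:
    # Pseudonymize a personal identifier with a salted SHA-256 hash.
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()[:16]


def preprocess(event: dict):
    # Filter: drop events that are unusable for training.
    if event.get("amount") is None or event["amount"] <= 0:
        return None
    ts = datetime.fromisoformat(event["timestamp"])
    # Transform, anonymize and extract features.
    return {
        "customer": anonymize(event["customer_id"]),
        "amount": float(event["amount"]),
        "hour_of_day": ts.hour,
        "is_weekend": ts.weekday() >= 5,
        "is_foreign": event["country"] != event["home_country"],
    }


raw = [
    {"customer_id": "alice", "amount": 42.0, "timestamp": "2020-06-06T23:15:00",
     "country": "US", "home_country": "DE"},
    {"customer_id": "bob", "amount": -1, "timestamp": "2020-06-08T09:00:00",
     "country": "DE", "home_country": "DE"},
]
features = [f for f in map(preprocess, raw) if f is not None]
print(features)
```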
  • 22. Preprocessing with ksqlDB: SELECT p.payment_id, p.smartphone_id, p.payment_details FROM payment p LEFT JOIN user_database u ON p.smartphone_id = u.smartphone_id WHERE u.payment_type = 'Apple Pay' EMIT CHANGES;
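To show what the query computes, here is a pure-Python sketch of stream-table join semantics (an illustration, not how ksqlDB executes it): the `user_database` table holds only the latest row per `smartphone_id`, and each payment event is enriched with a lookup at the time it arrives. Because the `WHERE` clause filters on a column of the joined table, payments without a matching user row are dropped, so this LEFT JOIN returns the same rows an inner join would. The sample rows are made up.

```python
# Table: latest state per key, like a ksqlDB TABLE keyed by smartphone_id.
user_database = {}


def apply_user_update(row):
    # A table keeps only the latest value for each key.
    user_database[row["smartphone_id"]] = row


def enrich(payment):
    # LEFT JOIN: look up the current table row for the payment's key (may be None).
    user = user_database.get(payment["smartphone_id"])
    # WHERE u.payment_type = 'Apple Pay' also drops payments without a match.
    if user is None or user["payment_type"] != "Apple Pay":
        return None
    return {k: payment[k] for k in ("payment_id", "smartphone_id", "payment_details")}


apply_user_update({"smartphone_id": "s1", "payment_type": "Apple Pay"})
apply_user_update({"smartphone_id": "s2", "payment_type": "Credit Card"})

payments = [
    {"payment_id": 1, "smartphone_id": "s1", "payment_details": "EUR 10"},
    {"payment_id": 2, "smartphone_id": "s2", "payment_details": "EUR 20"},
    {"payment_id": 3, "smartphone_id": "s9", "payment_details": "EUR 30"},
]
results = [r for r in map(enrich, payments) if r is not None]
print(results)  # only payment 1 passes the join and the filter
```

Calling `apply_user_update` again overwrites the row for that key, so payments arriving afterwards see the new value.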
  • 23. Data Ingestion into a Data Store for Model Training (and Consumption by other Decoupled Applications) (diagram: preprocessed data flows through Kafka Connect to batch, near-real-time and real-time consumers)
  • 24. Model Training Using an Elastic Infrastructure in the Cloud: extreme scale using TensorFlow and TPUs in the cloud to train the analytic model
  • 25. TensorFlow Model: Autoencoder for Anomaly Detection (Fraudulent Payments)
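The slide names the model but not the scoring step. Autoencoder-based anomaly detection is commonly used like this: train on normal payments, compute the reconstruction error of each new payment, and flag it when the error exceeds a threshold learned from normal data. The Python sketch below shows only that thresholding logic. `reconstruct` is a deliberately trivial placeholder (it returns the training means) standing in for a trained TensorFlow autoencoder, and the two features and the 0.99 quantile are assumptions.

```python
import statistics


def reconstruction_error(x, x_hat):
    # Mean squared error between the input and its reconstruction.
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(errors, quantile=0.99):
    # Threshold = a high quantile of the errors observed on normal (non-fraud) data.
    ordered = sorted(errors)
    index = min(int(quantile * len(ordered)), len(ordered) - 1)
    return ordered[index]


# Hypothetical normal payments: [amount, is_domestic].
normal_payments = [[10.0, 1.0], [12.0, 1.0], [11.0, 0.0], [9.0, 1.0], [10.5, 0.0]]
means = [statistics.fmean(col) for col in zip(*normal_payments)]


def reconstruct(x):
    # Placeholder for autoencoder.predict(x); a trained model would go here.
    return means


threshold = fit_threshold([reconstruction_error(x, reconstruct(x)) for x in normal_payments])


def is_fraud(x):
    return reconstruction_error(x, reconstruct(x)) > threshold


print(is_fraud([10.0, 1.0]), is_fraud([950.0, 0.0]))
```

A real autoencoder reconstructs typical payments well and unusual ones badly; the placeholder only mimics that by measuring distance from the average payment.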
  • 26. Streaming Ingestion and Model Training with TensorFlow IO: direct streaming ingestion for model training with the TensorFlow I/O Kafka plugin, with no additional data storage like S3 or HDFS required (diagram: a producer writes to the distributed commit log over time; Model A and Model B are trained from it) https://github.com/tensorflow/io
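The pattern on the slide, training directly from the log instead of from a copy in S3 or HDFS, can be illustrated without TensorFlow. This is a hypothetical pure-Python sketch, not the TensorFlow I/O API: two models read the same records independently, each tracking its own offset, and are trained with different hyperparameters (here, learning rates) on the simple relationship `y = 2x`.

```python
# The "log": (feature, label) records in offset order, following y = 2 * x.
log = [(x, 2.0 * x) for _ in range(10) for x in (1.0, 2.0, 3.0)]


class StreamingModel:
    """Linear model y = w * x trained with per-record SGD while reading the log."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate
        self.w = 0.0
        self.offset = 0  # each model keeps its own read position

    def train_from(self, log, batch_size=4):
        while self.offset < len(log):
            batch = log[self.offset:self.offset + batch_size]  # one "poll"
            for x, y in batch:
                error = y - self.w * x
                self.w += self.learning_rate * error * x
            self.offset += len(batch)


model_a = StreamingModel(learning_rate=0.05)
model_b = StreamingModel(learning_rate=0.01)
model_a.train_from(log)
model_b.train_from(log)
print(round(model_a.w, 3), round(model_b.w, 3))
```

Both models end at offset 30; on the same data, the model with the larger learning rate gets much closer to the true weight of 2.0.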
  • 27. Long Term Storage in Kafka? We use a data lake for long-term storage!
  • 28. Today, Kafka works well for recent events, short-horizon storage, and manual data balancing. Kafka’s present-day design offers extraordinarily low messaging latency by storing topic data on fast disks that are collocated with brokers. This is usually good. But sometimes, you need to store a huge amount of data for a long time. (diagram: apps use a Kafka broker that combines processing, i.e. transactions, auth, quota enforcement, compaction, ..., with local storage)
  • 29. Tiered Storage for Kafka (diagram: the broker keeps processing, i.e. transactions, auth, quota enforcement, compaction, ..., while storage is split into local disks and a remote object store). Store forever: older data is offloaded to inexpensive object storage, permitting it to be consumed at any time. Save $$$: storage limitations, like capacity and duration, are effectively uncapped. Instantaneously scale up and down: your Kafka clusters will be able to automatically self-balance load and hence elastically scale. (Only available in Confluent Platform)
  • 30. Cloud-Native, Scalable and Elastic Kafka (diagram: a client and a broker with processing, i.e. transactions, auth, quota enforcement, compaction, ..., and local plus remote storage, shown before and after a rebalance) (Only available in Confluent Platform)
  • 31. Tiered Storage User Experience: $ bin/kafka-topics --bootstrap-server localhost:9092 --create --topic trades --partitions 6 --replication-factor 3 --config confluent.tier.enable=true --config confluent.tier.local.hotset.ms=60000 --config retention.ms=-1 (Only available in Confluent Platform)
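To illustrate what `confluent.tier.local.hotset.ms=60000` and `retention.ms=-1` express, here is a simplified Python model (the `TieredPartition` class is hypothetical): records older than the hot set move from local broker disk to the object store, nothing is deleted because retention is unlimited, and consumers keep reading by offset regardless of which tier holds the data. Real brokers tier whole log segments rather than individual records.

```python
HOTSET_MS = 60_000  # mirrors confluent.tier.local.hotset.ms=60000
# retention.ms=-1 means no time-based deletion, so nothing below is ever deleted.


class TieredPartition:
    def __init__(self):
        self.local = {}   # offset -> (timestamp_ms, value) on broker disk
        self.remote = {}  # offset -> (timestamp_ms, value) in the object store
        self.next_offset = 0

    def append(self, timestamp_ms, value):
        self.local[self.next_offset] = (timestamp_ms, value)
        self.next_offset += 1

    def tier(self, now_ms):
        # Offload records that have fallen out of the local hot set.
        for offset in sorted(self.local):
            timestamp_ms, _ = self.local[offset]
            if now_ms - timestamp_ms > HOTSET_MS:
                self.remote[offset] = self.local.pop(offset)

    def read(self, offset):
        # The consumer asks for an offset; which tier serves it is transparent.
        record = self.local.get(offset) or self.remote.get(offset)
        return None if record is None else record[1]


p = TieredPartition()
p.append(0, "trade-1")
p.append(30_000, "trade-2")
p.append(90_000, "trade-3")
p.tier(now_ms=100_000)
print(sorted(p.local), sorted(p.remote))
print(p.read(0), p.read(2))
```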
  • 32. Simplified Data Lake Architecture ● No need to build your own pipeline between Kafka and yet another data lake (like HDFS or AWS S3) ● Kafka is the central source of truth / system of record ○ Event-based ○ Guaranteed order (per partition) ○ Preserves offsets ● Cost reduction ○ No need for another data store (i.e., a separate data lake) ● Potential: Kafka as long-term “backup”? ○ Define what “backup” means (same for HDFS etc.) ○ E.g. AWS S3 gives you SLAs for HA and data loss. Is this sufficient?
  • 33. Reprocessing of Events ● New Consumer ○ e.g. a completely new microservice or a replacement of an existing application ● Error Handling ○ Reprocessing of data in case of error: fix the error and process the events again ● Compliance / Regulatory Processing ○ Reprocessing of already processed data for legal reasons ○ Could be very old data (e.g. pharma: 10 years old) ● Query and Analysis of Existing Events ○ No need for another data store / data lake ○ Kafka consumer client for offset- or timestamp-based consumption of old events ○ ksqlDB (for simple pull queries) ○ Kafka-native analytics tool (e.g. Rockset with Kafka connector and ANSI SQL support for Tableau et al.) ● Model Training ○ Consume events for model training with a) one ML framework and different hyperparameters or b) different ML frameworks
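For the timestamp-based case, Kafka's Java consumer provides `offsetsForTimes()`, which returns the earliest offset whose timestamp is greater than or equal to the requested time, and `seek()` to start reading there. The Python sketch below mimics that lookup over an in-memory partition. It is an illustration that assumes non-decreasing timestamps and offsets equal to list positions; `offset_for_time` and `reprocess_from` are hypothetical helpers.

```python
from bisect import bisect_left

# Partition contents as (offset, timestamp_ms, value); timestamps are non-decreasing.
partition = [
    (0, 1_000, "payment-a"),
    (1, 2_000, "payment-b"),
    (2, 2_000, "payment-c"),
    (3, 5_000, "payment-d"),
]


def offset_for_time(records, target_ms):
    """Earliest offset whose timestamp is >= target_ms, or None if there is none."""
    timestamps = [ts for _, ts, _ in records]
    i = bisect_left(timestamps, target_ms)
    return records[i][0] if i < len(records) else None


def reprocess_from(records, target_ms, handler):
    start = offset_for_time(records, target_ms)
    if start is None:
        return []
    # Offsets equal list positions here, so we can slice from the start offset.
    return [handler(value) for _, _, value in records[start:]]


print(offset_for_time(partition, 2_000))
print(reprocess_from(partition, 1_500, str.upper))
```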
  • 34. Is Apache Kafka a Database? https://www.kai-waehner.de/blog/2020/03/12/can-apache-kafka-replace-database-acid-storage-transactions-sql-nosql-data-lake/
  • 35. Separation of Model Training and Model Inference: model training in the cloud; deployment of the analytic model at the edge for local predictions
  • 36. Stream Processing with External Model and RPC (diagram: a Kafka Streams application consumes an input event, sends a request via gRPC / HTTP to a model server such as TensorFlow Serving, and turns the response into a prediction)
  • 37. TensorFlow + TF Serving + Kafka Streams: 1) Import Kafka and TensorFlow Serving API 2) Configure Kafka Streams application 3) RPC to TensorFlow Serving (and catch exceptions) 4) Start Kafka Streams app (diagram: filter and map steps; request to and response from the model server)
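Step 3, the RPC with exception handling, could look roughly like this. The sketch is Python for illustration only (the slide's application is a Kafka Streams app calling TensorFlow Serving over gRPC); `call_model_server` is a hypothetical stand-in for the remote call, and routing failed events to a dead-letter list is one common design choice, not something the slide specifies.

```python
class ModelServerError(Exception):
    pass


def call_model_server(features):
    """Hypothetical stand-in for a gRPC / HTTP request to TensorFlow Serving."""
    if features.get("amount") is None:
        raise ModelServerError("invalid request")
    return {"fraud_probability": 0.9 if features["amount"] > 1000 else 0.1}


def score_stream(events, predict=call_model_server):
    predictions, dead_letters = [], []
    for event in events:
        try:
            result = predict(event)
            predictions.append((event["payment_id"], result["fraud_probability"]))
        except ModelServerError as exc:
            # Catch RPC failures so one bad call does not stop the stream;
            # keep the event in a dead-letter output for later inspection.
            dead_letters.append((event["payment_id"], str(exc)))
    return predictions, dead_letters


events = [
    {"payment_id": 1, "amount": 50.0},
    {"payment_id": 2, "amount": 5000.0},
    {"payment_id": 3, "amount": None},
]
predictions, dead_letters = score_stream(events)
print(predictions)
print(dead_letters)
```

A real remote call would also need a deadline or timeout, because a slow model server stalls the whole stream.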
  • 38. Stream Processing with Embedded Model (diagram: a Kafka Streams application consumes an input event, calls doPrediction() on the model embedded in the app, and uses the return value as the prediction)
  • 39. Client Application with Embedded Model (diagram: a client application receives the input event through a Kafka client or the REST Proxy, calls doPrediction() on the embedded model, and uses the return value as the prediction)
  • 40. TensorFlow + Kafka Streams: 1) Import Kafka and TensorFlow API 2) Load TensorFlow model 3) Configure Kafka Streams application 4) Apply TensorFlow model to streaming data 5) Start Kafka Streams app (diagram: filter and map steps)
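In the embedded variant, the model is loaded once inside the application and each event is scored locally, with no network call. A Python sketch of that flow, assuming a hypothetical logistic-regression model with hard-coded illustrative weights in place of a loaded TensorFlow model (`do_prediction` mirrors the slide's `doPrediction()`):

```python
import math


class EmbeddedFraudModel:
    """Hypothetical logistic-regression model whose weights ship inside the app."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def do_prediction(self, features):
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))  # fraud probability


# Load the model once at startup (here: illustrative weights, not a trained model).
model = EmbeddedFraudModel(weights=[0.004, 1.5], bias=-4.0)


def process(events, threshold=0.5):
    # Filter invalid events, map each remaining one to a local prediction,
    # and keep only likely fraud cases as alerts.
    valid = (e for e in events if e["amount"] > 0)
    scored = ((e["payment_id"], model.do_prediction([e["amount"], e["is_foreign"]]))
              for e in valid)
    return [(pid, p) for pid, p in scored if p > threshold]


events = [
    {"payment_id": 1, "amount": 20.0, "is_foreign": 0},
    {"payment_id": 2, "amount": 900.0, "is_foreign": 1},
    {"payment_id": 3, "amount": -5.0, "is_foreign": 0},
]
alerts = process(events)
print(alerts)
```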
  • 41. Model Deployment with Apache Kafka, ksqlDB and TensorFlow: a User Defined Function (UDF) applies the model, e.g. “CREATE STREAM FraudDetection AS SELECT payment_id, detectAnomaly(payment_values) AS anomaly_score FROM payment;”
  • 42. Stream Processing with Model Server vs. Embedded Model. Why use a model server and RPC? • Simple integration with existing technologies and organizational processes • Easier to understand if you come from the non-streaming world • Later migration to real streaming is also possible • Model management built in for different models, versioning and A/B testing • Monitoring built in. Why embed the model into the streaming app? • Better latency, as inference is local instead of a remote call • Offline inference (devices, edge processing, etc.) • No coupling of the availability, scalability, and latency/throughput of your Kafka Streams application with the SLAs of the RPC interface • No side effects (e.g., in case of failure), all covered by Kafka processing (e.g., exactly once)
  • 43. Streaming Analytics for Fraud Detection at Scale (diagram steps: (1) Ingest Data, (2) Preprocess Data, (3) Read Data, (4) Train Fraud Model, (5) Deploy Fraud Model, (6a) Consume payment data, (6b) All Data, (7) Potential Fraud, (8a) Alert User, (8b) Alert Fraud Department (e.g. Mobile App); technologies: Payment Mobile App (MQTT over WebSockets), MQTT Broker, MQTT connector (Kafka Connect or Confluent Proxy or HiveMQ plugin), MySQL DB with Kafka Connect CDC, Kafka Cluster with Tiered Storage, Kafka Connect, KSQL, TensorFlow I/O, TensorFlow, Python client, real-time Kafka Streams application (Java / Scala) with TensorFlow Serving (gRPC), real-time edge computing (C / librdkafka) with TensorFlow Lite, Elasticsearch, Grafana, Fraud Department)
  • 44. Machine Learning + Apache Kafka → Examples @ GitHub https://github.com/kaiwaehner
  • 45. Streaming Machine Learning with Apache Kafka and Tiered Storage https://www.confluent.io/blog/streaming-machine-learning-with-tiered-storage/
  • 46. One More Thing… How to deploy this globally?
  • 47. In 2019, We Made Clusters Stretch: Multi-Region Cluster. Automate disaster recovery; sync or async replication per topic; offset preserving; automated client failover with no custom code (Only available in Confluent Platform)
  • 48. Example of a Multi-Region Cluster in a Bank (large FinServ customer; diagram: Topics 1, 2 and 3 in both regions, Topic 1 replicated synchronously, Topics 2 and 3 asynchronously) ● Topic 1 transactions enter from us-east and us-west with fully synchronous replication ● Topics 2 and 3 in the same cluster use async replication to optimize for latency ● Automated disaster recovery. Result: clearing time from ‘deposit’ to ‘available’ goes from 5 days to 5 seconds (including security checks) (Only available in Confluent Platform)
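The trade-off between the synchronous Topic 1 and the asynchronous Topics 2 and 3 can be shown with a toy model in Python (hypothetical classes, not how Confluent's multi-region clusters are implemented): a synchronous write is acknowledged only once the other region has it, so a regional failure loses nothing (RPO=0) at the cost of cross-region latency, whereas an asynchronous write is acknowledged immediately, and whatever has not been replicated yet is lost if the region fails.

```python
class TopicCopy:
    """The records of one topic as stored in one region."""

    def __init__(self, region):
        self.region = region
        self.records = []


def produce(value, primary, remote, synchronous, backlog):
    primary.records.append(value)
    if synchronous:
        # Acknowledged only after the remote region has the record too.
        remote.records.append(value)
    else:
        # Acknowledged right away; the remote copy catches up later.
        backlog.append(value)
    return "ack"


def replicate(remote, backlog, max_records):
    # Background async replication ships up to max_records from the backlog.
    for _ in range(min(max_records, len(backlog))):
        remote.records.append(backlog.pop(0))


t1_east, t1_west = TopicCopy("us-east"), TopicCopy("us-west")  # Topic 1: sync
t2_east, t2_west = TopicCopy("us-east"), TopicCopy("us-west")  # Topic 2: async
t2_backlog = []
for i in range(5):
    produce(f"txn-{i}", t1_east, t1_west, synchronous=True, backlog=[])
    produce(f"event-{i}", t2_east, t2_west, synchronous=False, backlog=t2_backlog)
replicate(t2_west, t2_backlog, max_records=3)

# Suppose us-east fails now: what does us-west still have?
print(len(t1_west.records), len(t2_west.records))
```

Here the synchronous topic keeps all five transactions in us-west, while the asynchronous topic is missing the two records still in the backlog.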
  • 49. Confluent Global Eventing Platform: aggregate small-footprint edge deployments with replication (aggregation); simplify disaster recovery operations with multi-region clusters with RPO=0 and RTO=0; stream data globally with replication (coming in CP 6.0 in Q3 2020)
  • 50. Key Takeaways: Don’t underestimate the Hidden Technical Debt in Machine Learning Systems. Leverage the Apache Kafka open source ecosystem as a scalable and flexible event streaming platform. Use streaming machine learning with Kafka, Tiered Storage and TensorFlow IO to simplify your big data architecture.
  • 51. Questions? Let’s connect... Kai Waehner, Technology Evangelist, kai.waehner@confluent.io, @KaiWaehner, www.confluent.io, www.kai-waehner.de, LinkedIn