Apache Kafka and Machine Learning / Deep Learning in the Banking and Finance Industry. See use cases, architectures (hybrid, cloud, edge) and an example of 24/7 real-time fraud detection at scale.
This session explores how and why Apache Kafka has become the de facto standard for reliable and scalable streaming infrastructures in the banking sector and financial services.
AI / machine learning and the Apache Kafka ecosystem are a great combination for training, deploying and monitoring analytic models at scale in real time. They show up in more and more projects, but often still feel like buzzwords and hype reserved for science projects.
See how to connect the dots!
- How are Kafka and Machine Learning related?
- How can they be combined to productionize analytic models in mission-critical and scalable real time applications?
- We will discuss a step-by-step approach to build a scalable and reliable real time infrastructure for fraud detection in an instant-payment application using Deep Learning and an Autoencoder for anomaly detection
Video Recording: https://youtu.be/weLGSKrkTLg
Blog Post: https://www.kai-waehner.de/blog/2020/04/14/apache-kafka-machine-learning-banking-finance-industry
We build a hybrid architecture using technologies such as Apache Kafka, Kafka Connect, Kafka Streams, ksqlDB, TensorFlow, TF Serving, TF IO, Confluent Tiered Storage, Google Cloud Platform (GCP), Google Cloud Storage (GCS), and more.
Apache Kafka and Deep Learning in Banking and Financial Services
1. 1
Kai Waehner | Technology Evangelist, Confluent
contact@kai-waehner.de | LinkedIn | @KaiWaehner | www.confluent.io | www.kai-waehner.de
Streaming Machine Learning with
Apache Kafka and Confluent
in the Finance Industry
2. 2
Event Streaming in Finance Industry
Check past Kafka Summit videos for details about the use cases:
https://kafka-summit.org/past-events/
www.kai-waehner.de | @KaiWaehner
3. 3
Machine Learning to
Improve Traditional and to Build New Use Cases
in the Finance Industry
Windows of Opportunity: Seconds → Minutes → Hours
Short-Sale
Risk Calculation /
Trade Approval
Wealth
Management
Credit Card Fraud
Detection
Next-Best
Offer
Know Your
Customer (KYC)
Customer
Service
Inventory
Management
Regulatory
Reporting
Account Login
Fraud Detection
Anomaly Detection
Across Assets and Locations
Derivatives
Pricing
Compliance
Trading Post-Processing
Strategic
Planning and
Simulations
4. 4
Use Case: Fraud Detection
“49 percent of the 7,200 companies they surveyed had experienced fraud of some kind”
5. 5
Global Bank Builds Fraud Detection Infrastructure
Digital Transformation
• Improve customer
experience
• Increase revenue
• Reduce risk
Timeline:
3 years ago – Project begins
Today – Instant payment infrastructure in production for first use cases
2 years in the future – Improved processes leveraging machine learning; first use case: Payment Fraud Detection
6. 6
Streaming Analytics for Fraud Detection at Scale
Payment App → Integration Layer → Streaming Platform (Ingest Data)
Streaming Platform → Real Time Alerting System → Human Intelligence (Alert)
Streaming Platform → Integration Layer → Big Data / Batch Analytics Platform → BI Dashboard (All Data)
Legend: Streaming Platform | Other Components
7. 7
Machine Learning (ML)
...allows computers to find hidden insights without
being explicitly programmed where to look.
Machine
Learning
• Decision Trees
• Naïve Bayes
• Clustering
• Neural Networks
• Etc.
Deep
Learning
• CNN
• RNN
• Transformer
• Autoencoder
• Etc.
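The Autoencoder in the list above is the technique the talk's fraud-detection example uses: train the model on normal payments only, then flag events whose reconstruction error is high. A toy pure-Python linear autoencoder sketches the principle; the data, dimensions and threshold are made up, and a real deployment would use the TensorFlow model described later in the deck:

```python
import random

# Toy linear autoencoder (2 inputs -> 1 hidden unit -> 2 outputs), pure Python.
# Stand-in for a real TensorFlow/Keras autoencoder: train on "normal" payments
# only, then score events by reconstruction error -- high error = anomaly.

random.seed(42)

def train(data, epochs=2000, lr=0.01):
    # encoder weights w and decoder weights v (biases omitted for brevity)
    w = [random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5)]
    v = [random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5)]
    for _ in range(epochs):
        for x in data:
            h = w[0] * x[0] + w[1] * x[1]          # encode
            y = [v[0] * h, v[1] * h]               # decode
            e = [y[0] - x[0], y[1] - x[1]]         # reconstruction error
            dv = [e[0] * h, e[1] * h]              # SGD on squared error
            dh = e[0] * v[0] + e[1] * v[1]
            dw = [dh * x[0], dh * x[1]]
            for i in range(2):
                v[i] -= lr * dv[i]
                w[i] -= lr * dw[i]
    return w, v

def score(model, x):
    """Squared reconstruction error: the anomaly score."""
    w, v = model
    h = w[0] * x[0] + w[1] * x[1]
    y = [v[0] * h, v[1] * h]
    return (y[0] - x[0]) ** 2 + (y[1] - x[1]) ** 2

# "Normal" payments: two features that lie on a correlated line
normal = [(a, a * 0.5) for a in [0.2, 0.4, 0.6, 0.8, 1.0]]
model = train(normal)

print(score(model, (0.5, 0.25)))   # normal-looking payment: low error
print(score(model, (0.1, 1.0)))    # off-pattern payment: high error
```

The same idea scales up unchanged: only the model gets bigger, the train-on-normal / score-by-reconstruction-error loop stays the same.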
8. 8
Main Problem in Machine Learning?
There is an
impedance mismatch
between data science and
mission-critical, scalable,
real time infrastructure!
9. 9
The First
Analytic Models
How to deploy the models
in production?
…real-time processing?
…at scale?
…24/7 with zero downtime?
10. 10
Hidden Technical Debt
in Machine Learning Systems
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
13. 13
Event Streaming Platform –
A Distributed System for 24/7 Uptime and Zero Data Loss
Four brokers host Topic1 with four partitions at replication factor 3:
each partition has one Leader replica and two Follower replicas, spread
across different brokers (e.g. partition1: Leader on Broker 1, Followers
on Brokers 2 and 3).
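The replica spreading shown on this slide can be sketched as a simplified round-robin assignment. This is an illustration only: Kafka's actual assignment also involves a randomized starting broker and rack awareness.

```python
# Simplified round-robin replica placement: each partition gets a leader and
# replication_factor - 1 followers on distinct brokers, mimicking the
# 4-broker / 4-partition / 3-replica layout on the slide.

def assign_replicas(brokers, partitions, replication_factor):
    assignment = {}
    for p in range(partitions):
        replicas = [brokers[(p + r) % len(brokers)]
                    for r in range(replication_factor)]
        assignment[p] = replicas  # replicas[0] is the leader
    return assignment

layout = assign_replicas(brokers=[1, 2, 3, 4], partitions=4, replication_factor=3)
for partition, replicas in layout.items():
    print(partition, replicas)   # e.g. partition 0 -> brokers [1, 2, 3]
```

Because every replica set spans three distinct brokers, any single broker can fail without losing data or availability, which is what makes the 24/7 claim on the slide possible.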
14. 14
A Streaming Platform
is the Underpinning of an Event-driven Architecture
Producers: Microservices (events), DBs (database change), SaaS apps (SaaS data), Mobile
→ Connectors / Stream processing apps →
Streams of real time events
→ Connectors / Stream processing apps →
Consumers: Customer 360, Real-time fraud detection, Data warehouse, Customer experiences
15. 15
Apache Kafka at Scale
at Tech Giants
> 7 trillion messages / day > 6 Petabytes / day
“You name it”
* Kafka is not just used by tech giants
** Kafka is not just used for big data
16. 16
Business Value per Use Case
Business
Value
Improve
Customer
Experience
(CX)
Increase
Revenue
(make money)
Decrease
Costs
(save
money)
Core Business
Platform
Increase
Operational
Efficiency
Migrate to
Cloud
Mitigate Risk
(protect money)
Key Drivers
Strategic Objectives
(sample)
Fraud
Detection
IoT sensor
ingestion
Digital
replatforming/
Mainframe Offload
Connected Car: Navigation & improved
in-car experience: Audi
Customer 360
Simplifying Omni-channel Retail at
Scale: Target
Faster transactional
processing / analysis
incl. Machine Learning / AI
Mainframe Offload: RBC
Microservices
Architecture
Online Fraud Detection
Online Security
(syslog, log aggregation,
Splunk replacement)
Middleware
replacement
Regulatory
Digital
Transformation
Application Modernization: Multiple
Examples
Website / Core
Operations
(Central Nervous System)
The [Silicon Valley] Digital Natives;
LinkedIn, Netflix, Uber, Yelp...
Predictive Maintenance: Audi
Streaming Platform in a regulated
environment (e.g. Electronic Medical
Records): Celmatix
Real-time app
updates
Real Time Streaming Platform for
Communications and Beyond: Capital One
Developer Velocity - Building Stateful
Financial Applications with Kafka
Streams: Funding Circle
Detect Fraud & Prevent Fraud in Real
Time: PayPal
Kafka as a Service - A Tale of Security
and Multi-Tenancy: Apple
Example Use Cases
$↑
$↓
$
Example Case Studies
(of many)
18. 18
Apache Kafka’s
Open Ecosystem as Infrastructure for ML
Kafka
Streams /
ksqlDB
Kafka
Connect
Rest Proxy
Schema Registry
Go/ .NET
Kafka Producer
ksqlDB
Python
Consumer
19. 19
Streaming Analytics for Fraud Detection at Scale
Legend: Event Streaming Platform | Other Components
(1) Ingest Data: Payment Mobile App → Edge Gateway / Edge Connector → Streaming Platform; Database → Integration
(2) Preprocess Data: Stream Processing
(3) Read Data: Model Training reads the preprocessed data
(4) Train Fraud Model
(5) Deploy Fraud Model: to a Model Server (RPC) for the Real Time App, and as Model Lite for Real Time Edge Computing
(6a) Consume payment data: Real Time Application (Consumer with embedded Model)
(6b) All Data: Analytics Department (BI)
(7) Potential Fraud
(8a) Alert User
(8b) Alert Fraud Department (e.g. Mobile App)
22. 22
SELECT payment_id, smartphone_id, payment_details
FROM payment p
LEFT JOIN user_database u ON p.smartphone_id = u.smartphone_id
WHERE u.payment_type = 'Apple Pay';
Preprocessing with ksqlDB
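The ksqlDB statement above is a stream-table join plus a filter: each payment event is enriched with the user record keyed by `smartphone_id`, and only 'Apple Pay' users pass through. A stdlib-only Python sketch of the same logic, with made-up sample records:

```python
# Stdlib sketch of what the ksqlDB query computes. In ksqlDB terms,
# user_table is a changelog-backed TABLE and payments is a STREAM.

user_table = {
    "phone-1": {"payment_type": "Apple Pay"},
    "phone-2": {"payment_type": "PayPal"},
}

payments = [
    {"payment_id": 1, "smartphone_id": "phone-1", "payment_details": "..."},
    {"payment_id": 2, "smartphone_id": "phone-2", "payment_details": "..."},
]

def join_and_filter(stream, table):
    out = []
    for p in stream:
        u = table.get(p["smartphone_id"])  # the LEFT JOIN lookup
        if u and u["payment_type"] == "Apple Pay":  # the WHERE clause
            out.append((p["payment_id"], p["smartphone_id"], p["payment_details"]))
    return out

print(join_and_filter(payments, user_table))
```

The difference in production is that ksqlDB runs this continuously over unbounded streams and keeps the table state up to date from a Kafka changelog topic.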
23. 23
Data Ingestion into a Data Store for Model Training
(and Consumption by other Decoupled Applications)
Connect
Preprocessed
Data
Batch Near Real Time Real Time
26. 26
Direct streaming ingestion
for model training
with TensorFlow I/O + Kafka Plugin
(no additional data storage
like S3 or HDFS required!)
Time: Model A → Model B
Producer
Distributed Commit Log
Streaming Ingestion and Model Training
with TensorFlow IO
https://github.com/tensorflow/io
27. 27
Long Term Storage in Kafka?
We use a data lake
for long-term storage!
28. 28
Today, Kafka works well
for recent events, short
horizon storage, and
manual data balancing
Kafka’s present-day design offers
extraordinarily low messaging latency by
storing topic data on fast disks that are
collocated with brokers. This is usually good.
But sometimes, you need to store a huge
amount of data for a long time.
Kafka
Processing
App
Storage
Transactions, auth,
quota enforcement,
compaction, ...
29. 29
Tiered Storage for Kafka
Object Store
Processing Storage
Transactions,
auth, quota
enforcement,
compaction, ...
Local
Remote
Kafka
Apps
Store Forever
Storage limitations, like capacity and duration, are
effectively uncapped.
Save $$$
Older data is offloaded to inexpensive object
storage, permitting it to be consumed at any time.
Instantaneously scale up and down
Your Kafka clusters will be able to automatically
self-balance load and hence elastically scale
(Only available in Confluent Platform)
30. 30
Cloud-Native, Scalable and Elastic Kafka
Before Re-balance
Broker
Processing Storage
Local
Remote
Transactions, auth,
quota enforcement,
compaction, ...
Client
After Rebalance
Broker
Processing Storage
Local
Remote
Transactions, auth,
quota enforcement,
compaction, ...
Client
(Only available in Confluent Platform)
31. 31
Tiered Storage User Experience
$ bin/kafka-topics --bootstrap-server localhost:9092
--create
--topic trades
--partitions 6
--replication-factor 3
--config confluent.tier.enable=true
--config confluent.tier.local.hotset.ms=60000
--config retention.ms=-1
(Only available in Confluent Platform)
32. 32
Simplified Data Lake Architecture
● No need to build your own pipeline between Kafka and yet another Data Lake
(like HDFS or AWS S3)
● Kafka is the central source of truth / system of record
○ Event-based
○ Guaranteed order
○ Preserves offsets
● Cost reduction
○ No need for another data store (--> data lake)
● Potential: Kafka as long-term “backup”?
○ Define what “backup” means (same for HDFS etc.)
○ E.g. AWS S3 gives you SLAs for HA and data loss. Is this sufficient?
33. 33
Reprocessing of Events
● New Consumer
○ e.g. a completely new microservice or a replacement of an existing application
● Error-Handling
○ Re-processing of data in case of error: Fix error and process events again
● Compliance / Regulatory Processing
○ Reprocessing of already processed data for legal reasons
○ Could be very old data (e.g. pharma: 10 years old)
● Query and Analysis of Existing Events
○ No need for another data store / data lake
○ Kafka Client Consumer for offset- or timestamp-based consumption of old events
○ ksqlDB (for simple pull queries)
○ Kafka-native analytics tool (e.g. Rockset with Kafka connector and ANSI SQL support for Tableau et al)
● Model Training
○ Consume events for model training with a) one ML framework and different hyperparameters, or b) different ML frameworks
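All of the reprocessing cases above rely on the same mechanism: seeking a consumer back to an old offset, optionally resolved from a timestamp. A stdlib-only sketch of that mechanism, where an in-memory list stands in for a topic partition (the real consumer API calls are along the lines of `offsets_for_times()` and `seek()`):

```python
import bisect

# An append-only list of (timestamp_ms, event) records stands in for a
# Kafka topic partition whose offsets are the list indices.

log = [
    (1000, "payment-1"),
    (2000, "payment-2"),
    (3500, "payment-3"),
    (7000, "payment-4"),
]

def offset_for_timestamp(log, ts_ms):
    """Smallest offset whose timestamp is >= ts_ms (cf. offsets_for_times)."""
    timestamps = [t for t, _ in log]
    return bisect.bisect_left(timestamps, ts_ms)

def replay(log, from_offset):
    """Re-consume every event from a given offset onward (cf. seek)."""
    return [event for _, event in log[from_offset:]]

start = offset_for_timestamp(log, 3000)   # everything from t >= 3000 ms
print(start)                  # 2
print(replay(log, start))     # ['payment-3', 'payment-4']
```

Because Kafka retains events in order with stable offsets (and, with Tiered Storage, potentially forever), a new consumer, an error-recovery job or a model-training run can all replay history this way without any separate data store.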
34. 34
Is Apache Kafka a Database?
https://www.kai-waehner.de/blog/2020/03/12/can-apache-kafka-replace-database-acid-storage-transactions-sql-nosql-data-lake/
35. 35
Local Predictions
Model Training
in Cloud
Model Deployment
at the Edge
Analytic Model
Separation of
Model Training and Model Inference
40. 40
TensorFlow +
Kafka Streams
Filter
Map
2) Load TensorFlow Model
3) Configure Kafka Streams Application
4) Apply TensorFlow Model to Streaming Data
5) Start Kafka Streams App
1) Import Kafka and TensorFlow API
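The five steps above can be sketched in plain Python. The talk's actual implementation uses the Java Kafka Streams API and a real TensorFlow model; every function, path and rule here is a hypothetical stand-in:

```python
# Embedded-model pattern from the slide, compressed into a toy pipeline.

def load_model(path):                       # 2) Load TensorFlow model
    # Stand-in: a real app would load a SavedModel, e.g. via
    # tf.saved_model.load(path); here the "model" is a toy anomaly rule.
    return lambda features: sum(features) > 1.0

def build_topology(model):                  # 3) Configure the application
    def topology(events):
        # 4) Apply the model to each event in the stream (filter / map)
        return [{"event": e, "fraud": model(e["features"])} for e in events]
    return topology

def run(topology, events):                  # 5) Start the app (poll loop)
    return topology(events)

model = load_model("/models/fraud")         # path is illustrative only
results = run(build_topology(model),
              [{"features": [0.2, 0.1]},
               {"features": [0.9, 0.8]}])
print([r["fraud"] for r in results])        # [False, True]
```

The point of the pattern is that step 4 is just a local function call inside the stream processor: no model server, no network hop per event.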
41. 41
CREATE STREAM FraudDetection AS
SELECT payment_id, detectAnomaly(payment_values)
FROM payment_table;
User Defined Function (UDF)
Model Deployment with
Apache Kafka, ksqlDB
and TensorFlow
42. 42
Stream Processing
with Model Server vs. Embedded Model
Why use a model server and RPC?
• Simple integration with existing technologies
and organizational processes
• Easier to understand if you come from the non-streaming world
• Later migration to real streaming is also
possible
• Model management built-in for different
models, versioning and A/B testing
• Monitoring built-in
Why embed model into streaming app?
• Better latency: local inference instead of a remote call
• Offline inference (devices, edge processing,
etc.)
• No coupling of the availability, scalability, and
latency/throughput of your Kafka Streams
application with the SLAs of the RPC interface
• No side-effects (e.g., in case of failure), all
covered by Kafka processing (e.g., exactly once)
Application
Input Event
Prediction
43. 43
Streaming Analytics for Fraud Detection at Scale
Legend: Kafka Ecosystem | Other Components
Payment Mobile App (MQTT over WebSockets) → MQTT Broker → Kafka Cluster with Tiered Storage
(MQTT integration via Kafka Connect MQTT Connector, Confluent Proxy, or HiveMQ Plugin)
(1) Ingest Data: MySQL DB via Kafka Connect CDC
(2) Preprocess Data: KSQL
(3) Read Data: TensorFlow I/O (Python Client)
(4) Train Fraud Model: TensorFlow
(5) Deploy Fraud Model: TensorFlow Serving (gRPC), and TensorFlow Lite in a Real Time Kafka App at the edge (C / librdkafka)
(6a) Consume payment data: Kafka Streams Application (Java / Scala)
(6b) All Data: Kafka Connect → Elasticsearch → Grafana
(7) Potential Fraud
(8a) Alert User: Real Time Edge Computing
(8b) Alert Fraud Department (e.g. Mobile App)
47. 47
In 2019, We Made Clusters
Stretch
Automate Disaster Recovery
Sync or Async Replication per Topic
Offset Preserving
Automated Client Failover with No
Custom Code
Multi-Region Cluster
(Only available in Confluent Platform)
48. 48
Example of a Multi-Region Cluster in a Bank
Large FinServ Customer
Topic 1
Topic 2
Topic 1
Topic 2
Topic 3 Topic 3
synchronous
asynchronous
● Topic 1 transactions enter
from us-east and us-west
with fully synchronous
replication
● Topics 2 and 3 in the same
cluster use async - optimize
for latency
● Automated disaster recovery
Result: Clearing time from ‘deposit’ to
‘available’ goes from 5 days to 5 seconds
(including security checks)
(Only available in Confluent Platform)
49. 49
Confluent Global Eventing Platform
Aggregate Small
Footprint Edge
Deployments with
Replication
(aggregation)
Simplify Disaster
Recovery Operations with
Multi-Region Clusters
with RPO=0 and RTO=0
Stream Data Globally
with Replication
(Coming in CP 6.0 in Q3 2020)
50. 51
Key
Takeaways
Don’t underestimate
the Hidden Technical
Debt in Machine
Learning Systems
Leverage the Apache
Kafka Open Source
Ecosystem as scalable
and flexible Event
Streaming Platform
Use Streaming
Machine Learning with
Kafka, Tiered Storage
and TensorFlow IO to
simplify your Big Data
Architecture