Streaming Machine Learning and Apache Kafka for Real-Time Analytics: The Next Generation of Intelligent Software for the Financial Services and Insurance Industries.
The slides cover use cases, architectures, and examples from various companies. Learn about Kafka + Machine Learning / Deep Learning for fraud detection and other use cases.
Kafka and Machine Learning in Banking and Insurance Industry
1. Streaming Machine Learning and Apache Kafka
The Next Generation of Intelligent Software for Financial Services and Insurance
Kai Waehner
Technology Evangelist
contact@kai-waehner.de
LinkedIn
@KaiWaehner
www.confluent.io
www.kai-waehner.de
2. STREAM PROCESSING
Create and store materialized views
Filter
Analyze in-flight
Time
Event Streaming
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning in FinServ and Insurance Industries
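The three operations named on this slide (filter, analyze in-flight, materialized view) can be illustrated with a minimal sketch in plain Python. It is not Kafka Streams or ksqlDB code: the event fields (`account`, `amount`), the function names, and the sample payments are all assumptions made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator


def filter_large(events: Iterable[dict], min_amount: float) -> Iterator[dict]:
    """Filter: forward only events at or above min_amount, one event at a time."""
    for event in events:
        if event["amount"] >= min_amount:
            yield event


def materialize_totals(events: Iterable[dict]) -> Dict[str, float]:
    """Materialized view: a per-account running total, updated for every event."""
    view: Dict[str, float] = defaultdict(float)
    for event in events:
        view[event["account"]] += event["amount"]
    return dict(view)


# Hypothetical payment events; in a real deployment these would arrive from a topic.
payments = [
    {"account": "A", "amount": 20.0},
    {"account": "B", "amount": 950.0},
    {"account": "A", "amount": 1200.0},
    {"account": "B", "amount": 5.0},
]

# Events are filtered while in flight and folded into the view as they pass through.
large_totals = materialize_totals(filter_large(payments, min_amount=500.0))
print(large_totals)
```

Because both steps are generators or single passes, no event has to be stored before it is processed, which is the point of analyzing data in flight.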
3. Use Case: Fraud Detection
“49 percent of the 7,200 companies they surveyed had experienced fraud of some kind”
4. Global Bank Builds Fraud Detection Infrastructure
Digital Transformation
• Improve customer experience
• Increase revenue
• Reduce risk
Timeline:
2 years ago: Project begins
Today: Instant payment infrastructure in production for first use cases
2 years in the future: Improved processes leveraging machine learning; first use case: Payment Fraud Detection
5. Streaming Analytics for Fraud Detection at Scale
[Architecture diagram]
Components: Payment App, Integration Layer, Streaming Platform, Big Data Integration Layer, Batch Analytics Platform, BI Dashboard, Real Time Alerting System, Other Components
Data flows: Ingest Data, All Data, Alert, Human Intelligence
6. Integration Platform
for legacy and modern technologies
https://www.jug.ch/events/slides/190918_Microservices_and_Kafka_on_OpenShift.pdf
7. Integration Platform
for legacy and modern technologies
8. Machine Learning (ML)
...allows computers to find hidden insights without being programmed where to look
Machine Learning
● Decision Trees
● Naïve Bayes
● Clustering
● Neural Networks
● Etc.
Deep Learning
● CNN
● RNN
● Transformer
● Autoencoder
● Etc.
9. Machine Learning to Improve Traditional and to Build New Use Cases in the Finance and Insurance Industry
Categories: Real Time Information, Digital Transformation, Strategic Goals
Windows of Opportunity
• Short-Sale Risk Calculation / Trade Approval
• Influencing Customer Behavior (Fitness Tracker, Car Data, …)
• Instant Payment
• Accelerated Claim Processing
• Robotic Process Automation (e.g. Know Your Customer, KYC)
• Customer Service (e.g. Chat Bots)
• Digitalization of Legacy Processes
• Regulatory Reporting
• Fraud Detection (Payments, Fraudulent Claims)
• Personalizing Offers, Policies, Prices, Recommendations
• Real-time Price Adjustments
• Location-based Services
• Post Trade Settlement
• Cybersecurity
10. The First Analytic Models
How to deploy the models in production?
…real-time processing?
…at scale?
…24/7 with zero downtime?
11. Hidden Technical Debt in Machine Learning Systems
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
12. Streaming Analytics for Fraud Detection at Scale
[Architecture diagram, extended with an analytics platform]
Components: Payment App, Integration Layer, Streaming Platform, Big Data Integration Layer, Batch Analytics Platform, BI Dashboard, Analytics Platform, Real Time Alerting System, Other Components
Data flows: Ingest Data, All Data, Alert, Human Intelligence
Model lifecycle: Consume Data, Preprocess Data, Data Processing, Train Analytic Model, Deploy Analytic Model
13. Fraud Detection with Apache Kafka
at Scale in Real-Time for Billions of Messages
https://www.infoq.com/presentations/paypal-data-service-fraud
https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/69459.html
14. A Streaming Platform: The Underpinning of an Event-Driven Architecture
[Architecture diagram]
Producers and consumers: Microservices, DBs, SaaS apps, Mobile, Customer 360, Real-time fraud detection, Data warehouse
Streams of real time events: Database change, Microservices events, SaaS data, Customer experiences
Platform components: Connectors, Stream processing apps
15. Apache Kafka at Scale at Tech Giants
> 7 trillion messages / day
> 6 petabytes / day
...you name it!
* Kafka is not just used by tech giants
** Kafka is not just used for big data
16. Event Streaming in the Finance and Insurance Industries
Check past Kafka Summit videos for details about the use cases:
https://kafka-summit.org/past-events/
17. Apache Kafka as Infrastructure for ML
18. Apache Kafka’s Open Ecosystem as Infrastructure for ML
Kafka Streams / ksqlDB
Kafka Connect
Confluent REST Proxy
Confluent Schema Registry
Go/.NET/Python
Kafka Producer
ksqlDB
Python
Client
19. Modernized security information and event management (SIEM)
[Architecture diagram]
Sources: CDC, Syslog, Network traffic, Firewall logs, RDBMS, Application logs, Payment Data, HTTP proxy logs
Processing: Filter, transform, aggregate; AI/ML; Curated streams
Targets: APP, SIEM Index, Search; QRadar, Arcsight, Splunk, Elastic
Forensic Archive: HDFS, S3, Big Query
21. Ingestion of IoT Data
[Architecture diagram]
Sources: Cars, Payment App
Ingestion: Kafka Connect
Replication: MirrorMaker / Confluent Replicator / Cluster Linking
Analytics / Machine Learning
22. Mainframe Offloading
Brownfield instead of Greenfield
[Architecture diagram]
Streams of real time events: Database change, Microservices events, SaaS data, Customer experiences
Legacy App: Complex business logic; Push changes once; Write
Modern App 1: Write continuously; Read continuously
Modern App 2: Write continuously; Read continuously
Labels: MIPS / MSU (three times); Read: No MIPS / MSU
24. Mainframe Offloading for massive cost-savings
“… rescue data off of the mainframe, in a cloud native, microservice-based fashion … [to] … significantly reduce the reads on the mainframe, saving RBC fixed infrastructure costs (OPEX). RBC stayed compliant with bank regulations and business logic, and is now able to create new applications using the same event-based architecture.”
https://www.confluent.io/customers/rbc/
25. Mainframe Integration, Offloading and Replacement
with Apache Kafka
https://www.kai-waehner.de/blog/2020/04/24/mainframe-offloading-replacement-apache-kafka-connect-ibm-db2-mq-cdc-cobol/
https://www.slideshare.net/KaiWaehner/mainframe-integration-offloading-and-replacement-with-apache-kafka
27. Preprocessing with ksqlDB
SELECT payment_id, p.smartphone_id, payment_details
FROM payment p
LEFT JOIN user_database u ON p.smartphone_id = u.smartphone_id
WHERE u.payment_type = 'Apple Pay';
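To make the semantics of the query easier to follow, here is a rough Python equivalent. It is only a sketch: the join is modelled as a dictionary lookup keyed by `smartphone_id`, the sample rows are invented, and nothing here talks to ksqlDB. The column names come from the slide; the `preprocess` function name is hypothetical.

```python
from typing import Dict, Iterable, Iterator


def preprocess(payments: Iterable[dict], users: Dict[str, dict]) -> Iterator[dict]:
    """Enrich each payment with its user row and keep only Apple Pay payments."""
    for p in payments:
        # LEFT JOIN: the lookup may find no user row for this smartphone_id.
        u = users.get(p["smartphone_id"])
        # WHERE u.payment_type = 'Apple Pay' also drops the unmatched rows,
        # because a missing user row cannot satisfy the condition.
        if u is not None and u["payment_type"] == "Apple Pay":
            yield {
                "payment_id": p["payment_id"],
                "smartphone_id": p["smartphone_id"],
                "payment_details": p["payment_details"],
            }


# Hypothetical sample data standing in for the user_database table and payment stream.
user_database = {
    "phone-1": {"payment_type": "Apple Pay"},
    "phone-2": {"payment_type": "Card"},
}
payment = [
    {"payment_id": 1, "smartphone_id": "phone-1", "payment_details": "coffee"},
    {"payment_id": 2, "smartphone_id": "phone-2", "payment_details": "fuel"},
    {"payment_id": 3, "smartphone_id": "phone-9", "payment_details": "books"},
]

selected = list(preprocess(payment, user_database))
print(selected)
```

Note that filtering on a column of the right-hand side makes the LEFT JOIN behave like an inner join for this query.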
28. Data Ingestion into a Data Lake for Model Training (and Consumption by other Decoupled Applications)
[Architecture diagram]
Connect: Preprocessed Data
Consumption modes: Batch, Near Real Time, Real Time
29. Model Training Using an Elastic Infrastructure in the Cloud
Extreme scale using TensorFlow and TPUs in the cloud!
Analytic Model
30. TensorFlow Model: Autoencoder for Anomaly Detection of Fraudulent Payments
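An autoencoder flags anomalies by reconstructing its input and measuring how far the reconstruction is from the original: payments that look unlike the training data reconstruct poorly. The sketch below shows only that scoring step. As a loudly labelled assumption, the trained TensorFlow autoencoder is replaced by a trivial stand-in that "reconstructs" every payment as the feature-wise mean of the training data. The feature values and the mean-plus-three-standard-deviations threshold are also illustrative choices, not taken from the deck.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def fit_mean_reconstructor(train: Sequence[Sequence[float]]) -> List[float]:
    """Stand-in for training: remember the feature-wise mean of normal payments."""
    return [mean(column) for column in zip(*train)]


def reconstruction_error(x: Sequence[float], reconstruction: Sequence[float]) -> float:
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, reconstruction)) / len(x)


# Hypothetical normal payments: [amount, items in basket].
normal = [[10.0, 1.0], [12.0, 1.0], [11.0, 2.0], [9.0, 1.0], [13.0, 2.0]]
centre = fit_mean_reconstructor(normal)

# Threshold derived from the errors seen on normal data (an illustrative rule).
train_errors = [reconstruction_error(x, centre) for x in normal]
threshold = mean(train_errors) + 3 * pstdev(train_errors)


def is_anomaly(x: Sequence[float]) -> bool:
    """Flag a payment whose reconstruction error exceeds the threshold."""
    return reconstruction_error(x, centre) > threshold


print(is_anomaly([11.0, 1.0]), is_anomaly([500.0, 40.0]))
```

With a real autoencoder, `centre` would be replaced by the model's output for each input; the error computation and thresholding stay the same.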
31. Streaming Ingestion and Model Training without another Data Lake
Direct streaming ingestion for model training with TensorFlow I/O + Kafka Plugin (no additional data storage like S3 or HDFS required!)
[Diagram: a Producer writes to the Distributed Commit Log over Time; Model A, Model B, and Model X (at a later time) read from it]
https://github.com/tensorflow/io
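The idea behind this slide is that the commit log itself keeps the training data, so several models can be trained from it at different times and from different offsets. The following sketch imitates that with an in-memory list. It does not use TensorFlow I/O or Kafka: the `CommitLog` class, the `train_mean_model` function, and the sample values are hypothetical stand-ins.

```python
from typing import List


class CommitLog:
    """In-memory stand-in for one append-only topic partition."""

    def __init__(self) -> None:
        self._records: List[float] = []

    def append(self, value: float) -> int:
        """Append a record and return its offset."""
        self._records.append(value)
        return len(self._records) - 1

    def read_from(self, offset: int) -> List[float]:
        """Return all records from the given offset; reading never removes data."""
        return list(self._records[offset:])


def train_mean_model(values: List[float]) -> float:
    """Toy 'training': the model is just the mean of the values it saw."""
    return sum(values) / len(values)


log = CommitLog()
for amount in (10.0, 20.0, 30.0):
    log.append(amount)

# Model A trains on everything currently in the log.
model_a = train_mean_model(log.read_from(0))

# The producer keeps writing while models are being trained.
log.append(100.0)

# Model B starts from a later offset; Model X replays the full history later on.
model_b = train_mean_model(log.read_from(2))
model_x = train_mean_model(log.read_from(0))
print(model_a, model_b, model_x)
```

Each consumer chooses its own starting offset, so adding Model X later does not require a separate copy of the data.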
32. Simplified Data Lake Architecture
Tiered Storage for Kafka provides
● one platform for all data processing
● an event-based source of truth for materialized views
● no need for a pipeline between Kafka and a Data Lake like Hadoop
Benefits
● cost reduction
● long-term backup
● performance isolation (real-time and historical analysis in the same cluster)
33. Store Data Long-Term in Kafka?
[Diagram: an App connected to Kafka, which combines Processing (transactions, auth, quota enforcement, compaction, ...) and Storage]
34. Confluent Tiered Storage for Kafka
Object Store
Processing Storage
Transactions,
auth, quota
enforcement,
compaction, ...
Local
Remote
Kafka
Apps
Store Forever
Older data is offloaded to inexpensive object
storage, permitting it to be consumed at any time.
Save $$$
Storage limitations, like capacity and duration,
are effectively uncapped.
Instantaneously scale up and down
Your Kafka clusters will be able to automatically
self-balance load and hence elastically scale
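The read path described here (recent data served locally, older data served from an object store) can be sketched as follows. This is a conceptual illustration only and not how Confluent Tiered Storage is implemented: real brokers offload log segments rather than single records, and the `TieredLog` class, its `local_capacity` parameter, and the record names are all hypothetical.

```python
from typing import Dict, Tuple


class TieredLog:
    """Toy log that keeps the newest records local and offloads older ones."""

    def __init__(self, local_capacity: int) -> None:
        self.local_capacity = local_capacity
        self.local: Dict[int, str] = {}   # hot tier on the broker
        self.remote: Dict[int, str] = {}  # stand-in for an object store
        self.next_offset = 0

    def append(self, record: str) -> int:
        """Append locally, then move the oldest records to the remote tier."""
        offset = self.next_offset
        self.next_offset += 1
        self.local[offset] = record
        while len(self.local) > self.local_capacity:
            oldest = min(self.local)
            self.remote[oldest] = self.local.pop(oldest)
        return offset

    def read(self, offset: int) -> Tuple[str, str]:
        """Return the record and the tier that served it."""
        if offset in self.local:
            return self.local[offset], "local"
        return self.remote[offset], "remote"


log = TieredLog(local_capacity=2)
for i in range(5):
    log.append(f"payment-{i}")

print(log.read(4), log.read(0))
```

The consumer uses the same `read` call for both tiers, which mirrors the claim that older data remains consumable at any time.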
35. Confluent Tiered Storage for Kafka
36. Use Cases for Reprocessing Historical Events
Give me all events from time A to time B
Real-time Producer
Time
• New consumer application
• Error-handling
• Compliance / regulatory processing
• Query and analyze existing events
• Model training
Real-time Consumer
Consumer of
Historical Data
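"Give me all events from time A to time B" amounts to a range lookup on a log that is ordered by time. The sketch below shows that lookup on an in-memory list with binary search. The `events_between` function and the sample events are hypothetical; with a real cluster the Java consumer's `offsetsForTimes` can translate timestamps into starting offsets, which this sketch does not call.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Event = Tuple[int, str]  # (timestamp in ms, payload)


def events_between(log: List[Event], start_ts: int, end_ts: int) -> List[Event]:
    """Return events with start_ts <= timestamp <= end_ts from a time-ordered log."""
    timestamps = [ts for ts, _ in log]
    first = bisect_left(timestamps, start_ts)   # first event at or after start_ts
    last = bisect_right(timestamps, end_ts)     # one past the last event at or before end_ts
    return log[first:last]


# Hypothetical historical events, already ordered by timestamp.
history = [(100, "a"), (200, "b"), (300, "c"), (400, "d")]

replayed = events_between(history, 150, 300)
print(replayed)
```

The same call serves every use case in the list above, from training a model on a past window to reprocessing events for compliance.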
37. Is Apache Kafka a Database?
https://www.kai-waehner.de/blog/2020/03/12/can-apache-kafka-replace-database-acid-storage-transactions-sql-nosql-data-lake/
38. Separation of Model Training and Model Inference
Model Training in Cloud
Model Deployment at the Edge
Local Predictions
Analytic Model
41. User Defined Function (UDF)
Model Deployment with Apache Kafka, ksqlDB and TensorFlow
CREATE STREAM FraudDetection AS
SELECT payment_id,
detectAnomaly(payment_values)
FROM payment_table;
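The statement above applies a scoring function to every record of a stream and emits a new stream of results. A minimal Python sketch of that pattern follows. Real ksqlDB UDFs are written in Java, so this is only an analogy: `detect_anomaly` is a hypothetical stand-in for the model call behind `detectAnomaly`, and its fixed limit replaces the TensorFlow model.

```python
from typing import Iterable, Iterator, Sequence


def detect_anomaly(payment_values: Sequence[float], limit: float = 1000.0) -> bool:
    """Hypothetical stand-in for the model: flag any value above a fixed limit."""
    return max(payment_values) > limit


def fraud_detection(payment_stream: Iterable[dict]) -> Iterator[dict]:
    """Apply the scoring function to each record, like the SELECT clause above."""
    for record in payment_stream:
        yield {
            "payment_id": record["payment_id"],
            "anomaly": detect_anomaly(record["payment_values"]),
        }


# Hypothetical input records with the two columns used by the statement.
payment_table = [
    {"payment_id": 1, "payment_values": [12.5, 40.0]},
    {"payment_id": 2, "payment_values": [9800.0, 15.0]},
]

results = list(fraud_detection(payment_table))
print(results)
```

The model is embedded in the stream processor here; the next slide contrasts this with calling a separate model server.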
42. Model Deployment with Apache Kafka (Embedded vs. Model Server)
https://www.confluent.io/blog/machine-learning-real-time-analytics-models-in-kafka-applications/
https://www.confluent.io/kafka-summit-san-francisco-2019/event-driven-model-serving-stream-processing-vs-rpc-with-kafka-and-tensorflow/
43. Real-time Price Adjustments in Vehicle Insurance
https://www.confluent.io/kafka-summit-san-francisco-2019/how-to-build-real-time-price-adjustments-in-vehicle-insurance-on-streams/
44. Streaming Analytics for Fraud Detection at Scale
[Architecture diagram]
Components: Payment Mobile App, Edge Gateway, Edge Connector, Real Time Edge Computing, Model Lite, Event Streaming Platform, Streaming Platform, Integration, Stream Processing, Database Integration, Consumer, Analytics Department, Model Training, Model, BI, Real Time Application, Real Time App, Model Server, RPC, Fraud Department, Other Components
Steps:
(1) Ingest Data
(2) Preprocess Data
(3) Read Data
(4) Train Fraud Model
(5) Deploy Fraud Model
(6a) Consume payment data
(6b) All Data
(7) Potential Fraud
(8a) Alert User
(8b) Alert Fraud Department (e.g. Mobile App)
47. One pipeline to rule them all
Real-time model scoring, batch model training, near-real time BI analytics
Give me all events from time A to time B
Car sensors
(MQTT connector)
Time
Production
infrastructure
(Java)
Data science / analytics infrastructure
(Python + Jupyter)
48. One More Thing…
How to deploy this 24/7, including Disaster Recovery?
49. Multi-Region Clusters
Automate Disaster Recovery
Sync or Async Replication per Topic
Offset Preserving
Automated Client Failover with No Custom Code
Zero Downtime + Zero Data Loss (RPO=0 and RTO=0)
50. Example of a Multi-Region Cluster in a Bank
Large FinServ Customer
[Diagram: a Payment Log in each of two locations, linked by synchronous and asynchronous replication]
● Topic 1 transactions enter from us-east and us-west with fully synchronous replication
● Topics 2 and 3 in the same cluster use asynchronous replication to optimize for latency
● Automated disaster recovery
Result: Clearing time from ‘deposit’ to ‘available’ goes from 5 days to 5 seconds (including security checks)
51. Cluster Linking for Hybrid and Global Deployments
Migrate Kafka clusters to Confluent Cloud
• Uses the Kafka protocol
• Requires no additional infrastructure (such as MirrorMaker)
• Preserves offsets
52. Event Streaming Maturity Model
[Chart: VALUE plotted against INVESTMENT & TIME, with five stages]
Stage 1: Initial Awareness / Pilot
Stage 2: Start to Build Pipeline / Deliver 1 New Outcome
Stage 3: Leverage Stream Processing
Stage 4: Build Contextual Event-Driven Apps
Stage 5: Central Nervous System
Product, Support, Training, Partners, Technical Account Management...
53. The Rise of Event Streaming
2010: Apache Kafka created at LinkedIn by Confluent founders
2014
2020: 80% of Fortune 100 companies trust and use Apache Kafka
54. Confluent Platform
Fully Managed Cloud Service | Self Managed Software
FREEDOM OF CHOICE
COMMITTER-DRIVEN EXPERTISE
Partners | Training | Professional Services | Enterprise Support
Apache Kafka
EFFICIENT OPERATIONS AT SCALE
PRODUCTION-STAGE PREREQUISITES
UNRESTRICTED DEVELOPER PRODUCTIVITY
SQL-based Stream Processing: KSQL (ksqlDB)
Rich Pre-built Ecosystem: Connectors | Hub | Schema Registry
Multi-language Development: non-Java clients | REST Proxy
GUI-driven Mgmt & Monitoring: Control Center
Flexible DevOps Automation: Operator | Ansible
Dynamic Performance & Elasticity: Auto Data Balancer | Tiered Storage
Enterprise-grade Security: RBAC | Secrets | Audit logs
Data Compatibility: Schema Registry | Schema Validation
Global Resilience: Multi-Region Clusters | Replicator
Developer | Operator | Architect
Open Source | Community licensed
PARTNERSHIP FOR BUSINESS SUCCESS
Complete Engagement Model
Revenue / Cost / Risk Impact
TCO / ROI
Executive Buyer