Core banking systems are batch-oriented, typically running heavy overnight batch cycles that must finish before business opens each morning. In this talk I explain some of the common interface points between core-banking infrastructure and event streaming systems, then focus on stream processing with ksqlDB for core-banking-shaped data, showing how to perform common operations using various ksqlDB functions. The key features are Avro record keys and multi-key joins (ksqlDB 0.15), schema management, and state store planning.
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark Teehan, Confluent
1. Use ksqlDB to Migrate Core-banking Processing From Batch to Streaming
Mark Teehan
Kafka Summit APAC, July 2021
2. MARK TEEHAN
▸ Principal Engineer @ Confluent in Singapore
▸ Various data-centric roles
▸ Background in financial systems
3. WHAT THIS TALK IS ABOUT
▸ Focus on core banking systems
▸ Migrating processing to Kafka
▸ Rebuilding heritage systems takes time
▸ Stream processor of choice: ksqlDB
▸ Batch vs. continuous processing
▸ Message keys for banking data in ksqlDB
▸ Roll-forward vs. recreate pipelines
▸ Simulation and testing
4. WHAT THIS TALK IS NOT ABOUT
▸ CDC products and vendors
▸ Use cases for batch streaming on Kafka
5. MAINFRAMES MATTER
No mainframe, no bank. Core domains: Deposit; Lending; Customer Management; Branch Automation; Back Office Support; Compliance; Internet & Mobile; Fraud, Risk & Compliance; Fund Transfer; Cards; Payments; Wealth; Capital Markets; Information Security.
8. CORE BANKING DATA ACCESS PATTERNS
Transactional Data
Credit Card transaction
ATM transaction
Digital banking transaction
2FA request
Approval/Denial
9. CORE BANKING DATA ACCESS PATTERNS
Transactional data maps to a ksqlDB STREAM backed by a Kafka retention topic:
usually continuous; an immutable, append-only collection; represents historical facts.
10. CORE BANKING DATA ACCESS PATTERNS
Core data (Customer Account, Cardholder, Branch, Agreement, Employee) maps to a ksqlDB TABLE backed by a compacted Kafka topic:
continuous or batch; a mutable collection; represents what is true as of now.
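A minimal sketch of the two declarations, assuming the CDC topics already exist and Avro keys and values; the stream, table, topic, and column names here are hypothetical:

-- Transactional data: a STREAM over a retention topic of immutable facts.
CREATE STREAM ATM_TXN (
  CARD_NO    STRING KEY,
  TXN_AMOUNT DECIMAL(12,2),
  TXN_TS     BIGINT
) WITH (
  KAFKA_TOPIC  = 'atm_txn',
  KEY_FORMAT   = 'AVRO',
  VALUE_FORMAT = 'AVRO'
);

-- Core data: a TABLE over a compacted topic; each key holds the latest state.
CREATE TABLE CUSTOMER_ACCOUNT (
  ACCOUNT_NO    STRING PRIMARY KEY,
  CUSTOMER_NAME STRING,
  BRANCH_CODE   STRING
) WITH (
  KAFKA_TOPIC  = 'customer_account',
  KEY_FORMAT   = 'AVRO',
  VALUE_FORMAT = 'AVRO'
);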
11. BATCH VS CONTINUOUS
"... In the simplest terms, a batch job is a scheduled program that is assigned to run on a computer without further user interaction. Batch jobs are often queued up during working hours, then executed during the evening or weekend when the computer is idle."
https://www.bmc.com/blogs/batch-jobs/
"....In the simplest terms, a batch job is a
scheduled program that is assigned to run
on a computer without further user
interaction.
Batch jobs are often queued up during
working hours, then executed during the
evening or weekend when the computer is
idle."
https://www.bmc.com/blogs/batch-jobs/
Represents
True, as of Now
Continuous or Batch
12. BATCH VS CONTINUOUS
A batch-cycle Kafka client may not be continuous, but it is still a Kafka client.
Peak throughput may be much higher than average throughput.
Kafka acts as a data shock absorber for these bursts of higher-intensity produce and stream-processing load.
13. SET UP FOR BATCH PATTERNS
Kubernetes if you can
Are standby replicas beneficial for batch? (see the properties sketch below)
Metrics for streams/tables (ksqlDB 0.14+)
Use SMTs / KCOPs
Producer compression
Capture-to-cloud is feasible
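A minimal ksql-server.properties sketch for the standby-replica and compression points above; the property names are real ksqlDB/Kafka Streams configs, but the values are illustrative, and the producer pass-through via the ksql.streams. prefix is an assumption to verify against your ksqlDB version:

# Keep a warm standby of each state store so a failover during the
# batch window does not rebuild RocksDB state from the changelog.
ksql.streams.num.standby.replicas=1

# Assumed pass-through to Kafka Streams' embedded producers
# (ksql.streams. prefix plus the Streams producer. prefix);
# compresses repetitive CDC payloads on the wire.
ksql.streams.producer.compression.type=zstd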
14. HANDLING CORE-BANKING SHAPED DATA
Beware of thousands of columns
PIVOT using INSERT .. SELECT (see the sketch below)
Register schemas
Match partitions to mainframe partitions (maybe)
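One reading of the PIVOT point, as a minimal sketch (LOAN_WIDE, LOAN_PIVOT, and all column names are hypothetical; LOAN_WIDE is assumed to be keyed on LOAN_NO): fan a very wide CDC record out into narrow attribute/value rows, one INSERT ... SELECT per source column.

-- Narrow target stream: one row per (key, attribute).
CREATE STREAM LOAN_PIVOT (
  LOAN_NO STRING KEY,
  ATTR    STRING,
  VAL     STRING
) WITH (
  KAFKA_TOPIC  = 'loan_pivot',
  PARTITIONS   = 6,
  KEY_FORMAT   = 'AVRO',
  VALUE_FORMAT = 'AVRO'
);

-- One INSERT ... SELECT per wide column of interest.
INSERT INTO LOAN_PIVOT
  SELECT LOAN_NO, 'BALANCE' AS ATTR, CAST(BALANCE AS STRING) AS VAL
  FROM LOAN_WIDE EMIT CHANGES;

INSERT INTO LOAN_PIVOT
  SELECT LOAN_NO, 'RATE' AS ATTR, CAST(RATE AS STRING) AS VAL
  FROM LOAN_WIDE EMIT CHANGES;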
18. MESSAGE KEYS
Key as a Struct
KEYSTRUCT STRUCT<
    CONTROL_1 STRING,
    CONTROL_2 STRING,
    CONTROL_3 STRING,
    CONTROL_4 STRING,
    CARD_NO   STRING
> KEY
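In context, a minimal sketch of declaring that struct key on a source stream (topic and value columns are hypothetical); KEY_FORMAT = 'AVRO' gives the Avro record keys mentioned in the abstract (ksqlDB 0.15+):

CREATE STREAM CARD_TXN (
  KEYSTRUCT STRUCT<
    CONTROL_1 STRING,
    CONTROL_2 STRING,
    CONTROL_3 STRING,
    CONTROL_4 STRING,
    CARD_NO   STRING
  > KEY,
  TXN_AMOUNT DECIMAL(12,2),
  TXN_TS     BIGINT
) WITH (
  KAFKA_TOPIC  = 'card_txn',
  KEY_FORMAT   = 'AVRO',
  VALUE_FORMAT = 'AVRO'
);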
Join (STREAM-STREAM):

SELECT *
FROM STREAM_A A
  INNER JOIN STREAM_B B
  WITHIN 60 MINUTES
  ON A.ROWKEY = STRUCT(
       CONTROL_1 := B.KEYSTRUCT->LOAN_CTL_1,
       CONTROL_2 := B.KEYSTRUCT->LOAN_CTL_2,
       CONTROL_3 := B.KEYSTRUCT->LOAN_CTL_3,
       CONTROL_4 := B.KEYSTRUCT->LOAN_CTL_4,
       LOAN_NO   := B.KEYSTRUCT->LOAN_NO
     )
EMIT CHANGES;
... you can still join even though source and target column names differ.
Set keys at CDC time (see the connector sketch below); avoid avoidable rekey operations.
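A minimal sketch of setting the key at CDC time with a stock Kafka Connect SMT, issued through ksqlDB (the connector class is a placeholder for your CDC connector; field names and the Schema Registry URL are illustrative). ValueToKey builds a struct key from value fields, which is exactly the key shape joined on above:

CREATE SOURCE CONNECTOR LOAN_CDC WITH (
  'connector.class' = 'your.cdc.SourceConnector',   -- placeholder
  'transforms' = 'makeKey',
  'transforms.makeKey.type' = 'org.apache.kafka.connect.transforms.ValueToKey',
  'transforms.makeKey.fields' = 'LOAN_CTL_1,LOAN_CTL_2,LOAN_CTL_3,LOAN_CTL_4,LOAN_NO',
  'key.converter' = 'io.confluent.connect.avro.AvroConverter',
  'key.converter.schema.registry.url' = 'http://schema-registry:8081'
);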
19. ROLL-FORWARD VS RECREATE PIPELINE
For a daily batch cycle, is 100% of the data re-captured?
Is yesterday's data required?
CDC topic: LOAN
20. ROLL-FORWARD VS RECREATE PIPELINE
Stream processors store state locally (on disk), and RocksDB rolls existing keys forward.
Rolling a key forward takes roughly twice as long as writing a new key (from "How to Tune RocksDB for Your Kafka Streams Application").
So ask: roll forward, or drop & recreate? (see the sketch below)
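For the drop & recreate option, a minimal sketch with hypothetical object names: terminate the old persistent query, drop the table together with its topic, and rebuild state from the start of the CDC topic.

-- Terminate the old persistent query first (find its id with SHOW QUERIES):
-- TERMINATE CTAS_LOAN_BALANCE_1;
DROP TABLE IF EXISTS LOAN_BALANCE DELETE TOPIC;

-- Rebuild state from the beginning of the source topic.
SET 'auto.offset.reset' = 'earliest';
CREATE TABLE LOAN_BALANCE AS
  SELECT LOAN_NO,
         LATEST_BY_OFFSET(BALANCE) AS BALANCE
  FROM LOAN_CDC
  GROUP BY LOAN_NO;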
21. SIMULATION & TESTING
Generate mock CDC data (see the Datagen sketch below)
Life-sized data
Avro keys
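A minimal kafka-connect-datagen sketch, issued through ksqlDB (the schema path, topic, key field, Schema Registry URL, and sizes are placeholders): generate life-sized mock CDC records with Avro keys and values.

CREATE SOURCE CONNECTOR MOCK_LOAN_CDC WITH (
  'connector.class' = 'io.confluent.kafka.connect.datagen.DatagenConnector',
  'kafka.topic' = 'loan',
  'schema.filename' = '/schemas/loan.avsc',   -- placeholder schema file
  'schema.keyfield' = 'LOAN_NO',
  'key.converter' = 'io.confluent.connect.avro.AvroConverter',
  'key.converter.schema.registry.url' = 'http://schema-registry:8081',
  'value.converter' = 'io.confluent.connect.avro.AvroConverter',
  'value.converter.schema.registry.url' = 'http://schema-registry:8081',
  'max.interval' = '100',      -- ms between generated messages
  'iterations' = '1000000'     -- total messages to generate
);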
23. That's it!
▸ Know your data: core data (table) and transaction data (stream)
▸ Set up for batch: Kubernetes, standby replicas, SMTs, compression, capture to cloud
▸ Try to set keys during CDC; use Avro record keys; column names do not need to match
▸ Consider roll-forward vs. recreate pipeline
▸ Kafka-Connect-Datagen to simulate CDC