"Lloyds Banking Group is one of the UK’s largest financial service providers with 26 million customers and 16 unique brands including Lloyds Bank, Halifax, Bank of Scotland and Scottish Widows. Driven by our purpose to Help Britain Prosper, we are moving more of our data processing to near real-time to enable industry-leading digital applications, improve our customers' experiences, and achieve more streamlined, efficient solutions, increasingly based on an event-driven architecture approach backed by Apache Kafka.
In the first part of this presentation we will talk about our journey with Kafka since we began around 2015, up to the current day as we continue the rollout of our hybrid and multi-cloud streaming capability, enabling us to connect event producers and consumers across any of our private and public cloud environments. Kafka is now being used across dozens of streaming applications throughout the Group, supporting key capabilities such as improving service resilience by replicating critical data to separate data stores, and keeping digital banking app users informed about activity on their accounts using push notifications.
In the second part we will describe our New Payments Architecture (NPA) platform which went into production in November 2023, and explain how it uses event-driven processing to help to achieve more resilient, transparent and future-proof payments capabilities for our customers.
"
2. Classification: Public
Lloyds Banking Group

We are a leading UK-based financial services group, providing a wide range of banking and financial services, focused primarily on retail and commercial customers.

The Group incorporates many household names including Lloyds Bank, Halifax, Bank of Scotland and Scottish Widows. Our combined history stretches back more than 300 years.
3. Intro

Anton Hirschowitz
▪ Enterprise Architect in the Data & AI Team within CTO Group Architecture at Lloyds Banking Group
▪ Lead for Streaming and Systems of Engagement
▪ Consulting firms: Systems Engineer → Solution Architect / Consultant (1996-2013)
▪ Lloyds Banking Group: Solution Architect (2013) → Enterprise Architect (since 2017)
▪ Focused on complex data problems – data architecture / modelling, data warehousing, Big Data, MDM, CRM, etc. – then got into streaming ~2019
4. Potted History of Kafka at Lloyds Banking Group

● 2015/16: First started using Kafka (v0.9.0.0) for Common Reporting Services to ingest System of Record data into our new Hadoop-based data lake

[Diagram: Systems of Record → custom producers and CDC (Change Data Capture) → Kafka → batch processing (Spark, MapReduce, etc.) into the data lake]
5. 2018: First customer-facing use case

● First major real-time operational use case for Kafka, supporting Open Banking APIs – enabling third-party banks and FinTechs to access customer account balances & transactions
● Streaming and processing data from the core banking platform to a Cassandra DB used as a “System of Engagement” (SoE) – a real-time, read-only ODS

[Diagram: Core Banking → custom producer → Kafka → event processing → ingest into the “System of Engagement” (SoE) → Open Banking API → third-party apps]
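The ingest-and-serve pattern on this slide can be sketched as follows. This is a minimal, illustrative Python stand-in only: a dict plays the role of the Cassandra SoE store, and the event field names are assumptions, not the real core-banking schema.

```python
# Stand-in for the SoE pattern: fold account events from a stream into a
# read-only view, then serve queries from that view (Open Banking API path).

soe_store = {}   # account_id -> latest view (stand-in for the Cassandra SoE)

def apply_event(event):
    """Event processing: fold each core-banking event into the read model."""
    acct = event["account_id"]
    view = soe_store.setdefault(acct, {"balance": 0, "transactions": []})
    view["balance"] += event["amount"]
    view["transactions"].append(event)

def get_balance(account_id):
    """Read-only query path, as used by the Open Banking API."""
    return soe_store.get(account_id, {"balance": 0})["balance"]

for e in [{"account_id": "a1", "amount": 500},
          {"account_id": "a1", "amount": -120}]:
    apply_event(e)

print(get_balance("a1"))  # 380
```

The key property is that the write path (event folding) and the read path (balance queries) are fully separated, so read traffic from third parties never touches the core banking platform.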
6. ~2019: “The Revelation”

● Kafka is not just for point-to-point solutions
● The data in our Kafka topics is “digital gold” – a clean, reusable, real-time source of our most valuable data
● Just as we have been aiming for more reusability of data through synchronous interfaces (APIs) and batch interfaces (published data sets on Data Warehouses / Data Lakes), Event Streaming provides reusability of data for event-driven architectures
● Event Streaming & Processing becomes the “Fourth Pillar” of our Group Data Platforms Architecture
● We needed a team to make it happen…

… and so we set out on a mission to create…

[Diagram: Event Streaming & Processing (Customer, Transactions, Payments, Insurance Claims, etc.) joining Systems of Record, Systems of Engagement and Systems of Insight as the four pillars, underpinned by Data Management]
7. “The Stream Team!”

● The Stream Team within our Data Transfer & Integration Services Lab is our Streaming Centre of Excellence
● Established Kafka as a shared multi-tenanted service offered across the Group
● Encouraging adoption of Kafka, and reuse of streaming data feeds where possible
● Published standards, patterns and user guides, aiming to encourage standardised solutions to common requirements
8. 2019-21: Onwards and upwards … into the Clouds

● Around five years ago Lloyds Banking Group embarked on our journey onto public cloud. We selected both Google Cloud Platform (GCP) and Azure as our main cloud partners, with the initial focus on GCP to migrate many of our analytic workloads.
● Many of our “traditional” (particularly mainframe-based) applications will remain on-prem for at least several years.
● We needed a solution to help us synchronise data sets in near real-time across on-prem and cloud environments, support streaming apps on public cloud, and meet our “stressed exit” requirement (cloud portability of critical applications).

[Diagram: on-prem apps and public-cloud-hosted apps connected by Kafka replication; the clusters are a central team responsibility]
9. Current State

● On-prem: Confluent Platform, deployed as a 2.5-region stretch cluster, ~30 tenants
● GCP: Confluent for Kubernetes on Google Kubernetes Engine, 2x regional clusters, ~8 tenants
● Replication between on-prem and cloud via Confluent Replicator
● Central team responsibility: security, metadata management, observability
10. Current State

Use cases running on the Confluent Platform:
● System of Record → System of Engagement data ingestion
● System of Insight data ingestion
● Real-time transaction classification for Spending Insights
● Fraud detection
● Mobile app notifications
● Commercial Banking push notifications
● Customer records data synchronisation
● Payment processing
● Sourcing external data
● Microservices asynchronous comms and queuing
● Capturing audit log & clickstream data
11. Data Mesh & Streaming Data Products

● We are developing a Data Mesh framework internally, following the core principles of “Domain ownership”, “Data as a Product”, “Self-serve infrastructure” and “Federated data governance”
● Streaming Data Products form the basis for a new “Kappa Architecture” – using event streams as the primary data sourcing approach for both real-time and batch processing
● “Enterprise Streaming Hub”: a new framework (patterns, templates, etc.) for building Streaming Data Products

[Diagram: Streaming Data Product Reference Architecture]
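The Kappa idea above can be sketched in a few lines. This is illustrative only (the event shape and totals are invented, and a list stands in for a retained Kafka topic): one event log is the single source for both a real-time consumer and a batch replay, which converge on the same answer.

```python
# Kappa sketch: one retained event log feeds BOTH processing styles.

event_log = []   # stand-in for a retained Kafka topic

def append(event):
    event_log.append(event)

class RealTimeTotal:
    """Real-time consumer: folds events into state as they arrive."""
    def __init__(self):
        self.total = 0
    def on_event(self, event):
        self.total += event["amount"]

def batch_total(log):
    """Batch job: replays the same log instead of reading a separate batch feed."""
    return sum(e["amount"] for e in log)

rt = RealTimeTotal()
for amt in (10, 20, 30):
    e = {"amount": amt}
    append(e)      # durably retained
    rt.on_event(e) # processed in real time

print(rt.total, batch_total(event_log))  # 60 60
```

Because both paths read the identical stream, there is no separate batch sourcing pipeline to keep consistent with the real-time one, which is the main operational win of Kappa over Lambda-style dual pipelines.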
12. Lessons Learned so far

● People, skills and knowledge
○ Recruiting, training and keeping good people
○ Tenants need the skills & knowledge too! Examples of suboptimal practices we discovered:
– Excessive partitions
– Kafka Streams: maintaining state externally
– Record-by-record process & commit
○ Communicate good practices – don’t just give tenants access and let them loose
○ Publishing patterns and policies internally is not enough – but we can’t micromanage tenants either
● FinOps: internal cross-charge model
○ Started with a simple “T-shirt size” tenant capacity model – possibly too simple!
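To illustrate why record-by-record commit is on the suboptimal list: each offset commit is a synchronous round trip to the broker, so committing per batch amortises that cost. A minimal Python sketch with a fake consumer (no real Kafka client; the class and counts are purely illustrative):

```python
# Compare commit round trips: per-record vs per-batch.

class FakeConsumer:
    """Stand-in for a Kafka consumer; counts commit round trips."""
    def __init__(self, records):
        self.records = records
        self.commits = 0
        self.committed_offset = -1

    def poll(self):
        yield from enumerate(self.records)

    def commit(self, offset):
        self.commits += 1             # each call = one broker round trip
        self.committed_offset = offset

def handle(rec):
    pass  # business logic stand-in

def process_per_record(consumer):
    for offset, rec in consumer.poll():
        handle(rec)
        consumer.commit(offset)       # one round trip per record

def process_batched(consumer, batch_size=100):
    pending = None
    for offset, rec in consumer.poll():
        handle(rec)
        pending = offset
        if (offset + 1) % batch_size == 0:
            consumer.commit(offset)   # one round trip per batch
            pending = None
    if pending is not None:
        consumer.commit(pending)      # flush any trailing partial batch

a = FakeConsumer(range(1000))
process_per_record(a)
b = FakeConsumer(range(1000))
process_batched(b)
print(a.commits, b.commits)  # 1000 10
```

The trade-off is redelivery: with batched commits, a crash mid-batch replays up to `batch_size` records, so processing must be idempotent, which is good practice for at-least-once consumers anyway.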
13. Lessons Learned so far (continued)

● Managing Kafka in hybrid cloud is hard work
○ A lot of tenants, multiple scaled-up production clusters, and very different environments (VMware on-prem, GKE on GCP)
○ Maintaining a high-quality, resilient service using self-managed Kafka on GCP has been more difficult than it should have been
○ Dependencies on the CSP, the Kafka software supplier and the internal Cloud Services team – when things go wrong, a lot of co-ordination is required. Cross-stack observability is critical.
16. Intro

Julian Gevers
▪ Enterprise Architect in the Payments Team within CTO Group Architecture at Lloyds Banking Group
▪ Previous roles: engineer, analyst and architect at various organisations including central banks, software vendors, system integrators and management consultancies
▪ Enjoys problem solving and designing high-performance, resilient transaction processing applications
17. Agenda: Faster Payments Re-architecture at Lloyds Banking Group

● A bit of history
● Background to Payments
● What are the requirements of a modern Faster Payments platform?
● What does the solution look like? (A Decoupled Architecture)
18. A bit of history

[Timeline of payment mechanisms spanning 3000 BC to the 2020s: cash, cheques, Telex, BACS, SWIFT, CHAPS, card authorisation networks, ESB, Faster Payments]
20. What do payment systems have to do? Functional Requirements

Typical Outbound Flow:
1. Initiate
2. Qualify and determine routing
3. Fraud check
4. Check funds / debit account
5. Send to beneficiary bank (via scheme)
6. Confirm outcome with customer
7. Settlement processing
8. Reconciliation

Typical Inbound Flow:
1. Receive from scheme
2. Qualify and determine routing
3. Sanctions check
4. Fraud check
5. Credit account
6. Confirmation
7. Settlement processing
8. Reconciliation
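The outbound flow above can be sketched as an ordered pipeline of steps. This is purely illustrative: the step bodies, field names and the fraud threshold are invented for the sketch, not the NPA implementation.

```python
# Outbound payment flow as a pipeline: each step takes and returns the
# payment record; a failing step raises and halts the flow.

def initiate(p):
    p["status"] = "initiated"; return p

def qualify_and_route(p):
    p["scheme"] = "FPS"; return p          # assumed routing outcome

def fraud_check(p):
    if p["amount"] > 10_000:               # illustrative referral threshold
        raise ValueError("fraud referral")
    return p

def debit_account(p):
    p["debited"] = True; return p

def send_to_scheme(p):
    p["sent"] = True; return p

def confirm_outcome(p):
    p["status"] = "confirmed"; return p

OUTBOUND_PIPELINE = [initiate, qualify_and_route, fraud_check,
                     debit_account, send_to_scheme, confirm_outcome]

def process_outbound(payment):
    for step in OUTBOUND_PIPELINE:
        payment = step(payment)
    return payment

result = process_outbound({"amount": 250})
print(result["status"])  # confirmed
```

Ordering matters: fraud and funds checks sit before the irreversible send-to-scheme step, and settlement/reconciliation (omitted here) run after the fact against the recorded events.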
21. Building a modern Faster Payments Platform: Non-functional Requirements

● Volume – high volume, >1,000 transactions per second
● Scalability – unknown future demands
● Speed – very low latency
● Resilience – continuous service through a data centre outage
● Security – protect against malicious actors
● Integrity – no payment loss
● Observability – fast detection of errors and system health concerns
● Portability – on-premise, but deployable on public cloud too
● Extensibility – change will happen!
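The “Integrity – no payment loss” requirement is commonly met by combining at-least-once delivery with idempotent processing, so a redelivered event is acknowledged but never applied twice. A minimal sketch of that assumed pattern (a real system would keep the dedup state in a durable store, not an in-memory set):

```python
# Idempotent payment processing: duplicates are detected by payment ID
# and skipped, so at-least-once redelivery cannot double-debit.

class IdempotentProcessor:
    def __init__(self):
        self.seen = set()      # processed payment IDs (durable in real life)
        self.ledger = []       # applied debits

    def process(self, event):
        pid = event["payment_id"]
        if pid in self.seen:   # duplicate redelivery: ack, don't re-apply
            return False
        self.ledger.append((pid, event["amount"]))
        self.seen.add(pid)
        return True

proc = IdempotentProcessor()
events = [{"payment_id": "p1", "amount": 100},
          {"payment_id": "p2", "amount": 50},
          {"payment_id": "p1", "amount": 100}]   # duplicate delivery of p1
applied = [proc.process(e) for e in events]
print(applied, len(proc.ledger))  # [True, True, False] 2
```

The same property supports reconciliation: because every applied payment is recorded exactly once, ledger totals can be checked against scheme settlement figures.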
22. The Solution: An Event-based Decoupled Architecture

[Diagram: Channels → Payment Access Service (API) → Faster Payments Processor → Gateway → FPS Central Infrastructure. The processor is decoupled by events from Fraud, Sanctions Screening and Accounting Services / Accounting Platforms; advices feed the System of Record (operational queries) and the System of Insight (analytics & reporting, dashboards, monitoring, reconciliation); reference data is distributed to the components.]
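The decoupling in the diagram can be sketched with a toy in-memory event bus standing in for Kafka topics (the topic name, handlers and event shape are invented for illustration): the processor publishes an event once, and downstream consumers subscribe independently, so adding a new consumer never changes the processor.

```python
# Toy publish/subscribe bus: one published event fans out to every
# independently registered consumer.

from collections import defaultdict

class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)

bus = EventBus()
sor, soi = [], []
bus.subscribe("payment.completed", sor.append)   # System of Record writer
bus.subscribe("payment.completed", soi.append)   # System of Insight feed

bus.publish("payment.completed", {"payment_id": "p1", "amount": 100})
print(len(sor), len(soi))  # 1 1
```

This is the extensibility property the NFR slide asks for: dashboards, reconciliation or a future analytics consumer attach to the stream without touching the payment processor itself.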