At NORD/LB we are using a data mesh-like approach based on Kafka to build a highly integrated data platform. We enable business-driven data integration through business objects. The introduction of a data mesh approach is first and foremost a cultural challenge. In this presentation we will start with how we engage our non-IT colleagues and how we define our data products with their help. We will then look at how we assemble them using Kafka and data streaming. We will discuss different approaches to building data products and their advantages and disadvantages. Finally we will take a brief look at our architecture and put our data mesh in its context.
Agenda
1. NORD/LB and the Finance Sector
2. Data Mesh 101
3. Communicating Data Mesh-like Strategy
4. NORD/LB’s Target Vision
5. Applying Data Mesh Principles
6. The Data Product Business Partner
7. Summary and Outlook
Challenges in the Finance Sector
§ Historically grown, complex data objects
§ Regulatory requirements
§ Data lineage / metadata management
§ Data quality
- Hard to fully decentralize, since quality depends on the purpose of use
§ Cultural challenges
- Proactive data provisioning instead of demand-driven delivery
§ SaaS
- Private endpoints
- External audits
Point-to-Point Translation - Challenges
§ Very confusing
§ Not comprehensible
§ High dependency: if a language changes, all associated translators have to be rebuilt
§ The same thing is translated into different languages, leading to inconsistencies
§ Neglects the actual challenge
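The scaling problem behind these bullets can be made concrete: with n applications and a dedicated translator per directional pair, the number of translators grows quadratically, while a shared language needs only one adapter per application. A minimal illustration (the function names are ours, not from the talk):

```python
# Point-to-point: each of n applications needs a translator to every
# other application, so the count grows quadratically with n.
def point_to_point_translators(n: int) -> int:
    return n * (n - 1)  # directional pairs

# Common language: each application only translates to/from the shared
# language once, so the count grows linearly with n.
def common_language_adapters(n: int) -> int:
    return n

for n in (5, 10, 50):
    print(n, point_to_point_translators(n), common_language_adapters(n))
# At 50 applications: 2450 translators point-to-point vs. 50 adapters.
```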
A Common Language - Challenges
§ Such a language must exist first
§ Everyone needs to learn this language
- This is easier for some (e.g. German) and more difficult for others (e.g. Chinese)
§ A platform is needed to utilise all synergy effects
§ Where does everyone meet to talk?
Metaphor Breakdown: Persons
§ The people stand for our various applications
§ All have a different perspective on data
- That's why they store it differently
- BUT: they are based on the same idea (the nature of our business)
Metaphor Breakdown: Common Language and Platform
§ Business objects as the language
- Basis for business-driven data integration
- Already spoken by the business departments; the project WoDa models the language
§ Our data platforms are Kafka and API
§ Data products are assembled from various sources
- The various parts (attributes) of a business object are obtained from its leading system (golden source)
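The assembly idea can be sketched in a few lines. This is an illustrative sketch, not NORD/LB's actual code: the attribute-to-source mapping and all field names are made up; the point is that each attribute is taken only from its leading system.

```python
# Illustrative sketch: each attribute of a business object is taken
# only from its leading system (golden source). The mapping and field
# names below are hypothetical.
GOLDEN_SOURCE = {
    "name": "crm",
    "rating": "rating_system",
    "balance_sheet_total": "financial_statements",
}

def assemble_business_object(key, source_records):
    """Build one business-object record from per-system records for `key`."""
    obj = {"id": key}
    for attribute, system in GOLDEN_SOURCE.items():
        record = source_records.get(system, {})
        if attribute in record:
            obj[attribute] = record[attribute]  # only the golden source wins
    return obj

partner = assemble_business_object(
    "BP-42",
    {
        "crm": {"name": "Acme GmbH", "rating": "AA"},  # CRM's rating is ignored
        "rating_system": {"rating": "A-"},             # golden source for rating
        "financial_statements": {"balance_sheet_total": 1_250_000},
    },
)
print(partner)  # the rating comes from the rating system, not from CRM
```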
Metaphor Breakdown: Advantages
§ Creates a highly integrated data landscape
§ Business-driven
- Comprehensible for business departments (important for participation)
§ Prevents inconsistencies
§ New perspective on data
- Data quality
- Data governance
§ Decouples the application from the data exchange format
NORD/LB's Vision
[Diagram: target architecture. Operative inventory systems and data sources (Murex, LoanIQ, Moodys, …) feed data transport (Kafka and API) into evaluation and aggregation (DQ, virtualization layer, metadata systematics, incl. lake), which provides transparency and a basis of decision (bank steering incl. DWH, reporting/analysis tool, AI). Cross-cutting: business objects and data culture, operative data management.]
§ Business objects as a business-driven language:
- Defines the vocabulary
- Combines all views on data (prerequisite for business lineage and integration)
§ Barrier-free data access via a virtualization layer:
- Centralized data provision
- Access to the object throughout its lifecycle
§ Metadata as a 360° view and control tool for data utilization
§ Standardized real-time data transport
§ Centralized DQ evaluation based on decentralized measures
§ Broad embedding of operational data management in the organization
Data Mesh: Data as a Product
§ Business objects as a business-driven language:
- Combined from different data sources
- Currently assembled by ksqlDB
- Data streaming (full load)
- Event streaming
§ Standardized real-time data transport
§ Decoupling data transport from the application
§ Centralized DQ evaluation based on decentralized measures
- Camunda workflows
Data Mesh: Data Ownership by Domain
* professional and technical
§ Push principle
§ The producer is the owner
- Ownership ends when data is used to create new information
- There are exceptions
§ Role model
- Data Owner
- Data Steward*
- Data Expert*
§ Inspired by banking supervisory requirements such as BCBS 239 and international frameworks
Data Mesh: Self-Service Data Access
§ Self-service for business departments via Power BI
- Building their own reports
§ Business objects accessible through Kafka & API
§ Kafka/API catalog
- Rights to read/write data
- Kafka: topic-based
§ Metadata management catalog
- Data lineage generated from Schema Registry data
§ Kafka upload tool
Data Mesh: Federated Governance
§ Kafka platform rules (federated!):
- All data needs to be professionally modeled
- The producer is responsible for the topic (and its data)
- All data projects are accompanied and approved by a Kafka expert
- Encrypted connections, end-to-end for sensitive data
- Developer guidelines such as:
- Schemas enforced at all times
- Naming conventions
- No files via Kafka
- CloudEvents format
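Guidelines like these can be enforced mechanically, e.g. in a CI check before a topic is created. A minimal sketch for the naming-convention rule; the pattern "<domain>.<business-object>.v<version>" is an assumed example, not NORD/LB's actual convention:

```python
import re

# Hypothetical naming convention "<domain>.<business-object>.v<version>",
# enforced before topic creation. The pattern is an assumption for
# illustration, not the real NORD/LB rule.
TOPIC_PATTERN = re.compile(r"^[a-z][a-z0-9-]*\.[a-z][a-z0-9-]*\.v[0-9]+$")

def topic_name_is_valid(name: str) -> bool:
    return TOPIC_PATTERN.fullmatch(name) is not None

print(topic_name_is_valid("partners.business-partner.v1"))  # True
print(topic_name_is_valid("MyTopic"))                       # False: no convention
```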
Business Partner MVP
[Diagram: the Business Partner business object with its facets: general information, customer relations, relations to employees, rating information, financial statements, ESG assessment, roles (e.g. unsettled succession), …]
§ Modeling by project
§ Various data sources
- CRM
- Rating system
- Financial statement system
- Kafka upload tool
§ Assembled in Kafka
§ Everyone takes what they need
Data Product Business Partner
[Diagram: the target architecture with the Business Partner sources highlighted: CRM, Rating, and External Data feed data transport (Kafka and API); downstream stages as before (DQ, virtualization layer, metadata systematics; bank steering incl. DWH, reporting/analysis tool, AI).]
Data Product Business Partner
[Diagram: building data products using Apache Kafka, shown along the stages operative inventory systems and data sources → data transport → evaluation and aggregation → transparency and basis of decision.]
Data Product Business Partner - Vision
§ No source connectors
§ Only business objects (exception: transformations only for 3rd-party software, if necessary)
§ Caveat: the Azure CosmosDB sink connector cannot parse the map datatype!
Maps as an Array of Structs
Problem:
§ Complex data delivery -> global format
§ The Azure CosmosDB sink connector cannot parse the map datatype
- Need to switch to an array of structs, losing the advantages of a map
§ Partial update of a map of maps
- Combination of filter, union and case clauses
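The workaround can be illustrated in plain Python: a map is shipped as an array of {key, value} structs, so a partial update has to merge by key (the filter/union/case combination) instead of a simple map assignment. The field values are made up.

```python
# Sketch: since the sink connector cannot parse the map datatype, the
# map is represented as an array of {key, value} structs, and updating
# one entry means filtering out the old struct and unioning in the new
# one, rather than assigning to a map key.

def map_to_array_of_structs(m):
    return [{"key": k, "value": v} for k, v in m.items()]

def partial_update(structs, key, value):
    kept = [s for s in structs if s["key"] != key]  # filter out the old entry
    return kept + [{"key": key, "value": value}]    # union with the new one

ratings = map_to_array_of_structs({"moodys": "A2", "internal": "B+"})
ratings = partial_update(ratings, "internal", "A-")
print(ratings)
```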
ksqlDB
Advantages:
§ Collection functions (filter, transform, reduce)
§ Accessible for non-techies
- Self-service
§ Available as a managed service
§ Private endpoint
Disadvantages:
§ Aggregations, but no self-join
- No guarantee of transactional behaviour
- Skipping events, buffering
§ More an abstraction layer on top of Kafka Streams
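ksqlDB's collection functions FILTER, TRANSFORM and REDUCE operate on array columns; this plain-Python sketch mirrors what such expressions do to a single array value (the numbers are made up):

```python
from functools import reduce

# Mirrors ksqlDB's array functions on one example value:
exposures = [120, -5, 300, 0]

positive = [x for x in exposures if x > 0]          # FILTER(exposures, x -> x > 0)
scaled = [x * 2 for x in positive]                  # TRANSFORM(positive, x -> x * 2)
total = reduce(lambda acc, x: acc + x, scaled, 0)   # REDUCE(scaled, 0, (acc, x) -> acc + x)

print(positive, scaled, total)  # [120, 300] [240, 600] 840
```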
Kafka Streams
Advantages:
§ Data streaming framework
§ Powerful and flexible
§ Stateful processing
§ Self-join via state stores
Disadvantages:
§ Needs to run on-prem
§ Not accessible for non-techies
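The state-store idea behind the self-join can be sketched minimally: every incoming record is buffered by key, and each new arrival is joined against the records already buffered for that key. Kafka Streams uses a persistent state store for this; a plain dict stands in for it here.

```python
from collections import defaultdict

store = defaultdict(list)  # stand-in for a Kafka Streams state store

def process(key, record):
    """Return join results for `record` against earlier records with the same key."""
    joined = [(earlier, record) for earlier in store[key]]
    store[key].append(record)  # buffer for future arrivals
    return joined

print(process("BP-42", {"src": "crm"}))     # [] - nothing buffered yet
print(process("BP-42", {"src": "rating"}))  # joins against the CRM record
```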
Apache Flink
Advantages:
§ Mature data streaming framework
§ Stateful processing
§ Solves the self-join problem
§ Accessible for non-techies
§ Available as a managed service
Disadvantages:
§ (At the moment) missing features such as collection functions (filter, transform, reduce)
§ No private endpoint
Summary and Outlook
§ Data mesh
- It's about cherry-picking
- Size matters! (of the data products)
§ Data mesh-like approach: a business language realized by data products
§ Integration process and data management
§ Cultural challenges
§ SaaS
- Private endpoints
- Auditing
Let's stay in touch!
Erik Schumann
Data Engineer @ NORD/LB
Erik.Schumann@nordlb.de
www.nordlb.de