Competing effectively in the digital age means being data-driven in order to make the right long-term and short-term decisions. However, the quality of your decisions will be proportional to the quality of your facts. Data quality is the critical, stable foundation on which your organisation can transition to being data-driven and AI-enabled.
2. COMPETING IN THE DIGITAL AGE
In a connected world, competing effectively in the digital age means making the right decisions at pace.
3. UNLOCKING THE VALUE OF AI
Machine learning algorithms rely on data to learn for themselves. AI could potentially create $3.5 trillion to $5.8 trillion in annual value in the global economy.
Source: McKinsey Global Institute, 2018
4. DATA-DRIVEN ORGANISATIONS
Leaders in the digital age are able to make strategic and operational decisions based on data, at scale. This is the data-driven organisation.
5. BUILDING ON STABLE FOUNDATIONS
The quality of your decisions will be proportional to the quality of your data. Data quality is a foundational element of achieving digital success.
6. DQ MUST BE COORDINATED ACROSS THE ENTERPRISE
Poor data quality is a symptom of poor processes and systems, so addressing it requires coordination across the enterprise. Data quality must support process assurance and improvement across the enterprise.
7. Enterprise Architecture aligned end-to-end DQ approach
A successful DQ initiative relies on alignment to existing enterprise and risk management frameworks and assets, spanning the business/process, information, integration, and application & infrastructure architecture layers:
1. Risk-based approach to identify the key processes / use cases in scope for DQ improvement.
2. Definition of customer journeys and process value chains, with customer and organisational outcomes defined.
3. Definition of a business conceptual data model and business rules based on the in-scope processes.
4. Agreement of definitions (decomposition of metrics and critical data elements), sources of truth and RACI (e.g. owners).
5. Documentation of data lineage (data flows) between key systems for each in-scope process / use case.
6. Cataloguing of systems, critical data sets and the controls environment within an Information Assets Register or Source Catalogue (steps 5 and 6 are sketched in code below).
[Figure (illustrative): supporting artefacts for each step – an Operational Risk Matrix (likelihood vs. impact, inherent 'gross' risk, risk tolerance, and the split between DQ treatment and process improvement); a Customer Journey & Associated Business Value Chain spanning channels (web, mobile, broker, contact centre, branch) and functions (CRM, product, origination, credit approval, fulfilment, settlement, servicing, payments, risk / capital management, finance, KYC, sanctions, performance) connected via message/stream and batch integration to a cloud data asset; a Business Conceptual Model of parties, items, services, arrangements, events and locations; a Business Metric Decomposition & Business Definitions view, illustrated by an Enterprise Data Decomposition Tree breaking TSR, ROE and Economic Profit down into drivers such as revenue, expenses, capital and credit risk inputs (note: EcoProfit = NPAT - Cost of Equity ($) + IEL(CCA) + Imputation Credits, with Cost of Equity ($) = Cost of Equity (%) x Eco Cap ($); ROE = NPAT / Book Equity, Book Equity = EcoCap = Total Reg Cap); an Integration Landscape / Data Lineage view; and an Information Asset Register / Source Catalogue.]
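As an illustration of steps 5 and 6, the sketch below (Python, with entirely hypothetical systems, datasets and owners) shows one lightweight way the lineage and asset-register outputs could be captured as structured records rather than documents.

```python
from dataclasses import dataclass, field

@dataclass
class AssetRegisterEntry:
    """One record in a hypothetical Information Assets Register / Source Catalogue (step 6)."""
    system: str
    dataset: str
    critical_data_elements: list[str]
    owner: str                                  # e.g. the RACI 'accountable' owner agreed in step 4
    controls: list[str] = field(default_factory=list)

@dataclass
class LineageEdge:
    """A single hop of data lineage between two systems for an in-scope process (step 5)."""
    process: str
    source_system: str
    target_system: str
    interface: str                              # e.g. batch file, API, message stream

# Illustrative entries for a hypothetical loan-origination use case
register = [
    AssetRegisterEntry(
        system="CRM", dataset="customer_master",
        critical_data_elements=["customer_id", "date_of_birth", "residential_address"],
        owner="Head of Retail Operations",
        controls=["mandatory field validation", "duplicate check"],
    ),
]
lineage = [
    LineageEdge("Loan origination", "CRM", "Origination", "API"),
    LineageEdge("Loan origination", "Origination", "Risk / Capital Management", "batch file"),
]
```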
8. Principles of Cognitivo's DQ approach
A pragmatic approach that doesn't "boil the ocean" is required: focus on priority use cases while leveraging organisational assets and AI to scale.
• Risk & Policy Based – identify key processes that carry material data risk as the prioritised areas for DQ diagnosis and treatment.
• Process (use-case) Centric – identify the data flows that underpin key processes and address data quality across the entire end-to-end data flow.
• Metadata Driven – develop or reuse a conceptual data model as an abstraction layer for agreeing definitions and business rules with business stakeholders; these are subsequently mapped to physical data models (see the sketch after this list).
• Analytics & ML Enabled – use data science techniques (such as ML, text analytics and computer vision) to build industry- and organisation-specific data matching and data quality diagnosis techniques.
• Embedded in Business-As-Usual – roll out DQ controls and measurement (dashboards) as part of the organisation's existing quality assurance processes, rather than constructing a new data KPI and consequence management framework.
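To illustrate the Metadata Driven principle, here is a minimal sketch (all entity, table and column names are hypothetical) of a business rule agreed once against the conceptual model and then fanned out to the physical columns that implement that concept in each in-scope system.

```python
# All names below are hypothetical; the structure simply illustrates the principle.
conceptual_rule = {
    "business_term": "Party.date_of_birth",
    "definition": "Date of birth of an individual party",
    "business_rule": "must not be null and must imply an age between 0 and 120",
}

# Mapping of the conceptual attribute to the physical columns in each in-scope system
physical_mappings = [
    {"system": "CRM", "table": "customer", "column": "dob"},
    {"system": "Origination", "table": "applicant", "column": "birth_date"},
]

def physical_rules(rule: dict, mappings: list[dict]) -> list[dict]:
    """Fan one conceptual rule out to an executable rule per physical column."""
    return [{**rule, **m} for m in mappings]

for r in physical_rules(conceptual_rule, physical_mappings):
    print(f'{r["system"]}.{r["table"]}.{r["column"]}: {r["business_rule"]}')
```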
9. Example DQ use cases to improve key business outcomes
Cognitivo has extensive experience in executing data quality programmes within Financial Services, Government and Accounting business domains.
Compliance
• KYC / AML / CTF (assurance of data feeds)
• CPS 220, AIRB accreditation
• APRA / ABS regulatory reporting (e.g. reporting on interest-only loans)
• Basel III liquidity (FI / non-FI review)
• APS 120 securitisation (loan documentation reconciliation)
• FATCA, GATCA
• OTC reform, MiFID II (cleanse LEI / SWIFT code, legal form, country of incorporation etc.)
• APS 910 – SCV assurance
• IFRS 9 / IFRS 17 assurance
• Staff benefits review (review of former employees still on staff benefits programmes)
• Advice compliance (SOA, PDS vs. fees and charges review)
• …
Business Management
• Payroll assurance
• Financial management reporting (line of business)
• Finance cube, business unit and GL structure review
• …
Customer / Sales
• Customer contact details (marketing, product service)
• Consent status
• Customer age review
• Customer address review (e.g. suburb / postal code combination)
• Customer segmentation review
• CRM – customer structure review (e.g. customer legal structure, customer groupings)
• …
10. DQ Execution Lifecycle
Cognitivo's DQ execution lifecycle is linked to broader data demand management and IT planning lifecycles (data risk demand management feeding in, and process / system improvement flowing out).

A. Diagnosis – conduct qualitative sizing, define requirements and business rules, and conduct root cause analysis.
Holistic view of DQ issues & prioritisation:
• Organisation-wide DQ issues register with a self-assessment process to periodically assess the level of DQ risk.
• DQ deep-dives through workshops / interviews for high-risk areas.
• Prioritise high-impact and high-occurrence issues to go into the 'fix' process.

B. Profiling – size / quantify the magnitude of each DQ issue.
• Profile key data elements for validity / completeness issues.
• Correlate data across systems to identify integrity issues; this can include techniques such as financial reconciliation (checksums).
• Deploy an analytical process to find illogical combinations of data, outliers etc.
• Apply higher-complexity techniques such as text analytics and computer vision to correlate with unstructured data sources.
• Use machine learning approaches to identify patterns for acceptable values / ranges.

C. Correction & Cleansing
Correction process:
• Obtain correct values, followed by cleansing / system updates.
• Automate corrections through cross-system and 3rd-party data source lookup or derivation.
• As a final step, client outreach may be required (e.g. establishing a call-centre process).
Cleansing process:
• Establish a process for bulk update, testing and rollback within core systems.
• For systems where bulk update is not possible, develop RPA and manual update capabilities.
Systemic fix:
• Make recommendations for systemic fixes through the organisation's broader change / fail-fix agenda.

D. Monitoring & Reporting
• DQ issues that are profiled / fixed all trace back to a business unit, so DQ metrics can form process / compliance KPIs for business owners.
• DQ scorecards can automate existing QA processes / operational risk controls by quantifying instances where data entry is missing / incorrect.
• Trend analysis on DQ results for each responsible business unit.
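As a minimal illustration of the Monitoring & Reporting step, the sketch below (hypothetical rule names, business units and figures, using pandas) rolls profiling results up into a per-business-unit DQ score that can be trended between runs.

```python
import pandas as pd

# Hypothetical profiling output: each result row already carries the owning business unit.
results = pd.DataFrame([
    {"run_date": "2023-06-30", "business_unit": "Retail", "rule": "dob_not_null", "records": 1000, "failures": 42},
    {"run_date": "2023-06-30", "business_unit": "Business Bank", "rule": "dob_not_null", "records": 400, "failures": 3},
    {"run_date": "2023-07-31", "business_unit": "Retail", "rule": "dob_not_null", "records": 1010, "failures": 21},
])

# Roll results up to a DQ score per business unit per run, suitable for KPI and trend reporting
scorecard = results.groupby(["run_date", "business_unit"])[["records", "failures"]].sum()
scorecard["dq_score"] = 1 - scorecard["failures"] / scorecard["records"]
print(scorecard["dq_score"])
```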
11. DQ Discovery: Scalable Data Quality DevOps
Cognitivo's data quality workflow incorporates analytical tools, business testing and deployment into a DQ DevOps process that spans the operational environment and a DQ development workspace, combining automated and manual processes.
• Ingestion – data from client applications, source systems and processes is ingested into the DQ development workspace.
• Profiling – profiling runs in a customer matching & analytical environment; new DQ rules are raised and promoted into the DQ rules engine for production deployment.
• Issue prioritisation – issues are prioritised based on risk, i.e. likelihood and consequence, and tracked through issue and workflow management (see the sketch after this list).
• Correct (obtain correct values & validate) – de-duplicate customers within systems (collapse entities); cross-system and 3rd-party lookup; document / correspondence lookup; customer self-service; customer outreach for high-priority / high-risk cases.
• Cleanse (system update) – database bulk update / amend; front-end data entry (incl. use of RPA, robotic process automation); manual testing of values to update via risk-based sampling.
• Monitoring & reporting – results dashboard; business unit QA extended to include DQ measures; root cause analysis feeding process and system fixes.
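The issue prioritisation step can be illustrated with a small sketch; the three-point likelihood and consequence scales and the issues below are hypothetical, and the product of the two scores simply orders the fix queue.

```python
# Hypothetical scales; likelihood x consequence gives a simple priority score.
LIKELIHOOD = {"low": 1, "medium": 2, "high": 3}
CONSEQUENCE = {"low": 1, "medium": 2, "high": 3}

issues = [
    {"id": "DQ-101", "description": "Missing customer date of birth", "likelihood": "high", "consequence": "medium"},
    {"id": "DQ-102", "description": "Invalid suburb / postal code combination", "likelihood": "medium", "consequence": "low"},
    {"id": "DQ-103", "description": "Duplicate customers across CRM and origination", "likelihood": "medium", "consequence": "high"},
]

for issue in issues:
    issue["priority"] = LIKELIHOOD[issue["likelihood"]] * CONSEQUENCE[issue["consequence"]]

# Highest-priority issues enter the correction and cleansing workflow first
for issue in sorted(issues, key=lambda i: i["priority"], reverse=True):
    print(issue["id"], issue["priority"], issue["description"])
```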
12. DQ Workspace & Platform Architecture
Cognitivo has a vendor-agnostic DQ technical reference architecture that can be implemented on any cloud or on-premise environment.

Data Pipeline
• Ingestion from client source systems, client data warehouse(s) and document repositories, including new extracts for checksums etc.
• Batch pipeline: batch data ingestion using files (CSV) or ODBC/JDBC.
• Real-time pipeline: real-time integration via APIs / messaging.

Data Lake (Source Data Layer)
• Raw source system data, used to derive row counts and perform validity checks.
• Linked data (lightly integrated): cross-system table linking to correlate values across matched customers.
• Semantic / conformed data (with history): conformed / derived values to expedite DQ rule execution and provide a history of values for outlier / drift detection.
• DQ profile result store: stores DQ profiling output results, retaining historical values to allow trend analysis of DQ over time.

DQ Rules Engine
• DQ rule execution (Python).
• Rules store: DQ rules based on the derived semantic data model, stored in JSON format (see the sketch below).
• Scheduler: executes DQ rules on a periodic basis.
• Text extraction & OCR: text extraction and analytics libraries, e.g. Tesseract.

User Interface
• DQ policy & rules configuration: management of DQ rules, tolerances and the business owners of DQ events.
• Case management: tool for logging, investigating and remediating DQ issues.
• Results dashboard (Power BI / Qlik): DQ dashboards for consumption by data stewards and business stakeholders (data owners).

Data Science / ML Discovery Environment
• Data science tools / workbench for the execution of analytical workloads.
• Data science development & collaboration tools, e.g. Git, Jupyter.
• Self-service data ingestion: data import tools for unmanaged datasets (used for discovery purposes).
• Data workspace provisioning: provision of persistent and temporary storage and access to access-controlled data sets (specific to department, user and use case).
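To make the rules store and scheduler idea concrete, here is a minimal sketch; the rule schema and field names are assumptions for illustration, not the actual Cognitivo format. A JSON-defined validity rule is loaded and executed by a small Python function that a scheduler (cron, an orchestrator, etc.) could invoke periodically.

```python
import json
import re

# Hypothetical rule definition, as it might sit in the rules store (JSON, keyed to the semantic model)
rule_json = """
{
  "rule_id": "validity.party.email_format",
  "entity": "Party",
  "attribute": "email_address",
  "check": "regex",
  "pattern": "^[^@\\\\s]+@[^@\\\\s]+\\\\.[^@\\\\s]+$",
  "tolerance": 0.02
}
"""

def run_rule(rule: dict, values: list) -> dict:
    """Execute a single regex-based validity rule and report the failure rate against its tolerance."""
    pattern = re.compile(rule["pattern"])
    failures = [v for v in values if not pattern.match(v or "")]
    rate = len(failures) / len(values) if values else 0.0
    return {"rule_id": rule["rule_id"], "failure_rate": rate, "breach": rate > rule["tolerance"]}

# A scheduler would load each rule from the rules store and run it on a periodic basis
rule = json.loads(rule_json)
print(run_rule(rule, ["a@example.com", "not-an-email", "b@example.org"]))
```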
13. DQ profiling techniques to be employed
Cognitivo's analytical DQ framework deploys a number of analytical tests across structured and unstructured data sources.
• Test for Completeness – record count anomalies; financial reconciliation (checksums).
• Test for Validity – data type and format checks (regex pattern match); allowable values / reference data lookup; null value check.
• Test for Accuracy – single-field logic check (e.g. age > 100); illogical combinations of multiple data fields (e.g. an individual with a business name); 3rd-party cross-reference; cross-system value cross-reference; reasonable value check (record anomaly / outlier, value drift over time).
• Test for Timeliness – data ingestion (ETL/ELT) synchronisation review.
• Document text extraction & cross-reference.
• Computer vision – image recognition and object classification.
• Test for Uniqueness – duplicates within systems; cross-system master data reconciliation.
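A few of these tests can be sketched in a handful of lines; the example below uses pandas on hypothetical customer data to flag null values, an illogical individual-with-business-name combination, an implausible age, duplicate identifiers and a record-count drop.

```python
import pandas as pd

# Hypothetical customer extract
customers = pd.DataFrame([
    {"customer_id": 1, "customer_type": "Individual", "business_name": None,           "age": 34},
    {"customer_id": 2, "customer_type": "Individual", "business_name": "Acme Pty Ltd", "age": 131},
    {"customer_id": 2, "customer_type": "Company",    "business_name": "Acme Pty Ltd", "age": None},
])

null_age        = customers["age"].isna()                                    # validity: null value check
illogical_combo = (customers["customer_type"] == "Individual") & customers["business_name"].notna()  # accuracy
implausible_age = customers["age"] > 100                                     # accuracy: single-field logic check
duplicate_ids   = customers["customer_id"].duplicated(keep=False)            # uniqueness within a system

print(customers[null_age | illogical_combo | implausible_age | duplicate_ids])

# Completeness: flag a record-count anomaly in the latest load against recent history
history = pd.Series([10_020, 10_105, 9_980])    # prior daily row counts
latest = 4_100
print("count anomaly:", latest < 0.5 * history.median())
```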
14. Cognitivo's DQ Platform Capabilities
Cognitivo has a DQ application framework that can be deployed onto private clouds via containers or accessed as a SaaS offering.

Data Steward Portal (UX)
• Create profiling rules.
• Diagnose DQ issues through reports and dashboards.
• Workflow to approve data changes and case-manage remediation.
• APIs to integrate with 3rd-party applications and check valid data entry against data quality rules.

Core DQ Engine
• Semantic model of parameters for data stewards to create DQ rules.
• DQ rule templates (e.g. regex functions, address validity, ABN format etc.; see the sketch below).
• Analytical engine to run complex data accuracy / integrity rules.
• API to allow 3rd-party and customer automation, extension and access to DQ results.

Data Pipeline
• Securely connect on-premise data sources to cloud environments in an encrypted manner (gateway).
• Database to store multiple time-stamped sampled extracts from source systems.
• Efficient data ingestion pipeline with connectors for key council systems (e.g. Dynamics, …).

Embedding DQ processes
• Build continuous improvement initiatives within directorates based on DQ analysis (e.g. asking additional questions when customers call / visit).
• Set DQ KPIs within process metrics (e.g. accuracy of mandatory data capture).

[Figure: platform components – customer data sources, gateway, connectors, scheduler, DQ profiler, DQ rules library, DQ managed parameters (semantic model), DQ profiling datastore, data quality hub, case management, DQ analytical engine, investigation (Jupyter), reporting dashboard, and user interfaces (web, mobile app, API) for data stewards.]
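As an example of a rule template such as the ABN format check mentioned under the Core DQ Engine, the sketch below validates an ABN using the published format and checksum rules; the function name and usage are illustrative, not the platform's actual API.

```python
import re

ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)

def is_valid_abn(raw: str) -> bool:
    """True if the string is an 11-digit ABN that passes the published checksum rules."""
    abn = re.sub(r"\s", "", raw)
    if not re.fullmatch(r"\d{11}", abn):
        return False                               # format check: exactly 11 digits
    digits = [int(d) for d in abn]
    digits[0] -= 1                                 # subtract 1 from the first digit
    weighted_sum = sum(w * d for w, d in zip(ABN_WEIGHTS, digits))
    return weighted_sum % 89 == 0                  # valid ABNs have a weighted sum divisible by 89

print(is_valid_abn("51 824 753 556"))   # True  – passes the checksum
print(is_valid_abn("51 824 753 557"))   # False – fails the checksum
```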