This document discusses data quality and management. It covers:
- The importance of data quality metrics like accuracy, completeness, conformity, and timeliness for verifying data.
- Why data quality is needed, as poor quality data can cause issues in applications and business processes.
- Approaches to improving data quality, including identifying quality issues, acting on the data directly through normalization and rules, or acting on underlying processes.
3. COMPAGNIE PLASTIC OMNIUM
CONFIDENTIAL
Data Quality in Data Management
MARCH 2015
DATA MANAGEMENT
Multiple modules: Collect, Storage, Data Mining / Machine Learning, Data Viz, Governance, Security, Master Data, Data Quality
BIG DATA
The 5 Vs: Velocity, Volume, Variety, Veracity, Value
4. Data quality – What
DATA QUALITY - VERACITY
Can be used for all data storage (not only in a Master Data Management product)
Quality = metrics on your data
METRICS
Metrics can depend on the data
EXAMPLES OF METRICS
Impact scope: TABLE = a single table, APPLI = one application, MULTI-APPLI = across applications
Accuracy : exact representation of the data (ex : correct spelling). TABLE impact (ex : nb of correct values / nb total)
Completeness : all fields are filled (ex : no void values). TABLE impact (ex : nb of void values / nb total)
Conformity : respects a specific format (ex : country code - regex). TABLE impact
Integrity : no missing links between records, no orphans. APPLI impact
Duplication : no unnecessary multiple representations of the same data. APPLI impact
Timeliness : is the data sufficiently up-to-date (ex : current time - time the record was saved in DB). APPLI impact
Consistency : matching of the data across applications. MULTI-APPLI impact
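The table-level metrics above (completeness, conformity) are simple ratios over the record count. A minimal sketch in Python, using a hypothetical country_code column and made-up sample values for illustration:

```python
import re

# Hypothetical sample table: records with a "country_code" field
# (illustrative data, not from the deck)
records = [
    {"country_code": "FR"},
    {"country_code": "DE"},
    {"country_code": ""},     # void value -> hurts completeness
    {"country_code": "fra"},  # wrong format -> hurts conformity
]

ISO_CODE = re.compile(r"^[A-Z]{2}$")  # regex for a 2-letter country code

total = len(records)
filled = [r["country_code"] for r in records if r["country_code"]]

# Completeness: share of non-void values
completeness = len(filled) / total

# Conformity: share of values matching the expected format
conformity = sum(1 for v in filled if ISO_CODE.match(v)) / total

print(f"completeness = {completeness:.2f}")  # 0.75
print(f"conformity = {conformity:.2f}")      # 0.50
```

The same pattern extends to any TABLE-level metric: a predicate per record, counted against the total.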
5. Data quality – Why / How
DATA QUALITY
Needed by the Master Data module to build the 'single version of truth'
Need to clean and fix data in the applications
Need to find where poor quality enters the data processing
Poor quality = issues in applications and processes, obstacles to growing and building
Prerequisite to exploiting data (garbage in -> garbage out)
APPLY A GLOBAL TREATMENT ON THE DATA
Plan-Do-Check-Act approach
Focus on a perimeter of data
Requires knowing the data process; define metrics/KPIs
Collect
Store
Measure data quality
Identify poor quality in the process
Take actions to improve it
Put quality in at the creation of the data
Monitor the results
[Chart: the two levers, 'act on data' and 'act on process', with candidate actions (action 1, action 2, action 3) positioned by cost vs. benefit]
6. Data quality – Act on data / process
ACT ONLY ON DATA
Act directly on the data
Easier, but the sources of poor quality remain
Normalize void values, apply business rules and regexes
Use master data (countries, postal codes, PO sites, VAT, addresses)
Add integrity constraints in the DB
Re-build technical links between records
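The 'act on data' lever can be sketched as a cleanup pass: normalize void values, then validate against master data. A minimal sketch, assuming a hypothetical record shape and an invented master-data set (not from the deck):

```python
# Hypothetical cleanup pass for the "act on data" lever:
# normalize void values and validate against a master-data reference.
MASTER_COUNTRIES = {"FR", "DE", "ES"}  # assumed master-data list

def clean_record(record: dict) -> dict:
    """Return a cleaned copy of a record (illustrative rules only)."""
    cleaned = dict(record)

    # Normalize void values: treat "", "N/A", None uniformly
    code = (cleaned.get("country_code") or "").strip().upper()
    if code in {"", "N/A"}:
        code = None

    # Business rule: the code must exist in the master data
    cleaned["country_code"] = code if code in MASTER_COUNTRIES else None
    return cleaned

print(clean_record({"country_code": "fr"}))   # {'country_code': 'FR'}
print(clean_record({"country_code": "N/A"}))  # {'country_code': None}
```

Note the limit the slide states: this fixes the stored values but does nothing about where the bad values came from.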
ACT ON PROCESS
Modify the process to eliminate the sources of poor quality
More difficult, but more sustainable
Bind a quality control to each data source
Use master data
Replace manual tasks with ETL jobs
Clarify the data process and its synchronization
Look for the root cause : Who, What, Where, When, Why ; 5 Whys with the actors
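"Bind a quality control to each data source" can be pictured as a gate at ingestion that splits records into accepted and rejected sets before they reach storage. A minimal sketch with invented rules and sample records (the rule names and fields are assumptions, not from the deck):

```python
# Minimal sketch of a quality gate bound to a data source at ingestion
# (hypothetical rules; in practice these would live in the ETL job).
def quality_gate(records, rules):
    """Split incoming records into accepted and rejected, with failed rule names."""
    accepted, rejected = [], []
    for record in records:
        failures = [name for name, check in rules.items() if not check(record)]
        (rejected if failures else accepted).append((record, failures))
    return accepted, rejected

rules = {
    "country_filled": lambda r: bool(r.get("country_code")),
    "vat_filled": lambda r: bool(r.get("vat")),
}

incoming = [
    {"country_code": "FR", "vat": "FR123"},
    {"country_code": "", "vat": "DE456"},  # fails country_filled
]
accepted, rejected = quality_gate(incoming, rules)
print(len(accepted), len(rejected))  # 1 1
```

Rejected records, with the names of the rules they failed, feed the root-cause analysis (5 Whys) the slide calls for.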
7. Data quality – Act on process
IDENTIFY POOR QUALITY
Human and process causes (~50%)
Human : typos, poor guidance, insufficient controls
Process : bad data sources, bad transforms in the ETL process
Data model and application causes (~30%)
Data model (MCD, conceptual data model) : no key to uniquely identify records
Application : bad business controls, bugs, no usage of master data
Production causes (~20%)
Bad ILM (Information Lifecycle Management) : obsolescence, bad synchronization of the data lifecycle and treatments
No master data, data in silos
Other causes
Migration of a system
Supplier changes, reorganization
Monitoring by data source over time
Instant monitoring by metric, by data source
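The two monitoring views above can be sketched with one structure: a time series of a metric per data source, from which the "instant" view is just the latest point. A minimal sketch with invented sources and values (not from the deck):

```python
from collections import defaultdict
from datetime import date

# Minimal sketch of monitoring a quality metric per data source over time
# (hypothetical measurements; a real setup would persist these in a DB).
history = defaultdict(list)  # source -> [(date, completeness)]

def record_measure(source, day, completeness):
    history[source].append((day, completeness))

record_measure("ERP", date(2015, 3, 1), 0.90)
record_measure("ERP", date(2015, 3, 8), 0.95)
record_measure("CRM", date(2015, 3, 1), 0.70)

# Instant view: latest value per source
latest = {src: series[-1][1] for src, series in history.items()}
print(latest)  # {'ERP': 0.95, 'CRM': 0.7}
```

The full series per source supports the over-time view (is quality improving after an action?); the latest-value dict supports the instant view per metric and source.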