A HITCHHIKER'S GUIDE TO DATA QUALITY
Tatiana Stebakova
The Data & Information Assembly, Australia, April 2015
Content
• Evolution of the DQ Governance approach over the past 10 years
• How to make a quantum leap from DQ theory to execution: a personal view
• You’ve done it all by the book, but there is little traction in Data Quality. DQ and systems thinking. Don’t panic!
Evolution of the DQ Governance approach over the past 10 years
• Data Duplicates - still magic words
• Data Quality Frameworks - from emergence to maturity
• Senior Management Support - a breakthrough
• Senior Architects Support - little change
• Data Quality Governance - from novelty to mainstream
• Data Quality Tools and Technology - from luxury to BAU
• Metadata - from “what is it?” to “the new black”
How to make a quantum leap from DQ theory to execution: a personal view
Step 1. Data Quality Justification
DQ horror stories
About 6.5 million Americans are 112 or older, at least on paper: the US Social Security Administration has 6.5 million people on record as having reached the age of 112, even though only 42 people are known to be that old globally.
"Studies in cost analysis show that
between 15% to > 20% of a company’s operating
revenue is spent doing things to get around or fix
data quality issues"
Larry English
Option 1 - What can we gain?
Option 2 - Scare technique
Option 3 (my favourite) - Risks
"Poor data is like a dirty windscreen. You can continue driving as your
vision degrades, but at some point you must stop and clear the
windscreen or risk everything"
Ken Orr
Step 2. Build DQ requirements into the solution architecture and the system development contract
Examples of DQ requirements:
The ETL solution SHALL have the capability to perform column integrity screening/profiling.
The ETL solution SHALL have the capability to perform data structure screening/profiling.
The ETL solution SHALL have the capability to perform business rule compliance screening/profiling.
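The three screening requirements above can be illustrated with a small Python sketch. It is a minimal sketch under stated assumptions, not a reference implementation: the record layout, the 4-digit postcode format, the `*_screen` function names and the 112-year age threshold (borrowed from the Social Security horror story) are all hypothetical. Each screen returns `(row_index, column, problem)` tuples instead of rejecting rows, so an ETL run can log and report failures.

```python
from datetime import date

# Hypothetical source extract: column names and formats are illustrative only.
EXPECTED_COLUMNS = {"customer_id", "name", "birth_date", "postcode"}

rows = [
    {"customer_id": 1, "name": "Ann", "birth_date": date(1980, 5, 1), "postcode": "2000"},
    {"customer_id": 2, "name": "", "birth_date": date(1890, 1, 1), "postcode": "20A0"},
    {"customer_id": 2, "name": "Bob", "birth_date": None, "postcode": "3000"},
]


def column_screen(rows):
    """Column integrity: check each value on its own (blanks, formats)."""
    errors = []
    for i, row in enumerate(rows):
        if not row.get("name"):
            errors.append((i, "name", "missing or blank"))
        postcode = row.get("postcode")
        # Assumption: a valid postcode is exactly four digits.
        if not (isinstance(postcode, str) and len(postcode) == 4 and postcode.isdigit()):
            errors.append((i, "postcode", "not a 4-digit postcode"))
    return errors


def structure_screen(rows):
    """Data structure: expected columns present and primary key unique."""
    errors = []
    seen_keys = set()
    for i, row in enumerate(rows):
        missing = EXPECTED_COLUMNS - set(row)
        if missing:
            errors.append((i, ",".join(sorted(missing)), "missing columns"))
        key = row.get("customer_id")
        if key in seen_keys:
            errors.append((i, "customer_id", "duplicate key"))
        seen_keys.add(key)
    return errors


def business_rule_screen(rows, as_of, max_age=112):
    """Business rule: a birth date must exist and imply a plausible age."""
    errors = []
    for i, row in enumerate(rows):
        born = row.get("birth_date")
        if born is None:
            errors.append((i, "birth_date", "missing"))
        # Age in calendar years is approximate (ignores month and day).
        elif as_of.year - born.year > max_age:
            errors.append((i, "birth_date", f"implied age over {max_age}"))
    return errors


if __name__ == "__main__":
    as_of = date(2015, 4, 1)
    for screen, found in [
        ("column", column_screen(rows)),
        ("structure", structure_screen(rows)),
        ("business rule", business_rule_screen(rows, as_of)),
    ]:
        print(screen, found)
```

Keeping the three screens separate makes it straightforward to count failures per category, which is the kind of figure a DQ KPI report needs.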
“Quality should be built into the product, and testing alone cannot be relied on to ensure product quality.” (FDA, Current Good Manufacturing Practice)
The … ETL controls solution SHALL perform a periodic full snapshot of the same data for reconciliation purposes if delta files are used.
The … ETL solution SHALL have the capability to perform data structure screening/profiling.
The … data extract process SHALL support logical data consistency (temporal relationship of data).
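The snapshot and temporal-consistency requirements can be sketched as two checks. This is an illustrative sketch only: the `(op, key, value)` delta format, the parent/child layout, the integer timestamps and all function names are assumptions, not part of any particular ETL product. `reconcile` compares a target maintained from delta files with a periodic full snapshot of the source; `temporal_violations` checks that child rows are consistent with their parents as of a single extract cut-off.

```python
def apply_deltas(state, deltas):
    """Apply (op, key, value) delta records, in order, to a copy of the target."""
    result = dict(state)
    for op, key, value in deltas:
        if op == "upsert":
            result[key] = value
        elif op == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown delta operation: {op!r}")
    return result


def reconcile(target, snapshot):
    """Compare a delta-maintained target with a periodic full source snapshot."""
    return {
        # In the source snapshot but lost somewhere in the delta chain.
        "missing": sorted(snapshot.keys() - target.keys()),
        # In the target but no longer in the source (e.g. a missed delete).
        "unexpected": sorted(target.keys() - snapshot.keys()),
        # Present in both, but values have drifted apart.
        "mismatched": sorted(
            k for k in snapshot.keys() & target.keys() if snapshot[k] != target[k]
        ),
    }


def temporal_violations(parents, children, cutoff):
    """Check child rows against their parents as of one extract cut-off.

    parents:  {parent_key: created_at}
    children: [(child_id, parent_key, created_at), ...]
    Timestamps only need to be mutually comparable (ints here for brevity).
    """
    issues = []
    for child_id, parent_key, created_at in children:
        if created_at > cutoff:
            issues.append((child_id, "created after extract cut-off"))
        elif parent_key not in parents:
            issues.append((child_id, "parent missing from extract"))
        elif parents[parent_key] > created_at:
            issues.append((child_id, "child predates its parent"))
    return issues


if __name__ == "__main__":
    target = apply_deltas(
        {1: "a", 2: "b"},
        [("upsert", 2, "B"), ("upsert", 3, "c"), ("delete", 1, None)],
    )
    print(reconcile(target, {2: "B", 3: "C", 4: "d"}))
    print(temporal_violations(
        {"C1": 10, "C2": 20},
        [("O1", "C1", 15), ("O2", "C2", 5), ("O3", "C9", 12), ("O4", "C1", 40)],
        cutoff=30,
    ))
```

The design choice worth noting is that reconciliation reports discrepancies rather than silently overwriting the target, so drift caused by lost or duplicated delta files stays visible.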
Step 3. Build data quality requirements and DQ KPIs into the system operation contract
“I’ve never been a good spectator. Either I’m playing the game or I’m not interested.”
Christiaan Barnard, the surgeon who performed the first human heart transplant
The … solution SHALL have the capability to measure and report on the data quality Key Performance Indicators (KPIs) as defined by the Governance authority.
KPI examples:
• customer record uniqueness
• directory currency and accessibility
• information provenance
• uptake rate (coverage)
• quality of records per DQ dimension and characteristic
• response time for typical transactions
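A few of the KPIs listed above can be computed very simply. The sketch below is hypothetical: the KPI definitions (uniqueness as distinct business keys over total records, completeness as the share of non-blank values, coverage as the share of a target population that has a record), the field names and the normalisation rule are assumptions; a real Governance authority would define its own.

```python
from collections import Counter


def _normalise(value):
    """Trim and lower-case a value so trivial variants compare equal."""
    return str(value if value is not None else "").strip().lower()


def _business_key(record, key_fields):
    return tuple(_normalise(record.get(f)) for f in key_fields)


def uniqueness(records, key_fields):
    """Distinct business keys / total records; 1.0 means no duplicates."""
    if not records:
        return 1.0
    keys = [_business_key(r, key_fields) for r in records]
    return len(set(keys)) / len(keys)


def duplicate_keys(records, key_fields):
    """Business keys that occur more than once, for follow-up by data stewards."""
    counts = Counter(_business_key(r, key_fields) for r in records)
    return sorted(k for k, n in counts.items() if n > 1)


def completeness(records, field):
    """Share of records with a non-blank value in `field` (one DQ dimension)."""
    if not records:
        return 1.0
    return sum(1 for r in records if _normalise(r.get(field))) / len(records)


def coverage(enrolled_ids, target_ids):
    """Uptake rate: share of the target population that has a record."""
    target = set(target_ids)
    if not target:
        return 1.0
    return len(set(enrolled_ids) & target) / len(target)


if __name__ == "__main__":
    customers = [
        {"name": "Ann Lee", "dob": "1980-05-01"},
        {"name": "ann lee ", "dob": "1980-05-01"},
        {"name": "Bob", "dob": ""},
    ]
    print("uniqueness", uniqueness(customers, ["name", "dob"]))
    print("duplicates", duplicate_keys(customers, ["name", "dob"]))
    print("dob completeness", completeness(customers, "dob"))
    print("coverage", coverage({"a", "b"}, {"a", "b", "c", "d"}))
```

Returning plain ratios keeps each KPI comparable over time, which matters more for an operation contract than the exact formula chosen.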
You’ve done it all by the book, but there is little traction in Data Quality.
• Don’t be afraid
• From Hitchhiker to Hijacker
• Become a driver: apply for architect, project lead or data management jobs
• Drop your “data quality bugs/requirements” anywhere you can
• Look for opportunities. Change your strategy all the time
• Disguise your requirements: do not call them DQ requirements
• Lean on standards
• Do not reference DQ gurus. Reference technology gurus instead
• Befriend architects
• Be patient, keep cool
“Success is not final, failure is not fatal: it is the courage to continue that counts.”
Winston Churchill
Data Quality and systems thinking
• Complex adaptive systems (CAS) are dynamic systems able to adapt to a changing environment, in which all participants are closely linked with each other, making up an “IT ecosystem” (MIT)
• Within such an ecosystem, change is not so much adaptation as co-evolution with all other related systems
• Rules of flocking:
  • Follow the leader
  • Align with neighbours
  • Avoid overcrowding
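The three rules of flocking closely resemble Craig Reynolds' classic "boids" steering rules (cohesion, alignment, separation), with cohesion aimed at a leader instead of the flock centre. The toy sketch below shows how purely local rules produce coordinated movement; the `Agent` type, the `step` function, the radii and the weights are all hypothetical choices for illustration, not a model of any real IT ecosystem.

```python
import math
from dataclasses import dataclass


@dataclass
class Agent:
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0


def step(agents, leader, radius=5.0, min_dist=1.0,
         w_follow=0.05, w_align=0.1, w_avoid=0.2):
    """Advance every agent by one tick using the three flocking rules.

    All agents are updated from the same (old) state, so the result
    does not depend on list order. Weights and radii are illustrative.
    """
    moved = []
    for a in agents:
        # Rule 1, follow the leader: steer towards the leader's position.
        dvx = (leader.x - a.x) * w_follow
        dvy = (leader.y - a.y) * w_follow

        neighbours = [b for b in agents
                      if b is not a and math.dist((a.x, a.y), (b.x, b.y)) < radius]
        if neighbours:
            # Rule 2, align with neighbours: nudge velocity towards their average.
            dvx += (sum(b.vx for b in neighbours) / len(neighbours) - a.vx) * w_align
            dvy += (sum(b.vy for b in neighbours) / len(neighbours) - a.vy) * w_align
        for b in neighbours:
            # Rule 3, avoid overcrowding: push away from agents that are too close.
            if math.dist((a.x, a.y), (b.x, b.y)) < min_dist:
                dvx += (a.x - b.x) * w_avoid
                dvy += (a.y - b.y) * w_avoid

        vx, vy = a.vx + dvx, a.vy + dvy
        moved.append(Agent(a.x + vx, a.y + vy, vx, vy))
    return moved


if __name__ == "__main__":
    flock = [Agent(0, 0), Agent(0.5, 0), Agent(3, 2)]
    leader = Agent(10, 5)
    for _ in range(3):
        flock = step(flock, leader)
    print([(round(a.x, 2), round(a.y, 2)) for a in flock])
```

No agent sees the whole flock, yet the group moves together: a loose analogy for co-evolution in an ecosystem where no single team controls the outcome.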
Systems thinking: delayed response
• Launch date: 2 March 2004
• Mission duration: 10 years, 11 months and 23 days
• 6.5 billion kilometres
“After 10 years, and a journey of more than six billion kilometres, the Rosetta spacecraft sent its fridge-sized Philae lander down to Comet 67P/Churyumov-Gerasimenko.”
Questions

Editor's Notes

  1. Marvin, Ford Prefect, Arthur Dent