Data quality overview

  1. Data Quality Overview
     Alex Meadows
     1/28/2013
  2. Data Quality Facts
     ● Cost of poor data quality in US - $600 Billion
     ● Poor Data/Lack of visibility cited as #1 reason for project cost overruns
     ● Poor data quality costs the US Economy $3.1 Trillion a year
     ● Implementing data quality best practices boosts revenue by 66%
     ● Median Fortune 1000 company could increase revenue by $2.01 Billion if they improved usability of data by 10%
     Source: http://www.webmastat.com/blog/2012/09/07/7-facts-about-data-quality/
  3. What is Data Quality?
     Measuring data to determine if it is “fit for purpose”
  4. Fit For Purpose?
     ● “Bad” data is a myth!
     ● Two Questions
       ● What is the data used for?
       ● What can be measured to make sure it meets the need?
     ● Application use vs. Reporting/Analysis
  5. Data Quality Dimensions
     ● Consistency
     ● Correctness
     ● Timeliness
     ● Precision
     ● Unambiguous
     ● Completeness
     ● Reliability
     ● Accuracy
     ● Objectivity
     ● Conciseness
     ● Usefulness
     ● Usability
     ● Relevance
     ● Amount of data
     Source: Data Quality Fundamentals, The Data Warehousing Institute
  6. Measuring Data Quality
     ● Profiling – understanding metadata
       ● Point in time shows what data looks like now
       ● Automating shows trends
         – Alert to new/potential issues as they happen
         – Potentially fix issues in near real time
         – Six Sigma Principles
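
A point-in-time profile, as described on the Measuring Data Quality slide, can be sketched in a few lines of Python. This is only an illustration: the function name `profile_column`, the chosen metrics, and the sample values are assumptions, not something taken from the deck.

```python
from collections import Counter

def profile_column(values):
    """Return simple point-in-time metadata for one column of values."""
    # Treat None and the empty string as missing (an assumption for this sketch).
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(counts),
        "completeness": len(non_null) / len(values) if values else 0.0,
        "most_common": counts.most_common(1)[0] if counts else None,
    }

# Hypothetical sample column of US state codes.
states = ["NC", "NC", "VA", None, "", "SC"]
profile = profile_column(states)
print(profile)
```

Running the same profile on a schedule and storing each result is one way to get the trend view the slide mentions.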
  7. Statistical Process Control
     ● Automated inspection
     ● Visibly shows process deviation
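
One common way to automate this kind of inspection is a control chart: derive limits from a baseline period and flag any later measurement that falls outside them. The sketch below uses the simplest variant, the baseline mean plus or minus three population standard deviations; the deck does not say which chart type it has in mind, and the metric (a daily null rate), the function names, and the numbers are all hypothetical.

```python
from statistics import mean, pstdev

def control_limits(baseline, sigmas=3.0):
    """Return (lower, center, upper) control limits from baseline measurements."""
    center = mean(baseline)
    spread = pstdev(baseline)
    return center - sigmas * spread, center, center + sigmas * spread

def out_of_control(baseline, observations, sigmas=3.0):
    """Return (index, value) pairs for observations outside the control limits."""
    lower, _, upper = control_limits(baseline, sigmas)
    return [(i, x) for i, x in enumerate(observations) if x < lower or x > upper]

# Hypothetical daily null rates for one column during a stable period.
baseline_null_rates = [0.020, 0.022, 0.019, 0.021, 0.020, 0.018, 0.021, 0.019]
# Hypothetical null rates from the most recent loads.
new_loads = [0.020, 0.021, 0.045, 0.019]

alerts = out_of_control(baseline_null_rates, new_loads)
print(alerts)
```

Plotting the observations against the three limit lines gives the visible deviation the slide refers to; the list of alerts is the part that can be automated.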
  8. Data Profiling Analysis
     ● Duplication
     ● Pattern matching
     ● Boolean/String/Number
     ● Date Gap
     ● Date/time
     ● Day of Week
     ● Character Set
     ● Reference Data Matching
     ● Value Distribution
     ● Inter-Data Set Comparisons
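
Three of the analyses listed on the Data Profiling Analysis slide (duplication, pattern matching, and date gap) are easy to illustrate with the Python standard library. This is a minimal sketch under stated assumptions: the helper names, the `A`/`9` pattern notation, and the sample data are hypothetical and not taken from the deck.

```python
import re
from collections import Counter
from datetime import date, timedelta

def find_duplicates(values):
    """Return the sorted values that occur more than once."""
    return sorted(v for v, n in Counter(values).items() if n > 1)

def pattern_of(value):
    """Map digits to '9' and ASCII letters to 'A', leaving other characters as-is."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

def pattern_distribution(values):
    """Count how many values share each character pattern."""
    return Counter(pattern_of(v) for v in values)

def date_gaps(dates):
    """Return the calendar days missing between the earliest and latest date."""
    present = set(dates)
    if not present:
        return []
    day, last = min(present), max(present)
    missing = []
    while day <= last:
        if day not in present:
            missing.append(day)
        day += timedelta(days=1)
    return missing

# Hypothetical phone numbers and load dates.
phones = ["919-555-0100", "9195550101", "919-555-0100", "555-0102"]
load_dates = [date(2013, 1, 1), date(2013, 1, 2), date(2013, 1, 4), date(2013, 1, 5)]

print(find_duplicates(phones))
print(pattern_distribution(phones))
print(date_gaps(load_dates))
```

A pattern distribution with several competing formats for the same field is often the first hint that values need to be standardized.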
  9. Master Data Management
     ● Create a gold standard for data
     ● Distribute data so that all sources are uniform
       ● Names
       ● Addresses
       ● Phone Numbers
       ● Products
     ● Can hook into third party sources
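
The gold standard idea on the Master Data Management slide can be illustrated with a toy example: standardize each source value, then merge records that share the same standardized key. Everything here is an assumption made for illustration, including matching on phone number alone, the "first non-empty name wins" rule, the US ten-digit phone format, and the function names. Real MDM tools use far richer matching and survivorship rules.

```python
import re

def standardize_phone(raw):
    """Reduce a US phone number to 10 digits, or return None if it does not fit."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    return digits if len(digits) == 10 else None

def standardize_name(raw):
    """Collapse repeated whitespace and apply title case."""
    return " ".join(raw.split()).title()

def build_gold_records(source_records):
    """Merge records sharing a standardized phone; the first non-empty name wins."""
    gold = {}
    for rec in source_records:
        key = standardize_phone(rec["phone"])
        if key is None:
            continue  # cannot be matched in this sketch; a real system would queue it for review
        merged = gold.setdefault(key, {"phone": key, "name": None})
        if merged["name"] is None and rec.get("name"):
            merged["name"] = standardize_name(rec["name"])
    return gold

# Hypothetical records from three source systems.
records = [
    {"name": "jane   doe", "phone": "(919) 555-0100"},
    {"name": "", "phone": "1-919-555-0100"},
    {"name": "Pat Roe", "phone": "555-0101"},
]

gold = build_gold_records(records)
print(gold)
```

The resulting gold records are what would be distributed back to the source systems so that each one holds the same name and phone format.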
  10. Data Governance Program
      ● Central authority for data quality control
      ● Applies information collected from data profiling, MDM, etc. uniformly across the business
      ● Communication channels between business and IT groups
  11. Questions?
