1. DWH-Ahsan Abdullah
1
Data Warehousing
Lecture-21
Introduction to Data Quality Management (DQM)
Virtual University of Pakistan
Ahsan Abdullah
Assoc. Prof. & Head
Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: ahsan101@yahoo.com
3. DWH-Ahsan Abdullah
3
What is Quality? Informally
Some things are better than others, i.e. they are of higher quality. How much “better” is better?
Is the right item the best item to purchase? How about after the purchase?
What is quality of service? The bank example
4. DWH-Ahsan Abdullah
4
What is Quality? Formally
“Quality is conformance to requirements”
P. Crosby, “Quality is Free” 1979
“Degree of excellence”
Webster’s Third New International Dictionary
5. DWH-Ahsan Abdullah
5
What is Quality? Examples from Auto Industry
Quality means meeting customer’s needs,
not necessarily exceeding them.
Quality means improving things customers
care about, because that makes their lives
easier and more comfortable.
Why an example from the auto industry?
6. DWH-Ahsan Abdullah
6
What is Data Quality?
Muhammad Khan
Height = 5’8”
Weight = 160 lbs
Gender = Male
Age = 35 yrs
Emp_ID = 440
All data is an abstraction of something real
What is Data?
7. DWH-Ahsan Abdullah
7
What is Data Quality?
Intrinsic Data Quality
Electronic reproduction of reality.
Realistic Data Quality
Degree of utility or value of data to business.
8. DWH-Ahsan Abdullah
8
Data Quality & Organizations
The intelligent learning organization:
High-quality data is an open, shared resource with value-adding processes.
The dysfunctional learning organization:
Low-quality data is a proprietary resource with cost-adding processes.
9. DWH-Ahsan Abdullah
9
Orr’s Laws of Data Quality
Law #1 - “Data that is not used cannot be correct!”
Law #2 - “Data quality is a function of its use, not its collection!”
Law #3 - “Data will be no better than its most stringent use!”
Law #4 - “Data quality problems increase with the age of the system!”
Law #5 - “The less likely something is to occur, the more traumatic it will be when it happens!”
10. DWH-Ahsan Abdullah
10
Total Quality Management (TQM)
A philosophy of involving everyone in systematic and continuous improvement.
It is customer oriented. Why?
TQM incorporates the concepts of product quality, process control, quality assurance, and quality improvement.
Quality assurance is NOT quality improvement.
11. DWH-Ahsan Abdullah
11
Co$t of fixing data quality
[Figure: cost of achieving quality (vertical axis) against quality level, from lowest to highest (horizontal axis); the cost rises exponentially toward the highest quality.]
Defect minimization is economical.
Defect elimination is very, very expensive.
12. DWH-Ahsan Abdullah
12
Co$t of Data Quality Defects
Controllable Costs
Recurring costs for analyzing, correcting, and preventing data errors.
Resultant Costs
Internal and external failure costs of business opportunities missed.
Equipment & Training Costs
13. DWH-Ahsan Abdullah
13
Where is data quality critical?
Almost everywhere. Some examples:
Marketing communications.
Customer matching.
Retail house-holding.
Combining MIS systems after acquisition.
14. DWH-Ahsan Abdullah
14
Characteristics or Dimensions of Data Quality
Data Quality Characteristic   Definition
Accuracy                      A qualitative assessment of the lack of error; high accuracy corresponds to small error.
Completeness                  The degree to which values are present in the attributes that require them.
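The two definitions above can be turned into simple ratios. The following is a minimal sketch, not a standard method: the sample records, the `REQUIRED` attribute list, and the idea of scoring accuracy against a trusted reference are all assumptions made for illustration.

```python
# Hypothetical employee records; None marks a missing value.
records = [
    {"emp_id": 440, "age": 35, "gender": "Male"},
    {"emp_id": 441, "age": None, "gender": "Female"},
    {"emp_id": 442, "age": 29, "gender": None},
    {"emp_id": 443, "age": 51, "gender": "Male"},
]

# Attributes assumed to require a value in every record.
REQUIRED = ("emp_id", "age", "gender")


def completeness(rows, required):
    """Fraction of required attribute slots that actually hold a value."""
    total = len(rows) * len(required)
    if total == 0:
        return 1.0  # nothing is required, so nothing is missing
    present = sum(
        1 for row in rows for attr in required if row.get(attr) is not None
    )
    return present / total


def accuracy(rows, reference, key, attr):
    """Fraction of non-missing `attr` values that match a trusted reference.

    `reference` maps a key value to the value assumed to be true.
    Returns None when no value can be checked.
    """
    checked = [
        row for row in rows
        if row.get(attr) is not None and row[key] in reference
    ]
    if not checked:
        return None
    correct = sum(1 for row in checked if row[attr] == reference[row[key]])
    return correct / len(checked)
```

Missing values are deliberately excluded from the accuracy score, so the two measures stay independent: a value that is absent lowers completeness, and a value that is present but wrong lowers accuracy.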
15. DWH-Ahsan Abdullah
15
Completeness Vs Accuracy
95% accurate and 100% complete
OR
100% accurate and 95% complete
Which is better?
Depends on (i) the data quality tolerances, (ii) the corresponding application, and (iii) the cost of achieving that data quality vs. (iv) the business value.
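One way to see why the answer depends on the application is to attach a cost to each kind of defect. The sketch below is illustrative only: the function `scenario_cost` and every cost figure are invented assumptions, not values from the lecture.

```python
def scenario_cost(n_records, accuracy_pct, completeness_pct,
                  cost_per_wrong_value, cost_per_missing_value):
    """Rough defect cost for one required attribute across n_records.

    Percentages are whole numbers (e.g. 95) so the arithmetic stays exact.
    """
    present = n_records * completeness_pct // 100
    missing = n_records - present
    wrong = present * (100 - accuracy_pct) // 100
    return wrong * cost_per_wrong_value + missing * cost_per_missing_value


# Hypothetical application 1: a wrong value is costly (e.g. a bad billing
# address), while a missing value is cheap to detect and handle.
a1 = scenario_cost(1000, 95, 100, cost_per_wrong_value=20, cost_per_missing_value=5)
b1 = scenario_cost(1000, 100, 95, cost_per_wrong_value=20, cost_per_missing_value=5)

# Hypothetical application 2: a missing value blocks processing, while a
# slightly wrong value is tolerable.
a2 = scenario_cost(1000, 95, 100, cost_per_wrong_value=5, cost_per_missing_value=20)
b2 = scenario_cost(1000, 100, 95, cost_per_wrong_value=5, cost_per_missing_value=20)
```

Under the first set of assumed costs, the 100% accurate and 95% complete data is cheaper; under the second, the 95% accurate and 100% complete data wins. The same two data sets rank differently once the application changes.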
16. DWH-Ahsan Abdullah
16
Characteristics or Dimensions of Data Quality
Data Quality Characteristic   Definition
Consistency                   A measure of the degree to which a set of data satisfies a set of constraints.
Timeliness                    A measure of how current or up to date the data is.
Uniqueness                    The state of being only one of its kind or being without an equal or parallel.
Interpretability              The extent to which data is in appropriate languages, symbols, and units, and the definitions are clear.
Accessibility                 The extent to which data is available, or easily and quickly retrievable.
Objectivity                   The extent to which data is unbiased, unprejudiced, and impartial.
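Consistency, uniqueness, and timeliness lend themselves to mechanical checks. The following is a minimal sketch under stated assumptions: the sample rows, the two constraints, and the 90-day freshness threshold are hypothetical and only illustrate the definitions in the table.

```python
from datetime import date

# Hypothetical rows with a last-updated date; the third row repeats emp_id 440.
rows = [
    {"emp_id": 440, "age": 35, "gender": "Male", "updated": date(2024, 1, 10)},
    {"emp_id": 441, "age": -4, "gender": "Female", "updated": date(2023, 6, 1)},
    {"emp_id": 440, "age": 35, "gender": "Male", "updated": date(2024, 1, 10)},
]

# Assumed constraints that every row should satisfy.
CONSTRAINTS = {
    "age between 0 and 120": lambda r: 0 <= r["age"] <= 120,
    "gender in allowed domain": lambda r: r["gender"] in {"Male", "Female"},
}


def consistency(data, constraints):
    """Fraction of rows that satisfy every constraint."""
    if not data:
        return 1.0
    ok = sum(1 for r in data if all(check(r) for check in constraints.values()))
    return ok / len(data)


def duplicate_keys(data, key):
    """Key values that appear more than once, i.e. uniqueness violations."""
    seen, dupes = set(), set()
    for r in data:
        value = r[key]
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return dupes


def stale_rows(data, as_of, max_age_days):
    """Rows last updated more than max_age_days before as_of."""
    return [r for r in data if (as_of - r["updated"]).days > max_age_days]
```

For example, `duplicate_keys(rows, "emp_id")` flags the repeated employee, and `stale_rows(rows, date(2024, 2, 1), 90)` returns the row last updated in June 2023. Interpretability, accessibility, and objectivity are harder to score automatically and usually need human review.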