This document discusses data quality and provides guidance on assessing data quality. It defines data quality as anything that affects data's ability to accurately reflect reality. There are several dimensions of data quality including consistency, accuracy, completeness, auditability, orderliness, uniqueness, and timeliness. Common sources of poor data quality relate to issues in data collection, analysis, and reporting. The document concludes by outlining a process for assessing data quality which involves identifying critical data items, evaluating quality dimensions, applying assessment criteria, analyzing results, identifying sources of issues, and taking corrective actions.
Data Quality from Concept to Report
1. Data Quality: From Concept to Report
Birhan Abdulkadir, ILRI
Training of Trainers on Multi-Stakeholder Platform Facilitation,
Gender and Data Management, ILRI, Addis Ababa, 20-21
November 2019
2. Data Quality
Anything that alters/changes the ability of data to reflect the ‘truth’.
A perception or an assessment of data’s fitness to serve its purpose in a given context.
A measure, or set of measures, that gives an organization an indication of the level of
confidence it can have in the data that is used in its operational and strategic
decision-making process.
Good data is our most valuable asset, and bad data
can seriously harm our business and credibility…
3. Dimensions of Data Quality Checks
Columns: Dimension, What it means, Example of good practice, Example of bad practice, Metric

Consistency
What it means: No matter where you look in the database, you will not find any contradiction in your data.
Good practice: The AR beneficiaries list shows farmer Bekelech has tested 5 technologies, and the progress of all 5 technologies is captured.
Bad practice: The AR beneficiaries list shows farmer Bekelech has tested 5 technologies, but the final report shows only 2 technologies.
Metric: The number of inconsistencies.

Accuracy
What it means: The information your data contains corresponds to reality.
Good practice: The farmer’s name is Abeba Kebede, and this is exactly how it is reflected in your database.
Bad practice: The farmer’s name is spelled Ababa Kebede in your Excel file.
Metric: The ratio of data to errors.

Completeness
What it means: All available elements of the data have found their way to the database.
Good practice: You know that farmer “X” was born on 11/03/1975.
Bad practice: You have no idea how old farmer “X” is, as the date of birth cell is empty.
Metric: The number of missing values.

Auditability
What it means: Data is accessible and it is possible to trace introduced changes.
Good practice: You can track down changes made in farmer “X” data record, e.g. on 12/6/2019 his phone number was changed.
Bad practice: It is impossible to trace the changes in farmer “X” data record.
Metric: % of cells where the metadata about introduced changes is not accessible.

Orderliness
What it means: The data entered has the required format and structure.
Good practice: The entry for November 12, 2019 is in the format 11/12/2019.
Bad practice: 12/11/2019, 12/11/19
Metric: The ratio of data in an inappropriate format.

Uniqueness
What it means: A data record with specific details appears only once in the database.
Good practice: Only one date of birth is recorded for a given farmer living in Sinana.
Bad practice: You have multiple duplicate records for the farmer.
Metric: The number of duplicates revealed.

Timeliness
What it means: Data represents reality within a reasonable period of time.
Metric: The number of records with delayed changes.
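Three of the metrics above (missing values, duplicates, and values in an inappropriate format) are simple counts, so they can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (`farmer_id`, `name`, `date_of_birth`) and the required MM/DD/YYYY pattern are hypothetical and not taken from any Africa RISING dataset.

```python
import re
from collections import Counter

# Hypothetical records for illustration only; field names are assumptions.
records = [
    {"farmer_id": "F001", "name": "Abeba Kebede", "date_of_birth": "11/03/1975"},
    {"farmer_id": "F002", "name": "Farmer X", "date_of_birth": ""},
    {"farmer_id": "F003", "name": "Farmer Y", "date_of_birth": "12/11/19"},
    {"farmer_id": "F001", "name": "Abeba Kebede", "date_of_birth": "11/03/1975"},
]

# Assumed required format: two digits / two digits / four digits.
DATE_FORMAT = re.compile(r"\d{2}/\d{2}/\d{4}")


def count_missing(rows, field):
    """Completeness: number of rows where the field is empty or absent."""
    return sum(1 for row in rows if not (row.get(field) or "").strip())


def count_duplicates(rows, key):
    """Uniqueness: number of surplus rows that repeat an existing key value."""
    counts = Counter(row[key] for row in rows)
    return sum(n - 1 for n in counts.values() if n > 1)


def format_error_ratio(rows, field):
    """Orderliness: share of non-empty values that break the required format."""
    values = [row[field] for row in rows if (row.get(field) or "").strip()]
    if not values:
        return 0.0
    bad = sum(1 for value in values if not DATE_FORMAT.fullmatch(value))
    return bad / len(values)


missing = count_missing(records, "date_of_birth")
duplicates = count_duplicates(records, "farmer_id")
bad_format = format_error_ratio(records, "date_of_birth")
print(missing, duplicates, round(bad_format, 2))
```

A pattern check of this kind can only catch values with the wrong shape, such as 12/11/19. It cannot tell whether 12/11/2019 was meant as December 11 or November 12, so that part of orderliness still depends on a clear instruction to enumerators.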
4. Sources of Data Quality Issues
Plan
Design
Instruments
Collection
Discovery
Analysis
Reporting
Planning
• Determine data gaps prior to data collection
• Choices around which outcomes/indicators to
measure?
• Resource needs?
Design – A Potential Death Zone!
• Choosing Quantity over Quality
• Qualitative vs. Quantitative
• Sampling Frame/ Selection Bias
• Sampling Strategy – clustering/stratification/
aggregation
• Sample Size and Precision
• Beneficiary list
Instruments
• Instrument design
• Logic control (skip rules, bounds, do loops, etc.)
• Wording/vocabulary
• Units
• Recall
• Question format – yes/no, multiple response, etc.
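The logic control item under Instruments (skip rules and bounds) can be illustrated with a small validation sketch. The question names, the limits and the fertilizer skip rule below are hypothetical and chosen only to show the idea; a real instrument would define its own.

```python
# Hypothetical question names and limits, chosen only for illustration.
BOUNDS = {"household_size": (1, 30), "plot_area_ha": (0.0, 50.0)}


def check_response(response):
    """Return a list of problems found in one questionnaire response."""
    problems = []

    # Bounds: numeric answers must fall inside the plausible range.
    for question, (low, high) in BOUNDS.items():
        value = response.get(question)
        if value is not None and not (low <= value <= high):
            problems.append(f"{question}={value} outside [{low}, {high}]")

    # Skip rule: the fertilizer amount is only asked if fertilizer was used.
    used = response.get("used_fertilizer")
    amount = response.get("fertilizer_kg")
    if used == "no" and amount is not None:
        problems.append("fertilizer_kg answered although used_fertilizer is 'no'")
    if used == "yes" and amount is None:
        problems.append("fertilizer_kg missing although used_fertilizer is 'yes'")

    return problems


ok = check_response({"household_size": 6, "plot_area_ha": 1.5,
                     "used_fertilizer": "yes", "fertilizer_kg": 50})
bad = check_response({"household_size": 0, "plot_area_ha": 1.5,
                      "used_fertilizer": "no", "fertilizer_kg": 20})
print(ok)
print(bad)
```

Running checks like these at entry time, rather than during cleaning, means the enumerator can still ask the respondent to confirm the answer.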
5. Sources of Data Quality Issues
Data Collection Methods
• Measurement error
• Respondent error
• Enumerator error
• Real-time enumerator monitoring
• Timing
• Degree of difficulty
• Logistics
Discovery
• Make data accessible
• Don’t rely on human memory
• Metadata: data about data
Analysis
• Analytic skills
• Missing values versus zeros
• Appropriate tests
• Trying to Do It Alone
Reporting
• Focus on specific indicators
• Biased narration
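The "missing values versus zeros" item under Analysis can be made concrete with a short sketch. The yield figures are invented for illustration; `None` stands for "no data" and `0.0` for a real zero.

```python
from statistics import mean

# Hypothetical yields; None marks "no data", 0.0 marks a real zero harvest.
yields = [12.0, None, 0.0, 8.0, None]

# Keep real zeros, but leave missing values out of the calculation.
observed = [y for y in yields if y is not None]
mean_observed = mean(observed)

# Treating every blank as zero drags the average down.
blanks_as_zero = [0.0 if y is None else y for y in yields]
mean_wrong = mean(blanks_as_zero)

print(mean_observed, mean_wrong)
```

The two averages differ only because of how the blanks were handled, which is why the choice needs to be made deliberately and recorded in the metadata.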
6. Data anomaly
Zero vs. blank
Zero is a real number. Do not put a zero
when you mean a blank or no data.
Changes in scale / format
Dollars vs. Birr
Missing and default values
Application programs do not handle NULL
values well …
Changes in data layout / data types
Integer becomes string, fields swap
positions, etc.
Farmer ID   Gender   Planting Date
Farm01      M        Jul-19
Farm02      female   7/20/2019
Farm03      1        14/7/2019
Farm04      male     2019
Farm05      0        Aug-19
Farm06               3/1/2019
Farm07      F        1/14/201
Farm08      female   2-Jan-19
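The irregularities in the sample table can be flagged automatically. The sketch below is a minimal example, assuming that the blank cell for Farm06 is the gender and that the required date format is MM/DD/YYYY; neither is stated in the slides. The meaning of the codes 0 and 1 is also not given, so they are flagged for review rather than guessed.

```python
from datetime import datetime

# Rows copied from the sample table; an empty string stands for a blank cell.
rows = [
    ("Farm01", "M", "Jul-19"),
    ("Farm02", "female", "7/20/2019"),
    ("Farm03", "1", "14/7/2019"),
    ("Farm04", "male", "2019"),
    ("Farm05", "0", "Aug-19"),
    ("Farm06", "", "3/1/2019"),
    ("Farm07", "F", "1/14/201"),
    ("Farm08", "female", "2-Jan-19"),
]

# Assumed mapping; numeric codes are deliberately left out because the
# source does not say which gender 0 and 1 stand for.
GENDER_MAP = {"m": "male", "male": "male", "f": "female", "female": "female"}


def normalize_gender(raw):
    """Return 'male', 'female', or None when the value is blank or ambiguous."""
    return GENDER_MAP.get(raw.strip().lower())


def is_full_date(raw, fmt="%m/%d/%Y"):
    """True only if the value is a complete date in the assumed format."""
    try:
        datetime.strptime(raw, fmt)
        return True
    except ValueError:
        return False


gender_review = [fid for fid, gender, _ in rows if normalize_gender(gender) is None]
date_review = [fid for fid, _, date in rows if not is_full_date(date)]
print(gender_review)
print(date_review)
```

A value such as 3/1/2019 passes the check even though it could have been entered as day/month, so a format check narrows the list for manual review but does not replace it.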
7. Summary: Data Quality Assessment Process
• Identify which data items need to be assessed for data quality, e.g. whether the data is critical to project results (related directly to project indicators).
• Evaluate which data quality dimensions (e.g. completeness) to use and their related weighting (assign weights, e.g. 100% completeness).
• For each data quality dimension, define values or ranges representing excellent, good or bad quality data based on the weightings (e.g. 90% completeness means excellent data).
• Apply the assessment criteria to the data.
• Analyze the results and determine whether data quality is acceptable.
• Identify, in relation to the dimension, the possible source of the data quality issue.
• Take corrective actions, e.g. clean the data (this should be based on the sources of data quality issues), and improve data handling processes to prevent future recurrence.
• Repeat the above on a periodic basis to monitor trends in data quality.
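The middle steps of this process (define ranges, apply the criteria, analyze the result) can be sketched for the completeness dimension. The slides only state that 90% completeness means excellent data; the 75% cut-off for "good", the field names and the records below are assumptions made for illustration.

```python
# Assumed thresholds: only the 90% "excellent" level comes from the slides;
# the 75% "good" cut-off is a hypothetical value.
THRESHOLDS = [(0.90, "excellent"), (0.75, "good")]

# Hypothetical critical data items and records.
CRITICAL_FIELDS = ["date_of_birth", "plot_area_ha"]
records = [
    {"date_of_birth": "11/03/1975", "plot_area_ha": 0},
    {"date_of_birth": "", "plot_area_ha": 1.5},
    {"date_of_birth": "05/21/1980", "plot_area_ha": 2.0},
    {"date_of_birth": "01/02/1990", "plot_area_ha": 3.0},
    {"date_of_birth": "09/30/1968", "plot_area_ha": 0.5},
]


def is_present(value):
    """A value counts as present unless it is None or blank; zero is present."""
    return value is not None and str(value).strip() != ""


def completeness(rows, field):
    """Share of rows that have a value for the field."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if is_present(row.get(field))) / len(rows)


def rate(score):
    """Map a completeness score to excellent, good or bad."""
    for cutoff, label in THRESHOLDS:
        if score >= cutoff:
            return label
    return "bad"


report = {}
for field in CRITICAL_FIELDS:
    score = completeness(records, field)
    report[field] = (score, rate(score))

for field, (score, label) in report.items():
    print(f"{field}: {score:.0%} complete, rated {label}")
```

Rerunning the same script on each new data delivery gives the periodic trend mentioned in the last step, provided the thresholds are kept fixed between runs.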
“It is better to be roughly right than precisely wrong.”
8. Wachemo University Mekelle University Madda Walabu University Debre Birhan University Hawassa University
Amhara Region Agricultural Research Institute (ARARI)
South Agricultural Research Institute (SARI)
Tigray Agricultural Research Institute (TARI)
Oromia Agricultural Research Institute (OARI)
Ethiopian Institute of Agricultural Research (EIAR)
Fuji integrated Farm
Hundie
REST-GRAD Sunarma SOS Sahel Ethiopia
Ethiopian Agricultural Transformation Agency (ATA)
Offices of Agriculture: Endamekoni (Tigray) Basona Worena (Amhara) Lemo (SNNRP) Sinana (Oromia)
Innovation laboratories: SIIL ILSSI PHIL LSIL
Africa RISING
Local Partners
(Phase I)
10. Africa Research in Sustainable Intensification for the Next Generation
africa-rising.net
This presentation is licensed for use under the Creative Commons Attribution 4.0 International Licence.
Editor's Notes
How well our M&E data “tell the true story.”
Why does data quality matter?
Data Quality Dimension: Measurement or assessment of records, datasets, databases, etc. in order to understand the quality of data.
Completeness = Is all the relevant data available?
Consistency = Is data consistent throughout (always male/female, m/f, 0/1, true/false)?
Validity = Does the data fall within accepted domains?
Accuracy = How accurate is the data? If we are measuring temperature, to what level is temperature measured, and with what variance or margin of error?
Conformity = Does it conform to the accepted business rules?
Duplicates = Is the value duplicated? If so, what represents the true value (2 customers but with different addresses)?
Multifaceted nature:
Potential problems await at all stages of the process (from design/planning of project to reporting)
How Good is Your Data?
Does the data reflect current reality?
Does the data mean what you think it does?
Make data easily accessible and shared
How many of the participants use MS Excel?
Irregularities