This presentation discusses data quality and how to ensure high-quality data. It defines data quality as information that is validated, precise, complete, integrated, up-to-date and relevant. It then outlines the data life cycle and the types of data errors that can occur at each stage, from design to visualization. Key dimensions of data quality are also defined, including validity, reliability, precision and timeliness. Finally, it proposes several strategies for enhancing data quality, such as regular data quality checks, assessments and audits, real-time computerized tracking, and establishing data management systems and processes.
Presentation on Dealing with Data Quality, Sushanta, MEAL Part-2 Training, 28 September 2021
1. Presentation on Dealing with Data Quality
Sushanta Kumar Sarker, Senior Monitoring and Evaluation Specialist
FAO, Cox’s Bazar, 28 September 2021
2. What is Data Quality?
Data quality refers to the state of qualitative or quantitative pieces of information which are validated, precise, complete, integrated, up-to-date, relevant and fit for intended uses in operations, decision making and planning.
Data Life Cycle
• Design and plan
• Collect and capture
• Data processing and ensure quality
• Data storage and management
• Analysis and interpretation
• Visualization and share
• Retrieve and reuse
Sushanta
5. Stages of Data Errors in the Cycle
Stage: Errors
Design stage
• Sampling error
• Relevancy error
• Instrument design error
Data collection stage
• Sample selection error, non-sampling error and administrative error
Data entry
• Transcription errors, transposition errors
• Unit/representation inconsistencies
• Incorrect data formatting
Data analysis
• Correlation vs. causation
• Not looking beyond numbers
• Not using disaggregated data
• Sample bias and solution bias
• Wrong selection
Data visualization
• Visualization tools not suited to the audience
• Presentation of misleading or bad data
• Inconsistent scale across the data presented
• Visually cluttered graph
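Two of the data entry errors listed above, transposition errors and unit inconsistencies, can be screened for automatically. The following is a minimal sketch, not a method from the presentation: the function names, the double-entry comparison and the unit table are illustrative assumptions, and it assumes non-negative whole numbers.

```python
def looks_like_transposition(entered: int, expected: int) -> bool:
    """Return True if `entered` is `expected` with two adjacent digits swapped.

    Assumes both values are non-negative integers (e.g. from double data entry).
    """
    a, b = str(entered), str(expected)
    if len(a) != len(b) or a == b:
        return False
    # Positions where the two digit strings disagree.
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )


# Hypothetical unit table; a real survey would define its own units.
KG_PER_UNIT = {"kg": 1.0, "g": 0.001, "tonne": 1000.0}


def to_kilograms(value: float, unit: str) -> float:
    """Convert a weight to kilograms so all records share one unit."""
    return value * KG_PER_UNIT[unit.strip().lower()]
```

For example, comparing a second entry of 1532 against a first entry of 1352 would be flagged as a likely transposition, while 1532 against 1533 would not.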
6. Error in data visualizations
https://sranalytics.io/blog/bad-data-visualization-examples/
7. Data Quality Dimensions
Validity: Valid data are considered accurate: they measure what they are intended to measure. External validation is face validity.
Timeliness: Data are up to date and information is available on time.
Uniqueness: Multiple sources of data show the same values/unique values.
Reliability: Data should reflect stable and consistent data collection processes and analysis methods over time.
Integrity: Data that are collected, analyzed and reported have established mechanisms in place to reduce manipulation.
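Some of these dimensions can be turned into simple numeric checks on a dataset. The sketch below is one possible way to do this, not something prescribed by the presentation: the record layout, field names (`id`, `village`, `submitted`) and the deadline are hypothetical.

```python
from datetime import date


def completeness(records, required_fields):
    """Share of records in which every required field is filled in."""
    if not records:
        return 0.0
    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields) for r in records
    )
    return complete / len(records)


def duplicate_ids(records, key="id"):
    """Return the set of identifiers that appear more than once."""
    seen, dupes = set(), set()
    for r in records:
        k = r.get(key)
        if k in seen:
            dupes.add(k)
        seen.add(k)
    return dupes


def timeliness(records, deadline, date_field="submitted"):
    """Share of records submitted on or before the deadline."""
    if not records:
        return 0.0
    on_time = sum(
        1 for r in records
        if r.get(date_field) is not None and r[date_field] <= deadline
    )
    return on_time / len(records)


# Hypothetical sample records for illustration only.
records = [
    {"id": "HH-001", "village": "A", "submitted": date(2021, 9, 5)},
    {"id": "HH-002", "village": "", "submitted": date(2021, 9, 12)},
    {"id": "HH-001", "village": "B", "submitted": date(2021, 9, 8)},
    {"id": "HH-003", "village": "C", "submitted": None},
]
```

With these sample records, half of the entries are complete, the identifier HH-001 is duplicated, and half of the entries meet a deadline of 10 September 2021.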
10. Framework for Enhancing Data Quality
Data Management System: Data Management Processes / Procedures
Data Quality System: Data Quality Processes / Procedures
Auditable System: Document! Paper trail that allows verification of the entire DMS and the data produced within it
Risk
Verification
Source
Validity, Reliability, Completeness, Precision, Timeliness, Integrity
Collection, Collation, Analysis, Reporting, Use
www.measureevaluation.org
11. Data Quality Assurance Integrated Strategies
Strategy 1: Regular Data Quality Checking
• Direct observation
• Spot check, surprise visits
• Activities indicators
• Input indicators
• Cross check
Strategy 2: Assessment and Audit
• Data Quality Review (DQR)
• Data Quality Audit (DQA)
Strategy 3: Real-Time and Computerized
• GPS tracking, real-time picture
• Mobile message, online group response
• Data validation, cleaning, etc.
Strategy 4: Routine Data Assessment
• Routine data assessment
• Expedited data quality assessment
• Performance routine assessment
Validity
Integrity
Timeliness
Completeness
Precision
Reliability
Uniqueness
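A cross check between reported figures and a recount from source documents can be expressed as a ratio. The sketch below uses a verification factor (recounted value divided by reported value), an indicator commonly used in data quality audits; the presentation itself does not specify a formula. The site names, the 10% tolerance and the function names are assumptions for illustration.

```python
def verification_factor(recounted: float, reported: float) -> float:
    """Ratio of the value recounted from source documents to the reported value."""
    if reported == 0:
        raise ValueError("reported value must be non-zero")
    return recounted / reported


def cross_check(reported: dict, recounted: dict, tolerance: float = 0.10) -> dict:
    """Flag sites whose recount is missing or deviates beyond the tolerance."""
    flagged = {}
    for site, reported_value in reported.items():
        if site not in recounted:
            flagged[site] = "no source document recount"
            continue
        vf = verification_factor(recounted[site], reported_value)
        # A factor of 1.0 means the report matches the source documents exactly.
        if abs(vf - 1.0) > tolerance:
            flagged[site] = f"verification factor {vf:.2f}"
    return flagged


# Hypothetical figures for illustration only.
reported = {"Site A": 120, "Site B": 200, "Site C": 80}
recounted = {"Site A": 118, "Site B": 150}
flags = cross_check(reported, recounted)
```

Here Site A falls within the tolerance, Site B is flagged with a factor of 0.75 (possible over-reporting), and Site C is flagged because no recount is available.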
15. Group work and presentations
• Regular Data Quality Check
• Data Quality Assessment and Data Quality Audit
• Regular Data Quality Assessment
• Real-time and Computerized Data Quality