National Livestock Data Use Training
Addis Ababa, Ethiopia
1st Batch: November 7-9, 2023
2nd Batch: January 15-17, 2024
Data Quality
Data Quality Issues
Introduction
1. Collection of Data
2. Organization of Data
3. Presentation of Data
4. Analysis of Data
5. Interpretation of Data / Inference
DATA
What is it?
5,677.00 1,552.31 1,552.31 0.00 1,552.31 5,677.00 1,552.31 1,552.31 0.00 1,552.31
4,894.00 1,338.21 1,338.21 1,338.21 4,894.00 1,338.21 1,338.21 1,338.21
325.00 88.87 88.87 88.87 325.00 88.87 88.87 88.87
458.00 125.24 125.24 125.24 458.00 125.24 125.24 125.24
7,378.00 2,017.43 2,017.43 0.00 2,017.43 7,378.00 2,017.43 2,017.43 0.00 2,017.43
4,577.00 1,251.53 1,251.53 1,251.53 4,577.00 1,251.53 1,251.53 1,251.53
2,801.00 765.90 765.90 765.90 2,801.00 765.90 765.90 765.90
2,476.00 677.04 677.04 0.00 677.04 2,476.00 677.04 677.04 0.00 677.04
2,476.00 677.04 677.04 677.04 2,476.00 677.04 677.04 677.04
2,570.00 702.74 702.74 0.00 702.74 2,570.00 702.74 702.74 0.00 702.74
1,457.00 398.40 398.40 398.40 1,457.00 398.40 398.40 398.40
651.00 178.01 178.01 178.01 651.00 178.01 178.01 178.01
462.00 126.33 126.33 126.33 462.00 126.33 126.33 126.33
2,151.00 588.17 588.17 0.00 588.17 2,151.00 588.17 588.17 0.00 588.17
DATA
o Data are raw facts or other findings which, by themselves, are of limited value to decision makers.
o Data refer to any kind of information that can be collected, stored, and processed.
Information
o Information is the result of organizing, processing, and interpreting data in a way that puts them into context, uncovers patterns or problem areas, and thus transforms data into facts that are useful to decision makers.
Data → Processing → Information
o Data: raw facts (e.g. 100 participants)
o Processing: aggregate (average, trends)
o Information: human sensible, concise, accurate, timely
Creation
• Manual data entry
• External acquisition
• Capture from device
Storage
• Security
• Backup and recovery
Usage
• Data viewing, processing, modification and saving
• Data sharing
Archival
• Data archived and protected
Destruction
• Purging
1. Data Creation
• The first phase of the data lifecycle is the creation or capture of data from different forms, e.g. PDF, image, Word, database.
The three types of data capturing methods:
• Data Acquisition: acquiring already existing data which has been produced outside the organization
• Data Entry: manual entry of new data by personnel within the organization
• Data Capture: capture of data generated by devices used in various processes in the organization
2. Storage
Once data has been created within the organization, it needs to be stored
and protected, with the appropriate level of security applied. A robust
backup and recovery process should also be implemented to ensure
retention of data during the lifecycle.
3. Usage
During the usage phase of the data lifecycle, data is used to support activities in the
organization. Data can be viewed, processed, modified and saved. An audit trail
should be maintained for all critical data to ensure that all modifications to data are
fully traceable.
4. Archival
Data Archival is the copying of data to an environment where it is stored in case it is needed again in an
active production environment, and the removal of this data from all active production environments.
A data archive is simply a place where data is stored, but where no maintenance or general usage
occurs. If necessary, the data can be restored to an environment where it can be used.
Data Lifecycle
5. Destruction
The volume of archived data inevitably grows, and while you may want to save all your data forever, that’s
not feasible. Storage cost and compliance issues exert pressure to destroy data you no longer need. Data
destruction or purging is the removal of every copy of a data item from an organization. It is typically done
What is Data Quality?
Data quality refers to the worth or accuracy of the data or information collected.
Why Data Quality in the Ethiopian Livestock Sector?
● Interventions and programs in the sector are “evidence-based”
● Data quality → Data use
● Accountability
Conceptual Framework of Data Quality
Data management and reporting system
• Service delivery points (the 5 prioritized systems)
• Intermediate aggregation levels (e.g. districts, regions, etc.)
• M&E Unit in the ministry, or country level
Functional components of data management systems needed to ensure data quality
• M&E structures, roles and responsibilities
• Indicator definitions and reporting guidelines
• Data collection and reporting forms/tools
• Data management processes
• Data quality mechanisms
• M&E capacity and system feedback
Dimensions of Data Quality
• Validity, Integrity, Precision, Reliability, Timeliness
Quality Data
Data Quality Dimensions (DQD)
Every organization needs to develop and document its means of checking the quality of its data, so it needs to define the data quality dimensions.
Data Quality Assessment involves checking data against five criteria (VIP-RT):
• Validity
• Integrity
• Precision
• Reliability
• Timeliness
The DQD help us determine areas of poor data quality and point us to potential solutions.
Validity
Do the data adequately represent performance, and are they thus valid?
● The key question for validity is: do these data actually represent what we think they do?
E.g., in animal data recording, an ear tag number has 10 digits, so a 9- or 11-digit number is invalid data.
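The ear tag example can be turned into an automatic check at data entry. The following is a minimal sketch, assuming a valid tag is exactly 10 digits with no letters or separators; the function name and the sample tags are hypothetical and invented for illustration only.

```python
import re

# Assumption: a valid ear tag is exactly 10 digits (0-9), as in the example above.
EAR_TAG_PATTERN = re.compile(r"[0-9]{10}")


def is_valid_ear_tag(tag: str) -> bool:
    """Return True only when the tag consists of exactly 10 digits."""
    return EAR_TAG_PATTERN.fullmatch(tag.strip()) is not None


# Hypothetical tags, not real records.
sample_tags = ["0123456789", "123456789", "12345678901", "12345A7890"]
invalid_tags = [tag for tag in sample_tags if not is_valid_ear_tag(tag)]
print("Invalid ear tags:", invalid_tags)
```

A check like this catches only format errors; a 10-digit tag can still belong to the wrong animal.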
Types of Validity error
● Face Validity: Is there a solid, logical relation between the activity or program and what is being measured? Is the indicator direct?
● Measurement Construct Validity: Are measurement tools / procedures well designed and defensible, or is there potential for:
  ○ Non-sampling errors
  ○ Sampling / representation errors
  ○ Memory errors
  ○ Self-presentation bias
● Transcription Validity: Are transcription / data entry and collation procedures sound, and is the data entered / tallied correctly?
Integrity
Are data free of ‘untruths’ (introduced by either human or technical means, willfully or unconsciously) and do they therefore have integrity?
● Manipulation / Bias - Are mechanisms in place to reduce the possibility that data are manipulated for political or personal reasons?
● Unconscious integrity issues - when organizations offer positive or negative incentives to encourage data collection
● Willful integrity issues - when a person or organization purposefully provides false data
Precision
● Do the data have an acceptable margin of error, and are they thus precise?
● For the data set to be “precise”, we need to ensure that the data have an acceptable margin of error
● The margin of error should be acceptable given the likely management decisions to be affected; consider the consequences of the program or policy decisions based on the data
E.g. recording age of animal using teeth…
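One common way to put a number on the margin of error of an average is z × s / √n, where s is the sample standard deviation and n the number of observations. The sketch below assumes this normal approximation with z = 1.96 (about 95% confidence), which is rough for small samples; the ages and the acceptable margins are hypothetical values chosen only to illustrate the check.

```python
import math
import statistics


def margin_of_error(values, z=1.96):
    """Approximate margin of error of the mean: z * s / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    return z * statistics.stdev(values) / math.sqrt(n)


def is_precise_enough(values, acceptable_margin, z=1.96):
    """Compare the margin of error with what the decision can tolerate."""
    return margin_of_error(values, z) <= acceptable_margin


# Hypothetical ages (months) estimated for animals in one herd.
estimated_ages = [22, 24, 26, 24, 23, 25]
moe = margin_of_error(estimated_ages)
print(f"Margin of error: +/- {moe:.2f} months")
print("Acceptable at 2 months:", is_precise_enough(estimated_ages, 2.0))
print("Acceptable at 1 month:", is_precise_enough(estimated_ages, 1.0))
```

The acceptable margin is a management choice, not a statistical one: the same data can be precise enough for one decision and too imprecise for another.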
Reliability and Timeliness
Timeliness
Are data collected frequently, and are they current and thus timely?
Frequency
• Data are available on a frequent enough
basis to inform management decisions.
• A schedule of data collection, collation, analysis and reporting is in place that meets the country's livestock needs.
Currency
• The data are reported in the most current
timeframe practically available
• The data are reported as soon as possible
after collection
• The date of collection is clearly identified in
reports
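Currency can be checked by measuring the lag between the date of collection and the date of reporting. This is a minimal sketch; the 30-day limit, the function names and the dates are assumptions made for illustration, not an official reporting standard.

```python
from datetime import date


def reporting_lag_days(collected_on: date, reported_on: date) -> int:
    """Number of days between data collection and reporting."""
    return (reported_on - collected_on).days


def is_timely(collected_on: date, reported_on: date, max_lag_days: int = 30) -> bool:
    """A report is timely when it follows collection within the allowed lag."""
    lag = reporting_lag_days(collected_on, reported_on)
    return 0 <= lag <= max_lag_days


# Hypothetical collection and reporting dates.
on_time = is_timely(date(2023, 11, 7), date(2023, 11, 30))
late = is_timely(date(2023, 11, 7), date(2024, 1, 15))
print("Reported within 30 days:", on_time, late)
```

A negative lag (a report dated before collection) is also rejected, since it usually points to a recording error.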
Reliability
● Are data collection processes stable and consistent over time, and thus reliable?
● Consistency: Consistent data collection processes are used from year to year, location to location, data source to data source
● Internal Quality Control: Procedures are in place for periodic review of data collection, maintenance, and processing.
● Transparency: Data collection, cleaning, analysis, reporting, and quality assessment procedures are documented in writing, and data quality problems are clearly described in reports.
Reproducibility vs Validity
● Reproducibility
○ the degree to which a measurement provides the same result each time it is performed on a
given subject or specimen
Data standard document.
● Validity
○ from the Latin Validus - strong
○ the degree to which a measurement truly measures (represents) what it purports to measure
(represent)
Reproducibility vs Validity
● Reproducibility
○ Reliability, Repeatability, Precision, Variability, Dependability, Consistency, Stability
● Validity
○ Accuracy
Relationship Between Reproducibility and Validity
● Good reproducibility, poor validity
● Poor reproducibility, good validity
● Good reproducibility, good validity
● Poor reproducibility, poor validity
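The combinations above can be illustrated with repeated measurements of one animal whose true value is known. In this sketch, bias (mean minus true value) stands in for validity and the standard deviation stands in for reproducibility; the true weight and the readings from the two scales are hypothetical.

```python
import statistics


def bias_and_spread(measurements, true_value):
    """Return (bias, spread): bias reflects validity, spread reflects reproducibility."""
    bias = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return bias, spread


TRUE_WEIGHT_KG = 100  # hypothetical reference weight

# Scale A: readings agree with each other but are all too high
# (good reproducibility, poor validity).
bias_a, spread_a = bias_and_spread([110, 110, 111, 109, 110], TRUE_WEIGHT_KG)

# Scale B: readings scatter widely but centre on the true weight
# (poor reproducibility, good validity).
bias_b, spread_b = bias_and_spread([90, 110, 95, 105, 100], TRUE_WEIGHT_KG)

print(f"Scale A: bias {bias_a:+.1f} kg, spread {spread_a:.1f} kg")
print(f"Scale B: bias {bias_b:+.1f} kg, spread {spread_b:.1f} kg")
```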
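Outliers IMPACT the mean!
Observation:  A    B    C    D    E    F    G    H    I    J
Value:        95   98   100  100  100  102  103  139  150  160
Mean: 114.7 with all ten values; 99.71 with the three highest values (H, I, J) excluded
Median: 101 with all ten values; 100 with H, I, J excluded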
Outlier detection
An outlier is an observation or data point that is significantly
different from the other observations in a dataset.
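The effect of outliers on the mean can be checked by recomputing the mean and the median from the ten values listed above, with and without the highest values. This sketch assumes that the three highest values (139, 150, 160) are the ones treated as outliers.

```python
import statistics

values = [95, 98, 100, 100, 100, 102, 103, 139, 150, 160]

# Assumption: the three highest values are the outliers.
without_outliers = sorted(values)[:7]

mean_all = statistics.mean(values)
median_all = statistics.median(values)
mean_trimmed = statistics.mean(without_outliers)
median_trimmed = statistics.median(without_outliers)

print(f"All values:       mean {mean_all:.2f}, median {median_all}")
print(f"Without outliers: mean {mean_trimmed:.2f}, median {median_trimmed}")
```

The mean moves by about 15 units when the three values are dropped, while the median moves by only 1, which is why the median is the safer summary when outliers are suspected.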
Litres of milk per day per cow (bar chart, vertical axis 0 to 120)
Cow:     A    B    C    D    E    F    G    H    I    J
Litres:  20   40   45   40   30   100  35   25   33   60
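One way to flag the unusual bar in this chart is the modified z-score, which measures each value's distance from the median in units of the median absolute deviation (MAD). The sketch below assumes the ten chart values belong to cows A to J in the order shown, and uses the commonly quoted rule of thumb of flagging scores above 3.5; both the threshold and the function name are choices made for illustration.

```python
import statistics


def modified_z_scores(values):
    """Modified z-score: 0.6745 * (value - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the modified z-score is undefined")
    return [0.6745 * (v - med) / mad for v in values]


# Daily milk per cow, read from the chart (cow-to-value order is assumed).
milk_per_cow = {"A": 20, "B": 40, "C": 45, "D": 40, "E": 30,
                "F": 100, "G": 35, "H": 25, "I": 33, "J": 60}

scores = modified_z_scores(list(milk_per_cow.values()))
THRESHOLD = 3.5  # rule-of-thumb cut-off, an assumption for this sketch
outliers = [cow for cow, score in zip(milk_per_cow, scores) if abs(score) > THRESHOLD]
print("Possible outliers:", outliers)
```

A flagged value is a prompt to go back and verify the record, not proof that it is wrong.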
Sources of Measurement Variability
● Observer
■ within-observer
■ between-observer
● Instrument
■ within-instrument
■ between-instrument
● Subject
■ within-subject
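Observer variability can be estimated when two observers each measure the same animal several times. In this sketch, within-observer variability is the standard deviation of each observer's own readings, and between-observer variability is the standard deviation of the observers' averages; the readings are hypothetical.

```python
import statistics

# Hypothetical repeated weight readings (kg) of the same animal.
readings = {
    "observer_1": [250, 252, 251],
    "observer_2": [258, 260, 259],
}

# Within-observer: how much one observer's repeated readings vary.
within_observer = {
    name: statistics.stdev(values) for name, values in readings.items()
}

# Between-observer: how much the observers' averages differ from each other.
observer_means = [statistics.mean(values) for values in readings.values()]
between_observer = statistics.stdev(observer_means)

print("Within-observer SD:", within_observer)
print(f"Between-observer SD: {between_observer:.2f}")
```

Here each observer is consistent with themselves but the two disagree with each other, which points to a difference in technique rather than careless reading.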
Thank you!
Questions?

01-02-Data Use Training (Data Quality).pptx
