1
QUALITY OF DATA
2
LEARNING OBJECTIVES
• Realise the importance of correct data for program management
• Realise the distinction between random data errors and falsified data
• Understand the causes of poor data quality
• Be able to check data quality through supervision and review of reports
• Learn when and how to correct erroneous data
3
Occurrence & importance of errors
In business context:
• Error rates of 1-5 % are not exceptional
• Estimated cost ≈ 10 % of revenue
• Problems with data quality increase when data originate from multiple sources
• After initial enthusiasm to improve data quality, the focus on data quality generally fades slowly
In disease control context:
• Error occurrence?
• Impact on program performance?
• Checking of errors: limited effort
4
Errors in RNTCP?
Based on pre-test carried out in all countries:
Real possibility of errors in subdistrict
reports
Minor possibility in district reports
Little attention to checking for errors
5
Errors identified in 1039 TB patients
Cohort review method in New York City: Munsiff et al., IJTLD 2006, 10: 1133-9
• 41% of cases presented errors
• multiple errors per patient: 596 errors / 424 patients = 1.4
• What kind of errors?
- program info errors 55 %
- patient related errors 45 %
NB. Error rates in HMIS > 50 %
Gillies A. Methods Inf Med 2000, 39 : 208-12
6
Data quality: definition
The state of validity, reliability, consistency, timeliness and completeness
that makes data appropriate for a specific use
Problems with data quality do not arise only from incorrect data
Inconsistent data are a problem as well
7
Data Quality ~ Management
Quality Assurance
Activities to ensure quality before data collection
Quality Control
Monitoring and maintaining quality of data during RNTCP implementation
Data management
Handling and analysis of data throughout the RNTCP surveillance
8
Quality assurance & control
Quality assurance:
- anticipates problems before they occur
- uses all available information to generate improvements
- is not tied to a specific quality standard
- is applicable mostly at the planning stage
- is all-encompassing in its activities
Quality control:
- responds to observed problems
- uses ongoing measurements to make decisions on the processes or products
- requires a pre-specified quality standard for comparability
- is applicable mostly at the processing stage
- is a set procedure that is a subset of quality assurance
9
Quality control
Quality control is a regulatory procedure through which we:
• measure quality
• compare quality with pre-set standards
• act on the differences
The objective of quality control is to achieve a given quality level at minimum cost
(e.g. EQA sampling)
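The three steps above (measure, compare, act) can be sketched in Python. This is only an illustration: the function name `quality_control`, the choice of an error rate as the quality measure, and the 5 % standard used in the example calls are assumptions, not RNTCP rules.

```python
def quality_control(n_checked: int, n_errors: int, max_error_rate: float) -> str:
    """Measure quality, compare it with a pre-set standard, decide on an action."""
    if n_checked <= 0:
        raise ValueError("n_checked must be positive")
    error_rate = n_errors / n_checked      # 1. measure quality
    if error_rate <= max_error_rate:       # 2. compare with the pre-set standard
        return "accept"
    return "investigate"                   # 3. act on the difference


# Hypothetical standard: at most 5 % of checked records may contain an error.
print(quality_control(n_checked=100, n_errors=2, max_error_rate=0.05))
print(quality_control(n_checked=100, n_errors=10, max_error_rate=0.05))
```

Checking a sample of records rather than all of them keeps the cost low, which is the point of the minimum-cost objective.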
10
Dimensions of data quality
1. Intrinsic data quality
• accuracy (validity and reliability)
2. Contextual data quality
• relevant
• timely
• complete
3. Representational data quality
• interpretability, easy to understand
4. Accessibility data quality
• accessibility, security
11
Intrinsic data quality
ACCURACY
Exact conformity to the true value
WHY IMPORTANT?
Accurate data = precondition for accurate decisions!!
Two concepts: validity and reliability
QUESTION: is this guaranteed?
12
Validity
= the degree to which a measurement reflects the truth
There should be no systematic error or bias
What is a valid sputum result for an open TB case?
A result is valid if it corresponds to the true value!
Open TB case = sputum positive!!
13
Reliability
The degree to which a measurement gives the same result:
• each time it is used under the same conditions
• with the same subject
A necessary but not sufficient condition for validity, because one can make the same error twice
Reliability = repeatability of measurements
Reliability is inversely related to random error
14
Dimensions of data quality
1. Intrinsic data quality
• accuracy
2. Contextual data quality
• relevant
• timely
• complete
15
RELEVANCE (usefulness)
Reflects the degree to which information
meets the real needs of clients.
Is concerned with whether the available
information sheds light on the issues that are
important to users.
16
RELEVANCE
A good information source should include all relevant content and exclude all irrelevant content.
Relevant for what?
Decision making for RNTCP management
Assessing relevance is subjective and depends upon the varying needs of users!
17
TIMELINESS
Refers to the moment at which data are compiled, reported and analysed
Given RNTCP’s standardization of the data reporting system, timeliness is not a major issue in India.
But it could be an issue in remote areas and in PPM
18
COMPLETENESS
No missing data (records, items)
All data fields that have to be filled in should indeed contain data.
QUESTION: does this presently happen??
19
Missing records
• Annual report 2001, NTP Bangladesh
Reports       DOTS areas    non-DOTS areas
-------------------------------------------------------
Received      2230          180
Missing       59            4
% missing     3%            2%
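The percentages in the table can be reproduced with a short calculation. The sketch below assumes that the number of expected reports is the sum of received and missing reports; the slide does not state the denominator, and the helper name `percent_missing` is illustrative.

```python
def percent_missing(received: int, missing: int) -> float:
    """Share of expected reports that never arrived, in percent.

    Assumption: expected reports = received + missing.
    """
    expected = received + missing
    if expected == 0:
        raise ValueError("no reports expected")
    return 100.0 * missing / expected


# Figures from the table above.
print(round(percent_missing(received=2230, missing=59)))  # DOTS areas
print(round(percent_missing(received=180, missing=4)))    # non-DOTS areas
```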
20
Dimensions of data quality
1. Intrinsic data quality
• accuracy
2. Contextual data quality
• relevant
• timely
• complete
3. Representational data quality
• interpretability, easy to understand
21
Representational data quality
Interpretability
Data must be in appropriate language and
units, and the data definitions must be
clear to all (language, jargon, concepts)
Ease of understanding
Data must be clear, without ambiguity, and
easily comprehended.
22
Dimensions of data quality
1. Intrinsic data quality
• accuracy
2. Contextual data quality
• relevant
• timely
• complete
3. Representational data quality
• interpretability, easy to understand
4. Accessibility data quality
• accessibility, security
23
ACCESSIBILITY
Essential element of any data quality assessment.
If data are not accessible, then they have little or no value.
Accessibility = precondition for use, but no guarantee of use!
Data items should be easily obtainable and legal to collect.
In the computer era, guidelines have to be established for who may access which data
24
SECURITY
The protection of data from:
☞ unauthorized modification (accidental or intentional)
☞ equipment malfunction (computer crash)
☞ natural disasters (fire, tsunami..) and crime
Be aware!
Security threats are more serious when HMIS is computerized:
• unauthorized access to data
• damage to files (viruses…)
25
Data management covers the whole process, starting from data
recording to transcription, compilation, analysis & interpretation,
reporting, feedback and use.
[Data flow diagram] TB CENTRE (OPD or lab) → RECORDING → TRANSCRIPTION → COMPILATION → ANALYSIS & INTERPRETATION → REPORTING → FEEDBACK & USE
26
Where can errors occur?
At each step, especially during:
Data recording
Manual data transcription
Data compilation
Data entry in computer
Analysis
Interpretation
27
Step in data flow                Source of error
-------------------------------------------------------
Data recording                   Information not registered
                                 Wrong information (wrong address, etc.)
                                 Right information wrongly entered (in the wrong place)
                                 Missing records
Data compilation                 Wrong counts
                                 Missed reports
                                 Duplicate counting
Computerised data entry          Wrong entry
                                 Partial entry
                                 Partial entry of records
Template-based computerised      nil
data analysis
28
Prevention of data errors
• clarity of the instructions
• training and motivation of the staff
• honesty of the staff
• user-friendliness of the data supports, such as data forms and templates
• supervision
29
Prevention of data errors
• computerized data handling:
  - improves the accuracy of the data
  - prevents processing and analysis errors
  - makes fudging less easy, once the data have been entered in the computer
• use of independent double entry techniques (and checking of inconsistencies between the 2 entries)
• data entry formatted to acceptable ranges and modalities only
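Independent double entry can be illustrated with a small comparison routine. This is a sketch under stated assumptions: each entry pass is represented as a dictionary keyed by a hypothetical patient ID, and the function name `double_entry_discrepancies` and the field names are invented for the example.

```python
def double_entry_discrepancies(first: dict, second: dict) -> list:
    """List every difference between two independent entries of the same records.

    Each argument maps a record ID to a dict of field values.
    Returns tuples (record_id, field, value_in_first, value_in_second).
    """
    found = []
    for record_id in sorted(set(first) | set(second)):
        a, b = first.get(record_id), second.get(record_id)
        if a is None or b is None:
            # The record was keyed in during only one of the two passes.
            found.append((record_id, "<record>",
                          "present" if a is not None else "absent",
                          "present" if b is not None else "absent"))
            continue
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                found.append((record_id, field, a.get(field), b.get(field)))
    return found


# Hypothetical data: the age of P1 was transposed in the second pass.
entry_1 = {"P1": {"age": 34, "sex": "Male"}, "P2": {"age": 51, "sex": "Female"}}
entry_2 = {"P1": {"age": 43, "sex": "Male"}, "P2": {"age": 51, "sex": "Female"}}
for item in double_entry_discrepancies(entry_1, entry_2):
    print(item)
```

Every discrepancy still has to be resolved against the original paper record; the comparison only shows where the two entries disagree, not which one is right.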
30
How to proceed with the data verification?
1. Be alert
2. Routine checking of data
3. Quarterly report checking
31
BE ALERT!
• Registers that look meticulously clean
• All data entered with the same pen
• Lack of variation / identical results every quarter
• Performance that looks too good:
  - absence of initial defaulters
  - death rates that are too low
  - cure rates that are too high
  - absence of defaulting in IP …
Be alert to the likelihood of intentional falsification of data!!!
Do not accept data without checking their veracity!!!
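Some of these warning signs can be screened for automatically in quarterly figures. The sketch below is illustrative only: the field names, the function name `alert_flags` and the 95 % cure-rate ceiling are assumptions, and a flag is a reason to verify the registers, not proof of falsification.

```python
def alert_flags(quarters: list, max_cure_rate: float = 0.95) -> list:
    """Return warning messages for a unit's quarterly outcome counts.

    Each quarter is a dict with the keys "registered", "cured", "died"
    and "defaulted" (hypothetical field names).
    """
    flags = []
    # Lack of variation: exactly the same figures in every quarter.
    if len(quarters) > 1 and all(q == quarters[0] for q in quarters[1:]):
        flags.append("identical results every quarter")
    for number, q in enumerate(quarters, start=1):
        if q["registered"] == 0:
            continue
        # Performance that looks too good.
        if q["cured"] / q["registered"] > max_cure_rate:
            flags.append(f"Q{number}: cure rate above {max_cure_rate:.0%}")
        if q["died"] == 0 and q["defaulted"] == 0:
            flags.append(f"Q{number}: no deaths and no defaulters")
    return flags


suspicious = [{"registered": 50, "cured": 49, "died": 0, "defaulted": 0}] * 3
for message in alert_flags(suspicious):
    print(message)
```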
32
How to proceed with the data verification?
• Routine checking of the data through supervision
  - Completeness checking
  - Consistency checking
• Quarterly report checking
  - Range checking
  - Modality checking
33
Completeness checking
Completeness of report = all data have been reported!
A minimal completeness check verifies whether all variables contain data.
Example:
200 NSP cases, but age information for only 187 cases
Information is incomplete!
How to solve?
Verify via the original reports.
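A minimal completeness check of this kind can be sketched as a count of filled and empty values per variable. The record layout and the function name `completeness` are assumptions made for the example; the 200 and 187 figures come from the slide.

```python
def completeness(records: list, variables: list) -> dict:
    """For each variable, return (number filled, number missing)."""
    report = {}
    for variable in variables:
        # A value counts as missing when it is absent, None or an empty string.
        filled = sum(1 for r in records if r.get(variable) not in (None, ""))
        report[variable] = (filled, len(records) - filled)
    return report


# 200 NSP cases, of which only 187 have an age recorded.
cases = [{"type": "NSP", "age": 35}] * 187 + [{"type": "NSP", "age": None}] * 13
print(completeness(cases, ["type", "age"]))
```

The 13 records without an age are the ones to look up in the original reports.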
34
Consistency checking
Checks whether the values of data items are concordant
Example of an inconsistency: CAT III and sputum positive (SP+)
How to check for inconsistencies?
By cross tabulation
CAT        Sputum result
           SP+      SP-
CAT I      1162     114
CAT II      300     148
CAT III      16    1030
Contradiction: 16 patients recorded as both CAT III and SP+
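Cross tabulation can be sketched with the standard library. The patient records below are invented for illustration, as are the field names `category` and `sputum`; the rule that CAT III combined with a positive sputum result is contradictory is the one given in the example above.

```python
from collections import Counter


def cross_tabulate(records, row_key="category", col_key="sputum"):
    """Count records for every (row value, column value) combination."""
    return Counter((r[row_key], r[col_key]) for r in records)


def find_contradictions(records):
    """Return the records that combine CAT III with a positive sputum result."""
    return [r for r in records
            if r["category"] == "CAT III" and r["sputum"] == "SP+"]


# Hypothetical patient records.
patients = [
    {"id": "P1", "category": "CAT I", "sputum": "SP+"},
    {"id": "P2", "category": "CAT III", "sputum": "SP-"},
    {"id": "P3", "category": "CAT III", "sputum": "SP+"},
]
table = cross_tabulate(patients)
print(table[("CAT III", "SP+")])
print([r["id"] for r in find_contradictions(patients)])
```

The table shows that a contradiction exists; listing the individual records tells the supervisor which entries to verify.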
35
Range checking
Any method of detecting whether a quantitative variable is within an acceptable range
Example 1: Height of an adult patient
Acceptable range = 1.00 m to 2.00 m
3.00 m is impossible
0.98 m is possible, but needs verification
Example 2: Age of an adult patient
Acceptable range = 15 to 100 years
150 years is impossible
Any “impossible” or “out of range” value should be verified via the original record or the patient.
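A range check with the two levels used in the examples (out of range but possible, and impossible) might look like the sketch below. The acceptable ranges are those given above; the wider "plausible" limits (0.50 m to 2.50 m, 0 to 120 years) are assumptions introduced only to separate the two cases, and the function name `range_check` is illustrative.

```python
def range_check(value, acceptable, plausible):
    """Classify a quantitative value as 'ok', 'verify' or 'impossible'.

    acceptable and plausible are (low, high) tuples; the plausible
    limits are assumed here, they are not defined by the programme.
    """
    low, high = acceptable
    plausible_low, plausible_high = plausible
    if low <= value <= high:
        return "ok"
    if plausible_low <= value <= plausible_high:
        return "verify"        # out of range, but could be true
    return "impossible"        # cannot be true, must be an error


HEIGHT_M = {"acceptable": (1.00, 2.00), "plausible": (0.50, 2.50)}
AGE_YEARS = {"acceptable": (15, 100), "plausible": (0, 120)}

print(range_check(0.98, **HEIGHT_M))
print(range_check(3.00, **HEIGHT_M))
print(range_check(150, **AGE_YEARS))
```

Both "verify" and "impossible" values go back to the original record or the patient; the distinction only indicates how likely a genuine value is.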
36
Modality checking
The data of a qualitative variable are classified in groups or modalities.
Each data item should belong to one modality only
Example: Sex
Two modalities: Male or Female
Other values are impossible!
“Not known” is sometimes entered but is not a valid modality and should be verified and corrected!
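A modality check reduces to testing each value against the set of allowed modalities. In this sketch the function name `modality_check` and the sample values are illustrative; the two allowed modalities are the ones named above.

```python
def modality_check(values, allowed):
    """Return (position, value) for every entry that is not an allowed modality."""
    return [(i, v) for i, v in enumerate(values) if v not in allowed]


SEX_MODALITIES = {"Male", "Female"}

# Hypothetical column of entries; positions are counted from 0.
entered = ["Male", "Not known", "Female", "F"]
print(modality_check(entered, SEX_MODALITIES))
```

The comparison is exact, so abbreviations such as "F" are flagged as well and have to be corrected to the agreed modality.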
37
Correction of errors
ERRORS??
Go back to the original data source.
But what if the original data source is erroneous?
The best method is to go back to a previous step in the data flow, and verify patient records, lab records, etc.
• If the correct data are found, then modify the erroneous data
• If the correct data are not found, then report as “missing”.
38
Errors in data
Risk for wrong decisions
Information has to be of good quality
• correct data
• correct data processing
Valid
Reliable
Complete
Consistent
Timely
39
Erroneous data
Bad information
Wrong decisions
Appropriate actions??
40
Don’t forget : there is more room for
error than shown in this picture

Data verification slides bangalore to t (4)
