Reliability and Validity
Hatim Al-Jifree
MB;ChB(Hon), FRCSC, GOC, MMedEd

Lecture objectives
• To review the definitions of reliability and validity
• To review methods of evaluating reliability and validity in survey research
• EBM perspective

Definition of reliability
• The degree of stability exhibited when a measurement is repeated under identical conditions
• Lack of reliability may arise from divergences between observers or instruments of measurement, or from instability of the attribute being measured
(from Last. Dictionary of Epidemiology)

Assessment of reliability
Reliability is assessed in 3 forms
1. Test-retest reliability
2. Alternate-form reliability
3. Internal consistency reliability

Test-retest reliability
• Most common form in surveys
• Same respondents complete a survey at two different points in time
• Usually quantified with a correlation coefficient (r value)
• r values are considered good if r ≥ 0.70

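As a rough illustration of how a test-retest coefficient might be computed, the sketch below correlates total scores from two administrations of the same survey. The scores, the function name pearson_r, and the choice of Pearson's r in particular are assumptions made for illustration; the slide specifies only a correlation coefficient and the r ≥ 0.70 rule of thumb.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical total scores for six respondents at two time points
time_1 = [12, 15, 9, 20, 17, 11]
time_2 = [13, 14, 10, 19, 18, 10]

r = pearson_r(time_1, time_2)
print(f"test-retest r = {r:.2f}; meets the 0.70 rule of thumb: {r >= 0.70}")
```

The same function could be applied to a single question rather than to total scores, since test-retest can be run on specific items or on the whole instrument.
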
Test-retest reliability (2)
• If data are recorded by an observer, you can have the same observer make two separate measurements
• The comparison between the two measurements is intraobserver reliability
• What does a difference mean?

Test-retest reliability (3)
• You can test-retest specific questions or the entire survey instrument
• For variables likely to change over a short period of time, such as energy, happiness, or anxiety:
– Test-retest over very short periods of time

Test-retest reliability (4)
• Potential problem with test-retest is the practice effect
– Individuals become familiar with the items
• What effect does this have on your reliability estimates?
– It inflates the reliability estimate

Alternate-form reliability
• Use differently worded forms to measure the same attribute
• Questions or responses are reworded
• Or their order is changed
• To produce two items that are similar but not identical

Alternate-form reliability (2)
• Two items address:
– The same aspect of behavior
– Same vocabulary
– Same level of difficulty
• Items should differ in wording only
• It is common to simply change the order of the response alternatives
• This reduces practice effect

Example: Assessment of depression
Circle one item
Version A:
During the past 4 weeks, I have felt downhearted:
Every day 1
Some days 2
Never 3
Version B:
During the past 4 weeks, I have felt downhearted:
Never 1
Some days 2
Every day 3

Alternate-form reliability (3)
• You could also change the wording of the response alternatives without changing the meaning

Example: Assessment of urinary function
Version A:
During the past week, how often did you usually empty your bladder?
• 1 to 2 times per day
• 3 to 4 times per day
• 5 to 8 times per day
• 12 times per day
• More than 12 times per day

Example: Assessment of urinary function
Version B:
During the past week, how often did you usually empty your bladder?
• Every 12 to 24 hours
• Every 6 to 8 hours
• Every 3 to 5 hours
• Every 2 hours
• More than every 2 hours

Alternate-form reliability (4)
• You could also change the actual wording of the question
• The two items must be equivalent
• Items with different degrees of difficulty do not measure the same attribute
• What might they measure?
– Reading comprehension or cognitive function

Example: Assessment of loneliness
Version A:
How often in the past month have you felt alone in the world?
• Every day
• Some days
• Occasionally
• Never
Version B:
During the past 4 weeks, how often have you felt a sense of loneliness?
• All of the time
• Sometimes
• From time to time
• Never

Example of nonequivalent item rewording
Version A:
When your boss blames you for something you did not do, how often do you stick up for yourself?
• All the time
• Some of the time
• None of the time
Version B:
When presented with difficult professional situations where a superior censures you for an act for which you are not responsible, how frequently do you respond in an assertive way?
• All of the time
• Some of the time
• None of the time

Alternate-form reliability (5)
• You can measure alternate-form reliability at the same timepoint or separate timepoints
• If the sample is large enough:
– You can split it in half and administer one item to each half
– Then compare the two halves
– This is called a split-halves method
– You can also split it into thirds and administer three forms of the item

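The split-sample procedure described above might be sketched as follows. This is only an illustration under stated assumptions: the respondent IDs, the fixed random seed, the helper names split_sample and mean, and the made-up scores are all hypothetical, and comparing group means is one simple choice of comparison that the slide does not prescribe.

```python
import random

def split_sample(respondent_ids, seed=0):
    """Randomly split respondents into two halves (hypothetical helper)."""
    ids = list(respondent_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    middle = len(ids) // 2
    return ids[:middle], ids[middle:]

def mean(values):
    return sum(values) / len(values)

# Hypothetical sample of 20 respondents
respondents = list(range(1, 21))
form_a_group, form_b_group = split_sample(respondents, seed=42)

# Made-up answers on a 1-3 scale, already coded in the same direction
scores = {rid: 1 + rid % 3 for rid in respondents}

mean_a = mean([scores[rid] for rid in form_a_group])
mean_b = mean([scores[rid] for rid in form_b_group])
print(f"form A mean = {mean_a:.2f}, form B mean = {mean_b:.2f}")
```

If one form reverses the order of the response alternatives, as in Version B of the depression example, its answers would need to be recoded to the same direction before the two halves are compared.
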
Internal consistency reliability
• Applied to groups of items that are thought to measure different aspects of the same concept
• Cronbach's coefficient alpha
– Measures internal consistency reliability
– It is a reflection of how well the different items complement each other
– Interpret like a correlation coefficient (≥ 0.70 is good)

Example: Assessment of physical function
Response options: 1 = Limited a lot, 2 = Limited a little, 3 = Not limited
• Vigorous activities, such as running, lifting heavy objects, participating in strenuous sports: 1 2 3
• Moderate activities, such as moving a table, pushing a vacuum cleaner, bowling, or playing golf: 1 2 3
• Lifting or carrying groceries: 1 2 3
• Climbing several flights of stairs: 1 2 3
• Bending, kneeling, or stooping: 1 2 3
• Walking more than a mile: 1 2 3
• Walking several blocks: 1 2 3
• Walking one block: 1 2 3
• Bathing or dressing yourself: 1 2 3

Calculation of Cronbach's coefficient alpha
Example: Assessment of emotional health
During the past month (Yes = 1, No = 0):
• Have you been a very nervous person? 1 0
• Have you felt downhearted and blue? 1 0
• Have you felt so down in the dumps that nothing could cheer you up? 1 0

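Calculations
Mean score = 2 (five total scores: 2, 3, 0, 3, 2)
Sample variance:
\[ \mathrm{Var} = \frac{(2-2)^2 + (3-2)^2 + (0-2)^2 + (3-2)^2 + (2-2)^2}{5-1} = 1.5 \]
Coefficient alpha, with k = 3 items:
\[ \alpha = \left[ 1 - \frac{\sum_i (\%\mathrm{pos}_i)(\%\mathrm{neg}_i)}{\mathrm{Var}} \right] \frac{k}{k-1} = \left[ 1 - \frac{(.6)(.4) + (.8)(.2) + (.6)(.4)}{1.5} \right] \frac{3}{2} = 0.86 \]
Conclude that this scale has good reliability
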
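The calculation above can be checked with a short sketch. The response matrix below is hypothetical: it is one possible set of yes/no answers chosen so that the five total scores are 2, 3, 0, 3, 2 and the proportions of positive answers per item are .6, .8, and .6, matching the numbers on the slide. The function names are likewise assumptions, and the formula is the one shown on the slide (sample variance of total scores, yes/no items).

```python
def sample_variance(values):
    """Sample variance with an n - 1 denominator, as on the slide."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def alpha_dichotomous(responses):
    """Coefficient alpha for 0/1 items.

    responses: list of rows, one per respondent, each a list of 0/1 answers.
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = sample_variance(totals)
    # Sum over items of (% positive) * (% negative)
    sum_pos_neg = 0.0
    for i in range(k):
        pos = sum(row[i] for row in responses) / n
        sum_pos_neg += pos * (1 - pos)
    return (k / (k - 1)) * (1 - sum_pos_neg / var_total)

# Hypothetical answers of five respondents to the three emotional-health items
responses = [
    [1, 1, 0],  # total 2
    [1, 1, 1],  # total 3
    [0, 0, 0],  # total 0
    [1, 1, 1],  # total 3
    [0, 1, 1],  # total 2
]

alpha = alpha_dichotomous(responses)
print(round(alpha, 2))  # prints 0.86
```
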
Internal consistency reliability (2)
If internal consistency is low:
• You can add more items
• Re-examine existing items for clarity

Interobserver reliability
• How well two evaluators agree in their assessment of a variable
• Use correlation coefficient to compare data between observers
• May be used as property of the test or as an outcome variable

Definition of validity
• How well a survey measures what it sets out to measure

Assessment of validity
Validity is measured in four forms
• Face validity
• Content validity
• Criterion validity
• Construct validity

Face validity
• Cursory review of survey items by untrained judges
– Ex. Showing the survey to untrained individuals to see whether they think the items look okay
• Very casual, soft
• Many don't really consider this as a measure of validity at all

Content validity
• Subjective measure of how appropriate the items seem to a set of reviewers who have some knowledge of the subject matter
• Usually consists of an organized review of the survey's contents
• Still very qualitative

Criterion validity
• Measure of how well one instrument stacks up against another instrument or predictor
– Concurrent: assess your instrument against a "gold standard"
– Predictive: assess the ability of your instrument to forecast future events, behavior, attitudes, or outcomes
• Assess with correlation coefficient

Construct validity
• Most valuable and most difficult measure of validity
• Basically, it is a measure of how meaningful the scale or instrument is when it is in practical use

Construct validity (2)
• Convergent: Implies that several different methods for obtaining the same information about a given trait or concept produce similar results
• Evaluation is analogous to alternate-form reliability, except that it is more theoretical and requires a great deal of work, usually by multiple investigators with different approaches

Construct validity (3)
• Divergent: The ability of a measure to estimate the underlying truth in a given area; it must be shown not to correlate too closely with similar but distinct concepts or traits

Introduction
Three Steps in Using Medical Literature Articles:
• Are the results of the study valid?
• What are the results?
• How can I apply these results to patient care?

Introduction
Four types of papers:
• Therapy
• Diagnostic Intervention
• Prognosis
• Systematic review

Therapy
Study design: RCT
• Were Patients Randomized?
• Was Randomization Concealed?
• Were Patients Analyzed in the Groups to Which They Were Randomized?
– Intention to treat analysis

Therapy
• Were Patients in The Treatment And Control Groups Similar With Respect to Known Prognostic Factors?
• Were Patients Aware of Group Allocation?

Therapy
• Were Clinicians Aware of Group Allocation?
• Were Outcome Assessors Aware of Group Allocation?
• Was Follow-up Complete?
• Was Follow-up Long Enough?

Diagnostic Intervention
Study Design: Cross-sectional
Was there an independent, blind comparison with a reference standard?
• Spectrum of patients
• Did the results of the test being evaluated influence the decision to perform the reference standard?
• Did the description of the methods permit replication?

Prognosis
• Study design: Cohort
• Was a
– defined,
– representative sample of patients
– assembled at a common point in the course of their disease?
• Inception Cohort; early
• Late stage prognosis
• Patients equal in all prognostic factors
• Stratified analysis?
• Follow up complete and long enough
• Valid and reliable data collection