Reliability and validity

  1. Reliability and Validity
     Hatim Al-Jifree, MB;ChB(Hon), FRCSC, GOC, MMedEd
  2. Lecture objectives
     • To review the definitions of reliability and validity
     • To review methods of evaluating reliability and validity in survey research
     • EBM perspective
  3. Reliability
  4. Definition
     • The degree of stability exhibited when a measurement is repeated under identical conditions
     • Lack of reliability may arise from divergences between observers or instruments of measurement, or from instability of the attribute being measured
     (from Last, Dictionary of Epidemiology)
  5. Assessment of reliability
     Reliability is assessed in three forms:
     1. Test-retest reliability
     2. Alternate-form reliability
     3. Internal consistency reliability
  6. Test-retest reliability
     • Most common form in surveys
     • The same respondents complete a survey at two different points in time
     • Usually quantified with a correlation coefficient (r value)
     • r values are considered good if r ≥ 0.70
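A minimal Python sketch of quantifying test-retest reliability with Pearson's correlation coefficient and checking it against the 0.70 guideline. The scores are hypothetical, and `pearson_r` is a helper written for illustration, not part of any survey package.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between paired scores x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical scores from the same five respondents at two time points
time1 = [10, 12, 15, 18, 20]
time2 = [11, 13, 14, 19, 21]

r = pearson_r(time1, time2)
print(f"test-retest r = {r:.2f}, good = {r >= 0.70}")
```

With these made-up scores r is about 0.98, comfortably above 0.70.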
  7. Test-retest reliability (2)
     • If data are recorded by an observer, you can have the same observer make two separate measurements
     • The comparison between the two measurements is intraobserver reliability
     • What does a difference mean?
  8. Test-retest reliability (3)
     • You can test-retest specific questions or the entire survey instrument
     • For variables likely to change over a short period of time, such as energy, happiness, or anxiety, test-retest over very short periods of time
  9. Test-retest reliability (4)
     • A potential problem with test-retest is the practice effect: individuals become familiar with the items
     • What effect does this have on your reliability estimates? It inflates the reliability estimate
  10. Alternate-form reliability
     • Use differently worded forms to measure the same attribute
     • Questions or responses are reworded, or their order is changed, to produce two items that are similar but not identical
  11. Alternate-form reliability (2)
     • The two items address:
        – The same aspect of behavior
        – The same vocabulary
        – The same level of difficulty
     • Items should differ in wording only
     • It is common to simply change the order of the response alternatives
     • This reduces the practice effect
  12. Example: Assessment of depression
     Circle one item.
     Version A: During the past 4 weeks, I have felt downhearted:
        Every day    1
        Some days    2
        Never        3
     Version B: During the past 4 weeks, I have felt downhearted:
        Never        1
        Some days    2
        Every day    3
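Because Version B lists the response alternatives in reverse order, its codes run opposite to Version A's (Every day is 1 on Version A but 3 on Version B). Before the two forms can be correlated, one version has to be recoded onto the other's scale. A minimal sketch, assuming the 1 to 3 codes shown above; `recode_b_to_a` is a hypothetical helper name.

```python
# Version A codes: Every day = 1, Some days = 2, Never = 3
# Version B codes: Never = 1, Some days = 2, Every day = 3
SCALE_MIN, SCALE_MAX = 1, 3


def recode_b_to_a(code_b: int) -> int:
    """Reverse-score a Version B response so it uses Version A's coding."""
    if not SCALE_MIN <= code_b <= SCALE_MAX:
        raise ValueError(f"code out of range: {code_b}")
    return SCALE_MIN + SCALE_MAX - code_b


version_b_answers = [1, 3, 2, 3]
print([recode_b_to_a(c) for c in version_b_answers])  # [3, 1, 2, 1]
```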
  13. Alternate-form reliability (3)
     • You could also change the wording of the response alternatives without changing the meaning
  14. Example: Assessment of urinary function
     Version A: During the past week, how often did you usually empty your bladder?
        1 to 2 times per day
        3 to 4 times per day
        5 to 8 times per day
        12 times per day
        More than 12 times per day
  15. Example: Assessment of urinary function
     Version B: During the past week, how often did you usually empty your bladder?
        Every 12 to 24 hours
        Every 6 to 8 hours
        Every 3 to 5 hours
        Every 2 hours
        More than every 2 hours
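The two versions line up because each Version B interval converts to roughly the same daily frequency as the Version A option in the same position (24 hours divided by the interval). A quick sketch of that arithmetic; the open-ended last option is left out, and pairing the options by position is an assumption based on the order shown.

```python
# Version B options as (shortest, longest) interval in hours between voids
version_b = {
    "Every 12 to 24 hours": (12, 24),
    "Every 6 to 8 hours": (6, 8),
    "Every 3 to 5 hours": (3, 5),
    "Every 2 hours": (2, 2),
}


def times_per_day(shortest_h: float, longest_h: float) -> tuple[float, float]:
    """Convert an interval range in hours to a (low, high) range of times per day."""
    return 24 / longest_h, 24 / shortest_h


for label, (shortest, longest) in version_b.items():
    low, high = times_per_day(shortest, longest)
    print(f"{label}: about {low:g} to {high:g} times per day")
```

"Every 3 to 5 hours" works out to about 4.8 to 8 times per day, close to, but not exactly, Version A's "5 to 8 times per day".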
  16. Alternate-form reliability (4)
     • You could also change the actual wording of the question
     • The two items must be equivalent
     • Items with different degrees of difficulty do not measure the same attribute
     • What might they measure? Reading comprehension or cognitive function
  17. Example: Assessment of loneliness
     Version A: How often in the past month have you felt alone in the world?
        Every day
        Some days
        Occasionally
        Never
     Version B: During the past 4 weeks, how often have you felt a sense of loneliness?
        All of the time
        Sometimes
        From time to time
        Never
  18. Example of nonequivalent item rewording
     Version A: When your boss blames you for something you did not do, how often do you stick up for yourself?
        All the time
        Some of the time
        None of the time
     Version B: When presented with difficult professional situations where a superior censures you for an act for which you are not responsible, how frequently do you respond in an assertive way?
        All of the time
        Some of the time
        None of the time
  19. Alternate-form reliability (5)
     • You can measure alternate-form reliability at the same timepoint or at separate timepoints
     • If the sample is large enough:
        – You can split it in half and administer one item to each half
        – Then compare the two halves
        – This is called the split-halves method
        – You can also split it into thirds and administer three forms of the item
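The split-halves administration described in this slide can be sketched as a random assignment of respondents to the forms, after which the groups' answers are compared. The respondent IDs, the fixed seed, and the helper `split_sample` are hypothetical, chosen only to make the sketch reproducible.

```python
import random


def split_sample(respondents, n_forms=2, seed=None):
    """Randomly assign respondents to n_forms groups of (nearly) equal size."""
    pool = list(respondents)
    random.Random(seed).shuffle(pool)
    # Deal respondents out in turn: group 0 gets positions 0, n, 2n, ...
    return [pool[i::n_forms] for i in range(n_forms)]


respondents = [f"R{i:02d}" for i in range(1, 21)]
form_a, form_b = split_sample(respondents, n_forms=2, seed=42)
print(len(form_a), len(form_b))  # 10 10
```

Passing `n_forms=3` gives the three-way split mentioned on the slide (groups of 7, 7 and 6 for these 20 respondents).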
  20. Internal consistency reliability
     • Applied to groups of items that are thought to measure different aspects of the same concept
     • Cronbach's coefficient alpha
        – Measures internal consistency reliability
        – Reflects how well the different items complement each other
        – Interpret like a correlation coefficient (≥ 0.70 is good)
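For reference, the standard formula for Cronbach's coefficient alpha, for a scale of k items with item variances σ²_i and total-score variance σ²_X (both computed with the same denominator), is:

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right)
```

For items scored yes = 1 and no = 0, each item variance is p_i(1 − p_i), where p_i is the proportion answering yes; this special case is also known as the Kuder-Richardson formula 20 (KR-20).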
  21. Example: Assessment of physical function
     Response options: 1 = Limited a lot, 2 = Limited a little, 3 = Not limited
     • Vigorous activities, such as running, lifting heavy objects, participating in strenuous sports    1  2  3
     • Moderate activities, such as moving a table, pushing a vacuum cleaner, bowling, or playing golf    1  2  3
     • Lifting or carrying groceries    1  2  3
     • Climbing several flights of stairs    1  2  3
     • Bending, kneeling, or stooping    1  2  3
     • Walking more than a mile    1  2  3
     • Walking several blocks    1  2  3
     • Walking one block    1  2  3
     • Bathing or dressing yourself    1  2  3
  22. Calculation of Cronbach's coefficient alpha
     Example: Assessment of emotional health
     During the past month:                                                    Yes   No
     • Have you been a very nervous person?                                    1     0
     • Have you felt downhearted and blue?                                     1     0
     • Have you felt so down in the dumps that nothing could cheer you up?     1     0
  23. Results
     Patient               Item 1      Item 2      Item 3      Summed scale score
     1                     0           1           1           2
     2                     1           1           1           3
     3                     0           0           0           0
     4                     1           1           1           3
     5                     1           1           0           2
     Percentage positive   3/5 = .6    4/5 = .8    3/5 = .6
  24. Calculations
     • Mean score = (2 + 3 + 0 + 3 + 2) / 5 = 2
     • Sample variance:
       $$\mathrm{Var} = \frac{(2-2)^2 + (3-2)^2 + (0-2)^2 + (3-2)^2 + (2-2)^2}{5-1} = 1.5$$
     • Cronbach's coefficient alpha:
       $$\alpha_C = \frac{k}{k-1}\left(1 - \frac{\sum_i (\%\mathrm{pos}_i)(\%\mathrm{neg}_i)}{\mathrm{Var}}\right) = \frac{3}{2}\left(1 - \frac{(.6)(.4) + (.8)(.2) + (.6)(.4)}{1.5}\right) = 0.86$$
     • Conclude that this scale has good reliability
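A Python sketch reproducing the worked example from the five patients' results. One caveat worth checking: the slide divides the population item variances p(1 − p) by the sample (n − 1) variance of the summed scores. Cronbach's formula assumes both use the same denominator; computed that way, the same data give α = 0.70 rather than 0.86, which still meets the 0.70 guideline. Both calculations are shown so they can be compared; the function names are illustrative.

```python
from statistics import pvariance, variance

# Emotional-health example: rows are patients 1-5, columns are items 1-3
# (Yes = 1, No = 0), taken from the Results slide.
responses = [
    [0, 1, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
]


def cronbach_alpha(rows):
    """Cronbach's alpha with the same (population) variance for items and totals.

    For 0/1 items, pvariance(item) equals p * (1 - p), so this is KR-20.
    """
    k = len(rows[0])
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_as_on_slide(rows):
    """Slide's version: sum of p * (1 - p) over the sample (n - 1) total variance."""
    n, k = len(rows), len(rows[0])
    p = [sum(row[i] for row in rows) / n for i in range(k)]
    pq = sum(pi * (1 - pi) for pi in p)
    return k / (k - 1) * (1 - pq / variance([sum(row) for row in rows]))


print(f"slide calculation:    {alpha_as_on_slide(responses):.2f}")  # 0.86
print(f"consistent variances: {cronbach_alpha(responses):.2f}")  # 0.70
```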
  25. Internal consistency reliability (2)
     If internal consistency is low:
     • You can add more items
     • Re-examine existing items for clarity
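How much adding items can help is commonly estimated with the Spearman-Brown prophecy formula, a standard psychometric result not covered in these slides: if a scale with reliability ρ is lengthened by a factor m using comparable items, the predicted reliability is mρ / (1 + (m − 1)ρ). A minimal sketch, assuming the added items are parallel to the existing ones:

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after lengthening a scale by `length_factor`.

    Assumes the added items are parallel (comparable) to the existing ones.
    """
    if not 0 <= reliability <= 1 or length_factor <= 0:
        raise ValueError("reliability must be in [0, 1] and length_factor > 0")
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical scale with alpha = 0.60, doubled in length with comparable items
print(round(spearman_brown(0.60, 2), 2))  # 0.75
```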
  26. Interobserver reliability
     • How well two evaluators agree in their assessment of a variable
     • Use a correlation coefficient to compare data between observers
     • May be used as a property of the test or as an outcome variable
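A correlation coefficient suits continuous ratings. When two observers assign categories (for example, yes or no), Cohen's kappa is a widely used alternative that corrects raw agreement for the agreement expected by chance. It is not mentioned in the slides; this is a sketch with hypothetical ratings.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categories to the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions, summed over categories
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of six patients by two observers
obs_1 = ["yes", "yes", "no", "no", "yes", "no"]
obs_2 = ["yes", "no", "no", "no", "yes", "yes"]
print(f"kappa = {cohens_kappa(obs_1, obs_2):.2f}")  # kappa = 0.33
```

Here the observers agree on 4 of 6 patients (67%), but half that agreement would be expected by chance, so kappa is only about 0.33. The function raises `ZeroDivisionError` if both raters use a single identical category throughout, where kappa is undefined.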
  27. Validity
  28. Definition
     • How well a survey measures what it sets out to measure
  29. Assessment of validity
     Validity is measured in four forms:
     • Face validity
     • Content validity
     • Criterion validity
     • Construct validity
  30. Face validity
     • Cursory review of survey items by untrained judges
     • Example: showing the survey to untrained individuals to see whether they think the items look okay
     • Very casual and soft
     • Many do not really consider this a measure of validity at all
  31. Content validity
     • A subjective measure of how appropriate the items seem to a set of reviewers who have some knowledge of the subject matter
     • Usually consists of an organized review of the survey's contents
     • Still very qualitative
  32. Criterion validity
     • A measure of how well one instrument stacks up against another instrument or predictor
     • Concurrent: assess your instrument against a "gold standard"
     • Predictive: assess the ability of your instrument to forecast future events, behavior, attitudes, or outcomes
     • Assess with a correlation coefficient
  33. Construct validity
     • The most valuable and most difficult measure of validity
     • Basically, it is a measure of how meaningful the scale or instrument is when it is in practical use
  34. Construct validity (2)
     • Convergent: implies that several different methods for obtaining the same information about a given trait or concept produce similar results
     • Evaluation is analogous to alternate-form reliability, except that it is more theoretical and requires a great deal of work, usually by multiple investigators with different approaches
  35. Construct validity (3)
     • Divergent: the ability of a measure to estimate the underlying truth in a given area; the measure must be shown not to correlate too closely with similar but distinct concepts or traits
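In practice, convergent and divergent evidence are often summarized by correlating a scale with other measures: a high correlation with a different method for the same trait, and a low correlation with a distinct trait. A minimal sketch with hypothetical scores; the 0.70 and 0.30 cut-offs in the printed labels are illustrative assumptions, not fixed standards.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between paired scores x and y."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical scores for five respondents
new_scale = [2, 4, 5, 7, 9]
same_trait_other_method = [3, 4, 6, 7, 10]  # convergent comparison
distinct_trait = [5, 3, 6, 4, 5]  # divergent comparison

r_conv = pearson_r(new_scale, same_trait_other_method)
r_div = pearson_r(new_scale, distinct_trait)
print(f"convergent r = {r_conv:.2f} (want high, e.g. >= 0.70)")
print(f"divergent r = {r_div:.2f} (want low, e.g. |r| < 0.30)")
```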
  36. EBM Perspective
  37. Introduction
     Three steps in using medical literature articles:
     • Are the results of the study valid?
     • What are the results?
     • How can I apply these results to patient care?
  38. Introduction
     Four types of papers:
     • Therapy
     • Diagnostic intervention
     • Prognosis
     • Systematic review
  39. Therapy
     • Study design: RCT
     • Were patients randomized?
     • Was randomization concealed?
     • Were patients analyzed in the groups to which they were randomized? (intention-to-treat analysis)
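Intention-to-treat analysis counts each patient in the group to which they were randomized, regardless of the treatment actually received. A minimal sketch contrasting it with an "as-treated" grouping; the patient records and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    assigned: str  # group allocated at randomization
    received: str  # treatment actually received
    event: bool  # whether the outcome event occurred


def event_rates(patients, by):
    """Proportion of patients with the event, grouped by `by` ('assigned' or 'received')."""
    totals, events = {}, {}
    for p in patients:
        group = getattr(p, by)
        totals[group] = totals.get(group, 0) + 1
        events[group] = events.get(group, 0) + p.event
    return {g: events[g] / totals[g] for g in totals}


trial = [
    Patient("treatment", "treatment", False),
    Patient("treatment", "treatment", False),
    Patient("treatment", "control", True),  # crossed over; still counted as treatment under ITT
    Patient("control", "control", True),
    Patient("control", "control", False),
    Patient("control", "control", True),
]

print("intention to treat:", event_rates(trial, "assigned"))
print("as treated:", event_rates(trial, "received"))
```

In this toy trial the crossover patient makes the treatment look perfect under as-treated grouping (0 of 2 events), while the intention-to-treat rate (1 of 3) preserves the comparison that randomization created.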
  40. Therapy
     • Were patients in the treatment and control groups similar with respect to known prognostic factors?
     • Were patients aware of group allocation?
  41. Therapy
     • Were clinicians aware of group allocation?
     • Were outcome assessors aware of group allocation?
     • Was follow-up complete?
     • Was follow-up long enough?
  42. Diagnostic Intervention
     • Study design: cross-sectional
     • Was there an independent, blind comparison with a reference standard?
     • Spectrum of patients
     • Did the results of the test being evaluated influence the decision to perform the reference standard?
     • Did the description of the methods permit replication?
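Comparing a test against the reference standard usually yields a 2 × 2 table, from which sensitivity and specificity follow. A minimal sketch with hypothetical counts; the slides do not specify which accuracy measures to report, so this is one common choice.

```python
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Sensitivity and specificity of a test judged against a reference standard."""
    return {
        "sensitivity": tp / (tp + fn),  # reference-positive patients who test positive
        "specificity": tn / (tn + fp),  # reference-negative patients who test negative
    }


# Hypothetical results: 100 reference-positive and 200 reference-negative patients
print(diagnostic_accuracy(tp=90, fp=30, fn=10, tn=170))
# {'sensitivity': 0.9, 'specificity': 0.85}
```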
  43. Prognosis
     • Study design: cohort
     • Was a defined, representative sample of patients assembled at a common point in the course of their disease?
        – Inception cohort (early)
        – Late-stage prognosis
     • Were patients equal in all prognostic factors?
     • Stratified analysis?
     • Was follow-up complete and long enough?
     • Valid and reliable data collection
  44. Thank You
