Reliability and validity1
Reliability & Validity

Published in: Healthcare

Transcript

  • 1. RELIABILITY AND VALIDITY Mrs. Bhaumika Sharma Lecturer, MMIHS
  • 2. Introduction • For a measure to be useful, it must be both reliable and valid. • Reliable = consistent in producing the same results every time the measure is used • Valid = measuring what it is supposed to measure
  • 3. Reliability • An instrument’s reliability is the consistency with which it measures the target attribute. • If a scale weighed a person at 120 pounds one minute and 150 pounds the next, we would consider it unreliable. • The less variation an instrument produces in repeated measurements, the higher its reliability. • Thus, reliability can be equated with a measure’s stability, consistency, or dependability. • A reliable measure maximizes the true score component and minimizes the error component.
  • 4. Contd. • The reliability of an instrument can be assessed in various ways. • The method chosen depends on the nature of the instrument and on the aspect of reliability of greatest concern. • Three key aspects are stability, internal consistency, and equivalence.
  • 5. What in the world is a measurement instrument? • Any tool that you use to measure with… • What “instrument” might you use to measure the following items? 1. How heavy the apples are: a scale 2. How hot the meat is: a meat thermometer 3. How much orange juice there is: a measuring cup 4. How tall the wall is: a yardstick
  • 6. © A. Taylor. Do not duplicate without author’s permission. What instrument you choose depends on several factors • Ease of use – do you know how to use it? I still can’t figure out my pedometer. • Access – do you have access to the instrument? Where do you get a distance wheel from? What if you don’t have a car – and therefore no odometer? • Appropriateness – is it the best tool for what you are measuring? You wouldn’t use a ruler to measure the distance from here to Ohio; you wouldn’t use an odometer to measure the length of a room. • Accuracy – how precise do your measurements have to be? Counting paces is easy and cheap but only an estimate. • Cost – how expensive is it to use the instrument or interpret the results? You can also use a laser but that can be really expensive.
  • 7. Stability of an instrument • The stability of an instrument is the extent to which similar results are obtained on two separate administrations. • Assessments of an instrument’s stability involve procedures that evaluate test–retest reliability. • Researchers administer the same measure to a sample on two occasions and then compare the scores.
  • 8. Contd. • The comparison is performed objectively by computing a reliability coefficient, which is a numeric index of the magnitude of the test’s reliability. • The value of the reliability coefficient theoretically can range between 0.00 and 1.00. • The test–retest method is a relatively easy approach to estimating reliability, and can be used with many kinds of measures.
  • 9. Contd. • A limitation of the test–retest approach is that attitudes, behaviors, knowledge, physical condition, and so forth can be modified by experiences between testing, so a change in scores may reflect a real change in the attribute rather than an unreliable instrument.
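The test–retest comparison described above can be sketched in a few lines of Python. This is only an illustration: the slides do not say which coefficient to use, so the Pearson correlation is assumed here, and the scores and the helper name `pearson_r` are made up.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five respondents on two occasions.
time_1 = [12, 15, 9, 20, 17]
time_2 = [13, 14, 10, 19, 18]

# A coefficient close to 1.00 suggests a stable instrument.
test_retest_r = pearson_r(time_1, time_2)
print(round(test_retest_r, 2))
```

If the attribute itself changed between the two occasions, a low coefficient would not necessarily mean the instrument is unreliable.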
  • 10. Internal Consistency • Scales and tests that involve summing items are often evaluated for their internal consistency. • Scales designed to measure an attribute ideally are composed of items that measure that attribute and nothing else. • Internal consistency reliability is the most widely used reliability approach among nurse researchers.
  • 11. Split Half Technique • One of the oldest methods for assessing internal consistency is the split-half technique. • For this approach, items on a scale are split into two groups and scored independently. • A common way to split is odd items versus even items. • Scores on the two half tests then are used to compute a correlation coefficient.
  • 12. Cronbach’s Alpha • The most widely used method for evaluating internal consistency is coefficient alpha (or Cronbach’s alpha). • The normal range of values is between .00 and 1.00, and higher values reflect higher internal consistency. • Coefficient alpha is preferable to the split-half procedure because it gives an estimate of the split-half correlation for all possible ways of dividing the measure into two halves.
  • 13. Equivalence • Used for observational methods • Interrater (or interobserver) reliability is estimated by having two or more trained observers watch an event simultaneously and independently record data according to the instrument’s instructions. • The data can then be used to compute an index of equivalence or agreement between observers.
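A minimal sketch of the odd/even split, assuming a small made-up data set of five respondents answering four items. The Spearman-Brown step at the end is not mentioned in the slides; it is included as a commonly used adjustment for the fact that each half is only half as long as the full scale. All names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half(item_scores):
    """item_scores holds one list of item scores per respondent."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, ...
    r_half = pearson_r(odd_totals, even_totals)
    # Spearman-Brown adjustment (an assumption added here, not in the slides).
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full

# Hypothetical responses: 5 respondents x 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 1],
]
r_half, r_full = split_half(responses)
print(round(r_half, 2), round(r_full, 2))
```

A different split of the same items would generally give a somewhat different coefficient.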
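As a hedged illustration, coefficient alpha can be computed from the item variances and the variance of the total score using the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of totals). The responses below are invented and the function name `cronbach_alpha` is hypothetical.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores holds one list of item scores per respondent."""
    k = len(item_scores[0])  # number of items on the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*item_scores))  # regroup scores by item
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical responses: 5 respondents x 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 1],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Unlike a single odd/even split, this value does not depend on how the items happen to be divided.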
  • 14. • Another procedure is to compute reliability as a function of agreements, using the following equation: $\text{Reliability} = \frac{\text{No. of agreements}}{\text{No. of agreements} + \text{No. of disagreements}}$
  • 15. METHODS TO MAINTAIN RELIABILITY 1. Translating research instruments into the respondents’ common (local) language so that questions convey the intended message and elicit accurate responses. 2. Applying the test–retest method. 3. Training enumerators to prevent ambiguity and misunderstanding, and providing proper guidelines.
  • 16. Contd. 4. Alternative method of data collection: two different types of measuring tools are used with the same respondents to obtain similar information. • The two instruments are compared on an item-by-item basis and the degree of similarity is determined. The greater the differences, the lower the reliability.
  • 17. Contd. 5. Split-half method: the instrument is used to collect data. • After the data are collected, the items are divided into two halves and the correlation between the halves is calculated. • The correlation coefficient ranges from 0 to 1. • A value of 0.6 or less is considered to indicate low reliability.
  • 18. Contd. • 6. Pre-testing: pre-testing, or preliminary testing, is the process of assessing in advance the effectiveness of the instruments prepared to gather data. • After the tool is completed it must be tested on subjects who meet the criteria for the study sample. • The study area/setting used for the pre-testing should match that of the population under study.
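The agreement equation above translates directly into code. The sketch below assumes two observers who each coded the same ten events; the category labels and the function name `proportion_agreement` are made up for illustration.

```python
def proportion_agreement(rater_a, rater_b):
    """Agreements divided by (agreements + disagreements) for two observers."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both observers must code the same non-empty set of events")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    disagreements = len(rater_a) - agreements
    return agreements / (agreements + disagreements)

# Hypothetical codes recorded independently by two trained observers.
observer_1 = ["calm", "agitated", "calm", "calm", "agitated",
              "calm", "agitated", "calm", "calm", "calm"]
observer_2 = ["calm", "agitated", "calm", "agitated", "agitated",
              "calm", "calm", "calm", "calm", "calm"]

agreement = proportion_agreement(observer_1, observer_2)
print(agreement)
```

This simple proportion does not correct for agreement that would occur by chance, so it tends to look more favorable when one category dominates.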
  • 19. Contd. • Its objective is to detect discrepancies that have crept in and to remove them after necessary modifications in the questionnaire/interview schedule.
  • 20. VARIABLES • DEPENDENT VARIABLE: the variable used to describe or measure the problem under study. • The dependent variable is the effect. • INDEPENDENT VARIABLE: the variable under study that influences the problem (the dependent variable). • The independent variable is the cause: a change in the dependent variable is presumed to be due to a change in the independent variable.
  • 21. Variables Contd. • INTERVENING VARIABLES: Independent variables that are not related to the purpose of the study, but may affect the dependent variable. • Example: effect of nurses’ educational level on their job performance. Independent variable: nurses’ educational level; dependent variable: job performance; intervening variable: training.
  • 22. Contd. • CONFOUNDING VARIABLE: A confounding variable (also called a third variable) is an extraneous variable that DOES cause a problem because we know that it DOES have a relationship with the independent and dependent variables. • A confounding variable is the type of extraneous variable that varies systematically with, or influences, the independent variable and also influences the dependent variable. • A confounding variable is the kind of extraneous variable that we must be most concerned with.
  • 23. Example: Nutritional Status of Children (Dependent Variable); Mother’s Education (Independent Variable); Economic Status of Family (Confounding Variable)
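To make the confounding example concrete, the sketch below uses invented counts in which economic status is related both to mother's education and to children's nutritional status. Every number and name here is hypothetical and was chosen only so that the pattern is easy to see.

```python
# (economic status, mother's education) -> (children, well-nourished children)
# Hypothetical counts, not data from any study.
counts = {
    ("high", "educated"): (80, 72),
    ("high", "less educated"): (20, 18),
    ("low", "educated"): (20, 10),
    ("low", "less educated"): (80, 40),
}

def well_nourished_rate(education, economic=None):
    """Share of well-nourished children, optionally within one economic stratum."""
    rows = [value for (econ, edu), value in counts.items()
            if edu == education and (economic is None or econ == economic)]
    children = sum(n for n, _ in rows)
    well = sum(w for _, w in rows)
    return well / children

# Ignoring economic status, mother's education appears to matter a lot.
crude_gap = well_nourished_rate("educated") - well_nourished_rate("less educated")

# Within each economic stratum the gap disappears in this made-up data.
gap_high = (well_nourished_rate("educated", "high")
            - well_nourished_rate("less educated", "high"))
gap_low = (well_nourished_rate("educated", "low")
           - well_nourished_rate("less educated", "low"))
print(round(crude_gap, 2), gap_high, gap_low)
```

In real data the gap would rarely vanish completely after stratifying, but comparing the crude and within-stratum differences is one way to see how much of an apparent effect may be due to the confounder.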
  • 24. Reliable but not Valid • These instruments are very RELIABLE: they both report consistently – too consistently • But neither measures what it is supposed to: • The scale is not really measuring weight • The clock is not measuring time • They are NOT VALID
  • 25. Putting Reliability and Validity Together • Every instrument can be evaluated on two dimensions: – Reliability • How consistent it is given the same conditions – Validity • If it measures what it is supposed to and how accurate it is
  • 26. Putting Reliability and Validity Together • Imagine that I have 3 fish tank thermometers, a blue one, a red one, and a green one. • The blue one always reads the same temperature no matter how hot or cold the water is. • The red one shows a different temperature every time even if I just measured it 5 seconds earlier. • The green one seems to read accurately, warm when the water is warm and cold when the water is cool.
  • 27. Complete the chart below • Blue (always reads the same temperature no matter what) – Is it consistent? Yes – Is it measuring what it is supposed to? No – Is it reliable? Is it valid? Reliable but not valid • Red (different temperature every time even if nothing has changed) – Is it consistent? No – Is it measuring what it is supposed to? No – Is it reliable? Is it valid? Not reliable, not valid • Green (warm when the water is warm and cold when the water is cool) – Is it consistent? Yes, the thermometer only changes if the temperature changes – Is it measuring what it is supposed to? Yes – Is it reliable? Is it valid? Reliable and valid
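The three thermometers can be mimicked with a small sketch. The readings, the tolerance of one degree, and the function name `classify` are all assumptions made for illustration: "reliable" is taken to mean that repeated readings at the same true temperature agree, and "valid" that the readings are also close to the true temperature.

```python
def classify(readings_by_true_temp, tolerance=1.0):
    """Map of true temperature -> repeated readings; returns (reliable, valid)."""
    # Reliable: repeated readings under the same conditions stay close together.
    reliable = all(max(readings) - min(readings) <= tolerance
                   for readings in readings_by_true_temp.values())
    # Accurate: every reading is close to the true temperature.
    accurate = all(abs(reading - true_temp) <= tolerance
                   for true_temp, readings in readings_by_true_temp.items()
                   for reading in readings)
    # An unreliable instrument is not treated as valid.
    return reliable, reliable and accurate

# Hypothetical readings taken three times each in 60-degree and 80-degree water.
blue = {60: [75, 75, 75], 80: [75, 75, 75]}
red = {60: [58, 71, 64], 80: [90, 77, 69]}
green = {60: [60, 60.5, 59.8], 80: [80, 79.6, 80.2]}

for name, readings in [("blue", blue), ("red", red), ("green", green)]:
    print(name, classify(readings))
```

The blue thermometer comes out reliable but not valid, the red one neither, and the green one both, which matches the completed chart.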
  • 28. What can be said of the reliability and validity of the following? • A spelling test with the following item: 2 + 5 = ____ – Probably reliable: if you get it wrong once you will probably get it wrong again (assuming no new learning), and the same goes for getting it right. – Lacks validity: this is more appropriate for a math test, not a spelling test • An elastic ruler (every time you use it, it stretches to a different length) – Lacks reliability – You can’t have validity without reliability • A thermometer used to measure volume – Probably reliable – Lacks validity for this task • A scale that reads 40 pounds at baseline – Reliable, will consistently be 40 lbs. off – Not valid
  • 29. VALIDITY • An important criterion for evaluating a quantitative instrument. • Validity is the degree to which an instrument measures what it is supposed to measure. • Reliability and validity are not independent qualities of an instrument. • A measuring device that is unreliable cannot possibly be valid. • Validity also bears on how widely research findings are accepted: findings are more likely to be accepted by all concerned when the research is conducted systematically.
  • 30. Types of Validity 1. Internal validity: refers to the extent to which it is possible to infer that the independent variable is truly causing or influencing the dependent variable and that the relationship between the two is not the spurious effect of an extraneous variable. Also called causal validity. 2. External validity: refers to the extent to which the results of the study can be generalized. – A study is externally valid to the extent that the sample is representative of the broader population and the study setting and experimental arrangements are representative of other environments.
  • 31. Contd. Threats to internal validity: 1. History refers to any event, other than the planned treatment event, that occurs between the pretest and posttest measurement and has an influence on the dependent variable. 2. Selection refers to how participants are selected for the various groups in the study.
  • 32. Selection contd. • Selection is not a threat for the one group design but it is a threat for the two group design. • If subjects are selected by random sampling and allocated by random assignment, all have an equal chance of being in the treatment or comparison group, and the groups can be expected to be equivalent.
  • 33. Maturation 3. Maturation is present when a physical or mental change occurs over time and it affects the participants' performance on the dependent variable. • For example, if we measure first grade students' ability to perform arithmetic problems at the beginning of the year and again at the end of the year, some of their improvement will probably be due to their natural maturation (and not just due to what we have taught them during the year). • Therefore in the one group design, we will not know if their improvement is due to the teacher or if it is due to maturation.
  • 34. Contd. • Maturation is not a threat in the two group design: as long as the people in both groups mature at the same rate, the difference between the two groups will not be due to maturation.
  • 35. Testing 4. Testing refers to any change on the second administration of a test as a result of having previously taken the test. • Did the pre-test affect the scores on the post-test? • A pre-test may sensitize participants in unanticipated ways, and their performance on the post-test may be due to the pre-test, not to the treatment, or, more likely, to an interaction of the pre-test and treatment.
  • 36. Contd. • This is a threat to the one group design. • Not a threat to the two group (intervention and control) design. • Both groups are exposed to the pre-test and so the difference between groups is not due to testing.
  • 37. Instrumentation 5. Instrumentation refers to any change that occurs in the way the dependent variable is measured in the research study. • Instrumentation is not a threat in the two group design: as long as the people in both groups are affected equally by the instrumentation effect, the difference between the two groups will not be due to instrumentation.
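The contrast between the one group and two group designs can be worked through with made-up arithmetic scores. The numbers and names below are hypothetical; the point is only that the comparison group's gain stands in for maturation, assuming both groups mature at the same rate.

```python
def mean(scores):
    return sum(scores) / len(scores)

def gain(pretest, posttest):
    """Average improvement from pretest to posttest."""
    return mean(posttest) - mean(pretest)

# Hypothetical first-grade arithmetic scores at the start and end of the year.
taught_pre, taught_post = [10, 12, 11, 9], [18, 20, 19, 17]
comparison_pre, comparison_post = [10, 11, 12, 9], [13, 14, 15, 12]

# One group design: the whole gain is attributed to teaching,
# although part of it may simply be maturation.
one_group_estimate = gain(taught_pre, taught_post)

# Two group design: the comparison group's gain approximates maturation,
# so subtracting it leaves the part attributable to teaching.
maturation_estimate = gain(comparison_pre, comparison_post)
two_group_estimate = one_group_estimate - maturation_estimate

print(one_group_estimate, maturation_estimate, two_group_estimate)
```

If the two groups matured at different rates, the subtraction would no longer isolate the effect of teaching.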
  • 38. 6. Mortality • Differential loss of participants across groups. • Did some participants drop out? Did this affect the results? • Did about the same number of participants make it through the entire study in both experimental and comparison groups? • Is a threat for any design with more than one group.
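A quick way to check for differential loss is to compare dropout rates across groups. The enrollment and completion counts below are invented, and the helper name `attrition_rate` is hypothetical.

```python
def attrition_rate(enrolled, completed):
    """Share of enrolled participants who did not finish the study."""
    if enrolled <= 0 or not 0 <= completed <= enrolled:
        raise ValueError("completed must be between 0 and enrolled")
    return (enrolled - completed) / enrolled

# Hypothetical counts for an experimental and a comparison group.
experimental_rate = attrition_rate(enrolled=50, completed=35)
comparison_rate = attrition_rate(enrolled=50, completed=47)

# A large gap between the two rates signals differential mortality.
differential = experimental_rate - comparison_rate
print(experimental_rate, comparison_rate, round(differential, 2))
```

Similar dropout rates do not rule out the threat entirely, since the people who leave each group may still differ in ways that affect the results.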
  • 39. 7. Others a. DESIGN CONTAMINATION • Did the comparison group know (or find out) about the experimental group? • Did either group have a reason to want to make the research succeed or fail? • Often, investigators must interview subjects after the experiment concludes in order to find out if design contamination occurred.
  • 40. b. Compensatory rivalry • When subjects in some treatments receive goods or services believed to be desirable and this becomes known to subjects in other groups, social competition may motivate the latter to attempt to reverse or reduce the anticipated effects of the desirable treatment levels.
  • 41. c. Resentful demoralization • If subjects learn that their group receives less desirable goods or services, they may experience feelings of resentment and demoralization. • Their response may be to perform at an abnormally low level, thereby increasing the magnitude of the difference between their performance and that of groups that receive the desirable goods or services.
  • 42. Types of Measurement (Instrument) Validity 1. Content validity 2. Criterion related validity 3. Construct validity
  • 43. CONTENT VALIDITY • Deals with whether the assessment content and composition are appropriate given what is being measured. • The content validity of an instrument is necessarily based on judgments. • There are no totally objective methods for ensuring the adequate content coverage of an instrument. • Experts in the content area are often called on to analyze the items’ adequacy in representing the content coverage of an instrument.
  • 44. Contd. • Face validity: subtype of content validity which verifies basically that the instrument gives the appearance of measuring concepts. • Here, colleagues or subjects can give their opinion about the instrument. • Consensual validity: subtype of content validity which is a process by which a panel of experts judges the validity.
  • 45. CRITERION RELATED VALIDITY • It represents the relationship between one measure and another measure of the same phenomenon. • Criterion validity is usually measured using a correlation coefficient – when the correlation is high, the tool can be considered valid • It indicates the degree to which the subject’s performance on the measurement tool and the subject’s actual behavior are related. • A correlation coefficient is computed between scores on the instrument and the criterion. • Two forms of criterion validity: concurrent and predictive validity
  • 46. Contd. • Concurrent validity uses an already existing and well-accepted measure against which the new measure can be compared. • For example, if you were developing a new pain assessment tool you would compare the ratings obtained from the new tool with those obtained using a previously validated tool. • Predictive validity measures the extent to which a tool can predict a future event of interest. • For example, does a tool developed to measure the risk of pressure sores in children in hospital in fact identify the children at risk?
  • 47. CONSTRUCT VALIDITY • This tests the link between a measure and the underlying theory. • If a test has construct validity, you would expect to see a reasonable correlation with tests measuring related areas. • Evidence of construct validity can be provided by comparing the results obtained with the results obtained using other tests, other (related) characteristics of the individual or factors in the individual’s environment which would be expected to affect test performance. • Construct validity is usually measured using a correlation coefficient – when the correlation is high, the tool can be considered valid.
  • 48. Contd. • Construct validity is based on the extent to which a test measures a theoretical construct or trait. • Constructs are specified and then interrelated with others in empirical testing • Empirical testing confirms or fails to confirm the relationship that would be predicted among concepts. • A complex process involving several studies.
  • 49. Maintaining Research Validity • Consistency of instrument with statement of the research problem, questions, objectives, hypothesis (if stated) and variables under study. • Using reliable instruments • Using random sampling method for data collection • Selecting matching groups for intervention and control groups • Controlling extraneous variables strictly • Adequate and representative sample size • Considering threats to internal and external validity.
  • 50. External Validity • It is the extent to which the results of a study can be generalized to other situations and to other people.
  • 51. THREATS TO EXTERNAL VALIDITY • “A threat to external validity is an explanation of how you might be wrong in making a generalization.” • Generally, generalizability is limited when the cause (i.e. the independent variable) depends on other factors; therefore, all threats to external validity interact with the independent variable.
  • 52. Threats to External Validity • Pre-test treatment interaction – When subjects’ reactions to a treatment are affected by exposure to a pretest • Multiple treatment interference – When subjects receive multiple treatments, effects from the first treatment may make determining the impact of the second treatment difficult • Selection treatment interaction – A problem when non-random samples are used – Example: when using volunteer subjects, what target population do they represent?
  • 53. Threats to External Validity • Specificity of variables – Refers to the idea that experiments are conducted using specific variables under specific conditions that may limit generalizability – A problem when variables are poorly operationalized – Do the experimental conditions represent reality? • Treatment diffusion – Refers to unintended information sharing – When information is shared between experimental groups in a way that affects how treatments are implemented in each group • Experimenter effects – Refers to a researcher’s influence on subjects or on how procedures are followed (e.g., was the researcher more enthusiastic with one group than with another?)
  • 54. Threats to External Validity • Reactive arrangements – Artificial environment – responding differently to a “fake” environment – Hawthorne effect – acting differently because you know you are a participant – John Henry effect – when the control group tries to “beat the treatment” because they know they are in the control group – Placebo effect – when control group subjects respond to the placebo in a manner consistent with their expectations for treatment – Novelty effect – increased response to a treatment because it is different, not better
  • 55. THREATS TO EXTERNAL VALIDITY • Failure to describe independent variables explicitly • Lack of representativeness of available and target populations • Hawthorne effect: the alteration of behavior by the subjects of a study due to their awareness of being observed. • Inadequate operationalizing of dependent variables • Sensitization/reactivity to experimental/research conditions • Interaction effects of extraneous factors and experimental/research treatments • Ecological validity • Invalidity or unreliability of instruments • Multiple treatment interference
  • 56. Contd. • Ecological validity has typically been taken to refer to whether or not one can generalize from observed behaviour in the laboratory to natural behaviour in the world. • Multiple treatment interference: When subjects receive multiple treatments, effects from the first treatment may make determining the impact of the second treatment difficult.
  • 57. HAVE A GOOD DAY