Louzel Report - Reliability & validity

  • Raagas: reliability provides the most revealing statistical index of quality that is ordinarily available for judging a measuring instrument. Lee: reliability can be jeopardized if the wording of a survey is confusing or if the survey interviewer misinterprets a question. Wiersma and Jurs: reliability refers to the consistency of the research and the extent to which studies can be replicated. Findings are reliable when various researchers using the same approach would obtain the same result.
  • High reliability indicates minimum error variance. Reliability is a necessary characteristic for validity; that is, a study cannot be valid if it lacks reliability. If a study is unreliable, we can hardly interpret its results with confidence.
  • There are different types of reliability, each of which deals with a different kind of test of consistency, and each is determined in a different manner. Stability (test-retest) reliability is important for tests used as predictors, such as aptitude tests, affective measures and questionnaire instruments, since these measures rest heavily on the assumption that scores will be stable over time. Such tests would not be useful if they produced very different scores at different times. The same test is given twice on a particular topic, providing two scores for each individual tested. The correlation between the two sets of scores yields the test-retest reliability coefficient.
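As a sketch of the computation just described: the test-retest coefficient is simply the Pearson correlation between the two administrations. The scores below are hypothetical, and the language (Python) is chosen only for illustration.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical scores for five examinees who took the same test twice,
# some weeks apart.
first_administration = [78, 85, 62, 90, 70]
second_administration = [80, 83, 65, 88, 72]

test_retest_r = pearson(first_administration, second_administration)
```

A coefficient near 1.0 indicates stable scores; a coefficient near 0.0 indicates that ranks shift between sessions.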
  • Generally, though not universally, a period of six (6) weeks is used to determine a test’s reliability.
  • If the same group takes both tests, the average score as well as the degree of score variability should be essentially the same on both tests.
  • The major problem involved with this method of estimating reliability is the difficulty of constructing two forms that are essentially equivalent.
  • One standard method of splitting a test has been to score the odd-numbered items and the even-numbered items separately; the correlation between scores on the odd and even items is then calculated. Of course, splitting a test this way means the scores on which the reliability is based come from half-length tests. To estimate the reliability of the full-length test, it is necessary to correct, or step up, the half-test correlation to a full-length correlation. Split-half: if the scale is very reliable, we would expect a person's score on one half of the scale to match the score on the other half, so the two halves should correlate strongly. The correlation between the two halves is the statistic computed in the split-half method, with large correlations being a sign of reliability. Split-half reliability works as follows: say we have an attitude-to-teaching measure consisting of 10 items. First we split the test into two halves (for example, the even and the odd items). Then we calculate respondents' scores on each half-test and see whether the two scores are related to one another. If both halves are measuring the same thing, we would expect them to be strongly related; a correlation coefficient above roughly 0.7 to 0.8 is usually expected before we can say that our test is consistent.
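The odd/even split described above can be sketched as follows; the 10-item response matrix is hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical responses: one row of 10 item scores per respondent.
responses = [
    [4, 4, 5, 4, 3, 4, 4, 3, 4, 4],
    [2, 1, 2, 2, 1, 2, 1, 2, 2, 1],
    [5, 5, 4, 5, 5, 4, 5, 5, 4, 5],
    [3, 3, 3, 2, 3, 3, 3, 3, 2, 3],
]

odd_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, 7, 9
even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, 8, 10
half_test_r = pearson(odd_half, even_half)
```

Note that `half_test_r` is the reliability of a half-length test; it still needs to be stepped up to estimate the full-length reliability.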
  • If the test is reliable, the scores on the two halves show a high positive association: an individual scoring high on one half would tend to score high on the other half, and vice versa.
  • These methods require only one administration of a test and are applicable to binary data, for example, responses to items on achievement tests for which the response to an item is either correct or incorrect.
  • Multipoint data are, for example, attitude-scale items with 5 response options. For this reason, the Cronbach Coefficient Alpha (CCA) is commonly used to estimate the consistency of attitude scales.
  • Essentially, reliability and validity establish the credibility of the research. Reliability focuses on replicability and validity focuses on the accuracy of the findings. Attaining reliability does not assure the validity of research. For example, observers could agree on the conclusions and yet the conclusions could be in error. If conclusions cannot be drawn with confidence, there are deficiencies in the research procedures and the study lacks validity.
  • Raagas: validity is important in all forms of research and all types of tests and measures. Lee: an item designed to measure customer awareness of a given service should measure awareness and not another related concept. The validation process begins with an understanding of the interpretation to be made from the tests or instruments. In other words, does the research instrument allow you to hit the "bull's eye" of your research object? Researchers generally determine validity by asking a series of questions, and will often look for the answers in the research of others.
  • The validity of your results is not guaranteed by following some prescribed procedures. As Brinberg and McGrath (1985) put it, "Validity is not a commodity that can be purchased with techniques." Instead, it depends on the relationship of your conclusions to the real world, and there are no methods that can assure you that you have adequately grasped those aspects of the ___ that you are studying. Validity is a goal rather than a product; it is never something that can be proven or taken for granted. Validity is also relative: it has to be assessed in relation to the purposes and circumstances of the research, rather than being a context-independent property of methods or conclusions. Finally, validity threats are made implausible by evidence, not methods; methods are only a way of getting evidence that can help you rule out these threats. In the simplest terms, findings are valid when the researcher can draw meaningful inferences from instruments that measure what they intend to measure, so validity basically means "measuring what you think you're measuring."
  • Validity has 3 distinct aspects, all of which are important. Content validity is invoked when the argument is made that the measurement self-evidently reflects or represents the various aspects of the phenomenon being researched; this faith is supported by little more than common sense. Content validity refers to the degree to which adequate data are collected as they relate to the construct being measured. For example, a survey measuring the image of the library in the community would need to include questions that cover the opinions, attitudes and knowledge relevant to the study. Content validity is usually not measured quantitatively but is derived from the researchers' knowledge of the community and the relevant theoretical literature. The items in your questionnaire must relate to the construct being measured; for example, a questionnaire measuring the effectiveness of a teacher is useless if it contains items relating to ____. This validity is achieved when items are first selected: don't include items that are blatantly very similar to other items, and ensure that questions cover the full range of the construct. Content validity is the extent to which the content of the test items reflects whatever is under study.
  • Content validity is determined by expert judgement of item validity and sampling validity, not by statistical means.
  • Criterion-related validity measures the ability of a survey instrument or question to predict or estimate. While not often employed in library-related surveys, criterion-related validity is vital in areas such as pre-election political surveys. Criterion validity is closely related to theory: when you are developing a measure, you usually expect it, in theory at least, to be related to other measures or to predict certain outcomes. For example, if we develop a new mathematics test, we would expect the scores pupils achieve on that test not to be totally unrelated to those they get on a state-mandated mathematics test. Two things are needed to establish criterion validity: a good knowledge of theory relating to the concept, so that we can decide what variables we can expect it to predict and be related to; and a measure of the relationship between our measure and those factors. This is whether the questionnaire is measuring what it claims to measure. In an ideal world, you could assess this by relating scores on each item to real-world observations (comparing scores on sociability items with the number of times a person actually goes out to socialize). This is often impractical, and so there are other techniques, such as: 1. use the questionnaire in a variety of situations and see how predictive it is; 2. see how well it correlates with other known measures of your construct (sociable people might be expected to score highly on extroversion scales); 3. statistical techniques such as the Item Validity Index.
  • CONCURRENT VALIDITY: The question here is whether scores on your instrument agree with scores on other factors you would expect to be related to it. For example, if you were to measure attitudes to school, you would, from theory, expect some relationship with school achievement. Likewise, when designing a measure of pupil learning in geography, you would expect a relationship with scores on previously existing measures of learning in that subject. PREDICTIVE VALIDITY: Refers to whether or not the instrument you are using predicts the outcomes you would theoretically expect it to. For example, when we select students to study on our university courses, we use their scores on specific tests to determine whether or not they are likely to successfully complete the course and are therefore suitable candidates. Any test we use for this purpose should therefore predict academic success. Likewise, whenever we develop a screening test for the selection of employees, we expect this test to predict how well the prospective employee will do the job.
  • The third type of validity, construct validity, is used to "measure or infer the presence of abstract characteristics for which no empirical evidence seems possible." Construct validity can be one of the more difficult psychometric aspects to measure. It draws upon the skill and knowledge of the researcher when forming questions for the survey instrument, and it is a slightly more complex issue relating to the internal structure of an instrument and the concept it is measuring. One way to assess it is to test whether the measure confirms hypotheses generated from the underlying theory; however, the theory itself may be incorrect, making this approach hazardous. This is one reason why little construct validation is attempted in some research. A more significant reason is the lack of well-established measures that can be used in a variety of circumstances; instead, some researchers tend to develop measures for each specific problem or survey and rely on content validity. The term construct refers to the theoretical construct or trait being measured, not to the technical construction of the test items. A construct is a postulated attribute or structure that explains some phenomenon, such as an individual's behavior. Because constructs are abstract and are not considered to be real objects or events, they are sometimes called hypothetical constructs.
  • How might you be wrong? What are the plausible alternative explanations and validity threats to the potential conclusions of your study, and how will you deal with them? How do the data that you have, or that you could collect, support or challenge your ideas about what's going on? Why should we believe the results? How will we know that the conclusions are valid? Group threats: if our experimental and control groups were different to start with, we might merely be measuring those differences, for example using government employees for one group and businessmen for the other, or professionals for one group and elementary students for another. Regression to the mean: if the participants produce extreme scores on a pre-test (either very high or very low), by chance they are likely to score closer to the mean on a subsequent test, regardless of anything the experimenter does to them. This is called regression to the mean, and it is particularly a problem for any real-world study that investigates the effects of some policy or measure that has been introduced in response to a perceived problem. Time threats: with the passage of time, events may occur which produce changes in our participants' behavior; we have to design our study carefully so that these changes are not mistakenly regarded as consequences of our experimental manipulations.
  • History: events in the participants' lives which are entirely unrelated to our manipulations of the independent variable may have fortuitously given rise to changes similar to those we were expecting. Maturation: participants, especially young ones, may change simply as a consequence of development; these changes may be confused with changes due to manipulations of the independent variable the experimenter is interested in. Pre-test and post-test: any observed change in the dependent variable might be due to a reaction to the pre-test. The pre-test might cause fatigue, provide practice, or even alert the participants to the purpose of the study, which may then affect their performance on the post-test. Reactivity and experimenter effects: measuring a person's behavior may itself affect their behavior, for a variety of reasons; people's reaction to having their behavior measured may cause them to change their behavior.
  • Characteristics of the subjects/respondents of the study: intelligence (those with higher intelligence learn the material faster than those with a low IQ); experience or preparation for the topic being introduced; socio-economic status. The researcher's personal characteristics also matter: in the case of surveys, interviews conducted by researchers may be unduly affected by their capacity to establish rapport with respondents. It is therefore important to exert care in determining who will be involved in the research activity. Investigators in experiments as well as surveys can be screened for their capacity to steer the process without affecting respondents. This is why training is very important; not everyone is cut out to conduct good interviews.
  • Addressing issues of reliability and validity in your survey will help make the survey process a useful and successful endeavor. Anyone can distribute a survey, but it takes forethought and planning to derive results that will be useful for marketing purposes. Reliability is a necessary but not sufficient condition for a valid survey: a test or measuring instrument could be reliable but not valid, in which case it would be consistently measuring something for which it was not intended. However, a test must be reliable to be valid; if it is not consistent in what it measures, it cannot be measuring that for which it is intended, and if it doesn't measure something consistently, it is not going to be useful or valid. A measure is reliable to the extent that it is free from unsystematic sources of error. If a scale measures your weight correctly, then it is both reliable and valid. If it consistently overstates your weight by six pounds, then the scale is reliable but not valid. If the scale measures erratically from time to time, then it is not reliable and therefore cannot be valid. Reliability and validity can be enhanced by planning the survey research process. We could have caught the confusion around the concept of service if we had adequately pretested the survey. A pretest is a trial version of the survey administered to a similar population; typically, as part of the pretest, respondents are interviewed about each question. The definitions of reliability and validity in quantitative research reveal two strands: firstly, with regard to reliability, whether the result is replicable; secondly, with regard to validity, whether the means of measurement are accurate and whether they actually measure what they are intended to measure.
  • Reliability and validity are conceptualized as trustworthiness, rigor and quality in the qualitative paradigm. It is also through this association that the way to achieve validity and reliability of a research is affected by the qualitative researchers' perspectives, which are to eliminate bias and increase the researcher's truthfulness of a proposition about some social phenomenon (Denzin, 1978) using triangulation. Triangulation is defined as "a validity procedure where researchers search for convergence among multiple and different sources of information to form themes or categories in a study" (Creswell & Miller, 2000, p. 126). Therefore, reliability, validity and triangulation, if they are to be relevant research concepts, particularly from a qualitative point of view, have to be redefined in order to reflect the multiple ways of establishing truth.
  • Another way to make an instrument more reliable is to measure the construct with more than one item. When we use more than one item, individual errors that a respondent can make when answering a single item (misreading or misinterpreting a question) cancel each other out; that is why we construct scales. In general, more items mean higher reliability, but we don't want to take this to extremes: respondents can get bored if you keep asking them what seem like similar questions, and that will increase the risk of measurement error rather than reducing it. Also, we want to keep survey instruments short, and if we use scales with a lot of items, we won't be able to ask about many different things. Measure a construct that is very clearly, even narrowly, defined; this may in some cases conflict with validity (are we measuring our concepts too narrowly?). Obviously, we want to try to create measurements that are both reliable and valid.
  • One standard method of splitting a test has been to score the odd-numbered items and the even-numbered items separately; the correlation between scores on the odd- and even-numbered items is then calculated. Of course, splitting a test in this way implies that the scores on which the reliability is based come from half-length tests. To obtain an estimate of the reliability of the full-length test, it is necessary to correct, or step up, the half-test correlation to a full-length correlation. Since longer tests tend to be more reliable, and since split-half reliability represents the reliability of a test only half as long as the actual test, a correction formula needs to be applied to determine the reliability of the whole test. The standard correction is the Spearman-Brown formula: the reliability of the whole test is 2r / (1 + r), where r is the reliability coefficient computed from the odd- and even-numbered half-tests.
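A minimal sketch of this step-up correction (the Spearman-Brown formula for a test of doubled length):

```python
def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)

# A half-test correlation of 0.6 steps up to 2 * 0.6 / 1.6 = 0.75.
full_test_r = spearman_brown(0.6)
```

The correction always raises the coefficient (for positive r), reflecting the fact that longer tests tend to be more reliable.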
  • To overcome this problem, Cronbach suggested splitting the data in half in every conceivable way and computing a correlation coefficient for each split. The average of these values is known as Cronbach's alpha, which is the most common measure of scale reliability.
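In practice, Cronbach's alpha is computed directly from item and total-score variances with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of totals), rather than by enumerating every split. A sketch with hypothetical 5-point attitude data:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per item, aligned by respondent."""
    k = len(items)
    # Each respondent's total score across all items.
    totals = [sum(respondent) for respondent in zip(*items)]
    sum_item_var = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))

# Hypothetical: three 5-point attitude items answered by four respondents.
items = [
    [3, 4, 2, 5],
    [3, 5, 1, 4],
    [4, 4, 2, 5],
]
alpha = cronbach_alpha(items)
```

Values above roughly 0.7 are conventionally taken to indicate acceptable internal consistency.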
  • When we measure internal consistency or test-retest reliability, we may find that our test is not in fact reliable enough. Then we need to see whether we can pinpoint any particular item as being "at fault". When looking at internal consistency, we can look at how strongly each individual item correlates with the scale score; any items that are weakly related to the test as a whole lower our reliability and should be removed from the instrument. When looking at test-retest reliability, we can identify items that respondents score very differently on at our two test times; these cause lower reliability. The problem with the split-half method is that there are several ways in which a set of data can be split into two, and so the results might stem from the way in which the data were split. Split-half is the simplest statistical technique: it randomly splits the questionnaire items into two groups, and a score for each participant is then calculated based on each half of the scale. If the scale is very reliable, we would expect a person's score to be the same on one half of the scale as on the other, and so the two halves should correlate strongly.

    1. Tools of Research: Reliability and Validity (Presenter: Louzel M Linejan)
    2. Sequence of Discussion:
       - Reliability
         - definition
         - methods of estimating reliability
       - Validity
         - definition
         - forms of validity
         - factors affecting validity
       - Reliability vis-à-vis Validity
    3. Reliability
    4. Reliability
       ● refers to how consistently data are collected (Lee, 2004)
       ● the degree to which a test consistently measures whatever it measures and indicates the consistency of the scores produced (Raagas, 2009)
       ● the extent to which results are consistent over time and an accurate representation of the total population under study (Joppe, 2000)
       ● concerns the replicability and consistency of the methods, conditions and results (Wiersma and Jurs, 2005)
    5. Reliability
       ● expressed numerically, usually as a coefficient ranging from 0.0 to 1.0. If a test is perfectly reliable, the reliability coefficient is 1.0, meaning each respondent's score perfectly reflects his or her true status with respect to the variable being measured.
       ● no test is perfectly reliable; scores are invariably affected by errors of measurement resulting from a variety of causes.
    6. Methods of Estimating Reliability
       1. Stability (also called Test-Retest Reliability)
       - the degree to which results/scores on the same test are consistent over time. The more similar the scores on the test over time, the more stable or consistent they are.
       - indicates score variation that occurs from one testing session to another.
       - provides evidence that scores obtained on a test at one time (test) are the same, or close to the same, when the test is readministered at some other time (retest).
    7. Methods of Estimating Reliability
       1. Stability (also called Test-Retest Reliability)
       The procedure for determining test-retest reliability is basically simple:
       - Administer the test to an appropriate group
       - After some time has passed, administer the same test to the same group
       - Correlate the two sets of scores
       - Evaluate the results
    8. Methods of Estimating Reliability
       1. Stability (also called Test-Retest Reliability)
       Disadvantage: the difficulty of knowing how much time should elapse between the two testing sessions.
       - If the interval is too short, the chances of the subjects remembering responses made on the first test are increased, and the estimate of reliability tends to be artificially high.
       - If the interval is too long, the respondents' test performance may improve due to intervening learning or maturation, and the estimate of reliability tends to be artificially low.
    9. Methods of Estimating Reliability
       2. Equivalence (or Equivalent Forms)
       - Two tests that are identical, except for the actual items included.
       - The two forms measure the same variable, have the same number of items, the same structure, the same difficulty level, and the same directions for administration, scoring and interpretation.
       - If there is equivalence, the two tests can be used interchangeably. The correlation between scores on the two forms yields an estimate of their reliability.
    10. Methods of Estimating Reliability
       2. Equivalence (or Equivalent Forms)
       The procedure for determining equivalent-forms reliability is similar to that for determining test-retest reliability:
       - Administer one form of the test to an appropriate group
       - After some time, or shortly thereafter, administer the second form of the test to the same group
       - Correlate the two sets of scores
       - Evaluate the results
    11. Methods of Estimating Reliability
       3. Internal Consistency Reliability (Methods of Internal Analysis)
       - a commonly used form of reliability which deals with one test at a time. It is obtained through Split-Half, Kuder-Richardson and Cronbach Coefficient Alpha methods, each of which provides information about the consistency among the items in a single test.
       - applicable to instruments that have more than one item, as it refers to how homogeneous the items of a test are, or how well they measure a single construct.
    12. Methods of Estimating Reliability
       3. Internal Consistency Reliability
       a. Split-Half Reliability
       - A common approach is to split a test into two reasonably equivalent halves. These independent subtests are then used as the source of the two independent scores needed to estimate reliability.
       - the simplest statistical technique; it randomly splits the questionnaire items into two groups, and a score for each participant is then calculated based on each half of the scale.
    13. Methods of Estimating Reliability
       a. Split-Half Reliability
       This procedure requires only one administration of the test. Test items are divided into two halves, and the two halves are then scored independently. The problem with this method is that there are several ways in which a set of data can be split into two, and so the results might stem from the way in which the data were split.
    14. Methods of Estimating Reliability
       3. Internal Consistency Reliability
       b. Kuder-Richardson
       Kuder and Richardson developed two of the most widely accepted methods for estimating reliability: the K-R20 and K-R21. These estimate internal consistency reliability by determining how all items in a test relate to all other test items and to the whole test. They are useful for true-false and multiple-choice items.
    15. Methods of Estimating Reliability
       3. Internal Consistency Reliability
       b. Kuder-Richardson
       K-R20: most advisable if the "proportion of correct responses to a particular item" varies a lot; provides the mean of all possible split-half coefficients.
       K-R21: most advisable if the items do not vary much in difficulty, i.e., the "proportion of correct responses to a particular item" is more or less similar; may be substituted for K-R20 if it can be assumed that item difficulty levels are similar.
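The K-R20 coefficient can be sketched with the standard formula KR20 = k/(k-1) * (1 - sum of p*q / variance of totals), where p is the proportion answering an item correctly and q = 1 - p. The 0/1 response data below are hypothetical.

```python
from statistics import mean, pvariance

def kr20(items):
    """items: one list of 0/1 (incorrect/correct) responses per item,
    aligned by examinee."""
    k = len(items)
    # Each examinee's total number of correct answers.
    totals = [sum(examinee) for examinee in zip(*items)]
    # p * q for each item, summed across items.
    sum_pq = sum(mean(item) * (1 - mean(item)) for item in items)
    return k / (k - 1) * (1 - sum_pq / pvariance(totals))

# Hypothetical 4-item true-false test taken by five examinees.
items = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
]
reliability = kr20(items)
```

K-R20 is algebraically equivalent to Cronbach's alpha restricted to binary items, which is why both appear under internal consistency.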
    16. Methods of Estimating Reliability
       3. Internal Consistency Reliability
       c. Cronbach Coefficient Alpha
       - used only if the item scores are other than 0 and 1; advisable for essay items, problem solving and 5-point scale items.
       - based on two or more parts of the test; requires only one administration of the test.
    17. Validity
    18. Validity
       ● the degree to which a test measures what it is supposed to measure and, consequently, permits appropriate interpretations of test scores (Raagas, 2009)
       ● determines whether the research truly measures what it was intended to measure, or how truthful the research results are (Joppe, 2000)
       ● refers to the ability of the survey questions to accurately measure what they claim to measure (Lee, 2004)
       ● answers the question: are we measuring what we want to measure? (Muijs, 2004)
    19. Validity
       ● Validity is expressed in three degrees: highly valid, moderately valid, and generally invalid.
       ● The validation process begins with an understanding of the interpretation to be made from the tests or instruments.
    20. Forms of Validity
       ● CONTENT VALIDITY
       ● CONSTRUCT VALIDITY
       ● CRITERION-RELATED VALIDITY
    21. Content Validity
       - the degree to which a test measures an intended content area
       - the degree to which adequate data are collected as they relate to the construct being measured
       - establishes the representativeness of the items with respect to whatever is being measured
       - also invoked when the argument is made that the measurement self-evidently reflects or represents the various aspects of the phenomenon being researched
    22. Content Validity
       Requires Item Validity and Sampling Validity
       Item Validity - concerned with whether the test items are relevant to the intended content area
       Sampling Validity - concerned with how well the test sample represents the total content area
    23. Criterion-related Validity
       - Establishes validity through comparison with a criterion: a standard by which the validity of the test is judged. If the scores of the measure being validated relate highly to the criterion, the measure is valid; if not, the measure is not valid for the purpose for which the criterion measure is used.
       - Measures the ability of a survey instrument or question to predict or estimate.
       - Concerns whether the questionnaire is measuring what it claims to measure.
    24. Criterion-related Validity
       Concurrent Validity
       - degree to which the scores on a test are related to scores on another test administered at the same time
       - whether or not the test scores estimate a specified present performance
       - based on establishing an existing situation: "What is?"
       Predictive Validity
       - degree to which scores on a test are related to scores on another test administered in the future
       - whether or not the test scores predict a specified future performance
       - based on establishing: "What is likely to happen?"
    25. Construct Validity
       - seeks to determine whether the construct underlying a variable is actually measured
       - determined by a series of validation studies that can include content and criterion-related approaches
       - both confirmatory and disconfirmatory evidence are used
       - The extreme difficulty of this kind of validation lies in the unobservable nature of many of the constructs (such as social class, personality, attitudes, etc.) used to explain behavior.
       - One way to assess construct validity is to test whether or not the measure confirms hypotheses generated from the theory based on the concepts.
    26. Factors affecting Validity
       Threats arise when researchers draw incorrect inferences from the sample to people, settings or situations not sufficiently related to the sample, such as a different racial, ethnic or socioeconomic group (Creswell, 2003).
       - Group threats: if our experimental and control groups have wide and extensive differences.
       - Regression to the mean: if the participants produce extreme scores on a pre-test (either very high or very low).
       - Time threats: with the passage of time, events may occur which produce changes in our participants' behavior.
27. 27. <ul><li>Respondent’s History – events in the participants’ lives which are entirely unrelated to our manipulation of the variables. </li></ul><ul><li>Maturation – participants, especially young ones, may change simply as a consequence of development. </li></ul><ul><li>Reactivity and Experimenter Effects – measuring a person’s behavior may itself affect that behavior, for a variety of reasons: people’s reaction to having their behavior measured may cause them to change it. </li></ul>Factors affecting Validity
28. 28. <ul><li>Instrumentation – instruments may introduce systematic error if they are not carefully planned. Lack of specificity in assessing certain variables could lead to varying interpretations by respondents. </li></ul><ul><li>Characteristics of Subjects / Respondents of the study – respondents in surveys may bear basic personal characteristics that could influence their response to particular factors or variables of the study. </li></ul><ul><li>Researcher’s personal characteristics – researchers may influence the way subjects in experiments or respondents in a survey will respond. </li></ul>Factors affecting Validity
29. 29. Reliability and Validity Suppose the reported reliability coefficient for a test was 0.24 – definitely not good. Would this tell us something about the validity of the test? Yes, it would: it would show that the validity cannot be high, because a valid test must be reliable. What if a test is so hard that no respondent could answer even a single item? Scores would still be consistent, but not valid. A test that measures what it is supposed to measure is valid (and must also be reliable), but a reliable test can consistently measure the wrong thing and be invalid.
30. 30. Reliability and Validity Reliability is necessary but not sufficient for establishing validity: a valid test is always reliable, but a reliable test is not always valid. What if the reported reliability was 0.92, which is definitely high? Would this tell us anything about validity? Not really. It would only indicate that the test’s validity might also be high, but not necessarily; the test could be consistently measuring the wrong thing.
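The “reliable but not valid” case can be seen in a toy illustration (the numbers are hypothetical): a weighing scale that always reads 5 kg too heavy agrees with itself perfectly across occasions, so it is highly reliable, yet every reading is wrong, so it is not valid.

```python
# A weighing scale with a constant 5 kg bias: reliable, but not valid.
true_weights = [60.0, 72.5, 81.0, 95.0]
day1 = [w + 5.0 for w in true_weights]  # the bias is the same every time...
day2 = [w + 5.0 for w in true_weights]  # ...so repeated measurements agree

print(day1 == day2)                     # the two occasions are perfectly consistent
errors = [r - t for r, t in zip(day1, true_weights)]
print(errors)                           # yet every reading is systematically wrong
```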
    31. 31. Thank You!
32. 32. - Ensure that the questions we ask are clear and unambiguous; clear, unambiguous questions are likely to be more reliable, and the same goes for items on a rating scale used by observers. - Another way to make an instrument more reliable is to measure the construct with more than one item. - Ensure that the dependent variable is measured as precisely as possible. What can we do to make our instruments more reliable?
33. 33. a. Split-Half Reliability When we need to predict the reliability of a test twice as long as the given half, as in the split-halves method, the Spearman-Brown formula is used: r_whole = 2 r_half / (1 + r_half). Methods of Estimating Reliability The problem with this method is that there are several ways in which a set of data can be split into two, so the results might depend on the way in which the data were split.
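Assuming an odd-even split of dichotomously (0/1) scored items, the method can be sketched in pure Python: correlate the two half-test scores, then step the result up to full test length with the Spearman-Brown correction. The score matrix below is hypothetical:

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def split_half_reliability(scores):
    """scores: rows = examinees, columns = 0/1 item scores.
    Odd-even split, then Spearman-Brown correction to full length."""
    odd  = [sum(row[0::2]) for row in scores]  # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)           # Spearman-Brown

# Hypothetical 5 examinees x 6 dichotomous items
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
print(round(split_half_reliability(scores), 3))
```

A different split (e.g., random halves) would generally give a different estimate, which is exactly the weakness the slide points out.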
34. 34. b. Kuder-Richardson   K-R20 = most advisable if the p values (item difficulty indices) vary a lot Methods of Estimating Reliability K-R21 = most advisable if the items do not vary much in difficulty, i.e., the p values are more or less similar
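Using the standard definitions (KR-20: r = (k/(k-1))(1 - Σ p_j q_j / σ²_total); KR-21 replaces Σ p_j q_j with the approximation M(k - M)/k, where M is the mean total score), both can be sketched in pure Python. The 0/1 score matrix is hypothetical:

```python
from statistics import pvariance

def kr20(scores):
    """scores: rows = examinees, columns = dichotomous (0/1) items."""
    n, k = len(scores), len(scores[0])
    var_total = pvariance([sum(row) for row in scores])
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n   # proportion passing item j
        pq += p * (1 - p)                       # p * q for item j
    return (k / (k - 1)) * (1 - pq / var_total)

def kr21(scores):
    """Approximation of KR-20 that assumes all items are equally difficult."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    m = sum(totals) / len(totals)               # mean total score
    return (k / (k - 1)) * (1 - m * (k - m) / (k * pvariance(totals)))

# Hypothetical 5 examinees x 6 items
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
print(round(kr20(scores), 3), round(kr21(scores), 3))
```

Because the p values in this matrix vary a lot, KR-21 comes out noticeably lower than KR-20, matching the advice on the slide: KR-21 is only trustworthy when item difficulties are similar.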
35. 35. c. Cronbach Coefficient Alpha   used when the item scores are other than 0 and 1. This is advisable for essay items, problem-solving tasks and 5-point scaled items. The coefficient is α = (k / (k - 1)) (1 - Σ s_i² / S²), where s_i = standard deviation of a single test item and S = standard deviation of the total score of each examinee. Methods of Estimating Reliability
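With the slide’s notation (s_i for an item’s standard deviation, S for that of the total scores), alpha is α = (k/(k-1))(1 - Σ s_i² / S²); for 0/1 items it reduces to KR-20. A pure-Python sketch with hypothetical 5-point ratings:

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """scores: rows = examinees, columns = item scores (any scale)."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]  # s_i^2
    total_var = pvariance([sum(row) for row in scores])                    # S^2
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 4 examinees x 3 items rated on a 1-5 scale
ratings = [
    [5, 4, 5],
    [4, 4, 4],
    [3, 3, 4],
    [2, 2, 3],
]
print(round(cronbach_alpha(ratings), 3))
```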
36. 36. <ul><li>Internal Reliability – consistency in the research process. It can be addressed in a number of ways: much of qualitative research involves observation by multiple observers as part of data gathering, and it relies on the logical analysis of the results. </li></ul><ul><li>External Reliability – asks the question “Are the findings generalizable?” </li></ul>Reliability
37. 37. <ul><li>Internal Validity – the cause-and-effect inference linking the independent variable and the dependent variable. To support this inference, the researcher must be confident that factors such as extraneous variables have been controlled and are not producing an effect that is mistaken for the experimental treatment effect. </li></ul><ul><li>External Validity – deals with the generalizability of the results of the study. To what populations, variables, situations and so forth do the results generalize? Generally, the more extensively the results can be generalized, the more useful the research, provided that there is adequate validity. </li></ul>Validity
