13.
Test/Retest Reliability Estimates
A second problem with the Test/Retest method is the length of time required to conduct the two test administrations.
A short delay between Time 1 and Time 2 increases the potential for carry-over effects due to memory, fatigue, practice, etc.
But a long delay between Time 1 and Time 2 increases the potential for carry-over effects due to mood, developmental change, etc.
Consequently, the Test/Retest method is most appropriate in contexts wherein the test is not susceptible to carry-over effects.
18.
Internal Consistency Estimates of Reliability
We have seen that reliability estimates can be obtained by administering the same test twice to the same examinees and correlating the results: Test/Retest.
We have also seen that reliability estimates can be obtained by administering two parallel or alternate forms of a test and then correlating those results: Parallel- & Alternate-Forms.
In both of the above cases, the researcher must administer two exams, and they are sometimes given at different times, making them susceptible to carry-over effects.
Here, we will see that it is possible to obtain a reliability estimate using only a single test.
The most common way to obtain a reliability estimate using a single test is through the Split-Half approach.
19.
Split-Half approach to Reliability
When using the Split-Half approach, one gives a single test to a group of examinees.
Later, the test is divided into two parts, which may be considered to be alternate forms of one another.
• In fact, the split is not so arbitrary; an attempt is made to choose the two halves so that they are parallel or essentially τ-equivalent.
• If the halves are considered parallel, then the reliability of the whole test is estimated using the Spearman-Brown formula.
• If the halves are essentially τ-equivalent, then coefficient α can be used to estimate reliability.
22.
Split-Half approach to Reliability
On the other hand, the two test halves may not be (and likely are not) parallel forms.
This is confirmed when it is determined that the two halves have unequal variances.
In these situations, it is best to use a different approach to estimating reliability.
• Cronbach’s coefficient α can be used to estimate the reliability of the entire test.
23.
Split-Half approach to Reliability
If the test halves are not essentially τ-equivalent, then coefficient α will give a lower bound for the test’s reliability.
• In other words, the test’s reliability must be greater than, or equal to, the value produced by Cronbach’s α.
• If α is a high value, then you know that the test reliability is also high.
• If α is a low value, then you may not know whether the test actually has low reliability or whether the halves of the test are simply not essentially τ-equivalent.
26.
Split-Half approach to Reliability
If the variances of both test halves are equal, then the Spearman-Brown formula and Cronbach’s α will produce identical results.
If the variances of the two test halves are equal, but the halves are not essentially τ-equivalent, then both the Spearman-Brown formula and Cronbach’s α will underestimate the test’s reliability.
• Lower bound estimate
If the observed-score variances of the test halves are equal and the halves are essentially τ-equivalent, then the Spearman-Brown formula and Cronbach’s α will both equal the test’s reliability.
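The agreement between the two estimates can be checked numerically. The following is a minimal sketch, assuming the usual split-half forms of the two formulas (Spearman-Brown: 2r / (1 + r), where r is the correlation between the halves; two-part α: 2 × (1 − (var(half 1) + var(half 2)) / var(total))). The function names and the half scores are hypothetical and chosen only for illustration.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5


def spearman_brown_split_half(half1, half2):
    """Whole-test reliability estimated from the correlation of two halves."""
    r = pearson(half1, half2)
    return 2 * r / (1 + r)


def alpha_two_halves(half1, half2):
    """Coefficient alpha when the test is split into two components."""
    total = [a + b for a, b in zip(half1, half2)]
    return 2 * (1 - (pvariance(half1) + pvariance(half2)) / pvariance(total))


# Made-up half scores for five examinees (assumption, not data from the slides).
half_a = [1, 2, 3, 4, 5]
half_b_equal_var = [2, 1, 4, 3, 5]       # same variance as half_a
half_b_unequal_var = [4, 2, 8, 6, 10]    # same correlation, larger variance

sb_equal = spearman_brown_split_half(half_a, half_b_equal_var)
alpha_equal = alpha_two_halves(half_a, half_b_equal_var)
sb_unequal = spearman_brown_split_half(half_a, half_b_unequal_var)
alpha_unequal = alpha_two_halves(half_a, half_b_unequal_var)

print(sb_equal, alpha_equal)      # the two estimates coincide
print(sb_unequal, alpha_unequal)  # alpha falls below Spearman-Brown
```

With equal half variances the two estimates coincide; once the variances differ, α drops below the Spearman-Brown value in this example.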
27.
Split-Half approach to Reliability
Obviously, the major advantage to using internal-consistency reliability estimates is that the test need only be given once to obtain such an estimate.
Naturally, this approach is limited to tests that can be divided into two parts that are either parallel or essentially τ-equivalent. It cannot be used when no such division is possible, or when the test lacks independent items that can be separated from one another.
• In these situations, one must use test/retest, parallel- or alternate-forms reliability approaches.
Assuming one is able to use the Split-Half approach, however, how does one go about forming two test halves?
28.
Split-Half approach to Reliability
Forming Test Halves:
There are 3 commonly used methods for forming test halves:
1. The Odd/Even method
2. The Ordered method
3. The Matched Random Subsets method
29.
Odd/Even approach to Test Halves
The Odd/Even method classifies items by their order on the test, whether odd-numbered or even-numbered.
• In other words, all odd-numbered test items form the first half, and all even-numbered test items form the second half.
After the two halves are formed, a score for each half is obtained for each examinee.
These scores are used to obtain an estimate of reliability.
This is a fairly simple and straightforward approach to forming two test halves.
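The odd/even split can be sketched in a few lines. This is an illustrative example only: the function name and the 0/1 item responses are hypothetical, and items are assumed to be stored in test order with item 1 first.

```python
def odd_even_half_scores(item_scores):
    """Return (odd-half scores, even-half scores), one score per examinee.

    item_scores holds one list per examinee, with item 1 in position 0.
    """
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return odd, even


# Made-up right/wrong (1/0) responses of three examinees to a six-item test.
responses = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
]

odd_scores, even_scores = odd_even_half_scores(responses)
print(odd_scores)   # [3, 2, 1]
print(even_scores)  # [2, 0, 1]
```

The two score lists are what would then be correlated, or fed into coefficient α, to estimate reliability.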
30.
Ordered approach to Test Halves
The Ordered method requires that a test be divided prior to its administration.
From this point, there are multiple approaches to administering the Ordered method.
1. Every examinee can be given the same test, and then one can compare scores from the first half to scores from the second half.
   • Carry-over effects may be a concern.
2. The halves are labeled, say A and B, and are then given in different orders to different examinees.
   • In other words, half the examinees will be randomly assigned order A-B, and the other half will be assigned order B-A.
The Ordered method is generally considered to be less satisfactory than the Odd/Even method because of the increased potential for carry-over effects.
31.
The Matched Random Subsets approach to Test Halves
The Matched Random Subsets method is much more sophisticated than the two aforementioned methods.
This process involves several steps:
1. For each test item, two statistics are computed:
   • The proportion of examinees passing the item – a measure of the item’s “difficulty.”
   • The biserial or point-biserial correlation between the item score and the total test score.
2. Each item is plotted on a graph using the above two statistics.
   • Items that are close together on the graph are paired, and one item from each pair is randomly assigned to one half of the test.
   • The remaining items form the other half of the test.
32.
The Matched Random Subsets approach to Test Halves
For example, in the graphic above, we see the plot of test items A, B, C, D, E & F.
Test items A & B are similar, and are therefore grouped. Likewise, C is grouped with D, and E with F.
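The pairing and random assignment can be sketched as follows. This is only one possible reading of "close together on the graph": it assumes a greedy rule that repeatedly pairs the two remaining items with the smallest straight-line distance between their (difficulty, item-total correlation) points. The function name, the pairing rule, and the item statistics for A through F are all hypothetical.

```python
import random
from math import dist


def matched_random_halves(item_stats, seed=0):
    """Pair nearby items, then randomly send one of each pair to each half.

    item_stats maps an item label to (proportion passing, item-total r).
    A leftover item (odd item count) is left unassigned in this sketch.
    """
    rng = random.Random(seed)
    remaining = dict(item_stats)
    half_1, half_2 = [], []
    while len(remaining) >= 2:
        names = sorted(remaining)
        # Assumed rule: take the closest pair among the items still unpaired.
        pair = min(
            ((a, b) for i, a in enumerate(names) for b in names[i + 1:]),
            key=lambda p: dist(remaining[p[0]], remaining[p[1]]),
        )
        first, second = rng.sample(pair, 2)  # random order within the pair
        half_1.append(first)
        half_2.append(second)
        for name in pair:
            del remaining[name]
    return half_1, half_2


# Made-up statistics for six items (assumption, not values from the slides).
stats = {
    "A": (0.90, 0.30), "B": (0.88, 0.32),
    "C": (0.60, 0.50), "D": (0.62, 0.48),
    "E": (0.30, 0.20), "F": (0.33, 0.22),
}

half_1, half_2 = matched_random_halves(stats)
print(half_1, half_2)
```

Which member of each pair lands in which half depends on the random seed; only the pairing (A with B, C with D, E with F) is fixed by the statistics.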
33.
Internal-Consistency Reliability – The General Case
In our previous examples, we divided a given test into two equal halves.
Here, we examine dividing a given test into multiple equal components.
Even in these cases, we can apply the basic principles of each of the methods for dividing a test.
• For example, the odd/even method can be modified to divide a nine-item test into thirds by taking every third item in a sequence to form a given component, etc.
• The Matched Random Subsets method would involve forming triplets, rather than pairs; the first item of each triplet is then randomly assigned to one component, the next to another, and so on.
34.
Internal-Consistency Reliability – The General Case
Let us assume that a given test is divided into N components.
The variances of the scores on each component and the variance of the scores on the entire test are used to estimate the reliability of the test.
If the components are essentially τ-equivalent, then the formulas presented herein will provide good estimates of the test’s reliability.
If, however, the components are not essentially τ-equivalent, then the formulas presented herein will underestimate (i.e., provide a lower bound for) the test’s reliability.
Furthermore, it is important that any test divided into components measure only a single trait (i.e., be homogeneous in content).
• Intelligence tests are a classic example of a heterogeneous test, because they measure a broad spectrum of traits.
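As an illustration of how the component variances and the total-test variance combine, here is a minimal sketch that assumes the standard form of coefficient α for N components: α = N / (N − 1) × (1 − Σ var(component) / var(total)). The function name and the component scores are hypothetical.

```python
from statistics import pvariance


def coefficient_alpha(component_scores):
    """Coefficient alpha from N lists of component scores.

    component_scores holds one list per component; position i in every
    list belongs to examinee i.
    """
    n = len(component_scores)
    totals = [sum(scores) for scores in zip(*component_scores)]
    sum_component_var = sum(pvariance(c) for c in component_scores)
    return (n / (n - 1)) * (1 - sum_component_var / pvariance(totals))


# Made-up scores of five examinees on three components.
components = [
    [1, 2, 3, 4, 5],
    [2, 1, 4, 3, 5],
    [1, 3, 2, 5, 4],
]

alpha = coefficient_alpha(components)
print(round(alpha, 3))  # 0.838
```

Population variances are used throughout; sample variances would give the same α as long as the same kind of variance is used for the components and the total.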
35.
Internal-Consistency Reliability – The General Case1.
36.
Internal-Consistency Reliability – The General Case1.
37.
Internal-Consistency Reliability – The General Case1.
38.
Internal-Consistency Reliability – The General Case1.
39.
The Spearman-Brown Formula: The General Case1.
40.
The Spearman-Brown Formula: The General Case1.
41.
The Spearman-Brown Formula: The General Case1.
42.
The Spearman-Brown Formula: The General Case1.
43.
The Spearman-Brown Formula: The General Case1.
44.
The Spearman-Brown Formula: The General Case
If the component tests are not parallel, then the Spearman-Brown formula will either over- or underestimate the reliability of a longer test.
An example scenario of overestimation:
• Suppose one has a ten-item test with a reliability of 0.60.
• The Spearman-Brown formula predicts that adding a parallel ten-item test will yield a total reliability of 0.75.
• But suppose the test that is added is a faulty test that has no variance.
• Effectively, we’ve only added a constant to every examinee’s score, which does not contribute to the test’s reliability.
• In this case, the total test reliability would still be 0.60.
45.
The Spearman-Brown Formula: The General Case
If the component tests are not parallel, then the Spearman-Brown formula will either over- or underestimate the reliability of a longer test.
An example scenario of underestimation:
• Suppose a ten-item test has a reliability of 0.00.
• The Spearman-Brown formula predicts that doubling the test length with a parallel component would produce a reliability of 0.00.
• However, if a non-parallel test with a reliability of, say, 0.70 is added instead, then the resultant reliability of the lengthened test will be greater than 0.00.
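The two predictions quoted in these scenarios (0.60 becoming 0.75, and 0.00 staying 0.00) can be reproduced with a short sketch. It assumes the general Spearman-Brown form, predicted reliability = kρ / (1 + (k − 1)ρ), where ρ is the reliability of the original test and k is the factor by which the test is lengthened with parallel components. The function name is hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after lengthening a test with parallel parts."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Doubling a test whose reliability is 0.60 (the overestimation scenario).
doubled_from_060 = spearman_brown(0.60, 2)
# Doubling a test whose reliability is 0.00 (the underestimation scenario).
doubled_from_000 = spearman_brown(0.00, 2)

print(round(doubled_from_060, 2))  # 0.75
print(round(doubled_from_000, 2))  # 0.0
```

These are predictions under the parallel-components assumption only; the formula has no way of knowing whether the added component really is parallel, which is why it can miss in either direction.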
46.
Comparison of Methods of Estimating Reliabilities
So far, we have learned several different ways to estimate the reliability of a given test.
Here is a summary of the basic principles of each that one should use when deciding which is appropriate for estimating the reliability of one’s test:
1. When using Test/Retest methods, one should use Parallel- or Alternate-Forms reliability estimates because most internal-consistency measures would be inaccurate.
2. Use of Cronbach’s α or the Kuder-Richardson methods produces a lower bound for a given test’s reliability.
   • If the test components happen to be essentially τ-equivalent, then the estimated reliability is the test’s reliability.
   • But these methods should only be used for homogeneous tests.
3. When using the Split-Half method, the Spearman-Brown formula can over- or underestimate a test’s reliability if the components are not parallel.
   • When the components are parallel, the estimate provided is very good for judging the effects of changing test length.
47.
Standard Errors of Measurement & Confidence Intervals for True Scores1.
48.
Standard Errors of Measurement & Confidence Intervals for True Scores
The bottom chart depicts an approximately normal distribution of observed scores obtained from many independent testings of a single examinee.
Note how the scores vary, but tend to group around the examinee’s true score.
49.
Standard Errors of Measurement & Confidence Intervals for True Scores1.
50.
Standard Errors of Measurement & Confidence Intervals for True Scores1.
51.
Standard Errors of Measurement & Confidence Intervals for True Scores1.
52.
Standard Errors of Measurement & Confidence Intervals for True Scores
The confidence intervals for true scores can be interpreted in either of two ways:
1. The intervals can be expected to contain a given examinee’s true score a specified percentage of the time when the intervals are constructed using observed scores that are the result of repeated independent testings of the examinee using the same test (or parallel tests).
2. The intervals can be expected to cover a specified percentage of the examinees’ true scores when many examinees are tested once with the same test (or parallel tests) and a confidence interval is calculated for each examinee.
53.
Standard Errors of Measurement & Confidence Intervals for True Scores
Tests with a high degree of measurement error will produce confidence intervals that are necessarily wider.
Less reliable tests tend to have a high degree of measurement error.
Therefore, wide confidence intervals are an indication that the observed scores are not very good estimates of true scores.
If a test has good reliability, then the confidence intervals will also be narrow, indicating good estimates of true scores.
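The link between reliability and interval width can be illustrated with a small sketch. It assumes the classical-test-theory expressions: standard error of measurement = (observed-score standard deviation) × √(1 − reliability), and an interval of observed score ± z × standard error, with errors treated as normally distributed. The function names, the score scale (standard deviation of 15), and the two reliabilities are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def standard_error_of_measurement(sd_observed, reliability):
    """Standard error of measurement under classical test theory."""
    return sd_observed * sqrt(1 - reliability)


def true_score_interval(observed, sd_observed, reliability, level=0.95):
    """Confidence interval for a true score, centred on the observed score."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * standard_error_of_measurement(sd_observed, reliability)
    return observed - half_width, observed + half_width


# Same observed score and score scale, two made-up reliabilities.
high_reliability = true_score_interval(100, 15, 0.91)
low_reliability = true_score_interval(100, 15, 0.75)

print(high_reliability)  # narrower interval
print(low_reliability)   # wider interval
```

With a reliability of 0.91 the standard error is 4.5 score points, while at 0.75 it rises to 7.5, so the second interval is noticeably wider around the same observed score.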