3. English test
tools / instruments
to draw out evidence
of the existence of
English abilities
4. Good instruments:
The hidden English abilities
are guaranteed to be
observable.
Bad instruments:
1. Damage measurement and
evaluation.
2. Cannot describe the real
language abilities of the test takers.
6. RELIABLE = STABLE = CONSISTENT
Reliability
• A reliable test is a test that can produce stable or
consistent scores.
• Test scores demonstrate consistency or stability no
matter who administers the test (rater or inter-rater).
• The scores are consistent no matter when or where the test
is administered.
7. Reliability
– Observed Score is the data gathered by the researcher
– True Score is the actual unknown values that correspond to the
construct of interest
– Error
– Systematic Error is variation that results from constructs that are not of interest
– Unsystematic / Random Error is nonsystematic variation in the observed
scores
Observed Score = True Score + (Measurement) Error
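The equation above can be sketched numerically. A minimal Python simulation, with all values hypothetical: a true score of 80, a systematic error of -2 (e.g. a consistently harsh rater), and normally distributed random error:

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

true_score = 80.0        # hypothetical, unknown-in-practice true ability
systematic_error = -2.0  # e.g. a consistently harsh rater (a construct of disinterest)

# Ten hypothetical administrations: each observed score is the true score
# plus the systematic error plus random (unsystematic) error.
observed = [true_score + systematic_error + random.gauss(0, 3) for _ in range(10)]

mean_observed = sum(observed) / len(observed)
# Random errors tend to average out across administrations, while the
# systematic error does not: the mean drifts toward 80 - 2 = 78, not 80.
print(round(mean_observed, 1))
```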
8. Students | Rater 1 | Rater 2
A | 8   | 8
B | 8.6 | 8.6
C | 9   | 9
D | 8   | 8
E | 9.4 | 9.4
Perfect Consistency
Consistency between raters (inter-rater)
9. Students | Rater 1 | Rater 2
A | 8.2 | 8
B | 8.6 | 8.8
C | 8.9 | 9
D | 8   | 8.1
E | 9.3 | 9.4
Consistent Test
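The consistency between the two raters above can be quantified with a Pearson correlation on their scores. A minimal sketch in plain Python, using the five score pairs from the table:

```python
from math import sqrt

# Scores given by the two raters to students A-E (from the table above)
rater1 = [8.2, 8.6, 8.9, 8.0, 9.3]
rater2 = [8.0, 8.8, 9.0, 8.1, 9.4]

def pearson(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

r = pearson(rater1, rater2)
print(round(r, 2))  # → 0.97: high, though not perfect, inter-rater consistency
```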
13. How do we determine whether a
measurement is reliable?
The principles of reliability estimation utilize
these APPROACHES:
Test-retest
Parallel forms
Internal consistency
14. TEST-RETEST
Uses the same test twice with the same group of
subjects on different testing occasions.
There is a repetition of the use of the same
instrument and the involvement of the same
subjects.
The repetition is done on different days.
15. TIME | ACTIVITY | PARTICIPANTS | RESULT
Wednesday, 3/10/18 | Vocabulary test | 50 students of SMA 1 Bangsal | Result 1
Wednesday, 10/10/18 | The same vocabulary test | 50 students of SMA 1 Bangsal | Result 2
16. Advantages
• We only need one set of a test.
Disadvantages
• Requires two testing occasions.
• It is not easy to create similar
conditions on different testing
occasions.
• If the two administrations are too close
together, the test takers may still
remember the content of the test.
• If the interval before the second test is
too long, it may affect the test takers'
performance.
• May cause boredom, ailment, and the like.
17. Parallel Forms
Requires two or more sets of tests.
Each set of the test is made equal to the
other sets in every aspect of the test.
Equal in:
Test format
Test length
Level of difficulty
Discrimination indexes used
Time allocation
Test content
18. [Flow diagram] SET A (administered on Tuesday) and SET B (administered on
Friday) are given to the same group of students. The scores produced from
completing Set A and the scores produced from completing Set B are then
submitted to a correlational analysis.
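The correlational analysis in the last step can be sketched in Python. The Set A and Set B scores below are hypothetical, invented only for illustration:

```python
from math import sqrt

# Hypothetical scores for the same six students on the two parallel sets
set_a = [72, 85, 64, 90, 78, 69]   # administered on Tuesday
set_b = [70, 88, 66, 87, 80, 71]   # administered on Friday

def pearson(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# The parallel-forms reliability estimate is the correlation between the two sets
r = pearson(set_a, set_b)
print(round(r, 2))
```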
19. Advantages
• Has more variation in the sets of the tests.
Disadvantages
• Time-consuming to make two or more sets of the tests.
• Not easy to keep the students' motivation in doing the
second test.
20. Internal Consistency
Based on the logic that if the items in the test are highly
correlated, the test is said to be reliable.
Before developing a test, the theoretical
ability to be measured by the test should be defined.
The items of the test should be constructed to measure
a single ability (technically called a "construct").
Tests with higher internal consistency more accurately
measure the construct intended by the test developers.
22. Test takers' scores
(hypothetical dichotomous scoring on 10 items)
Students | Item scores (1-10)   | Total score
A        | 1 1 0 1 0 1 1 0 1 0 | 6
B        | 1 0 1 0 1 0 1 1 1 0 | 6
C        | 0 1 1 1 1 1 1 1 1 1 | 9
D        | 1 0 1 0 0 0 1 1 1 0 | 5
E        | 1 0 0 0 1 1 1 1 1 1 | 7
F        | 1 1 0 1 1 1 1 1 1 1 | 9
Total    | 5 3 3 3 4 4 6 5 6 3 |
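For dichotomous item data like this, internal consistency is often estimated with KR-20 (Kuder-Richardson Formula 20). A sketch in plain Python using the matrix above (the population variance of the total scores is assumed in the denominator):

```python
# Item matrix from the table above: one row per student (A-F), 10 dichotomous items
items = [
    [1, 1, 0, 1, 0, 1, 1, 0, 1, 0],  # A
    [1, 0, 1, 0, 1, 0, 1, 1, 1, 0],  # B
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # C
    [1, 0, 1, 0, 0, 0, 1, 1, 1, 0],  # D
    [1, 0, 0, 0, 1, 1, 1, 1, 1, 1],  # E
    [1, 1, 0, 1, 1, 1, 1, 1, 1, 1],  # F
]
n = len(items)     # number of test takers
k = len(items[0])  # number of items

totals = [sum(row) for row in items]
mean = sum(totals) / n
var_total = sum((t - mean) ** 2 for t in totals) / n  # population variance

# p_i: proportion answering item i correctly; q_i = 1 - p_i
p = [sum(row[i] for row in items) / n for i in range(k)]
pq = sum(pi * (1 - pi) for pi in p)

# KR-20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores)
kr20 = (k / (k - 1)) * (1 - pq / var_total)
print(round(kr20, 2))  # low internal consistency for this hypothetical data
```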
23. Approaches to performing internal consistency estimation
Split-half: split the scores into the test takers' achievement on the first
half of the items and that on the second half.
The split can be the first and second halves of the total items, or based on
the odd- and even-numbered items.
Split-half has some drawbacks, discussed on a later slide.
Inter-item estimation: the test scores are correlated with themselves
within the same test.
This is called inter-item correlation.
The scores obtained on each item are correlated with one another:
item 1 is correlated with items 2, 3, 4, 5, 6, 7, 8, 9, 10;
item 2 is correlated with items 1, 3, 4, 5, 6, 7, 8, 9, 10; and so on.
Examples:
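The two kinds of split can be compared directly on the dichotomous data from the earlier scoring table. A sketch in plain Python, with the Spearman-Brown formula stepping the half-test correlation up to full test length; note how the two splits give very different estimates for the same data:

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman_brown(r):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r / (1 + r)

# Item matrix from the earlier table (students A-F, 10 dichotomous items)
items = [
    [1, 1, 0, 1, 0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1, 1, 1, 1, 1, 1],
]

# Split 1: first half of the items vs second half
first = [sum(row[:5]) for row in items]
second = [sum(row[5:]) for row in items]
r_halves = pearson(first, second)

# Split 2: odd-numbered items vs even-numbered items
odd = [sum(row[0::2]) for row in items]
even = [sum(row[1::2]) for row in items]
r_oddeven = pearson(odd, even)

print(round(r_halves, 3), round(spearman_brown(r_halves), 3))  # ≈ 0.408, 0.580
print(round(r_oddeven, 3))  # negative for this particular split
```

The first-half/second-half split and the odd/even split yield clearly different values on the same answers, which is exactly the different-splits drawback of the split-half approach.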
26. DRAWBACKS OF SPLIT-HALF
Does not fully reflect the true value of the reliability of the
test (Kline, 1993:11).
Different splits may produce different reliability results
(Cronbach, 1951).
Test length affects the reliability of the test: the more
items in the test, the more reliable the test is (Wiersma and
Jurs, 1990:163).
27. Example of inter-rater estimation
Two or more raters evaluate the students' speaking
skills.
The scoring may be based on several aspects. A
statistical analysis may be used to analyze the data,
usually a t-test.
A correlational analysis may be applied to examine the
closeness of the scores given by the two raters.