2. Standardization is the process of trying out the test on a
group of people to see the scores which are typically
obtained. This standardization provides a mean (average)
and standard deviation (spread) relative to a certain
group. When an individual takes the test, she can determine
how far above or below the average her score is, relative
to the normative group.
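One common way to express "how far above or below the average" is a z-score, the distance from the norm-group mean in standard deviation units. The slides do not name this statistic, so the sketch below is only an illustration; the function name and the norm-group scores are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def z_score(score, norm_scores):
    """Distance of a score from the norm-group mean, in SD units."""
    m = mean(norm_scores)
    sd = pstdev(norm_scores)  # assumption: population SD of the norm group
    return (score - m) / sd

norm_group = [40, 50, 60]  # hypothetical norm-group scores
print(z_score(60, norm_group))  # positive: above the norm-group average
print(z_score(50, norm_group))  # zero: exactly at the average
```

A positive value means the score is above the norm-group mean, a negative value means it is below.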
A standardized test is a test administered and scored
in a consistent manner. Tests are designed in such a way
that the “questions, conditions for administering, scoring
procedures, and interpretations are consistent and are
administered and scored in a predetermined, standard
manner.”
3. Understanding Norms and Test Scores
Standardization is the process of testing a group of
people to see the scores that are typically attained. With a
standardized test, the participant can see where her
score falls relative to the standardization group's
performance.
In standardization, the normative group must
reflect the population for which the test was designed.
The group's performance is the basis for the test norms.
What is standardized testing? Standardized tests are
tools designed to allow measurement of student performance
relative to all others taking the same test.
4. History of Standardized Testing
1909- The Thorndike Handwriting Scale was the first popular
standardized achievement test used in public schools.
1930- Most schools in the United States and Canada were
using some form of standardized testing.
1950- A student would graduate from high school having taken
probably three standardized tests. Today, students
take between 18 and 21 tests, so it is easy to believe that the “volume
of testing has an annual growth rate of 10-20 percent”.
1965- Standardized tests were not used in the early grades,
because these were years of growth and development.
1980- Sixteen states, and districts in 21 others, required
children to take a standardized test before entering
kindergarten, and districts in at least 42 states required
students to pass a standardized test before “graduating” from
kindergarten.
5. Types of Standardized Testing
Norm-referenced
Testing measures performance relative to all
other students taking the same test. Use it
when you want to know how a student
compares to the rest.
Criterion-referenced
Testing measures factual knowledge of a
defined body of material. The multiple-choice tests
that people take to get a license and a test on
fractions are both examples of this type of
testing.
6. Application in Classroom and Similar Settings
Standardized tests are intended to help a teacher, school,
or district decide what is working in the
classroom, how to improve instruction, and how to help
a specific student.
However, standardized test scores should not be the
only thing a teacher, school, or district looks
at when making a decision about programs or students.
Other areas of consideration should be: observations in
the classroom; evaluation of day-to-day class work,
homework, and assignments; meetings with parents; and
observation of student change and growth throughout the
year.
7. Establishing Test Validity
According to Calmorin, the degree of validity is the most
important attribute of a test. Validity refers to the degree to
which a test is capable of achieving certain aims. The
validity must be determined with reference to the
particular use for which the test is being considered. The
validity of a test must always be considered in relation to the
purpose it serves. Validity is always specific in relation to
some definite situation. A test is valid only with respect to a given purpose and situation.
8. Item Analysis
Item analysis is done after the first tryout of the test. One method of conducting item
analysis is the U-L Index Method.
1. Score the papers and rank them from highest to
lowest according to the total score.
2. Separate the upper 27% and lower 27% of the papers.
3. Tally the responses made to each test item by each student in the
upper 27%, then do the same with the lower 27%.
4. Compute the percentage of the upper group that got the item right.
This is called U.
5. Compute the percentage of the lower group that got the item right. This
is called L.
6. Average the U and L percentages. The result is the difficulty index.
7. Subtract the L percentage from the U percentage. The result is the
discrimination index.
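The seven steps of the U-L Index Method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ul_index`, the 0/1 scoring layout, the rounding of 27% to a whole number of papers, and the sample data are all hypothetical, and U and L are kept as proportions between 0 and 1 rather than percentages.

```python
def ul_index(responses, item, fraction=0.27):
    """Difficulty and discrimination index of one item (U-L Index Method).

    responses: one list per student of 0/1 item scores (1 = correct).
    item: zero-based position of the item to analyze.
    """
    # Step 1: rank papers from highest to lowest total score.
    ranked = sorted(responses, key=sum, reverse=True)
    # Step 2: take the upper and lower 27% (rounding is an assumption).
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    # Steps 3-5: proportion in each group that got the item right.
    u = sum(student[item] for student in upper) / n
    l = sum(student[item] for student in lower) / n
    # Step 6: the difficulty index is the average of U and L.
    difficulty = (u + l) / 2
    # Step 7: the discrimination index is U minus L.
    discrimination = u - l
    return difficulty, discrimination

# Hypothetical tryout: 10 students, 4 items.
papers = [
    [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0],
    [1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0],
]
print(ul_index(papers, item=3))
print(ul_index(papers, item=0))
```

With 10 papers, 27% rounds to 3 papers per group. In this made-up data the last item separates the upper and lower groups completely, while the first item is answered correctly by most students in both groups and so discriminates less.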
9. After the item analysis, the tester uses the
following table of equivalents to interpret the
difficulty index:
.00 - .20   Very Difficult
.21 - .80   Moderately Difficult
.81 - 1.00  Very Easy
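The table of equivalents can be applied with a small lookup function. This is a sketch, not part of the original method: the function name is hypothetical, and because the table leaves gaps between bands (for example between .20 and .21), the code assumes each band extends up to and including its upper limit.

```python
def interpret_difficulty(index):
    """Map a difficulty index (0 to 1) to its label in the table."""
    if not 0.0 <= index <= 1.0:
        raise ValueError("difficulty index must be between 0 and 1")
    if index <= 0.20:   # .00 - .20
        return "Very Difficult"
    if index <= 0.80:   # .21 - .80 (assumption: includes values just above .20)
        return "Moderately Difficult"
    return "Very Easy"  # .81 - 1.00

for value in (0.10, 0.50, 0.90):
    print(value, interpret_difficulty(value))
```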
10. Item Revision
On the basis of the item analysis data, test items are
revised for improvement. After revising the test items that
need revision, the tester needs another tryout. The revised
test must be administered to the same set of samples.
Third tryout
After two revisions, the test is considered ready for its
final form. The test is good in terms of its difficulty and
discrimination indices. At this time, the test is ready for
reliability testing.
11. How to Establish Reliability
Reliability may be estimated through a variety of methods that fall into two
types:
single-administration and multiple-administration.
Multiple-administration methods require that two assessments be
administered.
• Test-retest reliability
Is estimated as the Pearson product-moment correlation coefficient
between two administrations of the same measure. This is sometimes
known as the Coefficient of Stability.
• Alternative forms reliability
Is estimated by the Pearson product-moment correlation coefficient of two
different forms of a measure, usually administered together. This is
sometimes known as the Coefficient of Equivalence.
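Both multiple-administration estimates reduce to a Pearson product-moment correlation between two lists of scores from the same examinees. The sketch below computes that coefficient from its definition; the function name and the two score lists are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores of five examinees on two administrations.
first_administration = [10, 12, 15, 18, 20]
second_administration = [11, 13, 14, 19, 21]
print(pearson_r(first_administration, second_administration))
```

For test-retest reliability the two lists come from the same measure given twice; for alternative forms reliability they come from the two forms.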
12. Single-administration methods include split-half and
internal consistency.
Split-half reliability
Treats the two halves of a measure as alternative forms.
This “halves reliability” estimate is then stepped up to the
full test length using the Spearman-Brown Prediction
Formula. This is sometimes referred to as the Coefficient
of Internal Consistency.
Internal Consistency
The most common internal consistency measure is Cronbach's alpha,
which is usually interpreted
as the mean of all possible split-half coefficients.
Cronbach's alpha is a generalization of an earlier
form of estimating internal consistency, Kuder-Richardson
Formula 20.
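The slides do not give a formula for Cronbach's alpha. Its usual statement is alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), and the sketch below follows that statement. The function name, the use of population variances, and the rating data are assumptions made for illustration.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a table with one row per examinee
    and one column per item."""
    k = len(scores[0])  # number of items
    # Variance of each item (column) across examinees.
    item_variances = [pvariance(column) for column in zip(*scores)]
    # Variance of the examinees' total scores.
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical ratings of five examinees on three items.
ratings = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 5],
    [1, 2, 1],
    [3, 3, 4],
]
print(cronbach_alpha(ratings))
```

When every item is scored 0 or 1, each item variance equals p × q, which is why alpha is described as a generalization of Kuder-Richardson Formula 20.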
13. Reliability Estimation Using a Split-half
Methodology
The split-half design in effect creates two
comparable test administrations. The items in a test
are split into two tests that are equivalent in content
and difficulty. Often this is done by separating the
odd-numbered and even-numbered items. This assumes that
the assessment is homogeneous in content.
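A minimal sketch of an odd-even split-half estimate follows, assuming items are scored 0 or 1 and stored one row per examinee. The half-test correlation is stepped up with the Spearman-Brown formula for a test of double length, 2r / (1 + r). The function names and the data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def split_half_reliability(scores):
    """Odd-even split-half reliability, stepped up with Spearman-Brown."""
    # Half-test totals: items 1, 3, 5, ... and items 2, 4, 6, ...
    odd_totals = [sum(row[0::2]) for row in scores]
    even_totals = [sum(row[1::2]) for row in scores]
    r_half = pearson_r(odd_totals, even_totals)
    # Spearman-Brown step-up to the full test length.
    return 2 * r_half / (1 + r_half)

# Hypothetical 0/1 item scores of five examinees on four items.
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
print(split_half_reliability(answers))
```

The stepped-up value is larger than the raw correlation between halves, because each half is only half as long as the full test.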
14. Estimating Reliability Using Kuder-Richardson Formula 20
The rationale for Kuder and Richardson's most commonly used procedure is
roughly equivalent to:
• Securing the mean inter-correlation of the number of items (k) in the test.
• Considering this to be the reliability coefficient for the typical item in the test.
• Stepping up this average with the Spearman-Brown formula to estimate the
reliability coefficient of an assessment of k items.
Formula for Kuder-Richardson Formula 20:
$$KR_{20} = \frac{k}{k-1}\left(1 - \frac{\sum pq}{SD^{2}}\right)$$
Where:
k – the number of items in the test
SD – standard deviation of the total test scores
p – the proportion of examinees who got an item correct
q – the proportion of those who got the item incorrect (q = 1 – p)
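As a worked illustration, the sketch below computes KR-20 from 0/1 item scores using the usual statement of the formula, k/(k-1) × (1 − Σpq / SD²), with the symbols defined above. The function name and the data are hypothetical, and taking SD² as the population variance of the total scores is an assumption.

```python
from statistics import pvariance

def kr20(scores):
    """Kuder-Richardson Formula 20 for 0/1 item scores,
    one row per examinee and one column per item."""
    k = len(scores[0])  # number of items in the test
    n = len(scores)     # number of examinees
    # p: proportion of examinees who got each item correct; q = 1 - p.
    p = [sum(column) / n for column in zip(*scores)]
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)
    # SD squared: variance of the total test scores (population form assumed).
    variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_pq / variance)

# Hypothetical 0/1 item scores of five examinees on four items.
item_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
print(kr20(item_scores))
```

For this made-up data Σpq is 0.88 and the variance of the total scores is 2.16, so the estimate is (4/3) × (1 − 0.88/2.16), roughly 0.79.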