Test Standardization
Standardization is the process of trying out a test on a group of people to see which scores are typically obtained. Standardization provides a mean (average) and a standard deviation (spread) relative to a particular group. When an individual takes the test, she can determine how far above or below the average her score is, relative to the normative group.
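As a minimal sketch of that comparison (the function name and sample values are mine, not from the source), a standard score expresses the distance from the normative mean in standard-deviation units:

    def z_score(raw_score, norm_mean, norm_sd):
        # Distance from the normative mean, in standard-deviation units.
        return (raw_score - norm_mean) / norm_sd

    # Hypothetical norms: mean 100, SD 15 (values chosen only for illustration).
    print(z_score(115, 100, 15))  # 1.0 -> one SD above the normative mean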
A standardized test is a test administered and scored in a consistent manner. Tests are designed in such a way that the "questions, conditions for administering, scoring procedures, and interpretations are consistent and are administered and scored in a predetermined, standard manner."
Understanding Norms and Test Scores 
Standardization is the process of testing a group of people to see which scores are typically attained. With a standardized test, a participant can see where her score falls relative to the standardization group's performance.
For standardization to work, the normative group must reflect the population for which the test was designed; the group's performance becomes the basis for the test norms.
What is standardized testing? Standardized tests are tools designed to measure a student's performance relative to all others taking the same test.
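As an illustrative sketch (the norm-group scores below are invented), a percentile rank reports the share of the normative group scoring at or below a given score:

    def percentile_rank(score, norm_scores):
        # Percentage of the normative group scoring at or below this score.
        at_or_below = sum(1 for s in norm_scores if s <= score)
        return 100 * at_or_below / len(norm_scores)

    norms = [48, 52, 55, 60, 61, 63, 67, 70, 74, 80]  # hypothetical norm group
    print(percentile_rank(63, norms))  # 60.0 -> at or above 60% of the group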
History of Standardized Testing 
1909 - The Thorndike Handwriting Scale became the first popular standardized achievement test used in public schools.
1930 - Most schools in the United States and Canada were using some form of standardized testing.
1950 - A student would graduate from high school having taken perhaps three standardized tests. From then to the present, when kids take between 18 and 21 tests, it is easy to believe that the "volume of testing has an annual growth rate of 10-20 percent."
1965 - Standardized tests were not used in the early grades, because these were considered years of growth and development.
1980 - Sixteen states, and districts in 21 others, required children to take a standardized test before entering kindergarten, and districts in at least 42 states required students to pass a standardized test before "graduating" from kindergarten.
Types of Standardized Testing 
Norm-referenced
Testing measures performance relative to all other students taking the same test. Use it when you want to know how a student compares with the rest.
Criterion-referenced
Testing measures factual knowledge of a defined body of material. The multiple-choice test people take to get a license and a test on fractions are both examples of this type of testing. A small contrast of the two interpretations follows.
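As a toy illustration (the cut-off and scores are hypothetical), the same raw score can be reported both ways:

    raw, norms, cutoff = 42, [30, 35, 40, 45, 50], 40

    # Norm-referenced: position relative to the other test-takers.
    pct = 100 * sum(s <= raw for s in norms) / len(norms)

    # Criterion-referenced: pass/fail against a fixed standard.
    passed = raw >= cutoff
    print(pct, passed)  # 60.0 True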
Application in Classroom and Similar Settings 
Standardized tests are intended to help a teacher, school, or district make decisions about what is working in the classroom, how to improve instruction, and how to help a specific student.
However, standardized test scores should not be the only thing a teacher, school, or district looks at when making a decision about programs or students. Other areas of consideration should be: observations in the classroom; evaluation of day-to-day class work, homework, and assignments; meetings with parents; and observation of student change and growth throughout the year.
Establishing Test Validity 
According to Calmorin, the degree of validity is the most important attribute of a test. Validity refers to the degree to which a test is capable of achieving certain aims. Validity must be determined with reference to the particular use for which the test is being considered, and it must always be judged in relation to the purpose the test serves. Validity is always specific to some definite situation; no test is valid for all purposes.
Item Analysis 
Item analysis is done after the first try-out of the test. One method of conducting item analysis is the U-L Index Method; a computational sketch follows the steps below.
1. The teacher scores the papers and ranks them from highest to lowest according to total score.
2. Separate the upper 27% and the lower 27% of the papers.
3. Tally the responses to each test item by each student in the upper 27%, then do the same for the lower 27%.
4. Compute the percentage of the upper group that got the item right. This is called U.
5. Compute the percentage of the lower group that got the item right. This is called L.
6. Average the U and L percentages. The result is the difficulty index.
7. Subtract the L percentage from the U percentage. The result is the discrimination index.
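A minimal sketch of steps 4-7 (function and variable names are mine; the item responses are invented). Each list holds 1 for a correct answer and 0 for a wrong one:

    def ul_index(upper, lower):
        # upper/lower: 0/1 scores on one item for the upper-27% and
        # lower-27% groups respectively.
        u = sum(upper) / len(upper)   # proportion correct in upper group (U)
        l = sum(lower) / len(lower)   # proportion correct in lower group (L)
        difficulty = (u + l) / 2      # step 6: average of U and L
        discrimination = u - l        # step 7: U minus L
        return difficulty, discrimination

    # Hypothetical item: 8 of 10 upper and 3 of 10 lower students got it right.
    print(ul_index([1]*8 + [0]*2, [1]*3 + [0]*7))  # (0.55, 0.5)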
After the item analysis, the tester uses the following table of equivalents to interpret the difficulty index:
.00 - .20  - Very Difficult
.21 - .80  - Moderately Difficult
.81 - 1.00 - Very Easy
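A tiny helper expressing this table (the function name is mine):

    def interpret_difficulty(p):
        # Map a difficulty index (proportion correct) to the table above.
        if p <= 0.20:
            return "Very Difficult"
        if p <= 0.80:
            return "Moderately Difficult"
        return "Very Easy"

    print(interpret_difficulty(0.55))  # Moderately Difficult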
Item Revision 
On the basis of the item-analysis data, test items are revised for improvement. After revising the items that need revision, the tester conducts another try-out. The revised test must be administered to the same set of samples.
Third try-out
After two revisions, the test is considered ready for its final form: it is now sound in terms of its difficulty and discrimination indices. At this point, the test is ready for reliability testing.
How to Establish Reliability 
Reliability may be estimated through a variety of methods that fall into two types: single-administration and multiple-administration.
Multiple-administration methods require that two assessments be administered.
• Test-retest reliability is estimated as the Pearson product-moment correlation coefficient between two administrations of the same measure. This is sometimes known as the coefficient of stability.
• Alternate-forms reliability is estimated by the Pearson product-moment correlation coefficient between two different forms of a measure, usually administered together. This is sometimes known as the coefficient of equivalence.
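A small sketch of the test-retest case (the score pairs are invented; statistics.correlation requires Python 3.10+). The same computation estimates alternate-forms reliability by correlating form A with form B:

    from statistics import correlation  # Pearson r; Python 3.10+

    # Hypothetical scores from two administrations of the same test.
    first  = [12, 15, 9, 20, 17, 11]
    second = [13, 14, 10, 19, 18, 12]

    # The coefficient of stability.
    print(round(correlation(first, second), 3))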
Single-administration methods include split-half and internal consistency.
Split-half reliability
Treats the two halves of a measure as alternate forms. This half-test reliability estimate is then stepped up to the full test length using the Spearman-Brown prediction formula. This is sometimes referred to as the coefficient of internal consistency.
Internal consistency
The usual measure is Cronbach's alpha, which is commonly interpreted as the mean of all possible split-half coefficients. Cronbach's alpha is a generalization of an earlier form of estimating internal consistency, Kuder-Richardson Formula 20 (KR-20).
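A compact sketch of Cronbach's alpha (the 0/1 item matrix is invented; rows are examinees, columns are items):

    from statistics import pvariance

    # Hypothetical 0/1 item scores: rows = examinees, columns = items.
    data = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
    ]
    k = len(data[0])
    item_vars = [pvariance([row[i] for row in data]) for i in range(k)]
    total_var = pvariance([sum(row) for row in data])

    alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
    print(round(alpha, 3))  # 0.8 for this toy data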
Reliability Estimation Using a Split-half Methodology
The split-half design in effect creates two comparable test administrations. The items in a test are split into two half-tests that are equivalent in content and difficulty; often this is done by splitting between odd- and even-numbered items. This assumes that the assessment is homogeneous in content.
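A sketch of an odd-even split stepped up with the Spearman-Brown formula (the item matrix is invented; statistics.correlation requires Python 3.10+):

    from statistics import correlation

    # Hypothetical 0/1 item scores: rows = examinees, columns = items 1-6.
    data = [
        [1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 0, 1],
        [1, 0, 1, 1, 0, 0],
        [0, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 0, 0],
    ]
    odd  = [sum(row[0::2]) for row in data]  # half-test of items 1, 3, 5
    even = [sum(row[1::2]) for row in data]  # half-test of items 2, 4, 6

    r_half = correlation(odd, even)     # half-test reliability
    r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown step-up
    print(round(r_full, 3))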
Estimating Reliability Using Kuder-Richardson Formula 20
The rationale for Kuder and Richardson's most commonly used procedure is roughly equivalent to:
Securing the mean inter-correlation of the k items in the test.
Considering this to be the reliability coefficient for the typical item in the test.
Stepping up this average with the Spearman-Brown formula to estimate the reliability coefficient of an assessment of k items.
Kuder-Richardson Formula 20:
KR20 = (k / (k - 1)) × (1 - Σpq / SD²)
Where:
k - the number of items in the test
SD - the standard deviation of the total test scores
p - the proportion of examinees who got an item right
q - the proportion who got the item wrong (q = 1 - p)
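A direct sketch of the formula (the 0/1 score matrix is invented; for dichotomous items KR-20 equals Cronbach's alpha):

    from statistics import pvariance

    # Hypothetical 0/1 item scores: rows = examinees, columns = items.
    data = [
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 0],
        [1, 1, 1, 0, 0],
        [1, 1, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    k, n = len(data[0]), len(data)

    p = [sum(row[i] for row in data) / n for i in range(k)]  # proportion correct
    sum_pq = sum(pi * (1 - pi) for pi in p)                  # sum of p*q per item
    sd2 = pvariance([sum(row) for row in data])              # SD^2 of total scores

    kr20 = (k / (k - 1)) * (1 - sum_pq / sd2)
    print(round(kr20, 3))  # about 0.833 for this toy data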
SUBMITTED BY: 
Aileen B. Ferriols 
SUBMITTED TO: 
MRS. KATHERINE PARANGAT
