1. Topic :
Meaning, Characteristic, Objectivity,
Validity, Reliability, Usability, Norms,
Construction of Tests
Presented by :
GAJE SINGH
M.Sc. Nursing (Previous)
2. WHAT DO YOU UNDERSTAND BY A TEST?
Noun (1)
1. a short exam to measure somebody’s knowledge or skill in something
2. a short medical examination of a part of your body
Verb (2)
1. to try, use or examine something carefully to find out if it is working properly or what it is like
2. to examine a part of the body to find out if it is healthy
A test is a tool, device, or instrument used for the measurement and assessment of
knowledge and skill; it also serves as a basis for evaluating performance against
criteria set at the beginning.
3. Introduction
A test, or assessment device, is an instrument used to determine both how well a
student has learned the material covered and how well he or she will do in future
endeavors.
Most formal assessments used to assign grades, for selection purposes, or for
prediction involve a test.
A test is a systematic method for measuring students' behaviors and evaluating
these behaviors against standards and norms.
4. Definition
It is a device or procedure for confronting a subject with a standard set of
questions or tasks to which the student is to respond independently and the
results of which can be treated as a quantitative comparison of the performance
of different students.
“Test is a systematic procedure for observing persons and describing them with
either a numerical scale or a category system. Thus, test may give either
qualitative or quantitative information.”
Anthony J. Nitko
5. COMPARISON OF TEST, MEASUREMENT,
ASSESSMENT AND EVALUATION
Fig. basic-of-measurement-evaluation.jpg
Source - Google
6. Let’s Understand them all
Test - A method to determine a student's ability to complete certain tasks or
demonstrate mastery of a skill or knowledge of content.
A test is one form of an assessment. For example: multiple choice tests,
weekly spelling tests, practical skill test.
Measurement - The set of procedures and the principles for how to use the
procedures in educational tests and assessments.
Some of the basic principles of measurement in educational evaluations
would be raw scores, percentile ranks, derived scores, standard scores, etc.
8. Continued…
Assessment - The process of gathering information to monitor progress and make educational
decisions if necessary.
An assessment may include a test, but it also includes methods such as observations, interviews,
behavior monitoring, etc.
Evaluation - Procedures used to determine whether the subject (i.e., the student) meets preset
criteria, such as qualifying for special education services.
This uses assessment (remember that an assessment may be a test) to make a determination of
qualification in accordance with pre-determined criteria.
12. Continued…
You may be thinking that learning to bake cookies and learning something like education/teaching aren't the
same at all, and, in a way, you are right. But, the information you get from assessing what you have learned
is the same.
Brian used what he learned from each batch of cookies to improve the next batch. You learn from every
homework assignment you complete and every quiz you take what you still need to study in order to know
the material.
The same is true for us, as we are all:
1) learning content creation for seminars
2) testing ourselves by presenting the content
3) measuring and assessing our performance through feedback from our peer group
4) being evaluated by our teachers against a set criterion.
13. Types of Tests
1) Discrete point tests and Integrative tests
Discrete point tests are expected to test one item or skill at a time,
e.g., testing vocabulary or grammar.
Integrative tests combine various items, structures, and skills into one
single test,
e.g., writing an essay, giving directions.
14. Continued…
2) Achievement tests
An important tool in school evaluation, with great significance in measuring instructional
progress and the progress of students in the subject area.
“Any test that measures the attainments or accomplishments of an individual after a period of
training or learning.”
NM Downie
“A type of ability test that describes what a person has learned to do.”
Thorndike and Hagen
15. Types of Achievement Tests
Fig. Difference between standardized and
Non-standardized tests
Source - Comprehensive textbook
of Nursing Education
by Jaspreet Kaur Sodhi
17. Purposes of Tests
To ensure that set objectives are achieved.
To determine the progress of learners in the class.
To highlight areas where learning has not taken place and where re-teaching is needed.
To place students/candidates into a particular class, school, level or employment. Thus, teachers
can use tests to place a pupil into 2nd year after he/she has passed the test set for 1st year, and so
on.
Tests are used to predict outcomes: they help predict whether or not a learner will be able to
do a certain job or task. For example, we assume that if one can pass the NCLEX examination,
he or she will be able to work as a registered nurse.
18. Characteristics of a Good Test
Validity
Reliability
Accurate measurement of academic ability
Combine both discrete point and integrative test procedures
Represents Teaching-Learning objectives and goals
Test material must be properly and systematically selected
Variety
Test Objectivity
Comprehensiveness
Discrimination
19. Objectivity of a Test
A test is objective when the scorer's personal judgment does not affect the
scoring; it eliminates the fixed opinions or judgments of the person who scores it.
Objectivity is a prerequisite of reliability and validity.
The objectivity of a test can be increased by:
using more objective item types, e.g., multiple choice, short answer, true or false;
preparing a scoring key.
20. Validity of a Test
Validity refers to how well a test measures what it purports to measure.
An evaluation procedure is valid to the extent that it provides an assessment of
the degree to which learners have achieved specific objectives, content matter
and learning experiences.
It is a matter of degree i.e. high, moderate, low. It does not exist on an all or
none basis.
21. Factors Affecting Validity
Unclear directions result in low validity.
If reading vocabulary is poor, students fail to respond to test items even when
they know the answer.
Unnecessarily complicated or confusing sentences are difficult to understand,
which affects the validity of the test.
Medium of expression: English as the medium of instruction and response creates
serious problems for non-English-medium students and affects the validity of a
test.
22. Continued…
Difficulty level of items: too easy or too difficult test items would not
discriminate among learners; thereby the validity of a test will be lowered.
Influence of extraneous factors, e.g. style of expression, legibility, mechanics
of grammar, handwriting, length of the answer and method of organizing the
matter.
Inappropriate time limits: if no suitable time limit is set, the results will be
invalidated.
23. Types of Validity
1. Face Validity - ascertains that the measure appears to be assessing the intended construct under study.
An expert panel can assess face validity easily. This is not a very scientific type of validity.
2. Content Validity - ensures that the measure covers the broad range of areas within the concept under
study. Not everything can be covered, so items need to be sampled from all of the domains.
3. Criterion Validity - how well test performance predicts future performance or estimates current
performance on valued measures other than the test itself (a criterion).
4. Construct Validity - ensures that the measure is actually measuring what it is intended to
measure (i.e., the construct) and not other variables. A panel of experts familiar with the construct can
examine the items and decide what each specific item is intended to measure.
24. Reliability of a Test
Reliability is the degree to which an assessment tool produces stable and consistent results.
OR
The degree of consistency among test scores.
25. Factors Affecting Reliability
Data collecting method
Interval between testing occasions
Group homogeneity – more homogenous, more reliable
Speed of the method
Difficulty of the items: tests that are too easy or too difficult for a group tend to be less
reliable, because the differences among students on such tests are narrow.
26. Continued…
Objective scoring is more reliable than subjective scoring.
Ambiguously worded items make a test less reliable.
Optional questions: If optional questions are given, the same student may
not attempt the same items on a second administration; thereby the
reliability of the test is reduced.
27. Types of Reliability
1.) Test-retest reliability is a measure of
reliability obtained by administering the same
test twice over a period of time to a group of
individuals.
Example: A test designed to assess student
learning in psychology could be given to a
group of students twice, with the second
administration perhaps coming a week after
the first. The obtained correlation coefficient
would indicate the stability of the scores.
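The stability coefficient in this example is simply the Pearson correlation between the two administrations. A minimal sketch in Python, using made-up scores for eight students:

```python
# Hypothetical scores for 8 students on the same test,
# administered twice, one week apart.
first = [72, 85, 64, 90, 78, 55, 88, 70]
second = [70, 84, 66, 92, 75, 58, 86, 72]

def pearson(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson(first, second)
print(f"Test-retest reliability: r = {r:.2f}")  # a value near 1 means stable scores
```

A coefficient near 1.0 indicates that students kept roughly the same rank order across the two occasions.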
28. Continued…
2.) Parallel forms reliability is a measure of
reliability obtained by administering different
versions of an assessment tool (both versions must
contain items that probe the same construct, skill,
knowledge base, etc.) to the same group of
individuals.
The scores from the two versions can then be
correlated in order to evaluate the consistency of
results across alternate versions.
Example: If you wanted to evaluate the reliability
of a critical thinking assessment, you might create a
large set of items that all pertain to critical thinking
and then randomly split the questions up into two
sets, which would represent the parallel forms.
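The random split described in the critical-thinking example can be sketched as follows; the 20-item pool and the fixed seed are assumptions chosen for illustration:

```python
import random

# Hypothetical pool of 20 critical-thinking items, identified by item number.
item_pool = list(range(1, 21))

random.seed(42)            # fixed seed so the split is reproducible
random.shuffle(item_pool)  # randomize item order before splitting

form_a = sorted(item_pool[:10])   # first parallel form
form_b = sorted(item_pool[10:])   # second parallel form

# Every item appears in exactly one form, and the forms are equal in length.
assert set(form_a) | set(form_b) == set(range(1, 21))
assert len(form_a) == len(form_b) == 10
```

The scores obtained on form A and form B by the same group would then be correlated, exactly as the slide describes.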
29. Continued…
3.) Interrater reliability is a measure of
reliability used to assess the degree to which
different judges or raters agree in their
assessment decisions.
Inter rater reliability is useful because human
observers will not necessarily interpret answers
the same way.
It is especially useful when judgment can be
considered relatively subjective.
Thus, the use of this type of reliability would
probably be more likely when evaluating artwork
as opposed to math problems.
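A simple way to quantify interrater agreement is the percentage of cases on which the raters' decisions match (more refined indices such as Cohen's kappa also correct for chance agreement). A sketch with hypothetical pass/fail decisions by two raters:

```python
# Hypothetical pass/fail decisions by two raters on 10 pieces of artwork.
rater_1 = ["pass", "pass", "fail", "pass", "fail",
           "pass", "pass", "fail", "pass", "pass"]
rater_2 = ["pass", "fail", "fail", "pass", "fail",
           "pass", "pass", "fail", "pass", "pass"]

def percent_agreement(a, b):
    """Fraction of cases on which the two raters gave the same decision."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

print(f"Agreement: {percent_agreement(rater_1, rater_2):.0%}")  # 9 of 10 → 90%
```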
30. Continued…
4.) Internal consistency reliability is also called homogeneity. Internal consistency ensures that
all the subparts of a research instrument measure the same characteristics.
For example, a patient's satisfaction measurement scale developed to measure his or her
satisfaction with nursing care must include all the subparts related to the measurement of
satisfaction with nursing care only; including a subpart related to patient's satisfaction with health
care would be inappropriate in this scale.
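Internal consistency is commonly summarized with Cronbach's alpha, which compares the sum of the item variances to the variance of the total score. A sketch using hypothetical patient-satisfaction ratings (1-5) on a four-item scale:

```python
# Hypothetical 1-5 satisfaction ratings from 5 patients on a 4-item scale.
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [4, 4, 5, 4],
]

def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(rows):
    """Cronbach's alpha: do the subparts hang together as one scale?"""
    k = len(rows[0])                 # number of items
    items = list(zip(*rows))         # scores grouped per item
    totals = [sum(r) for r in rows]  # total score per patient
    return (k / (k - 1)) * (1 - sum(variance(i) for i in items) / variance(totals))

print(f"alpha = {cronbach_alpha(responses):.2f}")  # high alpha → homogeneous scale
```

A high alpha suggests all subparts measure the same characteristic; a subpart unrelated to satisfaction with nursing care would pull alpha down.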
31. Usability of a Test
The overall simplicity of use of a test for both the constructor and the learner; an important
criterion for assessing the value of a test.
Practicability depends upon various factors such as ease of administration, scoring,
interpretation, and economy.
This includes ease of administering the test with little possibility of error in giving
directions and timing, ease and economy of scoring without sacrificing accuracy, and ease of
interpretation.
32. Construction of a Test
If a test is to be made truly valid, reliable, and practical, it has to be suitably planned.
Question papers from the previous ten years should be kept in view so that the test can be
planned effectively.
33. General guidelines for constructing test items
Place items from easy to difficult or periodically place easier items in the test.
All items of the same type should be together. Don't use more than two item types, in most cases.
Include directions for the total test and for each test section, with instructions on how to
respond and the point value of the questions in each section.
Try not to include information that will offer answers to previous questions.
Try to put all matching items on one page so students do not have to turn pages to match
premises and responses.
34. Continued…
Title the test and each section.
Test only one idea or principle in each item.
List items in a systematic order.
Arrange the correct alternatives for multiple-choice items randomly.
Avoid ambiguous questions.
Keep the reading difficulty of test items low unless your aim is to measure verbal and
reading abilities.
35. General Precautions in Test Construction
It should be decided when the test has to be conducted in the context of time and
frequency.
It should be determined how many questions have to be included in the test.
It should be determined what types of questions have to be used in the test.
Those topics should be determined from which questions have to be constructed. This
decision is taken keeping in view the teaching objectives.
The level of difficulty of questions should be decided at the beginning of the test.
36. Continued…
The format and type of printing should be decided in advance.
It should be determined what should be the passing score.
In order to control the personal bias of the examiner, there should be a provision for
central evaluation. A particular question should be checked by the same examiner.
A rule book should be prepared before the evaluation of the scripts.
37. Steps in Construction of Tests
STEP 1 – Selection of teaching objectives for measurements
Those teaching objectives among Cognitive, Affective and Psychomotor
domains which are to be made the base of test construction should be
selected first.
For example, from the Cognitive Domain (knowledge, comprehension, application,
analysis, synthesis, evaluation) – knowledge.
38. STEP 2 – Assigning weightage to selected objectives
The teacher assigns weightage to these selected objectives according to the work involved
and the importance of the objectives.
S.No. | Objective     | Marks | %
1     | Knowledge     | 10    | 20
2     | Understanding | 20    | 40
3     | Application   | 8     | 16
4     | Analysis      | 5     | 10
5     | Synthesis     | 5     | 10
6     | Evaluation    | 2     | 4
      | Total         | 50    | 100
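The weightage table above can be checked mechanically: the marks must sum to the test total, and each percentage is that objective's share of the total. A small sketch:

```python
# Marks assigned to each objective, as in the table above (total = 50).
weightage = {
    "Knowledge": 10, "Understanding": 20, "Application": 8,
    "Analysis": 5, "Synthesis": 5, "Evaluation": 2,
}

total = sum(weightage.values())
assert total == 50  # the marks must add up to the full test total

# Convert each objective's marks into its percentage share.
percentages = {obj: 100 * marks // total for obj, marks in weightage.items()}
print(percentages)  # Knowledge 20, Understanding 40, ... matching the table
```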
39. STEP 3 – Weightage to content
Content is used as the means of realizing the objectives, and questions have to be
constructed on its basis. Therefore, it becomes necessary to give weightage to it.
S.No. | Topic           | Marks | %
1     | Introduction    | 10    | 20
2     | Definition      | 5     | 10
3     | Types           | 8     | 16
4     | Purposes        | 5     | 10
5     | Procedure steps | 20    | 40
6     | Complications   | 2     | 4
      | Total           | 50    | 100
40. STEP 4 – Giving weightage to the type of items
In this step, the teacher determines the number of items, their types, and their relative
marks. The following table is convenient for this:
S.No. | Form of questions | No. of questions | Marks | %
1     | Objective type    | 25               | 25    | 50
2     | Short answer type | 5                | 15    | 30
3     | Long essay type   | 1                | 10    | 20
      | Total             | 31               | 50    | 100
41. STEP 5 – Determining alternatives
At this level, it is determined how many alternatives or options should be
given according to the type of questions.
Giving alternatives influences the reliability and validity of a test; therefore,
it is suggested that alternatives should not be given in objective type
questions, while in essay type questions only internal choice can be given.
42. STEP 6 – Division of Sections
If the scope or types of questions are uniform, then it is not necessary to
divide the test into sections.
However, if it is diverse and different types of questions have been specified
and the nature of the test seems to be heterogeneous, then a separate section
should be made comprising each type of item.
43. STEP 7 – Estimation of time
At this step, the total time the whole test is likely to take is estimated.
Time is estimated on the basis of the type and number of items; some time should be reserved
for distribution and collection of answer sheets.
44. STEP 8 – Preparation of the Blue Print
A blueprint provides a bird's-eye view of the entire test. In it we can see the topics,
teaching objectives, types of questions, number of items, distribution of scores, and
their mutual relationships.
A blueprint is the basis for test construction.
A format is given below:
Content            | Knowledge | Comprehension | Application | Analysis/Synthesis | Evaluation | Total score
Principles         | 2         | 2             | 2           |                    |            | 6
Factors Affecting  | 3         | 3             | 4           |                    |            | 10
Pathophysiology    | 3         | 3             | 4           |                    |            | 10
Assessment         | 1         | 3             | 4           | 2                  |            | 10
Nursing Measures   |           | 3             | 3           | 4                  |            | 10
Evaluation of Care |           | 1             | 1           | 2                  |            | 4
Total Items        | 9         | 15            | 18          | 8                  |            | 50
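The blueprint can be represented as a two-way table and its marginals verified against the totals on the slide; the placement of each cell under an objective is inferred here from those row and column totals, so treat it as an illustrative reconstruction:

```python
# Each cell: score points a content area contributes under one objective.
blueprint = {
    "Principles":         {"Knowledge": 2, "Comprehension": 2, "Application": 2},
    "Factors Affecting":  {"Knowledge": 3, "Comprehension": 3, "Application": 4},
    "Pathophysiology":    {"Knowledge": 3, "Comprehension": 3, "Application": 4},
    "Assessment":         {"Knowledge": 1, "Comprehension": 3, "Application": 4,
                           "Analysis/Synthesis": 2},
    "Nursing Measures":   {"Comprehension": 3, "Application": 3,
                           "Analysis/Synthesis": 4},
    "Evaluation of Care": {"Comprehension": 1, "Application": 1,
                           "Analysis/Synthesis": 2},
}

# Row totals (per content area) and column totals (per objective).
row_totals = {topic: sum(cells.values()) for topic, cells in blueprint.items()}
col_totals = {}
for cells in blueprint.values():
    for obj, marks in cells.items():
        col_totals[obj] = col_totals.get(obj, 0) + marks

# The marginals must agree with the slide's totals row and Total score column.
assert row_totals["Principles"] == 6 and row_totals["Assessment"] == 10
assert col_totals["Knowledge"] == 9 and col_totals["Comprehension"] == 15
assert col_totals["Application"] == 18 and col_totals["Analysis/Synthesis"] == 8
assert sum(row_totals.values()) == 50
```

Checking the marginals like this catches arithmetic slips before the blueprint is used to write items.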
45. STEP 9 – Preparation of score key
A scoring key increases the reliability of a test, so the test constructor should specify the
procedure for scoring the answer script.
Directions must be given whether the scoring will be made by a scoring key (When the
answer is recorded on the test paper) or by scoring stencil (when the answer is recorded on
a separate answer sheet) and how marks will be awarded to the test items.
In the case of essay-type items, it should be indicated whether to score with the 'point
method' or the 'rating method'. In the point method, each answer is compared with a set of
ideal answers in the scoring key, and a given number of points is assigned.
In the rating method, the answers are rated on the basis of degree of quality, which
determines the credit assigned to each answer.
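Scoring against a key can be sketched as a simple lookup; the items, key, and answer sheet below are hypothetical:

```python
# A hypothetical scoring key for five objective-type items, 1 mark each.
key = {1: "B", 2: "D", 3: "A", 4: "C", 5: "B"}

def score(answers, key, mark_per_item=1):
    """Compare a student's answer sheet against the key and award marks."""
    return sum(mark_per_item for q, a in answers.items() if key.get(q) == a)

student = {1: "B", 2: "D", 3: "C", 4: "C", 5: "B"}
print(score(student, key))  # 4 of 5 answers match the key
```

Because the key fixes the correct answers in advance, any scorer reaches the same total, which is exactly why a scoring key raises objectivity and reliability.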
46. First tryout of the Test
At this stage the initial format of the test is administered to a small representative sample.
After that, item analysis is used to calculate the difficulty level and discriminative value
of each item.
There are a variety of techniques for performing an item analysis, which is often used, for
example, to determine which items will be kept for the final version of a test.
A detailed discussion of item analysis will be given in the next unit.
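The two indices named here are straightforward proportions: the difficulty index is the fraction of the tryout sample answering an item correctly, and the discrimination index is the difference between the upper- and lower-scoring groups. A sketch with made-up tryout data for one item:

```python
# 1 = correct, 0 = wrong, for one item, tryout sample split by total test score.
upper_group = [1, 1, 1, 1, 0]   # responses from the top scorers overall
lower_group = [1, 0, 0, 0, 0]   # responses from the bottom scorers overall

responses = upper_group + lower_group

# Difficulty index: proportion of the whole sample answering correctly.
p = sum(responses) / len(responses)

# Discrimination index: upper-group minus lower-group proportion correct.
d = sum(upper_group) / len(upper_group) - sum(lower_group) / len(lower_group)

print(f"difficulty p = {p:.1f}, discrimination D = {d:.1f}")
```

An item with a moderate p and a clearly positive D is a good candidate for the final version of the test.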
47. Preparation of Final Test
The test will provide useful information about the students' knowledge of the learning
objectives. By treating the questions relating to the various learning objectives as separate
subtests, the evaluator can develop a profile of each student's knowledge of, or skill in,
the objectives.
After the above analysis, the final test is constructed: a suitable format is prepared, norms
are specified, and instructions for examinees are written. A test constructed according to
this procedure will reflect a purpose, an idea of what is good or desirable from the
standpoint of the individual, society, or both.
49. Summary
In this seminar we discussed the meaning, definition, purposes, characteristics,
objectivity, validity, reliability, usability, and construction of tests.
50. Related Research Article
Measuring Pupil Progress: A Comparison of Standardized Achievement Tests and Curriculum-Related Measures
Douglas Marston , Lynn S. Fuchs, Stanley L. Deno
Experimental Teaching Project, Minneapolis Public Schools
First published January 1, 2002 · Research Article
Abstract
In a series of two studies, the relative sensitivity of traditional standardized achievement tests and
alternative curriculum-based measures was assessed. In the first investigation, the magnitude of
student growth over a 10-week period on the two types of instruments was compared in the areas
of reading and written language. The second study was an operational replication employing
different reading tests over a 16-week interval. Results indicated that the curriculum-based
measures were more sensitive to student progress and related more consistently to a criterion
measure of student growth.
51. Conclusion
Tests form the basis of evaluation for all the domains, and are supposed to measure
numerous characteristics and traits of learners. It falls to a good teacher to prepare
a test paper that is free of biases and prejudices. A test should be constructed in a
way that makes it universally applicable and practicable.
By preparing this seminar I gained knowledge and understanding of test attributes and
test construction. This has given me ample confidence to comprehend test outlines and
to design such tests independently in the future.
52. Bibliography
1.) Meaning of Test; No date. Available from:
https://www.google.com/search?q=meaninf+of+test&oq=meaninf+of+test&aqs=chrome..
69i57j0l5.4692j1j7&sourceid=chrome&ie=UTF-8
2.) Jaspreet Kaur Sodhi. Comprehensive Textbook of Nursing Education. 1st Edition.
New Delhi: Jaypee Brothers Medical Publishers; 2017. pp. 195-204.
3.) KP Neerja. Textbook of Communication and Education Technology for Nurses. 1st
Edition. New Delhi: Jaypee Brothers Medical Publishers; 2011. pp. 417-422.
54. Evaluation
1.) Which term best describes the consistency of an assessment measure?
A.) Variance
B.) Reliability
C.) Validity
D.) Correlation
Ans. B.
55. 2.) Mr. Jones, the classroom teacher, administers a test to Joey in September and then again
in October. Joey's scores are one point apart. This test could be said to have:
A.) Split half reliability
B.) Interrater Reliability
C.) Test retest Reliability
D.) None
Ans. C
56. 3.) Mark and Eve collect data on the same student using the same assessment and find
their data is almost exactly the same. It could be said that Mark and Eve have:
A.) Interrater Reliability
B.) Alternate forms
C.) Test retest Reliability
D.) None
Ans. A