Laos Session 3: Principles of Reliability and Validity (EN)
1. Session 3: Principles of Assessment:
Validity and Reliability
Professor Jim Tognolini
2. Introduction to Modern Assessment Theory: A basis for all assessments
During this session we will
• define reliability
• define measurement error
• examine the sources of measurement error
• define validity and identify threats to validity
• build assessment frameworks
• operationalise frameworks with Tables of Specification.
Capacity Development Workshop: Test and
Item Development and Design, Laos,
September 2016
3. Reliability
The reliability of results is the extent to which the results are consistent, or error-free. The concept of reliability is closely associated with the idea of consistency.
Reliability is not an all-or-nothing concept; there can be degrees of reliability.
How similar are results if students are assessed at different times? How similar are results if students are assessed with a different sample of equivalent tasks? How similar are results if essays have been marked by different markers?
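The questions above all come down to correlating two sets of results for the same students. A minimal sketch (with hypothetical scores, not data from the workshop) of that consistency check:

```python
# Sketch: quantifying consistency between two assessment occasions.
# The scores below are hypothetical; a real reliability study would
# use actual student results (test/retest, or marker A vs marker B).
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation: a common index of the consistency
    between two sets of results for the same students."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical: the same five students assessed on two occasions.
occasion_1 = [12, 15, 9, 18, 11]
occasion_2 = [13, 14, 10, 17, 12]
print(round(pearson_r(occasion_1, occasion_2), 2))  # 0.98
```

A correlation near 1 indicates a high degree of consistency; lower values indicate more measurement error.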
5. Sources of Measurement Error
The following are some of the sources of measurement error:
1. Test-taking skills
2. Comprehension of instructions
3. Sampling variance of items
4. Temporary factors such as health, fatigue, motivation and testing conditions
5. Memory fluctuations
6. Marking bias (especially in essays)
7. Guessing
8. Item types
The aim for test developers is to identify sources of measurement error and minimise their impact.
7. Validity
The validity of the results of a test can best be defined as the extent to which the results measure what they purport to measure.
It is the interpretation (including inferences and decisions) that is validated, not the test or the test score.
Messick (1989) also argued that validation can include evaluating the consequences of the test: are the specific benefits likely to be realised?
In 1999 the Standards (AERA, APA and NCME) suggested that validation can be viewed as developing scientifically sound validity arguments to support the intended interpretation of test scores and their relevance to the proposed use.
8. Threats to Validity
1. Factors in the test itself
I. Unclear directions (e.g. how to respond to guessing; how to record answers)
II. Reading vocabulary and sentence structure too difficult
III. Inappropriate level of difficulty of test items (e.g. encourages guessing)
IV. Poorly constructed test items
V. Ambiguity
VI. Test items (tasks) inappropriate for the content being assessed
VII. Test too short
VIII. Improper arrangement of items
IX. Identifiable pattern of answers
2. Factors in test administration and scoring
I. Insufficient time
II. Cheating
III. Unreliable scoring
9. Relationship between Validity and Reliability
Reliability is a necessary but not sufficient condition for validity.
10. Some basic assessment theory
• Validity and reliability are matters of degree, not all-or-nothing properties – the aim is to maximise both
• Validity is paramount
• Ways to minimise threats to validity and reliability:
– breadth of material sampled (increases validity)
– guessing
– quality of items
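On guessing: one conventional way to limit its impact on multiple-choice results (offered here as an illustration, not something the slides prescribe) is the classic correction-for-guessing adjustment:

```python
# Sketch: the classic correction-for-guessing score adjustment.
# With k options per item, a pure guesser expects 1/k right and
# (k-1)/k wrong, so subtracting wrong/(k-1) gives them an
# expected corrected score of zero.
def corrected_score(n_right, n_wrong, n_options):
    """Right minus wrong/(k-1)."""
    return n_right - n_wrong / (n_options - 1)

# Hypothetical student: 30 right, 10 wrong on 5-option items.
print(corrected_score(30, 10, 5))  # 27.5
```

Alternatives such as pitching item difficulty appropriately and writing plausible distractors attack the same threat at the design stage.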
12. Preliminary Questions
• Why are we assessing?
• What are we assessing?
• What is the most appropriate way to assess these outcomes?
13. Assessment framework
Definition of construct
Domains/strands
Sub-domains/sub-strands
Outcomes/content standards
…
14. Example 1 – Mathematics
Construct: Mathematics
Domains/strands: Number, Measurement, Space, Chance
Sub-domains/sub-strands of Number: Addition & subtraction; Multiplication and division; Fractions and decimals
Outcomes/content standards: …
15. Example 1 – Mathematics
Construct: Mathematics; domain/strand: Number

Sub-domain: Addition & subtraction
Outcome/content standard: Students develop facility with number facts and computation with larger numbers in addition and subtraction, and an appreciation of the relationship between those facts.
Progress levels:
• Early Stage 1: Combines, separates and compares collections of objects, describes using everyday language and records using informal methods
• Stage 1: Uses concrete materials and mental strategies for addition and subtraction involving one- and two-digit numbers

Sub-domain: Multiplication and division
Outcome/content standard: Students develop facility with number facts and computation with larger numbers in multiplication and division, and an appreciation of the relationship between those facts.
Progress levels:
• Early Stage 1: Groups and shares collections of objects, describes using everyday language and records using informal methods
• Stage 1: Models and uses strategies for multiplication and division
16. Example 1 – Mathematics
Construct: Mathematics; domain/strand: Number

Sub-domain: Addition & subtraction
Outcome/content standard: Students develop facility with number facts and computation with larger numbers in addition and subtraction, and an appreciation of the relationship between those facts.
Progress levels:
• Stage 2: Uses mental and written strategies for addition and subtraction involving two-, three- and four-digit numbers
• Stage 3: Selects and applies appropriate strategies for addition and subtraction with numbers of any size

Sub-domain: Multiplication and division
Outcome/content standard: Students develop facility with number facts and computation with larger numbers in multiplication and division, and an appreciation of the relationship between those facts.
Progress levels:
• Stage 2: Uses mental and written strategies for multiplication and division
• Stage 3: Selects and applies appropriate strategies for multiplication and division
17. Example developmental continuum for mathematics
18. Assessment framework
Definition of construct
Domains/strands
Sub-domains/sub-strands
Outcomes/content standards
…
19. Example 2 – Scientific literacy
Construct: Scientific literacy
Domains/strands and outcomes/content standards:
• Formulating: formulating or identifying investigable questions and hypotheses, planning investigations and collecting evidence
• Interpreting: interpreting evidence and drawing conclusions, critiquing the trustworthiness of evidence and claims made by others, and communicating findings
• Using: using understandings for describing and explaining natural phenomena, making sense of reports, and for decision-making
20. Example 2 – Scientific literacy
Construct: Scientific literacy

Formulating (formulating or identifying investigable questions and hypotheses, planning investigations and collecting evidence)
• Level 1 – Year 2: Responds to the teacher’s questions, observes and describes
• Level 2 – Year 4: Given a question in a familiar context, identifies a variable to be considered, observes and describes or makes non-standard measurements and limited records of data

Interpreting (interpreting evidence and drawing conclusions, critiquing the trustworthiness of evidence and claims made by others, and communicating findings)
• Level 1 – Year 2: Describes what happened
• Level 2 – Year 4: Makes comparisons between objects or events observed

Using (using understandings for describing and explaining natural phenomena, making sense of reports, and for decision-making)
• Level 1 – Year 2: Describes an aspect or property of an individual object or event that has been experienced or reported
• Level 2 – Year 4: Describes changes to, differences between or properties of objects or events that have been experienced or reported
21. Example developmental continuum for scientific literacy
[Figure: tasks T1–T12 plotted against the three domains (A: Formulating, B: Interpreting, C: Using) across Level 1 – Year 2, Level 2 – Year 4 and Level 3 – Year 6]
22. Building a Table of Specifications
• Prepare a list of learning outcomes – these describe the types of performances the students are expected to demonstrate (e.g. Knows basic terms – “Writes a definition of each term”; “Identifies the term that represents each weather element”; etc.)
• Outline the course content – the content describes the area in which each type of performance is to be demonstrated (e.g. “air pressure”, “wind”, “temperature”, etc.)
• Prepare a chart that relates the relative emphasis of the learning objectives to the content through the number, type and percentage of items.
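The three steps above can be sketched as a small program: a content-by-outcome grid of item counts, with the percentage emphasis derived automatically. (The content areas and counts below are illustrative only; real weights come from the curriculum emphasis.)

```python
# Sketch: a Table of Specifications as a grid of item counts,
# content area -> {learning outcome: number of items}.
# Illustrative numbers only, echoing the weather example above.
spec = {
    "air pressure": {"Knows basic terms": 3, "Applies principles": 2},
    "wind":         {"Knows basic terms": 2, "Applies principles": 3},
    "temperature":  {"Knows basic terms": 3, "Applies principles": 2},
}

total = sum(n for row in spec.values() for n in row.values())
for content, row in spec.items():
    row_total = sum(row.values())
    print(f"{content:<12} {row_total:>2} items  {100 * row_total / total:.0f}%")
print(f"{'total':<12} {total:>2} items")
```

The same structure extends to marks and item types per cell, as the worked tables on the following slides show.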
23. Table of Specifications

Content Area                      Basic Skills  Application  Problem Solving  Total Percentage
Fractions                              5             5              5               15
Mixed numbers                          5             5             10               20
Decimals                               5            15             10               30
Decimal to fraction conversions        5            15             15               35
Total percentage points               20            40             40              100
24. Table of Specifications – English

CONTENT    Identifies      Interprets        Infers                         Weightage  Marks
Plot       1 SA (2 marks)  1 Essay (3 marks) 1 SA (2 marks)                 28%        7 marks
Character  1 SA (2 marks)  1 Essay (4 marks) –                              24%        6 marks
Crisis     –               –                 1 Performance task (8 marks)   32%        8 marks
Language   1 SA (1 mark)   1 SA (1 mark)     1 SA (2 marks)                 16%        4 marks
Weightage  20%             32%               48%                            100%
Marks      5 marks         8 marks           12 marks                                  25 marks
25. Table of Specifications – Geography

CONTENT             Basic map skills & understanding  Application              Extended understanding  Weightage  Marks
Physical landforms  1 SA (2 marks)                    1 Essay (6 marks)        2 SAs (4 marks)         24%        12 marks
Location            4 SAs (8 marks)                   –                        –                       16%        8 marks
Climate             1 SA (2 marks)                    1 Perf. task (16 marks)  1 SA (2 marks)          40%        20 marks
Vegetation          2 SAs (4 marks)                   –                        1 Essay (6 marks)       20%        10 marks
Weightage           32%                               44%                      24%                     100%
Marks               16 marks                          22 marks                 12 marks                50 marks
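A useful sanity check on tables like the two above: each row's weightage should simply be its marks as a percentage of the paper total. Using the Geography figures from the table above (50 marks in total):

```python
# Sketch: checking that the Weightage column of the Geography
# Table of Specifications is consistent with the marks per row.
rows = {"Physical landforms": 12, "Location": 8, "Climate": 20, "Vegetation": 10}
total = sum(rows.values())  # 50 marks for the whole paper
for content, marks in rows.items():
    print(f"{content}: {100 * marks // total}%")  # 24%, 16%, 40%, 20%
```

Any mismatch between a stated weightage and the computed percentage signals either a marks error or a weightage error in the table.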
26. Constructing a test that operationally defines the scale
Test constructors are challenged by the need to
1. define items that enable students at different stages along the scale to demonstrate that they have enough of the subject (construct) to correctly answer the item;
2. ensure that the items are assessing the outcomes for the particular location on the scale;
3. ensure that, as the items are being written, the ones intended to be located further towards the top of the scale are, in fact, more demanding than those located towards the bottom of the scale; and
4. ensure that the reason the items are more demanding is a function of the property/variable being measured and not a function of some other extraneous feature (validity).
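Point 3 above can be checked empirically once trial data are available: items intended to sit higher on the scale should have lower facility (the proportion of students answering correctly). A minimal sketch with hypothetical item names, positions and facilities:

```python
# Sketch: flag items whose observed facility is out of order with
# their intended position on the scale. All values are hypothetical.
items = [  # (item, intended position on the scale: low -> high, facility)
    ("Q1", 1, 0.85),
    ("Q2", 2, 0.70),
    ("Q3", 3, 0.74),  # more students succeeded than on Q2: out of order
    ("Q4", 4, 0.40),
]
flagged = [
    b_name
    for (_, a_pos, a_fac), (b_name, b_pos, b_fac) in zip(items, items[1:])
    if b_pos > a_pos and b_fac > a_fac  # intended harder, yet easier in practice
]
print(flagged)  # ['Q3']
```

A flagged item is a candidate for point 4: is it easy (or hard) for reasons unrelated to the construct being measured?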
27. Assessment Literacy: Question 1
What is the most important thing to consider when selecting a method for assessing performance against learning objectives?
• how easy the assessment is to score
• how easy the assessment is to prepare
• how useful the assessment is at assessing the learning objective
• how well the assessment is accepted by the school administration
Standards, Standard Setting and Maintenance, March 2015
28. Assessment Literacy: Question 2
What does it mean when you are told that a test is “reliable”?
A. student scores from the assessment can be used for a large number of decisions
B. students who take the same test are likely to get similar scores next time
C. the test score accurately assesses the content
D. the test score is more valid than teacher-based assessments
30. Assessment Literacy: Question 3
Class teachers in a school want to assess their students’ understanding of the method for solving problems that they have been teaching. Which one of the following would be the most appropriate method for seeing whether the teaching had been effective? Justify your answer.
• select a problem-solving book with a problem-solving test already in it
• develop an assessment method consistent with what has actually been taught in class
• select a problem-solving test (like the PSA) that will give a problem-solving mark
• select an assessment that measures students’ attitudes to problem-solving strategies
31. The following Table of Specifications for a Mathematics assessment was prepared by the classroom teacher. Use this Table to answer Questions 4 and 5. Note: the numbers in the cells refer to the number of items.

Content Area                   Knowledge  Comprehension  Application  Synthesis  Analysis  Total
Place values and number sense      1           2              2           1          1        7
Space                              2           3              3           2          0       10
Addition and subtraction           2           4              4           5          1       16
Multiplication & division          1           3              2           2          2       10
Measurement                        2           2              3           3          3       13
Total                              8          14             14          13          7       56
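Before using a Table of Specifications like the one above, it is worth checking its internal consistency: row totals, column totals and the grand total must agree. Using the cell counts from the table:

```python
# Sketch: verifying the row/column totals of the Mathematics
# Table of Specifications above. Columns are, in order:
# Knowledge, Comprehension, Application, Synthesis, Analysis.
items = {
    "Place values and number sense": [1, 2, 2, 1, 1],
    "Space":                         [2, 3, 3, 2, 0],
    "Addition and subtraction":      [2, 4, 4, 5, 1],
    "Multiplication & division":     [1, 3, 2, 2, 2],
    "Measurement":                   [2, 2, 3, 3, 3],
}
row_totals = {content: sum(counts) for content, counts in items.items()}
col_totals = [sum(col) for col in zip(*items.values())]
print(col_totals)       # [8, 14, 14, 13, 7]
print(sum(col_totals))  # 56
```

The same totals can then be used to answer questions about emphasis across content areas or cognitive levels.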
32. Assessment Literacy: Question 4
How many items did the teacher aim to use to assess higher-order thinking skills, where higher-order thinking skills are those assessed by items at or above Application in Bloom’s Taxonomy?
A. 14
B. 34
C. 7
D. None of the above
33. Assessment Literacy: Question 5
Which one of the following statements BEST DEFINES a Table of Specifications?
A. It ensures that the total number of marks for the assessment will equal 100.
B. It classifies educational goals, learning objectives and standards.
C. It relates the content to the cognitive level of the learning objectives for the purpose of improving the validity of the instrument.
D. It is a table that is used by teachers to reliably assess students.