- 1. © 2013 Springer Publishing Company, LLC.
Chapter 2
Qualities of Effective Assessment Procedures
Oermann & Gaberson
Evaluation and Testing in Nursing Education, 4th edition
- 2. General Criteria for Effective Assessment Procedures
♦ Produce results that can be used to make
appropriate inferences about learners’
knowledge and abilities
– Important educational decisions based on
such inferences
♦ Practical and easy to use
- 3. Guiding Questions
♦ To what extent will the interpretation of the
scores be appropriate, meaningful, and useful
for their intended application?
♦ What are the consequences of how the results
are used and interpreted?
- 4. Assessment Validity
♦ Concept has changed over time
♦ Current philosophy
– Meaningfulness of the interpretations that
teachers make of assessment results
– Adequacy and appropriateness of inferences
about scores and how results are used
– Emphasis on consequences (intended and
unintended) of test use
- 5. Assessment Validity (cont’d)
♦ Not a static property of the test itself
♦ Not an either/or judgment
– Degrees of validity depending on purpose of test
and how scores will be used
- 6. Assessment Validity (cont’d)
♦ Unitary concept
– Variety of sources of evidence to support the
validity of the interpretation and use of
assessment results
– Four major considerations for validation
• Content
• Construct
• Assessment-criterion relationships
• Consequences
- 7. Content Considerations
♦ Goal of content validation
– Determine the degree to which the assessment
tasks accurately represent the domain of content
or abilities about which the teacher wants to
interpret assessment results
– A test is only a sample of the universe of possible
assessment tasks
– “Face validity” is insufficient evidence of content
representativeness
- 8. Content Considerations (cont’d)
♦ Start by defining the universe of content
– Should be related to the purpose for which
the test will be used
♦ Write or select test items that satisfactorily
represent the desired content domain
– A test blueprint or table of specifications
documents this sampling
– Also important when selecting a published test
- 9. Content Considerations (cont’d)
♦ Assessed by content-domain experts
– Determine if assessment tasks represent the
• content domain (as specified on test blueprint)
• learning outcomes
– Trustworthiness of this evidence is based on
estimation of rater reliability
• How closely do the judgments of multiple
experts agree?
- 10. Construct Considerations
♦ “Umbrella” concept for all types of
assessment validation
♦ Goes beyond content considerations
– Used to make inferences from assessment results
to more general abilities (e.g., clinical reasoning)
– What construct is the assessment intended to
measure?
- 11. Construct Considerations (cont’d)
♦ Construct
– Characteristic assumed to exist because it explains some
observed behavior
– Cannot be observed directly—inferred from performance
♦ Construct validation
– Determining the extent to which assessment results can
be interpreted in terms of the construct
♦ Two central elements
– Construct representation
– Construct relevance
- 12. Construct Considerations (cont’d)
♦ Construct representation
– Extent to which important elements of the
construct are represented in the assessment
♦ Construct relevance
– Extent to which the assessment focuses only
on relevant elements of the construct
– Omits factors that are unrelated or irrelevant
to the construct (e.g., writing ability, English
language literacy)
- 13. Construct Considerations (cont’d)
♦ Methods used in construct validation
– Define the domain to be measured
– Analyze the process of responding to tasks
required by the assessment
– Compare assessment results of known groups
– Compare assessment results before and after a
learning activity
– Correlate assessment results with other measures
- 14. Assessment-Criterion Relationship Considerations
♦ Predictive validation
– Focuses on predicting future performance (the
criterion) based on current assessment results
♦ Concurrent validation
– Uses assessment results to estimate
performance on another assessment (the
criterion measure) at the same time
– Not widely used for teacher-made assessments
- 15. Assessment-Criterion Relationship Considerations (cont’d)
♦ Relationship between assessment scores and
criterion-measure scores usually expressed as
a correlation coefficient (see the sketch below)
♦ Teacher who uses the test must judge what
magnitude of correlation is adequate for the
intended use of the assessment
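A minimal sketch of how such a validity coefficient might be computed, assuming hypothetical score lists (Python 3.10+ for statistics.correlation):

    from statistics import correlation

    # Hypothetical scores for six students on a teacher-made test
    # and on the criterion measure
    assessment_scores = [72, 85, 90, 64, 78, 88]
    criterion_scores = [70, 82, 94, 60, 75, 91]

    # Pearson correlation between the two sets of scores
    r = correlation(assessment_scores, criterion_scores)
    print(f"validity coefficient r = {r:.2f}")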
- 16. Consideration of Consequences
♦ Assessment has intended and unintended
consequences
♦ Concept of validity includes consideration of
the consequences of assessment use and how
results are interpreted by students, teachers,
and other stakeholders
- 17. Influences on Validity
♦ Characteristics of the assessment
– Examples: clarity of directions, number of items,
test construction errors
♦ Assessment administration and scoring factors
– Examples: cheating, scoring errors, time limits
♦ Student characteristics
– Examples: test anxiety, motivation
- 18. Reliability
♦ Consistency of test scores
♦ Extent to which test scores are accurate,
error-free, and stable
♦ Reproducibility and generalizability of
test scores
♦ Necessary but insufficient condition
for validity
- 19. Reliability (cont’d)
♦ Sources of inconsistency
– Instability of the behavior being measured
– Sample of tasks varies from one assessment to
another
– Assessment conditions vary significantly
– Scoring procedures are inconsistent
♦ These and other factors introduce an
unknown amount of error into every
measurement
- 20. Reliability (cont’d)
♦ Obtained score
– The number of correct answers
♦ True score
– Hypothetical
– Cannot be measured directly
– Represents what the student actually knows
♦ Error score
– Difference between true score and obtained score
– Cannot be measured directly
– Affects measurement reliability
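In classical test theory these definitions are conventionally summarized as X = T + E, where X is the obtained score, T the true score, and E the error score. With illustrative numbers only: a student whose true score is 40 but who obtains 38 has an error score of E = 38 - 40 = -2.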
- 21. Reliability (cont’d)
♦ Methods of determining assessment reliability
estimate how much measurement error is
present
♦ When assessment results are reasonably
consistent, measurement error ↓ and
reliability ↑
- 22. Reliability (cont’d)
♦ Reliability pertains to assessment results, not
to the assessment instrument
♦ A reliability estimate always refers to a
particular type of consistency
♦ A reliability estimate is always represented by
a statistical value (reliability coefficient or
standard error of measurement)
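For reference, the standard error of measurement is conventionally derived from the reliability coefficient as SEM = SD × √(1 − r). With illustrative numbers only: a test with a standard deviation of 5 and a reliability coefficient of .91 yields SEM = 5 × √0.09 = 1.5.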
- 23. Methods of Estimating Reliability
♦ Measures of stability
– Indicates whether students would achieve similar
scores if they took the same assessment at
another time—test-retest procedure
– Appropriate when the trait being measured is
expected to be stable over time
– Limited usefulness for teacher-made assessments,
but an important consideration when selecting
standardized tests
- 24. Methods of Estimating Reliability (cont’d)
♦ Measures of equivalence
– Use of two or more forms of the same
assessment, based on the same blueprint
– Both forms administered to the same group of
students in close succession; resulting scores are
correlated
– High reliability coefficient indicates that the forms
sample the domain equally well
– Widely used in standardized testing, but not
practical for teacher-made assessments
- 25. Methods of Estimating Reliability (cont’d)
♦ Measures of internal consistency—split-half
methods
– Used with a set of scores from only one
administration of a single assessment: Divide the
assessment into two equal subtests, score
subtests separately, correlate the two sets of
subscores
– Underestimates the true reliability of the scores
produced by the whole assessment—corrected with the
Spearman-Brown prophecy formula (see the sketch below)
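A minimal sketch of the split-half procedure with the Spearman-Brown correction, assuming a hypothetical 0/1 item matrix (rows are students, columns are items; Python 3.10+):

    from statistics import correlation

    # Hypothetical dichotomous item scores (1 = correct, 0 = incorrect)
    items = [
        [1, 0, 1, 1, 0, 1, 1, 0],
        [1, 1, 1, 0, 1, 1, 0, 1],
        [0, 0, 1, 0, 0, 1, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 1, 0, 0, 1, 0, 1, 0],
    ]

    # Split into odd- and even-numbered items and score each half-test
    odd_totals = [sum(row[0::2]) for row in items]
    even_totals = [sum(row[1::2]) for row in items]
    r_half = correlation(odd_totals, even_totals)

    # Spearman-Brown prophecy formula estimates full-length reliability
    r_full = (2 * r_half) / (1 + r_half)
    print(f"half-test r = {r_half:.2f}, corrected r = {r_full:.2f}")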
- 26. Methods of Estimating Reliability (cont’d)
♦ Measures of internal consistency—coefficient
alpha
– Extent to which the assessment tasks measure
similar characteristics
– Kuder-Richardson formulas are a specific type of
coefficient alpha (see the KR-20 sketch below)
• Require dichotomously scored assessment tasks
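A minimal sketch of KR-20, one of the Kuder-Richardson formulas, assuming a hypothetical 0/1 item matrix (rows are students, columns are items):

    from statistics import pvariance

    # Hypothetical dichotomous item scores (1 = correct, 0 = incorrect)
    items = [
        [1, 0, 1, 1, 0],
        [1, 1, 1, 0, 1],
        [0, 0, 1, 0, 0],
        [1, 1, 1, 1, 1],
        [0, 1, 0, 0, 1],
    ]

    k = len(items[0])                     # number of items
    totals = [sum(row) for row in items]  # each student's total score

    # Sum of p * q across items, where p = proportion answering correctly
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in items) / len(items)
        pq_sum += p * (1 - p)

    # KR-20 = k / (k - 1) * (1 - sum(pq) / variance of total scores)
    kr20 = (k / (k - 1)) * (1 - pq_sum / pvariance(totals))
    print(f"KR-20 = {kr20:.2f}")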
- 27. Methods of Estimating Reliability (cont’d)
♦ Measures of consistency of ratings
– Determine if same scores would have been obtained if a
different person had scored the assessment or judged the
performance
– Two equally qualified persons score each student’s paper
or rate each student’s performance; two scores are
compared
– Produces a percentage of agreement or an index of scorer
consistency (correlation); see the sketch below
– Interrater consistency facilitated by the use of scoring
rubrics and training of raters
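A minimal sketch of a percentage-of-agreement calculation, assuming hypothetical pass/fail judgments from two raters of the same six performances:

    # Hypothetical judgments from two equally qualified raters
    rater_a = ["pass", "pass", "fail", "pass", "fail", "pass"]
    rater_b = ["pass", "fail", "fail", "pass", "fail", "pass"]

    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    percent_agreement = 100 * agreements / len(rater_a)
    print(f"agreement = {percent_agreement:.0f}%")  # 5 of 6 -> 83%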
- 28. Influences on Reliability of Scores
♦ Assessment-related factors
– Length of the test
• In general, more assessment tasks (e.g., test items) → greater
score reliability
– Homogeneity of assessment tasks
• Score reliability enhanced by homogeneity of content covered by
the assessment
– Item difficulty and discrimination ability (see the sketch below)
• Moderately difficult items, good discrimination between high and
low achievers, and absence of technical errors → greater score
reliability
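A minimal sketch of these two item-analysis statistics, assuming a hypothetical 0/1 response matrix with students already sorted from highest to lowest total score (discrimination here is the upper-half minus lower-half proportion correct):

    # Hypothetical item responses (1 = correct), one row per student
    items = [
        [1, 1, 1, 1],  # highest scorer
        [1, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],  # lowest scorer
    ]

    n = len(items)
    half = n // 2
    for i in range(len(items[0])):
        responses = [row[i] for row in items]
        difficulty = sum(responses) / n          # proportion correct (p)
        top = sum(responses[:half]) / half       # upper group
        bottom = sum(responses[-half:]) / half   # lower group
        print(f"item {i + 1}: p = {difficulty:.2f}, D = {top - bottom:+.2f}")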
- 29. Influences on Reliability of Scores (cont’d)
♦ Student-related factors
– Heterogeneity of the student group
• In general, increased range of ability in the group of students →
greater score reliability
– Testwiseness
• Student with test-taking skills and experience may obtain a higher
score than true ability would predict
– Motivation
• Influences individual students differently
• Scores of poorly motivated students may not accurately represent
their actual achievement levels
- 30. Influences on Reliability of Scores (cont’d)
♦ Assessment administration conditions
– Time limits
• Inadequate time can lower the reliability of scores
• Some students who know the content well may be
unable to respond to all of the items
– Cheating
• Contributes random errors to assessment scores
• Raises offenders’ observed scores above their
true scores
- 31. Practicality (Usability)
♦ A quality of the assessment instrument itself
and its administration procedures
♦ Qualities of practical assessments
– Easy to administer and score
– Do not take too much time away from other
instructional activities
– Have reasonable resource requirements
- 32. Practicality (Usability; cont’d)
♦ Practicality criteria
– Easy to construct and use
– Reasonable time requirements for administering
and scoring the assessment and interpreting
results
– Reasonable costs associated with assessment
construction, administration, and scoring
– Assessment results can be interpreted easily and
accurately by those who will use them