The document discusses the process of test development, which comprises five stages: test conceptualization, test construction, test tryout, item analysis, and test revision. It details each stage, including writing test items, scoring items, item analysis to identify good items, and revising tests.
Test-development-final-REPPORT.pptx
1.
2. Test development
•In this context, test
development is an umbrella
term for all that goes into the
process of creating a test.
3. The process of developing a test occurs in five stages:
• 1. Test conceptualization: the idea for a
test is conceived (an idea is planned out for
what kind of good test to build).
• 2. Test construction: the stage of test
development that entails writing test items (or re-writing
or revising existing items), as well as formatting items,
setting scoring rules, and otherwise designing and
building a test.
4. • 3. Test tryout: the test is administered to a representative
sample of testtakers. (Once we have constructed our
test, we try it out on a sample group of test takers.)
• 4. Item analysis: the data from the tryout are collected, and
testtakers' performance on the test as a whole and on each
item is analyzed using statistical
procedures. (From the items we wrote for the
test, we gather the scores or results, then
analyze and evaluate them.)
5. • 5. Test revision: refers to action taken to modify a
test's content or format for the purpose of improving the
test's effectiveness as a tool of measurement.
• (Once we have the results of our item
analysis, we then modify the test if anything needs
changing, for example its content or format.)
6. Test conceptualization
• The beginnings of any published test can probably be
traced to thoughts—self-talk, in behavioral terms.
• The test developer says to himself or herself something
like:
• What would be a good test that could help society at
large? Who are the people my test is for? What would be a
good objective for the test? Is my test valid and reliable? etc.
7. Test conceptualization
• Norm-Referenced Test
• report whether test takers performed better or worse than
the NORMS (the common, normal, or average
reference group), which is determined by comparing scores
against the performance results of a statistically selected
group of test takers, typically of the same age or grade
level, who have already taken the exam.
• (My score, result, or performance is compared with the
scores, results, or performance of the norm group, i.e.,
everyone who took the test.)
8. The purpose of norm-referenced tests
is to rank individuals in relation to
others of a similar representative group.
Norm-referenced tests are used for
many purposes such as college
entrance (the SAT and ACT) and IQ
tests.
(Meaning, all of us test
takers can score around the
average, and from there we can
see who got the highest score
and who got the lowest, so our
performance becomes apparent.)
9. Criterion Referenced Test
• criterion-referenced tests measure a test taker's
performance compared to a specific set of standards or
criteria within the test.
• (Meaning, my performance or result is measured and
compared against the standard score or criteria of that
test.)
Test conceptualization
10.
11. • These tests can also have cut scores that determine
whether a test-taker has passed or failed the test or has
basic, proficient, or advanced skills.
• examples:
• Licensure to practice medicine, licensure as a
psychologist, or taking the Civil Service exam.
12.
13. Pilot Work
• also referred to as a pilot study or pilot research.
• the preliminary research surrounding the creation of a
prototype of the test. It is a necessity when constructing a
test or other measure you want to create.
• In pilot work, the test developer typically attempts to
determine how best to measure a targeted construct.
• (For example: before I finalize the soft drink or coffee
I made, I first offer free tastes to check whether it is
okay or not.)
14.
15. Test construction
• the set of activities involved in developing and evaluating
a test of some psychological function.
• is a stage in the process of test development that entails
writing test items , as well as formatting items, setting
scoring rules, and otherwise designing and building a test.
16. • Scaling = the act of arranging in a graduated series
(grading, ordering; the act of putting things in a
sequential arrangement). Scaled scores can facilitate
feedback about how much of the targeted characteristic
the testtaker presumably possesses.
• Numbers are assigned to responses so that a test score
can be calculated through SCALING the test items.
17. •Scaling may be defined
as the process of setting
rules for assigning
numbers in
measurement.
18. • L. L. Thurstone is credited as being at the
forefront of efforts to develop sound
scaling methods.
• He adapted psychophysical
scaling methods to the study of
psychological variables such as
attitudes and values.
• The Thurstone scale was the first formal
technique to measure an attitude.
• He was one of the primary architects of
modern factor analysis.
19. Attitudes and Values of Human
• In the example given previously, participants could respond
on scales anchored by adjective pairs such as:
• like–dislike
• wise–foolish
• beneficial–harmful
• enjoyable–not enjoyable
• good–bad
• pleasant–unpleasant
20.
21. Unidimensional Scaling
• the quality of measuring a single construct, trait, or other
attribute.
• For example, a unidimensional personality scale, attitude
scale, or other scale would contain items related only to
the respective concept of interest
• Ex: self-esteem, which is assumed to have a single
dimension going from low to high.
22.
23.
24. • The three most popular unidimensional scaling methods
• (1) Thurstone’s equal-appearing scaling,
• (2) Likert’s summative scaling
• (3) Guttman’s cumulative scaling
25. Multidimensional Scaling
• the quality of a scale, test, or so forth that is capable of
measuring more than one dimension of a construct.
• employ different items or tests to measure each
dimension of the construct separately, and then combine
the scores on each dimension to create an overall
measure of the multidimensional construct
26. • For example, academic aptitude can be measured using two
separate tests of students' mathematical and verbal
ability, and then combining these scores to create an
overall measure of academic aptitude.
• (e.g., academic aptitude, intelligence)
29. • Thurstone was one of the first and most productive
scaling theorists. He actually invented three different
methods for developing a unidimensional scale:
• the method of paired comparisons
• the method of equal-appearing intervals,
• the method of successive intervals.
30. Method of Paired Comparison(Ordinal or Ranking) by
L.Thurstone
• Testtakers are presented with pairs of stimuli (two
photographs, two objects, two statements), which they are
asked to compare.
31.
32. Method of equal-appearing intervals
• One scaling method used to obtain data that are presumed
to be interval.
• used to measure people's attitudes at an interval level.
• Each statement indicates the attitude in a slightly
different way.
• Statements should include positive, negative, and neutral items.
33. • Example: if measuring attitude toward life satisfaction:
• I found it easy to get what I wanted in life.
• I had trouble finding a job.
• It's okay to live a simple life.
• I feel like my life is miserable.
• Respondents can answer Agree, Neutral, or Disagree.
34. Method of successive intervals
• a psychological scaling procedure in which stimuli are
classified into successive intervals according to the
degree of some defined attribute which they are judged to
possess. (The test taker's responses need to be
transformed onto some scale.)
35. • I found it easy to get what I
wanted in life.
• I had trouble finding a job.
• It's okay to live a simple
life.
• I feel like my life is
miserable.
• Respondents can answer Agree,
Neutral, or Disagree.
37. Types of Scaling or Scaling Methods
• Rating Scales(Ordinal or Ranking)
• which can be defined as a grouping of words, statements,
or symbols on which judgments of the strength of a
particular trait, attitude, or emotion are indicated by the
testtaker.
• Rating scales can be used to record judgments of
oneself, others, experiences, or objects, and they can
take several forms
38. • A rating scale can serve as a
closed-ended survey question
used to represent respondent
feedback in a comparative form
for particular
features/products/services.
• It is also called a summative
scale.
39.
40. • Likert Scale (Ordinal or Ranking)
• usually used to scale attitude (a unidimensional scale that
researchers use to collect respondents' attitudes and
opinions).
• Each item presents the testtaker with five alternative
responses (sometimes seven), usually on an
agree–disagree or approve–disapprove continuum.
• involves a series of statements that respondents may choose
from in order to rate their responses to evaluative questions.
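The summative logic behind Likert scaling can be sketched in a few lines of code. This is a minimal illustration, not part of the original slides: the item names, the 1-to-5 coding, and the reverse-keying of item q4 are all assumptions made for the example.

```python
# Minimal sketch of summative (Likert) scoring with hypothetical items.
# Each response is coded 1 (strongly disagree) to 5 (strongly agree);
# reverse-keyed items are flipped before summing.

LIKERT_MAX = 5  # five alternative responses, per the slide

def score_likert(responses, reverse_keyed=()):
    """Sum item codes into a total scale score.

    responses: dict mapping item id -> coded response (1..5)
    reverse_keyed: item ids whose wording runs opposite the trait,
        so their codes are flipped (1 <-> 5, 2 <-> 4).
    """
    total = 0
    for item, code in responses.items():
        if item in reverse_keyed:
            code = LIKERT_MAX + 1 - code
        total += code
    return total

# A respondent answers four hypothetical attitude items;
# q4 is worded negatively, so its code of 1 flips to 5.
answers = {"q1": 5, "q2": 4, "q3": 2, "q4": 1}
print(score_likert(answers, reverse_keyed={"q4"}))  # 5 + 4 + 2 + 5 = 16
```

The total is the scale score; this is why the Likert approach is also called summative scaling.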
41.
42.
43. Method of Paired Comparison(Ordinal or Ranking) by
L.Thurstone
• Testtakers are presented with pairs of stimuli (two
photographs, two objects, two statements), which they are
asked to compare.
44.
45. • Guttman Scale (Ordinal or Ranking)
• Items on it range sequentially from weaker to stronger
expressions of the attitude, belief, or feeling being
measured.
the purpose of Guttman scaling is to establish a one-
dimensional continuum for a concept you wish to measure.
The resulting data are then analyzed by
means of scalogram analysis, an item-analysis procedure
and approach to test development that
involves a graphic mapping of a testtaker’s responses.
46. Guttman Scale
• In the example set of items or statements
above, a respondent who agrees with
any specific question in the list will also
agree with all previous questions.
• Also known as CUMULATIVE
SCALING; the associated item-analysis
procedure is SCALOGRAM
ANALYSIS.
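The cumulative property described above can be checked mechanically: in a perfect Guttman pattern, once a respondent rejects an item, they should not endorse any stronger item after it. The sketch below is an illustration with made-up 0/1 endorsement patterns (weakest item first), not a full scalogram analysis.

```python
# Minimal sketch of checking Guttman (cumulative) response patterns.
# Items are ordered from the weakest to the strongest expression of
# the attitude; in a perfect pattern, endorsing any item implies
# endorsing every weaker item before it.

def is_cumulative(pattern):
    """True if a 0/1 endorsement pattern (weakest item first)
    is a perfect Guttman pattern: all 1s come before all 0s."""
    seen_zero = False
    for endorsed in pattern:
        if endorsed and seen_zero:
            return False  # endorsed a stronger item after rejecting a weaker one
        if not endorsed:
            seen_zero = True
    return True

print(is_cumulative([1, 1, 1, 0]))  # True: perfect cumulative pattern
print(is_cumulative([1, 0, 1, 0]))  # False: an error pattern that
# scalogram analysis would flag
```

Counting such error patterns across testtakers is the starting point for judging how well the items form a one-dimensional continuum.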
47. WRITING ITEMS
• Item format: variables such as form, plan, structure,
arrangement, and layout of individual test items.
Two types of item format:
• 1. Selected-response format: requires testtakers to select
a response from a set of alternative responses.
• ex: multiple-choice, matching items, and true–false
(binary-choice) items.
48. • 2. Constructed-response format: requires testtakers to
supply or to create the correct answer, not merely to
select it.
• ex: essay items, completion items or short-answer
items.
49. • Item pool: the reservoir or
well from which items will or will not be
drawn for the final version of the test.
50. • Writing items for computer administration
Two advantages of digital media:
• Item bank: a relatively large and easily accessible
collection of test questions.
• Item branching: the ability to individualize testing through a
technique in which the computer tailors the content and
order of presentation of test items on the basis of responses.
51. • Computerized adaptive testing (CAT)
• an interactive, computer-administered test-taking process
wherein items presented to the testtaker are based in part on the
testtaker's performance on previous items.
• CAT tends to reduce floor effects and ceiling effects.
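The tailoring idea behind CAT can be sketched very roughly: after each response, nudge a provisional ability level up or down and present the unused item whose difficulty best matches it. Real CAT systems estimate ability with IRT models; everything below (the item bank, the difficulty scale where higher means harder, the step size) is a simplifying assumption for illustration only.

```python
# Very simplified sketch of computerized adaptive testing (CAT).
# A provisional ability level moves up after a correct answer and
# down after an incorrect one; the next item is always the unused
# item whose difficulty is closest to the current ability.

def next_item(ability, item_difficulties, used):
    """Pick the unused item whose difficulty best matches ability."""
    candidates = [i for i in item_difficulties if i not in used]
    return min(candidates, key=lambda i: abs(item_difficulties[i] - ability))

def run_cat(item_difficulties, answer_fn, n_items=3, ability=0.0, step=0.5):
    used, administered = set(), []
    for _ in range(n_items):
        item = next_item(ability, item_difficulties, used)
        used.add(item)
        administered.append(item)
        # tailor the next item on the basis of this response
        ability += step if answer_fn(item) else -step
    return administered, ability

# Hypothetical bank: item id -> difficulty (higher = harder).
bank = {"a": -1.0, "b": 0.0, "c": 1.0, "d": 2.0}
items, final = run_cat(bank, answer_fn=lambda item: True)  # all correct
print(items)  # ["b", "c", "d"]: the test climbs toward harder items
```

Because the testtaker keeps answering correctly, the sketch never wastes time on the easiest item, which is the intuition behind CAT reducing floor and ceiling effects.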
52.
53. Scoring Items
Three different test scoring models:
• Cumulative scoring model: the test score or result
is presumed to represent the strength of the
targeted ability, trait, or state.
• example: a traditional test.
54.
55. Test of Interpersonal Ability    Agree   Sometimes   Disagree
I like to talk to people              yes
I like to go outside                  yes
I like to play with others            yes
Result: this pattern of responses indicates that you have
Interpersonal Ability.
56. • Class or Categorical Scoring
• This approach is used by some diagnostic systems
wherein individuals must exhibit a certain number of
symptoms to qualify for a specific diagnosis.
• example: the DSM-5, where psychological
disorders are grouped by category.
57. Ipsative scoring
• comparing a testtaker's score on one scale
(e.g., Introvert) within a test to another scale
(e.g., Extravert) within that same test.
• Example: the EPPS (Edwards Personal
Preference Schedule), a kind of personality test.
58. • EPPS-like forced-choice item, to which the
respondent would indicate which is "more
true" of themselves:
• I feel depressed when I fail at something.
• I feel nervous when giving a talk before a
group.
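Ipsative scoring of forced-choice items reduces to tallying, within one testtaker, how often each scale is chosen as "more true." The sketch below is a simplified illustration: the scale names and the list of picks are hypothetical, and the real EPPS scoring procedure is more elaborate.

```python
# Minimal sketch of ipsative scoring with hypothetical forced-choice
# items. Each item pits one scale against another; the chosen scale
# earns a point, and scores are compared WITHIN the same testtaker,
# not against other testtakers.

from collections import Counter

def ipsative_scores(choices):
    """choices: list of scale names the testtaker picked as
    'more true' of themselves, one per forced-choice item."""
    return Counter(choices)

# Six hypothetical items pitting introversion against extraversion:
picks = ["introvert", "introvert", "extravert", "introvert",
         "introvert", "extravert"]
scores = ipsative_scores(picks)
print(scores["introvert"], scores["extravert"])  # 4 2
# Within this person, the introversion scale outweighs extraversion.
```

Note that the two counts always sum to the number of items, which is why ipsative scores say nothing about how this testtaker compares with others.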
59. TEST TRYOUT
• From the pool of items from which the final version of the
test will be developed, the test developer will try out the test.
• (The question: how many people are needed for the tryout?)
• An informal rule of thumb is that there should be no fewer
than 5 subjects, and preferably as many as 10, for each
item on the test (the more the better).
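The rule of thumb above translates directly into a quick calculation; the 30-item draft test in the example is hypothetical.

```python
# The tryout rule of thumb as a calculation: no fewer than 5
# subjects per item, and preferably as many as 10.

def tryout_sample_range(n_items, low=5, high=10):
    """Return (minimum, preferred) tryout sample sizes."""
    return n_items * low, n_items * high

minimum, preferred = tryout_sample_range(30)  # a hypothetical 30-item draft
print(minimum, preferred)  # 150 300
```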
60.
61. • Pseudobulbar affect (PBA), is a
neurological disorder characterized by
frequent and involuntary outbursts of
laughing or crying that may or may not be
appropriate to the situation
63. Good Item?
• reliable and valid
• helps to discriminate between testtakers:
• Item 1 is answered correctly (or in an expected manner)
by high scorers on the test as a whole.
• Item 1 is answered incorrectly by low scorers on the test
as a whole.
• An item that is answered incorrectly by high scorers on
the test as a whole is probably not a good item.
64. How does a test developer identify good items?
• After the first draft of the test has been administered
to a representative group of examinees, the test
developer analyzes test scores and responses to
individual items. This process is called item analysis.
65. ITEM ANALYSIS
• Item analysis refers to statistical methods used for
selecting items for inclusion in a psychological test.
66. Index of the item's difficulty
• Item-Difficulty Index (analyzed per item)
• The item difficulty is simply the proportion of students
who answer an item correctly. The item-difficulty index
ranges from 0 to 1.0 (or 0% to 100%); the higher the
value, the easier the question.
• note: THE LARGER THE ITEM-DIFFICULTY INDEX,
THE EASIER THE ITEM, and vice versa:
• THE SMALLER THE ITEM-DIFFICULTY INDEX, THE
MORE DIFFICULT THE ITEM.
67. • Individual items on the test should have
item-difficulty indexes ranging only from .30 to .80.
• (The average item difficulty is .50.)
• If the item-difficulty index is below .50
(e.g., .30, .40, .45), the item is Difficult.
• If the item-difficulty index is above .50
(e.g., .60, .65, .70, .76, .80), the item is Easy.
68. • If the item difficulty is .75, is the item easy or difficult?
• If the item difficulty is .30, is the item easy or difficult?
• If the item difficulty is .50?
• If the item difficulty is .40, is the item easy or difficult?
• If the item difficulty is .65, is the item easy or difficult?
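The index and the interpretation used in the exercise above can be computed directly. This is a sketch following the cutoffs on the preceding slides (below .50 difficult, above .50 easy, .50 average); the 1/0 response vector is hypothetical.

```python
# A sketch of computing and interpreting the item-difficulty index,
# using the cutoffs given on the slides.

def item_difficulty(item_responses):
    """Proportion of testtakers answering the item correctly;
    item_responses is a list of 1 (correct) / 0 (incorrect)."""
    return sum(item_responses) / len(item_responses)

def interpret(p):
    if p < 0.50:
        return "Difficult"
    if p > 0.50:
        return "Easy"
    return "Average"

# 8 of 10 hypothetical testtakers answer an item correctly:
p = item_difficulty([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
print(p, interpret(p))  # 0.8 Easy

# The exercise values from the slide:
for value in (0.75, 0.30, 0.50, 0.40, 0.65):
    print(value, interpret(value))
# .75 Easy, .30 Difficult, .50 Average, .40 Difficult, .65 Easy
```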
69. • Item-Difficulty Index: used in the
context of an achievement test or
traditional test.
• Item-Endorsement Index: used in the
context of a personality test.
70. The Item-Reliability Index:
provides an indication of the internal consistency of a test.
The Item-Validity Index:
a statistic designed to provide an indication of the degree to
which a test is measuring what it purports to measure.
71. • The Item-Discrimination Index
• Measures of item discrimination indicate
how adequately an item separates or
discriminates between high scorers and low
scorers on an entire test.
72. The Item-Discrimination Index
Item 25 is answered correctly by high scorers on
the test as a whole.
Item 25 is answered incorrectly by low scorers on
the test as a whole.
An item that is answered incorrectly by high
scorers on the test as a whole is probably not a
good item.
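One common way to quantify the idea above is d = (correct in the upper group − correct in the lower group) / group size, where the groups are the highest- and lowest-scoring testtakers. The sketch below assumes this formula and a hypothetical ten-person dataset; group sizes of roughly the top and bottom 27% are a common convention.

```python
# A sketch of the item-discrimination index using upper and lower
# scoring groups: d = (U - L) / group size. All data are hypothetical.

def discrimination_index(item_correct, total_scores, group_size):
    """item_correct: per-testtaker 1/0 on the item under analysis.
    total_scores: per-testtaker total test scores (same order).
    group_size: size of each of the upper and lower groups
    (often about the top and bottom 27% of testtakers)."""
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = order[:group_size], order[-group_size:]
    u = sum(item_correct[i] for i in upper)   # correct among high scorers
    l = sum(item_correct[i] for i in lower)   # correct among low scorers
    return (u - l) / group_size

# Ten testtakers: high scorers tend to get item 25 right and low
# scorers tend to miss it, so the item discriminates well.
totals = [95, 90, 88, 80, 75, 60, 55, 50, 45, 40]
item25 = [1,  1,  1,  1,  0,  1,  0,  0,  0,  0]
print(discrimination_index(item25, totals, group_size=3))  # 1.0
```

A d near +1 means the item separates high and low scorers sharply; a negative d is the bad case from the slide, where high scorers miss the item.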
75. • Speed tests
• a type of test with a time
limit.
• example: a typing test, or
any kind of test with a
short time limit.
• Power tests
• a type of test whose items
start easy and gradually
become more difficult.
• example: a test that starts
easy and gradually gets
harder.
76. Test Revision
• a stage in the development of a new test.
• Test revision also occurs in the context of modifying an
existing test to create a new edition. Much of our discussion
of test revision in the development of a brand-new
test may also apply to the development of
subsequent editions of existing tests, depending on
just how "revised" the revision really is.
This is where the following come in:
• Co-Validation & Cross-Validation.
77. • Cross-Validation (also called rotation estimation or
out-of-sample testing)
• the revalidation of a test on a sample of testtakers other
than those on whom test performance was originally
found to be a valid predictor of some criterion.
• (Example: I built a test of English grammar, and my
participants (test takers) were English majors; afterward
I cross-validated it (tried the test out) on different
participants who were Math majors.)
78. • Validity shrinkage: the decrease in item validities
that inevitably occurs after cross-validation of
findings.
• (We expect the validity of our final test items to
decrease because we administered the test to different
participants, i.e., test takers outside the original
criterion group.)
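Validity shrinkage can be illustrated numerically: the validity coefficient (the correlation between test scores and a criterion) is computed on the derivation sample, then recomputed on a cross-validation sample, where it typically comes out lower. The two tiny datasets below are fabricated solely to show the comparison; real samples are far larger.

```python
# A sketch of validity shrinkage. The validity coefficient is the
# Pearson correlation between test scores and criterion performance;
# it is expected to drop when recomputed on a new sample.

import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Test scores and criterion performance, derivation sample:
deriv_test, deriv_crit = [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]
# The same test administered to a new sample of testtakers:
cross_test, cross_crit = [1, 2, 3, 4, 5], [2, 1, 3, 5, 4]

r_deriv = pearson_r(deriv_test, deriv_crit)  # 1.0
r_cross = pearson_r(cross_test, cross_crit)  # 0.8 -> validity shrinkage
print(r_deriv, r_cross)
```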
79. • Co-validation
• may be defined as a test validation process conducted on two or
more tests using the same sample of testtakers.
• A current trend among test publishers who publish more than one
test designed for use with the same population is to co-validate or
co-norm.
• (Example: I administered my newly developed English Grammar
test together with an existing English Vocabulary test, with one
and the same group of participants, all English majors.)
80. Cross-Validation
• One newly
developed test is
tried out on two
different groups of
test takers.
Co-Validation
• Two tests of the
same construct are
validated or tried
out on a single
group of test takers.