CAVEON WEBINAR SERIES
SMARTITEMS: USING INNOVATIVE ITEM DESIGN TO MAKE YOUR LIFE EASIER
Q&A RESPONSES
Questions from webinar attendees
Answered by Dr. David Foster, Caveon
WHAT HAPPENS IF YOU SAY NO BUT THEN WANT TO GO BACK AFTER READING
THE OTHER OPTIONS?
This is not a question about SmartItems, but about the Discrete Option Multiple Choice (DOMC) item type.
The SmartItem design can be applied to any item type (multiple choice, DOMC, short answer, drag-and-
drop, and the rest of them, even essay questions). To answer your direct question, the DOMC format does
not allow returning to and reviewing items, nor returning to and reviewing options. That would be the same
policy if the DOMC were part of the SmartItem. I answer this question for the SmartItem itself in the next
question.
In general, and this is the Foster opinion, being able to return to items, review them, and possibly change answers is a leftover artifact from paper testing. Some tests, such as computerized adaptive tests, do not allow this feature for clear reasons. As for clear advice from academic researchers, I believe the jury is still out on whether allowing returns to items helps or hurts a person's test score.
CAN YOU REVIEW YOUR ANSWERS?
I’m going to restate the question a bit: If you are taking a test with SmartItems, can you return to previous
questions? This is possible if the testing system stores the exact item version you had seen previously. The
testing engine should prohibit a new version of the SmartItem from being presented. So, the solution is
simple. Just present the same item version seen and answered earlier. Of course, and this was answered
above, with DOMC, returning to a previously answered question is not possible.
COULD YOU HAVE A FUTURE CALL TO SHOW HOW AN ITEM WRITER WOULD
ACTUALLY CREATE A SMART ITEM? I'M THINKING OF A DEMO TYPE SESSION
SHOWING THE TOOLS IN USE.
That would be easy to do. Rather than waiting for a standard webinar to be scheduled on this topic, just call
Caveon to get a personal demonstration of the software in action.
SMARTITEMS: USING INNOVATIVE ITEM DESIGN - Q&A RESPONSES
Caveon Test Security
david.foster@caveon.com • www.caveon.com
WHAT ITEM BANK APPLICATIONS HAVE INTEGRATED
WITH THE SMART ITEM API AS OF TODAY?
Caveon’s Scorpion (item development and banking) and Caveon’s SEI
(software for administering an exam) are able to completely support
SmartItems. This is useful for new testing programs, and for testing programs
that need to change the technology they use to develop and administer
tests.
Caveon is working with a couple of other providers of testing software to
integrate the Caveon SmartItem API into their software. Using the SmartItem
API, users of other testing software can get the benefits of SmartItems
without having to change the software used for item development/banking
or test administration. The SmartItem API is easy to integrate and comes with
a GUI to help SMEs who are non-coders create SmartItems that use code.
The SmartItem API will be available in June 2018.
CAN YOU SHOW SOME EXAMPLES OF SMARTITEMS?
I described a couple of examples (one in History and one in ELA) while on the
call. If there had been time, I could have shown both how they look behind the
scenes and how they perform on a test. We have created SmartItems
for “performance items”, for every level of cognitive complexity, for many
topics in education and on the job. There do not appear to be skills or
competencies where SmartItems would not work.
Rather than try to describe these well in this Q&A document, I’d be happy to
show you personally. This offer is open, not just to the person submitting the
question, but to any webinar attendee who wondered about this as well.
Please call Caveon to set something up. My personal email is
david.foster@caveon.com if that is easier. Seeing it work for a variety of areas
and skills is so much better than trying to read a description.
WHAT ABOUT DIFFICULTY OF ITEMS FROM THE SAME FAMILY?
I’m going to assume that by the term “same family”, you are asking about the differences in difficulty
naturally generated from a SmartItem, and how those differences impact test scores. If I’ve not understood
the question, my email is given in an earlier question and I’m happy to discuss over a call.
SmartItems generate items of different difficulty, almost by design and purpose. Skills or competencies
seem to always cover a range of content and sub-skills, resulting in item versions that are not equally
difficult. As a simple example to illustrate the point, adding 10 and 10 is easier than adding 73 and 98.
One test taker might see the easy one and the other the difficult one, since they are both part of the
competency or standard.
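To make the addition example concrete, here is a minimal sketch of what a SmartItem's rendering code might look like. The function name and structure are my illustration only, not Caveon's actual API:

```python
import random

def render_addition_item(rng=None):
    """Render one version of an addition SmartItem at test time.

    The competency is "add two whole numbers up to 99"; each test
    taker gets different operands, so one version may be 10 + 10
    while another is 73 + 98.
    """
    rng = rng or random.Random()
    a = rng.randint(1, 99)
    b = rng.randint(1, 99)
    stem = f"What is {a} + {b}?"
    return stem, a + b

# Two test takers see two different versions of the same item:
stem_1, key_1 = render_addition_item(random.Random(1))
stem_2, key_2 = render_addition_item(random.Random(2))
```

Because the version is generated at the moment of testing, there is nothing fixed to steal or share in advance.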
This variation in difficulty is inherent in the SmartItem approach. To try to make all items equivalent
in difficulty (and in other characteristics as well) is simply to revert to what we have been doing for 100
years now. It is better to consider the advantages of spanning the entire competency and embracing the
fact that difficulty will be variable for test takers.
That said, how are we to handle this difference in difficulty? Naturally, it would not be good or fair if a test
taker were given all of the more difficult item versions on a test that has 60 SmartItems. Here is one
possible way to look at the problem. Since the generation of item versions by a SmartItem is a random
event, the difficulty differences tend to average out across the items on a test. After a certain number of items
(this number likely depends on test content, testing conditions, and other factors) the difference in the
overall difficulty of the tests for test takers is likely going to be trivial or negligible. I’m not sure what that
number is, although I believe it is within the length of most high-stakes exams today. I’m conducting
research on SmartItems to work that out and should have something soon. Anyone who wishes to help
with that research is welcome.
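The averaging argument can be illustrated with a quick simulation. The difficulty values below are invented purely for illustration; they are not drawn from the research mentioned above:

```python
import random
import statistics

def overall_difficulty(n_items, rng):
    # Each SmartItem renders a version whose difficulty is drawn at
    # random (here, uniformly) from that item's natural range.
    return statistics.mean(rng.uniform(-1.0, 1.0) for _ in range(n_items))

rng = random.Random(42)
# Spread of overall test difficulty across 1,000 simulated test takers:
spread_10 = statistics.stdev(overall_difficulty(10, rng) for _ in range(1000))
spread_60 = statistics.stdev(overall_difficulty(60, rng) for _ in range(1000))
# The luck-of-the-draw difference between test takers shrinks roughly as
# 1/sqrt(n), so a 60-item test is far more even than a 10-item one.
```

The simulation shows the mechanism, not the magic number of items; as noted above, that number likely depends on the content and the testing conditions.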
I know it is odd and against most of what we have learned about testing to purposefully introduce such
variability into a test. It is anti-standardization at its core. We resist the notion. However, the reasons to
consider its use are powerful:
1. Eliminate virtually all security threats.
2. Make tests fair for all.
3. Motivate deeper and broader learning and teaching.
4. Reduce costs.
5. Make testing more convenient.
Research conducted so far over the last couple of years clearly shows that providing test takers with this
kind of variable experience does no harm to the quality of the items, nor the usefulness of the test score. In
fact, measurement may even be improved, beyond the benefit that irrelevant behaviors (cheating, theft,
and the use of test-taking skills) are prevented.
IS THERE SOFTWARE TO CREATE SMARTITEMS THAT
CAN BE INTEGRATED WITH AN ITEM BANKING SYSTEM?
The Caveon SmartItem API can be used to integrate this technology into other
development, banking and test administration systems.
WHAT ABOUT KNOWLEDGE, SKILLS AND ABILITIES
(KSAS) THAT ARE NOT ABOUT RECALL AND
IDENTIFICATION?
SmartItems can be created to measure any skills or competencies. Another way
to state this: If a test can be created today to measure what are considered to
be more complex skills, competencies, standards, etc., then the SmartItem
approach can be applied to that test.
I’m sure there is some doubt about my answer. If you wish to see more complex
examples, I’m just an email away.
CAN SMART ITEMS BE USED FOR ANY CONTENT
AREA? OR JUST MATH?
I hope I have answered this well enough during the webinar as well as in this
Q&A document.
HAVE YOU DEVELOPED AND TESTED ELA ITEMS?
We have created them, and can demonstrate them, but I have yet to subject them
to field testing. Want to be the first to do so? I can provide you free access to
Scorpion and SEI for this research purpose.
WHAT ABOUT HIGHER LEVEL BLOOM'S ITEMS--
ANALYSIS, SYNTHESIS?
A walk in the park for SmartItems. I don’t mean to be flippant, but my statement
is true. I believe I’ve answered this question above, and therefore can afford to
have a little fun.
As I mentioned above, I’m happy to demonstrate these items, which is a much,
much better type of answer.
HOW ARE SCORES ON SMARTITEMS COMBINED INTO A TOTAL TEST
SCORE THAT IS COMPARABLE FOR ALL EXAMINEES CONSIDERING THAT
INDIVIDUAL INSTANCES OF SMARTITEMS MAY DIFFER IN DIFFICULTY?
I hope I answered this well enough above, but I’ll answer it again now coming from a different perspective.
Today, on all tests, we combine the performance on items from individual test takers to create a total score.
We ignore the fact that those items may have been taken under conditions that make the item more difficult
for one test taker than another, even though they have the same level of competency. Here are a few
examples of those conditions: device used, distractions in the testing area, screen resolution, temperature in
the room, reading ability of the test taker, ability to read the language of the item, lack of proficiency in
test-taking skills, and many more. These factors change the difficulty of the item, even though you and I believe
the item hasn’t changed in its essential properties.
Here is another question to consider: how are tests today able to combine item performances that are
produced under conditions that affect item difficulty into a total test score? The answer is complicated, of
course. Psychometric theory allows random effects to operate in answering items and still allows us to
interpret test scores properly. I believe the answer to your question about SmartItems comes from and is
supported by the same psychometric theoretical principles.
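As a toy illustration of that principle (my sketch, using the standard Rasch model rather than anything specific to Caveon): a test taker's expected score over randomly rendered versions stays close to the score on a fixed item of average difficulty.

```python
import math

def p_correct(ability, difficulty):
    """Rasch model: probability of answering an item correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

ability = 1.0
# A random draw between an easier (-0.5) and a harder (+0.5) version...
expected_over_versions = (p_correct(ability, -0.5) + p_correct(ability, 0.5)) / 2
# ...yields an expected score close to that of a fixed item at the
# mean difficulty of 0.0:
fixed_item = p_correct(ability, 0.0)
```

The random difficulty draw behaves like the other random effects (device, room, distractions) that psychometric models already absorb.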
DO TESTING ORGANIZATIONS NEED TO HIRE SPECIALIZED CODERS TO
WRITE THE ITEMS?
The answer I have for this is a little bit complicated. But let me give it a shot.
No, specialized coders are not needed. Normal coders may be, however. While Caveon has developed a
nice GUI for subject matter experts (SMEs) to use, there may be types of item content that require an item
design where the GUI doesn’t fit as well. The coders can come in as consultants and help the SME create
a “template” for the more complex items. Those templates can be re-used for other items.
We recently held a workshop to create SmartItems for an entire exam. There were 6 SMEs in the room
tasked with creating the items. Caveon provided 2 coders for the workshop who
helped the SMEs craft the items. (For this workshop, one of the earlier ones, the GUI was not
yet completed.) I would add that the coders themselves were individuals trained in item writing and
picked up the coding skills recently and informally. One point I hope I’m making is that coding skills are
much easier to come by than the experience and creative talents of SMEs.
SmartItems definitely do NOT require SMEs to learn to code. And coders certainly do not have the subject
matter expertise to get to first base with an item.
WHAT ABOUT DIAGNOSTIC QUESTIONS, AS IN MEDICAL EXAMS?
SYMPTOMS ARE LISTED, AND THE CANDIDATE MUST GIVE THE MOST
LIKELY CAUSE. HOW WOULD YOU WRITE THESE?
There are several ways to write them, and SmartItems can provide the same benefits that they provide
to other content and purposes. Here are my two suggestions for this:
1. Use multiple choice as the response format. This allows the test taker to compare options and locate
the best answer.
2. Use DOMC, but place the options in the stem, allowing the comparison to take place. Then, using
DOMC, present the options again one at a time until the test taker can select the answer he or she
determined was the “best” one.
Now, how would SmartItems affect both of these solutions? Here is one way, but I’m sure you can think
of others. The SmartItem can be coded so that, within the limits of the defined medical competency,
different “best” options as well as different appropriate distractors are generated on-the-fly for a variety
of specific medical scenarios.
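Here is a sketch of that on-the-fly generation in code. The scenarios, causes, and function shape are entirely made up for illustration; a real SmartItem would encode the defined medical competency:

```python
import random

# Made-up symptom/cause pairs standing in for a medical competency.
SCENARIOS = [
    (["fever", "productive cough", "chest pain"], "pneumonia"),
    (["polyuria", "polydipsia", "weight loss"], "diabetes mellitus"),
    (["crushing chest pain", "diaphoresis", "left-arm pain"], "myocardial infarction"),
]

def render_diagnostic_item(rng=None):
    """Pick a scenario at test time; the other causes serve as distractors."""
    rng = rng or random.Random()
    symptoms, key = rng.choice(SCENARIOS)
    stem = "Symptoms: " + ", ".join(symptoms) + ". What is the most likely cause?"
    options = [key] + [cause for _, cause in SCENARIOS if cause != key]
    rng.shuffle(options)
    return stem, options, key
```

The same rendered stem and options could then be presented in either the multiple-choice or the DOMC format described above.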
HAVE NURSING AND ALLIED HEALTH PROGRAMS USED SMARTITEMS
TO TEST STUDENTS?
No health-based assessment program has yet used or researched SmartItems, though a few are
beginning to look at its use. A group of researchers at the University of Düsseldorf has been
publishing research on the item type (DOMC), varying how the options are presented
and how many are shown. Their field of content expertise is applied medicine.
DOMC is not a SmartItem but can be used as part of a SmartItem and shares some of the same
purposes. DOMC reduces content exposure in the service of better security and eliminates the use of
test taking skills.
Another medical board, ABOG, is currently working on research with Caveon using DOMC.
HOW DO YOU ENSURE THE DIFFERENT VERSIONS
OF AN ITEM ARE EQUIVALENT IN THE WAY THEY
PERFORM ACROSS TEST TAKERS?
I answered this above, and hope that was helpful. But here is a bit more…
It is important to make sure that SmartItems, like any other item, perform well against
psychometric criteria. The assumption is that the variations it renders will
perform well also. Certainly, reviews and research can be conducted to bolster
confidence. In the end, the usual analyses can determine if a test of SmartItems
is reliable and contributes to validity criteria.
A test made up of lousy SmartItems will be a lousy test, just like any test that is
created without much thought. But a test built of very good SmartItems will
perform well and provide the amazing advantages I gave in the webinar.
ARE SMARTITEMS PART OF AUTOMATED ITEM
GENERATION?
Automated Item Generation (AIG) and the SmartItem approach have a couple of
overlapping goals, but most are different. Here is an example: AIG researchers
have stated that a goal of AIG is to create thousands of items that can serve as
replacements for test items that have been compromised through
theft/harvesting efforts. The security goal for SmartItems is to create enough
variations in real time, on-the-fly, so that theft is meaningless in the first place.
Another big difference is that SmartItems purposefully and blatantly intend to
cover an entire defined competency and handle natural differences in difficulty
as I’ve explained above. AIG research usually narrowly defines the “item
models” so that the difficulty issue isn’t an issue. Content is constrained to
produce items that can be used to replace worn out or compromised items of
equivalent difficulty and performance.
SmartItems do their work in real time while a person is taking the test. AIG
typically creates items that can be stored in item banking systems, reviewed for
accuracy, perhaps field tested, and selected at some point for use on an exam.
These are not the only differences, but they are important ones. I’m happy to
discuss these further and let you know of the others at another time. You are
welcome to contact me.
I AM ASSUMING THAT AT THIS TIME THIS IS A CUSTOMIZED CODING
AND IS THEREFORE PROPRIETARY. DOES IT REMOVE THE NECESSITY
FOR CREATING OTHER TYPES OF MC ITEMS (DRAG/DROP), ETC.?
The coding is not proprietary. It is Python. Using our tools or the SmartItem API, anyone can
integrate SmartItems into their system. SmartItem technology is patent-pending, and the license is
granted if our tools or the API are used. At Caveon we have learned that the value of a patent is not in
the financial benefits, but in the fact that we can require and support consistency in the implementation
of SmartItems, using technical specifications. Without adherence to the specifications, organizations
creating testing software are often unable to implement these innovations consistently.
I hope I was clear in the webinar that SmartItems work with any item type, including drag-and-drop.
Using SmartItem technology does not eliminate the value that various so-called “item types” provide. In
fact, our tools support what we call an “external” item type, which allows the integration of outside
technology with an item. SmartItem technology can contribute to this type of item as well.
DOES THIS HAVE THE ABILITY TO BE ADAPTIVE BASED ON STUDENT
ABILITY OR IS IT ONLY TIED TO THE BASICS OF THE OBJECTIVE?
SmartItems eliminate the need for equivalent forms in test design but can also be used to great advantage in
adaptive tests. SmartItems vary in difficulty compared to each other. There are difficult SmartItems (for
example, if the competency is a difficult or advanced one) and there are easy ones (for example, if the
competency is easier or is a foundation/introductory skill). This range of difficulty between SmartItems
would likely mirror that found with traditional items and can therefore be calibrated statistically and used
in computerized adaptive tests.
I’m not sure what is meant by “only tied to the basics of the objective”. Perhaps I answered the question
satisfactorily anyway.
ARE SMART ITEMS THE NEW NAME FOR DOMC?
I’m sure you can answer this question by now. DOMC retains its own name. SmartItems are not another
name for DOMC.
SmartItems can use a wide variety of item types, including DOMC. By adding DOMC to a SmartItem,
security is boosted as content exposure is further reduced, but the additional value is in removing the
irrelevant influence of test taking skills. Of course, if you wish to keep the influence of test taking skills
on your test scores, feel free to ignore DOMC benefits.
FOR THE MCQ SMARTITEM, ARE THE DISTRACTORS ALSO
CALCULATED ON THE FLY? IF SO, IS THERE SOMETHING
IN THE CODE THAT HELPS DETERMINE WHAT THE
DISTRACTORS WILL BE? THAT IS, ARE THE DISTRACTORS
COMMON ERRORS?
CAN YOU SET HOW FAR APART THE NUMBERS WILL BE
SPREAD (FOR EXAMPLE, ARE THE ANSWER CHOICES
ONLY 1 DIGIT APART OR 25 DIGITS APART)?
I remember answering this question at the end of the webinar. And I’ll try again
now.
As a SmartItem that uses options—for ultimate selection by a test taker (MC or
DOMC)—is designed and built, code will be used to generate both correct options
and incorrect ones (distractors is the name commonly used for these). Naturally,
distractors should be created or coded that are effective, relative to the correct
option(s). Nonsense, implausible distractors must be avoided. Code is one good
way to make sure that your incorrect options are effective for whatever correct
option is displayed. But there are also other non-code ways.
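For the spread question specifically, here is one way code could control how far apart the numeric options fall. This is my sketch, not Caveon's tooling; the `spread` parameter is an invented name:

```python
import random

def numeric_distractors(correct, spread, n=3, rng=None):
    """Generate n distinct numeric distractors within +/- spread of the key.

    spread=1 yields near-miss options one apart; spread=25 allows the
    choices to range much more widely.
    """
    rng = rng or random.Random()
    distractors = set()
    while len(distractors) < n:
        offset = rng.randint(-spread, spread)
        if offset != 0:  # never reproduce the correct answer
            distractors.add(correct + offset)
    return sorted(distractors)

# Distractors for 73 + 98 = 171, tight (near-miss) vs. wide:
tight = numeric_distractors(171, spread=2, rng=random.Random(0))
wide = numeric_distractors(171, spread=25, rng=random.Random(0))
```

Common-error distractors (for example, forgetting a carry in the addition) could be generated the same way, by computing the erroneous result directly instead of applying a random offset.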
Like some of the other concepts above, this one might be better discussed while
viewing or even creating a SmartItem. I’m happy to facilitate a demonstration.
THANKS FOR THE QUESTIONS
I sincerely appreciate your interest in this new approach to testing. I hope you are
able to withhold judgment about its value until you have seen it in action, perhaps
with a few of your own items.
If you would like to pilot the Caveon system within your own walls, we can
accommodate that too. It’s web-based—you just need access. Trying it out is free.
When you wish to use it for your operational exams, you just have to pay the
license fee for the software.
-Dave