Do you have concerns about test security and cheating? Here is a very brief primer on statistical detection of test fraud and related issues. It includes some suggestions for deeper resources. You are also welcome to check out the resources at www.assess.com.
Do you need dissertation or PhD thesis help with things surrounding your committees, defense or Vita, or publishing? These are the Doctoral EndGames and this set of slides was produced for DoctoralNet's first public hangout on 15 Sept 2013
I consider whether we as testers can be too closed-minded in our attitudes; whether there are schools of thought or approaches that, even if we care deeply about context, we are very unlikely even to consider; and whether we sometimes favour our reputation over giving ourselves the chance to do the best job that we can.
From CEWT#2, http://cewtblog.blogspot.co.uk/2016/02/cewt-2-abstracts.html
Test Fest: Catching up on Your Usability Testing Backlog, by Sarah Joy Arnold
Presentation at North Carolina Librarians' Association Biennial 2017 in Winston-Salem, NC. Part of "So Many Users, Not Enough Time: Large Scale Usability Testing Methods" with Chad Haefele and Scott Goldstein.
This presentation will discuss the process that Appalachian State University Libraries used to measure and test website usability during its recent redesign and migration to a new Drupal theme. It will emphasize how we recruited a large number of users and how large sample sizes promote better design decisions. While web usability research is well known for its flexibility in needing only about a dozen users to discover most problems, robust data-driven decisions are best supported by datasets large enough to yield statistically meaningful results. Attendees will learn techniques for surveying and testing more users without greatly compromising the richness of data collected.
At UNC-Chapel Hill, the User Experience and Assessment department regularly runs usability tests to inform our decision-making and prioritize our users' perspective as we make changes. But there are more things to test than there are hours in the day. Our projects have a variety of stakeholders who are very interested in improving their services, and we found ourselves with a long list of tests we wanted to run. To catch up, we adapted Harvard Libraries' Test Fest model: five tests run simultaneously, with five participants rotating through the set of tests. Over a span of two hours, we completed 25 individual usability tests. In this one event, we caught up on much of our testing backlog. This session will outline how we planned and executed Test Fest, how we recruited participants, and what we learned from using this approach. We'll also discuss our methodologies and briefly look at the results of each test.
How to do qualitative analysis: In theory and practice, by Heather Ford
These slides are from a recent workshop for Honours students and researchers at UTS's School of Communication. Not pictured are the examples from my own research that I used to illustrate concepts. Hopefully I will be able to make a prettier version soon.
These slides are specific PhD thesis help for a talk I gave at Dublin City University on 15 May 2014. They should be helpful for anyone in a European context about to submit their final thesis before the viva.
EuroSTAR Software Testing Conference 2012 presentation on Curing Our Binary Disease by Rekard Edgren.
See more at: http://conference.eurostarsoftwaretesting.com/past-presentations/
Survey Methodology and Questionnaire Design Theory Part I, by Qualtrics
Do you know what's going on in your respondents' heads as they take your survey? How can you design your questionnaire to collect better data? Understanding the answers to these questions can help you design surveys that collect high quality insights you can depend on.
Dave Vannette, principal research scientist at Qualtrics, shares his best hacks for designing surveys that will help you get quality data. In this presentation, Dave also highlights what your respondents are thinking when they take your surveys, and how your survey design can affect the responses you collect.
CUE Forum presented at JALT 2008 (Tokyo, Japan). Gives an overview of research design issues for Second Language Acquisition. For further details, visit jaltcue-sig.org
In this talk, we’ll look at the process of designing a research methodology. Is it better to stick to the safety of the lab, or to broaden our horizons? And how can we convince colleagues and stakeholders to buy into the decision? We’ll introduce a set of principles and a thinking tool to help you weigh up and justify your approach.
Intimidated by conducting your own usability study? This session will give you the tools you need to conduct effective usability tests whether your participants are in the room or in a different country. The session includes practical techniques to successfully plan, prepare, and conduct your test and activities to help you become more confident with the entire process of usability testing. Finally, you’ll get tips on how to get the most useful results from your study.
Participants will also learn about:
Testing protocols
Types of usability testing and required vs. optional resources
Recruiting and scheduling usability tests
Non-disclosure and consent forms and their purposes
Pilot testing
Techniques for interacting with test participants
Current usability testing issues of interest (e.g. testing internationally, moderated vs. un-moderated, etc.)
Presentation given at the CASE Communications, Marketing & Technology Conference in Boston on April 15, 2009.
Learn the tools of the trade for do-it-yourself research for little or no money. This session will teach you how to conduct focus groups, surveys, usability tests and more.
Good items are the basic building blocks of any good test or assessment. This presentation covers best practices in developing high-quality items for better psychometrics.
Ever wonder how a cutscore is set on a certification/licensure test? This is a very brief intro to the topic of standard setting, that is, how cutscores (passing points) are set on credentialing exams using scientifically-backed research and rigorous psychometrics. Some approaches include the modified-Angoff, Bookmark, and Contrasting Groups. Visit www.assess.com to learn more.
More Related Content
Similar to Statistical detection of test fraud (data forensics) - where do I start?
This is the third of a series of powerpoints presented at a CAT/IRT workshop at the University of Brasilia in 2012. It provides an introduction to item response theory (IRT), discussing advanced topics like linking & equating, scaling, differential item functioning, polytomous models, and dimensionality. Learn more at www.assess.com.
Using Item Response Theory to Improve Assessment, by Nathan Thompson
This is the second of a series of powerpoints presented at a CAT/IRT workshop at the University of Brasilia in 2012. It provides a discussion on how IRT is applied to developing better assessments, including item and test information functions, standard error of measurement, and use of Xcalibre. Learn more at www.assess.com.
This is the first of a series of powerpoints presented at a CAT/IRT workshop at the University of Brasilia in 2012. It provides an introduction to item response theory (IRT), tying it to classical test theory and describing some of the major IRT models. Learn more at www.assess.com.
Leveraging Tech Enhanced Items without Sacrificing Psychometrics, by Nathan Thompson
This presentation discusses tech-enhanced items (TEIs), an important innovation in educational assessment, and how they do not always provide better measurement. Charles County Public Schools (MD) discuss PARCC assessments in their district and how they leverage the FastTest platform to accurately assess student growth. Visit www.assess.com to learn more.
So, you've heard about adaptive testing and are wondering what it takes to develop a valid one? This presentation is made for you. It outlines a 5-step process, starting with feasibility studies and business case evaluation. More info at www.assess.com and http://pareonline.net/getvn.asp?v=16&n=1.
Introduction to Computerized Adaptive Testing (CAT), by Nathan Thompson
These slides are from a short workshop I taught at the 2015 Conference for the International Association for Computerized Adaptive Testing (IACAT, www.iacat.org). Interested in CAT? I'd love to hear from you on LinkedIn, or visit www.assess.com to learn more.
2. Welcome!
These are some of the lessons I have learned while diving into the field.
Overview of the topic
Discuss resources
Save time and effort for anyone starting out
The purpose is NOT to be a full workshop on data forensics
3. Outline
History
Where do I start learning?
Resources
What are threats to test security?
How do I start deterring?
Deterrent solutions like weblock and remote proctoring
How do I start detecting?
Intro to data forensics
Software for detection
4. History
The literature dates back to before 1950
Many collusion indices
Most were descriptive or completely ad hoc
Notable exception: Frary, Tideman, and Watts (1977) – G2
The modern era started when Wollack adapted G2 to IRT
Other analyses have far less literature
6. Resources
In the past, if you wanted to learn:
1. Read all the original articles
2. Read reviews
• Bliss (2012) Covington Award – 25 indices
• Khalid, Mehmood, & Rehman (2011) – 20 indices
• Cizek 1997 book: good but little attention to forensics
• You still need all the originals.
UNTIL…
8. Overview of Security Threats
Major sources of issues:
Brain dump makers (harvesting)
Brain dump takers (preknowledge)
Specific location problems
Examinee collusion
Receiving help (teacher, proctor, outside)
Proxy testing
What is your list?
9. Harvesting
What: Steal your content and make it public
Why: Often (but not always) to make money
How: Memorization or images; brain dump sites
Deter: CAT/LOFT
Detect: Unusual responses & latencies; brain dump comparisons; Trojan Horses
Minimize: Frequent republishing
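The "Trojan Horse" detection idea above can be sketched in a few lines. The premise: items are deliberately seeded into leaked brain-dump content with wrong keyed answers, so an examinee who reproduces those answers at a high rate has likely studied the dump. Everything here (item IDs, responses, the 0.75 threshold) is invented for illustration; this is not the presenter's actual procedure.

```python
# Hypothetical sketch: flag examinees who match "Trojan Horse" answers.
# Trojan items carry deliberately wrong leaked keys, so a high match rate
# suggests brain-dump use. All data below is invented.

trojan_key = {"T1": "B", "T2": "D", "T3": "A", "T4": "C"}  # leaked (wrong) answers

responses = {
    "exam001": {"T1": "B", "T2": "D", "T3": "A", "T4": "C"},  # matches all 4
    "exam002": {"T1": "A", "T2": "D", "T3": "C", "T4": "B"},  # matches only 1
}

def trojan_match_rate(resp, key):
    """Proportion of Trojan items on which the response equals the leaked key."""
    hits = sum(1 for item, ans in key.items() if resp.get(item) == ans)
    return hits / len(key)

# Flag anyone matching most of the leaked keys; the cutoff is a policy choice.
flagged = {eid for eid, resp in responses.items()
           if trojan_match_rate(resp, trojan_key) >= 0.75}
```

In practice the match rate would feed into a broader review process rather than trigger an automatic sanction.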
10. Preknowledge
What: Knowing the questions and answers
Why: Easy pass
How: Brain dump sites (used to be word of mouth)
Deter: CAT/LOFT
Detect: High score, low time; brain dump comparisons; Trojan Horses
Minimize: Frequent republishing
11. Examinee Collusion
What: Copying
Why: More items correct
How: Individual or group effort
Deter: CAT/LOFT, multiple forms, proctors
Detect: Collusion indices, group rollups
Minimize: CAT/LOFT, multiple forms
12. Receiving help
What: Teacher, proctor, or outside aid
Why: More items correct; often benefits the aider
How: Individual or group effort
Deter: CAT/LOFT, multiple forms, proctors
Detect: Collusion indices, group rollups, erasure
Minimize: CAT/LOFT, multiple forms, TEIs, performance tests
14. Many options
User roles in test development
Limit access to test content during delivery
Verify identity of examinee
Test window date/time
Test location (IP addresses)
Lockdown browser
Proctor/Examinee authentication
Biometrics for ID
Proctor training
17. It’s a Hypothesis Test!
First step: Identify the threats you are worried about and how you think they would present themselves in the data
18. It’s a Hypothesis Test!
Independent variables:
Test centers/locations
Countries
Training programs
Test forms
Individuals
19. It’s a Hypothesis Test!
Dependent variables:
Item response or test time
Item statistics
Test statistics (mean/SD, pass rate)
Person statistics (intra-individual)
Collusion indices
20. It’s a Hypothesis Test!
If you aim at nothing, that’s exactly what you’ll hit.
21. It’s a Hypothesis Test!
Example: Teachers helping kids
Item statistics different than other teachers
Collusion indices
Relatively high scores with relatively short time – bivariate plot?
Item latencies different than other teachers
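One simple way to operationalize "item statistics different than other teachers" is a leave-one-out comparison: compute each teacher's class mean and ask how many standard deviations it sits from the distribution of the other teachers' means. The sketch below uses invented class means and an illustrative z > 3 cutoff; real programs would condition on student ability and use more robust statistics.

```python
# Sketch (invented data): compare each teacher's class mean score against
# the distribution of all OTHER teachers' class means (leave-one-out z).
from statistics import mean, stdev

class_means = {"teacherA": 61.0, "teacherB": 63.5, "teacherC": 59.8,
               "teacherD": 88.2, "teacherE": 62.1}  # hypothetical means

def peer_z(target, groups):
    """z-score of one group's mean relative to its peers (target excluded)."""
    peers = [v for k, v in groups.items() if k != target]
    return (groups[target] - mean(peers)) / stdev(peers)

# z > 3 is an illustrative review threshold, not an established standard.
suspect = [t for t in class_means if peer_z(t, class_means) > 3.0]
```

Excluding the target from its own peer group matters: a single extreme class otherwise inflates the reference mean and variance and masks itself.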
22. It’s a Hypothesis Test!
Example: Brain dump users
Collusion indices
Responses on Trojan Horses
Relatively high scores with relatively short time
Item latencies
Group level not likely (could be at any test center)
23. Step 2: Determine your analysis
Time
High score, low time: Preknowledge or aid
Low score, high time: Harvester
Response patterns
Person fit
Score gains
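The two time-based flags ("high score, low time" and "low score, high time") amount to looking at opposite corners of a score-by-time bivariate plot. A minimal sketch, with invented score/time data and an illustrative ±1 z cutoff:

```python
# Minimal sketch of the time-based flags, using invented data.
# z-score each examinee's total score and total time against the group:
# high score + low time suggests preknowledge/aid; low score + high time
# suggests a harvester lingering to memorize items.
from statistics import mean, stdev

records = {            # examinee: (total score, total minutes) -- hypothetical
    "e1": (72, 55), "e2": (68, 60), "e3": (75, 58),
    "e4": (95, 20),   # very high score, very fast
    "e5": (40, 115),  # low score, very slow
    "e6": (70, 62),
}

scores = {k: v[0] for k, v in records.items()}
times = {k: v[1] for k, v in records.items()}

def zmap(d):
    """z-score every value in a dict against the group mean/SD."""
    m, s = mean(d.values()), stdev(d.values())
    return {k: (v - m) / s for k, v in d.items()}

zs, zt = zmap(scores), zmap(times)
# +/-1 SD is purely illustrative; operational cutoffs need calibration.
preknowledge = [k for k in records if zs[k] > 1 and zt[k] < -1]
harvesters = [k for k in records if zs[k] < -1 and zt[k] > 1]
```

With real data you would plot the two z-scores and inspect the corners rather than rely on a hard cutoff, since outliers inflate the group SD.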
24. Options for Detection
Intra-Individual
• Time/RTE (CBT only)
• Response patterns
• Score gains
• Person fit
Inter-Individual
• Collusion indices
• Erasure (paper only; also group level)
Group
• Roll-up of intra and inter
• Descriptive statistics
25. More on Collusion Indices
How is collusion quantified? Consider a 100-item test…
Error similarity – we both had 10 errors:
Same items?
Same responses on those items?
Response similarity:
We gave the same response on 50 items? 90?
Some indices are standardized/probabilistic (good)
Some are descriptive or non-probabilistic (bad)
Can vary in direction (one/two)
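To make the "probabilistic vs. descriptive" distinction concrete, here is a deliberately oversimplified probabilistic check: count identical responses between two examinees, then ask how surprising that count would be if agreements were independent coin flips with some chance-agreement probability. This is NOT one of the named indices (omega, G2, Zjk, etc.), which model the agreement probability far more carefully (e.g., via IRT); the independence assumption and the 0.4 agreement rate below are illustrative only.

```python
# Toy probabilistic response-similarity check (invented data, naive model).
# If any two examinees agree on an item with probability p independently,
# the number of identical responses is Binomial(n, p); a tiny upper-tail
# probability flags the pair for closer review.
from math import comb

def matches(resp_a, resp_b):
    """Count items answered identically by two examinees."""
    return sum(a == b for a, b in zip(resp_a, resp_b))

def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

a = "ABCDA" * 20                  # 100 hypothetical responses
b = "ABCDA" * 18 + "BBCDB" * 2    # nearly identical response string
k = matches(a, b)
p_value = binom_upper_tail(k, len(a), 0.4)  # 0.4 = assumed chance agreement
```

The major weakness, flagged on the next slide, is the confound with ability: two strong examinees agree often simply because they both answer correctly, so serious indices condition the agreement probability on ability rather than fixing a single p.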
26. More on Collusion Indices
There are issues to consider when comparing:
ESA only looks at errors, ignoring the rest of the data
Major confound with ability: two examinees who each score 99/100 will get flagged as collusion! Therefore it is important to condition on ability
Some indices have no theoretical basis whatsoever
27. More about collusion
Indices can be grouped by type (probabilistic, descriptive, or ad hoc) and by what they compare:
Error similarity: B&B, EIC, EEIC, HH, HHJ
Response similarity: Wollack’s Omega, Wesolowsky Zjk, Frary et al. G2, RIC
28. More resources
ITC Guidelines on the Security of Tests, Examinations, and Other Assessments
TILSA Test Security Guidebook
Conference presentations/workshops (harder to find)
29. Software
Next step: Find software that meets your needs
Scrutiny!
S-check
R packages (CopyDetect)
SIFT
Integrity
Caveon
IRT software like IRTPRO or Xcalibre
30. Epilogue: Then what?
Define a pathway for investigation and actions
Joy Matthews-Lopez and Paul Jones
31. Examples (if time)
500 certification candidates
Gr4 Math (locations)
Check on teachers and schools; there is incentive to help students
These are the two longest reviews I found, and they have massive drawbacks…
Bliss (2012) Covington Award – 25 indices regurgitated in Appendix with little/no explanation
Khalid, Mehmood, & Rehman (2011) – State 20 indices but don’t even define them all (predatory journal!)
Cizek had notation errors that threw me off
Examples: Time: Flag an examinee for having a very low test time or average item time. Response Time Ratio is a statistic to quantify this.
Response pattern: Flag an examinee for answering one option >50% of the time. In this case, they probably gave up or didn’t care and just answered “C” over and over…
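The "answered one option more than 50% of the time" flag described above is easy to compute: find the modal option and its share of all responses. The response strings and the 0.5 threshold below are invented for illustration.

```python
# Sketch of the response-pattern flag: an examinee who chose one option
# more than half the time may have disengaged ("C, C, C, ..."). Data invented.
from collections import Counter

def modal_option_share(responses):
    """Share of items on which the most-frequently-used option was chosen."""
    counts = Counter(responses)
    return counts.most_common(1)[0][1] / len(responses)

gave_up = "C" * 35 + "ABCD" * 5      # 55 answers, dominated by "C"
engaged = "ABCD" * 5                 # 20 answers, evenly spread

flag_gave_up = modal_option_share(gave_up) > 0.5
flag_engaged = modal_option_share(engaged) > 0.5
```

A refinement would look at long uninterrupted runs of the same option rather than the overall share, since a disengaged examinee often answers honestly early and gives up late.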
Score gains: Your score doubled since the last time you took the test. Not likely!
Person fit: Why are you getting tough items right but easy items wrong?
Collusion: A number of indices that quantify, for any given pair of examinees, whether their responses were unusually similar.
Erasure: Evaluating proportion of changes that are wrong-to-right vs. right-to-wrong.
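The erasure comparison above (wrong-to-right vs. right-to-wrong changes) reduces to classifying each detected erasure against the key. The erasure tuples and the 0.7 review threshold are invented; operational programs compare each sheet's WTR count to a modeled expectation, not a fixed share.

```python
# Sketch of the erasure analysis (paper-and-pencil only): classify each
# erasure as wrong-to-right (WTR) or right-to-wrong (RTW) and flag sheets
# dominated by WTR changes. Data is invented.

# Each erasure: (original answer, final answer, correct key)
erasures = [("B", "C", "C"), ("A", "D", "D"), ("B", "A", "A"),
            ("C", "D", "D"), ("A", "B", "C")]  # last one is wrong-to-wrong

wtr = sum(1 for orig, final, key in erasures if orig != key and final == key)
rtw = sum(1 for orig, final, key in erasures if orig == key and final != key)
wtr_share = wtr / len(erasures)

flagged = wtr_share > 0.7  # illustrative threshold, not an industry standard
```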
Roll-up: What percent of examinees at each location/group were flagged for intra/inter issues? For example, 90% of a location gets flagged for collusion.
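The roll-up described above is a per-location aggregation of individual flags. A minimal sketch with invented locations, examinees, and an illustrative 80% cutoff:

```python
# Sketch of a group "roll-up": the share of examinees flagged by any
# individual or pairwise check, aggregated per location. Data is invented.
from collections import defaultdict

flags = [  # (location, examinee, flagged by any intra/inter check)
    ("site_1", "e1", False), ("site_1", "e2", False), ("site_1", "e3", True),
    ("site_2", "e4", True), ("site_2", "e5", True), ("site_2", "e6", True),
    ("site_2", "e7", True), ("site_2", "e8", False),
]

totals, hits = defaultdict(int), defaultdict(int)
for loc, _, flagged in flags:
    totals[loc] += 1
    hits[loc] += flagged  # bool counts as 0/1

flag_rate = {loc: hits[loc] / totals[loc] for loc in totals}
suspect_sites = [loc for loc, r in flag_rate.items() if r >= 0.8]
```

The value of rolling up is that individually weak signals become strong in aggregate: one flagged examinee may be a false positive, but 80% of a site is not.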
Other stats: Some locations have high average scores but low average test times, or unusually high pass rates.
As Paul Irwin said yesterday, if you don’t have policies and procedures, don’t be testing!