2019 Midwest Scholarship of Teaching & Learning (SoTL) conference presentation. The goal of this presentation is to share our data-informed approach to re-engineering the exam design, delivery, grading, and item analysis process in order to construct better exams that maximize all students' potential to flourish. Can we make the use of exam analytics so easy and time-efficient that faculty clearly see the benefit? For more info see our blog at https://kaneb.nd.edu/real/
1. Exams Evaluate Students: Who’s Evaluating Exams?
Data-Informed Exam Design
G. Alex Ambrose, Professor of the Practice, Kaneb Center for Teaching and Learning
Kael Kanczuzewski, Academic Technology Professional, Learning Platforms
Xiaojing Duan, Learning Platform/Analytics Engineer, Learning Platforms
Kelley M. H. Young, Assistant Teaching Professor, Dept of Chemistry and Biochemistry
J. Daniel Gezelter, Professor and Director of Undergraduate Studies, Dept of Chemistry and Biochemistry
University of Notre Dame
2019 Midwest Scholarship of Teaching and Learning Conference
2. How to Cite this Presentation:
Ambrose, G. Alex, Duan, Xiaojing, Kanczuzewski, Kael, Young, Kelley M., & Gezelter, J. Daniel (2019). “Exams Evaluate Students: Who’s Evaluating Exams? Data-Informed Exam Design.” 2019 Midwest Scholarship of Teaching and Learning Conference, Indiana University South Bend.
5. Research Context, Challenge, & Goals
Exams are:
1. a tool to assess mastery,
2. an incentive to motivate students to study, and
3. the cause of retention issues for underserved and underrepresented
students in STEM majors.
Challenge: Can we make exams do the first two tasks more effectively while
fixing the retention issue?
Goals: Transform the exam design, delivery, grading, analysis, and redesign
process to make it more efficient, effective, error-free, easy to use, and
enjoyable.
6. Research Questions
RQ1: How do we evaluate an assessment technology tool?
RQ2: What are the best item analysis methods and easiest visualizations to
support students and instructors?
RQ3: What are the course, student & instructor impacts and implications for
continuous improvement changes?
7. Research Context, Challenge, Goal, & Questions
Exam Data & Tools
Exam Item Analysis, Analytics, & Dashboard
Course & Instructor Implications
Questions & Discussion
RQ1: How do we evaluate an assessment technology tool?
8. Ed Tech Evaluation: SAMR + 5 E’s
https://www.wqusability.com/
https://www.schoology.com/blog/samr-model-practical-guide-edtech-integration
5 E’s of Usability: Effective, Efficient, Engaging, Error Tolerant, and Easy to Learn
9. Data Ownership... or at least full access
https://www.jisc.ac.uk/learning-analytics
10. The Gradescope Pilot
Gradescope enables instructors to grade
paper-based exams or assignments online.
Paper exams are scanned. Gradescope AI
interprets responses and groups similar
answers to speed up grading. Rubrics help
ensure fair and consistent grading.
Working closely with Gradescope, we have access to exported data, including item-level question data.
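To illustrate what that export makes possible (a minimal sketch of our own, not Gradescope's documented schema; the file name and column layout are hypothetical), a few lines of pandas give a first per-item summary before any formal item analysis:

import pandas as pd

# Hypothetical item-level export: one row per student, one column per exam
# item, values are the points earned on that item.
scores = pd.read_csv("exam1_item_scores.csv", index_col="SID")

# Quick per-item summary (mean, max, standard deviation) as a first look
# before the item difficulty and discrimination analyses on later slides.
item_summary = scores.agg(["mean", "max", "std"]).T
print(item_summary)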
15. Research Context, Challenge, Goal, & Questions
Exam Data & Tools
Exam Item Analysis, Analytics, & Dashboard
Course & Instructor Implications
Questions & Discussion
RQ2: What are the best item analysis methods and easiest visualizations to
support students and instructors?
17. Item Difficulty Index
● Definition: a measure of how many students exhibited mastery of one topic.
● Formula: the percentage of the total group that got the item correct.
Reference: https://www.uwosh.edu/testing/faculty-information/test-scoring/score-report-interpretation/item-analysis-1/item-difficulty
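A minimal worked sketch of the difficulty calculation above, using a made-up 0/1 correctness table (rows = students, columns = items):

import pandas as pd

# Toy data: 1 = answered correctly, 0 = answered incorrectly.
correct = pd.DataFrame({
    "Q1": [1, 1, 1, 0, 1],
    "Q2": [1, 0, 0, 0, 1],
    "Q3": [0, 0, 1, 0, 0],
})

# Item difficulty = percentage of the total group that got the item correct.
difficulty = correct.mean() * 100
print(difficulty.round(1))  # Q1: 80.0, Q2: 40.0, Q3: 20.0

Despite the name, a higher value means an easier item; lower values indicate harder items.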
19. Item Discrimination Index
● Definition: a measure of an item’s effectiveness in differentiating students who
mastered the topic from those who did not.
● Formula:
○ (Upper Group Percent Correct) - (Lower Group Percent Correct)
○ Upper Group = Top 27% of exam score
○ Lower Group = Lowest 27% of exam score
● Index scale:
○ 40%-100%: Excellent Predictive Value
○ 25%-39%: Good Predictive Value
○ 0%-24%: Possibly No Predictive Value
Reference: https://www.uwosh.edu/testing/faculty-information/test-scoring/score-report-interpretation/item-analysis-1/item-i
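Continuing the sketch (same assumed 0/1 correctness table as in the difficulty example), the discrimination index compares the top and bottom 27% of total exam scores item by item:

import pandas as pd

def discrimination_index(correct: pd.DataFrame, group_frac: float = 0.27) -> pd.Series:
    total = correct.sum(axis=1)                    # each student's total exam score
    n = max(1, int(round(len(correct) * group_frac)))
    upper = correct.loc[total.nlargest(n).index]   # top 27% of exam scores
    lower = correct.loc[total.nsmallest(n).index]  # lowest 27% of exam scores
    # (Upper Group Percent Correct) - (Lower Group Percent Correct), per item
    return (upper.mean() - lower.mean()) * 100

Items landing in the 40%-100% band separate the two groups well; values near zero (or negative) flag items worth revisiting.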
22. Research Context, Challenge, Goal, & Questions
Exam Data & Tools
Exam Item Analysis, Analytics, & Dashboard
Course & Instructor Implications
Questions & Discussion
RQ3: What are the course, student & instructor impacts and implications for
continuous improvement changes?
23. Course & Instructor Implications
Exam design is slightly modified –
item answer spaces are delineated,
and an initial rubric is put in place.
Exam processing requires
significant investment in labor.
24. Course & Instructor Implications
Exam scanning requires roughly
2-3 hours additional time for 1000
exams.
Exam grading is much smoother,
and improvements are immediately
apparent.
25. Course & Instructor Implications
Exam feedback can be more
informative and personalized.
● Applied rubric items
● Personal item feedback
Exam data is easily accessible.
● Overall exam statistics
● Item-by-item statistics
● Grades synchronize with LMS
26. Course & Instructor Implications
Regrade requests drop dramatically
● Previous benchmark of 40
requests for 1000 exams
(4% regrades)
After moving to Gradescope:
● Exam item regrade requests: 64
● Total exam items graded: 85,078
● 0.075% regrades
Monolithic Exams
The data analytics and economies of
scale are only possible with monolithic
exams.
Multiple versions and randomized
answers are still works in progress at
Gradescope.
Many large courses at ND don’t
currently use monolithic exams.
27. Course & Instructor Implications
Test Item Library
We want test questions that efficiently
gauge mastery of material.
We want to eliminate item bias,
particularly linguistic and cultural
biases.
Do free response items test mastery of
material that multiple choice items
don’t capture?
Early Warning System
Can we catch struggling students early
in the semester?
Do homework attempts signify
problems with mastery?
Do particular homework items correlate
with particular exam items?
Which homework items don’t translate into mastery on exams?
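One way to start on these questions (a rough sketch, not a finished early-warning system; the helper function and both DataFrames are our own illustration): correlate per-item homework scores with per-item exam scores for the same students.

import pandas as pd

def homework_exam_correlations(homework: pd.DataFrame, exam: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation of each homework item (rows) with each exam item (columns)."""
    # Keep only students who appear in both tables.
    hw, ex = homework.align(exam, join="inner", axis=0)
    return pd.DataFrame({item: hw.corrwith(ex[item]) for item in ex.columns})

Homework items with uniformly low correlations are candidates for the last question above; weak early performance on highly correlated items could trigger a check-in with the student.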
28. Summary
RQ1: How do we evaluate an assessment technology tool? (SAMR, 5 E’s)
RQ2: What are the best item analysis methods and easiest visualizations to
support students and instructors? (Distractor Performance, Item
Difficulty & Discrimination)
RQ3: What are the course, student & instructor impacts and implications for continuous improvement changes? (Scanning, Grading & Regrading, Feedback, Data & Analytics, Revisit & Revise Test Item Library, & Early Warning)
29. Future Work?
● Early Warning: Cross-reference student learning activity, homework
analytics, and exam item analysis to let instructors intervene early to
improve student performance, course, and assessment design.
● Question Bank: Over time, build a more inclusive question bank in Gradescope (items that are not too long and carry no unintentional bias) and compare previous exam items year over year.
● Deeper Analysis: Overlay filters based on demographics, SES, ESL, and
HS preparation
● Scale to other STEM Courses: Calculus, Organic Chemistry, and Physics
30. References
Ambrose, G. Alex, Abbott, Kevin, & Lanski, Alison (2017). “Under the Hood of a Next Generation Digital Learning Environment in Progress.” Educause Review.
Gugiu, M. R., & Gugiu, P. C. (2013). Utilizing item analysis to improve the evaluation of student performance. Journal of Political Science Education, 9(3), 345-361.
Kern, Beth, et al. (2015). “The role of SoTL in the academy: Upon the 25th anniversary of Boyer’s Scholarship Reconsidered.” Journal of the Scholarship of Teaching and Learning, 15(3), 1-14.
Miller, Patrick, & Duan, Xiaojing (2018). “NGDLE Learning Analytics: Gaining a 360-Degree View of Learning.” Educause Review.
Nielsen, J. (1993). Usability Engineering (1st ed.). Morgan Kaufmann.
Nieveen, N., & van den Akker, J. (1999). Exploring the potential of a computer tool for instructional developers. Educational Technology Research and Development, 47(3), 77-98.
Puentedura, R. R. (2014). SAMR and TPCK: A hands-on approach to classroom practice. Hippasus. Retrieved from http://www.hippasus.com/rrpweblog/archives/2012/09/03/BuildingUponSAMR.pdf
Siri, A., & Freddano, M. (2011). The use of item analysis for the improvement of objective examinations. Procedia-Social and Behavioral Sciences, 29, 188-197.
Syed, M., Anggara, T., Duan, X., Lanski, A., Chawla, N., & Ambrose, G. A. (2018). Learning Analytics Modular Kit: A Closed Loop Success Story in Boosting Students. Proceedings of the International Conference on Learning Analytics & Knowledge.
31. Research Problem, Goal, Questions, and Context
Exam Data & Tools
Exam Item Analysis, Analytics, & Dashboard
Course & Instructor Implications
Questions & Discussion