2019 Midwest Scholarship of Teaching & Learning (SoTL) conference presentation. The goal of this presentation is to share our data-informed approach to re-engineering the exam design, delivery, grading, and item analysis process in order to construct better exams that maximize all students' potential to flourish. Can we make the use of exam analytics so easy and time-efficient that faculty clearly see the benefit? For more info see our blog at https://kaneb.nd.edu/real/
1. Exams Evaluate Students: Who’s Evaluating Exams?
Data-Informed Exam Design
G. Alex Ambrose, Professor of the Practice, Kaneb Center for Teaching and Learning
Kael Kanczuzewski, Academic Technology Professional, Learning Platforms
Xiaojing Duan, Learning Platform/Analytics Engineer, Learning Platforms
Kelley M. H. Young, Assistant Teaching Professor, Dept of Chemistry and Biochemistry
J. Daniel Gezelter, Professor and Director of Undergraduate Studies, Dept of Chemistry and Biochemistry
University of Notre Dame
2019 Midwest Scholarship of Teaching and Learning Conference
2. How to Cite this Presentation:
Ambrose, G. Alex, Duan, Xiaojing, Kanczuzewski, Kael, Young, Kelley M., & Gezelter, J. Daniel (2019). “Exams Evaluate Students: Who’s Evaluating Exams? Data-Informed Exam Design.” 2019 Midwest Scholarship of Teaching and Learning Conference, Indiana University South Bend.
5. Research Context, Challenge, & Goals
Exams are:
1. a tool to assess mastery,
2. an incentive to motivate students to study, and
3. the cause of retention issues for underserved and underrepresented
students in STEM majors.
Challenge: Can we make exams do the first two tasks more effectively while
fixing the retention issue?
Goals: Transform the exam design, delivery, grading, analysis, and redesign
process to make it more efficient, effective, error-free, easy to use, and
enjoyable.
6. Research Questions
RQ1: How do we evaluate an assessment technology tool?
RQ2: What are the best item analysis methods and easiest visualizations to
support students and instructors?
RQ3: What are the course, student & instructor impacts and implications for
continuous improvement changes?
7. Research Context, Challenge, Goal, & Questions
Exam Data & Tools
Exam Item Analysis, Analytics, & Dashboard
Course & Instructor Implications
Questions & Discussion
RQ1: How do we evaluate an assessment technology tool?
8. Ed Tech Evaluation: SAMR + 5 E’s
https://www.wqusability.com/
https://www.schoology.com/blog/samr-model-practical-guide-edtech-integration
5 E’s of Usability: Effective, Efficient, Engaging, Error Tolerant, and Easy to Learn
9. Data Ownership... or at least full access
https://www.jisc.ac.uk/learning-analytics
10. The Gradescope Pilot
Gradescope enables instructors to grade
paper-based exams or assignments online.
Paper exams are scanned. Gradescope AI
interprets responses and groups similar
answers to speed up grading. Rubrics help
ensure fair and consistent grading.
Working closely with Gradescope, we have access to exported data, including item-level question data.
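To illustrate what that export makes possible (a minimal sketch of our own, not Gradescope's documented schema; the file name and column layout are hypothetical), a few lines of pandas give a first per-item summary before any formal item analysis:

import pandas as pd

# Hypothetical item-level export: one row per student, one column per exam
# item, values are the points earned on that item.
scores = pd.read_csv("exam1_item_scores.csv", index_col="SID")

# Quick per-item summary (mean, max, standard deviation) as a first look
# before the item difficulty and discrimination analyses on later slides.
item_summary = scores.agg(["mean", "max", "std"]).T
print(item_summary)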
15. Research Context, Challenge, Goal, & Questions
Exam Data & Tools
Exam Item Analysis, Analytics, & Dashboard
Course & Instructor Implications
Questions & Discussion
RQ2: What are the best item analysis methods and easiest visualizations to
support students and instructors?
17. Item Difficulty Index
● Definition: a measure of how many students exhibited mastery of one topic.
● Formula: the percentage of the total group that got the item correct.
Reference: https://www.uwosh.edu/testing/faculty-information/test-scoring/score-report-interpretation/item-analysis-1/item-difficulty
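A minimal worked sketch of the difficulty calculation above, using a made-up 0/1 correctness table (rows = students, columns = items):

import pandas as pd

# Toy data: 1 = answered correctly, 0 = answered incorrectly.
correct = pd.DataFrame({
    "Q1": [1, 1, 1, 0, 1],
    "Q2": [1, 0, 0, 0, 1],
    "Q3": [0, 0, 1, 0, 0],
})

# Item difficulty = percentage of the total group that got the item correct.
difficulty = correct.mean() * 100
print(difficulty.round(1))  # Q1: 80.0, Q2: 40.0, Q3: 20.0

Despite the name, a higher value means an easier item; lower values indicate harder items.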
19. Item Discrimination Index
● Definition: a measure of an item’s effectiveness in differentiating students who
mastered the topic from those who did not.
● Formula:
○ (Upper Group Percent Correct) - (Lower Group Percent Correct)
○ Upper Group = Top 27% of exam score
○ Lower Group = Lowest 27% of exam score
● Index scale:
○ 40%-100%: Excellent Predictive Value
○ 25%-39%: Good Predictive Value
○ 0%-24%: Possibly No Predictive Value
Reference: https://www.uwosh.edu/testing/faculty-information/test-scoring/score-report-interpretation/item-analysis-1/item-i
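Continuing the sketch (same assumed 0/1 correctness table as in the difficulty example), the discrimination index compares the top and bottom 27% of total exam scores item by item:

import pandas as pd

def discrimination_index(correct: pd.DataFrame, group_frac: float = 0.27) -> pd.Series:
    total = correct.sum(axis=1)                    # each student's total exam score
    n = max(1, int(round(len(correct) * group_frac)))
    upper = correct.loc[total.nlargest(n).index]   # top 27% of exam scores
    lower = correct.loc[total.nsmallest(n).index]  # lowest 27% of exam scores
    # (Upper Group Percent Correct) - (Lower Group Percent Correct), per item
    return (upper.mean() - lower.mean()) * 100

Items landing in the 40%-100% band separate the two groups well; values near zero (or negative) flag items worth revisiting.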
22. Research Context, Challenge, Goal, & Questions
Exam Data & Tools
Exam Item Analysis, Analytics, & Dashboard
Course & Instructor Implications
Questions & Discussion
RQ3: What are the course, student & instructor impacts and implications for
continuous improvement changes?
23. Course & Instructor Implications
Exam design is slightly modified –
item answer spaces are delineated,
and an initial rubric is put in place.
Exam processing requires
significant investment in labor.
24. Course & Instructor Implications
Exam scanning requires roughly
2-3 hours additional time for 1000
exams.
Exam grading is much smoother,
and improvements are immediately
apparent.
25. Course & Instructor Implications
Exam feedback can be more
informative and personalized.
● Applied rubric items
● Personal item feedback
Exam data is easily accessible.
● Overall exam statistics
● Item-by-item statistics
● Grades synchronize with LMS
26. Course & Instructor Implications
Regrade requests drop dramatically
● Previous benchmark of 40
requests for 1000 exams
(4% regrades)
After moving to Gradescope:
● Exam item regrade requests: 64
● Total exam items graded: 85,078
● 0.075% regrades
Monolithic Exams
The data analytics and economies of
scale are only possible with monolithic
exams.
Multiple versions and randomized
answers are still works in progress at
Gradescope.
Many large courses at ND don’t
currently use monolithic exams.
27. Course & Instructor Implications
Test Item Library
We want test questions that efficiently
gauge mastery of material.
We want to eliminate item bias,
particularly linguistic and cultural
biases.
Do free response items test mastery of
material that multiple choice items
don’t capture?
Early Warning System
Can we catch struggling students early
in the semester?
Do homework attempts signify
problems with mastery?
Do particular homework items correlate
with particular exam items?
Which homework items don’t translate into mastery on exams?
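One way to start on these questions (a rough sketch, not a finished early-warning system; the helper function and both DataFrames are our own illustration): correlate per-item homework scores with per-item exam scores for the same students.

import pandas as pd

def homework_exam_correlations(homework: pd.DataFrame, exam: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation of each homework item (rows) with each exam item (columns)."""
    # Keep only students who appear in both tables.
    hw, ex = homework.align(exam, join="inner", axis=0)
    return pd.DataFrame({item: hw.corrwith(ex[item]) for item in ex.columns})

Homework items with uniformly low correlations are candidates for the last question above; weak early performance on highly correlated items could trigger a check-in with the student.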
28. Summary
RQ1: How do we evaluate an assessment technology tool? (SAMR, 5 E’s)
RQ2: What are the best item analysis methods and easiest visualizations to
support students and instructors? (Distractor Performance, Item
Difficulty & Discrimination)
RQ3: What are the course, student & instructor impacts and implications for continuous improvement changes? (Scanning, Grading & Regrading, Feedback, Data & Analytics, Revisit & Revise Test Item Library, & Early Warning)
29. Future Work?
● Early Warning: Cross-reference student learning activity, homework
analytics, and exam item analysis to let instructors intervene early to
improve student performance, course, and assessment design.
● Question Bank: Over time, build a more inclusive question bank in Gradescope (items that are not too long and carry no unintentional bias) and compare previous exam items year over year.
● Deeper Analysis: Overlay filters based on demographics, SES, ESL, and
HS preparation
● Scale to other STEM Courses: Calculus, Organic Chemistry, and Physics
30. References
Ambrose, G. Alex, Abbott, Kevin, & Lanski, Alison (2017). “Under the Hood of a Next Generation Digital Learning Environment in Progress.” Educause Review.
Gugiu, M. R., & Gugiu, P. C. (2013). Utilizing item analysis to improve the evaluation of student performance. Journal of Political Science Education, 9(3), 345-361.
Kern, Beth, et al. (2015). “The role of SoTL in the academy: Upon the 25th anniversary of Boyer’s Scholarship Reconsidered.” Journal of the Scholarship of Teaching and Learning, 15(3), 1-14.
Miller, Patrick, & Duan, Xiaojing (2018). “NGDLE Learning Analytics: Gaining a 360-Degree View of Learning.” Educause Review.
Nielsen, J. (1993). Usability Engineering (1st ed.). Morgan Kaufmann.
Nieveen, N., & van den Akker, J. (1999). Exploring the potential of a computer tool for instructional developers. Educational Technology Research and Development, 47(3), 77-98.
Puentedura, R. R. (2014). SAMR and TPCK: A hands-on approach to classroom practice. Hippasus. Retrieved from http://www.hippasus.com/rrpweblog/archives/2012/09/03/BuildingUponSAMR.pdf
Siri, A., & Freddano, M. (2011). The use of item analysis for the improvement of objective examinations. Procedia-Social and Behavioral Sciences, 29, 188-197.
Syed, M., Anggara, T., Duan, X., Lanski, A., Chawla, N., & Ambrose, G. A. (2018). Learning Analytics Modular Kit: A Closed Loop Success Story in Boosting Students. Proceedings of the International Conference on Learning Analytics & Knowledge.
31. Research Problem, Goal, Questions, and Context
Exam Data & Tools
Exam Item Analysis, Analytics, & Dashboard
Course & Instructor Implications
Questions & Discussion