Integrating formative practice questions with textbook content at frequent intervals creates an active learning environment that is more effective for student learning. Advances in artificial intelligence have made it possible to develop automatic question generation systems robust enough for use with students at scale. In this paper, we analyze five types of automatically generated questions using data from hundreds of thousands of students across more than eight thousand textbooks. The difficulty and persistence performance metrics of these questions build on previous research and reveal insights into question performance and student behavior. Metacognitive tutorial activities are also generated, and investigation into students' open-ended responses shows differences in how students apply what they have learned from the text.
Advancing Intelligent Textbooks with Automatically Generated Practice: A Large-Scale Analysis of Student Data
Rachel Van Campenhout, Michelle Clark, Bill Jerome, Jeffrey S. Dittel, and Benny G.
Johnson
VitalSource Learning Science
iTextbooks Workshop, AIED 2023
Goals for this Research
This study evaluates the automatically generated (AG) questions in two
different ways: analyzing questions based on difficulty and persistence
where applicable, and performing an initial analysis of students’ textual
responses.
There are two primary research goals in this investigation:
1. To learn more about the performance of AG questions at a large scale
2. To gain a better understanding of emerging patterns in student behavior
Automatic Question Generation: Prior Research
In the same course, we compared AG questions to human-authored questions on:
• Engagement
• Difficulty
• Persistence
• Discrimination
This provided a baseline of
question performance (for
matching and fill-in-the-blank) in a
classroom setting.
Van Campenhout, R., Dittel, J. S., Jerome, B., & Johnson, B. G. (2021). Transforming textbooks into learning by doing environments: An evaluation of textbook-based automatic question generation. In: Third Workshop on Intelligent Textbooks at the 22nd International Conference on Artificial Intelligence in Education. CEUR Workshop Proceedings, ISSN 1613-0073, pp. 1–12. http://ceur-ws.org/Vol-2895/paper06.pdf

Johnson, B. G., Dittel, J. S., Van Campenhout, R., & Jerome, B. (2022). Discrimination of automatically generated questions used as formative practice. In Proceedings of the Ninth ACM Conference on Learning@Scale (pp. 325–329). https://doi.org/10.1145/3491140.3528323
CoachMe Practice Questions
Describe how AI-generated questions are used in the classroom at the University of Central Florida. Share benefits and lessons learned on using AI-generated courseware in the classroom.
• Formative questions aligned to short text sections
• 5 question types + tutorials
• Primarily lower-level Bloom's with tutorials scaffolding to higher levels
• Available across domain categories
Method of Generation
The textbook itself serves as the corpus for the natural language processing used in question generation.
Using Kurdi et al.'s method categorization:
Levels of Understanding
• Semantic and syntactic
Procedures of Transformation
• Primarily rule-based system
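As an illustration of what a rule-based transformation can look like, the sketch below blanks out a known key term in a source sentence to produce a fill-in-the-blank item. The function and its key-term input are hypothetical simplifications for exposition, not the production system, which selects terms via NLP over the full textbook corpus.

```python
import re

def make_fitb(sentence: str, key_term: str) -> dict:
    """Blank the first occurrence of key_term to form a FITB question stem."""
    pattern = re.compile(re.escape(key_term), re.IGNORECASE)
    if not pattern.search(sentence):
        raise ValueError("key term not found in sentence")
    stem = pattern.sub("_____", sentence, count=1)
    return {"stem": stem, "answer": key_term}

question = make_fitb(
    "Mitochondria are the powerhouse of the cell.", "mitochondria"
)
# question["stem"] == "_____ are the powerhouse of the cell."
```

A real system layers syntactic and semantic analysis on top of such rules to choose pedagogically useful terms rather than accepting one as input.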
Data Set
This data set covers all student interaction events from January 4, 2022 (launch) through May 12, 2023, and includes:
• 8,407 textbooks
• 334,902 students
• 941,318 unique questions
• 8,753,453 clickstream events
• 5,370,981 questions answered
• 7,077,271 individual answer attempts
Difficulty
In this study, difficulty is determined by students' first answer attempts on the questions. The difficulty index (the percentage of students answering correctly on the first attempt) was computed for each AG question type.
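The difficulty index as defined here reduces to a simple proportion over first attempts. A minimal sketch, with illustrative names and data:

```python
def difficulty_index(first_attempts):
    """Percentage of first attempts that were correct (1 = correct, 0 = incorrect)."""
    if not first_attempts:
        return 0.0
    return 100 * sum(first_attempts) / len(first_attempts)

# 7 of 10 students answer correctly on the first attempt -> 70.0
difficulty_index([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])
```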
Persistence
For students who answer a question incorrectly on their first attempt, persistence is defined as continuing to answer until entering the correct response. The persistence rate was computed for each question type.
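Under that definition, the persistence rate can be sketched as follows; the sequence encoding (one list of attempts per student per question, 1 = correct, 0 = incorrect) is an illustrative assumption about the data shape:

```python
def persistence_rate(attempt_sequences):
    """Among students who missed the first attempt, the fraction who kept
    answering until they entered the correct response."""
    missed_first = [s for s in attempt_sequences if s and s[0] == 0]
    if not missed_first:
        return 0.0
    persisted = sum(1 for s in missed_first if s[-1] == 1)
    return persisted / len(missed_first)

# One student is right away (excluded), one persists to correct, one gives up -> 0.5
persistence_rate([[1], [0, 0, 1], [0, 0]])
```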
Student Behavior: “Non-Genuine”
Answers
For FITB questions, students could enter anything. How often do they submit "non-genuine" responses? A set of simple rules was developed to estimate the percentage of non-genuine first attempts, such as:
• Very short answers (less than 3 characters)
• Answers with no vowels
• Answers containing punctuation
• Known common non-answers (e.g., “idk”)
About 12.2% of responses were categorized as non-genuine. Persistence for this group was 46.5%.
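The rules above can be sketched as a simple classifier. The thresholds and the non-answer list here are illustrative assumptions, not the exact production rules:

```python
import string

KNOWN_NON_ANSWERS = {"idk", "asdf", "n/a"}  # illustrative, not the real list
VOWELS = set("aeiouAEIOU")

def is_non_genuine(response: str) -> bool:
    """Flag a response that matches any of the simple non-genuine heuristics."""
    text = response.strip()
    if len(text) < 3:                                  # very short answer
        return True
    if not any(ch in VOWELS for ch in text):           # no vowels
        return True
    if any(ch in string.punctuation for ch in text):   # contains punctuation
        return True
    if text.lower() in KNOWN_NON_ANSWERS:              # known common non-answer
        return True
    return False
```

In practice such rules would be tuned against labeled examples, since legitimate answers can contain punctuation (e.g., hyphenated terms).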
Single Course Comparison
How do these metrics compare to questions used in a known classroom
context?
Non-genuine answers were 17.1%, but persistence for that group was 92.0%.
Metacognitive Tutorials
• For select MC questions, correct responses initiate follow-up questions
• Students correct a peer's incorrect choice, allowing for higher-level cognitive and metacognitive processes
Metacognitive Tutorial Responses: Key Terms and Length
How did students use the correct and incorrect key terms? How does that relate to answer length?
Metacognitive Tutorial Responses: Single Course Comparison
How do these results compare to a single course?
Left: all data. Right: Criminal Justice course.
Summary of Findings
• The difficulty and persistence performance metrics were slightly lower than, yet qualitatively consistent with, prior research on AG questions within courseware.
• Within a university course, both the difficulty index and the persistence rate increased to levels comparable to the courses in prior research.
• Some students (12%) engage in "non-genuine" response strategies, but nearly half of them persist to the correct answer.
• Student tutorial responses reveal a relationship between the key terms used and response length.
Conclusion
Recent advancements in artificial intelligence, specifically in natural
language processing and machine learning tools, have facilitated the
development of automatic question generation systems capable of
producing high-quality formative practice questions.
Automatic question generation (AQG) systems can accomplish what is otherwise too costly: the generation of millions of formative practice questions to support learning by doing in textbooks at scale.
Application of artificial intelligence in accordance with learning science
research has significant potential for benefiting students.
For questions on the research, feel free to reach out to
rachel.vancampenhout@vitalsource.com
Thank You!