
Generation of Assessment Questions from Textbooks Enriched with Knowledge Models


  1. Generation of Assessment Questions from Textbooks Enriched with Knowledge Models
     Lucas Dresscher, Isaac Alpizar Chacon, Sergey Sosnovsky
  2. 1a. Introduction
     ● Assessment questions in digital textbooks:
       ○ opportunity to practice
       ○ opportunity to receive feedback
       ○ opportunity to engage in interactive learning
  3. 1a. Introduction
     ● Assessment questions in digital textbooks:
       ○ opportunity to practice
       ○ opportunity to receive feedback
       ○ opportunity to engage in interactive learning
     ● Assessment questions in adaptive textbooks:
       ○ opportunity to assess student knowledge
       ➢ opportunity to adapt
  4. 1a. Introduction
     ● Assessment questions in digital textbooks:
       ○ opportunity to practice
       ○ opportunity to receive feedback
       ○ opportunity to engage in interactive learning
     ● Assessment questions in adaptive textbooks:
       ○ opportunity to assess student knowledge
       ➢ opportunity to adapt
     ● Three primary ways to add assessment questions:
       ○ create them manually
       ○ integrate external assessment material
       ○ generate them automatically
  5. 1b. Introduction
     ● AQG: current state-of-the-art
       ○ High-quality factoid questions
       ○ Limitations:
         ■ Simplicity
         ■ Few generic systems
         ■ Little adaptivity
     ● Contributions
       ○ Unique source type
       ○ Open domain
       ○ Variety of questions
       ○ Supports adaptivity
  6. 2. Intextbooks Platform
     ● Knowledge models
       ○ Extraction: create semantic model
       ○ Enrichment: link additional information
       ○ Serialization: export model
  7. 2. Intextbooks Platform
     ● Knowledge models
       ○ Extraction: create semantic model
       ○ Enrichment: link additional information
       ○ Serialization: export model
  8. 3. Automatic Question Generation System
     ● Generic semantic rule-based AQG system
       ○ Domain independent
       ○ Uses textual and semantic information
       ○ Can generate questions for a part of a book or a (set of) concept(s)
  9. 3A. Source Extraction
     ● Input
       ○ Knowledge model of a textbook
     ● Extraction
       ○ Relevant sentences
       ○ Index terms’ enrichments
         ■ Domain specificity
     ● Output
       ○ Initial set of sentences
  10. 3B. Preprocessing
     ● Annotation pipeline (Stanford CoreNLP)
       ○ Parts of Speech (POS)
       ○ Dependency parsing
     ● Filtering
       ○ Incorrect form (questions, imperatives, grammatically incomplete, ...)
       ○ Context reference (needs preceding sentences to make sense)
       ○ Visual references (figures, tables, ...)
       ○ Numerical examples
       ○ etc.
     ● Output
       ○ Set of grammatically fit, annotated sentences
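
The actual pipeline uses Stanford CoreNLP; as a rough, non-authoritative illustration of the same filtering idea, here is a minimal Python sketch using Stanza (Stanford's Python NLP library, standing in for CoreNLP). The heuristics and example sentences below are purely illustrative, not the system's rules.

    import stanza  # Stanford's Python NLP package; the system itself uses Stanford CoreNLP

    # One-time model download: stanza.download('en')
    nlp = stanza.Pipeline('en', processors='tokenize,pos,lemma,depparse')

    def is_grammatically_fit(text):
        """Illustrative filter in the spirit of the slide: drop questions,
        likely imperatives, and fragments without any verb."""
        sent = nlp(text).sentences[0]
        if text.strip().endswith('?'):
            return False                                  # question form
        if not any(w.upos in ('VERB', 'AUX') for w in sent.words):
            return False                                  # grammatically incomplete
        first = sent.words[0]
        if first.upos == 'VERB' and first.xpos == 'VB':
            return False                                  # bare-verb start: likely imperative
        return True

    print(is_grammatically_fit('The mean is the average.'))  # True
    print(is_grammatically_fit('Compute the sample mean.'))  # likely False (imperative)
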
  11. 3C. Sentence Selection
     ● Select most appropriate sentences
     ● Scoring process
       ○ Weighted average of individual features
         ■ Each feature captures a different aspect
         ■ Feature score f ∈ [0, 1]
         ■ Weight w
       ○ Sentence score s ∈ [0, 1]
       ○ Threshold comparison
     ● Output
       ○ Potential source phrases
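
Read concretely, the scoring on this slide is a weighted average s = Σ wᵢ·fᵢ / Σ wᵢ compared against a threshold. A minimal sketch follows; the feature names, weights, and threshold value are hypothetical, only the aggregation scheme comes from the slide.

    def sentence_score(features, weights):
        """Weighted average of feature scores f in [0, 1], giving a sentence score s in [0, 1]."""
        total_weight = sum(weights[name] for name in features)
        return sum(weights[name] * features[name] for name in features) / total_weight

    def select_sentences(candidates, weights, threshold=0.6):
        """Keep sentences whose score clears the threshold (0.6 is an illustrative value)."""
        return [text for text, feats in candidates
                if sentence_score(feats, weights) >= threshold]

    # Hypothetical feature set: none of these names come from the paper.
    weights = {'length': 0.2, 'term_density': 0.5, 'domain_specificity': 0.3}
    candidates = [
        ('The mean is the average.',
         {'length': 0.8, 'term_density': 0.9, 'domain_specificity': 0.7}),
    ]
    print(select_sentences(candidates, weights))
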
  12. 3D. Question Type Selection
     ● Determine generatable question types
     ● Question types, with examples for the source sentence "The mean is the average.":
       ○ True-false (unmodified): The mean is the average.
       ○ True-false (negated): The mean is not the average.
       ○ True-false (substituted): The median is the average.
       ○ Cloze: The ______ is the average.
       ○ Multiple-choice: The ______ is the average. (A. Median B. Mean C. Mode D. Variance)
     ● Output
       ○ Definitive source sentences
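
A hypothetical sketch of the selection logic described here and in the editor's notes: which question types a source sentence supports depends on its target concept and on how many related elements are available. The exact rules are the system's own; the constants below only reflect that a substituted true-false question needs one related element and a multiple-choice question needs several distractors.

    def generatable_types(sentence, target_concept, related_concepts):
        """Illustrative rules only; returns the question types a sentence could yield."""
        types = ['TF-unmodified', 'TF-negated']        # any well-formed declarative sentence
        if target_concept in sentence:
            types.append('Cloze')                      # the concept can be replaced by a gap
            if related_concepts:
                types.append('TF-substituted')         # needs at least one related element
            if len(related_concepts) >= 3:
                types.append('Multiple-choice')        # needs enough related elements as distractors
        return types

    print(generatable_types('The mean is the average.', 'mean',
                            ['median', 'mode', 'variance']))
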
  13. 3E. Question Construction
     ● Create questions in surface form
     ● TFU
       ○ Directly use sentence
       ○ Answer: true
       ○ Example: The mean is the average.
     ● TFN
       ○ Negate question stem
       ○ Answer: false
       ○ Example: The mean is the average. → The mean is not the average.
     ● TFS
       ○ Substitute target concept
       ○ Answer: false
       ○ Example: The mean is the average. → The median is the average.
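
A minimal sketch of the two true-false transformations for the simple copular example on this slide; negating arbitrary sentences in the real system relies on the parse, so the plain string manipulation here is only illustrative.

    def make_tfn(sentence):
        """Negated true-false question: 'The mean is the average.' ->
        'The mean is not the average.' (answer: false)."""
        return sentence.replace(' is ', ' is not ', 1)

    def make_tfs(sentence, target_concept, related_concept):
        """Substituted true-false question: 'The mean is the average.' ->
        'The median is the average.' (answer: false)."""
        return sentence.replace(target_concept, related_concept, 1)

    print(make_tfn('The mean is the average.'))
    print(make_tfs('The mean is the average.', 'mean', 'median'))
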
  14. 3E. Question Construction
     ● CQ
       ○ Replace target concept by gap
       ○ Gap selection
       ○ Example: The mean is the average. → The _____ is the average.
     ● MCQ
       ○ CQ with options
         ■ Key
         ■ Distractors
       ○ Distractor generation
         ■ Related elements
         ■ Scoring procedure
       ○ Example: The mean is the average. → The _____ is the average. (A. Median B. Mean C. Mode D. Variance)
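
And a companion sketch for cloze and multiple-choice construction; the distractor_score argument is a placeholder for the feature-based scoring procedure mentioned on the slide.

    import random

    def make_cloze(sentence, target_concept):
        """Replace the target concept by a gap: 'The mean is the average.' ->
        ('The _____ is the average.', key 'mean')."""
        return sentence.replace(target_concept, '_____', 1), target_concept

    def make_mcq(sentence, target_concept, related_concepts, distractor_score, n_distractors=3):
        """Cloze stem plus options: the key is the target concept, the distractors are
        the highest-scoring related elements."""
        stem, key = make_cloze(sentence, target_concept)
        distractors = sorted(related_concepts, key=distractor_score, reverse=True)[:n_distractors]
        options = [key] + distractors
        random.shuffle(options)                        # mix the key in with the distractors
        return stem, options, key

    # len() stands in as a dummy scoring function for the distractor candidates
    stem, options, key = make_mcq('The mean is the average.', 'mean',
                                  ['median', 'mode', 'variance', 'range'], len)
    print(stem, options, 'key =', key)
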
  15. 4. Evaluation
     ● Selection procedure
       ○ Three university-level statistics textbooks
       ○ Ten randomly selected co-occurring concepts
         ■ Five for automatic generation
         ■ Five for manual creation
       ○ 50 questions in total
  16. 4. Evaluation
     ● Selection procedure
       ○ Three university-level statistics textbooks
       ○ Ten randomly selected co-occurring concepts
         ■ Five for automatic generation
         ■ Five for manual creation
       ○ 50 questions in total
     ● Evaluation approach
       ○ Expert evaluation
       ○ Metrics
         ■ General: wording, assessment value, difficulty
         ■ Specific: gap quality, distractor quality
  17. 4. Evaluation
     ● Research questions:
       1. Is the approach conceptually sound?
       2. Is the approach practically sound?
  18. 4. Evaluation
     ● Inter-rater agreement (Fleiss’ Kappa)
       ○ 0.24 wording
       ○ 0.27 assessment value
       ○ -0.02 difficulty
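
For reference, agreement figures of this kind can be computed with statsmodels' Fleiss' kappa implementation; the ratings matrix below is made up for illustration, not the study's data.

    import numpy as np
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    # rows = questions, columns = expert raters, values = ratings on an ordinal scale (made up)
    ratings = np.array([[3, 4, 3],
                        [5, 5, 4],
                        [2, 3, 3],
                        [4, 4, 4]])
    table, _ = aggregate_raters(ratings)   # per-question counts of raters choosing each category
    print(fleiss_kappa(table))             # ~0 means chance-level agreement, 1 means perfect
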
  19. 4. Evaluation
     ● Inter-rater agreement (Fleiss’ Kappa)
       ○ 0.24 wording
       ○ 0.27 assessment value
       ○ -0.02 difficulty
     ● Comparison between handcrafted and generated questions (Mann-Whitney U test)
       ○ Statistically significant difference for the overall assessment value (Handcrafted > Generated)
         ■ 0.32 (U = 413.5, p = 0.048)
         ■ No statistically significant difference per question type
       ○ No significant differences for the overall wording
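
Likewise, the handcrafted-versus-generated comparison is a Mann-Whitney U test, which with SciPy looks roughly like this (the rating lists are placeholders, not the study's data):

    from scipy.stats import mannwhitneyu

    handcrafted = [4, 5, 3, 4, 5, 4, 3, 5]   # expert assessment-value ratings (illustrative)
    generated   = [3, 3, 4, 2, 4, 3, 3, 2]
    u_stat, p_value = mannwhitneyu(handcrafted, generated, alternative='two-sided')
    print(f'U = {u_stat}, p = {p_value:.3f}')  # difference significant at p < 0.05
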
  20. 4. Evaluation
     ● Assessment value needs improvement
     ● Largest difference for TFUs and MCQs
     ● TFSs particularly poor
  21. 4. Evaluation
     ● Good overall wording
     ● Small differences
     ● TFSs, CQs and MCQs particularly good
  22. 4. Evaluation
     ● Easy to medium difficulty
     ● Small differences
  23. 5. Conclusion
     ● Limitations
       ○ Nature of educational textbooks
       ○ Weights of feature sets
     ● Future work
       ○ Other domains
       ○ Additional features
       ○ Closer Intextbooks integration
  24. Time to Generate Some Questions
  25. 4. Evaluation
     ● Question-type-specific metrics
     ● Gap quality good
       ○ No difference
       ○ Comment: ambiguous question
     ● Distractor quality mediocre
       ○ 2 out of 3 good (handcrafted)
       ○ 1-2 out of 3 good (generated)
       ○ Comment: unrelated to key

Editor's Notes

  • Welcome, glad you’re all here; one advantage of a digital version is that we don’t have to worry about traffic jams all around Utrecht
    Excited to talk about the research of the past 7-8 months on the automated generation of assessment questions from textbook models
    First introduce the research area
    Why? The problem it aims to resolve
    Current state of the art
    Then a quick look at the earlier work which my research is built upon
    Before moving on to the actual AQG system I researched
    Finalize with results from a pilot evaluation and some concluding words
  • Motivation for this research area begins with the general value of assessment questions
    Improve the learning process: interactivity, reinforce the learning process by repeating concepts, provide evidence of a student’s knowledge
    MOOC platforms, like Coursera or Duolingo, have rapidly grown in popularity, further boosted by the COVID-19 pandemic. With millions of users it is no longer feasible to manually develop assessment questions at such a large scale → so a solution is needed.
    New techniques and improvements from other research areas, e.g. neural networks



  • Current state of the art
    Capable of generating high-quality factoid questions using different types of systems and generation sources
    Limitations
    A general shortcoming is the simplicity of the questions, targeting only lower cognitive levels
    Research commonly focuses on just a single question type, with systems specifically designed for that type, and sometimes a single domain
    Little adaptivity in question scope and difficulty
  • Most important for this research is the extracted domain knowledge, obtained by processing the index at the end of the textbook
    Individual index terms are identified.
    Using the page numbers, the sentences that are about each index term are recognized and extracted.
    Terms are connected to DBpedia resources, adding semantic information:
    abstracts, categories, other concepts to which the term is related, and domain specificity (the relationship the index term has with the domain of the textbook).
  • The system operates in five major steps
  • Output: initial set of relevant sentences
  • Incorrect form:
    Questions
    Imperative sentences
    Grammatically incorrect
  • Which question types can be created is mainly determined by looking at the structure of the sentence, its target concept, and its relations with other related terms

  • A challenging task, because distractors need to be semantically similar but not be plausible answers themselves

    Follows the same scoring procedure as sentence selection (with a feature set that, combined, computes the quality of a distractor)

    The substituted true-false question is created in the same way, but requires just a single related element
  • One question of each of the five types was created for every concept, resulting in 50 questions in total (25 handcrafted, 25 generated)
  • For CQs and MCQs, they also looked at:
    how well is the gap chosen?
    how good are the selected distractors?
  • Low but expected as it’s a difficult metric to estimate objectively.

    → usually calibrated based on data produced by real test takers.
  • Compare evaluations of handcrafted and generated questions
  • Compare evaluations of handcrafted and generated questions
