14th International Conference on Methodologies and
Intelligent Systems for Technology Enhanced Learning
University of Salamanca (Spain)
26th-28th June, 2024
Introduction
● Importance of Feedback in Education
○ Crucial for understanding and improving learning
○ Formative assessment enhances learning outcomes
● Objective of the Study
○ Assess the quality of three types of feedback:
■ Schematic Feedback (Traditional AI)
■ Discursive Feedback (LLM)
■ Combined Feedback (LLM + Schematic Feedback)
Research Questions
RQ1: How can we exploit Large Language Models (LLMs) to provide helpful feedback to
students in our educational scenario?
● Objective: Investigate how effectively LLMs, such as ChatGPT-4, can be utilized to generate
feedback that is both informative and supportive for students in data science courses.
● Approach: Compare traditional AI-generated feedback with feedback generated by LLMs
and a combination of both, focusing on clarity, accuracy, and educational value.
RQ2: What is the students' perception of the feedback provided?
● Objective: Understand the students' viewpoints on the usefulness, clarity, and overall quality
of the feedback they receive.
● Approach: Collect and analyze student feedback through surveys and course evaluations,
comparing their experiences with LLM-generated feedback against traditional feedback
methods.
Methodology - The Feedback
Methodology - Schematic Feedback
Schematic Feedback
● Technique: a toolchain compares the correct solution with the student's solution
● Feedback Generation: a pair of lists of triples (3-tuples)
a. list 1: (command_1, output_1, optionalComment_1)
b. list 2: (command_2, output_2, optionalComment_2)
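A minimal sketch of the triple representation, assuming each solution is stored as an ordered list of (command, output, optionalComment) triples. The class name `Step`, the helper `commands`, and the sample commands are hypothetical illustrations, not the authors' toolchain.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    """One triple of a solution: (command, output, optionalComment)."""
    command: str
    output: str
    comment: Optional[str] = None  # the comment is optional


# A solution is an ordered list of triples (hypothetical representation).
Solution = List[Step]


def commands(solution: Solution) -> List[str]:
    """Return only the command sequence, e.g. for a later comparison step."""
    return [step.command for step in solution]


# Hypothetical example: a reference solution and a student's solution.
reference: Solution = [
    Step("x <- c(1, 2, 3)", ""),
    Step("mean(x)", "2", "the mean of x is 2"),
]
student: Solution = [
    Step("x <- c(1, 2, 3)", ""),
]
```

Keeping the comment optional mirrors the `optionalComment` field on the slide: a step can carry an output without any explanation attached.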
Feedback comparison
● Step 1: a distance between the two solutions is computed as an estimate of the final grade
● Step 2: statistical analysis of the code evaluates differences in the command/output ratios and returns feedback on missing commands
● Step 3: a classifier assesses whether the comments are correct and returns feedback
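Steps 1 and 2 can be sketched as follows. This is only an illustration under stated assumptions: the distance metric (one minus `difflib.SequenceMatcher` similarity over command sequences), the linear mapping to a grade, the hypothetical `max_grade=10.0` scale, and the simplified "missing commands" check based on command counts are all stand-ins, since the slides do not specify the actual metric or the command/output statistics. Step 3 (the comment classifier) is not sketched.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import List, Tuple

# (command, output, optionalComment); "" when there is no comment
Triple = Tuple[str, str, str]


def distance(reference: List[Triple], student: List[Triple]) -> float:
    """Step 1 (assumed metric): 1 - similarity of the command sequences, in [0, 1]."""
    ref_cmds = [t[0] for t in reference]
    stu_cmds = [t[0] for t in student]
    return 1.0 - SequenceMatcher(None, ref_cmds, stu_cmds).ratio()


def estimated_grade(d: float, max_grade: float = 10.0) -> float:
    """Map the distance to a grade; the linear scale is a hypothetical choice."""
    return round(max_grade * (1.0 - d), 1)


def missing_commands(reference: List[Triple], student: List[Triple]) -> List[str]:
    """Step 2 (simplified): commands used more often in the reference than by the student."""
    ref_counts = Counter(t[0] for t in reference)
    stu_counts = Counter(t[0] for t in student)
    missing = ref_counts - stu_counts  # Counter subtraction keeps only positive counts
    return sorted(missing.elements())


# Hypothetical example: the student skipped one command.
ref = [("x <- c(1, 2, 3)", "", ""), ("mean(x)", "2", ""), ("sd(x)", "1", "")]
stu = [("x <- c(1, 2, 3)", "", ""), ("sd(x)", "1", "")]
d = distance(ref, stu)  # 2 of 3 + 2 commands match: ratio 0.8, distance 0.2
print(estimated_grade(d), missing_commands(ref, stu))
```

`SequenceMatcher` accepts any sequences of hashable items, so it compares whole commands rather than characters. A real toolchain would likely normalise commands (whitespace, variable names) before matching.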
Methodology - LLM-generated Feedback
● Technique: two types of feedback generated via ChatGPT-4
a. 1st - discursive feedback: a prompt with (i) the student model, (ii) the exercise, (iii) the given solution, and (iv) a request for ChatGPT to explain the mistakes
b. 2nd - combined feedback: enriches the 1st prompt with the schematic feedback
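The two prompt variants can be sketched as a single builder, where the combined variant only appends the schematic feedback. The function name `build_prompt` and all prompt wording are hypothetical. The slides list the prompt's ingredients but not its text, and the call to ChatGPT-4 itself is omitted.

```python
from typing import Optional


def build_prompt(student_model: str, exercise: str, solution: str,
                 schematic_feedback: Optional[str] = None) -> str:
    """Assemble a hypothetical prompt; pass schematic_feedback for the combined variant."""
    parts = [
        f"Student model: {student_model}",
        f"Exercise: {exercise}",
        f"Solution provided: {solution}",
    ]
    if schematic_feedback is not None:
        # Combined feedback: enrich the discursive prompt with the toolchain output.
        parts.append(f"Automatic analysis: {schematic_feedback}")
    parts.append("Explain the mistakes in the solution, if any.")
    return "\n".join(parts)


discursive = build_prompt("ability: 0.8 logits", "Compute the mean of x.", "sd(x)")
combined = build_prompt("ability: 0.8 logits", "Compute the mean of x.", "sd(x)",
                        schematic_feedback="missing command: mean(x)")
```

Building both variants from one function keeps them identical except for the schematic section. This makes it easier to attribute any difference in feedback quality to that extra information.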
Methodology - LLM-gen -> Discursive Feedback
● Technique: produced from a prompt built with schematic feedback information
● Student Model: contains the student's general ability in the concepts taught in the course, estimated through the Rasch model
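In the Rasch model, the probability that a student with ability θ solves an item of difficulty b is P = 1 / (1 + e^-(θ - b)). A minimal sketch of estimating θ by maximum likelihood (Newton-Raphson) follows. It assumes the item difficulties are already known and the responses are scored 0/1. The function names are hypothetical, and the slides do not say which estimation procedure the authors used.

```python
import math
from typing import Sequence


def rasch_prob(theta: float, b: float) -> float:
    """Rasch model: probability that ability theta solves an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses: Sequence[int], difficulties: Sequence[float],
                     tol: float = 1e-8, max_iter: int = 100) -> float:
    """Maximum-likelihood estimate of theta, assuming known item difficulties."""
    score = sum(responses)
    if score == 0 or score == len(responses):
        # The likelihood has no finite maximum for all-wrong or all-correct patterns.
        raise ValueError("MLE is infinite for all-wrong or all-correct responses")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = score - sum(probs)                  # d logL / d theta
        information = sum(p * (1 - p) for p in probs)  # -d2 logL / d theta2
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical example: three items, the hardest one answered incorrectly.
theta_hat = estimate_ability([1, 1, 0], [-1.0, 0.0, 1.0])
```

At the estimate, the expected number of correct answers equals the observed score. How the resulting ability value is worded inside the prompt is not described in the slides.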
Methodology - LLM-gen -> Combined Feedback
● Technique: two types of feedback through ChatGPT-4
a. Feedback 1: a prompt with (i) the student model, (ii) the exercise, (iii) the solution provided, and (iv) a sentence asking ChatGPT to explain the errors, if any.
b. Feedback 2: enriches the prompt with the schematic feedback described earlier.
● Three-step evaluation:
a. estimate of the final grade through a distance
b. statistical analysis to evaluate the command/output ratios
c. a classifier assesses whether the comment is correct and returns feedback on its correctness
Evaluation by Teachers
Objective: answer RQ1 by comparing the three types of feedback
● RQ1.1: (i) identify the best of the three types of feedback for 60 incorrect student solutions; (ii) two professors were asked to evaluate whether the feedback was clear enough
● RQ1.2: compare the student and correct solutions
Evaluation by Students
Objective: answer RQ2 by analysing the utility of the LLM-generated feedback and of schematic (SF) & combined (CF) feedback
● RQ2.1: (i) at the beginning of the course, evaluation of the usefulness of AI feedback; (ii) before the exam, assessment of the students' experience with AI feedback.
● RQ2.2: two questions about Schematic & Combined feedback:
a. How effective is schematic feedback compared with combined feedback?
b. Is schematic feedback sufficient, or is a mix sometimes better, or even necessary?
Conclusion
Key Findings
● Three types of feedback:
a. (i) Schematic Feedback, produced by a toolchain with statistical code analysis and a classifier for short sentences.
b. (ii) Discursive Feedback, using Schematic Feedback information, and (iii) Combined Feedback, both exploiting an LLM (ChatGPT-4).
● Combined Feedback is the most effective in helping students
Implications
● Potential for broader application of LLMs in educational settings
● Importance of context-specific prompts for effective feedback

Exploring the Impact of LLM-Generated Feedback