1. Investigating Automated Student Modeling in a Java MOOC
Michael Yudelson¹, Roya Hosseini², Arto Vihavainen³, & Peter Brusilovsky²
¹Carnegie Learning, ²University of Pittsburgh, ³University of Helsinki
EDM 2014
2. Everybody’s Coding
• Programming is no longer the trade of the few
  – Wide penetration of computer science
  – Challenge for educators
    • Talent pool is different
    • Abundance of learning materials doesn’t help
      – Even if digital, there’s no persistent student model
      – New languages appear and need to be taught (e.g., R, Swift)
3. Problem
• Programming course (MOOC or otherwise) at the University of Helsinki
  – 100 closed-form/open-ended assignments over 6 weeks (10¹–10³ lines of code each)
  – NetBeans plugin for testing/submitting/feedback
  – Code snapshots are meticulously archived
  – No provisions to account for student learning (no student model)
• On top of black-box-style pass/fail code grading
  – Build a longitudinal student model automatically
  – Non-trivial programming assignments
4. Data
• Every snapshot was compiled and run against the tests
• JavaParser* extracted concepts/skills (programming constructs) – see the sketch below
• Incremental snapshots that did not result in changes to concepts were removed

Course                            Students: all (male)   Age: min / median / max   Snapshots: all / median
Intro to Programming, Fall 2012   185 (121)              18 / 22 / 65              204,460 / 1,131
Intro to Programming, Fall 2013   207 (147)              18 / 22 / 57              263,574 / 1,126
Programming MOOC, Spring 2013     683 (492)              13 / 23 / 75              842,356 / 876

Code for an assignment: automatically saved, run against tests, submitted.

* Hosseini, R., & Brusilovsky, P. (2013). JavaParser: A Fine-Grain Concept Indexing Tool for Java Problems. In The First Workshop on AI-supported Education for Computer Science (AIEDCS 2013), pp. 60-63.
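The concept-extraction step can be pictured with a short sketch. The authors’ JavaParser is a separate fine-grained concept indexing tool (Hosseini & Brusilovsky, 2013); the code below is only an illustration of the general idea, using the open-source com.github.javaparser library and treating raw AST node types as stand-ins for the tool’s concepts.

```java
import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.Node;

import java.util.Map;
import java.util.TreeMap;

// Illustration only: the authors' JavaParser concept indexer is a separate tool;
// here raw AST node types from the open-source com.github.javaparser library
// serve as stand-ins for programming concepts/skills.
public class ConceptExtractor {

    /** Map one code snapshot to counts of syntactic constructs. */
    public static Map<String, Integer> extractConcepts(String source) {
        CompilationUnit cu = StaticJavaParser.parse(source);
        Map<String, Integer> counts = new TreeMap<>();
        cu.walk(Node.class, node ->
                counts.merge(node.getClass().getSimpleName(), 1, Integer::sum));
        return counts;
    }

    public static void main(String[] args) {
        String snapshot =
                "class Hello { public static void main(String[] a) {"
              + "  for (int i = 0; i < 3; i++) System.out.println(i); } }";
        // Prints something like {ClassOrInterfaceDeclaration=1, ForStmt=1, ...}
        System.out.println(extractConcepts(snapshot));
    }
}
```

Per the data-cleaning bullet above, two consecutive snapshots whose concept maps are equal would then be collapsed into one.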
5. Questions
• Given that the approach is fully automated
  – Can we build accurate models of learning?
  – Can we do that while using a fraction of the data?
    • Only a fraction of the concepts are relevant in each successive code snapshot
  – Can the models be used beyond detecting student progress?
    • E.g., for building an intelligent [fully automated] hinting component for struggling students
6. Methodology (1)
• Modeling student learning
  – Additive Factors Model (restated below)
    • response_ilj = student_i + problem_j + Σ_k (skill_k + skill_slope_k · attempts_ik)
    • response_ilj – student i’s code passing test l for problem j
• Selecting concepts (AFM A, AFM B, AFM C)
  – A: all concepts available
  – B: changes from the previous snapshot
  – C: changes, distinguishing additions/deletions
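Restoring the subscripts, and adding the logistic link that the Additive Factors Model conventionally uses (the link is not spelled out on the slide, so treat it as an assumption, as is the explicit Q-matrix indicator q_jk), the model reads roughly:

```latex
\operatorname{logit} P(\mathrm{response}_{ilj} = 1)
  = \mathrm{student}_i + \mathrm{problem}_j
  + \sum_{k} q_{jk}\,\bigl(\mathrm{skill}_k + \mathrm{skill\_slope}_k \cdot \mathrm{attempts}_{ik}\bigr)
```

Here attempts_ik counts student i’s prior opportunities with concept k, and q_jk marks whether concept k occurs in the snapshot for problem j; the +Ln variants on the next slide replace attempts_ik with its logarithm.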
7. Methodology (2)
• Selecting relevant concepts (+PC)
  – PC – conditional independence search algorithm from the Tetrad tool*
  – Which concepts are associated with [not] passing the test
  – A PC data-mining task was set up for each problem (a simplified sketch follows)
• Different snapshot submission speeds (+Ln)
  – Smoothing attempt counts by taking a logarithm

* Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, Prediction, and Search, 2nd ed. MIT Press, Cambridge, MA.
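As a rough stand-in for the per-problem PC search (the actual work used Tetrad’s conditional-independence search, not this), the sketch below keeps a concept only if its presence in a problem’s snapshots is associated with passing the tests; the chi-square test and the 0.05 threshold are illustrative assumptions, not the paper’s method.

```java
import org.apache.commons.math3.stat.inference.ChiSquareTest;

// Simplified stand-in for the per-problem PC search on the slide: instead of a full
// conditional-independence search (Tetrad's PC algorithm), keep a concept if its
// presence in a problem's snapshots is associated with passing the tests.
// The chi-square test and the 0.05 threshold are illustrative assumptions.
public class ConceptRelevanceFilter {

    /** conceptPresent[s] and passed[s] describe snapshot s of a single problem. */
    public static boolean isRelevant(boolean[] conceptPresent, boolean[] passed) {
        long[][] counts = new long[2][2];  // rows: concept absent/present, cols: fail/pass
        for (int s = 0; s < passed.length; s++) {
            counts[conceptPresent[s] ? 1 : 0][passed[s] ? 1 : 0]++;
        }
        // Degenerate table (concept or outcome never varies) carries no information.
        if (counts[0][0] + counts[0][1] == 0 || counts[1][0] + counts[1][1] == 0
                || counts[0][0] + counts[1][0] == 0 || counts[0][1] + counts[1][1] == 0) {
            return false;
        }
        double p = new ChiSquareTest().chiSquareTest(counts);
        return p < 0.05;
    }
}
```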
8. Methodology (3)
• Validating the models
  – Consecutive code snapshots and changes in passing/failing the tests (YY, YN, NY, NN)
  – Model support scores for adding/deleting concepts: positive, negative, neutral (P, N, 0)
    • Support – sum of the slopes for the concept changes (sketch below)
    • NYP0 – from fail to pass, positive support for additions, neutral for deletions
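A minimal sketch of the support score, assuming the fitted AFM slopes are available as a map from concept name to slope; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Sketch of the support score: the support for the concepts added (or deleted) between
// two consecutive snapshots is the sum of the fitted AFM slopes of those concepts, and
// its sign (P/N/0) is compared against the change in test outcome (YY, YN, NY, NN).
// Class and method names are hypothetical.
public class TransitionSupport {

    enum Sign { POSITIVE, NEGATIVE, NEUTRAL }

    /** Sum the learning slopes of the changed concepts and classify the sign. */
    static Sign support(Set<String> changedConcepts, Map<String, Double> skillSlopes) {
        double sum = changedConcepts.stream()
                .mapToDouble(c -> skillSlopes.getOrDefault(c, 0.0))
                .sum();
        if (sum > 0) return Sign.POSITIVE;
        if (sum < 0) return Sign.NEGATIVE;
        return Sign.NEUTRAL;
    }
}
```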
9. Methodology (4)
• Conditional probabilities – relative frequencies of (tally sketch below)
  – A: pass-to-pass – non-negative support for any changes
  – B: pass-to-fail – negative support for any change
  – C: fail-to-fail – no positive support for changes
  – D: fail-to-pass – positive support for changes
• Grouped conditional probabilities
  – Average of all of A, B, C, D
  – Average of B and D (arguably of primary interest)
• Last but not least – the size of the data required to fit the models
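A sketch of how the A-D validation values could be tallied from labelled transitions; the Transition record and the supportAgrees flag (whether the support sign satisfies that label’s criterion) are assumptions for illustration.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Sketch of the A-D tallies: each consecutive-snapshot transition is labelled
// A (pass-to-pass), B (pass-to-fail), C (fail-to-fail) or D (fail-to-pass), and the
// validation value for a label is the fraction of its transitions whose support sign
// satisfies that label's criterion (e.g. D requires positive support for the changes).
// The Transition record and the supportAgrees flag are assumptions for illustration.
public class ValidationTally {

    enum Label { A, B, C, D }

    record Transition(boolean passedBefore, boolean passedAfter, boolean supportAgrees) {}

    static Map<Label, Double> validationValues(List<Transition> transitions) {
        Map<Label, int[]> counts = new EnumMap<>(Label.class);  // value: [agreeing, total]
        for (Transition t : transitions) {
            Label label = t.passedBefore()
                    ? (t.passedAfter() ? Label.A : Label.B)
                    : (t.passedAfter() ? Label.D : Label.C);
            int[] c = counts.computeIfAbsent(label, k -> new int[2]);
            if (t.supportAgrees()) c[0]++;
            c[1]++;
        }
        Map<Label, Double> values = new EnumMap<>(Label.class);
        counts.forEach((l, c) -> values.put(l, (double) c[0] / c[1]));
        return values;
    }
}
```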
10. Results (1)
• Accuracy, data size, validation values
11. Results (2)

Model          Acc.  Acc. rnk  File Sz rnk  Val. A-D rnk  Val. B,D rnk  Overall rnk
Rasch          .71   -         -            -             -             -
AFM A          .81
AFM B          .73
AFM C          .78
AFM A+PC       .84   1
AFM B+PC       .77
AFM C+PC       .83   2
AFM A+Ln*      .75                          2 (.62)       3 (.45)
AFM B+Ln       .71             1 (123Mb)    1 (.63)                     2 (4.75)
AFM C+Ln       .77             2 (139Mb)
AFM A+PC+Ln    .82   3         6 (284Mb)    8 (.59)       2 (.47)       3 (4.75)
AFM B+PC+Ln    .75             3 (141Mb)    3 (.62)       1 (.49)       1 (4.00)
AFM C+PC+Ln    .78

* Logarithm of opportunity counts slightly inflates the log file size due to the text format.
See the full table in the paper.
12. Discussion
• It is possible to fully automate student modeling (in the programming domain) with a fraction of the rich data
• The models we built have the potential to be used for providing in-problem learning support
• The choice of the best model involves tradeoffs
  – Accuracy vs. data requirements vs. validation
13. Future Work
• Address concept counts in snapshots
• Make use of code structure (parse trees)
• Make use of student behaviors
  – Builder, Massager, Reducer, Struggler
• Account for within-IDE actions (save, run, ask for a hint)
• Tie in to students’ browsing of the support material
14. Thank You!
