Smart like a Fox: How clever students trick dumb programming assignment assessment systems

This case study reports on two first-semester programming courses with more than 190 students. Both courses made use of automated assessment. We observed how students trick these systems by analysing the version history of suspect submissions. By analysing more than 3300 submissions, we revealed four astonishingly simple tricks (overfitting and evasion) and cheat-patterns (redirection and injection) that students used to trick automated programming assignment assessment systems (APAAS). Although counter-measures are not the main focus of this study, it discusses and proposes them where appropriate.
Nevertheless, the primary intent of this paper is to raise problem awareness and to identify and systematise observable problem patterns more formally. The identified immaturity of existing APAAS solutions might have implications for courses that rely heavily on automation, such as MOOCs. We therefore conclude that APAAS solutions should be considered much more from a security point of view (code injection). Moreover, we identify the need to evolve existing unit testing frameworks into more evaluation-oriented teaching solutions that provide better trick and cheat detection capabilities and differentiated grading support.

  1. Smart like a Fox: How clever students trick dumb automated programming assignment assessment systems (APAAS). Nane Kratzke
  2. Agenda: Introduction; Methodology; Analysis; Discussion, Counter-Measures; Limitations, Conclusion. Presentation on SpeakerDeck; preprint on ResearchGate. Presented at CSEDU 2019, Heraklion, Crete, Greece (2–4 May 2019).
  3. Introduction • We are at a transition point between the industrialisation age and the digitisation age. • Computer science skills are a vital asset in this context, and one of these basic skills is practical programming. • The sizes of university and college programming courses are steadily increasing. • Even MOOCs are used more frequently to convey necessary programming capabilities to students of different disciplines. • The coursework is composed of assignments that are highly suited to automatic assessment. • However, it is very often underestimated how astonishingly easy it is to trick these systems! The question arises: do “robots” certify the expertise to program, or the expertise to cheat?
  4. Introduction: A small example to get your attention... (VPL == Virtual Programming Lab) • Count the occurrence of a character c in a String s. • Develop a method countChar(). How do you get full points in Moodle/VPL? The same trick works for every assignment!
  5. Introduction: APAAS == Code Injection System • APAAS solutions are systems that execute injected code (student submissions). • Code injection is known to be a severe threat from a security point of view. • APAAS solutions protect the host system via sandbox mechanisms. • Much effort is invested in sophisticated code plagiarism detection and authorship control of student submissions. • But it was astonishing to see that APAAS solutions like VPL overlook the cheating cleverness of students. • The grading component can be cheated very straightforwardly. • Unattended automated programming examinations must be rated as suspect.
  7. Methodology • Two first-semester Java programming courses in the winter semester 2018/19: • a regular computer science study programme (CS) • an information technology and design focused study programme (ITD) • In both courses we searched for student submissions that intentionally trick the grading component. • APAAS: Moodle/VPL (Version 3.3.3) • To minimise Hawthorne and experimenter effects, neither the students nor the advisers were aware that they were part of this study. • Even if cheating was detected, this had no consequences for the students; it was not even communicated. • Students were unaware that the version history of their submissions was logged and analysed.
  8. Methodology: Searching for cheats (automated sample selection, manual sample analysis) • VPL submissions were downloaded from Moodle. • Python/Jupyter-based sample selection: • S1: triggered evaluations • S2: maximum versions • S3: low average high end • S4: condition-related terms • S5: unusual terms (System.exit, ...) • S6: random submissions • NumPy, matplotlib, statistics, and Javaparser libraries • Selected samples were exported weekly into archived PDF documents (for manual analysis).
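Selection criterion S5 can be illustrated with a simple keyword filter over the downloaded sources. The study ran its selection in Python/Jupyter. The Java sketch below only illustrates the idea and is not the study's code. The class and method names, the submission IDs, and the term list are assumptions. The list contains only `System.exit`, because the slide elides the remaining terms.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class UnusualTermFilter {
    // Illustrative term list: only System.exit is named on the slide, the rest is elided.
    static final List<String> UNUSUAL_TERMS = List.of("System.exit");

    // Returns the IDs of all submissions whose source text contains an unusual term.
    static List<String> flagged(Map<String, String> sourcesById) {
        return sourcesById.entrySet().stream()
                .filter(e -> UNUSUAL_TERMS.stream().anyMatch(e.getValue()::contains))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical submissions, keyed by invented submission IDs.
        Map<String, String> sources = new LinkedHashMap<>();
        sources.put("s1", "int n = 0; for (char x : s.toCharArray()) if (x == c) n++; return n;");
        sources.put("s2", "System.out.println(42); System.exit(0); return 0;");
        System.out.println(flagged(sources)); // prints [s2]
    }
}
```

A plain substring match is deliberately crude. It only pre-selects candidates for the manual analysis and does not decide whether a submission actually cheats.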
  9. Methodology: Analysis of submissions. Manual annotation of each exported submission (figure: task description; result, workload, working phases, student identifier).
  11. Analysis: Observed cheat-pattern frequency (figure).
  12. Analysis: Continuous example assignment. Count the occurrence of a character c in a String s (not case-sensitive). We searched for solutions that differed significantly from the intended (reference) solution, which is used to check for correctness.
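Below is a minimal sketch of what such a reference solution might look like. It assumes a static method `countChar(char c, String s)`. The slide's actual code and signature are not reproduced here, and the class name is hypothetical.

```java
class ReferenceSolution {
    // Counts how often c occurs in s, ignoring case.
    // The signature countChar(char, String) is an assumption, not the course's original.
    static int countChar(char c, String s) {
        char target = Character.toLowerCase(c);
        int count = 0;
        for (char x : s.toCharArray()) {
            if (Character.toLowerCase(x) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countChar('a', "Abracadabra")); // prints 5
        System.out.println(countChar('B', "Abracadabra")); // prints 2
    }
}
```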
  13. Analysis: Cheat pattern (1), Overfitting (63%) • Get the maximum number of points without solving the given problem in a general way. • The solution is completely useless outside the scope of the test cases. • Input parameters are simply mapped to the expected output values.
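Here is a minimal sketch of the overfitting pattern for the `countChar` assignment. The test inputs (`"Abracadabra"`, `"Hello World"`) and the class name are hypothetical. The method returns the right values for the known test cases and meaningless values everywhere else.

```java
class OverfittedCountChar {
    // Overfitting: inputs of known (here hypothetical) test cases are mapped
    // directly to their expected outputs; nothing is actually counted.
    static int countChar(char c, String s) {
        if (c == 'a' && s.equals("Abracadabra")) return 5;
        if (c == 'l' && s.equals("Hello World")) return 3;
        if (c == 'x' && s.equals("Hello World")) return 0;
        return 0; // any other input gets a meaningless answer
    }

    public static void main(String[] args) {
        System.out.println(countChar('a', "Abracadabra")); // prints 5 (test passes)
        System.out.println(countChar('a', "banana"));      // prints 0, although the correct answer is 3
    }
}
```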
  14. Analysis: Cheat pattern (2), Problem Evasion (30%). Example assignment: count the occurrence of a character c in a String s recursively. The evasion solution pretends to be recursive, but it is merely a redirection to an overloaded method that uses loops (non-recursive). (Figure: intended solution vs. evasion solution.)
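The evasion pattern can be sketched as follows, assuming the same hypothetical `countChar(char, String)` signature and a hypothetical class name. Both variants return identical results. That is why tests that only compare outputs cannot distinguish the evasion from the intended recursive solution.

```java
class EvasionExample {
    // Intended solution: genuinely recursive (check the head, recurse on the rest).
    static int countCharRecursive(char c, String s) {
        if (s.isEmpty()) {
            return 0;
        }
        int head = Character.toLowerCase(s.charAt(0)) == Character.toLowerCase(c) ? 1 : 0;
        return head + countCharRecursive(c, s.substring(1));
    }

    // Evasion: the required method only delegates to an overload ...
    static int countChar(char c, String s) {
        return countChar(c, s, 0);
    }

    // ... which looks like a recursive helper but simply loops.
    static int countChar(char c, String s, int from) {
        int count = 0;
        for (int i = from; i < s.length(); i++) {
            if (Character.toLowerCase(s.charAt(i)) == Character.toLowerCase(c)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Same output, so output-based tests cannot tell the two apart.
        System.out.println(countCharRecursive('a', "Abracadabra")); // prints 5
        System.out.println(countChar('a', "Abracadabra"));          // prints 5
    }
}
```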
  15. Analysis: Cheat pattern (3), Redirection (6%). (1) A small spelling error results in compiler messages indicating that a specific method is expected by the test logic. (2) Compiler error messages can reveal the reference solution. (3) A clever student might now simply redirect the submission to the reference method, letting the grader evaluate itself. (Figure: redirecting solution.)
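Below is a sketch of the redirection pattern. The `Evaluator` class and its `referenceCountChar` method are hypothetical. They stand in for whatever grader class and reference method the compiler messages would reveal in a real setting. The structure itself follows the slide: a submission with no logic of its own that delegates to the reference.

```java
class RedirectionExample {
    // Hypothetical stand-in for the grader's class; in practice its name would
    // be learned from compiler error messages.
    static class Evaluator {
        static int referenceCountChar(char c, String s) {
            int n = 0;
            for (char x : s.toCharArray()) {
                if (Character.toLowerCase(x) == Character.toLowerCase(c)) n++;
            }
            return n;
        }

        // Grading check: compare the submission against the reference.
        static boolean check(char c, String s) {
            return Submission.countChar(c, s) == referenceCountChar(c, s);
        }
    }

    // Redirecting submission: it simply calls the grader's reference method.
    static class Submission {
        static int countChar(char c, String s) {
            return Evaluator.referenceCountChar(c, s);
        }
    }

    public static void main(String[] args) {
        // Every check passes because the grader effectively evaluates itself.
        System.out.println(Evaluator.check('a', "Abracadabra")); // prints true
        System.out.println(Evaluator.check('z', "Fox"));         // prints true
    }
}
```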
  16. Analysis: Cheat pattern (4), Injection (2%). Simply print the points you want, in an APAAS-specific format, to standard out. • Change the intended workflow of the evaluation logic. • Use the standard out stream to place text that is evaluated by the APAAS system. • The evaluator calls the code under evaluation. • The submission code can print to standard out and then terminate further evaluation calls. • The evaluator parses the content of standard out and gives full points! (Figure: some strings with a specific meaning for VPL.)
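The injection mechanism can be simulated inside a single JVM. This sketch assumes VPL's `Grade :=>>` line prefix for reporting a grade. The regular expression is a simplified grader, not VPL's actual parser. A real cheat would end the process, e.g. with `System.exit`, after printing. Here a hypothetical `SimulatedExit` exception stands in for that call, so the example can finish and be inspected.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InjectionDemo {
    // Hypothetical stand-in for System.exit(0), so the demo keeps running in one JVM.
    static class SimulatedExit extends RuntimeException {
    }

    // Cheating submission: prints a grade line instead of counting anything.
    static int countChar(char c, String s) {
        System.out.println("Grade :=>> 100"); // text the grader will parse
        throw new SimulatedExit();           // a real cheat would call System.exit(0) here
    }

    // Simplified evaluator that shares standard out with the submission.
    static String runEvaluation() {
        PrintStream realOut = System.out;
        ByteArrayOutputStream sharedOut = new ByteArrayOutputStream();
        System.setOut(new PrintStream(sharedOut, true));
        try {
            boolean ok = countChar('a', "Abracadabra") == 5;
            System.out.println("Grade :=>> " + (ok ? 100 : 0)); // never reached
        } catch (SimulatedExit e) {
            // In reality the process would already be gone; the evaluator prints nothing.
        } finally {
            System.setOut(realOut);
        }
        return sharedOut.toString();
    }

    // Grader side: take the first "Grade :=>>" line from the captured output.
    static double parseGrade(String output) {
        Matcher m = Pattern.compile("^Grade :=>>\\s*([0-9.]+)\\s*$", Pattern.MULTILINE).matcher(output);
        return m.find() ? Double.parseDouble(m.group(1)) : 0.0;
    }

    public static void main(String[] args) {
        System.out.println(parseGrade(runEvaluation())); // prints 100.0
    }
}
```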
  18. Discussion: Counter-measures • Overfitting: randomize test cases • Problem evasion: AST-based code inspection • Redirection: AST-based code inspection • Injection: separate standard out streams for evaluation and submission logic. A more detailed discussion can be found in the paper.
  19. Discussion: JEdUnit (https://github.com/nkratzke/JEdUnit). JEdUnit is a unit testing framework with a special focus on educational aspects. It strives to simplify automatic evaluation of (small) Java programming assignments using Moodle/VPL. It is used and developed for programming classes at the Lübeck University of Applied Sciences. Since this framework might be helpful for other programming instructors as well, it has been open-sourced.
  20. Discussion: Randomize test cases. (Figure: "Don't do that" vs. "Do that".) The JEdUnit DSL can express randomized test values, e.g. by applying regular expressions inversely to generate random strings.
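Here is a minimal sketch of randomized testing in plain Java, not the JEdUnit DSL. Random inputs are drawn from a small fixed alphabet, a much simpler stand-in for inverse regular expressions. Each result is compared against a reference. All names are hypothetical. An overfitted submission that only knows one fixed input fails most of the random cases.

```java
import java.util.Random;

class RandomizedTestSketch {
    interface Solution {
        int countChar(char c, String s);
    }

    // Reference: case-insensitive character count.
    static int reference(char c, String s) {
        int n = 0;
        for (char x : s.toCharArray()) {
            if (Character.toLowerCase(x) == Character.toLowerCase(c)) n++;
        }
        return n;
    }

    // Overfitted submission: only knows one fixed test input.
    static int overfitted(char c, String s) {
        return c == 'a' && s.equals("Abracadabra") ? 5 : 0;
    }

    // Random string over a fixed alphabet (far simpler than inverse regular expressions).
    static String randomString(Random rnd, String alphabet, int maxLen) {
        int len = rnd.nextInt(maxLen + 1);
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append(alphabet.charAt(rnd.nextInt(alphabet.length())));
        }
        return sb.toString();
    }

    // Fraction of randomized test cases on which a solution agrees with the reference.
    static double passRate(Solution candidate, long seed, int cases) {
        Random rnd = new Random(seed); // seed is a parameter, so each run can differ
        String alphabet = "abcABC xyz";
        int passed = 0;
        for (int i = 0; i < cases; i++) {
            char c = alphabet.charAt(rnd.nextInt(alphabet.length()));
            String s = randomString(rnd, alphabet, 20);
            if (candidate.countChar(c, s) == reference(c, s)) passed++;
        }
        return (double) passed / cases;
    }

    public static void main(String[] args) {
        System.out.println(passRate(RandomizedTestSketch::reference, 42L, 100));  // prints 1.0
        System.out.println(passRate(RandomizedTestSketch::overfitted, 42L, 100)); // well below 1.0
    }
}
```

Because the seed is a parameter, each evaluation run can use a different one. Students then cannot learn the test inputs from earlier runs.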
  21. Discussion: AST-based code inspections. E.g. do not allow students to bypass recursion: inspect the code for loops and penalize their presence. The JEdUnit DSL can express selectors on abstract syntax trees (AST) to check for the presence or absence of language constructs. JEdUnit's selector model works similarly to how CSS selectors work on DOM trees.
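The loop check can be sketched with the JDK's own Compiler Tree API (`com.sun.source.util.JavacTask` and `TreeScanner`), which requires a full JDK rather than a JRE. This technique is swapped in for illustration. It is neither JEdUnit's selector DSL nor the Javaparser library used in the study. The two embedded sources are hypothetical evasion and recursive submissions.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.DoWhileLoopTree;
import com.sun.source.tree.EnhancedForLoopTree;
import com.sun.source.tree.ForLoopTree;
import com.sun.source.tree.WhileLoopTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class LoopInspection {
    // Hypothetical submissions: an evasion (hidden loop) and a genuinely recursive one.
    static final String EVASION =
            "class S { static int countChar(char c, String s) { return countChar(c, s, 0); }"
            + " static int countChar(char c, String s, int i) {"
            + " int n = 0; for (; i < s.length(); i++) if (s.charAt(i) == c) n++; return n; } }";
    static final String RECURSIVE =
            "class S { static int countChar(char c, String s) { if (s.isEmpty()) return 0;"
            + " return (s.charAt(0) == c ? 1 : 0) + countChar(c, s.substring(1)); } }";

    // Parses the source (no type checking) and counts for, for-each, while and do-while nodes.
    static int countLoops(String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null without a JDK
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///S.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null, List.of(file));
        int[] loops = {0};
        TreeScanner<Void, Void> scanner = new TreeScanner<>() {
            @Override
            public Void visitForLoop(ForLoopTree node, Void p) {
                loops[0]++;
                return super.visitForLoop(node, p);
            }

            @Override
            public Void visitEnhancedForLoop(EnhancedForLoopTree node, Void p) {
                loops[0]++;
                return super.visitEnhancedForLoop(node, p);
            }

            @Override
            public Void visitWhileLoop(WhileLoopTree node, Void p) {
                loops[0]++;
                return super.visitWhileLoop(node, p);
            }

            @Override
            public Void visitDoWhileLoop(DoWhileLoopTree node, Void p) {
                loops[0]++;
                return super.visitDoWhileLoop(node, p);
            }
        };
        try {
            for (CompilationUnitTree unit : task.parse()) {
                scanner.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return loops[0];
    }

    public static void main(String[] args) {
        // A non-zero count in a "recursive" assignment could be penalized.
        System.out.println(countLoops(EVASION));   // prints 1
        System.out.println(countLoops(RECURSIVE)); // prints 0
    }
}
```

A grader could deduct points whenever `countLoops` returns a non-zero value for an assignment that demands recursion. The same scanner approach could be extended to other constructs, e.g. method invocations on grader classes, for the redirection pattern.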
  22. Discussion: Isolation of submission and evaluation logic. JEdUnit approach: the submission logic gets an isolated fake console. VPL approach: the submission shares stdout with the evaluation process.
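Below is a minimal sketch of the isolation idea using `System.setOut`. The submission call runs with a private in-memory console, and only the evaluator writes to the real standard out. The names are hypothetical, and this is not JEdUnit's implementation. It isolates only the console. Preventing a submission from terminating the process (e.g. via `System.exit`) needs separate measures.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.function.Supplier;

class IsolatedConsole {
    // Runs one submission call with System.out redirected to a private buffer,
    // so anything the submission prints never reaches the grader's console.
    static <T> T runIsolated(Supplier<T> submissionCall, StringBuilder capturedOut) {
        PrintStream realOut = System.out;
        ByteArrayOutputStream fakeConsole = new ByteArrayOutputStream();
        System.setOut(new PrintStream(fakeConsole, true));
        try {
            return submissionCall.get();
        } finally {
            System.setOut(realOut);                    // restore before the evaluator prints
            capturedOut.append(fakeConsole.toString()); // keep the output for inspection
        }
    }

    // Hypothetical cheating submission following the injection pattern.
    static int cheatingCountChar(char c, String s) {
        System.out.println("Grade :=>> 100");
        return 0;
    }

    public static void main(String[] args) {
        StringBuilder captured = new StringBuilder();
        int result = runIsolated(() -> cheatingCountChar('a', "Abracadabra"), captured);
        // Only the evaluator writes to the real console, based on the returned value.
        System.out.println("Grade :=>> " + (result == 5 ? 100 : 0)); // prints Grade :=>> 0
    }
}
```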
  23. Discussion: Further features of JEdUnit (https://github.com/nkratzke/JEdUnit) • Weighting of test cases (by annotations) • Checkstyle integration (weighted rules) • DSL • to formulate test cases in a check, explain, onError pattern • to randomize test cases • to write arbitrary code inspections based on a selector model • Predefined code inspections (switch on/off): proper collection usage, loops, lambdas, inner classes, data fields, console output, etc. • Automated class structure comparison (OO use cases: compare the structural equality of a multi-class submission with a multi-class reference solution).
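Weighting test cases by annotations can be sketched with a custom runtime annotation and reflection. The `@Weight` annotation and the check methods are hypothetical and do not reproduce JEdUnit's API. The submission under test is deliberately case-sensitive. It passes the light check, fails the heavily weighted one, and so earns 25 of 100 points.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

class WeightedGrading {
    // Hypothetical annotation; not JEdUnit's actual API.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Weight {
        double value();
    }

    // Submission under test: case-sensitive, so it misses the "not case-sensitive" requirement.
    static int countChar(char c, String s) {
        int n = 0;
        for (char x : s.toCharArray()) {
            if (x == c) n++;
        }
        return n;
    }

    // Hypothetical test cases; each returns true if its check passes.
    static class Checks {
        @Weight(1.0)
        public boolean lowerCaseOnly() {
            return countChar('b', "Abracadabra") == 2;
        }

        @Weight(3.0)
        public boolean ignoresCase() {
            return countChar('a', "Abracadabra") == 5;
        }
    }

    // Score in percent: weight of passed checks divided by total weight.
    static double score(Class<?> checks) {
        try {
            Object instance = checks.getDeclaredConstructor().newInstance();
            double total = 0;
            double passed = 0;
            for (Method m : checks.getDeclaredMethods()) {
                Weight w = m.getAnnotation(Weight.class);
                if (w == null) continue;
                total += w.value();
                if ((Boolean) m.invoke(instance)) passed += w.value();
            }
            return total == 0 ? 0 : 100.0 * passed / total;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(score(Checks.class)); // prints 25.0
    }
}
```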
  25. Limitations: Threats to validity. We searched qualitatively, not quantitatively, for cheat-patterns: • Do not draw any conclusions about which kinds of cheat-pattern occur at which level of programming expertise. • Do not draw any conclusions on the quantitative aspects of cheating. • The study does not claim to have identified all kinds of cheat-patterns. The study does not claim that all APAAS solutions have the same set of vulnerabilities: • Do not generalize Moodle/VPL-specific problems. • However, the Overfitting, Problem Evasion, Redirection, and Injection patterns can be used to check for vulnerabilities in other APAAS solutions.
  26. Conclusion • We have to be aware that (even first-year) students are clever enough to trick automated grading solutions. • Cheat patterns: • Overfitting • Problem Evasion • Redirection • Injection • Options we are currently investigating: • Randomise test cases • Pragmatic code inspection • Isolation of submission and evaluation logic • Exactly these features seem to be provided only incompletely by current APAAS systems. JEdUnit: https://github.com/nkratzke/JEdUnit
  27. Acknowledgement • Advisers of the practical courses: David Engelhardt, Thomas Hamer, Clemens Stauner, Volker Völz, Patrick Willnow • Student tutors: Franz Bretterbauer, Francisco Cardoso, Jannik Gramann, Till Hahn, Thorleif Harder, Jan Steffen Krohn, Diana Meier, Jana Schwieger, Jake Stradling, and Janos Vinz • Picture references: Hacker: Pixabay.com (CC0); Robot: Pixabay.com (CC0)
  28. About: Nane Kratzke • Web: http://nane.kratzke.pages.mylab.th-luebeck.de/about • Twitter: @NaneKratzke • LinkedIn: https://de.linkedin.com/in/nanekratzke • GitHub: https://github.com/nkratzke • ResearchGate: https://www.researchgate.net/profile/Nane_Kratzke • SlideShare: http://de.slideshare.net/i21aneka
