PyParis 2017 / Camisole: A secure online sandbox to grade students - Antoine Pietri

PyParis 2017
http://pyparis.org

  1. Camisole: A secure online sandbox to grade students
  2. Context: Prologin
      ● French national programming contest for students under 20
      ● Online qualification with algorithmic exercises
      ● Thousands of applications every year
      ● C, C++, C#, Python, Haskell, OCaml, Java, PHP, …
      https://prologin.org
  3. Problem: secure untrusted code evaluation
      (We are lazy and we want to grade our students without looking at their code.)
      Aimed at: teachers, programming contests, learning websites.
      Design goals:
      ● Simple enough to be used by everyone (teachers, developers, tinkerers…)
      ● Fast and precise (overhead matters in programming contests)
      ● Secure enough to be used on public websites (or with malicious students)
      ● Abstract the languages in a modular way
  4. HTTP/JSON interface

          $ curl camisole/run -d '{"lang": "python", "source": "print(42)"}'
          {
            "success": true,
            "tests": [
              {
                "exitcode": 0,
                "meta": { ... },
                "name": "test000",
                "stderr": "",
                "stdout": "42\n"
              }
            ]
          }
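Because the interface is plain HTTP and JSON, a client needs nothing beyond the standard library. The sketch below rebuilds the curl call in Python; the URL `http://localhost:8080/run` and the helper names `build_run_request` and `run` are assumptions for illustration (the slide only writes `camisole/run`), so point them at wherever your Camisole server actually listens.

```python
import json
import urllib.request

# Hypothetical address: the slide only shows "camisole/run".
CAMISOLE_URL = "http://localhost:8080/run"


def build_run_request(lang, source, url=CAMISOLE_URL, **extra):
    """Build a POST request whose JSON body mirrors the curl example.

    Extra keyword arguments (e.g. tests=..., execute=...) are copied
    into the JSON body unchanged.
    """
    payload = {"lang": lang, "source": source, **extra}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run(lang, source, url=CAMISOLE_URL, timeout=30, **extra):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_run_request(lang, source, url, **extra)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server reachable at that address, `run("python", "print(42)")["tests"][0]["stdout"]` should hold the program output shown in the response above.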
  5. Limits and quotas

          {
            "lang": "ocaml",
            "source": "print_string \"Hello, world!\\n\"",
            "compile": { "wall-time": 10 },
            "execute": { "time": 2, "wall-time": 5, "processes": 1, "mem": 100000 }
          }

      ● User time
      ● Wall time
      ● Memory
      ● Stack size
      ● Number of processes/threads
      ● Size of files created
      ● Filesystem blocks
      ● Filesystem inodes
      ● … possibly more?
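Limits are plain nested objects in the request, so a client can attach them to any payload. A minimal sketch with a hypothetical helper `with_limits` that merges `compile` and `execute` sections into a copy of a request; it uses only the keys that appear on this slide (`time`, `wall-time`, `processes`, `mem`) and makes no claim about the full set Camisole accepts.

```python
import copy


def with_limits(payload, compile_limits=None, execute_limits=None):
    """Return a copy of payload with "compile"/"execute" limits merged in.

    The input dict is never modified; new values override existing ones.
    """
    result = copy.deepcopy(payload)
    for section, limits in (("compile", compile_limits),
                            ("execute", execute_limits)):
        if limits:
            result.setdefault(section, {}).update(limits)
    return result


# The OCaml request from the slide, rebuilt with the helper.
ocaml_request = with_limits(
    {"lang": "ocaml", "source": 'print_string "Hello, world!\\n"'},
    compile_limits={"wall-time": 10},
    # Values copied from the slide as-is.
    execute_limits={"time": 2, "wall-time": 5, "processes": 1, "mem": 100000},
)
```

Copying first means a shared base request (language and source) can be reused with different limit profiles without one grading run leaking settings into the next.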
  6. Test suite
      Statement: “Write a program that outputs twice its input.”

          {
            "lang": "python",
            "source": "print(int(input()) * 2)",
            "tests": [{"name": "test_h2g2", "stdin": "42"},
                      {"name": "test_notfound", "stdin": "404"},
                      {"name": "test_leet", "stdin": "1337"},
                      {"name": "test_666", "stdin": "27972"}]
          }

          {
            "success": true,
            "tests": [
              {"exitcode": 0, "meta": { ... }, "name": "test_h2g2",
               "stderr": "", "stdout": "84\n"},
              {"exitcode": 0, "meta": { ... }, "name": "test_notfound",
               "stderr": "", "stdout": "808\n"},
              {"exitcode": 0, "meta": { ... }, "name": "test_leet",
               "stderr": "", "stdout": "2674\n"},
              {"exitcode": 0, "meta": { ... }, "name": "test_666",
               "stderr": "", "stdout": "55944\n"}
            ]
          }
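Camisole returns what each test printed; deciding whether that is correct is left to the caller. A minimal grading sketch, assuming the expected answers are kept client-side in a dict keyed by test name (`grade` and `expected` are hypothetical names, not part of Camisole):

```python
def grade(response, expected):
    """Map each test name to True if it exited cleanly with the expected output.

    Trailing whitespace is ignored, so "84\n" matches "84".
    """
    results = {}
    for test in response.get("tests", []):
        name = test["name"]
        want = expected.get(name)
        results[name] = (
            want is not None
            and test.get("exitcode") == 0
            and test.get("stdout", "").rstrip() == want.rstrip()
        )
    return results


# Expected answers for "output twice the input", one per test name.
expected = {
    "test_h2g2": "84",
    "test_notfound": "808",
    "test_leet": "2674",
    "test_666": "55944",
}
```

Keeping the expected outputs out of the request means the sandboxed program never sees them, and `sum(grade(response, expected).values())` gives a simple score.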
  7. Metadata

          {
            "success": true,
            "tests": [
              {
                "exitcode": 0,
                "meta": {
                  "cg-mem": 2408,
                  "csw-forced": 9,
                  "csw-voluntary": 2,
                  "exitcode": 0,
                  "exitsig": null,
                  "killed": false,
                  "max-rss": 6628,
                  "message": null,
                  "status": "OK",
                  "time": 0.009,
                  "time-wall": 0.028
                },
                "name": "test000",
                "stderr": "",
                "stdout": "42\n"
              }
            ]
          }

      ● Time
      ● Wall time
      ● Memory of the cgroup
      ● Context switches
      ● Exit code
      ● Signal received
      ● Killed or exited successfully
      ● Max resident set size
      ● … possibly more?
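The `meta` block is enough to tell a clean exit apart from a non-zero exit code, a signal, or a program stopped by the sandbox. The classification below is a hypothetical client-side convention (`verdict` and `summarize` are illustrative names) that reads only fields shown on this slide; status values other than `"OK"` are not listed here, so `status` and `message` are passed through unchanged.

```python
def verdict(meta):
    """Summarize one test's "meta" block as a short human-readable string."""
    if meta.get("killed"):
        # Stopped by the sandbox; "status"/"message" presumably carry the reason.
        return f"killed ({meta.get('status')}: {meta.get('message')})"
    if meta.get("exitsig") is not None:
        return f"terminated by signal {meta['exitsig']}"
    if meta.get("exitcode") != 0:
        return f"exit code {meta.get('exitcode')}"
    return f"ok (time {meta.get('time')}, wall time {meta.get('time-wall')})"


def summarize(response):
    """Map each test name in a Camisole-style response to its verdict."""
    return {test["name"]: verdict(test["meta"]) for test in response["tests"]}
```

Checking `killed` first matters: a program killed for exceeding a limit may also report a signal, and the sandbox's own reason is usually the more useful one to show a student.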
  8. Front-end integration: programming contest
  9. Front-end integration: online course*
      (* not actually using camisole, but could… :-))
  10. Architecture (diagram): User application → HTTP/JSON API → Camisole → Isolation backend → Sandbox → Untrusted program, with Camisole running inside a virtual machine
  11. Isolation backend
      Solutions considered that don’t really work:
      ● ptrace
          ○ Overhead to monitor the system calls
          ○ Multiprocessing doesn’t work
          ○ Not multiplatform
          ○ Lots of things to handle
          ○ Runtimes can do weird things
      ● Docker
          ○ Overhead, because it is overkill
          ○ Not precise enough
  12. Isolation backend
      Backends:
      ● “Big brother” (chroot + setrlimit + memory watchdog + outside firewall)
          ○ Previous in-house solution
          ○ Isolation is very sloppy
      ● Isolate (https://github.com/ioi/isolate)
          ○ Resource limits using cgroups
          ○ Isolation with namespaces
          ○ Lightweight FS isolation (chroot + mount --bind)
      ● Nsjail? (http://nsjail.com/)
          ○ Could be implemented as an alternate backend
          ○ You know how every time you do something, Google comes and does it 10x better?
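To make the contrast concrete, this is roughly what the setrlimit half of a “Big brother”-style setup amounts to: a minimal, Unix-only sketch using Python's `resource` and `subprocess` modules, with arbitrary example limits and hypothetical helper names (`run_limited`, `_lower_limit`). It has no namespaces, no cgroups and no filesystem isolation, which is why this kind of approach is called sloppy next to Isolate.

```python
import resource
import subprocess
import sys


def _lower_limit(kind, value):
    """Set soft and hard limits to value, clamped to the current hard limit."""
    _soft, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def run_limited(cmd, stdin_data="", cpu_seconds=2, mem_bytes=512 * 2**20,
                max_file_bytes=2**20, wall_seconds=5):
    """Run cmd with rlimits applied in the child process (Unix only)."""

    def apply_limits():
        # Runs in the forked child just before exec.
        _lower_limit(resource.RLIMIT_CPU, cpu_seconds)       # CPU time, seconds
        _lower_limit(resource.RLIMIT_FSIZE, max_file_bytes)  # size of files written
        _lower_limit(resource.RLIMIT_AS, mem_bytes)          # address space, set last

    # timeout= acts as the wall-time limit: subprocess.run kills the child
    # and raises subprocess.TimeoutExpired once it is exceeded.
    return subprocess.run(cmd, input=stdin_data, capture_output=True,
                          text=True, preexec_fn=apply_limits,
                          timeout=wall_seconds)


if __name__ == "__main__":
    result = run_limited([sys.executable, "-c", "print(int(input()) * 2)"], "21\n")
    print(result.returncode, repr(result.stdout))
```

Nothing here stops the child from reading files, opening sockets or forking freely, and `preexec_fn` is not safe in threaded programs; those gaps are what namespaces and cgroups in Isolate cover.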
  13. Language module system
      Python 3.6 __init_subclass__ in action!

          from camisole.models import Lang, Program

          class Python(Lang, name='Python'):
              source_ext = '.py'
              interpreter = Program('python3')
              reference_source = r'print(42)'

      Load arbitrary language modules with:

          $ export CAMISOLEPATH=~/mylangs
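`__init_subclass__` (Python 3.6, PEP 487) lets a base class react to every subclass as it is defined, including keyword arguments in the class statement such as `name='Python'`. The sketch below shows how a self-registering language base class could work; `LangBase` and `registry` are hypothetical names, not Camisole's actual implementation.

```python
class LangBase:
    """Hypothetical base class that registers every subclass it sees."""

    registry = {}

    def __init_subclass__(cls, name=None, **kwargs):
        # Called once per subclass, right after its class body has run.
        super().__init_subclass__(**kwargs)
        cls.name = name or cls.__name__
        # Key by lowercase name, matching "lang": "python" in requests.
        LangBase.registry[cls.name.lower()] = cls


class Python(LangBase, name='Python'):
    source_ext = '.py'


class Ocaml(LangBase, name='OCaml'):
    source_ext = '.ml'


class Java(LangBase):  # no name= given: falls back to the class name
    source_ext = '.java'
```

Merely importing a module that defines a subclass is enough to register it, which is what makes loading extra language modules from a directory such as `CAMISOLEPATH` so cheap: no central list has to be edited.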
  14. (Simple, except for Java.)

          import re
          import subprocess
          from pathlib import Path

          from camisole.models import Lang, Program

          RE_WRONG_FILENAME_ERROR = re.compile(r'...,')  # full pattern elided; group(1) is the public class name
          PSVMAIN_SIGNATURE = 'public static void main('
          PSVMAIN_DESCRIPTOR = 'descriptor: ([Ljava/lang/String;)V'


          class Java(Lang):
              source_ext = '.java'
              compiled_ext = '.class'
              # ensure we can parse the javac(1) stderr
              compiler = Program('javac', env={'LANG': 'C'}, version_opt='-version')
              interpreter = Program('java', version_opt='-version')
              # /usr/lib/jvm/java-8-openjdk/jre/lib/amd64/jvm.cfg links to
              # /etc/java-8-openjdk/amd64/jvm.cfg
              allowed_dirs = ['/etc/java-8-openjdk']
              extra_binaries = {'disassembler': Program('javap', version_opt='-version')}
              reference_source = r'''
          class SomeClass {
              static int fortytwo() {
                  return 42;
              }
              static class Subclass {
                  // nested psvmain! wow!
                  public static void main(String args[]) {
                      System.out.println(SomeClass.fortytwo());
                  }
              }
          }
          '''

              def __init__(self, *args, **kwargs):
                  super().__init__(*args, **kwargs)
                  # use an illegal class name so that javac(1) will spit out the
                  # actual class name used in the source
                  self.class_name = '1337'
                  # we give priority to the public class, if any, so keep a flag
                  # if we found such a public class
                  self.found_public = False
                  try:
                      self.heapsize = self.opts['execute'].pop('mem')
                  except KeyError:
                      self.heapsize = None

              def compile_opt_out(self, output):
                  # javac has no output directive, file name is class name
                  return []

              async def compile(self):
                  # first try with the placeholder class name ('1337')
                  retcode, info, binary = await super().compile()
                  if retcode != 0:
                      # error: public class name is not '1337' -- obviously, it's
                      # illegal, so find what it actually is
                      match = RE_WRONG_FILENAME_ERROR.search(info['stderr'])
                      if match:
                          self.found_public = True
                          self.class_name = match.group(1)
                          # retry with new name
                          retcode, info, binary = await super().compile()
                  return (retcode, info, binary)

              def source_filename(self):
                  return self.class_name + self.source_ext

              def execute_filename(self):
                  # return e.g. Main.class
                  return self.class_name + self.compiled_ext

              def execute_command(self, output):
                  cmd = [self.interpreter.cmd]
                  # use the memory limit as a maximum heap size
                  if self.heapsize is not None:
                      cmd.append(f'-Xmx{self.heapsize}k')
                  # foo/Bar.class is run with $ java -cp foo Bar
                  cmd += ['-cp', str(Path(self.filter_box_prefix(output)).parent),
                          self.class_name]
                  return cmd

              def find_class_having_main(self, classes):
                  for file in classes:
                      # run javap(1) with type signatures
                      try:
                          stdout = subprocess.check_output(
                              [self.extra_binaries['disassembler'].cmd, '-s', str(file)],
                              stderr=subprocess.DEVNULL, env=self.compiler.env)
                      except subprocess.SubprocessError:
                          continue
                      # iterate on lines to find p s v main() signature and then
                      # its descriptor on the line below; we don't rely on the type
                      # from the signature, because it could be String[], String...
                      # or some other syntax I'm not even aware of
                      lines = iter(stdout.decode().split('\n'))
                      for line in lines:
                          if line.lstrip().startswith(PSVMAIN_SIGNATURE):
                              if next(lines).lstrip() == PSVMAIN_DESCRIPTOR:
                                  return file.stem

              def read_compiled(self, path, isolator):
                  # in case of multiple or nested classes, multiple .class files
                  # are generated by javac
                  classes = list(isolator.path.glob('*.class'))
                  files = [(file.name, file.open('rb').read()) for file in classes]
                  if not self.found_public:
                      # the main() may be anywhere, so run javap(1) on all .class
                      new_class_name = self.find_class_having_main(classes)
                      if new_class_name:
                          self.class_name = new_class_name
                  return files

              def write_binary(self, path, binary):
                  # see read_compiled(), we need to write back all .class files
                  # but give only the main class name (execute_filename()) to java(1)
                  for file, data in binary:
                      with (path / file).open('wb') as c:
                          c.write(data)
                  return path / self.execute_filename()
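The subtle part of the Java module is spotting `main` in `javap -s` output. Pulled out into a standalone, hypothetical helper (`has_main` is not part of Camisole), the same signature-then-descriptor check can be exercised without a JDK. The sample text is written to imitate `javap -s`, which prints a `descriptor:` line under each member; treat it as illustrative rather than captured output.

```python
PSVMAIN_SIGNATURE = 'public static void main('
PSVMAIN_DESCRIPTOR = 'descriptor: ([Ljava/lang/String;)V'


def has_main(javap_output):
    """Return True if javap -s output declares public static void main(String[])."""
    lines = iter(javap_output.splitlines())
    for line in lines:
        if line.lstrip().startswith(PSVMAIN_SIGNATURE):
            # The descriptor sits on the line right after the signature;
            # next(lines, '') avoids StopIteration on truncated output.
            if next(lines, '').lstrip() == PSVMAIN_DESCRIPTOR:
                return True
    return False


# Imitation of `javap -s` output for the nested class in reference_source.
SAMPLE = """\
Compiled from "SomeClass.java"
class SomeClass$Subclass {
  SomeClass$Subclass();
    descriptor: ()V

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
}
"""
```

Matching on the JVM descriptor rather than the source-level signature is the design choice the slide's comment explains: `String[] args`, `String args[]` and `String... args` all compile to the same `([Ljava/lang/String;)V`.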
  15. Low-level API
      When simple single-file evaluation doesn’t suit your needs:

          opts = {'time': 5, 'mem': 5000}
          isolator = Isolator(opts, allowed_dirs=['/home'])
          async with isolator:
              await isolator.run(command, env=env, data=input())
          return (isolator.stdout, isolator.stderr)
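The `async with` shape of this API is easy to reproduce for experiments. Below is a toy, hypothetical `ToyIsolator` built on `asyncio` subprocesses that mimics the calls on the slide (`run(command, env=..., data=...)`, then `stdout`/`stderr`). It treats `time` as a wall-clock limit only and provides no isolation whatsoever; it is not Camisole's `Isolator`.

```python
import asyncio
import sys


class ToyIsolator:
    """Mimics the shape of the slide's Isolator API. Provides NO sandboxing."""

    def __init__(self, opts):
        self.opts = opts
        self.stdout = self.stderr = None
        self.exitcode = None

    async def __aenter__(self):
        # A real backend would set up the sandbox (cgroups, namespaces...) here.
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # ...and tear it down here; returning False lets exceptions propagate.
        return False

    async def run(self, command, env=None, data=None):
        proc = await asyncio.create_subprocess_exec(
            *command, env=env,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE)
        stdin_bytes = data.encode() if data is not None else None
        try:
            # 'time' is used as a wall-clock limit in seconds here.
            out, err = await asyncio.wait_for(
                proc.communicate(stdin_bytes), timeout=self.opts.get('time'))
        except asyncio.TimeoutError:
            proc.kill()
            await proc.wait()
            out, err = b'', b''
        self.stdout, self.stderr = out, err
        self.exitcode = proc.returncode


async def main():
    # Same flow as the slide, with a concrete command and input.
    isolator = ToyIsolator({'time': 5})
    async with isolator:
        await isolator.run([sys.executable, '-c', 'print(input() * 2)'],
                           data='ab\n')
    return (isolator.stdout, isolator.stderr)
```

Running `asyncio.run(main())` should yield `b'abab\n'` as stdout. Using a context manager keeps setup and cleanup of the sandbox paired even when `run()` raises.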
  16. Deployment
      We autobuild an OVA (VirtualBox export) using packer.io:
      https://camisole.prologin.org/ova/camisole-latest.ova
      Importing it into VirtualBox and running the VM just works™ and gives you an HTTP server with all the built-in languages (Ada, C, Brainfuck, C#, C++, F#, Haskell, Java, JavaScript, Lua, OCaml, Pascal, Perl, PHP, Python, Ruby, Rust, Scheme, Visual Basic). Great for non-tech-savvy people!
  17. Conclusion
      ● Elegant API for a hard problem: good abstraction!
      ● Linux isolation is awesome
      ● Python 3.5 and 3.6 features are awesome (f-strings, __init_subclass__, async…)
      Will our simplicity-centered design make the project gain traction? :-)
      Full documentation: https://camisole.prologin.org
      Contribute! https://github.com/prologin/camisole
      Contact: #prologin @ irc.freenode.net
      antoine.pietri@prologin.org
      alexandre.macabies@prologin.org
