
Werther (2011, MIT LL). Experiences in Cyber Security Education: The MIT Lincoln Laboratory Capture-the-Flag Exercise



Experiences in Cyber Security Education: The MIT Lincoln Laboratory Capture-the-Flag Exercise*

Joseph Werther, Michael Zhivich, Tim Leek (MIT Lincoln Laboratory); Nickolai Zeldovich (MIT CSAIL)
POC:

* This work is sponsored by OUSD under Air Force contract FA8721-05-C-0002 and by the DARPA CRASH program under contract N66001-10-2-4089. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

ABSTRACT

Many popular and well-established cyber security Capture the Flag (CTF) exercises are held each year in a variety of settings, including universities and semi-professional security conferences. CTF formats also vary greatly, ranging from linear puzzle-like challenges to team-based offensive and defensive free-for-all hacking competitions. While these events are exciting and important as contests of skill, they offer limited educational opportunities. In particular, since participation requires considerable a priori domain knowledge and practical computer security expertise, the majority of typical computer science students are excluded from taking part in these events. Our goal in designing and running the MIT/LL CTF was to make the experience accessible to a wider community by providing an environment that would not only test and challenge the computer security skills of the participants, but also educate and prepare those without extensive prior expertise. This paper describes our experience in designing, organizing, and running an education-focused CTF, and discusses our teaching methods, game design, scoring measures, logged data, and lessons learned.

1 INTRODUCTION

In April of 2011, MIT Lincoln Laboratory organized a CTF competition on the MIT campus to promote interest in and educate students about practical computer security. The competition was structured around defending and attacking a web application server. The target system consisted of a LAMP (Linux, Apache, MySQL, PHP) software stack and WordPress, a popular blogging platform [1]. In order to familiarize participants with the target system and to provide an opportunity to implement substantial solutions, a virtual machine very similar to the one used during the competition was made available to the participants over a month before the competition took place. To help participants prepare for the event, we offered evening lectures and labs that discussed defensive and offensive techniques and tools that might be useful during the competition.

The competition itself was an eighteen-hour event held over the weekend of April 2-3, during which students worked in teams of three to five to defend their instance of WordPress, while simultaneously attacking those of other teams. A scoring system provided numerical measures of instantaneous and cumulative security, including measures of availability, integrity, confidentiality and offense. Carefully crafted but realistic vulnerabilities were introduced into WordPress at the start of the competition via ten plug-ins authored by the Lincoln team. Since getting a high availability score required a team to run these plug-ins, participants were forced to come to terms with the very real dangers of rapidly deploying untrusted code. WordPress is famous for its extensibility, and plug-in architectures are increasingly common and popular in software engineering. A goal of the MIT/LL CTF was to explore this novel computer security issue.

The event was open to all Boston-area students, without pre-requisites or a qualification round, with main motivations including capturing an actual flag (we made a flag that the winning team took home), learning about practical computer security, and taking home a $1,500 first-place prize. Sixty-eight students registered for the event over the course of two weeks on a first-come, first-served basis. Of these, fifty-three actually formed teams, and on the day of the exercise, forty-five showed up in person at 8:30 am on a weekend to compete in the CTF.

Participants' response to the MIT/LL CTF was overwhelmingly positive. After the competition ended, we distributed a survey to ascertain the educational value of this CTF. The survey responses indicate that students learned much about practical computer security both before the competition (in lectures and labs, self-study, and group activities), and during the competition itself (where the time pressures of the competition bring into sharp focus theoretical computer security lessons). While donning the "black hat" to hunt for flaws in code and configurations is certainly fun, we assert that it is also a powerful intellectual tool for challenging assumptions and mindsets. We believe that a CTF is a valuable pedagogical tool that can be exploited to engage students in the study of complicated modern computer and network systems. Further, we believe it can be accessible to a much larger subset of computer science students than a traditional CTF.
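The flag-based game mechanics that drive these scores (described in detail in Section 5.2) are simple to state: flags are long random alphanumeric strings planted on each team's server, and a scoring bot later checks what fraction survive unaltered. The sketch below illustrates the idea in Python; the function names and data layout are our own invention, not the organizers' actual scoring bot.

```python
import random
import string

def make_flag(length=128):
    # Flags in the MIT/LL CTF were 128-character alphanumeric strings.
    alphabet = string.ascii_letters + string.digits
    return "".join(random.choices(alphabet, k=length))

def integrity_score(dropped, observed):
    # Integrity is the fraction of dropped flags found unaltered when
    # the scoring bot re-reads them after a random delay.  If the VM is
    # unreachable (observed is None), the score is zero, since the
    # state of the flags cannot be determined.
    if observed is None:
        return 0.0
    unaltered = sum(
        1 for path, value in dropped.items() if observed.get(path) == value
    )
    return unaltered / len(dropped)

dropped = {"/var/flags/fs": make_flag(), "/var/flags/db": make_flag()}
tampered = dict(dropped, **{"/var/flags/db": "defaced"})
print(integrity_score(dropped, dropped))   # 1.0: nothing touched
print(integrity_score(dropped, tampered))  # 0.5: one of two flags altered
```

Instantaneous values like these are then averaged over the whole competition, which is what makes the measures cumulative as well as instantaneous.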
This paper discusses our experiences of designing and running the MIT/LL CTF. We examine the successes and failures, related to both our educational goals and system design, and share our overall conclusions with respect to future improvements. Section 2 discusses our pedagogical approach, and Section 3 describes the educational methods and lectures used for the CTF. Section 4 examines the choices made with respect to selecting the components and techniques used in designing the game and evaluating participants' progress. Section 5 covers the evaluation metrics used to assess teams during the competition. Section 6 analyzes data gathered during and after the event, including team scoring trends, reactions to changes in the importance of metrics during the game, and survey results from the participants. Section 7 discusses related CTF exercises and how the MIT/LL CTF is different. Finally, Section 8 offers lessons learned by the MIT/LL CTF staff, and how we plan to change the event in future iterations.

2 PEDAGOGICAL APPROACH

Students learn about computer security in a number of ways. Reading conference papers, journal articles, and books, for instance, allows students to acquire valuable theoretical apparatus and learn what has been tried before. Designing and implementing defensive systems and offensive tools is also valuable, as it requires the application of abstract knowledge to build real working solutions.

In addition to these fairly traditional educational avenues, we believe that practicing defending and attacking real computer systems in real time is also of immense educational value, and that it offers lessons that can't effectively be taught in a classroom. A CTF event is a playground in which students can fail or succeed safely at computer defense, and where it is permissible to engage in attack, without fear of consequences or reprisal. We believe it is crucial for CTF events to include an offensive component, not only because students find it exhilarating, but because it also challenges flawed reasoning and assumptions in tools, techniques, and systems, and leads to a deeper understanding of computer science in general [6].

Despite the significant educational potential of a CTF, many potential participants (i.e., those with a general computer science background or even a few computer security classes under their belt) perceive there to be a high barrier to entry. Unfortunately, they are often right: participating in and learning from a typical CTF competition requires significant skills and background knowledge. For participants with inadequate skills, it can be frustrating and bewildering, as their systems are compromised quickly and repeatedly. They are lucky if they even know what has happened, let alone why.

We are certainly not the first to consider offensive components to be crucial to learning practical computer security. O'Connor et al. [11] suggest that framing the study of general computer science concepts from the perspective of an adversary encourages student participation and interest. Moreover, they argue that framing the problem in terms of security (e.g., forensics) makes learning about other topics, such as file system formats, much more exciting. George Mason University explored the merits of having students create offensive test cases to exploit the code they developed as part of class assignments [8]. In doing so, they found that the offensive mindset led students to discover vulnerabilities not found through other testing methods. Bratus [2] also argues for adding components of the adversarial, hacker mindset to the traditional computer science curriculum due to the low-level knowledge it imparts, and its necessity in understanding and effectively implementing secure systems. In designing our CTF, we sought to enable a wider range of participants to benefit from this approach to computer security education.

3 EDUCATIONAL DESIGN

Since our goal was to create not only another CTF exercise, but also a pedagogic tool for teaching computer security, we incorporated several educational components, in the form of lectures and labs, into the MIT/LL CTF. In total, we offered five classes in the month preceding the competition.

The first lecture provided an overview of the CTF game, its mechanics and rules. As part of this description, we presented the game platform and architecture (Linux, x86), as well as the intended target – the WordPress content management system. We also explained and justified the scoring system, with suitable measures of confidentiality, integrity, availability and offense (see Section 5). This meeting allowed those who had not taken part in a CTF exercise before to understand the game better and ask questions.

In the second class, we presented the basics of web applications, the WordPress API, and some of the fundamental ways in which its design makes computer security difficult. We did not teach PHP, JavaScript or SQL, even though WordPress makes use of all three, as these details could easily be mastered by the general computer science student in self-study. The intent was not to educate students to the point that they might go off and write a web application; rather, we hoped to orient them in this (perhaps unfamiliar) terrain, providing an overview of the target and sketching the security issues for them to consider on their own.

The third class covered various aspects of Linux server security, also in lecture form. Topics ranged from high-level concepts, including the principle of least privilege, multi-layer defense and attack surface, to low-level discussions of practical details such as firewalls, application configuration, package management and setuid binaries. A number of standard tools and packages such as AppArmor, Tripwire, and fail2ban were also explained. Additionally, MIT's volunteer student-run computing group SIPB presented a case study in securing the web application and virtual machine hosting services provided to the MIT community.

The fourth lecture discussed web application exploitation techniques, including SQL Injection, Cross-Site Scripting, and other server-side and client-side attack vectors. Each topic was addressed from the aspects of vulnerability discovery, exploitation, and mitigation. It was our hope that approaching each issue from all three angles would help the participants build better tools in the weeks leading up to the competition.

The final class consisted of a lab in which the participants were asked to work through computer security challenges with the help of the organizers. For this exercise, we used Google Gruyere [7]. This site allowed each student to stand up a separate instance of the exercises and practice finding and exploiting the plethora of issues discussed in the previous lecture. By allowing participants to build exploits and actually apply the knowledge they gained through the previous lectures, we hope they gained perspective on the tools they would build or use in the coming weeks to prevent similar intrusions from succeeding on their server.

In addition to the classes, a mailing list and wiki were set up to provide information and answer questions. After the release of the competition VM (both before and after lectures and lab), participants posed a number of questions about server configuration, tool use, and defensive and offensive strategies using these resources. We also held a post mortem session right after the competition to discuss the vulnerabilities that we introduced into the plug-ins provided and to give teams an opportunity to explain their strategies, including how they found and exploited vulnerabilities. We used this forum also to solicit feedback about the mechanics and implementation of the competition itself.

4 EXERCISE DESIGN

This section covers design decisions made while planning the CTF and its component challenges.

4.1 Target Selection

In order to design a CTF that carried an academic flavor, we realized that challenges based on compiled binaries would require an unacceptably large amount of prior knowledge and thus contradict our pedagogical goals laid out in Section 2. As such, we chose a CTF setting that would naturally allow the participants access to the source code of the system. An easy way to do this was to focus on web application security.

Having chosen the game genre, the next decision was selecting an open-source or custom-written web application framework. We believed that the game would be more meaningful to the participants if we used realistic, commercial off-the-shelf (COTS) software during the CTF, since it would allow them to build reusable expertise for a popular software package that they are likely to encounter again elsewhere. With this in mind we set out to select a common web application framework that would enable our CTF to be educational, well-designed, and fun to play. After considering several candidates, the PHP-based Content Management System (CMS) WordPress [1] was selected as the CTF's base architecture.

4.2 Modular Game Design

One of the main requirements in selecting a web application framework was modularity. We needed a robust way to introduce new vulnerabilities that could be exploited by competition participants but could not be discovered by a simple source code "diff". A plug-in architecture, especially one as flexible and extensively used as in WordPress, allowed us to create new functionality that featured carefully crafted but realistic vulnerabilities. At the same time, this architecture enabled us to provide the participants with the basic framework (i.e., a LAMP server with WordPress) ahead of the competition without revealing any details of the plug-ins we were building. Finally, separating our challenges into different plug-ins enabled easy division of labor.

Plug-ins are used extensively, particularly in web applications. We felt that the dynamic of acquiring untrusted code, examining it for potential flaws, fixing the ones that can be easily found, and providing some kind of sandboxing or code isolation as a fail-safe was a realistic strategy that system administrators might employ. Formulating a game around this dynamic enabled participants to practice several important practical computer security skills, including source code auditing, fuzzing or web application penetration testing to find vulnerabilities, patching code without removing functionality, and configuring appropriate sandboxing mechanisms.

4.3 Pre-release of Select Game Content

To enable students to create significant solutions, we wanted to release as much of the CTF content ahead of the competition as possible, without releasing the challenges themselves. By selecting a modular framework, we were able to withhold all of the challenges explicitly written for the CTF while providing a virtual machine to teams a month ahead of time. Since the distributed VM was almost identical to the one deployed at the competition, participating teams were encouraged to build defensive tools and scripts to lock down their servers, while verifying the base server configuration was secure.
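The vulnerability classes covered in the lectures, and planted in the game's plug-ins, were of the classic web-application variety. As a minimal, self-contained illustration of one such class (SQL injection) and of "patching without removing functionality", consider the following sketch. It uses Python and sqlite3 purely for brevity; the actual CTF plug-ins were PHP/MySQL WordPress code, and the table and function names here are invented.

```python
import sqlite3

# A toy table standing in for WordPress post data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, title TEXT, draft INTEGER)")
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(1, "hello", 0), (2, "secret plans", 1)])

def public_posts_vulnerable(title):
    # Building the query by string interpolation lets attacker-controlled
    # input rewrite the WHERE clause.
    query = "SELECT id, title FROM posts WHERE draft = 0 AND title = '%s'" % title
    return conn.execute(query).fetchall()

def public_posts_patched(title):
    # Parameterized query: the input is bound as data and never parsed as
    # SQL, while the plug-in's legitimate search behavior is unchanged.
    return conn.execute(
        "SELECT id, title FROM posts WHERE draft = 0 AND title = ?", (title,)
    ).fetchall()

payload = "' OR '1'='1"
print(public_posts_vulnerable(payload))  # injection: every row leaks, drafts included
print(public_posts_patched(payload))     # patched: no such title, no rows
print(public_posts_patched("hello"))     # normal functionality preserved
```

The same audit-then-patch pattern, applied to the plug-in sources, is exactly the defensive work the game rewarded.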
Furthermore, since WordPress has a large set of publicly available plug-ins, many of which have published security vulnerabilities, we were able to further augment the pre-released virtual machine with representative sample plug-ins. Three vulnerable WordPress plug-ins and corresponding exploit code were gathered and installed into the VM's WordPress instance. This provided a proxy for what the teams would see during the game, thus enabling teams to build and test defensive measures and sandboxing implementations before the competition began.

4.4 Simulating the Real World

In addition to supporting the pedagogical goals described above, we aimed to emulate the dynamic nature of real-world operations. Business pressures require IT infrastructure to be nimble and provide new functionality on short notice. To simulate these requirements during the CTF, we released one batch of new plug-ins at the beginning of the competition, and another batch near the end of the first day. Because plug-ins provided independent functionality, teams could choose to run some of them, but not others, thus giving them time to review and harden new plug-ins; however, any delay in enabling plug-ins corresponded to sacrifices in the availability score.

4.5 Selecting Participants

Given the open nature of the MIT/LL CTF, we did not require any qualifications of our participants, aside from being enrolled in an undergraduate or graduate level program at a Boston-area university and willingness to spend the weekend competing in the CTF. We encouraged participants to form teams of at least three members, as we felt that competing with fewer people would put the team at a significant disadvantage. Our resulting participant pool was comprised of undergraduates from MIT, Northeastern University, Boston University, Olin College, University of Massachusetts (Boston), and Wellesley College, divided into (some multi-institutional) teams of three to five members.

5 SCORING

In order to evaluate the teams' performance during the CTF, we separately assessed each team's ability to defend their server (the Defense subscore) and to capture other teams' flags (the Offense subscore). The defense portion of the score was itself derived from three measures, confidentiality (C), integrity (I), and availability (A), that were combined using weights WC, WI, and WA:

    Score = Wd × Defense + (1 − Wd) × Offense
    Defense = WC × C + WI × I + WA × A

Varying values for Wd, WA, WI and WC provided much-needed flexibility for simulating scenarios with different importance assigned to the corresponding properties. The rest of this section describes how each of the score components was measured.

5.1 Functionality-based Metrics

Each team's availability score was measured using a grading engine with a module written for each plug-in, with an additional module to evaluate WordPress's basic functionality. Scores from each grading module were combined into a weighted average, according to an assigned importance of each plug-in. For this competition, all functionality was weighted evenly; however, these weights could be easily adjusted to reflect the difficulty of securing a plug-in that provides some complicated functionality.

During the competition, each team's website was evaluated on 5-10 minute intervals. A team's overall availability score was calculated as the mean of all availability scores to date.

5.2 Flag-based Metrics

Every 15 minutes, flags (128-character alphanumeric strings) were deposited into the file system and database on each team's server. If opposing teams captured these flags, they could submit them to the scoreboard system. Flags were assigned a point value corresponding to the perceived difficulty and level of access needed to acquire the flag. By providing flags of varying difficulty, we hoped that teams would try to escalate privilege to gain higher levels of access than those afforded by the more basic exploits available in the game. Following the insertion of flags into each system, the scoring bot would wait a random period and check whether the flags were unaltered.

The confidentiality, integrity and offense score components were derived from the flag dropping and evaluation system described above. Integrity was calculated as the fraction of flags that were unaltered after the random sleep period; if the VM was inaccessible, a score of zero was reported, as the state of the flags could not be determined. The instantaneous integrity scores were averaged together to produce the integrity subscore.

Confidentiality and offense were closely linked portions of each team's score. Confidentiality was a running record of the percentage of flag points assigned to a team that had not yet been submitted by an opposing team for offensive points. Conversely, offense was calculated as the raw percentage of flags that a team did not own themselves but submitted to the scoreboard for points. Unlike availability and integrity scores, which were immediately computed by the grading system, the confidentiality and offense scores were based on flags submitted by different teams. Since a flag could be submitted to the scoreboard by a team any time after that flag entered the CTF ecosystem, it was impossible to instantaneously tell when a given team's confidentiality was compromised.

5.3 Situational Awareness

In an effort to provide situational awareness, the scoreboard presented two informational screens to each team. The errors view provided access to availability and flag rotation errors. Each availability error was tagged with the corresponding plug-in and a timestamp. Flag rotation errors indicated to the team when a flag could not be dropped onto their system, and whether it was a database or a file system flag. With both of these pieces of information, teams were adequately able to detect and fix broken functionality on their system.

The grading view provided each team with a breakdown of their scores in a more granular fashion. The last ten instantaneous integrity and availability scores were displayed to each team, enabling them to identify when their level of service had been adversely affected. Confidentiality was displayed as the number of flag points not scored by other teams out of the total number of flag points assigned to the specific team. Conversely, offense was displayed as the total number of flag points scored out of the total number of flag points in the CTF not belonging to the specified team. We considered offering information about the confidentiality and offense scores in a similar fashion to integrity, but realized that the data was not meaningful because of the practice of flag hoarding – i.e., collecting flags from an opponent's VM but not submitting them instantly, to conceal the intrusion.

6 DATA ANALYSIS

In our CTF, we collected data from two different sources: the availability, integrity, confidentiality and offense scores aggregated during the competition itself, and participant surveys distributed after the competition. Through analysis of the scoreboard information we can gain insight into the way the scoring function affected and incentivized certain actions within the game, while survey responses provide a way to gauge the effectiveness of the exercise in educating, challenging, and enticing participants into the field of computer security.

6.1 Scoring Data Analysis

The scoring function discussed in Section 5 included adjustable weights to enable us to shift the balance of the gameplay if we deemed the CTF to be too focused on some aspect of defense or offense. On day one, we set Wd = 1/2 (thus giving even weight to defense and offense components), and set W{A,I,C} = 1/3. In this configuration, we discovered that the teams shut off the web server (or turned it on only during scoring runs), thus removing the most obvious pathways an adversary would use to take over the system. This strategy was cost-effective, as availability was worth only 1/6 of the total score, and the team's confidentiality and integrity scores improved with most paths to attack removed. Furthermore, this strategy enabled the entire team to focus on offense.

Since this was not the game equilibrium we were looking for, we adjusted the grading weights to shift the balance of the game back in favor of keeping the web servers running. On the second day, we announced that the new scoring weights were Wd = 0.80 and WA = 0.80, thus making availability worth a hefty 64% of the total score. The new weights, of course, did not apply retroactively to scores obtained on the first day.

Given this change in scoring metric, we would have expected to see availability rise during the second half of the competition. We would have also expected resources from offense-related activities to be diverted to defense; thus, we expected to see offense scores decrease in the second half of the competition as well. However, the results showed a more complicated picture, as we now discuss.

6.1.1 Availability Scores

Figure 1 (left) shows the instantaneous availability for the top 5 teams (given the final standings).

The general trends in availability scores for the first day seem to imply that two strategies were in place. Some teams (DROPTABLES, Ohack, and Pwnies) seem to have enabled the web server and plug-ins at the beginning of competition and thus suffered from exploits throughout the day. Other teams (CookieMonster and GTFO) disabled the plug-ins at the beginning of the day, thus paying the cost in decreased availability; however, once they figured out some way to secure them, the web servers were turned back on.

On the second day, the picture is quite different. DROPTABLES and GTFO (1st and 2nd place finalists, respectively) battle to keep their availability up (presumably while under heavy attack). CookieMonster starts out with high availability, but drops rather precipitously as the team's web server is owned and eventually destroyed beyond recovery. Pwnies and Ohack struggle against attacks as well, but do manage to get the plug-ins functional towards the end of the competition. While we are not seeing a clear trend demonstrating that availability was higher during the second day of the competition, it is evident that the teams were trying to get their availability back up, even if some did not succeed.

6.1.2 Offense Scores

Figure 1 (right) shows the cumulative running average of the offense score for the two days of the competition (again, just for the top 5 teams). Because flag hoarding is allowed by our system, there is no "instantaneous" offense score – this score necessarily carries the memory of the entire competition.
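The incentive shift described above follows directly from the scoring formulas in Section 5. The sketch below recomputes the total score of a hypothetical team that shuts off its web server (A = 0) while keeping everything else perfect. Note that the paper specifies only Wd and WA for day two, so the even split of the remaining defense weight between WC and WI is our assumption.

```python
def total_score(wd, wc, wi, wa, c, i, a, offense):
    # Score   = Wd * Defense + (1 - Wd) * Offense
    # Defense = WC * C + WI * I + WA * A
    defense = wc * c + wi * i + wa * a
    return wd * defense + (1 - wd) * offense

day1 = dict(wd=1 / 2, wc=1 / 3, wi=1 / 3, wa=1 / 3)  # availability worth 1/6
day2 = dict(wd=0.80, wc=0.10, wi=0.10, wa=0.80)      # availability worth 64%

# Perfect confidentiality, integrity, and offense, but the web server is off:
print(total_score(**day1, c=1.0, i=1.0, a=0.0, offense=1.0))  # about 0.83
print(total_score(**day2, c=1.0, i=1.0, a=0.0, offense=1.0))  # about 0.36
```

Under the day-one weights, killing the web server costs only 1/6 of the total score, so the strategy is nearly free; under the day-two weights, the same team forfeits almost two-thirds of its score.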
Figure 1: Instantaneous availability (left) and offense (right) scores for the top 5 ranked teams on day 1 (top) and day 2 (bottom). The plateaus between 13:00-14:00 and 17:00-18:00 represent lunch and dinner breaks, respectively. The competition ended at 21:00 on day 1 and 19:00 on day 2.

Here, we expected to see a decrease in offense activity on day two, as teams' resources are redirected to defense to ensure the highest availability.

On day one, Ohack and Pwnies dominate the offensive landscape (at the expense of their availability scores), while DROPTABLES, CookieMonster and GTFO spend less time on offense and thus provide higher availability (at least before the dinner break). On day two, Ohack again dominates all other teams in total number of flags submitted. The sharp jump for both Ohack and Pwnies coincides with our announcement that the scoring weights are changing – therefore, any flags that were being hoarded are submitted to the scoreboard as soon as possible. After that, offensive scores decline as the teams presumably spend more effort on defense, with the exception of GTFO, whose offense score increases throughout the day.

Overall, the data supports the notion that teams refocused their efforts from offense to defense due to the change in scoring metric. The results are somewhat muddled by the fact that the teams had figured out several vulnerabilities by the time day two started, so maintaining a higher availability was more difficult.

6.2 Participant Survey Results

We released an Internet-based survey to MIT/LL CTF participants two days after the competition. This survey covered overall CTF impressions and educational takeaways, pre-CTF class evaluation, evaluation of in-game strategies, and post-CTF reflections. The analysis below is based on the 22 responses we received to the survey.

When asked to rate confidence in their computer security skills before and after the competition, we found an increase of (on average) 1.4 points on a 10-point scale. Two respondents reported a decrease in perceived computer security skill, but their comments indicate that they may have been over-confident in their skill level before the competition began.

When polled about interest in a computer security career both before and after the event, respondents displayed an average 1.1-point increase in interest (again, on a 10-point scale). This is likely due to the fact that many of those who responded stated that they had a 10 out of 10 interest in a computer security career even before the competition.

When asked to select the lectures that they found to be most helpful, the Linux Server Lockdown was the most popular (13 votes), closely followed by Web Application Exploitation (11 votes). Survey comments indicated that respondents found several techniques presented during our lectures useful. From the Linux Server Lockdown lecture, Apache's mod_security was mentioned as being helpful by several respondents. Additionally, respondents used fail2ban and Tripwire for detecting attacks, albeit some mentioned that Tripwire consumed too many system resources and had to be turned off. Surveys also indicated that knowledge of PHP's SQL escaping functions (described in the Web Application Exploitation lecture) was useful in patching vulnerabilities in plug-ins. Respondents also reported using the input "white-listing" technique taught in class to prevent command injection.

Of course, several knowledge areas that were not addressed in lecture proved very useful as well. Several respondents reported that being unfamiliar with PHP development posed a significant limitation – these teams were ill-prepared for auditing the plug-in source code to discover vulnerabilities via code inspection, which was useful both for patching and for exploit improvement. Aside from PHP development experience, some respondents indicated that remote system logging tools, specifically syslog-ng, proved useful for quickly detecting system attacks, and also in analyzing and re-purposing competitors' exploits. Finally, social engineering skills (an aspect not covered in lecture) helped one team obtain root access on several other teams' VMs. This "exploit" used an ingenious combination of a Gmail account resembling one used by the organizers, a signature block copied from previous announcements to the list of CTF participants, and a replica of the official CTF wiki complete with a not-so-authentic SSH public key to be used in the authorized_keys file of the root user on team VMs.

All survey respondents indicated that they spent at least an hour preparing for the competition. The two most common preparation time ranges were 1-2 hours (9 respondents) and 4-8 hours (8 respondents). When asked about time allocation between offense and defense during the competition itself, 50% of respondents reported that they spent more time on defense, 36% claim to have spent the majority of their time on offense, and the remainder indicated neither offense nor defense occupied the majority of their competition time. 86% of respondents indicated that they attempted to patch at least one of the vulnerable plug-ins, while those who developed exploits created an average of 1.5 exploits during the competition. Overall, 91% of those who responded said they would like to participate in an event like this again.

7 RELATED WORK

Many other cyber security exercises, with both academic and recreational motives, have shaped the current CTF landscape. Perhaps the most notable is the annual DefCon CTF exercise held in Las Vegas, Nevada. This competition is open to participants world-wide, but has a preliminary challenge-based qualification round in which participants solve computer security puzzles on a Jeopardy-like board. The top eight teams from the qualifiers have historically played in a team-versus-team security competition during the DefCon conference, an experience described in detail in [5].

Notable academic CTFs include the University of California, Santa Barbara's iCTF, the National Security Agency's Cyber Defense Exercise (CDX), and the Collegiate Cyber Defense Competition (CCDC). From 2003-2007, iCTF had a team-versus-team format similar to the MIT/LL competition. Since then, they have moved towards a storyline-oriented model which provides each team with identical parallel versions of the game [4]. CDX places the emphasis on the defensive aspect of security, employing a Red Cell of hackers from various government organizations to simulate a live, defensive operation for the participants [13]. The organizations that compete in this game often hold semester-long courses to prepare for the exercise. CCDC takes a similarly defense-centered approach, with an external third-party red team charged with attacking all participating teams.

Many other challenge- or puzzle-oriented security exercises exist as well. One popular and notable example is NYU Polytechnic Institute's Cyber Security Awareness Week (CSAW) competition, which is loosely based on the DefCon CTF qualifier competition, but also includes a hardware security challenge and a game segment targeted at high school students. In addition to the many competitions held at the inter-collegiate and semi-professional levels, many other exercises have been held at universities [9, 12]. These events tend to vary in focus between defensive and offensive, and are often the capstone event of a semester-long course.

The MIT/LL CTF is similar in some aspects to the aforementioned academic CTFs; however, our main goal was to make the CTF experience available to students of different academic backgrounds and varying practical expertise in computer security. By giving students an opportunity to experience what it is like to defend and attack computer systems first-hand as part of this competition, we were aiming to encourage interest in practical computer security and promote further study outside the competition. By providing a short series of lectures and labs before the event instead of requiring completion of a semester-long course, we lowered the barrier to entry and enabled a wider group of students to participate in this event. The competition was not structured to protect teams with little computer security expertise (in fact, several survey respondents found the competition quite difficult and gained new respect for the skills required to succeed); however, we believe that it was accessible to the students who attended our lectures and prepared in the weeks preceding the competition.

Several self-paced security courses, including Google's Gruyere [7] and Stanford's Webseclab [3], also help students learn about web application security. However, we believe that a CTF based around defending an entire server in real time while running possibly buggy plug-ins allows students to learn about a broader range of computer security topics, as well as about operational considerations involved in maintaining a secure system.

8 LESSONS LEARNED

Running the MIT/LL CTF was a new experience for all team members involved. Aside from learning quite a bit about the mechanics of web application exploitation while constructing our WordPress plug-ins (and wondering how the world of web applications can ever be secured), we
Similar to CDX, CCDC also places the also learned a few lessons about running a competition ofemphasis on the defensive aspect of security, employing this scale and complexity that are worth sharing. 7
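Before turning to organizational lessons, the input "white-listing" technique that survey respondents used against command injection is worth a concrete illustration. The sketch below is ours, written in Python rather than the PHP of the actual plug-ins, and the names (`SAFE_NAME`, `archive_upload`) are hypothetical:

```python
import re
import subprocess

# White-list: accept only names matching a strict pattern; reject everything
# else by default, rather than trying to enumerate dangerous characters.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]{1,64}")

def archive_upload(filename: str) -> None:
    """Compress an uploaded file, refusing any name outside the white-list."""
    if SAFE_NAME.fullmatch(filename) is None:
        raise ValueError(f"rejected unsafe filename: {filename!r}")
    # Passing an argument list (no shell) closes the metacharacter-injection
    # path even if the white-list were ever to let something odd through.
    subprocess.run(["gzip", "--keep", "--", filename], check=True)
```

Rejecting by default is what distinguishes white-listing from black-listing: a name like `x; rm -rf /` never reaches the shell because `;`, spaces, and `/` are simply not in the accepted alphabet.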
8.1 Marketing and Scheduling

Running a CTF that includes participants from multiple universities required paying a lot of attention to scheduling and marketing. Originally, we planned on a week of day-time preparatory classes, followed by a week of competition during MIT's month-long break in January 2011. Despite extensive postering and e-mailing, we received very little interest, which was likely due to two causes: students were too busy to spend two weeks on this activity, and several other January classes and competitions (with significantly larger prizes) targeted students interested in web applications and security.

Feedback from MIT's SIPB and BU's BUILDS student groups helped us choose a better schedule by shifting classes to evenings and limiting the competition to one weekend. The new format was quite successful in limiting the time commitment required of a busy student. Advertising the CTF at multiple Boston-area universities also greatly increased participation – even though we required physical presence, students from Wellesley and Olin College braved the drive and came to compete anyway.

8.2 Infrastructure and Data Collection

We built our CTF infrastructure on VMWare's ESX solution, which afforded us quite a bit of scalability and robustness from the beginning. While we were careful in designing and testing our grading bots, the scoreboard, and other "in-game" infrastructure servers, we still had a number of issues to fix during the competition itself. Some of the participants suggested that we hold scrimmages before the actual CTF to work out the kinks in the infrastructure, and that seems like a very good suggestion.

Despite extensive testing using failure modes that we could easily predict (e.g., Apache is down, the machine is firewalled, MySQL is not running), we ended up having to contend with unexpected scenarios. For example, our flag rotator bot relied on the rm command to remove previously-placed flags in the file system. In desperate attempts to stem damage from attacks, some teams decided to delete /bin/rm from their systems, resulting in completely unexpected grading bot failures.

A competition like a CTF is a great opportunity to collect data that might elucidate the ways an adversary actually attacks the web application, and what actions both attackers and defenders take throughout the competition. Our ESX server was set up to capture all network traffic that traversed the virtual switch; regretfully, something went wrong overnight, and the file system appears to have corrupted our packet capture file. In retrospect, we would have liked better visibility into participants' servers, some of which might have been obtained using the ESX console or custom software built on the VMWare Tools API. For example, volume-based denial-of-service attacks were forbidden by our CTF rules, and we were asked several times by participants to mediate situations where excessive network or CPU bandwidth was being used. Our methods for verifying the validity of such claims were rather rudimentary – we could attempt to connect to the server in question, or traceroute the purported source of the denial-of-service attack; with some preparation, we may have had better tools at our disposal.

8.3 Operational Issues and Game Mechanics

When designing and implementing our grading bots, we were concerned about encouraging undesirable behaviors – e.g., if the grading bot is easy to identify, a participant might open the firewall only for the grading bot and exclude all other teams from accessing their server. Due to time restrictions in developing the grader bots, we used a different mechanism to interact with WordPress (via XMLRPC, used by desktop-based blogging software) instead of a scripted browser. In retrospect, this was suboptimal, as certain activities performed by the grader were clearly distinguishable; furthermore, some aspects of the plug-ins ran JavaScript on the client and thus were not exercised by the grader. To make grading slightly less predictable, we did the following:

• Randomized the starting time of a grading run (within a 10-minute interval),
• Randomly picked content drawn from 100 papers generated by MIT's Automated Paper Generator,
• Randomly picked a string for the 'User-Agent' header from a large set.

In order to make the competition more accessible to weaker teams (and to obviate the necessity for teams to create their own whole-system backups), we offered each team three snapshots and three reverts per day of the competition. Because we couldn't give each team access to the ESX console, we performed these operations ourselves when requested by a team member. This "lifeline" enabled teams whose server was completely destroyed to continue participating in the game; however, the mechanics could be improved. We required a team member to approach us in person and bring a token, given out at the start of the CTF, to identify their team in order to request a snapshot or a revert. Since all participants were in the same room, it was obvious that a certain team's VM would be momentarily reverted – in fact, requesting a restore quickly became known as the "walk of shame". Unfortunately, many teams created snapshots once their VMs were already infiltrated; thus, the attackers were able to compromise the machine again before countermeasures could be taken by the defenders. In addition, the initial
snapshot provided to the teams had all plug-ins installed and Apache running, which again created a vulnerable situation and prevented the lifeline from being useful (especially during the second day of the competition).

8.4 Measuring the CTF's Educational Impact

To measure the effect of the CTF competition on participants' knowledge of computer security, we conducted an Internet survey after the event. The survey was rather ad hoc, as our primary motivation was to interest students in the field of computer security, to make the CTF experience available to those with little practical experience in the area, and to run an exciting competition. In these goals, the survey results indicate that we were successful. In future iterations of the MIT/LL CTF, we plan to measure the effect the CTF had on participants' knowledge more directly by following methods similar to [10], including pre-CTF and post-CTF quizzes to assess players' prior knowledge of computer security areas and the experience gained by participating in the CTF. Since this is a completely voluntary game, we will likely need to incentivize participation in these quizzes to ensure statistically significant results; perhaps a raffle for a small prize would work well.

9 CONCLUSION

The MIT/LL CTF competition was a great learning experience both for the students involved and for the organizers. We believe that this exercise helped the students understand the intricacies of practical computer security, highlighted their strengths and weaknesses in computer security skills, and generally increased their interest in and desire to learn more about this area. We plan to continue fostering this community by encouraging the creation of reading groups focused on practical computer security and by running similar CTF competitions in the upcoming years, incorporating the feedback from this year's participants.

ACKNOWLEDGMENTS

We would like to thank the anonymous reviewers and our shepherd, Sean Peisert, for providing feedback that helped improve this paper. We would also like to thank the Lincoln CTO, Bernadette Johnson, for her guidance and encouragement. Additionally, we thank Geoffrey Thomas of MIT SIPB for his presentation on securing Linux servers.

REFERENCES

[1] WordPress: Blog tool and publishing platform.

[2] S. Bratus. What hackers learn that the rest of us don't: Notes on hacker curriculum. IEEE Security and Privacy, 5(4):72–75, 2007.

[3] E. Bursztein, B. Gourdin, C. Fabry, J. Bau, G. Rydstedt, H. Bojinov, D. Boneh, and J. C. Mitchell. Webseclab security education workbench. In Proc. of the 3rd Workshop on Cyber Security Experimentation and Test, Washington, DC, August 2010.

[4] N. Childers, B. Boe, L. Cavallaro, L. Cavedon, M. Cova, M. Egele, and G. Vigna. Organizing large scale hacking competitions. In Proc. of the 7th Conference on Detection of Intrusions and Malware & Vulnerability Assessment, Bonn, Germany, July 2010.

[5] C. Cowan, S. Arnold, S. Beattie, and C. Wright. Defcon capture the flag: Defending vulnerable code from intense attack. In Proc. of the DARPA Information Survivability Conference and Exposition, Washington, DC, April 2003.

[6] R. L. Fanelli and T. J. O'Connor. Experiences with practice-focused undergraduate security education. In Proc. of the 3rd Workshop on Cyber Security Experimentation and Test, Washington, DC, August 2010.

[7] Google, Inc. Web application exploits and defenses.

[8] M. E. Locasto. Helping students 0wn their own code. IEEE Security and Privacy, 7(3):53–56, 2009.

[9] G. Louthan, W. Roberts, M. Butler, and J. Hale. The Blunderdome: An offensive exercise for building network, systems, and web security awareness. In Proc. of the 3rd Workshop on Cyber Security Experimentation and Test, Washington, DC, August 2010.

[10] M. Mink and R. Greifeneder. Evaluation of the offensive approach in information security education. IFIP Advances in Information and Communication Technology, 330:203–214, 2010.

[11] T. J. O'Connor, B. Sangster, and E. Dean. Using hacking to teach computer science fundamentals. In American Society for Engineering Education, St. Lawrence Section, 2010.

[12] M. O'Leary. A laboratory based capstone course in computer security for undergraduates. In Proc. of the 37th SIGCSE Technical Symposium on Computer Science Education, Houston, TX, March 2006.

[13] United States National Security Agency. Fact sheet: NSA/CSS cyber defense exercise - after exercise. press_releases/cdx_fact_sheet.pdf.