
Binary Analysis - Luxembourg


  1. 1. Binary Analysis for Vulnerability Detection. National University of Singapore, http://www.comp.nus.edu.sg/~abhik. Visit to University of Luxembourg S&T center, January 2017. Research project with DSO National Labs, 2013-16. “TSUNAMi: Trustworthy systems from un-trusted component amalgamations”, National Research Foundation (NRF), 2015-2020.
  2. 2. Singapore: 274 sq. mi., 5 million population, about 12 hours' flight from Luxembourg.
  3. 3. NUS: founded 1905; 9,000 graduate and 23,000 undergraduate students from 88 countries.
  4. 4. Cybersecurity research. The National Cybersecurity R&D Programme seeks to develop R&D expertise and capabilities in cybersecurity for Singapore. It aims to improve the trustworthiness of cyber infrastructures with an emphasis on security, reliability, resiliency and usability. Five-year funding of S$130 million will be available to support research efforts into both technological and human-science aspects of cybersecurity in outcome-based R&D themes. The themes are designed to provide an element of operational context, while not restricting “game-changing” ideas from the community. The research spans six themes: Scalable Trustworthy Systems; Resilient Systems; Effective Situation Awareness and Attack Attribution; Combatting Insider Threats; Threats Detection, Analysis and Defence; and Efficient and Effective Digital Forensics. https://www.nrf.gov.sg/programmes/national-cybersecurity-r-d-programme
  5. 5. Outline • NCR project – Trustworthy systems from Un-trusted Components • Technical contributions in Binary Analysis • Technology showcase • Initiatives – Consortium
  6. 6. COTS-integrated Platforms. [Diagram: a trustworthy system built from outsourced components and shared data is exposed to vulnerabilities, malicious behavior, flaws, and data breaches.] Binary analysis is critically needed for software acquisition or assembly.
  7. 7. Enhancing local capabilities: Vulnerability Discovery, Binary Hardening, Verification, Data Protection. Agency collaboration – DSTA, … Industry collaboration – ST, Symantec, NEC, … Education – NUS (new module). Research outputs – publications, tools, academic collaboration, exchanges, seminars, workshops.
  8. 8. Use of research in NRF project • Binary analysis o Useful to government agencies for procuring software. o Deep binary analysis on the evaluation version prior to procurement. • Binary hardening o Useful to government agencies on procured software. • Point technologies from individual work-packages.
  9. 9. Contributions • Binary analysis for: fuzz testing, comprehension, debugging, and patching (latest work) • Research program at NUS since 2008, with DRTech, DSO, …
  10. 10. Video • https://youtu.be/C1hl_ujw6B0 (1 minute) • https://youtu.be/EHBjMSQvIpg (1 minute)
  11. 11. Who cares? “A team of hackers won $2 million by building a machine that could hack better than they could.” Read more at http://www.businessinsider.sg/forallsecure-mayhem-darpa-cyber-grand-challenge-2016-8/#ZuIF7Dmq3aaCAdaq.99 DARPA Cyber Grand Challenge -> automation of security [detecting and fixing vulnerabilities in binaries automatically]
  12. 12. Fuzz Testing. Pioneered by Barton Miller at the Univ. of Wisconsin in 1988. And now, in 2016: the Springfield Project – fuzzing as a service; OSS-Fuzz – continuous fuzzing for open-source projects.
  13. 13. A true story – why fuzz? • May 4, 2015 o Abhik was preparing lecture notes on fuzzing. o 11:00 AM – finished deciding on the structure; trying to pick a motivating example of fuzzing to interest the students, since there are so many candidates. o 11:11 AM – received an email update about a recent incident – an integer overflow in Boeing software – a classic case where an automated method for sending out malformed or boundary inputs can reveal errors.
  14. 14. (Model-Based) Black-box Fuzzing – presented by Thuan Pham. [Diagram: an input model (as used by Peach, Spike, …) and a seed input drive model-based blackbox fuzzing; the mutated inputs either pass all format checks or satisfy only some of them.]
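     As a rough illustration of why the input model matters, here is a minimal sketch, assuming a hypothetical chunk layout and a toy checksum (not Peach's actual implementation): mutating a payload byte and then re-computing the chunk checksum keeps the file valid enough to pass the parser's integrity check and reach deeper code.

        #include <stdint.h>
        #include <stdlib.h>

        /* Hypothetical chunk layout for illustration:
         * [4-byte length][payload][4-byte checksum]. */
        typedef struct {
            uint32_t length;
            uint8_t  *payload;
            uint32_t checksum;
        } chunk_t;

        /* Toy checksum standing in for the format's real integrity
         * check (e.g. CRC32 in PNG chunks). */
        static uint32_t toy_checksum(const uint8_t *data, uint32_t len) {
            uint32_t sum = 0;
            for (uint32_t i = 0; i < len; i++)
                sum = sum * 31 + data[i];
            return sum;
        }

        /* Model-aware mutation: flip one payload bit, then restore the
         * integrity constraint so the parser does not reject the file. */
        static void mutate_chunk(chunk_t *c) {
            if (c->length == 0) return;
            uint32_t pos = (uint32_t)rand() % c->length;
            c->payload[pos] ^= (uint8_t)(1u << (rand() % 8)); /* blind bit flip */
            c->checksum = toy_checksum(c->payload, c->length); /* model fix-up  */
        }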
  15. 15. AFLFast – (Coverage-based) Grey-box Fuzzing – presented by Thuan Pham. [Diagram: seed inputs enter an input queue; inputs are dequeued and mutated, and “interesting” mutated inputs are put back in the queue.]
  16. 16. White-box Fuzzing
  17. 17. Problem Statement • How to direct the exploration to reach certain locations or targets, or to enhance coverage, o in large-scale program binaries o with highly-structured inputs (e.g., multimedia files) o given an inadequate test suite or seeds.
  18. 18. Directed Search in White-box Fuzzing – applied to the crash reproduction problem. Crash reproduction supports: - in-house debugging and fixing - vulnerability checking
  19. 19. Overview. [Diagram: the Hercules toolset takes a program binary, benign input files, and crash-report data (crash instruction, loaded modules, call stack, register values), and applies (1) a directed search algorithm and (2) guided selective symbolic execution to produce crash input files.]
  20. 20. Control Flow Graph Construction – resolve indirect jumps/calls. [Diagram: program binaries are disassembled with IDA Pro, yielding assembly code and direct jumps/calls; the CFG generator combines this with jump table extraction and edge profiling to resolve indirect jumps/calls and emit the CFG.]
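     A minimal sketch of the edge-resolution step, with invented CFG data structures (the actual toolset works on IDA Pro's output): direct branches contribute one edge each, while an indirect jump contributes one edge per extracted jump-table entry.

        #include <stdint.h>
        #include <stdlib.h>

        /* Invented CFG types for illustration; real frameworks are richer. */
        typedef struct edge { uint64_t from, to; struct edge *next; } edge_t;
        typedef struct { edge_t *edges; } cfg_t;

        static void cfg_add_edge(cfg_t *g, uint64_t from, uint64_t to) {
            edge_t *e = malloc(sizeof *e);
            e->from = from; e->to = to; e->next = g->edges;
            g->edges = e;
        }

        /* Direct jump/call: the target is a constant in the instruction. */
        static void add_direct(cfg_t *g, uint64_t insn, uint64_t target) {
            cfg_add_edge(g, insn, target);
        }

        /* Indirect jump through a recovered jump table: one edge per entry.
         * `table` would come from jump-table extraction (e.g. recognizing
         * bounds-checked switch lowering); here it is simply passed in.   */
        static void add_indirect(cfg_t *g, uint64_t insn,
                                 const uint64_t *table, size_t n) {
            for (size_t i = 0; i < n; i++)
                cfg_add_edge(g, insn, table[i]);
        }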
  21. 21. First-cut Analyzer • Output of Stage 1: flow structures and input file(s) that can reach the crash module • Output of Stage 2: refined CFG, MDG, and hybrid symbolic file • Output of Stage 3: crash input(s) and crash explanation (based on the UNSAT core)
  22. 22. UNSAT core. Notation: bx – branch instruction; bcx – branch condition at bx; PC – path condition; CC – crash condition. [Diagram: a path through branches b1, b2, b3, b4 leading to the crash instruction.] First attempt: PC = bc1 ∧ ¬bc3 ∧ bc4; PC ∧ CC is UNSAT, and the UNSAT core shows that bc1 contradicts CC, so we (1) backtrack to b1 and (2) take the other branch. Second attempt: PC′ = ¬bc1 ∧ bc2 ∧ bc4; PC′ ∧ CC is SAT.
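     A sketch of how the UNSAT core steers backtracking. The solver interface below (solve, in_unsat_core, negate) is hypothetical shorthand for an SMT solver with unsat-core extraction; only the control logic is the point, and it is a simplified reading of the slide, not the exact Hercules algorithm.

        #include <stdbool.h>
        #include <stddef.h>

        typedef struct constraint constraint_t;   /* opaque branch condition */
        bool solve(constraint_t *pc[], size_t n, constraint_t *cc);
        bool in_unsat_core(constraint_t *c);   /* was c blamed by the core? */
        constraint_t *negate(constraint_t *c);

        /* Try PC ∧ CC; while UNSAT, flip the deepest branch condition the
         * core blames and retry, instead of enumerating paths blindly.    */
        bool search_crash_path(constraint_t *pc[], size_t n, constraint_t *cc) {
            while (!solve(pc, n, cc)) {
                size_t blamed = n;            /* deepest core member, if any */
                for (size_t i = 0; i < n; i++)
                    if (in_unsat_core(pc[i]))
                        blamed = i;
                if (blamed == n)
                    return false;             /* empty core: give up         */
                pc[blamed] = negate(pc[blamed]); /* take the other branch    */
                n = blamed + 1;               /* drop deeper constraints     */
            }
            return true;    /* PC ∧ CC SAT: a crash input can be derived     */
        }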
  23. 23. Evaluation (time bound: 24 hrs; per-tool results were shown as icons on the slide):
     Program | Advisory ID | #Seed files | Hercules | Peach | S2E
     WMP 9.0 | CVE-2014-2671 | 10 | | |
     WMP 9.0 | CVE-2010-0718 | 10 | | |
     AR 9.2 | CVE-2010-2204 | 10 | | |
     RP 1.0 | CVE-2010-3000 | 10 | | |
     MP 0.35 | CVE-2011-0502 | 10 | | |
     OV 1.04 | CVE-2010-0688 | 10 | | |
  24. 24. Vulnerabilities in file-processing programs. [Bar chart: #CVE-assigned vulnerabilities in file-processing programs by year, US National Vulnerability Database – 2007: 315; 2008: 399; 2009: 328; 2010: 352; 2011: 304; 2012: 310; 2013: 199; 2014: 203; 2015: 343; 2016: 169 (by 30/8).]
  25. 25. Combining Black-box and White-box Fuzzing. Augmented MoBF (MoBF + transplantation): handles missing data chunks by data chunk transplantation; enforces integrity checks. Selective and targeted whitebox fuzzing: guides data chunk transplantation; explores deep paths; generates specific values causing program crashes. Built on the Peach Fuzzer (production-quality MoBF) and Hercules (ICSE'15); scales to WMP and Adobe Reader.
  26. 26. Combination
  27. 27. Crucial IF. [Diagram: from the test suites, input files that contain the necessary part are selected as crucial IFs; other input files miss that part.]
  28. 28. Experimental Results (per-tool results were shown as icons on the slide):
     Program | Advisory ID | Input Model | #Seed files | Hercules++ | Peach | Hercules
     VLC 2.0.7 | OSVDB-95632 | PNG | 0 – 10 | | |
     VLC 2.0.3 | CVE-2012-5470 | PNG | 0 – 10 | | |
     LTP 1.5.4 | CVE-2011-3328 | PNG | 0 – 10 | | |
     XNV 1.98 | Unknown-1 | PNG | 0 – 10 | | |
     XNV 1.98 | Unknown-2 | PNG | 0 – 10 | | |
     XNV 1.98 | Unknown-3 | PNG | 0 – 10 | | |
     WMP 9.0 | Unknown-4 | WAV | 10 | | |
     WMP 9.0 | CVE-2014-2671 | WAV | 10 | | |
     WMP 9.0 | CVE-2010-0718 | MIDI | 0 – 10 | | |
     AR 9.2 | CVE-2010-2204 | PDF | 10 | | |
     RP 1.0 | CVE-2010-3000 | FLV | 10 | | |
     MP 0.35 | CVE-2011-0502 | MIDI | 0 – 10 | | |
     OV 1.04 | CVE-2010-0688 | ORB | 0 – 10 | | |
  29. 29. Coverage-based Grey-box Fuzzing (AFL, LibFuzzer, …). [Diagram: a test suite feeds an input queue; inputs are dequeued and run through mutators, and interesting mutated files are enqueued again.]
  30. 30. Exposing paths in Grey-Box Fuzzing
  31. 31. Key change – the grey-box fuzzing loop; the key change is the energy assignment in step 8 (a compilable sketch of the loop follows below).
     Input: seed inputs S
     1: T✗ = ∅
     2: T = S
     3: if T = ∅ then
     4:   add empty file to T
     5: end if
     6: repeat
     7:   t = chooseNext(T)
     8:   p = assignEnergy(t)
     9:   for i from 1 to p do
     10:    t′ = mutate_input(t)
     11:    if t′ crashes then
     12:      add t′ to T✗
     13:    else if isInteresting(t′) then
     14:      add t′ to T
     15:    end if
     16:  end for
     17: until timeout reached or abort-signal
     Output: crashing inputs T✗
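     A compilable skeleton of that loop. The fuzzer internals (choose_next, assign_energy, mutate_input, is_interesting, run_target_crashes) are left as stubs; the names are illustrative, not AFL's actual symbols.

        #include <stdbool.h>

        typedef struct input input_t;       /* a test case                */
        typedef struct queue queue_t;       /* the fuzzer's input queue   */

        /* Illustrative stubs for the fuzzer internals. */
        input_t *choose_next(queue_t *q);
        unsigned assign_energy(input_t *t); /* key change: power schedule */
        input_t *mutate_input(input_t *t);
        bool run_target_crashes(input_t *t);
        bool is_interesting(input_t *t);    /* e.g. new coverage observed */
        void enqueue(queue_t *q, input_t *t);
        bool timeout_reached(void);

        void greybox_fuzz(queue_t *queue, queue_t *crashes) {
            do {
                input_t *t = choose_next(queue);
                unsigned p = assign_energy(t);   /* #mutants to derive    */
                for (unsigned i = 0; i < p; i++) {
                    input_t *t2 = mutate_input(t);
                    if (run_target_crashes(t2))
                        enqueue(crashes, t2);    /* save crashing input   */
                    else if (is_interesting(t2))
                        enqueue(queue, t2);      /* keep new-coverage input */
                }
            } while (!timeout_reached());
        }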
  32. 32. Power Schedules
     • Constant: p(i) = α(i)
       o AFL uses this schedule (fuzzing for roughly one minute per input)
       o α(i): how AFL judges fuzzing time for the test exercising path i
     • Cut-off Exponential: p(i) = 0 if f(i) > µ, and p(i) = min( α(i)/β · 2^s(i), M ) otherwise, where β is a constant, s(i) is the number of times the input exercising path i has been chosen for fuzzing, f(i) is the number of generated fuzz inputs exercising path i (path frequency), µ is the mean number of fuzz inputs exercising a discovered path (average path frequency), and M is the maximum energy expendable on a state.
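     A minimal sketch of the cut-off exponential schedule as a C function. Parameter names follow the slide's notation; alpha_i stands in for AFL's baseline energy computation, which is not shown here.

        #include <stdint.h>

        /* p(i) = 0 if f(i) > mu, else min(alpha(i)/beta * 2^s(i), M).  */
        uint32_t cutoff_exponential(uint32_t alpha_i, /* baseline energy   */
                                    uint32_t s_i,  /* #times i was fuzzed  */
                                    uint64_t f_i,  /* #inputs exercising i */
                                    uint64_t mu,   /* mean path frequency  */
                                    uint32_t beta, /* constant damping     */
                                    uint32_t M)    /* energy cap           */
        {
            if (f_i > mu)
                return 0;             /* high-frequency path: no energy    */
            if (s_i > 31)
                return M;             /* 2^s would overflow: saturate      */
            uint64_t energy = ((uint64_t)alpha_i / beta) << s_i;
            return energy < M ? (uint32_t)energy : M;
        }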
  33. 33. Prioritize low-probability paths [CCS'16]
     • Use a grey-box fuzzer which keeps track of the path id for each test.
     • Estimate the probability that fuzzing a test t which exercises path π leads to an input which exercises π′.
     • Give higher weight to discovered low-probability paths, to gravitate towards them -> discover new states in the Markov chain with minimal effort.
     Example – each nested check is only passed with low probability under random mutation:
        void crashme (char* s) {
            if (s[0] == 'b')
                if (s[1] == 'a')
                    if (s[2] == 'd')
                        if (s[3] == '!')
                            abort ();
        }
     Results: 8 CVEs in Binutils (3 new over grey-box fuzzing); finds crashes 7x faster than plain grey-box fuzzing; an independent evaluation found crashes 19x faster on DARPA Cyber Grand Challenge (CGC) binaries.
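     The transition probabilities p(i -> j) can be estimated empirically from fuzzing statistics; a toy sketch with a fixed-size path table (the bound and counters are for illustration only):

        #include <stdint.h>

        #define MAX_PATHS 1024     /* toy bound for illustration          */

        /* n[i][j]: #times fuzzing an input on path i produced an input
         * exercising path j; total[i]: #inputs generated from path i.    */
        static uint64_t n[MAX_PATHS][MAX_PATHS];
        static uint64_t total[MAX_PATHS];

        void record_transition(unsigned i, unsigned j) {
            n[i][j]++;
            total[i]++;
        }

        /* Empirical estimate of the Markov-chain transition probability. */
        double transition_prob(unsigned i, unsigned j) {
            return total[i] ? (double)n[i][j] / (double)total[i] : 0.0;
        }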
  34. 34. Coverage-based Greybox Fuzzing as Markov Chain. [Screenshot: discussion of the work on Hackernews.]
  35. 35. Other works – Crash Bucketing [upcoming work, FASE'17]. Approaches: point-of-failure based, call-stack based, and symbolic-analysis based. The symbolic-analysis approach: • identify the culprit constraint • use the culprit constraint as the “reason” of failure • group failing paths having the same “reason” together. [Diagram: failing paths f1–f4 and branch conditions b1–b5 around a passing path p1, with the culprit constraint marked.]
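     A toy sketch of the bucketing step itself; extracting the culprit constraint requires symbolic execution and is abstracted away here, and the bucket structure and names are invented for illustration.

        #include <stdio.h>
        #include <string.h>

        #define MAX_BUCKETS 64

        /* A bucket groups failing tests sharing one culprit constraint. */
        struct bucket {
            char reason[128];   /* culprit constraint, as a string       */
            int  count;         /* #failing tests under this reason      */
        };

        static struct bucket buckets[MAX_BUCKETS];
        static int nbuckets;

        /* `reason` would be produced by symbolic analysis of the path.  */
        void bucket_failure(const char *reason) {
            for (int i = 0; i < nbuckets; i++) {
                if (strcmp(buckets[i].reason, reason) == 0) {
                    buckets[i].count++;
                    return;
                }
            }
            if (nbuckets < MAX_BUCKETS) {
                snprintf(buckets[nbuckets].reason,
                         sizeof buckets[nbuckets].reason, "%s", reason);
                buckets[nbuckets].count = 1;
                nbuckets++;
            }
        }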
  36. 36. Program | Repository | Size (kLOC) | #Failing Tests | #Clusters (Point-of-Failure) | #Clusters (Stack hash) | #Clusters (Symbolic Analysis)
     mkfifo | Coreutils | 38 | 2 | 1 | 1 | 1
     mkdir | Coreutils | 40 | 2 | 1 | 1 | 1
     mknod | Coreutils | 39 | 2 | 1 | 1 | 1
     md5sum | Coreutils | 43 | 48 | 1 | 1 | 1
     pr | Coreutils | 54 | 6 | 2 | 2 | 4
     ptx | Coreutils | 62 | 3095 | 16 | 1 | 3
     seq | Coreutils | 39 | 72 | 1 | 1 | 18
     paste | Coreutils | 38 | 4510 | 10 | 1 | 3
     touch | Coreutils | 18 | 406 | 2 | 3 | 14
     du | Coreutils | 41 | 100 | 2 | 2 | 8
     cut | Coreutils | 43 | 5 | 1 | 1 | 1
     grep | SIR | 61 | 7122 | 1 | 1 | 11
     gzip | SIR | 44 | 265 | 1 | 1 | 1
     seq | SIR | 57 | 31 | 1 | 1 | 1
     polymorph | BugBench | 25 | 67 | 1 | 1 | 2
     xmail | Exploit-db | 30 | 129 | 1 | 1 | 1
     exim | Exploit-db | 253 | 16 | 1 | 1 | 6
     gpg | Exploit-db | 218 | 2 | 1 | 1 | 1
  37. 37. Recall CGC. “A team of hackers won $2 million by building a machine that could hack better than they could.” Read more at http://www.businessinsider.sg/forallsecure-mayhem-darpa-cyber-grand-challenge-2016-8/#ZuIF7Dmq3aaCAdaq.99 DARPA Cyber Grand Challenge -> automation of security [detecting and fixing vulnerabilities in binaries automatically]
  38. 38. Auto-Patching
  39. 39. Automated Patching • Automated patching – source code and binaries o Vulnerability localization [where to fix] • Hypothesize the error causes – suspects o Symbolic execution [what values should be returned: angelic values] • Specification of the suspicious fragment • Input-output requirements from each test • Repair constraint o Program synthesis [which code can return these values] • Decide the operators which can appear in the fix • Generate a fix by solving the repair constraint. [Diagram: a patching tool takes a buggy program and failing/passing tests, and produces a patched program.]
  40. 40. Example
        int is_upward(int inhibit, int up_sep, int down_sep) {
            int bias;
            if (inhibit)
                bias = down_sep;   // correct: bias = up_sep + 100
            else bias = up_sep;
            if (bias > down_sep)
                return 1;
            else return 0;
        }
     inhibit | up_sep | down_sep | Observed output | Expected output | Result
     1 | 0 | 100 | 0 | 0 | pass
     1 | 11 | 110 | 0 | 1 | fail
     0 | 100 | 50 | 1 | 1 | pass
     1 | -20 | 60 | 0 | 1 | fail
     0 | 0 | 10 | 0 | 0 | pass
  41. 41. Repair Constraint
        int is_upward(int inhibit, int up_sep, int down_sep) {
            int bias;
            if (inhibit)
                bias = f(inhibit, up_sep, down_sep);
            else bias = up_sep;
            if (bias > down_sep)
                return 1;
            else return 0;
        }
     For the failing test inhibit == 1, up_sep == 11, down_sep == 110, symbolic execution yields the repair constraint f(1, 11, 110) > 110.
  42. 42. Conjure up a function • Instead of solving the repair constraint directly: • Select primitive components to be used by the synthesized program, based on complexity • Look for a program that uses only these primitive components and satisfies the repair constraint o Done via another constraint-solving problem – program synthesis • Solving the repair constraint is the key, not how it is solved • Enumerate expressions over a given set of components/operators o Enforce the axioms of the operators o If a candidate repair contains a constant, solve for it using SMT. Repair constraint: f(1,11,110) > 110 ∧ f(1,0,100) ≤ 100 ∧ f(1,-20,60) > 60. (A toy enumeration sketch follows below.)
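     A toy enumerative-synthesis sketch for exactly this repair constraint. The candidate space (expressions of the form argument + constant) and the brute-force constant search are drastically simplified compared to real component-based synthesis, which would hand constants to an SMT solver.

        #include <stdio.h>

        /* Repair constraint from the tests on slide 42:
         * f(1,11,110) > 110, f(1,0,100) <= 100, f(1,-20,60) > 60.        */
        static int holds(int f1, int f2, int f3) {
            return f1 > 110 && f2 <= 100 && f3 > 60;
        }

        int main(void) {
            int tests[3][3] = { {1, 11, 110}, {1, 0, 100}, {1, -20, 60} };
            const char *name[] = { "inhibit", "up_sep", "down_sep" };
            /* Enumerate candidates f = args[k] + c over a constant range. */
            for (int k = 0; k < 3; k++) {
                for (int c = -128; c <= 128; c++) {
                    int v[3];
                    for (int t = 0; t < 3; t++)
                        v[t] = tests[t][k] + c;
                    if (holds(v[0], v[1], v[2])) {
                        printf("candidate: f = %s + %d\n", name[k], c);
                        return 0;   /* prints: f = up_sep + 100           */
                    }
                }
            }
            printf("no candidate in this space\n");
            return 0;
        }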
  43. 43. Patching Tool Released. SemFix: ICSE 2013; Angelix: ICSE 2016. http://angelix.io
  44. 44. Repair-ed. [Bar chart: #fixes produced by Angelix, SPR, and GenProg on wireshark, php, gzip, gmp, libtiff, and overall; y-axis 0–40.]
     Tool | #Fixes | Deletions | Deletion %
     Angelix | 28 | 5 | 18%
     SPR | 31 | 13 | 42%
     Subject | LoC
     wireshark | 2814K
     php | 1046K
     gzip | 491K
     gmp | 145K
     libtiff | 77K
  45. 45. Over-fitting problem in Program Repair • Searching over arbitrary modifications can lead to undesirable program changes such as deletion of functionality. Example of an automatically generated patch – goal of repair tools: make all tests pass; test: passes if non-zero exit status; trivial patch: delete exit(-2):
        static void BadPPM(char* file) {
            fprintf(stderr, "%s: Not a PPM file.\n", file);
            exit(-2);
        }
     ➢ Such modifications should be disallowed. ➢ We derived rules (anti-patterns) that disallow patches causing significant changes to the control flow or data flow of the program. ➢ Benefits of anti-patterns: ○ can be easily integrated with any automated repair tool ○ localizes better ○ generates fixes faster
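     A toy sketch of enforcing one such anti-pattern as a patch filter. The patch representation below is invented for illustration; real implementations inspect the AST or binary rather than statement text.

        #include <stdbool.h>
        #include <string.h>

        /* Invented patch representation: one edit = action + statement. */
        typedef enum { EDIT_DELETE, EDIT_INSERT, EDIT_REPLACE } action_t;
        typedef struct {
            action_t action;
            const char *stmt;    /* textual form of the touched statement */
        } edit_t;

        /* Anti-pattern: reject candidate patches that delete error
         * handling or returns, i.e. "fix" tests by removing functionality. */
        bool violates_antipattern(const edit_t *edits, size_t n) {
            for (size_t i = 0; i < n; i++) {
                if (edits[i].action == EDIT_DELETE &&
                    (strstr(edits[i].stmt, "exit(") ||
                     strstr(edits[i].stmt, "abort(") ||
                     strstr(edits[i].stmt, "return")))
                    return true;
            }
            return false;
        }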
  46. 46. “Latest” Results
     (a) The buggy part of the Heartbleed-vulnerable OpenSSL:
        if (hbtype == TLS1_HB_REQUEST) {
            ...
            memcpy(bp, pl, payload);
            ...
        }
     (b) A fix generated automatically:
        if (hbtype == TLS1_HB_REQUEST
            && payload + 18 < s->s3->rrec.length) {
            ...
        }
     (c) The developer-provided repair:
        if (1 + 2 + payload + 16 > s->s3->rrec.length)
            return 0;
        ...
        if (hbtype == TLS1_HB_REQUEST) {
            ...
        }
        else if (hbtype == TLS1_HB_RESPONSE) {
            ...
        }
        return 0;
     “The Heartbleed Bug is a serious vulnerability in the popular OpenSSL cryptographic software library. This weakness allows stealing the information protected, under normal conditions, by the SSL/TLS encryption used to secure the Internet. SSL/TLS provides communication security and privacy over the Internet for applications such as web, email, instant messaging (IM) and some virtual private networks (VPNs).” --- Source: heartbleed.com
  47. 47. Scalable white-box analysis on binaries.
     How | Why | For whom
     Cluster paths online | Guide search | SW acquisition
     Control symbolic variables | Extract semantics | Developers with 3rd-party code
     Hybrid symbolic file | | COTS system assembly
     Inject path sensitivity into GB | |
     Collaborators: Marcel Boehme, Satish Chandra (Facebook), Sergey Mechtaev, Van Thuan Pham, Mukul Prasad (Fujitsu), Shin Hwei Tan, Jooyong Yi, Hiroaki Yoshida (Fujitsu).
     Relevant papers: http://www.comp.nus.edu.sg/~abhik/projects/Repair/index.html http://www.comp.nus.edu.sg/~abhik/projects/Fuzz/

Editor's Notes

  • This is where the model-based blackbox fuzzing technique comes in.
    The technique has been implemented in well-known tools like Peach Fuzzer and Spike. Basically, the idea is to use an input model (some call it an input grammar) which specifies file-format information such as the data chunk types and data fields. With the support of the input model, the fuzzing tool can generate more valid and semi-valid inputs. As a result, these inputs can lead to deeper program paths and have a better chance of exposing vulnerabilities.
  • The first and most common technique is blackbox fuzzing. It considers the PUT (program under test) a black box and has no information about it. Given a seed input, the tool randomly mutates or modifies parts of the seed file to generate a massive number of new files, feeds them to the program under test, and monitors the program to detect abnormal behaviours like crashes.
    However, since the seed file is randomly mutated, it is very likely that a large portion of the mutated files will be rejected by the parser code, because these files are invalid with respect to the file format.
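    A minimal sketch of this blind mutation loop, assuming a placeholder run_target that executes and monitors the PUT (in practice via fork/exec and waitpid):

        #include <stdio.h>
        #include <stdlib.h>

        /* Placeholder: run the program under test on `buf` and report
         * whether it crashed. */
        extern int run_target(const unsigned char *buf, size_t len);

        /* Blackbox fuzzing: randomly mutate seed bytes and observe.     */
        void blackbox_fuzz(unsigned char *seed, size_t len, int iters) {
            for (int i = 0; i < iters; i++) {
                size_t pos = (size_t)rand() % len;
                unsigned char old = seed[pos];
                seed[pos] = (unsigned char)rand();   /* blind mutation   */
                if (run_target(seed, len))
                    printf("crash with byte %zu mutated\n", pos);
                seed[pos] = old;                     /* restore the seed */
            }
        }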
  • File-processing programs are everywhere.
    Even though these programs are carefully tested, according to the data we collected from the US National Vulnerability Database, in the 10 years since 2007 the NVD has assigned CVE IDs to more than 3000 vulnerabilities found in these programs. The number could be much bigger, because we do not know how many vulnerabilities have been discovered but not reported to the NVD. Several of them may be sold on the black market, where attackers can use them to exploit the affected programs and attack our systems.
    In fact, a large portion of these vulnerabilities has been exposed by crafted files in common media and document formats which we use in daily life, such as MIDI, FLV, PDF and PNG. Because of that, there is a pressing need to design better testing techniques that discover such vulnerabilities effectively and efficiently, before attackers do.
  • Data chunk transplantation is the key idea in our new whitebox fuzzing approach. We call it Model-based Whitebox Fuzzing because it is a combination, with substantial modifications, of model-based blackbox fuzzing and ordinary whitebox fuzzing.
    The model-based blackbox fuzzing side handles the missing-data-chunk problem by implementing the data chunk transplantation idea. Moreover, having the input model, it also enforces the integrity constraints of generated test cases.
    On the other side, the whitebox fuzzing supports data chunk transplantation by providing guidance. I will explain how whitebox fuzzing supports data chunk transplantation in detail in the next few slides. Moreover, whitebox fuzzing does concolic exploration to reach potential crash locations and generates specific values that can cause the program to crash. In terms of implementation, we build our system on top of Peach Fuzzer – a production-quality fuzzer – and Hercules – a selective and targeted whitebox fuzzer.
    Now, let me explain in detail how our system is designed and implemented. First of all, let me explain how the input model is written and how the original version of Peach Fuzzer works. These things are important to fully understand our approach.
  • More satisfying to me as a security researcher than any academic award.
  • Suppose f1 is a failing path. To identify the culprit constraint of f1, our technique explores all paths using a DFS search strategy until it finds the closest passing path p. During the exploration, some new failing paths (f2, f3, f4) and some infeasible paths will be traversed/detected. The branch condition of the branch from which the passing path p deviates is identified as the culprit constraint.
