Binary Analysis - Luxembourg

Distinguished Lecture at University of Luxembourg S&T center - January 2017

Published in: Education

  1. 1. Binary Analysis for Vulnerability Detection
     National University of Singapore, http://www.comp.nus.edu.sg/~abhik
     Visit to University of Luxembourg S&T center, January 2017.
     Research project with DSO National Labs, 2013-16.
     “TSUNAMi: Trustworthy systems from un-trusted component amalgamations”, National Research Foundation (NRF), 2015-2020.
  2. 2. Singapore: 274 sq. mi., population 5 million, about 12 hours' flight from Luxembourg.
  3. 3. NUS: founded 1905; 9,000 graduate and 23,000 undergraduate students from 88 countries.
  4. 4. Cybersecurity research
     The National Cybersecurity R&D Programme seeks to develop R&D expertise and capabilities in cybersecurity for Singapore. It aims to improve the trustworthiness of cyber infrastructures, with an emphasis on security, reliability, resiliency and usability. Funding of S$130 million over 5 years is available to support research into both technological and human-science aspects of cybersecurity under the following outcome-based R&D themes. The themes are designed to provide an element of operational context without restricting “game-changing” ideas from the community.
     Cybersecurity research spans six themes:
     • Scalable Trustworthy Systems
     • Resilient Systems
     • Effective Situation Awareness and Attack Attribution
     • Combatting Insider Threats
     • Threats Detection, Analysis and Defence
     • Efficient and Effective Digital Forensics
     https://www.nrf.gov.sg/programmes/national-cybersecurity-r-d-programme
  5. 5. Outline
     • NCR project: Trustworthy systems from Un-trusted Components
     • Technical contributions in Binary Analysis
     • Technology showcase
     • Initiatives: Consortium
  6. 6. COTS-integrated Platforms
     Diagram labels: Trustworthy System; Outsourced and Shared Data; Vulnerability; Malicious Behavior; Flaws; Data Breach.
     Binary analysis is of paramount importance for software acquisition or assembly.
  7. 7. Enhancing local capabilities
     Work areas: Vulnerability Discovery, Binary Hardening, Verification, Data Protection.
     • Agency collaboration: DSTA, …
     • Industry collaboration: ST, Symantec, NEC, …
     • Education: NUS (new module)
     • Research outputs: publications, tools, academic collaboration, exchanges, seminars, workshops
  8. 8. Use of research in NRF project
     • Binary analysis
       o Useful to government agencies for procuring software.
       o Deep binary analysis on an evaluation version prior to procurement.
     • Binary hardening
       o Useful to government agencies on procured software.
     • Point technologies from individual work-packages.
  9. 9. Contributions
     • Binary analysis for
       o Fuzz testing
       o Comprehension
       o Debugging
       o Patching (latest work)
     • -> Research program at NUS since 2008, with DRTech, DSO, …
  10. 10. Video
      • https://youtu.be/C1hl_ujw6B0 (1 minute)
      • https://youtu.be/EHBjMSQvIpg (1 minute)
  11. 11. Who cares?
      “A team of hackers won $2 million by building a machine that could hack better than they could.” Read more at http://www.businessinsider.sg/forallsecure-mayhem-darpa-cyber-grand-challenge-2016-8/#ZuIF7Dmq3aaCAdaq.99
      DARPA Cyber Grand Challenge -> Automation of Security [detecting and fixing vulnerabilities in binaries automatically]
  12. 12. Fuzz Testing
      Pioneered by Barton Miller at the University of Wisconsin in 1988.
      And now, in 2016:
      • Springfield Project: fuzzing as a service
      • OSS-Fuzz: continuous fuzzing for open-source projects
  13. 13. A true story: why fuzz?
      • May 4, 2015
        o Abhik was preparing lecture notes on fuzzing.
        o 11:00 AM: finished deciding on the structure and was trying to pick a motivating example for fuzzing to interest the students; there are so many to choose from.
        o 11:11 AM: an email update arrives about the latest incident, an integer overflow in Boeing. It is a classic case where an automated method for sending out malformed or boundary inputs can reveal errors.
  14. 14. (Model-Based) Black-box Fuzzing (presented by Thuan Pham)
      Diagram: an input model and a seed input feed a model-based black-box fuzzer (Peach, Spike, …), which produces mutated inputs. Some mutated inputs pass all checks; others satisfy only some checks.
  15. 15. (Coverage-based) Grey-box Fuzzing, AFLFast (presented by Thuan Pham)
      Diagram: seed inputs enter an input queue; inputs are dequeued and mutated, and “interesting” mutated inputs are put back in the queue (enqueue).
  16. 16. White-box Fuzzing
  17. 17. Problem Statement
      • How to direct the exploration to reach certain locations or targets, or to enhance coverage
        o in large-scale program binaries
        o with highly structured inputs (e.g., multimedia files)
        o given an inadequate test suite or seeds.
  18. 18. Directed Search in White-box Fuzzing
      Applied to the crash reproduction problem. Crash reproduction supports:
      - In-house debugging and fixing
      - Vulnerability checking
  19. 19. Overview
      Inputs: program binary; benign input files; crash information (crash instruction, loaded modules, call stack, register values).
      Hercules Toolset: 1. Directed Search Algorithm 2. Guided Selective Symbolic Execution
      Output: crash input files.
  20. 20. Control Flow Graph Construction: resolve indirect jumps/calls
      Diagram labels: program binaries; IDA Pro (assembly code, direct jumps/calls); jump table extraction and edge profiling (indirect jumps/calls); CFG generator; CFG.
  21. 21. First-cut Analyzer
      • Output of Stage 1: flow structures and input file(s) that can reach the crash module
      • Output of Stage 2: refined CFG, MDG and hybrid symbolic file
      • Output of Stage 3: crash input(s) and crash explanation (based on UNSAT core)
  22. 22. UNSAT-core
      Notation: bx: branch instruction; bcx: branch condition at bx; PC: path condition; CC: crash condition.
      Diagram: branches b1 to b4, each with outcomes bcx and ¬bcx, on the way to the crash instruction.
      First attempt: PC = bc1 ∧ ¬bc3 ∧ bc4. PC ∧ CC is UNSAT; bc1 contradicts CC.
      1) Backtrack to b1 2) Take another branch
      Second attempt: PC’ = ¬bc1 ∧ bc2 ∧ bc4. PC’ ∧ CC is SAT.
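The backtracking step on this slide can be sketched in C under strong simplifying assumptions. This is a toy model, not the Hercules implementation: a real tool obtains the UNSAT core from an SMT solver, whereas here the crash condition is reduced to a table of required branch outcomes. The names `struct decision`, `first_conflict` and `backtrack` are hypothetical.

```c
#include <stddef.h>

/* One decision on the explored path: which branch, and which side was taken. */
struct decision {
    int branch;  /* branch id, e.g. 1 for b1 */
    int taken;   /* 1 if the branch condition held, 0 if its negation held */
};

/*
 * Toy stand-in for an UNSAT core (assumption: the crash condition is a table,
 * indexed by branch id, holding 1, 0, or -1 for "don't care").
 * Returns the index in the path of the first decision that contradicts the
 * crash condition, or -1 if the path is compatible with it.
 */
int first_conflict(const struct decision *path, size_t n, const int *required)
{
    for (size_t i = 0; i < n; i++) {
        int want = required[path[i].branch];
        if (want != -1 && want != path[i].taken)
            return (int)i;
    }
    return -1;
}

/*
 * Backtrack to the conflicting branch and take the other side.
 * Everything after that branch is discarded; returns the new path length.
 */
size_t backtrack(struct decision *path, size_t n, const int *required)
{
    int i = first_conflict(path, n, required);
    if (i < 0)
        return n;                       /* nothing to repair */
    path[i].taken = !path[i].taken;     /* take another branch */
    return (size_t)i + 1;
}
```

With the slide's first attempt (bc1, ¬bc3, bc4) and a crash condition that requires ¬bc1, `first_conflict` points at b1 and `backtrack` flips it, after which exploration would continue from b1.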
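  23. 23. Evaluation (time bound: 24 hrs; tools compared: Hercules, Peach, S2E)

      | Program  | Advisory ID   | #Seed files |
      |----------|---------------|-------------|
      | WMP 9.0  | CVE-2014-2671 | 10          |
      | WMP 9.0  | CVE-2010-0718 | 10          |
      | AR 9.2   | CVE-2010-2204 | 10          |
      | RP 1.0   | CVE-2010-3000 | 10          |
      | MP 0.35  | CVE-2011-0502 | 10          |
      | OV 1.04  | CVE-2010-0688 | 10          |

      The per-tool result marks were graphical and are not preserved here.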
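  24. 24. Vulnerabilities in file-processing programs
      Chart: #CVE-assigned vulnerabilities by year for file-processing programs (US National Vulnerability Database); 2016 counted up to 30/8.

      | Year  | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 |
      |-------|------|------|------|------|------|------|------|------|------|------|
      | #CVEs | 315  | 399  | 328  | 352  | 304  | 310  | 199  | 203  | 343  | 169  |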
  25. 25. Combining Black-box and White-box Fuzzing
      Components:
      • Augmented MoBF (MoBF + Transplantation); Peach Fuzzer: production-quality MoBF
      • Selective and Targeted Whitebox Fuzzing; Hercules (ICSE’15): scales to WMP, Adobe Reader
      Capabilities of the combination:
      • Handles missing data chunks by data chunk transplantation
      • Enforces integrity checks
      • Guides data chunk transplantation
      • Explores deep paths
      • Generates specific values causing program crashes
  26. 26. Combination
  27. 27. Crucial IF
      Diagram labels: input file with necessary part; input file with a missing part; test suites; crucial IFs.
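  28. 28. Experimental Results (tools compared: Hercules++, Peach, Hercules)

      | Program   | Advisory ID   | Input Model | #Seed files |
      |-----------|---------------|-------------|-------------|
      | VLC 2.0.7 | OSVDB-95632   | PNG         | 0 – 10      |
      | VLC 2.0.3 | CVE-2012-5470 | PNG         | 0 – 10      |
      | LTP 1.5.4 | CVE-2011-3328 | PNG         | 0 – 10      |
      | XNV 1.98  | Unknown-1     | PNG         | 0 – 10      |
      | XNV 1.98  | Unknown-2     | PNG         | 0 – 10      |
      | XNV 1.98  | Unknown-3     | PNG         | 0 – 10      |
      | WMP 9.0   | Unknown-4     | WAV         | 10          |
      | WMP 9.0   | CVE-2014-2671 | WAV         | 10          |
      | WMP 9.0   | CVE-2010-0718 | MIDI        | 0 – 10      |
      | AR 9.2    | CVE-2010-2204 | PDF         | 10          |
      | RP 1.0    | CVE-2010-3000 | FLV         | 10          |
      | MP 0.35   | CVE-2011-0502 | MIDI        | 0 – 10      |
      | OV 1.04   | CVE-2010-0688 | ORB         | 0 – 10      |

      The per-tool result marks were graphical and are not preserved here.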
  29. 29. Coverage-based Grey-box Fuzzing (AFL, LibFuzzer, …)
      Diagram: a test suite fills the input queue; inputs are dequeued, passed through mutators, and the mutated files are enqueued again.
  30. 30. Exposing paths in Grey-Box Fuzzing
  31. 31. Key change
      Input: Seed Inputs S
          T✗ = ∅
          T = S
          if T = ∅ then
              add empty file to T
          end if
          repeat
              t = chooseNext(T)
              p = assignEnergy(t)
              for i from 1 to p do
                  t' = mutate_input(t)
                  if t' crashes then
                      add t' to T✗
                  else if isInteresting(t') then
                      add t' to T
                  end if
              end for
          until timeout reached or abort-signal
      Output: Crashing Inputs T✗
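To make the loop on this slide concrete, here is a minimal C sketch run against a toy target. Everything in it is an assumption made for illustration: the target only models the path taken through a nested-check function, `chooseNext` is round-robin, `assignEnergy` is a constant, `isInteresting` means "exercises a path id not seen before", and the loop stops at the first crash rather than collecting all crashing inputs. The names `run_target`, `next_random` and `fuzz` are hypothetical; this is not AFL's or AFLFast's code.

```c
#include <stdint.h>
#include <string.h>

#define INPUT_LEN 4
#define MAX_QUEUE 16
#define CRASH 5   /* pseudo path id reported when the target would abort */

/* Toy target: returns how many leading bytes of "bad!" match (path id 0..3),
 * or CRASH when all four match. It stands in for running an instrumented binary. */
static int run_target(const unsigned char *in)
{
    static const char magic[INPUT_LEN] = {'b', 'a', 'd', '!'};
    int depth = 0;
    while (depth < INPUT_LEN && in[depth] == (unsigned char)magic[depth])
        depth++;
    return depth == INPUT_LEN ? CRASH : depth;
}

/* xorshift32: a small deterministic PRNG so that runs are repeatable. */
static uint32_t rng_state = 2463534242u;
static uint32_t next_random(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state;
}

/* Returns the number of mutated inputs executed up to and including the first
 * crash, or -1 if the budget ran out. On success crash_out holds the input. */
long fuzz(const unsigned char *seed, long budget, int energy, unsigned char *crash_out)
{
    unsigned char queue[MAX_QUEUE][INPUT_LEN];
    int seen[CRASH + 1] = {0};            /* path ids already exercised */
    int queue_len = 1, next = 0;
    long executed = 0;

    memcpy(queue[0], seed, INPUT_LEN);    /* T = S */
    seen[run_target(seed)] = 1;

    while (executed < budget) {           /* repeat ... until timeout */
        const unsigned char *t = queue[next];          /* chooseNext: round-robin */
        next = (next + 1) % queue_len;
        for (int i = 0; i < energy && executed < budget; i++) {  /* constant energy */
            unsigned char mutant[INPUT_LEN];
            memcpy(mutant, t, INPUT_LEN);
            uint32_t pos = (next_random() >> 8) % INPUT_LEN;     /* mutate_input: */
            uint32_t val = (next_random() >> 16) & 0xffu;        /* overwrite one byte */
            mutant[pos] = (unsigned char)val;
            executed++;

            int path = run_target(mutant);
            if (path == CRASH) {
                memcpy(crash_out, mutant, INPUT_LEN);
                return executed;
            }
            if (!seen[path] && queue_len < MAX_QUEUE) {          /* isInteresting */
                seen[path] = 1;
                memcpy(queue[queue_len++], mutant, INPUT_LEN);
            }
        }
    }
    return -1;
}
```

Calling `fuzz` with an all-zero seed, a budget of a few million executions and an energy of 64 reaches the crashing input because each partially matching input is kept in the queue and mutated further.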
  32. 32. Power Schedules
      • Constant: $p(i) = \alpha(i)$
        o AFL uses this schedule (fuzzing ~1 minute).
        o $\alpha(i)$: how AFL judges fuzzing time for the test exercising path $i$.
      • Cut-off Exponential:
        $p(i) = 0$ if $f(i) > \mu$; otherwise $p(i) = \min\left(\frac{\alpha(i)}{\beta} \cdot 2^{s(i)},\ M\right)$
        o $\beta$: a constant
        o $s(i)$: number of times the input exercising path $i$ has been chosen for fuzzing
        o $f(i)$: number of fuzz exercising path $i$ (path frequency)
        o $\mu$: mean number of fuzz exercising a discovered path (average path frequency)
        o $M$: maximum energy expendable on a state
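The two schedules can be written down directly. The sketch below is an illustration under stated assumptions, not AFLFast's source: it uses unsigned integers, so $\alpha(i)/\beta$ is an integer division, and it saturates at $M$ instead of overflowing when $2^{s(i)}$ grows large. The function names are hypothetical.

```c
#include <stdint.h>

/* Constant schedule: p(i) = alpha(i). */
uint64_t energy_constant(uint64_t alpha)
{
    return alpha;
}

/*
 * Cut-off exponential schedule:
 *   p(i) = 0                                 if f(i) > mu
 *   p(i) = min(alpha(i)/beta * 2^s(i), M)    otherwise
 * Assumption: beta is non-zero and alpha/beta is rounded down.
 */
uint64_t energy_cutoff_exponential(uint64_t alpha, uint64_t beta, unsigned s,
                                   uint64_t f, uint64_t mu, uint64_t max_energy)
{
    if (f > mu)
        return 0;                     /* path fuzzed more often than average */
    uint64_t base = alpha / beta;
    if (base == 0)
        return 0;
    /* base * 2^s would exceed M (or overflow): clamp to M. */
    if (s >= 63 || base > (max_energy >> s))
        return max_energy;
    return base << s;                 /* known to be <= M here */
}
```

For example, with alpha = 100, beta = 2 and M = 1000, an input chosen 3 times on a rarely exercised path gets energy 400, one chosen 5 times is capped at 1000, and any input on a path with f(i) above the mean gets 0.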
  33. 33. Prioritize low probability paths [CCS16]
      • Use a grey-box fuzzer which keeps track of the path id for a test.
      • Find the probabilities that fuzzing a test t which exercises path π leads to an input which exercises path π’.
      • Give higher weightage to the low-probability paths discovered, to gravitate to those -> discover new states in the Markov chain with minimal effort.

          void crashme(char* s) {
            if (s[0] == 'b')
              if (s[1] == 'a')
                if (s[2] == 'd')
                  if (s[3] == '!')
                    abort();
          }

      Results: 8 CVEs in Binutils (3 new over GB fuzzing). Finds crashes 7x faster than plain GB fuzzing. An independent evaluation found crashes 19x faster on DARPA Cyber Grand Challenge (CGC) binaries.
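A back-of-the-envelope calculation shows why keeping track of paths matters for `crashme`. The sketch assumes uniformly random bytes and, for the staged case, that each correct byte is retained once found and that the fuzzer mutates exactly the next unchecked byte. Both functions are hypothetical helpers for the arithmetic, not part of any fuzzer.

```c
#include <stdint.h>

/* Expected number of uniformly random guesses until one specific sequence of
 * `bytes` bytes is hit all at once: 256^bytes (mean of a geometric
 * distribution with success probability 256^-bytes). Valid for bytes <= 7. */
uint64_t expected_blind_trials(unsigned bytes)
{
    uint64_t n = 1;
    for (unsigned i = 0; i < bytes; i++)
        n *= 256;
    return n;
}

/* Expected guesses when progress is kept one byte at a time: each of the
 * `bytes` stages needs 256 guesses on average. */
uint64_t expected_staged_trials(unsigned bytes)
{
    return (uint64_t)bytes * 256;
}
```

For the four checks in `crashme` this gives 4294967296 expected inputs for blind generation against 1024 when every newly reached branch is retained, which is the gap a path-aware grey-box fuzzer exploits.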
  34. 34. Coverage-based Greybox Fuzzing as Markov Chain (from Hacker News)
  35. 35. Other works: Crash Bucketing
      Three approaches: point-of-failure based, call-stack based, and symbolic analysis based.
      Symbolic analysis based approach [upcoming work, FASE17]:
      • Identify the culprit constraint
      • Use the culprit constraint as the “reason” of failure
      • Group failing paths having the same “reason” together
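For comparison with the symbolic approach, the call-stack based baseline can be sketched as hashing the top few frames of each crash's call stack and grouping crashes with equal hashes. The hash function (32-bit FNV-1a), the separator byte and the name `stack_hash` are choices made for this illustration and are not taken from the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over the names of the top `depth` frames of a call stack.
 * Crashes whose stacks agree on those frames land in the same bucket. */
uint32_t stack_hash(const char *const *frames, size_t n_frames, size_t depth)
{
    uint32_t h = 2166136261u;                    /* FNV offset basis */
    size_t limit = n_frames < depth ? n_frames : depth;
    for (size_t i = 0; i < limit; i++) {
        for (const char *p = frames[i]; *p; p++) {
            h ^= (unsigned char)*p;
            h *= 16777619u;                      /* FNV prime */
        }
        h ^= 0xffu;                              /* separator between frames */
        h *= 16777619u;
    }
    return h;
}
```

Two crashes that differ only below the hashed depth share a bucket, while a different crashing function yields a different bucket; this is why stack hashing can merge failures that a "reason"-based grouping would separate.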
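  36. 36. Clusters of failing tests produced by each bucketing approach

      | Program   | Repository | Size (kLOC) | #Failing Tests | #Cluster Point-of-Failure | #Cluster Stack hash | #Cluster Symbolic Analysis |
      |-----------|------------|-------------|----------------|---------------------------|---------------------|----------------------------|
      | mkfifo    | Coreutils  | 38          | 2              | 1                         | 1                   | 1                          |
      | mkdir     | Coreutils  | 40          | 2              | 1                         | 1                   | 1                          |
      | mknod     | Coreutils  | 39          | 2              | 1                         | 1                   | 1                          |
      | md5sum    | Coreutils  | 43          | 48             | 1                         | 1                   | 1                          |
      | pr        | Coreutils  | 54          | 6              | 2                         | 2                   | 4                          |
      | ptx       | Coreutils  | 62          | 3095           | 16                        | 1                   | 3                          |
      | seq       | Coreutils  | 39          | 72             | 1                         | 1                   | 18                         |
      | paste     | Coreutils  | 38          | 4510           | 10                        | 1                   | 3                          |
      | touch     | Coreutils  | 18          | 406            | 2                         | 3                   | 14                         |
      | du        | Coreutils  | 41          | 100            | 2                         | 2                   | 8                          |
      | cut       | Coreutils  | 43          | 5              | 1                         | 1                   | 1                          |
      | grep      | SIR        | 61          | 7122           | 1                         | 1                   | 11                         |
      | gzip      | SIR        | 44          | 265            | 1                         | 1                   | 1                          |
      | seq       | SIR        | 57          | 31             | 1                         | 1                   | 1                          |
      | polymorph | BugBench   | 25          | 67             | 1                         | 1                   | 2                          |
      | xmail     | Exploit-db | 30          | 129            | 1                         | 1                   | 1                          |
      | exim      | Exploit-db | 253         | 16             | 1                         | 1                   | 6                          |
      | gpg       | Exploit-db | 218         | 2              | 1                         | 1                   | 1                          |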
  37. 37. Recall CGC
      “A team of hackers won $2 million by building a machine that could hack better than they could.” Read more at http://www.businessinsider.sg/forallsecure-mayhem-darpa-cyber-grand-challenge-2016-8/#ZuIF7Dmq3aaCAdaq.99
      DARPA Cyber Grand Challenge -> Automation of Security [detecting and fixing vulnerabilities in binaries automatically]
  38. 38. Auto-Patching
  39. 39. Automated Patching
      Pipeline: buggy program + failing / passing tests -> patching tool -> patched program.
      • Automated patching: source code and binaries
        o Vulnerability localization [where to fix]
          - Hypothesize the error causes: suspects
        o Symbolic execution [what values should be returned: angelic values]
          - Specification of the suspicious fragment
          - Input-output requirements from each test
          - Repair constraint
        o Program synthesis [which code can return these values]
          - Decide operators which can appear in the fix
          - Generate a fix by solving the repair constraint.
  40. 40. Example

          int is_upward(int inhibit, int up_sep, int down_sep) {
            int bias;
            if (inhibit)
              bias = down_sep;  // bias = up_sep + 100
            else bias = up_sep;
            if (bias > down_sep)
              return 1;
            else return 0;
          }

      | inhibit | up_sep | down_sep | Observed output | Expected output | Result |
      |---------|--------|----------|-----------------|-----------------|--------|
      | 1       | 0      | 100      | 0               | 0               | pass   |
      | 1       | 11     | 110      | 0               | 1               | fail   |
      | 0       | 100    | 50       | 1               | 1               | pass   |
      | 1       | -20    | 60       | 0               | 1               | fail   |
      | 0       | 0      | 10       | 0               | 0               | pass   |
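The table on this slide can be replayed mechanically. The sketch below encodes the buggy function, the version with the fix given in the slide's comment (`bias = up_sep + 100`), and the five tests; the names `is_upward_buggy`, `is_upward_fixed`, `struct test_case` and `count_failures` are introduced here for illustration only.

```c
#include <stddef.h>

/* Buggy version from the slide: bias is set to down_sep when inhibit is set. */
int is_upward_buggy(int inhibit, int up_sep, int down_sep)
{
    int bias = inhibit ? down_sep : up_sep;
    return bias > down_sep;
}

/* Version with the intended assignment noted in the slide's comment. */
int is_upward_fixed(int inhibit, int up_sep, int down_sep)
{
    int bias = inhibit ? up_sep + 100 : up_sep;
    return bias > down_sep;
}

struct test_case { int inhibit, up_sep, down_sep, expected; };

/* The five tests from the slide's table (inputs and expected output). */
static const struct test_case TESTS[] = {
    {1,   0, 100, 0},
    {1,  11, 110, 1},
    {0, 100,  50, 1},
    {1, -20,  60, 1},
    {0,   0,  10, 0},
};

/* Number of tests on which the given implementation disagrees with the
 * expected output. */
size_t count_failures(int (*impl)(int, int, int))
{
    size_t failures = 0;
    for (size_t i = 0; i < sizeof TESTS / sizeof TESTS[0]; i++) {
        const struct test_case *t = &TESTS[i];
        if (impl(t->inhibit, t->up_sep, t->down_sep) != t->expected)
            failures++;
    }
    return failures;
}
```

Running `count_failures` on the buggy version reproduces the two failing rows of the table; on the fixed version all five tests pass.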
  41. 41. Repair Constraint

          int is_upward(int inhibit, int up_sep, int down_sep) {
            int bias;
            if (inhibit)
              bias = f(inhibit, up_sep, down_sep);
            else bias = up_sep;
            if (bias > down_sep)
              return 1;
            else return 0;
          }

      Symbolic execution with inhibit == 1, up_sep == 11, down_sep == 110 yields the constraint f(1,11,110) > 110.
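The idea of an angelic value can be illustrated without a symbolic executor by injecting concrete values at the suspicious assignment and checking which ones make a test produce its expected output. This brute-force search is only a stand-in for symbolic execution, which would return the constraint itself; `is_upward_with_bias` and `find_angelic` are hypothetical names.

```c
/* is_upward with the suspicious right-hand side replaced by an injected value. */
int is_upward_with_bias(int inhibit, int up_sep, int down_sep, int injected)
{
    int bias = inhibit ? injected : up_sep;
    return bias > down_sep;
}

/*
 * Searches [lo, hi] for the smallest value that, when injected at the
 * suspicious assignment, makes the test produce `expected`.
 * Returns 1 and stores the value in *out if one exists, otherwise returns 0.
 * Assumption: hi is well below INT_MAX so the loop counter cannot overflow.
 */
int find_angelic(int inhibit, int up_sep, int down_sep, int expected,
                 int lo, int hi, int *out)
{
    for (int v = lo; v <= hi; v++) {
        if (is_upward_with_bias(inhibit, up_sep, down_sep, v) == expected) {
            *out = v;
            return 1;
        }
    }
    return 0;
}
```

For the failing test (1, 11, 110) with expected output 1, the smallest angelic value in [-200, 200] is 111, which matches the constraint f(1,11,110) > 110.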
  42. 42. Conjure up a function
      Repair constraint: f(1,11,110) > 110 ∧ f(1,0,100) ≤ 100 ∧ f(1,-20,60) > 60
      • Instead of solving the repair constraint directly:
      • Select primitive components to be used by the synthesized program, based on complexity
      • Look for a program that uses only these primitive components and satisfies the repair constraint
        o Done via another constraint solving problem: program synthesis
      • Solving the repair constraint is the key, not how it is solved
      • Enumerate expressions over a given set of components / operators
        o Enforce axioms of the operators
        o If a candidate repair contains a constant, solve using SMT
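A small enumerative search shows how a fix can be found from this repair constraint. The sketch makes several assumptions that are not in the slides: candidates have the shape "one input plus a constant" or "a constant alone", constants are tried by brute force in a bounded range instead of being solved for with SMT, and the first satisfying candidate is returned. `struct io_test`, `struct candidate`, `eval_candidate` and `synthesize` are hypothetical names.

```c
#include <stddef.h>

/* One input-output requirement: with these arguments, f must make
 * (f > down_sep) equal to `expected`. args = {inhibit, up_sep, down_sep}. */
struct io_test { int args[3]; int expected; };

/* Candidate f = args[var] + constant; var == 3 means f = constant alone. */
struct candidate { int var; int constant; };

static int eval_candidate(struct candidate f, const int args[3])
{
    int base = f.var < 3 ? args[f.var] : 0;
    return base + f.constant;
}

/* Enumerates candidates and returns 1 with the first one that satisfies every
 * requirement, or 0 if none does within |constant| <= max_const. */
int synthesize(const struct io_test *tests, size_t n, int max_const,
               struct candidate *out)
{
    for (int var = 0; var <= 3; var++) {
        for (int c = -max_const; c <= max_const; c++) {
            struct candidate f = {var, c};
            size_t ok = 0;
            for (size_t i = 0; i < n; i++) {
                int value = eval_candidate(f, tests[i].args);
                if ((value > tests[i].args[2]) == tests[i].expected)
                    ok++;
            }
            if (ok == n) {
                *out = f;
                return 1;
            }
        }
    }
    return 0;
}
```

On the three requirements of the repair constraint, the only candidate of this shape that survives is `up_sep + 100`: the constant must be at least 100 for the first requirement and at most 100 for the second.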
  43. 43. Patching Tool Released
      SEMFIX: ICSE 2013, Angelix: ICSE 2016, http://angelix.io
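  44. 44. Repair-ed
      Chart (bar values not preserved): Angelix, SPR and GenProg compared on wireshark, php, gzip, gmp, libtiff and overall.

      | Tool    | #Fixes | Del | Del, Per |
      |---------|--------|-----|----------|
      | Angelix | 28     | 5   | 18%      |
      | SPR     | 31     | 13  | 42%      |

      | Subject   | LoC   |
      |-----------|-------|
      | wireshark | 2814K |
      | php       | 1046K |
      | gzip      | 491K  |
      | gmp       | 145K  |
      | libtiff   | 77K   |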
  45. 45. Over-fitting problem in Program Repair
      • Searching for arbitrary modifications could lead to undesirable program modifications, such as deletion of functionality.
      Example of automatically generated patches:

          static void BadPPM(char *file) {
            fprintf(stderr, "%s: Not a PPM file.\n", file);
            exit(-2);
          }

      Goal of repair tools: make all tests pass. Test: pass if non-zero exit status. Trivial patch: delete exit(-2).
      ➢ Such modifications should be disallowed.
      ➢ Derived rules that disallow patches that cause significant changes to the control flow or data flow of the program.
      ➢ Benefits of anti-patterns:
        ○ Can be easily integrated with any automated repair tool
        ○ Localize better
        ○ Generate fixes faster
  46. 46. “Latest” Results
      (a) The buggy part of the Heartbleed-vulnerable OpenSSL:

          if (hbtype == TLS1_HB_REQUEST) {
            ...
            memcpy(bp, pl, payload);
            ...
          }

      (b) A fix generated automatically:

          if (hbtype == TLS1_HB_REQUEST
              && payload + 18 < s->s3->rrec.length) {
            ...
          }

      (c) The developer-provided repair:

          if (1 + 2 + payload + 16 > s->s3->rrec.length)
            return 0;
          ...
          if (hbtype == TLS1_HB_REQUEST) {
            ...
          }
          else if (hbtype == TLS1_HB_RESPONSE) {
            ...
          }
          return 0;

      “The Heartbleed Bug is a serious vulnerability in the popular OpenSSL cryptographic software library. This weakness allows stealing the information protected, under normal conditions, by the SSL/TLS encryption used to secure the Internet. SSL/TLS provides communication security and privacy over the Internet for applications such as web, email, instant messaging (IM) and some virtual private networks (VPNs).” Source: heartbleed.com
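The generated guard in (b) and the developer's guard in (c) can be compared numerically. The sketch below models only the two length conditions with unsigned 32-bit integers and a 16-bit payload, which is an assumption of this illustration; it does not model where the guard sits in the code, and in (c) the check precedes both the request and the response branch. The function names are hypothetical.

```c
#include <stdint.h>

/* Developer's condition, negated: the message is processed only if the record
 * can hold 1 type byte + 2 length bytes + payload + 16 padding bytes. */
int developer_accepts(uint32_t payload, uint32_t record_length)
{
    return !(1 + 2 + payload + 16 > record_length);
}

/* Condition from the automatically generated fix. */
int generated_accepts(uint32_t payload, uint32_t record_length)
{
    return payload + 18 < record_length;
}

/* Checks that both conditions agree for every 16-bit payload at record lengths
 * around the boundary (payload + 0 .. payload + 40). Returns 1 if they agree. */
int guards_agree(void)
{
    for (uint32_t payload = 0; payload <= 0xffffu; payload++) {
        for (uint32_t delta = 0; delta <= 40; delta++) {
            uint32_t len = payload + delta;
            if (developer_accepts(payload, len) != generated_accepts(payload, len))
                return 0;
        }
    }
    return 1;
}
```

Over integers, `payload + 19 <= length` and `payload + 18 < length` are the same condition, so the two guards reject exactly the same over-long payloads in this model.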
  47. 47. Scalable white-box analysis on binaries
      • How: cluster paths online; control symbolic variables; hybrid symbolic file; inject path sensitivity into GB
      • Why: guide search; extract semantics
      • For whom: SW acquisition; developers with 3rd party code; COTS system assembly
      Collaborators: Marcel Boehme, Satish Chandra (Facebook), Sergey Mechtaev, Van Thuan Pham, Mukul Prasad (Fujitsu), Shin Hwei Tan, Jooyong Yi, Hiroaki Yoshida (Fujitsu).
      Relevant papers:
      http://www.comp.nus.edu.sg/~abhik/projects/Repair/index.html
      http://www.comp.nus.edu.sg/~abhik/projects/Fuzz/
