Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection



  1. Simseer and Bugwise: Web Services for Binary-level Software Similarity and Defect Detection
     Silvio Cesare and Yang Xiang, Deakin University
  2. 2. Introduction Defect detection  Finds software bugs  E.g., buffer overflows, divide-by-zeros, use-after-frees Malware variant detection  Discover obfuscated, evolved, mutated copies of malware Software theft detection  Discover illegitimate copies of software Plagiarism detection  Discover unauthorized copying of software code.  E.g., student assignments.
  3. 3. Motivation Defect detection  External Auditing  Verification of compilation and linkage Malware variant detection  Increase predictive power of signatures  Most new malware are variants of existing malware Software theft detection  Protection of intellectual property  Automated detection reduces costs of investigation Plagiarism detection  Provide a deterrent through automated detection  Manual approach not scalable
  4. 4. Innovation This research makes the following contributions:  We propose an online web service, Bugwise, to perform binary-level defect detection.  We propose an online web service, Simseer, to address malware variant detection, software theft detection and plagiarism detection.  We use state-of-the-art algorithms in novel applications.  We implement and make our services public
  5. 5. Related Work Defect detection  Formal methods, program analysis, abstract interpretation, data flow analysis. Software similarity  Features make a birthmark (fingerprint)  Similarity function comparing birthmarks (euclidean distance, cosine similarity etc). Birthmarks  Vectors, strings, sets, trees, graphs etc.  Byte-level content, instructions, basic blocks, control flow, API calls etc.  Our system uses control flow.
  6. 6. Our Approach Bugwise and Simseer use a unified backend from our previous work – Malwise. We implement two web services using cloud-based virtual private servers. Simseer  Uses control flow as a feature to generate a signature (birthmark). Bugwise  Combines decompilation with traditional data flow analysis to detect several bug classes.
  7. Web Services Workflow
     [Workflow diagram. Components: web frontend, scan server script, SSH tunnel, scheduler script, Malwise, evolutionary tree creation, SSH tunnels for Simseer and Bugwise, and storing and displaying results.]
  8. 8. The Web Frontend Accepts submission of archives and executables. Implemented with server side PHP programming language. PHP launches script to process submitted binary. Script performs validation.  E.g., Filenames have no special characters. Launches C++ network client to submit binary to scan server.
  9. The Web Frontend
  10. The Scheduling Work Queue
      - Listens on a TCP port on the scan server.
      - Connects to the web frontend via an SSH tunnel.
      - Accepts binaries from the web frontend.
      - Queues jobs so that only one runs at any time.
      - Launches the Simseer or Bugwise script to process each binary.
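The "only one job at a time" policy can be sketched as a FIFO queue drained by a single worker thread. This is an illustrative Python sketch, not the authors' scheduler, and it omits the TCP listener and SSH tunnel. Each job is assumed to be a zero-argument callable that would, in practice, launch the Simseer or Bugwise script.

```python
import queue
import threading
import traceback


class SerialJobQueue:
    """Runs submitted jobs one at a time, in arrival order, on a single worker thread."""

    def __init__(self):
        self._jobs = queue.Queue()
        # A single daemon worker guarantees that at most one job runs at any time.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, job):
        """Enqueue a zero-argument callable (e.g. one that launches a scan script)."""
        self._jobs.put(job)

    def wait(self):
        """Block until every job submitted so far has finished."""
        self._jobs.join()

    def _run(self):
        while True:
            job = self._jobs.get()
            try:
                job()
            except Exception:
                # Keep the worker alive if one scan fails; report it and move on.
                traceback.print_exc()
            finally:
                self._jobs.task_done()
```

Serialising jobs trades throughput for predictable resource use on a single scan server. The future-work item on enterprise messaging for load balancing and queuing would replace this with several workers consuming from a shared broker.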
  11. 11. Malwise Backend Malwise is a native C++ application of ~100,000 LOC. Plugin-based modular system. Simseer and Bugwise differ by their configuration and plugins. Configuation specified in XML.
  12. The Simseer Backend
      - Performs unpacking to remove malware obfuscation.
      - Decompiles the control flow.
      - The first pass generates signatures.
      - The second pass computes the similarity between signatures.
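The two-pass structure can be sketched as follows, under assumptions that are ours rather than the slides': each program's decompiled control flow is already a sequence of structure tokens (for example, `W` for a loop or `I` for an if), the signature is the set of token n-grams (echoing the `NGram Structuring` module name), and similarity is the Jaccard index. Simseer's actual signature and similarity functions may differ.

```python
from itertools import combinations


def ngram_signature(tokens, n=3):
    """Pass 1: turn a structured control-flow token sequence into a set of n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(programs, n=3):
    """Pass 2: compare every pair of signatures.

    programs: dict mapping sample name -> list of control-flow tokens.
    Returns (sorted names, nested dict of pairwise similarities).
    """
    names = sorted(programs)
    sigs = {name: ngram_signature(programs[name], n) for name in names}
    matrix = {name: {name: 1.0} for name in names}
    for x, y in combinations(names, 2):
        s = jaccard(sigs[x], sigs[y])
        matrix[x][y] = matrix[y][x] = s  # the matrix is symmetric
    return names, matrix
```

Separating signature generation from comparison means each binary is unpacked and decompiled once, while the cheap set comparisons run over all pairs.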
  13. The Bugwise Backend
      - Performs decompilation of local variables.
      - Performs compiler-style optimisations (dead code elimination, copy propagation, constant folding, etc.).
      - Performs data flow analysis (reaching definitions, upwards exposed uses, etc.).
      - Detects double frees (deallocating the same memory twice) using the data flow analysis results.
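As an illustration of how data flow results can expose double frees, the sketch below runs a forward "may-be-freed" analysis over a toy control flow graph. This is a simpler stand-in for the reaching-definitions and upwards-exposed-uses analyses the slide names, not Bugwise's algorithm. The IR is hypothetical: each block is a list of `("free", var)` or `("assign", var)` statements, and an assignment gives the pointer a fresh value.

```python
def find_double_frees(blocks, succs):
    """Report frees of a variable that may already have been freed on some path.

    blocks: block name -> list of (op, var), op in {"free", "assign"}
    succs:  block name -> list of successor block names
    Returns a sorted list of (block, statement index, var).
    """
    preds = {b: [] for b in blocks}
    for b, ss in succs.items():
        for s in ss:
            preds[s].append(b)

    def transfer(block, freed):
        freed = set(freed)
        for op, var in blocks[block]:
            if op == "free":
                freed.add(var)
            elif op == "assign":
                freed.discard(var)  # a fresh value kills the "freed" fact
        return freed

    # Iterate to a fixed point; merging predecessors by union makes this a "may" analysis.
    out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_out = transfer(b, set().union(*(out[p] for p in preds[b])))
            if new_out != out[b]:
                out[b] = new_out
                changed = True

    # Replay each block from its fixed-point input and flag repeated frees.
    reports = []
    for b in blocks:
        freed = set().union(*(out[p] for p in preds[b]))
        for i, (op, var) in enumerate(blocks[b]):
            if op == "free":
                if var in freed:
                    reports.append((b, i, var))
                freed.add(var)
            elif op == "assign":
                freed.discard(var)
    return sorted(reports)
```

Using union at join points reports a double free if it is possible on any path. Using intersection instead would report only frees that happen on every path, which gives fewer false positives but misses path-dependent bugs.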
  14. Configuration: Simseer and Bugwise

      Simseer:
      <ModuleGroup>
        <Name>Scan</Name>
        <Run>Packer Detection Using Entropy</Run>
        <Run>Unpacker Using Application Level Emulation</Run>
        <Run>Structuring</Run>
        <Run>NGram Structuring</Run>
      </ModuleGroup>

      Bugwise:
      <ModuleGroup>
        <Name>Scan</Name>
        <Run>Code Optimsation 1</Run>
        <Run>Linux Arch</Run>
        <Run>Pre Decompiler Data Flow Analysis</Run>
        <Run>X86 Decompiler Data Flow Analysis</Run>
        <Run>Decompiler Data Flow Analysis</Run>
        <Run>Code Optimsation 2</Run>
        <Run>IRDataFlowAnalysis</Run>
        <Run>Double Free Detection</Run>
      </ModuleGroup>
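A configuration like this can drive a plugin pipeline by running each `<Run>` module in order. The sketch below parses a `<ModuleGroup>` with Python's standard `xml.etree.ElementTree`. The plugin registry, the shared `state` dictionary and both function names are hypothetical; Malwise itself is C++ and its plugin interface is not described in the slides.

```python
import xml.etree.ElementTree as ET


def load_module_group(xml_text):
    """Return (group name, ordered list of module names) from a <ModuleGroup> element."""
    root = ET.fromstring(xml_text)
    name = root.findtext("Name")
    modules = [(run.text or "").strip() for run in root.findall("Run")]
    return name, modules


def run_pipeline(xml_text, registry, binary):
    """Run each configured plugin in order.

    registry: hypothetical mapping of module name -> callable(state).
    state: a dict the plugins share, e.g. to pass unpacked code to later stages.
    """
    _, modules = load_module_group(xml_text)
    state = {"binary": binary, "log": []}
    for module in modules:
        plugin = registry.get(module)
        if plugin is None:
            raise KeyError(f"no plugin registered for {module!r}")
        plugin(state)
    return state
```

Under this design, adding a service means registering new plugins and writing a new XML file, which matches the slide's point that Simseer and Bugwise differ in configuration and plugins.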
  15. Simseer Evolutionary Tree Visualization
      - A phylogenetic tree, like the tree of life, shows relationships between samples: the closer two nodes are in the tree, the more similar they are.
      - The Simseer backend generates a distance/similarity matrix.
      - The PHYLIP software package takes the matrix and generates the tree.
      - The tree is rendered to an image.
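PHYLIP's distance-matrix programs read a square matrix whose first line gives the number of taxa. Each following row starts with the taxon name in the first 10 columns, followed by the distances. The sketch below converts a similarity matrix into that format, assuming distance = 1 − similarity. That conversion and the function name are assumptions; the slides do not say how Simseer derives distances.

```python
def to_phylip_distance_matrix(names, similarity):
    """Render a square PHYLIP distance matrix using distance = 1 - similarity.

    names: ordered list of sample names.
    similarity: nested dict, similarity[x][y] in [0, 1].
    PHYLIP reads each taxon name from the first 10 columns of a row, so
    longer names are truncated and shorter ones padded with spaces.
    """
    lines = [f"{len(names):5d}"]
    for x in names:
        label = x[:10].ljust(10)
        row = " ".join(f"{1.0 - similarity[x][y]:.4f}" for y in names)
        lines.append(f"{label} {row}")
    return "\n".join(lines) + "\n"
```

The output can be saved as `infile`, the default input file name for PHYLIP programs, and passed to a tree-building program such as `neighbor` (neighbor-joining or UPGMA). Because names are truncated to 10 characters, samples should be given short, unique labels beforehand.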
  16. Program Relationships Visualization
  17. Results Processing
      - A PHP parser processes the XML output from Malwise.
      - Simseer: displays the evolutionary tree and the similarity matrix.
      - Bugwise: displays a table showing the addresses of double frees.
  18. Efficiency of Malwise as a Web Service
      - Does a web service incur much overhead compared to command-line usage?
      - Test case: 9 samples submitted to Simseer.
      - A Python script sends the samples and waits for the results.
      - We compare the command-line times with the web service times.
      - The mean overhead is 0.64 seconds.
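The overhead measurement can be sketched as timing the same samples through both paths and averaging the per-sample difference. The helpers below are illustrative. The functions that would actually submit a sample to the web service or run Malwise from the command line are not shown in the slides and would have to be supplied by the reader.

```python
import statistics
import time


def time_call(fn, *args):
    """Wall-clock seconds taken by fn(*args), measured with a monotonic high-resolution clock."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def mean_overhead(web_times, cli_times):
    """Mean per-sample difference between web service and command-line processing times."""
    if len(web_times) != len(cli_times) or not web_times:
        raise ValueError("need matching, non-empty timing lists")
    return statistics.mean([w - c for w, c in zip(web_times, cli_times)])
```

Pairing the timings per sample removes the effect of samples that are simply slow to analyse, so the mean reflects only the cost added by the web path (upload, queuing, SSH tunnelling and result parsing).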
  19. Processing Times: Simseer Web Service (left) vs. Malwise Command Line (right)
  20. Availability
      - http://www.FooCodeChu.Com
      - Submissions are rate limited.
      - Sample sizes and the number of samples per archive are limited.
      - We intend to relax these restrictions as we migrate to more scalable infrastructure.
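The slides do not describe how submissions are rate limited. One common approach is a per-client sliding window, sketched below. The class name, the use of a client identifier such as an IP address, and the injectable clock (used here to make testing easy) are all assumptions.

```python
import time
from collections import defaultdict, deque


class SubmissionRateLimiter:
    """Allow at most max_submissions per client within a sliding window of window_seconds."""

    def __init__(self, max_submissions, window_seconds, clock=time.monotonic):
        self.max_submissions = max_submissions
        self.window_seconds = window_seconds
        self.clock = clock
        # client id -> timestamps of that client's accepted submissions, oldest first
        self._history = defaultdict(deque)

    def allow(self, client_id):
        """Record and accept a submission if the client is under its limit."""
        now = self.clock()
        history = self._history[client_id]
        # Drop timestamps that have left the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_submissions:
            return False
        history.append(now)
        return True
```

A monotonic clock is used so that changes to the system clock cannot reset or extend a client's window.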
  21. Future Work
      - Enterprise messaging to perform load balancing and queuing?
      - More scan options to exploit the Malwise plugin system.
      - Any-time clustering to cluster new samples incrementally in real time?
      - Bug detection could be developed into a bug management system.
  22. Conclusion
      - We make new services for bug detection and software similarity publicly available.
      - Our backend, Malwise, is versatile and allows plugins to implement these services.
      - Bugwise has found real bugs in Linux.
      - The web service overhead is minimal.
      - We believe web services in these applications will grow in the future.