Simseer and Bugwise - Web Services for Binary-level Software Similarity and Defect Detection
Presentation Transcript

  • Simseer and Bugwise: Web Services for Binary-level Software Similarity and Defect Detection
      • Silvio Cesare and Yang Xiang, Deakin University
  • Introduction
      • Defect detection: finds software bugs, e.g., buffer overflows, divide-by-zeros, use-after-frees.
      • Malware variant detection: discovers obfuscated, evolved, or mutated copies of malware.
      • Software theft detection: discovers illegitimate copies of software.
      • Plagiarism detection: discovers unauthorized copying of software code, e.g., in student assignments.
  • Motivation
      • Defect detection: external auditing; verification of compilation and linkage.
      • Malware variant detection: increases the predictive power of signatures; most new malware are variants of existing malware.
      • Software theft detection: protects intellectual property; automated detection reduces the cost of investigation.
      • Plagiarism detection: automated detection provides a deterrent; the manual approach is not scalable.
  • Innovation
      • This research makes the following contributions:
      • We propose an online web service, Bugwise, to perform binary-level defect detection.
      • We propose an online web service, Simseer, to address malware variant detection, software theft detection, and plagiarism detection.
      • We use state-of-the-art algorithms in novel applications.
      • We implement our services and make them public.
  • Related Work
      • Defect detection: formal methods, program analysis, abstract interpretation, data flow analysis.
      • Software similarity: features make up a birthmark (fingerprint); a similarity function compares birthmarks (Euclidean distance, cosine similarity, etc.).
      • Birthmarks: represented as vectors, strings, sets, trees, graphs, etc.; built from byte-level content, instructions, basic blocks, control flow, API calls, etc.
      • Our system uses control flow.
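The two similarity functions named on the Related Work slide can be sketched as follows. This is only an illustration on made-up feature vectors; the slides do not say which function Simseer uses, and the vectors below are hypothetical, not real birthmarks.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an all-zero birthmark
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical birthmarks: counts of three invented features per program.
original = [4, 0, 2]
variant = [4, 1, 2]
unrelated = [0, 5, 0]

print(cosine_similarity(original, variant), cosine_similarity(original, unrelated))
print(euclidean_distance(original, variant), euclidean_distance(original, unrelated))
```

A variant should score a higher cosine similarity and a lower Euclidean distance against the original than an unrelated program does.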
  • Our Approach
      • Bugwise and Simseer use a unified backend from our previous work, Malwise.
      • We implement two web services using cloud-based virtual private servers.
      • Simseer uses control flow as a feature to generate a signature (birthmark).
      • Bugwise combines decompilation with traditional data flow analysis to detect several bug classes.
  • Web Services Workflow
      • [Diagram; labels as extracted: Web Frontend, Script, SSH Tunnel, Scan Server, Scheduler, Script (Simseer), Script (Bugwise), Malwise, Evolutionary Tree Creation, Store and Display Results]
  • The Web Frontend Accepts submission of archives and executables. Implemented with server side PHP programming language. PHP launches script to process submitted binary. Script performs validation.  E.g., Filenames have no special characters. Launches C++ network client to submit binary to scan server.
  • The Web Frontend
  • The Scheduling Work Queue
      • Listens on a TCP port on the scan server.
      • Connects to the web frontend via an SSH tunnel.
      • Accepts binaries from the web frontend.
      • Queues jobs so that only one is running at any time.
      • Launches the Simseer or Bugwise script to process the binary.
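The "only one job at a time" behaviour can be sketched with a single worker thread draining a FIFO queue. This is a minimal sketch, not the scheduler described in the slides: the TCP listener and SSH tunnel are omitted, and `process_all` and `handler` are hypothetical names standing in for the step that launches the Simseer or Bugwise script.

```python
import queue
import threading


def run_serial_worker(jobs, handler, results):
    """Process jobs strictly one at a time until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                break
            results.append(handler(job))
        finally:
            jobs.task_done()


def process_all(submissions, handler):
    """Queue every submission and let a single worker handle them in order."""
    jobs = queue.Queue()
    results = []
    worker = threading.Thread(target=run_serial_worker, args=(jobs, handler, results))
    worker.start()
    for submission in submissions:
        jobs.put(submission)
    jobs.put(None)  # sentinel: no more work
    worker.join()
    return results


print(process_all(["a.exe", "b.exe", "c.exe"], str.upper))
```

Because there is exactly one consumer, submissions that arrive while a scan is running simply wait in the queue, which is the property the slide describes.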
  • Malwise Backend
      • Malwise is a native C++ application of ~100,000 LOC.
      • Plugin-based modular system.
      • Simseer and Bugwise differ by their configuration and plugins.
      • Configuration is specified in XML.
  • The Simseer Backend
      • Performs unpacking to remove malware obfuscation.
      • Decompiles the control flow.
      • The first pass generates signatures.
      • The second pass shows similarity between signatures.
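The two-pass idea (signatures first, pairwise similarity second) can be sketched as below. Everything specific here is an assumption: the string encoding of structured control flow, the use of character n-grams, and the multiset Jaccard measure are illustrative choices, not Malwise's actual signature format or similarity function.

```python
from collections import Counter


def ngram_signature(structured, n=3):
    """Pass 1: turn a control-flow string into a multiset of n-grams."""
    return Counter(structured[i:i + n] for i in range(len(structured) - n + 1))


def similarity(sig_a, sig_b):
    """Multiset Jaccard: shared n-gram count over combined n-gram count."""
    shared = sum((sig_a & sig_b).values())
    total = sum((sig_a | sig_b).values())
    return shared / total if total else 1.0


def similarity_matrix(programs, n=3):
    """Pass 2: compare every signature with every other signature."""
    names = sorted(programs)
    signatures = {name: ngram_signature(programs[name], n) for name in names}
    matrix = [[similarity(signatures[a], signatures[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical encodings: W = while, I = if, E = else, R = return.
samples = {
    "original": "W{I{}E{}}R",
    "variant": "W{I{}E{R}}R",
    "unrelated": "I{R}",
}
names, matrix = similarity_matrix(samples)
for name, row in zip(names, matrix):
    print(name, [round(value, 2) for value in row])
```

Separating the passes means a signature is computed once per sample and reused for every comparison.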
  • The Bugwise Backend
      • Performs decompilation of local variables.
      • Performs compiler-style optimisations (dead code elimination, copy propagation, constant folding, etc.).
      • Performs data flow analysis (reaching definitions, upwards exposed uses, etc.).
      • Detects double frees (deallocating the same memory twice) using the data flow analysis results.
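To illustrate how data flow results can expose a double free, here is a small forward "may be freed" analysis over a control flow graph. It is a simplified stand-in, not the Bugwise algorithm: the two-operation IR (`def`, `free`), the function name `find_double_frees`, and the absence of alias handling are all assumptions of this sketch.

```python
def find_double_frees(blocks, successors, entry):
    """Report frees of a variable that may already be freed on some path.

    blocks:     {block_name: [(op, var), ...]} with op in {"def", "free"}
    successors: {block_name: [successor_name, ...]}
    Returns a sorted list of (block_name, statement_index, var).
    """
    freed_in = {name: set() for name in blocks}  # may-be-freed set at block entry
    visited = set()
    reports = set()
    worklist = [entry]
    while worklist:
        name = worklist.pop()
        visited.add(name)
        freed = set(freed_in[name])
        for index, (op, var) in enumerate(blocks[name]):
            if op == "free":
                if var in freed:
                    reports.add((name, index, var))
                freed.add(var)
            elif op == "def":
                freed.discard(var)  # a new definition ends the freed state
        for succ in successors.get(name, []):
            merged = freed_in[succ] | freed  # join over paths is set union
            if merged != freed_in[succ] or succ not in visited:
                freed_in[succ] = merged
                if succ not in worklist:
                    worklist.append(succ)
    return sorted(reports)


# Hypothetical function: p is freed in one branch and again after the join.
blocks = {
    "entry": [("def", "p")],
    "then": [("free", "p")],
    "else": [],
    "exit": [("free", "p")],
}
successors = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"]}
print(find_double_frees(blocks, successors, "entry"))
```

Because the join is a union, the analysis flags a free that is a double free on at least one path, which is why the free in `exit` is reported even though the `else` path is safe.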
  • Configuration - Simseer and Bugwise
    Simseer:
        <ModuleGroup>
            <Name>Scan</Name>
            <Run>Packer Detection Using Entropy</Run>
            <Run>Unpacker Using Application Level Emulation</Run>
            <Run>Structuring</Run>
            <Run>NGram Structuring</Run>
        </ModuleGroup>
    Bugwise:
        <ModuleGroup>
            <Name>Scan</Name>
            <Run>Code Optimsation 1</Run>
            <Run>Linux Arch</Run>
            <Run>Pre Decompiler Data Flow Analysis</Run>
            <Run>X86 Decompiler Data Flow Analysis</Run>
            <Run>Decompiler Data Flow Analysis</Run>
            <Run>Code Optimsation 2</Run>
            <Run>IRDataFlowAnalysis</Run>
            <Run>Double Free Detection</Run>
        </ModuleGroup>
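A configuration of this shape can be read with Python's standard `xml.etree.ElementTree`, as sketched below. The embedded text assumes that the Simseer column of the slide holds the four unpacking and structuring modules; `load_module_group` is a hypothetical helper, and Malwise itself is a C++ application whose real loader is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Simseer module list as read from the slide (column assignment is an assumption).
SIMSEER_CONFIG = """
<ModuleGroup>
    <Name>Scan</Name>
    <Run>Packer Detection Using Entropy</Run>
    <Run>Unpacker Using Application Level Emulation</Run>
    <Run>Structuring</Run>
    <Run>NGram Structuring</Run>
</ModuleGroup>
"""


def load_module_group(xml_text):
    """Return (group name, ordered list of module names) from one ModuleGroup."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "ModuleGroup":
        raise ValueError("expected a <ModuleGroup> root element")
    name = (root.findtext("Name") or "").strip()
    runs = [run.text.strip() for run in root.findall("Run") if run.text]
    return name, runs


group, modules = load_module_group(SIMSEER_CONFIG)
print(group)
for position, module in enumerate(modules, start=1):
    print(position, module)
```

The order of the `<Run>` elements is preserved, which matters if, as the slide suggests, modules run in the sequence listed.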
  • Simseer Evolutionary Tree Visualization
      • Phylogenetic tree, e.g., the tree of life.
      • The closer two nodes are in the tree, the more similar they are.
      • The Simseer backend generates a distance/similarity matrix.
      • The PHYLIP software package takes the matrix and generates a tree.
      • The tree is rendered to an image.
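The hand-off from a similarity matrix to PHYLIP might be sketched as follows. The output follows PHYLIP's square distance-matrix layout (a count line, then one row per sample with the name in the first ten characters). Converting similarity to distance as `1 - similarity` is an assumption, and the slides do not say which PHYLIP program Simseer runs; a distance-based one such as `neighbor` would accept this input.

```python
def to_phylip(names, similarity):
    """Format a square similarity matrix as a PHYLIP distance matrix."""
    if len(names) != len(similarity) or any(len(row) != len(names) for row in similarity):
        raise ValueError("matrix must be square and match the number of names")
    labels = [name[:10].ljust(10) for name in names]  # PHYLIP names are 10 characters
    if len(set(labels)) != len(labels):
        raise ValueError("names must be unique in their first 10 characters")
    lines = [f"{len(names):5d}"]
    for label, row in zip(labels, similarity):
        distances = " ".join(f"{1.0 - value:.4f}" for value in row)  # assumed conversion
        lines.append(f"{label} {distances}")
    return "\n".join(lines) + "\n"


# Hypothetical similarity values for three samples.
print(to_phylip(
    ["sample_a", "sample_b", "sample_c"],
    [[1.0, 0.75, 0.5], [0.75, 1.0, 0.25], [0.5, 0.25, 1.0]],
))
```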
  • Program Relationships Visualization
  • Results Processing
      • Parse XML output from Malwise (PHP parser).
      • Simseer: display the evolutionary tree and similarity matrix.
      • Bugwise: display a table showing the addresses of double frees.
  • Efficiency of Malwise as a Web Service
      • Does a web service incur much overhead compared to command line usage?
      • The test case is 9 samples submitted to Simseer.
      • A Python script sends the samples and waits for results.
      • We compare the times of the command line versus the web service.
      • Mean overhead is 0.64 seconds.
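The overhead figure is a mean of per-sample differences between the web service time and the command line time. A minimal sketch of that arithmetic follows; the timings in it are placeholders chosen for easy checking, not the measurements behind the 0.64 second result, and `mean_overhead` is a hypothetical helper name.

```python
from statistics import mean


def mean_overhead(web_seconds, cli_seconds):
    """Mean of (web service time - command line time) over paired samples."""
    if not web_seconds or len(web_seconds) != len(cli_seconds):
        raise ValueError("need the same non-zero number of timings for both runs")
    return mean([web - cli for web, cli in zip(web_seconds, cli_seconds)])


# Placeholder timings in seconds (NOT the measurements from the talk).
web = [2.0, 3.5, 1.5]
cli = [1.5, 3.0, 1.0]
print(mean_overhead(web, cli))
```

Pairing each sample's two timings before averaging keeps the comparison per sample, so one slow sample does not dominate the picture in only one of the two runs.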
  • Processing Times: Simseer Web Service (left), Malwise Command Line (right)
  • Availability
      • http://www.FooCodeChu.Com
      • Rate limiting of submissions.
      • Limits on sample sizes and on the number of samples in archives.
      • We intend to relax these restrictions as we migrate to more scalable infrastructure.
  • Future Work
      • Enterprise messaging to perform load balancing and queuing?
      • More scan options to exploit the Malwise plugin system.
      • Any-time clustering to cluster new samples incrementally in real time?
      • Bug detection could be developed into a bug management system.
  • Conclusion
      • We make available new services for bug detection and software similarity.
      • Our backend, Malwise, is versatile and allows plugins to implement these services.
      • Bugwise has found real bugs in Linux.
      • The web service overhead is minimal.
      • We believe web services in these applications will have future growth.