3. Motivation
Defect detection
External Auditing
Verification of compilation and linkage
Malware variant detection
Increase predictive power of signatures
Most new malware samples are variants of existing malware
Software theft detection
Protection of intellectual property
Automated detection reduces costs of investigation
Plagiarism detection
Provide a deterrent through automated detection
Manual approach not scalable
4. Innovation
This research makes the following contributions:
We propose an online web service, Bugwise, to perform
binary-level defect detection.
We propose an online web service, Simseer, to address
malware variant detection, software theft detection and
plagiarism detection.
We use state-of-the-art algorithms in novel applications.
We implement our services and make them publicly available.
5. Related Work
Defect detection
Formal methods, program analysis, abstract interpretation,
data flow analysis.
Software similarity
Features make a birthmark (fingerprint)
Similarity function comparing birthmarks (Euclidean distance,
cosine similarity, etc.).
Birthmarks
Vectors, strings, sets, trees, graphs etc.
Byte-level content, instructions, basic blocks, control flow, API
calls etc.
Our system uses control flow.
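As a generic illustration of the two similarity functions named above, the sketch below compares birthmarks represented as plain numeric feature vectors. The vector representation and the function names are assumptions made for the example; the slides do not state which similarity function Simseer applies to its control flow birthmarks.

```python
import math


def euclidean_distance(a, b):
    """Distance between two equal-length feature vectors (0 means identical)."""
    if len(a) != len(b):
        raise ValueError("birthmarks must have the same number of features")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1 means same direction)."""
    if len(a) != len(b):
        raise ValueError("birthmarks must have the same number of features")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # An all-zero birthmark has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)
```

Euclidean distance grows as birthmarks diverge, while cosine similarity shrinks, so a system would normally pick one convention and convert the other.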
6. Our Approach
Bugwise and Simseer use a unified backend from our
previous work – Malwise.
We implement two web services using cloud-based
virtual private servers.
Simseer
Uses control flow as a feature to generate a signature (birthmark).
Bugwise
Combines decompilation with traditional data flow analysis to detect
several bug classes.
7. Web Services Workflow
[Figure: workflow diagram with the labels Web Frontend, Script, SSH Tunnel, Scan Server, Scheduler, Script, Malwise, SSH Tunnel (Simseer), SSH Tunnel (Bugwise), Evolutionary Tree Creation, Store and Display Results.]
8. The Web Frontend
Accepts submission of archives and executables.
Implemented in server-side PHP.
PHP launches script to process submitted binary.
Script performs validation.
E.g., filenames have no special characters.
Launches C++ network client to submit binary to scan
server.
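The filename check mentioned above could look roughly like the following sketch. The allowed character set (letters, digits, dot, underscore, hyphen) and the function name are assumptions; the slides only say that filenames must contain no special characters, and they do not say what language the validation script is written in.

```python
import re

# Assumed whitelist: letters, digits, dot, underscore and hyphen only.
_SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_filename(name):
    """Return True if the name uses only whitelisted characters.

    Path separators, spaces and shell metacharacters are rejected because
    they are not in the whitelist; "." and ".." are rejected explicitly.
    """
    if name in (".", ".."):
        return False
    return _SAFE_NAME.fullmatch(name) is not None
```

A whitelist is used rather than a blacklist so that any character not explicitly allowed is rejected before the name reaches a shell command or the network client.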
10. The Scheduling Work Queue
Listens on a TCP port on the scan server.
Connects to web frontend via SSH tunnel.
Accepts binaries from web frontend.
Queues jobs so that only one is running at any time.
Launches Simseer or Bugwise script to process
binary.
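The "one job at a time" behaviour can be sketched with a single worker thread draining a FIFO queue. This is a minimal illustration, not the scheduler's actual implementation: the class name and the handler callback are hypothetical, and the TCP listener and SSH tunnel are left out.

```python
import queue
import threading


class SerialJobQueue:
    """Run submitted jobs strictly one after another, in arrival order."""

    def __init__(self, handler):
        self._jobs = queue.Queue()
        self._handler = handler  # hypothetical callback, e.g. launch a scan script
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, job):
        """Enqueue a job; returns immediately."""
        self._jobs.put(job)

    def wait(self):
        """Block until every submitted job has been processed."""
        self._jobs.join()

    def _run(self):
        while True:
            job = self._jobs.get()
            try:
                self._handler(job)
            except Exception as exc:
                # A failing job must not stop the worker.
                print(f"job failed: {exc}")
            finally:
                self._jobs.task_done()
```

Because only one worker thread exists, at most one handler call is active at any time, which matches the stated constraint without any explicit locking.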
11. Malwise Backend
Malwise is a native C++ application of ~100,000
LOC.
Plugin-based modular system.
Simseer and Bugwise differ by their configuration
and plugins.
Configuration specified in XML.
12. The Simseer Backend
Performs unpacking to remove malware obfuscation.
Decompiles the control flow.
1st pass generates signatures.
2nd pass shows similarity between signatures.
13. The Bugwise Backend
Performs decompilation of local variables.
Performs compiler-style optimisations (dead code
elimination, copy propagation, constant folding etc).
Performs data flow analysis (reaching definitions,
upwards exposed uses etc).
Detects double frees (deallocating the same memory
twice) using the data flow analysis results.
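To make the idea concrete, the sketch below flags a double free on straight-line code: a `free(p)` is reported when an earlier `free(p)` reaches it with no intervening redefinition of `p`. This is a deliberately simplified illustration, not Bugwise's algorithm. The statement encoding and function name are hypothetical, and branches, loops and pointer aliasing are ignored.

```python
def find_double_frees(statements):
    """Report (first_free_addr, second_free_addr, var) for each double free.

    statements: list of (address, op, var) tuples where op is
      "def"  - var is assigned a new value (e.g. the result of malloc)
      "free" - var is passed to free()
    Assumes straight-line code with no aliasing.
    """
    freed_at = {}  # var -> address of the free that reaches the current point
    reports = []
    for address, op, var in statements:
        if op == "def":
            # A new definition kills any earlier free of this variable.
            freed_at.pop(var, None)
        elif op == "free":
            if var in freed_at:
                reports.append((freed_at[var], address, var))
            freed_at[var] = address
        else:
            raise ValueError(f"unknown op: {op}")
    return reports
```

A real analysis would propagate this information across the control flow graph to a fixed point; the single dictionary here stands in for the reaching-definitions sets at each program point.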
15. Simseer Evolutionary Tree Visualization
Phylogenetic tree – e.g. tree of life.
The closer nodes are in the tree, the more similar those
nodes are.
Simseer backend generates distance/similarity matrix.
PHYLIP software package takes matrix and generates
tree.
Tree is rendered to an image.
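PHYLIP's distance-based programs read a plain-text square matrix: the number of taxa on the first line, then one row per taxon with a name padded to 10 characters followed by its distances. The sketch below writes that format. The function name is hypothetical, and the slides do not say which PHYLIP program Simseer invokes or how similarities are converted to distances.

```python
def to_phylip_matrix(names, distances):
    """Format a square distance matrix as PHYLIP input text.

    names:     list of sample labels (truncated or padded to 10 characters)
    distances: list of rows, distances[i][j] = distance between i and j
    """
    n = len(names)
    if len(distances) != n or any(len(row) != n for row in distances):
        raise ValueError("distance matrix must be square and match names")
    lines = [f"{n:5d}"]
    for name, row in zip(names, distances):
        label = name[:10].ljust(10)  # classic PHYLIP names are 10 characters wide
        lines.append(label + " ".join(f"{d:.4f}" for d in row))
    return "\n".join(lines) + "\n"
```

Truncating to 10 characters can make two long sample names collide, so a caller would likely map samples to short unique identifiers first.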
17. Results Processing
Parse XML output from Malwise
PHP parser
Simseer
Display evolutionary tree and similarity matrix
Bugwise
Display table showing addresses of double frees
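The parsing step can be sketched as follows. The service itself uses a PHP parser; Python is used here only for illustration. The XML layout (a `double_free` element with an `address` attribute) is entirely hypothetical, since the slides do not show Malwise's output schema.

```python
import xml.etree.ElementTree as ET


def double_free_addresses(xml_text):
    """Extract reported addresses from a (hypothetical) Bugwise XML report.

    Assumes elements of the form <double_free address="0x..."/> anywhere
    in the document; the real Malwise schema may differ.
    """
    root = ET.fromstring(xml_text)
    return [int(elem.get("address"), 16) for elem in root.iter("double_free")]
```

The resulting list of integers could then be rendered as the table rows described above.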
18. Efficiency of Malwise as a Web Service
Does a web service incur much overhead compared
to command line usage?
Test case is 9 samples submitted to Simseer.
Python script sends samples and waits for results.
We compare the times of command line versus the
web service.
Mean overhead is 0.64 seconds.
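One plausible way to compute such a figure is the mean of the per-sample differences between web service time and command line time. The sketch below assumes paired timings for the same samples; the slides do not state exactly how the mean was derived, and the function name is hypothetical.

```python
from statistics import mean


def mean_overhead(cli_seconds, web_seconds):
    """Mean per-sample overhead of the web service over the command line.

    cli_seconds[i] and web_seconds[i] must be timings for the same sample.
    """
    if len(cli_seconds) != len(web_seconds):
        raise ValueError("timing lists must be paired per sample")
    if not cli_seconds:
        raise ValueError("at least one sample is required")
    return mean(web - cli for cli, web in zip(cli_seconds, web_seconds))
```

Pairing the timings per sample keeps differences in sample size or complexity from dominating the comparison.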
20. Availability
http://www.FooCodeChu.Com
Rate limiting of submissions.
Limits on sample size and on the number of samples per
archive.
We intend to relax these restrictions as we migrate to
more scalable infrastructure.
21. Future Work
Enterprise messaging to perform load balancing and
queuing?
More scan options to exploit the Malwise plugin system.
Any-time clustering to cluster new samples incrementally
in real-time?
Bug detection could be developed into a bug management
system.
22. Conclusion
We make available new services for bug detection and
software similarity.
Our backend Malwise is versatile and allows plugins to
implement these services.
Bugwise has found real bugs in Linux.
The web service overhead is minimal.
We believe web services for these applications will continue to
grow.