Simseer - A Software Similarity Web Service

  1. Silvio Cesare, Deakin University, [email_address]
  2. Who am I and where did this talk come from?
     - PhD student at Deakin University.
     - Research focus includes malware detection and automated vulnerability detection.
     - Software similarity is the focus of this talk.
     - This talk is an overview of the core topics, how software similarity is approached in academia, and a web service that identifies software similarity.
  3. Introduction
     - Many applications of software similarity and classification:
       - Malware Detection
       - Software Theft Detection
       - Plagiarism Detection
       - Software Clone Detection
  4. Problem Formulation
     - Extract features, fingerprints, or 'birthmarks' from programs p and q.
     - If birthmark(p) is similar to birthmark(q), then programs p and q are considered similar.
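A minimal sketch of this formulation, with the birthmark extractor and the similarity function passed in as parameters. The concrete choices here (a birthmark that is the set of API calls a program makes, Jaccard similarity, a 0.5 threshold) and the example call lists are hypothetical, picked only so the sketch runs.

```python
from collections.abc import Callable


def api_call_birthmark(api_calls: list[str]) -> frozenset[str]:
    """Hypothetical birthmark: the distinct API calls a program makes."""
    return frozenset(api_calls)


def jaccard(a: frozenset, b: frozenset) -> float:
    """|A ∩ B| / |A ∪ B|; two empty birthmarks count as identical."""
    return len(a & b) / len(a | b) if a | b else 1.0


def programs_similar(p, q,
                     birthmark: Callable = api_call_birthmark,
                     similarity: Callable = jaccard,
                     threshold: float = 0.5) -> bool:
    """If birthmark(p) is similar enough to birthmark(q), treat p and q as similar."""
    return similarity(birthmark(p), birthmark(q)) >= threshold


if __name__ == "__main__":
    p = ["CreateFileA", "WriteFile", "CloseHandle", "RegSetValueExA"]
    q = ["CreateFileA", "WriteFile", "CloseHandle", "InternetOpenA"]
    print(jaccard(api_call_birthmark(p), api_call_birthmark(q)))  # 0.6
    print(programs_similar(p, q))  # True
```

Keeping the birthmark and the similarity measure as separate, swappable functions mirrors the two-step structure of the slide: any feature from the taxonomy can be plugged in without changing the comparison logic.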
  5. Software Similarity Problem
  6. Taxonomy of Program Features
     - Raw Code
     - Abstract Syntax Trees
     - Variables
     - Pointers
     - Instructions
     - Basic Blocks
     - Procedures
     - API Calls
     - Control Flow Graphs
     - Call Graphs
     - Data Flow
     - Procedure Dependency Graphs
     - System Dependency Graphs
     - Object Inheritance and Dependency
  7. Program Features Examples: AST (left) and Control Flow (right)
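An AST can be turned into a birthmark directly. The sketch below is illustrative only: it uses Python's standard `ast` module on Python source (the talk does not name a language or tool) and takes the birthmark to be a histogram of AST node types. Because identifier names are not counted, a copy with renamed variables yields the same features.

```python
import ast
from collections import Counter


def ast_node_histogram(source: str) -> Counter:
    """Count AST node types; identifier names and literal values are ignored."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


original = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

# Same structure as `original`, with every identifier renamed.
renamed = """
def acc(values):
    r = 0
    for v in values:
        r += v
    return r
"""

print(ast_node_histogram(original) == ast_node_histogram(renamed))  # True
```

A histogram discards tree shape; tree edit distance keeps it, at a much higher computational cost.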
  8. Taxonomy of Features in Program Binaries
     - Headers
     - Object Code
     - Symbols
     - Debugging Information
     - Relocations
     - Dynamic Linking Information
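Headers are the simplest of these binary features to extract. A minimal sketch, assuming a 64-bit little-endian ELF file, that reads a few header fields with Python's `struct` module. The selection of fields as a feature set is an illustrative assumption. PE, Mach-O, and 32-bit or big-endian ELF files need different parsing.

```python
import struct

# 64-byte ELF64 header: e_ident[16] followed by fixed-width fields, little-endian, no padding.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")


def elf64_header_features(data: bytes) -> dict[str, int]:
    """Extract a few header fields from a 64-bit little-endian ELF image."""
    if len(data) < ELF64_HEADER.size or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS must be ELFCLASS64, EI_DATA ELFDATA2LSB
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    (_ident, e_type, e_machine, _version, e_entry, _phoff, _shoff, _flags,
     _ehsize, _phentsize, e_phnum, _shentsize, e_shnum, _shstrndx) = ELF64_HEADER.unpack_from(data)
    return {
        "type": e_type,          # e.g. 2 = executable (ET_EXEC), 3 = shared object (ET_DYN)
        "machine": e_machine,    # e.g. 62 = x86-64 (EM_X86_64)
        "entry": e_entry,        # entry point virtual address
        "num_program_headers": e_phnum,
        "num_sections": e_shnum,
    }


if __name__ == "__main__":
    # Synthetic header so no file is needed: 64-bit, little-endian, ELF version 1.
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
    header = ELF64_HEADER.pack(ident, 2, 62, 1, 0x401000, 64, 0, 0, 64, 56, 1, 64, 0, 0)
    print(elf64_header_features(header))
```

For a real binary, pass the first 64 bytes read from the file in binary mode.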
  9. Program Transformations
     - Compiler Optimisation and Recompilation
     - Program Obfuscation
     - Plagiarism, Software Theft, and Derivative Works
     - Malware packing, polymorphism and metamorphism
  10. Traditional Malware Packing
  11. Processing Program Features
      - Treat features or birthmark as a mathematical object:
        - Strings
        - Vectors
        - Sets
        - Sets of Vectors
        - Trees
        - Graphs
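To make this concrete, the sketch below turns one hypothetical feature sequence (instruction mnemonics invented for illustration) into several of these objects: a string, a frequency vector over a fixed vocabulary, a set of 2-grams, and a graph stored as an adjacency map. The right representation depends on which similarity measure you plan to use.

```python
from collections import Counter

# Hypothetical feature sequence, e.g. mnemonics from one procedure.
mnemonics = ["push", "mov", "cmp", "jne", "mov", "pop", "ret"]

# String: join with a separator so tokens stay distinct.
as_string = " ".join(mnemonics)

# Vector: counts over a fixed, ordered vocabulary.
vocabulary = ["cmp", "jne", "mov", "pop", "push", "ret"]
counts = Counter(mnemonics)
as_vector = [counts[m] for m in vocabulary]

# Set: distinct 2-grams of consecutive mnemonics.
as_set = {tuple(mnemonics[i:i + 2]) for i in range(len(mnemonics) - 1)}

# Graph: an edge from each mnemonic to the one that follows it
# (a crude sequence graph, not a real control flow graph).
as_graph: dict[str, set[str]] = {}
for a, b in zip(mnemonics, mnemonics[1:]):
    as_graph.setdefault(a, set()).add(b)

print(as_string)  # push mov cmp jne mov pop ret
print(as_vector)  # [1, 1, 2, 1, 1, 1]
print(sorted(as_set))
print(as_graph)
```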
  12. Software Birthmark Similarity
      - Strings
        - Edit distance, etc.
      - Vectors
        - Cosine Similarity
        - Euclidean distance, etc.
      - Set Similarity
        - Jaccard distance, etc.
      - Set of Vectors Similarity
        - Minimum matching distance
      - Trees and Graphs
        - Edit distances, etc.
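Most of the measures listed above take only a few lines to implement. This plain-Python sketch covers Levenshtein edit distance for strings, cosine similarity and Euclidean distance for vectors, Jaccard distance for sets, and a brute-force minimum matching distance for sets of vectors. Function names and the equal-size restriction on the matching distance are assumptions of the sketch. Brute-force matching tries every permutation, so it only works for tiny sets. Larger inputs are normally handled as an assignment problem, for example with the Hungarian algorithm. Tree and graph edit distances are left out because they are much more involved.

```python
import math
from itertools import permutations


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                 # delete cs
                           cur[j - 1] + 1,              # insert ct
                           prev[j - 1] + (cs != ct)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either is the zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0


def euclidean_distance(u: list[float], v: list[float]) -> float:
    return math.dist(u, v)


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def min_matching_distance(xs: list[list[float]], ys: list[list[float]]) -> float:
    """Smallest total Euclidean distance over all one-to-one pairings of xs and ys.

    Assumes equal-sized sets; brute force, so only for very small inputs.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes sets of equal size")
    return min(sum(math.dist(x, y) for x, y in zip(xs, perm))
               for perm in permutations(ys))


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))                          # 3
    print(cosine_similarity([1, 0], [1, 1]))
    print(euclidean_distance([0, 0], [3, 4]))                          # 5.0
    print(jaccard_distance({1, 2, 3}, {2, 3, 4}))                      # 0.5
    print(min_matching_distance([[0, 0], [5, 5]], [[5, 5], [0, 1]]))   # 1.0
```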
  13. Software Indexing and Searching
      - The nearest neighbour is the program in the database closest to the query.
      - Closeness is based on 'distance', a measure of dissimilarity between objects.
      - Distances that are 'metric' (non-negative, symmetric, zero only for identical objects, and satisfying the triangle inequality) allow more efficient indexing and searching.
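Metric distances help because the triangle inequality gives free lower bounds. For any pivot object p, |d(q, p) - d(x, p)| <= d(q, x). If every database item's distance to p is computed once at indexing time, a search can skip any candidate whose bound is already no better than the best match found. The sketch below does exact nearest-neighbour search with a single pivot, using Euclidean distance over feature vectors. The single-pivot design and all names are illustrative choices, not taken from the talk.

```python
import math

Vector = tuple[float, ...]


class PivotIndex:
    """Exact nearest-neighbour search that uses one pivot to skip candidates.

    Only correct if `dist` is a metric (here: Euclidean distance).
    """

    def __init__(self, items: list[Vector], dist=math.dist):
        if not items:
            raise ValueError("need at least one item")
        self.dist = dist
        self.pivot = items[0]
        # Computed once at indexing time: d(x, pivot) for every item.
        self.entries = [(x, dist(x, self.pivot)) for x in items]

    def nearest(self, query: Vector) -> tuple[Vector, float, int]:
        """Return (nearest item, its distance, number of distance computations)."""
        dq = self.dist(query, self.pivot)
        computed = 1
        # Triangle inequality: |d(q,p) - d(x,p)| <= d(q,x) is a free lower bound.
        candidates = sorted(self.entries, key=lambda e: abs(dq - e[1]))
        best, best_d = self.pivot, dq
        for x, dx in candidates:
            if abs(dq - dx) >= best_d:
                break  # bounds are sorted, so no later candidate can do better
            d = self.dist(query, x)
            computed += 1
            if d < best_d:
                best, best_d = x, d
        return best, best_d, computed


if __name__ == "__main__":
    db = [(0.0, 0.0), (1.0, 1.0), (10.0, 10.0), (20.0, 0.0), (9.0, 11.0)]
    index = PivotIndex(db)
    print(index.nearest((9.5, 10.5)))
```

In this example the search returns `(10.0, 10.0)` after 3 distance computations, compared with 5 for a linear scan. Practical metric indexes use many pivots or tree structures, but the pruning rule is the same.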
  14. rNN (Range Nearest Neighbour)
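A range nearest-neighbour query can be read as "return every program within distance r of the query", and it benefits from a metric in the same way. A classic index for integer-valued metrics is the BK-tree (Burkhard-Keller tree). The talk does not say which index it uses, so treat the BK-tree as an illustrative choice. This minimal sketch uses Hamming distance over equal-length bit-string fingerprints, and the fingerprints themselves are invented.

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between equal-length strings; a metric."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have equal length")
    return sum(x != y for x, y in zip(a, b))


class BKTree:
    """Burkhard-Keller tree: each child edge is labelled with its distance to the parent."""

    def __init__(self, dist=hamming):
        self.dist = dist
        self.root = None  # a node is (item, {edge_distance: child_node})

    def add(self, item: str) -> None:
        if self.root is None:
            self.root = (item, {})
            return
        node = self.root
        while True:
            d = self.dist(item, node[0])
            if d == 0:
                return  # identical item already stored
            children = node[1]
            if d not in children:
                children[d] = (item, {})
                return
            node = children[d]

    def range_search(self, query: str, r: int) -> list[tuple[str, int]]:
        """Return every stored item within distance r of query, nearest first."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            item, children = stack.pop()
            d = self.dist(query, item)
            if d <= r:
                results.append((item, d))
            # Every item under edge k is at distance k from `item`, so by the
            # triangle inequality it can only be within r of query if |d - k| <= r.
            for k, child in children.items():
                if d - r <= k <= d + r:
                    stack.append(child)
        return sorted(results, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    tree = BKTree()
    for fingerprint in ["10110", "10111", "00110", "11111", "01001"]:
        tree.add(fingerprint)
    print(tree.range_search("10110", 1))  # [('10110', 0), ('00110', 1), ('10111', 1)]
```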
  15.
      - Wiki on Software Similarity and Classification
      - Book on Software Similarity and Classification
      - Simseer – A Software Similarity Web Service
  16. Wiki on Software Similarity and Classification
      - Reviews of academic papers.
      - http://www.foocodechu.com/wiki
  17. Book on ‘Software Similarity and Classification’
      - Academic-style survey of the topic.
      - Published by Springer.
      - 100 pages.
      - Available in April.
      - http://www.springer.com/computer/security+and+cryptology/book/978-1-4471-2908-0
  18. Simseer – A Software Similarity Web Service
      - An online service to identify similarity between programs.
      - Performs unpacking.
      - Renders an evolutionary tree to show program relationships.
      - Free to use!
      - http://www.foocodechu.com/?q=simseer-a-software-similarity-web-service
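These slides do not say how Simseer builds its evolutionary tree. One common way to draw such a tree from pairwise program distances is agglomerative clustering, for example UPGMA (average linkage). The sketch below assumes that approach, not Simseer's actual implementation, and builds a Newick-format tree from a distance matrix. Program names and distances are invented.

```python
def upgma(names: list[str], dist: list[list[float]]) -> str:
    """Return a Newick tree built by UPGMA (average-linkage) clustering.

    `dist` must be a symmetric matrix with zeros on the diagonal.
    Branch lengths are omitted to keep the sketch short.
    """
    if not names:
        raise ValueError("need at least one program")
    # cluster id -> (Newick label, number of programs in the cluster)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise cluster distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), _ = min(d.items(), key=lambda kv: kv[1])  # closest pair
        (label_a, size_a), (label_b, size_b) = clusters.pop(a), clusters.pop(b)
        del d[(a, b)]
        for k in clusters:
            d_ak = d.pop((min(a, k), max(a, k)))
            d_bk = d.pop((min(b, k), max(b, k)))
            # Size-weighted average of the distances to the two merged clusters.
            d[(k, next_id)] = (size_a * d_ak + size_b * d_bk) / (size_a + size_b)
        clusters[next_id] = (f"({label_a},{label_b})", size_a + size_b)
        next_id += 1
    label, _ = next(iter(clusters.values()))
    return label + ";"


if __name__ == "__main__":
    programs = ["prog_a", "prog_a_v2", "prog_b", "prog_b_v2"]
    distances = [
        [0.0, 0.1, 0.8, 0.9],
        [0.1, 0.0, 0.85, 0.9],
        [0.8, 0.85, 0.0, 0.2],
        [0.9, 0.9, 0.2, 0.0],
    ]
    print(upgma(programs, distances))  # ((prog_a,prog_a_v2),(prog_b,prog_b_v2));
```

The distance matrix could come from any of the birthmark distances discussed earlier. Neighbour joining is a common alternative when distances are not ultrametric.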
  21. Conclusion
      - Presented a review of software similarity.
      - Demonstrated a new web service.
      - Try it!
      - http://www.foocodechu.com
      - Questions?
