Rechkov. Lomonosov Report

  1. Detecting abnormal executable files using binary code mining
       • Rechkov Anton, TU Berlin (Germany) & TTI SFU (Russia)
       • Lomonosov Scholarship Report, 21st March 2012
       • Sections: Introduction; Assembler as a native language; Anomalies detection
  2. Malware evolution
       • Ciphered: the virus code is encrypted.
       • Oligomorphic: the decryptor is generated by randomly selecting each of its pieces from several predefined alternatives.
       • Polymorphic: each replication encrypts the malware body and modifies the decryptor.
       • Metamorphic: an obfuscation engine reprograms the whole virus body.
  3. Modern detection techniques
       • Signature analysis: searching the code for a predetermined pattern.
       • Emulation: unpacking and analyzing the malware by emulating its code, followed by signature analysis.
       • Behavioral analysis: analysis of the function flow graph.
  4. Code modification
       • Obfuscation: a transformation of executable program code that preserves functionality but complicates analysis and understanding of the algorithms.
       • Deobfuscation: resolving irrelevant code using algebraic models and formal grammars.
  6. Outline
       • 1. Assembler as a native language: binary code mining, native language processing, stochastic models.
       • 2. Anomalies detection.
  7. Binary code mining: Table of Contents
       • 1. Assembler as a native language: binary code mining, native language processing, stochastic models.
       • 2. Anomalies detection: preparation, code generator lexemes, anomalies detection by neural networks, anomalies detection by probability model.
  8. Binary code mining: Structure of compiler
       • Common compiler scheme.
       • Code generator engine: machine code generator; optimizers (interprocedural optimization (IPO), profile-guided optimization (PGO), high-level optimizations); mutation code generator / obfuscator.
  9. Binary code mining: Common code generator features
       • High-level optimizations: unique intermediate language; pre-optimization in the intermediate representation.
       • Code generation: code templates from intermediate to target; number of instruction types used.
       • Machine-dependent optimizer: instruction costs.
  12. Binary code mining: Validating the theory
       • Experiment: determine instruction sequences; compile the same source code with several compilers; compare the resulting distributions.
       • Compilers: MSVC, LLVM, GCC, Intel C++ Compiler.
  14. Binary code mining: XTEA distribution test
       • Frequency of words in the binary.
       • [Figure panels: (a) LLVM, (b) MSVC, (c) Intel C++, (d) GCC]
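
The comparison of word-frequency distributions between compilers can be sketched as follows. This is a minimal illustration, not the author's procedure: the mnemonic streams are hypothetical, and total variation distance is an assumed choice of measure, since the slides do not say how the distributions were compared.

```python
from collections import Counter


def word_frequencies(words):
    """Return the relative frequency of each word in a sequence."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two frequency dicts.

    0.0 means identical distributions, 1.0 means no shared words.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical mnemonic streams standing in for the output of two compilers.
build_a = ["mov", "add", "mov", "xor", "shl", "mov", "add", "ret"]
build_b = ["mov", "lea", "mov", "xor", "shr", "lea", "add", "ret"]

freq_a = word_frequencies(build_a)
freq_b = word_frequencies(build_b)
distance = total_variation(freq_a, freq_b)
```

A small distance would suggest the two binaries came from similar code generators; what counts as "small" would have to be calibrated on real samples.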
  15. Binary code mining: Mean distribution of an optimized binary
  16. Section: Native language processing
  17. Native (natural) language processing: Text mining
       • Language detection
       • Author detection
       • Text classification
       • Document clustering
  18. Section: Stochastic models
  19. Stochastic models: Neural networks
       • Advantages: effective with a small number of training vectors; assesses the proximity of all samples.
       • Disadvantages: the model must be predetermined; manual word definition; manual analysis of excessive elements; retraining limitations.
  20. Stochastic models: Probability model
       • Advantages: self-sufficient word definition; training on positive vectors only; unified training (flexible retraining).
       • Disadvantages: a large sample set is needed for training; errors in determining the distribution; computational complexity.
  21. Outline
       • 1. Assembler as a native language.
       • 2. Anomalies detection: preparation, code generator lexemes, anomalies detection by neural networks, anomalies detection by probability model.
  22. Section: Preparation
  23. Preparation: Collecting statistical samples
       • Python: building the list of maximal repeated sequences (disassembling, string searching).
       • Matlab: stochastic models.
  26. Section: Code generator lexemes
  27. Code generator lexemes: From disassembling to lexemes
       • A lexeme is a sequence of 3 to 6 instructions.
       • Unknown bytes are ignored.
       • Only maximal repeated sequences are kept.
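
The lexeme definition on this slide can be sketched in Python, the language the slides name for this step. This is an illustrative sketch under assumptions: `UNKNOWN`, `extract_lexemes` and `repeated_lexemes` are hypothetical names, and "ignoring unknown bytes" is interpreted here as splitting the instruction stream at undecoded bytes so that no lexeme spans a gap.

```python
from collections import Counter

UNKNOWN = None  # hypothetical marker for bytes the disassembler could not decode


def extract_lexemes(mnemonics, min_len=3, max_len=6):
    """Count every run of min_len..max_len consecutive known instructions."""
    # Split the stream at unknown bytes (an assumed reading of "ignore").
    runs, current = [], []
    for mnemonic in mnemonics:
        if mnemonic is UNKNOWN:
            if current:
                runs.append(current)
            current = []
        else:
            current.append(mnemonic)
    if current:
        runs.append(current)

    counts = Counter()
    for run in runs:
        for n in range(min_len, max_len + 1):
            for start in range(len(run) - n + 1):
                counts[tuple(run[start:start + n])] += 1
    return counts


def repeated_lexemes(counts, min_count=2):
    """Keep only lexemes that occur at least min_count times."""
    return {lexeme: c for lexeme, c in counts.items() if c >= min_count}


# Hypothetical disassembly with one undecodable byte in the middle.
stream = ["push", "mov", "sub", UNKNOWN, "push", "mov", "sub", "call"]
counts = extract_lexemes(stream)
repeats = repeated_lexemes(counts)
```

Counting all windows is quadratic in the worst case; it only shows what is being counted, not how to count it efficiently.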
  28. Code generator lexemes: Lexeme analysis
       • Suffix tree: economical in memory; string searching faster than O(N^2); fast assessment of maximal repeats in strings.
       • [Figure: suffix tree example]
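
Finding a repeated instruction run can be illustrated without building a full suffix tree. The sketch below swaps in a simpler technique, a naively sorted list of suffixes, and compares neighboring suffixes; it is slower than a suffix tree and is meant only to show the idea. The function name and the sample input are hypothetical.

```python
def longest_repeat(tokens):
    """Return the longest contiguous token run that occurs at least twice.

    Occurrences may overlap. Uses a naively sorted suffix list, not a
    suffix tree, so it does not reach suffix-tree time or memory bounds.
    """
    seq = tuple(tokens)
    # Order suffix start positions by the suffix each one points to.
    order = sorted(range(len(seq)), key=lambda i: seq[i:])
    best = ()
    for a, b in zip(order, order[1:]):
        # Length of the common prefix of two neighboring suffixes.
        n = 0
        while a + n < len(seq) and b + n < len(seq) and seq[a + n] == seq[b + n]:
            n += 1
        if n > len(best):
            best = seq[a:a + n]
    return best


# Hypothetical mnemonic stream.
repeat = longest_repeat(["mov", "add", "xor", "mov", "add", "ret"])
```

Any repeated run is a shared prefix of two suffixes, and the longest such prefix always appears between neighbors in sorted order, which is why comparing adjacent suffixes is enough.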
  29. Section: Anomalies detection by neural networks
  30. Anomalies detection by neural networks: Radial basis networks
       • Neural net architecture with no need to choose the number of hidden layers.
       • No pathological convergence.
       • Fast convergence through a combination of learning algorithms.
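
For reference, the textbook form of a Gaussian radial basis network output is given below; the slides do not state which basis function or parameterization was used, so the symbols are assumptions. The network has a single hidden layer of $M$ units with centers $\mathbf{c}_j$, widths $\sigma_j$ and linear output weights $w_j$, which is why the number of hidden layers does not need to be chosen.

```latex
y(\mathbf{x}) = \sum_{j=1}^{M} w_j \exp\!\left( -\frac{\lVert \mathbf{x} - \mathbf{c}_j \rVert^{2}}{2\sigma_j^{2}} \right)
```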
  31. Anomalies detection by neural networks: Compiler detection
       • [Figure: compiler detection testing]
  32. Section: Anomalies detection by probability model
  33. Anomalies detection by probability model: Multivariate Gamma
       • Using a set of bi- and 3-variate Gamma distributions: suggested Gamma distribution; sample proximity PDF; fast training.
       • [Figure: empirical and theoretical PDF of an element; curves: Gamma PDF, empirical PDF; x-axis: X]
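
Fitting a theoretical Gamma PDF to empirical data can be sketched for a single element. This is a simplified, univariate illustration: the slide works with bi- and 3-variate Gamma distributions, and the method-of-moments fit used here is an assumption, not a procedure taken from the slides.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma density with shape k and scale theta.

    Values x <= 0 are treated as having zero density in this sketch.
    """
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def fit_gamma_moments(samples):
    """Method-of-moments estimate: k = mean^2 / var, theta = var / mean."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean * mean / var, var / mean


# Hypothetical lexeme frequencies observed for one element.
shape, scale = fit_gamma_moments([1.0, 2.0, 3.0, 4.0])
density = gamma_pdf(2.5, shape, scale)
```

The fitted density can then be plotted against a histogram of the same samples to compare the empirical and theoretical curves.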
  34. Anomalies detection by probability model: Probability model testing
       • Error graphs of compiler probabilities based on the coefficient of the minimal value: $P_p^{i} = P_{\min}^{i} \cdot 10^{\mathrm{coef}}$
       • [Figure: two panels, error (0 to 1) against the coefficient for the minimal value; curves for false positives (GCC O0, MS) and false negatives (Clang, LLVM, Intel, GCC O2, MS)]
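
One possible reading of the formula on this slide is a detection threshold obtained by scaling a compiler's minimal probability by a power of ten. The sketch below follows that reading; it is an assumption, as are the function names and the rule that a sample is flagged when its probability falls below the threshold.

```python
def detection_threshold(p_min, coef):
    """Scale the minimal probability by 10**coef (assumed reading of the slide)."""
    return p_min * 10 ** coef


def is_anomalous(p_sample, p_min, coef):
    """Flag a sample whose probability lies below the scaled threshold.

    The comparison direction is an assumption, not stated on the slide.
    """
    return p_sample < detection_threshold(p_min, coef)


# Hypothetical values: minimal training probability 1e-6, coefficient 2.
threshold = detection_threshold(1e-6, 2)
flag_low = is_anomalous(5e-5, 1e-6, 2)
flag_high = is_anomalous(5e-4, 1e-6, 2)
```

Under this reading, raising the coefficient raises the threshold, which trades false negatives for false positives, in line with the error being plotted against the coefficient.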
  35. Anomalies detection by probability model: Probability model testing
       • Problem of existing zero elements.
       • [Figure: two panels, error (0 to 1) against the coefficient for the minimal value (0 to 10); curves for false positives (GCC O2) and false negatives (Clang, Intel, GCC O0, MS)]
  36. Conclusion
       • Proposed a connection between native language and assembler.
       • Developed algorithms for lexical analysis of assembler language.
       • Developed experimental stochastic models: based on neural networks; based on a probability model.
       • Implemented lexical analysis of assembler language.
       • Approximate false positive errors of compiler detection: 27%; 10-15%.
  37. Questions?
