BY
MAYANK CHAUDHRI
2016H103014G
 INTRODUCTION
 MOTIVATION
 DETECTION TECHNIQUES
 Signature Based
 Anomaly Based
 Specification Based
 MALWARE OBFUSCATION
 Malware
 Malware, short for "malicious software," refers to a type of computer program designed
to infect a legitimate user's computer and inflict harm on it in multiple ways
 Antimalware
 Antimalware software protects against infections caused by many types of malware,
including viruses, worms, Trojan horses, rootkits, spyware, keyloggers,
ransomware and adware.
 Obfuscation
 Obfuscation is a technique that makes programs harder to understand
 Why do we need to study malware?
 So is it only computers that can be affected?
 Wait
 Does it look fake?
 Virus 666
 US patent 6506148 B2
 Why do we need antimalware?
 Techniques used for detecting malware can be categorized broadly into two
categories:
 anomaly-based detection
 and signature-based detection
 An anomaly-based detection technique uses its knowledge of what constitutes
normal behavior to decide the maliciousness of a program under inspection
 Specification-based techniques leverage some specification or rule set of what is valid
behavior in order to decide the maliciousness of a program under inspection
 Signature-based detection uses its characterization of what is known to be
malicious to decide the maliciousness of a program under inspection
 Static
 Static analysis uses syntax or structural properties
 A static approach attempts to detect malware before the program under inspection executes
 Example: the strings utility (naïve way)
 Dynamic
 A dynamic approach leverages runtime information
 A dynamic approach attempts to detect malicious behavior during program execution or after
program execution
 Example: the Sysinternals Suite (naïve way)
 Hybrid
 In this case, static and dynamic information is used to detect malware
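The naïve static approach mentioned above (the strings utility) can be sketched in a few lines. This is only an illustration of the idea, not the actual implementation of the tool; the function name `extract_strings`, the minimum length of 4 and the sample buffer are assumptions made for the example.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Hypothetical buffer: binary noise with two readable strings embedded in it.
sample = (b"\x00\x01MZ\x90\x00http://example.invalid/payload"
          b"\x00\xffcmd.exe /c del\x00ab\x00")

for s in extract_strings(sample):
    print(s)
```

The program under inspection is never executed here; only its bytes are read, which is what makes the approach static.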
 What is a signature?
 Signatures are typically hashes or byte-streams that are used to determine whether
a file or buffer contains a malicious payload
 Hashes are generated using algorithms like CRC or MD5, which are typically fast and
can be calculated many times per second
 This is the most typical and preferred method employed by antimalware/antivirus products
 There is a tradeoff between being fast and being accurate
 Byte-Streams
 Simplest form of signature
 The signature is a byte-stream that is specific to a malware file and that does not normally
appear in non-malicious files
 Example: to detect the European Institute for Computer Anti-Virus Research (EICAR)
antivirus testing file, an antivirus engine may simply search for this entire string:
 X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
 Easiest and fastest approach for detection
 Many robust and efficient algorithms are available for string matching
 Examples: Aho-Corasick, Knuth-Morris-Pratt, Boyer-Moore, etc.
 This approach is error-prone
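A byte-stream scan of this kind can be sketched as follows. It is a minimal illustration, assuming a tiny in-memory signature table and Python's built-in substring search rather than Aho-Corasick or the other algorithms named above; the names `scan` and `SIGNATURES` are hypothetical. The signature is the standard 68-byte EICAR test string, which contains a backslash after `[4`.

```python
# Standard EICAR test string (68 bytes); a raw literal keeps the backslash.
EICAR = rb"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"

# Hypothetical signature table: detection name -> byte-stream.
SIGNATURES = {"EICAR-Test-File": EICAR}

def scan(buffer: bytes, signatures: dict[str, bytes]) -> list[str]:
    """Return the names of all signatures found anywhere in the buffer."""
    return [name for name, sig in signatures.items() if sig in buffer]

clean = b"hello world"
infected = b"junk" + EICAR + b"more junk"
# Changing a single byte inside the signature is enough to evade the match.
modified = infected.replace(b"EICAR", b"EICAS")

print(scan(clean, SIGNATURES))
print(scan(infected, SIGNATURES))
print(scan(modified, SIGNATURES))
```

The last call illustrates why the approach is error-prone: the match is exact, so any change inside the signed bytes defeats it.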
 Checksums
 The most typical signature-matching algorithm, used by almost all existing AV engines,
is based on calculating CRCs.
 An antivirus engine may detect the EICAR testing file by calculating the CRC32 checksum of
the entire buffer, of chunks of data, or of the specific parts of a file format
that can be divided
 Fast, but produces a lot of false positives due to collisions
 Modified CRCs are also used for detection, but they still give false positives
 Example :
 “petfood” and “eisenhower” have the same CRC32 hash 0xD0132158
 Use of custom checksums
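Checksum matching over a whole buffer or over chunks can be sketched with the CRC32 function from Python's standard `zlib` module. This is an illustrative sketch only: the helper names and the 2048-byte chunk size are assumptions, not values taken from any particular engine.

```python
import zlib

def crc32_hex(data: bytes) -> str:
    """CRC32 of the whole buffer, formatted as an 8-digit hex value."""
    return f"0x{zlib.crc32(data) & 0xFFFFFFFF:08X}"

def chunk_checksums(data: bytes, chunk_size: int = 2048) -> list[int]:
    """CRC32 of each consecutive chunk (chunk size is an assumption)."""
    return [zlib.crc32(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

def same_crc32(a: bytes, b: bytes) -> bool:
    """True when two different inputs collide on their CRC32 value."""
    return zlib.crc32(a) == zlib.crc32(b)

print(crc32_hex(b"123456789"))             # standard CRC32 check input
print(len(chunk_checksums(b"a" * 5000)))   # number of 2048-byte chunks
# The pair quoted on the slide can be compared the same way.
print(same_crc32(b"petfood", b"eisenhower"))
```

Because a CRC32 value has only 32 bits, unrelated inputs can share a checksum, which is the source of the false positives noted above.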
 Cryptographic hashes
 Follows the 3 main properties of cryptographic hash functions
 Generates a “signature” that uniquely identifies one buffer and just one buffer
 Reduces false positives
 More expensive than calculating a CRC32 hash
 A single-bit change in the file means a new signature has to be computed
 They are used for recently discovered malware that is considered critical, while
stronger signatures are being developed
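The sensitivity of a cryptographic hash signature to a single-bit change can be shown with Python's `hashlib`. SHA-256 is used here as an assumption for the sketch (the slides mention MD5), and the payload bytes and helper names are made up for illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex digest used as the signature of a buffer."""
    return hashlib.sha256(data).hexdigest()

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)

original = b"malicious payload v1"   # hypothetical sample contents
variant = flip_bit(original, 0)      # differs from the original in one bit

known_bad = {sha256_hex(original)}   # hypothetical signature database

print(sha256_hex(original) in known_bad)  # True: exact sample is detected
print(sha256_hex(variant) in known_bad)   # False: one-bit variant is missed
```

This is why such signatures give very few false positives but cover only the exact sample, not a family of variants.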
 Fuzzy Hashing
 The aim is to identify a whole family of malware and reduce false positives
 Minimal or no diffusion at all
 No confusion at all
 A good collision rate (depends on application)
 Some available hashes
 ssdeep, DeepToad, SpamSum etc.
 False positives are possible, but fewer than with the techniques discussed earlier
 They are not used independently but together with more sophisticated techniques such as
Bloom filters
 Bypassing such filters is not easy
 An attacker needs to change many parts, because changing just one bit will not work
 The number of changes required to bypass the fuzzy signature depends on the
block size and how the block size is chosen
 If the block size depends on the size of the given buffer and is not fixed, the signature
is easier to bypass
 Fuzzy signatures based on a fixed block size are difficult to bypass
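The effect of the block size can be illustrated with a toy block-based fuzzy signature. This is a deliberately simplified sketch and not how ssdeep, DeepToad or SpamSum work internally; the fixed block size of 64 bytes, the truncated SHA-256 per block and the positional similarity score are all assumptions chosen for the example.

```python
import hashlib

def block_signature(data: bytes, block_size: int = 64) -> list[str]:
    """Hash each fixed-size block separately and keep a short prefix."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()[:8]
            for i in range(0, len(data), block_size)]

def similarity(a: list[str], b: list[str]) -> float:
    """Fraction of block positions whose hashes are identical."""
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))

original = bytes(range(256)) * 4          # 1024 bytes -> 16 blocks

one_change = bytearray(original)
one_change[10] ^= 0xFF                    # alters only the first block

many_changes = bytearray(original)
for start in range(0, len(original), 64):
    many_changes[start] ^= 0xFF           # alters every block

sig = block_signature(original)
print(similarity(sig, block_signature(bytes(one_change))))    # 0.9375
print(similarity(sig, block_signature(bytes(many_changes))))  # 0.0
```

A single change leaves 15 of 16 blocks matching, so the sample is still recognized; the attacker has to touch every block to drive the score to zero.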
 Graph-Based Hashes for Executables
 A software program can be represented as two different kinds of graphs
 Call graph: directed graph showing the relationship between all the functions in the program
 Flow graph: directed graph showing the relationship between basic blocks
 Antimalware products with code analysis engines may use signatures in the form of graphs
using information extracted from call graphs or the flow graphs
 This approach is expensive but effective
 For better performance, analysis is limited to some instructions or basic blocks, or bounded by time-outs
 These techniques are powerful for the detection of polymorphic viruses, because the
instructions differ between generations but the call graphs usually
remain stable.
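The stability of the call graph across generations can be illustrated with a toy signature that ignores function names and instructions entirely. This is only a sketch under strong assumptions: the call graph is given as a plain dictionary, and the signature is a hash of the sorted (in-degree, out-degree) pairs, which is far weaker than what a real code analysis engine would compute. All function names in the sample graphs are hypothetical.

```python
import hashlib

CallGraph = dict[str, list[str]]   # function name -> functions it calls

def graph_signature(graph: CallGraph) -> str:
    """Hash the sorted (in-degree, out-degree) pairs of all functions."""
    in_deg = {name: 0 for name in graph}
    for callees in graph.values():
        for callee in callees:
            in_deg[callee] = in_deg.get(callee, 0) + 1
    degrees = sorted((in_deg[n], len(graph.get(n, []))) for n in in_deg)
    return hashlib.sha256(repr(degrees).encode()).hexdigest()

# Two generations of the same (hypothetical) virus: names differ, shape does not.
gen1 = {"main": ["decrypt", "infect"], "decrypt": [],
        "infect": ["find_files", "write"], "find_files": [], "write": []}
gen2 = {"sub_401000": ["sub_401200", "sub_401100"], "sub_401100": [],
        "sub_401200": ["sub_401300", "sub_401400"],
        "sub_401300": [], "sub_401400": []}
# An unrelated program with a different call structure.
other = {"main": ["a"], "a": ["b"], "b": []}

print(graph_signature(gen1) == graph_signature(gen2))   # True
print(graph_signature(gen1) == graph_signature(other))  # False
```

Because only the shape is hashed, two unrelated programs with the same degree sequence would also match, which is one way false positives can arise with graph-based signatures.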
 False positive cases are still possible
 Evasion techniques
 Change the layout of the call graph
 Implement anti-disassembly tricks
 Mix anti-disassembly techniques with opaque predicates
 Use time-out tricks (make the flow graph as complex as possible)
 Example of a control flow graph tool
 http://github.com/joxeankoret/pyew
 Dynamic signature-based detection is characterized by using solely information
gathered during the execution of the program under inspection to decide its maliciousness
 It looks for patterns of behavior that would reveal the true malicious intent of a
program.
 Example: a signature-based method for worm detection that is based on known malicious
behaviors
 Example: a state-transition-based technique for detection
 Uses static and dynamic properties to determine the maliciousness
 First executes the program and then applies static signature detection
 Example
 Worm vs. Worm
 Malicious Code Filter
 Anomaly-based detection usually occurs in two phases:
 Training (learning) phase and
 Detection (monitoring) phase
 During the training phase the detector attempts to learn the normal behavior.
 The detector could be learning the behavior of the system, the program or both
 The key advantage of anomaly-based detection is its ability to detect zero-day attacks
 Two fundamental problems associated with this approach are
 High false alarm rate
 Complexity of choosing the features to be learned in the training phase
 In dynamic anomaly-based detection, information gathered from the program’s
execution is used to detect malicious code
 The detection phase monitors the program under inspection during its execution,
checking for inconsistencies with what was learned during the training phase
 Examples
 IDS, computer forensic methods for Privacy-Invasive Software, monitoring of system
call sequences, monitoring of process call sequences
 Setting a threshold that keeps false positives low is a challenging problem
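The two phases can be sketched for system call sequences using short sliding windows. This is a minimal illustration only: the window length of 3, the traces, the threshold of 0.3 and all function names are assumptions made for the example, not parameters from any of the systems listed above.

```python
Window = tuple[str, ...]

def windows(trace: list[str], n: int = 3) -> list[Window]:
    """All length-n sliding windows over a system call trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

def train(normal_traces: list[list[str]], n: int = 3) -> set[Window]:
    """Training phase: remember every window seen in normal behavior."""
    profile: set[Window] = set()
    for trace in normal_traces:
        profile.update(windows(trace, n))
    return profile

def anomaly_score(trace: list[str], profile: set[Window], n: int = 3) -> float:
    """Detection phase: fraction of windows never seen during training."""
    observed = windows(trace, n)
    if not observed:
        return 0.0
    unseen = sum(1 for w in observed if w not in profile)
    return unseen / len(observed)

# Hypothetical traces of normal behavior.
normal = [["open", "read", "write", "close"],
          ["open", "read", "read", "write", "close"]]
profile = train(normal)

THRESHOLD = 0.3   # assumed value; choosing it well is the hard part
benign = ["open", "read", "write", "close"]
suspicious = ["open", "read", "connect", "send", "close"]

for name, trace in (("benign", benign), ("suspicious", suspicious)):
    score = anomaly_score(trace, profile)
    print(name, score, "ALERT" if score > THRESHOLD else "ok")
```

A lower threshold catches more unknown attacks but raises the false alarm rate, since legitimate behavior that was not seen during training also produces unseen windows.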
 In static anomaly-based detection, characteristics about the file structure of the
program under inspection are used to detect malicious code
 A key advantage of static anomaly-based detection is that its use may make it
possible to detect malware without having to allow the malware-carrying program
to execute on the host system
 Data-mining and machine learning approaches are used to detect malware
 Hybrid anomaly based detection
 Specification-based detection is a type of anomaly-based detection that tries to
address the typical high false alarm rate associated with most anomaly-based
detection techniques
 Specification-based detection attempts to approximate the requirements for an
application or system
 The training phase consists of obtaining a rule set that specifies valid behavior
 The main limitation of specification-based detection is that it is often difficult to
specify completely and accurately the entire set of valid behaviors a system should
exhibit
 Approaches classified as dynamic specification-based use behavior observed at
runtime to determine the maliciousness of an executable
 Example
 Monitoring Security-Critical Programs (using monitored system call events)
 Using Dynamic Information Flow to Protect Applications
 Process Behavior Monitoring
 Using Instruction Block Signatures
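Dynamic specification-based checking can be sketched as comparing monitored events against a rule set of valid behavior. The sketch below is only an illustration: the program name, the allowed system calls and the helper `violations` are hypothetical, and real specifications are far richer than a flat set of permitted calls.

```python
# Hypothetical rule set: program name -> system calls it is allowed to make.
SPEC: dict[str, set[str]] = {
    "backup_tool": {"open", "read", "write", "close"},
}

def violations(program: str, events: list[str],
               spec: dict[str, set[str]]) -> list[str]:
    """Return monitored events that the specification does not allow.

    A program with no specification is treated as having no valid behavior.
    """
    allowed = spec.get(program, set())
    return [event for event in events if event not in allowed]

observed = ["open", "read", "execve", "close"]   # hypothetical runtime trace
print(violations("backup_tool", observed, SPEC))
```

Unlike the anomaly-based case, nothing is learned from traces here; the quality of detection depends entirely on how completely and accurately the rule set was written.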
 Structural properties of programs are used for detection
 Example
 Static Detection of Malicious Code in Executables (API- graph)
 Compiler Approach to Malcode Detection (certifying compiler)
 Detecting Malcode in Firmware
 Hybrid specification based detection
 Example
 Types of malware obfuscation techniques
 Encryption
 Exclusive OR
 Dead code insertion
 Register Reassignment
 Subroutine Reordering
 Instruction substitution
 Code transposition
 Code integration
 Base64 encoding
 Code packing
 ROT13
 Encryption
 The first approach to evading signature-based antivirus scanners is to use encryption
 Exclusive OR
 Performs an XOR operation on the data with some key byte
 Base64 Encoding
 Base64 is commonly used in malware to disguise text strings
 ROT13
 ROT13 (rotate by 13) is a simple letter substitution used to jumble text
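The three simple encodings above can be sketched with Python's standard library to show how each one hides a plain-text string from a byte-stream signature. The URL, the XOR key 0x5A and the signature bytes are made-up values for the illustration.

```python
import base64
import codecs

def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key; applying it twice restores data."""
    return bytes(b ^ key for b in data)

text = "http://example.invalid/c2"   # hypothetical string an analyst might sign
signature = b"example"               # hypothetical byte-stream signature

xored = xor_bytes(text.encode(), 0x5A)
b64 = base64.b64encode(text.encode())
rot = codecs.encode(text, "rot_13")  # rotates letters only, leaves the rest

for label, blob in (("plain", text.encode()), ("xor", xored),
                    ("base64", b64), ("rot13", rot.encode())):
    print(label, blob, "signature found:", signature in blob)

# Each transformation is trivially reversible by the malware at run time.
print(xor_bytes(xored, 0x5A).decode())
print(base64.b64decode(b64).decode())
print(codecs.decode(rot, "rot_13"))
```

None of these is encryption in the cryptographic sense; they only change the stored bytes enough that a naive string search no longer matches.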
 Code Packing
 A packer is a piece of software that takes the original malware file and compresses it
 Dead-Code Insertion
 Dead-code insertion is a simple technique that adds some ineffective instructions to a
program to change its appearance while keeping its behavior
 Register Reassignment
 Switches registers from generation to generation while keeping the program behavior the same
 Subroutine Reordering
 Obfuscates the original code by changing the order of its subroutines in a random way.
 Example: Win32/Ghost
 Instruction Substitution
 Evolves the original code by replacing some instructions with other equivalent ones
 Code Transposition
 Code transposition reorders the sequence of instructions of the original code without having any
impact on its behavior.
 Code Integration
 Introduced by the Win32/Zmist malware
 The malware knits itself into the code of its target program
 It decompiles the target program into manageable objects, adds itself between them and
reassembles the integrated code into a new generation.
 The Antivirus Hacker's Handbook, Joxean Koret and Elias Bachaalany, Wiley.
 Practical Malware Analysis, Michael Sikorski and Andrew Honig, No Starch Press.
 Nwokedi Idika, Aditya P. Mathur, A Survey of Malware Detection Techniques.
 Ilsun You, Kangbin Yim, Malware Obfuscation Techniques: A Brief Survey, 2010 International Conference on
Broadband, Wireless Computing, Communication and Applications.
 defcon-17-sean_taylor-binary_obfuscation.pdf, Defcon 02017.