Artificial Intelligence in Virus Detection & Recognition
ARTIFICIAL INTELLIGENCE METHODS IN VIRUS DETECTION & RECOGNITIONINTRODUCTION TO HEURISTIC SCANNING AHMAD ALI. A 09409002 S7 CSE
PRESENTATION OUTLINE Introduction Fundamentals of malware Metaheuristics in Virus Detection & Recognition Heuristic scanning theory Lacks in specific detection Heuristic scanning conception Recognizing potential threat Coping with anti-heuristic mechanisms Towards accuracy improvement Summary
MALWAREMalware (malicious software) is a softwaredesigned to infiltrate or damage a computersystem without the owners informed consent.
MALWARE TYPES It is important to be aware that nevertheless all of them have similar purpose, each one behave differently. Viruses Worms Wabbits Trojan horses Exploits/Backdoors Rootkits Key loggers/Dialers Hoaxes
INFECTION STRATEGIES Nonresident viruses: The simplest form of viruseswhich dont stay in memory, but infect foundedexecutable file and search for another to replicate. Resident viruses: More complex and efficient type ofviruses which stay in memory and hide their presencefrom other processes. Fast infectors: Type which is designed to infect as many files as possible. Slow infectors: Using stealth and encryption techniques to stay undetected outlast.
HeuristicHeuristic is a method to solve a problem, commonly aninformal method. It is particularly used to rapidly cometo a solution that is reasonably close to the best possible answer, or optimal solution... MetaheuristicMetaheuristic is a heuristic method which combines user-given black-box procedures in a hopefully efficient way.Metaheuristics are generally applied to problems forwhich there is no satisfactory problem-specificalgorithm or heuristic.
GENERAL METAHEURISTICSList below shows main metaheuristics used for virusdetection and recognition: Pattern matching Automatic learning Environment emulation Neural networks Data mining Bayes networks Hidden Markov models and other...
LACKS IN SPECIFIC DETECTIONThere are two basic methods to detect viruses - specificand generic.Why is generic detection gaining importance? There arefour reasons: The number of viruses increases rapidly. The number of virus mutants increases. The development of polymorphic viruses. Viruses directed at a specific organization or company.
Specific detection methods like signature scanning became very efficient ways of detecting known threats. Finding specific signature in code allows scanner to recognize every virus which signature has been stored in built-in database.BB ?2 B9 10 01 81 37 ?2 81 77 02 ?2 83 C3 04 E2 F2 FireFly virus signature(hexadecimal)
Problem occurs when virus source is changed by a programmer or mutation engine. Signature is being malformed due to even minor changes. Virus may behave in an exactly same way but is undetectable due to new, unique signature.BB ?2 B9 10 01 81 37 ?2 81 A1 D3 ?2 01 C3 04 E2F2 Malformed signature(hexadecimal)
HEURISTIC SCANNING CONCEPTION Q: How to recognize a virus without any knowledge about its internal structure? A: By examining its behavior and characteristics.
Heuristic scanning in its basic form is implementation ofthree metaheuristics: Pattern matching Automatic learning Environment emulationOf course modern solutions provide more functionalitiesbut principle stays the same.
The basic idea of heuristic scanning is to examineassembly language instruction sequences(step-by-step)If there are sequences behaving suspiciously, programcan be qualified as a virus.
RECOGNIZING POTENTIAL THREAT Singular suspicion is never a reason to trigger the alarm. But if the same program also tries to stay resident and contains routine to search for executables, it is highly probable that its a real virus.
HEURISTIC SCANNING ASARTIFICIAL NEURON Figure: Single-layer classifier with threshold
MALWARE EVOLVESViruses started to use various stealth techniques.This allowed them to be invisible for traditionalscanner.Moreover, most of them started using real-timeencryption.
Pattern matching is not enough Why not to create artificial runtime environment to let the virus do its job? Such approach found implementation in environment emulation engines, which became standard AV software weapon.
VIRTUAL REALITYAnti-virus program provides a virtual machine withindependent operating system and allows virus toperform its routines.Behavior and characteristics are being continuouslyexamined, while virus is not aware that is workingon a fake system.This leads to decryption routines and revealment ofits true nature. Also stealth techniques are uselessbecause whole VM is monitored by AV software.
FALSE POSITIVES & AUTOMATIC LEARNINGHS will blame innocent programs for being potentialthreats. Such behavior is called false positive.We must be aware that program is right when risingalarm, because scanned app posses suspicioussequences, we cant blame scanner for failure.
Q: So what can be done to avoid false positives? A: Automatic learning!
AVOIDING FALSE POSITIVES & IMPROVING ACCURACYAssume that machine is not infected.Combine scanning techniquesDefinition of (combinations of) suspicious abilities.Recognition of common program codesRecognition of specific programs
WHAT CAN BE EXPECTED FROM IT IN THE FUTURE?The Development ContinuesMost anti-virus developers still do not supply aready-to-use heuristic analyzer.Those who have heuristics already available are stillimproving it.
SUMMARYBasically HS is inherited from combination of patternmatching, automatic learning and environment emulationmetaheuristics. As a heuristic method its not 100%effective. So why do we apply HS?Pros Can detect future threats User is less dependent on product update Improves conventional scanning resultsCons False positives Making decision after alarm requires knowledge
REFERENCES Data mining methods for detection of new malicious executable-MATHEW.G CHU LTZ Heuristics scanner for Artificial Intelligence- RIGHARD ZWIENEN DERG Heuristics Antivirus technology- FRANSVELDSMAN