This document describes a model for detecting unknown computer viruses in real-time by monitoring for deviations from expected program behavior. The model involves instrumenting programs to observe internal states and outputs when different input sequences are provided. Any differences from a program's normal behavior could signal that the program has been corrupted by a virus or other malicious attack. While this model may be effective, its main challenge is the enormous time and space requirements needed to track all possible program states and histories at a practical scale.
Application of data mining based malicious code detection techniques for dete...UltraUploader
This document discusses using data mining techniques to detect spyware. It applies Naive Bayes algorithms used in previous work to detect viruses to a dataset of 312 benign and 614 spyware executables. Feature extraction examines byte sequences in files. Initial tests showed low accuracy, but removing Trojan programs from the dataset improved results, with a window size of 4 and without Trojans achieving 80% detection with a 4% false positive rate. Future work proposes testing against larger window sizes and obfuscated code.
TriggerScope: Towards Detecting Logic Bombs in Android ApplicationsPietro De Nicolao
Presentation of paper "TriggerScope: Towards Detecting Logic Bombs in Android Applications" for the course of Advanced Topics in Computer Security of prof. Stefano Zanero.
Source and further information: https://github.com/pietrodn/triggerscope
The document proposes new methods for automatically generating malware invariants from binary code to detect and identify malware. Current signature-based malware detectors can be evaded through obfuscation, but malware invariants capture semantic properties that are more difficult to obfuscate. The method involves using formal methods and static analysis to extract invariants from binary code and represent them as semantic signatures, called malware invariants, that can be matched against suspicious code to detect malware families. Combining multiple static and dynamic analysis tools can help generate strong malware invariants that circumvent common obfuscation techniques used by malware writers.
Agisa towards automatic generation of infection signaturesUltraUploader
This document describes AGIS, a system for automatically generating infection signatures to detect compromised systems. AGIS monitors the runtime behavior of suspicious code to detect infections based on security policy violations. It then uses dynamic and static analysis to identify the characteristic behaviors of the malware in terms of system/API calls. Important instructions related to the infection's malicious behavior are extracted from executables to generate signatures that can be used by scanners to detect the infection. AGIS was implemented on Windows and evaluated against real malware samples, demonstrating its ability to detect new infections and generate high-quality signatures.
A zero-day (also known as 0-day) vulnerability is a computer-software vulnerability that is unknown to those who would be interested in mitigating the vulnerability.
ABSTRACT :
--------------------
Modern malware that are metamorphic or polymorphic in nature mutate their code by employing code obfuscation and encryption methods to thwart detection. Thus, conventional signature based scanners fail to detect these malware. In order to address the problems of detecting known variants of metamorphic malware, we propose a method using bioinformatics techniques effectively used for Protein and DNA matching. Instead of using exact signature matching methods, more sophisticated signature(s) are extracted using multiple sequence alignment (MSA). The results show that the proposed method is capable of identifying malware variants with minimum false alarms and misses. Also, the detection rate achieved with our proposed method is better compared to commercial antivirus products used in the study.
Status:
----------
This work has been accepted by 8th IEEE International Conference on Innovations in Information Technology (Innovations'12).
Link:
-------
http://ieeexplore.ieee.org/xpl/login.jsp?reload=true&tp=&arnumber=6207739&url=http://ieeexplore.ieee.org/iel5/6203543/6207707/06207739.pdf?arnumber=6207739
e-mail: grijesh.mnit@gmail.com
MINING PATTERNS OF SEQUENTIAL MALICIOUS APIS TO DETECT MALWAREIJNSA Journal
In the era of information technology and connected world, detecting malware has been a major security concern for individuals, companies and even for states. The New generation of malware samples upgraded with advanced protection mechanism such as packing, and obfuscation frustrate anti-virus solutions. API call analysis is used to identify suspicious malicious behavior thanks to its description capability of a software functionality. In this paper, we propose an effective and efficient malware detection method that uses sequential pattern mining algorithm to discover representative and discriminative API call patterns. Then, we apply three machine learning algorithms to classify malware samples. Based on the experimental results, the proposed method assures favorable results with 0.999 F-measure on a dataset including 8152 malware samples belonging to 16 families and 523 benign samples.
Application of data mining based malicious code detection techniques for dete...UltraUploader
This document discusses using data mining techniques to detect spyware. It applies Naive Bayes algorithms used in previous work to detect viruses to a dataset of 312 benign and 614 spyware executables. Feature extraction examines byte sequences in files. Initial tests showed low accuracy, but removing Trojan programs from the dataset improved results, with a window size of 4 and without Trojans achieving 80% detection with a 4% false positive rate. Future work proposes testing against larger window sizes and obfuscated code.
TriggerScope: Towards Detecting Logic Bombs in Android ApplicationsPietro De Nicolao
Presentation of paper "TriggerScope: Towards Detecting Logic Bombs in Android Applications" for the course of Advanced Topics in Computer Security of prof. Stefano Zanero.
Source and further information: https://github.com/pietrodn/triggerscope
The document proposes new methods for automatically generating malware invariants from binary code to detect and identify malware. Current signature-based malware detectors can be evaded through obfuscation, but malware invariants capture semantic properties that are more difficult to obfuscate. The method involves using formal methods and static analysis to extract invariants from binary code and represent them as semantic signatures, called malware invariants, that can be matched against suspicious code to detect malware families. Combining multiple static and dynamic analysis tools can help generate strong malware invariants that circumvent common obfuscation techniques used by malware writers.
Agisa towards automatic generation of infection signaturesUltraUploader
This document describes AGIS, a system for automatically generating infection signatures to detect compromised systems. AGIS monitors the runtime behavior of suspicious code to detect infections based on security policy violations. It then uses dynamic and static analysis to identify the characteristic behaviors of the malware in terms of system/API calls. Important instructions related to the infection's malicious behavior are extracted from executables to generate signatures that can be used by scanners to detect the infection. AGIS was implemented on Windows and evaluated against real malware samples, demonstrating its ability to detect new infections and generate high-quality signatures.
A zero-day (also known as 0-day) vulnerability is a computer-software vulnerability that is unknown to those who would be interested in mitigating the vulnerability.
ABSTRACT :
--------------------
Modern malware that are metamorphic or polymorphic in nature mutate their code by employing code obfuscation and encryption methods to thwart detection. Thus, conventional signature based scanners fail to detect these malware. In order to address the problems of detecting known variants of metamorphic malware, we propose a method using bioinformatics techniques effectively used for Protein and DNA matching. Instead of using exact signature matching methods, more sophisticated signature(s) are extracted using multiple sequence alignment (MSA). The results show that the proposed method is capable of identifying malware variants with minimum false alarms and misses. Also, the detection rate achieved with our proposed method is better compared to commercial antivirus products used in the study.
Status:
----------
This work has been accepted by 8th IEEE International Conference on Innovations in Information Technology (Innovations'12).
Link:
-------
http://ieeexplore.ieee.org/xpl/login.jsp?reload=true&tp=&arnumber=6207739&url=http://ieeexplore.ieee.org/iel5/6203543/6207707/06207739.pdf?arnumber=6207739
e-mail: grijesh.mnit@gmail.com
MINING PATTERNS OF SEQUENTIAL MALICIOUS APIS TO DETECT MALWAREIJNSA Journal
In the era of information technology and connected world, detecting malware has been a major security concern for individuals, companies and even for states. The New generation of malware samples upgraded with advanced protection mechanism such as packing, and obfuscation frustrate anti-virus solutions. API call analysis is used to identify suspicious malicious behavior thanks to its description capability of a software functionality. In this paper, we propose an effective and efficient malware detection method that uses sequential pattern mining algorithm to discover representative and discriminative API call patterns. Then, we apply three machine learning algorithms to classify malware samples. Based on the experimental results, the proposed method assures favorable results with 0.999 F-measure on a dataset including 8152 malware samples belonging to 16 families and 523 benign samples.
MINING PATTERNS OF SEQUENTIAL MALICIOUS APIS TO DETECT MALWAREIJNSA Journal
In the era of information technology and connected world, detecting malware has been a major security concern for individuals, companies and even for states. The New generation of malware samples upgraded with advanced protection mechanism such as packing, and obfuscation frustrate anti-virus solutions. API call analysis is used to identify suspicious malicious behavior thanks to its description capability of a
software functionality. In this paper, we propose an effective and efficient malware detection method that uses sequential pattern mining algorithm to discover representative and discriminative API call patterns. Then, we apply three machine learning algorithms to classify malware samples. Based on the experimental results, the proposed method assures favorable results with 0.999 F-measure on a dataset including 8152
malware samples belonging to 16 families and 523 benign samples.
This document summarizes a study on automatically generating heuristic classifiers to detect unknown Win32 viruses. The researchers constructed multiple neural network classifiers using distinct n-gram features from virus samples. They found that combining the individual classifier outputs using voting reduced false positives to a very low level while only slightly increasing false negatives. This voting approach made it practical to retrain the classifiers regularly on updated sample sets.
IRJET - Heuristic Approach to Intrusion Detection SystemIRJET Journal
1. The document proposes a hybrid intrusion detection system called SPARTAN that combines signature-based and anomaly-based detection to improve detection rates and reduce false positives/negatives.
2. It analyzes the limitations of signature-based and anomaly-based detection alone and argues their combination can leverage the benefits of both.
3. SPARTAN's framework includes a signature detection engine, anomaly detection engine, and signature generation engine that work together to detect known attacks, identify anomalies, and automatically update signatures without much administrator intervention.
A Smart Fuzzing Approach for Integer Overflow DetectionITIIIndustries
Fuzzing is one of the most commonly used methods to detect software vulnerabilities, a major cause of information security incidents. Although it has advantages of simple design and low error report, its efficiency is usually poor. In this paper we present a smart fuzzing approach for integer overflow detection and a tool, SwordFuzzer, which implements this approach. Unlike standard fuzzing techniques, which randomly change parts of the input file with no information about the underlying syntactic structure of the file, SwordFuzzer uses online dynamic taint analysis to identify which bytes in the input file are used in security sensitive operations and then focuses on mutating such bytes. Thus, the generated inputs are more likely to trigger potential vulnerabilities. We evaluated SwordFuzzer with an example program and a number of real-world applications. The experimental results show that SwordFuzzer can accurately locate the key bytes of the input file and dramatically improve the effectiveness of fuzzing in detecting real-world vulnerabilities
IRJET- Android Malware Detection using Machine LearningIRJET Journal
This document discusses using machine learning algorithms to detect Android malware. It aims to extract features from Android applications (APKs) and train machine learning models to classify APKs as malware or benign. The proposed approach extracts features from an APK's manifest file and decompiled code to identify permissions, URLs, API calls, and other indicators. Random forest classifiers are trained on a dataset of benign and malicious APKs to detect known malware families. The models can classify new APKs as either malware or benign, and if malware, identify the specific malware family. The approach aims to detect malware with high accuracy while reducing analysis time by processing multiple APKs in parallel.
This document proposes a data mining framework to automatically detect new malicious executables. It extracts features from binaries and uses three data mining classifiers trained on these features: a rule learner, probabilistic classifier, and multi-classifier system. When evaluated on a test set, the framework detects 97.76% of new malicious binaries, more than doubling the detection rate of a signature-based method.
Malware Dectection Using Machine learningShubham Dubey
Malware detection is an important factor in the security of the computer systems. However, currently utilized signature-based methods cannot provide accurate detection of zero-day attacks and polymorphic viruses. That is why the need for machine learning-based detection arises.
A Security Analysis Framework Powered by an Expert SystemCSCJournals
Today\'s IT systems are facing a major challenge in confronting the fast rate of emerging security threats. Although many security tools are being employed within organizations in order to standup to these threats, the information revealed is very inferior in providing a rich understanding to the consequences of the discovered vulnerabilities. We believe expert systems can play an important role in capturing any security expertise from various sources in order to provide the informative deductions we are looking for from the supplied inputs. Throughout this research effort, we have built the Open Security Knowledge Engineered (OpenSKE) framework (http://code.google.com/p/openske), which is a security analysis framework built around an expert system in order to reason over the security information collected from external sources. Our implementation has been published online in order to facilitate and encourage online collaboration to increase the practical research within the field of security analysis.
Malware Detection Using Machine Learning TechniquesArshadRaja786
Malware viruses can be easily detected using machine learning Techniques such as K-Mean Algorithms, KNN algorithm, Boosted J48 Decision Tree and other Data Mining Techniques. Among them J48 proved to be more effective in detecting computer virus and upcoming networks worms...
"Быстрое обнаружение вредоносного ПО для Android с помощью машинного обучения...Yandex
В докладе речь пойдёт о применении алгоритмов машинного обучения для обнаружения вредоносных приложений для Android. Я расскажу, как на базе Матрикснета в Яндексе был спроектирован высокопроизводительный инструмент для решения этой задачи. А также продемонстрирую, в каких случаях аналитические методы выявления вредоносного ПО помогают блокировать множество простых образцов вирусного кода. Затем мы поговорим о том, как можно усовершенствовать такие методы для обнаружения более хитроумных вредных программ.
Misuse detection approaches aim to detect known intrusion patterns by encoding them as signatures. Signatures precisely define the patterns of events that characterize an intrusion. This allows misuse detection to be fully effective against known attacks but provides no detection for unknown attacks. Common misuse detection techniques include pattern matching, rule-based systems, and state-based analysis. Pattern matching searches for encoded signatures in audit data, while rule-based systems apply rules to detect intrusion scenarios. State-based systems represent intrusions as state transitions to identify compromised system states.
This document provides an overview of software testing concepts and best practices. It discusses why software testing is important given that no software is 100% defect-free. It then covers testing objectives, principles, techniques including black-box and white-box testing, testing levels, test management, and characteristics of a good software tester. The document emphasizes that testing should start early and continue throughout the software development life cycle.
robust malware detection for iot devices using deep eigen space learningVenkat Projects
Internet of Things (IoT) in military settings generally consists of a diverse range of Internet-connected devices and nodes (e.g. medical devices and wearable combat uniforms). These IoT devices and nodes are a valuable target for cyber criminals, particularly state-sponsored or nation state actors. A common attack vector is the use of malware. In this paper, we present a deep learning based method to detect Internet Of Battlefield Things (IoBT) malware via the device’s Operational Code (OpCode) sequence. We transmute OpCodes into a vector space and apply a deep Eigenspace learning approach to classify malicious and benign applications. We also demonstrate the robustness of our proposed approach in malware detection and its sustainability against junk code insertion attacks. Lastly, we make available our malware sample on Github, which hopefully will benefit future research efforts (e.g. to facilitate evaluation of future malware detection approaches).
Analysis, Anti-Analysis, Anti-Anti-Analysis: An Overview of the Evasive Malwa...Marcus Botacin
Slides da minha apresentação no Simpósio Brasileiro de Segurança da Informação e de Sistemas Computacionais (SBSEG 2017) referente a minha co-orientação do aluno Vitor Falcão: Um survey sobre técnicas de anti-análise e contramedidas para lidar com software malicioso (malware), includindo técnicas de anti-vm, anti-debug e anti-disassembly.
Classification of Malware based on Data Mining Approachijsrd.com
This document discusses a system called the Intelligent Malware Detection System (IMDS) that uses data mining techniques to classify malware. The IMDS uses a PE parser to extract API execution sequences from Windows portable executable files. It then applies an OOA mining algorithm called OOA_Fast_FP-Growth to generate association rules from the API sequences to classify files as malware or benign. Experimental results showed the IMDS outperformed other classification techniques and anti-virus software in detecting malware.
Advanced routing worm and its security challengesUltraUploader
An advanced routing worm is presented that can propagate faster than traditional worms. It does this by only scanning addresses that are routable based on Border Gateway Protocol (BGP) routing tables, reducing the scanning space. This allows the routing worm to spread up to three times faster than worms that scan the entire IPv4 address space. Additionally, the geographic information from BGP tables enables selective attacks targeting specific countries, companies, or networks. Routing worms pose new security challenges and are important to study for simulating internet-scale worm propagation.
Adequacy of checksum algorithms for computer virus detectionUltraUploader
Checksums are used to detect changes in files and detect computer viruses. However, checksum algorithms that detect random errors are not sufficient against an entity deliberately trying to fool the checksum. An attacker can insert forged data that generates the same checksum as the original. The paper discusses three types of attacks against checksums (brute force, birthday, trap door) and features needed for checksum algorithms to be resistant to attackers, such as having a long length and being non-invertible. It also describes tests to evaluate if a checksum algorithm provides even mapping and is non-invertible.
A software authentication system for the prevention of computer virusesUltraUploader
This document proposes a software authentication system to help prevent the spread of computer viruses. It begins by reviewing existing anti-virus approaches like pattern-matching and checksum verification. These approaches have limitations, as they cannot detect new viruses and rely on an assumption that all software is virus-free. The proposed system aims to overcome these issues by having software vendors digitally sign their programs, without requiring a central trusted authority. Vendors generate their own keys, and certification centers hold pieces of the private key to authenticate signatures locally without user interaction. This allows vendors to verify software authenticity while maintaining independence.
Anti disassembly using cryptographic hash functionsUltraUploader
This document proposes and evaluates a new method of anti-disassembly for computer viruses based on cryptographic hash functions. It uses dynamic code generation to obscure viral code until runtime, making static analysis difficult. The method finds byte sequences or "runs" within hash function outputs by brute-forcing salt values concatenated to an input key. Empirical tests found salts to produce desired runs for MD5, SHA-1 and SHA-256 in hours on a desktop computer, demonstrating the method's viability. It is portable, targeted, and the code is never present in analyzable form before running.
A study of detecting computer viruses in real infected files in the n-gram re...UltraUploader
This document discusses detecting computer viruses in real-infected executable files using machine learning methods and n-gram representations. It created three datasets: benign files, virus loader files, and real infected executable files. Experiments showed that while machine learning methods can detect viruses in small virus loader files, detecting viruses is nearly impossible in real infected executable files when represented as n-grams due to the high dimensionality and low information content of the n-gram vectors. The document explores this limitation from an information theoretic perspective and through classification experiments.
A memory symptom based virus detection approachUltraUploader
This document proposes a new memory symptom-based approach for virus detection. It summarizes that traditional anti-virus software relies on pattern matching which has problems like damage during the zero-day period before new virus patterns are added. The proposed approach detects viruses based on their memory usage symptoms during execution, rather than pattern matching. An experiment analyzed the memory usage curves of 109 test programs and was able to detect viruses with 97% true positive rate and only 13% false positive rate, showing the effectiveness of this symptom-based approach.
A physiological decomposition of virus and worm programsUltraUploader
This thesis presents a physiological model of viruses and worms by identifying their distinct functional organs or components. It analyzes existing virus and worm programs to understand how they implement malicious behaviors. The goal is to provide a framework for developing proactive techniques to detect known and unknown viruses based on their physiological properties, unlike current reactive antivirus methods. The thesis studies virus and worm implementations in Microsoft VBA and VBScript to illustrate key behaviors. It aims to advance the field from a reactive to proactive approach through a deeper understanding of viral physiology.
MINING PATTERNS OF SEQUENTIAL MALICIOUS APIS TO DETECT MALWAREIJNSA Journal
In the era of information technology and connected world, detecting malware has been a major security concern for individuals, companies and even for states. The New generation of malware samples upgraded with advanced protection mechanism such as packing, and obfuscation frustrate anti-virus solutions. API call analysis is used to identify suspicious malicious behavior thanks to its description capability of a
software functionality. In this paper, we propose an effective and efficient malware detection method that uses sequential pattern mining algorithm to discover representative and discriminative API call patterns. Then, we apply three machine learning algorithms to classify malware samples. Based on the experimental results, the proposed method assures favorable results with 0.999 F-measure on a dataset including 8152
malware samples belonging to 16 families and 523 benign samples.
This document summarizes a study on automatically generating heuristic classifiers to detect unknown Win32 viruses. The researchers constructed multiple neural network classifiers using distinct n-gram features from virus samples. They found that combining the individual classifier outputs using voting reduced false positives to a very low level while only slightly increasing false negatives. This voting approach made it practical to retrain the classifiers regularly on updated sample sets.
IRJET - Heuristic Approach to Intrusion Detection SystemIRJET Journal
1. The document proposes a hybrid intrusion detection system called SPARTAN that combines signature-based and anomaly-based detection to improve detection rates and reduce false positives/negatives.
2. It analyzes the limitations of signature-based and anomaly-based detection alone and argues their combination can leverage the benefits of both.
3. SPARTAN's framework includes a signature detection engine, anomaly detection engine, and signature generation engine that work together to detect known attacks, identify anomalies, and automatically update signatures without much administrator intervention.
A Smart Fuzzing Approach for Integer Overflow DetectionITIIIndustries
Fuzzing is one of the most commonly used methods to detect software vulnerabilities, a major cause of information security incidents. Although it has advantages of simple design and low error report, its efficiency is usually poor. In this paper we present a smart fuzzing approach for integer overflow detection and a tool, SwordFuzzer, which implements this approach. Unlike standard fuzzing techniques, which randomly change parts of the input file with no information about the underlying syntactic structure of the file, SwordFuzzer uses online dynamic taint analysis to identify which bytes in the input file are used in security sensitive operations and then focuses on mutating such bytes. Thus, the generated inputs are more likely to trigger potential vulnerabilities. We evaluated SwordFuzzer with an example program and a number of real-world applications. The experimental results show that SwordFuzzer can accurately locate the key bytes of the input file and dramatically improve the effectiveness of fuzzing in detecting real-world vulnerabilities
IRJET- Android Malware Detection using Machine LearningIRJET Journal
This document discusses using machine learning algorithms to detect Android malware. It aims to extract features from Android applications (APKs) and train machine learning models to classify APKs as malware or benign. The proposed approach extracts features from an APK's manifest file and decompiled code to identify permissions, URLs, API calls, and other indicators. Random forest classifiers are trained on a dataset of benign and malicious APKs to detect known malware families. The models can classify new APKs as either malware or benign, and if malware, identify the specific malware family. The approach aims to detect malware with high accuracy while reducing analysis time by processing multiple APKs in parallel.
This document proposes a data mining framework to automatically detect new malicious executables. It extracts features from binaries and uses three data mining classifiers trained on these features: a rule learner, probabilistic classifier, and multi-classifier system. When evaluated on a test set, the framework detects 97.76% of new malicious binaries, more than doubling the detection rate of a signature-based method.
Malware Dectection Using Machine learningShubham Dubey
Malware detection is an important factor in the security of the computer systems. However, currently utilized signature-based methods cannot provide accurate detection of zero-day attacks and polymorphic viruses. That is why the need for machine learning-based detection arises.
A Security Analysis Framework Powered by an Expert SystemCSCJournals
Today\'s IT systems are facing a major challenge in confronting the fast rate of emerging security threats. Although many security tools are being employed within organizations in order to standup to these threats, the information revealed is very inferior in providing a rich understanding to the consequences of the discovered vulnerabilities. We believe expert systems can play an important role in capturing any security expertise from various sources in order to provide the informative deductions we are looking for from the supplied inputs. Throughout this research effort, we have built the Open Security Knowledge Engineered (OpenSKE) framework (http://code.google.com/p/openske), which is a security analysis framework built around an expert system in order to reason over the security information collected from external sources. Our implementation has been published online in order to facilitate and encourage online collaboration to increase the practical research within the field of security analysis.
Malware Detection Using Machine Learning TechniquesArshadRaja786
Malware viruses can be easily detected using machine learning Techniques such as K-Mean Algorithms, KNN algorithm, Boosted J48 Decision Tree and other Data Mining Techniques. Among them J48 proved to be more effective in detecting computer virus and upcoming networks worms...
"Быстрое обнаружение вредоносного ПО для Android с помощью машинного обучения...Yandex
В докладе речь пойдёт о применении алгоритмов машинного обучения для обнаружения вредоносных приложений для Android. Я расскажу, как на базе Матрикснета в Яндексе был спроектирован высокопроизводительный инструмент для решения этой задачи. А также продемонстрирую, в каких случаях аналитические методы выявления вредоносного ПО помогают блокировать множество простых образцов вирусного кода. Затем мы поговорим о том, как можно усовершенствовать такие методы для обнаружения более хитроумных вредных программ.
Misuse detection approaches aim to detect known intrusion patterns by encoding them as signatures. Signatures precisely define the patterns of events that characterize an intrusion. This allows misuse detection to be fully effective against known attacks but provides no detection for unknown attacks. Common misuse detection techniques include pattern matching, rule-based systems, and state-based analysis. Pattern matching searches for encoded signatures in audit data, while rule-based systems apply rules to detect intrusion scenarios. State-based systems represent intrusions as state transitions to identify compromised system states.
This document provides an overview of software testing concepts and best practices. It discusses why software testing is important given that no software is 100% defect-free. It then covers testing objectives, principles, techniques including black-box and white-box testing, testing levels, test management, and characteristics of a good software tester. The document emphasizes that testing should start early and continue throughout the software development life cycle.
robust malware detection for iot devices using deep eigen space learningVenkat Projects
Internet of Things (IoT) in military settings generally consists of a diverse range of Internet-connected devices and nodes (e.g. medical devices and wearable combat uniforms). These IoT devices and nodes are a valuable target for cyber criminals, particularly state-sponsored or nation state actors. A common attack vector is the use of malware. In this paper, we present a deep learning based method to detect Internet Of Battlefield Things (IoBT) malware via the device’s Operational Code (OpCode) sequence. We transmute OpCodes into a vector space and apply a deep Eigenspace learning approach to classify malicious and benign applications. We also demonstrate the robustness of our proposed approach in malware detection and its sustainability against junk code insertion attacks. Lastly, we make available our malware sample on Github, which hopefully will benefit future research efforts (e.g. to facilitate evaluation of future malware detection approaches).
Analysis, Anti-Analysis, Anti-Anti-Analysis: An Overview of the Evasive Malwa...Marcus Botacin
Slides da minha apresentação no Simpósio Brasileiro de Segurança da Informação e de Sistemas Computacionais (SBSEG 2017) referente a minha co-orientação do aluno Vitor Falcão: Um survey sobre técnicas de anti-análise e contramedidas para lidar com software malicioso (malware), includindo técnicas de anti-vm, anti-debug e anti-disassembly.
Classification of Malware based on Data Mining Approachijsrd.com
This document discusses a system called the Intelligent Malware Detection System (IMDS) that uses data mining techniques to classify malware. The IMDS uses a PE parser to extract API execution sequences from Windows portable executable files. It then applies an OOA mining algorithm called OOA_Fast_FP-Growth to generate association rules from the API sequences to classify files as malware or benign. Experimental results showed the IMDS outperformed other classification techniques and anti-virus software in detecting malware.
Advanced routing worm and its security challengesUltraUploader
An advanced routing worm is presented that can propagate faster than traditional worms. It does this by only scanning addresses that are routable based on Border Gateway Protocol (BGP) routing tables, reducing the scanning space. This allows the routing worm to spread up to three times faster than worms that scan the entire IPv4 address space. Additionally, the geographic information from BGP tables enables selective attacks targeting specific countries, companies, or networks. Routing worms pose new security challenges and are important to study for simulating internet-scale worm propagation.
Adequacy of checksum algorithms for computer virus detectionUltraUploader
Checksums are used to detect changes in files and detect computer viruses. However, checksum algorithms that detect random errors are not sufficient against an entity deliberately trying to fool the checksum. An attacker can insert forged data that generates the same checksum as the original. The paper discusses three types of attacks against checksums (brute force, birthday, trap door) and features needed for checksum algorithms to be resistant to attackers, such as having a long length and being non-invertible. It also describes tests to evaluate if a checksum algorithm provides even mapping and is non-invertible.
A software authentication system for the prevention of computer virusesUltraUploader
This document proposes a software authentication system to help prevent the spread of computer viruses. It begins by reviewing existing anti-virus approaches like pattern-matching and checksum verification. These approaches have limitations, as they cannot detect new viruses and rely on an assumption that all software is virus-free. The proposed system aims to overcome these issues by having software vendors digitally sign their programs, without requiring a central trusted authority. Vendors generate their own keys, and certification centers hold pieces of the private key to authenticate signatures locally without user interaction. This allows vendors to verify software authenticity while maintaining independence.
Anti disassembly using cryptographic hash functionsUltraUploader
This document proposes and evaluates a new method of anti-disassembly for computer viruses based on cryptographic hash functions. It uses dynamic code generation to obscure viral code until runtime, making static analysis difficult. The method finds byte sequences or "runs" within hash function outputs by brute-forcing salt values concatenated to an input key. Empirical tests found salts to produce desired runs for MD5, SHA-1 and SHA-256 in hours on a desktop computer, demonstrating the method's viability. It is portable, targeted, and the code is never present in analyzable form before running.
A study of detecting computer viruses in real infected files in the n-gram re...UltraUploader
This document discusses detecting computer viruses in real-infected executable files using machine learning methods and n-gram representations. It created three datasets: benign files, virus loader files, and real infected executable files. Experiments showed that while machine learning methods can detect viruses in small virus loader files, detecting viruses is nearly impossible in real infected executable files when represented as n-grams due to the high dimensionality and low information content of the n-gram vectors. The document explores this limitation from an information theoretic perspective and through classification experiments.
A memory symptom based virus detection approachUltraUploader
This document proposes a new memory symptom-based approach for virus detection. It summarizes that traditional anti-virus software relies on pattern matching which has problems like damage during the zero-day period before new virus patterns are added. The proposed approach detects viruses based on their memory usage symptoms during execution, rather than pattern matching. An experiment analyzed the memory usage curves of 109 test programs and was able to detect viruses with 97% true positive rate and only 13% false positive rate, showing the effectiveness of this symptom-based approach.
A physiological decomposition of virus and worm programsUltraUploader
This thesis presents a physiological model of viruses and worms by identifying their distinct functional organs or components. It analyzes existing virus and worm programs to understand how they implement malicious behaviors. The goal is to provide a framework for developing proactive techniques to detect known and unknown viruses based on their physiological properties, unlike current reactive antivirus methods. The thesis studies virus and worm implementations in Microsoft VBA and VBScript to illustrate key behaviors. It aims to advance the field from a reactive to proactive approach through a deeper understanding of viral physiology.
This document proposes a general, formal definition of malware as a single sentence in modal logic. It defines malware as software that causes incorrectness in other software. It also derives formal definitions for benign software (benware), anti-malware, and medical software for affected systems (medware) based on this definition of malware. The document argues that its definition of malware is abstract and independent of specific malware manifestations, making it generally applicable.
A filter that prevents the spread of mail attachment-type trojan horse comput...UltraUploader
The document describes a mail filter developed to prevent the spread of the "W32/Sircam" computer worm via email attachments. The filter checks for specific phrases and file extensions associated with W32/Sircam. When implemented on an email server, the filter successfully rejected all emails containing W32/Sircam without delaying other email traffic, ending the worm outbreak on the network within 4 days. The filter demonstrates how targeted filtering can block malware spread through email without overly restricting normal email flows.
This document summarizes a technical report about a framework called Honeyd that allows for the creation of virtual honeypots. Honeyd simulates entire computer systems at the network level by emulating different operating system behaviors and can provide arbitrary routing topologies and services. It is designed to deceive network fingerprinting and mapping tools for security purposes such as detecting attacks and studying adversaries.
This document proposes an email worm vaccine architecture that uses behavior-based anomaly detection to intercept incoming emails and scan attachments in virtual machines to detect malicious software. The system includes a virtual machine cluster to open attachments safely, a host-based intrusion detection system to monitor for dangerous behaviors, and an email-aware mail transfer agent to classify messages and communicate with the detection system. The implementation demonstrates detecting malware using parallel virtual machines while maintaining a low false positive rate.
This document discusses anti-virus strategies for large corporations. It notes that virus outbreaks are getting out of hand as users won't scan or care about viruses. The document examines various anti-virus technologies like scanners, memory resident programs, device drivers, and heuristic analysis. It recommends a multi-layered approach using different technologies along with clear policies, education of staff, and independent testing of anti-virus software.
A semantics based approach to malware detectionUltraUploader
This document proposes a semantics-based framework for malware detection that reasons about program behavior. It defines what it means for a detector to be sound and complete with respect to obfuscations. The framework uses trace semantics to characterize program behaviors and abstract interpretation to ignore irrelevant details. It shows this framework in action by proving a previous semantics-aware malware detector is complete against common obfuscations.
This document discusses a potential type of malicious software called a "semantic virus" that could undermine the utility of semantic search engines by generating and submitting random, syntactically valid but semantically incorrect RDF data. The virus would take existing RDF triples as input and generate new, related triples that are fake, such as claiming "Galway is part of England". It could submit this noisy data at scale using services like Ping The Semantic Web. While digital signatures can address authentication, there is an open challenge to validate the quality and accuracy of semantic data at internet scale.
An overview of computer viruses in a research environmentUltraUploader
This document provides an overview of computer viruses in a research environment. It examines computer viruses as malicious software, relates them to models of security and integrity, and looks at research techniques for controlling threats from viruses and other malicious code. The document focuses on environments like UNIX and VAX/VMS operating systems used for research and development.
Analysis and detection of metamorphic computer virusesUltraUploader
This document presents research on analyzing and detecting metamorphic computer viruses. It examines four virus creation kits to quantify their ability to generate variant viruses. Hidden Markov models are developed and shown to accurately classify viruses as belonging to a particular family, outperforming three commercial virus scanners. A simpler detection method using similarity indexing also detects metamorphic viruses with high accuracy. The research demonstrates that current commercial scanners cannot detect the most advanced metamorphic viruses, but the presented techniques can detect such viruses effectively.
An approach to containing computer virusesUltraUploader
This document proposes a mechanism for detecting modifications to executable files using encryption. The mechanism works by encrypting executables using a cryptosystem. At runtime, the encrypted executable is decrypted and checked for modifications before being executed. If modifications are detected, the proper authorities are notified. While this mechanism cannot prevent modifications, it can detect them when they occur and limit the potential damage of computer viruses. The mechanism is most effective when all executables are encrypted, but can still provide some protection even if not all files are encrypted.
A theoretical model of differential social attributions toward computing tech...UltraUploader
This document proposes a theoretical model to explain how and why people attribute human characteristics to computing technologies. It discusses how the common practice of anthropomorphism, or ascribing human traits to computers, can have both positive and negative effects. The model suggests that an individual's perspective on the computer exists on a continuum between viewing it as a locally simplex tool or a globally complex entity. One's position is influenced by the social cues of the technology, the person's self-evaluation, the interaction context, and available attributional cues. The goal is to better understand this phenomenon and guide future research on social interactions with technology.
This summary provides the key details about a generic virus scanner in C++ described in the document:
The document describes a generic virus scanner implemented in C++ that can scan files for viruses across different file systems, file types, and operating systems. It defines an abstract class called VirInfo that encapsulates common virus features, and subclasses can be used to define viruses that infect different systems. The scanner's general design allows it to potentially scan for other types of threats beyond just viruses. Signature scanning is identified as the most common and effective method for detecting known viruses described in the document.
A fault tolerance approach to computer virusesUltraUploader
This document discusses applying fault tolerance techniques to address the problem of computer viruses. Specifically, it proposes using Program Flow Monitors (PFMs) and N-Version Programming (NVP) to detect and contain computer viruses. PFMs involve monitoring program control flow to detect deviations from expected behavior, which could indicate a virus. NVP involves running multiple versions of a program and voting on the outputs to mask any errors introduced by viruses or faults in one version. The document characterizes computer viruses as both faults and errors that can propagate through systems, and analyzes how PFMs and NVP could address the different stages of a virus's behavior.
Are current antivirus programs able to detect complex metamorphic malware an ...UltraUploader
This document presents research evaluating current antivirus programs' ability to detect complex metamorphic malware. The researchers designed a metamorphic engine to mutate malware code while preserving functionality. They applied this engine to the MyDoom worm and tested major antivirus products. Most products were unable to reliably detect the mutated MyDoom, demonstrating limitations in static detection techniques and the need for dynamic behavioral analysis to identify evolved malware variants.
Architecture of a morphological malware detectorUltraUploader
This document proposes an architecture for a morphological malware detector that combines syntactic and semantic analysis. It builds an efficient signature matching engine using tree automata techniques to represent control flow graphs (CFG). It also describes a graph rewriting engine to handle common malware mutations. The detector extracts CFGs from malware binaries to generate signatures, which are compiled into a minimal automaton database for efficient matching. Experiments showed promising results with a low false positive rate.
This document introduces a method for anomaly detection in Unix processes based on defining "normal" as short sequences of system calls. Initial experiments found this definition to be stable during normal behavior and able to detect common intrusions. The method builds a database of normal system call sequences from process traces and flags abnormalities as sequences not in the database. Testing on sendmail found the database captured normal variations while detecting anomalies introduced by documented attacks.
A fast static analysis approach to detect exploit code inside network flowsUltraUploader
This document proposes a static analysis approach to detect exploit code within network flows. It aims to distinguish between data and executable code by looking for evidence of meaningful data and control flow patterns within binary code fragments recovered through disassembly, without knowing the exact location of the code. The approach is evaluated on real network trace data and is shown to detect a variety of exploit code, including polymorphic and metamorphic variants. It also automatically generates precise signatures to complement signature-based detection systems.
This document discusses and compares signature-based and behavior-based anti-malware approaches. Signature-based detection identifies malware by matching patterns in software to known malware signatures but is susceptible to evasion and cannot detect new malware. Behavior-based detection monitors program behaviors and flags anomalous behaviors as potentially malicious, but it can produce false positives and be evaded through mimicry attacks. The document also describes specification-based monitoring, a behavior-based technique that mediates program events according to security policies.
This document discusses and compares signature-based and behavior-based anti-malware approaches. Signature-based detection identifies malware by matching patterns in software to known malware signatures but is susceptible to evasion and cannot detect new malware. Behavior-based detection monitors program behaviors and flags anomalous behaviors as potentially malicious, but it can produce false positives and be evaded through mimicry attacks. The document also describes specification-based monitoring, a behavior-based technique that mediates program events according to security policies.
System Event Monitoring for Active AuthenticationCoveros, Inc.
The authors use system event monitoring to distinguish between the behavioral characteristics of normal and anomalous computer system users. Identifying anomalous behavior at the system event level diminishes privacy concerns and supports the identification of cross-application behavioral patterns.
Optimised Malware Detection in Digital Forensics IJNSA Journal
This summarizes a research paper that proposes developing a new framework to optimize malware detection in digital forensics investigations. The paper discusses challenges with existing detection methods, such as signature-based approaches requiring extensive manual analysis. Through a market research survey of forensics professionals, the paper finds weaknesses in current skills, tools, and accuracy rates. Most respondents agreed a new customized detection tool is needed that employs both dynamic and static analysis methods. The proposed framework aims to address these issues to more effectively detect and analyze malware.
A generic virus detection agent on the internetUltraUploader
VICEd is a system for generic virus detection over the Internet. It detects viruses based on their behavior rather than pattern matching, making it more effective against unknown or mutated viruses. It uses an emulator to simulate program execution and generate behavior sequences, and a virus analyzer containing rules to detect known virus behaviors. This allows detection to occur remotely over the Internet, with the user running the emulator locally and sending results for analysis by the provider.
Computer Worms Based on Monitoring Replication and Damage: Experiment and Eva...IOSRjournaljce
This document describes experiments conducted to evaluate a proposed worm detection system (WDS) and its ability to detect computer worms. The experiments involved networking three machines and transferring files between them that may contain worms. Five known worms that cause specific types of damage were tested to evaluate if the WDS could detect the worms through their damage or replication behavior. Additional experiments tested the WDS's ability to detect unknown worms and known worms through signature matching. The results showed that the WDS successfully detected the worms and met all evaluation criteria.
A Study On Countermeasures Against Computer Virus Propagation Using An Agent-...Sara Perez
This document describes a study that uses an agent-based simulation approach to model the propagation of computer viruses and evaluate potential countermeasures. The study models a computer network as spots that agents (computers) can move between. Computer states include virus infection levels and data/OS status. A simulation tracks the movement of agents and changes in their states over time according to probabilistic infection, detection, and countermeasure scenarios. The simulation aims to analyze virus spread when an infected computer connects to an intranet and evaluate how disconnection or security patches affect protection.
This document discusses the development of a software package that combines virus protection, key logging, and download management capabilities. It provides code snippets for each of these components, including a download manager class, virus detector class, and key logger class. The goal was to create popular software for both personal and corporate users by bundling useful utilities with potential privacy concerns like key logging.
Intrusion detection system based on web usage miningIJCSEA Journal
This artical present a system developed to find cyber threats automatically based on web usage mining
methods in application layer. This system is an off-line intrusion detection system which includes different
part to detect attacks and as a result helps find different kinds of attacks with different dispersals. In this
study web server access logs used as the input data and after pre-processing, scanners and all identified
attacks will be detected. As the next step, vectors feature from web access logs and parameters sent by
HTTP will derived by three different means and at the end by employment of two clustering algorithms
based on K-Means, anomaly behaviour of data are detached. Tentative results derived from this system
represent that used methods are more applicable than similar systems because this system covers different
kinds of attacks and mostly increase the accuracy and decrease false alarms.
APPBACS: AN APPLICATION BEHAVIOR ANALYSIS AND CLASSIFICATION SYSTEMijcsit
Number and complicacy of malware attack has increased multiple folds in recent times. Informed Internet
users generally keep their computer protected but get confused when it comes to execute the untrusted
applications. In such cases users may fall prey to malicious applications. There are malware behavior
analyzers available but leave report analysis to the user. Common users are not trained to understand and
analyze these reports, and generally expect direct recommendation whether to execute this application on
their computer. This research paper tries to analyze behavior and help the common users and analysts to
quickly classify an application as safe or malicious.
Automatic static unpacking of malware binariesUltraUploader
This document discusses an approach to automatically unpacking malware binaries through static analysis techniques. It begins by explaining the problem with relying solely on dynamic analysis for unpacking malware, as dynamic techniques are susceptible to anti-monitoring defenses. The document then presents its approach, which first uses program analysis to identify potential self-modifying code that could indicate unpacking. It focuses on finding "transition points" where control flow moves from unmodified to modified code. The code between these transition points represents the static unpacker that can be analyzed to perform unpacking without prior knowledge of the packing algorithm. Control and data flow analysis are then used to identify and potentially neutralize any dynamic defenses in the unpacker.
Optimised malware detection in digital forensicsIJNSA Journal
On the Internet, malware is one of the most serious threats to system security. Most complex issues and
problems on any systems are caused by malware and spam. Networks and systems can be accessed and
compromised by malware known as botnets, which compromise other systems through a coordinated
attack. Such malware uses anti-forensic techniques to avoid detection and investigation. To prevent systems
from the malicious activity of this malware, a new framework is required that aims to develop an optimised
technique for malware detection. Hence, this paper demonstrates new approaches to perform malware
analysis in forensic investigations and discusses how such a framework may be developed.
A method for detecting abnormal program behavior on embedded devicesRaja Ram
The document presents a method for detecting abnormal program behavior on embedded devices using a self-organizing map (SOM) approach. It extracts features from the processor's program counter and cycles per instruction, and uses these features to train an unsupervised SOM to classify program behavior. Testing on an ARM Cortex-M3 processor showed the method can identify unknown program behaviors not in the training set with over 98.4% accuracy.
This document discusses enhancing SIEM correlation rules through baselining. It proposes using baselining to identify anomalous events in addition to rule-based correlation. Three approaches are suggested for updating baselines: 1) using a static window baseline, 2) using an extended window to dynamically update the baseline, and 3) using a fixed sliding time window to dynamically prune historical data effects. The goal is to reduce false positive alerts by investigating different baseline training techniques and lifecycle updating approaches.
Similar to A model for detecting the existence of unknown computer viruses in real time (20)
This document is the manual for PHP, the PHP Documentation Group's copyright from 1997 to 2002. It contains information about installing and configuring PHP on various operating systems like Unix, Linux, Windows, etc. It also covers PHP syntax, functions, classes, and other features. The manual is distributed under the GNU General Public License and parts of it are also distributed under the Open Publication License. It was translated into Italian with contributions from multiple people.
Broadband network virus detection system based on bypass monitorUltraUploader
The document describes a Broadband Network Virus Detection System (VDS) based on bypass monitoring that can detect viruses on high-speed networks. The VDS uses four detection engines to analyze network traffic for viruses based on binary content, URLs, emails, and scripts. It accurately logs statistical information on detected viruses like name, source/target IPs, and spread frequency. The VDS mirrors network traffic to a detection engine in real-time without needing to reassemble packets into files. This allows it to efficiently detect viruses directly in network packets or data streams on gigabit-speed networks.
This document discusses botnets and their applications. It begins with an overview of botnets, how they are controlled through command and control servers, and how rootkits can help conceal botnet activity. It then explores how botnets can be used for spam, phishing, click fraud, identity theft, and distributed denial-of-service attacks. Detection and mitigation techniques are also summarized, including network intrusion detection, honeynets, DNS monitoring, and modeling botnet propagation across timezones. Recent botnets like AgoBot, PhatBot, and Bobax are also examined in the context of spam distribution. Open research questions around botnet membership detection, click fraud detection, and phishing detection are presented.
Bot software spreads, causes new worriesUltraUploader
Bot software infects millions of computers worldwide without the owners' knowledge and turns them into zombies that perform malicious tasks as part of a bot network. These bot networks, which can include thousands of infected computers, are used to spread viruses and worms, send spam emails, install spyware, and launch denial-of-service attacks. While initially just an automated way to spread malware, bot networks are now also used for criminal activities like identity theft due to their ability to stealthily command a large number of compromised computers. Security experts warn that the proliferation of bot networks poses serious risks and is very difficult to stop given their automation and scale.
Blended attacks exploits, vulnerabilities and buffer overflow techniques in c...UltraUploader
The document discusses blended threats that combine exploits and vulnerabilities with computer viruses. It begins with definitions of blended attacks and buffer overflows. It then describes three generations of buffer overflow techniques as well as other vulnerabilities exploited by blended threats, such as URL encoding and MIME header parsing. The document also discusses past threats like the Morris worm and CodeRed that blended exploits with viruses, and techniques used to combat future blended threats through defense in depth.
Win32/Blaster was a worm that exploited a vulnerability in Windows RPC to infect systems running Windows 2000 and Windows XP. It installed itself to automatically run on startup and then attempted to infect other systems on the local network and randomly selected IP addresses. The infection process involved exploiting the RPC vulnerability to execute a remote shell, downloading the worm binary, and executing it. It also launched a SYN flooding DDoS attack against Windows Update sites each month after the 16th. The worm spread quickly after the vulnerability was disclosed and highlighted the increasing automation and harm of worms.
Bird binary interpretation using runtime disassemblyUltraUploader
The document describes BIRD (Binary Interpretation using Runtime Disassembly), a binary analysis and instrumentation infrastructure for the Windows/x86 platform. BIRD combines static and dynamic disassembly to guarantee that every instruction in a binary is analyzed before execution. It provides services to convert binary code to assembly and insert instrumentation code without affecting program semantics. The prototype took 12 student months to develop and can successfully analyze applications like Microsoft Office, Internet Explorer, and IIS with low overhead of below 4%.
Biologically inspired defenses against computer virusesUltraUploader
This document discusses two biologically inspired approaches to computer virus detection and removal: a neural network virus detector that learns to identify infected and uninfected programs, and a computer immune system that can automatically identify, analyze, and remove new viruses from a system. The neural network technique has been incorporated into an IBM commercial antivirus product, while the computer immune system is still in prototype form. Both aim to replace human analysis of viruses to allow faster response times needed to address increasing rates of new virus creation and spread.
1. The document discusses biological viruses and computer viruses, providing background on how biological viruses work by hijacking cellular mechanisms of DNA replication, transcription and translation. It defines a computer virus as a piece of code with self-replicating ability that relies on other programs to exist, similar to biological viruses. 2. Computer viruses can cause damage by infecting programs which then infect other programs, potentially spreading like an epidemic across connected computers. 3. The document argues that a better understanding of biological and computer mechanisms can help improve defenses against viruses.
Biological aspects of computer virologyUltraUploader
This document discusses biological aspects of computer viruses and how factors that influence the spread of biological pathogens can also affect the propagation of computer malware. It analyzes three major factors that influence the spread of a computer worm: the infection propagator, which examines characteristics of exploited vulnerabilities like prevalence and age; the target locator, which focuses on how worms find new targets; and the worm's virulence, which looks at aspects that increase its infectiousness. The document suggests studying computer virus propagation through the lens of epidemiology models used for infectious diseases.
Biological models of security for virus propagation in computer networksUltraUploader
This document discusses how biological models of disease propagation and defense mechanisms in living organisms can inspire new approaches to computer network security and virus detection. Specifically, it describes how genetic regulatory networks that turn off harmful genes, protein interaction networks that model cellular processes, and epidemiological models of disease spread can provide models for automatically detecting and containing computer viruses without relying solely on pre-defined virus signatures. The authors propose several new security models drawing on these biological analogies, such as using surrogate code to maintain system functionality when parts are shut off, modeling network interactions to determine how viruses propagate, and evolving network services in real-time to reconstitute functionality after attacks.
Biological models of security for virus propagation in computer networks
A model for detecting the existence of unknown computer viruses in real time
1. A Model for Detecting the Existence
of Unknown Computer Viruses in Real-Time
Je rey M. Voas
Reliable Software Technologies
Penthouse Suite
1001 N. Highland Street
Arlington, VA 22210 USA
Je ery E. Payne
Reliable Software Technologies
Penthouse Suite
1001 N. Highland Street
Arlington, VA 22210 USA
Dr. Frederick B. Cohen
ASP
PO Box 81270
Pittsburgh, PA 15217 USA
Search terms: Computer viruses, Trojan horses, Models of computation, Virus detection
and prevention.
Abstract
This paper describes a model for detecting the existence of computer viruses in real-time.
1
2. 1 Background
Protection technologies in common use 5 are capable of preventing corruption by viruses
e.g. through mandatory access control, detecting known viruses e.g. by searching for
them, detecting speci c types of corruption as it occurs e.g. trapping the modi cation of
executable les in certain ways, and detecting corruption before it causes signi cant damage
e.g. through cryptographic checksums in integrity shells, but some points must be made
concerning their limited capabilities.
Any system that relies solely on access control to prevent corruption is no more secure
than the individuals or policies allow it to be. Even though sound access control
policies against viruses exist, the vast majority of current systems do not implement
these controls, and even if these controls are in place, current implementations are
imperfect. Even the most sound access control techniques only limit the extent of
transitive spread. 2
The use of known virus detection schemes doesn't detect viruses unknown to the writer
of the defense. New viruses cannot be detected with this technique, and the time and
space required for detection grows with the number of viruses.
Trapping attempts to modify executable les and other similar techniques don't detect
viruses corrupting non `executable' forms, corruption through the use of unusual input
parameters, or corruptions in forms other than those sought. They also prevent normal
system development activity under some circumstances and are thus of only limited
applicability. 5
In order for integrity shells to be ideal in an environment with change, we must have
some measure of the legitimacy of change. This cannot be e ective without some
understanding of intent. 3
Two of these virus detection techniques are posteriori in that they are ine ective until
after a corruption occurs. In the case of an integrity shell, detection occurs after `primary'
infection, and is capable of preventing `secondary' infection. In the case of known virus de-
tection techniques, detection doesn't normally occur until some human has detected damage,
traced it to a cause, and designed a method for detecting the cause. The a-priori technique
of trapping unusual modi cation behavior is quite weak in that it covers only a small portion
of the possible viruses. Access control techniques don't detect viruses, but only attempt to
limit their transitive spread.
Several attempts have been made to nd techniques which will reliably di erentiate
programs containing viruses from other programs by examination, dispite the widely know
result that this problem is undecidable 1 . Syntactic analysis has been attempted by several
authors 9 11 16 17 but none of these have a sound theoretical basis and all result
in in nite numbers of false positives and false negatives. In general, the problem of virus
2
3. detection is undecidable, and as we would therefore expect, in the most comprehensive study
published to date, a 50 false positive and 50 false negative rate was demonstrated 9 . The
use of evolutionary and self-encrypting viruses clearly demonstrates the futility of syntactic
analysis 4 , and virus attackers are now commonly applying this technique.
An alternative approach is to try to di erentiate between `legitimate' and `illegitimate'
behavior by somehow modeling intent. One way to do this is by building up a set of expec-
tations about the behavior of programs and detecting deviations from those expectations. A
closely related research result that may be helpful in understanding the present work is `rov-
ing emulation' 6 which has been proposed and simulated as a technique to detect faults at
run-time. Results on error latency have shown that e ective protection can be attained with
su cient roll-back capability. Other antivirus researchers have also used heuristic approaches
for run-time detection of virus-like activities 18 with substantial success.
In the remainder of this paper, we shall consider the concept of detecting behavioral
changes in some more depth. We begin by providing a mathematical model of program ex-
ecution and describing the di erence between legitimate and illegitimate program behavior
based on behavioral expectations. Next we describe some of the di culties in applying this
model to a practical situation by showing the complexity associated with simplistic applica-
tion of the model and some of the risks associated with less accurate implementations which
eliminate complexity in exchange for accuracy. We then describe some lines of research that
seem to have hope for providing adequate accuracy while maintaining reasonable complexity.
Finally, we summarize results, draw conclusions, and propose further work.
2 Some Formalities
We commonly refer to computer programs, and by that term, we intend to indicate a trans-
formation of the form 15 :
P:=I;O;S;f : I S ! S+
;O
where:
I = f1;:::;1g The integers
I = fi0;:::;img;m 2 I The input symbols
O = fo0;:::;ong;n 2 I The output symbols
S = S+
= fs0;:::;spg;p 2 I The internal next states
We also de ne I
as the set of all possible input sequences the Kleene closure of I
and the `histories' H of a Moore machine as the output and state sequences generated by
applying I
to the machine from each possible initial state.
A program executing in an undesirable manner, by de nition, has some undesired execu-
tion behavior, in that it produces undesirable outputs O or `next states' S+
. Since internal
program states may be observed during execution by program instrumentation, and dynamic
techniques exist to observe the propagation of program state errors 8, 13 , we may be able
to create similar techniques to detect certain types of undesirable behavior.
3
4. Using the Moore model of computation, we could di erentiate `expected' and `unex-
pected' behavior by instrumenting the state-space of a program as follows:
Before a virus modi es a program `Px' in storage, Px has some functionality
fx. If a modi ed form of Px acts di erently from the original, we have new
program Py with functionality fy, where fx 6= fy for at least one input sequence
I2 I
. As Py executes, and Iis provided as the input sequence, fy must produce
an internal state sy and or output sequence Oy that di ers from those of Py under
input I i.e. sy 6= sx or Oy 6= Ox.
Similarly, a `parameter altering' corruption may corrupt the behavior of a program Px
which normally operates on a subset Dx of all the possible inputs Dx I
by forcing it to
operate on a di erent subset Dy of inputs.
When some input sequence in a set Dy is provided to Px, we observe a mod-
i ed history denoted by HDy
, where HDx
6= HDy
and 6 9d 2 Dx;d 2 Dy. Any
di erences between HDx
and HDy
signal an unanticipated application of Px.
The analogy between corruptions caused by computer viruses or other malicious attacks
and program state errors caused by random faults or mistakes is the basis of this paper.
In this model, we detect corruption in real-time before the corruption propagates. This has
several advantages in non-viral attacks as well. For example, a common method for bypassing
access control is by providing unexpected input parameters to operating system calls in the
hope that the implementationfails to trap some condition and grants an inappropriate access.
Trojan horses pose similar problems in that they perform some unanticipated function under
some conditions. The same technique o ers hope for detecting Trojan horses when they
become active and thereby preventing some of their e ects.
In this model, we view an executing program as a series of dynamic state spaces that
are constantly being modi ed, instead of the conventional view of programs from as static
syntactic spaces. This is quite di erent from the previous e orts in virus detection, and
there is therefore some hope that this model will provide the basis for some form of rapid
real-time detection of unanticipated system activities. The problem with this model is that
for all but the most trivial of examples, the size of the state-space is enormous. The issue
that remains to be resolved is how to get the practical advantages of this technique without
the enormous time and space penalties inherent in its use.
3 Discussion
The model discussed here is of a deterministic Von Neumman computer in which a stream of
instructions is executed in order to e ect a required transformationof the input to the output.
4
5. We consider the program state of the computer at a point in execution to include all of the
information necessary to restart the computation if the executing program is interrupted.
In a synchronous nite state machine model of computation, the state is speci ed as the
memory state of all of the memory registers of the machine and the current clock value high
or low.
In practice, the state of a machine may not be that simple. In a timesharing system
with good process protection, the program state typically includes the state of all process
speci c processor registers and all of the internal memory of the program. In a personal
computer without good protection, the state of a program may be far more complex, but is
almost always encoded as the entire memory and register state of the machine. For a more
accurate restart of a process, many other factors may be involved, including the state of
input and output ports and bu ers, the disk state of the machine, the state of any mounted
devices, etc. We model all relevant state information as a set of variable value pairings.
State information is presented as a nite set fp1;p2;:::;png where each pi is an ordered pair
identi er, value, and where initial values may be indeterminate. For example:
fa, 5, b, 1, c, 300.2, d, unde ned, pc, 3g
might be part of the program state of a program with the four variables `a', `b', `c', anc `d',
immediately after executing statement 2 `pc' in this case indicates the `program counter'
register. The variable a has the value 5, b has the value 1, and c has the value 300.2.
The use of unde ned means that at some point in the execution of the program the value
of d became indeterminate i.e. no assertion can be made about its value at this point. If we
know only about the high level semantics of a program and do not wish to make assertions
about how the processor interprets instructions, an example of an indeterminate value would
be the value of a pointer after its storage has been freed.
For convenience, we may assume that all of the values associated with a given identi er
are of the same type. This is reasonable in the general case because we can assign each bit
of program state to a unique identi er, which will e ectively make them the same type. The
program state domain, D, is constructed by taking allpossible values of each component of the
program state at all instructions. To derive the history for any particular program execution
under any given input sequence, we can characterize each instruction in the executable
program in terms of its e ect on the program state.
There are many methods for doing this, and for most modern computers, we can do this
very easily through the use of a hardware de nition language and a simulator. For any single
input sequence, this is not particularly di cult, since in the worst case, we require only one
copy of the machine state for each instruction executed, and in most cases, each instruction
execution is simulatable in a xed time. Thus the time and space required is at most linear
with the number of instructions executed.
The complexity problems start when we wish to characterize a large number of di erent
input sequences. For each input symbol, we have as many possible program executions as
5
6. there are symbols in I, and in general, for an n-step program execution sequence, we have
jIjn possible executions. Assuming there are m bits of state information and that input
symbols require k bits to represent, we require m2kn
bits of state information. For any
non-trivial input set and program, this is enormous. For example, for a program with only
1 byte of state information and inputs of at most 1 byte each executing only 50 instructions
requires over 1 googol 10100
bytes of storage to characterize! 1
In a real computer, external inputs can come at any of a large number of di erent times,
and the normal processing mechanism allows `interrupts' which alter program execution
sequences between, or in rare cases during, instruction execution. Multiprocessing introduces
further uncertainties, particularly in DOS based computing environments where interprocess
interaction is truly arbitrary and uncontrolled. We simply cannot count on anything about
programs in these environments. Keeping the complexity issue in mind, we will press on in
the hope that we may eventually be able to collapse these very large numbers into manageable
and e ective protection mechanisms.
4 Reduced Complexity Models
One way to collapse much of the state information associated with a program is to assume
that there is a separation between `program' and `data'.
Let P consist of the set of m transitions 2
T = ft1;:::;tmg. Let Atj ;P;r;x represent the
program state that exists after executing instruction tj on the rth iteration of tj by input
x from domainP, where domainP represents all `valid'input sequences to program P.
There may be other inputs on which P could execute that would cause undesired states. Let
nx;tj
represent the number of times that instruction ij is executed by input x. Formally, D
is:
m
k=1 x2domainP
nx;tk
r=1
Atk ;P;r;x:
The execution of a single program instruction may, in general, change any number of
components of the data state, resulting in an entirely new data state, but in practice, the
vast majority of transitions alter only a very small portion of the data state. As a sequence of
transitions in a program is executed by the computer, the initial program state is successively
transformed until the program halts or loops inde nitely.
D represents the expected internal state behavior of P i.e. at any snapshot into the
execution of P, we should nd the program state of P is identical to some program state in
D. If this does not occur, then P has created a program state that is not possible according
12850 = 25650 10100
2a.k.a. instructions
6
7. to the domainP. This signals that P is in a state that cannot be achieved by any input in
domainP and suggests that:
1. P has received an input that is not in domainP, or
2. P has been altered by malicious code, or
3. The execution of P has not proceeded according to the model.
If P is receiving invalid inputs, it is not necessarily the case that a security violation has
occurred, however it is a situation that may require attention. It could be that some sensor
that sends inputs to P has failed. Or it could be that malicious code is sending perverse
input parameters to P. If the behavior of P has been altered, then we have a warning
that potentially malicious code has a ected the internal behavior of P. A hardware failure,
modeling error, or implementation inaccuracy could also result in an equivalent result.
To assure that the program states that are being created are not being in uenced by
malicious code, each program state created during execution needs to be checked against the
members of D. A theoretical algorithm for producing a warning that a program state has
been created that is not in D follows:
1. Create the set Dtk
=
x2domainP
nx;tk
r=1
Atk ;P;r;x
for each instruction tk in T. Dtk
contains every program state that could ever occur
after transition tk, given that the program input is in domainP.
2. During the execution of P in its operational environment, insert probes after each
transition tk to sample the program states being created. Determine whether the
sampled program states are in Dtk
. If they are not, produce a warning.
5 Building a Practical Virus Warning System
The cardinality of D is enormous, so it is infeasible to create and store every Dtk
. Even if
D were not so large, there may exist an input x for which nx;tk
is so large that Dtk
cannot
be stored. This means that the theoretical algorithm is generally impractical, however there
are several ways in which the algorithm can be partially implemented. Since we are unable
to fully implement the algorithm, we must accept a risk of false negatives.
To partially implement the theoretical model, we have many options. We could, for
example, store a small random sample of D at a random sampling of instructions; but
7
8. this would yield enormous numbers of false positives, since we could never hope to store a
substantial portion of D at any place, and thus the likelihood of an uncovered state would
be very high. A more directed approach might be to select the portion of the state to be
stored and places in the program at which to store and compare so that we know that certain
things are expected to be the case. This mechanism is provided in some computer languages,
in which invariants are speci ed by the programmer for the purpose of automating certain
aspects of program proofs 14 .
Another approach is to limit ourselves to particular classes of attacks. For example, we
could identify speci c instructions that computer viruses locate to attack executable code
and select those locations for performing tests. By assuming that each transition tk in a
target program does not have an equally likely chance of being attacked, we can reduce the
search space. Another approach might be to use information content measures to identify
instructions or state space characteristics that are more or less likely to occur in the program
being analyzed, and identify executions wherein these characteristics are not followed for an
identi able portion of the program execution.
The latter approach is particularly nice since we have to store only a very small amount
of data state information and evaluate a relatively simple metric in order to make a deter-
mination. Of course, the number of false positives and false negatives will depend heavily on
how tight the bounds of the program execution characteristics are, but then it is also quite
simple to alter the bounds by simply changing the metric for di erent applications.
If we are looking for particular types of attacks, we might also try to do a variant on
fault-tree analysis 7, 10 . We rst determine a set of unacceptable output states i.e. enu-
merate disastrous output states and then we apply a dynamic technique termed propagation
analysis 8, 12, 13 to determine where program states can be created in the program text
that could result in the disastrous output states that we enumerated. The key to making
propagation analysis e ective is the ability to simulate the internal behavior of viruses. As
an example, in a timesharing system we could identify places in the program where system
calls are made as a place where disastrous output could originate.
We do not pretend that this is trivial, but it can be made a lot simpler through additional
e ort in the speci cation and requirements phases during software development. In a sense,
this is a preliminary design-for-security step. Applying propagation analysis provides a list
of source locations in which a disastrous internal program state could be created that could
propagate producing undesirable outputs. We then map the source locations to the object
instructions i.e. nd the instructions that execute the source locations identi ed as dan-
gerous. Next we add instrumentation instructions to generate samplings for Dtk
s. During
normal execution, we place self-test instructions at these locations to detect variations.
8
9. 6 Summary, Conclusions, and Further Work
We have brie y explored the possibility of using experimental analysis of program behavior
to di erentiate between legitimate and illegitimate program execution, described some of
the complexity problems associated with this approach, and contemplated the possibility of
reducing that complexity to a reasonable level.
The general approach of detecting the intrusion by analyzing state information in exe-
cuting programs has the appeal that it could result in earlier detection than can be attained
through other general purpose techniques, while providing more generality than attack spe-
ci c defenses. The concept lends itself to detecting a very broad range of attacks including
Trojan horses and viruses, attacks generated by unanticipated input sequences, and even
unintentional corruptions resulting from transient or permanent faults.
This research is still in a very early stage, and clearly we can only draw limited con-
clusions. Although the general problem of virus detection is undecidable, the problem of
characterizing known program behavior in a nite state environment is only exponential in
the number of instructions executed. Furthermore, the upper bound on state sequences is
far higher than we would expect in normal program execution and complete state transition
information may not be required to detect many program variations. The possibility of a
practical defense based on this idea is therefore still an open question.
A great deal of further work will be required to determine the feasibility of this line of
defense in practical environments. More speci cally, tighter bounds on the complexity of
real program state spaces for particular classes of programs should be determined, and there
is a very real possibility that this will yield viable partial solutions. Several ideas have been
raised and analysis of these ideas may yield useful results, particularly for special classes
of programs such as those written in certain languages or those that compute particular
sorts of functions. There is also the possibility that programs generated from mathematical
speci cations may provide particular analytical advantages in detecting improper execution
and that programs written with additional protection related information may yield far more
e cient defensive techniques based on this concept.
References
1 F. Cohen. Computational Aspects of Computer Viruses. Computers Security, 8:325
344, 1989.
2 F. Cohen. Protection and Administration of Information Networks with Partial Order-
ings. Computers Security, 6:118 128, 1987.
3 F. Cohen. Models of Practical Defenses Against Computer Viruses. Computers
Security, 8:149 160, 1989.
9
10. 4 F. Cohen. A Short Course on Computer Viruses. ASP Press Pittsburgh, PA USA, 1991.
5 F. Cohen. Defense-In-Depth Against Computer Viruses. Computers Security, await-
ing publication, 1992
6 M. Breuer, F. Cohen, and A. Ismaeel. Roving Emulation. In Proceedings of Built-in
Self-test Conf., March 1983.
7 E. Henley and H. Kumamoto. Reliability Engineering and Risk Assessment. Princeton-
Hall, Inc., Englewood Cli s, N.J., 1981.
8 J. Voas, L. Morell, and K. Miller. Predicting Where Faults Can Hide From Testing.
IEEE Software, 82, March 1991.
9 M. Pozzo. PhD thesis, U. of California, Los Angeles, 1990.
10 R. Barlow, J. Fussell, and N. Singpurwalla. Reliability and Fault Tree Analysis: The-
oretical and Applied Aspects of System Reliability and Safety Assessment. Society for
Industrial and Applied Mathematics, 1975.
11 F. Skulason. The `F-prot' virus detection program uses several unpublished examples
of these techniques.
12 J. Voas. A Dynamic Technique for Predicting Where Data State Errors May be Created
that Cause Certain Classes of Software Failures. Submitted 1992.
13 J. Voas. A Dynamic Failure Model for Estimating the Impact that a Program Location
has on the Program. In Lecture Notes in Computer Science: Proc. of the 3rd European
Software Engineering Conf., volume 550, pages 308 331, Milan, Italy, October 1991.
Springer-Verlag.
14 D. Wortman. On legality assertions in EUCLID. IEEE Trans. on Software Engineering,
SE-54:359 366, July 1979.
15 E. Moore Gedanken Experiments on Sequential Machines in Automata Studies, C.
Shannon and J. McCarthy, ed. Princeton University Press, Princeton, NJ, pp129-153,
1956.
16 P. Kerchen, R. Lo, J. Crossley, G. Elkinbard, K. Levitt, and R. Olsson Static Analysis
Virus Detection Tools for Unix Systems. 13th National Computer Security Conference,
Washington, D.C., USA October, 1990
17 M. King Identifying and Controlling Undesirable Program Behaviors 14th National
Computer Security Conference, Washington, D.C., USA October, 1991
18 S. Chang, et. al. Internal documents and discussions with personnel at Trend Microde-
vices regarding their commercial antivirus product, 1990.
10