malware detection ppt for vtu project and other final year project
1. RAJARAJESWARI COLLEGE OF ENGINEERING
MYSORE ROAD BENGALURU-74
DEPARTMENT OF MASTER OF COMPUTER APPLICATIONS
PROJECT ON
DEEP LEARNING MALWARE DETECTION USING
AUTO ENCODER
SUBMITTED BY
BHAVYASHREE V
UNDER THE GUIDANCE OF
PROF. DEEPA K R
ASSISTANT PROFESSOR OF MCA
3. ABSTRACT
• Malware is malicious software, such as viruses, worms, Trojans, backdoors, and spyware, that is disseminated to
compromise the confidentiality, integrity, and functionality of a system. With computers and the Internet being
essential in everyday life, malware poses a serious threat to their security.
• The input dataset is taken from a dataset repository. It was created in a Unix/Linux-based virtual machine for
classification purposes and contains observations of benign and malware software for Android devices.
• The dataset consists of 100,000 observations and 35 features.
• We then implement machine learning algorithms such as random forest and CNN, and report the results in terms of
accuracy, precision, recall, and F1-score.
4. OBJECTIVES
The main objectives of our project are:
• To classify or detect malware in software.
• To implement machine learning algorithms.
• To enhance the overall performance of the classification algorithms.
• To classify or detect malware effectively.
5. INTRODUCTION
• With computers and the Internet being essential in everyday life, malware poses a serious threat to
their security.
• As a result, the detection of malware is of major concern to both the anti-malware industry and
researchers.
• Much research has been conducted on intelligent malware detection by applying data mining
and machine learning techniques in recent years.
• Most recently, machine learning is being used with better performance.
6. EXISTING SYSTEM
• The existing work evaluates classical machine learning algorithms (MLAs) and deep learning architectures for
malware detection, classification, and categorization using different public and private datasets.
• Its major contribution is in proposing a novel image processing technique with optimal parameters for
MLAs and deep learning architectures to arrive at an effective zero-day malware detection model.
• Overall, that paper paves the way for effective visual detection of malware using a scalable and hybrid
deep learning framework for real-time deployments.
7. DISADVANTAGES
• The results are lower than those of the proposed system.
• It is not efficient for large volumes of data.
• It has theoretical limits.
• The performance is considerably lower.
• A lower learning rate was found to be good in identifying the executable as either benign or
malware.
8. PROPOSED SYSTEM
• In this system, the malware dataset is taken as input from a dataset repository such as the UCI repository.
We then apply data pre-processing steps: checking for missing values to avoid wrong predictions, and
label encoding to convert the categorical data into numeric integer values.
• Then, we split the dataset into training and test sets. The training data is used to fit the model and
the test data is used to evaluate it.
• Then, we apply feature selection to choose the best features from the split data.
• Then, we implement the classification algorithms, i.e. machine learning methods such as random forest
and CNN. Finally, the experimental results are reported using performance metrics such as accuracy,
precision, and recall.
9. ADVANTAGES
• The experimental results are better than those of the existing system.
• The prediction is efficient.
• The results are classified effectively.
• Time consumption is low.
• It can handle packed malware and can work on various kinds of malware irrespective of the operating
system.
14. DATA SELECTION
• The input data was collected from a dataset repository.
• Data selection is the process of selecting the data used for detecting malware.
• In this project, we use the malware detection dataset.
• The dataset contains information such as the classification (malware or benign), host, etc.
• In Python, we read the dataset using the pandas package.
• Our dataset is stored as a '.csv' file.
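The loading step can be sketched as follows. This is a minimal illustration, not the project's actual code: the slides do not list the real column names or file name, so the tiny in-memory CSV and its columns (`hash`, `state`, `prio`, `classification`) are hypothetical stand-ins.

```python
import io

import pandas as pd

# Hypothetical stand-in for the project's '.csv' file; the column names
# below are illustrative assumptions, not the real dataset schema.
csv_text = """hash,state,prio,classification
a1,0,3069,malware
b2,1,3070,benign
c3,0,,malware
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)                              # (rows, columns)
print(df["classification"].value_counts())   # class balance
```

With the real file, the `io.StringIO(...)` argument would simply be replaced by the path to the '.csv' file.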
15. DATA PREPROCESSING
• Data pre-processing is the process of removing unwanted data from the dataset.
• Pre-processing transformation operations are used to put the dataset into a structure
suitable for machine learning.
• Missing data handling: null values, such as missing and NaN values, are
replaced by 0.
• Encoding categorical data: categorical data are variables with a finite set of label
values; these labels are encoded as integers.
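The two pre-processing steps can be sketched with pandas as below. The data frame and its column names are hypothetical, and the sorted-label mapping is just one simple way to do label encoding; the slides do not say which encoder the project used.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values; column names are assumptions.
df = pd.DataFrame({
    "state": [0.0, np.nan, 1.0, 0.0],
    "prio": [3069.0, 3070.0, np.nan, 3071.0],
    "classification": ["malware", "benign", "malware", "benign"],
})

clean = df.copy()

# Missing data handling: replace NaN in the numeric columns with 0.
numeric_cols = clean.select_dtypes(include="number").columns
clean[numeric_cols] = clean[numeric_cols].fillna(0)

# Label encoding: map each category to an integer code. Sorting the labels
# makes the mapping deterministic (benign -> 0, malware -> 1).
labels = sorted(clean["classification"].unique())
label_to_code = {label: code for code, label in enumerate(labels)}
clean["classification"] = clean["classification"].map(label_to_code)

print(clean)
```

Note that filling with 0 is the choice stated on the slide; it treats "missing" as a real value of 0, which may or may not suit every feature.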
16. DATA SPLITTING
• During the machine learning process, data are needed so that learning can take place.
• In addition to the data required for training, test data are needed to evaluate the performance of the
algorithm in order to see how well it works.
• In our process, we used 70% of our dataset as the training data and the remaining 30% as
the testing data.
• Data splitting is the act of partitioning the available data into two portions, usually for cross-validation
purposes.
• One portion of the data is used to develop a predictive model and the other to evaluate the model's
performance.
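A 70/30 split like the one described can be sketched with scikit-learn's `train_test_split`. The synthetic data, the fixed `random_state`, and the use of `stratify` (to keep the class ratio equal in both portions) are assumptions added for illustration; the slides only state the 70/30 ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic placeholder data: 100 observations, 5 features, balanced labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.array([0, 1] * 50)

# 70% training, 30% testing. random_state makes the split reproducible;
# stratify=y keeps the benign/malware ratio the same in both portions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```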
17. FEATURE SELECTION
• In our process, we perform feature selection using principal component
analysis (PCA).
• Principal Component Analysis is an unsupervised learning algorithm that is used for the
dimensionality reduction in machine learning.
• It is a statistical process that converts the observations of correlated features into a set of linearly
uncorrelated features with the help of orthogonal transformation.
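To make the "orthogonal transformation" concrete, here is a small NumPy sketch of PCA via the singular value decomposition. The synthetic data and the choice of two components are assumptions for illustration; in practice a library class such as scikit-learn's `PCA` would typically be used instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated features: column 2 is a noisy copy of column 1.
base = rng.normal(size=(200, 1))
X = np.hstack([
    base,
    base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

# Centre the data, then take the SVD. The rows of Vt are the orthonormal
# principal axes, ordered by decreasing singular value.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

n_components = 2  # illustrative choice
Z = X_centered @ Vt[:n_components].T  # projected, reduced features

# The projected features are linearly uncorrelated: their covariance
# matrix is diagonal up to floating-point error.
cov_Z = np.cov(Z, rowvar=False)

# Fraction of the total variance carried by each principal axis.
explained_ratio = (S ** 2) / np.sum(S ** 2)

print(cov_Z)
print(explained_ratio)
```

The near-zero off-diagonal entries of `cov_Z` are exactly the "linearly uncorrelated features" property described above.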
18. CLASSIFICATION
• In our process, we implement machine learning algorithms such as random forest and
logistic regression.
• The random forest is a classification algorithm consisting of many decision trees. It uses bagging
and feature randomness when building each individual tree to try to create an uncorrelated forest of
trees whose prediction by committee is more accurate than that of any individual tree.
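A random forest of this kind can be sketched with scikit-learn as follows. The data here is synthetic (generated with `make_classification`), and the hyperparameters are illustrative assumptions rather than the project's settings: `bootstrap=True` corresponds to bagging, and `max_features="sqrt"` to the per-split feature randomness.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the malware dataset.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

forest = RandomForestClassifier(
    n_estimators=100,      # number of decision trees in the committee
    max_features="sqrt",   # random subset of features tried at each split
    bootstrap=True,        # each tree is trained on a bootstrap sample (bagging)
    random_state=0,
)
forest.fit(X_train, y_train)

y_pred = forest.predict(X_test)            # majority vote of the trees
test_accuracy = forest.score(X_test, y_test)
print(test_accuracy)
```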
19. PERFORMANCE
• The final result is generated from the overall classification and prediction. The
performance of the proposed approach is evaluated using the following measures:
Accuracy
• The accuracy of a classifier is its ability to predict the class label correctly, i.e. the
fraction of all predictions on new data that are correct.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and
false negatives.
20. PERFORMANCE
Precision
• Precision is defined as the number of true positives divided by the number of true positives plus
the number of false positives.
Precision = TP / (TP + FP)
Recall
• Recall is the number of correct positive results divided by the number of results that should have been
returned, i.e. all actual positives. In binary classification, recall is also called sensitivity. Here it can be
viewed as the probability that a malware sample is detected by the classifier.
Recall = TP / (TP + FN)
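The accuracy, precision, and recall formulas can be checked on a small worked example. The labels below are made up for illustration (1 = malware, 0 = benign). The F1-score mentioned in the abstract is included using its standard definition, 2 · Precision · Recall / (Precision + Recall), which does not appear on these slides.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


# Made-up labels: 1 = malware, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)   # TP=3, TN=5, FP=1, FN=1

accuracy = safe_div(tp + tn, tp + tn + fp + fn)     # 8 / 10 = 0.8
precision = safe_div(tp, tp + fp)                   # 3 / 4  = 0.75
recall = safe_div(tp, tp + fn)                      # 3 / 4  = 0.75
f1 = safe_div(2 * precision * recall, precision + recall)  # 0.75

print(accuracy, precision, recall, f1)
```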
21. SYSTEM REQUIREMENTS
SOFTWARE REQUIREMENTS:
• O/S : Windows 7.
• Language : Python
• Front End : Anaconda Navigator – Spyder
HARDWARE REQUIREMENTS:
• System : Pentium IV 2.4 GHz
• Hard Disk : 200 GB
• Mouse : Logitech.
• Keyboard : 110 keys enhanced
• RAM : 4 GB
22. CONCLUSION
• We present a machine-learning-based method for the detection of malware attacks in
software.
• The research adopted an approach based on random forest and logistic regression,
which classified the attacks effectively.
• The experimental results indicate that the proposed approach outperformed the machine learning
algorithms and achieved the highest performance in terms of Accuracy, Precision and F1-score.