RAJARAJESWARI COLLEGE OF ENGINEERING
MYSORE ROAD BENGALURU-74
DEPARTMENT OF MASTER OF COMPUTER APPLICATIONS
PROJECT ON
DEEP LEARNING MALWARE DETECTION USING
AUTO ENCODER
SUBMITTED BY
BHAVYASHREE V
UNDER THE GUIDANCE OF
PROF. DEEPA K R
ASSISTANT PROFESSOR OF MCA
ABSTRACT
• Malware is malicious software distributed to compromise the confidentiality, integrity, and functionality of a system; examples include viruses, worms, Trojans, backdoors, and spyware. With computers and the Internet being essential in everyday life, malware poses a serious threat to their security.
• The input dataset is taken from a dataset repository. The observations were collected in a UNIX/Linux-based virtual machine for classification purposes and cover benign and malware software for Android devices.
• The dataset consists of 100,000 observations and 35 features.
• We then implement machine learning algorithms such as Random Forest and CNN, and report the results in terms of accuracy, precision, recall, and F1-score.
OBJECTIVES
The main objectives of our project are:
• To detect and classify malware in software.
• To implement machine learning algorithms.
• To enhance the overall performance of the classification algorithms.
• To detect malware effectively.
INTRODUCTION
• With computers and the Internet being essential in everyday life, malware poses a serious threat to
their security.
• As a result, the detection of malware is of major concern to both the anti-malware industry and
researchers.
• Much research has been conducted in recent years on intelligent malware detection using data mining and machine learning techniques.
• Most recently, machine learning approaches have been reported to give better performance.
EXISTING SYSTEM
• The existing work evaluates classical machine learning algorithms (MLAs) and deep learning architectures for malware detection, classification, and categorization using different public and private datasets.
• Its major contribution is a novel image processing technique with optimal parameters for MLAs and deep learning architectures, aimed at an effective zero-day malware detection model.
• Overall, that paper paves the way for effective visual detection of malware using a scalable, hybrid deep learning framework for real-time deployments.
DISADVANTAGES
• Its results are lower than those of the proposed system.
• It is not efficient for large volumes of data.
• Theoretical limits.
• The performance is considerably low.
• A lower learning rate was found to work well for identifying an executable as either benign or malware.
PROPOSED SYSTEM
• In this system, the malware dataset is taken as input from a dataset repository such as the UCI repository. We then apply data pre-processing: checking for missing values to avoid wrong predictions, and label encoding to convert categorical data into numeric integer values.
• Next, we split the dataset into training and test sets. The training data is used to fit the model and the test data is used to evaluate it.
• We then apply feature selection to choose the best features from the split data.
• Finally, we apply classification algorithms such as Random Forest and CNN, and report performance metrics such as accuracy, precision, and recall.
ADVANTAGES
• The experimental results are higher than those of the existing system.
• The prediction is efficient.
• Results are classified effectively.
• Time consumption is low.
• It can handle packed malware and can work on various kinds of malware irrespective of the operating system.
FLOW DIAGRAM
[Flow diagram; boxes recovered from the slide: Dataset, Input data, Preprocessing (Handling missing values, Label Encoding, Drop unwanted columns), Data splitting (Train, Test), Feature selection (PCA), Classification (RF and CNN), Prediction, Accuracy, Visualization]
MODULES
• Data selection
• Data preprocessing
• Feature selection
• Data Splitting
• Classification
• Result Generation
MODULES DESCRIPTION
DATA SELECTION
• The input data was collected from a dataset repository.
• Data selection is the process of selecting the data used for detecting malware.
• In this project, we use a malware detection dataset.
• The dataset contains information such as the classification (malware or benign), host, etc.
• In Python, we read the dataset using the pandas package.
• Our dataset is a '.csv' file.
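
As a minimal sketch of the reading step described above, the snippet below loads CSV data with pandas. The slides do not name the file or its columns, so the inline text and the column names ("hash", "feature_a", "feature_b", "classification") are hypothetical stand-ins; with a real file the call would take a path instead of an in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical stand-in for the project's .csv file. The column names
# are illustrative assumptions, not taken from the real dataset.
csv_text = """hash,feature_a,feature_b,classification
a1,10,0.5,malware
b2,3,,benign
c3,7,0.1,malware
d4,1,0.9,benign
"""

# With a file on disk this would be pd.read_csv("<path to the .csv file>").
df = pd.read_csv(io.StringIO(csv_text))

# Quick checks: table size and how many rows fall in each class.
print(df.shape)
print(df["classification"].value_counts())
```

Printing the shape and the class counts right after loading is a cheap way to confirm that the file was parsed as expected before any pre-processing.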
DATA PREPROCESSING
• Data pre-processing is the process of removing unwanted data from the dataset.
• Pre-processing transformations are used to put the dataset into a structure suitable for machine learning.
• Missing data handling: null values, such as missing values and NaN values, are replaced by 0.
• Encoding categorical data: categorical data are variables with a finite set of label values; label encoding converts these labels into integers.
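
The pre-processing steps above (dropping unwanted columns, replacing missing values with 0, and label encoding) could look roughly like the following. This is a sketch under assumptions: the tiny DataFrame and its column names are hypothetical, and scikit-learn's LabelEncoder is one common way to do label encoding, not necessarily the one used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; the real dataset's columns are not given in the slides.
df = pd.DataFrame({
    "hash": ["a1", "b2", "c3", "d4"],
    "feature_a": [10.0, np.nan, 7.0, 1.0],
    "feature_b": [0.5, 0.2, np.nan, 0.9],
    "classification": ["malware", "benign", "malware", "benign"],
})

# Drop a column assumed to carry no predictive signal (an identifier here).
df = df.drop(columns=["hash"])

# Replace missing values (NaN) with 0, as the slides describe.
df = df.fillna(0)

# Encode the string labels as integers. LabelEncoder sorts the classes,
# so "benign" becomes 0 and "malware" becomes 1.
encoder = LabelEncoder()
df["classification"] = encoder.fit_transform(df["classification"])

print(df)
```

Filling with 0 follows the slides; whether 0 is a sensible substitute depends on what each feature measures.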
DATA SPLITTING
• Machine learning needs data to learn from.
• In addition to the training data, test data are needed to evaluate how well the algorithm performs.
• In our process, we use 70% of the dataset as training data and the remaining 30% as test data.
• Data splitting is the act of partitioning available data into two portions, usually for cross-validation purposes.
• One portion of the data is used to develop a predictive model and the other to evaluate the model's performance.
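
A 70/30 split like the one described above can be sketched with scikit-learn's train_test_split. The synthetic arrays, the stratify option, and the random_state value are assumptions made for illustration; the slides only state the 70/30 ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples, 5 features, balanced binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.array([0, 1] * 50)

# Hold out 30% for testing. stratify=y keeps the class ratio the same
# in both portions; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```

Stratifying is a design choice rather than something the slides specify; it is useful when one class is much rarer than the other.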
FEATURE SELECTION
• In our process, we implement feature selection using principal component analysis (PCA).
• Principal Component Analysis is an unsupervised learning algorithm used for dimensionality reduction in machine learning.
• It is a statistical procedure that converts observations of correlated features into a set of linearly uncorrelated features by means of an orthogonal transformation.
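
To illustrate the description above, the sketch below applies scikit-learn's PCA to synthetic correlated features and checks that the resulting components are uncorrelated. The data, the choice of two components, and fitting on the training portion only are assumptions for this example, not details taken from the slides.

```python
import numpy as np
from sklearn.decomposition import PCA

# Build 4 correlated features from 2 underlying signals plus small noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
mix = np.array([[1.0, 0.5], [0.5, 1.0]])
X = np.hstack([base, base @ mix]) + 0.01 * rng.normal(size=(200, 4))

# Use the first 70% as training data and the rest as test data.
X_train, X_test = X[:140], X[140:]

# Fit the projection on the training data only, then reuse it on the test data.
pca = PCA(n_components=2)
Z_train = pca.fit_transform(X_train)
Z_test = pca.transform(X_test)

# The off-diagonal covariance of the components is numerically zero,
# i.e. the new features are linearly uncorrelated.
cov = np.cov(Z_train, rowvar=False)
print(pca.explained_variance_ratio_)
print(abs(cov[0, 1]))
```

Fitting on the training portion and only transforming the test portion keeps information from the test data out of the projection.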
CLASSIFICATION
• In our process, we implement machine learning algorithms such as random forest and logistic regression.
• The random forest is a classification algorithm consisting of many decision trees. It uses bagging and feature randomness when building each individual tree to try to create an uncorrelated forest of trees whose prediction by committee is more accurate than that of any individual tree.
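
A random forest as described above can be sketched with scikit-learn's RandomForestClassifier. The synthetic dataset and the hyperparameters (100 trees, square-root feature sampling) are assumptions for illustration and are not the project's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the malware dataset.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# bootstrap=True gives bagging (each tree sees a resampled training set);
# max_features="sqrt" gives feature randomness at each split.
clf = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", bootstrap=True, random_state=0
)
clf.fit(X_train, y_train)

# Each prediction is a vote across all trees in the forest.
y_pred = clf.predict(X_test)
print(clf.score(X_test, y_test))
```

The two arguments bootstrap and max_features correspond to the "bagging" and "feature randomness" mentioned in the slide text.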
PERFORMANCE
• The final result is generated from the overall classification and prediction. The performance of the proposed approach is evaluated using the measures below, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives.
Accuracy
• The accuracy of a classifier refers to its ability to predict the class label correctly; the accuracy of a predictor refers to how well it can guess the value of the predicted attribute for new data.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision
• Precision is defined as the number of true positives divided by the number of true positives plus the number of false positives.
Precision = TP / (TP + FP)
Recall
• Recall is the number of correct positive results divided by the number of results that should have been returned. In binary classification, recall is also called sensitivity. It can be viewed as the probability that a relevant item (here, a malware sample) is retrieved.
Recall = TP / (TP + FN)
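
The formulas above can be checked on a small worked example. The labels below are made up for illustration (1 is assumed to mean malware, 0 benign), and the F1-score, which the abstract mentions but the slides do not define, is computed with its standard definition as the harmonic mean of precision and recall.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


# Made-up labels: 1 = malware (positive class), 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)  # 3, 4, 2, 1

accuracy = (tp + tn) / (tp + tn + fp + fn)  # 7 / 10
precision = tp / (tp + fp)                  # 3 / 5
recall = tp / (tp + fn)                     # 3 / 4
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, round(f1, 4))
```

In this example the classifier misses one malware sample (lowering recall) and raises two false alarms (lowering precision), which shows why the two measures are reported separately.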
SYSTEM REQUIREMENTS
SOFTWARE REQUIREMENTS:
• O/S : Windows 7
• Language : Python
• Front End : Anaconda Navigator – Spyder
HARDWARE REQUIREMENTS:
• System : Pentium IV 2.4 GHz
• Hard Disk : 200 GB
• Mouse : Logitech
• Keyboard : 110 keys enhanced
• RAM : 4 GB
CONCLUSION
• We conclude by presenting a machine-learning-based method for detecting malware attacks in software.
• The approach adopted is based on random forest and logistic regression, which classify the attacks effectively.
• The experimental results indicate that the proposed approach outperformed the compared machine learning algorithms and achieved the highest performance in terms of accuracy, precision, and F1-score.
Thank You…
