MALWARE DETECTION: A FRAMEWORK FOR REVERSE
ENGINEERED ANDROID APPLICATIONS
SECURING THE ANDROID WORLD: A FRAMEWORK TO DETECT AND ELIMINATE MALWARE.
PEDDURI GANESH - 23033I1023
UNDER THE GUIDANCE
J.BINDHU BHARGAVI M.TECH
CONTENTS
DOMAIN
INTRODUCTION
ABSTRACT OBJECTIVES
INTRODUCTION
EXISTING
SYSTEM
PROPOSED
SYSTEM
CONTENTS
SYSTEM REQUIREMENTS SYSTEMARCHITECTURE ALGORITHMS ALGORITHMS DESCRIPTION
RESULT GENERATION CONCLUSION FUTURE WORK REFERENCES
DOMAIN INTRODUCTION
• Malware detection is an essential task in the field of cybersecurity, especially for mobile devices such as
Android smartphones. With the increasing use of mobile devices in our daily lives, the risk of malware
attacks on these devices has also increased.
• Malware can come in many forms and can be hidden in seemingly harmless apps or downloaded from
suspicious websites, making it difficult to detect.
• Reverse engineering is a technique used to analyse and understand how software works and can be used to
detect malware.
• This technique can help identify suspicious code patterns and potential vulnerabilities that could be
exploited by malware.
ABSTRACT
• Today, Android is one of the most used operating systems in
smartphone technology. This is the main reason, Android has
become the favourite target for hackers and attackers.
• Malicious codes are being embedded in Android applications
in such a sophisticated manner that detecting and identifying
an application as a malware has become the toughest job for
security providers.
• This project uses Reverse Engineered Android applications
features and Machine Learning algorithms to find
vulnerabilities present in Smartphone applications.
ABSTRACT
• Firstly, we propose a model that has more innovative static feature sets with the largest
current datasets of malware samples than conventional methods.
• Secondly, we have used ensemble learning with machine learning algorithms such as SVM(support
vector machine),Stochastic Gradient Descent(SDG) , Random Forest Classifier ,Decision Tree
Classifier, for classification and Naive Bayes ,Logistic Regression for reading and analysing the given
data.
• By using these algorithms we can predict an applications contain malware or not
• We use a large data set of 9600 values to train our machine learning model
• Our experimental results says about machine learning model has 96.24 percentage of accuracy and
0.3 false positive ratio (FPR)
OBJECTIVES
Identification of Malicious Behaviour: The primary objective of malware detection is to
identify any malicious behaviour or activity that may be present in an Android
application. This can include actions such as data theft, unauthorized access to sensitive
information, or unwanted communication with external servers.
Detection of Malware Code: The framework should aim to identify any code or
programming structures that are indicative of malware. This may involve analysing
the application's executable files, data structures, network activity, or other elements
that could indicate the presence of malicious code.
Prevention of Malware Attacks: The ultimate goal of malware detection is to prevent
the spread of malware and protect Android devices from attack. By identifying and
analysing malware in a systematic and comprehensive way, the framework can help
developers and security experts to create more effective defence strategies and
prevent future attacks.
INTRODUCTION
• Android has a large market share of mobile devices globally, with around 80% of the market share, making it a
significant target for malware.
• Malware targeting Android devices has increased due to the open-source operating system, and it's easy to
implement unwanted permissions in Android apps.
• Machine learning is used to predict malware applications by extracting static features from reverse-engineered
Android applications and using algorithms such as SVM, logistic progression, and ensemble learning.
• Boosting or ensemble techniques like Adaboost can improve the classification of misclassified variables.
• Over 70% of Android mobile applications request unnecessary permissions that are not needed for the app to
function, which makes it difficult to determine an app's vindictiveness.
• Android requires users to access apps from untrusted outlets like file-sharing sites or third-party app stores,
making it a prime target for malware, and new malware Android versions are introduced every few seconds.
EXISTING SYSTEM
The existing system does not use best machine learning algorithm and ensemble
learning techniques. However the existing system only use random forest algorithm
used for reading the data and not having the best classification algorithm
The existing system does not utilise the characteristics of reverse engineering
applications which has ability to detect these sophisticated forms of malware
The existing project uses two data sets 700 malware samples and 160 features.
Which may not be sufficient to accurate classify all type of malware
The existing system accuracy may decreases overtime as malware evolves and the
new attack strategies are developed
PROPOSED SYSTEM
In the proposed system we have a six machine learning algorithms and
implemented a boost in ensemble learning approach (adaboost) with decision tree
Our model has large trained data on the latest samples of malware around 9600
data set samples
We use reverse engineering based approach to identify and detect the
sophisticated forms of malware
We use 56,000 features for characterising
The accuracy of our project is 96.24 percentage with 0.3 percentage of false
positive ratio
SYSTEM REQUIREMENTS
Hardware Requirement
 Processor - Pentium –IV
 RAM - 4 GB (min)
 Hard Disk - 20 GB
 Key Board - Standard Windows Keyboard
 Monitor - SVGA 5.2
Software Requirement
• Operating system : Windows 7 Ultimate.
• Coding Language : Python.
• Front-End : Python.
• Back-End : Django-ORM
• Designing : Html, CSS, JavaScript.
• Data Base : MySQL (WAMP Server)
SYSTEM ARCHITECTURE
ALGORITHMS
SVM(support vector
machine)
Stochastic Gradient
Descent(SDG)
Random Forest
Classifier
Decision
Tree Classifier
Naive Bayes Logistic Regression
ALGORITHMS DESCRIPTION
• SVM (Support Vector Machine): A machine learning algorithm that separates data points
into different classes by finding the best decision boundary between them.
• Stochastic Gradient Descent (SGD): A method for optimizing machine learning models by
iteratively adjusting the weights of the model using randomly selected subsets of the
training data.
• Random Forest Classifier: An ensemble learning method that uses multiple decision trees
to classify data points by aggregating the predictions of the individual trees.
ALGORITHMS DESCRIPTION
• Decision Tree Classifier: A machine learning algorithm that builds a tree-like model of
decisions and their possible consequences to classify data points.
• Naive Bayes: A probabilistic machine learning algorithm that predicts the probability of a
data point belonging to a certain class based on the probabilities of the individual features
of the data point.
• Logistic Regression: A machine learning algorithm that models the relationship between a
dependent binary variable and one or more independent variables using a logistic function.
RESULT GENERATION
• Result visualization: Results can be presented in various formats such as tables, graphs, charts, and
diagrams to make them more easily understandable and visually appealing. This helps to
communicate the results effectively to stakeholders and decision-makers.
• The project presents a comprehensive evaluation of the framework, which includes testing on a
large dataset of real-world Android applications and comparing the results with other state-of-the-
art malware detection tools.
• The evaluation results show that the proposed project achieves high detection rates while
maintaining low false positive rates.
CONCLUSION
1. Framework development: A new framework has been developed in this research that can
detect malicious Android applications.
2. Machine learning-based technique: The proposed technique employs various elements of
machine learning and has achieved a high accuracy of 96.24% in identifying malicious
Android applications.
3. Feature extraction: Reverse application engineering have been leveraged to extract
features from Android apps' behavior into binary vectors.
4. Model performance: The suggested model has a false positive rate of 0.3 and an accuracy
of 96% in the given environment with enhanced and larger feature and sample sets.
5. Future work: The research suggests considering model resilience in terms of enhanced
and dynamic features, addressing the issue of dependent variables or high
intercorrelation between machine algorithms, and improving sustainability concerns in
the future.
FUTURE ENHANCEMENT
• The proposed system for detecting Android malware apps in the paper "Malware Detection: A
Framework for Reverse Engineered Android Applications" aims to address the challenges of identifying
sophisticated and embedded malware in Android applications. The paper presents two key
enhancements to the existing system:
1. A model that incorporates innovative static feature sets with larger current datasets of malware samples
2. An ensemble learning approach using machine learning algorithms to improve the model's performance.
The proposed model achieved an impressive 96.24% accuracy in detecting extracted malware from
Android applications, with a 0.3 False Positive Rate (FPR). Additionally, the system uses reverse-
engineered Android applications features and machine learning algorithms to find vulnerabilities present
in smartphone applications. Overall, this framework offers a more effective and efficient approach to
detect Android malware, improving the safety and privacy of phone users.
REFERENCES
• [1] A. O. Christiana, B. A. Gyunka, and A. Noah, “Android Malware Detection through Machine Learning Techniques: A
Review,” Int. J. Online Biomed. Eng. IJOE, vol. 16, no. 02, p. 14, Feb. 2020, doi: 10.3991/ijoe.v16i02.11549.
• [2] D. Ghimire and J. Lee, “Geometric Feature-Based Facial Expression Recognition in Image Sequences Using Multi-
Class AdaBoost and Support Vector Machines,” Sensors, vol. 13, no. 6, pp. 7714–7734, Jun. 2013, doi:
10.3390/s130607714.
• [3] R. Wang, “AdaBoost for Feature Selection, Classification and Its Relation with SVM, A Review,” Phys. Procedia, vol.
25, pp. 800–807, 2012, doi: 10.1016/j.phpro.2012.03.160.
• [4] J. Sun, H. Fujita, P. Chen, and H. Li, “Dynamic financial distress prediction with concept drift based on time
weighting combined with Adaboost support vector machine ensemble,” Knowl.-Based Syst., vol. 120, pp. 4–14, Mar.
2017, doi: 10.1016/j.knosys.2016.12.019.
• [5] A. Garg and K. Tai, “Comparison of statistical and machine learning methods in modelling of data with
multicollinearity,” Int. J. Model. Identif. Control, vol. 18, no. 4, p. 295, 2013, doi: 10.1504/IJMIC.2013.053535.
REFERENCES
• [6] C. P. Obite, N. P. Olewuezi, G. U. Ugwuanyim, and D. C. Bartholomew, “Multicollinearity Effect in
Regression Analysis: A Feed Forward Artificial Neural Network Approach,” Asian J. Probab. Stat., pp.
22–33, Jan. 2020, doi: 10.9734/ajpas/2020/v6i130151.
• [7] W. Wang et al., “Constructing Features for Detecting Android Malicious Applications: Issues,
Taxonomy and Directions,” IEEE Access, vol. 7, pp. 67602–67631,
2019, doi: 10.1109/ACCESS.2019.2918139.
• [8] B. Rashidi, C. Fung, and E. Bertino, “Android malicious application detection using support
vector machine and active learning,” in 2017 13th International Conference on Network and Service
Management 53 (CNSM), Tokyo, Nov. 2017, pp. 1–9. doi: 10.23919/CNSM.2017.8256035.

MALWARE DETECTION A FRAMEWORK FOR REVERSE ENGINEERED ANDROID APPLICATIONS_.pptx

  • 1.
    MALWARE DETECTION: AFRAMEWORK FOR REVERSE ENGINEERED ANDROID APPLICATIONS SECURING THE ANDROID WORLD: A FRAMEWORK TO DETECT AND ELIMINATE MALWARE. PEDDURI GANESH - 23033I1023 UNDER THE GUIDANCE J.BINDHU BHARGAVI M.TECH
  • 2.
  • 3.
    CONTENTS SYSTEM REQUIREMENTS SYSTEMARCHITECTUREALGORITHMS ALGORITHMS DESCRIPTION RESULT GENERATION CONCLUSION FUTURE WORK REFERENCES
  • 4.
    DOMAIN INTRODUCTION • Malwaredetection is an essential task in the field of cybersecurity, especially for mobile devices such as Android smartphones. With the increasing use of mobile devices in our daily lives, the risk of malware attacks on these devices has also increased. • Malware can come in many forms and can be hidden in seemingly harmless apps or downloaded from suspicious websites, making it difficult to detect. • Reverse engineering is a technique used to analyse and understand how software works and can be used to detect malware. • This technique can help identify suspicious code patterns and potential vulnerabilities that could be exploited by malware.
  • 5.
    ABSTRACT • Today, Androidis one of the most used operating systems in smartphone technology. This is the main reason, Android has become the favourite target for hackers and attackers. • Malicious codes are being embedded in Android applications in such a sophisticated manner that detecting and identifying an application as a malware has become the toughest job for security providers. • This project uses Reverse Engineered Android applications features and Machine Learning algorithms to find vulnerabilities present in Smartphone applications.
  • 6.
    ABSTRACT • Firstly, wepropose a model that has more innovative static feature sets with the largest current datasets of malware samples than conventional methods. • Secondly, we have used ensemble learning with machine learning algorithms such as SVM(support vector machine),Stochastic Gradient Descent(SDG) , Random Forest Classifier ,Decision Tree Classifier, for classification and Naive Bayes ,Logistic Regression for reading and analysing the given data. • By using these algorithms we can predict an applications contain malware or not • We use a large data set of 9600 values to train our machine learning model • Our experimental results says about machine learning model has 96.24 percentage of accuracy and 0.3 false positive ratio (FPR)
  • 7.
    OBJECTIVES Identification of MaliciousBehaviour: The primary objective of malware detection is to identify any malicious behaviour or activity that may be present in an Android application. This can include actions such as data theft, unauthorized access to sensitive information, or unwanted communication with external servers. Detection of Malware Code: The framework should aim to identify any code or programming structures that are indicative of malware. This may involve analysing the application's executable files, data structures, network activity, or other elements that could indicate the presence of malicious code. Prevention of Malware Attacks: The ultimate goal of malware detection is to prevent the spread of malware and protect Android devices from attack. By identifying and analysing malware in a systematic and comprehensive way, the framework can help developers and security experts to create more effective defence strategies and prevent future attacks.
  • 8.
    INTRODUCTION • Android hasa large market share of mobile devices globally, with around 80% of the market share, making it a significant target for malware. • Malware targeting Android devices has increased due to the open-source operating system, and it's easy to implement unwanted permissions in Android apps. • Machine learning is used to predict malware applications by extracting static features from reverse-engineered Android applications and using algorithms such as SVM, logistic progression, and ensemble learning. • Boosting or ensemble techniques like Adaboost can improve the classification of misclassified variables. • Over 70% of Android mobile applications request unnecessary permissions that are not needed for the app to function, which makes it difficult to determine an app's vindictiveness. • Android requires users to access apps from untrusted outlets like file-sharing sites or third-party app stores, making it a prime target for malware, and new malware Android versions are introduced every few seconds.
  • 9.
    EXISTING SYSTEM The existingsystem does not use best machine learning algorithm and ensemble learning techniques. However the existing system only use random forest algorithm used for reading the data and not having the best classification algorithm The existing system does not utilise the characteristics of reverse engineering applications which has ability to detect these sophisticated forms of malware The existing project uses two data sets 700 malware samples and 160 features. Which may not be sufficient to accurate classify all type of malware The existing system accuracy may decreases overtime as malware evolves and the new attack strategies are developed
  • 10.
    PROPOSED SYSTEM In theproposed system we have a six machine learning algorithms and implemented a boost in ensemble learning approach (adaboost) with decision tree Our model has large trained data on the latest samples of malware around 9600 data set samples We use reverse engineering based approach to identify and detect the sophisticated forms of malware We use 56,000 features for characterising The accuracy of our project is 96.24 percentage with 0.3 percentage of false positive ratio
  • 11.
    SYSTEM REQUIREMENTS Hardware Requirement Processor - Pentium –IV  RAM - 4 GB (min)  Hard Disk - 20 GB  Key Board - Standard Windows Keyboard  Monitor - SVGA 5.2 Software Requirement • Operating system : Windows 7 Ultimate. • Coding Language : Python. • Front-End : Python. • Back-End : Django-ORM • Designing : Html, CSS, JavaScript. • Data Base : MySQL (WAMP Server)
  • 12.
  • 13.
    ALGORITHMS SVM(support vector machine) Stochastic Gradient Descent(SDG) RandomForest Classifier Decision Tree Classifier Naive Bayes Logistic Regression
  • 14.
    ALGORITHMS DESCRIPTION • SVM(Support Vector Machine): A machine learning algorithm that separates data points into different classes by finding the best decision boundary between them. • Stochastic Gradient Descent (SGD): A method for optimizing machine learning models by iteratively adjusting the weights of the model using randomly selected subsets of the training data. • Random Forest Classifier: An ensemble learning method that uses multiple decision trees to classify data points by aggregating the predictions of the individual trees.
  • 15.
    ALGORITHMS DESCRIPTION • DecisionTree Classifier: A machine learning algorithm that builds a tree-like model of decisions and their possible consequences to classify data points. • Naive Bayes: A probabilistic machine learning algorithm that predicts the probability of a data point belonging to a certain class based on the probabilities of the individual features of the data point. • Logistic Regression: A machine learning algorithm that models the relationship between a dependent binary variable and one or more independent variables using a logistic function.
  • 16.
    RESULT GENERATION • Resultvisualization: Results can be presented in various formats such as tables, graphs, charts, and diagrams to make them more easily understandable and visually appealing. This helps to communicate the results effectively to stakeholders and decision-makers. • The project presents a comprehensive evaluation of the framework, which includes testing on a large dataset of real-world Android applications and comparing the results with other state-of-the- art malware detection tools. • The evaluation results show that the proposed project achieves high detection rates while maintaining low false positive rates.
  • 17.
    CONCLUSION 1. Framework development:A new framework has been developed in this research that can detect malicious Android applications. 2. Machine learning-based technique: The proposed technique employs various elements of machine learning and has achieved a high accuracy of 96.24% in identifying malicious Android applications. 3. Feature extraction: Reverse application engineering have been leveraged to extract features from Android apps' behavior into binary vectors. 4. Model performance: The suggested model has a false positive rate of 0.3 and an accuracy of 96% in the given environment with enhanced and larger feature and sample sets. 5. Future work: The research suggests considering model resilience in terms of enhanced and dynamic features, addressing the issue of dependent variables or high intercorrelation between machine algorithms, and improving sustainability concerns in the future.
  • 18.
    FUTURE ENHANCEMENT • Theproposed system for detecting Android malware apps in the paper "Malware Detection: A Framework for Reverse Engineered Android Applications" aims to address the challenges of identifying sophisticated and embedded malware in Android applications. The paper presents two key enhancements to the existing system: 1. A model that incorporates innovative static feature sets with larger current datasets of malware samples 2. An ensemble learning approach using machine learning algorithms to improve the model's performance. The proposed model achieved an impressive 96.24% accuracy in detecting extracted malware from Android applications, with a 0.3 False Positive Rate (FPR). Additionally, the system uses reverse- engineered Android applications features and machine learning algorithms to find vulnerabilities present in smartphone applications. Overall, this framework offers a more effective and efficient approach to detect Android malware, improving the safety and privacy of phone users.
  • 19.
    REFERENCES • [1] A.O. Christiana, B. A. Gyunka, and A. Noah, “Android Malware Detection through Machine Learning Techniques: A Review,” Int. J. Online Biomed. Eng. IJOE, vol. 16, no. 02, p. 14, Feb. 2020, doi: 10.3991/ijoe.v16i02.11549. • [2] D. Ghimire and J. Lee, “Geometric Feature-Based Facial Expression Recognition in Image Sequences Using Multi- Class AdaBoost and Support Vector Machines,” Sensors, vol. 13, no. 6, pp. 7714–7734, Jun. 2013, doi: 10.3390/s130607714. • [3] R. Wang, “AdaBoost for Feature Selection, Classification and Its Relation with SVM, A Review,” Phys. Procedia, vol. 25, pp. 800–807, 2012, doi: 10.1016/j.phpro.2012.03.160. • [4] J. Sun, H. Fujita, P. Chen, and H. Li, “Dynamic financial distress prediction with concept drift based on time weighting combined with Adaboost support vector machine ensemble,” Knowl.-Based Syst., vol. 120, pp. 4–14, Mar. 2017, doi: 10.1016/j.knosys.2016.12.019. • [5] A. Garg and K. Tai, “Comparison of statistical and machine learning methods in modelling of data with multicollinearity,” Int. J. Model. Identif. Control, vol. 18, no. 4, p. 295, 2013, doi: 10.1504/IJMIC.2013.053535.
  • 20.
    REFERENCES • [6] C.P. Obite, N. P. Olewuezi, G. U. Ugwuanyim, and D. C. Bartholomew, “Multicollinearity Effect in Regression Analysis: A Feed Forward Artificial Neural Network Approach,” Asian J. Probab. Stat., pp. 22–33, Jan. 2020, doi: 10.9734/ajpas/2020/v6i130151. • [7] W. Wang et al., “Constructing Features for Detecting Android Malicious Applications: Issues, Taxonomy and Directions,” IEEE Access, vol. 7, pp. 67602–67631, 2019, doi: 10.1109/ACCESS.2019.2918139. • [8] B. Rashidi, C. Fung, and E. Bertino, “Android malicious application detection using support vector machine and active learning,” in 2017 13th International Conference on Network and Service Management 53 (CNSM), Tokyo, Nov. 2017, pp. 1–9. doi: 10.23919/CNSM.2017.8256035.