3. ABSTRACT
Email is information stored on a computer that is exchanged between two users over telecommunications.
Spam is one of the major threats posed to email users. Spam refers to the use of electronic messaging systems to send out unsolicited or unwanted messages in bulk.
The privacy and security of large amounts of sensitive data are threatened by malicious spam.
Data mining offers many approaches and algorithms for email filtering. A classifier is a supervised function where the learned attribute is categorical.
Content-based methods analyze the content of the email to determine whether the email is spam or not.
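As a rough illustration of what analyzing an email's content can mean in practice, the sketch below turns a message body into word counts (a bag of words), which a classifier could then use as input. The function names `tokenize` and `bag_of_words` and the sample message are hypothetical and are not taken from the research itself.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the email body and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(text):
    """Map each token to the number of times it occurs in the email."""
    return Counter(tokenize(text))

# Hypothetical example message, not taken from any dataset.
features = bag_of_words("WIN a FREE prize! Claim your free prize now")
print(features.most_common(3))
```

Words such as "free" and "prize" that recur in unwanted messages become numeric features, so the categorical label (spam or not spam) can be learned from them.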
5. ALGORITHMS:
• The following algorithms are used in this research to obtain an improved prediction of email spam:
• Logistic Regression Algorithm
• KNN Algorithm
• Random Forest Algorithm
• Naïve Bayes Algorithm
• SVM Algorithm
• The main algorithm used in this research is “Logistic Regression”. The remaining algorithms (KNN, Random Forest, Naïve Bayes and SVM) are used as comparison algorithms against the main algorithm.
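To make one of the listed algorithms concrete, here is a minimal sketch of a multinomial Naïve Bayes classifier with add-one smoothing, written with the Python standard library only. The class name `TinyNaiveBayes` and the four toy messages are assumptions made for illustration; the research may well have used a library implementation with different settings.

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        # Count how many documents and how many words each class has.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for words, label in zip(documents, labels):
            self.word_counts[label].update(words)
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, words):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for word in words:
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy, hand-written training messages (already tokenized); not real data.
train_docs = [
    ["free", "prize", "now"],
    ["win", "free", "cash"],
    ["meeting", "at", "noon"],
    ["project", "meeting", "notes"],
]
train_labels = ["spam", "spam", "ham", "ham"]

model = TinyNaiveBayes().fit(train_docs, train_labels)
print(model.predict(["free", "cash", "prize"]))
print(model.predict(["meeting", "notes"]))
```

The other listed algorithms would be trained on the same inputs and compared on the same test data, which is what makes the accuracy figures comparable.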
6. STEPS USED IN RESEARCH TO GET ACCURACY
1. Data collection.
2. Data preprocessing.
3. Splitting the dataset into a training set and a testing set.
4. Model selection according to the algorithm to be performed.
5. Model training with the training set data.
6. Model evaluation using appropriate accuracy measures.
7. Optimization of the model's parameters on the training set.
8. Error analysis of the selected model.
9. Testing the model with the testing set.
10. Getting the final accuracy of the tested dataset for
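The steps above can be sketched end to end on a toy problem. The example below is only an illustration under stated assumptions: the data are synthetic, the two numeric features (number of links, number of exclamation marks) are hypothetical, and a small hand-written k-nearest-neighbour classifier stands in for whichever model is selected in step 4.

```python
import random
from collections import Counter

def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and cut it into (train, test) parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]

def knn_predict(train_rows, point, k):
    """Majority vote among the k training rows closest to the point."""
    def squared_distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], point))
    nearest = sorted(train_rows, key=squared_distance)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train_rows, eval_rows, k):
    """Fraction of eval_rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train_rows, x, k) == y for x, y in eval_rows)
    return hits / len(eval_rows)

# Steps 1-2: synthetic, already-numeric data (hypothetical features:
# number of links, number of exclamation marks). Not real results.
rng = random.Random(42)
rows = [((rng.randint(8, 10), rng.randint(8, 10)), "spam") for _ in range(20)]
rows += [((rng.randint(0, 2), rng.randint(0, 2)), "ham") for _ in range(20)]

# Step 3: hold out a test set that is not touched until the end.
train_rows, test_rows = split_rows(rows)

# Steps 4-8: pick k on a validation split carved out of the training set.
fit_rows, validation_rows = split_rows(train_rows, seed=1)
best_k = max((1, 3, 5), key=lambda k: accuracy(fit_rows, validation_rows, k))

# Steps 9-10: one final measurement on the held-out test set.
final_accuracy = accuracy(train_rows, test_rows, best_k)
print(f"k={best_k}, test accuracy={final_accuracy:.2%}")
```

Because the two synthetic clusters are deliberately far apart, the score printed here says nothing about real email data. The point is the ordering: parameters are tuned only on training data, and the test set is used once at the end.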
7. COMPARISON OF LOGISTIC REGRESSION ALGORITHM AND KNN ALGORITHM
• The main algorithm, “Logistic Regression”, is compared with the “KNN Algorithm”.
• As a result, Logistic Regression achieves a mean accuracy of 98.00%, whereas the KNN algorithm achieves a mean accuracy of 85.2%.
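The slides do not say how the mean accuracy was obtained. One common way, assumed here purely for illustration, is k-fold cross-validation: the data are split into folds, each fold is used once as the test set, and the per-fold accuracies are averaged. The helper names `k_fold_indices`, `mean_accuracy` and `majority_baseline` are hypothetical, and the labels below are placeholders rather than results from the research.

```python
from collections import Counter
from statistics import mean

def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs that partition range(n_samples)."""
    start = 0
    for fold in range(n_folds):
        # Spread the remainder over the first folds so sizes differ by at most 1.
        size = n_samples // n_folds + (1 if fold < n_samples % n_folds else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def mean_accuracy(labels, predict_fold, n_folds=5):
    """Average the accuracy of predict_fold over all folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), n_folds):
        predictions = predict_fold(train_idx, test_idx)
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        scores.append(correct / len(test_idx))
    return mean(scores), scores

# Placeholder labels and a majority-class baseline standing in for a real model.
labels = ["spam", "ham"] * 5

def majority_baseline(train_idx, test_idx):
    most_common = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return [most_common] * len(test_idx)

avg, scores = mean_accuracy(labels, majority_baseline)
print(f"per-fold: {scores}, mean: {avg:.2%}")
```

Averaging over folds makes the comparison between two algorithms less dependent on one particular train/test split, provided both are scored on the same folds.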
8. COMPARISON OF LOGISTIC REGRESSION ALGORITHM AND NAÏVE BAYES ALGORITHM
• The main algorithm, “Logistic Regression”, is compared with the “Naïve Bayes Algorithm”.
• As a result, Logistic Regression achieves a mean accuracy of 98.00%, whereas the Naïve Bayes algorithm achieves a mean accuracy of 97.1%.
9. COMPARISON OF LOGISTIC REGRESSION ALGORITHM AND RANDOM FOREST ALGORITHM
• The main algorithm, “Logistic Regression”, is compared with the “Random Forest Algorithm”.
• As a result, Logistic Regression achieves a mean accuracy of 98.80%, whereas the Random Forest algorithm achieves a mean accuracy of 98.1%.
10. COMPARISON OF LOGISTIC REGRESSION ALGORITHM AND SVM ALGORITHM
• The main algorithm, “Logistic Regression”, is compared with the “SVM Algorithm”.
• As a result, Logistic Regression achieves a mean accuracy of 98.80%, whereas the SVM algorithm achieves a mean accuracy of 95.00%.
11. CONCLUSION
• In conclusion, this research concentrated on the prediction of malware using a classifier model incorporating Random Forest, contrasting it with the comparison algorithms. The findings revealed a significant accuracy advantage for Random Forest (97.2%) over the algorithms with accuracies such as:
Logistic Regression
KNN algorithm
Naïve Bayes algorithm
SVM algorithm
Random Forest algorithm