This document describes improvements made to an existing natural language processing algorithm for detecting phishing emails. The original algorithm analyzed sentence structures and word relationships but had low recall. The author evaluated its performance on 500 legitimate and 438 phishing emails, finding a precision of 100% but a recall of only 12%. Improvements included optimizing performance, adapting the verb-noun pair blacklist to focus on phishing emails, and combining it with a link analysis program. This increased recall to 73.5% while keeping precision at 99%. The combined approach analyzes both email content and links, so it can also identify sophisticated phishing emails that contain no malicious links.
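The sketch below makes the precision and recall figures concrete. It also shows why OR-combining a content detector with a link detector tends to raise recall, at some cost in precision. The email IDs and flag sets are made-up toy data, not the study's 500/438 corpus. `precision_recall` is a hypothetical helper.

```python
def precision_recall(flagged, phishing):
    """Return (precision, recall) for a set of flagged email IDs."""
    tp = len(flagged & phishing)   # phishing emails correctly flagged
    fp = len(flagged - phishing)   # legitimate emails wrongly flagged
    fn = len(phishing - flagged)   # phishing emails that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (not from the study): four phishing emails, one legitimate email "l1".
phishing = {"p1", "p2", "p3", "p4"}
content_flags = {"p1"}              # hypothetical verb-noun blacklist hits
link_flags = {"p2", "p3", "l1"}     # hypothetical link-analysis hits

print(precision_recall(content_flags, phishing))               # (1.0, 0.25)
print(precision_recall(content_flags | link_flags, phishing))  # (0.75, 0.75)
```

The content detector alone is precise but misses most phishing emails. The union catches more of them, but it inherits the link detector's false positive.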
PHISHING MITIGATION TECHNIQUES: A LITERATURE SURVEY, IJNSA Journal
Email has long been considered a confidential medium for exchanging information among individuals and organisations. This is no longer the case: attackers send malicious emails to deceive users into disclosing private personal information such as usernames, passwords, and bank card details. Different approaches have been developed to combat phishing cybercrime attacks. However, traditional existing solutions have been limited in helping email users distinguish phishing emails from legitimate ones. This paper reviews the different email and website phishing solutions for phishing attack detection. It first provides a literature analysis of existing phishing mitigation approaches. It then discusses the limitations of these techniques, and concludes with an exploration of how phishing detection can be improved.
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK, ijcseit
The World Wide Web has become an important part of our everyday life for communicating information and disseminating knowledge. It helps people exchange information in a timely, rapid and easy way. Identity theft and identity fraud are referred to as two sides of cyber-crime, in which hackers and malicious users obtain the personal data of existing legitimate users to commit fraud or deception for financial gain. Malicious URLs host unsolicited content (spam, phishing, drive-by exploits, etc.) and lure unsuspecting users into becoming victims of scams (monetary loss, theft of private information, and malware installation), causing losses of billions of dollars every year. To detect such crimes, systems should be fast and precise, with the ability to detect new malicious content. Traditionally, this detection is done mostly through blacklists. However, blacklists cannot be exhaustive and cannot detect newly generated malicious URLs. To improve the generality of malicious URL detectors, machine learning techniques have received increasing attention in recent years. In this paper, I use a simple algorithm to predict whether a URL is good or bad, and compare it with two other algorithms (SVM and LR).
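Character-level CNNs are one common way to apply convolutional networks to URLs. The abstract does not say which input representation the paper uses, so the following is an assumption. The sketch shows only the preprocessing such a model typically needs: mapping each URL to a fixed-length sequence of integer character IDs that an embedding layer could consume. `VOCAB`, `MAX_LEN` and `encode_url` are hypothetical names.

```python
import string

# Hypothetical vocabulary: every printable ASCII character gets an ID >= 1;
# ID 0 is reserved for padding and for characters outside the vocabulary.
VOCAB = {ch: i + 1 for i, ch in enumerate(string.printable)}
MAX_LEN = 64  # assumed fixed input length expected by the CNN


def encode_url(url, max_len=MAX_LEN):
    """Map each character to its integer ID, truncating or zero-padding to max_len."""
    ids = [VOCAB.get(ch, 0) for ch in url[:max_len]]
    return ids + [0] * (max_len - len(ids))


print(encode_url("http://example.com", max_len=8))
```

Fixed-length integer sequences let a batch of URLs of different lengths be stacked into one tensor for the convolution layers.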
The International Journal of Engineering and Science (The IJES), theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
IJRET: International Journal of Research in Engineering and Technology is an international, peer-reviewed online journal published by eSAT Publishing House to advance research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students from related fields of Engineering and Technology.
A novel hybrid approach of SVM combined with NLP and probabilistic neural ne..., IJECEIAES
Phishing attacks are among the trending cyber-attacks. Professional hackers send socially engineered messages designed to trick users into revealing sensitive information, and the most popular channel for these messages is email. Phishing has become a substantial threat to web users and a major cause of financial losses, so various solutions have been developed to tackle it. Deceptive emails, also called phishing emails, use a range of influence techniques to persuade people to respond, for example promising a financial reward or invoking a sense of urgency. Despite widespread warnings and efforts to educate users to recognize phishing emails, phishing remains a prevalent practice and a lucrative business. The authors believe that persuasion, as a form of human communication intended to influence others, plays a central role in successful online scams. Cyber criminals have continually advanced their attack techniques. The current methods for detecting such malicious programs and preventing them from executing are static, dynamic and hybrid analysis. In this work we propose a hybrid methodology for phishing detection that incorporates feature extraction and classification of emails using SVM. Finally, using the selected features, the PNN separates spam emails from genuine emails with greater precision and accuracy.
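A probabilistic neural network (PNN) classifies a sample by averaging Gaussian (Parzen-window) kernel responses to the training examples of each class. It then picks the class with the largest average response. The sketch below shows only that PNN step, on made-up 2-D feature vectors. The abstract does not specify how NLP and SVM produce the features, so the inputs here are assumptions, as is the kernel width `sigma`.

```python
import math


def pnn_classify(x, training, sigma=1.0):
    """Probabilistic neural network (Parzen-window) classifier.

    training: dict mapping class label -> list of feature vectors.
    Returns the label whose average Gaussian kernel response to x is largest.
    """
    def kernel(a, b):
        sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return math.exp(-sq_dist / (2 * sigma ** 2))

    scores = {
        label: sum(kernel(x, p) for p in points) / len(points)
        for label, points in training.items()
    }
    return max(scores, key=scores.get)


# Toy 2-D feature vectors (hypothetical stand-ins for extracted email features).
training = {
    "spam": [[0.9, 0.8], [0.8, 0.9]],
    "ham": [[0.1, 0.2], [0.2, 0.1]],
}
print(pnn_classify([0.85, 0.75], training))  # spam
```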
The Detection of Suspicious Email Based on Decision Tree ..., IRJET Journal
This document summarizes a research paper that proposes a method for detecting suspicious emails using decision trees. The method extracts keywords and indicators from emails to classify them as suspicious, not suspicious, or possibly suspicious. An ID3 decision tree algorithm analyzes patterns in a training set of pre-classified emails and generates rules to classify new emails. The tree is built by recursively partitioning the training set on the attribute with the highest information gain. The resulting decision tree and rules can then be used to detect suspicious emails, which could help identify potential criminal activities or security threats.
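The ID3 step described above chooses, at each node, the attribute whose split yields the largest information gain. A small sketch of that computation follows. The attribute names `kw` (contains a flagged keyword) and `unk` (sender unknown) and the toy rows are hypothetical indicators. The entropy function works unchanged for all three classes mentioned (suspicious, not suspicious, possibly suspicious).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting (rows, labels) on attribute attr."""
    total = len(labels)
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """ID3 picks the attribute with the highest information gain at each node."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical binary indicators extracted from four pre-classified emails.
rows = [{"kw": 1, "unk": 0}, {"kw": 1, "unk": 1}, {"kw": 0, "unk": 1}, {"kw": 0, "unk": 0}]
labels = ["suspicious", "suspicious", "not suspicious", "not suspicious"]
print(best_attribute(rows, labels, ["kw", "unk"]))  # kw
```

A full ID3 tree would call `best_attribute` recursively on each subset until a node is pure or no attributes remain.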
This document discusses techniques for detecting compromised machines ("zombies") that are involved in spamming activities on a network. It proposes using heuristic search and message partitioning/replication to minimize spam access from zombies while ensuring data confidentiality and integrity. Zombies are controlled by botnet herders and use various techniques to send large volumes of spam while remaining untraceable, such as exploiting vulnerabilities on Windows systems to use infected machines as mail relays or sending spam from dynamic IP addresses. The document analyzes spam sent from different IPs to examine the extent to which spam originates from a small number of hosts.
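One way to measure how much spam originates from a small number of hosts is the share of messages sent by the top-k source IPs. The sketch assumes a hypothetical log with one sender IP per spam message. It illustrates the measurement only and is not the document's own analysis.

```python
from collections import Counter


def top_k_share(source_ips, k):
    """Fraction of spam messages sent by the k most active source IPs."""
    if not source_ips:
        return 0.0
    counts = Counter(source_ips)
    top = sum(n for _, n in counts.most_common(k))
    return top / len(source_ips)


# Hypothetical log: one sender IP per spam message.
ips = ["10.0.0.1"] * 6 + ["10.0.0.2"] * 2 + ["10.0.0.3", "10.0.0.4"]
print(top_k_share(ips, 1))  # 0.6
```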
This document discusses web spam detection using machine learning techniques. Specifically, it proposes an improved Naive Bayes classifier that incorporates user feedback and domain-specific features to better detect spam pages. The key points are:
1) Web spam has become a serious problem as internet usage has increased, threatening search engines and users. Spam pages aim to deceive search engines' ranking algorithms.
2) Existing spam detection techniques such as content analysis still fall short. Naive Bayes classifiers are commonly used but have limitations, such as treating all terms equally.
3) The paper proposes an improved Naive Bayes classifier that assigns different weights to terms based on user feedback and considers domain-specific features, reducing false positives and false negatives and improving accuracy.
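One simple reading of "assigning different weights to terms" is a multinomial Naive Bayes classifier in which each term's log-likelihood is scaled by a per-term weight. The sketch assumes that reading. The `weights` dictionary stands in for values learned from user feedback and is hypothetical, as are the toy documents.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Multinomial Naive Bayes with Laplace smoothing.

    docs: list of token lists; labels: parallel list of class names.
    Returns (log priors, per-class log-likelihoods for every vocabulary term).
    """
    vocab = {t for d in docs for t in d}
    priors, likelihoods = {}, {}
    for c in set(labels):
        class_docs = [d for d, lab in zip(docs, labels) if lab == c]
        priors[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for d in class_docs for t in d)
        total = sum(counts.values())
        likelihoods[c] = {t: math.log((counts[t] + 1) / (total + len(vocab))) for t in vocab}
    return priors, likelihoods


def classify(doc, priors, likelihoods, weights=None):
    """Score each class; a known term's log-likelihood is scaled by weights[t] (default 1.0)."""
    weights = weights or {}
    scores = {
        c: priors[c] + sum(weights.get(t, 1.0) * likelihoods[c][t]
                           for t in doc if t in likelihoods[c])
        for c in priors
    }
    return max(scores, key=scores.get)


docs = [["win", "cash"], ["free", "cash"], ["meeting", "notes"], ["project", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
priors, likelihoods = train_nb(docs, labels)
print(classify(["cash", "notes"], priors, likelihoods, {"cash": 2.0}))   # spam
print(classify(["cash", "notes"], priors, likelihoods, {"notes": 2.0}))  # ham
```

With equal weights, the message `["cash", "notes"]` is a tie. Up-weighting one term lets that term decide the class, which is the effect term weighting is meant to have.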
Cross breed Spam Categorization Method using Machine Learning Techniques, IJSRED
This document presents a hybrid spam filtering method using machine learning techniques of Naive Bayes and Markov Random Fields. It aims to efficiently identify and filter spam emails. The proposed method is evaluated based on accuracy, precision, and time taken. The results show the hybrid technique achieves a high true positive rate in detecting spam emails. Keywords discussed are email spam, Naive Bayes algorithm, and Markov Random Fields.
AN INTELLIGENT CLASSIFICATION MODEL FOR PHISHING EMAIL DETECTION, IJNSA Journal
Phishing attacks are among the trending cyber-attacks. Professional hackers send socially engineered messages aimed at fooling users into revealing their sensitive information, and the most popular communication channel for those messages is email. This paper presents an intelligent classification model for detecting phishing emails using knowledge discovery, data mining and text processing techniques. It introduces the concept of phishing terms weighting, which evaluates the weight of phishing terms in each email. The pre-processing phase is enhanced by applying text stemming and the WordNet ontology to enrich the model with word synonyms. The model applied knowledge discovery procedures using five popular classification algorithms and achieved a notable improvement in classification accuracy: 99.1% with the Random Forest algorithm and 98.4% with J48. To our knowledge, this is the highest accuracy rate for an accredited data set. This paper also presents a comparative study with similar proposed classification techniques.
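The phishing terms weighting idea, together with stemming and synonym enrichment, can be sketched as a normalised weighted term count per email. Everything concrete here is an assumption. The weight table is invented. `crude_stem` is a naive suffix stripper used instead of a real stemmer. The small `SYNONYMS` map stands in for WordNet lookups.

```python
import re

# Hypothetical phishing-term weights (not from the paper).
PHISHING_WEIGHTS = {"verify": 3.0, "account": 1.0, "suspend": 2.5, "password": 2.0}
# Tiny hand-made synonym map standing in for WordNet synonym enrichment.
SYNONYMS = {"confirm": "verify", "validate": "verify", "deactivate": "suspend"}


def crude_stem(word):
    """Very rough suffix stripper, used here instead of a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def phishing_term_weight(text):
    """Sum of phishing-term weights in the email, normalised by its token count."""
    tokens = [crude_stem(t) for t in re.findall(r"[a-z]+", text.lower())]
    tokens = [SYNONYMS.get(t, t) for t in tokens]   # map synonyms onto known terms
    if not tokens:
        return 0.0
    return sum(PHISHING_WEIGHTS.get(t, 0.0) for t in tokens) / len(tokens)


print(phishing_term_weight("Please confirm your account"))  # 1.0
```

A score like this would be one feature among others fed to classifiers such as Random Forest or J48.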
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele..., Editor IJCATR
The Bayesian classifier works efficiently in some fields and poorly in others; its performance suffers in fields that involve correlated features. Feature selection is beneficial for reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. However, the recent increase in data dimensionality poses a hard challenge to many existing feature selection methods with respect to efficiency and effectiveness. In this paper, a Bayesian classifier with correlation-based feature selection is introduced, which can identify relevant features as well as redundancy among relevant features without pairwise correlation analysis. The efficiency and effectiveness of our method are presented through broad.
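The Fast Correlation-Based Filter (FCBF) of Yu and Liu is one well-known filter that identifies relevant features and the redundancy among them without examining every feature pair. It ranks features by symmetric uncertainty with the class. It then removes any feature more strongly correlated with an already selected feature than with the class. Whether the paper uses exactly this procedure is an assumption. The sketch handles discrete features only.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), in [0, 1] for discrete variables."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy      # I(X; Y)
    return 2 * mutual_info / (hx + hy)


def fcbf(features, target, delta=0.0):
    """Fast Correlation-Based Filter.

    features: dict name -> list of discrete values; target: list of class labels.
    Returns the selected feature names, most relevant first.
    """
    relevance = {f: symmetric_uncertainty(v, target) for f, v in features.items()}
    ranked = sorted((f for f in features if relevance[f] > delta),
                    key=lambda f: relevance[f], reverse=True)
    selected = []
    while ranked:
        p = ranked.pop(0)            # most relevant remaining feature
        selected.append(p)
        # Drop features that are more correlated with p than with the class (redundant).
        ranked = [q for q in ranked
                  if symmetric_uncertainty(features[p], features[q]) < relevance[q]]
    return selected


target = [0, 0, 1, 1]
features = {"a": [0, 0, 1, 1], "a_copy": [0, 0, 1, 1], "noise": [0, 1, 0, 1]}
print(fcbf(features, target))  # ['a']
```

In the toy data, `a_copy` is dropped as redundant with `a`, and `noise` is dropped as irrelevant. The remaining features would then be passed to the Bayesian classifier.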
A review of spam filtering and measures of antispam, Alexander Decker
This document discusses spam filtering and measures to reduce spam. It begins by defining spam as unsolicited email messages. It then discusses the problems caused by increasing spam volumes, such as becoming a security issue for businesses. Various types of spam are described, like text spam, image spam, and content-based spam. Methods for filtering spam are outlined, including collecting spam/anti-spam data, preprocessing to reduce noise, identifying bad senders, applying classification tools to specified content types, and storing results. Techniques for anti-spam filtering mentioned include white lists, rule-based filtering, content-based filtering, and pattern detection.
Nowadays, the Short Message Service (SMS) is the most popular way for mobile users to communicate because it is cheaper than other modes of communication. SMS is used to transmit short messages of around 160 characters to devices such as smartphones, cellular phones and PDAs using standardized communication protocols. The amount of SMS spam is increasing: the growth in mobile phone users has led to a dramatic increase in SMS spam messages, which should be put into the spam folder rather than the inbox. SMS filtering techniques are used to avoid this problem. Our proposed approach filters SMS spam on an independent mobile phone, using a large dataset, within acceptable processing time. Different approaches can automatically detect and remove most of these messages, and the best-known ones are based on Bayesian decision theory and Support Vector Machines.
Riya Mehta | Ankita Gandhi, "A Survey: SMS Spam Filtering", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-3, April 2018, URL: http://www.ijtsrd.com/papers/ijtsrd12850.pdf http://www.ijtsrd.com/computer-science/data-miining/12850/a-survey-sms-spam-filtering/riya-mehta
IRJET- An Effective Analysis of Anti Troll System using Artificial Intell..., IRJET Journal
This document discusses various techniques for detecting trolls using artificial intelligence and machine learning. It first reviews related work on sentiment analysis, supervised machine learning for troll detection, real-time sentiment analysis, and analyzing vulnerabilities in social networks. It then analyzes the limitations of current troll detection systems and how AI/ML solutions can help overcome these. The literature survey covers key approaches used for troll detection, including sentiment analysis, supervised learning models, and analyzing post vulnerabilities.
A Model for Fuzzy Logic Based Machine Learning Approach for Spam Filtering, IOSR Journals
This document discusses machine learning techniques for spam filtering, including Naive Bayes, artificial neural networks, artificial immune systems, and fuzzy logic. It provides details on how Naive Bayes classification and artificial immune system classification work for spam filtering. Naive Bayes classification involves correlating words in emails with spam or non-spam and using Bayesian inference to calculate probabilities. Artificial immune system classification is inspired by the human immune system and involves gene libraries, negative selection, and clonal selection to recognize spam. The document aims to describe different machine learning approaches for automatic spam filtering.
Malicious-URL Detection using Logistic Regression Technique, Dr. Amarjeet Singh
Over the last few years, the Web has seen massive growth in the number and kinds of web services. Web facilities such as online banking, gaming, and social networking have evolved rapidly, as has people's reliance on them to perform daily tasks. As a result, a large amount of information is uploaded to the Web daily. As these web services create new opportunities for people to interact, they also create new opportunities for criminals. URLs are launch pads for web attacks: a user with malicious intent can steal the identity of a legitimate person by sending a malicious URL. Malicious URLs are a keystone of illegitimate Internet activities, and the dangers of these sites have created a mandate for defences that protect end users from visiting them. The proposed approach classifies URLs automatically using logistic regression, a machine-learning algorithm for binary classification. The classifier achieves 97% accuracy by learning phishing URLs.
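Below is a minimal logistic-regression URL classifier, trained with batch gradient descent in pure Python. The four hand-crafted features and the toy URLs are assumptions chosen for illustration, because the abstract does not list the paper's features. The 97% figure comes from the paper, not from this code.

```python
import math


def url_features(url):
    """Hypothetical hand-crafted features; not the paper's feature set."""
    return [
        len(url) / 100.0,                              # scaled URL length
        url.count(".") / 10.0,                         # scaled number of dots
        1.0 if "@" in url else 0.0,                    # '@' can hide the real host
        1.0 if url.startswith("https://") else 0.0,    # uses HTTPS
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(url, w, b):
    """1 = malicious, 0 = benign, using a 0.5 probability threshold."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, url_features(url))) + b)
    return 1 if p >= 0.5 else 0


urls = [
    "https://example.com",
    "https://docs.python.org",
    "http://paypal.com.secure-login.verify.example.ru/@account",
    "http://192.168.1.1.login.bank.com@evil.example",
]
y = [0, 0, 1, 1]
w, b = train_logreg([url_features(u) for u in urls], y)
print([predict(u, w, b) for u in urls])
```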
IRJET- Identification of Clone Attacks in Social Networking Sites, IRJET Journal
This document discusses identifying clone attacks in social networking sites. It proposes a two-level approach using fuzzy-sim algorithm for profile matching and three algorithms (Predictive FP Growth, Eclat, and Apriori) for user activity matching. An experiment compares the execution times of the three algorithms, finding that Predictive FP Growth has the fastest time of 181 milliseconds, making it best for user activity matching.
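Apriori is one of the three algorithms compared for user activity matching. It finds frequent itemsets level by level and prunes candidates whose subsets are not frequent. The sketch treats each user session as a set of hypothetical activity labels. It does not reproduce the paper's timing experiment.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


# Hypothetical user-activity sessions.
sessions = [["login", "post", "like"], ["login", "like"], ["login", "post"], ["post", "like"]]
result = apriori(sessions, min_support=2)
print(result[frozenset({"login", "post"})])  # 2
```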
Physical and Cyber Crime Detection using Digital Forensic Approach: A Complet..., IJARIIT
Criminalization is a general phenomenon that has grown significantly in the past few years. To make the work of investigative agencies easier, the use of technology is important. Crime investigation analysis is an area of data mining that plays a crucial role in predicting and understanding criminals. In our paper, we propose an integrated model for physical crime as well as cybercrime analysis. Our approach uses data mining techniques for crime detection and criminal identification in physical crimes, and digital forensic tools (DFT) for evaluating cybercrimes. The presented tool, the Comparative Digital Forensic Process Tool (CDFPT), is based on a digital forensic model whose stages are named the Comparative Digital Forensic Process Model (CDFPM). The first step accepts the case details, categorizes the case as a physical crime or a cybercrime, and then stores the data in the corresponding database. For physical crime analysis, we use the k-means clustering algorithm to form crime clusters. The k-means results are made considerably more useful by GMAPI, which provides an advanced, user-friendly visual aid to the k-means approach for tracing the region of the crime. We apply KNN for criminal identification by examining past crimes and finding similar ones that match the current crime. If no past record is found, the new crime sample is added to the crime dataset. With the advancement of the web, network structures have become much more complicated, and attack methods even more so. For cybercrime analysis, we detect attacks executed on a host system by an outsider using various digital forensic tools, which support information security by generating reports for any event that may need investigation. Our digital technique benefits society by helping investigation agencies follow a custom-built investigative technique for crime analysis and criminal identification, instead of manually searching the database to analyze criminal activities. As a result, it helps them combat crime.
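The KNN identification step might look like the sketch below, including the fallback of adding a new crime sample when no similar past record exists. The numeric crime features, the `max_distance` cutoff that decides "no similar record", and the record format are assumptions.

```python
import math
from collections import Counter


def knn_identify(sample, records, k=3, max_distance=1.0):
    """Identify a likely suspect for a new crime from similar past crimes.

    records: list of (feature_vector, suspect_id) pairs; suspect_id may be None
    for crimes whose offender is still unknown. Returns the majority suspect
    among the k nearest records within max_distance. If there is none, the
    sample is stored as a new record and None is returned.
    """
    by_distance = sorted(
        ((math.dist(sample, feats), suspect) for feats, suspect in records),
        key=lambda pair: pair[0],
    )
    near = [s for d, s in by_distance[:k] if d <= max_distance and s is not None]
    if not near:
        records.append((sample, None))  # no match: add the new crime sample to the dataset
        return None
    return Counter(near).most_common(1)[0][0]


# Hypothetical encoded past crimes: (feature vector, known offender).
records = [([1.0, 1.0], "A"), ([1.2, 1.0], "A"), ([5.0, 5.0], "B")]
print(knn_identify([1.1, 1.0], records))  # A
```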
The document discusses how customer involvement is crucial to defending against phishing attacks. While technology plays a role, phishing relies on tricking users into taking actions. The most effective solutions are regularly educating customers on identifying phishing techniques and conducting "ethical phishing" tests to modify customer behavior over time. By maintaining awareness and vigilance through ongoing training, organizations can significantly reduce the success of phishing scams.
An iac approach for detecting profile cloning, IJNSA Journal
Nowadays, Online Social Networks (OSNs) are popular websites on which millions of users register and share their personal information with others. Privacy threats and the disclosure of personal information are the most important concerns of OSN users. Recently, a new attack named the Identity Cloned Attack has been detected on OSNs. In this attack, the attacker creates a fake identity of a real user in order to access private information that the user's friends do not publish on their public profiles. Today's OSNs offer some verification services, but they are not active services and are useful for users who are familiar with online identity issues. In this paper, identity cloned attacks are explained in more detail, and a new, precise method to detect profile cloning in online social networks is proposed. In this method, the social network is first represented as a graph, which is then divided into smaller communities according to similarities among users. Next, all profiles similar to the real profile are gathered from the same community, and the strength of relationship between each selected profile and the real profile is calculated. Profiles with weaker relationship strength are then verified by a mutual friend system. To evaluate the effectiveness of the proposed method, all steps are applied to a Facebook dataset, and the method is compared with two previous works applied to the same dataset.
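The abstract does not define "strength of relationship". As an assumption, the sketch below measures it as the Jaccard overlap between a candidate profile's friend set and the real profile's friend set. It returns the weakest candidates, those below a hypothetical threshold, for mutual-friend verification.

```python
def jaccard(a, b):
    """Overlap of two friend sets: |A ∩ B| / |A ∪ B| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def profiles_to_verify(real_friends, candidates, threshold=0.2):
    """candidates: dict profile_id -> friend set, for profiles similar to the real one.

    Returns the ids whose relationship strength with the real profile is below
    threshold, weakest first; these would go to the mutual-friend verification step.
    """
    strength = {pid: jaccard(real_friends, friends) for pid, friends in candidates.items()}
    return sorted((pid for pid, s in strength.items() if s < threshold), key=strength.get)


real = {"a", "b", "c", "d"}
candidates = {"p1": {"a", "b", "c"}, "p2": {"x", "y", "a"}, "p3": {"z"}}
print(profiles_to_verify(real, candidates))  # ['p3', 'p2']
```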
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ..., IRJET Journal
This document discusses approaches for detecting phishing websites using machine learning. It describes three main approaches: 1) analyzing features of the URL, 2) checking the legitimacy of the website by examining the hosting and management details, and 3) using visual appearance analysis to check the genuineness of the website. It then proposes a hybrid approach that uses blacklist/whitelist screening, heuristic analysis of website features, and visual similarity comparisons to flag potential phishing sites.
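The screening order described above, with lists checked first and heuristics second, can be sketched as follows. The whitelist and blacklist entries, the five red-flag heuristics and the threshold are all hypothetical. The visual-similarity stage is omitted because it needs rendered page images.

```python
from urllib.parse import urlparse

WHITELIST = {"example.com"}            # hypothetical trusted domains
BLACKLIST = {"phish.example.net"}      # hypothetical known-bad domains


def heuristic_score(url):
    """Count simple red flags in the URL (illustrative heuristics only)."""
    host = urlparse(url).hostname or ""
    flags = [
        "@" in url,                           # user-info trick can hide the real host
        host.replace(".", "").isdigit(),      # raw IPv4 address used as host
        host.count(".") >= 4,                 # many subdomains
        "-" in host,                          # hyphenated look-alike domain
        not url.lower().startswith("https://"),
    ]
    return sum(flags)


def screen(url, threshold=2):
    """Whitelist, then blacklist, then heuristics."""
    host = urlparse(url).hostname or ""
    if host in WHITELIST:
        return "legitimate"
    if host in BLACKLIST:
        return "phishing"
    return "suspicious" if heuristic_score(url) >= threshold else "legitimate"


print(screen("http://192.168.0.1/login"))  # suspicious
```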
IRJET - Unauthorized Terror Attack Tracking System using Web Usage Mining, IRJET Journal
This document describes a system for tracking unauthorized terror attacks using web usage mining and sentiment analysis of social media data. It discusses collecting tweets on a topic using keywords, preprocessing the tweets, analyzing the sentiment of each tweet using TextBlob and VADER sentiment analysis tools, and visualizing the results through graphs and tables. The system aims to help detect terror-related activities by analyzing opinions and sentiments expressed on social media platforms like Twitter.
ReCAPTCHA is a system that uses CAPTCHAs to solve two problems at once: distinguishing humans from computers online, and digitizing old printed works. It displays words from scanned books that optical character recognition software was unable to read, and asks users to transcribe them. If users correctly type both the unknown word and a control word, their response is recorded and helps improve digitization. ReCAPTCHA has transcribed over 440 million words from scanned texts, matching the accuracy of professional human transcription. It is deployed on over 40,000 websites and harnesses the collective human effort of solving CAPTCHAs to advance an important goal.
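The control-word logic can be sketched as a small voting scheme. A user passes only if the known word is typed correctly, and their reading of the unknown word is then counted as a vote. The agreement threshold of three votes and the in-memory storage are assumptions for illustration, not reCAPTCHA's actual parameters.

```python
from collections import Counter, defaultdict

votes = defaultdict(Counter)   # unknown-word image id -> Counter of typed readings
AGREEMENT_NEEDED = 3           # hypothetical acceptance threshold


def submit(control_expected, control_typed, unknown_id, unknown_typed):
    """Return True if the user passes; if so, record their reading of the unknown word."""
    if control_typed.strip().lower() != control_expected.lower():
        return False           # control word wrong: reject and discard the unknown-word answer
    votes[unknown_id][unknown_typed.strip().lower()] += 1
    return True


def transcription(unknown_id):
    """Accepted reading of the unknown word once enough users agree, else None."""
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= AGREEMENT_NEEDED else None


for typed in ("morning", "Morning", "morning"):
    submit("upon", "upon", "img1", typed)
print(transcription("img1"))  # morning
```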
The document describes the Columbia-GWU system submitted to the 2016 TAC KBP BeSt Evaluation. It discusses several approaches used for different languages and genres, including:
1) A sentiment system based on identifying the target only, adapted for English, Chinese, and Spanish.
2) An English sentiment system based on relation extraction, treating sentiment as a relation between source and target.
3) English and Chinese belief systems that combine high-precision word tagging with a high-recall default system.
4) A Spanish belief system based on weighted random choice of tags.
The document provides details on the data, approaches, and results for each language-specific system.
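A weighted random choice of tags can be implemented with `random.Random.choices`, which draws each tag in proportion to its weight. The sketch assumes the weights are tag frequencies from training annotations. The tag names in the example are placeholders.

```python
import random
from collections import Counter


def fit_tag_weights(training_tags):
    """Relative frequency of each tag in (hypothetical) training annotations."""
    counts = Counter(training_tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def tag_randomly(n_items, weights, seed=None):
    """Assign each item a tag drawn at random in proportion to the tag's weight."""
    rng = random.Random(seed)
    tags = list(weights)
    return rng.choices(tags, weights=[weights[t] for t in tags], k=n_items)


weights = fit_tag_weights(["CB", "CB", "CB", "NA"])
print(weights)                          # {'CB': 0.75, 'NA': 0.25}
print(tag_randomly(5, weights, seed=0))
```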
The Panchayat Minister in Chhattisgarh revealed that Anganwadis provide mid-day meals and run ration shops, and that most of them are run by women.
Chen Yoke Pin - Asian TYA Network event presentation at ricca ricca*festa 2016, TYA Asia
Chen Yoke Pin - Asian TYA Network event presentation at ricca ricca*festa 2016.
29th July 2016, Naha, Okinawa, Japan.
Chen Yoke Pin (Malaysia)
Arts-Ed
Senior Manager
Asian TYA Network researches and promotes TYA (Theatre for Young Audiences) in East/South-East Asia, and networks to connect TYA professionals. Organiser of ricca ricca*festa, co-organised by ACO Okinawa and The Japan Foundation Asia Center.
Find out more at our website, Facebook page and via Twitter:
http://tya-asia.com
https://www.facebook.com/asianTYAnetwork
https://twitter.com/asianTYAnetwork
The Ministry of Culture is responsible for issues relating to culture, media, democracy, human rights at the national level, national minorities, and the language and culture of the Sami people.
The document describes a learning project for grade 10 students on the life and work of the physicist Stephen Hawking. The students worked in groups to produce a critical summary of the film "The Theory of Everything", applying critical analysis and collaborative work.
Liew Kung Yu - Asian TYA Network event presentation at ricca ricca*festa 2016, TYA Asia
Liew Kung Yu - Asian TYA Network event presentation at ricca ricca*festa 2016.
29th July 2016, Naha, Okinawa, Japan.
Kung Yu Liew (Malaysia)
Visual Artist
Cross breed Spam Categorization Method using Machine Learning Techniques (IJSRED)
This document presents a hybrid spam filtering method using machine learning techniques of Naive Bayes and Markov Random Fields. It aims to efficiently identify and filter spam emails. The proposed method is evaluated based on accuracy, precision, and time taken. The results show the hybrid technique achieves a high true positive rate in detecting spam emails. Keywords discussed are email spam, Naive Bayes algorithm, and Markov Random Fields.
AN INTELLIGENT CLASSIFICATION MODEL FOR PHISHING EMAIL DETECTION (IJNSA Journal)
Phishing attacks are among the trending cyber-attacks: professional hackers send socially engineered messages aimed at fooling users into revealing their sensitive information, and the most popular channel for these messages is users' email. This paper presents an intelligent classification model for detecting phishing emails using knowledge discovery, data mining, and text processing techniques. It introduces the concept of phishing terms weighting, which evaluates the weight of phishing terms in each email. The pre-processing phase is enhanced by applying text stemming and the WordNet ontology to enrich the model with word synonyms. The model applied the knowledge discovery procedures using five popular classification algorithms and achieved a notable improvement in classification accuracy: 99.1% accuracy using the Random Forest algorithm and 98.4% using J48, which is, to our knowledge, the highest accuracy rate for an accredited data set. This paper also presents a comparative study with similar proposed classification techniques.
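The phishing terms weighting idea can be illustrated with a small sketch that scores an email by the share of its tokens that appear in a list of phishing terms. The term list, the tokenizer, and the length normalization below are assumptions for illustration, not the paper's definition; stemming and WordNet synonym expansion are omitted.

```python
import re
from collections import Counter

# Hypothetical phishing-term list; the paper's actual terms and weights are not given here.
PHISHING_TERMS = {"verify", "account", "suspend", "password", "click", "urgent", "login"}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-letters (no stemming or WordNet expansion)."""
    return re.findall(r"[a-z]+", text.lower())


def phishing_term_weight(text: str, terms: set[str] = PHISHING_TERMS) -> float:
    """Fraction of the email's tokens that are phishing terms (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[t] for t in terms)  # Counter returns 0 for absent terms
    return hits / len(tokens)
```

For example, `phishing_term_weight("Urgent: verify your account now")` returns 0.6, since three of the five tokens are listed terms. A per-email weight like this could then be fed to a classifier as one feature among many.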
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele... (Editor IJCATR)
The Bayesian classifier performs well in some domains and poorly in others; its performance suffers in domains that involve correlated features. Feature selection is beneficial for reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving the comprehensibility of results. However, the recent increase in data dimensionality poses a hard challenge to many existing feature selection methods with respect to efficiency and effectiveness. In this paper, a Bayesian classifier with correlation-based feature selection is introduced that can identify relevant features, as well as redundancy among relevant features, without pairwise correlation analysis. The efficiency and effectiveness of our method is presented through broad.
A review of spam filtering and measures of antispam (Alexander Decker)
This document discusses spam filtering and measures to reduce spam. It begins by defining spam as unsolicited email messages. It then discusses the problems caused by increasing spam volumes, such as becoming a security issue for businesses. Various types of spam are described, like text spam, image spam, and content-based spam. Methods for filtering spam are outlined, including collecting spam/anti-spam data, preprocessing to reduce noise, identifying bad senders, applying classification tools to specified content types, and storing results. Techniques for anti-spam filtering mentioned include white lists, rule-based filtering, content-based filtering, and pattern detection.
Nowadays, the Short Message Service (SMS) is the most popular way for mobile users to communicate, because it is cheaper than other modes of communication. SMS is used to transmit short messages of around 160 characters to devices such as smartphones, cellular phones, and PDAs using standardized communication protocols. The amount of SMS spam is increasing: the growth in mobile phone users has led to a dramatic increase in SMS spam messages, which should be put into a spam folder rather than the inbox. SMS filtering techniques are used to avoid this problem. Our proposed approach filters SMS spam on an independent mobile phone, on a large dataset, with acceptable processing time. Different approaches are able to automatically detect and remove most of these messages, and the best-known ones are based on Bayesian decision theory and Support Vector Machines. Riya Mehta | Ankita Gandhi, "A Survey: SMS Spam Filtering", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-3, April 2018, URL: http://www.ijtsrd.com/papers/ijtsrd12850.pdf http://www.ijtsrd.com/computer-science/data-miining/12850/a-survey-sms-spam-filtering/riya-mehta
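Bayesian decision theory, one of the approaches the survey mentions, can be made concrete with asymmetric misclassification costs. In this sketch, a legitimate SMS wrongly sent to the spam folder is assumed to cost `cost_false_positive` units, while a missed spam costs `cost_false_negative` units; the function names and cost values are hypothetical. Choosing "spam" has expected loss `c_fp * P(ham|x)` and choosing "inbox" has expected loss `c_fn * P(spam|x)`, so "spam" wins when `P(spam|x) > c_fp / (c_fp + c_fn)`.

```python
def spam_threshold(cost_false_positive: float, cost_false_negative: float = 1.0) -> float:
    """Posterior P(spam|x) above which labeling 'spam' has the lower expected loss.

    Expected loss of 'spam'  = c_fp * (1 - p)
    Expected loss of 'inbox' = c_fn * p
    'spam' is cheaper exactly when p > c_fp / (c_fp + c_fn).
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def decide(p_spam: float, cost_false_positive: float, cost_false_negative: float = 1.0) -> str:
    """Route a message given its spam posterior and the (hypothetical) costs."""
    threshold = spam_threshold(cost_false_positive, cost_false_negative)
    return "spam" if p_spam > threshold else "inbox"
```

With a 9:1 cost ratio, a message is filtered only when its spam posterior exceeds 0.9, trading some missed spam for fewer lost legitimate messages. The posterior itself would come from a model such as Naive Bayes.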
IRJET- An Effective Analysis of Anti Troll System using Artificial Intell... (IRJET Journal)
This document discusses various techniques for detecting trolls using artificial intelligence and machine learning. It first reviews related work on sentiment analysis, supervised machine learning for troll detection, real-time sentiment analysis, and analyzing vulnerabilities in social networks. It then analyzes the limitations of current troll detection systems and how AI/ML solutions can help overcome these. The literature survey covers key approaches used for troll detection, including sentiment analysis, supervised learning models, and analyzing post vulnerabilities.
A Model for Fuzzy Logic Based Machine Learning Approach for Spam Filtering (IOSR Journals)
This document discusses machine learning techniques for spam filtering, including Naive Bayes, artificial neural networks, artificial immune systems, and fuzzy logic. It provides details on how Naive Bayes classification and artificial immune system classification work for spam filtering. Naive Bayes classification involves correlating words in emails with spam or non-spam and using Bayesian inference to calculate probabilities. Artificial immune system classification is inspired by the human immune system and involves gene libraries, negative selection, and clonal selection to recognize spam. The document aims to describe different machine learning approaches for automatic spam filtering.
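The Naive Bayes procedure described above can be sketched as a multinomial classifier with add-one (Laplace) smoothing that compares log posteriors. Whitespace tokenization, the `spam`/`ham` labels, and the class name are assumptions; the artificial immune system and fuzzy logic approaches are not shown.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def train(self, text: str, label: str) -> None:
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_posterior(self, text: str, label: str) -> float:
        """Unnormalized log P(label) + sum of log P(word | label).

        Assumes both labels have at least one training message.
        """
        total_docs = sum(self.doc_counts.values())
        log_p = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        vocab_size = len(self.vocab)
        for word in text.lower().split():
            # Smoothing keeps unseen words from zeroing out the product.
            log_p += math.log((self.word_counts[label][word] + 1) / (total_words + vocab_size))
        return log_p

    def classify(self, text: str) -> str:
        return max(("spam", "ham"), key=lambda label: self.log_posterior(text, label))
```

Working in log space avoids floating-point underflow when a message contains many words.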
Malicious-URL Detection using Logistic Regression Technique (Dr. Amarjeet Singh)
Over the last few years, the Web has seen massive growth in the number and kinds of web services. Web facilities such as online banking, gaming, and social networking have evolved rapidly, as has people's reliance on them for daily tasks. As a result, a large amount of information is uploaded to the Web every day. As these web services create new opportunities for people to interact, they also create new opportunities for criminals. URLs are the launch pads for web attacks: a malicious user can steal a legitimate person's identity by sending a malicious URL. Malicious URLs are a keystone of illegitimate Internet activity, and the dangers of these sites have created a need for defences that protect end users from visiting them. The proposed approach classifies URLs automatically using logistic regression, a machine learning algorithm for binary classification. The classifier achieves 97% accuracy by learning from phishing URLs.
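A minimal sketch of URL classification with logistic regression, trained by stochastic gradient descent on the log-loss. The lexical features (length, dots, `@`, digits, missing HTTPS) and the toy URLs are hypothetical choices for illustration; the paper's feature set and its 97% result are not reproduced here.

```python
import math


def url_features(url: str) -> list[float]:
    """Hypothetical lexical features, scaled to roughly [0, 1]."""
    return [
        1.0,                                          # bias term
        len(url) / 100.0,                             # URL length
        url.count(".") / 10.0,                        # number of dots
        1.0 if "@" in url else 0.0,                   # '@' can hide the real host
        sum(c.isdigit() for c in url) / 10.0,         # digit count
        0.0 if url.startswith("https://") else 1.0,   # no TLS
    ]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(urls: list[str], labels: list[int], lr: float = 0.5, epochs: int = 1000) -> list[float]:
    weights = [0.0] * len(url_features(urls[0]))
    for _ in range(epochs):
        for url, y in zip(urls, labels):
            x = url_features(url)
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            # Gradient of the log-loss for one example is (p - y) * x.
            weights = [w - lr * (p - y) * xi for w, xi in zip(weights, x)]
    return weights


def predict_proba(weights: list[float], url: str) -> float:
    """Estimated probability that the URL is malicious (label 1)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, url_features(url))))
```

In practice, a labeled corpus of benign and phishing URLs and a held-out test set would be needed to estimate accuracy.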
IRJET- Identification of Clone Attacks in Social Networking Sites (IRJET Journal)
This document discusses identifying clone attacks in social networking sites. It proposes a two-level approach using fuzzy-sim algorithm for profile matching and three algorithms (Predictive FP Growth, Eclat, and Apriori) for user activity matching. An experiment compares the execution times of the three algorithms, finding that Predictive FP Growth has the fastest time of 181 milliseconds, making it best for user activity matching.
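Apriori, one of the three algorithms compared for user activity matching, can be sketched as level-wise frequent-itemset mining with candidate pruning. The activity names and support threshold are hypothetical, and this sketch says nothing about the relative timings reported in the paper.

```python
from itertools import combinations


def apriori(transactions: list[set[str]], min_support: int) -> dict[frozenset, int]:
    """Return every itemset that appears in at least min_support transactions."""
    # Level 1: count single items.
    counts: dict[frozenset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        prev = list(frequent)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

Frequent activity patterns mined this way for a real account and a suspected clone could then be compared.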
Physical and Cyber Crime Detection using Digital Forensic Approach: A Complet... (IJARIIT)
Crime is a widespread phenomenon that has increased significantly in the past few years. To make the work of investigative agencies easier, the use of technology is important. Crime analysis, an area of data mining, plays a crucial role in predicting and identifying criminals. In our paper, we propose an integrated model for both physical crime and cybercrime analysis. Our approach uses data mining techniques for crime detection and criminal identification in physical crimes, and digital forensic tools (DFT) for evaluating cybercrimes. The presented tool, the Comparative Digital Forensic Process Tool (CDFPT), is based entirely on a digital forensic model and its stages, the Comparative Digital Forensic Process Model (CDFPM). The first step is accepting the case details, categorizing the case as a physical crime or a cybercrime, and finally storing the data in the respective databases. For physical crime analysis, we use the k-means clustering algorithm to form crime clusters. The k-means results are made considerably more useful by GMAPI, which provides an advanced, user-friendly visual aid for tracing the region of the crime. We apply KNN for criminal identification by examining past crimes and finding similar ones that match the current crime; if no past record is found, the new crime sample is added to the crime dataset. With advances in the web, network structures have become much more complex, and attack methods have multiplied as well. For cybercrime analysis, we detect attacks executed on a host system by an outsider using assorted digital forensic tools, which provide information security by generating reports for any event that may need investigation. Our digital technique benefits society by helping investigative agencies follow a custom-built investigative technique for crime analysis and criminal identification, instead of manually searching the database to analyze criminal activities, and thereby helps them combat crime.
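The k-means step for grouping physical crimes can be sketched as Lloyd's algorithm over 2-D coordinates. Passing fixed initial centroids keeps the example deterministic. The coordinates are hypothetical, and the GMAPI visualization and KNN identification steps are not shown.

```python
import math

Point = tuple[float, float]


def kmeans(points: list[Point], centroids: list[Point],
           iterations: int = 100) -> tuple[list[Point], list[int]]:
    """Lloyd's k-means with caller-supplied initial centroids."""
    assignment: list[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids: list[Point] = []
        for j, c in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(c)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, assignment
```

Each resulting cluster center marks a hotspot region that a map view could highlight.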
The document discusses how customer involvement is crucial to defending against phishing attacks. While technology plays a role, phishing relies on tricking users into taking actions. The most effective solutions are regularly educating customers on identifying phishing techniques and conducting "ethical phishing" tests to modify customer behavior over time. By maintaining awareness and vigilance through ongoing training, organizations can significantly reduce the success of phishing scams.
An iac approach for detecting profile cloning (IJNSA Journal)
Nowadays, Online Social Networks (OSNs) are popular websites on which millions of users register and share their personal information with others. Privacy threats and the disclosure of personal information are the most important concerns of OSN users. Recently, a new attack named the Identity Cloned Attack has been detected on OSNs. In this attack, the attacker creates a fake identity of a real user in order to access private information of that user's friends that they do not publish on their public profiles. Today's OSNs offer some verification services, but they are not active services and are useful mainly to users who are already familiar with online identity issues. In this paper, identity cloned attacks are explained in more detail, and a new and precise method to detect profile cloning in online social networks is proposed. In this method, the social network is first represented as a graph, which is then divided into smaller communities according to the similarities among users. Next, all profiles similar to the real profile are gathered from the same community, and the strength of the relationship between each selected profile and the real profile is calculated; those with the weakest relationship are verified through a mutual-friend system. To evaluate the effectiveness of the proposed method, all steps are applied to a Facebook dataset, and the method is compared with two previous works applied to the same dataset.
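The final stages of the method (gather similar profiles, score relationship strength, flag the weakest for verification) can be sketched as follows. The similarity measure (share of matching profile attributes) and the strength measure (Jaccard overlap of friend lists) are assumptions for illustration, and the graph community-detection step is skipped. Flagged profiles would then go to mutual-friend verification.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two friend sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def attribute_similarity(p: dict[str, str], q: dict[str, str]) -> float:
    """Share of profile fields (name, city, ...) with identical values."""
    keys = set(p) | set(q)
    return sum(p.get(k) == q.get(k) for k in keys) / len(keys) if keys else 0.0


def suspected_clones(real_id: str,
                     profiles: dict[str, dict[str, str]],
                     friends: dict[str, set[str]],
                     min_similarity: float = 0.6,
                     max_strength: float = 0.2) -> list[str]:
    """Profiles that look like real_id but are weakly tied to its network."""
    real = profiles[real_id]
    similar = [pid for pid, attrs in profiles.items()
               if pid != real_id and attribute_similarity(real, attrs) >= min_similarity]
    # Weakest ties first; these are the candidates for mutual-friend verification.
    scored = sorted((jaccard(friends[real_id], friends[pid]), pid) for pid in similar)
    return [pid for strength, pid in scored if strength <= max_strength]
```

The thresholds are hypothetical. A real deployment would tune them against labeled clone data.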
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ... (IRJET Journal)
This document discusses approaches for detecting phishing websites using machine learning. It describes three main approaches: 1) analyzing features of the URL, 2) checking the legitimacy of the website by examining the hosting and management details, and 3) using visual appearance analysis to check the genuineness of the website. It then proposes a hybrid approach that uses blacklist/whitelist screening, heuristic analysis of website features, and visual similarity comparisons to flag potential phishing sites.
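The first two stages of the hybrid approach (list screening, then heuristic analysis) can be sketched as below. The list entries, red-flag rules, and threshold are hypothetical, and the visual-similarity stage is omitted.

```python
from urllib.parse import urlparse

# Hypothetical lists; a real deployment would load curated feeds.
WHITELIST = {"example.com"}
BLACKLIST = {"evil-login.example.net"}


def heuristic_score(url: str) -> int:
    """Count simple red flags in a URL (illustrative rules only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    flags = [
        parsed.scheme != "https",            # no TLS
        "@" in url,                          # user-info trick can hide the real host
        host.replace(".", "").isdigit(),     # raw IPv4 address as host
        host.count(".") > 3,                 # deeply nested subdomains
        "-" in host,                         # hyphenated look-alike domains
        len(url) > 75,                       # unusually long URL
    ]
    return sum(flags)


def screen(url: str, threshold: int = 3) -> str:
    host = urlparse(url).hostname or ""
    if host in WHITELIST:
        return "legitimate"
    if host in BLACKLIST:
        return "phishing"
    return "phishing" if heuristic_score(url) >= threshold else "legitimate"
```

List lookups are cheap and decisive, so they run first. The heuristics only handle URLs that neither list covers.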
IRJET - Unauthorized Terror Attack Tracking System using Web Usage Mining (IRJET Journal)
This document describes a system for tracking unauthorized terror attacks using web usage mining and sentiment analysis of social media data. It discusses collecting tweets on a topic using keywords, preprocessing the tweets, analyzing the sentiment of each tweet using TextBlob and VADER sentiment analysis tools, and visualizing the results through graphs and tables. The system aims to help detect terror-related activities by analyzing opinions and sentiments expressed on social media platforms like Twitter.
ReCAPTCHA is a system that uses CAPTCHAs to solve two problems at once: distinguishing humans from computers online, and digitizing old printed works. It displays words from scanned books that optical character recognition software was unable to read, and asks users to transcribe them alongside a control word whose answer is already known. If a user types the control word correctly, the response is treated as human and the transcription of the unknown word is recorded, which helps improve digitization. ReCAPTCHA has transcribed over 440 million words from scanned texts, matching the accuracy of professional human transcription. It is deployed on over 40,000 websites and harnesses the collective human effort of solving CAPTCHAs to advance an important goal.
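The control-word mechanism can be sketched as follows. A response counts as human only if the known control word matches, and only then is the transcription of the unknown word recorded as a vote. The class name and the agreement threshold (`votes_needed`) are assumptions, not reCAPTCHA's actual parameters.

```python
from collections import Counter, defaultdict
from typing import Optional


class ReCaptchaSketch:
    """Hypothetical control-word check plus vote aggregation for unknown words."""

    def __init__(self, votes_needed: int = 3) -> None:
        # unknown word id -> Counter of normalized transcriptions
        self.votes: defaultdict = defaultdict(Counter)
        self.votes_needed = votes_needed

    def submit(self, control_answer: str, control_typed: str,
               unknown_id: str, unknown_typed: str) -> bool:
        """Return True (treated as human) if the control word matches; only then record the vote."""
        if control_typed.strip().lower() != control_answer.lower():
            return False
        self.votes[unknown_id][unknown_typed.strip().lower()] += 1
        return True

    def transcription(self, unknown_id: str) -> Optional[str]:
        """Most common answer once it has enough agreeing votes, else None."""
        if not self.votes[unknown_id]:
            return None
        word, count = self.votes[unknown_id].most_common(1)[0]
        return word if count >= self.votes_needed else None
```

Requiring several agreeing votes guards against a single careless or malicious answer being accepted as the transcription.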
The document describes the Columbia-GWU system submitted to the 2016 TAC KBP BeSt Evaluation. It discusses several approaches used for different languages and genres, including:
1) A sentiment system based on identifying the target only, adapted for English, Chinese, and Spanish.
2) An English sentiment system based on relation extraction, treating sentiment as a relation between source and target.
3) English and Chinese belief systems that combine high-precision word tagging with a high-recall default system.
4) A Spanish belief system based on weighted random choice of tags.
The document provides details on the data, approaches, and results for each language-specific system.
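A weighted random choice of tags, as in the Spanish belief system, is essentially a baseline that samples labels in proportion to their expected frequency. The tag names and weights below are assumptions for illustration; in practice the weights would come from training-data frequencies.

```python
import random

# Hypothetical belief tags and weights (e.g. estimated from training-data frequencies).
BELIEF_TAGS = ["cb", "ncb", "rob", "na"]
TAG_WEIGHTS = [0.7, 0.1, 0.1, 0.1]


def tag_beliefs(n_targets: int, seed: int = 0) -> list[str]:
    """Assign each target a tag drawn at random in proportion to TAG_WEIGHTS."""
    rng = random.Random(seed)  # seeded for reproducible output
    return rng.choices(BELIEF_TAGS, weights=TAG_WEIGHTS, k=n_targets)
```

Such a baseline ignores the text entirely. Its value is as a floor against which a learned system can be compared.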
The Health Minister of Chhattisgarh, Shri Ajay Chandrakar, will oversee the third phase of the immunization drive 'Mission Indradhanush', launched by the state's Health and Family Welfare Department in eleven areas. The campaign aims to vaccinate children against tuberculosis, diphtheria, pertussis (whooping cough), tetanus, hepatitis B, measles, and polio to promote public health. The Ministry of Health and Family Welfare oversees health and family planning programs across India.
Melissa Tan - Asian TYA Network event presentation at ricca ricca*festa 2016 (TYA Asia)
This document provides an overview of arts engagement efforts for young audiences in Singapore. It discusses the benefits of engaging young audiences such as encouraging lifelong interest in the arts and building cultural familiarity. Current landscape efforts include arts exposure in schools and established festival platforms. New efforts since 2014 include expanding quality arts programs in early childhood, building artist capacity, and raising awareness. Future plans involve deepening pre-school arts efforts, developing a dedicated children's arts center, and increasing arts opportunities for all children aged 12 and under to experience the arts both in and outside of school. The overall goal is to nurture life-long arts audiences, strengthen family bonds through shared arts experiences, and develop creative and confident learners.
This document provides contact information for Thomson Specialist Skin Centre, a Singapore clinic that treats skin conditions such as eczema and hives. The clinic is located at Novena Medical Center, 10 Sinaran Drive #10-05, Square 2, Singapore.
This document provides an introduction to the basic use of Microsoft Word 2010. It explains how to open Word and identify interface elements such as the ribbon, and describes the basic steps for creating a new document, such as setting up the page, typing the text, and saving and reviewing the document. It also covers topics such as changing document views, selecting text and paragraph formats, and saving and opening files. The document serves as a practical guide for beginning Word users.
Adjjima Na Patalung & Pavinee Samakkabutr - Asian TYA Network event presentation at ricca ricca*festa 2016 (TYA Asia)
29th July 2016, Naha, Okinawa, Japan.
Adjjima Na Patalung (Thailand) & Pavinee Samakkabutr (Thailand)
Bangkok International Children’s Theatre Festival (BICT Fest)
http://www.bictfest.com
Nutrition in Childhood and Adolescence (rocio piñanez)
This document discusses the nutritional needs of children and recommendations for a healthy diet. Children require greater amounts of energy and nutrients such as carbohydrates, proteins, fats, vitamins, and minerals to support their growth and development. Iron is important for hemoglobin production and resistance to disease, and is found in foods such as meat and green leafy vegetables. The recommendations include eating at regular times, consuming foods from all the groups
How does your media product represent particular social groups? evaluation qu... (Ethan_Whitmore)
The document discusses how different social groups are represented in the author's film. It represents teens, males, and a triad group. Teens are represented by casting teen actors and having them dress casually. Males are represented through casting male actors and including props like guns that appeal to males. The triad bad guys are the antagonists represented by a group of three people working as a team. These various social groups are part of a conflict between good and evil in the film.
The document discusses different types of camera shots used in photojournalism and filmmaking. It defines wide shots, close-ups, extreme close-ups, medium shots, head and shoulder shots, long shots, two shots, point of view shots, over-the-shoulder shots, cut-in shots, and cut away shots. For each shot type, it provides examples and explains how the shot frames and focuses on the subject or scene. The goal is to help photographers and filmmakers understand how different shot types can be used to achieve different effects and guide the viewer's perspective.
This document summarizes 12 sources on topics related to cybersecurity threats such as spam, phishing, ransomware, and malware detection techniques. The sources discuss using machine learning algorithms and heuristic approaches to detect spam accounts, phishing websites, and ransomware variants. Some propose new techniques like using Twitter data and mobile messages to detect spam or combining feature selection with metaheuristic algorithms to identify phishing sites. Others analyze hyperlinks or topic distributions to identify phishing attacks or classify anti-phishing solutions. The document also mentions the rise of ransomware attacks and techniques used in related studies.
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele... (Editor IJCATR)
This document summarizes a research paper on using correlation-based feature subset selection to improve spam detection accuracy when using a Bayesian classifier. The researchers introduce using feature subset selection to identify the most relevant features of spam emails while removing redundant features. This improves the accuracy of a naïve Bayesian classifier for spam detection from 65-74% to over 80%. It discusses how correlation-based feature subset selection works by selecting features highly correlated with the class (spam or not spam) but uncorrelated with each other. The researchers apply this method to a spam email dataset and achieve over 92% accuracy in spam detection using a Bayesian network classifier after feature subset selection, an improvement over using the classifier alone.
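Correlation-based feature subset selection is commonly scored with a merit heuristic that rewards feature-class correlation and penalizes feature-feature correlation. For a subset of k features with mean feature-class correlation `r_cf` and mean feature-feature correlation `r_ff`, the heuristic is `merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff)`. The sketch below uses absolute Pearson correlation and greedy forward search; both choices, and the toy data, are assumptions rather than the researchers' exact setup.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def cfs_merit(subset: list[str], columns: dict[str, list[float]], target: list[float]) -> float:
    k = len(subset)
    r_cf = sum(abs(pearson(columns[f], target)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(columns: dict[str, list[float]], target: list[float]) -> list[str]:
    """Greedily add the feature that most improves merit; stop when nothing helps."""
    selected: list[str] = []
    best = 0.0
    remaining = set(columns)
    while remaining:
        merit, feature = max((cfs_merit(selected + [f], columns, target), f) for f in remaining)
        if merit <= best:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = merit
    return selected
```

A feature that duplicates one already selected raises `r_ff` as much as `r_cf`, so it adds no merit and is skipped. This is how redundancy is removed.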
This document summarizes a student project on building a spam classifier. It defines spam and the problems it causes. It then introduces the goal of building a tool to identify spam messages. It reviews literature on spamming and organized cybercrime. The proposed solution discusses features of a modern spam filter, including threat detection using AI and machine learning. It provides a block diagram of the spam classifier that includes collecting an email data set, pre-processing email content, extracting and selecting features, implementing a K-Nearest Neighbors algorithm, and analyzing performance.
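The K-Nearest Neighbors step in the block diagram can be sketched with bag-of-words vectors and cosine similarity. The tokenizer, similarity measure, `k`, and sample messages are assumptions; the project's actual pre-processing and feature selection are not specified.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(text: str, training: list[tuple[str, str]], k: int = 3) -> str:
    """training holds (message, label) pairs; return the majority label of the k nearest."""
    query = bag_of_words(text)
    neighbours = sorted(training,
                        key=lambda item: cosine(query, bag_of_words(item[0])),
                        reverse=True)[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]
```

Because KNN stores every training message and compares against all of them, classification cost grows with the dataset. This is one reason feature selection matters here.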
Electronic mail is a widely used and highly suitable method of transferring messages electronically from one person to another, to and from any part of the world. Its speed, dependability, well-equipped storage options, and large number of added services make it highly popular among people from all sectors of business and society. But its popularity also has a negative side: email is a preferred medium for a large number of attacks over the internet, and spam is among the most common of these. Some methods are effective at detecting spam-related mail, but they have high false-positive rates. Filters such as checksum-based filters, Bayesian filters, machine-learning-based filters, and memory-based filters are commonly used to recognize spam. Because spammers constantly try to find ways to avoid existing filters, new filters need to be developed to catch spam. This paper proposes an efficient spam mail filtering method using a user-profile-based ontology. Ontologies allow machine-understandable semantics of data, which is key to exchanging information for more efficient spam filtering. Thus, it is essential to build an ontology and a framework for effective email filtering. Using an ontology designed specifically to filter spam, a large amount of useless bulk email can be filtered out by the system. We propose a user-profile-based spam filter that classifies email based on the likelihood that the user-profile features it contains have appeared in spam or in valid email.
Spam consists of unwanted and undesirable emails that are mass-sent to numerous victims. The growing penetration of spam into electronic processors and communication equipment such as computers and mobile phones, the lack of control over information shared on the internet and other communication networks, and the inefficiency of spam detection methods developed for Persian contexts are among the main challenges facing Persian-speaking users. This paper presents a novel and efficient method for the thematic identification of Persian spam. The proposed method can identify both Persian spam and “Penglish” spam. “Penglish”, a blend of the words Persian and English, denotes Persian text written in English letters. Based on an experimental analysis of 10,000 spam messages of different types, the efficiency of the proposed method is evaluated at more than 98%. The presented method can also update its databases using feedback received from users.
In the era of information technology, information sharing has become very easy and fast. Many platforms allow users to share information anywhere in the world. Among all information-sharing media, email is the simplest, cheapest, and most rapid worldwide. But because of their simplicity, emails are vulnerable to different kinds of attacks, the most common and dangerous of which is spam. No one wants to receive emails unrelated to their interests, because they waste the receiver's time and resources. Moreover, these emails can carry malicious content hidden in attachments or URLs that may lead to security breaches of the host system. Spam is any irrelevant and unwanted message sent by an attacker to a significant number of recipients by email or any other medium of information sharing. Consequently, there is an immense demand for the security of email systems. Spam emails may carry viruses, RATs (remote access trojans), and other Trojans. Attackers mostly use this technique to lure users toward online services. They may send spam emails containing attachments with multiple file extensions, or packed URLs that lead users to malicious and spamming websites and end in some form of data or financial fraud and identity theft. Many email providers allow their users to create keyword-based rules that automatically filter emails. Still, this approach is not very useful: it is difficult to set up, and users do not want to customize their email, which leaves their accounts open to spammers.
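One red flag mentioned above, attachments with multiple file extensions, is easy to check mechanically. This sketch flags names such as `invoice.pdf.exe`, where a decoy extension precedes an executable one. The list of risky extensions is a hypothetical example, not a complete policy.

```python
import os

# Hypothetical set of executable/script extensions to flag.
RISKY_EXTENSIONS = {".exe", ".scr", ".js", ".vbs", ".bat", ".cmd"}


def has_deceptive_double_extension(filename: str) -> bool:
    """True for names like 'invoice.pdf.exe': a decoy extension followed by a risky one."""
    root, last = os.path.splitext(filename.lower())
    _, decoy = os.path.splitext(root)
    return last in RISKY_EXTENSIONS and decoy != ""
```

Benign double extensions such as `archive.tar.gz` pass, because only the final extension is checked against the risky set.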
An Indistinguishability Model for Evaluating Diverse Classes of Phishing Atta... (CSCJournals)
This document presents a theoretical model for evaluating phishing attacks based on the indistinguishability of natural and phishing message distributions. The model views a phishing attack as an attempt to generate messages that are indistinguishable from normal messages. It captures a phishing attack in terms of the statistical distance between the natural and phishing message probability distributions. The model also proposes metrics to analyze the success probability of phishing attacks and distinguisher algorithms. Finally, it discusses a new class of collaborative spear phishing attacks enabled by data breaches at large companies.
An Indistinguishability Model for Evaluating Diverse Classes of Phishing Atta... (CSCJournals)
Phishing is a growing threat to Internet users and causes billions of dollars in damage every year. While there are a number of research articles that study the tactics, techniques and procedures employed by phishers in the literature, in this paper, we present a theoretical yet practical model to study this menacing threat in a formal manner. While it is common folklore knowledge that a successful phishing attack entails creating messages that are indistinguishable from the natural, expected messages by the intended victim, this concept has not been formalized. Our model attempts to capture a phishing attack in terms of this indistinguishability between the natural and phishing message probability distributions. We view the actions performed by a phisher as an attempt to create messages that are indistinguishable to the victim from that of “normal” messages. To the best of our knowledge, this is the first study that places phishing on a concrete theoretical framework and offers a new perspective to analyze this threat. We propose metrics to analyze the success probability of a phishing attack taking into account the input used by a phisher and the work involved in creating deceptive email messages. Finally, we study and apply our model to a new class of phishing attacks called collaborative spear phishing that is gaining momentum. Recent examples include Operation Woolen-Goldfish in 2015, Rocket Kitten in 2014 and Epsilon email breach in 2011. We point out fundamental flaws in the current email-based marketing business model which enables such targeted spear phishing collaborative attacks. In this sense, our study is very timely and presents new and emerging trends in phishing.
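The model describes a phishing attack through the statistical distance between natural and phishing message distributions. The paper's exact metric is not given here, so this sketch assumes total variation distance (TVD) over discrete message types. With equal priors, the best possible distinguisher guesses a message's source correctly with probability 1/2 + TVD/2. The distributions are hypothetical.

```python
def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def best_distinguisher_success(p: dict[str, float], q: dict[str, float]) -> float:
    """Max probability of correctly naming the source of one message (equal priors)."""
    return 0.5 + 0.5 * total_variation(p, q)
```

A perfectly indistinguishable phishing distribution gives TVD 0, so the victim can do no better than a coin flip.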
Spear phishing attacks are a growing problem because they are highly targeted and effective at tricking users into revealing sensitive information or installing malware. Spear phishing emails impersonate trusted sources and use personal details of their targets to bypass filters. A famous example is the 2011 RSA attack, in which a spear phishing email delivered malware that ultimately led to the compromise of several defense contractors. To stop these advanced attacks, organizations need integrated security across email and web that uses dynamic analysis to detect zero-day exploits and block malicious files and network callbacks, while also providing threat intelligence.
Review of the machine learning methods in the classification of phishing attack - journalBEEI
Computer networks have developed rapidly, as shown by the growing number of computer users around the world who need to connect their computers to the Internet, whether for work or to access social media accounts. However, this widespread use puts the privacy of computer users at risk, especially users who have not installed security software on their computers. This gives hackers the opportunity to launch network attacks and steal confidential information such as bank or social media login credentials. One such attack is phishing. The goal of this study is to review the types of phishing attacks and the current methods used to prevent them. The literature shows that machine learning is widely used to prevent phishing attacks, and several machine learning algorithms can be applied for this purpose. This study focuses on one such algorithm and discusses the methods for implementing it in detail.
PHISHING DETECTION IN IMS USING DOMAIN ONTOLOGY AND CBA – AN INNOVATIVE RULE ... - ijistjournal
User ignorance about the risks of communication services such as instant messengers (IMs), email, websites and social networks is becoming the biggest advantage for phishers. Users need technical awareness, and a phishing detection application that generates alerts can help ensure that phishing messages are not ignored. The lack of basic security features to detect and prevent phishing has had a profound effect on IM users, who lose faith in e-banking and e-commerce transactions; this can have a disastrous impact on the corporate and banking sectors and on businesses that rely heavily on the Internet. Very few research contributions address phishing detection in instant messengers. A context-based, dynamic and intelligent phishing detection methodology for IMs is proposed to analyze and detect phishing in instant messages with reference to a domain ontology (OBIE); it uses Classification Based on Association (CBA) to generate phishing rules and alert victims. A PDS monitoring system algorithm is used to identify phishing activity during the exchange of messages in IMs, with high precision and recall. The results show improved precision and recall compared with existing methods.
Dealing with the threat of spoof and phishing mail attacks part 6#9 | Eyal ... - Eyal Doron
In the following article, we review the solutions and methods that we can use to deal with the threat of phishing mail attacks and their derivative, the spoof mail attack.
Many malicious applications spread on Facebook every day. Recently, hackers have turned to the third-party application platform to deploy malicious apps, which give them a convenient way to spread malicious content on Facebook; however, little is known about the characteristics of malicious apps and how they operate. The goal is to build a rigorous application evaluator for Facebook, arguably the first tool focused on detecting malicious apps on Facebook. To develop this evaluator, the authors use information collected by observing the posting behavior of Facebook apps seen across many Facebook users. This is possibly the first comprehensive study focused on malicious Facebook apps that quantifies and characterizes them and turns these findings into an effective detection method. To build the evaluator, the authors use data from a security application within Facebook that monitors the profiles of Facebook users.
SQL Vulnerability Prevention in Cybercrime using Dynamic Evaluation of Shell and Remote File Injection Attacks
R. Ravi,
Department of Computer Science & Engineering,
Francis Xavier Engineering College, Tamil Nadu, India
Dr. Beulah Shekhar,
Department of Criminology,
Manonmaniam Sundaranar University, Tamil Nadu, India
The document summarizes a research paper about AutoRE, a system that combines content-based and non-content-based approaches to detect spam emails generated by botnets in real-time. AutoRE first pre-processes URLs in emails to group related domains and then generates regular expressions to identify patterns. It verifies spam classifications using blacklists and behavioral analysis of email properties, sending times, and patterns. The document also discusses how AutoRE helped characterize botnets and their traffic, informing future research like systems that calculate sender reputations based on global email behavior analysis.
A LITERATURE REVIEW ON PHISHING EMAIL DETECTION USING DATA MINING - Heather Strinden
This document reviews literature on detecting phishing emails using data mining. It discusses how hybrid features that include both content and header information can be used to effectively classify emails as phishing or legitimate. Various techniques currently used for phishing email detection are examined, including network-level protections, authentication, client-side tools, user education, and server-side filters. Feature selection is important, as phishing emails often resemble legitimate emails, making detection complex. The review finds that server-side filters using machine learning classifiers on selected email features show promise as a solution.
All About Phishing Exploring User Research Through A Systematic Literature R... - Gina Rizzo
This document summarizes a systematic literature review of 51 academic papers that studied users and phishing attacks. The review found that while 367 papers on phishing were published in the ACM Digital Library from 2004 onward, only 13.9% (51 papers) included user studies using methods like interviews, surveys, or lab studies. Within these 51 papers, important methodological details such as participant numbers and characteristics were often not reported. The review also noted potential recruitment biases in some of the studies. The document provides background on common phishing techniques and discusses the importance of understanding users to prevent successful phishing attacks.
Research Report
Detecting Phishing Email Using Natural Language Processing
By Ian Harris, Tianrui Peng
University of California, Irvine
Abstract
Phishing emails are one of the most common and harmful threats that people constantly face in contemporary society. In this paper, we present our natural language processing algorithm, which detects malicious emails by analyzing sentence structures and the relationships between words. Our algorithm focuses mainly on the content of phishing emails. Many programs and research projects detect phishing emails using the title, the header and the links inside the emails. However, some sophisticated phishing emails contain no malicious links. Our algorithm can outperform other algorithms at analyzing this type of content-oriented phishing email. Moreover, it can be used to improve any existing link-oriented phishing email detection algorithm. Because it uses a natural language processing (NLP) approach, our algorithm does not need to rely on constantly updated data and can generalize to newly generated attacks. It is therefore able to detect the constantly changing phishing email attacks of contemporary society.
1. Introduction
In contemporary society, the security of private information is a major concern for everyone. Social engineering attacks are dangerous threats that use human interaction to psychologically manipulate people into exposing their confidential information.
Phishing is a type of social engineering attack that aims to obtain sensitive information by impersonating a trustworthy entity. Electronic communications, such as email or text messages, are common platforms for delivering phishing attacks. Phishing is a continual threat that constantly affects our daily lives. Attackers often obtain personal information whose misuse affects victims' personal lives, financial wellbeing, and work environment. Between May 2004 and May 2005, approximately 1.2 million computer users in the United States suffered financial losses because of phishing attacks, totaling approximately 929 million USD [1]. According to the third Microsoft Computer Safer Index Report, released in February 2014, around 5 billion USD is lost to phishing attacks annually [2].
Phishing emails are the most common type of phishing attack that people have to deal with. Attackers usually disguise themselves as popular social websites, banks, administrators from IT departments or popular shopping websites. These emails often lure users into entering personal information on a malicious website that looks similar to the legitimate one. There are different types of phishing emails, such as spear phishing, clone phishing, and whaling. As technology becomes more integrated into our culture, the damage caused by phishing emails continues to rise. This growing threat calls for us to actively seek effective solutions. The most common methods for dealing with this problem are reported blacklists, user training, public awareness, and automatic phishing detection software.
Many previous research projects have attempted to solve this problem with software classification approaches. The most common techniques are machine learning, blacklists, link analysis, and natural language processing (NLP). NLP is suitable for this problem because it can extract semantic information from the content of an email without relying on an existing website blacklist or on link analysis. A legitimate email typically attempts to present some information to its readers. A malicious email, on the other hand, usually aims to lure targeted users to malicious websites or to elicit a response. Using NLP techniques, we can analyze the content of a message to determine its motive and classify it as legitimate or malicious.
There have been various attempts to use NLP to create an algorithm capable of detecting phishing emails. The research presented here is based on an NLP algorithm proposed by Professor Ian G. Harris, Yuki Sawa, Ram Bhakta, and Christopher Hadnagy [3]. This algorithm was originally designed to identify social engineering attacks by applying NLP techniques to conversations. It examines all dialog text transmitted from the attacker to the targeted user and checks the appropriateness of each sentence. A sentence is considered malicious if it requests sensitive information or commands the user to perform an action that might expose personal information. [3] uses NLP techniques to detect questions and commands and to identify their related subjects. Once a sentence is categorized as a question or a command, its potential topics are extracted by finding verb/noun pairs that connect verbs with their direct objects. Each pair is then checked against a blacklist of malicious verb/noun pairs.
1.1 My Contribution
My contributions in this research involve systematically evaluating how well [3] generalizes to the detection of phishing emails, and improving the algorithm to perform better at identifying phishing email attacks. I wrote a program that gathered and parsed 438 malicious emails and 500 legitimate emails to evaluate [3]. Before improvement, [3]'s precision is 100% and its recall is 12%. After improving and adapting its blacklist, the new program's precision is 100% and its recall is 38%. Then, by combining this program with a link analysis program, Netcraft [14], the new program's precision is 99% and its recall increases to 73.5%.
2. Related Work
Different research groups have attempted to prevent phishing attacks through various approaches. There are broadly two types of anti-phishing schemes: the first detects phishing webpages, and the second detects phishing emails.
2.1 Work related to detecting phishing websites
There are two major approaches to detecting phishing websites. The first consists of blacklist-based anti-phishing techniques such as the Google Safe Browsing API, which allows users to check whether a URL is in blacklists that are gathered and updated by Google [4]. The second approach analyzes the content of the website. SpoofGuard, a web browser plug-in created by Stanford University, uses this approach: it weights certain malicious components found in the HTML content [5].
2.2 Work related to detecting phishing emails
2.21 Using URL analysis
Many studies analyze the structure of the links rather than the content of the emails to identify phishing emails. For example, an approach proposed by Garera identifies several differences between malicious and benign URLs [6]. Using these distinctions, the authors created a logistic regression filter to detect phishing emails. In addition, LinkGuard uses data provided by the APWG to identify common features of malicious links contained in phishing emails [7].
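To make the idea of a URL-feature logistic regression filter concrete, here is a minimal sketch. The four lexical features, their weights and the bias are hypothetical choices for illustration, not the feature set or learned parameters from Garera's work; a real filter would fit the weights on labeled URLs.

```python
import math
import re
from urllib.parse import urlparse

# Hypothetical hand-set weights; a real filter would learn them from
# labeled URLs by fitting a logistic regression model.
WEIGHTS = {"ip_host": 2.0, "at_sign": 1.5, "many_subdomains": 1.0, "login_keyword": 1.2}
BIAS = -2.5
IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def url_features(url: str) -> dict:
    """A few binary lexical features of a URL (illustrative, not Garera's exact set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    is_ip = bool(IPV4.fullmatch(host))
    return {
        "ip_host": float(is_ip),                 # raw IP address instead of a domain name
        "at_sign": float("@" in parsed.netloc),  # user@host form can disguise the real host
        "many_subdomains": float(not is_ip and host.count(".") >= 3),
        "login_keyword": float(any(k in parsed.path.lower() for k in ("login", "signin", "verify"))),
    }


def phishing_probability(url: str) -> float:
    """Logistic (sigmoid) function of the weighted feature sum."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in url_features(url).items())
    return 1.0 / (1.0 + math.exp(-z))
```

With these toy weights, `phishing_probability("http://paypal.com@192.168.0.1/login/verify")` is well above 0.5, while a plain `https://www.example.com/about` stays below it.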
2.22 Using email content
Many phishing detection algorithms are content-oriented. One example is the research of Allen Stone, which resulted in a software system called EBIDS [8]. It uses NLP technologies to analyze the plain text of emails. An email is first read in as a text file, and the input is then fed into OntoSem through the DEKADE API. The OSIM processes the text, writing its semantic and referential knowledge of the sentences in ontological terms. This step creates equivalence sets for natural language strings. For example, instead of matching on “send us your account information,” EBIDS can match on the concept of “request for personal information.” Then, based on the results of OntoSem, a string matching algorithm determines whether the email is a phishing email. The decision is based on a rule set; four rules were put in the rule set for testing: account compromise, financial opportunity, account change, and opportunity. The program matches certain words and phrases within the email content and computes the similarity between the email content and each of the four rules. If the similarity for a rule passes a threshold, meaning that the email belongs to that rule, the program decides that the email is malicious. If the email does not match any of these rules, it is classified as legitimate.
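The rule-threshold decision described above can be sketched as follows. OntoSem and the DEKADE API are not reproduced here: the keyword sets are hypothetical stand-ins for concept-level matching, and the threshold of 0.4 is an arbitrary illustrative value. Only the four rule names come from the text.

```python
import re

# The rule names come from the text; the keyword sets and threshold are
# hypothetical stand-ins for OntoSem's concept-level matching.
RULES = {
    "account compromise": {"account", "suspended", "unauthorized", "locked", "verify"},
    "financial opportunity": {"prize", "winner", "million", "transfer", "inheritance"},
    "account change": {"update", "change", "password", "confirm", "details"},
    "opportunity": {"offer", "exclusive", "limited", "free", "chance"},
}
THRESHOLD = 0.4


def rule_similarity(words: set, rule_words: set) -> float:
    """Fraction of a rule's keywords that occur in the email."""
    return len(words & rule_words) / len(rule_words)


def classify(text: str):
    """Return ("malicious", rule) for the first rule over the threshold, else ("legitimate", None)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    for rule, rule_words in RULES.items():
        if rule_similarity(words, rule_words) >= THRESHOLD:
            return "malicious", rule
    return "legitimate", None
```

Matching on surface words is exactly what EBIDS avoids by working with ontological concepts, so this sketch only mirrors the decision step, not the semantic analysis.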
Another content-oriented study also aimed to detect phishing emails through an NLP approach. The work of Aggarwal used content analysis to find four parameters that decide whether an email is malicious [9]: absence of names, mention of money, reply-inducing sentences, and a sense of urgency. The algorithm detects mention of money by keeping a set of names and symbols of all currencies in the world and their common variants, and then checking whether the email contains any of them. To detect reply-inducing sentences, it uses a set of words and phrases that ask the user to reply to the email, such as contact, get back, and reply, and it uses WordNet to add the hyponyms and synonyms of these words to the set. To find whether any of these words appear in the email content, it first uses the Stanford CoreNLP API to tag the words in the email with part-of-speech (POS) tags, and then reduces the words of the tagged file to their base form. It detects a sense of urgency by checking whether the email contains words such as now, instantly, or immediately. If a sentence contains both reply-inducing words and words conveying urgency, the email is highly likely to be malicious. After detecting these four parameters, the algorithm combines them with a formula to give a final score.
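A minimal sketch of the four parameters follows. The word lists are small illustrative samples (no WordNet expansion, no POS tagging), the name check is a crude stand-in that looks for a greeting followed by a capitalized, non-generic word, and the final score (one point per parameter plus one for a sentence that is both reply-inducing and urgent) is a hypothetical combination, since the report does not give Aggarwal's formula.

```python
import re

# Illustrative word lists only; the published system uses a full currency
# list and expands reply words with WordNet synonyms and hyponyms.
CURRENCY_TERMS = {"$", "€", "£", "usd", "dollar", "dollars", "euro", "euros", "pounds"}
REPLY_WORDS = {"reply", "contact", "respond"}
URGENCY_WORDS = {"now", "instantly", "immediately", "urgent"}
GENERIC_SALUTATIONS = {"customer", "user", "member", "client", "sir", "madam"}


def parameters(email: str) -> dict:
    """Compute the four content parameters plus a reply-and-urgency flag."""
    lower = email.lower()
    tokens = set(re.findall(r"[a-z]+|[$€£]", lower))
    sentences = [set(re.findall(r"[a-z]+", s)) for s in re.split(r"[.!?]+", lower)]
    reply = [bool(s & REPLY_WORDS) for s in sentences]
    urgent = [bool(s & URGENCY_WORDS) for s in sentences]
    # A greeting followed by a capitalized, non-generic word counts as a name.
    greetings = re.findall(r"\b(?:dear|hi|hello)\s+([A-Za-z]+)", email, re.IGNORECASE)
    named = any(g[0].isupper() and g.lower() not in GENERIC_SALUTATIONS for g in greetings)
    return {
        "absence_of_names": not named,
        "mention_of_money": bool(tokens & CURRENCY_TERMS),
        "reply_inducing": any(reply),
        "urgency": any(urgent),
        "urgent_reply_sentence": any(r and u for r, u in zip(reply, urgent)),
    }


def aggarwal_style_score(email: str) -> int:
    """Hypothetical combination: one point per flag in parameters()."""
    return sum(int(flag) for flag in parameters(email).values())
```

For example, “Dear Customer, you have won $5000. Reply now to claim your prize.” sets every flag, while a short note that greets a colleague by name and mentions no money or urgency scores 0.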
2.23 Using Both URL and email content
Some approaches use all the information contained in an email, such as the header, the links, and the text content. Verma implemented a software system called PhishNet-NLP that uses all of this information to detect phishing emails [10]. PhishNet-NLP analyzes the text of an email with NLP techniques to give it a Textscore and a Contextscore. To compute the Textscore, PhishNet-NLP uses the following NLP techniques: lexical analysis, part-of-speech tagging, named entity recognition, normalization of words to lower case, stemming, and stop word removal. First, it uses lexical analysis to split the email into sentences and each sentence into words. Then it normalizes the words to lower case, removes stop words, and stems the words. PhishNet-NLP then uses named entity recognition to find out whether the email body mentions at least one institution. If no institution is mentioned, the email receives a Textscore of 0, which stands for legitimate. PhishNet-NLP also defines a set of special verbs, SV, containing verbs that malicious emails usually use to instruct people to perform certain actions, together with the hyponyms and synonyms of these verbs, which it obtains from WordNet. PhishNet-NLP then gives each of these verbs a score using a formula that takes four factors into account: x, l, a, and L. The values of parameters x and a depend on whether a certain combination of words is found in a sentence. The parameter l depends on the number of links in the email, and the parameter L is the level of the verb, which is one more than the least number of hyponymy links followed to reach the verb from a synset. After computing scores for each verb in SV, the Textscore of an email is the maximum of all the verb scores.
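The overall shape of the Textscore computation (preprocess, gate on an institution being mentioned, then take the maximum verb score) can be sketched as below. Because the report does not give PhishNet-NLP's formula over x, l, a and L, the per-verb score is a caller-supplied function. The stop word list, institution gazetteer (standing in for named entity recognition), verb set and suffix-stripping "stemmer" are all hypothetical simplifications.

```python
import re

STOP_WORDS = {"the", "a", "an", "to", "your", "you", "and", "of", "on", "in", "please", "is", "our"}
# Hypothetical gazetteer standing in for named entity recognition.
INSTITUTIONS = {"paypal", "ebay", "bank", "amazon"}
# Hypothetical stand-in for the special verb set SV (no WordNet expansion).
SPECIAL_VERBS = {"click", "verify", "update", "confirm", "log"}


def crude_stem(word):
    """Very rough suffix stripping; a real system would use a proper stemmer."""
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def preprocess(email):
    """Split into sentences, lower-case, drop stop words, and stem each word."""
    sentences = [s for s in re.split(r"[.!?]+\s*", email) if s.strip()]
    return [[crude_stem(w) for w in re.findall(r"[a-z]+", s.lower()) if w not in STOP_WORDS]
            for s in sentences]


def textscore(email, verb_score):
    """0.0 if no institution is mentioned, else the maximum score over SV."""
    sentences = preprocess(email)
    words = {w for s in sentences for w in s}
    if not words & INSTITUTIONS:
        return 0.0
    return max(verb_score(verb, sentences) for verb in SPECIAL_VERBS)


def count_score(verb, sentences):
    """Toy verb score (hypothetical): number of sentences containing the verb."""
    return float(sum(verb in s for s in sentences))
```

For instance, `textscore("Dear PayPal user. Click the link to update your account.", count_score)` returns 1.0, whereas the same request without an institution name returns 0.0 because of the gate.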
Besides the Textscore, PhishNet-NLP also gives each email a Contextscore. It treats the email as a vector of TF-IDF values in the semantic space after applying stop word elimination and stemming. The Contextscore checks the similarity between this email and other emails the user has received before: Contextscore = 1 when a similar email is found in the inbox.
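The Contextscore idea (TF-IDF vectors compared with cosine similarity against the inbox) can be sketched in plain Python. The smoothed IDF formula and the 0.8 similarity threshold are assumptions for illustration, and stop word removal and stemming are omitted for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors with a smoothed IDF, log((1 + n) / (1 + df)) + 1."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vectors.append({t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in counts.items()})
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def contextscore(new_email: str, inbox: list[str], threshold: float = 0.8) -> int:
    """1 if some earlier email is at least `threshold` cosine-similar, else 0."""
    vectors = tfidf_vectors(inbox + [new_email])
    new_vec, old_vecs = vectors[-1], vectors[:-1]
    return 1 if any(cosine(new_vec, v) >= threshold for v in old_vecs) else 0
```

An email that repeats an earlier inbox message gets a Contextscore of 1, while one that shares only a couple of common words with the inbox falls well below the threshold and gets 0.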
After computing both the Textscore and the Contextscore, PhishNet-NLP combines these two scores into a Final-text-score. It then combines the Final-text-score, headerScore, and linkScore to decide whether an email is malicious or legitimate.
3. Introduction to the Algorithm of SEParser
This research is based on the algorithm created by [3], an approach that detects social engineering attacks by using NLP techniques to parse conversations. To detect social engineering attacks, SEParser analyzes dialogs through semantic analysis of each sentence. Figure 1 presents the structure of SEParser's detection process.
Figure 1 Outline of the detection process
In most social engineering attacks, the attacker must express one of the following types of sentence in order to lure sensitive information from users:
1. a question related to sensitive information
2. a command to perform a dangerous operation
SEParser analyzes the text conversation between the attacker and the victim and checks the appropriateness of each topic. If a sentence requests personal information or demands a dangerous action, the topic of that sentence is not appropriate. SEParser uses NLP techniques to find patterns in sentence structure. Using these patterns, SEParser parses all the sentences in the conversation and detects questions and commands among them. Once SEParser finds the questions and commands, it extracts their main topics. The main topic of a sentence is extracted by analyzing the sentence's parse tree. For example, Figure 2 presents a security policy statement.
Networking equipment must not be manipulated.
Figure 2 Security policy statement
By manually analyzing this sentence, we can identify the main topic and its related action
shown in Table 1.
For this statement, the topic of the sentence is “networking equipment”, and its related action is “manipulate”. To identify this topic and action pair, SEParser first obtains a parse tree of the sentence using the Stanford Parser [11]. The Stanford Parser is a well-developed context-free parser used by various NLP research projects. The parse tree returned by the Stanford Parser gives every word in the sentence a tag. The Stanford Parser can be configured to use various taggers; the tags shown in the figure are from the Penn Treebank tagset [12]. Figure 3 shows the parse tree of the sentence.
Figure 3 Parse tree of the first sentence
By analyzing the sentence structure, SEParser identifies the sentence as neither a command nor a question. Therefore, it does not go on to find the topic of the statement. However, for another example such as “Reset the router”, SEParser first obtains the parse tree from the Stanford Parser, shown in Figure 4.
Figure 4 Parse tree of "reset router"
SEParser identifies this sentence as a command by analyzing its parse tree and recognizing the sentence structure of a direct command. SEParser then finds the topic of the sentence, router, and its related verb, reset, so the verb/noun pair of the sentence is (reset, router). SEParser detects this verb/noun pair by finding the main verbs of the sentence and their direct objects. The verb “reset” is a typical command in malicious actions, and the noun related to this verb, “router”, is an important type of networking equipment. Therefore, this verb/noun pair would be found in the blacklist of verb/noun pairs used by SEParser. SEParser would detect this sentence as malicious and alert the victim before actual damage happens.
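The command check and pair extraction just described can be sketched on toy trees. The trees below are hand-written approximations in Penn Treebank tags, not actual Stanford Parser output or the exact trees in Figures 3 and 4. The single rule used (the clause starts with a VP headed by a base-form verb VB followed by an NP object) is a simplification; SEParser's own patterns are richer and not given here.

```python
# Toy tree format: (label, child, child, ...); a leaf is (TAG, "word").


def _subtrees(node):
    """Children of a node that are themselves labeled subtrees."""
    return [c for c in node[1:] if isinstance(c, tuple)]


def command_pair(tree):
    """Return (verb, noun) if the sentence looks like a direct command, else None."""
    clauses = [c for c in _subtrees(tree) if c[0] == "S"]
    if tree[0] != "ROOT" or not clauses:
        return None
    phrases = [c for c in _subtrees(clauses[0]) if c[0] != "."]
    if not phrases or phrases[0][0] != "VP":
        return None  # a subject NP comes first, so this is not an imperative
    vp = phrases[0]
    verbs = [c for c in _subtrees(vp) if c[0] == "VB"]
    objects = [c for c in _subtrees(vp) if c[0] == "NP"]
    if not verbs or not objects:
        return None
    nouns = [c[1] for c in _subtrees(objects[0]) if c[0].startswith("NN")]
    if not nouns:
        return None
    # The last noun of the object NP is taken as its head noun.
    return verbs[0][1].lower(), nouns[-1].lower()


RESET_THE_ROUTER = ("ROOT", ("S", ("VP", ("VB", "Reset"),
                                   ("NP", ("DT", "the"), ("NN", "router"))), (".", ".")))
POLICY = ("ROOT", ("S", ("NP", ("NN", "Networking"), ("NN", "equipment")),
                   ("VP", ("MD", "must"), ("RB", "not"),
                    ("VP", ("VB", "be"), ("VP", ("VBN", "manipulated")))), (".", ".")))
```

Here `command_pair(RESET_THE_ROUTER)` yields `("reset", "router")`, while the policy statement returns `None` because its clause begins with a subject noun phrase.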
3.1 SEParser Algorithm
Figure 5 presents the process of question/command detection and topic extraction.
Figure 5 Question/command detection and Topic detection algorithm
As shown in Figure 5, SEParser first obtains the s best parse trees from the Stanford Parser. It then uses patterns to decide the type of the sentence. If the sentence is a question or a command, SEParser extracts the verb/noun pairs of the sentence. For each verb/noun pair, SEParser uses the MatchTopic algorithm to determine the appropriateness of the pair, as shown in Figure 6.
Figure 6 MatchTopic Algorithm
The MatchTopic algorithm takes its inputs from the previous algorithm and searches the blacklist to determine whether the input verb/noun pairs are in it. For a verb/noun pair to be identified as malicious, both the verb and the noun must exactly match an entry in the blacklist.
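Exact matching maps naturally onto set membership, as in the minimal sketch below. The blacklist entries are illustrative: (reset, router) is the report's example, and the other two are guesses based on phrases the report cites from phishing emails. Storing the pairs in a set gives constant-time average lookups, which matters when thousands of pairs are checked.

```python
# Illustrative entries only: (reset, router) is the report's example, and the
# other two are guesses based on phrases the report cites from phishing emails.
BLACKLIST = {("reset", "router"), ("click", "link"), ("send", "information")}


def match_topic(pairs, blacklist=BLACKLIST):
    """Return the verb/noun pairs that exactly match a blacklist entry."""
    return [pair for pair in pairs if pair in blacklist]


def is_malicious(pairs, blacklist=BLACKLIST):
    """A sentence is flagged if any of its pairs is blacklisted."""
    return bool(match_topic(pairs, blacklist))
```

Because the match is exact, `("reset", "routers")` is not flagged; normalizing words (for example, lemmatizing) before the lookup would be a separate design choice.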
4. Applying and Improving the Algorithm for Detection of Phishing Emails
In this research, we applied algorithm [3] to detect phishing emails and improved the algorithm to gain better performance. We made three major improvements:
1. Optimizing the performance of the algorithm.
2. Adapting and extending verb/noun pair blacklist to phishing emails.
3. Combining algorithm [3] with a link analysis software, Netcraft [14].
To apply algorithm [3] to phishing email detection, we first systematically tested the performance of SEParser on detecting phishing emails without any improvement, using 500 legitimate emails and 438 malicious emails. After testing, our first improvement was optimizing SEParser so that it would run faster on a larger text corpus. By changing a nested for-loop and deleting some unused functions, we managed to triple the running speed of SEParser.
The second improvement was extending and adjusting the verb/noun pair blacklist so that it is specialized to the context of phishing emails instead of conversations. Phrases such as “click on the link” or “send this information back” appear often in phishing emails but are less common in conversations. Therefore, we first experimented with optimizing the blacklist to better identify common phishing phrases. We used information extraction techniques to find the most common verb/noun pairs in malicious and legitimate emails. Then we manually went through them and added to the blacklist the pairs that most distinguished malicious emails from legitimate ones.
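One plausible way to produce a candidate list for this manual review is to rank pairs by how much more frequent they are in phishing emails than in legitimate ones. The sketch below assumes pairs have already been extracted from each corpus. The scoring rule (difference of relative frequencies) is a hypothetical choice, not necessarily the information extraction technique used in this research.

```python
from collections import Counter


def rank_candidate_pairs(phish_pairs, legit_pairs, top_n=10):
    """Rank verb/noun pairs by how much more often they occur in phishing email.

    Relative frequencies are used because the two corpora differ in size.
    """
    phish, legit = Counter(phish_pairs), Counter(legit_pairs)
    n_phish = sum(phish.values()) or 1
    n_legit = sum(legit.values()) or 1
    scored = []
    for pair, count in phish.items():
        score = count / n_phish - legit[pair] / n_legit
        if score > 0:  # keep only pairs over-represented in phishing email
            scored.append((pair, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in scored[:top_n]]
```

The output is only a shortlist: a person still decides which pairs are genuinely malicious before adding them to the blacklist.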
The third improvement was combining SEParser with a link analysis tool, Netcraft [14]. After the second improvement, we realized that the majority of the emails could be detected by link analysis. We therefore decided that, for detecting phishing email, it was practical to combine our algorithm with a link analysis tool. After carefully examining various link analysis tools, we decided to use Netcraft [14]. Netcraft is an anti-phishing toolbar that uses link analysis to detect phishing URLs. At least three research projects have tested Netcraft and reported it as one of the most accurate anti-phishing link analysis programs. Netcraft uses several methods to detect phishing URLs. The first is using blacklists of phishing websites reported by the community, which Netcraft constantly updates. Netcraft also analyzes the potential phishing website's domain, IP address, hosting company, hosting country, latest performance, hosting history, and site technology. By analyzing all this information together, Netcraft gives each link a risk rating on a scale of 0 to 10; a lower risk rating means that the link is less likely to be malicious.
We created a tool that parses the HTML of phishing emails and extracts the links from them, using a Python library called beautifulsoup [15] to parse the HTML content of each email and extract the embedded links. Some phishing emails trick users into clicking on malicious links by showing a non-malicious link in the visible text while embedding the actual malicious link in the hyperlink attached to that text. By parsing the HTML instead of the plain text, we were able to extract the actual links from the emails. Once we had all the links, we created a program that sends each link in an HTTP request to Netcraft's website and extracts the link's risk rating from the result. We chose 5 as the threshold: if the risk rating of a link is higher than 5, the program identifies the link as malicious.
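The link extraction step can be sketched with Python's standard-library `html.parser`, which keeps the sketch dependency-free. The research itself used beautifulsoup, where `BeautifulSoup(html, "html.parser").find_all("a", href=True)` plays the same role. The sketch returns each real `href` together with its visible text, which is what exposes a harmless-looking label that hides a different destination. The Netcraft request is not shown because the report does not describe its format.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, visible_text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None if the tag has no href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

For `<a href="http://evil.example/login">https://www.paypal.com</a>`, the extracted pair shows that the real destination differs from the URL the reader sees.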
After feeding the link into Netcraft and parsing the results, the algorithm then determines
whether it is necessary to run the SEParser. If the link is not identified as malicious by Netcraft,
then the algorithm runs the improved SEParser to analyze the content.
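The two-stage decision can be summarized in a short sketch. The Netcraft lookup and the improved SEParser are passed in as functions (`risk_rating` and `content_check` are hypothetical names), because neither interface is specified in the report. Only the threshold of 5 and the ordering (links first, content only if no link is flagged) come from the text.

```python
from typing import Callable, Iterable

RISK_THRESHOLD = 5  # ratings above this count as malicious, as in the text


def is_phishing(links: Iterable[str], body: str,
                risk_rating: Callable[[str], float],
                content_check: Callable[[str], bool]) -> bool:
    """Return True if the email is judged to be phishing."""
    # Stage 1: link analysis; one sufficiently risky link decides the email.
    if any(risk_rating(url) > RISK_THRESHOLD for url in links):
        return True
    # Stage 2: only when no link was flagged, fall back to content analysis.
    return content_check(body)
```

Running the cheaper link check first means the NLP stage only handles emails whose links look clean, which is the case the content analysis was designed for.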
5. Experimental Results
5.1 Databases Used to Test the Detection of Phishing Emails
To systematically test the performance of SEParser on the detection of phishing emails, we collected two databases: a legitimate email corpus and a phishing email corpus. For the legitimate email corpus, we chose the Enron email corpus. For the malicious email corpus, we chose the phishing email corpus collected and shared by J. Nazario [13]. This phishing email corpus is also used by other research projects, such as PhishNet-NLP [10] and PhishCatch. For testing purposes, we used only 500 legitimate emails and 438 phishing emails.
We created a script called read_mbox.py to parse these emails. Most of the emails from the Enron email corpus are in mbox format. The emails collected by Nazario are also in mbox format, and their contents are sometimes HTML and sometimes plain text. Therefore, the script first checks the format of each email and parses the content accordingly.
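The code of read_mbox.py is not included in the report, so the following is only a hypothetical sketch of the same job using Python's standard `mailbox` and `email` modules. It prefers text/plain parts, falls back to text/html with the tags stripped, and decodes transfer encodings and charsets along the way.

```python
import email
import email.message
import mailbox
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Join the text nodes and collapse runs of whitespace.
    return " ".join(" ".join(collector.chunks).split())


def message_body(msg: email.message.Message) -> str:
    """Return the body text, preferring text/plain parts over text/html."""
    plain, html = [], []
    for part in msg.walk():  # yields msg itself when it is not multipart
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer-decoded
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name in the header
            text = payload.decode("utf-8", errors="replace")
        if part.get_content_type() == "text/plain":
            plain.append(text)
        elif part.get_content_type() == "text/html":
            html.append(html_to_text(text))
    return "\n".join(plain) if plain else "\n".join(html)


def read_mbox(path: str):
    """Yield the body text of every message in an mbox file."""
    for msg in mailbox.mbox(path):
        yield message_body(msg)
```

For link extraction, the raw HTML part would be kept rather than converted to text, since the `href` attributes are lost when the tags are stripped.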
5.2 Results Before Combining with Netcraft
5.21 Optimization to Improve Run Time
For our first test, we applied SEParser directly to our databases without major changes. We found that we needed to optimize the program for it to parse around 1,000 emails in a reasonable amount of time. Before the changes, we ran the program on a laptop with an Intel Core i7 processor and a 256 GB solid state drive, running Windows 10. This computer processed the 438 emails in around 3 hours, which was too long for us to test the algorithm effectively. Therefore, we made several changes to the code so that it ran 3 times faster than before. After the changes, we ran the program on the same computer, and this time it took only 55 minutes to parse the 438 emails.
5.22 Adjusting the verb/noun pair blacklist
In the first test, SEParser detected 54 of the 438 phishing emails and 0 of the 500 legitimate emails. Its true positive rate was therefore 12 percent, with a false positive rate of 0 percent. Based on equations (1) and (2), the precision of the algorithm is 100 percent and the recall is 12 percent. We were satisfied with the false positive rate, but we decided to extend the verb/noun pair blacklist to increase the true positive rate.
Equation (1): $\text{Precision} = \frac{tp}{tp + fp}$
Equation (2): $\text{Recall} = \frac{tp}{tp + fn}$
In these equations, tp is the number of true positives, fp the number of false positives, and fn the number of false negatives.
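As a worked check of equations (1) and (2), the snippet below recomputes precision and recall from the detection counts reported in this section and in Table 3. It assumes fn = 438 − tp, since every phishing email that was not detected is a false negative.

```python
def precision(tp: int, fp: int) -> float:
    """Equation (1): tp / (tp + fp)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Equation (2): tp / (tp + fn)."""
    return tp / (tp + fn) if tp + fn else 0.0


N_PHISHING = 438  # size of the phishing corpus

# (true positives, false positives) as reported in the text
runs = {
    "original SEParser": (54, 0),
    "improved blacklist": (166, 0),
    "SEParser + Netcraft": (322, 2),
}

for name, (tp, fp) in runs.items():
    fn = N_PHISHING - tp  # phishing emails that were missed
    print(f"{name}: precision={precision(tp, fp):.1%}, recall={recall(tp, fn):.1%}")
# original SEParser: precision=100.0%, recall=12.3%
# improved blacklist: precision=100.0%, recall=37.9%
# SEParser + Netcraft: precision=99.4%, recall=73.5%
```

These match the rounded figures in the text (12%, 38%, 99% and 73.5%).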
For our second test, we focused mainly on finding and adding verb/noun pairs suited to the
phishing email context. After adding these pairs, the true positive rate increased from 12 percent
to 38 percent, while the false positive rate remained 0 percent. After this change, the precision
of the algorithm is still 100 percent, and the recall is 38 percent.
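The effect of extending the blacklist can be illustrated with a set lookup over verb/noun pairs. This sketch assumes the pairs have already been extracted from each sentence's parse tree and lemmatized; the blacklist entries are hypothetical examples, not the pairs added in this work, and the functions are not SEParser's actual code.

```python
from typing import Iterable

Pair = tuple[str, str]  # (verb, noun), e.g. ("verify", "account")

# Hypothetical entries for illustration only.
BASE_BLACKLIST: set[Pair] = {("give", "password"), ("send", "money")}
PHISHING_PAIRS: set[Pair] = {("verify", "account"), ("update", "information"), ("click", "link")}
EXTENDED_BLACKLIST: set[Pair] = BASE_BLACKLIST | PHISHING_PAIRS


def matched_pairs(pairs: Iterable[Pair], blacklist: set[Pair]) -> set[Pair]:
    """Return the email's (verb, noun) pairs that appear in the blacklist.

    Matching is case-insensitive; the pairs are assumed to be lemmatized already
    (for example "clicking" -> "click").
    """
    return {(verb.lower(), noun.lower()) for verb, noun in pairs} & blacklist


def is_phishing(pairs: Iterable[Pair], blacklist: set[Pair]) -> bool:
    """Flag the email if at least one of its pairs is blacklisted."""
    return bool(matched_pairs(pairs, blacklist))


# Pairs a parser might extract from "Please verify your account by clicking the link below."
email_pairs = [("verify", "account"), ("click", "link")]
print(is_phishing(email_pairs, BASE_BLACKLIST))      # False
print(is_phishing(email_pairs, EXTENDED_BLACKLIST))  # True
```

A set keeps each lookup constant-time, so adding phishing-specific pairs widens coverage without slowing the per-email check noticeably.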
5.3 Results After Combining with Netcraft
By using Netcraft to analyze the links in the phishing emails, we detected 255 of the
438 malicious emails. We then ran SEParser on the remaining 183 malicious emails, and it
detected 73. By combining SEParser with Netcraft, we detected 322 of the 438 phishing emails, so the
true positive rate of our program is 73.5 percent. We then ran our program on the legitimate email
corpus, and it flagged 2 of the 500 legitimate emails, so our false positive rate is 0.4 percent.
After combining with Netcraft, the precision of the algorithm is 99 percent, and the recall is 73.5
percent.
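The combination described above, link analysis first and SEParser only on the emails it does not flag, can be sketched as a two-stage detector. Netcraft and SEParser are represented here by hypothetical toy functions (`link_check`, `content_check`); their real interfaces are not modeled.

```python
from typing import Callable, Iterable

Detector = Callable[[str], bool]  # takes an email's text, returns True if flagged


def combine(link_check: Detector, content_check: Detector) -> Detector:
    """Two-stage detector: link analysis first, content analysis on the rest.

    Because `or` short-circuits, content_check runs only on emails that
    link_check did not flag, mirroring the order described in the text.
    """
    def detect(email: str) -> bool:
        return link_check(email) or content_check(email)
    return detect


def count_flagged(detect: Detector, emails: Iterable[str]) -> int:
    return sum(1 for email in emails if detect(email))


# Toy stand-ins (hypothetical): a host blocklist for links, a phrase match for content.
BAD_HOSTS = {"login-verify.example"}


def link_check(email: str) -> bool:
    return any(host in email for host in BAD_HOSTS)


def content_check(email: str) -> bool:
    return "verify your account" in email.lower()


detect = combine(link_check, content_check)
phishing = [
    "Sign in at http://login-verify.example now",  # caught by the link stage
    "Please verify your account today",            # no link; caught by the content stage
    "Your invoice is attached",                    # missed by both stages
]
print(count_flagged(detect, phishing))  # 2
```

The second email shows why the combination helps: it contains no link for Netcraft to analyze, so only the content stage can catch it.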
Table 3 below presents the performance of the different improvements. From this table, we can
see that the recall of the program roughly doubled, from 38 percent with SEParser alone to 73.5
percent with SEParser and Netcraft combined. For all of these tests, we used 438 malicious emails
and 500 legitimate emails.
Metric            SEParser   Netcraft   SEParser+Netcraft
True positives    166        255        322
False positives   0          2          2
Precision         100%       99%        99%
Recall            38%        58%        73.5%
Table 3: Comparison between different results
6. Conclusion
In this paper, we have tested and improved SEParser for detecting phishing emails. By testing
and adjusting SEParser, we have shown that its algorithm, which analyzes the parse tree and
grammatical structure of sentences, can be generalized from analyzing dialogs of social
engineering attacks to detecting phishing emails. Combined with link analysis software, the
improved SEParser detects most of the malicious emails with a false positive rate below one
percent. To further improve performance in future work, we believe that certain patterns can be
found in most phishing emails. By analyzing the parse trees of sentences, a program should be able
to identify these patterns in the content of an email. By analyzing the content semantically, a
future program should be able to detect more phishing emails without causing more false
positives.
7. References
1. Kerstein, Paul (July 19, 2005). "How Can We Stop Phishing and Pharming Scams?". CSO.
Archived from the original on March 24, 2008.
2. "20% Indians are victims of Online phishing attacks: Microsoft". IANS. news.biharprabha.com.
Retrieved February 11, 2014.
3. Y. Sawa, H. R. Bhakta, I. G. Harris, "Detection of Social Engineering Attacks Through Natural
Language Processing of Conversations", IEEE Conference on Semantic Computing, February 2016.
4. Google, "Google safe browsing API," http://code.google.com/apis/safebrowsing/, accessed Oct
2011.
5. N. Chou, R. Ledesma, Y. Teraguchi, and J. C. Mitchell, "Client-side defense against web-based
identity theft," in NDSS. The Internet Society, 2004.
6. Garera, S., Provos, N., Chew, M., Rubin, A.: A framework for detection and measurement of
phishing attacks. In: Proc. 2007 ACM Workshop on Recurring Malcode, pp. 1–8 (2007)
7. Chen, J., Guo, C.: Online detection and prevention of phishing attacks. In: First Int'l Conf. on
Communications and Networking in China, ChinaCom 2006, pp. 1–7. IEEE (2006)
8. A. Stone, "Natural-Language Processing for Intrusion Detection," in Computer, vol. 40, no. 12,
pp. 103–105, Dec. 2007. doi: 10.1109/MC.2007.437
9. Shivam Aggarwal, Vishal Kumar, and S. D. Sudarsan. Identification and detection of phishing
emails using natural language processing techniques. In Proceedings of the 7th International
Conference on Security of Information and Networks, SIN '14, pages 217:217–217:222. ACM,
2014.
10. Verma et al.: Detecting Phishing Emails the Natural Language Way. In: ESORICS 2012, LNCS
7459, pp. 824–841, 2012.
11. Dan Klein and Christopher D. Manning. Accurate unlexicalized parsing. In Proceedings of the
41st Annual Meeting on Association for Computational Linguistics – Volume 1, 2003.
12. Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated
corpus of English: The Penn Treebank. Comput. Linguist., 19(2), June 1993.
13. Nazario, J.: The online phishing corpus (2016)
14. 3Sharp, 3Sharp Study finds Internet Explorer 7 Edges Out Netcraft As Most Accurate for
Anti-Phishing Protection. 2006. http://www.3sharp.com/projects/antiphishing/
15. BeautifulSoup Python Library: https://www.crummy.com/software/BeautifulSoup/bs4/doc/ (2016)