A REPORT
on
DETECTION OF PHISHING
WEBSITE USING MACHINE
LEARNING
NAME MANIKANTAN ARCOT
REG NO RA1711003040038
CLASS CSE – “C” 3RD YEAR
CHAPTER 1
INTRODUCTION
A social engineering attack is a common security threat that tricks users into revealing
private and confidential information without the attack being detected. Its main purpose
is to obtain sensitive information such as usernames, passwords and account numbers.
Phishing, or web spoofing, is one example of a social engineering attack. Phishing attacks
can appear in many forms of communication, such as messaging, SMS, VoIP and fraudulent
emails. Users commonly hold accounts on many websites, including social networks, email
services and online banking. Unsuspecting web users are therefore the most vulnerable
targets of this attack, since most people are unaware of how valuable their information
is, which helps make the attack successful.
A phishing attack typically uses social engineering to lure the victim with a spoofed
link that redirects to a fake web page. The spoofed link is placed on popular web pages or
sent to the victim by email. The fake web page is made to look like the legitimate one, so
rather than reaching the real web server, the victim's request goes to the attacker's
server. Current solutions such as antivirus software, firewalls and dedicated tools do not
fully prevent web spoofing attacks. Secure Sockets Layer (SSL) and digital certificates
issued by a certificate authority (CA) do not fully protect web users against such attacks
either: in a web spoofing attack the attacker diverts the request to a fake web server,
and certain types of certificate can be forged so that everything appears legitimate. A
secure browsing connection does virtually nothing to protect users, especially from
attackers who know how “secure” connections actually work. This report develops an
anti-web-spoofing solution based on inspecting the URLs of fake web pages. The solution
applies a series of steps to check the characteristics of a website's Uniform Resource
Locator (URL).
CHAPTER 2
ABOUT PROJECT
This section describes the proposed model for phishing attack detection. The model
identifies phishing attacks by checking features of phishing websites, a blacklist and the
WHOIS database. A few selected features can be used to differentiate between legitimate
and spoofed web pages. Candidate features include URLs, domain identity, security and
encryption, source code, page style and contents, the web address bar and the social human
factor. This study focuses only on URL and domain name features. These are checked
against several criteria: use of an IP address, a long URL, an added prefix or suffix,
redirection using the symbol “//”, and the presence of the symbol “@”. The features are
inspected using a set of rules in order to distinguish the URLs of phishing web pages from
the URLs of legitimate websites.
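The URL criteria listed above could be expressed as a small set of rules along the following lines. This is a minimal sketch, not the report's actual implementation: the function names, the length cut-off of 54 characters, and the use of a hyphen in the host name as the prefix/suffix signal are all assumptions made for illustration.

```python
import ipaddress
from urllib.parse import urlparse

# Assumed cut-off for a "long" URL; the report does not state a value.
LONG_URL_THRESHOLD = 54


def has_ip_address(url):
    """Return True when the host part of the URL is a literal IP address."""
    host = urlparse(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url):
    """Apply the five URL rules and return one boolean flag per rule."""
    host = urlparse(url).hostname or ""
    return {
        "ip_address": has_ip_address(url),
        "long_url": len(url) > LONG_URL_THRESHOLD,
        # Assumption: a prefix or suffix is joined to the domain with "-".
        "prefix_suffix": "-" in host,
        # A "//" beyond the scheme separator suggests an embedded redirect.
        "double_slash_redirect": url.rfind("//") > 7,
        "at_symbol": "@" in url,
    }
```

Each flag can then be converted to 0 or 1 to form the feature vector for one URL.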
WORKING
First, the data set is created from information collected from various sources. The data
set is then fed to the K-means clustering algorithm, and the model is trained on it. A web
application is developed: the front-end GUI is created using HTML, CSS and simple
JavaScript code, and the trained model acts as the back-end server.
When a URL is fed to the model, the model analyses it and produces the corresponding
output. Once the machine learning model has analysed the given URL, it sends a message to
the front-end portal stating whether the site is legitimate or a phishing site.
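The exchange between the back end and the front-end portal might look like the sketch below. It only illustrates the message flow described above: the function name `check_url`, the JSON field names and the `predict` callable standing in for the trained model are hypothetical and do not come from the report.

```python
import json


def check_url(url, predict):
    """Build the message sent back to the front end for one URL.

    `predict` is a hypothetical stand-in for the trained model: any callable
    that returns True when the URL is judged to be phishing.
    """
    verdict = "phishing" if predict(url) else "legitimate"
    return json.dumps({"url": url, "verdict": verdict})
```

For example, `check_url("http://user@evil.test/", lambda u: "@" in u)` would return a JSON string whose `verdict` field is `"phishing"`.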
CHAPTER 3
TOOLS AND TECHNOLOGY
3.1 PYTHON
In technical terms, Python is an object-oriented, high-level programming language with
integrated dynamic semantics, used widely for web and application development. It is
attractive for Rapid Application Development because it offers dynamic typing and dynamic
binding.
Python is relatively simple and easy to learn, since its syntax focuses on readability.
Developers can read and translate Python code much more easily than code in many other
languages. In turn, this reduces the cost of program maintenance and development
because it allows teams to work collaboratively without significant language and
experience barriers.
Additionally, Python supports the use of modules and packages, which means that
programs can be designed in a modular style and code can be reused across a variety of
projects. Once you've developed a module or package you need, it can be scaled for use
in other projects, and it's easy to import or export these modules.
One of the most promising benefits of Python is that both the standard library and the
interpreter are available free of charge, in both binary and source form. There is no
exclusivity either, as Python and all the necessary tools are available on all major
platforms. Therefore, it is an enticing option for developers who don't want to worry
about paying high development costs.
That makes Python accessible to almost anyone. If you have the time to learn, you can
create some amazing things with the language.
Python is a general-purpose programming language, which is another way to say that it
can be used for nearly everything. It is also an interpreted language, which means that
the written code is translated into a machine-executable form at run time, whereas
compiled languages perform this conversion before the program is run. This type of
language is also referred to as a "scripting language" because such languages were
initially meant to be used for small projects.
The concept of a "scripting language" has changed considerably since its inception:
Python is now used to write large, commercial applications, not just small ones. Reliance
on Python has grown further as the internet has gained popularity. Many web applications
and platforms rely on Python; commonly cited examples include Google's search engine,
YouTube, and the web-oriented transaction system of the New York Stock Exchange (NYSE).
A language that powers a stock exchange system is clearly suited to serious work.
Python can also be used to process text, display numbers or images, solve scientific
equations, and save data. In short, it is used behind the scenes to process a lot of elements
you might need or encounter on your device(s) - mobile included.
BENEFITS:
1) Python can be used to develop prototypes quickly, because it is easy to work with and
read.
2) Many automation, data mining, and big data platforms rely on Python.
3) Python allows for a more productive coding environment than more verbose languages
such as C# and Java. Experienced coders tend to stay more organized and productive
when working with Python.
4) Python is easy to read, even if you're not a skilled programmer. Anyone can begin
working with the language; all it takes is a bit of patience and a lot of practice. This
also makes it a good candidate for use among multi-programmer and large
development teams.
5) Python powers Django, a complete, open source web application framework.
Frameworks of this kind, like Ruby on Rails for Ruby, can be used to simplify the
development process.
6) It has a massive support base because it is open source and community
developed. Millions of like-minded developers work with the language on a daily
basis and continue to improve core functionality. The latest version of Python
continues to receive enhancements and updates as time progresses. The community is also
a great way to network with other developers.
3.2 K MEANS CLUSTERING ALGORITHM
Clustering is one of the most common exploratory data analysis techniques used to get an
intuition about the structure of the data. It can be defined as the task of identifying
subgroups in the data such that data points in the same subgroup (cluster) are very similar
while data points in different clusters are very different. In other words, we try to find
homogeneous subgroups within the data such that data points in each cluster are as
similar as possible according to a similarity measure such as Euclidean-based distance or
correlation-based distance. The decision of which similarity measure to use is
application-specific.
Clustering analysis can be done on the basis of features, where we try to find subgroups of
samples based on features, or on the basis of samples, where we try to find subgroups of
features based on samples. This section covers clustering based on features. Clustering is
used in market segmentation, where we try to find customers that are similar to each
other in terms of behaviors or attributes; in image segmentation and compression, where
we try to group similar regions together; in document clustering based on topics; and so on.
Unlike supervised learning, clustering is considered an unsupervised learning method
since we don’t have the ground truth to compare the output of the clustering algorithm to
the true labels to evaluate its performance. We only want to try to investigate the structure
of the data by grouping the data points into distinct subgroups.
This report covers only K-means, which is considered one of the most widely used
clustering algorithms due to its simplicity.
The K-means algorithm is an iterative algorithm that tries to partition the dataset into
K predefined, distinct, non-overlapping subgroups (clusters), where each data point belongs
to only one group. It tries to make the intra-cluster data points as similar as possible
while also keeping the clusters as different (far apart) as possible. It assigns data
points to a cluster such that the sum of the squared distances between the data points and
the cluster’s centroid (the arithmetic mean of all the data points that belong to that
cluster) is at the minimum. The less variation we have within clusters, the more
homogeneous (similar) the data points are within the same cluster.
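The quantity being minimised can be written compactly as follows. The notation is assumed here for illustration, since the report does not define symbols: $C_k$ is the set of data points assigned to cluster $k$, $\mu_k$ is that cluster's centroid, and $K$ is the number of clusters.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```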
The K-means algorithm works as follows:
• Specify the number of clusters K.
• Initialize the centroids by first shuffling the dataset and then randomly selecting K data
points for the centroids without replacement.
• Keep iterating the following steps until there is no change to the centroids, i.e. the
assignment of data points to clusters no longer changes:
   • Compute the sum of the squared distances between data points and all centroids.
   • Assign each data point to the closest cluster (centroid).
   • Compute the centroids for the clusters by taking the average of all data points
   that belong to each cluster.
The approach K-means follows to solve the problem is called Expectation-Maximization.
The E-step is assigning the data points to the closest cluster. The M-step is
computing the centroid of each cluster.
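The steps above can be sketched in plain Python without any third-party library. This is an illustrative implementation under stated assumptions, not the code used in the project: the function names, the `max_iter` safety limit and the fixed `seed` are choices made here for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` into `k` groups; return (centroids, assignments)."""
    rng = random.Random(seed)
    # Initialise centroids with K distinct data points (no replacement).
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iter):
        # E-step: assign every point to its closest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Stop when the assignment of points to clusters no longer changes.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # M-step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments
```

Calling `kmeans(points, 2)` on two well-separated groups of points should place each group in its own cluster; which group receives index 0 or 1 depends on the random initialisation.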
CHAPTER 4
MODULES
4.1 SKLEARN
Scikit-learn provides a range of supervised and unsupervised learning algorithms via a
consistent interface in Python.
It is licensed under a permissive simplified BSD license and is distributed with many
Linux distributions, encouraging academic and commercial use.
The library is built upon SciPy (Scientific Python), which must be installed before you
can use scikit-learn. This stack includes:
• NumPy: Base n-dimensional array package
• SciPy: Fundamental library for scientific computing
• Matplotlib: Comprehensive 2D/3D plotting
• IPython: Enhanced interactive console
• Sympy: Symbolic mathematics
• Pandas: Data structures and analysis
Extensions or modules for SciPy are conventionally named SciKits. As such, the module
provides learning algorithms and is named scikit-learn.
The vision for the library is a level of robustness and support required for use in
production systems. This means a deep focus on concerns such as ease of use, code
quality, collaboration, documentation and performance.
Although the interface is Python, C libraries are leveraged for performance, such as NumPy
for array and matrix operations, LAPACK, LibSVM and the careful use of Cython.
The library is focused on modeling data. It is not focused on loading, manipulating and
summarizing data. For these features, refer to NumPy and Pandas.
FIGURE 1.1 CLUSTER ANALYSIS
Some popular groups of models provided by scikit-learn include:
• Clustering: for grouping unlabeled data such as KMeans.
• Cross Validation: for estimating the performance of supervised models on unseen
data.
• Datasets: for test datasets and for generating datasets with specific properties for
investigating model behavior.
• Dimensionality Reduction: for reducing the number of attributes in data for
summarization, visualization and feature selection such as Principal component
analysis.
• Ensemble methods: for combining the predictions of multiple supervised models.
• Feature extraction: for defining attributes in image and text data.
• Feature selection: for identifying meaningful attributes from which to create
supervised models.
• Parameter Tuning: for getting the most out of supervised models.
• Manifold Learning: For summarizing and depicting complex multi-dimensional
data.
• Supervised Models: a vast array, including but not limited to generalized linear models,
discriminant analysis, naive Bayes, lazy methods, neural networks, support vector
machines and decision trees.
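As an illustration of the consistent interface, the clustering group from the list above can be used as follows. This is a minimal sketch assuming scikit-learn and NumPy are installed; the six feature vectors are made-up values for illustration (one 0/1 flag per URL rule) and are not the project's data set.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical feature vectors, one row per URL:
# [has_ip, long_url, has_dash, has_redirect, has_at]
X = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
], dtype=float)

# Fit K-means with two clusters; random_state makes the run repeatable.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(X)

labels = model.labels_            # cluster index assigned to each training row
centers = model.cluster_centers_  # one centroid per cluster

# Assign a previously unseen feature vector to the nearest centroid.
new_label = model.predict(np.array([[1, 1, 1, 1, 0]], dtype=float))[0]
```

Because K-means is unsupervised, the cluster indices carry no meaning on their own; deciding which cluster corresponds to phishing URLs requires inspecting the centroids or a few known examples.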
4.2 NUMPY
NumPy is a module for Python. The name is an acronym for "Numeric Python" or
"Numerical Python", and it is pronounced NUM-py. It is an extension module for Python,
mostly written in C, so its precompiled mathematical and numerical functions execute at
high speed.
Furthermore, NumPy enriches the programming language Python with powerful data
structures, implementing multi-dimensional arrays and matrices. These data structures
support efficient calculations with matrices and arrays. The implementation even
targets huge matrices and arrays, better known under the heading of "big data". Besides
that, the module supplies a large library of high-level mathematical functions to operate
on these matrices and arrays.
SciPy (Scientific Python) is often mentioned in the same breath as NumPy. SciPy needs
NumPy, as it is based on NumPy's data structures and its basic creation
and manipulation functions. It extends the capabilities of NumPy with further useful
functions for minimization, regression, Fourier transformation and many others.
Neither NumPy nor SciPy is part of a basic Python installation. They have to be
installed after Python itself, and NumPy has to be installed before SciPy.
FIGURE 1.2 MATRIX VISUALISATION
(Comment: The figure is the graphical visualisation of a matrix with 14 rows and 20
columns. It is a so-called Hinton diagram. The size of a square within this diagram
corresponds to the size of the value in the depicted matrix. The colour indicates whether
the value is positive or negative. In our example, the colour red denotes
negative values and the colour green denotes positive values.)
NumPy is based on two earlier Python modules dealing with arrays. One of these is
Numeric. Like NumPy, Numeric is a Python module for high-performance numeric
computing, but it is obsolete nowadays. Another predecessor of NumPy is Numarray,
which is a complete rewrite of Numeric but is deprecated as well. NumPy is a merger of
those two, i.e. it is built on the code of Numeric and the features of Numarray.
When we say "Core Python", we mean Python without any special modules, i.e.
especially without NumPy.
The advantages of Core Python:
• high-level number objects: integers, floating point
• containers: lists with cheap insertion and append methods, dictionaries with fast
lookup
Advantages of using Numpy with Python:
• array oriented computing
• efficiently implemented multi-dimensional arrays
• designed for scientific computation
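The difference between Core Python containers and array-oriented computing can be seen in a short comparison. This is a minimal sketch assuming NumPy is installed; the URL lengths, the cut-off of 54 and the small feature matrix are made-up values for illustration.

```python
import numpy as np

# Core Python: a list and an explicit loop (here a list comprehension).
lengths = [54, 75, 23, 120]
flags_list = [1 if n > 54 else 0 for n in lengths]

# NumPy: the same comparison applied to the whole array at once.
arr = np.array(lengths)
flags_arr = (arr > 54).astype(int)

# Multi-dimensional arrays: one row per URL, one column per feature.
features = np.array([[54, 0], [75, 1], [23, 0], [120, 1]], dtype=float)
column_means = features.mean(axis=0)  # mean of each column
```

Both approaches yield the same flags; the array version avoids the explicit loop and extends naturally to matrices such as `features`.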
4.3 WHOIS
The life of a phishing site is very short; therefore, its DNS information may not be
available after some time. If no DNS record is available anywhere, the website is
treated as phishing. If the domain name of the suspicious web page does not match the WHOIS
database record, the web page is also considered phishing.
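The two rules above amount to a simple decision procedure, sketched below. To keep the example self-contained and free of network access, the DNS and WHOIS results are passed in as arguments; the function name, its parameters, the "not flagged" return value and the treatment of a missing WHOIS record as a mismatch are assumptions made for illustration.

```python
def classify_by_records(domain, dns_resolves, whois_domain):
    """Apply the DNS and WHOIS rules to one domain.

    dns_resolves: True when a DNS record was found for the domain.
    whois_domain: domain name stored in the WHOIS record, or None when
                  no record was found (assumed here to count as a mismatch).
    """
    # Rule 1: no DNS record anywhere means the site is treated as phishing.
    if not dns_resolves:
        return "phishing"
    # Rule 2: the page's domain must match the WHOIS database record.
    if whois_domain is None or whois_domain.lower() != domain.lower():
        return "phishing"
    # Neither rule fired; other checks would decide the final verdict.
    return "not flagged"
```

In a real deployment the two inputs would come from an actual DNS query and a WHOIS lookup.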
FIGURE 1.3 WHOIS MODULE
CHAPTER 5
SCREENSHOTS
FIG 2.1 OUTPUT - PHISHING URL
FIG 2.2 OUTPUT – ILLEGITIMATE URL
FIG 2.3 SERVER RUNNING
FIG 2.4 DATASETS
CHAPTER 6
CONCLUSION
The most important way to protect users from phishing attacks is education and
awareness. Internet users must be aware of the security tips given by experts.
Every user should also be trained not to blindly follow links to websites where they
have to enter sensitive information. It is essential to check the URL before entering
the website. In future, the system can be upgraded to detect web pages automatically and
to improve the compatibility of the application with web browsers. Additional work can
also be done by adding other characteristics for distinguishing fake web pages from
legitimate ones. The PhishChecker application can also be upgraded into a mobile web
application for detecting phishing on the mobile platform.
Many features of this work can be improved to address other issues. The
heuristics can be further developed to detect phishing attacks in the presence of embedded
objects such as Flash. Identity extraction is an important operation, and it was improved
with an Optical Character Recognition (OCR) system to extract text and images. More
effective inference rules for identifying a given suspicious web page, and strategies for
discovering whether it is a phishing target, should be designed in order to further
improve the overall performance of this system. Moreover, it remains an open challenge to
develop a robust malware detection method that retains its accuracy for future phishing
emails. In addition, dynamic and static features complement each other, and therefore both
are considered important in achieving high accuracy.

Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 

Recently uploaded (20)

Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
CĂłdigo Creativo y Arte de Software | Unidad 1
CĂłdigo Creativo y Arte de Software | Unidad 1CĂłdigo Creativo y Arte de Software | Unidad 1
CĂłdigo Creativo y Arte de Software | Unidad 1
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 

A REPORT On DETECTION OF PHISHING WEBSITE USING MACHINE LEARNING

  • 1. A REPORT on DETECTION OF PHISHING WEBSITE USING MACHINE LEARNING NAME MANIKANTAN ARCOT REG NO RA1711003040038 CLASS CSE - “C” 3RD YEAR
  • 2. CHAPTER 1 INTRODUCTION A social engineering attack is a common security threat that reveals private and confidential information by tricking users without being detected. The main purpose of such an attack is to obtain sensitive information such as usernames, passwords and account numbers. Phishing, or web spoofing, is one example of a social engineering attack. Phishing attacks may appear in many forms of communication, such as messaging, SMS, VoIP and fraudulent emails. Users commonly hold many accounts on various websites, including social networks, email and banking. Innocent web users are therefore the most vulnerable targets, since most people are unaware of how valuable their information is, which helps make the attack successful. Typically, a phishing attack uses social engineering to lure the victim with a spoofed link that redirects to a fake web page. The spoofed link is placed on popular web pages or sent to the victim by email. The fake web page is made to look like the legitimate one, so the victim's request goes to the attacker's server rather than to the real web server. Current solutions such as antivirus software, firewalls and dedicated tools do not fully prevent web spoofing attacks. Secure Sockets Layer (SSL) and digital certificates issued by a certificate authority (CA) also do not protect the web user against such an attack. In a web spoofing attack, the attacker diverts the request to a fake web server, and certain types of SSL certificates can be forged so that everything appears legitimate. A secure browsing connection therefore does virtually nothing to protect users, especially from attackers who know how “secure” connections actually work. This paper develops an anti-web-spoofing solution based on inspecting the URLs of fake web pages. The solution applies a series of steps to check the characteristics of websites' Uniform Resource Locators (URLs).
  • 3. CHAPTER 2 ABOUT PROJECT This section describes the proposed model for phishing attack detection. The model identifies phishing attacks by checking phishing website features, a blacklist and the WHOIS database. A few selected features can be used to differentiate between legitimate and spoofed web pages. Candidate features include URLs, domain identity, security and encryption, source code, page style and contents, the web address bar and the social human factor. This study focuses only on URL and domain name features. These are checked using several criteria: use of an IP address, a long URL, a prefix or suffix added to the domain, redirection using the symbol “//”, and the presence of the symbol “@” in the URL. The features are inspected using a set of rules in order to distinguish the URLs of phishing web pages from the URLs of legitimate websites. WORKING First, the data set is created using information collected from various sources. Once the data set is created, it is fed to the K Means clustering algorithm and the model is trained on it. A web application is developed: a front-end GUI is created using HTML, CSS and simple JavaScript code, and the trained model acts as the back-end server. When a URL is submitted, the model analyses it and sends a message to the front-end portal stating whether it is a legitimate site or a phishing site.
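The five URL and domain checks listed above can be sketched in Python as follows. This is a minimal illustration and not the report's actual code: the function names, the length threshold of 54 characters, the use of a hyphen to detect an added prefix or suffix, and the "any rule fires" decision are all assumptions made for the example.

```python
import re
from urllib.parse import urlparse

# Assumed threshold for a "long" URL; the report does not state a value.
LONG_URL_THRESHOLD = 54

# Matches a host written as a dotted IPv4 address, e.g. 192.168.0.1.
IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_url_features(url: str) -> dict:
    """Return the five URL/domain checks named in the text as booleans."""
    host = urlparse(url).hostname or ""
    # In an ordinary URL the only "//" follows the scheme (index 5 or 6),
    # so a later occurrence suggests an embedded redirect.
    last_double_slash = url.rfind("//")
    return {
        "has_ip_address": bool(IPV4_PATTERN.match(host)),
        "is_long_url": len(url) >= LONG_URL_THRESHOLD,
        # Assumption: a hyphen in the host marks an added prefix or suffix.
        "has_prefix_suffix": "-" in host,
        "has_redirect_double_slash": last_double_slash > 7,
        "has_at_symbol": "@" in url,
    }


def looks_like_phishing(url: str) -> bool:
    """Hypothetical rule: flag the URL if any single check fires."""
    return any(extract_url_features(url).values())
```

For example, `extract_url_features("http://192.168.0.1/login")` sets `has_ip_address`, while a short plain URL such as `https://example.com/` triggers none of the checks. In the workflow described above, these booleans would be the kind of feature vector fed to the clustering model.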
  • 4. CHAPTER 3 TOOLS AND TECHNOLOGY 4.1 PYTHON In technical terms, Python is an object-oriented, high-level programming language with dynamic semantics, used widely for web and application development. It is attractive for Rapid Application Development because it offers dynamic typing and dynamic binding. Python is relatively simple and easy to learn, since its syntax focuses on readability. Developers can read and understand Python code more easily than code in many other languages. In turn, this reduces the cost of program maintenance and development because it allows teams to work collaboratively without significant language and experience barriers. Additionally, Python supports modules and packages, which means that programs can be designed in a modular style and code can be reused across a variety of projects. Once you have developed a module or package, it can be reused in other projects, and it is easy to import these modules. One of the main benefits of Python is that both the standard library and the interpreter are available free of charge, in both binary and source form. There is no exclusivity either, as Python and all the necessary tools are available on all major platforms. It is therefore an appealing option for developers who do not want to pay high development costs, and it makes Python accessible to almost anyone. If you have the time to learn, you can create some amazing things with the language.
  • 5. Python is a general-purpose programming language, which is another way of saying that it can be used for nearly everything. Most importantly, it is an interpreted language, which means that the written code is translated into a machine-executable form at runtime, whereas compiled languages perform this conversion before the program is run. This type of language is also referred to as a "scripting language" because such languages were initially meant for small projects. The concept of a "scripting language" has changed considerably since its inception, because Python is now used to write large, commercial applications and not just trivial ones. Reliance on Python has grown further as the internet gained popularity. Many web applications and platforms are reported to use Python, including Google's search engine, YouTube, and the web-oriented transaction system of the New York Stock Exchange (NYSE). Python can also be used to process text, display numbers or images, solve scientific equations, and save data. In short, it is used behind the scenes to process many of the elements you encounter on your devices, mobile included. BENEFITS: 1) Python can be used to develop prototypes quickly because it is easy to work with and read. 2) Many automation, data mining, and big data platforms rely on Python. 3) Python allows for a more productive coding environment than more verbose languages like C# and Java. Experienced coders tend to stay more organized and productive when working with Python.
  • 6. 4) Python is easy to read, even if you are not a skilled programmer. Anyone can begin working with the language; all it takes is a bit of patience and a lot of practice. This also makes it a good candidate for use in multi-programmer and large development teams. 5) Python powers Django, a complete and open source web application framework. Frameworks of this kind, like Ruby on Rails for Ruby, can be used to simplify the development process. 6) It has a massive support base thanks to the fact that it is open source and community developed. Millions of like-minded developers work with the language on a daily basis and continue to improve core functionality. The latest version of Python continues to receive enhancements and updates, and the community is a good way to network with other developers. 4. K MEANS CLUSTERING ALGORITHM Clustering is one of the most common exploratory data analysis techniques used to get an intuition about the structure of the data. It can be defined as the task of identifying subgroups in the data such that data points in the same subgroup (cluster) are very similar while data points in different clusters are very different. In other words, we try to find homogeneous subgroups within the data such that data points in each cluster are as similar as possible according to a similarity measure such as Euclidean-based distance or correlation-based distance. The decision of which similarity measure to use is application-specific. Clustering analysis can be done on the basis of features, where we try to find subgroups of samples based on features, or on the basis of samples, where we try to find subgroups of features based on samples. We cover here clustering based on features. Clustering is used in market segmentation, where we try to find customers that are similar to each
  • 7. other whether in terms of behaviors or attributes; image segmentation/compression, where we try to group similar regions together; document clustering based on topics; etc. Unlike supervised learning, clustering is considered an unsupervised learning method, since we do not have ground-truth labels against which to compare the output of the clustering algorithm and evaluate its performance. We only want to investigate the structure of the data by grouping the data points into distinct subgroups. Here we cover only K-means, which is considered one of the most widely used clustering algorithms due to its simplicity. The K-means algorithm is an iterative algorithm that tries to partition the dataset into K predefined, distinct, non-overlapping subgroups (clusters), where each data point belongs to only one group. It tries to make the intra-cluster data points as similar as possible while also keeping the clusters as different (far apart) as possible. It assigns data points to a cluster such that the sum of the squared distances between the data points and the cluster's centroid (the arithmetic mean of all the data points that belong to that cluster) is at a minimum. The less variation we have within clusters, the more homogeneous (similar) the data points are within the same cluster. The K-means algorithm works as follows: • Specify the number of clusters K. • Initialize the centroids by first shuffling the dataset and then randomly selecting K data points for the centroids without replacement. • Keep iterating the following steps until there is no change to the centroids, i.e. the assignment of data points to clusters is not changing: • Compute the sum of the squared distance between data points and all centroids.
  • 8. • Assign each data point to the closest cluster (centroid). • Compute the centroids for the clusters by taking the average of all the data points that belong to each cluster. The approach K-means follows to solve the problem is called Expectation-Maximization. The E-step is assigning the data points to the closest cluster. The M-step is computing the centroid of each cluster.
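The steps listed above can be sketched in plain Python, using only the standard library. This is an illustrative sketch of the textbook procedure, not the report's implementation (which relies on scikit-learn); the function names, the `max_iter` safety cap and the fixed `seed` are assumptions added for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Arithmetic mean of a non-empty list of points (the centroid)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, labels) for a list of equal-length tuples."""
    rng = random.Random(seed)
    # Initialise centroids by drawing k data points without replacement.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # E-step: assign each point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Stop once the assignment of points to clusters no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # M-step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return centroids, labels
```

Calling `kmeans(points, 2)` on two well-separated groups of points returns two centroids and one label per point, with points in the same group sharing a label. Which group receives label 0 depends on the random initialisation.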
  • 9. CHAPTER 4 MODULES 6.1 SKLEARN Scikit-learn provides a range of supervised and unsupervised learning algorithms via a consistent interface in Python. It is licensed under a permissive simplified BSD license and is distributed with many Linux distributions, encouraging academic and commercial use. The library is built upon SciPy (Scientific Python), which must be installed before you can use scikit-learn. This stack includes: • NumPy: Base n-dimensional array package • SciPy: Fundamental library for scientific computing • Matplotlib: Comprehensive 2D/3D plotting • IPython: Enhanced interactive console • Sympy: Symbolic mathematics • Pandas: Data structures and analysis Extensions or modules for SciPy are conventionally named SciKits. As such, the module provides learning algorithms and is named scikit-learn. The vision for the library is a level of robustness and support required for use in production systems. This means a deep focus on concerns such as ease of use, code quality, collaboration, documentation and performance.
  • 10. Although the interface is Python, C libraries are leveraged for performance, such as NumPy for array and matrix operations, LAPACK, LibSVM and the careful use of Cython. The library is focused on modeling data. It is not focused on loading, manipulating and summarizing data. For these features, refer to NumPy and Pandas. FIGURE 1.1 CLUSTER ANALYSIS Some popular groups of models provided by scikit-learn include: • Clustering: for grouping unlabeled data, such as KMeans. • Cross Validation: for estimating the performance of supervised models on unseen data. • Datasets: for test datasets and for generating datasets with specific properties for investigating model behavior.
  • 11. • Dimensionality Reduction: for reducing the number of attributes in data for summarization, visualization and feature selection, such as Principal Component Analysis. • Ensemble methods: for combining the predictions of multiple supervised models. • Feature extraction: for defining attributes in image and text data. • Feature selection: for identifying meaningful attributes from which to create supervised models. • Parameter Tuning: for getting the most out of supervised models. • Manifold Learning: for summarizing and depicting complex multi-dimensional data. • Supervised Models: a vast array, not limited to generalized linear models, discriminant analysis, naive Bayes, lazy methods, neural networks, support vector machines and decision trees. 6.2 NUMPY NumPy is a module for Python. The name is short for "Numeric Python" or "Numerical Python" and is pronounced NUM-py. It is an extension module for Python, mostly written in C. Because its mathematical and numerical functions are precompiled, NumPy offers fast execution. Furthermore, NumPy enriches the Python programming language with powerful data structures, implementing multi-dimensional arrays and matrices. These data structures allow efficient calculations with matrices and arrays. The implementation is even aimed at huge matrices and arrays, better known under the heading of "big data". Besides that, the module supplies a large library of high-level mathematical functions to operate on these matrices and arrays.
  • 12. SciPy (Scientific Python) is often mentioned in the same breath as NumPy. SciPy needs NumPy, as it is based on the data structures of NumPy and on its basic creation and manipulation functions. It extends the capabilities of NumPy with further useful functions for minimization, regression, Fourier transformation and many others. Neither NumPy nor SciPy is part of a basic Python installation. They have to be installed after the Python installation, and NumPy has to be installed before SciPy. FIGURE 1.2 MATRIX VISUALISATION (Comment: The diagram on the right side is the graphical visualisation of a matrix with 14 rows and 20 columns. It is a so-called Hinton diagram. The size of a square within this diagram corresponds to the size of the value in the depicted matrix. The colour indicates whether the value is positive or negative. In our example, red denotes negative values and green denotes positive values.) NumPy is based on two earlier Python modules dealing with arrays. One of these is Numeric. Numeric is, like NumPy, a Python module for high-performance numeric computing, but it is obsolete nowadays. Another predecessor of NumPy is Numarray, which is a complete rewrite of Numeric but is deprecated as well. NumPy is a merger of those two, i.e. it is built on the code of Numeric and the features of Numarray. When we say "Core Python", we mean Python without any special modules, i.e. especially without NumPy. The advantages of Core Python:
  • 13. • high-level number objects: integers, floating point • containers: lists with cheap insertion and append methods, dictionaries with fast lookup Advantages of using NumPy with Python: • array-oriented computing • efficiently implemented multi-dimensional arrays • designed for scientific computation 6.3 WHOIS The life of a phishing site is very short; therefore, its DNS information may not be available after some time. If the DNS record is not available anywhere, the website is treated as phishing. If the domain name of the suspicious web page does not match the WHOIS database record, the web page is considered phishing. FIGURE 1.3 WHOIS MODULE
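The two WHOIS rules stated above can be expressed as a small decision function. The sketch below assumes the WHOIS record has already been fetched by some other means and is passed in as a plain dictionary; the function name `whois_verdict` and the `"domain_name"` key are hypothetical and do not refer to any particular WHOIS library.

```python
from typing import Optional


def whois_verdict(url_domain: str, whois_record: Optional[dict]) -> str:
    """Apply the two WHOIS rules from the text to an already-fetched record.

    `whois_record` is a hypothetical dict with a "domain_name" key, or None
    when no record could be found. Fetching the record is out of scope here.
    """
    # Rule 1: no record available anywhere -> treat the site as phishing.
    if whois_record is None:
        return "phishing"
    # Normalise case and a trailing dot before comparing the two names.
    registered = str(whois_record.get("domain_name", "")).lower().rstrip(".")
    # Rule 2: the page's domain does not match the registered name.
    if registered != url_domain.lower().rstrip("."):
        return "phishing"
    return "legitimate"
```

In practice the record would come from a WHOIS lookup for the domain extracted from the submitted URL. Keeping the lookup separate from the decision makes the rules easy to test without network access.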
  • 14. CHAPTER 5 SCREENSHOTS FIG 2.1 OUTPUT - PHISHING URL
  • 15. FIG 2.2 OUTPUT - ILLEGITIMATE URL FIG 2.3 SERVER RUNNING FIG 2.4 DATASETS
  • 16. CHAPTER 6 CONCLUSION The most important way to protect users from phishing attacks is education and awareness. Internet users must be aware of the security tips given by experts. Every user should also be trained not to blindly follow links to websites where they have to enter sensitive information. It is essential to check the URL before entering the website. In future, the system can be upgraded to detect web pages automatically and to improve the compatibility of the application with the web browser. Additional work can also be done by adding other characteristics for distinguishing fake web pages from legitimate ones. The PhishChecker application can also be extended to a mobile web application for detecting phishing on the mobile platform. Many features of this work can be improved to address other issues. The heuristics can be further developed to detect phishing attacks in the presence of embedded objects like Flash. Identity extraction is an important operation, and it can be improved with an Optical Character Recognition (OCR) system to extract the text and images. More effective inference rules for identifying a suspicious web page, and strategies for discovering whether it is a phishing target, should be designed in order to further improve the overall performance of the system. Moreover, it is an open challenge to develop a robust malware detection method that retains accuracy for future phishing emails. In addition, dynamic and static features complement each other, and both are therefore considered important in achieving high accuracy.