A REPORT
on
DETECTION OF PHISHING WEBSITE USING MACHINE LEARNING
NAME: MANIKANTAN ARCOT
REG NO: RA1711003040038
CLASS: CSE "C", 3RD YEAR
CHAPTER 1
INTRODUCTION
A social engineering attack is a common security threat that tricks users into revealing
private and confidential information without the attack being detected. Its main purpose
is to obtain sensitive information such as usernames, passwords and account numbers.
Phishing, or web spoofing, is one example of a social engineering attack. Phishing attacks
may appear in many forms of communication, such as messaging, SMS, VoIP and fraudulent
emails. Users commonly hold many accounts on various websites, including social networks,
email and banking. Innocent web users are therefore the most vulnerable targets of this
attack, because most people are unaware of how valuable their information is, which
helps make the attack successful.
Typically, a phishing attack uses social engineering to lure the victim by sending a
spoofed link that redirects the victim to a fake web page. The spoofed link is placed on
popular web pages or sent to the victim by email. The fake web page is made to look like
the legitimate one. Thus, rather than going to the real web server, the victim's request
is directed to the attacker's server. Current solutions such as antivirus software,
firewalls and dedicated tools do not fully prevent web spoofing attacks. Secure Sockets
Layer (SSL) and digital certificates issued by a Certificate Authority (CA) also do not
protect the web user against such attacks. In a web spoofing attack, the attacker diverts
the request to a fake web server, and certain types of SSL certificate can be forged so
that everything appears legitimate. A secure browsing connection therefore does little to
protect users, especially from attackers who know how “secure” connections actually work.
This paper develops an anti-web-spoofing solution based on inspecting the URLs of fake web
pages. The solution applies a series of steps to check the characteristics of websites'
Uniform Resource Locators (URLs).
CHAPTER 2
ABOUT PROJECT
This section describes the proposed model for phishing attack detection. The model
identifies phishing attacks by checking the features of phishing websites, a blacklist and
the WHOIS database. A few selected features can be used to differentiate between
legitimate and spoofed web pages. These features include URLs, domain identity, security
and encryption, source code, page style and contents, the web address bar and the social
human factor. This study focuses only on URL and domain name features. These are checked
using several criteria: the use of an IP address, a long URL, an added prefix or suffix,
redirection using the symbol “//”, and the presence of the symbol “@” in the URL. The
features are inspected using a set of rules in order to distinguish the URLs of phishing
web pages from the URLs of legitimate websites.
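The five URL criteria above can be sketched as simple rule checks. This is a minimal illustration, not the report's actual implementation: the function names, the length threshold of 75 characters, the reading of "prefix or suffix" as a hyphen in the host name, and the way flags are combined are all assumptions introduced here.

```python
import ipaddress
from urllib.parse import urlparse

# Assumed threshold: the report does not say what counts as a "long" URL.
LONG_URL_THRESHOLD = 75


def has_ip_address(url: str) -> bool:
    """Return True if the host part of the URL is a literal IP address."""
    host = urlparse(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Return one boolean flag per URL criterion (True means suspicious)."""
    host = urlparse(url).hostname or ""
    # Drop the scheme's own "//" so that only a later "//" is counted.
    rest = url.split("://", 1)[-1]
    return {
        "ip_address": has_ip_address(url),
        "long_url": len(url) > LONG_URL_THRESHOLD,
        # Assumption: a prefix or suffix is joined to the domain with "-".
        "prefix_suffix": "-" in host,
        "double_slash_redirect": "//" in rest,
        "at_symbol": "@" in url,
    }


def looks_like_phishing(url: str, min_flags: int = 1) -> bool:
    """Hypothetical rule: flag the URL if at least min_flags checks fire."""
    return sum(url_features(url).values()) >= min_flags
```

For example, `url_features("http://192.168.0.1/login")` sets only the `ip_address` flag, while a plain `https://www.example.com/` sets none.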
WORKING
First, the data set is created from information collected from various sources. The data
set is then fed to the K-Means clustering algorithm, and the model is trained on it. A web
application is developed: the front-end GUI is built with HTML, CSS and simple JavaScript
code, and the trained model acts as the back-end server.
When a URL is submitted, the model analyses it and sends a message to the front-end
portal stating whether it is a legitimate site or a phishing site.
CHAPTER 3
TOOLS AND TECHNOLOGY
4.1 PYTHON
In technical terms, Python is an object-oriented, high-level programming language with
integrated dynamic semantics primarily for web and app development. It is extremely
attractive in the field of Rapid Application Development because it offers dynamic typing
and dynamic binding options.
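To make these two terms concrete, here is a minimal sketch of dynamic typing and dynamic binding. The names `describe`, `Duck`, `Robot` and `make_it_speak` are invented for illustration only.

```python
def describe(value):
    """Return the runtime type name of whatever object is passed in."""
    return type(value).__name__


class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def make_it_speak(thing):
    # Dynamic binding: which speak() runs is decided by the object at call time.
    return thing.speak()


# Dynamic typing: the same name can be bound to values of different types.
x = 42
kind_before = describe(x)   # x currently refers to an int
x = "forty-two"
kind_after = describe(x)    # x now refers to a str; no declaration was needed
```

Neither `Duck` nor `Robot` shares a base class; `make_it_speak` works with any object that provides a `speak` method.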
Python is relatively simple and easy to learn because its syntax focuses on readability.
Developers can read and understand Python code more easily than code in many other
languages. In turn, this reduces the cost of program maintenance and development
because it allows teams to work collaboratively without significant language and
experience barriers.
Additionally, Python supports the use of modules and packages, which means that
programs can be designed in a modular style and code can be reused across a variety of
projects. Once you've developed a module or package you need, it can be scaled for use
in other projects, and it's easy to import or export these modules.
One of the most promising benefits of Python is that both the standard library and the
interpreter are available free of charge, in both binary and source form. There is no
exclusivity either, as Python and all the necessary tools are available on all major
platforms. Therefore, it is an enticing option for developers who don't want to worry
about paying high development costs.
That makes Python accessible to almost anyone. If you have the time to learn, you can
create some amazing things with the language.
Python is a general-purpose programming language, which is another way to say that it
can be used for nearly everything. Most importantly, it is an interpreted language, which
means that the written code is translated to a computer-readable format at runtime,
whereas compiled languages do this conversion before the program is even run. This type
of language is also referred to as a "scripting language" because such languages were
initially meant to be used for trivial projects.
The concept of a "scripting language" has changed considerably since its inception,
because Python is now used to write large, commercial-style applications instead of just
trivial ones. This reliance on Python has grown further as the internet gained
popularity. Many web applications and platforms rely on Python, including
Google's search engine, YouTube, and the web-oriented transaction system of the New
York Stock Exchange (NYSE). We know the language must be pretty serious when it's
powering a stock exchange system.
Python can also be used to process text, display numbers or images, solve scientific
equations, and save data. In short, it is used behind the scenes to process a lot of elements
you might need or encounter on your device(s) - mobile included.
BENEFITS:
1) Python can be used to develop prototypes quickly because it is so easy to work
with and read.
2) Many automation, data mining, and big data platforms rely on Python.
3) Python allows for a more productive coding environment than heavier languages such as
C# and Java. Experienced coders tend to stay more organized and productive when
working with Python.
4) Python is easy to read, even if you're not a skilled programmer. Anyone can begin
working with the language; all it takes is a bit of patience and a lot of practice. This
also makes it an ideal candidate for use among multi-programmer and large
development teams.
5) Python powers Django, a complete and open source web application framework.
Frameworks of this kind, like Ruby on Rails for Ruby, simplify the development process.
6) It has a massive support base thanks to the fact that it is open source and community
developed. Millions of like-minded developers work with the language on a daily
basis and continue to improve core functionality. The latest version of Python
continues to receive enhancements and updates as time progresses. The community is also
a great way to network with other developers.
4. K MEANS CLUSTERING ALGORITHM
Clustering is one of the most common exploratory data analysis techniques used to get an
intuition about the structure of the data. It can be defined as the task of identifying
subgroups in the data such that data points in the same subgroup (cluster) are very similar
while data points in different clusters are very different. In other words, we try to find
homogeneous subgroups within the data such that data points in each cluster are as
similar as possible according to a similarity measure such as Euclidean-based distance or
correlation-based distance. The decision of which similarity measure to use is
application-specific.
Clustering analysis can be done on the basis of features, where we try to find subgroups of
samples based on features, or on the basis of samples, where we try to find subgroups of
features based on samples. We cover here clustering based on features. Clustering is
used in market segmentation, where we try to find customers that are similar to each
other in terms of behaviors or attributes; in image segmentation/compression,
where we try to group similar regions together; in document clustering based on topics; etc.
Unlike supervised learning, clustering is considered an unsupervised learning method
since we don’t have the ground truth to compare the output of the clustering algorithm to
the true labels to evaluate its performance. We only want to try to investigate the structure
of the data by grouping the data points into distinct subgroups.
In this report, we cover only K-Means, which is considered one of the most used
clustering algorithms due to its simplicity.
The K-Means algorithm is an iterative algorithm that tries to partition the dataset into
K predefined, distinct, non-overlapping subgroups (clusters) where each data point belongs
to only one group. It tries to make the intra-cluster data points as similar as possible
while also keeping the clusters as different (far apart) as possible. It assigns data
points to a cluster such that the sum of the squared distances between the data points and
the cluster's centroid (the arithmetic mean of all the data points that belong to that
cluster) is at the minimum. The less variation we have within clusters, the more
homogeneous (similar) the data points are within the same cluster.
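The quantity being minimized can be written compactly. The notation is introduced here for illustration and does not come from the report: C_k is the set of points in cluster k, mu_k is its centroid, and x_i is a data point.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

A smaller J means the points sit closer to their own centroids, which is the sense in which clusters become more homogeneous.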
The K-Means algorithm works as follows:
• Specify the number of clusters K.
• Initialize the centroids by first shuffling the dataset and then randomly selecting K
data points for the centroids without replacement.
• Keep iterating the following steps until the centroids no longer change, i.e. the
assignment of data points to clusters stops changing:
   • Compute the squared distance between each data point and every centroid.
   • Assign each data point to the closest cluster (centroid).
   • Recompute the centroid of each cluster by taking the average of all the data
   points that belong to it.
The approach K-Means follows to solve the problem is called Expectation-Maximization.
The E-step is assigning the data points to the closest cluster. The M-step is
computing the centroid of each cluster.
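The steps above can be sketched in plain Python. This is a teaching sketch under stated assumptions, not the project's code: the function names, the fixed `seed`, the `max_iter` cap and the handling of empty clusters are choices made here.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as the starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # E-step: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # M-step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels
```

Calling `kmeans(points, 2)` on two well-separated groups of points gives one label to each group. Which group receives label 0 depends on the random initialization.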
CHAPTER 4
MODULES
6.1 SKLEARN
Scikit-learn provides a range of supervised and unsupervised learning algorithms via a
consistent interface in Python.
It is licensed under a permissive simplified BSD license and is distributed with many
Linux distributions, encouraging academic and commercial use.
The library is built upon SciPy (Scientific Python), which must be installed before you
can use scikit-learn. This stack includes:
• NumPy: Base n-dimensional array package
• SciPy: Fundamental library for scientific computing
• Matplotlib: Comprehensive 2D/3D plotting
• IPython: Enhanced interactive console
• Sympy: Symbolic mathematics
• Pandas: Data structures and analysis
Extensions or modules for SciPy are conventionally named SciKits. As such, the module
provides learning algorithms and is named scikit-learn.
The vision for the library is a level of robustness and support required for use in
production systems. This means a deep focus on concerns such as ease of use, code
quality, collaboration, documentation and performance.
Although the interface is Python, C libraries are leveraged for performance, such as NumPy
for array and matrix operations, LAPACK, LibSVM and the careful use of Cython.
The library is focused on modeling data. It is not focused on loading, manipulating and
summarizing data. For these features, refer to NumPy and Pandas.
FIGURE 1.1 CLUSTER ANALYSIS
Some popular groups of models provided by scikit-learn include:
• Clustering: for grouping unlabeled data such as KMeans.
• Cross Validation: for estimating the performance of supervised models on unseen
data.
• Datasets: for test datasets and for generating datasets with specific properties for
investigating model behavior.
• Dimensionality Reduction: for reducing the number of attributes in data for
summarization, visualization and feature selection such as Principal component
analysis.
• Ensemble methods: for combining the predictions of multiple supervised models.
• Feature extraction: for defining attributes in image and text data.
• Feature selection: for identifying meaningful attributes from which to create
supervised models.
• Parameter Tuning: for getting the most out of supervised models.
• Manifold Learning: For summarizing and depicting complex multi-dimensional
data.
• Supervised Models: a vast array not limited to generalized linear models,
discriminant analysis, naive Bayes, lazy methods, neural networks, support vector
machines and decision trees.
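As one illustration of the clustering group listed above, the following sketch fits scikit-learn's KMeans to a tiny toy data set. The data points and parameter values are made up for this example and are not taken from the project's data set.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (assumed for illustration): two well-separated groups of 2-D points.
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [100.0, 100.0], [100.0, 101.0], [101.0, 100.0],
])

# n_clusters is K; random_state fixes the initialization for repeatability.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)

# A fitted model can assign a new, unseen point to the nearest centroid.
new_label = model.predict(np.array([[99.0, 99.0]]))[0]
```

The fitted centroids are available as `model.cluster_centers_`. Cluster numbers are arbitrary, so only the grouping is meaningful, not which group is called 0 or 1.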
6.2 NUMPY
NumPy is a module for Python. The name is an acronym for "Numeric Python" or
"Numerical Python". It is pronounced NUM-py. It is an extension module for Python,
mostly written in C. Because its mathematical and numerical functions are precompiled,
NumPy offers high execution speed.
Furthermore, NumPy enriches the programming language Python with powerful data
structures, implementing multi-dimensional arrays and matrices. These data structures
allow efficient calculations with matrices and arrays. The implementation is even
designed for huge matrices and arrays, better known under the heading of "big data".
Besides that, the module supplies a large library of high-level mathematical functions to
operate on these matrices and arrays.
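A short sketch of these array and matrix operations follows. The values are arbitrary and chosen only for illustration.

```python
import numpy as np

# Two small 2x2 arrays with arbitrary example values.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])

elementwise = a * b            # multiplies matching entries
product = a @ b                # matrix product
column_means = a.mean(axis=0)  # mean of each column
```

Here `product` is `[[2, 1], [4, 3]]` and `column_means` is `[2, 3]`. The whole-array expressions replace the explicit loops that core Python lists would need.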
SciPy (Scientific Python) is often mentioned in the same breath as NumPy. SciPy needs
NumPy, as it is based on the data structures of NumPy and on its basic creation
and manipulation functions. It extends the capabilities of NumPy with further useful
functions for minimization, regression, Fourier transformation and many others.
Neither NumPy nor SciPy is part of a basic Python installation. They have to be
installed after the Python installation. NumPy has to be installed before installing SciPy.
FIGURE 1.2 MATRIX VISUALISATION
(Comment: The figure is the graphical visualisation of a matrix with 14 rows and 20
columns, a so-called Hinton diagram. The size of each square within the diagram
corresponds to the magnitude of the value in the depicted matrix. The colour indicates
whether the value is positive or negative: in this example, red denotes negative values
and green denotes positive values.)
NumPy is based on two earlier Python modules dealing with arrays. One of these is
Numeric. Like NumPy, Numeric is a Python module for high-performance numeric
computing, but it is obsolete nowadays. Another predecessor of NumPy is Numarray,
which is a complete rewrite of Numeric but is deprecated as well. NumPy is a merger of
those two, i.e. it is built on the code of Numeric and the features of Numarray.
When we say "Core Python", we mean Python without any special modules, i.e.
especially without NumPy.
The advantages of Core Python:
• high-level number objects: integers, floating point
• containers: lists with cheap insertion and append methods, dictionaries with fast
lookup
Advantages of using Numpy with Python:
• array oriented computing
• efficiently implemented multi-dimensional arrays
• designed for scientific computation
6.3 WHOIS
The life of a phishing site is very short; therefore, its DNS information may not be
available after some time. If the DNS record is not available anywhere, the website is
treated as phishing. If the domain name of the suspicious web page does not match the
WHOIS database record, the web page is considered phishing.
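The two rules above can be sketched as a small decision function. The lookups are passed in as hypothetical callables, `dns_lookup` and `whois_lookup`, because the report does not say which DNS or WHOIS client it uses. The reading of "match" as "the host equals, or is a subdomain of, the registered domain" is also an assumption made here.

```python
from typing import Callable, Optional
from urllib.parse import urlparse


def whois_verdict(
    url: str,
    dns_lookup: Callable[[str], Optional[str]],
    whois_lookup: Callable[[str], Optional[str]],
) -> str:
    """Return "phishing" or "legitimate" using the two WHOIS/DNS rules.

    dns_lookup(host) should return an address, or None if no record exists.
    whois_lookup(host) should return the registered domain name, or None.
    Both are hypothetical placeholders for real resolver and WHOIS clients.
    """
    host = (urlparse(url).hostname or "").lower()
    # Rule 1: no DNS record available means the site is treated as phishing.
    if not host or dns_lookup(host) is None:
        return "phishing"
    # Rule 2: the host must match the domain name in the WHOIS record.
    registered = whois_lookup(host)
    if registered is None:
        return "phishing"
    registered = registered.lower()
    if host == registered or host.endswith("." + registered):
        return "legitimate"
    return "phishing"
```

Passing the lookups in as arguments lets the rule logic be exercised with in-memory fakes such as a dictionary's `get` method, without any network access.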
FIGURE 1.3 WHOIS MODULE
CHAPTER 5
SCREENSHOTS
FIG 2.1 OUTPUT - PHISHING URL
FIG 2.2 OUTPUT – ILLEGITIMATE URL
FIG 2.3 SERVER RUNNING
FIG 2.4 DATASETS
CHAPTER 6
CONCLUSION
The most important way to protect users from phishing attacks is education and
awareness. Internet users must be aware of the security tips given by experts.
Every user should also be trained not to blindly follow links to websites where they
have to enter sensitive information. It is essential to check the URL before entering
the website. In future, the system can be upgraded to detect web pages automatically and
to improve the application's compatibility with web browsers. Additional work can also be
done by adding other characteristics for distinguishing fake web pages from
legitimate web pages. The PhishChecker application can also be extended into a mobile web
application for detecting phishing on mobile platforms.
There are many features of this work that can be improved to address other issues. The
heuristics can be further developed to detect phishing attacks in the presence of embedded
objects such as Flash. Identity extraction is an important operation, and it was improved
with an Optical Character Recognition (OCR) system to extract the text and images. More
effective inference rules for identifying a given suspicious web page, and strategies for
discovering whether it is a phishing target, should be designed in order to further improve
the overall performance of this system. Moreover, it is an open challenge to develop a
robust malware detection method that retains its accuracy for future phishing emails. In
addition, the dynamic and static features complement each other, and therefore both are
considered important in achieving high accuracy.

The basics of sentences session 7pptx.pptx
The basics of sentences session 7pptx.pptxThe basics of sentences session 7pptx.pptx
The basics of sentences session 7pptx.pptx
heathfieldcps1
 
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptxBIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
RidwanHassanYusuf
 
How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17
Celine George
 
Data Structure using C by Dr. K Adisesha .ppsx
Data Structure using C by Dr. K Adisesha .ppsxData Structure using C by Dr. K Adisesha .ppsx
Data Structure using C by Dr. K Adisesha .ppsx
Prof. Dr. K. Adisesha
 
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...
EduSkills OECD
 
Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)
nitinpv4ai
 
Electric Fetus - Record Store Scavenger Hunt
Electric Fetus - Record Store Scavenger HuntElectric Fetus - Record Store Scavenger Hunt
Electric Fetus - Record Store Scavenger Hunt
RamseyBerglund
 
Stack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 MicroprocessorStack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 Microprocessor
JomonJoseph58
 
HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.
deepaannamalai16
 
Haunted Houses by H W Longfellow for class 10
Haunted Houses by H W Longfellow for class 10Haunted Houses by H W Longfellow for class 10
Haunted Houses by H W Longfellow for class 10
nitinpv4ai
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
Nguyen Thanh Tu Collection
 
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
imrankhan141184
 
SWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptxSWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptx
zuzanka
 
CIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdfCIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdf
blueshagoo1
 
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
TechSoup
 
Temple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
Krassimira Luka
 
Standardized tool for Intelligence test.
Standardized tool for Intelligence test.Standardized tool for Intelligence test.
Standardized tool for Intelligence test.
deepaannamalai16
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
haiqairshad
 

Recently uploaded (20)

Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
 
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
 
The basics of sentences session 7pptx.pptx
The basics of sentences session 7pptx.pptxThe basics of sentences session 7pptx.pptx
The basics of sentences session 7pptx.pptx
 
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptxBIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
 
How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17
 
Data Structure using C by Dr. K Adisesha .ppsx
Data Structure using C by Dr. K Adisesha .ppsxData Structure using C by Dr. K Adisesha .ppsx
Data Structure using C by Dr. K Adisesha .ppsx
 
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...
 
Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)
 
Electric Fetus - Record Store Scavenger Hunt
Electric Fetus - Record Store Scavenger HuntElectric Fetus - Record Store Scavenger Hunt
Electric Fetus - Record Store Scavenger Hunt
 
Stack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 MicroprocessorStack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 Microprocessor
 
HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.
 
Haunted Houses by H W Longfellow for class 10
Haunted Houses by H W Longfellow for class 10Haunted Houses by H W Longfellow for class 10
Haunted Houses by H W Longfellow for class 10
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
 
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
 
SWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptxSWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptx
 
CIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdfCIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdf
 
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
 
Temple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
 
Standardized tool for Intelligence test.
Standardized tool for Intelligence test.Standardized tool for Intelligence test.
Standardized tool for Intelligence test.
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
 

A REPORT On DETECTION OF PHISHING WEBSITE USING MACHINE LEARNING

  • 1. A REPORT on DETECTION OF PHISHING WEBSITE USING MACHINE LEARNING NAME MANIKANTAN ARCOT REG NO RA1711003040038 CLASS CSE –“C” 3RD YEAR
  • 2. CHAPTER 1 INTRODUCTION
    A social engineering attack is a common security threat that tricks users into revealing private and confidential information without the attack being detected. The main purpose of such an attack is to obtain sensitive information such as usernames, passwords and account numbers. Phishing, or web spoofing, is one example of a social engineering attack. Phishing attacks appear in many forms of communication, such as messaging, SMS, VoIP and fraudulent emails. Users commonly hold many accounts on various websites, including social networks, email and banking. Ordinary web users are therefore the most vulnerable targets, because most people are unaware of how valuable their information is, which helps make the attack successful.
    Typically, a phishing attack uses social engineering to lure the victim with a spoofed link that redirects to a fake web page. The spoofed link is placed on popular web pages or sent to the victim by email. The fake web page is made to look like the legitimate one, so the victim's request goes to the attacker's server rather than to the real web server. Current antivirus, firewall and dedicated software solutions do not fully prevent web spoofing. Secure Sockets Layer (SSL) and digital certificates issued by a certificate authority (CA) do not protect the web user against such an attack either: the attacker diverts the request to a fake web server, and certain types of certificates can be forged while everything appears legitimate. A secure browsing connection does little to protect users, especially from attackers who know how “secure” connections actually work. This paper develops an anti-web-spoofing solution based on inspecting the URLs of fake web pages. The solution applies a series of steps that check the characteristics of websites' Uniform Resource Locators (URLs).
  • 3. CHAPTER 2 ABOUT PROJECT
    This section describes the proposed model for phishing attack detection. The model identifies a phishing attack by checking the features of phishing websites, a blacklist and the WHOIS database. A few selected features can be used to differentiate between legitimate and spoofed web pages. Candidate features include URLs, domain identity, security and encryption, source code, page style and contents, the web address bar and the social human factor. This study focuses only on URL and domain name features. These are checked against several criteria: use of an IP address, a long URL, an added prefix or suffix, redirection using the symbol “//”, and presence of the symbol “@”. The features are inspected with a set of rules in order to distinguish the URLs of phishing web pages from the URLs of legitimate websites.
    WORKING
    First, a data set is created from information collected from various sources. The data set is then fed to the K-means clustering algorithm, and the model is trained on it. A web application is developed with a front-end GUI written in HTML, CSS and simple JavaScript code; the trained model acts as the back-end server. When a URL is submitted, the model analyses it and sends a message to the front-end portal stating whether it is a legitimate site or a phishing site.
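    The URL criteria listed above can be sketched as a small feature extractor. This is only an illustration under stated assumptions, not the report's implementation: the function names, the 54-character length threshold, the use of a hyphen in the host name as the "prefix or suffix" signal, and the rule that any single hit flags the URL are all hypothetical choices made for the example.

```python
import re
from urllib.parse import urlparse

# Assumed threshold; the report does not say what counts as a "long" URL.
MAX_URL_LENGTH = 54


def url_features(url: str) -> dict:
    """Return one boolean per URL criterion named in the report."""
    host = urlparse(url).hostname or ""
    scheme_end = url.find("://") + 3  # position just after the scheme separator
    return {
        # Host is a dotted-quad IP address instead of a domain name.
        "ip_address": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "long_url": len(url) > MAX_URL_LENGTH,
        # Assumption: a hyphen in the host marks an added prefix or suffix.
        "prefix_suffix": "-" in host,
        # A second "//" after the scheme suggests a redirect.
        "double_slash_redirect": url.find("//", scheme_end) != -1,
        "at_symbol": "@" in url,
    }


def looks_like_phishing(url: str) -> bool:
    """Hypothetical rule: flag the URL if any single criterion fires."""
    return any(url_features(url).values())
```

    For example, `url_features("http://paypal.com@192.168.0.1/login")` sets both `ip_address` and `at_symbol`, because everything before the “@” is ignored when the host is resolved. The same boolean features could equally be turned into a numeric vector and passed to a clustering model.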
  • 4. CHAPTER 3 TOOLS AND TECHNOLOGY
    4.1 PYTHON
    In technical terms, Python is an object-oriented, high-level programming language with dynamic semantics, widely used for web and application development. It is attractive for Rapid Application Development because it offers dynamic typing and dynamic binding. Python is relatively simple and easy to learn, since its syntax focuses on readability. Developers can read and understand Python code more easily than code in many other languages. In turn, this reduces the cost of program maintenance and development, because teams can work collaboratively without significant language and experience barriers.
    Additionally, Python supports modules and packages, which means that programs can be designed in a modular style and code can be reused across a variety of projects. Once a module or package has been developed, it can be imported into other projects.
    One of the main benefits of Python is that both the standard library and the interpreter are available free of charge, in both binary and source form. There is no exclusivity either, as Python and the necessary tools are available on all major platforms. It is therefore an attractive option for developers who want to avoid high development costs, and it makes Python accessible to almost anyone with the time to learn it.
  • 5. Python is a general-purpose programming language, which is another way of saying that it can be used for nearly everything. Most importantly, it is an interpreted language: the written code is not translated to a machine-readable format ahead of time but at runtime, whereas compiled languages perform this conversion before the program is run. This type of language is also referred to as a "scripting language" because it was initially meant to be used for small projects. The concept of a "scripting language" has changed considerably since its inception, because Python is now used to write large, commercial-style applications as well as simple ones. This reliance on Python has grown as the internet gained popularity. Many web applications and platforms rely on Python, including Google's search engine, YouTube, and the web-oriented transaction system of the New York Stock Exchange (NYSE). Python can also be used to process text, display numbers or images, solve scientific equations, and save data. In short, it is used behind the scenes to process many of the elements you encounter on your devices, mobile included.
    BENEFITS:
    1) Python can be used to develop prototypes quickly, because it is easy to work with and read.
    2) Many automation, data mining, and big data platforms rely on Python.
    3) Python allows for a more productive coding environment than heavier languages like C# and Java. Experienced coders tend to stay more organized and productive when working with Python.
  • 6. 4) Python is easy to read, even if you are not a skilled programmer. Anyone can begin working with the language; all it takes is a bit of patience and a lot of practice. This also makes it a good candidate for large, multi-programmer development teams.
    5) Python powers Django, a complete and open source web application framework. Such frameworks, like Ruby on Rails for Ruby, can be used to simplify the development process.
    6) It has a massive support base, thanks to the fact that it is open source and community developed. Millions of developers work with the language on a daily basis and continue to improve its core functionality. The latest version of Python continues to receive enhancements and updates, and the community is a good way to network with other developers.
    4. K MEANS CLUSTERING ALGORITHM
    Clustering is one of the most common exploratory data analysis techniques, used to get an intuition about the structure of the data. It can be defined as the task of identifying subgroups in the data such that data points in the same subgroup (cluster) are very similar, while data points in different clusters are very different. In other words, we try to find homogeneous subgroups within the data such that data points in each cluster are as similar as possible according to a similarity measure such as Euclidean-based distance or correlation-based distance. The choice of similarity measure is application-specific.
    Clustering analysis can be done on the basis of features, where we try to find subgroups of samples based on features, or on the basis of samples, where we try to find subgroups of features based on samples. We cover clustering based on features here. Clustering is used in market segmentation, where we try to find customers that are similar to each
  • 7. other, whether in terms of behaviors or attributes; in image segmentation and compression, where we try to group similar regions together; in document clustering based on topics; and so on.
    Unlike supervised learning, clustering is considered an unsupervised learning method, since we do not have ground-truth labels against which to compare the output of the clustering algorithm and evaluate its performance. We only want to investigate the structure of the data by grouping the data points into distinct subgroups.
    We cover only K-means here, which is considered one of the most widely used clustering algorithms due to its simplicity.
    The K-means algorithm is an iterative algorithm that tries to partition the dataset into K predefined, distinct, non-overlapping subgroups (clusters), where each data point belongs to only one group. It tries to make the intra-cluster data points as similar as possible while keeping the clusters as different (far apart) as possible. It assigns data points to a cluster such that the sum of the squared distances between the data points and the cluster's centroid (the arithmetic mean of all the data points that belong to that cluster) is at a minimum. The less variation we have within clusters, the more homogeneous (similar) the data points are within the same cluster.
    The K-means algorithm works as follows:
    • Specify the number of clusters K.
    • Initialize the centroids by first shuffling the dataset and then randomly selecting K data points for the centroids without replacement.
    • Keep iterating the following steps until there is no change to the centroids, i.e. the assignment of data points to clusters is no longer changing:
      • Compute the sum of the squared distances between data points and all centroids.
  • 8.   • Assign each data point to the closest cluster (centroid).
      • Compute the centroids for the clusters by taking the average of all the data points that belong to each cluster.
    The approach K-means follows to solve the problem is called Expectation-Maximization. The E-step assigns the data points to the closest cluster. The M-step computes the centroid of each cluster.
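    The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the project: it is kept to the standard library so that it runs without NumPy or scikit-learn, and the names `kmeans`, `squared_distance` and `mean`, the fixed `seed`, and the `max_iter` safety cap are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Arithmetic mean of a non-empty list of points (the centroid)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, seed=0, max_iter=100):
    """Return (centroids, assignment) for a list of tuples `points`."""
    rng = random.Random(seed)
    # Initialization: pick K data points at random, without replacement.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # E-step: assign every point to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing, so the centroids are stable
        assignment = new_assignment
        # M-step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = mean(members)
    return centroids, assignment
```

    Calling `kmeans(points, 2)` on two well-separated groups of points returns one label per point, with points of the same group sharing a label. Which group receives label 0 depends on the random initialization, so only the grouping is meaningful, not the label values.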
  • 9. CHAPTER 4 MODULES
    6.1 SKLEARN
    Scikit-learn provides a range of supervised and unsupervised learning algorithms via a consistent interface in Python. It is licensed under a permissive simplified BSD license and is packaged by many Linux distributions, encouraging academic and commercial use. The library is built upon SciPy (Scientific Python), which must be installed before you can use scikit-learn. This stack includes:
    • NumPy: Base n-dimensional array package
    • SciPy: Fundamental library for scientific computing
    • Matplotlib: Comprehensive 2D/3D plotting
    • IPython: Enhanced interactive console
    • SymPy: Symbolic mathematics
    • Pandas: Data structures and analysis
    Extensions or modules for SciPy are conventionally named SciKits. As such, the module provides learning algorithms and is named scikit-learn. The vision for the library is a level of robustness and support required for use in production systems. This means a deep focus on concerns such as ease of use, code quality, collaboration, documentation and performance.
  • 10. Although the interface is Python, C libraries are leveraged for performance, such as NumPy for array and matrix operations, LAPACK and LibSVM, along with the careful use of Cython. The library is focused on modeling data. It is not focused on loading, manipulating and summarizing data. For these features, refer to NumPy and Pandas.
    FIGURE 1.1 CLUSTER ANALYSIS
    Some popular groups of models provided by scikit-learn include:
    • Clustering: for grouping unlabeled data, such as KMeans.
    • Cross Validation: for estimating the performance of supervised models on unseen data.
    • Datasets: for test datasets and for generating datasets with specific properties for investigating model behavior.
  • 11. • Dimensionality Reduction: for reducing the number of attributes in data for summarization, visualization and feature selection, such as principal component analysis.
    • Ensemble methods: for combining the predictions of multiple supervised models.
    • Feature extraction: for defining attributes in image and text data.
    • Feature selection: for identifying meaningful attributes from which to create supervised models.
    • Parameter Tuning: for getting the most out of supervised models.
    • Manifold Learning: for summarizing and depicting complex multi-dimensional data.
    • Supervised Models: a vast array, not limited to generalized linear models, discriminant analysis, naive Bayes, lazy methods, neural networks, support vector machines and decision trees.
    6.2 NUMPY
    NumPy is a module for Python. The name is an acronym for "Numeric Python" or "Numerical Python". It is pronounced NUM-py. It is an extension module for Python, mostly written in C, so its precompiled mathematical and numerical functions execute quickly. Furthermore, NumPy enriches the programming language Python with powerful data structures, implementing multi-dimensional arrays and matrices. These data structures allow efficient calculations with matrices and arrays. The implementation is even aimed at huge matrices and arrays, better known under the heading of "big data". Besides that, the module supplies a large library of high-level mathematical functions to operate on these matrices and arrays.
  • 12. SciPy (Scientific Python) is often mentioned in the same breath as NumPy. SciPy needs NumPy, as it is based on NumPy's data structures and on its basic creation and manipulation functions. It extends the capabilities of NumPy with further useful functions for minimization, regression, Fourier transformation and many others. Neither NumPy nor SciPy is part of a basic Python installation. They have to be installed after the Python installation, and NumPy has to be installed before SciPy.
    FIGURE 1.2 MATRIX VISUALISATION
    (Comment: The diagram on the right side is the graphical visualisation of a matrix with 14 rows and 20 columns. It is a so-called Hinton diagram. The size of a square within this diagram corresponds to the size of the value in the depicted matrix. The colour indicates whether the value is positive or negative. In our example, red denotes negative values and green denotes positive values.)
    NumPy is based on two earlier Python modules dealing with arrays. One of these is Numeric. Numeric is, like NumPy, a Python module for high-performance numeric computing, but it is obsolete nowadays. Another predecessor of NumPy is Numarray, which is a complete rewrite of Numeric but is deprecated as well. NumPy is a merger of those two, i.e. it is built on the code of Numeric and the features of Numarray.
    When we say "Core Python", we mean Python without any special modules, i.e. especially without NumPy. The advantages of Core Python:
  • 13. • high-level number objects: integers, floating point
    • containers: lists with cheap insertion and append methods, dictionaries with fast lookup
    Advantages of using NumPy with Python:
    • array-oriented computing
    • efficiently implemented multi-dimensional arrays
    • designed for scientific computation
    6.3 WHOIS
    The life of a phishing site is very short, so its DNS information may no longer be available after some time. If the DNS record is not available anywhere, the website is treated as phishing. If the domain name of the suspicious web page does not match the WHOIS database record, the web page is likewise considered phishing.
    FIGURE 1.3 WHOIS MODULE
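    The two WHOIS rules in section 6.3 can be written as a small decision function. This is a hedged sketch only: it assumes the WHOIS record has already been fetched elsewhere and is passed in as a dictionary (or `None` when no record exists), and the function name `whois_verdict` and the field name `domain_name` are hypothetical, not taken from the report or from any particular WHOIS client.

```python
from typing import Optional


def whois_verdict(domain: str, record: Optional[dict]) -> str:
    """Apply the report's two WHOIS rules to an already-fetched record.

    `record` is None when no DNS/WHOIS record could be found; otherwise it is
    a dict whose "domain_name" key (a hypothetical field name) holds the
    registered domain.
    """
    # Rule 1: no record available anywhere, so treat the site as phishing.
    if record is None:
        return "phishing"
    # Rule 2: the page's domain does not match the registered domain.
    registered = (record.get("domain_name") or "").lower()
    if registered != domain.lower():
        return "phishing"
    return "legitimate"
```

    Keeping the lookup outside the function means the rule can be checked without network access; in a running system the record would come from whichever WHOIS client the application uses.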
  • 14. CHAPTER 5 SCREENSHOTS FIG 2.1 OUTPUT - PHISHING URL
  • 15. FIG 2.2 OUTPUT – ILLEGITIMATE URL FIG 2.3 SERVER RUNNING FIG 2.4 DATASETS
  • 16. CHAPTER 6 CONCLUSION
    The most important way to protect users from phishing attacks is education and awareness. Internet users must be aware of the security tips given by experts. Every user should also be trained not to blindly follow links to websites where they have to enter sensitive information. It is essential to check the URL before entering the website.
    In future, the system can be upgraded to detect web pages automatically and to improve the compatibility of the application with the web browser. Additional work can also be done by adding other characteristics for distinguishing fake web pages from legitimate ones. The PhishChecker application can also be upgraded into a mobile web application for detecting phishing on mobile platforms.
    Many features of this work can be improved. The heuristics can be further developed to detect phishing attacks in the presence of embedded objects like Flash. Identity extraction is an important operation, and it was improved with the Optical Character Recognition (OCR) system to extract text and images. More effective inference rules for identifying a suspicious web page, and strategies for discovering whether it is a phishing target, should be designed in order to further improve the overall performance of this system. Moreover, it is an open challenge to develop a robust malware detection method that retains its accuracy for future phishing emails. In addition, the dynamic and static features complement each other, and therefore both are considered important in achieving high accuracy