CANTINA: A Content-Based Approach to Detecting Phishing Web Sites
ABSTRACT
• CANTINA is a content-based approach.
• Examines whether the content of a page is legitimate or not.
• Detects phishing URLs and links.
INTRODUCTION
• Phishing: a kind of attack in which victims are tricked by spoofed emails and fraudulent web sites into giving up personal information.
• How many phishing sites are there? 9,255 unique phishing sites were reported in June 2006 alone.
• How much does phishing cost each year? $1 billion to $2.8 billion per year.
PROPOSED SYSTEM
• Detects phishing websites.
• Examines text-based content along with surface characteristics.
• Surface characteristics include:
  - Age of domain
  - Known images
  - Suspicious URL
  - Suspicious links
• Detects phishing links in users' email.
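The slide does not define what makes a URL "suspicious". As a minimal sketch, assume the check flags a URL that contains an "@" sign or a hyphen in its host name. The function name is_suspicious_url and that rule are assumptions made for illustration, not details taken from the slides.

```python
from urllib.parse import urlparse

def is_suspicious_url(url):
    """Assumed rule: flag an '@' in the authority part or a '-' in the host name."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # An '@' lets an attacker put a trusted-looking name before the real host.
    return "@" in parsed.netloc or "-" in host
```

Many legitimate domains contain hyphens, so a check like this would be only one weak signal among several.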
TF-IDF ALGORITHM
• Term Frequency (TF)
  - The number of times a given term appears in a specific document
  - Measures the importance of the term within that document
• Inverse Document Frequency (IDF)
  - Measures how common a term is across an entire collection of documents; the more documents contain the term, the lower its IDF
• A high TF-IDF weight means the term has a high TF in the given document and appears in few documents of the collection
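The definitions above can be sketched in a few lines of Python. This is one common formulation (relative term frequency times the natural log of the inverse document fraction); the slides do not say which variant is used, so the exact weighting, the function name tf_idf and the sample data are assumptions.

```python
import math
from collections import Counter

def tf_idf(term, document, corpus):
    """Score `term` in `document` (a list of words) against `corpus` (a list of documents)."""
    # TF: share of the document's words that are this term.
    tf = Counter(document)[term] / len(document)
    # IDF: log of (number of documents / documents containing the term).
    docs_with_term = sum(1 for doc in corpus if term in doc)
    idf = math.log(len(corpus) / docs_with_term) if docs_with_term else 0.0
    return tf * idf

# Hypothetical toy data: one page and two other documents.
page = ["ebay", "sign", "in", "ebay", "user", "password"]
corpus = [page, ["sign", "in", "to", "your", "account"], ["user", "guide", "sign", "up"]]
```

Here "ebay" occurs twice in the page and nowhere else, so it scores higher than "sign", which occurs in every document and therefore gets an IDF of zero.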
MODULES
• Parsing the web pages
• Generating the lexical signature
• Testing process
• Report generation
PARSING THE WEB PAGES
• Links, anchor tags, form tags and attachments in the web page are turned into corresponding objects (Text Link, HTML Link, etc.).
• This is done by parsing the text of each page.
• Uses the HTML Parser API, which extracts information from HTML code.
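The slides name an "HTML Parser API" without showing how it is called. As a stand-in, the sketch below uses Python's standard html.parser module to pull out link targets, form actions and visible text. The class name LinkFormExtractor and the helper parse_page are hypothetical.

```python
from html.parser import HTMLParser

class LinkFormExtractor(HTMLParser):
    """Collects anchor hrefs, form actions and text fragments from a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            # A form without an action submits to the page itself.
            self.forms.append(attrs.get("action") or "")

    def handle_data(self, data):
        # Simplification: script and style contents are not filtered out.
        if data.strip():
            self.text.append(data.strip())

def parse_page(html):
    parser = LinkFormExtractor()
    parser.feed(html)
    parser.close()
    return parser
```

The collected text is what a later step would tokenize into words, while the links and form actions feed the link-based checks.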
GENERATING THE LEXICAL SIGNATURE
• The TF-IDF algorithm is used to generate lexical signatures.
• Calculate the TF-IDF value for each word in a document.
• Select the words with the highest values.
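Selecting the highest-scoring words might look like the sketch below. It assumes the TF-IDF values have already been computed and are passed in as a dictionary. The slides do not state how many words form a signature, so the default size of 5 is an assumption, as is the function name lexical_signature.

```python
def lexical_signature(scores, size=5):
    """Return the `size` terms with the highest TF-IDF scores."""
    # Sort by descending score; break ties alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:size]]

# Hypothetical scores for the words of one page.
scores = {"ebay": 0.37, "password": 0.18, "user": 0.07, "sign": 0.0, "in": 0.0}
```

With these scores, a signature of size 3 would be ebay, password, user.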
TESTING PROCESS
• Feed the lexical signature to a search engine.
• Check whether the domain name of the current web page matches the domain name of any of the top N search results.
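The domain check can be sketched as follows, assuming the search results are already available as a list of URLs; no real search engine is called. The helper names, the default n=30 and the simplified host comparison (full host names with a leading "www." removed) are all assumptions.

```python
from urllib.parse import urlparse

def domain_of(url):
    """Lower-cased host name with a leading 'www.' removed (a simplification)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def matches_top_results(page_url, result_urls, n=30):
    """True if the page's domain equals the domain of any of the top n results."""
    page_domain = domain_of(page_url)
    return bool(page_domain) and any(
        domain_of(url) == page_domain for url in result_urls[:n]
    )

# Hypothetical search results for the signature of an eBay sign-in page.
results = ["https://www.ebay.com/", "https://signin.ebay.com/"]
```

A page hosted on www.ebay.com matches the first result, while a look-alike page on another host matches none and would be treated as phishing. A fuller version would compare registered domains instead of whole host names.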
REPORT GENERATION
• If a page is legitimate, it returns "legitimate".
• If a page is phishing, it returns "phishing".
APPLICATIONS
• Used to detect fraudulent websites and emails.
• Protects users from giving up personal information such as credit card numbers, bank details and account passwords.
• Used to detect suspicious links in email.
CONCLUSION
• Content-based approach for detecting phishing websites.
• User-friendly interface.
• Anti-phishing tool that protects users from giving up their personal information.