This presentation covers using machine learning to detect malicious URLs. It extracts several kinds of features for each URL: blacklist membership, domain registration (WHOIS) information, host properties, and lexical features of the URL itself. These features are used to train classifiers such as logistic regression to distinguish benign from malicious URLs. The approach is reported to reach 86.5% accuracy with a diverse set of more than 18,000 features, performing better than blacklists alone. Future work includes scaling the approach for deployment and incorporating web page content analysis.
2. Who am I?
• Information security investigator @ Cisco.
• Completed M.Tech from IIT Jodhpur in 2014.
• Areas of interest include machine learning, computer vision and A.I.
• Email: satyamiitj89@gmail.com
6. Problem in a Nutshell
URL features to identify malicious Web sites
No context, no content
Different classes of URLs
Benign, spam, phishing, exploits, scams...
For now, distinguish benign vs. malicious
Example URLs: facebook.com, fblight.com
8. State of the Practice
Current approaches
Blacklists [SORBS, URIBL, SURBL, Spamhaus]
Learning on hand-tuned features [Garera et al, 2007]
Limitations
Cannot predict unlisted sites
Cannot account for new features
Arms race: Fast feedback cycle is critical
More automated approach?
10. Data Sets
Malicious URLs
5,000 from PhishTank (phishing)
15,000 from Spamscatter (spam, phishing, etc)
Benign URLs
15,000 from Yahoo Web directory
15,000 from DMOZ directory
Malicious x Benign → 4 Data Sets
30,000 – 55,000 features per data set
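The four data sets follow from pairing each benign source with each malicious source. A minimal sketch of that pairing, using the source names and counts from the slide; the dictionary layout, and the assumption that each data set uses every URL from both of its sources, are illustrative only:

```python
from itertools import product

# URL counts per source, as listed on the slide.
malicious_sources = {"PhishTank": 5_000, "Spamscatter": 15_000}
benign_sources = {"Yahoo": 15_000, "DMOZ": 15_000}

# Each (benign, malicious) pair forms one labeled data set.
# Assumption: a data set contains all URLs from both of its sources.
data_sets = {
    (benign, malicious): benign_sources[benign] + malicious_sources[malicious]
    for benign, malicious in product(benign_sources, malicious_sources)
}

for (benign, malicious), total in data_sets.items():
    print(f"{benign} + {malicious}: {total} URLs")
```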
11. Algorithms
Logistic regression w/ L1-norm regularization
Other models
Naive Bayes
Support vector machines (linear, RBF kernels)
Implicit feature selection
Easier to interpret
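To make "implicit feature selection" concrete, here is a small sketch of L1-regularized logistic regression trained with proximal gradient descent (a gradient step on the log-loss followed by soft-thresholding). It is not the implementation behind these slides: the function names, hyperparameters, and the four-row toy data set are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def train_l1_logistic(X, y, l1=0.1, lr=0.5, epochs=500):
    """Proximal gradient descent for logistic regression with an L1 penalty on the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the log-loss, then the L1 proximal (soft-threshold) step.
        w = [soft_threshold(wj - lr * gj, lr * l1) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b  # the bias is left unregularized
    return w, b


# Toy data (made up): column 0 is an informative binary feature,
# column 1 is a binary feature unrelated to the label.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 1, 0, 0]

weights, bias = train_l1_logistic(X, y)
print(weights, bias)
```

On this toy data the weight of the uninformative column stays at exactly zero while the informative column keeps a positive weight. That is the sense in which the L1 penalty selects features, and why the surviving non-zero weights are easier to inspect.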
13. Features to consider?
1) Blacklists
2) Simple heuristics
3) Domain name registration
4) Host properties
5) Lexical
14. (1) Blacklist Queries
List of known malicious sites
Providers: SORBS, URIBL, SURBL, Spamhaus
Blacklist queries as features
Example: http://www.bfuduuioo1fp.mobi → In blacklist? Yes
Example: http://fblight.com → In blacklist? No
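One way to read "blacklist queries as features" is one binary feature per provider. The sketch below assumes local snapshots of each list held as Python sets; real providers are queried remotely, which is not attempted here. The set contents are placeholders chosen only to mirror the two example URLs on the slide, not real listing data, and `blacklist_features` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

# Placeholder snapshots (not real listing data): which provider lists which
# host is invented purely so the example has something to look up.
BLACKLISTS = {
    "SORBS": {"www.bfuduuioo1fp.mobi"},
    "URIBL": {"www.bfuduuioo1fp.mobi"},
    "SURBL": set(),
    "Spamhaus": {"www.bfuduuioo1fp.mobi"},
}


def blacklist_features(url, blacklists=BLACKLISTS):
    """Return one 0/1 feature per provider: is the URL's hostname on that list?"""
    host = urlsplit(url).hostname or ""
    return {f"in_{name}": int(host in listed) for name, listed in blacklists.items()}


print(blacklist_features("http://www.bfuduuioo1fp.mobi"))
print(blacklist_features("http://fblight.com"))
```

Keeping one feature per provider, rather than a single "listed anywhere" flag, lets the classifier learn a separate weight for each list.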
15. (2) Manually-Selected Features
Considered by previous studies
IP address in hostname?
e.g. http://72.23.5.122/www.bankofamerica.com/
Number of dots in URL
e.g. http://www.bankofamerica.com.qytrpbcw.stopgap.cn/
WHOIS (domain name) registration date
e.g. stopgap.cn registered 28 June 2009
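The first two heuristics can be computed from the URL string alone. A minimal sketch using only the Python standard library; `heuristic_features` and the feature names are hypothetical:

```python
import ipaddress
from urllib.parse import urlsplit


def heuristic_features(url):
    """Two hand-picked features: is the hostname a raw IP address, and how many dots does the URL contain?"""
    host = urlsplit(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0
    return {"host_is_ip": host_is_ip, "num_dots": url.count(".")}


print(heuristic_features("http://72.23.5.122/www.bankofamerica.com/"))
print(heuristic_features("http://www.bankofamerica.com.qytrpbcw.stopgap.cn/"))
```

Both example URLs embed the brand name www.bankofamerica.com, once in the path behind a raw IP address and once as a subdomain prefix of stopgap.cn, which is the kind of pattern these features are meant to pick up.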
16. (3) WHOIS Features
Domain name registration
Date of registration, update, expiration
Registrant: Who registered domain?
Registrar: Who manages registration?
Example: the following domains were all registered on 29 June 2009 by SpamMedia
http://sleazysalmon.com
http://angryalbacore.com
http://mangymackerel.com
http://yammeringyellowtail.com
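A sketch of how WHOIS data might become features, assuming the record has already been fetched and parsed into a dictionary. No real WHOIS lookup is performed, and the record layout, the `whois_features` helper, and the observation date are hypothetical.

```python
from datetime import date


def whois_features(record, observed_on):
    """Turn a parsed WHOIS record into a numeric age plus a one-hot registrant feature."""
    return {
        "domain_age_days": (observed_on - record["registered_on"]).days,
        f"registrant={record['registrant']}": 1,
    }


# Registration date and registrant as shown on the slide; the observation
# date is an assumption made for this example.
record = {"registered_on": date(2009, 6, 29), "registrant": "SpamMedia"}
features = whois_features(record, observed_on=date(2009, 7, 9))
print(features)
```

Because the four example domains share a registration date and registrant, they would all produce the same feature values here, so a weight learned from one of them carries over to the others.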
17. (4) Host-Based Features
Blacklisted? (SORBS, URIBL, SURBL, Spamhaus)
WHOIS: registrar, registrant, dates
IP address: Which ASes/IP prefixes?
DNS: TTL? PTR record exists/resolves?
Geography-related: Locale? Connection speed?
Example IP prefixes: 75.102.60.0/22, 69.63.176.0/20
Example hosts: facebook.com, fblight.com
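The "which IP prefixes?" question can be sketched as one binary feature per known prefix. The two prefixes are the ones shown on the slide; the resolved IP addresses in the usage lines are made up for illustration, no DNS lookup is performed, and `prefix_features` is a hypothetical helper.

```python
import ipaddress

# Prefixes taken from the slide.
KNOWN_PREFIXES = [
    ipaddress.ip_network("75.102.60.0/22"),
    ipaddress.ip_network("69.63.176.0/20"),
]


def prefix_features(ip_string, prefixes=KNOWN_PREFIXES):
    """Return one 0/1 feature per prefix: does the host's IP address fall inside it?"""
    ip = ipaddress.ip_address(ip_string)
    return {f"in_prefix={net}": int(ip in net) for net in prefixes}


# Hypothetical resolved addresses, one inside each prefix.
print(prefix_features("69.63.176.13"))
print(prefix_features("75.102.61.5"))
```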
18. (5) Lexical Features
Tokens in URL hostname + path
Length of URL
Entropy of the domain name
http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll
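The three lexical features can be sketched as follows. The token delimiters, the feature names, and the choice to compute entropy over the full hostname are assumptions made for illustration rather than details taken from the slides.

```python
import math
import re
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text):
    """Shannon entropy, in bits per character, of the character distribution in text."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in Counter(text).values())


def lexical_features(url):
    """URL length, hostname entropy, and bag-of-words tokens from hostname and path."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    features = {"url_length": len(url), "host_entropy": shannon_entropy(host)}
    # Hostname and path tokens are kept in separate namespaces.
    for token in filter(None, host.split(".")):
        features[f"host_token={token}"] = 1
    for token in filter(None, re.split(r"[/?.=\-_]", parts.path)):
        features[f"path_token={token}"] = 1
    return features


url = "http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll"
for name, value in lexical_features(url).items():
    print(name, value)
```

Each distinct token becomes its own binary feature, which is one way a feature set can grow to tens of thousands of dimensions.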
21. Limitations
False positives
Sites hosted in disreputable ISP
Guilt by association
False negatives
Compromised sites
Free hosting sites
Hosted in reputable ISP
Future work: Web page content
22. Conclusion
Detect malicious URLs with high accuracy
Only using URL
Diverse feature set helps: 86.5% w/ 18,000+ features
Proof of concept working in lab
Future work
Scaling up for deployment
23. References
Ma, Justin, et al. "Beyond blacklists: learning to detect malicious web sites from suspicious URLs." Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2009.