Ista presentation-malicious url
1. Evaluating Deep Learning
Approaches to Characterize and
Classify Malicious URL’s
Vinayakumar R (1), K.P Soman (1) and Prabaharan Poornachandran (2)
(1) Centre for Computational Engineering and Networking (CEN), Amrita School of
Engineering, Coimbatore, Amrita Vishwa Vidyapeetham,
Amrita University, India.
(2) Center for Cyber Security Systems and Networks, Amrita School of Engineering,
Amritapuri, Amrita Vishwa Vidyapeetham,
Amrita University, India.
2. Outline
• Introduction
• Background information / Related works
• Proposed Method – Deep Learning
• Description of the data set and Results
• Summary
• Future Work
• References
3. Introduction
• Web services create new opportunities for people
to interact, but they also create new opportunities for
criminals.
• Almost all online threats have one thing in
common: they require the user to click on a
hyperlink or type in a website address.
• A malicious Uniform Resource Locator (URL),
also termed a malicious website, is a foundational
mechanism for many Internet criminal
activities such as phishing, spamming, identity
theft, financial fraud and malware.
4. Background information / Related works
• Blacklisting is the most commonly used
approach.
• Blacklisting cannot detect variations of known
malicious URLs or newly generated
URLs.
• Machine learning with feature
engineering is another commonly used
approach.
• Deep learning is a newer field of machine
learning that can learn an
optimal feature representation by taking the raw URL
as input [1].
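
The three approaches above differ mainly in what they do with the URL string. The following is a minimal Python sketch of that difference, not the method used in this work: the blacklist entry, the feature set, the character alphabet, and the names `is_blacklisted`, `handcrafted_features` and `encode_chars` are all hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist and character vocabulary; none of these values come from the slides.
BLACKLIST = {"http://paypa1-login.example.com/verify"}
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}  # 0 = padding, 1 = unknown


def is_blacklisted(url: str) -> bool:
    """Exact-match lookup: only URLs already on the list are caught."""
    return url in BLACKLIST


def handcrafted_features(url: str) -> dict:
    """A few illustrative hand-engineered features for a classical classifier."""
    host = urlsplit(url).hostname or ""
    return {
        "length": len(url),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_sign": "@" in url,
    }


def encode_chars(url: str, max_len: int = 40) -> list:
    """Map the raw URL to a fixed-length sequence of character ids (truncate, then pad)."""
    ids = [CHAR_TO_ID.get(ch, 1) for ch in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


known = "http://paypa1-login.example.com/verify"
variant = "http://paypa1-login.example.com/verify2"  # one extra character

print(is_blacklisted(known), is_blacklisted(variant))
print(handcrafted_features(variant))
print(encode_chars(variant)[:10])
```

The one-character variant slips past the exact-match lookup, while the hand-crafted features and the character ids can still be computed for it. In the deep learning setting, a sequence like the one returned by `encode_chars` would typically be fed to an embedding layer so that the network learns its own features.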
6. Description of the data set and Results
• Data set 1 - legitimate URLs from Alexa [2] and the
DMOZ directory [3], and malicious URLs from
Phishtank [4]
• Data set 2 - legitimate URLs from Alexa [2] and the
DMOZ directory [3], and malicious URLs from
MalwareURL [5]
10. Summary
• The effectiveness of machine learning and deep
learning approaches for detecting
and analyzing malicious URLs is reviewed.
• Deep learning approaches performed well in
comparison to the classical machine learning
algorithms.
• Deep learning approaches avoid manual,
hand-crafted feature engineering and are therefore
robust to drift in URLs
and to adversarial machine
learning settings.
11. Future Work
• We have not yet shown the inner mechanics of the
deep networks; this is one
future direction. It can be approached by transforming
the non-linear state to a linearized form and then
calculating and analyzing the eigenvalues and
eigenvectors of that form over time-steps [6].
• In a real-time scenario, obtaining adequate labeled
training data is often difficult.
One of the largest openly available labeled
URL training data sets contains 2.4 million URLs [7]. A
larger study is therefore required that moves from supervised
learning to semi-supervised and unsupervised learning
in deep learning mechanisms. This is
another significant future direction.
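
The linearization idea in the first future direction can be illustrated on a toy recurrent network. The sketch below is an assumption-laden example, not the analysis proposed by the authors: it uses a hypothetical 2-unit tanh network with made-up weights `W`, `U` and inputs, linearizes each state update through its Jacobian diag(1 - h_t^2) W, and reports only the eigenvalues (not the eigenvectors) at each time-step. For larger networks a numerical library's eigen-solver would replace the closed-form 2x2 formula.

```python
import cmath
import math

# Hypothetical 2-unit recurrent network; weights and inputs are made up for illustration.
W = [[0.5, -0.8], [0.8, 0.5]]   # recurrent weights
U = [1.0, 0.5]                  # input weights
inputs = [0.2, -0.1, 0.4, 0.0]  # toy input sequence


def step(h, x):
    """One non-linear state update: h_t = tanh(W h_{t-1} + U x_t)."""
    return [math.tanh(W[i][0] * h[0] + W[i][1] * h[1] + U[i] * x) for i in range(2)]


def jacobian(h_new):
    """Linearization of the update around the new state: diag(1 - h_t^2) W."""
    return [[(1.0 - h_new[i] ** 2) * W[i][j] for j in range(2)] for i in range(2)]


def eigenvalues_2x2(m):
    """Closed-form eigenvalues of a 2x2 matrix from its trace and determinant."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


h = [0.0, 0.0]
spectral_radii = []
for t, x in enumerate(inputs):
    h = step(h, x)
    lam1, lam2 = eigenvalues_2x2(jacobian(h))
    spectral_radii.append(max(abs(lam1), abs(lam2)))
    print(f"t={t} eigenvalues={lam1:.3f}, {lam2:.3f}")
```

Tracking the largest eigenvalue magnitude per time-step (`spectral_radii`) gives one simple view of whether the linearized dynamics contract or expand the state at that step.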
12. References
[1] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning,"
Nature, vol. 521, no. 7553, pp. 436-444, 2015.
[2] http://www.alexa.com/topsites
[3] http://www.dmoz.org/
[4] http://www.phishtank.com/
[5] http://www.malwareurl.com/
[6] R. Moazzezi, "Change-based population coding," PhD
thesis, UCL (University College London), 2011.
[7] K. Rieck, T. Krueger, and A. Dewald, "Cujo: efficient
detection and prevention of drive-by-download attacks,"
in Proceedings of the 26th Annual Computer Security
Applications Conference. ACM, 2010, pp. 31-39.