Link to our Github:
https://github.com/zivdar001matin/web-crawler-detection
Authors:
Ahmad Etezadi, Matin Zivdar
In this project, we build a web crawler detection model that identifies anomalous requests in server logs.
Technologies Used: Machine Learning, MLflow, Scikit-learn, Tensorflow
2. Summary
● Anomaly detection
● Server Logs Dataset
● Pre Processing
● Models
● Evaluation
● API
3. Anomaly detection
Anomaly detection is one of the most popular machine learning techniques.
In this project, we are asked to identify abnormal behaviors in a system by
analysing logs collected in real time from an enterprise's log-aggregation systems.
5. Pre Processing
Feature Extraction:
Generation of features from data that are in a
format that is difficult to analyse directly.
Feature Transformation:
Transformation of the data:
encoding, normalization, etc.
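A minimal sketch of the encoding and normalization steps with scikit-learn. The feature names here (HTTP method, request count) are hypothetical examples of log-derived features, not the project's actual schema.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical log-derived features: HTTP method (categorical)
# and request count per session (numeric).
methods = np.array([["GET"], ["POST"], ["GET"], ["HEAD"]])
counts = np.array([[12.0], [3.0], [250.0], [1.0]])

# Encoding: turn categorical values into numeric columns.
encoder = OneHotEncoder()
encoded = encoder.fit_transform(methods).toarray()

# Normalization: rescale numeric features to zero mean, unit variance.
scaler = StandardScaler()
scaled = scaler.fit_transform(counts)

# Final feature matrix: one row per request.
features = np.hstack([encoded, scaled])
print(features.shape)
```

In a real pipeline these steps are usually fit on the training split only and then applied to new log records.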
11. PCA
PCA is an unsupervised machine learning algorithm
that attempts to reduce the dimensionality of the data.
Using PCA, you can reduce the dimensionality of the data
and then reconstruct it.
Since anomalies show the largest reconstruction error,
abnormalities can be found based on the error
between the original data and the reconstructed data.
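The reconstruction-error idea above can be sketched with scikit-learn's PCA. The synthetic data and the threshold-free scoring are illustrative assumptions, not the project's actual setup: PCA is fit on normal points, and a point far from the learned subspace reconstructs poorly.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Normal points lie on a 2-D plane embedded in 5-D space.
normal = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
# One outlier far from that plane.
outlier = np.full((1, 5), 8.0)
X = np.vstack([normal, outlier])

# Fit PCA on normal data, then project and reconstruct all points.
pca = PCA(n_components=2).fit(normal)
reconstructed = pca.inverse_transform(pca.transform(X))

# Per-point reconstruction error: the anomaly score.
errors = np.linalg.norm(X - reconstructed, axis=1)
print(int(np.argmax(errors)))  # index of the most anomalous point
```

In practice a threshold on the error (e.g. a high percentile of errors on normal traffic) decides which requests are flagged.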
14. Isolation Forest
Isolation forest works on the principle of the
decision tree algorithm.
Anomalies require fewer random partitions to
isolate than normal data points in the data set,
so anomalies are points that have a shorter
path in the tree.
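A short sketch with scikit-learn's `IsolationForest`, on made-up data rather than the project's server logs: the point that is easiest to isolate (shortest average path) receives the lowest score.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Normal points clustered near the origin, plus one far-away anomaly.
normal = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
anomaly = np.array([[10.0, 10.0]])
X = np.vstack([normal, anomaly])

# Fit the forest; shorter isolation paths mean lower scores.
forest = IsolationForest(random_state=0).fit(X)
scores = forest.score_samples(X)  # lower = more anomalous

print(int(np.argmin(scores)))  # index of the most anomalous point
```

`predict` can also be used to get hard labels (-1 for anomalies), with the `contamination` parameter controlling the decision threshold.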
19. Autoencoder
An autoencoder is a special type of neural
network that copies its input values to its
output values.
It does not require a target variable like
conventional supervised learning, thus it is
categorized as unsupervised learning.
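A minimal Keras autoencoder illustrating the idea: the input is also the training target, so no labels are needed. The layer sizes and synthetic data are assumptions for the sketch, not the model actually used in the project.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8)).astype("float32")

# Compress 8 input features to a 3-D bottleneck, then reconstruct
# the original 8 features at the output.
autoencoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(3, activation="relu"),    # encoder / bottleneck
    tf.keras.layers.Dense(8, activation="linear"),  # decoder
])
autoencoder.compile(optimizer="adam", loss="mse")

# Unsupervised: the input itself is the target.
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)

# Per-sample reconstruction error can serve as an anomaly score.
reconstructed = autoencoder.predict(X, verbose=0)
errors = np.mean((X - reconstructed) ** 2, axis=1)
print(errors.shape)  # one score per sample
```

As with PCA, requests whose reconstruction error exceeds a threshold learned from normal traffic are flagged as anomalous.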