SlideShare a Scribd company logo
1 of 28
Download to read offline
By Ahmad Etezadi &
Matin Zivdar
Log Anomaly
Detection
● Anomaly detection
● Server Logs Dataset
● Pre Processing
● Models
● Evaluation
● API
Summary
Anomaly detection
Anomaly detection is one of the most popular machine learning techniques.
In this project, we are asked to identify abnormal behaviors in a system, which relies on the
analysis of logs collected in real-time from the log aggregation systems of an enterprise.
Sample Log
IP [TIME] [Method Path] StatusCode ResponseLength [[UserAgent]] ResponseTime
Pre Processing
Feature
Extraction
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis sit
amet odio vel purus bibendum luctus.
Generation of features from data that are in a
format that is difficult to analyse directly.
Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Duis sit amet odio vel purus
bibendum luctus.
Feature
Transformation
Transformation of Data:
Encoding, Normalization, etc.
Extraction
tehran_traffic_statistics
is_spider
Pre Processing
url_depth is_phone
Extraction
URL
Depth
Iran Network Traffic Statistics
OS Family Device Brand
http://members.tehran-ix.ir/statistics/ixp/pkts
Transformation
One-hot
encoding
Normalization
time_weight
status_code
Pre Processing
method
Bucketing
response_length requested_file_type
Transformation
Bucketing Normalization One hot encoding
● Learning Algorithm
Unsupervised Learning
02
● K-means
● PCA
● Isolation Forest
● AutoEncoder
Supervised Learning
01
● k-NN
● Local Outlier Factor (LOF)
● SVM
● Neural Networks Based
Anomaly
Detection
Algorithm
PCA is an unsupervised machine learning algorithm
that attempts to reduce the dimensionality.
Using PCA, you can reduce the dimensionality data
and reconstruct the it.
Since anomaly show the largest reconstruction error,
abnormalities can be found based on the error
between the original data and the reconstructed data.
PCA
Anomaly Samples
Conclusion
Isolation forest works on the principle of the
decision tree algorithm.
Due to the fact that anomalies require fewer
random partitions than normal normal data
points in the data set. So anomalies are points
that have a shorter path in the tree.
Isolation Forest
Anomaly Samples
Conclusion
Comparison
Isolation
Forest
vs.
PCA
Number of Isolation Forest outlier: 124,091
Number of PCA outlier: 123,695
common data in PCA & Isolation Forest outlier: 20,512
Let's Dig Deeper
An autoencoder is a special type of neural
network that copies the input values to the
output values.
It does not require the target variable like the
conventional Y, thus it is categorized as
unsupervised learning.
Autoencoder
Sample of error reconstruction
Histogram of error by Autoencoder
Evaluation Models
Autoencoder PCA Isolation Forest
Follow the Rules!
- User agents
- Malicious IPs dataset
Model ROC Curve
API Example
We also tried:
- Extracting time series features
- Labeling all other requests for an
anomaly “IP” and “User Agent” as an
anomaly request
Thank You!

More Related Content

Similar to Web Crawler Detection Model

The Diabolical Developers Guide to Performance Tuning
The Diabolical Developers Guide to Performance TuningThe Diabolical Developers Guide to Performance Tuning
The Diabolical Developers Guide to Performance TuningjClarity
 
Apeman masta midih-oc2_demo_day
Apeman masta midih-oc2_demo_dayApeman masta midih-oc2_demo_day
Apeman masta midih-oc2_demo_dayMIDIH_EU
 
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...Spark Summit
 
Scalable AutoML for Time Series Forecasting using Ray
Scalable AutoML for Time Series Forecasting using RayScalable AutoML for Time Series Forecasting using Ray
Scalable AutoML for Time Series Forecasting using RayDatabricks
 
Nose Dive into Apache Spark ML
Nose Dive into Apache Spark MLNose Dive into Apache Spark ML
Nose Dive into Apache Spark MLAhmet Bulut
 
Java/Hybris performance monitoring and optimization
Java/Hybris performance monitoring and optimizationJava/Hybris performance monitoring and optimization
Java/Hybris performance monitoring and optimizationEPAM Lviv
 
Building Cogeneration Planning Scheduling Systems using IBM ILOG ODME, CPLEX ...
Building Cogeneration Planning Scheduling Systems using IBM ILOG ODME, CPLEX ...Building Cogeneration Planning Scheduling Systems using IBM ILOG ODME, CPLEX ...
Building Cogeneration Planning Scheduling Systems using IBM ILOG ODME, CPLEX ...Alkis Vazacopoulos
 
Dot Net Application Monitoring
Dot Net Application MonitoringDot Net Application Monitoring
Dot Net Application MonitoringRavi Okade
 
Introduction to Machine learning - DBA's to data scientists - Oct 2020 - OGBEmea
Introduction to Machine learning - DBA's to data scientists - Oct 2020 - OGBEmeaIntroduction to Machine learning - DBA's to data scientists - Oct 2020 - OGBEmea
Introduction to Machine learning - DBA's to data scientists - Oct 2020 - OGBEmeaSandesh Rao
 
Introduction to Machine Learning - From DBA's to Data Scientists - OGBEMEA
Introduction to Machine Learning - From DBA's to Data Scientists - OGBEMEAIntroduction to Machine Learning - From DBA's to Data Scientists - OGBEMEA
Introduction to Machine Learning - From DBA's to Data Scientists - OGBEMEASandesh Rao
 
How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration? How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration? Deepak Shankar
 
Ibm pure data system for analytics n200x
Ibm pure data system for analytics n200xIbm pure data system for analytics n200x
Ibm pure data system for analytics n200xIBM Sverige
 
Next generation alerting and fault detection, SRECon Europe 2016
Next generation alerting and fault detection, SRECon Europe 2016Next generation alerting and fault detection, SRECon Europe 2016
Next generation alerting and fault detection, SRECon Europe 2016Dieter Plaetinck
 
Mechanisms for Database Intrusion Detection and Response
Mechanisms for Database Intrusion Detection and ResponseMechanisms for Database Intrusion Detection and Response
Mechanisms for Database Intrusion Detection and ResponseAshish Kamra
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData
 
Combining out - of - band monitoring with AI and big data for datacenter aut...
Combining out - of - band monitoring with AI and big data  for datacenter aut...Combining out - of - band monitoring with AI and big data  for datacenter aut...
Combining out - of - band monitoring with AI and big data for datacenter aut...Ganesan Narayanasamy
 
Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon Web Services
 
Q con shanghai2013-罗婷-performance methodology
Q con shanghai2013-罗婷-performance methodologyQ con shanghai2013-罗婷-performance methodology
Q con shanghai2013-罗婷-performance methodologyMichael Zhang
 
Application of the Actor Model to Large Scale NDE Data Analysis
Application of the Actor Model to Large Scale NDE Data AnalysisApplication of the Actor Model to Large Scale NDE Data Analysis
Application of the Actor Model to Large Scale NDE Data AnalysisChrisCoughlin9
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Databricks
 

Similar to Web Crawler Detection Model (20)

The Diabolical Developers Guide to Performance Tuning
The Diabolical Developers Guide to Performance TuningThe Diabolical Developers Guide to Performance Tuning
The Diabolical Developers Guide to Performance Tuning
 
Apeman masta midih-oc2_demo_day
Apeman masta midih-oc2_demo_dayApeman masta midih-oc2_demo_day
Apeman masta midih-oc2_demo_day
 
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
 
Scalable AutoML for Time Series Forecasting using Ray
Scalable AutoML for Time Series Forecasting using RayScalable AutoML for Time Series Forecasting using Ray
Scalable AutoML for Time Series Forecasting using Ray
 
Nose Dive into Apache Spark ML
Nose Dive into Apache Spark MLNose Dive into Apache Spark ML
Nose Dive into Apache Spark ML
 
Java/Hybris performance monitoring and optimization
Java/Hybris performance monitoring and optimizationJava/Hybris performance monitoring and optimization
Java/Hybris performance monitoring and optimization
 
Building Cogeneration Planning Scheduling Systems using IBM ILOG ODME, CPLEX ...
Building Cogeneration Planning Scheduling Systems using IBM ILOG ODME, CPLEX ...Building Cogeneration Planning Scheduling Systems using IBM ILOG ODME, CPLEX ...
Building Cogeneration Planning Scheduling Systems using IBM ILOG ODME, CPLEX ...
 
Dot Net Application Monitoring
Dot Net Application MonitoringDot Net Application Monitoring
Dot Net Application Monitoring
 
Introduction to Machine learning - DBA's to data scientists - Oct 2020 - OGBEmea
Introduction to Machine learning - DBA's to data scientists - Oct 2020 - OGBEmeaIntroduction to Machine learning - DBA's to data scientists - Oct 2020 - OGBEmea
Introduction to Machine learning - DBA's to data scientists - Oct 2020 - OGBEmea
 
Introduction to Machine Learning - From DBA's to Data Scientists - OGBEMEA
Introduction to Machine Learning - From DBA's to Data Scientists - OGBEMEAIntroduction to Machine Learning - From DBA's to Data Scientists - OGBEMEA
Introduction to Machine Learning - From DBA's to Data Scientists - OGBEMEA
 
How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration? How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration?
 
Ibm pure data system for analytics n200x
Ibm pure data system for analytics n200xIbm pure data system for analytics n200x
Ibm pure data system for analytics n200x
 
Next generation alerting and fault detection, SRECon Europe 2016
Next generation alerting and fault detection, SRECon Europe 2016Next generation alerting and fault detection, SRECon Europe 2016
Next generation alerting and fault detection, SRECon Europe 2016
 
Mechanisms for Database Intrusion Detection and Response
Mechanisms for Database Intrusion Detection and ResponseMechanisms for Database Intrusion Detection and Response
Mechanisms for Database Intrusion Detection and Response
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
 
Combining out - of - band monitoring with AI and big data for datacenter aut...
Combining out - of - band monitoring with AI and big data  for datacenter aut...Combining out - of - band monitoring with AI and big data  for datacenter aut...
Combining out - of - band monitoring with AI and big data for datacenter aut...
 
Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)
 
Q con shanghai2013-罗婷-performance methodology
Q con shanghai2013-罗婷-performance methodologyQ con shanghai2013-罗婷-performance methodology
Q con shanghai2013-罗婷-performance methodology
 
Application of the Actor Model to Large Scale NDE Data Analysis
Application of the Actor Model to Large Scale NDE Data AnalysisApplication of the Actor Model to Large Scale NDE Data Analysis
Application of the Actor Model to Large Scale NDE Data Analysis
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
 

Recently uploaded

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Recently uploaded (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Web Crawler Detection Model