2. Course Contents
• Introduction to Machine Learning and AI - 30 mins
• Introduction to Anomaly Detection - 30 mins
• Different Anomaly Detection Techniques - 40 mins
• Case studies from real-world scenarios - 30 mins
• Using anomaly detection in your work area - 30 mins
• Summary and wrap up - 20 mins
3. Artificial Intelligence
• Using computers to solve problems or make
decisions
• Strong AI
• Computers that reason and think at the level of
human beings
• Not there yet
• Also called Artificial General Intelligence (AGI);
Artificial Super Intelligence (ASI) would go beyond human level
• Weak AI
• Solve problems by detecting useful patterns
• Dominant mode of AI today
John McCarthy coined the term AI in 1955, in the proposal for the 1956 Dartmouth workshop
4. Machine Learning
• Study of algorithms and statistical models
• Perform a specific task
• Without using explicit instructions
• Relying on patterns and inference instead
• Machine learning algorithms build a mathematical model of sample data,
known as "training data"
• Make predictions or decisions without being explicitly programmed to perform
the task.
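As a minimal sketch of the idea above, the snippet below builds a model (a straight line) from training data and then uses it to predict an unseen value. The data and the names `training_data`, `fit_line` and `predict` are made up for illustration; ordinary least squares is just one simple choice of model.

```python
# Hypothetical training data: (hours studied, exam score) pairs, invented for this sketch.
training_data = [(1.0, 52.0), (2.0, 54.0), (3.0, 56.0), (4.0, 58.0), (5.0, 60.0)]


def fit_line(samples):
    """Learn a slope and intercept from (x, y) samples by ordinary least squares."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# "Training": the relationship is learned from the data, not written by hand.
model = fit_line(training_data)

# "Prediction": the model is applied to an input it has not seen.
prediction = predict(model, 6.0)
print(model, prediction)
```

Nowhere in the code is the rule "score = 50 + 2 × hours" written down; it is recovered from the samples, which is the sense of "without being explicitly programmed".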
7. What is an anomaly?
• An anomaly is a single data instance, or a set of instances, that
differs significantly from the rest of the data.
• Could be caused by variability in measurement, experimental
errors or deliberate insertion
• Anomaly Detection - the process of finding the anomalies present in
the data for further analysis
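One simple way to make "differ significantly" concrete is to measure how many standard deviations a point lies from the mean. The sketch below is only an illustration: the readings are invented, `zscore_anomalies` is a hypothetical helper, and the threshold of 2.0 is an assumption rather than a value from this course.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` std devs from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: nothing can stand out.
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings with one obviously odd value at index 6.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 25.0, 10.1, 9.9, 10.0]
anomalies = zscore_anomalies(readings)
print(anomalies)
```

Note that an extreme value also inflates the mean and standard deviation it is compared against, so on small samples a lower threshold (or a more robust statistic) may be needed.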
8. Why is anomaly detection needed in the first place?
• Outliers that are not identified can drastically reduce the accuracy
of forecasts
• Important for businesses to identify patterns, detect anomalies and
use these alarms to take corrective measures before things go wrong
• Important tool for fraud detection, network intrusion detection,
surveillance and many more
9. ALGORITHMS FOR ANOMALY DETECTION
• Cluster based
• K-Means Clustering
• K-Medoids Clustering
• DBSCAN
• Non Cluster Based
• Isolation Forests
• Gaussian Distribution Approximation
• Histogram Based Outlier Detection
• Angle Based Outlier Detection
• Seasonal Decomposition
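To illustrate the cluster-based family, the sketch below runs a plain k-means on one-dimensional data and flags points that sit unusually far from their nearest centroid. It is a simplified illustration, not the implementation used for this course: `kmeans_1d`, `distance_anomalies`, the data, the deterministic initialisation and the cutoff of 3 times the average distance are all assumptions.

```python
def kmeans_1d(values, k, iterations=20):
    """Plain k-means on 1-D data, started from evenly spaced sorted values."""
    ordered = sorted(values)
    step = (len(ordered) - 1) / (k - 1) if k > 1 else 0
    centroids = [ordered[round(i * step)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids


def distance_anomalies(values, centroids, factor=3.0):
    """Flag values whose distance to the nearest centroid exceeds factor * mean distance."""
    distances = [min(abs(v - c) for c in centroids) for v in values]
    cutoff = factor * (sum(distances) / len(distances))
    return [v for v, d in zip(values, distances) if d > cutoff]


# Hypothetical data: two tight groups near 1 and 5, plus a stray point at 9.
data = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1, 9.0]
centroids = kmeans_1d(data, k=2)
outliers = distance_anomalies(data, centroids)
print(centroids, outliers)
```

The choice of `k` matters: with `k=3` the stray point would likely get a cluster of its own and no longer look distant from any centroid.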
11. Results …
Algorithm                              % of Anomalies
K-Means                                21.76%
K-Medoids                              19.7%
DBSCAN                                 12.01%
Gaussian Distribution Approximation    26.64%
Histogram Based Outlier Detection      13.13%
Isolation Forests                      10.13%
Angle Based Outlier Detection          30%
Seasonal Decomposition                 10.32%
12. Which algorithm is better to use?
• For the dataset considered, Seasonal Decomposition produced the best
results: the outliers it flagged were the points where the curve
suddenly peaked or dipped.
• We treat those points as anomalies because the data we have is
unlabelled, so we took sudden changes in value as
inconsistencies
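The idea of flagging sudden peaks and dips after removing trend and seasonality can be sketched as follows. This is a hand-rolled, simplified additive decomposition on synthetic data, not the procedure or library used to produce the results above; `decompose_residuals`, `residual_anomalies`, the odd period of 5 and the cutoff of 3 standard deviations are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def decompose_residuals(series, period):
    """Additive decomposition sketch; `period` is assumed odd so the window is centred."""
    half = period // 2
    idx = range(half, len(series) - half)
    # Trend: centred moving average over one full season.
    trend = {i: mean(series[i - half:i + half + 1]) for i in idx}
    detrended = {i: series[i] - trend[i] for i in idx}
    # Seasonal component: average detrended value at each position in the cycle.
    seasonal = {
        p: mean([d for i, d in detrended.items() if i % period == p])
        for p in range(period)
    }
    # Residual: what is left once trend and seasonality are removed.
    return {i: detrended[i] - seasonal[i % period] for i in idx}


def residual_anomalies(series, period, factor=3.0):
    """Return indices whose residual is more than `factor` std devs from zero."""
    residuals = decompose_residuals(series, period)
    sigma = pstdev(list(residuals.values()))
    return [i for i, r in residuals.items() if abs(r) > factor * sigma]


# Synthetic series: a repeating 5-step pattern around 10, with a spike injected at index 17.
pattern = [0.0, 2.0, 4.0, 2.0, 0.0]
series = [10.0 + pattern[i % 5] for i in range(30)]
series[17] += 8.0

flagged = residual_anomalies(series, period=5)
print(flagged)
```

The regular peaks of the seasonal pattern are not flagged, because they are absorbed by the seasonal component; only the point that departs from the usual cycle stands out in the residuals.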