Nicola Pagni - Anomaly Detection in Elasticsearch

Anomaly Detection in Elasticsearch

About Me
Nicola Pagni
nicolap@seacom.it
LinkedIn Profile

Elasticsearch
• Distributed data storage & information retrieval platform
• Search engine / aggregation engine / analytics
• Suggestions / percolation / highlighting / geo
• Document store
• Horizontally distributed
• Real time & near real time
• Apache License
• Proprietary plugins available for security & messaging
• Open Source

Elasticsearch
• Simple RESTful API
• Fast document retrieval
• Versatile - Many popular use cases:
‒ Log data analysis
‒ Full text search
‒ Analytics & Aggregations
‒ Data visualization w/ Kibana
‒ Alerting & classification w/ Percolator
‒ Suggestion engine

Elasticsearch
Some key terms:
• A Cluster is the largest container in Elasticsearch
• A Node is a running instance of Elasticsearch
• An Index is a lightweight container for data
• A Document can be “indexed”; the result goes into an “index” and is then searchable
• A Shard is a single piece of an Elasticsearch index

Elasticsearch
RESTful API to interact with Elasticsearch
GET - PUT - POST - HEAD - DELETE
Manage Elasticsearch through API calls
Get and set configurations, mappings, templates, aliases, and index settings
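
A minimal sketch of what such calls can look like from Python, using only the standard library. It builds the requests without sending them. The address `http://localhost:9200` and the index name `web-logs` are assumptions for illustration; the helper `build_request` is hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local, unsecured node


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(
        ES_URL + path, data=data, headers=headers, method=method
    )


# "web-logs" is a hypothetical index name.
requests_to_send = [
    build_request("HEAD", "/web-logs"),           # does the index exist?
    build_request("GET", "/web-logs/_settings"),  # read index settings
    build_request("GET", "/web-logs/_mapping"),   # read field mappings
    build_request("PUT", "/web-logs/_settings",   # change a dynamic setting
                  {"index": {"number_of_replicas": 1}}),
    build_request("DELETE", "/web-logs"),         # delete the index
]

for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

To actually send one of these against a running node, pass it to `urllib.request.urlopen(req)`.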

Elastic Stack
• Kibana
• Elasticsearch
• Beats / Logstash
• X-Pack: Security, Alerting, Monitoring, Reporting, Graph, Machine Learning

Elastic Stack

• Security
• Alerting
• Machine Learning
• Monitoring
• Graph
• Reporting

Elastic Stack
• Anomaly Detection
• Autonomous cars
• Voice Recognition
• Fraud detection
• Speech Recognition
• Language Translation
• Entity Resolution
• Predictive Medicine
• Learn to Rank
• Recommendations
• Image Classification

Terminology
• Machine Learning
‒ Broad term, but X-Pack Machine Learning is automated anomaly detection for time-series data (for now).
• Anomaly Detection
‒ Discovery of what’s “weird” or “different”, not what’s “bad”
• Unsupervised Learning
‒ Learning without human-labeled examples (without being “taught”)
• Bayesian
‒ An approach based on probability in which prior results are used to calculate probabilities of certain present or future events
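
A minimal sketch of the Bayesian idea, assuming a Beta prior over an error rate that is updated with newly observed counts. This illustrates "prior results inform present probabilities" in general; it is not a description of the model X-Pack uses, and all numbers are made up.

```python
from fractions import Fraction


def update_beta(alpha, beta, errors, oks):
    """Conjugate update: prior Beta(alpha, beta) plus new counts -> posterior."""
    return alpha + errors, beta + oks


def beta_mean(alpha, beta):
    """Expected value of a Beta(alpha, beta) distribution."""
    return Fraction(alpha, alpha + beta)


# Prior belief: roughly 1 request in 10 fails, encoded as Beta(1, 9).
alpha, beta = 1, 9
print(beta_mean(alpha, beta))  # prior estimate of the error rate

# New evidence: 3 errors and 7 successful requests.
alpha, beta = update_beta(alpha, beta, errors=3, oks=7)
print(beta_mean(alpha, beta))  # posterior estimate, pulled toward the new data
```

The posterior then serves as the prior for the next batch of observations.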

What’s “unusual” / “anomalous”?
• Deviations in count/values
• Rare Events
• Unusual vs. Population
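
As an illustration of the "unusual vs. population" case, here is a minimal sketch that flags entities whose value is far from the population median. It assumes a modified z-score based on the median absolute deviation (MAD), a common robust heuristic chosen here for simplicity rather than the technique the product uses. The function name, threshold, and data are hypothetical.

```python
from statistics import median


def unusual_vs_population(values_by_entity, threshold=3.5):
    """Return entities whose value is far from the population median.

    Uses the modified z-score 0.6745 * |x - median| / MAD.
    """
    values = list(values_by_entity.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread in the population: nothing to compare against
    return sorted(
        name
        for name, v in values_by_entity.items()
        if 0.6745 * abs(v - med) / mad > threshold
    )


# Hypothetical data: requests per hour for each client.
requests_per_client = {"a": 100, "b": 110, "c": 95, "d": 105, "e": 900}
print(unusual_vs_population(requests_per_client))
```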

How does anomaly detection work end to end?
now=T
Components: index, analysis, baseline models
1. get historical data
2. auto-create baselines
3. persist baselines

End-to-End
now=T+t
Components: index, results index
1. read baselines
2. get new data
3. analyze vs. baseline
4. update baselines

End-to-End
now=T+2t
Components: index, results index
1. read baselines
2. get new data
3. analyze vs. baseline
4. update baselines
alert!
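
The cycle above can be sketched as follows. This is a minimal illustration, assuming the baseline is just a running mean and variance (Welford's algorithm) and that "analyze vs. baseline" is a z-score check. The real product uses more sophisticated models; the `Baseline` class, the threshold, and the data are hypothetical, and persisting to an index is replaced by an in-memory object.

```python
import math
from dataclasses import dataclass


@dataclass
class Baseline:
    """Running mean/variance of a metric (Welford's algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def zscore(self, x):
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        return 0.0 if std == 0 else (x - self.mean) / std


def process_bucket(baseline, value, threshold=3.0):
    """Analyze a new value against the baseline, then update the baseline."""
    is_anomaly = abs(baseline.zscore(value)) > threshold
    baseline.update(value)
    return is_anomaly


# now=T: build the baseline from historical data (made-up numbers).
baseline = Baseline()
for v in [100, 102, 98, 101, 99, 100, 103, 97]:
    baseline.update(v)

# now=T+t and now=T+2t: score each new bucket, then fold it into the baseline.
results = [process_bucket(baseline, v) for v in [101, 250]]
print(results)  # the second bucket is the one that would raise the alert
```

Because the baseline is updated after every bucket, what counts as normal keeps evolving with the data.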

Anomaly Detection Complements Rules
• Rules are not great for defining normal / unusual
• Rules for “normalcy” don’t evolve with data / infrastructure
What’s the right threshold?

Unsupervised Machine Learning
• Automatically baseline the normal and detect what isn’t

Accelerate root cause analysis w/ Influencer detection

Continuously evolve with online learning techniques

Anomalies in temporal pattern
• Single (univariate) time series
Example: Is there unusual traffic on the website?

Anomalies in temporal pattern
• Multiple time series
‒ Multiple metrics
‒ Single metric split by a field
• Each series modeled independently
Example: Is there unusual web activity from any country?
[Chart: Metric vs. Time]
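
A minimal sketch of "single metric split by a field, each series modeled independently". It assumes each series is scored with a simple z-score against its own history, which stands in for the product's actual models. The field name `country`, the function names, and the numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def split_by_field(events, field):
    """Group metric values by the given field, preserving time order."""
    series = defaultdict(list)
    for event in events:
        series[event[field]].append(event["value"])
    return series


def unusual_series(series, threshold=3.0):
    """Score the latest point of each series against that series' own history."""
    flagged = []
    for key, values in series.items():
        history, latest = values[:-1], values[-1]
        if len(history) < 2:
            continue  # not enough data to model this series yet
        sd = stdev(history)
        if sd > 0 and abs(latest - mean(history)) / sd > threshold:
            flagged.append(key)
    return sorted(flagged)


# Hypothetical hourly request counts per country.
events = (
    [{"country": "IT", "value": v} for v in [50, 52, 49, 51, 50]]
    + [{"country": "US", "value": v} for v in [500, 510, 495, 505, 900]]
)
print(unusual_series(split_by_field(events, "country")))
```

Modeling each series on its own means that 500 requests per hour is normal for one country and would be highly unusual for another, which a single global threshold cannot express.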

Use Case: IT Operational Analytics
Using rules & queries
Get notified when:
• Free disk space goes below 5%
• Elasticsearch cluster health is red
Using anomaly detection
Get notified when:
• Unexpected spike in error rates
• Unusual server CPU activity

Use Case: Security Analytics
Using rules & queries
Get notified when:
• > 5 failed logins on a machine in 5 min
• Process X starts on any server
Using anomaly detection
Get notified when:
• Login at unusual time / location
• Anomalous outbound data transfer

Use Case: Application Monitoring
Using rules & queries
Get notified when:
• App response time exceeds SLA
• Active connections exceed threshold
Using anomaly detection
Get notified when:
• Unusual spike/dip in inactive users
• Sudden drop in app performance

Use Case: Marketing Analytics
Using rules & queries
Get notified when:
• Weekly activity for a user drops > 30%
• Activity on flagged account
Using anomaly detection
Get notified when:
• Sudden spike in visitors from a city
