Applying ML for Log Analysis

Applying AI for Log Analysis
July 2017

Hi!
Ronny Lehmann
CTO & Founder – Loom Systems
Formerly 8200, BioCatch
Machine-Learning | High-performance Cloud-Computing
@ronnyle_mann

Founded in April 2015
30 people (5 in San Francisco)
Bootstrapped for the first 2 years, recently funded
Hiring very much

Today’s Big-Data Bottleneck:
You are.
2000s Big-Data Bottlenecks:
✓ Storing
✓ Querying
✓ Real-time processing

Good dev(ops) people are hard to find
Employee tenure is very low (< 3 years; source: PayScale)
Operations is Tribal Knowledge
Machines are very loyal, never ask for a raise and have excellent memory. Can (some) of this be done with machines?

Skeptical?!
➜ “I’ve been hearing this for 20 years”
Total Recall, a movie based on a short story from 1966, featured a self-driving car as science fiction.
If Artificial Intelligence has matured enough to drive your car, it can probably also help with your IT.

BOTS
Superior at bottom-up tasks
• Real-time trend detection
• Pattern Recognition
• Large Dimensionality
• Complex State
• Strict Methodology
HUMANS
Good at top-down tasks
• Deep reasoning
• Contextual thinking
But humans also get:
• Tired
• Bored
• Lazy
• Frustrated
• Married

That’s what we do @ Loom Systems
AIOps - Algorithmic IT operations
Use Big Data and Machine Learning Technologies to Achieve a Data-Centric Approach to Availability and Performance Monitoring.
Extend the Data-Centric Approach to Other ITOM (IT Operations Management) Disciplines, and Seek to Exploit the Linkages It Allows Between ITOM, SIEM and Business Analytics.

Cracking the science behind data-science
Action
• Remedy
• Recommendation
• Insight
• Knowledge
Root-Cause Analysis
• Aggregation
• Correlation
• Causality
Data Modelling
• Visualizations
• Define KPIs
• Reporting
• Rules & Thresholds
Data Preparation
• Collection
• Normalization
• Sanitizing
• Preprocessing

Loom Ops – real-time AIOps
• Processing: Semi-structured -> Structured Data
• MLP & Pattern Recognition
• Measure-All Analysis
• Behavior Tracking
• Anomaly Detection & Trend Prediction
• Correlation Engine
• Alerting
• Incident Enrichment
• Insights Engine Routing

Three layers of context
Generic Context
Something being mentioned more than normal, or is appearing after long absence
Something stopped/started happening
Common Business Context
Semantically meaningful words (timeout, Trojan, failure)
Common Software
Proprietary Business Context
Names of business products, servers, applications...
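
The generic-context checks ("appearing after long absence", "stopped/started happening") can be sketched roughly as follows. This is only an illustration, assuming each log line has already been reduced to a pattern key; the class name `PatternTracker` and the one-week threshold are hypothetical and not taken from the talk.

```python
from datetime import datetime, timedelta


class PatternTracker:
    """Remembers when each log pattern was last seen (illustrative only)."""

    def __init__(self, absence_threshold=timedelta(days=7)):
        # Hypothetical threshold: how long a silence counts as "long absence".
        self.absence_threshold = absence_threshold
        self.last_seen = {}

    def observe(self, pattern, ts):
        """Classify one occurrence as 'new', 'reappeared' or 'seen'."""
        previous = self.last_seen.get(pattern)
        self.last_seen[pattern] = ts
        if previous is None:
            return "new"            # something started happening
        if ts - previous >= self.absence_threshold:
            return "reappeared"     # appearing after a long absence
        return "seen"

    def stopped(self, now):
        """Patterns that have been silent for at least the threshold."""
        return sorted(
            p for p, ts in self.last_seen.items()
            if now - ts >= self.absence_threshold
        )
```

A real system would presumably learn the threshold per pattern instead of fixing it globally.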

Sep 27 14:25:54 megatron sshd[7498]: WARN - Failed password for user ronny from 192.168.118.1 port 48278 ssh2
Processing
Generic context – rate of this pattern in the logs
Common Business Context –
➜ Contextual words (Warn, Failed)
➜ Common Entities (User, IP, ssh)
Proprietary Business Context –
➜ Server Name
Real-time Structuring, Clustering
Token & Entity Extraction and Classification
Histogram | megatron      | Server
Meter     | sshd          | Application
Meter     | ronny         | User
Meter     | 192.168.118.1 | source_IP
Random    | 48278         | source_port
Failed password for user [user] from [source_IP] port [source_port] ssh2
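
As a rough sketch of the structuring step for this one sshd line, the snippet below extracts the entities and rebuilds the template shown above. It uses hand-written regular expressions purely for illustration; the function name `structure` and the assumed line layout are hypothetical, and the talk describes automatic structuring rather than hand-made rules.

```python
import re

# Assumed layout: "<timestamp> <server> <app>[pid]: <LEVEL> - <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s"
    r"(?P<server>\S+)\s"
    r"(?P<application>\w+)\[\d+\]:\s"
    r"(?P<level>\w+)\s-\s"
    r"(?P<message>.*)$"
)
MESSAGE_RE = re.compile(
    r"for user (?P<user>\S+) "
    r"from (?P<source_IP>\d{1,3}(?:\.\d{1,3}){3}) "
    r"port (?P<source_port>\d+)"
)


def structure(line):
    """Return a dict of extracted fields plus a 'template', or None."""
    head = LINE_RE.match(line)
    if head is None:
        return None
    fields = head.groupdict()
    message = fields.pop("message")
    template = message
    body = MESSAGE_RE.search(message)
    if body:
        # Substitute right to left so earlier spans stay valid.
        for name in ("source_port", "source_IP", "user"):
            start, end = body.span(name)
            template = template[:start] + f"[{name}]" + template[end:]
            fields[name] = body.group(name)
    fields["template"] = template
    return fields
```

Lines that share a template can then be counted together, while the extracted values (server, user, source_IP) become separate signals to track.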

Automatic Structuring

- This is not (only) anomaly-detection (!)
Algorithms
3σ
Baseline
ARIMA
Feature extraction
Detection & Alerting
History
Scoring
Self Feedback
User Direct and Indirect Feedback
Detection
When tracking up to 1M signals, the system must automatically determine what kinds of detections are interesting for every signal (examples: website response time, ad-click rate).
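
Of the algorithms listed, the 3σ rule against a baseline is the simplest to illustrate. The sketch below is a generic textbook version, not the product's implementation; the function name `three_sigma_anomalies` and the window size are assumptions.

```python
from statistics import mean, stdev


def three_sigma_anomalies(values, window=20, sigmas=3.0):
    """Return (index, value) pairs that deviate from the mean of the
    preceding `window` points by more than `sigmas` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]   # rolling baseline
        mu = mean(history)
        sd = stdev(history)              # sample standard deviation
        # With a perfectly flat history (sd == 0) any change is flagged.
        if abs(values[i] - mu) > sigmas * sd:
            anomalies.append((i, values[i]))
    return anomalies
```

A fixed rule like this is rarely right for every signal, which is the point of the slide: with up to 1M signals, the choice of detector and sensitivity has to be made automatically per signal.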

Root-Cause Analysis
When something breaks, anomalies are everywhere. How do you know what to fix?

Root-Cause Analysis
When something breaks, everything starts complaining. How do you know what to fix?

Root-Cause Analysis
Automated Root-Cause Analysis: aggregating the detections, correlating them and determining causality between them.
How?
➜ Time-based causality
➜ Relationship-based analysis
➜ Graph-based analysis
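
One simple way to picture the time-based step: detections that occur close together are aggregated into one incident, and the earliest one is treated as a candidate cause. This is a hedged sketch only; `Detection`, `group_by_time`, `candidate_root_cause` and the 60-second gap are hypothetical, and temporal precedence is a heuristic, not proof of causality.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    timestamp: float  # seconds since some epoch
    signal: str       # name of the signal that misbehaved


def group_by_time(detections, max_gap=60.0):
    """Aggregate detections into incidents: a detection joins the current
    incident if it follows the previous detection within `max_gap` seconds."""
    incidents = []
    for det in sorted(detections, key=lambda d: d.timestamp):
        if incidents and det.timestamp - incidents[-1][-1].timestamp <= max_gap:
            incidents[-1].append(det)
        else:
            incidents.append([det])
    return incidents


def candidate_root_cause(incident):
    """Heuristic: the earliest detection in an incident is the first suspect."""
    return min(incident, key=lambda d: d.timestamp)
```

The relationship-based and graph-based analyses would then confirm or reject that first suspect.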

Examples

Sep 27 14:25:54 megatron sshd[7498]: WARN - Failed password for user ronny from 192.168.118.1 port…
Sep 27 14:25:54 megatron sshd[7498]: WARN - Failed password for user dror from 192.168.118.4 port…
Sep 27 14:25:54 megatron sshd[7498]: WARN - Failed password for user john from 192.168.118.14 port…
Sep 27 14:25:55 megatron sshd[7498]: WARN - Failed password for user dan from 192.168.118.121 port…
Sep 27 14:25:55 megatron sshd[7498]: WARN - Failed password for user gab from 192.168.118.51 port…
Sep 27 14:25:55 megatron sshd[7498]: WARN - Failed password for user anna from 192.168.118.66 port…
Sep 27 14:25:55 megatron sshd[7498]: WARN - Failed password for user dan from 192.168.118.123 port…
Sep 27 14:25:56 megatron sshd[7498]: WARN - Failed password for user jim from 192.168.118.133 port…
Sep 27 14:25:56 megatron sshd[7498]: WARN - Failed password for user nate from 192.168.118.201 port…
Sep 27 14:25:56 megatron sshd[7498]: WARN - Failed password for user stan from 192.168.118.194 port…
Sep 27 14:25:56 megatron sshd[7498]: WARN - Failed password for user paul from 192.168.118.144 port…
Sep 27 14:25:56 megatron sshd[7498]: WARN - Failed password for user avi from 192.168.118.81 port…
Sep 27 14:25:57 megatron sshd[7498]: WARN - Failed password for user stas from 192.168.118.54 port…
ronny is mentioned more than normal in the context of ssh failures
The context of ssh failures is mentioned more than normal
Root-Cause Analysis – Relationship Based
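
The two readings above (one entity is unusual within a context, or the context itself is unusual) can be told apart by comparing counts against a baseline. The following is a minimal sketch under assumed inputs: `explain_spike`, the baseline figures and the factor of 3 are all hypothetical.

```python
from collections import Counter


def explain_spike(users_now, baseline_total, baseline_per_user, factor=3.0):
    """users_now: user names taken from the ssh-failure lines in the current
    time window. Baselines are the expected counts for a window of that size."""
    findings = []
    # Is the whole context (ssh failures) above normal?
    if len(users_now) > factor * baseline_total:
        findings.append("context: ssh failures are mentioned more than normal")
    # Is any single entity above normal within that context?
    for user, count in sorted(Counter(users_now).items()):
        expected = max(baseline_per_user.get(user, 0.0), 1.0)
        if count > factor * expected:
            findings.append(f"entity: {user} is mentioned more than normal")
    return findings
```

If only the entity finding fires, the problem is probably with that user; if only the context finding fires, many users are failing at once and the cause is more likely the service itself.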

Root-Cause Analysis – Graph Based

Correlated Incidents


Countering Alert Flooding / Alert Fatigue
➜ Overall rate of incidents
➜ Quality of an incident
An incident report:
➜ Root-Cause Analysis
➜ History of similar incidents
➜ Insights & Recommendations
Incident Enrichments


Thank you!
(still hiring very much)
