Machine Learning Concepts for Software Monitoring - Lior Redlus, Coralogix - DevOpsDays Tel Aviv 2016

Machine Learning concepts for
software monitoring
Lior Redlus
Co-founder and Chief Scientist
Coralogix

About Myself
• 31yr. Scientist at heart.
• B.Sc and M.Sc in Neuroscience and Information Processing (BIU)
• Co-founder and Chief Scientist @ Coralogix

About Coralogix
• A Machine Learning platform for software Log Analysis
• Log Management already included: indexing, querying, filtering,
alerting etc.
• Coralogix Analytics:
• Turns your data into patterns and flows
• Gives you deep insights on your system
• Automatically detects production problems

In this talk…
• We’ll explore some challenges in software logs today
• Have an overview of machine learning and some use cases
• Suggest a fully-automatic algorithm for anomaly detection in logs

Schedule:
• Logs today
• Machine Learning to the rescue!
• Types of Machine Learning
• Applying to Log Records
• Possible log analysis pipeline

Logs today (1)
• What do we use them for?
• Debugging
• Security
• Compliance
• User analytics
• and many more!
• Two use cases stand out:
• Production Monitoring (70%)
• Production Troubleshooting (67%)

Logs today (2)
• Open-source software accelerates
development
• Cloud enables massive scale
• Even small companies are
generating huge amounts of logs
• The growth is exponential!

Logs today (3)
• Log Management (and Big Data) approach:
1. Collect everything!
2. Don’t worry, we’ll know what to do when we need it
• Or will we..?

Logs today (4)
• The problem with Log Management:
• Humans do the analysis
• And humans are bad at…
• Identifying complex relationships
• Noticing small (but important) changes
• Staying 100% in focus all the time

Logs today (5)
• Too much time is wasted on FINDING issues
instead of FIXING them
• Most DevOps spend >70% of issue resolution
time just to find what went wrong!

Logs today (6)
• Problem:
Log Management does not have a “brain”
• Solution:
Give it a brain!
In other words: welcome to Log Analytics

Machine Learning to the rescue
• What is Machine Learning?
"Field of study that gives computers the ability to learn without being
explicitly programmed."
- Arthur Samuel, pioneer of Machine Learning, 1959

• Traditional coding:
• You have a model of the world
• You write code that explicitly represents this model
• The code behaves exactly as expected
• Need to manually update the code in a changing world

• Machine Learning:
• You have loose concepts about the world (or even none!)
• You write code that learns the data and builds models of the world
• The exact behavior of the code is not known, but generally works well
• Can automatically update the model as needed!
• How well?
• Much faster than humans
• Sometimes with better accuracy!

Types of Machine Learning - Supervised
• Supervised Learning:
• Uses data with clearly-defined output (“labeled data”)
• Machine learns explicitly through right and wrong answers
• Two main types:
• Regression – Predict continuous values based on sets of (correlated) data
• Classification – Predict the class of an item based on its properties

Types of Machine Learning – Regression (1)
• Regression 1 – Given the temperature and the amount of frozen yogurt sold
• Predict the temperature based on the amount of yogurt sold
• Linear regression:
[Figure: linear regression plot; axes: Temperature (F), Frozen yogurt sold (lbs)]
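A minimal sketch of the linear fit described above, using ordinary least squares in plain Python. The data points are made up for illustration (the talk's measurements are not in the transcript), and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical observations, not the talk's data.
yogurt_lbs = [10, 20, 30, 40]
temps_f = [62, 68, 81, 89]

slope, intercept = fit_line(yogurt_lbs, temps_f)
predicted_temp = slope * 25 + intercept  # temperature estimate for 25 lbs sold
print(f"temperature ~ {slope:.2f} * lbs + {intercept:.2f}")
```

Swapping the two lists predicts sales from temperature instead; the fitting code is the same either way.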

Types of Machine Learning – Regression (2)
• Regression 2 – Given cups of coffee sold per 10 minutes
• Predict how many cups are sold at any given time of day
• Linear regression:
• Polynomial regression:
[Figure: linear and polynomial regression plots; axes: Time of day (hours), Cups of coffee sold]
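To illustrate why a polynomial can fit where a straight line cannot, here is a small sketch that fits both to the same data by solving the least-squares normal equations. The hourly figures are invented (a single midday peak), and `poly_fit`, `solve` and `sse` are hypothetical helper names.

```python
def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def poly_fit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    size = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve(a, b)


def poly_eval(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def sse(coeffs, xs, ys):
    """Sum of squared errors of the fitted curve."""
    return sum((poly_eval(coeffs, x) - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: sales rise towards noon and fall afterwards.
hours = [6, 8, 10, 12, 14, 16, 18]
cups = [4, 24, 36, 40, 36, 24, 4]

linear = poly_fit(hours, cups, 1)
quadratic = poly_fit(hours, cups, 2)
print("linear SSE:", sse(linear, hours, cups))
print("quadratic SSE:", sse(quadratic, hours, cups))
```

On data shaped like this the straight line leaves a large error, while the degree-2 curve follows the peak.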

Types of Machine Learning – Classification (1)
• Classification: can we automatically identify the type of an iris?
• Assumption: we can differentiate iris types by their leaf sizes

• Classification: given leaf sizes of irises (Fisher’s data set, 1936)
• Predict the type of an iris based on its leaves

• Classification: Fisher’s iris data set
• Support Vector Machine (SVM) achieves 73% accuracy!
[Figure: iris samples plotted by Sepal Length (cm) and Sepal Width (cm), classified by an SVM with linear kernel; classes: setosa, versicolor, virginica]
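As a rough illustration of classification on two sepal features, the sketch below uses a nearest-centroid classifier. This is a deliberately simpler technique swapped in for the SVM on the slide, and the measurements are illustrative values, not rows from Fisher's data set.

```python
import math


def centroid(points):
    """Mean point of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train(labeled):
    """labeled: dict mapping class name -> list of feature tuples."""
    return {label: centroid(points) for label, points in labeled.items()}


def predict(model, sample):
    """Return the class whose centroid is closest to the sample."""
    return min(model, key=lambda label: math.dist(model[label], sample))


# Illustrative (sepal length cm, sepal width cm) pairs, labeled by hand.
training = {
    "setosa": [(5.0, 3.4), (4.8, 3.1), (5.2, 3.6)],
    "versicolor": [(6.0, 2.8), (5.7, 2.7), (6.2, 2.9)],
    "virginica": [(6.8, 3.1), (7.1, 3.0), (6.6, 3.0)],
}

model = train(training)
print(predict(model, (5.1, 3.5)))
print(predict(model, (6.9, 3.1)))
```

The labeled training set is what makes this supervised: the model only learns the classes it has been shown.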

Types of Machine Learning – Reinforcement (1)
• Reinforcement (reward-based) Learning:
• A set of rules defines interaction with the environment
• “Good” actions may grant rewards
• “Bad” actions may reduce rewards
• Machine tries to maximize this score
• Used in game bots, recommender systems etc.
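One of the simplest reward-maximizing setups is an epsilon-greedy multi-armed bandit, sketched below. It is only an illustration of the reward loop, not something the talk describes; the click rates, the `run_bandit` name and all parameters are assumptions.

```python
import random


def run_bandit(click_rates, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    counts = [0] * len(click_rates)
    values = [0.0] * len(click_rates)  # running mean reward per action
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(click_rates))  # explore
        else:
            action = max(range(len(values)), key=values.__getitem__)  # exploit
        # The environment grants a reward of 1 with the action's hidden click rate.
        reward = 1 if rng.random() < click_rates[action] else 0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
        total += reward
    return values, counts, total


# Hypothetical hidden click rates for three possible actions.
values, counts, total = run_bandit([0.2, 0.5, 0.8])
print("times each action was chosen:", counts)
```

The learner is never told which action is best; it infers that from the rewards it collects.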

• Recommender systems:
• Build profiles for items and for users
• Recommend an item to a user based on previous purchases
• Gain rewards when users click on recommended items
• Update profiles based on recommendations, ratings etc.

• Generally speaking, recommender systems offer similar things to
similar users:
[Figure: two similar users, Jim and Bob]
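The "similar things to similar users" idea can be sketched with set overlap between purchase histories. Jim and Bob come from the slide; the third user, the items, and the choice of Jaccard similarity are assumptions made for illustration.

```python
def jaccard(a, b):
    """Overlap between two sets: 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, purchases):
    """Suggest items bought by the most similar other user."""
    mine = purchases[user]
    others = [u for u in purchases if u != user]
    nearest = max(others, key=lambda u: jaccard(mine, purchases[u]))
    return sorted(purchases[nearest] - mine)


# Hypothetical purchase histories.
purchases = {
    "Jim": {"keyboard", "mouse", "monitor"},
    "Bob": {"keyboard", "mouse", "headset"},
    "Eve": {"novel", "bookmark"},
}

print(recommend("Jim", purchases))
```

Jim's history overlaps with Bob's and not with Eve's, so Jim is offered what Bob bought and he has not.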

Types of Machine Learning – Unsupervised (1)
• Problem:
• Supervised learning is good, but requires labeled data
• Most data in the world is not labeled, there’s no right/wrong answer
• Labeling requires human effort → tedious and expensive
• Unsupervised Learning:
• The machine automatically recognizes relationships in the data
• No right or wrong answers are given
• Many times used to enhance Supervised Learning

• Some approaches include:
• Clustering algorithms: k-means, k-nearest-neighbors etc.
• Anomaly detection of rare events
• Deep learning (for pretty much everything…)
• Deep Learning approach:
• Learn from a lot of non-labeled data
• Learn highly non-linear correlations (represent complex relationships)
• Surprisingly good results for many applications!

• Deep Learning: can we automatically cluster digits together?
• Data: 60,000 b/w 20x20 pixel images of hand-written digits
• Each image is “flattened” to a 1D vector of 400 floating point values [0..1]
[0.0, 0.0, 0.01, 0.07, 0.07, 0.07, 0.49, 0.65, 1.0, 0.97, …, 0.0, 0.0]
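The flattening step can be sketched as follows, assuming 8-bit grayscale pixels (0 to 255) stored as a list of rows; the sample image here is synthetic and `flatten` is a hypothetical helper name.

```python
def flatten(image, max_value=255):
    """Row-major flattening with pixel intensities scaled to [0, 1]."""
    return [pixel / max_value for row in image for pixel in row]


# Synthetic 20x20 image: all black except two pixels.
image = [[0] * 20 for _ in range(20)]
image[10][9] = 255
image[10][10] = 128

vector = flatten(image)
print(len(vector))
```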

• Image vectors are fed to the neural network
[0.0, 0.0, 0.01, 0.07, 0.07, 0.07, 0.49, 0.65, 1.0, 0.97, …, 0.0, 0.0]
[Figure: the image vector entering the layers of a neural network]

• The neural network automatically learns features of the images
• Each neuron “lights up” when it recognizes a feature in the previous layer
[Figure: neurons labeled with the features they detect: round edges, vertical lines, diagonal lines, etc.]

• The last layer recognizes highly complex features of the image: the digits!
• This method achieves an amazing 0.2% error rate in this task!
[0.0, 0.0, 0.01, 0.07, 0.97, …, 0.0, 0.0]
[Figure: the image vector passes through the network; output neurons labeled 3 and 1; Output: 1]

Applying to Log Records (1)
• Problems:
• Log data is very redundant
• Hard to find the important events
• Rare logs are a needle in the haystack
• Also:
• Actions in the system are represented by a series of log records
• But other logs interrupt the visual flow
• Tracing the logs of a complete action is hard

Applying to Log Records (2)
• Solutions:
• Identify log prototypes (“log templates”)
• Cluster logs which represent an action
• Alert when actions are incomplete or anomalous
• Notify about new errors which have never occurred before
And much more!

Log prototypes distribution – real-world
• The 10 most frequent logs make up ~60% of the data (!)
[Figure: frequency distribution of log prototypes; axes: Log Prototypes, Log Frequency]

Log prototypes distribution – real-world
[Figure annotations: “Show me statistics and correlate these” and “Alert me when these happen”]

Today’s schedule:
• Logs today
• Machine Learning to the rescue!
• Types of Machine Learning
• Possible log analysis pipeline

Log analysis pipeline - clustering
• Cluster log records (raw strings) into log prototypes:
I. Find a distance metric to compare log records
II. Create a new log type if the distance is too large
III. Find the variables within log types

Log 1: “Creating tag on Stream: -1 Position: 42”
Log 2: “Creating tag on Stream: 2 Position: 65”
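Steps I to III can be sketched with a simple token-by-token distance and greedy clustering. This is an illustrative simplification, not the talk's actual algorithm; the distance function, the 0.3 threshold, the third log line and all function names are assumptions.

```python
def distance(tokens_a, tokens_b):
    """Fraction of positions whose tokens differ (1.0 if lengths differ)."""
    if len(tokens_a) != len(tokens_b):
        return 1.0
    diffs = sum(a != b for a, b in zip(tokens_a, tokens_b))
    return diffs / len(tokens_a)


def cluster(records, threshold=0.3):
    """Greedy clustering: join the first close-enough cluster, else open a new one."""
    clusters = []  # each cluster is a list of token lists
    for record in records:
        tokens = record.split()
        for members in clusters:
            if distance(members[0], tokens) <= threshold:
                members.append(tokens)
                break
        else:
            clusters.append([tokens])
    return clusters


def template(members):
    """Positions whose token varies across members become {varN} placeholders."""
    out, var_count = [], 0
    for column in zip(*members):
        if len(set(column)) == 1:
            out.append(column[0])
        else:
            var_count += 1
            out.append("{var%d}" % var_count)
    return " ".join(out)


records = [
    "Creating tag on Stream: -1 Position: 42",
    "Creating tag on Stream: 2 Position: 65",
    "Connection closed by peer",  # hypothetical unrelated log
]

prototypes = [template(members) for members in cluster(records)]
for prototype in prototypes:
    print(prototype)
```

The two "Creating tag" records differ in 2 of 7 tokens, so they share a prototype; the differing positions are the variables.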

• Problem: comparing all log sub-strings is expensive!
• Solution: use heuristic distance methods
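[Figure: the record “Creating tag on Stream: -1 Position: 42” is split into tokens {Creating}, {tag}, {on}, {Stream:}, {-1}, …, which are passed through locality-sensitive hashing (LSH) to produce a bit-string hash such as 0011000010…0100; one hash is kept per record: Log 1 Hash, Log 2 Hash, …, Log n Hash]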
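The talk does not say which LSH scheme is used. As one possible instance, the sketch below uses SimHash over whitespace tokens: records that share most tokens tend to get hashes that differ in few bits, so comparing hashes by Hamming distance can stand in for comparing strings. The 64-bit width and the use of MD5 as the token hash are assumptions.

```python
import hashlib


def simhash(text, bits=64):
    """SimHash: each token votes +1/-1 on every bit; the sign of the sum sets the bit."""
    weights = [0] * bits
    for token in text.split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        for bit in range(bits):
            weights[bit] += 1 if (value >> bit) & 1 else -1
    return sum(1 << bit for bit in range(bits) if weights[bit] > 0)


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


log1 = "Creating tag on Stream: -1 Position: 42"
log2 = "Creating tag on Stream: 2 Position: 65"

h1, h2 = simhash(log1), simhash(log2)
print(format(h1, "064b"))
print("differing bits:", hamming(h1, h2))
```

Each record is hashed once, so finding near-duplicates no longer requires comparing every pair of raw strings.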

• Result: M raw log records → N log prototypes (N << M)
• M is in the billions; N is in the thousands
“Creating tag on Stream: -1 Position: 42”
“Creating tag on Stream: 2 Position: 65”
“Creating tag on Stream: {var1} Position: {var2}”

Log analysis pipeline – variable statistics
• Model distribution of variables within log prototypes
• Define anomaly boundaries
“Creating tag on Stream: {var1} Position: {var2}”
Variable   Values
var1       [-1 , 2 , … , 1]
var2       [42 , 65 , … , 53]
[Figure: distribution of variable values with the anomalous values marked]
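One simple way to define anomaly boundaries is mean ± k standard deviations over the values seen so far. The talk does not specify its model, so this rule, the choice of k = 3 and the sample history are assumptions.

```python
import statistics


def anomaly_bounds(values, k=3.0):
    """Return (low, high) as mean +/- k population standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return mean - k * std, mean + k * std


def is_anomalous(value, bounds):
    low, high = bounds
    return value < low or value > high


# Hypothetical history of {var2} for the "Creating tag" prototype.
observed_var2 = [42, 65, 53, 48, 57, 61, 45, 50]

bounds = anomaly_bounds(observed_var2)
print(bounds)
print(is_anomalous(500, bounds), is_anomalous(55, bounds))
```

A real variable may be categorical or heavy-tailed, in which case a different distribution model would be needed.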

Log analysis pipeline – sequence finding
• Find sequences of log prototypes that are statistically-related
• Independence assumption – if logs are unrelated, all pairs should
have the same probability
• Sequences with related logs will have higher counts, and break
the G-Test:

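[Figure: flow diagram of a purchase action as a chain of log prototypes numbered 1 to 7, with the number 5 shown twice. Node labels: Authenticate payment, Purchase request, Get cart from DB, Process DB response, Send response to client, Update BI system 1, Update BI system 2, Mark as complete]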
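The G-test mentioned on this slide can be sketched for a single pair of log prototypes A and B using a 2x2 table of counts: how often A was or was not followed by B. The table layout and the counts are hypothetical; the p-value uses the chi-square distribution with one degree of freedom.

```python
import math


def g_test_2x2(table):
    """G statistic and p-value (1 degree of freedom) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    total = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / total  # count expected under independence
            if observed > 0:
                g += 2.0 * observed * math.log(observed / expected)
    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p_value


# Rows: previous log is A / is not A. Columns: next log is B / is not B.
related = [[90, 10], [10, 890]]      # hypothetical: B almost always follows A
independent = [[10, 90], [90, 810]]  # hypothetical: B is equally likely after anything

g_rel, p_rel = g_test_2x2(related)
g_ind, p_ind = g_test_2x2(independent)
print(f"related: G={g_rel:.1f}, p={p_rel:.3g}")
print(f"independent: G={g_ind:.1f}, p={p_ind:.3g}")
```

A very small p-value rejects the independence assumption, which marks the pair as a candidate sequence.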

• Count all log sequences of length 2 (2-sequences)
• L1L2 will be a frequent 2-sequence
• We expect not to find any occurrences of L1L4
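Counting 2-sequences amounts to counting adjacent pairs in the stream of prototype IDs. A minimal sketch, with a made-up stream:

```python
from collections import Counter


def count_2_sequences(prototype_ids):
    """Count every adjacent pair in a stream of log prototype IDs."""
    return Counter(zip(prototype_ids, prototype_ids[1:]))


# Hypothetical stream in which the action L1 -> L2 -> L3 -> L4 repeats.
stream = ["L1", "L2", "L3", "L4", "L1", "L2", "L3", "L4", "L1", "L2"]

pairs = count_2_sequences(stream)
print(pairs[("L1", "L2")], pairs[("L1", "L4")])
```

In a real stream, logs from other actions are interleaved, which is why the counts are tested statistically instead of being read off directly.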

• After mapping all 2-sequences, normalize their scores:
• Subtract the average
• Divide by the standard deviation
• Try to lengthen all 2-sequences by one log to 3-sequences
$$ {S'}_i^{(k)} = \frac{\mathrm{Freq}\left(S_i^{(k)}\right) - \mu\left(S^{(k)}\right)}{\sigma\left(S^{(k)}\right)} $$
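The normalization is a standard score over the frequencies of all sequences of the same length, dividing by σ as in the slide's formula. A minimal sketch with invented counts; `normalize_scores` is a hypothetical name.

```python
import statistics


def normalize_scores(counts):
    """Map each sequence to (frequency - mean) / standard deviation."""
    freqs = list(counts.values())
    mean = statistics.fmean(freqs)
    std = statistics.pstdev(freqs)
    if std == 0:
        return {seq: 0.0 for seq in counts}
    return {seq: (freq - mean) / std for seq, freq in counts.items()}


# Hypothetical 2-sequence frequencies.
counts = {("L1", "L2"): 90, ("L2", "L3"): 85, ("L5", "L9"): 5, ("L7", "L2"): 4}

scores = normalize_scores(counts)
for seq, score in scores.items():
    print(seq, round(score, 2))
```

Normalizing per length makes the scores of 2-sequences and 3-sequences comparable when deciding whether to keep extending.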

• Repeat the process:
• For each k-sequence try to construct a longer (k+1)-sequence
• Stop when failing the G-Test or when the normalized score decreases:
• Save the k-sequence as valid (an action in the system)
$$ {S'}_i^{(k+1)} < {S'}_i^{(k)} $$

Log analysis pipeline
• Determine the ratio of each log within the sequence
• E.g. 1:1:1 is a 3-sequence where the ratio of each log prototype is the same
• In our example:
• 1:1:1:1:2:1:1, a 7-sequence with one log prototype expected twice as often
as the others
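The ratio can be derived by counting how often each prototype occurs within one occurrence of the sequence and reducing by the greatest common divisor. The prototype IDs 1 to 7 below are an assumed numbering in which prototype 5 appears twice.

```python
from collections import Counter
from functools import reduce
from math import gcd


def sequence_ratio(occurrence):
    """Reduced count of each prototype within one occurrence of a sequence."""
    counts = Counter(occurrence)
    divisor = reduce(gcd, counts.values())
    return {proto: n // divisor for proto, n in counts.items()}


# Assumed prototype IDs for the purchase action; 5 is logged twice.
occurrence = [1, 2, 3, 4, 5, 5, 6, 7]

ratio = sequence_ratio(occurrence)
ratio_text = ":".join(str(ratio[proto]) for proto in sorted(ratio))
print(ratio_text)
```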

Log analysis pipeline
• Alert about a sequence anomaly when ratio is distant enough from
the valid sequence, e.g. 𝑝 < 0.001
• Software is constantly changing – update all models all the time
• Of course, there is much more than we explored here!
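The talk does not say which test produces the p-value for a ratio anomaly. As one possibility, the sketch below checks a single prototype's share of a window of logs against its expected share, using a two-sided normal approximation to the binomial. The window counts and the function name are hypothetical.

```python
import math


def ratio_p_value(observed, expected_ratio, prototype):
    """Two-sided normal-approximation p-value for one prototype's share."""
    n = sum(observed.values())
    share = expected_ratio[prototype] / sum(expected_ratio.values())
    expected = n * share
    std = math.sqrt(n * share * (1.0 - share))
    z = (observed[prototype] - expected) / std
    return math.erfc(abs(z) / math.sqrt(2.0))


expected_ratio = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 1, 7: 1}

# Hypothetical counts per prototype in two monitoring windows.
healthy = {1: 100, 2: 100, 3: 100, 4: 100, 5: 200, 6: 100, 7: 100}
broken = {1: 100, 2: 100, 3: 100, 4: 100, 5: 120, 6: 100, 7: 100}

p_healthy = ratio_p_value(healthy, expected_ratio, 5)
p_broken = ratio_p_value(broken, expected_ratio, 5)
print(p_healthy, p_broken)
if p_broken < 0.001:
    print("ALERT: prototype 5 deviates from the valid sequence ratio")
```

Because the expected ratio is itself a learned model, it would need to be re-estimated as the software changes.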

Summary
• Everyone will analyze their Big Data – including logs
• Hard to do by yourself – but extremely rewarding!
• Most importantly:
You can focus on your product instead of its bugs

Questions?
• Please feel free to contact me directly:
Lior Redlus, Chief Scientist, lior@coralogix.com
http://www.coralogix.com
