Machine Learning Concepts for Software Monitoring - Lior Redlus, Coralogix - DevOpsDays Tel Aviv 2016

"Cloud environments and Open Source software have lowered the bar for anyone to implement software solutions.
Complex relationships between system components are frequently missed by the human eye, and small but important changes are neglected. This, along with the sheer amount of monitoring data, calls for a new approach."

Published in: Technology


  1. 1. Machine Learning concepts for software monitoring Lior Redlus Co-founder and Chief Scientist Coralogix
  2. 2. About Myself • 31 years old. Scientist at heart. • B.Sc. and M.Sc. in Neuroscience and Information Processing (BIU) • Co-founder and Chief Scientist @ Coralogix
  3. 3. About Coralogix • A Machine Learning platform for software Log Analysis • Log Management already included: indexing, querying, filtering, alerting etc. • Coralogix Analytics: • Turns your data into patterns and flows • Gives you deep insights on your system • Automatically detects production problems
  4. 4. In this talk… • We’ll explore some challenges in software logs today • Have an overview of machine learning and some use cases • Suggest a fully-automatic algorithm for anomaly detection in logs
  5. 5. Schedule: • Logs today • Machine Learning to the rescue! • Types of Machine Learning • Applying to Log Records • Possible log analysis pipeline
  6. 6. Logs today (1) • What do we use them for? • Debugging • Security • Compliance • User analytics • and many more! • Two use cases stand out: • Production Monitoring (70%) • Production Troubleshooting (67%)
  7. 7. Logs today (2) • Open-source software accelerates development • Cloud enables massive scale • Even small companies are generating huge amounts of logs • The growth is exponential!
  8. 8. Logs today (3) • Log Management (and Big Data) approach: 1. Collect everything! 2. Don’t worry, we’ll know what to do when we need it • Or will we..?
  9. 9. Logs today (4) • The problem with Log Management: • Humans do the analysis • And humans are bad at… • Identifying complex relationships • Noticing small (but important) changes • Staying 100% in focus all the time
  10. 10. Logs today (5) • Too much time is wasted on FINDING issues instead of FIXING them • Most DevOps spend >70% of issue resolution time just to find what went wrong!
  11. 11. Logs today (6) • Problem: Log Management does not have a “brain” • Solution: Give it a brain! In other words: welcome to Log Analytics
  12. 12. Schedule: • Logs today • Machine Learning to the rescue! • Types of Machine Learning • Applying to Log Records • Possible log analysis pipeline
  13. 13. Machine Learning to the rescue • What is Machine Learning? “Field of study that gives computers the ability to learn without being explicitly programmed.” - Arthur Samuel, pioneer of Machine Learning, 1959
  14. 14. Machine Learning to the rescue • Traditional coding: • You have a model of the world • You write code that explicitly represents this model • The code behaves exactly as expected • Need to manually update the code in a changing world
  15. 15. Machine Learning to the rescue • Machine Learning: • You have loose concepts about the world (or even none!) • You write code that learns the data and builds models of the world • The exact behavior of the code is not known, but generally works well • Can automatically update the model as needed! • How well? • Much faster than humans • Sometimes with better accuracy!
  16. 16. Schedule: • Logs today • Machine Learning to the rescue! • Types of Machine Learning • Applying to Log Records • Possible log analysis pipeline
  17. 17. Types of Machine Learning - Supervised • Supervised Learning: • Uses data with clearly-defined output (“labeled data”) • Machine learns explicitly through right and wrong answers • Two main types: • Regression – Predict continuous values based on sets of (correlated) data • Classification – Predict the class of an item based on its properties
  18. 18. Types of Machine Learning – Regression (1) • Regression 1 – Given the temperature and yogurt sold • Predict the temperature based on amount of yogurt sold • Linear regression: [Figure: linear regression plot; axes: Temperature (F), Frozen yogurt sold (lbs)]
  19. 19. Types of Machine Learning – Regression (2) • Regression 2 – Given cups of coffee sold per 10 minutes • Predict how many cups are sold at any given time of the day • Linear regression • Polynomial regression [Figures: linear and polynomial regression plots; axes: Time of day (hours), Cups of coffee sold]
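The linear fit on the regression slides can be sketched in a few lines. This is a minimal ordinary-least-squares example in plain Python, not code from the talk; the data points and the names `fit_line`, `yogurt_lbs` and `temperature_f` are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (un-normalized; the factor 1/n cancels out)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled data: frozen yogurt sold (lbs) and temperature (F)
yogurt_lbs = [10.0, 20.0, 30.0, 40.0]
temperature_f = [60.0, 70.0, 80.0, 90.0]

slope, intercept = fit_line(yogurt_lbs, temperature_f)
predicted_temperature = slope * 25.0 + intercept  # prediction for 25 lbs sold
```

A polynomial fit follows the same idea with extra powers of `x` as inputs, which is why it can follow the curved coffee-sales pattern where a straight line cannot.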
  20. 20. Types of Machine Learning – Classification (1) • Classification: can we automatically identify the type of an iris? • Assumption: we can differentiate iris types by the sizes of their sepals and petals
  21. 21. Types of Machine Learning – Classification (2) • Classification: given sepal and petal sizes of irises (Fisher’s data set, 1936) • Predict the type of an iris based on these measurements
  22. 22. Types of Machine Learning – Classification (3) • Classification: Fisher’s iris data set • Support Vector Machine (SVM) achieves 73% accuracy! [Figure: SVM with linear kernel; axes: Sepal Length (cm), Sepal Width (cm); classes: setosa, versicolor, virginica]
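To make the classification idea concrete without a library, the sketch below swaps the SVM from the slide for a much simpler nearest-centroid classifier: it learns one average point per class from labeled samples and assigns a new sample to the closest one. The measurements are illustrative values, not rows copied from Fisher's data set, and `train` and `predict` are hypothetical names.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def train(labeled):
    """labeled: dict mapping a class label to a list of feature tuples."""
    return {label: centroid(points) for label, points in labeled.items()}


def predict(centroids, sample):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], sample))


# Illustrative (sepal length, sepal width) pairs in cm
training = {
    "setosa": [(5.0, 3.4), (4.8, 3.0), (5.2, 3.6)],
    "virginica": [(6.5, 3.0), (6.9, 3.1), (6.7, 3.2)],
}
model = train(training)
label_small = predict(model, (5.1, 3.5))
label_large = predict(model, (6.8, 3.0))
```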
  23. 23. Types of Machine Learning – Reinforcement (1) • Reinforcement (reward-based) Learning: • A set of rules defines interaction with the environment • “Good” actions may grant rewards • “Bad” actions may reduce rewards • Machine tries to maximize this score • Used in game bots, recommender systems etc.
  24. 24. Types of Machine Learning – Reinforcement (2) • Recommender systems: • Build profiles for items and for users • Recommend an item to a user based on previous purchases • Gain rewards when users click on recommended items • Update profiles based on recommendations, ratings etc.
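The recommend, reward, update loop described above can be sketched as a tiny greedy recommender. This is an assumption-laden toy, not how any production recommender works: the class name `GreedyRecommender`, the items and the reward values (1.0 for a click, 0.0 for no click) are all hypothetical, and there is no exploration step.

```python
class GreedyRecommender:
    """Keeps a running average reward per item and recommends the best one."""

    def __init__(self, items):
        self.shown = {item: 0 for item in items}
        self.value = {item: 0.0 for item in items}

    def recommend(self):
        # Pick the item with the highest average reward so far
        return max(self.value, key=self.value.get)

    def feedback(self, item, reward):
        # Incremental mean: new_avg = old_avg + (reward - old_avg) / n
        self.shown[item] += 1
        self.value[item] += (reward - self.value[item]) / self.shown[item]


recommender = GreedyRecommender(["book", "phone"])
recommender.feedback("book", 0.0)   # recommended, not clicked
recommender.feedback("phone", 1.0)  # recommended, clicked
recommender.feedback("phone", 0.0)  # recommended, not clicked
best_item = recommender.recommend()
```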
  25. 25. Types of Machine Learning – Reinforcement (3) • Generally speaking, recommender systems offer similar things to similar users [Figure: two similar users, Jim and Bob]
  26. 26. Types of Machine Learning – Unsupervised (1) • Problem: • Supervised learning is good, but requires labeled data • Most data in the world is not labeled, there’s no right/wrong answer • Labeling requires human effort → tedious and expensive • Unsupervised Learning: • The machine automatically recognizes relationships in the data • No right or wrong answers are given • Often used to enhance Supervised Learning
  27. 27. Types of Machine Learning – Unsupervised (2) • Some approaches include: • Clustering algorithms: k-means, hierarchical clustering etc. • Anomaly detection of rare events • Deep learning (for pretty much everything…) • Deep Learning approach: • Learn from a lot of non-labeled data • Learn highly non-linear correlations (represent complex relationships) • Surprisingly good results for many applications!
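As a small illustration of clustering without labels, here is a one-dimensional k-means sketch. The values and the starting centroids are made up, and `kmeans_1d` is a hypothetical helper; real implementations handle many dimensions and choose starting points more carefully.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Alternate between assigning values to the nearest centroid
    and moving each centroid to the mean of its assigned values."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ended up empty
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Unlabeled data with two obvious groups
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, groups = kmeans_1d(values, centroids=[0.0, 5.0])
```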
  28. 28. Types of Machine Learning – Unsupervised (3) • Deep Learning: can we automatically cluster digits together? • Data: 60,000 b/w 20x20 pixel images of hand-written digits • Each image is “flattened” to a 1D vector of 400 floating point values [0..1] [0.0, 0.0, 0.01, 0.07, 0.07, 0.07, 0.49, 0.65, 1.0, 0.97, …, 0.0, 0.0]
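The "flattening" step can be shown on a tiny example. The sketch assumes pixel intensities stored as integers from 0 to 255, which the slide does not state; `flatten` and the 2x2 image are illustrative only.

```python
def flatten(image, max_value=255):
    """Turn a 2D grid of pixel intensities into a 1D list scaled to [0, 1]."""
    return [pixel / max_value for row in image for pixel in row]


tiny_image = [
    [0, 255],
    [51, 0],
]
vector = flatten(tiny_image)

# A 20x20 image yields the 400-value vector mentioned on the slide
blank_20x20 = [[0] * 20 for _ in range(20)]
vector_400 = flatten(blank_20x20)
```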
  29. 29. Types of Machine Learning – Unsupervised (4) • Deep Learning: can we automatically cluster digits together? • Image vectors are fed to the neural network [Figure: the vector [0.0, 0.0, 0.01, 0.07, 0.07, 0.07, 0.49, 0.65, 1.0, 0.97, …, 0.0, 0.0] entering the neural network]
  30. 30. Types of Machine Learning – Unsupervised (5) • Deep Learning: can we automatically cluster digits together? • The neural network automatically learns features of the images • Each neuron “lights up” when it recognizes a feature in the previous layer [Figure: example features: round edges, vertical lines, diagonal lines, etc.]
  31. 31. Types of Machine Learning – Unsupervised (6) • Deep Learning: can we automatically cluster digits together? • The last layer recognizes highly complex features of the image: the digits! • This method achieves an amazing 0.2% error rate in this task! [Figure: the vector [0.0, 0.0, 0.01, 0.07, 0.97, …, 0.0, 0.0] passing through the network; Output: 1]
  32. 32. Schedule: • Logs today • Machine Learning to the rescue! • Types of Machine Learning • Applying to Log Records • Possible log analysis pipeline
  33. 33. Applying to Log Records (1) • Problems: • Log data is very redundant • Hard to find the important events • Rare logs are a needle in the haystack • Also: • Actions in the system are represented by a series of log records • But other logs interrupt the visual flow • Tracing the logs of a complete action is hard
  34. 34. Applying to Log Records (2) • Solutions: • Identify log prototypes (“log templates”) • Cluster logs which represent an action • Alert when actions are incomplete or anomalous • Notify about new errors which have never occurred before And much more!
  35. 35. Log prototypes distribution – real-world • The 10 most frequent logs make up ~60% of the data (!) [Figure: distribution plot; axes: Log Prototypes, Log Frequency]
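A claim like "the 10 most frequent logs make up ~60% of the data" can be checked on your own logs once each record is mapped to a prototype. A minimal sketch, assuming the prototype IDs are already available as a list; the stream below is made up and `top_share` is a hypothetical helper.

```python
from collections import Counter


def top_share(prototype_ids, n):
    """Fraction of all records that belong to the n most frequent prototypes."""
    counts = Counter(prototype_ids)
    top_total = sum(count for _, count in counts.most_common(n))
    return top_total / len(prototype_ids)


# Hypothetical stream: prototype "A" dominates, "C" is rare
stream = ["A"] * 6 + ["B"] * 3 + ["C"]
share_top1 = top_share(stream, 1)
share_top2 = top_share(stream, 2)
```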
  36. 36. Log prototypes distribution – real-world [Figure: the distribution annotated with “Show me statistics and correlate these” and “Alert me when these happen”]
  37. 37. Today’s schedule: • Logs today • Machine Learning to the rescue! • Types of Machine Learning • Possible log analysis pipeline
  38. 38. Log analysis pipeline - clustering • Cluster log records (raw strings) into log prototypes: I. Find a distance metric to compare log records II. Create a new type of log if distance is too far III. Find the variables within log types
  39. 39. Log analysis pipeline - clustering Log 1: “Creating tag on Stream: -1 Position: 42” Log 2: “Creating tag on Stream: 2 Position: 65”
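Steps I and II of the clustering slide can be sketched with a deliberately naive distance: the fraction of whitespace-separated tokens that differ. This is an assumed metric for illustration, not the one used in the talk, and the 0.5 threshold and the third log line are made up.

```python
def token_distance(a, b):
    """Fraction of positions where two logs differ, token by token.
    Logs with a different number of tokens count as maximally distant."""
    tokens_a, tokens_b = a.split(), b.split()
    if len(tokens_a) != len(tokens_b):
        return 1.0
    differing = sum(x != y for x, y in zip(tokens_a, tokens_b))
    return differing / len(tokens_a)


def cluster(logs, threshold=0.5):
    """Greedy clustering: join the first cluster whose representative
    (its first member) is close enough, otherwise start a new log type."""
    clusters = []
    for log in logs:
        for members in clusters:
            if token_distance(members[0], log) <= threshold:
                members.append(log)
                break
        else:
            clusters.append([log])
    return clusters


log1 = "Creating tag on Stream: -1 Position: 42"
log2 = "Creating tag on Stream: 2 Position: 65"
log3 = "Connection closed by peer"  # hypothetical unrelated log
clusters = cluster([log1, log2, log3])
```

Comparing every new record against every representative is exactly the cost that motivates the heuristic distance methods on the next slide.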
  40. 40. Log analysis pipeline - clustering • Problem: comparing all log sub-strings is expensive! • Solution: use heuristic distance methods [Figure: “Creating tag on Stream: -1 Position: 42” is split into tokens {Creating} {tag} {on} {Stream:} {-1} …, which locality-sensitive hashing (LSH) maps to a bit string 0011000010…0100; one hash is kept per log (Log 1 Hash, Log 2 Hash, …, Log n Hash)]
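One well-known locality-sensitive hashing scheme for token sets is SimHash; the talk does not say which LSH variant it uses, so treat this as one possible instance. Each token votes on every bit of the fingerprint, so logs that share most tokens tend to get fingerprints with a small Hamming distance. The function names and the 64-bit width are choices made for this sketch.

```python
import hashlib


def simhash(text, bits=64):
    """SimHash fingerprint of the whitespace-separated tokens in text."""
    weights = [0] * bits
    for token in text.split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    # A bit is set in the fingerprint when the votes for it are positive
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


log1 = "Creating tag on Stream: -1 Position: 42"
log2 = "Creating tag on Stream: 2 Position: 65"
hash1 = simhash(log1)
hash2 = simhash(log2)
distance = hamming(hash1, hash2)
```

Comparing two 64-bit integers is far cheaper than comparing the raw strings, which is the point of the heuristic.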
  41. 41. Log analysis pipeline - clustering • Result: M raw log records → N log prototypes (N << M) • M is in the billions; N is in the thousands “Creating tag on Stream: -1 Position: 42” + “Creating tag on Stream: 2 Position: 65” → “Creating tag on Stream: {var1} Position: {var2}”
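The step from two raw records to one prototype can be sketched by keeping the tokens that match and replacing the ones that differ with numbered placeholders. This assumes both logs have the same number of whitespace-separated tokens; `merge_to_prototype` is a hypothetical helper, not the talk's algorithm.

```python
def merge_to_prototype(a, b):
    """Build a template from two logs of equal token count:
    equal tokens are kept, differing tokens become {var1}, {var2}, ..."""
    tokens_a, tokens_b = a.split(), b.split()
    if len(tokens_a) != len(tokens_b):
        raise ValueError("logs must have the same number of tokens")
    template, variable_count = [], 0
    for x, y in zip(tokens_a, tokens_b):
        if x == y:
            template.append(x)
        else:
            variable_count += 1
            template.append("{var%d}" % variable_count)
    return " ".join(template)


prototype = merge_to_prototype(
    "Creating tag on Stream: -1 Position: 42",
    "Creating tag on Stream: 2 Position: 65",
)
```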
  42. 42. Log analysis pipeline – variable statistics • Model distribution of variables within log prototypes • Define anomaly boundaries “Creating tag on Stream: {var1} Position: {var2}” Variable: Values • var1: [-1, 2, …, 1] • var2: [42, 65, …, 53] [Figure label: Anomalous values]
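One simple way to "define anomaly boundaries" is mean plus or minus a few standard deviations. That is an assumption for this sketch (it suits roughly bell-shaped data and the talk does not specify its model); the extra sample values and the factor `k=3.0` are made up.

```python
import statistics


def anomaly_bounds(values, k=3.0):
    """Lower and upper bound at k standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return mean - k * std, mean + k * std


def is_anomalous(value, bounds):
    low, high = bounds
    return value < low or value > high


# Observed values of {var2}; only 42, 65 and 53 appear on the slide
var2_values = [42, 65, 53, 48, 59, 51]
bounds = anomaly_bounds(var2_values)
normal = is_anomalous(60, bounds)
suspicious = is_anomalous(500, bounds)
```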
  43. 43. Log analysis pipeline – sequence finding • Find sequences of log prototypes that are statistically-related • Independence assumption – if logs are unrelated, all pairs should have the same probability • Sequences with related logs will have higher counts, and break the G-Test:
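The G-test compares observed counts O_i with the counts E_i expected under a null hypothesis, using the statistic G = 2 · Σ O_i · ln(O_i / E_i). A minimal sketch under the slide's independence assumption, where every pair is expected equally often; the pair counts are made up, and 16.266 is the chi-square critical value for 3 degrees of freedom at p = 0.001.

```python
import math


def g_statistic(observed, expected):
    """G = 2 * sum(O * ln(O / E)); terms with O == 0 contribute nothing."""
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


# Hypothetical counts of four different log pairs
observed = [50, 10, 10, 10]
# Independence assumption: all pairs share the same expected count
expected = [sum(observed) / len(observed)] * len(observed)

g = g_statistic(observed, expected)

CRITICAL_DF3_P001 = 16.266  # chi-square critical value, 3 degrees of freedom
pairs_are_related = g > CRITICAL_DF3_P001
```

A large G means the counts are too uneven to be explained by chance, which is the signal that some logs belong together in a sequence.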
  44. 44. Log analysis pipeline – sequence finding [Figure: flow of log prototypes for one action, numbered 1 to 7 with 5 appearing twice; node labels: Purchase request, Get cart from DB, Process DB response, Authenticate payment, Send response to client, Update BI system 1, Update BI system 2, Mark as complete]
  45. 45. Log analysis pipeline – sequence finding • Count all log sequences of length 2 (2-sequences) • L1L2 will be a frequent 2-sequence • We expect not to find any occurrences of L1L4
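Counting 2-sequences amounts to counting adjacent pairs in the stream of prototype IDs. A minimal sketch with a made-up stream; `count_2_sequences` is a hypothetical helper.

```python
from collections import Counter


def count_2_sequences(prototype_stream):
    """Count every pair of adjacent log prototypes in the stream."""
    return Counter(zip(prototype_stream, prototype_stream[1:]))


# Hypothetical stream of log prototype IDs in arrival order
stream = ["L1", "L2", "L3", "L1", "L2", "L4"]
pair_counts = count_2_sequences(stream)

frequent = pair_counts[("L1", "L2")]
never_seen = pair_counts[("L1", "L4")]  # Counter returns 0 for unseen pairs
```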
  46. 46. Log analysis pipeline – sequence finding • After mapping all 2-sequences, normalize their scores: • Subtract the average • Divide by the standard deviation $\hat{S}_1^{2} = \frac{\mathrm{Freq}(S_1^{2}) - \mu(S^{2})}{\sigma(S^{2})}$ • Try to lengthen all 2-sequences by one log to 3-sequences
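The normalization is a standard score: subtract the mean frequency of all 2-sequences and divide by σ, the standard deviation, as in the slide's formula. A minimal sketch with made-up counts; `normalize_scores` is a hypothetical helper and uses the population standard deviation as an assumption.

```python
import statistics


def normalize_scores(counts):
    """Map each sequence to (freq - mean) / standard deviation."""
    freqs = list(counts.values())
    mu = statistics.mean(freqs)
    sigma = statistics.pstdev(freqs)
    if sigma == 0:
        # All sequences are equally frequent: nothing stands out
        return {seq: 0.0 for seq in counts}
    return {seq: (freq - mu) / sigma for seq, freq in counts.items()}


# Hypothetical 2-sequence counts
counts = {("L1", "L2"): 8, ("L2", "L3"): 2, ("L3", "L1"): 2}
scores = normalize_scores(counts)
```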
  47. 47. Log analysis pipeline – sequence finding • Repeat the process: • For each k-sequence try to construct a longer (k+1)-sequence • Stop when failing the G-Test or when the normalized score decreases: $\hat{S}_1^{k} < \hat{S}_1^{k-1}$ • Save the k-sequence as valid (an action in the system)
  48. 48. Log analysis pipeline • Determine the ratio of each log within the sequence • E.g. 1:1:1 is a 3-sequence where the ratio of each log prototype is the same • In our example: • 1:1:1:1:2:1:1, a 7-sequence with one log prototype expected twice as often as the others
  49. 49. Log analysis pipeline • Alert about a sequence anomaly when the ratio is distant enough from the valid sequence, e.g. p < 0.001 • Software is constantly changing – update all models all the time • Of course, there is much more than we explored here!
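How the p-value is computed is not stated in the talk. One simple option, assumed here, is an exact binomial test on two log prototypes that should appear in a 1:1 ratio: if one of them shows up far less often than the other, the p-value drops below the alert threshold. The counts and the helper name `binomial_two_sided_p` are made up.

```python
import math


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided p-value: total probability of all outcomes
    that are no more likely than observing k successes out of n."""
    pmf = [math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float rounding
    return min(1.0, sum(x for x in pmf if x <= threshold))


ALERT_THRESHOLD = 0.001  # the p < 0.001 example from the slide

# Two prototypes expected 1:1; hypothetical observed counts in a time window
healthy_p = binomial_two_sided_p(50, 100)  # 50 vs 50
broken_p = binomial_two_sided_p(20, 100)   # 20 vs 80

should_alert = broken_p < ALERT_THRESHOLD
```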
  50. 50. Summary • Everyone will analyze their Big Data – including logs • Hard to do by yourself – but extremely rewarding! • Most importantly: You can focus on your product instead of its bugs
  51. 51. Questions? • Please feel free to contact me directly: Lior Redlus, Chief Scientist, lior@coralogix.com http://www.coralogix.com