Nagios Conference 2012 - Nicholas Scott - Advanced Data Analytics For Nagios

Nicholas Scott's presentation on advanced data analytics for Nagios.
The presentation was given during the Nagios World Conference North America held Sept 25-28th, 2012 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna

Published in: Technology

  • Try to keep this applicable to real life, as this is the Nagios World Conference; I just like the math portion of it. If you're looking for hardcore application, Wittenberg is presenting right now and it's very applicative. However, I will foray into implementation a bit and, since I like programming, into some tips on what I learned when implementing these. Statistics: I like it, though perhaps there are some things I overlooked. Haile story.
  • Cover the new capacity planning (CP) component for Nagios XI: some of the features, dates, extrapolation, and RRD data validity exclusions, sprinkled with the how and why behind what's going on. RRD Data Analysis tool: derivatives, bivariate comparisons, correlation. It's free; I put it together for fun. Contact me if you want it, whether for a project or personal use, whatever.
  • Nagios collects data at 5-minute intervals, and, god help us, our uptime... Each service is a complex function: how would you write a function to represent all the factors that affect a service's perfdata? After thinking about that, are you sure? The financial sector deals with this every day. The goal is to make this data usable, which is the heart of forecasting and analysis, and to understand the numbers better. It seems abstract at first, and takes time.
  • The capacity planning component was designed so that you don't have to know much to get some forecasting going.
  • Periods: the time over which a pattern may repeat itself. Extrapolation is limited to 4 * period. Methods: a few more are in development, but the current set is a 'good start'. All are self-projecting, rather than cause-and-effect.
  • Without going through the formula (well, kind of). Smoothed value: exponentially weighted. Trend value: represents variations of the time series that happen at a lower frequency. Seasonal value: represents items that occur across trends; could be construed as the trend of the trend. The initial trend is calculated by splitting the two known periods, taking second period_t - first period_t for each t, dividing each difference by L (the period length), summing, then dividing that sum by L.
  • Feeds back on itself: if the difference from period 1 to period 2 contained some strange outlier, it will be represented, and exaggerated, in the next steps. That is a shortcoming of Holt-Winters: outliers can destroy it. However, there is something satisfying about having a somewhat educated guess as to what a stat is going to be in several weeks or months. Smoothing may be necessary or preferred; it is not currently implemented, is on the to-do list for a future release, and presents its own issues. I would like to discuss the implementation as it's fascinating, but we'll move on as it's also time consuming.
  • Should not be used to predict future values, but to predict future direction. Should be treated as more of a “this should be around this level at this time.” It will, however, be wrong when dealing with an exponential or quadratic dataset, though that wouldn't be noticeable if the extrapolation period were short enough, e.g. derivatives.
  • Good for noisy data, as it is meant only as a trender. The actual graph line shows where the least squares of the residuals will be in the future. Aside: fun to implement. If you're interested in linear algebra you'll have a blast.
  • Do it if you like linear algebra, or just want to hone your programming prowess; doing any sort of matrix operations will make you better at algorithms. Don't look for a pot of gold at the end: it's hard to do clever stuff that severely reduces the time complexity of basic matrix operations. An RRD abstraction class is available through the stats thing I wrote about; it makes getting info out of the RRD take less thought.
  • Much like least squares, fits a polynomial to have the minimum sum of the squared residuals. Geared more towards items where you would expect exponential growth. Given that it's for exponential growth, it can be very touchy. The more data you have to compare with, the better it will be, which goes for every one of these, but this one in particular.
  • Once again, this is for anticipated exponential datasets. User decision: are you expecting quadratic or cubic growth or decay, or do you want to plan for it?
  • Looking to take a crack at some general stat data with an eye on Nagios. Analysis stuff has been around a while; just looking to make something specific to Nagios and RRDs. Take a look at what these definitions actually mean to a network operation, or the usual Nagios setup. If you want to use it, or help develop it, feel free.
  • Looking to take a crack at some general stat data with an eye on Nagios. Analysis stuff has been around a while; just looking to make something specific to Nagios and RRDs. Take a look at what these definitions actually mean to a network operation, or the usual Nagios setup.
  • Weird random stuff happens, and this weird random stuff throws off statistical analysis. Kind of strange if you think about it philosophically; however, this isn't philosophy, this is math, and there are rules. Would you have wanted that spike to 5 to register as a critical? That speaks to the noise, as we'll see when we go into the derivatives. Stdev: helps to understand the outliers and to set up normal distributions for calculating the odds of what future values may be. Variance: can help identify a multiplicative trend when mean and variance are increasing with some period.
  • Our use case is that x = RRD data, with y being the time at which those values occurred. Since we're not in math class, there's no need to do this “as h approaches 0” business. This actually makes our job pretty easy. Obviously we'll need a y_t-1 value, which we'll just leave as 0 as we
  • Every day. Every single time you see a Bytes/Sec reading, that's a delta, and that's all this is trying to do. Why is the current byte count useless to us? Do our brains not keep its state? Probably. Can we apply that to other metrics? Would it be useful? When would it not be useful? The byte count is always increasing; CPU load is not. Can we relate this to physics? If we can, we can use their entire wealth of information; however, the nature may be different.
  • Do you care what the rate of change of your CPU load is per 300 seconds? What does the mean actually symbolize here? Or any of them? Interpret: Mean: the CPU load was slowly growing. Max: magnitude of the highest rate of positive increase, and we can see the time that it happened; not when it peaked, but when it started its rise to the peak. Min: same thing.
  • Root partition on the Nagios test box, obviously a very active Nagios box. Obviously not an active hard drive, and these values are nothing to worry about. Keep in mind that peaks of actual bytes happen when the derivative goes from positive to negative at zero. Helps isolate actual times of events.
  • Now we get back to the second derivative, which, if you remember, is similar to acceleration. How fast was the rate of change changing? What does this mean? At zero, the velocity is at its local max/min. The cycle is back as far as timing goes: d(d(cos)). F = ma: is there something we could assign to be m, F? Might show relative magnitude of impulse.
  • Correlation. We have all these services/hosts; are they related? We can postulate, but we don't know for sure. If there are lags we wouldn't really know, but let's start simple. Graph 'em. Find Pearson's.
  • We can see that there is definitely a relationship: two different checks that are checking local ping, but are getting slightly different results. Transcends that, though. We can imagine a line on that graph that would do a pretty good job of representing those points. Scale: 0 - .09: None; .1 - .3: Small; .3 - .5: Medium; else Strong.
  • Hard to pull the relationship out of this graph. R shows a medium NEGATIVE correlation, meaning that when one goes up, the other goes down. Would've been hard to pull that out without a little help. Scale: 0 - .09: None; .1 - .3: Small; .3 - .5: Medium; else Strong.
  • Shows an example of no, or very weak, correlation.
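
The correlation notes above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind analyze.py: `pearson_r` and `strength` are hypothetical names, the inputs stand in for two perfdata series sampled at the same timestamps, and the bucket edges are one reading of the rough scale in the notes (which leaves the exact boundaries ambiguous), applied to the absolute value of r.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series.

    Returns nan if either series is constant (zero variance).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


def strength(r):
    """Map r onto the rough scale from the notes (assumed bucket edges)."""
    a = abs(r)  # the sign only gives direction, not strength
    if a < 0.1:
        return "none"
    if a < 0.3:
        return "small"
    if a < 0.5:
        return "medium"
    return "strong"
```

A negative r with a magnitude between .3 and .5 would be reported as "medium" here, matching the "medium NEGATIVE correlation" case in the notes.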
  • Transcript

    1. Data Analysis Nicholas Scott nscott@nagios.com
    2. Disclaimer Math may occur later. I apologize in advance. 2012 2
    3. Abstract: Introduction; Capacity Planning Component; Features; Different Forecasting Methods; When to use; RRD Analysis Tool; Statistics Pillow Talk
    4. Introduction: Nagios Data Gathering Attributes. SO MUCH DATA (TOO MUCH?). Generally noisy. Sources usually not simple. How many factors are affecting service X on a given host Y? We have data showing X is like this, but why?
    5. Capacity Planning: Terminology. Residuals: variation that exists after fitting. Period: a frame of time where a pattern cycles through a complete iteration. Example: [figure]
    6. Capacity Planning: [embedded video: /home/nscott/Documents/NWC Presentations/DataAnalytics/capacityplanning/capacityplanning.mp4]
    7. Capacity Planning: Holt-Winters. Great next-step forecasting for complex systems.
    8. Capacity Planning: Gets dicey for anything more; tradeoffs.
    9. Capacity Planning: Least Squares. Better for simple trending, obviously. Finds the trend line that minimizes the sum of the squared residuals. Less computationally expensive than Holt-Winters.
    10. Capacity Planning: Good choice for noisy data. Possible future mean value.
    11. Capacity Planning: Linear algebra is fun. Linear algebra is grindy. Linear algebra is a great way to really think about algorithms. RRD Python abstraction class is available.
    12. Capacity Planning: Quadratic/Cubic Fit. Naive. Experimental. Fits a polynomial of given order to data.
    13. Capacity Planning: For quadratic or cubic datasets. User decision.
    14. RRD Analysis Tool: Goals. General stats: mean, variance, etc. Also do derivatives, including multiple-order derivatives. Bivariate correlation. Dependencies: Python >= 2.4, numpy, rrdtool, scipy, matplotlib, mako.
    15. RRD Analysis Tool: Example running of this thing: ./analyze.py -H localhost -S Current_Load -s
    16. RRD Analysis Tool: Why do you want to smooth your stuff? Noise, noise, noise. Comedy option: pretty graphs. Mean, Stddev, Variance.
    17. RRD Analysis Tool: Derivatives. Quick refresher: $\frac{\Delta x}{\Delta y}$. Actual form we'll use: $\frac{y_t - y_{t-1}}{t_t - t_{t-1}} = \frac{y_t - y_{t-1}}{\text{RRD Resolution}}$
    18. RRD Analysis Tool: Uses? Relatable to physics? Position, Velocity, Acceleration, Jerk (seriously).
    19. RRD Analysis Tool: Example, first derivative on CPU load: analyze.py -H localhost -S Current_Load -d 1
    20. RRD Analysis Tool: Direct use case? Back to bytes/sec.
    21. RRD Analysis Tool: Second derivative (acceleration): analyze.py -H localhost -S Root_Partition -d 1,2
    22. RRD Analysis Tool: Bivariate Analysis. Compare two possibly related variables. Define a relationship. Graph them on the same graph. Find Pearson's correlation coefficient.
    23. RRD Analysis Tool: Example: analyze.py -H localhost,localhost -S _HOST_,PING
    24. RRD Analysis Tool: Example: analyze.py -H localhost,localhost -S HTTP,Current_Load
    25. RRD Analysis Tool: Example: analyze.py -H localhost,localhost -S Current_Load,Root_Partition
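
The derivative slides (17 to 21) divide the difference between consecutive samples by the RRD resolution. A minimal sketch of that idea follows, assuming evenly spaced samples and a 300-second step; `derivative` is a hypothetical helper, not the function used by analyze.py. As a deliberate simplification it drops the first sample with `np.diff` instead of assuming a previous value for it.

```python
import numpy as np


def derivative(values, step=300.0, order=1):
    """Finite-difference derivative of evenly spaced samples.

    values: perfdata readings, one per RRD step
    step:   RRD resolution in seconds (300 is an assumption)
    order:  1 for rate of change, 2 for "acceleration", and so on
    """
    out = np.asarray(values, dtype=float)
    for _ in range(order):
        # (y_t - y_{t-1}) / resolution; each pass shortens the series by one
        out = np.diff(out) / step
    return out
```

Applying the function with `order=2` repeats the same difference on the first derivative, which corresponds to the `-d 1,2` example on slide 21.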

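
The least squares and quadratic/cubic fit slides (9 to 13) describe fitting a polynomial that minimizes the sum of squared residuals and then projecting it forward. A rough sketch using `numpy.polyfit` is shown below. It is not the capacity planning component's implementation: `fit_and_extrapolate` is a hypothetical name, and time is measured in sample indices rather than seconds to keep the example simple.

```python
import numpy as np


def fit_and_extrapolate(values, steps_ahead, degree=1):
    """Least-squares polynomial fit, projected steps_ahead samples forward.

    degree=1 gives a straight trend line; 2 or 3 gives the quadratic or
    cubic fit. Time is the sample index (an assumption of this sketch).
    """
    y = np.asarray(values, dtype=float)
    t = np.arange(len(y))
    coeffs = np.polyfit(t, y, degree)  # minimizes the sum of squared residuals
    future_t = len(y) + np.arange(steps_ahead)
    return np.polyval(coeffs, future_t)
```

The choice of `degree` mirrors the "user decision" on slide 13: a higher order follows curved data more closely but is more sensitive to noise when extrapolated.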