Database Performance Analysis with Time Series


Showing how to use R and Time Series Analysis techniques to analyse performance and plan capacity and SLAs.

Published in: Technology, Economy & Finance
1 Comment
  • I am using Enteros Performance Explorer-i database performance analysis tool - IMHO it is absolutely industry's best!

    It has moving averages, seasonality analysis, linear regression prediction, trend analysis, automated spike analysis, cross-database and cross-instance analysis, Oracle RAC support, ASH analysis and much more.

    Sorry for being too excited, but for me Performance Explorer-i was a treasure chest, and considering my complex, challenging and hugely active production database environment, it is a lifesaver.
  • Time series: data that is collected sequentially, usually at regular intervals. Time series are all around us: weather, stock prices, CPU, disk space…
  • Recognize abnormal data and send alerts. Recognize changes and be proactive. Analyze long-term trends for planning. Set realistic SLAs.
  • One question we’ll keep asking ourselves: Which techniques are really useful?
  • All kinds of data issues can prevent analysis. You can, and sometimes should, fix the data so analysis is possible. Replace missing data with average values (or maximum values where that makes sense). Remove outliers when it makes sense. Analyze the two sides of a discontinuity separately.
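The clean-up steps in the note above can be sketched in a few lines. This is a minimal Python illustration, not the R code from the talk; the function names, the median-absolute-deviation rule for spotting outliers, and the sample values are all assumptions made for the example.

```python
from statistics import mean, median

def mask_outliers(values, k=5.0):
    """Turn points more than k median-absolute-deviations from the median into None."""
    observed = [v for v in values if v is not None]
    med = median(observed)
    mad = median([abs(v - med) for v in observed])
    if mad == 0:
        return list(values)  # no spread to judge outliers against
    return [None if v is not None and abs(v - med) > k * mad else v
            for v in values]

def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    avg = mean([v for v in values if v is not None])
    return [avg if v is None else v for v in values]

# Hypothetical CPU samples: None marks a missed collection, 400 is a bad reading.
cpu = [40, 42, None, 41, 39, 400, 43, None, 40]
cleaned = fill_missing(mask_outliers(cpu))
```

Outliers are masked before the gaps are filled so that the bad reading does not inflate the average used as the fill value, and the series keeps its original length and spacing.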
  • Linear trend. Easy to fit and use, but rarely makes sense in real life
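For reference, fitting a linear trend is an ordinary least-squares line through the points. A minimal Python sketch follows, assuming evenly spaced samples indexed 0, 1, 2, …; `fit_line` and the disk-space numbers are made up for the illustration.

```python
def fit_line(y):
    """Least-squares line through (0, y[0]), (1, y[1]), ...; returns (slope, intercept)."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

# Hypothetical daily disk usage in GB.
disk_gb = [100, 103, 105, 108, 110, 113]
slope, intercept = fit_line(disk_gb)
day_30_estimate = intercept + slope * 30  # naive extrapolation along the line
```

The extrapolation on the last line is exactly the kind of use the note warns about: it is easy to compute, but it assumes growth never changes pace.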
  • Moving average requires picking a window size and weights. Small window: matches the data better, but may include noise. Large window: more of a general trend, but will contain a delay.
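The window-size trade-off can be seen on a simple step change. The sketch below is a hypothetical trailing weighted moving average in Python (the function name and test series are assumptions); the larger window takes longer to reach the new level.

```python
def moving_average(values, weights):
    """Trailing weighted moving average; weights[-1] applies to the newest point."""
    w = len(weights)
    total = sum(weights)
    out = []
    for i in range(w - 1, len(values)):
        window = values[i - w + 1 : i + 1]
        out.append(sum(a * b for a, b in zip(window, weights)) / total)
    return out

# A series that jumps from 0 to 10 at index 5.
step = [0] * 5 + [10] * 5
smooth_small = moving_average(step, [1, 1])        # window of 2
smooth_large = moving_average(step, [1, 1, 1, 1])  # window of 4

# Output element j corresponds to input index j + (window - 1).
reached_small = smooth_small.index(10.0) + 1
reached_large = smooth_large.index(10.0) + 3
```

With equal weights, the window of 2 settles on the new level at index 6 and the window of 4 only at index 8, which is the delay the note refers to.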
  • Remove trend to allow analyzing other components.
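One common way to remove a trend is to subtract a centered moving average from each point. The Python sketch below assumes that approach and an odd window size; the talk does not say which method was used, and `detrend` is a name chosen for the example.

```python
def detrend(values, window):
    """Subtract a centered moving average from every point that has a full window."""
    if window % 2 == 0:
        raise ValueError("this sketch only handles odd window sizes")
    half = window // 2
    out = []
    for i in range(half, len(values) - half):
        trend = sum(values[i - half : i + half + 1]) / window
        out.append(values[i] - trend)
    return out

# Hypothetical series: steady growth plus an up/down wiggle.
series = [i + (1 if i % 2 == 0 else -1) for i in range(10)]
wiggle_only = detrend(series, 3)
```

After detrending, the growth is gone and only the alternating pattern remains, ready for seasonality or correlation analysis.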
  • 50 degrees Fahrenheit is cold for August but hot for January. How about 60% CPU? Is it always OK or always a problem?
  • Reminder: Correlation is a measure of the strength of the relation between two variables. How much do the variables change together?
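As a concrete reminder, the usual measure is the Pearson correlation coefficient, which ranges from -1 to 1. A minimal Python sketch, with made-up sample data:

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical samples: active sessions versus CPU percent.
sessions = [10, 20, 30, 40, 50]
cpu_pct = [22, 41, 58, 83, 97]
r = correlation(sessions, cpu_pct)
```

Values near 1 mean the two variables rise together, values near -1 mean one falls as the other rises, and values near 0 mean no linear relation.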
  • How does the data in our series correlate with itself? We see strong correlation between data points 24 hours apart.
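Autocorrelation is the same correlation idea applied to a series and a shifted copy of itself. The Python sketch below uses the standard sample autocorrelation estimate on a synthetic hourly series with a daily cycle; the series is invented for the illustration.

```python
import math

def autocorrelation(values, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(values)
    m = sum(values) / n
    denom = sum((v - m) ** 2 for v in values)
    num = sum((values[t] - m) * (values[t - lag] - m) for t in range(lag, n))
    return num / denom

# Two weeks of synthetic hourly CPU with a 24-hour cycle.
hourly = [50 + 20 * math.sin(2 * math.pi * h / 24) for h in range(24 * 14)]
same_hour_next_day = autocorrelation(hourly, 24)   # strongly positive
opposite_time_of_day = autocorrelation(hourly, 12) # strongly negative
```

A peak at lag 24 on hourly data is the signature of daily seasonality.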
  • Average CPU for each hour. Similar to the charts of average temperature per month that you sometimes see in tour guides.
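Computing the hourly profile is a group-by-hour average. A minimal Python sketch, assuming samples arrive as hypothetical `(hour_of_day, value)` pairs:

```python
from collections import defaultdict

def hourly_profile(samples):
    """samples: iterable of (hour_of_day, value). Returns {hour: mean value}."""
    buckets = defaultdict(list)
    for hour, value in samples:
        buckets[hour].append(value)
    return {h: sum(vs) / len(vs) for h, vs in sorted(buckets.items())}

# Made-up readings for two hours across three days.
samples = [(9, 60), (9, 70), (9, 65), (3, 10), (3, 20), (3, 15)]
profile = hourly_profile(samples)
```

The resulting table answers the earlier question about 60% CPU: whether it is normal depends on what the profile says for that hour.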
  • One chart to rule them all – data, trend, seasonality and all the rest.
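A chart like this typically comes from an additive decomposition: data = trend + seasonality + remainder. The Python sketch below is one classical way to compute it (centered moving average for the trend, per-position averages for the seasonal part). It is an assumption about the method, not a reproduction of the R call used in the talk.

```python
def decompose(values, period):
    """Additive decomposition into (trend, seasonal, remainder) lists."""
    n, half = len(values), period // 2
    trend = [None] * n
    for i in range(half, n - half):
        window = values[i - half : i + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: the two end points get half weight each.
            total -= 0.5 * (window[0] + window[-1])
        trend[i] = total / period
    # Seasonal figure = mean detrended value at each position in the cycle.
    sums, counts = [0.0] * period, [0] * period
    for i in range(n):
        if trend[i] is not None:
            sums[i % period] += values[i] - trend[i]
            counts[i % period] += 1
    figures = [s / c for s, c in zip(sums, counts)]
    offset = sum(figures) / period  # center the seasonal figures on zero
    seasonal = [figures[i % period] - offset for i in range(n)]
    remainder = [None if trend[i] is None else values[i] - trend[i] - seasonal[i]
                 for i in range(n)]
    return trend, seasonal, remainder

# Synthetic series: slow growth plus a repeating 4-step pattern.
pattern = [2.0, -1.0, 0.0, -1.0]
data = [0.1 * i + pattern[i % 4] for i in range(40)]
trend, seasonal, remainder = decompose(data, 4)
```

The sketch needs at least two full periods of data. On real measurements the remainder is not zero, and that leftover is what the following notes analyze.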
  • “All the rest” is not completely random – there is still some auto-correlation. Data correlates to points with a lag of one and two.
  • R used the auto-correlations to model the data
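To show how autocorrelations can turn into model coefficients, here is the closed-form Yule-Walker solution for a two-lag autoregressive model, sketched in Python. This is an illustration of the idea only: the estimation method R actually used in the talk is not stated, and the input autocorrelations below are made up.

```python
def yule_walker_ar2(r1, r2):
    """Coefficients (a1, a2) of X_t = a1*X_{t-1} + a2*X_{t-2} + e_t
    from the lag-1 and lag-2 autocorrelations."""
    a1 = r1 * (1 - r2) / (1 - r1 ** 2)
    a2 = (r2 - r1 ** 2) / (1 - r1 ** 2)
    return a1, a2

# Hypothetical autocorrelations measured on the remainder series.
a1, a2 = yule_walker_ar2(0.625, 0.5125)
```

For these inputs the solution is a1 = 0.5 and a2 = 0.2. Models with more lags solve a larger version of the same linear system.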
  • We test the model. We can see that the residuals no longer have auto-correlation, and the statistical test for the fit shows that the result is likely not random.
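The note does not name the statistical test. One common check for leftover auto-correlation in residuals is the Ljung-Box statistic, sketched below in Python as an assumption. The statistic is compared with a chi-square critical value; the constant used here is the 95% quantile for 10 degrees of freedom.

```python
import math

def ljung_box(residuals, max_lag=10):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    m = sum(residuals) / n
    denom = sum((e - m) ** 2 for e in residuals)
    q = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum((residuals[t] - m) * (residuals[t - k] - m)
                  for t in range(k, n)) / denom
        q += r_k ** 2 / (n - k)
    return n * (n + 2) * q

CHI2_95_DF10 = 18.307  # 95% quantile of chi-square with 10 degrees of freedom

# A series that clearly still has structure: Q lands far above the threshold.
leftover_cycle = [math.sin(2 * math.pi * t / 24) for t in range(240)]
still_correlated = ljung_box(leftover_cycle) > CHI2_95_DF10
```

A Q value below the threshold means the test found no evidence of remaining auto-correlation. For residuals of a fitted model, the degrees of freedom are usually reduced by the number of fitted coefficients.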
  • I added a couple of hours with high CPU here. Can you spot them?
  • After removing seasonality and average, we can clearly see the data point that is an outlier. It stands out.
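The detection steps can be sketched as follows in Python: subtract the average for each hour of the day, then flag whatever is left that sits far from zero. The 3-standard-deviation limit, the function name, and the synthetic data are assumptions for the example.

```python
import math
from statistics import pstdev

def find_outliers(hourly_values, k=3.0):
    """Indexes whose value, after removing the hour-of-day average,
    is more than k standard deviations from zero.
    Assumes hourly samples starting at hour 0 and at least one full day."""
    by_hour = [[] for _ in range(24)]
    for i, v in enumerate(hourly_values):
        by_hour[i % 24].append(v)
    hour_mean = [sum(vs) / len(vs) for vs in by_hour]
    residuals = [v - hour_mean[i % 24] for i, v in enumerate(hourly_values)]
    limit = k * pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > limit]

# Two synthetic weeks: daily cycle, small deterministic jitter, one fake incident.
cpu = [50 + 20 * math.sin(2 * math.pi * (i % 24) / 24) + ((i * 7) % 5 - 2) * 0.5
       for i in range(24 * 14)]
cpu[100] += 30  # the injected incident
suspects = find_outliers(cpu)
```

In the raw series the injected point (about 97%) is hard to tell apart from the much lower regular daily peak of about 71% only if you already know the pattern; after removing the hourly average it is the only point flagged.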
  • Calculate the moving average of the future by adding the moving average of the last 20 points as an additional point. Then use the last 19 real points and the new one to calculate another point… Obviously this gets less accurate the more you do it. Adding seasonality is a matter of adding the hourly average to the appropriate new points.
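The procedure in this note can be sketched directly in Python. The function names are hypothetical, and the seasonal table is assumed to hold one adjustment per hour of the day, such as the hourly averages minus the overall mean.

```python
def forecast_moving_average(deseasonalized, steps, window=20):
    """Extend the series by repeatedly appending the mean of the last `window` points."""
    series = list(deseasonalized)
    for _ in range(steps):
        recent = series[-window:]
        series.append(sum(recent) / len(recent))
    return series[len(deseasonalized):]

def add_seasonality(forecast, seasonal_by_hour, first_hour):
    """Add the hour-of-day adjustment to each forecast point."""
    return [v + seasonal_by_hour[(first_hour + i) % 24]
            for i, v in enumerate(forecast)]

# Made-up inputs: a flat deseasonalized history and a simple seasonal table.
history = [10.0] * 20
seasonal = [float(h) for h in range(24)]
flat = forecast_moving_average(history, 3)
predicted = add_seasonality(flat, seasonal, first_hour=23)
```

Each new point is built partly from earlier predictions, which is why the forecast flattens out and loses accuracy the further ahead it goes.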
  • Red: the model matched to existing data. Blue: predicted data. Green: 99% probability that we will not get data outside these lines.
  • A bit like moving average but with very specific weights.
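In code, predicting with the autocorrelation model looks like a moving average whose weights are the fitted coefficients. The Python sketch below reuses the coefficients from slide 17; the history values are made up, and the model is assumed to apply to the remainder left after removing trend and seasonality.

```python
def forecast_ar(history, coeffs, steps):
    """Iterate X_t = coeffs[0]*X_{t-1} + coeffs[1]*X_{t-2} + ... for `steps` points."""
    series = list(history)
    for _ in range(steps):
        recent = series[-len(coeffs):][::-1]  # newest value first
        series.append(sum(c * x for c, x in zip(coeffs, recent)))
    return series[len(history):]

SLIDE_17_COEFFS = [0.33, 0.07, -0.09]

# Hypothetical remainder values, oldest first.
remainder = [1.0, 2.0, 3.0]
next_points = forecast_ar(remainder, SLIDE_17_COEFFS, 5)
```

The forecast shrinks toward zero within a few steps, so the long-range prediction is effectively just trend plus seasonality.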
  • Blue: predicted data. Green: 99% probability that we will not get data outside these lines.
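A rough version of such bounds can be computed from the spread of the model's past errors. The Python sketch below assumes normally distributed residuals and draws a constant-width band; proper forecast intervals widen with the horizon, so treat this as a simplification and not as what the plotted green lines were computed from.

```python
from statistics import NormalDist, pstdev

def prediction_band(forecast, residuals, coverage=0.99):
    """(lower, upper) per forecast point, assuming normal residuals of constant spread."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 2.576 for 99%
    half_width = z * pstdev(residuals)
    return [(f - half_width, f + half_width) for f in forecast]

# Made-up forecast and past one-step errors.
band = prediction_band([10.0, 11.0], [-1.0, 1.0, -1.0, 1.0])
```

Points that fall outside the band are candidates for alerts, which is one way to build the seasonality-aware monitoring mentioned in the conclusions.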
  • The redo data is very noisy, but adding a moving average trend allows us to see a point where redo generation drops. This happened to be Dec 20, when many users left for vacation.
  • Correlation every 6 hours and stronger correlation every 24. These are the times we recalculate materialized views. A few views every 6 hours and a bunch every 24.
  • Removing the seasonality allows us to notice abnormal data. Worth investigating – what was running at that time? Is it likely to happen again?
  • Not exactly a trend, but we do have changing levels of data.
  • There are periodic correlations but they are not regular, so it is not seasonality. This graph does indicate extremely strong auto-correlation.
  • Partial autocorrelation graph. This is similar to autocorrelation, but when we calculate auto-correlation for lag 2, we remove the correlation already explained by lag 1, and so on. Using this graph we can see auto-correlation up to lag 17. Once the CPU climbs, it may take over 3 hours until it is back to normal!
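Partial autocorrelations can be derived from ordinary autocorrelations with the Durbin-Levinson recursion. The Python sketch below is a minimal version of that recursion; the input autocorrelations are theoretical values for a one-lag model, chosen so the result is easy to check.

```python
def pacf_from_acf(r):
    """Partial autocorrelations for lags 1..len(r)-1.
    r[k] is the autocorrelation at lag k, with r[0] == 1."""
    out, prev = [], []
    for k in range(1, len(r)):
        if k == 1:
            phi_kk, phi = r[1], [r[1]]
        else:
            num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
            den = 1 - sum(prev[j] * r[j + 1] for j in range(k - 1))
            phi_kk = num / den
            # Update the lower-order coefficients, then append the new one.
            phi = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        out.append(phi_kk)
        prev = phi
    return out

# Autocorrelations of a process where each point depends only on the previous one.
acf_values = [1.0, 0.6, 0.36, 0.216]
partials = pacf_from_acf(acf_values)
```

Here the ordinary autocorrelation decays slowly (0.6, 0.36, 0.216), but the partial autocorrelation is 0.6 at lag 1 and about zero afterwards. The last lag with a clearly nonzero partial autocorrelation suggests how many lags the model needs, which is how lag 17 is read off the graph.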
  • Checking that the AR(17) model fits.
  • Database Performance Analysis with Time Series

    1. Analyzing Oracle Performance Using Time Series Models. Chen (Gwen) Shapira
    2. Why? Abnormal Data; Changes; Trends; SLAs
    3. See: Techniques; Use Cases; Real Data
    4. Techniques
    5. (no text)
    6. Trend
    7. Trend
    8. Moving Average Trend
    9. (no text)
    10. Remove Trend
    11. Seasonality
    12. (no text)
    13. (no text)
    14. Seasonal Effect
    15. Components
    16. More AutoCorrelation
    17. X_t = 0.33 X_{t-1} + 0.07 X_{t-2} - 0.09 X_{t-3} + e
    18. Test Model
    19. Use Cases
    20. Fake Incident
    21. Detect By: Remove trend; Remove Seasonality; Mark “normal data”; What’s left?
    22. Spot the Incident
    23. “I have seen the future and it is very much like the present, only longer” (Kehlog Albran)
    24. Exponential Smoothing: Calculate moving average of future; Add seasonality
    25. (no text)
    26. AutoCorrelation: Use the model X_t = a X_{t-1} … to calculate X_{t+1}, X_{t+2} …
    27. (no text)
    28. Real Data 1: Redo Blocks per Hour
    29. Holiday
    30. Seasonality
    31. Abnormal Data
    32. Real Data 2: CPU on DB Server
    33. (no text)
    34. Seasonality?
    35. Partial AutoCorrelation
    36. Check Fit of Model
    37. Prediction
    38. Conclusions: Use moving average to describe trend; Look for seasonality; Predict with Exponential Smoothing; AutoCorrelation?; Seasonality aware monitoring
    39. Questions?