Adaptive Fault Detection

Published in: Technology
  • Speaker note for slides 13 to 16 (Run Improbability): seq 1 15 | awk '{printf "%.2f%%\n", 100-(.499**$1*100)}'
  • Transcript

    • 1. Adaptive Fault Detection: Baron Schwartz • Percona Live NYC 2012
    • 2. Me
        ✤ Author of High Performance MySQL
        ✤ Creator of some tools that you might use
        ✤ I love hearing from people just like you: @xaprb on Twitter, http://www.linkedin.com/in/xaprb
        [Book cover: High Performance MySQL, 3rd Edition, Covers Version 5.5. Optimization, Backups, Replication, and more. Baron Schwartz, Peter Zaitsev & Vadim Tkachenko]
    • 3. Conventional Fault Detection: Metrics, Thresholds, and Actions
    • 4. Nagios and Thresholds: Is there a right answer?
    • 5. Motivations
        ✤ Detect unknown failure modes
        ✤ Capture diagnostic data automatically
        ✤ Surface relevant information
    • 6. Six Sigmas: 99.7% of measurements fall within ±3 sigmas of the mean in a normal distribution
    • 7. Abnormality Detection: Statistical process control, operations research, and intuition
    • 8. Shewhart Control Charts: Metrics that fall too many standard deviations above or below the mean are out of bounds
    • 9. Holt-Winters Forecasting: Predict the future based on history, trend, and seasonality
    • 10. Brownian Motion: A random walk shouldn’t go the same way for long
    • 11. Coin Tossing: QPS increases and decreases with ~equal probability
        Probability of Increase/Decrease
          Increase   49.88%
          Same        0.21%
          Decrease   49.91%
    • 12. I Feel Normal Today: Long runs of QPS increases/decreases behave like a coin toss
        [Chart: Length of QPS Runs. Horizontal axis: run length from -9 to 9. Vertical axis: ticks from 0 to 100000.]
    • 13. Oddly Even: How long is an unusually long random walk?
        Run Length   Improbability
             1       50.10%   75.09%
             2       75.10%   93.79%
             3       87.57%   98.46%
             4       93.80%   99.62%
             5       96.91%   99.90%
             6       98.46%   99.98%
             7       99.23%   99.99%
             8       99.62%
             9       99.90%
            10       99.95%
    • 14. Oddly Even: same table as slide 13, with the callout "Two!" added (it follows the length-4 row in the transcript)
    • 15. Oddly Even: same table, with the callout "Three!" added (it follows the length-8 row in the transcript)
    • 16. Oddly Even: same table, with the further callout "Same thing, but two variables"
    • 17. Houston, We Have an Opportunity: These techniques fall short of what’s needed
    • 18. God hath chosen the foolish things of the world to confound the wise [1 Cor 1:27]
    • 19. Unexpected Things Happen... but who says abnormal is bad?
    • 20. Bottleneck Detection: Find abnormalities, then determine whether they are system faults
    • 21. Metrics That Matter: Throughput, concurrency, and change, but not response time (why?)
    • 22. Algorithms: Various combinations of severity, directionality, run length, duration, and more
    • 23. Out of Bounds: How often each algorithm detected abnormalities
    • 24. Drilling Down: One of the algorithms triggered at the gray line
    • 25. Another Example: This one from a more selective algorithm
    • 26. Brownian Commotion: Just because it’s long and aimed the right way doesn’t mean it’s scary
    • 27. See Spots Run: There’s a clear run of decreasing QPS and increasing connections, but no stall/lockup [Chart legend: QPS, Cxn, Runs]
    • 28. See Spots Run: There’s a clear run of decreasing QPS and increasing connections, but no stall/lockup [Chart legend: QPS, Cxn, Runs]
    • 29. A New Approach: Combinations of algorithms to avoid run-based false positives
    • 30. I’d Like Fries With That: And supersize my Threads_running, please
    • 31. Why Not Response Time? We are legion and we want your carat patch
    • 32. Mass Application: Does it generalize to many different settings?
    • 33. Dataset #2: Looks reasonable on a different workload, so far
    • 34. Stall Detection In Action: Sorry, no witty comment here
    • 35. Oh Crud. What does “bad” mean on a workload like this?
    • 36. He taketh the wise in their own craftiness. [1 Cor 3:19]
    • 37. Different Like Everybody Else: Variance-to-mean ratio / index of dispersion to the rescue?
    • 38. Still Life With Purple Line: If you think this is art, I am happy to sell it to you
    • 39. When In Doubt, Get Empirical: Measure the 99.7th percentile of V:M on a well-behaved dataset. Good enough for government work.
    • 40. Workload #1 Redux: Still finds lots of the same “bad spots” with the V:M ratio filter
    • 41. If you can read this, flip me over! Workload #2 With New Filter
    • 42. Conclusions
        ✤ Workloads differ. GIGO.
        ✤ Unlikely events are not necessarily bad.
        ✤ Naive techniques fail; more sophisticated methods are required.
        ✤ Thresholds are too simplistic, but reappear.
        ✤ Common sense beats elaborate math.
        ✤ Automated problem detection is a good thing?
    • 43. Jobs
        ✤ Want to work and live in America’s #1 City?
        ✤ I am hiring DBAs and developers. Talk to me.
    • 44. Questions? @xaprb • http://www.linkedin.com/in/xaprb
    • 45. Image Credits
        ✤ http://www.flickr.com/photos/katej/853418592/
        ✤ http://www.flickr.com/photos/domesticat/2963393184/
        ✤ http://www.flickr.com/photos/markho/481969187/
        ✤ http://www.flickr.com/photos/exquisitur/3502317741/
        ✤ http://www.flickr.com/photos/calleephoto/4952091078/
        ✤ http://www.flickr.com/photos/paperpariah/4150220583/
        ✤ http://www.flickr.com/photos/stevewall/6057281066/
        ✤ http://www.flickr.com/photos/mybloodyself/5879425774/
        ✤ http://www.flickr.com/photos/josephrobertson/92849605/
        ✤ http://en.wikipedia.org/
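
A minimal sketch of the run arithmetic behind slides 11 to 16, written in Python rather than the awk of the speaker note. It assumes, as that note does, that each step continues in the same direction with probability 0.499, so a run of n steps has improbability 100 - 0.499^n * 100. The function names and the sample QPS values are illustrative and do not come from the talk.

```python
from itertools import groupby


def run_improbability(length, p=0.499):
    """Percent improbability of a run of `length` same-direction steps.

    Mirrors the awk expression 100-(.499**$1*100); p=0.499 is taken from it.
    """
    return 100 - (p ** length) * 100


def signed_runs(samples):
    """Collapse a series into signed run lengths.

    +3 means three increases in a row, -2 means two decreases in a row.
    Steps where the value did not change end a run and are not counted.
    """
    steps = [(b > a) - (b < a) for a, b in zip(samples, samples[1:])]
    return [sign * len(list(group)) for sign, group in groupby(steps) if sign != 0]


# Print the improbability for run lengths 1 through 8.
for n in range(1, 9):
    print(f"{n:2d}  {run_improbability(n):.2f}%")

# Made-up QPS samples, for illustration only.
qps = [100, 104, 109, 115, 112, 111, 111, 120]
print(signed_runs(qps))
```

A detector built this way would still need the later steps from the talk (combining algorithms, filtering false positives); the sketch only covers measuring run length and looking up how unlikely that length is.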

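One possible reading of the variance-to-mean filter from slides 37 to 39, sketched in Python. It computes the index of dispersion (variance divided by mean) over sliding windows, takes the 99.7th percentile on a well-behaved baseline, and flags windows in other data that exceed it. The window size, the nearest-rank percentile, the function names, and the sample data are all assumptions; the slides do not say how the ratio was computed.

```python
import math
from statistics import mean, pvariance


def index_of_dispersion(window):
    """Variance-to-mean ratio of one window (assumes non-negative values)."""
    m = mean(window)
    if m == 0:
        return 0.0  # all samples are zero, so there is no dispersion
    return pvariance(window, m) / m


def rolling_vmr(samples, size):
    """Index of dispersion for every full window of `size` samples."""
    return [index_of_dispersion(samples[i:i + size])
            for i in range(len(samples) - size + 1)]


def percentile(values, pct):
    """Nearest-rank percentile (an assumed choice of method)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def flag_windows(baseline, live, size, pct=99.7):
    """Start indexes of live windows whose V:M exceeds the baseline percentile."""
    threshold = percentile(rolling_vmr(baseline, size), pct)
    return [i for i, v in enumerate(rolling_vmr(live, size)) if v > threshold]


# Made-up samples: a steady baseline and a live series with one spike.
baseline = [10, 11, 9, 10, 11, 10, 9, 10]
live = [10, 10, 50, 1, 10]
print(flag_windows(baseline, live, size=3))
```

With so few baseline windows the 99.7th percentile is simply the largest baseline value; a real baseline would need far more samples for that percentile to mean anything.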