Nagios Conference 2012 - Nathan Vonnahme - Monitoring the User Experience

Nathan Vonnahme's presentation on using Nagios
The presentation was given during the Nagios World Conference North America held Sept 25-28th, 2012 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna

  • Photos: http://www.newsminer.com/pages/features_our_town/
  • Photo: http://www.uaf.edu/files/news/featured/03/tvc/index.html
  • Image: http://availabilityadvisor.com/2012/01/10/how-much-availability-is-enough/
  • Les Claypool: Primus performance sucks! Image: http://www.thehazardreport.com/2012/09/primus-sucks-at-club-nokia-part-ii-2010.html
  • Quote from “Controlling Your Environment Makes You Happy” http://www.joelonsoftware.com/uibook/chapters/fog0000000057.html ----- Meeting Notes (9/24/12 15:05) ----- THIS IS NOT AN IPHONE 5!
  • Image: http://www.couriermail.com.au/spike/columnists/blame-game-in-full-swing-amid-election-2010-fallout/story-e6frerex-1225911543729
  • Img http://axtonlennox111.deviantart.com/art/1-2-3-NOT-IT-dam-242775890
  • AdlibRegister basically runs every quarter second until unregistered.
  • Img http://livelovelaugh-lace1013.blogspot.com/2008_10_01_archive.html
  • This is why the nagios.com article uses notepad to write output to a file… I guess it works!
  • This Citrix performance is what we actually deliver to our users. Note how much more erratic it is too.
  • Much more similar Y scale: the performance doesn’t change that much. This makes the spikes much more interesting. What was going on with the backend on Thursday?
  • ----- Meeting Notes (9/24/12 15:05) ----- need darker background

    1. Monitoring the User Experience for Availability and Performance. Nathan Vonnahme, Nathan.vonnahme@bannerhealth.com
    2. Hi from Fairbanks ~100,000 people 2012
    3. Not very big: Fairbanks Memorial Hospital • ~150 beds • ~1500 employees; ~40 IT dept. • around 300 production servers (~75% VMware)... roughly 400 "apps" • 24x7x365.25 as only healthcare can be • Nagios monitors 113 hosts (includes some devices like switches and UPSes), but 442 services. • Nagios in the gaps
    4. Context • Last year: “Monitoring and Test Driven Development”
    5. Last year – central idea: Monitoring tools are to the sysadmin what testing tools are to the developer.
    6. Last year
    7. Last year: “Not a CI [Continuous Integration] tool” • Not as coupled to build processes • We don’t build most of our software! • Strengths in flexibility, alerting, production monitoring • Potentially weak in being IT-focused instead of customer-focused.
    8. QA and the Manufacturing Metaphor: Testing the end product
    9. The main idea: Can users actually use it? Does it suck?
    10. Availability: Can users actually use it? If the database server won’t ping, that’s obviously important. If the network is down, duh. But is your monitoring coverage good enough to prove that your end product is delivering?
    11. Performance: Does it suck? Let’s be objective. The user doesn’t care why.
    12. It matters: “I started to learn that the days when I was happiest were the days with lots of small successes and few small frustrations.” –Joel Spolsky (Image caption: THIS IS NOT AN IPHONE 5!)
    13. Two examples: 1. A mission-critical legacy Windows app with a complicated architecture 2. A cooler modern web app
    14. LEGACY WINDOWS APP
    15. Cerner Electronic Medical Record. Note: Only fake patients in this presentation.
    16. Complex architecture • SAN • AIX • Oracle • Middleware • Another SAN • VMware • Windows • Citrix • Network • Desktop • Everything x 2 for High Availability
    17. A quick definition of Citrix: The users are remote-controlling the application running on the Citrix server. The “client” runs on the Citrix server. The user’s actual workstation runs a “thin client”. 90% or more of Cerner customers use Citrix. For comparison: old school “fat client”
    18. Complex support team
    19. Complex monitoring • Not just Nagios for monitoring. • Most things are well-monitored by Somebody • Certain things are “lost and found” and Nobody wants to touch them.
    20. Can the users actually use it? AutoIt lets us drive it like a user. 1. Log in 2. Search for a (fake) patient 3. Open the patient’s chart.
    21. Do it live.
    22. The Live Demo: Interactive demo with a test patient in production.
    23. Runify: "c:\program files\nsclient++\scripts\check_powerchart.exe" fat -w5,1,1 -c40,5,5 • Compiled .exe • Use fat or citrix client • WARNING if more than 5,1,1 seconds for login, search, chart • CRITICAL over 40,5,5
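The per-step thresholds on this slide (`-w5,1,1 -c40,5,5` for login, search, chart) can be illustrated with a small sketch. It is written in JavaScript, the language used for the web checks later in the deck, and it is not the speaker's AutoIt code: the function names `parseThresholds` and `evaluate` are hypothetical, and the sketch assumes a step counts as slow only when its time is strictly greater than its limit.

```javascript
// Hypothetical sketch: the real check is a compiled AutoIt .exe.
// Step order follows the slide: login, search, chart.
const STEPS = ["login", "search", "chart"];

// Parse a threshold string such as "5,1,1" into one number of seconds per step.
function parseThresholds(text) {
  const values = text.split(",").map((part) => parseFloat(part));
  if (values.length !== STEPS.length || values.some(Number.isNaN)) {
    throw new Error("expected " + STEPS.length + " comma-separated numbers, got: " + text);
  }
  return values;
}

// Compare measured step times (seconds) with the warning and critical limits.
// Returns a Nagios exit code (0 OK, 1 WARNING, 2 CRITICAL) and a status line.
function evaluate(timings, warnText, critText) {
  const warn = parseThresholds(warnText);
  const crit = parseThresholds(critText);
  const slowerThan = (limits) => STEPS.filter((_, i) => timings[i] > limits[i]);

  const critical = slowerThan(crit);
  if (critical.length > 0) {
    return { code: 2, message: "CRITICAL - slow steps: " + critical.join(", ") };
  }
  const warning = slowerThan(warn);
  if (warning.length > 0) {
    return { code: 1, message: "WARNING - slow steps: " + warning.join(", ") };
  }
  return { code: 0, message: "OK - all steps within thresholds" };
}

// Made-up timings: login took 6.2 s, which is over the 5 s warning limit.
const result = evaluate([6.2, 0.8, 0.9], "5,1,1", "40,5,5");
console.log(result.message);
```

Checking the critical limits first means a step that exceeds both limits is reported once, at the more severe level.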
    24. The AutoIt Script: Open in SciTE and glance through. Note, Nagios.com has a couple of good articles by Sam Lansing about using AutoIt. Also see the presentation he did just before this one!
    25. NSCP config
        [/settings/external scripts/scripts]
        check_ctx = scripts\check_powerchart.exe citrix -w20,5,5 -c40,5,5
        check_fat = scripts\check_powerchart.exe fat -w20,5,5 -c40,5,5
    26. Gotchas: too many running at once. Runs a KillHungProcesses() routine at beginning and end to clean them up. Also set max_check_commands=1 in config for this host.
    27. Gotchas: Occasional popups
        AdlibRegister("HandleRelationship")
        Func HandleRelationship()
            If WinActive("Assign a Relationship") Then
                Send("cc{ENTER}")
            EndIf
        EndFunc
        ...
        AdlibUnRegister("HandleRelationship")
    28. Gotchas: Stupid typer. I repeatedly mistakenly added lines to the wrong section of the nsclient.ini file.
    29. Gotchas: Service vs. Interactive GUI
    30. Gotchas: AutoIt output to STDOUT. You have to compile to .exe with the “Console” checkbox, or with the /console switch on the command-line aut2exe compiler:
        "%programfiles%\Autoit3\aut2exe\aut2exe.exe" /in check_powerchart.au3 /console /nopack
    31. Gotchas: Annoying phantom icons in the tray. Slow too! Solution: LP#TrayIconBuster
    32. The results: Citrix. 1 week of 15 minute samples
    33. The results: Fat client. Note the much different Y axis scale
    34. Citrix vs Fat: icon to login (zoomed to 1.5 days)
    35. Citrix vs Fat: open one patient’s chart
    36. The advantages • Continuous monitoring of the end product • Nagios alerting, escalating, whatever. • Control/experiment: just the two (Citrix vs fat) tell us a lot • We have a historical benchmark to compare anomalies with • We’ve already identified a symptomatic Citrix problem we can treat with a Nagios event handler.
    37. Still TODO: • Thresholds don’t yet affect OK/WARNING/CRITICAL • Thresholds in perfdata • Make it less noisy for less ignorable notifications • Make it run more often with tighter thresholds now that we have a benchmark. • Move it to a dedicated VM/user accounts • More experiment groups: no antivirus; no Citrix printers; no VMware (fat, no Citrix)
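As an illustration of the “Thresholds in perfdata” item, here is a minimal JavaScript sketch of a plugin output line that follows the standard Nagios performance-data layout, `label=value[UOM];warn;crit`, after the `|` separator. The function names, metric labels, summary text, and timings are all assumptions made up for the example; none of them come from the speaker's AutoIt script.

```javascript
// Hypothetical sketch of Nagios plugin output with thresholds in the perfdata.
// Each metric becomes label=value[UOM];warn;crit, with "s" (seconds) as the unit.
function formatPerfdata(metrics) {
  return metrics
    .map((m) => `${m.label}=${m.value.toFixed(2)}s;${m.warn};${m.crit}`)
    .join(" ");
}

// Human-readable status text goes before the "|", perfdata after it.
function formatOutput(status, summary, metrics) {
  return `${status} - ${summary} | ${formatPerfdata(metrics)}`;
}

// Made-up timings, with the -w20,5,5 -c40,5,5 thresholds from the NSCP config slide.
const line = formatOutput("OK", "PowerChart opened a test chart", [
  { label: "login", value: 4.2, warn: 20, crit: 40 },
  { label: "search", value: 0.73, warn: 5, crit: 5 },
  { label: "chart", value: 1.5, warn: 5, crit: 5 },
]);
console.log(line);
// prints: OK - PowerChart opened a test chart | login=4.20s;20;40 search=0.73s;5;5 chart=1.50s;5;5
```

Labels that contain spaces would need single quotes around them in real perfdata; the single-word labels here avoid that.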
    38. Questions/comments about this example?
    39. MODERN WEB APP
    40. Demo: fairbanks.bannerhealth.com/cpoe_ordersets It’s not a fair comparison, just an example.
    41. Modern == Ajax. Notice there’s not a full page load. How do we tell if users can actually use this? How can we tell if it sucks?
    42. Options: Perl/PHP. I’ve done this… Symfony functional scripts, for example. They’ll never do JavaScript or AJAX. I worry it won’t match the top level, what the user sees.
    43. Options: Selenium • Runs a real browser or browsers, kind of like AutoIt • Other people are using it quite successfully (e.g. Nathan Broderick 2011; Sam Lansing 2012) • I have not gotten into it… images of too many browser windows? • Not very mobile-like
    44. PhantomJS: Headless webkit. “As of July 2012 [webkit] has the most market share of any layout engine at over 40% of the browser market share according to StatCounter.” -- http://en.wikipedia.org/wiki/WebKit
    45. PhantomJS • Allows you to write client scripts in JavaScript, executed by its own fast JavaScript engine (JavaScriptCore, which ships with WebKit, rather than V8) • “Always bet on JS” – Brendan Eich, original author of JavaScript • “JavaScript is funny in that most people don’t actually learn it before they start using it.” – Douglas Crockford (paraphrase), author of JavaScript: The Good Parts
    46. CasperJS: Nicer JS API on top of PhantomJS
    47. SHOW ME THE CODEZ: Quick Walkthrough in Aquamacs or Gist. Demo failure + screenshot
    48. Works on Nagios XI host • A little bit heavy (it is firing up a WebKit) • Zombie.js looks promising as a lighter (pure JS in Node) JS-capable headless browser • Quick CasperJS install instructions on my blog, n8v.enteuxis.org
    49. The results – 10 minute interval: Boring… note total amplitude.
    50. The benefits • Ongoing assurance that it is basically usable • If it’s usable, all its layers must be working. • Benchmarked performance so we can evaluate the end result of changes
    51. Still TODO • Check timestamp at bottom of page • Run the same checks on an internal development version • Upgrade jQuery UI and see if performance changes • Check from an Internet host
    52. Main idea again: Can the users actually use it? Does it suck?
    53. Concluding Questions/comments. JS Sample Code at gist.github.com/n8v. Email me if you want the AutoIt code.
    54. Sneak peek: ./check_facebook_friends.js -u nathan.vonnahme -w @202 -c @203 Come to my other talk, “Writing Custom Nagios Plugins” Friday morning to see how it works.
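The `@` in `-w @202 -c @203` is the Nagios plugin guidelines' notation for alerting when a value falls inside a range rather than outside it; a bare number such as `202` means the range 0 to 202. The sketch below shows one way such a range could be parsed and tested in JavaScript. It is not the code behind `check_facebook_friends.js`, which the slide does not show, and `parseRange` and `alerts` are hypothetical names.

```javascript
// Hypothetical sketch of Nagios threshold range parsing.
// Accepted forms: "10", "10:", "~:10", "10:20", each optionally prefixed with "@".
function parseRange(text) {
  const inside = text.startsWith("@");
  const body = inside ? text.slice(1) : text;
  if (body === "") {
    throw new Error("empty range: " + text);
  }
  let start = 0;
  let end = Infinity;
  if (body.includes(":")) {
    const [lo, hi] = body.split(":");
    start = lo === "~" ? -Infinity : Number(lo); // "~" means negative infinity
    end = hi === "" ? Infinity : Number(hi);     // a missing end means no upper bound
  } else {
    end = Number(body);                          // a bare number N means 0..N
  }
  if (Number.isNaN(start) || Number.isNaN(end) || start > end) {
    throw new Error("invalid range: " + text);
  }
  return { start, end, inside };
}

// True when the value should raise an alert for this range.
// Without "@", alert when the value is outside start..end (inclusive);
// with "@", alert when the value is inside it.
function alerts(range, value) {
  const within = value >= range.start && value <= range.end;
  return range.inside ? within : !within;
}

console.log(alerts(parseRange("@202"), 150)); // inside 0..202, so this alerts
console.log(alerts(parseRange("@202"), 250)); // outside 0..202, so no alert
```

Read this way, `-w @202` warns when the count is anywhere from 0 to 202, and `-c @203` goes critical from 0 to 203.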