0
Integrating Nagios with<br />Test Driven Development	<br />Nathan Vonnahme<br />Nathan.Vonnahme@bannerhealth.com<br />Inte...
2011<br />2<br />Intro<br /><ul><li>9 years of hospital IT
Developer -> sysadmin
Sysadmin -> developer</li></li></ul><li>2011<br />3<br />Central Idea<br />Monitoring tools are to the sysadmin what testi...
Blog post<br />goo.gl/Oc2rn <br />CASE sensitive!<br />2011<br />
Overview<br />Test-driven Development<br />Monitoring and TDD<br />Sample tools<br />Further ideas/implications<br />2011<...
I.Test-driven Development (TDD)<br />2011<br />Nagios World Conference<br />6<br />
KwaliteeUhshuranse (QA)<br />What is “quality”?<br />“In its broadest sense, quality is a degree of excellence: the extent...
Quality is a result<br />“Quality is the result of a comparison between what was required and what was provided. It is jud...
How do you get quality?<br />By, among other things:<br /><ul><li>verifying before delivery that products and services pos...
preventing the supply of products and services which possess features which dissatisfy customers</li></li></ul><li>Testing...
Catches defects</li></li></ul><li>Software Testing Types<br />(Names/categories vary)<br />Unit<br />Integration<br />User...
1.  Unit testing<br /><ul><li>Does each individual piece do what we expect?
Yes or no?</li></li></ul><li>2. Integration testing<br /><ul><li>Do the units work together?
We need a parallel universe!</li></li></ul><li>3. UserAcceptance testing<br /><ul><li>AKA “Functional testing” or “Custome...
Does it work for the actual user?
Does it do what we promised?
“User stories”</li></ul>As a user closing the application, <br />I want to be prompted to save anything that has changed s...
Test Automation<br />The robots will do our work for us!<br />
Why automate?<br /><ul><li>Thorough
Consistent
Fast
More likely to actually get run
Running scoreboard
“us 43 them 0”</li></li></ul><li>TDD (~1999)<br />Testing is such a good idea, let’s do it<br /><ul><li>Early (test first,...
Always (test automation)
Often (continuous integration)</li></ul>2011<br />
Continuous Integration<br />Run the test suite whenever source code changes!<br />Alarm if any tests fail!<br />2011<br />
Behavior-driven development (~2003)<br />BDD builds on TDD but especially emphasizes acceptance testing, and begins by wri...
II. Monitoring and TDD<br />2011<br />Nagios World Conference<br />20<br />
Monitoring as TDD<br />At the highest level, monitoring is QA.<br />2011<br />
Translation needed<br />How do these types of testing translate for the sysadmin?<br /><ul><li>Unit
Integration
Acceptance</li></ul>2011<br />
Unit<br /><ul><li>Hosts pingable
Filesystems writable
CPU, memory properly utilized
OS functioning
Services running</li></ul>2011<br />
Unit Monitoring with Nagios<br />Many plugins for “unit” monitoring:<br /><ul><li>check_ping
check_swap
check_load
check_disk
check_dhcp
check_http
check_ldap
check_mysql
check_nt
check_ntp_peer
check_pgsql
check_procs
check_radius
check_smtp
check_ssh</li></ul>2011<br />
Integration<br /><ul><li>Cross-machine functions
Database queryable
Data fresh
Expected scheduled processes have completed
Backups worked
Printing works</li></ul>2011<br />
Integration Monitoring with Nagios<br />A few plugins for “integration” monitoring:<br /><ul><li>check_mysql_query
check_hpjd  ?</li></ul>Usually requires custom plugins. Some of mine:<br /><ul><li>check_backup
check_printer
check_app_logins
check_chartserver*</li></ul>2011<br />*See my other talk.<br />
User Acceptance<br /><ul><li>Required functions
Acceptable speed
Expected security/access
Core expectations
Upcoming SlideShare
Loading in...5
×

Nagios Conference 2011 - Nathan Vonnahme - Integrating Nagios With Test Driven Development

1,331

Published on

Nathan Vonnahme's presentation on integrating Nagios with test driven development. The presentation was given during the Nagios World Conference North America held Sept 27-29th, 2011 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
1,331
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
24
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • Write this on a board?
  • E.g. not “Does the Adder object add numbers properly?” but “Can the cashier complete a sale?”
  • We’re all trying to deliver quality. XKCD mouseover text: “The weird sense of duty really good sysadmins have can border on the sociopathic, but it&apos;s nice to know that it stands between the forces of darkness and your cat blog&apos;s servers.”
  • We certainlyneed to monitor these things! They particularly speed problem resolution.
  • *See my other talk.
  • http://www.perl.org/about/whitepapers/perl-testing.html
  • (Plug my other talk)
  • Transcript of "Nagios Conference 2011 - Nathan Vonnahme - Integrating Nagios With Test Driven Development"

    1. 1. Integrating Nagios with<br />Test Driven Development <br />Nathan Vonnahme<br />Nathan.Vonnahme@bannerhealth.com<br />Integrating Nagios with<br />Test Driven Development<br />
    2. 2. 2011<br />2<br />Intro<br /><ul><li>9 years of hospital IT
    3. 3. Developer -> sysadmin
    4. 4. Sysadmin -> developer</li></li></ul><li>2011<br />3<br />Central Idea<br />Monitoring tools are to the sysadmin what testing tools are to the developer.<br />
    5. 5. Blog post<br />goo.gl/Oc2rn <br />CASE sensitive!<br />2011<br />
    6. 6. Overview<br />Test-driven Development<br />Monitoring and TDD<br />Sample tools<br />Further ideas/implications<br />2011<br />
    7. 7. I.Test-driven Development (TDD)<br />2011<br />Nagios World Conference<br />6<br />
    8. 8. KwaliteeUhshuranse (QA)<br />What is “quality”?<br />“In its broadest sense, quality is a degree of excellence: the extent to which something is fit for its purpose. In the narrow sense, product or service quality is defined as conformance with requirement, freedom from defects or contamination, or simply a degree of customer satisfaction.”<br />
    9. 9. Quality is a result<br />“Quality is the result of a comparison between what was required and what was provided. It is judged not by the producer but by the receiver.”<br />
    10. 10. How do you get quality?<br />By, among other things:<br /><ul><li>verifying before delivery that products and services possess the features required
    11. 11. preventing the supply of products and services which possess features which dissatisfy customers</li></li></ul><li>Testing<br /><ul><li>Ensures essential function
    12. 12. Catches defects</li></li></ul><li>Software Testing Types<br />(Names/categories vary)<br />Unit<br />Integration<br />User Acceptance<br />
    13. 13. 1. Unit testing<br /><ul><li>Does each individual piece do what we expect?
    14. 14. Yes or no?</li></li></ul><li>2. Integration testing<br /><ul><li>Do the units work together?
    15. 15. We need a parallel universe!</li></li></ul><li>3. UserAcceptance testing<br /><ul><li>AKA “Functional testing” or “Customer testing”
    16. 16. Does it work for the actual user?
    17. 17. Does it do what we promised?
    18. 18. “User stories”</li></ul>As a user closing the application, <br />I want to be prompted to save anything that has changed since the last save <br />so that I can preserve useful work and discard erroneous work. <br />
    19. 19. Test Automation<br />The robots will do our work for us!<br />
    20. 20. Why automate?<br /><ul><li>Thorough
    21. 21. Consistent
    22. 22. Fast
    23. 23. More likely to actually get run
    24. 24. Running scoreboard
    25. 25. “us 43 them 0”</li></li></ul><li>TDD (~1999)<br />Testing is such a good idea, let’s do it<br /><ul><li>Early (test first, then implement)
    26. 26. Always (test automation)
    27. 27. Often (continuous integration)</li></ul>2011<br />
    28. 28. Continuous Integration<br />Run the test suite whenever source code changes!<br />Alarm if any tests fail!<br />2011<br />
    29. 29. Behavior-driven development (~2003)<br />BDD builds on TDD but especially emphasizes acceptance testing, and begins by writing user stories, not unit tests.<br />2011<br />
    30. 30. II. Monitoring and TDD<br />2011<br />Nagios World Conference<br />20<br />
    31. 31. Monitoring as TDD<br />At the highest level, monitoring is QA.<br />2011<br />
    32. 32. Translation needed<br />How do these types of testing translate for the sysadmin?<br /><ul><li>Unit
    33. 33. Integration
    34. 34. Acceptance</li></ul>2011<br />
    35. 35. Unit<br /><ul><li>Hosts pingable
    36. 36. Filesystems writable
    37. 37. CPU, memory properly utilized
    38. 38. OS functioning
    39. 39. Services running</li></ul>2011<br />
    40. 40. Unit Monitoring with Nagios<br />Many plugins for “unit” monitoring:<br /><ul><li>check_ping
    41. 41. check_swap
    42. 42. check_load
    43. 43. check_disk
    44. 44. check_dhcp
    45. 45. check_http
    46. 46. check_ldap
    47. 47. check_mysql
    48. 48. check_nt
    49. 49. check_ntp_peer
    50. 50. check_pgsql
    51. 51. check_procs
    52. 52. check_radius
    53. 53. check_smtp
    54. 54. check_ssh</li></ul>2011<br />
    55. 55. Integration<br /><ul><li>Cross-machine functions
    56. 56. Database queryable
    57. 57. Data fresh
    58. 58. Expected scheduled processes have completed
    59. 59. Backups worked
    60. 60. Printing works</li></ul>2011<br />
    61. 61. Integration Monitoring with Nagios<br />A few plugins for “integration” monitoring:<br /><ul><li>check_mysql_query
    62. 62. check_hpjd ?</li></ul>Usually requires custom plugins. Some of mine:<br /><ul><li>check_backup
    63. 63. check_printer
    64. 64. check_app_logins
    65. 65. check_chartserver*</li></ul>2011<br />*See my other talk.<br />
    66. 66. User Acceptance<br /><ul><li>Required functions
    67. 67. Acceptable speed
    68. 68. Expected security/access
    69. 69. Core expectations
    70. 70. Homepage isn’t defaced
    71. 71. User can log in
    72. 72. Customer can buy stuff</li></ul>2011<br />
    73. 73. User Acceptance Monitoring with Nagios<br /><ul><li>Maybe some with check_http.</li></ul>2011<br />
    74. 74. Growing Test Coverage<br />Mark Jason Dominus: <br /><ul><li>Start by saying "I'll write one [Nagios check]".
    75. 75. You don't have to write all the [Nagios checks] before you start the project.</li></ul>2011<br />
    76. 76. Test Bugs<br />“When you fix (or find) a bug, add a [Nagios check] for it”<br /><ul><li>You’ll fix it much faster next time
    77. 77. For serious repeating issues, add an event handler!</li></ul>2011<br />
    78. 78. Test Features<br />“When you add a feature, add a [Nagios check] for just that feature”<br /><ul><li>A new service
    79. 79. A new piece of hardware</li></ul>2011<br />
    80. 80. Test Sysadmins<br /><ul><li>That thing you always forget to turn back on
    81. 81. That task you always procrastinate (SSL cert expiry?)
    82. 82. That filesystem that doesn’t always mount
    83. 83. That long process you forget to verify (backups, anyone?)
    84. 84. Random and senseless acts of system administration</li></ul>2011<br />
    85. 85. Test Users<br /><ul><li>“Login is slow!!!”
    86. 86. How slow?</li></ul>2011<br />
    87. 87. III. Sample Tools for Nagios<br />Reusing software testing tools for monitoring<br />2011<br />Nagios World Conference<br />34<br />
    88. 88. 2011<br />35<br />TAP – Test Anything Protocol<br />Testanything.org<br />
    89. 89. TAP – Test Anything Protocol<br /><ul><li>Widespread for Perl, many others
    90. 90. Simple output (kind of like Nagios plugins)</li></ul>1..4 ok 1 - Input file opened not ok 2 - First line of the input valid ok 3 - Read the rest of the file not ok 4 - Summarized correctly<br />2011<br />
    91. 91. TAP is widely used for Perl unit tests<br />The core Perl language has over 92,000 tests <br />There are over 142,000 tests for the libraries bundled with it.<br />Whenever you install a CPAN module:<br />t/check_stuff.t ................... ok<br />t/Nagios-Plugin-01.t .............. ok<br />t/Nagios-Plugin-02.t .............. ok<br />t/Nagios-Plugin-03.t .............. ok<br />t/Nagios-Plugin-04.t .............. Ok<br />t/Nagios-Plugin-Functions-01.t .... ok<br />t/Nagios-Plugin-Functions-02.t .... Ok<br /> . . . <br />All tests successful.<br />Files=16, Tests=971, 5 wallclocksecs ( 0.21 usr 0.07 sys + 4.27 cusr 0.42 csys = 4.97 CPU)<br />Result: PASS<br />2011<br />
    92. 92. TAP producer example<br />Writing test scripts is super easy.<br />#!/usr/bin/perl -wuse Test::Simple tests =>1;ok(1+1==2);<br />Other languages likewise.<br />Start (with Perl) by reading Test::Tutorial.<br />goo.gl/7C7pb <br />2011<br />
    93. 93. check_tap.pl<br />Explained in blog. Available at goo.gl/j8Xt2<br />check_tap.pl -s /full/path/to/testfoo.pl<br /> will run ‘testfoo.pl’ and return OK if 0 tests fail, but CRITICAL if any fail. <br />check_tap.pl -s /full/path/to/testfoo.pl -c 2<br /> will return OK if 0 tests fail, WARNING if more than 0 tests fail, and CRITICAL if more than 2 fail.<br />2011<br />
    94. 94. check_tap.pl<br />Non-Perl and remote test scripts<br />check_tap.pl -e '/usr/bin/ruby -w' -s /full/path/to/testfoo.r<br /> will run ‘testfoo.r’ using Ruby with the -w flag.<br />You can use any shell command and argument which produces TAP output, for example:<br />check_tap.pl -e '/usr/bin/curl -sk' -s 'http://url/to/mytest.php'<br />check_tap.pl -e '/usr/bin/cat' -s '/path/to/testoutput.tap'<br />In fact, anything TAP::Harness or prove regards as a source or executable.<br />2011<br />
    95. 95. Ex. 1: firewall ports<br />use Test::Ping;# written by NJV 2005; not from CPANplan tests =>13;ping_ok(qw~localhost http ~);ping_ok(qw~localhost https ~);diag"app server internal_server";ping_ok('internal_server.my.com','http');service_ok('internal_server.my.com','http');ping_ok('internal_server.my.com','https');service_ok('internal_server.my.com','https');ping_ok('internal_server.my.com','1433',‘internal_serverpingable on MS SQL port 1433');service_ok('internal_server.my.com','1433‘, ‘mssql responds’);ping_ok('internal_server.my.com','3306',‘internal_serverpingable on MySQL port 3306');service_ok('internal_server.my.com','3306‘, ‘mysql responds’);diag"testing open ports to other inside hostsn";service_ok('smtp.my.com','smtp',"can contact smtp (email) service on smtp.my.com");<br />2011<br />
    96. 96. Ex. 1: firewall ports – OUTPUT<br />1..13<br />ok 1 - this script is running on (xxx) the physician portal webserver<br />ok 2 - can connect to localhost:http<br />ok 3 - can connect to localhost:https<br /># app server internal_server<br />ok 4 - can connect to internal_server.my.com:http<br />ok 5 - service is answering on internal_server.my.com:http<br />ok 6 - can connect to internal_server.my.com:https<br />ok 7 - service is answering on internal_server.my.com:https<br />ok 8 - can contact internal_server (web app server) on MS SQL port 1433<br />ok 9 - service is answering on internal_server.my.com:1433<br />ok 10 - can contact internal_server (web app server) on MySQL port 3306<br />ok 11 - service is answering on internal_server.my.com:3306<br /># testing open ports to other inside hosts<br />ok 12 - can contact smtp (email) service on smtp.my.com<br />ok 13 - Survived to end<br />Check command:<br />check_tap.pl -v -e '/usr/bin/curl -sk' -s https://$HOSTADDRESS$/foo/porttest.pl<br />2011<br />
    97. 97. Ex. 2: farm<br /> plan tests =>29;# test thyself!my$me= hostname;like($me,qr/^portal/,"this script is running on ($me) a physician portal webserver");diag"checking citrix farm connectivityn";my@stas=qw(vcitrixdc01.my.comvcitrixdc02.my.com);foreach(@stas){ping_ok($_,'http');service_ok($_,'http',"XML service on $_ is responding in some way");}my@citrix_farm=qw(citrixps01.my.com citrixps02.my.com# ... citrixps30.my.com);foreach(@stas,@citrix_farm){service_ok($_,'1494',"ICA service on $_ is responding in some way");}<br />2011<br />
    98. 98. Ex. 2 OUTPUT<br />1..29<br />ok 1 - this script is running on (xxx) a physician portal webserver<br /># checking citrix farm connectivity<br />ok 2 - can connect to vcitrixdc01.my.com:http<br />ok 3 - XML service on vcitrixdc01.my.com is responding in some way<br />ok 4 - can connect to vcitrixdc02.my.com:http<br />ok 5 - XML service on vcitrixdc02.my.com is responding in some way<br />ok 6 - ICA service on vcitrixdc01.my.com is responding in some way<br /># ...<br />ok 13 - ICA service on citrixps06.my.com is responding in some way<br />not ok 14 - ICA service on citrixps07.my.com is responding in some way<br /># Failed test (c:inetpubwwwrootpp_nagioscheck_citrix.pl at line 80)<br /># ...<br /># Looks like you failed 1 tests of 29.<br />Check command:<br />check_tap.pl -v -e '/usr/bin/curl -sk' -s https://$HOSTADDRESS$/foo/porttest.pl -w 3 -c 6<br />TAP OK - FAIL. 1 of 29 failed. Warning/critical at 2/3. 1st='not ok 14 - ICA service on citrixps07.my.com is responding in some way' | total=29;; failures=1;; passed=28;;<br />2011<br />
    99. 99. Ex. 3 Symfony2 Functional tests<br /><ul><li>Symfony2 == next-generation MVC framework for PHP apps
    100. 100. Symfony2 uses the PHPUnit framework.
    101. 101. PHPUnit supports TAP output.
    102. 102. Good developers are already writing these.</li></ul>2011<br />
    103. 103. Ex. 3 Symfony2 functional tests<br />publicfunctiontestSearch() {$client=$this->createClient();$crawler=$client->request('GET','/');$this->assertTrue($client->getResponse()->isSuccessful());$form=$crawler->selectButton('Search')->form();$form['q']='chest pain';$crawler=$client->submit($form);$this->assertTrue($client->getResponse()->isSuccessful());$this->assertTrue($crawler->filter('h3')->count()>0);<br />}<br />2011<br />
    104. 104. Ex. 3 Symfony2 functional tests<br />$ phpunit.bat --tap -c app src/BHS/.../DefaultControllerTest.php<br />TAP version 13<br />ok 1 - BHS...DefaultControllerTest::testIndex<br />ok 2 - BHS...DefaultControllerTest::testSearch<br />ok 3 - BHS...DefaultControllerTest::testCareset<br />1..3<br />2011<br />
    105. 105. Ex. 3 Symfony2 functional tests<br />$ perl check_tap.pl -e 'c:/xampp/php/phpunit.bat --tap-c c:/web/cpoe_ordersets/app' -s 'c:/web/cpoe_ordersets/src/BHS/CPOEBundle/Tests/Controller/DefaultControllerTest.php'<br />TAP OK - PASS | total=3;; failures=0;; passed=3;;<br />2011<br />
    106. 106. Cucumber<br />cukes.info<br />2011<br />
    107. 107. cucumber-nagios<br />From the Ruby on Rails community.<br />Thanks to RanjibDey for mentioning it to me!<br />2011<br />
    108. 108. cucumber-nagios<br />Cucumber is for BDD in “natural” language:<br />Feature: google.com It should be up And I should be able to search for things Scenario: Searching for things When I go to "http://www.google.com.au/" And I fill in "q" with "wikipedia" And I press "Google Search" Then I should see "www.wikipedia.org"<br />2011<br />
    109. 109. cucumber-nagios<br />cucumber-nagios hooks Nagios up to these features:<br />$ bin/cucumber-nagios features/ebay.com.au/bidding.feature CUCUMBER OK - Critical: 0, Warning: 0, 4 okay<br />2011<br />
    110. 110. IV. Ideas/implications<br />2011<br />
    111. 111. 2011<br />54<br />Test in production<br />Your software came with tests, right?<br />Can you get them?<br />Can you write them?<br />
    112. 112. Monitor in development<br /><ul><li>Build confidence
    113. 113. Leverage Nagios’ strengths
    114. 114. Not a CI tool but:
    115. 115. Alerting rules
    116. 116. Escalation
    117. 117. Service handlers</li></ul>2011<br />
    118. 118. Tolerate ambiguity<br />How much failure is worth being paged in the middle of the night?<br />Example: Our Citrix farm<br />2011<br />
    119. 119. Check from both sides<br /><ul><li>I can use our web app; can the user?
    120. 120. Firewalls need asymmetrical testing</li></ul>2011<br />
    121. 121. Monitoring at a higher level<br /><ul><li>Measure and monitor the things the user sees/feels</li></li></ul><li>Monitoring at a higher level<br /><ul><li>Selenium tests web apps using actual browsers
    122. 122. AutoIt, AppleScript for GUI apps?</li></ul>2011<br />
    123. 123. Monitoring at a higher level<br />Testing mobile platforms is still hard<br />2011<br />
    124. 124. 2011<br />61<br />CENTRAL IDEA<br />Monitoring tools are to the sysadmin what testing tools are to the developer.<br />
    125. 125. Takeaways<br /><ul><li>Pursue quality
    126. 126. Learn from software engineering test practices
    127. 127. Monitor at multiple levels
    128. 128. Unit
    129. 129. Integration
    130. 130. Acceptance
    131. 131. Grow your monitoring suite gradually
    132. 132. Re-use testing tools for monitoring</li></ul>2011<br />
    133. 133. Blatant plug<br />Come to my other talk at 4:30!<br />Writing Custom Nagios Plugins In PerlA hands-on workshop walking you through writing your first custom check script in Perl using the Nagios::Plugin toolkit. Basic familiarity with Perl, Ruby, PHP, or shell scripting would help, but the basic concepts will be transferable to other languages.<br />2011<br />
    134. 134. Questions? Comments?<br />Comments optional at my blog post:<br />goo.gl/Oc2rn <br />2011<br />
    135. 135. Bonus Section<br />
    136. 136. Bad restaurant<br />Customer drinks water<br />Customer gets thirsty<br />Notices water glass is empty<br />Waits<br />Gets mad<br />2011<br />
    137. 137. Bad restaurant<br />Flags waiter (waiter rolls eyes)<br />Waiter opens a support ticket and assigns it to the busser<br />Busser is on break<br />
    138. 138. Bad restaurant<br />Customer waits, flags waiter again<br />Busserrefills water<br />Customer drinks and leaves a nice big tip<br />
    139. 139. Nice restaurants employ ninja waiters<br />The customer never notices but the glass is always full.<br />
    1. A particular slide catching your eye?

      Clipping is a handy way to collect important slides you want to go back to later.

    ×