Have you been stalking your servers?

A presentation for DrupalCon Prague 2013


  1. Have you been stalking your servers?
  2. Have you been stalking your servers?
     Marji Cermak
     Sysadmin & DevOps Engineer at Morpht
     marji@morpht.com
     @cermakm
  3. The rule of 3 things
     picture: http://www.flickr.com/photos/helenaperezgarcia/5692392667/
  4. The rule of 3 things
     1. What is monitoring and why do you want to monitor
     2. Some monitoring tools available for you
     3. It is easy to start with monitoring.
  5. Part 1
     What is monitoring and why do you want to monitor
  6. photo: http://www.flickr.com/photos/tiagopadua/7903366470/
  7. Monitoring
     Monitoring is an intermittent (regular or irregular) series of observations in time, carried out to show the extent of compliance with a formulated standard or degree of deviation from an expected norm.
     J. M. Hellawell (1991), modified by A. Brown (2000), http://jncc.defra.gov.uk/page-2268 nature conservation area
  10. Why you need to monitor
      ● to know about the bad news before your customers (or your boss)
      ● to scale up your server in advance
      ● to tune up your app
  12. The fun of the nines
      Source: http://en.wikipedia.org/wiki/High_availability
      Nines: http://en.wikipedia.org/wiki/List_of_unusual_units_of_measurement#Nines
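The nines table on slide 12 did not survive extraction. As a worked example of what the nines mean: an availability of A percent over a 365-day year allows (100 - A) / 100 × 525,600 minutes of downtime. A minimal Bash sketch of that arithmetic follows; the function name `downtime_per_year` is illustrative, and leap years are ignored.

```bash
#!/usr/bin/env bash
# Allowed downtime per year for a given availability percentage.
# Assumption: a 365-day year (525,600 minutes); leap years are ignored.

# downtime_per_year PERCENT -> prints allowed downtime in minutes (2 decimals)
downtime_per_year() {
  LC_ALL=C awk -v a="$1" 'BEGIN { printf "%.2f\n", (100 - a) / 100 * 365 * 24 * 60 }'
}

# Print a small table for the common "nines".
for a in 99 99.9 99.99 99.999; do
  printf '%7s%% uptime -> %8s minutes/year\n' "$a" "$(downtime_per_year "$a")"
done
```

By this arithmetic, "five nines" leaves roughly 5.26 minutes of downtime per year, against about 3.65 days (5256 minutes) for 99%.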
  14. Why you need to monitor (cont.)
      ● to prove your 99.999% uptime :)
      ● to minimise downtime (downtime is expensive)
      ● to capture customer information
  15. Why you need to monitor (cont.)
      ● to have data and metrics for diagnosing problems
  19. Diagnosing your collected data
      watch out for:
      ● trends
      ● spikes
      ● irregularities
      ● thresholds
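One way to make "spikes" and "thresholds" concrete: the Bash sketch below reads one sample per line (for example, values exported from a graphing tool) and flags samples that cross a fixed threshold or exceed a multiple of the mean of all earlier samples. The helper `flag_samples` and both rules are illustrative assumptions, not how Munin or Nagios evaluate data.

```bash
#!/usr/bin/env bash
# flag_samples THRESHOLD FACTOR < samples   (hypothetical helper)
# Reads one numeric sample per line on stdin and prints "LINE VALUE FLAGS"
# for every sample that
#   - is above the fixed THRESHOLD                       -> "threshold"
#   - is above FACTOR x the mean of all earlier samples  -> "spike"
flag_samples() {
  LC_ALL=C awk -v limit="$1" -v factor="$2" '
    {
      v = $1 + 0
      flags = ""
      if (v > limit) flags = flags " threshold"
      # Compare against the running mean of the samples seen so far.
      if (n > 0 && v > factor * (sum / n)) flags = flags " spike"
      if (flags != "") printf "%d %s%s\n", NR, $1, flags
      # Update the running totals only after the checks above.
      sum += v
      n++
    }'
}
```

For example, `printf '10\n12\n11\n40\n13\n95\n' | flag_samples 90 2` flags sample 4 (40) as a spike and sample 6 (95) as both over the threshold and a spike.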
  20. Areas to monitor
      ● network
      photo: http://www.flickr.com/photos/misja_klimov/2120956405/
  21. Areas to monitor
      ● network
      ● server
      photo: http://www.flickr.com/photos/johnjack/3666997634/
  22. Areas to monitor
      ● network
      ● server
      ● services
      photo: http://www.flickr.com/photos/agustingodet/3691794089/
  23. Areas to monitor
      ● network
      ● server
      ● services
      photo: http://www.flickr.com/photos/agustingodet/3691792393/
  24. Areas to monitor
      ● network
      ● server
      ● services
      ● applications
      photo: http://www.flickr.com/photos/cheerfulstoic/942211994/
  25. Areas to monitor
      ● network
      ● server
      ● services
      ● applications
      ● users
      photo: http://www.flickr.com/photos/jimmysmith/99528596/
  26. Drupal Areas to monitor?
      ● network
      ● server
      ● services
      ● applications
      ● users
  30. Drupal Areas to monitor
      ● network
      ● server
      ● services
        ○ webserver
        ○ database
      ● applications - your Drupal site(s)
      ● users
  32. Part 2
      Some monitoring tools available for you
  33. Meet Nagios, Munin and others
      ● Nagios
      ● Munin
      ● APC dashboard
      ● related Drupal modules
  34. Nagios /ˈnɑːɡiːoʊs/
      ● system, network and infrastructure monitoring software application
      ● monitors and alerts
      ● many plugins
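Most Nagios plugins follow a simple contract: the plugin prints one status line (optionally followed by `|` and performance data) and reports its result through the exit code, 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. A minimal sketch of that contract as a Bash function; `check_percent` and its arguments are hypothetical, and only integer percentages are handled.

```bash
#!/usr/bin/env bash
# Sketch of the Nagios plugin contract:
#   exit code 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
#   output:   "STATUS - text | label=value[UOM];warn;crit;min;max"

# check_percent LABEL VALUE WARN CRIT   (hypothetical helper)
# Prints a status line and returns the matching Nagios exit code.
check_percent() {
  local label=$1 value=$2 warn=$3 crit=$4
  # Anything that is not a non-negative integer is reported as UNKNOWN.
  if ! [[ $value =~ ^[0-9]+$ ]]; then
    echo "UNKNOWN - could not read ${label}"
    return 3
  fi
  value=$((10#$value))   # force base 10 so values like "08" are not read as octal
  local perf="${label}=${value}%;${warn};${crit};0;100"
  if (( value >= crit )); then
    echo "CRITICAL - ${label} at ${value}% | ${perf}"
    return 2
  elif (( value >= warn )); then
    echo "WARNING - ${label} at ${value}% | ${perf}"
    return 1
  fi
  echo "OK - ${label} at ${value}% | ${perf}"
  return 0
}
```

In a real plugin script, the last line could be `check_percent root_fs "$(df --output=pcent / | tail -n 1 | tr -dc '0-9')" 80 90; exit $?`, assuming GNU coreutils `df`.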
  35. Nagios /ˈnɑːɡiːoʊs/
      Name and pronunciation:
      ● NetSaint -> "Nagios Ain't Gonna Insist On Sainthood"
      ● Agios: a transliteration of the Greek word άγιος (saint)
  36. Nagios /ˈnɑːɡiːoʊs/
      ● alerts by email/pager/IM...
      ● alerts to different contacts
      ● notification escalation
      ● service / host dependencies
      ● soft / hard states
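Soft and hard states keep Nagios from paging someone over a single failed check: a problem starts as a SOFT state and only becomes HARD, and triggers a notification, after it has been seen on `max_check_attempts` consecutive checks. The Bash sketch below models that rule in a simplified way (it ignores retry intervals, flapping and re-notification); `track_states` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Simplified model of Nagios soft/hard states (illustration only).

# track_states MAX_ATTEMPTS RESULT...   (each RESULT is OK, WARNING or CRITICAL)
# Prints the modelled state after each check result.
track_states() {
  local max=$1 attempt=0 prev result
  shift
  for result in "$@"; do
    if [[ $result == OK ]]; then
      attempt=0                      # recovery resets the attempt counter
      echo "OK"
      continue
    fi
    prev=$attempt
    if (( attempt < max )); then
      attempt=$((attempt + 1))       # the counter stops at max
    fi
    if (( attempt < max )); then
      echo "$result SOFT $attempt/$max"             # not confirmed yet
    elif (( prev < max )); then
      echo "$result HARD $attempt/$max notify"      # just became HARD
    else
      echo "$result HARD $attempt/$max"             # still HARD (re-notification not modelled)
    fi
  done
}
```

With a limit of 3, `track_states 3 OK WARNING WARNING CRITICAL CRITICAL OK` keeps the first two failures SOFT and marks a notification only on the third consecutive non-OK result.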
  37. Nagios /ˈnɑːɡiːoʊs/
  38. Drupal and Nagios
  39. Munin
      ● network/system monitoring application
      ● outputs graphs through a web interface
      ● many plugins
  40. Munin
      ● master / node architecture
      ● connects to all nodes at regular intervals
      ● uses RRDtool (round robin database tool, which handles time-series data)
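A Munin plugin is just an executable that the node runs on each poll: called with the argument `config`, it describes the graph; called with no argument, it prints the current values as `field.value` lines. A minimal Bash sketch of a load-average plugin follows. Munin already ships its own `load` plugin; the graph title, the `LOADAVG_FILE` variable and the choice of metric are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch of a Munin plugin reporting the 1-minute load average.
#   <plugin> config  -> graph description
#   <plugin>         -> current value(s) as "field.value N"

LOADAVG_FILE=${LOADAVG_FILE:-/proc/loadavg}   # hypothetical override, handy for testing

munin_config() {
  echo "graph_title Load average (sketch)"
  echo "graph_vlabel load"
  echo "graph_category system"
  echo "load.label load"
}

munin_fetch() {
  local load rest
  if [[ -r $LOADAVG_FILE ]] && read -r load rest < "$LOADAVG_FILE"; then
    echo "load.value ${load}"      # first field is the 1-minute average
  else
    echo "load.value U"            # "U" tells Munin the value is unknown
  fi
}

case "${1:-}" in
  config) munin_config ;;
  *)      munin_fetch ;;
esac
```

Installed as an executable in `/etc/munin/plugins/` and picked up after restarting `munin-node`, such a plugin can be checked by hand with `munin-run <plugin-name> config` and `munin-run <plugin-name>`.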
  41. Munin Example
  42. Drupal and Munin
  44. Nagios & Munin
      ● they complement each other
      ● Nagios normally alerts on a single “service”
      ● Munin can be used to correlate different things
  46. APC - what is it?
      The Alternative PHP Cache (APC) is a free and open opcode cache for PHP. Its goal is to provide a free, open, and robust framework for caching and optimising PHP intermediate code.
      It runs inside your webserver (it is not a web cache).
  47. Monitoring APC
      Memory Usage, Hits & Misses
  48. Monitoring APC
      Fragmentation
  49. Monitoring APC
      Memory usage
  50. Monitoring APC
      Files in cache
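The hit and miss counters on the APC dashboard are easiest to read as a ratio, hits / (hits + misses); a falling ratio is worth looking at together with the memory usage and fragmentation graphs. A small Bash sketch of the calculation; the counter values are passed in by hand here, and `hit_ratio` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Cache hit ratio:  hits / (hits + misses) * 100
# The counters would come from the cache statistics (for APC, the apc.php
# dashboard); in this sketch they are plain arguments.

# hit_ratio HITS MISSES -> prints the hit ratio in percent (1 decimal)
hit_ratio() {
  LC_ALL=C awk -v h="$1" -v m="$2" 'BEGIN {
    if (h + m == 0) { print "n/a"; exit }   # no requests recorded yet
    printf "%.1f\n", 100 * h / (h + m)
  }'
}
```

For example, 950 hits and 50 misses give a hit ratio of 95.0%.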
  51. Other monitoring tools
      ● Collectd
      ● Graphite
      ● Shinken
      ● Sensu
      ● NewRelic
      ● Pingdom
  52. Part 3
      It is easy to start with monitoring.
  53. How to install these tools?
      Munin: sudo apt-get install munin munin-node
      Nagios: sudo apt-get install nagios3
      APC dashboard: the apc.php script from the php-apc package
  54. How to configure these?
      ● It is a bit fiddly
      ● There are many guides targeting beginners
      ● You don’t want to do it again and again
  55. puppet – a quick way to start
      a system for automating system administration tasks
  58. puppet – a quick way to start
      ● a declarative language for expressing system configuration,
      ● a client and server for distributing it
      ● and a library for realising the configuration.
  59. puppet – a quick way to start
      package { 'munin-node': ensure => installed }
      service { 'munin-node':
        enable  => true,
        ensure  => running,
        require => Package['munin-node'],
      }
  60. puppet – a quick way to start
      1. clone the stalk-your-box repo
      2. run puppet apply on the code
      3. monitor!
  61. A quick way to start
      $ git clone https://github.com/morpht/stalk-your-box.git /tmp/stalk-your-box
      Cloning into '/tmp/stalk-your-box'...
      remote: Counting objects: 23, done.
      remote: Compressing objects: 100% (19/19), done.
      remote: Total 23 (delta 1), reused 23 (delta 1)
      Receiving objects: 100% (23/23), 11.35 KiB, done.
      Resolving deltas: 100% (1/1), done.
  62. A quick way to start
      $ cd /tmp/stalk-your-box/
      $ sudo puppet apply --modulepath=modules manifest.pp
      notice: /Stage[main]/Nagios::Server/Package[nagios3]/ensure: ensure changed 'purged' to 'present'
      notice: /Stage[main]/Nagios::Server/File[/etc/nagios3/htpasswd.users]/ensure: created
      notice: /Stage[main]/Nagios::Server/Exec[update-nagios-htpasswd]/returns: Adding password for user nagiosadmin
      notice: /Stage[main]/Nagios::Server/Exec[update-nagios-htpasswd]/returns: executed successfully
      notice: /Stage[main]/Munin::Node/Package[libcache-cache-perl]/ensure: ensure changed 'purged' to 'present'
      notice: /Stage[main]/Munin::Node/Package[munin-node]/ensure: ensure changed 'purged' to 'present'
      notice: /Stage[main]/Munin::Node/File[munin-node.conf]/content: content changed '{md5}e486786f866d7d7e025dea401c300e7b' to '{md5}dbf97a87a8da86ef68155815ecae3c1c'
      notice: /Stage[main]/Munin::Server/Service[apache2]: Triggered 'refresh' from 1 events
      notice: Finished catalog run in 44.26 seconds
  63. What this gives you
  66. Manifest.pp
      # Execute apt-get update before any package is installed:
      exec { 'apt-update':
        command => 'apt-get update',
        # exec needs a search path (or fully qualified commands):
        path    => '/usr/bin:/bin',
        # but don't execute it more than once a day:
        unless  => 'test $(find /var/cache/apt/pkgcache.bin -mtime 0 | wc -l ) -eq 1',
      }
      Exec['apt-update'] -> Package <| |>
      # Include minimal apache2 installation. Munin server, nagios
      # and APC dashboard depend on it.
      include 'apache2'
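The `unless` guard on slide 66 relies on `find -mtime 0`, which matches a file modified less than 24 hours ago, so `apt-get update` is skipped while the package cache is fresh. The same check as a reusable Bash function, handy for testing the guard by hand; the name `modified_within_day` is hypothetical.

```bash
#!/usr/bin/env bash
# modified_within_day FILE   (hypothetical helper)
# Succeeds if FILE exists and was modified less than 24 hours ago,
# mirroring the `find ... -mtime 0` test used in the manifest's `unless`.
modified_within_day() {
  # -prune stops find from descending if FILE happens to be a directory;
  # errors (e.g. a missing file) are discarded and count as "not fresh".
  [ -n "$(find "$1" -prune -mtime 0 2>/dev/null)" ]
}

if modified_within_day /var/cache/apt/pkgcache.bin; then
  echo "package cache is fresh: apt-get update would be skipped"
else
  echo "package cache is stale or missing: apt-get update would run"
fi
```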
  67. Manifest.pp
      # Install munin node and munin server:
      class { 'munin::node': }
      class { 'munin::server':
        htuser => 'munin',       # Username for basic access auth.
        htpass => 'Prague2013',  # Password for basic access auth.
      }
      # Install nagios:
      class { 'nagios::server':
        contact_email => 'root@localhost',  # Email to send alerts to.
        htpass        => 'Prague2013',      # Password for the nagiosadmin username.
      }
  68. Manifest.pp
      # Deploys APC dashboard - install php-apc package and
      # deploy the apc.php script from it.
      package { 'php-apc': ensure => installed }
      exec { 'deploy-apc-dashboard':
        path    => '/bin:/usr/bin',
        command => 'gzip -dc /usr/share/doc/php-apc/apc.php.gz > /var/www/apc.php',
        notify  => Service['apache2'],
        unless  => '[ -f /var/www/apc.php ]',
        require => [ Package['php-apc'], Package['apache2'] ],
      }
  69. Summary
      It is easy to start with monitoring.
  70. The fun part - what’s wrong?
      What’s wrong here?
  71. The fun part - what’s wrong?
  72. Questions
      Here is the get-started monitoring repo: https://github.com/morpht/stalk-your-box
      Marji Cermak
      Sysadmin & DevOps Engineer at Morpht
      marji@morpht.com
      @cermakm
  73. Resources
      Rule of Three: http://en.wikipedia.org/wiki/Rule_of_three_(writing)
      Nagios: http://www.nagios.org/
      Munin: http://munin-monitoring.org/
      Nagios module: https://drupal.org/project/nagios
      Munin module: https://drupal.org/project/munin
      Munin plugins (experimental): https://drupal.org/sandbox/murrayw/2084281
      Sensu: http://sensuapp.org
      MySQLTuner: http://MySQLTuner.pl
  74. THANK YOU!
      WHAT DID YOU THINK?
      Locate this session at the DrupalCon Prague website: http://prague2013.drupal.org/schedule
      Click the “Take the survey” link