Have you been stalking
your servers?
Have you been stalking
your servers?
Marji Cermak
Sysadmin & DevOps Engineer at Morpht
marji@morpht.com
@cermakm
The rule of 3 things
picture: http://www.flickr.com/photos/helenaperezgarcia/5692392667/
The rule of 3 things
1. What is monitoring and why do you want to
monitor
2. Some monitoring tools available for you
3. It is easy to start with monitoring.
Part 1
What is monitoring and why do you want to
monitor
photo: http://www.flickr.com/photos/tiagopadua/7903366470/
Monitoring
Monitoring is an intermittent (regular or
irregular) series of observations in time,
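carried out to show the extent of compliance
with a formulated standard or degree of
deviation from an expected norm.
J. M. Hellawell (1991), modified by A. Brown (2000),
http://jncc.defra.gov.uk/page-2268
nature conservation area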
Why you need to monitor
● to know about the bad news before your
customers (or your boss)
Why you need to monitor
● to know about the bad news before your
customers (or your boss)
● to scale up your server in advance
Why you need to monitor
● to know about the bad news before your
customers (or your boss)
● to scale up your server in advance
● to tune up your app
Why you need to monitor (cont.)
● to prove your uptime of 99.999 :)
The fun of the nines
Source: http://en.wikipedia.org/wiki/High_availability
Nines: http://en.wikipedia.org/wiki/List_of_unusual_units_of_measurement#Nines
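As a rough illustration of what each extra nine means, the sketch below converts an availability percentage into allowed downtime per year. It assumes a 365-day year; the function name is made up for this example.

```python
# Minutes in a year, assuming a non-leap year of 365 days.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the downtime budget, in minutes per year, for a given uptime percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Five nines (99.999%) leaves a little over five minutes per year, which is why you need measurements to back up such a claim.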
Why you need to monitor (cont.)
● to prove your uptime of 99.999 :)
● to minimise downtime (expensive)
Why you need to monitor (cont.)
● to prove your uptime of 99.999 :)
● to minimise downtime (expensive)
● to capture customer information
Why you need to monitor (cont.)
● to have data / metrics to diagnose
Diagnosing your collected data
watch out for:
● trends
Diagnosing your collected data
watch out for:
● trends
● spikes
Diagnosing your collected data
watch out for:
● trends
● spikes
● irregularities
Diagnosing your collected data
watch out for:
● trends
● spikes
● irregularities
● thresholds
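A minimal sketch of how three of these checks (thresholds, spikes, trends) could be run over a list of collected samples. The function names, the two-standard-deviation rule for spikes, and the sample load values are assumptions for illustration, not taken from Nagios or Munin.

```python
from statistics import mean, pstdev


def over_threshold(samples, limit):
    """Return the indexes of samples above a fixed limit."""
    return [i for i, value in enumerate(samples) if value > limit]


def spikes(samples, factor=2.0):
    """Return indexes of samples more than `factor` standard deviations above the mean."""
    average, spread = mean(samples), pstdev(samples)
    if spread == 0:
        return []
    return [i for i, value in enumerate(samples) if value > average + factor * spread]


def trend(samples):
    """Return the average change per sample between the first and last reading."""
    if len(samples) < 2:
        return 0.0
    return (samples[-1] - samples[0]) / (len(samples) - 1)


if __name__ == "__main__":
    load = [0.4, 0.5, 0.5, 0.6, 3.9, 0.7, 0.8]  # made-up load averages
    print("over threshold 0.75:", over_threshold(load, 0.75))
    print("spikes:", spikes(load))
    print("trend per sample:", round(trend(load), 3))
```

Irregularities are harder to express in a few lines, since they depend on what "regular" looks like for your site, which is where graphs help.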
Areas to monitor
● network
photo: http://www.flickr.com/photos/misja_klimov/2120956405/
Areas to monitor
● network
● server
photo: http://www.flickr.com/photos/johnjack/3666997634/
Areas to monitor
● network
● server
● services
photo: http://www.flickr.com/photos/agustingodet/3691794089/
Areas to monitor
● network
● server
● services
photo: http://www.flickr.com/photos/agustingodet/3691792393/
Areas to monitor
● network
● server
● services
● applications
photo: http://www.flickr.com/photos/cheerfulstoic/942211994/
Areas to monitor
● network
● server
● services
● applications
● users
photo: http://www.flickr.com/photos/jimmysmith/99528596/
Drupal Areas to monitor?
● network
● server
● services
● applications
● users
Drupal Areas to monitor
● network
● server
● services
● applications
● users
Drupal Areas to monitor
● network
● server
● services
● applications
● users
Drupal Areas to monitor
● network
● server
● services
○ webserver
○ database
● applications
● users
Drupal Areas to monitor
● network
● server
● services
○ webserver
○ database
● applications - your Drupal site(s)
● users
Drupal Areas to monitor
● network
● server
● services
○ webserver
○ database
● applications - your Drupal site(s)
● users
Part 2
Some monitoring tools available for you
Meet Nagios, Munin and others
● Nagios
● Munin
● APC dashboard
● related Drupal modules
Nagios /ˈnɑːɡiːoʊs/
● system, network and infrastructure
monitoring software application
● monitors and alerts
● many plugins
Nagios /ˈnɑːɡiːoʊs/
Name and Pronunciation:
● NetSaint -> "Nagios Ain't Gonna Insist On
Sainthood"
● "Agios" is a transliteration of the Greek word
άγιος (saint)
Nagios /ˈnɑːɡiːoʊs/
● alerts by email/pager/IM...
● alerts to different contacts
● notification escalation
● service / host dependencies
● soft / hard states
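Nagios distinguishes soft and hard states so that a single failed check does not page anyone: a problem stays soft while it is being rechecked and becomes hard once it has failed `max_check_attempts` times in a row. The sketch below is a simplified model of that idea, not Nagios code; the class and method names are invented, and re-notification and escalation are left out.

```python
class ServiceState:
    """Simplified model of soft/hard state handling for one service."""

    def __init__(self, max_check_attempts=3):
        self.max_check_attempts = max_check_attempts
        self.attempt = 0
        self.ok = True
        self.state_type = "HARD"  # a healthy service sits in a hard OK state

    def record(self, check_ok):
        """Feed one check result; return True when a notification should be sent."""
        if check_ok:
            # Only a recovery from a hard problem is worth a notification.
            was_hard_problem = (not self.ok) and self.state_type == "HARD"
            self.ok, self.attempt, self.state_type = True, 0, "HARD"
            return was_hard_problem
        self.ok = False
        if self.state_type == "HARD" and self.attempt >= self.max_check_attempts:
            return False  # already a hard problem; this sketch does not re-notify
        self.attempt += 1
        if self.attempt >= self.max_check_attempts:
            self.state_type = "HARD"
            return True  # problem confirmed: alert the contacts
        self.state_type = "SOFT"
        return False  # still being rechecked, stay quiet


if __name__ == "__main__":
    service = ServiceState(max_check_attempts=3)
    for result in (False, False, False, False, True):
        print(result, "->", service.state_type, "notify:", service.record(result))
```

A short blip (one failure followed by a success) never leaves the soft state, so nobody is alerted.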
Nagios /ˈnɑːɡiːoʊs/
Drupal and Nagios
Munin
● network/system monitoring application
● outputs graphs through a web interface
● many plugins
Munin
● master / node architecture
● connects to all nodes at regular intervals
● it uses the RRDtool (round robin database tool,
handles time-series data)
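The "round robin" part means the database has a fixed number of slots and the oldest sample is overwritten once it is full, so the file never grows. A minimal sketch of that idea, assuming a plain in-memory buffer; the class name is made up, and the real RRDtool additionally consolidates old data into coarser archives, which is not shown here.

```python
from collections import deque


class RoundRobinSeries:
    """Fixed-size time series: new samples push out the oldest ones."""

    def __init__(self, slots):
        # deque with maxlen discards items from the other end when full.
        self.samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        """Store one (timestamp, value) sample."""
        self.samples.append((timestamp, value))

    def average(self):
        """Return the mean of the stored values, or None when empty."""
        if not self.samples:
            return None
        return sum(value for _, value in self.samples) / len(self.samples)


if __name__ == "__main__":
    series = RoundRobinSeries(slots=3)
    # Five samples, one every 300 seconds; only the last three are kept.
    for step, value in enumerate((10, 20, 30, 40, 50)):
        series.update(step * 300, value)
    print(list(series.samples))
    print(series.average())
```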
Munin Example
Drupal and Munin
Drupal and Munin
Nagios & Munin
● they complement each other
● Nagios normally alerts on one “service”
● Munin can be used to correlate different
things
APC - what is it?
The Alternative PHP Cache (APC) is a free
and open opcode cache for PHP.
APC - what is it?
The Alternative PHP Cache (APC) is a free
and open opcode cache for PHP.
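Its goal is to provide a free, open, and robust
framework for caching and optimising PHP
intermediate code.
Inside your webserver (not a webcache)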
Monitoring APC
Memory Usage, Hit & Misses
Monitoring APC
Fragmentation
Monitoring APC
memory usage
Monitoring APC
files in cache
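The hit and miss counters are easiest to read as a hit rate. A small sketch of that calculation; the function name and the sample counts are made up for illustration and are not read from a real APC instance.

```python
def hit_rate(hits, misses):
    """Return the cache hit rate as a percentage (0.0 when there were no requests)."""
    total = hits + misses
    if total == 0:
        return 0.0
    return 100.0 * hits / total


if __name__ == "__main__":
    # Hypothetical counters, as they might be read off the APC dashboard.
    print(f"hit rate: {hit_rate(9500, 500):.1f}%")
```

A falling hit rate together with rising fragmentation usually suggests the cache is too small for the code it has to hold.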
Other monitoring tools
● Collectd
● Graphite
● Shinken
● Sensu
● NewRelic
● Pingdom
Part 3
It is easy to start with monitoring.
How to install these tools?
Munin
sudo apt-get install munin munin-node
Nagios
sudo apt-get install nagios3
APC dashboard
apc.php script from the php-apc package
How to configure these?
● It is a bit fiddly
● There are many guides targeting beginners
● You don’t want to do it again and again
puppet – a quick way to start
system for automating system administration
tasks
puppet – a quick way to start
● a declarative language for expressing
system configuration,
puppet – a quick way to start
● a declarative language for expressing
system configuration,
● a client and server for distributing it
puppet – a quick way to start
● a declarative language for expressing
system configuration,
● a client and server for distributing it
● and a library for realising the configuration.
puppet – a quick way to start
package { 'munin-node': ensure => installed }
service { 'munin-node':
  enable  => true,
  ensure  => running,
  require => Package['munin-node'],
}
puppet – a quick way to start
1. clone the stalk-your-box repo
2. run puppet apply on the code
3. monitor!
A quick way to start
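$ git clone git://github.com/morpht/stalk-your-box.git /tmp/stalk-your-box
Cloning into '/tmp/stalk-your-box'...
remote: Counting objects: 23, done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 23 (delta 1), reused 23 (delta 1)
Receiving objects: 100% (23/23), 11.35 KiB, done.
Resolving deltas: 100% (1/1), done.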
A quick way to start
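$ cd /tmp/stalk-your-box/
$ sudo puppet apply --modulepath=modules manifest.pp
notice: /Stage[main]/Nagios::Server/Package[nagios3]/ensure: ensure changed 'purged' to 'present'
notice: /Stage[main]/Nagios::Server/File[/etc/nagios3/htpasswd.users]/ensure: created
notice: /Stage[main]/Nagios::Server/Exec[update-nagios-htpasswd]/returns: Adding password for user nagiosadmin
notice: /Stage[main]/Nagios::Server/Exec[update-nagios-htpasswd]/returns: executed successfully
notice: /Stage[main]/Munin::Node/Package[libcache-cache-perl]/ensure: ensure changed 'purged' to 'present'
notice: /Stage[main]/Munin::Node/Package[munin-node]/ensure: ensure changed 'purged' to 'present'
notice: /Stage[main]/Munin::Node/File[munin-node.conf]/content: content changed '{md5}e486786f866d7d7e025dea401c300e7b' to '{md5}dbf97a87a8da86ef68155815ecae3c1c'
notice: /Stage[main]/Munin::Server/Service[apache2]: Triggered 'refresh' from 1 events
notice: Finished catalog run in 44.26 seconds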
What this gives you
What this gives you
What this gives you
Manifest.pp
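# Execute apt-get update before any package is installed:
exec { 'apt-update':
  command => 'apt-get update',
  # but don't execute it more than once a day:
  unless  => 'test $(find /var/cache/apt/pkgcache.bin -mtime 0 | wc -l ) -eq 1',
}
Exec['apt-update'] -> Package <| |>
# Include minimal apache2 installation. Munin server, nagios
# and APC dashboard depend on it.
include 'apache2'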
Manifest.pp
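# Install munin node and munin server:
class { 'munin::node': }
class { 'munin::server':
  htuser => 'munin',      # Username for basic access auth.
  htpass => 'Prague2013'  # Password for basic access auth.
}
# Install nagios:
class { 'nagios::server':
  contact_email => 'root@localhost',  # Email to send alerts to.
  htpass        => 'Prague2013',      # Password for the nagiosadmin username.
}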
Manifest.pp
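# Deploys APC dashboard - install php-apc package and
# deploy the apc.php script from it.
package { 'php-apc': ensure => installed }
exec { 'deploy-apc-dashboard':
  path    => '/bin:/usr/bin',
  command => 'gzip -dc /usr/share/doc/php-apc/apc.php.gz > /var/www/apc.php',
  notify  => Service['apache2'],
  unless  => '[ -f /var/www/apc.php ]',
  require => [ Package['php-apc'], Package['apache2'] ]
}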
Summary
It is easy to start with monitoring.
The fun part - what’s wrong?
What’s wrong here?
The fun part - what’s wrong?
Questions
Here is the get started monitoring repo:
https://github.com/morpht/stalk-your-box
Marji Cermak
Sysadmin & DevOps Engineer at Morpht
marji@morpht.com
@cermakm
Resources
Rule of Three: en.wikipedia.org/wiki/Rule_of_three_(writing)
Nagios: http://www.nagios.org/
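Munin: http://munin-monitoring.org/
Nagios module: https://drupal.org/project/nagios
Munin module: https://drupal.org/project/munin
Munin plugins (experimental): https://drupal.org/sandbox/murrayw/2084281
Sensu: http://sensuapp.org
MySQLTuner: http://MySQLTuner.pl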
THANK YOU!
WHAT DID YOU THINK?
Locate this session at the
DrupalCon Prague website:
http://prague2013.drupal.org/schedule
Click the “Take the survey” link
A presentation for DrupalCon Prague 2013

Transcript of "Have you been stalking your servers?"

  1. 1. Have you been stalking your servers?
  2. 2. Have you been stalking your servers? Marji Cermak Sysadmin & DevOps Engineer at Morpht marji@morpht.com @cermakm
  3. 3. The rule of 3 things picture: http://www.flickr.com/photos/helenaperezgarcia/5692392667/
  4. 4. The rule of 3 things 1. What is monitoring and why do you want to monitor 2. Some monitoring tools available for you 3. It is easy to start with monitoring.
  5. 5. Part 1 What is monitoring and why do you want to monitor
  6. 6. photo: http://www.flickr.com/photos/tiagopadua/7903366470/
  7. 7. Monitoring Monitoring is an intermittent (regular or irregular) series of observations in time, carried out to show the extent of compliance with a formulated standard or degree of deviation from an expected norm. J. M. Hellawell (1991), modified by A. Brown (2000), http://jncc.defra.gov.uk/page-2268 nature conservation area
  8. 8. Why you need to monitor ● to know about the bad news before your customers (or your boss)
  9. 9. Why you need to monitor ● to know about the bad news before your customers (or your boss) ● to scale up your server in advance
  10. 10. Why you need to monitor ● to know about the bad news before your customers (or your boss) ● to scale up your server in advance ● to tune up your app
  11. 11. Why you need to monitor (cont.) ● to prove your uptime of 99.999 :)
  12. 12. The fun of the nines Source: http://en.wikipedia.org/wiki/High_availability Nines: http://en.wikipedia.org/wiki/List_of_unusual_units_of_measurement#Nines
  13. 13. Why you need to monitor (cont.) ● to prove your uptime of 99.999 :) ● to minimise downtime (expensive)
  14. 14. Why you need to monitor (cont.) ● to prove your uptime of 99.999 :) ● to minimise downtime (expensive) ● to capture customer information
  15. 15. Why you need to monitor (cont.) ● to have data / metrics to diagnose
  16. 16. Diagnosing your collected data watch out for: ● trends
  17. 17. Diagnosing your collected data watch out for: ● trends ● spikes
  18. 18. Diagnosing your collected data watch out for: ● trends ● spikes ● irregularities
  19. 19. Diagnosing your collected data watch out for: ● trends ● spikes ● irregularities ● thresholds
  20. 20. Areas to monitor ● network photo: http://www.flickr.com/photos/misja_klimov/2120956405/
  21. 21. Areas to monitor ● network ● server photo: http://www.flickr.com/photos/johnjack/3666997634/
  22. 22. Areas to monitor ● network ● server ● services photo: http://www.flickr.com/photos/agustingodet/3691794089/
  23. 23. Areas to monitor ● network ● server ● services photo: http://www.flickr.com/photos/agustingodet/3691792393/
  24. 24. Areas to monitor ● network ● server ● services ● applications photo: http://www.flickr.com/photos/cheerfulstoic/942211994/
  25. 25. Areas to monitor ● network ● server ● services ● applications ● users photo: http://www.flickr.com/photos/jimmysmith/99528596/
  26. 26. Drupal Areas to monitor? ● network ● server ● services ● applications ● users
  27. 27. Drupal Areas to monitor ● network ● server ● services ● applications ● users
  28. 28. Drupal Areas to monitor ● network ● server ● services ● applications ● users
  29. 29. Drupal Areas to monitor ● network ● server ● services ○ webserver ○ database ● applications ● users
  30. 30. Drupal Areas to monitor ● network ● server ● services ○ webserver ○ database ● applications - your Drupal site(s) ● users
  31. 31. Drupal Areas to monitor ● network ● server ● services ○ webserver ○ database ● applications - your Drupal site(s) ● users
  32. 32. Part 2 Some monitoring tools available for you
  33. 33. Meet Nagios, Munin and others ● Nagios ● Munin ● APC dashboard ● related Drupal modules
  34. 34. Nagios /ˈnɑːɡiːoʊs/ ● system, network and infrastructure monitoring software application ● monitors and alerts ● many plugins
  35. 35. Nagios /ˈnɑːɡiːoʊs/ Name and Pronunciation: ● NetSaint -> "Nagios Ain't Gonna Insist On Sainthood" ● "Agios" is a transliteration of the Greek word άγιος (saint)
  36. 36. Nagios /ˈnɑːɡiːoʊs/ ● alerts by email/pager/IM... ● alerts to different contacts ● notification escalation ● service / host dependencies ● soft / hard states
  37. 37. Nagios /ˈnɑːɡiːoʊs/
  38. 38. Drupal and Nagios
  39. 39. Munin ● network/system monitoring application ● outputs graphs through a web interface ● many plugins
  40. 40. Munin ● master / node architecture ● connects to all nodes at regular intervals ● it uses the RRDtool (round robin database tool, handles time-series data)
  41. 41. Munin Example
  42. 42. Drupal and Munin
  43. 43. Drupal and Munin
  44. 44. Nagios & Munin ● they complement each other ● Nagios normally alerts on one “service” ● Munin can be used to correlate different things
  45. 45. APC - what is it? The Alternative PHP Cache (APC) is a free and open opcode cache for PHP.
  46. 46. APC - what is it? The Alternative PHP Cache (APC) is a free and open opcode cache for PHP. Its goal is to provide a free, open, and robust framework for caching and optimising PHP intermediate code. Inside your webserver (not a webcache)
  47. 47. Monitoring APC Memory Usage, Hit & Misses
  48. 48. Monitoring APC Fragmentation
  49. 49. Monitoring APC memory usage
  50. 50. Monitoring APC files in cache
  51. 51. Other monitoring tools ● Collectd ● Graphite ● Shinken ● Sensu ● NewRelic ● Pingdom
  52. 52. Part 3 It is easy to start with monitoring.
  53. 53. How to install these tools? Munin sudo apt-get install munin munin-node Nagios sudo apt-get install nagios3 APC dashboard apc.php script from the php-apc package
  54. 54. How to configure these? ● It is a bit fiddly ● There are many guides targeting beginners ● You don’t want to do it again and again
  55. 55. puppet – a quick way to start system for automating system administration tasks
  56. 56. puppet – a quick way to start ● a declarative language for expressing system configuration,
  57. 57. puppet – a quick way to start ● a declarative language for expressing system configuration, ● a client and server for distributing it
  58. 58. puppet – a quick way to start ● a declarative language for expressing system configuration, ● a client and server for distributing it ● and a library for realising the configuration.
  59. 59. puppet – a quick way to start package { 'munin-node': ensure => installed } service { 'munin-node': enable => true, ensure => running, require => Package['munin-node'], }
  60. 60. puppet – a quick way to start 1. clone the stalk-your-box repo 2. run puppet apply on the code 3. monitor!
  61. 61. A quick way to start $ git clone git://github.com/morpht/stalk-your-box.git /tmp/stalk-your-box Cloning into '/tmp/stalk-your-box'... remote: Counting objects: 23, done. remote: Compressing objects: 100% (19/19), done. remote: Total 23 (delta 1), reused 23 (delta 1) Receiving objects: 100% (23/23), 11.35 KiB, done. Resolving deltas: 100% (1/1), done.
  62. 62. A quick way to start $ cd /tmp/stalk-your-box/ $ sudo puppet apply --modulepath=modules manifest.pp notice: /Stage[main]/Nagios::Server/Package[nagios3]/ensure: ensure changed 'purged' to 'present' notice: /Stage[main]/Nagios::Server/File[/etc/nagios3/htpasswd.users]/ensure: created notice: /Stage[main]/Nagios::Server/Exec[update-nagios-htpasswd]/returns: Adding password for user nagiosadmin notice: /Stage[main]/Nagios::Server/Exec[update-nagios-htpasswd]/returns: executed successfully notice: /Stage[main]/Munin::Node/Package[libcache-cache-perl]/ensure: ensure changed 'purged' to 'present' notice: /Stage[main]/Munin::Node/Package[munin-node]/ensure: ensure changed 'purged' to 'present' notice: /Stage[main]/Munin::Node/File[munin-node.conf]/content: content changed '{md5} e486786f866d7d7e025dea401c300e7b' to '{md5}dbf97a87a8da86ef68155815ecae3c1c' notice: /Stage[main]/Munin::Server/Service[apache2]: Triggered 'refresh' from 1 events notice: Finished catalog run in 44.26 seconds
  63. 63. What this gives you
  64. 64. What this gives you
  65. 65. What this gives you
  66. 66. Manifest.pp # Execute apt-get update before any package is installed: exec { 'apt-update': command => 'apt-get update', # but don't execute it more than once a day: unless => 'test $(find /var/cache/apt/pkgcache.bin -mtime 0 | wc -l ) -eq 1', } Exec['apt-update'] -> Package <| |> # Include minimal apache2 installation. Munin server, nagios # and APC dashboard depend on it. include 'apache2'
  67. 67. Manifest.pp # Install munin node and munin server: class { 'munin::node': } class { 'munin::server': htuser => 'munin', # Username for basic access auth. htpass => 'Prague2013' # Password for basic access auth. } # Install nagios: class { 'nagios::server': contact_email => 'root@localhost', # Email to send alerts to. htpass => 'Prague2013', # Password for the nagiosadmin username. }
  68. 68. Manifest.pp # Deploys APC dashboard - install php-apc package and # deploy the apc.php script from it. package { 'php-apc': ensure => installed } exec { 'deploy-apc-dashboard': path => '/bin:/usr/bin', command => 'gzip -dc /usr/share/doc/php-apc/apc.php.gz > /var/www/apc.php', notify => Service['apache2'], unless => '[ -f /var/www/apc.php ]', require => [ Package['php-apc'], Package['apache2'] ] }
  69. 69. Summary It is easy to start with monitoring.
  70. 70. The fun part - what’s wrong? What’s wrong here?
  71. 71. The fun part - what’s wrong?
  72. 72. Questions Here is the get started monitoring repo: https://github.com/morpht/stalk-your-box Marji Cermak Sysadmin & DevOps Engineer at Morpht marji@morpht.com @cermakm
  73. 73. Resources Rule of Three: en.wikipedia.org/wiki/Rule_of_three_(writing) Nagios: http://www.nagios.org/ Munin: http://munin-monitoring.org/ Nagios module: https://drupal.org/project/nagios Munin module: https://drupal.org/project/munin Munin plugins (experimental): https://drupal.org/sandbox/murrayw/2084281 Sensu: http://sensuapp.org MySQLTuner: http://MySQLTuner.pl
  74. 74. THANK YOU! WHAT DID YOU THINK? Locate this session at the DrupalCon Prague website: http://prague2013.drupal.org/schedule Click the “Take the survey” link