Nagios Conference 2011 - William Leibzon - Nagios In Cloud Computing Environments



William Leibzon's presentation on using Nagios in a cloud computing environment. The presentation was given during the Nagios World Conference North America held Sept 27-29th, 2011 in Saint Paul, MN.
For more information on the conference (including photos and videos), visit:

  • Hi, my name is William Leibzon, and today I'm going to talk about Nagios clusters in cloud computing environments. I want to apologize because I do not have much experience speaking at conferences. What is even worse, I got sick yesterday and have a sore throat. However, I made sure to put everything I could into the slides, so you can follow along and will have them to take home.
  • Ok, so let's begin. You have all heard the buzzword "cloud computing", but what is it? I pulled up this definition from some site, but it is hardly THE definition. In a nutshell, cloud computing lets you run a large number of virtual servers on a smaller number of hardware machines. And the key to that is virtualization.
  • Virtualization separates hardware from software. The OS is supposed to provide this level of indirection, but the OS gets tied to hardware too much, and software packages are in turn tied to a specific OS. With virtualization, multiple systems running on the same hardware can utilize resources more efficiently: if, say, one system uses more CPU and another does more network I/O, we can potentially put them together on the same machine and utilize its resources fully. And of course, if we can fit many systems onto a smaller piece of hardware that takes less space in a datacenter, it is less expensive. So the business side loves it.
  • Cloud computing is an extension of virtualization where, instead of having virtual servers on specific hardware, we assume there is an unlimited amount of hardware available for virtual servers to run on, and we just focus on the virtual servers. A good cloud environment will keep these servers running even if there is an issue with hardware, so servers can potentially move live from one hardware host to another. What is even better is that we have control over which hosts we want to run and for how long. So we can have the largest number of servers running at peak traffic load and scale down to the minimum otherwise. Of course, being able to do this requires monitoring of what resources are utilized and how.
  • Now, for those who want to build a cloud environment, there are a number of solutions available, both open-source and commercial. VMware is by far the largest commercial vendor. For open-source, there are a number of packages available to create a cloud; most OS vendors have one. As far as hypervisors, Xen dominates in open-source and gives better performance for Linux virtual servers on Linux than VMware. There are also several competing hypervisors gaining popularity, and in my opinion better ones. If you don't want to build your own cloud hardware infrastructure, buying from cloud infrastructure providers is a choice. Amazon EC2 is by far the best known and most used.
  • And these are the links to the open-source cloud software from the previous slide.
  • So after this brief intro to cloud computing, we come to what we're here for: monitoring. There are two pieces to cloud monitoring: the hardware systems that run the hypervisors, and the virtual servers. Hardware monitoring is similar to normal server monitoring; it is static, in that new servers don't get added often and there aren't really any changes once everything is set up. Monitoring of system resources is often taken care of by the cloud software, but if possible you should still monitor Unix resources like system load, memory, etc., and environmental data can be monitored as well. For virtual servers, monitoring is dynamic and should handle addition and removal of servers well. The focus is application and network performance. The good thing about a cloud is that once you reach the limit of what the current servers can do, you can just launch a new server. This is auto-scaling, and it is what makes the cloud so useful. Nagios can be used to scale, and it should itself also be scalable.
  • What we want from a monitoring architecture is the same as with other applications: something that is easy to grow automatically, has no single bottleneck, and still functions if any one server dies. This means Horizontal Scaling, Scaling on Demand and High Availability. And that means a cluster.
  • There are 3 main ways to build a nagios cluster. The first is what I call the "Old Way", otherwise known as the "Classic Distributed Model". This uses passive service checks on a central nagios server, with NSCA used to forward information from client nagios servers. Second is "Shared Database" or the "Central Dashboard Model": a database is used here to create a shared centralized view of several nagios hosts. The third way is what I call "Worker Nodes", and in Nagios that is represented by the DNX and Mod-Gearman projects. Here all plugin checks get distributed automatically to a set of worker node servers, and a cluster can handle many more checks than a single nagios server could.
  • So here is the Passive Service Checks model. I think everyone here already knows about it, so I'll not go into it other than to say it is not robust and it is difficult to configure the client nagios hosts. It is also not a way to handle a dynamically changing number of hosts and services.
  • Shared database in Nagios is represented by the Merlin and NDO-DB projects. Of the two, I use Merlin. The advantage is that there is no master nagios server; we just have a set of peer servers that share data by means of a database, and you can have a centralized view of that database through some web interface. The disadvantage is that you still need to manually partition which set of hosts each server monitors. Plus, you replace a central nagios server with a central database, which, despite me listing it as an advantage, is a single bottleneck.
  • Now here comes what you've all been waiting to hear from me - DNX :) or, more generally, the Worker Nodes model. It is similar to the classic distributed model in that you offload all active checks to a set of other servers. However, this is all done automatically; nagios schedules these checks rather than just seeing them as passive. With the NEB module architecture, results of checks are written directly into nagios memory rather than put in a command queue. Both DNX and mod-gearman have 3 main components: a NEB module, a distribution server and client nodes. A single distribution daemon runs side by side with the nagios daemon, client nodes talk to it and run all the checks, and the NEB module is the interface between nagios and the distribution server. In mod-gearman, two of these components come from the gearman project and only the module is custom-written for nagios. DNX also includes a sync script which can be used to make sure plugins are the same on all servers, but personally I've just done it with ssh and rsync from cron.
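As a concrete sketch of that wiring for mod-gearman (the file paths and key file here are illustrative assumptions, not taken from the talk; port 4730 is gearmand's default), the NEB module is loaded on the central nagios server and pointed at the distribution daemon:

```cfg
# nagios.cfg on the central nagios server: load the NEB module that
# hands scheduled checks to the gearmand distribution daemon
broker_module=/usr/lib/mod_gearman/mod_gearman.o config=/etc/mod_gearman/module.conf

# /etc/mod_gearman/module.conf (excerpt)
server=localhost:4730               # gearmand running side by side with nagios
encryption=yes                      # shared-key encryption between module and workers
keyfile=/etc/mod_gearman/secret.key
```

Worker nodes then point their worker daemon at the same gearmand address and need no further per-node configuration.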
  • So the advantage of this solution is that it scales to handle essentially any number of service checks by just adding more servers, with no additional configuration necessary. This is pretty much what you want for horizontal scaling. And since all nodes are the same, it works very well for cloud computing, where you can just clone a server. Its integration with nagios is, as mentioned, through a NEB module; it offloads checks and writes results back directly to and from nagios memory structures.
  • There is a whole slide here, but the disadvantage is essentially that you still have one single nagios server that has to handle all scheduling and notification. This also means no fault-tolerance, although I wrote a patch to DNX and nagios to address it. I have another nagios installation to do in October on which to try it, and after that I will release it with some documentation.
  • I have a couple more slides on DNX. Basically it is a multi-threaded server. On the server side there are Timer, Collector, Registrar and Dispatcher threads, and the client will increase and decrease the number of threads as needed to run plugins. The settings to control this are similar to apache's. You should test your systems to find the upper limit. Communication between DNX client and server uses a custom UDP-based XML protocol: UDP because we expect DNX clients to be located on the same network and don't want the TCP overhead, and if one or two packets occasionally get lost it is not that important, because nagios will schedule more checks. DNX can support extensions that are meant to replace some of the common plugins without the necessity of running external code. The only one that has been tried is the check_nrpe module, which was basically the NRPE source with a patch to turn it into a library.
  • And this is the internal diagram of the threads. The client uses a manager-worker thread model. The server is several static threads.
  • This is the mod-gearman architecture. Gearman is a little like a MapReduce system. Essentially you have clients that check whether there are any commands to run in one or more queues they belong to, and the server distributes checks among the queues. This queue system is rather flexible, and it is possible to create queues for a specific hostgroup, servicegroup, etc. I do not know the internals of Gearman well, but I believe it is also written with a manager-worker thread model.
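The queue flexibility mentioned here shows up in the worker configuration: a mod-gearman worker can be restricted to the queues for particular hostgroups or servicegroups. A rough sketch (the group names and server address are illustrative assumptions):

```cfg
# worker.cfg on a worker node (excerpt) -- group names are examples
server=gearmand.internal:4730
hostgroups=webserv          # only take checks for hosts in this hostgroup
servicegroups=snmp_heavy    # and for services in this servicegroup
# with neither option set, the worker serves the default host/service queues
```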
  • Now here is a comparison of DNX and Mod-Gearman. DNX aims to be a single package with no external dependencies; it even has a simple XML parsing library written as part of it. Unfortunately this also means it is harder to maintain and test for new releases. Neither of the projects has a full-time developer, but Mod-Gearman is basically 90% Gearman, and so it gets all the benefits of the larger project. DNX was sponsored by LDS, but from the 0.20 release it is all done by the community, with John Calcote still its main maintainer; the last release was in 2010, so the project is alive. However, planned features do not get added until somebody volunteers to program them. The features that were planned are: embedded perl, encrypting the communication channel for security reasons, optional TCP rather than just UDP, and passing nagios environment variables to worker nodes to make it even more like running in nagios. Load balancing of event handlers may be added as well. I do want to mention that DNX can support handling of certain checks by a subset of servers using the localCheckPattern directive; it was added in the 0.20 release and was a patch before. Mod-Gearman, as I mentioned, supports this very nicely with its queues, and it supports offloading of event handlers too.
  • So the best news of all is that you can combine the different nagios cluster models to create something better. The picture is from the DNX project, and I've done this, but I personally prefer Merlin over NDO because it offers failover capabilities.
  • Now here is an overloaded diagram of a full nagios infrastructure that is fault-tolerant and can be horizontally scaled. If you have all the resources in the world you can have each of the above boxes as a separate server; I've never gone quite that extreme, and my largest install was 500 hosts. Also, just to explain the above: the DB Proxy and Web Interface servers should cross-monitor each other with a heartbeat, and you should set it up so that if one server dies, the other one starts to announce itself on the same IP. For those using Amazon, this would be done by reassigning an Elastic IP.
  • If you're starting small, this is a reasonable setup for a cluster. All checks are offloaded to worker nodes, and this frees up CPU resources on the nagios server to do performance graphing. An elastic or shared IP can be used to point to the active nagios server, or you can register the primary server in dynamic DNS. The standby server does not do any checks but is ready if something happens to the primary server. One thing to mention: monitoring of the worker nodes and of the other nagios server is an exception and should be done directly by the nagios server, not by the worker nodes. As you grow, you can begin to separate components into separate servers, such as a separate database server and a separate performance graphing server.
  • I wanted to mention configuring hosts. I find it best to create a template for each type of server and to tie all services to hostgroups. This makes adding a new host just a matter of adding the above with a new name. But as you all know, Nagios does not handle live addition of hosts well, so what works best is to add a few extra servers in the config and disable all their checks by default. Then, once a server is up, a script can re-enable all checks on the host.
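A minimal sketch of that re-enable step (the command-file path and helper names are assumptions for illustration; ENABLE_HOST_SVC_CHECKS is a standard Nagios external command):

```python
import time

# Nagios external commands are one-line strings written to the command
# FIFO; the path varies by install, check your nagios.cfg (command_file).
CMD_FILE = "/var/lib/nagios3/rw/nagios.cmd"  # assumption, adjust for your setup

def enable_checks_cmd(host, now=None):
    """Build the external command that re-enables all service checks
    on a pre-provisioned host once its cloud instance is up."""
    ts = int(time.time() if now is None else now)
    return "[%d] ENABLE_HOST_SVC_CHECKS;%s" % (ts, host)

def enable_checks(host):
    """Write the command into the nagios command FIFO."""
    with open(CMD_FILE, "w") as f:
        f.write(enable_checks_cmd(host) + "\n")
```

A launch script would call `enable_checks("w2")` after the new instance registers itself in DNS.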
  • Doing auto-scaling with nagios via an event handler is slightly better than with a custom check. The trigger should be the total number of open sockets. One option is: if on any one of the servers it exceeds a threshold, a new server is launched, but no more often than, say, once every 10-15 minutes. Another option is to keep track of the total number of connections from all hosts of this type. You can do that by combining RRD data or with a database; my preference is a database.
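The trigger logic just described reduces to a small pure function; a sketch (function name, defaults and the per-server threshold are illustrative, not from the talk):

```python
def should_launch(total_connections, server_count, per_server_threshold,
                  last_launch_ts, now, cooldown=600):
    """Decide whether to launch a new instance: average connections per
    server must exceed the threshold, and at least `cooldown` seconds
    must have passed since the last launch (the 10-15 minute rate limit)."""
    if server_count == 0:
        return True  # nothing running at all
    average = total_connections / float(server_count)
    return average > per_server_threshold and (now - last_launch_ts) >= cooldown
```

An event handler would gather `total_connections` from the database (or RRDs), call this, and launch an instance when it returns True.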
  • This is an illustrative example of the logic for auto-scaling when using an SQL database. I write these in perl, but the above is not real perl or full SQL.
  • I also wanted to give a few additional tips for those just starting with monitoring virtual systems. First of all, as you will quickly learn, system load is not always entirely accurate; you are better off using other parameters, like the total number of connections a server is handling and the time it takes to process requests. Another tip: if you control the cloud, integrate it and add an "empty" nagios check just showing the name of the physical server. You will find it useful for diagnostics. And remember: you're on the cloud, you can just launch a new server if the current one is not working right. For a production system that is more important than debugging the exact issue right away.
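A connection-count check along these lines (thresholds are illustrative; the check_netstat plugin mentioned in the slides does something similar) reduces to the standard Nagios plugin exit codes:

```python
# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL = 0, 1, 2

def check_connections(count, warn, crit):
    """Classify a TCP connection count against warn/crit thresholds,
    returning (exit_code, status_line) as a Nagios plugin would."""
    if count >= crit:
        return CRITICAL, "CRITICAL - %d connections (>= %d)" % (count, crit)
    if count >= warn:
        return WARNING, "WARNING - %d connections (>= %d)" % (count, warn)
    return OK, "OK - %d connections" % count
```

A real plugin would obtain `count` from netstat/ss or the webserver status page, print the status line, and exit with the code.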
  • Lastly, here are the links to the Nagios software I mentioned in the presentation. Of those I did not mention, Ganglia is good for monitoring a large grid of servers, so it is useful if you want to monitor the hypervisor hardware on which cloud servers run.

    1. 1. Nagios and Cloud Computing <ul>Presentation by William Leibzon ( [email_address] ) Thanks for being here! </ul>Nagios <ul>Nagios 2011 Conference in Saint Paul, Minnesota </ul>
    2. 2. Cloud Computing <ul><li>What is Cloud Computing? Virtualized systems independent of hardware and leased to various customers in what is referred to as Infrastructure as a Service </li></ul>Image courtesy of
    3. 3. Virtualization and Cloud Computing <ul><li>Virtualization </li><ul><li>Separates Hardware from User Software - either one can be upgraded independent of the other
    4. 4. Efficient use of modern multi-core processors
    5. 5. Micro-Kernel design is simpler, easier to support </li></ul><li>More Servers with Less Hardware </li><ul><li>Unused system resources can be utilized in other types of servers with different resource usage
    6. 6. Less energy, more power efficient use of resources
    7. 7. Less rack space in expensive datacenters </li></ul><li>Virtualization is the core of Cloud Computing </li></ul>
    8. 8. Cloud Computing Architecture <ul><li>Virtualized Systems in a Cloud </li><ul><li>Can be managed entirely remotely
    9. 9. Can move (even live) from one hardware to another
    10. 10. Can be shutdown, saved to disk and started again when required
    11. 11. Can be easily cloned to have another similar system started exactly when it is needed </li></ul></ul><ul>Cloud allows automated scaling up of infrastructure to handle peak traffic load, scaling down afterwards to keep overall cost low <ul><li>This requires monitoring of all system resources ! </li></ul></ul>
    12. 12. Cloud Solutions and Vendors <ul><li>Hypervisors (Virtualization Kernels): </li><ul><li>Commercial: VMware ESX, IBM Z/VM, Microsoft VirtualPC
    13. 13. Open-Source: Xen, KVM, OpenVZ, QEMU, VirtualBox
    14. 14. Xen originally implemented paravirtualization, which required a modified OS and limited it to Linux. KVM and new Xen-HVM can do full virtualization, but require QEMU and CPU virtualization extensions (Intel's VT or AMD's SVM) </li></ul><li>Virtualization and Cloud Software Suites </li><ul><li>Commercial: VMware vCloud, Microsoft Azure
    15. 15. Open-Source: Eucalyptus, OpenNebula, OpenStack, Baracus
    16. 16. Commercial based on Open-Source: Citrix XenServer, Oracle VM, Ubuntu Enterprise Cloud, Redhat CloudForms, Parallels Virtuozzo </li></ul><li>Cloud Infrastructure providers </li><ul><li>Amazon EC2 (modified Xen), Rackspace (Xen), Linode (Xen), Savvis (Vmware), many many more... </li></ul></ul>
    17. 17. Open-Source Cloud Software <ul><li>Open-Source Hypervisors used in Cloud Systems </li><ul><li>Xen -
    18. 18. KVM -
    19. 19. OpenVZ - </li></ul><li>Open-Source Cloud Management Software </li><ul><li>Eucalyptus -
    20. 20. OpenNebula -
    21. 21. OpenStack – /
    22. 22. Baracus –
    23. 23. Proxmox - </li></ul></ul>
    24. 24. Monitoring for the Cloud <ul><li>Monitoring of hardware (host OS) & hypervisor </li><ul><li>More static, hardware does not change as often
    25. 25. Monitoring of system resources often integrated into virtualizer and info not available to cloud customer </li></ul><li>Monitoring of virtual systems </li><ul><li>Dynamic, should be able to handle addition and removal of server instances
    26. 26. Focus on application and network performance
    27. 27. Ideally should monitor utilization and be able to launch new server instances (auto-scaling)
    28. 28. Monitoring system should itself be robust and handle more servers without impacting performance </li></ul></ul>
    29. 29. Cloud Monitoring Architecture <ul><li>Horizontal Scaling
    30. 30. Clouds can be as small as 10 servers and as large as 10,000+. When developing architecture, you need to support its future growth from the start.
    31. 31. Scaling on Demand
    32. 32. A pro-active system should handle big changes in the number of cloud instances. You may have 2 webserver instances at 6am and grow to 20 at 10pm.
    33. 33. High Availability
    34. 34. Good system design should be fully fault-tolerant and application as a whole should continue to function without interruption if any one server instance dies </li></ul>This means cluster !!!
    35. 35. Nagios Cluster Options <ul>The base nagios-core package is for stand-alone monitoring where server does all service checks. It can be extended to Nagios Cluster with : <ul><li>Passive Service Checks (Classic Distributed Model)
    36. 36. ”Old Way” - NSCA used to forward results of checks from client servers to main nagios server, not robust
    37. 37. Shared database (Central Dashboard Model)
    38. 38. NDO-Mod and Merlin projects implement this with a combination of NEB modules, daemon & database
    39. 39. Worker Nodes (Load Balancing of Checks)
    40. 40. DNX and Mod-Gearman do it with combination of loaded NEB module, server daemon & client servers </li></ul></ul>
    41. 41. Passive Service Checks <ul><ul><ul><li>How
    42. 42. - One central server with all services; it does not run any checks, listing them all as passive
    43. 43. - Separate client nagios servers run plugins and do checks for specific sets of hosts, each has its own subset of full nagios config
    44. 44. - Scripts are setup that capture results from each client host and send them to central server using NSCA, it puts them into nagios command queue
    45. 45. Advantages
    46. 46. This will work with any nagios server, organizations have been doing it from at least 2002
    47. 47. Disadvantages </li></ul></ul></ul><ul>Requires a lot of custom scripting to organize nagios configs. Not reliable if a server dies. Not robust for automating cloud instances being added and deleted </ul>NSCA NSCA Nagios Client Server Nagios Client Server
    48. 48. Shared Database <ul><li>Who: NDO-DB and Merlin
    49. 49. How
    50. 50. - Multiple Peer Nagios servers, each has different config file specifying which services it would check
    51. 51. - All servers use common database to share results of checks and status of services they are monitoring </li></ul><ul><li>Advantages
    52. 52. - There is no master nagios server. There is master DB server, however it is a better understood topic how to create a db cluster
    53. 53. - Using NEB avoids slow command-queue processing
    54. 54. Disadvantages
    55. 55. Partitioning of monitoring infrastructure among servers is still a manual process. It is not easy to use this for a dynamic cloud environment, however it works very well for fault-tolerance </li></ul>
    56. 56. DNX and Mod-Gearman Worker Nodes <ul><li>How
    57. 57. - Similarly to Passive Service Checks, there is a central Nagios Server, it does not execute any plugins.
    58. 58. - Unlike with Passive Checks, nagios does schedule checks. Thereafter NEB module takes over.
    59. 59. - Module passes information on which plugin(s) to run to DNX server (or Gearman server for Mod-Gearman) which manages worker nodes. </li></ul>- Worker nodes are separate servers, each has special worker daemon running. The daemon communicates with management server and gets information (plugin command) on what to run. It then passes results back to management server and NEB module writes these results directly into nagios memory.
    60. 60. Advantages of DNX and Mod-Gearman <ul><li>Robust and Scalable </li><ul><li>Checks are automatically distributed among all cluster worker nodes (round-robin on equal basis by default)
    61. 61. All worker nodes are essentially the same and there is no additional re-configuration necessary to add a new node
    62. 62. This fully achieves Horizontal Scaling of nagios checks </li></ul><li>Easy to Use in a Cloud Environment </li><ul><li>As nodes are the same, an existing worker node can be replicated with no special config to start it
    63. 63. Adding a node expands the cluster on demand </li></ul><li>Efficient Integration with Nagios </li><ul><li>Using NEB loaded modules achieves low-level integration with nagios, much better than NSCA and the command queue </li></ul></ul>
    64. 64. Disadvantages of DNX and Mod-Gearman <ul><li>Single Instance of Nagios Server </li><ul><li>The solution has no direct disadvantages however it only achieves horizontal scaling of nagios checks.
    65. 65. This still relies on a single central nagios server to processes the results, send alerts and schedule new checks. </li></ul><li>Does not achieve fault-tolerance </li><ul><li>If central nagios server dies entire system is out
    66. 66. Author of this presentation does have a patch to DNX that allows results to be multicast to multiple instances of the nagios server (the second one would be stand-by, not scheduling checks, only receiving results). This is experimental. </li></ul></ul>
    67. 67. DNX Architecture <ul><li>DNX Server and DNX Client (Worker) Daemons are multi-threaded. Client thread model is controlled by these commands:
    68. 68. Communication between Server and Client using own UDP protocol passing XML packets .
    69. 69. Almost all communication is from client to server. Client contacts DNX server dispatcher port, receives list of checks to run, runs them and returns results on collector port
    70. 70. DNX Client can support having common checks built into client. check_nrpe was included before, but was pulled out of a package as it required nagios source. </li></ul>#poolInitial = 20 #poolMin = 20 #poolMax = 100 #poolGrow = 10 channelDispatcher = udp:// channelCollector = udp://
    71. 71. DNX System Internals DNX Server System Internals DNX Client (Worker Node) System Internals
    72. 72. Mod-Gearman MOD-Gearman System Nagios Checks and Mod-Gearman Queues
    73. 73. DNX vs Mod-Gearman <ul><li>Single package, no external dependencies. Includes all job cluster control components </li><ul><li>Hard to maintain and test for non-Linux environment </li></ul><li>Can use localCheckPattern in server configuration to direct jobs. But it is not documented
    74. 74. Supports nagios-2.x with a patch and nagios-3.x as is
    75. 75. Client can be extended with nagios- specific features. Planned are: - Embedded Perl, check_icmp, - check_snmp, check_nrpe </li></ul><ul><li>Mod-Gearman is built around Gearman Project </li><ul><li>Better maintained since Gearman has many uses
    76. 76. Enjoys benefits of wider testing on new releases </li></ul><li>Easy to configure and direct to separate queues depending on hostgroup & servicegroup
    77. 77. Only supports nagios 3.x
    78. 78. Supports eventhandlers and not just checks !
    79. 79. Nagios-only features are hard to add at node level </li></ul>DNX Mod-Gearman
    80. 80. Combining Shared Database and Worker Nodes <ul>Nagios cluster options can be combined ! DNX or Mod-Gearman with Merlin or NDO are a great fit : - DNX offers horizontal scaling for all checks and relieves Nagios of the need to run them - Merlin provides horizontal scaling and failover for Nagios itself for infrastructure of thousands of hosts </ul>
    81. 81. Ideal Fully Fault-Tolerant Nagios Cluster Architecture Replication udpecho cross-monitor Ideally you would have each of the above as a separate cloud server, but even those with 1000s of servers may find this hard to maintain udp udp heartbeat Nagios Server Merlin/NDO DB Merlin/NDO DB Backup DB Proxy Nagios Web Interface Server Backup Nagios Web Interface Server Standby DB Proxy Worker Node Worker Node Worker Node Worker Node Backup Nagios Server Performance Data (RRD) Server (like NagiosGrapher) Backup Performance Data (RRD) Server
    82. 82. Nagios Cloud Cluster with 4 hosts N P C D N P C D MAIN NAGIOS SERVER STANDBY NAGIOS SERVER <ul><li>Standby Server has all checks disabled (except checking main nagios host)
    83. 83. Cross-monitor of other nagios does not use DNX cluster
    84. 84. If main server dies, backup takes over and registers itself in dynDNS server replacing primary.
    85. 85. DNX Clients use dynDNS address, they are restarted on server switch </li></ul>replication cross-monitor Nagios Daemon Apache Mysql DB Merlin PNP w/ RRD DNX Server DNX Client DNX Client Nagios Daemon Apache Mysql DB Merlin PNP w/ RRD DNX Server
    86. 86. Configuration of a cloud host <ul>The best way to configure monitoring of cloud hosts with multiple instances is to have a template and define all services by hostgroups. Then starting a new host of the same type is just a matter of adding a config like the above, but for w2, etc. One alternative is to add a few extra hosts to the nagios config and disable all service checks on those hosts, enabling them with a script when the server is launched </ul>define host { use wprod-server <--- Template for all Webservers host_name w1 alias webserv1 <---- This is a second way to search address w1.dynamic.cloud1.mydomain <---- Local DNS hostgroups production,loadbalanced,linux_centos5,webserv parents loadbalancer1,loadbalancer2 contact_groups admins }
    87. 87. Auto-Scaling <ul><li>Event handlers can be used or custom check.
    88. 88. Trigger based on total number of open http sockets (check_netstat, check_apache_status) from all servers
    89. 89. Write custom script that keeps number of currently active servers in DB or local file to set name of new server.
    90. 90. Have new server name as a parameter for launching cloud instance. Write startup scripts that use this to set hostname and register ip in local dynamic dns server.
    91. 91. For Amazon EC2, aws utility is very useful to automate launching of new servers. Get it at
    92. 92. Extra nagios worker node is launched similarly and this is triggered when enough servers have been launched. Can also do it based on nagios stats (check_nagios)
    93. 93. Scale down after an hour or more of low resource usage, you can do it with a check that relies on RRD data </li></ul>
    94. 94. Use of SQL DB for Auto-Scaling This is for illustration of logic only. Not real code. CREATE TABLE ServerData ( id bigint(10) unsigned NOT NULL, name varchar(50) default NULL, connections bigint(20) unsigned default 0, started_on date default NULL, PRIMARY KEY(id)); After you get the results of a server check (like an event handler that runs): UPDATE ServerData SET connections=<data from nagios check> WHERE name=<server host> Custom check to see if a new server should be started: $count=sqlexec(&quot;SELECT COUNT(id) FROM ServerData&quot;) $sumit=sqlexec(&quot;SELECT SUM(connections) FROM ServerData&quot;) $lastlaunched=sqlexec(&quot;SELECT MAX(started_on) FROM ServerData&quot;) if $sumit/$count > $threshold && ($now-$lastlaunched)>600 { <figure out the name and id> launch_new_server_instance($newname) sqlexec(&quot;INSERT INTO ServerData VALUES ($newid, $newname,0,CURDATE())&quot;) enable_nagios_service_checks($newname) }
    95. 95. Additional Cloud Monitoring Tips <ul><li>Cloud Servers are not entirely independent, and other servers on the same hardware may affect yours </li><ul><li>For a virtualized OS, system load checks are less useful and can show ”false” spikes in load. Put larger emphasis on 15-minute load and do more checks before alerts are sent
    96. 96. But if you control the cloud, find way to get cloud hardware system load. Write check showing physical server name
    97. 97. For load issues rely more on a number of connections (TCP session) and time to process each request. Do prior tests on how many connections one server should handle </li></ul><li>Remember, you can always just launch a new server </li><ul><li>Do not spend too much time investigating cause, take it out of production first, replace, and investigate later </li></ul></ul>
    98. 98. Nagios Cluster Software <ul><ul><li>Nagios, NDO-Utils, NSCA –
    99. 99. DNX (Distributed Nagios eXecutor) -
    100. 100.
    101. 101. Mod-Gearman -
    102. 102. Gearman -
    103. 103. Merlin (Module for Effortless Redundancy and Loadbalancing by OP5) –
    104. 104. Check-Multisite (collect data from multiple servers) –
    105. 105. Ganglia (open-source computing cluster monitoring, can be integrated with nagios) – </li></ul></ul>
    106. 106. Demo & Questions <ul>Questions ? </ul>
    107. 107. <ul>More Questions? Feedback? William Leibzon < [email_address] > My Nagios Page (mostly plugins) : <ul> </ul></ul>