11. Someone Said That ...
• What happens once in every million times
happens 3500 times per day
http://blog.nomadscafe.jp/2011/05/post-12.html
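The two figures in this quote imply a traffic volume, which can be checked in a few lines. The sketch below is only an illustration and is not taken from the cited post; the function name `implied_daily_events` is hypothetical.

```python
def implied_daily_events(failures_per_day: int, odds: int) -> int:
    """Events per day needed for a 1-in-`odds` failure to occur `failures_per_day` times."""
    return failures_per_day * odds


daily = implied_daily_events(3500, 1_000_000)
per_second = daily / 86_400  # seconds in one day
print(f"{daily:,} events/day, about {per_second:,.0f} events/s")
# 3,500,000,000 events/day, about 40,509 events/s
```

In other words, at a few billion events per day, "one in a million" is a routine occurrence rather than a rare one.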
Friday, June 15, 2012
12. In the Context of Nagios ...
(Photo : Postal Loathing by justin)
http://www.flickr.com/photos/justin/2412778/
13. Too Many Alert Mails
• Problems
• they keep ringing our mobile phones
• they sometimes bury more important alerts
• they put load on mail systems
15. #1: Defining Service Dependencies
• Approach
• whatever you use to monitor remote host status, every
check result depends on that mechanism working
• e.g. SNMP, NRPE, SSH ...
• define service dependencies between the
parent service and its child services
16. Consider Simple Case 1
• Your nagios monitors remote hosts via SNMP
• CPU, DISK, NTP, MEMORY
• all services are OK
(Diagram: Nagios checks CPU, DISK, NTP and MEMORY on the remote host via SNMP)
17. Consider Simple Case 2
• Nagios sometimes fails to check status via
SNMP because of high server load
• In this case, Nagios marks all four services
as UNKNOWN and sends us 4 alert mails
(Diagram: the SNMP path from Nagios to the remote host is marked "???"; CPU, DISK, NTP and MEMORY cannot be checked)
18. Consider Simple Case 3
• If many servers become overloaded at once,
Nagios sends us a lot of noisy alert mails
• they are noise because the cause is obvious:
SNMP is not working well
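To make the scale concrete, here is a small simulation of one check round. It is a sketch under assumptions: the host names, the count of 20 hosts, and the `poll` helper are made up for illustration, and real Nagios scheduling is more involved.

```python
SNMP_SERVICES = ("CPU", "DISK", "NTP", "MEMORY")


def poll(hosts, overloaded):
    """Return {(host, service): state} for one check round.

    A host in `overloaded` does not answer SNMP, so every SNMP-based
    check on it comes back UNKNOWN.
    """
    return {
        (host, svc): "UNKNOWN" if host in overloaded else "OK"
        for host in hosts
        for svc in SNMP_SERVICES
    }


hosts = [f"host{i}" for i in range(1, 21)]          # hypothetical host names
results = poll(hosts, overloaded=set(hosts[:10]))   # 10 hosts overloaded at once
mails = [key for key, state in results.items() if state != "OK"]
print(len(mails))  # 40
```

Ten overloaded hosts produce 40 mails that all describe the same underlying fact.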
19. Defining SNMP Service Dependencies
• Nagios stops sending alert mails for the dependent
services while the SNMP service is UNKNOWN
• you will receive only the single alert for the
SNMP service itself
define servicedependency {
    dependent_host_name              host1
    dependent_service_description    CPU,DISK,MEMORY,NTP
    host_name                        host1
    service_description              SNMP
    notification_failure_criteria    u    ; suppress dependents while SNMP is UNKNOWN
}
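The effect of this definition can be modelled in a few lines of Python. This is a simplified sketch, not Nagios's actual implementation: the `should_notify` function and its data shapes are hypothetical, and it ignores details such as soft/hard states and execution dependencies.

```python
def should_notify(service, states, dependencies):
    """Decide whether a non-OK service may send a notification.

    states:       {service: state}
    dependencies: {child: (parent, criteria)}, where criteria is the set of
                  parent states that suppress the child's notifications
                  (the role played by notification_failure_criteria).
    """
    if states[service] == "OK":
        return False
    parent = dependencies.get(service)
    if parent is not None:
        parent_name, criteria = parent
        if states[parent_name] in criteria:
            return False  # parent is failing, so the child stays quiet
    return True


children = ("CPU", "DISK", "MEMORY", "NTP")
deps = {svc: ("SNMP", {"UNKNOWN"}) for svc in children}
states = {"SNMP": "UNKNOWN", **{svc: "UNKNOWN" for svc in children}}

notified = [svc for svc in states if should_notify(svc, states, deps)]
print(notified)  # ['SNMP']
```

With the dependency in place, five failing services produce one notification instead of five.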
20. #2: Summarizing Similar Alerts
(Diagram: several CPU alerts flow into a Summarizer, which sends one summary alert)
Aggregating Nagios alerts with fluentd
http://6pongi.wordpress.com/2012/06/08/fluentdnagios/
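The summarizer idea can be sketched independently of fluentd. The code below is a plain Python illustration and not the fluentd setup described in the linked post; the `summarize` function, the alert tuple layout, and the 60-second window are all assumptions.

```python
from collections import defaultdict


def summarize(alerts, window_seconds=60):
    """Group alerts by (service, state) within fixed time windows.

    alerts: iterable of (timestamp, host, service, state) tuples.
    Returns one summary line per group instead of one mail per alert.
    """
    groups = defaultdict(list)
    for ts, host, service, state in alerts:
        bucket = int(ts) // window_seconds  # index of the time window
        groups[(bucket, service, state)].append(host)
    return [
        f"{service} {state} on {len(hosts)} hosts: {', '.join(sorted(hosts))}"
        for (_, service, state), hosts in sorted(groups.items())
    ]


alerts = [
    (0, "host1", "CPU", "CRITICAL"),
    (5, "host2", "CPU", "CRITICAL"),
    (9, "host3", "CPU", "CRITICAL"),
]
print(summarize(alerts))
# ['CPU CRITICAL on 3 hosts: host1, host2, host3']
```

Three similar CPU alerts arriving within the same window collapse into a single summary message.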