Thomas Dunbar's presentation on Building Technology for Storage Systems Monitoring.
The presentation was given during the Nagios World Conference North America held Sept 20-Oct 2nd, 2013 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
2. References & Introduction
* http://content.healthaffairs.org/content/30/6/1185.full.html
* nagios.org, etc
* Nagios: Building Enterprise-Grade Monitoring Infrastructures for Systems and Networks, 2nd ed., David Josephsen
* The Unix Programming Environment, Kernighan & Pike
* After Virtue, 3rd ed., Alasdair MacIntyre
* Purgatorio, Dante - since Nagios ain’t gonna insist on sainthood
3. IHC and IT
Intermountain Healthcare (IHC) is an internationally recognized, nonprofit
system of 22 hospitals, a Medical Group with more than 185
physician clinics, and an affiliated health insurance company,
SelectHealth. Our 33,000 employees serve patients and plan
members in Utah and southeastern Idaho. IHC has an annual
budget of around 5 billion dollars.
Datacenters in Plano, TX; Salt Lake City, UT; and Ogden, UT
provide high-availability systems with over 5 petabytes of
storage (over 12,000 spindles), using IBM DS8000 for tier 1 and
NetApp for other storage. In-house developed applications run
on top of multiple Oracle databases over 15 TB in size.
CA Service Desk/CA Spectrum/xMatters; Nagios
5. Storage’s Nagios Servers
While the SA team is moving away from Nagios,
Storage is moving to it:
Using Nagios 3.5, with check_mk and pnp4nagios
DNX (Distributed Nagios eXecutor), if need be
Our own servers for business reasons
Integration with CA Spectrum/Service Desk, etc
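Because pnp4nagios graphs whatever performance data a check prints after the "|", a storage check only has to follow the standard Nagios plugin conventions: exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, and perfdata in the form 'label'=value[UOM];warn;crit;min;max. The sketch below is a hypothetical capacity check, not one of the checks used at IHC; the function names, the 80%/90% thresholds, and the idea that used and total bytes arrive as command-line arguments are all assumptions for illustration.

```python
# Hypothetical capacity check following the Nagios plugin conventions.
# Names and thresholds are illustrative assumptions, not IHC's actual checks.

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_capacity(volume, used_bytes, total_bytes, warn_pct=80.0, crit_pct=90.0):
    """Return (exit_code, output_line) for a volume's space usage.

    Everything after the '|' is performance data that pnp4nagios can graph.
    """
    if total_bytes <= 0:
        return UNKNOWN, f"CAPACITY UNKNOWN - {volume}: total size is {total_bytes}"
    pct = 100.0 * used_bytes / total_bytes
    if pct >= crit_pct:
        code = CRITICAL
    elif pct >= warn_pct:
        code = WARNING
    else:
        code = OK
    # Perfdata: 'label'=value[UOM];warn;crit;min;max
    perfdata = f"'{volume}_used'={pct:.1f}%;{warn_pct:g};{crit_pct:g};0;100"
    return code, f"CAPACITY {STATE_NAMES[code]} - {volume} {pct:.1f}% used | {perfdata}"


def main(argv):
    """Plugin entry point: argv = [VOLUME, USED_BYTES, TOTAL_BYTES].

    Prints one status line and returns the exit code Nagios should see.
    """
    try:
        volume, used, total = argv[0], int(argv[1]), int(argv[2])
    except (IndexError, ValueError):
        print("CAPACITY UNKNOWN - usage: VOLUME USED_BYTES TOTAL_BYTES")
        return UNKNOWN
    code, output = check_capacity(volume, used, total)
    print(output)
    return code
```

Installed as a plugin, the script would end with sys.exit(main(sys.argv[1:])) so that Nagios sees the exit code. For example, the arguments vol_data01 720 1000 print "CAPACITY OK - vol_data01 72.0% used | 'vol_data01_used'=72.0%;80;90;0;100" and return 0.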
7. This Talk’s Perspective
Comprehensive monitoring is a major, site-specific application.
Major applications become very difficult to
replace (e.g. air traffic control, IHC systems)
Hence, let’s consider fundamentals
8. Worldviews
* What we look through, not what we look at
* Tempts us to think it is the only way to see
* Scientific: what can we know, and how
* Technological: what can we build, and how
* Context
11. Spectrum of Traps
EventMessage: Thu 05 Sep, 2013 - 14:47:23 -
Device ********** of type NetAppONTAPDev is
no longer responding to primary management
requests (e.g. SNMP)
CA Spectrum and Nagios
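One way to bring a Spectrum event like the one above into Nagios is to turn it into a passive host check result submitted through the Nagios external command file. The sketch below is hypothetical: it assumes the event text arrives as a plain string (how it is forwarded from Spectrum is not covered here), that the Spectrum device name is also the Nagios host name, and that the command file sits at the default source-install path. The regular expression is built from this single sample message, and the function names are illustrative only.

```python
import re
import time

# Pattern built from the single sample "no longer responding" event above
# (an assumption); other Spectrum events would need their own patterns.
# \s+ tolerates the line break between "is" and "no longer".
NOT_RESPONDING = re.compile(
    r"Device (?P<device>\S+) of type (?P<dtype>\S+) is\s+no longer responding"
)

HOST_DOWN = 1  # Nagios host states: 0 = UP, 1 = DOWN, 2 = UNREACHABLE


def spectrum_event_to_command(message, now=None):
    """Translate a Spectrum event message into a Nagios external command.

    Returns None if the message is not a recognised event. Assumes the
    Spectrum device name doubles as the Nagios host name (hypothetical).
    """
    match = NOT_RESPONDING.search(message)
    if match is None:
        return None
    timestamp = int(time.time() if now is None else now)
    output = (f"Spectrum: {match['dtype']} not responding to "
              "primary management requests")
    # Format: [time] PROCESS_HOST_CHECK_RESULT;<host_name>;<host_status>;<plugin_output>
    return (f"[{timestamp}] PROCESS_HOST_CHECK_RESULT;"
            f"{match['device']};{HOST_DOWN};{output}")


def submit(command, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write one command line to the Nagios external command file (a named pipe).

    The default path is an assumption; use the command_file value from nagios.cfg.
    """
    with open(command_file, "w") as pipe:
        pipe.write(command + "\n")
```

Nagios only acts on such results when check_external_commands and accept_passive_host_checks are enabled in nagios.cfg. Keeping the parsing separate from submit() lets the translation be tested without a running Nagios instance.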