Diagnostic System Monitoring
1. Data Warehouse | July 13, 2013
Kevin Jesse
Data Warehouse Team | University IT
David Andruczyk
Web Services Team | University IT
2. Data Warehouse | July 13, 2013
Diagnostic monitoring refers to collecting
ALL (or as many as possible) known
system metrics at periodic intervals over
time.
The information collected lets you see
fluctuations in areas of the system that
may or may not impact operational use.
It also provides detailed
system metrics that can be used for
further tuning.
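As a rough illustration of "collecting metrics at periodic intervals", the sketch below polls a set of probes on a fixed interval and keeps timestamped samples. It is a minimal sketch, not how any particular monitoring tool works: the function names (collect_sample, collect_periodically), the PROBES table, and the two example probes are assumptions made for illustration.

```python
import shutil
import threading
import time


def collect_sample(probes):
    """Run every probe once and return one timestamped sample."""
    sample = {"timestamp": time.time()}
    for name, probe in probes.items():
        try:
            sample[name] = probe()
        except Exception as exc:  # one failing probe must not stop the others
            sample[name] = f"error: {exc}"
    return sample


def collect_periodically(probes, interval_seconds, count):
    """Collect `count` samples, sleeping `interval_seconds` between them."""
    samples = []
    for i in range(count):
        samples.append(collect_sample(probes))
        if i < count - 1:
            time.sleep(interval_seconds)
    return samples


# Hypothetical probe set; a diagnostic setup would register as many as possible.
PROBES = {
    "disk_free_mb": lambda: shutil.disk_usage("/").free // (1024 * 1024),
    "python_threads": threading.active_count,
}

if __name__ == "__main__":
    for sample in collect_periodically(PROBES, interval_seconds=1, count=3):
        print(sample)
```

Keeping each probe as a plain callable makes it cheap to add metrics, which suits the "collect everything known" goal of diagnostic monitoring.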
3. Data Warehouse | July 13, 2013
Operational monitoring refers to
collecting KEY system metrics at
periodic intervals over time.
The information collected allows you to
refine the initial configuration so that it is
more tailored to your requirements.
The information also prepares you to
address new problems that might
appear on their own or following
upgrades, increases in volumes, or new
deployments.
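One way to picture the difference from diagnostic monitoring is as a filter: the operational view keeps only the KEY metrics out of everything that is collected. The sketch below assumes hypothetical metric names and an arbitrary choice of key metrics; the numeric values are borrowed from the example check output in this presentation.

```python
# Full diagnostic sample. Metric names are hypothetical; the values come from
# the example check output shown in this presentation.
diagnostic_sample = {
    "apache_response_seconds": 0.031554,
    "apache_busy_workers": 1,
    "open_files": 9028,
    "core_files": 0,
    "jvm_thread_count": 352,
    "process_count": 770,
}

# Assumed selection of KEY metrics; each site would choose its own.
OPERATIONAL_KEYS = ("apache_response_seconds", "jvm_thread_count", "process_count")


def operational_view(sample, keys=OPERATIONAL_KEYS):
    """Return only the key metrics that are present in a diagnostic sample."""
    return {key: sample[key] for key in keys if key in sample}


if __name__ == "__main__":
    print(operational_view(diagnostic_sample))
```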
4. Data Warehouse | July 13, 2013
Apache Server Status
OK 0.031554 seconds response time. Idle 29, busy 1, open slots 470
WARNING 0.029917 seconds response time. Idle 27, busy 353, open slots 120
Open Files
OK: Open files is 9028 of 819200
System Core Files
OK - 0 Core(s) found
Java JVM Threads
JMX OK - ThreadCount=352
JMX WARNING - ThreadCount=683
Total Number of Processes
PROCS CRITICAL: 770 processes
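The check results above report one of three states: OK, WARNING, or CRITICAL. A minimal sketch of how a check might map a measured value to such a state is shown below. The threshold values are assumptions chosen only so that the sample numbers from this slide land in the states shown; the slides do not state the real thresholds, and the function names are hypothetical.

```python
def classify(value, warning, critical):
    """Return the state for a metric where higher values are worse."""
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


def format_check(prefix, name, value, warning, critical):
    """Build a one-line status message similar to the JMX lines above."""
    return f"{prefix} {classify(value, warning, critical)} - {name}={value}"


if __name__ == "__main__":
    # Assumed thresholds: warn at 500 threads, critical at 1000 threads.
    print(format_check("JMX", "ThreadCount", 352, warning=500, critical=1000))
    # prints: JMX OK - ThreadCount=352
    print(format_check("JMX", "ThreadCount", 683, warning=500, critical=1000))
    # prints: JMX WARNING - ThreadCount=683
    # Assumed thresholds: warn at 500 processes, critical at 700 processes.
    print(classify(770, warning=500, critical=700))
    # prints: CRITICAL
```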
5. Data Warehouse | July 13, 2013
Apache HTTP/HTTPS
HTTP OK: HTTP/1.1 200 OK - 245 bytes in 0.032 second response time
System CPU
24 CPU, average load 3.2% < 50% : OK
System Disk Usage
DISK OK - free space: / 6717 MB (92% inode=99%)
System Memory
OK - 79444M free
System Interfaces
OK: host 'localhost', interfaces up: 7, down: 0, dormant: 0
6. Data Warehouse | July 13, 2013
Benefits
Helps identify key operational metrics
Helps with a holistic view of a system
Performing poorly vs. down
Gives additional insight into the system
Allows for quicker understanding of a failure based on data
Proactive monitoring of services which can forecast impending system failure
Allows SMEs to have more visibility
Gives vendors access to additional data for troubleshooting
Risks or Downside
Overuse or redundant monitoring
Initial implementation can have a high technical cost with SME
Overwhelming amount of data to analyze
Alert overload from misconfiguration
Two systems to maintain (diagnostic and operational)
7. Data Warehouse | July 13, 2013
Trend or Prediction Analysis
Identification of Overall Performance Metrics
Misconfigurations in Larger System
Can Help to Identify and Pinpoint System Abuse
Early detection via warning signals that an abnormality is
occurring helps avoid the “shock/panic” factor
Early detection of abnormalities vs. “System Down”
Allows more time for analysis, assisting with scenario/what-if
planning
Insight into enhancements that would otherwise go unnoticed
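Trend or prediction analysis can be as simple as fitting a straight line to collected samples and estimating when the line reaches a limit. The sketch below uses an ordinary least-squares fit; the function names are hypothetical and the disk-usage numbers are made-up illustration data, not measurements from this presentation.

```python
def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def time_to_threshold(xs, ys, threshold):
    """Return the x at which the fitted line reaches `threshold`.

    Returns None if the fitted line does not reach the threshold
    after the last sample (flat trend, wrong direction, or already past).
    """
    slope, intercept = linear_fit(xs, ys)
    if slope == 0:
        return None
    x_cross = (threshold - intercept) / slope
    return x_cross if x_cross > xs[-1] else None


if __name__ == "__main__":
    days = [0, 1, 2, 3, 4]                # made-up sample times
    disk_used_pct = [70, 72, 74, 76, 78]  # made-up disk usage readings
    print(time_to_threshold(days, disk_used_pct, threshold=90))
    # prints: 10.0
```

A straight line is the crudest possible model. It is used here only to show the idea of an early warning before a "System Down" event.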
8. Data Warehouse | July 13, 2013
Nagios
Cacti
AWStats
Logwatch
Up.Time
SCOM
Tripwire
SolarWinds
Zabbix
Munin
GroundWork
Big Brother
NfSen
MRTG
Hyperic HQ
Tivoli
http://en.wikipedia.org/wiki/Comparison_of_network_monitoring_systems
9. Data Warehouse | July 13, 2013
Diagnostic monitoring is something that SMEs
specialize in along with their other skills.
Many SMEs prefer to add a monitoring station
as an individual component of a larger cluster
or platform system. This helps an
administrator focus on tuning vs. being
impacted by other alerts or misconfigurations
in the monitoring station.
Smaller systems with fewer overall metrics may
not warrant standing up a unique monitoring
station. These systems would benefit most
from a collaborative and centralized diagnostic
monitoring station.