A quick overview of network monitoring basics: why you should do it, what to look for, and more.
This is a presentation I made to our local network professionals group a while back.
2. • WAN links between sites
• Links between core network devices
• Important devices like servers and core appliances
• Websites
3. • Drive space, CPU and memory utilization
• Log files (for errors or other text)
• Network utilization and bandwidth
• Important services and processes
• Internal or External website availability
4. “My Internet is slow”
Measure bandwidth or CPU of firewall, outbound connections (virus?)
“I can’t get any email, is the server down?”
Check Exchange services, monitor outbound mail traffic per sec.
“We are paying $900 per month to connect our satellite office with
a high speed connection. Is it worth it?”
Watch for peak bandwidth usage during day, week, month.
“Everyone here can’t print. You did something, didn’t you?”
Monitor spooler service, watch for errors in system log regarding
printers
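To judge whether a link is worth its cost, raw counters have to be turned into a utilization figure. The sketch below is one way to do that, assuming two readings of an interface octet counter (for example, one polled over SNMP) taken a known number of seconds apart. The function name, its parameters and the 32-bit counter default are illustrative assumptions, not part of any specific monitoring tool.

```python
def utilization_pct(prev_octets: int, curr_octets: int, seconds: float,
                    link_bps: float, counter_bits: int = 32) -> float:
    """Return link utilization (percent) between two octet-counter readings.

    Assumes the counter wrapped at most once between the two readings.
    """
    if seconds <= 0 or link_bps <= 0:
        raise ValueError("seconds and link_bps must be positive")
    # Modulo arithmetic handles a counter that rolled over to zero.
    delta_octets = (curr_octets - prev_octets) % (2 ** counter_bits)
    # 8 bits per octet; divide by the bits the link could have carried.
    return 100.0 * delta_octets * 8 / (seconds * link_bps)


# 1,250,000 octets in 10 seconds on a 10 Mbit/s link
sample_utilization = utilization_pct(0, 1_250_000, 10, 10_000_000)
```

Recording this value at every poll and keeping the daily, weekly and monthly maximum gives the peak usage figures mentioned above.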
5. A good monitoring system will query a device for a specific
set of statistics, retain this data and report to an appropriate
administrator if those statistics exceed an acceptable
threshold…
…if a drive is 90% full, let the IT administrator know via email so they can begin to remedy the situation.
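The drive example can be sketched in a few lines. This is a minimal illustration, assuming the check runs on the monitored machine itself. The function names and the 90% default are hypothetical, and delivering the message (email, SMS) is left to whatever the monitoring tool provides.

```python
import shutil
from typing import Optional


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total space."""
    return 100.0 * used / total


def check_drive(path: str, threshold_pct: float = 90.0) -> Optional[str]:
    """Return an alert message if the drive holding `path` is at or
    above the threshold, otherwise None."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    pct = percent_used(usage.total, usage.used)
    if pct >= threshold_pct:
        return f"Drive at {path} is {pct:.1f}% full (threshold {threshold_pct:.0f}%)"
    return None


message = check_drive(".")
if message is not None:
    print(message)  # a real system would email the administrator here
```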
6. What do monitoring systems use to get their data?
SNMP – Linux, Network Hardware, Windows
WMI – Windows
Performance Counters – Windows
SSH – Linux
7. SNMP
Usually requires MIB (management information base) files to monitor
advanced system statistics
WMI
Typically available by default, but highly security conscious network
admins may have this locked down
Performance Counters
If you can view it in Windows Perfmon, you can track it in some
monitoring tools
SSH
May require root access to run some commands
8. • Monitor threshold – at what point does something trigger
an alert?
• Alert – When a threshold is met for a period of time, go
into “Alert” status.
• Action – Send an email, SMS, restart a service, run a
script, etc.
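The three terms above fit together as a small state machine. The following is a rough sketch under assumed names (`Monitor`, `hold_seconds`, `observe`), not the behavior of any particular product: a value must stay at or above the threshold for a set time before the monitor enters Alert status and runs its actions once.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Monitor:
    threshold: float                 # value that counts as a breach
    hold_seconds: float              # how long a breach must last
    actions: List[Callable[[str], None]] = field(default_factory=list)
    breach_started: Optional[float] = None
    in_alert: bool = False

    def observe(self, value: float, now: float) -> bool:
        """Feed one sample; return True while the monitor is in Alert status."""
        if value < self.threshold:
            # Back to normal: clear the breach timer and the alert.
            self.breach_started = None
            self.in_alert = False
            return False
        if self.breach_started is None:
            self.breach_started = now
        if not self.in_alert and now - self.breach_started >= self.hold_seconds:
            self.in_alert = True
            for action in self.actions:  # e.g. send email, restart service
                action(f"value {value} at or above {self.threshold} "
                       f"for {self.hold_seconds}s")
        return self.in_alert


sent: List[str] = []
cpu = Monitor(threshold=90, hold_seconds=60, actions=[sent.append])
cpu.observe(95, now=0)    # breach starts, no alert yet
cpu.observe(97, now=60)   # breach has lasted 60 seconds: actions run
```

Running the actions only on the transition into Alert status avoids sending a new email on every poll while the condition persists.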
9. • Historical trending and reporting
• Maintenance windows
• Multiple notification methods
• Ability to perform action in response to an alert
• NOC (Network Operation Center) view
• Large variety of monitor types that support
WMI, SNMP, etc.
• Ability to produce alerts based on a defined span
of time
10. • Company shared drive size and availability
• Ensure Exchange service and Accounting system DB are
accessible after backups
• Make sure outgoing Internet connection is not saturated
• Keep invalid domain logon attempts at bay
• Watch for system errors
11. • How long until something is considered an emergency?
• Will the condition return to normal without your
intervention?
• How do you want to be notified –
email, SMS, page, IM, Net Send?
• Do you want the monitoring tool to attempt to remedy the
situation automatically?
12. • Configure your monitors with high thresholds while you determine what is “normal”
• Watch these monitors over time to get an idea of normal peaks and valleys of performance stats
• Tweak your monitors according to trending and growth patterns
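One possible way to turn observed "normal" into a threshold is to take the average of the collected samples and add a margin based on how much they vary. The sketch below assumes that approach; the function name and the default multiplier `k` are arbitrary illustrative choices, and a percentile-based rule would work just as well.

```python
import statistics
from typing import Sequence


def suggest_threshold(samples: Sequence[float], k: float = 3.0) -> float:
    """Suggest an alert threshold: mean of the baseline samples plus
    k population standard deviations."""
    mean = statistics.fmean(samples)
    spread = statistics.pstdev(samples)
    return mean + k * spread


# CPU utilization (%) sampled during a baseline period
baseline = [35.0, 42.0, 38.0, 55.0, 40.0, 47.0]
suggested = suggest_threshold(baseline)
```

Re-running this as more history accumulates is one way to keep thresholds in line with growth.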
13. 1. The Death Star depends on the tractor beam
2. The IT Admin sets up a monitor to watch service:
“tractor_beam”
3. He then configures the alert to “Email” Darth Vader
when the tractor beam goes down
4. Obi-Wan disables the tractor beam
5. Five minutes later, the Millennium Falcon escapes
6. Tractor beam is down for an additional 5 minutes, then
the monitoring system sends an email
7. Vader is busy choking one of his employees, and has
his BlackBerry set on “vibrate”…
“…probably should have set the monitoring system to restart the service before
Han got awa-aaacccchhdhhshhpfffft” – IT admin speaking with Darth Vader
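The lesson of the story, remediate first and notify second, could look something like the sketch below. The `restart` and `notify` callables are placeholders for whatever the monitoring tool offers. They are assumptions for illustration, not a real API.

```python
from typing import Callable, List


def handle_service_down(service: str,
                        restart: Callable[[str], bool],
                        notify: Callable[[str], None]) -> bool:
    """Try an automatic restart first, then notify with the outcome.

    Returns True if the restart succeeded.
    """
    recovered = restart(service)
    if recovered:
        notify(f"{service} went down and was restarted automatically")
    else:
        notify(f"{service} is DOWN and the automatic restart failed")
    return recovered


inbox: List[str] = []
handle_service_down("tractor_beam", restart=lambda name: True, notify=inbox.append)
```

Either way a human still gets a message, but the service is already back up (or known to need hands-on help) by the time it is read.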
14. This is an example of an event log report when a user attempted to log in with an invalid password.
=====================================
Time: 2010/10/05 20:34:22
Object: DC-ROA-01(DC-ROA-01)
Monitor: Security events
=====================================
Status: Alarm
Message: Found matching eventlog record
Event id: 529
Computer: DC-ROA-01
Source: Security
User: SYSTEM
Time Generated: 2010/10/05 20:06:27
Message: Logon Failure:
Reason: Unknown user name or bad password
User Name: amyv@mydomain.com
Domain: mydomain.com
Logon Type: 8
Logon Process: Advapi
Authentication Package: Negotiate
Workstation Name: DC-ROA-01
Caller User Name: DC-ROA-01$
Caller Domain: MYDOMAIN
Caller Logon ID: (0x0,0x3E7)
Caller Process ID: 7708
Transited Services: -
Source Network Address: 67.184.244.32
Source Port: 56049
• Logon Type: 8 means the password was passed using ClearText.
• Caller Process ID is the PID of the executable on the server processing the logon attempt.
• Source Network Address is the user’s Comcast IP.
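A report like this is easier to act on once its "Key: Value" lines are in a lookup table. The following is a small sketch, assuming one field per line as in the alert text. The function name is hypothetical and the sample is a shortened copy of the fields shown in the report.

```python
from typing import Dict


def parse_event_fields(report: str) -> Dict[str, str]:
    """Parse 'Key: Value' lines into a dict.

    Splits on the first colon only, so values such as timestamps keep
    their own colons. If a key repeats, the first value is kept.
    """
    fields: Dict[str, str] = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields.setdefault(key.strip(), value.strip())
    return fields


sample = """\
Event id: 529
User Name: amyv@mydomain.com
Logon Type: 8
Source Network Address: 67.184.244.32
"""

fields = parse_event_fields(sample)
# Logon Type 8 means the password was passed in clear text.
cleartext_logon = fields.get("Logon Type") == "8"
```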
15. It is important to be able to keep a history of
trending, especially with storage devices and
service outages. This will help determine future
needs for backup and DR processes.
You can get an idea of heavily used
volumes/resources, allowing you to organize
planned downtime when moving them.
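As a rough illustration of what trending history makes possible, the sketch below fits a straight line to past storage usage and estimates how many days remain until a volume fills. Linear growth is an assumption that real data may not follow, and the function name and sample figures are made up for the example.

```python
from typing import Optional, Sequence, Tuple


def days_until_full(history: Sequence[Tuple[float, float]],
                    capacity: float) -> Optional[float]:
    """Estimate days after the last sample until `capacity` is reached.

    `history` holds (day, used) samples. Returns None when there is too
    little data or usage is not growing.
    """
    n = len(history)
    if n < 2:
        return None
    mean_x = sum(day for day, _ in history) / n
    mean_y = sum(used for _, used in history) / n
    sxx = sum((day - mean_x) ** 2 for day, _ in history)
    if sxx == 0:
        return None
    # Least-squares slope: growth in used space per day.
    slope = sum((day - mean_x) * (used - mean_y) for day, used in history) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    full_day = (capacity - intercept) / slope
    return max(0.0, full_day - history[-1][0])


# (day, GB used) on a 200 GB volume
usage_history = [(0, 100.0), (1, 110.0), (2, 120.0)]
remaining = days_until_full(usage_history, capacity=200.0)
```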
16. Windows based
• Total Network Monitor
http://www.softinventive.com/products/total-network-monitor/
• MikroTik’s “The Dude”
http://www.mikrotik.com/thedude.php
• Hyperic HQ Open Source
http://www.hyperic.com/products/open-source-systems-monitoring
• Spotlight on Windows (realtime monitoring only)
http://www.quest.com/spotlight-on-windows/ - free registration required
• Splunk (logfile indexing)
http://www.splunk.org
• Spiceworks (general activity monitoring)
http://community.spiceworks.com
Linux based
• Zenoss
http://www.zenoss.com/
• Nagios
http://www.nagios.org
17. This presentation will be available from www.ninp.org (via
SlideShare)
Rob Dunn: uphold2001@hotmail.com
Editor's Notes
SNMP – a protocol. WMI – a set of extensions to the Windows Driver Model that provides an interface through which instrumented components provide information and notification.
Cisco uses an access list to allow SNMP traffic to and from a specific host via read-only or read-write community strings.
Time intervals – for example, if a threshold is tripped 10 times over 20 minutes, then produce an alert.
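A time-interval rule like this can be sketched with a sliding window of timestamps. The class and parameter names below are illustrative assumptions. The defaults mirror the "10 times over 20 minutes" example.

```python
from collections import deque
from typing import Deque


class TripWindow:
    """Alert when a threshold trips `max_trips` times within `window_seconds`."""

    def __init__(self, max_trips: int = 10, window_seconds: float = 20 * 60):
        self.max_trips = max_trips
        self.window_seconds = window_seconds
        self.trips: Deque[float] = deque()

    def record_trip(self, now: float) -> bool:
        """Record one trip at time `now`; return True if an alert is due."""
        self.trips.append(now)
        # Drop trips that have fallen out of the window.
        while self.trips and now - self.trips[0] > self.window_seconds:
            self.trips.popleft()
        return len(self.trips) >= self.max_trips


window = TripWindow(max_trips=3, window_seconds=60)
results = [window.record_trip(t) for t in (0, 10, 20)]
```

A single spike never alerts on its own, which filters out brief blips that clear up without intervention.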
You can have many monitoring tools run a script, restart a service, send a notification, etc. in response to an alert. In the case of event log monitors, you can set it to perform what is called a ‘looping list’, which resets itself after every event log scan. This allows it to scan event log dates and remember the last place it scanned before running another check.
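The "remember the last place it scanned" idea can be illustrated with a simple bookmark. This sketch assumes each event carries an increasing record number, which is an assumption made for the example (the note above speaks of event log dates), and the class name is hypothetical.

```python
from typing import Iterable, List, Tuple


class EventLogScanner:
    """Scan only the records that appeared since the previous scan."""

    def __init__(self) -> None:
        self.last_record_id = 0  # bookmark: newest record already scanned

    def scan(self, records: Iterable[Tuple[int, str]],
             needle: str) -> List[Tuple[int, str]]:
        """Return new records whose message contains `needle`."""
        matches: List[Tuple[int, str]] = []
        newest = self.last_record_id
        for record_id, message in records:
            if record_id <= self.last_record_id:
                continue  # already seen in an earlier scan
            if needle in message:
                matches.append((record_id, message))
            newest = max(newest, record_id)
        self.last_record_id = newest  # move the bookmark forward
        return matches


scanner = EventLogScanner()
log = [(1, "Service started"), (2, "Logon Failure")]
first = scanner.scan(log, "Logon Failure")
second = scanner.scan(log, "Logon Failure")  # nothing new since last scan
```

Without the bookmark, the same failed logon would raise a fresh alert on every scan.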