“Monitoring Networks Using Nagios”
Eleftherios Iliopoulos
University of Greenwich
M.Sc. Course: Internet Engineering and Web
Management
Master Thesis
15 September 2014
Dissertation Submitted to the University of Greenwich in partial fulfillment of the
requirements for the degree of Master of Science in Internet Engineering and Web
Management.
UNIVERSITY OF GREENWICH
Date:
September
2014
Author:
Eleftherios Iliopoulos
Course Program: Master Thesis
Name of assignment: Monitoring Networks Using Nagios
Instructor: Pandithas Ioannis Pages: [75]
Supervisor: Pandithas Ioannis
ABSTRACT
A network management strategy is required to keep a network stable. If a
company’s strategy includes a network monitoring tool, administrators know at any time
when network devices and services are in jeopardy. By proactively preventing network
downtime, which can be caused by many factors such as misconfigured hardware or
software, a company increases network uptime, provides better business services and
improves business continuity, which in turn supports the profitability of the enterprise.
The objective of this thesis is to investigate whether Nagios is a suitable open source
monitoring tool for small to midsize organizations. The literature section gives an
overview of the FCAPS framework set against specific Nagios functionalities, together
with a concise description of network management and the technologies behind a
monitoring tool. Furthermore, the implementation of Nagios for detecting and responding
to faults in a network is described, along with the alerts that notify the system
administrator. A laboratory experiment shows that Nagios monitors a network
effectively. The research also includes an evaluation of Nagios under testing
conditions, and the relevant configuration files are presented.
Keywords: Nagios, network monitoring tools
ACKNOWLEDGEMENTS
I take this opportunity to thank Dr. Giorgios Papamichail,
who gave me the chance to work under his supervision. I also thank
Dr. Ioannis Pandithas for his important suggestions, which helped me greatly
in completing this assignment. I cannot express in words
my gratitude to my family, who always
encouraged me to achieve my goal. I am also thankful to the University of
Greenwich for giving me the opportunity to obtain a quality education.
ABBREVIATIONS
CEO Chief Executive Officer
CGI Common Gateway Interface
CMIP Common Management Information Protocol
CMISE Common Management Information Service Element
CPCC Central Piedmont Community College
DDOS Distributed Denial of Service
FCAPS Fault, Configuration, Accounting, Performance and Security
FTP File Transfer Protocol
GUI Graphical User Interface
HTTP Hypertext Transfer Protocol
IT Information Technology
IP Internet Protocol
MIB Management Information Base
NOC Network Operations Center
NMS Network Management System
OID Object Identifier
OS Operating System
POP3 Post Office Protocol 3
SMTP Simple Mail Transfer Protocol
SNMP Simple Network Management Protocol
SSH Secure Shell
TBFG Task-Based Focus Group
TCP Transmission Control Protocol
UDP User Datagram Protocol
LIST OF FIGURES
Figure 1. Typical network management architecture [Online image].
http://www.cisco.com/en/US/docs/internetworking/technology/handbook/NM-Basics.pdf
[Accessed 13th March 2014].
Figure 2. Characteristics of SNMP protocol in use: v1, v2c and v3 [Online image].
http://www.cse.wustl.edu/~jain/cse567-06/ftp/net_traffic_monitors2/index.html
[Accessed 22nd July 2014].
Figure 3. A simplified SNMP architecture [Online image].
http://jmiller.uaa.alaska.edu/cse465-fall2012/papers/fiang2002.pdf
[Accessed 13th July 2014].
Figure 4. Performance methodology flow. Usability Testing, p.192. Carnegie
Mellon University [Online image].
http://www.cs.cmu.edu/~cprose/LTI-6-UsabilityTesting.pdf
[Accessed 13th July 2014].
Table of Contents
1.0 Chapter 1: Statement of the Problem and Research Aim
1.1 Introduction
1.2 Statement of the Problem
1.3 Research Aim and objective
1.4 Research Questions
1.5 Organization of Study
2.0 Chapter 2: Literature Review
2.1 The Definition of Network Management
2.2 Network Management Architecture
2.3 Network Management Protocol
2.3.1 SNMP
2.3.2 SNMP Messages Types
2.3.3 SNMP and UDP
2.3.4 SNMP Management Information Base (MIB)
2.4 Functional Division of Network Management
2.4.1 Fault Management
2.4.2 Configuration Management
2.4.3 Accounting Management
2.4.4 Performance Management
2.4.5 Security Management
2.5 Choosing systems management tools
2.6 Network Monitoring tasks
2.7 Comparison Nagios against Industry Standards
2.8 The selection of Nagios
2.9 Validating the Literature review outcome
3.0 Chapter 3: Methodology
3.1 Overview
3.2 Experimental Laboratory
3.3 Usability Testing
3.4 Performance Measurement
3.5 Questionnaire
3.6 Validity threats related to Questionnaires
3.7 Validity threats related to Experimental Laboratory and Usability testing
3.7.1 Internal validity threats
3.7.2 External validity threats
3.8 Designing Rationale of the questionnaires
3.8.1 Importance of Design Rationale after Literature Review
3.8.2 Design Rationale of the Post Test Questionnaire
4.0 Chapter 4: Development
4.1 Network Design
4.1.1 Required Resources
4.1.2 Topology Diagram
4.1.3 Addressing Table
4.1.4 Network Implementation
4.2 Active Monitoring
4.3 Passive Monitoring
5.0 Chapter 5: Evaluation
5.1 Network Elements that Need Monitoring
5.2 Planning the Test
5.3 Designing the Test Activities
5.4 Recruiting Participants
5.5 Preparing the test materials
5.6 Setting up the test environments
5.7 Conducting the test
5.8 Compiling the test results
5.9 Funding Considerations
5.10 Timetable
6.0 Chapter 6: Findings
6.1 Results From Questionnaire
6.2 Usability Evaluation
6.2.1 Effectiveness
6.2.2 Efficiency
6.2.2.1 Result of Satisfaction from Post – Test Questionnaire
7.0 Chapter 7: Conclusion and Future Work
8.0 Chapter 8: List Of References
APPENDICES
APPENDIX A: Information about interviewees
APPENDIX B: Nagios Evaluation Questionnaire
APPENDIX C: Ubuntu Installation
APPENDIX D: Nagios Installation on Ubuntu
APPENDIX E: Network Implementation and Nagios Configuration
Network Implementation
Nagios Configuration
1. Active Monitoring
1.1 Monitoring Routers
Monitoring HQ
Monitoring BRANCH
Monitoring ISP
1.2 Monitoring Switches
Monitoring DLS1
Monitoring ALS1
Monitoring ALS2
1.3 Passive Monitoring
1.4 Monitoring Network Services
Monitoring NTP Server
Monitoring Telnet
APPENDIX F: Information about interviewees in the Laboratory Experiment
APPENDIX G: Test Script
APPENDIX H: Task List
APPENDIX I: Note Form
APPENDIX K: Post – Questionnaire
APPENDIX L: Gantt Chart
APPENDIX M: Source Code
Nagios Scripts
Script of ALS1
Script of ALS2
Script of DLS1
Script of ISP_LOOPBACK1
Script of SERIAL0
Script of SERIAL1
Script of WinServer
Script of BRANCH
Script of HQ
Router / Switch Configurations
ALS1 Switch Configuration
ALS2 Switch Configuration
DLS1 Switch Configuration
BRANCH Router Configuration
ISP Router Configuration
HQ Router Configuration
1.0 Statement of the Problem and Research Aim
1.1 Introduction
Today’s complex network infrastructures are critical components for the
business success of an organization, whether it is local or multinational. Network
availability is a crucial element of a successful organization, and its loss can
lead an organization to business failure. Networks include hundreds or thousands
of critical devices required for the successful operation of a business. Therefore, the
availability of the hardware and software related to network functionality is essential.
Managing the state of hardware is a serious task because critical business services
depend on it. Clients and employees cannot perform transactions if the network becomes
unreachable, which reduces productivity and profit. Basic operations such as
printing or sending emails are not feasible without network support. Moreover,
incorrect configuration changes made by a junior administrator may have rippling
effects on the health and availability of the network infrastructure. Therefore, acting
proactively as a member of the IT (Information Technology) department to
verify the smooth operation of the network infrastructure is important for securing
customers' satisfaction.
Higher network availability is needed to
allow businesses to operate more fluently. This goal can be
achieved with active monitoring of networks, which aids the identification and
prevention of networking failures. Thus, the role of an IT manager is critical in
promoting actions such as provisioning of network services, backup and restoration of
device configurations, and automated event correlation, problem isolation and problem
resolution for greater network reliability. However, the causes of problems are
not always clear, and issues such as power outages or other external events cannot
be prevented. The IT manager’s goal is to gather, understand and act on information
such as performance statistics. In this way, they can reveal problems in the IT
infrastructure that would affect the availability of the network in the near future.
1.2 Statement of the Problem
Network management practices have changed through the years. Thus, new tools and
strategies are required in many organizations. IT departments have to evolve from
reactive to proactive network management. Modern business
requires changes in organizational design and a realignment of the IT department.
Centralized management of the network via monitoring tools encourages staff to
actively support networking technologies throughout the organization.
This section discusses four cases. The first describes the choices of network
management tools and the costs associated with selecting a monitoring tool. The second
involves the ways in which management tools help IT departments to address some of
the key challenges faced by network experts. The third refers to the changing
role of networking within a modern business and the resulting change in the
requirements that networking professionals have to fulfill in order to implement new
technologies and obtain new abilities. The final case discusses the concept of
service monitoring as the prerequisite for the selection of a monitoring tool.
In the last decades, communications technologies have undergone a
revolution. The fast emergence of multiple protocols (and applications) and the
development of equipment from multiple vendors increase the complexity of a
centralized management solution. The reason is the high level of heterogeneity in
the underlying equipment: equipment from
different vendors operates with different proprietary management protocols and
implements heterogeneous management data models. Under these circumstances,
employees of the IT department have to deal with the extra cost generated, because
network administrators must deploy multiple expensive management
platforms in order to manage the entire network. This situation will persist
unless the CEO of the IT department stops thinking that buying hardware from
different manufacturers will help to minimize the risk of dependency on one
manufacturer and reduce the cost of purchasing hardware from the
market. Consequently, network administrators have to use a different monitoring tool
for each type of network equipment in use. Even if several supervision tools for
proprietary management protocols can improve monitoring, an additional problem
is their functional limitations when new types of components are introduced in
the network. In practice, it is not rare to see a system expert trying to
find a solution to a technical problem that stems from a monitoring tool
implemented to work with a proprietary management protocol. [Kora, 2012,
p.1199]
The next topic relates to the infrastructure that requires monitoring. Today, even a
small organization that has operated with the same organizational structure and the
same number of users for many years can find it difficult to deal
with an infrastructure that is growing fast. The growing numbers and types of devices
in today’s business environment enhance the effectiveness and productivity of the
entire organization because work can be done across greater distances. For example, a
sales agent who meets a client outside the organization to close a deal
wants to be able to access his email account from his personal smartphone.
This modern practice, which contributes to business success, cannot be achieved without
network growth. Traditional, manual management functions are out of date
given the size of today’s infrastructures. Moreover, an expanded network is
not just a larger version of the network the company previously had, so the
infrastructure must be supported and managed according to the new requirements.
In addition, the likelihood of a network outage caused by human error, along with
network complexity, increases concerns about availability, reliability, performance
and security. In other words, the more the company expands in the number of
devices and the volume of data transferred, the more the demand on bandwidth
increases, and the more solutions are required to support network
management functions. [IBM, 2012]
IT managers want employees with additional skills in order to better align network
operations with business requirements. Changing operations skills, such as (1)
implementing mobile, UC, and TelePresence, (2) designing complex networks for
applications, (3) tracking threats, protecting data, and providing network access
control, (4) reporting on application and user SLAs, and (5) troubleshooting content and
performance issues, indicate a definite shift that is required in the networking team.
The original demand to configure routers or switches has expanded into a requirement
to configure advanced application-oriented network software. Nowadays, it is
essential to become much more proactive because of the requirements for tracking
and protecting data and controlling access. Hardware-oriented metrics such as
availability and uptime reports have to be extended with metrics related to
applications and user SLAs. An unavailable network link is an important matter
because the priority is that users and applications remain
operational. The networking team should be able to analyze and troubleshoot
any problem. This is a vital business requirement, and it means that new abilities
have to be obtained at a faster rate than previously. Network experts believe that
organizations are struggling with gaps in technology, in personnel abilities, and in
number of employees. A good network management tool is critical to closing the gap
between the existing and required abilities of networking experts. [Shiao, 2008]
Service monitoring is often confused with single-purpose custom monitoring
because it does not appear often in the literature. In its
simplest form, service monitoring can be as small as a Perl script, written during the
development and deployment of a wireless network, that monitors the network and its
associated services or checks that a connection can be established on a port. Even so,
it can perform tasks and present the results within the context of a complete
infrastructure using advanced features. Little extra effort is required to write a
variety of tests in a Perl script to monitor the availability and connectivity of a
service. A slightly more meaningful test would be to check a service response, for
example the status code returned by an FTP (File Transfer Protocol) server. In terms of
monitoring, the selection of a monitoring tool should depend on the services being
monitored and the related objectives. [Silver, 2009, p.9]
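As an illustration of a check that inspects a service response rather than mere connectivity, the following minimal sketch classifies the greeting of an FTP server. It is written in Python rather than Perl, and the helper names (classify_ftp_banner, fetch_banner) are hypothetical. The return values follow the usual Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); mapping reply 120 to WARNING is an assumption made for the example.

```python
import re
import socket

# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_ftp_banner(banner):
    """Map the first line of an FTP greeting to a (code, message) pair.

    Hypothetical helper for illustration; not part of Nagios itself.
    """
    match = re.match(r"^(\d{3})[ -]", banner or "")
    if match is None:
        return UNKNOWN, "FTP UNKNOWN - no status code in reply"
    status = int(match.group(1))
    if status == 220:  # 220: service ready for new user
        return OK, "FTP OK - server ready (220)"
    if status == 120:  # 120: service ready in nnn minutes (assumed WARNING)
        return WARNING, "FTP WARNING - server not ready yet (120)"
    return CRITICAL, "FTP CRITICAL - unexpected status %d" % status


def fetch_banner(host, port=21, timeout=5.0):
    """Open a TCP connection and return the server greeting as text."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return conn.recv(1024).decode("ascii", errors="replace")
```

A plugin built on this sketch would print the message and exit with the returned code, which is how Nagios learns the state of the service.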
1.3 Research Aim and objective
This thesis implements the Nagios network monitoring tool and evaluates it on
the basis of how fast it can perform network monitoring, bearing in mind
that Nagios is free of charge. The goal is to make suggestions that will act as
blueprints for improving the functionalities and usability of the system. The objectives
of the current research are:
• Show, by examining real-life cases, that Nagios is a suitable choice of network
monitoring tool for a small / medium enterprise.
• Design a lab environment using PCs, routers, switches and server virtualization for
usability testing.
• Investigate how well Nagios addresses relevant functionalities by conducting and
analyzing laboratory experiments. Participants in those lab tests will be employees
of IT departments with relevant tasks at work.
• Outline suggestions based on the analysis of all empirical data collected by
the current research.
1.4 Research Questions
The RQs (Research Questions) of this research are the following:
RQ1: What is the basic theory behind a network monitoring tool?
RQ2: What are the technical (functionalities) and non-technical criteria for choosing a
network monitoring tool, based on theoretical frameworks and industry standards?
RQ3: How effectively does Nagios perform the network monitoring functionalities
derived from theoretical frameworks and industry standards? Is Nagios suitable for
small / midsize organizations?
RQ4: How effectively does Nagios Core 3.x satisfy a small / midsize organization
in practice?
Answering RQ1 is important because research into a
monitoring tool must take into account which technologies
a Network Monitoring System uses.
Answering RQ2 defines what counts as standard monitoring functionality at the moment,
based on FCAPS, which will also be presented. This list will be useful to
people who need to implement a monitoring system and understand what is required
to implement it.
RQ3 asks how well Nagios can perform monitoring tasks so that it is
financially beneficial for an organization; it will be answered by the evaluation that
follows.
Analyzing the functional benefits of Nagios leads to suggestions that can
serve as a guideline for improving the functionalities and usability of a network
monitoring tool; these suggestions are the outcome of RQ4.
1.5 Organization of Study
This thesis is structured as follows. Chapter 2 gives a brief overview of network
management, including the significance of SNMP (Simple Network Management
Protocol). It covers what an open source monitoring tool should include and gives
an insight into its functionalities. Chapter 3 explains why specific methodologies are
preferred during the various stages of the thesis and covers
the Experimental Laboratory, Usability Testing and
Performance Measurement, along with related validity threats. The development
process is examined in Chapter 4. Chapter 5 outlines the evaluation of Nagios via a
set of predefined tasks. Chapter 6 analyzes the findings of this research, collected by
interviewing network experts using questionnaires in order to cross-check RQ2, and
analyzes the post-test questionnaires from the Experimental
Laboratory. Chapter 7 suggests actions for future improvements. The appendices outline
the technical requirements for installing Nagios and Ubuntu 12.04, the lab
environments and the associated topology.
2.0 Literature Review
2.1 The Definition of Network Management
The management and operation of modern networks and network services involve a
great many operational tasks, such as dealing with planned maintenance activities,
mass traffic events, cable cuts and hardware failures. Network management means
different things to different people. In general, according to Cottrell (1992), network
management is a service that involves “managing the delivery of an agreed upon
service level to the user.” The features of network management are described by Boutaba
(2002) as:
1. Fault management.
2. Configuration management.
3. Performance management.
4. Security management.
5. Accounting management.
Network management enables operators to handle the complexity and scale of the
above functions with the help of a network
monitoring tool. While each of these functions is distinct, they all occur in the same
network. This section presents a simplified view of the network operations framework;
a detailed view of FCAPS (Fault, Configuration, Accounting,
Performance and Security) is given later. This thesis deals primarily
with fault and performance management, as important aspects of network
management. [Jianguo D, 2010, p.10]
2.2 Network Management Architecture
The best-known aspect of a network management system is network
performance. A network management architecture consists of centralized network
management entities and management agents running on network devices and
computer systems. Using a management protocol, the network management entities
send polls in order to get information about network devices. Agents return the
requested information, ranging from bandwidth usage to CPU load, and send alerts when
they recognize problems. Using this information, management entities react by
executing a group of actions, including performance and error reporting to network
administrators. Agents are software modules
whose first duty is to compile information about the managed devices on which they
reside. This information is stored in a MIB and is then sent to the management
entities within the NMS (Network Management System). Widely accepted management
protocols are SNMP (Simple Network Management Protocol) and CMIP
(Common Management Information Protocol). Management proxies are entities that
provide management information on behalf of other entities. [Moceri, 2010,
p.2]
Figure 1: Typical network management architecture composed of a management
station and various agents.
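The polling relationship between a management entity and its agents can be sketched in a few lines. This is a toy in-memory model, not a real management protocol: the class names, the metric keys and the threshold values are assumptions made for illustration, and only the device names HQ and BRANCH are borrowed from the lab topology of this thesis.

```python
class Agent:
    """Software module on a managed device; answers polls from its MIB."""

    def __init__(self, name, mib):
        self.name = name
        self.mib = dict(mib)  # metric name -> current value

    def handle_poll(self, keys):
        # Return only the requested values that this agent knows about.
        return {key: self.mib[key] for key in keys if key in self.mib}


class Manager:
    """Centralized management entity that polls agents and reports problems."""

    def __init__(self, thresholds):
        self.thresholds = dict(thresholds)  # metric name -> maximum allowed

    def poll(self, agents):
        alerts = []
        for agent in agents:
            reply = agent.handle_poll(list(self.thresholds))
            for key, value in sorted(reply.items()):
                limit = self.thresholds[key]
                if value > limit:
                    alerts.append(
                        "%s: %s=%s exceeds %s" % (agent.name, key, value, limit)
                    )
        return alerts
```

In a real deployment the call to handle_poll would be a request sent over the network with a management protocol such as SNMP, and the alerts would be forwarded to the network administrators.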
2.3 Network Management Protocol
2.3.1 SNMP
SNMP is designed to let management information be exchanged between SNMP
agents and management stations on a TCP/IP internetwork. The protocol defines the
type of network management, the information storage databases and the structure of the
data in use. SNMP agents provide information in the form of SNMP objects.
SNMP objects describe the device's network configuration and operations, such as the
device's network interfaces, routing tables, IP (Internet Protocol) packets sent and
received, and IP packets lost. They are stored in the MIB (Management Information
Base) in a standard format defined for each object. Even though it is possible to run
SNMP over TCP, this is not the best practice for larger networks because of the large
number of connections. Thus, SNMP relies on UDP (User Datagram Protocol) as its
transport protocol. Together, SNMP and the MIB provide a standard way to view and alter
network management information on hardware from multiple vendors. Any
monitoring or management application that uses SNMP can access MIB data on a
specified device. [Mauro, 2001]
READ and READ/WRITE are the two basic access modes of the SNMP protocol. The
READ mode permits only reading the SNMP variables from a specified device, while the
READ/WRITE mode also enables setting certain variables on that device.
When an agent is configured in READ/WRITE mode, write access can be limited to a
specific OID in the MIB. In this case, WRITE
access to other OID values is forbidden. Thus, it is possible to set limitations
in the MIB. [Wikipedia, 2013]
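The two access modes, and the restriction of WRITE access to a single OID, can be modelled as follows. This is a minimal sketch of the idea, not the configuration syntax of any real SNMP agent; the class MibView and the exception AccessDenied are hypothetical names. The OIDs used in the usage note are the standard MIB-2 system objects sysName (1.3.6.1.2.1.1.5.0) and sysLocation (1.3.6.1.2.1.1.6.0).

```python
READ_ONLY = "READ"
READ_WRITE = "READ/WRITE"


class AccessDenied(Exception):
    """Raised when a SET is not permitted by the agent's configuration."""


class MibView:
    """Toy model of an agent's MIB with an access mode (hypothetical)."""

    def __init__(self, values, mode=READ_ONLY, writable_oids=()):
        self.values = dict(values)
        self.mode = mode
        self.writable_oids = set(writable_oids)

    def get(self, oid):
        # Reading is allowed in both modes.
        return self.values[oid]

    def set(self, oid, value):
        if self.mode != READ_WRITE:
            raise AccessDenied("agent is configured READ only")
        if oid not in self.writable_oids:
            raise AccessDenied("WRITE access to %s is forbidden" % oid)
        self.values[oid] = value
```

For example, a view created with mode READ_WRITE and writable_oids containing only the sysLocation OID accepts a new location string but rejects an attempt to change sysName.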
2.3.2 SNMP Messages Types
SNMP version 1, the initial version of the SNMP protocol, introduced five protocol
data units that are still supported in current versions of the protocol. The GET
REQUEST retrieves the value of a variable, or of a list of variables, from a network
device. The GETNEXT REQUEST does the same,
except that it requests the next value in the sequence after the object named in the
request. Agents send GET RESPONSE data units in reply to GET
REQUEST and GETNEXT REQUEST requests. The SET REQUEST data unit is sent by
management stations to set the value of a variable or list of variables on a specified
device. When agents want to notify management stations of events taking place,
they send TRAP messages asynchronously. SNMPv2 revises
SNMPv1 in the key areas of performance, security, confidentiality, and
manager-to-manager communications. It adds two message types. GETBULK performs
sequential requests more efficiently by permitting a management station to request
larger amounts of management data rather than repeating a sequence of GETNEXT requests.
The INFORM message type was originally defined as a version of TRAP that
is acknowledged by the management station. SNMPv3 primarily added
cryptographic security and remote configuration to the protocol, making it the
preferred version to use. [Matt, 2006]
Message           Usage
GET REQUEST       Used by the Manager to retrieve a specific piece of network information.
GETNEXT REQUEST   Used by the Manager to iteratively retrieve a sequence of information.
GET RESPONSE      Used by the agent to send information to the Manager in response to a request.
SET REQUEST       Used by the Manager to initialize or change the value of a management object.
TRAP              Used by the agent to report an alert or other asynchronous event to the Manager.
GETBULK           Introduced in SNMPv2 to retrieve a sequence of information as a faster alternative to GETNEXT.
INFORM            Introduced in SNMPv2; an acknowledged version of TRAP.
Figure 2: Characteristics of the SNMP protocol in use: v1, v2c and v3.
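The difference between walking a table with repeated GETNEXT requests and fetching it with one GETBULK request can be illustrated with a simulated agent that counts the requests it receives. This sketch is a simplification under stated assumptions: SimulatedAgent and walk_with_getnext are hypothetical names, real GETBULK also takes a non-repeaters field, and a real walk stops when it leaves the requested subtree rather than at the end of the MIB.

```python
def oid_key(oid):
    """Turn '1.3.6.1' into (1, 3, 6, 1) so that OIDs sort numerically."""
    return tuple(int(part) for part in oid.split("."))


class SimulatedAgent:
    """In-memory stand-in for an SNMP agent (hypothetical, no network I/O)."""

    def __init__(self, mib):
        self.rows = sorted(mib.items(), key=lambda row: oid_key(row[0]))
        self.requests = 0  # number of request PDUs received

    def get_next(self, oid):
        # GETNEXT: return the first object that follows the given OID.
        self.requests += 1
        for candidate, value in self.rows:
            if oid_key(candidate) > oid_key(oid):
                return candidate, value
        return None  # end of the MIB

    def get_bulk(self, oid, max_repetitions):
        # GETBULK (simplified): return up to max_repetitions successors.
        self.requests += 1
        following = [row for row in self.rows if oid_key(row[0]) > oid_key(oid)]
        return following[:max_repetitions]


def walk_with_getnext(agent, start):
    """Retrieve every object after start with one GETNEXT per object."""
    result, current = [], start
    while True:
        row = agent.get_next(current)
        if row is None:
            return result
        result.append(row)
        current = row[0]
```

With four objects below the starting OID, the GETNEXT walk costs five requests (four answers plus the one that detects the end), while a single GETBULK request with a sufficiently large max_repetitions returns the same rows.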
2.3.3 SNMP and UDP
10
Figure 3: A simplified SNMP architecture is given in above.
SNMP uses UDP as its transport protocol for passing data between managers and
agents because UDP avoids the overhead of TCP (Transmission Control Protocol).
UDP is connectionless: no end-to-end connection is established between agent and
NMS when datagrams (packets) are sent back and forth, and there is no
acknowledgment of lost datagrams at the protocol level. This makes UDP unreliable,
but it also keeps overhead low, which reduces the impact on the network's
performance. If the NMS does not receive a response within a timeout, it simply
assumes the packet was lost and retransmits the request. Sequencing is not required
because each request and each response travels as a single datagram. The number of
times the NMS retransmits packets is configurable. For requests, the unreliable
nature of UDP is therefore not a real problem, but the situation differs for traps:
if an agent sends a trap and the trap never arrives, the NMS has no way of knowing
that it was sent. Agents listen for requests from management stations on UDP port
161, and management stations receive TRAP messages from agents on UDP port 162.
[Kozierok, 2005]
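The timeout-and-retransmit behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the payload is a placeholder string rather than a real SNMP PDU, the "agent" is a toy loopback server, and the names udp_request, lossy_agent and demo are hypothetical helpers that do not come from any SNMP library.

```python
import socket
import threading


def udp_request(sock, payload, address, timeout=0.2, retries=3):
    """Send payload and wait for a reply, retransmitting after each timeout.

    Returns (reply, attempts). Raises TimeoutError when the first attempt
    and all `retries` retransmissions go unanswered.
    """
    sock.settimeout(timeout)
    for attempt in range(1, retries + 2):
        sock.sendto(payload, address)
        try:
            reply, _ = sock.recvfrom(4096)
            return reply, attempt
        except socket.timeout:
            continue  # assume the datagram was lost and send it again
    raise TimeoutError("no response after %d attempts" % (retries + 1))


def lossy_agent(sock, drop_first):
    """Toy agent: silently ignores the first `drop_first` requests, then answers one."""
    seen = 0
    while True:
        data, peer = sock.recvfrom(4096)
        seen += 1
        if seen > drop_first:
            sock.sendto(b"RESPONSE:" + data, peer)
            return


def demo(drop_first=1, timeout=0.2, retries=3):
    """Run one request against a toy agent on the loopback interface."""
    agent = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    agent.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    threading.Thread(target=lossy_agent, args=(agent, drop_first), daemon=True).start()
    manager = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        return udp_request(manager, b"GET sysUpTime", agent.getsockname(), timeout, retries)
    finally:
        manager.close()
```

Calling demo(drop_first=1) loses the first request and succeeds on the second attempt. A trap has no such recovery path, because the sender never waits for a reply.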
2.3.4 SNMP Management Information Base (MIB)
The MIB is a collection of the managed objects that make up the "management
information". Each agent has its own MIB. The NMS can read from or write to the MIB
of the managed objects. The MIB defines, in a standard format, a set of
characteristics associated with the managed objects, such as the OID (object
identifier), access right and data type of each object. The MIB defines data using
a tree structure. Each node of the tree corresponds to a managed object and can be
uniquely identified by a path starting from the root node. Each object in the MIB
can be uniquely identified by a string of numbers and a text name. This string of
numbers is the OID of the managed object [Ipswitch, 2001].
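As a small illustration of the tree structure, the sketch below stores two well-known objects from the standard MIB-II system group (sysDescr and sysUpTime) and translates a numeric OID into its text name by walking from the root. The MibNode class and the register and resolve helpers are hypothetical names chosen for this example, not part of any SNMP toolkit.

```python
class MibNode:
    """One node of the MIB tree: a text label plus numbered children."""

    def __init__(self, name):
        self.name = name
        self.children = {}  # sub-identifier (int) -> MibNode


def register(root, oid, names):
    """Insert a path into the tree; `names` holds one label per sub-identifier."""
    node = root
    for sub_id, name in zip(map(int, oid.split(".")), names):
        node = node.children.setdefault(sub_id, MibNode(name))
    return node


def resolve(root, oid):
    """Translate a numeric OID into its dotted text name, starting at the root."""
    node, labels = root, []
    for sub_id in map(int, oid.split(".")):
        node = node.children[sub_id]  # KeyError if the OID is not in this tree
        labels.append(node.name)
    return ".".join(labels)


root = MibNode("root")
SYSTEM = ["iso", "org", "dod", "internet", "mgmt", "mib-2", "system"]
register(root, "1.3.6.1.2.1.1.1", SYSTEM + ["sysDescr"])
register(root, "1.3.6.1.2.1.1.3", SYSTEM + ["sysUpTime"])
```

With this tree, resolve(root, "1.3.6.1.2.1.1.3") yields "iso.org.dod.internet.mgmt.mib-2.system.sysUpTime", which shows how the numeric path and the text name identify the same object.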
2.4 Functional Division of Network Management
The ISO has contributed a well-defined network management reference model for
network standardization. To make the major functions of network management systems
easier to understand, the OSI model breaks network management into five functional
divisions (Fault, Configuration, Accounting, Performance and Security), which are
sometimes referred to as FCAPS. These divisions are discussed in the next sections
based on Shields (2007, p.5-8) and Parker (2005, p.4):
2.4.1 Fault Management
Fault management covers trouble management, which handles fault detection and
fault recovery for services, and proactive maintenance, which provides
capabilities for self-healing. Trouble management triggers alarms for
network anomalies or failures and performs diagnostic tests to isolate faults in
hardware or a service. It also triggers service repair and takes the measures
needed to fix the diagnosed fault. Proactive maintenance performs routine
maintenance on near-fault conditions and fixes problems before service
troubles are reported to the NMS. The FCAPS model identifies twelve management
tasks as important for a good fault management system:
• Fault detection
• Fault correction
• Fault isolation
• Network recovery
• Alarm handling
• Alarm filtering
• Alarm generation
• Clear correlation
• Diagnostic test
• Error logging
• Error handling
• Error statistics
2.4.2 Configuration Management
Configuration management is concerned with resource provisioning and service
provisioning. It identifies, records and maintains the network configuration in
order to be able to update configuration parameters and to ensure normal network
operations. Configuration management, which deals with three kinds of networks
(logical, service, and custom), involves the following management tasks:
• Resource initialization
• Network provisioning
• Auto-discovery
• Backup and restore
• Resource shut down
• Change management
• Pre-provisioning
• Inventory/asset management
• Copy configuration
• Remote configuration
• Automated software distribution
• Job initiation, tracking, and execution
2.4.3 Accounting Management
Accounting management processes and manipulates services related to user
management and administration. Moreover, accounting management creates and
verifies billing for usage of network resources and services. The list below
summarizes the eight tasks that enable accounting management for monitoring tools:
• Track service/resource use
• Cost for services
• Accounting limit
• Usage quotas
• Audits
• Fraud reporting
• Combine costs from multiple resources
• Support for different accounting modes
2.4.4 Performance Management
Performance management deals with processes that ensure the reliability and quality
of network performance based on their capability to meet user service-level goals.
It includes evaluation of vital performance entities such as network throughput,
resource utilization, delays, congestion level and packet loss, and reporting when
the quality of network resources falls below a certain level. Performance
management systems are responsible for the following issues:
• Utilization and error rates
• Performance data collection
• Consistent performance level
• Performance data analysis
• Problem reporting
• Capacity planning
• Performance report generation
• Maintaining and examining historical logs
2.4.5 Security Management
Security management protects network resources, services and data against
security threats such as accidental abuse, unauthorized access, and communication
loss. In addition, it ensures user privacy and control over user access privileges
that derive from a range of access modes such as operations systems, service
provider groups and customers. The following activities are crucial
for an efficient security management system:
• Selective resource access
• Access logs
• Data privacy
• User access rights checking
• Security audit trail log
• Security alarm/event reporting
• Take care of security breaches and attempts
• Security-related information distributions
2.5 Choosing systems management tools
The non-technical factors that affect a small (or medium) sized company's choice
of IT monitoring tool are the following [Curry, 2008,
p.7] [Drogseth, 2006, p.4, 6] [Hale, 2012, p.11-12]:
• Ease of use – not based on usability of demos, but based on usability of
implementation in a real world scenario.
• Skills mandatory to implement the specifications versus skills available.
• Specifications for and availability of user training.
• Cost such as licenses, hardware, evaluation time, maintenance and training.
• Support – from supplier and/or communities.
• Scalability.
• Deployability – management server(s) ease of installation and agent deployment.
• Reliability.
• Accountability – the ability to sue / charge the dealer if expectations are not
met.
A prioritized list of basic requirements that meet Burgess’s (2005, p.3) expectations is
helpful, since a successful implementation of a network monitoring tool combines
those specifications.
• Open Source software
• Very energetic forum / mail lists
• Established history of community support and regular fixes and releases
• Centralized, open database
• Both Graphical User Interface (GUI) and Command Line Interface (CLI)
• Easy deployment of agents
• Scalability to several hundred devices
• Adequate documentation
2.6 Network monitoring tasks
After having analyzed the Network Management Functions of the FCAPS framework,
the monitoring functionalities for each network management function are defined.
To support the evaluation of Nagios as a monitoring tool for small and medium
organizations, the monitoring functions derived from the findings of Section 2.4
(“Functional Division of Network Management”), that is, the tasks an NMS should
perform according to the literature review and the criteria set by the network
industry, are listed below. These findings will be used as benchmarks when
evaluating Nagios as a monitoring tool. Fault and Performance functionalities and
their important sub-functionalities are presented in detail along with their
relevant key metrics. Configuration, Accounting and Security functionalities are
mentioned only briefly: they influence the selection of a monitoring tool, but
they are outside the scope of this thesis.
Table 1: Fault monitoring tasks and their key metrics [MindShare Services, 2007]
Fault Monitoring tasks:
• Fault detection
• Fault correction
• Fault isolation (Network Mapping / graphs)
• Network recovery
• Alarm handling
• Alarm filtering
• Alarm generation
• Clear correlation
• Diagnostic test
• Error logging
• Error handling
• Error statistics
Key metrics:
• Mean Time Between Failures
• Mean Time To Restore
• Network Uptime
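To make the three key metrics of Table 1 concrete, the sketch below computes them from an outage log. The formulas are the commonly used definitions (uptime divided by number of failures, downtime divided by number of failures, uptime divided by the observation period) and are an assumption here, since the thesis does not spell them out. The function name fault_metrics and the sample figures are hypothetical.

```python
def fault_metrics(period_hours, outage_hours):
    """Return (MTBF, MTTR, uptime fraction) for one observation period.

    period_hours  -- length of the observation window in hours
    outage_hours  -- duration in hours of each outage inside that window
    """
    if not outage_hours:
        raise ValueError("at least one outage is needed to compute MTBF and MTTR")
    failures = len(outage_hours)
    downtime = sum(outage_hours)
    uptime = period_hours - downtime
    mtbf = uptime / failures        # Mean Time Between Failures
    mttr = downtime / failures      # Mean Time To Restore
    availability = uptime / period_hours  # Network Uptime as a fraction
    return mtbf, mttr, availability


# Example: a 30-day month (720 h) with three outages of 2 h, 1 h and 3 h.
mtbf, mttr, availability = fault_metrics(720, [2.0, 1.0, 3.0])
```

For these sample figures the device ran 714 of 720 hours, giving an MTBF of 238 hours, an MTTR of 2 hours and an uptime of roughly 99.17%.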
Table 1.1: Fault detection task and its sub-tasks
Task: Fault detection
Sub-tasks:
• Passive fault management
• Active fault management
Table 1.2: Alarm / Event Generation task and its sub-tasks
Task: Alarm / Event Generation
Sub-tasks:
• Sending an email message
• Sending an SMS message to a cell phone or pager
• Playing a sound or recorded message on the management workstation
• Logging the alert to the Network Event log
• Logging to a text file
• Sending a Syslog message
• Sending an SNMP trap
• Logging the alert to a Microsoft Windows event log
• Sending a Microsoft Windows Net-Message
• Executing an external program
• Executing a script
• Speaking an alert message using a text-to-speech engine
Table 1.3: Fault correction task and its sub-tasks
Task: Fault correction
Sub-tasks:
• Device/service restart
• Reconfiguration
• Security action
Table 2: Configuration Monitoring tasks and their key metrics [The Configuration
Management Planning Group, 2013]
Configuration Monitoring tasks:
• Resource initialization
• Network provisioning
• Auto-discovery
• Backup and restore
• Resource shut down
• Change management
• Pre-provisioning
• Inventory/asset management
• Copy configuration
• Remote configuration
• Automated software distribution
• Job initiation, tracking, and execution
Key metrics:
• MTTR Reduction
• Loss of Business Revenue
• Simple count of the number of times a configuration does not match held information
• The amount of elapsed time that passes from the approval of a change to the actual
implementation of that change
• The number of components that are identified as “unauthorized”
Table 3: Accounting Monitoring tasks and their key metrics [Creanord, 2013]
Accounting Monitoring tasks:
• Track service/resource use
• Cost for services
• Accounting limit
• Usage quotas
• Audits
• Fraud reporting
• Combined costs from multiple resources
• Support for different accounting modes
Key metrics:
• SLA-based resource allocation
• Trend analysis
• Resource utilization
• Network inventory information for costing
• Capacity planning
Table 4: Performance Monitoring tasks and their key metrics [Jain, 1991, p.40] [Benoit, 2007,
p. 9-11]
Performance Monitoring tasks:
• Utilization and error rates
• Performance data collection
• Consistent performance level
• Performance data analysis
• Problem reporting
• Capacity planning
• Performance report generation
• Maintaining and examining historical logs
Key metrics:
• Bandwidth Utilization
• Network Latency
• Interface Errors and Discards
• Network Hardware Resource Utilization (CPU load, memory usage, and buffer usage)
• Availability
Table 4.1: Performance data collection task and its sub-tasks [Shields, 2007, p.26]
Task: Performance data collection
Sub-tasks:
• Input/output bits/second
• Current/average response time
• Peak traffic load
• Interface errors/discards
• Percent packet loss
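The bandwidth utilization metric can be derived from two samples of an interface octet counter. The sketch below assumes the standard IF-MIB objects ifInOctets (a 32-bit counter that wraps around) and ifSpeed (in bits per second). The sample values are invented for illustration, and the helper names counter_delta and utilization_percent are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit SNMP counter wraps back to 0 after this many counts


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two counter samples, allowing for a single wrap-around."""
    return (current - previous) % modulus


def utilization_percent(prev_octets, curr_octets, interval_s, if_speed_bps):
    """Inbound utilization of a link over one polling interval, in percent."""
    bits = counter_delta(prev_octets, curr_octets) * 8  # octets -> bits
    return 100.0 * bits / (interval_s * if_speed_bps)


# Example: two ifInOctets samples taken 300 s apart on a 10 Mbit/s interface.
usage = utilization_percent(1_000_000, 38_500_000, 300, 10_000_000)
```

In this example 37,500,000 octets arrive in 300 seconds on a 10 Mbit/s link, so the interface is 10% utilized. The modulo in counter_delta keeps the result correct when the counter wraps once between polls, provided the polling interval is short enough that it cannot wrap twice.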
Table 5: Security Monitoring tasks and their key metrics [PCI Security Standards Council, 2010,
p.8]
Security Monitoring tasks:
• Selective resource access
• Access logs
• Data privacy
• User access rights checking
• Security audit trail log
• Security alarm/event reporting
• Take care of security breaches and attempts
• Security-related information distributions
Key metrics:
• Password policies
• Acceptable use policies
• Lockdown and access policies
• Mobile device access and lockdown policies
• Business data encryption policies
• Antivirus, anti-spam, anti-malware, and anti-spyware policies
• Security policy violation adjudication procedures
2.7 Comparison of Nagios against industry Standards
In the following tables (Table 6 to Table 10) the major monitoring tasks of
FCAPS are compared against the functionalities of Nagios Core 3.x [Silver, 2009,
p.12] [Gaur, 2003, p.6-8] [Curry, 2008, p.143-146] as a result of the literature
study. In addition, Table 11 shows that Nagios fulfils some other non-technical
requirements as posed in section 2.5 of this chapter [Golden, 2007]
[Rusalan, 2010, p.7-8] [Nagios, 2013]. The conclusions from these two comparisons
suggest that Nagios is an ideal solution for small to medium organizations in
terms of manpower.
Table 6: Nagios compliance with the Fault Monitoring standard of FCAPS.
Tasks | Nagios | Comments
Fault detection | Yes (alarms, warning...) | Supports NRPE / NSClient. No SNMP TRAP handling. SNMP support V1, 2 & 3
Fault correction | Yes | Fast Event handlers allow automatic restart of failed application and services
Fault isolation (Root cause Analysis, Network Mapping / graphs) | | UNREACHABLE status for devices behind network single point of failure. Also, host / service dependencies.
Network recovery | Yes | Via plugin (Nolio plug-in)
Alarm handling | Yes |
Alarm filtering | Yes | Escalation capabilities ensure alert notifications reach the right people
Alarm generation | Yes | email / pager notifications
Clear correlation | Yes |
Diagnostic test | Yes |
Error logging | Yes |
Error handling | Yes |
Error statistics | Yes | PNP4Nagios plug-in
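The UNREACHABLE status mentioned under fault isolation in Table 6 can be illustrated with a simplified model of parent/child reachability: a host that does not answer is DOWN when at least one of its parents is UP, and UNREACHABLE when every parent has failed too. The sketch below is an assumption-laden approximation of that rule, not the actual Nagios implementation, and the host names are hypothetical.

```python
def host_state(host, parents, responding):
    """Classify a host as UP, DOWN or UNREACHABLE.

    parents    -- dict mapping each host to the list of its parent hosts
    responding -- set of hosts that currently answer their checks
    """
    if host in responding:
        return "UP"
    host_parents = parents.get(host, [])
    if not host_parents:
        return "DOWN"  # directly attached to the monitoring server
    if any(host_state(p, parents, responding) == "UP" for p in host_parents):
        return "DOWN"  # the path to the host works, so the host itself failed
    return "UNREACHABLE"  # everything on the path to the host has failed


# Hypothetical topology: monitoring server -> switch1 -> router1 -> web1, db1
PARENTS = {"switch1": [], "router1": ["switch1"], "web1": ["router1"], "db1": ["router1"]}
```

If router1 fails while switch1 still answers, router1 is reported as DOWN and web1 and db1 as UNREACHABLE, which points the administrator at the single point of failure rather than at every device behind it.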
Table 7: Nagios connection with Configuration Monitoring standard of FCAPS.
Tasks | Nagios | Comments
Resource initialization | Yes |
Network provisioning | Yes |
Auto-discovery | Yes | Node discovery / Interface Discovery / Service (port) Discovery / Application discovery
Backup and restore | Yes | Stores configuration in flat files with simple format in a SQL database
Resource shut down | Yes |
Change management | Yes | Using Perl or PHP
Pre-provisioning | Yes |
Inventory/asset management | Yes | Via plug-in
Copy configuration | Yes |
Remote configuration | Yes | NRPE 2.15
Automated software distribution | No |
Job initiation, tracking, and execution | Yes |
Table 8: Nagios connection with Accounting Monitoring standard of FCAPS
Tasks | Nagios | Comments
Track service/resource use | Yes | Trending and Capacity planning add-ons ensure you are aware of aging hardware
Cost for services | Yes | Availability reports ensure SLAs are being met
Accounting limit | |
Usage quotas | Yes | Keeps a history of alerts and downtimes for all hosts and services checks by default
Audits | Yes |
Fraud reporting | Yes |
Combine costs from multiple resources | Yes |
Support for different accounting modes | Yes |
Table 9: Functionalities of the Performance Monitoring standard of FCAPS related to
the functionalities performed by Nagios in the same area.
Tasks | Nagios | Comments
Utilization and error rates | Yes | Monitoring of network services and host resources
Performance data collection | Yes | PNP4Nagios plug-in
Consistent performance level | Yes | PNP4Nagios plug-in
Performance data analysis | Yes | PNP4Nagios plug-in
Problem reporting | Yes | PNP4Nagios plug-in
Capacity planning | Yes | PNP4Nagios plug-in
Performance report generation | Yes | PNP4Nagios plug-in
Maintaining and examining historical logs | Yes | Historical reports provide record of alerts, notifications, outages, and alert reports
Table 10: Connection between the Security Monitoring standard of FCAPS and the related
functionalities of Nagios.
Tasks | Nagios | Comments
Selective resource access | Yes |
Access logs | Yes |
Data privacy | No |
User access rights checking | Yes | An administrator can prevent access to certain parts on a per-user or per-role basis
Security audit trail log | Yes |
Security alarm/event reporting | Yes |
Take care of security breaches and attempts | Yes |
Security-related information distributions | Yes |
Table 11: Non-technical requirements of a monitoring tool posed by industry
fulfilled by Nagios
Industry defined standards | Nagios
Open Source free software | Yes
Very active forum / mail lists | Yes
Established history of community support and regular fixes and releases | Yes
Centralized, open database | Yes
Easy deployment of agents | Yes
Scalability to several hundred devices | Yes
Adequate documentation | Yes
Ease of use | Yes
Skills necessary to implement the requirements versus skills available | No
Requirements for and availability of user training | No
Cost | Minimum
Support (from supplier and/or communities) | Yes
Scalability | Yes
Deployability (management server(s) ease of installation and agent deployment) | Yes
Reliability | Yes
Accountability (the ability to sue / charge the vendor if things go wrong) | No (only in Nagios XI)
Both Graphical User Interface (GUI) and Command Line Interface (CLI) | No
2.8 The selection of Nagios
Although the functionalities of Nagios Core 3.x listed in the literature review can
be applied in large companies, they are difficult to apply in a smaller company.
The network management requirements and expectations of a small organization
differ from those of a large one because of the limited technical skills of the
company’s staff. Using monitoring tools that are financially affordable, easy to
install and use, and able to monitor all their resources is a priority for any
company. [Zoho Corp, 2010]
Reid (2008), Ayadi (2013) and Curry (2008, p.148) argue that Nagios is the best
monitoring system for any small / medium size network. They claim that Nagios
is better than other monitoring tools because:
1. It has very low system requirements.
2. It has many plugins to use.
3. It supports SNMP, keeping monitoring simple.
4. It can be installed and run in 15 minutes with a basic configuration.
5. It has good built-in documentation.
6. It supports more network devices in the free version.
2.9 Validating the Literature review outcome
To answer RQ2, it is important to define the functionalities which should be
performed by an automated NMS. The outcome of the literature review (including RQ1
and RQ3 as well) should be validated with the use of questionnaires completed by
professionals of the field. Moreover, the level of consistency between
Nagios’s functionalities and the FCAPS framework should be defined. Analysis of
the data collected with the help of the questionnaires will answer whether Nagios
is the best available monitoring tool for a small / medium organization. The
questionnaire is presented in Appendix B and the list of the participants is
presented in Appendix A. More information about the research methodology is
included in Chapter 3.
3.0 Methodology
3.1 Overview
There are three kinds of research methodologies in software engineering: (1)
Qualitative methodology, which seeks to extract and analyze the required
information from books, papers, observation, interviews and web sources in order to
justify or improve a theory. (2) Quantitative methodology, which collects numerical
data and examines dependency relationships among variables with the use of
statistical methods. (3) Mixed methodology, which includes both types of research
methodology (qualitative and quantitative) in a single piece of research.
[Bazeley, 2002, p.2]
The selection of the appropriate research methodology is important for the success
of a research project. In general, a combination of two or three data sources may be
most effective in achieving a particular research objective. To answer the research
questions of this Thesis a mixed research methodology is adopted. More specifically,
a triangulation approach is selected for cross-validating the results obtained by
the research methods. Quantitative and qualitative data are collected
concurrently but they are analyzed and interpreted separately. Triangulation gives
the researcher the opportunity to mix quantitative and qualitative research
approaches within a stage of the research process. [Conrad C. and Serlin R, 2010,
p.155]
The qualitative method was used to answer RQ1, RQ2 and RQ3, based on the
findings of the literature review and on questionnaires completed by professionals
in managing network infrastructure, in order to verify the results obtained.
Moreover, experiment, the most common quantitative method, was used to check
results against predefined metrics (benchmarks). The experiment approach was used
to answer RQ4. A post-test questionnaire will be completed by the six participants
to verify the results after the experiment. The technique used during the
experiment will be the TBFG (Task-Based Focus Group) technique, in which a set of
tasks (scenarios) is given to the participants for implementation, followed by a
discussion afterwards. The drawback of TBFG in comparison with Group usability,
as Downey (2007, p.141) mentions, is minimized in this Thesis by using
professionals with extensive experience in Network Management. Thus,
empirical data is gathered without the need for many observers. Qualitative analysis
of the results will be performed, with the data compared and displayed in Microsoft
Excel. Table 12 below displays the methodology used to answer each research question:
Table 12: Research questions with methodology employed
Research Question | Method(s)
RQ1, RQ2, RQ3 | Literature review + Interviews with professionals in network management domain. Quantitative survey.
RQ4 | Performance measurement + usability testing (quantitative analysis) based on empirical data collected from the experiment and the post-test questionnaire.
3.2 Experimental Laboratory
This method is selected because controlled laboratory experiments give researchers
the advantage of control. One of the three major purposes that laboratory experiments
serve is to test and refine existing theory. Furthermore, by using experiments we can
bridge the gap between theory and real business problems. The art of designing good
experiments lies in creating simple environments that capture the essence of the real
problem and that can be interpreted with the support of the data obtained. A good
experiment allows the researcher to distinguish clearly among possible explanations
while abstracting away all unnecessary details. The most important factor that makes
experimental work rigorous is theoretical guidance. To interpret the results of an
experiment, researchers need to be able to compare the data with theoretical metrics
(benchmarks). Thus, the first step in doing experimental work is to start with a
theory, such as the research questions of this thesis. [Katok, 2011, p.1-3]
3.3 Usability testing
A system may have excellent quality of use for some people and poor quality of use
for others. Many approaches to usability focus specifically on problems that users
face with a graphical interface. Although it is important to eliminate
interface problems, they can be a misleading indicator of overall usability.
Usability depends on the specific tasks people want to do when they use an
application. Most users in usability testing face several trivial problems rather
than a single fatal problem which causes a task to fail. The objectives of
usability testing vary considerably, depending on what is tested and why, so
easy-to-use widgets alone may not give the application an acceptable level of
usability. In order to get reliable results from usability testing, the design of
a test should include and evaluate wider usability requirements. Therefore,
usability may relate to the safe and efficient performance of specific critical
tasks by operators on the system. [Macleod, 1994]
The main purpose of a summative test of a complete product with representative users
and tasks designed is to evaluate the usability, via defined metrics, rather than
diagnose and correct specific design problems. The usability requirements should be
task-based and tied directly to product requirements in order to implement a usability
benchmark. [Usability Professionals Association, 2010]
Testing should include several measures (metrics), which can be grouped into four
categories as suggested by Lewis (2006, p.7):
• Goal achievement indicators (such as success rate and accuracy)
• Work rate indicators (such as speed and efficiency)
• Operability indicators (such as error rate and function usage)
• Knowledge acquisition indicators (such as learnability and learning rate)
3.4 Performance measurement
Performance measurement is the basis of the usability engineering life-cycle for
assessing whether goals have been met or not. In traditional research on human
factors studies, measurements take place by having a group of users performing a
predefined set of tasks:
Figure 4: Performance methodology flow
The objectives of usability evaluation are broken down into two components as
presented in Figure 4. Next, their relative importance is evaluated based on goals
deriving from the research questions. Once the components of the goal have been
decided, it is necessary to quantify them by measuring the average time it takes a
user to complete a specified set of tasks (scenarios). The selected tasks are
representative of users’ normal tasks in a working environment. This technique
defines the interaction between participants and the application interface during
the laboratory experiment, which affects the quantitative performance data.
Performance evaluation will obtain quantitative data from participants by measuring
the time required for each task with a stopwatch. The measured time will be
reported by participants in the post-test questionnaire so that the data is
collected accurately and without unexpected interference. [Nielsen, 1993, p.193]
Applicable stage: test and deployment.
Personnel needed for the evaluation:
Usability experts: 2
Software developers: 0
Users: 6
Usability issues covered:
Effectiveness: Yes
Efficiency: Yes
Satisfaction: Yes
Can be conducted remotely: No
Can obtain quantitative data: Yes
3.5 Questionnaire
The questionnaire is the preferred method for collecting information about the three
research questions under investigation in this thesis. Close-ended questionnaires
are easy to conduct, code and analyze. They permit comparisons and
quantification, and are more likely to measure degrees of difference with nominal,
ordinal, interval and ratio levels while avoiding irrelevant responses.
The basic principle is that the two questionnaires have to contain as many questions
as necessary and as few as possible, so they should be designed and formatted by
researchers whose main concern is length. The two questionnaires should be written
in such a way that test users cannot be identified, and the test results should be
kept private. An extensive understanding of the possible range of participant
responses is required due to the large amount of data that is going to be
processed. To achieve reliable and valid outcomes, each question must be checked,
edited and coded before being included in the questionnaire, to ensure that each
participant and test evaluator can decipher its meaning easily and accurately. To
achieve reliability and validity, questionnaires should be short and simple.
The questionnaire design should be piloted to test whether any major defects exist.
The pilot phase is used to verify that the post-test questionnaire will provide
useful information. Concepts such as “Strongly disagree”, “Disagree”, “Neither
agree nor disagree”, “Agree”, and “Strongly Agree” require training and feedback
to be understood. Consequently, the test monitor (Moderator) should explain to
participants the meaning of the rating judgments of the post-test questionnaire in
the pilot test. The pilot test will detect unintelligible questions that produce
unquantifiable responses and unwanted outcomes before the main study begins. The
purpose of the experiment is to measure the performance of experienced users by
doing a laboratory type of study. Moreover, the ISO 9241 standard, part 11,
defines usability in terms of effectiveness, efficiency, and satisfaction. The
post-test questionnaire is intended to measure the usability of a software
application with metrics and to collect extra data on user satisfaction. Each
participant will be asked to record the time required to complete a task in the
post-test questionnaire. If an error occurs, the test monitor (Moderator)
will ask users to repeat the task immediately. Then, users will rate the difficulty
of each task using the rating types mentioned above. Time limitations influence the
performance measurement of each task, so the Likert scale will be adopted in the
post-test questionnaire, since it is easy for users to complete. It is a common
practice to establish a baseline for each question in order to measure the success
of Nagios in the evaluation phase. Baseline values will be mentioned in Chapter 5.
[Dumas, 2010]
Table 13: Advantages and disadvantages of the face-to-face mode of questionnaire
delivery, as noted by Bird (2009, p.1313)
Advantages | Disadvantages
Complex questions can be asked. | Costly.
Can motivate participants. | Time consuming.
Longer verbal responses compared to written. | Spatially restricted.
Questions can be clarified. | Answers may be filtered or censored.
Question sequence controlled. | Interviewer’s presence may affect responses.
Vague responses can be probed. |
Visual prompts can be used. |
Long questionnaires sustained. |
High response rates. |
3.6 Validity threats related to questionnaires
Questionnaires have both strengths and weaknesses. The questionnaire is regarded as
the most objective research tool because it can provide generalizable results.
However, a large sample of questionnaire data can generate problems due to factors
such as faulty or biased questionnaire design, sampling errors and non-response
errors. Moreover, respondent unreliability, ignorance and misunderstanding, errors
in coding, and faulty interpretation of results may cause additional problems.
[Harris, 2010, p.1-2]
To improve the accuracy of testing, it is important to pay attention to the issues of
reliability and validity. Reliability is the question of whether one would get the same
result if the test were to be repeated. This implies that large individual differences
between test users have an influence on the results. Validity concerns whether the
result actually reflects the usability issues one wants to test, taking into
consideration factors such as unsuitable users, unsuitable time constraints and
social influences exerted on users by the tester. Whereas reliability can be
addressed with statistical tests, a high level of validity requires measures of
real products in real use outside the laboratory evaluation. The simplest form of
reliability test is the test-retest procedure, in which the same unit is measured
at two different points in time and the results are then correlated. More robust
measures focus on the extent to which all individual items correlate with each
other. There are many accepted ways to measure internal consistency. The most
widely used method is Cronbach’s alpha, which evaluates the homogeneity of the
individual items. Validity cannot be assessed directly because there is no
knowledge of the true values of the construct. [Larsen, 2008, p.1-2]
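For illustration, Cronbach’s alpha can be computed from questionnaire scores with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below uses only the Python standard library. The function name cronbach_alpha and the sample responses are hypothetical and are not data from this study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    items = list(zip(*responses))                       # one column of scores per item
    item_variance_sum = sum(variance(column) for column in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical scores: four respondents answering two Likert-type items.
sample = [[1, 2], [2, 1], [3, 4], [4, 3]]
alpha = cronbach_alpha(sample)
```

For the sample above the item variances sum to 10/3 and the variance of the total scores is 16/3, so alpha is 0.75. When every respondent gives the same score to all items, the items are perfectly consistent and alpha equals 1.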
A questionnaire can never really be fully “validated”, which means that a
questionnaire can have one kind of validity but not another. It can only be validated
for a given population, under given conditions, and so forth. In this thesis, one
way to test the validity of the questionnaire is to correlate its outcomes with the
outcomes of the laboratory experiment and with the results of the literature review.
There are numerous ways to specify validity, some of which were given by Howard
(2008, p.1) and are noted below:
• Reliability
• Validity
• Internal validity
• External validity
• Sensitivity
• Specificity
• Statistical validity
• Longitudinal validity
• Linguistic validity
• Discriminant validity
• Construct validity
3.7 Validity threats related to Experimental Laboratory and Usability testing
Laboratory experiments are used to address a wide range of research questions.
However, there are various concerns about whether laboratory findings are
“generalizable” or “externally valid” with respect to real markets. There is a
debate on whether lab studies can be changed to reflect better an external
environment of interest. Some definitions of external validity demand that the
qualitative relationship between two variables holds across similar environments.
While there may be a dispute on whether the quantitative results of an experiment
can be expected to be externally valid, it cannot be guaranteed that any of the
qualitative results will present external validity.
[Kessler, 2011]
Internal and external threats to an experimental laboratory are mentioned below
[Heffner Media Group, 2003]:
3.7.1 Internal validity threats
Internal validity refers to the extent to which a study eliminates confounding variables
within the study itself. There are eight major threats to internal validity related to
this thesis, which are presented below.
History: History refers to environmental events that happen to participants outside of
the research study and may affect or alter participants’ performance. A special
announcement to students of different fields at New York College, on a specific day
and at a specific time, may have had an effect on the results obtained via the
laboratory experiment.
Maturation: Maturation refers to the effect of measuring something over a repeated
number of trials during the experiment, which might make the participants feel bored,
tired, disinterested, fatigued or less motivated than they were at the beginning of the
testing. A way of overcoming this problem is to design a short, meaningful experiment
and post-test questionnaire.
Testing: This is a threat if a group of participants evaluates one product or two groups
evaluate two products. It arises when the participants perform both the pilot test and
the actual test, because participants tend to perform better at any task the more they
are exposed to that task. Changing the tasks between the pilot test and the actual test
minimized this threat in this thesis.
Statistical Regression: This refers to the tendency of subjects to move toward the mean
on subsequent testing even if no extra training is given. This implies that average
performance will be higher in a production environment than in an experiment.
Instrumentation: The test evaluator should be careful in measuring the performance of
all participants during the experiment and in validating the selected data from the
post-test questionnaire. Ignoring the performance and data given by participants will
lead to false results.
Selection: The sampling technique selected affects how representative the sample is,
and therefore whether the researcher can make statistical generalizations to a wider
population. The population of the lab experiment is selected based on characteristics
of certain types of users, such as career experience, Cisco certification, SolarWinds
certification, and post-graduate degrees.
Experimenter Bias: During testing, the test monitor (Moderator) lets each participant
discover the solutions to the given scenario on his or her own, without any support.
In this way, the results obtained are more valid and accurate. Moreover, it also enhances
users’ confidence and satisfaction, since they solve the given scenarios on their own.
The test monitor (Moderator, Dr. Pandithas) has extensive knowledge of the
application, the user interface and the test method as well. However, Team Leaders
may sometimes be biased toward the desired results. Using a test evaluator who is
unaware of the anticipated results can reduce the impact of this bias. This thesis
uses its author as test evaluator, who defines the cause – effect relationships
among variables.
Mortality: Mortality refers to the situation where some participants drop out of an
experimental group. Imagine a case in which participants with unique characteristics
drop out of the experiment due to illness and only poorly motivated students remain in
the team. As a result, one team will perform better than the other, and this situation
could affect the outcome of the experiment. Because Nagios is tested against
predefined standards and not against other products, this threat is not applicable to
our experiment.
3.7.2 External Validity Threats
External validity refers to the conditions of a study which can lead a researcher to
incorrect generalizations. The study is performed on a sample of the population which
is not exactly representative of the actual population of users. Consequently, in order
to avoid this threat, the results of the experiment were not generalized, since they
were not the outcome of industrial practices relevant to the tested software.
Demand Characteristics: Making sure that participants do not know the real purpose of
the questions minimizes the possibility that they will try to guess what the
researcher wants as an outcome and respond accordingly rather than answering
truthfully.
Hawthorne Effects: The presence of the researcher during the experiment may change the
actual performance of participants during the testing. In order to minimize this
effect, participants will be told that the system is being tested and not themselves.
Order Effects (or Carryover Effects): Order effects refer to the situation in which some
participants may have "learned" what the test tasks will be from the pre-test. They
would then no longer be representative of the population that has not been pre-tested.
Such a scenario makes the experiment useless, unless the test tasks in the pilot test
and the main test are different, which will be done in this experiment.
Treatment Interaction Effects: If the subjects are exposed to more than one
experiment / training on a network monitoring tool, then the findings about Nagios
performance and usability will be affected by the previous experience. Since
participants will probably have had no experience with any monitoring tool other than
Nagios in their careers, the share of participants familiar with other tools is
minimal.
3.8 Designing Rationale of the questionnaires
3.8.1 Importance of Design Rationale after Literature Review
The questionnaire for experts collects data to crosscheck the findings from
the literature about Nagios’ functional and usability characteristics. Experts’
experience of this monitoring tool, whether good or bad, will help us to decide if
Nagios meets the requirements of a small / medium organization. The objective of each
question within the two broad characteristics mentioned is to provide information about
significant aspects such as:
QUESTION 2: This question intends to capture expectations of event management &
control related to alarms in Nagios, in order to investigate the available options such
as services’ status and nodes' status.
QUESTION 3: The respondents should provide meaningful answers about the
capability of setting and managing alerts in Nagios in order to test its reporting
capability.
QUESTION 4: The respondents are asked whether Nagios kept them informed about
faults and their location in the network, in order to verify whether Nagios properly
reports the root cause of a network error and pinpoints its exact place in the network.
QUESTION 5: This question is designed to learn whether Nagios meets the
needs of a trouble ticket system.
QUESTION 6: This question specifically measures the ability of Nagios to
monitor and report the health of the devices in the network (e.g. CPU heating, server
room temperature, etc.), so that the respondents can verify whether Nagios can do what
it claims.
QUESTION 7: This question looks at the threshold management of Nagios in
order to verify whether Nagios meets certain standards for this kind of management.
QUESTION 8: The respondent answers this question about the capability of Nagios to
measure and report different performance-related attributes such as throughput, delay
and packet loss, in order to give a better understanding of the monitoring tool.
QUESTION 9: This question asks whether Nagios can provide resource utilization
statistics to assist capacity planning, in order to establish whether Nagios is able to
measure the availability of certain hosts, services and links.
QUESTION 10: This question was intentionally placed in the questionnaire in order
to investigate whether all the required activities in Nagios have effective logging.
QUESTION 11: We were also interested in identifying the fault and performance
management capabilities of Nagios by asking this question, in case a major network
error comes up, such as an error in the configuration of a VPN connection.
QUESTION 12: This question tests correlations between the beliefs of the experts
and the findings of the literature review about the notion that Nagios is easy to learn
from the beginning.
QUESTION 13: The respondents are asked whether the monitoring tasks are
performed efficiently (quickly) by using Nagios, based on the findings in the literature
review.
QUESTION 14: The respondents are asked to highlight any lack of documentation
and community support available for Nagios, in order to deal with future training
demands for the personnel of a company.
QUESTION 15: The experts should give their opinion on whether the
interface design of Nagios (including how to interact, e.g. using keyboard, mouse and
commands) has weaknesses, which would imply that Nagios may have functional
usability disadvantages.
QUESTION 16: The experts are asked to express their satisfaction with the overall
functionality of Nagios and to indicate whether their beliefs agree with our findings
in the literature review.
3.8.2 Design Rationale of the Post Test Questionnaire
The post-test questionnaire (Appendix K) was divided into eight questions. We asked
the respondents to confirm, via the questions, the experience gained from the task
scenarios listed in Appendix G. We considered that these eight questions show the
reasoning why Nagios is useful for a medium / small organization. Question 1 lets
respondents express their view on how easy Nagios is to install. The responses to
question 2 provide information about respondents’ views of the availability reports
in Nagios. In question 3, the respondents considered how easy it is to configure
Nagios to monitor an NTP server. Question 4 shows the importance of monitoring a
network host on a regular basis, which reflects the reasoning behind selecting
Nagios as the monitoring tool. Question 5 gauges the respondents’ awareness of
monitoring network servers via Nagios. Since our respondents completed the task list,
question 6 asks them to judge whether the on-screen information and the organization
of the menus of Nagios are useful. Question 7 asks the respondents how easy Nagios is
to use for new users. Finally, question 8 investigates the overall architecture design
of Nagios, that is, the complexity of configuring the monitoring tool in order to
perform any task.
4.0 Chapter 4: Development
4.1 Network Design
This chapter covers the implementation of the Nagios network management system.
It demonstrates how Nagios can monitor a number of network hosts and the
associated services located on these hosts. This goal is fulfilled by building a test
network and implementing Nagios’s object configuration files to monitor the network.
Scenario: An organization’s Headquarters is connected with its branch site via an
IPsec VPN. Moreover, the EIGRP routing protocol is configured between the sites by
implementing Generic Routing Encapsulation (GRE). The router in Headquarters is a
router-on-a-stick with the DLS1 switch, which is connected with the ALS1 and ALS2
switches. The HQ router assigns IP addresses to all three switches. The intent is to
prove that the Nagios monitoring tool is a suitable solution for the needs of a small /
mid-size enterprise.
Note: Although this project involves the configuration of Network Address Translation
(NAT), IPsec VPNs, and GRE, a detailed explanation of those technologies is out
of scope.
Note: The telecommunication infrastructure required for this project consists of Cisco
3745 routers with Cisco IOS Version 12.4(15)T14 and the Advanced Security image
C3745-ADVSECURITYK9-M.
Note: The image C3745-ADVSECURITYK9-M is also used in the three “switches”
(DLS1, ALS1 and ALS2). However, a switch card is added to each of them in GNS3 so
that those three routers act as “switches” for the purposes of the assignment. The
only difference from the actual configuration that will be tested in the NYC Campus is
that the ip route 0.0.0.0 0.0.0.0 [ip address] command will be used instead of the ip
default-network command.
4.1.1 Required Resources
The lab requires 3 routers, 3 switches, serial and console cables, and two personal
computers.
4.1.2 Topology Diagram
4.1.3 Addressing Table
Device - Hostname  Interface            IP Address       Description
HQ                 FastEthernet1/0.1    198.168.10.33    Connection to Vlan 1
                   FastEthernet1/0.100  198.168.10.65    Connection to Vlan 100
                   FastEthernet1/0.200  198.168.10.97    Connection to Vlan 200
                   Serial2/1            209.165.200.226  Connection to ISP
                   Loopback0            10.10.20.238     HQ email server address
                   Loopback1            10.10.10.1       Connection to DNS
                   Tunnel0              172.16.100.1     Connection to Branch
Branch             Serial2/1            209.165.200.242  Connection to ISP
                   Loopback1            192.168.1.1      Branch LAN
                   Tunnel0              172.16.100.2     Connection to HQ
ISP                Serial2/0            209.165.200.241  Connection to Branch
                   Serial2/1            209.165.200.225  Connection to HQ
                   Loopback1            209.165.202.129  Simulating the internet
DLS1               Vlan1                198.168.10.34    Connection to Vlan 1
                   Fa2/4                                 Connection with Nagios / VLAN 100 / IT
                   Fa2/9                                 Connection with ALS2 via etherchannel 2
                   Fa2/10                                Connection with ALS2 via etherchannel 2
                   Fa2/11                                Connection with ALS1 via etherchannel 1
                   Fa2/12                                Connection with ALS1 via etherchannel 1
                   Fa2/0                                 Connection to HQ
ALS2               Vlan1                198.168.10.36    Connection to Vlan 1
                   Fa1/7                                 Connection ALS1 via etherchannel 3
                   Fa1/8                                 Connection ALS1 via etherchannel 3
                   Fa1/9                                 Connection DLS1 via etherchannel 2
                   Fa1/10                                Connection DLS1 via etherchannel 2
                   Fa1/15                                Connection with VLAN 200 - USERS
ALS1               Vlan1                198.168.10.35    Connection to Vlan 1
                   Fa1/7                                 Connection with ALS2 with etherchannel 3
                   Fa1/8                                 Connection with ALS2 with etherchannel 3
                   Fa1/11                                Connection with DLS1 with etherchannel 1
                   Fa1/12                                Connection with DLS1 with etherchannel 1
                   Fa1/9                                 Connection with VLAN 200 - USERS
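A quick consistency check of the point-to-point entries in the addressing table can be sketched with Python's standard ipaddress module. The table lists addresses only, so the /30 prefix length used here is an assumption, and the names LINKS and same_subnet are hypothetical helpers rather than part of the lab configuration.

```python
import ipaddress

# Endpoint pairs taken from the addressing table.
# ASSUMPTION: each link uses a /30 mask; the table does not state the masks.
LINKS = {
    "HQ-ISP": ("209.165.200.226", "209.165.200.225"),
    "Branch-ISP": ("209.165.200.242", "209.165.200.241"),
    "HQ-Branch tunnel": ("172.16.100.1", "172.16.100.2"),
}


def same_subnet(addr_a, addr_b, prefix_len=30):
    """Return True if both addresses fall in the same subnet of the given size."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix_len}").network
    return net_a == net_b


for name, (a, b) in LINKS.items():
    print(name, "OK" if same_subnet(a, b) else "MISMATCH")
```

A mismatch reported by such a check would point to a typing error in the table before any router is configured.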
4.1.4 Network Implementation
The lab is implemented with GNS3 software, which makes it possible to set up virtual
Cisco routers and switches running actual Cisco IOS software. A further advantage of
GNS3 is that virtual PCs created with Oracle VM VirtualBox can be inserted into it.
Thus, an actual host running Ubuntu or Windows OS (Operating System) can be added.
For the requirements of this lab, Nagios is implemented on Ubuntu. Nagios is then
able to connect to the network in order to monitor hosts and services. A detailed
explanation of the network implementation is out of the scope of this chapter; it is
presented in detail in Appendix E, together with the Nagios configuration options.
Step 1: Insert Cisco IOS and VirtualBox to GNS3.
Prior to any configuration of the routers and switches, the VirtualBox machines
have to be inserted. This can be done by selecting Edit--->
Preferences----> VirtualBox from GNS3 and pressing Test Settings. Then, the
message “vboxwrapper and virtulabox A.P.I 4.3.8 has successfully started” appears.
This message indicates that a virtual machine can now be inserted into the GNS3. By
pressing “Apply” the procedure is finished.
By selecting the “VirtualBox Guest” tab, the VirtualBox virtual machines to be
inserted into GNS3 can be defined. By pressing “Apply” the relevant procedure is
finished.
From the “End devices” panel, the desired VirtualBox guest can be selected by
dragging and dropping it.
The next step is to insert the Cisco IOS image that will be used by the virtual routers
and switches of the lab. With the “IOS Images” tab selected under “IOS images and
hypervisors”, press the “...” button next to “Image file:” in order to locate the
Cisco IOS image on the hard disk. Pressing “Test Settings” and “Save” completes the
procedure.
Step 2: Set up the routers by configuring their hostname and interface
addresses.
A. Assign the network cards to the router and “switches” as presented in the
following screenshots.
B. Cable the network as presented in the topology diagram. Assign IP addresses to
the interfaces on Branch, HQ, and ISP.
C. Examine the status of the interfaces with the show ip interface brief command.
D. Apply a default static route on the Branch and HQ routers in order to reach the
ISP router.
E. Verify connectivity with ping from the Branch LAN interface to the serial 2/1
interface of the ISP, the ISP’s loopback interface, and the serial 2/1 interface of the
HQ.
F. Verify connectivity from the Branch router to the ISP’s serial 2/1 interface, the
ISP’s loopback interface, and the HQ serial 2/1 interface. Initiate pings sourced from
the loopback interface to see whether they successfully reach those external addresses.
The pings fail because the source IP address 192.168.1.1 is an internal private
address, and the ISP is unaware of this address.
Step 3: Apply NAT on the Branch and HQ routers
The HQ and Branch sites have been supplied by the ISP with pools of public addresses
so that hosts with private IP addresses can access the web by using NAT. A static
NAT entry has to be configured at the HQ site so that the email server, with the public
IP address 209.165.200.238, is available to mobile users and Branch office users.
The commands show ip nat statistics and show ip nat translations can be used to
confirm the configuration of NAT. Verify that NAT traffic exists by pinging the
ISP’s serial 2/1 interface, the ISP’s loopback, the HQ serial 2/1 interface and the HQ
public email server address, using the Loopback interface of the Branch as the
source address.
Once again, the commands show ip nat statistics and show ip nat translations
verify whether NAT operates properly. Before verifying the connectivity from the Branch
LAN to the HQ LAN interface, the NAT translations have to be cleared. Then, the
command show ip nat translations is used to display any NAT translations.
Branch# clear ip nat translation *
Branch#
Branch# ping 10.10.10.1 source 192.168.1.1
Branch# show ip nat translations
Branch#
The ISP cannot route the traffic from the Branch LAN to the private addresses of the HQ
router, so NAT alone is not sufficient here. The solution to this problem is the IPsec
VPN.
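The reason the ISP cannot route this traffic is that both the source and the destination lie in private (RFC 1918) address space. The following minimal sketch classifies the addresses used in this step; the helper name is_rfc1918 and the dictionary labels are hypothetical, while the addresses come from the addressing table and the text above.

```python
import ipaddress

# The three private blocks defined by RFC 1918.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# Addresses taken from the lab; the labels are informal descriptions.
ADDRESSES = {
    "Branch LAN (Loopback1)": "192.168.1.1",
    "HQ Loopback1": "10.10.10.1",
    "HQ public email server": "209.165.200.238",
    "ISP Loopback1": "209.165.202.129",
}


def is_rfc1918(address):
    """Return True if the address lies in one of the RFC 1918 private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in RFC1918)


for label, addr in ADDRESSES.items():
    kind = "private, not routable by the ISP" if is_rfc1918(addr) else "public"
    print(f"{label}: {addr} is {kind}")
```

Only the two public addresses are reachable across the ISP without a tunnel, which is why the ping from 192.168.1.1 to 10.10.10.1 needs the VPN configured in the next step.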
Step 4: Configure an IPsec VPN to connect the Branch and HQ routers.
For this assignment, an IPsec VPN configuration has been provided in order to
secure and protect all unicast IP traffic within it. Additional configuration has to be
applied if interior gateway protocols, which rely on multicast or broadcast traffic,
must be encapsulated within IPsec VPN unicast packets. The configuration of the IPsec
VPN on the Branch router can be verified with the show crypto session detail
command.
Step 5: Implement GRE over IPsec.
A GRE tunnel over IPsec will protect all corporate LAN traffic between the Branch and
HQ sites. The GRE tunnel can carry multicast and broadcast traffic for a dynamic
routing protocol. The show interface tunnel 0 command verifies that the tunnel is
active and that the tunnel protocol is GRE over IP.
Step 6: Apply VLAN trunking on Fast Ethernet interface of the HQ router.
Implement three sub-interfaces for the intended three VLANs. Configure each sub-
interface with the proper trunking protocol, description and IP address. The show ip
interface brief command checks the status and the interfaces’ configuration.
Step 7: Configure basic switch parameters
A. Set a username and password for privileged mode and use the same username and
password for line vty and line console.
B. Assign the management IP addresses on VLAN 1 and set the default gateway on all
three switches: ALS1, ALS2 and DLS1.
Step 8: Configure DLS1 for trunking with the HQ router
Configure switch DLS1 interface fast Ethernet 2/0 for trunking with the HQ router
Fast Ethernet interface 1/0.
Step 9: Configure trunks and EtherChannels between switches
Define the EtherChannels and the trunk ports:
A. From DLS1 to ALS1.
B. From DLS1 to ALS2.
C. From ALS1 to DLS1.
D. From ALS1 to ALS2.
E. From ALS2 to DLS1.
F. From ALS2 to ALS1.
G. By using show interface trunk command, it can be confirmed whether trunking
is enabled on DLS1, ALS1 and ALS2.
Step 10: Configure DHCP pools and define DHCP excluded-addresses on HQ
router.
Step 11: Configure VTP on ALS1, ALS2 and DLS1
Step 12: Configure Ports and verify port status
Step 13: Verifying if the two DHCP pools are working
Step 14: Configuring SNMP & related Access-lists on Router / Switches
Finally, SNMP is configured on the routers / switches in order to allow Nagios to
retrieve their data and display it. Additionally, it is vital to set up access lists on
each router / switch so that Nagios has the privileges to acquire these data.
Step 15: Configuring HQ as NTP server
4.2. Active Monitoring
Nagios monitors each agent, and the services running on it, directly by using
plugins. This type of monitoring is called active. Plugins are invoked by the
monitoring logic of the Nagios daemon whenever the state of a host or service has to
be checked, and they return the required information to the daemon. Nagios has an
embedded Perl interpreter for running Perl plugins; a plugin is in most cases a script
(for example a shell script) that inspects the status of a host or service. The Nagios
daemon sends notifications when it receives the results of the checks from the
plugins. The check_interval and retry_interval directives specify the frequency of
these checks, which determine the status of hosts and services. The steps to be
followed for configuration are:
Step 1: Identify the network that needs monitoring.
Step 2: Select the IP addresses of each piece of hardware, since Nagios pings IP
addresses.
Step 3: Identify the network services that will be monitored by Nagios.
Step 4: Implement the configuration files for every agent, each of which represents a
network host or service, and name every related file with the extension .cfg.
Step 5: Each service should be defined in the command.cfg file and every host should
also be defined in the nagios.cfg file.
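The object configuration produced in Step 4 can be illustrated with a small Python helper that renders a Nagios host definition as text. This is only a sketch: the function render_host is hypothetical, the generic-host template and the interval values are assumptions modeled on the Nagios sample configuration, and the address shown for HQ is taken from the addressing table as an example. The actual files used in this thesis are those in Appendix E.

```python
def render_host(host_name, address, check_interval=5, retry_interval=1,
                max_check_attempts=3, template="generic-host"):
    """Return a Nagios `define host` block as text.

    The directive names are standard Nagios host directives; the default
    values and the template name are illustrative assumptions.
    """
    directives = [
        ("use", template),
        ("host_name", host_name),
        ("alias", host_name),
        ("address", address),
        ("check_interval", check_interval),
        ("retry_interval", retry_interval),
        ("max_check_attempts", max_check_attempts),
    ]
    body = "\n".join(f"    {key:<22}{value}" for key, value in directives)
    return "define host {\n" + body + "\n}\n"


# Example: a definition for the HQ router, written out as it would appear
# in a .cfg file.
print(render_host("HQ", "209.165.200.226"))
```

Generating the definitions from one function keeps check_interval and retry_interval consistent across all monitored devices.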
A detailed explanation of the configuration files for the identified hosts and services
of this assignment is given in Appendix E. The configuration files built for
monitoring are:
• HQ
• Branch
• ISP
• DLS1
• ALS1
• ALS2
• NTP Server
• Telnet
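The way a plugin reports a state to the Nagios daemon in active monitoring can be sketched as follows. Nagios interprets a plugin's exit code as 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN) and shows the first line of output as the status text. The function classify, the packet-loss metric and the threshold values are hypothetical choices made for illustration; this is not one of the plugins used in the lab.

```python
# Exit codes that Nagios interprets as host or service states.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(packet_loss_pct, warn=20.0, crit=60.0):
    """Map a packet-loss percentage to a Nagios state and a status line.

    The warn and crit thresholds are assumed values for this sketch.
    """
    if not 0 <= packet_loss_pct <= 100:
        return UNKNOWN, f"PING UNKNOWN - invalid packet loss value {packet_loss_pct}"
    if packet_loss_pct >= crit:
        return CRITICAL, f"PING CRITICAL - packet loss {packet_loss_pct:.0f}%"
    if packet_loss_pct >= warn:
        return WARNING, f"PING WARNING - packet loss {packet_loss_pct:.0f}%"
    return OK, f"PING OK - packet loss {packet_loss_pct:.0f}%"


# A real plugin would measure the value itself, print the status line and
# terminate with the state as its exit code, e.g. sys.exit(state).
for sample in (0, 30, 100):
    state, message = classify(sample)
    print(state, message)
```

The daemon reschedules the check according to check_interval, or retry_interval after a non-OK result, and sends notifications based on the returned state.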
4.3. Passive Monitoring
Passive monitoring is required in the IT industry when private information of a
Server / PC host, such as the number of its users, its load, and the total number of its
processes, cannot be retrieved remotely. The steps should be undertaken by authorized
personnel when a Server / PC host does not meet specific performance criteria. This
kind of procedure is necessary because active monitoring checks only the services
running on the hardware, and these data cannot be retrieved just by using the TCP/IP
protocol. Thus, a daemon has to be installed on the client side, which requires
administrative privileges. The OS of the client is Windows, so the NSClient++ agent
should be running on it. The detailed configuration of how passive monitoring is
applied in Nagios can be seen in Appendix E.
5.0. Chapter 5: Evaluation
5.1. Network Elements that Need Monitoring
Small / midsize organizations continuously monitor several elements of their
networking infrastructure, as outlined below [Zoho Corp, 2010, p.1-2]:
• Email Servers: IT Managers should ensure business continuity with the
external world via an email server, because the lack of an email distribution
system may lead the organization to financial loss. Key metrics for an email
server are availability, mails in queue and size of received emails.
• WAN links: An organization runs smoothly in terms of network
performance when its WAN link(s) are not over-utilized. A network monitoring
tool should detect congestion, high response time and potential discards. Even
though optimizing the WAN links is crucial, IT Managers also have to set
thresholds on routers and switches in order to ensure the availability and
performance of their LAN interfaces.
• Business Applications: Critical applications rely on services such as FTP, DNS,
ECHO, IMAP, LDAP, TELNET, HTTP and POP. Therefore,
these services and their applications should be monitored along with CPU,
memory and disk space. Furthermore, the traffic utilization of the servers
has to be monitored, as well as the applications and services located on them.
• LAN Infrastructure: Network devices such as switches, printers and wireless
devices are core elements of an organization’s network. Therefore, they
should be kept operational.
After having clarified which network infrastructure requires monitoring, the seven
stages of the evaluation phase, for which the project manager of this thesis is
responsible, should be mentioned [Kantner, 1994, p.3]:
• Planning the test.
• Designing the test activities.
• Recruiting participants.
• Preparing the test materials.
• Setting up the test environment.
• Conducting the test.
• Compiling the test results.
5.2. Planning the Test
The “Planning the test” stage defines the goals, methodology, participant selection
requirements, working procedure, schedule, and resource requirements for the test
session. Moreover, the network topology is defined, and Ubuntu and the Nagios
server have to be installed in advance. More information is available in
Appendices C and D.
Six participants were involved individually in the test, requiring about one hour per
session over one day. Moreover, 15 minutes of extra time were allowed for
participants to fill out the post-test questionnaire. A formal break of 15 minutes was
available between the participants’ sessions.
Team members introduced to the participants the task scenarios which they should
implement and solve. The participants were basically left to accomplish the requested
tasks by themselves. A time limit was not specified, but the participants were
encouraged to try to solve all tasks without help from the team members. More
information is presented in the section “Experimenter Bias” on page 40.
The test monitor used a laptop to write down any significant comments such as task
completion date and whether the tasks were completed successfully. After all sessions
were completed, the test evaluator analyzed all data that were extracted from the test
sessions. Finally, a master list of usability issues was developed based on those data.
5.3. Designing the Test Activities
One of the outcomes of this stage was a task list that described in detail the tasks and
relevant issues. Each task should be completed within a predefined time limit. At that
time, team members designed the post-test questionnaire for the participants so that
screening could start right away. This questionnaire was created on the basis
of the objectives of testing, in order to work in conjunction with the findings of the
test sessions. Next, team members reviewed the design of the post-test questionnaire
and the task list. The purpose of this was to determine technical priorities for Nagios.
More information about the post-test questionnaire is available in Appendix K.
5.4. Recruiting Participants
A parallel activity to “Designing the Test Activities” and “Preparing the test
materials” is the selection of participants. All of them are students at New York
College. More information is available in section 3.7.1 “Selection” on page 40. The
profile of the participants, which was based on their academic and professional
background, is available in Appendix F. Dr. Pandithas served as test monitor
(moderator) in all six test sessions. The author of this paper acted as test evaluator by
compiling the results of the tests.
5.5. Preparing the test materials
A “Welcome” form was presented to participants by the test monitor (moderator) in
order to minimize possibilities of misunderstanding and in order to explain the
purpose of test sessions. More information is available in Appendix G. A convenient
form was available to participants for recording quick notes about tasks after the test.
It was ensured that important topics were noted consistently without the need of
viewing videotapes. More information is available in Appendix I. Five tasks were
developed for participants to perform during the test sessions. All the tasks remained
the same in all sessions. More information is available in Appendix H.
5.6. Setting up the test environments
A state-of-the-art usability laboratory (Room B6 of New York College) was
customized for the usability test sessions. This room was chosen because the
participants, who are students at New York College, would feel comfortable in that
room. Telephones had to be deactivated during the test sessions in order to prevent any
distractions. More information is available under “History” in section 3.7.1 on
page 40.
Participants were informed about the time and place of the test sessions so that they
would be available to attend throughout the sessions. Team members (test monitor and
test evaluator) informed their co-workers, colleagues and friends that they would be
unavailable during the test sessions. The sessions were scheduled to allow some break
time between the six participants.
5.7. Conducting the test
Team Leaders ensured that all materials in each envelope were labeled with the
participants’ IDs. It was ensured that nothing could be removed from the
envelopes of participants under any circumstances. The post-test questionnaire gave
participants the opportunity to categorize any identified problems in the “Severity”
column as ‘Important’, ‘Medium’ or ‘Minor’. Moreover, they were encouraged
by the Team Leaders to indicate in the post-test questionnaire any point at which
Nagios encountered a problem and to explain how it influenced the completion of
the task. It is understood that the test evaluator informed the participants in
accordance with the appropriate code of conduct. More information about the personal
characteristics of the participants in the post-test questionnaire is available in
Appendix K.
5.8. Compiling the test results
Usability can be increased by positive exploitation of the final results according to a
detailed test report which will be presented in Chapter 7. The data deriving from
questionnaire’s responses was categorized based on a list of the usability problems
reported by test sessions, which are presented in Chapter 6. The author of this thesis
conducted the analysis of data, which required approximately a total of 20 working
hours.
• Task Completion
The six participants completed an average of 4 out of 5 tasks. Tasks 1, 2 and 3 were
completed by all participants. Task 4 was the most difficult, since only two of the six
participants accomplished it. Finally, Task 5 caused difficulties for two of the six
participants.
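The reported average can be reproduced with a short calculation. The sketch below assumes that the two participants who had difficulties with Task 5 did not complete it, so that four participants did; this assumption is not stated explicitly above, and the names used are hypothetical.

```python
PARTICIPANTS = 6

# Number of participants who completed each task.
# ASSUMPTION: the two participants who struggled with Task 5 did not finish it.
COMPLETED = {"Task 1": 6, "Task 2": 6, "Task 3": 6, "Task 4": 2, "Task 5": 4}


def average_tasks_completed(completed_counts, participants):
    """Return the mean number of tasks completed per participant."""
    return sum(completed_counts.values()) / participants


print(average_tasks_completed(COMPLETED, PARTICIPANTS))
```

Under this assumption, 24 completed tasks spread over six participants give the average of 4 out of 5 tasks.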
• Task Completion Time
Team Members recorded the time that each participant spent on completing each
task. The results of the test sessions are presented in Table 14, which reports the time
that was spent on each task.
Table 14: Average time spent by all participants on each task against a
predefined baseline.
Tasks        Total Time   Baseline
Task 1       12 minutes   14 minutes
Task 2       6 minutes    7 minutes
Task 3       2 minutes    2 minutes
Task 4       12 minutes   12 minutes
Task 5       7 minutes    10 minutes
Cumulative   39 minutes   45 minutes
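The cumulative row of Table 14 can be checked with a few lines of Python. The figures are copied from the table; the names TABLE_14, totals and margin_per_task are hypothetical.

```python
# (measured average, baseline) in minutes, copied from Table 14.
TABLE_14 = {
    "Task 1": (12, 14),
    "Task 2": (6, 7),
    "Task 3": (2, 2),
    "Task 4": (12, 12),
    "Task 5": (7, 10),
}


def totals(table):
    """Return the cumulative measured time and cumulative baseline."""
    measured = sum(m for m, _ in table.values())
    baseline = sum(b for _, b in table.values())
    return measured, baseline


def margin_per_task(table):
    """Return the minutes under baseline per task (0 means exactly on baseline)."""
    return {task: b - m for task, (m, b) in table.items()}


print(totals(TABLE_14))
print(margin_per_task(TABLE_14))
```

The totals agree with the cumulative row (39 and 45 minutes), and no task exceeded its baseline.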
• Number of Usability Problems Identified
Two usability problems were identified in the usability test sessions. Based on
the “Severity” column in Appendix I and on the descriptions of the errors, the errors
were categorized as Important, Medium or Minor.
Table 15: Number of usability problems identified during the testing
Tasks        Important   Medium   Minor   Cumulative
Task 1       0           0        0       0
Task 2       0           0        0       0
Task 3       0           0        0       0
Task 4       1           0        0       1
Task 5       1           0        0       1
Cumulative   2           0        0       2
Note: In Command.cfg, no command definition was written by default for check_ntp,
check_ntp_time and check_ntp_peer, as there was for other plugins. The web interface of
Nagios presented a critical error, although the service for monitoring the NTP Server
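Missing command definitions of the kind described in the note can be detected before Nagios is restarted. The sketch below is a hypothetical helper, not part of Nagios: it scans the text of a command configuration file for command_name directives and reports which of the three NTP commands have no definition. The sample configuration text, the plugin arguments shown in it and the function names are assumptions made for illustration.

```python
import re

# Commands named in the note as lacking a default definition.
REQUIRED = ("check_ntp", "check_ntp_time", "check_ntp_peer")

# Hypothetical excerpt of a command configuration file in Nagios object syntax.
SAMPLE_CONFIG = """
define command{
    command_name    check_ping
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
    }

# define command{
#     command_name    check_ntp
#     }

define command{
    command_name    check_ntp_time
    command_line    $USER1$/check_ntp_time -H $HOSTADDRESS$
    }
"""


def defined_commands(config_text):
    """Collect every command_name found outside comment lines."""
    names = set()
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line.startswith("#") or line.startswith(";"):
            continue  # skip commented-out definitions
        match = re.match(r"command_name\s+(\S+)", line)
        if match:
            names.add(match.group(1))
    return names


def missing_commands(config_text, required=REQUIRED):
    """Return the required commands that have no definition, in the given order."""
    found = defined_commands(config_text)
    return [name for name in required if name not in found]


if __name__ == "__main__":
    for name in missing_commands(SAMPLE_CONFIG):
        print(f"No command definition found for {name}")
```

A check of this kind only confirms that a definition exists; whether the plugin arguments are appropriate for a given NTP server still has to be verified separately.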
LEFTERIS_PROJECT_FINAL

  • 11. 1.0 Statement of the Problem and Research Aim 1.1 Introduction Today’s complex network infrastructures are becoming critical components for the business success of an organization whether it is local or multinational. While network availability is a crucial element for a successful organization, sometimes it may lead an organization to business failure. Networks include hundreds or thousands of critical devices required for the successful operation of a business. Therefore the availability of hardware and software related to network functionality is essential. Managing the state of hardware is a serious task since critical business services depend on it. Clients and employees cannot perform transactions if network becomes unreachable resulting in productivity and profit reduction. Basic operations such as printing or sending emails are not feasible without network support. Moreover, incorrect changes in configuration caused by a junior administrator may have rippling effects on the health and availability of the network infrastructure. Therefore, acting pro-actively as a member of IT (Information Technology) department, in order to verify smoothly operation of network infrastructure, is important for securing customers' satisfaction. There is need for higher performance in availability of network support; in order to allow businesses to operate more fluently. The goal of higher performance can be achieved with active monitoring of networks in order to aid the identification and prevention of networking failures. Thus, the role of an IT manager is critical in promoting actions such as provisioning of network services, backup / restoration of device configuration; automate event correlation, problem isolation and problem resolution for greater network reliability. However, the reasons of problems faced are not always cleared and issues such as power outages or other external events cannot be prevented. 
IT manager’s goal is to gather, understand and act based on information such as performance statistics. In such a way, they can reveal problems in IT infrastructure that causes problems in the availability of network in the near future. 1.2 Statement of the Problem 1
  • 12. Network management practices have changed through the years. Thus, new tools and strategies are required in many organizations. IT departments have to evolve from reactive to proactive in the process of network management. Modern business requires changes in organizational design and realignment of IT department. Centralized management of network via monitoring tools inspires stuff to support vividly networking technologies throughout the organization. The first case describes the choices for network management tools and reveals the associated cost included in selecting any monitoring tool. The second case involves the ways in which management tools are helping IT departments to arrange some of the key challenges faced by network experts. The third case refers to the changing role of networking within a modern business and the following change in the requirements networking professionals have to fulfill in order to implement new technologies and obtaining new abilities. The final case discusses the concept of service monitoring as the prerequisite for the selection of a monitoring tool. In the last decades, communications technologies have increasingly undergone a revolution. The fast emergence of multiple protocols (and applications) and the development of equipments from multiple vendors enhance the complexity of the centralized management solution. The reason is the high level of heterogeneity in underlying equipments. The problem derives from the fact that the equipments from different vendors operate with different proprietary management protocols and implement heterogeneous management data models. Under these circumstances, employees of IT department have to deals with extra cost generated. Thus, it is required by the network administrators to deploy multiple expensive management platforms in order to manage the entire network. 
This fact will continue to exist unless the CEO of the IT department stops thinking that buying hardware from different manufacturers will help to minimize the risk of dependency from one manufacturer and reduce the expenditure of purchasing relevant hardware from the market. Consequently, network administrators have to use different monitoring tool according to the network equipment being used. Even if several supervision tools for proprietary management protocols can improve monitoring, an additional problem 2
  • 13. raised is their functional limitations when new types of components are introduced in the network. It is not rare phenomenon to see in practice, a system expert to try to find an appropriate solution to a technical problem derived from a monitoring tool which is implemented to work with a proprietary management protocol. [Kora, 2012, p.1199] The next topic relates to infrastructure that requires monitoring. Today, even on a small organization that have been operating with the same organizational structure for many years and with the same number of users that they can find it difficult to deal with an infrastructure that is growing fast. The growing numbers and types of devices in today’s business environment enhance the effectiveness and productivity of the entire organization because work can be achieved across greater distance. Now, a sales agent can meet a client outside the organization in order to close a deal so he wants to be able to access his email account with his personal smartphone. Therefore, this modern practice that contributes to business success cannot be achieved without network growth. Traditional, manual management functions seem to be out of date compared with the size of today’s infrastructures. However, an expanded network is not just a greater version of the network the company previously had. Therefore, the infrastructure must be supported and managed based on the new requirements. Moreover, the likelihood of a network outage caused from a human error along with the network complexity increase the concerns for availability, reliability, performance and security. In other words, the more the company is expanding in numbers of devices and volumes of data transferred, the more the demand on bandwidth is increasing, and the more the number of solutions required supporting network management functions. [IBM, 2012] IT managers want employees with additional skills in order to better align network operations with business requirements. 
Changing operations skills such as (1) Implementing mobile, UC, and TelePresence (2) Designing complex networks for applications (3) Tracking threats, protecting data, and providing network access control (4) Reporting on application and user SLAs (5) Troubleshooting content and performance issues, indicate a definite shift that is required in the networking team. 3
  • 14. The original demand to configure router or switches has expanded into a requirement to configure advanced application-oriented network software. Nowadays, it is essential to become much more proactive, due to the specifications related to tracking and protecting data and controlling access. Hardware - oriented metrics like availability and up-time reports have to be expanded with metrics related to applications and user SLAs. It is an important matter if a network link goes unavailable because it is priority matter that users and applications should be operational. The networking team should be able to conduct analysis and troubleshoot any problem. This is a vital business requirement and it pinpoints that new abilities have to be obtained at a faster rate than previously. Network experts believe that organizations are struggling with gaps in technology, in personnel abilities, and number of employees. In order to address the gap between existing and required abilities of networking experts possess, a good network management tool is critical to close the gap discussed before. [Shiao, 2008] Service monitoring is usually confused with single-purpose custom monitoring because it does not appear often in literature. Even if service monitoring, in its simplest form, can be described as the development and deployment of a wireless network, including a Perl script written to monitor the wireless network and associated services or establishment of a connection on a port, it can perform tasks and present the results within the context of a complete infrastructure using advanced features. Little or no extra effort is required in order to write a variety of tests using a Perl script to monitor the availability and connectivity of a service. A slightly more meaningful test would be to check a service response, for example checking the status code returned by a FTP (File Transfer Protocol) server. 
The selection of a monitoring tool should depend on the services being monitored and the related objectives. [Silver, 2009, p.9]
1.3 Research Aim and Objective
This thesis implements the Nagios network monitoring tool and evaluates it on the basis of how fast it can perform network monitoring, bearing in mind that Nagios is free of charge. The goal is to make suggestions that will act as
  • 15. blueprints for improving the functionalities and usability of a system. The objectives of the current research are:
• Prove, by examining real-life cases, that Nagios is a suitable choice of network monitoring tool for a small / medium enterprise.
• Design a lab environment using PCs, routers, switches and server virtualization for usability testing.
• Investigate how well Nagios addresses the relevant functionalities by conducting and analyzing laboratory experiments. Participants in those lab tests will be employees of IT departments with relevant tasks at work.
• Outline suggestions based on the analysis of all empirical data collected by the current research.
1.4 Research questions
The research questions (RQ) of this research are the following:
RQ1: What is the basic theory behind a network monitoring tool?
RQ2: What are the technical (functional) and non-technical criteria for choosing a network monitoring tool, based on theoretical frameworks and industry standards?
RQ3: How effectively does Nagios perform the network monitoring functionalities defined by theoretical frameworks and industry standards? Is Nagios suitable for small / midsize organizations?
RQ4: How effectively does Nagios Core 3.x satisfy a small / midsize organization in practice?
It is very important to answer RQ1, because when researching a monitoring tool one important consideration is which technologies a Network Monitoring System uses. By answering RQ2, what counts as standard monitoring functionality at the moment will
  • 16. be defined, based on FCAPS, which will also be presented. This list will be useful to people who need to implement a monitoring system and to understand what the implementation requires. RQ3, how well Nagios can perform monitoring tasks so that it is financially beneficial for an organization, will be answered by the evaluation that follows. By analyzing the functional benefits of Nagios, suggestions will be made that can serve as a guideline for improving the functionalities and usability of a network monitoring tool; these suggestions are the outcome of RQ4.
1.5 Organization of Study
This thesis is structured as follows: Chapter 2 gives a brief overview of network management, including the significance of SNMP (Simple Network Management Protocol). It covers what an open source monitoring tool should include and gives an insight into its functionalities. Chapter 3 explains why specific methodologies are preferred during the various stages of the thesis, and covers the Experimental Laboratory, Usability Testing and Performance Measurement along with the related validity threats. The development process is examined in Chapter 4. Chapter 5 outlines the evaluation of Nagios via a set of predefined tasks. Chapter 6 analyzes the findings of this research, collected by interviewing network experts using questionnaires in order to crosscheck RQ2. It also presents an analysis of the post-test questionnaires used after the Experimental Laboratory. Chapter 7 suggests actions for future improvements. The appendices outline the technical requirements for installing Nagios and Ubuntu 12.04, the lab environments and the associated topology.
  • 17. 2.0 Literature Review
2.1 The Definition of Network Management
The management and operation of modern networks and network services involve a great deal of operational tasks, such as dealing with planned maintenance activities, mass traffic events, cable cuts and hardware failures. Network management means different things to different people. In general, according to Cottrell (1992), network management is a service that involves "managing the delivery of an agreed upon service level to the user." Features of network management are described by Boutaba (2002) as: 1. Fault management. 2. Configuration management. 3. Performance management. 4. Security management. 5. Accounting management. Network management enables operators to handle the complexity and scale of the above functions with the help of a network monitoring tool. While each of these functions is distinct, they all occur in the same network. In this section a simplified view of the network operations framework is presented. A detailed view of FCAPS (Fault, Configuration, Accounting, Performance and Security) will be examined later. This thesis deals primarily with fault and performance management, as important aspects of network management. [Jianguo D, 2010, p.10]
2.2 Network Management Architecture
The best-known aspect of a network management system is network performance. A network management architecture consists of centralized network management entities and management agents running on network devices and computer systems. Using a management protocol, the network management entities send polls in order to get information about network devices. Agents return the requested
  • 18. information, ranging from bandwidth usage to CPU load, and report when problems are recognized in these services. Using this information, management entities react by executing a group of actions, including performance and error reporting to network administrators. It is important to understand that agents are software modules whose first duty is to compile information about the managed devices on which they reside. This information is then stored in a MIB and finally sent to the management entities within the NMS (network management system). The most widely accepted management protocols are SNMP (Simple Network Management Protocol) and CMIP (Common Management Information Protocol). Management proxies are entities that provide management information on behalf of other entities. [Moceri, 2010, p.2]
Figure 1: Typical network management architecture composed of a management station and various agents.
2.3 Network Management Protocol
2.3.1 SNMP
SNMP is designed to let management information be exchanged between SNMP agents and management stations on a TCP/IP internetwork. The protocol defines the type of network management, the information storage databases and the structure of the data in use. SNMP agents provide information in the form of SNMP objects.
  • 19. SNMP objects describe the device's network configuration and operation, such as the device's network interfaces, routing tables, IP (Internet Protocol) packets sent and received, and IP packets lost. They are stored in the MIB (Management Information Base) in a standard format defined for each object. Even though it is possible to run SNMP over TCP, this is not the best practice for larger networks because of the large number of connections. Thus, SNMP relies on UDP (User Datagram Protocol) as its transport protocol. Together with the MIB, SNMP provides a standard way to view and alter network management information on hardware from multiple vendors. Any monitoring or management application that uses SNMP can access MIB data on a specified device. [Mauro, 2001] READ and READ/WRITE are the two basic operation modes of the SNMP protocol. The READ mode permits only reading the SNMP variables of a specified device, while the READ/WRITE mode also enables setting certain variables on that device. When an agent is configured in READ/WRITE mode, the MIB view can be restricted to a specific OID value; in this case, WRITE access to other OID values is forbidden. Thus, it is possible to set limitations in the MIB. [Wikipedia, 2013]
2.3.2 SNMP Message Types
SNMP version 1, the initial version of the SNMP protocol, introduced five protocol data units that are still supported in current versions of the protocol. The GET REQUEST is used to retrieve the value of a variable, or of a list of variables, of a network data object. The GETNEXT REQUEST does the same, except that it requests the next value in the sequence of a data object after the one named in the request. Agents send GET RESPONSE data units in reply to GET REQUEST and GETNEXT REQUEST requests. The SET REQUEST data unit is sent by management stations to set the value of a variable or a list of variables on a specified device.
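The difference between GET, GETNEXT and a restricted SET can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `ToyAgent` and `walk` are hypothetical names introduced here, no packets are sent, and the stored values are invented. The OIDs are the standard MIB-II system and interfaces objects, written as integer tuples so that ordinary tuple ordering matches the order of the MIB tree.

```python
import bisect

# Standard MIB-II OIDs written as integer tuples.
SYSTEM = (1, 3, 6, 1, 2, 1, 1)
SYS_DESCR = SYSTEM + (1, 0)
SYS_CONTACT = SYSTEM + (4, 0)
SYS_NAME = SYSTEM + (5, 0)
IF_NUMBER = (1, 3, 6, 1, 2, 1, 2, 1, 0)


class ToyAgent:
    """In-memory stand-in for an SNMP agent; not a real SNMP implementation."""

    def __init__(self, mib, writable=()):
        self._mib = dict(mib)
        self._writable = set(writable)  # OIDs for which SET is permitted

    def get(self, oid):
        """GET: return the value bound to exactly this OID, or None."""
        return self._mib.get(oid)

    def get_next(self, oid):
        """GETNEXT: return (next_oid, value) for the first OID after `oid`."""
        ordered = sorted(self._mib)
        index = bisect.bisect_right(ordered, oid)
        if index == len(ordered):
            return None  # nothing left after this OID
        next_oid = ordered[index]
        return next_oid, self._mib[next_oid]

    def set(self, oid, value):
        """SET: change a value only if the OID was configured as writable."""
        if oid not in self._writable:
            return False
        self._mib[oid] = value
        return True


def walk(agent, root):
    """Repeat GETNEXT from `root` until the returned OID leaves the subtree."""
    results = []
    current = root
    while True:
        step = agent.get_next(current)
        if step is None or step[0][:len(root)] != root:
            return results
        results.append(step)
        current = step[0]
```

A manager that wants the whole system group calls `walk(agent, SYSTEM)`, which issues one GETNEXT per object; this repetition is the cost that GETBULK was later introduced to reduce.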
When agents want to notify management stations of events taking place, they send TRAP messages asynchronously. SNMPv2 includes improvements over SNMPv1 in the key areas of performance, security, confidentiality, and manager-to-manager communications. GETBULK performs sequential requests more
  • 20. efficiently by permitting a management station to request larger amounts of management data at once, rather than repeating a sequence of GETNEXT requests. The INFORM message type was originally defined as a version of TRAP that is acknowledged by the management station. SNMPv3 primarily added cryptographic security and remote configuration to the protocol, making it the preferred version to use. [Matt, 2006]
Message | Usage
GET REQUEST | Used by the Manager to retrieve a specific piece of network information.
GETNEXT REQUEST | Used by the Manager to iteratively retrieve a sequence of information.
GET RESPONSE | Used by the agent to send information to the Manager in response to a request.
SET REQUEST | Used by the Manager to initialize or change the value of a management object.
TRAP | Used by the agent to report an alert or other asynchronous event to the Manager.
GETBULK | Introduced in SNMPv2 to retrieve a sequence of information as a faster alternative to GETNEXT.
INFORM | Introduced in SNMPv2, an acknowledged version of TRAP.
Figure 2: Characteristics of the SNMP protocol versions in use (v1, v2c and v3).
2.3.3 SNMP and UDP
  • 21. Figure 3: A simplified SNMP architecture.
SNMP uses UDP as its transport protocol for passing data between managers and agents because UDP does not have the overhead of TCP (Transmission Control Protocol). Because UDP requires low overhead, its impact on the network's performance is reduced, at the price of unreliable delivery. UDP is connectionless: no end-to-end connection is established between the agent and the NMS when datagrams (packets) are sent back and forth, and there is no acknowledgment for lost datagrams at the protocol level. If the NMS does not receive a response, it simply assumes the packet was lost and retransmits the request. Sequencing is not required because each request and each response travels as a single datagram. The number of times the NMS retransmits packets is also configurable. For requests, the unreliable nature of UDP is therefore not a real problem, but the situation differs for traps: if an agent sends a trap and the trap never arrives, the NMS has no way of knowing that it was ever sent. Management stations send requests to agents on UDP port 161, and agents send TRAP messages to management stations on UDP port 162. [Kozierok, 2005]
2.3.4 SNMP Management Information Base (MIB)
The MIB is a collection of the managed objects that make up the "management
  • 22. information". Each agent has its own MIB, and the NMS can read or write the managed objects in it. The MIB defines, in a standard format, a set of characteristics associated with the managed objects, such as the OID (object identifier), access rights and data type of each object. The MIB organizes data in a tree structure. Each node of the tree is related to a managed object and can be uniquely identified by a path starting from the root node. Each object in the MIB can be uniquely identified by a string of numbers and a text name; this string of numbers is the OID of the managed object [Ipswitch, 2001].
2.4 Functional Division of Network Management
The ISO has contributed a well-defined network management reference model for network standardization. The OSI model breaks network management into five functional divisions, sometimes referred to as FCAPS, so that the major functions of network management systems are understood. These divisions are discussed in the next sections, based on Shields (2007, p.5-8) and Parker (2005, p.4).
2.4.1 Fault Management
Fault management involves trouble management, which covers fault detection and service recovery, and proactive maintenance, which provides capabilities for self-healing. Trouble management triggers alarms for network anomalies or failures and performs diagnostic tests to isolate faults in hardware or a service. It not only triggers service repair but also takes the measures needed to fix the diagnosed fault. Proactive maintenance performs routine maintenance on near-fault conditions and fixes problems before service troubles are reported to the NMS. The FCAPS model identifies twelve management tasks as important for a good fault management system:
• Fault detection
• Fault correction
• Fault isolation
• Network recovery
  • 23. • Alarm handling
• Alarm filtering
• Alarm generation
• Clear correlation
• Diagnostic test
• Error logging
• Error handling
• Error statistics
2.4.2 Configuration Management
Configuration management is involved with resource provisioning and service provisioning. It identifies, records and maintains the network configuration in order to be able to update configuration parameters and to ensure normal network operations. Configuration management, which faces three kinds of networks (logical, service, and custom), involves the following management tasks:
• Resource initialization
• Network provisioning
• Auto-discovery
• Backup and restore
• Resource shut down
• Change management
• Pre-provisioning
• Inventory/asset management
• Copy configuration
• Remote configuration
• Automated software distribution
  • 24. • Job initiation, tracking, and execution
2.4.3 Accounting Management
Accounting management processes and manipulates services related to user management and administration. Moreover, accounting management creates and verifies billing for the usage of network resources and services. The list below summarizes the eight tasks that enable accounting management for monitoring tools:
• Track service/resource use
• Cost for services
• Accounting limit
• Usage quotas
• Audits
• Fraud reporting
• Combine costs from multiple resources
• Support for different accounting modes
2.4.4 Performance Management
Performance management deals with processes that ensure the reliability and quality of network performance, based on their capability to meet user service-level goals. It includes the evaluation of vital performance indicators such as network throughput, resource utilization, delays, congestion level and packet loss, and reporting when the quality of network resources falls below a certain level. Performance management systems are responsible for the following issues:
• Utilization and error rates
• Performance data collection
• Consistent performance level
• Performance data analysis
• Problem reporting
  • 25. • Capacity planning
• Performance report generation
• Maintaining and examining historical logs
2.4.5 Security Management
Security management protects network resources, services and data against all security threats, such as accidental abuse, unauthorized access, and communication loss. In addition, it ensures user privacy and control over user access privileges that derive from a range of access modes, such as operations systems, service provider groups and customers. The following activities are crucial for an efficient security management system:
• Selective resource access
• Access logs
• Data privacy
• User access rights checking
• Security audit trail log
• Security alarm/event reporting
• Take care of security breaches and attempts
• Security-related information distributions
2.5 Choosing systems management tools
The non-technical factors that affect a small (or medium) sized company's choice of IT monitoring tool are the following [Curry, 2008, p.7] [Drogseth, 2006, p.4, 6] [Hale, 2012, p.11-12]:
• Ease of use: based not on the usability of demos, but on the usability of an implementation in a real-world scenario.
• Skills required to implement the requirements versus skills available.
• Requirements for, and availability of, user training.
  • 26. • Cost, such as licenses, hardware, evaluation time, maintenance and training.
• Support: from the supplier and/or communities.
• Scalability.
• Deployability: ease of installation of the management server(s) and of agent deployment.
• Reliability.
• Accountability: the ability to sue / charge the dealer if expectations are not met.
A prioritized list of basic requirements that meet Burgess's (2005, p.3) expectations is helpful, since a successful implementation of a network monitoring tool combines those specifications:
• Open Source software
• Very active forum / mailing lists
• Established history of community support and regular fixes and releases
• Centralized, open database
• Both Graphical User Interface (GUI) and Command Line Interface (CLI)
• Easy deployment of agents
• Scalability to several hundred devices
• Adequate documentation
2.6 Network monitoring tasks
Having analyzed the network management functions of the FCAPS framework, the monitoring functionalities for each of these functions are now defined. To support the evaluation of the Nagios monitoring tool for small and medium organizations, the monitoring functions derived from Section 2.4 ("Functional Division of Network Management"), that is, the tasks an NMS should perform according to the literature review and the criteria set by the network industry, are listed below. These findings will be used as benchmarks when evaluating Nagios as a monitoring tool. Fault and Performance functionalities and their important sub-functionalities are
  • 27. presented in detail along with their relevant key metrics. Configuration, Accounting and Security functionalities are mentioned briefly because of their influence on the selection of a monitoring tool, even though they are outside the scope of this thesis.
Table 1: Fault monitoring tasks and their key metrics [MindShare Services, 2007]
Tasks (Fault Monitoring): Fault detection; Fault correction; Fault isolation (Network Mapping / graphs); Network recovery; Alarm handling; Alarm filtering; Alarm generation; Clear correlation; Diagnostic test; Error logging; Error handling; Error statistics
Key Metrics: Mean Time Between Failures; Mean Time To Restore; Network Uptime
Table 1.1: Fault detection task and its sub-tasks
Task: Fault detection
Sub-Tasks: Passive fault management; Active fault management
Table 1.2: Alarm / Event Generation task and its sub-tasks
Task: Alarm / Event Generation
Sub-Tasks: Sending an email message; Sending an SMS message to a cell phone or pager; Playing a sound or recorded message on the management workstation; Logging the alert to the Network Event log;
  • 28. Logging to a text file; Sending a Syslog message; Sending an SNMP trap; Logging the alert to a Microsoft Windows event log; Sending a Microsoft Windows Net-Message; Executing an external program; Executing a script; Speaking an alert message using a text-to-speech engine
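The three key metrics named in Table 1 (mean time between failures, mean time to restore, and network uptime) can be derived from a simple outage log. The sketch below is illustrative only: the function name and the input format are assumptions, and it uses the common definitions MTBF = operating time / number of failures and MTTR = downtime / number of failures, which the thesis does not spell out.

```python
def fault_metrics(outages, period_seconds):
    """Compute MTBF, MTTR and uptime percentage for one observation period.

    outages: list of (start, end) pairs in seconds, each inside the period
             and not overlapping (assumed input format for this sketch).
    """
    failures = len(outages)
    downtime = sum(end - start for start, end in outages)
    uptime = period_seconds - downtime
    if failures == 0:
        # No failures observed: MTBF and MTTR are undefined for this period.
        return {"mtbf": None, "mttr": None, "uptime_percent": 100.0}
    return {
        "mtbf": uptime / failures,    # mean operating time between failures
        "mttr": downtime / failures,  # mean time needed to restore service
        "uptime_percent": 100.0 * uptime / period_seconds,
    }
```

For example, two outages of 300 seconds each in a 6000-second window give an MTBF of 2700 seconds, an MTTR of 300 seconds and 90% uptime.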
  • 29. Table 1.3: Fault correction task and its sub-tasks
Task: Fault correction
Sub-Tasks: Device/service restart; Reconfiguration; Security action
Table 2: Configuration Monitoring tasks and their key metrics [The Configuration Management Planning Group, 2013]
Tasks (Configuration Monitoring): Resource initialization; Network provisioning; Auto-discovery; Backup and restore; Resource shut down; Change management; Pre-provisioning; Inventory/asset management; Copy configuration; Remote configuration; Automated software distribution; Job initiation, tracking, and execution
Key Metrics: MTTR reduction; Loss of business revenue; Simple count of the number of times a configuration does not match held information; The amount of elapsed time that passes from the approval of a change to the actual implementation of that change; The number of components that are identified as "unauthorized"
  • 30. Table 3: Accounting Monitoring tasks and their key metrics [Creanord, 2013]
Tasks (Accounting Monitoring): Track service/resource use; Cost for services; Accounting limit; Usage quotas; Audits; Fraud reporting; Combined costs from multiple resources; Support for different accounting modes
Key Metrics: SLA-based resource allocation; Trend analysis; Resource utilization; Network inventory information for costing; Capacity planning
Table 4: Performance Monitoring tasks and their key metrics [Jain, 1991, p.40] [Benoit, 2007, p. 9-11]
Tasks (Performance Monitoring): Utilization and error rates; Performance data collection; Consistent performance level; Performance data analysis; Problem reporting; Capacity planning; Performance report generation; Maintaining and examining historical logs
Key Metrics: Bandwidth Utilization; Network Latency; Interface Errors and Discards; Network Hardware Resource Utilization (CPU load, memory usage, and buffer usage);
  • 31. Key Metrics (Table 4, continued): Availability
Table 4.1: Performance data collection task and its sub-tasks [Shields, 2007, p.26]
Task: Performance data collection
Sub-Tasks: Input/output bits/second; Current/average response time; Peak traffic load; Interface errors/discards; Percent packet loss
Table 5: Security Monitoring tasks and their key metrics [PCI Security Standards Council, 2010, p.8]
Tasks (Security Monitoring): Selective resource access; Access logs; Data privacy; User access rights checking; Security audit trail log; Security alarm/event reporting; Take care of security breaches and attempts; Security-related information distributions
Key Metrics: Password policies; Acceptable use policies; Lockdown and access policies; Mobile device access and lockdown policies; Business data encryption policies; Antivirus, anti-spam, anti-malware, and anti-spyware policies; Security policy violation adjudication procedures
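Several of the performance sub-tasks in Table 4.1 (input/output bits per second, percent packet loss) and the bandwidth utilization metric of Table 4 reduce to simple arithmetic on two samples of a counter. The following sketch assumes the samples come from 32-bit octet counters such as the SNMP interface counters, which wrap around at 2^32; all function names are hypothetical and chosen for this example.

```python
COUNTER32_MODULUS = 2 ** 32  # 32-bit SNMP counters wrap around at this value


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two counter samples, allowing for a single wrap."""
    return (current - previous) % modulus


def bits_per_second(prev_octets, curr_octets, interval_seconds):
    """Average throughput between two octet-counter samples."""
    return counter_delta(prev_octets, curr_octets) * 8 / interval_seconds


def utilization_percent(bps, link_speed_bps):
    """Share of the link capacity that the measured throughput represents."""
    return 100.0 * bps / link_speed_bps


def packet_loss_percent(sent, received):
    """Percentage of probe packets that were sent but not answered."""
    if sent == 0:
        return 0.0
    return 100.0 * (sent - received) / sent
```

The modulo in `counter_delta` keeps the result correct when the counter wraps once between two polls; if the polling interval is long enough for several wraps, the figure is no longer reliable, which is one reason to poll busy links frequently.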
  • 32. 2.7 Comparison of Nagios against Industry Standards
The following tables (Table 6 to Table 10) present, as a result of the literature study, the major monitoring tasks of FCAPS against the functionalities of Nagios Core 3.x [Silver, 2009, p.12] [Gaur, 2003, p.6-8] [Curry, 2008, p.143-146]. In addition, Table 11 shows that Nagios fulfils some other, non-technical requirements as posed in Section 2.5 of this chapter [Golden, 2007] [Rusalan, 2010, p.7-8] [Nagios, 2013]. The conclusions from these two comparisons suggest that Nagios is an ideal solution for small to medium organizations in terms of manpower.
Table 6: How Nagios complies with the Fault Monitoring standard of FCAPS.
Tasks Nagios Comments Fault Monitoring Fault detection Yes (alarms, warning...) Supports NRPE / NSClient No SNMP TRAP handling SNMP support V1, 2 & 3 Fault correction Yes Fast Event handlers allow automatic restart of failed application and services Fault isolation Rootcause Analysis Network Mapping / graphs UNREACHABLE status for devices behind network single point of failure. Also, host / service dependencies. Network recovery Yes Via plugin (Nolio
  • 33. plug-in)
Alarm handling | Yes |
Alarm filtering | Yes | Escalation capabilities ensure alert notifications reach the right people
Alarm generation | Yes | email / pager notifications
Clear correlation | Yes |
Diagnostic test | Yes |
Error logging | Yes |
Error handling | Yes |
Error statistics | Yes | PNP4Nagios plug-in
Table 7: Nagios compared with the Configuration Monitoring standard of FCAPS.
Tasks (Configuration Monitoring) | Nagios | Comments
Resource initialization | Yes |
Network provisioning | Yes |
Auto-discovery | Yes | Node discovery / Interface Discovery / Service (port) Discovery / Application discovery
Backup and restore | Yes | Stores configuration in flat files with simple format in a SQL database
Resource shut down | Yes |
  • 34. Change management | Yes | Using Perl or PHP
Pre-provisioning | Yes |
Inventory/asset management | Yes | Via plug-in
Copy configuration | Yes |
Remote configuration | Yes | NRPE 2.15
Automated software distribution | No |
Job initiation, tracking, and execution | Yes |
Table 8: Nagios compared with the Accounting Monitoring standard of FCAPS.
Tasks (Accounting Monitoring) | Nagios | Comments
Track service/resource use | Yes | Trending and capacity planning add-ons ensure you are aware of aging hardware
Cost for services | Yes | Availability reports ensure SLAs are being met
Accounting limit | |
Usage quotas | Yes | Keeps a history of alerts and downtimes for all hosts and services checks
  • 35. by default
Audits | Yes |
Fraud reporting | Yes |
Combine costs from multiple resources | Yes |
Support for different accounting modes | Yes |
Table 9: Functionalities in the Performance Monitoring standard of FCAPS related to the functionalities performed by Nagios in the same area.
Tasks (Performance Monitoring) | Nagios | Comments
Utilization and error rates | Yes | Monitoring of network services and host resources
Performance data collection | Yes | PNP4Nagios plug-in
Consistent performance level | Yes | PNP4Nagios plug-in
Performance data analysis | Yes | PNP4Nagios plug-in
Problem reporting | Yes | PNP4Nagios plug-in
Capacity planning | Yes | PNP4Nagios plug-in
Performance report generation | Yes | PNP4Nagios plug-in
Maintaining and examining historical logs | Yes | Historical reports provide record of alerts, notifications
  • 36. outages, and alert reports
Table 10: The Security Monitoring standard of FCAPS against the related functionalities of Nagios.
Tasks (Security Monitoring) | Nagios | Comments
Selective resource access | Yes |
Access logs | Yes |
Data privacy | No |
User access rights checking | Yes | An administrator can prevent access to certain parts on a per-user or per-role basis
Security audit trail log | Yes |
Security alarm/event reporting | Yes |
Take care of security breaches and attempts | Yes |
Security-related information distributions | Yes |
Table 11: Non-technical requirements of a monitoring tool posed by industry and fulfilled by Nagios.
Industry defined standards | Nagios
Open Source free software | Yes
Very active forum / mail lists | Yes
Established history of community
  • 37. support and regular fixes and releases | Yes
Centralized, open database | Yes
Easy deployment of agents | Yes
Scalability to several hundred devices | Yes
Adequate documentation | Yes
Ease of use | Yes
Skills necessary to implement the requirements versus skills available | No
Requirements for and availability of user training | No
Cost | Minimum
Support (from supplier and/or communities) | Yes
Scalability | Yes
Deployability (management server(s) ease of installation and agent deployment) | Yes
Reliability | Yes
Accountability (the ability to sue / charge the vendor if things go wrong) | No (only in Nagios XI)
Both Graphical User Interface (GUI) and Command Line Interface (CLI) | No
2.8 The selection of Nagios
Although the functionalities of Nagios Core 3.x listed in the literature review can be applied in large companies, they are more difficult to apply in a small one. Network management requirements and expectations are different for the network of a small organization, due to the limited technical skills of the company's staff. Using monitoring tools that are financially affordable, easy to install and use, and able to monitor all their resources is a priority for any company. [Zoho Corp, 2010] Reid (2008), Ayadi (2013) and Curry (2008, p.148) argue that Nagios is the best monitoring system for any small / medium size network. They claim that Nagios is better than other monitoring tools because:
1. It has very low system requirements.
2. It has many plugins to use.
3. It supports SNMP, keeping monitoring simple.
  • 38. 4. It can be installed and running in 15 minutes with a basic configuration.
5. It has good built-in documentation.
6. It supports more network devices in the free version.
2.9 Validating the Literature Review Outcome
To answer RQ2, it is important to define the functionalities that should be performed by an automated NMS. The outcome of the literature review (including RQ1 and RQ3 as well) should be validated with questionnaires completed by professionals in the field. Moreover, the level of consistency between Nagios's functionalities and the FCAPS framework should be defined. Analysis of the data collected with the help of the questionnaires will answer whether Nagios is the best available monitoring tool for a small / medium organization. The questionnaire is presented in Appendix B and the list of participants in Appendix A. More information about the research methodology is included in Chapter 3.
3.0 Methodology
3.1 Overview
There are three kinds of research methodologies in software engineering: (1) qualitative methodology, which seeks to extract and analyze the required information from books, papers, observation, interviews and web sources in order to justify or improve a theory; (2) quantitative methodology, which collects numerical data and examines dependency relationships among variables with the use of statistical methods; (3) mixed methodology, which includes both types of research methodology (qualitative and quantitative) in a single piece of research. [Bazeley, 2002, p.2] The selection of the appropriate research methodology is important for the success of a research project. In general, a combination of two or three data sources may be most effective in achieving a particular research objective. To answer the research
  • 39. questions of this thesis, a mixed research methodology is adopted. More specifically, a triangulation approach is used for cross-validating the results obtained by the research methods. Quantitative and qualitative data are collected concurrently, but they are analyzed and interpreted separately. Triangulation gives the researcher the opportunity to mix quantitative and qualitative research approaches within a stage of the research process. [Conrad C. and Serlin R, 2010, p.155] The qualitative method was used to answer RQ1, RQ2 and RQ3, based on the findings of the literature review and on questionnaires completed by professionals in managing network infrastructure, in order to verify the results obtained. Moreover, experiment, the most common quantitative method, was used to check results against predefined metrics (benchmarks). The experimental approach was used to answer RQ4. A post-test questionnaire will be completed by the six participants after the experiment, to verify the results. The technique used during the experiment will be the TBFG (Task-Based Focus Group) technique, in which a set of tasks (scenarios) is given to the participants for implementation, followed by a discussion afterwards. The drawback of TBFG in comparison with group usability testing, mentioned by Downey (2007, p.141), is minimized in this thesis by using professionals with long careers in network management. Thus, empirical data is gathered without the need for many observers. Qualitative analysis of the results will be performed by comparing and displaying the data in Microsoft Excel. Table 12 below displays the methodology used to answer each research question.
Table 12: Research questions with methodology employed
Research Question | Method(s)
RQ1, RQ2, RQ3 | Literature review + interviews with professionals in the network management domain. Quantitative survey.
RQ4 | Performance measurement + usability testing (quantitative analysis) based on empirical data collected from the experiment and the post-test questionnaire.
  • 40. 3.2 Experimental Laboratory
This method is selected because controlled laboratory experiments give researchers the advantage of control. One of the three major purposes that laboratory experiments serve is to test and refine existing theory. Furthermore, experiments can bridge the gap between theory and real business problems. The art of designing good experiments lies in creating simple environments that capture the essence of the real problem and that can be interpreted with the support of the data they expose. A good experiment allows the researcher to distinguish clearly among possible explanations while abstracting away all unnecessary details. The most important factor that makes experimental work rigorous is theoretical guidance. To interpret the results of an experiment, researchers need to be able to compare the data with theoretical metrics (benchmarks). Thus, the first step in doing experimental work is to start with a theory, such as the research questions of this thesis. [Katok, 2011, p.1-3]
3.3 Usability testing
  • 41. A system may have excellent quality of use for some people and poor quality of use for others. Many approaches to usability focus specifically on problems that users face with a graphical interface. Although it is important to eliminate interface problems, this can be a misleading indicator of overall usability. Usability depends on the specific tasks people want to perform when they use an application. Most users in usability testing face several trivial problems rather than a single fatal problem that causes a task to fail. The objectives of usability testing vary considerably, depending on what is tested and why, so easy-to-use widgets alone may not give the application an acceptable level of usability. In order to get reliable results, the design of a test should include and evaluate wider usability requirements. Therefore, usability may relate to the safe and efficient performance of specific critical tasks by operators of the system. [Macleod, 1994] The main purpose of a summative test of a complete product, with representative users and tasks, is to evaluate usability via defined metrics rather than to diagnose and correct specific design problems. The usability requirements should be task-based and tied directly to product requirements in order to implement a usability benchmark. [Usability Professionals Association, 2010] Testing should include many measures - metrics, which can be grouped into four categories, as suggested by Lewis (2006, p.7):
• Goal achievement indicators (such as success rate and accuracy)
• Work rate indicators (such as speed and efficiency)
• Operability indicators (such as error rate and function usage)
• Knowledge acquisition indicators (such as learnability and learning rate)
3.4 Performance measurement
  • 42. Performance measurement is the basis of the usability engineering life-cycle for assessing whether goals have been met. In traditional human factors research, measurements are taken by having a group of users perform a predefined set of tasks.
Figure 4: Performance methodology flow
The objectives of usability evaluation are broken down into two components, as presented in Figure 4. Next, their relative importance is evaluated based on goals derived from the research questions. Once the components of the goal have been decided, it is necessary to quantify them by measuring the average time it takes a user to complete a specified set of tasks - scenarios. The tasks selected for evaluation are representative of users’ normal tasks in a working environment. This technique generally defines the interaction between the participants and the application interface during the laboratory experiment, which affects the quantitative performance data. Performance evaluation will obtain quantitative data from participants by measuring the time required for each task with a stopwatch. The measured time will be reported by participants in the post-test questionnaire so that the data will be
  • 43. collected accurately without unexpected interference. [Nielsen, 1993, p.193]
Applicable stage: test and deployment.
Personnel needed for the evaluation: Usability experts: 2; Software developers: 0; Users: 6
Usability issues covered: Effectiveness: Yes; Efficiency: Yes; Satisfaction: Yes
Can be conducted remotely: No
Can obtain quantitative data: Yes
3.5 Questionnaire
The questionnaire is the preferred method for collecting information about the three research questions under investigation in this thesis. The close-ended questionnaire format is easy to conduct, code and analyze. Close-ended questions permit comparisons and quantification, and are more likely to measure degrees of difference at nominal, ordinal, interval and ratio levels while avoiding irrelevant responses. The basic principle is that the two questionnaires should contain as many questions as necessary and as few as possible, so they should be designed and formatted with length as the main concern. The two questionnaires should be written in such a way that test users cannot be identified, and the test results should be kept private. An extensive understanding of the possible range of participant responses is required because of the large amount of data to be processed. To achieve reliable and valid outcomes, each question must be checked, edited and coded before being included in the questionnaire, to ensure that each participant and test evaluator can decipher its meaning easily and accurately. For the same reason, questionnaires should be short and simple. The questionnaire design should be piloted to test whether any major defects exist. The pilot phase is used to verify that the post-test questionnaire will provide useful information. Concepts such as “Strongly disagree”, “Disagree”, “Neither agree nor disagree”, “Agree”, and “Strongly Agree” require training and feedback to be understood.
Consequently, the test monitor (Moderator) should explain to participants, in the pilot test, the meaning of the rating judgments of the post-test questionnaire. The pilot test will detect unintelligible questions that would produce unquantifiable responses and unwanted
  • 44. outcomes before embarking on the main study. The purpose of the experiment is to measure the performance of experienced users in a laboratory study. Moreover, the ISO 9241 standard, part 11, defines usability in terms of effectiveness, efficiency, and satisfaction. The post-test questionnaire is intended to measure the usability of a software application with metrics and to collect additional data on user satisfaction. Each participant will be asked to record the time required to complete a task in the post-test questionnaire. If an error occurs, the test monitor (Moderator) will ask the user to repeat the task immediately. Then, users will rate the difficulty of each task using the rating types mentioned above. Time limitations influence the performance measurement of each task, so a Likert scale will be adopted in the post-test questionnaire, since it is easy for users to complete. It is common practice to establish a baseline for each question in order to measure the success of Nagios in the evaluation phase. Baseline values will be mentioned in Chapter 5. [Dumas, 2010]
Table 13: Advantages and disadvantages of the face-to-face mode of delivery of a questionnaire, as Bird notes (2009, p.1313)
Advantages:
• Complex questions can be asked.
• Can motivate participants.
• Longer verbal responses compared to written.
• Questions can be clarified.
• Question sequence controlled.
• Vague responses can be probed.
• Visual prompts can be used.
• Long questionnaires sustained.
• High response rates.
Disadvantages:
• Costly.
• Time consuming.
• Spatially restricted.
• Answers may be filtered or censored.
• Interviewer’s presence may affect responses.
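The two post-test measures described in this section, task completion time and Likert ratings compared against a baseline, can be summarized with a few lines of Python. This is only a minimal sketch: the 1 to 5 coding of the Likert labels, the function name `summarize_task`, the baseline value, and the sample data are all assumptions made for illustration and are not results from the thesis.

```python
from statistics import mean, median

# Assumed numeric coding (1..5) of the Likert labels used in the questionnaire.
LIKERT = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neither agree nor disagree": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}


def summarize_task(times_sec, ratings, baseline_sec):
    """Summarize one task: mean completion time, median rating, baseline check."""
    scores = [LIKERT[label] for label in ratings]
    avg_time = mean(times_sec)
    return {
        "mean_time_sec": avg_time,
        "median_rating": median(scores),
        # The task meets its baseline when the mean time does not exceed it.
        "meets_baseline": avg_time <= baseline_sec,
    }


# Hypothetical stopwatch times and ratings for six participants.
example = summarize_task(
    times_sec=[120, 150, 135, 160, 140, 135],
    ratings=[
        "Agree",
        "Agree",
        "Strongly Agree",
        "Neither agree nor disagree",
        "Agree",
        "Disagree",
    ],
    baseline_sec=180,
)
print(example)
```

The median is used for the ratings because Likert responses are ordinal; the mean is used for task time because the text defines the performance measure as the average time per task.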
  • 45. 3.6 Validity threats related to questionnaires
Questionnaires have both strengths and weaknesses. The questionnaire is the most objective research tool because it can provide generalizable results. However, a large sample of questionnaire data can cause problems due to factors such as faulty or biased questionnaire design, sampling errors and non-response errors. Moreover, respondent unreliability, ignorance and misunderstanding, errors in coding, and faulty interpretation of results may cause additional problems. [Harris, 2010, p.1-2] To improve the accuracy of testing, it is important to pay attention to the issues of reliability and validity. Reliability is the question of whether one would get the same result if the test were repeated. This implies that large individual differences between test users influence the results. Validity concerns whether the result actually reflects the usability issues one wants to test, taking into consideration factors such as unsuitable users, unsuitable time constraints and social influences exerted on users by the tester. Whereas reliability can be addressed with statistical tests, a high level of validity requires measurements of real products in real use outside the laboratory. The simplest form of reliability test is the test-retest procedure, in which the same unit is measured at two different times and the results are then correlated. More robust measures focus on the extent to which all individual items correlate with each other. There are many accepted ways to measure internal consistency. The most widely used method is Cronbach’s alpha, which evaluates the homogeneity of the individual items. Validity cannot be assessed directly because the true values of the construct are not known.
[Larsen, 2008, p.1-2] A questionnaire can never really be fully “validated”: it can have one kind of validity but not another, and it can only be validated for a given population, under given conditions, and so forth. In this thesis, one way to test the validity of the questionnaire is to correlate its outcomes with the outcomes of the laboratory experiment and with the results of the literature review.
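Cronbach's alpha, mentioned above as the most widely used internal consistency measure, is commonly computed as alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below applies that standard formula; the function name `cronbach_alpha` and the sample ratings are hypothetical, and the thesis does not state how (or whether) alpha was computed for its questionnaires.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of item scores per participant)."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two participants")
    # Transpose the rows so that each tuple holds all scores for one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(scores) for scores in items)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical Likert scores (1..5): six participants, four questions.
ratings = [
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [3, 2, 3, 3],
]
print(round(cronbach_alpha(ratings), 3))
```

Values closer to 1 indicate that the items vary together across participants; with only six respondents, as in this study, any such estimate would be very imprecise.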
  • 46. There are numerous ways to specify validity, some of which were given by Howard (2008, p.1) and are noted below:
• Reliability
• Validity
• Internal validity
• External validity
• Sensitivity
• Specificity
• Statistical validity
• Longitudinal validity
• Linguistic validity
• Discriminant validity
• Construct validity
3.7 Validity threats related to Experimental Laboratory and Usability testing
Laboratory experiments are used to address a wide range of research questions. However, there are various concerns about whether laboratory findings are “generalizable”, or “externally valid”, to real markets. There is an argument about whether lab studies can be changed to better reflect an external environment of interest. Some definitions of external validity demand that the qualitative relationship between two variables holds across similar environments. While it may be disputed whether the quantitative results of an experiment can be expected to be externally valid, it cannot be guaranteed that even the qualitative results will be externally valid. [Kessler, 2011] Internal and external threats to an experimental laboratory are described below [Heffner Media Group, 2003]:
3.7.1 Internal validity threats
  • 47. Internal validity refers to the extent to which a study eliminates confounding variables within the study itself. There are eight major threats to internal validity related to this Thesis, which are presented below.
History: History refers to environmental events that happen to participants outside of the research study and that may affect or alter their performance. A special announcement to students of different fields in New York College, on a specific day and at a specific time, may have had an effect on the results obtained via a laboratory experiment.
Maturation: Maturation refers to the effect of measuring something over a repeated number of trials during the experiment, which might make the participants feel bored, tired, disinterested, or less motivated than they were at the beginning of the testing. A way of overcoming this problem is to design a short, meaningful experiment and post-test questionnaire.
Testing: This is a threat whether one group of participants evaluates one product or two groups evaluate two products. It arises when the participants perform both the pilot test and the main test, because participants tend to perform better at a task the more they are exposed to it. Changing the tasks between the pilot test and the actual test minimized this threat in this thesis.
Statistical Regression: This refers to the tendency of subjects to move toward the mean on subsequent testing even if no extra training is given. This implies that average performance will be higher in a production environment than in an experiment.
Instrumentation: The test evaluator should be careful when measuring the performance of all participants during the experiment and when validating the data collected from the post-test questionnaire. Ignoring the performance and data given by participants will lead to false results.
Selection: The sampling technique selected affects how representative the sample is, and therefore whether the researcher can make statistical generalizations to a wider population. The population of the lab experiment is selected based on the characteristics of a certain
  • 48. type of user, such as career experience, Cisco certification, Solarwind certification, and post-graduate degrees.
Experimenter Bias: During testing, the test monitor (Moderator) lets each participant discover the solutions to the given scenario on his or her own, without any support. In this way, the results obtained are more valid and accurate. It also enhances users’ confidence and satisfaction, since they solve the given scenarios on their own. The test monitor (Moderator - Dr. Pandithas) has extensive knowledge of the application, the user interface and the test method. However, Team Leaders may sometimes be biased toward the desired results. Using a test evaluator who is unaware of the anticipated results can reduce the impact of such bias. This Thesis uses its author as the test evaluator, who determines the cause - effect relationships among variables.
Mortality: Mortality refers to the situation where some participants drop out of an experimental group. Imagine a case in which participants with unique characteristics drop out of the experiment due to illness and only poorly motivated students remain in the team. As a result, one team will perform better than the other, and this could affect the outcome of the experiment. Because Nagios is tested against predefined standards and not against other products, this threat is not applicable to our experiment.
3.7.2 External Validity Threats
External validity threats are conditions of a study that can lead a researcher to incorrect generalizations. In this thesis, the study is performed on a sample that is not exactly representative of the actual population of users. Consequently, the results of the experiment are not generalized, because they are not the outcome of industrial practice with the tested software.
Demand Characteristics: Making sure that participants do not know the real purpose of the questions minimizes the possibility that they will try to guess what the researcher wants as an outcome and respond accordingly rather than answering truthfully.
  • 49. Hawthorne Effects: The presence of the researcher during the experiment may change the actual performance of participants during the testing. In order to minimize this effect, participants will be told that the system is being tested and not themselves.
Order Effects (or Carryover Effects): Order effects refer to the situation in which some participants may have "learned" what the test tasks will be from the pre-test. They would then no longer be representative of the population that has not been pre-tested. Such a scenario makes the experiment useless unless the tasks in the pilot test and the main test are different, which will be the case in this experiment.
Treatment Interaction Effects: If the subjects are exposed to more than one experiment / training on a network monitoring tool, then the findings about Nagios performance and usability will be affected by the previous experience. Since participants will probably have had no experience with any monitoring tool other than Nagios during their careers, this effect is minimized.
3.8 Designing Rationale of the questionnaires
3.8.1 Importance of Design Rationale after Literature Review
The questionnaire for experts will collect data that cross-check the findings from the literature about Nagios’ functional and usability characteristics. The experts’ experience of this monitoring tool, whether good or bad, will help us to decide whether Nagios meets the requirements of a small / medium organization. The objective of each question, within the two broad characteristics mentioned, is to provide information about significant aspects such as:
QUESTION 2: This question is intended to explore expectations of event management & control related to alarms in Nagios, in order to investigate the available options such as the status of services and nodes.
QUESTION 3: The respondents should provide meaningful answers about the capability of setting and managing alerts in Nagios, in order to test its reporting capability.
QUESTION 4: The respondents are asked whether Nagios kept them
  • 50. informed about faults and their location in the network, in order to verify whether Nagios properly reports the root cause of a network error and pinpoints its exact location in the network.
QUESTION 5: This question is designed to learn whether Nagios meets the needs of a trouble ticket system.
QUESTION 6: This question specifically measures the ability of Nagios to monitor and report the health of the devices in the network (e.g. CPU heating, server room temperature, etc.), so that the respondents can verify whether Nagios does what it claims.
QUESTION 7: This question looks at threshold management in Nagios, in order to verify whether Nagios meets certain standards for this kind of management.
QUESTION 8: The respondent answers this question about the capability of Nagios to measure and report different performance-related attributes, such as throughput, delay and packet loss, in order to give a better understanding of the monitoring tool.
QUESTION 9: This question asks whether Nagios can provide resource utilization statistics to assist capacity planning, in order to establish whether Nagios can measure the availability of certain hosts, services and links.
QUESTION 10: This question was intentionally placed in the questionnaire in order to investigate whether all the required activities in Nagios are logged effectively.
QUESTION 11: We were also interested in identifying the fault and performance management capabilities of Nagios when a major network error occurs, such as an error in the configuration of a VPN connection.
QUESTION 12: This question tests the correlation between the beliefs of the experts and the findings of the literature review regarding the notion that Nagios is easy to learn from the beginning.
QUESTION 13: The respondents are asked whether monitoring tasks are performed efficiently (quickly) with Nagios, based on the findings of the literature review.
QUESTION 14: The respondents are asked to highlight any lack of documentation and community support available for Nagios, in order to address future training demands for the personnel of a company.
  • 51. QUESTION 15: The experts should state whether they believe that the interface design of Nagios (including how to interact with it, e.g. using keyboard, mouse and commands) has weaknesses, which would imply that Nagios has functional usability disadvantages.
QUESTION 16: This question asks the experts to express their satisfaction with the overall functionality of Nagios, and checks whether their beliefs agree with our findings in the literature review.
3.8.2 Design Rationale of the Post Test Questionnaire
The post-test questionnaire (Appendix K) consists of eight questions. Through these questions we asked the respondents to report on the experience gained from the task scenarios listed in Appendix G. We considered that these eight questions show why Nagios is useful for a medium / small organization. Question 1 allows respondents to express their view on the ease of installing Nagios. The responses to question 2 provide information about respondents’ views on the availability reports in Nagios. In question 3, the respondents considered how easy it is to configure Nagios to monitor an NTP server. Question 4 addresses the importance of monitoring a network host on a regular basis, which reflects the reasoning behind selecting Nagios as a monitoring tool. Question 5 addresses the respondents’ awareness of monitoring network servers via Nagios. After the respondents have completed the task list, question 6 asks them to judge whether the on-screen information and the organization of the menus in Nagios are useful. Question 7 asks the respondents how easy Nagios is to use for new users. Finally, question 8 investigates the overall architecture design of Nagios, that is, how complex the monitoring tool is to configure in order to perform any task.
  • 52. 4.0 Chapter 4: Development
4.1 Network Design
This chapter describes the implementation of the Nagios network management system. It demonstrates how Nagios can monitor a number of network hosts and the services running on them. This goal is fulfilled by building a test network and implementing Nagios’s object configuration files to monitor it.
Scenario: An organization’s Headquarters is connected with its branch site via an IPsec VPN. Moreover, the EIGRP routing protocol is configured between the sites by implementing Generic Routing Encapsulation (GRE). The router in Headquarters is a
  • 53. router-on-a-stick with the DLS1 switch, which is connected to the ALS1 and ALS2 switches. The HQ router assigns IP addresses to all three switches. The intent is to prove that the Nagios monitoring tool is a suitable solution for the needs of a small / mid-size enterprise.
Note: Although this project involves configuration of Network Address Translation (NAT), IPsec VPNs, and GRE, a detailed explanation of those technologies is out of scope.
Note: The telecommunication infrastructure required for this project consists of Cisco 3745 routers with Cisco IOS Version 12.4(15) T14 and the Advanced IP Services image C3745-ADVSECURITYK9-M.
Note: The image C3745-ADVSECURITYK9-M is also used in the three “switches” (DLS1, ALS1 and ALS2). A switch card is added virtually via GNS3 so that those three routers act as “switches” for the purposes of the assignment. The only difference from the actual configuration that will be tested in the NYC Campus is that the ip route 0.0.0.0 0.0.0.0 [ip address] command will be used instead of the ip default-network command.
4.1.1 Required Resources
3 routers, 3 switches, serial and console cables, and two personal computers.
4.1.2 Topology Diagram
  • 54. [Figure: topology diagram]
  • 55. 4.1.3 Addressing Table
Device - Hostname | Interface | IP Address | Description
HQ | FastEthernet1/0.1 | 198.168.10.33 | Connection to Vlan 1
HQ | FastEthernet1/0.100 | 198.168.10.65 | Connection to Vlan 100
HQ | FastEthernet1/0.200 | 198.168.10.97 | Connection to Vlan 200
HQ | Serial2/1 | 209.165.200.226 | Connection to ISP
HQ | Loopback0 | 10.10.20.238 | HQ email server address
HQ | Loopback1 | 10.10.10.1 | Connection to DNS
HQ | Tunnel0 | 172.16.100.1 | Connection to Branch
Branch | Serial2/1 | 209.165.200.242 | Connection to ISP
Branch | Loopback1 | 192.168.1.1 | Branch LAN
Branch | Tunnel0 | 172.16.100.2 | Connection to HQ
ISP | Serial2/0 | 209.165.200.241 | Connection to Branch
ISP | Serial2/1 | 209.165.200.225 | Connection to HQ
ISP | Loopback1 | 209.165.202.129 | Simulating the internet
DLS1 | Vlan1 | 198.168.10.34 | Connection to Vlan 1
DLS1 | Fa2/4 | | Connection with Nagios / VLAN 100 / IT
DLS1 | Fa2/9 | | Connection with ALS2 via etherchannel 2
DLS1 | Fa2/10 | | Connection with ALS2 via etherchannel 2
DLS1 | Fa2/11 | | Connection with ALS1 via etherchannel 1
DLS1 | Fa2/12 | | Connection with ALS1 via etherchannel 1
DLS1 | Fa2/0 | | Connection to HQ
ALS2 | Vlan1 | 198.168.10.36 | Connection to Vlan 1
ALS2 | Fa1/7 | | Connection with ALS1 via etherchannel 3
ALS2 | Fa1/8 | | Connection with ALS1 via etherchannel 3
ALS2 | Fa1/9 | | Connection with DLS1 via etherchannel 2
ALS2 | Fa1/10 | | Connection with DLS1 via etherchannel 2
ALS2 | Fa1/15 | | Connection with VLAN 200 - USERS
  • 56. ALS1 | Vlan1 | 198.168.10.35 | Connection to Vlan 1
ALS1 | Fa1/7 | | Connection with ALS2 via etherchannel 3
ALS1 | Fa1/8 | | Connection with ALS2 via etherchannel 3
ALS1 | Fa1/11 | | Connection with DLS1 via etherchannel 1
ALS1 | Fa1/12 | | Connection with DLS1 via etherchannel 1
ALS1 | Fa1/9 | | Connection with VLAN 200 - USERS
4.1.4 Network Implementation
The lab will be implemented with GNS3 software, which makes it possible to set up virtual Cisco routers and switches that run actual Cisco IOS software. A further advantage of GNS3 is that virtual PCs created with Oracle VM VirtualBox can be inserted into it. Thus, an actual host running the Ubuntu or Windows OS (Operating System) can be added. For the requirements of this lab, Nagios will be installed on Ubuntu and connected to the network in order to monitor hosts and services. A detailed explanation of the network implementation is out of the scope of this chapter; it is presented in detail in Appendix E, together with the Nagios configuration options.
Step 1: Insert Cisco IOS and VirtualBox into GNS3. Before any configuration is made on the routers and switches, the VirtualBox machines have to be inserted. This is done by selecting Edit---> Preferences----> VirtualBox in GNS3 and pressing Test Settings. Then, the message “vboxwrapper and virtulabox A.P.I 4.3.8 has successfully started” appears. This message indicates that a virtual machine can now be inserted into GNS3. Pressing “Apply” finishes the procedure.
  • 57. In the “VirtualBox Guest” tab, the VirtualBox virtual machines to be inserted into GNS3 can be defined. Pressing “Apply” finishes the procedure.
  • 58. The desired VirtualBox guest can then be selected by dragging and dropping it from the “End devices” panel. The next step is to insert the Cisco IOS image that will be used by the virtual routers and switches of the lab. With the “IOS Images” tab selected in the “IOS images and hypervisors” dialog, press the “...” button next to “Image file:” to locate the Cisco IOS image on the hard disk, and complete the procedure by pressing “Test Settings” and “Save”.
  • 59. Step 2: Set up the routers by configuring their hostname and interface addresses.
A. Assign the network cards to the router and “switches” as presented in the following screenshots.
  • 60. B. Cable the network as presented in the topology diagram and assign IP addresses to the interfaces on Branch, HQ, and ISP.
C. Examine the status of the interfaces with the show ip interface brief command.
D. Apply a default static route on the Branch and HQ routers in order to reach the ISP router.
E. Verify connectivity with ping from the Branch LAN interface to the serial 2/1 interface of the ISP, the ISP’s loopback interface, and the serial 2/1 interface of the HQ.
F. Verify connectivity from the Branch router to the ISP’s serial 2/1 interface, the ISP’s loopback interface, and the HQ serial 2/1 interface. Initiate pings sourced from the loopback interface to see whether they reach those external addresses. The pings fail because the source IP address 192.168.1.1 is an internal private address, and the ISP is unaware of this address.
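The reason given for the failed pings, namely that 192.168.1.1 is a private address, can be checked with a short Python sketch that tests addresses against the three RFC 1918 private ranges. The addresses come from the addressing table; the helper name `is_rfc1918` and the dictionary layout are assumptions made for illustration.

```python
import ipaddress

# The three private IPv4 ranges defined in RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """Return True if the IPv4 address lies in one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


# Addresses taken from the addressing table of this chapter.
addresses = {
    "Branch Loopback1 (Branch LAN)": "192.168.1.1",
    "HQ Loopback1": "10.10.10.1",
    "HQ Tunnel0": "172.16.100.1",
    "HQ Serial2/1": "209.165.200.226",
    "ISP Loopback1": "209.165.202.129",
}

for name, address in addresses.items():
    kind = "private" if is_rfc1918(address) else "not private"
    print(f"{name}: {address} is {kind}")
```

Packets sourced from a private address can leave the Branch router, but the ISP has no route back to that address, which is why NAT (Step 3) or a tunnel (Steps 4 and 5) is needed.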
  • 61. Step 3: Apply NAT on the Branch and HQ routers
The HQ and Branch sites have been supplied by the ISP with pools of public addresses so that hosts with private IP addresses can access the web by using NAT. A static NAT entry has to be configured at the HQ site so that the email server, with public IP address 209.165.200.238, is available to mobile users and Branch office users. The commands show ip nat statistics and show ip nat translations can be used to confirm the NAT configuration. Verify that NAT traffic exists by pinging the ISP’s serial 2/1 interface, the ISP’s loopback, the HQ serial 2/1 interface and the HQ public email server address, using the Loopback interface of the Branch as the source address. Once again, the commands show ip nat statistics and show ip nat translations verify whether NAT operates properly. Before verifying connectivity from the Branch LAN to the HQ LAN interface, the NAT translations have to be cleared. Then, the command show ip nat translations is used to display any NAT translations.
Branch# clear ip nat translation *
Branch#
Branch# ping 10.10.10.1 source 192.168.1.1
Branch# show ip nat translations
Branch#
The ISP cannot route traffic from the Branch LAN to the private addresses of the HQ router, so NAT alone does not provide this connectivity. The solution to this problem is the IPsec VPN.
Step 4: Configure an IPsec VPN to connect the Branch and HQ routers.
For this assignment, an IPsec VPN configuration has been provided in order to secure and protect all unicast IP traffic within it. Additional configuration has to be applied if interior gateway protocols that rely on multicast or broadcast traffic must be encapsulated within IPsec VPN unicast packets. The configuration of the IPsec
  • 62. VPN on the Branch router can be verified with the show crypto session detail command.
Step 5: Implement GRE over IPsec. A GRE tunnel over IPsec will protect all corporate LAN traffic between the Branch and HQ sites. The GRE tunnel can carry the multicast and broadcast traffic used for dynamic routing. The show interface tunnel 0 command verifies that the tunnel is active and that the tunnel protocol is GRE over IP.
Step 6: Apply VLAN trunking on the Fast Ethernet interface of the HQ router. Implement three sub-interfaces for the three intended VLANs. Configure each sub-interface with the proper trunking protocol, description and IP address. The show ip interface brief command checks the status and configuration of the interfaces.
Step 7: Configure basic switch parameters
A. Set a username and password for privileged mode, and use the same username and password for line vty and line console.
B. Assign the management IP addresses on VLAN 1 and set the default gateway on all three switches: ALS1, ALS2 and DLS1.
Step 8: Configure DLS1 for trunking with the HQ router. Configure switch DLS1 interface Fast Ethernet 2/0 for trunking with the HQ router Fast Ethernet interface 1/0.
Step 9: Configure trunks and EtherChannels between switches. Define the EtherChannel and the trunk ports:
A. From DLS1 to ALS1.
B. From DLS1 to ALS2.
C. From ALS1 to DLS1.
D. From ALS1 to ALS2.
  • 63. E. From ALS2 to DLS1.
F. From ALS2 to ALS1.
G. Use the show interface trunk command to confirm that trunking is enabled on DLS1, ALS1 and ALS2.
Step 10: Configure DHCP pools and define DHCP excluded-addresses on HQ router.
Step 11: Configure VTP on ALS1, ALS2 and DLS1
Step 12: Configure Ports and verify port status
Step 13: Verifying if the two DHCP pools are working
Step 14: Configuring SNMP & related Access-lists on Router / Switches. Finally, SNMP is configured on the routers / switches in order to allow Nagios to retrieve their data and display it. Additionally, it is vital to set up access lists on each router / switch so that Nagios has the privileges to acquire this data.
Step 15: Configuring HQ as NTP server
4.2. Active Monitoring
Nagios directly monitors each agent, or the services running on it, by using plugins. This type of monitoring is called Active. Whenever the state of a host or service needs to be checked, the check logic in the Nagios daemon runs a plugin to get the required information. A plugin is in most cases a script (for example a shell or Perl script) or a compiled executable that inspects the status of a host or service; Nagios can include an embedded Perl interpreter for running Perl plugins. The Nagios daemon sends notifications based on the results of the checks it receives from the plugins. The check_interval and retry_interval directives specify the frequency of these checks, which are responsible for determining the status of hosts and services. The steps to be followed for configuration are:
Step 1: Identify the network that needs monitoring.
Step 2: Select the IP address of each device, since Nagios pings IP addresses.
  • 64. Step 3: Identify the network services that will be monitored by Nagios.
Step 4: Create a configuration file for every agent that represents a network element or service, and give every such file the extension .cfg.
Step 5: Each service should be defined in the command.cfg file, and every host should also be defined in nagios.cfg.
A detailed explanation of the configuration files for the hosts and services identified in this assignment is given in Appendix E. Configuration files were built for monitoring the following:
- HQ
- Branch
- ISP
- DLS1
- ALS1
- ALS2
- NTP Server
- Telnet
4.3. Passive Monitoring
Passive monitoring is required in the IT industry when private information about a Server / PC host, such as the number of its users, its load and the total number of its processes, cannot otherwise be retrieved. The steps should be undertaken by authorized personnel when a Server / PC host does not meet specific performance criteria. This procedure is mandatory because active monitoring checks only the services running on the hardware, and these data cannot be retrieved just by using the TCP/IP protocol. Thus, a daemon must be installed on the client side, which requires administrative privileges. The client runs Windows, so the NSClient++ agent should be running on it. The detailed configuration of Passive monitoring in Nagios can be seen in Appendix E.
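As an illustration of Steps 4 and 5 of the active-monitoring configuration in section 4.2, the following Python sketch renders one host object definition per device and names each file with the .cfg extension. It is a minimal sketch under stated assumptions, not the configuration used in this thesis (which is given in Appendix E): the IP addresses are placeholders from the documentation range, the helper names host_definition and build_config_files are hypothetical, and the generic-host template and check-host-alive command are assumed to be available as in the Nagios sample configuration.

```python
# Hypothetical inventory: host names follow the topology in this chapter;
# the addresses are documentation-range placeholders, not the lab addresses.
HOSTS = {
    "HQ": "192.0.2.1",
    "Branch": "192.0.2.2",
    "DLS1": "192.0.2.11",
}


def host_definition(name, address, check_interval=5, retry_interval=1,
                    max_check_attempts=3):
    """Render one Nagios host object definition as text."""
    directives = [
        ("use", "generic-host"),                # assumed sample template
        ("host_name", name),
        ("address", address),
        ("check_command", "check-host-alive"),  # assumed sample command
        ("check_interval", check_interval),     # interval between regular checks
        ("retry_interval", retry_interval),     # interval between re-checks
        ("max_check_attempts", max_check_attempts),
    ]
    body = "\n".join(f"    {key:<20} {value}" for key, value in directives)
    return f"define host {{\n{body}\n}}\n"


def build_config_files(hosts):
    """Map each host to a '<name>.cfg' file name and its rendered contents."""
    return {f"{name}.cfg": host_definition(name, address)
            for name, address in hosts.items()}


if __name__ == "__main__":
    for filename, text in build_config_files(HOSTS).items():
        print(f"# {filename}")
        print(text)
```

Each generated file would still have to be referenced from nagios.cfg with a cfg_file directive before Nagios loads it. The check_interval and retry_interval values are expressed in time units of interval_length, which defaults to 60 seconds.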
  • 65. 5.0. Chapter 5: Evaluation
5.1. Network Elements that Need Monitoring
Small / Midsize organizations continuously monitor several elements of their networking infrastructure, as outlined below [Zoho Corp, 2010, p.1-2]:
- Email Servers: IT Managers should ensure business continuity with the external world via an Email server, because the lack of an email distribution system may lead the organization to financial loss. Key metrics for an email server are availability, mails in queue and size of received emails.
- WAN links: An organization runs smoothly in terms of network performance when its WAN link(s) are not over-utilized. A network monitoring tool should detect congestion, high response time and potential discards. Even though optimizing the WAN links is crucial, IT Managers also have to set thresholds on routers and switches in order to ensure the availability and performance of their LAN interfaces.
- Business Applications: Critical applications rely on services such as FTP, DNS, ECHO, IMAP, LDAP, TELNET, HTTP and POP. Therefore, these services and their applications should be monitored, along with CPU, memory and disk space. Furthermore, the servers’ traffic utilization has to be monitored, as well as the applications and services hosted on them.
- LAN Infrastructure: Network devices such as switches, printers and wireless devices are core elements of an organization’s network. Therefore, they should be operational.
Having clarified which network infrastructure requires monitoring, the seven stages of the evaluation phase, for which the project manager of this thesis is responsible, are listed below [Kantner, 1994, p.3]:
- Planning the test.
- Designing the test activities.
- Recruiting participants.
- Preparing the test materials.
  • 66. - Setting up the test environment.
- Conducting the test.
- Compiling the test results.
5.2. Planning the Test
The “Planning the test” stage defines the goals, methodology, participant selection requirements, working procedure, schedule and resource requirements for the test session. Moreover, the network topology is defined. Furthermore, Ubuntu and the Nagios Server are required to be installed in advance. More information is available in Appendices C and D.
Six participants took part in the test individually, each session requiring about one hour, and all sessions were held over one day. Moreover, 15 minutes of extra time were allowed for participants to fill out the post-test questionnaire. A formal break of 15 minutes was available between the participants’ sessions.
Team members introduced the participants to the task scenarios that they were to carry out. The participants were essentially left to accomplish the tasks by themselves. A time limit was not specified, but the participants were encouraged to try to solve all tasks without help from the team members. More information is presented in the section “Experimenter Bias” on page 40.
The test monitor used a laptop to write down any significant comments, such as task completion date and whether the tasks were completed successfully. After all sessions were completed, the test evaluator analyzed all data that were extracted from the test sessions. Finally, a master list of usability issues was developed based on those data.
5.3. Designing the Test Activities
One of the outcomes of this stage was a task list that described in detail the tasks and relevant issues. Each task should be completed within a predefined time limit. At that time, team members designed the post-test questionnaire for the participants so that
  • 67. screening could start right away. This questionnaire was created on the basis of the objectives of testing, in order to work in conjunction with the findings of the test sessions. Next, team members reviewed the design of the post-test questionnaire and the task list. The purpose of this was to determine technical priorities for Nagios. More information about the Post-Test Questionnaire is available in Appendix K.
5.4. Recruiting Participants
The selection of participants is an activity that runs in parallel with “Designing the Test Activities” and “Preparing the test materials”. All of the participants are students at New York College. More information is available in section 3.7.1 “Selection” on page 40. The profile of the participants, which was based on their academic and professional background, is available in Appendix F. Dr. Pandithas served as test monitor (moderator) in all six test sessions. The author of this paper acted as test evaluator by compiling the results of the tests.
5.5. Preparing the test materials
A “Welcome” form was presented to participants by the test monitor (moderator) in order to minimize the possibility of misunderstanding and to explain the purpose of the test sessions. More information is available in Appendix G.
A convenient form was available to participants for recording quick notes about the tasks after the test. This ensured that important topics were noted consistently without the need to view videotapes. More information is available in Appendix I.
Five tasks were developed for participants to perform during the test sessions. The tasks remained the same in all sessions. More information is available in Appendix H.
5.6. Setting up the test environments
A state-of-the-art usability laboratory (Room B6 of New York College) was customized for the usability test sessions. This room was chosen because the participants, who are students at New York College, would feel comfortable in it. Telephones had to be deactivated during test sessions, in order to prevent any
  • 68. distractions. More information is available in the section “History” of section 3.7.1 on page 40. Participants were informed about the time and place of the test sessions so that they would be available to attend. Team members (test monitor and test evaluator) informed their co-workers, colleagues and friends that they would be unavailable during the test sessions. The sessions were scheduled to allow some break time between the six participants.
5.7. Conducting the test
Team Leaders ensured that all materials in each envelope were labeled with the participants’ IDs. It was ensured that nothing was removed from the participants’ envelopes under any circumstances.
The post-test questionnaire gave participants the opportunity to categorize any identified problems in the “Severity” column as ‘Important’, ‘Medium’ or ‘Minor’. Moreover, they were encouraged by the Team Leaders to pinpoint in the post-test questionnaire any point at which Nagios encountered a problem and to explain how it influenced the completion of the task. It is implied that the Test evaluator informed the participants using the appropriate code of conduct. More information about the personal characteristics of the participants recorded in the post-test questionnaire is available in Appendix K.
5.8. Compiling the test results
Usability can be improved by making good use of the final results, which are set out in a detailed test report presented in Chapter 7. The data derived from the questionnaire responses were categorized according to a list of the usability problems reported in the test sessions, which is presented in Chapter 6. The author of this thesis conducted the analysis of the data, which required approximately 20 working hours in total.
- Task Completion
The six participants completed an average of 4 out of 5 tasks. Tasks 1, 2 and 3 were
  • 69. completed by all participants. Task 4 was the most difficult, since only two of the six participants accomplished it. Finally, Task 5 caused difficulties for two out of six participants.
- Task Completion Time
Team Members recorded the time that each participant spent on completing each task. The results of the test sessions are presented in Table 14, which reports the time spent on each task.
Table 14: Average time spent by all participants on each task against a predefined baseline.
Tasks        Total Time    Baseline
Task 1       12 minutes    14 minutes
Task 2       6 minutes     7 minutes
Task 3       2 minutes     2 minutes
Task 4       12 minutes    12 minutes
Task 5       7 minutes     10 minutes
Cumulative   39 minutes    45 minutes
- Number of Usability Problems Identified
Two usability problems were identified in the five usability test sessions. Based on the “Severity” column in Appendix I and on the descriptions of the errors, the errors were categorized as Important, Medium or Minor.
Table 15: Number of usability problems identified during the testing.
             Usability problems
Tasks        Important   Medium   Minor   Cumulative
Task 1       0           0        0       0
Task 2       0           0        0       0
Task 3       0           0        0       0
Task 4       1           0        0       1
Task 5       1           0        0       1
Cumulative   1           0        0
Note: In Command.cfg no rule was written by default for the commands check_ntp, check_ntp_time and check_ntp_peer, as there was for other plugins. The web interface of Nagios presented a critical error, although the service for monitoring the NTP Server