“Monitoring Networks Using Nagios”
Eleftherios Iliopoulos
University of Greenwich
M.Sc. Course: Internet Engineering and Web
Management
Master Thesis
15 September 2014
Dissertation Submitted to the University of Greenwich in partial fulfillment of the
requirements for the degree of Master of Science in Internet Engineering and Web
Management.
UNIVERSITY OF GREENWICH
Date: September 2014
Author: Eleftherios Iliopoulos
Course Program: Master Thesis
Name of assignment: Monitoring Networks Using Nagios
Instructor: Pandithas Ioannis
Pages: [75]
Supervisor: Pandithas Ioannis
ABSTRACT
A specific network management strategy is required to keep a network stable. If a
company’s strategy includes a network monitoring tool, the company knows at any time
when network devices and services are in jeopardy. By proactively preventing network
downtime, which may have many causes such as misconfigured hardware or
software, a company increases network uptime, provides better business services and
improves business continuity. As a result, the profitability of the enterprise is protected.
The objective of this thesis is to investigate whether Nagios is a suitable open source
monitoring tool for small to midsize organizations. The literature section presents an
overview of the FCAPS framework set against specific Nagios functionalities, together
with a concise description of network management and the technologies behind a
monitoring tool. Furthermore, the implementation of Nagios in detecting and responding
to faults in a network is described, along with the relevant alerts that notify the system
administrator. A laboratory experiment shows that Nagios monitors a network
effectively. The research includes an evaluation of Nagios under test conditions, and
the relevant configuration files are presented.
Keywords: Nagios, network monitoring tools
ACKNOWLEDGEMENTS
I would like to take this opportunity to thank Dr. Giorgios Papamichail,
who gave me the chance to work under his supervision. I also thank
Dr. Ioannis Pandithas for his important suggestions, which helped me a lot
in the successful completion of this assignment. It is not possible for me
to express in words my thankfulness to my family, who always
encouraged me to achieve my goal. I am also thankful to the University of
Greenwich for providing me with the opportunity to obtain a quality education.
ABBREVIATIONS
CEO Chief Executive Officer
CGI Common Gateway Interface
CMIP Common Management Information Protocol
CMISE Common Management Information Service Element
CPCC Central Piedmont Community College
DDOS Distributed Denial of Service
FCAPS Fault, Configuration, Accounting, Performance and Security
FTP File Transfer Protocol
GUI Graphical User Interface
HTTP Hypertext Transfer Protocol
IT Information Technology
IP Internet Protocol
MIB Management Information Base
NOC Network Operations Center
NMS Network Management System
OID Object Identifier
OS Operating System
POP3 Post Office Protocol 3
SMTP Simple Mail Transfer Protocol
SNMP Simple Network Management Protocol
SSH Secure Shell
TBFG Task-Based Focus Group
TCP Transmission Control Protocol
UDP User Datagram Protocol
LIST OF FIGURES
Figure 1. Typical network management architecture. [Online image].
http://www.cisco.com/en/US/docs/internetworking/technology/handbook/NM-Basics.pdf
[Accessed 13th March 2014]
Figure 2. Characteristics of SNMP protocol in use: v1, v2c and v3. [Online image].
http://www.cse.wustl.edu/~jain/cse567-06/ftp/net_traffic_monitors2/index.html
[Accessed 22nd July 2014]
Figure 3. A simplified SNMP architecture. [Online image].
http://jmiller.uaa.alaska.edu/cse465-fall2012/papers/fiang2002.pdf
[Accessed 13th July 2014]
Figure 4. Performance methodology flow. Usability Testing. P.192. Carnegie
Mellon University. [Online image].
http://www.cs.cmu.edu/~cprose/LTI-6-UsabilityTesting.pdf
[Accessed 13th July 2014]
Table of Contents
1.0. Chapter 1: Statement of the Problem and Research Aim
1.1. Introduction
1.2. Statement of the Problem
1.3. Research Aim and objective
1.4. Research Questions
1.5. Organization of Study
2.0. Chapter 2: Literature Review
2.1. The Definition of Network Management
2.2. Network Management Architecture
2.3. Network Management Protocol
2.3.1 SNMP
2.3.2 SNMP Messages Types
2.3.3 SNMP and UDP
2.3.4 SNMP Management Information Base (MIB)
2.4. Functional Division of Network Management
2.4.1 Fault Management
2.4.2 Configuration Management
2.4.3 Accounting Management
2.4.4 Performance Management
2.4.5 Security Management
2.5. Choosing systems management tools
2.6. Network Monitoring tasks
2.7. Comparison Nagios against Industry Standards
2.8. The selection of Nagios
2.9. Validating the Literature review outcome
3.0. Chapter 3: Methodology
3.1. Overview
3.2. Experimental Laboratory
3.3. Usability Testing
3.4. Performance Measurement
3.5. Questionnaire
3.6. Validity threats related to Questionnaires
3.7. Validity threats related to Experimental Laboratory and Usability testing
3.7.1. Internal validity threats
3.7.2. External validity threats
3.8 Designing Rationale of the questionnaires
3.8.1 Importance of Design Rationale after Literature Review
3.8.2 Design Rationale of the Post Test Questionnaire
4.0. Chapter 4: Development
4.1. Network Design
4.1.1 Required Resources
4.1.2 Topology Diagram
4.1.3 Addressing Table
4.1.4 Network Implementation
4.2. Active Monitoring
4.3. Passive Monitoring
5.0. Chapter 5: Evaluation
5.1. Network Elements that Need Monitoring
5.2. Planning the Test
5.3. Designing the Test Activities
5.4. Recruiting Participants
5.5. Preparing the test materials
5.6. Setting up the test environments
5.7. Conducting the test
5.8. Compiling the test results
5.9. Funding Considerations
5.10. Timetable
6.0. Chapter 6: Findings
6.1. Results From Questionnaire
6.2. Usability Evaluation
6.2.1. Effectiveness
6.2.2 Efficiency
6.2.2.1. Result of Satisfaction from Post – Test Questionnaire
7.0. Chapter 7: Conclusion and Future Work
8.0. Chapter 8: List Of References
APPENDICES
APPENDIX A: Information about interviewees
APPENDIX B: Nagios Evaluation Questionnaire
APPENDIX C: Ubuntu Installation
APPENDIX D: Nagios Installation on Ubuntu
APPENDIX E: Network Implementation and Nagios Configuration
Network Implementation
Nagios Configuration
1. Active Monitoring
1.1 Monitoring Routers
Monitoring HQ
Monitoring BRANCH
Monitoring ISP
1.2 Monitoring Switches
Monitoring DLS1
Monitoring ALS1
Monitoring ALS2
1.3. Passive Monitoring
1.4. Monitoring Network Services
Monitoring NTP Server
Monitoring Telnet
APPENDIX F: Information about interviewees in the Laboratory Experiment
APPENDIX G: Test Script
APPENDIX H: Task List
APPENDIX I: Note Form
APPENDIX K: Post – Questionnaire
APPENDIX L: Gantt Chart
APPENDIX M: Source Code
Nagios Scripts
Script of ALS1
Script of ALS2
Script of DLS1
Script of ISP_LOOPBACK1
Script of SERIAL0
Script of SERIAL1
Script of WinServer
Script of BRANCH
1.0 Statement of the Problem and Research Aim
1.1 Introduction
Today’s complex network infrastructures are becoming critical components of the
business success of an organization, whether it is local or multinational. Network
availability is a crucial element of a successful organization, and its absence may
lead an organization to business failure. Networks include hundreds or thousands
of critical devices required for the successful operation of a business. Therefore, the
availability of the hardware and software related to network functionality is essential.
Managing the state of hardware is a serious task since critical business services
depend on it. Clients and employees cannot perform transactions if the network becomes
unreachable, which reduces productivity and profit. Basic operations such as
printing or sending emails are not feasible without network support. Moreover,
incorrect configuration changes made by a junior administrator may have rippling
effects on the health and availability of the network infrastructure. Therefore, it is
important for members of the IT (Information Technology) department to act
proactively to verify the smooth operation of the network infrastructure and so
secure customers' satisfaction.
There is a need for higher availability of network support in order to
allow businesses to operate more smoothly. This goal can be
achieved through active monitoring of networks, which aids the identification and
prevention of networking failures. Thus, the role of an IT manager is critical in
promoting actions such as provisioning of network services, backup and restoration of
device configurations, and automated event correlation, problem isolation and problem
resolution for greater network reliability. However, the causes of the problems faced
are not always clear, and issues such as power outages or other external events cannot
be prevented. The IT manager’s goal is to gather, understand and act on information
such as performance statistics. In this way, IT managers can reveal problems in the IT
infrastructure that may affect the availability of the network in the near future.
1.2 Statement of the Problem
Network management practices have changed through the years. Thus, new tools and
strategies are required in many organizations. IT departments have to evolve from
reactive to proactive network management. Modern business
requires changes in organizational design and a realignment of the IT department.
Centralized management of the network via monitoring tools encourages staff to
actively support networking technologies throughout the organization.
The first case describes the choices of network management tools and reveals the
cost associated with selecting any monitoring tool. The second case involves
the ways in which management tools help IT departments to address some of
the key challenges faced by network experts. The third case refers to the changing
role of networking within a modern business and the resulting change in the
requirements networking professionals have to fulfill in order to implement new
technologies and obtain new abilities. The final case discusses the concept of
service monitoring as the prerequisite for the selection of a monitoring tool.
In the last decades, communications technologies have undergone a
revolution. The fast emergence of multiple protocols (and applications) and the
development of equipment from multiple vendors increase the complexity of a
centralized management solution. The reason is the high level of heterogeneity in
the underlying equipment. The problem derives from the fact that equipment from
different vendors operates with different proprietary management protocols and
implements heterogeneous management data models. Under these circumstances,
employees of the IT department have to deal with the extra cost generated, because
network administrators are required to deploy multiple expensive management
platforms in order to manage the entire network. This situation will continue
unless the CEO of the IT department stops thinking that buying hardware from
different manufacturers will help to minimize the risk of dependency on one
manufacturer and reduce the expenditure of purchasing relevant hardware from the
market. Consequently, network administrators have to use a different monitoring tool
for each type of network equipment in use. Even if several supervision tools for
proprietary management protocols can improve monitoring, an additional problem
is their functional limitations when new types of components are introduced in
the network. It is not rare in practice to see a system expert trying to
find a solution to a technical problem caused by a monitoring tool
that is implemented to work with a proprietary management protocol. [Kora, 2012,
p.1199]
The next topic relates to the infrastructure that requires monitoring. Today, even a
small organization that has been operating with the same organizational structure and
the same number of users for many years can find it difficult to deal
with an infrastructure that is growing fast. The growing numbers and types of devices
in today’s business environment enhance the effectiveness and productivity of the
entire organization because work can be carried out across greater distances. For
example, a sales agent who meets a client outside the organization in order to close a
deal wants to be able to access his email account from his personal smartphone. This
modern practice, which contributes to business success, cannot be achieved without
network growth. Traditional, manual management functions seem to be out of date
given the size of today’s infrastructures. However, an expanded network is
not just a larger version of the network the company previously had, so the
infrastructure must be supported and managed based on the new requirements.
Moreover, the likelihood of a network outage caused by human error, along with
the network complexity, increases the concerns for availability, reliability, performance
and security. In other words, the more the company expands in numbers of
devices and volumes of data transferred, the more the demand on bandwidth
increases, and the greater the number of solutions required to support network
management functions. [IBM, 2012]
IT managers want employees with additional skills in order to better align network
operations with business requirements. Changing operations skills such as (1)
implementing mobile, UC, and TelePresence, (2) designing complex networks for
applications, (3) tracking threats, protecting data, and providing network access
control, (4) reporting on application and user SLAs, and (5) troubleshooting content
and performance issues indicate a definite shift that is required in the networking team.
The original demand to configure routers or switches has expanded into a requirement
to configure advanced application-oriented network software. Nowadays, it is
essential to become much more proactive, due to the requirements related to tracking
and protecting data and controlling access. Hardware-oriented metrics like
availability and uptime reports have to be extended with metrics related to
applications and user SLAs. A network link becoming unavailable is an important
matter because keeping users and applications operational is a priority. The
networking team should be able to analyze and troubleshoot
any problem. This is a vital business requirement, and it means that new abilities
have to be obtained at a faster rate than previously. Network experts believe that
organizations are struggling with gaps in technology, in personnel abilities, and in the
number of employees. A good network management tool is critical for closing the gap
between the abilities that networking experts possess and those they require.
[Shiao, 2008]
Service monitoring is often confused with single-purpose custom monitoring
because it does not appear often in the literature. Even though service monitoring, in
its simplest form, can be as small as a Perl script written to monitor a newly deployed
wireless network and its associated services, or to establish a connection on a port,
it can also perform tasks and present the results within the context of a complete
infrastructure using advanced features. Little extra effort is required to write a
variety of tests in a Perl script that monitor the availability and connectivity of a
service. A slightly more meaningful test would be to check a service response, for
example the status code returned by an FTP (File Transfer Protocol) server. In terms
of monitoring, the selection of a monitoring tool should rely on the services being
monitored and the related objectives. [Silver, 2009, p.9]
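A service-response test of the kind described above can be sketched as a small Nagios-style plugin. The sketch below is written in Python rather than Perl, and the function names `classify_ftp_banner` and `check_ftp` are hypothetical. It assumes the usual Nagios plugin convention of exit codes 0 to 3 and treats the FTP reply code 220 ("service ready") as the healthy greeting.

```python
import socket

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_ftp_banner(banner: str) -> tuple[int, str]:
    """Map the greeting sent by an FTP server to a plugin state and message."""
    line = banner.strip()
    code = line[:3]
    if not code.isdigit():
        return UNKNOWN, f"FTP UNKNOWN - unexpected banner: {line!r}"
    if code == "220":  # 220 = service ready for new user
        return OK, f"FTP OK - {line}"
    return CRITICAL, f"FTP CRITICAL - server replied {code}"


def check_ftp(host: str, port: int = 21, timeout: float = 5.0) -> tuple[int, str]:
    """Connect to the service, read its greeting and classify it."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            banner = conn.recv(1024).decode("ascii", errors="replace")
    except OSError as exc:  # refused, unreachable, timed out, ...
        return CRITICAL, f"FTP CRITICAL - {exc}"
    return classify_ftp_banner(banner)
```

A plugin built on this sketch would print the message and exit with the returned code, which is how a monitoring tool would read the result of the check.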
1.3 Research Aim and objective
This thesis implements the Nagios network monitoring tool and evaluates it on
the basis of how fast it can perform network monitoring, taking into account the fact
that Nagios is free of charge. The goal is to make suggestions that will act as
blueprints for improving the functionalities and usability of a system. The objectives
of the current research are to:
• Prove, by examining real-life cases, that Nagios is a suitable choice of network
monitoring tool for a small / medium enterprise.
• Design a lab environment using PCs, routers, switches and server virtualization for
usability testing.
• Investigate how well Nagios addresses relevant functionalities by conducting and
analyzing laboratory experiments. Participants in those lab tests will be employees
of IT departments with relevant tasks at work.
• Outline suggestions based on the analysis of all empirical data collected by
the current research.
1.4 Research Questions
The RQ (Research Questions) of this research are the following:
RQ1: What is the basic theory behind a network monitoring tool?
RQ2: What are the technical (functionalities) and non-technical criteria for choosing a
network monitoring tool based on theoretical frameworks and industry standards?
RQ3: How effectively does Nagios perform the network monitoring functionalities
identified by theoretical frameworks and industry standards? Is Nagios suitable for
small / midsize organizations?
RQ4: How effectively does Nagios Core 3.x satisfy a small / midsize organization
in practice?
It is very important to answer RQ1, because when researching a
monitoring tool one important thing to take into account is which technologies
a network monitoring system uses.
Answering RQ2 defines what standard monitoring functionality is at the moment,
based on FCAPS, which will also be presented. This list will be useful to
people who need to implement a monitoring system and understand what is required
to implement it.
RQ3, how well Nagios can perform monitoring tasks so that it is
financially beneficial for an organization, will be answered by the evaluation that
follows.
By analyzing the functional benefits of Nagios, suggestions will be made that can
serve as a guideline for improving the functionalities and usability of a network
monitoring tool; these suggestions will be the outcome of RQ4.
1.5 Organization of Study
This thesis is structured as follows. Chapter 2 gives a brief overview of network
management, including the significance of SNMP (Simple Network Management
Protocol). It covers what an open source monitoring tool should include and gives
an insight into its functionalities. Chapter 3 explains why specific methodologies are
preferred during the various stages of the thesis and covers Experimental
Laboratory, Usability Testing and Performance Measurement along with the related
validity threats. The development
process is examined in Chapter 4. Chapter 5 outlines the evaluation of Nagios via a
set of predefined tasks. Chapter 6 analyzes the findings of this research, collected by
interviewing network experts using questionnaires in order to crosscheck RQ2.
It also presents an analysis of the post-test questionnaires used in the laboratory
experiment. Chapter 7 suggests actions for future improvements. The appendices outline
the technical requirements for installing Nagios and Ubuntu 12.04, the lab
environments and the associated topology.
2.0 Literature Review
2.1 The Definition of Network Management
The management and operation of modern networks and network services involve a
great many operational tasks, such as dealing with planned maintenance activities,
mass traffic events, cable cuts and hardware failures. Network management means
different things to different people. In general, according to Cottrell (1992), network
management is a service that involves “managing the delivery of an agreed upon
service level to the user.” Boutaba (2002) describes the features of network
management as:
1. Fault management.
2. Configuration management.
3. Performance management.
4. Security management.
5. Accounting management.
Network management enables operators to handle the complexity and scale of the
above network management and operations functions with the help of a network
monitoring tool. While each of these functions is distinct, they all occur in the same
network. This section presents a simplified view of the network operations framework.
A detailed view of FCAPS (Fault, Configuration, Accounting,
Performance and Security) will be examined later. This thesis deals primarily
with fault and performance management, as important aspects of network
management. [Jianguo D, 2010, p.10]
2.2 Network Management Architecture
The most well-known aspect of a network management system is network
performance. A network management architecture consists of centralized network
management entities and management agents running on network devices and
computer systems. Using a management protocol, the network management entities
send polls in order to get information about network devices. Agents return the
requested information, ranging from bandwidth usage to CPU load, and report when
problems are recognized in these services. Using this information, management
entities react by executing a group of actions, including performance and error
reporting to network administrators. It is important to understand that agents are
software modules whose first duty is to compile information about the managed
devices on which they reside. This information is then stored in a MIB and finally
sent to the management entities within the NMS (network management system).
Widely accepted management protocols are SNMP (Simple Network Management
Protocol) and CMIP (Common Management Information Protocol). Management
proxies are entities that provide management information on behalf of other
entities. [Moceri, 2010, p.2]
Figure 1: Typical network management architecture composed of a management
station and various agents.
2.3 Network Management Protocol
2.3.1 SNMP
SNMP is designed to let management information be exchanged between SNMP
agents and management stations on a TCP/IP internetwork. The protocol defines the
type of network management, the information storage databases and the structure of
the data in use. SNMP agents provide information in the form of SNMP objects.
SNMP objects describe the device's network configuration and operation, such as the
device's network interfaces, routing tables, IP (Internet Protocol) packets sent and
received, and IP packets lost, and are stored in the MIB (Management Information
Base) in a standard format defined for each object. Even though it is possible to set
SNMP to work via TCP, this is not the best practice for larger networks due to the
large number of connections. Thus, SNMP relies on UDP (User Datagram Protocol) as
its transport protocol. Together with the MIB, SNMP provides a standard way to view
and alter network management information on hardware from multiple vendors. Any
monitoring or management application that uses SNMP can access MIB data on a
specified device. [Mauro, 2001]
READ and READ/WRITE are the two basic operation modes of the SNMP protocol.
While the READ/WRITE mode enables setting certain variables on the specified
device, the READ mode permits only reading the SNMP variables from a specified
device. When an agent is configured in READ/WRITE mode, the writable part of the
MIB can be set to include only a specific OID value. In this case, WRITE
access to other OID values is forbidden. Thus, it is possible to set limitations
in the MIB. [Wikipedia, 2013]
2.3.2 SNMP Messages Types
SNMP version 1, the initial version of the SNMP protocol, introduced five protocol
data units that are still supported in current versions of the protocol. The GET
REQUEST is used to retrieve the value of a variable or list of variables of a network
data object. The GETNEXT REQUEST does the same thing, except that it retrieves
the next value in the sequence of a data object after the one named in the request.
Agents send GET RESPONSE data units in reply to GET
REQUEST and GETNEXT REQUEST requests. The SET REQUEST data unit is sent by
management stations to set the value of a variable or list of variables on a specified
device. When agents want to notify management stations of events taking place,
they send TRAP messages asynchronously. SNMPv2 includes improvements
over SNMPv1 in the key areas of performance, security, confidentiality, and
manager-to-manager communications. GETBULK performs sequential requests more
efficiently by permitting a management station to request larger amounts of
management data at once rather than having to repeat a sequence of GETNEXT
requests. The INFORM message type was originally defined as another version of
TRAP that is acknowledged by the management station. SNMPv3 primarily added
cryptographic security and remote configuration to the protocol, making it the
preferred version to use. [Matt, 2006]
Message            Usage
GET REQUEST        Used by the manager to retrieve a specific piece of network information.
GETNEXT REQUEST    Used by the manager to iteratively retrieve a sequence of information.
GET RESPONSE       Used by the agent to send information to the manager in response to a request.
SET REQUEST        Used by the manager to initialize or change the value of a management object.
TRAP               Used by the agent to report an alert or other asynchronous event to the manager.
GETBULK            Introduced in SNMPv2 to retrieve a sequence of information as a faster alternative to GETNEXT.
INFORM             Introduced in SNMPv2, an acknowledged version of TRAP.
Figure 2: Characteristics of SNMP protocol in use: v1, v2c and v3 are given above.
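The efficiency difference between GETNEXT and GETBULK can be illustrated with a toy model that counts request/response round trips. This is only a sketch: it does not speak real SNMP, the in-memory table and interface names are invented, and the end of the data is signalled with `None` or a short chunk, whereas a real agent signals the end of the MIB view differently. The OID prefix 1.3.6.1.2.1.2.2.1.2 is the standard ifDescr column.

```python
from bisect import bisect_right

# Toy agent: a MIB view held as a sorted list of (OID, value) pairs.
# OIDs are tuples of ints, so tuple comparison gives the MIB ordering.
IF_DESCR = (1, 3, 6, 1, 2, 1, 2, 2, 1, 2)
TABLE = sorted((IF_DESCR + (i,), f"eth{i}") for i in range(1, 11))
OIDS = [oid for oid, _ in TABLE]


def get_next(oid):
    """Return the first (oid, value) strictly after `oid`, or None at the end."""
    idx = bisect_right(OIDS, oid)
    return TABLE[idx] if idx < len(TABLE) else None


def get_bulk(oid, max_repetitions):
    """Return up to `max_repetitions` successors of `oid` in one response."""
    idx = bisect_right(OIDS, oid)
    return TABLE[idx: idx + max_repetitions]


def walk_with_getnext(start):
    """Walk the table one row per request; return (rows, request count)."""
    rows, requests, oid = [], 0, start
    while True:
        requests += 1
        item = get_next(oid)
        if item is None:
            return rows, requests
        rows.append(item)
        oid = item[0]


def walk_with_getbulk(start, max_repetitions=5):
    """Walk the table several rows per request; return (rows, request count)."""
    rows, requests, oid = [], 0, start
    while True:
        requests += 1
        chunk = get_bulk(oid, max_repetitions)
        rows.extend(chunk)
        if len(chunk) < max_repetitions:  # a short chunk means the end was reached
            return rows, requests
        oid = chunk[-1][0]
```

For the ten-row table in this sketch, the GETNEXT walk needs eleven requests (ten rows plus the one that finds the end), while the GETBULK walk with five repetitions per request needs three.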
2.3.3 SNMP and UDP
Figure 3: A simplified SNMP architecture is given above.
SNMP uses UDP as the transport protocol for passing data between managers and
agents because it does not have the overhead of TCP (Transmission Control Protocol).
UDP was chosen over TCP because it is connectionless: there is no end-to-end
connection between agent and NMS when datagrams (packets) are sent back and
forth. This makes UDP unreliable, since there is no acknowledgment for lost
datagrams at the protocol level. The benefit of this unreliable nature is low overhead,
so the impact on the network's performance is reduced. If the NMS does not receive a
response, it simply assumes the packet was lost and retransmits the request.
Sequencing is not required because each request and each
response travels as a single datagram. The number of times the NMS retransmits
packets is also configurable. For requests, the unreliable nature of UDP is therefore
not a real problem, but the situation differs for traps: if an agent sends a trap and
the trap never arrives, the NMS has no way of knowing that it was sent. SNMP uses
UDP port 161 for sending and receiving requests, and agents send TRAP messages to
management stations on UDP port 162. [Kozierok, 2005]
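The timeout-and-retransmit behaviour described above can be sketched with a plain UDP socket. This is a minimal illustration under stated assumptions, not an SNMP implementation: the function name `request_with_retries` and its defaults are hypothetical, the payload is treated as opaque bytes, and a real manager would also match the request ID in the reply before accepting it.

```python
import socket


def request_with_retries(sock, payload, address, timeout=1.0, retries=3):
    """Send one datagram and wait for one reply, retransmitting on timeout.

    Returns the reply bytes, or None if every attempt timed out.
    """
    sock.settimeout(timeout)
    for _attempt in range(1 + retries):
        sock.sendto(payload, address)
        try:
            reply, _peer = sock.recvfrom(65535)
            return reply
        except socket.timeout:
            continue  # assume the datagram was lost and send it again
    return None
```

With a real socket the call might look like `request_with_retries(socket.socket(socket.AF_INET, socket.SOCK_DGRAM), encoded_request, (agent_ip, 161))`, where `encoded_request` and `agent_ip` are placeholders. The same approach cannot help with traps, because the manager never learns that a lost trap was sent.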
2.3.4 SNMP Management Information Base (MIB)
The MIB is a collection of the managed objects that make up the "management
information". Each agent has its own MIB. The NMS can read or write the managed
objects in an agent's MIB. The MIB defines, in a standard format, a set of
characteristics associated with each managed object, such as its OID (object
identifier), access rights and data type. The MIB organizes data in a tree structure.
Each node of the tree is related to a managed object and can be uniquely identified by
the path that starts from the root node. Each object in the MIB can therefore be
identified both by a string of numbers and by a text name; the string of numbers is
the OID of the managed object [Ipswitch, 2001].
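The relation between the numeric OID and the text name can be shown with a fragment of the standard OID tree. The sketch below stores only the path down to three objects of the system group; the nested-dictionary layout and the function name oid_to_name are choices made for this example, not part of any SNMP toolkit.

```python
# Each node maps a sub-identifier (int) to a pair (name, children).
OID_TREE = {
    1: ("iso", {
        3: ("org", {
            6: ("dod", {
                1: ("internet", {
                    2: ("mgmt", {
                        1: ("mib-2", {
                            1: ("system", {
                                1: ("sysDescr", {}),
                                3: ("sysUpTime", {}),
                                5: ("sysName", {}),
                            }),
                        }),
                    }),
                }),
            }),
        }),
    }),
}


def oid_to_name(oid, tree=OID_TREE):
    """Follow the numeric path from the root and return the dotted text name."""
    names = []
    children = tree
    for part in oid.split("."):
        name, children = children[int(part)]  # KeyError if the node is unknown
        names.append(name)
    return ".".join(names)


# prints iso.org.dod.internet.mgmt.mib-2.system.sysDescr
print(oid_to_name("1.3.6.1.2.1.1.1"))
```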
2.4 Functional Division of Network Management
The ISO has contributed a well-defined network management reference model for
network standardization. The OSI model breaks network management into five
functional divisions, often referred to as FCAPS (Fault, Configuration, Accounting,
Performance and Security), so that the major functions of network management systems
can be understood. These divisions are discussed in the next sections, based on
Shields (2007, p.5-8) and Parker (2005, p.4):
2.4.1 Fault Management
Fault management involves trouble management, which covers fault detection and
fault recovery for services, and proactive maintenance, which provides capabilities
for self-healing. Trouble management triggers alarms for network anomalies or
failures and performs diagnostic tests to isolate faults in hardware or in a service.
It not only triggers service repair but also carries out the measures needed to fix
the diagnosed fault. Proactive maintenance responds to near-fault conditions with
routine maintenance and fixes problems before service troubles are reported to the
NMS. The FCAPS model identifies twelve management tasks as important for a good
fault management system:
Fault detection
Fault correction
Fault isolation
Network recovery
Alarm handling
Alarm filtering
Alarm generation
Clear correlation
Diagnostic test
Error logging
Error handling
Error statistics
2.4.2 Configuration Management
Configuration management is concerned with resource provisioning and service
provisioning. It identifies, records and maintains the network configuration in order
to be able to update configuration parameters and to ensure normal network operations.
Configuration management, which deals with three kinds of networks (logical, service
and custom), involves the following management tasks:
Resource initialization
Network provisioning
Auto-discovery
Backup and restore
Resource shut down
Change management
Pre-provisioning
Inventory/asset management
Copy configuration
Remote configuration
Automated software distribution
Job initiation, tracking, and execution
2.4.3 Accounting Management
Accounting management processes and manipulates services related to user
management and administration. Moreover, accounting management creates and
verifies billing for the usage of network resources and services. The list below
summarizes the eight tasks that make up accounting management for monitoring tools:
Track service/resource use
Cost for services
Accounting limit
Usage quotas
Audits
Fraud reporting
Combine costs from multiple resources
Support for different accounting modes
2.4.4 Performance Management
Performance management deals with processes that ensure the reliability and quality
of network performance, judged by how well the network meets users’ service-level
goals. It includes the evaluation of vital performance measures such as network
throughput, resource utilization, delays, congestion level and packet loss, and
reporting when the quality of network resources falls below a certain level.
Performance management systems are responsible for the following tasks:
Utilization and error rates
Performance data collection
Consistent performance level
Performance data analysis
Problem reporting
Capacity planning
Performance report generation
Maintaining and examining historical logs
2.4.5 Security Management
Security management protects network resources, services and data against
unauthorized access and other security threats such as accidental abuse and
communication loss. In addition, it ensures user privacy and control over user access
privileges that derive from a range of access modes such as operations systems,
service provider groups and customers. The following activities are crucial for an
efficient security management system:
Selective resource access
Access logs
Data privacy
User access rights checking
Security audit trail log
Security alarm/event reporting
Take care of security breaches and attempts
Security-related information distributions
2.5 Choosing systems management tools
The non-technical factors that affect the choice of IT monitoring tool by a small or
medium-sized company are the following [Curry, 2008, p.7] [Drogseth, 2006, p.4, 6]
[Hale, 2012, p.11-12]:
Ease of use – not based on usability of demos, but based on usability of
implementation in a real world scenario.
Skills mandatory to implement the specifications versus skills available.
Specifications for and availability of user training.
Cost such as licenses, tin, evaluation time, maintenance and training.
Support – from supplier and/or communities.
Scalability.
Deployability – management server(s) ease of installation and agent deployment.
Reliability.
Accountability – the ability to sue / charge the dealer if expectations are not
reached
A prioritized list of basic requirements that meets Burgess’s (2005, p.3) expectations
is helpful, since a successful implementation of a network monitoring tool combines
the following specifications:
Open Source software
Very energetic forum / mail lists
Established history of community support and regular fixes and releases
Centralized, open database
Both Graphical User Interface (GUI) and Command Line Interface (CLI)
Easy deployment of agents
Scalability to several hundred devices
Adequate documentation
2.6 Network monitoring tasks
After analyzing the Network Management Functions of the FCAPS framework, the
monitoring functionalities for each of them are defined here. To support the
evaluation of the Nagios monitoring tool for small and medium organizations, the
monitoring functions derived from the findings of Section 2.4 (“Functional Division
of Network Management”), that is, the tasks an NMS should perform according to the
literature review and the criteria set by the network industry, are listed below.
These findings will be used as benchmarks when evaluating Nagios as a monitoring tool.
Fault and Performance functionalities and their important sub-functionalities are
presented in detail along with their relevant key metrics. Configuration, Accounting
and Security functionalities are mentioned only briefly: they influence the selection
of a monitoring tool, but they are outside the scope of this thesis.
Table 1: Fault monitoring tasks and their key metrics [MindShare Services, 2007]
Fault Monitoring
Tasks:
  Fault detection
  Fault correction
  Fault isolation (Network Mapping / graphs)
  Network recovery
  Alarm handling
  Alarm filtering
  Alarm generation
  Clear correlation
  Diagnostic test
  Error logging
  Error handling
  Error statistics
Key Metrics:
  Mean Time Between Failures
  Mean Time To Restore
  Network Uptime
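The three key metrics of Table 1 can be computed from a simple outage log. The sketch below assumes the common definitions (MTBF as operating time divided by the number of failures, Mean Time To Restore as downtime divided by the number of failures, uptime as a percentage of the observation window); the function name and the sample figures are invented for the example.

```python
def fault_metrics(period_hours, outages_hours):
    """Compute MTBF, MTTR and uptime percentage for one device.

    period_hours  -- length of the observation window
    outages_hours -- duration of each outage inside that window
    """
    failures = len(outages_hours)
    downtime = sum(outages_hours)
    uptime = period_hours - downtime
    if failures == 0:
        return {"mtbf": None, "mttr": None, "uptime_pct": 100.0}
    return {
        "mtbf": uptime / failures,      # mean operating time between failures
        "mttr": downtime / failures,    # mean time to restore service
        "uptime_pct": 100.0 * uptime / period_hours,
    }


# Invented example: a 30-day month (720 h) with three outages totalling 4 h.
metrics = fault_metrics(720, [2.0, 0.5, 1.5])
```

For this sample the device was down for 4 of 720 hours, which gives an uptime of about 99.44%.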
Table 1.1: Fault detection task and its sub-tasks
Task: Fault detection
Sub-Tasks:
  Passive fault management
  Active fault management
Table 1.2: Alarm / Event Generation task and its sub-tasks
Task: Alarm / Event Generation
Sub-Tasks:
  Sending an email message
  Sending an SMS message to a cell phone or pager
  Playing a sound or recorded message on the management workstation
  Logging the alert to the Network Event log
  Logging to a text file
  Sending a Syslog message
  Sending an SNMP trap
  Logging the alert to a Microsoft Windows event log
  Sending a Microsoft Windows Net-Message
  Executing an external program
  Executing a script
  Speaking an alert message using a text-to-speech engine
Table 1.3: Fault correction task and its sub-tasks
Task: Fault correction
Sub-Tasks:
  Device/service restart
  Reconfiguration
  Security action
Table 2: Configuration Monitoring tasks and their key metrics [The Configuration
Management Planning Group, 2013]
Configuration Monitoring
Tasks:
  Resource initialization
  Network provisioning
  Auto-discovery
  Backup and restore
  Resource shut down
  Change management
  Pre-provisioning
  Inventory/asset management
  Copy configuration
  Remote configuration
  Automated software distribution
  Job initiation, tracking, and execution
Key Metrics:
  MTTR Reduction
  Loss of Business Revenue
  Simple count on number that a configuration does not match held information
  The amount of elapsed time that passes from the approval of a change to the actual
  implementation of that change
  The number of components that are identified as “unauthorized”
Table 3: Accounting Monitoring tasks and their key metrics [Creanord, 2013]
Accounting Monitoring
Tasks:
  Track service/resource use
  Cost for services
  Accounting limit
  Usage quotas
  Audits
  Fraud reporting
  Combined costs from multiple resources
  Support for different accounting modes
Key Metrics:
  SLA Based resource allocation
  Trend Analysis
  Resource utilization
  Network inventory information for costing capacity planning
Table 4: Performance Monitoring tasks and their key metrics [Jain, 1991, p.40] [Benoit,
2007, p. 9-11]
Performance Monitoring
Tasks:
  Utilization and error rates
  Performance data collection
  Consistent performance level
  Performance data analysis
  Problem reporting
  Capacity planning
  Performance report generation
  Maintaining and examining historical logs
Key Metrics:
  Bandwidth Utilization
  Network Latency
  Interface Errors and Discards
  Network Hardware Resource Utilization (CPU load, memory usage, and buffer usage)
  Availability
Table 4.1: Performance data collection task and its sub-tasks [Shields, 2007, p.26]
Task: Performance data collection
Sub-Tasks:
  Input/output bits/second
  Current/average response time
  Peak traffic load
  Interface errors/discards
  Percent packet loss
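Two of the sub-tasks in Table 4.1 reduce to simple arithmetic on collected samples. The sketch below assumes that traffic is read from a 32-bit octet counter (as SNMP interface counters such as ifInOctets are) sampled twice, and that packet loss is derived from a count of probes sent and replies received; the function names and sample values are invented.

```python
COUNTER32_MAX = 2 ** 32


def bits_per_second(octets_before, octets_after, interval_seconds):
    """Rate from two samples of a 32-bit octet counter, allowing one wrap."""
    delta = (octets_after - octets_before) % COUNTER32_MAX
    return delta * 8 / interval_seconds


def percent_packet_loss(sent, received):
    """Share of probes that got no reply, as a percentage."""
    return 100.0 * (sent - received) / sent


# Invented samples taken 60 seconds apart.
rate = bits_per_second(1_000_000, 1_750_000, 60)          # 100000.0 bit/s
wrapped = bits_per_second(COUNTER32_MAX - 500, 250, 60)   # counter wrapped once: 100.0 bit/s
loss = percent_packet_loss(sent=20, received=19)          # 5.0 percent
```

The modulo step matters on busy links, where a 32-bit counter can wrap between two polls and a naive subtraction would yield a negative rate.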
Table 5: Security Monitoring tasks and their key metrics [PCI Security Standards
Council, 2010, p.8]
Security Monitoring
Tasks:
  Selective resource access
  Access logs
  Data privacy
  User access rights checking
  Security audit trail log
  Security alarm/event reporting
  Take care of security breaches and attempts
  Security-related information distributions
Key Metrics:
  Password policies
  Acceptable use policies
  Lockdown and access policies
  Mobile device access and lockdown policies
  Business data encryption policies
  Antivirus, anti-spam, anti-malware, and anti-spyware policies
  Security policy violation adjudication procedures
2.7 Comparison of Nagios against industry Standards
In the following tables (Table 6 to Table 10), the major monitoring tasks of FCAPS
are compared against the functionalities of Nagios Core 3.x [Silver, 2009, p.12]
[Gaur, 2003, p.6-8] [Curry, 2008, p.143-146] as a result of the literature study. In
addition, Table 11 shows that Nagios fulfils some other non-technical requirements
as they are posed in Section 2.5 of this chapter [Golden, 2007] [Rusalan, 2010,
p.7-8] [Nagios, 2013]. The conclusions from these two comparisons suggest that Nagios
is an ideal solution for small to medium organizations in terms of manpower.
Table 6: Compliance of Nagios with the Fault Monitoring standard of FCAPS.
Fault Monitoring
Tasks | Nagios | Comments
Fault detection | Yes (alarms, warning...) | Supports NRPE / NSClient. No SNMP TRAP handling. SNMP support V1, 2 & 3
Fault correction | Yes | Fast Event handlers allow automatic restart of failed applications and services
Fault isolation (Rootcause Analysis, Network Mapping / graphs) | | UNREACHABLE status for devices behind a network single point of failure. Also, host / service dependencies.
Network recovery | Yes | Via plugin (Nolio plug-in)
Alarm handling | Yes |
Alarm filtering | Yes | Escalation capabilities ensure alert notifications reach the right people
Alarm generation | Yes | email / pager notifications
Clear correlation | Yes |
Diagnostic test | Yes |
Error logging | Yes |
Error handling | Yes |
Error statistics | Yes | PNP4Nagios plug-in
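The UNREACHABLE status mentioned in Table 6 depends on the parent/child relationships defined between hosts. The sketch below is a simplified reading of that rule, not Nagios code: a host whose check fails is treated as DOWN if at least one of its parents is UP (or if it has no parents), and as UNREACHABLE if all of its parents are themselves DOWN or UNREACHABLE. The function name and the topology are invented for illustration.

```python
def classify_hosts(parents, responding):
    """Classify each host as UP, DOWN or UNREACHABLE.

    parents    -- dict: host -> list of parent hosts (closer to the monitoring server)
    responding -- set of hosts whose check succeeded
    The parent graph is assumed to contain no cycles.
    """
    states = {}

    def state_of(host):
        if host in states:
            return states[host]
        if host in responding:
            states[host] = "UP"
        else:
            host_parents = parents.get(host, [])
            # A failed host is blamed (DOWN) when some path to it still works;
            # it is UNREACHABLE when every parent is itself DOWN or UNREACHABLE.
            if not host_parents or any(state_of(p) == "UP" for p in host_parents):
                states[host] = "DOWN"
            else:
                states[host] = "UNREACHABLE"
        return states[host]

    for host in parents:
        state_of(host)
    return states


# Hypothetical topology: monitoring server -> switch -> router -> web server.
topology = {"switch": [], "router": ["switch"], "web": ["router"]}
states = classify_hosts(topology, responding={"switch"})
# router is DOWN (the actual fault), web is UNREACHABLE (hidden behind it)
```

Separating the two states is what lets the administrator be notified about the router alone instead of every device behind it.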
Table 7: Nagios connection with Configuration Monitoring standard of FCAPS.
Configuration Monitoring
Tasks | Nagios | Comments
Resource initialization | Yes |
Network provisioning | Yes |
Auto-discovery | Yes | Node discovery / Interface Discovery / Service (port) Discovery / Application discovery
Backup and restore | Yes | Stores configuration in flat files with simple format in a SQL database
Resource shut down | Yes |
Change management | Yes | Using Perl or PHP
Pre-provisioning | Yes |
Inventory/asset management | Yes | Via plug-in
Copy configuration | Yes |
Remote configuration | Yes | NRPE 2.15
Automated software distribution | No |
Job initiation, tracking, and execution | Yes |
Table 8: Nagios connection with Accounting Monitoring standard of FCAPS
Accounting Monitoring
Tasks | Nagios | Comments
Track service/resource use | Yes | Trending and Capacity planning add-ons ensure you are aware of aging hardware
Cost for services | Yes | Availability reports ensure SLAs are being met
Accounting limit | |
Usage quotas | Yes | Keeps a history of alerts and downtimes for all hosts and services checks by default
Audits | Yes |
Fraud reporting | Yes |
Combine costs from multiple resources | Yes |
Support for different accounting modes | Yes |
Table 9: Functionalities of the Performance Monitoring standard of FCAPS related to
the functionalities performed by Nagios in the same area.
Performance Monitoring
Tasks | Nagios | Comments
Utilization and error rates | Yes | Monitoring of network services and host resources
Performance data collection | Yes | PNP4Nagios plug-in
Consistent performance level | Yes | PNP4Nagios plug-in
Performance data analysis | Yes | PNP4Nagios plug-in
Problem reporting | Yes | PNP4Nagios plug-in
Capacity planning | Yes | PNP4Nagios plug-in
Performance report generation | Yes | PNP4Nagios plug-in
Maintaining and examining historical logs | Yes | Historical reports provide record of alerts, notifications, outages, and alert reports
Table 10: Connection between Security Monitoring standard of FCAPS against the related
functionalities of Nagios.
Security Monitoring
Tasks | Nagios | Comments
Selective resource access | Yes |
Access logs | Yes |
Data privacy | No |
User access rights checking | Yes | An administrator can prevent access to certain parts on a per-user or per-role basis
Security audit trail log | Yes |
Security alarm/event reporting | Yes |
Take care of security breaches and attempts | Yes |
Security-related information distributions | Yes |
Table 11: Non-technical requirements of a monitoring tool posed by industry
fulfilled by Nagios
Industry defined standards | Nagios
Open Source free software | Yes
Very active forum / mail lists | Yes
Established history of community support and regular fixes and releases | Yes
Centralized, open database | Yes
Easy deployment of agents | Yes
Scalability to several hundred devices | Yes
Adequate documentation | Yes
Ease of use | Yes
Skills necessary to implement the requirements versus skills available. | No
Requirements for and availability of user training | No
Cost | Minimum
Support (from supplier and/or communities) | Yes
Scalability | Yes
Deployability (management server(s) ease of installation and agent deployment) | Yes
Reliability | Yes
(Accountability – the ability to sue / charge the vendor if things go wrong) | No (only in Nagios XI)
Both Graphical User Interface (GUI) and Command Line Interface (CLI) | No
2.8 The selection of Nagios
Although the functionalities of Nagios Core 3.x listed in the literature review can be
applied in large companies, they are difficult to apply in a small company. The
network management requirements and expectations of a small organization are
different, due to the limited technical skills of the company’s staff. Using
monitoring tools that are financially affordable, easy to install and use, and able to
monitor all their resources is a priority for any company. [Zoho Corp, 2010]
Reid (2008), Ayadi (2013) and Curry (2008, p.148) argue that Nagios is the best
monitoring system for any small / medium size network. They claim that Nagios is
better than other monitoring tools because:
1. It has very low system requirements.
2. It has many plugins to use.
3. It supports SNMP, keeping monitoring simple.
4. Nagios can be installed and run in 15 minutes with basic configuration.
5. It has good built-in documentation.
6. It supports more network devices in the free version.
2.9 Validating the Literature review outcome
To answer RQ2, it is important to define the functionalities that an automated NMS
should perform. The outcome of the literature review (including RQ1 and RQ3 as well)
should be validated with questionnaires completed by professionals in the field.
Moreover, the level of consistency between Nagios’s functionalities and the FCAPS
framework should be defined. Analysis of the data collected with the questionnaires
will answer whether Nagios is the best available monitoring tool for a small / medium
organization. The questionnaire is presented in Appendix B and the list of the
participants is presented in Appendix A. More information about the research
methodology is included in Chapter 3.
3.0 Methodology
3.1 Overview
There are three kinds of research methodologies in software engineering: (1)
Qualitative methodology, which seeks to extract and analyze the required
information from books, papers, observation, interviews and web sources in order to
justify or improve a theory. (2) Quantitative methodology, which collects numerical
data and examines dependency relationships among variables with the use of
statistical methods. (3) Mixed methodology, which combines both types of research
methodology (qualitative and quantitative) in a single piece of research. [Bazeley,
2002, p.2]
The selection of the appropriate research methodology is important for the success
of a research project. In general, a combination of two or three data sources may be
most effective in achieving a particular research objective. To answer the research
questions of this Thesis a mixed research methodology is adopted. More specifically,
a triangulation approach is selected for cross-validating the results obtained by the
different research methods. Quantitative and qualitative data are collected
concurrently but they are analyzed and interpreted separately. Triangulation gives
the researcher the opportunity to mix quantitative and qualitative research
approaches within a stage of the research process. [Conrad C. and Serlin R, 2010,
p.155]
The qualitative method was used to answer RQ1, RQ2 and RQ3, based on the findings
of the literature review and on questionnaires completed by professionals in
managing network infrastructure, in order to verify the results obtained. Moreover,
the experiment, the most common quantitative method, was used to check results
against predefined metrics (benchmarks). The experiment approach was used to
answer RQ4. A post-test questionnaire will be completed by the six participants to
verify the results after the experiment. The technique used during the experiment
will be the TBFG (Task-Based Focus Group) technique, in which a set of tasks
(scenarios) is given to the participants for implementation, followed by a
discussion. The drawback of TBFG in comparison with Group usability, as Downey
(2007, p.141) mentions, is minimized in this Thesis by using professionals with long
careers in Network Management. Thus, empirical data is gathered without the need for
many observers. The results will be analyzed qualitatively by comparing and
displaying the data in Microsoft Excel. Table 12 below displays the methodology used
to answer each research question:
Table 12: Research questions with methodology employed
Research Question | Method(s)
RQ1, RQ2, RQ3 | Literature review + Interviews with professionals in network management domain. Quantitative survey.
RQ4 | Performance measurement + usability testing (quantitative analysis) based on empirical data collected from the experiment and the post-test questionnaire.
3.2 Experimental Laboratory
This method is selected because controlled laboratory experiments give researchers
the advantage of control. One of the three major purposes that laboratory experiments
serve is to test and refine existing theory. Furthermore, experiments can bridge the
gap between theory and real business problems. The art of designing good experiments
lies in creating simple environments that capture the essence of the real problem and
that can be interpreted with the support of the data collected. A good experiment
allows the researcher to distinguish clearly among possible explanations while
abstracting away all unnecessary details. The most important factor that makes
experimental work rigorous is theoretical guidance. To interpret the results of an
experiment, researchers need to be able to compare the data with theoretical metrics
(benchmarks). Thus, the first step in doing experimental work is to start with a
theory, such as the research questions of this thesis. [Katok, 2011, p.1-3]
3.3 Usability testing
A system may have excellent quality of use for some people and poor quality of use
for others. Many approaches to usability focus specifically on problems that users
face with a graphical interface. Although it is important to eliminate interface
problems, they can be a misleading indicator of overall usability. Usability depends
on the specific tasks people want to do when they use an application. Most users in
usability testing face several trivial problems rather than a single fatal problem
that causes a task to fail. The objectives of usability testing vary considerably,
depending on what is tested and why, so easy-to-use widgets alone may not give the
application an acceptable level of usability. In order to get reliable results, the
design of a usability test should include and evaluate wider usability requirements.
Therefore, usability may relate to the safe and efficient performance of specific
critical tasks by operators of the system. [Macleod, 1994]
The main purpose of a summative test of a complete product, with representative users
and designed tasks, is to evaluate usability via defined metrics rather than to
diagnose and correct specific design problems. The usability requirements should be
task-based and tied directly to product requirements in order to implement a usability
benchmark. [Usability Professionals Association, 2010]
Testing should include a number of measures (metrics), which can be grouped into four
categories, as suggested by Lewis (2006, p.7):
• Goal achievement indicators (such as success rate and accuracy)
• Work rate indicators (such as speed and efficiency)
• Operability indicators (such as error rate and function usage)
• Knowledge acquisition indicators (such as learnability and learning rate)
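Three of these indicator categories can be derived directly from recorded task attempts. The sketch below is one possible way to summarise such records, using invented data and an invented record layout (completed, seconds, errors); knowledge acquisition indicators would need repeated sessions and are not covered.

```python
from statistics import mean


def usability_summary(attempts):
    """Summarise task attempts.

    Each attempt is a dict with the keys 'completed' (bool),
    'seconds' (time on task) and 'errors' (count of user errors).
    """
    done = [a for a in attempts if a["completed"]]
    total_errors = sum(a["errors"] for a in attempts)
    return {
        # goal achievement indicator
        "success_rate": len(done) / len(attempts),
        # work rate indicator, computed over successful attempts only
        "mean_seconds": mean(a["seconds"] for a in done) if done else None,
        # operability indicator
        "errors_per_attempt": total_errors / len(attempts),
    }


# Invented results for one task performed by four participants.
attempts = [
    {"completed": True, "seconds": 120, "errors": 0},
    {"completed": True, "seconds": 180, "errors": 1},
    {"completed": False, "seconds": 300, "errors": 3},
    {"completed": True, "seconds": 150, "errors": 0},
]
summary = usability_summary(attempts)
```

For this sample, three of four attempts succeed, the successful attempts average 150 seconds, and there is one error per attempt on average.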
3.4 Performance measurement
Performance measurement is the basis of the usability engineering life-cycle for
assessing whether goals have been met or not. In traditional human factors research,
measurements are taken by having a group of users perform a predefined set of tasks:
Figure 4: Performance methodology flow
The objectives of usability evaluation are broken down into two components, as
presented in Figure 4. Next, their relative importance is evaluated based on goals
deriving from the research questions. Once the components of the goal have been
decided, it is necessary to quantify them by measuring the average time it takes a
user to complete a specified set of tasks (scenarios). The tasks selected for
evaluation are representative of users’ normal tasks in a working environment. This
technique will generally define the interaction between participants and the
application interface during the laboratory experiment, which will affect the
quantitative performance data. Performance evaluation will obtain quantitative data
from participants by measuring the time required for each task with a stopwatch. The
measured time will be reported by participants in the post-test questionnaire so that
the data are collected accurately without unexpected interference. [Nielsen, 1993,
p.193]
Applicable stage: test and deployment.
Personnel needed for the evaluation:
Usability experts: 2
Software developers: 0
Users: 6
Usability issues covered:
Effectiveness: Yes
Efficiency: Yes
Satisfaction: Yes
Can be conducted remotely: No
Can obtain quantitative data: Yes
3.5 Questionnaire
The questionnaire is the preferred method for collecting information about the three
research questions under investigation in this thesis. Closed-ended questionnaires
are easy to conduct, code and analyze. They permit comparisons and quantification,
and are more likely to measure degrees of difference with nominal, ordinal, interval
and ratio levels while avoiding irrelevant responses.
The basic principle is that the two questionnaires have to contain as many questions
as necessary and as few as possible, so they should be designed and formatted with
length as the main concern. The two questionnaires should be written in such a way
that test users cannot be identified, and the test results should be kept private. An
extensive understanding of the possible range of participant responses is required
due to the amount of data that is going to be processed. To achieve reliable and valid
outcomes, each question must be checked, edited and coded before being included in the
questionnaire, to ensure that each participant and test evaluator can understand its
meaning easily and accurately. For the same reason, questionnaires should be short and
simple.
The questionnaire design should be piloted to test whether any major defects exist.
The pilot phase is used to verify that the post-test questionnaire will provide useful
information. Concepts such as “Strongly disagree”, “Disagree”, “Neither agree nor
disagree”, “Agree”, and “Strongly Agree” require training and feedback to be
understood. Consequently, the test monitor (Moderator) should explain to participants
the meaning of the rating judgments of the post-test questionnaire in the pilot test.
The pilot test will detect unintelligible questions that produce unquantifiable
responses and unwanted outcomes before the main study begins. The purpose of the
experiment is to measure the performance of experienced users in a laboratory study.
Moreover, the ISO 9241 standard, part 11, defines usability in terms of effectiveness,
efficiency, and satisfaction. The post-test questionnaire is intended to measure the
usability of a software application with metrics and to collect additional data on
user satisfaction. Each participant will be asked to record in the post-test
questionnaire the time required to complete a task. If an error occurs, the test
monitor (Moderator) will ask users to repeat the task immediately. Then, users will
rate the difficulty of each task using the rating types mentioned above. Time
limitations influence the performance measurement of each task, so a Likert scale will
be adopted in the post-test questionnaire, since it is easy for users to complete. It
is common practice to establish a baseline for each question in order to measure the
success of Nagios in the evaluation phase. Baseline values will be given in Chapter 5.
[Dumas, 2010]
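The coding of Likert answers and the comparison with a baseline can be sketched as follows. The numeric coding from 1 to 5, the function name, the sample answers and the baseline value of 4 are all assumptions made for illustration; the baselines actually used belong to Chapter 5.

```python
from statistics import mean, median

# Assumed numeric coding of the five answer options.
LIKERT = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neither agree nor disagree": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}


def score_question(answers, baseline):
    """Code the answers numerically and compare their mean with a baseline."""
    scores = [LIKERT[answer] for answer in answers]
    return {
        "mean": mean(scores),
        "median": median(scores),
        "meets_baseline": mean(scores) >= baseline,
    }


# Invented answers from six participants to one question.
answers = ["Agree", "Strongly Agree", "Agree",
           "Neither agree nor disagree", "Agree", "Strongly Agree"]
result = score_question(answers, baseline=4)
```

Because Likert data are ordinal, the median is reported next to the mean; which of the two is compared with the baseline is a design decision for the evaluation.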
Table 13: Advantages and disadvantages of the face-to-face mode of delivery of a
questionnaire, as noted by Bird (2009, p.1313)
Advantages | Disadvantages
Complex questions can be asked. | Costly.
Can motivate participants. | Time consuming.
Longer verbal responses compared to written. | Spatially restricted.
Questions can be clarified. | Answers may be filtered or censored.
Question sequence controlled. | Interviewer’s presence may affect responses.
Vague responses can be probed. |
Visual prompts can be used. |
Long questionnaires sustained. |
High response rates. |
3.6 Validity threats related to questionnaires
Questionnaires have both strengths and weaknesses. The questionnaire is regarded as
the most objective research tool because it can provide generalizable results.
However, a large sample of questionnaire data can generate problems due to factors
such as faulty or biased questionnaire design, sampling errors and non-response
errors. Moreover, respondent unreliability, ignorance and misunderstanding, errors in
coding and faulty interpretation of results may cause additional problems. [Harris,
2010, p.1-2]
To improve the accuracy of testing, it is important to pay attention to the issues of
reliability and validity. Reliability is the question of whether one would get the
same result if the test were repeated. Large individual differences between test
users therefore influence the results. Validity is the question of whether the result
actually reflects the usability issues one wants to test, taking into consideration
factors such as the wrong users, the wrong time constraints or social influence
exerted by the tester. Whereas reliability can be addressed with statistical tests,
a high level of validity requires measurements of real products in real use outside
the laboratory evaluation. The simplest form of reliability test is the test-retest
procedure, in which the same unit is measured at two different times and the results
are then correlated. More robust measures focus on the extent to which all individual
items correlate with each other, that is, on internal consistency. There are many
accepted ways to measure internal consistency. The most widely used method is
Cronbach’s alpha, which evaluates the homogeneity of the individual items. Validity
cannot be assessed directly because the true values of the construct are not known.
[Larsen, 2008, p.1-2]
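Cronbach’s alpha can be computed directly from the item scores. The sketch below uses the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), with sample variances; the scores are invented and the function name is chosen for this example.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of respondent scores.

    All items must have been answered by the same respondents in the same order.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # one total per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Invented Likert scores: three questions answered by four respondents.
items = [
    [4, 5, 3, 4],
    [4, 4, 3, 5],
    [5, 5, 2, 4],
]
alpha = cronbach_alpha(items)  # 9/11, roughly 0.82
```

Values closer to 1 indicate that the items vary together, which is what is meant by homogeneity of the individual items.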
A questionnaire can never really be fully “validated”: it can have one kind of
validity but not another, and it can only be validated for a given population, under
given conditions, and so forth. In this thesis, one way to test the validity of the
questionnaire is to correlate its outcomes with the outcomes of the laboratory
experiment and with the results of the literature review.
There are numerous ways to specify validity, some of which were given by Howard
(2008, p.1) and are noted below:
• Reliability
• Validity
• Internal validity
• External validity
• Sensitivity
• Specificity
• Statistical validity
• Longitudinal validity
• Linguistic validity
• Discriminant validity
• Construct validity
3.7 Validity threats related to Experimental Laboratory and Usability testing
Laboratory experiments are used to address a wide range of research questions.
However, there are various concerns about whether laboratory findings are
“generalizable”, that is, “externally valid” in real markets. There is a debate about
whether lab studies can be changed to better reflect an external environment of
interest. Some definitions of external validity demand that the qualitative
relationship between two variables holds across similar environments. While it is
disputed whether the quantitative results of an experiment can be expected to be
externally valid, it cannot be guaranteed that even the qualitative results will show
external validity. [Kessler, 2011]
Internal and external threats to a laboratory experiment are mentioned below
[Heffner Media Group, 2003]:
3.7.1 Internal validity threats
Internal validity refers to the extent to which a study allows the elimination of
confounding variables within the study itself. There are eight major threats to
internal validity related to this Thesis, which are listed below.
History: History refers to environmental events that happen to participants outside
of the research study and that may affect or alter their performance. A special
announcement to students of different fields at New York College, on a specific day
and at a specific time, may have had an effect on the results obtained via the
laboratory experiment.
Maturation: Maturation refers to changes in participants over repeated trials during
the experiment: they may become bored, tired, disinterested or less motivated than
they were at the beginning of the testing. A way of overcoming this problem is to
design a short, meaningful experiment and post-test questionnaire.
Testing: This is a threat whether one group of participants evaluates one product or
two groups evaluate two products. It arises when the participants perform both the
pilot test and the actual test, because participants tend to perform better at any
task the more they are exposed to it. Changing the tasks between the pilot test and
the actual test minimized this threat in this thesis.
Statistical Regression: This refers to the tendency of subjects to move toward the
mean on subsequent testing even if no extra training is given. This implies that
average performance will be higher in a production environment than in an
experiment.
Instrumentation: The test evaluator should take care in measuring the performance of
all participants during the experiment and in validating the data collected from the
post-test questionnaire. Ignoring the performance and data given by participants
will lead to false results.
Selection: The sampling technique selected affects how representative the sample is,
and therefore whether the researcher can make statistical generalizations to a wider
population. The population of the lab experiment is selected based on
characteristics of certain types of users, such as career experience, Cisco
certification, SolarWinds certification, and post-graduate degrees.
Experimenter Bias: During testing, the test monitor (moderator) lets each participant
discover the solutions to the given scenario on his or her own, without any support.
In this way, the results obtained are more valid and accurate. It also enhances
users’ confidence and satisfaction, since they solve the given scenarios on their
own. The test monitor (moderator, Dr. Pandithas) has extensive knowledge of the
application, the user interface and the test method. However, Team Leaders may
sometimes be biased toward the desired results. Using a test evaluator who is
unaware of the anticipated results can reduce the impact of such bias. This thesis
uses its author as test evaluator, who determines the cause-effect relationship
among variables.
Mortality: Mortality refers to the situation where some participants drop out of an
experimental group. Imagine a case in which participants with unique characteristics
drop out of the experiment due to illness and only poorly motivated students remain
in the team. As a result, one team will perform better than the other, and this
situation could affect the outcome of the experiment. Because Nagios is tested
against predefined standards and not against other products, this threat is not
applicable to our experiment.
3.7.2 External Validity Threats
External validity refers to the conditions of a study which can lead a researcher to
incorrect generalizations. With respect to this threat, the study is performed on a
sample of the population which is not exactly representative of the actual
population of users. Consequently, the results of the experiment are not
generalized, because they are not the outcome of industrial practice with the tested
software.
Demand Characteristics: Making sure that participants do not know the real purpose
of the questions minimizes the possibility that they try to guess what the
researcher wants as an outcome and respond accordingly rather than truthfully.
Hawthorne Effects: The presence of the researcher during the experiment may change
the actual performance of participants during the testing. In order to minimize
this effect, participants will be told that the system is being tested and not
themselves.
Order Effects (or Carryover Effects): Order effects refer to the situation in which
some participants may have "learned" from the pre-test what the test tasks will be.
They would then no longer be representative of the population that has not been
pre-tested. Such a scenario makes the experiment useless unless the tasks in the
pilot test and the main test are different, which will be the case in this
experiment.
Treatment Interaction Effects: If the subjects are exposed to more than one
experiment / training on a network monitoring tool, then the findings about Nagios
performance and usability will be affected by that previous experience. Since
participants will probably have no experience in their careers with any monitoring
tool except Nagios, the percentage of participants familiar with other tools is
minimized.
3.8 Designing Rationale of the questionnaires
3.8.1 Importance of Design Rationale after Literature Review
This questionnaire for experts will collect data that crosscheck the findings from
the literature about Nagios’ functional and usability characteristics. Experts’
experience of this monitoring tool, whether good or bad, will help us to decide
whether Nagios meets the requirements of a small / medium organization. The
objective of each question within the two broad characteristics mentioned is to
provide information about significant aspects such as:
QUESTION 2: This question intends to capture expectations of event management &
control related to alarms in Nagios, in order to investigate the options available,
such as services’ status and nodes' status.
QUESTION 3: The respondents should provide meaningful answers about the
capability of setting and managing Alerts in Nagios in order to test its reporting
capability.
QUESTION 4: The respondents are asked whether Nagios kept them informed about faults
and their location in the network, in order to verify that Nagios properly reports
the root cause of a network error and pinpoints its exact location in the network.
QUESTION 5: This question is designed to learn whether Nagios meets the needs of a
trouble ticket system.
QUESTION 6: This question specifically measures the ability of Nagios to monitor and
report the health of the devices in the network (e.g. CPU heating, server room
temperature, etc.) so that the respondents can verify whether Nagios can do what it
claims.
QUESTION 7: This question looks at the threshold management of Nagios in order to
verify whether Nagios meets certain standards for this kind of management.
QUESTION 8: The respondents answer this question about the capability of Nagios to
measure and report different performance-related attributes, such as throughput,
delay and packet loss, in order to give a better understanding of the monitoring
tool.
QUESTION 9: This question asks whether Nagios can provide resource utilization
statistics to assist capacity planning, in order to establish that Nagios is able to
measure the availability of certain hosts, services and links.
QUESTION 10: This question was intentionally placed in the questionnaire in order
to investigate whether all the required activities in Nagios have effective logging.
QUESTION 11: We were also interested in identifying the fault and performance
management capabilities of Nagios by asking this question in case a major network
error comes up such as an error in configuration of a VPN connection.
QUESTION 12: This question tests the correlation between the beliefs of the experts
and the findings of the literature review regarding the notion that Nagios is easy
to learn from the beginning.
QUESTION 13: The respondents are asked to state whether monitoring tasks are
performed efficiently (quickly) with Nagios, based on the findings in the literature
review.
QUESTION 14: The respondents are asked to highlight any lack of documentation
and community support available for Nagios in order to deal with future training
demands for the personnel of a company.
QUESTION 15: The experts should give their opinion on whether the interface design
of Nagios (including how to interact with it, e.g. using keyboard, mouse and
commands) has weaknesses, which would imply that Nagios may have functional
usability disadvantages.
QUESTION 16: The experts are asked to express their satisfaction with the overall
functionality of Nagios and to indicate whether their beliefs agree with our
findings in the literature review.
3.8.2 Design Rationale of the Post Test Questionnaire
The post-test questionnaire (Appendix K) consists of eight questions. Through these
questions we asked the respondents to report on the experience gained from the task
scenarios listed in Appendix G. We consider that these eight questions show the
reasoning why Nagios is useful for a small / medium organization. Question 1 lets
respondents express how easy they found the Nagios installation. The responses to
question 2 provide information on respondents’ views of the availability reports in
Nagios. In question 3, the respondents considered how easy it is to configure Nagios
to monitor an NTP server. Question 4 addresses the importance of monitoring a
network host on a regular basis, which reflects the reasoning for selecting Nagios
as a monitoring tool. Question 5 addresses the respondents’ awareness of monitoring
network servers via Nagios. Since the respondents complete the task list, question 6
asks them to judge whether the on-screen information and the organization of the
Nagios menus are useful. Question 7 asks the respondents how easy Nagios is to use
for new users. Finally, question 8 investigates the overall architecture design of
Nagios, that is, how complex the monitoring tool is to configure in order to perform
any task.
4.0 Chapter 4: Development
4.1 Network Design
This chapter covers the implementation of the Nagios network management system.
It demonstrates how Nagios can monitor a number of network hosts and the associated
services located on these hosts. This goal is fulfilled by building a test network
and implementing Nagios object configuration files to monitor the network.
Scenario: An organization’s Headquarters is connected with its branch site via an
IPsec VPN. Moreover, the EIGRP routing protocol is configured between the sites by
implementing Generic Routing Encapsulation (GRE). The router in Headquarters is a
router-on-a-stick with the DLS1 switch, which is connected with the ALS1 and ALS2
switches. The HQ router assigns IP addresses to all three switches. The intent is to
prove that the Nagios monitoring tool is a suitable solution for the needs of a
small / mid-size enterprise.
Note: Although this project involves configuration of Network Address Translation
(NAT), IPsec VPNs, and GRE, a detailed explanation of those technologies is out of
scope.
Note: The telecommunication infrastructure required for this project consists of
Cisco 3745 routers with Cisco IOS Version 12.4(15)T14 and the Advanced IP Services
image C3745-ADVSECURITYK9-M.
Note: The image C3745-ADVSECURITYK9-M is also used in the three “switches”
(DLS1, ALS1 and ALS2). However, a switch card is added in GNS3 so that those three
routers act as actual “switches” for the purposes of the assignment. The only
difference from the actual configuration that will be tested in the NYC Campus is
that the ip route 0.0.0.0 0.0.0.0 [ip address] command will be used instead of the
ip default-network command.
4.1.1 Required Resources
The lab requires three routers, three switches, serial and console cables, and two
personal computers.
4.1.2 Topology Diagram
4.1.3 Addressing Table
Device   Interface             IP Address        Description
HQ       FastEthernet1/0.1     198.168.10.33     Connection to VLAN 1
HQ       FastEthernet1/0.100   198.168.10.65     Connection to VLAN 100
HQ       FastEthernet1/0.200   198.168.10.97     Connection to VLAN 200
HQ       Serial2/1             209.165.200.226   Connection to ISP
HQ       Loopback0             10.10.20.238      HQ email server address
HQ       Loopback1             10.10.10.1        Connection to DNS
HQ       Tunnel0               172.16.100.1      Connection to Branch
Branch   Serial2/1             209.165.200.242   Connection to ISP
Branch   Loopback1             192.168.1.1       Branch LAN
Branch   Tunnel0               172.16.100.2      Connection to HQ
ISP      Serial2/0             209.165.200.241   Connection to Branch
ISP      Serial2/1             209.165.200.225   Connection to HQ
ISP      Loopback1             209.165.202.129   Simulating the internet
DLS1     Vlan1                 198.168.10.34     Connection to VLAN 1
DLS1     Fa2/4                 -                 Connection with Nagios / VLAN 100 / IT
DLS1     Fa2/9                 -                 Connection with ALS2 via etherchannel 2
DLS1     Fa2/10                -                 Connection with ALS2 via etherchannel 2
DLS1     Fa2/11                -                 Connection with ALS1 via etherchannel 1
DLS1     Fa2/12                -                 Connection with ALS1 via etherchannel 1
DLS1     Fa2/0                 -                 Connection to HQ
ALS2     Vlan1                 198.168.10.36     Connection to VLAN 1
ALS2     Fa1/7                 -                 Connection with ALS1 via etherchannel 3
ALS2     Fa1/8                 -                 Connection with ALS1 via etherchannel 3
ALS2     Fa1/9                 -                 Connection with DLS1 via etherchannel 2
ALS2     Fa1/10                -                 Connection with DLS1 via etherchannel 2
ALS2     Fa1/15                -                 Connection with VLAN 200 - USERS
ALS1     Vlan1                 198.168.10.35     Connection to VLAN 1
ALS1     Fa1/7                 -                 Connection with ALS2 via etherchannel 3
ALS1     Fa1/8                 -                 Connection with ALS2 via etherchannel 3
ALS1     Fa1/11                -                 Connection with DLS1 via etherchannel 1
ALS1     Fa1/12                -                 Connection with DLS1 via etherchannel 1
ALS1     Fa1/9                 -                 Connection with VLAN 200 - USERS
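The addressing plan can be sanity-checked with a short script. The sketch below is illustrative only: the addressing table does not state subnet masks, so the /27 prefix length is an assumption inferred from the spacing of the HQ sub-interface addresses (.33, .65, .97), and the names ASSUMED_PREFIX, HQ_SUBINTERFACES and SWITCH_MGMT are hypothetical.

```python
import ipaddress

# Assumption: the table gives no subnet masks; /27 is inferred from the
# spacing of the HQ sub-interface addresses and may differ from the lab.
ASSUMED_PREFIX = 27

# HQ router-on-a-stick sub-interfaces, copied from the addressing table.
HQ_SUBINTERFACES = {
    "FastEthernet1/0.1": "198.168.10.33",
    "FastEthernet1/0.100": "198.168.10.65",
    "FastEthernet1/0.200": "198.168.10.97",
}

# Management addresses of the three switches on VLAN 1.
SWITCH_MGMT = {
    "DLS1": "198.168.10.34",
    "ALS1": "198.168.10.35",
    "ALS2": "198.168.10.36",
}


def network_of(address, prefix=ASSUMED_PREFIX):
    """Return the network that contains `address` for the assumed prefix."""
    return ipaddress.ip_interface(f"{address}/{prefix}").network


def vlan_networks():
    """Map each HQ sub-interface to the network it would serve."""
    return {name: network_of(addr) for name, addr in HQ_SUBINTERFACES.items()}


def switches_outside_vlan1():
    """List switches whose management address is outside the VLAN 1 network."""
    vlan1 = network_of(HQ_SUBINTERFACES["FastEthernet1/0.1"])
    return [
        name
        for name, addr in SWITCH_MGMT.items()
        if ipaddress.ip_address(addr) not in vlan1
    ]
```

Under the assumed prefix, each sub-interface falls in its own network and all three switch management addresses share the VLAN 1 network with the HQ gateway; a different mask would change both results.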
4.1.4 Network Implementation
The lab is implemented with GNS3, which makes it possible to set up virtual Cisco
routers and switches that run actual Cisco IOS software. A further advantage of GNS3
is that virtual PCs created with Oracle VM VirtualBox can be inserted into it. Thus,
an actual host running Ubuntu or Windows can be added. For the requirements of this
lab, Nagios is implemented on Ubuntu, from where it can connect to the network in
order to monitor hosts and services. A detailed explanation of the network
implementation is outside the scope of this chapter; it is presented in detail in
Appendix E, together with the Nagios configuration options.
Step 1: Insert Cisco IOS and VirtualBox into GNS3.
Before any configuration is made on the routers and switches, the VirtualBox guests
have to be inserted. This is done by selecting Edit ---> Preferences ---> VirtualBox
in GNS3 and pressing “Test Settings”. The message “vboxwrapper and virtualbox A.P.I
4.3.8 has successfully started” then appears, indicating that a virtual machine can
now be inserted into GNS3. Pressing “Apply” completes the procedure.
On the “VirtualBox Guest” tab, the VirtualBox virtual machines to be inserted into
GNS3 can be defined. Pressing “Apply” completes this procedure.
The desired VirtualBox guest can then be selected from the “End devices” panel by
dragging and dropping it.
The next step is to insert the Cisco IOS image that will be used by the virtual
routers and switches of the lab. With the “IOS Images” tab selected under “IOS
images and hypervisors”, press the “...” button next to “Image file:” to locate the
Cisco IOS image on the hard disk; pressing “Test Settings” and “Save” completes the
procedure.
Step 2: Set up the routers by configuring their hostname and interface
addresses.
A. Assign the network cards to the router and “switches” as presented in the
following screenshots.
B. Cable the network as presented in the topology diagram. Assign IP addresses to
the interfaces on Branch, HQ, and ISP.
C. Examine the status of the interfaces with the show ip interface brief command.
D. Apply a default static route on the Branch and HQ routers in order to reach the
ISP router.
E. Verify connectivity with ping from the Branch LAN interface to the serial 2/1
interface of the ISP, the ISP’s loopback interface, and the serial 2/1 interface of
the HQ.
F. Verify connectivity from the Branch router to the ISP’s serial 2/1 interface, the
ISP’s loopback interface, and the HQ serial 2/1 interface. Initiate pings sourced
from the loopback interface to see whether they reach those external addresses.
The pings fail because the source IP address 192.168.1.1 is an internal private
address, and the ISP is unaware of this address.
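The reason for the failed pings can be illustrated with Python's standard ipaddress module. This is a minimal sketch rather than part of the lab configuration: the addresses are taken from the addressing table, while the names PING_SOURCE, ISP_TARGETS and is_private are hypothetical.

```python
import ipaddress

# Addresses taken from the addressing table.
PING_SOURCE = "192.168.1.1"  # Branch Loopback1 (Branch LAN)
ISP_TARGETS = {
    "ISP Serial2/0": "209.165.200.241",
    "ISP Loopback1": "209.165.202.129",
}


def is_private(address):
    """True if the address lies in a range reserved for private use."""
    return ipaddress.ip_address(address).is_private


def report():
    """Return one line per address stating whether it is privately addressed."""
    rows = [("Branch LAN source", PING_SOURCE)] + list(ISP_TARGETS.items())
    return [f"{name} {addr}: private={is_private(addr)}" for name, addr in rows]
```

The source address falls in the private 192.168.0.0/16 range while the ISP addresses do not, so the ISP router holds no route back to the source and the echo replies cannot return.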
Step 3: Apply NAT on the Branch and HQ routers
The HQ and Branch sites have been supplied by the ISP with pools of public addresses
so that hosts with private IP addresses can access the web by using NAT. A static
NAT entry has to be configured at the HQ site so that the email server, with the
public IP address 209.165.200.238, is available to mobile users and Branch office
users. The commands show ip nat statistics and show ip nat translations can be used
to confirm the configuration of NAT. Verify that NAT traffic exists by pinging the
ISP’s serial 2/1 interface, the ISP’s loopback, the HQ serial 2/1 interface and the
HQ public email server address, using the Loopback interface of the Branch as the
source address.
Once again, the commands show ip nat statistics and show ip nat translations
verify whether NAT operates properly. Before verifying connectivity from the Branch
LAN to the HQ LAN interface, the NAT translations have to be cleared. Then, the
command show ip nat translations is used to display any NAT translations.
Branch# clear ip nat translation *
Branch#
Branch# ping 10.10.10.1 source 192.168.1.1
Branch# show ip nat translations
Branch#
The ISP cannot route traffic from the Branch LAN to the private addresses of the HQ
router, so NAT alone does not provide this connectivity. The solution to this
problem is the IPsec VPN.
Step 4: Configure an IPsec VPN to connect the Branch and HQ routers.
For this assignment, an IPsec VPN configuration has been provided in order to
secure and protect all unicast IP traffic within it. Additional configuration is
required if interior gateway protocols that rely on multicast or broadcast traffic
must be encapsulated within IPsec VPN unicast packets. The configuration of the
IPsec VPN on the Branch router can be verified with the show crypto session detail
command.
Step 5: Implement GRE over IPsec.
A GRE tunnel over IPsec will protect all corporate LAN traffic between the Branch
and HQ sites. The GRE tunnel can carry multicast and broadcast traffic for dynamic
routing. The show interface tunnel 0 command verifies that the tunnel is active and
that the tunnel protocol is GRE over IP.
Step 6: Apply VLAN trunking on the Fast Ethernet interface of the HQ router.
Implement three sub-interfaces for the three intended VLANs. Configure each
sub-interface with the proper trunking protocol, description and IP address. The
show ip interface brief command checks the status and configuration of the
interfaces.
Step 7: Configure basic switch parameters
A. Set a username and password for privileged mode and use the same username and
password for line vty and line console.
B. Assign the management IP addresses on VLAN 1 and set the default gateway on all
three switches: ALS1, ALS2 and DLS1.
Step 8: Configure DLS1 for trunking with the HQ router
Configure switch DLS1 interface fast Ethernet 2/0 for trunking with the HQ router
Fast Ethernet interface 1/0.
Step 9: Configure trunks and EtherChannels between switches
Define the EtherChannel and the trunk ports:
A. From DLS1 to ALS1.
B. From DLS1 to ALS2.
C. From ALS1 to DLS1.
D. From ALS1 to ALS2.
E. From ALS2 to DLS1.
F. From ALS2 to ALS1.
G. Use the show interface trunk command to confirm that trunking is enabled on
DLS1, ALS1 and ALS2.
Step 10: Configure DHCP pools and define DHCP excluded-addresses on HQ
router.
Step 11: Configure VTP on ALS1, ALS2 and DLS1
Step 12: Configure Ports and verify port status
Step 13: Verifying if the two DHCP pools are working
Step 14: Configuring SNMP & related Access-lists on Router / Switches
Finally, SNMP is configured on the routers / switches in order to allow Nagios to
retrieve their data, so that the data can be displayed by Nagios. Additionally, it
is vital to set up access lists on each router / switch so that Nagios has the
privileges to acquire these data.
Step 15: Configuring HQ as NTP server
4.2. Active Monitoring
Nagios directly monitors the services of each agent, or the agent itself, by using
plugins. This type of monitoring is called active. Plugins are invoked by the
monitoring logic whenever the state of a host or service has to be checked; this
logic, in turn, is used by the Nagios daemon to obtain the required information.
Nagios has an embedded Perl interpreter for running plugins; a plugin is in most
cases a shell script that inspects the status of a host or service. The Nagios
daemon sends notifications when it receives the results of the checks from the
plugins. The check_interval and retry_interval directives specify the frequency of
these checks, which determine the status of hosts and services. The steps to be
followed for configuration are:
Step 1: Identify the network that needs monitoring.
Step 2: Collect the IP address of each device, since Nagios pings IP addresses.
Step 3: Identify the network services that will be monitored by Nagios.
Step 4: Implement the configuration files for every agent which represents a host
or service and name every related file with the extension .cfg.
Step 5: Each service should be defined in the command.cfg file and every host should
also be defined in the nagios.cfg file.
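Step 4 can be sketched as a small generator of Nagios object definitions. This is an illustration under stated assumptions, not the configuration used in the thesis (which is in Appendix E): the host names and addresses come from the addressing table, whereas the templates generic-switch and generic-service, the alias strings, the check_ping thresholds and the interval values are assumptions modelled on the sample configuration shipped with Nagios.

```python
# Hosts and addresses come from the addressing table; the template names,
# aliases, thresholds and intervals below are illustrative assumptions.
HOSTS = [
    ("DLS1", "198.168.10.34"),
    ("ALS1", "198.168.10.35"),
    ("ALS2", "198.168.10.36"),
]


def render_host(host_name, address, template="generic-switch"):
    """Render one Nagios `define host` object as text."""
    lines = [
        "define host {",
        f"    use         {template}",
        f"    host_name   {host_name}",
        f"    alias       {host_name} switch",
        f"    address     {address}",
        "}",
    ]
    return "\n".join(lines)


def render_ping_service(host_name, check_interval=5, retry_interval=1):
    """Render a PING service whose check frequency is set explicitly."""
    lines = [
        "define service {",
        "    use                  generic-service",
        f"    host_name            {host_name}",
        "    service_description  PING",
        "    check_command        check_ping!200.0,20%!600.0,60%",
        f"    check_interval       {check_interval}",
        f"    retry_interval       {retry_interval}",
        "}",
    ]
    return "\n".join(lines)


def render_all(hosts=HOSTS):
    """Return the text of one .cfg file covering every host and its PING check."""
    blocks = []
    for host_name, address in hosts:
        blocks.append(render_host(host_name, address))
        blocks.append(render_ping_service(host_name))
    return "\n\n".join(blocks) + "\n"
```

Writing the result of render_all() to a file such as switches.cfg (a hypothetical name) would give one object file per device group; the file still has to be referenced from the main Nagios configuration before the daemon reads it.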
A detailed explanation of the configuration files for the identified hosts and
services of this assignment is given in Appendix E. The configuration files built
for monitoring are:
HQ
Branch
ISP
DLS1
ALS1
ALS2
NTP Server
Telnet
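The way a plugin reports a check result to the Nagios daemon can be sketched as follows. Plugins signal the state through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) together with one line of status text. The function names, the PACKETLOSS label and the threshold values are hypothetical and serve only to illustrate the convention.

```python
# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measurement to a plugin state; higher values are worse."""
    if value is None:
        return UNKNOWN  # no measurement could be taken
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def plugin_output(label, value, warn, crit):
    """Return (exit_code, status_line) as a plugin would report them."""
    state = evaluate(value, warn, crit)
    shown = "no data" if value is None else f"{value}"
    return state, f"{label} {STATE_NAMES[state]} - value={shown}"


def main():
    """Hypothetical reading: 35% packet loss against 20% / 60% thresholds."""
    code, text = plugin_output("PACKETLOSS", 35, warn=20, crit=60)
    print(text)
    return code  # a real plugin would finish with sys.exit(code)
```

The daemon runs such a check every check_interval while the state is OK and every retry_interval once a problem is detected, and it decides on notifications from the returned state.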
4.3. Passive Monitoring
Passive monitoring is required in the IT industry when private information about a
server / PC host, such as its number of users, its load, and its total number of
processes, cannot be retrieved remotely. The steps should be undertaken by
authorized personnel when a server / PC host does not meet specific performance
criteria. This procedure is necessary because active monitoring only checks the
services running on the hardware; the data above cannot be retrieved using the
TCP/IP protocol alone. Thus, a daemon has to be installed on the client side, which
requires administrative privileges. The client runs Windows, so the NSClient++
agent should be running on it. The detailed configuration of passive monitoring in
Nagios can be seen in Appendix E.
5.0. Chapter 5: Evaluation
5.1. Network Elements that Need Monitoring
Small / midsize organizations continuously monitor several elements of their
networking infrastructure, as outlined below [Zoho Corp, 2010, p.1-2]:
• Email Servers: IT Managers should ensure business continuity with the external
world via an email server, because the lack of an email distribution system may
lead the organization to financial loss. Key metrics for an email server are
availability, mails in queue and size of received emails.
• WAN links: An organization runs smoothly in terms of network performance when its
WAN links are not over-utilized. A network monitoring tool should detect
congestion, high response time and potential discards. Even though optimizing the
WAN links is crucial, IT Managers also have to set thresholds on routers and
switches in order to ensure the availability and performance of their LAN
interfaces.
• Business Applications: Services such as FTP, DNS, ECHO, IMAP, LDAP, TELNET, HTTP
and POP run on critical applications. Therefore, these services and their
applications should be monitored along with CPU, memory and disk space.
Furthermore, the traffic utilization of servers has to be monitored, as well as
the applications and services located on them.
• LAN Infrastructure: Network devices such as switches, printers and wireless
devices are core elements of an organization’s network. Therefore, they should be
operational.
Having clarified which network infrastructure requires monitoring, the seven stages
of the evaluation phase, for which the project manager of this thesis is
responsible, are listed below [Kantner, 1994, p.3]:
• Planning the test.
• Designing the test activities.
• Recruiting participants.
• Preparing the test materials.
• Setting up the test environment.
• Conducting the test.
• Compiling the test results.
5.2. Planning the Test
The “Planning the test” stage defines the goals, methodology, participant selection
requirements, working procedure, schedule, and resource requirements for the test
session. The network topology is also defined at this stage, and Ubuntu and the
Nagios server have to be installed in advance. More information is available in
Appendices C and D.
Six participants were involved individually in the test, requiring about one hour per
session over one day. Moreover, 15 minutes of extra time were allowed for
participants to fill out the post-test questionnaire. A formal break of 15 minutes was
available between the participants’ sessions.
Team members introduced the task scenarios to the participants, who were to
implement and solve them. The participants were essentially left to accomplish the
requested tasks by themselves. A time limit was not specified, but the participants
were encouraged to try to solve all tasks without help from the team members. More
information is presented in the section “Experimenter Bias” on page 40.
The test monitor used a laptop to write down any significant comments such as task
completion date and whether the tasks were completed successfully. After all sessions
were completed, the test evaluator analyzed all data that were extracted from the test
sessions. Finally, a master list of usability issues was developed based on those data.
5.3. Designing the Test Activities
One of the outcomes of this stage was a task list that described the tasks and
relevant issues in detail. Each task should be completed within a predefined time
limit. At that time, team members designed the post-test questionnaire for the
participants so that screening could start right away. This questionnaire was
created on the basis of the objectives of the testing, in order to work in
conjunction with the findings of the test sessions. Next, team members reviewed the
design of the post-test questionnaire and the task list. The purpose of this was to
determine technical priorities for Nagios. More information about the post-test
questionnaire is available in Appendix K.
5.4. Recruiting Participants
The selection of participants runs in parallel with “Designing the Test Activities”
and “Preparing the test materials”. All participants are students at New York
College. More information is available in section 3.7.1, “Selection”, on page 40. The
profile of participants, which was based on their academic and professional
background, is available in Appendix F. Dr. Pandithas served as test monitor
(moderator) in all six test sessions. The author of this paper acted as test evaluator by
compiling the results of the tests.
5.5. Preparing the test materials
A “Welcome” form was presented to participants by the test monitor (moderator) in
order to minimize possibilities of misunderstanding and in order to explain the
purpose of test sessions. More information is available in Appendix G. A convenient
form was available to participants for recording quick notes about tasks after the test.
It was ensured that important topics were noted consistently without the need of
viewing videotapes. More information is available in Appendix I. Five tasks were
developed for participants to perform during the test sessions. All the tasks remained
the same in all sessions. More information is available in Appendix H.
5.6. Setting up the test environments
A state-of-the-art usability laboratory (Room B6 of New York College) was
customized for the usability test sessions. This room was chosen because the
participants, who are students at New York College, would feel comfortable in that
room. Telephones had to be deactivated during test sessions in order to prevent any
distractions. More information is available in the “History” paragraph of section
3.7.1 on page 40.
Participants were informed about the time and place of the test sessions so that
they would be available to attend throughout the sessions. Team members (test
monitor and test evaluator) informed their co-workers, colleagues and friends that
they would be unavailable during the test sessions. The sessions were scheduled to
allow some break time between the six participants.
5.7. Conducting the test
Team Leaders ensured that all materials in each envelope were labeled with the
participants’ IDs. It was ensured that nothing was removed from the participants’
envelopes under any circumstances. The post-test questionnaire gave participants
the opportunity to categorize any identified problems in the “Severity” column as
‘Important’, ‘Medium’ or ‘Minor’. Moreover, they were encouraged by the Team Leaders
to indicate in the post-test questionnaire where Nagios encountered a problem and
to explain how it influenced the completion of the task. The test evaluator
informed the participants in line with the appropriate code of conduct. More
information about the personal characteristics of the participants recorded in the
post-test questionnaire is available in Appendix K.
5.8. Compiling the test results
Usability can be improved by acting on the final results, which are set out in a
detailed test report presented in Chapter 7. The data derived from the questionnaire
responses were categorized based on a list of the usability problems reported in
the test sessions, which are presented in Chapter 6. The author of this thesis
conducted the analysis of the data, which required approximately 20 working hours
in total.
Task Completion
The six participants completed an average of 4 out of 5 tasks. Tasks 1, 2 and 3 were
completed by all participants. Task 4 was the most difficult, since only two of the
six participants accomplished it. Finally, Task 5 caused difficulties for two out of
six participants.
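The reported average can be reproduced from the per-task figures. The short sketch below assumes that the two participants who had difficulties with Task 5 are counted as not having completed it; the source does not state this explicitly, and the names used are hypothetical.

```python
PARTICIPANTS = 6

# Completions per task as reported above. Assumption: the two participants
# who had difficulties with Task 5 are counted as not completing it.
COMPLETED = {
    "Task 1": 6,
    "Task 2": 6,
    "Task 3": 6,
    "Task 4": 2,
    "Task 5": PARTICIPANTS - 2,
}


def average_tasks_per_participant(completed=COMPLETED, participants=PARTICIPANTS):
    """Average number of tasks completed by one participant."""
    return sum(completed.values()) / participants


def completion_rates(completed=COMPLETED, participants=PARTICIPANTS):
    """Share of participants who completed each task."""
    return {task: count / participants for task, count in completed.items()}
```

Under that assumption there are 24 completions across six participants, which gives the stated average of 4 tasks per participant.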
Task Completion Time
Team Members recorded the time that each participant spent on completing each
task. The results of the test sessions are presented in Table 14, which reports the
time spent on each task.
Table 14: Average time spent by all participants on each task against a
predefined baseline.
Tasks        Total Time   Baseline
Task 1       12 minutes   14 minutes
Task 2       6 minutes    7 minutes
Task 3       2 minutes    2 minutes
Task 4       12 minutes   12 minutes
Task 5       7 minutes    10 minutes
Cumulative   39 minutes   45 minutes
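The cumulative row of Table 14 can be checked with a few lines of arithmetic. The values are copied from the table; the function names are hypothetical.

```python
# (measured, baseline) minutes per task, copied from Table 14.
TIMES = {
    "Task 1": (12, 14),
    "Task 2": (6, 7),
    "Task 3": (2, 2),
    "Task 4": (12, 12),
    "Task 5": (7, 10),
}


def totals(times=TIMES):
    """Return (total measured minutes, total baseline minutes)."""
    measured = sum(m for m, _ in times.values())
    baseline = sum(b for _, b in times.values())
    return measured, baseline


def savings(times=TIMES):
    """Minutes saved against the baseline, per task."""
    return {task: b - m for task, (m, b) in times.items()}
```

The totals come to 39 and 45 minutes, matching the table, so the participants were on average 6 minutes faster than the baseline, with Tasks 3 and 4 exactly on it.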
Number of Usability Problems Identified
Two usability problems were identified in the six usability test sessions. Based on
the “Severity” column in Appendix I and on the descriptions of the errors, the
errors were categorized as Important, Medium or Minor.
Table 15: Number of usability problems identified during the testing
Tasks        Important   Medium   Minor   Cumulative
Task 1       0           0        0       0
Task 2       0           0        0       0
Task 3       0           0        0       0
Task 4       1           0        0       1
Task 5       1           0        0       1
Cumulative   1           0        0
Note: By default, Command.cfg contained no command definition for check_ntp,
check_ntp_time and check_ntp_peer, as it did for other plugins. The web interface of
Nagios presented a critical error, although the service for monitoring the NTP Server