The document discusses best practices for building a network operations center (NOC). Some key points:
- A NOC monitors and controls network activity from one or more locations. Early versions date back to the 1960s when AT&T opened centers to monitor switches and routes.
- Modern NOCs use network monitoring software and sophisticated systems to detect issues across multiple layers of the network before they impact the business.
- Maintaining skilled staff, efficient processes, integrated tools, automation, and a focus on performance, security and being proactive are characteristics of an effective NOC.
2. Network Operations Center
A network operations center (NOC, pronounced like the word knock), also known as a
"network management center", is one or more locations from which network
monitoring and control, or network management, is exercised over
a computer, telecommunication or satellite network.
History
Early versions of NOCs have been around since the 1960s. A Network Control Center was opened in
New York by AT&T in 1962 that used status boards to display switch and route information, in real-
time, from AT&T's most important toll switches. AT&T later replaced their Network Control Center
with a NOC in 1977 in Bedminster, New Jersey.
AT&T revamped and modernized the NOC in 1987, adding a 75-screen video wall where computer-
driven support systems provided information on multiple layers and categories of network activity.
Managers used computer systems and terminals to find detailed information on any switch or
route in the network. They then used those same systems to issue instructions to any place in the
network.
Global Network Operations Center
AT&T’s system had become a Worldwide Intelligent Network. Two regional control centers, in
Denver and Conyers, Ga., opened in 1991, and assumed the task of monitoring and managing the
flow of traffic onto and off of the network.
In 1999, AT&T replaced the NOC with a new Global Network Operations Center, to better meet
the needs of the 21st century.
Satish Chavan
3. Network Operations Center - Purpose
In telecommunication environments, NOCs are responsible for monitoring power
failures, access network, connectivity, communication equipment alarms and other
performance issues that may affect the telecom network and services.
A NOC is usually staffed 24×7 with personnel who continuously monitor for outages,
faults, critical events, and abnormalities with the network. These events are reported by
sophisticated network monitoring software installed on the network or on the individual
devices being monitored. At fixed time intervals, each device on the network checks in
with a central manager to provide vital statistics on its health. This proactively ensures
that problems with the network are detected and fixed before they can cause significant
impact on the business. NOC work requires a high level of expertise and understanding
of various technology platforms.
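The periodic check-in described above can be sketched as a staleness test on the central manager. This is a minimal illustration, not a description of any particular monitoring product: the 5-minute interval, the device names, and the function `find_stale_devices` are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical check-in interval; real deployments choose their own value.
CHECK_IN_INTERVAL = timedelta(minutes=5)


def find_stale_devices(last_seen, now, missed_intervals=2):
    """Return devices whose last check-in is older than `missed_intervals` intervals."""
    cutoff = now - CHECK_IN_INTERVAL * missed_intervals
    return sorted(name for name, seen in last_seen.items() if seen < cutoff)


# Example data (hypothetical device names and timestamps).
now = datetime(2024, 1, 1, 12, 0)
last_seen = {
    "core-router-1": now - timedelta(minutes=2),
    "edge-switch-7": now - timedelta(minutes=14),
}
stale = find_stale_devices(last_seen, now)
print(stale)  # ['edge-switch-7']
```

A device that stays silent for two intervals is flagged so that an operator can investigate before users notice an outage.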
5. Network Operations Center - Operations 1
NOC Operate – Level 1 support
Proactive alarm monitoring 24x7
Issue ticket management per service level agreements (SLA)
Fault management
NOC Operate – Level 2 support
Higher level support for fault management
Change execution
Root cause analysis
Co-ordination with TAC
NOC Operate – Level 3 support
Change validation
Problem management
Co-ordination with TAC
NOC Operate – Performance Management
Performance monitoring and reporting
Analysis and improvement suggestions
6. Network Operations Center - Operations 2
NOC Operate – Configuration
Configuration activities of new network elements
Integration of new NEs with the NOC
Addition of a new route, patch or area to the network
Categories based on time
Full-time surveillance
After-hours only
Backup/disaster recovery service
NOC Consulting
build, operate, transfer service
7. NOC - Key characteristics & Business benefits
Key characteristics
1. Skilled Staff
2. Focus on Performance
3. Efficient Processes
4. Integrated Set of Tools
5. Automation and Intelligent Tools
6. Managing service performance
7. Focus on Security
8. Being proactive
9. Quality Consistency
Business benefits
1. Quality Consistency
2. Better Traffic /Resource Management
3. Lower Cost
4. Higher Security
5. Reduced business impact through a
proactive approach
6. Customer satisfaction index
8. NOC - Standards
FCAPS is the ISO Telecommunications Management Network model and framework
for network management.
It defines five areas, using the acronym FCAPS:
•Fault Management
•Configuration Management
•Accounting (Administration)
•Performance Management
•Security Management.
The FCAPS model can be seen as bottom-up or network-centric.
The FAB model looks at the processes more from the top down and is customer/business-centric.
The two standards that have emerged are Simple Network Management Protocol (SNMP)
by IETF and Common Management Information Protocol (CMIP) by ITU-T.
The FAB model is defined in the Business Process Framework (eTOM). FAB is short for fulfillment,
assurance, billing.
9. NOC - FCAPS
1. Fault management deals with the process of recognizing, isolating, and resolving a fault that
occurs in the network. Identification of potential network issues also falls under fault
management.
2. Configuration management involves collection and storage of configuration from various
network devices, and includes tracking changes to a device configuration. Because many
network issues are due to configuration changes gone wrong, this can be considered an
important contribution to proactive network management and monitoring.
3. Accounting applies to service-provider networks where network resource utilization is tracked
and then the information is used for billing or charge-back. In networks where billing does not
apply, accounting is replaced with administration, which refers to administering end-users in
the network with passwords, permissions, etc.
4. Performance management involves managing overall network performance. Data for
parameters associated with performance, such as throughput, packet loss, response times,
utilization, etc., are collected mostly using SNMP.
5. Security is another important area of network management. Security management
in FCAPS covers the process of controlling access to resources in the network which includes
data as well as configurations and protecting user information from unauthorized users.
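As an illustration of the performance data mentioned in item 4, the sketch below turns two samples of an interface octet counter into a utilization percentage. It assumes the counter values (for example IF-MIB `ifInOctets`) and the link speed (`ifSpeed`) have already been polled by some SNMP tool; the function name `link_utilization_percent` and the sample numbers are hypothetical.

```python
def link_utilization_percent(octets_start, octets_end, interval_seconds,
                             link_speed_bps, counter_bits=32):
    """Average utilization of a link between two octet-counter samples."""
    if interval_seconds <= 0 or link_speed_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    # Counters wrap around at 2**counter_bits; modular subtraction
    # recovers the delta as long as at most one wrap occurred.
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    bits_transferred = delta_octets * 8
    return 100.0 * bits_transferred / (interval_seconds * link_speed_bps)


# Hypothetical samples taken 300 seconds apart on a 100 Mbit/s link.
utilization = link_utilization_percent(
    octets_start=1_000_000,
    octets_end=38_500_000,
    interval_seconds=300,
    link_speed_bps=100_000_000,
)
print(f"{utilization:.2f}%")  # 1.00%
```

The modular subtraction is a design choice for 32-bit counters, which can wrap quickly on fast links; with 64-bit counters, pass `counter_bits=64`.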
11. FCAPS from an ITIL Perspective
FCAPS: Fault Management
• Includes detecting, isolating and resolving network problems
ITIL: Service Operation
• Event Management
• Incident Management
FCAPS: Configuration Management
• Gathers and stores the network and system configuration information
• Tracks changes
• Simplifies the change process
ITIL: Service Transition
• Change and Configuration Management
FCAPS: Accounting Management
• Facilitates better distribution of resources
• Measures resource usage
• Helps reduce operational cost
• Establishes better control
ITIL: Service Strategy
• Financial Management
ITIL: Service Design
• Service Level Management
ITIL: Service Operation
• Technical and Application Management
12. FCAPS from an ITIL Perspective
FCAPS: Performance Management
• Helps understand the current network health and efficiency
• Includes measuring various performance metrics
• Ensures service availability and performance at an optimal level
• Unnoticed problems might lead to Event Management and Incident Management
ITIL: Service Design
• Capacity & Availability Management
ITIL: Service Operation
• Technical and Application Management
ITIL: Continual Service Improvement
• Improves quality of service
• Includes standardizing and baselining of quality achieved
FCAPS: Security Management
• Maintains the confidentiality of user and business information
• Includes protecting the network from unauthorized users
• Controls overall activities
• Ensures data security through authentication and encryption
ITIL: Service Design
• Information Security Management
ITIL: Service Operation
• Access Management (Process)
• Technical and Application Management (Function)
13. NOC - Network Monitoring
Common practices define the basic components that are essential for network monitoring and are
applicable to every network.
Best practices for monitoring are guidelines for implementing a good network monitoring strategy.
Adopting the best practices can help network admins streamline their network monitoring to
identify and resolve issues much faster, with a lower MTTR (Mean Time To Resolve).
Best Practices
• Baseline network behavior:
Baselining network behavior over a couple of weeks or even months will help the network admin
understand what normal behavior in the network is. Knowledge of baseline behavior aids
proactive troubleshooting and can even prevent network downtime.
• Escalation matrix
Network issues often become problems because alerts triggered by a threshold are
ignored or the right person is not alerted. In a large network, there can be multiple
administrators or people who take care of different aspects of the network. Define an escalation
policy to follow when a malfunction occurs or a potential problem is detected.
An escalation matrix and plan ensures that issues are looked at and resolved on time.
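The two practices above can be sketched together: a baseline check that flags values far from normal behavior, and an escalation matrix that picks who to notify as an alert stays unacknowledged. This is only a sketch under stated assumptions: the z-score rule is one common way to use a baseline (not the only one), and the thresholds, tiers, and sample latencies are hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(baseline_samples, current_value, z_threshold=3.0):
    """Flag a value more than `z_threshold` standard deviations from the baseline mean."""
    mu = mean(baseline_samples)
    sigma = stdev(baseline_samples)  # needs at least two samples
    if sigma == 0:
        return current_value != mu
    return abs(current_value - mu) / sigma > z_threshold


# Hypothetical escalation matrix: (minutes unacknowledged, contact to notify).
ESCALATION_MATRIX = [
    (0, "L1 on-call engineer"),
    (15, "L2 support lead"),
    (60, "NOC manager"),
]


def escalation_contact(minutes_unacknowledged):
    """Return the highest tier whose waiting time has been reached."""
    contact = ESCALATION_MATRIX[0][1]
    for threshold, name in ESCALATION_MATRIX:
        if minutes_unacknowledged >= threshold:
            contact = name
    return contact


# Hypothetical latency baseline in milliseconds.
baseline_latency_ms = [20, 22, 19, 21, 20, 23, 21]
print(is_anomalous(baseline_latency_ms, 45))  # True
print(is_anomalous(baseline_latency_ms, 22))  # False
print(escalation_contact(20))                 # L2 support lead
```

Keeping the matrix as plain data makes it easy to review and change without touching the alerting logic.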
14. NOC - Network Monitoring
• Reports at every layer: Networks function based on the OSI model. Using a monitoring system that
supports multiple technologies to monitor at all layers, as well as different types of devices in the
network, makes problem detection and troubleshooting easier. Thus, when application
delivery fails, the monitoring system can indicate whether it is a server issue, a routing problem, a
bandwidth problem, or a hardware malfunction.
• Implement High Availability with failover options: Most monitoring systems are set up in the
network they monitor. But if a problem occurs and the network goes down, the monitoring system
can go down too.
It is recommended to implement a monitoring strategy with High Availability through failover. High
Availability (HA) ensures that the monitoring system does not have a single point of failure and
can still provide the data needed for troubleshooting. To avoid a single point of failure, it is recommended
to set up the failover system at a remote disaster recovery (DR) site.
• Configuration management: Most network issues originate from incorrect configurations. There
are several instances where even minor configuration mistakes have led to network downtime or
loss of data. Unauthorized configuration changes to devices can lead to serious security lapses that
include hacking and data theft.
• Capacity planning and Growth: As an organization grows, the infrastructure associated with the
organization should also grow. When setting up a monitoring system, account for future growth.
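The configuration management practice above depends on noticing when a device configuration changes. A minimal sketch, assuming configuration snapshots are already collected as text, compares two snapshots with Python's standard `difflib` module. The function name `config_changes` and the sample configuration lines are hypothetical.

```python
import difflib


def config_changes(previous, current):
    """Return the lines removed ("- ") and added ("+ ") between two config snapshots."""
    diff = difflib.ndiff(previous.splitlines(), current.splitlines())
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); drop those.
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Hypothetical snapshots of one device taken on two consecutive days.
yesterday = "hostname edge-1\ninterface eth0\nmtu 1500\n"
today = "hostname edge-1\ninterface eth0\nmtu 9000\n"

changes = config_changes(yesterday, today)
for line in changes:
    print(line)
```

An empty result means no drift; a non-empty result can be attached to a ticket so that unauthorized or mistaken changes are reviewed.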
15. Essential element in NOC management
Network Operations Center best practices in terms of process and tools.
1. Ticketing system
A ticketing system will enable you to keep track of all open issues, according to severity,
urgency and the person assigned to handle each one.
2. Knowledge base
Centralized source for all knowledge and documentation that is accessible to your entire
team. This knowledge base should be a fluid information source to be continuously updated
with experiences and lessons learned for future reference and improvements.
3. Reporting
Reports on a daily, weekly and monthly basis should include all major incidents and a root cause for
every resolved incident.
4. Monitoring
There are two major types of monitoring processes relevant to a NOC:
•Monitoring infrastructure.
•Customer help desk/experience.
5. Process Automation
Implementing process automation significantly reduces mean time to recovery (MTTR) and
helps NOCs meet SLAs by having a procedure in place to handle incident resolution and to
consistently provide a high-quality response regardless of the complexity of the process.
Examples such as disk space clean-up and process resets help reduce manual, routine tasks.
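The disk space clean-up example can be sketched as a small automated procedure. This is an illustration under assumptions, not a production runbook: the 90% threshold, the 7-day retention, and the function `cleanup_if_low_space` are hypothetical, and the demonstration runs in a temporary directory with the threshold forced to 0 so that the clean-up always triggers.

```python
import os
import shutil
import tempfile
import time


def cleanup_if_low_space(directory, max_used_percent=90.0, max_age_days=7):
    """Delete files older than `max_age_days` in `directory` when the disk is too full.

    Returns the list of deleted paths (empty if usage is below the threshold).
    """
    usage = shutil.disk_usage(directory)
    used_percent = 100.0 * usage.used / usage.total
    if used_percent < max_used_percent:
        return []

    cutoff = time.time() - max_age_days * 86400
    deleted = []
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False) and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            deleted.append(entry.path)
    return deleted


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    old_file = os.path.join(tmp, "old.log")
    new_file = os.path.join(tmp, "new.log")
    for path in (old_file, new_file):
        with open(path, "w") as handle:
            handle.write("log data\n")
    # Backdate one file by 30 days so it qualifies for deletion.
    thirty_days_ago = time.time() - 30 * 86400
    os.utime(old_file, (thirty_days_ago, thirty_days_ago))
    # max_used_percent=0.0 forces the clean-up to run for this demonstration.
    deleted = cleanup_if_low_space(tmp, max_used_percent=0.0)
    remaining = sorted(os.listdir(tmp))

print([os.path.basename(p) for p in deleted], remaining)
```

Returning the deleted paths lets the automation record what it did in the ticket, which keeps the routine task auditable.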
16. Key Factors for a NOC Performance Management Solution
•Real time complete system-wide visibility.
•Alerting and Reporting
•Monitoring Abilities
•Multi-vendor Support
•Scalability
•Simple Interface
•Easy to Deploy
•Notifications
17. NOC Service Assurance and Service Management Activities
KPIs & SLAs
1. Number of tickets received and resolved.
2. Number of tickets proactively raised and resolved, based on severity.
3. Number of tickets escalated to technical operations.
4. Number of tickets solved within SLA without escalation to technical operations.
5. Tickets raised within 15 minutes of the occurrence of the alarm.
6. Third-party escalation and follow-ups as per SLA.
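Several of the KPIs above can be computed directly from ticket records. The sketch below assumes a simplified, hypothetical `Ticket` record (real ticketing systems expose different fields) and counts tickets received, resolved, escalated, resolved within SLA without escalation, and raised less than 15 minutes after the alarm.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Ticket:
    # Hypothetical fields chosen for this example.
    alarm_time: datetime
    raised_time: datetime
    resolved_time: Optional[datetime]
    sla: timedelta  # allowed time from ticket raise to resolution
    escalated: bool = False


def noc_kpis(tickets, raise_target=timedelta(minutes=15)):
    """Count a few ticket KPIs over a list of Ticket records."""
    resolved = [t for t in tickets if t.resolved_time is not None]
    return {
        "received": len(tickets),
        "resolved": len(resolved),
        "escalated": sum(t.escalated for t in tickets),
        "resolved_in_sla_without_escalation": sum(
            not t.escalated and t.resolved_time - t.raised_time <= t.sla
            for t in resolved
        ),
        "raised_within_target": sum(
            t.raised_time - t.alarm_time < raise_target for t in tickets
        ),
    }


t0 = datetime(2024, 1, 1, 9, 0)
tickets = [
    Ticket(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=65),
           sla=timedelta(hours=4)),
    Ticket(t0, t0 + timedelta(minutes=20), t0 + timedelta(hours=2),
           sla=timedelta(hours=1), escalated=True),
    Ticket(t0, t0 + timedelta(minutes=10), None, sla=timedelta(hours=4)),
]
kpis = noc_kpis(tickets)
print(kpis)
```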
Service Management Activities: Surveillance/Fault, Incident Management, Problem Management, SLA Management