Introduction
Every Network Operations Center (NOC) is under constant pressure to meet the demands of its organization’s business needs. The technological challenges keep growing, and unforeseen problems always arrive. In fact, the only thing you can count on to shrink is your resources.
What’s the best you can do? Make sure you’re employing the best practices, using the appropriate tools, and optimizing your processes and knowledge. This is what this e-book is all about.
Part 1: Tools
You know tools are an essential element of NOC management. But they are also a key element for improvement. So what are the tools that will give you the best return on investment?
5 essential tools

Ticketing system
A ticketing system enables you to keep track of all open issues by severity, urgency and the person assigned to handle each task. Knowing all pending issues helps you prioritize the shift’s tasks and provide the best service to your customers.

Knowledge base
Keep one centralized source for all knowledge and documentation that is accessible to your entire team. This knowledge base should be a fluid information source, continuously updated with experiences and lessons learned for future reference and improvements.
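To make the ticketing idea concrete, here is a minimal sketch of how open tickets might be ordered for a shift. The `Ticket` fields, the numeric scales (1 is highest) and the sample tickets are assumptions made for illustration; a real ticketing system will have its own schema.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    ticket_id: str
    severity: int  # assumed scale: 1 = most severe
    urgency: int   # assumed scale: 1 = most urgent
    assignee: str


def prioritize(tickets):
    """Return tickets ordered by severity first, then by urgency."""
    return sorted(tickets, key=lambda t: (t.severity, t.urgency))


# Hypothetical open tickets at the start of a shift.
tickets = [
    Ticket("T-3", severity=2, urgency=1, assignee="dana"),
    Ticket("T-1", severity=1, urgency=2, assignee="lee"),
    Ticket("T-2", severity=1, urgency=1, assignee="dana"),
]

for t in prioritize(tickets):
    print(t.ticket_id, t.severity, t.urgency, t.assignee)
```

Sorting on a (severity, urgency) pair is only one possible policy; some teams may prefer to weight urgency higher or to factor in ticket age.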
Reporting and measurements
Create reports on a daily and monthly basis. A daily report should include all major incidents of the past 24 hours and a root cause for every resolved incident. This report is essential for shift leaders and NOC managers. It also keeps the rest of the IT department informed about NOC activities and major incidents. Compiling the daily reports into a monthly report helps measure the team’s progress. It also shows areas where improvements can be made and reveals positive or negative trends in performance.
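As a rough illustration of compiling daily reports into a monthly view, the sketch below counts one month's incidents and groups them by root cause. The record layout and the sample entries are hypothetical and exist only to show the idea.

```python
from collections import Counter
from datetime import date

# Hypothetical daily report entries: (day, incident summary, root cause).
daily_reports = [
    (date(2024, 1, 3), "Mail gateway down", "expired certificate"),
    (date(2024, 1, 3), "Slow intranet", "disk full"),
    (date(2024, 1, 17), "Backup job failed", "disk full"),
    (date(2024, 2, 2), "VPN outage", "network misconfiguration"),
]


def monthly_summary(reports, year, month):
    """Summarize one month's incidents: total count and count per root cause."""
    in_month = [r for r in reports if r[0].year == year and r[0].month == month]
    return {
        "total_incidents": len(in_month),
        "by_root_cause": Counter(cause for _, _, cause in in_month),
    }


summary = monthly_summary(daily_reports, 2024, 1)
print(summary["total_incidents"])
print(summary["by_root_cause"].most_common())
```

A root cause that keeps recurring across a month is the kind of trend a monthly report is meant to surface.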
Monitoring
There are two types of monitoring processes relevant to a NOC:
• Infrastructure monitoring, which can cover the servers, the network or the data center environment.
• User experience monitoring, which simulates user behavior and activities in order to replicate problems and find the most effective solutions.
Implementing a service tree model that connects the monitored infrastructure with the services that depend on it allows your team to alert other areas that may be affected by a problem.
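One way to picture a service tree is as a dependency map that is walked from a failing component down to every service relying on it. The following is a minimal sketch under that assumption; the component and service names are invented for the example.

```python
# Hypothetical dependency map: each key is relied on by the items in its list.
DEPENDENTS = {
    "core-switch": ["db-server", "web-server"],
    "db-server": ["billing-service", "crm-service"],
    "web-server": ["customer-portal"],
    "billing-service": [],
    "crm-service": [],
    "customer-portal": [],
}


def affected_by(component, dependents=DEPENDENTS):
    """Return everything downstream of a failing component, sorted by name."""
    seen = set()
    stack = [component]
    while stack:
        node = stack.pop()
        for child in dependents.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return sorted(seen)


print(affected_by("db-server"))
```

With a map like this, an alert on a single server can be translated into the list of business services whose owners should be notified.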
IT Process Automation
Implementing IT Process Automation significantly reduces mean time to recovery (MTTR) and helps NOCs meet SLAs by putting a procedure in place for incident resolution, so that the response is consistently of high quality regardless of the complexity of the process.
IT Process Automation empowers a Level-one team to deal with tasks that otherwise might require a Level-two team. Some examples include resetting passwords, cleaning up disk space and resetting services. IT Process Automation also helps reduce the number of manual, routine IT tasks and frees up time for more strategic projects.
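As a small example of the kind of routine check that can be automated, the sketch below reports whether a volume has crossed a disk-usage threshold. The 90% threshold and the function names are assumptions; a real workflow would follow the check with an approved clean-up step, which is deliberately left out here.

```python
import shutil


def used_fraction(path):
    """Return the fraction of disk space in use on the volume holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total


def needs_cleanup(path, threshold=0.9):
    """Return True when usage is at or above the (assumed) threshold."""
    return used_fraction(path) >= threshold


# Check the volume that holds the current working directory.
print(f"used: {used_fraction('.'):.0%}, cleanup needed: {needs_cleanup('.')}")
```

Wrapping the check in a function like `needs_cleanup` lets a Level-one operator, or a scheduler, run the same procedure every time without escalating.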
Part 2: Knowledge & skills
By ‘knowledge and skills’ we do not mean the obvious technical knowledge and network know-how your team members must have in order to run day-to-day operations. We mean how you can ensure your team’s skills are used to their best potential, and how to keep those skills up to date over time.
Clearly define roles
The definition of roles may vary between data centers and will depend on team size, the IT environment and tasks. Still, there should be a clear distinction between the roles and responsibilities of operators and those of shift supervisors in the NOC.
Why does it matter?
It matters mainly because of decision making. Without clearly defined roles and responsibilities, a disagreement between operators may lead to late decisions and actions, or to no decisions being taken at all. This may affect customers, critical business services, and urgent requests during off hours.
It should be clearly defined, therefore, that a shift manager makes the final decisions.
Tasks & responsibilities
Another potential problem caused by a lack of role definition is an unclear division of tasks between operators and the shift leader.
A shift manager should be responsible for:
• prioritizing tasks
• assigning work to operators based on their skills
• verifying that tickets are opened properly and that relevant personnel are notified when required
• escalating problems
• communicating with management during important NOC events
• sending notifications to the entire organization
• preparing reports
• making critical decisions that impact many services, such as shutting down the data center in case of an emergency
Operators, on the other hand, are responsible for handling the technical aspect of incidents, either independently or by escalating to another team member with the required skills. Operators are also responsible for following up and keeping tickets up to date.
While it might sound as if operators lack independence and responsibility, this is not the case. When faced with technical challenges, operators’ input and skills are probably the most critical for resolution and smooth NOC operation. Operators provide additional insights into problems, and can provide creative solutions when the standard procedures fail to work.