Scott Riley discusses the importance of real-time network management given the changing IT landscape and increased use of mobile devices, cloud services, and applications. Traditional monitoring tools have siloed views that make fault diagnosis difficult, while downtime is costly. Riley advocates adopting a single, well-connected monitoring platform that can correlate events across systems in real time to identify issues more quickly. Such a platform allows for real-time monitoring, automatic fault remediation through scripts, bandwidth monitoring, and compliance management to improve network performance and uptime.
2. How to Get Real-Time Network Management Right
Overcoming the challenges involved
With Scott Riley
3. About Scott
Scott is an IT management professional with 12 years of experience in IT operations.
During the course of his career, he’s led technical teams to success across the UK in a
number of areas, including Network and Security, Hosting and Datacentres, and Product
Development.
As Director of Cloud & Hosting Solutions at GCI, Scott has developed virtualisation
technology solutions and manages the shift from physical servers to virtualised services.
He ensures business processes and cost models are well aligned to a solution roadmap.
Email: scott.riley@gcicom.net
Twitter: @Fauxnuts
4. You Will Soon Discover
1. Importance of network performance management
2. Current challenges in network management
3. Impact of downtime on your business
4. Devising a real-time network monitoring strategy
5. Compliance management
6. Traffic shaping & bandwidth monitoring
7. Fault identification and remediation
5. Why Is Network Management So Important?
IT continually evolves, so we need a core monitoring strategy and adaptable tooling
• Current Challenges
• Mobilegeddon!
• Cloud Growth in EMEA
• The Digital Tsunami
6. Current Challenges
Users are working longer hours, in more locations and across multiple devices
• Consumerisation of IT
• Explosion of Apps
• 4G LTE coverage expansion
• Enterprise playing catch-up on the home user experience
“Telstra now spends 50 per cent of each board meeting discussing future strategy and most of it about how technology is going to change.”
Professor Steve Burdon, University of Technology, Sydney
7. Mobilegeddon!
Mobile and tablet usage has overtaken desktop
• Users now spend more time on mobile than on desktop apps
• Google ranking is lowered if your site is not mobile-friendly
• 40% of sites are not ready
[Chart: Number of Global Users (Millions), Desktop vs Mobile, 2007 to 2015]
8. Physical Server Decline in EMEA
Eastern Europe is showing the biggest decline in server hardware shipments and sales
Worldwide Server Shipments and Revenue 2014
Region      Shipments   Revenue
APAC            8.7        7.5
East EU       -10.8       -6.3
Japan          -6.7      -10.6
Latin Am       -6.3       -4.1
MEA            -5.2       -6.7
North Am        0.7        2.6
West EU        -2.0        3.8
9. The Digital Tsunami
Cloud uptake in the UK skyrocketed in 2014
• More Demanding Users
• Faster Network Access
• More Services in Cloud
• More Apps to support
Year       2010  2011  2012  2013  2014  2015
Cloud       48%   53%   61%   69%   78%   84%
No Cloud    52%   47%   39%   31%   22%   16%
10. The Cost of Downtime
64% of businesses surveyed suffered downtime
Cost of data loss and downtime in 2014: £31.3 billion
Average downtime experienced: 27 hours
13. Challenges with Monitoring Silos
Rapidly diagnose issues with a single capable monitoring engine
• Partial view of system performance
• Manual correlation
• Hopping between multiple systems to diagnose faults
• Alert flooding
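Alert flooding is often tamed by deduplication: repeats of the same alert from the same source are folded into one entry with a count. The sketch below is a minimal Python illustration, not any particular product's behaviour; the `Alert` fields and the five-minute window are assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    source: str   # device or system that raised the alert
    kind: str     # hypothetical alert type, e.g. "link_down"
    at: datetime  # when the alert was raised


def deduplicate(alerts, window=timedelta(minutes=5)):
    """Collapse repeats of the same (source, kind) into one alert with a count.

    A repeat joins the existing group if it arrives within `window` of the
    previous occurrence; otherwise it starts a new group.
    """
    groups = []   # each entry: [first_alert, count, last_seen]
    latest = {}   # (source, kind) -> index of its most recent group
    for alert in sorted(alerts, key=lambda a: a.at):
        key = (alert.source, alert.kind)
        idx = latest.get(key)
        if idx is not None and alert.at - groups[idx][2] <= window:
            groups[idx][1] += 1
            groups[idx][2] = alert.at
        else:
            latest[key] = len(groups)
            groups.append([alert, 1, alert.at])
    return [(first, count) for first, count, _ in groups]
```

With this approach, twenty `link_down` alerts from one switch arriving 30 seconds apart come back as a single entry with a count of 20, so the operator sees one line instead of twenty.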
14. A Strategy for Real-Time Monitoring
A single, well-connected platform can correlate events in real time
• Monitoring all of our systems
• Establishing baselines and trends
• Alerting with specific information
• Guiding us to the root cause
• Enabling us to take action…quickly!
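One common way a monitoring platform points at the root cause is topology-aware correlation: if a device and everything behind it go down together, only the highest failed device is reported as the probable cause and the rest are treated as symptoms. This is a minimal sketch, assuming a hypothetical tree-shaped `UPSTREAM` map (no loops) with made-up device names; real platforms usually build this map from discovery data.

```python
# Hypothetical topology: each device maps to its upstream parent (None = core).
UPSTREAM = {
    "core-rtr": None,
    "dist-sw1": "core-rtr",
    "dist-sw2": "core-rtr",
    "access-sw1": "dist-sw1",
    "access-sw2": "dist-sw1",
    "ap-01": "access-sw1",
}


def ancestors(device, upstream):
    """Yield every device between `device` and the core, nearest first."""
    parent = upstream.get(device)
    while parent is not None:
        yield parent
        parent = upstream.get(parent)


def root_causes(down, upstream):
    """Return the down devices whose whole upstream path is healthy.

    A device that is unreachable because something above it failed is
    treated as a symptom and suppressed.
    """
    down = set(down)
    return {d for d in down
            if not any(a in down for a in ancestors(d, upstream))}
```

Four simultaneous alarms collapse to one probable cause: `root_causes({"dist-sw1", "access-sw1", "access-sw2", "ap-01"}, UPSTREAM)` returns `{"dist-sw1"}`.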
17. Compliance Management
Using a system to track configuration greatly improves your adherence to standards
• Measure compliance against industry standards
• Set your own configuration targets
• Track what was changed, when and by whom
• Report on compliance status
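Tracking what changed, when and by whom, and scoring devices against configuration targets, can be sketched in a few lines of Python. The `TARGETS` settings, the `Device` class and the `check`/`report` helpers are hypothetical; real compliance tools pull configurations from the devices themselves and test them against published benchmarks or in-house policy.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical configuration targets: a fixed value, or a rule (callable).
TARGETS = {
    "ssh_version": "2",
    "telnet_enabled": "no",
    "snmp_community": lambda v: v not in ("public", "private"),
}


@dataclass
class Device:
    name: str
    config: dict
    history: list = field(default_factory=list)  # (when, who, key, old, new)

    def change(self, key, value, who, when=None):
        """Apply a config change and record what changed, when and by whom."""
        when = when or datetime.now()
        self.history.append((when, who, key, self.config.get(key), value))
        self.config[key] = value


def check(device, targets=TARGETS):
    """Return {setting: compliant?} for every target."""
    result = {}
    for key, rule in targets.items():
        value = device.config.get(key)
        result[key] = rule(value) if callable(rule) else value == rule
    return result


def report(devices):
    """One compliance summary line per device."""
    lines = []
    for d in devices:
        status = check(d)
        failed = [k for k, ok in status.items() if not ok]
        score = 100 * (len(status) - len(failed)) // len(status)
        line = f"{d.name}: {score}% compliant"
        if failed:
            line += f", failing {', '.join(failed)}"
        lines.append(line)
    return lines
```

Keeping the change history on the device record means every compliance report can be traced back to the specific change and person that caused a failure.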
18. Real-Time Monitoring
View events up to the second, not as a 5-minute average
• Real-time statistics
• Up-to-the-second information on any metric
• Live bandwidth graphs
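The gap between per-second data and a 5-minute average is easy to show with arithmetic. In this sketch the samples are made up: a link sits at 20% utilisation for five minutes apart from a 15-second burst at 100%.

```python
from statistics import mean


def window_averages(samples, window):
    """Average consecutive, non-overlapping windows of per-second samples."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


# Hypothetical link utilisation (%), one sample per second for 5 minutes:
# steady at 20% except for a 15-second burst that saturates the link.
samples = [20.0] * 300
for s in range(120, 135):
    samples[s] = 100.0

five_minute = window_averages(samples, 300)
seconds_saturated = sum(1 for v in samples if v >= 95.0)

print(f"5-minute average: {five_minute[0]:.1f}%")       # prints 24.0%
print(f"peak second: {max(samples):.1f}%")              # prints 100.0%
print(f"seconds at or above 95%: {seconds_saturated}")  # prints 15
```

The 5-minute figure of 24% would look healthy on a graph, even though users experienced 15 seconds of a saturated link.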
19. Bandwidth Monitoring
From a high level overview, drill straight to areas of concern for rapid investigation
• Build high-level business maps
• Maps are colour-coded based on the availability and performance of links
• Rapid drill-down to detailed node statistics
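Bandwidth graphs and colour-coded maps are typically driven by polling interface counters. This sketch assumes SNMP-style 32-bit octet counters (such as IF-MIB `ifInOctets`, which wrap after 2^32) and an interface speed in bits per second (as reported by `ifSpeed`); the amber and red thresholds in `colour()` are hypothetical. High-speed links would normally be polled via the 64-bit `ifHCInOctets` counter instead.

```python
COUNTER32_MOD = 2**32  # a Counter32 wraps back to 0 after 2**32 - 1


def octet_delta(previous, current, modulus=COUNTER32_MOD):
    """Octets counted between two polls, tolerating a single counter wrap."""
    return (current - previous) % modulus


def utilisation(previous, current, interval_s, if_speed_bps):
    """Percentage of link capacity used over the polling interval."""
    bits = octet_delta(previous, current) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)


def colour(util_pct, amber=70.0, red=90.0):
    """Map colour for a link; the thresholds are hypothetical defaults."""
    if util_pct >= red:
        return "red"
    if util_pct >= amber:
        return "amber"
    return "green"
```

For example, a 100 Mbit/s link polled 60 seconds apart whose counter wrapped from 4,294,000,000 to 600,000,000 carried 600,967,296 octets, which is about 80% utilisation, so `colour()` returns `"amber"`.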
20. Traffic Shaping
Classify applications and apply Quality of Service policies
• Identify and categorise applications
• Apply Quality of Service policies for priority traffic
• Apply restrictions on non-mission-critical traffic
[Figure: traffic classes: Anything Else, Mission Apps, Realtime Apps, Voice & Video]
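Classification plus policing can be sketched as follows. The four classes come from the slide, but their priority order, the application-to-class mapping and the roughly 1 Mbit/s limit on "Anything Else" are assumptions. The policer is a standard token bucket, and `dequeue_order` shows strict-priority scheduling; real network devices do this in hardware or the OS network stack, often using DSCP markings.

```python
import time

# Traffic classes from the slide, highest priority first (this order is assumed).
CLASSES = ["Voice & Video", "Realtime Apps", "Mission Apps", "Anything Else"]

# Hypothetical application-to-class mapping; real devices identify traffic
# by ports, DSCP markings or deep packet inspection.
APP_CLASS = {
    "sip": "Voice & Video",
    "rtp": "Voice & Video",
    "citrix": "Realtime Apps",
    "erp": "Mission Apps",
}


def classify(app):
    """Unrecognised applications fall into the lowest class."""
    return APP_CLASS.get(app, "Anything Else")


class TokenBucket:
    """Policer allowing `rate` bytes per second, with bursts up to `burst` bytes."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens = burst
        self.last = clock()

    def allow(self, size):
        now = self.clock()
        # Refill for the time elapsed, capped at the bucket size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False  # out of profile: drop (or re-mark) the packet


# Only non-mission-critical traffic is policed (hypothetical ~1 Mbit/s limit).
POLICERS = {"Anything Else": TokenBucket(rate=125_000, burst=15_000)}


def admit(app, size):
    policer = POLICERS.get(classify(app))
    return policer is None or policer.allow(size)


def dequeue_order(packets):
    """Strict priority: serve queued (app, size) packets by class rank."""
    return sorted(packets, key=lambda p: CLASSES.index(classify(p[0])))
```

Strict priority keeps voice and video latency low, while the policer stops bulk "Anything Else" traffic from crowding out the classes above it.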
23. Automatic Fault Remediation
Small service provider with around 22,000 DSL subscribers
• Repeated fault with DSL disconnects
• Irritated customers
• Loss of confidence
• Increased Service Desk Tickets
24. Automatic Fault Remediation
Continuous monitoring of SNR and attenuation levels
Alerts triggered on deviation from expected levels; helpdesk ticket created
Automated script reboots the alerting devices at 00:01 the next morning
Ticket automatically updated after maintenance, confirming the device is online
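The four steps above can be sketched as a small runbook in Python. The expected SNR and attenuation values, the 3 dB tolerance and the `Ticket` structure are hypothetical; a real runbook would read line statistics from the DSL equipment, call the helpdesk system's API and hand the reboot to a scheduler.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical expected line levels (dB) and allowed deviation.
EXPECTED = {"snr_db": 12.0, "attenuation_db": 30.0}
TOLERANCE_DB = 3.0


@dataclass
class Ticket:
    device: str
    notes: list = field(default_factory=list)
    status: str = "open"


def deviations(reading):
    """Steps 1-2: metrics whose monitored value is outside expected +/- tolerance."""
    return {k: v for k, v in reading.items()
            if k in EXPECTED and abs(v - EXPECTED[k]) > TOLERANCE_DB}


def next_maintenance(now):
    """00:01 on the day after `now`."""
    tomorrow = (now + timedelta(days=1)).date()
    return datetime.combine(tomorrow, datetime.min.time()) + timedelta(minutes=1)


def handle_reading(device, reading, now):
    """Steps 2-3: on deviation, open a ticket and schedule the reboot."""
    bad = deviations(reading)
    if not bad:
        return None
    ticket = Ticket(device, notes=[f"{now:%Y-%m-%d %H:%M} deviation: {bad}"])
    when = next_maintenance(now)
    ticket.notes.append(f"reboot scheduled for {when:%Y-%m-%d %H:%M}")
    return ticket, when


def after_maintenance(ticket, device_online):
    """Step 4: update the ticket once the device's status is known."""
    if device_online:
        ticket.notes.append("device confirmed online after reboot")
        ticket.status = "resolved"
    else:
        # Hypothetical fallback: leave the ticket open for an engineer.
        ticket.notes.append("device not back online; needs manual follow-up")
    return ticket
```

For a reading taken at 14:30 on 10 June 2015 with SNR at 6.5 dB, `handle_reading` opens a ticket and schedules the reboot for 00:01 on 11 June.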
25. Automatic Fault Remediation
By using runbook automation, they:
1. Reduced their helpdesk callouts
2. Proactively repaired faults before they impacted service
3. Improved the overall customer experience
Automation saves costs, reduces Mean Time To Repair (MTTR) and increases customer satisfaction
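MTTR is total repair time divided by the number of repairs, so the effect of automation on it can be tracked with a calculation like the one below. The repair durations in the usage example are invented for illustration.

```python
from datetime import timedelta


def mttr(repair_times):
    """Mean Time To Repair: total repair time divided by the number of repairs."""
    if not repair_times:
        raise ValueError("need at least one repair")
    return sum(repair_times, timedelta()) / len(repair_times)
```

For instance, `mttr([timedelta(hours=4), timedelta(hours=6), timedelta(hours=5)])` evaluates to five hours; comparing this figure before and after automation shows whether repairs are actually getting faster.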
26. Summary
Thank you for your time!
The changing IT landscape: the “Digital Tsunami”
The cost of downtime
The perils of monitoring silos
Real-time monitoring and event correlation
Automatic fault resolution