0
Network Intelligence
Without Borders
Mohit Lad
CEO and Co-founder
1
About ThousandEyes
Founded by network
experts; strong
investor backing
Relied on for
critical operations by
leading enterprises
Recognized as
an innovative
new approach
ThousandEyes delivers network intelligence into every network.
30 of the Fortune 500
5 of the top 5 SaaS companies
4 of the top 6 US banks
2
When You Think of Network Troubleshooting
3
Legacy Environments
NY Branch
HK Branch
Datacenter
• On-premises Apps
• Users in branch offices
over wired connections
• MPLS backbone
MPLS
MPLS
4
Internet Centric Environment
• Adoption of Cloud
Applications
• Split-tunnel from
branch offices
• Direct Internet Connectivity
between branch offices
• Wireless becoming primary
connectivity at branch offices
• Remote Users accessing
cloud applications directly
NY Branch
HK Branch
Datacenter O365
Internet
5
ThousandEyes Cloud Agents
NY Branch
Datacenter O365
Internet
6
ThousandEyes Enterprise Agents
NY Branch
Datacenter O365
Internet
7
ThousandEyes Endpoint Agents
NY Branch
Datacenter O365
Internet
8
Product Design Principles
Intuitive & Effective UI
• Powerful visualizations to model complex data
• UI design that is re-usable and scalable
• Seamless support help
Harness the Power of SaaS
• Minimal deployment effort
• Auto-updates
• Centralized configuration
• Cross-customer data correlation and analysis
• Easy data sharing between different customers
Innovative Data Collection & Analytics
• Measure black-box environments using active probing
• Measure with minimum instrumentation
9
Rest of the Day
• Tackling Hybrid Network Environments with Enterprise Agents
– Nick Kephart
• End to End Visibility with Endpoint Agent
– Scott Cressman, Martin Dam
• Internet Outage Detection
– Ricardo Oliveira
10
Tackling Hybrid Network Environments
with Enterprise Agents
Nick Kephart
11
Enterprise Agent: Internal Vantage Point
Key Use Cases
• Internet connectivity of
ISP ingress and egress
• WAN visibility between
branches and data
centers
• Performance of web,
voice and FTP
application traffic
NY Branch
HK Branch
Datacenter O365
Internet
12
Deploying Enterprise Agents
• Locations with containerized
monitoring and operations tools
• For remote branches and stores with
limited IT infrastructure
• Branch and WAN routers (IOS XE
3.17+ on ASR 1000 and ISR 4000)
New
New
New
Virtual Appliance
Docker Container
Linux Package
Intel NUC Installer
Cisco IOS
Virtual Container
• Easily deployable across the
enterprise WAN and data center
13
Visualizing the Entire Network Path
Highlights
• Forward and reverse
path (helpful for
asymmetric routing)
• Measure and locate
changes in loss, latency
and QoS in each
direction
• Also test UDP in addition
to TCP
14
End-to-End Visibility
with Endpoint Agent
Scott Cressman
Martin Dam
15
End User Visibility Challenges
• Remote and traveling
workers
• SaaS deployments
• LAN and WAN issues
in satellite offices
NY Branch
HK Branch
Datacenter O365
Internet
16
Today’s “Solutions”
17
Enter ThousandEyes Endpoint Agent
You can’t get this from any other monitoring solution, period.
• Extends visibility to the end-
user, in the office, at home,
on-the-go
• Troubleshoot individual user
sessions with live
performance data
• Analyze trends across user
populations, applications,
geographies
18
How Endpoint Agent Works
Lightweight client software
Windows 7+, Mac OS X 10.9+
Negligible resource consumption
Typically <1% CPU, <40MB mem, <50MB disk
Easy deployment via standard tools
msi & pkg installers w/ auto-registration
End-user & background components
Browser plugin (Chrome & IE) & system service
Always up-to-date
Updates automatically, runs in the background
WEB/APPLICATION
Completion, availability, response time, page
load waterfall
NETWORK
Loss, latency, jitter, failures, path visualization,
wireless topology, VPN, proxy, Wi-Fi quality
(live user sessions!)
Browser-based web applications • Only collects data for domains you choose to monitor
Data streamed instantly to ThousandEyes service
19
Complete Visibility from End User to Application
Internet Outage Detection
Ricardo Oliveira
CTO and Co-founder
21
The Problem Landscape
• Lack of visibility to apps
relying on the Internet
{UC,S,I,P}aaS
• Lack of visibility to
wireless/remote/mobile users
• Traditional NPM solutions
designed for static clients and
on-prem apps
– Packet capture
– SNMP polling
NY Branch
HK Branch
Datacenter O365
Internet
22
ThousandEyes Agents
NY Branch
Datacenter O365
Internet
23
Drive for Internet Outage Detection
• Internet is a shared network – same event impacts multiple customers
• Harness data from multiple customers for more accurate inference of problem
• Drive more value to customers with knowledge of depth and breadth of problem
24
Overview: Internet Outage Detection
Traffic Outage Detection
• Detect outages in ISPs and understand their impact both globally and as it relates to a specific customer
Routing Outage Detection
• See the global and account scope, as well as likely root cause of BGP reachability outages
25
How Traffic Outage Detection Works
1. Anonymized (HTTP) traffic data is aggregated from all tests across the entire user base
2. Algorithms then look for patterns in path traces terminating in the same ISP
3. Exclude: noisy interfaces and networks not belonging to ISPs
New York
Cloud Agent
Boston
Enterprise Agent
Los Angeles
Cloud Agent
Level 3 in San Jose
Cogent in Denver
Salesforce
Google
NY Times
Customer 2
Customer 1
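The three steps on this slide can be sketched roughly as follows. This is a minimal illustration, not ThousandEyes' actual algorithm: the `PathTrace` record, the `detect_traffic_outages` function, and the `min_tests` threshold are all hypothetical names and values chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PathTrace:
    """One anonymized path trace (hypothetical record layout)."""
    customer: str            # anonymized account identifier
    test_id: str
    terminal_network: str    # network owning the last responding hop
    terminal_interface: str  # last responding hop
    reached_target: bool


def detect_traffic_outages(traces, isp_networks, noisy_interfaces, min_tests=2):
    """Group failed traces by the ISP interface where they terminate.

    min_tests is an assumed threshold, not a documented value.
    """
    affected = defaultdict(set)
    for t in traces:
        if t.reached_target:
            continue  # only traces that terminate early are of interest
        if t.terminal_network not in isp_networks:
            continue  # step 3: exclude networks not belonging to ISPs
        if t.terminal_interface in noisy_interfaces:
            continue  # step 3: exclude noisy interfaces
        key = (t.terminal_network, t.terminal_interface)
        affected[key].add((t.customer, t.test_id))

    outages = {}
    for key, tests in affected.items():
        if len(tests) >= min_tests:
            outages[key] = {
                "tests": len(tests),
                "customers": len({customer for customer, _ in tests}),
            }
    return outages
```

Counting distinct customers alongside distinct tests mirrors the account-scope versus global-scope distinction the deck draws: the same interface can matter to one account or to many.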
26
Traffic Outage Detection
Account
scope
Global scope
Severity and
scope of the
issue at this
interface
27
Traffic Outages All the Time
• ~170 affected interfaces / hour
28
Routing Outage Detection
Aggregates reachability issues in routing data from 350 routers
Global
scope
Account
scope
Root cause
analysis
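A rough sketch of how reachability issues from many routers might be aggregated into global and account scope. The input format, the `detect_routing_outages` name, and the `min_routers` threshold are assumptions made for illustration; the slide does not describe the real implementation.

```python
from collections import defaultdict


def detect_routing_outages(observations, account_prefixes, min_routers=2):
    """Aggregate per-router reachability into global and account scope.

    observations: iterable of (router, prefix, reachable) tuples
    account_prefixes: set of prefixes monitored by one account
    min_routers: assumed threshold for calling a prefix "affected"
    """
    lost = defaultdict(set)
    for router, prefix, reachable in observations:
        if not reachable:
            lost[prefix].add(router)

    # Global scope: every prefix unreachable from enough routers.
    global_scope = {
        prefix: len(routers)
        for prefix, routers in lost.items()
        if len(routers) >= min_routers
    }
    # Account scope: the subset of those prefixes this account monitors.
    account_scope = {
        prefix: count
        for prefix, count in global_scope.items()
        if prefix in account_prefixes
    }
    return global_scope, account_scope
```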
29
Routing Outages All the Time
• ~1.6k prefixes affected / hour
30
Recent Major Outages Detected
Hurricane Electric route leak affecting AWS
Trans-Atlantic issues in Level 3
– https://blog.thousandeyes.com/trans-atlantic-issues-level-3-network/
Tata and TI Sparkle issues with submarine cable
– https://blog.thousandeyes.com/smw-4-cable-fault-ripple-effects-across-networks/
Hurricane Electric removed >500 prefixes
Tata cable cut in Singapore affecting Dropbox
Level 3, NTT routing issues affecting JIRA
– https://blog.thousandeyes.com/identifying-root-cause-routing-outage-detection/
Widespread issues in Telia’s network in Ashburn
– https://blog.thousandeyes.com/analyzing-internet-issues-traffic-outage-detection/
April 23
May 3
May 20
June 6
June 24
July 10
July 17
31
Examples of Notable Outages
32
1. Network Layer Issues in Telia in Ashburn
Detected outage
coincides with
packet loss
spikes
Ashburn, VA is
“ground zero”
for this outage
https://fvqmu.share.thousandeyes.com/
33
Specific Failure Points in Telia
High severity and wide scope
(Outages affecting at least 20 tests
for a NA/EU interface are likely to
be wide in scope)
Terminal
nodes in Telia
34
2. Hurricane Electric Route Flap
Detected outage
coincides with
spike in AS path
changes
Root cause
analysis points to
Hurricane Electric
and Telx
https://njjgkif.share.thousandeyes.com/
35
Route Flap by Hurricane Electric
Hurricane Electric
Routes flap from
using HE to NTT,
then back to HE
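A spike in AS path changes like the one on these slides can be found by comparing consecutive routing snapshots. The sketch below assumes a hypothetical snapshot format (a timestamp plus a mapping from router to AS path) and a hypothetical `count_as_path_changes` helper; the example data is illustrative, not taken from the outage shown.

```python
def count_as_path_changes(snapshots):
    """Count routers whose AS path changed between consecutive snapshots.

    snapshots: list of (timestamp, {router: as_path_tuple}) in time order.
    Returns a list of (timestamp, number_of_changed_paths).
    """
    changes = []
    for (_, previous), (timestamp, current) in zip(snapshots, snapshots[1:]):
        changed = sum(
            1
            for router, path in current.items()
            if router in previous and previous[router] != path
        )
        changes.append((timestamp, changed))
    return changes


# Illustrative data: a path flaps from Hurricane Electric (AS6939) to
# NTT (AS2914) and back. AS64500 and AS64501 are documentation-range ASNs.
example = [
    (0, {"router-a": (64501, 6939, 64500)}),
    (1, {"router-a": (64501, 2914, 64500)}),
    (2, {"router-a": (64501, 6939, 64500)}),
]
flap_counts = count_as_path_changes(example)
```

Two changes in quick succession that return to the original path are the signature of a flap, as opposed to a single lasting reroute.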
36
Traffic Issues in Hurricane Electric
Hurricane Electric
37
3. NTT and Level 3 Routing Issues Affect JIRA
JIRA saw 0% availability
and 100% packet loss
Most affected
interfaces are in
Ashburn, VA
https://ncigwwph.share.thousandeyes.com/
38
Traffic Terminating in NTT
Traffic paths originally
traversed Level 3 and NTT
Traffic paths then change
to traverse only NTT,
terminating there
39
JIRA’s /24 Prefix Becomes Unreachable
As the primary upstream
ISP, Level 3 is associated
with the most affected routes
Routes through upstream ISPs
NTT and Level 3 all withdrawn
40
Routers Begin Using Misconfigured /16 Prefix
The backup /16 prefix
directs to NTT, not JIRA’s
network. This is why the
traffic path changed to
traverse only NTT,
terminating there when
JIRA’s IP couldn’t be
found in NTT’s network.
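The fallback from the /24 to the /16 follows from longest-prefix matching: routers prefer the most specific route covering a destination, and use a less specific one once the specific route is withdrawn. A minimal sketch with Python's standard `ipaddress` module; the prefixes, next-hop labels, and the `longest_prefix_match` helper are made up for illustration and are not JIRA's real addresses.

```python
import ipaddress


def longest_prefix_match(routes, address):
    """Return the (network, next_hop) pair with the longest matching prefix.

    routes: dict mapping ipaddress network objects to a next-hop label.
    Returns None when no route covers the address.
    """
    ip = ipaddress.ip_address(address)
    matches = [(net, hop) for net, hop in routes.items() if ip in net]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)


# Hypothetical routing table: a specific /24 and a covering /16.
specific = ipaddress.ip_network("198.51.100.0/24")
covering = ipaddress.ip_network("198.51.0.0/16")
routes = {specific: "origin network", covering: "NTT"}

before = longest_prefix_match(routes, "198.51.100.10")

# After the /24 is withdrawn, only the covering /16 remains.
del routes[specific]
after = longest_prefix_match(routes, "198.51.100.10")
```

Here `before` resolves via the /24 and `after` via the /16, which is the same shift the slide describes: traffic follows the covering prefix into a network that cannot deliver it.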
41
What’s Next
Traffic Outages @ Cloud
• IaaS/PaaS (CDNs, hosting, DNS providers)
• SaaS (+ app context)
Routing Outages
• Leaks and hijacks
Outage Event Stream
• Outage geo + topology maps
• Alerts based on outage impact/location/type/etc.
42
Outage Created by Level 3 Flap
https://btwzofam.share.thousandeyes.com
43
Tips for Diagnosing Internet Outages
• Look for purple indicators and the ‘Outage Detected’ dropdown when investigating issues; these indicate detected outages
• Use quick links or select specific nodes/ASes to see how paths have changed over time
• Correlate data from the web, network and routing layers to analyze root cause
• See our blogs and Knowledge Base articles for more info:
– Blog on Traffic Outage Detection
– https://blog.thousandeyes.com/analyzing-internet-issues-traffic-outage-detection/
– Blog on Routing Outage Detection
– https://blog.thousandeyes.com/identifying-root-cause-routing-outage-detection/
– Knowledge Base: https://support.thousandeyes.com/entries/110214366
44
Thank You
@thousandeyes

ThousandEyes at Network Field Day 12
