
Diagnosing Internet Outages


Internet Outage Detection allows users to rapidly detect both network and routing outages and understand their scope and likely root cause. Explore data from major outages and learn to use Internet Outage Detection to diagnose issues and their impact. See the webinar recording at https://www.thousandeyes.com/resources/diagnosing-internet-outages-webinar

Published in: Technology


  1. Diagnosing Internet Outages. Young Xu, Product Marketing Analyst
  2. About ThousandEyes. ThousandEyes delivers visibility into every network your organization relies on. Founded by network experts; strong investor backing. Relied on for critical operations by leading enterprises: 27 Fortune 500 companies, 5 of the top 5 SaaS companies, 4 of the top 6 US banks. Recognized as an innovative new approach.
  3. I see an outage. Is this affecting just this one test, just me, or everybody?
  4. Overview: Internet Outage Detection
     • Traffic Outage Detection: Detect outages in ISPs and understand their impact, both globally and as it relates to your organization.
     • Routing Outage Detection: See the global and account scope, as well as the likely root cause, of BGP reachability outages.
  5. How Traffic Outage Detection Works
     1. Anonymized traffic data is aggregated from all tests across the entire user base.
     2. Algorithms then look for patterns in path traces terminating in the same ISP.
     Diagram labels: New York Cloud Agent, Boston Enterprise Agent, Los Angeles Cloud Agent; Level 3 in San Jose, Cogent in Denver; Salesforce, Google, NY Times; Customer 1, Customer 2.
  6. Traffic Outage Detection. Screenshot callouts: Account scope; Global scope; Severity and scope of the issue at this interface.
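
The two steps on the "How Traffic Outage Detection Works" slide (aggregate path traces from all tests, then look for traces terminating in the same ISP) can be illustrated with a minimal Python sketch. The `PathTrace` record, the `find_terminating_clusters` function, and the `min_tests` threshold are all hypothetical names and values chosen for illustration; the deck does not describe the actual data model or algorithm.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class PathTrace(NamedTuple):
    # Hypothetical record layout, assumed for this sketch only.
    test_id: str                   # anonymized test identifier
    reached_target: bool           # True if the trace got to its destination
    last_isp: Optional[str]        # ISP owning the last responding hop
    last_location: Optional[str]   # location of that hop


def find_terminating_clusters(traces, min_tests=2):
    """Group failed traces by the ISP/location where they terminate.

    Returns {(isp, location): number of distinct tests} for every
    cluster containing at least `min_tests` distinct tests.
    """
    tests_by_node = defaultdict(set)
    for t in traces:
        if t.reached_target or t.last_isp is None:
            continue  # healthy trace, or nothing to attribute it to
        tests_by_node[(t.last_isp, t.last_location)].add(t.test_id)
    return {
        node: len(tests)
        for node, tests in tests_by_node.items()
        if len(tests) >= min_tests
    }


traces = [
    PathTrace("test-a", False, "Level 3", "San Jose"),
    PathTrace("test-b", False, "Level 3", "San Jose"),
    PathTrace("test-c", False, "Cogent", "Denver"),
    PathTrace("test-d", True, None, None),
]
clusters = find_terminating_clusters(traces)
print(clusters)
```

With this sample data only the Level 3 node in San Jose is reported, because two distinct tests terminate there while the single Cogent failure stays below the illustrative threshold.
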
  7. Routing Outage Detection. Aggregates reachability issues in routing data from 300+ public monitors. Screenshot callouts: Global scope; Account scope; Root cause analysis.
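
As a rough illustration of aggregating reachability across route monitors, the sketch below computes, per prefix, the fraction of monitors that still hold a route and flags prefixes that fall below a cutoff. The function names, the monitor names, the `max_reachability` cutoff, and the sample prefixes (taken from the documentation address ranges) are assumptions made for this example, not details from the deck.

```python
def reachability(monitors_with_route, all_monitors):
    """Fraction of monitors that currently hold a route to a prefix."""
    all_monitors = set(all_monitors)
    if not all_monitors:
        raise ValueError("need at least one monitor")
    return len(set(monitors_with_route) & all_monitors) / len(all_monitors)


def affected_prefixes(routes_by_prefix, all_monitors, max_reachability=0.5):
    """Return {prefix: reachability} for prefixes below the cutoff."""
    result = {}
    for prefix, monitors in routes_by_prefix.items():
        r = reachability(monitors, all_monitors)
        if r < max_reachability:
            result[prefix] = r
    return result


monitors = {"m1", "m2", "m3", "m4"}  # hypothetical monitor names
routes_by_prefix = {
    "192.0.2.0/24": {"m1"},                        # only 1 of 4 monitors has a route
    "198.51.100.0/24": {"m1", "m2", "m3", "m4"},   # fully reachable
}
flagged = affected_prefixes(routes_by_prefix, monitors)
print(flagged)
```

A real system would work from BGP update feeds over time rather than a static snapshot; the snapshot keeps the example short.
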
  8. Recent Major Outages Detected
     • April 23: Hurricane Electric route leak affecting AWS
     • May 3: Trans-Atlantic issues in Level 3 (https://blog.thousandeyes.com/trans-atlantic-issues-level-3-network/)
     • May 20: Tata and TISparkle issues with submarine cable (https://blog.thousandeyes.com/smw-4-cable-fault-ripple-effects-across-networks/)
     • June 6: Hurricane Electric removed >500 prefixes
     • June 24: Tata cable cut in Singapore affecting Dropbox
     • July 10: Level 3, NTT routing issues affecting JIRA (https://blog.thousandeyes.com/identifying-root-cause-routing-outage-detection/)
     • July 17: Widespread issues in Telia’s network in Ashburn (https://blog.thousandeyes.com/analyzing-internet-issues-traffic-outage-detection/)
  9. Tips for Diagnosing Internet Outages
     • Look for purple indicators and the ‘Outage Detected’ dropdown when investigating issues; these indicate detected outages.
     • Use quick links or select specific nodes/ASes to see how paths have changed over time.
     • Correlate data from the web, network and routing layers to analyze root cause.
     • See our blogs and Knowledge Base articles for more info:
       – Blog on Traffic Outage Detection: https://blog.thousandeyes.com/analyzing-internet-issues-traffic-outage-detection/
       – Blog on Routing Outage Detection: https://blog.thousandeyes.com/identifying-root-cause-routing-outage-detection/
       – Knowledge Base: https://support.thousandeyes.com/entries/110214366
  10. Demo
  11. Demo 1: Network Layer Issues in Telia in Ashburn. Detected outage coincides with packet loss spikes. Ashburn, VA is “ground zero” for this outage.
  12. Specific Failure Points in Telia. High severity and wide scope (outages affecting at least 20 tests for a NA/EU interface are likely to be wide in scope). Terminal nodes in Telia.
  13. Demo 2: Hurricane Electric Route Flap Affects Telx. Detected outage coincides with spike in AS path changes. Root cause analysis points to Hurricane Electric and Telx.
  14. Route Flap by Hurricane Electric. Routes flap from using Hurricane Electric (HE) to NTT, then back to HE.
  15. Causing Traffic Issues in Hurricane Electric
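
The route flap in this example (routes moving from Hurricane Electric to NTT and back) shows up as a spike in AS path changes. The sketch below counts path changes in a time-ordered list of observed AS paths and checks whether the route returned to an earlier path. Treating "returned to an earlier path" as the definition of a flap is a simplifying assumption for illustration, and the path tuples use provider names rather than real AS numbers.

```python
def collapse(paths):
    """Drop consecutive duplicates from a time-ordered list of AS paths."""
    out = []
    for p in paths:
        if not out or out[-1] != p:
            out.append(p)
    return out


def count_path_changes(paths):
    """Number of times the observed AS path changed."""
    return max(len(collapse(paths)) - 1, 0)


def is_flap(paths):
    """True if the route left a path and later returned to it."""
    collapsed = collapse(paths)
    return len(set(collapsed)) < len(collapsed)


# Hypothetical observations from one monitor for one Telx prefix.
via_he = ("monitor", "Hurricane Electric", "Telx")
via_ntt = ("monitor", "NTT", "Telx")
observed = [via_he, via_he, via_ntt, via_he]

print(count_path_changes(observed), is_flap(observed))
```

Paths are stored as tuples so they can be placed in a set when checking for a repeated path.
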
  16. Demo 3: NTT and Level 3 Routing Issues Affect JIRA. JIRA saw 0% availability and 100% packet loss. Most affected interfaces are in Ashburn, VA.
  17. Traffic Terminating in NTT. Traffic paths originally traversed Level 3 and NTT. Traffic paths then change to traverse only NTT, terminating there.
  18. JIRA’s /24 Prefix Becomes Unreachable. As the primary upstream ISP, Level 3 is associated with the most affected routes. Routes through upstream ISPs NTT and Level 3 are all withdrawn.
  19. Routers Begin Using Misconfigured /16 Prefix. The backup /16 prefix directs to NTT, not JIRA’s network. This is why the traffic path changed to traverse only NTT, terminating there when JIRA’s IP couldn’t be found in NTT’s network.
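
The switch from the /24 to the /16 follows from longest-prefix matching: routers prefer the most specific route covering a destination, and fall back to a less specific covering prefix once the specific one is withdrawn. The sketch below shows this with Python's standard `ipaddress` module. The addresses are private-range placeholders and the next-hop labels are illustrative; they are not JIRA's actual prefixes.

```python
import ipaddress


def best_route(table, destination):
    """Return (prefix, next_hop) for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)


# Hypothetical routing table: a specific /24 and a covering /16.
specific = ipaddress.ip_network("10.20.30.0/24")
covering = ipaddress.ip_network("10.20.0.0/16")
table = {
    specific: "toward JIRA via Level 3",
    covering: "toward NTT",
}

before = best_route(table, "10.20.30.7")   # the /24 wins while it is announced
del table[specific]                        # the /24 is withdrawn
after = best_route(table, "10.20.30.7")    # traffic now follows the /16

print(before[0], "->", before[1])
print(after[0], "->", after[1])
```

After the withdrawal the same destination address resolves to the /16, which matches the observation that paths ended inside NTT.
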
  20. See what you’re missing. Watch the webinar: https://www.thousandeyes.com/resources/diagnosing-internet-outages-webinar
