3. Before We Begin...
• If you have any questions, please type them in the Questions window.
• If you have any audio problems, please reach out via chat for help.
• A recording of this presentation will be sent to you in a few days.
5. Actionable Insight for Internet, Cloud, and SaaS
• Correlated Insights: quickly isolate issues to app, network, or service
• Network Visibility: overlay, hop-by-hop underlay, ISP performance, and BGP routing
• App Experience: SaaS, API, and internal app performance and user experience
6. See the Internet Like It’s Your Own Network
[Diagram: path from Your Network through your ISP to the Cloud Provider, with vantage points in Chicago, IL; Paris, France; and Moscow, Russia]
• Visualize the link between network topologies and service delivery
• Rapidly isolate problem domain and owner
8. ThousandEyes Internet Insights: App Outages
[Graphic: top business SaaS apps by category – Dev Tools, Communication Tools, Human Resources, Social Networking, Finance, eCommerce, Sales & Marketing, Collaboration Tools]
• Global View of SaaS App Availability
• Accelerated & Empowered IT Operations
• Data-driven Vendor Governance
10. Amazon Web Services – At a Glance
• Availability Zones (see the sketch after this list)
• Key Components
– EC2 – compute
– S3 – storage
– API Gateway
• Ecosystem
– 200+ services
• US-EAST-1 outsized interdependency
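A minimal sketch of how a region's Availability Zones can be enumerated, assuming boto3 is installed and AWS credentials are configured locally; us-east-1 is used purely as an example region.

```python
import boto3

# Assumes boto3 is installed and AWS credentials are configured locally.
# Lists the Availability Zones that make up one region (us-east-1 here),
# the region where much of the 12/7 impact was concentrated.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.describe_availability_zones(
    Filters=[{"Name": "state", "Values": ["available"]}]
)

for zone in response["AvailabilityZones"]:
    print(zone["ZoneName"], zone["State"])
```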
11. Application Programming Interface (API)
• Enables communication between disparate applications/systems (see the sketch below)
• Increased application complexity
• Interdependencies and domino effects
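As a concrete illustration of one application depending on another's API, here is a minimal Python sketch; the endpoint URL is hypothetical, and a real front end typically fans out to many such calls, which is how one failing API produces domino effects.

```python
import requests

# Hypothetical backend API endpoint -- a placeholder, not a real service.
INVENTORY_API = "https://api.example.com/v1/inventory/12345"

def get_inventory():
    # Any feature built on this call inherits its failures: if the API
    # times out or errors, the feature degrades along with it.
    try:
        resp = requests.get(INVENTORY_API, timeout=3)
        resp.raise_for_status()
        return resp.json()
    except requests.RequestException as exc:
        # This is where domino effects start: the caller must decide
        # whether to retry, serve stale data, or fail the page.
        print(f"inventory API unavailable: {exc}")
        return None
```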
12. Amazon API Gateway
• Gatekeeper for backend APIs in AWS (example call below)
• Capable of processing hundreds of thousands of concurrent API calls
• AWS offers its own internal services to customers via API Gateway
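A hedged sketch of calling an API fronted by Amazon API Gateway. The invoke URL below is hypothetical (the real format is https://{api-id}.execute-api.{region}.amazonaws.com/{stage}/...), and the retry-with-backoff loop is one common way callers cope with throttling and elevated error rates like those seen during the incident, not an AWS-prescribed pattern.

```python
import time
import requests

# Hypothetical API Gateway invoke URL -- replace with a real stage URL.
URL = "https://abc123.execute-api.us-east-1.amazonaws.com/prod/orders"

def call_with_backoff(url, attempts=4):
    delay = 0.5
    for _ in range(attempts):
        try:
            resp = requests.get(url, timeout=5)
            # 429s and 5xx responses are what callers saw more of during
            # the outage window; anything else is returned to the caller.
            if resp.status_code < 500 and resp.status_code != 429:
                return resp
        except requests.RequestException:
            pass  # network-level failure: fall through and retry
        time.sleep(delay)
        delay *= 2  # exponential backoff between retries
    return None
```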
14. 12/7 – Event Sequence as Observed by ThousandEyes
• 1532 UTC – Outage begins
• 1535 UTC – Server response failures
• 1640 UTC – AWS status page: first mention
• 1712 UTC – AWS API transaction times increase
• 0100 UTC – Return to normal
15. 12/7 – Event Sequence from Amazon RCA
• 1530 UTC – Multiple services impacted due to congestion from automated activity
• 1533 UTC – EC2 API errors and increased latency
• 1728 UTC – Internal DNS remediation, issues still persist
• Ongoing network congestion remediation measures
• 2134 UTC – Significant alleviation of network congestion
• 2135 UTC – Container API begins to return to normal
• 2222 UTC – Network devices and AWS Console access "all clear"
• 2230 UTC – Route 53 APIs "all clear"
• 2240 UTC – EC2 "all clear"
• 0041 UTC – API Gateways recovered
16. 12/10 – Event Sequence as Observed by ThousandEyes
• 1305 UTC – Outage begins
• Server response failures
• Brief clear, followed by resumption
• 1430 UTC – Return to normal
18. Lessons and Takeaways
• Understand your network and application interdependencies
– Front-end interfaces often depend on many back-end APIs
• How does your cloud provider work?
– Understand architecture and interdependencies
– Single AZ, multi-AZ, multi-cloud
– AWS ≠ Azure ≠ GCP
• Inform your Incident Response / Outage Management
– Specific guidance when issues take place
– Example: we’re seeing 2x API response times and it is impacting x, y, z across all zones
• Independent visibility and verification are needed (see the sketch below)
– Don’t just depend on the status page!
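One lightweight way to get independent verification, separate from the provider's status page, is to probe the service yourself. The sketch below times a request against an API endpoint and flags when responses exceed roughly 2x a known baseline; the endpoint, baseline, and 2x threshold are illustrative assumptions, and a monitoring platform would run such probes from many vantage points rather than one host.

```python
import time
import requests

# Illustrative values -- substitute your own endpoint and a measured baseline.
ENDPOINT = "https://api.example.com/healthz"
BASELINE_SECONDS = 0.25            # typical response time under normal conditions
THRESHOLD = 2 * BASELINE_SECONDS   # flag responses taking ~2x longer than usual

def probe(url):
    # Measure wall-clock time for a single request; report failures as None.
    start = time.monotonic()
    try:
        resp = requests.get(url, timeout=10)
        return resp.status_code, time.monotonic() - start
    except requests.RequestException:
        return None, time.monotonic() - start

if __name__ == "__main__":
    status, elapsed = probe(ENDPOINT)
    if status is None or status >= 500:
        print(f"FAIL: no healthy response after {elapsed:.2f}s")
    elif elapsed > THRESHOLD:
        print(f"WARN: {elapsed:.2f}s response (~{elapsed / BASELINE_SECONDS:.1f}x baseline)")
    else:
        print(f"OK: {status} in {elapsed:.2f}s")
```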
19. Next Steps
• Subscribe! https://blog.thousandeyes.com
• Get a real-time view of the health of the Internet: https://thousandeyes.com/outages
• Sign up for a Free Trial: https://www.thousandeyes.com/signup
• Request a demo: https://www.thousandeyes.com/request-demo