Outage Analysis: March 5th/6th 2024
Meta, Comcast, and LinkedIn
© 2024 Cisco Systems, Inc. and/or its affiliates. All rights reserved. Cisco Confidential
Featured speakers
Brian Tobia
Technical Marketing Engineer
Kemal Sanjta
Principal Internet Analyst
Before We Begin…
• If you have any questions, please type them in the Questions window.
• If you have any audio problems, please message us in the chat for help.
• A recording of this presentation will be sent to you in a few days.
• Interested in more outage analysis and Internet insights? Check out the ThousandEyes blog and The Internet Report podcast.
Outages Summary
Meta (March 5, 2024)
● Started at ~15:00 UTC and was resolved at 19:27 UTC (4+ hour outage)
● Affected multiple Meta services (Facebook, Instagram, Messenger, Threads)
● Network connectivity was uninterrupted
● Root cause was a failure in the backend authentication system, which made the applications unavailable
Comcast (March 5, 2024)
● Started at 19:45 UTC and was resolved at 21:40 UTC (~2 hour outage)
● ThousandEyes saw increased network latency and timeouts/loss of connectivity within the Comcast backbone network (Texas)
● Affected routing/connectivity to multiple applications such as Webex, Salesforce, and AWS
LinkedIn (March 6, 2024)
● Started at 20:45 UTC and was resolved at 21:50 UTC (~1 hour outage)
● Network connectivity was unaffected
● Application/HTTP service errors were seen, making the service inaccessible
● No ramp-up/ramp-down, indicating a possible back-end failure
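The rounded durations above follow from simple timestamp arithmetic. A minimal Python sketch, using only the start and end times listed on this slide (the Meta start time is approximate, and the dates are taken from the column headers):

```python
from datetime import datetime, timedelta

# Start and end times (UTC) as listed on the Outages Summary slide.
# The Meta start time is approximate (~15:00 UTC).
OUTAGES = {
    "Meta": ("2024-03-05 15:00", "2024-03-05 19:27"),
    "Comcast": ("2024-03-05 19:45", "2024-03-05 21:40"),
    "LinkedIn": ("2024-03-06 20:45", "2024-03-06 21:50"),
}

TIME_FORMAT = "%Y-%m-%d %H:%M"


def duration(start: str, end: str) -> timedelta:
    """Return the elapsed time between two UTC timestamps written in TIME_FORMAT."""
    return datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)


if __name__ == "__main__":
    for name, (start, end) in OUTAGES.items():
        print(f"{name}: {duration(start, end)}")
```

Running it prints 4:27:00 for Meta, 1:55:00 for Comcast, and 1:05:00 for LinkedIn, consistent with the rounded figures on the slide.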
Meta Outage
March 5th, 2024
High Level Summary
● Beginning at approximately 15:00 UTC on March 5, 2024, Meta experienced a significant global disruption that impacted connectivity to many of its applications, including Facebook, Instagram, Messenger, and Threads.
● The platform was reachable, but users could not proceed past authentication.
● Users saw rejected passwords and feeds that would not refresh, and were being logged out.
● No significant network conditions were observed that could have contributed to or caused the outage.
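The reasoning here (network paths clear, HTTP server reachable, login failing) amounts to checking each layer bottom-up and stopping at the first one that fails. A minimal sketch under stated assumptions: the LayerResult fields, the 5% loss threshold, and treating any 5xx or missing response as a server-layer failure are illustrative choices, not ThousandEyes behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayerResult:
    """Hypothetical results from one test round at a single vantage point."""
    path_loss_pct: float          # packet loss along the network path, 0-100
    http_status: Optional[int]    # status of an unauthenticated page fetch; None = no response
    login_ok: bool                # whether a scripted login transaction succeeded


def locate_fault(result: LayerResult, loss_threshold_pct: float = 5.0) -> str:
    """Walk the layers bottom-up and return the first one that appears to fail."""
    if result.path_loss_pct >= loss_threshold_pct:
        return "network"
    if result.http_status is None or result.http_status >= 500:
        return "http server"
    if not result.login_ok:
        return "application (authentication)"
    return "healthy"


if __name__ == "__main__":
    # Meta-like pattern: clean path, page loads, login fails.
    print(locate_fault(LayerResult(path_loss_pct=0.0, http_status=200, login_ok=False)))
```

For the pattern described on this slide, the function returns "application (authentication)": the lower layers pass, so attention moves to the login dependency chain.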
Network Paths Clear
HTTP Server Reachable
Issue Confirmation and Recovery
● At around 15:17 UTC, Meta confirmed that it was experiencing issues with its login services. The issue was likely caused by a failure in one of the dependencies that the login system relies on.
● ThousandEyes observed a gradual recovery of impacted Meta services. At approximately 16:50 UTC, services appeared to have been restored for some users.
● ThousandEyes confirmed that by 18:40 UTC, users in the majority of regions were able to connect.
● Finally, at 19:27 UTC, Meta officially announced that the issue was fully resolved.
Sharelink Demo
Comcast Outage
March 5th, 2024
High Level Summary
● At approximately 19:45 UTC on March 5th, ThousandEyes began observing outage conditions in parts of Comcast’s network.
● The outage impacted the reachability of many applications and services, including Webex, Salesforce, and AWS.
● The outage appears to have impacted traffic as it traversed Comcast’s network backbone in Houston, Texas, including traffic that originated in regions such as California and Colorado.
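Pinning an outage to a specific part of a backbone, such as the Houston POP here, comes down to finding where loss begins along the path. The sketch below is an illustrative heuristic, not how ThousandEyes computes its path visualization: it takes hypothetical per-hop loss percentages ordered from source to destination and reports the hop where loss begins and persists to the destination. Isolated loss at a single intermediate hop is ignored, since it is often just ICMP rate limiting on that router. Hop labels and the 10% threshold are assumptions.

```python
from typing import Optional, Sequence, Tuple


def loss_origin(hops: Sequence[Tuple[str, float]], threshold_pct: float = 10.0) -> Optional[str]:
    """Return the hop where sustained packet loss begins, or None.

    hops: (hop_label, loss_percent) pairs ordered from source to destination.
    Only loss that persists through every later hop, including the
    destination, is reported.
    """
    origin: Optional[str] = None
    for label, loss_pct in hops:
        if loss_pct >= threshold_pct:
            if origin is None:
                origin = label   # candidate start of forwarding loss
        else:
            origin = None        # loss did not persist; discard the candidate
    return origin


if __name__ == "__main__":
    # Hypothetical path: loss appears in the backbone and persists to the destination.
    path = [
        ("source-edge", 0.0),
        ("backbone-hop-1", 0.0),
        ("backbone-hop-2", 62.0),
        ("peering-hop", 65.0),
        ("destination", 70.0),
    ]
    print(loss_origin(path))
```

Running it against many vantage points and seeing the same origin hop across sources in different regions is what turns a single trace into evidence of a shared network fault.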
Network Path Failures and Packet Loss
Comcast POP Network Architecture
Comcast’s Houston POP During the Outage
Issue Identification and Confirmation
● This outage was initially misattributed to application issues rather than a network outage.
● Many consumer applications did not see issues because their content was cached and served locally from CDNs.
● This outage mainly affected enterprise applications, which require traffic to be routed to centralized locations, or east/west within provider networks, to access data rather than having it served locally.
● Application providers confirmed ongoing issues caused by an external network provider.
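One way to avoid misattributing a network outage to the application is to compare the paths of vantage points that can reach the application with those that cannot. A minimal sketch, assuming you already know which networks (provider names or ASNs) each vantage point's path crosses; all agent and network names below are hypothetical.

```python
from typing import Dict, Set


def suspect_networks(failing: Dict[str, Set[str]], passing: Dict[str, Set[str]]) -> Set[str]:
    """Return networks on the path of every failing vantage point but of no passing one.

    Each dict maps a vantage-point name to the set of networks observed
    on its path to the application.
    """
    if not failing:
        return set()
    suspects = set.intersection(*failing.values())  # new set; inputs are not modified
    for networks in passing.values():
        suspects -= networks
    return suspects


if __name__ == "__main__":
    failing = {
        "agent-west": {"access-isp-1", "transit-x", "app-network"},
        "agent-mountain": {"access-isp-2", "transit-x", "app-network"},
    }
    passing = {"agent-east": {"access-isp-3", "app-network"}}
    print(suspect_networks(failing, passing))
```

Here the application's own network drops out because a passing agent also reaches it, leaving the shared transit provider as the suspect. If no vantage point is passing, the result still contains the application's own network, so this check alone cannot separate an application failure from a shared-path failure.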
Issue Recovery
● The outage continued for around two hours and was resolved by 21:40 UTC.
● Part of the impact of this issue was due to how AWS routes traffic into its network.
● AWS relies more on the public Internet to transit users into its backbone network, whereas Azure and GCP prefer users to enter their networks closer to the source.
● Applications hosted in AWS may have felt more impact because users’ traffic traversed Comcast’s network to reach them.
● How can you recover once you’ve detected a provider issue? Leverage VPN/SSE to route around the issue across multiple providers.
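Routing around a provider issue presupposes a way to compare the candidate paths. A minimal sketch, assuming you already collect recent loss and latency for each egress option (the PathMetrics fields, provider names, and 2% loss budget are hypothetical): drop any path over the loss budget, then prefer the lowest latency among the rest.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PathMetrics:
    provider: str       # hypothetical label for an upstream, VPN, or SSE egress option
    loss_pct: float     # recent packet loss, percent
    latency_ms: float   # recent median latency, milliseconds


def choose_egress(paths: Iterable[PathMetrics], max_loss_pct: float = 2.0) -> Optional[str]:
    """Pick the lowest-latency provider within the loss budget; None if all are degraded."""
    healthy = [p for p in paths if p.loss_pct <= max_loss_pct]
    if not healthy:
        return None
    return min(healthy, key=lambda p: p.latency_ms).provider


if __name__ == "__main__":
    candidates = [
        PathMetrics("provider-a", loss_pct=35.0, latency_ms=180.0),  # degraded provider
        PathMetrics("provider-b", loss_pct=0.0, latency_ms=42.0),
        PathMetrics("provider-c", loss_pct=0.5, latency_ms=38.0),
    ]
    print(choose_egress(candidates))
```

Filtering on loss before ranking on latency keeps a fast but lossy path from winning. A real deployment would also add hysteresis so traffic does not flap between providers on every measurement round.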
Sharelink Demo
LinkedIn Outage
March 6th, 2024
High Level Summary
● On March 6, ThousandEyes detected an almost two-hour service disruption for global users of LinkedIn that manifested as “service unavailable” error messages, suggesting a backend application issue.
● This global outage was a “lights on/lights off” type of scenario.
○ There was no gradual performance decline, which would have indicated load issues.
○ Requests timing out at the origin suggest that the resource suddenly went away.
● HTTP and receive errors were seen throughout the issue, indicating that an application component had failed.
○ Page load times increased, and many tests saw “operation cancelled” errors.
● Network paths were unaffected and showed no errors or packet loss.
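The “lights on/lights off” distinction can be made mechanical by looking at the largest single-interval drop in availability. A minimal sketch; the per-interval success ratios and the 0.5 step threshold are illustrative assumptions, not ThousandEyes parameters.

```python
from typing import Sequence


def classify_drop(availability: Sequence[float], step_threshold: float = 0.5) -> str:
    """Classify how availability changed, given per-interval success ratios (0.0-1.0).

    'abrupt'  : some single interval-to-interval drop is at least step_threshold
                (lights on/lights off).
    'gradual' : availability fell overall, but never by that much in one step.
    'stable'  : no net decline and no large single drop.
    """
    if len(availability) < 2:
        return "stable"
    biggest_step = max(prev - cur for prev, cur in zip(availability, availability[1:]))
    if biggest_step >= step_threshold:
        return "abrupt"
    if availability[-1] < availability[0]:
        return "gradual"
    return "stable"


if __name__ == "__main__":
    print(classify_drop([1.0, 1.0, 0.98, 0.0, 0.0]))       # sudden failure
    print(classify_drop([1.0, 0.9, 0.8, 0.7, 0.6, 0.5]))   # slow degradation
```

A gradual decline points toward load or capacity problems, while an abrupt step points toward a component that failed outright, which is the inference this slide draws for LinkedIn.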
Issue Confirmation and Recovery
● LinkedIn acknowledged the issue at 20:50 UTC.
● As the outage progressed, tests returned HTTP 503 Service Unavailable responses.
○ This indicates that the back-end service was not accessible.
● As the service began to recover, the tests switched to showing HTTP 502 Bad Gateway.
○ This can indicate that an invalid response was received: the backend was available but not sending proper data.
○ It can also sometimes be a sign of system load as a service is being restarted or restored.
● ThousandEyes saw access to the LinkedIn web application begin to recover around 21:38 UTC, and LinkedIn posted that the issue was resolved as of 22:05 UTC.
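The status-code progression described here (503 during the outage, 502 as the service recovered) can be summarized programmatically. A minimal sketch using Python’s standard http.HTTPStatus for reason phrases; the triage hints only restate this slide’s interpretation, and the sequence of observed statuses is hypothetical.

```python
from http import HTTPStatus
from itertools import groupby
from typing import List, Sequence, Tuple

# Triage hints restating the interpretation on this slide. A status code alone
# does not prove which component failed; treat these as starting points.
TRIAGE_HINTS = {
    503: "back-end service not accessible",
    502: "invalid upstream response; back end may be up but not sending proper data (e.g. while restarting)",
}


def describe(status: int) -> str:
    """Return the standard reason phrase plus a triage hint when one is defined."""
    phrase = HTTPStatus(status).phrase
    hint = TRIAGE_HINTS.get(status)
    return f"{status} {phrase}" + (f": {hint}" if hint else "")


def phases(statuses: Sequence[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive identical status codes into (status, run_length) pairs."""
    return [(code, len(list(run))) for code, run in groupby(statuses)]


if __name__ == "__main__":
    # Hypothetical per-round results for one test: healthy, outage, recovery, healthy.
    observed = [200, 503, 503, 503, 502, 502, 200]
    for code, count in phases(observed):
        print(f"{count} round(s): {describe(code)}")
```

Collapsing rounds into phases makes the 503 to 502 to 200 transition easy to spot, which is the signal this slide uses to mark the start of recovery.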
Sharelink Demo
Lessons and Takeaways
● Keep business moving by having a backup solution in place for when an outage happens.
○ For example, while LinkedIn was down, some companies and individuals may have turned to posting updates on X, Facebook, or other social media platforms.
● Authentication is a crucial step in accessing a service. A failure at this step can impact the entire application delivery chain, causing major disruptions for users.
● A complete view of the entire system is essential for identifying any decrease in performance or functionality.
○ Make sure you are testing all components, including third-party APIs.
● Knowing as quickly as possible that performance has drifted from desired levels is critical to reducing the pain to your customers, users, and partners.
● When services you rely on experience issues, a good first question is whether the problem lies with you, with the application, or with a third-party provider.
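Detecting drift quickly requires a baseline to compare against. One common, simple approach is a k-sigma rule over recent history. The sketch below is illustrative only: k=3, the 10-sample minimum, and the spread floor are assumptions, not how any particular monitoring product alerts.

```python
from statistics import mean, stdev
from typing import Sequence


def drifted(
    history: Sequence[float],
    latest: float,
    k: float = 3.0,
    min_samples: int = 10,
    min_spread: float = 1.0,
) -> bool:
    """Return True if `latest` exceeds the baseline by more than k standard deviations.

    history: recent samples of a metric where higher is worse (e.g. page load time, ms).
    min_spread: floor on the standard deviation, in the metric's units, so a perfectly
    flat history does not turn every small uptick into an alert.
    """
    if len(history) < min_samples:
        return False  # too little data for a meaningful baseline
    baseline = mean(history)
    spread = max(stdev(history), min_spread)
    return latest > baseline + k * spread


if __name__ == "__main__":
    page_load_ms = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0, 100.0, 100.0]
    print(drifted(page_load_ms, 250.0))  # well outside the baseline
    print(drifted(page_load_ms, 104.0))  # within normal variation
```

The same check applies to each tested component, including third-party APIs, so that a drift in one dependency is flagged before users report a full outage.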
Sharelinks
Comcast: https://abxcsltbdauvkratsbjcuveindsjxenn.share.thousandeyes.com/view/internet-insights/?roundId=1709668800&metric=interfaces&scenarioId=outageTraffic&filters=N4IgZglgNgLgpgJwM4gFygIYActQgYwxggHsA7AERIFsMIyAVDAczQG0BdAGhG1wKKkyAeQCuMFnADK%2BEljjsQAQQBiKgKIBhBuooB9HVIZ7hAJT0BJAHJ7NShkoAywgOIhuvJGXYB2AJwATAEeMAgYYJD4YhLM0rLyiqoa2roG6kYm5ta29k6u7gC%2BBUA
LinkedIn: https://aqhlwhhcznjuwoweapziutmrjmkrrexg.share.thousandeyes.com
Next Steps
Blog & Podcast
Subscribe to our blog to keep up to date: thousandeyes.com/blog/
Tune in to The Internet Report Podcast: thousandeyes.com/the-internet-report/
Check out the in-depth Comcast Outage Analysis: thousandeyes.com/blog/comcast-outage-analysis-march-5-2024
Learning Resources
New tutorial videos on our features: thousandeyes.com/resources/?cat=tutorial
New Getting Started Guides: docs.thousandeyes.com/product-documentation/getting-started