Zombie routes
Paweł Małachowski, 2020.09.29
@pawmal80
Agenda
1. BGP withdrawals and zombie routes
2. Real life cases
3. Detection and debugging
4. Zombie risk mitigation
whoami(1)
 Atende Software
 redGuardian DDoS mitigation
 my previous talks: DPDK, DPI/regexp, DUT performance testing, BGP hijacks
https://www.slideshare.net/atendesoftware/presentations
 Previously
 Netia S.A.
 ATM S.A.
 local hosting and ISP companies, community network
 Roles: system engineer, IT operations lead, business analyst
BGP withdrawals and zombie routes
BGP zombie / ghost route
 „an active routing table entry for a prefix that has been withdrawn
by its origin network”
source: https://labs.ripe.net/Members/romain_fontugne/bgp-zombies (2019)
see also: „BGP Zombies: an Analysis of Beacons Stuck Routes” (2019),
https://www.iij-ii.co.jp/en/members/romain/pdf/romain_pam2019.pdf
 not a new phenomenon
 Ghost Route Hunter (2003): https://www.sixxs.net/tools/grh/what/
 „An overview of the global IPv6 routing table” (2005):
https://meetings.ripe.net/ripe-50/presentations/ripe50-plenary-tue-ipv6-routing.pdf
 may take hours/days to „expire”
BGP zombie / ghost route
 Who cares?
It was withdrawn anyway!
 Unless we are talking about
 a partial withdrawal: some ingress traffic takes a different path than
you expect, does not converge, or even loops
 a more-specific route whose zombie sits in Tier1/Tier2/NSP/IXP
infrastructure, causing a partial or complete outage
More-specific prefix usage examples
 Traffic engineering
 Announce 10.0.0.0/23 into global table
 Announce 10.0.0.0/24 to some IXP peers to override their local prefs
 Customer delegation
 ISP1 announces 10.1.0.0/16 PA block
 ISP1 delegates 10.1.2.0/24 to customer
 Customer runs own BGP, announces 10.1.2.0/24 via ISP1, ISP2 and IXP
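Why a more specific matters: routers forward on the longest matching prefix, so wherever the /24 is present it overrides the covering /23. A minimal sketch of that rule, assuming a toy routing table (the next-hop labels "transit" and "ixp-peer" are made up for illustration):

```python
import ipaddress

def longest_prefix_match(table, destination):
    """Return (prefix, next_hop) of the most specific route covering destination, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        return None
    # The route with the longest prefix length wins.
    return max(matches, key=lambda item: item[0].prefixlen)

# Hypothetical table of an IXP peer: the /23 arrives via transit, the /24 via the IXP.
table = {
    ipaddress.ip_network("10.0.0.0/23"): "transit",
    ipaddress.ip_network("10.0.0.0/24"): "ixp-peer",
}

print(longest_prefix_match(table, "10.0.0.10"))  # covered by both, the /24 wins
print(longest_prefix_match(table, "10.0.1.10"))  # covered only by the /23
```

If the /24 is withdrawn but lingers as a zombie in some table, that table keeps preferring it over the healthy /23.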
Real life cases
2016 (TPNET-OTI loop)
 Orange PL (5617) – Opentransit (5511)
 Zombie AS path: 5511 1299 24724 57811 201029 x
 Looking glass:
 TPNET sees (zombie) more specific via OTI
 OTI has less specific via TPNET
 I gave up after a 20-minute outage and re-announced
the more specific to save „x”
 Withdrawn later with no issues
2016 (Interoute/AS8928 hijack)
1. Warsaw: PLIX, THINX, NASK
2. Interoute: Prague, Paris, Madrid
3. NTT Madrid
4. Telia: Madrid, Hamburg
5. Warsaw: TPNET
6. Customer
2016 (Interoute/AS8928 hijack)
• a zombie /24 route via NTT at former Interoute/Madrid hijacked
a significant part of the ingress traffic
• luckily, no loop; the trace reaches the customer in Warsaw
• lasted many hours, finally „fixed” by announce/withdraw flaps
2018 (Telia loop)
Massive outage after
„1299 3356 …”
path withdrawal
2018 (Telia loop)
• 1299 announces zombie route
• hijacks and loops large portion of ingress traffic
• we reproduced this problem with another, non-production prefix
• ~two days of disaster!
• „Routeprocessor Switchover in one of our backbone router in Chicago
solved the issue”
2020 (TATA-Level3 loop)
Router: gin-n0v-tcore1
Site: US, New York, N0V
Command: traceroute inet4 x as-number-lookup
traceroute to x (x), 30 hops max, 52 byte packets
1 if-ae-7-5.tcore1.nto-newyork.as6453.net (63.243.128.141) 2.990 ms 1.545 ms 1.369 ms
MPLS Label=415563 CoS=0 TTL=1 S=1
2 if-ae-9-2.tcore1.n75-newyork.as6453.net (63.243.128.122) 1.653 ms 1.704 ms 1.439 ms
3 ae-7.edge2.NewYorkCity6.Level3.net (4.68.39.49) [AS 3356] 3.038 ms 1.118 ms 3.086 ms
4 ae-1-3103.ear3.Frankfurt1.Level3.net (4.69.163.86) [AS 3356] 82.672 ms 81.989 ms 82.221 ms
5 ix-ae-18-0.tcore1.fr0-frankfurt.as6453.net (195.219.50.49) 82.072 ms 81.949 ms 81.731 ms
6 if-ae-4-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.17) 87.154 ms if-ae-59-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.194) 87.064 ms 87.038 ms
MPLS Label=486720 CoS=0 TTL=1 S=1
7 if-ae-30-2.tcore1.pvu-paris.as6453.net (80.231.153.89) 86.645 ms if-ae-9-3.tcore1.pvu-paris.as6453.net (195.219.87.14) 87.036 ms if-ae-9-2.tcore1.pvu-paris.as6453.net (195.219.87.10) 87.412 ms
MPLS Label=345609 CoS=0 TTL=1 S=1
8 if-ae-11-2.tcore1.pye-paris.as6453.net (80.231.153.50) 87.357 ms 87.522 ms 86.774 ms
MPLS Label=525823 CoS=0 TTL=1 S=1
9 if-ae-3-2.tcore1.l78-london.as6453.net (80.231.154.143) 87.089 ms 86.984 ms 87.120 ms
MPLS Label=558832 CoS=0 TTL=1 S=1
10 if-ae-66-2.tcore2.nto-newyork.as6453.net (80.231.130.106) 86.711 ms 86.872 ms 87.689 ms
MPLS Label=300093 CoS=0 TTL=1 S=1
11 if-ae-12-2.tcore1.n75-newyork.as6453.net (66.110.96.5) 86.838 ms 86.749 ms 86.667 ms
12 ae-7.edge2.NewYorkCity6.Level3.net (4.68.39.49) [AS 3356] 87.039 ms 86.777 ms 108.465 ms
13 ae-1-3103.ear3.Frankfurt1.Level3.net (4.69.163.86) [AS 3356] 167.903 ms 167.436 ms 167.919 ms
14 ix-ae-18-0.tcore1.fr0-frankfurt.as6453.net (195.219.50.49) 167.316 ms 167.016 ms 167.156 ms
15 if-ae-4-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.17) 172.082 ms 172.347 ms if-ae-59-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.194) 172.688 ms
MPLS Label=486720 CoS=0 TTL=1 S=1
16 if-ae-9-3.tcore1.pvu-paris.as6453.net (195.219.87.14) 172.403 ms if-ae-9-2.tcore1.pvu-paris.as6453.net (195.219.87.10) 177.623 ms 172.588 ms
MPLS Label=345609 CoS=0 TTL=1 S=1
17 if-ae-11-2.tcore1.pye-paris.as6453.net (80.231.153.50) 173.956 ms 176.402 ms 172.581 ms
MPLS Label=525823 CoS=0 TTL=1 S=1
18 if-ae-3-2.tcore1.l78-london.as6453.net (80.231.154.143) 172.784 ms 172.592 ms 172.921 ms
MPLS Label=558832 CoS=0 TTL=1 S=1
19 if-ae-66-2.tcore2.nto-newyork.as6453.net (80.231.130.106) 172.660 ms 172.503 ms 172.937 ms
MPLS Label=300093 CoS=0 TTL=1 S=1
20 if-ae-12-2.tcore1.n75-newyork.as6453.net (66.110.96.5) 172.258 ms 172.540 ms 171.995 ms
21 ae-7.edge2.NewYorkCity6.Level3.net (4.68.39.49) [AS 3356] 183.732 ms 171.950 ms 172.068 ms
22 ae-1-3103.ear3.Frankfurt1.Level3.net (4.69.163.86) [AS 3356] 252.748 ms 252.855 ms 252.719 ms
23 ix-ae-18-0.tcore1.fr0-frankfurt.as6453.net (195.219.50.49) 253.215 ms 253.049 ms 252.474 ms
24 if-ae-59-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.194) 258.598 ms if-ae-4-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.17) 258.467 ms 257.584 ms
MPLS Label=486720 CoS=0 TTL=1 S=1
25 if-ae-9-3.tcore1.pvu-paris.as6453.net (195.219.87.14) 257.906 ms 257.857 ms if-ae-9-2.tcore1.pvu-paris.as6453.net (195.219.87.10) 258.308 ms
MPLS Label=345609 CoS=0 TTL=1 S=1
26 if-ae-11-2.tcore1.pye-paris.as6453.net (80.231.153.50) 257.546 ms 257.812 ms 268.691 ms
MPLS Label=525823 CoS=0 TTL=1 S=1
27 if-ae-3-2.tcore1.l78-london.as6453.net (80.231.154.143) 261.149 ms 257.873 ms 258.124 ms
MPLS Label=558832 CoS=0 TTL=1 S=1
28 if-ae-66-2.tcore2.nto-newyork.as6453.net (80.231.130.106) 257.746 ms 257.491 ms 258.035 ms
MPLS Label=300093 CoS=0 TTL=1 S=1
29 if-ae-12-2.tcore1.n75-newyork.as6453.net (66.110.96.5) 257.737 ms 258.226 ms 257.614 ms
30 ae-7.edge2.NewYorkCity6.Level3.net (4.68.39.49) [AS 3356] 257.587 ms 259.322 ms 258.347 ms
2020 (TATA-Level3 loop)
…
20 if-ae-12-2.tcore1.n75-newyork.as6453.net (66.110.96.5) 172.258 ms 172.540 ms 171.995 ms
21 ae-7.edge2.NewYorkCity6.Level3.net (4.68.39.49) [AS 3356] 183.732 ms 171.950 ms 172.068 ms
22 ae-1-3103.ear3.Frankfurt1.Level3.net (4.69.163.86) [AS 3356] 252.748 ms 252.855 ms 252.719 ms
23 ix-ae-18-0.tcore1.fr0-frankfurt.as6453.net (195.219.50.49) 253.215 ms 253.049 ms 252.474 ms
24 if-ae-59-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.194) 258.598 ms if-ae-4-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.17) 258.467 ms 257.584 ms
MPLS Label=486720 CoS=0 TTL=1 S=1
25 if-ae-9-3.tcore1.pvu-paris.as6453.net (195.219.87.14) 257.906 ms 257.857 ms if-ae-9-2.tcore1.pvu-paris.as6453.net (195.219.87.10) 258.308 ms
MPLS Label=345609 CoS=0 TTL=1 S=1
26 if-ae-11-2.tcore1.pye-paris.as6453.net (80.231.153.50) 257.546 ms 257.812 ms 268.691 ms
MPLS Label=525823 CoS=0 TTL=1 S=1
27 if-ae-3-2.tcore1.l78-london.as6453.net (80.231.154.143) 261.149 ms 257.873 ms 258.124 ms
MPLS Label=558832 CoS=0 TTL=1 S=1
28 if-ae-66-2.tcore2.nto-newyork.as6453.net (80.231.130.106) 257.746 ms 257.491 ms 258.035 ms
MPLS Label=300093 CoS=0 TTL=1 S=1
29 if-ae-12-2.tcore1.n75-newyork.as6453.net (66.110.96.5) 257.737 ms 258.226 ms 257.614 ms
…
2020 (TATA-Level3 loop)
1. TATA/US „sees” the more specific via Level3/US
2. Level3/US does not have this zombie route and uses „cold potato”
routing to reach Level3/Frankfurt
3. Level3 passes packets to TATA in Frankfurt (less-specific route; the
destination is TATA's customer in Poland)
4. once passed to TATA, the „zombie more specific via Level3” kicks in:
traffic goes to TATA/US, where it is passed to Level3/US once again…
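The four steps can be replayed with a toy forwarding model. This is only a sketch under stated assumptions: each network is reduced to two nodes, the prefixes 10.0.0.0/23 and 10.0.0.0/24 are borrowed from the traffic engineering example rather than from the real incident, and the tables are hypothetical.

```python
import ipaddress

def lookup(table, addr):
    """Longest-prefix match over {prefix: next_hop}; returns the next hop or None."""
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None

def trace(tables, start, destination, max_hops=30):
    """Follow next hops from start; stop on delivery, dead end or hop limit."""
    addr = ipaddress.ip_address(destination)
    path, node = [start], start
    for _ in range(max_hops):
        node = lookup(tables.get(node, {}), addr)
        if node is None:
            break
        path.append(node)
        if node == "customer":
            break
    return path

# Hypothetical state after the customer withdrew the /24: TATA still holds it.
tables = {
    "TATA/US":          {"10.0.0.0/24": "Level3/US",        # zombie more specific
                         "10.0.0.0/23": "TATA/Frankfurt"},
    "Level3/US":        {"10.0.0.0/23": "Level3/Frankfurt"},  # no zombie, cold potato
    "Level3/Frankfurt": {"10.0.0.0/23": "TATA/Frankfurt"},
    "TATA/Frankfurt":   {"10.0.0.0/24": "TATA/US",          # zombie again
                         "10.0.0.0/23": "customer"},
}

print(trace(tables, "TATA/US", "10.0.0.1", max_hops=8))
```

With the zombie /24 in place the path cycles through the same four nodes until the hop limit; removing the /24 entries lets the /23 deliver the packet to the customer.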
2020 (Level3 loop and zombie resurrection)
• First outage directly after withdrawal
• Eventually BGP converges
• However, a few hours later the zombie route resurrects in the AS3356 core
and causes another 1 h outage
2020 Aug (well-known Centurylink/Level3-related outage)
NANOG mailing list threads:
 „Centurylink having a bad morning?”
 „[outages] Major Level3 (CenturyLink) Issues”
https://mailman.nanog.org/pipermail/nanog/2020-August/thread.html
https://mailman.nanog.org/pipermail/nanog/2020-September/thread.html
https://puck.nether.net/pipermail/outages/2020-August/013204.html
2020 Aug (well-known Centurylink/Level3-related outage)
Analysis:
 https://blog.thousandeyes.com/centurylink-level-3-outage-analysis/
„Level 3 continues to advertise stale routes despite services withdrawing routes”
 https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/
 https://radar.qrator.net/blog/another-centurylink-bgp-incident
Detection and debugging
Detection & debugging
 Complete outage
 should be easy to spot
 Partial outage, suboptimal routing
 traces from the outer world
 BGP tables: Tier1s, NSP, ISP, IXP, HE.net, Qrator Radar and NLNOG Ring
looking glasses / route servers
 BGP updates log
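When traces from the outer world are collected in bulk, loops like the one in the 2020 TATA-Level3 trace can be flagged automatically. A minimal sketch, assuming the trace has already been parsed into one responding IP address per TTL (the parsing itself is not shown, and `find_loop` is a made-up helper name):

```python
def find_loop(hops):
    """Return (first_index, period) of the first repeated hop address, or None.

    hops: list with one responding IP address per TTL (None for timeouts).
    """
    seen = {}
    for index, ip in enumerate(hops):
        if ip is None:
            continue
        if ip in seen:
            return seen[ip], index - seen[ip]
        seen[ip] = index
    return None

# First responding address of hops 1-12 from the 2020 TATA-Level3 trace.
hops = [
    "63.243.128.141", "63.243.128.122", "4.68.39.49", "4.69.163.86",
    "195.219.50.49", "195.219.87.17", "80.231.153.89", "80.231.153.50",
    "80.231.154.143", "80.231.130.106", "66.110.96.5", "4.68.39.49",
]

print(find_loop(hops))  # indexes are 0-based
```

A repeated address is only a hint: load balancing or asymmetric replies can also produce repeats, so a hit is worth confirming in a looking glass.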
Toolbox: traces
 http://ping.pe/
 simple and quick
 https://mtr.sh/
 fancy
 https://www.globaltraceroute.com/
 RIPE Atlas probes
 wide range of locations, very slow
ping.pe
mtr.sh
Toolbox: looking glasses
 http://lg.ring.nlnog.net/
 https://lg.he.net/
 https://radar.qrator.net/
 https://www.pch.net/tools/looking_glass/
NLNOG Ring Looking glass
BGP maps: HE vs. Qrator Radar
Toolbox: BGP updates
 PCH
 https://www.pch.net/resources/Routing_Data/IPv4_daily_snapshots/
 https://www.pch.net/resources/Raw_Routing_Data/
 RIPE
 https://stat.ripe.net/
 https://stat.ripe.net/special/bgplay (history)
 https://ris-live.ripe.net/ (live BGP stream)
 https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris/ris-raw-data
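Once BGP updates for a prefix are available from any of these sources, finding zombie candidates is a small bookkeeping exercise. The sketch below assumes the updates were already reduced to (timestamp, peer, kind) records; the `Update` type, the `zombie_peers` helper and the 90-minute grace period are illustrative choices, not part of any RIPE or PCH API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Update:
    timestamp: float  # seconds since epoch
    peer: str         # collector peer that sent the update
    kind: str         # "A" = announcement, "W" = withdrawal

def zombie_peers(updates, withdrawn_at, now, grace=5400):
    """Peers whose last message for the prefix is still an announcement
    at least `grace` seconds after the origin withdrew it."""
    if now - withdrawn_at < grace:
        return set()  # too early to call anything a zombie
    last = {}
    for u in sorted(updates, key=lambda u: u.timestamp):
        if u.timestamp <= now:
            last[u.peer] = u.kind
    return {peer for peer, kind in last.items() if kind == "A"}

# Hypothetical log: both peers learn the prefix, only peerA sees the withdrawal.
updates = [
    Update(0, "peerA", "A"),
    Update(0, "peerB", "A"),
    Update(100, "peerA", "W"),
]

print(zombie_peers(updates, withdrawn_at=100, now=100 + 6000))
```

Peers reported this way are candidates to cross-check in looking glasses, since a collector session reset can also hide a withdrawal.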
RIPE RIS Live
RIPE BGPlay
Zombie risk mitigation
Zombie risk mitigation
 Fix all Tier1 routers :)
 Gradual more specific withdrawal
 stage 1: withdraw from distant locations and transits
 stage 2: withdraw from local/national peerings
 Selective more specific announcements
 by continent/peer
 no transit, just peerings
 bonus: faster convergence!
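The two withdrawal stages can be written down as a simple ordering of BGP sessions. This is a sketch only: the session list and its transit/peering and local/distant labels are assumptions for illustration, and the waiting and verification between stages is left to the operator.

```python
def withdrawal_stages(sessions):
    """Split sessions into ordered withdrawal stages.

    sessions: iterable of (name, kind, scope) with kind in {"transit", "peering"}
    and scope in {"local", "distant"}.
    """
    # Stage 1: distant locations and all transits.
    stage1 = [n for n, kind, scope in sessions if kind == "transit" or scope == "distant"]
    # Stage 2: local/national peerings only.
    stage2 = [n for n, kind, scope in sessions if kind == "peering" and scope == "local"]
    return [stage1, stage2]

# Hypothetical session inventory.
sessions = [
    ("Level3", "transit", "local"),
    ("PLIX", "peering", "local"),
    ("LINX", "peering", "distant"),
    ("THINX", "peering", "local"),
]

for number, names in enumerate(withdrawal_stages(sessions), start=1):
    print(f"stage {number}: withdraw more specific from {', '.join(names)}")
```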
Selective announcements / traffic steering
 Use the communities, Luke!
 Features
 excellent customer BGP communities (NTT, Telia, GTT, DE-CIX)
 good enough
 ~nothing (HE)
 secret
 Transition
 transparent
 partial clear/override
 full clear
 overlap risk! (extended/large communities, EC/LC, still not widely adopted)
Example: add GTT leak to the mix (via RETN)
Note: covers all RETN, Telia, GTT and
TATA customers (not visible here)
Example: leak to Telia (via Level3)
Note: leaks to all Level3 customers
(incl. RETN) and Telia customers
Per customer announcement tailoring (BIRD filter syntax)
case bgp_path.last {
# ASx Customer Foo (uses: Level3, Telia)
x:
if pop = "PLIX" then bgp_community.add(level3_yes_telia);
if pop = "THINX" then bgp_community.add(retn_yes_telia);
if pop = "LINX" then {…}
# ASy Customer Bar (uses: GTT, Cogent)
y:
if pop = "PLIX" then bgp_community.add(level3_yes_cogent);
if pop = "THINX" then bgp_community.add(retn_yes_gtt);
if pop = "LINX" then {…}
# ASz Customer Baz...
}
docs: https://bird.network.cz/?get_doc&v=20&f=bird-5.html#ss5.4
Summary
 Still not well understood
 BGP update queueing, races/reordering, losses?
 BGP optimizers/stabilizers, broken damping?
 In $vendors we trust
 Avoid more-specifics in global table
 Monitor your reachability/visibility
e–Q&A
@redguardianeu
