SlideShare a Scribd company logo
1 of 19
Download to read offline
BGP ERROR HANDLING.
DEVELOPING AN OPERATOR-LED APPROACH IN THE IETF.

 Shakir, Cable&Wireless Worldwide.
Rob




                             NANOG 51 – Feb. 01, 2011 – MIAMI, FL
CUSTOMER
                        A Typical SP Network?
                        PE          PE




TRANSIT            PE   P            P        PE             PEER




                               P




                        PE               PE
             BGP
             IGP
                                                      CUSTOMER




   IGP
 Signals customer/Internal prefixes between PEs
   EGP
 Propagates internal prefixes to neighbouring ASes.
A (Modern) Typical SP Network?
CUSTOMER

                                        PE
                        PE




                               RR

TRANSIT            PE                             PE          PEER

                         P              P



                                    P


                        PE                   PE
             BGP
             IGP
                                                        CUSTOMER




   IGP
 Minimal infrastructure routing information.
   BGP
 Propagate internal routing and service data.
BGP Failures I.

JAN.
                                                             ERRORS IN AS4_PATH

09
                 Erroneous	
  data	
  in	
  the	
  AS4_PATH	
  op6onal	
  transi6ve	
  a9ribute	
  
                                                 causing	
  BGP	
  session	
  failure	
  (JunOS	
  bug).	
  




VERY LONG AS_PATH
                                                                                    FEB.
Very	
  long	
  AS_PATHs	
  in	
  the	
  global	
  BGP	
  table	
  cause	
  session	
  failure.	
  
Not	
  the	
  first	
  6me	
  this	
  had	
  been	
  seen.	
                                           09
BGP Failures II.

AUG.
                                                RIPE NCC RIS EXPERIMENT

10
                  A	
  RIPE	
  NCC	
  RIS/Duke	
  University	
  experiment	
  results	
  in	
  BGP	
  
                     sessions	
  being	
  reset	
  –	
  disrup6ng	
  global	
  table	
  (IOS	
  XR	
  bug).	
  




iBGP FAILURES
                                                                                        ??
Mul6ple	
  occurrences	
  within	
  xSP	
  networks.	
  
Likely	
  to	
  cause	
  higher	
  financial	
  impact	
  (L3VPN	
  margin).	
                      ??
Why do we see these events?

RTR A                                RTR B




         UPDATE

                                             Error!




RTR A                                RTR B




                      NOTIFICATION
Cause/Impact.

LIMITED
                                              Must	
  either	
  DISCARD	
  a9ributes	
  or	
  
TOOLSET IN STANDARDS.
                                      respond	
  with	
  NOTIFICATION.	
  



SERVICE
   Transit/Peering	
  failure	
  	
  -­‐	
  although	
  error	
  source	
  may	
  be	
  remote.	
  
IMPACT.
            iBGP	
  failure	
  –	
  high	
  impact	
  sessions?	
  Route	
  reflectors?	
  



                     Results in loss of RIB!
Would you tolerate this in your IGP based on one erroneous LSP?
Intent of Work.

DEFINE HOW
                                      Document	
  the	
  way	
  xSPs	
  use	
  BGP.	
  
BGP IS USED.
   Ensure	
  that	
  cri6cal	
  nature	
  of	
  the	
  protocol	
  is	
  understood.	
  



PROVIDE
                Determine	
  how	
  OPERATORS	
  think	
  that	
  BGP	
  should	
  
REQUIREMENTS
                          fail	
  –	
  and	
  what	
  we’ll	
  compromise	
  on.	
  



TIE TOGETHER
                  Ensure	
  that	
  tools	
  resul6ng	
  from	
  exis6ng	
  dra]s	
  
IETF WORK ITEMS.
              form	
  a	
  useful	
  framework	
  to	
  make	
  BGP	
  robust.	
  
Approach Overview.


01
   DON’T SEND NOTIFICATION.
   04




                                  MONITORING
02
 RECOVER RIB CONSISTENCY.
03
     RESTART BGP HITLESSLY.
Avoid sending NOTIFICATION.
                                                                          Error!
                                                                       172.16.0.0/12
                                                                       WITHDRAWN
                  UPDATE
               172.16.0.0/12


       RTR A                                                         RTR B



                                             NOTIFICATION




WHAT DO WE
                     “treat-­‐as-­‐withdraw”	
  mechanism	
  can	
  result	
  in	
  
COMPROMISE ON?
                        rou6ng	
  inconsistency	
  (possible	
  loops!).	
  


EXISTING WORK
                 dra]-­‐chen	
  (eBGP	
  errors)	
  –	
  includes	
  Opt	
  Trans.	
  
ITEMS IN IETF?
                          Needs	
  to	
  be	
  extended	
  to	
  cover	
  iBGP.	
  
Recover RIB Consistency.
                                                                   Missing
                                                                172.16.0.0/12
                                                                 from RTR A

                                            REQUEST
                                           172.16.0.0/12

       RTR A                                                   RTR B

                   UPDATE
                 172.16.0.0/12




HOW CAN THIS
                    Mechanisms	
  to	
  re-­‐request	
  missing	
  NLRI.	
  
BE ACHIEVED?
                         One	
  prefix	
  at	
  once,	
  or	
  whole	
  RIB.	
  


EXISTING WORK
                                         “One-­‐Time	
  Prefix	
  ORF”.	
  
ITEMS?
                                             Enhanced	
  ROUTE	
  REFRESH.	
  
Reduce Impact of Session Reset.
SESSION RESETS,
             NOTIFICATION	
  has	
  u6lity	
  for	
  resecng	
  state.	
  
CAN WE AVOID THEM?
           Consider	
  that	
  some6mes	
  it	
  is	
  unavoidable.	
  


                  FORWARDING PLANE UNAFFECTED.


                                                SESSION
                                                 RESET
          RTR A                                                   RTR B

                   SESSION
                   RE-OPEN



EXISTING WORK
                              (Expired)	
  “SOFT-­‐NOTIFICATION”.	
  
ITEMS IN IETF?
                             Further	
  work	
  required	
  to	
  revive!	
  
Introduce Further Monitoring.

EXISTING ERRORS
     NOCs	
  can	
  see	
  session	
  failures	
  very	
  easily	
  –	
  both	
  
ARE VERY VISIBLE.
    via	
  session	
  monitoring	
  and	
  forwarding	
  outage!	
  	
  



FURTHER COMPLEXITY
              Mechanisms	
  are	
  required	
  to	
  make	
  error	
  
MEANS LESS MANAGEABLE
            handling	
  visible	
  to	
  both	
  BGP	
  speakers.	
  



EXISTING WORK
                    (In-­‐band)	
  ADVISORY	
  and	
  DIAGNOSTIC.	
  
ITEMS IN IETF?
                 (Out-­‐of-­‐Band)	
  BGP	
  Monitoring	
  Protocol.	
  
Complexities of Approach.
                Know	
  the	
  NLRI?	
  
                  Re-­‐request	
  
                    (ORF)	
  



  Error!	
  
                                            Re-­‐request	
  the	
     Hitless	
  Session	
  
treat-­‐as-­‐
                                              whole	
  RIB	
               Reset	
  
withdraw	
  




                                           OOPS!
Why am I standing here?

                        NANOG

  As Operators, we deal with the fall-out of protocol issues!


SO…
 an agreed, operator-recommended approach is required.
Questions, comments, review…

           ALL MUCH APPRECIATED!

            Feedback form at http://rob.sh/bgp

http://tools.ietf.org/html/draft-shakir-idr-ops-reqs-for-bgp-error-handling-00



      rob.shakir@cw.com // +44(0)207 100 7532 // RJS-RIPE
ADDITIONAL SLIDES.
Q&A AND FURTHER BACKGROUND.
Receiver side only?
UTILITY FOR
                               Yes	
  –	
  we	
  can	
  s6ll	
  use	
  these	
  mechanisms.	
  
RX-SIDE BUGS?
              Although	
  u6lity	
  of	
  RIB	
  consistency	
  refresh	
  is	
  reduced.	
  




    Error!	
                                                                       Repeat	
  errors	
  –	
  	
  
                                      Hitless	
  Session	
  
  treat-­‐as-­‐                                                                     NLRI	
  remains	
  
                                           Reset	
  
  withdraw	
                                                                        withdrawn?	
  



REQUIREMENTS
                   Must	
  ensure	
  that	
  this	
  is	
  flagged	
  to	
  Operator.	
  

  Implementa6on	
  bugs	
  cannot	
  be	
  recovered	
  by	
  any	
  protocol	
  mechanism.	
  
Multi-session BGP

IS MULTISESSION
                     No	
  –	
  it	
  helps,	
  but	
  is	
  not	
  a	
  complete	
  solu6on.	
  
ANOTHER SOLUTION?
                                                Many	
  topologies	
  in	
  one	
  AFI.	
  



DOES HAVE
                                             Can	
  achieve	
  AFI	
  error	
  separa6on.	
  
UTILITY
     e.g.	
  IPv4	
  and	
  IPv6	
  errors	
  can	
  be	
  independent	
  of	
  each	
  other.	
  	
  



SOLUTION DOES
           For	
  complete	
  solu6on,	
  we	
  would	
  need	
  one	
  session	
  
NOT SCALE
               per	
  topology	
  –	
  control-­‐plane	
  does	
  not	
  scale	
  to	
  this!	
  

More Related Content

What's hot

Ap nr5000 pt file
Ap nr5000 pt fileAp nr5000 pt file
Ap nr5000 pt fileAddPac1999
 
Future Signaling Protocols What’s New in IETF
Future Signaling Protocols What’s New in IETFFuture Signaling Protocols What’s New in IETF
Future Signaling Protocols What’s New in IETFJohn Loughney
 
Slide Development Of A Laser Driver Chip Test Set‑Up For Slhc Experiments
Slide Development Of A Laser Driver Chip Test Set‑Up For Slhc ExperimentsSlide Development Of A Laser Driver Chip Test Set‑Up For Slhc Experiments
Slide Development Of A Laser Driver Chip Test Set‑Up For Slhc Experimentsstefanome
 
AMTELCO RED ALERT AVAYA Integration
AMTELCO RED ALERT AVAYA Integration AMTELCO RED ALERT AVAYA Integration
AMTELCO RED ALERT AVAYA Integration AMTELCO
 
Presentazion Chloride LIFE®.net 2011
Presentazion Chloride LIFE®.net 2011Presentazion Chloride LIFE®.net 2011
Presentazion Chloride LIFE®.net 2011Chloride
 
FPGA Camp - Softjin Presentation
FPGA Camp - Softjin PresentationFPGA Camp - Softjin Presentation
FPGA Camp - Softjin PresentationFPGA Central
 
Demystifying the JESD204B High-speed Data Converter-to-FPGA interface
Demystifying the JESD204B High-speed Data Converter-to-FPGA interfaceDemystifying the JESD204B High-speed Data Converter-to-FPGA interface
Demystifying the JESD204B High-speed Data Converter-to-FPGA interfaceAnalog Devices, Inc.
 
MPLS WC 2014 Segment Routing TI-LFA Fast ReRoute
MPLS WC 2014  Segment Routing TI-LFA Fast ReRouteMPLS WC 2014  Segment Routing TI-LFA Fast ReRoute
MPLS WC 2014 Segment Routing TI-LFA Fast ReRouteBruno Decraene
 
D9600 advanced headend processor
D9600 advanced headend processorD9600 advanced headend processor
D9600 advanced headend processoraniruddh Tyagi
 

What's hot (16)

Albedo.Net.Audit.Ps
Albedo.Net.Audit.PsAlbedo.Net.Audit.Ps
Albedo.Net.Audit.Ps
 
Ap nr5000 pt file
Ap nr5000 pt fileAp nr5000 pt file
Ap nr5000 pt file
 
Future Signaling Protocols What’s New in IETF
Future Signaling Protocols What’s New in IETFFuture Signaling Protocols What’s New in IETF
Future Signaling Protocols What’s New in IETF
 
Matrix setu ata vs linksys spa3102
Matrix  setu ata vs linksys spa3102Matrix  setu ata vs linksys spa3102
Matrix setu ata vs linksys spa3102
 
Slide Development Of A Laser Driver Chip Test Set‑Up For Slhc Experiments
Slide Development Of A Laser Driver Chip Test Set‑Up For Slhc ExperimentsSlide Development Of A Laser Driver Chip Test Set‑Up For Slhc Experiments
Slide Development Of A Laser Driver Chip Test Set‑Up For Slhc Experiments
 
The CTO's Espresso Guide to SON
The CTO's Espresso Guide to SONThe CTO's Espresso Guide to SON
The CTO's Espresso Guide to SON
 
Erlang and Scalability
Erlang and ScalabilityErlang and Scalability
Erlang and Scalability
 
AMTELCO RED ALERT AVAYA Integration
AMTELCO RED ALERT AVAYA Integration AMTELCO RED ALERT AVAYA Integration
AMTELCO RED ALERT AVAYA Integration
 
Presentazion Chloride LIFE®.net 2011
Presentazion Chloride LIFE®.net 2011Presentazion Chloride LIFE®.net 2011
Presentazion Chloride LIFE®.net 2011
 
Sip2012 :: outbound
Sip2012 :: outboundSip2012 :: outbound
Sip2012 :: outbound
 
Spectra OE Webcast July 2010
Spectra OE Webcast July 2010Spectra OE Webcast July 2010
Spectra OE Webcast July 2010
 
FPGA Camp - Softjin Presentation
FPGA Camp - Softjin PresentationFPGA Camp - Softjin Presentation
FPGA Camp - Softjin Presentation
 
Demystifying the JESD204B High-speed Data Converter-to-FPGA interface
Demystifying the JESD204B High-speed Data Converter-to-FPGA interfaceDemystifying the JESD204B High-speed Data Converter-to-FPGA interface
Demystifying the JESD204B High-speed Data Converter-to-FPGA interface
 
Astricon 2007
Astricon 2007Astricon 2007
Astricon 2007
 
MPLS WC 2014 Segment Routing TI-LFA Fast ReRoute
MPLS WC 2014  Segment Routing TI-LFA Fast ReRouteMPLS WC 2014  Segment Routing TI-LFA Fast ReRoute
MPLS WC 2014 Segment Routing TI-LFA Fast ReRoute
 
D9600 advanced headend processor
D9600 advanced headend processorD9600 advanced headend processor
D9600 advanced headend processor
 

Viewers also liked

Guazatinaikos Team Melendi English
Guazatinaikos Team Melendi   EnglishGuazatinaikos Team Melendi   English
Guazatinaikos Team Melendi EnglishItaca Comenius
 
Media film plan
Media film planMedia film plan
Media film planDennistar
 
第一部Twitterユーザーによる熊本市電車窓配信プロジェクトtwitrainの概要と実現に向けた取り組み
第一部Twitterユーザーによる熊本市電車窓配信プロジェクトtwitrainの概要と実現に向けた取り組み第一部Twitterユーザーによる熊本市電車窓配信プロジェクトtwitrainの概要と実現に向けた取り組み
第一部Twitterユーザーによる熊本市電車窓配信プロジェクトtwitrainの概要と実現に向けた取り組みKazuya Masuda
 
02 regimen legal administrativo de la construcción (1)
02 regimen legal administrativo de la construcción (1)02 regimen legal administrativo de la construcción (1)
02 regimen legal administrativo de la construcción (1)Richard Jimenez
 
Камиль Исаев - Intel
Камиль Исаев - IntelКамиль Исаев - Intel
Камиль Исаев - IntelKlishina
 
El uso de las TIC en las Micro pequeñas y medianas empresas industriales en ...
El uso de las TIC  en las Micro pequeñas y medianas empresas industriales en ...El uso de las TIC  en las Micro pequeñas y medianas empresas industriales en ...
El uso de las TIC en las Micro pequeñas y medianas empresas industriales en ...Magnoliapool
 
Innovation Strategy
Innovation StrategyInnovation Strategy
Innovation Strategymattard06
 
Proportions Taylor elliott
Proportions Taylor elliottProportions Taylor elliott
Proportions Taylor elliottBentonGeometry
 
Simple surgical periodontic case
Simple surgical periodontic caseSimple surgical periodontic case
Simple surgical periodontic caseReham Al-haratani
 

Viewers also liked (15)

U.S. Merchant Marine
U.S. Merchant MarineU.S. Merchant Marine
U.S. Merchant Marine
 
Guazatinaikos Team Melendi English
Guazatinaikos Team Melendi   EnglishGuazatinaikos Team Melendi   English
Guazatinaikos Team Melendi English
 
Media film plan
Media film planMedia film plan
Media film plan
 
ֆրանսիա
ֆրանսիաֆրանսիա
ֆրանսիա
 
第一部Twitterユーザーによる熊本市電車窓配信プロジェクトtwitrainの概要と実現に向けた取り組み
第一部Twitterユーザーによる熊本市電車窓配信プロジェクトtwitrainの概要と実現に向けた取り組み第一部Twitterユーザーによる熊本市電車窓配信プロジェクトtwitrainの概要と実現に向けた取り組み
第一部Twitterユーザーによる熊本市電車窓配信プロジェクトtwitrainの概要と実現に向けた取り組み
 
Some images I found on the net
Some images I found on the netSome images I found on the net
Some images I found on the net
 
211- Ariel-Israel
211- Ariel-Israel211- Ariel-Israel
211- Ariel-Israel
 
02 regimen legal administrativo de la construcción (1)
02 regimen legal administrativo de la construcción (1)02 regimen legal administrativo de la construcción (1)
02 regimen legal administrativo de la construcción (1)
 
Камиль Исаев - Intel
Камиль Исаев - IntelКамиль Исаев - Intel
Камиль Исаев - Intel
 
El uso de las TIC en las Micro pequeñas y medianas empresas industriales en ...
El uso de las TIC  en las Micro pequeñas y medianas empresas industriales en ...El uso de las TIC  en las Micro pequeñas y medianas empresas industriales en ...
El uso de las TIC en las Micro pequeñas y medianas empresas industriales en ...
 
Innovation Strategy
Innovation StrategyInnovation Strategy
Innovation Strategy
 
Proportions Taylor elliott
Proportions Taylor elliottProportions Taylor elliott
Proportions Taylor elliott
 
Simple surgical periodontic case
Simple surgical periodontic caseSimple surgical periodontic case
Simple surgical periodontic case
 
Proportional
ProportionalProportional
Proportional
 
Kristína info
Kristína infoKristína info
Kristína info
 

Similar to BGP Error Handling (NANOG 51)

IETF80 - IDR/GROW BGP Error Handling Requirements
IETF80 - IDR/GROW BGP Error Handling RequirementsIETF80 - IDR/GROW BGP Error Handling Requirements
IETF80 - IDR/GROW BGP Error Handling RequirementsRob Shakir
 
BGP OPERATIONAL Message
BGP OPERATIONAL MessageBGP OPERATIONAL Message
BGP OPERATIONAL MessageRob Shakir
 
Rafał Szarecki - PIM-tunnels and MPLS P2MP as Multicast data plane in IPTV a...
 Rafał Szarecki - PIM-tunnels and MPLS P2MP as Multicast data plane in IPTV a... Rafał Szarecki - PIM-tunnels and MPLS P2MP as Multicast data plane in IPTV a...
Rafał Szarecki - PIM-tunnels and MPLS P2MP as Multicast data plane in IPTV a...PROIDEA
 
Mexico 3070 user group meeting 2012 test coverage john
Mexico 3070 user group meeting 2012  test coverage johnMexico 3070 user group meeting 2012  test coverage john
Mexico 3070 user group meeting 2012 test coverage johnInterlatin
 
Eigrp
EigrpEigrp
Eigrpfirey
 
Vyatta Subscription Edition 6.5 R1 Testing and Analysis
Vyatta Subscription Edition 6.5 R1 Testing and AnalysisVyatta Subscription Edition 6.5 R1 Testing and Analysis
Vyatta Subscription Edition 6.5 R1 Testing and AnalysisRouter Analysis, Inc.
 
Eigrp and ospf comparison
Eigrp and ospf comparisonEigrp and ospf comparison
Eigrp and ospf comparisonDeepak Raj
 
BGP Traffic Engineering with SDN Controller
BGP Traffic Engineering with SDN ControllerBGP Traffic Engineering with SDN Controller
BGP Traffic Engineering with SDN ControllerAPNIC
 
CCNA3 Verson6 Chapter10
CCNA3 Verson6 Chapter10CCNA3 Verson6 Chapter10
CCNA3 Verson6 Chapter10Chaing Ravuth
 
CCNA3 Verson6 Chapter7
CCNA3 Verson6 Chapter7CCNA3 Verson6 Chapter7
CCNA3 Verson6 Chapter7Chaing Ravuth
 
Inside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable CloudInside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable Cloudinside-BigData.com
 
PLNOG 7: Emil Gągała, Sławomir Janukowicz - carrier grade NAT
PLNOG 7: Emil Gągała,  Sławomir Janukowicz - carrier grade NAT PLNOG 7: Emil Gągała,  Sławomir Janukowicz - carrier grade NAT
PLNOG 7: Emil Gągała, Sławomir Janukowicz - carrier grade NAT PROIDEA
 
PLNOG 8: Rafał Szarecki - Telco Group Network
PLNOG 8: Rafał Szarecki - Telco Group Network PLNOG 8: Rafał Szarecki - Telco Group Network
PLNOG 8: Rafał Szarecki - Telco Group Network PROIDEA
 
AusNOG 2011 - Residential IPv6 CPE - What Not to Do and Other Observations
AusNOG 2011 - Residential IPv6 CPE - What Not to Do and Other ObservationsAusNOG 2011 - Residential IPv6 CPE - What Not to Do and Other Observations
AusNOG 2011 - Residential IPv6 CPE - What Not to Do and Other ObservationsMark Smith
 

Similar to BGP Error Handling (NANOG 51) (20)

IETF80 - IDR/GROW BGP Error Handling Requirements
IETF80 - IDR/GROW BGP Error Handling RequirementsIETF80 - IDR/GROW BGP Error Handling Requirements
IETF80 - IDR/GROW BGP Error Handling Requirements
 
BGP OPERATIONAL Message
BGP OPERATIONAL MessageBGP OPERATIONAL Message
BGP OPERATIONAL Message
 
Rafał Szarecki - PIM-tunnels and MPLS P2MP as Multicast data plane in IPTV a...
 Rafał Szarecki - PIM-tunnels and MPLS P2MP as Multicast data plane in IPTV a... Rafał Szarecki - PIM-tunnels and MPLS P2MP as Multicast data plane in IPTV a...
Rafał Szarecki - PIM-tunnels and MPLS P2MP as Multicast data plane in IPTV a...
 
Aag c45 697761
Aag c45 697761Aag c45 697761
Aag c45 697761
 
Fast Convergence Techniques
Fast Convergence TechniquesFast Convergence Techniques
Fast Convergence Techniques
 
Mexico 3070 user group meeting 2012 test coverage john
Mexico 3070 user group meeting 2012  test coverage johnMexico 3070 user group meeting 2012  test coverage john
Mexico 3070 user group meeting 2012 test coverage john
 
Samplab19
Samplab19Samplab19
Samplab19
 
Eigrp
EigrpEigrp
Eigrp
 
Vyatta Subscription Edition 6.5 R1 Testing and Analysis
Vyatta Subscription Edition 6.5 R1 Testing and AnalysisVyatta Subscription Edition 6.5 R1 Testing and Analysis
Vyatta Subscription Edition 6.5 R1 Testing and Analysis
 
Eigrp and ospf comparison
Eigrp and ospf comparisonEigrp and ospf comparison
Eigrp and ospf comparison
 
BGP Traffic Engineering with SDN Controller
BGP Traffic Engineering with SDN ControllerBGP Traffic Engineering with SDN Controller
BGP Traffic Engineering with SDN Controller
 
Rip
RipRip
Rip
 
CCNA3 Verson6 Chapter10
CCNA3 Verson6 Chapter10CCNA3 Verson6 Chapter10
CCNA3 Verson6 Chapter10
 
CCNA3 Verson6 Chapter7
CCNA3 Verson6 Chapter7CCNA3 Verson6 Chapter7
CCNA3 Verson6 Chapter7
 
Inside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable CloudInside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable Cloud
 
cogiel-OLT.pdf
cogiel-OLT.pdfcogiel-OLT.pdf
cogiel-OLT.pdf
 
BMP Test Results
BMP Test ResultsBMP Test Results
BMP Test Results
 
PLNOG 7: Emil Gągała, Sławomir Janukowicz - carrier grade NAT
PLNOG 7: Emil Gągała,  Sławomir Janukowicz - carrier grade NAT PLNOG 7: Emil Gągała,  Sławomir Janukowicz - carrier grade NAT
PLNOG 7: Emil Gągała, Sławomir Janukowicz - carrier grade NAT
 
PLNOG 8: Rafał Szarecki - Telco Group Network
PLNOG 8: Rafał Szarecki - Telco Group Network PLNOG 8: Rafał Szarecki - Telco Group Network
PLNOG 8: Rafał Szarecki - Telco Group Network
 
AusNOG 2011 - Residential IPv6 CPE - What Not to Do and Other Observations
AusNOG 2011 - Residential IPv6 CPE - What Not to Do and Other ObservationsAusNOG 2011 - Residential IPv6 CPE - What Not to Do and Other Observations
AusNOG 2011 - Residential IPv6 CPE - What Not to Do and Other Observations
 

BGP Error Handling (NANOG 51)

  • 1. BGP ERROR HANDLING. DEVELOPING AN OPERATOR-LED APPROACH IN THE IETF. Shakir, Cable&Wireless Worldwide. Rob NANOG 51 – Feb. 01, 2011 – MIAMI, FL
  • 2. CUSTOMER A Typical SP Network? PE PE TRANSIT PE P P PE PEER P PE PE BGP IGP CUSTOMER IGP Signals customer/Internal prefixes between PEs EGP Propagates internal prefixes to neighbouring ASes.
  • 3. A (Modern) Typical SP Network? CUSTOMER PE PE RR TRANSIT PE PE PEER P P P PE PE BGP IGP CUSTOMER IGP Minimal infrastructure routing information. BGP Propagate internal routing and service data.
  • 4. BGP Failures I. JAN. ERRORS IN AS4_PATH 09 Erroneous  data  in  the  AS4_PATH  op6onal  transi6ve  a9ribute   causing  BGP  session  failure  (JunOS  bug).   VERY LONG AS_PATH FEB. Very  long  AS_PATHs  in  the  global  BGP  table  cause  session  failure.   Not  the  first  6me  this  had  been  seen.   09
  • 5. BGP Failures II. AUG. RIPE NCC RIS EXPERIMENT 10 A  RIPE  NCC  RIS/Duke  University  experiment  results  in  BGP   sessions  being  reset  –  disrup6ng  global  table  (IOS  XR  bug).   iBGP FAILURES ?? Mul6ple  occurrences  within  xSP  networks.   Likely  to  cause  higher  financial  impact  (L3VPN  margin).   ??
  • 6. Why do we see these events? RTR A RTR B UPDATE Error! RTR A RTR B NOTIFICATION
  • 7. Cause/Impact. LIMITED Must  either  DISCARD  a9ributes  or   TOOLSET IN STANDARDS. respond  with  NOTIFICATION.   SERVICE Transit/Peering  failure    -­‐  although  error  source  may  be  remote.   IMPACT. iBGP  failure  –  high  impact  sessions?  Route  reflectors?   Results in loss of RIB! Would you tolerate this in your IGP based on one erroneous LSP?
  • 8. Intent of Work. DEFINE HOW Document  the  way  xSPs  use  BGP.   BGP IS USED. Ensure  that  cri6cal  nature  of  the  protocol  is  understood.   PROVIDE Determine  how  OPERATORS  think  that  BGP  should   REQUIREMENTS fail  –  and  what  we’ll  compromise  on.   TIE TOGETHER Ensure  that  tools  resul6ng  from  exis6ng  dra]s   IETF WORK ITEMS. form  a  useful  framework  to  make  BGP  robust.  
  • 9. Approach Overview. 01 DON’T SEND NOTIFICATION. 04 MONITORING 02 RECOVER RIB CONSISTENCY. 03 RESTART BGP HITLESSLY.
  • 10. Avoid sending NOTIFICATION. Error! 172.16.0.0/12 WITHDRAWN UPDATE 172.16.0.0/12 RTR A RTR B NOTIFICATION WHAT DO WE “treat-­‐as-­‐withdraw”  mechanism  can  result  in   COMPROMISE ON? rou6ng  inconsistency  (possible  loops!).   EXISTING WORK dra]-­‐chen  (eBGP  errors)  –  includes  Opt  Trans.   ITEMS IN IETF? Needs  to  be  extended  to  cover  iBGP.  
  • 11. Recover RIB Consistency. Missing 172.16.0.0/12 from RTR A REQUEST 172.16.0.0/12 RTR A RTR B UPDATE 172.16.0.0/12 HOW CAN THIS Mechanisms  to  re-­‐request  missing  NLRI.   BE ACHIEVED? One  prefix  at  once,  or  whole  RIB.   EXISTING WORK “One-­‐Time  Prefix  ORF”.   ITEMS? Enhanced  ROUTE  REFRESH.  
  • 12. Reduce Impact of Session Reset. SESSION RESETS, NOTIFICATION  has  u6lity  for  resecng  state.   CAN WE AVOID THEM? Consider  that  some6mes  it  is  unavoidable.   FORWARDING PLANE UNAFFECTED. SESSION RESET RTR A RTR B SESSION RE-OPEN EXISTING WORK (Expired)  “SOFT-­‐NOTIFICATION”.   ITEMS IN IETF? Further  work  required  to  revive!  
  • 13. Introduce Further Monitoring. EXISTING ERRORS NOCs  can  see  session  failures  very  easily  –  both   ARE VERY VISIBLE. via  session  monitoring  and  forwarding  outage!     FURTHER COMPLEXITY Mechanisms  are  required  to  make  error   MEANS LESS MANAGEABLE handling  visible  to  both  BGP  speakers.   EXISTING WORK (In-­‐band)  ADVISORY  and  DIAGNOSTIC.   ITEMS IN IETF? (Out-­‐of-­‐Band)  BGP  Monitoring  Protocol.  
  • 14. Complexities of Approach. Know  the  NLRI?   Re-­‐request   (ORF)   Error!   Re-­‐request  the   Hitless  Session   treat-­‐as-­‐ whole  RIB   Reset   withdraw   OOPS!
  • 15. Why am I standing here? NANOG As Operators, we deal with the fall-out of protocol issues! SO… an agreed, operator-recommended approach is required.
  • 16. Questions, comments, review… ALL MUCH APPRECIATED! Feedback form at http://rob.sh/bgp http://tools.ietf.org/html/draft-shakir-idr-ops-reqs-for-bgp-error-handling-00 rob.shakir@cw.com // +44(0)207 100 7532 // RJS-RIPE
  • 17. ADDITIONAL SLIDES. Q&A AND FURTHER BACKGROUND.
  • 18. Receiver side only? UTILITY FOR Yes  –  we  can  s6ll  use  these  mechanisms.   RX-SIDE BUGS? Although  u6lity  of  RIB  consistency  refresh  is  reduced.   Error!   Repeat  errors  –     Hitless  Session   treat-­‐as-­‐ NLRI  remains   Reset   withdraw   withdrawn?   REQUIREMENTS Must  ensure  that  this  is  flagged  to  Operator.   Implementa6on  bugs  cannot  be  recovered  by  any  protocol  mechanism.  
  • 19. Multi-session BGP IS MULTISESSION No  –  it  helps,  but  is  not  a  complete  solu6on.   ANOTHER SOLUTION? Many  topologies  in  one  AFI.   DOES HAVE Can  achieve  AFI  error  separa6on.   UTILITY e.g.  IPv4  and  IPv6  errors  can  be  independent  of  each  other.     SOLUTION DOES For  complete  solu6on,  we  would  need  one  session   NOT SCALE per  topology  –  control-­‐plane  does  not  scale  to  this!