Learning from failures

  1. Learning from failures
     Yoshinobu ‘maz’ Matsuzaki <maz@iij.ad.jp>
     bdNOG12
  2. Reliability is becoming more important
     • More use of the Internet
     • COVID-19 has been pushing digitalization
     • Bandwidth is one key
     • When congestion occurs, the user experience gets worse
     • But having enough bandwidth is not enough on its own
     • Even if a network is set up wrong, you can still use it somehow
     • Reasonability, stability and resiliency are the other key
  3. Risk prediction training
     1. Understanding the situation
        • Discuss imaginable hazard scenarios in the given situation
     2. Determining risks
        • Identify the hazards that need to be addressed
     3. Establishing countermeasures
        • Discuss possible measures to resolve the hazards
     4. Setting goals
        • Select the measures to implement
  4. Example 1: Routing
     • An ISP assigns a /24 to a customer
     • The ISP sets up a static route for the /24 over the customer link
     • The customer sets up a default route to the uplink
     • The customer uses only a /28 out of the /24
     [Diagram labels: 10.0.0.0/24, 10.0.0.0/28, static route, static route default]
  5. Example 1: Risks
     • If a packet arrives for an address inside the /24 but outside the /28, it loops between the ISP and customer routers
     • If the customer's LAN-side interface is down, all packets destined for the /24 loop
     • Routing loop!
     [Diagram labels: 10.0.0.0/24, 10.0.0.0/28, static route, static route default, a packet to 10.0.0.99]
  6. Example 1: Measures
     • Implementing dynamic routing between the ISP and the customer
     • Configuring a static route on the customer's router that directs the same /24 to null
     [Diagram labels: 10.0.0.0/24, 10.0.0.0/28, static route, static route default]
  7. Example 1: Adopting
     • Configuring a static route on the customer-side router that directs the same /24 to null
     [Diagram labels: 10.0.0.0/24, 10.0.0.0/28, static route, static route default, 10.0.0.0/24 static null route]
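
The loop in Example 1 and the effect of the null route can be checked with a small simulation. This is only a sketch under stated assumptions: the router names, the `lookup` and `forward` helpers, and the next-hop labels "lan" and "null" are hypothetical, and forwarding is reduced to a plain longest-prefix match with a hop limit.

```python
import ipaddress


def net(prefix):
    return ipaddress.ip_network(prefix)


def lookup(table, dst):
    """Longest-prefix match: next hop of the most specific matching route."""
    addr = ipaddress.ip_address(dst)
    matches = [(n, hop) for n, hop in table if addr in n]
    if not matches:
        return "drop"
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def forward(tables, start, dst, ttl=8):
    """Follow a packet router by router; return (path, outcome)."""
    node, path = start, [start]
    while ttl > 0:
        hop = lookup(tables[node], dst)
        if hop not in tables:  # "lan", "null" or "drop": forwarding ends here
            return path, hop
        node = hop
        path.append(node)
        ttl -= 1
    return path, "ttl expired (loop)"


# Routes from the slides: ISP static /24 to the customer, customer default to the ISP.
isp = [(net("10.0.0.0/24"), "customer")]
customer = [(net("10.0.0.0/28"), "lan"), (net("0.0.0.0/0"), "isp")]
# Adopted measure: the same /24 pointed at null on the customer router.
customer_fixed = customer + [(net("10.0.0.0/24"), "null")]

before = forward({"isp": isp, "customer": customer}, "isp", "10.0.0.99")
after = forward({"isp": isp, "customer": customer_fixed}, "isp", "10.0.0.99")
in_use = forward({"isp": isp, "customer": customer_fixed}, "isp", "10.0.0.5")

print(before[1])  # ttl expired (loop)
print(after[1])   # null
print(in_use[1])  # lan
```

Because the /28 is more specific than the /24, addresses in use still reach the LAN, while the unused part of the /24 is discarded at the customer router instead of following the default route back to the ISP.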
  8. Example 2: Port assignments
     • Removing a cable from port X
     • Just to be safe, make sure the LED is off before pulling it out
     • But can you spot the right port for sure?
  9. [Figure: six numbered examples of port/LED layouts, with the captions:]
     • Straightforward
     • Starting from port 0
     • The left LED is for LC status
     • More efficient but confusable
     • A little clearer
     • Port 21 is the SFP now
  10. And more...
     • We may see a different implementation in the future
     • Assumptions are the source of accidents!
     • Different products have different port/LED assignments
     • These differences cause confusion
  11. The more you know, the more you can see
     • A variety of experience helps us to better consider the hazards and to identify risks
     • Technical education and proper training are necessary to improve operational skills
     • bdNOG workshops and tutorials are helpful
     • There is always a need for appropriate educational materials
  12. Mistakes!
     • Mistakes can be a very good teaching tool
     • There is a lot to learn from the mistakes in case studies
     • Some cases are special, but many failures are common, and comparing them to your own situation yields lessons
     • But as a business, we need to stop repeating failures in our service facilities
     • Repeated failures damage reliability
  13. Build a database of mistakes
     • It can be a great teaching tool for engineers, so that similar mistakes are not repeated
     • You may find common and frequent mistakes
     • If you can find the root cause of a failure, you can come up with a more effective solution
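
As a minimal sketch of what such a database could look like, the records below are counted by root cause to surface common mistakes. The `Mistake` fields, the helper name, and the sample entries (loosely based on the two examples in this talk) are all hypothetical, not a format proposed by the author.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Mistake:
    summary: str
    root_cause: str


# Made-up sample records for illustration only.
records = [
    Mistake("Routing loop on the unused part of a customer /24", "missing null route"),
    Mistake("Wrong cable pulled from a switch", "port/LED numbering assumption"),
    Mistake("Wrong SFP removed", "port/LED numbering assumption"),
]


def common_root_causes(records, top=3):
    """Return the most frequent root causes as (cause, count) pairs."""
    return Counter(r.root_cause for r in records).most_common(top)


top_causes = common_root_causes(records)
print(top_causes[0])  # ('port/LED numbering assumption', 2)
```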
  14. Mistake trend analysis
     • Identify the high-impact mistakes
     • Minimize the bad effects
     • Reduce mistakes
     [Figure: chart of the effects of mistakes against the frequency of mistakes, with regions labeled "should not happen", "problems", "matters", "problems"]
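
One way to sketch this kind of trend analysis is to group recorded mistakes by category and place each category on the two axes of the chart, frequency and effect. The categories, impact figures, and thresholds below are invented for illustration, and using the worst recorded impact as the "effect" is an assumption rather than the author's method.

```python
from collections import defaultdict

# (category, impact in minutes) per recorded mistake; made-up sample data.
incidents = [
    ("cabling", 90), ("cabling", 120), ("cabling", 75),
    ("routing", 30), ("routing", 20), ("routing", 10),
    ("config typo", 5),
    ("power", 240),
]


def trend(incidents, freq_threshold=3, impact_threshold=60):
    """Label each category by worst impact and by how often it occurs."""
    by_category = defaultdict(list)
    for category, impact in incidents:
        by_category[category].append(impact)
    labels = {}
    for category, impacts in by_category.items():
        effect = "high impact" if max(impacts) >= impact_threshold else "low impact"
        rate = "frequent" if len(impacts) >= freq_threshold else "rare"
        labels[category] = f"{effect}, {rate}"
    return labels


labels = trend(incidents)
for category, label in labels.items():
    print(f"{category}: {label}")
```

Categories that come out as high impact are the ones to address first; frequent, low-impact ones are candidates for simple process fixes.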
  15. Accident investigation committee
     • In some industries, accident investigation committees conduct detailed investigations and compile reports in order to prevent serious accidents from recurring
     • Maybe bdNOG can do this as a community activity
     • For the healthy development of the Internet in Bangladesh
     • Regular reports of accident cases during bdNOG meetings
  16. Summary
     • To have a reliable network, we need to continuously improve our operations
     • The use of failure cases allows for more effective risk analysis and countermeasures
     • For the bdNOG community, I believe the following are worth considering:
     • Collection of failure and mistake cases
     • Trials of accident analysis