AA261: DevOps lessons in collaborative maintenance

On January 31, 2000, Alaska Airlines Flight 261 plunged into the Pacific Ocean in an extreme "nose down" position, killing all 88 crew and passengers on board. The NTSB concluded that the jackscrew in AA261's horizontal stabiliser trim system was inadequately maintained, causing the pilots to lose all control of the plane.

There are striking parallels between the problems we face daily in IT operations and software development and the 30 years of give and take between the aircraft manufacturer's engineers, airline maintenance staff, and federal regulators that preceded AA261's simple mechanical failure.

In this talk, Lindsay looks at the complex interplay between the parties in the AA261 crash through a DevOps lens, investigating the collaborative approach to maintenance and operation of the MD-83 aircraft, and relating the complexities back to the complex IT systems we build and maintain.


Usage Rights

© All Rights Reserved

Presentation Transcript

  • AA261: DevOps lessons in collaborative maintenance
  • Lindsay Holmwood @auxesis
  • Software Manager @Bulletproof Networks
  • Trigger warning: death
  • January 31, 2000, Puerto Vallarta
  • Seattle
  • Departed PVR at 13.37 PST
  • Ascended to 31,000ft
  • 2 hours into flight: jammed horizontal stabiliser
  • No trim control
  • Redirected to LAX
  • Pilots unjammed horizontal stabilisers
  • 2 pilots, 3 crew, 83 passengers
  • This is a maintenance accident. Alaska Airlines maintenance and inspection of its horizontal stabilizer activation system was poorly conceived and woefully executed. The failure was compounded by poor oversight... had any of the managers, mechanics, inspectors, supervisors or FAA overseers whose job it was to protect this mechanism done their job conscientiously, this accident cannot happen. -- John J. Goglia, NTSB Board Member
  • hindsight != foresight
  • [hindsight] converts a once vague, unlikely future into an immediate, certain past -- Sidney Dekker
  • This is a maintenance accident. Alaska Airlines maintenance and inspection of its horizontal stabilizer activation system was poorly conceived and woefully executed. The failure was compounded by poor oversight... had any of the managers, mechanics, inspectors, supervisors or FAA overseers whose job it was to protect this mechanism done their job conscientiously, this accident cannot happen. -- John J. Goglia, NTSB Board Member
  • “poorly conceived and woefully executed”
  • DC-9 -> MD-80 -> MD-83
  • Evolutionary product development
  • Appropriated maintenance schedules
  • Jackscrew lubrication interval
    1965: every 300-350 hours (launch of DC-9)
    1985: every 700 hours (industry deregulation)
    1987: every 1000 hours (industry standardisation)
    1991: every 1200 hours (industry standardisation)
    1994: every 1600 hours (industry standardisation)
    1996: every 8 months, 2550 hours (Alaska Airlines policy change)
  • Decrementalism
  • Complex system constraints -- Jens Rasmussen
  • [Diagram: workload, economy, safety]
  • [Diagram: time, cost, quality]
  • [Diagram: workload, economy, safety] outside: failure of foresight; inside: trade-offs in direction of greater efficiency
  • trade-offs in direction of greater efficiency
  • Constraints on knowledge
  • Why would they make bad decisions intentionally?
  • Decisions seemed rational
  • Local rationalisation
  • “people make what they consider to be the best decision based on available knowledge at the time”
  • This is a maintenance accident. Alaska Airlines maintenance and inspection of its horizontal stabilizer activation system was poorly conceived and woefully executed. The failure was compounded by poor oversight... had any of the managers, mechanics, inspectors, supervisors or FAA overseers whose job it was to protect this mechanism done their job conscientiously, this accident cannot happen. -- John J. Goglia, NTSB Board Member
  • [Diagram: workload, economy, safety]
  • [Diagram: time, cost, quality]
  • DevOps constraints
  • “God, our ops team are arseholes. I just want to deploy this change and go home!”
  • [Diagram: workload, economy, safety]
  • What are the circumstances?
  • Where are the tensions?
  • Have ops been burnt before?
  • Is there deployment friction? Why?
  • Is deployment high-risk?
  • Is deployment time consuming?
  • Is deployment important to the business?
  • “It’s 3am and the pager has gone off again. Why can’t these devs just write code that works?”
  • [Diagram: workload, economy, safety]
  • [hindsight] converts a once vague, unlikely future into an immediate, certain past -- Sidney Dekker
  • What are the circumstances?
  • Where are the tensions?
  • Why didn’t the dev know the code would fail like this?
  • Why weren’t you involved when the code was written?
  • How is code reviewed?
  • Is the infrastructure anti-fragile?
  • Is the code anti-fragile?
  • Hindsight bias
  • [hindsight] converts a once vague, unlikely future into an immediate, certain past -- Sidney Dekker
  • What are the motivations?
  • “amoral actors”
  • [Diagram: workload, economy, safety]
  • “root cause” is simply the point you stop looking -- Sidney Dekker
  • What are the circumstances?
  • Where are the tensions?
  • Thank you! Liked the talk? Let @auxesis know!
  • Sidney Dekker [books]: The Field Guide to Understanding Human Error; Drift Into Failure; Just Culture
  • Dan Manges [blog]: How incidents affect infrastructure priorities
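
As a rough illustration of the "decrementalism" in the jackscrew lubrication intervals listed in the transcript, the sketch below computes how much each change stretched the interval relative to the previous one and relative to 1965. The interval figures come from the slide; taking the upper bound (350 hours) of the 1965 range as the baseline is an assumption, and the names `INTERVALS` and `interval_drift` are hypothetical, chosen only for this example.

```python
# Lubrication intervals in flight hours, as listed on the slide.
# Assumption: the 1965 "300-350 hours" range is represented by its upper bound.
INTERVALS = [
    (1965, 350),
    (1985, 700),
    (1987, 1000),
    (1991, 1200),
    (1994, 1600),
    (1996, 2550),
]


def interval_drift(intervals):
    """Return (year, hours, step_ratio, cumulative_ratio) for each entry.

    step_ratio compares an interval with the one before it;
    cumulative_ratio compares it with the first (baseline) interval.
    """
    baseline = intervals[0][1]
    previous = baseline
    rows = []
    for year, hours in intervals:
        rows.append((year, hours, hours / previous, hours / baseline))
        previous = hours
    return rows


if __name__ == "__main__":
    for year, hours, step, total in interval_drift(INTERVALS):
        print(f"{year}: {hours:>4} h  x{step:.2f} vs previous  x{total:.2f} vs 1965")
```

Under that assumption, no single change more than doubles the previous interval, yet the 1996 figure is roughly seven times the 1965 baseline. That is the pattern the talk describes: each step looks locally reasonable, while the cumulative drift is large.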