The History of Fire Escapes

What can software learn from the history of fire escapes in New York City? Turns out, we're following a similar evolution.

This was a keynote for QCon 2018.

Published in: Software

  1. 1. The history of fire escapes Tanya Reilly @whereistanya Abstract: When a datacenter goes offline, a server gets overloaded, or a binary hits a crashing bug, we usually have a contingency plan. We reduce damage, redirect traffic, page someone, drop low-priority requests, follow documented procedures. But why do many failures still come as a surprise? In this talk, we look at some real life analogs to preventing and managing software failures. Fire partitions. Public safety campaigns. Smoke alarms. Sprinkler systems. Doors that say “This is not an exit”. And fire escapes. What can we learn from the real world about expecting failure and designing for it? --- https://commons.wikimedia.org/wiki/Category:Fire_escapes#/media/File:ISO_7010_E 016.svg Public domain. Slide template started as Oivia from SlidesCarnival and then drifted into something very else.
  2. 2. "When we first dropped our bags on apartment floors…" Welcome To New York Taylor Swift Good morning! So, I'm a New Yorker. I'm not from the US -- I'm an immigrant -- but one of the many things I love about New York City is that you move here, and it’s immediately your city. The number one criterion for being a New Yorker is wanting to be a New Yorker. It's a welcoming place. So good morning to my fellow New Yorkers, wherever you're originally from, and, if you've travelled to be here, welcome to New York. We're glad to have you. I work in Site Reliability and I'm especially interested in what happens when things fail, the contingency plans we use to recover when something breaks. And last year I was thinking about that a lot and walking around the city and I started really noticing that New York is *covered* in fire escapes. They’re a contingency plan too. They’re for incident response. You don’t use them until all of your regular methods of getting out of the building have failed. So I started reading about fire escapes. ---- https://unsplash.com/photos/Iyd__3m4XF8 CC0
  3. 3. content warning: fire Before I say more about that, let’s talk content. This talk looks at disaster prevention and disaster recovery in software, by way of parallels in building fires. This will include stories of some of the worst fires in the history of New York City. We'll be looking at the reasons fires started, the stuff that helped them spread and how people died. There are also some pictures of buildings on fire. Nothing lurid, but there are pictures. If you have raw feelings related to recent fires, this could be rough. If you'd be more comfortable skipping this one, you should do that with my blessing. While you're packing up, I'll even tell you what I'm going to say, so you don't miss anything:
  4. 4. Tony Fischer CC BY-2.0 Fireproof buildings are more effective than fire escapes. Fireproof software is more effective than incident response. Where's our fire code? Here's my thesis ● fire escapes are a hacky bit of afterthought tacked on to the outside of a building after the building is finished. If you're using fire escapes, it's worth making them as good as possible, but you’ll prevent more fires if you build better buildings. ● Similarly, incident response is often a hacky bit of afterthought tacked on long after software is released. Again, great incident response can help you recover faster than if you don’t have it but… you’ll prevent more outages if you build better software. ● Finally, buildings have an extremely detailed fire code, but we don't really have an extremely detailed systems engineering code for software, and I think we should have. Now I'm going to say the same thing but take 35 minutes. --- How Much is that Doggie in the Window? https://flic.kr/p/72Lhz1 CC BY 2.0
  5. 5. Claudia Heidelberger CC BY-ND 2.0 Greenwich Village Fire escapes were really only built in New York City for a hundred years. They weren't common until the 1860s, and in the 1960s they stopped being allowed on new construction. There's some debate now about whether we should start removing them in places where the building has been upgraded, or whether they should be preserved as part of the city's history. I think at least some of them should be preserved. Look how beautiful that is! -- Claudia Heidelberger CC BY-ND 2.0. https://flic.kr/p/oqYYv1
  6. 6. Dan DeLuca CC BY-2.0 East Village And here's another lovely one. They made an effort to have it match the style of the building, not feel like a separate thing tacked on at the end. And I think that's key. -- Dan DeLuca CC BY 2.0 https://flic.kr/p/76Jmb2
  7. 7. "fire escapes were haphazardly attached to the most elaborately designed facades" Richard Plunz, A History of Housing in New York City But most of the time, the people adding the fire escape didn't think of it as part of the building. As this quote says, fire escapes were haphazardly attached to the most elaborately designed facades. The facade of the building was architecture but the fire escape was law. It was an external contingency plan, not part of the main structure. And I think that's part of why fire escapes ended up not being successful. --- https://books.google.com/books?id=fcKlDAAAQBAJ&pg=PA24
  8. 8. A brief history of New York City fires (With apologies to actual historians) But I'm jumping to the end. Let's look at the evolution of New York City's fire code. By the way, my great fear now is that there’s a building historian in the room who will listen to this and be like “Nope, that is really not what happened." Please forgive any errors, building historian! If I made mistakes, I would love it if you would come tell me at the end!
  9. 9. The Financial District 1835 On to the history. We’re skipping the Great Fire of 1776, and jumping straight to 1835 and the Financial District. This was a commercial, not residential area, and as a result the number of fatalities was comparatively low -- two people -- I mean, still, two too many, but this is mostly remembered as a fire that cost a LOT of money. Almost 700 buildings were destroyed. The city had 26 fire insurance companies. This fire put 23 of them out of business. --- https://en.wikipedia.org/wiki/Great_Fire_of_New_York#/media/File:The_Great_Fire_of_the_City_of_New_York_Dec_16_1835.jpg Public domain.
  10. 10. no failure domains contingency plans failed exhausted incident responders what happened? 1835 The fire was caused by a burst gas pipe in a maze of wooden warehouses. Wood burns easily so there were no failure domains: the fire spread very quickly. Inside two hours it covered 17 city blocks, most of the financial district. The city's water supplies were low and the typical contingency plan was to pull water from the rivers, but it was a freezing night in December and first the firefighters had to cut through ice. At the time it was also common to use gunpowder to level buildings and stop the fire spreading. But they had used up all their gunpowder on a fire two days earlier. That fire involved the entire fire department of 1500 people, and they were still exhausted. Still, they fought the fire for 15 hours until marines from the Brooklyn Navy Yard arrived with more gunpowder and blew up some buildings along Wall Street to make a barrier.
  11. 11. dedicated incident responders: a professional fire department new infrastructure: the Croton Aqueduct better incident response 1835 As a result of the fire, the city stopped using volunteer firefighters and moved to a professional force with better equipment. And they built the Croton Dam and Aqueduct. It was built because of the fire, but a reliable water source is good for lots of reasons! --- No longer in use, btw. It was replaced with the New Croton Dam, which still supplies a small fraction of the city's water. The old one is on the National Register of Historic Places.
  12. 12. robust structures: they rebuilt in stone better buildings 1835 But more importantly, as well as better incident response, they took the opportunity to make a more resilient city. The fire spread fast because the buildings were made of wood. They rebuilt with stone and brick. And this paid off, ten years later, when there was another enormous fire. The great fire of 1845 was very bad -- thirty people died -- but it didn’t spread as far or as fast, because it slowed down when it hit those new brick buildings.
  13. 13. 1860 Tenements Let’s jump forward 25 years and talk about tenements. Tenements were extremely dense, extremely terrible housing. I'd read about tenements but hadn't realised the scale of them. In the 1860s, nearly 500 thousand people -- more than half the city -- lived in tenements. The population of New York City doubled every decade between 1800 and 1880. Maybe you've seen this with teams and software systems: when you grow rapidly, you can build some culture problems and some technical debt. This was certainly the case here. Landlords made more accommodation by splitting big rooms into many smaller ones, mostly with no light or ventilation. These were really awful places to live. They were crime riddled, filthy and filled with disease. Every report about them mentioned that they were fire traps. In 1860, two tenement fires happened back to back. -- https://en.wikipedia.org/wiki/Tenement#/media/File:LowerEastSideTenements.JPG Public domain. Elm St: http://www.nytimes.com/1860/02/03/news/calamitous-fire-tenement-house-elm-street-destroyed-thirty-persons-supposed-have.html?mtrref=www.google.com 45th street fire: http://www.nytimes.com/1860/03/29/news/destructive-fires-four-tenement-houses-destroyed-two-mothers-eight-children.html?pagewanted=all
  14. 14. Quote about the buildings from that second article: “If a skillful man, with a deadly hatred of his race in his heart, sat down to plan a human residence in which to entrap and destroy those who should dwell in it, it is extremely probable that if he had seen these houses in West Forty-fifth-street he would take them as a model. “
  15. 15. what happened? 1860 no isolation obsolete contingency plans no failure domains The first one, on Elm Street, started in a bakery on the ground floor of a large residential building. Terrible place for a bakery, but that's where it was. The baker was storing a lot of hay and wood shavings, and when they burned they made dense smoke, killing some of the people who lived on the higher floors before the fire even got up there. The wooden stairway quickly burned away, trapping people on the top floors. Firefighters arrived with ladders, but the ladders only went to the fourth floor and this was a six storey building. At least 10 people died. A month later four houses burned on West 45th Street. These houses had roof hatches called scuttles, which should have let people escape across the roofs, but they all were missing their ladders so nobody could get up there. Another ten people died.
  16. 16. An optimistic disaster plan is a useless disaster plan These escape plans -- the ladders and scuttles and the roof -- had worked fine for a previous iteration of shorter NYC buildings, but they hadn't been updated for the new shape of the city. Just like with the water and the gunpowder, there was a plan in place for a fire disaster. And just like them, the plan only worked in the most optimistic circumstance. We see that all the time. Backups that will work if we lose the database in a very specific way. Failover plans that only work if we have two weeks notice of the failover and the old data center doesn't lose power.
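One way to make a backup plan less optimistic is to rehearse the restore rather than just the backup. The sketch below is an illustration, not something from the talk: it assumes a single-file "database" and uses hypothetical helper names (`take_backup`, `restore_drill`, `run_demo`). The drill restores into a scratch directory and compares a checksum, once after the primary has been deleted outright and once after the backup itself has been corrupted.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def take_backup(source: Path, backup_dir: Path) -> Path:
    """Copy the source file into backup_dir (a stand-in for a real backup job)."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / (source.name + ".bak")
    shutil.copy2(source, backup)
    return backup


def restore_drill(backup: Path, expected_digest: str) -> bool:
    """Restore the backup into a scratch directory and verify its contents.

    The drill never reads the original file, so it still works when the
    original is gone, which is the case a backup exists for.
    """
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "restored"
        try:
            shutil.copy2(backup, restored)
        except OSError:
            return False  # backup missing or unreadable
        return sha256_of(restored) == expected_digest


def run_demo() -> tuple[bool, bool]:
    """Run the drill with a good backup, then with a corrupted one."""
    with tempfile.TemporaryDirectory() as root:
        source = Path(root) / "data.db"
        source.write_bytes(b"rows that matter")
        digest = sha256_of(source)
        backup = take_backup(source, Path(root) / "backups")

        source.unlink()  # simulate losing the primary entirely
        good = restore_drill(backup, digest)

        backup.write_bytes(b"bit rot")  # simulate a silently corrupted backup
        bad = restore_drill(backup, digest)
        return good, bad


if __name__ == "__main__":
    print(run_demo())
```

A real drill would restore into an isolated environment and run application-level checks too; the point of the sketch is only that the unhappy paths get exercised before the fire.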
  17. 17. better buildings 1860 new law: an Act to Provide Against Unsafe Buildings in the City of New York The city immediately passed a law to make the tenements more robust against fire. They even put an injunction on new tenement construction until the law was passed. Now houses for more than eight families (kind of specific) had to have fire-proof stairs either inside or outside the building. What’s frustrating about this is that four years earlier a commission had reported that, if there was a fire, tenants on the 6th and 7th floors of tenements had basically zero chance of survival. They recommended fire proof stairs. But nothing happened until a bunch of people died. ---
  18. 18. ● Tenements must have fire escapes... The Tenement House Act 1867 Seven years later, the Draft Riots (which are a whole separate awful thing in which a whole bunch of people died) led to another law: the Tenement House Act. This act had good goals but it was extremely unsuccessful. Buildings had to have a fire escape, but the fire escape didn't have to make anyone safer! So landlords put up fire escapes that couldn’t hold the number of people in the house, or that weren’t well attached to the walls or that were just a rusty ladder. And what even was a fire escape? Well, it wasn't well defined. Let's take a diversion and look at some fire escape patents. As we look at them, you might want to think of disaster recovery plans you have known and loved. --- The picture’s actually from 1900 but whatever :-D https://commons.wikimedia.org/wiki/File:New_York,_N.Y.,_yard_of_tenement_LOC_det.4a18586.jpg Public domain.
  19. 19. William Houghton's fire escape 1891 This is a ladder with a counterweight. Imagine climbing down from the 7th floor of your building on one of these. With your six children. In the rain. In a dress that went to your ankles. -- https://en.wikipedia.org/wiki/Fire_escape#/media/File:Houghton%27s_Fire_Escape_1 877.jpg Public domain.
  20. 20. Mary McArthur's fire escape 1904 This is a kind of rope ladder that attaches to a window sill. http://www.google.com.pg/patents/US800934
  21. 21. William Bedinger's fire escape 1915 This is a parachute that rolls up very small. The idea was that you'd carry it with you everywhere in case you were in any tall building fire situations. https://www.google.com/patents/US1168465
  22. 22. Henry Vieregg's fire escape 1902 According to this patent, and I quote: "A person desiring to escape seizes one member of the cord, rope, or chain, as shown in Fig. 1, and forthwith jumps out of the window. [...]" Like, I am looking at this thing and do not feel like I could forthwith jump out of anything. https://www.google.com/patents/US708846
  23. 23. Anna Gonnelly's fire escape 1887 Anna Gonnelly's fire escape was a bridge that you could sling from your roof to another building. It had side rails, so it was only moderately terrifying. https://www.google.com/patents/US368816
  24. 24. Pasquale Nigro's fire escape 1909 This one is just fantastically ludicrous. But good if you want to fight supervillain crime? All of these patents were granted, btw. GOOGLE PATENT US 912152 A
  25. 25. BB Oppenheimer's fire escape 1879 And this one… You might think that this is just a parachute helmet. It is not. It is a parachute helmet and a pair of very bouncy shoes. GOOGLE PATENT US 221855 A .
  26. 26. Nicholas Borgfeldt's fire escape 1882 Finally, I've read this patent three times and I'm fairly convinced that the guy invented a rope. It's the most silicon valley invention of 1882. Though, let's be clear, rope was a popular kind of fire escape. In fact, it was the state of the art for hotels. https://www.google.com/patents/US267399
  27. 27. Every hotel's fire escape 1887 Puck Magazine, 1887 I don't mean a ladder made of rope, I mean literally a rope. Every hotel room had to have a rope and that was the only fire escape. Even at the time, people found that pretty terrible. This is part of a snarky cartoon from a magazine called Puck, published in 1887, of a whole lot of people trying to use the ropes. Like most of those other patents, it's designed for the easiest case: someone with upper body strength and agility who isn't wearing a skirt or carrying a child. If your disaster plan only works for the easiest case, it's not a good plan. I want to emphasise here that a rope is better than nothing. In fact, probably every one of these fire escapes, even mister parachute hat, is better than nothing. But these escape plans are not where I would put my efforts if I wanted to have fewer people die in fires. But this is what the law focused on. -- https://books.google.com/books?id=XwAjAQAAMAAJ&pg=PA48 Pre 1923 so public domain
  28. 28. 1867 Tenements must also have windows... The Tenement House Act (continued) Anyway! The Tenement House Act. Even with fire escapes, tenements were still terrible. They were badly constructed, overcrowded, and -- I find this amazing -- it was perfectly legal to store lots of combustible materials in them. One other thing the tenement act said, was that every room now had to have a window. And just like “what even is a fire escape” it didn’t define “what even is a window”. So the landlords cut holes in interior walls between rooms and called them "interior windows". A decade later, the law said sigh, ok, exterior windows. So landlords started constructing buildings with air shafts, little narrow gaps between buildings. Now, picture it, you have no indoor plumbing and the bathroom is down six flights of stairs and now you have an air shaft. You can imagine how that goes. One article I read described the air shaft as “festering tubes of disease”. Very poetic! And many of the fire escapes just led down to these air shafts and there was no way out from there. --- https://en.wikipedia.org/wiki/Old_Law_Tenement#/media/File:Airshaft_of_a_dumbbell
  29. 29. _tenement,_New_York_City,_taken_from_the_roof,_ca._1900_-_NARA_-_535468.jp g Public domain.
  30. 30. 1871 Carla Geisser CC BY THANK YOU CARLA <3 Tenements must have usable fire escapes. By 1871, iron fire escapes were becoming common and of course people were using them as extra space. You still see that now -- they're used for bikes and gardening and barbecues and cat runs. All of that has been illegal since 1871. Because it makes the fire escape very hard to use in a fire! A later law said that every fire escape had to have a cast-iron sign saying that you could be fined for obstructing your fire escape. And it was fair, because usable fire escapes are better than unusable ones. But, again, it was still perfectly legal to run your explosive business out of a tenement basement and tons of residential fires started because of deep frying crullers. And anyway, the regulations were mostly not enforced, so people didn't pay much attention. ------------ The encumbrance sign thing is from 1885, but encumbrances were illegal from 1871 and mentioning this many dates makes *my* ears glaze over and I'm already interested in this. So we're conflating two things to keep it moving along. Image by Carla Geisser, used with permission.
  31. 31. 1876 The Brooklyn Theater Fire In 1876, at the Brooklyn Theater on Cadman Plaza, the final act of the play was about to start when the stage manager noticed a very tiny fire on the left of the stage. --- https://en.wikipedia.org/wiki/Brooklyn_Theatre_fire#/media/File:BrooklynTheatre_From_Johnson_Street_Looking_East.jpg Public domain.
  32. 32. obsolete contingency plans encumbrances unpracticed incident response delayed escalation restricted access what happened? 1876 It was typical to keep buckets of water next to the stage, but there weren't any. There was a fire hose, but too much scenery was piled beside the stage and the stage manager couldn't get to it. There's those encumbrances again. The stage manager asked a couple of carpenters to put the fire out by beating it with poles. This didn't work and actually spread some sparks, setting fire to the loft. The actors -- laudably -- wanted to avoid a panic, so they announced that the fire was part of the show, and that people shouldn't freak out, but once the audience realised, they stampeded. And they had trouble getting out. We have a real stampeding herd problem here: there was only one stairway down from the cheap seats at the top, and everyone was trying to use it at once. It filled with smoke. There were no fire escapes and some exits were locked to keep out gate crashers so people couldn't get out that way. 278 people died. At the time, it was the worst theater fire in US history. It's now the third worst because we really don't learn.
  33. 33. accountability: prosecutions new laws for exits and encumbrances automated response: sprinklers! better buildings 1876 The jury blamed the theater owners for not obeying a bunch of existing fire laws, and new laws were written, including widening exits and not storing stuff on the stage. In 1882, the building code said that theatres had to have automatic sprinklers: it's the first type of building in the city to require sprinklers. The first automated response. What I find remarkable is that this fire happened nine years after regulation said that tenements had to have safe exits, but those laws didn't carry over to theatres, or to other types of buildings like: hotels, schools, factories, ships, offices. I'm going to spare you most of the horror stories, but we'll look at factories in a minute, after….
  34. 34. 1890-1901 Even more Tenement House Acts! ...we get proper no-kidding tenement regulation at last! And we even do it without a bunch of people dying! Thank you Jacob Riis! In 1890, this guy called Jacob Riis published a book about tenement life called How the Other Half Lives and did a lecture tour on it. And up until now the upper and middle class people of New York City had sort of known the tenements were awful, but for the first time ever, there were photographs. It was harder to ignore. Well, it was probably part empathy, part fear of smallpox coming out of there but, whatever, over the next decade, people started to care. I was really reassured when I read this, because until then it had been all “there was a horrific fire and we added a very specific law and then there was a different horrific fire and we added a different very specific law”. And it was mostly like that! But this Tenement House Act came from someone saying “wow, look how much this sucks” in a compelling way. And that gives me hope! Anyway, the next couple of Tenement House Acts included having to have actual windows, not air shafts, and fire escapes couldn't be ladders any more: they had to have open balconies and stairs and be properly attached to the wall. Even better: your neighbours can no longer boil oil in the basement! Hurray! And all new construction has to have interior fire partitions. Failure domains!
  35. 35. We're finally looking at stopping fires from starting and spreading, not just escaping from them. And, best of all, it’s all actually going to be enforced. Welcome to the 20th century! But, oh yeah, it still sucks in factories. -- https://commons.wikimedia.org/wiki/Category:How_the_Other_Half_Lives#/media/File :How_the_Other_Half_Lives_front_cover.png Public domain. http://www.americanyawp.com/text/how-the-other-half-lived-photographs-of-jacob-riis/ Public domain because pre 1923. https://commons.wikimedia.org/wiki/File:Jacob_Riis_portrait.jpg Public domain because pre 1923.
  36. 36. The Newark Factory Fire 1910 The Triangle Shirtwaist fire is the famous one, but the Newark factory fire a few months earlier is a textbook disaster waiting to happen so I wanted to talk about it. This building had two fire escapes -- look at the size of this building! One of them was a really heavy ladder that needed to be lifted into place. Another emergency plan that only worked for people with good upper body strength. In the fire, the young women who worked in this factory weren't able to lift down the ladder. So... only one fire escape. -- http://www.oldnewark.com/histories/factoryfirearticle.php
  37. 37. no isolation no monitoring ignored warnings delayed escalation what happened? 1910 blameful culture restricted access untested contingency plans no drills etc, etc The building was shared by a couple of paper box companies, a nightgown factory and a lamp manufacturer. It had previously been used by machine companies and the floors were soaked in oil. A fire started in the lamp factory. There was no fire alarm, and the bottom three floors had evacuated before they realised that 116 people up on the 4th didn't know there was a fire. This building had had ten fires in ten years and the buildings department had condemned this factory three times, but the factory owners basically ignored them and kept running. All of that was expensive for insurance and they didn't want another fire on their record, so they delayed calling in the firefighters, even though the firehouse was just across the street. The firehouse had a policy of reprimanding their firefighters for false alarms -- no blameless post-mortems here! -- so before raising a general alarm, they sent a couple of guys over with a fire extinguisher, delaying the real response even more. The only door up to the 4th floor was kept locked, which was against the law. The windows wouldn't open and the victims had to break glass with their hands. The window sills were four feet off the ground and the platform up to them broke under the weight of people trying to get out. And the victims had never been in a fire drill and they had no idea what to do. They,
  38. 38. quite reasonably, freaked out. 25 people died, 32 more were very badly injured. I feel like I could spend an hour just talking about this fire. There's so much to learn from it. --- (http://www.oldnewark.com/histories/factoryfirearticle.php is really good and I recommend it, if you don't mind being angry)
  39. 39. Human error is never the root cause When officials investigated, they said the root cause was not the walls soaked in grease, or delaying calling fire fighters, or the locked door, or the lack of smoke alarms or the unusable fire escapes. It was that "the victims merely succumbed to panic". The way humans react to a disaster can definitely make the situation worse -- remember those carpenters with sticks in the theater -- but that is in no way their fault. Humans will act in human ways. If your systems can't handle that, and you haven't invested a lot of time in training the humans to act in some other way, your systems are crap. --- State Farm CC BY 2.0 Ref: https://www.uvm.edu/histpres/HPJ/AndreThesis.pdf
  40. 40. “They died from misadventure and accident.” outcome...? 1910 Coroner's Jury, December 1910 So what happened? Nothing. The jury didn't convict, though at least one juror later said he regretted it. New Yorkers did look a bit at their factories and say "huh, I wonder if we should care about that"..., but nothing changed. Is it because it happened ten miles away instead of on the island of Manhattan? No idea. The New York Fire Chief said "This city may have a fire as deadly as the one in Newark at any time". Four months later… --- "They died from misadventure and accident" from http://www.nytimes.com/2011/02/24/nyregion/24towns.html "This city may have a fire as deadly as the one in Newark at any time." from http://trianglefire.ilr.cornell.edu/primary/testimonials/tf_warnings.html
  41. 41. 1911 The Triangle Shirtwaist Factory 146 people died inside 18 minutes. The famous Triangle Shirtwaist Fire. -- https://timesmachine.nytimes.com/timesmachine/1911/03/26/104859694.html https://en.wikipedia.org/wiki/Triangle_Shirtwaist_Factory_fire#/media/File:Image_of_Triangle_Shirtwaist_Factory_fire_on_March_25_-_1911.jpg Public domain. http://www.baruch.cuny.edu/nycdata/disasters/fires-triangle_shirtwaist.html
  42. 42. what happened? no isolation obsolete contingency plans restricted access ignored warnings 1911 This building was considered fireproof. They had done it right. They built a good building. But it was packed with garments hanging so tightly together that the building might as well have been made out of cloth. The building should have had three fire escapes; it had one and that collapsed under the weight of people escaping. Fire fighters came but the fire ladders and the water could only get to the 6th floor and the city had gotten taller again: the factory was on the 7th to 9th. One exit was locked; the guy with the key escaped without unlocking it. And the employers already knew about the problems. Employees had organised a strike the previous year to protest the working conditions, and they'd been fired. The building had had a recent warning notice from the department of sanitary control, but they hadn't fixed their violations.
  43. 43. better tools: stronger pump, longer ladder better incident response 1911 The fire department developed a stronger water pump and a longer ladder, so they could reach taller buildings.
  44. 44. laws: 60 in three years automated response: sprinklers accountability: the American Society of Safety Engineers better buildings 1911 But more importantly, building conditions took a big step forwards. There were 60 new laws over the next three years. Again, everyone knew factories were bad. But, again, the law didn't change until a bunch of people died ON THE ISLAND OF MANHATTAN. Sprinklers started to be required in factories. (But only factories over seven stories tall. Very specific again.) A professional organisation, the American Society of Safety Engineers (which still exists), was founded. -- After the fire, the owners of Triangle Shirtwaist factory, Harris and Blanck, were brought to court on charges of manslaughter but were eventually acquitted. They were fined $75 for each life lost. However their insurance policy paid them a total of $60,000, at the rate of $400 per life lost, so they actually profited from the tragedy. After two years, they continued to lock the doors to exits and were fined for several safety code violations. The worst people :-(
  45. 45. Phil Roeder CC BY-2.0 "...a type of exit condemned by the experience of many fires" NFPA report, 1914 And at last, people started to look at fire escapes differently. After the disaster, a report called them "a pitiful delusion." and "a type of exit condemned by the experience of many fires". --- http://www.nfpa.org/News-and-Research/Publications/NFPA-Journal/2014/September -October-2014/Features/Fire-Escapes/1914-Sound-the-Alarm
  46. 46. Barbara L Hanson CC BY 2.0 Dan DeLuca CC BY 2.0 Eden, Janine and Jim CC BY 2.0 don toye CC-BY-ND 2,0 Kristine Paulus CC-BY-ND 2.0 "...a type of exit condemned by the experience of many fires" NFPA report, 1914 The report called out a lot of reasons fire escapes are terrible: ● the platforms are too small ● people put stuff on them ● they don't get a lot of maintenance ● snow and ice makes them slippy and dangerous But most importantly ● they never, ever get tested. --- Images: Kristine Paulus CC BY 2.0. https://flic.kr/p/fszEDf (plants) Dan DeLuca CC BY 2.0. https://flic.kr/p/5hsnTM (chairs) Eden, Janine and Jim. CC BY 2.0. https://flic.kr/p/7G1tWZ (snow) Barbara L. Hanson. CC BY 2.0. https://flic.kr/p/8uxpcf (rain) Don toye, CC BY 2.0 https://flic.kr/p/9XrAs (bike)
  47. 47. “ ... fire escape collapses during times of intense use – such as during actual fires. John W. Cramer, The Story of a Tenement House Fire escapes were known to collapse during times of intense use. But they pretty much have one time of intense use. If they're going to collapse, it's going to be during a fire. So what do we do? We have a couple of options here. We can add more regulations around fire escapes: you have to maintain them, you have to try them out every year! There actually was a law about regularly painting your fire escape. To prevent against slipping you have to build a textured floor into the fire escape and leave a pair of shoes with good grips on the top of each one… Or we could step back and ask whether we're optimising for the wrong thing. ---- Quote via http://www.boweryboogie.com/2014/10/favorite-pastime-tenement-fire-escapes/ A photo called "Fire Escape Collapse" received a Pulitzer in 1976. It's fairly harrowing, so I'm not linking it here -- extreme content warning if decide you go look at it -- but it made Boston rewrite its fire escape safety laws. Journalists are amazing.
  48. 48. “ New York Times, February 25th, 1923 1923 In 1923, the New York Times had an article praising fireproof interior walls: "For six years there has been no loss of life by fire in the 200 buildings so treated." It blows my mind that a group of 206 buildings having no fire deaths in six years was considered newsworthy. In 1929 those fireproof walls became code: all new buildings over 75 feet in height had to have them, and also had to have two fully enclosed staircases! Failure domains are part of the code at last! --- https://timesmachine.nytimes.com/timesmachine/1923/02/25/105849722.html?pageNumber=141
  49. 49. 1968 "Fire escapes shall not be permitted on new construction" John VanderHaagen CC BY 2.0 The idea of building better buildings gained traction and in 1968 fire escapes stopped being allowed at all. The code still says "Fire escapes shall not be permitted on new construction". The 1968 code also required sprinklers for hotels and high-rise office buildings, but not nightclubs or residential buildings. ---- " Fire escapes shall not be permitted on new construction, with the exception of group homes. Fire escapes may be used as exits on buildings existing on December sixth, nineteen hundred sixty-eight when such buildings are altered, subject to the approval of the commissioner, or as provided in subdivision (b) hereof. " https://commons.wikimedia.org/wiki/File:New_York,_New_York,_April_1968.jpg CC BY 2.0
  50. 50. More fires. More very specific laws. 1975 - 2018 ● In 1975, seven people died in a nightclub, so sprinklers were required for nightclubs. ● In 1998 there were two bad residential fires, and now you have to have sprinklers for residences with four or more units. ● And I'm sure this story is not over and the code will be expanded many more times in response to very specific things in which a bunch of people die. Btw, there's no retrofitting of existing buildings. The laws only apply to new buildings and existing buildings get better as they're renovated. So buildings in NYC comply with the safety standard of whenever they were renovated last. Think about that, wherever you sleep tonight. --- https://pxhere.com/en/photo/900057 CC0
  51. 51. Fire deaths decreased because we built better buildings. So that was 150 years of fire codes. For decades we considered it inevitable that fires would start and spread, and we optimised for escaping from them. And we definitely got good at responding to massive fire disasters. But slowly we made progress on other, more important parts of the fire life cycle. Which I'm going to describe in four stages: --- https://commons.wikimedia.org/wiki/File:An_Old_Rear-Tenement_in_Roosevelt_Street.png Public domain.
  52. 52. 1 prevention making it harder for the fire to start We prevented sparks. A certain amount of sparks are ok! We need to cook food and have birthday candles. But by becoming more deliberate about when we make sparks, we made it harder for the fire to start at all. We moved bakeries out of residential buildings, began doing wiring inspections, did public safety campaigns about cooking and smoking. --- https://www.pexels.com/photo/fire-match-smoke-flame-54627/ CC0
  53. 53. 2 detection stopping it while it's small We worked on detection and immediate amateur response: smoke alarms, fire blankets, fire extinguishers, and more public safety campaigns. And we introduced sprinklers. --- https://commons.wikimedia.org/wiki/File:Fire-blanket-on-display.jpg Public domain
  54. 54. 3 isolation preventing it from spreading 3. We introduced failure domains, to keep the fire to one small part of the building or city. We started using materials that were hard to ignite so the fire would spread slowly. And we did fire drills, to move humans quickly and safely away from the danger area and to prevent the kind of panic that makes things worse. -- https://unsplash.com/photos/MApjpqu9V7E CC0
  55. 55. 4 response okay, we're fighting a fire And only then, 4, emergency response. We also got better at responding to massive fires. The New York Fire Department is *very good*. But step 4, this is our last resort and we should try not to rely on our last resort. We gained more from stopping the fire from getting to this point. And, if you missed my extremely subtle metaphor here, it's the same for software. --- Image: skeeze. CC0. https://pixabay.com/en/firefighters-training-live-fire-696167/
  56. 56. reliability is everyone's job 1 prevention 2 detection 3 isolation 4 response The most important reliability work is making problems stop before they get to that fourth stage. This means that reliability is everyone's problem. Everyone who's writing code or designing systems should have reliability in mind. Yeah, some people have a site reliability team. Just as we have people who specialise in UI or security, both of which we should all care about, we can have people who specialise in reliability and advocate for it. But, while SREs may occasionally act as firefighters, the more important part of their job is to be the fire safety engineers, handing out smoke alarms, legislating fire partitions, pointing out buildings that are made of wood, advocating for the removal of clutter, educating everyone. The part of their job which is being last resort firefighters? That skillset should be used rarely. You don't want the NYFD running into your kitchen every time you burn toast. If you're calling them in, it's a sign that something's gone horribly wrong. But it's still very common to have firefighters reacting to every software problem.
  57. 57. There's a really nice tradition in the ops and SRE communities, where if a site is down, people send #hugops on twitter to the people working on it. I want to particularly call out Baron Schwartz sending hugops in advance to people running mail servers on GDPR day :-D I love #hugops. I send #hugops. But one thing you'll notice if you follow the hashtag is that… a lot of things break and nobody is really surprised. We're at the stage of software evolution where we expect software to fail. We need to build better buildings in software too. And that means we think about those same four stages. --- Tweets used with permission.
  58. 58. 1 prevention making it harder for the fire to start Just like with buildings, a certain amount of sparks are fine for us too! We need to make changes. Maybe something gets overloaded or a user does something we didn't plan for. Many of us use the concept of error budgets: depending on how close we are to missing our SLAs, we make more or fewer changes. We can reduce our sparks: --- https://www.pexels.com/photo/fire-match-smoke-flame-54627/ CC0
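To make the error budget idea concrete, here is a minimal sketch. It assumes an availability target measured as successful requests over total requests; the function names and the 25% "slow down" threshold are made up for illustration, not taken from any particular tool.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent (negative if overspent).

    slo_target is the promised success ratio, e.g. 0.999 for "three nines".
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # The budget is the number of failures we are allowed in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def change_policy(remaining: float) -> str:
    """Map remaining budget to a release pace. Thresholds are illustrative assumptions."""
    if remaining <= 0.0:
        return "freeze"   # budget spent: only reliability fixes go out
    if remaining < 0.25:
        return "slow"     # nearly spent: fewer, smaller changes
    return "normal"       # plenty of budget: keep shipping
```

With a 99.9% target over a million requests, 500 failures leaves about half the budget and the policy stays "normal"; 1,200 failures overspends it and the policy becomes "freeze".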
  59. 59. hiding the matches Michael Chen CC BY 2.0 We can think about how users use our tools and provide clean, safe, validated interfaces that are hard to get wrong. We can restrict their access to functionality or data they don't need. A stove igniter is a better tool than a box of matches. --- https://flic.kr/p/LdPYz Michael Chen CC BY 2.0
  60. 60. operating with care Reproduced from NFPA's website, © NFPA (2018). The fire department recommends that you don't operate a stove while drunk or sleepy, and the same goes for root prompts or code merges. Many outages are caused by changes, so we can make them deliberately and carefully, with design review, code review and change management. --- http://www.nfpa.org/termsofuse Liberal use of NFPA fact sheets and news releases is allowable with attribution. Please use the following: "Reproduced from NFPA's website, © NFPA (year)." NFPA does not grant permission for its content to be displayed on other Web sites.
  61. 61. State Farm CC BY 2.0 wiring inspections We can make it a standard to inspect our systems, looking for regressions, looking for what has bitrotted or become overloaded. A thorough test suite is like a wiring inspection that runs on every deploy. And we can do chaos engineering: continually testing the system's resilience against chaotic events. -- https://flic.kr/p/duWtgw State Farm CC BY 2.0
  62. 62. detection stopping it while it's small 2 But, ok, sometimes, inevitably, things go wrong. We have an opportunity to put this fire out while it's tiny. --- https://commons.wikimedia.org/wiki/File:Fire-blanket-on-display.jpg Public domain
  63. 63. topquark22 CC BY 2.0 smoke alarms, fire extinguishers Humans can react quickest if the right fire extinguishers are available. Provide a one-click rollback for all your changes. Use canaries: push the change to one instance before we push to all the instances. And launch with feature flags to push out new features in a way that makes it very fast to turn them off if you need to. Alerts need a fine balance, as everyone knows who’s ever had an over-enthusiastic smoke alarm in their kitchen. An occasional false alarm is ok, but having humans continuously react to small problems can burn them out. It's using up your gunpowder on small fires and not having enough left for the big ones! So aim to keep your false alarms low. --- https://flic.kr/p/6AcBru topquark22 CC BY-2.0 https://pixabay.com/en/fire-extinguisher-fire-delete-99915/ Public domain.
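As a rough illustration of canaries plus feature flags, here is one way a flag with a gradual rollout and a kill switch might look. The `FeatureFlag` class, its fields and the instance names are all hypothetical; real flag systems differ, and this sketch only assumes that each instance has a stable string id.

```python
import hashlib


class FeatureFlag:
    """A hypothetical in-memory flag: percentage rollout plus an off switch."""

    def __init__(self, name: str, rollout_percent: int = 0) -> None:
        self.name = name
        self.rollout_percent = rollout_percent  # 0 = nobody, 100 = everybody
        self.killed = False

    def kill(self) -> None:
        """The fire extinguisher: turn the feature off everywhere, immediately."""
        self.killed = True

    def enabled_for(self, instance_id: str) -> bool:
        if self.killed:
            return False
        # Hash the flag name and instance id into a stable bucket from 0 to 99,
        # so the same instances stay in the canary group between checks.
        digest = hashlib.sha256(f"{self.name}:{instance_id}".encode()).digest()
        bucket = int.from_bytes(digest[:2], "big") % 100
        return bucket < self.rollout_percent
```

A rollout would start at a small percentage (the canary), watch the alerts, then raise `rollout_percent` in steps. If anything smells like smoke, `kill()` turns the feature off without a deploy.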
  64. 64. HomeSpot HQ CC BY 2.0 sprinklers But even better, don't get humans involved at all for small things. Add automatic recovery. If a machine dies, it should automatically be replaced. If a backend goes missing, we should be able to coast for a while. Health checking and load balancing should move traffic from an unhealthy region to a healthy one. Maybe you want to let humans know, but the message they should get is "everything is under control but you might want to look at this when you get a chance". Not "WELCOME TO 3AM! A MACHINE DID A THING". -- https://flic.kr/p/fmr7a7 HomeSpot HQ www.homespothq.com
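A small sketch of the sprinkler idea: health checking decides where traffic goes, and the human only gets paged when the automation has run out of options. The `Region` type, the `plan_traffic` function and the three severity labels are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Region:
    name: str
    healthy: bool
    weight: int = 1  # relative capacity; hypothetical field


def plan_traffic(regions: List[Region]) -> Tuple[Dict[str, float], str]:
    """Split traffic across healthy regions and decide how loudly to tell a human.

    Returns (traffic share per region, severity), where severity is
    "none", "ticket" (look when you get a chance) or "page" (wake someone up).
    """
    healthy = [r for r in regions if r.healthy]
    if not healthy:
        # Nothing left to fail over to: this one really is an emergency.
        return {}, "page"
    total_weight = sum(r.weight for r in healthy)
    shares = {r.name: r.weight / total_weight for r in healthy}
    # Something is broken but traffic has already moved: no 3am wake-up.
    severity = "ticket" if len(healthy) < len(regions) else "none"
    return shares, severity
```

If one of three equally weighted regions goes unhealthy, the other two each take half the traffic and the result is a "ticket", not a page.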
  65. 65. 3 isolation preventing it from spreading Stage 3: Ok, there's a fire, it's happening. Now we want to not let it get on anything it's not already on. -- https://unsplash.com/photos/MApjpqu9V7E CC0
  66. 66. Achim Hering CC BY 3.0 fire barriers Failure domains split our systems up so that only one part should be affected by any given outage. And if the problem's going to move as components get overloaded, we want that to be slow enough that we can control it, not an immediate cascade. And we have our own version of moving bakeries out of residential buildings: we can isolate risky customers on their own replicas or shards. --- https://commons.m.wikimedia.org/wiki/File:Durasteel_fire_barrier.jpg State Farm CC BY 2.0
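Moving the bakery out of the residential building could look something like this: most customers are spread across shared shards, and anyone flagged as risky goes to a dedicated one. The function name, the shard names and the idea of a fixed "risky" set are hypothetical; how you decide who is risky is a separate question.

```python
import hashlib
from typing import List, Set


def shard_for(customer_id: str, shared_shards: List[str], risky_customers: Set[str],
              isolated_shard: str = "isolated-0") -> str:
    """Pick a shard so that a risky customer can only set fire to their own building."""
    if customer_id in risky_customers:
        return isolated_shard
    if not shared_shards:
        raise ValueError("need at least one shared shard")
    # A stable hash keeps each customer on the same shard between calls.
    digest = hashlib.sha256(customer_id.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(shared_shards)
    return shared_shards[index]
```

Note that plain modulo hashing reshuffles customers whenever the shard list changes; it is used here only to keep the sketch short.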
  67. 67. fire drills Just like we make it incredibly common to hear a smoke alarm and find our way outside, make it so that a disaster is never a surprise. Humans will panic the first time they hit a situation that's outside their comfort zone. At intervals, tell people you're doing a controlled outage, and take a system offline. --- https://pixabay.com/en/safety-helmet-construction-hat-295057/ CC0
  68. 68. avoiding encumbrances You know the phenomenon where you're fixing something and you hit a bunch of unintuitive commands, or out of date documentation, and it ends up taking you much longer to do something simple? Or you even end up breaking something else? These traps are a basement full of straw, or a fire hose with cluttered scenery on top of it. It's making it very, very hard for you to move around safely as you try to fix the real problem. Push back on technical debt and clutter. Fatigue is an encumbrance too. You're way more likely to make a mistake if you're exhausted. Set rules about how long a person should deal with an incident before their on call shift is over and someone else needs to swap in. Enforce those rules. -- photo by me.
  69. 69. 4 response okay, we're fighting a fire And sometimes we will still get to stage 4, fighting a massive outage. But we should aim to not get here often. Firefighting is not good for your SLAs and it's also not great for the health of the humans involved. --- Image: skeeze. CC0. https://pixabay.com/en/firefighters-training-live-fire-696167/
  70. 70. controlled burns Jereme Rauckman CC BY 2.0 Ideally we'll get to a point where our firefighters mostly train using controlled outages, like many real fire departments do. But we're not there yet. Many of us are still fixing unreliable software by focusing on this fourth stage, with human response and escape routes... -- https://flic.kr/p/pjPGD6
  71. 71. Software without built-in reliability? That's a tenement. ..., that means they're building tenements. Foul air is coming in through the air shafts, and it's not somewhere humans should live. Reliability can't be added after the building is finished. It needs to be built in. Failure needs to be built in. Building better buildings makes a huge difference. --- https://commons.wikimedia.org/wiki/File:Two_officials_of_the_New_York_City_Tenement_House_Department_inspect_a_cluttered_basement_living_room,_ca._1900_-_NARA_-_535469.jpg Public domain.
  72. 72. NYC had 48 civilian fire deaths in 2016. That's the lowest in 100 years. Reprinted with permission from NFPA Report U.S. Fire Death Rates by State copyright © 2017, National Fire Protection Association, Quincy, MA. All rights reserved. In 2016, 48 people died by fires in New York City. This is still a lot of people! But 2016 was the lowest number since they started recording a hundred years ago, even though the population of the city continues to grow. That Bronx fire in December that killed 12 people was the deadliest in 25 years. How did we get from the fire traps of the 1800s to here? --- https://www.nfpa.org/News-and-Research/Fire-statistics-and-reports/Fire-statistics/Fires-in-the-US/Overall-fire-problem/Fire-deaths-by-state Used with permission from copyrightrequests@nfpa.org
  73. 73. 444 → pages! Well, this helped. This is the New York City fire code. It has 444 pages and costs $140, which I know because I really wanted to bring one in here today and dramatically wave it at everyone. The guy at the library was really confused about why I'd want a physical copy. He was like "Look, do you have access to the internet?" --- Book: http://shop.iccsafe.org/2014-new-york-city-fire-code.html Fire code: https://www1.nyc.gov/site/fdny/about/resources/code-and-rules/nyc-fire-code.page
  74. 74. 444 → pages! And fire safety is also mentioned plenty in the city building code, the city construction code, the state building code, the National Fire Protection Association electrical code and I’m sure plenty of other dense legislation. Don’t ask me what's in each of these. There’s a lot of code, that’s all I’m saying. But we don't have a fire code for software. We have a bunch of O’Reilly books and they're great. But nothing makes us adhere to our best practices, or prioritises one set of rules over the others. Why don't we have a fire code yet?
  75. 75. “ "No computer software failure has killed or injured a large number of people. It is just conceivable that such a tragedy could occur." Software: A Vital Key to UK Competitiveness (C) Crown Copyright 1986 via Risks Digest (https://catless.ncl.ac.uk/Risks) h/t joe Thompson @caffeinepresent 1986 It has been proposed from time to time! I found this report from 1986 called "Software: a vital key to UK competitiveness", which had a whole appendix on safety critical software. It starts with “No computer software failure has killed or injured a large number of people. It is just conceivable that such a tragedy could occur.” ---- https://catless.ncl.ac.uk/Risks/4/14#subj3.1 https://twitter.com/caffeinepresent/status/945079032445620226 https://pxhere.com/en/photo/1111021 CC0
  76. 76. “ "Each life-critical system must be operated by a Certified Software Engineer who is named as being personally responsible for the system." Proposal from the UK Advisory Council for Applied Research and Development, 1986 1986 The Advisory Council predicted a time when it wouldn’t be possible to recover from software failure by just switching off the computer and doing the thing manually -- this was written in 1986, remember. We're there now. They wanted certification: you would only be able to operate a life-critical computer system if you had a license and a Certified Software Engineer to sign off on it -- and they would be personally liable! -- and a bunch of other stuff, and you'd have to get re-certified every five years. They also proposed what’s basically on call shifts, disaster recovery practice drills, and post-mortems, including post-mortems for near misses. A lot of this feels prescient and we ended up doing it, but we never required certification. --- https://catless.ncl.ac.uk/Risks/4/14#subj3.1 https://twitter.com/caffeinepresent/status/945079032445620226 https://pxhere.com/en/photo/1111021 CC0
  77. 77. slide from @jkuroda's amazing LISA 2017 keynote. Used with permission. If you were at LISA in November, you might have seen Jon Kuroda's fantastic closing keynote about aviation safety. Like buildings, plane travel got safer only after a lot of bad accidents. Jon pointed out that, while we might think of computing as a new field, it's the same age as a bunch of others. Software, aviation, power, emergency medicine all took a big jump forward after World War 2. But our industry is significantly less mature than any of the others. ---- https://people.eecs.berkeley.edu/~jkuroda/talks/jkuroda-systemcrash-planecrash-lisa2017.pdf Image by me.
  78. 78. The stakes are lower? Is that because the stakes are lower? It's at least part of the reason. Mostly, the stakes have been lower. Software mostly hasn't had the ability to cause massive disasters. Researching this talk, I read a ton about deaths from software -- it really was a cheerful time creating this talk -- and found surprisingly few. Most of the news about software and deaths was about how software is IMPROVING things. By making processes repeatable and precise, we're saving lives. But we have had some famously dangerous software bugs.
  79. 79. The stakes are lower? Ars Technica, August 2013 The Independent, October 1992 New York Times, June 1986 The Therac-25 radiation therapy machine had a concurrent programming bug that made it occasionally give its patients radiation doses that were hundreds of times greater than they should have been. Three people died. In college I remember studying the London Ambulance dispatch failure. A new software system was deployed that hadn't been load tested, and it had a memory leak. It couldn't keep track of where the ambulances were, which led to them arriving hours late. 46 people died who might have been ok if the ambulance had arrived on time. And some near misses. Like, I haven't heard of any actual negative outcomes from the OCR bug that went around in 2013, but you can see how it might end up with numbers in prescriptions or structural engineering documents being catastrophically wrong. And the news is full of software concerns in vehicles, self-driving or otherwise. -- https://www.nytimes.com/1986/06/21/us/fatal-radiation-dose-in-therapy-attributed-to-computer-mistake.html https://www.independent.co.uk/news/ambulance-chief-quits-after-patients-die-in-computer-failure-1560111.html https://www.wired.com/2009/10/1026london-ambulance-computer-meltdown/ https://arstechnica.com/information-technology/2013/08/confused-photocopiers-randomly-rewriting-scanned-documents
  80. 80. https://resources.sei.cmu.edu/library/asset-view.cfm?assetid=426747
  81. 81. “"It took a Newark fire and a Triangle fire to bring New York State's fire legislation to its present inefficiency." Inis Weed, New Outlook volume 104, 1913 1913 But none of those has been our Triangle fire. So far software has been able to kill people one or a few at a time. We haven’t had the wide-scale disasters that have shocked other industries into growing up. Aviation regulations came from a bunch of people dying. Mining regulations came from a bunch of people dying. Professional engineering organisations came from a bunch of people dying. To quote my new favourite 1910s journalist, Inis Weed, "It took a Titanic disaster to improve the safety of vessels. It took a Newark Fire and a Triangle fire to bring New York State's fire legislation to its present inefficiency". The use of software for life-critical systems grows every year. And every day we send #hugops on Twitter to the people working on the latest massive software outage. At some point these will overlap. Hope is not a strategy. Are we ready for this kind of responsibility? We, all of us here, are people who are responsible for software. The world will need a lot of software over the next few decades. Some people in this room will run life critical systems. We are 1890s landlords looking at a whole lot of new opportunity. We know, there's money to be made from cutting all of the corners, but we have a choice. I don't want us to wait for a disaster... ------------------ New Outlook, volume 104. https://books.google.com/books?id=URCzNkpDZp0C
  82. 82. Inis Weed or Inis Weed Jones made topics like medicine, sociology and science exciting for regular people. She wrote extensively for Harper’s, Scribner’s and the Reader’s Digest. She lived, at least for a while, at 337 West 22nd St. She wrote tons about working conditions and humanised anonymous workers. She was an investigator for the US Commission on Industrial Relations. She wrote articles like The Reasons Why The Copper Miners Struck (about a strike), and Safer Childbirth with Less Pain, and Acne: the Plague of Youth and Not By Bread Alone (about young people returning to farming). She also published a book called "Peetie: the story of a real cat", which is $72 on abebooks.com and I won't deny, I'm tempted. She reads like a tremendously compassionate person who wrote about things people needed to care about in an engaging way and made them care. (Please don't be a milkshake duck, Inis).
  83. 83. Let's choose not to build tenements. ...to decide not to build tenements. Remember, some regulations didn't come from fires! Some came from a lot of people deciding to care about the same thing at the same time. We can decide now what good systems look like. We can create professional standards and industry safety codes, and create and opt in to a professional organisation to keep ourselves honest. And then, like the fire code, we can keep revising and improving it until huge software outages are rare and shocking. The entire industry should learn from every major outage. No secrets.
  84. 84. http://noidea.dog/fires ● Fire Escapes in Urban America: History and Preservation, Elizabeth Mary Andre ● No exit: the rise and demise of the outside fire escape, Sara E Wermiel ● How Fire Disaster Shaped the Evolution of the New York City Building Code, Charles Shelhamer ● The Creative and forgotten fire escape designs of the 1800s, Lauren Young ● New Outlook vol 104 (May-August 1913) ● RISKS Digest ● 1910 Newark Factory Fire, Mary Alden Hopkins ● New York City (NYC) Disasters, Baruch College ● Presentation template by SlidesCarnival Questions? Comments? Find me at @whereistanya or fires@noidea.dog #GetAlarmedNYC Before I finish: if you're in New York, the NYFD and the Red Cross have a shared campaign to give people free smoke alarms and free batteries. They'll even come install it for you. If you don't have a smoke alarm, please search for #GetAlarmedNYC and fill in their form. http://fw.to/Kzv1G4f (Two SREs live in my apartment, so we already have two redundant meshes of networked alarms from different manufacturers and also a few standalone alarms.) This slide lists a few references that I found especially useful or interesting while writing this talk. That first one contains a list of all the others, so hit up http://noidea.dog/fires if you want a lot of links to read more about fires and fire escapes. If you have comments on the talk, or questions or you're a building historian who is willing to tell me what I got wrong, you can find me at @whereistanya on Twitter or fires@noidea.dog. --- https://commons.wikimedia.org/wiki/File:Smoke_alarm.JPG CC0
