Updating software in humans
Computers are mostly terrible
You have to play dumb to get anything done
War stories are fun!
https://app.intercom.io/🚢
->
https://shipitcon.com
Computers are mostly terrible
A fun example - why do you need
to suspend curiosity most of the
time?
There are no guarantees when you
save a file.
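
One common strategy for working around this is to write to a temporary file, flush it, and rename it over the original. The sketch below is an illustration only, assuming a POSIX system; `durable_write` is a hypothetical helper name, not something from the talk. Even this cannot promise durability if a disk or driver misreports what it has persisted, which is rather the point.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path via a temporary file, fsync and an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename stays
    # on one filesystem (a rename is only atomic within a filesystem).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to push its cache to the device
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself is persisted (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A reader of the file sees either the old contents or the new contents, never a half-written mix. Whether the bytes survive a power cut still depends on layers this code cannot see.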
Computers are mostly terrible ✅
You have to play dumb to get anything done✅
War stories are cool
“A mental model is an explanation of someone’s
thought process about how something works in
the real world. It is a representation of the
surrounding world, the relationships between its
various parts and a person’s intuitive perception
about his or her own acts and their
consequences.”
Cool war story
That unfortunate time we destroyed
all of our Elasticsearch clusters.
💩
💩💩
💩 💩
💩
💩
💩 💩
💩
💩
Our understanding and mental
model of the extent of our
automation, and related risks, was
incomplete.
Back to the deployment?
😅
Some highlights from CI/CD
check_minimum_sha
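
The slide only names this check, so the following is a guess at its shape rather than Intercom's actual code: a minimal sketch that assumes the check asks git whether a known minimum commit is already part of the branch being built. The function body, the `run_git` parameter and the `git_exit_code` helper are all hypothetical; the only real interface used is `git merge-base --is-ancestor`.

```python
import subprocess
from typing import Callable, Sequence


def git_exit_code(args: Sequence[str]) -> int:
    """Run a git command in the current repository and return its exit status."""
    return subprocess.run(["git", *args], check=False).returncode


def check_minimum_sha(
    minimum_sha: str,
    run_git: Callable[[Sequence[str]], int] = git_exit_code,
) -> bool:
    """Return True if HEAD already contains the commit minimum_sha."""
    # `git merge-base --is-ancestor A B` exits 0 when A is an ancestor of B,
    # 1 when it is not, and with some other status when git itself fails.
    status = run_git(["merge-base", "--is-ancestor", minimum_sha, "HEAD"])
    if status == 0:
        return True
    if status == 1:
        return False
    raise RuntimeError(f"git failed with exit status {status}")
```

A CI step could call `check_minimum_sha` with the agreed minimum commit and fail the build with a "please rebase" message when it returns `False`. Passing `run_git` in makes the logic testable without a real repository.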
Computers are terrible ✅
You have to play dumb to get anything done ✅
War stories are cool ✅
Updating your mental model:
Ask people stuff
Documentation (lol)
Read source code
Chaos experiments
Thank you!
@brian_scanlan

Updating Software in Humans

Editor's Notes

  1. Hello! It’s really cool to be here at such an amazing conference! I really love this room, though the table layout reminds me of a wedding. This is the weirdest wedding I’ve ever been at. This is a talk about mental models that plausibly sounds like it could be part of a software delivery conference! It covers some topics already mentioned in other talks, hopefully I won’t contradict them too much.
  2. First I’m going to spoil my own talk. Here’s what you will take away from this talk. If “computers are terrible” is a surprise, you are probably at the wrong conference, and you weren’t watching Laura’s talk just there.
  3. But before I cover those topics, does anybody remember this? Charity suggested on twitter that we should do some deployments on stage! So I’m also going to do a deployment and talk a little bit about why deployments are important to Intercom. Later on I’ll chat through some of the details of our deployment process.
  4. So I’m going to ship something to Intercom live on stage. This is a terrible idea, live demos can go badly and be very tedious to watch! The deployment will probably work. It will definitely eventually work, our deployment pipeline is generally robust, though it’s not extremely fast and is subject to the occasional wobble. There’s a chance it may not go through by the end of this talk especially if I get a bout of nerves and make a mistake! If it doesn’t work, I promise to keep coming back to this conference to talk every year until I do manage to ship successfully to Intercom during a talk.
  5. Here’s what I’m going to ship. It’s a simple redirect of app.intercom.io/THE SHIP EMOJI to shipitcon.com. You can test it now to see if it works. After I kick this off, I’ll explain a little about Intercom and why we care so much about shipping software.
  6. I work at Intercom. Our Dublin office is literally next door; I spend a reasonable amount of time every day looking out of our window onto the roof of this building. Intercom is an Irish software startup that helps online businesses talk to their customers, primarily using a messenger.
  7. We provide software as a service to our customers.
  8. Here’s our messenger. There’s gifs, emojis. There’s also a backend app which has a lot more going on but doesn’t work as well on giant screens!
  9. The messenger space is pretty competitive, there’s a huge opportunity to create a lot of value, and so we have to move fast.
  10. In the R&D team at Intercom we use principles to drive our work. These allow us to teach, share what we’ve learned and scale how we think about building great product. They’re a bunch of lessons learned. They’re opinionated, and not simply a bunch of truisms. The principles distill how we think about building great product.
  11. Here are 3 of the principles we use in our R&D organisation. “Ship to learn” is a universal R&D principle. The sooner we ship, the quicker we learn how our product is used. Getting software into users’ hands quickly lets you understand how it’s used.
  12. “What you ship is what matters” is a design principle, used by our design team. Our designers care about what is actually built, delivered and usable by our users, not the artifacts created along the way. The process is important, but the output is the critical part. Sketch? Sigma.
  13. “Build in small steps” is a direct instruction to our engineers. Make small changes frequently. Break work down into safer, smaller steps. This doesn’t just refer to changes done via code deploys and pull requests, but also the usual modern “testing in production” techniques such as feature flags. In addition to being iterative and supporting an agile development process, there are secondary benefits such as helping our availability and quality, and again it lets us understand what actually happens when we ship what we’re building. These aren’t universal truisms. It would be reckless of an infrastructure provider to globally or naively apply these. They also imply a bunch of supporting work that you need to do to allow these to happen.
  14. Ok, so getting back on track. Computers are great! They can do things like tell the time, save files, do basic arithmetic, run automated tests, deploy software and talk to each other in large distributed systems just so that we can be all agile and stuff when we want to serve our Uber for cats startup. The thing is, computers are terrible at all of those things. Or rather, those things sound simple and obvious, but the reality is quite difficult. And things get even worse when you have lots of computers attached to a network!
  15. Almost all of the time, we are working with things for which we have imperfect, incomplete and at times incorrect mental models. Almost all of the time, this is ok. In fact, it is utterly essential that this remains the case if we want to get anything done. So let’s look at a very basic thing that everybody does with computers every day.
  16. Haha, what even is this thing? It’s a 3.5 inch floppy disk from the 90s! But it’s also used as the icon to save data everywhere, even though nobody saves anything to disk these days. There are a million tweets along these lines, this is not an original joke. Nobody really uses floppy disks any more, we get it. Detachable storage is generally greatly frowned upon, don’t we all use Dropbox etc.? But not only is this a terrible icon, reliably saving stuff to any type of disk is surprisingly hard.
  17. The problem is that there are no guarantees about what happens when the program you’re writing calls the write() system call. The OS, the filesystem and the disks themselves have multiple intermediate layers that are crucial for a high performing system but make it hard to reason about where your data actually is.
  18. The “documentation” doesn’t help. Here’s an extract from “man mount” on a modern Linux system. The “rumour” is from around 2001, nearly 20 years old. There are some common strategies to work around the limitations of filesystems, such as renaming files to achieve atomicity. I’ve done this a load of times in simple situations, but it’s a different story when it comes to actually being able to recover the data in all circumstances. If you’re writing parts of MySQL or PostgreSQL or a distributed system like S3 with serious data durability requirements, you need to dig deep here and understand the precise behaviour of the different parts that make up a disk (caches, platters etc.) as well as the drivers and filesystems. For the rest of us? It’s ok to think that when you click the floppy disk and save the file, that the file is saved. It works the vast majority of the time and you have to pretend you don’t know what’s happening.
  19. So we’ve shown that computers are terrible and you have to ignore the reality to actually do anything.
  20. This is a good quote, lifted from @copyconstruct’s writeup about mental models. The mental model you need to have unless you’re a MySQL or kernel developer is “the computer saved my file”. I guess the trick is knowing when to go deeper. When things break, that’s a great time to go deeper!
  21. So here’s a war story from Intercom where we had to rebuild our understanding of something.
  22. Here’s a network diagram of our production setup in the cloud. We’re hosted entirely in Amazon Web Services in us-east-1 (North Virginia). There are different network subnets in our cloud. We use three. This is where our services live. I’m not really dumbing this down, we try to keep things very simple.
  23. A subnet has different bits of configuration, like the IP addresses it can use. It also has a routing table that tells the computers where to send the packets. This routing table is ALSO VERY SIMPLE - a small number of entries.
  24. Dumpster fire, everything’s fine dog.
  25. The same routing table is used across all subnets. We use TERRAFORM, an infrastructure as code tool, to manage a bunch of our AWS infrastructure including our network setup, routing tables etc. It allows us to define our network in code and it translates this to AWS API calls. We added NEW SUBNETS, but the way we had configured TERRAFORM meant that it would REMOVE AND ADD BACK THE ROUTING TABLES WHEN A NEW SUBNET WAS ADDED. I’ll say that again. When we add a new subnet, the routing table was removed from all subnets and then recreated and added back to all the subnets. This made a conceptually simple change into a complex, dangerous change. You wouldn’t do that in the AWS UI. The reason for this was that Terraform’s language is pretty basic and gives very few primitives to program with.
  26. By now you can probably guess where this is going, though trust me it gets better. We added some new subnets, but got the new IP ranges wrong. They were overlapping with existing ones. The routing table got DELETED by Terraform, it tried to create the new subnets, then boom. And then it basically gave up after it couldn’t create the new subnets. Complete network outage of our production cloud environment in AWS for 14 minutes and 57 seconds. Engineers, mostly in SF, did an amazing job getting us back to a good state. This was on a Friday evening. I was actually in the pub for my 40th birthday watching all this unfold over Slack. It was very impressive to see our Incident Command and global shared on call kick in. Once we were in a major event mode, of course large numbers of engineers joined to help out. This was great to see, especially from the pub :D
  27. In general, automation is amazing, but sometimes it can really bite you in the ass.
  28. We use OpsWorks, which is basically hosted Chef, to manage our Elasticsearch clusters. This is where we hit a fully documented but not well understood feature of AWS’s OpsWorks service. Because the hosts weren’t contactable, OpsWorks decided to “autoheal” the hosts by moving them to new hardware. All the search data was stored on the nice fast local disks, gone. So basically OpsWorks auto healed the shit out of our Elasticsearch clusters, leaving us with 10 wonderfully empty clusters.
  29. We built a list of known automation, examined CloudWatch logs, and looked for areas where errors could result in damage or where there weren’t safeguards against destroying production infrastructure. “single bullet gun” This is incomplete - we weren’t formally proving what is out there, but an audit of what we can work with easily is a good start.
  30. So we’ve shown that computers are terrible and you have to ignore the reality to actually do anything. Since we’ve been looking at that deployment, and this is a conference about delivering software I’m going to show off some of the more interesting parts of our testing process.
  31. Sometimes we change something in our environment that is not backwards compatible. Like, we upgrade a database or something. This check gives us the ability to force developers to a minimum version so that we don’t annoy them. We’ll tell everybody to rebase. Happens infrequently enough, but a good way to avoid doing a lot of work at times. Also we work off of trunk!!!
  32. This is from our Dockerfile. We don’t use Docker in production, but it’s useful to share test artifacts. We use a vendored copy of a Docker base image from CircleCI (who we don’t use for Intercom) based off Ruby 2.5.5 (which we have long since migrated off of) to install a “stretch” Docker image (we don’t use Debian in production, we use Amazon Linux, which is CentOS based). Why do we vendor this? Because upstream changes have broken our deployment pipeline, so we’d rather control this. That’s a lesson learned!
  36. Here we just install a vendored version of MongoDB Enterprise server (we don’t use enterprise server in production) for some reason at the end of a shell chain that installs a vendored version of OpenSSL. Something to do with the move from jessie. “It works”
  37. Here’s libeatmydata
  38. Whirlwind tour through Docker. Some Mongo config, some installing MySQL from scratch (we run AWS’s RDS Aurora in production), some Redis stable (we use ElastiCache Redis in production) and some Ruby stuff! Next, we load the schema from a cache… no I mean the schema cache from a… cache…
  39. In our modern CI/CD environment, a “green” build has passed all its tests and is therefore safe to ship to production. A quick look under the covers shows that this is often far from the case, for example non-deterministic tests being covered up by retries. Also when we deploy our monolith to over a thousand servers, it doesn’t happen at once. At any moment there might be many software versions running, even on the same host. But for all but a bunch of narrow edge cases, this is totally fine. My point here is that we’ve got a reliable, battle-hardened, complex build and deploy system. To ship code at Intercom, you’re better off not knowing about any of this stuff! It’s another example of having to act dumb to get your job done. The abstraction here is critical.
  40. So we’ve shown that computers are terrible and you have to ignore the reality to actually do anything. Updating your knowledge of what’s going on is useful when you realise that what you’re working with doesn’t work the way you expect.
  41. Once you realise there’s value in updating your model of the world, what can you do about it? For example, my understanding of the main drivers of Intercom’s AWS cloud costs and how they are influenced by how we autoscale isn’t something that can be solved by writing some debugging statements. Analyse data, ingest it into analytical tools, build hypotheses and make decisions on the basis of what you now think you know.
  42. There are some great materials on a bunch of the topics I’ve covered here that go into a lot more detail and are probably ten times more articulate. Copy Construct’s blog post that I already mentioned, Tanya Reilly’s “Nobody could have predicted this”, Dan Luu’s talk from Deconstruct which goes into a lot of detail about how reliably saving data is very hard. I’ll tweet out links and the slide deck after the talk!
  43. The backend looks like this. It looks similar enough to
  44. So unlike the proverbial frog in boiling water, we did notice some problems.
  45. There’s my twitter handle, I hope you enjoyed the talk. Thanks for listening!
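
Note 26 describes new subnets whose IP ranges overlapped with existing ones. As an illustration of the kind of cheap pre-flight check that could catch this before any infrastructure tool runs, here is a minimal sketch using Python's standard `ipaddress` module. The `find_overlaps` helper and the CIDR ranges in the usage note are hypothetical, not Intercom's tooling or network layout.

```python
import ipaddress
from typing import Iterable, List, Tuple


def find_overlaps(
    existing: Iterable[str], proposed: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return (proposed, other) CIDR pairs whose address ranges overlap."""
    known = [ipaddress.ip_network(cidr) for cidr in existing]
    clashes = []
    for cidr in proposed:
        candidate = ipaddress.ip_network(cidr)
        for other in known:
            if candidate.overlaps(other):
                clashes.append((str(candidate), str(other)))
        # Later proposals must not collide with earlier ones either.
        known.append(candidate)
    return clashes
```

For example, with made-up existing ranges `10.0.0.0/24` and `10.0.1.0/24`, proposing `10.0.1.128/25` reports a clash with `10.0.1.0/24`, while `10.0.2.0/24` passes cleanly. A check like this only covers the ranges you feed it, so it updates one corner of the mental model rather than proving the change is safe.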