@paprikati_eng
About Me
Lisa Karlin Curtis
Software Engineer at GoCardless
I blog at paprikati.github.io
@paprikati_eng
[Diagram: our API sends a webhook to the integrator's webhook handler, which calls GET /resource on our API and then sends an UPDATE to the integrator's database]
@paprikati_eng
Adding a mandatory field to an endpoint
Breaking apart a database transaction
Introducing a rate limit
Changing an error response string
Changing the timing of batch processing
Reducing the latency on an API call
@paprikati_eng
Things break because an assumption made by the integrator is no longer correct
@paprikati_eng
How do assumptions develop?
01 Documentation
02 Support Articles & Blog Posts
03 Ad Hoc Communication
04 Industry Standards
05 Observed behaviour
@paprikati_eng
Explicit Assumptions
01 Documentation
02 Support Articles & Blog Posts
03 Ad Hoc Communication
@paprikati_eng
Implicit Assumptions
04 Industry Standards
05 Observed behaviour
@paprikati_eng
Developers witness a behaviour, and assume it is reliable
@paprikati_eng
If it hasn’t broken for a long time, we think it never will
@paprikati_eng
Avoiding bad assumptions
01 Documentation
02 Support Articles & Blog Posts
03 Ad Hoc Communication
04 Industry Standards
05 Observed behaviour
@paprikati_eng
Avoiding bad assumptions
Given that many integrators just look at the HTTP examples, naming is critical
Deliberately call out tripwires in your docs to combat pattern matching
Restrict and document your behaviour as explicitly as you can
@paprikati_eng
‘Breaking’ is not a binary
@paprikati_eng
How breaking is my change?
Empathise with your integrators
Dogfood your products
Observe and measure
@paprikati_eng
Releasing a potentially breaking change
01 Pull Comms: updating docs or a changelog
02 Push Comms: newsletter or email to integrators
03 Ack’d Comms: wait for a positive response from integrators before rolling out a change
Axis: likelihood of a change being breaking
@paprikati_eng
Releasing a potentially breaking change
Can you make the change incremental?
Can you release the change into a test environment?
Can you easily roll back if there are unexpected consequences?
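One possible way to make a change incremental is to enable it for a growing percentage of integrators. The sketch below is only an illustration under stated assumptions: `in_rollout`, `integrator_id`, and `change_name` are hypothetical names, and hashing into 100 buckets is one common approach rather than anything described in the talk.

```python
import hashlib


def in_rollout(integrator_id, change_name, percent):
    """Return True if this integrator is inside the first `percent` buckets.

    Hypothetical helper: the same integrator always lands in the same
    bucket for a given change, so raising `percent` only ever adds
    integrators, and setting it back to 0 rolls the change back for all.
    """
    key = f"{change_name}:{integrator_id}".encode()
    digest = hashlib.sha256(key).digest()
    # Map the first 4 bytes of the hash onto a bucket from 0 to 99.
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

Because the bucket depends only on the two names, the decision is stable across requests and servers without any shared state.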
@paprikati_eng
Find the balance between caution and product delivery that’s right for you.
CREDITS: This presentation template was adapted from a template by Slidesgo, including icons by Flaticon, and images by Unsplash
@paprikati_eng

How to avoid breaking other people's things

Editor's Notes

  1. I’m Lisa Karlin Curtis, born and bred in London. I’m a software engineer at GoCardless working in our core-banking team. I’m gonna be talking about how to stop breaking other people’s things.
  2. We’re going to start with a sad story. A developer notices that they have an endpoint that has a really high latency compared to what they’d expect. They find a performance issue in the code (essentially an exacerbated N+1 problem), and they deploy a fix. The latency on the endpoint goes down by a half. The developer stares at the beautiful graph with a lovely cliff shape, feels good about themselves, and moves on. Somewhere else in the world, another developer gets paged - their database CPU usage has spiked and it is struggling to handle the load. So what happened here?
  3. They start investigating - there’s no obvious cause. No recent changes, request volume is pretty much as expected. They start scaling down queues to relieve the pressure, which solves the immediate issue. The database seems to have recovered. Then they notice something strange. They’ve suddenly started processing webhooks much more quickly than they used to. It turns out that our integrator had a webhook handler that would receive a webhook from us and then make a request back to find the status of the resource. This was the endpoint that we had fixed earlier that day. By the way, I’m going to use the word integrator a lot - what I mean is people who are integrating against the API that you are maintaining. Sometimes that will be inside your company, sometimes it will be a customer. Back to the story. That webhook handler spent most of its time waiting for our response, before then updating its own database. So the slow endpoint was essentially rate limiting the webhook handler’s interaction with its own database. It’s worth noting that our webhooks are often a result of batch processes, so they are really spiky - we send lots of them in a short space of time, a couple of times a day. As the endpoint got faster, during those spikes, the webhook handler started to apply more load to the database than normal, to such an extent that an engineer got paged to resolve a service degradation. The fix here is fairly simple: scale down the webhook handlers so they process fewer webhooks and the database usage returns to normal. Or alternatively, beef up your database. This shows us just how easy it is to accidentally break someone else’s thing - even if you’re trying to do right by your integrators.
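One way an integrator could guard against this kind of surprise is to put an explicit limit on how fast the webhook handler writes to its own database, rather than relying on the upstream endpoint being slow. The following is a minimal sketch under assumptions: `TokenBucket`, `handle_webhook`, `write_to_db`, and `requeue` are hypothetical names for illustration and not part of any real integration.

```python
import time


class TokenBucket:
    """Allows at most `rate` operations per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self):
        """Consume one token and return True if available, else return False."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def handle_webhook(event, bucket, write_to_db, requeue):
    # Hypothetical handler: write only when the limiter allows it,
    # otherwise put the event back on the queue for a later attempt.
    if bucket.try_acquire():
        write_to_db(event)
    else:
        requeue(event)
```

With a limit like this, database load during a webhook spike is bounded by the handler's own configuration, so a faster upstream response no longer changes it.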
  4. When do we break things? To set the scene, here are some examples of changes that have broken code in the past. Traditional API changes: adding a mandatory field, removing an endpoint, changing validation logic - I think we’re all comfortable with this stuff. Introducing a rate limit or changing your rate limiting logic: Docker did this recently and I think communicated really clearly, but it obviously impacted lots of their integrators. Changing an error string: at GoCardless we found a bug where we weren’t respecting the Accept-Language header on a few of our endpoints, and we fixed it, and one of our integrators raised a ticket saying that we’d broken their software - it turned out they were relying on us not translating that particular error. Breaking apart a database transaction. Changing the timing of your batch processing: we can see from our logs that certain integrators create lots of payments ‘just-in-time’, i.e. just before our daily payment run, so we know that changing our timings without communicating with them would cause significant issues. Reducing the latency on an API call.
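The error-string incident is easier to see in code. This is a minimal sketch, assuming a hypothetical error payload that carries a stable machine-readable `code` next to a human-readable `message`; the field names and values are invented for illustration and are not taken from the GoCardless API.

```python
def is_duplicate_fragile(error):
    # Assumes the message text never changes. This breaks as soon as
    # the message is reworded or translated for another language.
    return error["message"] == "Bank account already exists"


def is_duplicate_robust(error):
    # Relies only on a field that is meant for programs to read.
    return error.get("code") == "bank_account_exists"
```

The fragile check works until the provider starts honouring a language preference; the robust check keeps working in both cases.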
  5. I’m gonna define a breaking change as something where I (the API developer) do a thing and someone’s integration breaks. And that happens because an assumption made by that integrator is no longer correct. When this happens, it’s easy to criticise the engineer who’s made that assumption. But assumptions are inevitable - as a developer you really can’t get anywhere without them. And even if it is their fault, it’s often your problem. Possibly not if you’re Google or AWS (unless it’s Slack that you’ve killed), but for most companies if your integrators are feeling pain, then you’ll feel it too, either immediately or in the long term, when you're trying to renew contracts.
  6. There are a few different ways that assumptions develop.
  7. Some of these are explicit: an integrator asks a question, gets an answer, and builds their system based on that answer. The first step when you’re building an integration is often to look at the documentation. Although it's worth noting that people often skip to the examples and don't actually read any of the text that you have slaved over, so you really need to make sure that your example is super representative. They might also look at support articles and blog posts - either stuff you’ve published or maybe from a third party. And then you have ad hoc communication. What I mean by this is random emails or phone calls, maybe with a pre-sales team or your solution engineers; it might be a conversation that gets had on a support ticket; it might be emailing the friend that you have that used to work at the company. All of that kind of ad hoc communication is still driving the assumptions that integrators make about how your software is going to behave.
  8. Other assumptions are more implicit. Industry standards are quite interesting: if you send me a JSON response, you're going to give me an application/json Content-Type header. So I don't need to tell my HTTP client that it's going to be JSON because it can work it out for itself, and I'm going to assume as an integrator that that never changes. Similarly, I assume that you will keep my secrets safe. So if you tell me my access token was used to create something, I’ll assume it was me. Generally this stuff is fine, but in some cases you can find yourself in trouble if these standards change. We had a really bad incident where we upgraded our HAProxy version, which was observing the new industry standard and downcased all our outgoing HTTP headers. According to the official textbook, HTTP response headers should not be treated as case sensitive, but a couple of key integrators had been relying on the previous behaviour and had a significant outage. And that outage was actually exacerbated by the fact that their requests were being processed but they weren't processing our responses, and that meant that we had two systems that were out of sync in a really unfortunate way. Observed behaviour is covered on the next slide.
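On the integrator side, the header incident comes down to how a header is looked up. Below is a minimal sketch of a case-insensitive lookup, assuming the response headers arrive as a plain dictionary; `get_header` is a hypothetical helper, and many HTTP client libraries already provide an equivalent.

```python
def get_header(headers, name, default=None):
    """Look up an HTTP header by name, ignoring the case of the name."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return default
```

A lookup like `headers["Content-Type"]` on a plain dictionary would fail once the server starts sending `content-type`, whereas this helper returns the same value either way.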
  9. As an integrator, you want the engineers who run the services that you use to be constantly improving them and adding features, but in a way you also want them to not touch anything so you can be sure that the behaviour won’t change. As soon as a developer sees something, whether that’s an undocumented header on an HTTP response, a batch process that happens at the same time every day, or a particular API latency, they assume it’s reliable and build their systems accordingly. Humans also pattern match really aggressively - not just in software but in all walks of life. We find it very easy to convince ourselves that correlation = causation, and that means, particularly if we can come up with an explanation of why A always means B, we are quick to accept and rely on it. When you think about it, this is a bit bizarre - we are all employed to make changes to our own systems, so we should understand that they are constantly in flux. We also all encounter interesting edge cases every day where someone has hit some incredibly unlikely scenario that’s caused our code to misbehave. But we all assume that everyone else’s systems will stay exactly the same forever.
  10. None of this stuff is new. A great example of this is MS-DOS. MS-DOS was released with a number of documented interrupts, calls, hooks - all that retro stuff - but early application developers found that they weren’t able to achieve everything they wanted. This was made worse because Microsoft would use undocumented calls in their own software, so it was impossible to compete using only what was in the documentation. So like all good engineers, they started decompiling the OS, and writing lists of undocumented information like Ralf Brown’s Interrupt List. This information was shared, and using these undocumented features became so widespread that Microsoft couldn’t change anything without breaking all these applications that people used every day. We can think of the interrupt list as being analogous to someone writing a blog on Medium called ‘10 things you didn’t know that X API could do’.
  11. Some of these assumptions are also unconscious. Once something has been stable for a while, we sort of just assume it will never break. We also make our resourcing choices based on previous data, because napkin math is always quite haphazard. So when I’m choosing how much CPU to allocate to my pod, I pick a number out of thin air, see what happens, and then change it until it’s happy. That works fine as long as what that pod is being asked to do is reasonably consistent over time but, as we’ve discussed, that’s not always true. We can see this in our first story: the database had plenty of resource until our endpoint got faster.
  12. So if we want to stop breaking other people’s things, we need to help our integrators stop making bad assumptions.
  13. Document edge cases. Discoverability is important: think about SEO and also search within your docs site. Don’t ever deliberately not document something. If it’s subject to change, call it out so there’s no ambiguity.
  14. Keep your own articles religiously up-to-date and searchable. If you’ve got 3rd party blogs that are incorrect, try contacting the author, commenting with the fix needed to make the guide work, or pointing them at an equivalent page. If you get unlucky, that 3rd party content can become the equivalent of Ralf Brown’s Interrupt List.
  15. Consistency is key. If a developer wants to understand what might break things, they need to know what communication is going out, ideally in a super searchable format. In my experience, many B2B software companies end up emailing random PDFs around or creating shared Slack channels, at which point the engineers working on the product don’t really stand a chance of knowing what assumptions might have been made as a result.
  16. Follow industry standards where you can. Flag really loudly if you can’t, or where the industry has not yet settled.
  17. There’s a lot to think about with observed behaviour
  18. Naming is really important, particularly when developers don’t read the docs and just look at the examples. One example is numbers that begin with 0s, which often get truncated (such as a company registration number). We also have a field in our API called ‘account_number_ending’, but unfortunately in Australia some account numbers have letters in them, which is pretty sad. You can also try to draw attention to an edge case in the docs, particularly by making the example include it. Use documentation and communication to combat pattern matching. If you know you could change your batch timings, call that out in the docs: ‘we currently run it once a day at 11am, but this is likely to change’. Expose information on your API that you might want to change - it’s a good flag. Restrict your own behaviour, both by documenting a limit and by implementing it in the code, to ensure you keep to that commitment. We had an issue at GoCardless where somebody that we integrate with started adding a lot of extra events to each webhook, and our webhook handlers ran out of memory because they were loading so much data.
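The point about documenting a limit and then enforcing it in code can be sketched as below. This is a minimal illustration, not how GoCardless or any real API does it: the limit of 250 events, the payload shape, and the names `MAX_EVENTS_PER_WEBHOOK`, `batch_events` and `build_webhooks` are all assumptions made up for the example.

```python
from typing import Iterator, List

# Hypothetical documented commitment:
# "a single webhook will never contain more than 250 events".
MAX_EVENTS_PER_WEBHOOK = 250


def batch_events(
    events: List[dict], limit: int = MAX_EVENTS_PER_WEBHOOK
) -> Iterator[List[dict]]:
    """Yield successive batches of at most `limit` events, preserving order."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(events), limit):
        yield events[start:start + limit]


def build_webhooks(events: List[dict]) -> List[dict]:
    """Build webhook payloads that each respect the documented limit."""
    return [{"events": batch} for batch in batch_events(events)]
```

With the limit enforced on the sending side, a sudden increase in event volume shows up to the integrator as more webhooks rather than as larger ones, so a handler sized for the documented maximum keeps working.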
  19. For complex products, it’s very unlikely that all your integrators will have avoided bad assumptions. So we need to find strategies to mitigate the impact of our changes.
  20. The first thing to remember is that a change isn’t simply either breaking or not. If an integrator has done something strange enough, almost anything can be breaking. This binary is historically used to assign blame: if it’s not ‘breaking’ then it’s the integrator’s fault. As we discussed earlier, it may not be technically ‘your fault’, but it’s probably still your problem. If your biggest customer’s integration breaks, the fact that you didn’t ‘break the rules’ will be little consolation to the engineers up all night trying to resolve it. So instead of thinking about it as a yes/no question, we should think about it in terms of probabilities: how likely is it that someone is relying on this behaviour?
  21. Not all breaking changes are equal. Yes, some changes are 100% breaking (e.g. killing an endpoint), but many are neither 0% nor 100%. Try to empathise with your integrators about what assumptions they might have made. Use people in your organisation who are less familiar with the specifics than you are to rubber duck. If possible, try to talk to some of your integrators. If you can, find ways to dogfood your APIs to find tripwires. This is particularly good as an onboarding exercise: it helps your new joiners immediately put themselves in the shoes of your integrators, and helps you keep docs and guides up-to-date, as well as introducing them to your product. Sometimes you can even measure it: add observability to help you look for people relying on undocumented behaviour. For example, we can see a spike in Payment Create requests every day just before our payment run. This can also help you identify which integrators will be impacted.
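As a rough sketch of the measurement idea above, the function below looks through a request log for integrators who send most of their requests just before a daily run. Everything specific here is an assumption for illustration: the log shape (integrator id plus a naive timestamp in the same timezone as the run), the 30 minute window, the 80% threshold and the function name. It also ignores runs that fall shortly after midnight.

```python
from collections import Counter
from datetime import datetime, time, timedelta
from typing import Iterable, List, Tuple


def integrators_clustered_before_run(
    requests: Iterable[Tuple[str, datetime]],
    run_time: time,
    window: timedelta = timedelta(minutes=30),
    threshold: float = 0.8,
) -> List[str]:
    """Return integrators that send at least `threshold` of their requests
    within `window` immediately before the daily run time."""
    total: Counter = Counter()
    in_window: Counter = Counter()
    for integrator_id, sent_at in requests:
        # The run on the same calendar day as the request.
        run_at = datetime.combine(sent_at.date(), run_time)
        total[integrator_id] += 1
        if run_at - window <= sent_at < run_at:
            in_window[integrator_id] += 1
    return sorted(
        integrator
        for integrator, count in total.items()
        if in_window[integrator] / count >= threshold
    )
```

An integrator that shows up in this list is probably relying on the current run time, which makes them a good candidate for a direct conversation before the timing changes.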
  22. Scale your release approach depending on how many integrators you think have made the bad assumption. We want to have different strategies to employ at different levels. If we over-communicate, we get into a ‘boy who cried wolf’ situation where no-one reads anything we send them, and their stuff ends up breaking anyway. Surprisingly, the email in their inbox that they didn’t read doesn’t make them feel better. Start at pull comms: updating docs or a changelog. This is useful to help integrators recover after they’ve found an issue. You can then upgrade to push comms, perhaps a newsletter or email. This is where it gets tough - we all ignore emails every day - so try to make sure the content is as relevant as possible. Don’t tell integrators about changes to features they don’t use, and try to resist the temptation to include marketing content in the developer-focussed comms. Then, if you’re really worried, you can use explicitly acknowledged comms. This works well if you have a few key integrators you want to check in with before pulling the trigger.
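One way to make the three levels above concrete is to write the escalation rule down, so it is applied consistently rather than decided ad hoc. The sketch below is only an illustration: the 5% and 25% thresholds, the `key_integrator_affected` flag and the names `Comms` and `choose_comms` are invented for the example, and the right cut-offs will depend on your product.

```python
from enum import Enum


class Comms(Enum):
    PULL = "update docs and changelog"
    PUSH = "send a targeted email or newsletter"
    ACKNOWLEDGED = "get explicit acknowledgement before release"


def choose_comms(
    estimated_share_affected: float, key_integrator_affected: bool = False
) -> Comms:
    """Pick a comms level from a rough estimate of how many integrators
    rely on the behaviour being changed (thresholds are hypothetical)."""
    if not 0.0 <= estimated_share_affected <= 1.0:
        raise ValueError("estimated_share_affected must be between 0 and 1")
    if key_integrator_affected or estimated_share_affected >= 0.25:
        return Comms.ACKNOWLEDGED
    if estimated_share_affected >= 0.05:
        return Comms.PUSH
    return Comms.PULL
```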
  23. We can also mitigate the impact of a breaking change by releasing it in different ways. If at all possible, you want to make changes incrementally, to give early warning signs to your integrators. For example, apply the new behaviour to a % of requests. That will help integrators avoid performance cliffs and could turn a potential outage into a minor service degradation. Many integrators will have ‘near miss’ alerting to help them identify problems before they cause significant damage. If you’ve got a test or sandbox environment, that’s also a great candidate. Making changes there (if integrators are actively using it) can act as the canary in the coal mine. The final point is about rolling back: if your biggest integrator phones you and tells you that you’ve broken their integration, it’s really nice to have a kill switch in your back pocket to stop the bleeding. Now that’s obviously not always possible, because it totally depends on the nature of the change. But it’s worth knowing what that kill switch is, and also being really clear internally about when one isn’t possible, so that as soon as that call comes in, you know what your options are.
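The percentage rollout and kill switch described above could look something like the sketch below. It is an assumption-laden illustration rather than a recommended implementation: the function names, the use of a request id as the hashing key, and the idea that `percent` and `kill_switch` come from some runtime configuration are all hypothetical.

```python
import hashlib


def in_rollout(request_id: str, percent: float) -> bool:
    """Deterministically place a request in one of 10,000 buckets and
    report whether it falls inside the first `percent` percent of them."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100


def use_new_behaviour(
    request_id: str, percent: float, kill_switch: bool = False
) -> bool:
    """The kill switch always wins; otherwise apply the rollout percentage."""
    if kill_switch:
        return False
    return in_rollout(request_id, percent)
```

Hashing the id rather than calling a random number generator means a retried request gets the same behaviour as the original attempt. Raising `percent` in small steps gives integrators’ near-miss alerting a chance to fire, and flipping `kill_switch` returns every request to the old behaviour without a deploy.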
  24. The only way to truly avoid breaking other people’s things is to not change anything at all, and often even that is not possible. Also, we’d mostly be out of a job. Instead, we should think in terms of managing risk. We’ve talked about ways of preventing these issues by helping your integrators make good assumptions in the first place, and about how important it is to build and maintain a capability to communicate when you are making potentially breaking changes, to help mitigate the impact. But you aren’t a mind reader, and integrators are sometimes careless and under pressure, just like you. So be cautious: assume that your integrators didn’t read the docs perfectly, or at all, and may have cut corners. They may not have the observability of their systems that you might hope or expect. You need to find the balance between caution and product delivery that’s right for your organisation. For all the modern talk of ‘move fast and break things’, it is still painful when stuff breaks, and it can take a lot of time and energy to recover. Building trust with your integrators is critical to the success of a product, but so is delivering features. We may not be able to completely stop breaking other people’s things, but we can definitely make it much less likely if we put the effort in.
  25. I hope you’ve enjoyed the talk - thank you for listening! Please find me on Twitter at @paprikati_eng if you’d like to chat about anything we’ve covered today. Have a great day!