Outline
Error Handling in Concurrent Systems
Aka Building Concurrent Systems in a Hostile Environment
Turning the dumpster fire we have into the one we deserve.
Hi
I’m Angus, I Guess
I work at Liveops Cloud
My opinions are my own (as much as you can own an opinion comma man).
@angusiguess on twitter
angusiguess on github
I like bikes. A lot of bikes. A lot.
Why I’m interested
This time last year, I was interested in systems working, so I
talked about correctness.
A lot has happened since then.
Why I’m interested
Namely, a lot of my code has gone to production.
And a lot of that code has failed.
Sometimes silently.
And things have gotten weird.
So I did some reading
Because someone smarter than me probably solved
this in the 60’s through 80’s.
And I found this:
"Making reliable distributed systems in the presence of
software errors"
The Open Telecom Platform Model
An almost certainly reductionist history of computing.
For a while, computers could work synchronously.
Instructions could be processed in order.
A lot of tricks to deal with I/O, memory mapping, hardware.
Then communication networks happened:
Computing borrowed ideas from railroads and telegrams
Then computers were used to drive phones
Then phones were used to connect computers.
Gave rise to two obvious paradigms
Sequential
Concurrent
Modelling problems
A lot of computation benefits from being modelled
sequentially.
Problems where order matters
Numerical problems
Reading from and writing to things
Even executing programs
Modelling problems
A lot of computation suffers when modelled sequentially.
Communication
Sensory data
Modelling things affecting each other rather than the world
affecting things
When communication gets important, so does concurrency
1986, Joe Armstrong starts work on Erlang, to program
telephone systems.
Erlang is a strange language
Doesn’t like to share memory
Programs are split into processes
Processes have to send messages to each other
No guarantees that a message has been received
Processes don’t always know where to find each other
Everything old is new again.
2013, the Clojure core team starts work on core.async, based on
Go’s goroutines
goroutines aren’t supposed to share memory
goroutines communicate by putting messages on channels
No guarantees about a message being received
No way to even determine who is listening to a channel
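Here's roughly what "put a message on a channel and hope" looks like. This is only a sketch: it uses Python's standard `queue.Queue` and a thread as stand-ins for a channel and a goroutine, so it is not core.async or Go, and the names `channel`, `consumer`, and `received` are made up for illustration. Unlike an unbuffered channel, this queue never makes the sender wait.

```python
import queue
import threading

channel = queue.Queue()   # stand-in for a channel (assumption, not core.async)
received = []

def consumer():
    """Take messages off the channel until the None sentinel arrives."""
    while True:
        message = channel.get()
        if message is None:
            break
        received.append(message)

worker = threading.Thread(target=consumer)
worker.start()

for message in ["a", "b", "c"]:
    # put() returns without telling us who reads the message, or whether anyone does
    channel.put(message)
channel.put(None)         # tell the consumer we are done

worker.join()
print(received)           # prints ['a', 'b', 'c']
```

The sender only knows about the channel. If the consumer thread had never been started, every `put` would still have succeeded.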
What fresh hell is this?
These seem like strong constraints.
Why assume them?
Shared Memory
Suppose processes P1 and P2 each have a list of instructions
P1 and P2 start executing at roughly the same time,
modifying memory.
We can’t guarantee the order in which P1 and P2 will interleave
How can we write safe programs?
Well we kind of can’t. We can write some safe programs
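To make the interleaving problem concrete, here's a small sketch that doesn't rely on real threads. It assumes P1 and P2 each run the same two instructions (read a shared counter, then write back the value plus one) and tries every possible ordering. The helper names `run` and `all_schedules` are invented for this example.

```python
from itertools import combinations

def run(schedule):
    """Run one interleaving. Each process does: read shared, then write read + 1."""
    shared = 0
    local = {}                      # the value each process read
    step = {"P1": 0, "P2": 0}       # which instruction each process is on
    for pid in schedule:
        if step[pid] == 0:
            local[pid] = shared     # instruction 1: read
        else:
            shared = local[pid] + 1 # instruction 2: write
        step[pid] += 1
    return shared

def all_schedules():
    """Every ordering of P1's two steps and P2's two steps (6 in total)."""
    for p1_slots in combinations(range(4), 2):
        yield ["P1" if i in p1_slots else "P2" for i in range(4)]

outcomes = {run(schedule) for schedule in all_schedules()}
print(sorted(outcomes))             # prints [1, 2]
```

Two increments can leave the counter at 1 or at 2, depending only on the order the instructions happened to run in. With more instructions and more processes the number of orderings grows quickly.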
Function Calls
Depend on the existence of a receiving function.
Couple the caller to the receiver
Not knowing about places
Assume the receiver will be there when we ask for something
Also a way to enforce no shared state
Still unclear
Synchronous systems fail as one.
Like a magic eight ball.
Concurrent systems fail partially
Like a highway or a casino
We can’t assume that all of our system will be intact
How are we supposed to work like this?
I quit
I always wanted to be a bike messenger anyway
No wait don’t go!
We can fix it
We just have to change how we think
Haha jk don’t try to fix it
Rule #1: Don’t try to fix it.
If we have a single process, we can try as hard as we want
before we fail
Things will either work, kind of work, or not work at all
If we have lots of processes we have to think about all the
ways a piece could fail.
It’s too much, so what if we just don’t?
Exceptions, Errors, and Failures
Exceptions are when the runtime hits something unspecified
Errors are when programmers don’t know what to do
Failures are when the system doesn’t know what to do about
programmers not knowing what to do.
Why let it crash then?
If a small piece of a system fails, we probably know what to do
with it.
Let’s try this really quick.
We’re processing a stream of events that looks like this:
[num-of-events, num-of-seconds]
We want to track the total events per second to take an
average later.
(+ acc (divider event))
acc = acc + divider(event)
Let’s try this really quick.
We get an event [0, 0]
Our code throws an exception; we can catch it before the addition
happens.
What would we want from the function call?
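A sketch of that situation in Python. The slides never define `divider`, so this assumes it divides events by seconds. `total_rate` and `skipped` are hypothetical names, and dropping the bad event is just one possible answer to the question above.

```python
def divider(event):
    """Events per second for one [num_of_events, num_of_seconds] pair."""
    num_of_events, num_of_seconds = event
    return num_of_events / num_of_seconds   # raises ZeroDivisionError for [n, 0]

def total_rate(events):
    """Sum events-per-second, setting aside any event whose division crashes."""
    acc = 0.0
    skipped = []
    for event in events:
        try:
            rate = divider(event)   # the exception is raised here...
        except ZeroDivisionError:
            skipped.append(event)   # ...so acc is never touched for this event
            continue
        acc = acc + rate
    return acc, skipped

acc, skipped = total_rate([[10, 2], [0, 0], [9, 3]])
print(acc, skipped)                 # prints 8.0 [[0, 0]]
```

The crash happens inside `divider`, before the addition runs, so the accumulator is still in a known good state when we decide what to do.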
Let’s try this really quick.
What if our code looked like:
(* acc (divider event))
acc = acc * divider(event)
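One way to see why this matters: suppose `divider` tried to be helpful and returned a default instead of crashing. The sketch below assumes a hypothetical `safe_divider` with a `fallback` argument (not something from the slides) and runs the same events through both the addition and the multiplication.

```python
def safe_divider(event, fallback):
    """Hypothetical divider that returns a fallback instead of raising."""
    num_of_events, num_of_seconds = event
    try:
        return num_of_events / num_of_seconds
    except ZeroDivisionError:
        return fallback

events = [[10, 2], [0, 0], [9, 3]]

total = 0.0
for event in events:
    total = total + safe_divider(event, fallback=0)

product = 1.0
for event in events:
    product = product * safe_divider(event, fallback=0)

print(total, product)   # prints 8.0 0.0
```

A fallback of 0 is harmless under addition and wipes out the whole result under multiplication. The function that fails can't know which caller it has, which is an argument for letting it crash and letting the caller decide.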
What about a database?
We request something from a database and:
The query is wrong
Crash the process
The query fails.
Try again, could be a connection blip.
The query times out.
Maybe chill out there for a second, no sense in knocking our
database over.
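Those three cases can be sketched as one small policy function. Everything here is an assumption made for illustration: the exception classes `BadQuery`, `QueryFailed`, and `QueryTimeout`, the `run_with_policy` name, and the retry and backoff numbers. No real database driver is involved.

```python
import time

class BadQuery(Exception):
    """The query itself is wrong (hypothetical error type)."""

class QueryFailed(Exception):
    """The query failed, possibly a connection blip (hypothetical error type)."""

class QueryTimeout(Exception):
    """The database took too long to answer (hypothetical error type)."""

def run_with_policy(query, max_attempts=3, backoff_seconds=0.5, sleep=time.sleep):
    """Call query(), reacting differently to each kind of failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return query()
        except BadQuery:
            raise                              # retrying a wrong query won't help: crash
        except QueryFailed:
            if attempt == max_attempts:
                raise                          # out of attempts: crash
            # otherwise try again right away
        except QueryTimeout:
            if attempt == max_attempts:
                raise
            sleep(backoff_seconds * attempt)   # chill out a little longer each time

# Usage: a query that blips once, times out once, then works.
calls = []
def flaky():
    calls.append(1)
    if len(calls) == 1:
        raise QueryFailed("connection blip")
    if len(calls) == 2:
        raise QueryTimeout("database is slow")
    return "row"

pauses = []                                    # record pauses instead of really sleeping
result = run_with_policy(flaky, sleep=pauses.append)
print(result, pauses)                          # prints row [1.0]
```

Passing `sleep` in as a parameter is only there so the example runs instantly. When the function gives up, it re-raises and leaves the next decision to whoever called it.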
Rule #2: Ask for help
If a small part of a program doesn’t know what to do, maybe a
larger part will.
Supervisors
Processes that watch other processes and decide how to act.
A supervisor can restart a process or fail and throw an
exception.
Supervisors decouple error handling from business logic
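A very small sketch of the idea. Real supervisors (Erlang/OTP, for example) watch separate processes. This one just re-runs a plain function in the same process, and the `Supervisor` class, `max_restarts`, and the `worker` function are all invented for illustration.

```python
class Supervisor:
    """Restarts a crashing worker up to max_restarts times, then fails upward."""

    def __init__(self, max_restarts=3):
        self.max_restarts = max_restarts
        self.restarts = 0

    def run(self, worker, *args):
        while True:
            try:
                return worker(*args)
            except Exception as error:
                if self.restarts >= self.max_restarts:
                    # Out of options: raise so a larger part of the system can decide.
                    raise RuntimeError("worker kept crashing") from error
                self.restarts += 1   # restart: loop around and call the worker again

# Usage: a worker that crashes twice before it manages to finish.
attempts = []
def worker(event):
    attempts.append(event)
    if len(attempts) < 3:
        raise ValueError("crash")    # the worker contains no recovery code at all
    return event[0] / event[1]

supervisor = Supervisor(max_restarts=3)
result = supervisor.run(worker, [10, 2])
print(result, supervisor.restarts)   # prints 5.0 2
```

The worker holds only business logic and is free to crash. The restart policy lives entirely in the supervisor, and when the supervisor gives up it raises to whatever sits above it.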
Things fit together
We start to get an idea of how things fit together.
It’s easier to see how parts of a system should fail and interact.
Trees are pretty intuitive.
Maybe it’s time for an example
Matchmaking server for a multiplayer game.
Checks which players are available
Determines whether these two players can be routed to each
other
Sends off a command to create a session
Maybe it’s time for an example
API gets REST requests, updates system state.
Matchmaker searches state for good matches, checks to see if
a connection can be made, sends them to a game session
service.
This seems nicer
We can reason about errors a little better
Parts of this system can run independently
It’s clearer what the system needs to run
So I guess my point is:
There are nice ways to model concurrent systems.
When building systems, think about ways to:
Isolate failure (let it crash)
Recover and operate partially
Cut down on dependencies
Shouts out to:
Joe Armstrong for writing an incredible dissertation and a cool
language.
My co-worker Simon Robinson for chatting with me long and
hard about availability in systems.
You
PS
If you want to do any of this
LiveOps is hiring
Get at me at the after party.
Never put in a questions slide, they said.
Questions?

Angus Fletcher - Error Handling in Concurrent Systems
