Practical Fault Tolerance in Elixir - Alexei Sholik

Elixir Club 10, Kyiv, March 17, 2018

  1. Practical Fault Tolerance in Elixir. Alexei Sholik, Elixir Club Kyiv, 17 Mar 2018
  2. About me: backend engineer at Contractbook.co, co-host of the BeamEaters podcast, contributor to Elixir (github.com/alco)
  3. What is fault tolerance?
  4. Why it’s important
  5. Why do only the Erlang/Elixir communities seem to care about it?
  6. A practical example (demo)
  7. “Let it crash” is not the full story
  8. Fail fast → restart → try again
  9. Building blocks of fault tolerance
  10. Process
  11. Error
  12. Link
  13. Monitor
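  14. Example: a GenServer-style call built on a monitor

        def call(process, request, timeout) do
          # Monitor the callee so a crash is reported back to us as a :DOWN message
          monitor = Process.monitor(process)
          # Send the request in GenServer's message format, tagged with the monitor ref
          send(process, {:"$gen_call", {self(), monitor}, request})

          receive do
            {^monitor, reply} ->
              # Reply received: stop monitoring and flush any :DOWN already queued
              Process.demonitor(monitor, [:flush])
              {:ok, reply}

            {:DOWN, ^monitor, _, _, reason} ->
              # The callee died before replying: propagate its exit reason
              exit(reason)
          after
            timeout ->
              Process.demonitor(monitor, [:flush])
              exit(:timeout)
          end
        end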
  15. Task
  16. Supervisor
  17. Bohrbug
  18. Heisenbug
  19. Improving our example (demo)
  20. [Architecture diagram: DB, SuperDocs.Web.Endpoint, <request process>, <db_connection process>, Ecto’s connection pool supervisor, ...]
  21. [Architecture diagram: DB, SuperDocs.Web.Endpoint, <request process>, <db_connection process>, SuperDocs.TaskSup, User.send_confirmation_email, ...]
  22. Durable message queue: hex.pm/packages/amqp
  23. [Architecture diagram: DB, SuperDocs.Web.Endpoint, <request process>, <db_connection process>, RabbitMQ, SuperDocs.TaskSup, ImportService.import_documents_for, AMQPClient, two AMQPWorker processes, ...]
  24. Other tools and techniques
  25. Remote error tracking: Sentry, Rollbar, log aggregators
  26. Exponential backoff: hex.pm/packages/db_connection, hex.pm/packages/gen_retry
  27. Alternative supervisor: hex.pm/packages/director
  28. And so on...
  29. Recap
      ● Trust in OTP...
      ● ...but don’t dismiss other tools
      ● Anticipate failures
      ● Isolate failures
      ● Fail fast, restart, try again
  30. Reading material
      ● Why do computers fail and what can be done about it?
      ● Making reliable distributed systems in the presence of software errors
      ● It's About the Guarantees
      ● On Erlang, State and Crashes
      ● Error Kernels
  31. Thank you! Questions?
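The Link and Monitor slides name the two primitives for observing another process's failure. The following is a minimal sketch of the difference, using plain spawned processes and a hypothetical module name (`LinkMonitorDemo`): a monitor is one-way and reports the crash as a `:DOWN` message, while a link is bidirectional and would take the caller down too unless the caller traps exits.

```elixir
defmodule LinkMonitorDemo do
  # Hypothetical demo module, not from the talk.

  # Spawns a crashing process and observes it through a monitor.
  # Returns the exit reason carried by the :DOWN message.
  def monitor_crash do
    {pid, ref} = spawn_monitor(fn -> exit(:boom) end)

    receive do
      {:DOWN, ^ref, :process, ^pid, reason} -> reason
    after
      1_000 -> :no_down_message
    end
  end

  # Links to a crashing process while trapping exits, so the crash
  # arrives as an {:EXIT, pid, reason} message instead of killing us.
  def link_crash do
    previous = Process.flag(:trap_exit, true)
    pid = spawn_link(fn -> exit(:boom) end)

    result =
      receive do
        {:EXIT, ^pid, reason} -> reason
      after
        1_000 -> :no_exit_message
      end

    # Restore the caller's original trap_exit setting
    Process.flag(:trap_exit, previous)
    result
  end
end
```

Both functions should return `:boom`; the difference is that without `trap_exit` the linked crash would have terminated the caller as well.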
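The `call/3` code on slide 14 mirrors how a GenServer call uses a monitor so it does not wait forever on a dead process. As a hedged illustration with the real `GenServer.call/3` (the `Echo` server and `CallDemo.safe_call/3` are hypothetical names), a call to a process that is already gone exits with a `:noproc` reason, which the caller can catch if it chooses to.

```elixir
defmodule Echo do
  # Hypothetical server that replies with whatever it receives
  use GenServer

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(msg, _from, state), do: {:reply, msg, state}
end

defmodule CallDemo do
  # Wraps GenServer.call/3 and turns a caller exit (dead callee,
  # timeout, callee crash) into an {:error, reason} tuple.
  def safe_call(pid, msg, timeout \\ 1_000) do
    {:ok, GenServer.call(pid, msg, timeout)}
  catch
    :exit, reason -> {:error, reason}
  end
end
```

Catching exits like this is a deliberate choice for a boundary where the caller can do something useful with the error; elsewhere, letting the exit propagate to a supervisor is usually simpler.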
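Slides 15 and 21 move work such as `User.send_confirmation_email` out of the request process and under `SuperDocs.TaskSup`. A minimal sketch of that isolation follows, assuming a throwaway `Task.Supervisor` instead of a named one and a hypothetical `TaskDemo` module: `Task.Supervisor.async_nolink/2` starts the task without linking it to the caller, so a crash or a timeout comes back as a value rather than killing the caller.

```elixir
defmodule TaskDemo do
  # Hypothetical helper: runs `fun` under a Task.Supervisor without
  # linking it to the caller and reports the outcome as a tuple.
  def run_isolated(fun, timeout \\ 1_000) do
    {:ok, sup} = Task.Supervisor.start_link()

    task = Task.Supervisor.async_nolink(sup, fun)

    result =
      # Wait up to `timeout`; if the task is still running, shut it down
      case Task.yield(task, timeout) || Task.shutdown(task) do
        {:ok, value} -> {:ok, value}
        {:exit, reason} -> {:error, reason}
        nil -> {:error, :timeout}
      end

    # Stop the supervisor started for this demo
    Supervisor.stop(sup)
    result
  end
end
```

A task that crashes yields `{:error, reason}` and a task that never finishes yields `{:error, :timeout}`, while the calling process keeps running in both cases.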
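The “fail fast → restart → try again” loop can be observed with an ordinary `Supervisor`. This sketch uses hypothetical `DemoCounter` and `SupervisorDemo` modules: it kills a supervised `Agent` and checks that the supervisor starts a fresh one with its initial state.

```elixir
defmodule DemoCounter do
  # Hypothetical supervised process holding an integer
  use Agent

  # Starts from 0 every time the supervisor (re)starts it
  def start_link(_arg), do: Agent.start_link(fn -> 0 end, name: __MODULE__)

  def increment, do: Agent.update(__MODULE__, &(&1 + 1))
  def value, do: Agent.get(__MODULE__, & &1)
end

defmodule SupervisorDemo do
  # Crashes DemoCounter and reports {value_before, value_after, new_pid?}
  def run do
    {:ok, sup} = Supervisor.start_link([DemoCounter], strategy: :one_for_one)

    DemoCounter.increment()
    before_crash = DemoCounter.value()
    old_pid = Process.whereis(DemoCounter)

    # Fail fast: kill the process and wait until it is really gone
    ref = Process.monitor(old_pid)
    Process.exit(old_pid, :kill)

    receive do
      {:DOWN, ^ref, :process, _, _} -> :ok
    end

    # Restart: wait for the supervisor to register a new DemoCounter
    new_pid = wait_for_restart(old_pid)
    after_restart = DemoCounter.value()

    Supervisor.stop(sup)
    {before_crash, after_restart, new_pid != old_pid}
  end

  defp wait_for_restart(old_pid) do
    case Process.whereis(DemoCounter) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        wait_for_restart(old_pid)
    end
  end
end
```

`SupervisorDemo.run()` is expected to return `{1, 0, true}`: the counter reached 1, and the restarted process begins again from 0 under a new pid.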
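The Bohrbug and Heisenbug slides distinguish deterministic bugs from transient ones, and the exponential backoff slide points to libraries such as gen_retry. What follows is not gen_retry's API; it is a hand-rolled sketch (hypothetical `Backoff.retry/3`) showing why retrying with growing delays can get past a transient failure but cannot fix a deterministic one.

```elixir
defmodule Backoff do
  # Hypothetical helper: calls `fun` until it returns {:ok, value}
  # or `attempts` calls have been made. The delay between attempts
  # doubles each time: base_ms, 2 * base_ms, 4 * base_ms, ...
  def retry(fun, attempts \\ 5, base_ms \\ 10) when attempts >= 1 do
    do_retry(fun, attempts, base_ms)
  end

  defp do_retry(fun, attempts_left, delay_ms) do
    case fun.() do
      {:ok, value} ->
        {:ok, value}

      # Out of attempts: give up and return the last error
      {:error, reason} when attempts_left <= 1 ->
        {:error, reason}

      # Transient failure assumed: wait, then try again with a longer delay
      {:error, _reason} ->
        Process.sleep(delay_ms)
        do_retry(fun, attempts_left - 1, delay_ms * 2)
    end
  end
end
```

With a `fun` that returns `{:error, _}` twice and then `{:ok, value}`, `Backoff.retry(fun, 5, 1)` succeeds on the third call; a function that always fails returns its last error after `attempts` calls.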

×