Let It Crash (@pavlobaron)

These are the slides of my "Let It Crash" talk given at SEACON in Hamburg in 2011, updated for OOP 2012 in Munich



  1. _LET_IT_CRASH_ Pavlo Baron, OOP ’12
  2. The Blue Screen Of Death (BSOD) used as the background in this presentation was kindly provided by Microsoft Windows
  3. _PAVLO_BARON_ [email_address] @pavlobaron
  4. Do you know _MURPHY_? You should, since he’s developing your software, deploying and releasing it, testing and using it. He’s also designing and operating all the hardware your programs run on
  5. Everything that _CAN_ go wrong _WILL_ go wrong (Murphy’s Law)
  6. The real question is: do you _REALLY_ know everything that can go wrong?
  7. You can cover your bottle of cleaner with hazard symbols, and there will still be a genius who brushes his teeth with it
  8. Or, in terms relevant to our topic, look at this line of code: C = A / B
  9. Things that can go wrong: 1. A, B are undefined 2. A, B have wrong types 3. B = 0 4. numbers too big C = A / B
  10. Was that all? No, there’s more: 5. C assumed to be a string 6. C gets overwritten 7. A, B, C not thread safe 8. C = 0 through rounding 9. this is copy/paste code . . . C = A / B
  11. And that’s just one LOC. Now imagine the explosion with every additional LOC
  12. Your primary instinct is to protect yourself from things going wrong…
  13. … by trying to prevent things from going wrong
  14. _DEFENSIVE_PROGRAMMING_ is what they taught us to do: critical if (isnum(A) and isnum(B) and (A > B * 10) and (B > 0)) try C = A / B catch (DivisionByZero) …
  15. And of course you will leave out the one case that crashes the whole thing
  16. So what you sometimes are trying to do is: catch (WrongStellarConstellation) //move stars catch (WrongSunPosition) //move after the sun catch (Angle45DegreeToHorizon) //run for your life catch (*) //just in case
  17. You’re chasing _MURPHY_. By the way, have you checked your payroll yet? Maybe you’ll find him there
  18. Do you imply I’m stupid? Wanna fight?
  19. No. I just suggest you stop trying to achieve the impossible
  20. It is _HARD_TO_FORESEE_ every eventuality
  21. It is _IMPOSSIBLE_TO_FORESEE_ every eventuality when dealing with large amounts of data and huge numbers of events
  22. It is wrong to catch all eventualities in a generic _ERROR_HANDLER_OF_FEAR_
  23. _DYNAMICALLY_TYPED_ languages have much more potential to introduce unpredictable data constellations
  24. _WEB_BASED_APPLICATIONS_ come with user-generated content, which is completely unpredictable
  25. _HARDWARE_ is unreliable. Accept. Period.
  26. _NETWORKS_ are unreliable. Accept. Period.
  27. Error handling code makes your source code _UNREADABLE_ and _UNMAINTAINABLE_
  28. Why would you think that the error handling code is _FREE_FROM_ERRORS_?
  29. Of course, your _TEST_COVERAGE_ is extremely high, especially for the error handling code
  30. Or, alternatively, your programs are just completely free from bugs
  31. This is a typical, complete, wholly bug-free program: begin end
  32. Aha. So what should I do instead?
  33. _JOE_ARMSTRONG_’s thesis, 4.3 Error handling philosophy: let some other process do the error recovery; if you can’t do what you want to do, die; do not program defensively
  34. _LET_IT_CRASH_
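
The philosophy on slides 33 and 34 can be sketched in a few lines of Erlang. This is only an illustration, not code from the talk: the module name `crash_demo` and the functions `divide/2` and `run/2` are made up for the example. The worker performs C = A / B with no checks at all, and a separate process that monitors it decides what a crash means.

```erlang
-module(crash_demo).
-export([divide/2, run/2]).

%% No type checks and no zero check (hypothetical helper):
%% if A / B cannot be computed, the calling process simply dies.
divide(A, B) ->
    A / B.

%% Runs divide/2 in a separate, monitored process.
%% The caller, not the worker, handles the failure.
run(A, B) ->
    Parent = self(),
    {Pid, Ref} = spawn_monitor(
        fun() -> Parent ! {result, self(), divide(A, B)} end),
    receive
        {result, Pid, C} ->
            %% Worker succeeded; drop the monitor and its pending 'DOWN'.
            erlang:demonitor(Ref, [flush]),
            {ok, C};
        {'DOWN', Ref, process, Pid, Reason} ->
            %% Worker died before sending a result.
            {crashed, Reason}
    end.
```

With this sketch, `crash_demo:run(4, 2)` yields `{ok, 2.0}`, while `crash_demo:run(1, 0)` and `crash_demo:run(a, 1)` both yield `{crashed, {badarith, Stacktrace}}` without a single defensive check in `divide/2`.
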
  35. _YOU_NEED_ “Virtual” processing units much smaller than your OS’s processes / threads. Their cheapness pays off when they are scheduled with zero or minimal context switching and you can run hundreds of thousands of them in a single VM instance
  36. _YOU_NEED_ Actors
  37. _YOU_NEED_ A virtual machine to run these actors, to schedule them, to control resources
  38. _YOU_NEED_ Mechanisms in your VM to protect and isolate single actors from each other and the whole VM from the actors. Actors must pass messages instead of sharing resources in order to minimize dependencies
  39. _YOU_NEED_ A very high level of actor isolation
  40. _YOU_NEED_ To be able to externalize an actor’s state. Actors get their state from some storage / another actor with every new “call” and return a modified state for further storage
  41. _YOU_NEED_ An onion-like, layered model. When an actor crashes, another one still “stores” its state and can pass it on to a new actor
  42. _YOU_NEED_ Ideally, a functional programming approach. There, you can only pass state in from outside and get a modified one back
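
Slides 40 to 42 describe state that lives outside the worker. A minimal Erlang sketch of that idea follows, assuming a hypothetical module `state_demo` with a single function `run_jobs/2`; none of these names come from the talk. The calling process plays the outer onion layer: it keeps the state, hands it to a fresh worker for every job, and only adopts the returned state if the worker survives.

```erlang
-module(state_demo).
-export([run_jobs/2]).

%% Applies each Job (a fun taking the state and returning a new state)
%% in its own short-lived process. The caller keeps the state, so a
%% crashing job loses nothing: the next job starts from the last good state.
run_jobs(State, []) ->
    State;
run_jobs(State, [Job | Rest]) ->
    Keeper = self(),
    {Pid, Ref} = spawn_monitor(
        fun() -> Keeper ! {new_state, self(), Job(State)} end),
    NewState =
        receive
            {new_state, Pid, S} ->
                erlang:demonitor(Ref, [flush]),
                S;
            {'DOWN', Ref, process, Pid, _Reason} ->
                %% The worker crashed; keep the state we already had.
                State
        end,
    run_jobs(NewState, Rest).
```

For example, running an increment, a job that calls `exit(oops)`, and another increment on an initial state of 10 returns 12: the crashed job in the middle leaves the stored state untouched.
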
  43. _YOU_NEED_ Mechanisms which allow an actor to be notified automatically when another actor crashes. Also mechanisms to link actors together so they “die” together
  44. _YOU_NEED_ Linking (and monitoring) mechanisms
  45. _YOU_NEED_ Worker actors just doing some job. They can crash, but can be restarted to carry on from the point of the crash
  46. _YOU_NEED_ Supervisor actors that only monitor other actors: killing them, restarting them, and providing them with their externalized state
  47. _YOU_NEED_ Hierarchies / trees of workers and supervisors, with branches being configured differently
  48. _YOU_NEED_ An automated strategy configuration for how to start workers, how to deal with crashes, how often etc.
  49. Aha. And where can I get it for free?
  50. Erlang/OTP (the heart of Joe’s thesis) can do it all
  51. Spawn / register / message passing register(?MODULE, spawn(?MODULE, loop, [])), ?MODULE ! {start}.
  52. Linking / trapping exits register(?MODULE, spawn_link(?MODULE, loop, [])), ?MODULE ! {start}. ... loop() -> receive {start} -> process_flag(trap_exit, true), loop(); {'EXIT', _Pid, stop} -> ...
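
The fragment on slide 52 can be turned into a small self-contained module. This is a sketch under assumptions: the module name `link_demo`, the function `run/0`, and the exit reason `boom` are invented for the illustration. The calling process traps exits, so the death of the linked worker arrives as an ordinary `{'EXIT', Pid, Reason}` message instead of killing the caller too.

```erlang
-module(link_demo).
-export([run/0]).

run() ->
    %% Turn exit signals from linked processes into messages.
    Old = process_flag(trap_exit, true),
    %% The worker dies immediately with reason 'boom'.
    Pid = spawn_link(fun() -> exit(boom) end),
    Result =
        receive
            {'EXIT', Pid, Reason} -> {worker_died, Reason}
        after 1000 ->
            timeout
        end,
    %% Restore the caller's previous trap_exit setting.
    process_flag(trap_exit, Old),
    Result.
```

Without `process_flag(trap_exit, true)`, the same exit signal would terminate the calling process as well, which is the "die together" behaviour of plain links.
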
  53. Monitoring erlang:monitor(process, From), ... loop() -> receive {'DOWN', _Ref, process, Pid, Reason} -> ...
  54. Worker / Supervisor -behaviour(supervisor). ... init(Args) -> {ok, {{one_for_one, 2, 10}, [ {the_server, {test_server, start_link, [Args]}, permanent, 2000, worker, [test_server]} ]}}.
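
Slide 54 shows only the `init/1` callback. A complete, runnable variant might look like the following. It is a sketch, not the talk's code: the module `demo_sup`, the registered name `demo_worker`, and the `{divide, From, A, B}` message are assumptions, and the worker is a plain linked process rather than the `test_server` from the slide, which is not shown. The supervisor flags `{one_for_one, 2, 10}` are the ones from the slide.

```erlang
-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0, worker_loop/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Supervisor callback: restart only the crashed child (one_for_one),
%% tolerating at most 2 restarts within 10 seconds.
init([]) ->
    SupFlags = {one_for_one, 2, 10},
    Child = {demo_worker, {?MODULE, start_worker, []},
             permanent, 2000, worker, [?MODULE]},
    {ok, {SupFlags, [Child]}}.

%% Called by the supervisor; must return {ok, Pid} of a linked process.
start_worker() ->
    Pid = spawn_link(?MODULE, worker_loop, []),
    register(demo_worker, Pid),
    {ok, Pid}.

%% The worker does no checking at all: a bad request crashes it.
worker_loop() ->
    receive
        {divide, From, A, B} ->
            From ! {result, A / B},
            worker_loop()
    end.
```

After `demo_sup:start_link()`, sending `demo_worker ! {divide, self(), 1, 0}` kills the worker, and the supervisor starts a fresh one under the same name. If more than two restarts are needed within ten seconds, the supervisor gives up and terminates itself. A production worker would more likely be a `gen_server` or be started through `proc_lib`.
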
  55. Worker state externalization -behaviour(gen_server). ... init(Args) -> ... State = #state{id = Id, session_id = SessionId}, {ok, State}. ... handle_cast(stop, State) -> dr_session:stop( State#state.session_id), {noreply, State};
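
The `gen_server` fragment on slide 55 leaves out most of the module. As a hedged illustration of the same pattern, here is a complete toy server; the module `counter_server`, its `count` field, and the `increment`/`value` API are invented for the example. The point is that the callbacks never own the state: they receive it as an argument and hand back a new version for the behaviour to keep.

```erlang
-module(counter_server).
-behaviour(gen_server).
-export([start_link/1, increment/0, value/0]).
-export([init/1, handle_call/3, handle_cast/2]).

-record(state, {count = 0}).

start_link(Initial) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, Initial, []).

%% Asynchronous request: bump the counter.
increment() ->
    gen_server:cast(?MODULE, increment).

%% Synchronous request: read the counter.
value() ->
    gen_server:call(?MODULE, value).

%% The initial state is handed in from outside.
init(Initial) ->
    {ok, #state{count = Initial}}.

%% State comes in as an argument and goes back out unchanged.
handle_call(value, _From, State) ->
    {reply, State#state.count, State}.

%% State comes in as an argument; a modified copy is returned.
handle_cast(increment, State = #state{count = N}) ->
    {noreply, State#state{count = N + 1}}.
```

Starting the server with `counter_server:start_link(5)`, calling `counter_server:increment()` once, and then `counter_server:value()` returns 6.
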
  56. The same approach works (with minimal modifications) not only on one node, but also on several nodes distributed over a network. That allows building truly fault-tolerant systems
  57. Erlang was invented to program switches, control networks etc. It was designed for the hardware and network “reliability” mentioned above
  58. Erjang, an Erlang VM “on top” of the JVM, can do this, too
  59. And Scala/Akka can also do this on the JVM
  60. Linking // link and unlink actors self.link(actorRef) self.unlink(actorRef) // starts and links Actors atomically self.startLink(actorRef) // spawns (creates and starts) actors self.spawn[MyActor] self.spawnRemote[MyActor] // spawns and links Actors atomically self.spawnLink[MyActor] self.spawnLinkRemote[MyActor]
  61. Worker / supervisor val supervisor = Supervisor( SupervisorConfig( AllForOneStrategy( List(classOf[Exception]),3,1000), Supervise( actorOf[MyActor1], Permanent) :: Supervise( actorOf[MyActor2], Permanent) :: Nil))
  62. What’s the catch? There must be one – I can’t simply build upon crashing
  63. Right. There’s no such thing as a free lunch
  64. _ASK_YOURSELF_ Should I handle this error? Defensive programming and the “let it crash” philosophy need not exclude each other
  65. _ASK_YOURSELF_ How should I handle this error? Throw an exception? Return a value? Ignore? Write a log? Terminate the program?
  66. _ASK_YOURSELF_ Can I terminate / crash here? Do I have to go on? Can I go on from here at all?
  67. _ASK_YOURSELF_ Can I leave some garbage after I crash? Do I have to clean up before / after I do?
  68. _ASK_YOURSELF_ When I restart a worker, how can I know it won’t crash again, and again?
  69. _ASK_YOURSELF_ Why do I expect wrong function parameters? Don’t I trust my own non-API calls? Shouldn’t I fix this outside?
  70. _ASK_YOURSELF_ When should I prefer in-place error handlers? When should I rather go for a central error handler that knows about all types of possible errors?
  71. _ASK_YOURSELF_ Do I have to check parameter types? Do I have to generally check the semantics of the API function parameters, implement a contract?
  72. With all this in mind: Don’t program too defensively. _LET_IT_CRASH_
  73. _THANK_YOU_
  74. Presentation inspired by the work of Joe Armstrong, Steve Vinoski, Mazen Harake, Jonas Bonér and Kresten Krab Thorup. Most images originate from istockphoto.com, except for a few taken from Wikipedia and product pages