Your Code is Wrong

My keynote at NoSQL Now! on August 21st, 2013

Published in: Technology

  1. Your Code is Wrong. Nathan Marz (@nathanmarz)
  2. Let’s start with an example
  3. Storm’s “reportError” method
  4. (Storm is a realtime computation system, like Hadoop but for realtime)
  5. Storm architecture
  6. Storm architecture: master node (similar to Hadoop JobTracker)
  7. Storm architecture: used for cluster coordination
  8. Storm architecture: run worker processes
  9. Storm’s “reportError” method
  10. Used to show errors in the Storm UI
  11. Error info is stored in Zookeeper
  12. What happens when a user deploys code like this?
  13. Denial-of-service on Zookeeper and cluster goes down
  14. [Diagram] Labels: “Robust!”, designed input space, actual input space
  15. Your code is wrong
  16. Your code is literally wrong
  17. Your code is wrong
  18. Why do you believe your code is correct?
  19. [Diagram] Your code; Dependency 1, Dependency 2, Dependency 3
  20. [Diagram] Dependency 1; Dependency 4, Dependency 5
  21. [Diagram] Dependency 4; Dependency 6, Dependency 7, Dependency 8, Dependency 9
  22. [Diagram] Dependency 3,000,000; Hardware
  23. Electronics
  24. Chemistry
  25. Atomic physics
  26. Quantum mechanics
  27. “I think I can safely say that nobody understands quantum mechanics.” (Richard Feynman)
  28. Your code is wrong
  29. Your code ...
  30. All the software you’ve used has had bugs in it
  31. Including the software you’ve written
  32. Your code is sometimes correct
  33. That’s good enough!
  34. Treat code as nondeterministic
  35. Embrace “your code is wrong” to design better software
  36. [Diagram] Labels: “Robust!”, designed input space, actual input space
  37. [Diagram] Labels: “Robust!”, designed input space, actual input space
  38. An example
  39. Learning from Hadoop. [Diagram] JobTracker; Job, Job, Job
  40. Learning from Hadoop. [Diagram] JobTracker; Job, Job, Job
  41. Learning from Hadoop. [Diagram] JobTracker; Job, Job, Job
  42. Your code is wrong
  43. So your processes will crash
  44. Storm’s daemons are process fault-tolerant
  45. Storm. [Diagram] Nimbus; Topology, Topology, Topology
  46. Storm. [Diagram] Nimbus; Topology, Topology, Topology
  47. Storm. [Diagram] Nimbus; Topology, Topology, Topology
  48. Storm. [Diagram] Nimbus; Topology, Topology, Topology
  49. Storm. [Diagram] Nimbus; Topology, Topology, Topology
  50. [Diagram] Labels: “Robust!”, designed input space, actual input space
  51. [Diagram] Labels: “Robust!”, designed input space, actual input space
  52. The impact of code being wrong
  53. [Diagram] Labels: “Robust!”, designed input space, actual input space, “Failures!”, “Bad performance!”, “Security holes!”, “Irrelevant!”
  54. Design principle #1: Measuring and monitoring are the foundation of solid engineering
  55. Measuring: Under what range of inputs does my software function well?
  56. Monitoring: What’s the actual input space of my software?
  57. Measure & monitor: latency, throughput, stack traces, buffer sizes, memory usage, CPU usage, #threads spawned, ...
  58. How you monitor your software is as important as its functionality
  59. Design principle #2: Embrace immutability
  60. [Diagram] Application; read/write database
  61. [Diagram] Application; MySQL
  62. [Diagram] Application; MongoDB
  63. [Diagram] Application; Riak
  64. [Diagram] Application; Cassandra
  65. [Diagram] Application; HBase
  66. Your code is wrong
  67. So data will be corrupted
  68. And you may not know why
  69. Architecture based on immutability. [Diagram] Application; views; immutable, ever-growing data
  70. Lambda architecture. [Diagram] Application; views; immutable, ever-growing data
  71. Design principle #3: Minimize dependencies
  72. The less that can go wrong, the less that will go wrong
  73. Example: Storm’s usage of Zookeeper
  74. Worker locations stored in Zookeeper
  75. All workers must know locations of other workers to send messages
  76. Two ways to get location updates
  77. Method 1: Poll Zookeeper. [Diagram] Worker; Zookeeper
  78. Method 2: Use Zookeeper “watch” feature to get push notifications. [Diagram] Worker; Zookeeper
  79. Method 2 is faster but relies on another feature
  80. Storm uses both methods. [Diagram] Worker; Zookeeper
  81. If watch feature fails, locations still propagate via polling
  82. Eliminating dependence justified by small amount of code required
  83. Design principle #4: Explicitly respect functional input ranges
  84. Storm’s “reportError” method
  85. Implement self-throttling to avoid overloading other systems
  86. Design principle #5: Embrace recomputation
  87. “Your code is wrong” meanings: (1) design input space differs from actual input space; (2) the logic of your code is wrong; (3) requirements are constantly changing
  88. You must be able to change your code to match shifting requirements
  89. Example: blogging software
  90. New requirement: search
  91. Have to build a search index
  92. Recomputation gives you so much more
  93. [Diagram] Application; views; immutable, ever-growing data
  94. Building software is no different than any other engineering
  95. The underlying challenges are the same
  96. What will break it?
  97. What are the limits of my dependencies?
  98. How can I add redundancy to increase robustness?
  99. Can I isolate failures?
  100. Our raw materials are ideas instead of matter
  101. Thank you
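
Slides 76 to 81 describe keeping worker locations fresh through two redundant paths: a fast push path (the Zookeeper “watch” feature) and a slow polling path that still works if the push path fails. The sketch below illustrates that idea only. It is not Storm's code and does not use a real Zookeeper client; `WorkerLocations`, `fetch_locations`, `on_watch_event`, and `poll_once` are hypothetical names chosen for illustration.

```python
class WorkerLocations:
    """Local view of worker locations, refreshed by push and by polling.

    `fetch_locations` is a hypothetical callable that reads the current
    worker -> address mapping from the coordination store.
    """

    def __init__(self, fetch_locations):
        self._fetch_locations = fetch_locations
        # Take a private copy so later changes in the store are only
        # seen through an explicit update path.
        self.locations = dict(fetch_locations())

    def on_watch_event(self, new_locations):
        # Fast path: the store pushed a change notification to us.
        self.locations = dict(new_locations)

    def poll_once(self):
        # Slow path: meant to be called on a timer. It re-reads the full
        # state, so a lost notification is repaired on the next poll.
        self.locations = dict(self._fetch_locations())
```

If a notification is lost, the local view is stale only until the next call to `poll_once`, which is the behaviour slide 81 describes. The polling path costs just a few lines, which matches the justification on slide 82.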
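
Slides 84 and 85 recommend self-throttling so that a caller hammering “reportError” cannot overload the system that stores the errors. A minimal sketch of one way to do this, assuming a fixed-window limit, is shown below. It is not Storm's implementation: `ThrottledErrorReporter`, `sink`, `max_reports`, and `interval_secs` are hypothetical, and `sink` stands in for whatever writes the error to the downstream store.

```python
import time


class ThrottledErrorReporter:
    """Forward at most `max_reports` errors per `interval_secs` window.

    Errors beyond the limit are dropped and counted rather than sent, so
    the downstream store sees a bounded write rate whatever the caller does.
    """

    def __init__(self, sink, max_reports=5, interval_secs=10.0,
                 clock=time.monotonic):
        self._sink = sink                  # hypothetical downstream writer
        self._max_reports = max_reports
        self._interval_secs = interval_secs
        self._clock = clock                # injectable for testing
        self._window_start = clock()
        self._sent_in_window = 0
        self.dropped = 0

    def report_error(self, error):
        now = self._clock()
        # Start a fresh window once the current one has expired.
        if now - self._window_start >= self._interval_secs:
            self._window_start = now
            self._sent_in_window = 0
        # Over the limit: drop the error instead of forwarding it.
        if self._sent_in_window >= self._max_reports:
            self.dropped += 1
            return False
        self._sent_in_window += 1
        self._sink(error)
        return True
```

The `dropped` counter ties back to design principle #1: the number of suppressed reports is itself a metric worth monitoring, since it shows when the actual input space has left the designed one.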
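
Slides 89 to 93 use blogging software to show why recomputation matters: when search becomes a requirement, the index can be built as a new view over the immutable data. The sketch below is a toy illustration under simple assumptions (an in-memory tuple as the log, whitespace tokenization). The names `append_post` and `build_search_index` are hypothetical and are not taken from the talk.

```python
from collections import defaultdict


def append_post(log, post_id, text):
    """Return a new log with one more record; existing records never change."""
    return log + ((post_id, text),)


def build_search_index(log):
    """Recompute a word -> set-of-post-ids view from the complete log."""
    index = defaultdict(set)
    for post_id, text in log:
        for word in text.lower().split():
            index[word].add(post_id)
    return dict(index)
```

Because the view is a pure function of the log, a bug in `build_search_index` can be fixed and the index rebuilt from scratch without having corrupted the underlying data.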
