STAQ Development Manual (Redacted)


This deck represents our current thinking about the best way to build enterprise SaaS software in 2015, using a variety of techniques from several disciplines.

Since I wrote this I have also become very interested in resilience engineering and the notion that web developers are primarily engaged in the construction of socio-technical systems. When I rewrite this I plan to talk about how we should try to minimize mean-time-to-recover (MTTR) instead of mean-time-between-failures (MTBF), and how continuous deployment grows a safety culture around your operations.

I redacted most of the examples that illustrate these points because they contain sensitive code or URLs. If you want to see the rest of the slides, join us!

Published in: Software

  1. 1. Development Manual: Fast and Good
  3. 3. Priorities: 1. Accuracy 2. Timeliness 3. Performance
  4. 4. Predictability >> Speed
  5. 5. Professionalism >> Heroism
  6. 6. Defense systems <=> New features & integrations
  7. 7. Ethical Conduct, Always. When in doubt, ask!
  8. 8. THE VISION
  9. 9. How do we build software in a predictable, professional way?
  10. 10. We manage the risk of change
  11. 11. It’s more risky to NOT change code
  12. 12. Let’s make it impossible to break the app
  14. 14. All bugs are ultimately process failures. Our systems should reject bad changes and enforce quality automatically.
  15. 15. Deliver incremental results
  16. 16. What’s the 10% solution you could ship today?
  17. 17. Have more meetings • Make decisions • Hash things out • Explain your plan • Brainstorm • Write pseudocode • Make diagrams • Sharpen understanding
  18. 18. I, Mike Subelsky, heartily endorse meetings IFF they entail decision-making, brainstorming, mentoring, conflict, fun, excitement
  19. 19. Conduct Root Cause Analysis After Every Outage: “The Five Whys”
  20. 20. Team leaders send weekly change updates to all@
  21. 21. Prioritize automatic tests and quality control over new features/integrations
  22. 22. Place one person on QA / bugfixing duty each week. Need a cool object to pass around the office.
  24. 24. Set smaller goals you can actually achieve. We need more milestones, hard deadlines, “forcing functions”.
  25. 25. 3 Layers of Planning & Goalsetting • Goals • Epics • Stories
  26. 26. Goal • Strategic objective that helps STAQ grow • Increases revenue or reduces costs, or both • Expressed as a single sentence, completely in business terms, associated with a deadline • “Charge customers for use of custom connections by 12/01/2015”
  27. 27. Epics • Major, high-level initiative to help achieve a Goal • Complex effort encompassing changes to multiple codebases • Expressed in business terms, in terms of user capabilities, with a deadline • “Users can view, create, update, and delete custom connections on their own by 11/15/2015” • Constantly reprioritized as the project deadline approaches
  28. 28. Push back • AMs/TAMs should always push us to deliver our best, advocating for the customer • We should push back when real constraints exist • Helps everyone be more creative • Helps clarify priorities • 80% / incremental solutions often good enough • Resist the urge to dive in and save the day
  29. 29. Refactoring & maintenance cycles • Count on about a week of fixing and polishing major features post-release • After an epic or major project finishes, we do this anyway; let’s plan for it • Good time to handle chores, smaller feature tickets & bug fixes
  31. 31. Make small changes
  32. 32. Observe the Single Responsibility Principle. Most other OOP principles derive from this one.
  33. 33. - “…every class should have responsibility over a single part of the functionality provided by the software.”
  34. 34. - Responsibility is “…a reason to change…a class or module should have one, and only one, reason to change.”
  35. 35. Deploy in AM/mid-day
  36. 36. Prefer to deploy heavy/breaking changes on weekends. Avoid Sunday morning, when collections run.
  37. 37. Evening deploys OK but more dangerous • You’ll get lazy / sleepy • Make a rollback plan
  38. 38. Use the staging server • staqnowledged/home/infrastructure/staging-server
  39. 39. Fork the app • Good for testing database changes • Need someone to try & document this • May need to fork multiple apps in concert • Heroku addon attachment feature pretty cool
  40. 40. Use feature flags
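One way a feature flag might look in Ruby, as a sketch only. The deck does not say which flag mechanism STAQ uses, so `FeatureFlags`, the comma-separated flag string, and the `:custom_connections` flag name are all assumptions made for illustration.

```ruby
# Hypothetical flag store; not taken from the deck.
class FeatureFlags
  # Builds a flag set from a comma-separated string, e.g. the value
  # of an environment variable.
  def self.parse(csv)
    new(csv.split(",").map(&:strip).reject(&:empty?))
  end

  def initialize(enabled = [])
    @enabled = enabled.map(&:to_sym)
  end

  def enabled?(name)
    @enabled.include?(name.to_sym)
  end
end

# The new code path ships alongside the old one and stays off
# until the flag is enabled.
def connection_label(flags)
  if flags.enabled?(:custom_connections)
    "Custom connections"
  else
    "Connections"
  end
end
```

The point of the pattern is that the risky change is merged and deployed in small pieces while the old behavior remains the default.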
  41. 41. Create parallel gems/engines/extractors/tables
  42. 42. Design solid code open to future change. But also beware of YAGNI (You Ain’t Gonna Need It).
  43. 43. Make it easy to test. This is the main point of BDD; produces better designs.
  44. 44. Lint Ruby and JS Code • Rubocop • ruby -c after every save • Let’s make this a git pre-commit hook • brew install jsl
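A rough sketch of the `ruby -c` check as something a git pre-commit hook could call. The helper name `syntax_errors` is hypothetical, and the hook wiring shown in the comments is an assumption about how it might be used, not the team's actual hook. Rubocop could be run in the same place.

```ruby
require "rbconfig"

# Returns the subset of the given paths that are Ruby files and
# fail `ruby -c` (a syntax check only; it does not run the code).
def syntax_errors(paths)
  paths
    .select { |path| path.end_with?(".rb") }
    .reject do |path|
      system(RbConfig.ruby, "-c", path, out: File::NULL, err: File::NULL)
    end
end

# Possible use inside .git/hooks/pre-commit (assumed wiring):
#
#   staged = `git diff --cached --name-only --diff-filter=ACM`.split("\n")
#   bad = syntax_errors(staged)
#   abort("Syntax errors in: #{bad.join(', ')}") unless bad.empty?
```

A non-zero exit from the hook makes git refuse the commit, which is one small piece of a system that rejects bad changes automatically.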
  45. 45. Keep writing tests • Unit tests: 100% coverage • Usually no need to test explicit string contents • Integration test: all major subsystems • Including failure modes • Acceptance test: most features, important failure modes • Smoke test: engines/gems integrated into apps
  46. 46. Tests must always pass on CI. Fixing the build should usually take precedence over other work.
  47. 47. Keep Classes Small • “One screenful or less” • Not counting documentation • Almost no private methods • There’s always a better way, look for a hidden object
  48. 48. Write many, many classes
  49. 49. Use clean architectural patterns • 12factor • Dependency injection • Hexagonal architecture
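A small sketch of dependency injection in Ruby, which is also the core move in hexagonal architecture: the class depends on an interface it is handed, not on a concrete adapter it builds itself. `ReportCollector`, `FakeClient`, and `fetch_rows` are hypothetical names chosen for illustration.

```ruby
# Hypothetical classes; the deck names the patterns, not this code.
class ReportCollector
  # The client is injected, so production code can pass a real HTTP
  # adapter and tests can pass a fake with the same interface.
  def initialize(client:)
    @client = client
  end

  def row_count(source)
    @client.fetch_rows(source).length
  end
end

# Test double that satisfies the same interface without any network.
class FakeClient
  def initialize(rows)
    @rows = rows
  end

  def fetch_rows(_source)
    @rows
  end
end
```

Because the collaborator arrives through the constructor, a unit test needs no network access and no stubbing library, which ties back to making code easy to test.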
  50. 50. Clear code >> documentation >> comments
  51. 51. Refactor constantly. Think like a gardener, not like a landscaper.
  52. 52. Make the change easy, then make the easy change
  53. 53. Write many, many gems/engines
  54. 54. Continue peer code review
  55. 55. You are only done when… • Documentation written (YARD tags + wiki) • Unit tests written to 100% coverage • Integration tests covering important subsystem interactions • Acceptance tests covering important features and failure modes • Code reviewed by a peer
  57. 57. Develop a few high-level, actionable alerts in staqmonitor
  58. 58. Many cool possibilities
  59. 59. We are going to build an immune system that rejects bad changes
  60. 60. Develop indications and warning indicators
  61. 61. QUESTIONS?