Path Dependent Development (PyCon India)


(Image on page 3: it's the traditional fast/good/cheap trade-off. Something glitched in the conversion.)


  1. Path Dependent Development
     Nick Coghlan (@ncoghlan_dev)
     Red Hat Toolsmith
     CPython Core Developer
  2. Usefully Wrong
     “All models are wrong. Some models are useful.”
     “... the practical question is: How wrong do they have to be to not be useful?”
     George E. P. Box (statistician), “Empirical Model-Building”
  3. Choose Any Two?
  4. Path Dependence
     ● “Good enough to be useful” -> ship it
     ● The decisions we make leave their mark on the software we ship
     ● These marks remain long after the scope of the software expands to other use cases
  5. What is “Good Enough”?
     ● Depends on your priorities and resources
       – What are you building?
       – Why are you building it?
       – Who are you building it for?
       – Who is building it?
       – What are you building it with?
       – How much risk can you tolerate?
  6. Context Matters
     ● Building an intranet web service
       – Trusted network
       – Enforced user base
     ● Building a web startup
       – Hostile network
       – Business lives or dies by user choice
     ● Building hardware control and management systems
       – Usage driven by hardware
       – Software as a necessary evil
  7. Trade-Offs Needed: Inquire Within
  8. Functionality
     ● Doing one (or a few) things well is often better than doing a lot of things badly
     ● Adding functionality later is usually easier to sell than taking it away (no matter how broken it turns out to be)
  9. Flexibility
     ● Don't make things configurable
     ● Configurability = testing and maintenance pain
     ● Do separate concerns (if you make it configurable later, only one place needs to change)
     ● Do use flexible support tools
       – SQLAlchemy makes it easy to change databases
       – Django locks in some major decisions (like the ORM and templating language) but provides a rich ecosystem of prebuilt components that work well together
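A minimal Python sketch of "separate concerns" from slide 9, assuming a hypothetical `JobStore` interface: the code offers no configuration knobs, but the choice of storage backend lives in exactly one function, so making it configurable later would mean changing `make_store()` and nothing else. All names here are illustrative, not taken from the talk.

```python
from typing import Protocol


class JobStore(Protocol):
    """Hypothetical storage interface: the rest of the code only sees this."""

    def save(self, job_id: str, job: dict) -> None: ...

    def load(self, job_id: str) -> dict: ...


class InMemoryJobStore:
    """The one concrete backend needed today (deliberately not configurable)."""

    def __init__(self) -> None:
        self._jobs: dict[str, dict] = {}

    def save(self, job_id: str, job: dict) -> None:
        self._jobs[job_id] = dict(job)  # store a copy so callers can't mutate it

    def load(self, job_id: str) -> dict:
        return dict(self._jobs[job_id])


def make_store() -> JobStore:
    # The single place to change if a different backend is ever needed.
    return InMemoryJobStore()


def rename_remote(store: JobStore, job_id: str, server: str) -> None:
    # Business logic depends only on the interface, not on a particular backend.
    job = store.load(job_id)
    job["remote_server"] = server
    store.save(job_id, job)
```

Swapping in a database-backed store later would touch `make_store()` plus one new class; `rename_remote()` and its callers would not change.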
  10. Security
      ● A lot of software is still insecure by default
        – Unhashed (or poorly hashed) passwords
        – Unencrypted communications channels
      ● Multiple layers of defence can hide this
      ● Try to make the “easy option” and the “secure option” one and the same
      ● Poor security choices can be very hard to fix
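One way to make the easy option and the secure option the same is to expose only a helper that always salts and hashes. The sketch below uses the standard library's `hashlib.pbkdf2_hmac` with a fresh random salt per password; the `pbkdf2_sha256$...` storage format and the default iteration count are assumptions chosen for illustration, and a maintained library (or your framework's built-in password hasher) is usually the better choice in production.

```python
import hashlib
import hmac
import secrets

# Work factor is an assumption: tune it for your hardware and threat model.
ITERATIONS = 600_000


def hash_password(password: str, *, iterations: int = ITERATIONS) -> str:
    """Return a self-describing 'algorithm$iterations$salt$hash' string."""
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Re-derive the hash using the stored salt and iteration count, then compare."""
    algorithm, iterations, salt_hex, digest_hex = stored.split("$")
    if algorithm != "pbkdf2_sha256":
        raise ValueError(f"unsupported hash format: {algorithm!r}")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))
```

Because the iteration count is stored alongside each hash, it can be raised later without invalidating existing passwords.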
  11. Reinventing Wheels
      ● Reuse means dependency management
      ● Often simpler to roll your own to start
      ● With good modularity, easy to replace later
      ● Watch for increasing complexity
  12. Documentation
      ● How sophisticated are users expected to be?
        – Installed by developers? Admins? End users?
        – Intended for domain experts only?
      ● Is it stable enough to document?
      ● Documentation can highlight design flaws
  13. Test Quality
      ● Fine-grained tests pinpoint failures easily
      ● Coarse-grained tests are often easier to write
      ● Can easily start with coarse-grained tests, then add more fine-grained tests to narrow down failures
      ● Slow tests are better than no tests
      ● Tests with external dependencies are better than no tests
      ● Regression tests are great, but don't let them block fixes for problems that can't be reproduced reliably
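The coarse-then-fine progression from slide 13 might look like this with `unittest`. The `build_job()` and `normalise_path()` helpers are hypothetical, loosely modelled on the PulpDist job fields that appear later in the talk: one coarse test checks the whole output, and two fine-grained tests added afterwards pinpoint which part broke.

```python
import unittest


def normalise_path(path: str) -> str:
    """Hypothetical helper under test: ensure exactly one trailing slash."""
    return path.rstrip("/") + "/"


def build_job(server: str, remote: str, local: str) -> dict:
    """Hypothetical helper under test: assemble a sync job definition."""
    return {
        "remote_server": server,
        "remote_path": normalise_path(remote),
        "local_path": normalise_path(local),
    }


class CoarseTests(unittest.TestCase):
    # Quick to write: exercises everything at once, but a failure
    # doesn't say which piece broke.
    def test_build_job(self):
        self.assertEqual(
            build_job("localhost", "/demo/simple", "/var/www/pub/sync_demo_raw//"),
            {
                "remote_server": "localhost",
                "remote_path": "/demo/simple/",
                "local_path": "/var/www/pub/sync_demo_raw/",
            },
        )


class FineTests(unittest.TestCase):
    # Added later to narrow failures down to a single component.
    def test_adds_missing_slash(self):
        self.assertEqual(normalise_path("/demo"), "/demo/")

    def test_collapses_extra_slashes(self):
        self.assertEqual(normalise_path("/demo///"), "/demo/")
```

Run it with `python -m unittest <file>`; if the coarse test fails while the fine-grained ones pass, the fault lies outside `normalise_path()`.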
  14. Code Reviews
      ● Code is written to:
        – Tell the computer what to do
        – Tell future maintainers what it does
      ● Tests cover the first, reviews the second
      ● Debatable value for small teams
      ● Highly valuable for large teams
      ● Needs appropriate tools
  15. Performance & Scalability
      ● Don't stress about it if you don't need to
      ● Start with measurement infrastructure
      ● If simple is fast enough, stick with simple
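Starting with measurement infrastructure need not mean a full monitoring stack. A minimal sketch, assuming an in-process `TIMINGS` dictionary is enough to begin with: a hypothetical `timed` decorator records wall-clock durations via `time.perf_counter()`, so you can check whether the simple implementation is actually too slow before replacing it.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-process metrics store: function name -> durations in seconds.
TIMINGS: defaultdict[str, list[float]] = defaultdict(list)


def timed(func):
    """Record how long each call to `func` takes, without changing its behaviour."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if the call raises, so slow failures show up too.
            TIMINGS[func.__qualname__].append(time.perf_counter() - start)

    return wrapper


@timed
def simple_sum(n: int) -> int:
    # The "simple" implementation: keep it unless the numbers say otherwise.
    return sum(range(n))


def report() -> dict[str, float]:
    """Mean duration per function, in seconds."""
    return {name: sum(ds) / len(ds) for name, ds in TIMINGS.items() if ds}
```

Once real workloads feed `TIMINGS`, `report()` gives a baseline to compare any "faster" replacement against.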
  16. Reliability
      ● Not all software is mission critical
      ● Pay attention to failure modes
      ● Error quality matters
  17. Usability
      ● Humans are still a lot smarter than computers
      ● If users have no choice, they'll usually cope
      ● Hence, awful UX in most “enterprise” software
  18. Maintainability & Business Risks
      ● The Bus Factor
        – Most startups = 1
        – Large companies want it to be higher
      ● Developer docs (including comments)
      ● Legal risks (copyrights, patents, trademarks)
  19. Automation
      ● Critical to speeding up release cycles
      ● Is a process stable enough to automate?
  20. Managing Path Dependence
  21. Exit Strategies
      ● Know what you're not doing
      ● Have a vague idea how to fix it when needed
      ● Actual fixes will depend on future needs
      ● Sometimes, the only right answer is “No”
  22. Patterns and Processes
      ● Keep your options open
      ● Minimise current complexity
      ● This is not easy
        – Software architecture and design patterns
        – Software processes and methodologies
      ● “Interim” solutions may last a long time
      ● If you don't have a test suite, start there
  23. Prototyping vs Implementation
      ● Two very different modes of development
      ● Prototyping
        – Exploration
        – Trying to figure out what is feasible
      ● Implementation
        – Already known to be feasible
        – Making it happen to a known specification
      ● Big difference in priorities!
  24. Social Implications
      ● Design decisions are context dependent
      ● Easy to criticise in hindsight
      ● Design trade-offs can influence community
      ● Actually getting better at building software
      ● Ambitions are (more than?) keeping pace
  25. Path Dependence in Action
  26. An Innocent Start
      ● PulpDist: mirroring network based on rsync
      ● Simple job definitions
          {
            "remote_server": "localhost",
            "remote_path": "/demo/simple/",
            "local_path": "/var/www/pub/sync_demo_raw/",
            ...
          }
      ● Simple custom validator for JSON data
        – Checks on individual values
        – Overall sanity checks on full jobs
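A sketch of what a simple custom validator for these job definitions could look like: per-value checks first, then sanity checks across the whole job. Only the three field names come from the slide (the `...` fields are left out); the specific rules, such as requiring absolute paths ending in `/` and refusing to sync a local tree onto itself, are assumptions for illustration rather than PulpDist's actual checks.

```python
import json

# Field names come from the slide; the rules below are illustrative
# assumptions, not PulpDist's actual checks.
REQUIRED_FIELDS = ("remote_server", "remote_path", "local_path")


def check_value(field: str, value: object) -> list[str]:
    """Checks on an individual value, in isolation from the rest of the job."""
    if not isinstance(value, str) or not value:
        return [f"{field}: expected a non-empty string, got {value!r}"]
    if field.endswith("_path") and not (value.startswith("/") and value.endswith("/")):
        return [f"{field}: expected an absolute directory path like '/a/b/'"]
    return []


def check_job(job: dict) -> list[str]:
    """Per-field checks first, then sanity checks across the full job."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field in job:
            errors.extend(check_value(field, job[field]))
        else:
            errors.append(f"missing field: {field}")
    if errors:
        return errors  # cross-field checks assume the fields are well formed
    # Example cross-field check: don't mirror a local tree onto itself.
    if job["remote_server"] == "localhost" and job["remote_path"] == job["local_path"]:
        errors.append("job would sync a directory onto itself")
    return errors


def validate_job_json(text: str) -> list[str]:
    """Parse one JSON job definition; return a list of problems (empty means OK)."""
    job = json.loads(text)
    if not isinstance(job, dict):
        return ["job definition must be a JSON object"]
    return check_job(job)
```

Each job is checked entirely on its own, which is exactly what stops working once jobs start sharing elements.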
  27. Don't Repeat Yourself
      ● Simple format turned out to be too simple
        – Hard to modify given multiple jobs from the same source
      ● Enhanced format with reusable elements
          {
            "mirror_id": "local_copy",
            "tree_id": "simple_sync",
            "site_id": "bne",
            ...
          }
      ● Simple validator was no longer adequate
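Why the simple validator stopped being adequate: once entries refer to shared elements by identifier, validation must also check uniqueness and cross-references, not just individual values. In this hand-rolled sketch only the `mirror_id`, `tree_id` and `site_id` keys come from the slide; the top-level `sites` / `trees` / `mirrors` layout is hypothetical.

```python
from collections import Counter

# Hypothetical layout: only the mirror_id / tree_id / site_id keys come from
# the slide; the top-level "sites" / "trees" / "mirrors" lists are assumptions.


def duplicate_ids(items: list[dict], key: str) -> list[str]:
    """Identifiers that appear more than once under `key`."""
    counts = Counter(item[key] for item in items)
    return sorted(ident for ident, n in counts.items() if n > 1)


def check_config(config: dict) -> list[str]:
    """Hand-written uniqueness and cross-reference checks."""
    errors = []
    for section, key in (("sites", "site_id"), ("trees", "tree_id"), ("mirrors", "mirror_id")):
        for ident in duplicate_ids(config.get(section, []), key):
            errors.append(f"{section}: duplicate {key} {ident!r}")
    site_ids = {site["site_id"] for site in config.get("sites", [])}
    tree_ids = {tree["tree_id"] for tree in config.get("trees", [])}
    for mirror in config.get("mirrors", []):
        name = mirror["mirror_id"]
        if mirror["site_id"] not in site_ids:
            errors.append(f"mirror {name!r}: unknown site_id {mirror['site_id']!r}")
        if mirror["tree_id"] not in tree_ids:
            errors.append(f"mirror {name!r}: unknown tree_id {mirror['tree_id']!r}")
    return errors
```

Every new kind of reusable element needs another uniqueness check and another reference check, which is the testing burden slide 28 weighs up.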
  28. What To Do?
      ● Upgrade the existing validator
        – Possible, but tedious to test properly
        – Not a good wheel to reinvent
      ● JSON validation library
        – Research would be starting from scratch
        – Hard to assess quality quickly
      ● Relational database
        – Enforces the constraints by its very nature
        – Error quality would likely be poor
  29. Two Birds...
      ● For validation, I needed to:
        – Ensure identifiers were unique
        – Ensure cross references were valid
      ● For UI purposes, I also needed:
        – To filter by component identifiers
        – To sort by various fields
      ● Sound familiar?
  31. ...One Stone
      ● An in-memory SQLite database was perfect
      ● But writing SQL by hand is still horrible
      ● SQLAlchemy was already available in the target environment
      ● Problem solved!
        – Config loaded into DB after simple field validation
        – If the DB accepts it, references are also valid
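A sketch of the "one stone" approach. The talk used SQLAlchemy; to stay self-contained, this version swaps in the standard library `sqlite3` module with hand-written SQL. The schema is hypothetical, reusing the identifier names from slide 27. Primary keys enforce uniqueness and foreign keys enforce cross-references (SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`), and the same in-memory database then answers filter-and-sort queries for the UI.

```python
import sqlite3

# Hypothetical schema reusing the identifier names from slide 27.
# NOT NULL matters: SQLite otherwise allows NULL in a TEXT primary key.
SCHEMA = """
CREATE TABLE sites   (site_id   TEXT NOT NULL PRIMARY KEY);
CREATE TABLE trees   (tree_id   TEXT NOT NULL PRIMARY KEY);
CREATE TABLE mirrors (
    mirror_id TEXT NOT NULL PRIMARY KEY,
    tree_id   TEXT NOT NULL REFERENCES trees (tree_id),
    site_id   TEXT NOT NULL REFERENCES sites (site_id)
);
"""


def load_config(config: dict) -> sqlite3.Connection:
    """Load already field-validated config into an in-memory database.

    Raises sqlite3.IntegrityError on a duplicate identifier or a dangling
    cross-reference: if the database accepts the data, references are valid.
    """
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    db.executescript(SCHEMA)
    try:
        with db:  # commit if every insert succeeds, roll back otherwise
            db.executemany("INSERT INTO sites VALUES (:site_id)", config.get("sites", []))
            db.executemany("INSERT INTO trees VALUES (:tree_id)", config.get("trees", []))
            db.executemany(
                "INSERT INTO mirrors VALUES (:mirror_id, :tree_id, :site_id)",
                config.get("mirrors", []),
            )
    except sqlite3.IntegrityError:
        db.close()
        raise
    return db


def mirrors_for_site(db: sqlite3.Connection, site_id: str) -> list[tuple[str, str]]:
    """UI query on the same database: filter by site, sort by tree then mirror."""
    return db.execute(
        "SELECT mirror_id, tree_id FROM mirrors"
        " WHERE site_id = ? ORDER BY tree_id, mirror_id",
        (site_id,),
    ).fetchall()
```

Parent tables are loaded before `mirrors` so that each foreign key has something to point at when the row is inserted.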
  32. How Does The Story End?
      ● Still some very rough edges
        – SQLite error messages are quite user hostile
        – Schema changes are triple-keyed
      ● Future changes?
        – Master in database, JSON only as export?
        – Improved error messages?
        – Switch to an actual schema engine?
      ● Other priorities!
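One possible route to improved error messages, sketched with the standard library `sqlite3` module and a hypothetical two-table schema: insert config entries one at a time, so each `IntegrityError` can be tied back to the offending entry, and translate SQLite's generic constraint messages into wording aimed at whoever edits the config. Matching on message prefixes is an assumption that depends on SQLite's wording, so unrecognised messages fall through unchanged.

```python
import sqlite3

# Hypothetical two-table schema, just enough to demonstrate the idea.
SCHEMA = """
CREATE TABLE trees   (tree_id   TEXT NOT NULL PRIMARY KEY);
CREATE TABLE mirrors (mirror_id TEXT NOT NULL PRIMARY KEY,
                      tree_id   TEXT NOT NULL REFERENCES trees (tree_id));
"""


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    db.executescript(SCHEMA)
    return db


def explain(exc: sqlite3.IntegrityError) -> str:
    """Map SQLite's generic constraint messages to config-oriented wording."""
    message = str(exc)
    if message.startswith("UNIQUE constraint failed"):
        return "identifier is already in use"
    if message.startswith("FOREIGN KEY constraint failed"):
        return "refers to an entry that does not exist"
    return message  # unknown wording: fall back to SQLite's own message


def insert_entries(
    db: sqlite3.Connection, table: str, columns: tuple[str, ...], rows: list[dict]
) -> list[str]:
    """Insert rows one at a time so each failure names the entry responsible.

    `table` and `columns` are interpolated into the SQL, so they must come
    from trusted code; row values are always passed as bound parameters.
    The first column is assumed to be the entry's identifier.
    """
    column_list = ", ".join(columns)
    placeholders = ", ".join(f":{name}" for name in columns)
    sql = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    errors = []
    for row in rows:
        try:
            db.execute(sql, row)
        except sqlite3.IntegrityError as exc:
            errors.append(f"{table} entry {row[columns[0]]!r}: {explain(exc)}")
    return errors
```

Row-at-a-time inserts are slower than `executemany`, but for configuration-sized data the clearer errors are likely worth it.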
  33. Q&A
      ● Pulp:
      ● PulpDist:
      ● CPython Sprints Monday & Tuesday