Experiences from going open source

This presentation was delivered by Adrien Ball at the Open Science in Practice (OSIP) summer school at EPFL Lausanne in September 2019 (http://osip2019.epfl.ch/).
It presents some lessons learned in the process of open sourcing the Snips NLU Python library.

Published in: Software

  1. EXPERIENCES FROM GOING OPEN SOURCE, BY ADRIEN BALL
  2. SNIPS
  3. SNIPS: END-TO-END SPEECH TO MEANING SOFTWARE ▸ Wake word ▸ Speech to text ▸ Natural Language Understanding
  4. SNIPS NATURAL LANGUAGE UNDERSTANDING ▸ Speech to text ▸ Natural Language Understanding
  5. SNIPS NATURAL LANGUAGE UNDERSTANDING ▸ Natural Language Understanding
  6. OPEN SOURCE AT SNIPS: MOTIVATIONS ▸ Transparency ▸ Visibility ▸ Reproducibility
  7. PACKAGING FOR OPEN SOURCE
  8. PACKAGING FOR OPEN SOURCE: OBJECTIVES FOR THE COMMUNITY ▸ Understand ▸ Use ▸ Contribute
  9. PACKAGING FOR OPEN SOURCE: REQUIREMENTS ▸ Documentation ▸ Continuous Integration and build automation ▸ APIs and versioning
  10. PACKAGING FOR OPEN SOURCE: GOOD PRACTICES ▸ Documentation ▸ Continuous Integration and build automation ▸ APIs and versioning
  11. PACKAGING FOR OPEN SOURCE: DOCUMENTATION ▸ Hard and painful to maintain ▸ More documentation => More outdated documentation ▸ Less documentation => Fewer explanations
  12. https://xkcd.com/1343/
  13. DOCUMENTATION: WELL DESIGNED APIS
  14. DOCUMENTATION: DOCSTRINGS
  15. DOCUMENTATION: DOCSTRINGS
  16. DOCUMENTATION: DOCTESTS
  17. PACKAGING FOR OPEN SOURCE: GOOD PRACTICES ▸ Documentation ▸ Continuous Integration and build automation ▸ APIs and versioning
  18. PACKAGING FOR OPEN SOURCE: CONTINUOUS INTEGRATION AND BUILD AUTOMATION ▸ Continuous integration: always be merging into a branch; merge frequently ▸ Build automation: enforce tests and checks to pass before merging
  19. PACKAGING FOR OPEN SOURCE: BUILD AUTOMATION, WHAT FOR? ▸ the project can be installed or built on the targeted platforms ▸ the code is doing what it is expected to do ▸ you haven't introduced regressions ▸ the documentation is not outdated ▸ automate whatever is error prone, and can be automated
  20. PACKAGING FOR OPEN SOURCE: GOOD PRACTICES ▸ Documentation ▸ Continuous Integration and build automation ▸ APIs and versioning
  21. PACKAGING FOR OPEN SOURCE: APIS AND VERSIONING ▸ Python: everything is public! ▸ Public API = Conventions + Doc
  22. PACKAGING FOR OPEN SOURCE: SEMANTIC VERSIONING 1.3.2
  23. PACKAGING FOR OPEN SOURCE: SEMANTIC VERSIONING 1.3.2 = major.minor.patch ▸ Major (1): bump when you make incompatible API changes; examples: removed function, additional mandatory param, changed return type; impact on client: code no longer works ▸ Minor (3): bump when you add functionality in a backwards compatible manner; examples: new API, new optional param; impact on client: additional capabilities ▸ Patch (2): bump when you make backwards compatible bug fixes; examples: internal bugs; impact on client: improved behavior
  24. MACHINE LEARNING AND OPEN SOURCE
  25. MACHINE LEARNING AND OPEN SOURCE: SPECIFIC CHALLENGES ▸ Managing resources ▸ Testing a Machine Learning pipeline ▸ Reproducibility ▸ Modularity and Extensibility
  26. MACHINE LEARNING AND OPEN SOURCE: SPECIFIC CHALLENGES ▸ Managing resources ▸ Testing a Machine Learning pipeline ▸ Reproducibility ▸ Modularity and Extensibility
  27. MACHINE LEARNING AND OPEN SOURCE: MANAGING RESOURCES [Diagram labels: Input, Output, resources, ML Pipeline]
  28. MACHINE LEARNING AND OPEN SOURCE: MANAGING RESOURCES [Diagram labels: Input, Output, ML Pipeline] ▸ Heavier library ▸ Updating the resources requires a release ▸ No user-defined resources
  29. MACHINE LEARNING AND OPEN SOURCE: MANAGING RESOURCES [Diagram labels: Input, Output, resources, ML Pipeline]
  30. MACHINE LEARNING AND OPEN SOURCE: MANAGING RESOURCES [Diagram labels: Input, Output, resources, ML Pipeline]
  31. MACHINE LEARNING AND OPEN SOURCE: MANAGING RESOURCES IN SNIPS NLU
  32. MACHINE LEARNING AND OPEN SOURCE: MANAGING RESOURCES IN SNIPS NLU
  33. MACHINE LEARNING AND OPEN SOURCE: SPECIFIC CHALLENGES ▸ Managing resources ▸ Testing a Machine Learning pipeline ▸ Reproducibility ▸ Modularity and Extensibility
  34. MACHINE LEARNING AND OPEN SOURCE: TESTING A MACHINE LEARNING PIPELINE ▸ Traditional testing ▸ Testing in ML?
  35. MACHINE LEARNING AND OPEN SOURCE: TESTING A MACHINE LEARNING PIPELINE
  36. MACHINE LEARNING AND OPEN SOURCE: TESTING DONE WRONG
  37. MACHINE LEARNING AND OPEN SOURCE: TESTING DONE RIGHT
  38. MACHINE LEARNING AND OPEN SOURCE: TESTING A MACHINE LEARNING PIPELINE
  39. MACHINE LEARNING AND OPEN SOURCE: HANDLE RANDOMNESS IN TESTS
  40. MACHINE LEARNING AND OPEN SOURCE: SPECIFIC CHALLENGES ▸ Managing resources ▸ Testing a Machine Learning pipeline ▸ Reproducibility ▸ Modularity and Extensibility
  41. MACHINE LEARNING AND OPEN SOURCE: REPRODUCIBILITY FROM A PRODUCT PERSPECTIVE [Diagram labels: Data, Training, Evaluation, selected data]
  42. MACHINE LEARNING AND OPEN SOURCE: REPRODUCIBILITY FROM A DEBUGGING PERSPECTIVE [Chart labels: Suffering, Reproducibility, 0% to 100%]
  43. MACHINE LEARNING AND OPEN SOURCE: REPRODUCIBILITY FOR BENCHMARKS [Figure: Data + Code; 🤓 🤓 🤓; scores 0.95 0.87 0.92 0.98 0.91 0.88 0.89 0.83 0.92]
  44. MACHINE LEARNING AND OPEN SOURCE: REPRODUCIBILITY WITH RANDOM SEEDS
  45. MACHINE LEARNING AND OPEN SOURCE: REPRODUCIBILITY THROUGH CONFIGURATIONS [Figure: code containing the values 56, 3.0, True]
  46. MACHINE LEARNING AND OPEN SOURCE: REPRODUCIBILITY THROUGH CONFIGURATIONS [Figure: code containing the values 42, 1.5, False]
  47. MACHINE LEARNING AND OPEN SOURCE: REPRODUCIBILITY THROUGH CONFIGURATIONS [Figure labels: 42, 1.5, False; x, y, z; param_1:, param_2:, param_3:; code; config]
  48. MACHINE LEARNING AND OPEN SOURCE: REPRODUCIBILITY THROUGH CONFIGURATIONS [Figure: Data + Code + Config; 🤓 🤓 🤓; scores 0.95 0.87 0.92 0.98 0.91 0.88 0.89 0.83 0.92]
  49. MACHINE LEARNING AND OPEN SOURCE: SPECIFIC CHALLENGES ▸ Managing resources ▸ Testing a Machine Learning pipeline ▸ Reproducibility ▸ Modularity and Extensibility
  50. MACHINE LEARNING AND OPEN SOURCE: MODULARITY AND EXTENSIBILITY [Diagram labels: Input, Output, LogReg, SVM, PIPELINE, AVAILABLE COMPONENTS]
  51. MACHINE LEARNING AND OPEN SOURCE: MODULARITY AND EXTENSIBILITY [Diagram labels: Input, Output, LogReg, AVAILABLE COMPONENTS, SVM, PIPELINE]
  52. MACHINE LEARNING AND OPEN SOURCE: MODULARITY AND EXTENSIBILITY [Diagram labels: Input, Output, LogReg, SVM, PIPELINE, AVAILABLE COMPONENTS]
  53. MACHINE LEARNING AND OPEN SOURCE: MODULARITY AND EXTENSIBILITY [Diagram labels: Input, Output, LogReg, SVM, PIPELINE, AVAILABLE COMPONENTS]
  54. MACHINE LEARNING AND OPEN SOURCE: MODULARITY AND EXTENSIBILITY [Diagram labels: Input, Output, LogReg, SVM, PIPELINE, AVAILABLE COMPONENTS]
  55. MACHINE LEARNING AND OPEN SOURCE: MODULARITY AND EXTENSIBILITY [Diagram labels: FILE SYSTEM, RAM]
  56. MACHINE LEARNING AND OPEN SOURCE: REGISTRABLE COMPONENTS ▸ Declarative syntax for pipeline and components
  57. MACHINE LEARNING AND OPEN SOURCE: REGISTRABLE COMPONENTS ▸ Components Registry
  58. EXPERIENCES FROM GOING OPEN SOURCE: TAKEAWAYS ▸ writing tests saves you time, not the opposite ▸ test the right things ▸ make your outputs reproducible ▸ use abstractions to improve modularity and clarity
  59. THANK YOU ▸ @adrien_ball ▸ github.com/snipsco/snips-nlu
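
The slides on reproducibility with random seeds and through configurations (44 to 48) survive in this transcript only as titles and figure labels. As a minimal sketch of the idea, assuming a hypothetical `TrainingConfig` class and a toy `train` function (neither is part of Snips NLU), the example below moves the tunable values out of the code and into a config object that also carries the seed, so that data + code + config give the same result on every run. The names `param_1`, `param_2`, `param_3` and the values 42, 1.5, False come from the slide labels; the meaning given to them here is invented for illustration.

```python
import json
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical config; names mirror the slide labels, semantics are invented."""
    param_1: int = 42       # used here as the random seed (assumption)
    param_2: float = 1.5    # used here as a noise scale (assumption)
    param_3: bool = False   # used here as a "shuffle the data" flag (assumption)


def train(data, config):
    """Toy stand-in for a training run whose score depends on randomness."""
    rng = random.Random(config.param_1)  # local seeded generator, no global state
    samples = list(data)
    if config.param_3:
        rng.shuffle(samples)
    noise = sum(rng.gauss(0.0, config.param_2) for _ in samples)
    return round(sum(samples) + noise, 6)


data = [0.95, 0.87, 0.92]
config = TrainingConfig()

# Same data + code + config gives the same score on every run.
first_score = train(data, config)
second_score = train(data, config)

# The config can be serialized and shared alongside the benchmark results.
serialized = json.dumps(asdict(config), sort_keys=True)
restored = TrainingConfig(**json.loads(serialized))
third_score = train(data, restored)
```

Using a local `random.Random` instance rather than the module-level functions keeps the seed from leaking into, or being disturbed by, other code that also draws random numbers.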
