EXPERIENCES FROM GOING OPEN SOURCE
BY ADRIEN BALL
SNIPS
END-TO-END SPEECH TO MEANING SOFTWARE
▸ Wake word
▸ Speech to text
▸ Natural Language Understanding
SNIPS
NATURAL LANGUAGE UNDERSTANDING
OPEN SOURCE AT SNIPS
MOTIVATIONS
▸ Transparency
▸ Visibility
▸ Reproducibility
PACKAGING FOR OPEN SOURCE
OBJECTIVES FOR THE COMMUNITY
▸ Understand
▸ Use
▸ Contribute
REQUIREMENTS
▸ Documentation
▸ Continuous Integration and build automation
▸ APIs and versioning
GOOD PRACTICES
▸ Documentation
▸ Continuous Integration and build automation
▸ APIs and versioning
DOCUMENTATION
▸ Hard and painful to maintain
▸ More documentation => More outdated documentation
▸ Less documentation => Fewer explanations
https://xkcd.com/1343/
DOCUMENTATION
▸ Well designed APIs
▸ Docstrings
▸ Doctests
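The code shown on the docstring and doctest slides did not survive extraction. As a minimal sketch of the idea, assuming a hypothetical helper named `normalize_text` (it is not taken from Snips NLU), a docstring can carry examples that the standard `doctest` module executes, so the documentation fails loudly when it goes stale:

```python
import doctest


def normalize_text(text):
    """Lowercase a string and collapse runs of whitespace.

    Args:
        text (str): raw user input.

    Returns:
        str: the normalized string.

    Examples:
        >>> normalize_text("  Turn ON   the Lights ")
        'turn on the lights'
        >>> normalize_text("")
        ''
    """
    # str.split() with no argument splits on any whitespace run
    # and drops leading and trailing whitespace.
    return " ".join(text.lower().split())


if __name__ == "__main__":
    # Run every example found in this module's docstrings and report
    # the ones whose actual output differs from the documented output.
    print(doctest.testmod())
```

Running the doctests in the build is one way to check that "the documentation is not outdated": a changed behavior breaks the example instead of silently contradicting it.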
CONTINUOUS INTEGRATION AND BUILD AUTOMATION
▸ Continuous integration:
  ▸ Always be merging into a branch
  ▸ Merge frequently
▸ Build automation:
  ▸ Enforce tests and checks to pass before merging
BUILD AUTOMATION, WHAT FOR?
▸ the project can be installed or built on the targeted platforms
▸ the code is doing what it is expected to do
▸ you haven't introduced regressions
▸ the documentation is not outdated
▸ automate whatever is error-prone and can be automated
APIS AND VERSIONING
▸ Python: everything is public!
▸ Public API = Conventions + Doc
SEMANTIC VERSIONING
1.3.2
1 = major
▸ Bump when you: make incompatible API changes
▸ Examples: removed function, additional mandatory param, changed return type
▸ Impact on client code: no longer works
3 = minor
▸ Bump when you: add functionality in a backwards compatible manner
▸ Examples: new API, new optional param
▸ Impact on client code: additional capabilities
2 = patch
▸ Bump when you: make backwards compatible bug fixes
▸ Examples: internal bugs
▸ Impact on client code: improved behavior
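The bumping rules can be sketched as a small function. The names `Change` and `bump` are hypothetical, and the sketch ignores pre-release and build suffixes; it only shows which component moves and that the lower components reset to zero:

```python
from enum import Enum


class Change(Enum):
    BREAKING = "major"  # e.g. removed function, new mandatory param
    FEATURE = "minor"   # e.g. new API, new optional param
    FIX = "patch"       # e.g. internal bug fix


def bump(version, change):
    """Return the next version string for a given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change is Change.BREAKING:
        return f"{major + 1}.0.0"
    if change is Change.FEATURE:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


if __name__ == "__main__":
    for change in Change:
        print(change.value, bump("1.3.2", change))
```

Starting from 1.3.2, a breaking change gives 2.0.0, a new feature gives 1.4.0, and a bug fix gives 1.3.3.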
MACHINE LEARNING AND OPEN SOURCE
SPECIFIC CHALLENGES
▸ Managing resources
▸ Testing a Machine Learning pipeline
▸ Reproducibility
▸ Modularity and Extensibility
MANAGING RESOURCES
Figure: Input, ML Pipeline, Output, resources
▸ Heavier library
▸ Updating the resources requires a release
▸ No user-defined resources
MANAGING RESOURCES IN SNIPS NLU
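The Snips NLU code from this slide did not survive extraction, so the following is not the Snips NLU API. It is a generic sketch, under the assumption that resources are plain JSON files, of keeping resources outside the library and loading them from a directory the user chooses. The names `load_resources` and `count_stop_words` are hypothetical:

```python
import json
import tempfile
from pathlib import Path


def load_resources(resources_dir):
    """Load every `*.json` file of a directory into a dict keyed by file stem."""
    resources_dir = Path(resources_dir)
    if not resources_dir.is_dir():
        raise FileNotFoundError(f"No resources directory at {resources_dir}")
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(resources_dir.glob("*.json"))
    }


def count_stop_words(tokens, resources):
    """Toy pipeline step that reads a resource instead of hard-coded data."""
    stop_words = set(resources["stop_words"])
    return sum(1 for token in tokens if token in stop_words)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A user-defined resource, added without a new library release.
        Path(tmp, "stop_words.json").write_text(
            json.dumps(["the", "a"]), encoding="utf-8"
        )
        resources = load_resources(tmp)
        print(count_stop_words(["turn", "on", "the", "light"], resources))
```

With this layout the library stays light, resources can be updated on their own schedule, and users can supply their own.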
TESTING A MACHINE LEARNING PIPELINE
▸ Traditional testing:
▸ Testing in ML?
TESTING DONE WRONG
TESTING DONE RIGHT
HANDLE RANDOMNESS IN TESTS
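The slide's code is missing from the extraction. One common way to handle randomness in tests, sketched here with a hypothetical `shuffle_and_split` function, is to let the code under test accept a seed and use its own generator, so that a test can fix the seed and assert on properties that must always hold:

```python
import random


def shuffle_and_split(items, train_ratio, seed=None):
    """Shuffle items, then split them into a train part and a test part."""
    rng = random.Random(seed)  # local generator: no hidden global state
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def test_same_seed_gives_same_split():
    # With a fixed seed the result is deterministic, so the test is stable.
    first = shuffle_and_split(range(10), 0.8, seed=42)
    second = shuffle_and_split(range(10), 0.8, seed=42)
    assert first == second


def test_split_keeps_every_item():
    # Property that holds for any seed: nothing is lost or duplicated.
    train, test = shuffle_and_split(range(10), 0.8, seed=42)
    assert len(train) == 8
    assert sorted(train + test) == list(range(10))
```

The test functions follow the `test_` naming convention, so a runner such as pytest would collect them; they can also be called directly.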
REPRODUCIBILITY FROM A PRODUCT PERSPECTIVE
Figure: Data, Training, Evaluation (selected data); Suffering plotted against Reproducibility, from 0% to 100%
REPRODUCIBILITY FROM A DEBUGGING PERSPECTIVE
REPRODUCIBILITY FOR BENCHMARKS
Data + Code
🤓 🤓 🤓
0.95 0.87 0.92
0.98 0.91 0.88
0.89 0.83 0.92
REPRODUCIBILITY WITH RANDOM SEEDS
REPRODUCIBILITY THROUGH CONFIGURATIONS
code: 56, 3.0, True
code: 42, 1.5, False
code: x, y, z
config:
param_1: 42
param_2: 1.5
param_3: False
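A minimal sketch of moving hard-coded values out of the code and into a configuration that can be saved next to the results. The parameter names `param_1`, `param_2`, `param_3` and their values come from the slide; the `TrainingConfig` class and the toy `train` function are assumptions made for illustration:

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Every tunable value lives here instead of being scattered in the code."""
    param_1: int = 42
    param_2: float = 1.5
    param_3: bool = False

    def to_json(self):
        # Serialized config can be stored with a model or a benchmark result.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload):
        return cls(**json.loads(payload))


def train(data, config):
    """Toy training step: reads its parameters from the config only."""
    scale = 1.0 if config.param_3 else config.param_2
    return [config.param_1 + scale * x for x in data]


if __name__ == "__main__":
    config = TrainingConfig()
    restored = TrainingConfig.from_json(config.to_json())
    # The same data, code and config give the same output.
    print(train([1, 2], config) == train([1, 2], restored))
```

Sharing the serialized config together with the data and the code is what lets someone else rerun the same experiment.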
Data + Code + Config
🤓 🤓 🤓
0.95 0.87 0.92
0.98 0.91 0.88
0.89 0.83 0.92
MODULARITY AND EXTENSIBILITY
Figure: a PIPELINE from Input to Output, built from the AVAILABLE COMPONENTS (LogReg, SVM)
Figure: FILE SYSTEM, RAM
REGISTRABLE COMPONENTS
Declarative syntax for pipeline and components
Components Registry
TAKEAWAYS
▸ writing tests saves you time, not the opposite
▸ test the right things
▸ make your outputs reproducible
▸ use abstractions to improve modularity and clarity
THANK YOU
@adrien_ball
github.com/snipsco/snips-nlu
