(Py)testing the Limits of Machine Learning

Jan. 25, 2022

Data & Analytics

Despite the hype cycle, each day machine learning becomes a little less magic and a little more real. Predictions increasingly drive our everyday lives, embedded into more of our everyday applications. To support this creative surge, development teams are evolving, integrating novel open source software and state-of-the-art GPU hardware, and bringing on essential new teammates like data ethicists and machine learning engineers. Software teams are also now challenged to build and maintain codebases that are intentionally not fully deterministic.

This nondeterminism can manifest in a number of surprising and oftentimes very stressful ways! Successive runs of model training may produce slight but meaningful variations. Data wrangling pipelines turn out to be extremely sensitive to the order in which transformations are applied, and require thoughtful orchestration to avoid leakage. Model hyperparameters that can be tuned independently may have mutually exclusive conditions. Models can also degrade over time, producing increasingly unreliable predictions. Moreover, open source libraries are living, dynamic things; the latest release of your team's favorite library might cause your code to suddenly behave in unexpected ways.
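The run-to-run variation described above usually comes from a small number of random choices (initial weights, shuffle order). As a minimal sketch, using only the standard library and a made-up toy model rather than any real ML framework, the example below shows how routing every random choice through one seeded generator makes a training run repeatable. The names `train_toy_model` and `DATA` are hypothetical and exist only for illustration.

```python
import random


def train_toy_model(data, seed=None, epochs=5, lr=0.1):
    """Fit a 1-D threshold classifier with shuffled, perceptron-style updates.

    The initial weight and the shuffle order are the only sources of
    randomness, so a fixed seed makes the whole run repeatable.
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    weight, bias = rng.uniform(-1, 1), 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)  # visiting order changes from run to run unless seeded
        for x, label in samples:
            predicted = 1 if weight * x + bias > 0 else 0
            error = label - predicted
            weight += lr * error * x
            bias += lr * error
    return weight, bias


# Toy data invented for this sketch: negative inputs are class 0, positive are class 1.
DATA = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]

first = train_toy_model(DATA, seed=38)
second = train_toy_model(DATA, seed=38)
assert first == second  # same seed and same data give identical parameters
```

Calling `train_toy_model(DATA)` with no seed draws a fresh seed each time, which is the situation a test suite has to be designed around.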

Put simply, as ML becomes more of an expectation than an exception in our industry, testing has never been more important! Fortunately, we are lucky to have a rich open source ecosystem to support us in our journey to build the next generation of apps in a safe, stable way. In this talk we'll share some hard-won lessons, favorite open source packages, and reusable techniques for testing ML software components.

(Py)testing the Limits of Machine Learning

  1. (Py)Testing the Limits of Machine Learning. Rebecca Bilbro, Daniel Sollis, Patrick Deziel
  2. 01. Introduction: Why test ML? 02. DIY Testing API: Building blocks of a good ML test suite. 03. Non-Determinism: Keeping your head when the models act up. 04. Experiment with Care: ML diagnostics for experimental robustness. 05. Conclusion: Level up your ML game with these testing tips & tricks.
  3. 01. Why test ML?
  4. Do we need to test ML code? “Testing is for software, not data science.” “It’s a waste of time to test experimental research code.” “We follow hypothesis-driven development, not test-driven development.”
  5. Can we test ML code? “Machine learning algorithms are non-deterministic, so there’s no way to test them.” “Our Jupyter notebooks don’t support test runners.” “Machine learning has too many parameters to test them all.”
  6. Bottom Line: If it’s going into a product, it needs to be tested.
  7. 02. Building blocks of a good ML test suite
  8. Estimators and Transformers: Inheriting from scikit-learn's estimator and transformer base classes (BaseEstimator and TransformerMixin in sklearn.base) lets you override their existing methods and generalize across the various models and transformations in scikit-learn. This allows pipelines to be used consistently for both preprocessing and modeling. A Transformer implements fit() and transform(): it takes X, y and returns X′. An Estimator implements fit() and predict(): it takes X, y and returns ŷ.
  9. Creating a Wrapper (Inheriting & Overloading): a ModelWrapper with fit(), transform(), and predict(), built on Transformer and Estimator.
  10. Pipelines and FeatureUnions: The Pipeline and FeatureUnion classes in scikit-learn let you organize preprocessing and modeling so that you can iterate quickly through experiments. A Pipeline chains steps in sequence, while a FeatureUnion applies several transformers to the same input and concatenates their outputs, which suits parallelizable tasks. With a wrapper class, using these features becomes even easier. Flow: Data Loader, Transformer, Transformer, Estimator, with fit() and predict().
  11. Pipeline Example: Create a pipeline that loads data from a file on disk, extracts each instance as an individual essay, then applies text feature extraction before a text classification model. Steps: extract_essays, counts, tf_idf, classifier. Source: http://zacstewart.com/2014/08/05/pipelines-of-featureunions-of-pipelines.html
      from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.pipeline import Pipeline
      # EssayExtractor is a custom transformer assumed to be defined elsewhere.
      pipeline = Pipeline([
          ('extract_essays', EssayExtractor()),
          ('counts', CountVectorizer()),
          ('tf_idf', TfidfTransformer()),
          ('classifier', MultinomialNB())
      ])
      # The final step is a classifier, so fit the pipeline and then predict on held-out data.
      pipeline.fit(X_train, y_train)
      y_pred = pipeline.predict(X_test)
  12. Feature Union. Diagram labels: feature_union, extract_essays, counts, tf_idf, classifier, document, meta, concepts, DictVectorizer. Source: http://zacstewart.com/2014/08/05/pipelines-of-featureunions-of-pipelines.html
      pipeline = Pipeline([
          ('extract_essays', EssayExtractor()),
          ('features', FeatureUnion([
              ('ngram_tf_idf', Pipeline([
                  ('counts', CountVectorizer()),
                  ('tf_idf', TfidfTransformer())
              ])),
              ('essay_length', LengthTransformer()),
              ('misspellings', MispellingCountTransformer())
          ])),
          ('classifier', MultinomialNB())
      ])
  13. Coding Style and Enforcement: Part of keeping our standards high is enforcing an agreed-upon coding style and sticking to it. We use pre-commit in addition to Black to ensure that our repository stays clean and consistent across commits.
  14. The Double-Edged Sword of Black: python -m black '.file.py'
  15. CI/CD with Jenkins: Using Jenkins for build testing helps keep the whole team on the same page and enforces the team's testing standards. Automating builds in addition to local testing helps ensure that code works across different environments and machines. CI/CD flow: Push, Pre-Commit, Black, Jenkins Build/Testing.
  16. 03. Dealing with Non-Determinism
  17. Testing an ML Pipeline ● How do we handle non-determinism in our pipeline? ● How do we test multiple parameters in our pipeline? ● How do we handle small variations in our pipeline? Scikit-learn Pipeline https://www.freecodecamp.org/news/chihuahua-or-muffin-my-search-for-the-best-computer-vision-api-cbda4d6b425d/
  18. Different Data, Different Results. Diagram labels: Scikit-learn Pipeline, Muffin, Dog (shown twice); Train, Test; Test, Train.
  19. Different Executions, Different Results. Diagram labels: Train, Test; Scikit-learn Pipeline, Muffin, Dog (shown twice).
  20. Ensuring Reproducibility ● Fixing the random seed can ensure reproducibility across executions of the same code. ● Scikit-learn provides a random_state parameter on its non-deterministic estimators and functions, which allows the user to fix the random seed. class sklearn.neural_network.MLPClassifier(hidden_layer_sizes=(100,), activation='relu', *, solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000) https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html
  21. Using random_state ● Our function will now produce the same results on different executions if we pass it the same data.
  22. (Py)Testing Our Function ● ML comes with an abundance of options. ● How do we test multiple parameters without turning our test code into spaghetti?
  23. Using pytest.mark.parametrize
  24. Dealing With Inevitable Variations ● With floating-point arithmetic, things can get... strange. ● To test ML correctly, we need a better way to compare floating-point results. ● We need a method of handling results that are “close enough”. ○ E.g., training time
  25. Using pytest.approx
  26. 04. Diagnostics for Machine Learning
  27. Engineering vs. Experimentation: What if it’s a false dichotomy?
  28. The Yellowbrick API. Feature Visualization: Data Loader, Transformer(s), then fit(), transform(), draw(). Evaluation Visualization: Data Loader, Transformer(s), Estimator, then fit(), predict(), score(), draw().
  29. dog muffin
  30. Code example:
      import matplotlib.pyplot as plt
      from sklearn.linear_model import SGDClassifier
      from sklearn.ensemble import RandomForestClassifier
      from yellowbrick.classifier import ClassificationReport
      from sklearn.model_selection import train_test_split as tts

      def muffins_or_dogs(X, y, model, classes=["dog", "muffin"]):
          fig, ax = plt.subplots()
          X_train, X_test, y_train, y_test = tts(X, y, random_state=38)
          visualizer = ClassificationReport(
              model, classes=classes, cmap="Greys", ax=ax, support=True, show=False
          )
          visualizer.fit(X_train, y_train)
          score = visualizer.score(X_test, y_test)
          image_path = visualizer.estimator.__class__.__name__ + ".png"
          visualizer.show(outpath=image_path)
          return visualizer.estimator.predict(X_test)
  31. Tips & Tricks ● Leverage an ML API: Systematize tests by wrapping open source ML frameworks. ● Pipeline ML Steps: Chain ML steps to support accuracy & reproducibility. ● Drill into Fuzziness: Use parameterization & approximation to deal with non-determinism. ● Embrace Consistency: Adopt a team-wide coding style to facilitate collaboration. ● Befriend Small Robots: CI/CD helps flag test regressions & dependency changes. ● Experiment with Care: Use diagnostic tools that don’t interfere with testability.
  32. Thank you! Template by SlidesGo. Icons by Flaticon. Images by Freepik.
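Slides 8 to 12 describe custom transformers combined through Pipeline and FeatureUnion, but the custom classes themselves are not shown in the transcript. The following is a minimal sketch of how such a pipeline might look end to end. `LengthTransformer` here is a hypothetical stand-in written for this example (not the speakers' or the linked post's implementation), and the four-document corpus is invented for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline


class LengthTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: maps each document to its character count."""

    def fit(self, X, y=None):
        return self  # stateless, so there is nothing to learn

    def transform(self, X):
        return np.array([[len(doc)] for doc in X], dtype=float)


pipeline = Pipeline([
    ("features", FeatureUnion([
        # Branch 1: bag-of-words counts reweighted by TF-IDF.
        ("ngram_tf_idf", Pipeline([
            ("counts", CountVectorizer()),
            ("tf_idf", TfidfTransformer()),
        ])),
        # Branch 2: a single numeric feature, concatenated with branch 1.
        ("essay_length", LengthTransformer()),
    ])),
    ("classifier", MultinomialNB()),
])

# Toy corpus, invented for this sketch.
docs = ["good dog", "fluffy dog barks", "blueberry muffin", "warm muffin tray"]
labels = ["dog", "dog", "muffin", "muffin"]

pipeline.fit(docs, labels)
predictions = pipeline.predict(["dog barks", "muffin tray"])
```

Because the whole chain is one object with `fit()` and `predict()`, a test can treat it as a single unit and still swap out individual steps by name.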
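Slides 21 and 23 refer to code that did not survive in the transcript. As a hedged sketch of the idea, the test below combines a fixed `random_state` with `pytest.mark.parametrize` (the decorator's full name in pytest) so that one test body covers several hyperparameter combinations. The helper `fit_and_predict`, the synthetic dataset, and the choice of `RandomForestClassifier` are assumptions made for this example, not the speakers' code.

```python
import numpy as np
import pytest
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def fit_and_predict(n_estimators, max_depth, seed):
    """Train on synthetic data and return predictions for the held-out split."""
    X, y = make_classification(n_samples=200, n_features=8, random_state=seed)
    X_train, X_test, y_train, _ = train_test_split(X, y, random_state=seed)
    model = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=seed
    )
    model.fit(X_train, y_train)
    return model.predict(X_test)


# Stacked decorators generate one test case per combination (2 x 2 = 4 cases).
@pytest.mark.parametrize("n_estimators", [5, 20])
@pytest.mark.parametrize("max_depth", [2, None])
def test_same_seed_same_predictions(n_estimators, max_depth):
    first = fit_and_predict(n_estimators, max_depth, seed=38)
    second = fit_and_predict(n_estimators, max_depth, seed=38)
    assert np.array_equal(first, second)
```

Run under pytest, each parameter combination is reported as its own test case, so a failure points at the specific setting that broke.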
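Slides 24 and 25 name `pytest.approx` as the tool for “close enough” comparisons, without the accompanying code. A minimal sketch of how it might be used follows. The scores and timings are hypothetical numbers chosen for illustration, not measurements from the talk.

```python
import pytest


def test_float_sum_is_close_enough():
    # In binary floating point, 0.1 + 0.2 is not exactly 0.3, so == fails.
    assert 0.1 + 0.2 != 0.3
    # approx applies a small default tolerance, so this comparison passes.
    assert 0.1 + 0.2 == pytest.approx(0.3)


def test_scores_within_absolute_tolerance():
    # Hypothetical per-class scores from two training runs.
    expected = [0.91, 0.87]
    observed = [0.9104, 0.8698]
    # Each element must be within 0.001 of its expected value.
    assert observed == pytest.approx(expected, abs=1e-3)


def test_training_time_within_relative_tolerance():
    # Hypothetical timings in seconds; allow 25% drift from the baseline.
    baseline_seconds = 2.0
    observed_seconds = 2.3
    assert observed_seconds == pytest.approx(baseline_seconds, rel=0.25)
```

Absolute tolerances suit bounded metrics such as accuracy, while relative tolerances suit quantities such as training time, whose scale varies across machines.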
