
PyData 2015 Keynote: "A Systems View of Machine Learning"

12,472 views


Despite the growing abundance of powerful tools, building and deploying machine-learning frameworks into production continues to be a major challenge, in both science and industry. I'll present some particular pain points and cautions for practitioners as well as recent work addressing some of the nagging issues. I advocate for a systems view, which, when expanded beyond the algorithms and codes to the organizational ecosystem, places some interesting constraints on the teams tasked with development and stewardship of ML products.

About: Dr. Joshua Bloom is an astronomy professor at the University of California, Berkeley, where he teaches high-energy astrophysics and Python for data scientists. He has published over 250 refereed articles, largely on time-domain transient events and telescope/insight automation. His book on gamma-ray bursts, a technical introduction for physical scientists, was published recently by Princeton University Press. He is also co-founder and CTO of wise.io, a startup based in Berkeley. Josh has been awarded the Pierce Prize from the American Astronomical Society; he is also a former Sloan Fellow, Junior Fellow at the Harvard Society of Fellows, and Hertz Foundation Fellow. He holds a PhD from Caltech and degrees from Harvard and Cambridge University.

Published in: Technology


  1. 1. Joshua Bloom, Ph.D. CTO, Co-founder PyData, Seattle, July 2015 A Systems View of Machine Learning in science & industry Gordon & Betty Moore Foundation Data-Driven Investigator UC Berkeley, Astronomy @profjsb
  2. 2. About Me…
  3. 3. http://research.google.com/pubs/pub43146.html • Complex models erode abstraction boundaries • Data dependencies cost more than code dependencies • System-level Spaghetti • Changing External World “It may be surprising to the academic community to know that only a fraction of the code … is actually doing ‘machine learning’. A mature system might end up being (at most) 5% machine learning code and (at least) 95% glue code.”
  4. 4. Algorithms Software Hardware Project Staff Consumers Organization + Society ML System Components Agenda - inside-out discussion of component parts & some interconnects - presentation of some facilitating new tools - impact on problem definition teams
  5. 5. Linear/ Logistic Regression Naive Bayes Decision Trees SVMs Bagging Boosting Decision Forests Neural Nets Deep Learning Nearest Neighbors Gaussian/ Dirichlet Processes Splines Lasso XGBoost …. Some Algos/Models/Approaches Used in Practice LDA/LSI RNN Software Instantiations in the Python Ecosystem BOW word2vec
  6. 6. Nguyen et al, CVPR 2015 All Models of Learning Have Flaws http://hunch.net/?p=224 “It’s common to forget the flaws of the model that you are most familiar…while the flaws of new models get exaggerated.” - John Langford (2007, Microsoft Research) Concepts ≠ Statistics Convolutional networks can be fooled.
  7. 7. Nguyen et al, CVPR 2015 The impact of dataset bias Training/testing on biased datasets gives unrealistic results. E.g.: Torralba and Efros, Unbiased look at dataset bias, CVPR 2011. Torralba/Efros11 via L. Bottou (ICML 2015) All Models of Learning Have Flaws http://hunch.net/?p=224 “It’s common to forget the flaws of the model that you are most familiar…while the flaws of new models get exaggerated.” - John Langford (2007, Microsoft Research) Concepts ≠ Statistics Convolutional networks can be fooled. (Nguyen et al, CVPR 2015) Magritte, ICML, 1929
  8. 8. What are you optimizing for? Component: What. Algorithm/Model: learning rate, convexity, error bounds, scaling, … + Software/Hardware: accuracy, memory usage, disk usage, CPU needs, time to learn, time to predict. + Project Staff: time to implement, people/resource costs, reliability, maintainability, experimentability. + Consumers: direct value, usability, explainability, actionability. + Society: indirect value. - multi-axis optimizations in a given component - highly coupled optimization considerations between components - myopic view can be costly further up the stack
  9. 9. Scalar proxies: - RMSE - RMSLE - [adjusted] R2 - ... R2=0.91 RMSE = 692.3 Pearson R=0.96 Optimization Metric: What’s the essence of what I care about?
  10. 10. Scalar proxies: - RMSE - RMSLE - [adjusted] R2 - ... R2=0.91 RMSE = 692.3 Pearson R=0.96 scatter outliers bias Optimization Metric: What’s the essence of what I care about?
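The scalar proxies listed on these two slides can be computed in a few lines. The following is a minimal sketch in plain Python; the function names and the sample values are illustrative assumptions, not taken from the talk. It also hints at why a single scalar can hide the scatter, outliers, and bias the slide calls out: a constant offset leaves Pearson R at 1 while RMSE reports the full error.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def rmsle(y_true, y_pred):
    """Root-mean-square logarithmic error; values must be greater than -1."""
    n = len(y_true)
    return math.sqrt(
        sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)) / n
    )

def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def adjusted_r2(y_true, y_pred, n_features):
    """R2 penalized for the number of features used by the model."""
    n = len(y_true)
    return 1.0 - (1.0 - r2(y_true, y_pred)) * (n - 1) / (n - n_features - 1)

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Made-up data: predictions that carry a constant bias of +10.
y_true = [100.0, 200.0, 300.0, 400.0, 500.0]
y_biased = [v + 10.0 for v in y_true]

print("Pearson R:", pearson_r(y_true, y_biased))  # insensitive to the offset
print("RMSE:", rmse(y_true, y_biased))            # reflects the offset
```

Which proxy is appropriate depends on what the consumer of the prediction actually cares about, which is the question the slide poses.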
  11. 11. which classifier is best? depends... Optimization Metric: What’s the essence of what I care about?
  12. 12. >$50k Prize <$50k Prize Netflix winning metric best benchmark many teams get within ~few % of optimum so which is easier to put into production? Leaderboard data from Kaggle & Netflix Optimization Metric
  13. 13. “We evaluated some of the new methods offline but the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment.” Xavier Amatriain and Justin Basilico (April 2012) On the Prize
  14. 14. WiseFactory automated feature extraction, learning, prediction, deployment WiseTransfer efficient manipulation of large objects WiseDataSet WiseML high-productivity data science in Python WiseAlgorithm WindTunnel detect drift in CPU, Mem, Accuracy, Statistics Quality Wrapping High-Level API Deployment & Monitoring C++ SDK Core ML Stack at Wise.io G. Blanco D. Eads J. Richards P. Baines H. Brink
  15. 15. Wise DataSet BaseVariableGroup BaseVariableGroup BaseVariableGroup InstanceGroup InstanceGroup InstanceGroup RowSparse RowMajor HeterogeneousCache AlgoRepo ColSparse MemMapped Variable Mapper Level Mapper • fast, highly memory-efficient • heterogeneous • distributed Goal: easily surface algorithms (written in C++ to be cache exploitative) to Python WiseDataSets
  16. 16. Language-agnostic C++ Base Classes Python-specific Derived Classes Output Input Iterator Processor Array Processor String Processor FrameBuilder SeriesBuilder StringBuilder Frame Processor R-specific Derived Classes • expose flexible interface from Python, to high-performance, Python-agnostic C++ code • pass arbitrary data between layers using “Protocol Master” (like Protobufs) • write C++ code generically for GraphLab, Spark, pandas, and Wise WiseTransfer
  17. 17. Datasets for Data Science Comparison
      Dataset | Slicing Induces Copy | Immutable Columns | Query | Transfer Speed to Python | C++ SDK | Distributed | Memory Efficiency | Categorical Optimized | Sparse & Dense
      Pandas DataFrame | Sequences | No | Yes | N/A | No | No | Medium | Medium | Yes
      GraphLab SFrame | Yes | Yes | Yes | Low | Yes | Yes | High | No | Yes
      Spark DataFrame | Yes | Yes | Yes | Very Low | No | Yes | Low | No | Yes
      Dask | Yes | No | Yes | N/A | No | Yes | Medium | No | No
      Blaze | No | No | Yes | N/A | No | Yes | Medium | No | No
      Wise DataSet | Copy-on-write | No | Yes | Very High | Yes | Yes | Very High | High | Yes
      See also: Rob Story, today
  18. 18. Enforcing (Weak) Contracts: Monitoring Deployments. Build DS workflow; on test set, like the offline testing accuracy; deploy & start monitoring results; online, accuracy is worse than expected? What to do: 1. Bang head to find (subtle) overfitting in model 2. Retrain: with new data (mo’ data, better answers) 3. Concept Drift: if retraining doesn’t help, jigger the DS workflow 4. Maybe that’s ok: Prediction influenced outcome. Hold out some live. See also: Chris Harland’s talk yesterday; Mike Manapat, today
  19. 19. Unit Tests, Regression Tests, Integration Tests: Of course you’re doing this… ETL Testing: is my contract affected by the (changing) update? Model Deployment Testing: @treycausey (yesterday); some tools: Engarde, Hypothesis, Feature Forge. Software Tests. Enforcing (Weak) Contracts: Monitoring Deployments 1. Need to know when things are too different from before 2. Then alert a real human 3. Use automated tools to try to isolate cause of change: data or code.
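The first two steps on this slide (know when things are too different from before, then alert a real human) can be sketched as a simple statistical check. This is a minimal illustration only, not how any tool named in the talk works: the z-score test, the threshold of 3, the function names, and the accuracy values are all assumptions.

```python
import statistics

def is_too_different(baseline, recent, z_threshold=3.0):
    """Flag a shift when the recent mean is more than z_threshold
    standard errors away from the baseline mean (assumed criterion)."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    recent_mean = statistics.mean(recent)
    if sigma == 0:
        return recent_mean != mu
    stderr = sigma / len(recent) ** 0.5
    return abs(recent_mean - mu) / stderr > z_threshold

def check_and_alert(baseline, recent, notify):
    """Call notify(message) so a human can investigate; return True if alerted."""
    if is_too_different(baseline, recent):
        notify("Monitored metric differs from baseline; check data and code changes.")
        return True
    return False

# Hypothetical daily accuracy values from a deployed model.
baseline_accuracy = [0.90, 0.91, 0.89, 0.90, 0.92, 0.90]
alerts = []
check_and_alert(baseline_accuracy, [0.70, 0.72, 0.71], alerts.append)
```

Here `notify` is any callable, for example a function that pages an on-call engineer. Isolating whether the cause is data or code (step 3) is outside the scope of this sketch.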
  20. 20. reproducibility • every deployment & drift test given unique hash • generate data files & script with hash • Perform sampling on known-good deployments • Monitor RAM, CPU, accuracy metrics over time • Probabilistic testing component of our continuous integration of ML, 10k++ tests Wise “WindTunnel”
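One way to give each deployment or drift test a hash, and to name its data files and script after that hash, is to derive the tag from the configuration and the data themselves. The sketch below assumes a content-derived SHA-256 tag; the slide does not say how WindTunnel computes its hashes, and the function names, file-name patterns, and sample configuration are hypothetical.

```python
import hashlib
import json

def deployment_hash(config, data_bytes, length=12):
    """Derive a short tag from the configuration and the data.
    Identical inputs always yield the same tag, which aids reproducibility."""
    digest = hashlib.sha256()
    # sort_keys makes the tag independent of dictionary ordering
    digest.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    digest.update(b"\x00")  # separator between config and data
    digest.update(data_bytes)
    return digest.hexdigest()[:length]

def artifact_names(tag):
    """Hypothetical naming scheme tying data files and scripts to a tag."""
    return {"data": f"data_{tag}.csv", "script": f"run_{tag}.py"}

# Made-up configuration and data for illustration.
config = {"model": "random_forest", "n_trees": 100}
tag = deployment_hash(config, b"col_a,col_b\n1,2\n")
print(artifact_names(tag))
```

A content-derived tag means that rerunning the same test on the same data reproduces the same identifier; a random identifier per run would be an equally valid reading of "unique hash".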
  21. 21. “Weak Contracts” i.e. Abstractions within components bleed through to other components cf. Sculley … 1. A smart programmer makes an inventive use of a trained object recognizer. 2. The object recognizer receives data that does not resemble the testing data and outputs nonsense. 3. The code of the smart programmer does not work. Example (via Bottou)
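The failure in this example starts when a model receives inputs unlike anything it was tested on. One way to make that implicit contract explicit is to record the range of each feature seen during training and flag inputs that fall outside it. This is a rough sketch under that assumption; the range check, the 10% slack, and the function names are illustrative and do not come from the talk or from Bottou's example.

```python
def fit_ranges(rows):
    """Record the (min, max) of every feature column seen in training."""
    return [(min(col), max(col)) for col in zip(*rows)]

def violates_contract(row, ranges, slack=0.1):
    """True if any feature lies outside its training range,
    widened on each side by a fraction `slack` of the range width."""
    for value, (low, high) in zip(row, ranges):
        margin = slack * (high - low)
        if value < low - margin or value > high + margin:
            return True
    return False

# Made-up training rows with two numeric features.
training_rows = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]]
ranges = fit_ranges(training_rows)

print(violates_contract([1.0, 15.0], ranges))  # resembles training data
print(violates_contract([5.0, 15.0], ranges))  # first feature is out of range
```

A caller can refuse to predict, or route the input to a human, when the check fails, rather than passing nonsense downstream.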
  22. 22. Platonic Form Data, as we act like it is… Plutonic Form …as it is.
  23. 23. NLP {broken: 3, “blue screen”: 2, ...} computer vision {eyes: [{“location”: [21,13], “bounding”: [...]}]..} metadata Sparse Dense {num_pages: 12, channel: “email”...} Nested 3rd party {author_klout: 34.0, ...} Missing/Noisy timeseries [2014-12-01T12:03:12, 2014-12-01T12:05:12] Streaming Real Data != Benchmark Data
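Records like the ones on this slide are nested, sparse, and partly missing, so they usually need flattening before a learner can consume them. The following is a minimal sketch of one possible approach: nested dictionaries become dotted keys and missing values are dropped. The `flatten` helper and the grouping of the slide's sample values into one record are assumptions for illustration.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into dotted keys, dropping missing (None) values.
    Non-dict values such as lists are kept as they are."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif value is not None:
            flat[name] = value
    return flat

# Sample values echo the slide; the nesting and the None entry are made up.
record = {
    "nlp": {"broken": 3, "blue screen": 2},
    "metadata": {"num_pages": 12, "channel": "email"},
    "third_party": {"author_klout": 34.0, "region": None},
}
print(flatten(record))
```

The result is a sparse feature dictionary: each record only carries the keys it actually has, which is one reason benchmark-style dense tables are a poor stand-in for real data.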
  24. 24. Seismology Neuroscience Klein et al. Astronomy
  25. 25. http://mltsp.io pip install mltsp ML tsp. Machine Learning Time-Series Platform R. Allen M. Silver F. Pérez JSB Domain scientists Astro Seismo Neuro Funding bodies S. van der Walt A. Crellin-Quick Comp/Stat/Eng An open-source web platform for distributed time-series analysis → • Selection of sophisticated feature extraction algorithms • Distributed computation • Sandboxed execution of custom code
  26. 26. Flask CLI (under development) REST /learn /upload UI Disco W1 W2 Wn Disco worker pool datastore DB Demo!
  27. 27. MLTSP Continuous Integration github.com/drone github.com/mltsp/mltsp Test Container with MLTSP Custom Feature Extractor Sandbox Worker Pull request triggers webhook Workers - Disco SSH Drone calls GitHub status API
  28. 28. http://bigmacc.info Results from MLTSP The Astrophysical Journal Supplement Series, 203:32 (27pp), 2012 December Published Work before MLTSP MVP: Reproduce main results of a scientific paper
  29. 29. Probabilistic Classification of Variable Stars Shivvers, JSB, Richards MNRAS, 2014 106 “DEB” candidates 12 new mass-radii 15 “RCB/DYP” candidates 8 new discoveries Triple # of Galactic DYPer Stars Miller, Richards, JSB,.. ApJ 2012 5400 Spectroscopic Targets Miller, JSB, Richards,.. ApJ 2015 Turn synoptic imagers into ~spectrographs
  30. 30. WISE SUPPORT Support Ticket (DATE, FROM, SUBJECT, DESCRIPTION) TIER 1 AUTOMATED RESPONSE CUSTOMER FROM COMPLEXITY TO CLARITY INTELLIGENT ROUTING RECOMMENDED RESPONSE AUTOMATED REPLY Wise Support: >30% faster avg. response time; more consistent answers; faster scaling of support teams
  31. 31. Fault Tolerant ML augmentation vs. full automation Random forest prediction of body segment in Xbox Kinect gmail
  32. 32. https://www.reddit.com/r/funny/comments/3e7gy4/yes_netflix_because_my_6_year_old_will_enjoy_the/ “Yes Netflix, because my 6 year old will enjoy the animated fun of Sons of Anarchy”
  33. 33. “Machine learning disrupts software engineering. [So] What should be the machine learning engineering process?” - Leon Bottou (Facebook)
  34. 34. ỉπ vs. (or “Data Science is a Team Sport”) deep domain skill/knowledge/training deep methodological knowledge/skill deep domain or methodological skill/knowledge/training strong methodological or domain knowledge/skill Goal: empower teams of gamma’s to excel ML Systems: It Takes a Village
  35. 35. Parting Thoughts ‣ Novel testing can strengthen abstractions within components, and contracts between ‣ Machine Learning Systems require optimizations across components - so we’d better understand the true loss function ‣ (End user) fault tolerance is a must ‣ Build ML into Systems because you have to…
  36. 36. Area Man Bites off more than he can chew PyData 2014
  37. 37. Thanks! @profjsb A Systems View of Machine Learning in science & industry
