2. What is Machine Learning?
"Machine Learning is a field of study that gives computers the ability to learn without
being explicitly programmed" - Arthur Samuel, 1959
"A computer program is said to learn from experience E with respect to some class of
tasks T and performance measure P, if its performance at tasks in T, as measured by
P, improves with the experience E." - Tom M. Mitchell.
Supervised learning: model Y = f(x) to match data (x, y) (see the sketch after this list)
• Regression
• Classification
• Parametric models
  • Linear models
  • Polynomial models
  • Logistic models
  • Neural network models, including convolutional neural networks
• Non-parametric models
  • Kernel ridge regression
  • Decision trees
  • Gaussian process regression
  • Kernel SVM
Unsupervised learning
• Clustering
• Dimensionality reduction
• Autoencoders
Reinforcement learning
• Robotics, etc.
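A minimal sketch (assuming scikit-learn and synthetic data; the model choices and hyperparameters are illustrative, not from the talk) of fitting the same (x, y) data with a parametric linear model and a non-parametric kernel ridge model:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.kernel_ridge import KernelRidge

    # Synthetic (x, y) data: y = f(x) + noise, with a nonlinear f
    rng = np.random.RandomState(0)
    X = np.sort(rng.uniform(-3, 3, size=(100, 1)), axis=0)
    y = np.sin(X).ravel() + 0.1 * rng.randn(100)

    # Parametric model: a straight line described by two parameters
    linear = LinearRegression().fit(X, y)

    # Non-parametric model: kernel ridge regression with an RBF kernel
    krr = KernelRidge(kernel="rbf", alpha=0.1, gamma=0.5).fit(X, y)

    print("linear R^2:      ", round(linear.score(X, y), 2))
    print("kernel ridge R^2:", round(krr.score(X, y), 2))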
4. Pitfall #1: Overfitting
5. Pitfall #1: Overfitting
Check for overfitting with cross-validation: look at the gap between performance on the training data and performance on the test data. It should be as small as possible.
If the gap is large, add regularization and/or change the hyperparameters.
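A minimal sketch of this check, assuming scikit-learn and synthetic data (the model and the alpha values are illustrative assumptions):

    from sklearn.datasets import make_regression
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.model_selection import cross_validate

    X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

    # Compare training vs. test scores across folds; a large gap suggests overfitting.
    for alpha in [1e-9, 1.0]:
        model = KernelRidge(kernel="rbf", alpha=alpha)
        cv = cross_validate(model, X, y, cv=5, return_train_score=True)
        print(f"alpha={alpha:g}  train R^2={cv['train_score'].mean():.2f}  "
              f"test R^2={cv['test_score'].mean():.2f}")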
6. Pitfall #1: Overfitting
Always show your test data score and your training data score
(Figure: total error decomposed into error from bias and error from overfitting.)
7. What is bias?
Meanings of the term “bias”:
• Statistical bias: the “bias” part of the error term, from the model not being the true model
• Biased training data
  • Training data collected in a biased way
  • Target signal leaks into the data
• Social bias: when the ML system does things that are against our values
The last two are closely related.
8. Statistical bias
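As a reference point (a standard result, not specific to this talk), statistical bias is the systematic part of a model's error, and the expected squared prediction error at a point x decomposes as

    \mathbb{E}\left[(y - \hat{f}(x))^2\right]
        = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
        + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
        + \sigma^2

where f is the true function, \hat{f} is the fitted model (random through the training sample), and \sigma^2 is the irreducible noise.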
9. Biased training data
The famous “tank story”
10. Biased training data
Should we be telling the tank story?
Gwern (https://www.gwern.net/Tanks) argues not, since:
• The story is often described as fact when there is no evidence it actually happened; higher epistemic rigor should be demanded.
• “The tank story tends to promote complacency and underestimation of the state of the art.”
Yet the story is most likely based on real research done in the 1960s on identifying tanks in aerial photos.[1] However, the published research corrects for brightness levels by applying a Laplacian filter to the images.
1. L. N. Kanal and N. C. Randall. 1964. Recognition system design by statistical analysis. In Proceedings of the 1964 19th ACM National Conference (ACM '64).
11. Biased training data
A better story:
In the 1990s, the Cost-Effective HealthCare (CEHC) project funded a study to see if ML could predict the risk of death for patients with pneumonia.
The most accurate model was a multitask neural net, with an AUC of 0.86, compared to 0.77 for logistic regression.
The system was almost fielded, but the researchers felt it was risky to put a black-box model into production without knowing how it was working. So they trained a rule-based learning system on the same data. It had lower accuracy, but was highly transparent. One rule it learned was:
HasAsthma(x) => LowerRisk(x)
The rule reflects a real pattern in the training data: pneumonia patients with asthma were routinely admitted straight to intensive care and treated aggressively, so their recorded outcomes were better, even though asthma actually raises risk. Acting on the rule would have been dangerous.
Cooper et al., Predicting dire outcomes of patients with community acquired pneumonia, Journal of Biomedical Informatics, vol. 38, no. 5, pp. 347-366, 2005.
Caruana et al., 2015. Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
https://vimeo.com/125940125
12. Bias
Problems of bias in social applications
Kate Crawford (NIPS 2017 keynote) identifies two types of harms:
Harms of allocation
• Discrimination in products & services
  • Mortgage approval
  • Parole granting
  • Insurance rates
Harms of representation
• More subtle
• Perpetuation of social inequalities and stereotypes we don’t want to be perpetuated
• Misrepresentation of sensitive topics like personal and group identity
13. Bias
Examples of harms of allocation
Datta, Amit, Michael Carl Tschantz, and Anupam Datta. "Automated Experiments on Ad Privacy Settings." Proceedings on Privacy Enhancing Technologies 2015.1 (2015): 92-112.
14. Bias
Examples of harms of representation
15. Bias
Examples of harms of representation
A newly published study (Buolamwini and Gebru, "Gender Shades," 2018) found high gender-classification error rates for dark-skinned women: 21% for Microsoft's system and 35% for IBM's, versus less than 1% error for white males. (Reported in The New York Times.)
16. Bias
Examples of harms of representation
This problem was fixed by Google in Dec. 2016
17. Bias
Examples of harms of representation
Sweeney, L. Discrimination in Online Ad Delivery. Communications of the ACM, vol. 56, no. 5, pp. 44-54 (2013). http://arxiv.org/abs/1301.6822
18. Bias
Examples of harms of representation
19. Bias
Implicit gender bias in word2vec
Bolukbasi et al., "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings," arXiv:1607.06520 (2016)
Gender associations judged to be biased: midwife:doctor, sewing:carpentry, registered_nurse:physician
Judged not biased: feminine:manly, convent:monastery, handbag:briefcase, etc.
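A hedged sketch of probing these analogy directions, assuming gensim and its downloadable pretrained GoogleNews word2vec vectors (a large download; the word pairs are illustrative):

    import gensim.downloader as api

    # Pretrained word2vec vectors, the kind of embedding studied by Bolukbasi et al.
    kv = api.load("word2vec-google-news-300")

    # "man is to doctor as woman is to ?"
    print(kv.most_similar(positive=["woman", "doctor"], negative=["man"], topn=3))

    # "she is to he as midwife is to ?"
    print(kv.most_similar(positive=["he", "midwife"], negative=["she"], topn=3))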
20. Parole granting case
Consider the COMPAS algorithm used to inform parole decisions. It uses 100 variables/features/risk factors; race is not explicitly considered.
It was widely reported in the media as being biased because it assigned high-risk scores to black defendants at a much higher rate than to white defendants.
Yet, conditional on risk factors considered legitimate (such as the number of prior convictions), the system did not exhibit bias between white and black defendants (Corbett-Davies et al., 2017).
Trying to make the algorithm 'fair' has a real social cost, quantified by Corbett-Davies et al.
Debiasing demo at https://research.google.com/bigpicture/attacking-discrimination-in-ml/
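A hedged sketch (entirely hypothetical data, not COMPAS data) of the kind of calibration check Corbett-Davies et al. discuss: within each risk-score bin, compare observed reoffense rates across groups.

    import pandas as pd

    # Hypothetical records: group label, assigned risk score, observed outcome
    df = pd.DataFrame({
        "group":      ["A", "B"] * 300,
        "risk_score": [1, 1, 5, 5, 9, 9] * 100,
        "reoffended": [0, 0, 1, 1, 1, 1] * 100,
    })

    # Observed reoffense rate per score bin, per group; similar columns
    # indicate the score is calibrated across groups.
    calibration = (df.groupby(["risk_score", "group"])["reoffended"]
                     .mean()
                     .unstack("group"))
    print(calibration)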
21. Simpson's paradox (aka confounding)
Stanford Admissions
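A small, purely hypothetical illustration of the paradox (the numbers are invented, not from the talk): one group has the higher admit rate within every department, yet the lower admit rate overall.

    import pandas as pd

    df = pd.DataFrame({
        "dept":     ["X", "X", "Y", "Y"],
        "group":    ["men", "women", "men", "women"],
        "applied":  [100, 20, 20, 100],
        "admitted": [60, 13, 4, 25],
    })

    # Within each department, women have the higher admit rate...
    df["rate"] = df["admitted"] / df["applied"]
    print(df)

    # ...but aggregated over departments, men do; the confounder is which
    # department each group mostly applied to.
    overall = df.groupby("group")[["admitted", "applied"]].sum()
    print(overall["admitted"] / overall["applied"])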
22. Pitfall – not cleaning your data
The “Schenectady Problem”: “the fallacies of self-reported data”
https://s6.io/schenectady-12345/
ZIP code 12345 belongs to Schenectady, NY (General Electric's ZIP code), so placeholder entries of “12345” in web forms make Schenectady appear wildly over-represented in self-reported data.
23. Pitfall - Not normalizing your data
Kernel methods are based on the distance between points: if one feature (dimension) is very large, it will dominate.
Normalization also helps speed up optimization of models by removing “long valleys” in the cost function.
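A minimal sketch (scikit-learn, synthetic data; the model choice is illustrative) of how standardizing the features changes a distance-based model when one feature lives on a much larger scale:

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVR

    rng = np.random.RandomState(0)
    X = rng.randn(200, 2)
    X[:, 1] *= 1000.0                # second feature on a much larger scale
    y = X[:, 0] + X[:, 1] / 1000.0   # both features matter equally

    raw    = SVR(kernel="rbf")
    scaled = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
    print("unscaled R^2:", cross_val_score(raw, X, y, cv=5).mean())
    print("scaled   R^2:", cross_val_score(scaled, X, y, cv=5).mean())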
24. Simpson's paradox
25. Pitfall – not comparing with baseline predictors
Scikit-learn contains a dummy regressor, which just returns the mean of y, as well as dummy classifiers.
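A minimal sketch (scikit-learn, synthetic data; the "real" model is an arbitrary choice) of comparing against such a trivial baseline:

    from sklearn.datasets import make_regression
    from sklearn.dummy import DummyRegressor
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    X, y = make_regression(n_samples=300, n_features=10, noise=20.0, random_state=0)

    baseline = DummyRegressor(strategy="mean")     # always predicts mean(y)
    model    = RandomForestRegressor(random_state=0)

    print("baseline R^2:", cross_val_score(baseline, X, y, cv=5).mean())  # ~0
    print("model    R^2:", cross_val_score(model, X, y, cv=5).mean())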
26. Pitfall - Not normalizing your data
It’s sometimes important to normalize the target variable as well.
log(y) or logistic(y) can be used to “squash” the data into a narrower range of values.
(Figure: results for Kernel Ridge Regression, Random Forest, and Support Vector Regression.)
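A minimal sketch (scikit-learn, synthetic skewed targets; the choice of kernel ridge and of a log transform is illustrative) of squashing the target before fitting:

    import numpy as np
    from sklearn.compose import TransformedTargetRegressor
    from sklearn.datasets import make_regression
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.model_selection import cross_val_score

    X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)
    y = np.exp(y / 100.0)            # heavily skewed, strictly positive target

    plain  = KernelRidge(alpha=1.0)
    logged = TransformedTargetRegressor(regressor=KernelRidge(alpha=1.0),
                                        func=np.log, inverse_func=np.exp)

    print("raw target R^2:", cross_val_score(plain, X, y, cv=5).mean())
    print("log target R^2:", cross_val_score(logged, X, y, cv=5).mean())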
27. Pitfall: trying to extrapolate
What’s the next number in this
sequence?
1, 3, 5, 7, ?
Correct solution: 217,341
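A hedged numpy sketch of the joke: a quartic polynomial passes exactly through 1, 3, 5, 7 and then 217,341, so fitting the observed points by itself tells you nothing about how the curve continues.

    import numpy as np

    x = np.array([1, 2, 3, 4, 5])
    y = np.array([1, 3, 5, 7, 217341])

    # The unique degree-4 polynomial through these five points: it matches
    # 1, 3, 5, 7 exactly and still gives 217,341 as the "next" value.
    coeffs = np.polyfit(x, y, deg=4)
    print(np.round(np.polyval(coeffs, x)))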
28. Pitfall: trying to extrapolate
Sometimes you get lucky, but typically nonlinear machine learning models do not extrapolate beyond the range of their training data.
29. Is “data science” a “science”?
Technically, yes. Data scientists generally follow the scientific method:
• They collect data.
• They create a “hypothesis” (the model to be fit).
• They see whether the model can fit the data. If it doesn't, some parameters are tweaked.
• Eventually, they test the model on held-out test data.
• If the model works, it goes into production (“becomes a theory”).
But are ML models falsifiable? ... Sort of(?)
30. ML models are “bad explanations”
https://www.ted.com/talks/david_deutsch_a_new_way_to_explain_explanation
Good explanations of the world cannot easily be changed to accommodate new data.
31. One way of looking at it…
32. Meta lesson – don't be arrogant!
“With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” - John von Neumann
When you do regression, even with deep learning, typically all you are really doing is curve fitting! Some ML can be recast as data compression. You are not coming up with good explanations of what is happening.
33. The End
Thanks for listening!