8. Practical Machine Learning
• Your model makes unacceptably large errors on new data. What should you do next?
• Collect more training samples
• Reduce the number of features
• Increase the number of features
• Add regularization
• Use a bigger model
• Tune hyper-parameters
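One of the options above, regularization, can be sketched concretely. This is an illustrative example (not from the deck): ridge regression, where an L2 penalty shrinks the learned weights, trading a little training-set fit for better behavior on new data.

```python
import numpy as np

# Illustrative sketch: ridge (L2-regularized) linear regression via the
# closed-form solution w = (X^T X + lam*I)^{-1} X^T y.
# Larger lam shrinks the weights toward zero.
def ridge_fit(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic data: only the first two features actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, 2.0, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=50)

w_unreg = ridge_fit(X, y, lam=0.0)
w_reg = ridge_fit(X, y, lam=10.0)
# The regularized solution has a smaller weight norm than the plain fit.
print(np.linalg.norm(w_reg) < np.linalg.norm(w_unreg))  # True
```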
9. Bias vs. Variance
[Figure: three fits of f(x) to the same data points — left: high bias (underfit), middle: "just right", right: high variance (overfit)]
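The three regimes in the figure can be reproduced numerically. A hedged sketch (the data and degrees are illustrative choices, not from the deck): fit polynomials of degree 1, 3, and 15 to noisy cubic data and compare training error against error on held-out points.

```python
import numpy as np

# Illustrative sketch: underfit (degree 1), good fit (degree 3),
# and overfit (degree 15) on data generated from a cubic.
rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    return x, x**3 - x + 0.05 * rng.normal(size=n)

x_tr, y_tr = make_data(30)   # small training set
x_te, y_te = make_data(200)  # held-out points

def mse(deg):
    coef = np.polyfit(x_tr, y_tr, deg)
    return (np.mean((np.polyval(coef, x_tr) - y_tr) ** 2),
            np.mean((np.polyval(coef, x_te) - y_te) ** 2))

for deg in (1, 3, 15):
    tr, te = mse(deg)
    print(f"degree {deg:2d}: train MSE {tr:.4f}  test MSE {te:.4f}")
```

Training error only goes down as the degree grows, but held-out error is worst for the underfit line and bottoms out near the true degree.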
15. Bias vs. Variance – Machine Learning perspective
• Optimal error rate (e.g. Bayes rate, best human error)
• Training error (training − optimal ≈ bias)
• Validation error (validation − training ≈ variance)
[Data split: Training | Validation | Test]
• Optimal 1% / Training 5% / Validation 6% → high bias
• Optimal 1% / Training 2% / Validation 6% → high variance
• Optimal 1% / Training 5% / Validation 10% → high bias and high variance
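The gap arithmetic above is mechanical enough to write down. A small sketch (the 2% threshold is an illustrative choice, not from the deck):

```python
# Illustrative sketch: label the dominant problem from three error rates
# (given as fractions).  `tol` is an arbitrary "this gap is big" threshold.
def diagnose(optimal, train, val, tol=0.02):
    bias = train - optimal    # avoidable bias: training error above optimal
    variance = val - train    # generalization gap: validation above training
    labels = []
    if bias > tol:
        labels.append("high bias")
    if variance > tol:
        labels.append("high variance")
    return bias, variance, labels or ["just right"]

# The three examples from the slide:
print(diagnose(0.01, 0.05, 0.06))  # high bias
print(diagnose(0.01, 0.02, 0.06))  # high variance
print(diagnose(0.01, 0.05, 0.10))  # both
```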
20. Data from different distributions/domains
Training: 50-hour conversational speech; Test: 10-hour call-center speech
[Data split: Training | Train-Val | Val | Test]
• Optimal error rate (e.g. Bayes rate, best human error): 1%
• Training error: 5% (gap above optimal → bias)
• Train-Val error: 6% (gap above training → variance)
• Validation error: 10% (gap above train-val → Train-Test mismatch)
• Test error: 20% (gap above validation → overfitting of Val)
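With the extra Train-Val split, each consecutive gap in the error chain points at one problem. A hedged sketch extending the earlier diagnosis (the threshold and label strings are illustrative):

```python
# Illustrative sketch: with four evaluation sets, each consecutive gap in
# optimal -> train -> train-val -> val -> test points at one problem.
def diagnose_domains(optimal, train, train_val, val, test, tol=0.02):
    gaps = {
        "bias": train - optimal,
        "variance": train_val - train,
        "train-test mismatch": val - train_val,
        "overfitting of val": test - val,
    }
    return [name for name, gap in gaps.items() if gap > tol]

# Numbers from the slide: 1% optimal, 5% train, 6% train-val, 10% val, 20% test
print(diagnose_domains(0.01, 0.05, 0.06, 0.10, 0.20))
```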
28. Workflow (courtesy of Andrew Ng)
• Training error high? Yes → bigger model, train longer, new model architecture
• No → Train-Val error high? Yes → more data, regularization, new model architecture
• No → Val error high? Yes → more data similar to test, data synthesis, new model architecture
• No → Test error high? Yes → more validation data
• No → Done
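The flowchart above is a straight chain of decisions, so it translates directly into code. A sketch (the 5% threshold is an illustrative stand-in for "high"):

```python
# Illustrative sketch of the workflow: return the first recommended set of
# fixes, checking the error rates in the same order as the flowchart.
def next_steps(train_err, train_val_err, val_err, test_err, high=0.05):
    if train_err > high:
        return ["bigger model", "train longer", "new model architecture"]
    if train_val_err > high:
        return ["more data", "regularization", "new model architecture"]
    if val_err > high:
        return ["more data similar to test", "data synthesis",
                "new model architecture"]
    if test_err > high:
        return ["more validation data"]
    return ["done"]

print(next_steps(0.10, 0.12, 0.15, 0.18))  # first gate fires: fix bias first
print(next_steps(0.01, 0.02, 0.03, 0.04))  # all gates pass: done
```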
31. Learning curves
[Figure: training and validation error as functions of training-set size]
• High bias: train and validation error converge at a high level; getting more data likely doesn't help much
• High variance: a large gap persists between train and validation error; getting more data is likely to help
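A learning curve can be computed without plotting. A hedged sketch (synthetic linear-regression data, illustrative sizes): for each training-set size, fit least squares and record train vs. validation MSE; the gap between the two is what the curves above visualize.

```python
import numpy as np

# Illustrative sketch: numeric learning curve for plain least squares.
# A shrinking train/validation gap as n grows is the "more data helps
# variance" story from the slide.
rng = np.random.default_rng(2)

def learning_curve(sizes, X, y, X_val, y_val):
    curve = []
    for n in sizes:
        # Fit on the first n training points only.
        w = np.linalg.lstsq(X[:n], y[:n], rcond=None)[0]
        curve.append((np.mean((X[:n] @ w - y[:n]) ** 2),
                      np.mean((X_val @ w - y_val) ** 2)))
    return curve

d = 10
w_true = rng.normal(size=d)
X = rng.normal(size=(200, d))
y = X @ w_true + 0.1 * rng.normal(size=200)
X_val = rng.normal(size=(500, d))
y_val = X_val @ w_true + 0.1 * rng.normal(size=500)

sizes = (15, 50, 200)
curve = learning_curve(sizes, X, y, X_val, y_val)
for n, (tr, va) in zip(sizes, curve):
    print(f"n={n:3d}: train MSE {tr:.4f}  val MSE {va:.4f}")
```

At n = 15 the model (10 parameters) nearly memorizes its training set, so the validation gap is large; by n = 200 the two errors have almost met near the noise floor.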
35. Working with imbalanced datasets
• Change your performance metric (e.g. F1 score instead of accuracy)
• Customize the objective function (e.g. weight the minority class more heavily)
• Data:
• Oversampling/Undersampling
• Synthesize minority-class samples (e.g. SMOTE)
• Buy more data
• Algorithms:
• Bagging
• New/other models
• Take a different perspective, e.g. anomaly detection
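The first bullet is easy to demonstrate. A small sketch (synthetic 99:1 labels, illustrative numbers): a classifier that always predicts the majority class looks excellent on accuracy but useless on F1.

```python
# Illustrative sketch: why accuracy misleads on imbalanced data.
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred):
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision/recall are both zero
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 1 positive among 100 examples; predict the majority class everywhere.
y_true = [1] * 1 + [0] * 99
always_majority = [0] * 100

print(accuracy(y_true, always_majority))  # 0.99
print(f1(y_true, always_majority))        # 0.0
```

The degenerate classifier scores 99% accuracy while never detecting a single positive, which is exactly what the F1 score of 0 exposes.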