Practical Machine Learning
• Your model makes unacceptably large errors on new data. What to do next?
• Collect more training samples
• Reduce number of features
• Increase number of features
• Regularization
• Bigger Model
• Hyper-parameter tuning
Bias vs. Variance
[Figure: three fits of f(x) vs. x: high bias (underfit), "just right", high variance (overfit)]
Bias vs. Variance – Machine Learning perspective
• Optimal error rate (e.g. Bayes rate, best human error)
• Training error
• Validation error

Data split: Training | Validation | Test

                    High bias   High variance   Both
Optimal error          1%            1%          1%
Training error         5%            2%          5%
Validation error       6%            6%         10%

Bias = training error - optimal error; Variance = validation error - training error.
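This decomposition is mechanical enough to write down. A minimal sketch in Python (the function name and the percent-as-integer convention are my own):

```python
def diagnose(optimal, train, val):
    """Split error rates (in whole percent) into bias and variance.

    Bias     = training error - optimal (Bayes) error
    Variance = validation error - training error
    """
    return {"bias": train - optimal, "variance": val - train}

# The three cases from the table above:
print(diagnose(1, 5, 6))   # high bias:     {'bias': 4, 'variance': 1}
print(diagnose(1, 2, 6))   # high variance: {'bias': 1, 'variance': 4}
print(diagnose(1, 5, 10))  # both:          {'bias': 4, 'variance': 5}
```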
Data from different distributions/domains
• Training data: 50-hour conversational speech; Test data: 10-hour call-center speech
• Split: Train | Train-Val | Val | Test (Train-Val is a held-out slice of the training distribution)

Example error chain:
• Optimal error rate (e.g. Bayes rate, best human error): 1%
• Training error: 5% (gap to optimal = bias)
• Train-Val error: 6% (gap to training = variance)
• Validation error: 10% (gap to Train-Val = train-test mismatch)
• Test error: 20% (gap to validation = overfitting of Val)
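With the extra Train-Val split, each successive gap gets its own label. A sketch extending the same idea (names are my own):

```python
def diagnose_with_mismatch(optimal, train, train_val, val, test):
    """Attribute each successive error gap (whole percent) to one cause."""
    return {
        "bias": train - optimal,                 # model underfits
        "variance": train_val - train,           # overfits the training set
        "train_test_mismatch": val - train_val,  # distributions differ
        "val_overfitting": test - val,           # tuned too hard on Val
    }

# Error chain from the example above: 1% / 5% / 6% / 10% / 20%
print(diagnose_with_mismatch(1, 5, 6, 10, 20))
# {'bias': 4, 'variance': 1, 'train_test_mismatch': 4, 'val_overfitting': 10}
```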
Workflow (courtesy of Andrew Ng)
Training error high?
  Yes → Bigger model / Train longer / New model architecture
  No ↓
Train-Val error high?
  Yes → More data / Regularization / New model architecture
  No ↓
Val error high?
  Yes → More data similar to test / Data synthesis / New model architecture
  No ↓
Test error high?
  Yes → More validation data
  No → Done
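The flowchart is easy to encode as a top-down check. A sketch (function name and return format are my own):

```python
def next_actions(train_high, train_val_high, val_high, test_high):
    """Walk the decision flow above and return the suggested remedies."""
    if train_high:
        return ["bigger model", "train longer", "new model architecture"]
    if train_val_high:
        return ["more data", "regularization", "new model architecture"]
    if val_high:
        return ["more data similar to test", "data synthesis",
                "new model architecture"]
    if test_high:
        return ["more validation data"]
    return ["done"]

print(next_actions(False, True, False, False))
# ['more data', 'regularization', 'new model architecture']
```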
Learning curves
[Figure: error vs. amount of training data, with Train and Validation curves]
• High bias: the Train and Validation curves converge at a high error. Getting more data likely doesn't help much.
• High variance: a large gap remains between the Train and Validation curves. Getting more data is likely to help.
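The same read-off can be automated once the two curves are measured. A minimal sketch, assuming errors are fractions and the gap threshold is a judgment call:

```python
def more_data_likely_helps(train_errs, val_errs, gap_tol=0.02):
    """Inspect the tail of a learning curve.

    A small remaining train/validation gap suggests high bias (more data
    won't help much); a large gap suggests high variance (more data
    likely helps).
    """
    gap = val_errs[-1] - train_errs[-1]
    return gap > gap_tol

# High bias: curves already converged near 15% error
print(more_data_likely_helps([0.15, 0.145, 0.14], [0.17, 0.155, 0.15]))  # False
# High variance: a wide gap persists
print(more_data_likely_helps([0.02, 0.03, 0.04], [0.25, 0.18, 0.14]))   # True
```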
Working with imbalanced datasets
• Change your performance metric (e.g. F1 score instead of Accuracy)
• Customize objective function
• Data:
• Oversampling/Undersampling
• Synthesize minority class (e.g. SMOTE)
• Buy more data
• Algorithms:
• Bagging
• New/Other models
• Different perspective, e.g. anomaly detection
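The first bullet is easy to see with numbers. A self-contained sketch (helper names are mine) comparing accuracy and F1 on a 99:1 class split:

```python
def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def f1(tp, fp, fn):
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 990 negatives, 10 positives. Always predicting "negative" scores
# 99% accuracy while catching no positives at all:
print(accuracy(0, 0, 990, 10), f1(0, 0, 10))  # 0.99 0.0
# A model catching half the positives has the same accuracy, better F1:
print(accuracy(5, 5, 985, 5), f1(5, 5, 5))    # 0.99 0.5
```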
Dirty work drives progress
