Legal Analytics Course - Class 6 - Overfitting, Underfitting, & Cross-Validation - Professor Daniel Martin Katz + Professor Michael J Bommarito

  1. 1. Class 6 Overfitting, Underfitting, & Cross-validation Legal Analytics Professor Daniel Martin Katz Professor Michael J Bommarito II legalanalyticscourse.com
  2. 2. Model Fit access more at legalanalyticscourse.com
  3. 3. We are interested in how well a given model performs access more at legalanalyticscourse.com
  4. 4. both on existing data access more at legalanalyticscourse.com
  5. 5. Underfitting occurs when a statistical model or algorithm cannot capture the underlying trend of the data access more at legalanalyticscourse.com
  6. 6. an underfit model has low variance, high bias access more at legalanalyticscourse.com
  7. 7. Overfitting occurs when a statistical model or algorithm captures the noise of the data (as opposed to the signal) access more at legalanalyticscourse.com
  8. 8. an overfit model has low bias, high variance access more at legalanalyticscourse.com
  9. 9. Model Fit is all about generalization access more at legalanalyticscourse.com
  10. 10. Underfitting/Overfitting The challenge of generalization
  11. 11. Why is generalization hard? Learning, machine or otherwise, looks something like this: • We are presented with a view of objects in the world. • We encode aspects of these objects, e.g., colors, into “features.” • We generalize from patterns in these features to statements about objects. Example: • We spend a summer on Michigan lakes and see many animals. All swans that we see are white. We generalize from this sample to the statement that all swans are white. What went wrong? Mathematically speaking, we did not observe enough variance in our observed sample; in fact, our observed variance for the color feature was zero! access more at legalanalyticscourse.com
  12. 12. Underfitting Zero variance in our observed sample led to a model with a constant predicted value; this model underfits the true variance of swans. Underfitting is, in essence, model simplification or ignorance of signal. Underfit models may perform well on modal data, but they typically struggle with lower-frequency or more complex cases. Underfitting can occur for a number of reasons: • The model is too simple for the actual system. Technically speaking, either the model does not contain enough parameters or the functional forms are not capable of spanning the true functions. • The number of records or variance of the records does not provide the learning process with enough information. access more at legalanalyticscourse.com
  13. 13. Underfitting Let’s look at a simple example – fitting a quadratic equation with a linear function. Quadratic functions look like this: y = a x^2 + b x + c A function is therefore defined by supplying three parameters: a, b, and c. To make this realistic, let’s add some simple N(0,1) random errors, giving us the form: y = a x^2 + b x + c + e where e is distributed N(0,1). access more at legalanalyticscourse.com
  14. 14. Underfitting Example: y = x^2 + 2 x + 1 + e access more at legalanalyticscourse.com
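A minimal Python sketch (not from the slides) of generating data of this form, assuming NumPy is available; the sample size, random seed, and the (-4, 4) range of x are illustrative choices, with the range taken from the later overfitting discussion:

```python
# Hypothetical illustration: generate y = x^2 + 2x + 1 + e with e ~ N(0, 1).
import numpy as np

rng = np.random.default_rng(0)                    # fixed seed for reproducibility
x = np.linspace(-4, 4, 50)                        # training range used in the slides
e = rng.normal(loc=0.0, scale=1.0, size=x.shape)  # N(0,1) noise term
y = x**2 + 2 * x + 1 + e                          # true quadratic signal plus noise
```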
  15. 15. Underfitting What happens if we try to fit a model to this data? First, let’s start with a simple linear function, i.e., linear regression. Our linear form looks like this: y = a x + b + e A model is therefore defined by supplying two parameters: a and b. access more at legalanalyticscourse.com
  16. 16. Underfitting Example: y = 1.94 x + 6.62 access more at legalanalyticscourse.com
  17. 17. Underfitting This linear model clearly does not capture the non-linear relationship between x and y. However, no combination of a and b will successfully match this across all x, since a linear model is just too simple to represent a non-linear relationship. Linear models have too few parameters to fit non-linear relationships; thus, they will typically underfit them. (fit quadratic model below) access more at legalanalyticscourse.com
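As a rough sketch of the comparison above (an illustration, not the authors' code), NumPy's least-squares polynomial fit can be used to fit both the two-parameter linear model and the three-parameter quadratic model to data generated as in the earlier sketch:

```python
# Fit a degree-1 (linear) and degree-2 (quadratic) polynomial by least squares
# and compare their sums of squared residuals on the same quadratic data.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-4, 4, 50)
y = x**2 + 2 * x + 1 + rng.normal(size=x.shape)   # same data as the sketch above

lin_coeffs = np.polyfit(x, y, deg=1)    # two parameters: slope and intercept
quad_coeffs = np.polyfit(x, y, deg=2)   # three parameters: a, b, c

lin_sse = np.sum((y - np.polyval(lin_coeffs, x)) ** 2)
quad_sse = np.sum((y - np.polyval(quad_coeffs, x)) ** 2)

# The linear fit leaves large residuals near the ends of the range (underfitting),
# while the quadratic fit recovers coefficients close to the true (1, 2, 1).
print("linear SSE:", lin_sse, " quadratic SSE:", quad_sse)
print("quadratic coefficients:", quad_coeffs)
```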
  18. 18. Overfitting Overfitting is the opposite of underfitting, and it occurs when a model codifies noise into the structure. Overfitting may occur for a number of reasons: • Models that are much more complex than the underlying data, either in terms of functional form or number of parameters. • Learning that is too focused on minimizing the loss function for a single training sample. access more at legalanalyticscourse.com
  19. 19. Overfitting Let’s return to our quadratic example from before. As we discussed, our quadratic data was generated by a model with three parameters: a, b, and c. When we tried to explain the data with just two parameters, the resulting model underfit the data and did a poor job. When we tried to explain the data with three parameters, the resulting model did an excellent job of fitting the data. What happens if we try to explain the data with eight parameters? access more at legalanalyticscourse.com
  20. 20. Overfitting First, let’s focus on the portion of data that we saw in our training set before – the range where x lies between -4 and 4. At first blush, it looks like we’ve done an excellent job. Compared to our three-parameter quadratic fit, we have done an even better job of reducing the sum of our squared residuals. Why not always use more parameters? access more at legalanalyticscourse.com
  21. 21. Overfitting But what happens if we look outside of this (-4, 4) range? It turns out that we’ve committed two common overfitting mistakes: • Our model is much more complex than the underlying data. Quadratic relationships are built on three parameters, whereas our model uses eight. When we minimized our loss function, the extra five parameters were used to fit to noise, not signal! • Our model was trained on a very narrow sample of the world. While we do an excellent job of predicting values between -4 and 4, we do a very poor job outside of this range. access more at legalanalyticscourse.com
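A sketch of that overfitting experiment under the same assumptions (a degree-7 polynomial supplies the eight parameters mentioned above); the wider evaluation range is an illustrative choice:

```python
# Fit an eight-parameter (degree-7) polynomial inside the training range, then
# evaluate it against the true quadratic outside that range.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-4, 4, 50)
y = x**2 + 2 * x + 1 + rng.normal(size=x.shape)

over_coeffs = np.polyfit(x, y, deg=7)     # eight parameters for quadratic data
quad_coeffs = np.polyfit(x, y, deg=2)     # the three-parameter fit

x_wide = np.linspace(-8, 8, 200)          # includes points outside (-4, 4)
true_wide = x_wide**2 + 2 * x_wide + 1
outside = np.abs(x_wide) > 4              # points the model never saw in training

# In-sample the degree-7 fit can look slightly better, but outside the training
# range its predictions typically diverge sharply from the true quadratic.
mse_over = np.mean((np.polyval(over_coeffs, x_wide[outside]) - true_wide[outside]) ** 2)
mse_quad = np.mean((np.polyval(quad_coeffs, x_wide[outside]) - true_wide[outside]) ** 2)
print("degree-7 MSE outside (-4, 4):", mse_over)
print("degree-2 MSE outside (-4, 4):", mse_quad)
```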
  22. 22. Generalizing safely So what can we do to safely generalize? Two of the most common approaches are regularization and cross-validation. Regularization is … access more at legalanalyticscourse.com
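The slide leaves the definition of regularization unfinished. As a general illustration only (a sketch of one standard form of regularization, assuming scikit-learn is available; the degree and penalty strength are arbitrary choices), ridge regression adds an L2 penalty on coefficient size, which discourages a flexible polynomial model from spending its extra parameters on noise:

```python
# Ridge (L2-penalized) regression on degree-7 polynomial features: the penalty
# pulls coefficients toward zero instead of letting them fit the noise exactly.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(-4, 4, 50)
y = x**2 + 2 * x + 1 + rng.normal(size=x.shape)

# alpha controls the penalty strength; alpha=0 recovers ordinary least squares.
ridge_model = make_pipeline(PolynomialFeatures(degree=7), Ridge(alpha=10.0))
ridge_model.fit(x.reshape(-1, 1), y)

# The higher-order coefficients are shrunk toward zero, so predictions outside
# the (-4, 4) training range typically degrade less than an unpenalized degree-7 fit.
print(ridge_model.named_steps["ridge"].coef_)
```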
  23. 23. Cross-validation Cross-validation, like regularization, is meant to prevent the learning process from codifying sample-specific noise as structure. However, unlike regularization, cross-validation does not impose any geometric constraints on the shape or “feel” of our learning solution, i.e., model. Instead, it focuses on repeating the learning task on multiple samples of training data, then evaluating the performance of these models on the “held-out” or unseen data. access more at legalanalyticscourse.com
  24. 24. Cross-validation: K-fold The most common approach to cross-validation is to divide the training set of data into K distinct partitions of equal size. K-1 of these partitions are then used to learn a model. The resulting model is then used to predict the Kth partition. This process is repeated K times, and the best-performing model is kept as the trained model. http://genome.tugraz.at/proclassify/help/pages/XV.html http://stats.stackexchange.com/questions/1826/cross-validation-in-plain-english access more at legalanalyticscourse.com
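A short sketch of K-fold cross-validation on the same quadratic data (an illustration assuming scikit-learn; K=5 and the candidate polynomial degrees are arbitrary choices), where each fold is held out once while a model is trained on the other K-1 folds:

```python
# 5-fold cross-validation: average held-out mean squared error for polynomial
# models of degree 1 (underfit), 2 (true form), and 7 (overfit).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(-4, 4, 50)
y = x**2 + 2 * x + 1 + rng.normal(size=x.shape)
X = x.reshape(-1, 1)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for degree in (1, 2, 7):
    fold_mse = []
    for train_idx, test_idx in kf.split(X):
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X[train_idx], y[train_idx])
        fold_mse.append(np.mean((y[test_idx] - model.predict(X[test_idx])) ** 2))
    # Held-out error is averaged over the K folds; the underfit degree-1 model
    # scores far worse than degree 2, and degree 7 tends to score worse as well.
    print("degree", degree, "mean held-out MSE:", np.mean(fold_mse))
```

In this sketch the K held-out scores are averaged to estimate out-of-sample error, which is the most common convention; retaining a single best-performing model, as the slide describes, is an alternative use of the same folds.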
  25. 25. “Cross-validation is widely used to check model error by testing on data not part of the training set. Multiple rounds with randomly selected test sets are averaged together to reduce variability of the cross-validation; high variability of the model will produce high average errors on the test set. One way of resolving the trade-off is to use mixture models and ensemble learning. For example, boosting combines many ‘weak’ (high bias) models in an ensemble that has greater variance than the individual models, while bagging combines ‘strong’ learners in a way that reduces their variance.” http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29 access more at legalanalyticscourse.com
  26. 26. Legal Analytics Class 6 - Overfitting, Underfitting, & Cross-Validation daniel martin katz blog | ComputationalLegalStudies corp | LexPredict michael j bommarito twitter | @computational blog | ComputationalLegalStudies corp | LexPredict twitter | @mjbommar more content available at legalanalyticscourse.com site | danielmartinkatz.com site | bommaritollc.com
