
- 1. Class 6 Overfitting, Underfitting, & Cross-validation Legal Analytics Professor Daniel Martin Katz Professor Michael J Bommarito II legalanalyticscourse.com
- 2. Model Fit access more at legalanalyticscourse.com
- 3. We are interested in how well a given model performs
- 4. both on existing data and on new, unseen data
- 5. Underfitting occurs when a statistical model or algorithm cannot capture the underlying trend of the data
- 6. an underfit model has low variance, high bias
- 7. Overfitting occurs when a statistical model or algorithm captures the noise of the data (as opposed to the signal)
- 8. an overfit model has low bias, high variance
- 9. Model Fit is all about generalization
- 10. Underfitting/Overfitting The challenge of generalization
- 11. Why is generalization hard? Learning, machine or otherwise, looks something like this:
  - We are presented with a view of objects in the world.
  - We encode aspects of these objects, e.g., colors, into “features.”
  - We generalize from patterns in these features to statements about objects.
  Example: We spend a summer on Michigan lakes and see many animals. All swans that we see are white. We generalize from this sample to the statement that all swans are white. What went wrong? Mathematically speaking, we did not observe enough variance in our observed sample; in fact, our observed variance for the color feature was zero!
- 12. Underfitting Zero variance in our observed sample led to a model with a constant predicted value; this model underfits the true variance of swans. Underfitting is, in essence, model simplification or ignorance of signal. Underfit models may perform well on modal data, but they typically struggle with lower-frequency or more complex cases. Underfitting can occur for a number of reasons:
  - The model is too simple for the actual system. Technically speaking, either the model does not contain enough parameters or its functional forms are not capable of spanning the true functions.
  - The number of records, or the variance of the records, does not provide the learning process with enough information.
- 13. Underfitting Let’s look at a simple example – fitting a quadratic equation with a linear function. Quadratic functions look like this: y = a x^2 + b x + c A function is therefore defined by supplying three parameters: a, b, and c. To make this realistic, let’s add some simple N(0,1) random errors, giving us the form: y = a x^2 + b x + c + e where e is distributed N(0,1).
- 14. Underfitting Example: y = x^2 + 2 x + 1 + e
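The slide’s data-generating process can be sketched in a few lines of NumPy; the seed, sample size, and sampling window are illustrative assumptions, not part of the original deck.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Sample x over the window the later slides use (-4 to 4)
x = np.linspace(-4, 4, 50)

# y = x^2 + 2x + 1 plus N(0, 1) noise, matching the slide's example
y = x**2 + 2 * x + 1 + rng.normal(0, 1, size=x.shape)
```

Every fit in the following slides is trained against data drawn this way.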
- 15. Underfitting What happens if we try to fit a model to this data? First, let’s start with a simple linear function, i.e., linear regression. Our linear form looks like this: y = a x + b + e A model is therefore defined by supplying two parameters: a and b.
- 16. Underfitting Example: y = 1.94 x + 6.62
- 17. Underfitting This linear model clearly does not capture the non-linear relationship between x and y, and no combination of a and b will do better across all x: the linear form is simply too simple to represent a non-linear relationship. Linear models have too few parameters to fit non-linear functions, so they will typically underfit them.
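A minimal sketch of this comparison using NumPy’s polynomial fitting (the seed and sample size are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-4, 4, 50)
y = x**2 + 2 * x + 1 + rng.normal(0, 1, size=x.shape)

# Underfit: force a straight line (2 parameters) onto quadratic data
line = np.polyfit(x, y, deg=1)
sse_line = np.sum((y - np.polyval(line, x)) ** 2)

# Well-specified: a quadratic (3 parameters) recovers the trend
quad = np.polyfit(x, y, deg=2)
sse_quad = np.sum((y - np.polyval(quad, x)) ** 2)

# No choice of (a, b) closes this gap: the linear family cannot
# represent the curvature, so its squared error stays large.
```

Comparing `sse_line` with `sse_quad` makes the underfit concrete: the line’s residual error is dominated by the curvature it cannot express, not by the N(0,1) noise.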
- 18. Overfitting Overfitting is the opposite of underfitting, and it occurs when a model codifies noise into its structure. Overfitting may occur for a number of reasons:
  - The model is much more complex than the underlying data, either in terms of functional form or number of parameters.
  - Learning is too focused on minimizing the loss function for a single training sample.
- 19. Overfitting Let’s return to our quadratic example from before. As we discussed, our quadratic data was generated by a model with three parameters: a, b, and c. When we tried to explain the data with just two parameters, the resulting model underfit the data and did a poor job. When we tried to explain the data with three parameters, the resulting model did an excellent job of fitting the data. What happens if we try to explain the data with eight parameters, i.e., a degree-7 polynomial?
- 20. Overfitting First, let’s focus on the portion of data that we saw in our training set before – the range where x lies between -4 and 4. At first blush, it looks like we’ve done an excellent job. Compared to our three-parameter quadratic fit, we have done an even better job of reducing the sum of our squared residuals. Why not always use more parameters?
- 21. Overfitting But what happens if we look outside of this (-4, 4) range? It turns out that we’ve committed two common overfitting mistakes:
  - Our model is much more complex than the underlying data. Quadratic relationships are built on three parameters, whereas our model uses eight. When we minimized our loss function, the extra five parameters were used to fit to noise, not signal!
  - Our model was trained on a very narrow sample of the world. While we do an excellent job of predicting values between -4 and 4, we do a very poor job outside of this range.
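Both mistakes can be reproduced in a short NumPy sketch (the seed, the training sample of 30 points, and the noise-free test window 5 to 8 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Narrow training window, as on the slides
x_train = np.linspace(-4, 4, 30)
y_train = x_train**2 + 2 * x_train + 1 + rng.normal(0, 1, size=x_train.shape)

quad = np.polyfit(x_train, y_train, deg=2)     # 3 parameters
septic = np.polyfit(x_train, y_train, deg=7)   # 8 parameters

# In-sample, the extra five parameters soak up noise and can only
# lower the squared error (the degree-7 family contains the quadratic)...
sse_quad_in = np.sum((y_train - np.polyval(quad, x_train)) ** 2)
sse_sept_in = np.sum((y_train - np.polyval(septic, x_train)) ** 2)

# ...but outside the (-4, 4) window the degree-7 fit extrapolates badly.
x_test = np.linspace(5, 8, 20)
y_test = x_test**2 + 2 * x_test + 1  # noise-free truth for comparison
mse_quad_out = np.mean((y_test - np.polyval(quad, x_test)) ** 2)
mse_sept_out = np.mean((y_test - np.polyval(septic, x_test)) ** 2)
```

The in-sample comparison favors the bigger model, while the out-of-window comparison reverses sharply: the high-degree terms that absorbed noise inside (-4, 4) dominate the prediction once x grows.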
- 22. Generalizing safely So what can we do to safely generalize? Two of the most common approaches are regularization and cross-validation. Regularization is …
- 23. Cross-validation Cross-validation, like regularization, is meant to prevent the learning process from codifying sample-specific noise as structure. However, unlike regularization, cross-validation does not impose any geometric constraints on the shape or “feel” of our learning solution, i.e., our model. Instead, it focuses on repeating the learning task on multiple samples of training data, then evaluating the performance of these models on the “held-out” or unseen data.
- 24. Cross-validation: K-fold The most common approach to cross-validation is to divide the training set of data into K distinct partitions of equal size. K-1 of these partitions are then used to learn a model. The resulting model is then used to predict the Kth partition. This process is repeated K times, so that each partition is held out exactly once, and the K performance estimates are combined (typically averaged) to evaluate the model. http://genome.tugraz.at/proclassify/help/pages/XV.html http://stats.stackexchange.com/questions/1826/cross-validation-in-plain-english
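The K-fold procedure can be hand-rolled in NumPy; the data, K=5, and the choice of averaging the fold errors are assumptions for illustration, and libraries such as scikit-learn provide the same machinery ready-made.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-4, 4, 60)
y = x**2 + 2 * x + 1 + rng.normal(0, 1, size=x.shape)

def kfold_mse(x, y, degree, k=5):
    """Average held-out MSE of a degree-`degree` polynomial fit over k folds."""
    idx = rng.permutation(len(x))    # shuffle, then split into K partitions
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]              # the i-th partition is held out
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef = np.polyfit(x[train], y[train], deg=degree)
        errors.append(np.mean((y[test] - np.polyval(coef, x[test])) ** 2))
    return float(np.mean(errors))

# Held-out error exposes the underfit line, which in-sample error alone
# would still rank behind any higher-degree fit
scores = {d: kfold_mse(x, y, d) for d in (1, 2, 7)}
```

Comparing the scores for degrees 1, 2, and 7 shows the well-specified quadratic clearly beating the underfit line on held-out data, which is exactly the signal pure training error cannot provide.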
- 25. “Cross-validation is widely used to check model error by testing on data not part of the training set. Multiple rounds with randomly selected test sets are averaged together to reduce variability of the cross-validation; high variability of the model will produce high average errors on the test set. One way of resolving the trade-off is to use mixture models and ensemble learning. For example, boosting combines many ‘weak’ (high bias) models in an ensemble that has lower bias than the individual models, while bagging combines ‘strong’ learners in a way that reduces their variance.” http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29
- 26. Legal Analytics Class 6 - Overfitting, Underfitting, & Cross-Validation
  Daniel Martin Katz — blog | ComputationalLegalStudies · corp | LexPredict · twitter | @computational · site | danielmartinkatz.com
  Michael J Bommarito — blog | ComputationalLegalStudies · corp | LexPredict · twitter | @mjbommar · site | bommaritollc.com
  more content available at legalanalyticscourse.com
