A statistical model or machine learning algorithm is said to underfit when it is too simple to capture the complexity of the data. An underfit model fails to learn the training data effectively, resulting in poor performance on both the training and testing data. In simple terms, an underfit model's predictions are inaccurate, especially on new, unseen examples. Underfitting mainly happens when we use a very simple model with overly simplified assumptions. To address underfitting, we need a more complex model, an enhanced feature representation, and less regularization.
A statistical model is said to be overfitted when it does not make accurate predictions on testing data. When a model is trained on too much detail, it starts learning from the noise and inaccurate entries in the dataset, and testing then shows high variance: the model fails to categorize the data correctly because it has absorbed too many details and too much noise. Non-parametric and non-linear methods are common causes of overfitting, because these algorithms have more freedom in building the model from the dataset and can therefore produce unrealistic models. Solutions include using a linear algorithm when the data is linear, or constraining parameters such as the maximal depth when using decision trees.
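The contrast between the two failure modes can be sketched with polynomial regression on a toy dataset. This is a minimal illustration, not from the slides: the data, degrees, and split are all assumed for the sake of the example.

```python
import numpy as np

# Toy data: a noisy sine wave, alternately split into train and test points.
# Everything here (seed, degrees, noise level) is an illustrative assumption.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 3, 40))
y = np.sin(2 * x) + rng.normal(0, 0.1, 40)
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

def fit_errors(degree):
    """Fit a polynomial of the given degree; return (train_mse, test_mse)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

underfit = fit_errors(1)   # a straight line: too simple for a sine wave
good_fit = fit_errors(4)   # flexible enough to follow the trend
overfit = fit_errors(15)   # flexible enough to chase the noise
```

The degree-1 model shows the underfitting pattern (high error on both splits), while the high-degree model drives training error down by fitting noise.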
2. Overfitting in machine learning
Overfitting refers to a scenario when the model tries to cover all the data points present
in the given dataset.
The model starts capturing noise and inaccurate values present in the dataset.
This reduces the efficiency and accuracy of the model.
The overfitted model has low bias and high variance
4. Overfitting in machine learning Cont.
Overfitted model performance
The accuracy score is high during training but decreases during testing.
How to avoid Overfitting
Using cross-validation
Using Regularization techniques
Implementing ensemble techniques
Picking a less parameterized/complex model
Training the model with sufficient data
Removing features
Early stopping the training
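One of the remedies above, regularization, can be sketched with a closed-form ridge regression. The data and penalty strength are illustrative assumptions; the point is only that the L2 penalty shrinks the weights of an otherwise very flexible model.

```python
import numpy as np

# Ridge regression sketch: w = (X^T X + lam * I)^-1 X^T y.
# Penalizing large weights constrains a flexible (degree-10) model,
# which is one standard way to reduce overfitting.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 30)
y = np.sin(3 * x) + rng.normal(0, 0.2, 30)

X = np.vander(x, 11)        # degree-10 polynomial features
I = np.eye(X.shape[1])

def ridge_weights(lam):
    """Closed-form ridge solution for penalty strength lam."""
    return np.linalg.solve(X.T @ X + lam * I, X.T @ y)

w_unreg = ridge_weights(0.0)   # ordinary least squares: unconstrained weights
w_ridge = ridge_weights(1.0)   # regularized: shrunken weights
```

The regularized weight vector has a smaller norm than the unregularized one, which is exactly the constraint that keeps the model from chasing noise.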
5. Underfitting in machine learning
Underfitting is just the opposite of overfitting.
Underfitting occurs when our machine learning model is not able to capture the
underlying trend of the data.
An underfitted model has high bias and low variance.
7. Underfitting in machine learning Cont.
Underfitted model performance
The accuracy score is low during training as well as testing.
How to avoid Underfitting
Preprocessing the data to reduce noise in data
Training the model on more data
Increasing the number of features in the dataset
Increasing the model complexity
Increasing the training time of the model to get better results.
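Two of the remedies above, adding features and increasing model complexity, can be shown in a few lines. The quadratic dataset below is an illustrative assumption: a plain linear model cannot capture the trend, but adding a squared feature fixes it.

```python
import numpy as np

# Underfitting fix sketch: a linear model cannot capture a quadratic
# trend, but adding an x^2 feature (more complexity) captures it.
rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 50)
y = x ** 2 + rng.normal(0, 0.1, 50)

def fit_mse(X):
    """Least-squares fit; return the training mean squared error."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((X @ w - y) ** 2)

simple_mse = fit_mse(np.column_stack([x, np.ones_like(x)]))          # underfits
richer_mse = fit_mse(np.column_stack([x ** 2, x, np.ones_like(x)]))  # fits the trend
```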
8. Good fit model in machine learning
A good fit model is a balanced model, which is not suffering from underfitting and
overfitting.
Such a model gives a good accuracy score during training and performs equally
well during testing.
9. Detecting Overfitting And Underfitting
And Good Fit
Detecting for Classification and Regression:
Error      Overfitting   Right Fit   Underfitting
Training   Low           Low         High
Test       High          Low         High
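The table above can be turned into a simple rule-of-thumb check. The thresholds below are illustrative assumptions, not from the slides; in practice "high" and "low" depend on the problem.

```python
def diagnose(train_error, test_error, high=0.3, gap=0.1):
    """Rule-of-thumb diagnosis matching the table: high error on both
    sets means underfitting; low train error with a large train/test
    gap means overfitting; otherwise the fit looks right.
    Thresholds `high` and `gap` are illustrative, problem-dependent choices."""
    if train_error > high:
        return "underfitting"
    if test_error - train_error > gap:
        return "overfitting"
    return "right fit"
```

For example, `diagnose(0.05, 0.40)` returns `"overfitting"`, while `diagnose(0.45, 0.50)` returns `"underfitting"`.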
11. Example To Understand Overfitting vs. Underfitting
(vs. Good Fitting) in Machine Learning
Consider an AI class consisting of students and a professor.
12. Example To Understand Underfitting vs. Overfitting (vs. Best Fitting) in Machine Learning Cont.
We can broadly divide the students into 3 groups based on their features (Hobby, Interest, Attention).
13. Example To Understand Underfitting vs. Overfitting (vs. Best Fitting) in Machine Learning Cont.
The professor first delivers lectures and teaches the students about the problems and
how to solve them.
At the end of the day, the professor simply takes a quiz based on what he taught in
the class.
14. Example To Understand Underfitting vs. Overfitting (vs. Best Fitting) in Machine Learning Cont.
So, let’s discuss what happens when the professor takes a classroom test at the end of
the day:
We can clearly infer that the student who simply memorizes everything is scoring better without
much difficulty.
15. Example To Understand Underfitting vs. Overfitting (vs. Best Fitting) in Machine Learning Cont.
Now here’s the twist.
Let’s also look at what happens during the semester final, when students have to face
new unknown questions which are not taught in the class by the Professor.
16. How Does this Relate to Underfitting and
Overfitting in Machine Learning
Summarize the students' Class Test and Semester Exam scores together with the
Interest feature.
We make a dataset with the Class Test and Semester Exam scores as the training
and testing datasets.
17. How Does this Relate to Underfitting and
Overfitting in Machine Learning
This example relates to the problem we encountered with the train, test, and
validation scores of the decision tree classifier.
18. How Does this Relate to Underfitting and Overfitting
in Machine Learning Cont.
Let’s work on connecting this example with the results of the decision tree classifier.
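A minimal sketch of that connection, under assumed synthetic data: an unrestricted decision tree memorizes noisy training labels (the student who memorizes everything), while capping `max_depth` trades training accuracy for generalization. The dataset, noise rate, and depth are illustrative choices, not from the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic 2-feature data with a linear true boundary and 20% label noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
flip = rng.random(400) < 0.2          # flip 20% of the labels (noise)
y = np.where(flip, 1 - y, y)

X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

# Unrestricted tree: free to grow until every training point is memorized.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Depth-limited tree: a simpler hypothesis that cannot chase the noise.
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)

deep_train = deep.score(X_train, y_train)     # near-perfect on training data
deep_test = deep.score(X_test, y_test)        # drops on unseen data
shallow_train = shallow.score(X_train, y_train)
shallow_test = shallow.score(X_test, y_test)
```

The train/test gap of the unrestricted tree is the "class test vs. semester final" pattern from the student example: perfect recall of seen questions, weaker performance on new ones.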