Underfitting and Overfitting in Machine Learning
Underfitting
• Underfitting occurs when a machine learning model is too simple to capture the
patterns in the training data. This leads to poor performance on both training and
test data because the model fails to learn the underlying relationships.
• Causes:
• Using a very simple model (e.g., a linear model for complex data).
• Not training the model for enough epochs (in deep learning).
• Too much regularization.
• Ignoring important features.
Example
Suppose we are predicting blood sugar levels based on age and BMI using a
linear regression model. However, if the relationship is actually non-linear
and we still use a simple straight-line model, it will underfit the data and give
poor predictions.
How to Fix?
• Use a more complex model (e.g., polynomial regression instead of linear).
• Train for a longer time.
• Add more relevant features.
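The first fix can be sketched with scikit-learn (assuming it is available). The data below is synthetic and only meant to mimic the non-linear blood-sugar example: a straight line underfits a curved relationship, while a degree-2 polynomial captures it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(20, 70, size=(200, 1))                        # e.g. age
y = 0.05 * (X[:, 0] - 45) ** 2 + 90 + rng.normal(0, 2, 200)   # non-linear target

linear = LinearRegression().fit(X, y)                          # underfits
poly = make_pipeline(PolynomialFeatures(degree=2),
                     LinearRegression()).fit(X, y)             # matches the curve

print(f"linear R^2: {linear.score(X, y):.3f}")   # poor fit
print(f"poly   R^2: {poly.score(X, y):.3f}")     # much better fit
```

The large gap between the two R² scores is the underfitting signature: the simple model cannot represent the true relationship no matter how long it is trained.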
Overfitting
Overfitting occurs when a machine learning model learns too much detail
from the training data, including noise. The model performs very well on
training data but poorly on new (test) data.
• Causes:
• Using a very complex model (e.g., deep neural networks on small data).
• Training for too many epochs.
• Too little regularization.
• High variance in the data.
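The symptom is easy to reproduce with scikit-learn (assuming it is available): an unrestricted decision tree on a small, noisy synthetic dataset memorises the training set but generalises poorly.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 5))                              # small, noisy dataset
y = (X[:, 0] + rng.normal(0, 1.0, 120) > 0).astype(int)    # label with heavy noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # unrestricted depth

print("train accuracy:", deep.score(X_tr, y_tr))   # memorises the noise
print("test  accuracy:", deep.score(X_te, y_te))   # noticeably lower
```

Perfect (or near-perfect) training accuracy combined with a large drop on held-out data is the classic overfitting signature.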
How to Avoid Overfitting
• Use regularization (e.g., L1/L2 regularization, dropout in deep learning).
• Use simpler models (e.g., pruning decision trees).
• Increase the amount of training data.
• Use cross-validation to check if the model is overfitting.
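Two of these fixes, pruning and cross-validation, can be combined in one short scikit-learn sketch (data is synthetic; the depth limit of 2 is illustrative): cross-validation gives an honest accuracy estimate, and the depth-limited tree comes out ahead of the unrestricted one.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))                              # 5 features, mostly noise
y = (X[:, 0] + rng.normal(0, 1.0, 300) > 0).astype(int)    # noisy label from feature 0

# Compare an unrestricted tree with a pruned (depth-limited) one using
# 5-fold cross-validation instead of training accuracy.
deep_cv = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
pruned_cv = cross_val_score(DecisionTreeClassifier(max_depth=2, random_state=0),
                            X, y, cv=5)

print("unrestricted tree CV accuracy:", round(deep_cv.mean(), 3))
print("pruned tree CV accuracy:     ", round(pruned_cv.mean(), 3))
```

The simpler model wins on cross-validated accuracy precisely because it cannot chase the label noise.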
Bias and Variance
• Underfitting corresponds to high bias (the model's assumptions are too rigid),
while overfitting corresponds to high variance (the model is too sensitive to the
particular training data). Good generalization requires balancing the two
(the bias-variance trade-off).
Dependency Modelling in Classification
• Dependency modelling in classification refers to the process of
understanding and capturing the relationships between features (independent
variables) and the target class (dependent variable). In many classification
tasks, features are not independent but exhibit dependencies, which, if
properly modeled, can enhance classification accuracy.
Types of Dependency Modelling in Classification:
• Feature Dependency Modeling:
• Some features may have strong correlations. For example, in medical
diagnosis (such as diabetes prediction), blood glucose levels and insulin levels
are interdependent.
• Methods like Bayesian Networks and Markov Models can be used to
model these dependencies.
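As a minimal first check of such a feature dependency, one can compute the correlation between the two features. The data below is synthetic and the coefficients are made up; it only mimics the glucose/insulin example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical synthetic data: insulin loosely tracks glucose,
# so the two features are interdependent.
glucose = rng.normal(100, 15, 500)
insulin = 0.3 * glucose + rng.normal(0, 5, 500)

r = np.corrcoef(glucose, insulin)[0, 1]
print(f"Pearson correlation: {r:.2f}")   # strongly positive
```

A strong correlation like this signals that the features should not be treated as independent, which is exactly the situation dependency models such as Bayesian Networks are designed for.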
• Class Dependency Modelling:
• Some classification problems involve sequential or hierarchical relationships among classes.
• Hidden Markov Models (HMM) and Conditional Random Fields (CRF) are
commonly used in text classification and speech recognition.
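The core sequence computation behind an HMM can be sketched in plain NumPy. The forward algorithm below computes the likelihood of an observation sequence under a toy two-state model; all probabilities are illustrative.

```python
import numpy as np

# Toy 2-state HMM with 2 observation symbols (all numbers illustrative).
pi = np.array([0.6, 0.4])                 # initial state distribution
A = np.array([[0.7, 0.3],                 # state transition matrix
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],                 # emission probabilities per state
              [0.2, 0.8]])

def forward_likelihood(obs):
    """P(observation sequence) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]             # initialise with first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]     # propagate and emit
    return alpha.sum()

print(forward_likelihood([0, 0, 1]))
```

Because consecutive states depend on each other through the transition matrix, the model captures exactly the sequential class dependencies described above.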
• Graph-based Dependency Modeling:
• Probabilistic Graphical Models (PGMs) such as Bayesian Networks and Markov
Random Fields (MRFs) are useful for modeling complex dependencies.
• These models help in representing conditional dependencies among variables.
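A minimal example of such a conditional factorization, assuming a simple chain-structured Bayesian network A → B → C with made-up probability tables:

```python
import numpy as np

# Chain-structured Bayesian network A -> B -> C over binary variables.
# The joint distribution factorises as P(A, B, C) = P(A) P(B|A) P(C|B).
P_A = np.array([0.7, 0.3])
P_B_given_A = np.array([[0.9, 0.1],    # row i: distribution of B given A=i
                        [0.3, 0.7]])
P_C_given_B = np.array([[0.8, 0.2],    # row j: distribution of C given B=j
                        [0.25, 0.75]])

# Build the full joint table via broadcasting; axes are (A, B, C).
joint = (P_A[:, None, None]
         * P_B_given_A[:, :, None]
         * P_C_given_B[None, :, :])

print("total probability:", joint.sum())          # sums to 1
print("P(C=1):", joint.sum(axis=(0, 1))[1])       # marginalise out A and B
```

The factorization needs only 2 + 4 + 4 numbers instead of a full 8-entry joint table, which is precisely the economy that PGMs offer for representing conditional dependencies.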
• Deep Learning-Based Dependency Modeling:
• Neural networks, especially Recurrent Neural Networks (RNNs) and
Graph Neural Networks (GNNs), capture dependencies in time-series and
graph-based data.
• Transformer models (like BERT) use attention mechanisms to model long-range dependencies.
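The attention mechanism itself reduces to a short computation. This is a minimal NumPy sketch of scaled dot-product attention; the shapes and data are arbitrary.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # similarity of queries to keys
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, dimension 8
K = rng.normal(size=(6, 8))   # 6 key/value positions
V = rng.normal(size=(6, 8))

out, w = attention(Q, K, V)
print(out.shape)              # (4, 8): every position mixes all 6 values
```

Because every output position attends over all positions at once, distance in the sequence costs nothing, which is why attention handles long-range dependencies more directly than recurrence.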
Applications in Classification:
• Medical Diagnosis: Identifying dependencies between symptoms and diseases
(e.g., diabetes prediction using correlated health parameters).
• Natural Language Processing (NLP): Dependency parsing in text
classification (e.g., sentiment analysis).
• Image Recognition: Convolutional Neural Networks (CNNs) model spatial
dependencies between pixels.
• Fraud Detection: Graph-based models detect fraudulent transactions based on
relational dependencies.
