TAE1
FYMCA ‘B’
Academic Year: 2024-25
Subject: Machine Learning
Topic: Machine Learning Life Cycle: Training, Testing, and Cross-Validation
Department of MCA
Name of Student: Samrudhi Krishnath Yadav
Roll No: MCA91
Registration No: 24AMCA1101
Internal Mentor: Prof. Rasika Salunke
Name of Student: Sujata Shrimant Gawade
Roll No: MCA92
Registration No: 24AMCA1101
Internal Mentor: Prof. Rasika Salunke
Name of Student: Suyash Shrikant Shingote
Roll No: MCA95
Registration No: 24AMCA1101086
Internal Mentor: Prof. Rasika Salunke
In the machine learning life cycle, training, testing, and cross-validation are distinct yet interconnected processes that ensure a model is built effectively and evaluated robustly.
1. Training: Always performed to build the model.
2. Testing: Essential for evaluating how the model performs on unseen data before deployment.
3. Cross-Validation: Recommended during model tuning and hyperparameter optimization to avoid overfitting and improve reliability.
Machine Learning Life Cycle: Training vs Testing vs Cross-Validation
1. Data Collection: Gathering relevant data from various sources such as databases, APIs, sensors, and web scraping.
2. Data Preparation: Cleaning and organizing the collected data by handling missing values, normalizing, encoding categorical variables, and ensuring it is ready for analysis.
3. Data Wrangling: Analyzing and visualizing the data to understand patterns, correlations, and distributions and to gain insights.
4. Data Modeling: Selecting and applying appropriate machine learning algorithms (such as regression or classification) based on the problem.
5. Model Training: Training the model on the prepared dataset, where the model learns from the data and adjusts its parameters.
6. Model Testing: Assessing the model's performance on unseen test data. Metrics such as accuracy, precision, recall, and F1-score are used to evaluate the model.
7. Model Deployment: Once the model is fine-tuned and performs well, it is deployed into production, where it can be used to make predictions on new data. (Steps 2, 5, and 6 are sketched in code after this list.)
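As a rough illustration of steps 2 through 6, the minimal scikit-learn sketch below prepares a tiny dataset, trains a classifier, and tests it on held-out data. The DataFrame columns ("age", "city", "label") and the choice of logistic regression are illustrative assumptions, not part of the slides.

```python
# A minimal end-to-end sketch of life-cycle steps 2-6 using scikit-learn.
# The column names and data below are illustrative assumptions.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age":   [25, 32, None, 41, 29, 50, 36, None],
    "city":  ["Pune", "Mumbai", "Pune", "Delhi",
              "Delhi", "Pune", "Mumbai", "Delhi"],
    "label": [0, 1, 0, 1, 0, 1, 1, 0],
})
X, y = df[["age", "city"]], df["label"]

# Step 2, data preparation: impute missing values, scale numerics,
# and one-hot encode the categorical column.
prepare = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="mean")),
                      ("scale", StandardScaler())]), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Steps 5-6: train on one split of the data, test on the held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = Pipeline([("prepare", prepare), ("clf", LogisticRegression())])
model.fit(X_train, y_train)                            # step 5: training
print("test accuracy:", model.score(X_test, y_test))   # step 6: testing
```

Wrapping preparation and the classifier in one Pipeline ensures the exact same preprocessing learned on the training split is applied to the test split.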
Training
1. Training: Always performed to build the model.
Purpose:
• To enable the model to learn from the data and identify patterns or relationships.
Process:
• The model is fed a training dataset that contains input features and their corresponding outputs (labels, in supervised learning).
• The algorithm adjusts the model's internal parameters (e.g., weights in a neural network) to minimize error and improve predictions.
Output:
• A trained model capable of making predictions based on the learned patterns.
Key Points:
• Typically uses 70-80% of the dataset.
• Overfitting (learning noise instead of general patterns) can occur if the model is too complex.
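To make the training step concrete, here is a small sketch on synthetic data, assuming scikit-learn's LinearRegression: fitting adjusts the model's internal parameters (the regression weights) to minimize error on the roughly 80% of the data reserved for training.

```python
# Training sketch: the algorithm learns parameters from the training split.
# The synthetic data and true weights [3, -2] are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Typical split: ~80% of the data for training, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("learned weights:", model.coef_)  # should land close to [3, -2]
```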
Testing
Purpose:
• To evaluate the model's performance on unseen data and check its generalization ability.
Process:
• The trained model is evaluated on a testing dataset that it has never seen before.
• Performance metrics such as accuracy, precision, recall, F1-score, and mean squared error (MSE) are computed.
Output:
• Objective performance metrics that indicate how well the model performs on new data.
Key Points:
• Typically uses 20-30% of the dataset.
• Helps assess how the model might perform in real-world scenarios.
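A short sketch of computing the classification metrics named above with scikit-learn; the true labels and predictions here are made-up values for illustration only.

```python
# Evaluating a model on held-out test data with the standard metrics.
# y_test and y_pred below are illustrative, not from a real model run.
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_test = [0, 1, 1, 0, 1, 0, 1, 1]  # true labels for the test split
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]  # the model's predictions

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("f1-score :", f1_score(y_test, y_pred))
```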
Cross-Validation
Purpose:
• To optimize model performance and prevent overfitting by evaluating the model across multiple subsets of the data.
Process:
• The dataset is split into K equal parts (e.g., K = 5 for 5-fold cross-validation).
• The model is trained on K-1 parts and tested on the remaining part. This process repeats K times, with each fold serving as the test set once.
• The final performance is the average of the metrics across all K iterations.
Output:
• A more robust estimate of the model's performance across the dataset.
Key Points:
• Cross-validation reduces the risk of overfitting or underfitting.
• Ensures that the model is evaluated on different parts of the data, leading to more reliable results.
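A minimal 5-fold cross-validation sketch using scikit-learn's cross_val_score; the Iris dataset and logistic-regression model are illustrative assumptions. Each fold serves as the test set exactly once, and the final score is the average over the K = 5 iterations.

```python
# 5-fold cross-validation: train on 4 folds, test on the 5th, repeat 5x.
# Dataset and model choice are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)  # one accuracy score per fold
print("fold accuracies:", scores)
print("mean accuracy  :", scores.mean())     # the robust overall estimate
```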
Key Differences
• Purpose: Training builds the model; testing evaluates it on unseen data; cross-validation tunes the model and gives a more robust performance estimate.
• Data used: Training typically uses 70-80% of the dataset; testing uses the remaining 20-30%; cross-validation uses the whole dataset, split into K folds.
• Output: Training produces a trained model; testing produces objective performance metrics; cross-validation produces the average of the metrics across the K folds.
THANK YOU
FOR JOINING OUR RESEARCH
