Introduction to Machine Learning Learning Types ML Life Cycle Dataset for ML Data Pre-processing Training versus Testing Cross-Validation

Unit IV Introduction to ML
Prof. Rahul Navale
Department of AI & AIML
GH Raisoni College of Engineering and
Management Pune

Agenda
 Introduction to Machine Learning
 Learning Types
 ML Life Cycle
 Dataset for ML
 Data Pre-processing
 Training versus Testing
 Cross-Validation

What is Machine Learning?
• Definition: In simple words, ML is the study of
computer algorithms that learn from data. With the
help of ML algorithms we can make decisions or
predictions.
• Examples
• Voice assistants
• Product recommendations
• Predictive analytics
• Image recognition

Types of Machine Learning Algorithms
• Supervised Learning:
– Trained using labeled data.
– Linear Regression: Used for predicting continuous
outcomes.
– It models the relationship between a dependent
variable and one or more independent variables by
fitting a linear equation to observed data.
– Logistic Regression: Used for binary classification
tasks (e.g., predicting yes/no outcomes).
– It estimates probabilities using a logistic function.
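
The two supervised models above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not a production implementation: the toy data (hours studied vs. exam score) and the logistic coefficients `w` and `b` are hypothetical values chosen for the example, and the logistic part only evaluates the function rather than fitting it.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for one independent variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def logistic(z):
    """Logistic (sigmoid) function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Linear regression: hypothetical toy data, hours studied vs. exam score
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]
slope, intercept = fit_line(hours, scores)
predicted_score = slope * 6.0 + intercept  # continuous prediction for 6 hours

# Logistic regression: hypothetical coefficients (assumed, not fitted here)
w, b = 1.5, -4.0
p_pass = logistic(w * 3.0 + b)             # estimated probability of "yes"
label = "yes" if p_pass >= 0.5 else "no"   # threshold at 0.5 for a binary decision
```

The linear model returns a number on a continuous scale, while the logistic model returns a probability that is then thresholded into a yes/no class.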

• Unsupervised Learning:
– Trained using a dataset without labeled responses.
– Clustering: Algorithms like K-means, hierarchical
clustering, and DBSCAN group a set of objects in
such a way that objects in the same group are
more similar to each other than to those in other
groups.
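
As a rough sketch of how K-means groups similar objects, the function below alternates between assigning points to their nearest centroid and moving each centroid to the mean of its points. The 2-D points and the choice of initial centroids are hypothetical; real implementations usually pick initial centroids more carefully and stop when assignments no longer change.

```python
def kmeans(points, centroids, iterations=10):
    """Plain K-means on 2-D points, starting from the given initial centroids."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        assignments = [
            min(range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                              + (p[1] - centroids[k][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, assignments

# Hypothetical toy data: two visually separate groups, no labels provided
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, assignments = kmeans(points, centroids=[(1.0, 1.0), (9.0, 9.0)])
```

No labels are passed in; the cluster indices in `assignments` come only from the distances between points.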

• Semi-Supervised Learning:
– Semi-supervised learning is an ML
approach that trains models using a
combination of a small amount of labeled
data and a large amount of unlabeled data.
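
One common way to combine the two kinds of data is self-training (pseudo-labeling). The slide does not name a specific method, so this choice is an assumption: a simple model is built from the few labeled samples, used to label the unlabeled ones, and then rebuilt from everything. The nearest-centroid classifier, the 1-D feature values and the label names are all hypothetical.

```python
def nearest_centroid(x, centroids):
    """Return the label whose centroid is closest to x (1-D feature)."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def centroids_of(samples):
    """Mean feature value per label for a list of (x, label) pairs."""
    grouped = {}
    for x, label in samples:
        grouped.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in grouped.items()}

# A small amount of labeled data and a larger amount of unlabeled data
labeled = [(1.0, "low"), (9.0, "high")]
unlabeled = [1.5, 2.0, 2.5, 7.0, 8.0, 8.5]

# Step 1: build an initial model from the labeled samples only
centroids = centroids_of(labeled)

# Step 2: use that model to assign pseudo-labels to the unlabeled samples
pseudo_labeled = [(x, nearest_centroid(x, centroids)) for x in unlabeled]

# Step 3: rebuild the model from labeled and pseudo-labeled samples together
centroids = centroids_of(labeled + pseudo_labeled)
```

Practical self-training usually keeps only the pseudo-labels the model is confident about and repeats the loop several times; this sketch does a single pass.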

ML Implementation General Steps

Dataset for ML
• “Data really powers everything that we do.”
— Jeff Weiner.
• https://towardsdatascience.com/top-sources-for-machine-learning-datasets-bb6d0dc3378b

Data Pre-Processing
• Processing data is at the heart of machine learning.
1. Importing the required libraries
2. Importing the data set
3. Handling the missing data
4. Encoding categorical data
5. Splitting the data set into a training set and a test set
6. Feature scaling
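
Steps 3, 4 and 6 can be illustrated on a tiny in-memory table. This is a sketch using only the standard library; the column names, the rows and the choice of mean imputation, one-hot encoding and standardization are assumptions made for the example, since the slide does not specify which techniques to use.

```python
from statistics import mean, pstdev

# Hypothetical raw rows: (age, city, purchased); None marks a missing value
rows = [
    (25.0, "Pune", "no"),
    (None, "Mumbai", "yes"),
    (35.0, "Pune", "yes"),
    (45.0, "Delhi", "no"),
]

# Step 3: handle missing data by replacing None with the mean of the known values
known_ages = [age for age, _, _ in rows if age is not None]
age_mean = mean(known_ages)
ages = [age if age is not None else age_mean for age, _, _ in rows]

# Step 4: encode the categorical column as one-hot vectors
cities = sorted({city for _, city, _ in rows})  # fixed column order
one_hot = [[1 if city == c else 0 for c in cities] for _, city, _ in rows]

# Step 6: feature scaling by standardization (zero mean, unit variance)
mu, sigma = mean(ages), pstdev(ages)
scaled_ages = [(a - mu) / sigma for a in ages]
```

In a full pipeline the mean and standard deviation used for scaling would normally be computed on the training set only and then applied to the test set, which is one reason the split is listed before feature scaling.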

Training Vs. Testing Data
• In machine learning, datasets are typically split into
two subsets: training and testing data.
• The training data is used to train the machine
learning algorithm.
• The testing data is used to evaluate the accuracy of
the trained algorithm.
• A common choice is to use about 70% of the whole
dataset as the training set and the remaining 30% as
the testing set.
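
A 70/30 split can be sketched as a shuffle followed by a slice. The helper name `split_train_test`, its parameters and the toy data are hypothetical; libraries such as scikit-learn provide their own equivalents.

```python
import random

def split_train_test(samples, test_fraction=0.3, seed=0):
    """Shuffle a copy of the samples and split it into (train, test)."""
    shuffled = list(samples)                   # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed for a repeatable split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of 10 samples
data = list(range(10))
train, test = split_train_test(data)
```

Shuffling before slicing avoids a split that simply follows the order in which the data was collected, and the two subsets never share a sample.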

Cross-Validation
• Cross-validation is a statistical technique used to
assess the performance of a machine learning (ML)
model.
• It involves training and evaluating ML models on
subsets of a dataset, and then repeating the process
with different subsets.
• This helps to ensure that, at each step, the model is
tested on data it was not trained on.
• The results of each iteration are averaged to
calculate the cross-validation accuracy.
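
The procedure above corresponds to k-fold cross-validation, sketched below with the standard library only. The threshold classifier, the helper names and the toy data are hypothetical stand-ins for a real model; the data is deliberately interleaved so that each contiguous fold contains both classes, whereas in practice the samples would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

def cross_validate(xs, ys, k, fit, predict):
    """Average accuracy over k folds; fit and predict are supplied by the caller."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct = sum(predict(model, xs[i]) == ys[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)

def fit_threshold(xs, ys):
    """Toy model: the midpoint between the mean of class 0 and the mean of class 1."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2

def predict_threshold(threshold, x):
    """Predict class 1 when x lies above the learned threshold."""
    return 1 if x > threshold else 0

# Hypothetical toy data with interleaved classes
xs = [1.0, 9.0, 2.0, 8.0, 3.0, 7.0]
ys = [0, 1, 0, 1, 0, 1]
cv_accuracy = cross_validate(xs, ys, 3, fit_threshold, predict_threshold)
```

Every sample is used for testing exactly once and for training in the remaining folds, so the averaged accuracy depends less on one particular split.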
