Supervised Learning (Data Science).pptx

WIIRTUU LEENJII KOMPIYUTERAA OITI
OITI COMPUTER TRAINING CENTER
ኦ፥አይ፥ቲ፥አይ የኮምፒውተር ማሰልጠኛ ማዕከል
Advanced Data Science (Supervised Learning) Handout
May 16, 2024
Burayu, Ethiopia
By: Tariku Endale (MSc)
7/9/2024 Prepared by: Tariku Endale (MSc)

Machine Learning (ML)
Supervised Machine Learning
Semi-Supervised Machine Learning
Unsupervised Machine Learning
Reinforcement Learning
Definition by Tom Mitchell (1998): Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E.
A well-defined learning task is given by <P, T, E>.
Supervised learning is a machine learning paradigm in which a model is trained on input objects
paired with desired output values. From the training data, the model builds a function that maps
new inputs to expected output values (https://en.wikipedia.org/wiki/Supervised_learning).
Unsupervised learning is a type of machine learning that learns patterns from unlabeled data,
without human supervision (https://en.wikipedia.org/wiki/Unsupervised_learning).
Semi-supervised learning is a branch of machine learning that combines supervised and unsupervised
learning by using both labeled and unlabeled data to train artificial intelligence (AI) models for classification
and regression tasks (https://www.ibm.com/topics/semi-supervised-learning).
Reinforcement learning (RL) is a machine learning (ML) technique that trains software
to make decisions that achieve optimal results.
It mimics the trial-and-error learning process that humans use to achieve their goals
(https://aws.amazon.com).

Supervised Learning (ML)
Uses labeled datasets to train algorithms.
Builds an artificial system that learns the mapping from inputs to outputs.
Predicts the output when a new input is given.
The labeled data must suit the goal: the labels should be the outputs the model is expected to predict.
Covers two kinds of task: classification and regression.
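The points above (learn from labeled input/output pairs, then predict the output for a new input) can be sketched in a few lines of plain Python. This is only a minimal illustration, not the handout's lab code: the data values are made up, and the nearest-neighbour rule is an assumed stand-in for whatever algorithm is actually trained.

```python
# Labeled dataset: each input (hours studied, hours slept) has a known output label.
# All numbers here are made up for illustration.
training_inputs = [(1.0, 4.0), (2.0, 5.0), (8.0, 7.0), (9.0, 8.0)]
training_labels = ["fail", "fail", "pass", "pass"]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(new_input):
    """Return the label of the closest training input (1-nearest-neighbour)."""
    nearest = min(
        range(len(training_inputs)),
        key=lambda i: squared_distance(training_inputs[i], new_input),
    )
    return training_labels[nearest]


# Predict the output for an input that was not in the training data.
prediction = predict((7.5, 6.0))
print(prediction)
```

The "learning" here is simply storing the labeled examples; real models such as decision trees or linear regression compress the same input/output mapping into parameters.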

Supervised Learning (Classification & Regression)
Classification is a supervised machine learning method in which the model tries to predict the
correct label for given input data. The model is first trained on the training data and then
evaluated on test data before it is used to make predictions on new, unseen data (Source:
https://www.datacamp.com/blog/classification-machine-learning).
Regression is a supervised machine learning technique used to predict continuous values.
The goal of a regression algorithm is to fit a best-fit line or curve through the data.
The three main quantities used for evaluating a trained regression model are variance,
bias, and error.
Type            Output Type                     Problem Nature
Classification  predicts a categorical value    used to separate data into classes
Regression      predicts a continuous value     used to predict a value
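The "best-fit line" and "error" mentioned in the regression definition can be made concrete with a small sketch. It assumes NumPy is available, uses made-up data points, and measures error as mean squared error, which is one common choice and not necessarily the one used in the lab.

```python
import numpy as np

# Made-up measurements that roughly follow y = 2x + 1 (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Fit a straight line (degree-1 polynomial) by least squares.
slope, intercept = np.polyfit(x, y, deg=1)

# Error of the fitted line on the same points: mean squared error.
predictions = slope * x + intercept
mse = float(np.mean((y - predictions) ** 2))

print(f"best-fit line: y = {slope:.2f}x + {intercept:.2f}, MSE = {mse:.4f}")
```

The output of this model is a continuous number for any x, which is what separates regression from classification in the table above.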

Classification Code (OITI Lab Practice)
Simple Classification
Import Packages (Libraries)
Define and Load Dataset
Create a Model, Train and Evaluate it
Get the predictions from the trained model
Compare the actual and predicted data and write a conclusion
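The steps listed above could look roughly like the following sketch. The handout does not name the dataset or the classifier, so the Iris dataset bundled with scikit-learn and a DecisionTreeClassifier are assumptions chosen only to keep the example self-contained.

```python
# 1. Import packages (libraries)
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 2. Define and load the dataset (assumed: the bundled Iris data)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 3. Create a model, train it, and evaluate it on the held-out test data
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"accuracy on test data: {accuracy:.3f}")

# 4. Compare actual and predicted labels for the first few test rows
for actual, predicted in list(zip(y_test, y_pred))[:5]:
    print(f"actual={actual} predicted={predicted}")
```

The conclusion would then be written from the accuracy value and from where the actual and predicted labels disagree.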

Regression Model (SL OITI Lab Practice)
Import all Packages
Load Dataset
Read Dataframe
Understand the column list
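A minimal sketch of these first steps with pandas follows. The handout does not name the dataset file, so an in-memory CSV with hypothetical columns (`area`, `rooms`, `price`) stands in for it.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; the column names and values are hypothetical.
csv_text = """area,rooms,price
50,2,100
80,3,160
120,4,240
"""

# Load the dataset into a dataframe (pd.read_csv also accepts a file path).
df = pd.read_csv(io.StringIO(csv_text))

# Read the first rows of the dataframe and list its columns.
print(df.head())
column_list = df.columns.tolist()
print(column_list)
```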

Regression Model (Cont’d)
Describe the dataframe
Check Null Values
No column has a NULL value
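Describing the dataframe and counting null values per column might be done as in this sketch, which assumes pandas and a small hypothetical dataframe rather than the lab dataset.

```python
import pandas as pd

# Hypothetical data standing in for the lab dataset.
df = pd.DataFrame({"area": [50, 80, 120, 65], "price": [100, 160, 240, 130]})

# Summary statistics (count, mean, std, min, quartiles, max) per numeric column.
summary = df.describe()
print(summary)

# Number of null (missing) values in each column.
null_counts = df.isnull().sum()
print(null_counts)
```

A column whose entry in `null_counts` is greater than zero would need cleaning before model building.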

Drop the columns you don’t need for processing and start cleaning
Check null values again
Still no column has a NULL value
Find outliers with a histogram: any bars separated from the main distribution are outliers.
E.g. the point at (0, 0) is an outlier and should be either omitted or minimized
Outliers with a scatter plot
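The idea of dropping unneeded columns and then looking for histogram bars that sit apart from the main distribution can be sketched without plotting, using the bin counts directly. The data and column names are hypothetical, and the "separated by an empty bin" rule is only one simple way to formalise what the eye sees in the plot.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the row at (0, 0) plays the role of the outlier.
df = pd.DataFrame({
    "id": range(10),
    "x": [0, 20, 21, 22, 22, 23, 24, 25, 25, 26],
    "y": [0, 40, 43, 44, 45, 47, 48, 50, 51, 52],
})

# Drop a column that is not needed for processing.
df = df.drop(columns=["id"])

# Histogram bin counts for one column (the numbers a histogram plot would draw).
counts, edges = np.histogram(df["x"], bins=5)
print(counts)

# Grow the "main" region outward from the fullest bin until an empty bin is hit.
peak = int(np.argmax(counts))
left = peak
while left > 0 and counts[left - 1] > 0:
    left -= 1
right = peak
while right < len(counts) - 1 and counts[right + 1] > 0:
    right += 1
main_low, main_high = edges[left], edges[right + 1]

# Rows outside the main region are separated from the bulk of the data.
separated = df[(df["x"] < main_low) | (df["x"] > main_high)]
print(separated)
```

With matplotlib installed, plotting the same column as a histogram and `x` against `y` as a scatter plot would show the same isolated point visually.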

Clean null values from the data and assign the result to a different dataframe variable
Check null values again and check whether all columns have the same number of rows with data.
All columns have 768 rows.
Apply EDA to check whether the outliers are clustered together, and describe the dataframe again
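Cleaning null values into a separate dataframe variable and confirming that every column ends up with the same number of rows could look like this sketch. The data is hypothetical, and dropping rows with `dropna` is an assumed cleaning strategy; filling missing values is another option.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing values.
df = pd.DataFrame({
    "area": [50.0, 80.0, np.nan, 65.0],
    "price": [100.0, np.nan, 240.0, 130.0],
})

# Keep the original dataframe untouched; store the cleaned rows in a new variable.
clean_df = df.dropna().reset_index(drop=True)

# Check null values again and compare the non-null row count of every column.
print(clean_df.isnull().sum())
non_null_counts = clean_df.count()
print(non_null_counts)
```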

Remove or isolate the outliers from the dataset that will be used as the train and test
dataset.
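One common way to remove or isolate outliers is the interquartile-range (IQR) rule, sketched below with pandas. The handout does not say which rule was applied in the lab, so the 1.5 × IQR threshold and the data are assumptions.

```python
import pandas as pd

# Hypothetical column; 0 and 95 lie far from the other values.
df = pd.DataFrame({"value": [10, 11, 12, 12, 13, 11, 12, 14, 0, 95]})

# Interquartile range and the usual 1.5 * IQR fences.
q1 = df["value"].quantile(0.25)
q3 = df["value"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Isolate the outliers and keep the remaining rows for training and testing.
is_outlier = (df["value"] < lower) | (df["value"] > upper)
outliers = df[is_outlier]
df_no_outliers = df[~is_outlier]

print(outliers)
print(len(df_no_outliers))
```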
Import the required packages to start model building.
Here we have used:
• DecisionTreeRegressor
• LinearRegression
• XGBRegressor
We will select the model with the highest score.

Install any packages that are not already available
Split the dataset into train and test sets
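Splitting the data and comparing several regressors by score might be sketched as follows. Several things here are assumptions: the data is synthetic, the score is taken to be the R² value returned by scikit-learn's `score` method, and scikit-learn's GradientBoostingRegressor is used as a stand-in for XGBRegressor so that the example runs without the separate xgboost package.

```python
# If a package is missing, install it first, e.g.: pip install scikit-learn xgboost
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data with a nearly linear target (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

# Split into train and test sets (80% / 20%).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=42),
    "LinearRegression": LinearRegression(),
    # Stand-in for XGBRegressor, which lives in the separate xgboost package.
    "GradientBoostingRegressor": GradientBoostingRegressor(random_state=42),
}

# Train each model and record its R^2 score on the held-out test data.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"Score of our model with {name} is: {scores[name]:.4f}")

# Select the model with the highest score.
best_name = max(scores, key=scores.get)
print(f"best model: {best_name}")
```

Scoring on the test split rather than the training split matters here: a decision tree can fit its training data almost perfectly, so a training-set score would not be a fair basis for selecting a model.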

Score of our model with DecisionTreeRegressor is: 0.9998
Score of our model with Linear Regression is: 1.0000
Score of our model with XGBRegressor is: 0.9995

THANK YOU!
