WIIRTUU LEENJII KOMPIYUTERAA OITI
OITI COMPUTER TRAINING CENTER
ኦ፥አይ፥ቲ፥አይ የኮምፒውተር ማሰልጠኛ ማዕከል
Advanced Data Science (Supervised Learning) Handout
May 16, 2024
Burayu, Ethiopia
By: Tariku Endale (MSc)
7/9/2024 Prepared by: Tariku Endale (MSc) 1
Machine Learning (ML)
Supervised Machine Learning
Semi-Supervised Machine Learning
Unsupervised Machine Learning
Reinforcement Learning
Definition by Tom Mitchell (1998): Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E.
A well-defined learning task is given by <P, T, E>.
Supervised learning is a paradigm in machine learning where input objects and desired output
values are used to train a model. The training data is processed to build a function that maps new data to
expected output values (https://en.wikipedia.org/wiki/Supervised_learning).
Unsupervised learning is a type of machine learning that learns from
data without human supervision (https://en.wikipedia.org/wiki/Unsupervised_learning).
Semi-supervised learning is a branch of machine learning that combines supervised and unsupervised
learning by using both labeled and unlabeled data to train artificial intelligence (AI) models for classification
and regression tasks (https://www.ibm.com/topics/semi-supervised-learning).
Reinforcement learning (RL) is a machine learning (ML) technique that trains software
to make decisions that achieve optimal results.
It mimics the trial-and-error learning process that humans use to achieve their goals
(https://aws.amazon.com).
Supervised Learning (ML)
Uses labeled datasets to train algorithms
Builds an artificial system that can learn the mapping from input to output
Predicts the output when a new input is given
The labeled data must suit the goal being pursued
Covers two kinds of task: classification and regression
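The points above can be sketched in a few lines of plain Python. This is only an illustration, not the lab code: the dataset (hours studied, hours slept, pass/fail) is made up, and the "learning" is a simple 1-nearest-neighbour lookup chosen because it needs no libraries.

```python
# Labeled dataset: each input (hours studied, hours slept) is paired
# with a known output label. The values are made up for illustration.
labeled_data = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 6.0), "pass"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(new_input, training_pairs):
    """Return the label of the training input closest to new_input."""
    _, closest_label = min(
        training_pairs,
        key=lambda pair: squared_distance(pair[0], new_input),
    )
    return closest_label


# Predict the output for inputs that were not in the labeled data.
print(predict((7.0, 7.0), labeled_data))
print(predict((1.5, 4.5), labeled_data))
```

The labels decide what the system can predict: with only "pass" and "fail" in the data, no other answer is possible, which is why the labeled data has to suit the goal.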
Supervised Learning (Classification & Regression)
Classification is a supervised machine learning
method where the model tries to predict the correct
label of given input data. In classification, the model
is fully trained using the training data and then
evaluated on test data before being used to make
predictions on new, unseen data (Source:
https://www.datacamp.com/blog/classification-machine-learning).
Regression is a supervised machine learning
technique used to predict continuous values.
The goal of a regression algorithm is to fit
a best-fit line or curve through the data. A trained
regression model is commonly assessed in terms of
variance, bias and error, where error is measured with
metrics such as mean squared error (MSE) or R².
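As a rough illustration of a best-fit line and its error, the sketch below fits a straight line with NumPy and computes two common error measures. The data is synthetic (an assumed relationship y ≈ 2x + 1 plus noise), not the lab dataset.

```python
import numpy as np

# Hypothetical data: y is roughly 2*x + 1 with a little random noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Fit a degree-1 polynomial (a straight line) by least squares.
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = slope * x + intercept

# Mean squared error: average squared gap between actual and predicted.
mse = np.mean((y - y_pred) ** 2)

# R^2: share of the variance in y explained by the fitted line.
r2 = 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"mse={mse:.3f} r2={r2:.3f}")
```

A low MSE and an R² close to 1 mean the line follows the data closely; the fitted slope and intercept should land near the values used to generate the data.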
Type            Output Type                    Problem Nature
Classification  predicts a categorical value   used to separate data into classes
Regression      predicts a continuous value    used to predict a value
Classification Code (OITI Lab Practice)
Simple Classification
Import Packages (Libraries)
Define and Load Dataset
Create a Model, Train and Evaluate it
Get the Predictions from the Trained Model
Compare the Actual and Predicted Data and write a Conclusion
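The steps above might look like the following with scikit-learn. The lab's own dataset and model are not shown in this handout, so the Iris dataset and DecisionTreeClassifier used here are assumptions chosen only because they ship with scikit-learn.

```python
# Import packages (libraries)
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Define and load the dataset (Iris is bundled, no download needed).
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows as test data the model never trains on.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Create a model and train it on the labeled training data.
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Get the predictions for the test inputs.
y_pred = model.predict(X_test)

# Evaluate: compare the actual and predicted labels.
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on test data: {accuracy:.3f}")
for actual, predicted in list(zip(y_test, y_pred))[:5]:
    print(f"actual={actual} predicted={predicted}")
```

The conclusion is then written from the comparison: the accuracy says how often the predicted label matched the actual one on data the model had not seen.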
Regression Model (SL OITI Lab Practice)
Import all Packages
Load Dataset
Read Dataframe
Understand the column list
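A minimal sketch of these steps with pandas follows. The lab's data file is not named in the handout, so a small in-memory CSV with made-up columns (id, area, rooms, price) stands in for it.

```python
import io

import pandas as pd

# Stand-in for the lab's data file; column names and values are made up.
csv_text = """id,area,rooms,price
1,50,2,100
2,80,3,160
3,65,2,130
4,120,4,240
"""

# pd.read_csv accepts a file path or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())            # read the first rows of the DataFrame
print(df.columns.tolist())  # understand the column list
print(df.shape)             # (number of rows, number of columns)
```

With a real file, the `io.StringIO(...)` argument would be replaced by the path to the file.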
Regression Model (Cont’d)
Describe the dataframe
Check Null Values
Result: no column has a null value
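Describing the DataFrame and checking for nulls could be done as sketched below. The DataFrame is hypothetical and, unlike the lab data, deliberately contains one missing value so that the null check has something to report.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame standing in for the lab dataset.
df = pd.DataFrame({
    "area": [50.0, 80.0, 65.0, 120.0],
    "rooms": [2.0, 3.0, np.nan, 4.0],
    "price": [100.0, 160.0, 130.0, 240.0],
})

# Summary statistics (count, mean, std, min, quartiles, max) per column.
print(df.describe())

# Number of null (missing) values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# True only when no column contains a null value.
no_nulls = not df.isnull().values.any()
print("No nulls:", no_nulls)
```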
Regression Model (Cont’d)
Drop the columns you don't need for processing and start cleaning
Check null values again
Still no column has a null value
Find outliers with a histogram: any bars separated from the main
distribution are outliers.
E.g. the point at (0, 0) is an outlier and should be either removed or
have its influence minimized
Visualize the outliers with a scatter plot
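One way to make "bars separated from the main distribution" concrete is to compute the histogram as numbers and look for occupied bins whose neighbours are empty. This is a sketch under assumptions: the columns and values are made up, the bin count of 6 is arbitrary, and a plotting library would draw the same bins as bars.

```python
import numpy as np
import pandas as pd

# Hypothetical data: most values sit between 40 and 120,
# plus a stray point at (0, 0).
df = pd.DataFrame({
    "id": range(1, 9),
    "area": [50, 80, 65, 120, 95, 70, 0, 110],
    "price": [100, 160, 130, 240, 190, 140, 0, 220],
})

# Drop a column that is not needed for processing.
df = df.drop(columns=["id"])

# Histogram as numbers: counts per bin and the bin edges.
counts, edges = np.histogram(df["area"], bins=6)
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"{left:6.1f} to {right:6.1f}: {'#' * int(count)}")

# A bar is "separated" when it is occupied and every neighbour is empty.
isolated_bins = [
    i for i, c in enumerate(counts)
    if c > 0
    and (i == 0 or counts[i - 1] == 0)
    and (i == len(counts) - 1 or counts[i + 1] == 0)
]

# Map each row to its bin and list the rows that fall in isolated bins.
bin_index = np.clip(np.digitize(df["area"], edges) - 1, 0, len(counts) - 1)
outliers = df[np.isin(bin_index, isolated_bins)]
print(outliers)
```

The result depends on the number of bins, so it is worth confirming any flagged rows visually, for example on a scatter plot of the two columns.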
Regression Model (Cont’d)
Clean null values from the data and assign the result
to a different DataFrame variable
Check null values again and confirm that every column has the same
number of rows with data. Here all columns have 768 rows.
Apply EDA to check whether the outliers cluster together, then
describe the DataFrame again
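A small sketch of this cleaning step, assuming pandas and a made-up DataFrame (the lab data has 768 rows; this stand-in has 5):

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with two missing values.
df = pd.DataFrame({
    "area": [50.0, 80.0, np.nan, 120.0, 95.0],
    "price": [100.0, 160.0, 130.0, np.nan, 190.0],
})

# Keep the original untouched; store the cleaned rows in a new variable.
df_clean = df.dropna().reset_index(drop=True)

# Check null values again: every column should report 0.
print(df_clean.isnull().sum())

# Non-null rows per column; all columns should show the same number.
row_counts = df_clean.count()
print(row_counts)
all_equal = row_counts.nunique() == 1
print("All columns have equal rows:", all_equal)

# Describe the cleaned DataFrame again.
print(df_clean.describe())
```

Assigning to a new variable means the raw data stays available for comparison if the cleaning turns out to remove too much.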
Regression Model (Cont’d)
Remove or isolate the outliers from the dataset that will be split into
training and test data.
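Removing or isolating outliers can be done with a boolean mask, as in this sketch. The column names and the rule (both values equal to 0, matching the (0, 0) example) are assumptions; the right rule depends on the actual data.

```python
import pandas as pd

# Hypothetical cleaned DataFrame with one stray point at (0, 0).
df_clean = pd.DataFrame({
    "area": [50, 80, 0, 120, 95],
    "price": [100, 160, 0, 240, 190],
})

# Boolean mask marking the rows treated as outliers.
is_outlier = (df_clean["area"] == 0) & (df_clean["price"] == 0)

df_outliers = df_clean[is_outlier]   # isolated for inspection
df_model = df_clean[~is_outlier]     # used for training and testing

print(f"{len(df_outliers)} outlier row(s) isolated")
print(f"{len(df_model)} row(s) kept for modelling")
```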
Import the required packages to start model building.
Here we have used:
• DecisionTreeRegressor
• Linear Regression
• XGBRegressor
We will select the model with the highest score.
Regression Model (Cont’d)
Install any missing packages right away if they
cannot be imported
Split the dataset into training and test sets
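The split itself is typically a single call to scikit-learn's `train_test_split`. In this sketch the features `X` and target `y` are synthetic, and the 80/20 ratio and `random_state` are assumptions, since the handout does not state the values used in the lab.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical features X (100 rows, 3 columns) and target y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Keep 20% of the rows aside as test data; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```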
Regression Model (Cont’d)
Score of our model with DecisionTreeRegressor is: 0.9998
Score of our model with Linear Regression is: 1.0000
Score of our model with XGBRegressor is: 0.9995
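Scores like these come from each regressor's `score` method, which returns R² on the data it is given. The sketch below trains the same kinds of model on synthetic data and picks the highest score; the numbers it prints will differ from the lab results. XGBRegressor comes from the separate `xgboost` package, so it is included only if that package is installed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data with a linear relationship plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=42),
    "LinearRegression": LinearRegression(),
}
try:
    # Optional: only available when the xgboost package is installed.
    from xgboost import XGBRegressor
    models["XGBRegressor"] = XGBRegressor(random_state=42)
except ImportError:
    pass

# Train each model and record its R^2 score on the test data.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"Score of our model with {name} is: {scores[name]:.4f}")

# Select the model with the highest score.
best_name = max(scores, key=scores.get)
print("Best model:", best_name)
```

When a score comes out at or very near 1.0000, it is worth checking that no feature column is simply a copy or direct function of the target before trusting the result.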
THANK YOU!
