Machine Learning
Dr Debabrata Swain
Assistant Professor (Senior Grade)
CSE Department
Pandit Deendayal Energy University
Gandhinagar, Gujarat
Machine Learning
• Machine learning is a method of data analysis that automates analytical
model building. It is a branch of artificial intelligence based on the idea
that systems can learn from data, identify patterns and make decisions with
minimal human intervention.
Traditional Computing vs Machine Learning
AI vs ML vs DL vs DS
Types of Learning
• Supervised
• Unsupervised
Supervised Learning
Unsupervised Learning
Different Classification Algorithms
• Logistic Regression
• K-NN Algorithm
• Decision Tree
• Random Forest
Logistic Regression
• It is a machine learning algorithm used for
binary as well as multiclass classification.
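As a minimal sketch of the binary case, assuming the standard sigmoid formulation (the slides do not give a formula): a weighted sum of the features is passed through the sigmoid to get a probability, which is then thresholded at 0.5. The function names, weights and bias below are hypothetical and untrained, chosen only for illustration.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_binary(features, weights, bias, threshold=0.5):
    """Return (probability of class 1, predicted label) for one record."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    p = sigmoid(z)
    return p, int(p >= threshold)

# Hypothetical, untrained parameters for a two-feature record.
weights = [0.8, -0.4]
bias = -0.2
prob, label = predict_binary([1.0, 2.0], weights, bias)
print(prob, label)
```

For more than two classes, the same idea is commonly extended with a one-vs-rest scheme or a softmax output.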
K-NN Algorithm
• It is a classification algorithm.
• It needs a set of examples whose correct
groups are already known, called reference
records.
• To classify an unknown point, it computes the
distance from that point to every reference
record and picks the K nearest ones.
• The class held by the majority of those K
neighbours is assigned to the unknown record.
Name Age Gender Sport
Ajay 32 M Football
Mark 40 M Neither
Sara 16 F Cricket
Zaira 34 F Cricket
Sachin 55 M Neither
Rahul 40 M Cricket
Pooja 20 F Neither
Smith 15 M Cricket
Laxmi 55 F Football
Michael 15 M Football
• Before training the model, we have to
preprocess the data.
• The Gender column holds discrete,
non-numeric data.
• We have to convert it to numeric categorical
values: Male = 0, Female = 1.
Name Age Gender Sport
Ajay 32 0 Football
Mark 40 0 Neither
Sara 16 1 Cricket
Zaira 34 1 Cricket
Sachin 55 0 Neither
Rahul 40 0 Cricket
Pooja 20 1 Neither
Smith 15 0 Cricket
Laxmi 55 1 Football
Michael 15 0 Football
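The encoding step can be sketched in a few lines of plain Python. The records are the ones from the table; the names `records`, `GENDER_CODES` and `encoded` are assumptions made for this illustration.

```python
# Reference records from the slide: (name, age, gender, sport).
records = [
    ("Ajay", 32, "M", "Football"),
    ("Mark", 40, "M", "Neither"),
    ("Sara", 16, "F", "Cricket"),
    ("Zaira", 34, "F", "Cricket"),
    ("Sachin", 55, "M", "Neither"),
    ("Rahul", 40, "M", "Cricket"),
    ("Pooja", 20, "F", "Neither"),
    ("Smith", 15, "M", "Cricket"),
    ("Laxmi", 55, "F", "Football"),
    ("Michael", 15, "M", "Football"),
]

# Mapping used on the slide: Male = 0, Female = 1.
GENDER_CODES = {"M": 0, "F": 1}

# Replace the gender letter with its numeric code; other columns are unchanged.
encoded = [
    (name, age, GENDER_CODES[gender], sport)
    for name, age, gender, sport in records
]

for row in encoded:
    print(*row)
```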
• Assume K = 3 (number of neighbours).
• What will be the class of the test record?
Name: Anjelina
Age: 5
Gender: Female (1)
• To find the distance between two records, the Euclidean distance
formula is used.
• Distance between Ajay and Anjelina:
$\sqrt{(5 - 32)^2 + (1 - 0)^2} = \sqrt{730} \approx 27.02$
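The same calculation can be checked with a short sketch. The helper name `euclidean` and the (age, gender) tuple layout are assumptions for this example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

anjelina = (5, 1)   # (age, gender) of the test record
ajay = (32, 0)      # (age, gender) of the reference record

d = euclidean(anjelina, ajay)
print(round(d, 2))  # 27.02
```

Age differences here are far larger than the 0/1 gender difference, so age dominates the distance. Feature scaling is a common remedy, although the slides do not apply it.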
Name Age Gender Sport Distance
Ajay 32 0 Football 27.02
Mark 40 0 Neither 35.01
Sara 16 1 Cricket 11
Zaira 34 1 Cricket 29
Sachin 55 0 Neither 50.01
Rahul 40 0 Cricket 35.01
Pooja 20 1 Neither 15
Smith 15 0 Cricket 10.05
Laxmi 55 1 Football 50
Michael 15 0 Football 10.05
The 3 nearest records are:
• Smith (10.05): Cricket
• Michael (10.05): Football
• Sara (11): Cricket
Anjelina is assigned the class Cricket (majority).
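The whole worked example can be reproduced with a minimal K-NN sketch in plain Python. The function name `knn_predict` and the tuple layout are assumptions for illustration. Ties in distance are broken alphabetically by name here, which is an arbitrary choice.

```python
import math
from collections import Counter

# Encoded reference records: (name, age, gender, sport).
TRAINING = [
    ("Ajay", 32, 0, "Football"),
    ("Mark", 40, 0, "Neither"),
    ("Sara", 16, 1, "Cricket"),
    ("Zaira", 34, 1, "Cricket"),
    ("Sachin", 55, 0, "Neither"),
    ("Rahul", 40, 0, "Cricket"),
    ("Pooja", 20, 1, "Neither"),
    ("Smith", 15, 0, "Cricket"),
    ("Laxmi", 55, 1, "Football"),
    ("Michael", 15, 0, "Football"),
]

def knn_predict(query, training, k=3):
    """Return (majority class, k nearest records) for a (age, gender) query."""
    # Distance from the query to every reference record, nearest first.
    ranked = sorted(
        (math.dist(query, (age, gender)), name, sport)
        for name, age, gender, sport in training
    )
    nearest = ranked[:k]
    # Majority vote over the classes of the k nearest records.
    votes = Counter(sport for _, _, sport in nearest)
    return votes.most_common(1)[0][0], nearest

label, nearest = knn_predict((5, 1), TRAINING, k=3)
for distance, name, sport in nearest:
    print(name, round(distance, 2), sport)
print("Predicted class:", label)  # Cricket
```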
Decision Tree
• When a dataset is linearly separable, we can
use logistic regression to draw the decision
boundary for classification.
• But when the data points are not linearly
separable, or the boundary between classes is
complex, logistic regression alone does not
classify them well.
• For such problems, the decision tree
algorithm is used to draw several decision
boundaries.
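A minimal sketch of why several boundaries help, using an XOR-style pattern as a hypothetical dataset (it is not from the slides). No single straight line separates classes A and B here, but a hand-built tree that asks two threshold questions in sequence classifies every point. In practice the split thresholds are learned from data rather than fixed by hand.

```python
# Hypothetical XOR-style points: ((x1, x2), class).
POINTS = [((0, 0), "A"), ((1, 1), "A"), ((0, 1), "B"), ((1, 0), "B")]

def tree_predict(x1, x2):
    """A hand-built two-level decision tree with axis-aligned splits."""
    if x1 <= 0.5:
        # Left branch: the class depends on x2.
        return "A" if x2 <= 0.5 else "B"
    # Right branch: the same x2 question, with the classes swapped.
    return "B" if x2 <= 0.5 else "A"

correct = sum(tree_predict(*point) == label for point, label in POINTS)
accuracy = correct / len(POINTS)
print(accuracy)  # 1.0
```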
Random Forest
• It is a collection of different decision trees.
• Each decision tree is trained on a random
sample of records drawn from the full dataset.
• This gives us a number of classifiers. To find
the class of an unknown record, every
individual tree gives its own decision, and a
voting step then selects the class with the
majority of votes.
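The sample-then-vote idea can be sketched as follows. This is an illustration under stated assumptions, not a full random forest: each "tree" is a one-split stump, the toy data and the labels "low" and "high" are hypothetical, and random samples are drawn with replacement. A real random forest also randomizes the features considered at each split.

```python
import random
from collections import Counter

def stump_predict(threshold, x):
    """One-split tree: 'low' at or below the threshold, 'high' above it."""
    return "low" if x <= threshold else "high"

def train_stump(sample):
    """Pick the threshold (from the sampled values) with the fewest errors."""
    candidates = sorted({x for x, _ in sample})

    def errors(threshold):
        return sum(stump_predict(threshold, x) != y for x, y in sample)

    return min(candidates, key=errors)

def forest_predict(thresholds, x):
    """Each stump votes; the class with the most votes wins."""
    votes = Counter(stump_predict(t, x) for t in thresholds)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (value, label).
DATA = [(10, "low"), (15, "low"), (20, "low"),
        (40, "high"), (50, "high"), (60, "high")]

rng = random.Random(0)  # fixed seed so the run is repeatable
# Train 5 stumps, each on its own random sample of the records.
thresholds = [train_stump(rng.choices(DATA, k=len(DATA))) for _ in range(5)]

print(forest_predict(thresholds, 5))
print(forest_predict(thresholds, 100))
```

An odd number of trees is used so that a two-class vote cannot end in a tie.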