Supervised learning uses a training dataset to learn a target function that can predict class labels or attribute values. This document introduces supervised learning and classification: the types of supervised learning problems (classification and regression), common classification algorithms (k-nearest neighbors, naive Bayes, decision trees, random forests, and support vector machines), how to implement them with scikit-learn, how to evaluate classification models using accuracy, and finally linear and multiple linear regression.
2. What is Machine Learning?
❑ Humans learn from past experience.
❑ A computer does not have "experiences".
❑ A computer system learns from data, which represent some "past experiences" of an application domain.
❑ Our focus: learn a target function that can be used to predict the values of a class attribute, e.g. whether a loan application is approved or not approved, and high-risk or low-risk.
❑ This task is commonly called supervised learning, classification, or inductive learning.
3. Types of Learning
❑Supervised Learning
o Classification
o Regression
❑Unsupervised Learning
o Clustering
❑Reinforcement Learning
4. Types of supervised Learning
❑ Classification:
o A classification problem is when the output variable is a category, such as "red" or "blue", or "disease" and "no disease".
❑ Regression:
o A regression problem is when the output variable is a real value, such as "dollars" or "weight".
5. Supervised Learning Process
❑ Learning (training):
o Learn the model from known (labeled) data.
❑ Testing:
o Test the model on unseen data.
❑ Accuracy:
o Accuracy = no. of correct classifications / total no. of test cases
[Diagram. Step 1: Training: training data is fed to a learning algorithm to produce a model. Step 2: Testing: testing data is run through the model to measure accuracy.]
6. Classification example
❑ A loan providing company receives thousands of applications
for new loans.
❑ Each application contains information about the applicant:
o Age
o Marital status
o Annual salary
o Outstanding debts
o Credit rating
o etc.
❑ Problem: decide whether an application should be approved, i.e., classify applications into two categories: approved and not approved.
9. An example
❑Data: Loan application data
❑Task: Predict whether a loan should be approved
or not.
❑Performance measure: Accuracy.
❑No learning: classify all future applications (test
data) to the majority class (i.e., Yes):
o Accuracy = 9/15 = 60%.
❑We can do better than 60% with learning.
10. Evaluating classification methods
❑ Predictive accuracy (see the sketch below):
o Accuracy = no. of correct classifications / total no. of test cases
❑ Efficiency:
o time to construct the model
o time to use the model
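❑ A minimal sketch of this evaluation in scikit-learn (the toy data here is invented purely for illustration): train a model, classify unseen test cases, then compute accuracy as correct classifications divided by total test cases.

# Minimal sketch: train on toy data, then measure accuracy on held-out data
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
X_train = [[0], [1], [2], [3]]
y_train = [0, 0, 1, 1]
X_test, y_test = [[1], [2]], [0, 1]
model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
y_pred = model.predict(X_test)                       # classify unseen data
print("Accuracy:", accuracy_score(y_test, y_pred))   # 1.0 on this toy set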
11. Conclusion
❑ Applications of supervised learning are in almost any field or
domain.
❑ There are numerous classification techniques.
o Bayesian networks
o K- Nearest Neighbors
o Decision Tree Classification
o Fuzzy classification
❑ This large number of methods shows the importance of classification and its wide applicability.
❑ It remains an active research area.
13. What is Classification?
❑ Classification is a supervised machine learning approach.
❑ The computer uses training data for learning and uses this learning to classify new observations.
❑ Classification can be:
o Binary classification: spam or not spam, male or female.
o Multiclass classification: fruits, colors.
15. K-Nearest Neighbor
❑ The k-nearest-neighbors algorithm is a supervised classification technique based on similarity.
❑ KNN assumes that similar things exist close to each other.
❑ The algorithm takes a bunch of labeled points and uses them to learn how to label other points.
❑ To label a new point, it looks at the labeled points closest to that new point (its nearest neighbors).
❑ Closeness is typically expressed in terms of a distance (dissimilarity) function.
❑ Once it has checked the k nearest neighbors, it assigns the label that the majority of those neighbors have.
16. KNN working Steps
❑ Calculate the distance from the new test point to every labeled training point.
❑ Find the k closest neighbors of the new test point.
❑ Take a majority vote among the neighbors' labels (sketched below).
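❑ The three steps above can be sketched in plain Python (a toy illustration with made-up points, not the scikit-learn implementation used later):

import math
from collections import Counter

def knn_predict(train_points, train_labels, new_point, k=3):
    # Step 1: distance from the new point to every labeled point
    dists = [math.dist(p, new_point) for p in train_points]
    # Step 2: indices of the k closest neighbors
    nearest = sorted(range(len(dists)), key=lambda i: dists[i])[:k]
    # Step 3: majority vote among the neighbors' labels
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict([(0, 0), (0, 1), (5, 5), (6, 5)], ['A', 'A', 'B', 'B'], (1, 1)))  # 'A'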
18. Dataset
❑ Let's first create our own dataset. We need two kinds of columns in the data: features and a target label. Two types of column are required because of the "supervised nature of the KNN algorithm".
❑ In this dataset, there are two features (weather and temperature) and one label (play).
19. Define dataset
Weather Temp Play
Sunny Hot No
Sunny Hot Yes
Overcast Hot Yes
Rainy Mild Yes
Rainy Cool No
Rainy Cool Yes
Overcast Cool No
Sunny Mild Yes
Sunny Cool Yes
Rainy Mild Yes
Sunny Mild Yes
Overcast Mild Yes
Overcast Hot Yes
Rainy Mild No
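❑ Slides 20–22 are omitted here; they define these columns as Python lists and encode the strings as numbers with scikit-learn's LabelEncoder, exactly as the Naive Bayes implementation later in this deck does. A minimal sketch of the assumed setup (feature lists copied from that later implementation):

# Feature and label lists
weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']
temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']
play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']
# Converting string labels into numbers
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
weather_encoded=le.fit_transform(weather)
temp_encoded=le.fit_transform(temp)
label=le.fit_transform(play)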
23. Code implementation in scikit-learn (cont.)
❑ #combining weather and temp into single list of tuples
❑ features=list(zip(weather_encoded,temp_encoded))
❑ print(features)
❑ #Prepare Model instance
❑ from sklearn.neighbors import KNeighborsClassifier
❑ model = KNeighborsClassifier(n_neighbors=3)
❑ # Train the model using the training sets
❑ model.fit(features,label)
❑ #Predict Output
❑ predicted= model.predict([[0,2]]) # 0:Overcast, 2:Mild
❑ print(predicted)
24. Advantages of KNN
❑ It is extremely easy to implement.
❑ KNN has no explicit training phase, which makes it much faster to set up than algorithms that require training, e.g. SVM, linear regression, etc.
❑ Since the algorithm requires no training before making predictions, new data can be added seamlessly.
❑ Only two parameters are required to implement KNN: the value of k and the distance function (e.g. Euclidean or Manhattan).
25. Disadvantages of KNN
❑ The KNN algorithm doesn't work well with high-dimensional data, because with a large number of dimensions it becomes difficult for the algorithm to calculate distance in each dimension.
❑ The KNN algorithm has a high prediction cost for large datasets, because the cost of calculating the distance between the new point and each existing point grows with the dataset.
❑ Finally, the KNN algorithm doesn't work well with categorical features, since it is difficult to define distances between dimensions with categorical features.
26. Naive Bayes Classification Base
❑ It uses Bayes' theorem of probability to predict an unknown class/label.
❑ The Naive Bayes classifier assumes that the effect of a particular feature on a class is independent of the other features.
o For example, a loan applicant is desirable or not depending on his/her income, previous loan and transaction history, age, and location.
o Even if these features are interdependent, they are still considered independently.
o This assumption simplifies computation, and that is why the method is considered "naive".
27. Approve a Loan
❑ A bank has received a loan application, and we want to predict whether the bank will approve it or not.
❑ Approval will be decided on the basis of the independent attributes specified in the application form.
❑ The income, previous loan, transaction history, age, and location information specified in the application form are treated as independent attributes.
❑ Now we calculate a separate probability for each attribute:
o probability of approval or rejection of the loan given income,
o probability of approval or rejection of the loan given previous loans,
o probability of approval or rejection of the loan given age,
o probability of approval or rejection of the loan given location.
❑ Naive Bayes multiplies the above probabilities to forecast approval or rejection of the new loan application.
28. Naïve Bayes Classification Base (cont.)
❑ P(c|x) = P(x|c) * P(c) / P(x)
❑ Where,
❑ P(c|x) is the posterior probability of class c given the predictor (features).
❑ P(c) is the prior probability of the class.
❑ P(x|c) is the likelihood, i.e., the probability of the predictor given the class.
❑ P(x) is the prior probability of the predictor.
30. How does the Gaussian Naive Bayes classifier work?
❑ Consider an example of weather conditions and playing sports.
❑ You need to calculate the probability of playing sports.
❑ Now, you need to classify whether players will play or not, based on the weather condition.
31. How the Naive Bayes classifier works (cont.)
❑ The Naive Bayes classifier calculates the probability of an event in the following steps:
❑ Calculate the prior probability for the given class labels:
o P(play)
o P(not play)
❑ Find the likelihood probability of each attribute value for each class:
o P(Hot|play) and P(Hot|not play)
o P(Cold|play) and P(Cold|not play)
❑ Put these values into the Bayes formula and calculate the posterior probability.
❑ See which class has the higher posterior probability; the input is assigned to that class.
32. Dataset
Weather Play
Sunny No
Sunny Yes
Overcast Yes
Rainy Yes
Rainy No
Rainy Yes
Overcast No
Sunny Yes
Sunny Yes
Rainy Yes
Sunny Yes
Overcast Yes
Overcast Yes
Rainy No
34. Prior Probability of class
Weather    No   Yes   Total   P(Weather)
Sunny       1    4      5     5/14 = 0.35
Overcast    1    3      4     4/14 = 0.29
Rainy       2    3      5     5/14 = 0.35
Total       4   10     14
P(Class)  4/14 = 0.29   10/14 = 0.71
35. Likelihood (Conditional Probability)
Weather    No   Yes   P(Weather|No)   P(Weather|Yes)
Sunny       1    4     1/4 = 0.25      4/10 = 0.4
Overcast    1    3     1/4 = 0.25      3/10 = 0.3
Rainy       2    3     2/4 = 0.5       3/10 = 0.3
Total       4   10     P(No) = 4/14 = 0.29   P(Yes) = 10/14 = 0.71
36. Probability of playing when weather is overcast
❑ Equation:
o P(Yes|Overcast)=P(Overcast|Yes)*P(Yes)/P(Overcast)
❑ Calculate Prior Probabilities:
o P(Overcast) = 4/14 = 0.29
o P(Yes)= 10/14 = 0.71
❑ Calculate the likelihood:
o P(Overcast|Yes) = 3/10 = 0.3
❑ Put the prior and likelihood into the equation:
o P(Yes|Overcast) = 0.3 * 0.71 / 0.29 = 0.7344 (higher)
37. Probability of not playing when weather is overcast
❑ Equation:
o P(No|Overcast)=P(Overcast|No)*P(No)/P(Overcast)
❑ Calculate Prior Probabilities:
o P(Overcast) = 4/14 = 0.29
o P(No)= 4/14 = 0.29
❑ Calculate the likelihood:
o P(Overcast|No) = 1/4 = 0.25
❑ Put the prior and likelihood into the equation:
o P(No|Overcast) = 0.25 * 0.29 / 0.29 = 0.25 (lower)
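❑ A quick sketch that reproduces the two hand calculations above in Python:

# P(Yes|Overcast) and P(No|Overcast) from the frequency tables
p_overcast, p_yes, p_no = 4/14, 10/14, 4/14
print(round((3/10) * p_yes / p_overcast, 4))  # 0.75 (0.7344 above comes from rounded inputs) -> play
print(round((1/4) * p_no / p_overcast, 4))    # 0.25 -> don't play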
38. Implementation of Naive Bayes algorithm:
❑ # Assigning features and label variables
❑ weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']
❑ temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']
❑ play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']
39. Implementation of Naive Bayes algorithm (cont.)
❑ # Import LabelEncoder
o from sklearn import preprocessing
❑ #creating labelEncoder
o le = preprocessing.LabelEncoder()
❑ # Converting string labels into numbers.
o weather_encoded=le.fit_transform(weather)
o print("Weather:",weather_encoded)
❑ # Converting string labels into numbers
o temp_encoded=le.fit_transform(temp)
o print("Temp:",temp_encoded)
o label=le.fit_transform(play)
o print("Play:",label)
40. Implementation of Naive Bayes algorithm (cont.)
❑ #Combining weather and temp into single list of tuples
o features=list(zip(weather_encoded,temp_encoded))
o print("Features:",features)
❑ #Import Gaussian Naive Bayes model
o from sklearn.naive_bayes import GaussianNB
❑ #Create a Gaussian Classifier
o model = GaussianNB()
❑ # Train the model using the training sets
o model.fit(features,label)
❑#Predict Output: 0:Overcast, 2:Mild
o predicted= model.predict([[0,2]])
o print ("Predicted Value:", predicted)
41. Multinomial Naive Bayes algorithm:
❑ This machine learning algorithm is used for text data classification.
❑ If the features are counts, e.g. the number of occurrences of each word in a document, then the multinomial naive Bayes algorithm is the one to use.
42. How does the Naive Bayes algorithm work?
❑ Let's consider an example: classify a review as positive or negative.
❑ Training dataset:

Text                                     Review
I like the movie                         Positive
Its a good movie. Nice Story             Positive
Nice songs. But sadly a boring ending.   Negative
Overall nice movie                       Positive
Sad, boring movie                        Negative
43. ❑ We want to classify whether the text "overall liked the movie" is a positive review or a negative review. We have to calculate:
❑ P(positive | overall liked the movie): the probability that the tag of the sentence is positive.
❑ P(negative | overall liked the movie): the probability that the tag of the sentence is negative.
❑ Before that, we first apply stopword removal and stemming to the text.
44. Removing Stopwords & Stemming
❑ Removing stopwords: these are common words that don't really add anything to the classification, such as "able", "either", "else", "ever", and so on.
❑ Stemming: stemming takes out the root of the word. A stemming algorithm reduces the words
o "chocolates", "chocolaty", "Choco" to the root word "chocolate",
o and "retrieval", "retrieved", "retrieves" to the stem "retrieve".
45. Feature Engineering:
❑ The important part is to find the features in the data that make machine learning algorithms work.
❑ In this case, we have text. We need to convert this text into numbers that we can do calculations on.
❑ We use word frequencies: that is, we treat every document as a set of the words it contains.
❑ Our features will be the counts of each word.
46. Now Calculate Probability
❑ In our case, we have to compare:
o P(positive | overall liked the movie)
o P(negative | overall liked the movie)
❑ Since our classifier only has to find out which tag has the bigger probability, we can discard the divisor, which is the same for both tags, and compare:
o P(overall liked the movie|positive) * P(positive)
o P(overall liked the movie|negative) * P(negative)
47. ❑ There’s a problem though: “overall liked the movie” doesn’t
appear in our training dataset, so the probability is zero. Here, we
assume the ‘naive’ condition that every word in a sentence is
independent of the other ones. This means that now we look at
individual words.
❑ We can write this as:
o P(overall liked the movie) = P(overall) * P(liked) * P(the) * P(movie)
❑ The next step is just applying the Bayes theorem:
o P(overall liked the movie| positive) = P(overall | positive) * P(liked |
positive) * P(the | positive) * P(movie | positive)
❑ And now, these individual words actually show up several times in our training data, so we can calculate their probabilities!
48. The Prior Probabilities
❑ P(positive) = 3/5 = 0.6.
❑ P(negative) = 2/5 = 0.4.
❑ Then, calculating P(overall | positive) means taking the number of times the word "overall" appears in positive texts, plus 1, divided by the total number of words in positive texts plus the total number of unique words in all reviews (the "+1" is Laplace smoothing, explained below).
o Total words in positive = 13.
o Total words in negative = 10.
o Total unique words in all = 15.
49. Calculated Likelihoods
❑ Therefore,
o P(overall | positive) = (1+1)/(13+15) = 0.07142
o P(liked | positive) = (1+1)/(13+15) = 0.07142
o P(the | positive) = (1+1)/(13+15) = 0.07142
o P(movie | positive) = (3+1)/(13+15) = 0.1428
❑ And,
o P(overall | negative) = (0+1)/(10+15) = 0.04
o P(liked | negative) = (0+1)/(10+15) = 0.04
o P(the | negative) = (0+1)/(10+15) = 0.04
o P(movie | negative) = (1+1)/(10+15) = 0.08
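❑ Slides 51–52 are omitted here; to finish the classification, we multiply these likelihoods by the priors and compare. A short sketch:

# Score each class for "overall liked the movie" (numbers from the slides above)
pos = 0.6 * 0.07142 * 0.07142 * 0.07142 * 0.1428   # ~3.1e-05
neg = 0.4 * 0.04 * 0.04 * 0.04 * 0.08              # ~2.0e-06
print("positive" if pos > neg else "negative")      # positive wins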
50. Laplace smoothing
❑ If a word never occurs with a class in the training data, its probability comes out as zero, which wipes out the whole product. Laplace smoothing fixes this:
❑ We add 1 to every count so it is never zero. To balance this, we add the number of possible words to the divisor, so the ratio will never be greater than 1.
❑ In our case, the total count of unique possible words is 15.
53. Implementation of Multinomial Naive Bayes algorithm:
❑ MultinomialNB implements the naive Bayes algorithm for multinomially distributed data (a discrete number of possible outcomes),
❑ and is one of the two classic naive Bayes variants used in text classification (where the data are typically represented as word count vectors).
54. Implementation of Multinomial Naive Bayes algorithm:
❑ # Assigning features and label variables
o import numpy as np
o reviews=np.array(['I like the movie',
o     'Its a good movie. Nice Story',
o     'Nice songs. But sadly a boring ending.',
o     'Overall nice movie',
o     'Sad, boring movie'])
o label=["positive","positive","negative","positive","negative"]
o test=np.array(["Overall i like the movie"])
55. Implementation of Multinomial Naive Bayes algorithm (cont.)
❑ # Encode text labels into numbers
o from sklearn import preprocessing
❑ # Creating a labelEncoder
o le = preprocessing.LabelEncoder()
❑ # Converting string labels into numbers
o label_encoded=le.fit_transform(label)
o print("Label:",label_encoded)
56. Implementation of Multinomial Naive Bayes algorithm (cont.)
❑ # Generate counts from text using a vectorizer. There are other vectorizers available, and lots of options you can set.
❑ # This performs our step of computing word counts.
o from sklearn.feature_extraction.text import CountVectorizer
o vectorizer=CountVectorizer(stop_words='english')
o train_features=vectorizer.fit_transform(reviews)
o test_features=vectorizer.transform(test)
o print("Train vocabulary:",vectorizer.vocabulary_)
❑ # Print dimensions of the training and test data
o print("Shape of Train:",train_features.shape)
o print("Shape of Test:",test_features.shape)
57. Implementation of Multinomial Naive Bayes algorithm (cont.)
❑ # Fit a naive Bayes model to the training data.
❑ # This will train the model using the word counts we computed and the existing classifications in the training set.
o from sklearn.naive_bayes import MultinomialNB
o nb = MultinomialNB()
o nb.fit(train_features,label_encoded)
❑ # Now we can use the model to predict classifications for our test features.
o predictions = nb.predict(test_features)
o print(predictions)
58. Bernoulli Naive Bayes:
❑ BernoulliNB implements the naive Bayes training and
classification algorithms for data that is distributed according to
multivariate Bernoulli distributions;
o i.e., there may be multiple features but each one is assumed to be a
binary-valued (boolean) variable.
❑ Therefore, this class requires samples to be represented as
binary-valued feature vectors;
❑ if handed any other kind of data, a BernoulliNB instance may
binarize its input (depending on the binarize parameter).
59. The Bernoulli Trial
❑ A Bernoulli trial is a random experiment that has only two outcomes,
o usually called a "success" and a "failure".
❑ For example, the probability of getting heads (a "success") while flipping a coin is 0.5.
❑ The probability of "failure" is 1 - P (1 minus the probability of success, which also equals 0.5 for a coin toss).
❑ It is a special case of the binomial distribution for n = 1. In other words, it is a binomial distribution with a single trial (e.g. a single coin toss).
60. Implementation of Bernoulli Naive Bayes algorithm (cont.)
❑ # Assigning features and label variables
o import numpy as np
o document=np.array(["Saturn Dealer's Car",
o     "Toyota Car Tercel",
o     "Baseball Game Play",
o     "Pulled Muscle Game",
o     "Colored GIFs Root"])
o label=np.array(["Auto","Auto","Sports","Sports","Computer"])
o test=np.array(["Home Runs Game","Car Engine Noises"])
61. Implementation of Bernoulli Naive Bayes algorithm (cont.)
❑ #Import preprocessing
o from sklearn import preprocessing
❑ #creating labelEncoder
o le = preprocessing.LabelEncoder()
❑ # Converting string labels into numbers
o label_encoded=le.fit_transform(label)
o print("Label:",label_encoded)
62. Implementation of Bernoulli Naive Bayes algorithm (cont.)
❑ # Generate counts from text using a vectorizer. There are other vectorizers available, and lots of options you can set.
❑ # This performs our step of computing word-occurrence counts.
o from sklearn.feature_extraction.text import CountVectorizer
o vectorizer=CountVectorizer(stop_words='english',binary=True)
o train_features=vectorizer.fit_transform(document)
o test_features=vectorizer.transform(test)
o print("Train vocabulary:",vectorizer.vocabulary_)
❑ # Print dimensions of the training and test data
o print("Shape of Train:",train_features.shape)
o print("Shape of Test:",test_features.shape)
63. Implementation of Bernoulli Naive Bayes algorithm (cont.)
❑ # Fit a naive Bayes model to the training data.
❑ # This will train the model using the word-occurrence counts we computed and the existing classifications in the training set.
o from sklearn.naive_bayes import BernoulliNB
o nb=BernoulliNB()
o nb.fit(train_features,label_encoded)
❑ # Now we can use the model to predict classifications for our test features.
o predictions = nb.predict(test_features)
o print("Prediction:",predictions)
64. Advantages Of Naïve Bayes
❑ It is simple, fast, and accurate.
❑ It has a very low computation cost.
❑ It can work efficiently on a large dataset.
❑ It can be used with multiclass prediction problems.
❑ It also performs well on text analytics problems.
❑ When the assumption of independence holds, a Naive Bayes classifier performs better than other models like logistic regression.
65. Disadvantages of naive Bayes
❑ The assumption of independent features. In practice, it is almost impossible for a model to get a set of predictors that are entirely independent.
❑ If there is no training tuple of a particular class for some feature value, this causes a zero posterior probability.
❑ In this case, the model is unable to make a prediction. This problem is known as the Zero Probability/Frequency Problem.
68. What Is Decision Tree?
❑ Decision Tree is a supervised learning algorithm.
❑ It is a tree-like structure used for classification and regression models.
❑ Decision trees can be used for both categorical and numerical data.
o Categorical data represent gender, marital status, etc.,
o while numerical data represent age, temperature, etc.
❑ A decision tree is a tree where:
o each node represents a feature (attribute),
o each link (branch) represents a decision (rule), and
o each leaf represents an outcome (a categorical or continuous value).
69. Reason to choose Decision Tree
❑ A decision tree usually mirrors human thinking while making a decision, so it is easy to understand.
❑ The logic behind a decision tree can be easily understood because it shows a tree-like structure.
70. Terminologies
❑ Root node: the first node of the tree. It represents the entire dataset, which further gets divided into two or more homogeneous sets.
❑ Leaf node: a final node of the tree; the tree cannot be divided further after a leaf node.
❑ Splitting: the process of dividing the decision node/root node into sub-nodes according to the given conditions.
❑ Branch/sub-tree: a tree formed by splitting the tree.
❑ Pruning: the process of removing unwanted branches from the tree.
❑ Parent/child node: the root node of the tree is called the parent node, and the other nodes are called the child nodes.
71. How Does A Decision Tree Work?
❑ It splits the dataset into subsets on the basis of the most significant attribute in the dataset.
❑ How the decision tree identifies this attribute, and how the splitting is done, is decided by an attribute selection measure.
❑ The most significant attribute is selected as the root node.
❑ Splitting is done to form sub-nodes called decision nodes.
❑ The nodes which do not split further are terminal or leaf nodes.
72. Attribute selection measure.
❑ While implementing a decision tree, the main issue is how to select the best attribute for the root node and for the sub-nodes.
❑ To solve this problem there is a technique called the attribute selection measure, or ASM.
❑ There are two popular techniques for ASM:
o Information Gain
o Gini Index
73. Information Gain
❑ It calculates how much information a feature provides us about a class.
❑ According to the value of information gain, we split the node and build the decision tree.
❑ The node/attribute having the highest information gain is split first. It can be calculated using the formula:
o Information Gain = Entropy(S) - [(weighted avg.) * Entropy(each feature)]
❑ Entropy: it specifies the randomness in the data. Entropy can be calculated as:
o Entropy(S) = -P(yes)*log2 P(yes) - P(no)*log2 P(no), where:
❑ S = the total sample set
❑ P(yes) = probability of yes
❑ P(no) = probability of no
74. Gini Index
❑ The Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
❑ An attribute with a low Gini index is preferred over one with a high Gini index.
❑ CART only creates binary splits, and it uses the Gini index to create them.
❑ The Gini index can be calculated using the formula:
o Gini Index = 1 - ∑j (Pj)²
75. Types of decision Trees Algorithms
❑ There are many decision tree algorithms available. Some of them are the following:
❑ ID3
❑ C4.5
❑ CART
❑ etc.
76. Advantages & Disadvantages of DT
Advantages
❑ It follows the same process a human follows in real life to make decisions.
❑ Easy to understand.
❑ It can be very useful for solving decision-related problems.
❑ It helps to think about all the possible outcomes of a problem.
❑ Little need for data cleaning.
Disadvantages
❑ A decision tree can contain lots of layers, which makes it complex.
❑ It may have an overfitting issue, which can be resolved using the Random Forest algorithm.
❑ For more class labels, the computational complexity of the decision tree may increase.
77. Working of CART Algorithm
Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No
78. Gini index:
❑ The Gini index is the metric for classification tasks in CART.
❑ It subtracts the sum of the squared probabilities of each class from 1. We can formulate it as illustrated below:
❑ Gini = 1 - Σ (Pi)², for i = 1 to the number of classes
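❑ A small sketch of this formula in Python (a helper function written here for illustration; it is not part of the original slides):

def gini(counts):
    """Gini = 1 - sum of squared class probabilities."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(round(gini([2, 3]), 2))  # 0.48, matching Outlook=Sunny on the next slide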
79. Select attribute to create Root node
❑ Outlook (weather): Outlook is a nominal feature. It can be Sunny, Overcast, or Rain. The decision counts for the outlook feature are summarized below.
❑ Gini(Outlook=Sunny) = 1 - (2/5)² - (3/5)² = 1 - 0.16 - 0.36 = 0.48
❑ Gini(Outlook=Overcast) = 1 - (4/4)² - (0/4)² = 0
❑ Gini(Outlook=Rain) = 1 - (3/5)² - (2/5)² = 1 - 0.36 - 0.16 = 0.48
❑ Then, we calculate the weighted sum of the Gini indexes for the outlook feature:
❑ Gini(Outlook) = (5/14) x 0.48 + (4/14) x 0 + (5/14) x 0.48
❑ Gini(Outlook) = 0.171 + 0 + 0.171 = 0.342
Outlook Yes No Number of instances
Sunny 2 3 5
Overcast 4 0 4
Rainy 3 2 5
80. Temperature
❑ Similarly, temperature is a nominal feature and can take 3 different values: Cool, Hot, and Mild. Let's summarize the decisions for the temperature feature.
❑ Gini(Temp=Hot) = 1 - (2/4)² - (2/4)² = 0.5
❑ Gini(Temp=Cool) = 1 - (3/4)² - (1/4)² = 1 - 0.5625 - 0.0625 = 0.375
❑ Gini(Temp=Mild) = 1 - (4/6)² - (2/6)² = 1 - 0.444 - 0.111 = 0.445
❑ We calculate the weighted sum of the Gini index for the temperature feature:
❑ Gini(Temp) = (4/14) x 0.5 + (4/14) x 0.375 + (6/14) x 0.445
❑ Gini(Temp) = 0.142 + 0.107 + 0.190 = 0.439

Temperature  Yes  No  Number of instances
Hot           2    2   4
Cool          3    1   4
Mild          4    2   6
81. Humidity
❑ Humidity is a binary feature. It can be High or Normal.
❑ Gini(Humidity=High) = 1 - (3/7)² - (4/7)² = 1 - 0.1837 - 0.3265
❑ Gini(Humidity=High) = 0.49
❑ Gini(Humidity=Normal) = 1 - (6/7)² - (1/7)² = 1 - 0.7347 - 0.0204
❑ Gini(Humidity=Normal) = 0.245
❑ We calculate the weighted sum of the Gini index for the humidity feature:
❑ Gini(Humidity) = (7/14) x 0.49 + (7/14) x 0.245 = 0.367

Humidity  Yes  No  Number of instances
High       3    4   7
Normal     6    1   7
82. Windy
❑ Wind is a binary feature, similar to humidity. It can be Weak or Strong.
❑ Gini(Wind=Weak) = 1 - (6/8)² - (2/8)² = 1 - 0.5625 - 0.0625
❑ Gini(Wind=Weak) = 0.375
❑ Gini(Wind=Strong) = 1 - (3/6)² - (3/6)² = 1 - 0.25 - 0.25
❑ Gini(Wind=Strong) = 0.5
❑ We calculate the weighted sum of the Gini index for the wind feature:
❑ Gini(Wind) = (8/14) x 0.375 + (6/14) x 0.5
❑ Gini(Wind) = 0.428

Wind    Yes  No  Number of instances
Weak     6    2   8
Strong   3    3   6
83. To Make decision tree
❑ Choose the attribute with the lowest Gini index.
❑ Outlook will be the root node because it has the minimum Gini index value. The Overcast subset contains only Yes decisions, so the Overcast branch ends in a leaf.
❑ We apply the same principles to the sub-datasets in the following steps. First, focus on the sub-dataset for the Sunny outlook. We need to find the Gini index scores for the temperature, humidity, and wind features respectively.

Feature      Gini index
Outlook      0.342
Temperature  0.439
Humidity     0.367
Wind         0.428
84. Sub-tree (subset) sunny
Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
11 Sunny Mild Normal Strong Yes
85. Gini of temperature for sunny outlook:
❑ Gini(Outlook=Sunny and Temp=Hot) = 1 - (0/2)² - (2/2)² = 0
❑ Gini(Outlook=Sunny and Temp=Cool) = 1 - (1/1)² - (0/1)² = 0
❑ Gini(Outlook=Sunny and Temp=Mild) = 1 - (1/2)² - (1/2)² = 1 - 0.25 - 0.25 = 0.5
❑ Gini(Outlook=Sunny and Temp) = (2/5) x 0 + (1/5) x 0 + (2/5) x 0.5 = 0.2

Temperature  Yes  No  Number of instances
Hot           0    2   2
Cool          1    0   1
Mild          1    1   2
86. Gini of humidity for sunny Outlook(Weather):
❑ Gini(Outlook=Sunny and Humidity=High) = 1 - (0/3)² - (3/3)² = 0
❑ Gini(Outlook=Sunny and Humidity=Normal) = 1 - (2/2)² - (0/2)² = 0
❑ Gini(Outlook=Sunny and Humidity) = (3/5) x 0 + (2/5) x 0 = 0

Humidity  Yes  No  Number of instances
High       0    3   3
Normal     2    0   2
87. Gini of wind for sunny outlook:
❑ Gini(Outlook=Sunny and Wind=Weak) = 1 - (1/3)² - (2/3)² = 0.444
❑ Gini(Outlook=Sunny and Wind=Strong) = 1 - (1/2)² - (1/2)² = 0.5
❑ Gini(Outlook=Sunny and Wind) = (3/5) x 0.444 + (2/5) x 0.5 = 0.266 + 0.2 = 0.466

Wind    Yes  No  Number of instances
Weak     1    2   3
Strong   1    1   2
88. Decision for sunny outlook:
❑ We've calculated the Gini index scores for each feature when the outlook is Sunny. The winner is humidity because it has the lowest value.
❑ We put humidity at the extension of the Sunny outlook branch because it has the minimum Gini index.
❑ As seen, the decision is always No for high humidity and a Sunny outlook, and always Yes for normal humidity and a Sunny outlook. This branch is over.

Feature      Gini index
Temperature  0.2
Humidity     0
Wind         0.466
89. Now, we need to focus on rain outlook.
Day Outlook Temp. Humidity Wind Decision
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
10 Rain Mild Normal Weak Yes
14 Rain Mild High Strong No
90. Gini of temperature for rain outlook:
❑ Gini(Outlook=Rain and Temp=Cool) = 1 - (1/2)² - (1/2)² = 0.5
❑ Gini(Outlook=Rain and Temp=Mild) = 1 - (2/3)² - (1/3)² = 0.444
❑ Gini(Outlook=Rain and Temp) = (2/5) x 0.5 + (3/5) x 0.444 = 0.466

Temperature  Yes  No  Number of instances
Cool          1    1   2
Mild          2    1   3
91. Gini of humidity for rain outlook:
❑ Gini(Outlook=Rain and Humidity=High) = 1 - (1/2)² - (1/2)² = 0.5
❑ Gini(Outlook=Rain and Humidity=Normal) = 1 - (2/3)² - (1/3)² = 0.444
❑ Gini(Outlook=Rain and Humidity) = (2/5) x 0.5 + (3/5) x 0.444 = 0.466

Humidity  Yes  No  Number of instances
High       1    1   2
Normal     2    1   3
92. Gini of wind for rain outlook:
❑ Gini(Outlook=Rain and Wind=Weak) = 1 - (3/3)² - (0/3)² = 0
❑ Gini(Outlook=Rain and Wind=Strong) = 1 - (0/2)² - (2/2)² = 0
❑ Gini(Outlook=Rain and Wind) = (3/5) x 0 + (2/5) x 0 = 0

Wind    Yes  No  Number of instances
Weak     3    0   3
Strong   0    2   2
93. Decision for rain outlook:
❑ So for the Rain outlook we take the wind feature for splitting, because it has the minimum Gini index.
❑ Put the wind feature on the Rain outlook branch and examine the new sub-datasets.
❑ As seen, the decision is always Yes when the wind is Weak, and always No when the wind is Strong. This branch is over.

Feature      Gini index
Temperature  0.466
Humidity     0.466
Wind         0
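❑ Slides 94–97 are omitted here; they define the four feature columns as Python lists and encode them with LabelEncoder, exactly as the Random Forest and SVM implementations later in this deck do. A minimal sketch of the assumed setup:

# Feature and label lists (the Day 1-14 dataset above)
weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']
temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']
humidity=["High","High","High","High","Normal","Normal","Normal","High","Normal","Normal","Normal","High","Normal","High"]
windy=["Weak","Strong","Weak","Weak","Weak","Strong","Strong","Weak","Weak","Weak","Strong","Strong","Weak","Strong"]
play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']
# Converting string labels into numbers
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
weather_encoded=le.fit_transform(weather)
temp_encoded=le.fit_transform(temp)
windy_encoded=le.fit_transform(windy)
humidity_encoded=le.fit_transform(humidity)
label=le.fit_transform(play)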
98. Code Implementation of CART
❑ # Combining weather, temp, windy, and humidity into a single list of tuples
❑ features=list(zip(weather_encoded,temp_encoded,windy_encoded,humidity_encoded))
❑ print("Features:",features)
❑ # Import the DecisionTreeClassifier
❑ from sklearn.tree import DecisionTreeClassifier
❑ tree = DecisionTreeClassifier(criterion='gini')
❑ # Train the model
❑ tree.fit(features,label)
❑ # Test the model: 2:Sunny, 2:Mild, 1:Wind=Weak, 0:Humidity=High
❑ prediction = tree.predict([[2,2,1,0]])
❑ print("Decision",prediction)
99. Working of ID3 Algorithm
❑ For the ID3 implementation we use the same dataset that we used for the CART algorithm.
❑ The first step is to create a root node.
❑ If all results are Yes, then the leaf node "Yes" is returned; else the leaf node "No" is returned.
❑ Find the entropy of all observations, and the entropy conditioned on attribute "x": that is, E(S) and E(S, x).
❑ Find the information gain and select the attribute with the highest information gain.
❑ Repeat the above steps until all attributes are covered.
100. Complete Entropy of dataset
❑ First we calculate the entropy of the decision column (play). The decision column consists of 14 instances and includes two labels: Yes and No.
o Yes = 9
o No = 5
❑ Entropy(Decision) = -p(Yes)*log2 p(Yes) - p(No)*log2 p(No)
❑ Entropy(Decision) = -(9/14)*log2(9/14) - (5/14)*log2(5/14) = 0.940
❑ Now, we need to find the most dominant attribute to make the root node of the tree (a small code sketch follows).
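❑ A small sketch of the entropy and gain formulas in Python (helper functions written here for illustration; they are not part of the original slides):

import math

def entropy(counts):
    """Entropy = -sum of p*log2(p) over the class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

print(round(entropy([9, 5]), 3))  # 0.940, as above

# Gain(Decision, Wind) = Entropy(Decision) - weighted entropies of the subsets
gain_wind = entropy([9, 5]) - (8/14) * entropy([6, 2]) - (6/14) * entropy([3, 3])
print(round(gain_wind, 3))  # 0.048, as computed on the next slides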
101. Wind factor on decision
❑ Formula:
o Gain(Decision, Wind) = Entropy(Decision) - ∑ [ p(Wind=v) * Entropy(Decision|Wind=v) ]
❑ The Wind attribute has two labels, Weak and Strong, so we expand the formula:
o Gain(Decision, Wind) = Entropy(Decision) - [p(Wind=Weak) * Entropy(Decision|Wind=Weak)] - [p(Wind=Strong) * Entropy(Decision|Wind=Strong)]
❑ Now, we need to calculate Entropy(Decision|Wind=Weak) and Entropy(Decision|Wind=Strong) respectively.
102. Weak wind factor on decision
Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
13 Overcast Hot Normal Weak Yes
103. Weak wind factor on decision
❑ There are 8 instances with weak wind. The decision is No for 2 of them and Yes for 6, as illustrated above.
❑ Entropy(Decision|Wind=Weak) = -p(No)*log2 p(No) - p(Yes)*log2 p(Yes)
❑ Entropy(Decision|Wind=Weak) = -(2/8)*log2(2/8) - (6/8)*log2(6/8)
❑ Entropy(Decision|Wind=Weak) = 0.811
104. Strong wind factor on decision(Play):
Day Outlook Temp. Humidity Wind Decision
2 Sunny Hot High Strong No
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
14 Rain Mild High Strong No
105. Strong wind factor on decision(Play):
❑ Here, there are 6 instances with strong wind. The decision is divided into two equal parts.
❑ Entropy(Decision|Wind=Strong) = -p(No)*log2 p(No) - p(Yes)*log2 p(Yes)
❑ Entropy(Decision|Wind=Strong) = -(3/6)*log2(3/6) - (3/6)*log2(3/6)
❑ Entropy(Decision|Wind=Strong) = 1
106. Information Gain for Wind Attribute
❑ Formula:
o Gain(Decision, Wind) = Entropy(Decision) - [p(Wind=Weak) * Entropy(Decision|Wind=Weak)] - [p(Wind=Strong) * Entropy(Decision|Wind=Strong)]
❑ Gain(Decision, Wind) = 0.940 - [(8/14) * 0.811] - [(6/14) * 1]
❑ Gain(Decision, Wind) = 0.048
❑ We have calculated the gain for Wind. Apply the same procedure to the other attributes to find the best attribute for the root node.
107. Information Gain for Other factors
❑ The other factors on decision:
o Gain(Decision, Outlook) = 0.246
o Gain(Decision, Temperature) = 0.029
o Gain(Decision, Humidity) = 0.151
❑ The Outlook factor has the highest score. That's why Outlook appears at the root node of the tree.
108. Overcast outlook on decision
❑ The decision is always Yes when the outlook is Overcast.
Day Outlook Temp. Humidity Wind Decision
3 Overcast Hot High Weak Yes
7 Overcast Cool Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
109. Sunny outlook on decision
Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
11 Sunny Mild Normal Strong Yes
110. Sunny outlook on decision
❑ Here, there are 5 instances with a Sunny outlook. The decision is No for 3 of the 5 and Yes for 2 of the 5.
❑ Gain(Outlook=Sunny | Temperature) = 0.570
❑ Gain(Outlook=Sunny | Humidity) = 0.970
❑ Gain(Outlook=Sunny | Wind) = 0.019
❑ Humidity is chosen as the next split because it produces the highest score when the outlook is Sunny.
111. Sunny outlook on decision
❑ At this point, the decision is always No when the humidity is High.
❑ At this point, the decision is always Yes when the humidity is Normal.
Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
8 Sunny Mild High Weak No
Day Outlook Temp. Humidity Wind Decision
9 Sunny Cool Normal Weak Yes
11 Sunny Mild Normal Strong Yes
112. Rain outlook on decision
❑ Gain(Outlook=Rain | Temperature) = 0.020
❑ Gain(Outlook=Rain | Humidity) = 0.020
❑ Gain(Outlook=Rain | Wind) = 0.971
❑ Here, Wind produces the highest score when the outlook is Rain. That's why we check the Wind attribute at the second level when the outlook is Rain.
Day Outlook Temp. Humidity Wind Decision
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
10 Rain Mild Normal Weak Yes
14 Rain Mild High Strong No
113. Rain outlook on decision
❑ The decision is always Yes when the wind is Weak and the outlook is Rain.
❑ The decision is always No when the wind is Strong and the outlook is Rain.

Day  Outlook  Temp.  Humidity  Wind  Decision
4    Rain     Mild   High      Weak  Yes
5    Rain     Cool   Normal    Weak  Yes
10   Rain     Mild   Normal    Weak  Yes

Day  Outlook  Temp.  Humidity  Wind    Decision
6    Rain     Cool   Normal    Strong  No
14   Rain     Mild   High      Strong  No
115. Implementation of ID3
❑ # Import the DecisionTreeClassifier
❑ from sklearn.tree import DecisionTreeClassifier
❑ # Assigning features and label variables
❑ weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']
❑ temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']
❑ play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']
116. Implementation of ID3(cont.)
❑ # Import LabelEncoder
❑ from sklearn import preprocessing
❑ #creating labelEncoder
❑ le = preprocessing.LabelEncoder()
❑ # Converting string labels into numbers.
❑ weather_encoded=le.fit_transform(weather)
❑ print("Weather:",weather_encoded)
❑ # Converting string labels into numbers
❑ temp_encoded=le.fit_transform(temp)
117. Implementation of ID3(cont.)
❑ print("Temp:",temp_encoded)
❑ label=le.fit_transform(play)
❑ print("Play:",label)
❑ # Combining weather and temp into a single list of tuples
❑ features=list(zip(weather_encoded,temp_encoded))
❑ print("Features:",features)
❑ #Create Instance of Model, and train the model
❑ tree = DecisionTreeClassifier(criterion='entropy')
❑ tree.fit(features,label)
❑ #Predict result for 0:Overcast, 2:mild
❑ prediction = tree.predict([[0,2]])
❑ print("Decision",prediction)
119. What is Random Forest
❑ In the Random Forest algorithm we join multiple algorithms of the same type, for example multiple decision trees, to make a forest of trees. That is why it is known as a random forest.
❑ It helps us to make a powerful prediction model.
❑ The random forest algorithm works for both regression and classification problems.
❑ Applications of Random Forest:
o Fraud prediction
o Cancer detection
o Stock market predictions
o Spam filters
o News classification
120. How does Random Forest work?
❑ Pick N random data records from the dataset.
❑ Build a decision tree based on these N records.
❑ Choose how many trees we want to create, and repeat the previous steps for each.
❑ To predict the output for a new record:
❑ In case of regression: each tree predicts a result, and the final result is calculated by taking the average of the results predicted by all the trees.
❑ In case of classification: each tree predicts a class label for the new record, and finally we assign the new record to the category that has the majority of votes.
121. Advantages and Disadvantages of Random Forest
Advantages
❑ A random forest contains multiple trees, so the algorithm is not biased towards a single tree's view.
❑ It is a stable algorithm: if new training data is introduced, only one tree is affected, not all the trees.
❑ It is suitable for both categorical and numerical data.
❑ It also works well when the dataset has missing values.
❑ The trees can be trained in parallel.
Disadvantages
❑ It is a complex algorithm.
❑ It requires more computational time to join multiple decision trees.
❑ It takes much more time to train the model compared to other algorithms.
122. Code implementation of random Forest
❑ # Assign features
❑ weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']
❑ temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']
❑ humidity=["High","High","High","High","Normal","Normal","Normal","High","Normal","Normal","Normal","High","Normal","High"]
123. Code implementation of random Forest
❑ Windy=["Weak","Strong","Weak","Weak","Weak"
,"Strong","Strong","Weak","Weak","Weak",
"Strong","Strong","Weak","Strong"]
❑ play=['No','No','Yes','Yes','Yes','No','Yes
','No','Yes','Yes','Yes','Yes','Yes','No']
❑
❑ #Import LabelEncoder
❑ from sklearn import preprocessing
❑ #creating labelEncoder
❑ le = preprocessing.LabelEncoder()
www.SunilOS.com 123
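❑ Slide 124 is omitted here; it converts the string lists above into numbers with the encoder, as in the other implementations in this deck. A minimal sketch of the assumed step:

# Converting string labels into numbers
weather_encoded=le.fit_transform(weather)
temp_encoded=le.fit_transform(temp)
windy_encoded=le.fit_transform(windy)
humidity_encoded=le.fit_transform(humidity)
label=le.fit_transform(play)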
125. Code implementation of random Forest
❑ # Combining weather, temp, windy, and humidity into a single list of tuples
❑ features=list(zip(weather_encoded,temp_encoded,windy_encoded,humidity_encoded))
❑ # Import the RandomForestClassifier
❑ from sklearn.ensemble import RandomForestClassifier
❑ # Create an instance of the Random Forest classifier
❑ tree = RandomForestClassifier(n_estimators=5)
❑ # Train the model
❑ tree.fit(features,label)
❑ # Test: 2:Sunny, 2:Mild, 1:Wind=Weak, 0:Humidity=High
❑ prediction = tree.predict([[2,2,1,0]])
❑ print("Decision",prediction)
127. SVM
❑ Support Vector Machine is a supervised machine learning algorithm.
❑ SVMs were developed in the 1990s and are still popular.
❑ SVM is used for classification and regression problems.
❑ SVM can be used on linearly separable and multidimensional datasets (2-D, 3-D, and higher).
❑ SVM can be used for multiclass classification (having more than two class labels).
128. How SVM Works:
❑ To separate two classes, as shown on the previous slide, we need a line that separates the data into the two classes.
❑ This line is known as a decision boundary or a hyperplane. We draw the line so that we have the maximum margin between it and the data points of the classes that are nearest to the hyperplane.
❑ To separate the two classes of data points, there are many possible hyperplanes that could be chosen. Our objective is to find a plane that has the maximum margin, i.e. the maximum distance between the data points of both classes.
❑ Maximizing the margin distance provides some reinforcement so that future data points can be classified with more confidence.
129. SVM Related Terminologies
❑ Support vectors:
o When we classify data with the help of a hyperplane, the data points which are nearest to the hyperplane are known as support vectors.
❑ Hyperplane:
o A hyperplane is a decision boundary between the two classes. It is used to separate the data points of different classes.
❑ Margin:
o We draw a parallel line along the data points which are nearest to the hyperplane. The gap between the decision lines of the two classes is known as the margin.
o For example, if D- and D+ are the distances to the lines closest to the support vectors of the two opposing classes, then we can obtain the margin as:
o Margin = D- + D+
o A larger margin between the classes is considered a good margin; a smaller margin is a bad margin.
130. What is the reason to Choose SVM?
❑ SVM can be used for multiclass classification.
❑ SVM can be used on linearly separable datasets.
❑ SVM can be used on high-dimensional datasets which are not linearly separable.
❑ SVM classifies datasets efficiently in high dimensions.
131. Implementation of Linear SVM:
❑ # Import libraries
❑ import numpy as np
❑ import matplotlib.pyplot as plt
❑ from matplotlib import style
❑ style.use("ggplot")
❑ from sklearn import svm
❑ #Attributes
❑ x = [1, 5, 1.5, 8, 1, 9]
❑ y = [2, 8, 1.8, 8, 0.6,11]
❑ plt.scatter(x,y)
❑ plt.show()
132. Implementation of Linear SVM(cont.)
❑ #import preprocessing
❑ from sklearn import preprocessing
❑ X=list(zip(x,y))
❑ y = [0,1,0,1,0,1]
❑ #Train SVM Model
❑ clf = svm.SVC(kernel='linear', C = 1.0)
❑ clf.fit(X,y)
❑ # Test x=0.58, y=0.76
❑ print(clf.predict([[0.58,0.76]]))
❑ #x=10.58, y=10.76
❑ print(clf.predict([[10.58,10.76]]))
134. SVM Kernels
❑ The SVM algorithm is implemented in practice using a kernel.
❑ A kernel transforms the input data space into the required form (linear or non-linear).
❑ SVM uses a technique called the kernel trick: the kernel takes a low-dimensional input space and transforms it into a higher-dimensional space.
❑ In other words, you can say that it converts non-separable problems into separable problems by adding more dimensions.
❑ It is most useful in non-linear separation problems. The kernel trick helps you to build a more accurate classifier.
❑ Types of kernels:
o Linear kernel
o Polynomial kernel
o RBF (radial basis function) kernel
135. Linear Kernel
❑ A linear kernel is the normal dot product of any two given observations. The product of two vectors is the sum of the products of each pair of input values:
o K(x, xi) = sum(x * xi)
❑ For example, the inner product of the vectors [1, 2] and [3, 4] is 1*3 + 2*4 = 11.
❑ The equation for making a prediction for a new input, using the dot product between the input (x) and each support vector (xi), is calculated as follows:
o f(x) = B0 + sum(ai * (x, xi))
❑ This equation calculates the inner products of a new input vector (x) with all the support vectors in the training data. The coefficients B0 and ai (one for each input) must be estimated from the training data by the learning algorithm.
136. Polynomial Kernel
❑ A polynomial kernel is a more generalized form of the linear kernel. The polynomial kernel can distinguish curved or non-linear input spaces:
o K(x, xi) = (1 + sum(x * xi))^d
❑ Where d is the degree of the polynomial. d = 1 is similar to the linear transformation. The degree needs to be manually specified in the learning algorithm.
137. RBF (radial basis function) Kernel
❑ The radial basis function kernel is a popular kernel function commonly used in support vector machine classification. RBF can map an input space into an infinite-dimensional space:
o K(x, xi) = exp(-gamma * sum((x - xi)^2))
❑ Here gamma is a parameter, typically between 0 and 1. A higher value of gamma will fit the training dataset too closely, which causes over-fitting. gamma = 0.1 is considered a good default value. The value of gamma needs to be manually specified in the learning algorithm.
138. Implementation of Non Linear Kernel
❑ From the scatter plot of the dataset (not reproduced here), we can see that it is not linearly separable.
139. Implementation of Non Linear Kernel
❑ # Assigning features and label variables
❑ weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']
❑ temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']
❑ humidity=["High","High","High","High","Normal","Normal","Normal","High","Normal","Normal","Normal","High","Normal","High"]
140. Implementation of Non Linear Kernel
❑ Windy=["Weak","Strong","Weak","Weak","Weak","St
rong","Strong","Weak","Weak","Weak","Strong","S
trong","Weak","Strong"]
❑
❑ play=['No','No','Yes','Yes','Yes','No','Yes','N
o','Yes','Yes','Yes','Yes','Yes','No']
❑ # Import LabelEncoder
❑ from sklearn import preprocessing
❑ #creating labelEncoder
❑ le = preprocessing.LabelEncoder()
❑ # Converting string labels into numbers.
❑ weather_encoded=le.fit_transform(weather)
❑ print("Weather:",weather_encoded)
www.SunilOS.com 140
141. Implementation of Non Linear Kernel
❑ # Converting string labels into numbers
❑ temp_encoded=le.fit_transform(temp)
❑ print("Temp:",temp_encoded)
❑ windy_encoded=le.fit_transform(windy)
❑ print("Windy:",windy_encoded)
❑ humidity_encoded=le.fit_transform(humidity)
❑ print("Humidity:",humidity_encoded)
❑ label=le.fit_transform(play)
❑ print("Play:",label)
142. Implementation of Non Linear Kernel
❑ # Combining weather, temp, windy, and humidity into a single list of tuples
❑ features=list(zip(weather_encoded,temp_encoded,windy_encoded,humidity_encoded))
❑ print("Features:",features)
❑ # Import svm
❑ from sklearn import svm
❑ # Create an SVM classifier
❑ clf = svm.SVC(kernel='rbf') # RBF kernel
❑ # Train the SVM model
❑ clf.fit(features,label)
❑ # Test: 2:Sunny, 2:Mild, 1:Wind=Weak, 0:Humidity=High
❑ prediction = clf.predict([[2,2,1,0]])
❑ print("Decision",prediction)
143. Advantages & Disadvantages of SVM
Advantages
❑ It works really well when there is a clear margin of separation.
❑ It is effective in high-dimensional spaces.
❑ It is effective in cases where the number of dimensions is greater than the number of samples.
❑ It uses only a subset of the training points (the support vectors) in the decision function, so it is also memory efficient.
Disadvantages
❑ It doesn't perform well on large datasets, because the required training time is higher.
❑ It also doesn't perform very well when the dataset has more noise, i.e. when the target classes are overlapping.
145. Types Of Regression
❑Linear regression
❑Logistic regression
❑Polynomial regression
146. Linear Regression vs. Logistic Regression
❑ Linear regression is used to predict a continuous dependent variable using a given set of independent variables; logistic regression is used to predict a categorical dependent variable using a given set of independent variables.
❑ Linear regression is used for solving regression problems; logistic regression is used for solving classification problems.
❑ In linear regression, we predict the value of a continuous variable; in logistic regression, we predict the values of categorical variables.
❑ In linear regression, we find the best-fit line, by which we can easily predict the output; in logistic regression, we find the S-curve by which we can classify the samples.
❑ The least-squares estimation method is used to fit a linear regression model; the maximum likelihood estimation method is used to fit a logistic regression model.
❑ The output of linear regression must be a continuous value, such as price, age, etc.; the output of logistic regression must be a categorical value such as 0 or 1, Yes or No, etc.
❑ Linear regression requires the relationship between the dependent variable and the independent variables to be linear; logistic regression does not require a linear relationship between the dependent and independent variables.
❑ In linear regression, there may be collinearity between the independent variables; in logistic regression, there should not be collinearity between the independent variables.
147. Linear Regression
❑ Linear regression:
o Linear regression is a statistical approach for modeling the relationship between a dependent variable and a given set of independent variables.
148. Linear Regression cont.
❑Linear regression attempts to model the
relationship between two variables by fitting a
linear equation to observed data. One variable is
considered to be an independent variable, and
the other is considered to be a dependent
variable.
o For example, a modeler might want to relate the
weights of individuals to their heights using a linear
regression model.
149. What is Linear?
❑ First, let's say that you are shopping at Dmart. Whether you buy goods or not, you have to pay Rs. 2.00 for a parking ticket. Each apple costs Rs. 1.50, and you buy x apples. Then we can populate a price list as follows:
150. Linear Relationship among data
Quantity Price
1 3.50 Rs.
2 5.00 Rs
3 6.50 Rs
4 8.00 Rs
5 9.50 Rs
… ...
10 17.00 Rs
11 18.50 Rs
... ...
x y
151. Linear Function
❑ It's easy to predict (or calculate) the price based on the quantity, and vice versa, using the equation y = 2 + 1.5x for this example, or in general:
o y = a + bx
❑ This is a linear function with:
o a = 2
o b = 1.5
❑ A linear function has one independent variable and one dependent variable. The independent variable is x and the dependent variable is y.
❑ a is the constant term or the y-intercept. It is the value of the dependent variable when x = 0.
❑ b is the coefficient of the independent variable. It is also known as the slope and gives the rate of change of the dependent variable.
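❑ A tiny sketch of this function in Python (values from the Dmart example above):

a, b = 2, 1.5                  # y-intercept (parking fee) and slope (price per apple)
price = lambda x: a + b * x    # the linear function y = a + bx
print(price(10))               # 17.0 Rs., matching the price list above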
152. Implementation of Linear Regression:
❑ Code explanation:
❑ dataset: the table containing all values from our CSV file
❑ X: the first column, which contains the years-of-experience array
❑ y: the last column, which contains the salary array
❑ The model fits:
o y = b0 + b1*x1
❑ y: dependent variable
❑ b0: constant (intercept)
❑ b1: coefficient
❑ x1: independent variable
155. Code Implementation of Linear Regression
❑ import numpy as np
❑ import matplotlib.pyplot as plt
❑ import pandas as pd
❑ # Importing the dataset
❑ dataset=pd.read_csv('E:/MLImplementation/regression.csv')
❑ #get a copy of dataset exclude last column
❑ X = dataset.iloc[:, :-1].values
❑ #get array of dataset in column 1st
❑ y = dataset.iloc[:, 1].values
156. Code Implementation of Linear Regression (cont.)
❑ # Splitting the dataset into the Training set and Test set
❑ from sklearn.model_selection import train_test_split
❑ X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
❑ # Fitting Simple Linear Regression to the Training set
❑ from sklearn.linear_model import LinearRegression
❑ regressor = LinearRegression()
❑ regressor.fit(X_train, y_train)
157. Code Implementation of Linear Regression (cont.)
❑ # Predicting the Test set results
❑ y_pred = regressor.predict(X_test)
❑ #predicting the salary for 5 year Experienced Employee
❑ y_pred = regressor.predict([[5]])
❑ print(y_pred)
158. Code Implementation of Linear Regression (cont.)
❑ # Visualizing the Training set results
❑ viz_train = plt
❑ viz_train.scatter(X_train, y_train, color='red')
❑ viz_train.plot(X_train, regressor.predict(X_train), color='blue')
❑ viz_train.title('Salary VS Experience (Training set)')
❑ viz_train.xlabel('Year of Experience')
❑ viz_train.ylabel('Salary')
❑ viz_train.show()
162. Advantages & Disadvantages of Linear Regression
❑ Advantages:
o Simple and easy to understand.
o Cheap computational cost.
o A foundation for more complex machine learning algorithms.
❑ Disadvantages:
o Oversimplifies or fails on non-linear problems (it only does well in linear modeling).
o Sensitive to outliers and noise.
163. Multi Linear Regression
❑ In most cases we will have more than one independent variable; it can be as few as two independent variables and up to hundreds (or theoretically even thousands) of variables.
❑ In those cases we use a Multiple Linear Regression (MLR) model. The regression equation is pretty much the same as the simple regression equation, just with more variables:
o Y = b0 + b1*X1 + b2*X2 + ... + bn*Xn
164. Implementation Of Multi linear Regression
❑ We take a loan dataset for multiple linear regression, with age, credit rating, and number of children as features and loan amount as the target.
❑ We are going to predict the loan amount (dependent variable) with the help of age, credit rating, and number of children (independent variables).
❑ Note that the data has four columns, of which three columns are features and one is the target variable.
167. Code Implementation of MLR
❑ # Features: age, credit rating, and number of children
❑ age=[19,18,28,33,32,31,46,37,37,60,25,62,23,56]
❑ credit_rating=[27.9,42.13,33,22.705,28.88,25.74,33.44,27.74,29.83,25.84,26.22,26.29,34.4,39.82]
❑ children=[0,1,3,0,0,0,1,3,2,0,0,0,0,0]
❑ # Label data
❑ loan=[16884.924,1725.5523,4449.462,21984.47061,3866.8552,3756.6216,8240.5896,7281.5056,6406.4107,28923.13692,2721.3208,27808.7251,1826.843,11090.7178]
168. Code Implementation of MLR (cont.)
❑ # Combining age, credit rating, and children into a single list of tuples
❑ features=list(zip(age,credit_rating,children))
❑ print(features)
❑ # Define the multiple linear regression model
❑ from sklearn.linear_model import LinearRegression
❑ linear_regress = LinearRegression()
❑ # Fit the multiple linear regression model
❑ linear_regress.fit(features,loan)
❑ print("coefficient:",linear_regress.coef_)
❑ print("intercept:",linear_regress.intercept_)
❑ # Predict with test data: age 20, credit rating 32, children 0
❑ y_pred=linear_regress.predict([[20,32,0]])
❑ print(y_pred)
169. Disclaimer
❑This is an educational presentation to enhance the
skill of computer science students.
❑This presentation is available for free to computer
science students.
❑Some internet images from different URLs are used
in this presentation to simplify technical examples
and correlate examples with the real world.
❑We are grateful to owners of these URLs and
pictures.