www.SunilOS.com 1
Supervised Learning
www.sunilos.com
www.raystec.com
What is Machine Learning?
❑ Humans learn from past experience.
❑ A computer does not have “experiences”.
❑ A computer system learns from data, which represents some
“past experiences” of an application domain.
❑ Our focus: learn a target function that can be used to predict the
values of a class attribute, e.g. whether a loan application is approved
or not approved, or high-risk or low-risk.
❑ This task is commonly called supervised learning, classification,
or inductive learning.
Types of Learning
❑Supervised Learning
o Classification
o Regression
❑Unsupervised Learning
o Clustering
❑Reinforcement Learning
Types of supervised Learning
❑Classification:
o A classification problem is when the output variable is a category,
such as “red” or “blue”, or “disease” or “no disease”.
❑Regression:
o A regression problem is when the output variable is a real value,
such as “dollars” or “weight”.
Supervised Learning Process
❑Learning (training):
o Learn the model from data with known labels
❑Testing:
o Test the model on unseen data
❑Accuracy:
o Number of correct classifications / total number of test cases
[Figure: Step 1 (Training): training data → learning algorithm → model. Step 2 (Testing): testing data → model → accuracy.]
Classification example
❑ A loan providing company receives thousands of applications
for new loans.
❑ Each application contains information about an applicant
o Age
o Marital status
o Annual salary
o Outstanding debts
o Credit rating
o etc.
❑ Problem: to decide whether an application should be approved, i.e.
to classify applications into two categories, approved and not
approved.
Dataset
An example
❑Data: Loan application data
❑Task: Predict whether a loan should be approved
or not.
❑Performance measure: Accuracy.
❑No learning: assign every future application (test
data) to the majority class (i.e., Yes):
o Accuracy = 9/15 = 60%.
❑We can do better than 60% with learning.
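The "no learning" baseline above can be sketched in a few lines. This is a minimal illustration, not the deck's own code: the label list is a hypothetical stand-in with 9 "Yes" and 6 "No" entries (matching the 9/15 figure), and the helper names `accuracy` and `majority_class` are assumptions.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def majority_class(labels):
    """Most frequent label in the data."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical labels: 9 approved ("Yes") and 6 not approved ("No").
labels = ["Yes"] * 9 + ["No"] * 6

# "No learning": predict the majority class for every application.
baseline = majority_class(labels)
predictions = [baseline] * len(labels)
baseline_accuracy = accuracy(labels, predictions)
print(baseline, baseline_accuracy)
```

Any learned model should be compared against this number; a classifier that does not beat the majority-class baseline has learned nothing useful.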
Evaluating classification methods
❑Predictive accuracy
o Accuracy = number of correct classifications / total number of test cases
❑Efficiency
o Time to construct the model
o Time to use the model
Conclusion
❑ Supervised learning has applications in almost every field or
domain.
❑ There are numerous classification techniques.
o Bayesian networks
o K-Nearest Neighbors
o Decision Tree Classification
o Fuzzy classification
❑ This large number of methods also shows the importance of
classification and its wide applicability.
❑ It remains an active research area.
Classification
4/16/2020 www.SunilOS.com 12
What is Classification?
❑ Classification is a supervised machine learning approach.
❑ The computer learns from training data and uses what it has
learned to classify new observations.
❑ Classification can be:
o Binary classification: spam or not spam, male or female
o Multiclass classification: fruits, colors
Types of classification algorithm
❑Linear Classifiers: Logistic Regression, Naive
Bayes Classifier
❑K Nearest Neighbor
❑Support Vector Machines
❑Decision Trees
❑Random Forest
K-Nearest Neighbor
❑ The k-nearest-neighbors (KNN) algorithm is a supervised
classification technique based on similarity.
❑ KNN assumes that similar things exist near each other.
❑ The algorithm takes a set of labeled points and uses them
to learn how to label other points.
❑ To label a new point, it looks at the labeled points closest to
that new point (its nearest neighbors).
❑ Closeness is typically expressed in terms of a dissimilarity
(distance) function.
❑ Once it has found the k nearest neighbors, it assigns the
label that most of those neighbors have.
KNN working Steps
❑Calculate the distance from the new test point to each labeled training point.
❑Find the k closest neighbors of the new test point.
❑Take a majority vote among the labels of those neighbors.
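The three steps above can be sketched from scratch. This is a minimal illustration under stated assumptions: Euclidean distance is used as the dissimilarity function, the 2-D points and labels are hypothetical, and `knn_predict` is an illustrative name rather than a library function.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Step 1: distance from the query to every labeled point (Euclidean).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 2: keep the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote over the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, (1.2, 1.4)))
print(knn_predict(points, labels, (6.4, 6.2)))
```

Note that all the work happens at prediction time: there is no separate training step, only a scan over the stored points.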
KNN algorithm Implementation
❑Define dataset.
❑Prepare data.
❑Train model.
❑Test Model.
❑Calculate accuracy.
Dataset
❑Let's first create our own dataset. It needs two kinds of
attributes (columns): features and a target label. Both are
required because KNN is a supervised algorithm.
❑This dataset has two features (weather and temperature)
and one label (play).
Define dataset
Weather   Temp  Play
Sunny     Hot   No
Sunny     Hot   Yes
Overcast  Hot   Yes
Rainy     Mild  Yes
Rainy     Cool  No
Rainy     Cool  Yes
Overcast  Cool  No
Sunny     Mild  Yes
Sunny     Cool  Yes
Rainy     Mild  Yes
Sunny     Mild  Yes
Overcast  Mild  Yes
Overcast  Hot   Yes
Rainy     Mild  No
Sample data
Code implementation in scikit learn
# Assigning features and label variables
# First feature
weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
# Second feature
temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
        'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']

# Label or target variable
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
Code implementation in scikit learn(cont.)
# Import LabelEncoder
from sklearn import preprocessing
# Create the label encoder
le = preprocessing.LabelEncoder()
# Convert string labels into numbers. Classes are numbered in sorted
# order, so Overcast=0, Rainy=1, Sunny=2.
weather_encoded = le.fit_transform(weather)
print(weather_encoded)

# Convert the remaining columns the same way (Cool=0, Hot=1, Mild=2; No=0, Yes=1)
temp_encoded = le.fit_transform(temp)
label = le.fit_transform(play)
print(label)
Code implementation in scikit learn(cont.)
# Combine weather and temp into a single list of tuples
features = list(zip(weather_encoded, temp_encoded))
print(features)
# Prepare the model instance
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=3)
# Train the model using the training set
model.fit(features, label)
# Predict output
predicted = model.predict([[0, 2]])  # 0: Overcast, 2: Mild
print(predicted)
Advantages of KNN
❑It is extremely easy to implement.
❑It is a lazy learner: there is no separate training phase, so
setting it up is much faster than for algorithms that require
training, e.g. SVM, linear regression, etc.
❑Since the algorithm requires no training before making
predictions, new data can be added seamlessly.
❑Only two parameters are required to implement KNN:
the value of K and the distance function (e.g.
Euclidean or Manhattan).
Disadvantages of KNN
❑ The KNN algorithm doesn't work well with high-dimensional
data: with a large number of dimensions, distances between
points become less informative, so the nearest neighbors are
harder to tell apart.
❑ The KNN algorithm has a high prediction cost for large
datasets, because the distance between the new point and
every existing point has to be calculated.
❑ Finally, the KNN algorithm doesn't work well with categorical
features, since it is difficult to define a meaningful distance
between categorical values.
Naive Bayes Classification Base
❑It uses Bayes' theorem of probability to predict an
unknown class/label.
❑The Naive Bayes classifier assumes that the effect of a
particular feature on a class is independent of the other
features.
o For example, whether a loan applicant is desirable depends on his/her
income, previous loan and transaction history, age, and location.
o Even if these features are interdependent, they are still
considered independently.
o This assumption simplifies computation, and that is why the method is
called naive.
Approve a Loan
❑ A bank has received a loan application, and we want to predict whether
the bank will approve it.
❑ Approval will be decided on the basis of the attributes specified in the
application form.
❑ Income, previous loans, transaction history, age, and location
are treated as independent attributes.
❑ We calculate a separate probability for each attribute:
o probability of approval or rejection given income,
o probability of approval or rejection given previous loan,
o probability of approval or rejection given age,
o probability of approval or rejection given location.
❑ Naive Bayes then multiplies these probabilities to forecast approval
or rejection of the new loan application.
Naïve Bayes Classification Base (cont.)
❑ P(c|x) = P(x|c) * P(c) / P(x)
❑ Where,
❑ P(c|x) is the posterior probability of class c given predictor x (the features).
❑ P(c) is the prior probability of the class.
❑ P(x|c) is the likelihood, i.e. the probability of the predictor given the class.
❑ P(x) is the prior probability of the predictor.
Types of Naive Bayes Algorithm
❑Gaussian Naive Bayes.
❑Multinomial Naive Bayes.
❑Bernoulli Naïve Bayes.
❑P(A|B) = P(B|A) * P(A) / P(B)
How does the Gaussian Naive Bayes classifier work?
❑Consider an example of weather conditions and
playing sports.
❑You need to calculate the probability of playing
sports.
❑Then you need to classify whether players will
play or not, based on the weather condition.
How does the Naive Bayes classifier work? (cont.)
❑ The Naive Bayes classifier calculates the probability of an event in
the following steps:
❑ Calculate the prior probability of each class label
o P(play)
o P(not play)
❑ Find the likelihood of each attribute value for each class.
o P(Hot | play), P(Hot | not play)
o P(Cold | play), P(Cold | not play)
❑ Put these values into Bayes' formula and calculate the posterior
probability.
❑ See which class has the higher posterior probability; the input
is assigned to that class.
Dataset
Weather Play
Sunny No
Sunny Yes
Overcast Yes
Rainy Yes
Rainy No
Rainy Yes
Overcast No
Sunny Yes
Sunny Yes
Rainy Yes
Sunny Yes
Overcast Yes
Overcast Yes
Rainy No
Frequency Table
Weather   No  Yes  Total
Sunny     1   4    5
Overcast  1   3    4
Rainy     2   3    5
Total     4   10   14
Prior Probability of class
Weather   No         Yes         Total  P(Weather)
Sunny     1          4           5      5/14=0.36
Overcast  1          3           4      4/14=0.29
Rainy     2          3           5      5/14=0.36
Total     4          10          14
P(class)  4/14=0.29  10/14=0.71
Likelihood
Weather   No         Yes         P(Weather|No)  P(Weather|Yes)
Sunny     1          4           1/4=0.25       4/10=0.4
Overcast  1          3           1/4=0.25       3/10=0.3
Rainy     2          3           2/4=0.5        3/10=0.3
Total     4          10
P(class)  4/14=0.29  10/14=0.71
Probability of playing when weather is overcast
❑ Equation:
o P(Yes|Overcast)=P(Overcast|Yes)*P(Yes)/P(Overcast)
❑ Calculate prior probabilities:
o P(Overcast) = 4/14 = 0.29
o P(Yes) = 10/14 = 0.71
❑ Calculate the likelihood:
o P(Overcast | Yes) = 3/10 = 0.3
❑ Put the priors and the likelihood into the equation:
o P(Yes | Overcast) = 0.3 * 0.71 / 0.29 = 0.73 (higher); with exact
fractions, (3/10) * (10/14) / (4/14) = 3/4 = 0.75
Probability of not playing when weather is overcast
❑ Equation:
o P(No|Overcast)=P(Overcast|No)*P(No)/P(Overcast)
❑ Calculate prior probabilities:
o P(Overcast) = 4/14 = 0.29
o P(No) = 4/14 = 0.29
❑ Calculate the likelihood:
o P(Overcast | No) = 1/4 = 0.25
❑ Put the priors and the likelihood into the equation:
o P(No | Overcast) = 0.25 * 0.29 / 0.29 = 0.25 (lower)
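The two hand calculations above can be checked with exact fractions. This is a small sketch, not part of the original deck: the (weather, play) pairs are copied from the Weather/Play dataset slide, and the function name `posterior` is an assumption.

```python
from fractions import Fraction

# (weather, play) pairs copied from the Weather/Play dataset slide above.
data = [
    ("Sunny", "No"), ("Sunny", "Yes"), ("Overcast", "Yes"), ("Rainy", "Yes"),
    ("Rainy", "No"), ("Rainy", "Yes"), ("Overcast", "No"), ("Sunny", "Yes"),
    ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Overcast", "Yes"), ("Rainy", "No"),
]

def posterior(weather, play):
    """P(play | weather) = P(weather | play) * P(play) / P(weather)."""
    n = len(data)
    n_play = sum(1 for _, p in data if p == play)
    n_weather = sum(1 for w, _ in data if w == weather)
    n_both = sum(1 for w, p in data if w == weather and p == play)
    prior = Fraction(n_play, n)            # P(play)
    evidence = Fraction(n_weather, n)      # P(weather)
    likelihood = Fraction(n_both, n_play)  # P(weather | play)
    return likelihood * prior / evidence

p_yes = posterior("Overcast", "Yes")
p_no = posterior("Overcast", "No")
print(p_yes, p_no)
```

Working with `Fraction` avoids the rounding that turns 0.75 into 0.73 when the intermediate values are rounded to two decimals first.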
Implementation of Naive Bayes algorithm:
# Assigning features and label variables
weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
        'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
Implementation of Naive Bayes algorithm (cont.)
# Import LabelEncoder
from sklearn import preprocessing
# Create the label encoder
le = preprocessing.LabelEncoder()
# Convert string labels into numbers
weather_encoded = le.fit_transform(weather)
print("Weather:", weather_encoded)
# Convert string labels into numbers
temp_encoded = le.fit_transform(temp)
print("Temp:", temp_encoded)
label = le.fit_transform(play)
print("Play:", label)
Implementation of Naive Bayes algorithm (cont.)
# Combine weather and temp into a single list of tuples
features = list(zip(weather_encoded, temp_encoded))
print("Features:", features)
# Import the Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB
# Create a Gaussian classifier
model = GaussianNB()
# Train the model using the training set
model.fit(features, label)
# Predict output: 0: Overcast, 2: Mild
predicted = model.predict([[0, 2]])
print("Predicted Value:", predicted)
Multinomial Naive Bayes algorithm:
❑This machine learning algorithm is commonly used for text
classification.
❑If the features are the number of occurrences of each word
in a document, the multinomial naive Bayes algorithm is the
appropriate variant.
How does the Naive Bayes algorithm work?
❑ Let’s consider an example: classify whether a review is
positive or negative.
❑ Training Dataset:
Text                                     Reviews
I like the movie                         Positive
It's a good movie. Nice Story            Positive
Nice songs. But sadly a boring ending.   Negative
Overall nice movie                       Positive
Sad, boring movie                        Negative
❑ We want to classify whether the text “overall liked the movie” is a
positive or a negative review. We have to calculate:
❑ P(positive | overall liked the movie): the probability that
the tag of the sentence is positive.
❑ P(negative | overall liked the movie): the probability that
the tag of the sentence is negative.
❑ Before that, we first remove stop words and apply
stemming to the text.
Removing Stopwords & Stemming
❑ Removing stop words: these are common words that don’t
really add anything to the classification, such as “a”, “able”,
“either”, “else”, “ever” and so on.
❑ Stemming: stemming reduces a word to its root. A
stemming algorithm reduces the words
o “chocolates”, “chocolaty”, “choco” to the root word “chocolate”,
o and “retrieval”, “retrieved”, “retrieves” to the stem “retrieve”.
Feature Engineering:
❑The important part is to find features in the data
that machine learning algorithms can work with.
❑ In this case, we have text. We need to convert this text
into numbers that we can do calculations on.
❑ We use word frequencies. That is, we treat every
document as a bag of the words it contains, ignoring word order.
❑Our features will be the counts of each word.
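The word-count features described above can be sketched with the standard library alone. This is an illustration under stated assumptions: the stop-word list is a hypothetical, deliberately tiny one, no stemming is applied, and `word_counts` is an illustrative helper name.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"i", "a", "the", "it", "its", "but"}

def word_counts(text):
    """Lowercase the text, split it into words, drop stop words, count the rest."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in STOP_WORDS)

reviews = ["I like the movie", "Nice songs. But sadly a boring ending."]

# Vocabulary: every word that survives in at least one document, sorted.
vocabulary = sorted(set().union(*(word_counts(r) for r in reviews)))
# One count vector per document, aligned with the vocabulary.
vectors = [[word_counts(r)[word] for word in vocabulary] for r in reviews]
print(vocabulary)
print(vectors)
```

Each document becomes a fixed-length vector of counts, which is the numeric form a multinomial naive Bayes model works on.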
Now Calculate Probability
❑ In our case, we have
o P(positive | overall liked the movie)
❑ Since for our classifier we have to find out which tag has a
bigger probability, we can discard the divisor which is the same
for both tags,
o P(overall liked the movie|positive)* P(positive)
o P(overall liked the movie|negative)* P(negative)
❑ There’s a problem though: “overall liked the movie” doesn’t
appear in our training dataset, so its estimated probability is zero.
Here, we make the ‘naive’ assumption that every word in a sentence is
independent of the others. This means that we now look at
individual words.
❑ We can write this as:
o P(overall liked the movie) = P(overall) * P(liked) * P(the) * P(movie)
❑ The next step is to apply the same assumption to the class-conditional probability:
o P(overall liked the movie | positive) = P(overall | positive) * P(liked |
positive) * P(the | positive) * P(movie | positive)
❑ These individual words do show up in our training data, so we
can estimate their probabilities.
The prior Probability
❑ P(positive) = 3/5 = 0.6.
❑ P(negative) = 2/5 = 0.4.
❑ Then, calculating P(overall | positive) means taking the number of
times the word “overall” appears in positive texts, plus 1, and dividing
by the total number of words in positive texts plus the total number of
unique words in all reviews.
o Total words in positive = 13.
o Total words in negative = 10.
o Total unique words in all = 15.
Calculated Conditional Probability
❑ Therefore,
o P(overall | positive) = (1+1)/(13+15)=0.07142
o P(liked | positive) = (1+1)/(13+15)=0.07142
o P(the | positive) = (1+1)/(13+15)=0.07142
o P(movie | positive) = (3+1)/(13+15)=0.1428
❑ Therefore,
o P(overall | negative) = (0+1)/(10+15)=0.04
o P(liked | negative) = (0+1)/(10+15)=0.04
o P(the | negative) = (0+1)/(10+15)=0.04
o P(movie| negative) = (1+1)/(10+15)=0.08
Laplace smoothing
❑A word that never occurs with a class would get probability zero,
which would make the whole product zero. Laplace smoothing avoids this:
❑we add 1 to every count so it’s never zero. To balance
this, we add the number of possible words to the
divisor, so the result can never be greater than 1.
❑In our case, the total number of unique possible words is
15.
Calculate Prior Probability
Result: Positive Review
❑ P(overall | positive) * P(liked | positive)
* P(the | positive) * P(movie | positive)
* P(positive) = 3.12 * 10^{-5} = 0.0000312
❑ P(overall | negative) * P(liked | negative)
* P(the | negative) * P(movie | negative)
* P(negative) = 0.20 * 10^{-5} = 0.000002048
❑ The positive score is higher, so the review is classified as positive.
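The two products above can be reproduced in a few lines. This sketch takes the word totals (13 and 10), the vocabulary size (15), and the per-class word counts directly from the preceding slides as given; the names `smoothed_likelihood` and `score` are illustrative, not from any library.

```python
from fractions import Fraction

# Counts taken from the slides above (after stop-word handling and stemming).
TOTAL_WORDS = {"positive": 13, "negative": 10}
VOCABULARY_SIZE = 15
PRIOR = {"positive": Fraction(3, 5), "negative": Fraction(2, 5)}
# How often each query word occurs in each class.
WORD_COUNTS = {
    "positive": {"overall": 1, "liked": 1, "the": 1, "movie": 3},
    "negative": {"overall": 0, "liked": 0, "the": 0, "movie": 1},
}

def smoothed_likelihood(word, tag):
    """P(word | tag) with add-one (Laplace) smoothing."""
    return Fraction(WORD_COUNTS[tag][word] + 1, TOTAL_WORDS[tag] + VOCABULARY_SIZE)

def score(words, tag):
    """Unnormalized posterior: P(tag) times the product of P(word | tag)."""
    result = PRIOR[tag]
    for word in words:
        result *= smoothed_likelihood(word, tag)
    return result

query = ["overall", "liked", "the", "movie"]
scores = {tag: score(query, tag) for tag in PRIOR}
best = max(scores, key=scores.get)
print({tag: float(value) for tag, value in scores.items()}, best)
```

Because only the comparison matters, the shared divisor P(overall liked the movie) is left out, exactly as in the slides.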
Implementation of Multinomial Naive Bayes algorithm:
❑MultinomialNB implements the naive Bayes algorithm for
multinomially distributed data (discrete counts over a fixed
number of possible outcomes),
❑and is one of the two classic naive Bayes variants used
in text classification (where the data are typically
represented as word count vectors).
Implementation of Multinomial Naive Bayes algorithm:
# Assigning features and label variables
import numpy as np
reviews = np.array(['I like the movie',
                    'Its a good movie. Nice Story',
                    'Nice songs. But sadly a boring ending.',
                    'Overall nice movie',
                    'Sad, boring movie'])
label = ["positive", "positive", "negative", "positive", "negative"]
test = np.array(["Overall i like the movie"])
Implementation of Multinomial Naive Bayes algorithm (cont.)
# Encode the text labels as numbers
from sklearn import preprocessing
# Create the label encoder
le = preprocessing.LabelEncoder()
# Convert string labels into numbers
lable_encoded = le.fit_transform(label)
print("Label:", lable_encoded)
Implementation of Multinomial Naive Bayes algorithm (cont.)
# Generate counts from text using a vectorizer. There are other
# vectorizers available, and lots of options you can set.
# This performs our step of computing word counts.
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(stop_words='english')
train_features = vectorizer.fit_transform(reviews)
test_features = vectorizer.transform(test)
print("Train vocabulary:", vectorizer.vocabulary_)
# Print the dimensions of the training and test data
print("Shape of Train:", train_features.shape)
print("Shape of Test:", test_features.shape)
Implementation of Multinomial Naive Bayes algorithm (cont.)
# Fit a naive Bayes model to the training data.
# This trains the model using the word counts we computed
# and the existing classifications in the training set.
from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()
nb.fit(train_features, lable_encoded)

# Now we can use the model to predict classifications for our test features.
predictions = nb.predict(test_features)
print(predictions)
Bernoulli Naive Bayes:
❑ BernoulliNB implements the naive Bayes training and
classification algorithms for data that is distributed according to
multivariate Bernoulli distributions;
o i.e., there may be multiple features but each one is assumed to be a
binary-valued (boolean) variable.
❑ Therefore, this class requires samples to be represented as
binary-valued feature vectors;
❑ if handed any other kind of data, a BernoulliNB instance may
binarize its input (depending on the binarize parameter).
Bernoulli trial
❑ A Bernoulli trial is a random experiment that has only two outcomes,
o usually called “success” and “failure”.
❑ For example, the probability of getting heads (a “success”)
when flipping a fair coin is 0.5.
❑ The probability of “failure” is 1 – P (1 minus the probability of
success, which also equals 0.5 for a coin toss).
❑ The Bernoulli distribution is a special case of the binomial
distribution for n = 1. In other words, it is a binomial distribution
with a single trial (e.g. a single coin toss).
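The relationship between a Bernoulli trial and the binomial distribution can be checked numerically. This is a minimal sketch; the function names `bernoulli_pmf` and `binomial_pmf` are assumptions made for illustration.

```python
from math import comb

def bernoulli_pmf(outcome, p):
    """P(X = outcome) for a Bernoulli trial: p for success (1), 1 - p for failure (0)."""
    return p if outcome == 1 else 1 - p

def binomial_pmf(successes, n, p):
    """P(exactly `successes` successes in n independent trials)."""
    return comb(n, successes) * p**successes * (1 - p) ** (n - successes)

p_heads = 0.5  # fair coin
print(bernoulli_pmf(1, p_heads), bernoulli_pmf(0, p_heads))
# With a single trial (n = 1) the binomial distribution reduces to the Bernoulli one.
print(binomial_pmf(1, 1, p_heads), binomial_pmf(0, 1, p_heads))
```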
Implementation of Bernoulli Naive Bayes algorithm (cont.)
# Assigning features and label variables
import numpy as np
document = np.array(["Saturn Dealer’s Car",
                     "Toyota Car Tercel",
                     "Baseball Game Play",
                     "Pulled Muscle Game",
                     "Colored GIFs Root"])
label = np.array(["Auto", "Auto", "Sports", "Sports", "Computer"])
test = np.array(["Home Runs Game", "Car Engine Noises"])
Implementation of Bernoulli Naive Bayes algorithm (cont.)
# Import preprocessing
from sklearn import preprocessing
# Create the label encoder
le = preprocessing.LabelEncoder()
# Convert string labels into numbers
lable_encoded = le.fit_transform(label)
print("Label:", lable_encoded)
Implementation of Bernoulli Naive Bayes algorithm (cont.)
# Generate binary word-occurrence features using a vectorizer. There are
# other vectorizers available, and lots of options you can set.
# binary=True records whether a word occurs, not how often.
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(stop_words='english', binary=True)
train_features = vectorizer.fit_transform(document)
test_features = vectorizer.transform(test)
print("Train vocabulary:", vectorizer.vocabulary_)
# Print the dimensions of the training and test data
print("Shape of Train:", train_features.shape)
print("Shape of Test:", test_features.shape)
Implementation of Bernoulli Naive Bayes algorithm (cont.)
# Fit a naive Bayes model to the training data.
# This trains the model using the word occurrence features we
# computed and the existing classifications in the training set.
from sklearn.naive_bayes import BernoulliNB
nb = BernoulliNB()
nb.fit(train_features, lable_encoded)

# Now we can use the model to predict classifications for our test features.
predictions = nb.predict(test_features)
print("Prediction:", predictions)
Advantages Of Naïve Bayes
❑ It is simple and fast, and often accurate enough in practice.
❑ It has very low computation cost.
❑ It can work efficiently on a large dataset.
❑ It can be used for multi-class prediction problems.
❑ It also performs well on text analytics problems.
❑ When the assumption of independence holds, a Naive Bayes
classifier can perform better than other models such as logistic
regression.
Disadvantages of naive Bayes
❑ The assumption of independent features. In practice, it is
almost impossible to get a set of predictors that are
entirely independent.
❑ If a feature value never occurs together with a particular class in
the training data, its estimated likelihood is zero, which makes the
posterior probability for that class zero.
❑ In this case, the model cannot make a sensible prediction. This
is known as the zero probability/frequency problem; Laplace
smoothing is the usual remedy.
Decision Tree
What Is a Decision Tree?
❑ A decision tree is a supervised learning algorithm.
❑ It is a tree-like model used for classification and regression.
❑ Decision trees can be used for both categorical and numerical
data.
o Categorical data: gender, marital status, etc.
o Numerical data: age, temperature, etc.
❑ A decision tree is a tree where
o each node represents a feature (attribute),
o each link (branch) represents a decision (rule), and
o each leaf represents an outcome (categorical or continuous value).
Reason to choose Decision Tree
❑Decision trees mimic the way humans think when
making a decision, so they are easy to understand.
❑The logic behind a decision tree is easy to follow
because it is laid out as a tree-like structure.
Terminologies
❑ Root Node: The first node of the tree. It represents the entire
dataset, which is then divided into two or more
homogeneous sets.
❑ Leaf Node: A final node of the tree; the tree cannot be
divided further after reaching a leaf node.
❑ Splitting: The process of dividing a decision
node/root node into sub-nodes according to the given conditions.
❑ Branch/Sub Tree: A tree formed by splitting the tree.
❑ Pruning: The process of removing unwanted
branches from the tree.
❑ Parent/Child node: A node that is split into sub-nodes is the parent
of those sub-nodes, and the sub-nodes are its children. The root node
is the parent of all other nodes.
How Does A Decision Tree Work?
❑ It splits the dataset into subsets on the basis of the most
significant attribute in the dataset.
❑ How the decision tree identifies this attribute and how the
splitting is done is decided by the attribute selection measure.
❑ The most significant attribute is selected as the root node.
❑ Splitting is done to form sub-nodes called decision nodes.
❑ Nodes which are not split further are terminal or leaf
nodes.
Attribute selection measure.
❑ While implementing a decision tree, the main issue is
how to select the best attribute for the root node and for
the sub-nodes.
❑ To solve this problem there is a technique
called attribute selection measure, or ASM.
❑ There are two popular techniques for ASM:
o Information Gain
o Gini Index
Information Gain
❑ It measures how much information a feature provides about the
class.
❑ According to the value of information gain, we split the node and
build the decision tree.
❑ The attribute with the highest information gain is split on first.
It can be calculated using the formula below:
o Information Gain = Entropy(S) - [Weighted Avg * Entropy(each feature value)]
❑ Entropy: It measures the randomness (impurity) in the data. Entropy can be calculated as:
o Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
❑ Where,
o S = the set of samples
o P(yes) = probability of yes
o P(no) = probability of no
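The two formulas above can be sketched as follows. As an illustration, the code reuses the Weather/Play columns of the 14-row dataset from the Naive Bayes section; the function names `entropy` and `information_gain` are assumptions, not library calls.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(rows, labels):
    """Entropy of the labels minus the weighted entropy after splitting on `rows`."""
    total = len(labels)
    remainder = 0.0
    for value in set(rows):
        subset = [label for row, label in zip(rows, labels) if row == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Weather/Play columns of the 14-row dataset from the Naive Bayes section.
weather = ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast",
           "Sunny", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"]
play = ["No", "Yes", "Yes", "Yes", "No", "Yes", "No",
        "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
print(round(entropy(play), 3), round(information_gain(weather, play), 3))
```

A gain close to zero, as here, means that knowing the weather barely reduces the uncertainty about the label in this particular dataset.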
Gini Index
❑ The Gini index is a measure of impurity used while creating
a decision tree in the CART (Classification and Regression Tree)
algorithm.
❑ An attribute with a low Gini index should be preferred over
one with a high Gini index.
❑ The CART algorithm uses the Gini index to choose its splits,
and it creates only binary splits.
❑ The Gini index can be calculated using the formula below:
o Gini Index = 1 - Σ_j (P_j)^2
Types of decision Trees Algorithms
❑ There are many decision tree algorithms available. Some of
them are:
❑ ID3
❑ C4.5
❑ CART
❑ etc.
Advantages & Disadvantages of DT
Advantages
❑ It follows the same process that a human follows in real life
to make decisions.
❑ It is easy to understand.
❑ It can be very useful for solving decision-related problems.
❑ It helps to think about all the possible outcomes of a problem.
❑ It requires comparatively little data cleaning.
Disadvantages
❑ A decision tree can contain many layers, which makes it complex.
❑ It may overfit, which can be mitigated by using the Random Forest
algorithm.
❑ With more class labels, the computational complexity of
the decision tree may increase.
Working of CART Algorithm
Day  Outlook   Temp.  Humidity  Wind    Decision
1    Sunny     Hot    High      Weak    No
2    Sunny     Hot    High      Strong  No
3    Overcast  Hot    High      Weak    Yes
4    Rain      Mild   High      Weak    Yes
5    Rain      Cool   Normal    Weak    Yes
6    Rain      Cool   Normal    Strong  No
7    Overcast  Cool   Normal    Strong  Yes
8    Sunny     Mild   High      Weak    No
9    Sunny     Cool   Normal    Weak    Yes
10   Rain      Mild   Normal    Weak    Yes
11   Sunny     Mild   Normal    Strong  Yes
12   Overcast  Mild   High      Strong  Yes
13   Overcast  Hot    Normal    Weak    Yes
14   Rain      Mild   High      Strong  No
Gini index:
❑The Gini index is the metric CART uses for classification tasks.
❑It is one minus the sum of the squared probabilities of each class.
We can formulate it as shown below.
❑Gini = 1 – Σ (Pi)^2 for i = 1 to number of classes
Select attribute to create Root node
❑ Outlook (weather): Outlook is a nominal feature. It can be sunny, overcast
or rain. The final decisions for the outlook feature are summarized below.
Outlook   Yes  No  Number of instances
Sunny     2    3   5
Overcast  4    0   4
Rain      3    2   5
❑ Gini(Outlook=Sunny) = 1 – (2/5)^2 – (3/5)^2 = 1 – 0.16 – 0.36 = 0.48
❑ Gini(Outlook=Overcast) = 1 – (4/4)^2 – (0/4)^2 = 0
❑ Gini(Outlook=Rain) = 1 – (3/5)^2 – (2/5)^2 = 1 – 0.36 – 0.16 = 0.48
❑ Then, we calculate the weighted sum of the Gini indexes for the outlook feature.
❑ Gini(Outlook) = (5/14) x 0.48 + (4/14) x 0 + (5/14) x 0.48
❑ Gini(Outlook) = 0.171 + 0 + 0.171 = 0.342
Temperature
❑ Similarly, temperature is a nominal feature and it can have 3 different
values: Cool, Hot and Mild. Let’s summarize the decisions for the temperature
feature.
Temperature  Yes  No  Number of instances
Hot          2    2   4
Cool         3    1   4
Mild         4    2   6
❑ Gini(Temp=Hot) = 1 – (2/4)^2 – (2/4)^2 = 0.5
❑ Gini(Temp=Cool) = 1 – (3/4)^2 – (1/4)^2 = 1 – 0.5625 – 0.0625 = 0.375
❑ Gini(Temp=Mild) = 1 – (4/6)^2 – (2/6)^2 = 1 – 0.444 – 0.111 = 0.445
❑ We’ll calculate the weighted sum of the Gini index for the temperature feature
❑ Gini(Temp) = (4/14) x 0.5 + (4/14) x 0.375 + (6/14) x 0.445
❑ Gini(Temp) = 0.142 + 0.107 + 0.190 = 0.439
Humidity
❑ Humidity is a binary feature. It can be high or normal.
Humidity  Yes  No  Number of instances
High      3    4   7
Normal    6    1   7
❑ Gini(Humidity=High) = 1 – (3/7)^2 – (4/7)^2 = 1 – 0.1836 – 0.326
❑ Gini(Humidity=High) ≈ 0.48
❑ Gini(Humidity=Normal) = 1 – (6/7)^2 – (1/7)^2 = 1 – 0.734 – 0.020
❑ Gini(Humidity=Normal) ≈ 0.244
❑ We’ll calculate the weighted sum of the Gini index for the humidity feature
❑ Gini(Humidity) = (7/14) x 0.48 + (7/14) x 0.244 = 0.362
Wind
❑ Wind is a binary feature, similar to humidity. It can be weak or strong.
Wind    Yes  No  Number of instances
Weak    6    2   8
Strong  3    3   6
❑ Gini(Wind=Weak) = 1 – (6/8)^2 – (2/8)^2 = 1 – 0.5625 – 0.0625
❑ Gini(Wind=Weak) = 0.375
❑ Gini(Wind=Strong) = 1 – (3/6)^2 – (3/6)^2 = 1 – 0.25 – 0.25
❑ Gini(Wind=Strong) = 0.5
❑ We’ll calculate the weighted sum of the Gini index for the wind feature
❑ Gini(Wind) = (8/14) x 0.375 + (6/14) x 0.5
❑ Gini(Wind) = 0.428
To Make decision tree
❑ Choose the attribute with the lowest Gini index.
❑ Outlook will be the root node because it has the minimum Gini index
value. The overcast subset has only yes decisions, so the overcast
branch ends in a leaf.
❑ We apply the same principle to the remaining sub-datasets in the following
steps. First, focus on the sub-dataset for sunny outlook: we need to find the
Gini index scores for the temperature, humidity and wind features.
Feature      Gini index
Outlook      0.342
Temperature  0.439
Humidity     0.362
Wind         0.428
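The root-node selection above can be recomputed directly from the 14-day table. This is a sketch under stated assumptions: the helper names `gini` and `weighted_gini` are illustrative, and the code splits on every distinct value of a feature, as the slides do. Because the slides truncate intermediate results, the exact values printed here can differ from the table in the third decimal place; the ranking of the features is the same.

```python
from collections import Counter

# The 14-day dataset from the table above: (Outlook, Temp, Humidity, Wind, Decision).
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
FEATURES = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}

def gini(labels):
    """Gini index: 1 minus the sum of squared class probabilities."""
    total = len(labels)
    return 1 - sum((count / total) ** 2 for count in Counter(labels).values())

def weighted_gini(rows, column):
    """Gini of each value's subset, weighted by the subset's share of the rows."""
    total = len(rows)
    score = 0.0
    for value in {row[column] for row in rows}:
        subset = [row[-1] for row in rows if row[column] == value]
        score += len(subset) / total * gini(subset)
    return score

scores = {name: weighted_gini(ROWS, col) for name, col in FEATURES.items()}
root = min(scores, key=scores.get)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
print("Root:", root)
```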
Sub-tree (subset) sunny
Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
11 Sunny Mild Normal Strong Yes
Gini of temperature for sunny outlook:
Temperature  Yes  No  Number of instances
Hot          0    2   2
Cool         1    0   1
Mild         1    1   2
❑ Gini(Outlook=Sunny and Temp.=Hot) = 1 – (0/2)^2 – (2/2)^2 = 0
❑ Gini(Outlook=Sunny and Temp.=Cool) = 1 – (1/1)^2 – (0/1)^2 = 0
❑ Gini(Outlook=Sunny and Temp.=Mild) = 1 – (1/2)^2 – (1/2)^2 = 1 – 0.25
– 0.25 = 0.5
❑ Gini(Outlook=Sunny and Temp.) = (2/5)x0 + (1/5)x0 + (2/5)x0.5 = 0.2
Gini of humidity for sunny Outlook(Weather):
Humidity  Yes  No  Number of instances
High      0    3   3
Normal    2    0   2
❑ Gini(Outlook=Sunny and Humidity=High) = 1 – (0/3)^2 – (3/3)^2 = 0
❑ Gini(Outlook=Sunny and Humidity=Normal) = 1 – (2/2)^2 – (0/2)^2 = 0
❑ Gini(Outlook=Sunny and Humidity) = (3/5)x0 + (2/5)x0 = 0
Gini of wind for sunny outlook:
Wind    Yes  No  Number of instances
Weak    1    2   3
Strong  1    1   2
❑ Gini(Outlook=Sunny and Wind=Weak) = 1 – (1/3)^2 – (2/3)^2 = 0.444
❑ Gini(Outlook=Sunny and Wind=Strong) = 1 – (1/2)^2 – (1/2)^2 = 0.5
❑ Gini(Outlook=Sunny and Wind) = (3/5)x0.444 + (2/5)x0.5 = 0.266 + 0.2 = 0.466
Decision for sunny outlook:
❑ We’ve calculated the Gini index score of each feature when outlook is sunny.
The winner is humidity because it has the lowest value.
❑ We put humidity on the sunny outlook branch because it has
the minimum Gini index.
❑ As seen, the decision is always no for high humidity and sunny outlook.
On the other hand, the decision is always yes for normal humidity
and sunny outlook. This branch is complete.
Feature Gini index
Temperature 0.2
Humidity 0
Wind 0.466
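The three scores for the sunny subset can be checked by computing the weighted Gini index straight from the five rows. This is a sketch only; `split_gini` and the tuple layout are illustrative assumptions, not part of the slides.

```python
from collections import Counter, defaultdict

# Sunny subset as (Temperature, Humidity, Wind, Decision)
sunny = [
    ("Hot", "High", "Weak", "No"),
    ("Hot", "High", "Strong", "No"),
    ("Mild", "High", "Weak", "No"),
    ("Cool", "Normal", "Weak", "Yes"),
    ("Mild", "Normal", "Strong", "Yes"),
]
FEATURES = {"Temperature": 0, "Humidity": 1, "Wind": 2}


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(rows, col):
    """Weighted Gini index after splitting rows on the column at index col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


scores = {name: split_gini(sunny, col) for name, col in FEATURES.items()}
best = min(scores, key=scores.get)  # lowest Gini index wins
print(scores, best)
```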
Now, we need to focus on rain outlook.
Day Outlook Temp. Humidity Wind Decision
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
10 Rain Mild Normal Weak Yes
14 Rain Mild High Strong No
Gini of temperature for rain outlook:
❑ Gini(Outlook=Rain and Temp.=Cool) = 1 – (1/2)^2 – (1/2)^2 = 0.5
❑ Gini(Outlook=Rain and Temp.=Mild) = 1 – (2/3)^2 – (1/3)^2 = 0.444
❑ Gini(Outlook=Rain and Temp.) = (2/5)x0.5 + (3/5)x0.444 = 0.466
Temperature   Yes   No   Number of instances
Cool          1     1    2
Mild          2     1    3
Gini of humidity for rain outlook:
❑ Gini(Outlook=Rain and Humidity=High) = 1 – (1/2)^2 – (1/2)^2 = 0.5
❑ Gini(Outlook=Rain and Humidity=Normal) = 1 – (2/3)^2 – (1/3)^2 = 0.444
❑ Gini(Outlook=Rain and Humidity) = (2/5)x0.5 + (3/5)x0.444 = 0.466
Humidity   Yes   No   Number of instances
High       1     1    2
Normal     2     1    3
Gini of wind for rain outlook:
❑ Gini(Outlook=Rain and Wind=Weak) = 1 – (3/3)^2 – (0/3)^2 = 0
❑ Gini(Outlook=Rain and Wind=Strong) = 1 – (0/2)^2 – (2/2)^2 = 0
❑ Gini(Outlook=Rain and Wind) = (3/5)x0 + (2/5)x0 = 0
Wind     Yes   No   Number of instances
Weak     3     0    3
Strong   0     2    2
Decision for rain outlook:
❑ For rain outlook we split on the wind feature because it has the
minimum Gini index.
❑ Put the wind feature on the rain outlook branch and examine the new
sub-datasets.
❑ As seen, the decision is always yes when wind is weak and always no
when wind is strong, so this branch is complete.
Feature Gini index
Temperature 0.466
Humidity 0.466
Wind 0
Final decision Tree
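The tree that results from the steps above (Outlook at the root, Humidity under Sunny, Wind under Rain, and a Yes leaf under Overcast) can be written down as a nested dictionary. The representation and the `predict` helper below are an illustrative sketch, not the author's implementation.

```python
# Nested-dict form of the derived tree; leaves are plain strings.
TREE = {
    "feature": "Outlook",
    "branches": {
        "Sunny": {"feature": "Humidity", "branches": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"feature": "Wind", "branches": {"Weak": "Yes", "Strong": "No"}},
    },
}


def predict(tree, sample):
    """Walk from the root to a leaf, following the sample's feature values."""
    node = tree
    while isinstance(node, dict):
        node = node["branches"][sample[node["feature"]]]
    return node


day_1 = {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}
day_10 = {"Outlook": "Rain", "Humidity": "Normal", "Wind": "Weak"}
print(predict(TREE, day_1), predict(TREE, day_10))  # No Yes
```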
Code Implementation of CART
❑ # Assigning features and label variables
❑ weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast',
  'Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']
❑ temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool',
  'Mild','Cool','Mild','Mild','Mild','Hot','Mild']
❑ humidity=["High","High","High","High","Normal","Normal","Normal",
  "High","Normal","Normal","Normal","High","Normal","High"]
❑ Windy=["Weak","Strong","Weak","Weak","Weak","Strong","Strong",
  "Weak","Weak","Weak","Strong","Strong","Weak","Strong"]
❑ play=['No','No','Yes','Yes','Yes','No','Yes',
  'No','Yes','Yes','Yes','Yes','Yes','No']
❑ # Import LabelEncoder
❑ from sklearn import preprocessing
❑ # Creating the LabelEncoder
❑ le = preprocessing.LabelEncoder()
❑ # Converting string labels into numbers (classes are numbered in sorted order)
❑ weather_encoded=le.fit_transform(weather)
❑ print("Weather:",weather_encoded)
❑ temp_encoded=le.fit_transform(temp)
❑ print("Temp:",temp_encoded)
❑ windy_encoded=le.fit_transform(Windy)
❑ print("Windy:",windy_encoded)
❑ humidity_encoded=le.fit_transform(humidity)
❑ print("Humidity:",humidity_encoded)
❑ label=le.fit_transform(play)
❑ print("Play:",label)
❑ # Combining weather, temp, windy and humidity into a single list of tuples
❑ features=list(zip(weather_encoded,temp_encoded,windy_encoded,humidity_encoded))
❑ print("Features:",features)
❑ # Import the DecisionTreeClassifier
❑ from sklearn.tree import DecisionTreeClassifier
❑ tree = DecisionTreeClassifier(criterion='gini')
❑ # Train the model
❑ tree.fit(features,label)
❑ # Test the model with 2: Sunny, 2: Mild, 1: Weak wind, 0: High humidity
❑ prediction = tree.predict([[2,2,1,0]])
❑ print("Decision",prediction)
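To read the numeric test sample it helps to know how the labels were numbered. LabelEncoder assigns codes to the distinct labels in sorted order; the pure-Python sketch below mimics that mapping without scikit-learn (the helper `label_encode` is hypothetical, written only for illustration).

```python
def label_encode(values):
    """Map each distinct label to an integer, numbering labels in sorted order."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[v] for v in values], mapping


_, weather_codes = label_encode(["Sunny", "Overcast", "Rainy"])
_, temp_codes = label_encode(["Hot", "Mild", "Cool"])
_, wind_codes = label_encode(["Weak", "Strong"])
_, humidity_codes = label_encode(["High", "Normal"])
print(weather_codes, temp_codes, wind_codes, humidity_codes)
```

Under this mapping the sample [2, 2, 1, 0] decodes as Sunny, Mild, Weak wind, High humidity.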
Working of ID3 Algorithm
❑ For the ID3 implementation we use the same dataset
that we used for the CART algorithm.
❑ The first step is to create a root node.
❑ If all results are yes, the leaf node “yes” is returned; if all
results are no, the leaf node “no” is returned.
❑ Otherwise, find the entropy of all observations, E(S), and the entropy
with respect to each attribute “x”, E(S, x).
❑ Compute the information gain and select the attribute with the
highest information gain.
❑ Repeat the above steps on each subset until all attributes are covered.
Complete Entropy of dataset
❑ First we calculate the entropy of the decision column (play).
The decision column consists of 14 instances and includes two
labels: Yes and No.
o Yes=9
o No=5
❑ Entropy(Decision) = –p(Yes)*log2(p(Yes)) – p(No)*log2(p(No))
❑ Entropy(Decision) = –(9/14)*log2(9/14) – (5/14)*log2(5/14) = 0.940
❑ Now, we need to find the most dominant attribute to
make the root node of the tree.
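The entropy of the decision column can be verified with the standard library alone. This is a minimal sketch; the function name `entropy` is illustrative.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (base 2) of a node, given its class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


decision_entropy = entropy([9, 5])  # 9 Yes, 5 No
print(round(decision_entropy, 3))  # 0.94
```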
Wind factor on decision
❑ Formula:
o Gain(Decision, Wind) = Entropy(Decision) – ∑ [ p(Wind=v) * Entropy(Decision|Wind=v) ]
❑ The wind attribute has two values: Weak and Strong. Substituting
them into the formula:
o Gain(Decision, Wind) = Entropy(Decision) –
[ p(Wind=Weak) * Entropy(Decision|Wind=Weak) ] –
[ p(Wind=Strong) * Entropy(Decision|Wind=Strong) ]
❑ Now, we need to calculate Entropy(Decision|Wind=Weak)
and Entropy(Decision|Wind=Strong) respectively.
Weak wind factor on decision
Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
13 Overcast Hot Normal Weak Yes
❑ There are 8 instances with weak wind. The decision is No for 2 of them
and Yes for 6, as shown in the table above.
❑ Entropy(Decision|Wind=Weak) = –p(No)*log2(p(No)) – p(Yes)*log2(p(Yes))
❑ Entropy(Decision|Wind=Weak) = –(2/8)*log2(2/8) – (6/8)*log2(6/8)
❑ Entropy(Decision|Wind=Weak) = 0.811
Strong wind factor on decision(Play):
Day Outlook Temp. Humidity Wind Decision
2 Sunny Hot High Strong No
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
14 Rain Mild High Strong No
❑ Here, there are 6 instances with strong wind. The decision is
split into two equal parts: 3 No and 3 Yes.
❑ Entropy(Decision|Wind=Strong) = –p(No)*log2(p(No)) – p(Yes)*log2(p(Yes))
❑ Entropy(Decision|Wind=Strong) = –(3/6)*log2(3/6) – (3/6)*log2(3/6)
❑ Entropy(Decision|Wind=Strong) = 1
Information Gain for Wind Attribute
❑ Formula:
o Gain(Decision, Wind) = Entropy(Decision) –
[ p(Wind=Weak) * Entropy(Decision|Wind=Weak) ] –
[ p(Wind=Strong) * Entropy(Decision|Wind=Strong) ]
❑ Gain(Decision, Wind) = 0.940 – [ (8/14) * 0.811 ] – [ (6/14) * 1 ]
❑ Gain(Decision, Wind) = 0.048
❑ We have calculated the gain for wind. Apply the same procedure to the
other attributes to find the best attribute for the root node.
Information Gain for Other factors
❑ Other factors on decision
o Gain(Decision, Outlook) = 0.246
o Gain(Decision, Temperature) = 0.029
o Gain(Decision, Humidity) = 0.151
❑ Outlook has the highest gain, so outlook
appears in the root node of the tree.
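The four gains can be recomputed from the Yes/No counts of each attribute value. The counts below were tallied from the 14-day dataset used in the slides; the helper names are illustrative. This is a sketch for checking the arithmetic, not a full ID3 implementation.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (base 2) of a node, given its class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of its children."""
    n = sum(parent)
    remainder = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - remainder


# (Yes, No) counts per attribute value
SPLITS = {
    "Outlook": [(2, 3), (4, 0), (3, 2)],      # Sunny, Overcast, Rain
    "Temperature": [(2, 2), (4, 2), (3, 1)],  # Hot, Mild, Cool
    "Humidity": [(3, 4), (6, 1)],             # High, Normal
    "Wind": [(6, 2), (3, 3)],                 # Weak, Strong
}
gains = {name: information_gain((9, 5), groups) for name, groups in SPLITS.items()}
root = max(gains, key=gains.get)  # highest information gain wins
for name, value in gains.items():
    print(f"{name}: {value:.4f}")
print("root:", root)
```

The computed values agree with the slide figures up to truncation in the third decimal place.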
Overcast outlook on decision
❑ The decision is always yes when the outlook is overcast.
Day Outlook Temp. Humidity Wind Decision
3 Overcast Hot High Weak Yes
7 Overcast Cool Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
Sunny outlook on decision
Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
11 Sunny Mild Normal Strong Yes
❑ Here, there are 5 instances with sunny outlook. The decision
is No for 3/5 of them and Yes for 2/5.
❑ Gain(Outlook=Sunny|Temperature) = 0.570
❑ Gain(Outlook=Sunny|Humidity) = 0.970
❑ Gain(Outlook=Sunny|Wind) = 0.019
❑ Humidity is chosen for this branch because it produces the
highest score when the outlook is sunny.
❑ At this point, the decision is always No if humidity is high.
❑ The decision is always Yes if humidity is normal.
Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
8 Sunny Mild High Weak No
Day Outlook Temp. Humidity Wind Decision
9 Sunny Cool Normal Weak Yes
11 Sunny Mild Normal Strong Yes
Rain outlook on decision
❑ Gain(Outlook=Rain | Temperature) = 0.01997309402197489
❑ Gain(Outlook=Rain | Humidity) = 0.01997309402197489
❑ Gain(Outlook=Rain | Wind) = 0.9709505944546686
❑ Here, wind produces the highest score when the outlook is rain. That’s why we
check the wind attribute at the second level when the outlook is rain.
Day Outlook Temp. Humidity Wind Decision
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
10 Rain Mild Normal Weak Yes
14 Rain Mild High Strong No
❑ The decision is always Yes if wind is weak and outlook is
rain.
❑ The decision is always No if wind is strong and outlook is
rain.
Day Outlook Temp. Humidity Wind Decision
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
Day Outlook Temp. Humidity Wind Decision
6 Rain Cool Normal Strong No
14 Rain Mild High Strong No
Final decision Tree
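The whole procedure (entropy, information gain, recursive splitting) fits in a short pure-Python sketch. It is an illustrative implementation under simple assumptions: categorical features only, no pruning, and ties broken by feature order. It is not the code used in the slides.

```python
from collections import Counter
from math import log2

COLUMNS = ["Outlook", "Temperature", "Humidity", "Wind"]
DATA = """
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
"""
ROWS = [line.split() for line in DATA.strip().splitlines()]


def entropy(rows):
    """Entropy of the decision column (last field) over the given rows."""
    counts = Counter(row[-1] for row in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())


def gain(rows, feature):
    """Information gain of splitting rows on the named feature."""
    col = COLUMNS.index(feature)
    values = {row[col] for row in rows}
    subsets = [[r for r in rows if r[col] == v] for v in values]
    return entropy(rows) - sum(len(s) / len(rows) * entropy(s) for s in subsets)


def id3(rows, features):
    """Return a leaf label or a {feature: {value: subtree}} dictionary."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:  # pure node: make a leaf
        return labels[0]
    if not features:  # nothing left to split on: majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: gain(rows, f))
    col = COLUMNS.index(best)
    remaining = [f for f in features if f != best]
    return {best: {v: id3([r for r in rows if r[col] == v], remaining)
                   for v in sorted({r[col] for r in rows})}}


tree = id3(ROWS, COLUMNS)
print(tree)
```

On this dataset the sketch reproduces the tree derived step by step above: Outlook at the root, Humidity under Sunny and Wind under Rain.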
Implementation of ID3
❑ # Import the DecisionTreeClassifier
❑ from sklearn.tree import DecisionTreeClassifier
❑ # Assigning features and label variables
❑ weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast',
  'Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']
❑ temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool',
  'Mild','Cool','Mild','Mild','Mild','Hot','Mild']
❑ play=['No','No','Yes','Yes','Yes','No','Yes',
  'No','Yes','Yes','Yes','Yes','Yes','No']
❑ # Import LabelEncoder
❑ from sklearn import preprocessing
❑ # Creating the LabelEncoder
❑ le = preprocessing.LabelEncoder()
❑ # Converting string labels into numbers
❑ weather_encoded=le.fit_transform(weather)
❑ print("Weather:",weather_encoded)
❑ temp_encoded=le.fit_transform(temp)
❑ print("Temp:",temp_encoded)
❑ label=le.fit_transform(play)
❑ print("Play:",label)
❑ # Combining weather and temp into a single list of tuples
❑ features=list(zip(weather_encoded,temp_encoded))
❑ print("Features:",features)
❑ # Create an instance of the model and train it.
❑ # scikit-learn builds an optimised CART tree; criterion='entropy'
❑ # makes it split on information gain, as ID3 does.
❑ tree = DecisionTreeClassifier(criterion='entropy')
❑ tree.fit(features,label)
❑ # Predict the result for 0: Overcast, 2: Mild
❑ prediction = tree.predict([[0,2]])
❑ print("Decision",prediction)
Random Forest Algorithm
What is Random Forest
❑ Random forest is an ensemble method: it combines many models of the
same type, here multiple decision trees, into a forest of trees. That
is why it is known as a random forest.
❑ Combining the trees gives a more powerful prediction model than a single tree.
❑ The random forest algorithm works for both regression and
classification problems.
❑Application of Random Forest
o Fraud prediction
o Cancer detection
o Stock market predictions
o Spam filter
o News classification
How does Random Forest work?
❑ Pick N random data records from the dataset.
❑ Build a decision tree based on these N records.
❑ Choose how many trees we want to create and repeat the
previous steps for each tree.
❑ To predict the output for a new record:
❑ In case of regression: each tree predicts a value. The final
result is the average of the values predicted
by all trees.
❑ In case of classification: each tree predicts the class label
for the new record. The new record is assigned to the
category that receives the majority of votes.
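The steps above can be sketched in pure Python. To keep the example short, a deliberately simple one-feature learner (`train_stump`, a hypothetical stand-in that looks only at Outlook) replaces a full decision tree, so this illustrates bootstrap sampling and vote aggregation rather than a complete random forest. A real random forest also considers a random subset of features at each split.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) records at random, with replacement."""
    return [rng.choice(rows) for _ in rows]


def train_stump(sample):
    """Stand-in learner: majority decision per Outlook value, plus a fallback."""
    by_value = {}
    for outlook, decision in sample:
        by_value.setdefault(outlook, []).append(decision)
    fallback = Counter(d for _, d in sample).most_common(1)[0][0]
    table = {v: Counter(ds).most_common(1)[0][0] for v, ds in by_value.items()}
    return lambda outlook: table.get(outlook, fallback)


def majority_vote(predictions):
    """Classification: the class predicted by most trees."""
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Regression: the mean of the values predicted by the trees."""
    return mean(predictions)


# (Outlook, Decision) pairs from the 14-day dataset
rows = [("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rain", "Yes"),
        ("Rain", "Yes"), ("Rain", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
        ("Sunny", "Yes"), ("Rain", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
        ("Overcast", "Yes"), ("Rain", "No")]

rng = random.Random(0)  # fixed seed so the run is repeatable
forest = [train_stump(bootstrap_sample(rows, rng)) for _ in range(5)]
votes = [tree("Overcast") for tree in forest]
print(votes, majority_vote(votes))
```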
Advantages and Disadvantages of Random Forest
Advantages
❑ A random forest contains multiple trees, each trained on a different
sample, so the combined prediction is less biased towards any one sample.
❑ It is a stable algorithm: new training data tends to affect only some
of the trees, not all of them.
❑ It is suitable for both categorical and numerical data.
❑ It can also work well when the dataset has missing values.
❑ The trees can be trained in parallel.
Disadvantages
❑ It is a complex algorithm.
❑ It requires more computation because multiple decision trees
are built and combined.
❑ It takes more time to train than a single-model algorithm.
Code implementation of random Forest
❑ # Assign features
❑ weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast',
  'Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']
❑ temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool',
  'Mild','Cool','Mild','Mild','Mild','Hot','Mild']
❑ humidity=["High","High","High","High","Normal","Normal","Normal",
  "High","Normal","Normal","Normal","High","Normal","High"]
❑ Windy=["Weak","Strong","Weak","Weak","Weak","Strong","Strong",
  "Weak","Weak","Weak","Strong","Strong","Weak","Strong"]
❑ play=['No','No','Yes','Yes','Yes','No','Yes',
  'No','Yes','Yes','Yes','Yes','Yes','No']
❑ # Import LabelEncoder
❑ from sklearn import preprocessing
❑ # Creating the LabelEncoder
❑ le = preprocessing.LabelEncoder()
❑ # Converting string labels into numbers
❑ weather_encoded=le.fit_transform(weather)
❑ print("Weather:",weather_encoded)
❑ temp_encoded=le.fit_transform(temp)
❑ print("Temp:",temp_encoded)
❑ windy_encoded=le.fit_transform(Windy)
❑ print("Windy:",windy_encoded)
❑ humidity_encoded=le.fit_transform(humidity)
❑ print("Humidity:",humidity_encoded)
❑ label=le.fit_transform(play)
❑ print("Play:",label)
❑ # Combining weather, temp, windy and humidity into a single list of tuples
❑ features=list(zip(weather_encoded,temp_encoded,
  windy_encoded,humidity_encoded))
❑ # Import the RandomForestClassifier
❑ from sklearn.ensemble import RandomForestClassifier
❑ # Create an instance of the random forest classifier with 5 trees
❑ tree= RandomForestClassifier(n_estimators=5)
❑ # Train the model
❑ tree.fit(features,label)
❑ # Test with 2: Sunny, 2: Mild, 1: Weak wind, 0: High humidity
❑ prediction = tree.predict([[2,2,1,0]])
❑ print("Decision",prediction)
Support Vector Machine
SVM
❑ Support Vector Machine (SVM) is a supervised machine learning algorithm.
❑ It was developed in the 1990s and is still widely used.
❑ It is used for classification and regression problems.
❑ SVM can be used on linearly separable datasets and on multidimensional
datasets (for example 2-D and 3-D).
❑ SVM can be used for multiclass classification (more than two class
labels).
How SVM Works:
❑ To separate two classes, as shown in the previous slide, we need a
line that divides the data into the two classes.
❑ This line is known as the decision boundary or hyperplane. We
draw the line so that the margin between it and the nearest data
points of each class is as large as possible.
❑ Many hyperplanes could separate the two classes of data points.
Our objective is to find the one that has the maximum margin,
i.e. the maximum distance between the nearest data points of both classes.
❑ Maximizing the margin provides some robustness, so
that future data points can be classified with more confidence.
SVM Related Terminologies
❑ Support Vectors:
o When we classify data with the help of a hyperplane, the data points that lie nearest
to the hyperplane are known as support vectors.
❑ Hyperplane:
o A hyperplane is a decision boundary between the two classes. It is used to separate the
data points of different classes.
❑ Margin:
o We draw a line parallel to the hyperplane through the nearest data points of each class. The gap
between these two lines is known as the margin.
o For example, let D- and D+ be the distances from the hyperplane to the nearest support vectors of the two opposing
classes. Then we obtain the margin as
o Margin = D- + D+
o A larger margin between the classes is considered a good margin; a
smaller margin is a bad margin.
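As a small numeric illustration of Margin = D- + D+, the sketch below measures the distance from each class to a candidate line in 2-D. The line x + y - 8 = 0 is a hypothetical choice made only for this example (it is not claimed to be the maximum-margin hyperplane); the six points are the toy points used in the linear SVM code in this deck.

```python
from math import hypot


def distance_to_line(point, w, b):
    """Perpendicular distance from point to the line w[0]*x + w[1]*y + b = 0."""
    return abs(w[0] * point[0] + w[1] * point[1] + b) / hypot(w[0], w[1])


# Hypothetical separating line: x + y - 8 = 0
w, b = (1.0, 1.0), -8.0
class_0 = [(1, 2), (1.5, 1.8), (1, 0.6)]
class_1 = [(5, 8), (8, 8), (9, 11)]

# The nearest point of each class plays the role of a support vector
d_minus = min(distance_to_line(p, w, b) for p in class_0)
d_plus = min(distance_to_line(p, w, b) for p in class_1)
margin = d_minus + d_plus
print(round(d_minus, 3), round(d_plus, 3), round(margin, 3))
```

An SVM searches over all separating lines for the one that makes this margin as large as possible.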
What is the reason to choose SVM?
❑ SVM can be used for multiclass classification.
❑ SVM can be used for linearly separable datasets.
❑ With kernels, SVM can be used for high-dimensional datasets that
are not linearly separable.
❑ SVM classifies high-dimensional datasets efficiently.
Implementation of Linear SVM:
❑ # Import libraries
❑ import matplotlib.pyplot as plt
❑ from matplotlib import style
❑ style.use("ggplot")
❑ from sklearn import svm
❑ # Attributes: x and y coordinates of six points
❑ x = [1, 5, 1.5, 8, 1, 9]
❑ y = [2, 8, 1.8, 8, 0.6, 11]
❑ plt.scatter(x,y)
❑ plt.show()
❑ # Combine the coordinates into a list of (x, y) feature tuples
❑ X=list(zip(x,y))
❑ # Class labels, one per point (the name y is reused for the labels)
❑ y = [0,1,0,1,0,1]
❑ # Train the SVM model
❑ clf = svm.SVC(kernel='linear', C = 1.0)
❑ clf.fit(X,y)
❑ # Test x=0.58, y=0.76
❑ print(clf.predict([[0.58,0.76]]))
❑ # Test x=10.58, y=10.76
❑ print(clf.predict([[10.58,10.76]]))
Non-Linear SVM
SVM Kernels
❑ In practice, the SVM algorithm is implemented using a kernel.
❑ A kernel transforms the input data space into the required form (linear or
non-linear).
❑ SVM uses a technique called the kernel trick. Here, the kernel takes a low-dimensional
input space and transforms it into a higher-dimensional space.
❑ In other words, it converts a non-separable problem into a
separable problem by adding more dimensions to it.
❑ It is most useful in non-linear separation problems. The kernel trick helps you
build a more accurate classifier.
❑ Types of Kernels
o Linear Kernel
o Polynomial Kernel
o RBF (Radial Basis Function) Kernel
Linear Kernel
❑ The linear kernel is the ordinary dot product of any two given
observations. The product of two vectors is the sum of the
products of each pair of input values.
o K(x, xi) = sum(x * xi)
❑ For example, the inner product of the vectors [1, 2] and [3, 4] is 1*3 + 2*4 =
11.
❑ A prediction for a new input is made from the dot product
between the input (x) and each support vector (xi), as follows:
f(x) = B0 + sum(ai * (x, xi))
❑ Here (x, xi) is the inner product of the new
input vector (x) with each support vector in the training data. The coefficients B0
and ai (one for each support vector) must be estimated from the training data by the learning
algorithm.
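The linear kernel and the prediction formula can be written out directly. In this sketch the support vectors, the coefficients ai and the constant B0 are made-up illustrative values, not values learned from data.

```python
def linear_kernel(x, xi):
    """K(x, xi) = sum(x * xi): the ordinary dot product."""
    return sum(a * b for a, b in zip(x, xi))


def decision_function(x, support_vectors, coefficients, b0):
    """f(x) = B0 + sum(ai * K(x, xi)) over all support vectors xi."""
    return b0 + sum(a * linear_kernel(x, sv)
                    for a, sv in zip(coefficients, support_vectors))


print(linear_kernel([1, 2], [3, 4]))  # 11

# Hypothetical support vectors and coefficients, for illustration only
score = decision_function([1, 1], [[1, 2], [3, 4]], [0.5, -0.25], 0.1)
print(round(score, 2))  # -0.15
```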
Polynomial Kernel
❑ A polynomial kernel is a more generalized form of the
linear kernel. The polynomial kernel can distinguish
curved or non-linear input spaces.
K(x, xi) = (1 + sum(x * xi))^d
❑ Here d is the degree of the polynomial. d=1 is similar
to the linear transformation. The degree needs to be
manually specified in the learning algorithm.
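A short sketch of the polynomial kernel, assuming the common form (1 + x·xi)^d, where the constant 1 is added to the dot product before raising to the power d:

```python
def polynomial_kernel(x, xi, d):
    """K(x, xi) = (1 + sum(x * xi)) ** d, with d the polynomial degree."""
    return (1 + sum(a * b for a, b in zip(x, xi))) ** d


print(polynomial_kernel([1, 2], [3, 4], 1))  # 12
print(polynomial_kernel([1, 2], [3, 4], 2))  # 144
```

With d=1 the result is the linear kernel plus 1, which is why degree 1 behaves like the linear case.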
RBF (radial basis function) Kernel
❑ The radial basis function kernel is a popular kernel function
commonly used in support vector machine classification.
RBF can map an input space into an infinite-dimensional space.
K(x, xi) = exp(-gamma * sum((x – xi)^2))
❑ Here gamma is a positive parameter, often chosen between 0 and 1. A
higher value of gamma fits the training dataset more closely,
which can cause over-fitting. Gamma=0.1 is considered to be a
good default value. The value of gamma needs to be
manually specified in the learning algorithm.
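The RBF kernel is easy to evaluate by hand. This sketch uses gamma=0.1 as in the text above; the two vectors are arbitrary example inputs.

```python
from math import exp


def rbf_kernel(x, xi, gamma):
    """K(x, xi) = exp(-gamma * sum((x - xi)^2))."""
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, xi)))


# Squared distance between [1, 2] and [3, 4] is 8, so K = exp(-0.8)
similarity = rbf_kernel([1, 2], [3, 4], gamma=0.1)
print(round(similarity, 4))
print(rbf_kernel([1, 2], [1, 2], gamma=0.1))  # identical points give 1.0
```

The kernel value falls from 1 towards 0 as the points move apart, and a larger gamma makes it fall faster.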
Implementation of Non Linear Kernel
❑ We can see our dataset is not linearly separable from the
graph.
❑ # Assigning features and label variables
❑ weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast',
  'Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']
❑ temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool',
  'Mild','Cool','Mild','Mild','Mild','Hot','Mild']
❑ humidity=["High","High","High","High","Normal","Normal","Normal",
  "High","Normal","Normal","Normal","High","Normal","High"]
❑ Windy=["Weak","Strong","Weak","Weak","Weak","Strong","Strong",
  "Weak","Weak","Weak","Strong","Strong","Weak","Strong"]
❑ play=['No','No','Yes','Yes','Yes','No','Yes',
  'No','Yes','Yes','Yes','Yes','Yes','No']
❑ # Import LabelEncoder
❑ from sklearn import preprocessing
❑ # Creating the LabelEncoder
❑ le = preprocessing.LabelEncoder()
❑ # Converting string labels into numbers
❑ weather_encoded=le.fit_transform(weather)
❑ print("Weather:",weather_encoded)
❑ temp_encoded=le.fit_transform(temp)
❑ print("Temp:",temp_encoded)
❑ windy_encoded=le.fit_transform(Windy)
❑ print("Windy:",windy_encoded)
❑ humidity_encoded=le.fit_transform(humidity)
❑ print("Humidity:",humidity_encoded)
❑ label=le.fit_transform(play)
❑ print("Play:",label)
❑ # Combining weather, temp, windy and humidity into a single list of tuples
❑ features=list(zip(weather_encoded,temp_encoded,windy_encoded,humidity_encoded))
❑ print("Features:",features)
❑ # Import svm
❑ from sklearn import svm
❑ # Create an SVM classifier with the RBF (non-linear) kernel
❑ clf = svm.SVC(kernel='rbf')
❑ # Train the SVM model
❑ clf.fit(features,label)
❑ # Test with 2: Sunny, 2: Mild, 1: Weak wind, 0: High humidity
❑ prediction = clf.predict([[2,2,1,0]])
❑ print("Decision",prediction)
Advantages & Disadvantages of SVM
Advantages
❑ It works really well when there is a
clear margin of separation.
❑ It is effective in high-dimensional spaces.
❑ It is effective in cases where
the number of dimensions is
greater than the number of
samples.
❑ It uses only a subset of the training points (the support vectors)
in the decision function, so it is also memory efficient.
Disadvantages
❑ It doesn’t perform well
on large datasets
because the required
training time is higher.
❑ It also doesn’t perform very
well when the dataset is
noisy, i.e. when the target classes
overlap.
Regression
Types Of Regression
❑Linear regression
❑Logistic regression
❑Polynomial regression
Logistic Regression and Linear Regression
❑ Linear regression is used to predict a continuous dependent variable using a
given set of independent variables. Logistic regression is used to predict a
categorical dependent variable using a given set of independent variables.
❑ Linear regression is used for solving regression problems. Logistic
regression is used for solving classification problems.
❑ In linear regression, we predict the value of continuous variables. In
logistic regression, we predict the values of categorical variables.
❑ In linear regression, we find the best-fit line, from which we can easily
predict the output. In logistic regression, we find the S-curve, with which we
can classify the samples.
❑ Linear regression estimates its parameters with the least squares method.
Logistic regression estimates its parameters with the maximum likelihood
method.
❑ The output of linear regression must be a continuous value, such as price or
age. The output of logistic regression must be a categorical value, such as
0 or 1, Yes or No.
❑ Linear regression requires the relationship between the dependent variable
and the independent variables to be linear. Logistic regression does not
require a linear relationship between the dependent and independent variables.
❑ In linear regression, there may be collinearity between the independent
variables. In logistic regression, there should not be collinearity between
the independent variables.
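The S-curve mentioned above is the sigmoid function, which squashes a linear score into a value between 0 and 1 that can then be thresholded into a class. The sketch below uses hypothetical coefficients b0 and b1 purely for illustration.

```python
from math import exp


def sigmoid(z):
    """The S-curve: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def predict_class(x, b0, b1, threshold=0.5):
    """Classify x as 1 if the sigmoid of the linear score reaches the threshold."""
    return 1 if sigmoid(b0 + b1 * x) >= threshold else 0


# Hypothetical coefficients, not fitted to any dataset
b0, b1 = -4.0, 1.0
for x in (0, 4, 8):
    print(x, round(sigmoid(b0 + b1 * x), 3), predict_class(x, b0, b1))
```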
Linear Regression
❑ Linear regression:
o Linear regression is a statistical approach for
modeling the relationship between a dependent
variable and a given set of independent variables.
4/16/2020 www.SunilOS.com 147
Linear Regression cont.
❑Linear regression attempts to model the
relationship between two variables by fitting a
linear equation to observed data. One variable is
considered to be an independent variable, and
the other is considered to be a dependent
variable.
o For example, a modeler might want to relate the
weights of individuals to their heights using a linear
regression model.
What is Linear
❑ First, let’s say that you are shopping at Dmart.
Whether you buy goods or not, you have to pay
Rs. 2.00 for a parking ticket. Each apple costs Rs. 1.50,
and you buy x apples. Then
we can build a price list as follows:
Linear Relationship among data
Quantity   Price
1          3.50 Rs
2          5.00 Rs
3          6.50 Rs
4          8.00 Rs
5          9.50 Rs
...        ...
10         17.00 Rs
11         18.50 Rs
...        ...
x          y
Linear Function
❑ It’s easy to predict (or calculate) the Price from the Quantity, and vice versa, using
the equation y = 2 + 1.5x for this example, or in general:
y = a + bx
❑ For this linear function:
❑ a = 2
❑ b = 1.5
❑ A linear function has one independent variable and one dependent variable.
The independent variable is x and the dependent variable is y.
❑ a is the constant term or the y-intercept. It is the value of the dependent
variable when x = 0.
❑ b is the coefficient of the independent variable. It is also known as the slope
and gives the rate of change of the dependent variable.
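The price list follows directly from y = a + bx with a = 2 and b = 1.5. A minimal sketch (the function name `price` is illustrative):

```python
def price(quantity, parking=2.0, unit_price=1.5):
    """y = a + b*x with a = parking fee and b = price per apple."""
    return parking + unit_price * quantity


for quantity in (1, 2, 3, 4, 5, 10, 11):
    print(quantity, price(quantity))
```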
Implementation of Linear Regression:
❑ Code explanation:
❑ dataset: the table containing all values in our csv file
❑ X: the first column, which contains the Years Experience array
❑ y: the last column, which contains the Salary array
❑ The model is y = b0 + b1*x1, where:
❑ y: dependent variable
❑ b0: constant
❑ b1: coefficient
❑ x1: independent variable
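For a single independent variable, b0 and b1 have a closed-form least squares solution. The sketch below applies it to the apple price list from the earlier slide rather than the salary file, since the contents of that file are not shown; `fit_line` is an illustrative helper, not part of scikit-learn.

```python
def fit_line(xs, ys):
    """Least squares estimates of b0 (constant) and b1 (coefficient)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    b0 = mean_y - b1 * mean_x
    return b0, b1


quantities = [1, 2, 3, 4, 5]
prices = [3.5, 5.0, 6.5, 8.0, 9.5]
b0, b1 = fit_line(quantities, prices)
print(b0, b1)  # 2.0 1.5
```

Because the price list is exactly linear, the fit recovers the parking fee (2) and the price per apple (1.5).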
Dataset: Salary Data
Visualization of data
Code Implementation of Linear Regression
❑ import matplotlib.pyplot as plt
❑ import pandas as pd
❑ # Importing the dataset
❑ dataset=pd.read_csv('E:/MLImplementation/regression.csv')
❑ # Features: every column except the last (Years Experience)
❑ X = dataset.iloc[:, :-1].values
❑ # Target: the column at index 1 (Salary)
❑ y = dataset.iloc[:, 1].values
❑ # Splitting the dataset into the training set and test set
❑ from sklearn.model_selection import train_test_split
❑ X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3,
  random_state=0)
❑ # Fitting simple linear regression to the training set
❑ from sklearn.linear_model import LinearRegression
❑ regressor = LinearRegression()
❑ regressor.fit(X_train, y_train)
❑ # Predicting the test set results
❑ y_pred = regressor.predict(X_test)
❑ # Predicting the salary of an employee with 5 years of experience
❑ salary_pred = regressor.predict([[5]])
❑ print(salary_pred)
Code Implementation of Linear Regression (cont.)
❑ # Visualizing the training set results
❑ plt.scatter(X_train, y_train, color='red')
❑ plt.plot(X_train, regressor.predict(X_train), color='blue')
❑ plt.title('Salary VS Experience (Training set)')
❑ plt.xlabel('Year of Experience')
❑ plt.ylabel('Salary')
❑ plt.show()
Training Dataset
Code Implementation of Linear Regression (cont.)
❑ # Visualizing the test set results
❑ plt.scatter(X_test, y_test, color='red')
❑ # The regression line is the same one fitted on the training set
❑ plt.plot(X_train, regressor.predict(X_train), color='blue')
❑ plt.title('Salary VS Experience (Test set)')
❑ plt.xlabel('Year of Experience')
❑ plt.ylabel('Salary')
❑ plt.show()
Test Data
Advantages & Disadvantages of Linear Regression
❑ Advantages:
o Simple and easy to understand.
o Low computational cost.
o A foundation for more complex machine learning algorithms.
❑ Disadvantages:
o Oversimplifies or fails on non-linear problems (it only does well in
linear modeling).
o Sensitive to outliers and noise.
Multi Linear Regression
❑ In most cases we will have more than one independent
variable: anywhere from two independent variables up to hundreds
(or theoretically even thousands) of variables.
❑ In those cases we use a Multiple Linear Regression
model (MLR). The regression equation is much
the same as the simple regression equation, just with
more variables:
Y = b0 + b1X1 + b2X2 + ... + bnXn
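Evaluating the MLR equation is a constant plus a weighted sum of the features. The coefficients in this sketch are hypothetical round numbers chosen only to show the arithmetic; they are not fitted to the loan data.

```python
def predict(intercept, coefficients, features):
    """Y = b0 + b1*X1 + b2*X2 + ... + bn*Xn."""
    return intercept + sum(b * x for b, x in zip(coefficients, features))


# Hypothetical b0 and b1..b3, for illustration only
b0 = 1000.0
b = [250.0, 100.0, -50.0]

# Three features: X1 = 20, X2 = 32, X3 = 0
y = predict(b0, b, [20, 32, 0])
print(y)  # 9200.0
```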
Implementation of Multi Linear Regression
❑ We use a loan dataset for multiple linear regression,
with age, credit-rating and children as features and loan
as the target.
❑ We are going to predict the loan amount (dependent
variable) with the help of age, credit-rating and number of
children (independent variables).
❑ Note that the data has four columns, of which three
columns are features and one is the target variable.
Loan Dataset
Relationship between credit-rating and loan amount
Code Implementation of MLR
❑ # Import the linear regression model
❑ from sklearn.linear_model import LinearRegression
❑ # Features: age, credit-rating and number of children
❑ age=[19,18,28,33,32,31,46,37,37,60,25,62,23,56]
❑ credit_rating=[27.9,42.13,33,22.705,28.88,25.74,
  33.44,27.74,29.83,25.84,26.22,26.29,34.4,39.82]
❑ children=[0,1,3,0,0,0,1,3,2,0,0,0,0,0]
❑ # Label data
❑ loan=[16884.924,1725.5523,4449.462,21984.47061,3866.8552,
  3756.6216,8240.5896,7281.5056,6406.4107,28923.13692,
  2721.3208,27808.7251,1826.843,11090.7178]
❑ # Combining age, credit-rating and children into a single list of tuples
❑ features=list(zip(age,credit_rating,children))
❑ print(features)
❑ # Define the multiple linear regression model
❑ linear_regress = LinearRegression()
❑ # Fit the multiple linear regression model
❑ linear_regress.fit(features,loan)
❑ print("coefficient:",linear_regress.coef_)
❑ print("intercept:",linear_regress.intercept_)
❑ # Predict with test data
❑ # age: 20, credit-rating: 32, children: 0
❑ y_pred=linear_regress.predict([[20,32,0]])
❑ print(y_pred)
Disclaimer
❑This is an educational presentation to enhance the
skill of computer science students.
❑This presentation is available for free to computer
science students.
❑Some internet images from different URLs are used
in this presentation to simplify technical examples
and correlate examples with the real world.
❑We are grateful to owners of these URLs and
pictures.
Thank You!
www.SunilOS.com

IRJET- Student Placement Prediction using Machine LearningIRJET Journal
 
Machine learning algorithms
Machine learning algorithmsMachine learning algorithms
Machine learning algorithmsShalitha Suranga
 
Module three ppt of DWDM. Details of data mining rules
Module three ppt of DWDM. Details of data mining rulesModule three ppt of DWDM. Details of data mining rules
Module three ppt of DWDM. Details of data mining rulesNivaTripathy1
 
Machine Learning Approach.pptx
Machine Learning Approach.pptxMachine Learning Approach.pptx
Machine Learning Approach.pptxCYPatrickKwee
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learningAmAn Singh
 
Datamining intro-iep
Datamining intro-iepDatamining intro-iep
Datamining intro-iepaaryarun1999
 
Text classificationmethods
Text classificationmethodsText classificationmethods
Text classificationmethodsFraboni Ec
 
Text classification methods
Text classification methodsText classification methods
Text classification methodsHarry Potter
 
Text classification methods
Text classification methodsText classification methods
Text classification methodsLuis Goldster
 
Text classification methods
Text classification methodsText classification methods
Text classification methodsYoung Alista
 
Text classification methods
Text classification methodsText classification methods
Text classification methodsJames Wong
 
Text classification methods
Text classification methodsText classification methods
Text classification methodsTony Nguyen
 
Text classification methods
Text classification methodsText classification methods
Text classification methodsDavid Hoen
 
Machine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskMachine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskQuantUniversity
 
NEAREST NEIGHBOUR CLUSTER ANALYSIS.pptx
NEAREST NEIGHBOUR CLUSTER ANALYSIS.pptxNEAREST NEIGHBOUR CLUSTER ANALYSIS.pptx
NEAREST NEIGHBOUR CLUSTER ANALYSIS.pptxagniva pradhan
 

Similar to Machine learning ( Part 2 ) (20)

Big Data Analytics - Unit 3.pptx
Big Data Analytics - Unit 3.pptxBig Data Analytics - Unit 3.pptx
Big Data Analytics - Unit 3.pptx
 
IRJET- Stabilization of Black Cotton Soil using Rice Husk Ash and Lime
IRJET- Stabilization of Black Cotton Soil using Rice Husk Ash and LimeIRJET- Stabilization of Black Cotton Soil using Rice Husk Ash and Lime
IRJET- Stabilization of Black Cotton Soil using Rice Husk Ash and Lime
 
IRJET- Student Placement Prediction using Machine Learning
IRJET- Student Placement Prediction using Machine LearningIRJET- Student Placement Prediction using Machine Learning
IRJET- Student Placement Prediction using Machine Learning
 
Machine learning algorithms
Machine learning algorithmsMachine learning algorithms
Machine learning algorithms
 
Module three ppt of DWDM. Details of data mining rules
Module three ppt of DWDM. Details of data mining rulesModule three ppt of DWDM. Details of data mining rules
Module three ppt of DWDM. Details of data mining rules
 
Machine Learning Approach.pptx
Machine Learning Approach.pptxMachine Learning Approach.pptx
Machine Learning Approach.pptx
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
 
Datamining intro-iep
Datamining intro-iepDatamining intro-iep
Datamining intro-iep
 
AI Algorithms
AI AlgorithmsAI Algorithms
AI Algorithms
 
Credit risk meetup
Credit risk meetupCredit risk meetup
Credit risk meetup
 
Text classificationmethods
Text classificationmethodsText classificationmethods
Text classificationmethods
 
Text classification methods
Text classification methodsText classification methods
Text classification methods
 
Text classification methods
Text classification methodsText classification methods
Text classification methods
 
Text classification methods
Text classification methodsText classification methods
Text classification methods
 
Text classification methods
Text classification methodsText classification methods
Text classification methods
 
Text classification methods
Text classification methodsText classification methods
Text classification methods
 
Text classification methods
Text classification methodsText classification methods
Text classification methods
 
Machine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskMachine Learning Applications in Credit Risk
Machine Learning Applications in Credit Risk
 
NEAREST NEIGHBOUR CLUSTER ANALYSIS.pptx
NEAREST NEIGHBOUR CLUSTER ANALYSIS.pptxNEAREST NEIGHBOUR CLUSTER ANALYSIS.pptx
NEAREST NEIGHBOUR CLUSTER ANALYSIS.pptx
 
presentationIDC - 14MAY2015
presentationIDC - 14MAY2015presentationIDC - 14MAY2015
presentationIDC - 14MAY2015
 

More from Sunil OS

Threads v3
Threads v3Threads v3
Threads v3Sunil OS
 
Exception Handling v3
Exception Handling v3Exception Handling v3
Exception Handling v3Sunil OS
 
Python Pandas
Python PandasPython Pandas
Python PandasSunil OS
 
Angular 8
Angular 8 Angular 8
Angular 8 Sunil OS
 
C# Variables and Operators
C# Variables and OperatorsC# Variables and Operators
C# Variables and OperatorsSunil OS
 
Rays Technologies
Rays TechnologiesRays Technologies
Rays TechnologiesSunil OS
 
Java Swing JFC
Java Swing JFCJava Swing JFC
Java Swing JFCSunil OS
 

More from Sunil OS (10)

OOP v3
OOP v3OOP v3
OOP v3
 
Threads v3
Threads v3Threads v3
Threads v3
 
Exception Handling v3
Exception Handling v3Exception Handling v3
Exception Handling v3
 
Python Pandas
Python PandasPython Pandas
Python Pandas
 
Angular 8
Angular 8 Angular 8
Angular 8
 
C# Variables and Operators
C# Variables and OperatorsC# Variables and Operators
C# Variables and Operators
 
C# Basics
C# BasicsC# Basics
C# Basics
 
Rays Technologies
Rays TechnologiesRays Technologies
Rays Technologies
 
C++ oop
C++ oopC++ oop
C++ oop
 
Java Swing JFC
Java Swing JFCJava Swing JFC
Java Swing JFC
 

Recently uploaded

INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptxmary850239
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdfMr Bounab Samir
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvRicaMaeCastro1
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSMae Pangan
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsPooky Knightsmith
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Association for Project Management
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Developmentchesterberbo7
 

Recently uploaded (20)

INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdf
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young minds
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 

Machine learning ( Part 2 )

  • 2. www.SunilOS.com 2 What is Machine Learning? ❑ Human Learns from past experience. ❑ A computer does not have “experiences”. ❑ A computer system learns from data, ❑ Which represent some “past experiences” of an application domain. ❑ Our focus: learn a target function that can be used to predict the values of a class attribute, e.g. a loan application is, approve or not-approved, and high-risk or low risk. ❑ The task is commonly called: Supervised learning, classification, or inductive learning.
  • 3. Types of Learning ❑Supervised Learning o Classification o Regression ❑Unsupervised Learning o Clustering ❑Reinforcement Learning www.SunilOS.com 3
  • 4. Types of supervised Learning ❑Classification: o A classification problem is when the output variable is a category, such as “red” or “blue” or “disease” and “no disease”. ❑Regression: o A regression problem is when the output variable is a real value, such as “dollars” or “weight”. www.SunilOS.com 4
  • 5. Supervised Learning Process ❑Learning(training): o Learn the model with known data ❑Testing: o test the Model with unseen data ❑Accuracy: ❑ No of right classification/Total no of test case www.SunilOS.com 5 Training data Learning algorithm Model AccuracyTraining Data Step1: Training Step2: Testing Testing Data
  • 6. Classification example ❑ A loan providing company receives thousands of applications for new loans. ❑ Each application contains information about an applicant o Age o Marital status o annual salary o Outstanding debts o credit rating o etc. ❑ Problem: to decide whether an application should approved, or to classify applications into two categories, approved and not approved. www.SunilOS.com 6
  • 9. An example ❑Data: Loan application data ❑Task: Predict whether a loan should be approved or not. ❑Performance measure: Accuracy. ❑No learning: classify all future applications (test data) to the majority class (i.e., Yes): o Accuracy = 9/15 = 60%. ❑We can do better than 60% with learning. www.SunilOS.com 9
  • 10. Evaluating classification methods ❑Predictive accuracy o Accuracy=No of correct classification / total no of test Case ❑Efficiency o time to construct the model o time to use the model www.SunilOS.com 10
  • 11. Conclusion ❑ Applications of supervised learning are in almost any field or domain. ❑ There are numerous classification techniques. o Bayesian networks o K- Nearest Neighbors o Decision Tree Classification o Fuzzy classification ❑ This large number of methods also show the importance of classification and its wide applicability. ❑ It remains to be an active research area. www.SunilOS.com 11
  • 13. www.SunilOS.com 13 What is Classification?  Classification is a supervised machine learning approach.  Computer uses Training data for learning and uses this learning to classify new observations.  Classification can be:  Binary class classification : spam or not spam, male or female Multiclass classification: Fruits, Colors. 4/16/2020 www.SunilOS.com 13
  • 14. Types of classification algorithm ❑Linear Classifiers: Logistic Regression, Naive Bayes Classifier ❑K Nearest Neighbor ❑Support Vector Machines ❑Decision Trees ❑Random Forest 4/16/2020 www.SunilOS.com 14
  • 15. K-Nearest Neighbor ❑ The k-nearest-neighbors algorithm is a supervised classification technique that based on similar qualities. ❑ KNN assumes, similar things exist near to each other. ❑ The algorithm takes a bunch of labeled points and uses them to learn how to label other points. ❑ To label a new point, it looks at the labeled points closest to that new point (those are its nearest neighbors). ❑ Closeness is typically expressed in terms of a dissimilarity function. ❑ Once it checks with ‘k’ number of nearest neighbors, it assigns a label based on whichever label the most of the neighbors have. www.SunilOS.com4/16/2020 15
  • 16. KNN working Steps ❑Calculate distance for new test data with old labeled data ❑Find closest neighbors for new test data. ❑Vote for labels which is nearest. 4/16/2020 www.SunilOS.com 16
  • 17. KNN algorithm Implementation ❑Define dataset. ❑Prepare data. ❑Train model. ❑Test Model. ❑Calculate accuracy. 4/16/2020 www.SunilOS.com 17
  • 18. Dataset ❑Let's first create your own dataset. Here you need two kinds of attributes or columns in your data: Feature and target label. The reason for two type of column is "supervised nature of KNN algorithm". ❑In this dataset, you have two features (weather and temperature) and one label(play). 4/16/2020 www.SunilOS.com 18
  • 19. Define dataset Weather Temp Play Sunny Hot No Sunny Hot Yes Overcast Hot Yes Rainy Mild Yes Rainy Cool No Rainy Cool Yes Overcast Cool No Sunny Mild Yes Sunny Cool Yes Rainy Mild Yes Sunny Mild Yes Overcast Mild Yes Overcast Hot Yes Rainy Mild No4/16/2020 www.SunilOS.com 19
  • 21. Code implementation in scikit learn ❑ # Assigning features and label variables ❑ # First Feature ❑ weather=['Sunny','Sunny','Overcast','Rainy','Rainy', 'Rainy','Overcast','Sunny','Sunny', ❑ 'Rainy','Sunny','Overcast','Overcast','Rainy'] ❑ # Second Feature ❑ temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool', 'Mild','Cool','Mild','Mild','Mild','Hot','Mild'] ❑ ❑ # Label or target variable ❑ play=['No','No','Yes','Yes','Yes','No','Yes','No','Y es','Yes','Yes','Yes','Yes','No'] 4/16/2020 www.SunilOS.com 21
  • 22. Code implementation in scikit learn(cont.) ❑ # Import Label Encoder ❑ from sklearn import preprocessing ❑ #creating label Encoder ❑ le = preprocessing.LabelEncoder() ❑ # Converting string labels into numbers. ❑ weather_encoded=le.fit_transform(weather) ❑ print(weather_encoded) ❑ ❑ # converting string labels into numbers ❑ temp_encoded=le.fit_transform(temp) ❑ label=le.fit_transform(play) ❑ print(label) 4/16/2020 www.SunilOS.com 22
  • 23. Code implementation in scikit learn(cont.) ❑ #combining weather and temp into single list of tuples ❑ features=list(zip(weather_encoded,temp_encoded)) ❑ print(features) ❑ #Prepare Model instance ❑ from sklearn.neighbors import KNeighborsClassifier ❑ model = KNeighborsClassifier(n_neighbors=3) ❑ # Train the model using the training sets ❑ model.fit(features,label) ❑ #Predict Output ❑ predicted= model.predict([[0,2]]) # 0:Overcast, 2:Mild ❑ print(predicted) 4/16/2020 www.SunilOS.com 23
  • 24. Advantage of KNN ❑It is extremely easy to implement ❑This makes the KNN algorithm much faster than other algorithms that require training e.g. SVM, Linear Regression etc. ❑Since the algorithm requires no training before making predictions, new data can be added seamlessly. ❑There are only two parameters required to implement KNN i.e. the value of K and the distance function (e.g. Euclidean or Manhattan etc.) 4/16/2020 www.SunilOS.com 24
  • 25. Disadvantages of KNN ❑ The KNN algorithm doesn't work well with high dimensional data because with large number of dimensions, it becomes difficult for the algorithm to calculate distance in each dimension. ❑ The KNN algorithm has a high prediction cost for large datasets. This is because in large datasets the cost of calculating distance between new point and each existing point becomes higher. ❑ Finally, the KNN algorithm doesn't work well with categorical features since it is difficult to find the distance between dimensions with categorical features. 4/16/2020 www.SunilOS.com 25
  • 26. Naive Bayes Classification Base ❑It uses Bayes theorem of probability for prediction of unknown class/Label. ❑Naive Bayes classifier assumes that the effect of a particular feature in a class is independent of other features. o For example, a loan applicant is desirable or not depending on his/her income, previous loan and transaction history, age, and location. o Even if these features are interdependent, these features are still considered independently. o This assumption simplifies computation, and that's why it is considered as naive www.SunilOS.com 26
  • 27. Approve a Loan ❑ Bank has received a loan application and now we want to predict whether bank will approve or not. ❑ Approval will be decide on the basis of independent attributes specified in the application form. ❑ Income, previous loan, transaction history, age, and location information specified in application form are considered as independent attribute. ❑ Now we will calculate separate probability: ❑ probability of approval or rejection of loan on income, ❑ probability of approval or rejection of loan on previous loan, ❑ probability of approval or rejection of loan on age, ❑ probability of approval or rejection of loan on location, ❑ Naive Bayes will help us to multiply above probabilities and forecast approval and rejection of new loan application. www.SunilOS.com 27
  • 28. Naïve Bayes Classification Base (cont.) ❑ Where, ❑ P(c|x) is the posterior probability of class c given predictor ( features). ❑ P(c) is the probability of class. ❑ P(x|c) is the likelihood which is the probability of predictor given class. ❑ P(x) is the prior probability of predictor. www.SunilOS.com 28
  • 29. Types of Naive Bayes Algorithm ❑Gaussian Naive Bayes. ❑Multinomial Naive Bayes. ❑Bernoulli Naïve Bayes. ❑P(A|B)=P(B|A)*P(A) ❑ ----------------- ❑ P(B) www.SunilOS.com 29
  • 30. How Gaussian Naive Bayes classifier works? ❑Given an example of weather conditions and playing sports. ❑You need to calculate the probability of playing sports. ❑Now, you need to classify whether players will play or not, based on the weather condition. www.SunilOS.com 30
  • 31. How Naive Bayes classifier works? (cont.) ❑ Naive Bayes classifier calculates the probability of an event in the following steps: ❑ Calculate the prior probability for given class labels o p(play) o P(not play). ❑ Find Likelihood probability with each attribute for each class. o P(Hot/play) or p(Hot/not play) o P(Cold/play) p(Cold/not play) ❑ Put these value in Bayes Formula and calculate posterior probability. ❑ See which class has a higher probability, given the input belongs to the higher probability class. www.SunilOS.com 31
  • 32. Dataset Weather Play Sunny No Sunny Yes Overcast Yes Rainy Yes Rainy No Rainy Yes Overcast No Sunny Yes Sunny Yes Rainy Yes Sunny Yes Overcast Yes Overcast Yes Rainy No www.SunilOS.com 32
  • 33. Frequency Table Weather No Yes Sunny 1 4 5 Overcast 1 3 4 Rainy 2 3 5 Total 4 10 www.SunilOS.com 33
  • 34. Prior Probability of class Weather No Yes Sunny 1 4 5 5/14=0.35 Overcast 1 3 4 4/14=0.29 Rainy 2 3 5 5/14=0.35 Total 4 10 4/14=0.29 10/14=0.71 www.SunilOS.com 34
  • 35. Posterior Probability Weather No Yes Posterior probability of No Posterior Probability of Yes Sunny 1 4 1/4= 0.25 4/10=0.4 Overcast 1 3 1/4= 0.25 3/10=0.3 Rainy 2 3 2/4 =0.5 3/10=0.3 Total 4 10 4/14=0.29 10/14=0.71 www.SunilOS.com 35
  • 36. Probability of playing when weather is overcast ❑ Equation: o P(Yes|Overcast)=P(Overcast|Yes)*P(Yes)/P(Overcast) ❑ Calculate Prior Probabilities: o P(Overcast) = 4/14 = 0.29 o P(Yes)= 10/14 = 0.71 ❑ Calculate Posterior Probabilities: o P(Overcast |Yes) = 3/10 = 0.3 ❑ Put Prior and Posterior probabilities in equation o P (Yes | Overcast) = 0.3 * 0.71 / 0.29 = 0.7344(Higher) www.SunilOS.com 36
  • 37. Probability of not playing when weather is overcast ❑ Equation: o P(No|Overcast)=P(Overcast|No)*P(No)/P(Overcast) ❑ Calculate Prior Probabilities: o P(Overcast) = 4/14 = 0.29 o P(No)= 4/14 = 0.29 ❑ Calculate Posterior Probabilities: o P(Overcast |No) = 1/4 = 0.25 ❑ Put Prior and Posterior probabilities in equation o P (No | Overcast) = 0.25 * 0.29 / 0.29 = 0.25(Low) www.SunilOS.com 37
  • 38. Implementation of Naive Bayes algorithm: ❑ # Assigning features and label variables ❑ weather=['Sunny','Sunny','Overcast','Rainy','Ra iny','Rainy','Overcast','Sunny','Sunny','Rainy' ,'Sunny','Overcast','Overcast','Rainy'] ❑ temp=['Hot','Hot','Hot','Mild','Cool','Cool','C ool','Mild','Cool','Mild','Mild','Mild','Hot',' Mild'] ❑ play=['No','No','Yes','Yes','Yes','No','Yes','N o','Yes','Yes','Yes','Yes','Yes','No'] www.SunilOS.com 38
  • 39. Implementation of Naive Bayes algorithm (cont.) ❑ # Import LabelEncoder o from sklearn import preprocessing ❑ #creating labelEncoder o le = preprocessing.LabelEncoder() ❑ # Converting string labels into numbers. o weather_encoded=le.fit_transform(weather) o print("Weather:",weather_encoded) ❑ # Converting string labels into numbers o temp_encoded=le.fit_transform(temp) o print("Temp:",temp_encoded) o label=le.fit_transform(play) o print("Play:",label) www.SunilOS.com 39
  • 40. Implementation of Naive Bayes algorithm (cont.) ❑ #Combining weather and temp into single list of tuples o features=list(zip(weather_encoded,temp_encoded)) o print("Features:",features) ❑ #Import Gaussian Naive Bayes model o from sklearn.naive_bayes import GaussianNB ❑ #Create a Gaussian Classifier o model = GaussianNB() ❑ # Train the model using the training sets o model.fit(features,label) ❑#Predict Output: 0:Overcast, 2:Mild o predicted= model.predict([[0,2]]) o print ("Predicted Value:", predicted) www.SunilOS.com 40
  • 41. Multinomial Naive Bayes algorithm: ❑This machine learning algorithm is used for text data classification. ❑If we are interested in finding out a number of occurrences of a word in a document then we have to use a multinomial naive Bayes algorithm. www.SunilOS.com 41
  • 42. How does Naive Bayes Algorithm Works ? ❑ Let’s consider an example, classify the review whether it is positive or negative. ❑ Training Dataset: www.SunilOS.com 42 Text Reviews I like the movie Positive It's a good movie. Nice Story Positive Nice songs. But sadly a boring ending. negative Overall nice movie Positive Sad, boring movie negative
  • 43. ❑ We classify whether the text “overall liked the movie” has a positive review or a negative review. We have to calculate: ❑ P(positive | overall liked the movie) — the probability that the tag of a sentence is positive. ❑ P(negative | overall liked the movie) — the probability that the tag of a sentence is negative . ❑ Before that, first, we apply Removing Stopwords and Stemming in the text. www.SunilOS.com 43
  • 44. Removing Stopwords & Stemming ❑ Removing Stopwords: These are common words that don’t really add anything to the classification, such as an able, either, else, ever and so on. ❑ ❑ Stemming: Stemming to take out the root of the word. A stemming algorithm reduces the words o “chocolates”, “chocolaty”, “Choco” to the root word, “chocolate” o and “retrieval”, “retrieved”, “retrieves” reduce to the stem “retrieve”. www.SunilOS.com 44
  • 45. Feature Engineering: ❑The important part is to find the features from the data to make machine learning algorithms works. ❑ In this case, we have text. We need to convert this text into numbers that we can do calculations on. ❑ We use word frequencies. That is treating every document as a set of the words it contains. ❑Our features will be the counts of each words. www.SunilOS.com 45
  • 46. Now Calculate Probability ❑ In our case, we have o P(positive | overall liked the movie) ❑ Since for our classifier we have to find out which tag has a bigger probability, we can discard the divisor which is the same for both tags, o P(overall liked the movie|positive)* P(positive) o P(overall liked the movie|negative)* P(negative) www.SunilOS.com 46
  • 47. ❑ There’s a problem though: “overall liked the movie” doesn’t appear in our training dataset, so the probability is zero. Here, we assume the ‘naive’ condition that every word in a sentence is independent of the other ones. This means that now we look at individual words. ❑ We can write this as: o P(overall liked the movie) = P(overall) * P(liked) * P(the) * P(movie) ❑ The next step is just applying the Bayes theorem: o P(overall liked the movie| positive) = P(overall | positive) * P(liked | positive) * P(the | positive) * P(movie | positive) ❑ And now, these individual words actually show up several times in our training data, and we can calculate probability of them! www.SunilOS.com 47
  • 48. The Prior Probability ❑ P(positive) = 3/5 = 0.6. ❑ P(negative) = 2/5 = 0.4. ❑ Then, calculating P(overall | positive) means counting how many times the word “overall” appears in positive texts, plus 1, divided by the total number of words in positive texts plus the total number of unique words in all reviews. o Total words in positive = 13. o Total words in negative = 10. o Total unique words in all reviews = 15.
  • 49. Calculated Conditional Probabilities ❑ For the positive tag, o P(overall | positive) = (1+1)/(13+15) = 0.0714 o P(liked | positive) = (1+1)/(13+15) = 0.0714 o P(the | positive) = (1+1)/(13+15) = 0.0714 o P(movie | positive) = (3+1)/(13+15) = 0.1429 ❑ For the negative tag, o P(overall | negative) = (0+1)/(10+15) = 0.04 o P(liked | negative) = (0+1)/(10+15) = 0.04 o P(the | negative) = (0+1)/(10+15) = 0.04 o P(movie | negative) = (1+1)/(10+15) = 0.08
  • 50. Laplace smoothing ❑ If a word never appears with a tag, its probability comes out as zero and wipes out the whole product. Laplace smoothing avoids this: ❑ we add 1 to every count so it is never zero. To balance this, we add the number of possible words to the divisor, so the result will never be greater than 1. ❑ In our case, the total number of unique words is 15.
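As a minimal sketch of the smoothing rule described above, the following function adds 1 to the word count and adds the vocabulary size to the divisor. The function name `smoothed_likelihood` and the `alpha` parameter are illustrative assumptions; the counts (13 words in positive reviews, 15 unique words) are the ones given in the slides.

```python
from fractions import Fraction

def smoothed_likelihood(word_count, class_word_total, vocabulary_size, alpha=1):
    """P(word | class) with add-alpha smoothing (Laplace smoothing when alpha is 1)."""
    return Fraction(word_count + alpha, class_word_total + alpha * vocabulary_size)

# "movie" appears 3 times in positive reviews.
p_movie_pos = smoothed_likelihood(3, 13, 15)
# A word that never appears in positive reviews still gets a non-zero probability.
p_unseen_pos = smoothed_likelihood(0, 13, 15)

print(p_movie_pos, float(p_movie_pos))  # 1/7, about 0.1429
print(p_unseen_pos)                     # 1/28
```

Exact fractions are used here only to make the arithmetic easy to check by hand.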
  • 52. Result: Positive Review ❑ P(overall | positive) * P(liked | positive) * P(the | positive) * P(movie | positive) * P(positive) = 3.12 * 10^{-5} = 0.0000312 ❑ P(overall | negative) * P(liked | negative) * P(the | negative) * P(movie | negative) * P(negative) = 0.20 * 10^{-5} = 0.000002048 ❑ The positive score is larger, so the review is classified as positive.
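The comparison on this slide can be reproduced with a short script. This is a sketch under the assumption that the smoothed likelihoods and priors are exactly those listed in the slides; the dictionary layout and the `score` helper are illustrative names only.

```python
import math

# Smoothed likelihoods and priors as stated in the slides.
likelihoods = {
    "positive": {"overall": 2 / 28, "liked": 2 / 28, "the": 2 / 28, "movie": 4 / 28},
    "negative": {"overall": 1 / 25, "liked": 1 / 25, "the": 1 / 25, "movie": 2 / 25},
}
priors = {"positive": 3 / 5, "negative": 2 / 5}

def score(words, tag):
    """Unnormalised posterior: P(tag) times the product of P(word | tag)."""
    return priors[tag] * math.prod(likelihoods[tag][w] for w in words)

words = ["overall", "liked", "the", "movie"]
scores = {tag: score(words, tag) for tag in priors}
predicted = max(scores, key=scores.get)

print(scores)
print(predicted)  # positive
```

For longer texts the product of many small numbers can underflow, which is why implementations usually add log-probabilities instead of multiplying probabilities.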
  • 53. Implementation of Multinomial Naive Bayes algorithm: ❑Multinomial implements the naive Bayes algorithm for multinomially (discrete no of possible outcome) distributed data, ❑and is one of the two classic naive Bayes variants used in text classification (where the data are typically represented as word vector counts). www.SunilOS.com 53
  • 54. Implementation of Multinomial Naive Bayes algorithm:
      # Assigning features and label variables
      import numpy as np
      reviews = np.array(['I like the movie',
                          'Its a good movie. Nice Story',
                          'Nice songs. But sadly a boring ending.',
                          'Overall nice movie',
                          'Sad, boring movie'])
      label = ["positive", "positive", "negative", "positive", "negative"]
      test = np.array(["Overall i like the movie"])
  • 55. Implementation of Multinomial Naive Bayes algorithm (cont.)
      # Encode text labels as numbers
      from sklearn import preprocessing
      # Creating the LabelEncoder
      le = preprocessing.LabelEncoder()
      # Converting string labels into numbers.
      lable_encoded = le.fit_transform(label)
      print("Label:", lable_encoded)
  • 56. Implementation of Multinomial Naive Bayes algorithm (cont.)
      # Generate counts from text using a vectorizer. There are other
      # vectorizers available, and lots of options you can set.
      # This performs our step of computing word counts.
      from sklearn.feature_extraction.text import CountVectorizer
      vectorizer = CountVectorizer(stop_words='english')
      train_features = vectorizer.fit_transform(reviews)
      test_features = vectorizer.transform(test)
      print("Train vocabulary:", vectorizer.vocabulary_)
      # Print the dimensions of the training and test data
      print("Shape of Train:", train_features.shape)
      print("Shape of Test:", test_features.shape)
  • 57. Implementation of Multinomial Naive Bayes algorithm (cont.)
      # Fit a naive Bayes model to the training data.
      # This trains the model using the word counts we computed and the
      # existing classifications in the training set.
      from sklearn.naive_bayes import MultinomialNB
      nb = MultinomialNB()
      nb.fit(train_features, lable_encoded)
      # Now we can use the model to predict classifications for our test features.
      predictions = nb.predict(test_features)
      print(predictions)
  • 58. Bernoulli Naive Bayes: ❑ BernoulliNB implements the naive Bayes training and classification algorithms for data that is distributed according to multivariate Bernoulli distributions; o i.e., there may be multiple features but each one is assumed to be a binary-valued (boolean) variable. ❑ Therefore, this class requires samples to be represented as binary-valued feature vectors; ❑ if handed any other kind of data, a BernoulliNB instance may binarize its input (depending on the binarize parameter). www.SunilOS.com 58
  • 59. Bernoulli trial ❑ A Bernoulli trial is a random experiment that has only two outcomes, o usually called a “success” or a “failure”. ❑ For example, the probability of getting heads (a “success”) when flipping a fair coin is 0.5. ❑ The probability of “failure” is 1 - P (1 minus the probability of success, which also equals 0.5 for a coin toss). ❑ The Bernoulli distribution is a special case of the binomial distribution for n = 1. In other words, it is a binomial distribution with a single trial (e.g. a single coin toss).
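The two-outcome definition above can be written as a tiny function. This is a minimal sketch; the name `bernoulli_pmf` and the encoding of success as 1 and failure as 0 are assumptions made for illustration.

```python
def bernoulli_pmf(outcome, p):
    """P(X = outcome) for a Bernoulli trial with success probability p."""
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 (failure) or 1 (success)")
    return p if outcome == 1 else 1 - p

p_heads = 0.5
print(bernoulli_pmf(1, p_heads))  # probability of success
print(bernoulli_pmf(0, p_heads))  # probability of failure

# The two outcomes always account for all of the probability.
total = bernoulli_pmf(0, 0.3) + bernoulli_pmf(1, 0.3)
```

In Bernoulli naive Bayes each feature is treated this way: a word is either present (1) or absent (0) in a document.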
  • 60. Implementation of Bernoulli Naive Bayes algorithm
      # Assigning features and label variables
      import numpy as np
      document = np.array(["Saturn Dealer's Car",
                           "Toyota Car Tercel",
                           "Baseball Game Play",
                           "Pulled Muscle Game",
                           "Colored GIFs Root"])
      label = np.array(["Auto", "Auto", "Sports", "Sports", "Computer"])
      test = np.array(["Home Runs Game", "Car Engine Noises"])
  • 61. Implementation of Bernoulli Naive Bayes algorithm (cont.)
      # Import preprocessing
      from sklearn import preprocessing
      # Creating the LabelEncoder
      le = preprocessing.LabelEncoder()
      # Converting string labels into numbers.
      lable_encoded = le.fit_transform(label)
      print("Label:", lable_encoded)
  • 62. Implementation of Bernoulli Naive Bayes algorithm (cont.)
      # Generate binary features from text using a vectorizer. There are other
      # vectorizers available, and lots of options you can set.
      # With binary=True each feature records whether a word occurs (1) or not (0).
      from sklearn.feature_extraction.text import CountVectorizer
      vectorizer = CountVectorizer(stop_words='english', binary=True)
      train_features = vectorizer.fit_transform(document)
      test_features = vectorizer.transform(test)
      print("Train vocabulary:", vectorizer.vocabulary_)
      # Print the dimensions of the training and test data
      print("Shape of Train:", train_features.shape)
      print("Shape of Test:", test_features.shape)
  • 63. Implementation of Bernoulli Naive Bayes algorithm (cont.)
      # Fit a naive Bayes model to the training data.
      # This trains the model using the word occurrence features we computed
      # and the existing classifications in the training set.
      from sklearn.naive_bayes import BernoulliNB
      nb = BernoulliNB()
      nb.fit(train_features, lable_encoded)
      # Now we can use the model to predict classifications for our test features.
      predictions = nb.predict(test_features)
      print("Prediction:", predictions)
  • 64. Advantages Of Naïve Bayes ❑ It is simple and fast, and often reasonably accurate. ❑ It has very low computation cost. ❑ It can work efficiently on a large dataset. ❑ It can be used for multi-class prediction problems. ❑ It also performs well on text analytics problems. ❑ When the assumption of independence holds, a Naive Bayes classifier can perform as well as or better than models such as logistic regression, often with less training data.
  • 65. Disadvantages of naive Bayes ❑ The assumption of independent features. In practice, it is almost impossible for a model to get a set of predictors that are entirely independent. ❑ If a feature value never occurs together with a particular class in the training data, the estimated conditional probability is zero, which makes the whole posterior probability for that class zero. ❑ In this case, the model cannot make a sensible prediction. This problem is known as the Zero Probability/Frequency Problem, and it is usually handled with a smoothing technique such as Laplace smoothing.
  • 68. What Is Decision Tree? ❑ A decision tree is a supervised learning algorithm. ❑ It is a tree-like model used for classification and regression. ❑ Decision trees can be used for both categorical and numerical data. o Categorical data: gender, marital status, etc. o Numerical data: age, temperature, etc. ❑ In a decision tree, ❑ each internal node represents o a feature (attribute), ❑ each link (branch) represents o a decision (rule) and ❑ each leaf represents o an outcome (a categorical or continuous value).
  • 69. Reason to choose Decision Tree ❑ Decision trees mimic the way humans think when making a decision, so they are easy to understand. ❑ The logic behind a decision tree is easy to follow because it is shown as a tree-like structure.
  • 70. Terminologies ❑ Root Node: The first node of the tree. It represents the entire dataset, which is then divided into two or more homogeneous sets. ❑ Leaf Node: A final node of the tree. The tree cannot be divided further after reaching a leaf node. ❑ Splitting: The process of dividing a decision node or the root node into sub-nodes according to the given conditions. ❑ Branch/Sub Tree: A tree formed by splitting the tree. ❑ Pruning: The process of removing unwanted branches from the tree. ❑ Parent/Child node: A node that is split into sub-nodes is the parent of those sub-nodes, and the sub-nodes are its children. The root node is a parent that has no parent of its own.
  • 71. How Does A Decision Tree Work? ❑ It splits the dataset into subsets on the basis of the most significant attribute in the dataset. ❑ How the decision tree identifies this attribute and how this splitting is done is decided by Attribute selection Measure. ❑ The most significant attribute is selected as the root node. ❑ Splitting is done to form sub-nodes called decision nodes. ❑ And the nodes which do not split further are terminal or leaf nodes. www.SunilOS.com 71
  • 72. Attribute selection measure ❑ While implementing a decision tree, the main issue is how to select the best attribute for the root node and for the sub-nodes. ❑ To solve this problem there is a technique called the Attribute Selection Measure, or ASM. ❑ There are two popular techniques for ASM: o Information Gain o Gini Index
  • 73. Information Gain ❑ It calculates how much information a feature provides us about a class. ❑ According to the value of information gain, we split the node and build the decision tree. ❑ The node/attribute with the highest information gain is split first. It can be calculated using the formula below: o Information Gain = Entropy(S) - [weighted average of the entropy of each subset produced by splitting on the feature] ❑ Entropy: It measures the randomness (impurity) in the data. For two classes, entropy can be calculated as: o Entropy(S) = -P(yes) * log2 P(yes) - P(no) * log2 P(no), where ❑ S = the set of all samples ❑ P(yes) = probability of yes ❑ P(no) = probability of no
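The entropy formula above translates directly into code. The sketch below takes a list of class counts; the function name `entropy` and the example counts (9 yes and 5 no) are illustrative choices.

```python
import math

def entropy(class_counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(class_counts)
    result = 0.0
    for count in class_counts:
        if count:  # a class with zero samples contributes nothing
            p = count / total
            result -= p * math.log2(p)
    return result

print(round(entropy([9, 5]), 3))  # 0.94
print(entropy([7, 7]))            # 1.0, an even split is maximally random
print(entropy([4, 0]))            # 0.0, a pure node has no randomness
```

Entropy is 0 for a pure node and 1 for an even two-class split, so a good split is one that moves the subsets towards 0.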
  • 74. Gini Index ❑ The Gini index is a measure of impurity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm. ❑ An attribute with a low Gini index should be preferred over one with a high Gini index. ❑ The CART algorithm uses the Gini index to create binary splits. ❑ The Gini index can be calculated using the formula below, where P_j is the proportion of samples that belong to class j: o Gini Index = 1 - Σ_j (P_j)^2
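As a quick illustration of the formula, the following sketch computes the Gini index from a list of class counts. The function name `gini` and the sample counts are assumptions chosen for the example.

```python
def gini(class_counts):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

print(gini([4, 0]))            # 0.0, a pure node
print(round(gini([2, 3]), 2))  # 0.48
print(gini([3, 3]))            # 0.5, the maximum for two classes
```

A lower value means a purer node, which is why the attribute with the lowest weighted Gini index is preferred.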
  • 75. Types of decision Trees Algorithms ❑ There are many decision tree algorithms available. Some of Them are as following ❑ ID3 ❑ C4.5 ❑ CART ❑ etc. www.SunilOS.com 75
  • 76. Advantages & Disadvantages of DT Advantages ❑ It follows the same process that a human follows in real life to make decisions. ❑ It is easy to understand. ❑ It can be very useful for solving decision-related problems. ❑ It helps to think about all the possible outcomes of a problem. ❑ It needs less data cleaning than many other algorithms. Disadvantages ❑ A decision tree can contain many layers, which makes it complex. ❑ It may overfit the training data; this can be reduced by using the Random Forest algorithm. ❑ With more class labels, the computational complexity of the decision tree may increase.
  • 77. Working of CART Algorithm
      Day  Outlook   Temp.  Humidity  Wind    Decision
      1    Sunny     Hot    High      Weak    No
      2    Sunny     Hot    High      Strong  No
      3    Overcast  Hot    High      Weak    Yes
      4    Rain      Mild   High      Weak    Yes
      5    Rain      Cool   Normal    Weak    Yes
      6    Rain      Cool   Normal    Strong  No
      7    Overcast  Cool   Normal    Strong  Yes
      8    Sunny     Mild   High      Weak    No
      9    Sunny     Cool   Normal    Weak    Yes
      10   Rain      Mild   Normal    Weak    Yes
      11   Sunny     Mild   Normal    Strong  Yes
      12   Overcast  Mild   High      Strong  Yes
      13   Overcast  Hot    Normal    Weak    Yes
      14   Rain      Mild   High      Strong  No
  • 78. Gini index: ❑ The Gini index is a metric for classification tasks in CART. ❑ It is 1 minus the sum of the squared probabilities of each class. We can formulate it as shown below. ❑ Gini = 1 - Σ (P_i)^2 for i = 1 to the number of classes
  • 79. Select attribute to create Root node ❑ Outlook (weather): Outlook is a nominal feature. It can be sunny, overcast or rain. The table below summarizes the final decisions for the outlook feature.
      Outlook   Yes  No  Number of instances
      Sunny     2    3   5
      Overcast  4    0   4
      Rain      3    2   5
    ❑ Gini(Outlook=Sunny) = 1 - (2/5)^2 - (3/5)^2 = 1 - 0.16 - 0.36 = 0.48 ❑ Gini(Outlook=Overcast) = 1 - (4/4)^2 - (0/4)^2 = 0 ❑ Gini(Outlook=Rain) = 1 - (3/5)^2 - (2/5)^2 = 1 - 0.36 - 0.16 = 0.48 ❑ Then, we calculate the weighted sum of the Gini indexes for the outlook feature. ❑ Gini(Outlook) = (5/14) x 0.48 + (4/14) x 0 + (5/14) x 0.48 ❑ Gini(Outlook) = 0.171 + 0 + 0.171 = 0.342
  • 80. Temperature ❑ Similarly, temperature is a nominal feature and it can have 3 different values: Cool, Hot and Mild. Let’s summarize the decisions for the temperature feature.
      Temperature  Yes  No  Number of instances
      Hot          2    2   4
      Cool         3    1   4
      Mild         4    2   6
    ❑ Gini(Temp=Hot) = 1 - (2/4)^2 - (2/4)^2 = 0.5 ❑ Gini(Temp=Cool) = 1 - (3/4)^2 - (1/4)^2 = 1 - 0.5625 - 0.0625 = 0.375 ❑ Gini(Temp=Mild) = 1 - (4/6)^2 - (2/6)^2 = 1 - 0.444 - 0.111 = 0.445 ❑ We’ll calculate the weighted sum of the Gini index for the temperature feature. ❑ Gini(Temp) = (4/14) x 0.5 + (4/14) x 0.375 + (6/14) x 0.445 ❑ Gini(Temp) = 0.142 + 0.107 + 0.190 = 0.439
  • 81. Humidity ❑ Humidity is a binary feature. It can be high or normal.
      Humidity  Yes  No  Number of instances
      High      3    4   7
      Normal    6    1   7
    ❑ Gini(Humidity=High) = 1 - (3/7)^2 - (4/7)^2 = 1 - 0.184 - 0.327 ❑ Gini(Humidity=High) = 0.49 ❑ Gini(Humidity=Normal) = 1 - (6/7)^2 - (1/7)^2 = 1 - 0.735 - 0.020 ❑ Gini(Humidity=Normal) = 0.245 ❑ We’ll calculate the weighted sum of the Gini index for the humidity feature. ❑ Gini(Humidity) = (7/14) x 0.49 + (7/14) x 0.245 = 0.367 (0.362 if the truncated values 0.48 and 0.244 are used)
  • 82. Wind ❑ Wind is a binary feature, similar to humidity. It can be weak or strong.
      Wind    Yes  No  Number of instances
      Weak    6    2   8
      Strong  3    3   6
    ❑ Gini(Wind=Weak) = 1 - (6/8)^2 - (2/8)^2 = 1 - 0.5625 - 0.0625 ❑ Gini(Wind=Weak) = 0.375 ❑ Gini(Wind=Strong) = 1 - (3/6)^2 - (3/6)^2 = 1 - 0.25 - 0.25 ❑ Gini(Wind=Strong) = 0.5 ❑ We’ll calculate the weighted sum of the Gini index for the wind feature. ❑ Gini(Wind) = (8/14) x 0.375 + (6/14) x 0.5 ❑ Gini(Wind) = 0.428
  • 83. To Make decision tree ❑ Choose the attribute with the lowest Gini index.
      Feature      Gini index
      Outlook      0.342
      Temperature  0.439
      Humidity     0.362
      Wind         0.428
    ❑ Outlook will be the root node because it has the minimum Gini index value. The overcast subset has only yes decisions, so the overcast branch ends in a leaf. ❑ We will apply the same principles to the remaining sub-datasets in the following steps. Focus first on the sub-dataset for sunny outlook. We need to find the Gini index scores for the temperature, humidity and wind features respectively.
  • 84. Sub-tree (subset) sunny
      Day  Outlook  Temp.  Humidity  Wind    Decision
      1    Sunny    Hot    High      Weak    No
      2    Sunny    Hot    High      Strong  No
      8    Sunny    Mild   High      Weak    No
      9    Sunny    Cool   Normal    Weak    Yes
      11   Sunny    Mild   Normal    Strong  Yes
  • 85. Gini of temperature for sunny outlook:
      Temperature  Yes  No  Number of instances
      Hot          0    2   2
      Cool         1    0   1
      Mild         1    1   2
    ❑ Gini(Outlook=Sunny and Temp.=Hot) = 1 - (0/2)^2 - (2/2)^2 = 0 ❑ Gini(Outlook=Sunny and Temp.=Cool) = 1 - (1/1)^2 - (0/1)^2 = 0 ❑ Gini(Outlook=Sunny and Temp.=Mild) = 1 - (1/2)^2 - (1/2)^2 = 1 - 0.25 - 0.25 = 0.5 ❑ Gini(Outlook=Sunny and Temp.) = (2/5) x 0 + (1/5) x 0 + (2/5) x 0.5 = 0.2
  • 86. Gini of humidity for sunny outlook:
      Humidity  Yes  No  Number of instances
      High      0    3   3
      Normal    2    0   2
    ❑ Gini(Outlook=Sunny and Humidity=High) = 1 - (0/3)^2 - (3/3)^2 = 0 ❑ Gini(Outlook=Sunny and Humidity=Normal) = 1 - (2/2)^2 - (0/2)^2 = 0 ❑ Gini(Outlook=Sunny and Humidity) = (3/5) x 0 + (2/5) x 0 = 0
  • 87. Gini of wind for sunny outlook:
      Wind    Yes  No  Number of instances
      Weak    1    2   3
      Strong  1    1   2
    ❑ Gini(Outlook=Sunny and Wind=Weak) = 1 - (1/3)^2 - (2/3)^2 = 1 - 0.111 - 0.444 = 0.444 ❑ Gini(Outlook=Sunny and Wind=Strong) = 1 - (1/2)^2 - (1/2)^2 = 0.5 ❑ Gini(Outlook=Sunny and Wind) = (3/5) x 0.444 + (2/5) x 0.5 = 0.266 + 0.2 = 0.466
  • 88. Decision for sunny outlook: ❑ We’ve calculated the Gini index scores for each feature when outlook is sunny. The winner is humidity because it has the lowest value.
      Feature      Gini index
      Temperature  0.2
      Humidity     0
      Wind         0.466
    ❑ We’ll put humidity below the sunny outlook branch because it has the minimum Gini index. ❑ As seen, the decision is always no for high humidity and sunny outlook. On the other hand, the decision is always yes for normal humidity and sunny outlook. This branch is finished.
  • 89. Now, we need to focus on rain outlook.
      Day  Outlook  Temp.  Humidity  Wind    Decision
      4    Rain     Mild   High      Weak    Yes
      5    Rain     Cool   Normal    Weak    Yes
      6    Rain     Cool   Normal    Strong  No
      10   Rain     Mild   Normal    Weak    Yes
      14   Rain     Mild   High      Strong  No
  • 90. Gini of temperature for rain outlook:
      Temperature  Yes  No  Number of instances
      Cool         1    1   2
      Mild         2    1   3
    ❑ Gini(Outlook=Rain and Temp.=Cool) = 1 - (1/2)^2 - (1/2)^2 = 0.5 ❑ Gini(Outlook=Rain and Temp.=Mild) = 1 - (2/3)^2 - (1/3)^2 = 0.444 ❑ Gini(Outlook=Rain and Temp.) = (2/5) x 0.5 + (3/5) x 0.444 = 0.466
  • 91. Gini of humidity for rain outlook:
      Humidity  Yes  No  Number of instances
      High      1    1   2
      Normal    2    1   3
    ❑ Gini(Outlook=Rain and Humidity=High) = 1 - (1/2)^2 - (1/2)^2 = 0.5 ❑ Gini(Outlook=Rain and Humidity=Normal) = 1 - (2/3)^2 - (1/3)^2 = 0.444 ❑ Gini(Outlook=Rain and Humidity) = (2/5) x 0.5 + (3/5) x 0.444 = 0.466
  • 92. Gini of wind for rain outlook:
      Wind    Yes  No  Number of instances
      Weak    3    0   3
      Strong  0    2   2
    ❑ Gini(Outlook=Rain and Wind=Weak) = 1 - (3/3)^2 - (0/3)^2 = 0 ❑ Gini(Outlook=Rain and Wind=Strong) = 1 - (0/2)^2 - (2/2)^2 = 0 ❑ Gini(Outlook=Rain and Wind) = (3/5) x 0 + (2/5) x 0 = 0
  • 93. Decision for rain outlook: ❑ For rain outlook we will split on the wind feature because it has the minimum Gini index.
      Feature      Gini index
      Temperature  0.466
      Humidity     0.466
      Wind         0
    ❑ Put the wind feature on the rain outlook branch and look at the new sub-datasets. ❑ As seen, the decision is always yes when wind is weak. On the other hand, the decision is always no when wind is strong. This means this branch is finished.
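The hand calculation in the preceding slides can be checked with a short recursive sketch. It follows the slides in creating one branch per attribute value (a multiway split) rather than the binary splits that CART implementations normally use, and the helper names `gini`, `weighted_gini` and `build` are illustrative assumptions. The rows are the 14 days of the weather table.

```python
from collections import Counter

COLUMNS = ["Outlook", "Temp", "Humidity", "Wind"]
ROWS = [  # (Outlook, Temp, Humidity, Wind, Decision)
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def gini(rows):
    """Gini index of the decision column for a set of rows."""
    counts = Counter(row[-1] for row in rows)
    return 1.0 - sum((n / len(rows)) ** 2 for n in counts.values())

def weighted_gini(rows, col):
    """Weighted Gini index after grouping rows by the value in column col."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row)
    score = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return score, groups

def build(rows, cols):
    """Return a nested dict tree; leaves are decision labels."""
    labels = {row[-1] for row in rows}
    if len(labels) == 1 or not cols:
        return Counter(row[-1] for row in rows).most_common(1)[0][0]
    best = min(cols, key=lambda c: weighted_gini(rows, c)[0])
    groups = weighted_gini(rows, best)[1]
    rest = [c for c in cols if c != best]
    return {COLUMNS[best]: {value: build(g, rest) for value, g in groups.items()}}

tree = build(ROWS, [0, 1, 2, 3])
print(tree)
```

The resulting tree has Outlook at the root, Humidity under the sunny branch, Wind under the rain branch, and a plain "Yes" leaf for overcast, which matches the manual result.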
  • 95. Code Implementation of CART
      # Assigning features and label variables
      weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
                 'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
      temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
              'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']
      humidity = ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                  "High", "Normal", "Normal", "Normal", "High", "Normal", "High"]
      Windy = ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
               "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"]
  • 96. Code Implementation of CART
      play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
              'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
      # Import LabelEncoder
      from sklearn import preprocessing
      # Creating the LabelEncoder
      le = preprocessing.LabelEncoder()
      # Converting string labels into numbers.
      weather_encoded = le.fit_transform(weather)
      print("Weather:", weather_encoded)
  • 97. Code Implementation of CART
      # Converting string labels into numbers
      temp_encoded = le.fit_transform(temp)
      print("Temp:", temp_encoded)
      windy_encoded = le.fit_transform(Windy)
      print("Windy:", windy_encoded)
      # The list was defined as `humidity` on the earlier slide
      Humadity_encoded = le.fit_transform(humidity)
      print("Humadity:", Humadity_encoded)
      label = le.fit_transform(play)
      print("Play:", label)
  • 98. Code Implementation of CART
      # Combining weather, temp, windy and humidity into a single list of tuples
      features = list(zip(weather_encoded, temp_encoded, windy_encoded, Humadity_encoded))
      print("Features:", features)
      # Import the DecisionTreeClassifier
      from sklearn.tree import DecisionTreeClassifier
      tree = DecisionTreeClassifier(criterion='gini')
      # Train the model
      tree.fit(features, label)
      # Test the model. LabelEncoder numbers the classes alphabetically, so
      # [2, 2, 1, 0] means weather 2: Sunny, temp 2: Mild, windy 1: Weak, humidity 0: High
      prediction = tree.predict([[2, 2, 1, 0]])
      print("Decision", prediction)
  • 99. Working of ID3 Algorithm ❑ For the ID3 implementation we use the same dataset that we used for the CART algorithm. ❑ The first step is to create a root node. ❑ If all results in a node are yes, the leaf node “yes” is returned; if all results are no, the leaf node “no” is returned. ❑ Otherwise, find the entropy of all observations, E(S), and the entropy for each attribute “x”, E(S, x). ❑ Compute the information gain and select the attribute with the highest information gain. ❑ Repeat the above steps on each subset until all attributes are covered or every subset is pure.
  • 100. Complete Entropy of dataset ❑ First we calculate the entropy of the decision column (play). The decision column consists of 14 instances and includes two labels: Yes and No. o Yes = 9 o No = 5 ❑ Entropy(Decision) = -p(Yes) * log2 p(Yes) - p(No) * log2 p(No) ❑ Entropy(Decision) = -(9/14) * log2(9/14) - (5/14) * log2(5/14) = 0.940 ❑ Now, we need to find the most dominant attribute to make the root node of the tree.
  • 101. Wind factor on decision ❑ Formula: o Gain(Decision, Wind) = Entropy(Decision) - Σ_v [ p(Wind=v) * Entropy(Decision | Wind=v) ] ❑ The wind attribute has two values: Weak and Strong. Substituting them into the formula gives: o Gain(Decision, Wind) = Entropy(Decision) - [ p(Wind=Weak) * Entropy(Decision | Wind=Weak) ] - [ p(Wind=Strong) * Entropy(Decision | Wind=Strong) ] ❑ Now, we need to calculate Entropy(Decision | Wind=Weak) and Entropy(Decision | Wind=Strong) respectively.
  • 102. Weak wind factor on decision
      Day  Outlook   Temp.  Humidity  Wind  Decision
      1    Sunny     Hot    High      Weak  No
      3    Overcast  Hot    High      Weak  Yes
      4    Rain      Mild   High      Weak  Yes
      5    Rain      Cool   Normal    Weak  Yes
      8    Sunny     Mild   High      Weak  No
      9    Sunny     Cool   Normal    Weak  Yes
      10   Rain      Mild   Normal    Weak  Yes
      13   Overcast  Hot    Normal    Weak  Yes
  • 103. Weak wind factor on decision ❑ There are 8 instances with weak wind. The decision is No for 2 of them and Yes for 6, as shown below. ❑ Entropy(Decision | Wind=Weak) = -p(No) * log2 p(No) - p(Yes) * log2 p(Yes) ❑ Entropy(Decision | Wind=Weak) = -(2/8) * log2(2/8) - (6/8) * log2(6/8) ❑ Entropy(Decision | Wind=Weak) = 0.811
  • 104. Strong wind factor on decision (Play):
      Day  Outlook   Temp.  Humidity  Wind    Decision
      2    Sunny     Hot    High      Strong  No
      6    Rain      Cool   Normal    Strong  No
      7    Overcast  Cool   Normal    Strong  Yes
      11   Sunny     Mild   Normal    Strong  Yes
      12   Overcast  Mild   High      Strong  Yes
      14   Rain      Mild   High      Strong  No
  • 105. Strong wind factor on decision (Play): ❑ Here, there are 6 instances with strong wind. The decision is divided into two equal parts. ❑ Entropy(Decision | Wind=Strong) = -p(No) * log2 p(No) - p(Yes) * log2 p(Yes) ❑ Entropy(Decision | Wind=Strong) = -(3/6) * log2(3/6) - (3/6) * log2(3/6) ❑ Entropy(Decision | Wind=Strong) = 1
  • 106. Information Gain for Wind Attribute ❑ Formula: o Gain(Decision, Wind) = Entropy(Decision) - [ p(Wind=Weak) * Entropy(Decision | Wind=Weak) ] - [ p(Wind=Strong) * Entropy(Decision | Wind=Strong) ] ❑ Gain(Decision, Wind) = 0.940 - [ (8/14) * 0.811 ] - [ (6/14) * 1 ] ❑ Gain(Decision, Wind) = 0.048 ❑ We have calculated the gain for wind. Apply the same procedure to the other attributes to find the best attribute for the root node.
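The gain for wind can be verified numerically. This is a small sketch using only the yes/no counts quoted in the slides (9/5 overall, 6/2 for weak wind, 3/3 for strong wind); the two-argument `entropy` helper is an illustrative assumption.

```python
import math

def entropy(yes, no):
    """Two-class entropy (base 2) from the yes and no counts."""
    total = yes + no
    return -sum((c / total) * math.log2(c / total) for c in (yes, no) if c)

n = 14                      # total number of days
parent = entropy(9, 5)      # entropy of the whole decision column
weak = entropy(6, 2)        # 8 days with weak wind
strong = entropy(3, 3)      # 6 days with strong wind

gain_wind = parent - (8 / n) * weak - (6 / n) * strong
print(round(parent, 3), round(weak, 3), round(strong, 3))  # 0.94 0.811 1.0
print(round(gain_wind, 3))                                 # 0.048
```

The weights 8/14 and 6/14 are simply the fractions of days that fall into each wind group.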
  • 107. Information Gain for Other factors ❑ Other factors on decision o Gain(Decision, Outlook) = 0.246 o Gain(Decision, Temperature) = 0.029 o Gain(Decision, Humidity) = 0.151 ❑ The outlook factor has the highest score. That is why outlook appears in the root node of the tree.
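To see where the other gains come from, the same calculation can be run for every attribute of the weather table. The sketch below is illustrative: the helper names are assumptions, and the columns are typed in from the 14-day table. Values are printed to three decimals, so they may differ from truncated figures in the last digit.

```python
import math
from collections import Counter, defaultdict

features = {
    "Outlook": "Sunny Sunny Overcast Rain Rain Rain Overcast Sunny Sunny Rain Sunny Overcast Overcast Rain".split(),
    "Temperature": "Hot Hot Hot Mild Cool Cool Cool Mild Cool Mild Mild Mild Hot Mild".split(),
    "Humidity": "High High High High Normal Normal Normal High Normal Normal Normal High Normal High".split(),
    "Wind": "Weak Strong Weak Weak Weak Strong Strong Weak Weak Weak Strong Strong Weak Strong".split(),
}
play = "No No Yes Yes Yes No Yes No Yes Yes Yes Yes Yes No".split()

def entropy(labels):
    """Entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting on feature."""
    groups = defaultdict(list)
    for value, label in zip(feature, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

gains = {name: information_gain(column, play) for name, column in features.items()}
best = max(gains, key=gains.get)

for name, value in gains.items():
    print(name, round(value, 3))
print("Root attribute:", best)  # Outlook
```

Applying `information_gain` again to only the sunny rows or only the rain rows gives the second-level splits in the same way.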
  • 108. Overcast outlook on decision ❑ The decision is always yes if the outlook is overcast.
      Day  Outlook   Temp.  Humidity  Wind    Decision
      3    Overcast  Hot    High      Weak    Yes
      7    Overcast  Cool   Normal    Strong  Yes
      12   Overcast  Mild   High      Strong  Yes
      13   Overcast  Hot    Normal    Weak    Yes
  • 109. Sunny outlook on decision
      Day  Outlook  Temp.  Humidity  Wind    Decision
      1    Sunny    Hot    High      Weak    No
      2    Sunny    Hot    High      Strong  No
      8    Sunny    Mild   High      Weak    No
      9    Sunny    Cool   Normal    Weak    Yes
      11   Sunny    Mild   Normal    Strong  Yes
  • 110. Sunny outlook on decision ❑ Here, there are 5 instances with sunny outlook. The decision is No for 3/5 of them and Yes for 2/5. ❑ Gain(Outlook=Sunny | Temperature) = 0.570 ❑ Gain(Outlook=Sunny | Humidity) = 0.970 ❑ Gain(Outlook=Sunny | Wind) = 0.019 ❑ Humidity is chosen for the next split because it produces the highest score when the outlook is sunny.
  • 111. Sunny outlook on decision ❑ At this point, the decision is always No if humidity is high.
      Day  Outlook  Temp.  Humidity  Wind    Decision
      1    Sunny    Hot    High      Weak    No
      2    Sunny    Hot    High      Strong  No
      8    Sunny    Mild   High      Weak    No
    ❑ The decision is always Yes if humidity is normal.
      Day  Outlook  Temp.  Humidity  Wind    Decision
      9    Sunny    Cool   Normal    Weak    Yes
      11   Sunny    Mild   Normal    Strong  Yes
  • 112. Rain outlook on decision
      Day  Outlook  Temp.  Humidity  Wind    Decision
      4    Rain     Mild   High      Weak    Yes
      5    Rain     Cool   Normal    Weak    Yes
      6    Rain     Cool   Normal    Strong  No
      10   Rain     Mild   Normal    Weak    Yes
      14   Rain     Mild   High      Strong  No
    ❑ Gain(Outlook=Rain | Temperature) = 0.020 ❑ Gain(Outlook=Rain | Humidity) = 0.020 ❑ Gain(Outlook=Rain | Wind) = 0.971 ❑ Here, wind produces the highest score when the outlook is rain. That is why we check the wind attribute at the 2nd level when the outlook is rain.
  • 113. Rain outlook on decision ❑ The decision is always Yes if wind is weak and outlook is rain.
      Day  Outlook  Temp.  Humidity  Wind    Decision
      4    Rain     Mild   High      Weak    Yes
      5    Rain     Cool   Normal    Weak    Yes
      10   Rain     Mild   Normal    Weak    Yes
    ❑ The decision is always No if wind is strong and outlook is rain.
      Day  Outlook  Temp.  Humidity  Wind    Decision
      6    Rain     Cool   Normal    Strong  No
      14   Rain     Mild   High      Strong  No
  • 115. Implementation of ID3
      # Import the DecisionTreeClassifier
      from sklearn.tree import DecisionTreeClassifier
      # Assigning features and label variables
      weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
                 'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
      temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
              'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']
      play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
              'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
  • 116. Implementation of ID3 (cont.)
      # Import LabelEncoder
      from sklearn import preprocessing
      # Creating the LabelEncoder
      le = preprocessing.LabelEncoder()
      # Converting string labels into numbers.
      weather_encoded = le.fit_transform(weather)
      print("Weather:", weather_encoded)
      # Converting string labels into numbers
      temp_encoded = le.fit_transform(temp)
  • 117. Implementation of ID3 (cont.)
      print("Temp:", temp_encoded)
      label = le.fit_transform(play)
      print("Play:", label)
      # Combining weather and temp into a single list of tuples
      features = list(zip(weather_encoded, temp_encoded))
      print("Features:", features)
      # Create an instance of the model and train it.
      # Note: scikit-learn's tree is an optimized CART; criterion='entropy'
      # makes it choose splits by information gain, as ID3 does.
      tree = DecisionTreeClassifier(criterion='entropy')
      tree.fit(features, label)
      # Predict the result for weather 0: Overcast, temp 2: Mild
      prediction = tree.predict([[0, 2]])
      print("Decision", prediction)
  • 119. What is Random Forest ❑ In the Random Forest algorithm we combine multiple models of the same type: many decision trees are joined together to make a forest of trees. That is known as a random forest. ❑ It helps us to build a powerful prediction model. ❑ The random forest algorithm works for both regression and classification problems. ❑ Applications of Random Forest o Fraud prediction o Cancer detection o Stock market predictions o Spam filter o News classification
  • 120. How does Random Forest work? ❑ Pick N random data records from the dataset. ❑ Build a decision tree based on these N records. ❑ Choose how many trees we want to create and repeat the previous steps. ❑ To predict the output for a new record: ❑ In case of regression: Each tree predicts a result. The final result is the average of the results predicted by all trees. ❑ In case of classification: Each tree predicts the class label for the new record. Finally we assign the new record to the category that receives the majority of the votes.
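The sampling and voting steps listed above can be sketched without any library. This illustration covers only the record sampling and the aggregation of per-tree predictions; training the individual trees is left out, and sampling with replacement (a bootstrap sample) is an assumption, since the slide only says to pick N random records. All function names are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(records, rng):
    """Draw len(records) records at random, with replacement."""
    return [rng.choice(records) for _ in records]

def majority_vote(predictions):
    """Classification: return the class predicted by most trees."""
    return Counter(predictions).most_common(1)[0][0]

def average(predictions):
    """Regression: return the mean of the per-tree predictions."""
    return sum(predictions) / len(predictions)

rng = random.Random(0)              # fixed seed so runs are repeatable
records = list(range(10))           # stand-in for 10 training records
sample = bootstrap_sample(records, rng)
print(sample)                       # some records repeat, some are left out

print(majority_vote(["Yes", "No", "Yes", "Yes", "No"]))  # Yes
print(average([10.0, 12.0, 14.0]))                       # 12.0
```

Because every tree sees a different sample, the trees make different mistakes, and voting or averaging tends to cancel those mistakes out.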
  • 121. Advantages and Disadvantages of Random Forest Advantages ❑ A random forest contains multiple trees, each trained on a different sample, so the combined prediction is less biased towards any single tree. ❑ It is a stable algorithm. If new training data is introduced, it may affect one tree, but it is unlikely to affect all the trees. ❑ It is suitable for both categorical and numerical data. ❑ It can also work well when the dataset has missing values. ❑ The trees can be trained in parallel. Disadvantages ❑ It is a complex algorithm. ❑ It requires more computation to build and combine multiple decision trees. ❑ It takes more time to train than a single decision tree.
  • 122. Code implementation of Random Forest
      # Assign features
      weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
                 'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
      temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
              'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']
      humadity = ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                  "High", "Normal", "Normal", "Normal", "High", "Normal", "High"]
  • 123. Code implementation of Random Forest
      Windy = ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
               "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"]
      play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
              'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
      # Import LabelEncoder
      from sklearn import preprocessing
      # Creating the LabelEncoder
      le = preprocessing.LabelEncoder()
  • 124. Code implementation of Random Forest
      # Converting string labels into numbers.
      weather_encoded = le.fit_transform(weather)
      print("Weather:", weather_encoded)
      temp_encoded = le.fit_transform(temp)
      print("Temp:", temp_encoded)
      windy_encoded = le.fit_transform(Windy)
      print("Windy:", windy_encoded)
      Humadity_encoded = le.fit_transform(humadity)
      print("Humadity:", Humadity_encoded)
      label = le.fit_transform(play)
      print("Play:", label)
  • 125. Code implementation of Random Forest
      # Combining weather, temp, windy and humidity into a single list of tuples
      features = list(zip(weather_encoded, temp_encoded,
                          windy_encoded, Humadity_encoded))
      # Import the RandomForestClassifier
      from sklearn.ensemble import RandomForestClassifier
      # Create an instance of the Random Forest classifier with 5 trees
      tree = RandomForestClassifier(n_estimators=5)
      # Train the model
      tree.fit(features, label)
      # Test the model. LabelEncoder numbers the classes alphabetically, so
      # [2, 2, 1, 0] means weather 2: Sunny, temp 2: Mild, windy 1: Weak, humidity 0: High
      prediction = tree.predict([[2, 2, 1, 0]])
      print("Decision", prediction)
  • 126. Support Vector Machine
  • 127. SVM ❑ Support Vector Machine is a supervised machine learning algorithm. ❑ SVMs were developed in the 1990s and are still widely used. ❑ It is used for classification and regression problems. ❑ SVM can be used for linearly separable datasets and for datasets with several dimensions (for example 2 or 3 dimensions). ❑ SVM can be used for multiclass classification (more than two class labels).
  • 128. How SVM Works: ❑ To separate two classes, as shown in the previous slide, we need a line that separates the data into two classes. ❑ This line is known as the decision boundary or hyperplane. We draw the line so that the margin between the hyperplane and the nearest data points of each class is as large as possible. ❑ To separate the two classes of data points, there are many possible hyperplanes that could be chosen. Our objective is to find the plane that has the maximum margin, i.e. the maximum distance to the nearest data points of both classes. ❑ Maximizing the margin provides some safety so that future data points can be classified with more confidence.
  • 129. SVM Related Terminologies
❑ Support Vectors:
o When we classify data with a hyperplane, the data points that lie nearest to the hyperplane are known as support vectors.
❑ Hyperplane:
o A hyperplane is a decision boundary between the two classes. It is used to separate the data points of different classes.
❑ Margin:
o We draw a line parallel to the hyperplane through the nearest data points of each class. The gap between these two lines is known as the margin.
o For example, if D- and D+ are the distances from the hyperplane to the nearest support vectors of the two opposing classes, then the margin is
o Margin = D- + D+
o A larger margin between the classes is considered a good margin; a smaller margin is a bad margin.
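The margin formula can be illustrated with a short sketch. The hyperplane below (`w`, `b`) is a hypothetical one picked by hand, not the one an SVM would learn; the points are the six samples used in the linear SVM example of this deck. The helper `distance_to_hyperplane` is an illustrative name.

```python
import math

# Hypothetical hyperplane w.x + b = 0 in two dimensions (values chosen for illustration).
w = (1.0, 1.0)
b = -6.0

def distance_to_hyperplane(point):
    """Perpendicular distance from a point to the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))

negative_class = [(1, 2), (1.5, 1.8), (1, 0.6)]   # label 0 in the linear SVM example
positive_class = [(5, 8), (8, 8), (9, 11)]        # label 1 in the linear SVM example

# D- and D+ are the distances to the nearest point of each class.
d_minus = min(distance_to_hyperplane(p) for p in negative_class)
d_plus = min(distance_to_hyperplane(p) for p in positive_class)
margin = d_minus + d_plus
print(round(d_minus, 3), round(d_plus, 3), round(margin, 3))
```

Here D- and D+ are unequal, so this hand-picked hyperplane is not the maximum-margin one. An SVM searches for the hyperplane whose distance to the nearest points of both classes is as large as possible.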
  • 130. What is the reason to Choose SVM?
❑ SVM can be used for multiclass classification.
❑ SVM can be used for linearly separable datasets.
❑ SVM can be used for high-dimensional datasets that are not linearly separable.
❑ SVM classifies datasets in high dimensions efficiently.
  • 131. Implementation of Linear SVM:
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
style.use("ggplot")
from sklearn import svm
# Attributes
x = [1, 5, 1.5, 8, 1, 9]
y = [2, 8, 1.8, 8, 0.6, 11]
plt.scatter(x, y)
plt.show()
  • 132. Implementation of Linear SVM (cont.)
# Combine the x and y coordinates into a list of (x, y) feature pairs
X = list(zip(x, y))
# Class label for each point (y is reused here and no longer holds coordinates)
y = [0, 1, 0, 1, 0, 1]
# Train the SVM model
clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(X, y)
# Test x=0.58, y=0.76
print(clf.predict([[0.58, 0.76]]))
# Test x=10.58, y=10.76
print(clf.predict([[10.58, 10.76]]))
  • 134. SVM Kernels
❑ The SVM algorithm is implemented in practice using a kernel.
❑ A kernel transforms the input data space into the required form (linear or non-linear).
❑ SVM uses a technique called the kernel trick. Here, the kernel takes a low-dimensional input space and transforms it into a higher-dimensional space.
❑ In other words, it converts a non-separable problem into a separable problem by adding more dimensions to it.
❑ It is most useful in non-linear separation problems. The kernel trick helps you to build a more accurate classifier.
❑ Types of Kernels
o Linear Kernel
o Polynomial Kernel
o RBF (Radial Basis Function) Kernel
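The idea of adding a dimension can be sketched with a tiny one-dimensional dataset. The data and the feature map `lift` below are hypothetical, chosen for illustration. A real kernel avoids computing the lifted features explicitly; this sketch computes them only to make the effect visible.

```python
# One-dimensional points: class 1 sits on both sides of class 0,
# so no single threshold on x separates the two classes.
xs = [-2.0, -1.0, 1.0, 2.0]
labels = [1, 0, 0, 1]

def lift(x):
    """Explicit feature map x -> (x, x^2), which adds one dimension."""
    return (x, x * x)

lifted = [lift(x) for x in xs]

# In the lifted space the straight line x^2 = 2.5 separates the classes.
predicted = [1 if second > 2.5 else 0 for _, second in lifted]
print(predicted == labels)  # True
```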
  • 135. Linear Kernel
❑ A linear kernel is the ordinary dot product of any two given observations. The dot product of two vectors is the sum of the products of each pair of input values.
o K(x, xi) = sum(x * xi)
❑ For example, the inner product of the vectors [1, 2] and [3, 4] is 1*3 + 2*4 = 11.
❑ A prediction for a new input is made from the dot product between the input (x) and each support vector (xi):
o f(x) = B0 + sum(ai * K(x, xi))
❑ This equation takes the inner products of a new input vector (x) with all support vectors in the training data. The coefficients B0 and ai (one for each input) must be estimated from the training data by the learning algorithm.
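The dot product and the prediction equation can be written out as a short sketch. The support vectors, the coefficients `ai` and the constant `b0` below are hypothetical values picked for illustration; in practice the learning algorithm estimates them.

```python
def linear_kernel(x, xi):
    """Dot product: sum of the products of each pair of input values."""
    return sum(a * b for a, b in zip(x, xi))

print(linear_kernel([1, 2], [3, 4]))  # 11

# Hypothetical support vectors and coefficients (not learned from data).
support_vectors = [[1, 2], [3, 4]]
coefficients = [0.5, -0.25]
b0 = 0.1

def decision_value(x):
    """f(x) = B0 + sum(ai * K(x, xi)) over all support vectors."""
    return b0 + sum(a * linear_kernel(x, sv)
                    for a, sv in zip(coefficients, support_vectors))

score = decision_value([1, 1])
print(round(score, 2))
```

The sign of the decision value indicates on which side of the hyperplane the new input falls.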
  • 136. Polynomial Kernel
❑ A polynomial kernel is a more generalized form of the linear kernel. The polynomial kernel can distinguish curved or non-linear input spaces.
o K(x, xi) = (1 + sum(x * xi))^d
❑ Here d is the degree of the polynomial. With d=1 the kernel behaves like the linear kernel. The degree needs to be specified manually in the learning algorithm.
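A minimal sketch of the polynomial kernel formula, reusing the vectors [1, 2] and [3, 4] from the linear kernel example. The function name `polynomial_kernel` is chosen here for illustration and is not a library call.

```python
def polynomial_kernel(x, xi, d):
    """K(x, xi) = (1 + sum(x * xi))^d, where d is the degree."""
    dot = sum(a * b for a, b in zip(x, xi))
    return (1 + dot) ** d

print(polynomial_kernel([1, 2], [3, 4], 1))  # 12
print(polynomial_kernel([1, 2], [3, 4], 2))  # 144
```

With d=1 the result is the dot product plus one; higher degrees let the kernel represent curved boundaries.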
  • 137. RBF (Radial Basis Function) Kernel
❑ The radial basis function kernel is a popular kernel function commonly used in support vector machine classification. RBF can map an input space into an infinite-dimensional space.
o K(x, xi) = exp(-gamma * sum((x - xi)^2))
❑ Here gamma is a positive parameter, typically chosen between 0 and 1. A higher value of gamma fits the training dataset more closely, which can cause over-fitting. Gamma=0.1 is often considered a reasonable starting value. The value of gamma needs to be specified manually in the learning algorithm.
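The RBF formula can be sketched in a few lines. The function name `rbf_kernel_value` and the gamma values are chosen here for illustration only.

```python
import math

def rbf_kernel_value(x, xi, gamma):
    """K(x, xi) = exp(-gamma * sum((x - xi)^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-gamma * squared_distance)

same_point = rbf_kernel_value([1, 2], [1, 2], gamma=0.1)   # identical points
low_gamma = rbf_kernel_value([1, 2], [3, 4], gamma=0.1)    # squared distance is 8
high_gamma = rbf_kernel_value([1, 2], [3, 4], gamma=1.0)
print(same_point, round(low_gamma, 4), round(high_gamma, 6))
```

The kernel value is 1 for identical points and decays with distance. A larger gamma makes it decay faster, so each training point influences a smaller neighbourhood, which is why high gamma values tend to over-fit.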
  • 138. Implementation of Non Linear Kernel
❑ From the graph, we can see that our dataset is not linearly separable.
  • 139. Implementation of Non Linear Kernel
# Assigning features and label variables
weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']

temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
        'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']

humadity = ["High", "High", "High", "High", "Normal", "Normal", "Normal",
            "High", "Normal", "Normal", "Normal", "High", "Normal", "High"]
  • 140. Implementation of Non Linear Kernel
Windy = ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
         "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"]

play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
# Import LabelEncoder
from sklearn import preprocessing
# Create the LabelEncoder
le = preprocessing.LabelEncoder()
# Convert the string labels into numbers.
weather_encoded = le.fit_transform(weather)
print("Weather:", weather_encoded)
  • 141. Implementation of Non Linear Kernel
# Convert the string labels into numbers
temp_encoded = le.fit_transform(temp)
print("Temp:", temp_encoded)
windy_encoded = le.fit_transform(Windy)
print("Windy:", windy_encoded)
# Use the same name here as in the print call and the next slide
Humadity_encoded = le.fit_transform(humadity)
print("Humidity:", Humadity_encoded)
label = le.fit_transform(play)
print("Play:", label)
  • 142. Implementation of Non Linear Kernel
# Combine the four encoded features into a single list of tuples
features = list(zip(weather_encoded, temp_encoded,
                    windy_encoded, Humadity_encoded))
print("Features:", features)
# Import svm
from sklearn import svm
# Create an SVM classifier with the RBF (non-linear) kernel
clf = svm.SVC(kernel='rbf')
# Train the SVM model
clf.fit(features, label)
# Test 2:Sunny, 2:Mild, 1:Windy:Weak, 0:Humidity:High
prediction = clf.predict([[2, 2, 1, 0]])
print("Decision", prediction)
  • 143. Advantages & Disadvantages of SVM
Advantages
❑ It works really well when there is a clear margin of separation.
❑ It is effective in high-dimensional spaces.
❑ It is effective in cases where the number of dimensions is greater than the number of samples.
❑ It uses only a subset of the training points (the support vectors) in the decision function, so it is also memory efficient.
Disadvantages
❑ It doesn't perform well on large datasets because the required training time is high.
❑ It also doesn't perform very well when the dataset is noisy, i.e. when the target classes overlap.
  • 145. Types Of Regression
❑ Linear regression
❑ Logistic regression
❑ Polynomial regression
  • 146. Logistic Regression and Linear Regression
❑ Purpose
o Linear regression: predicts a continuous dependent variable from a given set of independent variables.
o Logistic regression: predicts a categorical dependent variable from a given set of independent variables.
❑ Problem type
o Linear regression: used for solving regression problems.
o Logistic regression: used for solving classification problems.
❑ Predicted value
o Linear regression: we predict the value of a continuous variable.
o Logistic regression: we predict the value of a categorical variable.
❑ Fitted shape
o Linear regression: we find the best-fit line, from which the output can be predicted.
o Logistic regression: we find the S-curve, with which the samples can be classified.
❑ Estimation method
o Linear regression: the least squares method is used to estimate the model parameters.
o Logistic regression: the maximum likelihood method is used to estimate the model parameters.
❑ Output
o Linear regression: the output must be a continuous value, such as price, age, etc.
o Logistic regression: the output must be a categorical value, such as 0 or 1, Yes or No, etc.
❑ Relationship between variables
o Linear regression: the relationship between the dependent and independent variables must be linear.
o Logistic regression: a linear relationship between the dependent and independent variables is not required.
❑ Collinearity
o Linear regression: it can still be fitted when the independent variables are collinear, although the coefficients become harder to interpret.
o Logistic regression: there should be little or no collinearity between the independent variables.
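The difference between the best-fit line and the S-curve can be sketched with one shared linear score. The coefficients `a` and `b` below are hypothetical values chosen for illustration, not fitted to any dataset.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
a, b = -4.0, 1.0

def linear_output(x):
    """Linear regression output: an unbounded continuous value."""
    return a + b * x

def logistic_output(x):
    """Logistic regression output: the same linear score passed through the S-curve."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

for x in [0, 4, 8]:
    probability = logistic_output(x)
    predicted_class = 1 if probability >= 0.5 else 0
    print(x, linear_output(x), round(probability, 3), predicted_class)
```

The linear output can take any real value, while the logistic output always lies between 0 and 1 and is turned into a class by comparing it with a threshold (0.5 here).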
  • 147. Linear Regression
❑ Linear regression:
o Linear regression is a statistical approach for modeling the relationship between a dependent variable and a given set of independent variables.
4/16/2020 www.SunilOS.com 147
  • 148. Linear Regression cont. ❑Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an independent variable, and the other is considered to be a dependent variable. o For example, a modeler might want to relate the weights of individuals to their heights using a linear regression model. 4/16/2020 www.SunilOS.com 148
  • 149. What is Linear
❑ First, let's say that you are shopping at Dmart. Whether you buy goods or not, you have to pay Rs. 2.00 for a parking ticket. Each apple costs Rs. 1.50, and you buy x apples. Then we can populate a price list as follows:
  • 150. Linear Relationship among data
Quantity    Price
1           3.50 Rs
2           5.00 Rs
3           6.50 Rs
4           8.00 Rs
5           9.50 Rs
...         ...
10          17.00 Rs
11          18.50 Rs
...         ...
x           y
  • 151. Linear Function
❑ It's easy to predict (or calculate) the Price from the Quantity, and vice versa, using the equation y = 2 + 1.5x for this example, or in general:
o y = a + bx
❑ Linear function with:
o a = 2
o b = 1.5
❑ A linear function has one independent variable and one dependent variable. The independent variable is x and the dependent variable is y.
❑ a is the constant term or the y-intercept. It is the value of the dependent variable when x = 0.
❑ b is the coefficient of the independent variable. It is also known as the slope and gives the rate of change of the dependent variable.
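As a check on the numbers above, the intercept a and slope b can be recovered from the price list with the closed-form least squares formulas for a single variable. This is a plain-Python sketch that uses the listed rows of the table; it is not the scikit-learn implementation.

```python
# Rows of the price list: quantity of apples and total price in Rs.
quantity = [1, 2, 3, 4, 5, 10, 11]
price = [3.5, 5.0, 6.5, 8.0, 9.5, 17.0, 18.5]

n = len(quantity)
mean_x = sum(quantity) / n
mean_y = sum(price) / n

# Slope: covariance of x and y divided by the variance of x.
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(quantity, price))
     / sum((x - mean_x) ** 2 for x in quantity))
# Intercept: the fitted line passes through the point of means.
a = mean_y - b * mean_x

print(round(a, 2), round(b, 2))  # 2.0 1.5
```

Because the table follows y = 2 + 1.5x exactly, the fitted line reproduces the parking fee as the intercept and the apple price as the slope.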
  • 152. Implementation of Linear Regression:
❑ Code explanation:
o dataset: the table containing all values from our CSV file
o X: the first column, which contains the Years of Experience array
o y: the last column, which contains the Salary array
❑ Model: y = b0 + b1*x1
o y: dependent variable
o b0: constant
o b1: coefficient
o x1: independent variable
  • 153. Dataset: Salary Data 4/16/2020 www.SunilOS.com 153
  • 154. Visualization of data 4/16/2020 www.SunilOS.com 154
  • 155. Code Implementation of Linear Regression
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Import the dataset
dataset = pd.read_csv('E:/MLImplementation/regression.csv')
# Features: every column except the last (Years of Experience)
X = dataset.iloc[:, :-1].values
# Target: the second column, index 1 (Salary)
y = dataset.iloc[:, 1].values
  • 156. Code Implementation of Linear Regression (cont.)
# Split the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0)
# Fit Simple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
  • 157. Code Implementation of Linear Regression (cont.)
# Predict the Test set results
y_pred = regressor.predict(X_test)
# Predict the salary for an employee with 5 years of experience
# (stored separately so the test set predictions are not overwritten)
salary_pred = regressor.predict([[5]])
print(salary_pred)
  • 158. Code Implementation of Linear Regression (cont.)
# Visualize the Training set results
viz_train = plt
viz_train.scatter(X_train, y_train, color='red')
viz_train.plot(X_train, regressor.predict(X_train), color='blue')
viz_train.title('Salary VS Experience (Training set)')
viz_train.xlabel('Year of Experience')
viz_train.ylabel('Salary')
viz_train.show()
  • 160. Code Implementation of Linear Regression (cont.)
# Visualize the Test set results
viz_test = plt
viz_test.scatter(X_test, y_test, color='red')
# The regression line is the one fitted on the training set
viz_test.plot(X_train, regressor.predict(X_train), color='blue')
viz_test.title('Salary VS Experience (Test set)')
viz_test.xlabel('Year of Experience')
viz_test.ylabel('Salary')
viz_test.show()
  • 162. Advantages & Disadvantages of Linear Regression
❑ Advantages:
o Simple and easy to understand.
o Cheap computational cost.
o A foundation for more complex machine learning algorithms.
❑ Disadvantages:
o Oversimplifies or fails on non-linear problems (it only does well when the relationship is linear).
o Sensitive to outliers and noise.
  • 163. Multi Linear Regression
❑ In most cases we will have more than one independent variable: anywhere from two to hundreds (or theoretically even thousands) of variables.
❑ In those cases we use a Multiple Linear Regression (MLR) model. The regression equation is much the same as the simple regression equation, just with more variables:
o Y = b0 + b1*X1 + b2*X2 + ... + bn*Xn
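The MLR equation can be illustrated with an ordinary least squares fit in NumPy. The data below is hypothetical: it is generated without noise from y = 1 + 2*x1 + 3*x2, so the fit should recover b0 = 1, b1 = 2 and b2 = 3. This is a sketch of the underlying computation, not the code used later in this deck.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no noise).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

# Add a column of ones so that the intercept b0 is estimated as well.
X_design = np.column_stack([np.ones(len(X)), X])

# Least squares solution of X_design @ coeffs = y.
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
b0, b1, b2 = coeffs
print(np.round(coeffs, 3))

# Prediction for a new observation: Y = b0 + b1*X1 + b2*X2.
new_x = np.array([2.0, 2.0])
prediction = b0 + b1 * new_x[0] + b2 * new_x[1]
print(round(float(prediction), 3))
```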
  • 164. Implementation Of Multi Linear Regression
❑ We use a loan dataset for multiple linear regression, with age, credit-rating and children as features and loan as the target.
❑ We are going to predict the loan amount (dependent variable) from age, credit-rating and number of children (independent variables).
❑ Note that the data has four columns: three are features and one is the target variable.
  • 166. Relationship between credit-rating and loan amount 4/16/2020 www.SunilOS.com 166
  • 167. Code Implementation of MLR
# Features: age, credit-rating and number of children
age = [19, 18, 28, 33, 32, 31, 46, 37, 37, 60, 25, 62, 23, 56]
credit_rating = [27.9, 42.13, 33, 22.705, 28.88, 25.74, 33.44,
                 27.74, 29.83, 25.84, 26.22, 26.29, 34.4, 39.82]
children = [0, 1, 3, 0, 0, 0, 1, 3, 2, 0, 0, 0, 0, 0]
# Label data
loan = [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552,
        3756.6216, 8240.5896, 7281.5056, 6406.4107, 28923.13692,
        2721.3208, 27808.7251, 1826.843, 11090.7178]
  • 168. Code Implementation of MLR (cont.)
# Import the linear regression model
from sklearn.linear_model import LinearRegression
# Combine age, credit-rating and children into a single list of tuples
features = list(zip(age, credit_rating, children))
print(features)
# Define the multiple linear regression model
linear_regress = LinearRegression()
# Fit the multiple linear regression model
linear_regress.fit(features, loan)
print("coefficient:", linear_regress.coef_)
print("intercept:", linear_regress.intercept_)
# Predict with test data
# age:20, credit-rating:32, children:0
y_pred = linear_regress.predict([[20, 32, 0]])
print(y_pred)
  • 169. Disclaimer ❑This is an educational presentation to enhance the skill of computer science students. ❑This presentation is available for free to computer science students. ❑Some internet images from different URLs are used in this presentation to simplify technical examples and correlate examples with the real world. ❑We are grateful to owners of these URLs and pictures. www.SunilOS.com 169