- 2. What is Machine Learning? ❑ Humans learn from past experience. ❑ A computer does not have “experiences”. ❑ A computer system learns from data, which represent some “past experiences” of an application domain. ❑ Our focus: learn a target function that can be used to predict the value of a class attribute, e.g. whether a loan application is approved or not approved, or high-risk or low-risk. ❑ This task is commonly called supervised learning, classification, or inductive learning. www.SunilOS.com 2
- 3. Types of Learning ❑Supervised Learning o Classification o Regression ❑Unsupervised Learning o Clustering ❑Reinforcement Learning www.SunilOS.com 3
- 4. Types of supervised Learning ❑Classification: o A classification problem is when the output variable is a category, such as “red” or “blue” or “disease” and “no disease”. ❑Regression: o A regression problem is when the output variable is a real value, such as “dollars” or “weight”. www.SunilOS.com 4
- 5. Supervised Learning Process ❑ Learning (training): o Learn the model from known (labeled) data. ❑ Testing: o Test the model on unseen data. ❑ Accuracy: o Number of correct classifications / total number of test cases. [Figure: Step 1, Training: training data → learning algorithm → model. Step 2, Testing: test data → model → accuracy.]
- 6. Classification example ❑ A loan company receives thousands of applications for new loans. ❑ Each application contains information about the applicant: o Age o Marital status o Annual salary o Outstanding debts o Credit rating o etc. ❑ Problem: decide whether an application should be approved, i.e. classify applications into two categories, approved and not approved.
- 9. An example ❑Data: Loan application data ❑Task: Predict whether a loan should be approved or not. ❑Performance measure: Accuracy. ❑No learning: classify all future applications (test data) to the majority class (i.e., Yes): o Accuracy = 9/15 = 60%. ❑We can do better than 60% with learning. www.SunilOS.com 9
- 10. Evaluating classification methods ❑Predictive accuracy o Accuracy=No of correct classification / total no of test Case ❑Efficiency o time to construct the model o time to use the model www.SunilOS.com 10
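
The accuracy formula and the "no learning" baseline can be sketched in a few lines of Python. This is only an illustration: the label list is hypothetical and is built to match the 9 "Yes" out of 15 applications mentioned above, and `accuracy` is a helper name chosen here, not something from the slides.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of test cases whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical labels: 9 "Yes" and 6 "No", matching the 9/15 figure above.
y_true = ["Yes"] * 9 + ["No"] * 6

# "No learning" baseline: always predict the most frequent class.
majority_class = Counter(y_true).most_common(1)[0][0]
baseline_pred = [majority_class] * len(y_true)

baseline_accuracy = accuracy(y_true, baseline_pred)
print(f"Baseline accuracy: {baseline_accuracy:.0%}")
```

Any learned classifier is worth keeping only if it beats this baseline on unseen test data.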
- 11. Conclusion ❑ Supervised learning has applications in almost every field or domain. ❑ There are numerous classification techniques: o Bayesian networks o K-Nearest Neighbors o Decision Tree Classification o Fuzzy classification ❑ This large number of methods also shows the importance of classification and its wide applicability. ❑ It remains an active research area.
- 13. What is Classification? ❑ Classification is a supervised machine learning approach. ❑ The computer learns from training data and uses this learning to classify new observations. ❑ Classification can be: o Binary classification: spam or not spam, male or female. o Multiclass classification: fruits, colors. 4/16/2020
- 14. Types of classification algorithm ❑Linear Classifiers: Logistic Regression, Naive Bayes Classifier ❑K Nearest Neighbor ❑Support Vector Machines ❑Decision Trees ❑Random Forest 4/16/2020 www.SunilOS.com 14
- 15. K-Nearest Neighbor ❑ The k-nearest-neighbors algorithm is a supervised classification technique based on similarity between data points. ❑ KNN assumes that similar things exist near each other. ❑ The algorithm takes a set of labeled points and uses them to learn how to label other points. ❑ To label a new point, it looks at the labeled points closest to that new point (those are its nearest neighbors). ❑ Closeness is typically expressed in terms of a dissimilarity (distance) function. ❑ Once it has found the ‘k’ nearest neighbors, it assigns the label that most of those neighbors have.
- 16. KNN working Steps ❑ Calculate the distance from the new test point to every labeled training point. ❑ Find the k closest neighbors of the new test point. ❑ Vote: assign the label held by the majority of those neighbors.
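
The three steps above can be sketched without any library. This is a minimal illustration, assuming Euclidean distance and made-up 2-D points; `knn_predict` is a hypothetical helper, not the scikit-learn API.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Step 1: distance from the query to every labeled point (Euclidean here).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 2: keep the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: vote; the most common label among the neighbors wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up 2-D points, for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
labels = ["A", "A", "A", "B", "B", "B"]

near_a = knn_predict(points, labels, query=(2.0, 2.0), k=3)
near_b = knn_predict(points, labels, query=(6.0, 7.0), k=3)
print(near_a, near_b)
```

Note that all the work happens at prediction time: there is no separate training step, only a scan over the stored points.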
- 17. KNN algorithm Implementation ❑Define dataset. ❑Prepare data. ❑Train model. ❑Test Model. ❑Calculate accuracy. 4/16/2020 www.SunilOS.com 17
- 18. Dataset ❑Let's first create your own dataset. Here you need two kinds of attributes or columns in your data: Feature and target label. The reason for two type of column is "supervised nature of KNN algorithm". ❑In this dataset, you have two features (weather and temperature) and one label(play). 4/16/2020 www.SunilOS.com 18
- 19. Define dataset

  | Weather | Temp | Play |
  | --- | --- | --- |
  | Sunny | Hot | No |
  | Sunny | Hot | Yes |
  | Overcast | Hot | Yes |
  | Rainy | Mild | Yes |
  | Rainy | Cool | No |
  | Rainy | Cool | Yes |
  | Overcast | Cool | No |
  | Sunny | Mild | Yes |
  | Sunny | Cool | Yes |
  | Rainy | Mild | Yes |
  | Sunny | Mild | Yes |
  | Overcast | Mild | Yes |
  | Overcast | Hot | Yes |
  | Rainy | Mild | No |
- 21. Code implementation in scikit learn ❑ # Assigning features and label variables ❑ # First Feature ❑ weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy'] ❑ # Second Feature ❑ temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild'] ❑ # Label or target variable ❑ play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']
- 22. Code implementation in scikit learn(cont.) ❑ # Import Label Encoder ❑ from sklearn import preprocessing ❑ #creating label Encoder ❑ le = preprocessing.LabelEncoder() ❑ # Converting string labels into numbers. ❑ weather_encoded=le.fit_transform(weather) ❑ print(weather_encoded) ❑ ❑ # converting string labels into numbers ❑ temp_encoded=le.fit_transform(temp) ❑ label=le.fit_transform(play) ❑ print(label) 4/16/2020 www.SunilOS.com 22
- 23. Code implementation in scikit learn(cont.) ❑ #combining weather and temp into single list of tuples ❑ features=list(zip(weather_encoded,temp_encoded)) ❑ print(features) ❑ #Prepare Model instance ❑ from sklearn.neighbors import KNeighborsClassifier ❑ model = KNeighborsClassifier(n_neighbors=3) ❑ # Train the model using the training sets ❑ model.fit(features,label) ❑ #Predict Output ❑ predicted= model.predict([[0,2]]) # 0:Overcast, 2:Mild ❑ print(predicted) 4/16/2020 www.SunilOS.com 23
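
The comment `0:Overcast, 2:Mild` in the prediction step depends on how the strings were turned into integers. The sketch below is a plain-Python stand-in for that encoding step, assuming (as `LabelEncoder` does) that codes are assigned in sorted order of the distinct strings; `label_encode` is a hypothetical helper, not a scikit-learn function.

```python
def label_encode(values):
    """Map each distinct string to an integer, in sorted order of the strings."""
    classes = sorted(set(values))
    index = {name: i for i, name in enumerate(classes)}
    return [index[v] for v in values], classes

weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
        'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']

weather_codes, weather_classes = label_encode(weather)
temp_codes, temp_classes = label_encode(temp)

print(weather_classes)  # position in this list is the integer code
print(temp_classes)
print(list(zip(weather_codes, temp_codes))[:4])
```

Because the slides reuse a single `le` object for all three columns, each `fit_transform` call refits it, so only the last mapping (for `play`) remains on the encoder afterwards. Using one encoder per column is a common way to keep every mapping available.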
- 24. Advantages of KNN ❑ It is extremely easy to implement. ❑ KNN is a lazy learner: it needs no training step before making predictions, so it avoids the model-building cost of algorithms that require training (e.g. SVM, linear regression). The work is deferred to prediction time. ❑ Since the algorithm requires no training before making predictions, new data can be added seamlessly. ❑ Only two parameters are required to implement KNN: the value of K and the distance function (e.g. Euclidean or Manhattan).
- 25. Disadvantages of KNN ❑ KNN doesn't work well with high-dimensional data: as the number of dimensions grows, distances between points become less informative (the curse of dimensionality) and more expensive to compute. ❑ KNN has a high prediction cost for large datasets, because the distance between the new point and every existing point has to be calculated. ❑ KNN doesn't work well with categorical features, since there is no natural distance between category values; integer codes impose an arbitrary ordering.
- 26. Naive Bayes Classification Base ❑It uses Bayes theorem of probability for prediction of unknown class/Label. ❑Naive Bayes classifier assumes that the effect of a particular feature in a class is independent of other features. o For example, a loan applicant is desirable or not depending on his/her income, previous loan and transaction history, age, and location. o Even if these features are interdependent, these features are still considered independently. o This assumption simplifies computation, and that's why it is considered as naive www.SunilOS.com 26
- 27. Approve a Loan ❑ A bank has received a loan application, and we want to predict whether the bank will approve it. ❑ Approval will be decided on the basis of the attributes specified in the application form. ❑ Income, previous loans, transaction history, age, and location are treated as independent attributes. ❑ We calculate a separate probability for each attribute: ❑ how likely the applicant's income is among approved and among rejected loans, ❑ how likely the applicant's previous-loan record is among approved and among rejected loans, ❑ how likely the applicant's age is among approved and among rejected loans, ❑ how likely the applicant's location is among approved and among rejected loans. ❑ Naive Bayes multiplies these probabilities, together with the prior probability of each outcome, to forecast approval or rejection of the new loan application.
- 28. Naïve Bayes Classification Base (cont.) ❑ $P(c \mid x) = \dfrac{P(x \mid c)\,P(c)}{P(x)}$ ❑ Where, ❑ P(c|x) is the posterior probability of class c given the predictor x (the features). ❑ P(c) is the prior probability of the class. ❑ P(x|c) is the likelihood, i.e. the probability of the predictor given the class. ❑ P(x) is the prior probability of the predictor (the evidence).
- 29. Types of Naive Bayes Algorithm ❑ Gaussian Naive Bayes. ❑ Multinomial Naive Bayes. ❑ Bernoulli Naïve Bayes. ❑ Bayes' theorem: $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$
- 30. How Gaussian Naive Bayes classifier works? ❑Given an example of weather conditions and playing sports. ❑You need to calculate the probability of playing sports. ❑Now, you need to classify whether players will play or not, based on the weather condition. www.SunilOS.com 30
- 31. How Naive Bayes classifier works? (cont.) ❑ Naive Bayes classifier calculates the probability of an event in the following steps: ❑ Calculate the prior probability for the given class labels: o P(play) o P(not play) ❑ Find the likelihood of each attribute value for each class: o P(Hot|play), P(Hot|not play) o P(Cool|play), P(Cool|not play) ❑ Put these values into the Bayes formula and calculate the posterior probability. ❑ See which class has the higher posterior probability; the input is assigned to that class.
- 32. Dataset

  | Weather | Play |
  | --- | --- |
  | Sunny | No |
  | Sunny | Yes |
  | Overcast | Yes |
  | Rainy | Yes |
  | Rainy | No |
  | Rainy | Yes |
  | Overcast | No |
  | Sunny | Yes |
  | Sunny | Yes |
  | Rainy | Yes |
  | Sunny | Yes |
  | Overcast | Yes |
  | Overcast | Yes |
  | Rainy | No |
- 33. Frequency Table

  | Weather | No | Yes | Total |
  | --- | --- | --- | --- |
  | Sunny | 1 | 4 | 5 |
  | Overcast | 1 | 3 | 4 |
  | Rainy | 2 | 3 | 5 |
  | Total | 4 | 10 | 14 |
- 34. Prior Probability of class

  | Weather | No | Yes | Total | P(Weather) |
  | --- | --- | --- | --- | --- |
  | Sunny | 1 | 4 | 5 | 5/14 = 0.36 |
  | Overcast | 1 | 3 | 4 | 4/14 = 0.29 |
  | Rainy | 2 | 3 | 5 | 5/14 = 0.36 |
  | Total | 4 | 10 | 14 | |
  | P(class) | 4/14 = 0.29 | 10/14 = 0.71 | | |
- 35. Likelihood of each weather value given the class (the slide title calls these posterior probabilities, but each entry is P(Weather | class))

  | Weather | No | Yes | P(Weather \| No) | P(Weather \| Yes) |
  | --- | --- | --- | --- | --- |
  | Sunny | 1 | 4 | 1/4 = 0.25 | 4/10 = 0.4 |
  | Overcast | 1 | 3 | 1/4 = 0.25 | 3/10 = 0.3 |
  | Rainy | 2 | 3 | 2/4 = 0.5 | 3/10 = 0.3 |
  | Total | 4 | 10 | | |
  | P(class) | 4/14 = 0.29 | 10/14 = 0.71 | | |
- 36. Probability of playing when weather is overcast ❑ Equation: o P(Yes|Overcast) = P(Overcast|Yes) * P(Yes) / P(Overcast) ❑ Calculate the prior probabilities: o P(Overcast) = 4/14 = 0.29 o P(Yes) = 10/14 = 0.71 ❑ Calculate the likelihood: o P(Overcast|Yes) = 3/10 = 0.3 ❑ Put the priors and the likelihood into the equation: o P(Yes|Overcast) = 0.3 * 0.71 / 0.29 ≈ 0.73 (higher). With exact fractions, (3/10) * (10/14) / (4/14) = 3/4 = 0.75.
- 37. Probability of not playing when weather is overcast ❑ Equation: o P(No|Overcast) = P(Overcast|No) * P(No) / P(Overcast) ❑ Calculate the prior probabilities: o P(Overcast) = 4/14 = 0.29 o P(No) = 4/14 = 0.29 ❑ Calculate the likelihood: o P(Overcast|No) = 1/4 = 0.25 ❑ Put the priors and the likelihood into the equation: o P(No|Overcast) = 0.25 * 0.29 / 0.29 = 0.25 (lower)
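
The two calculations above can be checked with exact fractions. The sketch below copies the (weather, play) pairs from the Weather/Play dataset slide; the `posterior` helper is a name chosen for this illustration only.

```python
from collections import Counter
from fractions import Fraction

# (weather, play) pairs copied from the Weather/Play dataset slide.
data = [
    ("Sunny", "No"), ("Sunny", "Yes"), ("Overcast", "Yes"), ("Rainy", "Yes"),
    ("Rainy", "No"), ("Rainy", "Yes"), ("Overcast", "No"), ("Sunny", "Yes"),
    ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Overcast", "Yes"), ("Rainy", "No"),
]

n = len(data)
class_counts = Counter(play for _, play in data)
weather_counts = Counter(weather for weather, _ in data)
joint_counts = Counter(data)

def posterior(play, weather):
    """P(play | weather) = P(weather | play) * P(play) / P(weather)."""
    prior = Fraction(class_counts[play], n)
    likelihood = Fraction(joint_counts[(weather, play)], class_counts[play])
    evidence = Fraction(weather_counts[weather], n)
    return likelihood * prior / evidence

p_yes = posterior("Yes", "Overcast")
p_no = posterior("No", "Overcast")
print("P(Yes | Overcast) =", p_yes, "=", float(p_yes))
print("P(No  | Overcast) =", p_no, "=", float(p_no))
```

Working with fractions avoids the small rounding drift that appears when 0.3, 0.71 and 0.29 are multiplied by hand.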
- 38. Implementation of Naive Bayes algorithm: ❑ # Assigning features and label variables ❑ weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy'] ❑ temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild'] ❑ play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']
- 39. Implementation of Naive Bayes algorithm (cont.) ❑ # Import LabelEncoder o from sklearn import preprocessing ❑ #creating labelEncoder o le = preprocessing.LabelEncoder() ❑ # Converting string labels into numbers. o weather_encoded=le.fit_transform(weather) o print("Weather:",weather_encoded) ❑ # Converting string labels into numbers o temp_encoded=le.fit_transform(temp) o print("Temp:",temp_encoded) o label=le.fit_transform(play) o print("Play:",label) www.SunilOS.com 39
- 40. Implementation of Naive Bayes algorithm (cont.) ❑ #Combining weather and temp into single list of tuples o features=list(zip(weather_encoded,temp_encoded)) o print("Features:",features) ❑ #Import Gaussian Naive Bayes model o from sklearn.naive_bayes import GaussianNB ❑ #Create a Gaussian Classifier o model = GaussianNB() ❑ # Train the model using the training sets o model.fit(features,label) ❑#Predict Output: 0:Overcast, 2:Mild o predicted= model.predict([[0,2]]) o print ("Predicted Value:", predicted) www.SunilOS.com 40
- 41. Multinomial Naive Bayes algorithm: ❑This machine learning algorithm is used for text data classification. ❑If we are interested in finding out a number of occurrences of a word in a document then we have to use a multinomial naive Bayes algorithm. www.SunilOS.com 41
- 42. How does the Naive Bayes Algorithm Work? ❑ Let’s consider an example: classify whether a review is positive or negative. ❑ Training dataset:

  | Text | Review |
  | --- | --- |
  | I like the movie | Positive |
  | It's a good movie. Nice Story | Positive |
  | Nice songs. But sadly a boring ending. | Negative |
  | Overall nice movie | Positive |
  | Sad, boring movie | Negative |
- 43. ❑ We classify whether the text “overall liked the movie” has a positive review or a negative review. We have to calculate: ❑ P(positive | overall liked the movie) — the probability that the tag of a sentence is positive. ❑ P(negative | overall liked the movie) — the probability that the tag of a sentence is negative . ❑ Before that, first, we apply Removing Stopwords and Stemming in the text. www.SunilOS.com 43
- 44. Removing Stopwords & Stemming ❑ Removing stopwords: these are common words that don’t really add anything to the classification, such as a, able, either, else, ever and so on. ❑ Stemming: reducing a word to its root. A stemming algorithm reduces the words o “chocolates”, “chocolaty”, “choco” to the root word “chocolate”, o and “retrieval”, “retrieved”, “retrieves” to the stem “retrieve”.
- 45. Feature Engineering: ❑ The important part is to find features in the data that make machine learning algorithms work. ❑ In this case, we have text. We need to convert this text into numbers that we can do calculations on. ❑ We use word frequencies. That is, we treat every document as a bag of the words it contains, ignoring word order. ❑ Our features will be the counts of each word.
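
Turning text into word counts can be sketched with the standard library alone. This is an illustration under stated assumptions: the stop-word set is a tiny made-up list (real lists are much longer), no stemming is applied, and `word_counts` is a hypothetical helper rather than a library function.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "i", "the", "it's", "but"}

def word_counts(text):
    """Lower-case the text, split it into words, drop stop words, count the rest."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

reviews = [
    "I like the movie",
    "It's a good movie. Nice Story",
    "Overall nice movie",
]

# The vocabulary is every distinct remaining word across all documents.
vocabulary = sorted(set().union(*(word_counts(r) for r in reviews)))

# One count vector per document, one column per vocabulary word.
vectors = [[word_counts(r)[w] for w in vocabulary] for r in reviews]

print(vocabulary)
for review, vector in zip(reviews, vectors):
    print(vector, "<-", review)
```

Each document becomes a fixed-length vector of counts, which is the numeric form a classifier such as multinomial Naive Bayes expects.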
- 46. Now Calculate Probability ❑ In our case, we have o P(positive | overall liked the movie) ❑ Since for our classifier we have to find out which tag has a bigger probability, we can discard the divisor which is the same for both tags, o P(overall liked the movie|positive)* P(positive) o P(overall liked the movie|negative)* P(negative) www.SunilOS.com 46
- 47. ❑ There’s a problem though: the exact sentence “overall liked the movie” doesn’t appear in our training dataset, so its estimated probability is zero. Here, we make the ‘naive’ assumption that every word in a sentence is independent of the other ones. This means that we now look at individual words. ❑ We can write this as: o P(overall liked the movie) = P(overall) * P(liked) * P(the) * P(movie) ❑ The next step is to apply the same assumption to the probability of the sentence given the class: o P(overall liked the movie | positive) = P(overall | positive) * P(liked | positive) * P(the | positive) * P(movie | positive) ❑ These individual words do show up in our training data, so we can calculate their probabilities.
- 48. The prior Probability ❑ P(positive) = 3/5 = 0.6. ❑ P(negative) = 2/5 = 0.4. ❑ Then, calculating P(overall | positive) means counting how many times the word “overall” appears in positive texts, adding 1, and dividing by the total number of words in positive texts plus the total number of unique words in all reviews. o Total words in positive = 13. o Total words in negative = 10. o Total unique words in all reviews = 15.
- 49. Calculated word likelihoods ❑ For the positive class: o P(overall | positive) = (1+1)/(13+15) = 0.0714 o P(liked | positive) = (1+1)/(13+15) = 0.0714 o P(the | positive) = (1+1)/(13+15) = 0.0714 o P(movie | positive) = (3+1)/(13+15) = 0.1429 ❑ For the negative class: o P(overall | negative) = (0+1)/(10+15) = 0.04 o P(liked | negative) = (0+1)/(10+15) = 0.04 o P(the | negative) = (0+1)/(10+15) = 0.04 o P(movie | negative) = (1+1)/(10+15) = 0.08
- 50. Laplace smoothing ❑ A word that never appears with a class would give a probability of zero. Laplace smoothing avoids this: ❑ we add 1 to every count so it’s never zero. To balance this, we add the number of possible words to the divisor, so the result can never be greater than 1. ❑ In our case, the number of unique possible words is 15.
- 52. Result: Positive Review ❑ P(overall | positive) * P(liked | positive) * P(the | positive) * P(movie | positive) * P(positive) = $3.12 \times 10^{-5}$ = 0.0000312 ❑ P(overall | negative) * P(liked | negative) * P(the | negative) * P(movie | negative) * P(negative) = $0.20 \times 10^{-5}$ = 0.000002048 ❑ The positive score is higher, so the review is classified as positive.
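
The smoothed likelihoods and the final comparison can be reproduced with a short script. It is a sketch that takes the slide's figures as given (13 and 10 total words, 15 unique words, the per-word counts after stemming "liked" to "like"); the function names are chosen here for illustration.

```python
# Word counts per class for the query words only, after stemming "liked" -> "like".
positive_counts = {"overall": 1, "like": 1, "the": 1, "movie": 3}
negative_counts = {"overall": 0, "like": 0, "the": 0, "movie": 1}

TOTAL_WORDS = {"positive": 13, "negative": 10}  # totals given on the slides
VOCABULARY_SIZE = 15                            # unique words across all reviews
PRIOR = {"positive": 3 / 5, "negative": 2 / 5}

def smoothed_likelihood(count, total_words):
    """Laplace smoothing: (count + 1) / (total words in class + vocabulary size)."""
    return (count + 1) / (total_words + VOCABULARY_SIZE)

def score(query_words, counts, label):
    """Unnormalised posterior: prior times the product of word likelihoods."""
    result = PRIOR[label]
    for word in query_words:
        result *= smoothed_likelihood(counts[word], TOTAL_WORDS[label])
    return result

query = ["overall", "like", "the", "movie"]
positive_score = score(query, positive_counts, "positive")
negative_score = score(query, negative_counts, "negative")

print(f"positive: {positive_score:.3e}")
print(f"negative: {negative_score:.3e}")
print("prediction:", "positive" if positive_score > negative_score else "negative")
```

In practice the product is usually computed as a sum of logarithms, since multiplying many small probabilities can underflow for longer documents.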
- 53. Implementation of Multinomial Naive Bayes algorithm: ❑Multinomial implements the naive Bayes algorithm for multinomially (discrete no of possible outcome) distributed data, ❑and is one of the two classic naive Bayes variants used in text classification (where the data are typically represented as word vector counts). www.SunilOS.com 53
- 54. Implementation of Multinomial Naive Bayes algorithm: ❑ # Assigning features and label variables o import numpy as np o reviews=np.array(['I like the movie', o 'Its a good movie. Nice Story', o 'Nice songs. But sadly a boring ending.', o 'Overall nice movie', o 'Sad, boring movie']) o label=["positive","positive","negative","positive","negative"] o test=np.array(["Overall i like the movie"])
- 55. Implementation of Multinomial Naive Bayes algorithm (cont.) ❑ #encode text data into numeric o from sklearn import preprocessing ❑ #creating labelEncoder o le = preprocessing.LabelEncoder() ❑ # Converting string labels into numbers. o lable_encoded=le.fit_transform(label) o print("Label:",lable_encoded) www.SunilOS.com 55
- 56. Implementation of Multinomial Naive Bayes algorithm (cont.) ❑ # Generate counts from text using a vectorizer. There are other vectorizers available, and lots of options you can set. ❑ # This performs our step of computing word counts. o from sklearn.feature_extraction.text import CountVectorizer o vectorizer=CountVectorizer(stop_words='english') o train_features = vectorizer.fit_transform(reviews) o test_features = vectorizer.transform(test) o print("Train vocabulary:",vectorizer.vocabulary_) ❑ # Print dimensions of the training and test data o print("Shape of Train:",train_features.shape) o print("Shape of Test:",test_features.shape)
- 57. Implementation of Multinomial Naive Bayes algorithm (cont.) ❑ # Import the Multinomial Naive Bayes model o from sklearn.naive_bayes import MultinomialNB ❑ # Fit a naive Bayes model to the training data. ❑ # This will train the model using the word counts we computed and the existing classifications in the training set. o nb = MultinomialNB() o nb.fit(train_features,lable_encoded) ❑ # Now we can use the model to predict classifications for our test features. o predictions = nb.predict(test_features) o print(predictions)
- 58. Bernoulli Naive Bayes: ❑ BernoulliNB implements the naive Bayes training and classification algorithms for data that is distributed according to multivariate Bernoulli distributions; o i.e., there may be multiple features but each one is assumed to be a binary-valued (boolean) variable. ❑ Therefore, this class requires samples to be represented as binary-valued feature vectors; ❑ if handed any other kind of data, a BernoulliNB instance may binarize its input (depending on the binarize parameter). www.SunilOS.com 58
- 59. for a Bernoulli trial ❑ a random experiment that has only two outcomes o usually called a “Success” or a “Failure”. ❑ For example, the probability of getting a heads (a “success”) while flipping a coin is 0.5. ❑ The probability of “failure” is 1 – P (1 minus the probability of success, which also equals 0.5 for a coin toss). ❑ It is a special case of the binomial distribution for n = 1. In other words, it is a binomial distribution with a single trial (e.g. a single coin toss). www.SunilOS.com 59
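
A Bernoulli outcome and the binarization of count features can be sketched as follows. Both helpers are hypothetical names used only for this illustration; `binarize` mirrors the idea behind the `binarize` parameter mentioned above (values above a threshold become 1, the rest 0) and is not the library routine itself.

```python
def bernoulli_pmf(outcome, p):
    """Probability of outcome 1 (success) or 0 (failure) when P(success) = p."""
    if outcome not in (0, 1):
        raise ValueError("a Bernoulli outcome is 0 or 1")
    return p if outcome == 1 else 1 - p

def binarize(counts, threshold=0):
    """Turn word counts into presence/absence flags (1 if count > threshold)."""
    return [1 if c > threshold else 0 for c in counts]

heads = bernoulli_pmf(1, 0.5)   # fair coin, "success"
tails = bernoulli_pmf(0, 0.5)   # fair coin, "failure" = 1 - p
flags = binarize([2, 0, 1, 0, 3])

print(heads, tails)
print(flags)
```

For text, this means a Bernoulli model only asks whether a word occurs in a document, not how many times.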
- 60. Implementation of Bernoulli Naive Bayes algorithm ❑ # Assigning features and label variables o import numpy as np o document=np.array(["Saturn Dealer’s Car", o "Toyota Car Tercel", o "Baseball Game Play", o "Pulled Muscle Game", o "Colored GIFs Root"]) o label=np.array(["Auto","Auto","Sports","Sports","Computer"]) o test=np.array(["Home Runs Game","Car Engine Noises"])
- 61. Implementation of Bernoulli Naive Bayes algorithm (cont.) ❑ #Import preprocessing o from sklearn import preprocessing ❑ #creating labelEncoder o le = preprocessing.LabelEncoder() ❑ # Converting string labels into numbers. o lable_encoded=le.fit_transform(label) o print("Label:",lable_encoded) www.SunilOS.com 61
- 62. Implementation of Bernoulli Naive Bayes algorithm (cont.) ❑ # Generate features from text using a vectorizer. There are other vectorizers available, and lots of options you can set. ❑ # With binary=True, each feature records whether a word occurs (1) or not (0). o from sklearn.feature_extraction.text import CountVectorizer o vectorizer=CountVectorizer(stop_words='english',binary=True) o train_features = vectorizer.fit_transform(document) o test_features = vectorizer.transform(test) o print("Train vocabulary:",vectorizer.vocabulary_) ❑ # Print dimensions of the training and test data o print("Shape of Train:",train_features.shape) o print("Shape of Test:",test_features.shape)
- 63. Implementation of Bernoulli Naive Bayes algorithm (cont.) ❑ # Import the Bernoulli Naive Bayes model o from sklearn.naive_bayes import BernoulliNB ❑ # Fit a naive Bayes model to the training data. ❑ # This will train the model using the word occurrence features we computed and the existing classifications in the training set. o nb=BernoulliNB() o nb.fit(train_features,lable_encoded) ❑ # Now we can use the model to predict classifications for our test features. o predictions = nb.predict(test_features) o print("Prediction:",predictions)
- 64. Advantages Of Naïve Bayes ❑ It is Simple, Fast and accurate. ❑ It has very low computation cost. ❑ It can efficiently work on a large dataset. ❑ It can be used with multiple class prediction problems. ❑ It also performs well in the case of text analytics problems. ❑ When the assumption of independence holds, a Naive Bayes classifier performs better compared to other models like logistic regression. www.SunilOS.com 64
- 65. Disadvantages of naive Bayes ❑ The assumption of independent features. In practice, it is almost impossible to get a set of predictors that are entirely independent. ❑ If a feature value never occurs together with a particular class in the training data, its estimated likelihood is zero, which makes the whole posterior probability for that class zero. ❑ In this case the model cannot make a sensible prediction. This is known as the Zero Probability/Frequency Problem; Laplace smoothing is the usual remedy.
- 68. What Is Decision Tree? ❑ Decision Tree is a supervised learning algorithm. ❑ It is a tree-like model used for classification and regression. ❑ Decision trees can be used for both categorical and numerical data. o Categorical data: gender, marital status, etc. o Numerical data: age, temperature, etc. ❑ A decision tree is a tree where o each node represents a feature (attribute), o each link (branch) represents a decision (rule), and o each leaf represents an outcome (categorical or continuous value).
- 69. Reason to choose Decision Tree ❑ Decision trees mimic the way humans think when making a decision, so they are easy to understand. ❑ The logic behind a decision tree can be followed easily because it is laid out as a tree-like structure.
- 70. Terminologies ❑ Root Node: the first node of the tree. It represents the entire dataset, which is then divided into two or more homogeneous sets. ❑ Leaf Node: a final node of the tree; the tree cannot be divided further after reaching a leaf node. ❑ Splitting: the process of dividing a decision node or the root node into sub-nodes according to the given conditions. ❑ Branch/Sub Tree: a tree formed by splitting the tree. ❑ Pruning: the process of removing unwanted branches from the tree. ❑ Parent/Child node: a node that is split into sub-nodes is the parent of those sub-nodes, and the sub-nodes are its children; the root node is the parent of the first level of nodes.
- 71. How Does A Decision Tree Work? ❑ It splits the dataset into subsets on the basis of the most significant attribute in the dataset. ❑ How the decision tree identifies this attribute and how this splitting is done is decided by Attribute selection Measure. ❑ The most significant attribute is selected as the root node. ❑ Splitting is done to form sub-nodes called decision nodes. ❑ And the nodes which do not split further are terminal or leaf nodes. www.SunilOS.com 71
- 72. Attribute selection measure. ❑ While implementing a decision tree, the main issue is how to select the best attribute for the root node and for the sub-nodes. ❑ The technique used to solve this problem is called an Attribute Selection Measure (ASM). ❑ There are two popular techniques for ASM: o Information Gain o Gini Index
- 73. Information Gain ❑ It measures how much information a feature provides about the class. ❑ According to the value of information gain, we split the node and build the decision tree. ❑ The attribute with the highest information gain is split on first. It can be calculated using the formula below: o $\text{Information Gain}(S, A) = \text{Entropy}(S) - \sum_{v} \dfrac{|S_v|}{|S|}\,\text{Entropy}(S_v)$, i.e. the entropy of S minus the weighted average entropy of the subsets $S_v$ produced by splitting on feature A. ❑ Entropy: it measures the randomness (impurity) in the data. Entropy can be calculated as: o $\text{Entropy}(S) = -P(\text{yes})\log_2 P(\text{yes}) - P(\text{no})\log_2 P(\text{no})$ ❑ Where, ❑ S = the set of samples ❑ P(yes) = probability of yes ❑ P(no) = probability of no
- 74. Gini Index ❑ Gini index is a measure of impurity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm. ❑ An attribute with a low Gini index should be preferred over one with a high Gini index. ❑ The CART algorithm uses the Gini index to create binary splits only. ❑ Gini index can be calculated using the formula below: o $\text{Gini} = 1 - \sum_{j} P_j^2$, where $P_j$ is the proportion of samples belonging to class j.
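
Entropy and information gain can be computed directly from class counts. The sketch below uses the yes/no counts of the 14-day play dataset from the CART walkthrough in this deck (9 yes, 5 no overall; Outlook splits them into 2/3, 4/0 and 3/2); the function names are illustrative.

```python
import math

def entropy(class_counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(class_counts)
    result = 0.0
    for count in class_counts:
        if count:  # a class with zero samples contributes nothing
            p = count / total
            result -= p * math.log2(p)
    return result

def information_gain(parent_counts, subsets):
    """Entropy(S) minus the size-weighted entropy of the subsets after a split."""
    total = sum(parent_counts)
    weighted = sum(sum(s) / total * entropy(s) for s in subsets)
    return entropy(parent_counts) - weighted

# [yes, no] counts: whole dataset, then Outlook = Sunny / Overcast / Rain.
parent = [9, 5]
outlook_subsets = [[2, 3], [4, 0], [3, 2]]

parent_entropy = entropy(parent)
outlook_gain = information_gain(parent, outlook_subsets)
print(round(parent_entropy, 3))
print(round(outlook_gain, 3))
```

A pure subset such as Overcast (4 yes, 0 no) has entropy 0, which is why splitting on Outlook removes a good share of the uncertainty.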
- 75. Types of decision Trees Algorithms ❑ There are many decision tree algorithms available. Some of Them are as following ❑ ID3 ❑ C4.5 ❑ CART ❑ etc. www.SunilOS.com 75
- 76. Advantages & Disadvantages of DT Advantages ❑ It follows the same process as human follows in real life to make decisions. ❑ Easy To Understand. ❑ It can be very useful for solving decision-related problems. ❑ It helps to think about all the possible outcomes for a problem. ❑ No need of data cleaning. Disadvantages ❑ The decision tree contains lots of layers, which makes it complex. ❑ It may have an overfitting issue, which can be resolved using the Random Forest algorithm. ❑ For more class labels, the computational complexity of the decision tree may increase. 76
- 77. Working of CART Algorithm www.SunilOS.com 77 Day Outlook Temp. Humidity Wind Decision 1 Sunny Hot High Weak No 2 Sunny Hot High Strong No 3 Overcast Hot High Weak Yes 4 Rain Mild High Weak Yes 5 Rain Cool Normal Weak Yes 6 Rain Cool Normal Strong No 7 Overcast Cool Normal Strong Yes 8 Sunny Mild High Weak No 9 Sunny Cool Normal Weak Yes 10 Rain Mild Normal Weak Yes 11 Sunny Mild Normal Strong Yes 12 Overcast Mild High Strong Yes 13 Overcast Hot Normal Weak Yes 14 Rain Mild High Strong No
- 78. Gini index: ❑Gini index is a metric for classification tasks in CART. ❑It stores sum of squared probabilities of each class. We can formulate it as illustrated below. ❑Gini = 1 – Σ (Pi)2 for i=1 to number of classes www.SunilOS.com 78
- 79. Select attribute to create Root node ❑ Outlook(weather):Outlook is a nominal feature. It can be sunny, overcast or rain. The final decisions for outlook feature. ❑ Gini(Outlook=Sunny) = 1 – (2/5)2 – (3/5)2 = 1 – 0.16 – 0.36 = 0.48 ❑ Gini(Outlook=Overcast) = 1 – (4/4)2 – (0/4)2 = 0 ❑ Gini(Outlook=Rain) = 1 – (3/5)2 – (2/5)2 = 1 – 0.36 – 0.16 = 0.48 ❑ Then, we will calculate weighted sum of gini indexes for outlook feature. ❑ Gini(Outlook) = (5/14) x 0.48 + (4/14) x 0 + (5/14) x 0.48 ❑ Gini(Outlook)= 0.171 + 0 + 0.171 = 0.342 www.SunilOS.com 79 Outlook Yes No Number of instances Sunny 2 3 5 Overcast 4 0 4 Rainy 3 2 5
- 80. Temperature ❑ Similarly, temperature is a nominal feature and it could have 3 different values: Cool, Hot and Mild. Let’s summarize decisions for temperature feature. ❑ Gini(Temp=Hot) = 1 – (2/4)2 – (2/4)2 = 0.5 ❑ Gini(Temp=Cool) = 1 – (3/4)2 – (1/4)2 = 1 – 0.5625 – 0.0625 = 0.375 ❑ Gini(Temp=Mild) = 1 – (4/6)2 – (2/6)2 = 1 – 0.444 – 0.111 = 0.445 ❑ We’ll calculate weighted sum of gini index for temperature feature ❑ Gini(Temp) = (4/14) x 0.5 + (4/14) x 0.375 + (6/14) x 0.445 ❑ Gini(Temp)= 0.142 + 0.107 + 0.190 = 0.439 www.SunilOS.com 80 Temperature Yes No Number of instances Hot 2 2 4 Cool 3 1 4 Mild 4 2 6
- 81. Humidity ❑ Humidity is a binary class feature. It can be high or normal. ❑ Gini(Humidity=High) = 1 – (3/7)2 – (4/7)2 = 1 – 0.1836 – 0.326 ❑ Gini(Humidity=High) = 0.48 ❑ Gini(Humidity=Normal) = 1 – (6/7)2 – (1/7)2 = 1 – 0.734 – 0.020 ❑ Gini(Humidity=High) = 0.244 ❑ We’ll calculate weighted sum of gini index for Humidity feature ❑ Gini(Wind) = (7/14) x 0.48 + (7/14) x 0.244 = 0.362 www.SunilOS.com 81 Humidity Yes No Number of instances High 3 4 7 Normal 6 1 7
- 82. Windy ❑ Wind is a binary class similar to humidity. It can be weak and strong. ❑ Gini(Wind=Weak) = 1 – (6/8)2 – (2/8)2 = 1 – 0.5625 – 0.062 ❑ Gini(wind=weak)= 0.375 ❑ Gini(Wind=Strong) = 1 – (3/6)2 – (3/6)2 = 1 – 0.25 – 0.25 ❑ Gini(Wind=Strong)= 0.5 ❑We’ll calculate weighted sum of gini index for wind feature ❑ Gini(Wind) = (8/14) x 0.375 + (6/14) x 0.5 ❑ Gini(wind)= 0.428 www.SunilOS.com 82 Wind Yes No Number of instances Weak 6 2 8 Strong 3 3 6
- 83. To Make decision tree ❑ Choose attribute with Lower Gini Index. ❑ Outlook will be the root node because it has minimum gini index value. Overcast subset has only yes decisions. That means overcast leaf is over ❑ We will apply same principles to those sub datasets in the following steps. Focus on the sub dataset for sunny outlook. We need to find the gini index scores for temperature, humidity and wind features respectively. www.SunilOS.com 83 Feature Gini index Outlook 0.342 Temperature 0.439 Humidity 0.362 Wind 0.428
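
The root-node selection can be reproduced by computing the weighted Gini index of every feature on the 14-day dataset. This is a sketch under the same convention as the walkthrough (one branch per category value); the helper names `gini` and `weighted_gini` are chosen here for illustration.

```python
from collections import Counter, defaultdict

# The 14-day dataset from the CART slide: (Outlook, Temp, Humidity, Wind, Decision).
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
FEATURES = ["Outlook", "Temperature", "Humidity", "Wind"]

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())

def weighted_gini(rows, column):
    """Size-weighted Gini of the groups obtained by splitting on one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[-1])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())

scores = {name: weighted_gini(ROWS, i) for i, name in enumerate(FEATURES)}
for name, value in scores.items():
    print(f"{name:12s}{value:.3f}")

root = min(scores, key=scores.get)
print("Root node:", root)
```

The printed values can differ from the hand calculation in the third decimal place because the script does not round intermediate results; the ranking of the features is the same.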
- 84. Sub-tree (subset) sunny Day Outlook Temp. Humidity Wind Decision 1 Sunny Hot High Weak No 2 Sunny Hot High Strong No 8 Sunny Mild High Weak No 9 Sunny Cool Normal Weak Yes 11 Sunny Mild Normal Strong Yes www.SunilOS.com 84
- 85. Gini of temperature for sunny outlook: ❑ Gini(Outlook=Sunny and Temp.=Hot) = 1 – (0/2)2 – (2/2)2 = 0 ❑ Gini(Outlook=Sunny and Temp.=Cool) = 1 – (1/1)2 – (0/1)2 = 0 ❑ Gini(Outlook=Sunny and Temp.=Mild) = 1 – (1/2)2 – (1/2)2 = 1 – 0.25 – 0.25 = 0.5 ❑ Gini(Outlook=Sunny and Temp.) = (2/5)x0 + (1/5)x0 + (2/5)x0.5 = 0.2 www.SunilOS.com 85 Temperature Yes No Number of instances Hot 0 2 2 Cool 1 0 1 Mild 1 1 2
- 86. Gini of humidity for sunny Outlook(Weather): ❑ Gini(Outlook=Sunny and Humidity=High) = 1 – (0/3)2 – (3/3)2 = 0 ❑ Gini(Outlook=Sunny and Humidity=Normal) = 1 – (2/2)2 – (0/2)2 = 0 ❑ Gini(Outlook=Sunny and Humidity) = (3/5)x0 + (2/5)x0 = 0 www.SunilOS.com 86 Humidity Yes No Number of instances High 0 3 3 Normal 2 0 2
- 87. Gini of wind for sunny outlook: ❑ Gini(Outlook=Sunny and Wind=Weak) = 1 – (1/3)2 – (2/3)2 = 0.266 ❑ Gini(Outlook=Sunny and Wind=Strong) = 1- (1/2)2 – (1/2)2 = 0.2 ❑ Gini(Outlook=Sunny and Wind) = (3/5)x0.266 + (2/5)x0.2 = 0.466 www.SunilOS.com 87 Wind Yes No Number of instances Weak 1 2 3 Strong 1 1 2
- 88. Decision for sunny outlook ❑ We have calculated the Gini index of each feature for the sunny outlook. Humidity wins because it has the lowest value. ❑ We therefore place humidity under the sunny branch of outlook. ❑ The decision is always No for high humidity and sunny outlook, and always Yes for normal humidity and sunny outlook. This branch is finished.

  | Feature | Gini index |
  |---|---|
  | Temperature | 0.2 |
  | Humidity | 0 |
  | Wind | 0.466 |
- 89. Now, we need to focus on the rain outlook.

  | Day | Outlook | Temp. | Humidity | Wind | Decision |
  |---|---|---|---|---|---|
  | 4 | Rain | Mild | High | Weak | Yes |
  | 5 | Rain | Cool | Normal | Weak | Yes |
  | 6 | Rain | Cool | Normal | Strong | No |
  | 10 | Rain | Mild | Normal | Weak | Yes |
  | 14 | Rain | Mild | High | Strong | No |
- 90. Gini of temperature for rain outlook ❑ Gini(Outlook=Rain and Temp.=Cool) = 1 – (1/2)^2 – (1/2)^2 = 0.5 ❑ Gini(Outlook=Rain and Temp.=Mild) = 1 – (2/3)^2 – (1/3)^2 = 0.444 ❑ Gini(Outlook=Rain and Temp.) = (2/5)x0.5 + (3/5)x0.444 = 0.466

  | Temperature | Yes | No | Number of instances |
  |---|---|---|---|
  | Cool | 1 | 1 | 2 |
  | Mild | 2 | 1 | 3 |
- 91. Gini of humidity for rain outlook ❑ Gini(Outlook=Rain and Humidity=High) = 1 – (1/2)^2 – (1/2)^2 = 0.5 ❑ Gini(Outlook=Rain and Humidity=Normal) = 1 – (2/3)^2 – (1/3)^2 = 0.444 ❑ Gini(Outlook=Rain and Humidity) = (2/5)x0.5 + (3/5)x0.444 = 0.466

  | Humidity | Yes | No | Number of instances |
  |---|---|---|---|
  | High | 1 | 1 | 2 |
  | Normal | 2 | 1 | 3 |
- 92. Gini of wind for rain outlook ❑ Gini(Outlook=Rain and Wind=Weak) = 1 – (3/3)^2 – (0/3)^2 = 0 ❑ Gini(Outlook=Rain and Wind=Strong) = 1 – (0/2)^2 – (2/2)^2 = 0 ❑ Gini(Outlook=Rain and Wind) = (3/5)x0 + (2/5)x0 = 0

  | Wind | Yes | No | Number of instances |
  |---|---|---|---|
  | Weak | 3 | 0 | 3 |
  | Strong | 0 | 2 | 2 |
- 93. Decision for rain outlook ❑ For the rain outlook we split on the wind feature because it has the minimum Gini index. ❑ Put the wind feature under the rain branch and inspect the new sub-datasets. ❑ The decision is always Yes when the wind is weak and always No when the wind is strong. This branch is finished.

  | Feature | Gini index |
  |---|---|
  | Temperature | 0.466 |
  | Humidity | 0.466 |
  | Wind | 0 |
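The whole walkthrough (pick the feature with the lowest weighted Gini index, split, repeat on each subset) can be sketched as a short recursive function. This is an illustrative sketch only: the names `ROWS`, `FEATURES` and `build_tree` are made up here, and it uses one branch per feature value as the slides do, whereas library implementations of CART such as scikit-learn's use binary splits.

```python
from collections import Counter

# The 14-day weather dataset: (outlook, temperature, humidity, wind, decision)
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
FEATURES = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}


def gini(rows):
    """Gini impurity of the decision column for a list of rows."""
    counts = Counter(row[-1] for row in rows)
    return 1.0 - sum((c / len(rows)) ** 2 for c in counts.values())


def split(rows, col):
    """Group rows by the value they take in column col."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row)
    return groups


def feature_gini(rows, col):
    """Weighted Gini index obtained by splitting rows on column col."""
    return sum(len(g) / len(rows) * gini(g) for g in split(rows, col).values())


def build_tree(rows, features):
    """Return a leaf label, or a (feature, {value: subtree}) pair."""
    labels = {row[-1] for row in rows}
    if len(labels) == 1:        # pure node: this branch is finished
        return labels.pop()
    if not features:            # nothing left to split on: use the majority label
        return Counter(row[-1] for row in rows).most_common(1)[0][0]
    best = min(features, key=lambda name: feature_gini(rows, features[name]))
    rest = {k: v for k, v in features.items() if k != best}
    return best, {value: build_tree(group, rest)
                  for value, group in split(rows, features[best]).items()}


tree = build_tree(ROWS, FEATURES)
print(tree)
```

On this dataset the sketch selects Outlook at the root, Humidity under Sunny and Wind under Rain, matching the manual calculation.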
- 95. Code Implementation of CART ❑ # Assigning features and label variables ❑ weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy'] ❑ temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild'] ❑ humidity=["High","High","High","High","Normal","Normal","Normal","High","Normal","Normal","Normal","High","Normal","High"] ❑ Windy=["Weak","Strong","Weak","Weak","Weak","Strong","Strong","Weak","Weak","Weak","Strong","Strong","Weak","Strong"]
- 96. Code Implementation of CART ❑ play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No'] ❑ # Import LabelEncoder ❑ from sklearn import preprocessing ❑ # Creating the LabelEncoder ❑ le = preprocessing.LabelEncoder() ❑ # Converting string labels into numbers ❑ weather_encoded=le.fit_transform(weather) ❑ print("Weather:",weather_encoded)
- 97. Code Implementation of CART ❑ # Converting string labels into numbers ❑ temp_encoded=le.fit_transform(temp) ❑ print("Temp:",temp_encoded) ❑ windy_encoded=le.fit_transform(Windy) ❑ print("Windy:",windy_encoded) ❑ Humadity_encoded=le.fit_transform(humidity) ❑ print("Humidity:",Humadity_encoded) ❑ label=le.fit_transform(play) ❑ print("Play:",label)
- 98. Code Implementation of CART ❑ # Combining weather, temp, windy and humidity into a single list of tuples ❑ features=list(zip(weather_encoded,temp_encoded,windy_encoded,Humadity_encoded)) ❑ print("Features:",features) ❑ # Import the DecisionTreeClassifier ❑ from sklearn.tree import DecisionTreeClassifier ❑ tree = DecisionTreeClassifier(criterion='gini') ❑ # Train the model ❑ tree.fit(features,label) ❑ # Test the model with 2: Sunny, 2: Mild, 1: Weak wind, 0: High humidity ❑ prediction = tree.predict([[2,2,1,0]]) ❑ print("Decision",prediction)
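To read a test vector such as [2,2,1,0] it helps to know which integer each string receives. scikit-learn's LabelEncoder numbers the distinct labels in sorted order; the sketch below imitates that behaviour with plain Python so the mapping can be inspected. The function `label_encode` is a made-up stand-in, not the library's code.

```python
def label_encode(values):
    """Map each distinct string to an integer, in sorted order of the strings."""
    classes = sorted(set(values))
    mapping = {name: index for index, name in enumerate(classes)}
    return [mapping[v] for v in values], mapping


# One entry per distinct value is enough to see the mapping.
columns = {
    "weather": ['Sunny', 'Overcast', 'Rainy'],
    "temp": ['Hot', 'Mild', 'Cool'],
    "windy": ['Weak', 'Strong'],
    "humidity": ['High', 'Normal'],
}
mappings = {name: label_encode(values)[1] for name, values in columns.items()}
for name, mapping in mappings.items():
    print(name, mapping)
```

With this mapping, the vector [2,2,1,0] (in the order weather, temp, windy, humidity) stands for Sunny, Mild, Weak wind and High humidity.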
- 99. Working of ID3 Algorithm ❑ For the ID3 implementation we use the same dataset as for the CART algorithm. ❑ The first step is to create a root node. ❑ If all instances in a node have the same decision, return a leaf node with that decision (“yes” or “no”). ❑ Otherwise, compute the entropy of all observations, E(S), and the entropy after splitting on each attribute x, E(S, x). ❑ Compute the information gain of each attribute and select the attribute with the highest information gain. ❑ Repeat the above steps on each subset until every branch ends in a leaf or all attributes have been used.
- 100. Complete Entropy of dataset ❑ First we calculate the entropy of the decision column (play). It consists of 14 instances with two labels, Yes and No. o Yes = 9 o No = 5 ❑ Entropy(Decision) = –p(Yes) * log2(p(Yes)) – p(No) * log2(p(No)) ❑ Entropy(Decision) = –(9/14) * log2(9/14) – (5/14) * log2(5/14) = 0.940 ❑ Now we need to find the most dominant attribute to use as the root node of the tree.
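The entropy formula is easy to check numerically. A minimal sketch, assuming the class counts are passed in as a list; the function name `entropy` is chosen here for illustration.

```python
import math


def entropy(counts):
    """Shannon entropy (base 2) of a node, given the instance count per class."""
    total = sum(counts)
    # Classes with zero instances contribute nothing to the sum.
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


decision_entropy = entropy([9, 5])   # 9 Yes and 5 No
print(round(decision_entropy, 3))
```

A node split evenly between two classes gives an entropy of 1, and a node with a single class gives 0.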
- 101. Wind factor on decision ❑ Formula: o Gain(Decision, Wind) = Entropy(Decision) – ∑ [ p(Wind=v) * Entropy(Decision|Wind=v) ], where v ranges over the values of Wind and p(Wind=v) is the fraction of instances with that value. ❑ The wind attribute has two values, Weak and Strong, so the formula becomes: o Gain(Decision, Wind) = Entropy(Decision) – [p(Wind=Weak) * Entropy(Decision|Wind=Weak)] – [p(Wind=Strong) * Entropy(Decision|Wind=Strong)] ❑ Now we need to calculate Entropy(Decision|Wind=Weak) and Entropy(Decision|Wind=Strong).
- 102. Weak wind factor on decision

  | Day | Outlook | Temp. | Humidity | Wind | Decision |
  |---|---|---|---|---|---|
  | 1 | Sunny | Hot | High | Weak | No |
  | 3 | Overcast | Hot | High | Weak | Yes |
  | 4 | Rain | Mild | High | Weak | Yes |
  | 5 | Rain | Cool | Normal | Weak | Yes |
  | 8 | Sunny | Mild | High | Weak | No |
  | 9 | Sunny | Cool | Normal | Weak | Yes |
  | 10 | Rain | Mild | Normal | Weak | Yes |
  | 13 | Overcast | Hot | Normal | Weak | Yes |
- 103. Weak wind factor on decision ❑ There are 8 instances with weak wind. The decision is No for 2 of them and Yes for 6. ❑ Entropy(Decision|Wind=Weak) = –p(No) * log2(p(No)) – p(Yes) * log2(p(Yes)) ❑ Entropy(Decision|Wind=Weak) = –(2/8) * log2(2/8) – (6/8) * log2(6/8) ❑ Entropy(Decision|Wind=Weak) = 0.811
- 104. Strong wind factor on decision (Play)

  | Day | Outlook | Temp. | Humidity | Wind | Decision |
  |---|---|---|---|---|---|
  | 2 | Sunny | Hot | High | Strong | No |
  | 6 | Rain | Cool | Normal | Strong | No |
  | 7 | Overcast | Cool | Normal | Strong | Yes |
  | 11 | Sunny | Mild | Normal | Strong | Yes |
  | 12 | Overcast | Mild | High | Strong | Yes |
  | 14 | Rain | Mild | High | Strong | No |
- 105. Strong wind factor on decision (Play) ❑ There are 6 instances with strong wind. The decision is split into two equal parts. ❑ Entropy(Decision|Wind=Strong) = –p(No) * log2(p(No)) – p(Yes) * log2(p(Yes)) ❑ Entropy(Decision|Wind=Strong) = –(3/6) * log2(3/6) – (3/6) * log2(3/6) ❑ Entropy(Decision|Wind=Strong) = 1
- 106. Information Gain for Wind Attribute ❑ Formula: o Gain(Decision, Wind) = Entropy(Decision) – [p(Wind=Weak) * Entropy(Decision|Wind=Weak)] – [p(Wind=Strong) * Entropy(Decision|Wind=Strong)] ❑ Gain(Decision, Wind) = 0.940 – [(8/14) * 0.811] – [(6/14) * 1] ❑ Gain(Decision, Wind) = 0.048 ❑ We have calculated the gain for wind. Apply the same procedure to the other attributes to find the best attribute for the root node.
- 107. Information Gain for Other factors ❑ Other factors on decision o Gain(Decision, Outlook) = 0.246 o Gain(Decision, Temperature) = 0.029 o Gain(Decision, Humidity) = 0.151 ❑ Outlook has the highest gain, so outlook becomes the root node of the tree.
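The gains for all four attributes can be recomputed directly from the 14-row dataset. This is a sketch under the assumption that rows are stored as tuples with the decision in the last position; `ROWS`, `COLUMNS` and `information_gain` are illustrative names, not part of the slides' scikit-learn code. Small differences in the third decimal place against the slide values come from rounding.

```python
import math
from collections import Counter

# The 14-day weather dataset: (outlook, temperature, humidity, wind, decision)
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
COLUMNS = ["Outlook", "Temperature", "Humidity", "Wind"]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())


def information_gain(rows, col):
    """Entropy of the decision minus the weighted entropy after splitting on col."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(COLUMNS)}
root = max(gains, key=gains.get)
for name, gain in gains.items():
    print(name, round(gain, 3))
print("Root attribute:", root)
```

Calling `information_gain` on a filtered list (for example only the sunny rows) gives the second-level gains in the same way.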
- 108. Overcast outlook on decision ❑ The decision is always Yes when the outlook is overcast.

  | Day | Outlook | Temp. | Humidity | Wind | Decision |
  |---|---|---|---|---|---|
  | 3 | Overcast | Hot | High | Weak | Yes |
  | 7 | Overcast | Cool | Normal | Strong | Yes |
  | 12 | Overcast | Mild | High | Strong | Yes |
  | 13 | Overcast | Hot | Normal | Weak | Yes |
- 109. Sunny outlook on decision

  | Day | Outlook | Temp. | Humidity | Wind | Decision |
  |---|---|---|---|---|---|
  | 1 | Sunny | Hot | High | Weak | No |
  | 2 | Sunny | Hot | High | Strong | No |
  | 8 | Sunny | Mild | High | Weak | No |
  | 9 | Sunny | Cool | Normal | Weak | Yes |
  | 11 | Sunny | Mild | Normal | Strong | Yes |
- 110. Sunny outlook on decision ❑ There are 5 instances with sunny outlook: 3/5 of the decisions are No and 2/5 are Yes. ❑ Gain(Outlook=Sunny|Temperature) = 0.570 ❑ Gain(Outlook=Sunny|Humidity) = 0.970 ❑ Gain(Outlook=Sunny|Wind) = 0.019 ❑ Humidity is chosen for this branch because it produces the highest gain when the outlook is sunny.
- 111. Sunny outlook on decision ❑ At this point, the decision is always No if humidity is high. ❑ The decision is always Yes if humidity is normal.

  | Day | Outlook | Temp. | Humidity | Wind | Decision |
  |---|---|---|---|---|---|
  | 1 | Sunny | Hot | High | Weak | No |
  | 2 | Sunny | Hot | High | Strong | No |
  | 8 | Sunny | Mild | High | Weak | No |

  | Day | Outlook | Temp. | Humidity | Wind | Decision |
  |---|---|---|---|---|---|
  | 9 | Sunny | Cool | Normal | Weak | Yes |
  | 11 | Sunny | Mild | Normal | Strong | Yes |
- 112. Rain outlook on decision ❑ Gain(Outlook=Rain | Temperature) = 0.020 ❑ Gain(Outlook=Rain | Humidity) = 0.020 ❑ Gain(Outlook=Rain | Wind) = 0.971 ❑ Wind produces the highest gain when the outlook is rain, so we check the wind attribute at the second level of the rain branch.

  | Day | Outlook | Temp. | Humidity | Wind | Decision |
  |---|---|---|---|---|---|
  | 4 | Rain | Mild | High | Weak | Yes |
  | 5 | Rain | Cool | Normal | Weak | Yes |
  | 6 | Rain | Cool | Normal | Strong | No |
  | 10 | Rain | Mild | Normal | Weak | Yes |
  | 14 | Rain | Mild | High | Strong | No |
- 113. Rain outlook on decision ❑ The decision is always Yes if the wind is weak and the outlook is rain. ❑ The decision is always No if the wind is strong and the outlook is rain.

  | Day | Outlook | Temp. | Humidity | Wind | Decision |
  |---|---|---|---|---|---|
  | 4 | Rain | Mild | High | Weak | Yes |
  | 5 | Rain | Cool | Normal | Weak | Yes |
  | 10 | Rain | Mild | Normal | Weak | Yes |

  | Day | Outlook | Temp. | Humidity | Wind | Decision |
  |---|---|---|---|---|---|
  | 6 | Rain | Cool | Normal | Strong | No |
  | 14 | Rain | Mild | High | Strong | No |
- 114. Final decision tree
- 115. Implementation of ID3 ❑ # Import the DecisionTreeClassifier ❑ from sklearn.tree import DecisionTreeClassifier ❑ # Assigning features and label variables ❑ weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy'] ❑ temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild'] ❑ play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']
- 116. Implementation of ID3 (cont.) ❑ # Import LabelEncoder ❑ from sklearn import preprocessing ❑ # Creating the LabelEncoder ❑ le = preprocessing.LabelEncoder() ❑ # Converting string labels into numbers ❑ weather_encoded=le.fit_transform(weather) ❑ print("Weather:",weather_encoded) ❑ # Converting string labels into numbers ❑ temp_encoded=le.fit_transform(temp)
- 117. Implementation of ID3 (cont.) ❑ print("Temp:",temp_encoded) ❑ label=le.fit_transform(play) ❑ print("Play:",label) ❑ # Combining weather and temp into a single list of tuples ❑ features=list(zip(weather_encoded,temp_encoded)) ❑ print("Features:",features) ❑ # Create an instance of the model and train it ❑ tree = DecisionTreeClassifier(criterion='entropy') ❑ tree.fit(features,label) ❑ # Predict the result for 0: Overcast, 2: Mild ❑ prediction = tree.predict([[0,2]]) ❑ print("Decision",prediction)
- 119. What is Random Forest ❑ In the Random Forest algorithm we combine multiple models, typically of the same type, into one. For example, combining multiple decision trees produces a forest of trees, which is known as a random forest. ❑ It helps us to build a powerful prediction model. ❑ The random forest algorithm works for both regression and classification problems. ❑ Applications of Random Forest o Fraud prediction o Cancer detection o Stock market predictions o Spam filter o News classification
- 120. How does Random Forest work? ❑ Pick N random data records from the dataset. ❑ Build a decision tree based on these N records. ❑ Choose how many trees we want to create and repeat the previous steps for each tree. ❑ To predict the output for a new record: ❑ In case of regression: each tree predicts a value, and the final result is the average of the values predicted by all trees. ❑ In case of classification: each tree predicts the class label of the new record, and the record is assigned to the class that receives the majority of the votes.
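The sampling and aggregation steps can be sketched without any library. The sketch below leaves out the tree building itself and only shows how records are drawn for each tree and how the per-tree predictions are combined. Drawing with replacement (a bootstrap sample) is an assumption here, since the slide only says "random"; all function names are illustrative.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(records, n, rng):
    """Pick n records at random, with replacement, to train one tree."""
    return [rng.choice(records) for _ in range(n)]


def aggregate_classification(predictions):
    """Majority vote over the class labels predicted by the individual trees."""
    return Counter(predictions).most_common(1)[0][0]


def aggregate_regression(predictions):
    """Average of the numeric values predicted by the individual trees."""
    return mean(predictions)


rng = random.Random(0)                 # fixed seed so the run is repeatable
records = list(range(14))              # stand-in for the 14 training records
samples = [bootstrap_sample(records, len(records), rng) for _ in range(5)]

# Hypothetical outputs of five trees for one new record.
vote = aggregate_classification(["Yes", "No", "Yes", "Yes", "No"])
average = aggregate_regression([10.0, 12.0, 14.0])
print(vote, average)
```

Each list in `samples` would be used to train one tree, so five samples correspond to `n_estimators=5`.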
- 121. Advantages and Disadvantages of Random Forest Advantages ❑ A random forest combines many trees, so the bias or error of any single tree has less influence on the result. ❑ It is a stable algorithm: new training data may change individual trees, but it has little effect on the forest as a whole. ❑ It is suitable for both categorical and numerical data. ❑ It can also work well when the dataset has missing values. ❑ The trees can be trained in parallel. Disadvantages ❑ It is a complex algorithm. ❑ It requires more computation, because many decision trees must be built and combined. ❑ It takes more time to train than a single decision tree or other simple algorithms.
- 122. Code implementation of Random Forest ❑ # Assign features ❑ weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy'] ❑ temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild'] ❑ humadity=["High","High","High","High","Normal","Normal","Normal","High","Normal","Normal","Normal","High","Normal","High"]
- 123. Code implementation of Random Forest ❑ Windy=["Weak","Strong","Weak","Weak","Weak","Strong","Strong","Weak","Weak","Weak","Strong","Strong","Weak","Strong"] ❑ play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No'] ❑ # Import LabelEncoder ❑ from sklearn import preprocessing ❑ # Creating the LabelEncoder ❑ le = preprocessing.LabelEncoder()
- 124. Code implementation of Random Forest ❑ # Converting string labels into numbers ❑ weather_encoded=le.fit_transform(weather) ❑ print("Weather:",weather_encoded) ❑ temp_encoded=le.fit_transform(temp) ❑ print("Temp:",temp_encoded) ❑ windy_encoded=le.fit_transform(Windy) ❑ print("Windy:",windy_encoded) ❑ Humadity_encoded=le.fit_transform(humadity) ❑ print("Humidity:",Humadity_encoded) ❑ label=le.fit_transform(play) ❑ print("Play:",label)
- 125. Code implementation of Random Forest ❑ # Combining weather, temp, windy and humidity into a single list of tuples ❑ features=list(zip(weather_encoded,temp_encoded,windy_encoded,Humadity_encoded)) ❑ # Import the RandomForestClassifier ❑ from sklearn.ensemble import RandomForestClassifier ❑ # Create an instance of the Random Forest classifier with 5 trees ❑ tree = RandomForestClassifier(n_estimators=5) ❑ # Train the model ❑ tree.fit(features,label) ❑ # Test with 2: Sunny, 2: Mild, 1: Weak wind, 0: High humidity ❑ prediction = tree.predict([[2,2,1,0]]) ❑ print("Decision",prediction)
- 126. Support Vector Machine
- 127. SVM ❑ Support Vector Machine (SVM) is a supervised machine learning algorithm. ❑ SVMs were developed in the 1990s and are still widely used. ❑ It is used for classification and regression problems. ❑ SVM can be used for linearly separable data and for multidimensional datasets (two, three or more dimensions). ❑ SVM can be used for multiclass classification (more than two class labels).
- 128. How SVM Works ❑ To separate two classes, as shown in the previous slide, we need a line that divides the data into two classes. ❑ This line is known as the decision boundary or hyperplane. We draw it so that the margin, that is the distance to the nearest data points of each class, is as large as possible. ❑ There are many possible hyperplanes that separate the two classes of data points. Our objective is to find the one with the maximum margin, i.e. the maximum distance to the nearest data points of both classes. ❑ Maximizing the margin provides some reinforcement so that future data points can be classified with more confidence.
- 129. SVM Related Terminologies ❑ Support Vectors: o When we classify data with the help of a hyperplane, the data points nearest to the hyperplane are known as support vectors. ❑ Hyperplane: o A hyperplane is a decision boundary between the two classes. It is used to separate the data points of different classes. ❑ Margin: o We draw a line parallel to the hyperplane through the nearest data points of each class. The gap between these two lines is known as the margin. o For example, if D- and D+ are the distances from the hyperplane to the nearest support vectors of the two opposing classes, then the margin is o Margin = D- + D+ o A larger margin between the classes is considered a good margin; a smaller margin is a bad margin.
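The margin definition can be tried out on the six points used in the linear SVM example of these slides. The sketch below measures D- and D+ for a hand-picked line; the line x + y = 8 (`w` and `b`) is a hypothetical choice for illustration, not the boundary an SVM would actually learn.

```python
import math


def distance_to_line(point, w, b):
    """Perpendicular distance from a 2-D point to the line w[0]*x + w[1]*y + b = 0."""
    x, y = point
    return abs(w[0] * x + w[1] * y + b) / math.hypot(w[0], w[1])


# Hypothetical separating line x + y - 8 = 0 (chosen by hand, not fitted).
w, b = (1.0, 1.0), -8.0

class_neg = [(1, 2), (1.5, 1.8), (1, 0.6)]   # points labelled 0
class_pos = [(5, 8), (8, 8), (9, 11)]        # points labelled 1

# Distance from the line to the closest point of each class.
d_minus = min(distance_to_line(p, w, b) for p in class_neg)
d_plus = min(distance_to_line(p, w, b) for p in class_pos)
margin = d_minus + d_plus
print(round(d_minus, 3), round(d_plus, 3), round(margin, 3))
```

Trying other values of `w` and `b` shows how the margin changes; an SVM searches for the line that makes it as large as possible.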
- 130. Why choose SVM? ❑ SVM can be used for multiclass classification. ❑ SVM can be used for linearly separable datasets. ❑ With kernels, SVM can be used for high-dimensional datasets that are not linearly separable. ❑ SVM classifies high-dimensional datasets efficiently.
- 131. Implementation of Linear SVM ❑ # Import libraries ❑ import numpy as np ❑ import matplotlib.pyplot as plt ❑ from matplotlib import style ❑ style.use("ggplot") ❑ from sklearn import svm ❑ # Attributes ❑ x = [1, 5, 1.5, 8, 1, 9] ❑ y = [2, 8, 1.8, 8, 0.6, 11] ❑ plt.scatter(x,y) ❑ plt.show()
- 132. Implementation of Linear SVM (cont.) ❑ # Import preprocessing ❑ from sklearn import preprocessing ❑ # Combine the two attributes into (x, y) points ❑ X=list(zip(x,y)) ❑ # Class label of each point (y is reused for the labels from here on) ❑ y = [0,1,0,1,0,1] ❑ # Train the SVM model ❑ clf = svm.SVC(kernel='linear', C = 1.0) ❑ clf.fit(X,y) ❑ # Test x=0.58, y=0.76 ❑ print(clf.predict([[0.58,0.76]])) ❑ # Test x=10.58, y=10.76 ❑ print(clf.predict([[10.58,10.76]]))
- 133. Non-Linear SVM
- 134. SVM Kernels ❑ In practice, the SVM algorithm is implemented using a kernel. ❑ A kernel transforms the input data space into the required form (linear or non-linear). ❑ SVM uses a technique called the kernel trick: the kernel takes a low-dimensional input space and transforms it into a higher-dimensional space. ❑ In other words, it converts a non-separable problem into a separable problem by adding more dimensions to it. ❑ It is most useful in non-linear separation problems. The kernel trick helps you to build a more accurate classifier. ❑ Types of Kernels o Linear Kernel o Polynomial Kernel o RBF (Radial Basis Function) Kernel
- 135. Linear Kernel ❑ A linear kernel is the ordinary dot product of any two given observations. The product between two vectors is the sum of the products of each pair of input values. o K(x, xi) = sum(x * xi) ❑ For example, the inner product of the vectors [1, 2] and [3, 4] is 1*3 + 2*4 = 11. ❑ A prediction for a new input is made from the dot product between the input (x) and each support vector (xi): f(x) = B0 + sum(ai * (x · xi)) ❑ This equation involves the inner products of the new input vector (x) with all support vectors in the training data. The coefficients B0 and ai (one for each support vector) must be estimated from the training data by the learning algorithm.
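A small sketch of the linear kernel and the prediction equation. The support vectors, the coefficients ai and the constant B0 below are hypothetical numbers chosen only to show the arithmetic; in a real model they come out of training.

```python
def linear_kernel(x, xi):
    """Dot product of two vectors: the sum of the pairwise products."""
    return sum(a * b for a, b in zip(x, xi))


def predict(x, support_vectors, coefficients, b0):
    """f(x) = B0 + sum(ai * K(x, xi)) over all support vectors xi."""
    return b0 + sum(a * linear_kernel(x, sv)
                    for a, sv in zip(coefficients, support_vectors))


dot = linear_kernel([1, 2], [3, 4])    # 1*3 + 2*4

# Hypothetical model parameters, for illustration only.
support_vectors = [[1, 2], [3, 4]]
coefficients = [-0.5, 0.5]
b0 = -1.0
score = predict([2, 2], support_vectors, coefficients, b0)
print(dot, score)
```

For a two-class problem the sign of the score would decide which class the new input is assigned to.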
- 136. Polynomial Kernel ❑ A polynomial kernel is a more generalized form of the linear kernel. The polynomial kernel can distinguish curved or non-linear input spaces. K(x, xi) = (1 + sum(x * xi))^d ❑ Here d is the degree of the polynomial. With d = 1 it is similar to the linear kernel. The degree needs to be specified manually in the learning algorithm.
- 137. RBF (radial basis function) Kernel ❑ The radial basis function kernel is a popular kernel function commonly used in support vector machine classification. RBF can map an input space into an infinite-dimensional space. K(x, xi) = exp(-gamma * sum((x – xi)^2)) ❑ Here gamma is a parameter, typically chosen between 0 and 1. A higher value of gamma fits the training dataset more closely, which can cause over-fitting. Gamma = 0.1 is often considered a good starting value. The value of gamma needs to be specified manually in the learning algorithm.
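The polynomial and RBF kernels can be written as two short functions. This sketch assumes the common form (1 + x · xi)^d for the polynomial kernel and uses gamma = 0.1 as the default mentioned on the slide; the function names are illustrative.

```python
import math


def polynomial_kernel(x, xi, d):
    """(1 + x . xi)^d, where d is the degree of the polynomial."""
    return (1 + sum(a * b for a, b in zip(x, xi))) ** d


def rbf_kernel(x, xi, gamma=0.1):
    """exp(-gamma * squared Euclidean distance between x and xi)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, xi)))


x, xi = [1, 2], [3, 4]
poly_1 = polynomial_kernel(x, xi, d=1)   # linear kernel plus one
poly_2 = polynomial_kernel(x, xi, d=2)
rbf_same = rbf_kernel(x, x)              # identical points
rbf_far = rbf_kernel(x, xi)              # squared distance is 8
print(poly_1, poly_2, rbf_same, round(rbf_far, 4))
```

The RBF value is 1 for identical points and falls towards 0 as the points move apart; a larger gamma makes it fall faster.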
- 138. Implementation of Non Linear Kernel ❑ The graph shows that our dataset is not linearly separable.
- 139. Implementation of Non Linear Kernel ❑ # Assigning features and label variables ❑ weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy'] ❑ temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild'] ❑ humadity=["High","High","High","High","Normal","Normal","Normal","High","Normal","Normal","Normal","High","Normal","High"]
- 140. Implementation of Non Linear Kernel ❑ Windy=["Weak","Strong","Weak","Weak","Weak","Strong","Strong","Weak","Weak","Weak","Strong","Strong","Weak","Strong"] ❑ play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No'] ❑ # Import LabelEncoder ❑ from sklearn import preprocessing ❑ # Creating the LabelEncoder ❑ le = preprocessing.LabelEncoder() ❑ # Converting string labels into numbers ❑ weather_encoded=le.fit_transform(weather) ❑ print("Weather:",weather_encoded)
- 141. Implementation of Non Linear Kernel ❑ # Converting string labels into numbers ❑ temp_encoded=le.fit_transform(temp) ❑ print("Temp:",temp_encoded) ❑ windy_encoded=le.fit_transform(Windy) ❑ print("Windy:",windy_encoded) ❑ Humadity_encoded=le.fit_transform(humadity) ❑ print("Humidity:",Humadity_encoded) ❑ label=le.fit_transform(play) ❑ print("Play:",label)
- 142. Implementation of Non Linear Kernel ❑ # Combining weather, temp, windy and humidity into a single list of tuples ❑ features=list(zip(weather_encoded,temp_encoded,windy_encoded,Humadity_encoded)) ❑ print("Features:",features) ❑ # Import svm ❑ from sklearn import svm ❑ # Create an SVM classifier with the RBF kernel ❑ clf = svm.SVC(kernel='rbf') ❑ # Train the SVM model ❑ clf.fit(features,label) ❑ # Test with 2: Sunny, 2: Mild, 1: Weak wind, 0: High humidity ❑ prediction = clf.predict([[2,2,1,0]]) ❑ print("Decision",prediction)
- 143. Advantages & Disadvantages of SVM Advantages ❑ It works really well when there is a clear margin of separation. ❑ It is effective in high-dimensional spaces. ❑ It is effective in cases where the number of dimensions is greater than the number of samples. ❑ It uses only a subset of the training points (the support vectors) in the decision function, so it is also memory efficient. Disadvantages ❑ It doesn’t perform well on large datasets because the required training time is higher. ❑ It also doesn’t perform very well when the dataset is noisy, i.e. when the target classes overlap.
- 145. Types Of Regression ❑Linear regression ❑Logistic regression ❑Polynomial regression
- 146. Logistic Regression and Linear Regression

  | Linear Regression | Logistic Regression |
  |---|---|
  | Predicts a continuous dependent variable from a given set of independent variables. | Predicts a categorical dependent variable from a given set of independent variables. |
  | Used for solving regression problems. | Used for solving classification problems. |
  | We predict the value of continuous variables. | We predict the values of categorical variables. |
  | We find the best-fit line, from which we can easily predict the output. | We find the S-curve, with which we can classify the samples. |
  | The least squares method is used to estimate the model parameters. | The maximum likelihood method is used to estimate the model parameters. |
  | The output must be a continuous value, such as price or age. | The output must be a categorical value, such as 0 or 1, Yes or No. |
  | The relationship between the dependent and independent variables must be linear. | A linear relationship between the dependent and independent variables is not required. |
  | There may be collinearity between the independent variables. | There should not be collinearity between the independent variables. |
- 147. Linear Regression ❑Linear regression: o Linear regression is a statistical approach for modeling the relationship between a dependent variable and a given set of independent variables.
- 148. Linear Regression (cont.) ❑Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an independent variable, and the other is considered to be a dependent variable. o For example, a modeler might want to relate the weights of individuals to their heights using a linear regression model.
- 149. What is Linear ❑First, let’s say that you are shopping at Dmart. Whether you buy goods or not, you have to pay Rs. 2.00 for a parking ticket. Each apple costs Rs. 1.50, and you buy x apples. Then we can build a price list as follows:
- 150. Linear Relationship among data

  | Quantity | Price |
  |---|---|
  | 1 | 3.50 Rs. |
  | 2 | 5.00 Rs. |
  | 3 | 6.50 Rs. |
  | 4 | 8.00 Rs. |
  | 5 | 9.50 Rs. |
  | ... | ... |
  | 10 | 17.00 Rs. |
  | 11 | 18.50 Rs. |
  | ... | ... |
  | x | y |
- 151. Linear Function ❑ It’s easy to predict (or calculate) the price from the quantity, and vice versa, using the equation y = 2 + 1.5x for this example, or in general: y = a + bx ❑ In this linear function: ❑ a = 2 ❑ b = 1.5 ❑ A linear function has one independent variable and one dependent variable. The independent variable is x and the dependent variable is y. ❑ a is the constant term or the y-intercept. It is the value of the dependent variable when x = 0. ❑ b is the coefficient of the independent variable. It is also known as the slope and gives the rate of change of the dependent variable.
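The price list can be generated from the linear function in both directions. A minimal sketch; the function names `price` and `quantity_for` are made up for illustration, with a = 2 (parking ticket) and b = 1.5 (price per apple) as defaults.

```python
def price(quantity, a=2.0, b=1.5):
    """Total price y = a + b*x for x apples."""
    return a + b * quantity


def quantity_for(total, a=2.0, b=1.5):
    """Inverse relation: how many apples were bought for a given total."""
    return (total - a) / b


for x in (1, 2, 3, 10):
    print(x, price(x))
print(quantity_for(18.5))
```

With x = 0 the function returns the intercept a, the parking fee paid even when nothing is bought.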
- 152. Implementation of Linear Regression ❑ Code explanation: ❑ dataset: the table containing all values from our csv file ❑ X: the first column, which contains the years of experience ❑ y: the last column, which contains the salary ❑ The model is y = b0 + b1*x1 ❑ y: dependent variable ❑ b0: constant ❑ b1: coefficient ❑ x1: independent variable
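How b0 and b1 are obtained can be illustrated with the closed-form least squares solution for a single independent variable. This sketch uses the apple price list from the earlier slides instead of the salary file, so that the expected result (b0 = 2, b1 = 1.5) is known in advance; `fit_line` is an illustrative name, not the scikit-learn API.

```python
def fit_line(xs, ys):
    """Least squares estimates of b0 (constant) and b1 (coefficient) in y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


quantities = [1, 2, 3, 4, 5]
prices = [3.5, 5.0, 6.5, 8.0, 9.5]
b0, b1 = fit_line(quantities, prices)
print(b0, b1)
print(b0 + b1 * 10)   # predicted price for 10 apples
```

Because the price list is perfectly linear, the fitted line reproduces it exactly; with real data such as salaries the line only approximates the points.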
- 153. Dataset: Salary Data
- 154. Visualization of data
- 155. Code Implementation of Linear Regression ❑ import numpy as np ❑ import matplotlib.pyplot as plt ❑ import pandas as pd ❑ # Importing the dataset ❑ dataset=pd.read_csv('E:/MLImplementation/regression.csv') ❑ # Features: every column except the last (years of experience) ❑ X = dataset.iloc[:, :-1].values ❑ # Target: the second column, index 1 (salary) ❑ y = dataset.iloc[:, 1].values
- 156. Code Implementation of Linear Regression (cont.) ❑ # Splitting the dataset into the Training set and Test set ❑ from sklearn.model_selection import train_test_split ❑ X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0) ❑ # Fitting Simple Linear Regression to the Training set ❑ from sklearn.linear_model import LinearRegression ❑ regressor = LinearRegression() ❑ regressor.fit(X_train, y_train)
- 157. Code Implementation of Linear Regression (cont.) ❑ # Predicting the Test set results ❑ y_pred = regressor.predict(X_test) ❑ # Predicting the salary of an employee with 5 years of experience ❑ y_pred = regressor.predict([[5]]) ❑ print(y_pred)
- 158. Code Implementation of Linear Regression (cont.) ❑ # Visualizing the Training set results ❑ viz_train = plt ❑ viz_train.scatter(X_train, y_train, color='red') ❑ viz_train.plot(X_train, regressor.predict(X_train), color='blue') ❑ viz_train.title('Salary VS Experience (Training set)') ❑ viz_train.xlabel('Year of Experience') ❑ viz_train.ylabel('Salary') ❑ viz_train.show()
- 160. Code Implementation of Linear Regression (cont.) ❑ # Visualizing the Test set results ❑ viz_test = plt ❑ viz_test.scatter(X_test, y_test, color='red') ❑ # The regression line is the one fitted on the training set ❑ viz_test.plot(X_train, regressor.predict(X_train), color='blue') ❑ viz_test.title('Salary VS Experience (Test set)') ❑ viz_test.xlabel('Year of Experience') ❑ viz_test.ylabel('Salary') ❑ viz_test.show()
- 161. Test Data
- 162. Advantages & Disadvantages of Linear Regression ❑Advantages: o Simple and easy to understand. o Low computational cost. o A foundation for more complex machine learning algorithms. ❑Disadvantages: o Oversimplifies or fails on non-linear problems (it only does well when the relationship is linear). o Sensitive to outliers and noise.
- 163. Multi Linear Regression ❑In most cases we will have more than one independent variable. There can be as few as two independent variables and as many as hundreds (or theoretically even thousands). ❑In those cases we use a Multiple Linear Regression model (MLR). The regression equation is much the same as the simple regression equation, just with more variables: Y = b0 + b1X1 + b2X2 + ... + bnXn
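Once the coefficients are known, the MLR equation is a weighted sum. The sketch below evaluates it for one record; the intercept and coefficients are hypothetical round numbers chosen for illustration, not values fitted to any dataset in these slides.

```python
def predict(features, intercept, coefficients):
    """Y = b0 + b1*X1 + b2*X2 + ... + bn*Xn for one record."""
    return intercept + sum(b * x for b, x in zip(coefficients, features))


# Hypothetical model: b0 = 1000, b1 = 200, b2 = 50, b3 = 300.
intercept = 1000.0
coefficients = [200.0, 50.0, 300.0]

# One record with three independent variables X1, X2, X3.
y = predict([20, 32, 0], intercept, coefficients)
print(y)
```

With a single coefficient the same function reduces to the simple regression equation y = b0 + b1*x1.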
- 164. Implementation Of Multi linear Regression ❑We use a loan dataset for multiple linear regression, with age, credit rating and number of children as features and the loan amount as target. ❑We are going to predict the loan amount (dependent variable) from age, credit rating and number of children (independent variables). ❑Note that the data has four columns, of which three are features and one is the target variable.
- 166. Relationship between credit-rating and loan amount
- 167. Code Implementation of MLR ❑ # Features: age, credit rating and number of children ❑ age=[19,18,28,33,32,31,46,37,37,60,25,62,23,56] ❑ credit_rating=[27.9,42.13,33,22.705,28.88,25.74,33.44,27.74,29.83,25.84,26.22,26.29,34.4,39.82] ❑ children=[0,1,3,0,0,0,1,3,2,0,0,0,0,0] ❑ # Label data ❑ loan=[16884.924,1725.5523,4449.462,21984.47061,3866.8552,3756.6216,8240.5896,7281.5056,6406.4107,28923.13692,2721.3208,27808.7251,1826.843,11090.7178]
- 168. Code Implementation of MLR (cont.) ❑ # Combining age, credit rating and children into a single list of tuples ❑ features=list(zip(age,credit_rating,children)) ❑ print(features) ❑ # Import and define the multiple linear regression model ❑ from sklearn.linear_model import LinearRegression ❑ linear_regress = LinearRegression() ❑ # Fit the multiple linear regression model ❑ linear_regress.fit(features,loan) ❑ print("coefficient:",linear_regress.coef_) ❑ print("intercept:",linear_regress.intercept_) ❑ # Predict with test data ❑ # age: 20, credit rating: 32, children: 0 ❑ y_pred=linear_regress.predict([[20,32,0]]) ❑ print(y_pred)
- 169. Disclaimer ❑This is an educational presentation to enhance the skills of computer science students. ❑This presentation is available for free to computer science students. ❑Some internet images from different URLs are used in this presentation to simplify technical examples and correlate examples with the real world. ❑We are grateful to the owners of these URLs and pictures.