MACHINE LEARNING
(USING R)
SUKHWINDER SINGH ME CS
A COMPUTER PROGRAM IS SAID TO LEARN FROM EXPERIENCE E WITH RESPECT TO SOME TASK T AND
SOME PERFORMANCE MEASURE P IF ITS PERFORMANCE ON T, AS MEASURED BY P, IMPROVES WITH
EXPERIENCE E.
TOPIC: REGRESSION AND CLASSIFICATION
• Regression: Linear regression is a modeling approach for finding the
relation between an input variable (X) and an output variable (Y).
• Simple linear regression: a single input attribute (x) is used to predict the output.
• Multiple linear regression: several input attributes are used together
(e.g. x1, x2, x3, etc.)
Model evaluation parameters: r, R-square, RMSE, MSE, MAE, Accuracy
Examples: linear model, polynomial regression
LINEAR MODEL
lm()
• Based on the training data, the learning process computes one weight for each
feature to form a model that can predict or estimate the target value (a fuller runnable sketch follows these bullets).
• formula <- as.formula(paste(target, "~", paste(inputs, collapse = "+")))
• model <- lm(formula, data = trainDataset); Predicted <- predict(model, testDataset)
• Correlation: r <- cor(Actual, Predicted); R-square: R <- r * r
• MAE <- mean(abs(Actual - Predicted)); RMSE <- sqrt(mean((Actual - Predicted)^2))
• Accuracy <- round(mean(abs(Actual - Predicted) <= 1), 4) * 100
(fraction of predictions within +/- 1 of the actual value, as a percentage)
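A minimal end-to-end sketch of this workflow is shown below; the built-in mtcars data and the mpg/wt/hp columns are illustrative assumptions, not from the slides.
data(mtcars)                                    # illustrative dataset
set.seed(1)
idx          <- sample(nrow(mtcars), floor(0.7 * nrow(mtcars)))
trainDataset <- mtcars[idx, ]
testDataset  <- mtcars[-idx, ]

target  <- "mpg"                                # assumed target column
inputs  <- c("wt", "hp")                        # assumed input columns
formula <- as.formula(paste(target, "~", paste(inputs, collapse = "+")))

model     <- lm(formula, data = trainDataset)
Predicted <- predict(model, testDataset)
Actual    <- testDataset[[target]]

r        <- cor(Actual, Predicted)              # correlation
R        <- r * r                               # R-square
MAE      <- mean(abs(Actual - Predicted))       # mean absolute error
RMSE     <- sqrt(mean((Actual - Predicted)^2))  # root mean squared error
Accuracy <- round(mean(abs(Actual - Predicted) <= 1), 4) * 100  # % within +/- 1 unit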
TOPIC: CLASSIFICATION
• Classification is the task of approximating a mapping function (f) from input variables (X)
to discrete output variables (y).
• Model evaluation parameters: confusion matrix (a small sketch of computing these measures follows the bullets),
• Sensitivity = TP / (TP + FN)
• Specificity = TN / (TN + FP)
• Precision = TP / (TP + FP)
• Accuracy = (TP + TN) / (TP + TN + FN + FP)
• Models: Support Vector Machines, Decision Trees, Neural Networks
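A minimal sketch of computing these measures from a confusion matrix in R; the two label vectors below are made up for illustration.
Actual    <- factor(c(1, 1, 1, 0, 0, 1, 0, 0, 1, 0))   # illustrative true labels
Predicted <- factor(c(1, 0, 1, 0, 1, 1, 0, 0, 1, 0))   # illustrative predictions

cm <- table(Actual, Predicted)      # confusion matrix: rows = actual, cols = predicted
TP <- cm["1", "1"]; TN <- cm["0", "0"]
FP <- cm["0", "1"]; FN <- cm["1", "0"]

Sensitivity <- TP / (TP + FN)
Specificity <- TN / (TN + FP)
Precision   <- TP / (TP + FP)
Accuracy    <- (TP + TN) / (TP + TN + FN + FP)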
INFORMATION GAIN
• Information gain is the difference between the information (entropy) before a split and
the information after the split. The higher the information gain, the more information
the attribute carries.
• Entropy is the negative sum, over the labels, of the probability of each
label times the log probability of that same label.
Decision tree based on information gain:
library(rpart)
model <- rpart(formula, trainDataset, method = "class")
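A runnable version of this call, using the built-in iris data as a stand-in for trainDataset (an assumption) and asking rpart to split on entropy (information gain):
library(rpart)

# parms = list(split = "information") selects the entropy-based (information gain)
# splitting criterion instead of the default Gini index.
model     <- rpart(Species ~ ., data = iris, method = "class",
                   parms = list(split = "information"))
Predicted <- predict(model, iris, type = "class")
table(iris$Species, Predicted)      # confusion matrix on the training data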
FEATURE SELECTION
• Feature: a measurable attribute of the dataset. The higher the information content of an
attribute, the greater the chances of its selection.
• Wrapper method (supervised): train the model on candidate feature subsets and estimate the error of
each resulting model. Techniques used in wrapper methods are:
Forward selection: start from zero features and add attributes one by one, keeping the subset with the best result.
Backward elimination: start from all features and eliminate them one by one until the best result is reached.
Recursive feature elimination: combine forward and backward passes and select the best outcome for your model.
• Filter method (unsupervised): find a correlation or some other statistical relationship between
each input feature and the target feature; Pearson correlation, LDA and chi-square tests are used to rank the
best features (a minimal correlation-ranking sketch follows this list).
• Embedded method: a combination of the wrapper and filter methods; Lasso and Ridge regression models
are used here.
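A minimal filter-method sketch, ranking numeric features by their absolute Pearson correlation with the target; mtcars and the mpg target are illustrative assumptions.
data(mtcars)
target <- "mpg"                                   # assumed target column
inputs <- setdiff(names(mtcars), target)

# Score each input by |Pearson correlation| with the target and rank them.
scores <- sapply(inputs, function(f) abs(cor(mtcars[[f]], mtcars[[target]])))
sort(scores, decreasing = TRUE)                   # higher score = stronger linear relation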
NEURAL NETWORK
nnet()
• A neural network is a system of hardware and/or software patterned after the operation of
neurons in the human brain.
• Input layer | Hidden layer (processing layer) | Output layer
• Each neuron computes a weighted sum of its inputs, sum(wi * xi) (weight of connection * value of input),
and passes it through an activation function.
• Linear activation: y = x; Sigmoid activation: y = 1 / (1 + e^-x)
• Forward propagation, backward propagation
• library(nnet); model <- nnet(formula, trainDataset, size = 10)
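A minimal runnable version of the nnet() call above, with the built-in iris data standing in for the unspecified trainDataset:
library(nnet)

set.seed(1)
model     <- nnet(Species ~ ., data = iris, size = 10, maxit = 200, trace = FALSE)
Predicted <- predict(model, iris, type = "class")
mean(Predicted == iris$Species)     # training accuracy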
GA (GENETIC ALGORITHM)
ga()
• Genetic Algorithms (GAs) are search-based algorithms built on the concepts
of natural selection and genetics (a small ga() sketch follows this list).
• Population | Chromosome | Gene | Allele
• Evaluation: compute the fitness of each individual.
• Selection: select the best parents.
• Crossover: recombine the selected parents to make new offspring.
One-point crossover | multi-point crossover | uniform crossover
• Mutation: randomly alter genes of the new offspring to maintain diversity.
Bit-flip mutation | swap mutation | scramble mutation | inversion mutation
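A small sketch using the GA package's ga(), maximising a simple one-variable fitness function; the function and the bounds are illustrative, not from the slides.
library(GA)

fitness <- function(x) -(x - 2)^2                 # toy fitness, maximum at x = 2
result  <- ga(type = "real-valued", fitness = fitness,
              lower = -10, upper = 10,
              popSize = 50, maxiter = 100)
summary(result)     # selection, crossover and mutation are applied internally by ga()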
SVM (SUPPORT VECTOR MACHINE)
ksvm()
SVM is a supervised machine learning algorithm that can be employed for both classification and regression
purposes. It is based on the idea of finding a hyperplane that best divides a dataset into two classes.
Hyperplane: a boundary that linearly separates and classifies a set of data; data points should be as far away from the
hyperplane as possible.
Kernel: defines the similarity or distance measure between new data and the support vectors.
Linear: K(x, xi) = sum(x * xi)
Polynomial: K(x, xi) = (1 + sum(x * xi))^d
Radial: K(x, xi) = exp(-gamma * sum((x - xi)^2))
library(kernlab); model <- ksvm(formula, data = trainDataset, kernel = "rbfdot")
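A runnable version of the ksvm() call above; kernlab provides ksvm(), and iris stands in for the unspecified trainDataset.
library(kernlab)

model     <- ksvm(Species ~ ., data = iris, kernel = "rbfdot")
Predicted <- predict(model, iris)
table(iris$Species, Predicted)      # confusion matrix on the training data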
ENSEMBLING
• Ensembling is a technique of combining two or more algorithms of similar/dissimilar types, called base
learners (see the sketch after this list).
• Averaging: the average of the predictions from the models, used for regression problems (or on predicted
probabilities in classification problems).
• Majority vote: the prediction with the maximum votes / recommendations across the
multiple models' predictions, used when predicting the outcomes of a classification problem.
• Weighted average: different weights are applied to the predictions from the multiple
models before taking the average, which means giving high/low importance to each model.
• Bagging: choose 'n' observations or rows out of the original dataset; each row is selected with
replacement from the original dataset in each iteration.
• Boosting: the first algorithm is trained on the entire dataset and the subsequent algorithms are built by
fitting the residuals of the first.
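A short sketch of averaging, weighted averaging and majority voting over three models' predictions; the prediction vectors below are made up for illustration.
# Averaging (regression): mean of the numeric predictions per observation.
pred1 <- c(2.1, 3.4, 5.0); pred2 <- c(1.9, 3.6, 4.8); pred3 <- c(2.0, 3.5, 5.2)
avg_pred <- rowMeans(cbind(pred1, pred2, pred3))

# Weighted average: give higher importance to the model believed to be stronger.
w_pred <- 0.5 * pred1 + 0.3 * pred2 + 0.2 * pred3

# Majority vote (classification): the most frequent label across the models.
votes    <- cbind(c("a", "b", "a"), c("a", "b", "b"), c("b", "b", "a"))
maj_vote <- apply(votes, 1, function(v) names(which.max(table(v))))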
CONCEPT LEARNING
• Concept learning: learning a category description (concept) from a set of positive and negative training
examples. The concept is a Boolean-valued function c: X -> {0, 1} defined over a large set of objects X.
It is a search problem for the best-fitting hypothesis in a hypothesis space.
A hypothesis h is a set of constraints on the attributes.
• An instance x satisfies a hypothesis h iff all the constraints expressed by h are satisfied by the attribute
values in x.
• Example 1: x1: Sunny, Warm, Normal, Strong, Warm, Same; h1: Sunny, ?, ?, Strong, ?, Same. Satisfies? Yes
• Example 2: x2: Sunny, Warm, Normal, Strong, Warm, Same; h2: Sunny, ?, ?, Ø, ?, Same. Satisfies? No
• Most general hypothesis: <?, ?, ?, ?, ?, ?> | most specific: <∅, ∅, ∅, ∅, ∅, ∅>
FIND S
• A hypothesis is maximally specific if it covers all positive examples while being as specific as possible.
• A hypothesis is maximally general if it covers no negative examples while being as general as possible.
S – set of hypotheses (candidate concepts) = maximally specific generalizations
G – set of hypotheses (candidate concepts) = maximally general specializations
Example: <big, red, circle> negative (F) || <small, red, circle> positive (T)
A positive example generalizes the hypotheses in S; a negative example specializes the hypotheses in G.
Here the positive example generalizes S to <small, red, circle> and leaves G unchanged (G1 = G0), while the
negative example leaves S unchanged and specializes G (a toy sketch of Find-S follows).
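A toy R sketch of the Find-S idea: start from the most specific hypothesis and generalize it on each positive example, ignoring negatives. The attribute values follow the big/small, red, circle example above; "0" and "?" stand in for ∅ and ?.
find_s <- function(examples, labels) {
  h <- rep("0", ncol(examples))            # most specific hypothesis <0, 0, ..., 0>
  for (i in which(labels)) {               # negative examples are ignored by Find-S
    x <- as.character(examples[i, ])
    h <- ifelse(h == "0", x,               # first positive example: copy its values
         ifelse(h == x, h, "?"))           # conflicting value: generalize to "?"
  }
  h
}

examples <- rbind(c("big",   "red", "circle"),
                  c("small", "red", "circle"))
labels   <- c(FALSE, TRUE)                 # F = negative, T = positive
find_s(examples, labels)                   # -> "small" "red" "circle"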
INDUCTIVE VS ANALYTICAL LEARNING
• Inductive learning requires a certain number of training examples to achieve a given level of
generalization accuracy: the more training examples, the higher the accuracy. The learner identifies features and
distinguishes them as positive/negative.
• Examples: decision trees, neural networks, genetic algorithms.
• Analytical learning uses the learner's prior knowledge and deductive reasoning to analyze
individual training examples, in order to discriminate the relevant features from the irrelevant.
• EBL uses prior knowledge to analyze and explain each training example in order to infer which
example features are relevant to the target function and which are irrelevant.
INDUCTIVE VS ANALYTICAL LEARNING
• The Inductive Generalization Problem
Given: Instances | Hypotheses | Target concept | Training examples of the target concept
Determine: hypotheses consistent with the training examples
Hypotheses are justified by the data.
• The Analytical Generalization Problem
Given: Instances | Hypotheses | Target concept | Training examples of the target concept | Domain theory
for explaining examples
Determine: hypotheses consistent with the training examples and the domain theory
Hypotheses are justified by the domain theory.
EBL-EXPLANATION BASED LEARNING
• Initialize hypothesis = {}
• For each positive training example not covered by the hypothesis:
• 1. Explain how the training example satisfies the target concept, in terms of the domain theory.
• 2. Analyze the explanation to determine the most general conditions under which this explanation
(proof) holds.
• 3. Refine the hypothesis by adding a new rule whose preconditions are the above conditions and
whose consequent asserts the target concept.
