MODULE 4
Rule-based classification: 1R. Neural networks: backpropagation. Support vector machines. Lazy learners: k-nearest-neighbor classifier. Accuracy and error measures, evaluation. Prediction: linear regression and nonlinear regression.
Rule-Based Classification - 1R
• Rules are a good way of representing information or bits of
knowledge.
• A rule-based classifier uses a set of IF-THEN rules for
classification.
• An IF-THEN rule is an expression of the form
IF condition THEN conclusion.
An example:
• R1: IF age = youth AND student = yes THEN buys computer =
yes
• The “IF” part (or left side) of a rule is known as
the rule antecedent or precondition.
• The “THEN” part (or right side) is the rule
consequent.
• In the rule antecedent, the condition consists of
one or more attribute tests (e.g., age = youth and
student = yes) that are logically ANDed.
• The rule’s consequent contains a class prediction
(in this case, we are predicting whether a
customer will buy a computer).
• R1 can also be written as
R1: (age = youth) ∧ (student = yes) ⇒ (buys computer = yes).
• If the condition (i.e., all the attribute tests) in a
rule antecedent holds true for a given tuple,
we say that the rule antecedent is satisfied (or
simply, that the rule is satisfied) and that the
rule covers the tuple.
The following rules determine the classification of grades:
• If grade ≥ 90, then class = A
• If 80 ≤ grade < 90, then class = B
• If 70 ≤ grade < 80, then class = C
• If 60 ≤ grade < 70, then class = D
• If grade < 60, then class = F
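A direct translation of these rules into code (a minimal sketch; the function name and the test value are illustrative):

def classify_grade(grade):
    # Rules are checked in order; the ranges are mutually exclusive.
    if grade >= 90:
        return "A"
    elif grade >= 80:
        return "B"
    elif grade >= 70:
        return "C"
    elif grade >= 60:
        return "D"
    else:
        return "F"

print(classify_grade(85))   # B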
• A classification rule, r = (a, c), consists of the antecedent (the "if" part), a, and the consequent (the "then" part), c.
• These rules relate directly to a corresponding decision tree (DT) that could be created. A DT can always be used to generate rules.
There are differences between rules and trees:
• The tree has an implied order in which the splitting is
performed. Rules have no order.
• A tree is created based on looking at all classes. When
generating rules, only one class must be examined at a
time.
• If a rule is satisfied by X, the rule is said to be triggered.
• If more than one rule is triggered (or if no rule is satisfied by X), we need a conflict resolution strategy to decide which rule gets to fire and assign its class prediction to X.
• Two common strategies are size ordering and rule ordering.
• The size ordering scheme assigns the highest priority to the triggering rule with the most attribute tests; that rule is fired.
• The rule ordering scheme prioritizes the rules beforehand. The ordering may be class-based or rule-based.
• With class-based ordering, the classes are sorted in order of decreasing "importance".
• With rule-based ordering, the rules are organized into one long priority list, according to some measure of rule quality such as accuracy, coverage, or size.
Rule Extraction from a Decision Tree
• To extract rules from a decision tree, one rule is
created for each path from the root
to a leaf node.
• Each splitting criterion along a given path is
logically ANDed to form the
rule antecedent (“IF” part).
• The leaf node holds the class prediction, forming
the rule consequent (“THEN” part).
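As a sketch, assuming scikit-learn is available, the rules can be read off a fitted decision tree by walking each root-to-leaf path and ANDing the splits along the way (the dataset and tree depth are illustrative choices):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

def extract_rules(tree, feature_names, class_names, node=0, conds=None):
    # One IF-THEN rule per root-to-leaf path; splits along the path are ANDed.
    conds = conds or []
    left, right = tree.children_left[node], tree.children_right[node]
    if left == -1:                       # leaf: its majority class is the consequent
        label = class_names[int(np.argmax(tree.value[node]))]
        print("IF " + " AND ".join(conds) + " THEN class = " + str(label))
        return
    name, thr = feature_names[tree.feature[node]], tree.threshold[node]
    extract_rules(tree, feature_names, class_names, left, conds + [f"{name} <= {thr:.2f}"])
    extract_rules(tree, feature_names, class_names, right, conds + [f"{name} > {thr:.2f}"])

extract_rules(clf.tree_, iris.feature_names, iris.target_names)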
Rule Induction Using a Sequential Covering Algorithm
• These techniques are sometimes called covering algorithms because they attempt to generate rules that exactly cover a specific class.
• Tree algorithms work in a top down divide and
conquer approach, but this need not be the
case for covering algorithms.
• Usually the "best" attribute-value pair is
chosen, as opposed to the best attribute with
the tree-based algorithms
• Suppose that we wished to generate a rule to
classify persons as tall. The basic format for
the rule is then
If ? then class = tall
• The objective for the covering algorithms is to
replace the "?" in this statement with
predicates that can be used to obtain the
"best" probability of being tall.
1R Classification
• One simple approach is called 1R ("one rule") because it generates a simple set of rules that test a single attribute.
• The basic idea is to choose the best attribute
to perform the classification based on the
training data.
• "Best" is defined here by counting the number
of errors.
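A minimal sketch of 1R on a toy table (the data is illustrative): for each attribute, build one rule per attribute value predicting that value's majority class, count the resulting errors, and keep the attribute with the fewest.

from collections import Counter, defaultdict

# Each row: (outlook, windy, class)
rows = [
    ("sunny", "false", "no"),     ("sunny", "true", "no"),
    ("overcast", "false", "yes"), ("overcast", "true", "yes"),
    ("rainy", "false", "yes"),    ("rainy", "true", "no"),
]
attrs = {"outlook": 0, "windy": 1}
LABEL = 2

def one_r(rows):
    best = None
    for name, idx in attrs.items():
        by_value = defaultdict(Counter)
        for r in rows:
            by_value[r[idx]][r[LABEL]] += 1        # class counts per attribute value
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in by_value.items())
        if best is None or errors < best[0]:        # "best" = fewest errors
            best = (errors, name, rules)
    return best

errors, attr, rules = one_r(rows)
print(attr, rules, "errors:", errors)   # outlook wins with 1 error on this data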
• Algorithm: Backpropagation. Neural network
learning for classification or numeric
prediction, using the backpropagation
algorithm.
• Input:
• D, a data set consisting of the training tuples
and their associated target values;
• l, the learning rate;
• network, a multilayer feed-forward network.
• Output: A trained neural network.
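The method's per-tuple steps (propagate the inputs forward, backpropagate the error, update weights and biases) can be sketched as follows. This is a minimal NumPy illustration on XOR data; the network size, seed, learning rate, and epoch count are arbitrary choices, not from the slides.

import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# D: training tuples with their target values (XOR, purely illustrative)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
l = 0.5                                   # l: the learning rate

# network: a 2-3-1 multilayer feed-forward net with small random weights
W1, b1 = rng.normal(scale=0.5, size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(scale=0.5, size=(3, 1)), np.zeros(1)

for epoch in range(10000):
    for x, t in zip(X, T):
        # Propagate the inputs forward: net input, then sigmoid output per unit.
        h = sigmoid(x @ W1 + b1)          # hidden layer outputs
        o = sigmoid(h @ W2 + b2)          # output layer output
        # Backpropagate the error:
        #   output unit: Err_j = O_j (1 - O_j)(T_j - O_j)
        #   hidden unit: Err_j = O_j (1 - O_j) * sum_k Err_k w_jk
        err_o = o * (1 - o) * (t - o)
        err_h = h * (1 - h) * (W2 @ err_o)
        # Update weights and biases: w_ij += l * Err_j * O_i ; b_j += l * Err_j
        W2 += l * np.outer(h, err_o); b2 += l * err_o
        W1 += l * np.outer(x, err_h); b1 += l * err_h

print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2).ravel())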
Lazy Learners
• The classification methods seen so far (tree induction, Bayesian classification, rule-based classification, classification by backpropagation, support vector machines, and classification based on association rule mining) are examples of eager learners.
• Eager learners, when given a set of training tuples, will construct a
generalization (i.e., classification) model before receiving new (e.g., test)
tuples to classify.
• In the lazy approach, the learner instead waits until the last minute, doing no model construction until it must classify a given test tuple.
• That is, when given a training tuple, a lazy learner simply stores it (or does
only a little minor processing) and waits until it is given a test tuple.
• Only when it sees the test tuple does it perform generalization to classify
the tuple based on its similarity to the stored training tuples.
• Lazy learners do less work when a training tuple is presented and more work when making a classification or numeric prediction.
• Lazy learners can be computationally expensive.
• They require efficient storage techniques and are well
suited to implementation on parallel hardware.
• They offer little explanation or insight into the data’s
structure.
• Lazy learners, however, naturally support incremental
learning.
k-Nearest-Neighbor Classifiers
• Nearest-neighbor classifiers are based on learning by analogy,
that is, by comparing a given test tuple with training tuples
that are similar to it.
• When given an unknown tuple, a k-nearest-neighbor classifier
searches the pattern space for the k training tuples that are
closest to the unknown tuple. These k training tuples are the k
“nearest neighbors” of the unknown tuple.
• "Closeness" is defined in terms of a distance metric, such as Euclidean distance. The Euclidean distance between two points or tuples, say, X1 = (x11, x12, ..., x1n) and X2 = (x21, x22, ..., x2n), is
dist(X1, X2) = sqrt( (x11 − x21)² + (x12 − x22)² + ... + (x1n − x2n)² )
• The unknown tuple is assigned the most common class among
its k-nearest neighbors. When k = 1, the unknown tuple is
assigned the class of the training tuple that is closest to it in
pattern space.
• Nearest-neighbor classifiers can also be used for numeric
prediction, that is, to return a real-valued prediction for a given
unknown tuple.
• In this case, the classifier returns the average value of the real-valued labels associated with the k nearest neighbors of the unknown tuple.
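A minimal sketch covering both uses, majority vote and averaging (the toy tuples and the choice of k are illustrative):

import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3, numeric=False):
    # train: list of (tuple, label) pairs; find the k closest training tuples.
    neighbors = sorted(train, key=lambda t: euclidean(t[0], query))[:k]
    labels = [lab for _, lab in neighbors]
    if numeric:                           # numeric prediction: average the labels
        return sum(labels) / k
    return Counter(labels).most_common(1)[0][0]   # classification: majority vote

train = [((1.0, 1.0), "yes"), ((1.2, 0.9), "yes"),
         ((5.0, 5.1), "no"),  ((4.8, 5.3), "no")]
print(knn_predict(train, (1.1, 1.0)))   # "yes": 2 of its 3 nearest neighbors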
• For nominal attributes, a simple method is to
compare the corresponding value of the
attribute in tuple X1 with that in tuple X2.
• If the two are identical (e.g., tuples X1 and X2
both have the color blue), then the difference
between the two is taken as 0.
• If the two are different (e.g., tuple X1 is blue
but tuple X2 is red), then the difference is
considered to be 1.
• In general, if the value of a given attribute A is
missing in tuple X1 and/or in tuple X2, assume
the maximum possible difference.
• For nominal attributes, take the difference value
to be 1 if either one or both of the corresponding
values of A are missing.
• If A is numeric and missing from both tuples X1
and X2, then the difference is also taken to be 1.
• If only one value is missing and the other (call it v′) is present and normalized, then we can take the difference to be either |1 − v′| or |0 − v′| (i.e., 1 − v′ or v′), whichever is greater.
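These comparison rules translate directly into a per-attribute difference function (a sketch; None stands for a missing value, and numeric values are assumed to be already normalized to [0, 1]):

def attr_diff(v1, v2, nominal=False):
    if v1 is None and v2 is None:
        return 1.0                        # both missing: assume maximum difference
    if v1 is None or v2 is None:
        if nominal:
            return 1.0                    # nominal with one or both values missing
        v = v2 if v1 is None else v1
        return max(v, 1.0 - v)            # |1 - v'| or |0 - v'|, whichever is greater
    if nominal:
        return 0.0 if v1 == v2 else 1.0   # identical -> 0, different -> 1
    return abs(v1 - v2)

print(attr_diff("blue", "red", nominal=True))   # 1.0
print(attr_diff(None, 0.8))                     # 0.8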
• A good value of k can be determined experimentally; the k value that gives the minimum error rate may be selected.
• Nearest-neighbor classifiers may use other distance metrics, such as the Manhattan (city block) distance, in place of Euclidean distance to improve accuracy.
• Other techniques to speed up classification
time include the use of partial distance
calculations and editing the stored tuples.
Prediction
• The prediction of continuous values can be modeled by statistical techniques
of regression.
• Linear and multiple regression
• Linear regression is the simplest form of regression.
• In linear regression, data are modeled using a straight
line.
• Linear regression models a random variable, Y (called a response variable), as a linear function of another random variable, X (called a predictor variable), i.e., Y = α + βX, where α and β are regression coefficients.
• These coefficients can be solved for by the method of least squares.
• Given s samples or data points of the form (x1, y1), (x2, y2), ..., (xs, ys), the regression coefficients are estimated by
β = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)² and α = ȳ − β x̄,
where x̄ and ȳ are the means of the x and y values, respectively.
Thus, the equation of the least squares line is estimated by Y = 21.7 + 3.7X
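A sketch of these estimates on illustrative data (not the data behind the slide's Y = 21.7 + 3.7X example):

xs = [1, 2, 3, 4, 5]     # illustrative (x, y) samples
ys = [5, 7, 9, 11, 13]
s = len(xs)
x_bar, y_bar = sum(xs) / s, sum(ys) / s

# beta = sum (xi - x_bar)(yi - y_bar) / sum (xi - x_bar)^2 ; alpha = y_bar - beta*x_bar
beta = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs))
alpha = y_bar - beta * x_bar
print(f"Y = {alpha:.1f} + {beta:.1f}X")   # Y = 3.0 + 2.0X for this data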
• Multiple regression is an extension of linear regression involving more than one predictor variable, e.g., Y = α + β1 X1 + β2 X2.
• Nonlinear regression
• The given response variable and predictor variable(s) may have a relationship that can be modeled by a polynomial function.
• Polynomial regression can be modeled by adding
polynomial terms to the basic linear model.
• By applying transformations to the variables, we
can convert the nonlinear model into a linear one
that can then be solved by the method of least
squares.
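For example, Y = α + β1 X + β2 X² is linear in the new predictors X1 = X and X2 = X², so ordinary least squares applies. A sketch using NumPy's least-squares solver (the data is generated from a known quadratic, purely for illustration):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 0.5 * x**2            # generated from Y = 1 + 2X + 0.5X^2

# Transform: each polynomial term becomes its own linear predictor column.
A = np.column_stack([np.ones_like(x), x, x**2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coef, 3))                   # [1.  2.  0.5]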
Accuracy and Error Measures Evaluation
Classifier accuracy measures
• Accuracy is measured on a test set consisting of
class-labeled tuples that were not used to train the
model.
• The accuracy of a classifier on a given test set is the
percentage of test set tuples that are correctly
classified by the classifier.
• This is also referred to as the overall recognition rate
of the classifier, that is, it reflects how well the
classifier recognizes tuples of the various classes.
• The error rate or misclassification rate of a classifier, M, is simply 1 − Acc(M), where Acc(M) is the accuracy of M.
• The confusion matrix is a useful tool for accuracy measurement.
• Given m classes, a confusion matrix is a table of at least size m by m.
• An entry CMi,j in the first m rows and m columns indicates the number of tuples of class i that were labeled by the classifier as class j.
• For a classifier to have good accuracy, ideally most of the tuples would be represented along the diagonal of the confusion matrix, from entry CM1,1 to entry CMm,m, with the rest of the entries being close to zero.
• The table may have additional rows or columns to provide totals or recognition rates per class.
Given two classes
• positive tuples (tuples of the main class of interest, e.g.,
buys computer = yes)
• negative tuples (e.g., buys computer = no)
• True positives refer to the positive tuples that were
correctly labeled by the classifier,
• True negatives are the negative tuples that were correctly
labeled by the classifier.
• False positives are the negative tuples that were incorrectly labeled (e.g., tuples of class buys computer = no for which the classifier predicted buys computer = yes).
• False negatives are the positive tuples that were incorrectly labeled (e.g., tuples of class buys computer = yes for which the classifier predicted buys computer = no).
• The sensitivity and specificity measures can
also be used as accuracy measures.
• Sensitivity is also referred to as the true
positive (recognition) rate (that is, the
proportion of positive tuples that are correctly
identified)
• Specificity is the true negative rate (that is,
the proportion of negative tuples that are
correctly identified).
• Precision is used to assess the percentage of tuples labeled as "yes" that actually are "yes" tuples.
sensitivity = t_pos / pos
specificity = t_neg / neg
precision = t_pos / (t_pos + f_pos)
accuracy = sensitivity · pos / (pos + neg) + specificity · neg / (pos + neg)
where t_pos is the number of true positives, pos is the number of positive tuples, t_neg is the number of true negatives, neg is the number of negative tuples, and f_pos is the number of false positives.
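These definitions translate directly into code (a sketch; the counts are illustrative):

def classifier_measures(t_pos, t_neg, f_pos, f_neg):
    pos, neg = t_pos + f_neg, t_neg + f_pos      # totals of positive/negative tuples
    return {
        "accuracy":    (t_pos + t_neg) / (pos + neg),
        "sensitivity": t_pos / pos,               # true positive rate
        "specificity": t_neg / neg,               # true negative rate
        "precision":   t_pos / (t_pos + f_pos),
    }

print(classifier_measures(t_pos=90, t_neg=80, f_pos=20, f_neg=10))
# accuracy 0.85, sensitivity 0.90, specificity 0.80, precision ~0.82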
Predictor Error Measures
Over a test set of d tuples with actual values yi and predicted values yi′, common predictor error measures are the mean absolute error, MAE = (1/d) Σ |yi − yi′|, and the mean squared error, MSE = (1/d) Σ (yi − yi′)².
The mean squared error exaggerates the presence of outliers, while the mean absolute error does not.
Evaluating the Accuracy of a Classifier or Predictor
• Holdout, random subsampling, cross-validation, and the bootstrap are common techniques for assessing accuracy based on randomly sampled partitions of the given data.
• The use of such techniques to estimate
accuracy increases the overall computation
time, yet is useful for model selection.
The holdout method
• In this method, the given data are randomly partitioned into
two independent sets, a training set and a test set.
• Typically, two-thirds of the data are allocated to the training
set, and the remaining one-third is allocated to the test set.
• The training set is used to derive the model, whose accuracy is estimated with the test set.
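A sketch of the holdout split (the fraction and seed are arbitrary choices):

import random

def holdout_split(data, train_frac=2/3, seed=0):
    # Randomly partition into independent training and test sets.
    d = data[:]
    random.Random(seed).shuffle(d)
    cut = int(len(d) * train_frac)
    return d[:cut], d[cut:]

train, test = holdout_split(list(range(12)))
print(len(train), len(test))   # 8 4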
Random subsampling
• Random subsampling is a variation of the
holdout method in which the holdout method
is repeated k times.
• The overall accuracy estimate is taken as the
average of the accuracies obtained from each
iteration.
• (For prediction, we can take the average of the predictor error rates.)
Cross-validation
• In k-fold cross-validation, the initial data are randomly partitioned into k mutually exclusive subsets or "folds," D1, D2, ..., Dk, each of approximately equal size.
• Training and testing is performed k times.
• In iteration i, partition Di is reserved as the test set, and the remaining partitions are collectively used to train the model.
• That is, in the first iteration, subsets D2, ..., Dk collectively serve as the training set in order to obtain a first model, which is tested on D1; the second iteration is trained on subsets D1, D3, ..., Dk and tested on D2; and so on.
• Each sample is used the same number of times for training and once for testing.
• For classification, the accuracy estimate is the
overall number of correct classifications from
the k iterations, divided by the total number of
tuples in the initial data.
• For prediction, the error estimate can be
computed as the total loss from the k
iterations, divided by the total number of
initial tuples.
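A sketch of the fold bookkeeping and this accuracy formula; train_fn stands in for any learner that takes training pairs and returns a predict function (the majority-class learner below is only a placeholder):

from collections import Counter

def k_fold_accuracy(data, labels, train_fn, k=10):
    n, correct = len(data), 0
    folds = [list(range(i, n, k)) for i in range(k)]   # simple interleaved folds
    for i in range(k):
        test_idx = set(folds[i])
        train = [(data[j], labels[j]) for j in range(n) if j not in test_idx]
        model = train_fn(train)                         # model built without fold i
        correct += sum(model(data[j]) == labels[j] for j in folds[i])
    return correct / n      # total correct over k iterations / total tuples

def majority_learner(train):
    c = Counter(lab for _, lab in train).most_common(1)[0][0]
    return lambda x: c      # a trivial "model": always predict the majority class

print(k_fold_accuracy(list(range(10)), ["a"] * 7 + ["b"] * 3, majority_learner, k=5))
# 0.7 on this toy data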
• Leave-one-out is a special case of k-fold cross-
validation where k is set to the number of initial
tuples.
• That is, only one sample is “left out” at a time for
the test set.
• In stratified cross-validation, the folds are stratified
so that the class distribution of the tuples in each
fold is approximately the same as that in the initial
data.
• In general, stratified 10-fold cross-validation is
recommended for estimating accuracy (even if
computation power allows using more folds) due
to its relatively low bias and variance.
Bootstrap
• The bootstrap method samples the given training tuples uniformly with replacement.
• That is, each time a tuple is selected, it is equally likely to be selected again and re-added to the training set.
• In sampling with replacement, the machine is allowed to select the same tuple more than once.
• A commonly used one is the .632 bootstrap, which works as follows. Suppose we are given a data set of d tuples.
• The data set is sampled d times, with replacement, resulting in a bootstrap sample or training set of d samples.
• Some of the original data tuples will occur more than once in this sample.
• The data tuples that did not make it into the training set end up forming the test set.
• On average, 63.2% of the original data tuples will end up in the bootstrap sample, and the remaining 36.8% will form the test set (hence the name, .632 bootstrap): each tuple has probability (1 − 1/d)^d ≈ e⁻¹ ≈ 0.368 of never being selected in the d draws.
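A sketch of drawing one bootstrap sample; with d = 1000 the observed fractions come out near 0.632 and 0.368:

import random

def bootstrap_sample(data, seed=0):
    rnd = random.Random(seed)
    d = len(data)
    train = [rnd.choice(data) for _ in range(d)]   # d draws, with replacement
    chosen = set(train)
    test = [t for t in data if t not in chosen]    # never-drawn tuples form the test set
    return train, test

data = list(range(1000))
train, test = bootstrap_sample(data)
print(len(set(train)) / len(data), len(test) / len(data))   # ~0.632, ~0.368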