Lecture-32
What is classification? What is prediction?

Classification:
- predicts categorical class labels
- classifies data: constructs a model from the training set and the values (class labels) of a classifying attribute, and uses it to classify new data

Prediction:
- models continuous-valued functions
- predicts unknown or missing values

Applications:
- credit approval
- target marketing
- medical diagnosis
- treatment effectiveness analysis
Classification: A Two-Step Process

Learning step - describing a set of predetermined classes
- Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
- The set of tuples used for model construction is the training set
- The model is represented as classification rules, decision trees, or mathematical formulae

Classification step - classifying future or unknown objects
- Estimate the accuracy of the model:
  - The known label of each test sample is compared with the classified result from the model
  - The accuracy rate is the percentage of test set samples that are correctly classified by the model
  - The test set is independent of the training set, otherwise over-fitting will occur
Classification Process: Model Construction

Training data:

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

A classification algorithm is applied to the training data to build the classifier (model), here expressed as a rule:

IF rank = 'professor' OR years > 6
THEN tenured = 'yes'
Classification Process: Use the Model in Prediction

Testing data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4). Tenured? The learned rule (rank = 'professor') predicts 'yes'.
Supervised vs. Unsupervised Learning

Supervised learning (classification)
- Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
- New data is classified based on the training set

Unsupervised learning (clustering)
- The class labels of the training data are unknown
- Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Lecture-33
Issues regarding classification and prediction

Preparing the data for classification and prediction

Data cleaning
- Preprocess data in order to reduce noise and handle missing values

Relevance analysis (feature selection)
- Remove the irrelevant or redundant attributes

Data transformation
- Generalize and/or normalize data
Comparing Classification Methods

Accuracy

Speed and scalability
- time to construct the model
- time to use the model

Robustness
- handling noise and missing values

Scalability
- efficiency in disk-resident databases

Interpretability
- understanding and insight provided by the model
- measures of interpretability: decision tree size, compactness of classification rules
Lecture-34
Classification by decision tree induction
Classification by Decision Tree Induction

Decision tree
- A flow-chart-like tree structure
- An internal node denotes a test on an attribute
- A branch represents an outcome of the test
- Leaf nodes represent class labels or class distributions

Decision tree generation consists of two phases
- Tree construction
  - At the start, all the training examples are at the root
  - Partition the examples recursively based on selected attributes
- Tree pruning
  - Identify and remove branches that reflect noise or outliers

Use of a decision tree: classifying an unknown sample
- Test the attribute values of the sample against the decision tree
Training Dataset

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no

This follows an example from Quinlan's ID3; the buys_computer class column has 9 "yes" and 5 "no" tuples, consistent with I(9, 5) = 0.940 used below.
Output: A Decision Tree for “buys_computer”

age?
|- <=30  -> student?
|            no  -> no
|            yes -> yes
|- 31…40 -> yes
|- >40   -> credit_rating?
             excellent -> no
             fair      -> yes
Algorithm for Decision Tree Induction

Basic algorithm (a greedy algorithm)
- The tree is constructed in a top-down, recursive, divide-and-conquer manner
- At the start, all the training examples are at the root
- Attributes are categorical (continuous-valued attributes are discretized in advance)
- Examples are partitioned recursively based on selected attributes
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)

Conditions for stopping partitioning
- All samples for a given node belong to the same class
- There are no remaining attributes for further partitioning; majority voting is employed for classifying the leaf
- There are no samples left
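
To make the greedy procedure concrete, here is a minimal Python sketch of top-down induction with information gain and majority voting at impure leaves (the function names and the dict-of-dicts tree representation are illustrative assumptions, not part of the lecture):

```python
import math
from collections import Counter

def entropy(labels):
    """I(p, n) generalized to any number of classes."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Gain(A) = I(p, n) - E(A) for a categorical attribute."""
    total = len(labels)
    expected = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        expected += len(subset) / total * entropy(subset)
    return entropy(labels) - expected

def build_tree(rows, labels, attrs):
    """Top-down, recursive, divide-and-conquer construction."""
    if len(set(labels)) == 1:            # all samples belong to the same class
        return labels[0]
    if not attrs:                        # no remaining attributes: majority voting
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    node = {best: {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node[best][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx],
                                       [a for a in attrs if a != best])
    return node
```
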
Attribute Selection Measure

Information gain (ID3/C4.5)
- All attributes are assumed to be categorical
- Can be modified for continuous-valued attributes

Gini index (IBM IntelligentMiner)
- All attributes are assumed to be continuous-valued
- Assume there exist several possible split values for each attribute
- May need other tools, such as clustering, to get the possible split values
- Can be modified for categorical attributes
Information Gain (ID3/C4.5)

Select the attribute with the highest information gain.

Assume there are two classes, P and N
- Let the set of examples S contain p elements of class P and n elements of class N
- The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as

  I(p, n) = -\frac{p}{p+n} \log_2 \frac{p}{p+n} - \frac{n}{p+n} \log_2 \frac{n}{p+n}
Information Gain in Decision Tree Induction

Assume that using attribute A, a set S will be partitioned into sets {S1, S2, ..., Sv}.

If Si contains pi examples of P and ni examples of N, the entropy, or the expected information needed to classify objects in all subtrees Si, is

  E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n} I(p_i, n_i)

The encoding information that would be gained by branching on A is

  Gain(A) = I(p, n) - E(A)
Attribute Selection by Information Gain Computation

- Class P: buys_computer = "yes"
- Class N: buys_computer = "no"
- I(p, n) = I(9, 5) = 0.940
- Compute the entropy for age:

  age     pi  ni  I(pi, ni)
  <=30    2   3   0.971
  31…40   4   0   0
  >40     3   2   0.971

  E(age) = \frac{5}{14} I(2, 3) + \frac{4}{14} I(4, 0) + \frac{5}{14} I(3, 2) = 0.694

Hence

  Gain(age) = I(p, n) - E(age) = 0.940 - 0.694 = 0.246

Similarly,

  Gain(income) = 0.029
  Gain(student) = 0.151
  Gain(credit_rating) = 0.048
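
The numbers on this slide can be reproduced with a short Python sketch (the per-group yes/no counts are taken from the table above; the function names are assumptions):

```python
import math

def I(p, n):
    """Expected information I(p, n) for a two-class set; 0 log 0 is taken as 0."""
    def term(x):
        return 0.0 if x == 0 else -(x / (p + n)) * math.log2(x / (p + n))
    return term(p) + term(n)

# (yes, no) counts of buys_computer within each age partition, from the table above
age_groups = {'<=30': (2, 3), '31...40': (4, 0), '>40': (3, 2)}
p, n = 9, 5

E_age = sum((pi + ni) / (p + n) * I(pi, ni) for pi, ni in age_groups.values())
print(f"I(9, 5)   = {I(p, n):.3f}")           # 0.940
print(f"E(age)    = {E_age:.3f}")             # 0.694
print(f"Gain(age) = {I(p, n) - E_age:.3f}")   # the largest of the four gains
```
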
Gini Index (IBM IntelligentMiner)

If a data set T contains examples from n classes, the gini index gini(T) is defined as

  gini(T) = 1 - \sum_{j=1}^{n} p_j^2

where pj is the relative frequency of class j in T.

If a data set T is split into two subsets T1 and T2 with sizes N1 and N2 respectively, the gini index of the split data is defined as

  gini_{split}(T) = \frac{N_1}{N} gini(T_1) + \frac{N_2}{N} gini(T_2)

The attribute that provides the smallest gini_split(T) is chosen to split the node (this requires enumerating all possible splitting points for each attribute).
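
A small Python sketch of the two formulas, under the assumption that class labels are given as plain lists (the helper names are illustrative):

```python
def gini(labels):
    """gini(T) = 1 - sum_j p_j^2."""
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def gini_split(left, right):
    """gini_split(T) = N1/N * gini(T1) + N2/N * gini(T2)."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# candidate binary splits are enumerated and the one with the smallest gini_split is chosen
print(gini_split(['yes', 'yes', 'no'], ['no', 'no']))
```
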
Extracting Classification Rules from Trees

- Represent the knowledge in the form of IF-THEN rules
- One rule is created for each path from the root to a leaf
- Each attribute-value pair along a path forms a conjunction
- The leaf node holds the class prediction
- Rules are easier for humans to understand

Example
  IF age = "<=30" AND student = "no"  THEN buys_computer = "no"
  IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
  IF age = "31…40"                    THEN buys_computer = "yes"
  IF age = ">40"  AND credit_rating = "excellent" THEN buys_computer = "no"
  IF age = ">40"  AND credit_rating = "fair"      THEN buys_computer = "yes"
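
Rule extraction is a mechanical walk over every root-to-leaf path. A sketch, assuming the nested-dict tree representation used in the induction sketch earlier:

```python
def extract_rules(tree, conditions=()):
    """Walk each root-to-leaf path; attribute-value tests form the conjunction, the leaf gives the class."""
    if not isinstance(tree, dict):                 # leaf: class prediction
        lhs = " AND ".join(f"{a} = '{v}'" for a, v in conditions) or "TRUE"
        return [f"IF {lhs} THEN buys_computer = '{tree}'"]
    rules = []
    (attr, branches), = tree.items()
    for value, subtree in branches.items():
        rules += extract_rules(subtree, conditions + ((attr, value),))
    return rules

tree = {'age': {'<=30': {'student': {'no': 'no', 'yes': 'yes'}},
                '31...40': 'yes',
                '>40': {'credit_rating': {'excellent': 'no', 'fair': 'yes'}}}}
for rule in extract_rules(tree):
    print(rule)
```
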
Avoid Overfitting in Classification

The generated tree may overfit the training data
- Too many branches, some of which may reflect anomalies due to noise or outliers
- The result is poor accuracy for unseen samples

Two approaches to avoid overfitting
- Prepruning: halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
  - It is difficult to choose an appropriate threshold
- Postpruning: remove branches from a "fully grown" tree, producing a sequence of progressively pruned trees
  - Use a set of data different from the training data to decide which is the "best pruned tree"
Approaches to Determine the Final Tree Size

- Separate training and testing sets
- Use cross-validation, e.g., 10-fold cross-validation
- Use all the data for training
  - apply a statistical test (e.g., chi-square) to estimate whether expanding or pruning a node may improve the entire distribution
- Use the minimum description length (MDL) principle
  - halt growth of the tree when the encoding is minimized
Enhancements to basic decision tree induction

Allow for continuous-valued attributes
- Dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals

Handle missing attribute values
- Assign the most common value of the attribute
- Assign a probability to each of the possible values

Attribute construction
- Create new attributes based on existing ones that are sparsely represented
- This reduces fragmentation, repetition, and replication
Classification in Large Databases

Classification is a classical problem extensively studied by statisticians and machine learning researchers.

Scalability: classifying data sets with millions of examples and hundreds of attributes with reasonable speed.

Why decision tree induction in data mining?
- relatively fast learning speed (compared with other classification methods)
- convertible to simple and easy-to-understand classification rules
- can use SQL queries for accessing databases
- comparable classification accuracy with other methods
Scalable Decision Tree Induction Methods in Data Mining Studies

SLIQ (EDBT'96 - Mehta et al.)
- builds an index for each attribute; only the class list and the current attribute list reside in memory

SPRINT (VLDB'96 - J. Shafer et al.)
- constructs an attribute-list data structure

PUBLIC (VLDB'98 - Rastogi & Shim)
- integrates tree splitting and tree pruning: stops growing the tree earlier

RainForest (VLDB'98 - Gehrke, Ramakrishnan & Ganti)
- separates the scalability aspects from the criteria that determine the quality of the tree
- builds an AVC-list (attribute, value, class label)
Lecture-35
Bayesian Classification

Bayesian Classification
- Statistical classifiers
- Based on Bayes' theorem
- Naïve Bayesian classification
- Class conditional independence
- Bayesian belief networks
Bayesian Classification

Probabilistic learning
- Calculates explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems

Incremental
- Each training example can incrementally increase or decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.

Probabilistic prediction
- Predicts multiple hypotheses, weighted by their probabilities

Standard
- Even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured
Bayes' Theorem

Let X be a data tuple and H a hypothesis, such that X belongs to a specific class C. The posterior probability of the hypothesis H given X, P(H|X), follows Bayes' theorem:

  P(H|X) = \frac{P(X|H) \, P(H)}{P(X)}
Naïve Bayes Classifier

Let D be a training set of tuples and their associated class labels, X = (x1, x2, ..., xn), and let there be m classes C1, C2, ..., Cm.

Bayes' theorem:
  P(Ci|X) = P(X|Ci) P(Ci) / P(X)

Naïve Bayes predicts that X belongs to class Ci if and only if
  P(Ci|X) > P(Cj|X) for 1 <= j <= m, j != i
Naïve Bayes Classifier

- P(X) is constant for all classes, so only P(X|Ci) P(Ci) needs to be maximized
- If the class priors are not known, they are commonly assumed equal, P(C1) = P(C2) = ... = P(Cm); otherwise P(Ci) is estimated as P(Ci) = |Ci,D| / |D|
Naïve Bayesian Classification

Naïve assumption: attribute (class conditional) independence

  P(x1, ..., xk | C) = P(x1|C) · ... · P(xk|C)

- If the attribute is categorical: P(xi|C) is estimated as the relative frequency of samples having value xi for the i-th attribute in class C
- If the attribute is continuous: P(xi|C) is estimated through a Gaussian density function
- Computation is easy in both cases
Play-tennis example: estimating P(xi|C)

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N

P(p) = 9/14, P(n) = 5/14

outlook:      P(sunny|p) = 2/9     P(sunny|n) = 3/5
              P(overcast|p) = 4/9  P(overcast|n) = 0
              P(rain|p) = 3/9      P(rain|n) = 2/5
temperature:  P(hot|p) = 2/9       P(hot|n) = 2/5
              P(mild|p) = 4/9      P(mild|n) = 2/5
              P(cool|p) = 3/9      P(cool|n) = 1/5
humidity:     P(high|p) = 3/9      P(high|n) = 4/5
              P(normal|p) = 6/9    P(normal|n) = 2/5
windy:        P(true|p) = 3/9      P(true|n) = 3/5
              P(false|p) = 6/9     P(false|n) = 2/5
Play-tennis example: classifying X

An unseen sample X = <rain, hot, high, false>

P(X|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p)
            = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582

P(X|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n)
            = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286

Sample X is classified in class n (don't play).
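
The same arithmetic as a Python sketch (the conditional probabilities are exactly the ones estimated on the previous slide):

```python
P_p, P_n = 9/14, 5/14
cond = {
    'p': {'rain': 3/9, 'hot': 2/9, 'high': 3/9, 'false': 6/9},
    'n': {'rain': 2/5, 'hot': 2/5, 'high': 4/5, 'false': 2/5},
}
X = ['rain', 'hot', 'high', 'false']

score_p, score_n = P_p, P_n
for v in X:
    score_p *= cond['p'][v]   # P(X|p) * P(p), using attribute independence
    score_n *= cond['n'][v]   # P(X|n) * P(n)

print(round(score_p, 6))      # 0.010582
print(round(score_n, 6))      # 0.018286
print('n (do not play)' if score_n > score_p else 'p (play)')
```
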
How effective are Bayesian classifiers?

The class conditional independence assumption
- makes computation possible
- yields optimal classifiers when it is satisfied
- but it is seldom satisfied in practice, as attributes (variables) are often correlated

Attempts to overcome this limitation:
- Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
- Decision trees, which reason on one attribute at a time, considering the most important attributes first
Bayesian Belief Networks (I)

Variables in the example network: FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea. LungCancer has the parents FamilyHistory (FH) and Smoker (S).

The conditional probability table for the variable LungCancer:

        (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
  LC     0.8      0.5       0.7       0.1
  ~LC    0.2      0.5       0.3       0.9
Bayesian Belief Networks

- A Bayesian belief network allows a subset of the variables to be conditionally independent
- It is a graphical model of causal relationships
- Several cases of learning Bayesian belief networks:
  - Given both the network structure and all the variables: easy
  - Given the network structure but only some of the variables
  - When the network structure is not known in advance
Lecture-36
Classification by Backpropagation

Classification by Backpropagation
- Backpropagation is a neural network learning algorithm
- Neural networks were originally studied by psychologists and neurobiologists
- A neural network is a set of connected input/output units in which each connection has a weight associated with it
Neural Networks

Advantages
- prediction accuracy is generally high
- robust: works when training examples contain errors
- output may be discrete, real-valued, or a vector of several discrete or real-valued attributes
- fast evaluation of the learned target function

Limitations
- long training time
- difficult to understand the learned function (weights)
- not easy to incorporate domain knowledge
A Neuron

The n-dimensional input vector x is mapped into the variable y by means of a scalar product and a nonlinear function mapping:

  y = f\left( \sum_{i=0}^{n} w_i x_i - \mu_k \right)

where x0, ..., xn is the input vector, w0, ..., wn the weight vector, μk the bias, and f the activation function applied to the weighted sum to produce the output y.
Network Training

The ultimate objective of training
- obtain a set of weights that makes almost all the tuples in the training data classified correctly

Steps
- Initialize the weights with random values
- Feed the input tuples into the network one by one
- For each unit:
  - Compute the net input to the unit as a linear combination of all the inputs to the unit
  - Compute the output value using the activation function
  - Compute the error
  - Update the weights and the bias
Multi-Layer Perceptron

The input vector xi is fed to the input nodes, propagated through the hidden nodes via the weights wij, and produces an output vector at the output nodes. For each unit j:

  Net input:         I_j = \sum_i w_{ij} O_i + \theta_j
  Output:            O_j = \frac{1}{1 + e^{-I_j}}
  Output-unit error: Err_j = O_j (1 - O_j)(T_j - O_j)
  Hidden-unit error: Err_j = O_j (1 - O_j) \sum_k Err_k w_{jk}
  Weight update:     w_{ij} = w_{ij} + (l) \, Err_j \, O_i
  Bias update:       \theta_j = \theta_j + (l) \, Err_j

where T_j is the target output and l is the learning rate.
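
A minimal numpy sketch of one forward and backward pass over a single hidden layer, following the update rules above (the layer sizes, random initialization, and learning rate are arbitrary assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
x = rng.random(3)                                 # input vector x_i
t = np.array([1.0])                               # target output T_j
W1, b1 = rng.random((4, 3)) - 0.5, np.zeros(4)    # input -> hidden weights and biases
W2, b2 = rng.random((1, 4)) - 0.5, np.zeros(1)    # hidden -> output weights and biases
lr = 0.5                                          # learning rate l

# forward pass: I_j = sum_i w_ij O_i + theta_j ; O_j = 1 / (1 + e^{-I_j})
h = sigmoid(W1 @ x + b1)
o = sigmoid(W2 @ h + b2)

# backward pass: Err_j = O_j (1 - O_j)(T_j - O_j) for output units,
#                Err_j = O_j (1 - O_j) sum_k Err_k w_jk for hidden units
err_o = o * (1 - o) * (t - o)
err_h = h * (1 - h) * (W2.T @ err_o)

# weight and bias updates: w_ij += l * Err_j * O_i ; theta_j += l * Err_j
W2 += lr * np.outer(err_o, h)
W1 += lr * np.outer(err_h, x)
b2 += lr * err_o
b1 += lr * err_h
```
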
Lecture-37
Classification based on concepts from association rule mining

Association-Based Classification

Several methods for association-based classification:
- ARCS: quantitative association mining and clustering of association rules (Lent et al.'97)
  - It beats C4.5 in (mainly) scalability and also in accuracy
- Associative classification (Liu et al.'98)
  - It mines high-support, high-confidence rules of the form "cond_set => y", where y is a class label
- CAEP (Classification by Aggregating Emerging Patterns) (Dong et al.'99)
  - Emerging patterns (EPs): itemsets whose support increases significantly from one class to another
  - EPs are mined based on minimum support and growth rate
Lecture-38
Other Classification Methods

Other Classification Methods
- k-nearest neighbor classifier
- case-based reasoning
- genetic algorithms
- rough set approach
- fuzzy set approaches
Instance-Based Methods

Instance-based learning
- Store training examples and delay the processing ("lazy evaluation") until a new instance must be classified

Approaches
- k-nearest neighbor approach
  - Instances are represented as points in a Euclidean space
- Locally weighted regression
  - Constructs a local approximation
- Case-based reasoning
  - Uses symbolic representations and knowledge-based inference
The k-Nearest Neighbor Algorithm

- All instances correspond to points in the n-dimensional space
- The nearest neighbors are defined in terms of Euclidean distance
- The target function may be discrete- or real-valued
- For discrete-valued targets, k-NN returns the most common value among the k training examples nearest to xq
- Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples
Discussion on the k-NN Algorithm

- The k-NN algorithm for continuous-valued target functions
  - Calculate the mean value of the k nearest neighbors
- Distance-weighted nearest neighbor algorithm
  - Weight the contribution of each of the k neighbors according to their distance to the query point xq, giving greater weight to closer neighbors:

    w \equiv \frac{1}{d(x_q, x_i)^2}

  - Similarly for real-valued target functions
- Robust to noisy data by averaging the k nearest neighbors
- Curse of dimensionality: the distance between neighbors could be dominated by irrelevant attributes
  - To overcome it, stretch the axes or eliminate the least relevant attributes
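
A sketch of distance-weighted k-NN for a discrete-valued target (the toy 2-D data and the helper names are made up for illustration):

```python
import math
from collections import Counter

def knn_predict(query, examples, k=3, weighted=True):
    """Distance-weighted k-NN vote; w = 1 / d(x_q, x_i)^2."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    neighbors = sorted(examples, key=lambda e: dist(query, e[0]))[:k]
    votes = Counter()
    for point, label in neighbors:
        d = dist(query, point)
        votes[label] += 1.0 / (d ** 2) if weighted and d > 0 else 1.0
    return votes.most_common(1)[0][0]

# toy 2-D examples: ((x1, x2), class)
data = [((1, 1), '+'), ((1, 2), '+'), ((4, 4), '-'), ((5, 4), '-'), ((5, 5), '-')]
print(knn_predict((2, 2), data, k=3))   # '+'
```
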
Case-Based Reasoning

- Also uses lazy evaluation and the analysis of similar instances
- Difference: instances are not "points in a Euclidean space"
- Example: the water faucet problem in CADET (Sycara et al.'92)

Methodology
- Instances are represented by rich symbolic descriptions (e.g., function graphs)
- Multiple retrieved cases may be combined
- Tight coupling between case retrieval, knowledge-based reasoning, and problem solving

Research issues
- Indexing based on a syntactic similarity measure and, on failure, backtracking and adapting to additional cases
Lazy vs. Eager Learning

- Instance-based learning: lazy evaluation
- Decision-tree and Bayesian classification: eager evaluation
- Key differences
  - A lazy method may consider the query instance xq when deciding how to generalize beyond the training data D
  - An eager method cannot, since it has already chosen its global approximation by the time it sees the query
- Efficiency: lazy methods spend less time training but more time predicting
- Accuracy
  - Lazy methods effectively use a richer hypothesis space, since they use many local linear functions to form an implicit global approximation to the target function
  - Eager methods must commit to a single hypothesis that covers the entire instance space
Genetic Algorithms

- GA: based on an analogy to biological evolution
- Each rule is represented by a string of bits
- An initial population is created, consisting of randomly generated rules
  - e.g., IF A1 AND NOT A2 THEN C2 can be encoded as 100
- Based on the notion of survival of the fittest, a new population is formed that consists of the fittest rules and their offspring
- The fitness of a rule is represented by its classification accuracy on a set of training examples
- Offspring are generated by crossover and mutation
Rough Set Approach

- Rough sets are used to approximately or "roughly" define equivalence classes
- A rough set for a given class C is approximated by two sets: a lower approximation (certain to be in C) and an upper approximation (cannot be described as not belonging to C)
- Finding the minimal subsets (reducts) of attributes (for feature reduction) is NP-hard, but a discernibility matrix can be used to reduce the computation intensity
Fuzzy Set Approaches

- Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (e.g., using a fuzzy membership graph)
- Attribute values are converted to fuzzy values
  - e.g., income is mapped into the discrete categories {low, medium, high}, with fuzzy values calculated for each
- For a given new sample, more than one fuzzy value may apply
- Each applicable rule contributes a vote for membership in the categories
- Typically, the truth values for each predicted category are summed
Lecture-39
Prediction

What Is Prediction?

Prediction is similar to classification
- First, construct a model
- Second, use the model to predict unknown values

The major method for prediction is regression
- linear and multiple regression
- non-linear regression

Prediction is different from classification
- Classification refers to predicting categorical class labels
- Prediction models continuous-valued functions
Predictive Modeling in Databases

- Predictive modeling: predict data values or construct generalized linear models based on the database data
- One can only predict value ranges or category distributions
- Method outline:
  - Minimal generalization
  - Attribute relevance analysis
  - Generalized linear model construction
  - Prediction
- Determine the major factors that influence the prediction
  - Data relevance analysis: uncertainty measurement, entropy analysis, expert judgement, etc.
- Multi-level prediction: drill-down and roll-up analysis
Regression Analysis and Log-Linear Models in Prediction

Linear regression: Y = α + β X
- The two parameters, α and β, specify the line and are to be estimated from the data at hand
- They are fitted by applying the least squares criterion to the known values Y1, Y2, ..., X1, X2, ...

Multiple regression: Y = b0 + b1 X1 + b2 X2
- Many nonlinear functions can be transformed into the above

Log-linear models
- The multi-way table of joint probabilities is approximated by a product of lower-order tables
- Probability: p(a, b, c, d) = \alpha_{ab} \, \beta_{ac} \, \chi_{ad} \, \delta_{bcd}
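
A sketch of estimating α and β by least squares with numpy (the data points are made up; a real application would use the known Y1, Y2, ... values):

```python
import numpy as np

# toy data: Y roughly follows 2 + 3 X plus noise
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([5.1, 7.9, 11.2, 13.8, 17.1])

# least squares fit for Y = alpha + beta * X
A = np.column_stack([np.ones_like(X), X])
(alpha, beta), *_ = np.linalg.lstsq(A, Y, rcond=None)
print(alpha, beta)        # approximately 2 and 3

# multiple regression Y = b0 + b1 X1 + b2 X2 works the same way,
# with an extra column appended to A for X2
```
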
Locally Weighted Regression

- Construct an explicit approximation to f over a local region surrounding the query instance xq
- Locally weighted linear regression:
  - The target function f is approximated near xq using the linear function

    \hat{f}(x) = w_0 + w_1 a_1(x) + \dots + w_n a_n(x)

  - Minimize the squared error over the k nearest neighbors of xq, using a distance-decreasing weight K:

    E(x_q) \equiv \frac{1}{2} \sum_{x \in \text{k nearest neighbors of } x_q} (f(x) - \hat{f}(x))^2 \, K(d(x_q, x))

  - The gradient descent training rule:

    \Delta w_j \equiv \eta \sum_{x \in \text{k nearest neighbors of } x_q} K(d(x_q, x)) \, (f(x) - \hat{f}(x)) \, a_j(x)

- In most cases, the target function is approximated by a constant, linear, or quadratic function
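
A sketch of locally weighted linear regression around a single query point. For brevity it solves the weighted squared-error criterion in closed form with lstsq rather than iterating the gradient rule above, and it assumes a Gaussian choice for the kernel K:

```python
import numpy as np

def lwr_predict(xq, X, y, tau=0.3):
    """Fit f_hat(x) = w0 + w1*x around the query xq, weighting each training
    point by a distance-decreasing kernel K(d(xq, x)), then evaluate at xq."""
    K = np.exp(-((X - xq) ** 2) / (2 * tau ** 2))        # Gaussian kernel (assumed)
    A = np.column_stack([np.ones(len(X)), X])
    # minimizing sum K * (f(x) - f_hat(x))^2 is ordinary least squares on sqrt(K)-scaled data
    w, *_ = np.linalg.lstsq(A * np.sqrt(K)[:, None], y * np.sqrt(K), rcond=None)
    return np.array([1.0, xq]) @ w

X = np.linspace(0.0, 4.0, 41)
y = np.sin(X)                                            # nonlinear target, approximated locally by a line
print(lwr_predict(2.0, X, y))                            # a local linear approximation of sin(x) near x = 2
```
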
Lecture-40
Classification accuracy

Classification Accuracy: Estimating Error Rates

Partition: training-and-testing
- use two independent data sets: a training set and a test set
- used for data sets with a large number of samples

Cross-validation
- divide the data set into k subsamples
- use k-1 subsamples as training data and one subsample as test data (k-fold cross-validation)
- used for data sets of moderate size

Bootstrapping (leave-one-out)
- for small data sets
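
A sketch of k-fold cross-validation; `train_and_test` is a placeholder for any routine that builds a classifier on the training split and returns its accuracy on the test split:

```python
import random

def k_fold_cross_validation(samples, labels, train_and_test, k=10, seed=0):
    """Each of the k subsamples serves once as test data while the
    remaining k-1 subsamples are used for training; return the mean accuracy."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        accuracies.append(train_and_test(
            [samples[j] for j in train], [labels[j] for j in train],
            [samples[j] for j in test], [labels[j] for j in test]))
    return sum(accuracies) / k
```
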
Boosting and Bagging

- Boosting increases classification accuracy
  - Applicable to decision trees or Bayesian classifiers
- Learn a series of classifiers, where each classifier in the series pays more attention to the examples misclassified by its predecessor
- Boosting requires only linear time and constant space
Boosting Technique: Algorithm

- Assign every example an equal weight 1/N
- For t = 1, 2, ..., T do
  - Obtain a hypothesis (classifier) h(t) under the weights w(t)
  - Calculate the error of h(t) and re-weight the examples based on the error
  - Normalize w(t+1) to sum to 1
- Output a weighted sum of all the hypotheses, with each hypothesis weighted according to its accuracy on the training set
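
A sketch of the re-weighting loop in the style of AdaBoost; the base learner is a placeholder, and the specific alpha/weight-update formulas are one common choice rather than necessarily the ones intended on the slide:

```python
import math

def boost(samples, labels, learn_weak, T=10):
    """Maintain a weight per example; misclassified examples gain weight each round,
    and each hypothesis is weighted by its accuracy (AdaBoost-style alpha)."""
    n = len(samples)
    w = [1.0 / n] * n                              # every example starts with weight 1/N
    ensemble = []                                  # list of (alpha, hypothesis)
    for _ in range(T):
        h = learn_weak(samples, labels, w)         # hypothesis h(t) learned under weights w(t)
        miss = [h(x) != y for x, y in zip(samples, labels)]
        err = sum(wi for wi, m in zip(w, miss) if m)
        if err == 0:                               # perfect hypothesis: keep it and stop
            ensemble.append((1.0, h))
            break
        if err >= 0.5:                             # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        w = [wi * math.exp(alpha if m else -alpha) for wi, m in zip(w, miss)]
        s = sum(w)
        w = [wi / s for wi in w]                   # normalize w(t+1) to sum to 1
        ensemble.append((alpha, h))
    def predict(x):                                # weighted vote of all hypotheses
        votes = {}
        for alpha, h in ensemble:
            votes[h(x)] = votes.get(h(x), 0.0) + alpha
        return max(votes, key=votes.get)
    return predict
```
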

08 classbasic
engrasi
 
08 classbasic
08 classbasic08 classbasic
08 classbasic
ritumysterious1
 

Similar to Classification and prediction (20)

Dm bs-lec7-classification - dti
Dm bs-lec7-classification - dtiDm bs-lec7-classification - dti
Dm bs-lec7-classification - dti
 
dataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptxdataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptx
 
Data mining chapter04and5-best
Data mining chapter04and5-bestData mining chapter04and5-best
Data mining chapter04and5-best
 
classification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdfclassification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdf
 
5 learning
5 learning5 learning
5 learning
 
3 classification
3  classification3  classification
3 classification
 
DM Unit-III ppt.ppt
DM Unit-III ppt.pptDM Unit-III ppt.ppt
DM Unit-III ppt.ppt
 
Lect8 Classification & prediction
Lect8 Classification & predictionLect8 Classification & prediction
Lect8 Classification & prediction
 
Dataming-chapter-7-Classification-Basic.pptx
Dataming-chapter-7-Classification-Basic.pptxDataming-chapter-7-Classification-Basic.pptx
Dataming-chapter-7-Classification-Basic.pptx
 
unit classification.pptx
unit  classification.pptxunit  classification.pptx
unit classification.pptx
 
Data mining: Classification and Prediction
Data mining: Classification and PredictionData mining: Classification and Prediction
Data mining: Classification and Prediction
 
Mis End Term Exam Theory Concepts
Mis End Term Exam Theory ConceptsMis End Term Exam Theory Concepts
Mis End Term Exam Theory Concepts
 
08 classbasic
08 classbasic08 classbasic
08 classbasic
 
Dbm630 lecture06
Dbm630 lecture06Dbm630 lecture06
Dbm630 lecture06
 
Research trends in data warehousing and data mining
Research trends in data warehousing and data miningResearch trends in data warehousing and data mining
Research trends in data warehousing and data mining
 
Unit 3classification
Unit 3classificationUnit 3classification
Unit 3classification
 
Classification in Data Mining
Classification in Data MiningClassification in Data Mining
Classification in Data Mining
 
Business Analytics using R.ppt
Business Analytics using R.pptBusiness Analytics using R.ppt
Business Analytics using R.ppt
 
08 classbasic
08 classbasic08 classbasic
08 classbasic
 
08 classbasic
08 classbasic08 classbasic
08 classbasic
 

More from Acad

routing alg.pptx
routing alg.pptxrouting alg.pptx
routing alg.pptx
Acad
 
Network Layer design Issues.pptx
Network Layer design Issues.pptxNetwork Layer design Issues.pptx
Network Layer design Issues.pptx
Acad
 
Computer Science basics
Computer Science basics Computer Science basics
Computer Science basics
Acad
 
Union
UnionUnion
Union
Acad
 
Stacks
StacksStacks
Stacks
Acad
 
Str
StrStr
Str
Acad
 
Functions
FunctionsFunctions
Functions
Acad
 
File
FileFile
File
Acad
 
Ds
DsDs
Ds
Acad
 
Dma
DmaDma
Dma
Acad
 
Botnet detection by Imitation method
Botnet detection  by Imitation methodBotnet detection  by Imitation method
Botnet detection by Imitation method
Acad
 
Bot net detection by using ssl encryption
Bot net detection by using ssl encryptionBot net detection by using ssl encryption
Bot net detection by using ssl encryption
Acad
 
An Aggregate Location Monitoring System Of Privacy Preserving In Authenticati...
An Aggregate Location Monitoring System Of Privacy Preserving In Authenticati...An Aggregate Location Monitoring System Of Privacy Preserving In Authenticati...
An Aggregate Location Monitoring System Of Privacy Preserving In Authenticati...
Acad
 
Literature survey on peer to peer botnets
Literature survey on peer to peer botnetsLiterature survey on peer to peer botnets
Literature survey on peer to peer botnets
Acad
 
multi processors
multi processorsmulti processors
multi processors
Acad
 

More from Acad (15)

routing alg.pptx
routing alg.pptxrouting alg.pptx
routing alg.pptx
 
Network Layer design Issues.pptx
Network Layer design Issues.pptxNetwork Layer design Issues.pptx
Network Layer design Issues.pptx
 
Computer Science basics
Computer Science basics Computer Science basics
Computer Science basics
 
Union
UnionUnion
Union
 
Stacks
StacksStacks
Stacks
 
Str
StrStr
Str
 
Functions
FunctionsFunctions
Functions
 
File
FileFile
File
 
Ds
DsDs
Ds
 
Dma
DmaDma
Dma
 
Botnet detection by Imitation method
Botnet detection  by Imitation methodBotnet detection  by Imitation method
Botnet detection by Imitation method
 
Bot net detection by using ssl encryption
Bot net detection by using ssl encryptionBot net detection by using ssl encryption
Bot net detection by using ssl encryption
 
An Aggregate Location Monitoring System Of Privacy Preserving In Authenticati...
An Aggregate Location Monitoring System Of Privacy Preserving In Authenticati...An Aggregate Location Monitoring System Of Privacy Preserving In Authenticati...
An Aggregate Location Monitoring System Of Privacy Preserving In Authenticati...
 
Literature survey on peer to peer botnets
Literature survey on peer to peer botnetsLiterature survey on peer to peer botnets
Literature survey on peer to peer botnets
 
multi processors
multi processorsmulti processors
multi processors
 

Recently uploaded

2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
bennyroshan06
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
Celine George
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
Fundacja Rozwoju Społeczeństwa Przedsiębiorczego
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
Celine George
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
rosedainty
 

Recently uploaded (20)

2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
 

Classification and prediction

  • 7. Lecture-33: Issues regarding classification and prediction
  • 8. Issues regarding classification and prediction: preparing the data for classification and prediction
    Data cleaning: preprocess data in order to reduce noise and handle missing values
    Relevance analysis (feature selection): remove the irrelevant or redundant attributes
    Data transformation: generalize and/or normalize data
    Lecture-33 - Issues regarding classification and prediction
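To make the data transformation step concrete, here is a minimal sketch of min-max normalization of one numeric attribute. This is an illustration added here, not code from the deck; the attribute values and target range are made up.

    def min_max_normalize(values, new_min=0.0, new_max=1.0):
        """Rescale a list of numeric attribute values to [new_min, new_max]."""
        old_min, old_max = min(values), max(values)
        if old_max == old_min:                 # constant attribute: map everything to new_min
            return [new_min for _ in values]
        scale = (new_max - new_min) / (old_max - old_min)
        return [new_min + (v - old_min) * scale for v in values]

    # e.g. normalizing a hypothetical numeric attribute before classification
    print(min_max_normalize([3, 7, 2, 7, 6, 3]))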
  • 9. Issues regarding classification and prediction: comparing classification methods
    Accuracy
    Speed and scalability: time to construct the model; time to use the model
    Robustness: handling noise and missing values
    Scalability: efficiency in disk-resident databases
    Interpretability: understanding and insight provided by the model; decision tree size; compactness of classification rules
    Lecture-33 - Issues regarding classification and prediction
  • 10. Lecture-34: Classification by decision tree induction
    Lecture-34 - Classification by decision tree induction
  • 11. Classification by Decision Tree Induction
    Decision tree
     A flow-chart-like tree structure
     Internal node denotes a test on an attribute
     Branch represents an outcome of the test
     Leaf nodes represent class labels or class distributions
    Decision tree generation consists of two phases
     Tree construction: at start, all the training examples are at the root; partition examples recursively based on selected attributes
     Tree pruning: identify and remove branches that reflect noise or outliers
    Use of decision tree: classifying an unknown sample
     Test the attribute values of the sample against the decision tree
    Lecture-34 - Classification by decision tree induction
  • 12. Training Dataset (this follows an example from Quinlan's ID3)
    age     income  student  credit_rating
    <=30    high    no       fair
    <=30    high    no       excellent
    31…40   high    no       fair
    >40     medium  no       fair
    >40     low     yes      fair
    >40     low     yes      excellent
    31…40   low     yes      excellent
    <=30    medium  no       fair
    <=30    low     yes      fair
    >40     medium  yes      fair
    <=30    medium  yes      excellent
    31…40   medium  no       excellent
    31…40   high    yes      fair
    >40     medium  no       excellent
    Lecture-34 - Classification by decision tree induction
  • 13. Output: A Decision Tree for "buys_computer"
    [Decision tree figure: the root tests age?; the <=30 branch leads to a student? test, the 30..40 branch is a "yes" leaf, and the >40 branch leads to a credit rating? test, with yes/no leaves corresponding to the rules on slide 20.]
    Lecture-34 - Classification by decision tree induction
  • 14. Algorithm for Decision Tree Induction
    Basic algorithm (a greedy algorithm)
     Tree is constructed in a top-down recursive divide-and-conquer manner
     At start, all the training examples are at the root
     Attributes are categorical (if continuous-valued, they are discretized in advance)
     Examples are partitioned recursively based on selected attributes
     Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
    Conditions for stopping partitioning
     All samples for a given node belong to the same class
     There are no remaining attributes for further partitioning; majority voting is employed for classifying the leaf
     There are no samples left
    Lecture-34 - Classification by decision tree induction
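A compact sketch of this greedy, top-down procedure might look like the following. It is an illustration added here, not the deck's code; the data layout (each example as a dict of attribute values, with class labels in a parallel list) is an assumption, and the selection measure is the information gain defined on slides 16-17. It could be run on the slide-12 table once the buys_computer labels are supplied.

    from collections import Counter
    from math import log2

    def entropy(labels):
        total = len(labels)
        return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

    def info_gain(rows, labels, attr):
        # expected information after splitting on attr, subtracted from the parent entropy
        total = len(labels)
        split = {}
        for row, y in zip(rows, labels):
            split.setdefault(row[attr], []).append(y)
        remainder = sum(len(ys) / total * entropy(ys) for ys in split.values())
        return entropy(labels) - remainder

    def build_tree(rows, labels, attrs):
        if len(set(labels)) == 1:                  # all samples in one class -> leaf
            return labels[0]
        if not attrs:                              # no attributes left -> majority vote
            return Counter(labels).most_common(1)[0][0]
        best = max(attrs, key=lambda a: info_gain(rows, labels, a))
        node = {best: {}}
        for value in {row[best] for row in rows}:  # partition recursively on the best attribute
            idx = [i for i, r in enumerate(rows) if r[best] == value]
            node[best][value] = build_tree([rows[i] for i in idx],
                                           [labels[i] for i in idx],
                                           [a for a in attrs if a != best])
        return node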
  • 15. Attribute Selection Measure
    Information gain (ID3/C4.5)
     All attributes are assumed to be categorical
     Can be modified for continuous-valued attributes
    Gini index (IBM IntelligentMiner)
     All attributes are assumed continuous-valued
     Assume there exist several possible split values for each attribute
     May need other tools, such as clustering, to get the possible split values
     Can be modified for categorical attributes
    Lecture-34 - Classification by decision tree induction
  • 16. Information Gain (ID3/C4.5)
    Select the attribute with the highest information gain
    Assume there are two classes, P and N
     Let the set of examples S contain p elements of class P and n elements of class N
     The amount of information needed to decide if an arbitrary example in S belongs to P or N is defined as
       I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}
    Lecture-34 - Classification by decision tree induction
  • 17. Information Gain in Decision Tree Induction
    Assume that using attribute A a set S will be partitioned into sets {S_1, S_2, ..., S_v}
     If S_i contains p_i examples of P and n_i examples of N, the entropy, or the expected information needed to classify objects in all subtrees S_i, is
       E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I(p_i, n_i)
    The encoding information that would be gained by branching on A is
       Gain(A) = I(p, n) - E(A)
    Lecture-34 - Classification by decision tree induction
  • 18. Attribute Selection by Information Gain Computation
    Class P: buys_computer = "yes"
    Class N: buys_computer = "no"
    I(p, n) = I(9, 5) = 0.940
    Compute the entropy for age:
      age     p_i  n_i  I(p_i, n_i)
      <=30    2    3    0.971
      30…40   4    0    0
      >40     3    2    0.971
      E(age) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.69
    Hence Gain(age) = I(p, n) - E(age)
    Similarly
      Gain(income) = 0.029
      Gain(student) = 0.151
      Gain(credit_rating) = 0.048
    Lecture-34 - Classification by decision tree induction
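The numbers on this slide can be checked with a few lines of Python. This is an illustrative verification added here, not part of the deck; the (p_i, n_i) counts per age value are taken directly from the table above.

    from math import log2

    def I(p, n):
        """Information needed to classify a set with p positive and n negative examples."""
        def term(x):
            return 0.0 if x == 0 else -(x / (p + n)) * log2(x / (p + n))
        return term(p) + term(n)

    # (p_i, n_i) counts per value of age, taken from the slide
    age_counts = {"<=30": (2, 3), "30...40": (4, 0), ">40": (3, 2)}

    p, n = 9, 5
    E_age = sum((pi + ni) / (p + n) * I(pi, ni) for pi, ni in age_counts.values())
    print(round(I(p, n), 3))          # 0.940
    print(round(E_age, 3))            # 0.694 (shown as 0.69 on the slide)
    print(round(I(p, n) - E_age, 3))  # Gain(age): the largest gain, so age is selected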
  • 19. Gini Index (IBM IntelligentMiner)
    If a data set T contains examples from n classes, the gini index gini(T) is defined as
      gini(T) = 1 - \sum_{j=1}^{n} p_j^2
    where p_j is the relative frequency of class j in T.
    If a data set T is split into two subsets T_1 and T_2 with sizes N_1 and N_2 respectively, the gini index of the split data is defined as
      gini_{split}(T) = \frac{N_1}{N} gini(T_1) + \frac{N_2}{N} gini(T_2)
    The attribute that provides the smallest gini_{split}(T) is chosen to split the node (need to enumerate all possible splitting points for each attribute).
    Lecture-34 - Classification by decision tree induction
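A minimal sketch of these two formulas in Python, added here for illustration; the example labels and the particular binary split are made up, not from the deck.

    def gini(labels):
        """gini(T) = 1 - sum of squared relative class frequencies."""
        total = len(labels)
        counts = {}
        for y in labels:
            counts[y] = counts.get(y, 0) + 1
        return 1.0 - sum((c / total) ** 2 for c in counts.values())

    def gini_split(left, right):
        """Weighted gini index of a binary split into subsets T1 and T2."""
        n = len(left) + len(right)
        return len(left) / n * gini(left) + len(right) / n * gini(right)

    # a hypothetical set of 14 class labels (9 'yes', 5 'no') and one candidate split
    print(gini(["yes"] * 9 + ["no"] * 5))
    print(gini_split(["yes"] * 6 + ["no"], ["yes"] * 3 + ["no"] * 4))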
  • 20. Extracting Classification Rules from Trees
    Represent the knowledge in the form of IF-THEN rules
    One rule is created for each path from the root to a leaf
    Each attribute-value pair along a path forms a conjunction
    The leaf node holds the class prediction
    Rules are easier for humans to understand
    Example
      IF age = "<=30" AND student = "no"  THEN buys_computer = "no"
      IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
      IF age = "31…40"                    THEN buys_computer = "yes"
      IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "yes"
      IF age = ">40" AND credit_rating = "fair"      THEN buys_computer = "no"
    Lecture-34 - Classification by decision tree induction
  • 21. Avoid Overfitting in Classification
    The generated tree may overfit the training data
     Too many branches, some of which may reflect anomalies due to noise or outliers
     The result is poor accuracy for unseen samples
    Two approaches to avoid overfitting
     Prepruning: halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
       Difficult to choose an appropriate threshold
     Postpruning: remove branches from a "fully grown" tree to get a sequence of progressively pruned trees
       Use a set of data different from the training data to decide which is the "best pruned tree"
    Lecture-34 - Classification by decision tree induction
  • 22. Approaches to Determine the Final Tree Size
    Separate training and testing sets
    Use cross validation, e.g., 10-fold cross validation
    Use all the data for training
     Apply a statistical test (chi-square) to estimate whether expanding or pruning a node may improve the entire distribution
    Use the minimum description length (MDL) principle
     Halt growth of the tree when the encoding is minimized
    Lecture-34 - Classification by decision tree induction
  • 23. Enhancements to Basic Decision Tree Induction
    Allow for continuous-valued attributes
     Dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals
    Handle missing attribute values
     Assign the most common value of the attribute
     Assign a probability to each of the possible values
    Attribute construction
     Create new attributes based on existing ones that are sparsely represented
     This reduces fragmentation, repetition, and replication
    Lecture-34 - Classification by decision tree induction
  • 24. Classification in Large Databases
    Classification: a classical problem extensively studied by statisticians and machine learning researchers
    Scalability: classifying data sets with millions of examples and hundreds of attributes with reasonable speed
    Why decision tree induction in data mining?
     Relatively faster learning speed (than other classification methods)
     Convertible to simple and easy-to-understand classification rules
     Can use SQL queries for accessing databases
     Comparable classification accuracy with other methods
    Lecture-34 - Classification by decision tree induction
  • 25. Scalable Decision Tree Induction Methods in Data Mining Studies
    SLIQ (EDBT'96, Mehta et al.): builds an index for each attribute; only the class list and the current attribute list reside in memory
    SPRINT (VLDB'96, J. Shafer et al.): constructs an attribute-list data structure
    PUBLIC (VLDB'98, Rastogi & Shim): integrates tree splitting and tree pruning; stops growing the tree earlier
    RainForest (VLDB'98, Gehrke, Ramakrishnan & Ganti): separates the scalability aspects from the criteria that determine the quality of the tree; builds an AVC-list (attribute, value, class label)
    Lecture-34 - Classification by decision tree induction
  • 27. Bayesian Classification
    Statistical classifiers
    Based on Bayes' theorem
    Naïve Bayesian classification
    Class conditional independence
    Bayesian belief networks
    Lecture-35 - Bayesian Classification
  • 28. Bayesian Classification
    Probabilistic learning: calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
    Incremental: each training example can incrementally increase/decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
    Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities
    Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured
    Lecture-35 - Bayesian Classification
  • 29. Bayes' Theorem
    Let X be a data tuple and H be a hypothesis such that X belongs to a specific class C.
    The posterior probability of the hypothesis H given X, P(H|X), follows Bayes' theorem:
      P(H|X) = \frac{P(X|H)\,P(H)}{P(X)}
    Lecture-35 - Bayesian Classification
  • 30. Naïve Bayes Classifier
    Let D be a training set of tuples and their associated class labels
    X = (x1, x2, x3, ..., xn), and the m classes are C1, C2, C3, ..., Cm
    Bayes' theorem: P(Ci|X) = P(X|Ci)·P(Ci) / P(X)
    Naïve Bayes predicts that X belongs to class Ci if and only if
      P(Ci|X) > P(Cj|X) for 1 <= j <= m, j != i
    Lecture-35 - Bayesian Classification
  • 31. Naïve Bayes Classifier
    P(X) is constant for all classes
    If the class priors are not known, they are commonly assumed equal: P(C1) = P(C2) = ... = P(Cm)
    So only P(X|Ci)·P(Ci) needs to be maximized
    P(Ci) = |Ci,D| / |D|
    Lecture-35 - Bayesian Classification
  • 32. Naïve Bayesian Classification
    Naïve assumption: attribute independence
      P(x1, ..., xk | C) = P(x1|C) · ... · P(xk|C)
    If the attribute is categorical: P(xi|C) is estimated as the relative frequency of samples having value xi as the i-th attribute in class C
    If the attribute is continuous: P(xi|C) is estimated through a Gaussian density function
    Computationally easy in both cases
    Lecture-35 - Bayesian Classification
  • 33. Play-tennis example: estimating P(xi|C)
    Outlook   Temperature  Humidity  Windy  Class
    sunny     hot          high      false  N
    sunny     hot          high      true   N
    overcast  hot          high      false  P
    rain      mild         high      false  P
    rain      cool         normal    false  P
    rain      cool         normal    true   N
    overcast  cool         normal    true   P
    sunny     mild         high      false  N
    sunny     cool         normal    false  P
    rain      mild         normal    false  P
    sunny     mild         normal    true   P
    overcast  mild         high      true   P
    overcast  hot          normal    false  P
    rain      mild         high      true   N

    P(p) = 9/14, P(n) = 5/14
    outlook:     P(sunny|p) = 2/9,    P(sunny|n) = 3/5
                 P(overcast|p) = 4/9, P(overcast|n) = 0
                 P(rain|p) = 3/9,     P(rain|n) = 2/5
    temperature: P(hot|p) = 2/9,      P(hot|n) = 2/5
                 P(mild|p) = 4/9,     P(mild|n) = 2/5
                 P(cool|p) = 3/9,     P(cool|n) = 1/5
    humidity:    P(high|p) = 3/9,     P(high|n) = 4/5
                 P(normal|p) = 6/9,   P(normal|n) = 2/5
    windy:       P(true|p) = 3/9,     P(true|n) = 3/5
                 P(false|p) = 6/9,    P(false|n) = 2/5
    Lecture-35 - Bayesian Classification
  • 34. Play-tennis example: classifying X
    An unseen sample X = <rain, hot, high, false>
    P(X|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582
    P(X|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286
    Sample X is classified in class n (don't play)
    Lecture-35 - Bayesian Classification
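These two products can be verified with a short script. This is an illustrative sketch added here, not the deck's code; it hard-codes only the conditional probabilities from slide 33 that the example needs.

    from functools import reduce

    # conditional probabilities from slide 33, indexed as cond_prob[class][attribute][value]
    cond_prob = {
        "p": {"outlook": {"rain": 3/9}, "temperature": {"hot": 2/9},
              "humidity": {"high": 3/9}, "windy": {"false": 6/9}},
        "n": {"outlook": {"rain": 2/5}, "temperature": {"hot": 2/5},
              "humidity": {"high": 4/5}, "windy": {"false": 2/5}},
    }
    prior = {"p": 9/14, "n": 5/14}

    x = {"outlook": "rain", "temperature": "hot", "humidity": "high", "windy": "false"}

    for c in ("p", "n"):
        score = reduce(lambda acc, a: acc * cond_prob[c][a][x[a]], x, prior[c])
        print(c, round(score, 6))   # p 0.010582, n 0.018286 -> classify X as n (don't play)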
  • 35. How effective are Bayesian classifiers?
    The class conditional independence assumption makes the computation possible
    It yields optimal classifiers when it is satisfied
    But it is seldom satisfied in practice, as attributes (variables) are often correlated
    Attempts to overcome this limitation:
     Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
     Decision trees, which reason on one attribute at a time, considering the most important attributes first
    Lecture-35 - Bayesian Classification
  • 36. Bayesian Belief Networks (I)
    [Belief network figure over the variables Family History, Smoker, LungCancer, Emphysema, PositiveXRay, and Dyspnea.]
    The conditional probability table for the variable LungCancer:
           (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
      LC    0.8      0.5       0.7       0.1
      ~LC   0.2      0.5       0.3       0.9
    Lecture-35 - Bayesian Classification
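A minimal sketch, added here and not from the deck, of how this conditional probability table could be stored and queried; the dict-based layout and function name are assumptions made for illustration.

    # CPT for LungCancer given its parents (FamilyHistory, Smoker), keyed by parent truth values
    cpt_lung_cancer = {
        (True, True): 0.8, (True, False): 0.5,
        (False, True): 0.7, (False, False): 0.1,   # P(LC | FH, S); P(~LC | ...) is the complement
    }

    def p_lung_cancer(family_history, smoker, lung_cancer=True):
        p = cpt_lung_cancer[(family_history, smoker)]
        return p if lung_cancer else 1.0 - p

    print(p_lung_cancer(True, False))          # 0.5
    print(p_lung_cancer(False, False, False))  # 0.9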
  • 37. Bayesian Belief Networks
    A Bayesian belief network allows a subset of the variables to be conditionally independent
    A graphical model of causal relationships
    Several cases of learning Bayesian belief networks
     Given both the network structure and all the variables: easy
     Given the network structure but only some of the variables
     When the network structure is not known in advance
    Lecture-35 - Bayesian Classification
  • 39. Classification by Backpropagation
    A neural network learning algorithm
    Originated with psychologists and neurobiologists
    Neural network: a set of connected input/output units in which each connection has a weight associated with it
    Lecture-36 - Classification by Backpropagation
  • 40. Neural Networks
    Advantages
     Prediction accuracy is generally high
     Robust: works when training examples contain errors
     Output may be discrete, real-valued, or a vector of several discrete or real-valued attributes
     Fast evaluation of the learned target function
    Limitations
     Long training time
     Difficult to understand the learned function (weights)
     Not easy to incorporate domain knowledge
    Lecture-36 - Classification by Backpropagation
  • 41. A Neuron
    The n-dimensional input vector x is mapped into the variable y by means of the scalar product and a nonlinear function mapping
    [Neuron figure: inputs x0, x1, ..., xn with weights w0, w1, ..., wn feed a weighted sum (offset by the bias µk), which passes through the activation function f to produce the output y.]
    Lecture-36 - Classification by Backpropagation
  • 42. Network Training
    The ultimate objective of training
     Obtain a set of weights that makes almost all the tuples in the training data classified correctly
    Steps
     Initialize weights with random values
     Feed the input tuples into the network one by one
     For each unit
       Compute the net input to the unit as a linear combination of all the inputs to the unit
       Compute the output value using the activation function
       Compute the error
       Update the weights and the bias
    Lecture-36 - Classification by Backpropagation
  • 43. Multi-Layer Perceptron
    [Figure: the input vector x_i feeds the input nodes; hidden nodes connect to output nodes through weights w_ij, producing the output vector.]
    I_j = \sum_i w_{ij} O_i + \theta_j
    O_j = \frac{1}{1 + e^{-I_j}}
    Err_j = O_j (1 - O_j)(T_j - O_j)                 (output units)
    Err_j = O_j (1 - O_j) \sum_k Err_k w_{jk}        (hidden units)
    w_{ij} = w_{ij} + (l)\, Err_j\, O_i
    \theta_j = \theta_j + (l)\, Err_j
    Lecture-36 - Classification by Backpropagation
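A minimal NumPy sketch of one training step for a single-hidden-layer network using exactly these update rules. It is an illustration under assumptions, not the deck's code: the layer sizes, learning rate l, and sample input/target are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out, l = 3, 4, 2, 0.1          # layer sizes and learning rate (arbitrary)

    W1 = rng.normal(scale=0.5, size=(n_in, n_hid));  b1 = np.zeros(n_hid)
    W2 = rng.normal(scale=0.5, size=(n_hid, n_out)); b2 = np.zeros(n_out)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_step(x, t):
        """One forward/backward pass following the slide's formulas."""
        global W1, b1, W2, b2
        o_hid = sigmoid(x @ W1 + b1)              # I_j = sum_i w_ij O_i + theta_j, O_j = 1/(1+e^-I_j)
        o_out = sigmoid(o_hid @ W2 + b2)
        err_out = o_out * (1 - o_out) * (t - o_out)             # Err_j = O_j(1-O_j)(T_j-O_j)
        err_hid = o_hid * (1 - o_hid) * (W2 @ err_out)           # Err_j = O_j(1-O_j) sum_k Err_k w_jk
        W2 += l * np.outer(o_hid, err_out); b2 += l * err_out    # w_ij += (l) Err_j O_i, theta_j += (l) Err_j
        W1 += l * np.outer(x, err_hid);     b1 += l * err_hid
        return o_out

    print(train_step(np.array([1.0, 0.0, 1.0]), np.array([1.0, 0.0])))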
  • 44. Lecture-37: Classification based on concepts from association rule mining
  • 45. Association-Based Classification
    Several methods for association-based classification
     ARCS: quantitative association mining and clustering of association rules (Lent et al'97)
       It beats C4.5 in (mainly) scalability and also accuracy
     Associative classification (Liu et al'98)
       It mines high-support and high-confidence rules in the form of "cond_set => y", where y is a class label
    Lecture-37 - Classification based on concepts from association rule mining
  • 46. Association-Based Classification
    CAEP (Classification by Aggregating Emerging Patterns) (Dong et al'99)
     Emerging patterns (EPs): the itemsets whose support increases significantly from one class to another
     Mine EPs based on minimum support and growth rate
    Lecture-37 - Classification based on concepts from association rule mining
  • 48. Other Classification Methods
    k-nearest neighbor classifier
    Case-based reasoning
    Genetic algorithms
    Rough set approach
    Fuzzy set approaches
    Lecture-38 - Other Classification Methods
  • 49. Instance-Based Methods
    Instance-based learning
     Store training examples and delay the processing ("lazy evaluation") until a new instance must be classified
    Approaches
     k-nearest neighbor approach: instances represented as points in a Euclidean space
     Locally weighted regression: constructs a local approximation
     Case-based reasoning: uses symbolic representations and knowledge-based inference
    Lecture-38 - Other Classification Methods
  • 50. The k-Nearest Neighbor Algorithm
    All instances correspond to points in the n-dimensional space
    The nearest neighbors are defined in terms of Euclidean distance
    The target function could be discrete- or real-valued
    For discrete-valued targets, k-NN returns the most common value among the k training examples nearest to x_q
    Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples
    Lecture-38 - Other Classification Methods
  • 51. Discussion on the k-NN Algorithm
    The k-NN algorithm for continuous-valued target functions
     Calculate the mean value of the k nearest neighbors
    Distance-weighted nearest neighbor algorithm
     Weight the contribution of each of the k neighbors according to their distance to the query point x_q, giving greater weight to closer neighbors:
       w \equiv \frac{1}{d(x_q, x_i)^2}
     Similarly for real-valued target functions
    Robust to noisy data by averaging the k nearest neighbors
    Curse of dimensionality: the distance between neighbors can be dominated by irrelevant attributes
     To overcome it, stretch the axes or eliminate the least relevant attributes
    Lecture-38 - Other Classification Methods
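A short sketch of distance-weighted k-NN for a discrete-valued target, using the 1/d(x_q, x_i)^2 weighting shown above. The data layout and example points are made up for illustration; this is not code from the slides.

    from collections import defaultdict
    from math import dist   # Euclidean distance (Python 3.8+)

    def knn_predict(train, query, k=3):
        """train: list of (point, label) pairs; query: tuple of coordinates."""
        neighbors = sorted(train, key=lambda pl: dist(pl[0], query))[:k]
        votes = defaultdict(float)
        for point, label in neighbors:
            d = dist(point, query)
            votes[label] += 1.0 / (d * d) if d > 0 else float("inf")   # w = 1 / d(xq, xi)^2
        return max(votes, key=votes.get)

    train = [((1.0, 1.0), "yes"), ((1.2, 0.8), "yes"), ((4.0, 4.2), "no"), ((3.8, 4.0), "no")]
    print(knn_predict(train, (1.1, 1.0), k=3))   # "yes"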
  • 52. Case-Based Reasoning
    Also uses lazy evaluation and analysis of similar instances
    Difference: instances are not "points in a Euclidean space"
    Example: the water faucet problem in CADET (Sycara et al'92)
    Methodology
     Instances represented by rich symbolic descriptions (e.g., function graphs)
     Multiple retrieved cases may be combined
     Tight coupling between case retrieval, knowledge-based reasoning, and problem solving
    Research issues
     Indexing based on a syntactic similarity measure and, on failure, backtracking and adapting to additional cases
    Lecture-38 - Other Classification Methods
  • 53. Lazy vs. Eager Learning
Instance-based learning: lazy evaluation
Decision-tree and Bayesian classification: eager evaluation
Key differences
 A lazy method may consider the query instance xq when deciding how to generalize beyond the training data D
 An eager method cannot, since it has already committed to a global approximation before seeing the query
Efficiency: lazy methods spend less time training but more time predicting
Accuracy
 Lazy methods effectively use a richer hypothesis space, since they use many local linear functions to form an implicit global approximation to the target function
 Eager methods must commit to a single hypothesis that covers the entire instance space
Lecture-38 - Other Classification Methods
  • 54. Genetic Algorithms
GA: based on an analogy to biological evolution
Each rule is represented by a string of bits
An initial population is created consisting of randomly generated rules
 e.g., IF A1 AND NOT A2 THEN C2 can be encoded as 100
Based on the notion of survival of the fittest, a new population is formed consisting of the fittest rules and their offspring
The fitness of a rule is measured by its classification accuracy on a set of training examples
Offspring are generated by crossover and mutation
Lecture-38 - Other Classification Methods
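A toy sketch of this scheme: rules are fixed-length bit strings, fitness is classification accuracy on training examples, and new rules come from single-point crossover plus bit-flip mutation. The encoding (bit i set means "attribute Ai must be present"), the mutation rate, and all function names are illustrative assumptions.

    import random

    def fitness(rule, examples):
        """Fraction of (attributes, label) examples classified correctly.
        The rule predicts class 1 when every attribute it requires is present."""
        correct = 0
        for attrs, label in examples:
            fires = all(a == 1 for r, a in zip(rule, attrs) if r == 1)
            correct += int(fires == bool(label))
        return correct / len(examples)

    def crossover(r1, r2):
        cut = random.randrange(1, len(r1))      # single-point crossover
        return r1[:cut] + r2[cut:]

    def mutate(rule, p=0.05):
        return [b ^ 1 if random.random() < p else b for b in rule]

    def evolve(examples, n_bits, pop_size=20, generations=30):
        pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=lambda r: fitness(r, examples), reverse=True)
            survivors = pop[: pop_size // 2]    # survival of the fittest
            children = [mutate(crossover(random.choice(survivors), random.choice(survivors)))
                        for _ in range(pop_size - len(survivors))]
            pop = survivors + children
        return max(pop, key=lambda r: fitness(r, examples))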
  • 55. Rough Set Approach
Rough sets are used to approximately or “roughly” define equivalence classes
A rough set for a given class C is approximated by two sets: a lower approximation (tuples certain to be in C) and an upper approximation (tuples that cannot be described as not belonging to C)
Finding the minimal subsets (reducts) of attributes for feature reduction is NP-hard, but a discernibility matrix can be used to reduce the computational effort
Lecture-38 - Other Classification Methods
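A small sketch of computing the lower and upper approximations from the equivalence classes induced by a chosen attribute subset; the data layout (a list of (attribute-dict, label) tuples) and all names are illustrative assumptions.

    from collections import defaultdict

    def approximations(tuples, attrs, target_class):
        """Group tuples into equivalence classes on the chosen attributes, then
        collect the lower approximation (blocks entirely inside the class) and
        the upper approximation (blocks that overlap the class at all)."""
        blocks = defaultdict(list)
        for row, label in tuples:
            blocks[tuple(row[a] for a in attrs)].append(label)
        lower = [k for k, labels in blocks.items() if all(l == target_class for l in labels)]
        upper = [k for k, labels in blocks.items() if any(l == target_class for l in labels)]
        return lower, upper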
  • 56. Fuzzy Set Approaches
Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (such as using a fuzzy membership graph)
Attribute values are converted to fuzzy values
 e.g., income is mapped into the discrete categories {low, medium, high}, with fuzzy membership values calculated for each
For a given new sample, more than one fuzzy value may apply
Each applicable rule contributes a vote for membership in the categories
Typically, the truth values for each predicted category are summed
Lecture-38 - Other Classification Methods
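A small sketch of this idea for the income example: a crisp income is mapped to fuzzy memberships in {low, medium, high}, and each applicable rule votes for a class with the truth value of its condition. The membership breakpoints and the rules themselves are made-up assumptions for illustration.

    def fuzzy_income(x):
        """Map a crisp income value to fuzzy memberships in {low, medium, high}.
        The breakpoints (40k and 60k, with 20k ramps) are illustrative only."""
        low = max(0.0, min(1.0, (40_000 - x) / 20_000))
        high = max(0.0, min(1.0, (x - 60_000) / 20_000))
        medium = max(0.0, 1.0 - low - high)
        return {"low": low, "medium": medium, "high": high}

    def classify(income, rules):
        """Each rule is (income_category, predicted_class); its vote equals the
        truth value of that category, votes per class are summed, largest wins."""
        memberships = fuzzy_income(income)
        votes = {}
        for category, cls in rules:
            votes[cls] = votes.get(cls, 0.0) + memberships[category]
        return max(votes, key=votes.get)

    print(classify(55_000, [("low", "reject"), ("medium", "approve"), ("high", "approve")]))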
  • 58. What Is Prediction?
Prediction is similar to classification
 First, construct a model
 Second, use the model to predict unknown values
The major method for prediction is regression
 Linear and multiple regression
 Non-linear regression
Prediction is different from classification
 Classification predicts categorical class labels
 Prediction models continuous-valued functions
Lecture-39 - Prediction
  • 59. Predictive Modeling in Databases
Predictive modeling: predict data values or construct generalized linear models based on the database data
One can only predict value ranges or category distributions
Method outline:
 Minimal generalization
 Attribute relevance analysis
 Generalized linear model construction
 Prediction
Determine the major factors that influence the prediction
 Data relevance analysis: uncertainty measurement, entropy analysis, expert judgement, etc.
Multi-level prediction: drill-down and roll-up analysis
Lecture-39 - Prediction
  • 60. Regression Analysis and Log-Linear Models in Prediction
Linear regression: Y = α + β X
 The two parameters, α and β, specify the line and are estimated from the data at hand
 Apply the least squares criterion to the known values of Y1, Y2, …, X1, X2, …
Multiple regression: Y = b0 + b1 X1 + b2 X2
 Many nonlinear functions can be transformed into the above
Log-linear models:
 The multi-way table of joint probabilities is approximated by a product of lower-order tables
 Probability: p(a, b, c, d) = αab βac χad δbcd
Lecture-39 - Prediction
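A minimal least-squares fit for Y = α + β X, using the standard closed-form estimates for the two parameters; the sample data and the printed values are illustrative assumptions.

    import numpy as np

    def fit_line(x, y):
        """Estimate alpha and beta by least squares:
        beta = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², alpha = ȳ − beta · x̄."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        alpha = y.mean() - beta * x.mean()
        return alpha, beta

    alpha, beta = fit_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
    print(alpha, beta)   # roughly alpha ≈ 0.15, beta ≈ 1.94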
  • 61. Locally Weighted Regression
Construct an explicit approximation to f over a local region surrounding the query instance xq.
Locally weighted linear regression:
 The target function f is approximated near xq using the linear function:
   f̂(x) = w0 + w1 a1(x) + … + wn an(x)
 Minimize the squared error, with a distance-decreasing weight K:
   E(xq) ≡ (1/2) Σ (f(x) − f̂(x))² K(d(xq, x)),  summed over the k nearest neighbors of xq
 The gradient descent training rule:
   Δwj ≡ η Σ K(d(xq, x)) (f(x) − f̂(x)) aj(x),  summed over the k nearest neighbors of xq
In most cases, the target function is approximated by a constant, linear, or quadratic function.
Lecture-39 - Prediction
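A sketch of locally weighted linear regression for a single query point, following the gradient-descent rule above; the Gaussian kernel K, the learning rate, and the iteration count are illustrative assumptions.

    import numpy as np

    def lwr_predict(X, y, x_q, k=5, eta=0.01, steps=200, tau=1.0):
        """Fit f̂(x) = w0 + w1·x1 + … + wn·xn on the k nearest neighbors of x_q,
        weighting each neighbor's error by K(d) = exp(−d² / (2·tau²)),
        then evaluate the local model at x_q."""
        dists = np.linalg.norm(X - x_q, axis=1)
        nn = np.argsort(dists)[:k]
        A = np.hstack([np.ones((k, 1)), X[nn]])       # a0(x) = 1 gives the intercept w0
        K = np.exp(-dists[nn] ** 2 / (2 * tau ** 2))  # distance-decreasing weights
        w = np.zeros(A.shape[1])
        for _ in range(steps):                        # gradient descent on the weighted squared error
            err = y[nn] - A @ w
            w += eta * (A.T @ (K * err))              # Δw_j = η Σ K(d) (f(x) − f̂(x)) a_j(x)
        return float(np.append(1.0, x_q) @ w)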
  • 63. Classification Accuracy: Estimating Error Rates
Partition: training-and-testing
 Use two independent data sets: a training set and a test set
 Suitable for data sets with a large number of samples
Cross-validation
 Divide the data set into k subsamples
 Use k−1 subsamples as training data and one subsample as test data: k-fold cross-validation
 Suitable for data sets of moderate size
Bootstrapping (leave-one-out)
 Suitable for small data sets
Lecture-40 - Classification accuracy
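A sketch of k-fold cross-validation accuracy estimation; here `train` and `classify` stand for any learner of your choice and are assumed callables, not functions defined in the lecture.

    import numpy as np

    def k_fold_accuracy(X, y, train, classify, k=10, seed=0):
        """Split the data into k subsamples; for each fold, train on the other
        k−1 and measure accuracy on the held-out fold; return the mean accuracy."""
        idx = np.random.default_rng(seed).permutation(len(X))
        folds = np.array_split(idx, k)
        accs = []
        for i in range(k):
            test = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
            model = train(X[train_idx], y[train_idx])
            preds = np.array([classify(model, x) for x in X[test]])
            accs.append(np.mean(preds == y[test]))
        return float(np.mean(accs))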
  • 64. Boosting and Bagging
Boosting increases classification accuracy
 Applicable to decision trees or Bayesian classifiers
Learn a series of classifiers, where each classifier in the series pays more attention to the examples misclassified by its predecessor
Boosting requires only linear time and constant space
Lecture-40 - Classification accuracy
  • 65. Boosting Technique — Algorithm
Assign every example an equal weight 1/N
For t = 1, 2, …, T do
 Obtain a hypothesis (classifier) h(t) under the weights w(t)
 Calculate the error of h(t) and re-weight the examples based on the error
 Normalize w(t+1) to sum to 1
Output a weighted sum of all the hypotheses, with each hypothesis weighted according to its accuracy on the training set
Lecture-40 - Classification accuracy
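A sketch of this weighted re-training loop in AdaBoost style, with a one-split decision stump as the weak learner; the stump learner, the α formula, and the ±1 label convention are standard choices assumed here, not details spelled out on the slide.

    import numpy as np

    def stump_train(X, y, w):
        """Weak learner: pick the (feature, threshold, sign) decision stump
        with the lowest weighted error; labels are +1 / −1."""
        best = None
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = np.where(X[:, j] <= thr, sign, -sign)
                    err = np.sum(w[pred != y])
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        return best[1:]

    def stump_predict(stump, X):
        j, thr, sign = stump
        return np.where(X[:, j] <= thr, sign, -sign)

    def boost(X, y, T=10):
        """Start with equal weights 1/N; at each round fit a hypothesis under the
        current weights, up-weight its mistakes, normalize, and finally combine
        all hypotheses in an accuracy-weighted vote."""
        N = len(y)
        w = np.full(N, 1.0 / N)
        hypotheses = []
        for _ in range(T):
            stump = stump_train(X, y, w)
            pred = stump_predict(stump, X)
            err = np.clip(np.sum(w[pred != y]), 1e-12, 1 - 1e-12)
            alpha = 0.5 * np.log((1 - err) / err)    # more accurate hypotheses get more say
            w *= np.exp(-alpha * y * pred)           # increase the weight of misclassified examples
            w /= w.sum()                             # normalize w(t+1) to sum to 1
            hypotheses.append((alpha, stump))
        return lambda Xq: np.sign(sum(a * stump_predict(s, Xq) for a, s in hypotheses))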