This document covers classification and prediction. Classification predicts categorical class labels by building a model from a training set with known labels; prediction models continuous-valued functions to estimate unknown or missing values. Typical applications include credit approval, target marketing, medical diagnosis, and treatment effectiveness analysis. Classification is a two-step process: a learning step that builds a model describing a set of predetermined classes, and a classification step that applies the model to new data, with accuracy estimated by comparing predictions on an independent test set against known labels. The document also discusses issues in preparing data and comparing methods, then surveys techniques (decision tree induction, Bayesian classification, backpropagation, association-based classification, k-nearest neighbors, and others), followed by prediction via regression and accuracy estimation.
2. What is Classification & Prediction
Classification:
- predicts categorical class labels
- classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it in classifying new data
Prediction:
- models continuous-valued functions
- predicts unknown or missing values
Applications:
- credit approval
- target marketing
- medical diagnosis
- treatment effectiveness analysis
Lecture-32 - What is classification? What is prediction?
3. Classification: A Two-Step Process
Learning step: describing a set of predetermined classes
- Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
- The set of tuples used for model construction is the training set
- The model is represented as classification rules, decision trees, or mathematical formulae
Classification step: classifying future or unknown objects
- Estimate the accuracy of the model: the known label of each test sample is compared with the classified result from the model
- The accuracy rate is the percentage of test set samples that are correctly classified by the model
- The test set is independent of the training set; otherwise over-fitting will occur
4. Classification Process: Model Construction
Training data:

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

A classification algorithm learns a classifier (model) from this data, e.g.:
IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
5. Classification Process: Use the Model in Prediction
Testing data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4). Tenured?
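The two-step process is easy to see in miniature. Below is a small Python sketch (not from the original slides) that hard-codes the rule learned on slide 4, estimates its accuracy on the test set, and classifies the unseen tuple; all names are illustrative.

```python
# Stand-in for the classifier learned from the training data:
# IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
def classify(rank, years):
    return "yes" if rank == "Professor" or years > 6 else "no"

testing_data = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),   # the rule misclassifies this one
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

# Step 2a: estimate accuracy on the independent test set
correct = sum(classify(rank, years) == label
              for _, rank, years, label in testing_data)
print(f"accuracy = {correct}/{len(testing_data)}")  # 3/4

# Step 2b: classify unseen data
print(classify("Professor", 4))  # Jeff -> 'yes'
```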
6. Supervised vs. Unsupervised Learning
Supervised learning (classification)
- Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
- New data is classified based on the training set
Unsupervised learning (clustering)
- The class labels of the training data are unknown
- Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
8. Issues Regarding Classification and Prediction: Preparing the Data
Data cleaning
- Preprocess data in order to reduce noise and handle missing values
Relevance analysis (feature selection)
- Remove irrelevant or redundant attributes
Data transformation
- Generalize and/or normalize data
Lecture-33 - Issues regarding classification and prediction
9. Issues Regarding Classification and Prediction: Comparing Classification Methods
Accuracy
Speed and scalability
- time to construct the model
- time to use the model
Robustness
- handling noise and missing values
Scalability
- efficiency in disk-resident databases
Interpretability
- understanding and insight provided by the model
- decision tree size
- compactness of classification rules
10. Lecture-34: Classification by Decision Tree Induction
11. Classification by Decision Tree Induction
Decision tree
- A flow-chart-like tree structure
- An internal node denotes a test on an attribute
- A branch represents an outcome of the test
- Leaf nodes represent class labels or class distributions
Decision tree generation consists of two phases
- Tree construction: at the start, all the training examples are at the root; partition the examples recursively based on selected attributes
- Tree pruning: identify and remove branches that reflect noise or outliers
Use of a decision tree: classifying an unknown sample
- Test the attribute values of the sample against the decision tree
12. Training Dataset

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no

This follows an example from Quinlan's ID3.
13. Output: A Decision Tree for "buys_computer"

age?
- <=30 -> student?
  - no -> no
  - yes -> yes
- 31…40 -> yes
- >40 -> credit rating?
  - excellent -> no
  - fair -> yes
14. Algorithm for Decision Tree Induction
Basic algorithm (a greedy algorithm; see the sketch below)
- The tree is constructed in a top-down recursive divide-and-conquer manner
- At the start, all the training examples are at the root
- Attributes are categorical (if continuous-valued, they are discretized in advance)
- Examples are partitioned recursively based on selected attributes
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
Conditions for stopping partitioning
- All samples for a given node belong to the same class
- There are no remaining attributes for further partitioning (majority voting is employed for classifying the leaf)
- There are no samples left
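The following compact Python sketch shows one way this greedy algorithm can be implemented for categorical attributes, using entropy-based information gain as the selection measure. It is illustrative code, not from the slides, and all names are my own.

```python
import math
from collections import Counter

def entropy(labels):
    """I(p, n) generalized to any number of classes."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, attrs):
    """rows: list of dicts mapping attribute name -> categorical value."""
    if len(set(labels)) == 1:           # all samples belong to the same class
        return labels[0]
    if not attrs:                       # no attributes left: majority voting
        return Counter(labels).most_common(1)[0][0]

    def gain(a):
        remainder = 0.0
        for v in set(r[a] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[a] == v]
            remainder += len(subset) / len(rows) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attrs, key=gain)         # greedy choice: highest information gain
    children = {}
    for v in set(r[best] for r in rows):
        sub = [(r, l) for r, l in zip(rows, labels) if r[best] == v]
        children[v] = build_tree([r for r, _ in sub], [l for _, l in sub],
                                 [a for a in attrs if a != best])
    return (best, children)
```

Run on the 14-tuple training set of slide 12, this selects age at the root and reproduces the tree of slide 13.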
15. Attribute Selection Measures
Information gain (ID3/C4.5)
- All attributes are assumed to be categorical
- Can be modified for continuous-valued attributes
Gini index (IBM IntelligentMiner)
- All attributes are assumed continuous-valued
- Assume there exist several possible split values for each attribute
- May need other tools, such as clustering, to get the possible split values
- Can be modified for categorical attributes
16. Information Gain (ID3/C4.5)
Select the attribute with the highest information gain.
Assume there are two classes, P and N, and let the set of examples S contain p elements of class P and n elements of class N. The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as

$I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}$
17. Information Gain in Decision Tree Induction
Assume that using attribute A, a set S will be partitioned into sets {S_1, S_2, ..., S_v}. If S_i contains p_i examples of P and n_i examples of N, the entropy, or the expected information needed to classify objects in all subtrees S_i, is

$E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n} \, I(p_i, n_i)$

The encoding information that would be gained by branching on A is

$Gain(A) = I(p, n) - E(A)$
18. Attribute Selection by Information Gain Computation
Class P: buys_computer = "yes"; Class N: buys_computer = "no"
I(p, n) = I(9, 5) = 0.940
Compute the entropy for age:

age    p_i  n_i  I(p_i, n_i)
<=30   2    3    0.971
31…40  4    0    0
>40    3    2    0.971

$E(age) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.694$

Hence $Gain(age) = I(p, n) - E(age) = 0.246$

Similarly: Gain(income) = 0.029, Gain(student) = 0.151, Gain(credit_rating) = 0.048
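These numbers are easy to check. A short, self-contained Python verification (illustrative, not part of the slides):

```python
import math

def I(p, n):
    """Expected information I(p, n) for a two-class set."""
    return -sum(c / (p + n) * math.log2(c / (p + n)) for c in (p, n) if c)

# yes/no counts of buys_computer within each value of 'age' (14 tuples total)
age_partition = {"<=30": (2, 3), "31…40": (4, 0), ">40": (3, 2)}

E_age = sum((p + n) / 14 * I(p, n) for p, n in age_partition.values())
print(round(I(9, 5), 3))          # 0.94
print(round(E_age, 3))            # 0.694
print(round(I(9, 5) - E_age, 3))  # ~0.247 (0.246 from the rounded 0.940 - 0.694)
```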
19. Gini Index (IBM IntelligentMiner)
If a data set T contains examples from n classes, the gini index gini(T) is defined as

$gini(T) = 1 - \sum_{j=1}^{n} p_j^2$

where p_j is the relative frequency of class j in T.
If a data set T is split into two subsets T_1 and T_2 with sizes N_1 and N_2 respectively, the gini index of the split data is defined as

$gini_{split}(T) = \frac{N_1}{N} gini(T_1) + \frac{N_2}{N} gini(T_2)$

The attribute providing the smallest gini_split(T) is chosen to split the node (this requires enumerating all possible splitting points for each attribute).
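Both measures are a few lines of code. A minimal Python sketch of the gini computations (illustrative, not from the slides):

```python
from collections import Counter

def gini(labels):
    """gini(T) = 1 - sum over classes of the squared relative frequency."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(left, right):
    """Weighted gini index of a binary split of T into (T1, T2)."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Example: the 9 'yes' / 5 'no' class distribution from slide 18
labels = ["yes"] * 9 + ["no"] * 5
print(round(gini(labels), 3))                        # 0.459
print(round(gini_split(labels[:7], labels[7:]), 3))  # one candidate split: 0.204
```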
20. Extracting Classification Rules from Trees
Represent the knowledge in the form of IF-THEN rules
- One rule is created for each path from the root to a leaf
- Each attribute-value pair along a path forms a conjunction
- The leaf node holds the class prediction
- Rules are easier for humans to understand
Example (one rule per path of the tree on slide 13)
- IF age = "<=30" AND student = "no" THEN buys_computer = "no"
- IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
- IF age = "31…40" THEN buys_computer = "yes"
- IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
- IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"
21. Avoiding Overfitting in Classification
The generated tree may overfit the training data
- Too many branches, some of which may reflect anomalies due to noise or outliers
- The result is poor accuracy for unseen samples
Two approaches to avoid overfitting
- Prepruning: halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold (it is difficult to choose an appropriate threshold)
- Postpruning: remove branches from a "fully grown" tree to get a sequence of progressively pruned trees; use a set of data different from the training data to decide which is the "best pruned tree"
22. Approaches to Determine the Final Tree Size
- Separate training and testing sets
- Use cross-validation, e.g., 10-fold cross-validation
- Use all the data for training, and apply a statistical test (e.g., chi-square) to estimate whether expanding or pruning a node may improve the entire distribution
- Use the minimum description length (MDL) principle: halt growth of the tree when the encoding is minimized
23. Enhancements to Basic Decision Tree Induction
Allow for continuous-valued attributes
- Dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals
Handle missing attribute values
- Assign the most common value of the attribute
- Assign a probability to each of the possible values
Attribute construction
- Create new attributes based on existing ones that are sparsely represented
- This reduces fragmentation, repetition, and replication
24. Classification in Large Databases
- Classification is a classical problem extensively studied by statisticians and machine learning researchers
- Scalability: classifying data sets with millions of examples and hundreds of attributes with reasonable speed
Why decision tree induction in data mining?
- relatively faster learning speed than other classification methods
- convertible to simple and easy-to-understand classification rules
- can use SQL queries for accessing databases
- comparable classification accuracy with other methods
25. Scalable Decision Tree Induction Methods in Data Mining Studies
SLIQ (EDBT'96, Mehta et al.)
- builds an index for each attribute; only the class list and the current attribute list reside in memory
SPRINT (VLDB'96, J. Shafer et al.)
- constructs an attribute-list data structure
PUBLIC (VLDB'98, Rastogi & Shim)
- integrates tree splitting and tree pruning: stops growing the tree earlier
RainForest (VLDB'98, Gehrke, Ramakrishnan & Ganti)
- separates the scalability aspects from the criteria that determine the quality of the tree
- builds an AVC-list (attribute, value, class label)
27. Bayesian Classification
- Statistical classifiers
- Based on Bayes' theorem
- Naïve Bayesian classification
- Class conditional independence
- Bayesian belief networks
Lecture-35 - Bayesian Classification
28. Bayesian Classification
Probabilistic learning
- Calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
Incremental
- Each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
Probabilistic prediction
- Predict multiple hypotheses, weighted by their probabilities
Standard
- Even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured
29. Bayes' Theorem
Let X be a data tuple and H be a hypothesis such that X belongs to a specific class C. The posterior probability of the hypothesis H given X, P(H|X), follows Bayes' theorem:

$P(H|X) = \frac{P(X|H) \, P(H)}{P(X)}$
30. Naïve Bayes Classifier
Let D be a training set of tuples and their associated class labels, with X = (x1, x2, ..., xn) and m classes C1, C2, ..., Cm.
Bayes' theorem: P(Ci|X) = P(X|Ci) · P(Ci) / P(X)
Naïve Bayes predicts that X belongs to class Ci if and only if
P(Ci|X) > P(Cj|X) for 1 <= j <= m, j != i
31. Naïve Bayes Classifier
- P(X) is constant for all classes, so only P(X|Ci) · P(Ci) needs to be maximized
- If the class priors are not known, they are often assumed equal: P(C1) = P(C2) = ... = P(Cm)
- Otherwise, P(Ci) = |Ci,D| / |D|, the fraction of training tuples in D that belong to class Ci
32. Naïve Bayesian Classification
Naïve assumption: attribute independence
P(x1, ..., xk | C) = P(x1|C) · ... · P(xk|C)
If an attribute is categorical:
- P(xi|C) is estimated as the relative frequency of samples having value xi as the i-th attribute in class C
If an attribute is continuous:
- P(xi|C) is estimated through a Gaussian density function
Computationally easy in both cases
33. Play-tennis example: estimating P(xi|C)

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N

P(p) = 9/14, P(n) = 5/14

outlook:     P(sunny|p) = 2/9,    P(sunny|n) = 3/5
             P(overcast|p) = 4/9, P(overcast|n) = 0
             P(rain|p) = 3/9,     P(rain|n) = 2/5
temperature: P(hot|p) = 2/9,      P(hot|n) = 2/5
             P(mild|p) = 4/9,     P(mild|n) = 2/5
             P(cool|p) = 3/9,     P(cool|n) = 1/5
humidity:    P(high|p) = 3/9,     P(high|n) = 4/5
             P(normal|p) = 6/9,   P(normal|n) = 2/5
windy:       P(true|p) = 3/9,     P(true|n) = 3/5
             P(false|p) = 6/9,    P(false|n) = 2/5
34. Play-tennis example: classifying X
An unseen sample X = <rain, hot, high, false>
P(X|p) · P(p) = P(rain|p) · P(hot|p) · P(high|p) · P(false|p) · P(p) = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582
P(X|n) · P(n) = P(rain|n) · P(hot|n) · P(high|n) · P(false|n) · P(n) = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286
Sample X is classified in class n (don't play)
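The whole play-tennis computation fits in a few lines of Python. The sketch below (illustrative, not from the slides) estimates the per-class conditional probabilities from the table on slide 33 and reproduces the two scores:

```python
data = [  # (outlook, temperature, humidity, windy, class)
    ("sunny", "hot", "high", "false", "N"), ("sunny", "hot", "high", "true", "N"),
    ("overcast", "hot", "high", "false", "P"), ("rain", "mild", "high", "false", "P"),
    ("rain", "cool", "normal", "false", "P"), ("rain", "cool", "normal", "true", "N"),
    ("overcast", "cool", "normal", "true", "P"), ("sunny", "mild", "high", "false", "N"),
    ("sunny", "cool", "normal", "false", "P"), ("rain", "mild", "normal", "false", "P"),
    ("sunny", "mild", "normal", "true", "P"), ("overcast", "mild", "high", "true", "P"),
    ("overcast", "hot", "normal", "false", "P"), ("rain", "mild", "high", "true", "N"),
]

def score(x, c):
    """P(X|c) * P(c) under the attribute-independence assumption."""
    rows = [r for r in data if r[-1] == c]
    s = len(rows) / len(data)                      # prior P(c)
    for i, v in enumerate(x):                      # times each P(x_i | c)
        s *= sum(1 for r in rows if r[i] == v) / len(rows)
    return s

x = ("rain", "hot", "high", "false")
print(round(score(x, "P"), 6))  # 0.010582
print(round(score(x, "N"), 6))  # 0.018286 -> classified as N (don't play)
```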
35. How Effective Are Bayesian Classifiers?
The class-conditional independence assumption
- makes the computation possible
- yields optimal classifiers when it is satisfied
- but is seldom satisfied in practice, as attributes (variables) are often correlated
Attempts to overcome this limitation
- Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
- Decision trees, which reason on one attribute at a time, considering the most important attributes first
36. Bayesian Belief Networks (I)
[Figure: a six-node belief network over FamilyHistory (FH), Smoker (S), LungCancer (LC), Emphysema, PositiveXRay, and Dyspnea; FamilyHistory and Smoker are the parents of LungCancer.]
The conditional probability table for the variable LungCancer:

      (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
LC    0.8      0.5       0.7       0.1
~LC   0.2      0.5       0.3       0.9
37. Bayesian Belief Networks
- A Bayesian belief network allows a subset of the variables to be conditionally independent
- It is a graphical model of causal relationships
- Several cases of learning Bayesian belief networks:
  - Given both the network structure and all the variables: easy
  - Given the network structure but only some of the variables
  - When the network structure is not known in advance
39. Classification by Backpropagation
- A neural network learning algorithm
- Pioneered by psychologists and neurobiologists
- Neural network: a set of connected input/output units in which each connection has a weight associated with it
Lecture-36 - Classification by Backpropagation
40. Neural Networks
Advantages
- prediction accuracy is generally high
- robust: works when training examples contain errors
- output may be discrete, real-valued, or a vector of several discrete or real-valued attributes
- fast evaluation of the learned target function
Limitations
- long training time
- difficult to understand the learned function (weights)
- not easy to incorporate domain knowledge
41. A Neuron
The n-dimensional input vector x is mapped into the variable y by means of a scalar product and a nonlinear function mapping:

$y = f\Big(\sum_{i=0}^{n} w_i x_i - \mu_k\Big)$

[Figure: input vector x = (x0, ..., xn) and weight vector w = (w0, ..., wn) feed a weighted sum, which passes through the activation function f (with bias µk) to produce the output y.]
42. Network Training
The ultimate objective of training
- obtain a set of weights that makes almost all the tuples in the training data classified correctly
Steps
- Initialize weights with random values
- Feed the input tuples into the network one by one
- For each unit:
  - compute the net input to the unit as a linear combination of all the inputs to the unit
  - compute the output value using the activation function
  - compute the error
  - update the weights and the bias
43. Multi-Layer Perceptron
[Figure: the input vector x_i feeds the input nodes, which connect through weights w_ij to hidden nodes and then to output nodes, producing the output vector.]

Net input to unit j: $I_j = \sum_i w_{ij} O_i + \theta_j$
Output of unit j (sigmoid activation): $O_j = \frac{1}{1 + e^{-I_j}}$
Error at an output unit j: $Err_j = O_j (1 - O_j)(T_j - O_j)$
Error at a hidden unit j: $Err_j = O_j (1 - O_j) \sum_k Err_k \, w_{jk}$
Weight update: $w_{ij} = w_{ij} + (l) \, Err_j \, O_i$
Bias update: $\theta_j = \theta_j + (l) \, Err_j$

where (l) is the learning rate and T_j is the true output.
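A runnable Python sketch of one training step that follows these update rules for a network with a single hidden layer (illustrative code; the function and variable names are my own):

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def train_step(x, target, W1, b1, W2, b2, lr=0.5):
    """One backpropagation step for a single-hidden-layer network.
    W1[j][i]: weight input i -> hidden j; W2[k][j]: weight hidden j -> output k."""
    # Forward pass: I_j = sum_i w_ij * O_i + theta_j, O_j = 1 / (1 + e^-I_j)
    h = [sigmoid(sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j])
         for j in range(len(W1))]
    o = [sigmoid(sum(W2[k][j] * h[j] for j in range(len(h))) + b2[k])
         for k in range(len(W2))]
    # Backward pass: Err_k = O_k(1-O_k)(T_k-O_k) at the outputs,
    # Err_j = O_j(1-O_j) * sum_k Err_k * w_jk at the hidden units
    err_o = [o[k] * (1 - o[k]) * (target[k] - o[k]) for k in range(len(o))]
    err_h = [h[j] * (1 - h[j]) * sum(err_o[k] * W2[k][j] for k in range(len(o)))
             for j in range(len(h))]
    # Updates: w_ij += l * Err_j * O_i and theta_j += l * Err_j
    for k in range(len(W2)):
        for j in range(len(h)):
            W2[k][j] += lr * err_o[k] * h[j]
        b2[k] += lr * err_o[k]
    for j in range(len(W1)):
        for i in range(len(x)):
            W1[j][i] += lr * err_h[j] * x[i]
        b1[j] += lr * err_h[j]
    return o
```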
45. Association-Based Classification
Several methods for association-based classification
- ARCS: quantitative association mining and clustering of association rules (Lent et al. '97)
  - beats C4.5 in (mainly) scalability and also in accuracy
- Associative classification (Liu et al. '98)
  - mines high-support, high-confidence rules of the form "cond_set => y", where y is a class label
Lecture-37 - Classification based on concepts from association rule mining
46. Association-Based Classification
CAEP: classification by aggregating emerging patterns (Dong et al. '99)
- Emerging patterns (EPs): itemsets whose support increases significantly from one class to another
- Mine EPs based on minimum support and growth rate
48. Other Classification Methods
- k-nearest neighbor classifier
- case-based reasoning
- genetic algorithms
- rough set approach
- fuzzy set approaches
Lecture-38 - Other Classification Methods
49. Instance-Based Methods
Instance-based learning
- Store training examples and delay the processing ("lazy evaluation") until a new instance must be classified
Approaches
- k-nearest neighbor: instances represented as points in a Euclidean space
- Locally weighted regression: constructs a local approximation
- Case-based reasoning: uses symbolic representations and knowledge-based inference
50. The k-Nearest Neighbor Algorithm
- All instances correspond to points in the n-dimensional space
- The nearest neighbors are defined in terms of Euclidean distance
- The target function can be discrete- or real-valued
- For discrete-valued targets, k-NN returns the most common value among the k training examples nearest to x_q
- Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples
51. Discussion on the k-NN Algorithm
The k-NN algorithm for continuous-valued target functions
- calculate the mean value of the k nearest neighbors
Distance-weighted nearest neighbor algorithm
- weight the contribution of each of the k neighbors according to their distance to the query point x_q, giving greater weight to closer neighbors:

$w \equiv \frac{1}{d(x_q, x_i)^2}$

- similarly for real-valued target functions
Robust to noisy data by averaging the k nearest neighbors
Curse of dimensionality: the distance between neighbors can be dominated by irrelevant attributes
- to overcome it, stretch the axes or eliminate the least relevant attributes
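A small Python sketch of both variants, plain and distance-weighted k-NN (illustrative, not from the slides; the example points are made up):

```python
import math
from collections import Counter

def knn_classify(query, examples, k=3, weighted=False):
    """k-NN for a discrete-valued target; examples are (point, label) pairs.
    With weighted=True each neighbor votes with weight w = 1 / d(x_q, x_i)^2."""
    dist = lambda a, b: math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    neighbors = sorted(examples, key=lambda e: dist(query, e[0]))[:k]
    votes = Counter()
    for point, label in neighbors:
        d = dist(query, point)
        votes[label] += 1.0 / (d * d) if weighted and d > 0 else 1.0
    return votes.most_common(1)[0][0]

examples = [((1.0, 1.0), "+"), ((1.2, 0.8), "+"),
            ((4.0, 4.2), "-"), ((4.5, 4.0), "-")]
print(knn_classify((1.1, 1.0), examples, k=3))                 # '+'
print(knn_classify((3.0, 3.0), examples, k=3, weighted=True))  # '-'
```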
52. Case-Based Reasoning
- Also uses lazy evaluation and the analysis of similar instances
- Difference: instances are not "points in a Euclidean space"
- Example: the water faucet problem in CADET (Sycara et al. '92)
Methodology
- Instances represented by rich symbolic descriptions (e.g., function graphs)
- Multiple retrieved cases may be combined
- Tight coupling between case retrieval, knowledge-based reasoning, and problem solving
Research issues
- Indexing based on syntactic similarity measures and, on failure, backtracking and adapting to additional cases
53. Lazy vs. Eager Learning
- Instance-based learning: lazy evaluation
- Decision-tree and Bayesian classification: eager evaluation
Key differences
- A lazy method may consider the query instance x_q when deciding how to generalize beyond the training data D
- An eager method cannot, since it has already chosen its global approximation before seeing the query
Efficiency
- Lazy methods need less time for training but more time for predicting
Accuracy
- A lazy method effectively uses a richer hypothesis space, since it uses many local linear functions to form its implicit global approximation to the target function
- An eager method must commit to a single hypothesis that covers the entire instance space
54. Genetic Algorithms
- GA: based on an analogy to biological evolution
- Each rule is represented by a string of bits
- An initial population is created consisting of randomly generated rules
  - e.g., IF A1 AND NOT A2 THEN C2 can be encoded as 100
- Based on the notion of survival of the fittest, a new population is formed consisting of the fittest rules and their offspring
- The fitness of a rule is measured by its classification accuracy on a set of training examples
- Offspring are generated by crossover and mutation
55. Rough Set Approach
- Rough sets are used to approximately or "roughly" define equivalence classes
- A rough set for a given class C is approximated by two sets: a lower approximation (certain to be in C) and an upper approximation (cannot be described as not belonging to C)
- Finding the minimal subsets (reducts) of attributes for feature reduction is NP-hard, but a discernibility matrix can be used to reduce the computational intensity
56. Fuzzy Set Approaches
- Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (such as with a fuzzy membership graph)
- Attribute values are converted to fuzzy values
  - e.g., income is mapped into the discrete categories {low, medium, high}, with fuzzy values calculated for each
- For a given new sample, more than one fuzzy value may apply
- Each applicable rule contributes a vote for membership in the categories
- Typically, the truth values for each predicted category are summed
58. What Is Prediction?
Prediction is similar to classification
- First, construct a model
- Second, use the model to predict unknown values
- The major method for prediction is regression: linear and multiple regression, and non-linear regression
Prediction is different from classification
- Classification predicts categorical class labels
- Prediction models continuous-valued functions
Lecture-39 - Prediction
59. Predictive Modeling in Databases
Predictive modeling: predict data values or construct generalized linear models based on the database data
- One can only predict value ranges or category distributions
Method outline
- Minimal generalization
- Attribute relevance analysis
- Generalized linear model construction
- Prediction
Determine the major factors which influence the prediction
- Data relevance analysis: uncertainty measurement, entropy analysis, expert judgement, etc.
Multi-level prediction: drill-down and roll-up analysis
60. Regression Analysis and Log-Linear Models in Prediction
Linear regression: Y = α + β X
- The two parameters α and β specify the line and are to be estimated using the data at hand
- Apply the least squares criterion to the known values Y1, Y2, ..., X1, X2, ...
Multiple regression: Y = b0 + b1 X1 + b2 X2
- Many nonlinear functions can be transformed into the above
Log-linear models
- The multi-way table of joint probabilities is approximated by a product of lower-order tables
- Probability: $p(a, b, c, d) = \alpha_{ab} \, \beta_{ac} \, \chi_{ad} \, \delta_{bcd}$
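For simple linear regression the least squares estimates have a closed form, β = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and α = ȳ − β x̄. A minimal Python sketch (illustrative, not from the slides):

```python
def fit_linear(xs, ys):
    """Least-squares estimates for Y = alpha + beta * X."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    beta = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))
    alpha = y_bar - beta * x_bar
    return alpha, beta

# Example: points lying near the line Y = 1 + 2X
xs, ys = [1, 2, 3, 4], [3.1, 4.9, 7.2, 8.8]
alpha, beta = fit_linear(xs, ys)
print(round(alpha, 2), round(beta, 2))  # roughly 1 and 2
```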
61. Locally Weighted Regression
- Construct an explicit approximation to f over a local region surrounding the query instance x_q
- Locally weighted linear regression: the target function f is approximated near x_q using the linear function

$\hat{f}(x) = w_0 + w_1 a_1(x) + \dots + w_n a_n(x)$

- Minimize the squared error over the k nearest neighbors of x_q, with a distance-decreasing weight K:

$E(x_q) \equiv \frac{1}{2} \sum_{x \,\in\, k\text{-NN}(x_q)} (f(x) - \hat{f}(x))^2 \, K(d(x_q, x))$

- The gradient descent training rule:

$\Delta w_j \equiv \eta \sum_{x \,\in\, k\text{-NN}(x_q)} K(d(x_q, x)) \, (f(x) - \hat{f}(x)) \, a_j(x)$

- In most cases, the target function is approximated by a constant, linear, or quadratic function
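A gradient-descent sketch of locally weighted linear regression for one-dimensional inputs (illustrative; the kernel K and all names are my own choices, and a closed-form weighted least squares fit would work equally well):

```python
def lwr_predict(xq, xs, ys, k=3, eta=0.01, steps=2000):
    """Locally weighted linear regression for 1-D inputs.
    Fits f_hat(x) = w0 + w1*x on the k nearest neighbors of xq, weighting
    each neighbor's squared error by a kernel K that decreases with distance."""
    K = lambda d: 1.0 / (1.0 + d * d)           # one possible distance kernel
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - xq))[:k]
    w0, w1 = 0.0, 0.0
    for _ in range(steps):                      # gradient descent on E(xq)
        dw0 = dw1 = 0.0
        for x, y in nearest:
            err = y - (w0 + w1 * x)             # f(x) - f_hat(x)
            kd = K(abs(xq - x))
            dw0 += eta * kd * err               # a_0(x) = 1
            dw1 += eta * kd * err * x           # a_1(x) = x
        w0, w1 = w0 + dw0, w1 + dw1
    return w0 + w1 * xq

xs = [0, 1, 2, 3, 4, 5]
ys = [0.1, 1.1, 3.9, 9.2, 16.1, 24.8]           # roughly y = x^2
print(lwr_predict(2.5, xs, ys))                 # local linear estimate near x = 2.5
```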
63. Classification Accuracy: Estimating Error Rates
Partition: training-and-testing
- use two independent data sets: a training set and a test set
- suitable for data sets with a large number of samples
Cross-validation
- divide the data set into k subsamples
- use k-1 subsamples as training data and one subsample as test data: k-fold cross-validation
- suitable for data sets of moderate size
Bootstrapping (leave-one-out)
- for small data sets
Lecture-40 - Classification accuracy
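A minimal sketch of generating the k folds (illustrative; real uses would shuffle the data first):

```python
def k_fold_splits(data, k=10):
    """Yield (train, test) pairs for k-fold cross-validation."""
    folds = [data[i::k] for i in range(k)]      # k roughly equal subsamples
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(20))
for train, test in k_fold_splits(data, k=5):
    print(len(train), len(test))                # 16 4, five times
```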
64. Boosting and Bagging
- Boosting increases classification accuracy
- Applicable to decision trees or Bayesian classifiers
- Learn a series of classifiers, where each classifier in the series pays more attention to the examples misclassified by its predecessor
- Boosting requires only linear time and constant space
65. Boosting Technique: Algorithm
- Assign every example an equal weight 1/N
- For t = 1, 2, ..., T:
  - obtain a hypothesis (classifier) h(t) under the weights w(t)
  - calculate the error of h(t) and re-weight the examples based on the error
  - normalize w(t+1) to sum to 1
- Output a weighted sum of all the hypotheses, with each hypothesis weighted according to its accuracy on the training set
Lecture-40 - Classification accuracy
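One concrete instantiation of this scheme is AdaBoost. The Python sketch below (illustrative, not the slides' algorithm verbatim) assumes labels in {+1, -1} and a caller-supplied learn(examples, labels, weights) that returns a classifier h(x):

```python
import math

def boost(examples, labels, learn, T=10):
    """AdaBoost-style boosting; `learn` is a hypothetical weak-learner callback."""
    N = len(examples)
    w = [1.0 / N] * N                      # equal initial weights 1/N
    hypotheses = []
    for _ in range(T):
        h = learn(examples, labels, w)     # obtain h(t) under w(t)
        err = sum(wi for wi, x, y in zip(w, examples, labels) if h(x) != y)
        if err >= 0.5:                     # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))  # hypothesis weight
        hypotheses.append((alpha, h))
        # re-weight: boost misclassified examples, then normalize to sum to 1
        w = [wi * math.exp(alpha if h(x) != y else -alpha)
             for wi, x, y in zip(w, examples, labels)]
        total = sum(w)
        w = [wi / total for wi in w]

    def classify(x):                       # weighted vote of all hypotheses
        s = sum(a * h(x) for a, h in hypotheses)
        return 1 if s >= 0 else -1
    return classify
```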