This document discusses various techniques for data classification including decision tree induction, Bayesian classification methods, rule-based classification, and classification by backpropagation. It covers key concepts such as supervised vs. unsupervised learning, training data vs. test data, and issues around preprocessing data for classification. The document also discusses evaluating classification models using metrics like accuracy, precision, recall, and F-measures as well as techniques like holdout validation, cross-validation, and bootstrap.
Data mining Basics and complete description Sulman Ahmed
This course is all about the data mining techniques and how we mine the data and get optimize results.This course is all about the data mining techniques and how we mine the data and get optimize results.This course is all about the data mining techniques and how we mine the data and get optimize results.This course is all about the data mining techniques and how we mine the data and get optimize results.This course is all about the data mining techniques and how we mine the data and get optimize results.This course is all about the data mining techniques and how we mine the data and get optimize results.This course is all about the data mining techniques and how we mine the data and get optimize results
Survey on Various Classification Techniques in Data Miningijsrd.com
Dynamic Classification is an information mining (machine learning) strategy used to anticipate bunch participation for information cases. In this paper, we show the essential arrangement systems. A few significant sorts of arrangement technique including induction, Bayesian networks, k-nearest neighbor classifier, case-based reasoning, genetic algorithm and fuzzy logic techniques. The objective of this review is to give a complete audit of distinctive characterization procedures in information mining.
UNIT 3: Data Warehousing and Data MiningNandakumar P
UNIT-III Classification and Prediction: Issues Regarding Classification and Prediction – Classification by Decision Tree Introduction – Bayesian Classification – Rule Based Classification – Classification by Back propagation – Support Vector Machines – Associative Classification – Lazy Learners – Other Classification Methods – Prediction – Accuracy and Error Measures – Evaluating the Accuracy of a Classifier or Predictor – Ensemble Methods – Model Section.
Data mining Basics and complete description Sulman Ahmed
This course is all about the data mining techniques and how we mine the data and get optimize results.This course is all about the data mining techniques and how we mine the data and get optimize results.This course is all about the data mining techniques and how we mine the data and get optimize results.This course is all about the data mining techniques and how we mine the data and get optimize results.This course is all about the data mining techniques and how we mine the data and get optimize results.This course is all about the data mining techniques and how we mine the data and get optimize results.This course is all about the data mining techniques and how we mine the data and get optimize results
Survey on Various Classification Techniques in Data Miningijsrd.com
Dynamic Classification is an information mining (machine learning) strategy used to anticipate bunch participation for information cases. In this paper, we show the essential arrangement systems. A few significant sorts of arrangement technique including induction, Bayesian networks, k-nearest neighbor classifier, case-based reasoning, genetic algorithm and fuzzy logic techniques. The objective of this review is to give a complete audit of distinctive characterization procedures in information mining.
UNIT 3: Data Warehousing and Data MiningNandakumar P
UNIT-III Classification and Prediction: Issues Regarding Classification and Prediction – Classification by Decision Tree Introduction – Bayesian Classification – Rule Based Classification – Classification by Back propagation – Support Vector Machines – Associative Classification – Lazy Learners – Other Classification Methods – Prediction – Accuracy and Error Measures – Evaluating the Accuracy of a Classifier or Predictor – Ensemble Methods – Model Section.
Classification Using Decision Trees and RulesChapter 5.docxmonicafrancis71118
Classification Using Decision
Trees and Rules
Chapter 5
Introduction
• Decision tree learners use a tree structure to model the relationships
among the features and the potential outcomes.
• a structure of branching decisions into a final predicted class value
• Decision begins at the root node, then passed through decision nodes
that require choices.
• Choices split the data across branches that indicate potential
outcomes of a decision
• Tree is terminated by leaf nodes that denote the action to be taken as
the result of the series of decisions.
Decision Tree Example
Benefits
• Flowchart-like tree structure is not necessarily exclusively for the
learner's internal use.
• Resulting structure in a human-readable format.
• Provides insight into how and why the model works or doesn't work well for a
particular task.
• Useful where classification mechanism needs to be transparent for legal reasons, or in
case the results need to be shared with others in order to inform future business
practices
• Credit scoring models where criteria that causes an applicant to be rejected need to be clearly
documented and free from bias
• Marketing studies of customer behavior such as satisfaction or churn, which will be shared
with management or advertising agencies
• Diagnosis of medical conditions based on laboratory measurements, symptoms, or the rate of
disease progression
Applicability
• Widely used machine learning technique
• Can be applied to model almost any type of data with excellent
results
• Does not fit task where the data has a large number of nominal
features with many levels or it has a large number of numeric
features.
• Result in large number of decisions and an overly complex tree.
• Tendency of decision trees to overfit data, though this can be overcome by
adjusting some simple parameters
Divide and Conquer
• Decision trees are built using a heuristic called recursive partitioning.
• Divide and conquer because it splits the data into subsets, which are then
split repeatedly into even smaller subsets,
• Stops when the data within the subsets are sufficiently homogenous, or
another stopping criterion has been met.
• Root node represents the entire dataset
• Algorithm must choose a feature to split upon
• Choose the feature most predictive of the target class.
• Algorithm continues to divide and conquer the data, choosing the best
candidate feature each time to create another decision node, until a stopping
criterion is reached.
Divide and Conquer
• Stopping Conditions
• All (or nearly all) of the examples at the node have the same class
• There are no remaining features to distinguish among the examples
• The tree has grown to a predefined size limit
Example
• Finding potential for a movie- Box Office Bust, Mainstream Hit, Critical Success
• Diagonal lines might have split the data even more cleanly.
• Limitation of the decision tree's knowledge representation, whi.
Lecture 10 - Model Testing and Evaluation, a lecture in subject module Statis...Maninda Edirisooriya
Model Testing and Evaluation is a lesson where you learn how to train different ML models with changes and evaluating them to select the best model out of them. This was one of the lectures of a full course I taught in University of Moratuwa, Sri Lanka on 2023 second half of the year.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
Hierarchical Digital Twin of a Naval Power SystemKerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...Amil Baba Dawood bangali
Contact with Dawood Bhai Just call on +92322-6382012 and we'll help you. We'll solve all your problems within 12 to 24 hours and with 101% guarantee and with astrology systematic. If you want to take any personal or professional advice then also you can call us on +92322-6382012 , ONLINE LOVE PROBLEM & Other all types of Daily Life Problem's.Then CALL or WHATSAPP us on +92322-6382012 and Get all these problems solutions here by Amil Baba DAWOOD BANGALI
#vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore#blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #blackmagicforlove #blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #Amilbabainuk #amilbabainspain #amilbabaindubai #Amilbabainnorway #amilbabainkrachi #amilbabainlahore #amilbabaingujranwalan #amilbabainislamabad
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)MdTanvirMahtab2
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL). A Govt. owned Company of Bangladesh Chemical Industries Corporation under Ministry of Industries.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
2. Classification-Basic Concept
• Decision Tree Induction – Bayes Classification Methods – Rule-Based Classification – Model Evaluation and
Selection – Techniques to improve Classification Accuracy - Bayesian Belief Networks – Classification by
Backpropagation.
• Classification and prediction are two forms of data analysis that can be used to extract models describing
important data classes or to predict future data trends
• Classification predicts categorical, discrete and unordered labels
• Prediction model –continuous valued functions
• eg,., bank loan application, all electronics
• Regression analysis is a statistical methodology that is most often used for numeric prediction.
• Data classification is a two step process
Learning
Classification
3. Classification-Basic Concept
• What is trained data and test data?
• What is supervised learning and unsupervised learning?
• Issues regarding classification and prediction
Pre processing the data for classification and prediction
Data cleaning
Relevance analysis
Data transformation and reduction-normalization
• Comparing classification and prediction methods
Accuracy , speed, robustness, scalability,interpretability
4.
5. Decision Tree Induction
• Decision tree induction is the learning of decision tree from class-
labelled training tuples.
• A decision tree is a flow chart like tree structure where internal
node(non leaf node) denotes a test on an attribute, each branch
represents an outcome of the test and each leaf node(or terminal node)
holds a class label.
• The topmost node in a tree is the root node.
6.
7. Decision Tree Induction
• How decision tree used for classification?
• Why are decision tree classifiers so popular?
• Attribute selection measures are used to select the attribute that best
partitions the tuples into distinct classes
• Tree pruning attempts to identify and remove such branches with the
goal of improving classification accuracy on unseen data.
8. Decision Tree Induction
• J. Ross Quinlan, a researcher in machine learning developed a decision tree algorithm known as
ID3(Iterative Dichotomiser)
• Successor of ID3-C4.5
• Classification and regression tree(CART)
• ID3,C4.4 and CART adopt greedy(i.e., nonbacktracking),top-down recursive divide and conquer
manner.
• Attribute selection Measures: Information Gain, Gain Ratio, Gini Index
• Other attribute selection measures
• Tree pruning
• Scalability and decision tree induction
• Visual mining for decision tree induction
15. Bayes classification Methods
• What are Bayesian classifiers?
Bayesian classifiers are statistical classifiers.
They can predict class membership probabilities such as the probability that a
given tuple belongs to a particular class.
Bayesian classification is based on bayes theorem.
Studies comparing classification algorithms have found a simple Bayesian
classifier known as the naïve Bayesian classifier.-to be comparable in performance
with decision tree and selected neural network classifiers.
Bayesian classifiers have also exhibited high accuracy and speed when applied to
large database.
16. Bayes classification Methods
Naïve Bayesian classifiers assume that the effect of an attribute value
on a given class is independent of the values of the other attributes.
This assumption is called class conditional independence.
Bayes Theorem
Naïve Bayesian classification
Bayesian belief networks
Training Bayesian belief networks.
17. Bayesian belief networks
• Basic concepts and Mechanisms
• Training Bayesian Belief Networks
Compute the gradients
Take a small step in the direction of the gradient
Renormalize the weights
20. Classification by back propagation
• Defining network topology
• Backpropagation
Initialize the weights
Propagate the inputs forward
Backpropagate the error: updating the weights and biases after the
presentation of each tuple .this is referred to as case updating
Alternatively, the weights and biases increments could be accumulated
in variables, so that the weights and biases are updated after all the
tuples in the training set have been presented. This is called epoch
updating ,where one iteration through the training set is an epoch
21. Classification by back propagation
Terminating conditions
• Inside the black box: backpropagation and interpretability
29. Rule based classification
• Using IF-THEN rules for classification: Size ordering and Rule
ordering
• Rule extraction from a decision tree
• Rule induction using a sequential covering algorithm
Rule quality measures
Rule pruning
33. Rule extraction from a decision tree
• To extract rules from a decision tree, one rule is created for each path
from the root to leaf node.
• Each splitting criterion along a given path is logically ANDed to form
the rule antecedent(“IF “ part).The leaf node holds the class prediction,
forming the rule consequent(“THEN” part).
36. Rule Quality Measure
• Learn_one_rule needs a measure of rule quality. Every time it
considers an attribute test, it must check to see if appending such a test
to the current rule’s condition will result in an improved rule.
38. Model Evaluation and Selection
Choosing one classifier over another
• Metrics for evaluating classifier performance
Positive tuples(tuples of main class of interest)
Negative tuples(all other tuples)
• Holdout method and random subsampling
• Cross validation
• Bootstrap
• Model selection using statistical tests of significance
• Comparing classifiers based on cost-benefit and ROC curves
42. Model Evaluation and Selection
• Evaluation metrics: How can we measure accuracy? Other metrics to consider?
• Use validation test set of class-labeled tuples instead of training set when
assessing accuracy
• Methods for estimating a classifier’s accuracy:
• Holdout method, random subsampling
• Cross-validation
• Bootstrap
• Comparing classifiers:
• Confidence intervals
• Cost-benefit analysis and ROC Curves
43. Classifier Evaluation Metrics:
Confusion Matrix
Actual classPredicted
class
buy_computer
= yes
buy_computer
= no
Total
buy_computer = yes 6954 46 7000
buy_computer = no 412 2588 3000
Total 7366 2634 10000
• Given m classes, an entry, CMi,j in a confusion matrix indicates #
of tuples in class i that were labeled by the classifier as class j
• May have extra rows/columns to provide totals
Confusion Matrix:
Actual classPredicted class C1 ¬ C1
C1 True Positives (TP) False Negatives (FN)
¬ C1 False Positives (FP) True Negatives (TN)
Example of Confusion Matrix:
43
44. Classifier Evaluation Metrics: Accuracy, Error Rate,
Sensitivity and Specificity
• Classifier Accuracy, or recognition
rate: percentage of test set tuples
that are correctly classified
Accuracy = (TP + TN)/All
• Error rate: 1 – accuracy, or
Error rate = (FP + FN)/All
Class Imbalance Problem:
One class may be rare, e.g.
fraud, or HIV-positive
Significant majority of the
negative class and minority of
the positive class
Sensitivity: True Positive
recognition rate
Sensitivity = TP/P
Specificity: True Negative
recognition rate
Specificity = TN/N
AP C ¬C
C TP FN P
¬C FP TN N
P’ N’ All
44
45. Classifier Evaluation Metrics:
Precision and Recall, and F-measures
• Precision: exactness – what % of tuples that the classifier
labeled as positive are actually positive
• Recall: completeness – what % of positive tuples did the
classifier label as positive?
• Perfect score is 1.0
• Inverse relationship between precision & recall
• F measure (F1 or F-score): harmonic mean of precision and
recall,
• Fß: weighted measure of precision and recall
• assigns ß times as much weight to recall as to precision
45
46. Classifier Evaluation Metrics:
Example
46
Precision = 90/230 = 39.13% Recall = 90/300 = 30.00%
Actual ClassPredicted class cancer = yes cancer = no Total Recognition(%)
cancer = yes 90 210 300 30.00 (sensitivity
cancer = no 140 9560 9700 98.56 (specificity)
Total 230 9770 10000 96.40 (accuracy)
47. Evaluating Classifier Accuracy:
Holdout & Cross-Validation Methods
• Holdout method
• Given data is randomly partitioned into two independent sets
• Training set (e.g., 2/3) for model construction
• Test set (e.g., 1/3) for accuracy estimation
• Random sampling: a variation of holdout
• Repeat holdout k times, accuracy = avg. of the accuracies obtained
• Cross-validation (k-fold, where k = 10 is most popular)
• Randomly partition the data into k mutually exclusive subsets,
each approximately equal size
• At i-th iteration, use Di as test set and others as training set
• Leave-one-out: k folds where k = # of tuples, for small sized
data
• *Stratified cross-validation*: folds are stratified so that class
dist. in each fold is approx. the same as that in the initial data
47
48. Evaluating Classifier Accuracy:
Bootstrap
• Bootstrap
• Works well with small data sets
• Samples the given training tuples uniformly with replacement
• i.e., each time a tuple is selected, it is equally likely to be selected
again and re-added to the training set
• Several bootstrap methods, and a common one is .632 boostrap
• A data set with d tuples is sampled d times, with replacement, resulting in
a training set of d samples. The data tuples that did not make it into the
training set end up forming the test set. About 63.2% of the original data
end up in the bootstrap, and the remaining 36.8% form the test set (since
(1 – 1/d)d ≈ e-1 = 0.368)
• Repeat the sampling procedure k times, overall accuracy of the model:
48
49. Estimating Confidence Intervals:
Classifier Models M1 vs. M2
• Suppose we have 2 classifiers, M1 and M2, which one is better?
• Use 10-fold cross-validation to obtain and
• These mean error rates are just estimates of error on the true population of
future data cases
• What if the difference between the 2 error rates is just attributed to chance?
• Use a test of statistical significance
• Obtain confidence limits for our error estimates
49
50. Estimating Confidence Intervals:
Null Hypothesis
• Perform 10-fold cross-validation
• Assume samples follow a t distribution with k–1 degrees of freedom (here, k=10)
• Use t-test (or Student’s t-test)
• Null Hypothesis: M1 & M2 are the same
• If we can reject null hypothesis, then
• we conclude that the difference between M1 & M2 is statistically significant
• Chose model with lower error rate
50
51. Estimating Confidence Intervals: t-test
• If only 1 test set available: pairwise comparison
• For ith round of 10-fold cross-validation, the same cross partitioning is used to
obtain err(M1)i and err(M2)i
• Average over 10 rounds to get
• t-test computes t-statistic with k-1 degrees of freedom:
• If two test sets available: use non-paired t-test
where
and
where
where k1 & k2 are # of cross-validation samples used for M1 & M2, resp.
51
52. Estimating Confidence Intervals:
Table for t-distribution
• Symmetric
• Significance level,
e.g., sig = 0.05 or 5%
means M1 & M2 are
significantly different
for 95% of
population
• Confidence limit, z =
sig/2
52
53. Estimating Confidence Intervals:
Statistical Significance
• Are M1 & M2 significantly different?
• Compute t. Select significance level (e.g. sig = 5%)
• Consult table for t-distribution: Find t value corresponding to k-1 degrees of
freedom (here, 9)
• t-distribution is symmetric: typically upper % points of distribution shown →
look up value for confidence limit z=sig/2 (here, 0.025)
• If t > z or t < -z, then t value lies in rejection region:
• Reject null hypothesis that mean error rates of M1 & M2 are same
• Conclude: statistically significant difference between M1 & M2
• Otherwise, conclude that any difference is chance
53
54. Model Selection: ROC Curves
• ROC (Receiver Operating
Characteristics) curves: for visual
comparison of classification models
• Originated from signal detection theory
• Shows the trade-off between the true
positive rate and the false positive rate
• The area under the ROC curve is a
measure of the accuracy of the model
• Rank the test tuples in decreasing
order: the one that is most likely to
belong to the positive class appears at
the top of the list
• The closer to the diagonal line (i.e., the
closer the area is to 0.5), the less
accurate is the model
Vertical axis
represents the true
positive rate
Horizontal axis rep.
the false positive rate
The plot also shows a
diagonal line
A model with perfect
accuracy will have an
area of 1.0
54
55. Issues Affecting Model Selection
• Accuracy
• classifier accuracy: predicting class label
• Speed
• time to construct the model (training time)
• time to use the model (classification/prediction time)
• Robustness: handling noise and missing values
• Scalability: efficiency in disk-resident databases
• Interpretability
• understanding and insight provided by the model
• Other measures, e.g., goodness of rules, such as decision tree
size or compactness of classification rules 55
56. Techniques to Improve Classification
Accuracy
• Introducing Ensemble Methods
• Bagging
• Boosting and AdaBoost
• Random forest
• Improving classification accuracy of class Imbalanced data
57. Ensemble Methods: Increasing the
Accuracy
• Ensemble methods
• Use a combination of models to increase accuracy
• Combine a series of k learned models, M1, M2, …, Mk, with
the aim of creating an improved model M*
• Popular ensemble methods
• Bagging: averaging the prediction over a collection of
classifiers
• Boosting: weighted vote with a collection of classifiers
• Ensemble: combining a set of heterogeneous classifiers
57
58. Bagging: Boostrap Aggregation
• Analogy: Diagnosis based on multiple doctors’ majority vote
• Training
• Given a set D of d tuples, at each iteration i, a training set Di of d tuples is
sampled with replacement from D (i.e., bootstrap)
• A classifier model Mi is learned for each training set Di
• Classification: classify an unknown sample X
• Each classifier Mi returns its class prediction
• The bagged classifier M* counts the votes and assigns the class with the
most votes to X
• Prediction: can be applied to the prediction of continuous values by taking the
average value of each prediction for a given test tuple
• Accuracy
• Often significantly better than a single classifier derived from D
• For noise data: not considerably worse, more robust
• Proved improved accuracy in prediction
58
59. Boosting
• Analogy: Consult several doctors, based on a combination of
weighted diagnoses—weight assigned based on the previous
diagnosis accuracy
• How boosting works?
• Weights are assigned to each training tuple
• A series of k classifiers is iteratively learned
• After a classifier Mi is learned, the weights are updated to
allow the subsequent classifier, Mi+1, to pay more attention to
the training tuples that were misclassified by Mi
• The final M* combines the votes of each individual classifier,
where the weight of each classifier's vote is a function of its
accuracy
• Boosting algorithm can be extended for numeric prediction
• Comparing with bagging: Boosting tends to have greater accuracy,
but it also risks overfitting the model to misclassified data 59
60. 60
Adaboost (Freund and Schapire,
1997)
• Given a set of d class-labeled tuples, (X1, y1), …, (Xd, yd)
• Initially, all the weights of tuples are set the same (1/d)
• Generate k classifiers in k rounds. At round i,
• Tuples from D are sampled (with replacement) to form a training set Di
of the same size
• Each tuple’s chance of being selected is based on its weight
• A classification model Mi is derived from Di
• Its error rate is calculated using Di as a test set
• If a tuple is misclassified, its weight is increased, o.w. it is decreased
• Error rate: err(Xj) is the misclassification error of tuple Xj. Classifier Mi
error rate is the sum of the weights of the misclassified tuples:
• The weight of classifier Mi’s vote is )
(
)
(
1
log
i
i
M
error
M
error
d
j
j
i err
w
M
error )
(
)
( j
X
61. Random Forest (Breiman 2001)
• Random Forest:
• Each classifier in the ensemble is a decision tree classifier and is
generated using a random selection of attributes at each node to determine
the split
• During classification, each tree votes and the most popular class is
returned
• Two Methods to construct Random Forest:
• Forest-RI (random input selection): Randomly select, at each node, F
attributes as candidates for the split at the node. The CART methodology
is used to grow the trees to maximum size
• Forest-RC (random linear combinations): Creates new attributes (or
features) that are a linear combination of the existing attributes (reduces
the correlation between individual classifiers)
• Comparable in accuracy to Adaboost, but more robust to errors and outliers
• Insensitive to the number of attributes selected for consideration at each split,
and faster than bagging or boosting
61
62. Classification of Class-Imbalanced Data Sets
• Class-imbalance problem: Rare positive example but numerous
negative ones, e.g., medical diagnosis, fraud, oil-spill, fault, etc.
• Traditional methods assume a balanced distribution of classes and
equal error costs: not suitable for class-imbalanced data
• Typical methods for imbalance data in 2-class classification:
• Oversampling: re-sampling of data from positive class
• Under-sampling: randomly eliminate tuples from negative
class
• Threshold-moving: moves the decision threshold, t, so that the
rare class tuples are easier to classify, and hence, less chance of
costly false negative errors
• Ensemble techniques: Ensemble multiple classifiers introduced
above
• Still difficult for class imbalance problem on multiclass tasks
62