Exploring the Noise Resilience of the Combined Sturges Algorithm
Akrita Agarwal
Advisor: Dr. Anca Ralescu
November 7, 2015
Akrita Agarwal, Exploring the Noise Resilience of the Combined Sturges Algorithm, November 7, 2015 (1 / 39)
Motivation
Motivation
Why a study on noise?
Real-world datasets are noisy:
  recordings under normal environmental conditions
  equipment measurement error
Most algorithms ignore noise, and relatively little research has addressed it.
Aim: explore the robustness of classification algorithms to noise.
Which algorithm is least affected by noisy datasets?
Classification
Classification
Classification: assigning a new observation to one of a set of known categories.
Companies store large amounts of data.
An effective classifier can assist in making good predictions and informed business decisions.
E.g., whether to recommend Prime products to non-Prime customers, based on their behavior.
Classification Algorithms
Two broad kinds of classifiers:
Frequency-based classifiers use the frequency of data points in the dataset to determine the class membership of a given test point.
Geometry-based classifiers leverage the geometrical aspects of a dataset, such as distance.
Naive Bayes
The Naive Bayes Classifier
A frequency-based classifier
Computes the probability that a test data point belongs to each class
Class probabilities are estimated from the training data
Pros
  Intuitive to understand and build
  Easily trained, even with a small dataset
  Fast
Cons
  Assumes conditional independence of the attributes
  Ignores the underlying geometry of the data
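As a concrete illustration of the frequency-based idea, here is a minimal categorical Naive Bayes sketch, not the exact implementation used in the thesis; `naive_bayes_predict` and its per-value relative-frequency estimates are illustrative choices:

```python
from collections import Counter

def naive_bayes_predict(X, y, x_test):
    """Minimal categorical Naive Bayes sketch: pick the class maximizing
    P(class) * prod_j P(attribute_j = x_test[j] | class)."""
    classes = set(y)
    n = len(y)
    best_class, best_score = None, -1.0
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / n
        score = prior
        for j, v in enumerate(x_test):
            col = [r[j] for r in rows]
            # relative frequency of the test value within this class
            score *= Counter(col)[v] / len(col)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

On the dummy dataset from the later slides, the point (3, 2) is assigned to class 1, whose attribute values are far more frequent there.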
k Nearest Neighbors
The k Nearest Neighbors Classifier
A geometry-based classifier
Assigns a class to the test data point by taking the majority class of its k nearest points
Pros
  Easy to implement and understand
  Classes don't have to be linearly separable
Cons
  Tends to ignore the relative importance of attributes; uses them all equally
  Takes the frequency of the data into account only indirectly
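The majority-vote rule above can be sketched in a few lines; `knn_predict` is a hypothetical helper using Euclidean distance, which the slides do not specify:

```python
import math
from collections import Counter

def knn_predict(X, y, x_test, k=3):
    """Minimal k-NN sketch: majority class among the k points
    nearest to x_test (Euclidean distance assumed)."""
    dists = sorted((math.dist(x, x_test), label) for x, label in zip(X, y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```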
Combined Sturges Classifier
Combined Sturges
The Combined Sturges (CS) Classifier
Explicitly uses geometry + frequency
Data is represented as a frequency distribution per class
A classification score is computed for each class
The test point is assigned to the class with the highest score
Continuous data values are binned:
  No. of bins = 1 + log2(n)
(Sturges, 1926, "The Choice of a Class Interval")
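The bin-count rule translates directly into code; rounding the logarithm up with `ceil` is an assumption made here, since the slide gives only 1 + log2(n):

```python
import math

def sturges_bins(n):
    """Sturges' rule: number of histogram bins for n samples.
    Ceiling applied to the log term (an assumption; the slide
    states only 1 + log2(n))."""
    return 1 + math.ceil(math.log2(n))
```

For example, 8 samples give 4 bins, and 150 samples (the Iris dataset size) give 9.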
Combined Sturges
Dummy Dataset
Table: Dummy Dataset
A1 A2 Class
3 2 1
1 2 1
4 2 0
3 2 1
1 1 0
2 2 1
3 3 0
4 1 0
Combined Sturges
Frequency Distributions (from the Dummy Dataset above)

Table: Frequency Distribution on Classes 0 & 1

Class 0:
A1  f(A1)    A2  f(A2)
1   0.25     1   0.50
3   0.25     2   0.25
4   0.50     3   0.25

Class 1:
A1  f(A1)    A2  f(A2)
1   0.25     2   0.75
2   0.25     3   0.25
3   0.50
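The per-class frequency tables can be derived from the dummy dataset; this sketch (with an assumed `(A1, A2, class)` tuple layout) reproduces, for example, the class-0 columns:

```python
from collections import Counter

data = [  # (A1, A2, class) -- the dummy dataset from the slide
    (3, 2, 1), (1, 2, 1), (4, 2, 0), (3, 2, 1),
    (1, 1, 0), (2, 2, 1), (3, 3, 0), (4, 1, 0),
]

def class_frequencies(rows, attr):
    """Relative frequency of each value of one attribute, per class."""
    freq = {}
    for c in {r[2] for r in rows}:
        vals = [r[attr] for r in rows if r[2] == c]
        freq[c] = {v: n / len(vals) for v, n in Counter(vals).items()}
    return freq

f_A1 = class_frequencies(data, 0)  # class 0: {4: 0.5, 1: 0.25, 3: 0.25}
```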
Combined Sturges
Test Point: T1 = (A1, A2) = (3, 4)
Combined Sturges
1. Geometric Criterion

Test Point: T1 = (3, 4)

Classification criterion: geometric (minimum distance)
Classification score: highest posterior probability

Table: Nearest distance of T1 to Classes
(per-class frequency distributions of A1 and A2, as in the earlier table)
Combined Sturges
Classification Score S(c), c ∈ {0, 1}

S(0):
  A1: P(Class 0) × f(A1) = 0.5 × 0.25
  A2: P(Class 0) × f(A2) = 0.5 × 0.25
  S(0) = average(0.125, 0.125) = 0.125

S(1):
  A1: P(Class 1) × f(A1) = 0.5 × 0.50
  A2: P(Class 1) × f(A2) = 0.5 × 0.25
  S(1) = average(0.25, 0.125) = 0.1875

S(0) < S(1), so T1 is assigned to Class 1.
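The geometric score can be reproduced from the slide's frequency tables; `geometric_score` is a sketch that looks up the frequency of the value nearest to the test point in each attribute's table:

```python
# Frequency tables from the slide: class -> attribute -> {value: frequency}
freq = {
    0: {"A1": {1: 0.25, 3: 0.25, 4: 0.50}, "A2": {1: 0.50, 2: 0.25, 3: 0.25}},
    1: {"A1": {1: 0.25, 2: 0.25, 3: 0.50}, "A2": {2: 0.75, 3: 0.25}},
}
prior = {0: 0.5, 1: 0.5}

def geometric_score(test, cls):
    """Average over attributes of P(class) * f(value nearest to the
    test point); the highest score wins."""
    scores = []
    for attr, t in zip(("A1", "A2"), test):
        nearest = min(freq[cls][attr], key=lambda v: abs(v - t))
        scores.append(prior[cls] * freq[cls][attr][nearest])
    return sum(scores) / len(scores)
```

For T1 = (3, 4) this gives 0.125 for class 0 and 0.1875 for class 1, matching the slide.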
Combined Sturges
2. Statistical Criterion

Test Point: T1 = (3, 4)

Classification criterion: statistical (maximum frequency)
Classification score: minimum distance

Table: Maximum Frequency in Classes
(per-class frequency distributions of A1 and A2, as in the earlier table)
Combined Sturges
Classification Score

S(0):
  A1: |4 − 3| = 1
  A2: |4 − 1| = 3
  S(0) = average(1, 3) = 2

S(1):
  A1: |3 − 3| = 0
  A2: |4 − 2| = 2
  S(1) = average(0, 2) = 1

S(0) > S(1), so T1 is assigned to Class 1 (minimum distance wins).
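The statistical score can be sketched the same way, measuring the distance from the test point to each attribute's most frequent value (its mode) within the class:

```python
# Frequency tables from the slide: class -> attribute -> {value: frequency}
freq = {
    0: {"A1": {1: 0.25, 3: 0.25, 4: 0.50}, "A2": {1: 0.50, 2: 0.25, 3: 0.25}},
    1: {"A1": {1: 0.25, 2: 0.25, 3: 0.50}, "A2": {2: 0.75, 3: 0.25}},
}

def statistical_score(test, cls):
    """Average distance from the test point to each attribute's
    most frequent value within the class; the smallest score wins."""
    dists = []
    for attr, t in zip(("A1", "A2"), test):
        mode = max(freq[cls][attr], key=freq[cls][attr].get)
        dists.append(abs(mode - t))
    return sum(dists) / len(dists)
```

For T1 = (3, 4) this gives 2 for class 0 and 1 for class 1, matching the slide.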
Combined Sturges
3. Combined Criterion

Test Point: T1 = (3, 4)

d × f = |T1 − v| × f(v) for each attribute value v
Expected Distance per class: EDc = EDc_A1 × EDc_A2
Decision: minimum Expected Distance, ED

Table: Aggregate Expected Distance, ED

Class 0:
A1  f(A1)  d×f     A2  f(A2)  d×f
1   0.25   0.50    1   0.50   1.50
3   0.25   0       2   0.25   0.50
4   0.50   0.50    3   0.25   0.25
ED0_A1 = 1.00      ED0_A2 = 2.25

Class 1:
A1  f(A1)  d×f     A2  f(A2)  d×f
1   0.25   0.50    2   0.75   1.50
2   0.25   0.25    3   0.25   0.25
3   0.50   0
ED1_A1 = 0.75      ED1_A2 = 1.75
Combined Sturges
Classification Penalty

S(0): ED = 1.00 × 2.25 = 2.25, S(0) = ED × (1 − P(Class 0)) = 2.25 × 0.5 = 1.125
S(1): ED = 0.75 × 1.75 ≈ 1.31, S(1) = ED × (1 − P(Class 1)) ≈ 0.655
S(0) > S(1), so T1 is assigned to Class 1.
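The combined criterion's expected distances and penalties can be reproduced as follows; `combined_score` is a sketch of the slide's ED and penalty formulas:

```python
# Frequency tables from the slide: class -> attribute -> {value: frequency}
freq = {
    0: {"A1": {1: 0.25, 3: 0.25, 4: 0.50}, "A2": {1: 0.50, 2: 0.25, 3: 0.25}},
    1: {"A1": {1: 0.25, 2: 0.25, 3: 0.50}, "A2": {2: 0.75, 3: 0.25}},
}
prior = {0: 0.5, 1: 0.5}

def combined_score(test, cls):
    """Expected distance per attribute (sum of |t - v| * f(v)),
    multiplied across attributes, then weighted by (1 - P(class));
    the smallest score wins."""
    ed = 1.0
    for attr, t in zip(("A1", "A2"), test):
        ed *= sum(abs(t - v) * f for v, f in freq[cls][attr].items())
    return ed * (1 - prior[cls])
```

For T1 = (3, 4) this gives 1.125 for class 0 and 0.65625 (the slide's 0.655) for class 1.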
The Noise Model
The Noise Model
Dealing with Noise
Brodley & Friedl, 1999: detect and reduce noise
Kubica & Moore, 2003: identify noise using a probabilistic model and remove it
Kalapanidas, 2003: developed a noise model based on data properties
The Noise Model
Additive noise: x̃ = x + δx, where
  δx_{i,j} = σ_{x_j} × z_{i,j}
  σ_{x_j} is the standard deviation of attribute j, and z_{i,j} = CDF(p_{i,j})

x̃_{i,j} = x_{i,j} + δx_{i,j}   if p_{i,j} < n
x̃_{i,j} = x_{i,j}              if p_{i,j} ≥ n        (1)

Based on noise level n ∈ {0, 0.15, 0.30, 0.50, 0.80}
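A minimal sketch of the additive noise model: each value is perturbed with probability n by a scaled random term. Reading the slide's z = CDF(p) as a standard-normal draw is an assumption made here for illustration:

```python
import random
from statistics import pstdev

def add_noise(column, n, seed=0):
    """Additive attribute noise (sketch): with probability n, perturb a
    value by sigma_j * z. z is drawn from a standard normal, which is
    one reading of the slide's z = CDF(p); an assumption."""
    rng = random.Random(seed)
    sigma = pstdev(column)  # standard deviation of the attribute
    return [x + sigma * rng.gauss(0, 1) if rng.random() < n else x
            for x in column]
```

At n = 0 the data is returned unchanged; at n = 1 every value is perturbed.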
The Noise Model
Attribute-level Noise

Table: Original Dataset          Table: 40% (n = 0.4) Noisy Dataset
A1  A2  Class                    A1   A2    Class
3   2   1                        8.5  0.55  1
1   2   1                        8.9  2     1
4   2   0                        4    0.7   0
3   2   1                        3    2     1
1   1   0                        4.7  1     0
2   2   1                        2    2     1
3   3   0                        3    3     0
4   1   0                        1.6  0.02  0
Datasets
Datasets
Artificial datasets

Multivariate Normal:
  x1 = random normal vector, t = random normal vector
  x2 = 0.8 x1 + 0.6 t
  x3 = 0.6 x1 + 0.8 t
  x4 = t

Linear function with non-normal inputs:
  x2 = x1^2 + 0.5 t
Datasets
2 artificial datasets (with different imbalance ratios)
3 real datasets

Table: Comparison of physical properties of Datasets.
Dataset         No. of Samples  No. of Classes  No. of Attributes  Attribute Value  Imbalance Ratio
Haberman        306             2               3                  Integer          2.78
A1              200             3               4                  Real             6.66
A2              200             3               4                  Real             39
Iris            150             3               4                  Real             2
Pima Diabetes   768             2               8                  Integer, Real    1.87
Process Flow
1. Create artificial datasets
2. Implement the noise model on all datasets
3. Apply the three algorithms (CS, knn, Naive Bayes)
4. Compare the results
Results
Results
Performance Measures

Confusion Matrix
Table: Confusion matrix for 2 classes.
                   Predicted Positive   Predicted Negative
Actual Positive    TP                   FN
Actual Negative    FP                   TN

Accuracy:  Acc = (TP + TN) / (TP + TN + FP + FN)
Precision: P = TP / (TP + FP)
Recall:    R = TP / (TP + FN)
F-measure: Fα = P·R / (α·P + (1 − α)·R)
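The four measures translate directly into code; note that with α = 0.5 the slide's F-measure reduces to the usual F1 score:

```python
def metrics(tp, fp, fn, tn, alpha=0.5):
    """Confusion-matrix measures as defined on the slide."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    p = tp / (tp + fp)                            # precision
    r = tp / (tp + fn)                            # recall
    f_alpha = (p * r) / (alpha * p + (1 - alpha) * r)  # F-measure
    return acc, p, r, f_alpha
```

For example, with TP = 8, FP = 2, FN = 2, TN = 8, all four measures equal 0.8.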
Results
Non-Noisy Datasets
Artificial datasets:
  knn does best: 91.2% (A1) and 93.7% (A2)
  CS improves notably from A1 to A2: 65% to 76%

Table: Non-Noisy Artificial Datasets - Performance of all algorithms
Dataset  Algorithm    Accuracy  Precision  Recall  F-measure
A1       CS           65.0      63.5       70.1    66.6
A1       knn          91.2      92.8       87.4    89.8
A1       Naive Bayes  60.2      61.6       60.1    64.1
A2       CS           76.0      68.4       71.6    69.7
A2       knn          93.7      94.7       91.9    93.2
A2       Naive Bayes  63.1      61.1       65.2    63.5
Results
Real datasets:
  Iris: knn does best, followed by Naive Bayes
  Haberman: CS does best; Naive Bayes performs very poorly
  Pima-Diabetes: CS is best; Naive Bayes follows

Table: Non-Noisy Real Datasets - Performance of all algorithms
Dataset        Algorithm    Accuracy  Precision  Recall  F-Measure
Iris           CS           94.3      95.1       94.3    94.7
Iris           knn          96.7      96.8       96.7    96.8
Iris           Naive Bayes  96.2      93.7       95      94.3
Haberman       CS           75.2      67.2       61.6    64.2
Haberman       knn          73.4      63.2       54.8    58.5
Haberman       Naive Bayes  0.5       41.9       47.6    47.3
Pima-Diabetes  CS           73.7      74.9       65.1    69.6
Pima-Diabetes  knn          64.5      65.6       66.9    66.3
Pima-Diabetes  Naive Bayes  70.3      59.2       56.7    57.9
Results
Noisy Datasets: A1
  knn does best
  For both knn and CS, accuracy barely changes with noise
  Naive Bayes performs very poorly

Table: Noisy A1 dataset - Performance of all algorithms
Algorithm    Noise %  Accuracy  Precision  Recall  F-Measure
CS           0        65        63.5       70.1    66.6
CS           15       64.8      63.4       96.7    96.8
CS           50       65.5      63.2       95      94.3
knn          0        87.5      87.2       61.6    61.6
knn          15       87.3      88.1       54.8    58.5
knn          50       86.7      88.5       47.6    47.3
Naive Bayes  0        ≈ 0       ≈ 0        ≈ 0     ≈ 0
Naive Bayes  15       ≈ 0       ≈ 0        ≈ 0     ≈ 0
Naive Bayes  50       ≈ 0       ≈ 0        ≈ 0     ≈ 0
Results
Noisy Datasets: A2
  knn does best, but drops from 92.6% to 86.3% with noise
  For CS, accuracy barely changes with noise
  From A1 to A2, CS improves: 65% to 76%

Table: Noisy A2 dataset - Performance of all algorithms
Algorithm    Noise %  Accuracy  Precision  Recall  F-Measure
CS           0        76.0      68.4       71.6    69.7
CS           15       76.8      64.7       73.1    68.4
CS           50       76.4      66.9       71.7    68.5
knn          0        92.6      86.9       85.5    86.2
knn          15       91.1      84.2       84.2    83.5
knn          50       86.3      83.0       78.2    77.9
Naive Bayes  0        ≈ 0       ≈ 0        ≈ 0     ≈ 0
Naive Bayes  15       ≈ 0       ≈ 0        ≈ 0     ≈ 0
Naive Bayes  50       ≈ 0       ≈ 0        ≈ 0     ≈ 0
Results
Noisy Datasets: Iris
  knn does best at 0% noise (96.7%), followed by CS (94.5%)
  CS does best at 50% noise (73.1%), followed by knn (63.8%)

Table: Noisy Iris dataset - Performance of all algorithms
Algorithm    Noise %  Accuracy  Precision  Recall  F-Measure
CS           0        94.5      94.9       94.5    94.7
CS           15       86.2      87.6       86.2    86.9
CS           50       73.1      74.9       73.1    73.9
knn          0        96.7      96.8       96.7    96.8
knn          15       83.6      84.6       83.6    84.1
knn          50       63.8      63.2       63.8    63.5
Naive Bayes  0        93.3      92.3       91.9    92.1
Naive Bayes  15       92.3      91.5       91.2    91.4
Naive Bayes  50       0.7       18.3       0.7     NaN
Results
Noisy Datasets: Haberman
  CS does best, at 74.7%
  Naive Bayes performs poorly, at ≈ 43%

Table: Noisy Haberman dataset - Performance of all algorithms
Algorithm    Noise %  Accuracy  Precision  Recall  F-Measure
CS           0        74.7      66.7       61.4    63.9
CS           15       66.1      62.2       61.9    62.0
CS           50       74.5      66.6       63     64.7
knn          0        74.1      65.7       55.1    59.7
knn          15       72.0      56.2       52.3    54.0
knn          50       70.5      51.8       50.6    51.0
Naive Bayes  0        41.0      47.1       46.5    46.8
Naive Bayes  15       43.3      46.2       45.3    45.7
Naive Bayes  50       41.4      34.7       32.4    31.8
Results
Noisy Datasets: Pima-Diabetes
  CS does best, followed by knn
  Naive Bayes degrades badly with noise: 70% to 55.7% to 0%

Table: Noisy Pima-Diabetes dataset - Performance of all algorithms
Algorithm    Noise %  Accuracy  Precision  Recall  F-Measure
CS           0        72.8      72.8       64.2    68.2
CS           15       70.8      68.3       65.8    67.0
CS           50       67.0      64.9       55.9    60.0
knn          0        63.5      64.6       65.9    65.2
knn          15       60.8      61.2       62.3    61.7
knn          50       55.0      55.6       56.1    55.8
Naive Bayes  0        70.3      59.2       56.7    57.9
Naive Bayes  15       55.7      49.4       46.0    NaN
Naive Bayes  50       0         0          0       NaN
Results
Results Summary

Table: Best Algorithm for different Noise Levels
Dataset        0% Noise  15% Noise    50% Noise
A1             knn       knn          knn
A2             knn       knn          knn
Haberman       CS        knn          CS
Iris           knn       Naive Bayes  CS
Pima-Diabetes  CS        CS           CS
Conclusion
No single algorithm is best across all settings.
In general, knn has better accuracy, but CS is more robust to noise.
Naive Bayes degrades much more under noise than the others.
Also: CS performs well on imbalanced datasets.
Future Work
Test with more datasets.
Test performance on imbalanced datasets.
Only the additive noise model was used; try other variations.
Compare with more algorithms.
Questions?

High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 

Exploring the Noise Resilience of Classification Algorithms

ignores the underlying geometry of the data.

k Nearest Neighbors
The k Nearest Neighbors Classifier
Geometry based classifier
Assigns a class to the test data point by determining the majority class of its k nearest points
Pros
Easy to implement and understand
Classes don’t have to be linearly separable
Cons
Tends to ignore the relative importance of attributes; all are used equally
Only indirectly takes the frequency of the data into account

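The kNN vote described above can be sketched in a few lines (a minimal illustration using the dummy dataset from the later slides, not the thesis code):

```python
from collections import Counter

def knn_predict(train, labels, test_point, k=3):
    """Classify test_point by majority vote among its k nearest training points."""
    # Euclidean distance from the test point to every training point
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, test_point)) ** 0.5, lbl)
        for row, lbl in zip(train, labels)
    )
    # Majority class among the k closest
    votes = Counter(lbl for _, lbl in dists[:k])
    return votes.most_common(1)[0][0]

# Dummy dataset from the slides: attributes (A1, A2) and class labels
train = [(3, 2), (1, 2), (4, 2), (3, 2), (1, 1), (2, 2), (3, 3), (4, 1)]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(knn_predict(train, labels, (3, 4), k=3))  # -> 1 (3 nearest labels: 0, 1, 1)
```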
Combined Sturges Classifier

Combined Sturges
The Combined Sturges (CS) Classifier
Explicitly uses geometry + frequency
Data represented as a frequency distribution per class
A Classification Score is computed for each class
Test point assigned to the class with the highest Score
Continuous data values are binned; No. of bins = 1 + log2 n
Sturges, 1926 - Choice of a Class Interval

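The binning rule is one line of code. A small sketch (the usual reading of Sturges' formula takes the ceiling of log2 n; the function name is mine):

```python
import math

def sturges_bins(n):
    """Sturges' rule (1926): 1 + ceil(log2 n) bins for n samples."""
    return 1 + math.ceil(math.log2(n))

print(sturges_bins(306))  # -> 10 bins for the Haberman dataset (306 samples)
```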
Combined Sturges
Dummy Dataset

Table: Dummy Dataset
A1  A2  Class
3   2   1
1   2   1
4   2   0
3   2   1
1   1   0
2   2   1
3   3   0
4   1   0

Combined Sturges
Dummy Dataset

Table: Frequency Distribution on Classes 0 & 1
Class 0:
A1  f(A1)    A2  f(A2)
1   0.25     1   0.50
3   0.25     2   0.25
4   0.50     3   0.25

Class 1:
A1  f(A1)    A2  f(A2)
1   0.25     2   0.75
2   0.25     3   0.25
3   0.50

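These per-class relative frequencies can be computed directly from the dummy dataset; a sketch (the helper name is mine; it reproduces the class-0 columns of the slide's table):

```python
from collections import Counter

def class_frequency_tables(rows, labels):
    """Per-class relative frequency of each attribute value."""
    tables = {}
    for cls in set(labels):
        cls_rows = [r for r, l in zip(rows, labels) if l == cls]
        tables[cls] = [
            {v: c / len(cls_rows) for v, c in Counter(col).items()}
            for col in zip(*cls_rows)  # one frequency dict per attribute column
        ]
    return tables

rows = [(3, 2), (1, 2), (4, 2), (3, 2), (1, 1), (2, 2), (3, 3), (4, 1)]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
ft = class_frequency_tables(rows, labels)
print(ft[0][0])  # f(A1) for class 0: {4: 0.5, 1: 0.25, 3: 0.25}
```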
Combined Sturges
Test Point: T1 = (3, 4)

Combined Sturges
1. Geometric Criterion
Test Point: T1 = (3, 4)
Minimum distance
Classification criterion: Geometric
Classification Score: highest posterior probability

Table: Nearest distance of T1 to Classes
Class 0:
A1  f(A1)    A2  f(A2)
1   0.25     1   0.50
3   0.25     2   0.25
4   0.50     3   0.25

Class 1:
A1  f(A1)    A2  f(A2)
1   0.25     2   0.75
2   0.25     3   0.25
3   0.50

Combined Sturges
Classification Score, S(c), c ∈ {0, 1}

S(0):
A1 = P(Class0) × f(A1)
A2 = P(Class0) × f(A2)
average(A1, A2) = average(0.5 × 0.25, 0.5 × 0.25) = 0.125

S(1):
A1 = P(Class1) × f(A1)
A2 = P(Class1) × f(A2)
average(A1, A2) = average(0.5 × 0.50, 0.5 × 0.25) = 0.187

S(0) < S(1) → Class 1

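A sketch of my reading of the geometric criterion: for each attribute, take the frequency of the stored value nearest to the test value, weight by the class prior, and average over attributes. With the slide's frequency tables it reproduces S(0) = 0.125 and S(1) = 0.1875 (the function name is mine):

```python
def geometric_score(freq_table, prior, test_point):
    """Per attribute: prior * frequency of the value nearest the test value;
    the score is the average over attributes (higher wins)."""
    terms = []
    for attr_freqs, t in zip(freq_table, test_point):
        nearest = min(attr_freqs, key=lambda v: abs(v - t))
        terms.append(prior * attr_freqs[nearest])
    return sum(terms) / len(terms)

# Per-class frequency tables from the slides; both class priors are 0.5
f0 = [{1: 0.25, 3: 0.25, 4: 0.50}, {1: 0.50, 2: 0.25, 3: 0.25}]
f1 = [{1: 0.25, 2: 0.25, 3: 0.50}, {2: 0.75, 3: 0.25}]
print(geometric_score(f0, 0.5, (3, 4)))  # -> 0.125
print(geometric_score(f1, 0.5, (3, 4)))  # -> 0.1875
```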
Combined Sturges
2. Statistical Criterion
Test Point: T1 = (3, 4)
Maximum frequency
Classification criterion: Statistical
Classification Score: minimum distance

Table: Maximum Frequency in Classes
Class 0:
A1  f(A1)    A2  f(A2)
1   0.25     1   0.50
3   0.25     2   0.25
4   0.50     3   0.25

Class 1:
A1  f(A1)    A2  f(A2)
1   0.25     2   0.75
2   0.25     3   0.25
3   0.50

Combined Sturges
Classification Score

S(0):
A1 = (4 − 3) = 1
A2 = (4 − 1) = 3
average(A1, A2) = average(1, 3) = 2

S(1):
A1 = (3 − 3) = 0
A2 = (4 − 2) = 2
average(A1, A2) = average(0, 2) = 1

S(0) > S(1) → Class 1

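A sketch of the statistical criterion as I read it: per attribute, the distance from the test value to the most frequent value in the class, averaged; the smaller score wins. It reproduces S(0) = 2 and S(1) = 1 above (helper name is mine):

```python
def statistical_score(freq_table, test_point):
    """Distance from each test value to the modal value of that attribute
    in the class, averaged over attributes (lower wins)."""
    dists = [
        abs(t - max(attr_freqs, key=attr_freqs.get))
        for attr_freqs, t in zip(freq_table, test_point)
    ]
    return sum(dists) / len(dists)

f0 = [{1: 0.25, 3: 0.25, 4: 0.50}, {1: 0.50, 2: 0.25, 3: 0.25}]
f1 = [{1: 0.25, 2: 0.25, 3: 0.50}, {2: 0.75, 3: 0.25}]
print(statistical_score(f0, (3, 4)))  # -> 2.0
print(statistical_score(f1, (3, 4)))  # -> 1.0
```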
Combined Sturges
3. Combined Criterion
Test Point: T1 = (3, 4)
d = (T1 − A1) · f(A1)
Expected Distance: ED = ED_c(A1) · ED_c(A2)
Minimum Expected Distance, ED

Table: Aggregate Expected Distance, ED
Class 0:
A1  f(A1)  d·f     A2  f(A2)  d·f
1   0.25   0.50    1   0.50   1.50
3   0.25   0       2   0.25   0.50
4   0.50   0.50    3   0.25   0.25
ED0(A1) = 1.00     ED0(A2) = 2.25

Class 1:
A1  f(A1)  d·f     A2  f(A2)  d·f
1   0.25   0.50    2   0.75   1.50
2   0.25   0.25    3   0.25   0.25
3   0.50   0
ED1(A1) = 0.75     ED1(A2) = 1.75

Combined Sturges
Classification Penalty

S(0):
ED = 1.00 × 2.25 = 2.25
S(0) = ED × (1 − P(Class0)) = 1.125

S(1):
ED = 0.75 × 1.75 = 1.31
S(1) = ED × (1 − P(Class1)) = 0.655

S(0) > S(1) → Class 1

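The combined criterion's two steps (expected distance, then prior penalty) can be sketched in one function; it reproduces S(0) = 1.125 and S(1) = 0.65625 (the slide rounds the latter to 0.655; the function name is mine):

```python
def combined_penalty(freq_table, prior, test_point):
    """Per attribute: expected distance ED = sum(|t - v| * f(v)); multiply
    the EDs across attributes, then penalize by (1 - class prior). Lower wins."""
    ed = 1.0
    for attr_freqs, t in zip(freq_table, test_point):
        ed *= sum(abs(t - v) * f for v, f in attr_freqs.items())
    return ed * (1 - prior)

f0 = [{1: 0.25, 3: 0.25, 4: 0.50}, {1: 0.50, 2: 0.25, 3: 0.25}]
f1 = [{1: 0.25, 2: 0.25, 3: 0.50}, {2: 0.75, 3: 0.25}]
print(combined_penalty(f0, 0.5, (3, 4)))  # -> 1.125
print(combined_penalty(f1, 0.5, (3, 4)))  # -> 0.65625
```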
The Noise Model

The Noise Model
Dealing with Noise
Brodley & Friedl, 1999 - detect and reduce noise
Kubica & Moore, 2003 - identify noise using a probabilistic model and remove it
Elias Kalapanidas, 2003 - developed a noise model based on data properties

The Noise Model
Additive Noise: x′ = x + δx, with δx_i,j = σ(x_j) × z_i,j
σ(x_j): standard deviation of attribute j
z_i,j = CDF(p_i,j)

x′_i,j = x_i,j             if p_i,j ≥ n
x′_i,j = x_i,j + δx_i,j    if p_i,j < n    (1)

Based on noise level n ∈ {0, 0.15, 0.30, 0.50, 0.80}

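A minimal sketch of how such an additive model can be applied, under my assumptions that p_i,j is a uniform draw per attribute value and that z_i,j comes from the standard normal inverse CDF; the function name and seeding are illustrative, not from the thesis:

```python
import random
from statistics import NormalDist

def add_noise(rows, noise_level, stds, seed=0):
    """Additive attribute noise: when p_ij < noise_level, replace x_ij
    by x_ij + sigma_j * z_ij, with z_ij from the standard normal CDF."""
    rng = random.Random(seed)
    std_normal = NormalDist()  # standard normal, mean 0, sigma 1
    noisy = []
    for row in rows:
        new_row = []
        for j, x in enumerate(row):
            p = rng.random()           # p_ij ~ Uniform(0, 1)
            if 0 < p < noise_level:    # noise only when p_ij < n
                x = x + stds[j] * std_normal.inv_cdf(p)
            new_row.append(x)
        noisy.append(new_row)
    return noisy
```

At n = 0 the data is returned unchanged, matching the 0% rows in the result tables.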
The Noise Model
Attribute-level Noise

Table: Original Dataset
A1  A2  Class
3   2   1
1   2   1
4   2   0
3   2   1
1   1   0
2   2   1
3   3   0
4   1   0

Table: 40% (n = 0.4) Noisy Dataset
A1   A2    Class
8.5  0.55  1
8.9  2     1
4    0.7   0
3    2     1
4.7  1     0
2    2     1
3    3     0
1.6  0.02  0

Datasets

Datasets
Artificial datasets

Multivariate Normal:
x1 = random Normal vector, t = random Normal vector
x2 = 0.8x1 + 0.6t
x3 = 0.6x1 + 0.8t
x4 = t

Linear Function with Non-normal inputs:
x2 = (x1)² + 0.5t

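A sketch of the multivariate-normal construction, under my assumption that the "random Normal vectors" are vectors of i.i.d. standard normal samples (class labels, which the slide does not specify, are omitted). Note the 0.8/0.6 coefficients keep x2 and x3 unit-variance, since 0.8² + 0.6² = 1:

```python
import random

def make_artificial(n=200, seed=1):
    """Generate n rows (x1, x2, x3, x4) per the slide's recipe."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.normalvariate(0, 1)
        t = rng.normalvariate(0, 1)
        rows.append((x1, 0.8 * x1 + 0.6 * t, 0.6 * x1 + 0.8 * t, t))
    return rows
```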
Datasets
2 Artificial datasets with different Imbalance Ratios; 3 Real Datasets

Table: Comparison of physical properties of Datasets
Dataset        No. of Samples  No. of Classes  No. of Attributes  Attribute Value  Imbalance Ratio
Haberman       306             2               3                  Integer          2.78
A1             200             3               4                  Real             6.66
A2             200             3               4                  Real             39
Iris           150             3               4                  Real             2
Pima Diabetes  768             2               8                  Integer, Real    1.87

Process Flow
1. Create Artificial Datasets
2. Implement the Noise model on all Datasets
3. Apply the three algorithms
4. Compare the results

Results

Results
Performance Measures

Table: Confusion matrix for 2 classes
                 Predicted Positive  Predicted Negative
Actual Positive  TP                  FN
Actual Negative  FP                  TN

Accuracy:  Acc = (TP + TN) / (TP + TN + FP + FN)
Precision: P = TP / (TP + FP)
Recall:    R = TP / (TP + FN)
F-measure: Fα = PR / (αP + (1 − α)R)

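The four measures above follow directly from the confusion-matrix counts; a small sketch (α = 0.5 makes Fα the usual harmonic-mean F1):

```python
def metrics(tp, fn, fp, tn, alpha=0.5):
    """Accuracy, precision, recall and F-measure from a 2-class confusion matrix."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = (p * r) / (alpha * p + (1 - alpha) * r)  # F_alpha; alpha=0.5 gives F1
    return acc, p, r, f

print(metrics(40, 10, 10, 40))  # symmetric example: all four measures ≈ 0.8
```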
Results
Non-Noisy Datasets
Artificial Datasets: knn does best - 91.2% & 93.7%
Good improvement in CS from 65% to 76%

Table: Non-Noisy Artificial Datasets - Performance of all algorithms
Dataset  Algorithm    Accuracy  Precision  Recall  F-measure
A1       CS           65.0      63.5       70.1    66.6
A1       knn          91.2      92.8       87.4    89.8
A1       Naive Bayes  60.2      61.6       60.14   64.1
A2       CS           76.0      68.4       71.62   69.7
A2       knn          93.7      94.7       91.9    93.2
A2       Naive Bayes  63.1      61.1       65.2    63.5

Results
Real Datasets:
Iris: knn does best, followed by Naive Bayes
Haberman: CS does best; Naive Bayes is really bad
Pima-Diabetes: CS is best; Naive Bayes follows

Table: Non-Noisy Real Datasets - Performance of all algorithms
Dataset        Algorithm    Accuracy  Precision  Recall  F-Measure
Iris           CS           94.3      95.1       94.3    94.7
Iris           knn          96.7      96.8       96.7    96.8
Iris           Naive Bayes  96.2      93.7       95      94.3
Haberman       CS           75.2      67.2       61.6    64.2
Haberman       knn          73.4      63.2       54.8    58.5
Haberman       Naive Bayes  0.5       41.9       47.6    47.3
Pima-Diabetes  CS           73.7      74.9       65.1    69.6
Pima-Diabetes  knn          64.5      65.6       66.9    66.3
Pima-Diabetes  Naive Bayes  70.3      59.2       56.7    57.9

Results
Noisy Datasets: A1
knn does best
For both knn and CS, no change with noise
Naive Bayes does badly

Table: Noisy A1 dataset - Performance of all algorithms
Algorithm    Noise %  Accuracy  Precision  Recall  F-Measure
CS           0        65        63.5       70.1    66.6
CS           15       64.8      63.4       96.7    96.8
CS           50       65.5      63.2       95      94.3
knn          0        87.5      87.2       61.6    61.6
knn          15       87.3      88.1       54.8    58.5
knn          50       86.7      88.5       47.6    47.3
Naive Bayes  0        ≈ 0       ≈ 0        ≈ 0     ≈ 0
Naive Bayes  15       ≈ 0       ≈ 0        ≈ 0     ≈ 0
Naive Bayes  50       ≈ 0       ≈ 0        ≈ 0     ≈ 0

Results
Noisy Datasets: A2
knn does best, but drops from 92.6% to 86.3%
For CS, no change with noise
From A1 to A2, CS improves from 65% to 76%

Table: Noisy A2 dataset - Performance of all algorithms
Algorithm    Noise %  Accuracy  Precision  Recall  F-Measure
CS           0        76.0      68.4       71.6    69.7
CS           15       76.8      64.7       73.1    68.4
CS           50       76.4      66.9       71.7    68.5
knn          0        92.6      86.9       85.5    86.2
knn          15       91.1      84.2       84.2    83.5
knn          50       86.3      83.0       78.2    77.9
Naive Bayes  0        ≈ 0       ≈ 0        ≈ 0     ≈ 0
Naive Bayes  15       ≈ 0       ≈ 0        ≈ 0     ≈ 0
Naive Bayes  50       ≈ 0       ≈ 0        ≈ 0     ≈ 0

Results
Noisy Datasets: Iris
knn does best at 0% noise (96.7%), then CS at 94.5%
CS does best at 50% noise (73.1%), then knn at 63.8%

Table: Noisy Iris dataset - Performance of all algorithms
Algorithm    Noise %  Accuracy  Precision  Recall  F-Measure
CS           0        94.5      94.9       94.5    94.7
CS           15       86.2      87.6       86.2    86.9
CS           50       73.1      74.9       73.1    73.9
knn          0        96.7      96.8       96.7    96.8
knn          15       83.6      84.6       83.6    84.1
knn          50       63.8      63.2       63.8    63.5
Naive Bayes  0        93.3      92.3       91.9    92.1
Naive Bayes  15       92.3      91.5       91.2    91.4
Naive Bayes  50       0.7       18.3       0.7     NaN

Results
Noisy Datasets: Haberman
CS does best at 74.7%
Naive Bayes performs badly at ≈ 43%

Table: Noisy Haberman dataset - Performance of all algorithms
Algorithm    Noise %  Accuracy  Precision  Recall  F-Measure
CS           0        74.7      66.7       61.4    63.9
CS           15       66.1      62.2       61.9    62.0
CS           50       74.5      66.6       63      64.7
knn          0        74.1      65.7       55.1    59.7
knn          15       72.0      56.2       52.3    54.0
knn          50       70.5      51.8       50.6    51.0
Naive Bayes  0        41.0      47.1       46.5    46.8
Naive Bayes  15       43.3      46.2       45.3    45.7
Naive Bayes  50       41.4      34.7       32.4    31.8

Results
Noisy Datasets: Pima-Diabetes
CS does best, followed by knn
Naive Bayes degrades with noise: 70% → 55.7% → 0%

Table: Noisy Pima-Diabetes dataset - Performance of all algorithms
Algorithm    Noise %  Accuracy  Precision  Recall  F-Measure
CS           0        72.8      72.8       64.2    68.2
CS           15       70.8      68.3       65.8    67
CS           50       67.0      64.9       55.9    60.0
knn          0        63.5      64.6       65.9    65.2
knn          15       60.8      61.2       62.3    61.7
knn          50       55.0      55.6       56.1    55.8
Naive Bayes  0        70.3      59.2       56.7    57.9
Naive Bayes  15       55.7      49.4       46.0    NaN
Naive Bayes  50       0         0          0       NaN

Results
Results Summary

Table: Best Algorithm for different Noise Levels
Dataset        0% Noise  15% Noise    50% Noise
A1             knn       knn          knn
A2             knn       knn          knn
Haberman       CS        knn          CS
Iris           knn       Naive Bayes  CS
Pima-Diabetes  CS        CS           CS

Conclusion
No single algorithm is best.
In general, knn has better accuracy, but CS is more robust to noise.
Naive Bayes does much worse with noise than the others.
Also: CS performs well on imbalanced datasets.

Future Work
Test with more datasets.
Test for performance on imbalanced datasets.
Only an additive noise model was used; try other variations.
Compare with more algorithms.

Questions
Questions?