Enhancing the performance of Naive Bayesian Classifier using Information Gain concept of Decision Tree
7/29/2013
• Abstract of the work
• Why do we need it?
• Naïve Bayesian Classifier
  – Definition
  – Algorithm
  – Gaussian Distribution
• Decision Tree
  – Definition
  – Algorithm
  – Information Gain
• My Algorithm
• Experimental Design
• Experimental Results
• Remarks
Abstract of the work
• Apply the Naïve Bayesian classifier.
• Build a decision tree based on information gain and select attributes from it.
• Apply the Naïve Bayesian classifier again using only the selected attributes.
• Minimize the time and space needed for the analysis.
• Can work with continuous data streams.
Why do we need it?
1. Nowadays the volume of data generated by internet users keeps growing.
2. Machine learning is getting harder day by day.
3. Pre-processing the data may be a solution.
4. Using only the necessary data can make the learning process faster.
5. A better technique can make the process more organized by using only the necessary data.
6. Cut off all unimportant attributes from the dataset.
7. The dataset becomes compact in terms of attributes, so calculation becomes faster.
8. Achieve better performance than before in terms of time and space.
Background
• Naïve Bayesian Classifier
• Gaussian Distribution
• Decision Tree
• Information Gain
Naïve Bayesian Classifier: Definition
• The Naïve Bayesian classifier (NB) is a straightforward and frequently used method for supervised learning.
• It provides a flexible way of dealing with any number of attributes or classes.
• It is based on statistical probability theory.
• It is the asymptotically fastest learning algorithm that examines all of its training input.
• It has been shown to perform surprisingly well on a wide variety of problems, despite the simplicity of the model.
• Furthermore, small amounts of bad data, or "noise," do not perturb the results much.
Naïve Bayesian Classifier: Algorithm
• There are classes, say Ck, into which the data are to be classified.
• Each class has a probability P(Ck) that represents the prior probability that an instance belongs to Ck.
• For n attribute values vj, the goal of classification is to find the conditional probability P(Ck | v1 ∧ v2 ∧ … ∧ vn).
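• By Bayes' rule, this probability is equivalent to
$$P(C_k \mid v_1 \wedge \dots \wedge v_n) = \frac{P(v_1 \wedge \dots \wedge v_n \mid C_k)\,P(C_k)}{P(v_1 \wedge \dots \wedge v_n)}$$
Under the naïve assumption that the attributes are conditionally independent given the class, $P(v_1 \wedge \dots \wedge v_n \mid C_k) = \prod_{j=1}^{n} P(v_j \mid C_k)$. Since the denominator is the same for every class, the instance is assigned to the class that maximizes $P(C_k)\prod_{j=1}^{n} P(v_j \mid C_k)$.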
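A minimal sketch of this decision rule for categorical attributes, assuming maximum-likelihood estimates taken straight from counts (no smoothing, so an attribute value never seen with a class zeroes that class's score). The function names and the toy data are hypothetical, not the author's implementation.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(class, attribute_index)][value] -> count
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(label, j)][value] += 1
    return class_counts, value_counts


def predict_nb(model, row):
    """Return the class maximizing P(C_k) * prod_j P(v_j | C_k)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # prior P(C_k)
        for j, value in enumerate(row):
            # likelihood P(v_j | C_k); an unseen value contributes 0 (no smoothing)
            score *= value_counts[(label, j)][value] / count
        if score > best_score:
            best_class, best_score = label, score
    return best_class


rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("rain", "mild")))
```

Dropping the shared denominator keeps the arithmetic to one product per class; in practice a smoothing term is usually added so that a single unseen value cannot veto a class.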
Gaussian Distribution
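• The probability density of a Gaussian distribution at a particular point X is
$$g(X, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$$
where µ is the mean and σ is the standard deviation of the continuous-valued attribute X. For a continuous attribute, the Naïve Bayesian classifier uses this density as P(vj | Ck), with µ and σ estimated from the training instances of class Ck.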
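A short sketch of the density and of its use as a class-conditional likelihood. It assumes the sample standard deviation (n - 1 denominator) is used to estimate σ from the class's training values and that σ > 0; the helper names are hypothetical.

```python
import math
import statistics


def gaussian_pdf(x, mu, sigma):
    """Gaussian density g(x, mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def class_likelihood(x, class_values):
    """Estimate P(x | C_k) for a continuous attribute from the values observed in class C_k."""
    mu = statistics.mean(class_values)
    sigma = statistics.stdev(class_values)  # sample standard deviation; needs >= 2 values
    return gaussian_pdf(x, mu, sigma)


print(gaussian_pdf(0.0, 0.0, 1.0))
print(class_likelihood(5.0, [4.0, 5.0, 6.0]))
```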
Decision Tree: Definition
1. Decision trees are among the most popular methods for inductive inference.
2. The basic algorithm for decision tree induction is a greedy algorithm that constructs decision trees in a top-down, recursive, divide-and-conquer manner.
3. The main criterion for selecting attributes while constructing a decision tree is Information Gain (IG).
Decision Tree: Algorithm
• The basic idea behind any decision tree algorithm is as follows:
  – Using information gain, choose the best attribute to split the remaining instances and make that attribute a decision node.
  – Repeat this process recursively for each child.
  – Stop when:
    – all the instances have the same target attribute value,
    – there are no more attributes, or
    – there are no more instances.
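The greedy top-down procedure can be sketched as an ID3-style recursion over categorical attributes. This is a minimal illustration, not C4.5 (no continuous splits, gain ratio, or pruning); the function names, the tuple-based tree representation, and the majority-class fallback for empty branches are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Expected information -sum p_k log2 p_k of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy before the split minus the weighted entropy of the subsets after splitting on attr."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs, default=None):
    """Return a class label (leaf) or a tuple (attr, {value: subtree}) (decision node)."""
    if not labels:  # no more instances: fall back to the parent's majority class
        return default
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:  # pure node, or no more attributes
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attrs if a != best],
            majority,
        )
    return (best, branches)


rows = [("sunny", "weak"), ("sunny", "strong"), ("rain", "weak"), ("rain", "strong")]
labels = ["yes", "no", "yes", "no"]
print(build_tree(rows, labels, [0, 1]))
```

In this toy data attribute 1 separates the classes perfectly (gain 1 bit) while attribute 0 gives no gain, so the root splits on attribute 1 and both children stop immediately as pure nodes.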
Decision tree example (figure):
  Leave At = 8 AM → Long
  Leave At = 9 AM → Accident? (No → Medium; Yes → Long)
  Leave At = 10 AM → Stall? (No → Short; Yes → Long)
If we leave at 9 AM and there is no accident on the road, what will our commute time be?
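Answering the question amounts to following one branch per decision node until a leaf is reached. The sketch below hard-codes the tree as read from the figure's labels (that reading, and the attribute names `leave_at`, `accident` and `stall`, are assumptions) and walks it for the 9 AM, no-accident case.

```python
# Assumed encoding of the figure's commute-time tree:
# a decision node is (attribute, {value: subtree}); a leaf is a plain string.
COMMUTE_TREE = ("leave_at", {
    "8 AM": "Long",
    "9 AM": ("accident", {"No": "Medium", "Yes": "Long"}),
    "10 AM": ("stall", {"No": "Short", "Yes": "Long"}),
})


def classify(tree, instance):
    """Follow the branch matching the instance's value at each decision node until a leaf."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[instance[attribute]]
    return tree


print(classify(COMMUTE_TREE, {"leave_at": "9 AM", "accident": "No"}))  # Medium
```

Only the attributes on the path are consulted: whether the car stalls is irrelevant for a 9 AM departure.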
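Information Gain
• The critical step in building a decision tree is selecting the best test attribute.
• The information gain measure is used to select the test attribute at each node in the tree.
• The expected information needed to classify a given sample is
$$I(s_1, s_2, \dots, s_m) = -\sum_{k=1}^{m} p_k \log_2 p_k$$
where s is the number of samples, sk is the number of samples in class Ck (k = 1, …, m), and pk is the probability that an arbitrary sample belongs to class Ck, estimated by sk / s.
• If attribute A partitions the samples into subsets S1, …, Sv, where subset Sj contains skj samples of class Ck, the information gain of A is
$$\mathrm{Gain}(A) = I(s_1, \dots, s_m) - E(A), \qquad E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \dots + s_{mj}}{s}\, I(s_{1j}, \dots, s_{mj})$$
and the attribute with the highest gain is chosen as the test attribute.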
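The expected-information formula can be evaluated directly from per-class counts. This sketch uses made-up illustrative counts (14 samples, 9 in one class and 5 in the other, and an attribute with three values); the helper names are hypothetical. Classes with a zero count are skipped, following the convention 0 · log2 0 = 0.

```python
import math


def expected_information(class_counts):
    """I(s_1, ..., s_m) = -sum_k p_k log2 p_k with p_k = s_k / s; empty classes contribute 0."""
    s = sum(class_counts)
    return -sum(sk / s * math.log2(sk / s) for sk in class_counts if sk > 0)


def gain(class_counts, partition_counts):
    """Gain(A) = I(s_1, ..., s_m) - E(A).

    partition_counts holds one list of per-class counts for each value of attribute A;
    E(A) weights each subset's expected information by its share of the samples.
    """
    s = sum(class_counts)
    e_a = sum(sum(sub) / s * expected_information(sub) for sub in partition_counts)
    return expected_information(class_counts) - e_a


# Illustrative counts: 9 samples in C_1 and 5 in C_2; attribute A splits them three ways.
print(round(expected_information([9, 5]), 3))
print(round(gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 3))
```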
My Algorithm
1. Run the Naïve Bayesian classifier on the training dataset.
2. Run C4.5 on the same training data used in step 1.
3. Select as relevant features only the attributes that appear in the simplified decision tree.
4. Run the Naïve Bayesian classifier on the training data using only the attributes selected in step 3.
5. Compare the result of step 4 with that of step 1.
Experimental Design
• Each dataset is shuffled randomly.
• Disjoint training and test sets are produced as follows:
  – 80% training and 20% test data
  – 70% training and 30% test data
  – 60% training and 40% test data
• For each pair of training and test sets, run:
  – Naïve Bayesian Classifier (NBC)
  – C4.5
  – Selective Bayesian Classifier (SBC)
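A stdlib-only sketch of the shuffle-and-split step, assuming a fixed random seed for repeatability (the seed and the helper name are hypothetical). Rounding the test size up reproduces, for example, the Diabetes test-set sizes of 154, 231 and 308 instances out of 768; scikit-learn's `train_test_split` likewise rounds a fractional `test_size` up.

```python
import random


def holdout_split(rows, test_percent, seed=0):
    """Shuffle a copy of rows, then cut it into disjoint (train, test) lists.

    The test set holds test_percent % of the rows, rounded up.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = (len(shuffled) * test_percent + 99) // 100  # integer ceiling
    return shuffled[n_test:], shuffled[:n_test]


for test_percent in (20, 30, 40):  # 80:20, 70:30, 60:40
    train, test = holdout_split(range(768), test_percent)
    print(f"{100 - test_percent}:{test_percent} -> {len(train)} train / {len(test)} test")
```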
Experimental Results
Dataset | Instances | Attributes | Attributes selected
Iris | 150 | 4 | 2
Diabetes | 768 | 8 | 6
Ionosphere | 351 | 34 | 14
Breast Cancer | 286 | 9 | 6
Ecoli | 336 | 8 | 7
Number of instances and attributes before and after decision-tree attribute selection

Number of test instances classified correctly
Iris
Training : Test | Test instances | NB correct | NB accuracy | Selective NB correct | Selective NB accuracy
80 : 20 | 30 | 27 | 90% | 29 | 96.67%
70 : 30 | 45 | 42 | 93.33% | 43 | 95.56%
60 : 40 | 60 | 56 | 93.33% | 57 | 95%
Diabetes
Training : Test | Test instances | NB correct | NB accuracy | Selective NB correct | Selective NB accuracy
80 : 20 | 154 | 119 | 77.27% | 126 | 81.82%
70 : 30 | 231 | 173 | 76.20% | 181 | 78.35%
60 : 40 | 308 | 239 | 77.60% | 246 | 79.87%
Breast Cancer
Training : Test | Test instances | NB correct | NB accuracy | Selective NB correct | Selective NB accuracy
80 : 20 | 137 | 134 | 97.81% | 135 | 98.54%
70 : 30 | 205 | 200 | 97.56% | 202 | 98.54%
60 : 40 | 274 | 261 | 95.26% | 264 | 96.35%
Ecoli
Training : Test | Test instances | NB correct | NB accuracy | Selective NB correct | Selective NB accuracy
80 : 20 | 68 | 56 | 82.35% | 58 | 85.29%
70 : 30 | 101 | 81 | 80.20% | 82 | 81.19%
60 : 40 | 135 | 110 | 81.48% | 110 | 81.48%
Ionosphere
Training : Test | Test instances | NB correct | NB accuracy | Selective NB correct | Selective NB accuracy
80 : 20 | 81 | 74 | 91.36% | 78 | 96.30%
70 : 30 | 106 | 97 | 91.51% | 100 | 94.34%
60 : 40 | 141 | 131 | 92.91% | 134 | 95.04%
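Each accuracy in these tables is the number of correctly classified test instances divided by the test-set size. A minimal check of that arithmetic, using the Iris counts from the table (the helper name is hypothetical):

```python
def accuracy_percent(correct, total):
    """Share of test instances classified correctly, as a percentage rounded to 2 decimals."""
    return round(100.0 * correct / total, 2)


# Iris rows copied from the table: split -> (test instances, NB correct, selective NB correct)
iris = {"80 : 20": (30, 27, 29), "70 : 30": (45, 42, 43), "60 : 40": (60, 56, 57)}
for split, (total, nb, snb) in iris.items():
    print(split, accuracy_percent(nb, total), accuracy_percent(snb, total))
```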
Result of cross-validation (10-fold): Iris
Fold | NB correct | Selective NB correct | Test instances
1 | 15 | 16 | 16
2 | 16 | 16 | 16
3 | 14 | 14 | 16
4 | 16 | 16 | 16
5 | 13 | 13 | 16
6 | 16 | 16 | 16
7 | 15 | 15 | 16
8 | 15 | 16 | 16
9 | 15 | 16 | 16
10 | 15 | 15 | 15
Result of cross-validation (10-fold): Breast Cancer
Fold | NB correct | Selective NB correct | Test instances
1 | 65 | 63 | 69
2 | 68 | 68 | 69
3 | 68 | 68 | 69
4 | 66 | 65 | 69
5 | 65 | 65 | 69
6 | 66 | 66 | 69
7 | 68 | 69 | 69
8 | 67 | 68 | 69
9 | 65 | 66 | 69
10 | 67 | 69 | 69
Result of cross-validation (10-fold): Diabetes
Fold | NB correct | Selective NB correct | Test instances
1 | 69 | 68 | 77
2 | 53 | 56 | 77
3 | 56 | 57 | 77
4 | 61 | 62 | 77
5 | 65 | 64 | 77
6 | 56 | 56 | 77
7 | 56 | 57 | 77
8 | 60 | 59 | 77
9 | 52 | 54 | 77
10 | 59 | 60 | 77
Result of cross-validation (10-fold): Ecoli
Fold | NB correct | Selective NB correct | Test instances
1 | 21 | 21 | 34
2 | 31 | 31 | 34
3 | 31 | 31 | 34
4 | 26 | 26 | 34
5 | 25 | 25 | 34
6 | 23 | 23 | 34
7 | 24 | 24 | 34
8 | 27 | 27 | 34
9 | 29 | 29 | 34
10 | 30 | 30 | 34
Result of cross-validation (10-fold): Ionosphere
Fold | NB correct | Selective NB correct | Test instances
1 | 35 | 33 | 36
2 | 33 | 33 | 36
3 | 31 | 32 | 36
4 | 33 | 34 | 36
5 | 33 | 35 | 36
6 | 30 | 31 | 36
7 | 30 | 31 | 36
8 | 32 | 33 | 36
9 | 31 | 31 | 36
10 | 33 | 34 | 36
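The cross-validation tables list correct counts per fold but no overall figure. As a sketch (helper names hypothetical), per-fold counts can be summarized either by pooling (total correct over total test instances) or by averaging the per-fold accuracies; the two differ slightly when folds have unequal sizes. The Iris counts below are copied from the table.

```python
def pooled_accuracy(fold_correct, fold_sizes):
    """Overall accuracy: total correct predictions over total test instances across folds."""
    return sum(fold_correct) / sum(fold_sizes)


def mean_fold_accuracy(fold_correct, fold_sizes):
    """Alternative summary: unweighted mean of the per-fold accuracies."""
    return sum(c / n for c, n in zip(fold_correct, fold_sizes)) / len(fold_sizes)


# Per-fold counts from the Iris cross-validation table.
iris_nb = [15, 16, 14, 16, 13, 16, 15, 15, 15, 15]
iris_snb = [16, 16, 14, 16, 13, 16, 15, 16, 16, 15]
iris_sizes = [16] * 9 + [15]

print(f"NB: {pooled_accuracy(iris_nb, iris_sizes):.4f}")
print(f"Selective NB: {pooled_accuracy(iris_snb, iris_sizes):.4f}")
```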
• Datasets:
  – UCI Machine Learning Repository
  – Datasets provided with Weka
• Software and tools:
  – Weka 3.6.9
  – Python data mining libraries: sklearn, numpy, pylab
Thank You