Classification Based Machine Learning Algorithms

Md. Main Uddin Rony, Software Developer at Infolytx, Inc.

This slide deck covers the working procedure of some well-known classification-based machine learning algorithms.

What is Classification?
● Classification is a data mining task that predicts the value of a categorical variable (the target or class).
● The prediction comes from a model built on one or more numerical and/or categorical variables (predictors, attributes, or features).
● It is considered an instance of supervised learning.
● The corresponding unsupervised procedure is known as clustering.

Classification Based Algorithms
Four main groups of classification algorithms are:
● Frequency Table
- ZeroR
- OneR
- Naive Bayesian
- Decision Tree
● Covariance Matrix
- Linear Discriminant Analysis
- Logistic Regression
● Similarity Functions
- K Nearest Neighbours
● Others
- Artificial Neural Network
- Support Vector Machine

Naive Bayes Classifier
● Works based on Bayes’ theorem
● Why is it called naive?
- Because it assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature
● Easy to build
● Useful for very large data sets

Bayes’ Theorem
The theorem can be stated mathematically as follows:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
P(A) and P(B) are the probabilities of observing A and B without regard to each other. P(A) is also known as the prior probability.
P(A | B), the posterior probability, is the conditional probability of observing event A given that B is true.
P(B | A), the likelihood, is the conditional probability of observing event B given that A is true.
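As a quick numeric check of the theorem, here is a minimal Python sketch. The function name `bayes_posterior` and the three input probabilities are hypothetical values chosen only for illustration; they do not come from the slides.

```python
def bayes_posterior(prior_a: float, likelihood_b_given_a: float, evidence_b: float) -> float:
    """Return P(A | B) = P(B | A) * P(A) / P(B)."""
    if evidence_b <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood_b_given_a * prior_a / evidence_b


# Hypothetical numbers, for illustration only.
p_a = 0.01          # prior P(A)
p_b_given_a = 0.9   # P(B | A)
p_b = 0.05          # P(B)

posterior = bayes_posterior(p_a, p_b_given_a, p_b)
print(round(posterior, 4))
```

Observing B raises the probability of A from the prior 0.01 to roughly 0.18 with these made-up inputs.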
So, how does the Naive Bayes classifier work based on this?

How Does Naive Bayes Work?
● Let D be a training set of tuples, where each tuple is represented by an n-dimensional attribute vector, X = (x1, x2, ..., xn).
● Suppose that there are m classes, C1, C2, ..., Cm. Given a tuple X, the classifier predicts that X belongs to the class with the highest posterior probability conditioned on X. That is, the Naive Bayesian classifier predicts that tuple X belongs to class Ci if and only if
$$P(C_i \mid X) > P(C_j \mid X) \quad \text{for } 1 \le j \le m,\ j \ne i$$
● By Bayes’ theorem,
$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$$
● P(X) is constant for all classes, so only P(X | Ci) P(Ci) needs to be maximized.


How Does Naive Bayes Work? (Contd.)
● To reduce computation in evaluating P(X | Ci), the naive assumption of class-conditional independence is made. This presumes that the attributes’ values are conditionally independent of one another given the class label of the tuple (i.e., that there are no dependence relationships among the attributes).
● Thus,
$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \cdots \times P(x_n \mid C_i)$$

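Under class-conditional independence, P(X | Ci) is simply the product of the per-attribute conditionals. The sketch below illustrates that product in Python; the helper name `class_conditional_likelihood` and the two small probability tables are hypothetical, invented only for this illustration.

```python
from math import prod


def class_conditional_likelihood(x, cond_probs):
    """P(X | Ci) under the naive assumption: the product of P(x_k | Ci)."""
    return prod(cond_probs[k][value] for k, value in enumerate(x))


# Hypothetical per-attribute tables for one class Ci:
# cond_probs[k][v] stands for P(x_k = v | Ci).
cond_probs = [
    {"Y": 0.6, "N": 0.4},
    {"Y": 0.8, "N": 0.2},
]

likelihood = class_conditional_likelihood(("Y", "N"), cond_probs)
print(likelihood)
```

Only n small tables per class are needed, rather than one joint table over every combination of attribute values.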
How Naive Bayes Works (Hands-on Calculation)
Given all the previous patients’ symptoms and diagnoses:

chills  runny nose  headache  fever  flu?
Y       N           Mild      Y      N
Y       Y           No        N      Y
Y       N           Strong    Y      Y
N       Y           Mild      Y      Y
N       N           No        N      N
N       Y           Strong    Y      Y
N       Y           Strong    N      N
Y       Y           Mild      Y      Y

Does the patient with the following symptoms have the flu?

chills  runny nose  headache  fever  flu?
Y       N           Mild      N      ?

How Naive Bayes Works (Hands-on Calculation, Contd.)
First, we compute all possible individual probabilities conditioned on the target attribute (flu).

Probability                 Value    Probability                 Value
P(flu=Y)                    0.625    P(flu=N)                    0.375
P(chills=Y|flu=Y)           0.6      P(chills=Y|flu=N)           0.333
P(chills=N|flu=Y)           0.4      P(chills=N|flu=N)           0.666
P(runny nose=Y|flu=Y)       0.8      P(runny nose=Y|flu=N)       0.333
P(runny nose=N|flu=Y)       0.2      P(runny nose=N|flu=N)       0.666
P(headache=Mild|flu=Y)      0.4      P(headache=Mild|flu=N)      0.333
P(headache=No|flu=Y)        0.2      P(headache=No|flu=N)        0.333
P(headache=Strong|flu=Y)    0.4      P(headache=Strong|flu=N)    0.333
P(fever=Y|flu=Y)            0.8      P(fever=Y|flu=N)            0.333
P(fever=N|flu=Y)            0.2      P(fever=N|flu=N)            0.666

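These probabilities are relative frequencies counted from the eight training rows. A minimal Python sketch of that counting step follows; the names `ATTRS`, `ROWS`, and `fit` are illustrative choices, and no smoothing is applied, which is an assumption that matches the plain counts used on the slide.

```python
from collections import Counter

ATTRS = ["chills", "runny nose", "headache", "fever"]

# Training rows from the patient table: four symptoms, then the flu label.
ROWS = [
    ("Y", "N", "Mild", "Y", "N"),
    ("Y", "Y", "No", "N", "Y"),
    ("Y", "N", "Strong", "Y", "Y"),
    ("N", "Y", "Mild", "Y", "Y"),
    ("N", "N", "No", "N", "N"),
    ("N", "Y", "Strong", "Y", "Y"),
    ("N", "Y", "Strong", "N", "N"),
    ("Y", "Y", "Mild", "Y", "Y"),
]


def fit(rows):
    """Return (priors, cond) with cond[(attr, value, label)] = P(attr=value | flu=label)."""
    label_counts = Counter(row[-1] for row in rows)
    pair_counts = Counter()
    for row in rows:
        # zip stops after the four symptom columns, so the label is not counted here.
        for attr, value in zip(ATTRS, row):
            pair_counts[(attr, value, row[-1])] += 1
    priors = {label: n / len(rows) for label, n in label_counts.items()}
    cond = {key: n / label_counts[key[2]] for key, n in pair_counts.items()}
    return priors, cond


priors, cond = fit(ROWS)
print(priors["Y"], cond[("chills", "Y", "Y")], round(cond[("fever", "N", "N")], 3))
```

The exact value of P(fever=N|flu=N) is 2/3; the table lists it truncated to 0.666.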
How Naive Bayes Works (Hands-on Calculation, Contd.)
And then decide, comparing the two numerators (the common factor P(X) is dropped):
P(flu=Y | given attributes) ∝ P(chills=Y|flu=Y) × P(runny nose=N|flu=Y) × P(headache=Mild|flu=Y) × P(fever=N|flu=Y) × P(flu=Y)
= 0.6 × 0.2 × 0.4 × 0.2 × 0.625 = 0.006
vs.
P(flu=N | given attributes) ∝ P(chills=Y|flu=N) × P(runny nose=N|flu=N) × P(headache=Mild|flu=N) × P(fever=N|flu=N) × P(flu=N)
= 0.333 × 0.666 × 0.333 × 0.666 × 0.375 = 0.0184
Since 0.0184 > 0.006, the Naive Bayes classifier predicts that the patient doesn’t have the flu.

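The comparison can be reproduced with a short Python sketch. The dictionaries `PRIOR` and `COND` restate the slide's probabilities as exact fractions (an assumption about the underlying counts, e.g. 0.333 as 1/3), and the final normalization step is an addition that the slide does not perform.

```python
# Probabilities from the slide's table, written as exact fractions.
PRIOR = {"Y": 5 / 8, "N": 3 / 8}
COND = {
    "Y": {"chills=Y": 3 / 5, "runny nose=N": 1 / 5, "headache=Mild": 2 / 5, "fever=N": 1 / 5},
    "N": {"chills=Y": 1 / 3, "runny nose=N": 2 / 3, "headache=Mild": 1 / 3, "fever=N": 2 / 3},
}


def score(label):
    """Unnormalized posterior: P(flu=label) times the product of the conditionals."""
    result = PRIOR[label]
    for p in COND[label].values():
        result *= p
    return result


scores = {label: score(label) for label in PRIOR}

# Dividing by the sum recovers proper posteriors without changing the winner.
total = sum(scores.values())
posteriors = {label: s / total for label, s in scores.items()}

prediction = max(scores, key=scores.get)
print(scores, posteriors, prediction)
```

With exact fractions the flu=N score comes out near 0.0185 rather than 0.0184; the small difference is due to the truncated values 0.333 and 0.666 used on the slide, and the prediction is unchanged.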
Decision Tree
● A decision tree builds classification or regression models in the form of a tree structure.
● It breaks down a dataset into smaller and smaller subsets while an associated decision tree is incrementally developed.
● The final result is a tree with decision nodes and leaf nodes.
- A decision node has two or more branches
- A leaf node represents a classification or decision
● The topmost decision node in a tree, which corresponds to the best predictor, is called the root node.
● Decision trees can handle both categorical and numerical data.

Example Set We Will Work On

Outlook   Temp  Humidity  Windy  Play Golf
Rainy     Hot   High      False  No
Rainy     Hot   High      True   No
Overcast  Hot   High      False  Yes
Sunny     Mild  High      False  Yes
Sunny     Cool  Normal    False  Yes
Sunny     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Rainy     Mild  High      False  No
Rainy     Cool  Normal    False  Yes
Sunny     Mild  Normal    False  Yes
Rainy     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Sunny     Mild  High      True   No

  • 15. How it works ● The core algorithm for building decision trees is called ID3, developed by J. R. Quinlan ● ID3 uses Entropy and Information Gain to construct a decision tree
  • 16. Entropy ● A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous) ● The ID3 algorithm uses entropy to calculate the homogeneity of a sample ● If the sample is completely homogeneous, the entropy is zero; if the sample is equally divided between two classes, the entropy is one
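The two extreme cases can be checked numerically. The sketch below assumes the usual base-2 Shannon entropy; the function name `entropy` and its list-of-counts interface are illustrative choices, not taken from the slides.

```python
from math import log2

def entropy(counts):
    """Base-2 Shannon entropy of a list of class counts."""
    total = sum(counts)
    # Classes with a zero count contribute nothing (0 * log 0 is taken as 0).
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

print(entropy([14, 0]))  # completely homogeneous sample
print(entropy([7, 7]))   # sample equally divided between two classes
print(entropy([9, 5]))   # Play Golf target from slide 13: 9 Yes, 5 No
```

With more than two classes, the maximum entropy exceeds one (it is log2 of the number of classes), so the "entropy of one" statement applies to the two-class case.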
  • 17. Compute Two Types of Entropy ● To build a decision tree, we need to calculate two types of entropy using frequency tables as follows: ● a) Entropy using the frequency table of one attribute (Entropy of the Target):
  • 18. ● b) Entropy using the frequency table of two attributes:
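The formulas from slides 17 and 18 did not survive in this transcript. The definitions conventionally used with ID3 are sketched below; the symbols $S$, $T$, $X$, $p_i$ and $T_v$ are notation chosen here and may differ from the slides' own.

```latex
% (a) Entropy of the target, from the frequency table of one attribute.
%     p_i is the proportion of class i among the c classes in sample S.
E(S) = \sum_{i=1}^{c} -p_i \log_2 p_i

% (b) Entropy of target T after splitting on attribute X (frequency table of two attributes).
%     P(v) is the fraction of rows with X = v, and T_v is the target restricted to those rows.
E(T, X) = \sum_{v \in X} P(v) \, E(T_v)
```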
  • 19. Information Gain ● The information gain is based on the decrease in entropy after a dataset is split on an attribute ● Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches)
  • 20. Example ● Step 1: Calculate the entropy of the target
  • 21. Example ● Step 2: The dataset is then split on the different attributes. The entropy for each branch is calculated. ● The branch entropies are then added proportionally (weighted by branch size) to get the total entropy for the split. ● The resulting entropy is subtracted from the entropy before the split. ● The result is the Information Gain, or decrease in entropy
  • 23. Example ● Step 3: Choose the attribute with the largest information gain as the decision node
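Steps 1 to 3 can be sketched on the Play Golf data from slide 13. The helper names `entropy` and `information_gain`, and the tuple-based row layout, are assumptions made for this illustration.

```python
from collections import Counter
from math import log2

COLUMNS = ["Outlook", "Temp", "Humidity", "Windy", "Play Golf"]
ROWS = [  # data from slide 13; the last column is the target
    ("Rainy", "Hot", "High", False, "No"),
    ("Rainy", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Sunny", "Mild", "High", False, "Yes"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Sunny", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Rainy", "Mild", "High", False, "No"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Sunny", "Mild", "High", True, "No"),
]

def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, col):
    """Entropy of the target minus the size-weighted entropy of the branches."""
    target = [r[-1] for r in rows]
    split_entropy = 0.0
    for value in {r[col] for r in rows}:
        branch = [r[-1] for r in rows if r[col] == value]
        split_entropy += len(branch) / len(rows) * entropy(branch)
    return entropy(target) - split_entropy

gains = {COLUMNS[i]: information_gain(ROWS, i) for i in range(4)}
best = max(gains, key=gains.get)
for name, g in gains.items():
    print(f"{name}: {g:.3f}")
print("decision node:", best)
```

On this dataset Outlook gives the largest gain, so it becomes the root decision node.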
  • 24. Example ● Step 4a: A branch with entropy of 0 is a leaf node.
  • 25. Example ● Step 4b: A branch with entropy more than 0 needs further splitting.
  • 26. Example ● Step 5: The ID3 algorithm is run recursively on the non-leaf branches, until all data is classified.
  • 27. Decision Tree to Decision Rules ● A decision tree can easily be transformed into a set of rules by following each path from the root node to a leaf node, one by one
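Steps 4 and 5 and the tree-to-rules mapping can be sketched as a short recursive program on the slide 13 data. This is a simplified ID3 under stated assumptions: categorical attributes only, no pruning, and a majority vote if attributes run out. The names `id3`, `gain` and `rules`, and the nested-tuple tree representation, are illustrative choices.

```python
from collections import Counter
from math import log2

COLUMNS = ["Outlook", "Temp", "Humidity", "Windy", "Play Golf"]
ROWS = [  # data from slide 13; the last column is the target
    ("Rainy", "Hot", "High", False, "No"), ("Rainy", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"), ("Sunny", "Mild", "High", False, "Yes"),
    ("Sunny", "Cool", "Normal", False, "Yes"), ("Sunny", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"), ("Rainy", "Mild", "High", False, "No"),
    ("Rainy", "Cool", "Normal", False, "Yes"), ("Sunny", "Mild", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", True, "Yes"), ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"), ("Sunny", "Mild", "High", True, "No"),
]

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def gain(rows, col):
    """Information gain of splitting `rows` on column index `col`."""
    split = sum(
        len(branch) / len(rows) * entropy(branch)
        for value in {r[col] for r in rows}
        for branch in [[r[-1] for r in rows if r[col] == value]]
    )
    return entropy([r[-1] for r in rows]) - split

def id3(rows, cols):
    labels = [r[-1] for r in rows]
    # Step 4a: a pure branch (entropy 0) is a leaf. Also stop if no attributes remain.
    if len(set(labels)) == 1 or not cols:
        return Counter(labels).most_common(1)[0][0]
    best = max(cols, key=lambda c: gain(rows, c))  # Step 3: largest information gain
    rest = [c for c in cols if c != best]
    # Steps 4b and 5: split further, recursing on each branch.
    values = sorted({r[best] for r in rows}, key=str)
    return (best, {v: id3([r for r in rows if r[best] == v], rest) for v in values})

def rules(tree, path=()):
    """Flatten the tree into (conditions, label) pairs, one per root-to-leaf path."""
    if not isinstance(tree, tuple):
        return [(path, tree)]
    col, branches = tree
    found = []
    for value, subtree in branches.items():
        found += rules(subtree, path + ((COLUMNS[col], value),))
    return found

tree = id3(ROWS, [0, 1, 2, 3])
for conditions, label in rules(tree):
    body = " AND ".join(f"{name}={value}" for name, value in conditions)
    print(f"IF {body} THEN Play Golf={label}")
```

Each printed line corresponds to one path from the root to a leaf, which is the mapping slide 27 describes.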
  • 28. Any idea about Random Forest? After all, forests are made of trees...
  • 30. k-NN Algorithm ● K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions) ● KNN has been used in statistical estimation and pattern recognition since the early 1970s ● A case is classified by a majority vote of its neighbors, with the case being assigned to the class most common amongst its K nearest neighbors measured by a distance function ● If k = 1, what will it do?
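The majority-vote rule can be sketched as follows. The function name `knn_predict` and the toy points are hypothetical, chosen only to show that k = 1 simply copies the label of the single nearest case while a larger k can give a different answer.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k):
    """train: list of (feature_tuple, label). Returns the majority label among the k nearest cases."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data, not from the slides.
train = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((1.6, 1.6), "B"),
    ((3.0, 3.2), "B"),
    ((3.1, 2.9), "B"),
]
query = (1.5, 1.5)

print(knn_predict(train, query, k=1))  # label of the single nearest case
print(knn_predict(train, query, k=3))  # majority among the three nearest cases
```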
  • 32. Distance measures for continuous variables
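The formulas on this slide are not part of the transcript. Three distance functions commonly used for continuous variables are given below as a reference; whether the slide listed exactly these is an assumption. Here $x$ and $y$ are two cases with $n$ numerical attributes.

```latex
% Euclidean distance
d_{\text{Euclidean}}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}

% Manhattan distance
d_{\text{Manhattan}}(x, y) = \sum_{i=1}^{n} |x_i - y_i|

% Minkowski distance of order q (q = 1 gives Manhattan, q = 2 gives Euclidean)
d_{\text{Minkowski}}(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^{q} \right)^{1/q}
```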
  • 33. How many neighbors? ● Choosing the optimal value for K is best done by first inspecting the data ● In general, a larger K tends to be more precise because it reduces the effect of noise, but there is no guarantee ● Cross-validation is another way to determine a good K retrospectively, by validating candidate K values on an independent dataset ● Historically, the optimal K for most datasets has been between 3 and 10, which produces much better results than 1-NN
  • 34. Example ● Consider the following data concerning credit default. Age and Loan are two numerical variables (predictors) and Default is the target
  • 35. Example ● We can now use the training set to classify an unknown case (Age=48 and Loan=$142,000) using Euclidean distance. ● If K=1, the nearest neighbor is the last case in the training set (Age=33, Loan=$150,000), which has Default=Y ● D = sqrt[(48-33)^2 + (142000-150000)^2] = 8000.01, so the prediction is Default=Y ● With K=3, there are two Default=Y and one Default=N among the three closest neighbors, so the prediction for the unknown case is again Default=Y
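The distance on slide 35 can be checked directly. The sketch below uses only the two points named on the slide; the variable names and the `loan_share` measure are illustrative additions that show how much of the squared distance comes from the Loan variable.

```python
from math import hypot

query = (48, 142_000)     # (Age, Loan) of the unknown case
neighbor = (33, 150_000)  # last case in the training set, Default=Y

age_diff = query[0] - neighbor[0]
loan_diff = query[1] - neighbor[1]

# Euclidean distance between the two cases.
distance = hypot(age_diff, loan_diff)
print(round(distance, 2))

# Fraction of the squared distance contributed by Loan alone.
loan_share = loan_diff**2 / (age_diff**2 + loan_diff**2)
print(loan_share)
```

A 15-year age difference barely changes the result, because the loan amounts are on a much larger scale.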
  • 36. Standardized Distance ● One major drawback of calculating distance measures directly from the training set arises when variables have different measurement scales or when there is a mixture of numerical and categorical variables. ● For example, if one variable is annual income in dollars and the other is age in years, then income will have a much higher influence on the calculated distance. ● One solution is to standardize the training set
  • 37. Standardized Distance ● Using the standardized distance on the same training set, the unknown case returns a different nearest neighbor, which is not a good sign of robustness.
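The effect of standardization can be illustrated with min-max scaling, which is one common choice; the slides' exact formula is not in the transcript, so treat this as an assumption. Only the query and the first training row come from the slides. The other rows are hypothetical, added so the sketch can run, and the helper names are illustrative.

```python
from math import dist

# (Age, Loan). Only the first row and the query appear on the slides;
# the remaining rows are hypothetical.
train = [(33, 150_000), (47, 128_000), (25, 40_000), (60, 100_000)]
query = (48, 142_000)

def min_max_scaler(points):
    """Return a function mapping each variable to [0, 1] using the training min and max."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return lambda p: tuple((x - lo) / (hi - lo) for x, lo, hi in zip(p, lows, highs))

def nearest(points, q, transform=lambda p: p):
    """Nearest training point to q by Euclidean distance after applying `transform`."""
    return min(points, key=lambda p: dist(transform(p), transform(q)))

scale = min_max_scaler(train)
raw_nn = nearest(train, query)            # distance dominated by Loan
scaled_nn = nearest(train, query, scale)  # Age and Loan weigh comparably
print(raw_nn, scaled_nn)
```

On this toy data the nearest neighbor changes once both variables are put on the same scale, which mirrors the behavior described on slide 37.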
  • 38. Some Confusions... ● What happens if k is a multiple of the number of categories (class labels)? ● What happens if k = 1? ● What happens if k equals the size of the dataset?
  • 39. Acknowledgements... Contents are borrowed from: 1. Data Mining: Concepts and Techniques, by Jiawei Han, Micheline Kamber, Jian Pei 2. Naive Bayes Example (YouTube video), by Francisco Icaobelli (https://www.youtube.com/watch?v=ZAfarappAO0) 3. Predicting the Future: Classification, presented by Dr. Noureddin Sadawi (https://github.com/nsadawi/DataMiningSlides/blob/master/Slides.pdf)