Classifiers

Ayurdata
Use of classifiers in research problems
Classifiers are algorithms that map input data to one of a fixed set of output categories.
They are used to build models from existing data such that the resulting model can predict
the category of previously unseen data points. Classifiers have found wide use in data science
applications in various domains. For instance, classifying a new tumour as malignant or benign,
identifying a mail as spam or ham, and marking an insurance claim as possibly fraudulent or
genuine are all instances of classification. Classification algorithms use training data, i.e.,
they learn from labelled examples and build a model or procedure to identify a new data point
as belonging to a particular category. They therefore belong to the class of supervised
learning methods.
A number of classifiers can be used to classify new data on the basis of historical data.
A very short description of each method is given here, just to introduce the concepts.
Logistic Regression
As a simple case, consider a logistic model with two predictors x1 and x2 and one binary
response variable Y, and denote the probability of the event by p = P(Y = 1). We assume a
linear relationship between the predictor variables and the log-odds of the event. This
relationship can be expressed as,
$$\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$
By simple algebraic manipulation, the probability that Y=1 is,
$$p = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2} + 1}$$
The above formula shows that once the β's are estimated, we can compute the probability that
Y=1 for a given observation and, as its complement, the probability that Y=0.
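As a minimal sketch of this calculation, the function below evaluates the logistic formula for one observation. The coefficient values are hypothetical and hand-picked for illustration; they are not estimated from any data, and the function name is an assumption of this example.

```python
import math

def logistic_probability(x1, x2, b0, b1, b2):
    """Return p = P(Y = 1) for a logistic model with two predictors."""
    log_odds = b0 + b1 * x1 + b2 * x2  # linear predictor, i.e. the log-odds
    return math.exp(log_odds) / (math.exp(log_odds) + 1.0)

# Hypothetical coefficients (assumed for illustration only).
b0, b1, b2 = -1.5, 0.8, 0.4

p = logistic_probability(x1=2.0, x2=1.0, b0=b0, b1=b1, b2=b2)
print(f"P(Y=1) = {p:.3f}, P(Y=0) = {1 - p:.3f}")
```

Taking the log of p / (1 - p) for the returned value recovers the linear predictor, which is a quick way to check that the two formulas agree.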
Decision Trees
In this technique, we split the population or sample into two or more homogeneous sets (or
sub-populations) based on the most significant splitter/differentiator among the input
variables. The end result of the algorithm is a tree-like structure with root, branch and leaf
nodes, where each leaf node gives a value of the target variable. Several algorithms exist for
deciding how to split a node into two or more sub-nodes. Each split is chosen so that the
resulting sub-nodes are more homogeneous than their parent. Although several criteria such as
the Gini index, chi-square and reduction in variance are available for choosing the split, one
popular measure used for splitting is the information gain. This is equivalent to selecting
the split with the maximum reduction in entropy, as measured by Shannon’s index (H).
$$H = -\sum_{i=1}^{s} p_i \log p_i$$
where s is the number of groups at a node and p_i indicates the proportion of individuals in
the ith group.
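The entropy calculation and the resulting information gain can be sketched as follows. This is an illustration under stated assumptions: the logarithm is taken to base 2 (the text does not fix a base), and the node and the two candidate splits are made-up data.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """H = -sum(p_i * log2(p_i)) over the groups present at a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the sub-nodes."""
    n = len(parent)
    weighted = sum(len(ch) / n * shannon_entropy(ch) for ch in children)
    return shannon_entropy(parent) - weighted

# Hypothetical node with 5 "yes" and 5 "no" cases, and two candidate splits.
parent = ["yes"] * 5 + ["no"] * 5
split_a = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]          # fairly pure sub-nodes
split_b = [["yes"] * 3 + ["no"] * 2, ["yes"] * 2 + ["no"] * 3]  # still mixed sub-nodes

gain_a = information_gain(parent, split_a)
gain_b = information_gain(parent, split_b)
best = "A" if gain_a > gain_b else "B"
print(f"gain A = {gain_a:.3f}, gain B = {gain_b:.3f}, chosen split: {best}")
```

The split that leaves purer sub-nodes has the larger gain, so it is the one a tree-building algorithm using this criterion would select.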
Random Forests
Ensemble learning is a type of supervised learning technique in which the basic idea is to
generate multiple models on a training dataset and then combine their outputs (for example,
by averaging or by taking a vote) to obtain a model that is stronger than any of its members.
Random forest is a classic case of ensemble learning. Decision trees are simple and easily
interpretable, but a major drawback is that a single tree often generalizes poorly to a test
set; in this context individual trees are sometimes called weak learners. In the context of
decision trees, a random forest is a model based on multiple trees. Rather than simply
averaging the predictions of individual trees (which we could call a ‘forest’), this model
uses two key concepts that give it the name ‘random’, viz., (i) random sampling of training
data points when building trees and (ii) random subsets of features considered when splitting
nodes. The idea is that instead of producing a single model, which might be so complex that it
has high variance and overfits, or so simple that it has high bias and underfits, we generate
many models using the training set and combine them at the end.
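The two sources of randomness can be illustrated with a deliberately simplified sketch. As an assumption made to keep the example short, each "tree" here is a one-split decision stump rather than a full tree, so the random feature subset is drawn once per stump; the data, function names and parameter values are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) pair with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]        # (i) bootstrap sample of rows
        feats = rng.sample(range(len(rows[0])), n_features)   # (ii) random feature subset
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Combine the stumps by majority vote."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical two-feature data with two well-separated classes.
rows = [(1.0, 1.2), (1.5, 0.8), (2.0, 1.0), (1.2, 1.9),
        (6.0, 6.5), (6.5, 5.8), (7.0, 6.2), (5.8, 7.0)]
labels = ["a"] * 4 + ["b"] * 4

forest = fit_forest(rows, labels)
print(predict(forest, (0.5, 0.5)), predict(forest, (8.0, 8.0)))
```

An individual stump trained on an unlucky bootstrap sample may be poor, but the vote over many stumps is far more stable, which is the point of the ensemble.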
Support Vector Machines
Given a set of training examples, each marked as belonging to one or the other of two categories,
a Support Vector Machine (SVM) training algorithm builds a model that assigns new examples
to one category or the other. In theory, SVM is a discriminative classifier formally defined by a
separating hyperplane. In other words, given labelled training data, the algorithm outputs an
optimal hyperplane which categorizes new examples. Thus, the hyperplanes are decision
boundaries that help classify the data points. Data points falling on opposite sides of the
hyperplane are assigned to different classes. The dimension of the hyperplane depends on the
number of features. If the number of input features is 2, then the hyperplane is just a line. If the
number of input features is 3, then the hyperplane becomes a two-dimensional plane. In practice,
there are many hyperplanes that might classify the data. One reasonable choice as the best
hyperplane is the one that represents the largest separation, or margin, between the two classes.
So, we choose the hyperplane such that the distance from it to the nearest data point on each
side is maximized.
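The notion of margin can be made concrete with a small sketch. It does not solve the SVM optimisation problem; it only compares two hand-picked candidate hyperplanes (hypothetical, as are the data points) and keeps the one whose nearest point is farthest away.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin(w, b, points, labels):
    """Distance from the hyperplane to the nearest point.

    Labels are +1 / -1; the result is negative if any point is on the wrong side.
    """
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))

# Hypothetical two-feature data, so each hyperplane is a line.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Two candidate separating lines, given as (w, b).
candidates = {
    "diagonal": ((1.0, 1.0), -5.5),  # x1 + x2 = 5.5
    "vertical": ((1.0, 0.0), -3.0),  # x1 = 3
}

margins = {name: margin(w, b, points, labels) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, "-> preferred:", best)
```

Both lines separate the classes, but the diagonal one leaves more room on each side, so it is the better choice under the largest-margin criterion.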
Naïve Bayes Classifier
The naive Bayes algorithm is a probability-based technique that is simple yet powerful enough
that it can compete with more complex algorithms, even on very large datasets. The foundation
of the naive Bayes algorithm is Bayes' theorem, which states that if A and B are two events
with P(A) > 0, then P(B/A), the probability of B given A, is obtained by the expression,
P(B/A) = P(B) * P(A/B) / P(A)
The naive Bayes algorithm is called naive not because it is simple. It is because the
algorithm makes a very strong assumption that the features of the data are independent of
each other within each class. In other words, it assumes that, given the class, the presence
of one feature is completely unrelated to the presence of any other feature. If this
assumption of independence holds, naive Bayes performs extremely well and often better than
other models.
Mathematically,
$$P(X_1, \dots, X_n / Y) = \prod_{i=1}^{n} P(X_i / Y)$$
In order to create a classifier model, we compute, for every possible value of the class
variable Y, the prior probability of that class multiplied by the probability of the given
inputs under that class, and pick the class for which this product is largest. This
can be expressed as
$$\hat{Y} = \underset{Y}{\arg\max}\; P(Y) \prod_{i=1}^{n} P(X_i / Y)$$
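A minimal sketch of this decision rule for categorical features is given below. The probabilities are plain relative frequencies with no smoothing, so a feature value never seen with a class gives that class a score of zero; the tiny spam/ham table and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Count what is needed to estimate P(Y) and P(X_i / Y) by relative frequency."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, y in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, y)][value] += 1
    return class_counts, feature_counts

def predict(model, row):
    """Return the class maximising P(Y) * prod_i P(X_i / Y), plus all scores."""
    class_counts, feature_counts = model
    n = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        score = n_y / n                                   # P(Y)
        for i, value in enumerate(row):
            score *= feature_counts[(i, y)][value] / n_y  # P(X_i / Y)
        scores[y] = score
    return max(scores, key=scores.get), scores

# Hypothetical mails described by two features:
# (contains the word "offer", contains the word "meeting").
rows = [("yes", "no"), ("yes", "no"), ("yes", "yes"), ("no", "no"),
        ("no", "yes"), ("no", "yes"), ("no", "yes"), ("yes", "yes")]
labels = ["spam"] * 4 + ["ham"] * 4

model = fit_naive_bayes(rows, labels)
label, scores = predict(model, ("yes", "no"))
print(label, scores)
```

The scores are unnormalised: they are proportional to the class probabilities given the inputs, which is all the arg max needs.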
Neural Networks
A neural network is a series of algorithms that endeavours to recognize underlying relationships
in a set of data through a process loosely modelled on the way the human brain operates. The
basic computational unit of the brain is a neuron. By analogy, a ‘neuron’ in a neural network,
also called a perceptron, is a mathematical function that collects and classifies information
according to a specific architecture. The perceptron receives input from other nodes or from
an external source and computes an output. Each input has an associated weight (w), which is
assigned on the basis of its importance relative to the other inputs. The node applies a
nonlinear function to the weighted sum of its inputs to create the output. The idea is that the
synaptic strengths (the weights w) are revised by learning from the training data, and they in
turn control the strength and direction of each input's influence.
The learning happens in two steps: forward propagation and back propagation. In simple words,
forward propagation is making a guess about the answer, and back propagation is adjusting the
weights to reduce the error between the actual answer and the guessed answer. This process of
updating the weights is repeated over multiple iterations until the guesses are close enough
to the actual answers.
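The two steps can be sketched for the smallest possible network, a single neuron. With no hidden layers, back propagation reduces to one application of the chain rule. The sigmoid activation, the squared-error loss, the learning rate, the number of epochs and the logical-OR training data are all assumptions made for this illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, bias, inputs):
    """Forward propagation: nonlinear function of the weighted sum of the inputs."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train(samples, epochs=5000, lr=1.0):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            out = forward(weights, bias, inputs)  # the guess
            # Back propagation: gradient of 0.5 * (out - target)^2 with respect
            # to the weighted sum, using sigmoid'(z) = out * (1 - out).
            delta = (out - target) * out * (1.0 - out)
            weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
            bias -= lr * delta
    return weights, bias

# Hypothetical training data: the logical OR of two binary inputs.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train(samples)
predictions = [round(forward(weights, bias, x)) for x, _ in samples]
print(weights, bias, predictions)
```

Each pass nudges the weights in the direction that reduces the error, so the outputs drift towards the targets over the iterations.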
K Nearest Neighbour Technique
K-nearest neighbours (KNN) is a simple algorithm that stores all available cases and classifies
new cases based on a similarity measure (e.g., distance functions). A case is classified by a
majority vote of its neighbours: it is assigned to the most common class amongst its K nearest
neighbours, as measured by a distance function. The step-by-step procedure for K-nearest
neighbours classification is as follows.
1. Choose the parameter K, the number of neighbours to be used.
2. Calculate the distance between the query instance (the item to be identified as belonging
to a preidentified category) and all the training samples.
3. Sort the distances and take the K training samples with the smallest distances as the
nearest neighbours.
4. Gather the category 𝛾 of each of the nearest neighbours.
5. Use the simple majority of the categories of the nearest neighbours as the predicted
category of the query instance.
The most intuitive nearest neighbour type classifier is the 1-nearest neighbour classifier that
assigns a point x to the class of its closest neighbour in the feature space.
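The procedure can be sketched in a few lines. Euclidean distance is assumed as the distance function, and the training points, labels and function name are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Distance from the query instance to every training sample.
    distances = [(math.dist(p, query), y) for p, y in zip(train_points, train_labels)]
    # Sort by distance and keep the k closest samples.
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Gather the categories of the neighbours and take the most common one.
    votes = Counter(y for _, y in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature training data with two categories.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, query=(2, 2), k=3))
print(knn_predict(train_points, train_labels, query=(5, 5), k=3))
print(knn_predict(train_points, train_labels, query=(2, 2), k=1))  # 1-nearest neighbour
```

Choosing an odd K for a two-class problem avoids tied votes; with more classes or an even K, a tie-breaking rule would have to be added.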
Finally, the choice of a particular classifier for a given situation depends on the relative
performance of the candidates in respect of accuracy, sensitivity and specificity. There are
deeper issues involved in the use of all these techniques, and considerable developments have
taken place in both the theory and the programming related to the topic.
--- Jayaraman

Recommended

Chapter 09 class advanced
Chapter 09 class advancedChapter 09 class advanced
Chapter 09 class advancedHouw Liong The
 
lazy learners and other classication methods
lazy learners and other classication methodslazy learners and other classication methods
lazy learners and other classication methodsrajshreemuthiah
 
15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learningAnil Yadav
 
report.doc
report.docreport.doc
report.docbutest
 
Analysis of Classification Algorithm in Data Mining
Analysis of Classification Algorithm in Data MiningAnalysis of Classification Algorithm in Data Mining
Analysis of Classification Algorithm in Data Miningijdmtaiir
 

More Related Content

What's hot

Data Mining: Concepts and Techniques — Chapter 2 —
Data Mining:  Concepts and Techniques — Chapter 2 —Data Mining:  Concepts and Techniques — Chapter 2 —
Data Mining: Concepts and Techniques — Chapter 2 —Salah Amean
 
DATA MINING.doc
DATA MINING.docDATA MINING.doc
DATA MINING.docbutest
 
Machine Learning Unit 3 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 3 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 3 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 3 Semester 3 MSc IT Part 2 Mumbai UniversityMadhav Mishra
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clusteringguest0edcaf
 
MACHINE LEARNING TOOLBOX
MACHINE LEARNING TOOLBOXMACHINE LEARNING TOOLBOX
MACHINE LEARNING TOOLBOXmlaij
 
2018 p 2019-ee-a2
2018 p 2019-ee-a22018 p 2019-ee-a2
2018 p 2019-ee-a2uetian12
 
Machine Learning Unit 4 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 4 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 4 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 4 Semester 3 MSc IT Part 2 Mumbai UniversityMadhav Mishra
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clusteringArshad Farhad
 
Research scholars evaluation based on guides view using id3
Research scholars evaluation based on guides view using id3Research scholars evaluation based on guides view using id3
Research scholars evaluation based on guides view using id3eSAT Journals
 
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Modelsguest0edcaf
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CARTXueping Peng
 
Random forest
Random forestRandom forest
Random forestUjjawal
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Parth Khare
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Derek Kane
 
25 Machine Learning Unsupervised Learaning K-means K-centers
25 Machine Learning Unsupervised Learaning K-means K-centers25 Machine Learning Unsupervised Learaning K-means K-centers
25 Machine Learning Unsupervised Learaning K-means K-centersAndres Mendez-Vazquez
 
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep YadavMachine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep YadavAgile Testing Alliance
 

What's hot (19)

Data Mining: Concepts and Techniques — Chapter 2 —
Data Mining:  Concepts and Techniques — Chapter 2 —Data Mining:  Concepts and Techniques — Chapter 2 —
Data Mining: Concepts and Techniques — Chapter 2 —
 
Random forest
Random forestRandom forest
Random forest
 
DATA MINING.doc
DATA MINING.docDATA MINING.doc
DATA MINING.doc
 
Machine Learning Unit 3 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 3 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 3 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 3 Semester 3 MSc IT Part 2 Mumbai University
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
MACHINE LEARNING TOOLBOX
MACHINE LEARNING TOOLBOXMACHINE LEARNING TOOLBOX
MACHINE LEARNING TOOLBOX
 
2018 p 2019-ee-a2
2018 p 2019-ee-a22018 p 2019-ee-a2
2018 p 2019-ee-a2
 
Machine Learning Unit 4 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 4 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 4 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 4 Semester 3 MSc IT Part 2 Mumbai University
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
Decision tree and random forest
Decision tree and random forestDecision tree and random forest
Decision tree and random forest
 
Research scholars evaluation based on guides view using id3
Research scholars evaluation based on guides view using id3Research scholars evaluation based on guides view using id3
Research scholars evaluation based on guides view using id3
 
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Models
 
Classification
ClassificationClassification
Classification
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CART
 
Random forest
Random forestRandom forest
Random forest
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 
25 Machine Learning Unsupervised Learaning K-means K-centers
25 Machine Learning Unsupervised Learaning K-means K-centers25 Machine Learning Unsupervised Learaning K-means K-centers
25 Machine Learning Unsupervised Learaning K-means K-centers
 
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep YadavMachine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
 

Similar to Classifiers

SVM - Functional Verification
SVM - Functional VerificationSVM - Functional Verification
SVM - Functional VerificationSai Kiran Kadam
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learningAmAn Singh
 
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINERANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINERIJCSEA Journal
 
A Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of DiseasesA Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of Diseasesijsrd.com
 
8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithmLaura Petrosanu
 
Evaluation of a hybrid method for constructing multiple SVM kernels
Evaluation of a hybrid method for constructing multiple SVM kernelsEvaluation of a hybrid method for constructing multiple SVM kernels
Evaluation of a hybrid method for constructing multiple SVM kernelsinfopapers
 
Basic course for computer based methods
Basic course for computer based methodsBasic course for computer based methods
Basic course for computer based methodsimprovemed
 
Basic course on computer-based methods
Basic course on computer-based methodsBasic course on computer-based methods
Basic course on computer-based methodsimprovemed
 
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive ModelsDatamining Tools
 
Histogram-Based Method for Effective Initialization of the K-Means Clustering...
Histogram-Based Method for Effective Initialization of the K-Means Clustering...Histogram-Based Method for Effective Initialization of the K-Means Clustering...
Histogram-Based Method for Effective Initialization of the K-Means Clustering...Gingles Caroline
 
20070702 Text Categorization
20070702 Text Categorization20070702 Text Categorization
20070702 Text Categorizationmidi
 
Artificial Neural Networks for NIU
Artificial Neural Networks for NIUArtificial Neural Networks for NIU
Artificial Neural Networks for NIUProf. Neeta Awasthy
 
DagdelenSiriwardaneY..
DagdelenSiriwardaneY..DagdelenSiriwardaneY..
DagdelenSiriwardaneY..butest
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESVikash Kumar
 
Analysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data SetAnalysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data SetIJERA Editor
 
Machine Learning: Decision Trees Chapter 18.1-18.3
Machine Learning: Decision Trees Chapter 18.1-18.3Machine Learning: Decision Trees Chapter 18.1-18.3
Machine Learning: Decision Trees Chapter 18.1-18.3butest
 
Data mining classifiers.
Data mining classifiers.Data mining classifiers.
Data mining classifiers.ShwetaPatil174
 
coppin chapter 10e.ppt
coppin chapter 10e.pptcoppin chapter 10e.ppt
coppin chapter 10e.pptbutest
 

Similar to Classifiers (20)

SVM - Functional Verification
SVM - Functional VerificationSVM - Functional Verification
SVM - Functional Verification
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
 
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINERANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
 
A Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of DiseasesA Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of Diseases
 
8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm
 
Evaluation of a hybrid method for constructing multiple SVM kernels
Evaluation of a hybrid method for constructing multiple SVM kernelsEvaluation of a hybrid method for constructing multiple SVM kernels
Evaluation of a hybrid method for constructing multiple SVM kernels
 
Basic course for computer based methods
Basic course for computer based methodsBasic course for computer based methods
Basic course for computer based methods
 
Basic course on computer-based methods
Basic course on computer-based methodsBasic course on computer-based methods
Basic course on computer-based methods
 
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Models
 
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Models
 
Histogram-Based Method for Effective Initialization of the K-Means Clustering...
Histogram-Based Method for Effective Initialization of the K-Means Clustering...Histogram-Based Method for Effective Initialization of the K-Means Clustering...
Histogram-Based Method for Effective Initialization of the K-Means Clustering...
 
20070702 Text Categorization
20070702 Text Categorization20070702 Text Categorization
20070702 Text Categorization
 
Artificial Neural Networks for NIU
Artificial Neural Networks for NIUArtificial Neural Networks for NIU
Artificial Neural Networks for NIU
 
DagdelenSiriwardaneY..
DagdelenSiriwardaneY..DagdelenSiriwardaneY..
DagdelenSiriwardaneY..
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
 
Analysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data SetAnalysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data Set
 
Machine Learning: Decision Trees Chapter 18.1-18.3
Machine Learning: Decision Trees Chapter 18.1-18.3Machine Learning: Decision Trees Chapter 18.1-18.3
Machine Learning: Decision Trees Chapter 18.1-18.3
 
09 classadvanced
09 classadvanced09 classadvanced
09 classadvanced
 
Data mining classifiers.
Data mining classifiers.Data mining classifiers.
Data mining classifiers.
 
coppin chapter 10e.ppt
coppin chapter 10e.pptcoppin chapter 10e.ppt
coppin chapter 10e.ppt
 

More from Ayurdata

Statistical distributions
Statistical distributionsStatistical distributions
Statistical distributionsAyurdata
 
Health Behaviour: An Ayurveda Perspective
Health Behaviour: An Ayurveda PerspectiveHealth Behaviour: An Ayurveda Perspective
Health Behaviour: An Ayurveda PerspectiveAyurdata
 
Ayur data
Ayur data Ayur data
Ayur data Ayurdata
 
Stat Methods in ayurveda
Stat Methods in ayurvedaStat Methods in ayurveda
Stat Methods in ayurvedaAyurdata
 
Ayurveda colleges and courses
Ayurveda colleges and coursesAyurveda colleges and courses
Ayurveda colleges and coursesAyurdata
 
AyurData Ayurveda Webinar
AyurData Ayurveda WebinarAyurData Ayurveda Webinar
AyurData Ayurveda WebinarAyurdata
 
Advanced Statistical Manual for Ayurveda Research
Advanced Statistical Manual for Ayurveda ResearchAdvanced Statistical Manual for Ayurveda Research
Advanced Statistical Manual for Ayurveda ResearchAyurdata
 
Advanced manual part 4
Advanced manual part 4Advanced manual part 4
Advanced manual part 4Ayurdata
 
Investigation modes in ayurveda
Investigation modes in ayurvedaInvestigation modes in ayurveda
Investigation modes in ayurvedaAyurdata
 
Advanced Statistical Manual Part III
Advanced Statistical Manual Part IIIAdvanced Statistical Manual Part III
Advanced Statistical Manual Part IIIAyurdata
 
Advanced statistical manual part ii
Advanced statistical manual part iiAdvanced statistical manual part ii
Advanced statistical manual part iiAyurdata
 
Advanced statistical manual part i
Advanced statistical manual part iAdvanced statistical manual part i
Advanced statistical manual part iAyurdata
 
Advanced statistical manual for ayurveda research sample
Advanced statistical manual for ayurveda research sampleAdvanced statistical manual for ayurveda research sample
Advanced statistical manual for ayurveda research sampleAyurdata
 
Ayurveda vs allopathy
Ayurveda vs allopathyAyurveda vs allopathy
Ayurveda vs allopathyAyurdata
 
Meta-Analysis in Ayurveda
Meta-Analysis in AyurvedaMeta-Analysis in Ayurveda
Meta-Analysis in AyurvedaAyurdata
 
A manual on statistical analysis in ayurveda research
A manual on statistical analysis in ayurveda researchA manual on statistical analysis in ayurveda research
A manual on statistical analysis in ayurveda researchAyurdata
 
Ich sample size
Ich sample sizeIch sample size
Ich sample sizeAyurdata
 
Logistic regression
Logistic regressionLogistic regression
Logistic regressionAyurdata
 
Ayur data startup
Ayur data startupAyur data startup
Ayur data startupAyurdata
 

More from Ayurdata (20)

Statistical distributions
Statistical distributionsStatistical distributions
Statistical distributions
 
BMI
BMIBMI
BMI
 
Health Behaviour: An Ayurveda Perspective
Health Behaviour: An Ayurveda PerspectiveHealth Behaviour: An Ayurveda Perspective
Health Behaviour: An Ayurveda Perspective
 
Ayur data
Ayur data Ayur data
Ayur data
 
Stat Methods in ayurveda
Stat Methods in ayurvedaStat Methods in ayurveda
Stat Methods in ayurveda
 
Ayurveda colleges and courses
Ayurveda colleges and coursesAyurveda colleges and courses
Ayurveda colleges and courses
 
AyurData Ayurveda Webinar
AyurData Ayurveda WebinarAyurData Ayurveda Webinar
AyurData Ayurveda Webinar
 
Advanced Statistical Manual for Ayurveda Research
Advanced Statistical Manual for Ayurveda ResearchAdvanced Statistical Manual for Ayurveda Research
Advanced Statistical Manual for Ayurveda Research
 
Advanced manual part 4
Advanced manual part 4Advanced manual part 4
Advanced manual part 4
 
Investigation modes in ayurveda
Investigation modes in ayurvedaInvestigation modes in ayurveda
Investigation modes in ayurveda
 
Advanced Statistical Manual Part III
Advanced Statistical Manual Part IIIAdvanced Statistical Manual Part III
Advanced Statistical Manual Part III
 
Advanced statistical manual part ii
Advanced statistical manual part iiAdvanced statistical manual part ii
Advanced statistical manual part ii
 
Advanced statistical manual part i
Advanced statistical manual part iAdvanced statistical manual part i
Advanced statistical manual part i
 
Advanced statistical manual for ayurveda research sample
Advanced statistical manual for ayurveda research sampleAdvanced statistical manual for ayurveda research sample
Advanced statistical manual for ayurveda research sample
 
Ayurveda vs allopathy
Ayurveda vs allopathyAyurveda vs allopathy
Ayurveda vs allopathy
 
Meta-Analysis in Ayurveda
Meta-Analysis in AyurvedaMeta-Analysis in Ayurveda
Meta-Analysis in Ayurveda
 
A manual on statistical analysis in ayurveda research
A manual on statistical analysis in ayurveda researchA manual on statistical analysis in ayurveda research
A manual on statistical analysis in ayurveda research
 
Ich sample size
Ich sample sizeIch sample size
Ich sample size
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Ayur data startup
Ayur data startupAyur data startup
Ayur data startup
 

Recently uploaded

Introduction to data science.pdf-Definition,types and application of Data Sci...
Introduction to data science.pdf-Definition,types and application of Data Sci...Introduction to data science.pdf-Definition,types and application of Data Sci...
Introduction to data science.pdf-Definition,types and application of Data Sci...DrSumathyV
 
Artificial Intelligence for Vision: A walkthrough of recent breakthroughs
Artificial Intelligence for Vision:  A walkthrough of recent breakthroughsArtificial Intelligence for Vision:  A walkthrough of recent breakthroughs
Artificial Intelligence for Vision: A walkthrough of recent breakthroughsNikolas Markou
 
Tips to Align with Your Salesforce Data Goals
Tips to Align with Your Salesforce Data GoalsTips to Align with Your Salesforce Data Goals
Tips to Align with Your Salesforce Data GoalsDataArchiva
 
Unlocking New Insights Into the World of European Soccer Through the European...
Unlocking New Insights Into the World of European Soccer Through the European...Unlocking New Insights Into the World of European Soccer Through the European...
Unlocking New Insights Into the World of European Soccer Through the European...ThinkInnovation
 
ppt penjualan berbasis online omset.pptx
ppt penjualan berbasis online omset.pptxppt penjualan berbasis online omset.pptx
ppt penjualan berbasis online omset.pptxHizkiaJastis
 
What you need to know about Generative AI and Data Management?
What you need to know about Generative AI and Data Management?What you need to know about Generative AI and Data Management?
What you need to know about Generative AI and Data Management?Denodo
 
Basics of Creating Graphs / Charts using Microsoft Excel
Basics of Creating Graphs / Charts using Microsoft ExcelBasics of Creating Graphs / Charts using Microsoft Excel
Basics of Creating Graphs / Charts using Microsoft ExcelTope Osanyintuyi
 
Choose your perfect jacket.pdf
Choose your perfect jacket.pdfChoose your perfect jacket.pdf
Choose your perfect jacket.pdfAlexia Trejo
 
Operations Data On Mobile - inSis Mobile App - Sample Screens

Classifiers

  • 1. Use of classifiers in research problems

Classifiers are algorithms that map input data to one of a set of output categories. They can be used to build models accurate enough to predict or classify previously unseen data points. Classifiers have found wide use in data science applications across many domains. For instance, classifying a new tumour as malignant or benign, identifying a mail as spam or ham, and marking an insurance claim as possibly fraudulent or genuine are all instances of classification. Classification algorithms use training data, i.e., they learn from example data and build a model or procedure to identify a new data point as belonging to a particular category. They therefore belong to the class of supervised learning methods.

A number of classifiers can be used to classify data on the basis of historic, already existing data. A very short description of these methods is given here just to introduce the concepts.

Logistic Regression

As a simple case, consider a logistic model with two predictors $x_1$ and $x_2$ and one binary response variable $Y$, and let $p = P(Y = 1)$. We assume a linear relationship between the predictor variables and the log-odds of the event. This relationship can be expressed as

$\log \frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

By simple algebraic manipulation, the probability that $Y = 1$ is

$p = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2} + 1}$

The above formula shows that once the $\beta$'s are estimated, we can compute the probability that $Y = 1$ for a given observation, or that of its complement $Y = 0$.

Decision Trees

In this technique, we split the population or sample into two or more homogeneous sets (or sub-populations) based on the most significant splitter/differentiator among the input variables. The end result of the algorithm is a tree-like structure with root, branch and leaf nodes (the target variable). Decision trees use one of several criteria to decide how to split a node into two or more sub-nodes. The creation of sub-nodes increases the homogeneity of the resultant sub-nodes. Although several criteria such as the Gini index, chi-square and reduction in variance are available for choosing a split, one popular measure used for splitting is the information gain. This is equivalent to selecting the split with the maximum reduction in entropy as measured by Shannon's index (H),

$H = -\sum_{i=1}^{s} p_i \log p_i$

where $s$ is the number of groups at a node and $p_i$ indicates the proportion of individuals in the $i$th group.
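The information-gain criterion described above can be illustrated with a minimal sketch. The function names `shannon_entropy` and `information_gain`, the toy spam/ham labels, and the choice of base-2 logarithms (the text does not fix a base) are all assumptions made for illustration; this is not the implementation of any particular library.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon's index H = -sum(p_i * log2(p_i)) over the groups in `labels`.

    Base-2 logarithms are an assumption; the source does not state a base.
    """
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(parent, children):
    """Reduction in entropy obtained by splitting `parent` into `children`.

    The entropy of each child node is weighted by its share of the parent.
    """
    total = len(parent)
    weighted = sum(len(ch) / total * shannon_entropy(ch) for ch in children)
    return shannon_entropy(parent) - weighted


# Hypothetical toy node: 4 spam and 4 ham messages.
parent = ["spam"] * 4 + ["ham"] * 4

# A split that separates the classes perfectly versus one that does not.
pure_split = [["spam"] * 4, ["ham"] * 4]
mixed_split = [["spam", "spam", "ham", "ham"], ["spam", "spam", "ham", "ham"]]

print(information_gain(parent, pure_split))
print(information_gain(parent, mixed_split))
```

In this toy case the perfectly separating split recovers the full entropy of the parent node, while the split that leaves both children as mixed as the parent yields no gain, so a tree-building procedure using this criterion would prefer the former.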
  • 2. Random Forests

Ensemble learning is a type of supervised learning technique in which the basic idea is to generate multiple models on a training dataset and then combine (average) their output rules or hypotheses to obtain a stronger model that performs better than any single one. Random forest is a classic case of ensemble learning. Decision trees are considered very simple and easily interpretable, but a major drawback is that they can have poor predictive performance and generalise poorly to a test set, so they are sometimes called weak learners. In the context of decision trees, a random forest is a model based on multiple trees. Rather than simply averaging the predictions of individual trees (which we could call a 'forest'), this model uses two key concepts that give it the name 'random', viz., (i) random sampling of training data points when building trees and (ii) random subsets of features considered when splitting nodes. The idea is that instead of producing a single complex model, which might have high variance and so overfit, or a single simple model, which might have high bias and so underfit, we generate many models from the training set and combine them at the end.

Support Vector Machines

Given a set of training examples, each marked as belonging to one or the other of two categories, a Support Vector Machine (SVM) training algorithm builds a model that assigns new examples to one category or the other. In theory, SVM is a discriminative classifier formally defined by a separating hyperplane. In other words, given labelled training data, the algorithm outputs an optimal hyperplane which categorizes new examples. Thus, the hyperplanes are decision boundaries that help classify the data points. Data points falling on either side of the hyperplane can be attributed to different classes. The dimension of the hyperplane depends on the number of features. If the number of input features is 2, the hyperplane is just a line. If the number of input features is 3, the hyperplane becomes a two-dimensional plane. In practice, there are many hyperplanes that might classify the data. One reasonable choice for the best hyperplane is the one that represents the largest separation, or margin, between the two classes. So, we choose the hyperplane such that the distance from it to the nearest data point on each side is maximized.

Naïve Bayes Classifier

The naive Bayes algorithm is a probabilistic technique which is simple yet powerful enough that it can outperform more complex algorithms, even on very large datasets. The foundation of the naive Bayes algorithm is Bayes' theorem, which states that in a sequence of events, if A is the first event and B is the second event, then P(B/A) is obtained by the expression

P(B/A) = P(B) * P(A/B) / P(A)

The naive Bayes algorithm is called naive not because it is simple, but because it makes a very strong assumption that the features of the data are independent of each other. In other words, it assumes that the presence of one feature in a class is completely unrelated to the presence of all other features. If this assumption of independence holds, naive Bayes performs extremely well and often better than other models. Mathematically,
  • 3. $P(X_1, \ldots, X_n / Y) = \prod_{i=1}^{n} P(X_i / Y)$

In order to create a classifier model, we find the probability of a given set of inputs for all possible values of the class variable Y and pick the output with maximum probability. This can be expressed as

$\hat{Y} = \arg\max_{Y} P(Y) \prod_{i=1}^{n} P(X_i / Y)$

Neural Networks

A neural network is a series of algorithms that endeavours to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates. The basic computational unit of the brain is a neuron. In comparison, a 'neuron' in a neural network, also called a perceptron, is a mathematical function that collects and classifies information according to a specific architecture. The perceptron receives input from other nodes or from an external source and computes an output. Each input has an associated weight (w), which is assigned on the basis of its importance relative to the other inputs. The node applies a nonlinear function to the weighted sum of its inputs to create the output. The idea is that the synaptic strengths (the weights w) are revisable based on learning from the training data, which in turn controls the strength and direction of their influence. The learning happens in two steps: forward propagation and back propagation. In simple words, forward propagation is making a guess about the answer, and back propagation is minimising the error between the actual answer and the guessed answer. The process of updating the weights is repeated over multiple iterations to arrive at a decision.

K Nearest Neighbour Technique

K-nearest neighbours (KNN) is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions). A case is classified by a majority vote of its neighbours, meaning that the case is assigned to the most common class amongst its K nearest neighbours as measured by a distance function. Below is a step-by-step procedure to compute K-nearest neighbours.

1. Determine the parameter K = number of neighbours to be used.
2. Calculate the distance between the query instance (the item to be identified as belonging to a preidentified category) and all the training samples.
3. Sort the distances and determine the nearest neighbours based on the Kth minimum distance.
4. Gather the category 𝛾 of each of the nearest neighbours.
5. Use the simple majority of the categories of the nearest neighbours as the predicted value for the query instance.

The most intuitive nearest-neighbour classifier is the 1-nearest neighbour classifier, which assigns a point x to the class of its closest neighbour in the feature space.

Finally, the choice of a particular classifier for a given situation would depend on the relative performance of the candidates in respect of accuracy, sensitivity and specificity. There are deeper issues involved in the use of all these techniques, and considerable developments have taken place in both theory and programming related to the topic.

--- Jayaraman
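The five K-nearest-neighbour steps listed above can be sketched in a few lines. This is only an illustrative sketch: the function name `knn_predict`, the toy two-feature tumour data, and the use of Euclidean distance are assumptions, since the text leaves the distance function open.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: k is supplied by the caller.
    # Step 2: distance from the query to every training sample
    # (Euclidean distance is an assumption; other measures could be used).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 3: sort by distance and keep the k closest samples.
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    # Step 4: gather the categories of those neighbours.
    votes = Counter(label for _, label in nearest)
    # Step 5: the most common category is the prediction.
    # On a tied vote, the category met first among the nearest wins.
    return votes.most_common(1)[0][0]


# Hypothetical training data: two features per case, two categories.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
                (6.0, 6.0), (7.0, 7.5), (6.5, 6.0)]
train_labels = ["benign", "benign", "benign",
                "malignant", "malignant", "malignant"]

print(knn_predict(train_points, train_labels, (1.2, 1.4), k=3))
print(knn_predict(train_points, train_labels, (6.8, 6.9), k=3))
```

Setting `k=1` in this sketch gives the 1-nearest neighbour classifier mentioned above. An odd K is a common way to avoid tied votes when there are two categories.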