1. 2 or more samples. Hypothesis testing begins with a null hypothesis: the
assumption that a given feature in 2 samples is not statistically different.
Alpha is the probability of incorrectly rejecting the null hypothesis, i.e., rejecting a null
hypothesis even though it is true (a Type I error).
Z Test: The most basic and common type of test. It is used to compare a similar
feature in samples from two different populations.
A z-test is used when the sample size is large (n > 50) or the population variance is known; a
t-test is used when the sample size is small (n < 50) and the population variance is unknown.
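As a sketch, the two-sample z statistic can be computed directly from summary statistics (the means, variances, and sample sizes below are made-up illustrative values):

```python
# Minimal two-sample z-test sketch; assumes the population variances are known.
import math

def two_sample_z(mean1, mean2, var1, var2, n1, n2):
    """Return the z statistic and the two-sided p-value."""
    z = (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)
    # Two-sided p-value from the standard normal CDF (via erf).
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Hypothetical samples: both large (n = 100), known unit variances.
z, p = two_sample_z(mean1=5.2, mean2=5.0, var1=1.0, var2=1.0, n1=100, n2=100)
```

If p falls below the chosen alpha (e.g. 0.05), the null hypothesis is rejected.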
T Test: This is the most common test used in clinical trials. It is used to compare the
statistical difference between samples from the same population, with and without the
effect of an external agent.
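The clinical-trial setting above is typically a paired t-test (the same subjects measured before and after the external agent). A minimal sketch of the paired t statistic, with made-up measurements:

```python
# Paired t statistic: mean of the per-subject differences over its standard error.
import math

def paired_t(before, after):
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)  # sample variance of differences
    return mean_d / math.sqrt(var_d / n)

# Hypothetical blood-pressure readings before and after a treatment.
before = [120, 118, 125, 130, 122]
after  = [115, 117, 120, 128, 119]
t = paired_t(before, after)
# |t| is then compared against a t-table critical value with n - 1 degrees of freedom.
```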
ANOVA: What if you want to compare more than 2 samples of the same population (which
is the limitation of the t-test)? ANOVA lets you do that.
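One-way ANOVA boils down to an F statistic: the variance between group means over the variance within groups. A small sketch with three made-up groups:

```python
# One-way ANOVA F statistic: between-group over within-group mean squares.
def one_way_anova_f(*groups):
    all_x = [x for g in groups for x in g]
    grand = sum(all_x) / len(all_x)          # grand mean of all observations
    k, n = len(groups), len(all_x)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Three hypothetical samples from the same population.
f = one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5])
```

A large F (relative to the F distribution with k-1 and n-k degrees of freedom) suggests at least one group mean differs.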
Chi-square test: This is the statistical test for qualitative (categorical) data. It is used to test the
dependence of 2 features on each other, e.g. the association between male gender and
taller height in children of age 5.
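For a 2x2 contingency table, the chi-square statistic compares observed counts to the counts expected under independence. A sketch with a hypothetical gender-by-height table:

```python
# Chi-square statistic for a 2x2 contingency table [[a, b], [c, d]].
def chi_square_2x2(table):
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n   # count expected if features are independent
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = male/female, columns = taller/shorter.
stat = chi_square_2x2([[10, 20], [20, 10]])
```

The statistic is then compared against the chi-square distribution with 1 degree of freedom.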
The binomial distribution is used when there are exactly two mutually exclusive outcomes
of a trial. These outcomes are conventionally labelled "success" and "failure". The binomial
distribution gives the probability of observing x successes in N trials.
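The probability of x successes in N trials follows directly from the binomial formula, C(N, x) * p^x * (1-p)^(N-x):

```python
# Binomial probability mass function.
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Example: probability of exactly 2 heads in 4 fair coin flips.
prob = binomial_pmf(2, 4, 0.5)
```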
Poisson Distribution: can be used to estimate how likely it is that something will happen "X"
number of times.
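The Poisson probability of exactly x events, given an average rate lam, is lam^x * e^(-lam) / x!:

```python
# Poisson probability mass function.
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x events when the mean event count is lam."""
    return lam ** x * exp(-lam) / factorial(x)

# Example: probability of zero events when one is expected on average.
prob = poisson_pmf(0, 1.0)
```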
Logistic Regression is an ML technique that is used to predict the log-odds of the probability of an
event as a linear combination of independent (in other words, predictor) variables.
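The "log-odds as a linear combination" claim can be made concrete: the linear term is the log-odds, and the sigmoid maps it back to a probability. The coefficients below are made-up illustrative values, not from any fitted model:

```python
# Logistic regression prediction: linear log-odds, sigmoid to get a probability.
import math

def predict_proba(x, coef, intercept):
    log_odds = intercept + sum(c * xi for c, xi in zip(coef, x))
    return 1 / (1 + math.exp(-log_odds))     # sigmoid of the log-odds

# Hypothetical coefficients for two predictor variables.
p = predict_proba([1.0, 0.5], coef=[0.8, -1.2], intercept=-0.3)
```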
Decision tree: uses a divide-and-conquer strategy for classification (Gini impurity).
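Gini impurity, the split criterion mentioned above, is 1 minus the sum of squared class proportions at a node; a pure node scores 0:

```python
# Gini impurity of a set of class labels at a tree node.
def gini_impurity(labels):
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1 - sum((c / n) ** 2 for c in counts.values())

# A 50/50 mixed node is maximally impure for two classes; a pure node scores 0.
mixed = gini_impurity([0, 0, 1, 1])
pure = gini_impurity([1, 1, 1])
```

The tree greedily picks the split that most reduces the weighted impurity of the child nodes.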
ROC Curve: can be used to understand the overall performance of a logistic regression model and
for model selection.
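The area under the ROC curve (AUC) summarizes that overall performance in one number: the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A small sketch using that rank-based definition, with made-up scores:

```python
# AUC via its probabilistic interpretation; ties between scores count as half.
def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores: positives mostly, but not always, ranked on top.
score = auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
```

An AUC of 1.0 means perfect ranking; 0.5 is no better than chance.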
Confusion matrix: Accuracy, Sensitivity, Specificity, Precision.
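All four metrics come straight from the confusion-matrix cell counts (true/false positives and negatives); the counts below are made up:

```python
# Standard classification metrics from confusion-matrix counts.
def metrics(tp, fp, tn, fn):
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall / true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),
    }

# Hypothetical counts from evaluating a classifier on 100 examples.
m = metrics(tp=40, fp=10, tn=45, fn=5)
```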
Ridge and Lasso Regression are types of regularization techniques.
Regularization techniques are used to deal with overfitting and when the dataset is large.
Ridge and Lasso Regression involve adding penalties to the regression function.
Ridge performs L2 regularization: adds a penalty equivalent to the square of the magnitude of the coefficients.
Lasso performs L1 regularization: adds a penalty equivalent to the absolute value of the magnitude of the coefficients.
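The two penalty terms added to the regression loss can be written out directly (alpha is the regularization strength; the coefficients below are arbitrary illustrative values):

```python
# Penalty terms added to the least-squares loss by ridge and lasso.
def ridge_penalty(coefs, alpha):
    """L2: alpha times the sum of squared coefficients."""
    return alpha * sum(c ** 2 for c in coefs)

def lasso_penalty(coefs, alpha):
    """L1: alpha times the sum of absolute coefficients."""
    return alpha * sum(abs(c) for c in coefs)

r = ridge_penalty([3.0, -4.0], alpha=1.0)
l = lasso_penalty([3.0, -4.0], alpha=1.0)
```

Because the L1 penalty grows linearly near zero, lasso tends to shrink some coefficients exactly to zero, while ridge only shrinks them toward zero.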