Sweet AI
Statistics

Variables
Quantitative (histogram)
  Discrete (number of students in a class)
  Continuous (weight)
  Interval (temperature)
  Ratio (height, age)
Categorical/Qualitative (bar plot)
  Binary (spam/safe)
  Nominal (non-sortable: colors, genre)
  Ordinal (sortable: grades, product rating)

Probability
Independent events: the occurrence of one does not change the probability of the other.
Dependent events: the occurrence of one changes the probability of the other.
Conditional probability: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
Multiplication rule (intersection)
  Dependent events: $P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$
  Independent events: $P(A \cap B) = P(A)\,P(B)$
Addition rule (union): $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Complement rule: $P(A^{c}) = 1 - P(A)$
Bayes' theorem: $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$
Permutation (order matters)
n: number of items in the set, r: number of spots (the examples use the set {A, B} with r = 2)
  With repetition: $n^{r}$ (ex: AB, BA, AA, BB)
  Without repetition: $\dfrac{n!}{(n-r)!}$ (ex: AB, BA)
Combination (order doesn't matter)
  With repetition: $\dfrac{(n+r-1)!}{r!\,(n-1)!}$ (ex: AA, BA, BB)
  Without repetition: $\dfrac{n!}{r!\,(n-r)!}$ (ex: AB)
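
The probability rules and counting formulas above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function name `bayes_posterior` and the spam-filter numbers are hypothetical, and P(B) is expanded with the total probability rule, which the slide does not state explicitly.

```python
import math
from itertools import combinations, combinations_with_replacement, permutations, product


def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) via Bayes' theorem; P(B) comes from the total probability rule."""
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b


# Hypothetical numbers: 20% of mail is spam; a given word appears in
# 60% of spam messages and in 5% of safe messages.
p_spam_given_word = bayes_posterior(prior_a=0.2, p_b_given_a=0.6, p_b_given_not_a=0.05)

# Counting: fill r = 2 spots from the set {A, B} (n = 2), as in the slide's examples.
items = ["A", "B"]
n, r = len(items), 2
perm_with_rep = ["".join(t) for t in product(items, repeat=r)]
perm_no_rep = ["".join(t) for t in permutations(items, r)]
comb_with_rep = ["".join(t) for t in combinations_with_replacement(items, r)]
comb_no_rep = ["".join(t) for t in combinations(items, r)]

print(p_spam_given_word)
print(perm_with_rep, n ** r)
print(perm_no_rep, math.perm(n, r))
print(comb_with_rep, math.comb(n + r - 1, r))
print(comb_no_rep, math.comb(n, r))
```

Enumerating the outcomes with `itertools` and comparing the list lengths against the closed-form counts is a quick way to confirm which formula applies to a given problem.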

Basic Concepts
Concept: Description
Population: The entire group that you want to draw conclusions about, e.g., all school students in the USA.
Sample: A smaller set randomly drawn from the population, e.g., 700 students from different schools in the USA.
Outliers/Noise/Anomalies: Data points that lie at an abnormal distance from the other observations; they can skew a model.
Variate:
  Univariate → one variable
  Bivariate → two variables
  Multivariate → more than two variables
Sampling methods:
  Probability: simple random, systematic, stratified, cluster
  Non-probability: convenience, snowball, quota, purposive
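
Three of the probability sampling methods listed above can be sketched with the standard library. This is an illustrative sketch only: the function names, the integer "student IDs", and the two school strata are hypothetical and do not come from the slides.

```python
import random


def simple_random_sample(population, k, rng):
    """Every member has the same chance of being selected."""
    return rng.sample(population, k)


def systematic_sample(population, k, rng):
    """Pick a random start, then take every step-th member."""
    step = len(population) // k
    start = rng.randrange(step)
    return population[start::step][:k]


def stratified_sample(strata, fraction, rng):
    """Draw the same fraction at random from each stratum."""
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


rng = random.Random(0)                 # fixed seed so the run is repeatable
population = list(range(100))          # hypothetical student IDs
strata = {"school_a": population[:60], "school_b": population[60:]}

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(strata, 0.1, rng)
print(srs, systematic, stratified, sep="\n")
```

Stratified sampling keeps each stratum's share of the sample equal to its share of the population, which simple random sampling only achieves on average.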

Statistical Measures
Central tendency: mean, median, mode
Dispersion: range, variance, standard deviation, IQR
Association: covariance, correlation

Basic Measurement Concepts
Central tendency (description, example)
Mean/Average ($\mu$, $\bar{x}$): The sum of the values divided by the number of values. Sensitive to outliers.
  Example: [4, 3, 7, 2, 3, 6]: (4 + 3 + 7 + 2 + 3 + 6) / 6 = 25 / 6 → 4.17
Median (Med, M): Sort the values and take the middle one, or the average of the two middle values when the count is even.
  Example: [4, 3, 7, 2, 3, 6]: [2, 3, 3, 4, 6, 7] → (3 + 4) / 2 = 3.5
Mode: The most frequently occurring value.
  Example: [4, 3, 7, 2, 3, 6] → 3
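
The three measures of central tendency can be verified on the slide's example data with Python's `statistics` module. This is a small check added for illustration; only the data list comes from the slides.

```python
from statistics import mean, median, mode

data = [4, 3, 7, 2, 3, 6]

data_mean = mean(data)      # 25 / 6
data_median = median(data)  # average of the two middle values, 3 and 4
data_mode = mode(data)      # 3 occurs twice, every other value once

print(round(data_mean, 2), data_median, data_mode)
```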

Dispersion
Dispersion (description, example)
Range: The difference between the largest and smallest values.
  Example: [4, 3, 7, 2, 3, 6]: 7 - 2 → 5
Variance ($\sigma^2$): Shows how spread out the data points are; measures the width of the distribution around the mean.
  Population: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
  Sample: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
  Example: [4, 3, 7, 2, 3, 6]: $\mu$ = 25/6 ≈ 4.17 → the squared deviations from the mean sum to ≈ 18.83 → $\sigma^2$ ≈ 18.83 / 6 ≈ 3.14
Standard deviation ($\sigma$): How spread out the data is around the mean; also used to flag outliers, since data points more than two or three standard deviations from the mean are commonly treated as unusual. $\sigma = \sqrt{\sigma^2}$
  Example: [4, 3, 7, 2, 3, 6]: $\sigma = \sqrt{3.14}$ ≈ 1.77
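
The population and sample versions of variance and standard deviation differ only in the divisor (N versus n - 1). A short check on the slide's example data, using the standard library's `statistics` module (the variable names are illustrative):

```python
from statistics import pstdev, pvariance, stdev, variance

data = [4, 3, 7, 2, 3, 6]

value_range = max(data) - min(data)  # largest minus smallest
pop_var = pvariance(data)            # divides by N
sample_var = variance(data)          # divides by n - 1
pop_sd = pstdev(data)                # square root of the population variance
sample_sd = stdev(data)              # square root of the sample variance

print(value_range, round(pop_var, 2), round(sample_var, 2))
print(round(pop_sd, 2), round(sample_sd, 2))
```

The sample variance is always the larger of the two, because dividing by n - 1 corrects for estimating the mean from the same data.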
Standard error (SE): The population standard deviation describes how spread out individual values are around the population mean. The standard error describes how far a sample mean is likely to be from the population mean.
  $SE_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$ ($\sigma$: population standard deviation, n: number of data points in the sample) → report the result as mean ± SE
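
The standard error formula can be sketched as a small helper. The function name `standard_error` is hypothetical, and falling back to the sample standard deviation when $\sigma$ is not supplied is an assumption of this sketch (a common practice when the population value is unknown), not something the slide states.

```python
import math
from statistics import mean, stdev


def standard_error(sample, sigma=None):
    """SE of the mean: sigma / sqrt(n).

    Assumption: when the population sigma is not given, the sample
    standard deviation is used as its estimate.
    """
    spread = sigma if sigma is not None else stdev(sample)
    return spread / math.sqrt(len(sample))


data = [4, 3, 7, 2, 3, 6]
se_known = standard_error(data, sigma=2.0)  # hypothetical known population sigma
se_estimated = standard_error(data)         # sigma estimated from the sample

print(f"{mean(data):.2f} ± {se_estimated:.2f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.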

Dispersion (continued)
Concept (description, example)
Quartiles: Sort all data points in ascending order, find the median, then find the median of each half:
  • Q1: lower/first
  • Q2: median/middle/second
  • Q3: upper/third
  Example: [2, 3, 3, 4, 6, 7, 8, 12, 19, 19, 24, 26] → [2, 3, 3 | 4, 6, 7 | 8, 12, 19 | 19, 24, 26] → Q1 = 3.5, Q2 = 7.5, Q3 = 19
Interquartile range (IQR): IQR = Q3 - Q1
  A data point is an outlier if it is below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR
Percentiles: The 99 values that split the sample into 100 equal-size subsamples
[Figure: box plot marking Q1, Q3, the IQR, the whiskers, and the fences at 1.5 × IQR]
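
The quartile and fence calculations can be sketched with the median-of-halves method described on the slide. The helper names `quartiles` and `iqr_fences` are hypothetical; note that library routines often use other interpolation conventions and may return slightly different quartiles.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]    # values below the median position
    upper = s[-half:]   # values above the median position
    return median(lower), median(s), median(upper)


def iqr_fences(values):
    """Return the lower and upper outlier fences at 1.5 x IQR."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


data = [2, 3, 3, 4, 6, 7, 8, 12, 19, 19, 24, 26]
q1, q2, q3 = quartiles(data)
low_fence, high_fence = iqr_fences(data)
outliers = [x for x in data if x < low_fence or x > high_fence]

print(q1, q2, q3, low_fence, high_fence, outliers)
```

For this data set no point falls outside the fences, so the box plot's whiskers would extend to the minimum and maximum values.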

Association
Covariance: Measures how two variables vary together.
  Positive value → the two variables move in the same direction
  Negative value → the two variables move in opposite directions
  Closer to zero indicates a weaker relationship and farther from zero a stronger one, although the magnitude depends on the units of the variables
  Population: $\mathrm{Cov}(X, Y) = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{N}$
  Sample: $\mathrm{Cov}(X, Y) = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{N - 1}$
Pearson correlation coefficient (Pearson's r): Measures the strength and direction of a linear relationship between two variables.
  -1 (strong negative relationship) ≤ r ≤ +1 (strong positive relationship)
  r = 0 → no linear correlation
  $\rho_{X,Y} = \dfrac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$
[Figure omitted. Image from U of Wisconsin.]
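
The covariance and Pearson correlation formulas translate directly into code. The following is a from-scratch sketch for illustration; the function names and the sample data are hypothetical.

```python
import math
from statistics import mean


def covariance(xs, ys, sample=True):
    """Covariance of paired data; divides by N - 1 (sample) or N (population)."""
    x_bar, y_bar = mean(xs), mean(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (len(xs) - 1 if sample else len(xs))


def pearson_r(xs, ys):
    """Pearson's r: sum of cross-deviations over the root of the squared-deviation sums."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - x_bar) ** 2 for x in xs) * sum((y - y_bar) ** 2 for y in ys))
    return num / den


xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]  # hypothetical data: y is exactly 2x

print(covariance(xs, ys), covariance(xs, ys, sample=False))
print(pearson_r(xs, ys), pearson_r(xs, ys[::-1]))
```

Rescaling `ys` changes the covariance but leaves r unchanged, which is why the correlation coefficient is the easier of the two to compare across data sets.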

Distribution
[Figure: distributions. Credit: Harold Toomey, WyzAnt Tutor]
Discrete: probability mass function
Continuous: probability density function

Hypothesis Testing
Hypotheses
  Alternative (H1/Ha), e.g., male salaries are higher than female salaries for the same job position
  Null (H0), e.g., male salaries are equal to female salaries for the same job position
  Non-directional
  Directional
  Statistical

Hypothesis Testing
Hypothesis test
  Parametric tests
    Regression: simple linear regression, multiple linear regression, logistic regression
    Comparison: t-test, ANOVA, MANOVA
    Correlation: Pearson's r
  Non-parametric tests: Spearman's r, chi-square test, ANOSIM, Wilcoxon, sign test

Hypothesis Testing
1. State H0 and H1.
   Example: H0: men are, on average, not getting a higher salary than women. Ha: men are, on average, getting a higher salary than women.
2. Collect testing samples.
   Example: an equal proportion of men and women from a variety of industries within one country.
3. Select and execute a statistical test.
   Example: one-tailed t-test.
4. Infer the results (reject or fail to reject H0).
   Example: an average difference of 20k with a p-value of 0.002, which is consistent with H1.
Terminology: Description
Significance level ($\alpha$), the complement of the confidence level: A threshold for deciding whether a test statistic is statistically significant. Statistical significance means that a relationship between variables is unlikely to be caused by chance alone. $\alpha$ is the area in the tail(s) of the H0 distribution.
  $\alpha$ = 1 - (confidence level / 100) → common choices of $\alpha$: 0.01, 0.05, 0.1
P-value (probability value): Indicates how plausible the observed data are under the null hypothesis, and so whether H0 should be rejected: P(a sample statistic at least this extreme | H0 is true)
  0 ≤ p-value ≤ 1
  p-value ≥ $\alpha$: the results are not statistically significant, fail to reject H0 (the null must fly!)
  p-value < $\alpha$: the results are statistically significant, reject H0 (the null must go!)

Basic Concepts
H0 is ... | True | False
Rejected | Type I error ($\alpha$) | Correct
Not rejected | Correct | Type II error ($\beta$)
Source: https://www.abtasty.com
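
The link between $\alpha$ and the Type I error rate can be illustrated by simulation: when H0 is true, a test run at level $\alpha$ should reject in roughly a fraction $\alpha$ of repeated experiments. This is a sketch under stated assumptions, not part of the original slides: the function names are hypothetical, the data are simulated from a standard normal distribution, and a two-tailed z-test with known $\sigma$ is used.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0, sigma):
    """Two-tailed p-value of a one-sample z-test with known sigma."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_i_error_rate(alpha=0.05, trials=2000, n=30, seed=0):
    """Fraction of simulated experiments that reject H0 although H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: the data really have mean 0 and sigma 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample, mu0=0.0, sigma=1.0) < alpha:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(rate)  # expected to land close to alpha = 0.05
```

Lowering $\alpha$ reduces the Type I error rate but, for a fixed sample size, increases the Type II error rate $\beta$.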

Z-test
• Used to test whether two groups of data differ from each other when the population standard deviation is known
• Normal distribution formula: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
• To calculate the percentile of a data point, standardize the normal distribution to the standard normal distribution
• The standard normal distribution has $\mu = 0$ and $\sigma = 1$
• How to determine x's percentile/probability, or how far from typical a result is:
  1. Standardize the value by computing a z-score from the population $\mu$ and the population $\sigma$
     - for a single raw datum x: $z = \dfrac{x - \mu}{\sigma}$
     - for the mean of n independent, identically distributed samples: $Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
     - for a proportion: $Z = \dfrac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$ ($\hat{p}$: observed sample proportion, p: hypothesized population proportion, n: sample size)
  2. Look up the z-score in a z-table to map it to the area under the normal distribution curve and obtain the p-value
  3. Compare the p-value with $\alpha$: if p-value ≥ $\alpha$, fail to reject H0; otherwise reject H0
[Figure: z-table. Source: www.z-table.com]
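
The z-test steps can be sketched with the standard library's `NormalDist`, which plays the role of the z-table. The function names and the numbers in the example (a sample mean of 103 against a population with mean 100 and standard deviation 15) are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mu = 0, sigma = 1


def z_score(x, mu, sigma):
    """z for a single raw datum."""
    return (x - mu) / sigma


def z_for_mean(sample_mean, mu, sigma, n):
    """z for the mean of n independent samples."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))


def z_for_proportion(p_hat, p, n):
    """z for an observed proportion p_hat against a hypothesized proportion p."""
    return (p_hat - p) / math.sqrt(p * (1 - p) / n)


def percentile(z):
    """Area under the standard normal curve to the left of z."""
    return STD_NORMAL.cdf(z)


def two_tailed_p_value(z):
    """Probability of a z at least this far from 0 in either direction."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


alpha = 0.05
z = z_for_mean(sample_mean=103, mu=100, sigma=15, n=100)
p = two_tailed_p_value(z)
decision = "fail to reject H0" if p >= alpha else "reject H0"
print(z, p, decision)
```

`STD_NORMAL.inv_cdf` performs the reverse lookup, from a percentile back to a z-score.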

Student t-test
• Used to test whether two groups of data differ from each other when the population standard deviation is unknown
• Assumptions:
  - Normal distribution
  - Similar variance in each group/sample
  - Same number of data points in each group/sample (20-30); with more than this, the z-test should be used
• H0: there is no difference between the groups
• H1: there is a difference between the groups
Types of t-test (description, formula, degrees of freedom)
One-sample t-test: tests whether a population mean is equal to some value $\mu$.
  $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$ ($\bar{x}$: sample mean, $\mu$: population mean, s: sample standard deviation, n: sample size)
  df = n - 1
Dependent/paired-samples t-test: tests whether two population means are equal by sampling the same population twice.
  $t = \dfrac{\sum d}{\sqrt{\dfrac{n \sum d^2 - \left(\sum d\right)^2}{n - 1}}}$ (d: difference between the paired values, n: number of pairs)
  df = n - 1
Independent two-sample (unpaired) t-test: tests whether two population means are equal, using two independent samples that may differ in size and variance.
  $t = \dfrac{\text{signal}}{\text{noise}} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ ($s_1^2$, $s_2^2$: sample variances)
  df = $n_1 + n_2 - 2$ (the equal-variance value; with clearly unequal variances, the Welch-Satterthwaite approximation gives a smaller df)
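
The three t-statistics can be computed directly from their formulas. This is a from-scratch sketch for illustration: the function names and the before/after measurements are hypothetical, and the independent-sample helper returns the slide's df of n1 + n2 - 2 rather than a Welch-adjusted value.

```python
import math
from statistics import mean, stdev, variance


def one_sample_t(sample, mu):
    """t = (x_bar - mu) / (s / sqrt(n)), df = n - 1."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / math.sqrt(n)), n - 1


def paired_t(before, after):
    """t from the paired differences d = after - before, df = n - 1."""
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(x * x for x in d)
    t = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return t, n - 1


def independent_t(sample1, sample2):
    """t = (x1_bar - x2_bar) / sqrt(s1^2/n1 + s2^2/n2), df = n1 + n2 - 2."""
    n1, n2 = len(sample1), len(sample2)
    noise = math.sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    return (mean(sample1) - mean(sample2)) / noise, n1 + n2 - 2


before = [10, 12, 9, 11, 13]   # hypothetical measurements
after = [12, 14, 10, 13, 16]

t_paired, df_paired = paired_t(before, after)
t_indep, df_indep = independent_t(after, before)
print(t_paired, df_paired)
print(t_indep, df_indep)
```

The paired statistic is algebraically the same as a one-sample t-test of the differences against 0, which is a convenient way to sanity-check an implementation.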

Student t-test
1. Calculate the t-value and df
2. Decide on a one- or two-tailed test and the level of confidence
3. Look up the critical value in a t-table and decide whether to reject or fail to reject H0
   t-value < critical value → do not reject H0
   t-value > critical value → reject H0
   (for a two-tailed test, compare the absolute value of the t-value)
[Figure: t-table indexed by degrees of freedom (df). Source: stanford.edu]

t-test vs. z-test
• The t-test and the z-test are both used to determine whether a difference observed in a set of data is statistically significant.
Start
  Known $\sigma$
    Sample size < 30 → is the population highly skewed?
      No → t-test
      Yes → sign test
    Sample size >= 30 → is the population highly skewed?
      No → z-test
      Yes → alternative methods
  Unknown $\sigma$ → is the population highly skewed?
    No → t-test
    Yes → alternative methods
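
The flowchart above can be expressed as a small decision function. This is one reading of the flattened diagram, assuming that a "No" answer to the skewness question leads to the t-test or z-test and a "Yes" answer leads to the sign test or alternative methods; the function name `choose_test` is hypothetical.

```python
def choose_test(sigma_known, sample_size, highly_skewed):
    """Pick a test following the t-test vs. z-test flowchart (as reconstructed)."""
    if sigma_known:
        if sample_size < 30:
            return "sign test" if highly_skewed else "t-test"
        return "alternative methods" if highly_skewed else "z-test"
    # Unknown sigma: sample size is not part of this branch of the flowchart.
    return "alternative methods" if highly_skewed else "t-test"


print(choose_test(sigma_known=True, sample_size=50, highly_skewed=False))
print(choose_test(sigma_known=False, sample_size=20, highly_skewed=False))
print(choose_test(sigma_known=True, sample_size=12, highly_skewed=True))
```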

Analysis of Variance (ANOVA)
• ANOVA determines the effect of one or more categorical independent variables on one numerical dependent variable.
ANOVA types
  One-way: 1 categorical independent variable on a single dependent variable
  Two-way: 2 categorical independent variables on a single dependent variable
  N-way: multiple categorical independent variables
Steps
1. Calculate the variance between groups and within groups
2. Calculate the degrees of freedom
3. Compute the F-value: $F = \dfrac{\text{variance between groups}}{\text{variance within groups}} = \dfrac{SSBG / df_1}{SSWG / df_2}$, with $df_1 = m - 1$ and $df_2 = m(n - 1)$ (n: number of samples in each group, m: number of groups)
4. Find the critical value (F-score) in an F-distribution table using $df_1$, $df_2$, and a selected alpha: http://www.socr.ucla.edu/Applets.dir/F_Table.html
5. Compare the F-value with the critical value: if F-value < F-critical, fail to reject H0; otherwise H0 is rejected
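
Steps 1 to 3 of the one-way ANOVA procedure can be sketched from scratch. The function name `one_way_anova_f` and the three small groups are hypothetical; the degrees of freedom are computed as (number of groups - 1) and (total observations - number of groups), which for equal group sizes equals m(n - 1).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df1, df2) for a one-way ANOVA over a list of groups."""
    m = len(groups)
    total_n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])

    # Sum of squares between groups: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Sum of squares within groups: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df1 = m - 1
    df2 = total_n - m
    return (ss_between / df1) / (ss_within / df2), df1, df2


groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]  # hypothetical data
f_value, df1, df2 = one_way_anova_f(groups)
print(f_value, df1, df2)
```

The resulting F-value is then compared with the critical value from an F table for (df1, df2) at the chosen alpha, as in steps 4 and 5.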

[Figure: t-test vs. z-test]
[Figure: hypothesis testing. Source: towardsdatascience.com]

Python Library
Type of test: SciPy code
Check whether data follow a Gaussian distribution:
    from scipy.stats import shapiro, normaltest
    stat, p = shapiro(data)  # p > 0.05: consistent with a Gaussian distribution
Check for a linear relationship between two samples:
    from scipy.stats import pearsonr
    stat, p = pearsonr(data1, data2)  # p > 0.05: no significant linear relationship
Check for a monotonic relationship between two samples:
    from scipy.stats import spearmanr, kendalltau
    stat, p = spearmanr(data1, data2)  # p > 0.05: no significant monotonic relationship
Check for a relationship between two categorical variables:
    from scipy.stats import chi2_contingency
    stat, p, dof, expected = chi2_contingency(table)  # p > 0.05: consistent with independence
Convert between a z-score and a percentile:
    from scipy import stats
    stats.norm.cdf(z)  # percentile (area to the left) of a z-score
    stats.norm.ppf(p)  # z-score at a percentile p
Test whether the means of two independent, normally distributed samples differ significantly (Student t-test):
    from scipy.stats import ttest_ind
    stat, p = ttest_ind(data1, data2)  # p > 0.05: no significant difference in means
Test whether the means of two paired, normally distributed samples differ significantly (Student t-test):
    from scipy.stats import ttest_rel
    stat, p = ttest_rel(data1, data2)  # p > 0.05: no significant difference in means
Test whether the means of two or more independent, normally distributed samples differ significantly (ANOVA):
    from scipy.stats import f_oneway
    stat, p = f_oneway(data1, data2, data3)  # p > 0.05: no significant difference in means
Test whether the distributions of two independent samples are equal (Mann-Whitney U test):
    from scipy.stats import mannwhitneyu
    stat, p = mannwhitneyu(data1, data2)  # p > 0.05: no significant difference between the distributions
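
A few of the SciPy calls listed above can be combined into a short end-to-end check. This is a sketch with made-up measurements (`group_a` and `group_b` are hypothetical) and assumes SciPy is installed; it first checks normality and then compares the group means at a significance level of 0.05.

```python
from scipy.stats import mannwhitneyu, shapiro, ttest_ind

ALPHA = 0.05

# Hypothetical measurements for two independent groups.
group_a = [5.1, 4.9, 5.6, 5.8, 5.0, 5.3, 5.5, 4.7]
group_b = [6.1, 5.9, 6.4, 6.6, 6.0, 6.3, 5.8, 6.5]

# Step 1: Shapiro-Wilk normality check for each group.
_, p_norm_a = shapiro(group_a)
_, p_norm_b = shapiro(group_b)
looks_normal = p_norm_a > ALPHA and p_norm_b > ALPHA

# Step 2: parametric test if both groups look normal, otherwise a rank-based test.
if looks_normal:
    test_name = "t-test"
    stat, p_value = ttest_ind(group_a, group_b)
else:
    test_name = "Mann-Whitney U"
    stat, p_value = mannwhitneyu(group_a, group_b)

decision = "reject H0" if p_value < ALPHA else "fail to reject H0"
print(test_name, stat, p_value, decision)
```

Choosing the test from a normality check on the same data is a simplification for illustration; in practice the choice is usually made from prior knowledge of the measurement.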

More Related Content

What's hot

Applied Statistics In Business
Applied Statistics In BusinessApplied Statistics In Business
Applied Statistics In BusinessAshish Nangla
Β 
Introduction to Regression Analysis and R
Introduction to Regression Analysis and R   Introduction to Regression Analysis and R
Introduction to Regression Analysis and R Rachana Taneja Bhatia
Β 
Statistical tests/prosthodontic courses
Statistical tests/prosthodontic coursesStatistical tests/prosthodontic courses
Statistical tests/prosthodontic coursesIndian dental academy
Β 
Chapter14
Chapter14Chapter14
Chapter14rwmiller
Β 
PG STAT 531 Lecture 2 Descriptive statistics
PG STAT 531 Lecture 2 Descriptive statisticsPG STAT 531 Lecture 2 Descriptive statistics
PG STAT 531 Lecture 2 Descriptive statisticsAashish Patel
Β 
Reporting a single sample t-test
Reporting a single sample t-testReporting a single sample t-test
Reporting a single sample t-testKen Plummer
Β 
Regression analysis in R
Regression analysis in RRegression analysis in R
Regression analysis in RAlichy Sowmya
Β 
Bbs10 ppt ch03
Bbs10 ppt ch03Bbs10 ppt ch03
Bbs10 ppt ch03Anwar Afridi
Β 
Linear regression analysis
Linear regression analysisLinear regression analysis
Linear regression analysisNimrita Koul
Β 
Introduction to correlation and regression analysis
Introduction to correlation and regression analysisIntroduction to correlation and regression analysis
Introduction to correlation and regression analysisFarzad Javidanrad
Β 
Presentation on Regression Analysis
Presentation on Regression AnalysisPresentation on Regression Analysis
Presentation on Regression AnalysisJ P Verma
Β 
PG STAT 531 Lecture 3 Graphical and Diagrammatic Representation of Data
PG STAT 531 Lecture 3 Graphical and Diagrammatic Representation of DataPG STAT 531 Lecture 3 Graphical and Diagrammatic Representation of Data
PG STAT 531 Lecture 3 Graphical and Diagrammatic Representation of DataAashish Patel
Β 
One-way ANOVA research paper
One-way ANOVA research paperOne-way ANOVA research paper
One-way ANOVA research paperJose Dela Cruz
Β 

What's hot (20)

Applied Statistics In Business
Applied Statistics In BusinessApplied Statistics In Business
Applied Statistics In Business
Β 
Introduction to Regression Analysis and R
Introduction to Regression Analysis and R   Introduction to Regression Analysis and R
Introduction to Regression Analysis and R
Β 
Statistical tests/prosthodontic courses
Statistical tests/prosthodontic coursesStatistical tests/prosthodontic courses
Statistical tests/prosthodontic courses
Β 
Chapter14
Chapter14Chapter14
Chapter14
Β 
PG STAT 531 Lecture 2 Descriptive statistics
PG STAT 531 Lecture 2 Descriptive statisticsPG STAT 531 Lecture 2 Descriptive statistics
PG STAT 531 Lecture 2 Descriptive statistics
Β 
Ds vs Is discuss 3.1
Ds vs Is discuss 3.1Ds vs Is discuss 3.1
Ds vs Is discuss 3.1
Β 
Reporting a single sample t-test
Reporting a single sample t-testReporting a single sample t-test
Reporting a single sample t-test
Β 
Regression analysis in R
Regression analysis in RRegression analysis in R
Regression analysis in R
Β 
Statistics - Basics
Statistics - BasicsStatistics - Basics
Statistics - Basics
Β 
Applied statistics part 5
Applied statistics part 5Applied statistics part 5
Applied statistics part 5
Β 
Bbs10 ppt ch03
Bbs10 ppt ch03Bbs10 ppt ch03
Bbs10 ppt ch03
Β 
Linear regression analysis
Linear regression analysisLinear regression analysis
Linear regression analysis
Β 
T test statistics
T test statisticsT test statistics
T test statistics
Β 
Applied statistics part 3
Applied statistics part 3Applied statistics part 3
Applied statistics part 3
Β 
Applied statistics part 4
Applied statistics part  4Applied statistics part  4
Applied statistics part 4
Β 
Introduction to correlation and regression analysis
Introduction to correlation and regression analysisIntroduction to correlation and regression analysis
Introduction to correlation and regression analysis
Β 
Anova (1)
Anova (1)Anova (1)
Anova (1)
Β 
Presentation on Regression Analysis
Presentation on Regression AnalysisPresentation on Regression Analysis
Presentation on Regression Analysis
Β 
PG STAT 531 Lecture 3 Graphical and Diagrammatic Representation of Data
PG STAT 531 Lecture 3 Graphical and Diagrammatic Representation of DataPG STAT 531 Lecture 3 Graphical and Diagrammatic Representation of Data
PG STAT 531 Lecture 3 Graphical and Diagrammatic Representation of Data
Β 
One-way ANOVA research paper
One-way ANOVA research paperOne-way ANOVA research paper
One-way ANOVA research paper
Β 

Similar to BasicStatistics.pdf

Medical statistics2
Medical statistics2Medical statistics2
Medical statistics2Amany El-seoud
Β 
Basics in Epidemiology & Biostatistics 2 RSS6 2014
Basics in Epidemiology & Biostatistics 2 RSS6 2014Basics in Epidemiology & Biostatistics 2 RSS6 2014
Basics in Epidemiology & Biostatistics 2 RSS6 2014RSS6
Β 
Applied Statistics And Doe Mayank
Applied Statistics And Doe MayankApplied Statistics And Doe Mayank
Applied Statistics And Doe Mayankrealmayank
Β 
MPhil clinical psy Non-parametric statistics.pptx
MPhil clinical psy Non-parametric statistics.pptxMPhil clinical psy Non-parametric statistics.pptx
MPhil clinical psy Non-parametric statistics.pptxrodrickrajamanickam
Β 
Chi square test social research refer.ppt
Chi square test social research refer.pptChi square test social research refer.ppt
Chi square test social research refer.pptSnehamurali18
Β 
Quantitative_analysis.ppt
Quantitative_analysis.pptQuantitative_analysis.ppt
Quantitative_analysis.pptmousaderhem1
Β 
marketing research & applications on SPSS
marketing research & applications on SPSSmarketing research & applications on SPSS
marketing research & applications on SPSSANSHU TIWARI
Β 
Chi square and t tests, Neelam zafar & group
Chi square and t tests, Neelam zafar & groupChi square and t tests, Neelam zafar & group
Chi square and t tests, Neelam zafar & groupNeelam Zafar
Β 
Marketing Research Hypothesis Testing.pptx
Marketing Research Hypothesis Testing.pptxMarketing Research Hypothesis Testing.pptx
Marketing Research Hypothesis Testing.pptxxababid981
Β 
Elementary statistics for Food Indusrty
Elementary statistics for Food IndusrtyElementary statistics for Food Indusrty
Elementary statistics for Food IndusrtyAtcharaporn Khoomtong
Β 
Medical Statistics Part-II:Inferential statistics
Medical Statistics Part-II:Inferential  statisticsMedical Statistics Part-II:Inferential  statistics
Medical Statistics Part-II:Inferential statisticsRamachandra Barik
Β 
Week 5 Lecture 14 The Chi Square TestQuite often, patterns of .docx
Week 5 Lecture 14 The Chi Square TestQuite often, patterns of .docxWeek 5 Lecture 14 The Chi Square TestQuite often, patterns of .docx
Week 5 Lecture 14 The Chi Square TestQuite often, patterns of .docxcockekeshia
Β 
Categorical data analysis full lecture note PPT.pptx
Categorical data analysis full lecture note  PPT.pptxCategorical data analysis full lecture note  PPT.pptx
Categorical data analysis full lecture note PPT.pptxMinilikDerseh1
Β 
Proportion test using Chi square
Proportion test using Chi squareProportion test using Chi square
Proportion test using Chi squareParag Shah
Β 
Data analysis
Data analysisData analysis
Data analysismetalkid132
Β 

Similar to BasicStatistics.pdf (20)

Medical statistics2
Medical statistics2Medical statistics2
Medical statistics2
Β 
Stat2013
Stat2013Stat2013
Stat2013
Β 
Basics in Epidemiology & Biostatistics 2 RSS6 2014
Basics in Epidemiology & Biostatistics 2 RSS6 2014Basics in Epidemiology & Biostatistics 2 RSS6 2014
Basics in Epidemiology & Biostatistics 2 RSS6 2014
Β 
Applied Statistics And Doe Mayank
Applied Statistics And Doe MayankApplied Statistics And Doe Mayank
Applied Statistics And Doe Mayank
Β 
MPhil clinical psy Non-parametric statistics.pptx
MPhil clinical psy Non-parametric statistics.pptxMPhil clinical psy Non-parametric statistics.pptx
MPhil clinical psy Non-parametric statistics.pptx
Β 
Chi square test social research refer.ppt
Chi square test social research refer.pptChi square test social research refer.ppt
Chi square test social research refer.ppt
Β 
Quantitative_analysis.ppt
Quantitative_analysis.pptQuantitative_analysis.ppt
Quantitative_analysis.ppt
Β 
marketing research & applications on SPSS
marketing research & applications on SPSSmarketing research & applications on SPSS
marketing research & applications on SPSS
Β 
Chi square and t tests, Neelam zafar & group
Chi square and t tests, Neelam zafar & groupChi square and t tests, Neelam zafar & group
Chi square and t tests, Neelam zafar & group
Β 
Marketing Research Hypothesis Testing.pptx
Marketing Research Hypothesis Testing.pptxMarketing Research Hypothesis Testing.pptx
Marketing Research Hypothesis Testing.pptx
Β 
Elementary statistics for Food Indusrty
Elementary statistics for Food IndusrtyElementary statistics for Food Indusrty
Elementary statistics for Food Indusrty
Β 
elementary statistic
elementary statisticelementary statistic
elementary statistic
Β 
Stat topics
Stat topicsStat topics
Stat topics
Β 
Medical Statistics Part-II:Inferential statistics
Medical Statistics Part-II:Inferential  statisticsMedical Statistics Part-II:Inferential  statistics
Medical Statistics Part-II:Inferential statistics
Β 
Chi2 Anova
Chi2 AnovaChi2 Anova
Chi2 Anova
Β 
Week 5 Lecture 14 The Chi Square TestQuite often, patterns of .docx
Week 5 Lecture 14 The Chi Square TestQuite often, patterns of .docxWeek 5 Lecture 14 The Chi Square TestQuite often, patterns of .docx
Week 5 Lecture 14 The Chi Square TestQuite often, patterns of .docx
Β 
Categorical data analysis full lecture note PPT.pptx
Categorical data analysis full lecture note  PPT.pptxCategorical data analysis full lecture note  PPT.pptx
Categorical data analysis full lecture note PPT.pptx
Β 
Proportion test using Chi square
Proportion test using Chi squareProportion test using Chi square
Proportion test using Chi square
Β 
Data analysis
Data analysisData analysis
Data analysis
Β 
Quantitative Data analysis
Quantitative Data analysisQuantitative Data analysis
Quantitative Data analysis
Β 

Recently uploaded

Call Girls 🫀 Dwarka ➑️ 9711199171 ➑️ Delhi 🫦 Two shot with one girl
Call Girls 🫀 Dwarka ➑️ 9711199171 ➑️ Delhi 🫦 Two shot with one girlCall Girls 🫀 Dwarka ➑️ 9711199171 ➑️ Delhi 🫦 Two shot with one girl
Call Girls 🫀 Dwarka ➑️ 9711199171 ➑️ Delhi 🫦 Two shot with one girlkumarajju5765
Β 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
Β 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
Β 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
Β 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
Β 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
Β 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
Β 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptDr. Soumendra Kumar Patra
Β 
Junnasandra Call Girls: πŸ“ 7737669865 πŸ“ High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: πŸ“ 7737669865 πŸ“ High Profile Model Escorts | Bangalore...Junnasandra Call Girls: πŸ“ 7737669865 πŸ“ High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: πŸ“ 7737669865 πŸ“ High Profile Model Escorts | Bangalore...amitlee9823
Β 
꧁❀ Greater Noida Call Girls Delhi ❀꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❀ Greater Noida Call Girls Delhi ❀꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❀ Greater Noida Call Girls Delhi ❀꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❀ Greater Noida Call Girls Delhi ❀꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
Β 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
Β 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
Β 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
Β 
Delhi Call Girls CP 9711199171 β˜Žβœ”πŸ‘Œβœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 β˜Žβœ”πŸ‘Œβœ” Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 β˜Žβœ”πŸ‘Œβœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 β˜Žβœ”πŸ‘Œβœ” Whatsapp Hard And Sexy Vip Callshivangimorya083
Β 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
Β 
Chintamani Call Girls: πŸ“ 7737669865 πŸ“ High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: πŸ“ 7737669865 πŸ“ High Profile Model Escorts | Bangalore ...Chintamani Call Girls: πŸ“ 7737669865 πŸ“ High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: πŸ“ 7737669865 πŸ“ High Profile Model Escorts | Bangalore ...amitlee9823
Β 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
Β 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
Β 

Recently uploaded (20)

Call Girls 🫀 Dwarka ➑️ 9711199171 ➑️ Delhi 🫦 Two shot with one girl
Call Girls 🫀 Dwarka ➑️ 9711199171 ➑️ Delhi 🫦 Two shot with one girlCall Girls 🫀 Dwarka ➑️ 9711199171 ➑️ Delhi 🫦 Two shot with one girl
Call Girls 🫀 Dwarka ➑️ 9711199171 ➑️ Delhi 🫦 Two shot with one girl
Β 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Β 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Β 
CHEAP Call Girls in Saket (-DELHI )πŸ” 9953056974πŸ”(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )πŸ” 9953056974πŸ”(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )πŸ” 9953056974πŸ”(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )πŸ” 9953056974πŸ”(=)/CALL GIRLS SERVICE
Β 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
Β 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
Β 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
Β 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
Β 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
Β 
Junnasandra Call Girls: πŸ“ 7737669865 πŸ“ High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: πŸ“ 7737669865 πŸ“ High Profile Model Escorts | Bangalore...Junnasandra Call Girls: πŸ“ 7737669865 πŸ“ High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: πŸ“ 7737669865 πŸ“ High Profile Model Escorts | Bangalore...
Β 
꧁❀ Greater Noida Call Girls Delhi ❀꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❀ Greater Noida Call Girls Delhi ❀꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❀ Greater Noida Call Girls Delhi ❀꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❀ Greater Noida Call Girls Delhi ❀꧂ 9711199171 ☎️ Hard And Sexy Vip Call
Β 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
Β 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
Β 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
Β 
Delhi Call Girls CP 9711199171 β˜Žβœ”πŸ‘Œβœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 β˜Žβœ”πŸ‘Œβœ” Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 β˜Žβœ”πŸ‘Œβœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 β˜Žβœ”πŸ‘Œβœ” Whatsapp Hard And Sexy Vip Call
Β 
꧁❀ Aerocity Call Girls Service Aerocity Delhi ❀꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❀ Aerocity Call Girls Service Aerocity Delhi ❀꧂ 9999965857 ☎️ Hard And Sexy ...꧁❀ Aerocity Call Girls Service Aerocity Delhi ❀꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❀ Aerocity Call Girls Service Aerocity Delhi ❀꧂ 9999965857 ☎️ Hard And Sexy ...
Β 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Β 
Chintamani Call Girls: πŸ“ 7737669865 πŸ“ High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: πŸ“ 7737669865 πŸ“ High Profile Model Escorts | Bangalore ...Chintamani Call Girls: πŸ“ 7737669865 πŸ“ High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: πŸ“ 7737669865 πŸ“ High Profile Model Escorts | Bangalore ...
Β 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Β 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
Β 

BasicStatistics.pdf

  • 2. Variables
      - Quantitative (plotted with a histogram)
          - Discrete (e.g., number of students in a class)
          - Continuous (e.g., weight)
          - Interval (e.g., temperature)
          - Ratio (e.g., height, age)
      - Categorical / Qualitative (plotted with a bar plot)
          - Binary (e.g., spam/safe)
          - Nominal (non-sortable: colors, genre)
          - Ordinal (sortable: grades, product rating)
  • 3. Probability
      - Events: independent or dependent
      - Conditional probability: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
      - Multiplication rule / intersection
          - Dependent events: $P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$
          - Independent events: $P(A \cap B) = P(A)\,P(B)$
      - Addition rule / union: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
      - Complement rule: $P(\bar{A}) = 1 - P(A)$
      - Bayes' theorem: $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$
      - Permutation (order matters); n: number of items in the set, r: number of spots
          - With repetition: $n^r$, e.g., AB, BA, AA, BB
          - Without repetition: $\dfrac{n!}{(n-r)!}$, e.g., AB, BA
      - Combination (order doesn't matter)
          - With repetition: $\dfrac{(n+r-1)!}{r!\,(n-1)!}$, e.g., AA, BA, BB
          - Without repetition: $\dfrac{n!}{r!\,(n-r)!}$, e.g., AB
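The counting formulas and Bayes' theorem on this slide can be checked with a few lines of standard-library Python. This is a minimal sketch: the function names are illustrative, and the probabilities passed to `bayes` are made-up numbers, not values from the slides.

```python
from math import comb, perm


def count_arrangements(n: int, r: int) -> dict:
    """Count ways to fill r spots from a set of n items."""
    return {
        "permutation_with_repetition": n ** r,             # n^r
        "permutation_without_repetition": perm(n, r),      # n! / (n-r)!
        "combination_with_repetition": comb(n + r - 1, r),  # (n+r-1)! / (r!(n-1)!)
        "combination_without_repetition": comb(n, r),      # n! / (r!(n-r)!)
    }


def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# The slide's example set {A, B} with two spots: n = 2, r = 2.
counts = count_arrangements(2, 2)
# Hypothetical inputs chosen only to exercise the formula.
posterior = bayes(p_b_given_a=0.9, p_a=0.01, p_b=0.05)
```

With n = 2 and r = 2 the four counts are 4, 2, 3 and 1, matching the example lists on the slide (AB, BA, AA, BB; AB, BA; AA, BA, BB; AB).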
  • 4. Basic Concepts
      - Population: the entire dataset that you want to draw conclusions about, e.g., all school students in the USA.
      - Sample: a smaller set randomly drawn from the population, e.g., 700 volunteer students from different schools in the USA.
      - Outliers / noise / anomalies: data points that lie at an abnormal distance from the other observations; they can skew the model.
      - Variate: univariate → one variable; bivariate → two variables; multivariate → more than two variables.
      - Sampling methods
          - Probability: simple random, systematic, stratified, cluster
          - Non-probability: convenience, snowball, quota, purposive
  • 5. Statistical Measures
      - Central Tendency: mean, median, mode
      - Central Dispersion: range, variance, standard deviation, IQR
      - Association: covariance, correlation
  • 6. Basic Measurement Concepts: Central Tendency
      - Mean / average ($\mu$, $\bar{x}$): the sum of the numbers divided by how many numbers there are. Sensitive to outliers. Example: [4, 3, 7, 2, 3, 6]: (4 + 3 + 7 + 2 + 3 + 6) / 6 = 25 / 6 → 4.17
      - Median (Med, M): sort the numbers and take the middle one (the average of the two middle ones when the count is even). Example: [4, 3, 7, 2, 3, 6]: [2, 3, 3, 4, 6, 7] → (3 + 4) / 2 = 3.5
      - Mode: the most frequently occurring number. Example: [4, 3, 7, 2, 3, 6] → 3
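The three measures can be reproduced with Python's built-in `statistics` module. The sketch below uses the slide's example list; the extra value 100 is an assumption added only to illustrate how an outlier pulls the mean but barely moves the median.

```python
from statistics import mean, median, mode

data = [4, 3, 7, 2, 3, 6]

data_mean = mean(data)      # sum / count = 25 / 6
data_median = median(data)  # average of the two middle values of the sorted list
data_mode = mode(data)      # most frequent value

# Hypothetical outlier, added to show the sensitivity of the mean.
with_outlier = data + [100]
mean_with_outlier = mean(with_outlier)
median_with_outlier = median(with_outlier)
```

Here the mean jumps from about 4.17 to about 17.86 once the outlier is added, while the median only moves from 3.5 to 4.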
  • 7. Central Dispersion
      - Range: the difference between the smallest and largest number. Example: [4, 3, 7, 2, 3, 6]: 7 − 2 → 5
      - Variance ($\sigma^2$): shows how spread out the data points are; measures the width of the distribution around the mean.
          - Population: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
          - Sample: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
          - Example: [4, 3, 7, 2, 3, 6]: $\mu = 25/6 \approx 4.17$, $\sum (x_i - \mu)^2 \approx 18.83$ → $\sigma^2 = 18.83 / 6 \approx 3.14$
      - Standard deviation ($\sigma$): how spread out the data are around the mean; used to identify outliers. Data points several standard deviations from the mean (a common rule of thumb is more than 2 or 3) may be considered unusual. $\sigma = \sqrt{\sigma^2}$. Example: [4, 3, 7, 2, 3, 6]: $\sigma = \sqrt{3.14} \approx 1.77$
      - Standard error (SE): the population standard deviation describes how spread out individual values are around the population mean; the standard error estimates how far a sample mean is likely to be from the population mean. $SE_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$ ($\sigma$: population standard deviation, n: number of data points in the sample) → report the result as mean ± SE
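A short sketch of these dispersion measures using only the standard library, applied to the slide's example list. One assumption to flag: the standard error below plugs in the sample standard deviation as an estimate of σ, which is the usual substitute when the population value is unknown.

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev, variance

data = [4, 3, 7, 2, 3, 6]

data_range = max(data) - min(data)  # largest minus smallest
pop_variance = pvariance(data)      # divides by N
sample_variance = variance(data)    # divides by n - 1
pop_sd = pstdev(data)               # square root of the population variance

# Assumption: sigma is unknown, so the sample standard deviation stands in for it.
standard_error = stdev(data) / sqrt(len(data))
```

For this list the population variance is 113/36 ≈ 3.14 and the sample variance is about 3.77; the two differ only in the denominator.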
  • 8. Central Dispersion: quartiles, IQR, percentiles
      - Quartiles: sort all data points in ascending order, find the median, then find the median of each of the two halves:
          - Q1: lower / first
          - Q2: median / middle / second
          - Q3: upper / third
          - Example: [2, 3, 3, 4, 6, 7, 8, 12, 19, 19, 24, 26] → [2, 3, 3 | 4, 6, 7 | 8, 12, 19 | 19, 24, 26]
      - Interquartile range (IQR): IQR = Q3 − Q1. A point is an outlier if it lies below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.
      - Percentiles: 99 values that split the sample into 100 equal-size subsamples.
      - [Figure: box plot showing Q1, Q3, the IQR, the whiskers, and the fences at 1.5 × IQR]
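The quartile and fence rules can be sketched as follows. This follows the "median of each half" method described on the slide; other quartile conventions (including the ones in `statistics.quantiles`) give slightly different values. The function names and the extra value 60 are illustrative assumptions.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    # For an odd count, the middle value is excluded from both halves.
    upper = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)


def find_outliers(values):
    """Return the values outside the 1.5 * IQR fences."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


data = [2, 3, 3, 4, 6, 7, 8, 12, 19, 19, 24, 26]
q1, q2, q3 = quartiles(data)
outliers = find_outliers(data + [60])  # 60 is a hypothetical extreme value
```

On the slide's list this gives Q1 = 3.5, Q2 = 7.5 and Q3 = 19, so the original data contain no outliers; the added 60 is the only value flagged.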
  • 9. Association
      - Covariance: measures the relationship between two variables.
          - Positive value → the two variables move in the same direction
          - Negative value → the two variables move in opposite directions
          - Closer to zero indicates a weaker relationship; farther from zero indicates a stronger one
          - Population: $\mathrm{Cov}(X, Y) = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{N}$
          - Sample: $\mathrm{Cov}(X, Y) = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{N - 1}$
      - Pearson correlation coefficient / Pearson's r: measures the strength and direction of a linear relationship between two variables.
          - −1 (strong negative relationship) ≤ r ≤ +1 (strong positive relationship); r = 0 → no linear correlation
          - $\rho_{X,Y} = \dfrac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$
      - [Figure: image from U of Wisconsin]
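Both association measures follow directly from the sums of deviations. The sketch below implements them by hand so the formulas stay visible; the function names and the two short lists are illustrative assumptions, not data from the slides.

```python
from math import sqrt
from statistics import mean


def covariance(xs, ys, sample=True):
    """Covariance of two equal-length sequences (N - 1 or N in the denominator)."""
    x_bar, y_bar = mean(xs), mean(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (len(xs) - 1 if sample else len(xs))


def pearson_r(xs, ys):
    """Pearson's r: sum of cross deviations over the root of the squared deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sqrt(
        sum((x - x_bar) ** 2 for x in xs) * sum((y - y_bar) ** 2 for y in ys)
    )
    return numerator / denominator


# Hypothetical data: y is exactly twice x, so the linear relationship is perfect.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
sample_cov = covariance(x, y)
population_cov = covariance(x, y, sample=False)
r = pearson_r(x, y)
```

Rescaling `y` would change the covariance but leave r at 1, which is why r rather than raw covariance is the convenient gauge of strength.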
  • 10. Distribution [Figure; credit: Harold Toomey, WyzAnt Tutor]
  • 11. Distribution: discrete (probability mass function) vs. continuous (probability density function)
  • 12. Hypothesis Testing: hypotheses
      - Alternative (H1/Ha): e.g., a man's salary is higher than a woman's salary for the same job position
      - Null (H0): e.g., a man's salary is equal to a woman's salary for the same job position
      - Types: non-directional, directional, statistical
  • 13. Hypothesis Testing: types of hypothesis test
      - Parametric tests
          - Regression: simple linear regression, multiple linear regression, logistic regression
          - Comparison: t-test, ANOVA, MANOVA
          - Correlation: Pearson's r
      - Non-parametric tests: Spearman's r, chi-square test, ANOSIM, Wilcoxon Sign test
  • 14. Hypothesis Testing: steps, with an example
      1. State H0 and H1. H0: men are, on average, not paid a higher salary than women. Ha: men are, on average, paid a higher salary than women.
      2. Collect testing samples. Equal proportions of men and women across a variety of industries within one country.
      3. Select and execute a statistical test. One-tailed t-test.
      4. Infer the results (reject or fail to reject H0). An average difference of 20k with a p-value of 0.002, which is consistent with H1.
  • 15. Basic Concepts: terminology
      - Significance level ($\alpha$), the complement of the confidence level: a threshold for deciding whether a test statistic is statistically significant. Statistical significance means it is highly likely that a relationship between variables is not caused by chance.
          - $\alpha$ is the area in the tail(s) of the H0 distribution
          - $\alpha$ = 1 − (confidence level / 100) → common choices for $\alpha$: 0.01, 0.05, 0.1
      - P-value (probability value): measures the plausibility of the null hypothesis and so whether H0 should be rejected. It is P(sample statistic | H0 true), with 0 ≤ p-value ≤ 1.
          - P-value ≥ $\alpha$: results are not statistically significant; fail to reject H0 (the null must fly!)
          - P-value < $\alpha$: results are statistically significant; reject H0 (the null must go!)
      - Error types (image source: https://www.abtasty.com)

                          H0 is true            H0 is false
          Rejected        Type I error (α)      Correct
          Not rejected    Correct               Type II error (β)
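The decision rule on this slide is small enough to write down directly. A minimal sketch, with hypothetical function names; the p-value 0.002 is the one quoted in the salary example on the previous slide.

```python
def alpha_from_confidence(confidence_percent: float) -> float:
    """alpha = 1 - (confidence level / 100)."""
    return 1 - confidence_percent / 100


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rule: reject H0 only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


alpha = alpha_from_confidence(95)  # a 95% confidence level gives alpha of about 0.05
decision = decide(0.002, alpha)    # p-value from the salary example
```

Note that a p-value exactly equal to α falls on the "fail to reject" side, matching the "P-value ≥ α" rule.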
  • 16. Z-test
      - Used to test whether two groups of data differ from each other when the population standard deviation is known.
      - Normal distribution formula: $f(x) = \dfrac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
      - To calculate the percentile of a data point, standardize the normal distribution to a standard normal distribution.
      - The standard normal distribution has $\mu = 0$ and $\sigma = 1$.
      - How to determine x's percentile/probability, or how far from typical a result is:
          1. Standardize the values and calculate the z-score from the population $\mu$ and population $\sigma$:
              - for a single raw datum x: $z = \dfrac{x - \mu}{\sigma}$
              - for n independent and identically distributed samples: $Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
              - for a proportion: $Z = \dfrac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$ ($\hat{p}$: observed sample proportion, $p$: hypothesized population proportion, n: sample size)
          2. Use a z-table to map the z-score to the area under the normal curve and obtain the p-value.
          3. Compare the p-value with $\alpha$: if p-value ≥ $\alpha$, fail to reject H0; otherwise reject H0.
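The three steps can be sketched with `statistics.NormalDist`, which plays the role of the z-table. The function name, the `tail` parameter, and the sample figures (mean 105 against a hypothesized 100, σ = 15, n = 36) are assumptions made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean, mu, sigma, n, tail="two"):
    """Return (z, p_value) for a sample mean against a hypothesized population mean."""
    z = (sample_mean - mu) / (sigma / sqrt(n))  # step 1: standardize
    cdf = NormalDist().cdf                      # step 2: standard normal CDF as the z-table
    if tail == "two":
        p_value = 2 * (1 - cdf(abs(z)))
    elif tail == "greater":
        p_value = 1 - cdf(z)
    else:  # "less"
        p_value = cdf(z)
    return z, p_value


# Hypothetical numbers, chosen so the arithmetic is easy to follow.
z, p_value = z_test_mean(sample_mean=105, mu=100, sigma=15, n=36)
reject_h0 = p_value < 0.05                      # step 3: compare with alpha
```

Here z = 5 / (15 / 6) = 2.0, and the two-tailed p-value is roughly 0.046, so H0 would be rejected at α = 0.05 but not at α = 0.01.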
  • 18. Student t-test
      - Used to test whether two groups of data differ from each other when the population standard deviation is not known.
      - Assumptions:
          - Normal distribution
          - Similar variance for each group/sample
          - Same number of data points in each group/sample (20-30); with more than this, use a z-test
      - H0: there is no difference between groups
      - H1: there is a difference between groups
      - Types of t-test:
          - One-sample t-test: tests whether a population mean is equal to some value $\mu$. $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$, df = n − 1 ($\bar{x}$: sample mean, $\mu$: population mean, s: sample standard deviation, n: sample size)
          - Dependent / paired-samples t-test: tests whether two population means are equal by sampling the same population twice. $t = \dfrac{\sum d}{\sqrt{\dfrac{n \sum d^2 - (\sum d)^2}{n - 1}}}$, df = n − 1 (d: difference within each pair)
          - Independent two-sample / unpaired-samples t-test: tests whether two population means are equal, using two independent samples of different size with unequal variance. $t = \dfrac{\text{signal}}{\text{noise}} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$, df = n1 + n2 − 2
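The three t-statistics can be computed straight from the formulas above. This sketch returns the t-value and the degrees of freedom as given on the slide; looking up the critical value still needs a t-table. Function names and the short sample lists are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def one_sample_t(sample, mu):
    """t = (x_bar - mu) / (s / sqrt(n)), df = n - 1."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / sqrt(n)), n - 1


def paired_t(first, second):
    """t = sum(d) / sqrt((n * sum(d^2) - sum(d)^2) / (n - 1)), df = n - 1."""
    d = [b - a for a, b in zip(first, second)]  # difference within each pair
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(x * x for x in d)
    return sum_d / sqrt((n * sum_d2 - sum_d ** 2) / (n - 1)), n - 1


def independent_t(sample1, sample2):
    """t = (x1_bar - x2_bar) / sqrt(s1^2/n1 + s2^2/n2), df = n1 + n2 - 2 as on the slide."""
    n1, n2 = len(sample1), len(sample2)
    noise = sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    return (mean(sample1) - mean(sample2)) / noise, n1 + n2 - 2


# Hypothetical measurements, small enough to verify by hand.
t_one, df_one = one_sample_t([4, 3, 7, 2, 3, 6], mu=3)
t_paired, df_paired = paired_t([10, 12, 9, 11], [12, 14, 10, 13])
t_ind, df_ind = independent_t([5, 7, 9], [1, 2, 3])
```

In the paired example the differences are [2, 2, 1, 2], so Σd = 7 and Σd² = 13, which gives t = 7 / √((52 − 49) / 3) = 7.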
  • 19. Student t-test: decision using the critical value (figure credit: stanford.edu)
      1. Calculate the t-value and the degrees of freedom (df).
      2. Decide on a one- or two-tailed test and the level of confidence.
      3. Look up the critical value in the t-table and decide whether to reject or fail to reject H0:
          - t-value < critical value → do not reject H0
          - t-value > critical value → reject H0
  • 20. t-test vs. z-test
      - t-test and z-test are used to determine and compare the significance of a set of data.
      - Decision flow (starting from whether $\sigma$ is known):
          - $\sigma$ known, sample size < 30: population not highly skewed → t-test; highly skewed → sign test
          - $\sigma$ known, sample size ≥ 30: population not highly skewed → z-test; highly skewed → alternative methods
          - $\sigma$ not known: population not highly skewed → t-test; highly skewed → alternative methods
  • 21. Analysis of Variance (ANOVA)
      - ANOVA determines the effects of several categorical independent variables on one numerical dependent variable.
      - One-way: 1 independent categorical variable on a single dependent variable
      - Two-way: 2 independent categorical variables on a single dependent variable
      - N-way: multiple independent categorical variables
  • 22. Analysis of Variance (ANOVA): steps
      1. Calculate the variance between groups and within groups.
      2. Calculate the degrees of freedom.
      3. Compute the F-value: $F = \dfrac{\text{variance between groups}}{\text{variance within groups}} = \dfrac{SS_{BG}/df_1}{SS_{WG}/df_2}$, with $df_1 = m - 1$ and $df_2 = (n - 1)\,m$ (n: number of samples in each group, m: number of groups)
      4. Find the critical value (F-critical) in an F-distribution table using df1, df2, and a selected alpha: http://www.socr.ucla.edu/Applets.dir/F_Table.html
      5. Compare the F-value with F-critical: if F-value < F-critical, fail to reject H0; otherwise reject H0.
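Steps 1 to 3 can be sketched for a one-way ANOVA as below. The code uses the usual degrees of freedom (number of groups minus one between groups; total observations minus number of groups within groups, which equals m(n − 1) when every group has n observations). The function name and the three small groups are hypothetical; the critical value still comes from an F table.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)

    # Sum of squares between groups: group size times squared distance of the group mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Sum of squares within groups: squared distances from each group's own mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical data: three groups of three observations each.
f_value, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

For these groups the between-group sum of squares is 26 and the within-group sum of squares is 6, so F = (26 / 2) / (6 / 6) = 13 with df1 = 2 and df2 = 6.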
  • 24. Hypothesis Testing [Figure; credit: towardsdatascience.com]
  • 26. Python Library: type of test and SciPy code
      - Check whether the data follow a Gaussian distribution: from scipy.stats import shapiro (or normaltest); stat, p = shapiro(data)  # p > 0.05: normality is not rejected
      - Test for a linear relationship between two samples: from scipy.stats import pearsonr; stat, p = pearsonr(data1, data2)  # p > 0.05: no significant linear correlation
      - Test for a monotonic relationship between two samples: from scipy.stats import spearmanr (or kendalltau); stat, p = spearmanr(data1, data2)  # p > 0.05: no significant monotonic correlation
      - Test for a relationship between two categorical variables: from scipy.stats import chi2_contingency; stat, p, dof, expected = chi2_contingency(table)  # p > 0.05: more likely independent
      - Convert between z-score and percentile: from scipy import stats; stats.norm.cdf(z) gives the percentile for a z-score, stats.norm.ppf(p) gives the z-score for a percentile
      - Test whether the means of two independent, normally distributed samples differ significantly (Student t-test): from scipy.stats import ttest_ind; stat, p = ttest_ind(data1, data2)  # p > 0.05: no significant difference in means
      - Test whether the means of two paired, normally distributed samples differ significantly (Student t-test): from scipy.stats import ttest_rel; stat, p = ttest_rel(data1, data2)  # p > 0.05: no significant difference in means
      - Test whether the means of two or more independent, normally distributed samples differ significantly (ANOVA): from scipy.stats import f_oneway; stat, p = f_oneway(data1, data2, data3)  # p > 0.05: no significant difference in means
      - Test whether the distributions of two independent samples are equal (Mann-Whitney U test): from scipy.stats import mannwhitneyu; stat, p = mannwhitneyu(data1, data2)  # p > 0.05: no significant difference between the distributions