... and are you sure?
Multiple statistical comparisons problem
Jiří Haviger
jiri.haviger@uhk.cz
May 12, 2018
Jiří Haviger (jiri.haviger@uhk.cz) ... and are you sure? May 12, 2018 1 / 24
Introduction
Jelly beans cause acne ...
Introduction
... and are you sure?
Basic idea of inferential statistics Inference, confidence intervals and pvalue
Inference
Demonstration of sample mean distributions, shiny.rit.albany.edu
Demonstration of sample mean distributions, rpsychologist.com
Basic idea of inferential statistics Inference, confidence intervals and pvalue
Confidence intervals
Q: How do we estimate a population characteristic from a known sample? As a point? As an interval?
Probability theory:
gives us the probability density function (PDF) of a sample statistic across different samples
(e.g. Student's t distribution for the sample mean m)
we have: a sample with statistical characteristics (n, x̄, sd, ...)
we have: α, the probability of error we are willing to accept (usually α = 0.05)
to do: from the sample information n, m, sd and α, transform the sample characteristics into a variable with a known distribution (e.g. $t = \frac{\bar{x} - \mu}{s} \cdot \sqrt{n}$)
to do: based on the PDF and t, determine the confidence interval for the characteristic (e.g. CI(µ))
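The recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up, the function name `mean_confidence_interval` is hypothetical, and the normal quantile from the standard library stands in for the Student's t quantile, which is reasonable only for larger samples.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for the population mean.

    Assumption: the normal quantile replaces the Student's t quantile.
    """
    n = len(sample)
    m = mean(sample)
    sd = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    half_width = z * sd / sqrt(n)
    return m - half_width, m + half_width


sample = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.4, 5.0, 5.1]  # made-up data
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

With a t quantile in place of `z`, the interval would be somewhat wider for a sample this small.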
Hypothesis testing pvalue
pvalue visualisation
Hypothesis testing Hypothesis testing process
Hypothesis testing process
Q: Does our sample come from a population in which the null hypothesis holds?
we have: an idea about the population (from theory, intuition, government, ...)
we have: a sample with statistical characteristics (n, x̄, sd, ...)
we have: α, the probability of error we are willing to accept (usually α = 0.05)
to do: formulate the null and the alternative hypothesis
to do: determine the probability of obtaining a sample at least as extreme as ours if the null hypothesis holds → p-value or sig.
to do: compare the p-value from the sample with the α level
p-value < α → reject the null hypothesis
p-value ≥ α → retain the null hypothesis.
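As a rough illustration of the decision rule, the sketch below computes a two-sided p-value for a one-sample z-test. The test choice, the function name `z_test_p_value` and all numbers are assumptions made for this example; the slides do not prescribe a particular test.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for H0: population mean == mu0, with known sigma."""
    z = (sample_mean - mu0) / sigma * sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05
p = z_test_p_value(sample_mean=5.3, mu0=5.0, sigma=1.0, n=50)  # made-up numbers
decision = "reject H0" if p < alpha else "retain H0"
print(f"p-value = {p:.4f} -> {decision}")
```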
Hypothesis testing Two possible errors
Two possible errors
Q: Which mistakes can I make in null hypothesis testing?
null hypothesis rejected correctly (True Positive, TP)
null hypothesis rejected incorrectly (False Positive, FP, type I error)
null hypothesis retained correctly (True Negative, TN)
null hypothesis retained incorrectly (False Negative, FN, type II error)
Terminology: H0 is rejected ∼ test is positive ∼ discovery
                      test result about H0 rejection
                      positive (discovery)    negative
reality   H0 false    TP                      FN
          H0 true     FP                      TN
Online demonstration of the two types of error
Hypothesis testing Two possible errors
Two errors
Hypothesis testing Power of analysis, sample size, effect size
Power of test
                      test result about H0
                      positive (discovery)    negative
reality   H0 false    TP (power, 1 − β)       FN (β)
          H0 true     FP (α)                  TN (1 − α)
At the "basic level of statistics" you set α as the probability of a false positive result (e.g. a false positive diagnosis of cancer).
At the "advanced level of statistics" you compute the minimal required sample size from the given α, β and effect size.
Four numbers are related: α, β, effect size and sample size.
If the effect size and the sample size are fixed, then decreasing α implies increasing β.
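The trade-off between α and β can be made concrete with a small sketch. It assumes a one-sided one-sample z-test with a standardized effect size, which is a simplification chosen for this example; the helper name `type_ii_error` and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, effect_size, n):
    """beta for a one-sided one-sample z-test (illustrative setting).

    effect_size is the standardized shift (mu1 - mu0) / sigma under HA.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection threshold under H0
    # Under HA the test statistic is distributed as N(effect_size * sqrt(n), 1).
    return std_normal.cdf(z_crit - effect_size * sqrt(n))


for alpha in (0.05, 0.01, 0.001):
    beta = type_ii_error(alpha, effect_size=0.5, n=30)
    print(f"alpha={alpha:<6} beta={beta:.3f} power={1 - beta:.3f}")
```

With effect size and sample size held fixed, each smaller α in the loop yields a larger β, which is the relation stated above.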
Hypothesis testing Power of analysis, sample size, effect size
Software for power analysis
(G*Power, packages for R or Python, ...)
Multiple comparison problem Introduction
More tests
Q: What happens to the probability of false positives if we use more than one test?
for one test: the probability of a false positive result is
$P(FP) = \alpha$
for two independent tests: the probability of at least one false positive result is
$P(FP_1 \text{ or } FP_2) = P(FP_1) + P(FP_2) - P(FP_1 \text{ and } FP_2) = 1 - P(\neg FP_1 \text{ and } \neg FP_2) = 1 - (1 - \alpha)(1 - \alpha) = 1 - (1 - \alpha)^2$
for m independent tests: the probability of at least one false positive result is
$P(FP_1 \text{ or } \dots \text{ or } FP_m) = 1 - (1 - \alpha)^m$
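To get a feeling for how fast this probability grows, the formula can be evaluated for a few values of m. The function name `family_wise_error_rate` is chosen for this sketch only, and the computation assumes independent tests in which every null hypothesis is true.

```python
def family_wise_error_rate(alpha, m):
    """Probability of at least one false positive in m independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m


for m in (1, 2, 5, 20, 100):
    print(f"m={m:>3}  P(at least one FP) = {family_wise_error_rate(0.05, m):.3f}")
```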
Multiple comparison problem Family-wise error rate correction
More tests
Q: What is the relationship between the number of tests m and $P(FP_1 \text{ or } \dots \text{ or } FP_m) = 1 - (1 - \alpha)^m$?
Multiple comparison problem Family-wise error rate correction
Basic alpha correction
Q: How should we change α → α_corr so that P(FP_1 or ... or FP_m) equals α?
We require
$P(FP_1 \text{ or } \dots \text{ or } FP_m) = \alpha$
$1 - (1 - \alpha_{corr})^m = \alpha$
$\alpha_{corr} = 1 - (1 - \alpha)^{1/m}$
α_corr is called the Šidák correction, named after the Czech statistician Zbyněk Šidák (see wiki); we will denote it α_sid.
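A short numerical check of the derivation: compute the Šidák-corrected level and confirm that plugging it back into the family-wise formula returns the original α. The function name `sidak_alpha` and the values α = 0.05, m = 20 are assumptions for this sketch.

```python
def sidak_alpha(alpha, m):
    """Per-test significance level that keeps the family-wise error at alpha
    for m independent tests."""
    return 1 - (1 - alpha) ** (1 / m)


alpha, m = 0.05, 20
a_sid = sidak_alpha(alpha, m)
fwer = 1 - (1 - a_sid) ** m  # family-wise error rate at the corrected level
print(f"alpha_sid = {a_sid:.6f}, resulting family-wise error = {fwer:.6f}")
```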
Multiple comparison problem Family-wise error rate correction
Bonferroni correction
Q: What about the Bonferroni correction $\alpha_{bonf} = \frac{\alpha}{m}$?
It is a linear approximation of the Šidák correction
$\alpha_{sid} = 1 - (1 - \alpha)^{1/m}$
Laurent series at m = ∞: $\alpha_{sid} = \frac{-\log(1 - \alpha)}{m} + O\left(\left(\frac{1}{m}\right)^2\right)$
Taylor series at α = 0: $\frac{-\log(1 - \alpha)}{m} = \frac{\alpha}{m} + O(\alpha^2)$
Practically there is no difference in using
$\alpha_{sid} \approx \frac{\alpha}{m} = \alpha_{bonf}$
The α_sid and α_bonf corrections are both based on the number of all tests.
The Bonferroni correction is named after the Italian mathematician Carlo Emilio Bonferroni.
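The claim that the two corrections barely differ can be checked numerically. The sketch below, with function names chosen for this example, prints α_sid, the intermediate term −log(1−α)/m and α_bonf side by side for α = 0.05.

```python
from math import log


def sidak_alpha(alpha, m):
    return 1 - (1 - alpha) ** (1 / m)


def bonferroni_alpha(alpha, m):
    return alpha / m


alpha = 0.05
for m in (2, 10, 100, 1000):
    a_sid = sidak_alpha(alpha, m)
    a_log = -log(1 - alpha) / m  # leading term of the series in 1/m
    a_bonf = bonferroni_alpha(alpha, m)
    rel_diff = (a_sid - a_bonf) / a_sid
    print(f"m={m:>4}  sidak={a_sid:.3e}  log-term={a_log:.3e}  "
          f"bonferroni={a_bonf:.3e}  relative difference={rel_diff:.2%}")
```

Bonferroni is always slightly stricter than Šidák; at α = 0.05 the relative gap stays below about 3 % in this range of m.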
Multiple comparison problem Two types of errors again...
Balance between FP and FN
Q: And what about β?
Online demonstration of the two types of error
decreasing α → increasing β
increasing β → higher probability of FN → the test goes "blind"
how to balance FP against FN depends on the problem being solved
sometimes it is better to reduce FP
e.g. in justice: no one should be imprisoned wrongly
sometimes it is better to reduce FN
e.g. in brain disorders: detecting some disorders correctly and some wrongly is better than not detecting disorders at all
Multiple comparison problem Two types of errors again...
Balance between FP and FN
Q: What if we have thousands of tests?
Šidák and Bonferroni control false positives among all results:
Family-Wise Error Rate (FWER), FWER = P(FP ≥ 1), the probability of at least one false positive among all m tests
FWER corrections are strict and tend to make the test blind
another point of view is necessary, so what about ...
... controlling the false positive rate only among the discoveries:
False Discovery Rate (FDR), the expected value of FP/(TP+FP)
Multiple comparison problem Controlling the False Discovery Rate
Benjamini-Hochberg algorithm
Q: How do we control the FDR at a predefined level α in m tests?
Benjamini-Hochberg algorithm for independent tests
1 run all tests and determine all p-values
2 sort the p-values from the smallest one: P[i]
3 compute the linear series $C[i] = \alpha \cdot \frac{i}{m}$
4 set k as the largest i for which P[i] ≤ C[i]
5 $\alpha_{bh} = \alpha \cdot \frac{k}{m}$; reject H0 for every test with p-value ≤ α_bh
α_bh is based on the number of all tests and on the concrete p-value series.
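The steps can be sketched as a small function. It follows the standard step-up form of the Benjamini-Hochberg procedure (k is the largest rank whose p-value lies on or below the line α·i/m); the function name `benjamini_hochberg` and the example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (alpha_bh, rejected) for the Benjamini-Hochberg step-up procedure.

    rejected[i] is True when the i-th input hypothesis is rejected.
    """
    m = len(p_values)
    sorted_p = sorted(p_values)
    k = 0  # largest rank whose p-value lies on or below the line alpha * i / m
    for i, p in enumerate(sorted_p, start=1):
        if p <= alpha * i / m:
            k = i
    alpha_bh = alpha * k / m
    rejected = [k > 0 and p <= alpha_bh for p in p_values]
    return alpha_bh, rejected


# Made-up p-values from ten hypothetical tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
alpha_bh, rejected = benjamini_hochberg(p_values, alpha=0.05)
print(f"alpha_bh = {alpha_bh:.4f}, discoveries = {sum(rejected)}")
```

Because k is the largest qualifying rank, a p-value that sits above its own threshold can still be rejected when a later one falls below the line.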
Multiple comparison problem Controlling the False Discovery Rate
Benjamini-Hochberg visualization
Multiple comparison problem Controlling the False Discovery Rate
p-value distribution
Q: Why does α_BH use the number of all tests if it controls only the FDR?
we don't know which p-values come from discoveries and which do not, but ...
we can construct the p-value distribution
from the definition of p-values we know:
all p-values under H0 are uniformly distributed between 0 and 1
all p-values under HA have a decreasing distribution, from a peak (close to 0) down to zero (close to 1)
all p-values together follow a mixture of the two distributions
Multiple comparison problem Controlling the False Discovery Rate
p-value distribution
Multiple comparison problem Controlling the False Discovery Rate
p-value distribution and q-values
Q: So is it possible to use the p-value distribution to control the FDR?
Determining q-values from the p-value distribution (Storey)
1 sort the p-values from the smallest one: P[i]
2 create a density plot of P[i] on (0,1) with step 0.05 (or smaller)
3 determine π0 from the right part of the density: the level separating H0 from HA
4 compute the q-values Q[k] as the false discovery rate
5 select the largest k such that Q[k] ≤ α
6 α_st is given by this k (reject H0 for the k smallest p-values)
7 α_st is based on the distribution of the p-values
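A heavily simplified sketch of the idea follows. It assumes a single fixed cut-off `lam = 0.5` for estimating π0 (p-values above it are treated as coming almost only from H0), rather than the density plot or the smoothing used by the qvalue software; the function name `storey_q_values` and the p-values are hypothetical.

```python
def storey_q_values(p_values, lam=0.5):
    """Simplified q-values: pi0 is estimated from the share of p-values above lam.

    Returns (pi0, q) where q[i] belongs to p_values[i].
    """
    m = len(p_values)
    # p-values above lam are assumed to come almost only from H0 (uniform),
    # so their count estimates pi0 * m * (1 - lam).
    pi0 = min(1.0, sum(p > lam for p in p_values) / (m * (1 - lam)))
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return pi0, q


p_values = [0.001, 0.004, 0.012, 0.03, 0.2, 0.45, 0.55, 0.7, 0.85, 0.95]  # made up
pi0, q = storey_q_values(p_values)
discoveries = sum(qv <= 0.05 for qv in q)
print(f"pi0 = {pi0:.2f}, discoveries at FDR 0.05: {discoveries}")
```

When π0 is estimated below 1, the q-values are smaller than the Benjamini-Hochberg adjusted values, so this approach can yield more discoveries at the same FDR level.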
Multiple comparison problem Controlling the False Discovery Rate
Computational Psycholinguistic Analysis of Czech Text
Two examples of pvalue distributions from our research
Finish Questions?
Web sources, contact
https://xkcd.com/882/
https://shiny.rit.albany.edu/stat/confidence/
http://rpsychologist.com/d3/CI/
http://varianceexplained.org/statistics/interpreting-pvalue-histogram/
http://qvalue.princeton.edu/
Jiří Haviger
ResearchGate, ORCID, LinkedIn ...
e: jiri.haviger@uhk.cz

Multiple comparison problem

  • 1. ... and are you sure? Multiple statistical comparisons problem Jiˇr´ı Haviger jiri.haviger@uhk.cz May 12, 2018 Jiˇr´ı Haviger (jiri.haviger@uhk.cz) ... and are you sure? May 12, 2018 1 / 24
  • 2. Introduction Jelly beans cause acne ... Jiˇr´ı Haviger (jiri.haviger@uhk.cz) ... and are you sure? May 12, 2018 2 / 24
  • 3. Introduction ... and are you sure? Jiˇr´ı Haviger (jiri.haviger@uhk.cz) ... and are you sure? May 12, 2018 3 / 24
  • 4. Basic idea of inferential statistics Inference, confidence intervals and pvalue Inference Demostration of sample means distributions, shiny.rit.albany.edu Demostration of sample means distributions, rpsychologist.com Jiˇr´ı Haviger (jiri.haviger@uhk.cz) ... and are you sure? May 12, 2018 4 / 24
  • 5. Basic idea of inferential statistics Inference, confidence intervals and pvalue Confidence intervals Q: how to estimate the popultion characteristic from knowing sample? Point? Interval? Probabilistic theory: is knowing probability density function PDF of sample measures (eg. Student s T distribution of sample means m) for different samples we have: sample with statistical characteristic (n, x, sd, ...) we have: α as a probability in which we accept mistake (usually α = 0.05) to do:: from sample information N, m, sd and α... transform sample characteristics into variable with knowing distribution (e.g. t = x−µ s · √ n) to do: based on PDF and t determine confidence interval for characteristic (eg. CI(µ)) Jiˇr´ı Haviger (jiri.haviger@uhk.cz) ... and are you sure? May 12, 2018 5 / 24
  • 6. Hypothesis testing pvalue pvalue visualisation Jiˇr´ı Haviger (jiri.haviger@uhk.cz) ... and are you sure? May 12, 2018 6 / 24
  • 7. Hypothesis testing Hypothesis testing process Hypothesis testing procecss Q: Comes our sample comes population with null hypothesis? we have: idea about population (from theory, intuition, goverment, ... ) we have: sample with statistical characteristic (n, x, sd, ...) we have: α as a probability in which we accept mistake (usually α = 0.05) to do: formulate null and alternative hypothesis to do: determine probability, that our sample is from population with null hypothesis → p-value or sig. to do: compare pvalue from sample and α level pvalue < α → reject null hypothesis pvalue ≥ α → retain null hypothesis. Jiˇr´ı Haviger (jiri.haviger@uhk.cz) ... and are you sure? May 12, 2018 7 / 24
  • 8. Hypothesis testing Two possible errors Two possible errors Q: Which mistakes in null hypothesis testing can I do? null hypothesis rejected correctly (True Positive, TP) null hypothesis rejected noncorrectly (False Positive, FP, error I) null hypothesis retain correctly (True Negativ, TN) null hypothesis retain noncorrectly (False Negative, FN, error II) Terminology: H0 is reject ∼ test is positive ∼ discovery test result about H0 rejection positive (discovery) negative reality H0 false TP FN true FP TN Online demostration of two type of error Jiˇr´ı Haviger (jiri.haviger@uhk.cz) ... and are you sure? May 12, 2018 8 / 24
  • 9. Hypothesis testing Two possible errors Two errors Jiˇr´ı Haviger (jiri.haviger@uhk.cz) ... and are you sure? May 12, 2018 9 / 24
  • 10. Hypothesis testing Power of analysis, sample size, effect size Power of test test result about H0 positive (discovery) negative reality H0 false TP FN (β) true FP (α) TN (power, 1 − β) In ”basic level of statistic” you determine α as probability of false positives results (eg. false positives diagnoses of cancer) in ”advanced level of statistic” you to compute minimal reqiured sample size from given α β and effect size. There are four numbers in relation: α, β, effect size and sample size if is fixed effect size and sample size, then decreasing α implies increasing β Jiˇr´ı Haviger (jiri.haviger@uhk.cz) ... and are you sure? May 12, 2018 10 / 24
  • 11. Hypothesis testing: power of analysis, sample size, effect size. Software for power analysis (G*Power, packages for R or Python, ...).
  • 12. Multiple comparisons problem: introduction. More tests. Q: What happens to the probability of false positives if we use more than one test? For one test (with H0 true), the probability of a false positive result is P(FP) = α. For two independent tests, the probability of at least one false positive result is P(FP1 or FP2) = P(FP1) + P(FP2) − P(FP1 and FP2) = 1 − P(¬FP1 and ¬FP2) = 1 − (1 − α) · (1 − α) = 1 − (1 − α)^2. For m independent tests, the probability of at least one false positive result is P(FP1 or ... or FPm) = 1 − (1 − α)^m.
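The formula 1 − (1 − α)^m can be checked empirically. The simulation below is a rough sketch that assumes all m null hypotheses are true and the tests are independent, so each p-value is drawn uniformly from (0, 1); the function name, the number of runs and the seed are arbitrary choices.

```python
import random


def simulate_fwer(m: int, alpha: float = 0.05, runs: int = 20000, seed: int = 1) -> float:
    """Estimate P(at least one false positive) among m independent true-null tests."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        # Under H0 a p-value is uniform on (0, 1); a false positive is p < alpha.
        if any(rng.random() < alpha for _ in range(m)):
            hits += 1
    return hits / runs


for m in (1, 2, 10):
    exact = 1 - (1 - 0.05) ** m
    print(f"m = {m:2d}  simulated = {simulate_fwer(m):.3f}  formula = {exact:.3f}")
```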
  • 13. Multiple comparisons problem: family-wise error rate correction. More tests. Q: What is the relationship between the number of tests m and P(FP1 or ... or FPm) = 1 − (1 − α)^m?
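Since the plot from this slide is not part of the transcript, the relationship can be tabulated instead. This is a minimal sketch assuming independent tests and α = 0.05; the chosen values of m are arbitrary.

```python
def fwer(m: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive in m independent tests."""
    return 1 - (1 - alpha) ** m


for m in (1, 2, 5, 10, 20, 50, 100):
    print(f"m = {m:3d}  P(at least one FP) = {fwer(m):.3f}")
```

With 20 tests at α = 0.05 the probability of at least one false positive is already about 0.64.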
  • 14. Multiple comparisons problem: family-wise error rate correction. Basic alpha correction. Q: How to change α → α_corr so that P(FP1 or ... or FPm) equals α? Require P(FP1 or ... or FPm) = α, i.e. 1 − (1 − α_corr)^m = α, hence α_corr = 1 − (1 − α)^(1/m). This α_corr is called the Šidák correction, named after the Czech statistician Zbyněk Šidák (see wiki); we denote it α_sid.
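A short numerical check of the Šidák correction: plugging the corrected per-test level back into 1 − (1 − α_corr)^m should return the original α. The function names are illustrative, and independence of the tests is assumed.

```python
def sidak_alpha(alpha: float, m: int) -> float:
    """Per-test level that keeps the family-wise error rate at alpha."""
    return 1 - (1 - alpha) ** (1 / m)


def family_wise_error(alpha_per_test: float, m: int) -> float:
    """Probability of at least one false positive in m independent tests."""
    return 1 - (1 - alpha_per_test) ** m


m = 20
a_sid = sidak_alpha(0.05, m)
print(f"alpha_sid = {a_sid:.6f}")
print(f"resulting FWER = {family_wise_error(a_sid, m):.6f}")
```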
  • 15. Multiple comparisons problem: family-wise error rate correction. Bonferroni correction. Q: What about the Bonferroni correction α_bonf = α/m? It is a linear approximation of the Šidák correction α_sid = 1 − (1 − α)^(1/m). Laurent series at m = ∞: α_sid = −log(1 − α)/m + O((1/m)^2). Taylor series at α = 0: −log(1 − α)/m = α/m + O(α^2). In practice there is almost no difference: α_sid ≈ α/m = α_bonf. Both α_sid and α_bonf are based on the number of all tests. The Bonferroni correction is named after the Italian mathematician Carlo Emilio Bonferroni.
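The two approximation steps can be compared numerically. The sketch below evaluates the Bonferroni level α/m, the first-order term −log(1 − α)/m and the exact Šidák level for a few values of m; the function name `compare` is hypothetical.

```python
from math import log


def compare(alpha: float, m: int) -> tuple[float, float, float]:
    """Return (Bonferroni, first-order series term, Sidak) per-test levels."""
    bonferroni = alpha / m
    first_order = -log(1 - alpha) / m
    sidak = 1 - (1 - alpha) ** (1 / m)
    return bonferroni, first_order, sidak


for m in (2, 20, 1000):
    b, f, s = compare(0.05, m)
    print(f"m = {m:4d}  bonf = {b:.7f}  sidak = {s:.7f}  -log(1-a)/m = {f:.7f}")
```

For m > 1 the Bonferroni level is slightly smaller than the Šidák level, so Bonferroni is marginally more conservative.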
  • 16. Multiple comparisons problem: two types of error again. Balance between FP and FN. Q: And what about β? Online demonstration of the two types of error. Decreasing α → increasing β; increasing β → increasing probability of FN → the test goes "blind". How to balance FP against FN depends on the problem being solved: sometimes it is better to reduce FP, e.g. in justice, where no one should be imprisoned wrongly; sometimes it is better to reduce FN, e.g. in brain disorders, where detecting some disorders correctly and some wrongly is better than detecting no disorders at all.
  • 17. Multiple comparisons problem: two types of error again. Balance between FP and FN. Q: What if we have thousands of tests? Šidák and Bonferroni control false positives among all results: the Family-Wise Error Rate (FWER), the probability of at least one false positive, FWER = P(FP ≥ 1). FWER corrections are strict and tend to make the test blind, so another point of view is necessary: control false positives only among the discoveries, i.e. the False Discovery Rate (FDR), the expected value of FP/(TP + FP).
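To make the two quantities concrete, the sketch below counts TP, FP, TN and FN for one hypothetical batch of tests and computes the proportion of false discoveries. The truth labels are invented for illustration, since in real data they are unknown. FWER is the probability, over repeated experiments, that FP is at least 1; FDR is the expected value of the proportion computed here.

```python
def confusion_counts(h0_is_false: list[bool], rejected: list[bool]) -> tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN) from ground truth and test decisions."""
    tp = fp = tn = fn = 0
    for real_effect, reject in zip(h0_is_false, rejected):
        if reject and real_effect:
            tp += 1
        elif reject and not real_effect:
            fp += 1
        elif not reject and not real_effect:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def false_discovery_proportion(tp: int, fp: int) -> float:
    """Share of false positives among discoveries (0 when nothing is rejected)."""
    discoveries = tp + fp
    return fp / discoveries if discoveries else 0.0


# Hypothetical batch of 8 tests: 3 real effects, 3 rejections.
truth = [True, True, True, False, False, False, False, False]
decisions = [True, True, False, True, False, False, False, False]
tp, fp, tn, fn = confusion_counts(truth, decisions)
print(tp, fp, tn, fn, false_discovery_proportion(tp, fp))
```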
  • 18. Multiple comparisons problem: controlling the False Discovery Rate. Benjamini-Hochberg algorithm. Q: How to control the FDR at a predefined level α in m tests? Benjamini-Hochberg algorithm for independent tests: (1) run all tests and determine all p-values; (2) sort the p-values from the smallest, P[i]; (3) compute the linear series C[i] = α · i/m; (4) set k as the largest i for which P[i] ≤ C[i]; (5) α_bh = α · k/m, and the k smallest p-values are declared discoveries. α_bh is based on the number of all tests and on the concrete series of p-values.
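A minimal sketch of the Benjamini-Hochberg procedure in its standard step-up form, where k is the largest rank i with P[i] ≤ α · i/m and the k smallest p-values are rejected. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> tuple[list[bool], float]:
    """Return (rejection flags in input order, threshold alpha * k / m)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank  # keep the largest rank that passes its own threshold
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected, alpha * k / m


# Hypothetical p-values from 10 tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags, alpha_bh = benjamini_hochberg(p)
print(sum(flags), "discoveries, alpha_bh =", round(alpha_bh, 4))
```

Because the procedure is step-up, a p-value that misses its own threshold is still rejected when a larger one passes: with p-values 0.02, 0.03, 0.035, 0.04 and α = 0.05 all four are rejected, although 0.02 > 0.0125.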
  • 19. Multiple comparisons problem: controlling the False Discovery Rate. Benjamini-Hochberg visualization.
  • 20. Multiple comparisons problem: controlling the False Discovery Rate. p-value distribution. Q: Why does α_bh use the number of all tests if it controls only the FDR? We don't know which p-values come from discoveries and which do not, but we can construct the p-value distribution. From the definition of p-values we know: all p-values from H0 have a uniform distribution on (0, 1); all p-values from HA have a decreasing distribution, from a peak close to 0 down to zero close to 1; all p-values together have a mixed distribution.
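The shape of the mixed distribution can be reproduced with a small simulation. The sketch assumes one-sided z-tests on a single standard-normal observation, with true effects shifted by 2.5 standard deviations; the shift, the sample counts and the seed are arbitrary assumptions chosen only to make the pattern visible.

```python
import random
from statistics import NormalDist


def simulate_p_values(n_null: int, n_alt: int, shift: float = 2.5, seed: int = 1) -> list[float]:
    """One-sided z-test p-values: n_null from H0, n_alt from HA (mean = shift)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for i in range(n_null + n_alt):
        mean = 0.0 if i < n_null else shift
        z = rng.gauss(mean, 1.0)
        p_values.append(1 - std_normal.cdf(z))
    return p_values


def histogram(p_values: list[float], bins: int = 10) -> list[int]:
    """Counts of p-values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in p_values:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


print("H0 only:", histogram(simulate_p_values(10000, 0)))
print("HA only:", histogram(simulate_p_values(0, 2000)))
print("mixed:  ", histogram(simulate_p_values(10000, 2000)))
```

The H0-only histogram is roughly flat, the HA-only histogram piles up near 0, and the mixed histogram is a flat floor with a peak on the left.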
  • 21. Multiple comparisons problem: controlling the False Discovery Rate. p-value distribution.
  • 22. Multiple comparisons problem: controlling the False Discovery Rate. p-value distribution and q-values. Q: So is it possible to use the p-value distribution to control the FDR? Determining q-values from the p-value distribution (Storey): (1) sort the p-values from the smallest, P[i]; (2) create a density plot of P[i] on (0, 1) with step 0.05 (or smaller); (3) determine π0 from the right part of the density, the level separating H0 from HA; (4) compute the q-values Q[k] as false discovery rates; (5) select the largest k such that Q[k] ≤ α; (6) α_st = P[k], the p-value at rank k; (7) α_st is based on the distribution of p-values.
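A simplified sketch of the q-value idea follows. It is not the full Storey procedure: π0 is estimated from a single fixed cut-off λ = 0.5 (the share of p-values above λ, rescaled) instead of being read from a density plot or smoothed over several λ values, and the function names and example p-values are hypothetical.

```python
def estimate_pi0(p_values: list[float], lam: float = 0.5) -> float:
    """Estimate the share of true nulls from the flat right part of the distribution."""
    m = len(p_values)
    tail = sum(1 for p in p_values if p > lam)
    return min(1.0, tail / (m * (1 - lam)))


def q_values(p_values: list[float], lam: float = 0.5) -> list[float]:
    """q-value per test: estimated FDR when rejecting all tests with p <= this p."""
    m = len(p_values)
    pi0 = estimate_pi0(p_values, lam)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values are monotone in p.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[idx] / rank)
        q[idx] = running_min
    return q


# Hypothetical p-values from 10 tests.
p = [0.001, 0.004, 0.01, 0.02, 0.3, 0.45, 0.55, 0.65, 0.8, 0.95]
q = q_values(p)
print("pi0 =", estimate_pi0(p))
print("discoveries at alpha = 0.05:", sum(1 for value in q if value <= 0.05))
```

When π0 is estimated below 1, the q-value threshold is less strict than Benjamini-Hochberg, which implicitly assumes π0 = 1.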
  • 23. Multiple comparisons problem: controlling the False Discovery Rate. Computational Psycholinguistic Analysis of Czech Text. Two examples of p-value distributions from our research.
  • 24. Finish. Questions? Web sources: https://xkcd.com/882/ ; https://shiny.rit.albany.edu/stat/confidence/ ; http://rpsychologist.com/d3/CI/ ; http://varianceexplained.org/statistics/interpreting-pvalue-histogram/ ; http://qvalue.princeton.edu/ . Contact: Jiří Haviger, ResearchGate, ORCID, LinkedIn ... e: jiri.haviger@uhk.cz