International Life Sciences Workshop 
“Decision-Making in Biomedical Science – Meet Experts” 
September 12 – 16 | 2014 
Potsdam | Germany 
Harmonizing statistical evidences and 
predictions 
Nikita N. Khromov-Borisov 
Pavlov First Saint Petersburg State Medical University 
Saint Petersburg, Russia 
Nikita.KhromovBorisov@gmail.com 
+7 952-204-89-49; +7 921-449-29-05 
http://independent.academia.edu/NikitaKhromovBorisov 
https://www.researchgate.net/profile/Nikita_Khromov-Borisov?ev=hdr_xprf 
1
Slides are freely available to all 
Nikita N. Khromov-Borisov 
Department of Physics, Mathematics and Informatics 
Pavlov First Saint Petersburg State Medical University 
2
The best way to discuss scientific issues is to 
discuss them in a foreign language 
Max Ludwig Henning Delbrück, 
(September 4, 1906 – March 9, 1981) 
Piotr Slonimski 
(November 9, 1922 – April 25, 2009) 
3
Second hand teaching 
• The History of Science has suffered greatly from the use by 
teachers of second-hand material, and the consequent 
obliteration of the circumstances and the intellectual 
atmosphere in which the great discoveries of the past were 
made. 
• A first-hand study is always instructive, and often . . . full of 
surprises. 
• Ronald A. Fisher, 1955 
• Cited by: Ziliak S.T., McCloskey D.N. The Cult of Statistical 
Significance: How the Standard Error Costs Us Jobs, Justice, and 
Lives. The University of Michigan Press, Ann Arbor, 2008, 321 pp. 
• http://stephentziliak.com/ 
4
Crisis of reproducibility of the 
results in biomedicine 
5
The essences of science are 
replication and reproducibility 
• The essence of science is replication: 
• a scientist should always be concerned about what would 
happen if he or another scientist were to repeat his 
experiment. 
• Guttman L. What is not what in statistics. The Statistician, 
1977; 26(2): 81-107. 
• Scientists have elaborated methods of determining the validity of their results.
• They learned to ask the question: are they reproducible?
• Scherr G.H. Irreproducible Science: Editor’s Introduction. In: The Best of the Journal of Irreproducible Results. Workman Publishing, New York, 1983.
• Reproducibility is like the ghost that will always come back 
to haunt you. 
• http://datapede.blogspot.ru/2014/03/part-1z-p-value-surviving-mosquito.html 
6
Loscalzo J. Irreproducible Experimental Results: 
Causes, (Mis)interpretations, and Consequences. 
Circulation, 2012; 125: 1211-1214. 
• In Science what is relevant is reproducible results. 
• If an initial observation is found to be reproducible, 
then it must be true. 
• If an initial observation is found not to be 
reproducible, then it must be false. 
• Many readers of scientific journals—especially of 
higher-impact journals—assume that if a study is of 
sufficient quality to pass the scrutiny of rigorous 
reviewers, it must be true. 
• This assumption is based on the inferred equivalence 
of reproducibility and truth. 
7
• Long ago Fisher . . . recognised that . . . solid 
knowledge came from a demonstrated ability to 
repeat experiments . . . 
• This is unhappy for the investigator who would 
like to settle things once and for all, but 
consistent with the best accounts . . . of the 
scientific method . . . 
• Tukey J.W. The philosophy of multiple 
comparisons. Statistical Science, 1991; 6: 100- 
116. 
8
Tukey J.W. Analyzing data: Sanctification 
or detective work? American Psychologist, 
1969; 24: 83–91. 
• Nothing learned is certain. 
• We learn by taking chances. 
• Every modern learning theorist expects learning to be by trial, 
with some errors. 
• This is as true for science as for the individual. 
• Confirmation comes from repetition. 
• Repetition is the basis for judging variability and significance and
confidence.
• Repetition of results, each significant, is the basis, according to 
Fisher, of scientific truth. 
• Certainty is an illusion. 
• As an illusion, certainty can be wasteful, as well as misleading. 
• Data analysis needs to be both exploratory and confirmatory. 
9
From the history of epidemiological studies: Risk factors for cancer 
[Jenks S., Volkers N. Razors and Refrigerators and Reindeer — Oh My! 
JNCI, 1992; 84(24):1863] 
• Using electric razor: Increase the risk of developing leukemia. 
• Distal forearm fractures in women: Reduction in overall cancer 
incidence, breast cancer incidence, and incidence of tumors. 
• Fluorescent lighting: Melanoma in males but not in females.
• Allergies and cancer: At first an inverse relationship was reported. Later the risks of several
types of cancer were found to be elevated. However, ovarian cancer risk
decreased with increasing numbers of allergies.
• Breeding reindeer: in Swedish Lapps decreased risks for cancers of 
the colon, female breast, male genital tract, kidneys, respiratory 
system, and for lymphomas. However, increased risk for stomach 
cancer. 
10
From the history of epidemiological studies: Risk factors for cancer 
[Jenks S., Volkers N. Razors and Refrigerators and Reindeer — Oh My! JNCI, 
1992; 84(24): 1863] 
• Waiters in Norway: Decreased risk of stomach cancer but excess risks of 
cancers of the liver, rectum, upper respiratory and digestive tracts, and 
lung. Higher mortality rate from lung cancer. 
• Owning a pet bird: Fourfold increase in lung cancer risk among pigeon 
fanciers (more hazardous than living with a smoker). Owners of budgies, 
canaries, finches, or parrots were OK. 
• Height: Lower risks for some cancers in short men, particularly colorectal 
cancer, and lower risks for this cancer and for breast cancer in short 
women. But being tall may confer some advantage for certain cancers 
(esophageal, endometrial and cervical), while tall men have only a 
slightly elevated risk for prostate, kidney and colon cancers. 
• Refrigerators: Seem to protect everyone from stomach cancer.
11
• An extensive list of curious and questionable medical observations about various risk factors is given in:
• Buchanan A.V., Weiss K.M., Fullerton S.M. Dissecting complex disease: the quest for the Philosopher’s Stone? International Journal of Epidemiology, 2006; 35: 562–571.
12
Table of irreproducible results? 
• Hormone replacement therapy and heart 
disease 
• Hormone replacement therapy and cancer 
• Stress and stomach ulcers 
• Annual physical checkups and disease 
prevention 
• Behavioural disorders and their cause 
• Diagnostic mammography and cancer 
prevention 
• Breast self-exam and cancer prevention 
• Echinacea and colds 
• Vitamin C and colds 
• Baby aspirin and heart disease prevention 
• Dietary salt and hypertension 
• Dietary fat and heart disease 
• Dietary calcium and bone strength 
• Obesity and disease 
• Dietary fibre and colon cancer 
• The food pyramid and nutrient RDAs 
• Cholesterol and heart disease 
• Homocysteine and heart disease 
• Inflammation and heart disease 
• Olive oil and breast cancer 
• Fidgeting and obesity 
• Sun and cancer 
• Mercury and autism 
• Obstetric practice and schizophrenia 
• Mothering patterns and schizophrenia 
• Anything else and schizophrenia 
• Red wine (but not white, and not grape juice) 
and heart disease 
• Syphilis and genes 
• Mothering patterns and autism 
• Breast feeding and asthma 
• Bottle feeding and asthma 
• Anything and asthma 
• Power transformers and leukaemia 
• Nuclear power plants and leukaemia 
• Cell phones and brain tumours 
• Vitamin antioxidants and cancer, aging 
• HMOs and reduced health care cost 
• HMOs and healthier Americans 
• Genes and you name it! 
13
‘Blood group mythology’: myths about AB0 
• The human blood group system AB0 can serve as a classic example of associations with various conditions that were never confirmed.
• Several incredible phenomena were reported:
• Persons with A have more severe hangovers;
• Persons with B defecate the most;
• Persons with 0 have healthier teeth;
• Military personnel with 0 are spineless and those with B are more impulsive;
• Persons with B are more prone to crime;
• Strong connection between AB0 and nutrition;
• Persons with A2 have the highest IQ;
• A is significantly more common among members of the higher socio-economic groups.
• None of these associations has been reproduced, and they are virtually forgotten.
14
• Large companies in Japan still use blood types 
when advertising for, or evaluating, job 
applicants. 
• George Garratty 
• Association of Blood Groups and Disease: Do 
Blood Group Antigens and Antibodies Have a 
Biological Role? 
• History and Philosophy of the Life Sciences, 
1996; Vol. 18, No. 3, The First Genetic Marker, p. 
321-344. 
15
• Only the associations between AB0 blood groups and malignant neoplasms, thrombosis, peptic ulcers, bleeding, and bacterial and viral infections are still regarded as statistically “proven”.
• Alas, these associations have no clinical (practical) importance because their odds ratios (OR) are low and do not exceed OR = 1.5.
16
Associations between AB0 blood groups and diseases, 
which are still considered to be statistically “proven” 
Medical condition A > 0 0 > A B/AB > A/0 OR 
Malignancy X 1.2 – 1.3 
Thrombosis X 
Peptic ulcers X 1.2 – 1.4 
Bleeding X 1.5 
E. coli / Salmonella X 
Note that here we meet the extremely important issue of the clinical (or
any other practical) importance (significance) of the observed
associations. Here clinical importance is expressed with one
of the measures of effect size, the odds ratio (OR).
17
Begley C.G., Ellis L.M. Raise standards for preclinical 
cancer research. Nature, 2012; 483: 531-533. 
• Recently Glenn Begley, former vice president of the 
well-known biotech company Amgen, and his colleague 
Lee Ellis published the results of their efforts to replicate findings 
from recent publications in the clinical oncology literature. 
• The data were disturbing. 
• Of 53 papers, only 6 (11%) were reproducible. 
• Begley and Ellis state that the poor reproducibility of results is becoming a systemic problem of modern science.
• In the case of one study, which was cited more than 1900 times within a short period, even the authors themselves were later unable to reproduce their own results.
18
Increasing replication of un-reproducibility in science 
• Gautam Naik: Scientists' Elusive Goal: Reproducing Study Results. The Wall Street Journal, December 2, 2011.
• This is one of medicine’s dirty secrets:
• Most results, including those that appear in top-flight peer-reviewed journals, can’t be reproduced.
19
Macleod M.R., Michie S., Roberts I., Dirnagl U., Chalmers I., Ioannidis J.P.A., 
Al-Shahi Salman R., Chan A.-W., Glasziou P. Biomedical research: increasing 
value, reducing waste. The Lancet, 2014, 383(9912): 101-104 
• Of 1575 reports about cancer prognostic markers 
published in 2005, 1509 (96%) detailed at least one 
significant prognostic variable. 
• However, few identified biomarkers have been 
confirmed by subsequent research and few have 
entered routine clinical practice. 
• This pattern — initially promising findings not leading 
to improvements in health care — has been recorded 
across biomedical research. 
• So why is research that might transform health care 
and reduce health problems not being successfully 
produced? 
20
Ioannidis J.P.A. 
Why most published 
research findings are false. 
PLoS Med., 2005. – Vol. 2. – 
No. 8. – Paper: e124. 
Cited by 2174 
21
Reproducibility Initiative 
http://validation.scienceexchange.com/#/ 
22
• PLOS ONE Launches Reproducibility Initiative 
• http://validation.scienceexchange.com/#/ 
• Reproducibility Initiative receives $1.3M grant to validate 50 landmark 
cancer studies 
• Reproducibility Project: Psychology 
• https://osf.io/ezcuj/wiki/home/ 
• Special Section on Replicability in Psychological Science 
• Perspectives on Psychological Science, 2012; 7(6): 528 –530 
23
• Journal of Negative Results in BioMedicine is 
an open access, peer-reviewed, online 
journal that provides a platform for the 
publication and discussion of unexpected, 
controversial, provocative and/or negative 
results in the context of current tenets. 
• Editor-in-Chief 
• Bjorn R Olsen, Harvard Medical School 
24
Challenges in irreproducible research 
• No research paper can ever be considered to be the 
final word, and the replication and corroboration of 
research results is key to the scientific process. 
• In studying complex entities, especially animals and 
human beings, the complexity of the system and of 
the techniques can all too easily lead to results that 
seem robust in the lab, and valid to editors and 
referees of journals, but which do not stand the test 
of further studies. 
• http://www.nature.com/nature/focus/reproducibility/index.html 
25
Statistics 
“A subject which most statisticians 
find difficult but in which nearly all 
physicians are expert.” 
26
• Statistical flaws are a major cause of irreproducible 
results in all types of biomedical experimentation. 
• These include errors in trial design, data analysis, and 
data interpretation. 
• “If experimentation is the Queen of the sciences, 
surely statistical methods must be regarded as the 
Guardian of the Royal Virtue.” 
• Myron Tribus 
(Letter to Science) 
27
Statistical Babel 
• Unfortunately, statisticians speak different languages, and often
do not hear and/or do not understand each other.
• Two main approaches to statistical inference are being developed:
• Bayesian and
• Frequentist
• Frequentist inference is subdivided into two main branches:
• Fisherian and
• Neyman-Pearsonian
• Users do not always distinguish between them, which leads to serious
confusion.
• Two other approaches also exist: Likelihood and Fiducial
inference.
• http://en.wikipedia.org/wiki/Frequentist_inference 
28
Babel 
29
Fundamental statistics principles 
• Random sampling is the main principle of statistics. 
• Randomness and the Law of Large Numbers ensure 
the sample representativeness. 
• A sample is called representative if it reflects correctly 
the distribution from which the sample is taken. 
• The main objective of statistics is to analyze
random samples in order to draw conclusions about the
distributions from which they are drawn.
• Note that we do not need the term “population” 
which can be misleading. 
30
Statistics with confidence
• Can we trust Statistics?
• For instance, how can we check whether a die is
perfect (fair, ideal, symmetric) or not?
• The answer is provided by the Law of
Large Numbers.
31
Simulation of rolling a die: program SUStats
http://www.jsc.nildram.co.uk/examples/sustats/diescore/DieScoreApplet.html
A die was rolled 100 times in each of four independent simulations.
Please answer three questions:
1. Are the results of the rolling reproducible (i.e., are the histograms similar)?
- Yes
- No
2. What shape of the histogram and of the underlying distribution do we expect
for the results of rolling a fair die?
- Unimodal, bell-shaped
- Triangular
- Uniform (rectangular)
3. Can we state that the die is fair?
- Yes
- No
32
Simulation of rolling a die: program SUStats
http://www.jsc.nildram.co.uk/examples/sustats/diescore/DieScoreApplet.html
A die was rolled 1 000 times in each of four independent simulations.
Please answer two questions:
1. Are the results of the rolling reproducible (are the histograms similar)?
- Yes
- No
2. Can we state that the die is certainly fair?
- Yes
- No
33
Simulation of rolling a die: program SUStats
http://www.jsc.nildram.co.uk/examples/sustats/diescore/DieScoreApplet.html
A die was rolled 10 000 times in each of four independent simulations.
Please answer two questions:
1. Are the results of the rolling reproducible (are the histograms similar)?
- Yes
- No
2. Can we state that the die is certainly fair (the histograms are certainly
rectangular and the entire distribution is uniform)?
- Yes
- No
34
Simulation of rolling a die: program SUStats
http://www.jsc.nildram.co.uk/examples/sustats/diescore/DieScoreApplet.html
Please keep in mind the last number, n = 10 000, which gives reliable
results. Such a sample size is difficult to achieve in biomedicine, but it is really reliable.
35
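The three die-rolling experiments above can be imitated with a few lines of Python. This is only a sketch: it does not reproduce the SUStats applet, the function name `roll_frequencies` and the seed are illustrative assumptions, and Python's pseudo-random generator stands in for a physical die.

```python
import random
from collections import Counter

def roll_frequencies(n_rolls, rng):
    """Relative frequency of each face (1..6) in n_rolls of a simulated fair die."""
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return [counts[face] / n_rolls for face in range(1, 7)]

rng = random.Random(2014)  # arbitrary fixed seed so that the run is repeatable
for n in (100, 1_000, 10_000):
    freqs = roll_frequencies(n, rng)
    # Largest gap between an observed frequency and the fair-die value 1/6
    max_dev = max(abs(f - 1 / 6) for f in freqs)
    print(n, [round(f, 3) for f in freqs], "max deviation from 1/6:", round(max_dev, 4))
```

With n = 100 the six frequencies typically differ visibly from 1/6; with n = 10 000 the histogram is usually close to rectangular, which is the point made on the slides.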
Lyrical digression 
• On reflection, it is the Pauli exclusion principle that provides the variety of forms of matter at all levels, from atoms to living beings, e.g., genetic and phenotypic (biochemical, physiological, morphological) variations.
36
Sample size 
“She thought that a smaller sample 
size makes for more accurate results” 
37
Sample sizes in physics, chemistry, biology and 
medicine 
• Physicists and chemists work with samples of substances that contain about 6∙10²³ (the Avogadro constant)
particles (atoms or molecules) in 1 mole of the pure substance.
• Even 1 nanomole of a given substance contains about 6∙10¹⁴ such
particles.
• These particles may be regarded as practically identical.
• However, we should not forget that even at the atomic level
there are several isotopes of a given chemical element.
• And some of them are radioactive.
• In medicine researchers are limited by the size of the world
population, which is less than 10¹⁰, specifically, about 7.257∙10⁹.
• See real-time: http://www.worldometers.info/world-population/
• And human populations are extremely heterogeneous.
38
Principal contradiction 
• All people are dissimilar, even monozygotic (“identical”)
twins.
• Such twins show differences in copy number variation
(CNV), immunoglobulins and fingerprints.
• Surely this fact is one of the main sources of the low
reproducibility and predictive ability of results in
biomedicine.
• Thus, the genetic and phenotypic uniqueness of each person
comes into contradiction with statistical methodology,
which requires the analysis of large numbers (thousands or at
least hundreds) of identical persons to reach firm
conclusions.
39
What is the Law of Large Numbers?
• If the probability P(A) of an event A is constant in all trials, then the larger n,
the number of trials (experiments, sample size),
• the closer the observed (empirical, experimental) relative frequency, f(A), of
a given outcome (event) A comes to its expected (theoretical) probability
P(A), i.e., f(A) converges in probability to P(A):
$$f(A) \xrightarrow[n \to \infty]{P} P(A)$$
• This means that the frequencies become more and more stable and their 
fluctuations become smaller and smaller. 
• Corollary: 
• Thus, we may not know the probability of an event A, but by repeating the trial
as many times as possible, we can accept its observed frequency f(A) as a reliable
statistical estimate of the unknown probability P(A)unkn.
• Statistics helps us to know the unknown.
• In Probability Theory probabilities are known; in Statistics they are estimated.
40
“Reverse side” of the Law of Large Numbers 
• As the frequency of an event A converges to its probability, the situation
in which the frequency of the event coincides exactly with its
probability,
$$f(A) = P(A),$$
• becomes less probable,
• i.e., the larger the number of trials, the closer the probability
of such an exact match comes to zero:
$$\Pr\{f(A) = P(A)\} \xrightarrow[n \to \infty]{} 0$$
41
Probability of the exact coincidence of the frequency f(A) with
the probability P(A), e.g., fair coin tossing with P(A) = φ = 0.5
f(A) | P[f(A)]
5/10 | 0.25
50/100 | 0.080
500/1 000 | 0.025
5 000/10 000 | 0.0080
50 000/100 000 | 0.0025
500 000/1 000 000 | 0.00080
For the sake of clarity, the probability values are rounded to
two significant figures.
42
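The probabilities of an exact match for the fair coin follow from the binomial distribution: P[f(A) = 1/2] = C(n, n/2) / 2ⁿ. A minimal sketch of that arithmetic is given below; the function name `prob_exact_half` is an illustrative assumption, and the log-gamma function is used only to avoid overflow for large n.

```python
from math import exp, lgamma, log

def prob_exact_half(n):
    """P(exactly n/2 successes in n tosses of a fair coin), for even n.

    Computed on the log scale: log C(n, n/2) - n*log(2).
    """
    k = n // 2
    log_p = lgamma(n + 1) - 2 * lgamma(k + 1) - n * log(2)
    return exp(log_p)

for n in (10, 100, 1_000, 10_000, 100_000, 1_000_000):
    print(f"{n // 2}/{n}: {prob_exact_half(n):.2g}")
```

For large n the probability behaves approximately as sqrt(2 / (πn)), so every 100-fold increase of n reduces it about 10-fold.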
Consequences of the Law of Large Numbers 
(LLN) 
• According to the Law of Large Numbers, the larger the
Sample Size n,
• the “better” (more accurately, more reliably) the Sample data
reflect the distribution of the Random Variable from which the
Sample is drawn.
• Consequently, the larger the sample size, the more
representative the Sample is.
• This is true, however, if and only if (iff) the Sample data are
realizations of independent identically distributed
(iid) Random Variables.
43
Statistical estimation 
44
What are the main objectives of statistics? 
• Statistical Estimation (of the parameters) 
• Point and interval estimations 
• Statistical Inference 
– Testing Statistical Hypotheses 
– Comparison of Models 
• Statistical Associations 
• Correlation and Regression 
45
What is an Estimator and what is an Estimate?
• An “Estimator” is a statistic that is used to infer the value of
an unknown parameter in a statistical model.
• The parameter being estimated is sometimes called
the estimand.
• In other words, an estimator is a rule for calculating an
estimate of a given quantity based on observed data:
• thus the rule (the estimator) and its result (the estimate) are distinguished.
46
Two main kinds of Statistical Estimates 
• Point Estimate – estimation by a single 
number. 
• Interval Estimate – estimation by an interval,
which covers the value of the estimated
parameter with a given probability called the
confidence level.
47
The main logic of Statistical Estimation: Point 
Estimates 
• Usually the parameter φunkn is unknown. 
• The objective is to estimate it on the basis of observed statistical data 
• x1, x2, …, xi, …, xn. 
• The above values are regarded as realizations of corresponding iid 
random variables: 
• X1, X2, …, Xi, …, Xn. 
• An appropriate function of these random variables is chosen as an Estimator
for the unknown parameter.
• Any such function is called a “Statistic”, and it is also a random variable.
• Calculated values of a chosen Estimator are called Estimates.
• An Estimate is regarded as a realization of the given Estimator.
48
Compression of statistical information 
• One of the most widely used statistics is the sample
mean, which plays the role of the Estimate of the
mean value of the underlying distribution.
• It is calculated as:
$$M = \frac{1}{n} \sum_{i=1}^{n} x_i$$
• And it is generated by the Estimator:
$$\tilde{M} = \frac{1}{n} \sum_{i=1}^{n} \tilde{X}_i$$
• Here the tilde “~” is the symbol of a random variable.
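The distinction between the Estimator (a rule, a random variable) and the Estimate (a number obtained from one sample) can be sketched in Python. Everything numerical here is hypothetical: the normal distribution with mean 100 and SD 20, the sample size 16 and the seed are assumptions chosen only for illustration.

```python
import random

def sample_mean(xs):
    """The Estimator: a rule that maps any sample to a number."""
    return sum(xs) / len(xs)

rng = random.Random(7)                    # arbitrary seed for repeatability
TRUE_MEAN, TRUE_SD, N = 100.0, 20.0, 16   # hypothetical distribution and sample size

# Each new sample gives a different Estimate (a realization of the Estimator).
estimates = []
for _ in range(5):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    estimates.append(sample_mean(sample))

print([round(m, 1) for m in estimates])
```

The same function applied to five independent samples returns five different numbers scattered around the true mean, which is what "the Estimator is a random variable" means in practice.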
49
Example 1 
Intrauterine growth restriction 
(IUGR) and interferon IFN-α/β 
50
• Let us consider one of the most common
problems of statistical analysis: the comparison of two
independent samples.
51
IUGR – intrauterine growth restriction 
(old name “intrauterine growth retardation”) 
• Foetuses with a birth weight below the 10th percentile of those born at the
same gestational age, or two standard deviations below the population mean, are considered
growth restricted.
• Note that the definition is based on statistical terms: the 10th percentile
and/or standard deviations.
• More strictly, IUGR should refer to foetuses that are small for gestational
age and display other signs of chronic hypoxia or failure to thrive.
• Approximately 3-5% of all pregnancies.
• IUGR is also known as SGA (small for gestational age).
52
A comparison between normal and IUGR
babies (Dr. M.C. Bansal)
53
IUGR 
54
Normal and IUGR placenta (Dr. M.C. Bansal) 
55
Levels of induced production of IFN-α/β in 16 healthy mothers of
healthy newborns and in 20 mothers of newborns with IUGR
(intrauterine growth restriction) (Koroleva L.I.). Data are ranked.
Healthy | | | | IUGR | | |
Rank | IFN-α/β, IU/ml | Rank | IFN-α/β, IU/ml | Rank | IFN-α/β, IU/ml | Rank | IFN-α/β, IU/ml
1 | 38 | 9 | 92 | 1 | 104 | 11 | 144
2 | 42 | 10 | 93 | 2 | 121 | 12 | 146
3 | 58 | 11 | 94 | 3 | 123 | 13 | 147
4 | 59 | 12 | 101 | 4 | 123 | 14 | 149
5 | 70 | 13 | 103 | 5 | 127 | 15 | 151
6 | 71 | 14 | 115 | 6 | 130 | 16 | 153
7 | 81 | 15 | 159 | 7 | 132 | 17 | 162
8 | 86 | 16 | 170 | 8 | 134 | 18 | 168
 | | | | 9 | 134 | 19 | 171
 | | | | 10 | 140 | 20 | 173
Only three values in the healthy group (115, 159 and 170; highlighted on the slide) overlap with the values in the
IUGR group. The level of IFN-α/β in the IUGR group stochastically dominates that in the healthy group.
56
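As a first descriptive step, the two samples can be summarized with Python's standard `statistics` module. The sketch below only retypes the ranked data from the table; the variable names are illustrative, and `stdev` is the sample standard deviation (divisor n - 1).

```python
from statistics import mean, median, stdev

healthy = [38, 42, 58, 59, 70, 71, 81, 86, 92, 93, 94, 101, 103, 115, 159, 170]
iugr = [104, 121, 123, 123, 127, 130, 132, 134, 134, 140,
        144, 146, 147, 149, 151, 153, 162, 168, 171, 173]

for name, xs in (("Healthy", healthy), ("IUGR", iugr)):
    print(f"{name}: n = {len(xs)}, mean = {mean(xs):.1f}, "
          f"SD = {stdev(xs):.1f}, median = {median(xs)}")

# Healthy values that fall inside the range of the IUGR sample
overlap = [x for x in healthy if x >= min(iugr)]
print("Healthy values overlapping the IUGR range:", overlap)
```

The means and standard deviations obtained this way are the parameters of the normal curves that XLSTAT fits to the histograms of the two groups.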
Exploratory and Pictorial Statistics. 
Visualization of the initial data and 
their preliminary statistical 
descriptions: 
histograms, box plots, dominance 
diagrams, etc. 
57
Comparison of histograms for the levels of induced production of
IFN-α/β in 16 healthy mothers of healthy newborns and in 20
mothers of newborns with IUGR. Free program PAST
http://folk.uio.no/ohammer/past
58
Comparisons of histograms and cumulative sample distributions for the levels of
induced production of IFN-α/β in 16 healthy mothers of healthy newborns and in 20
mothers of newborns with IUGR.
Program XLSTAT http://www.xlstat.com
[Figure: cumulative distributions (Healthy / IUGR); y-axis: cumulative relative frequency]
[Figure: histograms (IFN-α/β, IU/mL) with fitted normal densities; x-axis: IFN-α/β, IU/mL; y-axis: density. Healthy: Normal(89.500, 36.471); IUGR: Normal(141.600, 18.323)]
59
CDF – cumulative distribution functions and stochastic
dominance
Program XLSTAT http://www.xlstat.com
• The level of induced IFN-α/β in IUGR patients (green line) stochastically dominates that for healthy mothers (blue line):
• X2 > X1
• Stochastic: randomly determined; having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely.
[Figure: cumulative distributions (IUGR / Healthy); y-axis: cumulative relative frequency]
60
Box-and-Whisker plot 
Q1 – first quartile, Q3 – third quartile, IQR – interquartile range, σ – standard deviation. 
61
Box-and-whisker plot for the levels of induced production of
IFN-α/β in 16 healthy mothers of healthy newborns and in 20
mothers of newborns with IUGR. Free program: Instat+
http://www.reading.ac.uk/ssc/n/n_instat.htm
[Figure labels: marks for outliers; medians; 95% confidence limits for medians]
What did the Box Plot say to the outlier? "Don't you dare get close to my whisker!!"
62
What is an outlier?
• An outlier is an observation that is numerically distant from the rest of the
data.
• Outliers are often indicative of measurement (or recording) errors.
• For example, if the value 1100 is recorded for arterial blood pressure,
this could be a misprint: either a 1 or a 0 is probably redundant.
• Removing outliers is a controversial practice, although it is recommended in
several textbooks and manuals.
• However, the possibility should be considered that the underlying
distribution of the data is not approximately normal, having “fat (heavy)
tails” or representing a mixture of two or more different distributions.
• A mixture may comprise two identical distributions that are shifted relative to
each other.
• Thus, the removal of outliers has to be based on extra-statistical
considerations.
• “I'm not an outlier; I just haven't found my distribution yet!”
63
Mixture analysis 
Program PAST 
Component proportion | Mean, M | Standard Deviation, SD
0.88 | 78.8 | 22.5
0.12 | 164.5 | 5.5
The data in the healthy group can be regarded as a
mixture of two normal distributions.
Their proportions are 88% and 12%.
The major component has a sample mean of
about M = 79 IU/mL and a standard deviation of
SD = 23 IU/mL.
The minor component has M = 165 IU/mL
and a standard deviation of SD = 5.5 IU/mL.
However, the sample size (n1 = 16) is too
small to draw a firm conclusion.
64
Effect size 
65
• Recommendations for the Conduct, Reporting, Editing, and Publication 
of Scholarly Work in Medical Journals. Updated December 2013. 
• iii. Statistics 
• Describe statistical methods with enough detail to enable a 
knowledgeable reader with access to the original data to judge its 
appropriateness for the study and to verify the reported results. 
• When possible, quantify findings and present them with appropriate 
indicators of measurement error or uncertainty (such as confidence 
intervals). 
• Avoid relying solely on statistical hypothesis testing, such as P values, 
which fail to convey important information about effect size and 
precision of estimates. 
• http://www.icmje.org/recommendations/ 
• Prediction probabilities and prediction intervals should be added. 
66
• Over 300 medical and biomedical journals
follow the ICMJE recommendations.
67
Effect Size, ES 
• The question of the clinical (practical) importance of the observed Effect Size (ES)
is key when interpreting the results of biomedical investigations (e.g., clinical
trials).
• Effect Size is defined as a quantitative reflection of the magnitude of some
phenomenon that is used for the purpose of addressing a question of
interest.
• Kelley K., Preacher K.J. On Effect Size. Psychological Methods, 2012; 17(2):
137–152
• ES can be a difference between mean values, various kinds of ratios,
a correlation, an association, etc.
• ES can be expressed either in the real measurement units or
as a standardized (nonmetric) quantity.
68
• By analyzing samples we draw conclusions about the
distributions from which they are drawn.
• When two independent distributions are compared,
a simple and useful measure of the
effect size is AUC (or AUROC), the Area Under the (ROC)
Curve, which is related to the Mann-Whitney U-statistic.
• One of its representations is the so-called dominance
diagram.
69
[Dominance diagram. Healthy (columns), IFN-α/β, IU/ml: 170 159 115 103 101 94 93 92 86 81 71 70 59 58 42 38]
[IUGR (rows), IFN-α/β, IU/ml: 104 121 123 123 127 130 132 134 134 140 144 146 147 149 151 153 162 168 171 173]
70
Dominance diagram
Program XLSTAT http://www.xlstat.com
[Figure: dominance diagram; axes: Healthy, IUGR]
Umin = 35 is the number of “plus” signs and Umax = 285 is the number of “minus” signs;
obviously, Umin + Umax = 35 + 285 = n1 × n2 = 16 × 20 = 320
71
• For two independent random variables X and Y , 
• Θ = P(Y > X) + 1/2 P(Y = X) 
• is advocated as a general measure of effect size to characterize the 
degree of separation (or, conversely, overlap) of their distributions. 
• It is estimated by the statistic
• θ = AUC = Umax / (n1 × n2),
• derived by dividing the larger observed value Umax of the Mann–Whitney
statistic by the product of the two sample sizes.
• It is equivalent to the observed value of AUC - area under the receiver 
operating characteristic (ROC) curve. 
• It has been termed the ‘probability of concordance’, ‘common language 
effect size’ and ‘measure of stochastic superiority’. 
72
AUC - area under (ROC-) curve 
• In the given rectangular matrix the total number of cells
is the product of the two sample sizes:
• n1 × n2 = 16 × 20 = 320
• The observed maximum value of the two additive
components of the Mann-Whitney U-statistic is the
number of yellow cells in the matrix:
• Umax = 285
• So the point estimate for AUC is:
• AUC = Umax / (n1 × n2) = 285/320 = 0.89
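The counting behind Umin, Umax and AUC can be checked directly by comparing every healthy value with every IUGR value, as the dominance diagram does. This is a brute-force sketch over the 320 pairs, with ties counted as 1/2 in line with Θ = P(Y > X) + 1/2 P(Y = X); the function and variable names are illustrative.

```python
healthy = [38, 42, 58, 59, 70, 71, 81, 86, 92, 93, 94, 101, 103, 115, 159, 170]
iugr = [104, 121, 123, 123, 127, 130, 132, 134, 134, 140,
        144, 146, 147, 149, 151, 153, 162, 168, 171, 173]

def mann_whitney_counts(x, y):
    """Count the pairs with y > x, y < x and y == x over all len(x) * len(y) pairs."""
    greater = sum(1 for xi in x for yj in y if yj > xi)
    less = sum(1 for xi in x for yj in y if yj < xi)
    ties = len(x) * len(y) - greater - less
    return greater, less, ties

greater, less, ties = mann_whitney_counts(healthy, iugr)
u_iugr = greater + 0.5 * ties      # pairs in which the IUGR value is larger
u_healthy = less + 0.5 * ties      # pairs in which the healthy value is larger
u_max, u_min = max(u_iugr, u_healthy), min(u_iugr, u_healthy)
auc = u_max / (len(healthy) * len(iugr))
print(f"Umin = {u_min:g}, Umax = {u_max:g}, AUC = {auc:.2f}")
```

The 35 pairs in which the healthy value is larger all come from the three overlapping healthy values (115, 159 and 170).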
73
Interval estimation 
Researchers should wherever 
possible, base discussion and 
interpretation of results on point 
and interval estimates 
74
What is Confidence Interval? 
• Frequentist’s Confidence Interval is a random 
interval that covers the estimated (unknown) 
value of a given Parameter with the specified 
probability. 
• Such probability is called confidence level (or 
confidence coefficient). 
75
CI 
• If the experiment is repeated several times, the observed 
values for the limits of the Confidence Interval calculated 
from the observations will vary from sample to sample. 
• Frequently, with the probability (1 – α), it will include (cover)
the estimated unknown value of the parameter, but with the
probability α it will inevitably miss the estimated value.
• How frequently the observed interval contains the 
parameter is determined by the confidence level (or 
confidence coefficient). 
• Confidence level is chosen by the researcher in accordance 
with his intuition. 
76
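Frequentist’s Confidence Interval (CI)
$$P\{\tilde{\theta}_{lower} \le \theta_{unkn} \le \tilde{\theta}_{upper}\} = 1 - \alpha$$
$$P\{\theta_{unkn} < \tilde{\theta}_{lower}\} = \frac{\alpha}{2}$$
$$P\{\tilde{\theta}_{upper} < \theta_{unkn}\} = \frac{\alpha}{2}$$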
77
The meaning of the Confidence Level 
• The meaning of the term “confidence level” is that, if 
confidence intervals are constructed across many separate 
data analyses of repeated (and possibly different) 
experiments, the proportion of such intervals that contain 
the true value of the parameter will approximately match 
the confidence level. 
• So, e.g., the 95% does not attach to any one frequentist CI; 
it attaches to “the proportion of such intervals”. 
• When only a single CI is obtained, it is unknown whether it 
covers the true value or not. 
• Again, we come to the conclusion that the experiment needs 
to be repeated many times. 
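The long-run meaning of the confidence level can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not part of the slides: normally distributed data, a known standard deviation (so a z-interval is used), and arbitrary illustrative values for `mu`, `sigma` and `n`.

```python
import random
from statistics import NormalDist, mean


def coverage(n_experiments=2000, n=25, mu=10.0, sigma=2.0, alpha=0.05, seed=1):
    """Share of simulated experiments whose (1 - alpha) CI covers the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / n ** 0.5  # sigma is assumed known here
    hits = 0
    for _ in range(n_experiments):
        m = mean(rng.gauss(mu, sigma) for _ in range(n))
        if m - half_width <= mu <= m + half_width:
            hits += 1
    return hits / n_experiments


observed_coverage = coverage()
print(observed_coverage)
```

Each single interval either covers `mu` or does not; only the proportion over many repetitions approaches 1 – α.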
78
Bayesian confidence (credible) interval 
79 
• $P(L \le \tilde{\theta} \le U) = 1 - \alpha$ 
• $P(\tilde{\theta} < L) = \alpha/2$ 
• $P(\tilde{\theta} > U) = \alpha/2$ 
Significance Level α and 
Confidence Level (1 – α) 
Significance level, α | Confidence level, (1 – α) | Reliability 
0.05 | 95% | Low 
0.01 | 99% | Medium 
0.001 | 99.9% | High 
80
Confidence interval and statistical significance 
Expected value of θ and the 100(1 – α)% CI for the unknown value θunkn: 
• CI covers the expected value: the unknown value θunkn estimated 
by the given interval does not differ statistically significantly 
from the expected value θ. 
• CI lies entirely above the expected value: the unknown value θunkn 
estimated by the given interval is statistically significantly 
larger than the expected value θ at the significance level α. 
• CI lies entirely below the expected value: the unknown value θunkn 
estimated by the given interval is statistically significantly 
smaller than the expected value θ at the significance level α. 
81
Statistical significance and practical (clinical) importance 
• Estimated unknown difference is statistically nonsignificant and 
clinically unimportant 
• CI is too wide; perhaps sample size is too small 
• Estimated unknown difference is statistically significant, but 
clinically unimportant 
• Estimated unknown difference is statistically significant and 
clinically important 
• Figure labels: expected “null” value; CI; clinically indifferent 
zone or reference interval 
82 
Compact form for the 
joint presentation 
of point and interval estimates 
• Example: 
– AUC point estimate: 0.89 
– Lower limit of the 95% CI: 0.72 
– Upper limit of the 95% CI: 0.96 
• Compact record (lower limit, point estimate, upper limit): 
• AUC ≡ θ = 0.72 0.89 0.96 
• Louis T.A., Zeger S.L. Effective communication of standard errors 
and confidence intervals. Biostatistics, 2009; 10(1): 1–2. 
• Newcombe’s spreadsheet: GENERALISEDMW.XLS 
http://medicine.cf.ac.uk/primary-care-public-health/resources/ 
83
Statistical inference 
using confidence interval 
• The obtained 95% confidence interval (CI) does not cover the 
indifferent value AUCindiff = 0.5. 
• This means that the unknown value AUCunkn estimated by this 
interval differs statistically significantly from the indifferent value 
AUCindiff = 0.5 (at the significance level α = 0.05). 
• Consequently, we can conclude that one of the two compared 
random variables stochastically dominates the other. 
• When the shapes of both distributions are similar, we can 
interpret this result as a statistically significant deviation of the 
estimated Hodges-Lehmann shift parameter from its indifferent 
value ΔHLindiff = 0. 
84
• Strictly speaking, the widespread interpretation of the 
Mann-Whitney U-statistic as a measure of the 
difference between the medians of the two compared 
distributions is incorrect. 
• The Mann-Whitney statistic is a measure of the stochastic 
dominance of one of two independent distributions over 
the other (not of the difference between their medians). 
• When the shapes of both distributions are similar, 
then the Mann-Whitney statistic becomes the basis for 
estimating the Hodges-Lehmann shift parameter. 
85
170 159 115 103 101 94 93 92 86 81 71 70 59 58 42 38 
104 -66 -55 -11 1 3 10 11 12 18 23 33 34 45 46 62 66 
121 -49 -38 6 18 20 27 28 29 35 40 50 51 62 63 79 83 
123 -47 -36 8 20 22 29 30 31 37 42 52 53 64 65 81 85 
123 -47 -36 8 20 l999=22 29 30 31 37 42 52 53 64 65 81 85 
127 -43 -32 12 24 26 33 34 35 41 46 56 57 68 69 85 89 
130 -40 -29 15 27 29 36 37 38 44 49 59 60 71 72 88 92 
132 -38 -27 17 29 l99=31 l95=38 39 40 46 51 61 62 73 u95=74 90 94 
134 -36 -25 19 31 33 40 41 42 48 53 63 64 75 76 92 96 
134 -36 -25 19 31 33 40 41 42 48 53 63 64 75 76 92 96 
140 -30 -19 25 37 39 46 47 48 54 59 69 70 81 82 98 102 
144 -26 -15 29 41 43 50 51 52 58 63 73 74 85 86 102 106 
146 -24 -13 31 43 45 52 53 54 60 65 75 76 u999=87 88 104 108 
147 -23 -12 32 44 46 53 54 55 61 66 76 77 88 89 105 109 
149 -21 -10 34 46 48 55 HL=56 57 63 68 78 79 90 91 107 111 
151 -19 -8 36 48 50 57 58 59 65 70 80 81 92 93 109 113 
153 -17 -6 38 50 52 59 60 61 67 72 82 83 94 95 111 115 
162 -8 3 47 59 61 68 69 70 76 81 91 92 103 104 120 124 
168 -2 9 53 65 67 74 75 76 82 87 97 98 109 110 126 130 
171 1 12 56 68 70 77 78 u99=79 85 90 100 101 112 113 129 133 
173 3 14 58 70 72 79 80 81 87 92 102 103 114 115 131 135 
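The matrix above holds all n1 × n2 = 320 pairwise differences (row value minus column value). A minimal sketch of the Hodges-Lehmann point estimate, taken here as the median of these differences, is given below; the interval limits marked in the matrix (l95, u95, etc.) come from the dedicated software and are not recomputed here. The variable names are placeholders.

```python
from statistics import median

# Row and column labels of the difference matrix above.
sample_1 = [104, 121, 123, 123, 127, 130, 132, 134, 134, 140,
            144, 146, 147, 149, 151, 153, 162, 168, 171, 173]  # rows
sample_2 = [170, 159, 115, 103, 101, 94, 93, 92,
            86, 81, 71, 70, 59, 58, 42, 38]                    # columns

# Every cell of the matrix: one observation from sample_1 minus one from sample_2.
differences = [x - y for x in sample_1 for y in sample_2]

# Hodges-Lehmann shift estimate: the median of all pairwise differences.
hl_shift = median(differences)
print(len(differences), hl_shift)
```

For these data the median of the 320 differences is 56, the cell marked HL in the matrix.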
86
Applying the nonparametric confidence interval for the shift parameter to the 
comparison of the induced production of IFN-α/β in the healthy group and the group with 
IUGR. Program StatXact http://www.cytel.com/software-solutions/statxact 
• The resulting nonparametric Hodges-Lehmann point and 
interval estimates of the shift parameter are: 
• ΔHL = 38 56 74 IU/mL (lower 95% limit, point estimate, upper 95% limit) 
• This 95% confidence interval does not cover the 
indifferent value of the shift Δindiff = 0. 
• So the unknown value of the shift Δunkn estimated by this 
interval differs statistically significantly from 0 at 
the significance level α = 0.05. 
• Therefore the induced production of IFN-α/β in the IUGR 
group is statistically significantly higher than in the 
healthy group. 
87
Applying the parametric confidence interval for the mean difference to the comparison 
of the induced production of IFN-α/β in the healthy group and the group with IUGR. 
Free Program ESCI JSMS.xls http://www.latrobe.edu.au/psy/esci/ 
• Parametric point and interval estimates 
of the difference of two means are: 
• Δ = 33 52 71 IU/mL (lower 95% limit, point estimate, upper 95% limit) 
• This 95% confidence interval does not 
cover the indifferent value Δindiff = 0. 
• So the unknown value of the difference Δunkn 
estimated by this interval differs 
statistically significantly from 0 
at the significance level α = 0.05. 
• Therefore the induced production of 
IFN-α/β in the IUGR group is statistically 
significantly higher than in the healthy 
group. 
88 
ES ≡ Δ = 33.1 52.1 71.0 IU/mL; 
dC = 1.87; Student t = 5.58
Visualization of the comparison of two means using the confidence 
interval for the mean difference. Free Program ESCI JSMS.xls 
http://www.latrobe.edu.au/psy/esci/ 
• The presented 95% confidence 
interval (rose triangle and 
vertical segment) for the mean 
difference does not cover the 
indifferent value Δindiff = 0. 
• So the unknown value of the 
difference Δunkn estimated by this 
interval differs statistically 
significantly from 0 at the 
significance level α = 0.05. 
• Therefore the induced 
production of IFN-α/β in the IUGR 
group is statistically significantly 
higher than in the healthy group. 
89 
Blue circles are observed values. Black dots 
and vertical segments are point and interval 
estimates of the unknown means. Rose 
triangle and vertical segment are estimates 
of their unknown difference.
Newcombe’s standardized 
effect size: δN or StAUC 
• When σ1 = σ2 = σ, θ reduces to 
• Φ(δN /√2), 
• where δN is the difference expressed in units of the standard deviation σ. 
• Here Φ is the common notation for the CDF (cumulative 
distribution function) of the standard Gaussian (normal) 
distribution. 
• θ is preferable to δN because it depends less 
on distributional assumptions and is thus 
more satisfactory than the standardized 
difference. 
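The relation θ = Φ(δN/√2) is easy to evaluate in both directions. The sketch below uses only the Python standard library; the function names are illustrative, and the conversion holds under the equal-variance Gaussian assumption stated above.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian, mean 0 and sd 1


def auc_from_delta(delta_n):
    """theta = Phi(delta_N / sqrt(2))."""
    return _STD_NORMAL.cdf(delta_n / sqrt(2))


def delta_from_auc(theta):
    """Inverse relation: delta_N = sqrt(2) * Phi^{-1}(theta)."""
    return sqrt(2) * _STD_NORMAL.inv_cdf(theta)


for delta in (0.0, 1.0, 1.7, 2.0):
    print(delta, round(auc_from_delta(delta), 2))
```

For example, δN = 1 corresponds to θ of about 0.76, and θ = 0.9 to δN of about 1.8.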
90
Interrelationship between AUC and StAUC 
AUC ≡ θ | StAUC ≡ δN | Size | StAUC ≡ δN | AUC ≡ θ 
0.5 | 0 | | 0 | 0.50 
0.55 | 0.18 | XS (extra-small) | 0.25 | 0.57 
0.6 | 0.36 | | 0.5 | 0.64 
0.65 | 0.55 | S (small) | 0.75 | 0.70 
0.7 | 0.74 | | 1 | 0.76 
0.75 | 0.95 | M (medium) | 1.25 | 0.81 
0.8 | 1.2 | | 1.5 | 0.86 
0.85 | 1.5 | L (large) | 1.75 | 0.89 
0.9 | 1.8 | | 2 | 0.92 
0.95 | 2.3 | XL (extra-large) | 2.5 | 0.96 
0.99 | 3.3 | | 3 | 0.98 
0.999 | 4.4 | XXL (extra-extra-large) | 3.5 | 0.993 
 | | | 4 | 0.998 
91
Standardized 
Cohen’s effect size, StES ≡ dC 
• $d_C = \dfrac{M_1 - M_2}{s_{pooled}}$ 
92
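As a worked illustration of Cohen's standardized effect size, the sketch below computes the mean difference, the pooled standard deviation, dC and the Student t statistic for the two samples read from the margins of the difference matrix earlier in this deck. The function name and the sample labels are placeholders; the pooled-variance form is the usual one for two independent samples.

```python
from math import sqrt
from statistics import mean, variance

sample_1 = [104, 121, 123, 123, 127, 130, 132, 134, 134, 140,
            144, 146, 147, 149, 151, 153, 162, 168, 171, 173]  # n1 = 20
sample_2 = [170, 159, 115, 103, 101, 94, 93, 92,
            86, 81, 71, 70, 59, 58, 42, 38]                    # n2 = 16


def cohen_d_and_t(x, y):
    """Return (mean difference, Cohen's d, Student t) using the pooled SD."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    s_pooled = sqrt(pooled_var)
    diff = mean(x) - mean(y)
    d = diff / s_pooled
    t = diff / (s_pooled * sqrt(1 / n1 + 1 / n2))
    return diff, d, t


diff, d_c, t_stat = cohen_d_and_t(sample_1, sample_2)
print(round(diff, 1), round(d_c, 2), round(t_stat, 2))
```

These data give a mean difference of about 52.1 IU/mL, dC of about 1.87 and t of about 5.58 with df = 34, in line with the values reported in this deck.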
Standardized effect size (mean 
difference), StES ≡ dC; what it looks like 
93
Verbal scale for the interpretation of the 
standardized Cohen’s effect size 
Standardized Cohen’s effect size, dC | Interpretation 
0 – 0.5 | Negligibly small (worthless) 
0.5 – 1.0 | Small (weak) 
1.0 – 1.5 | Moderate 
1.5 – 2.0 | Large (strong) 
2.0 – 3.0 | Very large (very strong) 
3.0 – ∞ | Extremely large 
94
Once more: Statistical significance and 
the Effect size 
• An effect (difference, association, correlation, risk, 
benefit, etc.) can be statistically significant and yet 
turn out to be of no practical (e.g., clinical) 
importance. 
• “Statistically significant” does not imply 
“substantial”, “practically important”, “valuable”. 
• Effects can be real and nonrandom, but nonetheless 
negligibly small. 
95
Confidence interval for the Standardized 
Cohen’s Effect Size dC. Free Program LePrep 
http://www.univ-rouen.fr/LMRS/Persopage/Lecoutre/PAC.htm 
96
Results: point estimates and 95% confidence 
intervals for the three main effect sizes 
• AUC – area under the ROC-curve: 
• AUC = 0.72 0.89 0.96 
• StAUC – Newcombe’s standardized AUC: 
• StAUC = δN = 0.8 1.7 2.5 
• StES – Cohen’s standardized difference of means: 
• StES = dC = 1.1 1.9 2.7 
• Verbal interpretation: 
• at the 95% confidence level, the estimated unknown effect 
sizes can be interpreted as ranging from medium to very large 
(strong). 
97
Statistical predictions and 
reproducibility 
“Prediction is very difficult, 
especially about the future” 
98
Repeat! 
• It is often believed that obtaining a “statistically 
significant” result removes the need to repeat the 
experiment. 
• Testing the significance of statistical 
hypotheses is a method to detect rare events 
which deserve further investigation. 
• Fisher 
99
Cumming G. The New Statistics: 
Why and How. Psychological Science, 
2014; 25(1): 7 –29. 
• Three problems are central: 
• published research is a biased selection of all 
research; 
• data analysis and reporting are 
often selective and biased; and 
• in many research fields, studies are 
rarely replicated, so false 
conclusions persist. 
100
Replication 
• A single study is rarely, if ever, definitive; additional 
related evidence is required. 
• Such evidence may come from a close replication, 
which, with meta-analysis, should give more reliable 
estimates than the original study. 
• A more general replication may increase reliability 
and also provide evidence of generality or robustness 
of the original finding. 
• We need increased recognition of the value of both 
close and more general replications, and greater 
opportunities to report them. 
101
Reproducibility and predictive ability of P-values and 
confidence intervals (n = 32). CI dance. 
Free program “ESCI PPS p intervals” http://www.latrobe.edu.au/psy/esci/. 
Cumming G. Replication and p intervals: p values predict the future only vaguely, but 
confidence intervals do much better. Persp. Psychol. Sci., 2008; 3: 286-300. 
102
• Thus, it is risky to reach a definite conclusion 
from a single experiment only. 
• Any scientific investigation should be 
repeated many times. 
• And the reproducibility of the results must be 
studied. 
103
Gigerenzer G. We need statistical thinking, 
not rituals. Behavioral and Brain Sciences, 
1998; 21(2): 199-200 
• A researcher cannot be unconcerned about: 
• “what would happen if additional subjects were to be included into the 
experiment?”, 
• “what would be the conclusion for the data of these future subjects?”, 
• “what would be the conclusion for the whole data?”, or 
• “what would happen if this experiment were to be repeated?” 
• Asking and answering such questions goes beyond the ritualized 
statistical procedures, and is likely to influence the way the authors of 
scientific papers interpret experimental findings and conduct their 
experiments. 
• Prediction probabilities are an unavoidable part of statistical thinking 
and the time has come to take them seriously. 
104
Prediction and confidence intervals. 
Program Instat+ http://www.reading.ac.uk/ssc/n/n_instat.htm 
105
Reproducibility of the absolute effect size ES for the healthy 
and IUGR groups at α = 0.05 and (1 – α) = 0.95 
106 
95% confidence interval for ES ≡ Δ is from 33 to 71 IU/mL; 
95% prediction interval for it is wider: from 25 to 78 IU/mL.
10-fold increase in sample size 
107 
If we repeat the experiment 10 times independently, the prediction 
interval becomes narrower and closer to the confidence interval.
Prediction interval versus confidence interval 
• Note that under 10-fold repetition of the 
experiment the 95% prediction interval 
comes closer to the observed 95% confidence 
interval. 
• This demonstrates the meaning of the 
confidence interval as the interval that covers 
the estimated effect size under manifold 
(infinitely many) repetitions of the experiment. 
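Why the prediction interval is wider than the confidence interval, and why it shrinks toward it, can be sketched with a common normal-theory approximation. This is an assumption for illustration and not necessarily what the programs used for these slides compute: a replication estimate differs from the original one by sampling error in both studies, so the half-width of the interval is inflated by √(1 + 1/k), where `k` (a hypothetical parameter) is the size of the future study relative to the original one.

```python
from math import sqrt


def prediction_interval(ci_lower, ci_upper, k=1):
    """Approximate prediction interval for a replication estimate.

    ci_lower, ci_upper: limits of the original confidence interval.
    k: size of the future study relative to the original (k = 1: same size).
    """
    center = (ci_lower + ci_upper) / 2
    half_width = (ci_upper - ci_lower) / 2
    pi_half_width = half_width * sqrt(1 + 1 / k)
    return center - pi_half_width, center + pi_half_width


# 95% CI for the mean difference reported in this deck: 33.1 to 71.0 IU/mL.
same_size = prediction_interval(33.1, 71.0, k=1)
ten_fold = prediction_interval(33.1, 71.0, k=10)
print(same_size, ten_fold)
```

With k = 1 the interval is √2 times wider than the confidence interval (roughly 25 to 79 IU/mL here); as k grows, the factor tends to 1 and the prediction interval approaches the confidence interval.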
108
Reproducibility of the standardized Cohen’s effect size dC for 
the healthy and IUGR groups at α = 0.05 and (1 – α) = 0.95 
109 
95% confidence interval for StES ≡ dC is from 1.1 to 2.7; 
95% prediction interval for it is wider: from 0.8 to 3.1.
10-fold increase in sample size 
110 
If we repeat the experiment 10 times independently, the prediction 
interval becomes narrower and closer to the confidence interval.
Prediction probabilities, Prep, Psrep and Preprep 
111 
The probability of a same-sign effect is Prep = 1.0; of a same-sign effect significant at 
α = 0.05, Psrep = 0.99; and of a same-sign effect with Prep = 0.99, Preprep = 0.98.
Reproducibility of the P-value when comparing healthy and IUGR 
groups at α = 0.05 and (1 – α) = 0.95 
112 
Observed Pval = 3∙10⁻⁶. Its 95% prediction interval ranges from the extremely 
small 3∙10⁻¹¹ to the moderate 0.01.
Probabilities of replication and prediction intervals 
• Thus, it is predicted that when our experiment is 
repeated, the probability of obtaining the same sign for 
the mean difference (expressed as the absolute effect size ES as 
well as Cohen’s standardized effect size dC) will be 
• Prep = 1.00. 
• And the probability of obtaining a difference of the same 
sign that is statistically significant at the level α = 0.05 will be 
• Psrep = 0.99. 
• Moreover, it is predicted that in a future repetition of the 
experiment the P-value could lie in a very wide 95% 
prediction interval, from very low to moderate: 
• Pval = 3∙10⁻¹¹ to Pval = 0.01. 
113
Main statistical tools and their purpose 
• Bayes Factor (BF) → comparing statistical 
models and/or hypotheses 
• P-value → statistical hypothesis testing 
• Effect Size (ES) → practical (clinical) importance 
• Confidence intervals (CI) → visualization of both 
the estimates and the hypothesis tests 
• Prediction Intervals (PI) → prediction of future 
repetitions 
114
Bayes theorem in action: 
connecting prior and posterior 
probabilities 
115
Reverend Thomas Bayes 
(c. 1702 – April 17, 1761) 
116
117 
Bayes Factor 
• The Bayes factor differs fundamentally from the P-value (Pval). 
• The Bayes factor is not a probability in itself, but a ratio 
of probabilities, and it can vary from zero to infinity: 
• BF01 = P(Dobs|H0) / P(Dobs|H1) 
• BF10 = P(Dobs|H1) / P(Dobs|H0) 
• This means that the Bayes factor provides not only a 
test of the significance of the null hypothesis, but also 
a comparison of the probabilities of obtaining the 
observed data under both hypotheses. 
• However, for this we need a better idea 
of the alternative hypothesis.
Amazing property of Bayes factor in 
terms of “odds” 
118
What are the odds? 
• The odds (in favor) of an event A is the ratio of 
the probability P(A) that the event will happen 
to the probability P(Ā) that it will not happen: 
• O(A) = P(A) : P(Ā) = P(A) : [1 – P(A)] 
• Conversely, the odds against an event A is the inverse 
ratio. 
• Such a representation of probability is familiar to 
geneticists. 
• Mendel’s famous ratio of 3 : 1 is a representation of the 
probabilities 3/4 and 1/4 in terms of odds. 
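The conversion between probabilities and odds defined above can be sketched with exact fractions; the function names are illustrative only.

```python
from fractions import Fraction


def odds_in_favor(p):
    """O(A) = P(A) / (1 - P(A))."""
    return p / (1 - p)


def probability_from_odds(odds):
    """Inverse conversion: P(A) = O(A) / (1 + O(A))."""
    return odds / (1 + odds)


# Mendel's 3 : 1 ratio: probability 3/4 for one phenotype, 1/4 for the other.
p_dominant = Fraction(3, 4)
mendel_odds = odds_in_favor(p_dominant)
print(mendel_odds, probability_from_odds(mendel_odds))
```

The probability 3/4 corresponds to odds of 3 (that is, 3 : 1), and converting the odds back returns 3/4.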
119
Bayes factor BF in terms of odds 
• The Bayes factor not only shows how many times the probability 
P(Dobs|H0) differs from the probability P(Dobs|H1). 
• It also shows how many times the posterior odds in favor of one 
hypothesis against the other (alternative) differ from their prior 
odds: 
• $BF_{01} = \dfrac{P(D_{obs} \mid H_0)}{P(D_{obs} \mid H_1)} = \dfrac{P(H_0 \mid D_{obs})}{P(H_1 \mid D_{obs})} : \dfrac{P(H_0)}{P(H_1)}$ 
• Conversely, 
• $BF_{10} = \dfrac{P(D_{obs} \mid H_1)}{P(D_{obs} \mid H_0)} = \dfrac{P(H_1 \mid D_{obs})}{P(H_0 \mid D_{obs})} : \dfrac{P(H_1)}{P(H_0)}$ 
• BF01 = 1/BF10 
• Thus, we observe an amazing property of the Bayes factor: 
• without knowing the prior and posterior probabilities of both 
hypotheses, we can quantitatively compare their odds. 
120 
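The odds form of Bayes' theorem (posterior odds = Bayes factor × prior odds) can be sketched as follows. The Bayes factor used is the BF10 of about 5555 reported for the IUGR example in this deck; the two prior odds are hypothetical choices made only for illustration.

```python
def posterior_odds(bayes_factor, prior_odds):
    """Posterior odds = Bayes factor * prior odds."""
    return bayes_factor * prior_odds


def odds_to_probability(odds):
    """Convert odds in favor of a hypothesis into its probability."""
    return odds / (1 + odds)


bf10 = 5555.0  # Bayes factor in favor of H1, as reported in this deck

# Hypothetical prior odds for H1 against H0: even (1 : 1) and sceptical (1 : 100).
even_prior = posterior_odds(bf10, 1.0)
sceptical_prior = posterior_odds(bf10, 0.01)
print(even_prior, odds_to_probability(even_prior))
print(sceptical_prior, odds_to_probability(sceptical_prior))
```

The Bayes factor itself does not depend on the prior odds; it only states by how much the data shift them.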
Interpretation of credibility of Bayes factors 
BF10 and BF01 
121 
BF01 | Evidence in favor of hypothesis H0 against hypothesis H1 
>10 000 | Convincing 
100 – 1 000 | Very strong 
30 – 100 | Strong 
10 – 30 | Moderate 
3 – 10 | Weak 
1 – 3 | Negligible 
BF10 | Evidence in favor of hypothesis H1 against hypothesis H0
John Arbuthnot 
(April 29, 1667 – February 27, 1735) 
122
Number of christenings in 
London over 82 years 
Year Boys Girls Year Boys Girls 
1629 5218 > 4683 1650 2890 > 2722 
1630 4858 > 4457 3231 > 2840 
4422 > 4102 3220 > 2908 
4994 > 4590 3196 > 2959 
5158 > 4839 3441 > 3179 
5035 > 4820 3655 > 3349 
5106 > 4928 3668 > 3382 
4917 > 4605 3396 > 3289 
4703 > 4457 3157 > 3013 
5359 > 4952 3209 > 2781 
5366 > 4784 1660 3724 > 3247 
1640 5518 > 5332 4748 > 4107 
5470 > 5200 5216 > 4803 
5460 > 4910 5411 > 4881 
4793 > 4617 6041 > 5881 
4107 > 3997 5114 > 4858 
4047 > 3919 4678 > 4319 
3768 > 3395 5616 > 5322 
3796 > 3536 6073 > 5560 
3363 > 3181 1669 6506 > 5829 
1649 3079 > 2746 
Year Boys Girls Year Boys Girls 
1670 6278 > 5719 1691 7662 > 7392 
6449 > 6061 7602 > 7316 
6443 > 6120 7676 > 7483 
6073 > 5822 6985 > 6647 
6113 > 5738 7263 > 6713 
6058 > 5717 7632 > 7229 
6552 > 5847 8062 > 7767 
6423 > 6203 8426 > 7626 
6568 > 6033 7911 > 7452 
6247 > 6041 1700 7578 > 7061 
1680 6548 > 6299 8102 > 7514 
6822 > 6533 8031 > 7656 
6909 > 6744 7765 > 7683 
7577 > 7158 6113 > 5738 
7575 > 7127 8366 > 7779 
7484 > 7246 7952 > 7417 
7575 > 7119 8379 > 7687 
7737 > 7214 8239 > 7623 
7487 > 7101 7840 > 7380 
7604 > 7167 1710 7640 > 7288 
1690 7909 > 7302 
• Total 484 382 > 454 041 
• Total sum 938 423 
123
Comparison of the frequentist and Bayesian results 
• Testing homogeneity (independence) of the Arbuthnot data results 
in: 
• Pval ≈ 10⁻⁸ 
• BF01 = 8∙10¹¹⁷ 
• From the frequentist point of view, the heterogeneity of the Arbuthnot 
data is statistically highly significant. 
• From the Bayesian point of view, the conclusion is diametrically 
opposite: 
• Obtaining such data is 8∙10¹¹⁷ times more likely under the hypothesis 
H0 of their homogeneity than under the alternative hypothesis H1 
of their heterogeneity. 
• Or: 
• The posterior odds in favor of the null hypothesis against the alternative 
hypothesis are 8∙10¹¹⁷ times higher than their prior odds. 
124
Bayes Factor, online program Bayes Factor Calculators 
http://pcl.missouri.edu/bayesfactor 
125
Output 
• BF01 = 0.00018 and 
• BF10 = 1/BF01 ≈ 5555.5 
• It is about 5555 times more likely 
to obtain the value of the 
Student t-test statistic t = 
5.58 with df = 34 under 
H1: Δ ≠ 0 than under H0: Δ = 0. 
• According to the verbal 
scale, such a value of BF10 is 
interpreted as convincing 
evidence in favor of H1 
against H0. 
126
Summary 
Statistical evidences 
• AUC ≡ θ = 0.72 0.89 0.96 
• StAUC ≡ δN = 0.8 1.7 2.5 
• StES ≡ dC = 1.1 1.9 2.7 
• ΔHL = 38 56 74 IU/mL 
• Δ = 33 52 71 IU/mL 
• BF10 = 5555 
• Pval = 3∙10⁻⁶ 
Statistical predictions 
• 95% prediction intervals: 
• From 0.8 to 3.1 (dC) 
• From 25 to 79 IU/mL (Δ) 
• From 3∙10⁻¹¹ to 0.010 (Pval) 
• Probability of replication: 
• Psrep = 0.99 
127
Example 2 
TGT – Thrombin Generation Test 
128
Castoldi E., Rosing J. Thrombin generation tests. Thrombosis 
Research, 2011; 127(Suppl. 3): S21–S25 
• Parameters of the 
thrombin generation curve: 
• LT – lag time, min 
• TTP – time to peak, min 
• PT – peak thrombin, nM 
• ETP – endogenous 
thrombin potential, 
nM∙min 
• V – maximum velocity of 
thrombin generation, 
V = PT / (TTP – LT), nM/min 
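The derived parameter V = PT / (TTP – LT) can be computed directly from the other three. In the sketch below the function name is illustrative, and the input values are the group-1 sample means from the TGT table in this deck, used only as an example.

```python
def max_velocity(peak_thrombin_nm, time_to_peak_min, lag_time_min):
    """V = PT / (TTP - LT), in nM/min."""
    rise_time = time_to_peak_min - lag_time_min
    if rise_time <= 0:
        raise ValueError("time to peak must exceed lag time")
    return peak_thrombin_nm / rise_time


# Group-1 means: PT = 134 nM, TTP = 27 min, LT = 16 min.
v_from_means = max_velocity(134, 27, 16)
print(round(v_from_means, 2))
```

A velocity computed from group means need not coincide with the mean of the individual velocities, because the mean of ratios generally differs from the ratio of means.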
129
Estimation of parameters of TGT, results of traditional NHST 
and effect sizes. n1 = 40, n2 = 53 
Parameter | LT, min | ETP, nM∙min | TTP, min | PT, nM | V, nM/min 
RI | 8.0 – 27.4 | 1290 – 2480 | 17 – 41 | 85 – 192 | 5.3 – 25.4 
M1 | 14 16 17 | 1820 1900 1990 | 25 27 28 | 125 134 144 | 11 13 15 
M2 | 15 17 19 | 1640 1740 1830 | 29 31 33 | 100 106 113 | 7.1 7.9 8.7 
Pval | 0.37 | 0.015 | 0.0012 | 3∙10⁻⁶ | 10⁻⁸ 
Effect sizes 
ΔHL | -3.3 -1.0 1.2 | 52 188 323 | -7.3 -4.6 -1.8 | 14 28 40 | 3.3 4.6 6.0 
ES ≡ Δ | -3.4 -1.3 0.7 | 43 167 294 | -7.1 -4.5 -2.1 | 17 28 39 | 3.4 5.1 6.7 
AUC ≡ θ | 0.44 0.55 0.67 | 0.55 0.67 0.77 | 0.68 0.70 0.79 | 0.66 0.77 0.85 | 0.73 0.83 0.90 
StAUC ≡ δN | -0.61 -0.20 0.22 | 0.19 0.63 1.04 | -1.13 -0.72 -0.28 | 0.53 1.06 1.48 | 0.89 1.36 1.80 
StES ≡ dC | -0.66 -0.25 0.16 | 0.10 0.52 0.94 | -1.15 -0.73 -0.30 | 0.65 1.09 1.53 | 0.89 1.35 1.80 
n1 and n2 – sample sizes of the control and CAD groups; RI – nonparametric reference 
interval; М1 and М2 – sample means; Pval – P-value; ΔHL – Hodges-Lehmann shift 
estimate; Δ = М1 – М2 – effect size in real units; θ - area under ROC-curve; δN and dC 
– Newcombe’s and Cohen’s standardized effect sizes. 
Programs: Reference Value Advisor, PAST, StatXact, GENERALIZED.xls, ESCI-JSMS.xls, 
LePrep. 
130
Informativeness of TGT parameters 
53 CHD patients and 40 people without clinical manifestations of 
coronary heart disease (data by Berezovskaya G.A.) 
dC – standardized Cohen’s effect size, Pval – Р-value, BF10 – Bayes factor for 
comparison of odds in favor of H1 versus H0, Psrep – probability of statistically 
significant effect of the same sign (direction) in a replication, Power – “achieved” 
power, n1 = n2 – minimum sample sizes for replication. Programs: ESCI-JSMS.xls, 
Online BF Calculator (http://pcl.missouri.edu/bayesfactor), LePrep, G*Power 
131
Syndrome of statistical leniency and 
credulity 
Fallacies and Confusions of Null 
Hypothesis Significance Testing 
(NHST) and P-value 
“What does a statistician call it when the 
heads of 10 rats are cut off and 1 survives? 
- Nonsignificant.” 
132
P-value 
• P-value is the most controversial concept in statistics. 
• Many textbook authors and the majority of experimenters do not 
understand what the final product of significance testing – a P-value – 
actually means (Gigerenzer, 1988). 
• The concept of a P-value lies so far from the intuitive 
understanding that no ordinary person can hold it in memory. 
• ‘‘We rely too much on P values, and most of us really don’t have a 
clue what they mean.’’ 
• Lai J., Fidler F., Cumming G. Subjective p intervals: Researchers 
underestimate the variability of p values over replication. 
Methodology: European Journal of Research Methods for the 
Behavioral and Social Sciences, 2012; 8: 51-62. 
133
What is P-value? What is null hypothesis H0? 
• A P-value is the probability of observing data as extreme as or 
more extreme than the actual outcome when the null hypothesis 
is true. 
• When testing null hypothesis we transform data into a 
test statistic. 
• Then the P-value is the probability of obtaining a test 
statistic at least as extreme as the one that was actually 
observed, assuming that the null hypothesis is true. 
• Usually the null hypothesis is a statement of 'no effect' or 
'no difference'. 
• The Null Hypothesis is often denoted H0 (read “H-nought”) 
134
Null Hypothesis Significance Testing Waltz 
• The P value is at the heart of the most common approach to data 
analysis – Null Hypothesis Significance Testing (NHST). 
• Think of NHST as a waltz with three steps: 
• (i) State a null hypothesis: that is, there is no effect. 
• (ii) Calculate the p value, which is the probability of getting results 
like ours and more extreme – if the null hypothesis is true. 
• (iii) If Pval is sufficiently small, reject the null hypothesis and sound 
the trumpets: 
• our effect is not zero, it's statistically significant! 
• Generations of students have been inducted into 
the rituals of .05 meaning "significant", 
and .01 "highly significant". 
135
Р-value, Рval 
• Thus, by definition, the P-value (Pval) is the conditional probability of obtaining the 
observed value of the difference (dobs) or any larger, less probable value 
(D ≥ dobs|H0), when the null hypothesis is true: 
• Pval = P(D ≥ dobs|H0). 
• In terms of statistical hypothesis testing, the P-value is: 
• the probability of obtaining the absolute value |tobs| of the observed test statistic T 
or any larger, less probable value (i.e., values deviating even more from 
the expected one) 
• under the assumption that the null hypothesis H0 is true: 
• Pval = P(|T| ≥ |tobs| | H0). 
• Note that the “less probable values” are not observed. 
• We infer them from all possible values within the chosen (null) model. 
136
• A P-value is usually interpreted as a measure of 
how much evidence we have against the null 
hypothesis, that is, of how strongly the observed 
data contradict the null hypothesis. 
• The null hypothesis, traditionally represented by 
the symbol H0, represents the hypothesis of no 
change or no effect. 
• The smaller the P-value, the more (stronger) 
evidence we have against H0. 
137
What is Test Statistic? 
• A test statistic is a statistic used for testing the given null 
hypothesis. 
• Example: Student t-test statistic: 
• $\tilde{t} = \dfrac{\tilde{M}_1 - \tilde{M}_2}{\tilde{s}_{M_1 - M_2}}, \quad df = n_1 + n_2 - 2$ 
• In such a case, testing the null hypothesis H0 of the equality of two 
independent means (H0: M1 – M2 = 0) reduces to testing the 
null hypothesis t = 0. 
• When this hypothesis is true, then the distribution of the t-statistic 
is known. 
• Namely, it is the Student t-distribution. 
• This distribution has a single parameter called degrees of 
freedom, df. 
138
William Sealy Gosset (June 13, 1876–October 16, 1937) is famous as 
a statistician, best known by his pen name Student and for his work 
on Student's t-distribution. 
139
n1 = 5, n2 = 7, df = 10, t = 1.5 
P = 0.16 – the difference is statistically nonsignificant 
140 
http://ftparmy.com/103097-decision-visualizer.html
n1 = 5, n2 = 7, df = 10, t = 3.0 
P = 0.013 – the difference is statistically significant at 
the significance level α = 0.05, but not at 0.01 
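The two-sided P-values quoted for these two examples can be reproduced by integrating the Student t density numerically. The sketch below uses Simpson's rule and only the Python standard library; it is an illustrative numerical shortcut, not the method of the visualizer shown on the slides, and the function names are placeholders.

```python
from math import gamma, pi, sqrt


def t_density(x, df):
    """Probability density of the Student t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_obs, df, steps=2000):
    """P(|T| >= |t_obs| | H0), via Simpson's rule on [0, |t_obs|] (steps is even)."""
    a = abs(t_obs)
    h = a / steps
    total = t_density(0.0, df) + t_density(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_half = total * h / 3  # P(0 <= T <= |t_obs|)
    return 2 * (0.5 - central_half)


print(round(two_sided_p(1.5, 10), 3))
print(round(two_sided_p(3.0, 10), 4))
```

For df = 10 this gives P of about 0.16 for t = 1.5 and about 0.013 for t = 3.0, matching the two slides.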
141
Searching the threshold for the P-value: is it possible? 
• When a small P-value is observed, an intuitive 
(extrastatistical) temptation arises to reject the null 
hypothesis H0. 
• However, there is no statistical reason why any particular 
P-value should be regarded as sufficiently small to reject H0 
safely. 
• Once again, such a decision is extrastatistical. 
• In practice, the decision to reject or accept H0 must 
depend on the circumstances. 
• In each specific situation the researcher 
should make her/his own choice. 
142
143 
Traditional interpretation 
of the P-values (Pval) 
(and their Michelin star scale) 
P-value (Pval) | Statistical significance | Michelin stars 
> 0.05 | Nonsignificant | 
0.05 – 0.01 | Moderately significant | * 
0.01 – 0.001 | Significant | ** 
0.001 – 0.0001 | Highly significant | *** 
< 0.0001 | Extremely significant | **** 
The four-star value 0.0001 was introduced recently by Harvey J. Motulsky: 
http://www.graphpad.com/guides/prism/6/statistics/index.htm?interpreting_a_small_p_value_from_an_unpaired_t_test.htm
Tyranny and/or hypnosis of the figures 
0.05 and 95% 
• Unfortunately, the significance level α = 0.05 is the 
most commonly used threshold. 
• Too often, crossing this threshold 
(Pval < 0.05) in just a single experiment is regarded 
as sufficient for the decision to reject the null 
hypothesis and to conclude that the observed 
effect is statistically significant. 
144
Andrey Nikolaevich Kolmogorov 
(25 April 1903 – 20 October 1987) 
• In statistics, the recommended 
significance level varies from 
0.05 for preliminary orientation 
experiments to 0.001 for 
important ultimate conclusions, 
but the attainable reliability of 
probability conclusions is often 
much higher. 
• Thus, the principal conclusions of 
statistical physics are based on 
the neglect of probabilities of an 
order less than 10⁻¹⁰. 
• (1951) 
145 
http://www.encyclopediaofmath.org/index.php/Probability
Sterne J.A.C., Davey Smith G. 
Sifting the evidence – 
what’s wrong with significance tests? 
BMJ, 2001; 322: 227-231. Cited by 763 
• Presently, several other authors echo Kolmogorov: 
• A P-value close to 0.05 is not strong evidence against the null 
hypothesis. 
• Only Pval < 0.001 should be regarded as strong evidence 
against H0. 
• In addition to P-values, it is strongly recommended to 
present confidence intervals for the effect size. 
146
“Flexible” P-values 
• In fact no scientific worker has a fixed level of 
significance at which from year to year, and 
in all circumstances, he rejects hypotheses; 
• he rather gives his mind to each particular 
case in the light of his evidence and his ideas. 
• Fisher R. A. Statistical Methods and Scientific Inference, 
1956, pages 41-42. 
147
Sir Ronald Aylmer Fisher 
17 Feb 1890 - 29 July 1962 
148
Warning 
• Usually the P-value is interpreted as a measure of the 
evidence provided by the available data against the null 
hypothesis. 
• Strictly speaking, however, it is not a measure in the 
mathematical sense. 
• It does not possess the additivity property, and 
moreover, 
• it does not satisfy two further important principles 
of statistical theory: the Likelihood Principle and 
the P-postulate. 
149
Likelihood Principle 
• In words, the Likelihood Principle states 
that statistical analysis must operate with those and 
only those data that were actually obtained in the 
experiment. 
• However, the calculation of the P-value (as 
follows from its definition) uses not only the observed 
experimental data, but also all other, less 
probable data that were not in fact observed. 
150
Р-postulate 
• To serve as a real and adequate measure of 
statistical evidence, the P-value should 
satisfy a simple rule (postulate) according to 
which equal P-values represent equal 
evidence against the null hypothesis. 
• This rule is called the “P-postulate”. 
• Obviously, this minimal requirement is not met. 
• Wagenmakers E.-J. A practical solution to the pervasive 
problems of p values. Psychonomic Bulletin & Review, 2007; 
14(5): 779-804. 
151
Р-postulate 
• Intuitively one can recognize that Pval = 0.01 in an 
experiment with 10 observations does not 
demonstrate the same evidential strength as 
Pval = 0.01 in an experiment with 300 observations. 
• Equally, Pval = 0.001 obtained in one experiment 
and Pval = 0.01 in another do not imply that the 
evidence for the effect in the first experiment is 
10 times stronger than in the second. 
152
P-value is the realization of corresponding 
random variable P* 
• A P-value is an observed value of the corresponding 
random variable P*. 
• When the null hypothesis H0 is true, P* has the 
so-called (continuous) standard uniform distribution, 
that is, the uniform distribution on the interval [0; 1]: 
• P* ~ Uni[0; 1]. 
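The uniform distribution of P* under a true null hypothesis can be checked by simulation. The sketch below is an illustration under assumptions not taken from the slides: two samples of equal size from the same normal population with known σ = 1, compared with a two-sided z-test.

```python
import random
from statistics import NormalDist, mean


def simulate_p_values(n_sim=5000, n=20, seed=7):
    """Two-sided z-test P-values when H0 (equal means) is true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(n_sim):
        # Both samples come from the same N(0, 1) population, so H0 holds.
        m1 = mean(rng.gauss(0, 1) for _ in range(n))
        m2 = mean(rng.gauss(0, 1) for _ in range(n))
        z = (m1 - m2) / (2 / n) ** 0.5  # standard error with known sigma = 1
        p_values.append(2 * (1 - std_normal.cdf(abs(z))))
    return p_values


p_values = simulate_p_values()
share_below_005 = sum(p < 0.05 for p in p_values) / len(p_values)
mean_p = mean(p_values)
print(round(share_below_005, 3), round(mean_p, 3))
```

Under H0 about 5% of the simulated P-values fall below 0.05 and their mean is close to 0.5, as expected for Uni[0; 1].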
153
P-value distributions 
Pike N. free spreadsheet: FDR.xls http://www.webcitation.org/5rxSzU7qL 
Δ = μ1 – μ2 = 0: χ2 = 390.6; df = 400; Pval = 0.62 
Δ = μ1 – μ2 = 10: χ2 = 1348.8; df = 400; Pval = 4∙10⁻¹⁰¹ 
154 
[Figure: two histograms, “Frequency distribution of p-values”, observed versus expected frequency; x-axis: p-value defining upper limit of range (0.05 to 1.00); y-axis: frequency of values in range.] 
These are histograms obtained with 200 simulations.
Reproducibility and predictive ability of P-values and 95% 
confidence intervals (n = 32). Dance of Pval 
Free program “ESCI PPS p intervals” http://www.latrobe.edu.au/psy/esci/. 
Cumming G. Replication and p intervals: p values predict the future only vaguely, but 
confidence intervals do much better. Persp. Psychol. Sci., 2008; 3: 286-300. 
155
Reproducibility and predictive ability of P-values and 95% 
confidence intervals (n = 32). Dance of Pval 
Free spreadsheet “ESCI PPS p intervals” http://www.latrobe.edu.au/psy/esci/. Cumming 
G. Replication and p intervals: p values predict the future only vaguely, but confidence 
intervals do much better. Persp. Psychol. Sci., 2008; 3: 286-300. 
156
Reproducibility of the P-value when comparing healthy and IUGR 
groups at α = 0.05 and (1 – α) = 0.95 
157 
Observed Pval = 3∙10⁻⁶. Its 95% prediction interval ranges from the extremely 
small 3∙10⁻¹¹ to the moderate 0.01.
Popular temptation 
• The quintessence of traditional (frequentist) 
conclusions from statistical hypothesis testing is 
conventionally interpreted as follows: 
• The smaller the P-value, the stronger the evidence 
(provided by the data) against the null hypothesis H0 
and the greater the reason to doubt H0. 
• Hence, whether intentionally or not (and seemingly 
quite naturally), the temptation arises to 
interpret the P-value as the probability of the null 
hypothesis. 
158
Popular delusion 
• The P-value is not the probability of the null hypothesis! 
• The P-value is calculated under the assumption 
that the null hypothesis H0 is true: 
• Pval = P(|D| ≥ |d_obs| | H0). 
• Hence, the P-value cannot be the probability of the null 
hypothesis: 
• P{D|H0} ≠ P{H0|D} 
• For a collection of other fallacies about the P-value see, e.g.: 
• http://en.wikipedia.org/wiki/P-value 
• Goodman S. A dirty dozen: Twelve P-value 
misconceptions. Semin. Hematol., 2008; 45: 135-140 
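The inequality P{D|H0} ≠ P{H0|D} can be illustrated with Bayes' rule. The following is a minimal sketch under assumptions that are not in the slides: the "data" are reduced to the event "the test is significant at level α", and the prior probability of H0 and the power of the test are hypothetical inputs chosen for illustration.

```python
def prob_h0_given_significant(prior_h0, alpha=0.05, power=0.80):
    """P(H0 | significant result) by Bayes' rule.

    alpha = P(significant | H0); power = P(significant | H1).
    """
    true_null_hits = alpha * prior_h0              # significant although H0 is true
    true_effect_hits = power * (1.0 - prior_h0)    # significant because H1 is true
    return true_null_hits / (true_null_hits + true_effect_hits)

for prior_h0 in (0.5, 0.9):
    posterior = prob_h0_given_significant(prior_h0)
    print(f"P(H0) = {prior_h0}: P(significant | H0) = 0.05, "
          f"P(H0 | significant) = {posterior:.3f}")
```

The probability of a significant result given H0 is fixed at 0.05, yet the probability of H0 given a significant result changes with the prior and the power, so the two cannot be the same quantity.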
159
Calibration of P-values 
• Vovk V. G. A logic of probability, with application to the foundations of statistics. Journal of 
the Royal Statistical Society. Series B (Methodological), 1993; 55(2): 317-351. 
• Sellke T., Bayarri M.J., Berger J.O. Calibration of p values for testing precise null hypotheses. 
The American Statistician, 2001; 55(1): 62-71. Cited by 321 
• When $P_{val} < 1/e$: 
$$BF_{01} \ge -e \, P_{val} \ln P_{val}$$ 
$$P(H_0 \mid D) \ge \frac{BF_{01}}{1 + BF_{01}}$$ 
• This is a lower bound for the probability of the null hypothesis H0. 
160 
161 
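The calibration of Sellke, Bayarri and Berger (2001) is easy to compute directly. The following is a minimal sketch: the function names are illustrative, and the bound on P(H0|D) assumes equal prior probabilities of H0 and H1, as in that paper.

```python
import math

def bf01_lower_bound(p_val):
    """Lower bound on the Bayes factor BF01; valid only for 0 < p_val < 1/e."""
    if not 0.0 < p_val < 1.0 / math.e:
        raise ValueError("the calibration requires 0 < p_val < 1/e")
    return -math.e * p_val * math.log(p_val)

def h0_posterior_lower_bound(p_val):
    """Lower bound on P(H0 | D), assuming prior P(H0) = P(H1) = 1/2."""
    bf01 = bf01_lower_bound(p_val)
    return bf01 / (1.0 + bf01)

for p_val in (0.05, 0.01, 0.001):
    print(f"Pval = {p_val}: BF01 >= {bf01_lower_bound(p_val):.3f}, "
          f"P(H0|D) >= {h0_posterior_lower_bound(p_val):.1%}")
```

For Pval = 0.05, 0.01 and 0.001 the bounds round to 29%, 11% and 1.8%, the values quoted for the probability of the null hypothesis in the "price" of P-values table.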
The “price” of P-values 
Observed    Upper limit of the      Lower limit for the       Upper limit for the 
P-value     80% interval for Pval   probability of the null   probability of 
                                    hypothesis, P(H0)         replication, Prepr 
0.05        0.44                    ≥ 29%                     < 50% 
0.01        0.22                    ≥ 11%                     < 73% 
0.001       0.07                    ≥ 1.8%                    < 90% 
Sellke T., Bayarri M.J., Berger J.O. Calibration of p values for testing precise null hypotheses. 
The American Statistician, 2001; 55(1): 62-71. 
Goodman S.N. A comment on replication, p-values and evidence. Statistics in Medicine, 1992; 
11: 875-879. 
Cumming G. Replication and p intervals: p values predict the future only vaguely, but confidence 
intervals do much better. Perspectives on Psychological Science, 2008; 3(4): 286-300.
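The "80% interval for Pval" column can be reproduced with a short calculation in the spirit of Cumming (2008). The following is a sketch under stated assumptions: a normal (z-test) model with known variance, an exact replication of the same design, an initial two-tailed P-value and a one-tailed replication P-value. Under this model the replication z-score is distributed around the observed one with standard deviation √2. The function name and defaults are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def p_interval(p_obs, coverage=0.80):
    """Prediction interval for the one-tailed replication P-value,
    given the two-tailed P-value p_obs of the initial experiment."""
    z_obs = STD_NORMAL.inv_cdf(1.0 - p_obs / 2.0)
    # The replication z differs from z_obs by N(0, sqrt(2)) noise:
    # both experiments contribute one unit of sampling variance.
    half_width = STD_NORMAL.inv_cdf(0.5 + coverage / 2.0) * 2.0 ** 0.5
    lower = 1.0 - STD_NORMAL.cdf(z_obs + half_width)
    upper = 1.0 - STD_NORMAL.cdf(z_obs - half_width)
    return lower, upper

for p_obs in (0.05, 0.01, 0.001):
    lower, upper = p_interval(p_obs)
    print(f"observed Pval = {p_obs}: 80% interval = ({lower:.1e}, {upper:.2f})")
```

Under these assumptions the upper limits round to 0.44, 0.22 and 0.07, matching the table. Other intervals quoted in the slides may rest on different conventions, so this sketch should not be read as the exact method behind them.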
The problem with p values: how significant are they, really? 
November 12th, 2013 Geoff Cumming 
http://phys.org/wire-news/145707973/the-problem-with-p-values-how-significant-are-they-really.html 
A p value of 0.05 has been the default ‘significance’ threshold for nearly 90 
years … but is that standard too weak? Martin_Heigan 
162
Funny metaphor 
• “Perhaps p values are like mosquitos. 
• They have an evolutionary niche somewhere 
and no amount of scratching, swatting, or 
spraying will dislodge them”. 
• Campbell J.P. Editorial: Some remarks from 
the outgoing editor. Journal of Applied 
Psychology, 1982; 67: 691-700 
163
• The usefulness of P-values is quite limited, 
and we continue to suggest that these 
procedures be euthanized. 
• Anderson D.R., Burnham K.P. Avoiding pitfalls 
when using information-theoretic methods. 
The Journal of Wildlife Management, 2002; 
66(3): 912-918. 
164
On seduction: 
• Yes, the P-value can seduce. 
• It is sexy and we can be blinded. 
• A significant P-value can perplex our thinking, where we simply get 
too excited and forget to look at the actual effect size. 
• Does that < 0.05 really matter when the effect size is small? 
• The study which concluded that the "internet is changing the 
dynamics and outcomes of marriage itself" can serve as an example. 
• This study showed that those who meet their spouses online are less 
likely to divorce and more likely to have high marital satisfaction (of 
course with very significant P-values). 
• However, the effect size was very small: happiness, for 
example, barely moved from 5.48 to 5.64. 
• So, do not sign up for match.com thinking that you may be happier 
with your spouse. 
165
Meaning of the P-value: 
Publish or Perish 
166
Pee-value 
(http://wmbriggs.com/blog/?p=9338) 
Statistics is the only field in which men boast of their 
wee p-values 
167
• Revised standards for statistical evidence 
• Valen E. Johnson 
• PNAS, 2013; 110(48): 19313–19317 
• Supporting Information: Johnson 10.1073/pnas.1313476110 
168
Evidence thresholds γ and size of corresponding significance tests α 
169
Revised standards for statistical evidence 
• A simple strategy for improving the replicability of scientific 
research includes the following steps: 
• (i) Associate statistically significant test results with P values 
that are less than 0.005. 
• (ii) Associate highly significant test results with P values that 
are less than 0.001 (cf. Kolmogorov) and even 0.0001. 
• (iii) Report the Bayes factor in favor of the alternative 
hypothesis and the default alternative hypothesis that was 
tested. 
170
Revised standards for statistical evidence 
• (iv) BF10 > 30 or even > 100 should be 
considered as strong and convincing evidence 
in favor of alternative hypothesis H1. 
• The proposed modifications of common 
standards of evidence are intended to reduce the 
rate of nonreproducibility of scientific results 
by a factor of 5 or greater. 
• Naturally, larger sample sizes are 
required. 
171
Minimum sizes for two independent samples with non-overlapping 
values required to achieve the lower confidence 
limits for two measures of the effect size: AUCL and SESL 

Lower confidence limits 
for the effect size            Confidence levels 
AUCL      StAUCL               0.95      0.99      0.999 
0.80      1.2                  10        17        27 
0.90      1.8                  21        35        56 
0.95      2.3                  40        69        111 
0.99      3.3                  194       334       545 
0.999     4.4                  1923      3320      5418 
Extrapolated using Newcombe’s free spreadsheet VISUALISETHETA.xls 
http://medicine.cf.ac.uk/primary-care-public-health/resources/ 
172
John Wilder Tukey (June 16, 1915 – July 26, 2000) 
• Any research should be at 
least two-staged. 
• First stage – exploratory 
(preliminary, pilot, 
hypotheses generating) 
study. 
• Second stage – confirmatory 
study. 
• The second stage is designed 
on the basis of the results 
obtained at the first stage. 
173
Conclusions 
• Poor reproducibility of experimental results 
has become a systemic problem in biomedicine. 
• One of the main reasons for this is inadequate 
statistical analysis. 
• Statistical analysis should be comprehensive, 
harmonizing statistical evidence and predictions as 
well as frequentist and Bayesian approaches. 
• It is insufficient to carry out null hypothesis 
significance testing (NHST) and report only P-values. 
174
Conclusions (continued) 
• Statistical significance doesn’t mean clinical 
importance. 
• Effect size with confidence and prediction intervals 
should be reported. 
• Experiments and/or observations should be repeated 
many times and their agreement should be 
investigated. 
• The best way is to repeat the experiments 
independently in different laboratories (in different 
countries). 
175
Editorial policy 
• Journal editors and reviewers should not accept for 
publication papers that report the results of a single 
experiment without the results of an independent replication. 
• Experts in statistics should be included on editorial 
boards. 
• Reviewers should be obliged to re-examine all the 
calculations. 
• For this reason, free access to the initial (“raw”) data 
should be ensured. 
• Transparency and openness are cornerstones of the 
scientific method. 
176
Francis Galton, 1901 
• “I have begun to think that no one ought to 
publish biometric results, without lodging a 
well-arranged and well-bound copy of his data in some 
place where it should be accessible, under reasonable 
restrictions, to those who desire to verify his work.” 
• Galton F. Biometry. Biometrika, 1901; 1(1): 7-10. 
• Galton’s suggestion of a data store was later 
revived by Professor Julian Huxley, and a 
suggestion was made to store measurements 
in the British Museum of Natural History. 
177
• One of the most common temptations, and the one 
leading to the greatest disasters, is the temptation 
expressed in the words: "Everybody does it" 
• Leo Tolstoy 
178
Books on Bayesian biostatistics 
179
180 
Lesaffre E., Lawson A. Bayesian 
Biostatistics. 2012. Wiley. 534 p. 
Broemeling L.D. Bayesian Biostatistics 
and Diagnostic Medicine. 2007. CRC 
Press, 216 p.
181 
Kruschke J. Doing Bayesian Data Analysis. 2010. Academic Press, 672 p.
Downey A.B. Think Bayes: Bayesian Statistics 
Made Simple. Version 1.0.1, 2012. Green Tea 
Press: Needham, Massachusetts, 195 p. 
182 
Albert J. Bayesian Computation with R. 
Series: Use R! 2nd ed. 2009, Springer, 
299 p.
Free Software 
• Educational: SUStats, 
http://www.jsc.nildram.co.uk/examples/sustats/diescore/DieScoreApplet.html 
• WinStat http://math.exeter.edu/rparris/winstats.html 
• SOCR http://www.socr.ucla.edu/ 
• Research: R http://cran.r-project.org/ 
• PAST http://folk.uio.no/ohammer/past/ 
• Instat+ http://www.reading.ac.uk/ssc/n/software/instat/337/Instat+_v3.37.msi 
• Online Bayes Factor Calculator http://pcl.missouri.edu/bayesfactor 
• LePAC and LePrep http://www.univ-rouen.fr/LMRS/Persopage/Lecoutre/PAC.htm 
• G*Power http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/ 
• Reference Value Advisor http://www.biostat.envt.fr/spip/spip.php?article63 
• Newcombe’s spreadsheets 
http://medicine.cf.ac.uk/primary-care-public-health/resources/ 
• Cumming’s spreadsheets ESCI http://www.latrobe.edu.au/psy/esci/ 
• Harold Kaplan statistical pages http://printmacroj.com/statistics.htm 
• Commercial: 
• StatXact http://www.cytel.com/software-solutions/statxact 
• XLStat http://www.xlstat.com 
183
Commercial Software 
• StatXact http://www.cytel.com/software-solutions/statxact 
• XLStat http://www.xlstat.com 
• MedCalc https://www.medcalc.org/ 
• GraphPad Prism http://www.graphpad.com/ 
• StatsDirect http://www.statsdirect.com/ 
• Expensive monsters: 
• SAS http://www.sas.com/en_us/home.html 
• IBM SPSS http://www-01.ibm.com/software/analytics/spss/ 
• STATISTICA http://www.statsoft.com/ 
• John C. Pezzullo’s comprehensive list of statistical software: 
http://statpages.org/ 
184
Thank you for your attention 
Slides are freely available to all 
Nikita N. Khromov-Borisov 
Department of Physics, Mathematics and Informatics 
Pavlov First Saint Petersburg State Medical University 
Nikita.KhromovBorisov@gmail.com 
+7-952-204-89-49; +7-921-449-29-05 
http://independent.academia.edu/NikitaKhromovBorisov 
185

CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physics
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorption
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 

Harmonizing statistical evidences and predictions

  • 1. International Life Sciences Workshop “Decision-Making in Biomedical Science – Meet Experts” September 12 – 16 | 2014 Potsdam | Germany Harmonizing statistical evidences and predictions Nikita N. Khromov-Borisov Pavlov First Saint Petersburg State Medical University Saint Petersburg, Russia Nikita.KhromovBorisov@gmail.com +7 952-204-89-49; +7 921-449-29-05 http://independent.academia.edu/NikitaKhromovBorisov https://www.researchgate.net/profile/Nikita_Khromov-Borisov?ev=hdr_xprf 1
  • 2. Slides are freely available to all Nikita N. Khromov-Borisov Department of Physics, Mathematics and Informatics Pavlov First Saint Petersburg State Medical University Nikita.KhromovBorisov@gmail.com +7-952-204-89-49; +7-921-449-29-05 http://independent.academia.edu/NikitaKhromovBorisov 2
  • 3. The best way to discuss scientific issues is to discuss them in a foreign language Max Ludwig Henning Delbrück, (September 4, 1906 – March 9, 1981) Piotr Slonimski (November 9, 1922 – April 25, 2009) 3
  • 4. Second hand teaching • The History of Science has suffered greatly from the use by teachers of second-hand material, and the consequent obliteration of the circumstances and the intellectual atmosphere in which the great discoveries of the past were made. • A first-hand study is always instructive, and often . . . full of surprises. • Ronald A. Fisher, 1955 • Cited by: Ziliak S.T., McCloskey D.N. The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives. The University of Michigan Press, Ann Arbor, 2008, 321 pp. • http://stephentziliak.com/ 4
  • 5. Crisis of reproducibility of the results in biomedicine 5
  • 6. The essences of science are replication and reproducibility • The essence of science is replication: • a scientist should always be concerned about what would happen if he or another scientist were to repeat his experiment. • Guttman L. What is not what in statistics. The Statistician, 1977; 26(2): 81-107. • Scientists have elaborated methods of determining the validity of their results. • They learned to ask the question: are they reproducible? • Scherr G.H. Irreproducible Science: Editor’s Introduction. In: The Best of the Journal of Irreproducible Results, Workman Publishing, New York, 1983. • Reproducibility is like the ghost that will always come back to haunt you. • http://datapede.blogspot.ru/2014/03/part-1z-p-value-surviving-mosquito.html
  • 7. Loscalzo J. Irreproducible Experimental Results: Causes, (Mis)interpretations, and Consequences. Circulation, 2012; 125: 1211-1214. • In Science what is relevant is reproducible results. • If an initial observation is found to be reproducible, then it must be true. • If an initial observation is found not to be reproducible, then it must be false. • Many readers of scientific journals—especially of higher-impact journals—assume that if a study is of sufficient quality to pass the scrutiny of rigorous reviewers, it must be true. • This assumption is based on the inferred equivalence of reproducibility and truth. 7
  • 8. • Long ago Fisher . . . recognised that . . . solid knowledge came from a demonstrated ability to repeat experiments . . . • This is unhappy for the investigator who would like to settle things once and for all, but consistent with the best accounts . . . of the scientific method . . . • Tukey J.W. The philosophy of multiple comparisons. Statistical Science, 1991; 6: 100- 116. 8
  • 9. Tukey J.W. Analyzing data: Sanctification or detective work? American Psychologist, 1969; 24: 83–91. • Nothing learned is certain. • We learn by taking chances. • Every modern learning theorist expects learning to be by trial, with some errors. • This is as true for science as for the individual. • Confirmation comes from repetition. • Repetition is the basis for judging variability and significance and confidence. • Repetition of results, each significant, is the basis, according to Fisher, of scientific truth. • Certainty is an illusion. • As an illusion, certainty can be wasteful, as well as misleading. • Data analysis needs to be both exploratory and confirmatory.
  • 10. From the history of epidemiological studies: Risk factors for cancer [Jenks S., Volkers N. Razors and Refrigerators and Reindeer — Oh My! JNCI, 1992; 84(24): 1863] • Using an electric razor: Increased risk of developing leukemia. • Distal forearm fractures in women: Reduction in overall cancer incidence, breast cancer incidence, and incidence of tumors. • Fluorescent lighting: Melanoma in males but not in females. • Allergies and cancer: At first, an inverse relationship was reported. Later, several types of cancer were found to be elevated. However, ovarian cancer risk decreased with increasing numbers of allergies. • Breeding reindeer: Swedish Lapps had decreased risks for cancers of the colon, female breast, male genital tract, kidneys, respiratory system, and for lymphomas. However, they had an increased risk for stomach cancer.
  • 11. From the history of epidemiological studies: Risk factors for cancer [Jenks S., Volkers N. Razors and Refrigerators and Reindeer — Oh My! JNCI, 1992; 84(24): 1863] • Waiters in Norway: Decreased risk of stomach cancer but excess risks of cancers of the liver, rectum, upper respiratory and digestive tracts, and lung. Higher mortality rate from lung cancer. • Owning a pet bird: Fourfold increase in lung cancer risk among pigeon fanciers (more hazardous than living with a smoker). Owners of budgies, canaries, finches, or parrots were OK. • Height: Lower risks for some cancers in short men, particularly colorectal cancer, and lower risks for this cancer and for breast cancer in short women. But being tall may confer some advantage for certain cancers (esophageal, endometrial and cervical), while tall men have only a slightly elevated risk for prostate, kidney and colon cancers. • Refrigerators: Seem to protect everyone from stomach cancer.
  • 12. • An extensive list of curious and questionable medical observations about various risk factors was given in: • Buchanan A.V., Weiss K.M., Fullerton S.M. • Dissecting complex disease: the quest for the Philosopher’s Stone? • International Journal of Epidemiology 2006. – Vol. 35. – P. 562–571
  • 13. Table of irreproducible results? • Hormone replacement therapy and heart disease • Hormone replacement therapy and cancer • Stress and stomach ulcers • Annual physical checkups and disease prevention • Behavioural disorders and their cause • Diagnostic mammography and cancer prevention • Breast self-exam and cancer prevention • Echinacea and colds • Vitamin C and colds • Baby aspirin and heart disease prevention • Dietary salt and hypertension • Dietary fat and heart disease • Dietary calcium and bone strength • Obesity and disease • Dietary fibre and colon cancer • The food pyramid and nutrient RDAs • Cholesterol and heart disease • Homocysteine and heart disease • Inflammation and heart disease • Olive oil and breast cancer • Fidgeting and obesity • Sun and cancer • Mercury and autism • Obstetric practice and schizophrenia • Mothering patterns and schizophrenia • Anything else and schizophrenia • Red wine (but not white, and not grape juice) and heart disease • Syphilis and genes • Mothering patterns and autism • Breast feeding and asthma • Bottle feeding and asthma • Anything and asthma • Power transformers and leukaemia • Nuclear power plants and leukaemia • Cell phones and brain tumours • Vitamin antioxidants and cancer, aging • HMOs and reduced health care cost • HMOs and healthier Americans • Genes and you name it! 13
  • 14. ‘Blood group mythology’: myths about AB0 • The human blood group system AB0 can serve as a classic example of unconfirmed associations with different conditions. • Several incredible phenomena were reported: • Persons with A have more severe hangovers; • Persons with B defecate the most; • Persons with 0 have healthier teeth; • Military personnel with 0 are spineless and those with B are more impulsive; • Persons with B are more prone to crime; • Strong connection between AB0 and nutrition; • Persons with A2 have the highest IQ; • A is significantly more common among members of the higher socio-economic groups. • None of these associations has been reproduced, and they are virtually forgotten.
  • 15. • Large companies in Japan still use blood types when advertising for, or evaluating, job applicants. • George Garratty • Association of Blood Groups and Disease: Do Blood Group Antigens and Antibodies Have a Biological Role? • History and Philosophy of the Life Sciences, 1996; Vol. 18, No. 3, The First Genetic Marker, p. 321-344. 15
  • 16. • Only the associations between AB0 blood groups and malignant neoplasms, thrombosis, peptic ulcers, bleeding, and bacterial and viral infections are still regarded as statistically “proven”. • Alas, these associations have no clinical (practical) importance because their odds ratios (OR) are low and do not exceed OR = 1.5.
  • 17. Associations between AB0 blood groups and diseases, which are still considered to be statistically “proven” • Table columns: Medical condition | A > 0 | 0 > A | B/AB > A/0 | OR • Malignancy: X; OR 1.2 – 1.3 • Thrombosis: X • Peptic ulcers: X; OR 1.2 – 1.4 • Bleeding: X; OR 1.5 • E. coli / Salmonella: X • Note that here we meet the extremely important issue of the clinical (or any other practical) importance (significance) of the observed associations. Here clinical importance is demonstrated with one of the measures of effect size, the odds ratio (OR).
  • 18. Begley C.G., Ellis L.M. Raise standards for preclinical cancer research. Nature, 2012; 483: 531-533. • Recently Glenn Begley, former vice president of the well-known biotech company Amgen, and his colleague Lee Ellis published the results of their efforts to replicate findings from recent publications in the clinical oncology literature. • The data were disturbing. • Of 53 papers, only 6 (11%) were reproducible. • Begley and Ellis state that the • poor reproducibility of the results becomes a systemic problem of modern science. • In one study, which was cited in a short period more than 1900 times, even the authors themselves later were unable to reproduce their own results. 18
  • 19. Increasing replication of un-reproducibility in science • Gautam Naik: Scientists' Elusive Goal: Reproducing Study Results. The Wall Street Journal, December 2, 2011. • This is one of medicine’s dirty secrets: • Most results, including those that appear in top-flight peer-reviewed journals, can’t be reproduced. 19
  • 20. Macleod M.R., Michie S., Roberts I., Dirnagl U., Chalmers I., Ioannidis J.P.A., Al-Shahi Salman R., Chan A.-W., Glasziou P. Biomedical research: increasing value, reducing waste. The Lancet, 2014, 383(9912): 101-104 • Of 1575 reports about cancer prognostic markers published in 2005, 1509 (96%) detailed at least one significant prognostic variable. • However, few identified biomarkers have been confirmed by subsequent research and few have entered routine clinical practice. • This pattern — initially promising findings not leading to improvements in health care — has been recorded across biomedical research. • So why is research that might transform health care and reduce health problems not being successfully produced? 20
  • 21. Ioannidis J.P.A. Why most published research findings are false. PLoS Med., 2005. – Vol. 2. – No. 8. – Paper: e124. Cited by 2174 21
  • 23. • PLOS ONE Launches Reproducibility Initiative • http://validation.scienceexchange.com/#/ • Reproducibility Initiative receives $1.3M grant to validate 50 landmark cancer studies • Reproducibility Project: Psychology • https://osf.io/ezcuj/wiki/home/ • Special Section on Replicability in Psychological Science • Perspectives on Psychological Science, 2012; 7(6): 528 –530 23
  • 24. • Journal of Negative Results in BioMedicine is an open access, peer-reviewed, online journal that provides a platform for the publication and discussion of unexpected, controversial, provocative and/or negative results in the context of current tenets. • Editor-in-Chief • Bjorn R Olsen, Harvard Medical School 24
  • 25. Challenges in irreproducible research • No research paper can ever be considered to be the final word, and the replication and corroboration of research results is key to the scientific process. • In studying complex entities, especially animals and human beings, the complexity of the system and of the techniques can all too easily lead to results that seem robust in the lab, and valid to editors and referees of journals, but which do not stand the test of further studies. • http://www.nature.com/nature/focus/reproducibility/index.html 25
  • 26. Statistics “A subject which most statisticians find difficult but in which nearly all physicians are expert.” 26
  • 27. • Statistical flaws are a major cause of irreproducible results in all types of biomedical experimentation. • These include errors in trial design, data analysis, and data interpretation. • “If experimentation is the Queen of the sciences, surely statistical methods must be regarded as the Guardian of the Royal Virtue.” • Myron Tribus (Letter to Science) 27
  • 28. Statistical Babel • Unfortunately, statisticians speak different languages, and often do not hear and/or do not understand each other. • Two main approaches to statistical inference are being developed: • Bayesian and • Frequentist • Frequentist inference is subdivided into two main branches: • Fisherian and • Neyman-Pearsonian • Users do not always differentiate between them, which leads to serious confusion. • Two other approaches also exist: Likelihood and Fiducial inference. • http://en.wikipedia.org/wiki/Frequentist_inference
  • 30. Fundamental statistics principles • Random sampling is the main principle of statistics. • Randomness and the Law of Large Numbers ensure the sample representativeness. • A sample is called representative if it reflects correctly the distribution from which the sample is taken. • The main objective of statistics consists in analyzing random samples to get conclusions on the distributions from which they are drawn. • Note that we do not need the term “population” which can be misleading. 30
  • 31. Statistics with confidence • Does Statistics enable us to trust it? • For instance, how can we check whether a die is perfect (fair, ideal, symmetric) or not? • The answer is provided by the Law of Large Numbers.
  • 32. Simulation of rolling a die: program SUStats http://www.jsc.nildram.co.uk/examples/sustats/diescore/DieScoreApplet.html A die was rolled 100 times in each of four independent simulations. Please answer three questions: 1. Are the results of the rolling reproducible (i.e. are the histograms similar)? - Yes - No 2. What form (shape) of the histogram and of the underlying distribution do we expect for the results of rolling a fair die? - Unimodal, bell-shaped - Triangular - Uniform (rectangular) 3. Can we state that the die is fair? - Yes - No
  • 33. Simulation of rolling a die: program SUStats http://www.jsc.nildram.co.uk/examples/sustats/diescore/DieScoreApplet.html A die was rolled 1 000 times in each of four independent simulations. Please answer two questions: 1. Are the results of the rolling reproducible (are the histograms similar)? - Yes - No 2. Can we state that the die is certainly fair? - Yes - No
  • 34. Simulation of rolling a die: program SUStats http://www.jsc.nildram.co.uk/examples/sustats/diescore/DieScoreApplet.html A die was rolled 10 000 times in each of four independent simulations. Please answer two questions: 1. Are the results of the rolling reproducible (are the histograms similar)? - Yes - No 2. Can we state that the die is certainly fair (the histograms are certainly rectangular and the entire distribution is uniform)? - Yes - No
  • 35. Simulation of rolling a die: program SUStats http://www.jsc.nildram.co.uk/examples/sustats/diescore/DieScoreApplet.html Please keep in mind the last number, n = 10 000, which gives reliable results. Such a sample size is difficult to achieve in biomedicine, but it is really reliable.
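The die-rolling slides can be imitated without the SUStats applet. The following is only a minimal sketch, assuming Python's standard pseudo-random generator is an adequate stand-in for a fair die; the function names and the seed are illustrative and are not taken from the slides. It rolls a simulated die n = 100, 1 000 and 10 000 times and reports how far the observed relative frequencies are from the expected 1/6.

```python
import random
from collections import Counter


def roll_frequencies(n, rng):
    """Roll a simulated fair die n times; return the relative frequency of each face."""
    counts = Counter(rng.randint(1, 6) for _ in range(n))
    return {face: counts[face] / n for face in range(1, 7)}


def max_deviation(freqs):
    """Largest absolute gap between an observed frequency and the expected 1/6."""
    return max(abs(f - 1 / 6) for f in freqs.values())


if __name__ == "__main__":
    rng = random.Random(2014)  # arbitrary seed, chosen only to make the run repeatable
    for n in (100, 1_000, 10_000):
        freqs = roll_frequencies(n, rng)
        row = "  ".join(f"{face}: {f:.3f}" for face, f in freqs.items())
        print(f"n = {n:>6}  {row}  max |f - 1/6| = {max_deviation(freqs):.4f}")
```

Running the loop several times with different seeds plays the role of the four independent simulations on each slide: the histograms for n = 100 typically differ visibly from run to run, while those for n = 10 000 are close to rectangular.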
  • 36. Lyrical digression • If one ponders it, it is the • Pauli exclusion principle • that provides a variety of forms • of matter at all levels, • from atoms to living beings, • e.g., genetic and phenotypic (biochemical, physiological, morphological) variations.
  • 37. Sample size “She thought that a smaller sample size makes for more accurate results” 37
  • 38. Sample sizes in physics, chemistry, biology and medicine • Physicists and chemists work with samples of different substances, which contain 6∙10²³ (the Avogadro constant) particles (atoms or molecules) in 1 mole of the pure substance. • Even 1 nanomole of a given substance contains about 10¹⁴ such particles. • These particles may be regarded as nearly identical. • However, we should not forget that even at the atomic level there are several isotopes of a given chemical element. • And some of them are radioactive. • In medicine, researchers are limited by the size of the world population, which is less than 10¹⁰, specifically about 7.257∙10⁹. • See real-time: http://www.worldometers.info/world-population/ • And the human population is extremely heterogeneous.
  • 39. Principal contradiction • All people are dissimilar, even monozygotic (“identical”) twins. • In such twins, differences in copy number variation (CNV), immunoglobulins, and fingerprints are observed. • Surely this fact is one of the main sources of the low reproducibility and predictive ability of results in biomedicine. • Thus, the genetic and phenotypic uniqueness of each person comes into contradiction with the statistical methodology, which requires analyzing large numbers (thousands or at least hundreds) of identical persons to reach certain conclusions.
  • 40. What is the Law of Large Numbers? • If the probability P(A) of an event A is constant in all trials, then the larger the number of trials n (experiments, sample size), • the closer the observed (empirical, experimental) relative frequency f(A) of a given outcome (event) A comes to its expected (theoretical) probability P(A), i.e. f(A) converges in probability to P(A): $f(A) \xrightarrow{P} P(A)$ as $n \to \infty$ • This means that the frequencies become more and more stable and their fluctuations become smaller and smaller. • Corollary: • Thus, we may not know the probability of an event A, but by repeating the trial as many times as possible, we can accept its observed frequency f(A) as a reliable statistical estimate of the unknown probability P(A)unkn. • Statistics helps us to know the unknown. • In Probability Theory probabilities are known; Statistics estimates them.
  • 41. “Reverse side” of the Law of Large Numbers • Along with the convergence of the frequency of an event A to its probability, the situation in which the frequency of the event coincides exactly with its probability, $f(A) = P(A)$, • becomes less probable, • i.e. the larger the number of trials, the closer the probability of such an exact match comes to zero: $\Pr[f(A) = P(A)] \to 0$ as $n \to \infty$
  • 42. Probability of the exact coincidence of the frequency f(A) with the probability P(A), e.g., fair coin tossing with P(A) = φ = 0.5 • f(A) = 5/10: P[f(A)] = 0.25 • f(A) = 50/100: P[f(A)] = 0.080 • f(A) = 500/1 000: P[f(A)] = 0.025 • f(A) = 5 000/10 000: P[f(A)] = 0.0080 • f(A) = 50 000/100 000: P[f(A)] = 0.0025 • f(A) = 500 000/1 000 000: P[f(A)] = 0.00080 • For the sake of clarity, the probability values are rounded to two significant figures.
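The probabilities on this slide follow from the binomial distribution: for a fair coin, the chance of exactly n/2 heads in n tosses is C(n, n/2) / 2^n. The sketch below is an illustration only (the function name is hypothetical, not from the slides); it evaluates the expression through logarithms of the gamma function so that n = 1 000 000 does not overflow.

```python
import math


def prob_exact_half(n):
    """P(exactly n/2 heads in n tosses of a fair coin), for even n.

    Computes C(n, n/2) / 2**n in log space to avoid huge intermediate numbers.
    """
    if n % 2 != 0:
        raise ValueError("n must be even for the frequency to equal 0.5 exactly")
    k = n // 2
    log_p = math.lgamma(n + 1) - 2 * math.lgamma(k + 1) - n * math.log(2)
    return math.exp(log_p)


for n in (10, 100, 1_000, 10_000, 100_000, 1_000_000):
    # Two significant figures, as on the slide
    print(f"{n // 2:>7}/{n:<9} {prob_exact_half(n):.2g}")
```

For large n the values behave like sqrt(2 / (πn)), which is why each hundredfold increase in n lowers the probability of an exact match tenfold.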
  • 43. Consequences of the Law of Large Numbers (LLN) • According to the Law of Large Numbers the larger the Sample Size, n • the “better” (more accurate, more reliable) the Sample data reflects the distribution of Random Variable from which the Sample is drawn. • Consequently, the larger the sample size, the more representative is the Sample. • This is true, however, if and only if (iff) the Sample data are the realizations of the independent identically distributed (iid) Random Variables. 43
  • 45. What are the main objectives of statistics? • Statistical Estimation (of the parameters) • Point and interval estimations • Statistical Inference – Testing Statistical Hypotheses – Comparison of Models • Statistical Associations • Correlation and Regression 45
  • 46. What is Estimator and what is Estimate? • An “Estimator“ is a statistic that is used to infer the value of an unknown parameter in a statistical model. • The parameter being estimated is sometimes called the estimand. • In other words, an estimator is a rule for calculating an estimate of a given quantity based on observed data: • thus the rule and its result (the estimate) are distinguished. 46
  • 47. Two main kinds of Statistical Estimates • Point Estimate – estimation by a single number. • Interval Estimate – estimation by an interval, which covers the value of the estimated parameter with a given probability called the confidence level.
  • 48. The main logic of Statistical Estimation: Point Estimates • Usually the parameter φunkn is unknown. • The objective is to estimate it on the basis of observed statistical data • x1, x2, …, xi, …, xn. • The above values are regarded as realizations of corresponding iid random variables: • X1, X2, …, Xi, …, Xn. • An appropriate function of these random variables is chosen as an Estimator for the unknown parameter. • Any such function is called a “Statistic”, and it is also a random variable. • Calculated values of a chosen Estimator are called Estimates. • An Estimate is regarded as a realization of the given Estimator.
  • 49. Compression of statistical information • One of the most widely used statistics is the sample mean, which serves as the Estimate of the mean value of the underlying distribution. • It is calculated as: $M = \frac{1}{n}\sum_{i=1}^{n} x_i$ • And it is generated by the Estimator: $\tilde{M} = \frac{1}{n}\sum_{i=1}^{n} \tilde{X}_i$ • Here the tilde “~” marks a random variable.
  • 50. Example 1 Intrauterine growth restriction (IUGR) and interferon IFN-α/β 50
  • 51. • Let us consider one of the most common problems of statistical analysis: the comparison of two independent samples.
  • 52. IUGR – intrauterine growth restriction (old name “intrauterine growth retardation”) • Foetuses with a birth weight below the 10th percentile of those born at the same gestational age • or • two standard deviations below the population mean are considered growth restricted. • Note that the definition is based on statistical terms: 10th percentile and/or standard deviations. • More strictly, IUGR should refer to foetuses that are small for gestational age and display other signs of chronic hypoxia or failure to thrive. • Approximately 3-5% of all pregnancies. • IUGR is also known as SGA (small for gestational age).
  • 53. A comparison between normal and IUGR babies (Dr. M.C. Bansal)
  • 55. Normal and IUGR placenta (Dr. M.C. Bansal) 55
  • 56. Levels of induced production of IFN-α/β in 16 healthy mothers of healthy newborns and in 20 mothers of newborns with IUGR (intrauterine growth restriction) (Koroleva L.I.). Data are ranked.
    Healthy (n = 16), IFN-α/β, IU/ml, ranks 1 to 16: 38, 42, 58, 59, 70, 71, 81, 86, 92, 93, 94, 101, 103, 115, 159, 170
    IUGR (n = 20), IFN-α/β, IU/ml, ranks 1 to 20: 104, 121, 123, 123, 127, 130, 132, 134, 134, 140, 144, 146, 147, 149, 151, 153, 162, 168, 171, 173
    Only three values in the healthy group (115, 159 and 170) overlap with the values in the IUGR group. The level of IFN-α/β in the IUGR group stochastically dominates that in the healthy group.
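As a worked instance of the sample-mean Estimate applied to these data, the sketch below computes the mean and the sample standard deviation (with the n − 1 denominator) of both groups using Python's standard `statistics` module. The variable names are illustrative; the choice of the n − 1 denominator is an assumption, made because it matches the Normal(mean, SD) parameters shown on the XLSTAT slide.

```python
from statistics import mean, stdev

# IFN-alpha/beta levels, IU/ml, as listed on the slide
healthy = [38, 42, 58, 59, 70, 71, 81, 86, 92, 93, 94, 101, 103, 115, 159, 170]
iugr = [104, 121, 123, 123, 127, 130, 132, 134, 134, 140,
        144, 146, 147, 149, 151, 153, 162, 168, 171, 173]

for name, sample in (("Healthy", healthy), ("IUGR", iugr)):
    # stdev() uses the n - 1 denominator (sample standard deviation)
    print(f"{name:8s} n = {len(sample):2d}  M = {mean(sample):.1f}  SD = {stdev(sample):.3f}")
```

The means come out as 89.5 and 141.6 IU/ml, a difference of about 52 IU/ml, with a noticeably larger spread in the healthy group.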
  • 57. Exploratory and Pictorial Statistics. Visualization of the initial data and their preliminary statistical descriptions: histograms, box plots, dominance diagrams, etc. 57
  • 58. Comparisons of histograms for the levels of induced production of IFN-α/β in 16 healthy mothers of healthy newborns and in 20 mothers of newborns with IUGR. Free program PAST http://folk.uio.no/ohammer/past
  • 59. Comparisons of histograms and cumulative sample distributions for the levels of induced production of IFN-α/β in 16 healthy mothers of healthy newborns and in 20 mothers of newborns with IUGR. Program XLSTAT http://www.xlstat.com • [Figure: panel “Cumulative distributions (Healthy / IUGR)”, cumulative relative frequency versus IFN-α/β, IU/mL; panel “Histograms (IFN-α/β, IU/mL)”, density versus IFN-α/β, IU/mL, with fitted curves Healthy: Normal(89.500, 36.471) and IUGR: Normal(141.600, 18.323).]
  • 60. CDF – cumulative distribution functions and stochastic dominance. Program XLSTAT http://www.xlstat.com • The level of induced IFN-α/β in IUGR patients (green line) stochastically dominates that for healthy mothers (blue line): • X2 > X1 • Stochastic: randomly determined; having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely. • [Figure: “Cumulative distributions (IUGR / Healthy)”, cumulative relative frequency versus IFN-α/β, IU/mL.]
  • 61. Box-and-Whisker plot Q1 – first quartile, Q3 – third quartile, IQR – interquartile range, σ – standard deviation. 61
  • 62. Box-and-whisker plot for the levels of induced production of IFN-α/β in 16 healthy mothers of healthy newborns and in 20 mothers of newborns with IUGR. Free program: Instat+ http://www.reading.ac.uk/ssc/n/n_instat.htm • Plot labels: marks for outliers; medians; 95% confidence limits for medians. • What did the Box Plot say to the outlier? "Don't you dare get close to my whisker!!"
  • 63. What is an outlier? • An outlier is an observation that is numerically distant from the rest of the data. • Outliers are often indicative of measurement (or registration) errors. • For example, if the value 1100 is registered for arterial blood pressure, this could be a misprint: either a 1 or a 0 is probably redundant. • Removing outlier(s) is a controversial practice recommended in several textbooks and manuals. • However, the possibility should be considered that the underlying distribution of the data is not approximately normal, having “fat (heavy) tails” or representing a mixture of two or more different distributions. • A mixture may comprise two identical distributions shifted relative to each other. • Thus, removal of outlier(s) has to be based on extra-statistical considerations. • “I'm not an outlier; I just haven't found my distribution yet!”
  • 64. Mixture analysis. Program PAST • Component 1: proportion 0.88, mean M = 78.8, standard deviation SD = 22.5 • Component 2: proportion 0.12, mean M = 164.5, standard deviation SD = 5.5 • Data in the healthy group can be regarded as a mixture of two normal distributions. Their proportions are 88% and 12%. The major component has a sample mean of about M = 79 IU/mL and a standard deviation of SD = 23 IU/mL. The minor component has M = 165 IU/mL and SD = 5.5 IU/mL. However, the sample size (n1 = 16) is too small to draw a definite conclusion.
  • 66. • Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals. Updated December 2013. • iii. Statistics • Describe statistical methods with enough detail to enable a knowledgeable reader with access to the original data to judge its appropriateness for the study and to verify the reported results. • When possible, quantify findings and present them with appropriate indicators of measurement error or uncertainty (such as confidence intervals). • Avoid relying solely on statistical hypothesis testing, such as P values, which fail to convey important information about effect size and precision of estimates. • http://www.icmje.org/recommendations/ • Prediction probabilities and prediction intervals should be added. 66
  • 67. • Over 300 medical and biomedical journals follow the ICMJE recommendations.
  • 68. Effect Size, ES • The clinical (practical) importance of the observed Effect Size (ES) is the key question when interpreting results of biomedical investigations (e.g., clinical trials). • Effect Size is defined as a quantitative reflection of the magnitude of some phenomenon that is used for the purpose of addressing a question of interest. • Kelley K., Preacher K.J. On Effect Size. Psychological Methods, 2012; 17(2): 137–152 • ES can be a difference between mean values, various kinds of ratios, a correlation, an association, etc. • ES can be expressed either in the real measurement units, or • as a standardized (nonmetric) quantity.
  • 69. • By analyzing samples we draw conclusions about the distributions from which they are drawn. • When comparing two independent distributions, the simplest and most useful measure of effect size is the AUC (or AUROC), the Area Under the (ROC) Curve, which is related to the Mann-Whitney U statistic. • One of its representations is the so-called dominance diagram.
  • 70. Healthy (n1 = 16): 170 159 115 103 101 94 93 92 86 81 71 70 59 58 42 38 • IUGR (n2 = 20): 104 121 123 123 127 130 132 134 134 140 144 146 147 149 151 153 162 168 171 173
  • 71. Dominance diagram Program XLSTAT http://www.xlstat.com [Figure: dominance diagram, Healthy versus IUGR] • Umin = 35 is the number of “plus” signs and Umax = 285 is the number of “minus” signs; obviously: Umin + Umax = 35 + 285 = n1 × n2 = 16 × 20 = 320
  • 72. • For two independent random variables X and Y, • Θ = P(Y > X) + 1/2 P(Y = X) • is advocated as a general measure of effect size to characterize the degree of separation (or, conversely, overlap) of their distributions. • It is estimated by the statistic • θ ≡ AUC = Umax / (n1 × n2), • derived by dividing the larger observed value Umax of the Mann–Whitney statistic by the product of the two sample sizes. • It is equivalent to the observed value of the AUC, the area under the receiver operating characteristic (ROC) curve. • It has been termed the ‘probability of concordance’, ‘common language effect size’ and ‘measure of stochastic superiority’.
  • 73. AUC - area under (ROC-) curve • In the given rectangular matrix the total number of cells is the product of the two sample sizes: • n1 × n2 = 20 × 16 = 320 • The observed maximum value of the two additive components of the Mann-Whitney U-statistic is the number of yellow cells in the matrix: • Umax = 285 • So the point estimate for AUC is: • AUC = Umax / (n1 × n2) = 285/320 = 0.89
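The pair counting behind Umin, Umax and the AUC can be sketched in a few lines of Python. This is a minimal illustration, not the XLSTAT procedure: the two lists are the values transcribed from the slides, and the function name `mann_whitney_counts` is a hypothetical helper introduced here. Ties, if any, are assumed to count as half a point for each group.

```python
# Data transcribed from the slides: induced IFN-alpha/beta levels, IU/mL.
healthy = [170, 159, 115, 103, 101, 94, 93, 92, 86, 81, 71, 70, 59, 58, 42, 38]
iugr = [104, 121, 123, 123, 127, 130, 132, 134, 134, 140,
        144, 146, 147, 149, 151, 153, 162, 168, 171, 173]


def mann_whitney_counts(x, y):
    """Return (u_x, u_y): the number of (x, y) pairs won by x and by y.

    Hypothetical helper; a tied pair is assumed to give half a point to each side.
    """
    u_y = 0.0
    for xi in x:
        for yj in y:
            if yj > xi:
                u_y += 1.0
            elif yj == xi:
                u_y += 0.5
    u_x = len(x) * len(y) - u_y
    return u_x, u_y


u_healthy, u_iugr = mann_whitney_counts(healthy, iugr)
u_max = max(u_healthy, u_iugr)
auc = u_max / (len(healthy) * len(iugr))
print(u_healthy, u_iugr, round(auc, 2))
```

With these data the counts reproduce the values on the slides: 35 pairs in which the healthy value is larger, 285 pairs in which the IUGR value is larger, and AUC = 285/320.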
  • 74. Interval estimation Researchers should wherever possible, base discussion and interpretation of results on point and interval estimates 74
  • 75. What is Confidence Interval? • Frequentist’s Confidence Interval is a random interval that covers the estimated (unknown) value of a given Parameter with the specified probability. • Such probability is called confidence level (or confidence coefficient). 75
  • 76. CI • If the experiment is repeated several times, the observed limits of the Confidence Interval calculated from the observations will vary from sample to sample. • With probability (1 – α) the interval will include (cover) the unknown value of the parameter being estimated, but with probability α it will inevitably miss it. • How frequently the observed interval contains the parameter is determined by the confidence level (or confidence coefficient). • The confidence level is chosen by the researcher in accordance with his intuition.
  • 77. Frequentist’s Confidence Interval (CI) • $P(\tilde{\theta}_{lower} \le \theta_{unkn} \le \tilde{\theta}_{upper}) = 1 - \alpha$ • $P(\theta_{unkn} < \tilde{\theta}_{lower}) = \alpha/2$ • $P(\theta_{unkn} > \tilde{\theta}_{upper}) = \alpha/2$
  • 78. The meaning of the Confidence Level • The meaning of the term “confidence level” is that, if confidence intervals are constructed across many separate data analyses of repeated (and possibly different) experiments, the proportion of such intervals that contain the true value of the parameter will approximately match the confidence level. • So, e.g., the 95% does not attach to any one frequentist CI; it attaches to “the proportion of such intervals”. • When only a single CI is obtained, it is unknown whether it covers the true value or not. • Again, we come to the conclusion that the experiment needs to be repeated many times.
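The long-run reading of the confidence level can be checked with a small simulation. The sketch below rests on assumptions that are not in the slides: normally distributed data with a known standard deviation (so that a simple z-interval applies), arbitrary illustrative values for the true mean, sigma and sample size, and a fixed random seed. The function name `coverage` is hypothetical.

```python
import random
from statistics import NormalDist, mean


def coverage(n=20, mu=100.0, sigma=15.0, level=0.95, repeats=2000, seed=1):
    """Fraction of simulated z-intervals that cover the true mean mu.

    All default values are illustrative assumptions, not data from the slides.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    half_width = z * sigma / n ** 0.5          # known-sigma interval half-width
    hits = 0
    for _ in range(repeats):
        sample_mean = mean(rng.gauss(mu, sigma) for _ in range(n))
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / repeats


observed_coverage = coverage()
print(observed_coverage)
```

The printed proportion should lie close to 0.95, while any single simulated interval either covers the true mean or misses it.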
  • 79. Bayesian confidence (credible) interval • $P(\theta_L \le \tilde{\theta} \le \theta_U) = 1 - \alpha$ • $P(\tilde{\theta} < \theta_L) = \alpha/2$ • $P(\tilde{\theta} > \theta_U) = \alpha/2$
  • 80. Significance Level α and Confidence Level (1 – α) • Significance level α = 0.05: confidence level (1 – α) = 95%, reliability Low • α = 0.01: confidence level 99%, reliability Medium • α = 0.001: confidence level 99.9%, reliability High
  • 81. Confidence interval and statistical significance • The 100(1 – α)% CI for the unknown value θunkn is compared with the expected value of θ. • If the CI covers the expected value: the unknown value θunkn estimated by the given interval does not differ statistically from the expected value θ. • If the CI lies entirely above the expected value: the unknown value θunkn estimated by the given interval is statistically significantly larger than the expected value θ at the significance level α. • If the CI lies entirely below the expected value: the unknown value θunkn estimated by the given interval is statistically significantly smaller than the expected value θ at the significance level α.
  • 82. Statistical significance and practical (clinical) importance [Figure: confidence intervals (CI) plotted against the expected “null” value and the clinically indifferent zone or reference interval] • Estimated unknown difference is statistically nonsignificant and clinically unimportant. • CI is too wide; perhaps the sample size is too small. • Estimated unknown difference is statistically significant, but clinically unimportant. • Estimated unknown difference is statistically significant and clinically important.
  • 83. Compact form for the joint presentation of point and interval estimations • Example: – AUC point estimate: 0.89 – Lower limit of the 95% CI: 0.72 – Upper limit of the 95% CI: 0.96 • Compact record (lower and upper limits written as subscripts around the point estimate): • AUC ≡ θ = ${}_{0.72}\,0.89\,{}_{0.96}$ • Louis T.A., Zeger S.L. Effective communication of standard errors and confidence intervals. Biostatistics, 2009; 10(1): 1–2. • Newcombe’s spreadsheet: GENERALISEDMW.XLS http://medicine.cf.ac.uk/primary-care-public-health/resources/
  • 84. Statistical inference using confidence interval • The obtained 95% confidence interval (CI) does not cover the indifferent value AUCindiff = 0.5. • This means that the unknown value AUCunkn estimated with this interval differs statistically significantly from the indifferent value AUCindiff = 0.5 (at the significance level α = 0.05). • Consequently, we can conclude that one of the two compared random variables stochastically dominates the other. • When the shapes of both distributions are similar, we can interpret this result as a statistically significant deviation of the estimated Hodges-Lehmann shift parameter from its indifferent value ΔHLindiff = 0.
  • 85. • Strictly speaking, the widespread interpretation of the Mann-Whitney U-statistic as a measure of the difference between the medians of the two compared distributions is incorrect. • The Mann-Whitney statistic is a measure of stochastic dominance of one of two independent distributions (not of their medians). • When the shapes of both distributions are similar, the Mann-Whitney statistic becomes the basis for estimating the Hodges-Lehmann shift parameter.
  • 86. Matrix of pairwise differences (IUGR value minus Healthy value). Header row: Healthy values; first column: IUGR values. Marked cells: HL = Hodges-Lehmann estimate; l95/u95, l99/u99, l999/u999 = lower/upper limits of the 95%, 99% and 99.9% confidence intervals.
      170 159 115 103 101 94 93 92 86 81 71 70 59 58 42 38
  104 -66 -55 -11 1 3 10 11 12 18 23 33 34 45 46 62 66
  121 -49 -38 6 18 20 27 28 29 35 40 50 51 62 63 79 83
  123 -47 -36 8 20 22 29 30 31 37 42 52 53 64 65 81 85
  123 -47 -36 8 20 l999=22 29 30 31 37 42 52 53 64 65 81 85
  127 -43 -32 12 24 26 33 34 35 41 46 56 57 68 69 85 89
  130 -40 -29 15 27 29 36 37 38 44 49 59 60 71 72 88 92
  132 -38 -27 17 29 l99=31 l95=38 39 40 46 51 61 62 73 u95=74 90 94
  134 -36 -25 19 31 33 40 41 42 48 53 63 64 75 76 92 96
  134 -36 -25 19 31 33 40 41 42 48 53 63 64 75 76 92 96
  140 -30 -19 25 37 39 46 47 48 54 59 69 70 81 82 98 102
  144 -26 -15 29 41 43 50 51 52 58 63 73 74 85 86 102 106
  146 -24 -13 31 43 45 52 53 54 60 65 75 76 u999=87 88 104 108
  147 -23 -12 32 44 46 53 54 55 61 66 76 77 88 89 105 109
  149 -21 -10 34 46 48 55 HL=56 57 63 68 78 79 90 91 107 111
  151 -19 -8 36 48 50 57 58 59 65 70 80 81 92 93 109 113
  153 -17 -6 38 50 52 59 60 61 67 72 82 83 94 95 111 115
  162 -8 3 47 59 61 68 69 70 76 81 91 92 103 104 120 124
  168 -2 9 53 65 67 74 75 76 82 87 97 98 109 110 126 130
  171 1 12 56 68 70 77 78 u99=79 85 90 100 101 112 113 129 133
  173 3 14 58 70 72 79 80 81 87 92 102 103 114 115 131 135
  • 87. Applying the nonparametric confidence interval for the shift parameter to the comparison of the induced production of IFN-α/β in the healthy group and the group with IUGR. Program StatXact http://www.cytel.com/software-solutions/statxact • The resulting nonparametric Hodges-Lehmann point and interval estimates of the shift parameter are: • ΔHL = ${}_{38}\,56\,{}_{74}$ IU/mL • This 95% confidence interval doesn’t cover the indifferent value of the shift Δindiff = 0. • So the unknown value of the shift Δunkn estimated with this interval differs statistically significantly from 0 at the significance level α = 0.05. • Therefore the induced production of IFN-α/β in the IUGR group is statistically significantly higher than in the healthy group.
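The Hodges-Lehmann point estimate of the shift is the median of all n1 × n2 pairwise differences. A minimal sketch follows; it covers only the point estimate, not the exact confidence limits that StatXact reports, and the function name `hodges_lehmann_shift` is a hypothetical helper. The data are the values transcribed from the slides.

```python
from statistics import median

# Data transcribed from the slides: induced IFN-alpha/beta levels, IU/mL.
healthy = [170, 159, 115, 103, 101, 94, 93, 92, 86, 81, 71, 70, 59, 58, 42, 38]
iugr = [104, 121, 123, 123, 127, 130, 132, 134, 134, 140,
        144, 146, 147, 149, 151, 153, 162, 168, 171, 173]


def hodges_lehmann_shift(x, y):
    """Median of all pairwise differences y_j - x_i (two-sample shift estimate)."""
    return median(yj - xi for xi in x for yj in y)


hl = hodges_lehmann_shift(healthy, iugr)
print(hl)
```

For these data the median of the 320 differences equals 56 IU/mL, the point estimate quoted on the slide.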
  • 88. Applying the parametric confidence interval for the mean difference to the comparison of the induced production of IFN-α/β in the healthy group and the group with IUGR. Free Program ESCI JSMS.xls http://www.latrobe.edu.au/psy/esci/ • Parametric point and interval estimates of the difference of two means are: • Δ = ${}_{33}\,52\,{}_{71}$ IU/mL • This 95% confidence interval doesn’t cover the indifferent value Δindiff = 0. • So the unknown value of the difference Δunkn estimated with this interval differs statistically significantly from 0 at the significance level α = 0.05. • Therefore the induced production of IFN-α/β in the IUGR group is statistically significantly higher than in the healthy group. • ES ≡ Δ = ${}_{33.1}\,52.1\,{}_{71.0}$ IU/mL; dC = 1.87; Student t = 5.58
  • 89. Visualization of the comparison of two means using the confidence interval for the mean difference Free Program ESCI JSMS.xls http://www.latrobe.edu.au/psy/esci/ • The presented 95% confidence interval (rose triangle and vertical segment) for the mean difference doesn’t cover the indifferent value Δindiff = 0. • So the unknown value of the difference Δunkn estimated with this interval differs statistically significantly from 0 at the significance level α = 0.05. • Therefore the induced production of IFN-α/β in the IUGR group is statistically significantly higher than in the healthy group. • Blue circles are observed values. Black dots and vertical segments are point and interval estimates of the unknown means. Rose triangle and vertical segment are estimates of their unknown difference.
  • 90. Newcombe’s standardized effect size: δN or StAUC • When σ1 = σ2 = σ, θ reduces to • Φ(δN /√2), • where δN is expressed in units of the standard deviation σ. • Here Φ is the common notation for the CDF (cumulative distribution function) of the standard Gaussian (normal) distribution. • θ is preferable to δN, as it depends less on distributional assumptions and is thus more satisfactory than the standardized difference.
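The relation θ = Φ(δN/√2) and its inverse can be evaluated directly with the Python standard library. This is a sketch under the equal-variance normal assumption stated on the slide; the function names `auc_from_delta` and `delta_from_auc` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian, mean 0 and SD 1


def auc_from_delta(delta):
    """theta = Phi(delta / sqrt(2)), assuming equal-variance normal distributions."""
    return _STD_NORMAL.cdf(delta / sqrt(2))


def delta_from_auc(auc):
    """Inverse relation: delta = sqrt(2) * Phi^{-1}(theta), for 0 < theta < 1."""
    return sqrt(2) * _STD_NORMAL.inv_cdf(auc)


for delta in (0.0, 0.5, 1.0, 2.0):
    print(delta, round(auc_from_delta(delta), 2))
print(round(delta_from_auc(0.89), 1))
```

The conversion gives about 0.76 for δN = 1 and about 1.7 for θ = 0.89, in line with the values used elsewhere in the slides.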
  • 91. Interrelationship between AUC and StAUC
  AUC ≡ θ | StAUC ≡ δN | Size | StAUC ≡ δN | AUC ≡ θ
  0.5 | 0 | | 0 | 0.50
  0.55 | 0.18 | XS extra-small | 0.25 | 0.57
  0.6 | 0.36 | | 0.5 | 0.64
  0.65 | 0.55 | S small | 0.75 | 0.70
  0.7 | 0.74 | | 1 | 0.76
  0.75 | 0.95 | M medium | 1.25 | 0.81
  0.8 | 1.2 | | 1.5 | 0.86
  0.85 | 1.5 | L large | 1.75 | 0.89
  0.9 | 1.8 | | 2 | 0.92
  0.95 | 2.3 | XL extra-large | 2.5 | 0.96
  0.99 | 3.3 | | 3 | 0.98
  0.999 | 4.4 | XXL extra-extra-large | 3.5 | 0.993
  | | | 4 | 0.998
  • 92. Standardized Cohen’s effect size, StES ≡ dC • $d_C = \dfrac{M_1 - M_2}{s_{pooled}}$
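Cohen's standardized difference can be computed from the raw data as the mean difference divided by the pooled standard deviation. The sketch below assumes the usual pooled-variance definition with n1 + n2 − 2 degrees of freedom; the function name `cohens_d` is hypothetical, and the data are the values transcribed from the slides.

```python
from math import sqrt
from statistics import mean, variance

# Data transcribed from the slides: induced IFN-alpha/beta levels, IU/mL.
healthy = [170, 159, 115, 103, 101, 94, 93, 92, 86, 81, 71, 70, 59, 58, 42, 38]
iugr = [104, 121, 123, 123, 127, 130, 132, 134, 134, 140,
        144, 146, 147, 149, 151, 153, 162, 168, 171, 173]


def cohens_d(group1, group2):
    """(M1 - M2) / s_pooled, with the pooled SD on n1 + n2 - 2 degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) +
                  (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)


d_c = cohens_d(iugr, healthy)
print(round(mean(iugr) - mean(healthy), 1), round(d_c, 2))
```

With IUGR taken as the first group, the mean difference is about 52.1 IU/mL and dC is about 1.87, matching the values reported from ESCI on the slides.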
  • 93. Standardized effect size (mean difference), StES ≡ dC: what it looks like
  • 94. Verbal scale for the interpretation of the standardized Cohen’s effect size • Standardized Cohen’s effect size, dC: interpretation • 0 – 0.5: Negligibly small (worthless) • 0.5 – 1.0: Small (weak) • 1.0 – 1.5: Moderate • 1.5 – 2.0: Large (strong) • 2.0 – 3.0: Very large (very strong) • 3.0 – ∞: Extremely large
  • 95. Once more: Statistical significance and the Effect size • An effect (difference, association, correlation, risk, benefit, etc.) can be statistically significant, and yet its practical (e.g., clinical) importance can turn out to be negligible. • “Statistically significant” does not imply “substantial”, “practically important”, “valuable”. • Effects can be real, nonrandom, but nonetheless negligibly small.
  • 96. Confidence interval for the Standardized Cohen’s Effect Size dC. Free Program LePrep http://www.univ-rouen.fr/LMRS/Persopage/Lecoutre/PAC.htm 96
  • 97. Results: point estimates and 95% confidence intervals for the three main effect sizes • AUC – area under the ROC-curve: • AUC = ${}_{0.72}\,0.89\,{}_{0.96}$ • StAUC – Newcombe’s standardized AUC: • StAUC = δN = ${}_{0.8}\,1.7\,{}_{2.5}$ • StES – Cohen’s standardized difference of means: • StES = dC = ${}_{1.1}\,1.9\,{}_{2.7}$ • Verbal interpretation: • with probability 95% the estimated unknown effect sizes can be interpreted as from medium to very large (strong).
  • 98. Statistical predictions and reproducibility “Prediction is very difficult, especially about the future” 98
  • 99. Repeat! • Often it is believed that if the “statistically significant” result is obtained, this excludes the need of repeating the experiment. • Testing the significance of statistical hypotheses is a method to detect rare events which deserve further investigation. • Fisher 99
  • 100. Cumming G. The New Statistics: Why and How. Psychological Science, 2014; 25(1): 7–29. • Three problems are central: • Published research is a biased selection of all research; • data analysis and reporting are often selective and biased; and • in many research fields, studies are rarely replicated, so false conclusions persist.
  • 101. Replication • A single study is rarely, if ever, definitive; additional related evidence is required. • Such evidence may come from a close replication, which, with meta-analysis, should give more reliable estimates than the original study. • A more general replication may increase reliability and also provide evidence of generality or robustness of the original finding. • We need increased recognition of the value of both close and more general replications, and greater opportunities to report them.
  • 102. Reproducibility and predictive ability of P-values and confidence intervals (n = 32). CI dance. Free program “ESCI PPS p intervals” http://www.latrobe.edu.au/psy/esci/. Cumming G. Replication and p intervals: p values predict the future only vaguely, but confidence intervals do much better. Persp. Psychol. Sci., 2008; 3: 286-300. 102
  • 103. • Thus, it is risky to reach a definite conclusion from a single experiment only. • Any scientific investigation should be repeated many times. • And the reproducibility of the results must be studied.
  • 104. Gigerenzer G. We need statistical thinking, not rituals. Behavioral and Brain Sciences, 1998; 21(2): 199-200 • A researcher cannot be unconcerned about: • “what would happen if additional subjects were to be included into the experiment?”, • “what would be the conclusion for the data of these future subjects?”, • “what would be the conclusion for the whole data?”, or • “what would happen if this experiment were to be repeated?” • Asking and answering such questions goes beyond the ritualized statistical procedures, and is likely to influence the way the authors of scientific papers interpret experimental findings and conduct their experiments. • Prediction probabilities are an unavoidable part of statistical thinking and the time has come to take them seriously.
  • 105. Prediction and confidence intervals. Program Instat+ http://www.reading.ac.uk/ssc/n/n_instat.htm 105
  • 106. Reproducibility of the absolute effect size ES for the healthy and IUGR groups at α = 0.05 and (1 – α) = 0.95 • The 95% confidence interval for ES ≡ Δ is from 33 to 71 IU/mL; the 95% prediction interval for it is wider: from 25 to 78 IU/mL.
  • 107. 10-fold increase in sample size • If we repeat the experiment 10 times independently, the prediction interval becomes narrower and closer to the confidence interval.
  • 108. Prediction interval versus confidence interval • Note that under 10-fold repetition of the experiment the 95% prediction interval becomes closer to the observed 95% confidence interval. • This demonstrates the meaning of the confidence interval as the one that covers the estimated effect size under manifold (infinite) repetitions of the experiment.
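Why the prediction interval is wider than the confidence interval, and why it shrinks toward it as the replication grows, can be illustrated with a simple normal-theory sketch. This is not the LePrep procedure used for the slides; it assumes that the mean difference of a replication with k times the original sample size varies around the observed value with standard error SE × √(1 + 1/k). The standard error is derived from the slide values Δ = 52.1 and t = 5.58, and the critical value 2.032 for df = 34 is hard-coded as an assumption. The names `interval` and `prediction_interval` are hypothetical.

```python
from math import sqrt

mean_diff = 52.1        # IU/mL, observed mean difference from the slides
t_observed = 5.58       # Student t from the slides
T_CRIT_95_DF34 = 2.032  # assumed two-sided 95% Student t quantile, df = 34

se = mean_diff / t_observed  # standard error of the observed mean difference


def interval(center, half_width):
    """Symmetric interval around center."""
    return (center - half_width, center + half_width)


def prediction_interval(center, std_err, t_crit, size_ratio=1.0):
    """Interval for the estimate of a replication size_ratio times as large.

    Assumed model: replication error and original error add, so the standard
    error is inflated by sqrt(1 + 1 / size_ratio).
    """
    return interval(center, t_crit * std_err * sqrt(1 + 1 / size_ratio))


ci = interval(mean_diff, T_CRIT_95_DF34 * se)
pi_same = prediction_interval(mean_diff, se, T_CRIT_95_DF34)
pi_10x = prediction_interval(mean_diff, se, T_CRIT_95_DF34, size_ratio=10)
for name, (low, high) in (("CI", ci), ("PI, same size", pi_same), ("PI, 10x", pi_10x)):
    print(f"{name}: {low:.1f} to {high:.1f} IU/mL")
```

Under these assumptions the confidence interval is roughly 33 to 71 IU/mL and the same-size prediction interval roughly 25 to 79 IU/mL, close to the intervals on the slides, while the interval for a 10-fold larger replication is only slightly wider than the confidence interval.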
  • 109. Reproducibility of the standardized Cohen’s effect size dC for the healthy and IUGR groups at α = 0.05 and (1 – α) = 0.95 • The 95% confidence interval for StES ≡ dC is from 1.1 to 2.7; the 95% prediction interval for it is wider: from 0.8 to 3.1 (dC is a standardized, unitless quantity).
  • 110. 10-fold increase in sample size • If we repeat the experiment 10 times independently, the prediction interval becomes narrower and closer to the confidence interval.
  • 111. Prediction probabilities, Prep, Psrep and Preprep 111 Probability of a same-sign effect is Prep = 1.0; of a same-sign and significant at α = 0.05 is Psrep = 0.99 and of a same-sign effect with Prep = 0.99 is Preprep = 0.98.
  • 112. Reproducibility of the P-value when comparing healthy and IUGR groups at α = 0.05 and (1 – α) = 0.95 • Observed Pval = 3∙10⁻⁶. The 95% prediction interval for it ranges from the extremely small 3∙10⁻¹¹ to the moderate 0.01.
  • 113. Probabilities of replication and prediction intervals • Thus, it is predicted that when our experiment is repeated, the probability of obtaining the same sign for the mean difference (expressed as the absolute effect size ES as well as Cohen’s standardized effect size dC) will be • Prep = 1.00. • And the probability of obtaining a difference of the same sign that is statistically significant at the level α = 0.05 will be • Psrep = 0.99. • Moreover, it is predicted that in a future repetition of the experiment the P-value could lie in a very wide 95% prediction interval, from very low to moderate: • Pval = 3∙10⁻¹¹ to Pval = 0.01.
  • 114. Main statistical tools and their destination • Bayes Factor (BF) → comparing statistical models and/or hypotheses • P-value → statistical hypothesis testing • Effect Size (ES) → practical (clinical) importance • Confidence intervals (CI) → visualization of both, the estimates and the hypotheses testing • Prediction Intervals (PI) → prediction of future repetitions 114
  • 115. Bayes theorem in action: connecting prior and posterior probabilities 115
  • 116. Reverend Thomas Bayes (c. 1702 – April 17, 1761) 116
  • 117. Bayes Factor • The Bayes factor differs in principle from the P-value (Pval). • The Bayes factor is not a probability in itself, but a ratio of probabilities, and it can vary from zero to infinity: • BF01 = P(Dobs|H0) / P(Dobs|H1) • BF10 = P(Dobs|H1) / P(Dobs|H0) • This means that the Bayes factor provides not only a test of the significance of the null hypothesis, but also a comparison of the probabilities of obtaining the observed data under both hypotheses. • However, for this we should have a better idea of the alternative hypothesis.
  • 118. Amazing property of Bayes factor in terms of “odds” 118
  • 119. What are the odds? • The odds (in favor) of an event A is the ratio of the probability that the event will happen P(A) to the probability that the event will not happen P(Ā): • O(A) = P(A) : P(Ā) = P(A) : [1 – P(A)] • Conversely, the odds against an event A is the opposite ratio. • Such a representation of the probability is familiar to geneticists. • Famous Mendel’s ratio of 3 : 1 is a representation of the probabilities 3/4 and 1/4 in terms of odds. 119
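The conversion between probabilities and odds is a one-line formula in each direction. A minimal sketch, with hypothetical function names `odds_in_favor` and `probability_from_odds`, using Mendel's ratio from the slide as the example:

```python
def odds_in_favor(probability):
    """O(A) = P(A) / (1 - P(A)); defined for 0 <= P(A) < 1."""
    if not 0 <= probability < 1:
        raise ValueError("probability must satisfy 0 <= p < 1")
    return probability / (1 - probability)


def probability_from_odds(odds):
    """Inverse conversion: P(A) = O(A) / (1 + O(A)), for non-negative odds."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


mendel_odds = odds_in_favor(0.75)  # probability 3/4 expressed as odds of 3 : 1
print(mendel_odds, probability_from_odds(mendel_odds))
```

The odds against an event are simply the reciprocal, `1 / odds_in_favor(p)`, whenever p is greater than zero.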
  • 120. Bayes factor BF in terms of odds • The Bayes factor not only shows how many times the probability P(Dobs|H0) differs from the probability P(Dobs|H1). • It also shows how many times the posterior odds in favor of one hypothesis against the other (alternative) differ from their prior odds: • $BF_{10} = \dfrac{P(D_{obs} \mid H_1)}{P(D_{obs} \mid H_0)} = \dfrac{P(H_1 \mid D_{obs})}{P(H_0 \mid D_{obs})} : \dfrac{P(H_1)}{P(H_0)}$ • Conversely, • BF01 = 1/BF10 • Thus, we observe an amazing property of the Bayes factor: • without knowing the prior and posterior probabilities of both hypotheses, we can quantitatively compare their odds.
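The updating rule "posterior odds = Bayes factor × prior odds" can be sketched as follows. The value BF10 = 5555 is the one reported in the slides for the IFN example; the prior probabilities 0.5 and 0.01 are purely illustrative assumptions, and the function names are hypothetical.

```python
def posterior_odds(bayes_factor_10, prior_odds_10):
    """Posterior odds of H1 against H0 = BF10 * prior odds of H1 against H0."""
    return bayes_factor_10 * prior_odds_10


def posterior_probability_h1(bayes_factor_10, prior_probability_h1=0.5):
    """Posterior P(H1 | data), assuming H0 and H1 are the only two hypotheses."""
    prior_odds = prior_probability_h1 / (1 - prior_probability_h1)
    post_odds = posterior_odds(bayes_factor_10, prior_odds)
    return post_odds / (1 + post_odds)


bf10 = 5555  # Bayes factor reported in the slides for the IFN example
p_equal_prior = posterior_probability_h1(bf10, prior_probability_h1=0.5)
p_sceptical_prior = posterior_probability_h1(bf10, prior_probability_h1=0.01)
print(round(p_equal_prior, 5), round(p_sceptical_prior, 3))
```

The same Bayes factor multiplies whatever prior odds a reader holds, which is why it can be reported without committing to any particular prior.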
  • 121. Interpretation of credibility of Bayes factors BF10 and BF01 • BF01: evidence in favor of hypothesis H0 against hypothesis H1; BF10: evidence in favor of hypothesis H1 against hypothesis H0 • > 10 000: Convincing • 100 – 1 000: Very strong • 30 – 100: Strong • 10 – 30: Moderate • 3 – 10: Weak • 1 – 3: Negligible
  • 122. John Arbuthnot 29.04.1667 – 27.02.1735 122
  • 123. Number of Christened in London during 82 years Year Boys Girls Year Boys Girls 1629 5218 > 4683 1650 2890 > 2722 1630 4858 > 4457 3231 > 2840 4422 > 4102 3220 > 2908 4994 > 4590 3196 > 2959 5158 > 4839 3441 > 3179 5035 > 4820 3655 > 3349 5106 > 4928 3668 > 3382 4917 > 4605 3396 > 3289 4703 > 4457 3157 > 3013 5359 > 4952 3209 > 2781 5366 > 4784 1660 3724 > 3247 1640 5518 > 5332 4748 > 4107 5470 > 5200 5216 > 4803 5460 > 4910 5411 > 4881 4793 > 4617 6041 > 5881 4107 > 3997 5114 > 4858 4047 > 3919 4678 > 4319 3768 > 3395 5616 > 5322 3796 > 3536 6073 > 5560 3363 > 3181 1669 6506 > 5829 1649 3079 > 2746 Year Boys Girls Year Boys Girls 1670 6278 > 5719 1691 7662 > 7392 6449 > 6061 7602 > 7316 6443 > 6120 7676 > 7483 6073 > 5822 6985 > 6647 6113 > 5738 7263 > 6713 6058 > 5717 7632 > 7229 6552 > 5847 8062 > 7767 6423 > 6203 8426 > 7626 6568 > 6033 7911 > 7452 6247 > 6041 1700 7578 > 7061 1680 6548 > 6299 8102 > 7514 6822 > 6533 8031 > 7656 6909 > 6744 7765 > 7683 7577 > 7158 6113 > 5738 7575 > 7127 8366 > 7779 7484 > 7246 7952 > 7417 7575 > 7119 8379 > 7687 7737 > 7214 8239 > 7623 7487 > 7101 7840 > 7380 7604 > 7167 1710 7640 > 7288 1690 7909 > 7302 • Total 484 382 > 454 041 • Total sum 938 423 123
  • 124. Comparison of the frequentist and Bayesian results • Testing homogeneity (independence) of the Arbuthnot data results in: • Pval ≈ 10⁻⁸ • BF01 = 8∙10¹¹⁷ • From the frequentist point of view the heterogeneity of the Arbuthnot data is statistically highly significant. • From the Bayesian point of view the conclusion is diametrically opposite: • Obtaining such data is 8∙10¹¹⁷ times more likely under the hypothesis H0 of their homogeneity than under the alternative hypothesis H1 of their heterogeneity. • Or: • The posterior odds in favor of the null hypothesis against the alternative hypothesis are 8∙10¹¹⁷ times higher than their prior odds.
  • 125. Bayes Factor, online program Bayes Factor Calculators http://pcl.missouri.edu/bayesfactor 125
  • 126. Output • BF01 = 0.00018 and • BF10 = 1/BF01 = 5555.5 • It is 5555 times more likely to obtain the value of the Student t-test statistic t = 5.58 with df = 34 under H1: Δ ≠ 0 than under H0: Δ = 0. • According to the verbal scale such a value of BF10 is interpreted as convincing evidence in favor of H1 against H0.
  • 127. Summary • Statistical evidences: • AUC ≡ θ = ${}_{0.72}\,0.89\,{}_{0.96}$ • StAUC ≡ δN = ${}_{0.8}\,1.7\,{}_{2.5}$ • StES ≡ dC = ${}_{1.1}\,1.9\,{}_{2.7}$ • ΔHL = ${}_{38}\,56\,{}_{74}$ IU/mL • Δ = ${}_{33}\,52\,{}_{71}$ IU/mL • BF10 = 5555 • Pval = 3∙10⁻⁶ • Statistical predictions: • 95% prediction intervals: • dC: from 0.8 to 3.1 • Δ: from 25 to 79 IU/mL • Pval: from 3∙10⁻¹¹ to 0.010 • Probability of replication: • Psrep = 0.99
  • 128. Example 2 TGT – Thrombin Generation Test 128
  • 129. Castoldi E., Rosing J. Thrombin generation tests. Thrombosis Research, 2011; 127(Suppl. 3): S21–S25 • Parameters of the thrombin generation curve: • LT – lag time, min • TTP – time to peak, min • PT – peak thrombin, nM • ETP – endogenous thrombin potential, nM∙min • V – maximum velocity of thrombin generation, V = PT / (TTP – LT), nM/min 129
  • 130. Estimation of parameters of TGT, results of traditional NHST and effect sizes. n1 = 40, n2 = 53. Triples are lower confidence limit, point estimate, upper confidence limit.
  Parameter | LT, min | ETP, nM∙min | TTP, min | PT, nM | V, nM/min
  RI | 8.0 – 27.4 | 1290 – 2480 | 17 – 41 | 85 – 192 | 5.3 – 25.4
  M1 | 14 16 17 | 1820 1900 1990 | 25 27 28 | 125 134 144 | 11 13 15
  M2 | 15 17 19 | 1640 1740 1830 | 29 31 33 | 100 106 113 | 7.1 7.9 8.7
  Pval | 0.37 | 0.015 | 0.0012 | 3∙10⁻⁶ | 10⁻⁸
  Effect sizes:
  ΔHL | -3.3 -1.0 1.2 | 52 188 323 | -7.3 -4.6 -1.8 | 14 28 40 | 3.3 4.6 6.0
  ES ≡ Δ | -3.4 -1.3 0.7 | 43 167 294 | -7.1 -4.5 -2.1 | 17 28 39 | 3.4 5.1 6.7
  AUC ≡ θ | 0.44 0.55 0.67 | 0.55 0.67 0.77 | 0.68 0.70 0.79 | 0.66 0.77 0.85 | 0.73 0.83 0.90
  StAUC ≡ δN | -0.61 -0.20 0.22 | 0.19 0.63 1.04 | -1.13 -0.72 -0.28 | 0.53 1.06 1.48 | 0.89 1.36 1.80
  StES ≡ dC | -0.66 -0.25 0.16 | 0.10 0.52 0.94 | -1.15 -0.73 -0.30 | 0.65 1.09 1.53 | 0.89 1.35 1.80
  n1 and n2 – sample sizes of the control and CAD groups; RI – nonparametric reference interval; M1 and M2 – sample means; Pval – P-value; ΔHL – Hodges-Lehmann shift estimate; Δ = M1 – M2 – effect size in real units; θ – area under ROC-curve; δN and dC – Newcombe’s and Cohen’s standardized effect sizes. Programs: Reference Value Advisor, PAST, StatXact, GENERALIZED.xls, ESCI-JSMS.xls, LePrep.
  • 131. Informativeness of TGT parameters 53 CHD patients and 40 people without clinical manifestations of coronary heart disease (data by Berezovskaya G.A.) dC – standardized Cohen’s effect size, Pval – Р-value, BF10 – Bayes factor for comparison of odds in favor of H1 versus H0, Psrep – probability of statistically significant effect of the same sign (direction) in a replication, Power – “achieved” power, n1 = n2 – minimum sample sizes for replication. Programs: ESCI-JSMS.xls, Online BF Calculator (http://pcl.missouri.edu/bayesfactor), LePrep, G*Power 131
  • 132. Syndrome of statistical leniency and credulity Fallacies and Confusions of Null Hypothesis Significance Testing (NHST) and P-value “What does a statistician call it when the heads of 10 rats are cut off and 1 survives? - Nonsignificant.” 132
  • 133. P-value • P-value is the most controversial concept in statistics. • Many textbook authors and the majority of experimenters do not understand what its final product – a P-value – actually means (Gigerenzer, 1988). • The concept of a P-value lies so far from the intuitive understanding that no ordinary person can hold it in memory. • ‘‘We rely too much on P values, and most of us really don’t have a clue what they mean.’’ • Lai J., Fidler F., Cumming G. Subjective p intervals: Researchers underestimate the variability of p values over replication. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 2012; 8: 51-62. 133
  • 134. What is P-value? What is null hypothesis H0? • A P-value is the probability of observing data as extreme as, or more extreme than, the actual outcome when the null hypothesis is true. • When testing a null hypothesis we transform the data into a test statistic. • Then the P-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. • Usually the null hypothesis is a statement of 'no effect' or 'no difference'. • The Null Hypothesis is often denoted H0 (read “H-nought”)
  • 135. Null Hypothesis Significance Testing Waltz • The P value is at the heart of the most common approach to data analysis – Null Hypothesis Significance Testing (NHST). • Think of NHST as a waltz with three steps: • (i) State a null hypothesis: that is, there is no effect. • (ii) Calculate the p value, which is the probability of getting results like ours and more extreme – if the null hypothesis is true. • (iii) If Pval is sufficiently small, reject the null hypothesis and sound the trumpets: • our effect is not zero, it's statistically significant! • Generations of students have been inducted into the rituals of .05 meaning "significant", and .01 "highly significant". 135
  • 136. Р-value, Рval • Thus, by definition, the P-value (Pval) is the conditional probability of obtaining the observed value of difference (dobs) and all other larger or less probable values (D ≥ dobs|H0), when the null hypothesis is true: • Pval = P(D ≥ dobs|H0). • In terms of the statistical hypothesis testing, P-value is: • The probability to obtain the modulus of observed value |tobs| of the test statistic T and all other larger or less probable values (i.e., the values even more deviating from the expected one) • under assumption that the null hypothesis H0 is true: • • Pval = P(|T| ≥ |tobs.| | H0). • Note that the “less probable values” are not observed. • We infer them out of all possible values in the frame of the chosen (null) model. 136
  • 137. • A P-value is usually interpreted as a measure of how much evidence we have against the null hypothesis, how much is contradiction between null hypothesis and observed data. • The null hypothesis, traditionally represented by the symbol H0, represents the hypothesis of no change or no effect. • The smaller the P-value, the more (stronger) evidence we have against H0. 137
  • 138. What is Test Statistic? • A test statistic is a statistic used for testing a given null hypothesis. • Example: Student t-test statistic: • $\tilde{t} = \dfrac{\tilde{M}_1 - \tilde{M}_2}{\tilde{s}_{M_1 - M_2}}, \quad df = n_1 + n_2 - 2$ • In such a case testing the null hypothesis H0 on the equality of two independent means (H0: M1 – M2 = 0) is reduced to testing the null hypothesis t = 0. • When this hypothesis is true, then the distribution of the t-statistic is known. • Namely, it is the Student t-distribution. • This distribution has a single parameter called degrees of freedom, df.
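The Student t statistic for two independent samples can be computed from raw data as sketched below. The sketch assumes the classical equal-variance form, in which the standard error of the difference is the pooled standard deviation times √(1/n1 + 1/n2); the function name `student_t` is hypothetical and the data are the IFN values transcribed from the earlier slides.

```python
from math import sqrt
from statistics import mean, variance

# Data transcribed from the slides: induced IFN-alpha/beta levels, IU/mL.
healthy = [170, 159, 115, 103, 101, 94, 93, 92, 86, 81, 71, 70, 59, 58, 42, 38]
iugr = [104, 121, 123, 123, 127, 130, 132, 134, 134, 140,
        144, 146, 147, 149, 151, 153, 162, 168, 171, 173]


def student_t(group1, group2):
    """Return (t, df) for the equal-variance two-sample Student t-test."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of M1 - M2
    return (mean(group1) - mean(group2)) / se_diff, df


t_obs, df = student_t(iugr, healthy)
print(round(t_obs, 2), df)
```

For these data the result is t ≈ 5.58 with df = 34, the values quoted in the slides for the comparison of the two groups.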
  • 139. William Sealy Gosset (June 13, 1876–October 16, 1937) is famous as a statistician, best known by his pen name Student and for his work on Student's t-distribution. 139
  • 140. n1 = 5, n2 = 7, df = 10, t = 1.5, P = 0.16 – the difference is statistically nonsignificant http://ftparmy.com/103097-decision-visualizer.html
  • 141. n1 = 5, n2 = 7, df = 10, t = 3.0, P = 0.013 – the difference is statistically significant at the significance level α = 0.05, but not at 0.01
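The two-sided P-values in these two examples can be reproduced without a statistics package by integrating the Student t density numerically. The sketch below uses Simpson's rule with a fixed number of steps, which is an implementation choice made here rather than the method of the visualizer cited on the slide; the function names `t_pdf` and `two_sided_p` are hypothetical.

```python
from math import exp, lgamma, log, pi


def t_pdf(x, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) under H0, via Simpson integration of the density on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # probability mass between 0 and |t|
    return 1 - 2 * central_half


for t_value in (1.5, 3.0):
    print(t_value, round(two_sided_p(t_value, df=10), 3))
```

For df = 10 this gives approximately 0.16 for t = 1.5 and 0.013 for t = 3.0, in agreement with the two slides.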
  • 142. Searching the threshold for the P-value: is it possible? • When a small P-value is observed, an intuitive (extrastatistical) temptation arises to reject the null hypothesis H0. • However, there is no statistical criterion for which P-value should be regarded as sufficiently small to reject H0 safely. • Once again, such a decision is extrastatistical. • In practice, the decision to reject or accept H0 must depend on circumstances. • In each specific (concrete) situation the researcher should make the choice herself/himself.
  • 143. Traditional interpretation of the P-values (Pval) (and their Michelin star scale) • P-value (Pval): statistical significance, Michelin stars • > 0.05: Nonsignificant, no stars • 0.05 – 0.01: Moderately significant, * • 0.01 – 0.001: Significant, ** • 0.001 – 0.0001: Highly significant, *** • < 0.0001: Extremely significant, **** • The four-star value 0.0001 was introduced recently by Harvey J. Motulsky: http://www.graphpad.com/guides/prism/6/statistics/index.htm?interpreting_a_small_p_value_from_an_unpaired_t_test.htm
  • 144. Tyranny and/or hypnosis of the figures 0.05 and 95% • Unfortunately, the significance level α = 0.05 is the most commonly used threshold. • Too often, crossing this threshold (Pval < 0.05) in just a single experiment is regarded as sufficient for the decision to reject the null hypothesis and to conclude that the observed effect is statistically significant.
  • 145. Andrey Nikolaevich Kolmogorov (25 April 1903 – 20 October 1987) • In statistics, the recommended significance level varies from 0.05 for preliminary orientation experiments to 0.001 for important ultimate conclusions, but the attainable reliability of probability conclusions is often much higher. • Thus, the principal conclusions of statistical physics are based on the neglect of probabilities of an order less than 10−10. • (1951) 145 http://www.encyclopediaofmath.org/index.php/Probability
  • 146. Sterne J.A.C., Davey Smith G. Sifting the evidence – what’s wrong with significance tests? BMJ, 2001; 322: 227-231. Cited by 763 • Several other authors now echo Kolmogorov: • A P-value close to 0.05 is not strong evidence against the null hypothesis. • Only Pval < 0.001 should be regarded as strong evidence against H0. • In addition to P-values, it is strongly recommended to report confidence intervals for the effect size.
  • 147. “Flexible” P-values • In fact no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; • he rather gives his mind to each particular case in the light of his evidence and his ideas. • Fisher R. A. Statistical Methods and Scientific Inference, 1956, pages 41-42.
  • 148. Sir Ronald Aylmer Fisher (17 February 1890 – 29 July 1962)
  • 149. Warning • The P-value is usually interpreted as a measure of the evidence that the available data provide against the null hypothesis. • Strictly speaking, however, it is not a measure in the mathematical sense. • It does not possess the additivity property, and moreover, • it does not satisfy two further important principles of statistical theory: the Likelihood Principle and the P-postulate.
  • 150. Likelihood Principle • Stated in words, the Likelihood Principle says that statistical analysis must use those data, and only those data, that were actually obtained in the experiment. • However, the calculation of a P-value (as follows from its definition) uses not only the observed experimental data but also all other, less probable outcomes that were never actually observed.
  • 151. P-postulate • To serve as a real and adequate measure of statistical evidence, the P-value should satisfy a simple rule (postulate): equal P-values must represent equal evidence against the null hypothesis. • This rule is called the «P-postulate». • Obviously, this minimal requirement is not met. • Wagenmakers E.-J. A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 2007; 14(5): 779-804.
  • 152. P-postulate • Intuitively, Pval = 0.01 in an experiment with 10 observations does not carry the same evidential strength as Pval = 0.01 in an experiment with 300 observations. • Likewise, Pval = 0.001 in one experiment and Pval = 0.01 in another do not imply that the evidence for the effect in the first experiment is 10 times stronger than in the second.
  • 153. The P-value is a realization of the corresponding random variable P* • The P-value is an observed value of the corresponding random variable P*. • When the null hypothesis H0 is true, P* has the so-called (continuous) standard uniform distribution, that is, the uniform distribution on the interval [0; 1]: • P* ~ Uni[0; 1].
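The uniformity of P-values under a true null hypothesis is easy to see in a small simulation. The sketch below is illustrative only: it assumes a one-sample two-sided z-test with known standard deviation 1, samples of size 20, and 2000 simulated experiments; none of these settings come from the slides.

```python
import random
from statistics import NormalDist, mean

def z_test_p(sample, sigma=1.0):
    """Two-sided one-sample z-test P-value for H0: mu = 0 with known sigma."""
    z = mean(sample) / (sigma / len(sample) ** 0.5)
    return 2 * NormalDist().cdf(-abs(z))

def simulate_p_values(n_experiments=2000, n=20, mu=0.0, seed=1):
    """P-values from repeated experiments; mu = 0 means H0 is true."""
    rng = random.Random(seed)
    return [
        z_test_p([rng.gauss(mu, 1.0) for _ in range(n)])
        for _ in range(n_experiments)
    ]

p_null = simulate_p_values()

# Count P-values in ten equal-width bins; under H0 the counts should be similar.
bins = [0] * 10
for p in p_null:
    bins[min(int(p * 10), 9)] += 1
print("counts per bin of width 0.1:", bins)

# Under H0 about 5% of experiments are "significant" at alpha = 0.05.
share_below_005 = sum(p < 0.05 for p in p_null) / len(p_null)
print(f"share of P-values below 0.05: {share_below_005:.3f}")
```

Roughly equal bin counts are what a uniform distribution on [0; 1] predicts, and the share of P-values below 0.05 is simply the type I error rate.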
  • 154. P-value distributions • Pike N. free spreadsheet: FDR.xls http://www.webcitation.org/5rxSzU7qL • [Figure: two histograms titled “Frequency distribution of p-values”, observed versus expected frequency; x-axis: p-value defining upper limit of range (0.05 to 1.00); y-axis: frequency of values in range.] • Δ = μ1 – μ2 = 0: χ2 = 390.6; df = 400; Pval = 0.62 • Δ = μ1 – μ2 = 10: χ2 = 1348.8; df = 400; Pval = 4∙10^−101 • These are histograms obtained with 200 simulations.
  • 155. Reproducibility and predictive ability of P-values and 95% confidence intervals (n = 32). Dance of Pval • Free program “ESCI PPS p intervals” http://www.latrobe.edu.au/psy/esci/ • Cumming G. Replication and p intervals: p values predict the future only vaguely, but confidence intervals do much better. Persp. Psychol. Sci., 2008; 3: 286-300.
  • 156. Reproducibility and predictive ability of P-values and 95% confidence intervals (n = 32). Dance of Pval • Free spreadsheet “ESCI PPS p intervals” http://www.latrobe.edu.au/psy/esci/ • Cumming G. Replication and p intervals: p values predict the future only vaguely, but confidence intervals do much better. Persp. Psychol. Sci., 2008; 3: 286-300.
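The “dance of Pval” can be imitated without the ESCI spreadsheet. The sketch below is not Cumming's program; it is a rough stand-in that assumes two groups of n = 32, normally distributed data with standard deviation 1, a true standardized difference of 0.5 (a hypothetical value), and a two-sided z-test.

```python
import random
from statistics import NormalDist

def replicate_p_values(n=32, delta=0.5, reps=2000, seed=7):
    """Sorted two-sided z-test P-values from repeated two-group experiments.

    Each group has n observations from a normal distribution with standard
    deviation 1; the true difference between the group means is delta.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = (2 / n) ** 0.5  # standard error of the difference between two means
    p_values = []
    for _ in range(reps):
        mean1 = sum(rng.gauss(delta, 1.0) for _ in range(n)) / n
        mean2 = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        z = (mean1 - mean2) / se
        p_values.append(2 * std_normal.cdf(-abs(z)))
    return sorted(p_values)

p_sorted = replicate_p_values()
low = p_sorted[int(0.10 * len(p_sorted))]   # 10th percentile
high = p_sorted[int(0.90 * len(p_sorted))]  # 90th percentile
share_significant = sum(p < 0.05 for p in p_sorted) / len(p_sorted)

print(f"80% of replication P-values lie between {low:.4f} and {high:.2f}")
print(f"share of replications with P < 0.05: {share_significant:.2f}")
```

Even though every simulated experiment studies exactly the same true effect, the P-values spread over several orders of magnitude, which is why a single P-value says little about what a replication will show.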
  • 157. Reproducibility of the P-value when comparing healthy and IUGR groups at α = 0.05 and (1 – α) = 0.95 • Observed Pval = 3∙10^−6. • The 95% prediction interval for it ranges from the extremely small 3∙10^−11 to the moderate 0.01.
  • 158. Popular temptation • It is conventional to summarize the traditional (frequentist) conclusions from statistical hypothesis testing as follows: • The smaller the P-value, the stronger the evidence (provided by the data) against the null hypothesis H0 and the greater the reason to doubt H0. • Hence, intentionally or not (and rather naturally, it seems), there is a temptation to interpret the P-value as the probability of the null hypothesis.
  • 159. Popular delusion • The P-value is not the probability of the null hypothesis! • The P-value is calculated under the assumption that the null hypothesis H0 is true: • Pval = P(|D| ≥ |d_obs| | H0). • Hence, the P-value cannot be the probability of the null hypothesis: • P{D|H0} ≠ P{H0|D} • For a collection of other fallacies about the P-value see, e.g.: • http://en.wikipedia.org/wiki/P-value • Goodman S. A dirty dozen: Twelve P-value misconceptions. Semin. Hematol., 2008; 45: 135-140
  • 160. Calibration of P-values • Vovk V. G. A logic of probability, with application to the foundations of statistics. Journal of the Royal Statistical Society. Series B (Methodological), 1993; 55(2): 317-351. • Sellke T., Bayarri M.J., Berger J.O. Calibration of p values for testing precise null hypotheses. The American Statistician, 2001; 55(1): 62-71. Cited by 321 • When $P_{val} < 1/e$: $BF_{01} = -e\,P_{val}\ln P_{val}$ and $P(H_0 \mid D) \ge \dfrac{BF_{01}}{1 + BF_{01}}$ is the lower bound for the probability of the null hypothesis H0.
  • 161. The “price” of P-values
    Observed P-value | Upper limit of 80% interval for Pval | Lower limit for the probability of the null hypothesis P(H0) | Upper limit for the probability of replication Prepr
    0.05 | 0.44 | ≥ 29% | < 50%
    0.01 | 0.22 | ≥ 11% | < 73%
    0.001 | 0.07 | ≥ 1.8% | < 90%
    • Sellke T., Bayarri M.J., Berger J.O. Calibration of p values for testing precise null hypotheses. The American Statistician, 2001; 55(1): 62-71. • Goodman S.N. A comment on replication, p-values and evidence. Statistics in Medicine, 1992; 11: 875-879. • Cumming G. Replication and p intervals: p values predict the future only vaguely, but confidence intervals do much better. Perspectives on Psychological Science, 2008; 3(4): 286-300.
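The “lower limit for the probability of the null hypothesis” column can be reproduced from the Sellke, Bayarri and Berger calibration, BF01 = −e·Pval·ln(Pval) for Pval < 1/e. The sketch below assumes equal prior probabilities for H0 and H1 (prior P(H0) = 0.5), which is the setting in which the bound reduces to BF01 / (1 + BF01); the function names are illustrative.

```python
import math

def min_bayes_factor(p: float) -> float:
    """Calibrated lower bound on the Bayes factor BF01 (valid for 0 < p < 1/e)."""
    if not 0 < p < 1 / math.e:
        raise ValueError("the calibration requires 0 < p < 1/e")
    return -math.e * p * math.log(p)

def min_posterior_h0(p: float, prior_h0: float = 0.5) -> float:
    """Lower bound on P(H0 | data) given the P-value and the prior P(H0)."""
    prior_odds = prior_h0 / (1 - prior_h0)
    posterior_odds = prior_odds * min_bayes_factor(p)
    return posterior_odds / (1 + posterior_odds)

for p in (0.05, 0.01, 0.001):
    bf = min_bayes_factor(p)
    post = min_posterior_h0(p)
    print(f"Pval = {p}: BF01 >= {bf:.3f}, P(H0 | D) >= {post:.1%}")
```

With equal priors, Pval = 0.05 still leaves the null hypothesis a posterior probability of at least about 29%, which is far from the “1 in 20” that the figure 0.05 seems to suggest.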
  • 162. The problem with p values: how significant are they, really? • November 12th, 2013 • Geoff Cumming • http://phys.org/wire-news/145707973/the-problem-with-p-values-how-significant-are-they-really.html • A p value of 0.05 has been the default ‘significance’ threshold for nearly 90 years … but is that standard too weak? • Martin_Heigan
  • 163. Funny metaphor • “Perhaps p values are like mosquitos. • They have an evolutionary niche somewhere and no amount of scratching, swatting, or spraying will dislodge them”. • Campbell J.P. Editorial: Some remarks from the outgoing editor. Journal of Applied Psychology, 1982; 67: 691-700
  • 164. • The usefulness of P-values is quite limited, and we continue to suggest that these procedures be euthanized. • Anderson D.R., Burnham K.P. Avoiding pitfalls when using information-theoretic methods. The Journal of Wildlife Management, 2002; 66(3): 912-918.
  • 165. On seduction: • Yes, the P-value can seduce. • It is sexy and we can be blinded. • A significant P-value can cloud our thinking: we simply get too excited and forget to look at the actual effect size. • Does that < 0.05 really matter when the effect size is small? • The study which concluded that the “internet is changing the dynamics and outcomes of marriage itself” can serve as an example. • This study showed that those who met their spouses online are less likely to divorce and more likely to have high marital satisfaction (of course with very significant P-values). • However, the effect size was very small: happiness, for example, barely moved from 5.48 to 5.64. • So, do not sign up for match.com thinking that you may be happier with your spouse.
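How a tiny effect can still produce a very small P-value is shown by the sketch below. Only the two means (5.48 and 5.64) come from the slide; the standard deviation of 1.2 and the group size of 5000 are hypothetical values chosen purely for illustration, and a two-sided z-test with known standard deviation is assumed.

```python
from statistics import NormalDist

def cohens_d(mean1: float, mean2: float, sd: float) -> float:
    """Standardized mean difference (Cohen's d) for a common standard deviation."""
    return (mean1 - mean2) / sd

def z_test_p(mean1: float, mean2: float, sd: float, n_per_group: int) -> float:
    """Two-sided z-test P-value for the difference between two group means."""
    se = sd * (2 / n_per_group) ** 0.5
    z = (mean1 - mean2) / se
    return 2 * NormalDist().cdf(-abs(z))

SD = 1.2           # hypothetical standard deviation of the happiness score
N_PER_GROUP = 5000  # hypothetical number of respondents per group

d = cohens_d(5.64, 5.48, SD)
p = z_test_p(5.64, 5.48, SD, N_PER_GROUP)
print(f"Cohen's d = {d:.2f}")
print(f"P-value   = {p:.1e}")
```

Under these assumptions the difference is below the conventional “small effect” mark of d = 0.2, yet the P-value is minuscule, because with thousands of observations the P-value mostly reflects the sample size.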
  • 166. Meaning of the P-value: Publish or Perish
  • 167. Pee-value (http://wmbriggs.com/blog/?p=9338) • Statistics is the only field in which men boast of their wee p-values
  • 168. • Revised standards for statistical evidence • Valen E. Johnson • PNAS, 2013; 110(48): 19313–19317 • Supporting Information: • Johnson 10.1073/pnas.1313476110
  • 169. Evidence thresholds γ and size of corresponding significance tests α
  • 170. Revised standards for statistical evidence • A simple strategy for improving the replicability of scientific research includes the following steps: • (i) Associate statistically significant test results with P values that are less than 0.005. • (ii) Associate highly significant test results with P values that are less than 0.001 (cf. Kolmogorov) and even 0.0001. • (iii) Report the Bayes factor in favor of the alternative hypothesis and the default alternative hypothesis that was tested.
  • 171. Revised standards for statistical evidence • (iv) BF10 > 30 or even > 100 should be considered strong and convincing evidence in favor of the alternative hypothesis H1. • The proposed modifications of common standards of evidence are intended to reduce the rate of nonreproducibility of scientific results by a factor of 5 or greater. • Certainly, larger sample sizes are required.
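How much larger the samples must be can be gauged with the usual normal-approximation formula for comparing two means, n per group = 2·((z_{1−α/2} + z_{power}) / d)². The sketch below is an illustration and not part of Johnson's paper; the power of 0.80 and the standardized effect size d = 0.5 are assumed values.

```python
import math
from statistics import NormalDist

def n_per_group(alpha: float, power: float = 0.80, d: float = 0.5) -> int:
    """Approximate sample size per group for a two-sided two-sample z-test.

    alpha: significance level; power: desired power; d: standardized effect size.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

n_005 = n_per_group(0.05)
n_0005 = n_per_group(0.005)
print(f"alpha = 0.05 : n per group = {n_005}")
print(f"alpha = 0.005: n per group = {n_0005}")
print(f"ratio = {n_0005 / n_005:.2f}")
```

Under these assumptions, moving the threshold from 0.05 to 0.005 raises the required sample size by roughly 70% rather than by an order of magnitude.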
  • 172. Minimum sizes for two independent samples with non-overlapping values required to achieve the lower confidence limits for two measures of the effect size: AUCL and SESL
    Lower confidence limits for the effect size measured with AUCL and StAUCL (first two columns); minimum sample sizes at confidence levels 0.95, 0.99 and 0.999 (last three columns)
    AUCL | StAUCL | 0.95 | 0.99 | 0.999
    0.80 | 1.2 | 10 | 17 | 27
    0.90 | 1.8 | 21 | 35 | 56
    0.95 | 2.3 | 40 | 69 | 111
    0.99 | 3.3 | 194 | 334 | 545
    0.999 | 4.4 | 1923 | 3320 | 5418
    • Extrapolated using Newcombe’s free spreadsheet VISUALISETHETA.xls http://medicine.cf.ac.uk/primary-care-public-health/resources/
  • 173. John Wilder Tukey (16 June 1915 – 26 July 2000) • Any research should have at least two stages. • First stage: an exploratory (preliminary, pilot, hypothesis-generating) study. • Second stage: a confirmatory study. • The second stage is designed on the basis of the results obtained at the first stage.
  • 174. Conclusions • Poor reproducibility of experimental results has become a systemic problem in biomedicine. • One of the main reasons for this is inadequate statistical analysis. • Statistical analysis should be comprehensive, harmonizing statistical evidence and predictions as well as frequentist and Bayesian approaches. • It is not enough to carry out null hypothesis significance testing (NHST) and report P-values.
  • 175. Conclusions (continued) • Statistical significance does not mean clinical importance. • The effect size should be reported together with its confidence and prediction intervals. • Experiments and/or observations should be repeated many times, and their agreement should be investigated. • The best way is to repeat the experiments independently in different laboratories (in different countries).
  • 176. Editorial policy • Journal editors and reviewers should not accept papers for publication if they report the results of a single experiment without any independent replication. • Experts in statistics should be included in editorial boards. • Reviewers should be obliged to re-examine all the calculations. • For this reason, free access to the initial (“raw”) data should be ensured. • Transparency and openness are cornerstones of the scientific method.
  • 177. Francis Galton, 1901 • “I have begun to think that no one ought to publish biometric results, without lodging a well-arranged and well-bound copy of his data in some place where it should be accessible, under reasonable restrictions, to those who desire to verify his work.” • Galton F. Biometry. Biometrika, 1901; 1(1): 7-10. • Galton’s suggestion of a data store was later revived by Professor Julian Huxley, and a suggestion was made to store measurements in the British Museum of Natural History.
  • 178. • One of the most common temptations, and one that leads to the greatest disasters, is the temptation of the words: "Everybody does it." • Leo Tolstoy
  • 179. Books on Bayesian biostatistics
  • 180. Lesaffre E., Lawson A. Bayesian Biostatistics. 2012. Wiley. 534 p. • Broemeling L.D. Bayesian Biostatistics and Diagnostic Medicine. 2007. CRC Press, 216 p.
  • 181. Kruschke J. Doing Bayesian Data Analysis. 2010. Academic Press, 672 p.
  • 182. Downey A.B. Think Bayes: Bayesian Statistics Made Simple. Version 1.0.1, 2012. Green Tea Press: Needham, Massachusetts, 195 p. • Albert J. Bayesian Computation with R. Series: Use R! 2nd ed. 2009, Springer, 299 p.
  • 183. Free Software • Educational: SUStats, http://www.jsc.nildram.co.uk/examples/sustats/diescore/DieScoreApplet.html • WinStat http://math.exeter.edu/rparris/winstats.html • SOCR http://www.socr.ucla.edu/ • Research: R http://cran.r-project.org/ • PAST http://folk.uio.no/ohammer/past/ • Instat+ http://www.reading.ac.uk/ssc/n/software/instat/337/Instat+_v3.37.msi • Online Bayes Factor Calculator http://pcl.missouri.edu/bayesfactor • LePAC and LePrep http://www.univ-rouen.fr/LMRS/Persopage/Lecoutre/PAC.htm • G*Power http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/ • Reference Value Advisor http://www.biostat.envt.fr/spip/spip.php?article63 • Newcombe’s spreadsheets http://medicine.cf.ac.uk/primary-care-public-health/resources/ • Cumming’s spreadsheets ESCI http://www.latrobe.edu.au/psy/esci/ • Harold Kaplan statistical pages http://printmacroj.com/statistics.htm
  • 184. Commercial Software • StatXact http://www.cytel.com/software-solutions/statxact • XLStat http://www.xlstat.com • MedCalc https://www.medcalc.org/ • GraphPad Prism http://www.graphpad.com/ • StatsDirect http://www.statsdirect.com/ • Expensive monsters: • SAS http://www.sas.com/en_us/home.html • IBM SPSS http://www-01.ibm.com/software/analytics/spss/ • STATISTICA http://www.statsoft.com/ • John C. Pezzullo’s comprehensive list of statistical software: http://statpages.org/
  • 185. Thank you for your attention • Slides are freely available to all • Nikita N. Khromov-Borisov • Department of Physics, Mathematics and Informatics • Pavlov First Saint Petersburg State Medical University • Nikita.KhromovBorisov@gmail.com • +7-952-204-89-49; +7-921-449-29-05 • http://independent.academia.edu/NikitaKhromovBorisov