# Basics statistics

## by BITS on Sep 15, 2011

This is the presentation of the BITS training session on "Essential statistics".

View more material on http://www.bits.vib.be/index.php?option=com_content&view=article&id=17203865:essential-statistics&catid=81:training-pages&Itemid=190

## Basics statistics Presentation Transcript

• Introduction to statistics. Els Adriaens, PhD. December 17, 2010.
• Overview: (1) formulate a relevant research question; (2) design the study; (3) gather the data according to the plan; (4) analyze the data, first with explorative data analysis (descriptive statistics, graphs), then by drawing inference (answering the research question with a stated level of confidence); (5) report the results.
• Part 1: Design of a study. Outline: experimental versus observational studies (experimental study, observational study, mixed experimental and observational studies); design of an experimental study; overview of study designs.
• Experimental study: factor levels (treatments) are randomly assigned to the experimental units, so the experimenter controls the explanatory variable. This gives information about the cause-and-effect relationship between the explanatory factors and a response variable. Example: effect of vitamin C on the prevention of colds in 800 children. Half of the children were selected at random and received vitamin C (treatment group); the remaining children received a placebo (control group). This is a qualitative explanatory factor with two levels, with children as the experimental units.
• Observational study: data are obtained from a non-experimental study. The explanatory variables are not controlled and treatments are not randomized over the experimental units, so the study can only establish associations between the explanatory factors and a response variable. Example: company officials wished to study the relation between the age of an employee and the number of days of illness in a year. The explanatory variable is not controlled: age is observed. This establishes association but not cause and effect: a positive relation between age and number of days of illness does not imply that illness is the direct result of age. For instance, if younger employees work indoors while older employees usually work outdoors, work location rather than age may be responsible for the number of days of illness.
• Mixed studies. Example: a clinical trial performed in 3 hospital centers; at each center the effect of a drug on lowering blood cholesterol was investigated. Within each hospital center, volunteers were randomly assigned to one of the two treatments (drug or placebo). Experimental factor: treatment (drug versus placebo). Observational factor: hospital center, which was not randomly assigned because each volunteer was assigned to the nearest hospital center.
• Design of an experimental study: factors and treatments. Structure of the experiment: 2 levels of factor A × 3 levels of factor B = 6 treatments, numbered as in the table below. Experimental unit: the smallest unit of experimental material to which a treatment can be assigned; it is determined by the method of randomization. Replicates: the treatment is repeated, which makes it possible to estimate the experimental error.

| | Factor B, level 1 | Factor B, level 2 | Factor B, level 3 |
|---|---|---|---|
| Factor A, level 1 | 1 | 2 | 3 |
| Factor A, level 2 | 4 | 5 | 6 |

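
The treatment numbering in this 2 × 3 layout can be reproduced by enumerating every combination of factor levels. The sketch below is only an illustration; the level labels `A1`, `B1`, and so on are made-up names, not taken from the slides.

```python
from itertools import product

# Hypothetical level labels for the two factors of the slide's 2 x 3 layout.
factor_a = ["A1", "A2"]
factor_b = ["B1", "B2", "B3"]

# Each combination of one level of A with one level of B is one treatment.
treatments = list(product(factor_a, factor_b))

for number, (level_a, level_b) in enumerate(treatments, start=1):
    print(number, level_a, level_b)
```

With four factors of two levels each, the same enumeration would give the 16 treatment combinations mentioned on the next slide.
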
• Number of factors: in the initial stages of an investigation many factors are included (more than can possibly be studied in a single experiment). Cause-and-effect diagrams are often used to identify factors that could affect the outcome and so reduce the number of factors. Example: 4 factors with 2 levels each give 16 treatment combinations. Number of levels of each factor: for qualitative factors the levels are the categories; for quantitative factors the number of levels reflects the type of trend expected by the experimenter: 2 levels for a linear change in response (minimum and maximum of the specified range); 3 levels for a quadratic trend; > 4 levels when a detailed examination of the shape of the response curve is desired. The range of a factor is one of the most important design decisions.
• Measurements: precision versus accuracy. Precision of a variable: the degree to which a variable has nearly the same value when measured several times. It is a function of random error (chance) and is assessed as the reproducibility of repeated measurements. Example: weigh the same person 3 times on an electronic balance and obtain slightly different measurements: 67.5 kg, 67.4 kg and 67.6 kg. The more precise a measurement, the greater the statistical power at a given sample size to estimate mean values and to test hypotheses. Variability may be due to the operator, the instrument and the subject. To minimize random error and improve precision: use operating manuals, train the operator, refine or automate instruments, and repeat the measurement and average over a larger number of observations (at added cost and with practical difficulties).
• Accuracy of a variable: the degree to which a variable actually represents what it is supposed to represent. It is a function of systematic error (bias), which is often difficult to detect and has an important influence on the validity of the result. Example 1: incorrect calibration of an instrument. Example 2: gastric freezing as a treatment for ulcers in the upper part of the intestine. To improve accuracy and minimize bias: use operating manuals, train the operator, refine or automate instruments; calibrate periodically against a gold standard (example 1); use blinding (example 2). In a double-blind study neither the experimental subject nor the evaluator knows which treatment is received or given, so any inaccuracy in measuring the outcome will be the same in the 2 groups.
• [Figure: bias and variance in shooting arrows at a target.] Bias means that the archer systematically misses in the same direction. Variance means that the arrows are scattered (Moore and McCabe 2002).
• Sampling from a population: simple random sample. A sample of n elements is obtained from a population of N elements by random draws, each element having equal probability of being drawn.
• Randomization: treatments are assigned at random to the experimental units. This tends to eliminate the influence of extraneous factors not under the direct control of the experimenter. Blocking: increases precision by taking other factors into account. [Diagram: heterogeneous subjects are split into homogeneous blocks (males, females); within each block, randomization assigns subjects to group 1 (treatment 1), group 2 (treatment 2) and group 3 (treatment 3).]
• Stratified sampling. Suppose we want to know the attitudes of male and female students in the engineering school. Is a simple random sample from that school a good idea? No: it would contain too few women (10%). Instead, stratify the sample and pick a random sample from each stratum: stratum 1, female engineers; stratum 2, male engineers. The estimates are then measured with comparable precision. Learn from the distribution in each stratum and do NOT pool the data. For example, if the average weight is 60 kg for the women and 80 kg for the men, the average engineer will weigh 10% × 60 + 90% × 80 = 78 kg.
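
The weighted average at the end of this slide can be written as a small helper. This is a minimal sketch; the function name `stratified_mean` and its input layout are illustrative choices, not part of the presentation.

```python
def stratified_mean(strata):
    """Combine stratum means into a population mean.

    strata: list of (population_share, stratum_mean) pairs; shares must sum to 1.
    """
    total_share = sum(share for share, _ in strata)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    # Weight each stratum mean by the stratum's share of the population,
    # not by its share of the (deliberately balanced) sample.
    return sum(share * mean for share, mean in strata)


# Slide example: 10% women averaging 60 kg, 90% men averaging 80 kg.
engineers = [(0.10, 60.0), (0.90, 80.0)]
print(round(stratified_mean(engineers), 1))
```

Pooling the two equally sized stratum samples without weights would instead give 70 kg, which is why the slide warns against pooling.
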
• Part 2: Explorative data-analysis. Outline: types of variables; univariate descriptives; bivariate descriptives.
• Descriptive statistics allow the researcher to describe or summarize the data. This is typically done at the beginning of a results section. The researcher gives an idea of the sample size and of the characteristics under study (e.g. baseline characteristics in a clinical trial). Example: a total of 235 students participated in this study, 163 women (69.4%) versus 72 men (30.6%). On average the female students (81.3 ± 19.4) had a slightly higher score on exam 2 than the male students (80.7 ± 18.1).
• We typically start with univariate explorations (one variable at a time). Next, we describe joint distributions (two variables at a time = bivariate; more variables = multivariate). Graphical summary: inspect the shape of the distribution (symmetry, modality, heaviness of tails). Numerical summary: classical measures of location and spread, namely mean and standard deviation, median and interquartile range, and mode (the value that occurs most often, useful for nominal data).
• Notes on notation. A random variable X is a variable whose value is a numerical outcome of a random phenomenon (non-numerical outcomes are numerically encoded). Random variables are usually denoted by capital letters such as X, Y, … Fixed constants or observed values are usually denoted by small letters, e.g. x, y. Special constants (to be specified) are written as Greek letters α, β, μ, σ. An index i subscripts random or observed outcomes for individual observations in the data set: $Y_i$, $y_i$.
• Types of variables:

| Type | Characteristic | Example | Descriptive statistic | Information content |
|---|---|---|---|---|
| Categorical | The set of all possible values can be enumerated | | | |
| Categorical: nominal | Unordered categories | Gender, race | Counts, proportions | Lower |
| Categorical: ordinal | Ordered categories | Degree of pain | Median | Intermediate |
| Continuous or ordered discrete | Can take all possible values within some interval of real numbers (continuous) or is limited to integers (discrete) | Weight, number of cigarettes per day | Mean, standard deviation | Higher |

• Measures of location (center). Mean of a series of observations $x_i$, $i = 1, 2, \ldots, n$: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Properties, given that X and Y are random variables and a and b are constants: $\mu_{aX+b} = a\mu_X + b$ and $\mu_{X+Y} = \mu_X + \mu_Y$; the corresponding sample means are $a\bar{x} + b$ and $\bar{x} + \bar{y}$. Median (M): the middle of the distribution, such that at least 50% of the outcomes are larger than or equal to M and at least 50% of the outcomes are smaller than or equal to M. For odd n this is the middle value in order of magnitude; for even n one takes the average of the two middle values.
• The mean is very sensitive to outliers. [Figure: number of partners desired in the next 30 years, Miller and Fishkin, 1997.]
• Measures of spread. Standard deviation of a series of observed values $x_i$: $SD(x) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$. When the variable is approximately normally distributed, approximately 95% of the data will lie between $\bar{x} - 1.96\,SD(x)$ and $\bar{x} + 1.96\,SD(x)$. The square of the SD is called the variance, $Var(x)$. Variation coefficient: $\frac{SD(x)}{\bar{x}} \times 100\%$.
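
As a rough illustration of these measures of spread, the sketch below applies them to the three repeated weighings from the precision slide (67.5, 67.4 and 67.6 kg). It assumes the divide-by-n form of the standard deviation, which is what `statistics.pstdev` computes; `statistics.stdev` would divide by n − 1 instead.

```python
import statistics

# Repeated weighings of the same person, taken from the precision example.
weights_kg = [67.5, 67.4, 67.6]

mean = statistics.fmean(weights_kg)
sd = statistics.pstdev(weights_kg)      # square root of (1/n) * sum of squared deviations
variance = sd ** 2                      # the variance is the square of the SD
cv_percent = 100 * sd / mean            # variation coefficient in percent

print(f"mean={mean:.2f} kg, SD={sd:.4f} kg, CV={cv_percent:.3f}%")
```
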
• Interquartile range (IQR): the distance Q3 − Q1, where Q1 is a value such that at least 25% of the outcomes fall at or below Q1 and at least 75% fall at or above Q1, and Q3 is a value such that at least 75% of the outcomes fall at or below Q3 and at least 25% fall at or above Q3. If more than one value satisfies this criterion, the average is usually taken.
• Boxplot. Five-number summary: min, Q1, median, Q3, max. The box spans the quartiles (its length is the IQR) with a line at the median; the whiskers reach to the most extreme observation within a distance of 1.5 × IQR from the box. [Figure: boxplot of birth weight.]
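
The quartile rule and the whisker rule from the last two slides can be sketched as follows. The helper names and the small data set (with 20 as a deliberate outlier) are invented for illustration, and the quantile rule shown is only one of several conventions in use; statistical packages may return slightly different quartiles.

```python
import math


def quantile(values, p):
    """Quantile by the slide's rule, for 0 < p < 1.

    If n*p is a whole number, two neighbouring order statistics both satisfy
    the criterion and their average is returned; otherwise the single
    qualifying order statistic is returned.
    """
    xs = sorted(values)
    k = len(xs) * p
    if k == int(k):
        k = int(k)
        return (xs[k - 1] + xs[k]) / 2
    return xs[math.ceil(k) - 1]


def five_number_summary(values):
    xs = sorted(values)
    return xs[0], quantile(xs, 0.25), quantile(xs, 0.5), quantile(xs, 0.75), xs[-1]


def whisker_ends(values):
    """Most extreme observations lying within 1.5 * IQR of the box."""
    q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
    iqr = q3 - q1
    low = min(x for x in values if x >= q1 - 1.5 * iqr)
    high = max(x for x in values if x <= q3 + 1.5 * iqr)
    return low, high


data = [1, 2, 3, 4, 5, 6, 7, 20]   # made-up data; 20 lies beyond the upper whisker
print(five_number_summary(data))
print(whisker_ends(data))
```
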
• Histogram: a bar diagram for continuous data, showing relative or absolute frequencies. [Figure: histogram of birth weight; vertical axis: percentage.]
• Normal distribution. Density: $\varphi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$, where μ is the population mean and σ² is the population variance. Notation: X ~ N(μ, σ²). If X ~ N(μ, σ²), then $Z = \frac{X-\mu}{\sigma}$ ~ N(0, 1), the standard normal distribution.
• Properties of the standard normal distribution N(0, 1): unimodal (one maximum, at 0); symmetric around 0; 68-95-99.7 rule. About 68% of the area under the curve (AUC) lies between -1 and 1, so 68% of the observations fall within 1 SD of the mean μ. About 95% of the AUC lies between -2 and 2, so 95% of the observations fall within 2 SD of the mean μ. About 99.7% of the AUC lies between -3 and 3, so 99.7% of the observations fall within 3 SD of the mean μ.
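
The 68-95-99.7 rule can be checked numerically with the standard library's `statistics.NormalDist`. This is a small sketch; the helper name `central_area` is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def central_area(k):
    """Area under the N(0, 1) density between -k and +k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3, 1.96):
    print(f"within {k} SD: {central_area(k):.4f}")
```

The last value shows why 1.96 rather than 2 is used when exactly 95% coverage is wanted.
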
• Normal quantile plot: compares two distributions by plotting their quantiles against each other. If the observed and the normal distribution are identical, the points are expected to lie on a straight line with intercept 0 and slope 1. Distributions with the same shape that are simply rescaled or shifted still show up on a straight line, but with a different intercept (shift) or slope (scale change). [Figures: normal Q-Q plot of randomly generated N(0, 1) data; normal Q-Q plot of randomly generated exponential data.]
• Bivariate relations, continuous data. Graphical: boxplots, (stacked) histograms, scatter plots. Correlation coefficient (r): takes values between -1 and 1. The Pearson correlation coefficient expresses the degree of linear dependence: $r = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{SD(x)}\right)\left(\frac{y_i - \bar{y}}{SD(y)}\right)$. Warning: a summary statistic cannot replace the individual examination of the data. [Figure: Anscombe's Quartet, four data sets that all have r = 0.816. Source: Wikipedia.]
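
The correlation formula can be sketched directly from its definition. The data below are the first of Anscombe's four data sets as commonly tabulated (an addition here; the slide only shows the plots), and the function name `pearson_r` is illustrative.

```python
import statistics


def pearson_r(xs, ys):
    """Average product of standardized deviations, using the divide-by-n SD."""
    n = len(xs)
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)
    return sum(
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    ) / n


# Anscombe's first data set.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

print(round(pearson_r(x, y), 3))
```

The other three data sets of the quartet give practically the same r while looking completely different in a scatter plot, which is the point of the warning on the slide.
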
• Bivariate relations: Spearman's rank correlation (between -1 and 1). It measures monotone association (the extent to which, as one variable increases, the other variable tends to increase or decrease). It makes no assumption of linearity and can be used for ordinal variables. Source: Answers.com
• Bivariate relations: Spearman's rank correlation (between -1 and 1). Example: Corneal irregular astigmatism after laser in situ keratomileusis for myopia, Br J Ophthalmol 2001;85:534-536. Spearman rank correlation $r_s$ = 0.440, p < 0.0001. See also http://geographyfieldwork.com/SpearmansRank.htm
• 2x2 associations, categorical data: comparing two proportions. Many studies are designed to compare two groups (X) on a binary response variable (Y), where π is the probability of success and 1-π the probability of failure:

| X \ Y | Success | Failure |
|---|---|---|
| Group 1 | π1 | 1-π1 |
| Group 2 | π2 | 1-π2 |

Example: is there an association between antiviral drug use (X) and pneumonia (Y)?

| | Pneumonia: yes | Pneumonia: no | Total |
|---|---|---|---|
| Antiviral drug (counts) | 579 | 45172 | 45751 |
| Control (counts) | 648 | 45103 | 45751 |
| Antiviral drug (proportions) | 0.013 | 0.987 | 1 |
| Control (proportions) | 0.014 | 0.986 | 1 |

• Risk difference: is there a difference between the group taking the antiviral drug and the control group? π1 - π2 = 0.013 - 0.014 = -0.001. Properties: -1 ≤ (π1 - π2) ≤ 1; if the response is independent of group, then (π1 - π2) = 0. A difference may be more important when both success probabilities are close to 0 or 1 than when both are close to 0.5. Example: (p1 - p2) = 0.09 can arise as 0.10 - 0.01 or as 0.50 - 0.41. In the first case p1 is 10 times larger than p2, while in the second case p1 is only 1.2 times larger than p2.
• Relative risk: the ratio of the success probabilities of the 2 groups, π1/π2. Properties: (π1/π2) ≥ 0; if the response is independent of group, then (π1/π2) = 1. Antiviral drug example: (p1/p2) = (579/45751)/(648/45751) = 0.894 with 95% CI: 0.799, 0.999. The sample proportion of pneumonia cases was 10.6% lower for the group prescribed the antiviral drug. The CI of the relative risk indicates that the risk of pneumonia is at least 0.1% lower for the group prescribed the antiviral drug.
• Odds ratio. For a probability π of success, the odds are defined as $\Omega = \frac{\pi}{1-\pi}$. Odds are ≥ 0, with values > 1 when a success is more likely than a failure. For example, if π = 0.75, then the odds of success are 0.75/0.25 = 3.0: a success is three times as likely as a failure. If Ω = 1/3, a failure is three times as likely as a success. The ratio of the odds Ω1 and Ω2 in the two rows is called the odds ratio: $\theta = \frac{\Omega_1}{\Omega_2} = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}$. Properties of the odds ratio: 0 ≤ θ < ∞; when X and Y are independent, θ = 1; the odds ratio does not change value when the orientation of the table is reversed (rows become columns, columns become rows).
• Odds ratio, continued. Properties: if θ = 4, the odds of success in row 1 are 4 times the odds in row 2, and thus subjects in row 1 are more likely to have success than subjects in row 2. θ = 4 does not mean that the probability π1 is four times π2 (that would be the interpretation of the relative risk). The odds ratio does not change when both cell counts within any row (or column, but not both) are multiplied by a nonzero constant; this implies that the odds ratio does not depend on the marginal counts within a row or column.
• Odds ratio, example:

| | Pneumonia: yes | Pneumonia: no | Total |
|---|---|---|---|
| Antiviral drug | 579 | 45172 | 45751 |
| Control | 648 | 45103 | 45751 |

For the patients prescribed the antiviral drug, the estimated odds of pneumonia are 579/45172 = 0.013: there were about 1.3 pneumonia cases for every 100 patients without pneumonia. The sample odds ratio is computed as (579 × 45103)/(648 × 45172) = 0.892 (95% CI: 0.797, 0.999). The estimated odds for patients prescribed the antiviral drug equal 0.892 times the estimated odds for patients in the control group; that is, the estimated odds were 10.8% lower for the antiviral drug group.
• Relation between odds ratio and relative risk. When the proportion of successes is close to 0 for both groups, the sample odds ratio is similar to the sample relative risk. In such a case, an odds ratio of 0.89 does mean that the probability of success for the patients prescribed the antiviral drug is about 0.89 times the probability of success for the patients in the control group. Relative risk = 0.894 (95% CI: 0.799, 0.999). Odds ratio = 0.892 (95% CI: 0.797, 0.999).
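
The three association measures can be computed side by side from the pneumonia table. The sketch below is an illustration only: the function name is made up, and the confidence intervals use the usual large-sample (Wald) interval on the log scale, which is an assumption since the slides do not say how their intervals were obtained.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def two_by_two_measures(a, b, c, d, level=0.95):
    """a, b: successes and failures in group 1; c, d: the same for group 2."""
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)

    relative_risk = p1 / p2
    odds_ratio = (a * d) / (b * c)

    # Assumed method: Wald standard errors of log(RR) and log(OR).
    se_log_rr = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    def log_scale_ci(estimate, se):
        return exp(log(estimate) - z * se), exp(log(estimate) + z * se)

    return {
        "risk_difference": p1 - p2,
        "relative_risk": relative_risk,
        "relative_risk_ci": log_scale_ci(relative_risk, se_log_rr),
        "odds_ratio": odds_ratio,
        "odds_ratio_ci": log_scale_ci(odds_ratio, se_log_or),
    }


# Antiviral drug: 579 pneumonia, 45172 no pneumonia; control: 648 and 45103.
result = two_by_two_measures(579, 45172, 648, 45103)
for name, value in result.items():
    print(name, value)
```

Because pneumonia is rare in both groups, the relative risk and the odds ratio come out nearly equal, as the slide states.
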
• What should be used: risk difference, relative risk or odds ratio? The odds ratio is the preferred estimate. In a case-control study it is usually not possible to estimate the probability of an outcome given X (π1), and therefore it is also not possible to estimate the difference of proportions or the relative risk for that outcome. Example: in a retrospective study, 709 patients with lung cancer (cases) were queried about their smoking behavior (X). Each case was matched with a control patient: same age, same gender, same hospital, but no lung cancer. Odds ratio = 2.97: the estimated odds of lung cancer for smokers were 2.97 times the estimated odds for non-smokers.

| Lung cancer | Cases | Controls |
|---|---|---|
| Smoker | 688 | 650 |
| Non-smoker | 21 | 59 |
| Total | 709 | 709 |

• Part 3: Statistical inference. Outline: distributions; bias and variance; hypothesis testing.
• Statistical inference: using the laws of probability, we infer conclusions about a population from data collected in a random sample. [Diagram: a random sample of n elements is drawn from a population of N elements; data are collected on the sample, giving $\bar{X}$ and SD(x), and are used to make inferences about the population values μ and σ.] A parameter (μ, σ) is a number that describes the population. A parameter is a fixed number, but its value is unknown in practice. A statistic ($\bar{X}$, SD(x)) is a number that describes the sample. Its value is known once we have collected a sample, but it changes from sample to sample.
• The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population. Distributions covered: binomial distribution, Poisson distribution, normal distribution.
• Binomial distribution: a fixed number n of independent observations; each observation falls into one of two categories (success or failure); the probability of success p is the same for each observation. Let X denote the number of successes among the n observations, which can take the values 0, 1, …, n; then X ~ B(n, p). Properties: $\mu_X = np$ and $\sigma_X^2 = np(1-p)$. Probability mass function: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$.
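
A minimal sketch of the binomial probability mass function, checking numerically that the mean and variance match np and np(1 − p). The values n = 10 and p = 0.3 are arbitrary illustrative choices.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3   # arbitrary example values
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]

mean = sum(k * prob for k, prob in enumerate(probabilities))
variance = sum((k - mean) ** 2 * prob for k, prob in enumerate(probabilities))

print(f"total probability={sum(probabilities):.6f}")
print(f"mean={mean:.4f} (np={n * p:.4f})")
print(f"variance={variance:.4f} (np(1-p)={n * p * (1 - p):.4f})")
```
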
• Poisson distribution: expresses the number Y of events in a given unit of time, space, volume, or any other dimension. Example: modeling a phenomenon in which we are waiting for an occurrence (waiting for customers to arrive in a bank). Basic assumption: for small time intervals, the probability of an occurrence is proportional to the length of the waiting time. It has a single parameter λ > 0, the average number of events per unit of measurement. Probability mass function: $P(Y = k) = \frac{\lambda^k e^{-\lambda}}{k!}$, where k is the number of occurrences of an event and λ is the expected number of occurrences during the given interval. Properties: $\mu_Y = \lambda$ and $\sigma_Y^2 = \lambda$.
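
The Poisson probabilities can be sketched in the same way. The rate of 2 customers per interval is an invented example value, and the sum is truncated at k = 59, where the remaining probability is negligible for this rate.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson variable with expected count lam."""
    return lam**k * exp(-lam) / factorial(k)


lam = 2.0   # hypothetical: on average 2 customers arrive per interval
probabilities = [poisson_pmf(k, lam) for k in range(60)]

mean = sum(k * prob for k, prob in enumerate(probabilities))
variance = sum((k - mean) ** 2 * prob for k, prob in enumerate(probabilities))

print(f"P(no arrivals)={probabilities[0]:.4f}")
print(f"mean={mean:.4f}, variance={variance:.4f}")
```
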
• Normal distribution. Density: $\varphi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$. Let $X_1, X_2, \ldots, X_n$ be a simple random sample from a population with mean μ and variance σ²; if $X_i$ ~ N(μ, σ²), then $\bar{X}$ ~ N(μ, σ²/n). Central limit theorem: draw a simple random sample ($X_1, \ldots, X_n$) of size n from a population with mean μ and finite variance σ². When n is large, the sample average follows approximately a normal distribution regardless of the data distribution: $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
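
A small simulation can make the central limit theorem concrete. The sketch below draws repeated samples from a strongly skewed exponential population; the seed, the sample size of 50 and the 2000 repetitions are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(2010)   # fixed, arbitrary seed so the run is repeatable
n = 50                      # size of each simple random sample
repetitions = 2000          # number of samples drawn

# Exponential data with rate 1: population mean 1 and population SD 1.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(repetitions)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_sd = 1.0 / n**0.5   # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1)")
print(f"SD of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {theoretical_sd:.3f})")
```

A histogram or normal quantile plot of `sample_means` would look close to normal even though the individual observations are far from it.
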
• Bias and variance: sampling variability. The population mean μ of X is unknown. The mean $\bar{x}$ of a simple random sample is an estimate of μ; $\bar{X}$ is a random variable that varies in repeated sampling. The law of large numbers guarantees that as the size of a simple random sample increases, the sample mean $\bar{x}$ gets closer to the population mean μ. Unbiased statistic: a statistic used to estimate an unknown parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated. The variability of a statistic is described by the spread of its sampling distribution. The spread is determined by the sampling design and the sample size: larger samples have smaller spread.
• How precise is our estimate? We want to generalize findings from the sample to the general population, so the estimate must approximate the population value. A representative sample prevents the results for the sample from being biased, but the results are still subject to sampling variability: different samples from the same population will yield different results. Generalizing results from the sample to the study population therefore requires that we acknowledge sampling variability.
• Standard deviation ≠ standard error. The standard error measures the uncertainty in an estimate; the standard error of the mean is SEM = $\frac{\sigma}{\sqrt{n}}$, the standard deviation of the sampling distribution of the sample mean $\bar{X}$ around μ. The standard deviation (SD) of the observations measures the variability in the observations. Both are standard deviations, but the standard error shrinks with increasing sample size, in contrast to the standard deviation of the observations. The mean and SD are the preferred summary statistics for (normally distributed) data; the mean and 95% confidence interval are preferred for reporting an estimate and its measure of precision.
• Confidence intervals. When we estimate a parameter by calculating a sample statistic, there is a degree of uncertainty in our estimate. We can construct an interval around the sample mean $\bar{X}$ within which we expect the true population mean μ to lie with known probability (e.g. 95%). A (1-α)100% confidence interval for the mean is constructed so that, in repeated sampling, it contains the population mean (1-α)100% of the time. The confidence level or coverage probability is (1-α). With σ known: $\bar{X} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$. With σ unknown: $\bar{X} \pm t_{n-1,\alpha/2} \times \frac{s}{\sqrt{n}}$.
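
The σ-known interval can be sketched with the standard library. The example reuses the exam scores of the 163 female students from the descriptive statistics slide (mean 81.3, SD 19.4) and treats the sample SD as if it were σ, which is a simplifying assumption that is only reasonable because n is large; for small samples the t quantile would be needed instead, which the standard library does not provide.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sigma, n, level=0.95):
    """Interval mean +/- z * sigma / sqrt(n), assuming sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / sqrt(n)
    return mean - margin, mean + margin


# Assumption: the reported SD of 19.4 stands in for the population sigma.
low, high = z_confidence_interval(mean=81.3, sigma=19.4, n=163)
print(f"95% CI: {low:.2f} to {high:.2f}")
```
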
• Hypothesis testing. The null hypothesis (H0) assumes "no difference" or "no effect": the average … is equal in both treatment groups. The alternative hypothesis (HA) claims the opposite: the average … differs by treatment.

| Type of decision | H0 true | HA true |
|---|---|---|
| Accept H0 (p > α) | Correct decision (1-α) | Type II error (β) |
| Reject H0 (p < α) | Type I error (α) | Correct decision (1-β), the power |

• We assume H0 is true unless we can demonstrate, based on sample data at the desired level of confidence, that HA is true. The level of confidence is related to 2 potential types of statistical errors. Example: in a clinical trial we want to study the effect of an experimental drug (T) and compare it to a placebo (P). H0: effect of drug T = effect of P. HA: effect of drug T ≠ effect of P. Type I error (false positive): the concern of the regulators; the drug is not working but it will go to the market. Type II error (false negative): the concern of pharmaceutical companies; they could not prove that the new drug is working.
• Sensitivity and specificity:

| | Gold standard: positive (ill) | Gold standard: negative (not ill) |
|---|---|---|
| Test outcome positive | True positive (TP) | False positive (FP), corresponds to a Type I error |
| Test outcome negative | False negative (FN), corresponds to a Type II error | True negative (TN) |

Sensitivity: the proportion of ill people identified as being ill. Specificity: the proportion of non-ill people identified as being non-ill.
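
In terms of the four cells of this table, sensitivity is TP/(TP + FN) and specificity is TN/(TN + FP). The counts in the sketch below are invented purely for illustration.

```python
def sensitivity(true_positive, false_negative):
    """Proportion of ill people that the test identifies as ill."""
    return true_positive / (true_positive + false_negative)


def specificity(true_negative, false_positive):
    """Proportion of non-ill people that the test identifies as non-ill."""
    return true_negative / (true_negative + false_positive)


# Hypothetical counts: 100 ill and 200 non-ill people according to the gold standard.
tp, fn = 90, 10
tn, fp = 170, 30

print(f"sensitivity={sensitivity(tp, fn):.2f}")
print(f"specificity={specificity(tn, fp):.2f}")
```
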
• When are hypotheses needed? Hypotheses are not needed in descriptive studies. If any of the following terms appears in the research question (so the study is not simply descriptive), a hypothesis should be formulated: greater than, less than, causes, leads to, compared with, more likely than, associated with, related to, similar to, correlated with. The hypothesis should be clearly stated in advance.
• Principle of statistical testing: calculate a test statistic that measures the "distance" from the observed sample to the null hypothesis and whose distribution is known under the null hypothesis. Reject H0 when the test statistic t exceeds a chosen cut-off c (critical value) in magnitude or, equivalently, when the p-value stays below a chosen cut-off α. Safety principle: the cut-off is chosen such that the risk of making a Type I error is controlled at a prespecified significance level α, usually α = 0.05 (test performed at the 5% significance level). The power of the test (the probability of avoiding a Type II error, 1-β) is not controlled, so choose adequate designs and sufficiently large sample sizes.
• Critical value c: reject H0 when the test statistic t exceeds the chosen cut-off c in magnitude. p-value: the probability of finding a result for the test statistic at least as extreme as the observed result (in the direction of the alternative hypothesis), if the null hypothesis holds. [Figure: distribution of the test statistic with α = 0.05; the acceptance region lies between the critical values $c_L$ and $c_R$, with a rejection region of area α/2 in each tail.]
• Power: 1 − β = 1 − P(accept H0 | HA) = P(reject H0 | HA). For many testing problems H0 is formulated very precisely, but there are usually an infinite number of distributions consistent with HA. Standardized effect size: $(\mu_1 - \mu_0)/\sigma$. With what probability must the statistical test detect this smallest relevant difference? [Figure: power example, ~91% chance of finding an association of that size or greater.]
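
To make the link between effect size, sample size and power concrete, here is a minimal sketch for a two-sided one-sample z-test with known σ. The choice of test, the effect size of 0.5 and the sample sizes are assumptions for illustration; the sketch does not try to reproduce the 91% in the slide.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test with known sigma.

    effect_size is the standardized effect (mu1 - mu0) / sigma.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)  # mean of the test statistic under HA
    # Probability of landing in the upper or the lower rejection region under HA.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


# Power grows with the sample size for a fixed (hypothetical) effect size.
for n in (10, 20, 40, 80):
    print(n, round(z_test_power(0.5, n), 3))
```

When the effect size is zero the function returns α, which is the Type I error rate.
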
• One-sided versus two-sided testing: decide between two-sided and one-sided testing prior to data analysis, and avoid one-sided tests unless there are really good reasons for using them (only one direction of the association is clinically or biologically relevant). It is never wrong to use a two-sided test where a one-sided test is applicable: at most there is a slight loss of power.
• Multiple and post hoc hypotheses, the multiple testing problem: inflated rate of false positive conclusions (Type I errors). Assume we perform 3 independent comparisons between 2 groups, each conducted with α = 0.05. If H0 holds in each case, the probability that all three tests correctly retain H0 is (0.95)³ = 0.857 → the chance of finding at least one false positive (statistically significant) test increases to 14.3% (1 − 0.857 = 0.143, not 0.05). Adjusting for multiple hypotheses is especially important when the consequences of making a false positive error are large, e.g. mistakenly concluding that an ineffective treatment is beneficial. Adjustments can be made → false discovery rate control.
• Part 4: Statistical tests
• Part 4 outline. Continuous data: parametric statistics, non-parametric statistics. Categorical data: ordinal versus nominal. Types of testing: one-sample tests, two dependent groups, two independent groups, more than two groups, controlling for covariates.
• Dependent versus independent. Dependent: each subject is measured twice (time x and time y, or treatment A and treatment B), e.g. volunteers 1 to 5 each have a weight under treatment A (x1A, …, x5A) and a weight under treatment B (x1B, …, x5B). Independent: each subject receives one treatment, e.g. volunteers 1 to 5 receive treatment A (weights x1A, …, x5A) and volunteers 6 to 10 receive treatment B (weights x6B, …, x10B).
• Parametric statistics: assume that the data come from a type of probability distribution and make inferences about the parameters of that distribution. They require assumptions (e.g. Normal distribution); if these are correct, parametric methods produce more accurate and precise estimates and generally have more statistical power. Example: independent-sample t-test. Assumptions: independent observations; Population 1 → X1i ~ N(μ1, σ²), Population 2 → X2i ~ N(μ2, σ²). H0 : μ1 = μ2 → under H0 the two distributions are equal.
• Non-parametric statistics, rank tests: no specific assumption about the population distribution is required. Example: statistics based on ranks. Let X1, …, Xn denote a sample of n observations; the rank of observation Xj is defined as $R_j = R(X_j)$ = number of observations in the sample ≤ Xj $= \sum_{i=1}^{n} I(X_i \le X_j)$. The smallest observation gets rank 1, the second smallest rank 2, …, the largest observation gets rank n. In case of ties (a tie is a pair of equal observations), the ranks of the tied observations are defined as the average of the ranks they would occupy without ties. These are called mid-ranks.
• Rank tests, example:

| Observation | 2 | 8 | 12 | 12 | 15 | 39 |
|---|---|---|---|---|---|---|
| Rank | 1 | 2 | (3+4)/2 | (3+4)/2 | 5 | 6 |

Properties of rank-transformed observations: they only depend on the ordering of the observations; they are insensitive to outliers (robust); the distribution of the ranks does not depend on the distribution of the observations.
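
The mid-rank rule can be checked on the six observations of the example. This is a minimal sketch; the helper name `mid_ranks` is made up for the illustration.

```python
def mid_ranks(values):
    """Rank observations from smallest to largest, averaging the ranks of ties."""
    ranks = []
    for x in values:
        n_smaller = sum(v < x for v in values)
        n_equal = sum(v == x for v in values)  # includes x itself
        # Tied observations occupy ranks n_smaller+1 .. n_smaller+n_equal;
        # the mid-rank is the average of those positions.
        ranks.append(n_smaller + (n_equal + 1) / 2)
    return ranks


observations = [2, 8, 12, 12, 15, 39]
ranks = mid_ranks(observations)
print(ranks)  # [1.0, 2.0, 3.5, 3.5, 5.0, 6.0]
```
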
• Non-parametric statistics, permutation tests: the reference distribution of a characteristic of interest is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points. Example: a company has a new training program and wishes to evaluate whether the new method is better than the traditional one. To assess the effect of the new method, they set up an experiment with 7 new employees. Four of them are randomly assigned to the new training method, and the other three receive the old training method. Observed data: New: 37, 49, 55, 57; Traditional: 23, 31, 46. Rearranging the group labels gives $\binom{7}{4} = \frac{7!}{4!\,3!} = 35$ possible permutations.
• Permutation tests: to verify whether there is a difference in means of a continuous measurement in 2 independent populations. H0 : F1(x) = F2(x) for all x; HA : μ1 > μ2. Test statistic: $T = \bar{X}_1 - \bar{X}_2$. Permutation null distribution, example: we have 35 possible permutations (each having a t*-value); the collection of all the t*-values is the permutation null distribution.
• Permutation test, example: test statistic $T = \bar{X}_1 - \bar{X}_2$ → t = 49.5 − 33.3 = 16.2. Permutation null distribution of the 35 possible permutations: under the null hypothesis all t*-values are equally likely. H0 will be rejected for large T (T > c, critical value), where c controls the Type I error rate at α: P(T > c | H0) ≤ α.
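
The training-program example is small enough to enumerate completely. The sketch below builds the permutation null distribution from the seven observed values and computes a one-sided p-value as the share of rearrangements whose t* is at least as large as the observed t. The p-value itself is not reported on the slides, so it is an addition here.

```python
from itertools import combinations

new = [37, 49, 55, 57]
traditional = [23, 31, 46]


def mean(xs):
    return sum(xs) / len(xs)


def diff_in_means(group1, group2):
    return mean(group1) - mean(group2)


pooled = new + traditional
t_observed = diff_in_means(new, traditional)

# Every way of labelling 4 of the 7 employees as "new" gives one t*-value.
null_distribution = []
for idx in combinations(range(len(pooled)), len(new)):
    group1 = [pooled[i] for i in idx]
    group2 = [pooled[i] for i in range(len(pooled)) if i not in idx]
    null_distribution.append(diff_in_means(group1, group2))

# One-sided p-value (HA: mu1 > mu2); the small tolerance guards against rounding.
n_extreme = sum(t >= t_observed - 1e-12 for t in null_distribution)
p_value = n_extreme / len(null_distribution)
print(round(t_observed, 1), len(null_distribution), n_extreme, round(p_value, 3))
```

With only 35 rearrangements the smallest attainable p-value is 1/35, which shows how little resolution a permutation test has in very small samples.
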
• Parametric versus non-parametric tests. Parametric tests: the data are sampled from a Normally distributed population OR the sample size is large (CLT). Smaller sample size: outliers or a skewed distribution can be problematic → transformation or non-parametric tests (permutation or rank tests). Permutation tests: very flexible. Non-parametric rank tests: useful when there is no meaningful measurement scale (pain score, Apgar score, …); be careful with the formulation of H0 and the interpretation of the analysis; less power.
• Categorical / discrete data: the set of all possible values can be enumerated. Ordinal data: ordered categories (age group, pain assessment from no to severe, Likert scales: agree strongly, agree, neutral, disagree, disagree strongly). Nominal data: categories have no natural order, sometimes called qualitative data (gender, race, hair color). Counts: variables are represented by frequencies. Proportions / percentages: ratio of counts, e.g. binary or dichotomous data have exactly two possible outcomes (success / failure), and we count the number of successes in the number of trials.
• One-sample t-test: to verify whether the mean of a continuous measurement deviates from a given value μ0. H0 : μ = μ0; HA : μ ≠ μ0. Test statistic: t-distributed with n − 1 degrees of freedom (df). Assumptions: independent observations; Normally distributed observations or large sample.
• One-sample tests, 1-way contingency tables: one categorical variable with J ≥ 2 categories. Example: number of students in each of the three main subjects in the 1st master psychology (2003-2004). Suppose that in the population, the true proportions are: [table not preserved in the transcript]
• X² test: one categorical variable with J ≥ 2 categories. H0 : pj = πj for all j (or, for frequencies, nj = μj); HA : pj ≠ πj for at least one j. Statistic: $X^2 = \sum_{j=1}^{J} (n_j - \mu_j)^2 / \mu_j$. Example: df = J − 1 = 2 and P < .0001, strongly suggesting that the null hypothesis should be rejected.
• Paired sample t-test: to verify whether 2 continuous measurements, obtained from paired subjects, are the same on average. H0 : μ1 = μ2; HA : μ1 ≠ μ2 → calculate the differences Y = X1 − X2 and use the one-sample t-test to verify H0 : μ = 0 versus HA : μ ≠ 0, where μ is the mean of Y. Assumptions: independent differences; Normally distributed differences or large sample. Rules of thumb (source: Moore & McCabe, 'Introduction to the Practice of Statistics'): n ≥ 40: large sample; n ≥ 15: t-test is fine unless the distribution is very skewed or there are outliers; n < 15: the data should be approximately Normally distributed, a very skewed distribution or outliers are problematic.
• Wilcoxon signed rank test: compare 2 dependent samples → the difference variable Y = X1 − X2. With Yi+ the observations on the positive differences (i = 1, …, n+) and Yi− the observations on the negative differences (i = 1, …, n−), then H0 : P(Y− < Y+) = ½; HA : P(Y− < Y+) > ½. Statistic: V, the sum of the ranks of |Yi| over the positive differences.
• Wilcoxon signed rank test, example: two stories were narrated to children with reading disorders; story 1 was not illustrated whereas story 2 was illustrated.

| Child | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Story 1 | 0.40 | 0.72 | 0.00 | 0.36 | 0.55 |
| Story 2 | 0.77 | 0.49 | 0.66 | 0.28 | 0.38 |
| Difference (Yi) | 0.37 | -0.23 | 0.66 | -0.08 | -0.17 |
| Ranks of \|Yi\| | 4 | 3 | 5 | 1 | 2 |
| Signed ranks | 4 | -3 | 5 | -1 | -2 |

V = 9, n = 5, p = 0.406. From this small sample we could not conclude that children with reading disorders can tell a story better when the story was illustrated.
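
The numbers in this example can be reproduced by hand or with a few lines of code. The sketch below assumes that V is the sum of the ranks belonging to the positive differences and that the p-value is the exact one-sided probability P(V ≥ 9) when every sign pattern is equally likely under H0. Both assumptions agree with the reported V = 9 and p = 0.406.

```python
from itertools import product

story1 = [0.40, 0.72, 0.00, 0.36, 0.55]
story2 = [0.77, 0.49, 0.66, 0.28, 0.38]

# Differences as in the table: story 2 minus story 1.
diffs = [b - a for a, b in zip(story1, story2)]

# Rank the absolute differences (there are no ties or zeros in this example).
order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
ranks = [0] * len(diffs)
for rank, i in enumerate(order, start=1):
    ranks[i] = rank

v_observed = sum(r for r, d in zip(ranks, diffs) if d > 0)

# Exact null distribution: each of the 2^n sign patterns is equally likely.
n_at_least = 0
for signs in product([0, 1], repeat=len(ranks)):
    v = sum(r for r, s in zip(ranks, signs) if s)
    if v >= v_observed:
        n_at_least += 1
p_one_sided = n_at_least / 2 ** len(ranks)
print(ranks, v_observed, p_one_sided)
```
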
• Models for matched pairs: for comparing categorical responses for 2 samples when each sample has the same subjects or when a natural pairing exists between each subject in one sample and a subject from the other sample. The McNemar test compares proportions in paired studies. H0 : π1+ = π+1; HA : π1+ ≠ π+1.

| Before \ After | Yes | No | Total |
|---|---|---|---|
| Yes | n11 | n12 | n1+ |
| No | n21 | n22 | n2+ |
| Total | n+1 | n+2 | n |

• Independent sample t-test: to verify whether the mean of a continuous measurement is the same in 2 independent populations. H0 : μ1 = μ2 versus HA : μ1 ≠ μ2. Test statistic: t when the measurement variance is equal in the 2 groups; t* when the measurement variance differs between the 2 groups. Assumptions: independent observations; Normally distributed observations or large sample in each group. With small but equal sample sizes (n1 = n2 = 5) and comparable shapes of the distributions, we can still rely on t-test procedures.
• Independent sample t-test, continued. Measurement variance equal in the 2 groups: the SE of the mean difference can be estimated as $SE = s_p \sqrt{1/n_1 + 1/n_2}$, with $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$. Measurement variance unequal in the 2 groups: the SE of the mean difference can be estimated as $SE^* = \sqrt{s_1^2/n_1 + s_2^2/n_2}$. The (1 − α)100% confidence interval for μ1 − μ2 is the observed mean difference plus or minus the appropriate t-quantile times SE (equal variances) versus SE* (unequal variances).
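
A minimal sketch of the two standard error estimates, assuming the usual textbook estimators (pooled variance for equal variances, separate group variances otherwise). The function names and the data are hypothetical.

```python
from math import sqrt
from statistics import variance  # sample variance with n - 1 in the denominator


def pooled_se(x1, x2):
    """SE of the mean difference assuming equal variance in the 2 groups."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return sqrt(pooled_var * (1 / n1 + 1 / n2))


def unequal_variance_se(x1, x2):
    """SE of the mean difference without assuming equal variances."""
    return sqrt(variance(x1) / len(x1) + variance(x2) / len(x2))


# Hypothetical measurements in two independent groups of unequal size.
group_a = [5.1, 4.8, 6.0, 5.5, 5.2, 4.9]
group_b = [6.3, 7.9, 5.0, 8.4]
print(pooled_se(group_a, group_b), unequal_variance_se(group_a, group_b))
```

With equal group sizes the two estimates coincide; they differ when both the sizes and the variances differ between the groups.
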
• Mann-Whitney (U) test, Wilcoxon rank-sum test: compare 2 independent samples. H0 : F1(x) = F2(x) for all x; HA : P(X1 < X2) ≠ ½, where X1 and X2 have distributions F1 and F2, respectively. If X1 and X2 are continuous random variables, the test may be thought of as testing the null hypothesis that the probability of an observation from one population exceeding an observation from the second population is 0.5; this implies P(X1 < X2) = P(X1 > X2) = ½ → the test statistics are based on this principle.
• Is the Wilcoxon rank-sum test the nonparametric alternative for the independent-sample t-test? Remember: Wilcoxon rank-sum test: H0 : F1(x) = F2(x) for all x (the 2 distributions are equal); HA : P(X1 < X2) ≠ ½ → the ranks cannot be used to estimate the mean! Independent sample t-test: H0 : μ1 = μ2; HA : μ1 ≠ μ2.
• 2x2 contingency tables. Example: patient characteristics at the onset of first-line treatment with gefitinib or chemotherapy.

Frequency:

| Treatment | ECOG PS < 2 | ECOG PS ≥ 2 | Total |
|---|---|---|---|
| Gefitinib | 70 | 17 | 87 |
| Chemo | 57 | 4 | 61 |
| Total | 127 | 21 | 148 |

Conditional distribution of ECOG PS status given treatment:

| Treatment | ECOG PS < 2 | ECOG PS ≥ 2 | Total |
|---|---|---|---|
| Gefitinib | 0.805 | 0.195 | 1.00 |
| Chemo | 0.934 | 0.066 | 1.00 |

Two variables are said to be statistically independent if the conditional distributions of Y (Eastern Cooperative Oncology Group performance status) are identical at each level of X (treatment).
• Testing independence, Pearson chi-square test. H0 : πij = πi+ π+j for all i and j (or, for frequencies, nij = μij); HA : πij ≠ πi+ π+j for at least one cell. Statistic: $X^2 = \sum_{i}\sum_{j} (n_{ij} - \mu_{ij})^2 / \mu_{ij}$. Example: X² = 4.964, df = 1; ECOG PS status and treatment are significantly associated. The proportion of patients with a poor ECOG performance status (≥ 2) was higher in the first-line gefitinib group (20%) than in the first-line chemotherapy group (7%; P = 0.026).
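
The reported X² = 4.964 and P = 0.026 can be recomputed from the ECOG PS by treatment frequencies (70, 17 / 57, 4). The sketch below uses the Pearson statistic without continuity correction, with expected counts taken from the margins. The p-value shortcut via `erfc` is valid only for df = 1.

```python
from math import erfc, sqrt

# Rows: gefitinib, chemotherapy; columns: ECOG PS < 2, ECOG PS >= 2.
table = [[70, 17], [57, 4]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n  # count expected under independence
        chi2 += (observed - expected) ** 2 / expected

# For df = 1, X^2 is the square of a standard Normal variable, so the
# upper-tail probability equals erfc(sqrt(chi2 / 2)).
p_value = erfc(sqrt(chi2 / 2))
print(round(chi2, 3), round(p_value, 3))
```
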
• Testing independence, Fisher's exact test. For small samples, Fisher's exact test assumes that the row and column totals (the margins) are fixed (hypergeometric distribution). When this assumption is not met (most cases), Fisher's exact test is very conservative, resulting in a Type I error rate below 0.05. H0 : θ = 1; HA : θ ≠ 1.

| Treatment | Adeno | Nonadeno | Total |
|---|---|---|---|
| Gefitinib | 85 | 2 | 87 |
| Chemo | 58 | 3 | 61 |
| Total | 143 | 5 | 148 |

Two-sided p-values: Fisher's exact test p = 0.403; chi-square test p = 0.385.
• Large samples: with very large sample sizes the Pearson chi-square test will reject almost any null hypothesis, even if the deviation of the observed from the expected counts is of little importance → use the Gini index (its value equals the proportion of observations that would have to be moved from one cell to another in order for the observed counts to equal the expected counts). Small samples: inferences based on the chi-square distribution become questionable when the expected counts in some cells become too small (below 5), even when the total sample size is large → use exact solutions (Fisher's exact test).
• One-way analysis of variance (ANOVA): to verify whether the mean of a continuous measurement is the same in 2 or more independent populations. H0 : μ1 = μ2 = … = μk versus HA : at least 1 of the population means differs. Test statistic: $F = \frac{\text{Between MSE}}{\text{Within MSE}} \overset{H_0}{\sim} F_{k-1,\,n-k}$. Assumptions: independent observations; Normally distributed observations or large sample within each group (Q-Q plots); equal variance in each group (boxplots or Levene's test).
• ANOVA principle: is the variation between groups large compared to the variation within groups? Consider k groups with ni observations each, where Yij is the jth observation in the ith group: $\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{Y}_i - \bar{Y})^2$, i.e. Total Sum of Squares = within SS + between SS.
• ANOVA table:

| Source | Sum of Squares (SS) | df | Mean Squared Error (MSE) | F |
|---|---|---|---|---|
| Between | $SS_B = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{Y}_i - \bar{Y})^2$ | k − 1 | $MSE_B = SS_B / (k-1)$ | $MSE_B / MSE_W$ |
| Within | $SS_W = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2$ | n − k | $MSE_W = SS_W / (n-k)$ | |
| Total | $\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2$ | | | |

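
The decomposition and the F statistic can be verified numerically. The sketch below uses three small hypothetical groups (not data from the presentation) and follows the definitions of the ANOVA table.

```python
# Hypothetical data: k = 3 groups with 3 observations each.
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [8.0, 9.0, 10.0],
}

all_values = [y for ys in groups.values() for y in ys]
n = len(all_values)
k = len(groups)
grand_mean = sum(all_values) / n

ss_between = 0.0
ss_within = 0.0
for ys in groups.values():
    group_mean = sum(ys) / len(ys)
    # Each observation contributes (group mean - grand mean)^2 to the between SS.
    ss_between += len(ys) * (group_mean - grand_mean) ** 2
    ss_within += sum((y - group_mean) ** 2 for y in ys)

ss_total = sum((y - grand_mean) ** 2 for y in all_values)

mse_between = ss_between / (k - 1)
mse_within = ss_within / (n - k)
f_statistic = mse_between / mse_within
print(ss_between, ss_within, ss_total, f_statistic)
```
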
• Deviations from the assumptions: one-way analysis of variance is robust against lack of normality → in case of important deviations from a Normal distribution, use the nonparametric Kruskal-Wallis test or transformations. ANOVA is not very sensitive to the assumption of homogeneity of variances (perform Levene's test at the 1% significance level) → heterogeneity of variances has little impact when the group level sample sizes are approximately equal (the Type I error rate is slightly increased); with important heterogeneity and markedly unequal group level sample sizes, weighted least squares regression may be used, weighting each observation by the inverse group level standard deviation.
• Post-hoc analysis: if ANOVA detects no difference, we conclude that there is insufficient evidence of a difference in means; if ANOVA detects a difference → post hoc analysis to investigate where the difference is. DO NOT perform all pairwise comparisons using independent samples t-tests → multiple testing problem. Assume we perform 3 different t-tests, each conducted with α = 0.05. If H0 holds in each case, the probability that all three tests correctly retain H0 is (0.95)³ = 0.857 (assuming independence of the tests) → the probability that at least one of the three tests leads to the conclusion HA when H0 holds in each case would be 1 − 0.857 = 0.143 (not 0.05). The level of significance and the power for a family of tests differ from those of an individual test.
• Family-wise error rate αE: the probability of making at least 1 false discovery (Type I error) among all the hypotheses when performing multiple pairwise tests → we should correct for the risk of false detections. Most procedures for multiple testing are designed to control the risk of at least 1 false detection at αE, assuming that all k null hypotheses are true. When the k tests are independent, each with significance level α, then αE = P(at least 1 Type I error) = $1 - (1 - \alpha)^k \approx k\alpha$; the family-wise error rate increases with the number of tests.
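
A quick numerical check of the family-wise error rate for independent tests, together with the approximation kα. The function name is made up for the illustration.

```python
def family_wise_error_rate(alpha, k):
    """P(at least 1 Type I error) for k independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** k


alpha = 0.05
for k in (1, 3, 10, 20):
    exact = family_wise_error_rate(alpha, k)
    approx = k * alpha  # upper bound; close to the exact value when k * alpha is small
    print(k, round(exact, 3), round(approx, 3))
```

For k = 3 this gives the 0.143 used in the post-hoc example, and the gap between the exact value and kα widens as k grows.
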
• Multiple comparison procedures that control the family-wise error rate. Bonferroni procedure: conservative test, makes fewer Type I errors than allowed for (and thus more Type II errors); only applicable when the effects to be investigated are identified in advance of the data analysis. Tukey procedure: preferred method when only pairwise comparisons are to be made. Scheffé procedure: preferred method when the family of interest is the set of all possible contrasts among the factor level means.
• Rules of thumb: never interpret a large p-value as indicating absence of association; never interpret a small p-value as indicating an important association; report p-values in combination with an effect estimate and confidence interval! This allows for judging whether the effect is practically significant. In some cases, it may be advisable to determine equivalence intervals prior to data analysis.
• Kruskal-Wallis rank test: k-sample problem, compare more than 2 independent samples. H0 : F1(x) = F2(x) = … = Fk(x) for all x; HA : the observations in some populations are systematically larger than in other populations, e.g. P(X1 < X2) ≠ ½. Assumptions: the observations in each group come from populations with the same shape of distribution.
• Kruskal-Wallis rank test: the rank test statistic is basically an MSE-between based on the ranks. Rank all observations in the combined sample and let Rij denote the rank of Xij (i = 1, …, k; j = 1, …, ni). Kruskal-Wallis test statistic: $KW = \frac{12}{n(n+1)} \sum_{i=1}^{k} n_i \left(\bar{R}_i - \frac{n+1}{2}\right)^2$, where $\bar{R}_i$ is the average of the ranks Rij (j = 1, …, ni) in the ith group.
• Kruskal-Wallis rank test: when H0 is rejected → at least 2 groups differ → pairwise comparisons with the Wilcoxon rank sum statistic or Mann-Whitney statistic; alternative hypothesis in terms of probabilities: HA : P(X1 > X2) … Family-wise error rate αE → we should correct for the risk of false detections. Bonferroni correction: when m tests must be performed simultaneously, each of the tests must be performed at α = αE / m; equivalently, multiply each p-value by m before interpreting.
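
The two equivalent forms of the Bonferroni correction can be written in a few lines. The p-values below are hypothetical, and capping the adjusted p-values at 1 is a common convention added here, not something stated on the slide.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


alpha_family = 0.05
raw_p_values = [0.010, 0.020, 0.300]  # hypothetical pairwise comparisons
m = len(raw_p_values)

# Form 1: compare adjusted p-values with the family-wise level.
adjusted = bonferroni_adjust(raw_p_values)
reject_adjusted = [p <= alpha_family for p in adjusted]

# Form 2: compare raw p-values with alpha_family / m.
reject_per_test = [p <= alpha_family / m for p in raw_p_values]
print(adjusted, reject_adjusted, reject_per_test)
```
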
• Analysis of Covariance (ANCOVA): adjustment for a confounder (e.g. age). Just like in ANOVA we have a treatment effect (consider for example 3 treatments). We add the variable age to our model → adjustment for a confounder.
• Three-way contingency tables: in studying the effect of an explanatory variable X on a response variable Y, one should control for covariates that can influence that relationship. Example: peginterferon alfa for hepatitis C, virologic response by genotype and treatment.

| Genotype | Treatment | Response: Yes | Response: No | Odds ratio |
|---|---|---|---|---|
| 1 | A | 138 | 160 | Conditional odds ratio θ1 |
| 1 | B | 103 | 182 | |
| 2 | A | 106 | 34 | Conditional odds ratio θ2 |
| 2 | B | 88 | 57 | |
| Total | A | 244 | 194 | Marginal odds ratio |
| Total | B | 191 | 239 | |

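
The conditional and marginal odds ratios follow directly from the counts in the table. The slide names the odds ratios but does not report their values, so the numbers printed by this sketch are computed here, not quoted from the presentation.

```python
def odds_ratio(yes_a, no_a, yes_b, no_b):
    """Odds of response under treatment A divided by the odds under treatment B."""
    return (yes_a * no_b) / (no_a * yes_b)


theta_1 = odds_ratio(138, 160, 103, 182)         # genotype 1
theta_2 = odds_ratio(106, 34, 88, 57)            # genotype 2
theta_marginal = odds_ratio(244, 194, 191, 239)  # genotypes combined

print(round(theta_1, 2), round(theta_2, 2), round(theta_marginal, 2))
```

The marginal odds ratio need not lie between the conditional ones or equal either of them, which is why the stratified analysis is reported alongside it.
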
• Breslow-Day test for testing homogeneity of odds ratios: H0 states that the odds ratio between X and Y is the same in the different categories of Z. It is a test of homogeneous association.
• Cochran-Mantel-Haenszel test of conditional independence: conditional XY independence given Z in a 2 × 2 × K table. Under H0 the response is conditionally independent of the treatment in any given stratum. The test is inappropriate when the association varies dramatically among the partial tables.
• Cochran-Mantel-Haenszel test of conditional independence, example (colon cancer; Bokemeyer et al, 2008: M&M and p 667): ECOG PS-adjusted OR = 1.52 (95% CI, 0.98-2.36; p = 0.064, CMH test), so the null hypothesis that the response is independent of the treatment within the ECOG PS strata is not rejected at the 5% level. Efficacy table: response (yes / no) by treatment (Cet. + FOL versus FOLFOX-4) within ECOG PS strata 0, 1 and 2, with conditional odds ratios θ1, θ2 and θ3. Totals: Cet. + FOL 77 yes, 92 no; FOLFOX-4 60 yes, 108 no; marginal odds ratio = 1.51.