3. T - Test
• A t-test is any statistical hypothesis test in which the test statistic follows a Student's t-
distribution under the null hypothesis. It can be used to determine if two sets of data are
significantly different from each other.
• In probability and statistics, Student's t-distribution (or simply the t-distribution) is any
member of a family of continuous probability distributions that arises when estimating
the mean of a normally distributed population in situations where the sample size is
small and population standard deviation is unknown.
• Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed as $N(\mu, \sigma^2)$, i.e. this is a
sample of size n from a normally distributed population with expected value $\mu$ and
unknown variance $\sigma^2$.
4. Assumptions
• Interval or ratio scale of measurement
• X follows a normal distribution with mean μ and variance σ²
• Z and s are independent, where
Z = standard normal variable
s = the ratio of the sample standard deviation to the population standard deviation
• Generally used when sample size is <30.
One sample T test statistic
• The random variable
$$T = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
where $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$,
has a Student's t-distribution with n − 1 degrees of freedom.
5. SAS Code for One sample T test
Example:
title 'One-Sample t Test';
data time;
input time @@;
datalines;
43 90 84 87 116 95 86 99 93 92
121 71 66 98 79 102 60 112 105 98
;
run;
proc ttest h0=80 alpha=0.05;
var time;
run;
6. Output
Here, we see that the p-value (0.0329) is less than α (0.05), hence we reject $H_0$: μ = 80.
We conclude that the population mean of the variable time is not equal to 80.
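The one-sample result can be cross-checked outside SAS. The following is a minimal Python sketch using only the standard library and the same 20 observations as the SAS example. The helper names (`t_pdf`, `two_sided_p`, `one_sample_t`) are made up for this illustration, and the p-value is approximated by numerically integrating the t density with Simpson's rule, which is an assumption of this sketch and not how PROC TTEST computes it.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Approximate two-sided p-value using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3            # P(0 < T < |t|)
    return 2 * (0.5 - area)         # both tails

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean = mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)    # sample standard deviation (divisor n - 1)
    return (mean - mu0) / (s / math.sqrt(n)), n - 1

# Same observations as the SAS data step above
times = [43, 90, 84, 87, 116, 95, 86, 99, 93, 92,
         121, 71, 66, 98, 79, 102, 60, 112, 105, 98]

t_stat, df = one_sample_t(times, 80)
p_value = two_sided_p(t_stat, df)
print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.4f}")
```

The statistic is exactly the T defined earlier, with μ replaced by the hypothesised value 80 and n − 1 = 19 degrees of freedom.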
7. Types of Two sample T – Test
• Independent Two Sample T – Test:
The independent samples t-test is used when two separate sets
of independent and identically distributed samples are obtained,
from each of the two populations being compared.
• Dependent (Paired) Two Sample T – Test:
A typical example of the repeated measures t-test would be
where subjects are tested prior to a treatment, say for high blood
pressure, and the same subjects are tested again after treatment
with a blood-pressure lowering medication.
8. Assumptions of Independent two sample t-test
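Given two groups 1 and 2, the pooled-variance form of this test is applicable when:
• Each sample comes from an approximately normally distributed population
• The two samples have the same variance
• Samples have been randomly drawn independent of each other
Equal sample sizes (that is, the same number n of participants in each group) simplify the formulas but are not strictly required.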
Testing Assumptions
• Normality: Shapiro–Wilk or Kolmogorov–Smirnov test, or it can be assessed graphically using a normal quantile plot.
• If normality is rejected, we use the Wilcoxon test.
• If the two groups are normally distributed, we check equality of variance with the F-test.
• If both assumptions are satisfied, we go for the two sample T-test.
• If the variances of the two groups are not equal, we use the Welch test.
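The equality-of-variance check can be sketched as a variance ratio. This is a minimal Python illustration, assuming the folded form of the F statistic (larger sample variance in the numerator). The function name `variance_ratio_f` is hypothetical, and the two groups are the Ind and Aus weights used in the SAS example of this deck. Only the statistic and its degrees of freedom are computed, since a p-value would need an F-distribution routine that the standard library does not provide.

```python
import statistics

def variance_ratio_f(sample1, sample2):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    v1 = statistics.variance(sample1)   # sample variance, divisor n - 1
    v2 = statistics.variance(sample2)
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    return v2 / v1, len(sample2) - 1, len(sample1) - 1

ind = [85, 70, 64, 87, 76, 95, 67, 87, 93, 82]
aus = [90, 95, 103, 107, 95, 112, 98, 92, 98, 115]

f_ratio, df_num, df_den = variance_ratio_f(ind, aus)
print(f"F = {f_ratio:.3f} on ({df_num}, {df_den}) degrees of freedom")
```

A ratio close to 1 is consistent with equal variances; a large ratio points towards the Welch test.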
10. Independent Two sample T – test in SAS
Example:
data wt;
input group $ wt @@;
datalines;
Ind 85 Ind 70 Ind 64 Ind 87 Ind 76 Ind 95 Ind 67 Ind 87 Ind 93 Ind 82
Aus 90 Aus 95 Aus 103 Aus 107 Aus 95 Aus 112 Aus 98 Aus 92 Aus 98 Aus 115
;
run;
proc ttest h0=0 alpha=0.05;
class group;
var wt;
run;
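For comparison, the pooled and Welch statistics for the same two groups can be computed by hand. The sketch below is a minimal Python illustration using only the standard library. The function name `two_sample_t` and the choice to take the difference as Aus minus Ind are assumptions of this example; the Welch degrees of freedom use the Satterthwaite approximation.

```python
import math
import statistics

def two_sample_t(x, y):
    """Pooled-variance and Welch t statistics for mean(x) - mean(y)."""
    n1, n2 = len(x), len(y)
    m1, m2 = statistics.mean(x), statistics.mean(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)

    # Pooled test: assumes equal population variances
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Welch test: variances estimated separately, Satterthwaite df
    se2 = v1 / n1 + v2 / n2
    t_welch = (m1 - m2) / math.sqrt(se2)
    df_welch = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

    return {"t_pooled": t_pooled, "df_pooled": n1 + n2 - 2,
            "t_welch": t_welch, "df_welch": df_welch}

ind = [85, 70, 64, 87, 76, 95, 67, 87, 93, 82]
aus = [90, 95, 103, 107, 95, 112, 98, 92, 98, 115]

result = two_sample_t(aus, ind)
print(result)
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ.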
12. Paired Two sample T – test in SAS
Example:
data wt;
input Before After @@;
datalines;
120 128 140 132 126 118 124 131 128 125 130 132 130 131 140 141 126 129 118 127
135 137 127 135
;
run;
proc ttest data=wt sides=2 alpha=0.05 h0=0;
title "Paired sample t-test";
paired Before * After;
run;
Output
[Output table: The TTEST Procedure, Difference: Before - After]
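The paired test is a one-sample t-test on the within-subject differences. A minimal Python sketch, using the same 12 Before/After pairs as the SAS data step (read two values at a time), is given below; the function name `paired_t` is made up for this illustration.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for H0: mean(before - after) = 0."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

before = [120, 140, 126, 124, 128, 130, 130, 140, 126, 118, 135, 127]
after  = [128, 132, 118, 131, 125, 132, 131, 141, 129, 127, 137, 135]

t_paired, df_paired = paired_t(before, after)
print(f"t = {t_paired:.2f}, df = {df_paired}")
```

The sign of t follows the direction of the difference, here Before minus After as in the PAIRED statement.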
13. ANOVA
ANOVA – Analysis Of Variance
Analysis of variance (ANOVA) is a collection of statistical models used to
analyze the differences among group means.
14. One – Way ANOVA
• The mathematical model that describes the relationship between the response and
treatment for the one way ANOVA is given by
$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, i = 1, 2, …, k, j = 1, 2, …, $n_i$
where, $Y_{ij}$ : j-th observation on i-th treatment,
µ : common effect for the whole experiment,
$\alpha_i$ : i-th treatment effect,
$\varepsilon_{ij}$ : random error.
Hypothesis of one – way ANOVA
H0: The means of all the groups are equal.
Vs
H1: Not all of the group means are the same.
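The F statistic that tests these hypotheses compares the between-group and within-group mean squares. The following is a minimal Python sketch of that computation; the function name `one_way_anova` and the three small groups are hypothetical toy values chosen only to make the arithmetic easy to follow, not data from this deck.

```python
import statistics

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k, n_total = len(groups), len(all_values)

    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)

    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical toy data: three treatments, three observations each
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_b, df_w = one_way_anova(toy_groups)
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) degrees of freedom")
```

A large F indicates that the group means differ by more than the within-group variation would explain.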
15. Assumptions of one way-ANOVA
• The normality assumption:
dependent variable should be approximately normally distributed
• The homogeneity of variance assumption:
variance of each group should be approximately equal
• The independence assumption:
observations should be independent of each other
Example
• Suppose we are testing a new drug to see if it helps reduce the time to recover from a
fever. We decide to test the drug on three different races (Caucasian, African American,
and Hispanic). We randomly select 10 test subjects from each of those races, so altogether
we have 30 test subjects.
• The response variable is the time in minutes after taking the medicine before the fever is
reduced.
18. Two – Way ANOVA
• Two-way ANOVA is a type of study design with one numerical outcome variable
and two categorical explanatory variables.
• Mathematical model of two-way ANOVA is as follows
$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$
where µ is overall mean effect
$\alpha_i$ is effect due to i-th level of first factor
$\beta_j$ is effect due to j-th level of second factor
$\gamma_{ij}$ is effect due to interaction between i-th level of first factor and j-th level of second factor
$\varepsilon_{ijk}$ is the random error for the k-th observation at levels i and j
Assumptions:
• The populations from which the samples were obtained must be normally or approximately
normally distributed.
• The samples must be independent.
• The variances of the populations must be equal.
• The groups must have the same sample size.
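For a balanced design (equal sample size in every cell), the total variation splits into sums of squares for the two factors, their interaction, and error. The sketch below is a minimal Python illustration of that decomposition; the function name `two_way_anova_ss`, the nested-list layout, and the 2 × 2 toy data are all hypothetical choices made for this example.

```python
import statistics

def two_way_anova_ss(cells):
    """Sums of squares for a balanced two-way layout.

    cells[i][j] is the list of replicates at level i of factor A
    and level j of factor B; every cell must have the same length.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = statistics.mean([v for row in cells for cell in row for v in cell])
    row_means = [statistics.mean([v for cell in row for v in cell]) for row in cells]
    col_means = [statistics.mean([v for row in cells for v in row[j]]) for j in range(b)]
    cell_means = [[statistics.mean(cell) for cell in row] for row in cells]

    ss_a = b * n * sum((m - grand) ** 2 for m in row_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in col_means)
    ss_ab = n * sum((cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_error = sum((v - cell_means[i][j]) ** 2
                   for i in range(a) for j in range(b) for v in cells[i][j])
    return {"A": ss_a, "B": ss_b, "AB": ss_ab, "Error": ss_error}

# Hypothetical toy data: 2 levels of A, 2 levels of B, 2 replicates per cell
toy_cells = [[[1, 3], [5, 7]],
             [[3, 5], [7, 9]]]
ss = two_way_anova_ss(toy_cells)
print(ss)
```

In this toy layout the cell means are perfectly additive, so the interaction sum of squares is zero and the four components add up to the total sum of squares.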
19. SAS code for Two-way ANOVA
proc anova data=time;
class gender race;
model time=gender race; /* main effects only; no gender*race interaction term */
run;
Output of two-way ANOVA
21. ANCOVA
• ANCOVA by definition is a general linear model that includes both ANOVA (categorical)
predictors and Regression (continuous) predictors.
• ANCOVA examines the influence of an independent variable on a dependent variable
while removing the effect of the covariate factor.
• Conceptually, ANCOVA first conducts a regression of the dependent variable on
the covariate.
• The residuals (the unexplained variance in the regression model) are then subject to an
ANOVA.
22. ANCOVA in other words…
• Analysis of Covariance (ANCOVA) is a statistical test related to ANOVA
• It tests whether there is a significant difference between groups after controlling for
variance explained by a covariate
• A covariate (CV) is a continuous variable that correlates with the dependent variable (DV)
• This is one way that you can run a statistical test with both categorical and continuous
independent variables
Purposes of ANCOVA
Increase sensitivity of F test
• Removes predictable variance from the error term
• Improves power of the analysis
23. Adjustment of Covariate Effect
Partitioning variance in ANOVA
Variance = Variance due to Treatment + Within Cell Variance (Error)
Partitioning variance in ANCOVA
Variance = Variance due to Treatment + Variance due to Covariate + Within Cell Variance (Error)
24. Relationship between CV and DV
Hypotheses for ANCOVA
H0: the group means are equal after controlling for the covariate
Vs
H1: the group means are not equal after controlling for the covariate
25. Assumptions
• linearity of regression: The regression relationship between the dependent variable and
concomitant variables must be linear.
• homogeneity of error variances: The error is a random variable with zero mean and equal
variances for different treatment classes and observations.
• independence of error terms: The errors are uncorrelated, that is, the error
covariance matrix is diagonal.
• normality of error terms: The residuals (error terms) should be normally distributed,
$\varepsilon_{ij} \sim N(0, \sigma^2)$.
• homogeneity of regression slopes: The slopes of the different regression lines should be
equivalent, i.e., regression lines should be parallel among groups.
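A quick descriptive look at the homogeneity-of-slopes assumption is to fit a separate least-squares line in each group and compare the slopes. The Python sketch below does only that; the function name `slope` and the two small groups are hypothetical, and this is not a formal test (a formal check would test a group-by-covariate interaction term in the model).

```python
def slope(x, y):
    """Least-squares slope of y on x for one group."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical toy data: covariate x and response y in two treatment groups
toy = {
    "A": ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]),
    "B": ([1, 2, 3, 4], [1.0, 3.1, 4.9, 7.0]),
}
slopes = {label: slope(x, y) for label, (x, y) in toy.items()}
print(slopes)
```

Similar slopes across groups are consistent with parallel regression lines; clearly different slopes suggest the common-slope ANCOVA model is not appropriate.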
26. Choosing Covariates
Variables that affect or have the potential to affect the dependent variable
• Demographic information
• Inherent characteristics
• Differences in group characteristics due to sampling
Number of covariates depends on:
• Known relationship or previous research
• No. of independent variables or groups
• Total no. of subjects
Example
Suppose we want to compare the effect of drugs on the weights of a particular group of
patients (homogeneous among themselves). We can analyze the data by performing the ANCOVA by
regarding:
𝑦: the final weight of the patients taking drugs, after a specified period, as the response variable.
𝑥: the initial weight of the patients at the time of starting the experiment as the covariate.
27. Model
So, our model becomes:
$y_{ij} = \mu + \alpha_i + \beta (x_{ij} - \bar{x}_{00}) + \varepsilon_{ij}$
Where,
• $\mu$ is the general mean effect
• $\alpha_i$ is the (fixed) additional effect due to the i-th treatment (i = 1, 2, …, p)
• $\varepsilon_{ij}$ is the random error effect (j = 1, 2, …, $n_i$)
• $\beta$ is the coefficient of regression of y on x
• $x_{ij}$ is the value of the covariate variable corresponding to the response variable $y_{ij}$
• $\bar{x}_{00}$ is the overall mean of the covariate
SAS Codes in ANCOVA
PROC MIXED data=<name of the dataset>;
CLASS variables;
MODEL dependent = <fixed-effects> <covariates> </ options>;
LSMEANS fixed-effects </ options>;
run;
Group means are adjusted according to how much effect the covariate actually has. The procedure fits a small regression of the outcome on the covariate and works out how much of the variance in the outcome the covariate explains. This gives a numeric value for the covariate's effect, for example that it raises or lowers the outcome by 5 or 10 points, and each group mean is then adjusted by that amount. In SPSS output these are designated as the adjusted group means: the group means changed by the amount of the effect the covariate has on the outcome.
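The adjustment of group means can be sketched numerically. The following minimal Python illustration assumes the common-slope model given above: it estimates β as the pooled within-group slope and shifts each group mean to the overall covariate mean. The function name `ancova_adjusted_means`, the group labels, and the toy weights are hypothetical, and the sketch does not reproduce the full output of PROC MIXED or SPSS.

```python
def ancova_adjusted_means(groups):
    """groups maps a label to (x_values, y_values); returns (beta, adjusted means)."""
    all_x = [xi for x, _ in groups.values() for xi in x]
    grand_x = sum(all_x) / len(all_x)          # overall covariate mean

    sxy = sxx = 0.0
    group_means = {}
    for label, (x, y) in groups.items():
        mx, my = sum(x) / len(x), sum(y) / len(y)
        # Accumulate within-group cross-products and squares
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        sxx += sum((xi - mx) ** 2 for xi in x)
        group_means[label] = (mx, my)

    beta = sxy / sxx                            # pooled within-group slope
    adjusted = {label: my - beta * (mx - grand_x)
                for label, (mx, my) in group_means.items()}
    return beta, adjusted

# Hypothetical toy data: x = initial value, y = final value
toy = {
    "drug_a": ([1, 2, 3], [2, 4, 6]),
    "drug_b": ([3, 4, 5], [7, 9, 11]),
}
beta, adjusted = ancova_adjusted_means(toy)
print(beta, adjusted)
```

In this toy example the raw group means differ by 5 (4 versus 9), but after removing the covariate effect the adjusted means differ by only 1 (6 versus 7).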
• The effect of the covariate should be the same across treatment groups.
• CLASS declares qualitative variables that create indicator variables in design matrices.
• MODEL specifies the dependent variable and fixed effects, setting up X.
• LSMEANS computes least squares means for classification fixed effects. Options: DIFF computes differences of the least squares means, ADJUST= performs multiple comparisons adjustments, CL produces confidence limits.