Anova, ancova

Glimpse of T test, ANOVA and ANCOVA
Prachi Bari
Sanjana Ghadigaonkar
Samrat Parage
Vaibhav Deshmukh
Pragya Sinha
Aritra Das

T test
A statistical examination of two population means

T - Test
• A t-test is any statistical hypothesis test in which the test statistic follows a Student's t-distribution under the null hypothesis. It can be used to determine if two sets of data are
significantly different from each other.
• In probability and statistics, Student's t-distribution (or simply the t-distribution) is any
member of a family of continuous probability distributions that arises when estimating
the mean of a normally distributed population in situations where the sample size is
small and population standard deviation is unknown.
• Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed as $N(\mu, \sigma^2)$, i.e. this is a
sample of size n from a normally distributed population with expected value $\mu$ and
unknown variance $\sigma^2$.

Assumptions
• Interval or ratio scale of measurement
• X follows a normal distribution with mean μ and variance σ²
• Z and s are independent, where
Z = $(\bar{X} - \mu)/(\sigma/\sqrt{n})$ is the standard normal variable
s = the ratio of the sample standard deviation to the population standard deviation
• Generally used when the sample size is small (n < 30).
One sample T test statistic
• The random variable
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}, \qquad \text{where } S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2,$$
has a Student's t-distribution with n − 1 degrees of freedom.

SAS Code for One sample T test
Example:
title 'One-Sample t Test';
data time;
input time @@;
datalines;
43 90 84 87 116 95 86 99 93 92
121 71 66 98 79 102 60 112 105 98
;
run;
proc ttest data=time h0=80 alpha=0.05;
var time;
run;

Output (One-Sample t Test)
Here, we see that the p-value (0.0329) < α (0.05), hence we reject $H_0$: μ = 80.
We conclude that the population mean time is not equal to 80.
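The t statistic behind this result can be checked by hand from the 20 observations in the data step. The following is a worked sketch of the statistic only, with intermediate values rounded; the p-value itself is taken from the SAS output, and $\mu_0 = 80$ is the value passed as h0.

```latex
\begin{align*}
\bar{X} &= \frac{1797}{20} = 89.85 \\
S^2 &= \frac{1}{19}\sum_{i=1}^{20}(X_i - \bar{X})^2 = \frac{6964.55}{19} \approx 366.56,
\qquad S \approx 19.146 \\
T &= \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{89.85 - 80}{19.146/\sqrt{20}}
\approx \frac{9.85}{4.281} \approx 2.30,
\qquad \text{df} = n - 1 = 19
\end{align*}
```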

Types of Two sample T – Test
• Independent Two Sample T – Test:
The independent samples t-test is used when two separate sets
of independent and identically distributed samples are obtained,
one from each of the two populations being compared.
• Dependent (Paired) Two Sample T – Test:
The paired t-test is used when the two samples consist of matched
pairs or of repeated measurements on the same subjects. A typical
example would be where subjects are tested prior to a treatment,
say for high blood pressure, and the same subjects are tested again
after treatment with a blood-pressure lowering medication.

Assumptions of Independent two sample t-test
Given two groups 1 and 2, the test in this form is applicable when:
• The two sample sizes (that is, the number n of participants of each group) are equal; this simplifies the formula, and the pooled test also extends to unequal sizes
• The two samples have the same variance
• Samples have been randomly drawn independent of each other
Testing Assumptions
• Normality: Shapiro–Wilk or Kolmogorov–Smirnov test, or it can be
assessed graphically using a normal quantile plot.
• If normality is rejected, then we use the Wilcoxon test.
• If the two groups are normally distributed, then we check
equality of variances with the F test.
• If both assumptions are satisfied, we go for the two-sample t-test.
• If the variances of the two groups are not equal, then we use the Welch test.
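The checks listed above can be sketched in SAS as follows. This is a minimal illustration, not the deck's own code: it reuses the dataset name wt and the variables group and wt from the independent two-sample example, and the choice of procedures (UNIVARIATE, TTEST, NPAR1WAY) is an assumption about how one might run each step.

```sas
/* Data from the independent two-sample example in this deck */
data wt;
   input group $ wt @@;
   datalines;
Ind 85 Ind 70 Ind 64 Ind 87 Ind 76 Ind 95 Ind 67 Ind 87 Ind 93 Ind 82
Aus 90 Aus 95 Aus 103 Aus 107 Aus 95 Aus 112 Aus 98 Aus 92 Aus 98 Aus 115
;
run;

/* Step 1: normality within each group.
   The NORMAL option prints Shapiro-Wilk and Kolmogorov-Smirnov tests;
   QQPLOT draws a normal quantile plot per group. */
proc univariate data=wt normal;
   class group;
   var wt;
   qqplot wt / normal(mu=est sigma=est);
run;

/* Step 2: equality of variances and the t tests.
   PROC TTEST reports the folded F test for equal variances together with
   the pooled (equal variance) and Satterthwaite (unequal variance,
   Welch-type) results. */
proc ttest data=wt alpha=0.05;
   class group;
   var wt;
run;

/* Fallback when normality is rejected: Wilcoxon rank-sum test */
proc npar1way data=wt wilcoxon;
   class group;
   var wt;
run;
```

Read the Pooled row of the PROC TTEST output when the folded F test does not reject equal variances, and the Satterthwaite row otherwise.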

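Test statistic
$$T = \frac{\bar{X}_1 - \bar{X}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
Where,
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
is the pooled variance estimate. Under the null hypothesis, T has a Student's t-distribution with $n_1 + n_2 - 2$ degrees of freedom.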
Hypothesis
Here the hypothesis to be tested is:
$H_0: \mu_1 = \mu_2$
vs
$H_1: \mu_1 \neq \mu_2$

Independent Two sample T – test in SAS
Example:
data wt;
input group $ wt @@;
datalines;
Ind 85 Ind 70 Ind 64 Ind 87 Ind 76 Ind 95 Ind 67 Ind 87 Ind 93 Ind 82
Aus 90 Aus 95 Aus 103 Aus 107 Aus 95 Aus 112 Aus 98 Aus 92 Aus 98 Aus 115
;
run;
proc ttest data=wt h0=0 alpha=0.05;
class group;
var wt;
run;
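As a cross-check on what PROC TTEST reports in its Pooled row, the equal-variance statistic can be worked out by hand from the data above. This is a sketch with rounded intermediate values; the subscripts Aus and Ind and the symbol $S_p$ (pooled standard deviation) are notation assumed here.

```latex
\begin{align*}
\bar{X}_{Aus} &= 100.5, \quad S_{Aus}^2 = \frac{646.5}{9} \approx 71.83,
\qquad \bar{X}_{Ind} = 80.6, \quad S_{Ind}^2 = \frac{1058.4}{9} = 117.6 \\
S_p^2 &= \frac{(n_1 - 1)S_{Aus}^2 + (n_2 - 1)S_{Ind}^2}{n_1 + n_2 - 2}
= \frac{646.5 + 1058.4}{18} \approx 94.72 \\
T &= \frac{\bar{X}_{Aus} - \bar{X}_{Ind}}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
= \frac{19.9}{\sqrt{94.72 \times 0.2}} \approx \frac{19.9}{4.352} \approx 4.57,
\qquad \text{df} = 18
\end{align*}
```

Since |T| exceeds the two-sided 5% critical value of about 2.101 for 18 degrees of freedom, the pooled test rejects the hypothesis of equal means for these data.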

Paired Two sample T – test in SAS
Example:
data wt;
input Before After @@;
datalines;
120 128 140 132 126 118 124 131 128 125 130 132 130 131 140 141 126 129 118 127
135 137 127 135
;
run;
proc ttest data=wt sides=2 alpha=0.05 h0=0;
title "Paired sample t-test";
paired Before * After;
run;
Output
The TTEST Procedure
Difference: Before - After
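A paired t-test is the same as a one-sample t-test on the within-pair differences. The sketch below illustrates this with the dataset wt from the paired example; the dataset name wt_diff and the variable diff are names introduced here for illustration.

```sas
/* Assumes the paired-example data step (dataset wt with Before, After)
   has already been run. */
data wt_diff;
   set wt;
   diff = Before - After;   /* within-pair difference */
run;

/* One-sample t test of H0: mean difference = 0.
   This should reproduce the t value and p-value of the PAIRED statement. */
proc ttest data=wt_diff h0=0 alpha=0.05;
   var diff;
run;
```

Worked by hand from the 12 pairs, the mean difference is about −1.83 with a standard error of about 1.68, giving t ≈ −1.09 on 11 degrees of freedom. This is inside the two-sided 5% critical value of about ±2.201, so these data do not give evidence of a change.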

ANOVA
ANOVA – Analysis Of Variance
Analysis of variance (ANOVA) is a collection of statistical models used to
analyze the differences among group means.

One – Way ANOVA
• The mathematical model that describes the relationship between the response and
treatment for the one-way ANOVA is given by
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, k, \quad j = 1, 2, \ldots, n_i$$
where,
$Y_{ij}$ : $j$th observation on the $i$th treatment,
$\mu$ : common effect for the whole experiment,
$\alpha_i$ : $i$th treatment effect,
$\varepsilon_{ij}$ : random error.
Hypothesis of one – way ANOVA
H0: The means of all the groups are equal.
Vs
H1: Not all of the group means are the same.
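In terms of the one-way model, the null hypothesis says that all treatment effects $\alpha_i$ are zero. The test compares between-group and within-group variation. The decomposition below is the standard one, written as a sketch in the model's notation; the symbols $N$, $\bar{Y}_{i\cdot}$, $\bar{Y}_{\cdot\cdot}$, $SS_B$ and $SS_W$ are introduced here.

```latex
\begin{align*}
\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{\cdot\cdot})^2
&= \underbrace{\sum_{i=1}^{k} n_i(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2}_{SS_B}
+ \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{i\cdot})^2}_{SS_W} \\
F &= \frac{SS_B/(k-1)}{SS_W/(N-k)} \sim F_{k-1,\,N-k} \ \text{under } H_0,
\qquad N = \sum_{i=1}^{k} n_i
\end{align*}
```

A large F, meaning between-group variation that is large relative to within-group variation, is evidence against $H_0$.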

Assumptions of one way-ANOVA
• The normality assumption:
dependent variable should be approximately normally distributed
• The homogeneity of variance assumption:
variance of each group should be approximately equal
• The independence assumption:
observations should be independent of each other
Example
• Suppose we are testing a new drug to see if it helps reduce the time to recover from a
fever. We decide to test the drug on three different races (Caucasian, African American,
and Hispanic). We randomly select 10 test subjects from each of those races, so all
together, we have 30 test subjects.
• The response variable is the time in minutes after taking the medicine before the fever is
reduced.

SAS code for one-way ANOVA
Proc anova data=time;
class race;
model time=race;
run;
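The homogeneity of variance assumption and the question of which groups differ can both be addressed from the MEANS statement. The following is a sketch using PROC GLM, assuming a dataset named time that holds the response time and the grouping variable race for the 30 subjects in the example; Levene's test and Tukey's comparison are choices made here, not taken from the slides.

```sas
/* Assumes dataset time with variables time (minutes) and race (3 levels) */
proc glm data=time;
   class race;
   model time = race;
   /* HOVTEST=LEVENE tests equality of group variances (one-way models only);
      TUKEY requests pairwise comparisons of the group means */
   means race / hovtest=levene tukey;
run;
quit;
```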

Two – Way ANOVA
• Two-way ANOVA is a type of study design with one numerical outcome variable
and two categorical explanatory variables.
• The mathematical model of two-way ANOVA is as follows:
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$$
where $\mu$ is the overall mean effect,
$\alpha_i$ is the effect due to the $i$th level of the first factor,
$\beta_j$ is the effect due to the $j$th level of the second factor,
$\gamma_{ij}$ is the effect due to interaction between the $i$th
level of the first factor and the $j$th
level of the second factor,
$\varepsilon_{ijk}$ is the random error for the $k$th observation in cell $(i, j)$.
Assumptions:
• The populations from which the samples were obtained must be normally or approximately
normally distributed.
• The samples must be independent.
• The variances of the populations must be equal.
• The groups must have the same sample size.

SAS code for Two-way ANOVA
Proc anova data=time;
class gender race;
model time=gender race gender*race;
run;
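PROC ANOVA is intended for balanced data, which is why equal group sizes appear among the assumptions. When cell sizes differ, PROC GLM is the usual alternative. The sketch below assumes the same dataset time with variables time, gender and race; the Type III sums of squares and the Tukey adjustment are illustrative choices.

```sas
/* Two-way model with interaction; works for unbalanced cell sizes */
proc glm data=time;
   class gender race;
   model time = gender race gender*race / ss3;   /* Type III sums of squares */
   /* Adjusted (least-squares) means with pairwise differences */
   lsmeans gender race / pdiff adjust=tukey;
run;
quit;
```

If the gender*race term is significant, interpret the main effects with care, since the effect of one factor then depends on the level of the other.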
Output of two-way ANOVA

ANCOVA
Analysis of covariance (ANCOVA) is a general linear model which blends
ANOVA and regression.

ANCOVA
• ANCOVA by definition is a general linear model that includes both ANOVA (categorical)
predictors and Regression (continuous) predictors.
• ANCOVA examines the influence of an independent variable on a dependent variable
while removing the effect of the covariate factor.
• Conceptually, ANCOVA first conducts a regression of the dependent variable on
the covariate.
• The residuals (the unexplained variance in the regression model) are then subject to an
ANOVA.

ANCOVA in other words…
• Analysis of Covariance (ANCOVA) is a statistical test related to ANOVA
• It tests whether there is a significant difference between groups after controlling for
variance explained by a covariate
• A covariate (CV) is a continuous variable that correlates with the dependent variable (DV)
• This is one way that you can run a statistical test with both categorical and continuous
independent variables
Purposes of ANCOVA
Increase sensitivity of F test
• Removes predictable variance from the error term
• Improves power of the analysis

Adjustment of Covariate Effect
Partitioning variance in ANOVA
• Variance = Variance due to Treatment + Within-cell variance (Error)
Partitioning variance in ANCOVA
• Variance = Variance due to Treatment + Variance due to Covariate + Within-cell variance (Error)
Relationship between CV and DV
Hypotheses for ANCOVA
H0: the group means are equal after controlling for the covariate
Vs
H1: the group means are not equal after controlling for the covariate

Assumptions
• Linearity of regression: The regression relationship between the dependent variable and
concomitant variables must be linear.
• Homogeneity of error variances: The error is a random variable with zero mean and equal
variances for different treatment classes and observations.
• Independence of error terms: The errors are uncorrelated. That is, the error
covariance matrix is diagonal.
• Normality of error terms: The residuals (error terms) should be normally distributed,
$\varepsilon_{ij} \sim N(0, \sigma^2)$.
• Homogeneity of regression slopes: The slopes of the different regression lines should be
equivalent, i.e., regression lines should be parallel among groups.
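Homogeneity of regression slopes is commonly checked by adding a treatment-by-covariate interaction to the model. The sketch below uses hypothetical names: a dataset drug with treatment variable trt, covariate x and response y. None of these names come from the slides.

```sas
/* Hypothetical dataset drug: trt (group), x (covariate), y (response) */
proc glm data=drug;
   class trt;
   /* trt*x tests whether the slope of y on x differs between groups */
   model y = trt x trt*x;
run;
quit;
```

A non-significant trt*x term is consistent with parallel regression lines, in which case the interaction can be dropped and the usual ANCOVA model fitted.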

Choosing Covariates
Variables that affect or have the potential to affect the dependent variable
• Demographic information
• Inherent characteristics
• Differences in group characteristics due to sampling
Number of covariates depends on:
• Known relationship or previous research
• No. of independent variables or groups
• Total no. of subjects
Example
Suppose we want to compare the effect of drugs on the weights of a particular group of
patients (homogeneous among themselves). We can analyze the data by performing the ANCOVA by
regarding:
𝑦: the final weight of the patients taking drugs, after a specified period, as the response variable.
𝑥: the initial weight of the patients at the time of starting the experiment as the covariate.

Model
So, our model becomes:
$$y_{ij} = \mu + \alpha_i + \beta(x_{ij} - \bar{x}_{00}) + \varepsilon_{ij}$$
Where,
• $\mu$ is the general mean effect
• $\alpha_i$ is the (fixed) additional effect due to the $i$th treatment ($i = 1, 2, \ldots, p$)
• $\varepsilon_{ij}$ is the random error effect ($j = 1, 2, \ldots, n_i$)
• $\beta$ is the coefficient of regression of y on x
• $x_{ij}$ is the value of the covariate corresponding to the response variable $y_{ij}$
• $\bar{x}_{00}$ is the overall mean of the covariate
SAS Codes in ANCOVA
PROC MIXED data=<name of the dataset>;
CLASS variables;
MODEL dependent = <fixed-effects> <covariates> </ options>;
LSMEANS fixed-effects </ options>;
run;
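Filled in for the drug and weight example, the template might look like the sketch below. The dataset name drug and the variables trt (drug group), x (initial weight) and y (final weight) are hypothetical names chosen for illustration.

```sas
/* Hypothetical dataset drug: trt (drug group), x (initial weight),
   y (final weight) */
proc mixed data=drug;
   class trt;
   /* trt is the fixed treatment effect, x the covariate;
      SOLUTION prints the estimated coefficients, including the slope on x */
   model y = trt x / solution;
   /* Covariate-adjusted treatment means, their pairwise differences,
      and confidence limits */
   lsmeans trt / diff cl;
run;
```

The LSMEANS output gives the treatment means adjusted to a common value of the covariate, which is the comparison the ANCOVA hypotheses refer to.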
