Multivariate Analysis using R
Dipika Patra
21 July 2016
1. One Sample Hotelling T-square test:
Descriptive Aspect:
Changes in the pulmonary response of 12 workers after 6 hours of exposure to cotton dust, measured by the following 3 variables.
FVC - changes in Forced Vital Capacity (FVC) after 6 hours
FEV - Changes in Forced Expiratory Volume (FEV) after 6 hours
CC - Changes in Closing Capacity (CC) after 6 hours
Inferential Aspect :
Hypothesis 1 :
The null hypothesis : NO CHANGE IN PULMONARY FUNCTION against
The alternative hypothesis : CHANGE IN PULMONARY FUNCTION
To test the hypothesis we use the one-sample Hotelling T-square statistic, whose distribution was obtained by Hotelling (1931).
The reference distribution is indexed by two parameters: the dimension p = 3 and the degrees of freedom v = n - p = 12 - 3 = 9 (reported as df2 in the R output).
We reject the null hypothesis if the calculated T-square is greater than the tabulated T-square value with p = 3, v = 9 at the 5% level of significance; equivalently, in terms of the P-value, we reject the null hypothesis if it is less than 0.05.
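The decision rule above can be checked by hand. What follows is a minimal base-R sketch, not the ICSNP implementation: it types in the 12 observations as they appear in the pulmonary data listing, forms T-square = n (xbar - mu0)' S^-1 (xbar - mu0), and rescales it by (n - p) / (p (n - 1)) so that it can be referred to an F(p, n - p) distribution. The names `one_sample_T2`, `X` and `mu0` are illustrative assumptions.

```r
# One-sample Hotelling T-square by hand (base R only).
# Data typed in from the pulmonary listing; all names are illustrative.
X <- cbind(
  FVC = c(-0.11, 0.02, -0.02, 0.07, -0.16, -0.42,
          -0.32, -0.35, -0.10, 0.01, -0.10, -0.26),
  FEV = c(-0.12, 0.08, 0.03, 0.19, -0.36, -0.49,
          -0.48, -0.30, -0.04, -0.02, -0.17, -0.30),
  CC  = c(-4.3, 4.4, 7.5, -0.3, -5.8, 14.5,
          -1.9, 17.3, 2.5, -5.6, 2.2, 5.5)
)

one_sample_T2 <- function(X, mu0) {
  n <- nrow(X)
  p <- ncol(X)
  d <- colMeans(X) - mu0                        # sample mean minus hypothesised mean
  T2 <- n * drop(t(d) %*% solve(cov(X)) %*% d)  # Hotelling T-square
  F_stat <- T2 * (n - p) / (p * (n - 1))        # F-scaled statistic
  p_value <- pf(F_stat, p, n - p, lower.tail = FALSE)
  list(T2 = T2, F = F_stat, df1 = p, df2 = n - p, p.value = p_value)
}

res <- one_sample_T2(X, mu0 = c(0, 0, 0))
print(res)
```

Under these assumptions the F-scaled value, rather than the raw T-square, is the quantity to compare with the `T.2` figure printed by `HotellingsT2`, which reports the F version of the statistic by default.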
• R Codes (to call the data) with Output:
library(ICSNP)
## Loading required package: mvtnorm
## Loading required package: ICS
data(pulmonary)
pulmonary
## FVC FEV CC
## 1 -0.11 -0.12 -4.3
## 2 0.02 0.08 4.4
## 3 -0.02 0.03 7.5
## 4 0.07 0.19 -0.3
## 5 -0.16 -0.36 -5.8
## 6 -0.42 -0.49 14.5
## 7 -0.32 -0.48 -1.9
## 8 -0.35 -0.30 17.3
## 9 -0.10 -0.04 2.5
## 10 0.01 -0.02 -5.6
## 11 -0.10 -0.17 2.2
## 12 -0.26 -0.30 5.5
• R codes (to test the Hypothesis 1) with Output:
attach(pulmonary)
HotellingsT2(pulmonary, mu = c(0,0,0))
##
## Hotelling's one sample T2-test
##
## data: pulmonary
## T.2 = 3.8231, df1 = 3, df2 = 9, p-value = 0.05123
## alternative hypothesis: true location is not equal to c(0,0,0)
Conclusion:
Since the P-value = 0.05123 is greater than 0.05, we fail to reject the null hypothesis at the 5% level. That is, based on the given sample, we conclude that there is no significant change in pulmonary function.
Hypothesis 2:
The null hypothesis : CHANGE OF 2 UNITS IN CLOSING CAPACITY ONLY against
The alternative hypothesis : OTHER THAN NULL HYPOTHESIS
To test the above discussed hypothesis following codes are used.
• R Codes (to test the Hypothesis 2) with Output:
HotellingsT2(pulmonary, mu = c(0,0,2))
##
## Hotelling's one sample T2-test
##
## data: pulmonary
## T.2 = 6.6204, df1 = 3, df2 = 9, p-value = 0.01178
## alternative hypothesis: true location is not equal to c(0,0,2)
Conclusion:
Since the P-value = 0.01178 is greater than 0.01, we fail to reject the null hypothesis at the 1% level (at the 5% level used for Hypothesis 1 it would be rejected). That is, based on the given sample, the data are consistent at the 1% level with a change in pulmonary function of 2 units in closing capacity only.
2. Two Sample Hotelling T-square test:
Generating Data Using R code:
Consider two teachers rated by two independent groups of 3 and 6 students respectively, on satisfaction level (scale of 6) and knowledge level (scale of 10):
math.teach <- data.frame(teacher=factor(rep(1:2,c(3, 6))), satisfaction = c(1, 3,2, 4, 6, 6, 5,5, 4), kn
math.teach
## teacher satisfaction knowledge
## 1 1 1 3
## 2 1 3 7
## 3 1 2 2
## 4 2 4 6
## 5 2 6 8
## 6 2 6 8
## 7 2 5 10
## 8 2 5 10
## 9 2 4 6
Graphical Display ( Multiple Boxplot) :
[Figure: boxplots of Teacher Knowledge (axis 2 to 10) and Teacher Satisfaction (axis 1 to 6)]
Hypothesis 1:
Consider testing the null hypothesis that the two groups have identical mean vectors. This is represented
below as well as the general alternative that the mean vectors are not equal.
The null hypothesis : NO DIFFERENCE IN RATING BETWEEN TWO GROUPS against
The alternative hypothesis : CHANGE IN RATING BETWEEN TWO GROUPS
In order to test the hypothesis we use the distribution of the two-sample Hotelling T-square. The Hotelling T-square statistic for two samples is
$$T^2 = (\bar{X}_1 - \bar{X}_2)' \, S_p^{-1} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)^{-1} (\bar{X}_1 - \bar{X}_2),$$
where $S_p$ is the pooled variance-covariance matrix, and
$$T^2 \sim \frac{p\,(n_1 + n_2 - 2)}{n_1 + n_2 - p - 1} \, F(p,\; n_1 + n_2 - p - 1).$$
The distribution is indexed by two parameters: the dimension p = 2 and the degrees of freedom v = n1 + n2 − p − 1 = 3 + 6 − 2 − 1 = 6.
We reject the null hypothesis if the calculated T-square is greater than the tabulated T-square value with p = 2, v = 6 at the 1% level of significance; equivalently, in terms of the P-value, we reject the null hypothesis if it is less than 0.01.
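The two-sample statistic can also be worked through by hand. The sketch below uses base R only and is not the ICSNP implementation. The `math.teach` definition is cut off in the source, so the `knowledge` column here is read from the printed table; the function name `two_sample_T2` and its argument `delta0` (the hypothesised difference of mean vectors) are illustrative assumptions.

```r
# Two-sample Hotelling T-square by hand (base R only).
# Ratings rebuilt from the printed math.teach table; names are illustrative.
math.teach <- data.frame(
  teacher      = factor(rep(1:2, c(3, 6))),
  satisfaction = c(1, 3, 2, 4, 6, 6, 5, 5, 4),
  knowledge    = c(3, 7, 2, 6, 8, 8, 10, 10, 6)
)

two_sample_T2 <- function(X1, X2, delta0 = rep(0, ncol(X1))) {
  n1 <- nrow(X1)
  n2 <- nrow(X2)
  p  <- ncol(X1)
  # Pooled variance-covariance matrix S_p
  Sp <- ((n1 - 1) * cov(X1) + (n2 - 1) * cov(X2)) / (n1 + n2 - 2)
  d  <- colMeans(X1) - colMeans(X2) - delta0
  T2 <- drop(t(d) %*% solve(Sp * (1 / n1 + 1 / n2)) %*% d)
  F_stat  <- T2 * (n1 + n2 - p - 1) / (p * (n1 + n2 - 2))  # F-scaled statistic
  p_value <- pf(F_stat, p, n1 + n2 - p - 1, lower.tail = FALSE)
  list(T2 = T2, F = F_stat, df1 = p, df2 = n1 + n2 - p - 1, p.value = p_value)
}

ratings <- as.matrix(math.teach[, c("satisfaction", "knowledge")])
X1 <- ratings[math.teach$teacher == 1, , drop = FALSE]
X2 <- ratings[math.teach$teacher == 2, , drop = FALSE]

res0 <- two_sample_T2(X1, X2)                     # H0: no difference
res1 <- two_sample_T2(X1, X2, delta0 = c(-1, 1))  # H0: difference equals (-1, 1)
print(c(F = res0$F, p.value = res0$p.value))
print(c(F = res1$F, p.value = res1$p.value))
```

With p = 2 and v = 6 the F-scaled value is T-square multiplied by 6/14, which is the quantity to compare with an F(2, 6) reference distribution.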
• R Codes (to test the Hypothesis 1) with Output:
attach(math.teach)
HotellingsT2(cbind(satisfaction, knowledge) ~
teacher, mu=c(0,0))
##
## Hotelling's two sample T2-test
##
## data: cbind(satisfaction, knowledge) by teacher
## T.2 = 9, df1 = 2, df2 = 6, p-value = 0.01562
## alternative hypothesis: true location difference is not equal to c(0,0)
Conclusion:
Since the P-value = 0.01562 is greater than 0.01, we fail to reject the null hypothesis at the 1% level. That is, based on the given sample, we conclude that there is no significant difference in rating between the two groups.
Hypothesis 2:
Consider testing the null hypothesis that the difference between the mean vectors of the two groups equals (−1, 1). This is represented below, together with the general alternative that the difference is other than that.
The null hypothesis : GIVEN SPECIFIC CHANGE IN RATING BETWEEN TWO GROUPS against
The alternative hypothesis : CHANGE IN RATING BETWEEN TWO GROUPS OTHER THAN GIVEN
In order to test the hypothesis we use the distribution of Two Sample Hotelling T- square.
• R Codes (to test the Hypothesis 2) with Output:
HotellingsT2(cbind(satisfaction, knowledge) ~teacher, mu=c(-1,1))
##
## Hotelling's two sample T2-test
##
## data: cbind(satisfaction, knowledge) by teacher
## T.2 = 5.6897, df1 = 2, df2 = 6, p-value = 0.04115
## alternative hypothesis: true location difference is not equal to c(-1,1)
Conclusion:
Since the P-value = 0.04115 is less than 0.05, we reject the null hypothesis at the 5% level. That is, based on the given sample, we conclude that the difference in mean rating between the two groups differs significantly from the vector (-1, 1)'.
Hypothesis 3:
Consider testing the null hypothesis that the difference between the mean vectors of the two groups equals (1, 1). This is represented below, together with the general alternative that the difference is other than that.
The null hypothesis : GIVEN SPECIFIC CHANGE IN RATING BETWEEN TWO GROUPS against
The alternative hypothesis : CHANGE IN RATING BETWEEN TWO GROUPS OTHER THAN THE
GIVEN
In order to test the hypothesis we use the distribution of Two Sample Hotelling T- square.
• R Codes (to test the Hypothesis 3) with Output:
HotellingsT2(cbind(satisfaction, knowledge) ~teacher, mu=c(1,1))
##
## Hotelling's two sample T2-test
##
## data: cbind(satisfaction, knowledge) by teacher
## T.2 = 16.034, df1 = 2, df2 = 6, p-value = 0.003915
## alternative hypothesis: true location difference is not equal to c(1,1)
Conclusion:
Since the P-value = 0.003915 is less than 0.01, we reject the null hypothesis at the 1% level. That is, based on the given sample, we conclude that the difference in mean rating between the two groups differs significantly from the vector (1, 1)'.
Hypothesis 4:
Consider testing the null hypothesis that the difference between the mean vectors of the two groups equals (2, 2). This is represented below, together with the general alternative that the difference is other than that.
The null hypothesis : GIVEN SPECIFIC CHANGE IN RATING BETWEEN TWO GROUPS against
The alternative hypothesis : CHANGE IN RATING BETWEEN TWO GROUPS OTHER THAN THE
GIVEN
To test the hypothesis we use the distribution of Two Sample Hotelling T- square.
• R Codes (to test the Hypothesis 4) with Output:
HotellingsT2(cbind(satisfaction, knowledge) ~teacher, mu=c(2,2))
##
## Hotelling's two sample T2-test
##
## data: cbind(satisfaction, knowledge) by teacher
## T.2 = 25.138, df1 = 2, df2 = 6, p-value = 0.001212
## alternative hypothesis: true location difference is not equal to c(2,2)
Conclusion:
Since the P-value = 0.001212 is less than 0.01, we reject the null hypothesis at the 1% level. That is, based on the given sample, we conclude that the difference in mean rating between the two groups differs significantly from the vector (2, 2)'.
3. Two Way MANOVA(two factors) :
Descriptive Aspect :
Triathlon performance: a multi-sport race in which 60 competitors complete a swim course, a bike course, and a run course, in that order.
Factors: gender (2 levels), age group (3 levels) and their interaction (6 combinations)
Research Question :
Whether gender (M/F) or age category (CAT1, CAT2, CAT3) has an effect on the times for the individual sports.
• R Codes (to read the data from CSV file) with Output:
getwd()
## [1] "C:/Users/User/Documents"
data.manova<-read.csv("triathlon.csv",header = T)
data.manova
## GENDER CATEGORY SWIM BIKE RUN
## 1 F CAT1 52 252 145
## 2 F CAT1 44 238 163
## 3 F CAT1 42 196 83
## 4 F CAT1 46 238 179
## 5 F CAT1 42 238 136
## 6 F CAT1 38 203 176
## 7 F CAT1 50 336 95
## 8 F CAT1 40 196 152
## 9 F CAT1 42 238 108
## 10 F CAT1 40 266 132
## 11 F CAT2 34 322 147
## 12 F CAT2 42 238 161
## 13 F CAT2 34 217 173
## 14 F CAT2 36 217 154
## 15 F CAT2 46 252 120
## 16 F CAT2 38 182 143
## 17 F CAT2 32 245 126
## 18 F CAT2 38 231 162
## 19 F CAT2 30 161 150
## 20 F CAT2 28 210 136
## 21 F CAT3 28 182 111
## 22 F CAT3 28 217 119
## 23 F CAT3 32 210 141
## 24 F CAT3 32 238 168
## 25 F CAT3 26 210 171
## 26 F CAT3 26 189 123
## 27 F CAT3 24 147 89
## 28 F CAT3 30 217 140
## 29 F CAT3 28 259 105
## 30 F CAT3 28 203 131
## 31 M CAT1 50 294 103
## 32 M CAT1 48 329 109
## 33 M CAT1 58 357 161
## 34 M CAT1 50 245 87
## 35 M CAT1 52 259 172
## 36 M CAT1 56 308 178
## 37 M CAT1 50 308 152
## 38 M CAT1 48 343 170
## 39 M CAT1 48 301 115
## 40 M CAT1 52 252 123
## 41 M CAT2 36 224 151
## 42 M CAT2 38 273 150
## 43 M CAT2 34 259 133
## 44 M CAT2 34 217 90
## 45 M CAT2 38 252 172
## 46 M CAT2 38 224 80
## 47 M CAT2 34 217 171
## 48 M CAT2 42 287 164
## 49 M CAT2 36 252 166
## 50 M CAT2 40 280 154
## 51 M CAT3 22 196 143
## 52 M CAT3 20 196 167
## 53 M CAT3 20 175 127
## 54 M CAT3 24 154 80
## 55 M CAT3 22 189 152
## 56 M CAT3 24 175 137
## 57 M CAT3 28 231 125
## 58 M CAT3 26 217 156
## 59 M CAT3 24 196 149
## 60 M CAT3 22 161 113
• Formatting data to run MANOVA:
gender <- as.factor(data.manova[,1])
cat <- as.factor(data.manova[,2])
times <- as.matrix(data.manova[,3:5])
head(times)
## SWIM BIKE RUN
## [1,] 52 252 145
## [2,] 44 238 163
## [3,] 42 196 83
## [4,] 46 238 179
## [5,] 42 238 136
## [6,] 38 203 176
• R Code for two way MANOVA considering interaction effect of gender & age:
output <- manova(times~gender*cat)
output
## Call:
## manova(times ~ gender * cat)
##
## Terms:
## gender cat gender:cat Residuals
## resp 1 24.07 4709.20 396.93 738.80
## resp 2 6468.82 51696.63 15093.63 65321.90
## resp 3 2.02 1681.60 212.13 43755.90
## Deg. of Freedom 1 2 2 54
##
## Residual standard errors: 3.698849 34.78024 28.46567
## Estimated effects may be unbalanced
Wilks' lambda test:
summary(output, test="Wilks")
## Df Wilks approx F num Df den Df Pr(>F)
## gender 1 0.90547 1.8095 3 52 0.1568890
## cat 2 0.12952 30.8289 6 104 < 2.2e-16 ***
## gender:cat 2 0.62497 4.5923 6 104 0.0003562 ***
## Residuals 54
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Pillai's trace test (default in R):
summary(output, test="Pillai")
## Df Pillai approx F num Df den Df Pr(>F)
## gender 1 0.09453 1.8095 3 52 0.15689
## cat 2 0.90048 14.4686 6 106 5.338e-12 ***
## gender:cat 2 0.37636 4.0952 6 106 0.00098 ***
## Residuals 54
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Hotelling-Lawley's trace test:
summary(output, test="Hotelling")
## Df Hotelling-Lawley approx F num Df den Df Pr(>F)
## gender 1 0.1044 1.810 3 52 0.1568890
## cat 2 6.4889 55.156 6 102 < 2.2e-16 ***
## gender:cat 2 0.5979 5.083 6 102 0.0001335 ***
## Residuals 54
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
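The three statistics reported above are all functions of the same two matrices: the hypothesis SSCP matrix H for a term and the residual SSCP matrix E. The sketch below, a hedged illustration rather than the report's own analysis, computes them by hand and compares them with `summary()`. Because `triathlon.csv` is not available here, it uses simulated stand-in data with the same layout (2 genders, 3 age categories, 10 competitors per cell); the means and standard deviations, the seed, and the name `age_cat` are assumptions.

```r
# Wilks, Pillai and Hotelling-Lawley statistics from the SSCP matrices.
# Simulated stand-in data: triathlon.csv is not available here.
set.seed(2016)
design  <- expand.grid(rep     = 1:10,
                       age_cat = c("CAT1", "CAT2", "CAT3"),
                       gender  = c("F", "M"))
gender  <- factor(design$gender)
age_cat <- factor(design$age_cat)
times   <- cbind(SWIM = rnorm(60, mean = 40,  sd = 5),
                 BIKE = rnorm(60, mean = 240, sd = 35),
                 RUN  = rnorm(60, mean = 140, sd = 28))

fit <- manova(times ~ gender * age_cat)

# Hypothesis (H) and error (E) SSCP matrices for the interaction term
ss <- summary(fit)$SS
H  <- ss[["gender:age_cat"]]
E  <- ss[["Residuals"]]

by_hand <- c(
  Wilks     = det(E) / det(H + E),              # |E| / |H + E|
  Pillai    = sum(diag(H %*% solve(H + E))),    # trace of H (H + E)^-1
  Hotelling = sum(diag(H %*% solve(E)))         # trace of H E^-1
)

# Row 3 of $stats is the interaction term; column 2 holds the test statistic
from_summary <- c(
  Wilks     = summary(fit, test = "Wilks")$stats[3, 2],
  Pillai    = summary(fit, test = "Pillai")$stats[3, 2],
  Hotelling = summary(fit, test = "Hotelling-Lawley")$stats[3, 2]
)
print(rbind(by_hand, from_summary))
```

The three tests can disagree in their approximate F values and denominator degrees of freedom, as in the output above, even though they are built from the same H and E.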
• R Code with output for separate study for each response:
summary.aov(output)
## Response SWIM :
## Df Sum Sq Mean Sq F value Pr(>F)
## gender 1 24.1 24.07 1.7591 0.1903
## cat 2 4709.2 2354.60 172.1012 < 2.2e-16 ***
## gender:cat 2 396.9 198.47 14.5062 9.073e-06 ***
## Residuals 54 738.8 13.68
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response BIKE :
## Df Sum Sq Mean Sq F value Pr(>F)
## gender 1 6469 6468.8 5.3476 0.024591 *
## cat 2 51697 25848.3 21.3682 1.458e-07 ***
## gender:cat 2 15094 7546.8 6.2388 0.003651 **
## Residuals 54 65322 1209.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response RUN :
## Df Sum Sq Mean Sq F value Pr(>F)
## gender 1 2 2.02 0.0025 0.9604
## cat 2 1682 840.80 1.0376 0.3612
## gender:cat 2 212 106.07 0.1309 0.8776
## Residuals 54 43756 810.29
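• R Code for two way MANOVA without interaction effect of gender & age (the call as given still uses gender*cat and so reproduces the interaction model; a model without interaction would be written times~gender+cat):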
manova(times~gender*cat)
## Call:
## manova(times ~ gender * cat)
##
## Terms:
## gender cat gender:cat Residuals
## resp 1 24.07 4709.20 396.93 738.80
## resp 2 6468.82 51696.63 15093.63 65321.90
## resp 3 2.02 1681.60 212.13 43755.90
## Deg. of Freedom 1 2 2 54
##
## Residual standard errors: 3.698849 34.78024 28.46567
## Estimated effects may be unbalanced
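In R's formula syntax a model without interaction is written with `+` rather than `*`. The following minimal sketch contrasts the two specifications; since `triathlon.csv` is not available here, it again uses simulated stand-in data of the same shape, and the seed, distribution parameters and the name `age_cat` are assumptions.

```r
# Two-way MANOVA without the interaction term (additive model).
# Simulated stand-in data: triathlon.csv is not available here.
set.seed(2016)
design  <- expand.grid(rep     = 1:10,
                       age_cat = c("CAT1", "CAT2", "CAT3"),
                       gender  = c("F", "M"))
gender  <- factor(design$gender)
age_cat <- factor(design$age_cat)
times   <- cbind(SWIM = rnorm(60, mean = 40,  sd = 5),
                 BIKE = rnorm(60, mean = 240, sd = 35),
                 RUN  = rnorm(60, mean = 140, sd = 28))

fit_add  <- manova(times ~ gender + age_cat)  # main effects only
fit_full <- manova(times ~ gender * age_cat)  # main effects plus interaction

# The additive fit has no gender:age_cat term, and the 2 interaction
# degrees of freedom move into the residuals (54 becomes 56).
term_labels_add <- attr(terms(fit_add), "term.labels")
print(term_labels_add)
print(c(additive = fit_add$df.residual, full = fit_full$df.residual))
print(summary(fit_add, test = "Wilks"))
```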
• Particular effect of gender:
manova(times~gender)
## Call:
## manova(times ~ gender)
##
## Terms:
## gender Residuals
## resp 1 24.07 5844.93
## resp 2 6468.82 132112.17
## resp 3 2.02 45649.63
## Deg. of Freedom 1 58
##
## Residual standard errors: 10.03866 47.72626 28.05464
## Estimated effects may be unbalanced
• Particular effect of age category:
manova(times~cat)
## Call:
## manova(times ~ cat)
##
## Terms:
## cat Residuals
## resp 1 4709.2 1159.8
## resp 2 51696.63 86884.35
## resp 3 1681.60 43970.05
## Deg. of Freedom 2 57
##
## Residual standard errors: 4.510806 39.04212 27.77417
## Estimated effects may be unbalanced
4. Principal Component Analysis :
Descriptive Aspect :
Switzerland, in 1888, was entering a period known as the demographic transition; i.e., its fertility was beginning to fall from the high level typical of underdeveloped countries. The data collected are for 47 French-speaking “provinces” on the following 6 variables.
Fertility - Common standardized fertility measure
Agriculture - Percentage of males involved in agriculture as occupation
Examination - Percentage of draftees receiving highest mark on army examination
Education - Percentage of draftees with education beyond primary school
Catholic - Percentage of Catholics
Infant.Mortality - Live births who live less than 1 year
Here, all variables are scaled to [0, 100], whereas in the original all but “Catholic” were scaled to [0, 1].
Purpose :
- To reduce the dimensionality of the data
- To decrease redundancy
- To identify the variables that work together to create the dynamics of the system
• R Codes (to call the data) with Output:
library(psych)
attach(swiss)
Graphical Display to explore dependence structure:
The scatter plot matrix of the data reveals correlation among several of the 6 variables.
plot(swiss, col="blue", pch=20)
[Figure: scatterplot matrix of Fertility, Agriculture, Examination, Education, Catholic and Infant.Mortality]
round(cor(swiss),2)
## Fertility Agriculture Examination Education Catholic
## Fertility 1.00 0.35 -0.65 -0.66 0.46
## Agriculture 0.35 1.00 -0.69 -0.64 0.40
## Examination -0.65 -0.69 1.00 0.70 -0.57
## Education -0.66 -0.64 0.70 1.00 -0.15
## Catholic 0.46 0.40 -0.57 -0.15 1.00
## Infant.Mortality 0.42 -0.06 -0.11 -0.10 0.18
## Infant.Mortality
## Fertility 0.42
## Agriculture -0.06
## Examination -0.11
## Education -0.10
## Catholic 0.18
## Infant.Mortality 1.00
• Principal Components:
swiss_pca<-prcomp(swiss,center=T,scale=T)
swiss_pca
## Standard deviations:
## [1] 1.7887865 1.0900955 0.9206573 0.6625169 0.4522540 0.3476529
##
## Rotation:
## PC1 PC2 PC3 PC4 PC5
## Fertility -0.4569876 0.3220284 -0.17376638 0.53555794 -0.38308893
## Agriculture -0.4242141 -0.4115132 0.03834472 -0.64291822 -0.37495215
## Examination 0.5097327 0.1250167 -0.09123696 -0.05446158 -0.81429082
## Education 0.4543119 0.1790495 0.53239316 -0.09738818 0.07144564
## Catholic -0.3501111 0.1458730 0.80680494 0.09947244 -0.18317236
## Infant.Mortality -0.1496668 0.8111645 -0.16010636 -0.52677184 0.10453530
## PC6
## Fertility 0.47295441
## Agriculture 0.30870058
## Examination -0.22401686
## Education 0.68081610
## Catholic -0.40219666
## Infant.Mortality -0.07457754
- center and scale indicate that each variable is centred on its mean and divided by its standard deviation before PCA is carried out.
- The rotation matrix provides the principal component loadings: each column of the rotation matrix contains one principal component loading vector.
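Because the variables are centred and scaled, the same components can be obtained from an eigen decomposition of the correlation matrix. The sketch below is a base-R illustration of that equivalence; the object names `eig`, `sdev_by_hand` and `max_abs_diff` are illustrative.

```r
# PCA of the standardized swiss data via the correlation matrix (base R only).
data(swiss)
swiss_pca <- prcomp(swiss, center = TRUE, scale. = TRUE)
eig <- eigen(cor(swiss))

# Standard deviations of the PCs are the square roots of the eigenvalues
sdev_by_hand <- sqrt(eig$values)
print(rbind(prcomp = swiss_pca$sdev, eigen = sdev_by_hand))

# Loading vectors agree with the eigenvectors up to the sign of each column
max_abs_diff <- max(abs(abs(swiss_pca$rotation) - abs(eig$vectors)))
print(max_abs_diff)
```

The sign of a loading vector is arbitrary, so only absolute values are compared here.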
• Selection of Components:
The summary method describes the importance of the PCs. The first row again gives the standard deviation associated with each PC. The second row shows the proportion of the variance in the data explained by each component, while the third row gives the cumulative proportion of explained variance.
summary(swiss_pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 1.7888 1.0901 0.9207 0.66252 0.45225 0.34765
## Proportion of Variance 0.5333 0.1981 0.1413 0.07315 0.03409 0.02014
## Cumulative Proportion 0.5333 0.7313 0.8726 0.94577 0.97986 1.00000
We can see that the first two principal components account for more than 70% of the variance of the data (73.1%) and that, with the third principal component included, about 87% of the data variability is explained.
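The proportions in the summary table follow directly from the component standard deviations. As a small sketch in base R (the names `variances`, `prop_var` and `cum_prop` are illustrative):

```r
# Proportion and cumulative proportion of variance from the PC standard deviations.
swiss_pca <- prcomp(swiss, center = TRUE, scale. = TRUE)

variances <- swiss_pca$sdev^2            # variance carried by each PC
prop_var  <- variances / sum(variances)  # second row of summary(swiss_pca)
cum_prop  <- cumsum(prop_var)            # third row of summary(swiss_pca)

print(round(rbind(prop_var, cum_prop), 4))
```

With six standardized variables the variances sum to 6, so each proportion is simply the component variance divided by 6.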
• Graphical selection by scree plot:
The plot method returns a plot of the variances (y-axis) associated with the PCs (x-axis). The figure below is useful for deciding how many PCs to retain for further analysis.
An eigenvalue > 1 indicates that a PC accounts for more variance than one of the original variables does in standardized data. This is commonly used as a cutoff point for deciding which PCs are retained.
In this case, we can see that the first two PCs explain most of the variability in the data.
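The eigenvalue-greater-than-1 rule can be applied without any plotting package. A minimal base-R sketch (the names `eigenvalues` and `retained` are illustrative):

```r
# Retain the PCs whose eigenvalue (variance) exceeds 1.
eigenvalues <- prcomp(swiss, center = TRUE, scale. = TRUE)$sdev^2
retained <- which(eigenvalues > 1)
print(round(eigenvalues, 4))
print(retained)
```

This is only a rule of thumb; the third eigenvalue here falls not far below 1, so the cumulative proportion of variance is worth checking alongside it.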
library("factoextra")
## Loading required package: ggplot2
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
eig.val <- get_eigenvalue(swiss_pca)
head(eig.val)
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 3.1997570 53.329283 53.32928
## Dim.2 1.1883082 19.805137 73.13442
## Dim.3 0.8476098 14.126830 87.26125
## Dim.4 0.4389287 7.315478 94.57673
## Dim.5 0.2045337 3.408895 97.98562
## Dim.6 0.1208626 2.014376 100.00000
fviz_screeplot(swiss_pca,ncp=6, choice="eigenvalue")
[Figure: scree plot of eigenvalue (y-axis, 0 to 3) against dimensions 1 to 6 (x-axis)]
Report:
We select the responsible variables whose contribution to the principal components is substantial (loading beyond ± 0.5). Responsible variables with their loadings:
• In PC1: “Examination” (0.51)
• In PC2: “Infant.Mortality” (0.81)
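The selection in the report can be reproduced by filtering the loading matrix. The sketch below uses base R; the cutoff of 0.5 is the one stated in the report, while the names `threshold` and `responsible` are illustrative. Since the sign of a loading vector is arbitrary, the filter works on absolute values.

```r
# Variables whose absolute loading on PC1 or PC2 exceeds the report's cutoff.
swiss_pca <- prcomp(swiss, center = TRUE, scale. = TRUE)
threshold <- 0.5

responsible <- lapply(c(PC1 = "PC1", PC2 = "PC2"), function(pc) {
  loading <- swiss_pca$rotation[, pc]            # named vector of loadings
  round(loading[abs(loading) > threshold], 2)    # keep only the large ones
})
print(responsible)
```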
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
Ā 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
Ā 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
Ā 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lesson
Ā 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
Ā 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
Ā 
Model Call Girl in Tilak Nagar Delhi reach out to us at šŸ”9953056974šŸ”
Model Call Girl in Tilak Nagar Delhi reach out to us at šŸ”9953056974šŸ”Model Call Girl in Tilak Nagar Delhi reach out to us at šŸ”9953056974šŸ”
Model Call Girl in Tilak Nagar Delhi reach out to us at šŸ”9953056974šŸ”
Ā 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Ā 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
Ā 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
Ā 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
Ā 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Ā 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Ā 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
Ā 
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
Ā 
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
Ā 

Multivariate1

  • 1.
Multivariate Analysis using R
Dipika Patra
21 July 2016

1. One Sample Hotelling T-square test :

Descriptive Aspect :
Changes in pulmonary response of 12 workers after 6 hours of exposure to cotton dust, measured by the following 3 variables.
FVC - Changes in Forced Vital Capacity (FVC) after 6 hours
FEV - Changes in Forced Expiratory Volume (FEV) after 6 hours
CC - Changes in Closing Capacity (CC) after 6 hours

Inferential Aspect :
Hypothesis 1 :
The null hypothesis : NO CHANGE IN PULMONARY FUNCTION against
The alternative hypothesis : CHANGE IN PULMONARY FUNCTION
To test the hypothesis we use the one-sample Hotelling T-square statistic, obtained by Hotelling (1931).
Its distribution is indexed by two parameters, the dimension p = 3 and the degrees of freedom v = n - p = 12 - 3 = 9 (the df2 reported in the R output).
We reject the null hypothesis if the calculated T-square is greater than the tabulated value with p = 3, v = 9 at the 5% level of significance or, equivalently, if the P-value is less than 0.05.

• R Codes (to call the data) with Output:
library(ICSNP)
## Loading required package: mvtnorm
## Loading required package: ICS
data(pulmonary)
pulmonary
##      FVC   FEV   CC
## 1  -0.11 -0.12 -4.3
## 2   0.02  0.08  4.4
## 3  -0.02  0.03  7.5
## 4   0.07  0.19 -0.3
## 5  -0.16 -0.36 -5.8
  • 2.
## 6  -0.42 -0.49 14.5
## 7  -0.32 -0.48 -1.9
## 8  -0.35 -0.30 17.3
## 9  -0.10 -0.04  2.5
## 10  0.01 -0.02 -5.6
## 11 -0.10 -0.17  2.2
## 12 -0.26 -0.30  5.5

• R codes (to test the Hypothesis 1) with Output:
attach(pulmonary)
HotellingsT2(pulmonary, mu = c(0,0,0))
##
##  Hotelling's one sample T2-test
##
## data:  pulmonary
## T.2 = 3.8231, df1 = 3, df2 = 9, p-value = 0.05123
## alternative hypothesis: true location is not equal to c(0,0,0)

Conclusion:
Since the P-value = 0.05123 is greater than 0.05, we fail to reject the null hypothesis at the 5% level. That is, based on the given sample, we conclude that there is no significant change in pulmonary function.

Hypothesis 2:
The null hypothesis : CHANGE ONLY IN CLOSING CAPACITY, BY 2 UNITS against
The alternative hypothesis : OTHER THAN NULL HYPOTHESIS
To test this hypothesis the following code is used.

• R Codes (to test the Hypothesis 2) with Output:
HotellingsT2(pulmonary, mu = c(0,0,2))
##
##  Hotelling's one sample T2-test
##
## data:  pulmonary
## T.2 = 6.6204, df1 = 3, df2 = 9, p-value = 0.01178
## alternative hypothesis: true location is not equal to c(0,0,2)

Conclusion:
Since the P-value = 0.01178 is greater than 0.01, we fail to reject the null hypothesis at the 1% level (at the 5% level used for Hypothesis 1 it would be rejected). That is, at the 1% level the sample is consistent with a change of 2 units in closing capacity and no change in FVC or FEV.
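The T.2 value printed by HotellingsT2 is the F-scaled form of Hotelling's statistic. As a minimal sketch in base R (not the ICSNP implementation itself), the computation can be written out by hand. The data frame is retyped from the listing above, and the helper name `hotelling_one_sample` is hypothetical.

```r
# Pulmonary data retyped from the listing above (12 workers, 3 variables).
pulmonary <- data.frame(
  FVC = c(-0.11, 0.02, -0.02, 0.07, -0.16, -0.42, -0.32, -0.35, -0.10, 0.01, -0.10, -0.26),
  FEV = c(-0.12, 0.08, 0.03, 0.19, -0.36, -0.49, -0.48, -0.30, -0.04, -0.02, -0.17, -0.30),
  CC  = c(-4.3, 4.4, 7.5, -0.3, -5.8, 14.5, -1.9, 17.3, 2.5, -5.6, 2.2, 5.5)
)

# Hypothetical helper: one-sample Hotelling T^2 test of H0: mean vector = mu0.
hotelling_one_sample <- function(x, mu0) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  d <- colMeans(x) - mu0                  # deviation of the sample mean from mu0
  S <- cov(x)                             # sample covariance matrix
  t2 <- n * drop(t(d) %*% solve(S, d))    # T^2 = n * d' S^{-1} d
  f  <- (n - p) / (p * (n - 1)) * t2      # F-scaled statistic with (p, n - p) df
  list(T2 = t2, F = f, df1 = p, df2 = n - p,
       p.value = pf(f, p, n - p, lower.tail = FALSE))
}

res1 <- hotelling_one_sample(pulmonary, mu0 = c(0, 0, 0))  # Hypothesis 1
res2 <- hotelling_one_sample(pulmonary, mu0 = c(0, 0, 2))  # Hypothesis 2
print(unlist(res1))
print(unlist(res2))
```

The `F` component is the quantity to compare with T.2 in the package output; `T2` is the unscaled statistic, which the package does not print by default.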
  • 3.
2. Two Sample Hotelling T-square test:

Generating Data Using R code:
Consider two teachers rated by two independent groups of 3 and 6 students respectively, on satisfaction level (scale of 6) and knowledge level (scale of 10) :
# knowledge values as shown in the printed data below
math.teach <- data.frame(teacher = factor(rep(1:2, c(3, 6))),
                         satisfaction = c(1, 3, 2, 4, 6, 6, 5, 5, 4),
                         knowledge = c(3, 7, 2, 6, 8, 8, 10, 10, 6))
math.teach
##   teacher satisfaction knowledge
## 1       1            1         3
## 2       1            3         7
## 3       1            2         2
## 4       2            4         6
## 5       2            6         8
## 6       2            6         8
## 7       2            5        10
## 8       2            5        10
## 9       2            4         6

Graphical Display (Multiple Boxplot) :
[Figure: boxplots of Knowledge by Teacher (1, 2) and of Satisfaction by Teacher (1, 2)]

Hypothesis 1:
  • 4.
Consider testing the null hypothesis that the two groups have identical mean vectors, against the general alternative that the mean vectors are not equal.
The null hypothesis : NO DIFFERENCE IN RATING BETWEEN TWO GROUPS against
The alternative hypothesis : DIFFERENCE IN RATING BETWEEN TWO GROUPS
To test the hypothesis we use the two-sample Hotelling T-square statistic.

Hotelling T-Square Statistic for two samples:
$$T^2 = (\bar{X}_1 - \bar{X}_2)' \, S_p^{-1} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)^{-1} (\bar{X}_1 - \bar{X}_2)$$
where $S_p$ is the pooled variance-covariance matrix, and
$$T^2 \sim \frac{p(n_1 + n_2 - 2)}{n_1 + n_2 - p - 1} \, F(p,\; n_1 + n_2 - p - 1)$$
The distribution is indexed by two parameters, the dimension p = 2 and the degrees of freedom v = n1 + n2 - p - 1 = 3 + 6 - 2 - 1 = 6.
We reject the null hypothesis if the calculated T-square is greater than the tabulated value with p = 2, v = 6 at the 1% level of significance or, equivalently, if the P-value is less than 0.01.

• R Codes (to test the Hypothesis 1) with Output:
attach(math.teach)
HotellingsT2(cbind(satisfaction, knowledge) ~ teacher, mu=c(0,0))
##
##  Hotelling's two sample T2-test
##
## data:  cbind(satisfaction, knowledge) by teacher
## T.2 = 9, df1 = 2, df2 = 6, p-value = 0.01562
## alternative hypothesis: true location difference is not equal to c(0,0)

Conclusion:
Since the P-value = 0.01562 is greater than 0.01, we fail to reject the null hypothesis at the 1% level. That is, based on the given sample, we conclude that there is no significant difference in rating between the two groups at that level.

Hypothesis 2:
Consider testing the null hypothesis that the difference between the two groups' mean vectors is (-1, 1)', against the general alternative that the difference is anything else.
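A hypothesised difference other than zero only shifts the observed mean difference before the quadratic form is computed. The sketch below writes the two-sample statistic out in base R under the equal-covariance assumption; it is not the ICSNP code, and the helper name `hotelling_two_sample` and its `delta0` argument are hypothetical. The data are those printed in the listing for `math.teach`.

```r
# Teacher ratings as printed in the data listing above.
math.teach <- data.frame(
  teacher      = factor(rep(1:2, c(3, 6))),
  satisfaction = c(1, 3, 2, 4, 6, 6, 5, 5, 4),
  knowledge    = c(3, 7, 2, 6, 8, 8, 10, 10, 6)
)

# Hypothetical helper: two-sample Hotelling T^2 test of
# H0: mean(group 1) - mean(group 2) = delta0, assuming a common covariance matrix.
hotelling_two_sample <- function(x1, x2, delta0 = rep(0, ncol(x1))) {
  x1 <- as.matrix(x1)
  x2 <- as.matrix(x2)
  n1 <- nrow(x1); n2 <- nrow(x2); p <- ncol(x1)
  d  <- colMeans(x1) - colMeans(x2) - delta0               # shifted mean difference
  Sp <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)  # pooled covariance
  t2 <- drop(t(d) %*% solve(Sp * (1 / n1 + 1 / n2), d))    # T^2
  df2 <- n1 + n2 - p - 1
  f <- df2 / (p * (n1 + n2 - 2)) * t2                      # F-scaled statistic
  list(T2 = t2, F = f, df1 = p, df2 = df2,
       p.value = pf(f, p, df2, lower.tail = FALSE))
}

ratings <- math.teach[, c("satisfaction", "knowledge")]
g1 <- ratings[math.teach$teacher == "1", ]
g2 <- ratings[math.teach$teacher == "2", ]

h1 <- hotelling_two_sample(g1, g2)                      # equal mean vectors
h2 <- hotelling_two_sample(g1, g2, delta0 = c(-1, 1))   # difference of (-1, 1)
print(unlist(h1))
print(unlist(h2))
```

For these data the pooled computation gives T^2 = 21 for the zero-difference hypothesis, and scaling by 6/14 yields the F-scaled value 9 on (2, 6) degrees of freedom.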
  • 5.
The null hypothesis : GIVEN SPECIFIC DIFFERENCE IN RATING BETWEEN TWO GROUPS against
The alternative hypothesis : DIFFERENCE IN RATING BETWEEN TWO GROUPS OTHER THAN THE GIVEN
To test the hypothesis we use the two-sample Hotelling T-square statistic.

• R Codes (to test the Hypothesis 2) with Output:
HotellingsT2(cbind(satisfaction, knowledge) ~ teacher, mu=c(-1,1))
##
##  Hotelling's two sample T2-test
##
## data:  cbind(satisfaction, knowledge) by teacher
## T.2 = 5.6897, df1 = 2, df2 = 6, p-value = 0.04115
## alternative hypothesis: true location difference is not equal to c(-1,1)

Conclusion:
Since the P-value = 0.04115 is less than 0.05, we reject the null hypothesis at the 5% level (it would not be rejected at the 1% level used for Hypothesis 1). That is, based on the given sample, we conclude that the difference in mean rating between the two groups differs significantly from (-1, 1)'.

Hypothesis 3:
Consider testing the null hypothesis that the difference between the two groups' mean vectors is (1, 1)', against the general alternative that the difference is anything else.
The null hypothesis : GIVEN SPECIFIC DIFFERENCE IN RATING BETWEEN TWO GROUPS against
The alternative hypothesis : DIFFERENCE IN RATING BETWEEN TWO GROUPS OTHER THAN THE GIVEN
To test the hypothesis we use the two-sample Hotelling T-square statistic.

• R Codes (to test the Hypothesis 3) with Output:
HotellingsT2(cbind(satisfaction, knowledge) ~ teacher, mu=c(1,1))
##
##  Hotelling's two sample T2-test
##
## data:  cbind(satisfaction, knowledge) by teacher
## T.2 = 16.034, df1 = 2, df2 = 6, p-value = 0.003915
## alternative hypothesis: true location difference is not equal to c(1,1)
  • 6.
Conclusion:
Since the P-value = 0.003915 is less than 0.01, we reject the null hypothesis at the 1% level. That is, based on the given sample, we conclude that the difference in mean rating between the two groups differs significantly from (1, 1)'.

Hypothesis 4:
Consider testing the null hypothesis that the difference between the two groups' mean vectors is (2, 2)', against the general alternative that the difference is anything else.
The null hypothesis : GIVEN SPECIFIC DIFFERENCE IN RATING BETWEEN TWO GROUPS against
The alternative hypothesis : DIFFERENCE IN RATING BETWEEN TWO GROUPS OTHER THAN THE GIVEN
To test the hypothesis we use the two-sample Hotelling T-square statistic.

• R Codes (to test the Hypothesis 4) with Output:
HotellingsT2(cbind(satisfaction, knowledge) ~ teacher, mu=c(2,2))
##
##  Hotelling's two sample T2-test
##
## data:  cbind(satisfaction, knowledge) by teacher
## T.2 = 25.138, df1 = 2, df2 = 6, p-value = 0.001212
## alternative hypothesis: true location difference is not equal to c(2,2)

Conclusion:
Since the P-value = 0.001212 is less than 0.01, we reject the null hypothesis at the 1% level. That is, based on the given sample, we conclude that the difference in mean rating between the two groups differs significantly from (2, 2)'.

3. Two Way MANOVA (two factors) :

Descriptive Aspect :
Triathlon performance :- Multi-sport race where 60 competitors complete a swim course, a bike course, and a run course, in that order.
Factors :- gender (2 levels), age group (3 levels) and their interaction (6 cells)
  • 7.
Research Question :
Whether gender (M/F) or age category (CAT1, CAT2, CAT3) has an effect on the times for the individual sports.

• R Codes (to read the data from CSV file) with Output:
getwd()
## [1] "C:/Users/User/Documents"
data.manova <- read.csv("triathlon.csv", header = TRUE)
data.manova
##    GENDER CATEGORY SWIM BIKE RUN
## 1       F     CAT1   52  252 145
## 2       F     CAT1   44  238 163
## 3       F     CAT1   42  196  83
## 4       F     CAT1   46  238 179
## 5       F     CAT1   42  238 136
## 6       F     CAT1   38  203 176
## 7       F     CAT1   50  336  95
## 8       F     CAT1   40  196 152
## 9       F     CAT1   42  238 108
## 10      F     CAT1   40  266 132
## 11      F     CAT2   34  322 147
## 12      F     CAT2   42  238 161
## 13      F     CAT2   34  217 173
## 14      F     CAT2   36  217 154
## 15      F     CAT2   46  252 120
## 16      F     CAT2   38  182 143
## 17      F     CAT2   32  245 126
## 18      F     CAT2   38  231 162
## 19      F     CAT2   30  161 150
## 20      F     CAT2   28  210 136
## 21      F     CAT3   28  182 111
## 22      F     CAT3   28  217 119
## 23      F     CAT3   32  210 141
## 24      F     CAT3   32  238 168
## 25      F     CAT3   26  210 171
## 26      F     CAT3   26  189 123
## 27      F     CAT3   24  147  89
## 28      F     CAT3   30  217 140
## 29      F     CAT3   28  259 105
## 30      F     CAT3   28  203 131
## 31      M     CAT1   50  294 103
## 32      M     CAT1   48  329 109
## 33      M     CAT1   58  357 161
## 34      M     CAT1   50  245  87
## 35      M     CAT1   52  259 172
## 36      M     CAT1   56  308 178
## 37      M     CAT1   50  308 152
## 38      M     CAT1   48  343 170
  • 8.
## 39      M     CAT1   48  301 115
## 40      M     CAT1   52  252 123
## 41      M     CAT2   36  224 151
## 42      M     CAT2   38  273 150
## 43      M     CAT2   34  259 133
## 44      M     CAT2   34  217  90
## 45      M     CAT2   38  252 172
## 46      M     CAT2   38  224  80
## 47      M     CAT2   34  217 171
## 48      M     CAT2   42  287 164
## 49      M     CAT2   36  252 166
## 50      M     CAT2   40  280 154
## 51      M     CAT3   22  196 143
## 52      M     CAT3   20  196 167
## 53      M     CAT3   20  175 127
## 54      M     CAT3   24  154  80
## 55      M     CAT3   22  189 152
## 56      M     CAT3   24  175 137
## 57      M     CAT3   28  231 125
## 58      M     CAT3   26  217 156
## 59      M     CAT3   24  196 149
## 60      M     CAT3   22  161 113

• Formatting data to run MANOVA :
gender <- as.factor(data.manova[,1])
cat <- as.factor(data.manova[,2])
times <- as.matrix(data.manova[,3:5])
head(times)
##      SWIM BIKE RUN
## [1,]   52  252 145
## [2,]   44  238 163
## [3,]   42  196  83
## [4,]   46  238 179
## [5,]   42  238 136
## [6,]   38  203 176

• R Code for two way MANOVA considering interaction effect of gender & age :
output <- manova(times~gender*cat)
output
## Call:
##    manova(times ~ gender * cat)
##
## Terms:
##                   gender      cat gender:cat Residuals
## resp 1             24.07  4709.20     396.93    738.80
## resp 2           6468.82 51696.63   15093.63  65321.90
## resp 3              2.02  1681.60     212.13  43755.90
## Deg. of Freedom        1        2          2        54
  • 9.
##
## Residual standard errors: 3.698849 34.78024 28.46567
## Estimated effects may be unbalanced

Wilks' lambda test :
summary(output, test="Wilks")
##            Df   Wilks approx F num Df den Df    Pr(>F)
## gender      1 0.90547   1.8095      3     52 0.1568890
## cat         2 0.12952  30.8289      6    104 < 2.2e-16 ***
## gender:cat  2 0.62497   4.5923      6    104 0.0003562 ***
## Residuals  54
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pillai's trace test (default in R):
summary(output, test="Pillai")
##            Df  Pillai approx F num Df den Df    Pr(>F)
## gender      1 0.09453   1.8095      3     52   0.15689
## cat         2 0.90048  14.4686      6    106 5.338e-12 ***
## gender:cat  2 0.37636   4.0952      6    106   0.00098 ***
## Residuals  54
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Hotelling-Lawley's trace test :
summary(output, test="Hotelling")
##            Df Hotelling-Lawley approx F num Df den Df    Pr(>F)
## gender      1           0.1044    1.810      3     52 0.1568890
## cat         2           6.4889   55.156      6    102 < 2.2e-16 ***
## gender:cat  2           0.5979    5.083      6    102 0.0001335 ***
## Residuals  54
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

• R Code with output for separate study for each response :
summary.aov(output)
##  Response SWIM :
##             Df Sum Sq Mean Sq  F value    Pr(>F)
## gender       1   24.1   24.07   1.7591    0.1903
## cat          2 4709.2 2354.60 172.1012 < 2.2e-16 ***
## gender:cat   2  396.9  198.47  14.5062 9.073e-06 ***
## Residuals   54  738.8   13.68
  • 10.
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##  Response BIKE :
##             Df Sum Sq Mean Sq F value    Pr(>F)
## gender       1   6469  6468.8  5.3476  0.024591 *
## cat          2  51697 25848.3 21.3682 1.458e-07 ***
## gender:cat   2  15094  7546.8  6.2388  0.003651 **
## Residuals   54  65322  1209.7
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##  Response RUN :
##             Df Sum Sq Mean Sq F value Pr(>F)
## gender       1      2    2.02  0.0025 0.9604
## cat          2   1682  840.80  1.0376 0.3612
## gender:cat   2    212  106.07  0.1309 0.8776
## Residuals   54  43756  810.29

• R Code for two way MANOVA without interaction effect of gender & age :
An additive (no-interaction) model would be written times ~ gender + cat. The call shown here uses *, so it refits the interaction model and repeats the earlier output.
manova(times~gender*cat)
## Call:
##    manova(times ~ gender * cat)
##
## Terms:
##                   gender      cat gender:cat Residuals
## resp 1             24.07  4709.20     396.93    738.80
## resp 2           6468.82 51696.63   15093.63  65321.90
## resp 3              2.02  1681.60     212.13  43755.90
## Deg. of Freedom        1        2          2        54
##
## Residual standard errors: 3.698849 34.78024 28.46567
## Estimated effects may be unbalanced

• Particular effect of gender :
manova(times~gender)
## Call:
##    manova(times ~ gender)
##
## Terms:
##                   gender Residuals
## resp 1             24.07   5844.93
## resp 2           6468.82 132112.17
## resp 3              2.02  45649.63
## Deg. of Freedom        1        58
##
## Residual standard errors: 10.03866 47.72626 28.05464
## Estimated effects may be unbalanced
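In R model formulas, `*` expands to main effects plus their interaction, while `+` keeps main effects only. The sketch below illustrates the difference on synthetic data laid out like the triathlon design (2 genders x 3 categories x 10 athletes). The simulated times are arbitrary stand-ins, not the values from triathlon.csv, so only the model structure is meaningful.

```r
set.seed(1)

# Synthetic, balanced stand-in for the triathlon layout (illustration only).
design <- expand.grid(rep = 1:10,
                      cat = c("CAT1", "CAT2", "CAT3"),
                      gender = c("F", "M"))
gender <- factor(design$gender)
cat    <- factor(design$cat)

# Simulated response matrix; means and spreads are arbitrary assumptions.
times <- cbind(SWIM = rnorm(60, mean = 40,  sd = 5),
               BIKE = rnorm(60, mean = 240, sd = 35),
               RUN  = rnorm(60, mean = 140, sd = 28))

fit_inter <- manova(times ~ gender * cat)  # gender + cat + gender:cat
fit_add   <- manova(times ~ gender + cat)  # main effects only

terms_inter <- attr(terms(fit_inter), "term.labels")
terms_add   <- attr(terms(fit_add), "term.labels")
print(terms_inter)
print(terms_add)

# Dropping the 2-df interaction moves those degrees of freedom into the residuals.
print(c(interaction = df.residual(fit_inter), additive = df.residual(fit_add)))
```

With 60 observations the interaction model leaves 54 residual degrees of freedom, matching the output above, while the additive model leaves 56.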
  • 11.
• Particular effect of age category :
manova(times~cat)
## Call:
##    manova(times ~ cat)
##
## Terms:
##                      cat Residuals
## resp 1            4709.2    1159.8
## resp 2          51696.63  86884.35
## resp 3           1681.60  43970.05
## Deg. of Freedom        2        57
##
## Residual standard errors: 4.510806 39.04212 27.77417
## Estimated effects may be unbalanced

4. Principal Component Analysis :

Descriptive Aspect :
Switzerland, in 1888, was entering a period known as the demographic transition; i.e., its fertility was beginning to fall from the high level typical of underdeveloped countries. The data were collected for 47 French-speaking “provinces” on the following 6 variables.
Fertility - Common standardized fertility measure
Agriculture - Percentage of males involved in agriculture as occupation
Examination - Percentage of draftees receiving highest mark on army examination
Education - Percentage of draftees with education beyond primary school
Catholic - Percentage of Catholics
Infant.Mortality - Live births who live less than 1 year
Here, all variables are scaled to [0, 100], where in the original, all but “Catholic” were scaled to [0, 1].

Purpose :
- To reduce the dimensionality of the data
- To decrease redundancy
- To identify the variables that work together to create the dynamics of the system

• R Codes (to call the data) with Output:
library(psych)
attach(swiss)
  • 12.
Graphical Display to explore dependence structure:
The scatter plot matrix reveals moderate to strong correlation among several of the 6 variables; Infant.Mortality is only weakly related to most of the others.
plot(swiss, col="blue", pch=20)
[Figure: scatterplot matrix of Fertility, Agriculture, Examination, Education, Catholic and Infant.Mortality]
round(cor(swiss),2)
##                  Fertility Agriculture Examination Education Catholic
## Fertility             1.00        0.35       -0.65     -0.66     0.46
## Agriculture           0.35        1.00       -0.69     -0.64     0.40
## Examination          -0.65       -0.69        1.00      0.70    -0.57
## Education            -0.66       -0.64        0.70      1.00    -0.15
## Catholic              0.46        0.40       -0.57     -0.15     1.00
## Infant.Mortality      0.42       -0.06       -0.11     -0.10     0.18
##                  Infant.Mortality
## Fertility                    0.42
## Agriculture                 -0.06
## Examination                 -0.11
## Education                   -0.10
## Catholic                     0.18
## Infant.Mortality             1.00

• Principal Components :
  • 13.
swiss_pca <- prcomp(swiss, center = TRUE, scale. = TRUE)
swiss_pca
## Standard deviations:
## [1] 1.7887865 1.0900955 0.9206573 0.6625169 0.4522540 0.3476529
##
## Rotation:
##                         PC1        PC2         PC3         PC4         PC5
## Fertility        -0.4569876  0.3220284 -0.17376638  0.53555794 -0.38308893
## Agriculture      -0.4242141 -0.4115132  0.03834472 -0.64291822 -0.37495215
## Examination       0.5097327  0.1250167 -0.09123696 -0.05446158 -0.81429082
## Education         0.4543119  0.1790495  0.53239316 -0.09738818  0.07144564
## Catholic         -0.3501111  0.1458730  0.80680494  0.09947244 -0.18317236
## Infant.Mortality -0.1496668  0.8111645 -0.16010636 -0.52677184  0.10453530
##                          PC6
## Fertility         0.47295441
## Agriculture       0.30870058
## Examination      -0.22401686
## Education         0.68081610
## Catholic         -0.40219666
## Infant.Mortality -0.07457754
- Center and scale refer to the mean and standard deviation of each variable, which are used to standardize the data before PCA is carried out.
- The rotation matrix provides the principal component loadings; each column contains the loading vector of one principal component.

• Selection of Components :
The summary method describes the importance of the PCs. The first row gives the standard deviation associated with each PC, the second row the proportion of the variance in the data explained by each component, and the third row the cumulative proportion of explained variance.
summary(swiss_pca)
## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5     PC6
## Standard deviation     1.7888 1.0901 0.9207 0.66252 0.45225 0.34765
## Proportion of Variance 0.5333 0.1981 0.1413 0.07315 0.03409 0.02014
## Cumulative Proportion  0.5333 0.7313 0.8726 0.94577 0.97986 1.00000
We can see that the first two principal components account for more than 70% of the variance of the data, and adding the third brings the explained variability to about 87%.
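The proportions in the summary table follow directly from the eigenvalues of the correlation matrix, since PCA on standardized variables is an eigen-decomposition of `cor(swiss)`. A minimal base R sketch of that link, using only the `swiss` data that ships with R (variable names such as `prop_var` are illustrative):

```r
# swiss is part of the datasets package that ships with base R.
R_mat <- cor(swiss)
eig   <- eigen(R_mat, symmetric = TRUE)
pca   <- prcomp(swiss, center = TRUE, scale. = TRUE)

# Eigenvalues of the correlation matrix are the squared PC standard deviations.
prop_var <- eig$values / sum(eig$values)  # proportion of variance per component
cum_var  <- cumsum(prop_var)              # cumulative proportion

print(round(rbind(eigenvalue = eig$values,
                  proportion = prop_var,
                  cumulative = cum_var), 4))

# The eigenvectors agree with the prcomp loadings up to the sign of each column.
print(all.equal(abs(eig$vectors), abs(unname(pca$rotation))))
```

Because the six standardized variables each have unit variance, the eigenvalues sum to 6, which is why the proportions are the eigenvalues divided by 6.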
• Graphical selection by Screeplot:
The plot method returns a plot of the variances (y-axis) associated with the PCs (x-axis). The figure below is useful for deciding how many PCs to retain for further analysis. An eigenvalue > 1 indicates that a PC accounts for more variance than any single original variable in the standardized data, and this is commonly used as a cutoff for retaining PCs. In this case only the first two PCs have eigenvalues above 1, and together they explain most of the variability in the data.
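The eigenvalue-greater-than-one cutoff can also be applied numerically, without any plotting package. A short base R sketch (the name `n_keep` is illustrative):

```r
# Eigenvalues of the correlation matrix of the standardized swiss data.
eigenvalues <- eigen(cor(swiss), symmetric = TRUE, only.values = TRUE)$values

# Count the components whose eigenvalue exceeds 1.
n_keep <- sum(eigenvalues > 1)
print(round(eigenvalues, 3))
print(n_keep)
```

The rule is a heuristic; the third eigenvalue here is below 1 but not negligible, so the cumulative proportion of variance is worth checking alongside it.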
  • 14.
library("factoextra")
## Loading required package: ggplot2
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
##     %+%, alpha
eig.val <- get_eigenvalue(swiss_pca)
head(eig.val)
##       eigenvalue variance.percent cumulative.variance.percent
## Dim.1  3.1997570        53.329283                    53.32928
## Dim.2  1.1883082        19.805137                    73.13442
## Dim.3  0.8476098        14.126830                    87.26125
## Dim.4  0.4389287         7.315478                    94.57673
## Dim.5  0.2045337         3.408895                    97.98562
## Dim.6  0.1208626         2.014376                   100.00000
fviz_screeplot(swiss_pca, ncp=6, choice="eigenvalue")
[Figure: scree plot of eigenvalue (y-axis) against dimensions 1 to 6 (x-axis)]
  • 15.
Report :
We select the responsible variables, i.e. those whose contribution to the retained principal components is substantial (absolute loading above 0.5).
Responsible variables with their loadings :
• In PC1 “Examination (0.51)”
• In PC2 “Infant.Mortality (0.81)”
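The selection in the report can be reproduced by filtering the loading matrix on its absolute values. The sketch below uses base R only; the 0.5 cutoff is the one stated in the report, and names such as `responsible` are illustrative. Absolute values are used because the sign of a principal component is arbitrary and may differ between platforms.

```r
pca <- prcomp(swiss, center = TRUE, scale. = TRUE)
threshold <- 0.5  # cutoff stated in the report above

# For each retained component, keep variables with |loading| above the cutoff.
responsible <- lapply(c(PC1 = "PC1", PC2 = "PC2"), function(pc) {
  v <- pca$rotation[, pc]            # named loading vector for this component
  round(v[abs(v) > threshold], 2)
})
print(responsible)
```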