2. E1A dataset
• Adopted from an example by Maarit Suomalainen.
• E1A cytoplasmic intensity was measured in A549 cells infected for with varying amounts of
wild Ad5 (0.031ul, 0.0625ul, 0.125ul, 0.25ul).
• Cells were infected for varying amounts of time (11hrs,7hrs, and 4hrs) .
Aim:
• We want to detect association between the Time of infection and the cytoplasmic E1A
intensity.
• The values show the E1A cytoplasmic intensity (intensity of the E1A signal obtained from
infections with different concentrations of wt Ad5) by the time of infection (11hrs and 7hrs,
7hrs and 4hrs).
• H
O : µ
BB=µ
BA (null hypothesis i.e. No difference between 11hrs and 7hrs infection)
•
• H
A : µ
BB≠µ
BA (alternate hypothesis i.e. there is a significant difference between 11hrs and 7hrs
infection with respect to the E1A cytoplasmic intensity)
(C) Abhilash Kannan- to be used only for
educational purposes
3. Why E1A??
• The first viral gene to be transcribed is early region 1A (E1A)
• The 13S and 12S mRNAs are the most abundant at early times during
infection.
• 9S mRNA is the most abundant at latetimes.
• The 11S and 10S mRNA are minor species that become more abundant
at late times after infection.
(C) Abhilash Kannan- to be used only for
educational purposes
6. • Consider the typical observation from the quantigene data (First 9 values for E1A from 11hrs and 7hrs
infection at 0.031ul ofAd5wt virus)
11hrs 0.016735 0.017585 0.031259 0.011706 0.024269 0.016424 0.01321 0.003255 0.003796
7hrs 0.006039 0.005799 0.003534 0.003393 0.008359 0.003465 0.013854 0.012815 0.031331
difference 0.010696 0.011786 0.027725 0.008313 0.01591 0.012959 -0.00064 -0.00956 -0.02753
• If the time of the virus infection (11hrs and 7hrs) has made no difference, then an outcome of 0.016735
for the 11hrs and 0.006039 for the 7hrs treatment might equally well have been 0.006039 for the 11hrs
and 0.016735 for the 7hrs
11hrs 0.006039 0.017585 0.031259 0.011706 0.024269 0.016424 0.01321 0.003255 0.003796
7hrs 0.016735 0.005799 0.003534 0.003393 0.008359 0.003465 0.013854 0.012815 0.031331
difference 0.010696 0.011786 0.027725 0.008313 0.01591 0.012959 -0.00064 -0.00956 -0.02753
difference 0.010696 0.011786 0.027725 0.008313 0.01591 0.012959 -0.00064 -0.00956 -0.02753
• A difference of 0.010696 becomes a difference of −0.010696
• There would be 29= 512 permutations (combinations), and a mean difference associated with each
permutation
• We then locate the mean difference for the data that we observed within this permutation distribution.
• The p-value is the proportion of values that are as large in absolute value as, or larger than, the mean for
the data.
(C) Abhilash Kannan- to be used only for
educational purposes
7. Parametric Test Assumptions of the parametric test Non-parametric alternatives
Two independent (unpaired)
samples Student's t test
1) data from both samples are
randomly selected
2) data from both samples come from
normally distributed populations
3) homogeneity of variance
(variances are equal)
Resampling methods – Permutation and
bootstapping analysis
Two dependent (paired)
samples Student's t test
1) the differences (di) must come
from a normally distributed
population of differences)
Wilcoxon signed rank (paired samples or
matched pairs) test
ANOVA 1) data from all samples are randomly
selected
2) data from all samples come from
normally distributed populations
3) homogeneity of variance
(variances are equal)
Kruskal-Wallis H test
Pearson Product Moment
Correlation Coefficient
Analysis
1) Y data for each X must be
randomly selected from a normal
distribution ofY values
2) X data for each Ymust be
randomly selected from a normal
distribution of X values
Spearman Rank Correlation
Kendall’s rank Correlation
CoefficientAnalysis
Types of Non-parametric Tests
(C) Abhilash Kannan- to be used only for
educational purposes
8. Actual Difference Mean of the difference
0.005516592 -0.027535 -0.00956 -0.00064 0.008313 0.010696 0.011786 0.012959 0.01591 0.027725 0.005516592
0.0275346 -0.00956 -0.00064 0.008313 0.010696 0.011786 0.012959 0.01591 0.027725 0.011635389
-0.027535 0.00956 -0.00064 0.008313 0.010696 0.011786 0.012959 0.01591 0.027725 0.007641054
-0.027535 -0.00956 0.000644 0.008313 0.010696 0.011786 0.012959 0.01591 0.027725 0.005659747
-0.027535 0.00956 0.000644 0.008313 0.010696 0.011786 0.012959 0.01591 0.027725 0.007784208
0.0275346 0.00956 -0.00064 0.008313 0.010696 0.011786 0.012959 0.01591 0.027725 0.01375985
0.0275346 -0.00956 0.000644 0.008313 0.010696 0.011786 0.012959 0.01591 0.027725 0.011778544
0.0275346 0.00956 0.000644 0.008313 0.010696 0.011786 0.012959 0.01591 0.027725 0.013903005
0.0275346 0.00956 0.000644 0.008313 0.010696 0.011786 0.012959 0.01591 -0.02772 0.007741998
0.0275346 0.00956 0.000644 -0.00831 0.010696 0.011786 0.012959 0.01591 -0.02772 0.00589461
0.0275346 0.00956 0.000644 0.008313 0.010696 0.011786 0.012959 -0.01591 0.027725 0.010367499
0.0275346 0.00956 0.000644 0.008313 0.010696 0.011786 -0.01296 -0.01591 0.027725 0.007487782
0.0275346 0.00956 0.000644 0.008313 0.010696 -0.01179 -0.01296 0.01591 0.027725 0.00840416
0.0275346 0.00956 0.000644 0.008313 -0.0107 -0.01179 -0.01296 0.01591 0.027725 0.006027309
0.0275346 0.00956 0.000644 -0.00831 -0.0107 -0.01179 0.012959 0.01591 0.027725 0.007059638
0.0275346 0.00956 -0.00064 -0.00831 -0.0107 -0.01179 0.012959 0.01591 0.027725 0.006916483
0.0275346 0.00956 0.000644 -0.00831 -0.0107 0.011786 0.012959 0.01591 0.027725 0.009678765
Combinations
• In the permutation distribution, these each have an equal probability of taking a positive or a negative sign.
• There are 2^n possibilities, and hence 29 = 512 different values for d¯. (n is the sample size)
• we have a total of 57 possible combinations that give a mean difference that is as large as or larger than in
the actual sample, where the value for pair 8 has a negative sign
Difference 0.010696 0.011786 0.027725 0.008313 0.01591 0.012959 -0.00064 -0.00956 -0.02753
• There are another 57 possibilities that give a mean difference that is of the same absolute value, but negative.
Hence p = 114/512 = 0.22.
(C) Abhilash Kannan- to be used only for
educational purposes
10. Steps
• Compute the difference between the means of two treatments → observed difference
Mean(E1A_11hrs) – Mean(E1A_7hrs)
• Combine two conditions into one dataset (to break the association → HO)
Pool the data together
• Repeat the following two steps for a large number of times (e.g.,1,000 permutations):
• Sample two samples from combined dataset without replacement
• Compute difference between means of the two sampled (i.e.,permuted) datasets
• Compute the fraction of how many times the permuted differences ≥ observed difference out of the
total number of permutations → p−value
(C) Abhilash Kannan- to be used only for
educational purposes
11. Permutation
Here we combine the two samples (11hrsand 7hrstreatment) into a single dataset such that under the null
hypothesis, there is no difference between the two groups.
New dataset with the permuted means after each iterations - one would have 10000 means of permuted samples
from each conditions.
.
.
(C) Abhilash Kannan- to be used only for
educational purposes
12. Means of Permuted Samples
• While the observed mean 11hrs infection and 7hrs infection was:
Mean of 11hrs infection = 0.0178718
Mean of 7hrs infection = 0.01049993
• Note there is a very little difference between the permuted means = -0.00000043 compared to observed
mean = -0.00737187
• We can check the distribution of means of the permuted samples:
(C) Abhilash Kannan- to be used only for
educational purposes
13. Difference between Means of Permuted Samples to calculate the confidence intervals
• We can now set the confidence intervals for the above differences.
• Since we would want test the statistical significance between two conditions (11hrs
vs 7hrs) for a particular amount of virus infection(0.031ul/well).
• We can set the level of significance to 5%. This means that the finding has a five
percent (.05) chance of not being true, which is the converse of a 95% chance of
being true (if true difference exists, it would seen for 95 out of 100 observations).
-0.005 0.000 0.005
Differences
Permuted Differences
(C) Abhilash Kannan- to be used only for
educational purposes
14. Computing the p-value
• p-value can be calculated from the distribution of mean differences of 11hrs and 7 hrs treatment from the
permuted sample.
• Since we have already calculated the confidence interval for these differences, we can check if any of the
difference between the two conditions from the our observed samples fall within this computed
confidence interval.
0
500
1000
1500
-0.005 0.000 0.005
Differences
density
Permuted Differences
• The difference is clearly significant.
• It can be clearly seen that observed difference never ovelaps with the confidence intervals (Two red solid
intercept) of the permuted differences.
• Thus the number of times the permuted differences ≥ observed difference out of the total number of
permutations gives the final p-value.
pvalue = sum(abs(diff_permuted) >=
abs(diff_observed)) / permutations
(C) Abhilash Kannan- to be used only for
educational purposes
15. Based on the p-value, the mean cytoplasmic E1A values is significantly different
between the cells infected for 11hrs and 7hrs with 0.031ul Ad5wt/well.
Nevertheless, p-values should not be 0 according to this
paper:
“Unless the dataset is very small (less than about 20-30 total
numbers, typically) or the test statistic has a particularly nice
mathematical form, is not practicable to generate all the
permutations. Therefore computer implementations of
permutation tests typically sample from the permutation
distribution. They do so by generating some independent
random permutations and hope that the results are a
representative sample of all the permutations”.
(C) Abhilash Kannan- to be used only for
educational purposes
16. Examples where Permutation is not able to detect the differences
(C) Abhilash Kannan- to be used only for
educational purposes
17. Based on the Permutation resampling analysis:
1. Significant difference in cytoplasmic E1Aintensities btween 11hrs, 7hrs and 4 hrs infection with different
virus concentration (0.031ul, 0.0625ul, 0.125ul and 0.25ul/well)
2. Some of the technical replicates also show significant differences
Example:
0.031ul virus (4hrs)
E09 vs E10 - NS
E09 vs E11 - NS
E10 vs E11 – S
0.0625ul Virus (11hrs)
D03 vs D04 - NS
D04 vs D05 - NS
D03 vs D05 – S
0.125ulVirus (11hrs)
C03 vs C04 - S
C04 vs C05 - NS
C03 vs C05 - S
(C) Abhilash Kannan- to be used only for
educational purposes
18. Correlation of E1B with E1A signal
(C) Abhilash Kannan- to be used only for
educational purposes