Dealing with the statistics of Large Data
(C) Abhilash Kannan - to be used only for educational purposes
E1A dataset
• Adapted from an example by Maarit Suomalainen.
• E1A cytoplasmic intensity was measured in A549 cells infected with varying amounts of
wild-type Ad5 (0.031ul, 0.0625ul, 0.125ul, 0.25ul).
• Cells were infected for varying lengths of time (11hrs, 7hrs, and 4hrs).
Aim:
• We want to detect association between the Time of infection and the cytoplasmic E1A
intensity.
• The values show the E1A cytoplasmic intensity (intensity of the E1A signal obtained from
infections with different concentrations of wt Ad5) by the time of infection (11hrs and 7hrs,
7hrs and 4hrs).
• $H_0: \mu_{BB} = \mu_{BA}$ (null hypothesis, i.e. no difference between 11hrs and 7hrs infection)
• $H_A: \mu_{BB} \neq \mu_{BA}$ (alternative hypothesis, i.e. there is a significant difference between 11hrs and 7hrs
infection with respect to the E1A cytoplasmic intensity)
Why E1A?
• The first viral gene to be transcribed is early region 1A (E1A).
• The 13S and 12S mRNAs are the most abundant at early times during
infection.
• The 9S mRNA is the most abundant at late times.
• The 11S and 10S mRNA are minor species that become more abundant
at late times after infection.
Mean of 11hrs infection = 0.0178718
Mean of 7hrs infection = 0.01049993
Difference (7hrs − 11hrs) = -0.007371876
Distribution	of	the	data - Nonparametric
• Consider a typical set of observations from the quantigene data (first 9 values for E1A from 11hrs and 7hrs
infection at 0.031ul of Ad5wt virus):
Pair        1         2         3         4         5         6         7         8         9
11hrs       0.016735  0.017585  0.031259  0.011706  0.024269  0.016424  0.01321   0.003255  0.003796
7hrs        0.006039  0.005799  0.003534  0.003393  0.008359  0.003465  0.013854  0.012815  0.031331
difference  0.010696  0.011786  0.027725  0.008313  0.01591   0.012959  -0.00064  -0.00956  -0.02753
• If the time of the virus infection (11hrs or 7hrs) has made no difference, then an outcome of 0.016735
for the 11hrs and 0.006039 for the 7hrs treatment might equally well have been 0.006039 for the 11hrs
and 0.016735 for the 7hrs:
Pair        1          2         3         4         5         6         7         8         9
11hrs       0.006039   0.017585  0.031259  0.011706  0.024269  0.016424  0.01321   0.003255  0.003796
7hrs        0.016735   0.005799  0.003534  0.003393  0.008359  0.003465  0.013854  0.012815  0.031331
difference  -0.010696  0.011786  0.027725  0.008313  0.01591   0.012959  -0.00064  -0.00956  -0.02753
• A difference of 0.010696 becomes a difference of −0.010696.
• There would be 2^9 = 512 permutations (combinations of signs), and a mean difference associated with each
permutation.
• We then locate the mean difference for the data that we observed within this permutation distribution.
• The p-value is the proportion of permuted mean differences that are as large in absolute value as, or larger than,
the mean difference for the observed data.
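The label swap described above can be sketched in a few lines of Python. This is a minimal illustration using the nine pairs from the table; the variable and function names are assumptions made for the example and do not come from the original analysis.

```python
from statistics import mean

# First nine E1A values per condition, copied from the table above.
e1a_11hrs = [0.016735, 0.017585, 0.031259, 0.011706, 0.024269,
             0.016424, 0.01321, 0.003255, 0.003796]
e1a_7hrs = [0.006039, 0.005799, 0.003534, 0.003393, 0.008359,
            0.003465, 0.013854, 0.012815, 0.031331]


def pair_differences(a, b):
    """Element-wise difference a[i] - b[i] for paired observations."""
    return [x - y for x, y in zip(a, b)]


observed = pair_differences(e1a_11hrs, e1a_7hrs)
observed_mean = mean(observed)

# Swap the labels of the first pair only, as in the second table.
swapped_11hrs = [e1a_7hrs[0]] + e1a_11hrs[1:]
swapped_7hrs = [e1a_11hrs[0]] + e1a_7hrs[1:]
swapped = pair_differences(swapped_11hrs, swapped_7hrs)
swapped_mean = mean(swapped)

# The first difference changes sign; every other difference is unchanged.
print(round(observed[0], 6), round(swapped[0], 6))
print(round(observed_mean, 6), round(swapped_mean, 6))
```

Swapping one pair changes only the sign of that pair's difference, which is why the permutation distribution can be built from sign assignments alone.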
Types of Non-parametric Tests

Parametric test: Student's t test for two independent (unpaired) samples
  Assumptions of the parametric test:
    1) data from both samples are randomly selected
    2) data from both samples come from normally distributed populations
    3) homogeneity of variance (variances are equal)
  Non-parametric alternatives: resampling methods (permutation and bootstrapping analysis)

Parametric test: Student's t test for two dependent (paired) samples
  Assumptions of the parametric test:
    1) the differences (di) must come from a normally distributed population of differences
  Non-parametric alternatives: Wilcoxon signed rank (paired samples or matched pairs) test

Parametric test: ANOVA
  Assumptions of the parametric test:
    1) data from all samples are randomly selected
    2) data from all samples come from normally distributed populations
    3) homogeneity of variance (variances are equal)
  Non-parametric alternatives: Kruskal-Wallis H test

Parametric test: Pearson Product Moment Correlation Coefficient Analysis
  Assumptions of the parametric test:
    1) Y data for each X must be randomly selected from a normal distribution of Y values
    2) X data for each Y must be randomly selected from a normal distribution of X values
  Non-parametric alternatives: Spearman Rank Correlation; Kendall's rank Correlation Coefficient Analysis
Combinations
Actual difference (mean of the observed differences): 0.005516592

Differences (one row per sign combination; first row is the actual sample)                        Mean of the difference
-0.027535  -0.00956   -0.00064   0.008313   0.010696   0.011786   0.012959   0.01591    0.027725   0.005516592
0.0275346  -0.00956   -0.00064   0.008313   0.010696   0.011786   0.012959   0.01591    0.027725   0.011635389
-0.027535  0.00956    -0.00064   0.008313   0.010696   0.011786   0.012959   0.01591    0.027725   0.007641054
-0.027535  -0.00956   0.000644   0.008313   0.010696   0.011786   0.012959   0.01591    0.027725   0.005659747
-0.027535  0.00956    0.000644   0.008313   0.010696   0.011786   0.012959   0.01591    0.027725   0.007784208
0.0275346  0.00956    -0.00064   0.008313   0.010696   0.011786   0.012959   0.01591    0.027725   0.01375985
0.0275346  -0.00956   0.000644   0.008313   0.010696   0.011786   0.012959   0.01591    0.027725   0.011778544
0.0275346  0.00956    0.000644   0.008313   0.010696   0.011786   0.012959   0.01591    0.027725   0.013903005
0.0275346  0.00956    0.000644   0.008313   0.010696   0.011786   0.012959   0.01591    -0.02772   0.007741998
0.0275346  0.00956    0.000644   -0.00831   0.010696   0.011786   0.012959   0.01591    -0.02772   0.00589461
0.0275346  0.00956    0.000644   0.008313   0.010696   0.011786   0.012959   -0.01591   0.027725   0.010367499
0.0275346  0.00956    0.000644   0.008313   0.010696   0.011786   -0.01296   -0.01591   0.027725   0.007487782
0.0275346  0.00956    0.000644   0.008313   0.010696   -0.01179   -0.01296   0.01591    0.027725   0.00840416
0.0275346  0.00956    0.000644   0.008313   -0.0107    -0.01179   -0.01296   0.01591    0.027725   0.006027309
0.0275346  0.00956    0.000644   -0.00831   -0.0107    -0.01179   0.012959   0.01591    0.027725   0.007059638
0.0275346  0.00956    -0.00064   -0.00831   -0.0107    -0.01179   0.012959   0.01591    0.027725   0.006916483
0.0275346  0.00956    0.000644   -0.00831   -0.0107    0.011786   0.012959   0.01591    0.027725   0.009678765
• In the permutation distribution, each difference has an equal probability of taking a positive or a negative sign.
• There are 2^n possibilities (n is the sample size), and hence 2^9 = 512 different values for the mean difference $\bar{d}$.
• We have a total of 57 possible combinations that give a mean difference that is as large as or larger than in
the actual sample, where the value for pair 8 has a negative sign:
Difference 0.010696 0.011786 0.027725 0.008313 0.01591 0.012959 -0.00064 -0.00956 -0.02753
• There are another 57 possibilities that give a mean difference of the same absolute value or larger, but negative.
Hence p = 114/512 = 0.22.
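With only nine pairs, all 512 sign assignments can be enumerated directly, so the count and p-value quoted above can be checked. The sketch below is one way to do that in Python; the function name and the small tolerance `tol` are assumptions made for the example, and the two smallest negative differences are written with the extra digit shown in the combinations table.

```python
from itertools import product

# Observed pair differences (11hrs minus 7hrs) for the nine pairs.
diffs = [0.010696, 0.011786, 0.027725, 0.008313, 0.01591,
         0.012959, -0.000644, -0.00956, -0.027535]


def exact_sign_flip_test(diffs, tol=1e-12):
    """Enumerate every sign assignment and count the extreme mean differences.

    `tol` (an assumption) guards against floating-point ties with the
    observed mean.
    """
    n = len(diffs)
    observed = sum(diffs) / n
    magnitudes = [abs(d) for d in diffs]
    n_extreme = 0
    n_total = 0
    for signs in product((1, -1), repeat=n):
        permuted_mean = sum(s * a for s, a in zip(signs, magnitudes)) / n
        n_total += 1
        # Two-sided: compare absolute values.
        if abs(permuted_mean) >= abs(observed) - tol:
            n_extreme += 1
    return observed, n_extreme, n_total, n_extreme / n_total


observed_mean, n_extreme, n_total, p_value = exact_sign_flip_test(diffs)
print(f"{n_extreme} of {n_total} sign assignments are at least as extreme, "
      f"p = {p_value:.4f}")
```

Because every sign assignment has a mirror image with the opposite sign, the two-sided count is always even.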
Reality
• In our data we have a total of 5839 complete observations for 11hrs infection and 6776 complete observations for 7hrs infection.
• When the number of observations is this large, it is not feasible to use such an enumeration approach to get information on the relevant parts of the upper and lower tails of the distribution.
• Full enumeration is computationally expensive.
• We therefore take repeated random samples from the permutation distribution.
• Using a larger number of random samples will of course lead to more accurate p-values.
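The two points above can be made concrete with a rough calculation. The sketch below counts how many ways the pooled observations could be split into groups of the original sizes. It also uses the usual binomial approximation for the standard error of a p-value estimated from randomly sampled permutations. The function name and the example p-value of 0.05 are assumptions chosen for illustration.

```python
import math

n_11hrs = 5839
n_7hrs = 6776

# Number of distinct ways to assign the pooled values to the two groups.
n_splits = math.comb(n_11hrs + n_7hrs, n_11hrs)
print("digits in the number of possible splits:", len(str(n_splits)))


def monte_carlo_se(p, n_permutations):
    """Approximate standard error of a p-value estimated from
    `n_permutations` random permutations (binomial approximation)."""
    return math.sqrt(p * (1.0 - p) / n_permutations)


# The error shrinks with the square root of the number of permutations.
for n_permutations in (100, 1_000, 10_000):
    print(n_permutations, round(monte_carlo_se(0.05, n_permutations), 5))
```

Under this approximation, a tenfold increase in the number of random permutations reduces the error of the estimated p-value by a factor of about three.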
Steps
• Compute the difference between the means of the two treatments → observed difference
Mean(E1A_11hrs) – Mean(E1A_7hrs)
• Combine the two conditions into one dataset (to break the association → H0)
Pool the data together
• Repeat the following two steps a large number of times (e.g., 1,000 permutations):
  • Draw two samples of the original sizes from the combined dataset without replacement
  • Compute the difference between the means of the two sampled (i.e., permuted) datasets
• Compute the fraction of permutations in which the absolute permuted difference ≥ the absolute observed
difference, out of the total number of permutations → p-value
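The steps above can be sketched as a small Python function. This is an illustration only, not the code used for the original analysis. The function name, the fixed seed, and the use of the nine-value subsets from the earlier table (treated here as two unpaired samples) are all assumptions.

```python
import random
from statistics import mean


def permutation_test(sample_a, sample_b, n_permutations=1000, seed=0):
    """Two-sided permutation test for a difference between two means."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    observed = mean(sample_a) - mean(sample_b)

    # Pool the data: under H0 the group labels carry no information.
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)

    permuted_diffs = []
    for _ in range(n_permutations):
        # Shuffling and splitting samples both groups without replacement.
        rng.shuffle(pooled)
        permuted_diffs.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))

    n_extreme = sum(abs(d) >= abs(observed) for d in permuted_diffs)
    return observed, permuted_diffs, n_extreme / n_permutations


# Illustrative input: the first nine values per condition from the slides.
e1a_11hrs = [0.016735, 0.017585, 0.031259, 0.011706, 0.024269,
             0.016424, 0.01321, 0.003255, 0.003796]
e1a_7hrs = [0.006039, 0.005799, 0.003534, 0.003393, 0.008359,
            0.003465, 0.013854, 0.012815, 0.031331]

diff_observed, diff_permuted, p_value = permutation_test(e1a_11hrs, e1a_7hrs)
print(round(diff_observed, 6), p_value)
```

Shuffling the pooled list and cutting it at the original group size is equivalent to drawing the two samples without replacement.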
Permutation
Here we combine the two samples (11hrs and 7hrs treatment) into a single dataset, since under the null
hypothesis there is no difference between the two groups.
The new dataset holds the permuted means after each iteration: with 10000 iterations, one would have 10000 means
of permuted samples for each condition.
Means of Permuted Samples
• The observed means for 11hrs infection and 7hrs infection were:
Mean of 11hrs infection = 0.0178718
Mean of 7hrs infection = 0.01049993
• Note that the mean difference between the permuted samples (-0.00000043) is very small compared with the
observed difference (-0.00737187).
• We can check the distribution of means of the permuted samples:
Difference between Means of Permuted Samples to calculate the confidence intervals
• We can now set the confidence intervals for the above differences.
• We want to test the statistical significance of the difference between two conditions (11hrs
vs 7hrs) for a particular amount of virus (0.031ul/well).
• We can set the level of significance to 5%. This means that, if the null hypothesis is true, there is
a 5% (0.05) chance of observing a difference extreme enough to reject it. The interval is therefore
chosen to contain the central 95% of the permuted differences.
[Figure: Permuted Differences; x-axis: Differences]
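One way to obtain such an interval is to take the 2.5th and 97.5th percentiles of the permuted differences and check whether the observed difference lies outside them. The sketch below assumes that percentile approach, which the slides do not spell out. It again uses the nine-value subsets as illustrative unpaired samples, and the function names and seed are hypothetical.

```python
import random
from statistics import mean, quantiles


def central_interval(values):
    """Return the 2.5th and 97.5th percentiles of `values`."""
    cut_points = quantiles(values, n=40)  # 39 cut points, one every 2.5%
    return cut_points[0], cut_points[-1]


def is_outside(value, interval):
    """True if `value` lies outside the closed interval."""
    lower, upper = interval
    return value < lower or value > upper


# Illustrative input: the first nine values per condition from the slides.
e1a_11hrs = [0.016735, 0.017585, 0.031259, 0.011706, 0.024269,
             0.016424, 0.01321, 0.003255, 0.003796]
e1a_7hrs = [0.006039, 0.005799, 0.003534, 0.003393, 0.008359,
            0.003465, 0.013854, 0.012815, 0.031331]

rng = random.Random(1)  # fixed seed (an assumption) for reproducibility
pooled = e1a_11hrs + e1a_7hrs
permuted_diffs = []
for _ in range(2000):
    rng.shuffle(pooled)
    permuted_diffs.append(mean(pooled[:9]) - mean(pooled[9:]))

interval = central_interval(permuted_diffs)
observed_diff = mean(e1a_11hrs) - mean(e1a_7hrs)
print(interval, is_outside(observed_diff, interval))
```

An observed difference outside this interval corresponds to a two-sided permutation p-value of roughly 0.05 or less.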
Computing the p-value
• The p-value can be calculated from the distribution of mean differences between the 11hrs and 7hrs treatments in the
permuted samples.
• Since we have already calculated the confidence interval for these differences, we can check whether the
difference between the two conditions in our observed samples falls within this computed
confidence interval.
[Figure: Permuted Differences; x-axis: Differences, y-axis: density]
• The difference is clearly significant.
• It can be clearly seen that the observed difference does not overlap with the confidence interval (the two solid red
lines) of the permuted differences.
• Thus the number of times the absolute permuted difference ≥ the absolute observed difference, out of the total
number of permutations, gives the final p-value:
pvalue = sum(abs(diff_permuted) >= abs(diff_observed)) / permutations
Based on the p-value, the mean cytoplasmic E1A values are significantly different
between the cells infected for 11hrs and 7hrs with 0.031ul Ad5wt/well.
Nevertheless, p-values should not be 0 according to this
paper:
“Unless the dataset is very small (less than about 20-30 total
numbers, typically) or the test statistic has a particularly nice
mathematical form, is not practicable to generate all the
permutations. Therefore computer implementations of
permutation tests typically sample from the permutation
distribution. They do so by generating some independent
random permutations and hope that the results are a
representative sample of all the permutations”.
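A commonly used way to avoid reporting p = 0 from sampled permutations is to count the observed arrangement as one of the permutations, which adds 1 to both the numerator and the denominator. The sketch below shows that adjustment. It is offered as an assumption about what such a correction could look like, not as the method of the paper quoted above, and the function name is hypothetical.

```python
def adjusted_permutation_p_value(n_extreme, n_permutations):
    """p-value estimate that counts the observed data as one permutation.

    n_extreme: number of sampled permutations at least as extreme as the
    observed statistic; n_permutations: number of sampled permutations.
    """
    return (n_extreme + 1) / (n_permutations + 1)


# Even if no sampled permutation is as extreme as the observed difference,
# the estimate stays strictly positive.
print(adjusted_permutation_p_value(0, 10_000))
```

With this adjustment the smallest p-value that can be reported is 1 / (number of permutations + 1), so it also shows how many permutations are needed to resolve a given significance level.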
Examples where Permutation is not able to detect the differences
Based on the Permutation resampling analysis:
1. Significant difference in cytoplasmic E1A intensities between 11hrs, 7hrs and 4hrs infection with different
virus concentrations (0.031ul, 0.0625ul, 0.125ul and 0.25ul/well)
2. Some of the technical replicates also show significant differences
Example (S = significant, NS = not significant):
0.031ul virus (4hrs)
E09 vs E10 - NS
E09 vs E11 - NS
E10 vs E11 - S
0.0625ul virus (11hrs)
D03 vs D04 - NS
D04 vs D05 - NS
D03 vs D05 - S
0.125ul virus (11hrs)
C03 vs C04 - S
C04 vs C05 - NS
C03 vs C05 - S
Correlation of E1B with E1A signal

