This document describes a project that aims to maximize the power of hypothesis tests in single gene experiments with multiple cell types under a cost constraint. It formulates the problem and proposes a sequential sampling procedure to estimate the optimal sample size allocation across cell types. The procedure begins with an initial sample from each cell type and sequentially adds more samples until the estimated optimal sample size is reached for each cell type, so that the total cost is approximately the given budget. Graphical and tabular results from simulations evaluating the procedure are also presented.
Btp 2017 presentation
1. Introduction Methodology Results Conclusions Future Work Contributions
B.Tech. Project
Attaining maximum power under cost constraints
in single gene experiments
P Jishnu Jaykumar (201352005)
Abhijeet Singh Panwar (201351005)
under the supervision of
Dr. Bhargab Chattopadhyay
Indian Institute of Information Technology,
Vadodara
May 5, 2017
P Jishnu Jaykumar Abhijeet Singh Panwar BTP-2017 1/27
Introduction
Single gene experiments are assays that measure the abundance of ribonucleic acid (RNA) species in individual cells or in pools of cells.
They are important in varied fields such as evolution, pathology and drug development.
However, these experiments are usually performed with a sample size pre-specified during experimental design, which may lead to underpowered or overpowered hypothesis tests.
This has led to a shift toward experimental design paradigms that use the minimal number of replicates needed to maximize power.
Several studies have analyzed the sample size required to maximize the power of hypothesis tests in RNA abundance experiments.
Introduction - Contd.
Recently, algorithms such as SCOTTY [BSM+13] have been proposed for the two-stage experimental design framework.
However, SCOTTY does not handle experiments with more than two cell types.
Moreover, two-stage procedures perform poorly if the pilot sample is unrepresentative.
The focus of this project is therefore to maximize, under a cost constraint, the power of a hypothesis test for a single gene measured in more than two cell types.
Some facts
In experimental research, a limited budget for the sampling process is allotted beforehand.
Thus, under a cost constraint, we have to carry out the test of equivalence or inferiority (or superiority) with minimum error.
We fix the probability of type-I error at α (0.05 in our case),
so the goal is to minimize the probability of type-II error or, equivalently, to maximize the power under the cost constraint.
Formulation
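Suppose a particular gene is measured in K cell types.
For the ith cell type, let $X_{i1}, \ldots, X_{in_i}$ be i.i.d. random variables, not necessarily normal, with mean $\mu_i$ and variance $\sigma_i^2$.
Consider $\delta = \mathbf{c}^\top \boldsymbol{\mu} = \sum_{i=1}^{K} c_i \mu_i$, where $\mathbf{c} = (c_1, \ldots, c_K)$ is known.
The estimator of δ is $\hat{\delta}_n = \sum_{i=1}^{K} c_i \bar{X}_{n_i}$, where $\bar{X}_{n_i}$ is the sample mean of the $n_i$ observations from the ith cell type.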
Formulation Contd.
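$\gamma_i$: sample size allocation ratios (unknown), with $n_i = \gamma_i n_1$ and $\gamma_1 = 1$.
$\mathrm{Var}(\hat{\delta}_n) = \sum_{i=1}^{K} c_i^2 \sigma_i^2 / n_i = \frac{1}{n_1} \sum_{i=1}^{K} c_i^2 \sigma_i^2 / \gamma_i$.
Using the CLT,
$$\frac{\sqrt{n_1}\,(\hat{\delta}_n - \delta)}{\sqrt{\sum_{i=1}^{K} c_i^2 \sigma_i^2 / \gamma_i}} \xrightarrow{D} N(0, 1).$$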
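As a minimal sketch, the estimator $\hat{\delta}_n$ and a plug-in estimate of its variance can be computed directly from the formulas above. The sketch assumes each cell type's observations are held in a plain Python list, and the function names `contrast_estimate` and `standardized` are hypothetical. Because $\sigma_i^2$ is unknown in practice, it substitutes the sample variance $S_i^2$; this substitution is an assumption of the example, not part of the formulation.

```python
from math import sqrt
from statistics import mean, variance


def contrast_estimate(samples, c):
    """Estimate delta = sum_i c_i * mu_i and Var(delta_hat).

    samples: one list of observations per cell type (hypothetical format).
    c: the known contrast coefficients c_1, ..., c_K.
    The unknown sigma_i^2 is replaced by the sample variance S_i^2.
    """
    if len(samples) != len(c):
        raise ValueError("need exactly one coefficient per cell type")
    # delta_hat = sum_i c_i * (sample mean of cell type i)
    delta_hat = sum(ci * mean(x) for ci, x in zip(c, samples))
    # Var(delta_hat) = sum_i c_i^2 * sigma_i^2 / n_i, with sigma_i^2 -> S_i^2
    var_hat = sum(ci ** 2 * variance(x) / len(x) for ci, x in zip(c, samples))
    return delta_hat, var_hat


def standardized(delta_hat, var_hat, delta):
    """CLT-based statistic, approximately N(0, 1) for large samples."""
    return (delta_hat - delta) / sqrt(var_hat)


# Illustrative data for K = 3 cell types with unequal sample sizes.
samples = [[1.0, 2.0, 3.0], [2.0, 4.0], [5.0, 5.0, 6.0, 8.0]]
c = [1.0, -0.5, -0.5]
delta_hat, var_hat = contrast_estimate(samples, c)
```

Writing the variance as $\frac{1}{n_1}\sum_i c_i^2 S_i^2/\gamma_i$ with $\gamma_i = n_i/n_1$ gives the same number, since the two forms are algebraically identical.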
Test for Equivalence
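Compare δ with two equivalence margins $\delta_1 < \delta_2$:
$H_0: \delta \le \delta_1$ or $\delta \ge \delta_2$ against $H_A: \delta_1 < \delta < \delta_2$.
For the test of equivalence, the null hypothesis $H_0$ is rejected at level α if
$$\delta_1 + z_\alpha \sqrt{\mathrm{Var}(\hat{\delta}_n)} < \hat{\delta}_n < \delta_2 - z_\alpha \sqrt{\mathrm{Var}(\hat{\delta}_n)},$$
where $z_\alpha$ is the upper α quantile of the standard normal distribution.
The approximate power function is
$$P_{TE}(\delta) = \Phi\left(-z_\alpha + \frac{\delta_2 - \delta}{\sqrt{\mathrm{Var}(\hat{\delta}_n)}}\right) - \Phi\left(z_\alpha - \frac{\delta - \delta_1}{\sqrt{\mathrm{Var}(\hat{\delta}_n)}}\right).$$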
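The equivalence rejection rule and its power function can be evaluated numerically. The sketch below is a hypothetical illustration (the names `z_upper`, `equivalence_rejects` and `equivalence_power` are not from the project's toolkit) that uses the standard-library `statistics.NormalDist` for Φ and its inverse, and treats $\mathrm{Var}(\hat{\delta}_n)$ as a given number. Clipping the power at zero is an added detail: when $\delta_2 - \delta_1 < 2 z_\alpha \sqrt{\mathrm{Var}(\hat{\delta}_n)}$ the rejection region is empty and the formula would otherwise turn negative.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is Phi, PHI.inv_cdf its inverse


def z_upper(alpha):
    """Upper-alpha quantile z_alpha of N(0, 1)."""
    return PHI.inv_cdf(1 - alpha)


def equivalence_rejects(delta_hat, delta1, delta2, var_hat, alpha=0.05):
    """Reject H0 iff delta1 + z*se < delta_hat < delta2 - z*se."""
    z, se = z_upper(alpha), sqrt(var_hat)
    return delta1 + z * se < delta_hat < delta2 - z * se


def equivalence_power(delta, delta1, delta2, var_hat, alpha=0.05):
    """Approximate power P_TE(delta) of the equivalence test."""
    z, se = z_upper(alpha), sqrt(var_hat)
    power = PHI.cdf(-z + (delta2 - delta) / se) - PHI.cdf(z - (delta - delta1) / se)
    # Empty rejection region (margins too narrow for se): power is 0.
    return max(power, 0.0)


for v in (0.01, 0.1, 0.2):
    print(v, round(equivalence_power(0.0, -1.0, 1.0, v), 3))
```

Shrinking $\mathrm{Var}(\hat{\delta}_n)$, which is what a larger or better-allocated sample does, pushes the power toward 1.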
Test for Inferiority
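Compare δ with a margin $\delta_0$:
$H_0: \delta \le -|\delta_0|$ against $H_A: \delta > -|\delta_0|$.
For the inferiority test, the null hypothesis $H_0$ is rejected at level α if
$$\hat{\delta}_n > -|\delta_0| + z_\alpha \sqrt{\mathrm{Var}(\hat{\delta}_n)}.$$
The approximate power function is
$$P_{TI}(\delta) = \Phi\left(-z_\alpha + \frac{\delta + |\delta_0|}{\sqrt{\mathrm{Var}(\hat{\delta}_n)}}\right).$$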
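A similar hypothetical sketch covers the inferiority test, again using `statistics.NormalDist` for Φ and treating $\mathrm{Var}(\hat{\delta}_n)$ as known; the function names are illustrative only. A useful sanity check is that at the boundary $\delta = -|\delta_0|$ the power reduces to $\Phi(-z_\alpha) = \alpha$, the size of the test.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def inferiority_rejects(delta_hat, delta0, var_hat, alpha=0.05):
    """Reject H0: delta <= -|delta0| iff delta_hat > -|delta0| + z_alpha * se."""
    z = PHI.inv_cdf(1 - alpha)
    return delta_hat > -abs(delta0) + z * sqrt(var_hat)


def inferiority_power(delta, delta0, var_hat, alpha=0.05):
    """Approximate power P_TI(delta) = Phi(-z_alpha + (delta + |delta0|) / se)."""
    z = PHI.inv_cdf(1 - alpha)
    return PHI.cdf(-z + (delta + abs(delta0)) / sqrt(var_hat))


# At the boundary delta = -|delta0| the power equals alpha (here 0.05).
print(inferiority_power(-0.5, 0.5, 0.04))
```

Only $|\delta_0|$ enters the test, so the sign given to the margin does not matter.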
Maximizing Power Under Cost Constraints
Suppose we have a fixed amount of money, say $A_0$, to carry out sampling.
If it costs $a_i$ to sample each observation from the ith cell type, then
$$A_0 = \sum_{i=1}^{K} a_i n_i \implies n_1 = \frac{A_0}{\sum_{i=1}^{K} a_i \gamma_i}.$$
To maximize power, the optimal allocation ratio is
$$\gamma_i = \frac{|c_i|\,\sigma_i}{|c_1|\,\sigma_1} \sqrt{\frac{a_1}{a_i}} \qquad (1)$$
The optimal sample size for the ith cell type is
$$n_{io} = \frac{A_0\,|c_i|\,\sigma_i}{\sum_{l=1}^{K} |c_l|\,\sigma_l \sqrt{a_l a_i}} \qquad (2)$$
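Equations 1 and 2 follow from minimizing $\mathrm{Var}(\hat{\delta}_n) = \sum_i c_i^2\sigma_i^2/n_i$ subject to $\sum_i a_i n_i = A_0$ with a Lagrange multiplier, which gives $n_i \propto |c_i|\sigma_i/\sqrt{a_i}$. The sketch below evaluates both equations under the assumption that the $\sigma_i$ are known; the function name `optimal_allocation` and the inputs are hypothetical, and the returned sizes are real-valued, so they would need rounding in practice.

```python
from math import sqrt


def optimal_allocation(A0, c, sigma, a):
    """Equations (1) and (2) with known sigma_i (requires c[0] != 0).

    Returns (gamma, n_opt): allocation ratios relative to cell type 1
    and real-valued optimal sample sizes.
    """
    K = len(c)
    if not (len(sigma) == len(a) == K):
        raise ValueError("c, sigma and a must all have length K")
    # Equation (1): gamma_i = |c_i| sigma_i / (|c_1| sigma_1) * sqrt(a_1 / a_i)
    gamma = [abs(c[i]) * sigma[i] / (abs(c[0]) * sigma[0]) * sqrt(a[0] / a[i])
             for i in range(K)]
    # Equation (2): n_io = A0 |c_i| sigma_i / sum_l |c_l| sigma_l sqrt(a_l a_i)
    n_opt = [A0 * abs(c[i]) * sigma[i]
             / sum(abs(c[l]) * sigma[l] * sqrt(a[l] * a[i]) for l in range(K))
             for i in range(K)]
    return gamma, n_opt


c, sigma, a = [1.0, -0.5, -0.5], [2.0, 1.0, 3.0], [4.0, 1.0, 9.0]
gamma, n_opt = optimal_allocation(1000.0, c, sigma, a)
```

For these illustrative inputs the sizes are about 111.1, 55.6 and 55.6, the ratios are 1, 0.5 and 0.5, and $\sum_i a_i n_{io}$ equals $A_0 = 1000$ exactly, so the whole budget is spent.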
Sequential Procedure
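Step 1: First obtain $N_{io} = m\ (\ge 2)$ observations from each cell type and, for the ith cell type, check
$$m \ge \frac{A_0\,|c_i|\,S_{iN_{io}}}{\sum_{l=1}^{K} |c_l|\,S_{lN_{lo}} \sqrt{a_l a_i}},$$
where $S_{iN_{io}}$ is the sample standard deviation of the $N_{io}$ observations from the ith cell type, used in place of the unknown $\sigma_i$.
If the condition is satisfied for the ith cell type, no further observations are obtained from that cell type. Otherwise, go to Step 2 after carrying out Step 1 for all cell types.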
Sequential Procedure Contd.
Step 2: Add m observations for the ith cell type, update $N_{io}$ accordingly ($N_{io} = m + m$ after the first addition) and check
$$N_{io} \ge \frac{A_0\,|c_i|\,S_{iN_{io}}}{\sum_{l=1}^{K} |c_l|\,S_{lN_{lo}} \sqrt{a_l a_i}}.$$
If the condition is satisfied for the ith cell type, no further sampling is needed for that cell type. Otherwise, repeat Step 2 until the condition is satisfied for all cell types.
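Steps 1 and 2 can be sketched as a loop. This is a hypothetical Python illustration, not the authors' R or Python code. Its assumptions: each cell type is represented by a zero-argument sampler function standing in for the assay, all sample standard deviations are recomputed once per round, a cell type that has met the stopping rule is never revisited, and `max_rounds` is an added safeguard against non-termination.

```python
import random
from math import sqrt
from statistics import stdev


def required_size(i, A0, c, s, a):
    """Right-hand side of the stopping rule for cell type i (sigma -> S)."""
    denom = sum(abs(c[l]) * s[l] * sqrt(a[l] * a[i]) for l in range(len(c)))
    if denom == 0:
        return float("inf")  # all S_l are 0: no information yet, keep sampling
    return A0 * abs(c[i]) * s[i] / denom


def sequential_sizes(samplers, A0, c, a, m=2, max_rounds=10_000):
    """Return the estimated optimal sample size N_io for each cell type.

    samplers: one zero-argument callable per cell type, each returning a
    fresh observation (hypothetical interface).
    """
    if m < 2:
        raise ValueError("m must be at least 2 so that S_i is defined")
    # Step 1: draw m observations from every cell type.
    data = [[draw() for _ in range(m)] for draw in samplers]
    active = set(range(len(samplers)))
    for _ in range(max_rounds):
        s = [stdev(x) for x in data]  # current S_i for every cell type
        # Keep only cell types that still violate N_io >= required size.
        active = {i for i in active if len(data[i]) < required_size(i, A0, c, s, a)}
        if not active:
            return [len(x) for x in data]
        # Step 2: add m observations to each unfinished cell type.
        for i in sorted(active):
            data[i].extend(samplers[i]() for _ in range(m))
    raise RuntimeError("stopping rule not met within max_rounds")


rng = random.Random(2017)
samplers = [lambda: rng.gauss(10, 2), lambda: rng.gauss(5, 1), lambda: rng.gauss(7, 3)]
sizes = sequential_sizes(samplers, A0=1000.0, c=[1.0, -0.5, -0.5], a=[4.0, 1.0, 9.0])
```

With these illustrative normal populations ($\sigma = 2, 1, 3$), Equation 2 gives theoretical sizes of about 111.1, 55.6 and 55.6, which the returned sizes (always multiples of m) should approximate. A small m makes it possible for a cell type to stop early if its first few observations happen to be close together.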
Sequential Procedure and Characteristics
Using the stopping rule, one can estimate the optimal sample size: $N_{io}$ is the smallest integer $n_{io}$ such that
$$n_{io} \ge \frac{A_0\,|c_i|\,S_{in_{io}}}{\sum_{l=1}^{K} |c_l|\,S_{ln_{lo}} \sqrt{a_l a_i}} \qquad (3)$$
The total cost of sampling the $\sum_{i=1}^{K} N_{io}$ observations, where $N_{io}$ is the estimated final optimal sample size for the ith cell type computed using Equation 3, is approximately $A_0$.
Contd.
The optimal allocation ratio defined in Equation 1, which minimizes the variance of $\hat{\delta}_n$, depends on the population variances of the K cell types. In practice, the population variance of each cell type is unknown, so the sample size required to attain maximum power cannot be computed directly; the sequential procedure therefore replaces each $\sigma_i$ with its sample estimate.
Graphical Representation
Flowchart describing the developed sequential procedure.
Graph
Figure: Accuracy for all the distributions considered in the simulation study.
About the toolkit
A toolkit (web application) for the sequential procedure was also developed.
Input: an .xlsx file containing data in a predefined format.
Output: optimal sample sizes for each cell type specified in the input file.
Intermediate stages of the procedure can also be viewed.
It is temporarily hosted at http://139.59.74.69/alpha_version/
Conclusions
Using the sequential procedure, we estimated the optimal sample sizes of the cell types required to attain maximum power under cost constraints.
The simulation study showed that, on average, the optimal sample size estimated by our procedure is close to the theoretical sample size in most situations, except for the mixture of normal, Weibull and chi distributions.
The developed procedure does not assume any particular distribution for the data.
Conclusions Contd.
SCOTTY is limited to the normal distribution and hence cannot handle datasets that follow other distributions.
This gives our procedure an advantage over SCOTTY.
We have developed a toolkit for estimating the optimal sample sizes of different cell types for a single gene and for drawing inferences from the hypothesis test.
Future Work
The project consisted of a simulation study and toolkit development for the sequential procedure with a single gene having multiple cell types.
Future work includes developing a procedure for a set of genes with multiple cell types.
In addition, the toolkit will be extended with more features and adapted to the set-of-genes scenario.
The final outcome would be a full-featured web and desktop application.
Contributions
Abhijeet
Writing the R code for the sequential procedure.
Simulation study using:
  Gamma distribution
  Weibull distribution
  Normal-Exponential distribution
  Chi distribution
  Chi-square (non-central) distribution
  Logistic distribution
Jishnu
R code review and bug fixing.
Translating the R code to Python.
Simulation study using:
  Normal distribution
  Mixture-normal distributions
  Mixture of Normal + Weibull + Chi distributions
Toolkit development.
References I
[BSM+13] Michele A Busby, Chip Stewart, Chase A Miller, Krzysztof R Grzeda, and Gabor T Marth, Scotty: a web tool for designing RNA-seq experiments to measure differential gene expression, Bioinformatics 29 (2013), no. 5, 656–657.
[GCL11] Jiin-Huarng Guo, Hubert J Chen, and Wei-Ming Luh, Sample size planning with the cost constraint for testing superiority and equivalence of two independent groups, British Journal of Mathematical and Statistical Psychology 64 (2011), no. 3, 439–461.
References II
[GS91] Bhaskar Kumar Ghosh and Pranab Kumar Sen, Handbook of sequential analysis, CRC Press, 1991.
[JT99] Christopher Jennison and Bruce W Turnbull, Group sequential methods with applications to clinical trials, CRC Press, 1999.
[LG16] Wei-Ming Luh and Jiin-Huarng Guo, Sample size planning for the noninferiority or equivalence of a linear contrast with cost considerations, Psychological Methods 21 (2016), no. 1, 13.
[Lip90] Mark W Lipsey, Design sensitivity: Statistical power for experimental research, vol. 19, Sage, 1990.
References III
[LR06] Erich L Lehmann and Joseph P Romano, Testing statistical hypotheses, Springer Science & Business Media, 2006.
[MDS08] Nitis Mukhopadhyay and Basil M De Silva, Sequential methods and their applications, CRC Press, 2008.
[SS81] Pranab Kumar Sen, Sequential nonparametrics: invariance principles and statistical inference, Wiley, New York, 1981.
Any Questions?
Statistics, likelihoods, and probabilities
mean everything to men, nothing to God.
- Richelle E. Goodrich
Thank You.