Erin Shellman
DAML, July 27, 2016
Catching the most with
high-throughput screening
Zymergen provides
a platform for the
rapid improvement
of microbial strains
through genetic
engineering.
http://www.yourgenome.org/facts/what-is-genetic-engineering
What if you don’t know which
gene to perturb?
Reduce the system to its
constituent parts and
experiment on each part, one
at a time, until you’ve described
the causal mechanism.
…then publish a paper.
Try thousands of
things and see
what works.
High-throughput
screening (HTS) is a
process for evaluating
many simultaneous
hypotheses.
Tier 1 Screen
Screening goal: Minimize false negatives
Tier 2 Screen
Screening goal: Minimize false positives
Tank validation
Screening goal: Confirm that our best tier 2 strains perform well in the tank
HTS poses two
unique challenges:
1. low sample size
2. high variance
Small sample size
This is largely intentional. We
know most things don’t work, so
why waste resources on a
gamble?
High variance
Experimental
complexity creates
opportunities for
injection of bias and
unwanted variance.
😓
• Many classical statistical methods
assume normality and common
variance—which we can’t assume.
• We need to be especially thoughtful in
designing our tests.
Real, unknown distributions
15% difference in means
When we take
measurements, we get
little bits of information
about what these shapes
might be.
But we don’t always arrive
at the right answer.
Base strain: 0.86, 0.98, 1.20 (mean = 1.01)
Candidate: 0.55, 0.87, 1.02 (mean = 0.81)
p-value = 0.31
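The numbers on this slide can be checked with a two-sample t-test. This is a minimal sketch, assuming SciPy's `ttest_ind` in its default pooled-variance (Student) form; the slides do not say which t-test variant was used, so treat that choice as an assumption.

```python
import numpy as np
from scipy import stats

# Measurements from the slide: three replicates per strain.
base_strain = np.array([0.86, 0.98, 1.20])
candidate = np.array([0.55, 0.87, 1.02])

# Two-sided Student's t-test; equal variances are assumed here (an assumption).
t_stat, p_value = stats.ttest_ind(base_strain, candidate)

print(f"mean base strain = {base_strain.mean():.2f}")
print(f"mean candidate   = {candidate.mean():.2f}")
print(f"p-value          = {p_value:.2f}")
```

With only three replicates per strain, a gap of about 0.2 between the sample means is not enough to reject the hypothesis of equal means at common thresholds.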
How many measurements
should I take so I can sleep
knowing that I’ve made the
best possible promotion
decisions?
It depends.
What is the expected effect size, i.e. how different
do you think the strains are?
5% difference in means 50% difference in means
How variable are the measurement values?
Power analysis
• Power analysis is a method for estimating the
sample size required to detect changes at
assumed levels.
• Power is the probability of detecting a difference,
when a difference is present.
• We compute it through simulation.
Power is a fixed parameter
The power threshold is set at 0.80,
meaning that if a real difference in means
exists and we run the same experiment
100 times, we expect to detect it in at
least 80 of those runs on average.
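Estimating power by simulation can be sketched as follows. This is an illustration under stated assumptions, not the code behind the talk: the function names, the normal data model, the significance level of 0.05, and the example means and standard deviation are all hypothetical.

```python
import numpy as np
from scipy import stats


def simulated_power(n, mu_base, mu_candidate, sigma, alpha=0.05,
                    n_sims=5000, seed=0):
    """Fraction of simulated experiments in which a t-test rejects at alpha."""
    rng = np.random.default_rng(seed)
    # One row per simulated experiment, n replicates per strain.
    base = rng.normal(mu_base, sigma, size=(n_sims, n))
    cand = rng.normal(mu_candidate, sigma, size=(n_sims, n))
    # Run all t-tests at once, one per row.
    _, p_values = stats.ttest_ind(base, cand, axis=1)
    return float(np.mean(p_values <= alpha))


def smallest_n(mu_base, mu_candidate, sigma, target_power=0.80,
               n_values=range(3, 11)):
    """Smallest replicate count that reaches the target power, or None."""
    for n in n_values:
        if simulated_power(n, mu_base, mu_candidate, sigma) >= target_power:
            return n
    return None


# Example values are made up for illustration.
for n in (3, 6, 10):
    print(n, simulated_power(n, mu_base=1.0, mu_candidate=1.5, sigma=0.3))
print("smallest N:", smallest_n(1.0, 1.5, 0.3))
```

Fixing the target power at 0.80 turns the question "how many measurements?" into a search over N for a given effect size and variance.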
Simulation study design
Tests: t-test, sum rank
Conditions: contamination, no contamination
t-test
• Parametric test, i.e. assumes that the data are normally distributed
• Sensitive to extreme values
sum rank
• Non-parametric test, i.e. makes no distributional assumptions
• Less sensitive to extreme values
Initialize Strains: μ_basestrain, σ_basestrain, μ_mutant, σ_mutant
Initialize Campaign: basestrain, mutants, N, contamination rate, test
e.g.
N = range(3, 11)
mu_ref = 0.80
sigma_ref = 0.30
mu_range = np.arange(0.80, 1.70, 0.10)
sigma_range = np.arange(0.05, 2, 0.10)
Simulate Data
e.g.
reference.get_observations(3)
Out: array([0.96, 1.00, 1.28])
mutant.get_observations(3)
Out: array([1.98, 1.60, 1.70])
Test for differences in means
Repeat × 5000
power = # times diff detected / 5000
A candidate strain would
have to show about 40%
improvement to be
detectable with 3 replicates
A candidate strain would
have to show about 15%
improvement to be
detectable with 10 replicates
No contamination
5% contamination
Results
• The presence of extreme values undermines
our ability to detect differences by effectively
decreasing N.
• We can make progress in the face of
extreme values by using non-parametric tests,
like sum rank, that perform equally well in
ideal conditions and better than the t-test in
typical conditions.
Adaptive experimental design?
[Figure: strain performance over project time]
1. ZERO TO MILLIGRAMS
Hits are big enough to detect with
low N.
2. MILLIGRAMS TO KILOGRAMS
Hit sizes shrink and become more
variable as the low-hanging fruit dries up.
3. KILOGRAMS TO COMMODITY
Hit sizes are at their smallest as we
approach the theoretical max.
Tier 1 Screen
Screening goal: Minimize false negatives
Operational implications: many hypotheses, low N, low promotion threshold
Tier 2 Screen
Screening goal: Minimize false positives
Operational implications: fewer hypotheses, bigger N, higher promotion threshold
Tank validation
Screening goal: Confirm that our best tier 2 strains perform well in the tank
Hypothetical design
Zero to Milligrams
Tier 1: 4 replicates, p-value <= 0.10
Tier 2: 8 replicates, p-value <= 0.05
Milligrams to Kilograms
Tier 1: 6 replicates, p-value <= 0.10
Tier 2: 12 replicates, p-value <= 0.05
Kilograms to Commodity
Tier 1: 8 replicates, p-value <= 0.10
Tier 2: 16 replicates, p-value <= 0.05
Tank (all phases): Tank is truth. ☝
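A design like this could be encoded as a small lookup table that drives promotion decisions. The replicate counts and p-value thresholds below come from the slide; the dictionary layout, the key names, and the `should_promote` function are hypothetical and only meant as a sketch.

```python
# Phase -> tier -> (replicates required, p-value threshold for promotion).
DESIGN = {
    "zero_to_milligrams": {"tier1": (4, 0.10), "tier2": (8, 0.05)},
    "milligrams_to_kilograms": {"tier1": (6, 0.10), "tier2": (12, 0.05)},
    "kilograms_to_commodity": {"tier1": (8, 0.10), "tier2": (16, 0.05)},
}


def should_promote(phase, tier, n_replicates, p_value):
    """Return True if a strain clears the promotion threshold for its tier."""
    required_n, threshold = DESIGN[phase][tier]
    if n_replicates < required_n:
        raise ValueError(
            f"{phase}/{tier} needs {required_n} replicates, got {n_replicates}")
    return p_value <= threshold


# The same p-value passes the lenient tier 1 screen but not tier 2.
print(should_promote("zero_to_milligrams", "tier1", 4, 0.08))
print(should_promote("zero_to_milligrams", "tier2", 8, 0.08))
```

Keeping the thresholds in data rather than in code makes it easy to adjust replicate counts as a project moves from one phase to the next.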
Thanks for listening!
Questions?

