SlideShare a Scribd company logo
Making your science powerful: an introduction to
NGS experimental design
7th May CSCR seminar
Dr Jelena Aleksic
The purpose of this talk
Bioinformaticians often get asked about experimental design, as
we have experience working with the data.
Good experimental design is absolutely crucial for NGS
applications, because of the size of the datasets.
The purpose of this talk is to introduce the basic concepts of
good NGS experimental design.
NGS Experimental Design
• Importance
• Specificity and power
• Accuracy, precision and bias
• Considerations for sample collection
• Considerations for sequencing
• Conclusions
Without a valid design, valid scientific conclusions cannot be drawn. Also
particularly important for NGS because of the size of the datasets.
Ronald A. Fisher
(1890-1962)
“To consult the statistician after an experiment is
finished is often merely to ask them to conduct a
post mortem examination. They can perhaps say what
the experiment died of.” (1938)
Good to think about early!
Value of Planning
Rushing into experiments without thoughtful planning invites
failure.
“Seventy percent of whether your experiment will
work is determined before you touch the first test
tube”
Tung-Tien Sun (2004).
Excessive trust in authorities and its influence on experimental design.
Nature Reviews Molecular Cell Biology
Still a serious issue in the field
- Almost 70% of all the human RNA-seq samples in GEO do not
have biological replicates (Feng et al. 2012)
- More unreplicated RNA-seq data were published than
replicated RNA-seq data in 2011 (Feng et al. 2012)
- ENCODE guidelines recommend two ChIP-seq biological
replicates (Landt et al. 2012), but this is controversial and
arguably too few
• Importance
• Specificity and power
• Accuracy, precision and bias
• Considerations for sample collection
• Considerations for sequencing
• Conclusions
NGS Experimental Design
Important concepts
• Specificity: the proportion of results that are genuine ones,
as opposed to false positives
• Power (sensitivity): the proportion of genuine results that are
successfully identified by the experiment
e.g. At FDR 1%, 1 in 100 results will be a false positive.
e.g. At 20% power, the experiment will successfully identify
200 out of 1000 genuine differentially expressed genes at a
specified fold-change level.
Adequately Powered
The power (sensitivity) of an experiment is the
probability that it can detect an effect, if it is present.
•Power is often overlooked.
•A probability - any value between 0% and 100%.
•Achieved by:
–Using appropriate numbers of animals / samples
(sample size)
–Controlling sources of variation
If you increase the variability when you increase the size then it
won't necessarily have more power.
Adequately Powered
•Too many samples:
–wastes resources e.g. animals
(unethical), money, time and
effort.
•Too few samples:
–may lack power and miss a
scientifically important effect. Power(%)
Sample Size per group (n)
How much power is enough?
– Depends on the experiment purpose.
– To just get the top few targets = not much.
– To get a comprehensive list of targets = a lot.
– Rare transcripts / genomic variants = a lot.
– Also depends on variability of samples.
– In vitro studies = less variability
– In vivo = more variability, mixture of tissues
– Cancer = hugely variable, need a lot of power
How do I tell?
• It’s possible to calculate the required power of an experiment.
For RNA-seq, there’s even an online tool (and there are
various general calculators):
• Alternatively, use rules of thumb based on other people’s
systematic reviews of a particular experiment type.
How many replicates?
4 mice…
No, it should be
12 mice…
Look into my crystal ball……
The sensitivity, ability or power to detect changes
depends on the sample size.
Depends on the resources, the goals of the study, and the
reliability of the technology:
–How much money do you have?
–Can you handle all these samples without problem?
–What size of differences (effect size) to detect?
–It’s an accurate representation of the population?
–Large enough to achieve meaningful results?
How many replicates?
What is the minimum number
of replicates?
Experiments with only one replicate break
statistics and make puppies cry.
• Unless:
– It’s a pilot experiment, e.g. to estimate data quality /
antibody specificity / quantity of particular transcripts etc.,
and you plan to do follow-ups (further reps or other
biological validation).
– It’s performed on a homogeneous and well-characterised
system where a lot of other data already exists. e.g. RNA-
seq of a much used cell line.
–It’s a simple confirmatory experiment
What is the minimum number
of replicates?
• Adequate minimum number of replicates for an
exploratory study = 3.
• The reasoning
– 3 data points per gene lets you fit a statistical distribution
more accurately, and gives a better estimate of variability
– This in turn improves both the specificity and the
sensitivity of the experiment.
Some labs do 4 replicates as standard, in case one fails.
• Importance
• Specificity and power
• Accuracy, precision and bias
• Considerations for sample collection
• Considerations for sequencing
• Conclusions
NGS Experimental Design
Precise & Unbiased
PRECISION:
• Reproducibility of repeated
measurements.
• Precise estimation of the quantity of
interest.
Precision
Bias
Target
UNBIASED:
• Bias can affect accuracy.
• Should control for systematic differences between the measure
and some “true" value (target).
• Doesn’t confound that estimate with a technical effect.
• Systematic variation (bias) leads to results being inaccurate.
• Random variation (chance) leads to results being imprecise.
“A biased scientific result is no
different from a useless one”
Daniel Sarewitz
(Beware the creeping cracks of bias. Nature, 2012)
• Bias is avoided by:
• Correct selection of experimental units.
• Randomisation of the experimental units.
• Randomisation of the order in which
measurements are made.
• “Blinding” and the use of coded samples where
appropriate.
• Failure to randomise and blind can lead to
false positive and negative results
Controlling bias
Confounding factors:
example
Plate1 Plate2 Plate3
Control Treatment 1 Treatment 2
RNA Extraction
The difference between Control, Treatment 1
and Treatment 2 is confounded by Plate
Confounding factors:
example 2
Day1, Plate 1 Day2, Plate 2 Day3, Plate 3
Control Treatment 1 Treatment 2
RNA Extraction
The difference between Control, Treatment 1
and Treatment 2 is confounded by day and plate.
Sources of potential bias
Issues can arise if any of the experimental steps are
applied in a systematically biased way between sample
and control. e.g. During:
- Sample collection procedure
- Molecular biology during the main experiment (e.g.
ChIP)
- Library prep
- Sequencing (different lanes on same flowcell is ok, but
different flow cells can produce different results)
http://www.the-scientist.com/blog/display/57558/
• A GWAS study of 800 centenarians against controls found 150 SNPs which can predict
if a person is a centenarian with 77 % accuracy.
• Problem: they used different SNP chips for centenarian vs control.
• Retracted 2011 following an independent lab reviewed the data and QC applied.
• Importance
• Specificity and power
• Accuracy, precision and bias
• Considerations for sample collection
• Considerations for sequencing
• Conclusions
NGS Experimental Design
Samples to collect
• Adequate minimum number of replicates for an
exploratory study = 3.
• Controls are also essential.
• Biological replicates should be performed -
technical reps will not capture variation present in the
tissues / organisms.
–> This means you need at least 3 biological replicate
samples, and 3 biological replicate controls.
Biological or technical
replicates?
Choose Samples
Extract RNA
Quality Control
Convert to cDNA
Quality Control
Library prep
Washing
Sequencing
Analysis
Biological Replication
Technical Replication
Technical Replication
Technical Replication
Technical Replication
RNAseqProcessingWorkflow
Biological replicate tricky
cases
• Some biological replicate cases are obvious: e.g.
tissue samples from different mice, blood samples
from different people, cell cultures from different
people.
• Others are less clear cut:
• e.g. Experiments using a single cell line
Cell line replicates
Treatment
Results
Bad example
Cell line replicates
Treatment
Results
Bad example Better
Cell line replicates
Treatment
Results
Bad example Better
Even better
Treatment
Results
Cell line replicates
Treatment
Results
Bad example Better
Even better
Treatment
Results
None of these
are biological
replicates - but
we do the best
we can.
Performing cell line replicates
• The aim is to assess the replicability of a given
experiment, by performing it independently
• This means ideally they should be done on different
days, with fresh medium and reagents etc. However,
this might be impractical.
• It is at least important to perform the treatments
independently. e.g. To get 4 replicates, do 4 separate
transfections (or RNAi etc) for the experimental
sample, and 4 for the control.
Experiment Controls
• A very important part in your experiment,
because it’s very difficult to eliminate all of the
possible confounding variables & bias.
• Designing the experiment with controls in mind is
often more crucial than determining the
independent variable.
• Increases the statistical validity of your data.
• Two types: positive and negative.
Experimental Controls
• Placebo control
Mimic a procedure or treatment without the
actual use of the procedure or test
substance.
• e.g. same surgical procedure but
without X implanted
• Vehicle control
Used in studies where a substance is used to deliver an experimental
compound.
e.g. apply EtOH to cell lines on it’s own as a control since it’s
used as a vehicle for delivering the Tamoxifen drug.
• Baseline biological state
e.g. Wild type to observe the effects of a knockout.
ChIP-seq Controls
• What controls to use
– Either input DNA or IgG antibody are commonly used. Both
have pros and cons.
• Replicates
- The control should have as many replicates as the sample,
as it is also variable.
• Particularly important
- ChIP reactions are extremely noisy and variable.
Should I pool samples?
When working with very small amounts of tissue sample, pooling is a
good way of reducing the noise while keeping the number of n
reasonably small.
e.g. where n = no. of sequencing reactions.
• Better to pool the biological material (tissue, cells), not the purified
RNA or labeled cDNA. In this way, problems are far easier to spot.
• However, there’s no way to estimate variation between individuals
in a pool, which is sometimes important and often interesting.
• Beware of outliers - Don’t include any sample that looks suspicious.
• In some studies (pooled vs unpooled designs) the majority of DEGs turn out
extreme in only one individual.
• Importance
• Specificity and power
• Accuracy, precision and bias
• Considerations for sample collection
• Considerations for sequencing
• Conclusions
NGS Experimental Design
Basic NGS parameters
• Library size / sequencing depth: the number of reads
obtained for a given sample (e.g. 20 million)
• Read length: bp length of each read in the library (e.g. 150
bp)
• Single end vs paired end: single end sequencing only
sequences one end of each DNA fragment, while paired end
sequences both ends.
How deeply should you
sequence
• Depends on the application: e.g. some parts of the genome
will be more tricky to sequence, so full coverage for genome
assembly may require very deep sequencing.
• RNA-seq differential expression: very roughly (for human
and mouse) - at least 10 million reads per sample is required,
and 30 million reads per sample should suit most applications.
Can sequence deeper if low expressed genes are particularly
important.
• If in doubt: you can run a pilot experiment and see how much
coverage and saturation you get, and potentially sequence
further after that.
Sequencing depth vs
replicates
Replicates are good! Even if you don’t sequence any more
deeply.
A study looking at RNAseq (Liu et al. 2013) found that adding
more replicates at 10 million reads library size was a more cost-
efficient way of improving the power of the experiment, than
adding more sequencing coverage. This was true for 3-7
replicates.
So, if you don’t have money for a lot of sequencing, you can still
multiplex a number of biological replicates!
What read length?
• The read length should match your sample fragment length:
you want it to be around the size of and just a bit shorter than your
fragments.
• Recommended for Illumina:
sRNAseq, ribosomal profiling = 50 bp read length
mRNAseq (fragment size ~80-200bp) = 150 bp read length
longer fragments = 250 bp
• For longer reads / niche applications: other sequencers also
exist. PacBio has an impressive read length (often >10 kb)
Single end vs paired end
• Single end: a bit cheaper, less sequencing required. Suitable for
most general purposes, such as differential expression analysis.
• Paired end: contains more length and positional information about
the sequenced fragment - can tell where it starts and stops.
• Applications for which you need paired end sequencing:
- splice junctions
- rearrangements: indels and inversions
• Importance
• Specificity and power
• Accuracy, precision and bias
• Considerations for sample collection
• Considerations for sequencing
• Conclusions
NGS Experimental Design
Summary
• Experimental design is key to a successful genomics experiment.
• The required power of an experiment should be considered when
choosing the number of replicates.
• An adequate number of biological replicates (usually >=3) is
crucial for a reliable statistical analysis
• Appropriate controls are essential
• Systematic bias can completely mess up an experiment and should
be controlled
Acknowledgements
Thank you to Ela and everyone in the Frye lab!
Iwona Driskell
Sandra Blanco
Shobbir Hussain
Joana Flores
Martyna Popis
Roberto Bandiera
Abdul Sajini
Mahalia Page
Nikoletta Gkatza
Carolin Witte
And also the following colleagues for
excellent experimental design
discussions:
Sabine Dietmann
Patrick Lombard
Meryem Ralser
Bettina Fischer (CSBC)
Sarah Carl (CSBC)
Duncan Odom (CRUK CRI)
Audrey Fu (University of Chicago)
And thank you all for listening!
I would particularly like to thank Roslin Russell (CRUK) for letting
me use her course material slides as a basis for preparing this talk

More Related Content

What's hot

Screening and selection of recombinants
Screening and selection of recombinants Screening and selection of recombinants
Screening and selection of recombinants
Kristu Jayanti College
 
Srid
SridSrid
Genome Editing- ZNF vs TELEN
Genome Editing- ZNF vs TELENGenome Editing- ZNF vs TELEN
Genome Editing- ZNF vs TELEN
abhijeetanandha1
 
Retro copia transposons
Retro copia transposonsRetro copia transposons
Retro copia transposons
Nethravathi Siri
 
Monoclonal antibody
Monoclonal antibodyMonoclonal antibody
Monoclonal antibody
Dr. Ashutosh Tiwari
 
Nucleic acids as therapeutic agents
Nucleic acids as therapeutic agentsNucleic acids as therapeutic agents
Nucleic acids as therapeutic agents
RESHMASOMAN3
 
Phage display
Phage displayPhage display
Phage display
Balaji Rathod
 
Illumina (sequencing by synthesis) method
Illumina (sequencing by synthesis) methodIllumina (sequencing by synthesis) method
Illumina (sequencing by synthesis) method
FekaduKorsa
 
Fundamental techniques of gene manipulation
Fundamental techniques of gene manipulationFundamental techniques of gene manipulation
Fundamental techniques of gene manipulation
Manigandan s
 
Biotechnology: Yeast Artificial Chromosome Cloning Vector
Biotechnology: Yeast Artificial Chromosome Cloning VectorBiotechnology: Yeast Artificial Chromosome Cloning Vector
Biotechnology: Yeast Artificial Chromosome Cloning Vector
Manish Raj
 
Rdna technology
Rdna technologyRdna technology
Rdna technology
Stiti Dash
 
Plasmid
PlasmidPlasmid
Plasmid
Pram Priyanca
 
Antibody phage display technology
Antibody phage display technologyAntibody phage display technology
Antibody phage display technology
1431122
 
Homology
HomologyHomology
Homology
avrilcoghlan
 
Genetics ppt
Genetics pptGenetics ppt
Genetics ppt
Deepti Barange
 
LECTURE NOTES ON BIOINFORMATICS
LECTURE NOTES ON BIOINFORMATICSLECTURE NOTES ON BIOINFORMATICS
LECTURE NOTES ON BIOINFORMATICS
MSCW Mysore
 
Bidirectional and rolling circular dna replication
Bidirectional and rolling circular dna replicationBidirectional and rolling circular dna replication
Bidirectional and rolling circular dna replication
Gayathri91098
 
Transposition
TranspositionTransposition
Transposition
swethamohan17
 
Method of gene transfer
Method of gene transferMethod of gene transfer
Method of gene transfer
SumedhaBobade
 
Conjugation
ConjugationConjugation
Conjugation
Anup Singh
 

What's hot (20)

Screening and selection of recombinants
Screening and selection of recombinants Screening and selection of recombinants
Screening and selection of recombinants
 
Srid
SridSrid
Srid
 
Genome Editing- ZNF vs TELEN
Genome Editing- ZNF vs TELENGenome Editing- ZNF vs TELEN
Genome Editing- ZNF vs TELEN
 
Retro copia transposons
Retro copia transposonsRetro copia transposons
Retro copia transposons
 
Monoclonal antibody
Monoclonal antibodyMonoclonal antibody
Monoclonal antibody
 
Nucleic acids as therapeutic agents
Nucleic acids as therapeutic agentsNucleic acids as therapeutic agents
Nucleic acids as therapeutic agents
 
Phage display
Phage displayPhage display
Phage display
 
Illumina (sequencing by synthesis) method
Illumina (sequencing by synthesis) methodIllumina (sequencing by synthesis) method
Illumina (sequencing by synthesis) method
 
Fundamental techniques of gene manipulation
Fundamental techniques of gene manipulationFundamental techniques of gene manipulation
Fundamental techniques of gene manipulation
 
Biotechnology: Yeast Artificial Chromosome Cloning Vector
Biotechnology: Yeast Artificial Chromosome Cloning VectorBiotechnology: Yeast Artificial Chromosome Cloning Vector
Biotechnology: Yeast Artificial Chromosome Cloning Vector
 
Rdna technology
Rdna technologyRdna technology
Rdna technology
 
Plasmid
PlasmidPlasmid
Plasmid
 
Antibody phage display technology
Antibody phage display technologyAntibody phage display technology
Antibody phage display technology
 
Homology
HomologyHomology
Homology
 
Genetics ppt
Genetics pptGenetics ppt
Genetics ppt
 
LECTURE NOTES ON BIOINFORMATICS
LECTURE NOTES ON BIOINFORMATICSLECTURE NOTES ON BIOINFORMATICS
LECTURE NOTES ON BIOINFORMATICS
 
Bidirectional and rolling circular dna replication
Bidirectional and rolling circular dna replicationBidirectional and rolling circular dna replication
Bidirectional and rolling circular dna replication
 
Transposition
TranspositionTransposition
Transposition
 
Method of gene transfer
Method of gene transferMethod of gene transfer
Method of gene transfer
 
Conjugation
ConjugationConjugation
Conjugation
 

Viewers also liked

Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence data
Thomas Keane
 
Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Workshop NGS data analysis - 1
Workshop NGS data analysis - 1
Maté Ongenaert
 
Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...
Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...
Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...
QIAGEN
 
Introduction to next generation sequencing
Introduction to next generation sequencingIntroduction to next generation sequencing
Introduction to next generation sequencing
VHIR Vall d’Hebron Institut de Recerca
 
Next Generation Sequencing & Transcriptome Analysis
Next Generation Sequencing & Transcriptome AnalysisNext Generation Sequencing & Transcriptome Analysis
Next Generation Sequencing & Transcriptome Analysis
Bastian Greshake
 
Introduction to NGS
Introduction to NGSIntroduction to NGS
Introduction to NGS
cursoNGS
 
High throughput sequencing
High throughput sequencingHigh throughput sequencing
High throughput sequencing
Cristian Cosentino, PhD
 
NEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCINGNEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCING
Bilal Nizami
 
NGS for Infectious Disease Diagnostics: An Opportunity for Growth
NGS for Infectious Disease Diagnostics: An Opportunity for Growth NGS for Infectious Disease Diagnostics: An Opportunity for Growth
NGS for Infectious Disease Diagnostics: An Opportunity for Growth
Alira Health
 
Computational infrastructure for NGS data analysis
Computational infrastructure for NGS data analysisComputational infrastructure for NGS data analysis
Computational infrastructure for NGS data analysis
cursoNGS
 
Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...
Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...
Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...
QIAGEN
 
Digital RNAseq Technology Introduction: Digital RNAseq Webinar Part 1
Digital RNAseq Technology Introduction: Digital RNAseq Webinar Part 1Digital RNAseq Technology Introduction: Digital RNAseq Webinar Part 1
Digital RNAseq Technology Introduction: Digital RNAseq Webinar Part 1
QIAGEN
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
Dayananda Salam
 
“Next-Generation Sequencing (NGS) Global Market – Forecast To 2022”
“Next-Generation Sequencing (NGS) Global Market  – Forecast To 2022”“Next-Generation Sequencing (NGS) Global Market  – Forecast To 2022”
“Next-Generation Sequencing (NGS) Global Market – Forecast To 2022”
Vinay Shiva Prasad
 
The QIAseq NGS Portfolio for Cancer Research: Sample-to-Insight for All
The QIAseq NGS Portfolio for Cancer Research: Sample-to-Insight for AllThe QIAseq NGS Portfolio for Cancer Research: Sample-to-Insight for All
The QIAseq NGS Portfolio for Cancer Research: Sample-to-Insight for All
QIAGEN
 
Basic Steps of the NGS Method
Basic Steps of the NGS MethodBasic Steps of the NGS Method
Basic Steps of the NGS Method
USD Bioinformatics
 
20170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_10120170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_101
Ino de Bruijn
 
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
VHIR Vall d’Hebron Institut de Recerca
 
Next Gen Sequencing (NGS) Technology Overview
Next Gen Sequencing (NGS) Technology OverviewNext Gen Sequencing (NGS) Technology Overview
Next Gen Sequencing (NGS) Technology Overview
Dominic Suciu
 
How to cluster and sequence an ngs library (james hadfield160416)
How to cluster and sequence an ngs library (james hadfield160416)How to cluster and sequence an ngs library (james hadfield160416)
How to cluster and sequence an ngs library (james hadfield160416)
James Hadfield
 

Viewers also liked (20)

Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence data
 
Workshop NGS data analysis - 1
Workshop NGS data analysis - 1Workshop NGS data analysis - 1
Workshop NGS data analysis - 1
 
Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...
Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...
Advanced NGS Data Analysis & Interpretation- BGW + IVA: NGS Tech Overview Web...
 
Introduction to next generation sequencing
Introduction to next generation sequencingIntroduction to next generation sequencing
Introduction to next generation sequencing
 
Next Generation Sequencing & Transcriptome Analysis
Next Generation Sequencing & Transcriptome AnalysisNext Generation Sequencing & Transcriptome Analysis
Next Generation Sequencing & Transcriptome Analysis
 
Introduction to NGS
Introduction to NGSIntroduction to NGS
Introduction to NGS
 
High throughput sequencing
High throughput sequencingHigh throughput sequencing
High throughput sequencing
 
NEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCINGNEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCING
 
NGS for Infectious Disease Diagnostics: An Opportunity for Growth
NGS for Infectious Disease Diagnostics: An Opportunity for Growth NGS for Infectious Disease Diagnostics: An Opportunity for Growth
NGS for Infectious Disease Diagnostics: An Opportunity for Growth
 
Computational infrastructure for NGS data analysis
Computational infrastructure for NGS data analysisComputational infrastructure for NGS data analysis
Computational infrastructure for NGS data analysis
 
Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...
Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...
Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...
 
Digital RNAseq Technology Introduction: Digital RNAseq Webinar Part 1
Digital RNAseq Technology Introduction: Digital RNAseq Webinar Part 1Digital RNAseq Technology Introduction: Digital RNAseq Webinar Part 1
Digital RNAseq Technology Introduction: Digital RNAseq Webinar Part 1
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
“Next-Generation Sequencing (NGS) Global Market – Forecast To 2022”
“Next-Generation Sequencing (NGS) Global Market  – Forecast To 2022”“Next-Generation Sequencing (NGS) Global Market  – Forecast To 2022”
“Next-Generation Sequencing (NGS) Global Market – Forecast To 2022”
 
The QIAseq NGS Portfolio for Cancer Research: Sample-to-Insight for All
The QIAseq NGS Portfolio for Cancer Research: Sample-to-Insight for AllThe QIAseq NGS Portfolio for Cancer Research: Sample-to-Insight for All
The QIAseq NGS Portfolio for Cancer Research: Sample-to-Insight for All
 
Basic Steps of the NGS Method
Basic Steps of the NGS MethodBasic Steps of the NGS Method
Basic Steps of the NGS Method
 
20170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_10120170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_101
 
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
 
Next Gen Sequencing (NGS) Technology Overview
Next Gen Sequencing (NGS) Technology OverviewNext Gen Sequencing (NGS) Technology Overview
Next Gen Sequencing (NGS) Technology Overview
 
How to cluster and sequence an ngs library (james hadfield160416)
How to cluster and sequence an ngs library (james hadfield160416)How to cluster and sequence an ngs library (james hadfield160416)
How to cluster and sequence an ngs library (james hadfield160416)
 

Similar to Making your science powerful : an introduction to NGS experimental design

So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
University of California, Davis
 
3 - SamplingTechVarious sampling techniques are employedniques(new1430).ppt
3 - SamplingTechVarious sampling techniques are employedniques(new1430).ppt3 - SamplingTechVarious sampling techniques are employedniques(new1430).ppt
3 - SamplingTechVarious sampling techniques are employedniques(new1430).ppt
gharkaacer
 
Hypothesis Testing and Experimental design.pptx
Hypothesis Testing and Experimental design.pptxHypothesis Testing and Experimental design.pptx
Hypothesis Testing and Experimental design.pptx
Abile2
 
Experimental method of Educational Research.
Experimental method of Educational Research.Experimental method of Educational Research.
Experimental method of Educational Research.
Neha Deo
 
Lecture 7 gwas full
Lecture 7 gwas fullLecture 7 gwas full
Lecture 7 gwas full
Lekki Frazier-Wood
 
Research design and experimentation
Research design and experimentationResearch design and experimentation
Research design and experimentation
Dr NEETHU ASOKAN
 
sience 2.0 : an illustration of good research practices in a real study
sience 2.0 : an illustration of good research practices in a real studysience 2.0 : an illustration of good research practices in a real study
sience 2.0 : an illustration of good research practices in a real study
wolf vanpaemel
 
Advances and Applications Enabled by Single Cell Technology
Advances and Applications Enabled by Single Cell TechnologyAdvances and Applications Enabled by Single Cell Technology
Advances and Applications Enabled by Single Cell Technology
QIAGEN
 
Amia tb-review-08
Amia tb-review-08Amia tb-review-08
Amia tb-review-08
Russ Altman
 
Sampling techniques
Sampling techniquesSampling techniques
Sampling techniques
Gaurav Saxena
 
Statistics five
Statistics fiveStatistics five
Statistics five
Mohamed Hefny
 
Evidence based psychiatry
Evidence based psychiatryEvidence based psychiatry
Evidence based psychiatry
shuchi pande
 
Sampling
SamplingSampling
Sampling
debmahuya
 
Study design2 6_07
Study design2 6_07Study design2 6_07
Study design2 6_07
Dan Fisher
 
Biostatistics in Clinical Research
Biostatistics in Clinical ResearchBiostatistics in Clinical Research
Biostatistics in Clinical Research
Abhaya Indrayan
 
Research
ResearchResearch
Dr. Marie Culhane - Increase the value of your diagnostics and your value as ...
Dr. Marie Culhane - Increase the value of your diagnostics and your value as ...Dr. Marie Culhane - Increase the value of your diagnostics and your value as ...
Dr. Marie Culhane - Increase the value of your diagnostics and your value as ...
John Blue
 
Sampling in Research
Sampling in ResearchSampling in Research
Sampling in Research
Enzo Engada
 
Design of experiments - Dr. Manu Melwin Joy - School of Management Studies, C...
Design of experiments - Dr. Manu Melwin Joy - School of Management Studies, C...Design of experiments - Dr. Manu Melwin Joy - School of Management Studies, C...
Design of experiments - Dr. Manu Melwin Joy - School of Management Studies, C...
manumelwin
 
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
David Cook
 

Similar to Making your science powerful : an introduction to NGS experimental design (20)

So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
 
3 - SamplingTechVarious sampling techniques are employedniques(new1430).ppt
3 - SamplingTechVarious sampling techniques are employedniques(new1430).ppt3 - SamplingTechVarious sampling techniques are employedniques(new1430).ppt
3 - SamplingTechVarious sampling techniques are employedniques(new1430).ppt
 
Hypothesis Testing and Experimental design.pptx
Hypothesis Testing and Experimental design.pptxHypothesis Testing and Experimental design.pptx
Hypothesis Testing and Experimental design.pptx
 
Experimental method of Educational Research.
Experimental method of Educational Research.Experimental method of Educational Research.
Experimental method of Educational Research.
 
Lecture 7 gwas full
Lecture 7 gwas fullLecture 7 gwas full
Lecture 7 gwas full
 
Research design and experimentation
Research design and experimentationResearch design and experimentation
Research design and experimentation
 
sience 2.0 : an illustration of good research practices in a real study
sience 2.0 : an illustration of good research practices in a real studysience 2.0 : an illustration of good research practices in a real study
sience 2.0 : an illustration of good research practices in a real study
 
Advances and Applications Enabled by Single Cell Technology
Advances and Applications Enabled by Single Cell TechnologyAdvances and Applications Enabled by Single Cell Technology
Advances and Applications Enabled by Single Cell Technology
 
Amia tb-review-08
Amia tb-review-08Amia tb-review-08
Amia tb-review-08
 
Sampling techniques
Sampling techniquesSampling techniques
Sampling techniques
 
Statistics five
Statistics fiveStatistics five
Statistics five
 
Evidence based psychiatry
Evidence based psychiatryEvidence based psychiatry
Evidence based psychiatry
 
Sampling
SamplingSampling
Sampling
 
Study design2 6_07
Study design2 6_07Study design2 6_07
Study design2 6_07
 
Biostatistics in Clinical Research
Biostatistics in Clinical ResearchBiostatistics in Clinical Research
Biostatistics in Clinical Research
 
Research
ResearchResearch
Research
 
Dr. Marie Culhane - Increase the value of your diagnostics and your value as ...
Dr. Marie Culhane - Increase the value of your diagnostics and your value as ...Dr. Marie Culhane - Increase the value of your diagnostics and your value as ...
Dr. Marie Culhane - Increase the value of your diagnostics and your value as ...
 
Sampling in Research
Sampling in ResearchSampling in Research
Sampling in Research
 
Design of experiments - Dr. Manu Melwin Joy - School of Management Studies, C...
Design of experiments - Dr. Manu Melwin Joy - School of Management Studies, C...Design of experiments - Dr. Manu Melwin Joy - School of Management Studies, C...
Design of experiments - Dr. Manu Melwin Joy - School of Management Studies, C...
 
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
 

Recently uploaded

My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
74nqk8xf
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 

Recently uploaded (20)

My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 

Making your science powerful : an introduction to NGS experimental design

  • 1. Making your science powerful: an introduction to NGS experimental design 7th May CSCR seminar Dr Jelena Aleksic
  • 2. The purpose of this talk Bioinformaticians often get asked about experimental design, as we have experience working with the data. Good experimental design is absolutely crucial for NGS applications, because of the size of the datasets. The purpose of this talk is to introduce the basic concepts of good NGS experimental design.
  • 3. NGS Experimental Design • Importance • Specificity and power • Accuracy, precision and bias • Considerations for sample collection • Considerations for sequencing • Conclusions
  • 4. Without a valid design, valid scientific conclusions cannot be drawn. Also particularly important for NGS because of the size of the datasets.
  • 5. Ronald A. Fisher (1890-1962) “To consult the statistician after an experiment is finished is often merely to ask them to conduct a post mortem examination. They can perhaps say what the experiment died of.” (1938) Good to think about early!
  • 6. Value of Planning Rushing into experiments without thoughtful planning invites failure. “Seventy percent of whether your experiment will work is determined before you touch the first test tube” Tung-Tien Sun (2004). Excessive trust in authorities and its influence on experimental design. Nature Reviews Molecular Cell Biology
  • 7. Still a serious issue in the field - Almost 70% of all the human RNA-seq samples in GEO do not have biological replicates (Feng et al. 2012) - More unreplicated RNA-seq data were published than replicated RNA-seq data in 2011 (Feng et al. 2012) - ENCODE guidelines recommend two ChIP-seq biological replicates (Landt et al. 2012), but this is controversial and arguably too few
  • 8. • Importance • Specificity and power • Accuracy, precision and bias • Considerations for sample collection • Considerations for sequencing • Conclusions NGS Experimental Design
  • 9. Important concepts • Specificity: the proportion of results that are genuine ones, as opposed to false positives • Power (sensitivity): the proportion of genuine results that are successfully identified by the experiment e.g. At FDR 1%, 1 in 100 results will be a false positive. e.g. At 20% power, the experiment will successfully identify 200 out of 1000 genuine differentially expressed genes at a specified fold-change level.
  • 10. Adequately Powered The power (sensitivity) of an experiment is the probability that it can detect an effect, if it is present. •Power is often overlooked. •A probability - any value between 0% and 100%. •Achieved by: –Using appropriate numbers of animals / samples (sample size) –Controlling sources of variation If you increase the variability when you increase the size then it won't necessarily have more power.
  • 11. Adequately Powered •Too many samples: –wastes resources e.g. animals (unethical), money, time and effort. •Too few samples: –may lack power and miss a scientifically important effect. Power(%) Sample Size per group (n)
  • 12. How much power is enough? – Depends on the experiment purpose. – To just get the top few targets = not much. – To get a comprehensive list of targets = a lot. – Rare transcripts / genomic variants = a lot. – Also depends on variability of samples. – In vitro studies = less variability – In vivo = more variability, mixture of tissues – Cancer = hugely variable, need a lot of power
  • 13. How do I tell? • It’s possible to calculate the required power of an experiment. For RNA-seq, there’s even an online tool (and there are various general calculators): • Alternatively, use rules of thumb based on other people’s systematic reviews of a particular experiment type.
  • 14. How many replicates? 4 mice… No, it should be 12 mice… Look into my crystal ball…… The sensitivity, ability or power to detect changes depends on the sample size.
  • 15. Depends on the resources, the goals of the study, and the reliability of the technology: –How much money do you have? –Can you handle all these samples without problem? –What size of differences (effect size) to detect? –It’s an accurate representation of the population? –Large enough to achieve meaningful results? How many replicates?
  • 16. What is the minimum number of replicates? Experiments with only one replicate break statistics and make puppies cry. • Unless: – It’s a pilot experiment, e.g. to estimate data quality / antibody specificity / quantity of particular transcripts etc., and you plan to do follow-ups (further reps or other biological validation). – It’s performed on a homogeneous and well-characterised system where a lot of other data already exists. e.g. RNA- seq of a much used cell line. –It’s a simple confirmatory experiment
  • 17. What is the minimum number of replicates? • Adequate minimum number of replicates for an exploratory study = 3. • The reasoning – 3 data points per gene lets you fit a statistical distribution more accurately, and gives a better estimate of variability – This in turn improves both the specificity and the sensitivity of the experiment. Some labs do 4 replicates as standard, in case one fails.
  • 18. • Importance • Specificity and power • Accuracy, precision and bias • Considerations for sample collection • Considerations for sequencing • Conclusions NGS Experimental Design
  • 19. Precise & Unbiased PRECISION: • Reproducibility of repeated measurements. • Precise estimation of the quantity of interest. Precision Bias Target UNBIASED: • Bias can affect accuracy. • Should control for systematic differences between the measure and some “true" value (target). • Doesn’t confound that estimate with a technical effect. • Systematic variation (bias) leads to results being inaccurate. • Random variation (chance) leads to results being imprecise.
  • 20. “A biased scientific result is no different from a useless one” Daniel Sarewitz (Beware the creeping cracks of bias. Nature, 2012)
  • 21. • Bias is avoided by: • Correct selection of experimental units. • Randomisation of the experimental units. • Randomisation of the order in which measurements are made. • “Blinding” and the use of coded samples where appropriate. • Failure to randomise and blind can lead to false positive and negative results Controlling bias
  • 22. Confounding factors: example Plate1 Plate2 Plate3 Control Treatment 1 Treatment 2 RNA Extraction The difference between Control, Treatment 1 and Treatment 2 is confounded by Plate
  • 23. Confounding factors: example 2 Day1, Plate 1 Day2, Plate 2 Day3, Plate 3 Control Treatment 1 Treatment 2 RNA Extraction The difference between Control, Treatment 1 and Treatment 2 is confounded by day and plate.
  • 24. Sources of potential bias Issues can arise if any of the experimental steps are applied in a systematically biased way between sample and control. e.g. During: - Sample collection procedure - Molecular biology during the main experiment (e.g. ChIP) - Library prep - Sequencing (different lanes on same flowcell is ok, but different flow cells can produce different results)
  • 25. http://www.the-scientist.com/blog/display/57558/ • A GWAS study of 800 centenarians against controls found 150 SNPs which can predict if a person is a centenarian with 77 % accuracy. • Problem: they used different SNP chips for centenarian vs control. • Retracted 2011 following an independent lab reviewed the data and QC applied.
  • 26. • Importance • Specificity and power • Accuracy, precision and bias • Considerations for sample collection • Considerations for sequencing • Conclusions NGS Experimental Design
  • 27. Samples to collect • Adequate minimum number of replicates for an exploratory study = 3. • Controls are also essential. • Biological replicates should be performed - technical reps will not capture variation present in the tissues / organisms. –> This means you need at least 3 biological replicate samples, and 3 biological replicate controls.
  • 28. Biological or technical replicates? Choose Samples Extract RNA Quality Control Convert to cDNA Quality Control Library prep Washing Sequencing Analysis Biological Replication Technical Replication Technical Replication Technical Replication Technical Replication RNAseqProcessingWorkflow
  • 29. Biological replicate tricky cases • Some biological replicate cases are obvious: e.g. tissue samples from different mice, blood samples from different people, cell cultures from different people. • Others are less clear cut: • e.g. Experiments using a single cell line
  • 32. Cell line replicates Treatment Results Bad example Better Even better Treatment Results
  • 33. Cell line replicates Treatment Results Bad example Better Even better Treatment Results None of these are biological replicates - but we do the best we can.
  • 34. Performing cell line replicates • The aim is to assess the replicability of a given experiment, by performing it independently • This means ideally they should be done on different days, with fresh medium and reagents etc. However, this might be impractical. • It is at least important to perform the treatments independently. e.g. To get 4 replicates, do 4 separate transfections (or RNAi etc) for the experimental sample, and 4 for the control.
  • 35. Experiment Controls • A very important part in your experiment, because it’s very difficult to eliminate all of the possible confounding variables & bias. • Designing the experiment with controls in mind is often more crucial than determining the independent variable. • Increases the statistical validity of your data. • Two types: positive and negative.
  • 36. Experimental Controls • Placebo control Mimic a procedure or treatment without the actual use of the procedure or test substance. • e.g. same surgical procedure but without X implanted • Vehicle control Used in studies where a substance is used to deliver an experimental compound. e.g. apply EtOH to cell lines on it’s own as a control since it’s used as a vehicle for delivering the Tamoxifen drug. • Baseline biological state e.g. Wild type to observe the effects of a knockout.
  • 37. ChIP-seq Controls • What controls to use – Either input DNA or IgG antibody are commonly used. Both have pros and cons. • Replicates - The control should have as many replicates as the sample, as it is also variable. • Particularly important - ChIP reactions are extremely noisy and variable.
  • 38. Should I pool samples? When working with very small amounts of tissue sample, pooling is a good way of reducing the noise while keeping the number of n reasonably small. e.g. where n = no. of sequencing reactions. • Better to pool the biological material (tissue, cells), not the purified RNA or labeled cDNA. In this way, problems are far easier to spot. • However, there’s no way to estimate variation between individuals in a pool, which is sometimes important and often interesting. • Beware of outliers - Don’t include any sample that looks suspicious. • In some studies (pooled vs unpooled designs) the majority of DEGs turn out extreme in only one individual.
  • 39. • Importance • Specificity and power • Accuracy, precision and bias • Considerations for sample collection • Considerations for sequencing • Conclusions NGS Experimental Design
  • 40. Basic NGS parameters • Library size / sequencing depth: the number of reads obtained for a given sample (e.g. 20 million) • Read length: bp length of each read in the library (e.g. 150 bp) • Single end vs paired end: single end sequencing only sequences one end of each DNA fragment, while paired end sequences both ends.
  • 41. How deeply should you sequence • Depends on the application: e.g. some parts of the genome will be more tricky to sequence, so full coverage for genome assembly may require very deep sequencing. • RNA-seq differential expression: very roughly (for human and mouse) - at least 10 million reads per sample is required, and 30 million reads per sample should suit most applications. Can sequence deeper if low expressed genes are particularly important. • If in doubt: you can run a pilot experiment and see how much coverage and saturation you get, and potentially sequence further after that.
  • 42. Sequencing depth vs replicates Replicates are good! Even if you don’t sequence any more deeply. A study looking at RNAseq (Liu et al. 2013) found that adding more replicates at 10 million reads library size was a more cost- efficient way of improving the power of the experiment, than adding more sequencing coverage. This was true for 3-7 replicates. So, if you don’t have money for a lot of sequencing, you can still multiplex a number of biological replicates!
  • 43. What read length? • The read length should match your sample fragment length: you want it to be around the size of and just a bit shorter than your fragments. • Recommended for Illumina: sRNAseq, ribosomal profiling = 50 bp read length mRNAseq (fragment size ~80-200bp) = 150 bp read length longer fragments = 250 bp • For longer reads / niche applications: other sequencers also exist. PacBio has an impressive read length (often >10 kb)
  • 44. Single end vs paired end • Single end: a bit cheaper, less sequencing required. Suitable for most general purposes, such as differential expression analysis. • Paired end: contains more length and positional information about the sequenced fragment - can tell where it starts and stops. • Applications for which you need paired end sequencing: - splice junctions - rearrangements: indels and inversions
  • 45. • Importance • Specificity and power • Accuracy, precision and bias • Considerations for sample collection • Considerations for sequencing • Conclusions NGS Experimental Design
  • 46. Summary • Experimental design is key to a successful genomics experiment. • The required power of an experiment should be considered when choosing the number of replicates. • An adequate number of biological replicates (usually >=3) is crucial for a reliable statistical analysis • Appropriate controls are essential • Systematic bias can completely mess up an experiment and should be controlled
  • 47. Acknowledgements Thank you to Ela and everyone in the Frye lab! Iwona Driskell Sandra Blanco Shobbir Hussain Joana Flores Martyna Popis Roberto Bandiera Abdul Sajini Mahalia Page Nikoletta Gkatza Carolin Witte And also the following colleagues for excellent experimental design discussions: Sabine Dietmann Patrick Lombard Meryem Ralser Bettina Fischer (CSBC) Sarah Carl (CSBC) Duncan Odom (CRUK CRI) Audrey Fu (University of Chicago) And thank you all for listening! I would particularly like to thank Roslin Russell (CRUK) for letting me use her course material slides as a basis for preparing this talk