Note that you might disagree with some of the previous ...

Ethics, Cost and Feasibility
Once you have done your sample size calculation you have three important decisions to make
1. Can I ethically justify using these numbers of patients/animals?
• Does the end justify th...
2. Can I afford to process that number of samples?
3. Is it feasible to process that number of samples?
• Will I be able to recruit enough parti...
What do you do if the answer to these questions is no?
Well, first you should be pleased that...
You can reduce the number of samples required by:
Reducing variability in your samples, possi...

Final Checks
Don’t jump in too soon! 3 final questions
1. Consider the ethics again. Are you fully satisfied that you are going to do t...
2. What would the devil’s advocate say about your experimental design? Are your controls sufficient to rule out alternativ...
3. Have you had someone else’s input? Before you jump in, get someone independent to have a look. They might come up with ...
It’s worth spending time on experimental planning. Making sure things are right will help make sure you generate useful dat...

Part 5 Recap.
Don’t be put off by sample size determination. It will stop you wasting time and money with poorly planned e...
Sam
Liam
Jess
Danielle
Experimental design cartoon part 5 sample size

Part 5 of 5 - Experimental design lecture series. This one focuses on sample size calculations and introduces some of the commonly used statistical tests (for normally distributed data). Toward the end it covers type I and II errors, alpha/beta and reducing variability.

Published in: Science

  1. 1. Sample size calculations Experimental design: part 5
  2. 2. Let’s move on to sample size determination How many samples do you need? I know this one! You always do 3 experiments! Am I right? Nope. OK, do you do experiments until your data are significant? No. You do a calculation before you start based on your hypothesis and experiment design
  3. 3. What if I am doing a descriptive study? Sample size calculations are for when you are formally testing a hypothesis Exploratory studies don’t need them, but without the stats to back up your data you won’t be able to make strong conclusions
  4. 4. Step 1: Identify your variables Step 2: Pick the right test (sounds scary, isn’t really) Step 3: Plug some numbers into a calculator Sample size calculations are actually not too hard… 3 steps
  5. 5. Let’s briefly talk about stats The good news is that you usually should be planning your experiment in a way that makes your stats as simple as possible To be able to calculate sample size you will need a good idea of what test you plan to run in the end Oh, dear! If your plan is complex, then you definitely should get some expert help Let’s have a quick look at some of the common tests Phew!
  6. 6. Before we move on, let’s consider an experiment where you have more than one hypothesis to test The sample size calculation you do will be based on the primary outcome measure I always consider the primary outcome to be the real reason I did the experiment How do I know which is the primary hypothesis? Secondary outcomes are bonus information, or details that help explain the data
  7. 7. Dependent/Outcome: What you actually measure Independent/Factor: The bits you control, how your samples are grouped Types of Variable Class of Variable Categorical: Discrete answers: yes/no, male/female, Small/medium/large Continuous: Answers can be anything Height, weight, age Speed, distance, power, concentration For your experiment, first identify what class your variables will fall into Then look up the type of test that is appropriate for those combinations Reminder For both types of variable, there are two classes: categorical and continuous Simple!
  8. 8. The tests that you decide to use will ultimately depend on whether your data are normally distributed or not You’ll need to formally test for normality once you have your data However, for the purposes of experimental design, I usually base my calculations on the data being normally distributed Normal Distribution Non-Normal Distribution This simplifies things and means you only need to choose between about half as many tests I’m not convinced I’m normal! OK, on to the tests!!
  9. 9. Outcome Variables: Factor Variables: Types of Variable Categorical Categorical Hypothesis: mice carrying the green transgene are more likely to be female than male Male Female Control Transgenic Example please! Chi Squared
  10. 10. Outcome Variables: Factor Variables: Types of Variable Categorical Categorical Male Female Control Transgenic 4 4 2 6 Chi squared will test how likely the observed proportions will have arisen by chance Your data is simply the number of animals in each group Chi Squared
  11. 11. Chi Squared Outcome Variables: Factor Variables: Types of Variable Categorical Categorical Male Female -/- +/- +/+ You can use Chi square with multiple categories in both outcome and factor variables Do you call those Chi rectangles? No, Conro!
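As a rough illustration of what the Chi squared test does with the counts on slide 10, here is a minimal sketch in plain Python. The layout of the table (rows are control and transgenic, columns are male and female) is an assumption, because the transcript does not say which count belongs to which cell, and the function name `chi_squared_2x2` is made up for this example. It computes the plain Pearson statistic without any continuity correction.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value (1 degree of freedom) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if group and sex were unrelated
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-squared > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Assumed layout: rows = control, transgenic; columns = male, female
counts = [[4, 4], [2, 6]]
stat, p_value = chi_squared_2x2(counts)
print(f"chi-squared = {stat:.3f}, p = {p_value:.2f}")
```

With only 16 animals the expected counts are small, so in practice an exact test might be preferred; the point here is only to show where the statistic comes from.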
  12. 12. Outcome Variables: Factor Variables: Types of Variable Categorical Weight Measurements -/- +/+ Continuous if you planned to compare the weights of your 2 transgenic lines I find it helps if you imagine the graphs your data will generate Example please! Note: use T tests for comparing 2 groups only! T tests
  13. 13. Outcome Variables: Factor Variables: Types of Variable Categorical Continuous: Independent or paired T test? If your groups are unrelated, then you will use the independent T test. Independent T tests
  14. 14. Outcome Variables: Factor Variables: Types of Variable Categorical Continuous: If you are measuring the response in the same individuals, use a paired T test A + Treatment 1 Measurement B + Baseline Reading 1 Baseline Reading 2 Treatment 2 Measurement A B Paired T tests
  15. 15. Outcome Variables: Factor Variables: Types of Variable Categorical Continuous T tests are good for comparing 2 groups, but if you are going to have 3 or more you’re likely to need ANOVA These are pretty common: for example comparing wild-type, heterozygous and homozygous mice -/- +/- +/+ Or if you have untreated, control treatment(s) and test treatments ANOVA
  16. 16. Outcome Variables: Factor Variables: Types of Variable Categorical Continuous: Like for T tests, there are different types of ANOVA depending on whether your samples are independent or not A one-way ANOVA is for independent samples, like our mouse line examples -/- +/- +/+ A B A repeated measures ANOVA is for multiple paired samples 1 way or repeated measures ANOVA
  17. 17. Outcome Variables: Factor Variables: Types of Variable Categorical Continuous If you break the factor variables into two, you will need a 2-way ANOVA For example, if you plan to analyse your data looking at genotype AND gender effects then you will need a 2-way ANOVA -/- +/- +/+ -/- +/- +/+ Male Female 2 way ANOVA
  18. 18. Outcome Variables: Factor Variables: Types of Variable Continuous Ok, let’s move on to situations where you are looking to see if your two variables are correlated Continuous Outcome Variable The most common test here is Pearson’s regression or Pearson’s correlation Pearson’s Regression
  19. 19. Outcome Variables: Factor Variables: Types of Variable Categorical (usually binary) A different form of regression, the logistic regression is used for categorical outcome data with continuous factor data Continuous An example might be if you want to see if there is a correlation between the age of your donor and whether a disease is present or not Logistic Regression
  20. 20. Outcome Variables: Factor Variables: Types of Variable Multiple Continuous Last one (of the common ones!). The MANOVA Categorical This is a special type of ANOVA where you have more than one outcome variable Classic examples are when you are comparing lots of different mRNAs in the same sample Or lots of different clinical details in the same patient Gene 1 Gene 2 Gene 3 NB – all 3 genes measured in same animal MANOVA
  21. 21. Outcome Variables: Factor Variables: Types of Variable Multiple Continuous Categorical If you plan to do multiple ANOVAs or T tests in this situation you will underestimate your sample size The MANOVA is specifically designed for these multiple comparison tests NB – all 3 genes measured in same animal Gene 1 Gene 2 Gene 3 MANOVA
  22. 22. Outcome Variables: Factor Variables: Types of Variable Multiple Continuous Categorical Having multiple comparisons will increase the sample size you need. This is another time for you to consider: do you really want to ask lots of questions? NB – all 3 genes measured in same animal Gene 1 Gene 2 Gene 3 MANOVA
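To see why separate tests on several outcomes cause trouble, the sketch below computes the chance of at least one false positive across several tests. It assumes the tests are independent, which is a simplification (three genes measured in the same animal are usually correlated). The Bonferroni adjustment shown is a generic correction swapped in for illustration, not the MANOVA approach the slides recommend, and both function names are made up for this example.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests, each run at level alpha."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold that keeps the overall false positive rate at or below alpha."""
    return alpha / n_tests

# Three separate tests (e.g. Gene 1, Gene 2, Gene 3), each at the 5% level
print(round(familywise_error_rate(0.05, 3), 4))
print(round(bonferroni_alpha(0.05, 3), 4))
```

A stricter per-test threshold is one reason the required sample size goes up when you ask more questions.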
  23. 23. Remember though when you do your power analysis it is based only on the primary outcome This means that you may only have one outcome variable as far as power analysis goes MANOVA Or ANOVA The secondary outcomes will not be as robustly tested, but you can still report the data. BUT, make it clear when you write up your results that you have not accounted for multiple comparisons for these variables Oh, no. I’m confused!
  24. 24. MANOVA Or ANOVA You’re doing a clinical trial of a new drug that decreases blood pressure Blood pressure measurements are your primary outcome Secondary outcomes might be other clinical measurements like heart rate, white cell count, reflexes, blood sugars etc. The secondary outcomes are interesting but you aren’t designing the trial to test them, they’re not part of your sample calculations. ANOVA is OK
  25. 25. MANOVA Or ANOVA However, if you need multiple variables to answer your question then you need a MANOVA
  26. 26. There are other tests for more complex designs, but get help for those! Time for you to do something!
      Test | Outcome | Factor
      Chi Square | Cat | Cat
      T Test | Con | Cat (2 only)
      1 way ANOVA | Con | Cat
      2 way ANOVA | Con | Cat + Cat
      Spearman | Con | Con
      Logistic | Cat | Con
      MANOVA | Con (2+) | Cat
  27. 27. Protein B expression is increased in more than 50% of squamous cell carcinoma cancers Paired normal and cancer tissues probed with antibodies against protein B. Compare staining intensity in the normal tissue against the cancer tissue from the same patient. Staining intensity scored as “reduced”, “same” or “increased” First identify the variables and classify them as categorical or continuous Outcome Variables: Factor Variables:
  28. 28. Protein B expression is increased in more than 50% of squamous cell carcinoma cancers Paired normal and cancer tissues probed with antibodies against protein B. Compare staining intensity in the normal tissue against the cancer tissue from the same patient. Staining intensity scored as “reduced”, “same” or “increased” Outcome Variables: Categorical: reduced / same / increased Factor Variables: Categorical: Cancer / no cancer
  29. 29. Protein B expression is increased in more than 50% of squamous cell carcinoma cancers Paired normal and cancer tissues probed with antibodies against protein B. Compare staining intensity in the normal tissue against the cancer tissue from the same patient. Staining intensity scored as signal intensity 0-255 Let’s test the hypothesis in a different way Outcome Variables: Factor Variables:
  30. 30. Protein B expression is increased in more than 50% of squamous cell carcinoma cancers Paired normal and cancer tissues probed with antibodies against protein B. Compare staining intensity in the normal tissue against the cancer tissue from the same patient. Staining intensity scored as signal intensity 0-255 Outcome Variables: Continuous: Signal intensity Factor Variables: Categorical: Cancer / no cancer Paired T test
  31. 31. Protein B expression is increased in more than 50% of squamous cell carcinoma cancers Paired normal and cancer tissues probed with antibodies against protein B. Compare staining intensity in the normal tissue against the cancer tissue from the same patient. Staining intensity scored as signal intensity 0-255. We also want to determine if patients with high staining intensity have also developed metastasis Outcome Variables: Factor Variables: Let’s add a little more
  32. 32. Protein B expression is increased in more than 50% of squamous cell carcinoma cancers Outcome Variables: Continuous: Signal intensity Factor Variables: Categorical: No cancer / cancer no metastasis / cancer with metastasis Paired normal and cancer tissues probed with antibodies against protein B. Compare staining intensity in the normal tissue against the cancer tissue from the same patient. Staining intensity scored as signal intensity 0-255. We also want to determine if patients with high staining intensity have also developed metastasis
  33. 33. The mRNA for protein A is decreased and mRNA for protein B is increased in RNA extracted from squamous cell carcinoma tissue compared to RNA isolated from normal skin Paired normal skin and cancer tissues obtained, RNA isolated and reverse transcribed to cDNA. Quantitative PCR used to determine abundance of mRNA for protein B and mRNA for protein A, measured relative to a reference transcript Outcome Variables: Factor Variables:
  34. 34. The mRNA for protein A is decreased and mRNA for protein B is decreased in RNA extracted from squamous cell carcinoma tissue compared to RNA isolated from normal skin Paired normal skin and cancer tissues obtained, RNA isolated and reverse transcribed to cDNA. Quantitative PCR used to determine abundance of mRNA for protein B and mRNA for protein A, measured relative to a reference transcript Outcome Variables: Continuous x 2: mRNA for Protein A, mRNA for Protein B Factor Variables: Categorical: Cancer / no cancer
  35. 35. The mRNA for protein A is decreased and mRNA for protein B is decreased in RNA extracted from squamous cell carcinoma tissue compared to RNA isolated from normal skin Paired normal skin and cancer tissues obtained, RNA isolated and reverse transcribed to cDNA. Quantitative PCR used to determine abundance of mRNA for protein B and mRNA for protein A, measured relative to a reference transcript. Ratio of mRNA A to mRNA B determined for each tissue Outcome Variables: Factor Variables: You might be interested in the ratio of A to B rather than absolute levels of both
  36. 36. The mRNA for protein A is decreased and mRNA for protein B is decreased in RNA extracted from squamous cell carcinoma tissue compared to RNA isolated from normal skin Paired normal skin and cancer tissues obtained, RNA isolated and reverse transcribed to cDNA. Quantitative PCR used to determine abundance of mRNA for protein B and mRNA for protein A, measured relative to a reference transcript. Ratio of mRNA A to mRNA B determined for each tissue Outcome Variables: Continuous: mRNA A : mRNA B Factor Variables: Categorical: Cancer / no cancer Repeated measures
  37. 37. Squamous cell carcinoma cells induced to overexpress protein B display increased invasion compared with control treated cells. Squamous cell carcinoma cells will either be induced to express protein B, a control protein “C”, or not treated, then seeded onto either a skin substitute or onto pure collagen. After 48 hours the distance migrated into each substrate will be measured Outcome Variables: Factor Variables: Last one!
  38. 38. Squamous cell carcinoma cells induced to overexpress protein B display increased invasion compared with control treated cells. Squamous cell carcinoma cells will either be induced to express protein B, a control protein “C”, or not treated, then seeded onto either a skin substitute or onto pure collagen. After 48 hours the distance migrated into each substrate will be measured Outcome Variables: Continuous: distance migrated Factor Variables: Categorical: Untreated, +Protein B or +control protein C Categorical: skin substitute or pure collagen
  39. 39. Right, now I know what test I am going to do, how do I decide on sample sizes? It’s quite easy actually! You can use an online calculator I like: www.powerandsamplesize.com Pick the appropriate test and fill in the details it asks for You’ll need to know some terms about what to put where so we’ll quickly cover the big ones and what they mean
  40. 40. It’s asking for a type I error rate, α What’s that? This is your critical p value written as a percentage P = 0.05 = 5% This number represents the probability that you would have observed the effect even though no effect exists The P value will ultimately be a measure of how confident you can be that you haven’t got a false positive How likely is it that this difference could have occurred by chance?
  41. 41. Everyone uses p<0.05 right? So I want 5% here. 5% is a common cut off but it doesn’t mean it’s appropriate for your study! You might want 1% or even 0.1% if the impact of a false positive would be problematic such as in a drug safety trial How likely is it that this difference could have occurred by chance?
  42. 42. It’s asking for a Power, 1 − β What’s that? The power is the complement of the false negative rate, β It’s the probability that you would have observed a difference if there actually is a difference This number is written as a fraction of 1 but usually described as a percentage So, 0.8 would be described as 80% power The higher the better for this number, but again make it fit your question. How likely is it that you could have detected a difference if there actually was one?
  43. 43. OK, back to the calculator. It’s asking for means and standard deviations Yes, the calculator needs a prediction for what you are going to see You’ll need two predictions: The effect size. What the means of your different populations will be Mean A Mean B Effect size A B And an estimate of the variation within each population Sample A Distribution curve
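The inputs the calculator asks for (type I error rate, power, predicted means and standard deviation) can be tied together with the usual normal-approximation formula for comparing two independent group means, n per group = 2 × ((z for α + z for power) × SD / effect size)². The sketch below is an illustration under those assumptions, not the method used by any particular online calculator; it assumes a two-sided test, equal group sizes and a common standard deviation, and the function name and example numbers are invented.

```python
import math
from statistics import NormalDist

def samples_per_group(mean_a, mean_b, sd, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided comparison of two independent means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the type I error rate
    z_power = NormalDist().inv_cdf(power)          # value corresponding to the desired power
    effect = abs(mean_a - mean_b)                  # predicted difference between the means
    n = 2 * ((z_alpha + z_power) * sd / effect) ** 2
    return math.ceil(n)                            # round up to whole samples

# Hypothetical predictions: means of 20 and 25 with a standard deviation of 5
print(samples_per_group(20, 25, 5))
# Halving the predicted difference roughly quadruples the number needed
print(samples_per_group(20, 22.5, 5))
```

A dedicated calculator based on the t distribution will typically return slightly larger numbers than this approximation, especially for small groups.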
  44. 44. Let’s have a quick look at why you need those numbers, what they mean with respect to power and p values Mean A Mean B Effect size A B Sample A Distribution curve
  45. 45. Big Differences between populations Small variation within samples Small variation within samples Smaller Differences between populations Ideal situation: no overlap between groups. You can be quite confident that the result hasn’t occurred by chance Big Differences between population means Larger variation within samples If the difference between groups is small, or the variation is large you will be less confident about your interpretation. P values higher, less power
  46. 46. Big Differences between populations Small variation between samples Small variation between samples Smaller Differences between populations The sample size calculator will tell you how many samples you will need to decrease your p value and increase your power to the limits that you have chosen Big Differences between population means Larger variation between samples Increased sample size The effect of increasing sample size decreases with each addition. There comes a point where adding samples is a waste of time, effort and money, and is ethically wrong
  47. 47. Where do the effect size and variation numbers come from? Look at all this juicy pilot data You could have 1000 samples and see a really small difference but it might not matter in the grand scheme of things The effect size should be chosen not just on what you will be able to see but also based on real-world importance of the effect
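The diminishing return from adding samples can be seen by computing approximate power for a range of group sizes. This is a sketch using the same normal approximation for two independent groups (two-sided test, equal group sizes, common standard deviation); the function name and the example effect size and standard deviation are assumptions made for illustration.

```python
from statistics import NormalDist

def approximate_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power to detect a difference `effect` between two independent group means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two group means
    standard_error = sd * (2 / n_per_group) ** 0.5
    return NormalDist().cdf(effect / standard_error - z_alpha)

# Hypothetical case: predicted difference of 5 units, standard deviation of 5 units
for n in (4, 8, 16, 32, 64):
    print(n, round(approximate_power(5, 5, n), 3))
```

Once power is already high, doubling the group size again buys very little, which is the point at which extra samples stop being justifiable.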
  48. 48. Look at all this juicy pilot data Variation and predictions for effect size will ideally come from pilot data Sometimes this won’t be possible, in which case you should turn to published work You won’t be able to find your exact experiment but you should be able to find something similar enough to make a prediction
  49. 49. Hopefully you spotted an important point in the last little bit If you can reduce the variation between your experimental units, you will need fewer samples or be more confident with the same number of samples This is one of the goals of your pilot experiments. To get the experimental conditions as tight as possible so that you limit the variation between samples to true biological variability
  50. 50. What about technical repeats? How many of them do you need? Good question! Do you remember why you do technical repeats? Experiment #1 Final analysis
  51. 51. They increase the accuracy of measurement for each sample Yes, by decreasing the intra-individual variability you should get a more accurate measurement Experiment #1 Final analysis Outliers or mistakes will have a smaller effect the more technical repeats you do There will therefore be less overall variability in your final sample numbers
  52. 52. So should I do as many as possible? Final analysis No, not necessarily, that would be a waste of time and money. You need to consider where the variability in your experiments will be and how much value each technical repeat adds Probably going to want examples here!
  53. 53. Paired normal and cancer tissues probed with antibodies against protein B. Compare staining intensity in the normal tissue against the cancer tissue from the same patient. Staining intensity scored as “reduced”, “same” or “increased” Protein B expression is increased in more than 50% of squamous cell carcinoma cancers The question to ask yourself is how variable will the data be when I test the same sample, and how does that compare to the between sample variability? Cancers vary a lot between individuals Our data should be pretty clear for each sample, though there will be variations within each sample It’s really a cost vs reward analysis. How much do each tech repeats cost (time, money and ethics) vs how much value they add Between Samples Variability Within sample Variability High Low Tech repeats Sample processing: low number OK Not much value from technical repeats, better to just increase sample numbers
  54. 54. The mRNA for protein A is decreased and mRNA for protein B is decreased In RNA extracted from squamous cell carcinoma tissue compared to RNA isolated normal skin Paired normal skin and cancer tissues obtained, RNA isolated and reverse transcribed to cDNA. Quantitative PCR used to determine abundance of mRNA for protein B and mRNA for protein A, measured relative to a reference transcript Here the samples are the same as last time, so again high inter-sample variability The technique this time also has more variability within it RNA degradation during storage, RNA isolation steps and pipetting errors during reverse transcription and qPCR So, we know we will need a large number of samples Between Samples Variability Within sample Variability High High So we probably need more technical repeats as well
  55. 55. Hypothesis: the mRNA for protein A is decreased and the mRNA for protein B is decreased in RNA extracted from squamous cell carcinoma tissue compared with RNA isolated from normal skin. Paired normal skin and cancer tissues are obtained, and RNA is isolated and reverse transcribed to cDNA. Quantitative PCR is used to determine the abundance of mRNA for protein B and mRNA for protein A, measured relative to a reference transcript. Between-sample variability: High. Within-sample variability: High. But, logistically, there might be challenges. Sample availability might mean we don’t have the opportunity to repeat at the whole tissue level. You want the repeats to address the areas with the highest variability: the tissue extraction level. Tech repeats: sample processing + RT-PCR and qPCR.
  56. 56. Hypothesis: squamous cell carcinoma cells induced to overexpress protein B display increased invasion compared with control treated cells. Squamous cell carcinoma cells will either be induced to express protein B, induced to express a control protein “C”, or not treated, then seeded onto a skin substitute. After 48 hours the distance migrated into each substrate will be measured. Let’s assume here you are using cells from only one donor (e.g. an immortalised cell line). So variability will come from what condition the cells are in, and maybe from different levels of protein expression. Between-sample variability: Med.
  57. 57. Hypothesis: squamous cell carcinoma cells induced to overexpress protein B display increased invasion compared with control treated cells. Squamous cell carcinoma cells will either be induced to express protein B, induced to express a control protein “C”, or not treated, then seeded onto a skin substitute. After 48 hours the distance migrated into each substrate will be measured. The assay itself should be quite tight, and you will have the option of measuring lots of cells. You probably don’t need many technical repeats per assay; doing more biological repeats would be a better use of resources. Between-sample variability: Med. Within-sample variability: Low. Tech repeats: Low.
  58. 58. Note that you might disagree with some of the previous assertions. And really, it will be your data that tells you what you need to do. But be aware that adding lots of technical repeats may not be worth the time and money you invest in doing them. OK, got it.
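One way to make the cost vs reward trade-off concrete is the classic two-stage sampling result: if the total cost is n × (cost per sample + m × cost per technical repeat), the number of technical repeats per sample that gives the smallest variance for a fixed budget is roughly sqrt((cost per sample / cost per repeat) × (sd_within² / sd_between²)). The sketch below assumes that simple cost model and a continuous readout. The function name, costs and standard deviations are all hypothetical.

```python
import math

def optimal_tech_repeats(sd_between, sd_within, cost_per_sample, cost_per_repeat):
    """Approximate number of technical repeats per sample that minimises the
    variance of the group mean for a fixed budget, assuming total cost is
    n_samples * (cost_per_sample + n_repeats * cost_per_repeat)."""
    m = math.sqrt((cost_per_sample / cost_per_repeat)
                  * (sd_within ** 2 / sd_between ** 2))
    # You always need at least one measurement per sample.
    return max(1, round(m))

# Hypothetical costs: obtaining a sample costs 10x more than one extra repeat.
tight_assay = optimal_tech_repeats(sd_between=10.0, sd_within=2.0,
                                   cost_per_sample=100.0, cost_per_repeat=10.0)
noisy_assay = optimal_tech_repeats(sd_between=10.0, sd_within=10.0,
                                   cost_per_sample=100.0, cost_per_repeat=10.0)
print("tight assay:", tight_assay, "technical repeat(s) per sample")
print("noisy assay:", noisy_assay, "technical repeat(s) per sample")
```

With these invented numbers, a tight assay justifies only a single measurement per sample, while a noisy assay justifies about three. Real costs also include time and ethics, which this sketch does not try to capture.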
  59. 59. Ethics, Cost and Feasibility. Once you have done your sample size calculation, you have three important decisions to make.
  60. 60. Ethics, Cost and Feasibility. 1. Can I ethically justify using these numbers of patients/animals? • Does the end justify the means? • Will I be able to obtain ethical approval?
  61. 61. Ethics, Cost and Feasibility. 2. Can I afford to process that number of samples?
  62. 62. Ethics, Cost and Feasibility. 3. Is it feasible to process that number of samples? • Will I be able to recruit enough participants? • Will I be able to do the experiments within my study timeframe?
  63. 63. Ethics, Cost and Feasibility. What do you do if the answer to these questions is no? Well, first you should be pleased that you asked the question before wasting loads of time doing the experiment! Secondly, you can go back and see if you can tighten up your research question or modify your experimental design.
  64. 64. Ethics, Cost and Feasibility. You can reduce the number of samples required by reducing variability in your samples, possibly by controlling for confounders, or by reducing the number of comparisons being made. Remember, it’s better to answer one question well than to add lots of extra small questions.
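Both of those levers can be seen in a quick back-of-the-envelope calculation. The sketch below assumes a two-group comparison of a continuous outcome, uses the normal approximation n = 2 × ((z_alpha + z_beta) × SD / difference)² per group, and assumes a Bonferroni correction for multiple comparisons (the slides do not name a correction method, so that is an assumption). The function name and the numbers are illustrative only; a proper power analysis tool will give slightly larger numbers for small samples.

```python
import math
from statistics import NormalDist

def n_per_group(sd, difference, alpha=0.05, power=0.8, n_comparisons=1):
    """Approximate samples per group for a two-sided, two-group comparison,
    using the normal approximation and a Bonferroni-adjusted alpha."""
    adjusted_alpha = alpha / n_comparisons  # Bonferroni correction (assumed)
    z_alpha = NormalDist().inv_cdf(1 - adjusted_alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / difference) ** 2)

# Made-up pilot values: SD of 10 units, looking for a difference of 10 units.
n_base = n_per_group(sd=10.0, difference=10.0)
n_less_variable = n_per_group(sd=5.0, difference=10.0)
n_three_comparisons = n_per_group(sd=10.0, difference=10.0, n_comparisons=3)

print("baseline:", n_base)
print("half the variability:", n_less_variable)
print("three comparisons instead of one:", n_three_comparisons)
```

Halving the standard deviation cuts the required numbers to roughly a quarter, while going from one comparison to three pushes them up.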
  65. 65. Final Checks
  66. 66. Don’t jump in too soon! 3 final questions: 1. Consider the ethics again. Are you fully satisfied that you are going to do the least amount of harm to effectively answer your question?
  67. 67. 2. What would the devil’s advocate say about your experimental design? Are your controls sufficient to rule out alternative interpretations? Is your randomisation technique truly random? Are your experimental units truly independent?
  68. 68. 3. Have you had someone else’s input? Before you jump in, get someone independent to have a look. They might come up with additional confounders or spot ways in which you can generate more robust data.
  69. 69. It’s worth spending time on experimental planning. Making sure things are right will help make sure you generate useful data. Hopefully this series of videos has helped you identify what to think about.
  70. 70. Part 5 Recap. Don’t be put off by sample size determination. It will stop you wasting time and money on poorly planned experiments. Identify whether your variables are continuous or categorical, and the rest is easy. Use your pilot studies to reduce variability in your experiment and to provide the numbers you need to perform power analysis. Don’t be afraid to redesign your experiment if you have any qualms about research ethics or the likelihood of generating meaningful data.
  71. 71. Sam Liam Jess Danielle
