Explore, Analyze and Present your data
 

    Presentation Transcript

    • Explore, analyze and present your data. Guillaume Calmettes
    • “Bonjour”, I am Guillaume! Sacre Bleu! Bordeaux gcalmettes@mednet.ucla.edu Office: MRL 3645
    • Disclaimer I am not a statistician
    • Statistics are scary Statistics (You at the beginning of the talk)
    • Statistics are scary not so Statistics (You at the middle of the talk)
    • Statistics are scary cool Statistics (You at the end of the talk)
    • Statistics are scary cool We have to deal with them anyway, so we had better enjoy them! Statistics (You at the end of the talk)
    • Press the t-test button and you’ll be done! Did you check the normality of your data first?
    • Why should you care about statistics? http://www.nature.com/nature/authors/gta/2e_Statistical_checklist.pdf
    • Why should you care about statistics? Advances in Physiology Education “Explorations in Statistics” series (2008-present) (Douglas Curran-Everett)
    • Why should you care about statistics? “Statistical Perspectives” series (2011-present) (Gordon Drummond) in The Journal of Physiology, Experimental Physiology, The British Journal of Pharmacology, Microcirculation and The British Journal of Nutrition http://jp.physoc.org/cgi/collection/stats_reporting
    • Why should you care about statistics?
        • Importance of being uncertain (September 2013): how samples are used to estimate population statistics and what this means in terms of uncertainty.
        • Error Bars (October 2013): the use of error bars to represent uncertainty and advice on how to interpret them.
        • Significance, P values and t-tests (November 2013): introduction to the concept of statistical significance and the one-sample t-test.
      http://blogs.nature.com/methagora/2013/08/giving_statistics_the_attention_it_deserves.html
    • Why should you care about statistics? “Journals […] fail to exert sufficient scrutiny over the results that they publish” “Nature research journals will introduce editorial measures to address the problem by improving the consistency and quality of reporting in life-sciences articles” “We will examine statistics more closely and encourage authors to be transparent, for example by including their raw data”
    • Look at your data
    • A picture is worth a thousand words John Snow (1813-1858) Location of deaths in the 1854 London Cholera Epidemic
    • Why visualize your data? Anscombe’s quartet example
          Dataset #1    Dataset #2    Dataset #3    Dataset #4
          x     y       x     y       x     y       x     y
          10    8.04    10    9.14    10    7.46    8     6.58
          8     6.95    8     8.14    8     6.77    8     5.76
          13    7.58    13    8.74    13    12.74   8     7.71
          9     8.81    9     8.77    9     7.11    8     8.84
          11    8.33    11    9.26    11    7.81    8     8.47
          14    9.96    14    8.10    14    8.84    8     7.04
          6     7.24    6     6.13    6     6.08    8     5.25
          4     4.26    4     3.10    4     5.39    19    12.50
          12    10.84   12    9.13    12    8.15    8     5.56
          7     4.82    7     7.26    7     6.42    8     7.91
          5     5.68    5     4.74    5     5.73    8     6.89
      Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21
    • Why visualize your data? Anscombe’s quartet example
          Property (in each case)    Value
          Mean of x                  9 (exact)
          Variance of x              11 (exact)
          Mean of y                  7.5
          Variance of y              4.122 or 4.127
          Correlation of x and y     0.816
          Linear regression line     y = 3.00 + 0.500x
      Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21
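The claim that all four datasets share the same summary statistics can be checked directly from the raw values. The following is a minimal sketch using only the Python standard library; the names `QUARTET` and `summarize` are illustrative and do not come from the talk.

```python
import statistics

# Anscombe's quartet (Anscombe, 1973). Datasets #1-#3 share the same x values.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "#1": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "#2": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "#3": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "#4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Mean, sample variance, Pearson r and least-squares line for paired data."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx                      # least-squares slope
    return {
        "mean_x": mx,
        "var_x": statistics.variance(x),   # sample variance (divides by n - 1)
        "mean_y": my,
        "var_y": statistics.variance(y),
        "r": sxy / (sxx * syy) ** 0.5,     # Pearson correlation coefficient
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (x, y) in QUARTET.items():
    print(name, {k: round(v, 3) for k, v in summarize(x, y).items()})
```

The four printed rows are practically identical, yet the scatter plots of the same data look nothing alike.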
    • Why visualize your data? Anscombe’s quartet example [Figure: scatter plots of Datasets #1, #2, #3 and #4] Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21
    • Visualize your data in their raw form! Aim for revelation rather than mere summary. A great graphic with raw data will reveal unexpected patterns and invite us to make comparisons we might not have thought of beforehand.
    • If you are still not convinced … Mean: 16 / SD: 5
    • If you are still not convinced … Mean: 16 / SD: 5 [Figure from Daniel’s Journal Club paper: WBM secondary transplantation (16 weeks), donor engraftment (%), labels flDMR/+, DMR/+, mH19, P < 0.05]
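To show that "Mean: 16 / SD: 5" says almost nothing about the shape of the data, the sketch below rescales three hypothetical raw samples (symmetric, right-skewed and bimodal) so that each has exactly mean 16 and SD 5. The shapes, the seed and the helper `rescale` are all assumptions made for illustration.

```python
import random
import statistics


def rescale(values, target_mean=16.0, target_sd=5.0):
    """Linearly transform values so they have exactly the target mean and sample SD."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [target_mean + (v - m) * target_sd / s for v in values]


rng = random.Random(1)  # fixed seed so the example is reproducible
n = 30
# Three hypothetical raw shapes: symmetric, right-skewed and bimodal.
shapes = {
    "normal": [rng.gauss(0, 1) for _ in range(n)],
    "skewed": [rng.lognormvariate(0, 1) for _ in range(n)],
    "bimodal": [rng.gauss(-2, 0.5) if i % 2 else rng.gauss(2, 0.5) for i in range(n)],
}
datasets = {name: rescale(raw) for name, raw in shapes.items()}

for name, d in datasets.items():
    print(f"{name:>7}: mean={statistics.mean(d):.1f}  SD={statistics.stdev(d):.1f}  "
          f"median={statistics.median(d):.1f}  min={min(d):.1f}  max={max(d):.1f}")
```

All three lines print the same mean and SD. Only the median, minimum and maximum, or better a dot plot of the values, reveal that the distributions differ.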
    • Avoid making bar graphs “To maintain the highest level of trustworthiness of data, we are encouraging authors to display data in their raw form and not in a fashion that conceals their variance. Presenting data as columns with error bars (dynamite plunger plots) conceals data. We recommend that individual data be presented as dot plots shown next to the average for the group with appropriate error bars (Figure 1).” Rockman H.A. (2012). "Great expectations". J Clin Invest 122 (4): 1133
    • Avoid making bar graphs Error bars Different types, different meanings • descriptive statistics (Range, SD) • inferential statistics (SE, CI) Often, they also imply a symmetrical distribution of the data. Cumming, G. et al. (2007). "Error bars in experimental biology". J Cell Biol 177 (1): 7–11
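The difference between descriptive and inferential error bars is easy to see numerically. In this minimal sketch the SD stays roughly constant as the sample grows, while the SE and the CI shrink. The simulated data (mean 16, SD 5) and the large-sample 1.96 multiplier are assumptions; for small samples a Student t critical value should be used instead.

```python
import math
import random
import statistics


def error_bar_summaries(sample):
    """Return the descriptive spread (SD) and inferential spreads (SE, approx. 95% CI)."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)                  # spread of the data themselves
    sem = sd / math.sqrt(n)                        # precision of the estimated mean
    z = statistics.NormalDist().inv_cdf(0.975)     # ~1.96, large-sample approximation
    return sd, sem, (mean - z * sem, mean + z * sem)


rng = random.Random(0)
for n in (10, 100, 1000):
    sample = [rng.gauss(16, 5) for _ in range(n)]
    sd, sem, (lo, hi) = error_bar_summaries(sample)
    print(f"n={n:>4}: SD={sd:.2f}  SE={sem:.2f}  95% CI=[{lo:.2f}, {hi:.2f}]")
```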
    • Avoid making bar graphs Mean and standard deviation are only useful in the context of a “normal distribution”. About 95% of a normal distribution lies within two standard deviations (σ) of the mean (µ).
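The "95% within two standard deviations" rule can be checked with the standard normal distribution from Python's `statistics` module. This sketch shows that ±2σ covers slightly more than 95%, and that exactly 95% corresponds to about ±1.96σ.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
within_2sd = std_normal.cdf(2) - std_normal.cdf(-2)   # P(|X - µ| < 2σ)
z_95 = std_normal.inv_cdf(0.975)                      # half-width of the central 95%

print(f"P(|X - µ| < 2σ) = {within_2sd:.4f}")
print(f"exactly 95% lies within ±{z_95:.2f}σ")
```

The rule only holds if the data are approximately normal. For skewed data, mean ± 2 SD can fall far outside the observed values.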
    • Avoid making bar graphs [Figure: a symmetrical and a skewed distribution] Data presentation to reveal the distribution of the data • Display data in their raw form. • A dot plot is a good start. • “Dynamite plunger plots” conceal data. • Check the pattern of distribution of the values.
    • Avoid making bar graphs [Figure: a symmetrical and a skewed distribution] • First set: Gaussian (or normal) distribution (symmetrically distributed) • Second set: right-skewed, lognormal (a few large values) “This type of distribution of values is quite common in biology (e.g., plasma concentrations of immune or inflammatory mediators)” “Plunger plots only: who would know that the values were skewed ... and that the common statistical tests would be inappropriate?”
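A small made-up, right-skewed dataset shows what a bar graph of mean ± SD hides. All values are positive, yet mean − 2 SD is negative, and the median sits well below the mean. The numbers are hypothetical and chosen only for illustration.

```python
import statistics

# Made-up, right-skewed values (all positive), e.g. a plasma mediator concentration.
skewed = [1.2, 1.5, 1.8, 2.0, 2.1, 2.4, 2.9, 3.3, 4.0, 5.5, 9.0, 21.0]

mean = statistics.mean(skewed)
sd = statistics.stdev(skewed)
median = statistics.median(skewed)

print(f"mean ± SD : {mean:.2f} ± {sd:.2f}")
print(f"mean - 2SD: {mean - 2 * sd:.2f} (yet the smallest value is {min(skewed)})")
print(f"median    : {median:.2f}")
```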
    • Avoid making bar graphs Don't tell me no one warned you before! Bar graph Dynamite plunger
    • Summary Why visualize your data? For others ... Providing a narrative for the reader But primarily for you ... Looking for patterns and relationships Summarize complex data structures Help avoid erroneous conclusions based upon questionable or unexpected data
    • Choose the right descriptor for your data
    • Averages can be misleading
    • Is the mean always a good descriptor? # of children per household in China (2012) • mean: 1.35 http://www.globalhealthfacts.org/data/topic/map.aspx?ind=87
    • Is the mean always a good descriptor? # of children per household in China (2012) • mean: 1.35 • median: 1 more representative of the “typical” family (One child policy) http://www.globalhealthfacts.org/data/topic/map.aspx?ind=87
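A discrete, skewed count like this one separates the mean from the median. The household counts below are hypothetical: they were chosen only so that the mean equals 1.35, and they are not the survey data.

```python
import statistics

# Hypothetical counts (NOT the survey data): number of households having k children.
households = {0: 10, 1: 55, 2: 25, 3: 10}
children = [k for k, count in households.items() for _ in range(count)]

print("mean  :", statistics.mean(children))    # 1.35
print("median:", statistics.median(children))  # 1.0
```

No household has 1.35 children. The median describes the "typical" household.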
    • Any measure is wrong! “Whenever you make a measurement, you must know the uncertainty otherwise it is meaningless” Walter Lewin (MIT) [Figure: 183.3 cm vs 185.7 cm] http://www.youtube.com/watch?v=JUxHebuXviM
    • Any measure is wrong! “Whenever you make a measurement, you must know the uncertainty otherwise it is meaningless” Walter Lewin (MIT) The same concept applies when you report your data! Provide the uncertainty of your descriptor (hint: this is NOT the standard deviation).
    • Any measure is wrong! “Whenever you make a measurement, you must know the uncertainty otherwise it is meaningless” Walter Lewin (MIT) The same concept applies when you report your data! Provide the uncertainty of your descriptor (hint: this is NOT the standard deviation). Report the confidence interval of your descriptor.
    • The Bootstrap: origin Modern electronic computation has encouraged a host of new statistical methods that require fewer distributional assumptions than their predecessors and can be applied to more complicated statistical estimators. These methods allow [...] to explore and describe data and draw valid statistical inferences without the usual concerns for mathematical tractability. Efron B. and Tibshirani R. (1991), Science, Jul 26;253(5018):390-5
    • Computing the bootstrap 95% CI [Diagram: from the original sample A0 (values a1 … an, mean m0), pseudo-samples A1, A2, A3, A4 … of the same size are drawn by resampling with replacement (e.g. a4 a5 a3 a2 a1 an, or a2 a1 a2 a3 a1 a5), and the mean of each is computed (mA1, mA2, mA3, mA4 …)] Result: 5.18 [4.91, 4.47]. Calmettes G. et al. (2012), “Making do with what we have: use your bootstrap”, J Physiol, 590(15):3403-3406
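The resampling procedure in the diagram can be sketched in a few lines of Python: draw pseudo-samples with replacement, compute each one's mean, sort the means, and read off the 2.5th and 97.5th percentiles (the 250th and 9750th of 10000). This is a plain percentile bootstrap, not necessarily the exact variant used in the cited paper. The data and the function name `bootstrap_ci` are hypothetical.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_boot=10_000, level=0.95, seed=None):
    """Percentile bootstrap confidence interval for `stat` computed on one sample."""
    rng = random.Random(seed)
    n = len(sample)
    # Each pseudo-sample (A1, A2, ...) has the original size n and is drawn WITH replacement.
    boot = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    alpha = 1 - level
    lo = boot[max(0, round(n_boot * alpha / 2) - 1)]                  # 250th of 10000
    hi = boot[min(n_boot - 1, round(n_boot * (1 - alpha / 2)) - 1)]   # 9750th of 10000
    return stat(sample), (lo, hi)


# Hypothetical measurements, for illustration only.
data = [4.1, 5.0, 5.3, 4.8, 6.2, 5.9, 4.4, 5.5, 6.0, 4.6]
estimate, (low, high) = bootstrap_ci(data, seed=1)
print(f"{estimate:.2f} [{low:.2f}, {high:.2f}]")
```

Passing `stat=statistics.median` gives a CI for the median with no change to the procedure. This flexibility is one of the bootstrap's main advantages over formula-based intervals.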
    • Analyze your data
    • Choose your statistical test wisely Authors Guidelines Every paper that contains statistical testing should state [...] a justification for the use of that test (including, for example, a discussion of the normality of the data when the test is appropriate only for normal data), [...], whether the tests were one-tailed or two-tailed, and the actual P value for each test (not merely "significant" or "P < 0.05"). http://www.nature.com/nature/authors/gta/#a5.6
    • The simple case (how to): Female mean ± SD 135.9 ± 19.0; Male mean ± SD 187.0 ± 19.8; difference [CI] 51.2 [50.4, 51.9]. Distribution of the data?
    • The simple case (how to): Distribution of the data? • fit of the histogram • QQ plot: the ith ordered value A(i) is plotted against the theoretical quantile of the distribution, $\Phi^{-1}\left(\frac{i - 3/8}{n + 1/4}\right)$
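The QQ-plot coordinates can be computed with the plotting position given on the slide, (i − 3/8)/(n + 1/4), which is Blom's formula. In this sketch each sorted observation is paired with its theoretical standard-normal quantile. If the points fall close to a straight line, the data are consistent with normality. The function name is illustrative.

```python
from statistics import NormalDist


def normal_qq_points(data):
    """Pair each ordered observation A(i) with its theoretical normal quantile.

    Plotting position (i - 3/8) / (n + 1/4) (Blom's formula), i = 1..n.
    """
    ordered = sorted(data)
    n = len(ordered)
    phi_inv = NormalDist().inv_cdf          # inverse of the standard normal CDF
    theoretical = [phi_inv((i - 3 / 8) / (n + 1 / 4)) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))  # (x, y) pairs to plot


for q, value in normal_qq_points([135.2, 120.4, 150.9, 141.3, 128.8]):
    print(f"{q:+.3f}  {value}")
```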
    • The simple case (how to): • QQ plot: an example that is not “normal”
    • The simple case (how to): Distribution of the data? Visual inspection: • fit of the histogram • QQ plot (Female, Male)
    • The simple case (how to): Distribution of the data? Test: • Shapiro-Wilk test. Null hypothesis for the SW test: data are normally distributed. Female p-value: 0.9195; Male p-value: 0.3866
    • The simple case (how to): Female mean ± SD 135.9 ± 19.0; Male mean ± SD 187.0 ± 19.8; difference [CI] 51.2 [50.4, 51.9]. Distribution of the data? Normally distributed.
    • The simple case (how to): Statistical test? t-test. Null hypothesis for the t-test: the two groups have the same population mean. t-test p-value < 2.2e-16
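The t statistic itself is simple to compute. The sketch below uses Welch's version, which does not assume equal variances, together with the Welch-Satterthwaite degrees of freedom. The standard library has no Student t distribution, so the p-value helper uses a normal approximation. That approximation is only reasonable for large degrees of freedom, as with the large samples here. In practice, use a statistics package's t-test.

```python
import math
import statistics
from statistics import NormalDist


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)     # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value; only reasonable for large df."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


t, df = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df, approx_two_sided_p(t))
```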
    • Usually it is not so simple
    • The “not so simple” case: samples S1 and S2
    • The “not so simple” case: Shapiro-Wilk test: S1 p-value: 7.4e-05; S2 p-value: 6.7e-06 (normality is rejected for both samples)
    • What to do?
    • What to do? Non-parametric alternatives to the t-test: • Mann-Whitney U test (independent samples) • Wilcoxon signed-rank test (dependent, i.e. paired, samples)
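As a rough illustration of the Mann-Whitney U test, the sketch below ranks the pooled values and converts the rank sum of the first sample into U. It then derives an approximate p-value from the normal approximation. Tied values get average ranks, but the variance is not tie-corrected and no continuity correction is applied, so treat the p-value as approximate. A dedicated statistics library is preferable for real analyses.

```python
import math
from statistics import NormalDist


def mann_whitney_u(a, b):
    """Mann-Whitney U for sample `a` with a two-sided normal-approximation p-value.

    Tied values receive their average rank; the variance is NOT corrected for ties
    and no continuity correction is applied, so p is only approximate.
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1                        # extend the run of tied values
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1    # average of the 1-based ranks i+1 .. j+1
        i = j + 1
    n1, n2 = len(a), len(b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_a - mean_u) / sd_u
    return u_a, 2 * (1 - NormalDist().cdf(abs(z)))


print(mann_whitney_u([1.1, 2.4, 2.9, 3.3], [3.0, 4.2, 5.6, 6.1]))
```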
    • Choose a new statistical hero Bootstrapman t-test
    • Computing the bootstrap p-value: Are the two samples different? Observed difference = 0.44. If the two samples came from the same population, what would be the probability of observing such a difference by chance alone?
    • Computing the bootstrap p-value [Diagram]:
        1. Observed samples A0 (a1 … an) and B0 (b1 … bn); observed difference D0 = mA − mB = 0.44.
        2. Pool the values of A0 and B0.
        3. Draw pseudo-samples A1 and B1 (same sizes as A0 and B0) with replacement from the pooled values, and compute D1 = mA1 − mB1 (e.g. D1 = −0.83; a second draw gives D2 = 0.84).
        4. Repeat 10000 times (D1 … D10000).
        5. How many pseudo-differences are greater than or equal to the observed difference D0 (0.44)? 171 are ≥ D0 and 9829 are < D0.
        6. p = 171 / 10000 = 0.0171 (one-tailed). For comparison, Mann-Whitney (MW): p = 0.0169.
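The steps above translate almost line for line into code. This sketch is one possible implementation under the null hypothesis of a single population. It pools both samples, draws pseudo-samples of the original sizes with replacement, and counts pseudo-differences that are at least as large as the observed one. The function name and example data are hypothetical.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def bootstrap_p_value(a, b, n_boot=10_000, seed=None):
    """One-tailed bootstrap p-value for the observed difference mean(a) - mean(b).

    Under the null hypothesis both samples come from the same population, so the
    pseudo-samples A_i and B_i are drawn WITH replacement from the pooled values,
    keeping the original sample sizes.
    """
    rng = random.Random(seed)
    d0 = _mean(a) - _mean(b)               # observed difference D0
    pooled = list(a) + list(b)
    n_extreme = 0
    for _ in range(n_boot):
        d_i = _mean(rng.choices(pooled, k=len(a))) - _mean(rng.choices(pooled, k=len(b)))
        if d_i >= d0:                      # pseudo-difference at least as large as D0
            n_extreme += 1
    return d0, n_extreme / n_boot


# Hypothetical samples, for illustration only.
a = [5.1, 5.8, 6.3, 5.9, 6.7, 6.1]
b = [3.2, 3.9, 4.1, 3.5, 4.4, 3.7]
print(bootstrap_p_value(a, b, seed=3))
```

For a two-tailed test, one common choice is to count pseudo-differences with `abs(d_i) >= abs(d0)` instead.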
    • Summary: What do my data look like? Distribution? • visual inspection (histogram / QQ plot) • normality test. What do I want to compare? Right statistical test? • parametric test • non-parametric test • resampling statistics
    • The dark side of the p-value
    • Statistical significance “The effect of the drug was statistically significant.”
    • Statistical significance “The effect of the drug was statistically significant.” so what?
    • Statistical significance (example) “The percentage of neurons showing cue-related activity increased with training in the mutant mice (P<0.05) but not in the control mice (P>0.05).”
    • Statistical significance (example) “The percentage of neurons showing cue-related activity increased with training in the mutant mice (P<0.05) but not in the control mice (P>0.05).” Training has a larger effect in the mutant mice than in the control mice!
    • Statistical significance (example) “The percentage of neurons showing cue-related activity increased with training in the mutant mice (P<0.05) but not in the control mice (P>0.05).” [Figure: activity without (−) and with (+) training, control vs mutant] Extreme scenario: training-induced activity barely reaches significance in mutant mice (e.g., P = 0.049) and barely fails to reach significance in control mice (e.g., P = 0.051). This does not test whether the training effect in mutant mice differs statistically from that in control mice.
    • Statistical significance (example) “The percentage of neurons showing cue-related activity increased with training in the mutant mice (P<0.05) but not in the control mice (P>0.05).” When comparing two effects, always report the statistical significance of their difference rather than the difference between their significance levels. Nieuwenhuis S. et al. (2011), “Erroneous analyses of interactions in neuroscience: a problem of significance”, Nat Neurosci, 14(9):1105-1107
    • P-values do not convey information: Mean: 16, SD: 5 vs Mean: 20, SD: 5. Difference = 4, yet p-value = 0.1090, 0.0367 or 0.0009 (depending on the sample size).
    • P-values do not convey information. Fact: most applied scientists use p-values as a measure of evidence and of the size of the effect. • The probability of hypotheses depends on much more than just the p-value. • This topic has renewed importance with the advent of the massive multiple testing often seen in genomics studies. [Figure: “Manhattan plot”, −log10(P)] Ioannidis JP (2005) PLoS Med 2(8):e124
    • Report effect size and CIs instead
    • P-value is a function of the sample size: measured effect size: difference = 0.018 mV, control (n=6777) vs atropine (n=5272); p = 10⁻⁵. [Figure: example traces (scale bars 0.5 mV, 100 ms) and amplitude (mV) for control and atropine] Hentschke, H. et al. (2011). "Computation of measures of effect size for neuroscience data sets". Eur J Neurosci. 34(12):1887–94
    • P-value is a function of the sample size [Figure: P (t-test), with “significant” and “not significant” regions, and Hedges' g (effect of 0.018 mV), both plotted against sample size from 10^1 to 10^3] Hentschke, H. et al. (2011). "Computation of measures of effect size for neuroscience data sets". Eur J Neurosci. 34(12):1887–94
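The point can be reproduced with the "Mean 16 vs 20, SD 5" example: keep the difference (4) and SD (5) fixed and change only the group size. Hedges' g barely moves, while the p-value keeps shrinking. For simplicity this sketch uses a z approximation, so its p-values are somewhat smaller than a t-test's would be for small n. The group sizes are arbitrary.

```python
import math
from statistics import NormalDist


def hedges_g(mean_diff, pooled_sd, n1, n2):
    """Hedges' g: Cohen's d with the usual small-sample bias correction."""
    d = mean_diff / pooled_sd
    return d * (1 - 3 / (4 * (n1 + n2) - 9))


def z_test_p(mean_diff, pooled_sd, n1, n2):
    """Two-sided p-value from a z approximation (a t-test gives larger p for small n)."""
    z = mean_diff / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same difference (4) and SD (5) in every row; only the group size changes.
for n in (5, 10, 20, 50, 100):
    print(f"n={n:>3} per group: g={hedges_g(4, 5, n, n):.3f}  p≈{z_test_p(4, 5, n, n):.2g}")
```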
    • Bootstrap effect size and 95% CIs [Diagram]: resample A (a1 … an) with replacement 10000 times and compute the means mA1 … mA10000; do the same for B (b1 … bn) to get mB1 … mB10000; compute E1 = mA1 − mB1, E2 = mA2 − mB2, …, E10000 = mA10000 − mB10000. Observed effect size: 0.44. Once the Ei are sorted, the 250th and 9750th values bound the 95% CI.
    • Bootstrap effect size and 95% CIs: Do the 95% confidence intervals of the observed effect size include zero (no difference)? Effect size (A vs B) = 0.44 [0.042, 0.853] (250th and 9750th values)
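A minimal sketch of the effect-size bootstrap follows. Unlike the bootstrap p-value, A and B are each resampled from themselves rather than from a pooled sample. The sorted pseudo-differences then give the 250th and 9750th values as the 95% percentile interval. The data and function name are hypothetical.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def bootstrap_diff_ci(a, b, n_boot=10_000, level=0.95, seed=None):
    """Percentile bootstrap CI for the effect size mean(a) - mean(b).

    A and B are resampled separately, each from its own values, with replacement.
    """
    rng = random.Random(seed)
    effects = sorted(
        _mean(rng.choices(a, k=len(a))) - _mean(rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    )
    alpha = 1 - level
    lo = effects[max(0, round(n_boot * alpha / 2) - 1)]                  # 250th of 10000
    hi = effects[min(n_boot - 1, round(n_boot * (1 - alpha / 2)) - 1)]   # 9750th of 10000
    return _mean(a) - _mean(b), (lo, hi)


# Hypothetical samples, for illustration only.
a = [5.1, 5.8, 6.3, 5.9, 6.7, 6.1]
b = [3.2, 3.9, 4.1, 3.5, 4.4, 3.7]
effect, (low, high) = bootstrap_diff_ci(a, b, seed=2)
print(f"effect = {effect:.2f} [{low:.2f}, {high:.2f}]; CI includes 0: {low <= 0 <= high}")
```

Reporting the effect together with its interval conveys both the size of the difference and its uncertainty, which a p-value alone does not.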
    • Statistical vs Biological significance
    • Statistical vs Biological significance “The P value reported by tests is a probabilistic significance, not a biological one.” “Statistical significance suggests but does not imply biological significance.” Krzywinski M and Altman N (2013) "Points of significance: Significance, P values and t-tests”. Nature Methods 10, 1041–1042
    • Statistical vs Biological significance Statistical significance has a meaning in a specific context No change Small change Large change Biological consequences?
    • Statistical vs Biological significance: “Good enough” solutions [Figure: stomatogastric ganglion neurons (AB, PD, LP, PY; LP 1, LP 2); conductances at +15 mV (µS/nF) for Kd, KCa and A-type currents; mRNA copy number for shab, BK-KCa and shal] Schulz D.J. et al. (2006) "Variable channel expression in identified single and electrically coupled neurons in different animals". Nat Neurosci. 9: 356–362
    • Statistical vs Biological significance Madhvani R.V. et al. (2011) "Shaping a new Ca2+ conductance to suppress early afterdepolarizations in cardiac myocytes". J Physiol 589(Pt 24):6081-92
    • Statistical vs Biological significance: Breast cancer study: difference in cancer returning between the control and low-fat diet groups. Authors’ conclusion: people on low-fat diets had a 25% lower chance of cancer returning.
    • Statistical vs Biological significance: Actual return rates: control 12.4%; low-fat diet 9.8%. The absolute difference is only 2.6 percentage points; the “25%” is a relative figure (2.6 / 9.8 = 26.5%; relative to the control rate, 2.6 / 12.4 ≈ 21%).
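The gap between "25% less chance" and a 2.6-point difference comes down to absolute versus relative differences. This short sketch computes both from the two return rates on the slide, using each group's rate as the denominator in turn.

```python
control_rate, diet_rate = 0.124, 0.098   # cancer-return rates from the slide

absolute_diff = control_rate - diet_rate            # 2.6 percentage points
relative_to_control = absolute_diff / control_rate  # about 21%
relative_to_diet = absolute_diff / diet_rate        # about 26.5% (2.6 / 9.8)

print(f"absolute difference : {absolute_diff * 100:.1f} percentage points")
print(f"relative (÷ control): {relative_to_control:.1%}")
print(f"relative (÷ diet)   : {relative_to_diet:.1%}")
```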
    • Beware of false positives (from the authors) Bennett C. et al. (2010) “Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Proper Multiple Comparisons Correction”. JSUR, 2010. 1(1):1-5
    • Beware of false positives Bennett C. et al. (2010) “Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Proper Multiple Comparisons Correction”. JSUR, 2010. 1(1):1-5
    • Beware of false positives 2012 Bennett C. et al. (2010) “Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Proper Multiple Comparisons Correction”. JSUR, 2010. 1(1):1-5
    • Beware of false positives http://xkcd.com/882/
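The salmon study and the jelly-bean comic make the same arithmetic point. If many tests are run at α = 0.05 and every null hypothesis is true, at least one "significant" result becomes likely. The sketch below assumes 20 independent tests. It computes the family-wise error rate analytically, checks it by simulation (null p-values are uniform on [0, 1]), and shows the Bonferroni per-test threshold as one simple correction.

```python
import random

alpha, m = 0.05, 20   # assumption: 20 independent tests, all with a true null hypothesis

fwer = 1 - (1 - alpha) ** m          # P(at least one false positive)
bonferroni_alpha = alpha / m         # per-test threshold keeping the FWER <= alpha

# Simulation: under the null hypothesis each p-value is uniform on [0, 1].
rng = random.Random(882)
n_experiments = 10_000
hits = sum(any(rng.random() < alpha for _ in range(m)) for _ in range(n_experiments))

print(f"analytic FWER  : {fwer:.3f}")
print(f"simulated FWER : {hits / n_experiments:.3f}")
print(f"Bonferroni per-test threshold: {bonferroni_alpha:.4f}")
```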
    • Present your data
    • Know your audience
    • Know your audience
        • Who? Who is my audience? What is their level of understanding? What do they already know?
        • Why? Why am I presenting? What does my audience want to achieve?
        • What? What do I want my audience to know? Which story will captivate the audience?
        • How? What medium will best support the message? What format/layout will appeal to the audience?
    • Color blindness is a common disease Males: one in 12 (8%) / Females: one in 200 (0.5%)
    • Color blindness is a common disease “Anyone who needs to be convinced that making scientific images more accessible is a worthwhile task [...]: if your next grant or manuscript submission contains color figures, what if some of your reviewers are color blind? Will they be able to appreciate your figures? Considering the competition for funding and for publication, can you afford the possibility of frustrating your audience? The solution is at hand." Clarke, M. (2007). "Making figures comprehensible for color-blind readers" Nature blog (http://blogs.nature.com/nautilus/2007/02/post_4.html)
    • Making figures for color blind people Wong, B. (2011). "Points of view: Color blindness". Nature Methods 8, 441
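Wong's column recommends a colour palette that remains distinguishable for the common forms of colour blindness (the Okabe-Ito palette). Here it is written as RGB tuples with a small hex converter, so it can be pasted into most plotting tools. The dictionary and function names are illustrative.

```python
# Colour-blind-safe palette (Okabe & Ito) recommended in Wong (2011), as RGB tuples.
OKABE_ITO = {
    "black": (0, 0, 0),
    "orange": (230, 159, 0),
    "sky blue": (86, 180, 233),
    "bluish green": (0, 158, 115),
    "yellow": (240, 228, 66),
    "blue": (0, 114, 178),
    "vermillion": (213, 94, 0),
    "reddish purple": (204, 121, 167),
}


def to_hex(rgb):
    """Convert an (R, G, B) tuple of 0-255 integers to a '#RRGGBB' string."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


PALETTE_HEX = [to_hex(c) for c in OKABE_ITO.values()]
print(PALETTE_HEX)
```

Even with a safe palette, it helps to also vary marker shape or line style. That way the figure still reads in greyscale.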
    • Making figures for color blind people http://colororacle.org/
    • Telling stories with data “The Martini Glass Structure” http://vis.stanford.edu/files/2010-Narrative-InfoVis.pdf
    • Telling stories with data “The Martini Glass Structure”: a guided narrative start, then open exploration http://vis.stanford.edu/files/2010-Narrative-InfoVis.pdf
    • Aesthetic minimalism Suda B. (2010). "A practical guide to Designing with Data"
    • Common mistakes in data reporting Welcome to the FOX “Dishonest Charts” gallery
    • Common mistakes in data reporting
    • Common mistakes in data reporting E. Tufte’s “Lie Factor” Make things appear to be “better” than they are by fiddling with the scales of things
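Tufte defines the Lie Factor as the size of the effect shown in the graphic divided by the size of the effect in the data. In both cases the "size of effect" is |second − first| / first. The sketch below applies this to a hypothetical two-bar chart whose value axis starts at `baseline` instead of zero. A factor of 1 is honest, and truncating the axis inflates it.

```python
def lie_factor(first, second, baseline=0.0):
    """Tufte's Lie Factor for two bars drawn from a (possibly truncated) baseline.

    Lie Factor = (size of effect shown in graphic) / (size of effect in data),
    where "size of effect" = |second - first| / first.
    """
    effect_in_data = abs(second - first) / first
    shown_first, shown_second = first - baseline, second - baseline   # drawn bar lengths
    effect_in_graphic = abs(shown_second - shown_first) / shown_first
    return effect_in_graphic / effect_in_data


print(lie_factor(100, 110))               # axis starts at 0
print(lie_factor(100, 110, baseline=95))  # axis starts at 95: a 10% change looks like 200%
```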
    • Common mistakes in data reporting
    • Common mistakes in data reporting Fig 1I “We found that relative to WT mice, the luminal microbiota of Il10−/− mice exhibited a ~100-fold increase in E. coli (Fig. 1I)” Arthur et al, (2012) Science 5;338(6103):120-3
    • Common mistakes in data reporting A B C D E
    • Common mistakes in data reporting A B C D E 20% 20% 20% 20% 20%
    • Common mistakes in data reporting
    • Common mistakes in data reporting [Figure: two charts of percent return on investment (0 to 40%) for Group A and Group B over year1 to year4]
    • Thank you! “The important thing is not to stop questioning. Curiosity has its own reason for existing.” - Albert Einstein