# Statistics for Anaesthesiologists

Statistics for Anaesthesiologists covers basic to intermediate statistics for researchers, with emphasis on the study designs and tests commonly used in anaesthesiology research.

Published in: Education, Technology, Business

• Nominal: Categorical data and numbers that are simply used as identifiers or names represent a nominal scale of measurement. Numbers on the back of a football jersey and your social security (Aadhaar) number are examples.
Ordinal: An ordinal scale of measurement represents an ordered series of relationships or rank order. Individuals competing in a contest may be fortunate to achieve first, second, or third place.
Likert-type scales (such as "On a scale of 1 to 10, with one being no pain and ten being high pain, how much pain are you in today?") also represent ordinal data. Fundamentally, these scales do not represent a measurable quantity. An individual may respond 8 to this question and be in less pain than someone else who responded 5. A person who responded 4 is not necessarily in exactly half as much pain as one who responded 8.
Interval: A scale that represents quantity and has equal units but for which zero represents simply an additional point of measurement is an interval scale. The Fahrenheit scale is a clear example of the interval scale of measurement. Thus, 60 degrees Fahrenheit or -10 degrees Fahrenheit represent interval data. Zero does not represent the absolute lowest value. Rather, it is a point on the scale with numbers both above and below it (for example, -10 degrees Fahrenheit).
Ratio: The ratio scale of measurement is similar to the interval scale in that it also represents quantity and has equality of units. However, this scale also has an absolute zero (no numbers exist below zero). Very often, physical measures will represent ratio data (for example, height and weight). If one is measuring the length of a piece of wood in centimeters, there is quantity, equal units, and that measure cannot go below zero centimeters.
• Parametric means that it meets certain requirements with respect to parameters of the population (for example, the data will be normal--the distribution parallels the normal or bell curve). In addition, it means that numbers can be added, subtracted, multiplied, and divided. Parametric data are analyzed using statistical techniques identified as Parametric Statistics. As a rule, there are more statistical technique options for the analysis of parametric data and parametric statistics are considered more powerful than nonparametric statistics.
• Nonparametric data are lacking those same parameters and cannot be added, subtracted, multiplied, and divided. For example, it does not make sense to add Social Security numbers to get a third person. Nonparametric data are analyzed by using Nonparametric Statistics.
• The normality tests all report a P value. To understand any P value, you need to know the null hypothesis. In this case, the null hypothesis is that all the values were sampled from a Gaussian distribution. The P value answers the question:
If that null hypothesis were true, what is the chance that a random sample of data would deviate from the Gaussian ideal as much as these data do?
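As a rough companion to the formal tests, sample skewness and excess kurtosis (the two quantities that the D'Agostino-Pearson test combines) can be computed directly. The sketch below is illustrative only: the function name `skew_and_excess_kurtosis` and the sample values are assumptions, it uses the simple moment estimators without small-sample correction, and it does not produce a P value.

```python
from statistics import fmean


def skew_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) using simple moment estimators."""
    n = len(values)
    mean = fmean(values)
    # Central moments of order 2, 3 and 4.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a Gaussian distribution
    return skewness, excess_kurtosis


# Hypothetical, perfectly symmetric sample.
skew, kurt = skew_and_excess_kurtosis([1, 2, 3, 4, 5])
print(f"skewness={skew:.3f}, excess kurtosis={kurt:.3f}")
```

Values far from zero in either quantity suggest a departure from the Gaussian shape; a formal test in R or SPSS is still needed to obtain a P value.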
• If n > 30, the Z-test is better
Because the one-tailed test provides more power to detect an effect, you may be tempted to use a one-tailed test whenever you have a hypothesis about the direction of an effect. Before doing so, consider the consequences of missing an effect in the other direction.  Imagine you have developed a new drug that you believe is an improvement over an existing drug.  You wish to maximize your ability to detect the improvement, so you opt for a one-tailed test. In doing so, you fail to test for the possibility that the new drug is less effective than the existing drug.  The consequences in this example are extreme, but they illustrate a danger of inappropriate use of a one-tailed test.
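• Degrees of freedom:
For the z-test, degrees of freedom are not required, since critical z-scores of 1.96 and 2.58 are used for the two-tailed 5% and 1% significance levels respectively. For the equal-variance (Student's) t-test, df = (n1 + n2) - 2; for the unequal-variance (Welch's) t-test, df is estimated with the Welch-Satterthwaite approximation and is at most (n1 + n2) - 2. For the paired-sample t-test, df = number of pairs - 1.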
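The link between the critical z-scores and the 5% and 1% levels can be checked numerically. This is a minimal sketch using only the Python standard library; the function name `z_test_p_values` is an assumption made for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_test_p_values(z):
    """Return (left-tailed, right-tailed, two-tailed) p-values for a z statistic."""
    left = STANDARD_NORMAL.cdf(z)        # P(Z <= z)
    right = 1.0 - left                   # P(Z >= z)
    two_tailed = 2.0 * min(left, right)  # twice the smaller tail
    return left, right, two_tailed


for z in (1.96, 2.58):
    _, right, two = z_test_p_values(z)
    print(f"z={z}: one-tailed p={right:.4f}, two-tailed p={two:.4f}")
```

The two-tailed value is twice the smaller tail, which is why a one-tailed test reaches significance with a less extreme statistic in the chosen direction.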
• In the approach of Ronald Fisher, the null hypothesis H0 will be rejected when the p-value of the test statistic is sufficiently extreme (vis-a-vis the test statistic's sampling distribution) and thus judged unlikely to be the result of chance. In a one-tailed test, "extreme" is decided beforehand as either meaning "sufficiently small" or meaning "sufficiently large"; values in the other direction are considered not significant. In a two-tailed test, "extreme" means "either sufficiently small or sufficiently large", and values in either direction are considered significant.
P value of Karl Pearson’s Chi squared test is different
• When we test quantitative data, we need to see whether the data is normally distributed or not. For correlation, use Spearman's correlation for ordinal data or data that isn't normally distributed, and Pearson's correlation for normally distributed data.
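To make the difference between the two coefficients concrete, here is a small standard-library sketch. The helper names (`pearson_r`, `average_ranks`, `spearman_rho`) and the data are assumptions for illustration; Spearman's coefficient is computed as Pearson's coefficient applied to ranks, with tied values sharing their mean rank.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for i in order[start:end + 1]:
            ranks[i] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y rises with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson r = {pearson_r(x, y):.3f}, Spearman rho = {spearman_rho(x, y):.3f}")
```

Because the relationship is monotonic but curved, the rank-based coefficient is higher than the linear one in this example.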
• When we have a normal distribution, look at the number of groups and whether the data is paired. Paired data is where each patient in one group is matched against a similar patient in the other group. If we have two groups and the data is unpaired, use Student's t-test. If there are two groups and the data is paired, use the paired Student's t-test. If there are > 2 groups, use ANOVA for paired or unpaired data as appropriate.
• In a non-normal distribution, if there are two groups and the data is unpaired, use the Mann-Whitney U test, and if the data is paired, use the Wilcoxon signed-rank test. If there are > 2 groups and the data is unpaired, use the Kruskal-Wallis test, and if the data is paired, use Friedman's test.
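The decision rules in these notes can be written down as a small lookup function. This is only a sketch of the rules as stated here; the function name `choose_test` and its parameters are hypothetical, and for more than two paired, normally distributed groups it returns the repeated measures ANOVA described in the slides.

```python
def choose_test(normal, groups, paired):
    """Suggest a test for quantitative data following the notes' decision rules."""
    if groups < 2:
        raise ValueError("at least two groups are needed for a comparison")
    if normal:
        if groups == 2:
            return "paired Student's t-test" if paired else "Student's t-test"
        return "repeated measures ANOVA" if paired else "one-way ANOVA"
    if groups == 2:
        return "Wilcoxon signed-rank test" if paired else "Mann-Whitney U test"
    return "Friedman's test" if paired else "Kruskal-Wallis test"


print(choose_test(normal=True, groups=2, paired=False))
print(choose_test(normal=False, groups=3, paired=True))
```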
• Chi-square distribution for different degrees of freedom: df = (rows - 1) x (columns - 1)
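A short sketch of how the chi-square statistic and its degrees of freedom are obtained from a contingency table, using only the standard library. The function names and the counts are assumptions for illustration, no Yates correction is applied, and the p-value helper is valid only for df = 1 (a 2 x 2 table); larger tables need a statistics package.

```python
from math import erfc, sqrt


def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under H0
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def p_value_df1(chi2):
    """Right-tail p-value of a chi-square statistic, valid only for df = 1."""
    return erfc(sqrt(chi2 / 2.0))


# Hypothetical 2 x 2 table of counts.
chi2, df = chi_square_statistic([[10, 20], [20, 10]])
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value_df1(chi2):.4f}")
```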
• Fisher’s exact test
• Fisher's exact test: ( ) denotes the binomial coefficient and ! denotes the factorial
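For a 2 x 2 table with cells a, b, c, d, the probability of one table with fixed margins is a ratio of binomial coefficients, and a two-sided p-value sums the probabilities of every table that is no more likely than the observed one. The sketch below assumes that common two-sided definition; the function names and the counts are illustrative, not taken from the slides.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2 x 2 table with fixed margins."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


def fisher_exact_two_sided(a, b, c, d):
    """Sum the probabilities of all tables no more likely than the observed one."""
    p_observed = table_probability(a, b, c, d)
    row1, row2, col1 = a + b, c + d, a + c
    p_value = 0.0
    # x is the top-left cell; its range is fixed by the row and column totals.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
        if p <= p_observed * (1 + 1e-9):  # tolerance for floating-point ties
            p_value += p
    return p_value


# Hypothetical counts: rows are dieting / not dieting, columns are men / women.
print(f"two-sided p = {fisher_exact_two_sided(1, 9, 11, 3):.6f}")
```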
• If there is more than one factor (more than one-way) when testing means between groups, it is called factorial ANOVA
The null hypothesis in ANOVA is that the means of all the groups are equal
• Omnibus (one for all)
• Dependent variable depends on the Independent variable (it’s the effect) while Independent variable (the cause being tested) doesn’t depend on any other variable in the experiment but is directly controlled by the researcher.
• If the Between Group Variation is significantly greater than the Within Group Variation, then it is likely that there is a statistically significant difference between the groups.
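The comparison of between-group and within-group variation can be sketched in a few lines. The function name `one_way_anova_f` and the scores are assumptions for illustration; the sketch returns the F statistic with its two degrees of freedom, and turning F into a p-value requires the F distribution from a statistics package.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    all_values = [x for g in groups for x in g]
    overall_mean = fmean(all_values)
    group_means = [fmean(g) for g in groups]
    # Between-group variation: group means around the overall mean.
    ss_between = sum(len(g) * (m - overall_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group (error) variation: scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three independent groups.
f_stat, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```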
• From http://web.utah.edu/stat/introstats/anovaflash.html
• S = SUBJECT DV = DEPENDENT VARIABLE
• If multiple covariates are present we compute multiple correlation coefficients (Multiple Regression)
• If there are two factors and repeated measures are taken, a two-way repeated measures ANOVA is done
• The important point is that the same people are being measured more than once on the same dependent variable (i.e., why it is called repeated measures).
Unfortunately, repeated measures ANOVAs are particularly susceptible to violating the assumption of sphericity, which causes the test to become too liberal (i.e., leads to an increase in the Type I error rate; that is, the likelihood of detecting a statistically significant result when there isn't one). Fortunately, SPSS makes it easy to test whether your data has met or failed this assumption with Mauchly's Test of Sphericity.
• Once an overall significant difference in means is detected we have to do a pairwise comparison with Post hoc Bonferroni test to discover specific means that differ
Dependent variable measured in same subjects at different times (independent variable)
• Dependent variable measured in same subjects at different times (independent variable) compared for two factors
There is a within-subjects factor (Time) and a between-subjects factor (group)
• With multiple t-tests there will be an increased Type I error rate
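How quickly the Type I error rate grows can be shown with a little arithmetic. This sketch assumes the pairwise comparisons are independent (a simplification, since pairwise t-tests on the same groups are not), and the function names are hypothetical.

```python
from math import comb


def pairwise_comparisons(groups):
    """Number of distinct pairs among the given number of groups."""
    return comb(groups, 2)


def familywise_error_rate(alpha, comparisons):
    """Chance of at least one false positive, assuming independent comparisons."""
    return 1.0 - (1.0 - alpha) ** comparisons


def bonferroni_alpha(alpha, comparisons):
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / comparisons


pairs = pairwise_comparisons(7)
print(f"pairs = {pairs}")
print(f"family-wise error rate at alpha 0.05 = {familywise_error_rate(0.05, pairs):.3f}")
print(f"Bonferroni-adjusted alpha = {bonferroni_alpha(0.05, pairs):.5f}")
```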
• In ANOVA, effect size is reported as partial eta squared
Cohen’s d is mean difference divided by pooled standard deviation
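Both effect sizes mentioned in these notes are simple ratios. The sketch below uses the usual definitions (pooled standard deviation weighted by degrees of freedom; partial eta squared as effect sum of squares over effect plus error sum of squares); the function names and the numbers are assumptions for illustration.

```python
from math import sqrt
from statistics import fmean, variance


def cohens_d(group1, group2):
    """Mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (fmean(group1) - fmean(group2)) / sqrt(pooled_var)


def partial_eta_squared(ss_effect, ss_error):
    """Share of variation attributed to the effect: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical pilot data and sums of squares.
print(f"Cohen's d = {cohens_d([2, 4, 6], [1, 3, 5]):.2f}")
print(f"partial eta squared = {partial_eta_squared(26.0, 6.0):.4f}")
```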
### Statistics for Anaesthesiologists

1. Statistics for Anaesthesiologists
   Dr John George K. MD, PDCC, Associate Professor of Anaesthesiology, KMC, Manipal
2. Recommended Software
   • RStudio (GUI) with R, R Commander, and R Commander plugins like EZR (free, cross-platform, powerful programming paradigm)
   • G*Power (free, for power analysis)
   • SPSS (commercial, expensive)
   • SOFA (free, basic)
   • Graphpad.com
   • Spreadsheet software like MS Excel for initial data entry (export as CSV file format)
3. Data Types
   • Nominal or categorical data
   • Ordinal data
   • Interval data
   • Ratio data
4. Data Types
   • Nominal: categorical data and numbers that are simply used as identifiers or names. Ex: social security (Aadhaar) number
   • Ordinal: an ordered series of relationships or rank order. Ex: first, second, or third place in a contest, Likert scale
   • Interval: a scale that represents quantity and has equal units but for which zero represents simply an additional point of measurement. Ex: Fahrenheit scale
   • Ratio: similar to the interval scale, but this scale also has an absolute zero (no numbers exist below zero). Ex: height, weight
5. Parametric tests
6. Non-parametric tests
7. Reporting data types

   | OK to compute | Nominal | Ordinal | Interval | Ratio |
   |---|---|---|---|---|
   | Frequency distribution | Yes | Yes | Yes | Yes |
   | Median, percentiles | No | Yes | Yes | Yes |
   | Mean, SD, SE of mean | No | No | Yes | Yes |
   | Ratio or coefficient of variation | No | No | No | Yes |

8. Tests for normality of data
   • Kolmogorov-Smirnov test: inferior to the others, relies on goodness of fit of a sample with a normal distribution curve; avoid its use!
   • Shapiro-Wilk test: better, more specific, more powerful especially with small sample sizes; available in R Commander and SPSS (under menu Analyze > Descriptive Statistics > Explore)
9. Tests for normality of data
   • D'Agostino-Pearson test
   • Anderson-Darling test
   • Q-Q (quantile-quantile) plot: visual guide
   • Histogram: inferior, look for skew or kurtosis
   • Density plot: better, look for skew or kurtosis
10. Choosing a statistical test
   • Make sure you have adequate sample size (power) to reject the null hypothesis (Ho)
   • Check whether it is a one-tailed (only < or > μ, only one direction) or two-tailed comparison (≠ μ, test significance on both sides); in general use two-tailed
   • Look at your data types: ordinal, interval etc.
   • Do descriptive statistics testing
11. Choosing a statistical test
   • Test normality of data with tests and visual comparison (especially when n < 30)
   • Decide whether to use parametric or non-parametric tests
   • Look at the number of groups, 2 or more: t-tests (if n < 30), z-test (n > 30) or ANOVA (F-test), or their non-parametric equivalents
   • For 2 or more groups, check whether the data is paired or independent
12. What is p-value? Ronald Fisher
13. What is p-value?
14. What is p-value?
   • The p-value is a probability of the test statistic's sampling distribution under the null hypothesis (null distribution, we first assume Ho is true!)
   • The (left-tailed) p-value is the quantile of the value of the test statistic, the right-tailed p-value is one minus the quantile, while the two-tailed p-value is twice whichever of these is smaller.
   • The p-value is NOT the probability that the null hypothesis is true, nor is it the probability that the alternative hypothesis is false
15. What is p-value?
   • p-value is NOT the same as α!
   • p-value is NOT the probability of rejecting the null hypothesis (we reject Ho when the p-value is less than the significance level, which is α)
   • p-value is computed while α is set by experimental design
   • If Ho is true, α is the probability of rejecting the null hypothesis
16. CHI SQUARE OR FISHER'S EXACT TEST?
   • In the days before computers were readily available, people analyzed contingency tables by hand, or using a calculator, using chi-square tests
   • The chi-square test works by computing the expected values for each cell if the relative risk (or odds ratio) were 1.0. It then combines the discrepancies between observed and expected values into a chi-square statistic from which a P value is computed
17. CHI SQUARE OR FISHER'S EXACT TEST?
   • The chi-square test is only an approximation!
   • Yates' continuity correction is designed to make it better, but it overcorrects and so gives a p-value that is too large (too "conservative")
   • With large sample sizes, Yates' correction makes little difference, and the chi-square test works very well. With small sample sizes, chi-square is not accurate, with or without Yates' correction
18. CHI SQUARE OR FISHER'S EXACT TEST?
   • Fisher's exact test, as its name implies, always gives an exact P value and works fine with small sample sizes
   • Fisher's test (unlike chi-square) is very hard to calculate by hand (so generally used for a 2 x 2 or 2 x n table), but is easy to compute with a computer
   • Advisable to use when any cell of the table has an expected value < 5
19. CHI SQUARE OR FISHER'S EXACT TEST?
   • Most statistical books advise using it instead of the chi-square test (especially for small samples, but chi-square becomes acceptable for large sample sizes)
   • Fisher's exact test can be used for an m x n table
   • Some have criticized it as the exact answer to the wrong question!
20. 2 x 2 contingency table layout

   | | Men | Women | Total |
   |---|---|---|---|
   | Dieting | a | b | a+b |
   | Not Dieting | c | d | c+d |
   | Total | a+c | b+d | (a+b+c+d) = n |

21. ANOVA (ANALYSIS OF VARIANCE)
   • The one-way analysis of variance (ANOVA) is used to determine whether there are any significant differences between the means of two or more independent (unrelated) groups
   • For example: to understand whether exam performance (dependent variable) differed based on test anxiety levels amongst students, dividing students into three independent groups (e.g., low, medium and high-stressed students)
22. ONE-WAY ANOVA DESIGN
   One treatment/condition (CONDITION1) with three levels of the independent variable (Group1, Group2, Group3); subjects S1 to S15 each contribute one DV measurement. DV = Dependent Variable, S = Subject
23. ANOVA (ANALYSIS OF VARIANCE)
   • It is an omnibus test statistic and cannot tell you which specific groups were significantly different from each other; it only tells you that at least two groups were different.
   • Since you may have ≥ 3 groups in your study design, determining which of these groups differ from each other is done using a post-hoc test (Tukey's test is preferred), which gives a multiple comparisons table.
24. ANOVA (ANALYSIS OF VARIANCE)
   • To apply ANOVA, 6 assumptions must be met:
   • Assumption #1: Your dependent variable should be measured at the interval or ratio level (i.e., it is continuous)
   • Assumption #2: Your independent variable should consist of two or more categorical, independent groups; it can be used for just two groups (but an independent-samples t-test is more commonly used for two groups)
25. ANOVA (ANALYSIS OF VARIANCE)
   • Assumption #3: You should have independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves.
   • Assumption #4: There should be no significant outliers.
   • Assumption #5: Your dependent variable should be approximately normally distributed for each category of the independent variable (but ANOVA is quite "robust" to violations of normality)
   • Assumption #6: There needs to be homogeneity of variances (tested in SPSS using Levene's test for homogeneity of variances)
26. ANOVA (ANALYSIS OF VARIANCE) METHOD
   • ANOVA calculates the mean for each of the groups: the Group Means.
   • It calculates the mean for all the groups combined: the Overall Mean.
   • Then it calculates, within each group, the total deviation of each individual's score from the Group Mean: Within Group (Error) Variation.
27. ANOVA (ANALYSIS OF VARIANCE) METHOD
   • Next, it calculates the deviation of each Group Mean from the Overall Mean: Between Group Variation.
   • Finally, ANOVA produces the F statistic, which is the ratio of Between Group Variation to Within Group (Error) Variation.
28. ANOVA (ANALYSIS OF VARIANCE) METHOD
29. TWO-WAY ANOVA DESIGN (columns are levels of the independent variable)

   | Treatment/Condition (Independent) | Group1 | Group2 | Group3 |
   |---|---|---|---|
   | CONDITION1 | S1 DV | S6 DV | S11 DV |
   | | S2 DV | S7 DV | S12 DV |
   | | S3 DV | S8 DV | S13 DV |
   | | S4 DV | S9 DV | S14 DV |
   | | S5 DV | S10 DV | S15 DV |
   | CONDITION2 | S16 DV | S21 DV | S26 DV |
   | | S17 DV | S22 DV | S27 DV |
   | | S18 DV | S23 DV | S28 DV |
   | | S19 DV | S24 DV | S29 DV |
   | | S20 DV | S25 DV | S30 DV |

30. ANCOVA (ANALYSIS OF COVARIANCE)
   • An extension of the one-way ANOVA used to determine whether there are any significant differences between the means of two or more independent (unrelated) groups (specifically, the adjusted means) by adjusting for a third or confounding variable
   • The third variable (known as a "covariate" or "confounding variable") is one that you want to "statistically control" because it may be affecting the results of the ANOVA
   • In each of the two groups we can compute the correlation coefficient between the third variable and the dependent variable
31. REPEATED MEASURES ANOVA
   • A repeated measures ANOVA is used when you have a single group on which you have measured something a few times
   • For example, you may have a test of understanding of Classes. You give this test at the beginning of the topic, at the end of the topic and then at the end of the subject
   • You would use a one-way repeated measures ANOVA to see if student performance on the test changed over time
32. REPEATED MEASURES ANOVA
   • Repeated measures ANOVA is the equivalent of the one-way ANOVA, but for related, not independent groups, and is the extension of the dependent t-test
   • A repeated measures ANOVA is also referred to as a within-subjects ANOVA or ANOVA for correlated samples
   • The major advantage of running a repeated measures ANOVA over an independent ANOVA is that the test is generally much more powerful. This advantage is achieved by the reduction in variability (due to differences between subjects) during the performance of the test
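33. REPEATED MEASURES ANOVA (columns T1 to T3 are the Time/Condition independent variable)

   | Subjects | T1 | T2 | T3 |
   |---|---|---|---|
   | S1 | S1 | S1 | S1 |
   | S2 | S2 | S2 | S2 |
   | S3 | S3 | S3 | S3 |
   | S4 | S4 | S4 | S4 |
   | S5 | S5 | S5 | S5 |

34. TWO-WAY ANOVA REPEATED MEASURES (columns T1 to T3 are the Time/Condition independent variable)

   | Factor (Independent) | Subjects | T1 | T2 | T3 |
   |---|---|---|---|---|
   | GROUP1 | S1 | S1 | S1 | S1 |
   | | S2 | S2 | S2 | S2 |
   | | S3 | S3 | S3 | S3 |
   | | S4 | S4 | S4 | S4 |
   | | S5 | S5 | S5 | S5 |
   | GROUP2 | S6 | S6 | S6 | S6 |
   | | S7 | S7 | S7 | S7 |
   | | S8 | S8 | S8 | S8 |
   | | S9 | S9 | S9 | S9 |
   | | S10 | S10 | S10 | S10 |

35. Variable type & CHOOSING A Test

   | Explanatory Variable | Response Variable | Methods |
   |---|---|---|
   | Categorical | Categorical | Contingency Tables |
   | Categorical | Quantitative | ANOVA |
   | Quantitative | Quantitative | Regression |
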
36. ANOVA: WHY NOT JUST USE t-TESTS?
   • Multiple t-tests are not the answer because as the number of groups grows, the number of needed pair comparisons grows quickly. For example, in 7 groups there are 21 pairs. If we test 21 pairs we should not be surprised to observe things that happen only 5% of the time. Thus in 21 pairings, a p-value = 0.05 for one pair cannot be considered significant.
   • Our level of significance α has to be divided for multiple comparisons (Ex: for the above it becomes α/21)
   • ANOVA puts all the data into one number (F) and gives us one p-value for the null hypothesis.
37. ANOVA: WHY NOT JUST USE t-TESTS?
   From eBook: Research skills for Psychology Majors by William Gabrenya
38. Likert ITEM & LIKERT Scale
39. Likert ITEM & LIKERT Scale
   • Likert scale consists of multiple Likert-type items
   • Likert-type scales (such as "On a scale of 1 to 10, with one being no pain and ten being high pain, how much pain are you in today?")
   • Represent ordinal data (order, rank, but no real distance)
40. Likert ITEM & LIKERT Scale
   • Fundamentally, these scales do not represent a measurable quantity
   • An individual may respond 8 and be in less pain than someone else who responded 5
   • A person who responded 4 is not necessarily in exactly half as much pain as one who responded 8
   • Visual Analog Scale is a Likert scale but often (wrongly) analyzed as if it were continuous data
41. COMPOSITE SCORE & LIKERT Scale
   • Composite scores combine multiple Likert item scales into a single scale
   • Composite scores must first be analyzed for internal consistency and inter-item correlation for each item and reported (ex: using Cronbach's alpha, a scale reliability analysis)
   • These scores represent ordinal data so must use non-parametric tests and descriptives
42. Cronbach's Alpha For scales
   • Check for internal consistency and overall validity of a multiple Likert-type item scale
   • Check correlation (α) with each item deleted at a time
   • Based on number of items and comparison of its variances
43. Cronbach's Alpha For scales
   • Values of α range from 0 to 1
   • Ideally overall α and α for each item (when deleted from scale) must be > 0.7 to 0.8
   • Clinical scores need higher α > 0.8 to 0.9 (Bland-Altman)
44. Power analysis & effect size
   • To calculate sample size (n) we must know the type of statistical test involved in our primary outcome measure
   • We must also know:
   • Desired α error (usually taken as 0.05)
   • Power (1-β), usually taken as 0.8 (80%) or greater
   • Two or one-tailed comparison
   • Effect size
45. Power analysis & effect size
   • Power is the fraction of experiments that you expect to yield a "statistically significant" p-value (with 80% power, about 80% of such experiments would yield a significant p-value when the effect is real)
   • Effect size (Cohen's d for means) depends on study design; it is calculated from data from pilot studies or reference studies
   • Effect size depends on a clinically defined level of significance (ex: more than 20% difference between 2 groups, with difference for proportion or mean ± SD data etc.)
46. Power analysis & effect size
   • Cohen's d is usually calculated based on pilot studies, but if effect size is unknown Jacob Cohen provided 3 guess estimate effect sizes (value varies slightly for different statistical tests):
   1. Small effect, d around 0.2 (requires large sample sizes)
   2. Medium effect, d around 0.5 (seen with careful observation, use when in doubt)
   3. Large effect, d greater than 0.8 (if large it is obvious)
   • Criticized when d is used as above as "T-shirt" effect sizes
47. Power analysis & effect size
   • Calculation of required sample size with a set target for power before starting the final study is called a priori analysis (before the fact). This is the accepted method, especially important to avoid incorrectly being "blind" to a real difference in a negative study (due to large β error)
   • Calculation of power at the end of the final study is called post hoc analysis (after the fact). This is incorrect, as the computed power is a simple reflection of the p-value!
   • G*Power software is a free useful resource
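An a priori sample size for comparing two independent means can be approximated from α, power and Cohen's d. The sketch below uses the normal approximation n = 2(z_α + z_β)² / d² per group; the function name `sample_size_per_group` is hypothetical, and the result is typically a little lower than the exact t-based figure that software such as G*Power reports, so treat it as a rough check rather than a replacement.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(d, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate n per group for comparing two means (normal approximation)."""
    z = NormalDist()
    # Critical value for the chosen alpha; alpha is split across both tails if two-tailed.
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)  # quantile corresponding to the target power (1 - beta)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


# Cohen's three conventional effect sizes.
for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    print(f"{label} effect (d={d}): about {sample_size_per_group(d)} per group")
```

Smaller effect sizes drive the required sample size up sharply, because n scales with 1/d².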