
3,166 views

2,993 views

2,993 views

Published on

Published in:
News & Politics

No Downloads

Total views

3,166

On SlideShare

0

From Embeds

0

Number of Embeds

2,268

Shares

0

Downloads

33

Comments

1

Likes

1

No embeds

No notes for slide

- 1. Making sense out of data (aka doing statistics)
- 2. Things you will need
- 3. Who I am and what I do: Corey Chivers, PhD Student in Biology at McGill. I study biological invasions using statistics.
- 4. What is a Statistician?
- 5. What is a Statistician? A statistician is someone who:
- 6. What is a Statistician? A statistician is someone who: ● Turns data into insights.
- 7. What is a Statistician? A statistician is someone who: ● Turns data into insights. ● Answers questions about the world.
- 8. What is a Statistician? A statistician is someone who: ● Turns data into insights. ● Answers questions about [variation in] the world.
- 9. What is a Statistician? A statistician is someone who: ● Turns data into insights. ● Answers questions about [variation in] the world. ● Isn't fun to talk to at a party?
- 10. Statistics is very cool
- 11. Data is Everywhere
- 12. Data is Everywhere
- 13. Statisticians are in demand
- 14. Portrait of a Statistician
- 15. Portrait of a Statistician ?
- 16. Portrait of a Statistician: The cool kids are calling themselves Data Scientists
- 17. Portrait of a Statistician: The cool kids are calling themselves Data Scientists. Name: Hilary Mason. Title: Chief Data Scientist at bit.ly; member of Mayor Bloomberg’s Technology and Innovation Advisory Council. From her web bio: “I <3 data and cheeseburgers.”
- 18. What do you know about statistics? ● On a piece of paper, make a list of all the words you know about statistics. ● I'll start: – Average (mean) – Variance – Normal distribution – ...
- 19. Despite how exciting we are, statisticians always start by assuming the world is boring. The Null Hypothesis, or H0, is this boring world.
- 20. Despite how exciting we are, statisticians always start by assuming the world is boring. The Null Hypothesis, or H0, is this boring world. Usually something like “there is no effect of caption size on the lulzyness of LOLcats”
- 21. Looking for evidence against the Null Hypothesis ● The alternative hypothesis (Ha) is that something interesting is going on. – Ex: “Bigger captions are, on average, funnier” ● How would we know?
- 22. Looking for evidence against the Null Hypothesis ● The alternative hypothesis (Ha) is that something interesting is going on. – Ex: “Bigger captions are, on average, funnier” ● How would we know? ● To the internetz!
- 23. Collect some sample data! Small caption, fairly humorous. Big caption, quite funny. Small caption, funny-ish. Big caption, peed in pants a little.
- 24. Dealing with variability: Some small caption images are funny, and some large caption images are not funny. There is variance in the data. But we want to know if there is a difference on average. We'll need to take variance into account.
- 25. Descriptive Statistics: Measures of Variability. Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$. Standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$. Where $x_i$ = the ith value of a distribution, $n$ = number of values in the sample, $\bar{x}$ = sample mean.
- 26. Descriptive Statistics: Measures of Variability. Variance and Standard Deviation for the distribution 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9 (mean = 5), using $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$. Step 1 (subtract the mean): 1 − 5 = −4, 2 − 5 = −3, 3 − 5 = −2, 3 − 5 = −2, […], 9 − 5 = 4. Step 2 (square each difference): (−4)² = 16, (−3)² = 9, (−2)² = 4, (−2)² = 4, […], 4² = 16. Step 3 (add the squares): 16 + 9 + 4 + 4 + […] + 16 = 72. Step 4 (Variance): 72/18 = 4. Step 5 (Std Deviation): √4 = 2.
- 27. Your turn: Calculate the variance of the heights in your group, $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$. 1) Write down your heights ($x_i$). 2) Calculate the average ($\sum x_i / n$). 3) Subtract the average from each height and square the result. 4) Add them all up and divide by n − 1.
- 28. Variance
- 29. Measures of Central Tendency: Calculating the Mean. Using the following distribution of values: 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9. (Arithmetic) Mean: the average of a distribution of values, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, or (sum of values in dist’n) / (number of values in dist’n) = (1+2+3+3+4+4+4+5+5+5+5+5+6+6+6+7+7+8+9) / 19 = 5
- 30. Could the difference be due to chance? Remember, we started by assuming that there was no difference (the Null Hypothesis). If the Null Hypothesis is true, what are the chances that we observed this amount of difference between groups? How do we decide whether the difference is due to chance or not? By vote???
- 31. A better way: (formal) Hypothesis testing ● Determine in advance the level of error you are willing to put up with. – We cannot avoid the chance of errors, but we can decide how often we are willing to have them happen. ● Biologists like to use 0.05 (a 1 in 20 chance). ● We call this α (alpha)
- 32. A better way: (formal) Hypothesis testing ● Determine in advance the level of error you are willing to put up with. – We cannot avoid the chance of errors, but we can decide how often we are willing to have them happen. ● Biologists like to use 0.05 (a 1 in 20 chance). ● We call this α (alpha). Ronald Fisher: The man behind the idea of NHST
- 33. A better way: (formal) Hypothesis testing ● Calculate how likely data like yours would be if the null were true. ● If that probability is less than α, we say that we reject the null hypothesis. ● If we reject the null, we say the results are statistically significant.
- 34. A better way: (formal) Hypothesis testing ● Calculate how likely data like yours would be if the null were true. ● If that probability is less than α, we say that we reject the null hypothesis. ● If we reject the null, we say the results are statistically significant. “The world is not boring after all!”
- 35. Let's do it! ● To calculate how likely it is that our data is from the null hypothesis (i.e. the difference is due to chance), we need a statistic. ● But first, some Beer!
- 36. Student's t-test ● William Sealy Gosset figured out how to test if a batch of beer was significantly different from the standard. While working for the Guinness brewing company, he was forbidden to publish academic research, so he published his method under the pseudonym Student.
- 37. Student's t-test: The t-value is calculated using the following equation: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$. Where $\bar{x}_1$ and $\bar{x}_2$ are the means of the experimental and control groups; $s_1^2$ and $s_2^2$ are the variances of the experimental and control groups; $n_1$ and $n_2$ are the sample sizes for the experimental and control groups.
- 38. Student's t-test: The t-value is calculated using the following equation: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$. Where $\bar{x}_1$ and $\bar{x}_2$ are the means of the experimental and control groups; $s_1^2$ and $s_2^2$ are the variances of the experimental and control groups; $n_1$ and $n_2$ are the sample sizes for the experimental and control groups.
- 39. t-test: State your alpha level, α = 0.05. If there is really no difference between the means, there is a 5% chance that the t-test will detect one anyway (a false positive).
- 40. Calculating your t-value. Name-brand (Group 1): mean # of chips $\bar{x}_1$ = 15.3, standard deviation $s_1$ = 2.4, sample size $n_1$ = 3. Generic-brand (Group 2): mean # of chips $\bar{x}_2$ = 11.2, standard deviation $s_2$ = 4.3, sample size $n_2$ = 3. Using $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ with the data above: calculated t = 1.4
- 41. Alternate Hypothesis: You can only test ONE possible alternate hypothesis at any one time. The one chosen depends on what you are looking to find. Alternative hypothesis: 2 types. 2-tailed, non-directional (general): not specifying a direction. “The two groups are not the same.” 1-tailed, directional (specific): specify direction. “Group A is greater than group B.”
- 42. Look up the Critical t-value: In order to find your critical t-value, you need 3 pieces of information: 1. Whether the alternate hypothesis is 1- or 2-tailed 2. Alpha level (usually = 0.05) 3. Degrees of freedom (df = n − 1). Calculating degrees of freedom (df): Degrees of Freedom = n − 1. What if you have 2 different sample sizes (n1 and n2)… which do you pick to calculate your degrees of freedom? A: df = the smaller of (n1 − 1) or (n2 − 1)
- 43. Looking up your Critical t-value
- 44. Compare your ‘calculated’ t-value with your ‘critical’ t-value: It is the difference in values between the t-value and critical t that will determine whether you can reject or fail to reject your null hypothesis. a) If ‘calculated’ > ‘critical’, then: reject null hyp. “My observed data are really unlikely under the null hypothesis, therefore I reject the null hypothesis!” b) If ‘calculated’ < ‘critical’, then: do NOT reject null hyp. “My observed data are consistent with the null hypothesis, therefore I have no reason to believe that it is not true.”
- 45. What if we are measuring a category, rather than a number? ● The t-test lets us compare the value of some attribute between two groups. – Do mutant fruit flies live longer than wild type? – Does IQ differ between Dawson and Laurier students? – Does drug x decrease blood pressure? ● The dependent variable is quantitative: – Life span – IQ – Blood pressure
- 46. What if we are measuring a category, rather than a number? ● Chi-squared test lets us test hypotheses about categories. – Are there more cars of a certain colour getting speeding tickets? – Is the ratio of dominant to recessive phenotypes 3:1? – Do chromosomes assort independently? ● The dependent variable is categorical: – Car colour – Phenotype – Chromosome donor
- 47. Chi-square or T-test??? How do you know which one you need? T-Test: • the dependent variable is quantitative (e.g. height, weight, etc.) • data can be organized as two lists of numbers. Example (dependent variable: heart rate): Room temp (bpm): 178, 169, 192; Cold temp (bpm): 86, 89, 55. Chi-square Test: • the dependent variable is qualitative (aka nominal data) (e.g. gender, colour, etc.) • data can be easily tabulated as counts. Example (dependent variable: gender): Male 98, Female 102.
- 48. Steps to performing a chi-square test: 1. State your null hypothesis 2. State your alternate hypothesis 3. State your alpha level (usually α = 0.05) 4. Calculate your ‘calculated chi-square value’ 5. Look up your ‘critical chi-square value’ (from chi-square table) 6. Compare your ‘calculated’ and ‘critical’ values: a) If ‘calculated’ > ‘critical’, conclusion: reject null hyp. b) If ‘calculated’ < ‘critical’, conclusion: do NOT reject null hyp. 7. State your conclusion
- 49. Sample hypotheses for chi-square: Sex ratio in our class. Null hypothesis: 1. There is no difference between the frequency of men and women in the class. Alternative hypothesis: 2. There is a difference between the frequency of men and women in the class. Chi-square can only test non-directional alt. hyp.
- 50. Calculating Chi-square: ‘Calculated’ chi-square values are calculated using the following formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O = observed and E = expected. Calculating the chi-square is easier using a table with one row per gender (Female, Male) and the columns O, E, O − E, (O − E)², and (O − E)²/E. χ² = sum of last column.
- 51. Looking up the Critical χ²: To find the critical χ², you need the alpha level and the df. Df for a χ² test = (# of categories) − 1. In our example, df = 2 − 1 = 1
- 52. Compare your ‘calculated’ chi-sq with your ‘critical’ chi-sq: It is the difference between the calculated chi-sq and critical chi-sq that will determine whether you can reject or fail to reject your null hypothesis. a) If ‘calculated’ > ‘critical’, then: reject null hyp. “My observed data are really unlikely under the null hypothesis, therefore I reject the null hypothesis!” b) If ‘calculated’ < ‘critical’, then: do NOT reject null hyp. “My observed data are consistent with the null hypothesis, therefore I have no reason to believe that it is not true.”
- 53. Statistics just might save your life
- 54. Questions for Corey ● You can email me! Corey.chivers@mail.mcgill.ca ● I blog about statistics: bayesianbiologist.com ● I tweet about statistics: @cjbayesian
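
The variance example on slides 26 and 29 and the chip comparison on slide 40 can be checked with a short script. This is a minimal sketch in Python and is not part of the original deck: the function name `t_statistic` is made up for illustration, and the critical value 4.303 (two-tailed, α = 0.05, df = 2) is read from a standard t-table rather than computed.

```python
import math
import statistics

# Distribution from the worked variance example (slides 26 and 29).
values = [1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9]

# Steps 1-5 from slide 26: deviations, squares, sum, divide by n - 1, root.
mean = sum(values) / len(values)
squared_deviations = [(x - mean) ** 2 for x in values]
variance = sum(squared_deviations) / (len(values) - 1)
std_dev = math.sqrt(variance)

# Cross-check the hand calculation against the standard library.
assert math.isclose(variance, statistics.variance(values))


def t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """t-value from group means, standard deviations and sample sizes
    (hypothetical helper following the equation on slide 37)."""
    return (mean1 - mean2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Summary statistics from slide 40: name-brand (group 1) vs generic (group 2).
t_chips = t_statistic(15.3, 2.4, 3, 11.2, 4.3, 3)

# Slide 42: df = the smaller of (n1 - 1) and (n2 - 1).
df_chips = min(3 - 1, 3 - 1)

# Assumption: critical t for a two-tailed test, alpha = 0.05, df = 2,
# taken from a standard t-table.
T_CRITICAL = 4.303
reject_null = abs(t_chips) > T_CRITICAL

print(f"mean={mean}, variance={variance}, sd={std_dev}")
print(f"t={t_chips:.2f}, df={df_chips}, reject null: {reject_null}")
```

With these numbers the calculated t (about 1.4) is below the critical value, so the null hypothesis is not rejected.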

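The chi-square steps on slides 48 to 52 can be sketched in the same way. This is an illustrative Python example, not part of the original deck: the counts (98 men, 102 women) and the function name `chi_square` are assumptions, and the critical value 3.841 (α = 0.05, df = 1) is taken from a standard chi-square table.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories (hypothetical helper)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Assumed class counts for illustration: [men, women].
observed = [98, 102]

# Null hypothesis: no difference in frequency, so each category
# is expected to hold an equal share of the total.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

chi2 = chi_square(observed, expected)

# Slide 51: df = (number of categories) - 1.
df = len(observed) - 1

# Assumption: critical chi-square for alpha = 0.05, df = 1,
# taken from a standard chi-square table.
CHI2_CRITICAL = 3.841
reject_null = chi2 > CHI2_CRITICAL

print(f"chi2={chi2:.2f}, df={df}, reject null: {reject_null}")
```

Here the calculated value is far below the critical value, so these counts would give no reason to reject the null hypothesis of an equal sex ratio.
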

Thanks

Dr E Munowenyu ( Did my Masters at McGill!)