Statistics

Published in: Education, Technology, Business

  1. 1. Topic One: IB Biology The science of data
  2. 2. What is data? <ul><li>Information, in the form of facts or figures obtained from experiments or surveys, used as a basis for making calculations or drawing conclusions </li></ul>
  3. 3. Statistics in Science <ul><li>Data can be collected about a population, for example through surveys </li></ul><ul><li>Data can be collected about a process, for example through experiments </li></ul>
  4. 4. There are two types of data <ul><ul><li>1. Qualitative </li></ul></ul><ul><ul><li>2. Quantitative </li></ul></ul>
  5. 5. 1. Qualitative Data <ul><li>Information that relates to characteristics or description (observable qualities) </li></ul><ul><li>Information is often grouped by descriptive category </li></ul><ul><li>Examples </li></ul><ul><ul><ul><li>Species of plant </li></ul></ul></ul><ul><ul><ul><li>Type of insect </li></ul></ul></ul><ul><ul><ul><li>Shades of color </li></ul></ul></ul><ul><ul><ul><li>Rank of flavor in taste testing </li></ul></ul></ul><ul><ul><li>Remember: qualitative data can be “scored” and evaluated numerically </li></ul></ul>
  6. 6. Qualitative data, put into a graph and manipulated numerically <ul><li>Survey results, teens and need for environmental action </li></ul>
  7. 7. 2. Quantitative data <ul><li>measured using a naturally occurring numerical scale </li></ul><ul><li>Examples </li></ul><ul><ul><li>Chemical concentration </li></ul></ul><ul><ul><li>Temperature </li></ul></ul><ul><ul><li>Length </li></ul></ul><ul><ul><li>Weight…etc. </li></ul></ul>
  8. 8. Quantitative <ul><li>Measurements are often displayed graphically </li></ul>
  9. 9. Quantitative = Measurement <ul><li>In data collection for Biology, data must be measured carefully, using laboratory equipment (e.g. timers, meter stick, pH meter, balance) </li></ul><ul><li>The limits of the equipment used add some uncertainty to the data collected. All equipment has a certain magnitude of uncertainty. For example, is a mass-produced ruler a good measure of 1 cm? 1 mm? 0.1 mm? </li></ul><ul><li>For quantitative testing, you must state the level of uncertainty of the tool that you use for measurement! </li></ul>
  10. 10. How to determine uncertainty? <ul><li>As a rule of thumb, if the uncertainty is not specified, use ± 1/2 of the smallest measurement unit </li></ul>In the picture above of the meter stick, the smallest unit is 1 mm, so the uncertainty is ± 0.5 mm
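The rule of thumb above can be sketched in a few lines of Python. This is only an illustration; the function name `reading_uncertainty` is made up for this example and is not part of the slides.

```python
def reading_uncertainty(smallest_division):
    """Rule of thumb: uncertainty is half of the smallest marked unit."""
    return smallest_division / 2


# A meter stick marked in millimetres (smallest unit = 1 mm)
uncertainty_mm = reading_uncertainty(1.0)
print(f"+/- {uncertainty_mm} mm")  # +/- 0.5 mm
```

A measurement read from that meter stick would then be reported as, for example, 123.0 ± 0.5 mm.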
  11. 11. Looking at Data <ul><li>How accurate is the data? (How close are the data to the “real” results?) A consistent departure from the real result is called BIAS </li></ul><ul><li>How precise is the data? (All test systems have some uncertainty, due to limits of measurement.) Estimating the limits of the experimental uncertainty is essential. </li></ul>
  12. 14. Comparing Averages <ul><li>Once the 2 averages are calculated for each set of data, the average values can be plotted together on a graph, to visualize the relationship between the 2 </li></ul>
  13. 15. These two averages should be close to the same
  14. 17. Drawing error bars <ul><li>The simplest way to draw an error bar is to use the mean as the central point, and to use the distance of the measurement that is furthest from the average as the endpoints of the data bar </li></ul>
  15. 18. Figure labels: average value; value farthest from average; calculated distance
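A minimal sketch of the simple error bar described above: the mean is the central point and the distance to the farthest measurement sets the length of the bar on each side. The trial values and the function name are hypothetical.

```python
from statistics import mean


def max_deviation_error_bar(values):
    """Return (mean, value farthest from the mean, distance between them)."""
    centre = mean(values)
    farthest = max(values, key=lambda v: abs(v - centre))
    half_length = abs(farthest - centre)
    return centre, farthest, half_length


# Hypothetical repeated measurements of one quantity
trials = [10.0, 12.0, 11.0, 15.0]
centre, farthest, half_length = max_deviation_error_bar(trials)
print(f"mean = {centre}, farthest value = {farthest}, bar = +/- {half_length}")
```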
  16. 19. What do error bars suggest? <ul><li>If the bars show extensive overlap, it is likely that there is not a significant difference between those values </li></ul>
  17. 21. Quick Review – 3 measures of “Central Tendency” <ul><li>1. Mode: the value that appears most frequently </li></ul><ul><li>2. Median: when all data are listed from least to greatest, the value at which half of the observations are greater and half are lesser </li></ul><ul><li>3. Mean: the most commonly used measure of central tendency is the mean, or arithmetic average (sum of data points divided by the number of points) </li></ul>
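The three measures can be checked quickly with Python's standard `statistics` module. The leaf lengths below are invented for illustration.

```python
from statistics import mean, median, mode

# Hypothetical leaf lengths in millimetres
leaf_lengths_mm = [42, 45, 45, 47, 51]

most_common = mode(leaf_lengths_mm)    # value that appears most often
middle = median(leaf_lengths_mm)       # middle value of the sorted data
average = mean(leaf_lengths_mm)        # sum of values / number of values

print(most_common, middle, average)
```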
  18. 22. How can leaf lengths be displayed graphically?
  19. 23. Simply measure the lengths of each and plot how many are of each length
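Counting how many leaves fall at each length is all a histogram needs. A rough text version, using made-up leaf lengths, might look like this:

```python
from collections import Counter

# Hypothetical leaf lengths in millimetres
leaf_lengths_mm = [42, 45, 45, 47, 45, 42, 51]

# Count how many leaves have each length
counts = Counter(leaf_lengths_mm)

# Print one row per length, with one '#' per leaf
for length in sorted(counts):
    print(f"{length} mm | " + "#" * counts[length])
```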
  20. 24. If smoothed, the histogram data assumes this shape
  21. 25. This Shape? <ul><li>Is a classic bell-shaped curve, AKA a Normal Distribution curve. </li></ul><ul><li>Essentially it means that in many studies with an adequate number of datapoints (>30), most results tend to be near the mean. Fewer results are found farther from the mean </li></ul>
  22. 26. Standard Deviation <ul><li>Is a statistic that tells you how tightly all the various examples are clustered around the mean in a set of data </li></ul><ul><li>Is a more sophisticated indicator of the precision of a set of measurements </li></ul><ul><li>It is like an average deviation of measurement values from the mean. In large studies, the standard deviation is used to draw error bars, instead of the maximum deviation. </li></ul>
  23. 27. A typical normal distribution curve
  24. 28. According to this curve : <ul><li>One standard deviation away from the mean in either direction on the horizontal axis (the red area on the preceding graph) accounts for somewhere around 68 percent of the data in this group. </li></ul><ul><li>Two standard deviations away from the mean ( the red and green areas ) account for roughly 95 percent of the data . </li></ul>
  25. 30. Three Standard Deviations? <ul><li>Three standard deviations (the red, green and blue areas) account for about 99.7 percent of the data </li></ul>Axis labels: -3 sd, -2 sd, ±1 sd, +2 sd, +3 sd
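The percentages quoted for one, two and three standard deviations can be reproduced for an ideal normal distribution with Python's standard library (`statistics.NormalDist`, Python 3.8 or later). The helper name `fraction_within` is just for this sketch.

```python
from statistics import NormalDist

# Ideal bell curve with mean 0 and standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sd: {fraction_within(k):.1%}")
```

Real data only approximate these figures, and only when the data are roughly normally distributed.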
  26. 31. How is Standard Deviation calculated? <ul><li>With this formula! </li></ul>
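The formula image did not survive in this transcript. Assuming the slide showed the usual sample ("n-1") standard deviation, which is the version Excel's STDEV is described as using later on, it would read as follows, where the x_i are the individual measurements, x-bar is their mean and n is the number of measurements:

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}{n - 1}}
```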
  27. 32. <ul><li>DO I NEED TO KNOW THIS FOR THE TEST????? </li></ul>
  28. 33. Not the formula! <ul><li>This can be calculated on a scientific calculator </li></ul><ul><li>OR…. In Microsoft Excel, type the following formula into the cell where you want the Standard Deviation result, using the &quot;unbiased,&quot; or &quot;n-1&quot; method: =STDEV(A1:A30) (substitute the cell name of the first value in your dataset for A1, and the cell name of the last value for A30.) </li></ul><ul><li>OR….Try this! http://www.pages.drexel.edu/~jdf37/mean.htm </li></ul>
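For readers working outside Excel, Python's `statistics.stdev` uses the same "n-1" method. The sketch below uses invented data and also shows `pstdev`, which divides by n instead, to make the difference visible.

```python
from statistics import pstdev, stdev

# Hypothetical measurements
shell_lengths = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sample_sd = stdev(shell_lengths)       # divides by n - 1 ("unbiased" method)
population_sd = pstdev(shell_lengths)  # divides by n

print(f"n-1 method: {sample_sd:.3f}")
print(f"n method:   {population_sd:.3f}")
```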
  29. 34. You DO need to know the concept! <ul><li>Standard deviation is a statistic that tells how tightly all the various datapoints are clustered around the mean in a set of data. </li></ul><ul><li>When the datapoints are tightly bunched together and the bell-shaped curve is steep, the standard deviation is small (precise results, smaller sd). </li></ul><ul><li>When the datapoints are spread apart and the bell curve is relatively flat, the standard deviation is large, which suggests less precise results. </li></ul>
  30. 35. Here is an example
  31. 36. Comparison of Two Samples: two samples of the same type of mollusk from two different locations
  32. 37. Mollusk shell measurements <ul><li>Example: The two mollusk shell samples above </li></ul><ul><li>Population 1. Mean = 31.4 Standard deviation(s)= 5.7 </li></ul><ul><li>Population 2. Mean =41.6 Standard deviation(s) = 4.3 </li></ul>
  33. 38. Standard deviation in the error bar
  34. 39. A sample with a small standard deviation suggests narrow variation (Pop. 2). <ul><li>The second population has a greater mean shell length but slightly narrower variation. Why this is the case would require further observation and experiment on environmental and genetic factors. </li></ul>
  35. 40. Statistical hypothesis testing (null hypothesis) <ul><li>Is used to draw conclusions about some aspect of a set of data (like our mollusk data). </li></ul><ul><li>We will use this to compare the two data sets </li></ul><ul><li>The hypotheses that we might now state are: </li></ul><ul><li>Null hypothesis: There is no significant difference between the two samples except as caused by chance selection of data. </li></ul><ul><li>OR </li></ul><ul><li>Alternative hypothesis: There is a significant difference between the height of shells in sample A and sample B. </li></ul>
  36. 41. Comparison of Two Samples Using a t-Test
  37. 42. t-test The t-test compares the averages and standard deviations of two samples to see if there is a significant difference between them. We start by calculating a number, t. t can be calculated using the equation: $$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$ Where: $\bar{x}_1$ is the mean of sample 1, $s_1$ is the standard deviation of sample 1, $n_1$ is the number of individuals in sample 1, $\bar{x}_2$ is the mean of sample 2, $s_2$ is the standard deviation of sample 2, and $n_2$ is the number of individuals in sample 2.
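As a rough sketch, the t equation described above (difference of the two means divided by the square root of s1²/n1 + s2²/n2) can be computed directly in Python. The two samples are invented, and the function name `t_statistic` is only for this example.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample_1, sample_2):
    """t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)."""
    x1, x2 = mean(sample_1), mean(sample_2)
    s1, s2 = stdev(sample_1), stdev(sample_2)  # n-1 standard deviations
    n1, n2 = len(sample_1), len(sample_2)
    return (x1 - x2) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


# Hypothetical shell heights from two locations
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_b = [2.0, 4.0, 6.0, 8.0, 10.0]

t = t_statistic(sample_a, sample_b)
print(f"t = {t:.3f}")
```

The sign of t only shows which sample has the larger mean; turning t into a P value still needs a t table, a calculator or a spreadsheet.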
  38. 43. This is done on a calculator or in Excel
  39. 44. Drawing conclusions <ul><li>1. State the null hypothesis and the alternative hypothesis based on your research question. Null Hypothesis: 'There is no significant difference between the height of shells in sample A and sample B.' Alternative Hypothesis: 'There is a significant difference between the height of shells in sample A and sample B'. </li></ul><ul><li>2. Set the critical P level (your cutoff %) at P = 0.05 (5%) </li></ul><ul><li>3. Write the decision rule for rejecting the null hypothesis. </li></ul><ul><ul><ul><li>If P > 5% then there is no significant difference between the two sets (i.e. do not reject the null hypothesis). </li></ul></ul></ul><ul><ul><ul><li>If P < 5% then the two sets are significantly different (i.e. reject the null hypothesis). </li></ul></ul></ul>
  40. 45. <ul><li>4. Write a summary statement based on the decision. </li></ul><ul><ul><ul><li>The null hypothesis is rejected since the calculated P = 0.003 is less than the critical P = 0.05 (two-tailed test) </li></ul></ul></ul><ul><li>5. Write a statement of results in standard English. </li></ul><ul><ul><ul><li>There is a significant difference between the height of shells in sample A and sample B. </li></ul></ul></ul>
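The decision rule in steps 2 to 4 can be written as a small function. This is a sketch only; the function name `decide` is hypothetical, and the default cutoff is the P = 0.05 level set in step 2.

```python
def decide(p_value, critical_p=0.05):
    """Apply the decision rule: reject the null hypothesis when P is below the cutoff."""
    if p_value < critical_p:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"


# The calculated P from the mollusk example
print(decide(0.003))  # reject the null hypothesis
```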
  41. 46. Step 1 to calculating a t-test in Excel <ul><li>Pick a cell where you want your t-test result displayed </li></ul><ul><li>Go to fx and </li></ul>
  42. 47. In the fx dialog, pick Statistical under Category, then find TTEST under Select a function
  43. 48. <ul><li>Array 1 is the population 1 data, cells B3 to B12 </li></ul><ul><li>Array 2 is the population 2 data, cells C3 to C12 </li></ul><ul><li>Tails: in Biology, tails is always 2 </li></ul><ul><li>Type can be 1 (paired), 2 (two-sample, equal variance) or 3 (two-sample, unequal variance) </li></ul>
  44. 49. What are tails in a t-test? <ul><li>Think of a tail as one of the two sides of the graph. Whether a t-test is one- or two-tailed is determined by whether the total area of α (the significance level) is placed in one tail or divided equally between the two tails. The one-tailed t-test is performed if the results are interesting only if they turn out in a particular direction. </li></ul>Figure labels: One Tail; Two Tail
  45. 50. What are the 3 types? <ul><li>Type 1 Let’s say you wanted to test whether heart rate increased after drinking a cup of hot sauce or whether plant growth would increase after adding fertilizer to pots of soil. In these cases you would be comparing the heart rate of the same people, or the growth of the same pot of plants before and after the treatment. This would require a &quot;paired&quot; or &quot;dependent&quot; T test. Excel calls this a &quot;type 1&quot; test </li></ul>
  46. 51. Type 2 & 3 <ul><li>Say you want to know whether nursing students consume more coffee than do biology students. You would then have two groups of test subjects rather than taking 2 measurements on each person. Now you would use an &quot;unpaired&quot; or &quot;independent&quot; t-test. Excel calls these &quot;type 2&quot; or &quot;type 3&quot; tests. </li></ul><ul><li>Type 2: use when the data seem homogeneous (similar), that is, when the two groups show about the same amount of variation. Homogeneity measures how similar or different the groups being compared are. </li></ul>
  47. 52. Type 3 <ul><li>Now the tricky part is to decide which of these to use. Are the standard deviations about the same for both groups, or are they different? </li></ul><ul><li>You can test this statistically, but let’s just work with how they seem. If in doubt, go with &quot;type 3&quot; for unequal variances. </li></ul>
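To show how the paired (type 1) case differs from the unpaired ones, here is a minimal sketch of the standard paired t statistic: it works on the before/after difference for each subject rather than on two separate groups. The heart-rate numbers and the function name are invented, and this computes t only, not the P value that Excel reports.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """t for paired data: mean difference / (sd of differences / sqrt(n))."""
    if len(before) != len(after):
        raise ValueError("paired data need one before and one after value per subject")
    differences = [a - b for b, a in zip(before, after)]
    return mean(differences) / (stdev(differences) / sqrt(len(differences)))


# Hypothetical heart rates (beats per minute) for the same four people
before = [70.0, 72.0, 68.0, 75.0]
after = [74.0, 75.0, 70.0, 80.0]

t_paired = paired_t_statistic(before, after)
print(f"paired t = {t_paired:.3f}")
```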
  48. 53. You now have a t-test result (the P value)
  49. 54. Convert the result to a percent <ul><li>Select the cell with the result </li></ul><ul><li>Right-click </li></ul><ul><li>Pick Format Cells </li></ul><ul><li>Select Percentage </li></ul><ul><li>You are done </li></ul>
  50. 55. 0.03% is less than the 5% cutoff
  51. 56. Correlation or Co-relation <ul><li>Refers to the departure of two random variables from independence (that is, they may be interconnected). </li></ul><ul><li>Correlations can indicate the potential existence of causal relations. However, the causes underlying the correlation, if any, may be indirect and unknown, and high correlations also overlap with identity relations, where no causal process exists. Consequently, establishing a correlation between two variables is not a sufficient condition to establish a causal relationship (in either direction). </li></ul>
  52. 57. Scatterplots <ul><li>The concept of correlation can be demonstrated by using scatterplots. </li></ul><ul><li>A scatterplot is a graph of data points for two variables, with one variable on each axis. </li></ul><ul><li>The data points are plotted in the field of the graph according to their values for each variable. This produces a &quot;scatter&quot; of points; a narrower scatter pattern occurs when the correlation is high. </li></ul>
  53. 58. Negative 1 correlation
  54. 59. Coefficient of -1.00 <ul><li>A correlation coefficient of -1.00 means that every subject’s scores are exactly the same standardized distance from the means of both variables, but in opposite directions </li></ul><ul><li>As the value of %Fat increases (i.e., as you move from left to right on the X axis), the value of Y decreases (i.e., moves toward the bottom on the Y axis). </li></ul>
  55. 60. A correlation coefficient of 0 means that the two variables, age and height, have no linear relationship to one another
  56. 61. Correlation Coefficient +1.00
  57. 62. Positive coefficient (+1.00) <ul><li>A correlation coefficient of +1.00 means that every subject’s scores are exactly the same standardized distance and the same direction from the means for both variables </li></ul><ul><li>Taller people tend to weigh more; </li></ul><ul><li>shorter people tend to weigh less . </li></ul>
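A correlation coefficient can be computed by hand from two lists of paired values. The sketch below uses the standard Pearson formula with made-up, perfectly linear data so that the +1.00 and -1.00 cases are easy to see; the function name `pearson_r` is hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    co_variation = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)


# Hypothetical data: y rises in step with x, then falls in step with x
xs = [1, 2, 3, 4]
r_positive = pearson_r(xs, [2, 4, 6, 8])
r_negative = pearson_r(xs, [8, 6, 4, 2])
print(round(r_positive, 2), round(r_negative, 2))
```

Real biological data rarely reach exactly +1.00 or -1.00; values closer to 0 indicate a wider scatter.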
  58. 63. Creating a Scatter graph in Excel
  59. 65. Highlight your numbers and select Chart Wizard
  60. 66. In Chart Wizard, choose Scatter graph
  61. 67. Label your graph and the x and y axes
  62. 68. To add a trendline, RIGHT click on any data point
  63. 69. Choose Linear
  64. 70. You can now copy and paste it into Word for your lab
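A linear trendline is a least-squares straight line through the points. As a rough sketch of that calculation outside Excel (the data and the function name `linear_trendline` are invented for illustration):

```python
from statistics import mean


def linear_trendline(xs, ys):
    """Least-squares line y = slope * x + intercept through the points."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical scatter graph data
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = linear_trendline(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```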
  65. 71. Causation <ul><li>&quot; Correlation does not imply causation &quot; is a phrase used in science to emphasize that correlation between two variables does not automatically imply that one causes the other . </li></ul>
