
# Statistics

## on Jan 12, 2010

• 11,881 views


### Categories

Uploaded as Microsoft PowerPoint

## Statistics Presentation Transcript

• Topic One: IB Biology The science of data
• What is data?
• Information, in the form of facts or figures obtained from experiments or surveys, used as a basis for making calculations or drawing conclusions
• Statistics in Science
• Data can be collected about a population (for example, through surveys)
• Data can be collected about a process (for example, through experiments)
• There are two types of Data
• 1. Qualitative
• 2. Quantitative
• 1. Qualitative Data
• Information that relates to characteristics or description (observable qualities)
• Information is often grouped by descriptive category
• Examples
• Species of plant
• Type of insect
• Shades of color
• Rank of flavor in taste testing
• Remember: qualitative data can be “scored” and evaluated numerically
• Qualitative data, put into a graph and manipulated numerically
• Survey results, teens and need for environmental action
• 2. Quantitative data
• measured using a naturally occurring numerical scale
• Examples
• Chemical concentration
• Temperature
• Length
• Weight…etc.
• Quantitative
• Measurements are often displayed graphically
• Quantitative = Measurement
• In data collection for Biology, data must be measured carefully, using laboratory equipment (e.g. timers, meter stick, pH meter, balance, etc.)
• The limits of the equipment used add some uncertainty to the data collected. All equipment has a certain magnitude of uncertainty. For example, is a ruler that is mass-produced a good measure of 1 cm? 1mm? 0.1mm?
• For quantitative testing, you must indicate the level of uncertainty of the tool that you are using for measurement!!
• How to determine uncertainty?
• As a “rule-of-thumb”, if not specified, use +/- 1/2 of the smallest measurement unit
In the picture above of the meter stick, the smallest division is 1 mm, so the uncertainty is +/- 0.5 mm
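The rule of thumb can be sketched in a few lines of Python. This is only an illustration: the function names and the example reading of 123 mm are made up, not taken from the slides.

```python
def reading_uncertainty(smallest_division):
    """Rule of thumb: uncertainty is half of the smallest measurement unit."""
    return smallest_division / 2


def format_reading(value, smallest_division, unit):
    """Report a measurement together with its +/- uncertainty."""
    uncertainty = reading_uncertainty(smallest_division)
    return f"{value} {unit} +/- {uncertainty} {unit}"


# Hypothetical reading from a meter stick marked in 1 mm divisions
print(format_reading(123, 1, "mm"))  # 123 mm +/- 0.5 mm
```

If the manufacturer states an uncertainty for the instrument, that value should be used instead of this estimate.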
• Looking at Data
• How accurate is the data? (How close are the data to the “real” results?) A consistent departure from the real value is also called BIAS
• How precise is the data? (All test systems have some uncertainty, due to limits of measurement) Estimation of the limits of the experimental uncertainty is essential.
• Comparing Averages
• Once the 2 averages are calculated for each set of data, the average values can be plotted together on a graph, to visualize the relationship between the 2
• These two averages should be close to the same
• Drawing error bars
• The simplest way to draw an error bar is to use the mean as the central point, and to use the distance of the measurement that is furthest from the average as the endpoints of the data bar
• Figure labels: average value; value farthest from average; calculated distance
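A minimal sketch of this simplest kind of error bar, assuming the bar extends equally above and below the mean by the largest deviation. The readings are hypothetical.

```python
def max_deviation_error_bar(values):
    """Return the mean and the distance to the value farthest from it."""
    mean = sum(values) / len(values)
    farthest = max(values, key=lambda v: abs(v - mean))
    distance = abs(farthest - mean)
    return mean, distance


# Hypothetical repeated measurements
readings = [10.0, 12.0, 11.0, 15.0]
mean, distance = max_deviation_error_bar(readings)
print(f"mean = {mean}, error bar = +/- {distance}")  # mean = 12.0, error bar = +/- 3.0
```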
• What do error bars suggest?
• If the bars show extensive overlap, it is likely that there is not a significant difference between those values
• Quick Review – 3 measures of “Central Tendency”
• 1. Mode: the value that appears most frequently
• 2. Median: when all data are listed from least to greatest, the value at which half of the observations are greater and half are lesser
• 3. Mean: the most commonly used measure of central tendency, also called the arithmetic average (sum of data points divided by the number of points)
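The three measures can be checked with Python's standard `statistics` module. The leaf lengths below are invented for illustration.

```python
import statistics

# Hypothetical leaf lengths in cm
leaf_lengths = [3, 5, 5, 6, 8, 9, 13]

mode_value = statistics.mode(leaf_lengths)      # most frequent value
median_value = statistics.median(leaf_lengths)  # middle value of the sorted data
mean_value = statistics.mean(leaf_lengths)      # sum divided by the number of points

print(mode_value, median_value, mean_value)  # 5 6 7
```

Note that the three measures differ here because the one long leaf (13 cm) pulls the mean upward without moving the median or the mode.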
• How can leaf lengths be displayed graphically?
• Simply measure the lengths of each and plot how many are of each length
• If smoothed, the histogram data assumes this shape
• This Shape?
• Is a classic bell-shaped curve, AKA a Normal Distribution curve.
• Essentially it means that in many studies with an adequate number of datapoints (>30), most results tend to be near the mean, and fewer results are found farther from the mean
• Standard Deviation
• is a statistic that tells you how tightly all the various examples are clustered around the mean in a set of data
• Is a more sophisticated indicator of the precision of a set of a given number of measurements
• It is like an average deviation of measurement values from the mean. In large studies, the standard deviation is used to draw error bars, instead of the maximum deviation.
• A typical normal distribution curve
• According to this curve :
• One standard deviation away from the mean in either direction on the horizontal axis (the red area on the preceding graph) accounts for somewhere around 68 percent of the data in this group.
• Two standard deviations away from the mean (the red and green areas) account for roughly 95 percent of the data.
• Three Standard Deviations?
• Three standard deviations (the red, green and blue areas) account for about 99.7 percent of the data
Figure: normal curve with the horizontal axis marked at -3 sd, -2 sd, -1 sd, +1 sd, +2 sd and +3 sd
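These percentages can be reproduced for an ideal normal distribution using the error function from Python's `math` module. The helper name `fraction_within` is made up for this sketch; real data will only approximate these values.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sd: {fraction_within(k):.1%}")
# within 1 sd: 68.3%
# within 2 sd: 95.4%
# within 3 sd: 99.7%
```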
• How is Standard Deviation calculated?
• With this formula!
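The formula itself is not reproduced in this transcript. Assuming the slide showed the "n-1" sample form that the Excel STDEV instructions later in this presentation refer to, it would be written as below, where x_i is each individual value, x-bar is the mean, and n is the number of values.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}{n - 1}}
```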
• DO I NEED TO KNOW THIS FOR THE TEST?????
• Not the formula!
• This can be calculated on a scientific calculator
• OR…. In Microsoft Excel, type the following code into the cell where you want the Standard Deviation result, using the "unbiased," or "n-1" method: =STDEV(A1:A30) (substitute the cell name of the first value in your dataset for A1, and the cell name of the last value for A30.)
• OR….Try this! http://www.pages.drexel.edu/~jdf37/mean.htm
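As another option, Python's standard `statistics` module offers the same "n-1" calculation through `statistics.stdev`. The sketch below uses invented data and also shows `statistics.pstdev`, which divides by n instead, to make the difference visible.

```python
import statistics

# Hypothetical dataset
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_sd = statistics.stdev(data)       # "n-1" method, comparable to Excel's STDEV
population_sd = statistics.pstdev(data)  # divides by n instead of n-1

print(f"sample sd = {sample_sd:.3f}")          # sample sd = 2.138
print(f"population sd = {population_sd:.3f}")  # population sd = 2.000
```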
• You DO need to know the concept!
• standard deviation is a statistic that tells how tightly all the various datapoints are clustered around the mean in a set of data.
• When the datapoints are tightly bunched together and the bell-shaped curve is steep, the standard deviation is small (precise results, smaller sd).
• When the datapoints are spread apart and the bell curve is relatively flat, a large standard deviation value suggests less precise results
• Here is an example
• Comparison of Two Samples: two samples of the same type of mollusk from two different locations
• Mollusk shell measurements
• Example: The two mollusk shell samples above
• Population 1. Mean = 31.4 Standard deviation (s) = 5.7
• Population 2. Mean = 41.6 Standard deviation (s) = 4.3
• Standard deviation in the error bar
• A sample with a small standard deviation suggests narrow variation (Pop. 2).
• The second population has a greater mean shell length but slightly narrower variation. Why this is the case would require further observation and experiment on environmental and genetic factors.
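As a rough check on the two mollusk populations, the sketch below builds mean +/- 1 standard deviation error bars from the figures above and tests whether they overlap. The function names are hypothetical, and non-overlapping bars only suggest a difference; a formal test is still needed to call it significant.

```python
def sd_interval(mean, sd):
    """Error bar running from (mean - sd) to (mean + sd)."""
    return (mean - sd, mean + sd)


def intervals_overlap(a, b):
    """True if the two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


pop1 = sd_interval(31.4, 5.7)  # roughly 25.7 to 37.1
pop2 = sd_interval(41.6, 4.3)  # roughly 37.3 to 45.9

print(intervals_overlap(pop1, pop2))  # False
```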
• Statistical hypothesis testing (null hypothesis)
• Is used to describe some aspect of the statistical set of data (like our mollusk data).
• We will use this to compare the two data sets
• The question that we might now ask is:
• Null Hypothesis: Is there no significant difference between the two samples, other than that caused by chance selection of data?
• OR
• Alternative hypothesis: Is there a significant difference between the height of shells in sample A and sample B?
• Comparison of Two Samples Using a t-Test
• t-test: The t-test compares the averages and standard deviations of two samples to see if there is a significant difference between them. We start by calculating a number, t, which can be calculated using the equation:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Where:
$\bar{x}_1$ is the mean of sample 1
$s_1$ is the standard deviation of sample 1
$n_1$ is the number of individuals in sample 1
$\bar{x}_2$ is the mean of sample 2
$s_2$ is the standard deviation of sample 2
$n_2$ is the number of individuals in sample 2
• This is done on a calculator or in Excel
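The calculation of t can also be sketched in Python from the summary statistics alone. The means and standard deviations are the mollusk values given earlier; the sample size of 10 per population is an assumption made for this example, since the slides do not state it at this point.

```python
import math


def t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """t = (mean1 - mean2) / sqrt(sd1^2/n1 + sd2^2/n2)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Mollusk summary values from the slides; n = 10 is assumed, not given
t = t_statistic(31.4, 5.7, 10, 41.6, 4.3, 10)
print(f"t = {t:.2f}")  # t = -4.52
```

The sign of t only reflects which sample was listed first; its size is what is compared against the critical value.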
• Drawing conclusions
• 1. State the null hypothesis and the alternative hypothesis based on your research question. Null Hypothesis: 'There is no significant difference between the height of shells in sample A and sample B.' Alternative Hypothesis: 'There is a significant difference between the height of shells in sample A and sample B.'
• 2. Set the critical P level (your cutoff %) at P = 0.05 (5%)
• 3. Write the decision rule for rejecting the null hypothesis.
• If P > 5% then there is no significant difference between the two sets (i.e. do not reject the null hypothesis).
• If P < 5% then the two sets are significantly different (i.e. reject the null hypothesis).
• 4. Write a summary statement based on the decision.
• The null hypothesis is rejected, since the calculated P = 0.003 is less than the critical P = 0.05 (two-tailed test)
• 5. Write a statement of results in standard English.
• There is a significant difference between the height of shells in sample A and sample B.
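The decision rule in steps 2 to 5 can be written as a small Python function. This is a sketch only: the function name and the wording of the returned statements are made up, and the 0.05 cutoff is the one set in step 2.

```python
def decide(p_value, critical_p=0.05):
    """Apply the decision rule: reject the null hypothesis when P is below the cutoff."""
    if p_value < critical_p:
        return "reject the null hypothesis: there is a significant difference"
    return "do not reject the null hypothesis: no significant difference shown"


# P = 0.003 is the calculated value quoted in step 4
print(decide(0.003))  # reject the null hypothesis: there is a significant difference
print(decide(0.20))   # do not reject the null hypothesis: no significant difference shown
```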
• Steps for calculating a t-test in Excel
• Pick a cell where you want your t-test result displayed
• Go to fx (Insert Function)
• In fx, under Category pick Statistical, and then find the t test function (TTEST) under Select a function
• Array 1 is the population 1 data, cells B3 to B12
• Array 2 is the population 2 data, cells C3 to C12
• Tails: in Biology, the number of tails is always 2
• Type can be 1 (paired), 2 (two-sample, equal variance) or 3 (two-sample, unequal variance)
• What are tails in a t test ?
• Think of it as one of the two sides of the graph. A one- or two-tailed t-test is determined by whether the total area of α (the significance level) is placed in one tail or divided equally between the two tails. The one-tailed t-test is performed if the results are interesting only if they turn out in a particular direction.
Figure: one-tailed and two-tailed distributions
• What are the 3 types?
• Type 1: Let’s say you wanted to test whether heart rate increased after drinking a cup of hot sauce or whether plant growth would increase after adding fertilizer to pots of soil. In these cases you would be comparing the heart rate of the same people, or the growth of the same pot of plants, before and after the treatment. This would require a "paired" or "dependent" T test. Excel calls this a "type 1" test.
• Say you want to know whether nursing students consume more coffee than do biology students. You would then have two groups of test subjects rather than taking 2 measurements on each person. Now you would use an "unpaired" or "independent" T-test. Excel calls these "type 2" or "type 3" tests.
• Type 2: Do the data seem homogeneous (similar)? Homogeneity measures the differences or similarities between sets of data, for example when the data from several studies are combined. Use type 2 when the two samples appear to have equal variance.
Type 2 & 3
• Type 3
• Now the tricky part is to decide which of these to use. Are the standard deviations about the same for both groups, or are they different? You can test this statistically, but let’s just work with how they seem. If in doubt, go with "type 3" for unequal variances.
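To see why the choice of type matters, the sketch below computes the t statistic two ways on the same numbers: once as a paired test (type 1) and once as an unpaired, unequal-variance test (type 3). The heart-rate values and function names are invented for illustration, and only the t statistic is calculated here, not the P value that Excel reports.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t: mean of the per-subject differences over its standard error."""
    diffs = [a - b for a, b in zip(after, before)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def unpaired_t(sample1, sample2):
    """Unpaired t for unequal variances (the same formula as the t equation in the slides)."""
    standard_error = math.sqrt(
        statistics.variance(sample1) / len(sample1)
        + statistics.variance(sample2) / len(sample2)
    )
    return (statistics.mean(sample1) - statistics.mean(sample2)) / standard_error


# Hypothetical heart rates of the same five people before and after a treatment
before = [70, 72, 68, 75, 71]
after = [74, 75, 70, 79, 73]

print(f"paired t   = {paired_t(before, after):.2f}")    # paired t   = 6.71
print(f"unpaired t = {unpaired_t(after, before):.2f}")  # unpaired t = 1.61
```

The paired value is much larger because comparing each person with themselves removes the variation between people, which is why the paired test should be used whenever the same subjects are measured twice.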
• You now have a t-test result (Excel returns the probability, P)
• Convert the result to a percent by
• selecting the cell with the result
• right-clicking
• picking Format Cells
• selecting Percentage
• and you are done
• 0.3% (P = 0.003) is less than 5% (P = 0.05)
• Correlation or Co-relation
• Refers to the departure of two random variables from independence (that they may be interconnected).
• Correlations can indicate the potential existence of causal relations. However, the causes underlying the correlation, if any, may be indirect and unknown, and high correlations also overlap with identity relations, where no causal process exists. Consequently, establishing a correlation between two variables is not a sufficient condition to establish a causal relationship (in either direction).
• Scatterplots
• The concept of correlation can be demonstrated by using scatterplots .
• A scatterplot is a graph of data points for two variables, with one variable on each axis.
• The data points are plotted in the field of the graph according to their values for each variable. This produces a "scatter" of points; a narrower scatter pattern occurs when the correlation is high.
• Negative 1 correlation
• Coefficient of -1.00
• A correlation coefficient of -1.00 means that every subject’s scores are exactly the same standardized distance, but in opposite directions, from the means of both variables
• As the value of %Fat increases (i.e., as you move from left to right on the X axis), the value of Y decreases (i.e., moves toward the bottom on the Y axis).
• A correlation coefficient of 0 means that the two variables, age and height, are unrelated to one another
• Correlation Coefficient +1.00
• Positive coefficient (+1.00)
• A correlation coefficient of +1.00 means that every subject’s scores are exactly the same standardized distance, and in the same direction, from the means of both variables
• Taller people tend to weigh more;
• shorter people tend to weigh less .
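The correlation coefficient discussed above can be computed directly. The sketch below is a plain-Python version of the usual Pearson coefficient; the height and weight values are invented so that the relationship is perfectly linear, which real data would not be.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(ss_x * ss_y)


# Hypothetical, perfectly linear data
heights = [150, 160, 170, 180]
weights = [50, 60, 70, 80]

r_positive = pearson_r(heights, weights)        # very close to +1.00
r_negative = pearson_r(heights, weights[::-1])  # very close to -1.00
print(round(r_positive, 2), round(r_negative, 2))
```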
• Creating a Scatter graph in Excel
• Highlight your numbers and select Chart Wizard
• In Chart Wizard, choose Scatter graph
• Label your graph and the x and y axes
• To add a trendline, RIGHT click on any data point
• Choose Linear
• You can now copy and paste it into Word for your lab
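For readers curious about what a linear trendline represents, here is a minimal sketch assuming it is an ordinary least-squares line of best fit. The function name and the data points are hypothetical.

```python
def linear_trendline(xs, ys):
    """Least-squares slope and intercept of the best-fit line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points lying exactly on y = 2x + 1
slope, intercept = linear_trendline([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```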
• Causation
• "Correlation does not imply causation" is a phrase used in science to emphasize that correlation between two variables does not automatically imply that one causes the other.