Data can be collected about a population (for example, through surveys).
Data can be collected about a process (for example, through experiments).
There are two types of data:
1. Qualitative Data
Information that relates to characteristics or description (observable qualities)
Information is often grouped by descriptive category
Species of plant
Type of insect
Shades of color
Rank of flavor in taste testing
Remember: qualitative data can be “scored” and evaluated numerically
Example: qualitative data put into a graph and manipulated numerically
Survey results: teens and the need for environmental action
2. Quantitative Data
Information that is measured using a naturally occurring numerical scale
Measurements are often displayed graphically
Quantitative = Measurement
In data collection for Biology, data must be measured carefully, using laboratory equipment (e.g., timers, meter stick, pH meter, balance).
The limits of the equipment used add some uncertainty to the data collected. All equipment has a certain magnitude of uncertainty. For example, is a mass-produced ruler a good measure of 1 cm? 1 mm? 0.1 mm?
For quantitative testing, you must indicate the level of uncertainty of the tool that you are using for measurement!!
How to determine uncertainty?
As a “rule of thumb”, if not specified, use +/- 1/2 of the smallest measurement unit
In the picture above of the meter stick, the uncertainty is +/- 0.5 mm
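The rule of thumb above can be sketched in a few lines of Python. The function name and the 124.0 mm reading are made up for illustration; only the "half of the smallest unit" rule comes from these notes.

```python
def reading_uncertainty(smallest_division):
    """Rule of thumb: half of the smallest unit marked on the tool."""
    return smallest_division / 2


# A meter stick marked in millimetres has a smallest division of 1 mm.
uncertainty_mm = reading_uncertainty(1.0)

# Hypothetical reading, used only to show how the result is reported.
reading_mm = 124.0
print(f"{reading_mm} mm +/- {uncertainty_mm} mm")
```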
Looking at Data
How accurate are the data? (How close are the data to the “real” results?) A consistent departure from the real result is called BIAS.
How precise are the data? (All test systems have some uncertainty, due to limits of measurement.) Estimation of the limits of the experimental uncertainty is essential.
Once the average is calculated for each of the 2 sets of data, the average values can be plotted together on a graph to visualize the relationship between them
These two averages should be close to the same
Drawing error bars
The simplest way to draw an error bar is to use the mean as the central point, and to use the distance of the measurement that is furthest from the mean as the distance to each endpoint of the error bar
Figure labels: average value, value farthest from average, calculated distance
What do error bars suggest?
If the bars show extensive overlap, it is likely that there is not a significant difference between those values
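A minimal sketch of the simple error bar and the overlap check, in Python. The two data sets and the function names are hypothetical; the method (mean as the centre, largest deviation from the mean as the bar length) is the one described above.

```python
def simple_error_bar(values):
    """Return (mean, low end, high end) using the largest deviation from the mean."""
    mean = sum(values) / len(values)
    max_deviation = max(abs(v - mean) for v in values)
    return mean, mean - max_deviation, mean + max_deviation


def bars_overlap(bar_a, bar_b):
    """True if the two error bars share any part of their range."""
    _, low_a, high_a = bar_a
    _, low_b, high_b = bar_b
    return low_a <= high_b and low_b <= high_a


# Hypothetical measurements for two trials.
trial_a = [10, 12, 14]
trial_b = [13, 15, 17]

bar_a = simple_error_bar(trial_a)
bar_b = simple_error_bar(trial_b)
print(bar_a, bar_b, bars_overlap(bar_a, bar_b))
```

Overlapping bars only suggest that the difference may not be significant; they are a visual check, not a statistical test.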
Quick Review – 3 measures of “Central Tendency”
1. Mode: the value that appears most frequently
2. Median: when all data are listed from least to greatest, the value at which half of the observations are greater and half are lesser
3. Mean: the most commonly used measure of central tendency; the arithmetic average (sum of data points divided by the number of points)
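The three measures can be checked with Python's standard `statistics` module. The data set is invented for illustration; note how the single large value (14) pulls the mean above the median.

```python
import statistics

# Hypothetical data set, made up for illustration.
data = [4, 5, 5, 6, 7, 8, 14]

mode_value = statistics.mode(data)      # value that appears most often
median_value = statistics.median(data)  # middle value of the sorted data
mean_value = statistics.mean(data)      # sum of values / number of values

print(mode_value, median_value, mean_value)
```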
How can leaf lengths be displayed graphically?
Simply measure the length of each leaf and plot how many leaves are of each length
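As a rough sketch, the counting step can be done in Python and printed as a text histogram. The leaf lengths are hypothetical values chosen for illustration.

```python
from collections import Counter

# Hypothetical leaf lengths in millimetres.
leaf_lengths_mm = [52, 54, 55, 55, 56, 56, 56, 57, 57, 58, 60]

# Count how many leaves have each length.
counts = Counter(leaf_lengths_mm)

# One row per length; one '#' per leaf of that length.
for length in sorted(counts):
    print(f"{length} mm | {'#' * counts[length]}")
```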
If smoothed, the histogram data assumes this shape
This is the classic bell-shaped curve, AKA a Normal Distribution curve.
Essentially it means that in many studies with an adequate number of datapoints (>30), a large share of the results tend to be near the mean. Fewer results are found farther from the mean.
Standard deviation is a statistic that tells you how tightly all the various examples are clustered around the mean in a set of data
It is a more sophisticated indicator of the precision of a set of measurements
It is like an average deviation of measurement values from the mean. In large studies, the standard deviation is used to draw error bars, instead of the maximum deviation.
A typical normal distribution curve
According to this curve :
One standard deviation away from the mean in either direction on the horizontal axis (the red area on the preceding graph) accounts for about 68 percent of the data in this group.
Two standard deviations away from the mean (the red and green areas) account for roughly 95 percent of the data.
Three standard deviations?
Three standard deviations (the red, green and blue areas) account for about 99.7 percent of the data.
Horizontal axis labels: -3 sd, -2 sd, +/- 1 sd, +2 sd, +3 sd
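These percentages can be checked numerically. The sketch below assumes a perfectly normal distribution and uses the error function from Python's `math` module; the function name is made up for illustration.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sd: {fraction_within(k) * 100:.1f}%")
```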
How is Standard Deviation calculated?
With this formula!
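Assuming the "n-1" (sample) method that the Excel instructions in these notes name, the formula can be written as follows, where the x_i are the individual data points, x-bar is their mean, and n is the number of data points:

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```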
DO I NEED TO KNOW THIS FOR THE TEST?????
Not the formula!
This can be calculated on a scientific calculator
OR, in Microsoft Excel, type the following into the cell where you want the Standard Deviation result, using the "unbiased," or "n-1" method: =STDEV(A1:A30) (substitute the cell name of the first value in your dataset for A1, and the cell name of the last value for A30).
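For comparison, here is a sketch of the same "n-1" calculation in Python, done once by hand and once with `statistics.stdev`, which also uses the n-1 method. The data values are hypothetical.

```python
import math
import statistics

# Hypothetical measurements, made up for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)
squared_deviations = [(v - mean) ** 2 for v in values]

# "n-1" method: divide by one less than the number of data points.
sd_by_hand = math.sqrt(sum(squared_deviations) / (len(values) - 1))

# statistics.stdev applies the same n-1 method.
sd_library = statistics.stdev(values)

print(round(sd_by_hand, 3), round(sd_library, 3))
```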
Standard deviation is a statistic that tells how tightly all the various datapoints are clustered around the mean in a set of data.
When the datapoints are tightly bunched together and the bell-shaped curve is steep, the standard deviation is small (precise results, smaller sd).
When the datapoints are spread apart and the bell curve is relatively flat, the standard deviation is large, which suggests less precise results.
Here is an example
Comparison of Two Samples
Two samples of the same type of mollusk from two different locations
Mollusk shell measurements
Example: the two mollusk shell samples above
Population 1: mean = 31.4, standard deviation (s) = 5.7
Population 2: mean = 41.6, standard deviation (s) = 4.3
Standard deviation in the error bar
A sample with a small standard deviation suggests narrow variation (Pop. 2).
The second population has a greater mean shell length but slightly narrower variation. Why this is the case would require further observation and experiment on environmental and genetic factors.
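As an illustration, the error bars for the two mollusk populations can be computed as mean +/- one standard deviation, using the means and standard deviations given above. The function and variable names are assumptions for this sketch.

```python
def sd_error_bar(mean, sd):
    """Return the low and high ends of a mean +/- 1 sd error bar."""
    return mean - sd, mean + sd


# Values taken from the mollusk example in these notes.
pop1_low, pop1_high = sd_error_bar(31.4, 5.7)
pop2_low, pop2_high = sd_error_bar(41.6, 4.3)

# The bars overlap only if each one starts before the other ends.
bars_overlap = pop1_low <= pop2_high and pop2_low <= pop1_high

print(f"Population 1: {pop1_low:.1f} to {pop1_high:.1f}")
print(f"Population 2: {pop2_low:.1f} to {pop2_high:.1f}")
print("Bars overlap:", bars_overlap)
```

The bars here just miss each other, which hints at a real difference between the populations, but a statistical test is still needed to decide whether the difference is significant.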
Statistical hypothesis testing (null hypothesis)
It is used to describe some aspect of a statistical set of data (like our mollusk data).
We will use this to compare the two data sets
The question that we might now ask is: is there a significant difference between the two samples?
Null hypothesis: There is no significant difference between the two samples except as caused by chance selection of data.
Alternative hypothesis: There is a significant difference between the height of shells in sample A and sample B.
Comparison of Two Samples Using a t-Test
t-test
The t-test compares the averages and standard deviations of two samples to see if there is a significant difference between them. We start by calculating a number, t.
t can be calculated using the equation:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
Where:
$\bar{x}_1$ is the mean of sample 1
$s_1$ is the standard deviation of sample 1
$n_1$ is the number of individuals in sample 1
$\bar{x}_2$ is the mean of sample 2
$s_2$ is the standard deviation of sample 2
$n_2$ is the number of individuals in sample 2
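A sketch of this calculation in Python, using the mollusk means and standard deviations from the example. The sample sizes are not stated in these notes; n = 10 per sample is an assumption, taken from the ten-cell ranges (B3 to B12, C3 to C12) in the Excel instructions. The equation is the standard two-sample form, with a square root over the sum in the denominator.

```python
import math


def t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t: difference of means over the combined standard error."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Means and standard deviations from the mollusk example.
# n = 10 for each sample is an ASSUMPTION, not a value given in the notes.
t = t_statistic(31.4, 5.7, 10, 41.6, 4.3, 10)
print(round(t, 2))
```

The sign of t only reflects which sample is listed first; the size of t is what matters when looking up significance.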
This is done on a calculator or in Excel
1. State the null hypothesis and the alternative hypothesis based on your research question.
Null hypothesis: 'There is no significant difference between the height of shells in sample A and sample B.'
Alternative hypothesis: 'There is a significant difference between the height of shells in sample A and sample B.'
2. Set the critical P level (your cutoff %) at P = 0.05 (5%)
3. Write the decision rule for rejecting the null hypothesis.
If P > 5% then there is no significant difference between the two sets (i.e. do not reject the null hypothesis).
If P < 5% then the two sets are significantly different (i.e. reject the null hypothesis).
4. Write a summary statement based on the decision.
The null hypothesis is rejected since the calculated P = 0.003 is less than the critical P = 0.05 (two-tailed test).
5. Write a statement of results in standard English.
There is a significant difference between the height of shells in sample A and sample B.
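The decision rule in steps 2 to 4 can be sketched as a small Python function. The function name and wording of the returned strings are assumptions; the 0.05 cutoff and the P = 0.003 result come from the steps above.

```python
def decide(p_value, critical_p=0.05):
    """Apply the decision rule: reject the null hypothesis when P is below the cutoff."""
    if p_value < critical_p:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"


# Calculated P from the worked example above.
print(decide(0.003))
```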
Step 1: calculating a t-test in Excel
Pick a cell where you want your t-test result displayed
Go to fx; in Category pick Statistical, and then find the t-test function (TTEST) in Select a function
Array 1 is the population 1 data, cells B3 to B12
Array 2 is the population 2 data, cells C3 to C12
Tails: in Biology, tails is always 2
Type can be 1 (paired), 2 (two-sample, equal variance) or 3 (two-sample, unequal variance)
What are tails in a t test ?
Think of a tail as one of the two sides of the graph. Whether a t-test is one- or two-tailed is determined by whether the total area of α (the significance level) is placed in one tail or divided equally between the two tails. The one-tailed t-test is performed if the results are interesting only if they turn out in a particular direction.
Figure labels: One Tail, Two Tail
What are the 3 types?
Type 1: Let’s say you wanted to test whether heart rate increased after drinking a cup of hot sauce, or whether plant growth would increase after adding fertilizer to pots of soil. In these cases you would be comparing the heart rate of the same people, or the growth of the same pots of plants, before and after the treatment. This would require a "paired" or "dependent" t-test. Excel calls this a "type 1" test.
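A rough sketch of what makes a test "paired": the calculation works on the before/after difference for each subject, not on the two groups separately. The heart rates below are invented, and the formula used (mean difference divided by its standard error) is the standard paired t statistic, which is not spelled out in these notes.

```python
import math
import statistics

# Hypothetical heart rates (beats per minute) for the SAME five people.
before = [70, 72, 68, 75, 71]
after = [78, 80, 71, 85, 77]

# One difference per person: this pairing is what "type 1" refers to.
differences = [a - b for a, b in zip(after, before)]

mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # n-1 method

# Paired t: mean difference divided by its standard error.
t = mean_diff / (sd_diff / math.sqrt(len(differences)))
print(differences, round(t, 2))
```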
Say you want to know whether nursing students consume more coffee than do biology students. You would then have two groups of test subjects rather than taking 2 measurements on each person. Now you would use an "unpaired" or "independent" t-test. Excel calls these "type 2" or "type 3" tests.
Type 2: Do the data seem homogeneous (similar)? Homogeneity measures the differences or similarities between the sets of data being combined or compared; when the sets show similar variation, use type 2.
Type 2 & 3
Now the tricky part is to decide which of these to use. Are the standard deviations about the same for both groups, or are they different? You can test this statistically, but let’s just work with how they seem. If in doubt, go with "type 3".
You now have a t-test result (the P value)
Convert the result to a percent:
select the cell with the result
pick Format Cells and choose Percentage
you are done
.03% is less than the 5% cutoff
Correlation or Co-relation
Refers to the departure of two random variables from independence (that is, they may be interconnected).
Correlations can indicate the potential existence of causal relations. However, the causes underlying the correlation, if any, may be indirect and unknown, and high correlations also overlap with identity relations, where no causal process exists. Consequently, establishing a correlation between two variables is not a sufficient condition to establish a causal relationship (in either direction).
The concept of correlation can be demonstrated by using scatterplots.
A scatterplot is a graph of data points for two variables, with one variable on each axis.
The data points are plotted in the field of the graph according to their values for each variable. This produces a "scatter" of points; a narrower scatter pattern occurs when the correlation is high.
Negative 1 correlation
Coefficient of -1.00
A correlation coefficient of -1.00 means that every subject’s scores are exactly the same standardized distance, but in opposite directions, from the means of both variables
As the value of %Fat increases (i.e., as you move from left to right on the X axis), the value of Y decreases (i.e., moves toward the bottom on the Y axis).
A correlation coefficient of 0 means that the two variables, age and height, are unrelated to one another
Correlation Coefficient +1.00
Positive coefficient (+1.00)
A correlation coefficient of +1.00 means that every subject’s scores are exactly the same standardized distance and the same direction from the means for both variables
Taller people tend to weigh more; shorter people tend to weigh less.
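The three cases above (-1.00, 0 and +1.00) can be reproduced with a short Python sketch of the Pearson correlation coefficient. The function name and all data values are hypothetical, chosen so that each case comes out cleanly.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: heights in cm and three made-up second variables.
heights = [150, 160, 170, 180, 190]
rising = [50, 60, 70, 80, 90]   # increases in step with height
falling = [90, 80, 70, 60, 50]  # decreases in step with height
unrelated = [1, 2, 3, 2, 1]     # no linear trend with height

print(pearson_r(heights, rising))
print(pearson_r(heights, falling))
print(pearson_r(heights, unrelated))
```

Real data rarely give exactly +1.00 or -1.00; most coefficients fall somewhere in between.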
Creating a Scatter graph in Excel
Highlight your numbers and select Chart Wizard
In Chart Wizard choose Scatter graph
Label your graph and the x and y axes
To add a trendline, RIGHT click on any data point
You can now copy and paste it into Word for your lab
"Correlation does not imply causation" is a phrase used in science to emphasize that correlation between two variables does not automatically imply that one causes the other.