1. Biostatistics Basics
    An introduction to an expansive and complex field
    © 2006
2. Common statistical terms
    • Data
      – Measurements or observations of a variable
    • Variable
      – A characteristic that is observed or manipulated
      – Can take on different values
    Evidence-based © 2006
3. Statistical terms (cont.)
    • Independent variables
      – Precede dependent variables in time
      – Are often manipulated by the researcher
      – Are the treatment or intervention used in a study
    • Dependent variables
      – Are what is measured as an outcome in a study
      – Have values that depend on the independent variable
4. Statistical terms (cont.)
    • Parameters
      – Summary data from a population
    • Statistics
      – Summary data from a sample
5. Populations
    • A population is the group from which a sample is drawn
      – e.g., headache patients in a chiropractic office; automobile crash victims in an emergency room
    • In research, it is not practical to include all members of a population
    • Thus, a sample (a subset of a population) is taken
6. Random samples
    • Subjects are selected from a population so that each individual has an equal chance of being selected
    • Random samples tend to be representative of the source population
    • Non-random samples may not be representative
      – They may be biased with respect to age, severity of the condition, socioeconomic status, etc.
7. Random samples (cont.)
    • Random samples are rarely utilized in health care research
    • Instead, patients are randomly assigned to treatment and control groups
      – Each person has an equal chance of being assigned to either of the groups
    • Random assignment is also known as randomization
8. Descriptive statistics (DSs)
    • A way to summarize data from a sample or a population
    • DSs illustrate the shape, central tendency, and variability of a set of data
      – The shape of the data reflects how frequently each value is observed
9. DSs (cont.)
      – Central tendency describes the location of the middle of the data
      – Variability (a.k.a. dispersion) is the extent to which values are spread above and below the middle values
    • DSs can be distinguished from inferential statistics
      – DSs cannot test hypotheses
10. Hypothetical study data (partial from book)
    Case #   Visits
    1        7
    2        2
    3        2
    4        3
    5        4
    6        3
    7        5
    8        3
    9        4
    10       6
    11       2
    12       3
    13       7
    14       4
    • The distribution provides a summary of:
      – Frequencies of each of the values
        • 2 visits: 3 cases
        • 3 visits: 4 cases
        • 4 visits: 3 cases
        • 5 visits: 1 case
        • 6 visits: 1 case
        • 7 visits: 2 cases
      – Ranges of values
        • Lowest = 2
        • Highest = 7
      – etc.
11. Frequency distribution table
    Visits   Frequency   Percent   Cumulative %
    2        3           21.4      21.4
    3        4           28.6      50.0
    4        3           21.4      71.4
    5        1           7.1       78.6
    6        1           7.1       85.7
    7        2           14.3      100.0
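The frequency table can be rebuilt from the 14 visit counts in the hypothetical study data. Below is a minimal Python sketch using only the standard library; the function name `frequency_table` is illustrative and not from the source. It derives each cumulative percentage from a running count rather than by adding up rounded percentages, which avoids rounding drift.

```python
from collections import Counter

# Visits for cases 1-14, transcribed from the hypothetical study data
visits = [7, 2, 2, 3, 4, 3, 5, 3, 4, 6, 2, 3, 7, 4]

def frequency_table(values):
    """Return (value, frequency, percent, cumulative percent) rows sorted by value."""
    counts = Counter(values)
    n = len(values)
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        # Cumulative % comes from the running count, not from summing rounded percents
        rows.append((value, counts[value], 100 * counts[value] / n, 100 * running / n))
    return rows

for value, freq, pct, cum in frequency_table(visits):
    print(f"{value:<8} {freq:<11} {pct:<9.1f} {cum:.1f}")
```

For the visit data, this gives cumulative percentages of 78.6 for 5 visits and 85.7 for 6 visits, ending at exactly 100.0.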
12. Frequency distributions are often depicted by a histogram
13. Histograms (cont.)
    • A histogram is a type of bar chart, but there are no spaces between the bars
    • Histograms are used to visually depict frequency distributions of continuous data
    • Bar charts are used to depict categorical information
      – e.g., Male–Female, Mild–Moderate–Severe, etc.
14. Measures of central tendency
    • Mean (a.k.a. average)
      – The most commonly used DS
    • To calculate the mean
      – Add all values of a series of numbers and then divide by the total number of values
15. Formula to calculate the mean
    • Mean of a sample: $\bar{X} = \frac{\sum X}{n}$
    • Mean of a population: $\mu = \frac{\sum X}{N}$
    • $\bar{X}$ (X bar) refers to the mean of a sample and $\mu$ (mu) refers to the mean of a population
    • $\Sigma$ (sigma) is the summation sign: $\sum X$ adds all of the X values
    • n is the total number of values in a sample, and N is the same for a population
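The formula translates directly into code. This is a minimal sketch; the `mean` helper is an illustrative name, not from the source. The arithmetic is identical for a sample and a population; only the symbols (X bar and n versus mu and N) differ.

```python
def mean(values):
    """Arithmetic mean: the sum of the values (sigma X) divided by their count (n or N)."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

# Visit counts from the hypothetical study data (14 cases, total of 55 visits)
visits = [7, 2, 2, 3, 4, 3, 5, 3, 4, 6, 2, 3, 7, 4]
print(round(mean(visits), 2))  # 3.93
```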
16. Measures of central tendency (cont.)
    • Mode
      – The most frequently occurring value in a series
      – The modal value corresponds to the highest bar in a histogram
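Finding the mode amounts to counting how often each value occurs. In the sketch below, the helper `modes` is hypothetical; it returns every value tied for the highest count, because a series can have more than one mode.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

visits = [7, 2, 2, 3, 4, 3, 5, 3, 4, 6, 2, 3, 7, 4]
print(modes(visits))  # [3]
```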
17. Measures of central tendency (cont.)
    • Median
      – The value that divides a series of values in half when they are all listed in order
      – When there is an odd number of values
        • The median is the middle value
      – When there is an even number of values
        • Count from each end of the series toward the middle and then average the 2 middle values
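The two median rules (odd versus even number of values) fit in a few lines. `median` here is an illustrative helper; Python's standard `statistics.median` applies the same rules.

```python
def median(values):
    """Middle value of the sorted series, or the average of the 2 middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average the 2 middle values

visits = [7, 2, 2, 3, 4, 3, 5, 3, 4, 6, 2, 3, 7, 4]
print(median(visits))     # 3.5 (the 7th and 8th sorted values are 3 and 4)
print(median([5, 1, 3]))  # 3
```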
18. Measures of central tendency (cont.)
    • Each of the three measures of central tendency has certain advantages and disadvantages
    • Which measure should be used?
      – It depends on the type of data being analyzed (e.g., categorical or continuous) and on the level of measurement involved
19. Levels of measurement
    • There are 4 levels of measurement
      – Nominal, ordinal, interval, and ratio
    1. Nominal
      – Data are coded by a number, name, or letter that is assigned to a category or group
      – Examples
        • Gender (e.g., male, female)
        • Treatment preference (e.g., manipulation, mobilization, massage)
20. Levels of measurement (cont.)
    2. Ordinal
      – Is similar to nominal because the measurements involve categories
      – However, the categories are ordered by rank
      – Examples
        • Pain level (e.g., mild, moderate, severe)
        • Military rank (e.g., lieutenant, captain, major, colonel, general)
21. Levels of measurement (cont.)
    • Ordinal values only describe order, not quantity
      – Thus, severe pain is not the same as 2 times mild pain
    • Nominal and ordinal data allow only limited mathematical operations
      – Counting the categories (e.g., 25 males and 30 females)
      – For ordinal data, also greater-than or less-than comparisons
22. Levels of measurement (cont.)
    3. Interval
      – Measurements are ordered (like ordinal data)
      – Intervals between values are equal
      – There is no true zero
      – Examples
        • The Fahrenheit scale, where 0° does not correspond to an absence of heat (no true zero)
        • In contrast, the Kelvin scale does have a true zero
23. Levels of measurement (cont.)
    4. Ratio
      – Measurements have equal intervals
      – There is a true zero
      – Ratio is the most advanced level of measurement; it permits addition, subtraction, multiplication, and division
24. Levels of measurement (cont.)
    • Ratio examples
      – Range of motion
        • No movement corresponds to zero degrees
        • The interval between 10 and 20 degrees is the same as between 40 and 50 degrees
      – Lifting capacity
        • A person who is unable to lift scores zero
        • A person who lifts 30 kg can lift twice as much as one who lifts 15 kg
25. Levels of measurement (cont.)
    • NOIR is a mnemonic to help remember the names and order of the levels of measurement
      – Nominal, Ordinal, Interval, Ratio
26. Levels of measurement (cont.)
    Scale      Permissible mathematical operations                   Best measure of central tendency
    Nominal    Counting                                              Mode
    Ordinal    Greater-than or less-than operations                  Median
    Interval   Addition and subtraction                              Symmetrical: mean; skewed: median
    Ratio      Addition, subtraction, multiplication, and division   Symmetrical: mean; skewed: median
27. The shape of data
    • Histograms of frequency distributions have a shape
    • Distributions are often symmetrical, with most scores falling in the middle and fewer toward the extremes
    • Many biological measurements are roughly symmetrically distributed and approximate a normal curve (a.k.a. bell-shaped curve)
28. The shape of data (cont.)
    (figure: line depicting the shape of the data)
29. The normal distribution
    • Data whose frequency distribution follows a normal curve have a normal distribution (a.k.a. Gaussian distribution)
    • Properties of a normal distribution
      – It is symmetric about its mean
      – The highest point is at its mean
      – The height of the curve decreases as one moves away from the mean in either direction, approaching, but never reaching, zero
30. The normal distribution (cont.)
    • A normal distribution is symmetric about its mean
    • The highest point of the overlying normal curve is at the mean
    • As one moves away from the mean in either direction, the height of the curve decreases, approaching, but never reaching, zero
31. The normal distribution (cont.)
    • Mean = Median = Mode
32. Skewed distributions
    • The data are not distributed symmetrically in skewed distributions
      – Consequently, the mean, median, and mode are not equal and are in different positions
      – Scores are clustered at one end of the distribution
      – A small number of extreme values lie in the tail at the opposite end
33. Skewed distributions (cont.)
    • Skew is always toward the direction of the longer tail
      – Positive if skewed to the right
      – Negative if skewed to the left
    • Of the mean, median, and mode, the mean is shifted the most toward the tail
34. Skewed distributions (cont.)
    • Because the mean is shifted so much, it is not the best estimate of the typical score in a skewed distribution
    • The median is a better estimate of the center of skewed distributions
      – It is the central point of any distribution
      – 50% of the values are above and 50% below the median
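To see the mean being pulled toward the tail, here is a sketch on a small, hypothetical right-skewed data set (not from the source), using Python's standard `statistics` module.

```python
import statistics

# Hypothetical right-skewed data: most values are small, one extreme value sits in the right tail
values = [2, 3, 3, 4, 4, 4, 5, 5, 30]

print(round(statistics.mean(values), 2))  # 6.67, pulled toward the long tail
print(statistics.median(values))          # 4
print(statistics.mode(values))            # 4
```

Removing the single value of 30 would drop the mean to 3.75 while leaving the median at 4, which is why the median is preferred for skewed data.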
35. More properties of normal curves
    • About 68.3% of the area under a normal curve is within one standard deviation (SD) of the mean
    • About 95.4% is within two SDs
    • About 99.7% is within three SDs
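These percentages can be checked numerically. For a normal curve, the proportion of the area within k SDs of the mean equals erf(k/√2). The sketch below uses Python's `math.erf`; the helper name `proportion_within` is illustrative.

```python
import math

def proportion_within(k):
    """Share of the area under a standard normal curve lying within k SDs of the mean."""
    # P(-k < Z < k) = erf(k / sqrt(2)) for a standard normal variable Z
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{100 * proportion_within(k):.2f}%")
# 1 68.27%
# 2 95.45%
# 3 99.73%
```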
36. More properties of normal curves (cont.)
37. Standard deviation (SD)
    • SD is a measure of the variability of a set of data
    • The mean represents the average of a group of scores, with some of the scores above the mean and some below
      – How widely the scores are scattered around the mean is referred to as variability or spread
    • Variance ($S^2$) is another measure of spread
38. SD (cont.)
    • In effect, SD reflects the typical amount by which scores in a distribution deviate from the mean
    • The next slide shows a group of 10 patients whose mean age is 40 years
      – Some are older than 40 and some younger
39. SD (cont.)
    • Ages are spread out along an X axis
    • The amount the ages are spread out is known as dispersion or spread
40. Distances ages deviate above and below the mean
    • Adding the deviations always gives zero
41. Calculating $S^2$
    • To find the average deviation, one would normally total the deviations above and below the mean and then divide by the number of values
    • However, that total always equals zero
      – The deviations must first be squared, which cancels the negative signs
42. Calculating $S^2$ (cont.)
    • S is the symbol for the SD of a sample; σ is used for a population
    • $S^2$ is not in the same units as the data (age), but the SD, its square root, is
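The steps above (deviations that sum to zero, squaring, averaging, taking the square root) can be sketched in Python. The slide's ten ages appear only in a figure, so the ages below are hypothetical, chosen to have a mean of 40. The sketch divides by n − 1 for the sample variance S², a common convention assumed here because the slide's formula is not reproduced, and by N for a population variance σ².

```python
import math

# Hypothetical ages for 10 patients with a mean of 40 (not the slide's actual values)
ages = [25, 30, 35, 38, 40, 40, 42, 45, 50, 55]

n = len(ages)
mean_age = sum(ages) / n
deviations = [x - mean_age for x in ages]
print(sum(deviations))  # 0.0: raw deviations always cancel out

squared = [d ** 2 for d in deviations]    # squaring removes the negative signs
sample_variance = sum(squared) / (n - 1)  # S^2, in squared units (years^2); n - 1 is an assumption
sample_sd = math.sqrt(sample_variance)    # S, back in the original units (years)
population_variance = sum(squared) / n    # sigma^2, if these ages were a whole population

print(round(sample_variance, 2), round(sample_sd, 2))  # 78.67 8.87
```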
43. Calculating SD with Excel
    • Enter values in a column
44. SD with Excel (cont.)
    • Click Data Analysis on the Tools menu
45. SD with Excel (cont.)
    • Select Descriptive Statistics and click OK
46. SD with Excel (cont.)
    • Click the Input Range icon
47. SD with Excel (cont.)
    • Highlight all the values in the column
48. SD with Excel (cont.)
    • If the first row contains a label, check Labels in first row
    • Check Summary Statistics
    • Click OK
49. SD with Excel (cont.)
    • SD is calculated precisely, plus several other DSs
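Without Excel, a comparable summary can be produced with Python's standard `statistics` module. This is a sketch, not a replica of Excel's output: the `describe` helper and its row labels are illustrative, and `statistics.stdev` and `statistics.variance` use the n − 1 (sample) denominator, as Excel's STDEV function does.

```python
import statistics

def describe(values):
    """A small descriptive summary, loosely modeled on a spreadsheet summary table."""
    return {
        "Mean": statistics.mean(values),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        "Standard Deviation": statistics.stdev(values),  # sample SD (n - 1)
        "Sample Variance": statistics.variance(values),  # sample variance (n - 1)
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": len(values),
    }

visits = [7, 2, 2, 3, 4, 3, 5, 3, 4, 6, 2, 3, 7, 4]
for name, value in describe(visits).items():
    print(f"{name:20} {value}")
```

On the visit data, this gives a mean of about 3.93, a median of 3.5, a mode of 3, and a sample SD of about 1.73.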
50. Wide spread results in higher SDs; narrow spread results in lower SDs
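A numeric version of this point, using two hypothetical groups (not from the source) that share a mean of 40 but differ in spread:

```python
import statistics

# Two hypothetical groups with the same mean (40) but different spread
wide = [20, 30, 40, 50, 60]
narrow = [36, 38, 40, 42, 44]

print(statistics.mean(wide), statistics.mean(narrow))
print(round(statistics.stdev(wide), 2), round(statistics.stdev(narrow), 2))  # 15.81 3.16
```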
51. Spread is important when comparing 2 or more group means
    • It is more difficult to see a clear distinction between the groups in the upper example because the spread is wider, even though the means are the same
52. z-scores
    • The number of SDs that a specific score is above or below the mean in a distribution
    • Raw scores can be converted to z-scores by subtracting the mean from the raw score and then dividing the difference by the SD
      $z = \frac{X - \mu}{\sigma}$
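The conversion is a one-line function. In this sketch, `z_score` is an illustrative name, and the mean of 40 and SD of 10 in the usage lines are hypothetical values.

```python
def z_score(x, mean, sd):
    """Number of SDs that raw score x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("SD must be positive")
    return (x - mean) / sd

# Hypothetical example: mean age 40, SD 10
print(z_score(55, 40, 10))  # 1.5
print(z_score(30, 40, 10))  # -1.0
```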
53. z-scores (cont.)
    • Standardization
      – The process of converting raw scores to z-scores
      – The resulting distribution of z-scores always has a mean of zero, an SD of one, and an area under the curve equal to one
    • The proportion of scores that are higher or lower than a specific z-score can be determined by referring to a z-table
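Standardization can be checked directly: after converting every score, the z-scores have a mean of zero and an SD of one. The sketch assumes the sample SD (n − 1) is used both to standardize and to check the result; using the population SD throughout would give the same outcome. The ages and the `standardize` helper are hypothetical.

```python
import statistics

def standardize(values):
    """Convert raw scores to z-scores using the mean and SD of the same data."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1), an assumption for this sketch
    if sd == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(x - mu) / sd for x in values]

# Hypothetical ages with a mean of 40
ages = [25, 30, 35, 38, 40, 40, 42, 45, 50, 55]
z = standardize(ages)

print(round(statistics.mean(z), 10))   # 0.0
print(round(statistics.stdev(z), 10))  # 1.0
```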
54. z-scores (cont.)
    • Refer to a z-table to find the proportion under the curve
55. z-scores (cont.)
    Partial z-table (to z = 1.5) showing proportions of the area under a normal curve for different values of z
    (rows: z to one decimal place; columns: second decimal place of z; entries: area to the left of z)
    z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
    0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
    0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
    0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
    0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
    0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
    0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
    0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
    0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
    0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
    0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
    1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
    1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
    1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
    1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
    1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
    1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
    • 0.9332 (z = 1.50) corresponds to the area under the curve in black
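Each entry in a z-table like this one is the cumulative area to the left of z, which can be computed from the error function as Φ(z) = ½(1 + erf(z/√2)). The sketch below uses Python's `math.erf`; the name `phi` is illustrative. The proportion of scores above z is 1 − Φ(z).

```python
import math

def phi(z):
    """Standard normal cumulative proportion: the area under the curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(f"{phi(1.5):.4f}")      # 0.9332, matching the table entry for z = 1.50
print(f"{phi(0.0):.4f}")      # 0.5000
print(f"{1 - phi(1.5):.4f}")  # 0.0668, the proportion above z = 1.5
```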