3. Objectives
At the end of this session, students should be able to:
Define statistics/biostatistics
Understand the importance of Biostatistics in
biomedical research
Differentiate between population and sample
Understand the different types of data
2011.3.6
3
4. 4
Statistics
•The science of collecting, analyzing, presenting,
and interpreting data. —(Encyclopaedia Britannica
2009) http://www.britannica.com/
•Branch of mathematics that deals with the
collection, organization, and analysis of numerical
data and with such problems as experiment design
and decision making. —(Microsoft Encarta
Premium 2009)
5. 5
•A science dealing with the collection, analysis,
interpretation, and presentation of masses of
numerical data. —(Webster's International Dictionary)
•The science and art of dealing with variation in
data through collection, classification, and analysis
in such a way as to obtain reliable results. —(John
M. Last, A Dictionary of Epidemiology )
6. 6
• The science of the collection, organization, and
interpretation of data. It deals with all aspects of
this, including the planning of data collection in
terms of the design of surveys and experiments.
—(From Wikipedia, the free encyclopedia)
http://en.wikipedia.org/wiki/Statistics
7. 7
Biostatistics/Medical Statistics/Health
Statistics
• deals with applications of statistics to medicine and
the health sciences, including epidemiology, public
health, forensic medicine, and clinical research.
• Medical Statistics has been a recognized branch of
statistics in the UK for more than 40 years but the
term does not appear to have come into general use
in North America, where the wider term
'biostatistics' is more commonly used.
• The refers to Biostatistics as Health Statistics
8. 8
Why we need to study Medical Statistics?
Three reasons:
(1) Basic requirement of medical research.
(2) Update your medical knowledge.
(3) Data management and treatment.
9. 9
I. Basic concepts
• Homogeneity: All individuals have similar values
or belong to same category.
Example: all individuals are Sierra Leoneans,
women, middle age (30~40 years old), work in a
computer factory ---- homogeneity in nationality,
gender, age and occupation.
• Variation: the differences in feature, voice…
1. Homogeneity and Variation
10. Data Is Everywhere!
• Data is utilized and summarized frequently in
research literature
• From Archives of Surgeryarticle, August 2000:
• “Hypothesis: Surgeon-directed institutional peer review,
associated with positive physician feedback, can decrease the
morbidity and mortality rates associated with carotid
endarterectomy.”
• “Results: Stroke rate decreased from 3.8%(1993-1994) to
0%(1997-1998). The mortality rate decreased from 2.8%(1993-
1994) to 0%(1997-1998). (Average) length of stay decreased
from 4.7 days(1993-1994) to 2.6 days(1997-1998). The (average)
total cost decreased from $13,344(1993-1994) to $9,548(1997-
1998).”
• Source: Olcott IV, C., et al. (2004). Institutional peer review can reduce the risk and cost of carotid endarterectomy Arch
10
11. Data is utilized and summarized with statistics
frequently in popular media
• From cnn.com, Monday July 8th, 2008:
“For the first time, an influential doctors group is
recommending that some children as young as
eight be given cholesterol-fighting drugs to ward
off future heart problems . . . With one-third of U.S.
children overweight and about 17 percent obese,
the new recommendations are important,‟ said Dr.
Jennifer Li, a Duke University children's heart
specialist.”
11
12. • From Washington Post, June 27th, 2008:
• “The number of young homosexual men being newly
diagnosed with HIV infection is rising by 12 percent a
year, with the steepest upward trend in young black
men, according to a new report.”
12
13. Data Provides Information
• Good data can be analyzed and summarized to
provide useful information
• Bad data can be analyzed and summarized to provide
incorrect/harmful/non-informative information
13
14. Steps in a Research Project
Planning/design of study
Data collection
Data analysis
Presentation
Interpretation
• Biostatistics CAN play a role in each of these
steps! (but sometimes is only called upon for
the data analysis part)
14
15. Biostatistics Issues
Planning/design of studies
– Primary question(s) of interest:
Quantifying information about a single group?
Comparing multiple groups?
– Sample size
How many subjects needed in total?
How many in each of the groups to be compared?
– Selecting study participants
Randomly chosen from “master list?”
• Selected from a pool of interested persons?
• Take whoever shows up? 2011.3.6
15
16. Biostatistics Issues
Data collection
Data analysis
What statistical methods are appropriate given the data
collected?
Dealing with variability (both natural and sampling related):
Important patterns in data are obscured by variability
Distinguish real patterns from random variation
• Inference: using information from the single study coupled with
information about variability to make statement about the larger
population/process of interest
16
17. Biostatistics Issues
Presentation
• What summary measures will best convey the “main
messages” in the data about the primary (and
secondary) research questions of interest
• How to convey/ rectify uncertainty in estimates based
on the data
Interpretation
• What do the results mean in terms of practice, the
program, the population etc.?
17
18. 1954 Salk Polio Vaccine Trial
18
•Source: Meier, P. (1972), “The Biggest Public Health Experiment Ever: The
1954 Field Trial of the Salk Poliomyelitis Vaccine,” InJ. Tanur
(Editor),Statistics: A Guide to the Unknown.Holden-Day.
19. Design: Features of the Polio Trial
• Comparison group
• Randomized
• Placebo controls
• Double blind
Objective—the groups should be equivalent except
for the factor (vaccine) being investigated
19
20. Analysis Question
Question
There were almost twice as many polio cases in
the placebo compared to the vaccine group
Could the results be due to chance?
2011.3.6
20
21. Such Great Imbalance by Chance?
Polio cases
Vaccine—82
Placebo—162
Statistical methods tell us how to make these
probability calculations
2011.3.6
21
22. 22
• Throw a coin: The mark face may be up or down
---- variation!
• Treat the patients suffering from pneumonia
with same antibiotics: A part of them recovered
and others didn’t ---- variation!
• If there is no variation, there is no need for
statistics.
• There are many examples of variations in the
medical field:
height, weight, pulse, blood pressure, … …
23. 23
• Population: The whole collection of individuals
that one intends to study.
• Sample: A representative part of the population.
• Randomization: An important way to make the
sample representative.
Population and Sample
24. 24
limited population and limitless population
• All the cases with hepatitis B collected in a
hospital in Makeni. (limited)
• All the deaths found from the permanent
residents in Kono. (limited)
• All the rats for testing the toxicity of a medicine.
(limitless)
• All the patients for testing the effect of a
medicine. (limitless) hypertensive, diabetic, …
25. 25
Random
By chance!
• Random event: the event may occur or may not
occur in one experiment.
Before one experiment, nobody is sure whether
the event occurs or not.
Example: weather, traffic accident, …
There must be some regulation in a large number
of experiments.
26. 26
Probability
• Measure the possibility of occurrence of a random
event.
• A : random event
• P(A) : Probability of the random event A
P(A)=1, if an event always occurs.
P(A)=0, if an event never occurs.
27. 27
• Number of observations: n (large enough)
Number of occurrences of random event A: m
f(A) m/n
(Frequency or Relative frequency)
Example: Throw a coin event:
n=100, m (Times of the mark face occurred)=46
m/n=46%, this is the frequency; P(A)=1/2=50%,
this is the Probability.
Estimation of Probability----Frequency
28. 28
4. Parameter and Statistic
• Parameter : A measure of population or
A measure of the distribution of population.
Parameter is usually presented by Greek letter.
such as μ,π,σ.
-- Parameters are unknown usually
29. 29
--To know the parameter of a population, we need
a sample
• Statistic: A measure of sample
or
A measure of the distribution of sample.
Statistic is usually presented by Latin letter
such as s , p, t.
30. 30
Sampling Error
error :The difference between observed value and
true value.
Three kinds of error:
(1) Systematic error (fixed)
(2) Measurement error (random) (Observational error)
(3) Sampling error (random)
31. 31
Sampling error
• The statistics of different samples from the same
population: different from each other!
• The statistics: different from the parameter!
The sampling error exists in any sampling
research.
It can not be avoided but may be estimated.
32. 32
Types of data
1. Numerical Data ( Quantitative Data )
• The variable describe the characteristic of
individuals quantitatively
-- Numerical Data
• The data of numerical variable
-- Quantitative Data
33. 33
2. Categorical Data ( Enumeration Data )
• The variable describe the category of individuals
according to a characteristic of individuals
-- Categorical Data
• The number of individuals in each category
-- Enumeration Data
34. 34
Special case of categorical data :
Ordinal Data ( rank data )
• There exists order among all possible categories. ( level
of measurement). E.g, the stages of a tumour
-- Ordinal Data
• The data of ordinal variable, which represent the order
of individuals only
-- Rank data
35. 35
Examples
Which type of data they belong to?
• RBC (4.58 106/mcL)
• Diastolic/systolic blood pressure
(8/12 kPa) or ( 80/100 mmHg)
• Percentage of individuals with blood type A
(20%) (A, B, AB, O)
• Protein in urine (++) (-, ±, +, ++, +++)
• Incidence rate of breast cancer ( 35/100,000)
36. 36
The Basic Steps of Statistical Work
1. Design of study
• Professional design:
Research aim
Subjects,
Measures, etc.
38. 38
2. Collection of data
• Source of data
Government report system such as: cholera,
plague (black death) …
Registration system such as: birth/death
certificate …
Routine records such as: patient case report …
Ad hoc survey such as: influenza A (H1N1) …
39. 39
• Data collection – Accuracy, complete,
in time
Protocol: Place, subjects, timing; training;
pilot; questionnaire; instruments; sampling
method and sample size; budget…
Procedure: observation, interview, filling
form, letter, telephone, web.
40. 40
3. Data Sorting
• Checking
Hand, computer software
• Amend
• Missing data?
• Grouping
According to categorical variables (sex, occupation,
disease…)
According to numerical variables (age, income, blood
pressure …)
41. 41
4. Data Analysis
• Descriptive statistics (show the sample)
mean, incidence rate …
-- Table and plot
• Inferential statistics (towards the
population)
-- Estimation
-- Hypothesis test (comparison)
42. 42
About Teaching and Learning
• Aim:
Training statistical thinking
Skill of dealing with medical data.
• Emphasize:
Essential concepts and statistical thinking