4. D Ezz Abdelfattah
What are the steps of scientific research?
1. Understanding & defining the problem
2. What to measure & how to measure?
3. Collection & editing
4. Constructing tables, graphs, estimation, looking for patterns, testing hypotheses, predicting, …
5. Interpretation, new ideas & solving the problem
Dr Ezz H. Abdelfattah
5. Previously discussed problem vs. 1st time (new) problem
1st time (new) problem:
• What is the mean value of ...?
• What is the percentage of …?
• What is your opinion regarding…?
We answer using: Estimation; Likert Scale
Previously discussed problem:
• Is the mean value of ... equal to …?
• Is the percentage of … equal to …?
• Is there a relation between … and …?
• Is there a difference between … and …?
We answer using: Testing Hypotheses
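9. Sample Size for Estimating the Population Mean (µ)
Sample size:
$$n = \left(\frac{Z\,\sigma}{E}\right)^2$$
n is the required size of the sample
E is the maximum allowable error
σ is the population standard deviation
Z is the standard normal value (e.g., Z = 1.96 at 95% confidence)
For example, if E = 0.05 and σ = 1.1, then with 95% confidence we have:
$$n = \left(\frac{1.96 \times 1.1}{0.05}\right)^2 = (43.12)^2 \approx 1859.3,$$
which is rounded up to n = 1860.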
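A minimal Python sketch of the sample-size formula for a mean, n = (Zσ/E)², assuming the usual convention of rounding n up to the next whole subject. The function name `sample_size_mean` is hypothetical.

```python
import math


def sample_size_mean(sigma: float, margin: float, z: float = 1.96) -> int:
    """Sample size needed to estimate a population mean within +/- margin.

    Implements n = (z * sigma / margin) ** 2 and rounds up, because a
    fractional subject cannot be sampled.
    """
    if sigma <= 0 or margin <= 0:
        raise ValueError("sigma and margin must be positive")
    return math.ceil((z * sigma / margin) ** 2)


# Slide example: E = 0.05, sigma = 1.1, 95% confidence (z = 1.96)
print(sample_size_mean(sigma=1.1, margin=0.05))  # 1860
```

Rounding up (rather than to the nearest integer) keeps the achieved error at or below E.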
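10. Sample Size for Estimating the Population Proportion
Sample size:
$$n = \left(\frac{Z}{E}\right)^2 p\,(1-p)$$
n is the required size of the sample
E is the maximum allowable error
p is the sample proportion
Z is the standard normal value (e.g., Z = 1.96 at 95% confidence)
For example, if E = 0.05 and p = 0.30, then with 95% confidence we have:
$$n = \left(\frac{1.96}{0.05}\right)^2 \times 0.30 \times 0.70 = (39.2)^2 \times 0.21 \approx 322.7,$$
which is rounded up to n = 323.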
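The same idea for a proportion, n = (Z/E)² p(1 − p), as a small Python sketch. The function name `sample_size_proportion` is hypothetical, and rounding up is an assumed convention.

```python
import math


def sample_size_proportion(p: float, margin: float, z: float = 1.96) -> int:
    """Sample size needed to estimate a population proportion within +/- margin.

    Implements n = (z / margin) ** 2 * p * (1 - p) and rounds up.
    """
    if not 0 < p < 1 or margin <= 0:
        raise ValueError("p must be in (0, 1) and margin must be positive")
    return math.ceil((z / margin) ** 2 * p * (1 - p))


# Slide example: E = 0.05, p = 0.30, 95% confidence (z = 1.96)
print(sample_size_proportion(p=0.30, margin=0.05))  # 323
```

If no prior estimate of p is available, p = 0.5 is a common conservative choice because it maximizes p(1 − p); with E = 0.05 at 95% confidence it gives n = 385.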
11. What is biostatistics?
Biostatistics provides a framework for the analysis of data.
Through the application of statistical principles to the biological sciences, biostatisticians are able to methodically distinguish between true differences among observations and random variations caused by chance alone.
12. How is biostatistics useful?
From an application standpoint, knowledge of biostatistics and epidemiology permits one to draw valid conclusions (information) from data sets.
Associations between risk factors and disease are determined with this information and, ultimately, are used to reduce illness and injury.
16. Why are measurement levels important?
The appropriate statistical measure or test depends on the level of measurement.
17. What is the difference between a sample and a population?
A sample is a subgroup of the population, used to study the population.
A population consists of all subjects (human or otherwise) that are being studied.
18. What is the difference between a statistic and a parameter?
A statistic is a characteristic or measure obtained by using the data values from a sample.
A parameter is a characteristic or measure obtained by using the data values from a specific population.
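To make the statistic/parameter distinction concrete, here is a small Python sketch. The population values are hypothetical, and the mean stands in for "a characteristic or measure"; any other summary would work the same way.

```python
import random
import statistics

# Hypothetical population of 10 measurements (all subjects being studied)
population = [118, 122, 135, 127, 141, 110, 129, 133, 125, 120]

# Parameter: computed from every value in the population (the mean, mu)
parameter = statistics.mean(population)

# Statistic: computed only from a random sample (the sample mean, x-bar)
rng = random.Random(1)  # fixed seed so the sample is reproducible
sample = rng.sample(population, k=4)
statistic = statistics.mean(sample)

print("population mean (parameter):", parameter)
print("sample mean (statistic):", statistic)
```

Re-running with a different seed changes the statistic but never the parameter; that sampling variation is what inferential statistics has to account for.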
20. How can inferential statistics “draw a picture” about what we don’t have?
[Diagram: Population, Sample, Descriptive statistics, Inferential statistics, Decision]
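A minimal Python sketch contrasting the two kinds of statistics, assuming hypothetical sample values and using a normal-approximation 95% confidence interval (z = 1.96) as the inferential step.

```python
import math
import statistics

# Hypothetical sample drawn from a larger population
sample = [4.1, 5.3, 6.0, 4.8, 5.5, 6.2, 5.0, 4.7, 5.9, 5.4]

# Descriptive statistics: summarize the sample we actually have
n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)

# Inferential statistics: approximate 95% confidence interval for the
# unknown population mean, using the normal critical value z = 1.96
z = 1.96
half_width = z * sd / math.sqrt(n)
lower, upper = mean - half_width, mean + half_width

print(f"descriptive: n={n}, mean={mean:.2f}, sd={sd:.2f}")
print(f"inferential: 95% CI for the population mean = ({lower:.2f}, {upper:.2f})")
```

With only 10 observations, a t critical value (about 2.262 for 9 degrees of freedom) would usually replace 1.96; the normal value keeps the sketch consistent with the Z used in the sample-size slides.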
26. What is the difference between retrospective and prospective studies?
Comparative Studies
• Retrospective studies gather past data from selected cases and controls to determine differences, if any, in exposure to a suspected risk factor. These are commonly referred to as case–control studies.
• Prospective studies enroll a group or groups of subjects and follow them over certain periods of time. Examples include occupational mortality studies and clinical trials.
30. What is the Analysis of …?
The cohort study design focuses on a particular exposure, whereas case–control studies focus on a particular disease.
Basic survival analysis and Cox’s proportional hazards regression were developed to deal with survival data resulting from prospective or cohort studies.
Survival analysis is also focused on the occurrence of an event, such as death or relapse of a disease, after some initial treatment (a binary outcome).
31. What is the Analysis of …?
The basic differences from logistic regression analysis are:
First, for survival data, studies have staggered entry, and subjects are followed for varying lengths of time; they do not have the same probability for the event to occur.
Second, each member of the cohort belongs to one of three types of termination:
1. Subjects still alive on the analysis date
2. Subjects who died on a known date within the study period
3. Subjects who are lost to follow-up after a certain date
For types 1 and 3, observation ends before the event is seen; these are known as censored observations (censoring).
In prospective studies, the important feature is not only the outcome event, such as death, but also the time to that event: the survival time.
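A small Python sketch of how the three types of termination become the (survival time, event) pairs used by survival methods. Both subjects still alive on the analysis date and subjects lost to follow-up contribute censored times. The analysis date, the status labels and the function name `survival_record` are all hypothetical.

```python
from datetime import date
from typing import Optional, Tuple

ANALYSIS_DATE = date(2020, 12, 31)  # hypothetical analysis date


def survival_record(entry: date, status: str, end: Optional[date] = None) -> Tuple[int, int]:
    """Convert one subject's follow-up into (survival time in days, event flag).

    status is one of the three types of termination:
      "alive" - still alive on the analysis date   -> censored (event = 0)
      "died"  - died on a known date `end`         -> event observed (event = 1)
      "lost"  - lost to follow-up after date `end` -> censored (event = 0)
    """
    if status == "alive":
        stop, event = ANALYSIS_DATE, 0
    elif status in ("died", "lost"):
        if end is None:
            raise ValueError(f"status {status!r} needs an end date")
        stop, event = end, int(status == "died")
    else:
        raise ValueError(f"unknown status: {status!r}")
    # Staggered entry: each subject's clock starts at their own entry date
    return (stop - entry).days, event


print(survival_record(date(2020, 1, 1), "died", date(2020, 1, 31)))  # (30, 1)
print(survival_record(date(2020, 12, 1), "alive"))                   # (30, 0)
print(survival_record(date(2020, 6, 1), "lost", date(2020, 6, 11)))  # (10, 0)
```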
32. What is the Kaplan-Meier method used for?
The Kaplan-Meier procedure is a method of estimating time-to-event models in the presence of censored cases.
Example. Does a new treatment for AIDS have any therapeutic benefit in extending life? We could conduct a study using two groups of AIDS patients, one receiving traditional therapy and the other receiving the experimental treatment. Constructing a Kaplan-Meier model from the data would allow us to compare overall survival rates between the two groups to determine whether the experimental treatment is an improvement over the traditional therapy. We can also plot the survival or hazard functions and compare them visually for more detailed information.
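A minimal Python sketch of the Kaplan-Meier (product-limit) estimator. At each distinct event time the survival estimate is multiplied by (1 − deaths / number at risk), and censored subjects simply leave the risk set. The data values and the function name `kaplan_meier` are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of the survival function S(t).

    times  - follow-up time for each subject
    events - 1 if the event (e.g. death) was observed, 0 if censored
    Returns (t, S(t)) at each distinct time where at least one event occurred.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects with the same time; censored subjects at t
        # are still counted as at risk at t (the usual convention).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= removed
    return curve


# Hypothetical data for one treatment group (0 = censored)
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
print(curve)
```

For the AIDS example, one curve would be computed per treatment group; a formal comparison of the two curves is usually made with a log-rank test, which this sketch does not implement.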
33. What is Cox regression used for?
Cox Regression builds a predictive model for time-to-event data.
The model produces a survival function that predicts the probability that the event of interest has not yet occurred by a given time t, for given values of the predictor variables.
The shape of the survival function and the regression coefficients for the predictors are estimated from observed subjects; the model can then be applied to new cases that have measurements for the predictor variables.
Example. Do men and women have different risks of developing lung cancer based on cigarette smoking? By constructing a Cox Regression model, with cigarette usage (cigarettes smoked per day) and gender entered as covariates, we can test hypotheses regarding the effects of gender and cigarette usage on time-to-onset for lung cancer.
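A minimal Python sketch of how a Cox coefficient is estimated. It maximizes Cox's partial likelihood by Newton-Raphson. To stay short it assumes a single covariate (the slide's example would use two) and no tied event times. The data and function names are hypothetical; real analyses would use established software such as R's `survival::coxph` or the Python `lifelines` package (`CoxPHFitter`).

```python
import math
from typing import Sequence, Tuple


def _risk_set(times: Sequence[float], t: float) -> list:
    """Indices of subjects still under observation (at risk) at time t."""
    return [j for j, tj in enumerate(times) if tj >= t]


def partial_loglik(beta: float, times, events, x) -> float:
    """Cox partial log-likelihood for one covariate (no tied event times)."""
    ll = 0.0
    for i, (t, d) in enumerate(zip(times, events)):
        if d:
            risk = _risk_set(times, t)
            ll += beta * x[i] - math.log(sum(math.exp(beta * x[j]) for j in risk))
    return ll


def score_and_information(beta, times, events, x) -> Tuple[float, float]:
    """First derivative (score) and minus the second derivative of partial_loglik."""
    score = info = 0.0
    for i, (t, d) in enumerate(zip(times, events)):
        if not d:
            continue
        risk = _risk_set(times, t)
        w = [math.exp(beta * x[j]) for j in risk]
        total = sum(w)
        mean_x = sum(wj * x[j] for wj, j in zip(w, risk)) / total
        mean_x2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk)) / total
        score += x[i] - mean_x
        info += mean_x2 - mean_x ** 2
    return score, info


def fit_cox(times, events, x, tol: float = 1e-10, max_iter: int = 50) -> float:
    """Newton-Raphson estimate of beta; exp(beta) is the hazard ratio per unit of x."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, x)
        if info <= 0:
            raise ValueError("covariate has no variation within the risk sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical data: x = 1 for exposed, 0 for unexposed; all events observed
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
x = [1, 0, 1, 0, 1, 0]
beta = fit_cox(times, events, x)
print(f"beta = {beta:.3f}, hazard ratio = {math.exp(beta):.2f}")
```

A positive beta means a higher hazard (earlier events) for larger values of the covariate.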
34. What are ROC curves used for?
The ROC Curve procedure is a useful way to evaluate the performance of classification schemes in which there is one variable with two categories by which subjects are classified.
Example. It is in a researcher’s interest to correctly classify pregnant women into those who will and those who will not give vaginal delivery, so special methods are developed for making these decisions. ROC curves can be used to evaluate how well these methods perform.
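A minimal Python sketch of how an ROC curve is built. Each distinct score is used in turn as a cut-off, the false positive rate and true positive rate are recorded, and the area under the curve (AUC) is computed by the trapezoidal rule. The scores, labels and function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) at every distinct score threshold.

    labels: 1 = positive class, 0 = negative class.
    A subject is classified positive when its score >= threshold.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative subject")
    points = [(0.0, 0.0)]  # threshold above every score: nobody classified positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores from a classification method; 1 = vaginal delivery
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(auc(roc_points(scores, labels)))
```

An AUC of 0.5 corresponds to classifying by chance, and 1.0 to perfect separation of the two categories.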
41. How do we make a decision? (The p-value)
Answering a Statistical Question
42. p-Value in Hypothesis Testing
• The p-value is the probability of observing a sample statistic as extreme as, or more extreme than, the value observed, given that the null hypothesis is true.
e.g., H0: Mean PB for males = Mean PB for females
• In testing a hypothesis, we can also compare the p-value to the significance level (α).
• Decision rule using the p-value:
Reject H0 if p-value < significance level (α)
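A minimal Python sketch of computing a two-sided p-value for a hypothesis like H0: mean for males = mean for females. It assumes a large-sample two-sample z test, and the group summaries are hypothetical; with small samples a t test would normally be used instead.

```python
import math


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2) -> float:
    """Large-sample z statistic for H0: mu1 = mu2."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z, i.e. 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical summaries for males (group 1) and females (group 2)
z = two_sample_z(mean1=130, sd1=10, n1=100, mean2=125, sd2=10, n2=100)
p = two_sided_p_value(z)
alpha = 0.05
print(f"z = {z:.2f}, p-value = {p:.4f}, reject H0: {p < alpha}")
```

The identity 2(1 − Φ(z)) = erfc(z/√2) lets the standard library compute normal tail probabilities without extra packages.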
43. To perform a hypothesis test using the p-value approach
• If p-value ≤ α, then the test is significant (reject H0); otherwise, the test is not significant (do not reject H0).
• Assume that we find p-value = 0.03,
• and that we want to use α = 0.05.
• Then the test is significant; that is, we reject the null hypothesis at α = 0.05 because p-value = 0.03 < 0.05.
(Note that the test will not be significant at α = 0.01, because p-value = 0.03 > 0.01.)
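The decision rule as a tiny Python sketch, applied to the worked example (p-value = 0.03 at α = 0.05 and α = 0.01). The function name `decide` and the wording of its results are hypothetical.

```python
def decide(p_value: float, alpha: float) -> str:
    """p-value approach: significant (reject H0) when p-value <= alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    if p_value <= alpha:
        return "significant: reject H0"
    return "not significant: do not reject H0"


# Worked example: p-value = 0.03 at two significance levels
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: {decide(0.03, alpha)}")
```

The same p-value can lead to different decisions, so α should be chosen before the data are examined.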
44. What does it mean when p-value < α?
(a) p-value < 0.05: we have strong evidence that H0 is not true.
(b) p-value < 0.01: we have very strong evidence that H0 is not true.
(c) p-value < 0.001: we have extremely strong evidence that H0 is not true.
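A short Python sketch mapping a p-value to the verbal scale (0.05, 0.01, 0.001 cut-offs). The label returned for p-values of 0.05 or more is an assumption, since that case is not covered by the scale itself, and the function name is hypothetical.

```python
def strength_of_evidence(p_value: float) -> str:
    """Verbal reading of a p-value using the 0.05 / 0.01 / 0.001 cut-offs."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie in [0, 1]")
    if p_value < 0.001:
        return "extremely strong evidence that H0 is not true"
    if p_value < 0.01:
        return "very strong evidence that H0 is not true"
    if p_value < 0.05:
        return "strong evidence that H0 is not true"
    return "not significant at the 0.05 level"  # assumed label, not on the scale


for p in (0.03, 0.004, 0.0002, 0.20):
    print(p, "->", strength_of_evidence(p))
```

The checks run from the smallest cut-off upward, so each p-value receives the strongest label it qualifies for.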