3. Types of Statistics
• Descriptive: summarization of data.
• Inferential: generalizing findings from the collected sample to a larger population.
6. Estimation & Estimate
• Estimation is the process by which one makes inferences about a population, based on information obtained from a sample.
• An estimate is the number computed using the data collected from a sample.
7. Estimator
• An estimator is a statistic that estimates some characteristic of the population.
• An estimator produces an estimate; e.g., the sample mean (x̄) is an estimator for the population mean, μ.
• The estimator is the tool & the estimate is the product.
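The tool-versus-product distinction can be sketched in Python. This is a minimal illustration, not part of the original slides: the population values (simulated blood-pressure readings) and the sample size of 50 are assumed purely for demonstration.

```python
import random
import statistics

random.seed(42)
# Hypothetical population: 10,000 simulated systolic BP readings (assumed values)
population = [random.gauss(120, 15) for _ in range(10_000)]
mu = statistics.mean(population)        # population parameter (unknown in practice)

sample = random.sample(population, 50)  # draw a random sample
# The sample mean is the estimator (the tool);
# the number it returns, x-bar, is the estimate (the product).
x_bar = statistics.mean(sample)

print(f"population mean mu = {mu:.2f}")
print(f"sample mean x-bar  = {x_bar:.2f}")
```

Rerunning with a different seed gives a different estimate from the same estimator, which is exactly the sampling variation discussed in later slides.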
8. Objective of the Estimate
• To determine the approximate value of a population parameter on the basis of a sample statistic.
9. Properties of a Good Estimator
• Unbiased: the expected value of the statistic equals the parameter.
• Efficient: small variance, i.e., precise.
• Consistent: approaches the parameter as the sample size increases.
• Sufficient: uses all the information in the sample about the parameter.
10. Basic Terminology
• Population: the collection of all individuals or items under consideration in a statistical study.
• Sample: the part of the population from which information is collected.
12. Related Terms
• A parameter is an unknown numerical summary of the population.
• A statistic is a known numerical summary of the sample which can be used to make inferences about parameters.
15. Types of Estimate
1. Point Estimate: a single value estimate for a population parameter, e.g., mean, proportion, OR, RR.
2. Interval Estimate (Confidence Interval)
16. Point Estimate
• Calculated from random variables.
• Varies from study to study.
• The importance of point estimates lies in the fact that many statistical formulas are based on them.
17. Confidence Interval (CI)
• An interval around the point estimate, with upper & lower limits, expected to contain the unknown population parameter with a certain degree of confidence.
18. • This degree of confidence is the confidence level, e.g., 95% CI, 99% CI.
• The upper & lower limits are the confidence limits.
20. A CI consists of 3 parts:
1. Confidence level
2. A statistic / point estimate
3. A margin of error
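The three parts combine as point estimate ± margin of error. A minimal sketch for a mean, using assumed sample values and the z critical value 1.96 for a 95% confidence level:

```python
import math
import statistics

# Hypothetical sample of 40 readings (assumed values for illustration)
sample = [118, 122, 115, 130, 125, 119, 121, 128, 124, 117] * 4

x_bar = statistics.mean(sample)                         # 2. the point estimate
se = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error
margin = 1.96 * se                                      # 3. margin of error, at 1. the 95% level

print(f"95% CI: {x_bar - margin:.2f} to {x_bar + margin:.2f}")
```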
23. Central Limit Theorem & Law of Large Numbers
States that when the sample size is large (>30), the sampling distribution of the mean:
• Is normally distributed.
• Has a mean equal to the population mean.
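A short simulation can make the theorem concrete. This sketch (assumed, for illustration) draws repeated samples of size 40 from a deliberately skewed population; the mean of the sample means lands close to the population mean, as the theorem predicts.

```python
import random
import statistics

random.seed(0)
# Hypothetical skewed (exponential) population, to show the CLT at work
population = [random.expovariate(1.0) for _ in range(100_000)]

# Draw many samples of size 40 and record each sample mean
sample_means = [
    statistics.mean(random.sample(population, 40)) for _ in range(2_000)
]

print(f"population mean:      {statistics.mean(population):.3f}")
print(f"mean of sample means: {statistics.mean(sample_means):.3f}")
# The two values should be close, and a histogram of sample_means
# would look roughly bell-shaped despite the skewed population.
```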
24. Rule of the Normal Distribution Curve
• Nearly all of the data will fall within three SD of the mean.
The rule has three parts:
• 68% of data falls within 1 SD.
• 95% falls within 2 SD.
• 99.7% falls within 3 SD.
• The rule is also called the 68-95-99.7 Rule or the Three Sigma Rule.
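The rule can be checked empirically. A minimal sketch on simulated standard-normal data (the data are assumed; any normal distribution would give the same percentages):

```python
import random
import statistics

random.seed(1)
# Simulated normal data (mean 0, SD 1) to verify the 68-95-99.7 rule
data = [random.gauss(0, 1) for _ in range(100_000)]
mu = statistics.mean(data)
sd = statistics.stdev(data)

for k in (1, 2, 3):
    inside = sum(mu - k * sd <= x <= mu + k * sd for x in data)
    print(f"within {k} SD: {inside / len(data):.1%}")
```

The printed fractions come out very close to 68%, 95%, and 99.7%.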
25. 95% CI
Definition 1:
We are 95% sure that the true mean (μ) will fall within the upper and lower bounds.
Definition 2:
95% of the intervals constructed using sample means (x̄) will contain the true mean (μ).
28. When to Use the t Statistic
• When the sample size is small.
• When the population standard deviation is not known.
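For a small sample the z value 1.96 is replaced by a t critical value with n - 1 degrees of freedom. A minimal sketch with an assumed sample of 10 values; the critical value 2.262 is the standard t-table entry for 95% confidence with 9 degrees of freedom:

```python
import math
import statistics

# Hypothetical small sample (n = 10); population SD unknown, so use t
sample = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2, 5.4, 4.7, 5.0, 5.1]
n = len(sample)

x_bar = statistics.mean(sample)
t_crit = 2.262  # t-table value for 95% confidence, df = n - 1 = 9
margin = t_crit * statistics.stdev(sample) / math.sqrt(n)

print(f"95% CI: {x_bar - margin:.3f} to {x_bar + margin:.3f}")
```

Note the t interval is wider than the z interval would be, reflecting the extra uncertainty from estimating the SD from a small sample.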
31. CI for a Proportion
• Needs two pieces of information: the z-score and p-hat (p̂).
• p-hat is the number of events divided by the number of trials.
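Putting the two pieces together gives the usual large-sample (Wald) interval for a proportion. The event and trial counts below are assumed for illustration:

```python
import math

# Hypothetical data: 45 events observed in 150 trials (assumed counts)
events, trials = 45, 150

p_hat = events / trials                       # sample proportion
se = math.sqrt(p_hat * (1 - p_hat) / trials)  # standard error of p-hat
margin = 1.96 * se                            # 95% confidence

print(f"p-hat = {p_hat:.3f}")
print(f"95% CI: {p_hat - margin:.3f} to {p_hat + margin:.3f}")
```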
34. Width of the CI Depends On:
• Sample size
• Degree of confidence level
• Variability of the data
**With a narrow CI, the population parameter is more likely to be close to the point estimate.
39. Why Is the CI Important in Modern Research?
• For decades researchers relied on the p-value.
• Calculating a CI became a mandatory prerequisite in modern research.
• Many international journals do not accept manuscripts for publication if testing is based only on the p-value. Why?
• The CI provides the magnitude of the effect.
40. P Value vs CI
• A CI along with a p value is a prerequisite for publication.
• The p value only shows significance; it does not quantify the strength or weakness of the effect.
• As such, the p-value is NOT the probability of making a type I error, and it does NOT indicate the size or importance of the observed effect.
41. CI and the Null Hypothesis
• If a CI is calculated for an effect or outcome and the CI includes the null value (zero), then H0 is accepted; if not, H0 is rejected.
• A 95% CI of 1.16 to 6.84 excludes zero, so H0 is rejected & the result is significant.
• A 95% CI of -0.15 to 2.85 includes zero, so H0 is accepted & the result is not significant.
• At the same time, the CI provides the effect size & the clinical significance of the result.
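The decision rule above reduces to a single containment check. A minimal sketch (the helper name `significant` is ours, not from the slides), applied to the two example intervals:

```python
def significant(ci_lower: float, ci_upper: float, null_value: float = 0.0) -> bool:
    """Reject H0 when the CI excludes the null value (0 for difference measures)."""
    return not (ci_lower <= null_value <= ci_upper)

print(significant(1.16, 6.84))   # True:  CI excludes 0, H0 rejected
print(significant(-0.15, 2.85))  # False: CI includes 0, H0 accepted
```

For ratio measures such as RR and OR, the same function is called with `null_value=1.0`.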
42. CI for the Risk Ratio (RR)
         Risk of outcome in exposed
• RR = ---------------------------------------------
         Risk of outcome in unexposed
e.g.,
         Risk of lung cancer in smokers
• RR = ---------------------------------------------
         Risk of lung cancer in nonsmokers
• RR = 1 means the exposure is not a risk.
• If the CI includes the H0 value of 1, then there is no effect; the same interpretation applies to the OR.
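A sketch of the calculation from a 2x2 table, with assumed cohort counts. The CI uses the standard large-sample method on the log scale, since the RR is a ratio:

```python
import math

# Hypothetical 2x2 cohort data (assumed counts for illustration):
#             outcome   no outcome
# exposed       a=30       b=70
# unexposed     c=10       d=90
a, b, c, d = 30, 70, 10, 90

risk_exposed = a / (a + b)      # 0.30
risk_unexposed = c / (c + d)    # 0.10
rr = risk_exposed / risk_unexposed

# 95% CI computed on the log scale, then exponentiated back
se_log = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
lo = math.exp(math.log(rr) - 1.96 * se_log)
hi = math.exp(math.log(rr) + 1.96 * se_log)

print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Here the interval excludes 1, so the exposure would be judged a significant risk.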
43. Relation of CI & p Value
• They are complementary to each other.
In hypothesis testing:
• When the null hypothesis is accepted, the CI contains the null value 0 & p is > 0.05.
• When the CI does not include the null value, p is < 0.05, the null hypothesis is rejected, & the result is significant.
• Similarly, when the CI includes 1 for a ratio measure, p is > 0.05 & the result is not significant.
• When the CI does not include 1 for a ratio measure, p is < 0.05 & the result is significant.
44. Problems with the CI
It has two important inherent pitfalls:
• First, there is always a 5% risk of assuming a significant difference when actually no difference exists (Type I error).
• Second, as it measures only the effect of chance, statistically significant does not mean clinically significant.
45. Summary
• A CI is essential for all estimates: mean, proportion, OR, RR, etc.
• The confidence level, e.g., 95% CI, expresses the degree of uncertainty.
• The margin of error indicates precision.
• A narrow CI is desirable.
• The p value & CI are complementary to each other in significance testing.
• The CI provides the magnitude & effect size for decision-making about clinical significance.
Editor's Notes
The statistics needed to appreciate evidence-based medicine are daunting to say the least, especially when confronted with the myriad of statistics in any paper. This introduces healthcare students to the interpretation of some of the most commonly used statistics for reporting the results of medical research.
Descriptive statistics are the simple description of the findings; they are used to describe the basic features of the data in a study.
Summarize the data numerically and graphically, together with simple graphical analysis; this is known as descriptive statistics.
Make statements about some feature of the population (parameter) after analyzing the data; this is known as inferential statistics.
Inference is the act of generalizing from the data (sample) to a larger phenomenon (population) with a calculated degree of certainty.
Use a random sample to learn something about a larger population.
Estimating population parameters from sample statistics is one of the major applications of inferential statistics.
Estimation is the major statistical method by which a sample statistic is used to estimate the corresponding population parameter.
The entire group of study subjects having common characteristics is the population, also termed the target population or reference population.
It is rarely possible to study the whole of a population, but it is possible to take a subset of a population to study. A subset of a population is called a sample.
A parameter is the summary value of the population, obtained by studying the whole population, which in a real sense is not practical; a representative value can be obtained from its subset, the sample. The summary value of the sample is the statistic, obtained by calculation from the sample data.
Statistics are random & vary from sample to sample obtained from the same population, even when the sample size is equal.
Population parameters are identified by Greek letters and sample statistics by Roman letters.
• The sample proportion p̂ ("p hat") is the point estimate of p.
• The sample mean x̄ ("x bar") is the point estimate of μ.
• The sample standard deviation s is the point estimate of σ.
• Point estimates are single values that estimate the parameter directly; they serve as a "best guess" or "best estimate" of an unknown population parameter.
We study a sample to make inferences about a population parameter. Sample statistics give us a point estimate of the population value. What we get is unlikely to be the true population value due to sampling variation, so a range of values around the point estimate is needed which is likely to contain the population parameter. This range is the CI. Calculation of the CI takes only sampling variation & sampling error into account, not systematic error or bias.
The greater the sampling error, the less the precision, & vice versa.
A sample statistic with high precision is a more valid estimate of the population parameter.
So the SE is the measure of precision of a sample statistic as an estimate of the population parameter.
The SE estimates how much the sample estimate deviates from the population parameter.
The sample statistic and SE are used to calculate the CI.
States that when the sample size is large (>30), the sampling distribution of the mean:
Is normally distributed (regardless of the shape of the population from which the samples were drawn).
Has a mean equal to the population mean, μ, regardless of the shape of the population or the size of the sample.
Has a standard deviation (the standard error of the mean) equal to the population standard deviation divided by the square root of the sample size.
What is the difference between the Law of Large Numbers and the Central Limit Theorem?
The Central Limit Theorem states that as the sample size tends to infinity, the sample mean will be normally distributed.
The Law of Large Numbers states that as the sample size tends to infinity, the sample mean converges to the population mean.
The empirical rule states that for a normal distribution, nearly all of the data will fall within three standard deviations of the mean. The empirical rule can be broken down into three parts:
68% of data falls within the first standard deviation from the mean.
95% fall within two standard deviations.
99.7% fall within three standard deviations.
The rule is also called the 68-95-99.7 Rule or the Three Sigma Rule.
For decades researchers relied on the p-value to report whether an effect is true (significant) or just happened by chance (insignificant). In the last decade, many international journals have not accepted manuscripts for publication if testing for significance was based only on the p-value. Calculating a confidence interval (CI) for every variable measured became a mandatory prerequisite in modern research. But why should researchers measure the confidence interval, and what benefit do we get from the confidence interval over the p-value?
The null hypothesis is no relation, no effect.
The null value is zero (H0 = 0).
If a CI is calculated for an effect or outcome and the CI includes the null value zero, then H0 is accepted; if not, H0 is rejected.
At the same time, the CI provides the effect size & the clinical significance of the result.
In general, a confidence interval and hypothesis test contain equivalent information, i.e., if the null value (the value under H0) is in the confidence interval, then the test will not reject (not significant). If the null value falls outside the 95% CI, then H0 is rejected.
Odds ratio (OR)
An odds ratio is a relative measure of effect, which allows the comparison of the intervention group of a study relative to the comparison or placebo group.
So when researchers calculate an odds ratio they do it like this:
The numerator is the odds in the intervention arm.
The denominator is the odds in the control or placebo arm.
Numerator / denominator = Odds Ratio (OR).
So if the outcome is the same in both groups the ratio will be 1, which implies there is no difference between the two arms of the study.
However, for a harmful outcome such as mortality:
If the OR is > 1, the control is better than the intervention.
If the OR is < 1, the intervention is better than the control.
Concept check 1
If the trial comparing SuperStatin to placebo with the outcome of all cause mortality found the following:
Odds of all cause mortality for SuperStatin were 0.4
Odds of all cause mortality for placebo were 0.8
Odds ratio would equal 0.5
So if the trial comparing SuperStatin to placebo stated “OR 0.5”
What would it mean?
A. The odds of death in the SuperStatin arm are 50% less than in the placebo arm.
B. There is no difference between groups
C. The odds of death in the placebo arm are 50% less than in the SuperStatin arm.