STATISTICS
INTRODUCTION
 Statistics may be defined as the discipline
concerned with the treatment of numerical data
derived from group of individuals.
 Biostatistics is a branch of statistics applied to
biological or medical sciences.
 Consists of various steps like generation of
hypothesis, collection of data, and application of
statistical analysis.
 Two major branches : descriptive and inferential.
 Descriptive statistics explain the distribution of
population measurements by providing types of data,
estimates of central tendency (mean, median and
mode), and measures of variability (range, standard
deviation).
 Inferential statistics is used to express the level of
certainty about estimates and includes hypothesis
testing, standard error of mean, and confidence
interval.
DATA
 Observations recorded during research constitute
data.
The two main types of data:
 Qualitative
 Quantitative
 most studies will have a combination of both
Qualitative Data/Categorical
 variables that do not have a numerical value.
They usually describe a meaning and give a
name or label to variables.
 2 TYPES: nominal and ordinal
Quantitative Data
 These are variables that are truly numerical. Quantitative
data (interval data) may be discrete or continuous
 A continuous variable can take any value within a given
range
Eg: hemoglobin (Hb) level may be taken as 11.3, 12.6 or 13.4 gm %
 Discrete variable is usually assigned integer values i.e.
does not have fractional values.
Eg: number of cigarettes smoked per day by a person. Blood pressure
is continuous in principle, but its values are generally recorded as
whole numbers.
DATA COLLECTION
 Sample refers to the subjects chosen from the
population for investigation. It should be ensured that
the sample is representative of the whole population
Need for Sampling :
 makes it easier and more economic than studying the
whole population
 saves time, manpower, cost, increases efficiency.
If sample size is too small: it will not give us valid results
If too large: more cost and manpower
Methods of sampling
Random sampling / probability sampling:
• Simple random sampling
• Stratified random sampling
• Systematic random sampling
• Multistage sampling
• Multiphase sampling
• Cluster random sampling
Non random / non probability sampling:
• Convenience sampling
• Contact sampling
• Quota sampling
• Volunteer sampling
• Snow ball sampling
Random sampling
 Simple random: chosen randomly, entirely by chance.
Each individual has same probability of being chosen.
Methods: lottery, random number tables, computer
generated.
• Systematic random: individuals in the population are
arranged in a certain manner, a random starting
point is selected and every nth individual is selected.
• “n” is the sampling interval ie: total number of units in
the population / total number of units in the sample.
Unlike simple random sampling, not every combination of
individuals is equally likely to be chosen.
 Stratified random: initially whole population is stratified
(the heterogeneous population is divided into
homogeneous groups) and then random
sampling is applied to each stratum.
Eg: for a sample of 100 from a heterogeneous population of 1000,
first divide it into homogeneous strata (ie: 700 males, 300
females),
then select 70 males and 30 females randomly.
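The stratified example and the sampling interval described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the population is assumed to be a simple numbered list, and the rounding used for allocation may need adjusting when the shares are not whole numbers.

```python
import random

def proportional_allocation(strata_sizes, sample_size):
    """Share the sample among strata in proportion to stratum size."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}

def systematic_sample(units, sample_size, rng):
    """Pick a random start, then take every k-th unit (k = sampling interval)."""
    k = len(units) // sample_size      # population size / sample size
    start = rng.randrange(k)           # random starting point within the first interval
    return units[start::k][:sample_size]

allocation = proportional_allocation({"males": 700, "females": 300}, 100)
print(allocation)  # {'males': 70, 'females': 30}

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
chosen = systematic_sample(list(range(1, 1001)), 100, rng)
print(len(chosen), chosen[1] - chosen[0])  # 100 10
```

Within each stratum the individuals would then be drawn at random, for example with random.sample.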
 Multistage sampling: done in successive stages. Each
sampling unit is nested in the previous sampling unit.
 Eg: in large country surveys, states are chosen, then
districts , then every 10th person as final sampling unit.
 Multiphase sampling: done in successive phases ie: part
of information is obtained from whole population and
part from subsample.
 Eg: in a Tb survey, Mantoux test done in first phase,
then X-ray done in all Mantoux positives, then sputum tested in all
X-ray positives.
 Cluster sampling: applicable when units of population
are natural groups or clusters. Clusters are chosen at random and all
individuals in a chosen cluster are selected as a whole.
Non random sampling
 Selection based on expert knowledge of the population.
Cannot be assured that each item has equal chances of
being selected.
 Convenience sampling: patients selected at the
convenience of the researcher. Eg: selecting shoppers in a
mall as they walk by to fill out a survey.
 Quota sampling: population segmented into mutually
exclusive groups, then judgement is used to select units.
 Snow ball sampling: existing study subjects recruit future
subjects from their acquaintances, thus sample group
appears to grow like a snowball. Used for hidden
populations that are difficult to access, such as drug abusers
and commercial sex workers.
DATA PRESENTATION
Raw data can be presented in three different ways
 tabular,
 graphic (chart),
 numerical (descriptive statistics) forms.
Tabular form
 Frequency, cumulative frequency, relative
frequency tables
 These methods can be used to present all
different types of variables including nominal,
ordinal and quantitative data. In order to present
continuous data by this mode it needs to be
arranged into groups (intervals) first.
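A grouped frequency table of the kind described above could be built as follows. This is a minimal sketch: the Hb values, the interval width and the starting point are hypothetical choices, not data from the slides.

```python
from collections import Counter

def frequency_table(values, width, start):
    """Group continuous values into intervals [lower, lower + width)."""
    counts = Counter(int((v - start) // width) for v in values)
    rows, cumulative = [], 0
    for i in range(max(counts) + 1):
        freq = counts.get(i, 0)
        cumulative += freq
        lower = start + i * width
        # interval limits, frequency, cumulative frequency, relative frequency
        rows.append((lower, lower + width, freq, cumulative, freq / len(values)))
    return rows

hb = [11.3, 12.6, 13.4, 10.8, 12.1, 11.9, 13.0, 12.4]  # hypothetical Hb values, gm %
table = frequency_table(hb, width=1.0, start=10.0)
for lower, upper, freq, cum, rel in table:
    print(f"{lower:.1f}-{upper:.1f}  f={freq}  cum={cum}  rel={rel:.3f}")
```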
Graphic (Chart) Presentation
 Pie chart
 Bar chart
 Histograms
 Frequency curves
 Cumulative frequency curves ( ogive )
 Scatter plots
Pie chart
 useful to show the
proportion of different
groups that constitute
the total sample
 The whole pie
represents the total
sample while the size
occupied by each
group will be
proportional to their
number.
 used for ordinal and
nominal data.
Bar charts
 Bar charts are used to
compare different
classes of data.
 The x axis is usually
dimensionless while the
y axis represents the
frequency of each class.
 Each class could
represent a single
group, or be further
divided into subgroups
HISTOGRAMS
 specialized bar chart used
to give a visual
presentation of interval
data.
 Quantitative data, and in
particular continuous data,
are divided into intervals in
order to be integrated into
frequency tables.
USES:
 To show the mode of
distribution of the data.
 To demonstrate descriptive
statistics like mean, mode
and standard deviation
Frequency Curves
• These are very similar to histograms, but without the bars.
• Advantage is that they can be used to compare the
distribution of 2 or more groups on the same chart
Ogive
 Data values represented on the horizontal axis
and either the cumulative frequencies, the
cumulative relative frequencies or cumulative
percent frequencies on the vertical axis.
 This type of graph is useful to identify the
proportion of a sample that falls below or above
certain limit.
Scatter plots
• These are used to determine if there is any relationship
between two sample variables.
• Strength of the relationship can be calculated using a
correlation coefficient.
Numerical Presentation (Descriptive Statistics)
 Main aim: present a meaningful summary of the
sample data rather than drawing conclusions
about the whole population
 Three key characteristics are distribution, central
tendency and measurements of spread.
2 types of frequency distribution:
 Normal/ Gaussian
 Non normal/ non Gaussian
Gaussian /normal distribution/ symmetrical
 If data is symmetrically distributed on both sides of mean and
form a bell-shaped curve in frequency distribution plot, the
distribution of data is called normal or Gaussian.
 The normal curve describes the ideal distribution of
continuous values, e.g. heart rate, blood sugar level and Hb %
level.
 The normal (parametric) distribution is characterized by a
single peak (unimodal) and a symmetrical spread of variables
on either side.
 All central tendency measures (mean, mode and median) are
equal in a normal distribution and they are represented by
the point of maximum frequency. The spread of data is equal
on either side of the mean and is described by the standard deviation (SD).
 In an ideal Gaussian
distribution, the values
lying between the points
1 SD below and 1 SD
above the mean value
(i.e. ± 1 SD) will include
68.27% of all values.
 The range, mean ± 2 SD
includes approximately
95% of values distributed
about this mean,
excluding 2.5% above
and 2.5% below the
range.
 Methods of analysis :‘t’
test and analysis of variance (ANOVA)
Properties of the normal curve:
Mean = median = mode
Total area under the curve = 1
For the standard normal curve, mean = 0 and SD = variance = 1
Skew is zero.
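The percentages quoted for ± 1 SD and ± 2 SD can be checked numerically. The sketch below uses NormalDist from Python's standard statistics module; the helper name coverage is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def coverage(k):
    """Proportion of values lying within mean ± k SD of a normal curve."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(f"{coverage(1):.4f}")     # 0.6827
print(f"{coverage(2):.4f}")     # 0.9545
print(f"{coverage(1.96):.4f}")  # 0.9500
```

Mean ± 2 SD therefore covers slightly more than 95% of values; exactly 95% corresponds to mean ± 1.96 SD.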
Non-Gaussian distribution / asymmetrical
 Skewed – positive, negative
 Bimodal
 Multimodal
 J shaped
 Bow shaped
 V shaped
 If the difference (mean–median) is positive, the curve
is positively skewed, and if (mean–median) is
negative, the curve is negatively skewed; the
measures of central tendency therefore differ from one another.
 Measures of skewness: Karl pearson measure,
bowley’s measure, kelly’s measure, moment’s
measure
Central tendency in various distributions:
MEASURES OF CENTRAL
TENDENCY
An estimate of the "center" of a distribution of values.
The three central tendency measures are mean, median
and mode
 Mean
 Total sum of the values divided by the number of
observations (arithmetic mean).
 Used for parametric data and should not be used to
report central tendency of ordinal or nominal data.
• Eg: Suppose the heights of 7 children are 60, 70, 80, 90, 90,
100 and 110 cm. Mean (x̄) = Σx/n = 600/7 = 85.71 cm.
 Most affected measure if outliers ( extreme values ) are
present
 Median is the middle value when all the data are
arranged in numerical order. This means that 50% of
the data are below and 50% above that value.
The median is preferred for describing central tendency in
nonparametric (skewed) data since it is less affected by outliers
than the mean.
 Mode is the most frequently occurring observation in a
set of data. It is not a good indicator of central tendency
but it is the only measure of central tendency available for
nominal data (for ordinal data the median can also be used).
Least affected by outliers.
In a moderately skewed distribution, mode ≈ 3 median - 2 mean
(empirical relationship).
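The three measures can be computed for the heights used in the mean example with Python's standard statistics module. The extra value of 410 cm is a hypothetical outlier, added only to show how differently the mean and the median react.

```python
from statistics import mean, median, mode

heights_cm = [60, 70, 80, 90, 90, 100, 110]
print(round(mean(heights_cm), 2))  # 85.71
print(median(heights_cm))          # 90
print(mode(heights_cm))            # 90

with_outlier = heights_cm + [410]    # hypothetical extreme value
print(round(mean(with_outlier), 2))  # 126.25, the mean is pulled upwards
print(median(with_outlier))          # 90.0, the median is unchanged
print(mode(with_outlier))            # 90
```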
Measures of Spread/dispersion
Absolute (have units):
• Range
• Mean deviation
• Standard deviation
• Quartile deviation
Relative (no units):
• Coefficient of range
• Coefficient of quartile deviation
• Coefficient of mean deviation
• Coefficient of variation = (SD/mean)*100
Measures of Spread/dispersion
 Range is the simplest measure of spread, but with limited
practical use. It is the difference between the maximum and the
minimum value in a data set.
 Variance is calculated from the sum of the square of difference
of each value from the mean, divided by the number of
observations.
• The SD is the square root of the variance. Standard deviation
(SD) describes the variability of the observations about the mean.
Also called root-mean-square deviation.
When the SD is estimated from a sample (this matters most when the
sample size is < 30), the denominator is (n-1) instead of n.
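As a worked illustration of range, variance, SD and coefficient of variation, the sketch below reuses the seven heights from the mean example. It relies on Python's standard statistics module, where pvariance divides by n and variance divides by n - 1.

```python
from statistics import mean, pvariance, variance, stdev

heights_cm = [60, 70, 80, 90, 90, 100, 110]

value_range = max(heights_cm) - min(heights_cm)
population_variance = pvariance(heights_cm)  # denominator n
sample_variance = variance(heights_cm)       # denominator n - 1
sample_sd = stdev(heights_cm)                # square root of the sample variance
cv_percent = sample_sd / mean(heights_cm) * 100

print(value_range)                    # 50
print(round(population_variance, 2))  # 253.06
print(round(sample_variance, 2))      # 295.24
print(round(sample_sd, 2))            # 17.18
print(round(cv_percent, 2))           # 20.05
```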
 Percentiles are the main measures of non-parametric data
spread.
• Eg: tertiles (3 equal parts), quartiles (4 equal parts), pentiles,
hextiles, heptiles, octiles, deciles, centiles (100 equal parts)
Quartiles are self explanatory:
 the 1st quartile has 25% of the data below it, the 2nd
quartile corresponds to the median and has 50% of data
below it, and the 3rd quartile has 75% of data below it.
Eg: Percentiles are used in WHO growth chart. Upper
reference curve is 50th centile for boys and lower reference
curve is 3rd centile for girls. Road to health is the space
between these 2 curves which indicates normality ( 95% of
healthy normal children fall in this area )
Standard error of mean
 Since we study some patients (sample) to draw conclusions
about all patients or population and use the sample mean (M) as
an estimate of the population mean (M1) , we need to know how
far M can vary from M1 if repeated samples of size N are taken.
 Standard error of mean SEM = SD/√n
 SEM is always less than SD.
 Measure of difference between sample and population values.
Uses:
 Determine limits of confidence within which the mean would lie
 Determine if a sample is drawn from a known population or not
 Calculate sample size
For example, take fasting blood sugar of 200 lawyers.
Suppose mean is 90 mg% and SD = 8 mg%.
SEM = SD/√n=8/√200=8/14.14=0.56.
Mean fasting blood sugar + 2 SEM = 90 + (2 x 0.56) = 91.12
Mean fasting blood sugar - 2 SEM = 90 - (2 x 0.56) = 88.88
 So, confidence limits of fasting blood sugar of lawyer’s
population are 88.88 to 91.12 mg %. If the mean fasting blood
sugar of another sample is 80 mg %, we can say that it is unlikely
to have been drawn from the same population
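The lawyers example can be reproduced with a short calculation. The function names below are hypothetical, and the multiplier z = 2 follows the slide (1.96 would give the conventional 95% limits).

```python
import math

def sem(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

def confidence_limits(sample_mean, sd, n, z=2.0):
    """Return (lower, upper) limits as mean ± z * SEM."""
    margin = z * sem(sd, n)
    return sample_mean - margin, sample_mean + margin

lower, upper = confidence_limits(90, 8, 200)
print(round(sem(8, 200), 3))             # 0.566
print(round(lower, 2), round(upper, 2))  # 88.87 91.13
```

The small difference from 88.88 to 91.12 arises because the slide rounds the SEM to 0.56 before multiplying.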
 Confidence Interval (CI) OR (Fiducial limits): Confidence
limits are the two extremes of a range within which the population
value (e.g. the mean) is expected to lie with a stated level of
confidence, usually 95%.
Mathematical relationships between
two variables
 Correlation
 Regression
Correlation
 Measure of degree of linear relationship between two
continuous variables. It is represented by ‘r’.
 The association is positive if the values of x-axis and y-axis
tend to be high or low together.
• The association is negative if high y-axis values
tend to go with low x-axis values; r = -1 is considered
perfect negative correlation.
• The larger the absolute value of the correlation coefficient,
the stronger the association.
 EG: correlation between height and weight, age and height,
weight loss and poverty, parity and birth weight,
socioeconomic status and hemoglobin.
 The correlation coefficient values are always between -1
and +1. If the variables are not correlated, then
correlation coefficient is zero
 Correlation is represented by scatter diagram.
 Pearson coefficient : for Gaussian distribution
 Spearman coefficient: for non Gaussian distribution
Regression
 Provides structure of relationship between 2 quantitative
variables.
• Regression coefficient (b): measures the change in a dependent
variable (y) per unit change in the independent variable (x) or
variables (x1, x2, x3)
Types of regression:
• simple linear (1 dependent and 1 independent variable),
• multiple linear (1 dependent and more than 1 independent
variable)
• simple curvilinear (1 dependent and 1 independent variable with
some power of the independent variable)
• multiple curvilinear (1 dependent and more than 1 independent
variable with some power of the independent variable)
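The correlation coefficient r and the simple linear regression coefficient b can both be obtained from the same sums of squares. The sketch below assumes the usual least-squares line y = a + b*x; the height and weight figures are hypothetical.

```python
import math

def sums_of_squares(x, y):
    """Return (Sxx, Syy, Sxy, mean_x, mean_y) for paired observations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy, mx, my

def pearson_r(x, y):
    """Pearson correlation coefficient, always between -1 and +1."""
    sxx, syy, sxy, _, _ = sums_of_squares(x, y)
    return sxy / math.sqrt(sxx * syy)

def simple_linear_regression(x, y):
    """Least-squares intercept a and regression coefficient b of y = a + b*x."""
    sxx, _, sxy, mx, my = sums_of_squares(x, y)
    b = sxy / sxx
    return my - b * mx, b

height_cm = [150, 160, 170, 180, 190]  # hypothetical data
weight_kg = [52, 58, 66, 71, 80]

r = pearson_r(height_cm, weight_kg)
a, b = simple_linear_regression(height_cm, weight_kg)
print(round(r, 3))               # 0.997
print(round(a, 1), round(b, 2))  # -51.9 0.69
```

Here b = 0.69 means that weight rises by about 0.69 kg for each additional cm of height in this made-up sample.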
Null Hypothesis
 The primary object of statistical analysis is to find out
whether the effect produced by a compound under study is
genuine and is not due to chance.
 First step in such a test is to state the null hypothesis.
 In null hypothesis (statistical hypothesis), we make
assumption that there exist no differences between the two
groups.
Eg: ‘drug A is not better than the placebo’
 Alternative hypothesis (research hypothesis) states that
there is a difference between two groups.
Eg: ‘there is a difference between new drug ‘A’ and placebo.’
• When the null hypothesis is not rejected, the difference
between the two groups is not statistically significant.
• If the null hypothesis is rejected in favour of the alternative
hypothesis, the difference between the two groups is
statistically significant.
• A difference between the drug ‘A’ and placebo groups
that would arise by chance in less than five
percent of cases, that is less than 1 in 20 times, is
considered statistically significant (P < 0.05).
Errors in statistics
 Errors in estimation/sampling
 Errors in statistical analysis
Errors in estimation
Random error:
• Error in measurement ie: measured values
are inconsistent when repeated measures of
a variable are taken
• Unpredictable. Considered as ‘noise’
• Precision is the opposite
• Doesn’t affect average, but affects variability
around mean
Systematic error:
• Caused by any factor which systematically
affects measurements of variable
• Affects the mean
• Called as bias
• Opposite is accuracy (validity)
 Precision: degree to which repeated measurements
show same or similar results
Also called repeatability, reliability, consistency,
reproducibility
 Accuracy: degree of closeness of a measured value to
its actual/true value
Bias
• Any systematic error in an epidemiological study
occurring during data collection, compilation, analysis,
or interpretation.
Predominantly 3 types:
1. Subject bias eg: Hawthorne bias, recall bias
2. Observer/ investigator bias eg: selection bias,
Berksonian bias
3. Analyser bias
 Hawthorne/ attention bias: subjects may alter their
behavior when they know they are being observed.
 Apprehension bias: certain variables ( BP, Heart rate)
may alter from usual levels if subject is apprehensive
• Berksonian bias/ admission rate bias: bias due to
hospital cases and controls being systematically
different from each other
 Selection bias: Selection bias occurs as a result of
patients declining to take part in a clinical trial and
therefore those who do take part may differ in some way.
 Publication bias: Studies with positive or statistically
significant results are more likely to be published by
scientific journals compared with studies yielding
negative results.
Measures to minimise bias
 Blinding
- Single blinding eliminates subject bias
- Double blinding eliminates subject and observer bias
- Triple blinding eliminates subject, observer and analyzer
bias
 Randomization – eliminates selection bias
Randomization ensures that the two groups are comparable
and that the only difference between them is the intervention
of interest.
 Matching – eliminates confounding
Errors in analysis
Type I error ( false rejection of null hypothesis, FALSE
POSITIVE)
 Also known as α error.
 It is the probability of finding a difference when no such
difference actually exists.
• Type I error can be made small by choosing a stricter (lower)
level of significance; for a fixed significance level, increasing
the sample size mainly reduces type II error.
 Eg: we proved in our trial that new drug ‘A’ has an analgesic
action and accepted as an analgesic. If we commit type I
error in this experiment, then subsequent trial on this
compound will automatically reject our claim that drug ‘A’ is
having analgesic action
• The P value is the probability of obtaining a difference at least
as large as the one observed if the null hypothesis were true.
 Significance level or α level is the maximum tolerable
level of type 1 error. α level is fixed in advance.
 If p value < significance level, results are declared
statistically significant.
 Most commonly P value less than 0.05 or 5% is
considered the significance level. If we adopt a
stricter standard like P < 0.01 or 1%, then type 1 error
will be reduced.
Type II Error (false acceptance of null hypothesis, FALSE
NEGATIVE)
 This is also called as β error.
 It is the probability of inability to detect the difference when
it actually exists.
 This error is more serious because once we labelled the
compound as inactive, there is possibility that nobody will
try it again.
 Minimized by taking larger sample and by employing
sufficient dose of the compound under trial.
 Most medical research will accept a β value of 0.2
 Study power is the probability that it will detect a statistically
significant difference if one exists. It is calculated as (1-β).
Acceptable power of a study: 0.8
Other measures of probability
Odds ratio
 This is used to measure the effect of certain intervention on
the probability of an event happening.
• An odds ratio of 1 means there is no difference
between the 2 groups.
 In a case control study, OR is calculated from the 2 by 2
table
 OR ( Cross product ratio) = ad/bc
 Interpretation of OR : >1 – associated, =1 – not associated,
<1 – has protective effect
Risk ratio
Risk ratio is very similar to odds ratio, except that in risk ratio
calculations the denominator is the total number in each group
(those with and without the event).
Relative risk = incidence among exposed / incidence among
non exposed ie:
[a/(a+b)]/[c/(c+d)]
Attributable risk: indicates to what extent the disease can be
attributed to the exposure =(incidence among exposed-
incidence among non exposed)/ incidence among exposed
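The three measures above can be computed from one 2 by 2 table. The sketch assumes the usual layout, which the slides do not spell out: a = exposed with disease, b = exposed without disease, c = non exposed with disease, d = non exposed without disease. The counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Cross product ratio ad/bc."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """Incidence among exposed divided by incidence among non exposed."""
    return (a / (a + b)) / (c / (c + d))

def attributable_risk(a, b, c, d):
    """(Incidence in exposed - incidence in non exposed) / incidence in exposed."""
    exposed = a / (a + b)
    non_exposed = c / (c + d)
    return (exposed - non_exposed) / exposed

a, b, c, d = 30, 70, 10, 90  # hypothetical counts
print(round(odds_ratio(a, b, c, d), 2))               # 3.86
print(round(relative_risk(a, b, c, d), 2))            # 3.0
print(round(attributable_risk(a, b, c, d) * 100, 1))  # 66.7
```

With these made-up counts, about 67% of the disease among the exposed could be attributed to the exposure.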
Sample Size Determination
Factors Influencing Sample Size Include:
1) Prevalence of particular event or characteristics- If the
prevalence is high, small sample can be taken and vice versa.
If prevalence is not known, then it can be obtained by a pilot
study.
2) Probability level considered for accuracy of estimate- If we
need more safeguard about conclusions on data, we need a
larger sample. Hence, the size of sample would be larger when
the safeguard is 99% than when it is only 95%.
3) Availability of money, material, and manpower.
4) Time bound study curtails the sample size
Sample Size Determination and Variance Estimate
Formula requires the knowledge of standard deviation or
variance.
Frequently used sources for estimation of standard
deviation are:
 A pilot or preliminary sample may be drawn from the
population, and the standard deviation computed from this sample
may be used as an estimate of the population standard deviation.
Observations used in the pilot sample may be counted as
part of the final sample.
 From the previous or similar studies
5 points are to be considered very carefully.
1. Assess the minimum expected difference between the
groups.
2. Find out the standard deviation of the variables.
3. Set the level of significance (alpha level, generally set
at P < 0.05) and the power of the study (1-beta = 80%).
4. Select the appropriate formula or computer program to obtain
the sample size. Various software packages are available free
of cost for calculating sample size and power of
study.
5. Lastly, make appropriate allowances for non-compliance
and dropouts; this gives the final
sample size for each group in the study.
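The five points can be turned into a small calculation. The slides do not name a formula, so this sketch assumes one common choice for comparing two means, the normal-approximation formula n per group = 2 (Zα/2 + Zβ)² SD² / d², where d is the minimum expected difference. The SD of 8, the difference of 4 and the 10% dropout rate are hypothetical inputs.

```python
import math
from statistics import NormalDist

def sample_size_two_means(sd, difference, alpha=0.05, power=0.80):
    """Subjects per group for comparing two means (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / difference ** 2
    return math.ceil(n)

def allow_for_dropouts(n, dropout_rate):
    """Inflate the sample size so that n subjects remain after dropouts."""
    return math.ceil(n / (1 - dropout_rate))

n_per_group = sample_size_two_means(sd=8, difference=4)
print(n_per_group)  # 63
print(allow_for_dropouts(n_per_group, 0.10))
print(sample_size_two_means(sd=8, difference=4, power=0.90))  # larger than 63
```

Raising the power or shrinking the expected difference both increase the required sample size.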
Power of Study
 It is a probability that study will reveal a difference
between the groups if the difference actually exists.
 Power of study is very important while calculation of
sample size.
 Any study to be scientifically sound should have at least
80% power. If power of study is less than 80%,
probability of missing the difference is high
 If we increase the power of study, then sample size also
increases.
STATISTICAL TESTS
 parametric tests (for gaussian distribution)
 non-parametric tests (for non-Gaussian distribution)
 Non-parametric tests are less powerful than parametric
tests. Generally, P values tend to be higher, making it
harder to detect real differences.
Few systematic steps should be followed to establish the
appropriate test for a data.
1. Identify whether the data are Qualitative or Quantitative
2. For Quantitative data, determine the type of distribution
3. Decide how many groups are being compared
4. Determine whether the data is paired or not.
Student’s ‘t’ Test
• Applied for analysis when the sample size is 30 or
less. If the sample size is more than 30, the ‘Z’ test is applied.
• It is usually applicable for graded (quantitative) data like blood
sugar level, body weight, height etc.
 When comparison has to be made between two
measurements in the same subjects after two consecutive
treatments, paired ‘t’ test is used. Eg: when we want to
compare effect of drug A (i.e. decrease blood sugar) before
start of treatment (baseline) and after 1 month of treatment
with drug A.
 When comparison is made between two
measurements in two different groups, unpaired
‘t’ test is used.
For example, when we compare the effects of drug
A and B (i.e. mean change in blood sugar) after one
month from baseline in both groups, unpaired ‘t’
test’ is applicable.
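The two forms of the test differ only in how the standard error is built. The sketch below computes the t statistic and degrees of freedom by hand; the blood sugar values are hypothetical, and the unpaired version assumes equal variances in the two groups (pooled variance). The resulting t would then be compared with a t table at the stated degrees of freedom.

```python
import math
from statistics import mean, stdev, variance

def paired_t(before, after):
    """t statistic and degrees of freedom for paired measurements."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

def unpaired_t(group1, group2):
    """t statistic and degrees of freedom for two independent groups."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2

# Hypothetical blood sugar (mg %) at baseline and after 1 month of drug A
baseline = [110, 120, 115, 130, 125]
one_month = [100, 112, 110, 118, 120]
t_paired, df_paired = paired_t(baseline, one_month)
print(round(t_paired, 2), df_paired)  # 5.8 4

# Hypothetical mean change in blood sugar in two separate groups
change_drug_a = [10, 8, 5, 12, 5]
change_drug_b = [4, 6, 3, 7, 5]
t_unpaired, df_unpaired = unpaired_t(change_drug_a, change_drug_b)
print(round(t_unpaired, 2), df_unpaired)  # 1.94 8
```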
ANOVA
 One way ANOVA
It compares three or more unmatched groups when the
data are categorized in one way.
For example, to compare a control group with three
different doses of aspirin in rats. Here, there are four
unmatched group of rats.
 Two way ANOVA
Determines how a response is affected by two factors.
For example, to measure response to three different drugs in both
men and women.
Chi-square test
 The Chi-square test is a non-parametric test of
proportions.
 Two events can often be studied for their association
such as smoking and cancer, treatment and outcome of
disease, level of cholesterol and coronary heart disease
 Test measures the probability (P) or relative frequency of
association due to chance and also if two events are
associated or dependent on each other.
 Though, Chi-square test tells an association between two
events or characters, it does not measure the strength of
association.
• Chi-square (χ²) is calculated from a contingency table
(e.g. 2×2)
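For a 2×2 table the statistic is the sum of (observed - expected)² / expected over the four cells, with 1 degree of freedom. The sketch below uses hypothetical counts and applies no continuity correction; a value above 3.84 corresponds to P < 0.05 at 1 degree of freedom.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # row total * column total / grand total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical counts: outcome present/absent in treated and untreated groups
chi2 = chi_square_2x2(30, 70, 10, 90)
print(round(chi2, 2))  # 12.5
print(chi2 > 3.84)     # True
```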
Types of clinical studies
RETROSPECTIVE:
• Case control
• Cross sectional
PROSPECTIVE:
• Cohort
• Randomized/ non randomized interventional control trials
Retrospective studies
 Retrospective studies look backward in time and
select study groups based on their exposure to a
risk or protective factor in relation to an outcome
established at the start of the study
Useful in:
 rare conditions when a prospective approach
would take too long
 significant lag period between exposure and
disease
 situations where a prospective investigation may
be unethical
 insufficient evidence to justify an interventional
Advantages: Retrospective studies are relatively
inexpensive and can utilize existing databases and
registers.
Disadvantages:
 recall bias
 not possible to randomize the groups – confounding
factors may be present
 Cross-sectional studies/prevalence study
Examine either a random sample or all of the subjects in
a well-defined study population in order to obtain the
answer to a specific clinical question. They include
surveys and studies which examine the prevalence of a
disease.
Prevalence = [(new + old cases)/total population]*100
 Case–control studies
Patients with a specific disease or condition are selected
and matched to a control group. The cases and controls
are then compared for potential risk factors or causative
agents implicated in the aetiology of the disease.
Disadv: various types of bias
 Observational cohort studies
Cohort studies involve the selection of two or more groups
and their subsequent follow-up over a number of years.
The groups are selected based on the differences in their
exposure to a particular agent and patients are followed
up to see who develops the illness.
Incidence = (number of new cases/ total population at risk)*1000
 Randomized and non-randomized (cohort) interventional
controlled trials
It evaluates an intervention rather than merely observing
two or more groups over time. Systematic bias should be
avoided.
The groups being compared should ideally only be
Basic steps in conducting a RCT:
1. Drawing up a protocol
2. Selecting reference and experimental population
3. Randomization
4. Intervention
5. Follow up
6. Assessment of outcome
Sensitivity
 Ability of the test to correctly identify those
patients with the disease.
 A test with 80% sensitivity detects 80% of
patients with the disease (true positives) but 20%
with the disease go undetected (false negatives).
 A high sensitivity is clearly important where the
test is used to identify a serious but treatable
disease (e.g. cervical cancer).
Specificity
 Ability of the test to correctly identify those
patients without the disease.
 A test with 80% specificity correctly reports 80%
of patients without the disease as test negative
(true negatives) but 20% patients without the
disease are incorrectly identified as test positive
(false positives).
Positive predictive value
 The PPV of a test determines how likely is it that
a patient has the disease given that the test result
is positive.
• PPV is higher when the prevalence of the
disease in the population is high.
Negative predictive value
 The NPV of a test determines how likely is it that
a patient does not have the disease given that the
test result is negative
 Likelihood ratio is defined as how much more
likely is it that a patient who tests positive has the
disease compared with one who tests negative
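All of these test characteristics come from the same four counts. The sketch below assumes the standard definitions (PPV = TP/(TP+FP), NPV = TN/(TN+FN), positive likelihood ratio = sensitivity/(1 - specificity)); the counts are hypothetical and describe a test with 80% sensitivity and 80% specificity at two different prevalences.

```python
def diagnostic_measures(tp, fn, fp, tn):
    """Sensitivity, specificity, predictive values and positive likelihood ratio."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
    }

# Hypothetical 1000 patients, prevalence 10%
low_prevalence = diagnostic_measures(tp=80, fn=20, fp=180, tn=720)
# Hypothetical 1000 patients, prevalence 50%
high_prevalence = diagnostic_measures(tp=400, fn=100, fp=100, tn=400)

print(round(low_prevalence["ppv"], 3))          # 0.308
print(round(high_prevalence["ppv"], 3))         # 0.8
print(round(low_prevalence["npv"], 3))          # 0.973
print(round(low_prevalence["lr_positive"], 2))  # 4.0
```

Sensitivity and specificity are the same in both scenarios, but the PPV rises from about 31% to 80% as the prevalence increases.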
THANK YOU

More Related Content

Similar to Statistics ppt.ppt

Statistical Methods in Research
Statistical Methods in ResearchStatistical Methods in Research
Statistical Methods in ResearchManoj Sharma
 
Review of Basic Statistics and Terminology
Review of Basic Statistics and TerminologyReview of Basic Statistics and Terminology
Review of Basic Statistics and Terminologyaswhite
 
3 measures of central dendency
3  measures of central dendency3  measures of central dendency
3 measures of central dendencyDr. Nazar Jaf
 
Statistical treatment and data processing copy
Statistical treatment and data processing   copyStatistical treatment and data processing   copy
Statistical treatment and data processing copySWEET PEARL GAMAYON
 
Biostatistics basics-biostatistics4734
Biostatistics basics-biostatistics4734Biostatistics basics-biostatistics4734
Biostatistics basics-biostatistics4734AbhishekDas15
 
Biostatistics basics-biostatistics4734
Biostatistics basics-biostatistics4734Biostatistics basics-biostatistics4734
Biostatistics basics-biostatistics4734AbhishekDas15
 
Statistics "Descriptive & Inferential"
Statistics "Descriptive & Inferential"Statistics "Descriptive & Inferential"
Statistics "Descriptive & Inferential"Dalia El-Shafei
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendencyMmedsc Hahm
 
Introduction to statistics in health care
Introduction to statistics in health care Introduction to statistics in health care
Introduction to statistics in health care Dhasarathi Kumar
 
Biostatistics
BiostatisticsBiostatistics
Biostatisticspriyarokz
 
descriptive and inferential statistics
descriptive and inferential statisticsdescriptive and inferential statistics
descriptive and inferential statisticsMona Sajid
 
1 descriptive statistics
1 descriptive statistics1 descriptive statistics
1 descriptive statisticsSanu Kumar
 

Similar to Statistics ppt.ppt (20)

Stat-Lesson.pptx
Stat-Lesson.pptxStat-Lesson.pptx
Stat-Lesson.pptx
 
Statistical Methods in Research
Statistical Methods in ResearchStatistical Methods in Research
Statistical Methods in Research
 
Unit 3 Sampling
Unit 3 SamplingUnit 3 Sampling
Unit 3 Sampling
 
Review of Basic Statistics and Terminology
Review of Basic Statistics and TerminologyReview of Basic Statistics and Terminology
Review of Basic Statistics and Terminology
 
Basic concept of statistics
Basic concept of statisticsBasic concept of statistics
Basic concept of statistics
 
3 measures of central dendency
3  measures of central dendency3  measures of central dendency
3 measures of central dendency
 
Statistical treatment and data processing copy
Statistical treatment and data processing   copyStatistical treatment and data processing   copy
Statistical treatment and data processing copy
 
Biostatistics basics-biostatistics4734
Biostatistics basics-biostatistics4734Biostatistics basics-biostatistics4734
Biostatistics basics-biostatistics4734
 
Biostatistics basics-biostatistics4734
Biostatistics basics-biostatistics4734Biostatistics basics-biostatistics4734
Biostatistics basics-biostatistics4734
 
Statistics "Descriptive & Inferential"
Statistics "Descriptive & Inferential"Statistics "Descriptive & Inferential"
Statistics "Descriptive & Inferential"
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendency
 
Advanced statistics
Advanced statisticsAdvanced statistics
Advanced statistics
 
STATISTICS.pptx
STATISTICS.pptxSTATISTICS.pptx
STATISTICS.pptx
 
Introduction to statistics in health care
Introduction to statistics in health care Introduction to statistics in health care
Introduction to statistics in health care
 
Bgy5901
Bgy5901Bgy5901
Bgy5901
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Descriptive Analysis.pptx
Descriptive Analysis.pptxDescriptive Analysis.pptx
Descriptive Analysis.pptx
 
Biostatistics
BiostatisticsBiostatistics
Biostatistics
 
descriptive and inferential statistics
descriptive and inferential statisticsdescriptive and inferential statistics
descriptive and inferential statistics
 
1 descriptive statistics
1 descriptive statistics1 descriptive statistics
1 descriptive statistics
 

Statistics ppt.ppt

  • 2. INTRODUCTION  Statistics may be defined as the discipline concerned with the treatment of numerical data derived from group of individuals.  Biostatistics is a branch of statistics applied to biological or medical sciences.  Consists of various steps like generation of hypothesis, collection of data, and application of statistical analysis.
  • 3.  Two major branches : descriptive and inferential.  Descriptive statistics explain the distribution of population measurements by providing types of data, estimates of central tendency (mean, mode and median), and measures of variability (standard deviation, correlation coefficient)  Inferential statistics is used to express the level of certainty about estimates and includes hypothesis testing, standard error of mean, and confidence interval.
  • 4. DATA  Observations recorded during research constitute data. The two main types of data:  Qualitative  Quantitative  most studies will have a combination of both
  • 5. Qualitative Data/Categorical  variables that do not have a numerical value. They usually describe a meaning and give a name or label to variables.  2 TYPES: nominal and ordinal
  • 6. Quantitative Data  These are variables that are truly numerical. Quantitative data (interval data) may be discrete or continuous  A continuous variable can take any value within a given range, eg: hemoglobin (Hb) level may be recorded as 11.3, 12.6 or 13.4 gm %  A discrete variable is usually assigned integer values, i.e. it does not take fractional values. Eg: blood pressure readings (generally recorded as whole numbers) or the number of cigarettes smoked per day by a person.
  • 7. DATA COLLECTION  Sample refers to the subjects chosen from the population for investigation. It should be ensured that the sample is representative of the whole population Need for Sampling :  makes it easier and more economic than studying the whole population  saves time, manpower, cost, increases efficiency. If sample size is too small: it will not give us valid results If too large: more cost and manpower
  • 8. Methods of sampling Random sampling / probability sampling:  Simple random sampling  Stratified random sampling  Systematic random sampling  Multistage sampling  Multiphase sampling  Cluster random sampling Non random / non probability sampling:  Convenience sampling  Contact sampling  Quota sampling  Volunteer sampling  Snow ball sampling
  • 9. Random sampling  Simple random: chosen randomly, entirely by chance. Each individual has the same probability of being chosen. Methods: lottery, random number tables, computer generated.  Systematic random: individuals in the population are arranged in a certain manner, a random starting point is selected and every nth individual is selected.  “n” is the sampling interval, ie: total number of units in the population / total number of units in the sample. Some individuals have a larger probability of being chosen.
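The two schemes on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's method: the population of 1000 numbered units, the sample size of 100, the seed and the function names are all assumptions made for the example.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Pick k units entirely by chance, without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, k, seed=None):
    """Pick every nth unit after a random start, n = population size / sample size."""
    interval = len(population) // k          # the sampling interval "n"
    rng = random.Random(seed)
    start = rng.randrange(interval)          # random starting point in the first interval
    return [population[start + i * interval] for i in range(k)]

# Hypothetical population: units numbered 1..1000
population = list(range(1, 1001))
srs = simple_random_sample(population, 100, seed=1)
sys_sample = systematic_sample(population, 100, seed=1)
print(sys_sample[:5])  # consecutive picks are exactly one interval (10 units) apart
```

In the systematic version only the starting point is random; every later pick is fixed by the interval.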
  • 10.  Stratified random: the whole population is first stratified (a non homogeneous population is divided into homogeneous groups) and then random sampling is applied to each stratum. Eg: for a sample of 100 from a heterogeneous population of 1000, the population is first divided into homogeneous strata (ie: 700 males, 300 females), then 70 males and 30 females are selected randomly.  Multistage sampling: done in successive stages. Each sampling unit is nested in the previous sampling unit.  Eg: in large country surveys, states are chosen, then districts, then every 10th person as the final sampling unit.
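The 700/300 example above uses proportionate allocation, which can be sketched as follows. The unit labels, the seed and the `stratified_sample` helper are hypothetical; the sketch assumes the sample is split across strata in proportion to stratum size.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    pop_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(total_n * len(members) / pop_size)  # proportionate share of the sample
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population of 1000: 700 males and 300 females
strata = {
    "male": [f"M{i}" for i in range(700)],
    "female": [f"F{i}" for i in range(300)],
}
picked = stratified_sample(strata, total_n=100, seed=7)
print({name: len(units) for name, units in picked.items()})
```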
  • 11.  Multiphase sampling: done in successive phases, ie: part of the information is obtained from the whole population and part from a subsample.  Eg: in a TB survey, the Mantoux test is done in the first phase, then X-ray is done in all Mantoux positives, then sputum is tested in all X-ray positives.  Cluster sampling: applicable when units of population are natural groups or clusters. All individuals in the cluster are selected as a whole.
  • 12. Non random sampling  Selection based on expert knowledge of the population. It cannot be assured that each item has an equal chance of being selected.  Convenience sampling: patients selected at the convenience of the researcher. Eg: selecting shoppers in a mall as they walk by to fill out a survey.  Quota sampling: population segmented into mutually exclusive groups, then judgement is used to select units.  Snow ball sampling: existing study subjects recruit future subjects from their acquaintances, thus the sample group appears to grow like a snowball. Used for hidden populations which are difficult to access, like drug abusers and commercial sex workers.
  • 13. DATA PRESENTATION Raw data can be presented in three different ways  tabular,  graphic (chart),  numerical (descriptive statistics) forms.
  • 14. Tabular form  Frequency, cumulative frequency and relative frequency tables  These methods can be used to present all different types of variables including nominal, ordinal and quantitative data. In order to present continuous data in this way, it needs to be arranged into groups (intervals) first.
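As a rough sketch of how continuous data are grouped into intervals and tabulated, the snippet below builds frequency, cumulative frequency and relative frequency columns. The ten Hb values, the interval width of 1 gm % and the starting point of 10 are invented for illustration.

```python
import math
from collections import Counter

def frequency_table(values, start, width):
    """Group values into equal-width intervals and tabulate them.

    Returns rows of (lower, upper, frequency, cumulative frequency, relative frequency).
    """
    counts = Counter(math.floor((v - start) / width) for v in values)
    rows = []
    cumulative = 0
    for idx in range(max(counts) + 1):
        freq = counts.get(idx, 0)
        cumulative += freq
        lower = start + idx * width
        rows.append((lower, lower + width, freq, cumulative, freq / len(values)))
    return rows

# Hypothetical hemoglobin readings in gm %
hb = [11.3, 12.6, 13.4, 10.8, 12.1, 13.9, 11.7, 12.4, 14.2, 12.9]
table = frequency_table(hb, start=10.0, width=1.0)
for lower, upper, freq, cum, rel in table:
    print(f"{lower:.0f}-{upper:.0f}  f={freq}  cum={cum}  rel={rel:.2f}")
```

The cumulative column is what an ogive plots against the data values.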
  • 15.
  • 16. Graphic (Chart) Presentation  Pie chart  Bar chart  Histograms  Frequency curves  Cumulative frequency curves ( ogive )  Scatter plots
  • 17. Pie chart  useful to show the proportion of different groups that constitute the total sample  The whole pie represents the total sample while the size occupied by each group will be proportional to their number.  used for ordinal and nominal data.
  • 18. Bar charts  Bar charts are used to compare different classes of data.  The x axis is usually dimensionless while the y axis represents the frequency of each class.  Each class could represent a single group, or be further divided into subgroups
  • 19. HISTOGRAMS  specialized bar chart used to give a visual presentation of interval data.  Quantitative data, and in particular continuous data, are divided into intervals in order to be integrated into frequency tables. USES:  To show the mode of distribution of the data.  To demonstrate descriptive statistics like mean, mode and standard deviation
  • 20. Frequency Curves • These are very similar to histograms, but without the bars. • Advantage is that they can be used to compare the distribution of 2 or more groups on the same chart
  • 21. Ogive  Data values represented on the horizontal axis and either the cumulative frequencies, the cumulative relative frequencies or cumulative percent frequencies on the vertical axis.  This type of graph is useful to identify the proportion of a sample that falls below or above certain limit.
  • 22. Scatter plots • These are used to determine if there is any relationship between two sample variables. • strength of the relationship can be calculated using a correlation coefficient.
  • 23. Numerical Presentation (Descriptive Statistics)  Main aim: present a meaningful summary of the sample data rather than drawing conclusions about the whole population  Three key characteristics are distribution, central tendency and measurements of spread. 2 types of frequency distribution:  Normal/ Gaussian  Non normal/ non Gaussian
  • 24. Gaussian /normal distribution/ symmetrical  If data are symmetrically distributed on both sides of the mean and form a bell-shaped curve in the frequency distribution plot, the distribution of data is called normal or Gaussian.  The normal curve describes the ideal distribution of continuous values, e.g. heart rate, blood sugar level and Hb % level.  The normal (parametric) distribution is characterized by a single peak (unimodal) and a symmetrical spread of values on either side.  All central tendency measures (mean, mode and median) are equal in a normal distribution and they are represented by the point of maximum frequency. The spread of data is equal on either side and is described by the standard deviation (SD).
  • 25.  In an ideal Gaussian distribution, the values lying between the points 1 SD below and 1 SD above the mean value (i.e. ± 1 SD) will include 68.27% of all values.  The range, mean ± 2 SD includes approximately 95% of values distributed about this mean, excluding 2.5% above and 2.5% below the range.  Methods of analysis: ‘t’ test and analysis of variance.  Properties: mean = median = mode; total area under the curve = 1; for the standard normal curve, SD = 1 and variance = 1; skew is zero.
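The 68.27% and roughly 95% figures can be checked numerically. The sketch below uses the standard result that, for a normal distribution, the fraction of values within k SD of the mean equals erf(k/√2); the function name is an assumption for the example.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mean ± {k} SD covers {fraction_within(k) * 100:.2f}% of values")
```

The exact two-sided 95% limits sit at about ± 1.96 SD; ± 2 SD is the usual rounded rule and covers slightly more than 95%.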
  • 26. Non Gaussian distribution/ asymmetrical  Skewed – positive, negative  Bimodal  Multimodal  J shaped  Bow shaped  V shaped
  • 27.  If the difference (mean – median) is positive, the curve is positively skewed; if (mean – median) is negative, the curve is negatively skewed. In both cases the measures of central tendency differ from one another.  Measures of skewness: Karl Pearson’s measure, Bowley’s measure, Kelly’s measure, moment measure
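A small sketch of the (mean – median) rule follows. It uses one common form of Karl Pearson's measure, 3 × (mean – median) / SD, which has the same sign as (mean – median). The data set and function name are made up for illustration.

```python
import statistics

def pearson_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / SD."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)
    return 3 * (mean - median) / sd

# Hypothetical data with one large value pulling the mean to the right
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]
left_tailed = [-v for v in right_tailed]    # mirror image of the same data

print(pearson_skewness(right_tailed))  # positive: mean > median
print(pearson_skewness(left_tailed))   # negative: mean < median
```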
  • 28. Central tendency in various distributions:
  • 29. MEASURES OF CENTRAL TENDENCY An estimate of the "center" of a distribution of values. The three central tendency measures are mean, median and mode  Mean  Total sum of the values divided by the number of observations (arithmetic mean).  Used for parametric data and should not be used to report central tendency of ordinal or nominal data.  Eg: Suppose the heights of 7 children are 60, 70, 80, 90, 90, 100, and 110 cm. Mean (X) = Σx/n = 600/7 = 85.71 cm.  Most affected measure if outliers ( extreme values ) are present
  • 30.  Median is the middle value when all the data are arranged in numerical order. This means that 50% of the data are below and 50% above that value. It is preferred for measuring central tendency in nonparametric data since it is less affected by outliers than the mean.  Mode is the most frequently occurring observation in a set of data. It is not a good indicator of central tendency, but it is the only measure of central tendency available for nominal data. Least affected by outliers. In a moderately skewed distribution, mode ≈ 3 median - 2 mean
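The three measures can be computed for the seven heights from the previous slide with Python's standard `statistics` module. The second data set, where the tallest child is replaced by an extreme value of 200 cm, is a hypothetical addition to show how an outlier moves the mean but not the median or mode.

```python
import statistics

heights_cm = [60, 70, 80, 90, 90, 100, 110]   # values from the slide

mean = statistics.mean(heights_cm)      # 600 / 7
median = statistics.median(heights_cm)  # middle (4th) value of the ordered data
mode = statistics.mode(heights_cm)      # most frequent value
print(round(mean, 2), median, mode)

# Hypothetical outlier: replace 110 with 200
with_outlier = [60, 70, 80, 90, 90, 100, 200]
print(round(statistics.mean(with_outlier), 2),
      statistics.median(with_outlier),
      statistics.mode(with_outlier))
```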
  • 31. Measures of Spread/dispersion Absolute (have units):  Range  Mean deviation  Standard deviation  Quartile deviation Relative (no units):  Coefficient of range  Coefficient of quartile deviation  Coefficient of mean deviation  Coefficient of variation = (SD/mean)*100
  • 32. Measures of Spread/dispersion  Range is the simplest measure of spread, but with limited practical use. It is the difference between the maximum and the minimum value in a data set.  Variance is the sum of the squared differences of each value from the mean, divided by the number of observations (n).  The SD is the square root of the variance. Standard deviation (SD) describes the variability of the observations about the mean. Also called root-mean-square deviation. If sample size is <30, denominator is (n-1)
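To make the definitions concrete, the sketch below computes the range, variance, SD and coefficient of variation for the seven heights used earlier in the deck. Reusing that data set is an assumption for illustration. Both denominators are shown: `pvariance` divides by n and `variance` divides by n-1.

```python
import statistics

heights_cm = [60, 70, 80, 90, 90, 100, 110]

value_range = max(heights_cm) - min(heights_cm)
pop_var = statistics.pvariance(heights_cm)    # denominator n
sample_var = statistics.variance(heights_cm)  # denominator n - 1
sample_sd = statistics.stdev(heights_cm)      # square root of sample_var
cv_percent = sample_sd / statistics.mean(heights_cm) * 100  # (SD / mean) * 100

print(f"range={value_range} cm, variance(n)={pop_var:.2f}, "
      f"variance(n-1)={sample_var:.2f}, SD={sample_sd:.2f} cm, CV={cv_percent:.1f}%")
```

The n-1 form gives a slightly larger value, and the gap shrinks as the sample grows, which is why the distinction matters most for small samples.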
  • 33.  Percentiles are the main measures of non-parametric data spread.  Eg: tertiles ( 3 equal parts ), quartiles, pentiles, hextiles, heptiles, octiles, deciles, centiles ( 100 equal parts ). Quartiles divide the data into 4 equal parts:  the 1st quartile has 25% of the data below it, the 2nd quartile corresponds to the median and has 50% of data below it, and the 3rd quartile has 75% of data below it. Eg: Percentiles are used in the WHO growth chart. Upper reference curve is 50th centile for boys and lower reference curve is 3rd centile for girls. Road to health is the space between these 2 curves which indicates normality ( 95% of healthy normal children fall in this area )
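Quartiles can be obtained with `statistics.quantiles` (Python 3.8+). The example reuses the seven heights from earlier slides, which is an assumption for illustration. Software packages use slightly different interpolation rules, so quartiles for small samples can differ between tools; this sketch uses the function's default method.

```python
import statistics

heights_cm = [60, 70, 80, 90, 90, 100, 110]

q1, q2, q3 = statistics.quantiles(heights_cm, n=4)  # cut points for 4 equal parts
iqr = q3 - q1                  # interquartile range
quartile_deviation = iqr / 2   # also called the semi-interquartile range

print(q1, q2, q3, iqr, quartile_deviation)
```

The 2nd quartile equals the median, matching the description on the slide.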
  • 34. Standard error of mean  Since we study some patients (sample) to draw conclusions about all patients or population and use the sample mean (M) as an estimate of the population mean (M1) , we need to know how far M can vary from M1 if repeated samples of size N are taken.  Standard error of mean SEM = SD/√n  SEM is always less than SD.  Measure of difference between sample and population values. Uses:  Determine limits of confidence within which the mean would lie  Determine if a sample is drawn from a known population or not  Calculate sample size
  • 35. For example, take fasting blood sugar of 200 lawyers. Suppose mean is 90 mg% and SD = 8 mg%. SEM = SD/√n = 8/√200 = 8/14.14 = 0.57. Mean fasting blood sugar + 2 SEM = 90 + (2 x 0.57) = 91.13 Mean fasting blood sugar - 2 SEM = 90 - (2 x 0.57) = 88.87 (using the unrounded SEM)  So, confidence limits of the mean fasting blood sugar of the lawyers’ population are 88.87 to 91.13 mg %. If the mean fasting blood sugar of another sample of lawyers is 80, we can say that it is not from the same population  Confidence Interval (CI) OR (Fiducial limits): Confidence limits are the two extremes within which the population mean would lie with 95% confidence.
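The lawyer example can be reproduced directly. The sketch below uses the slide's values (mean 90 mg%, SD 8 mg%, n = 200) and the slide's multiplier of 2; the function name and the `z` parameter are assumptions for the example.

```python
import math

def sem_and_limits(mean, sd, n, z=2.0):
    """Standard error of the mean and the mean ± z * SEM confidence limits."""
    sem = sd / math.sqrt(n)
    return sem, mean - z * sem, mean + z * sem

sem, lower, upper = sem_and_limits(mean=90.0, sd=8.0, n=200)
print(f"SEM = {sem:.3f}, limits = {lower:.2f} to {upper:.2f} mg%")

# A sample mean of 80 mg% falls outside these limits
print(lower <= 80.0 <= upper)
```

Passing `z=1.96` instead of 2 gives the exact 95% limits for a normal distribution.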
  • 36. Mathematical relationships between two variables  Correlation  Regression
  • 37. Correlation  Measure of the degree of linear relationship between two continuous variables. It is represented by ‘r’.  The association is positive if the values of x-axis and y-axis tend to be high or low together.  The association is negative if high y axis values tend to go with low x axis values; r = -1 is considered perfect negative correlation.  The larger the absolute value of the correlation coefficient, the stronger the association.  EG: correlation between height and weight, age and height, weight loss and poverty, parity and birth weight, socioeconomic status and hemoglobin.
  • 38.  The correlation coefficient values are always between -1 and +1. If the variables are not correlated, then correlation coefficient is zero  Correlation is represented by scatter diagram.  Pearson coefficient : for Gaussian distribution  Spearman coefficient: for non Gaussian distribution
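Both coefficients can be sketched in plain Python. The Pearson formula is the usual sum of cross-products divided by the square root of the product of the sums of squares; the Spearman version here simply applies the same formula to ranks and assumes there are no tied values. The height and weight figures are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the ranks (no ties assumed)."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical paired measurements
height_cm = [150, 155, 160, 165, 170, 175]
weight_kg = [48, 52, 57, 60, 66, 71]

r = pearson_r(height_cm, weight_kg)
rho = spearman_rho(height_cm, weight_kg)
print(round(r, 3), round(rho, 3))
```

Because weight rises with every step in height here, the rank-based coefficient reaches +1 even though the points do not lie exactly on a straight line.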
  • 39. Regression  Provides the structure of the relationship between 2 quantitative variables.  Regression coefficient (b): measures change in a dependent variable (y) with change in independent variable (x) / variables (x1, x2, x3) Types of regression:  simple linear (1 dependent and 1 independent variable),  multiple linear (1 dependent and more than 1 independent variable)  simple curvilinear (1 dependent and 1 independent variable with some power of the independent variable)  multiple curvilinear (1 dependent and more than 1 independent variable with some power of the independent variables)
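For the simple linear case, the regression coefficient b and the intercept a of the line y = a + b·x can be estimated by ordinary least squares. This is a minimal sketch; the dose and response values are hypothetical and were chosen to lie exactly on a line so the result is easy to check.

```python
def fit_line(x, y):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx          # regression coefficient: change in y per unit change in x
    a = my - b * mx        # intercept: predicted y when x = 0
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x
dose = [1, 2, 3, 4, 5]
response = [3, 5, 7, 9, 11]

a, b = fit_line(dose, response)
print(f"y = {a:.2f} + {b:.2f} x")
print(a + b * 6)   # predicted response at dose 6
```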
  • 40. Null Hypothesis  The primary object of statistical analysis is to find out whether the effect produced by a compound under study is genuine and is not due to chance.  First step in such a test is to state the null hypothesis.  In null hypothesis (statistical hypothesis), we make assumption that there exist no differences between the two groups. Eg: ‘drug A is not better than the placebo’  Alternative hypothesis (research hypothesis) states that there is a difference between two groups. Eg: ‘there is a difference between new drug ‘A’ and placebo.’
  • 41.  When the null hypothesis is not rejected (accepted), the difference between the two groups is not significant.  If the alternative hypothesis is supported, i.e. the null hypothesis is rejected, then the difference between the two groups is statistically significant.  A difference between drug ‘A’ and the placebo group that would have arisen by chance in less than five percent of cases, that is less than 1 in 20 times, is considered statistically significant (P < 0.05).
  • 42. Errors in statistics  Errors in estimation/sampling  Errors in statistical analysis
  • 43. Errors in estimation Random error:  Error in measurement, ie: measured values are inconsistent when repeated measures of a variable are taken  Unpredictable. Considered as ‘noise’  Precision is the opposite  Doesn’t affect average, but affects variability around mean Systematic error:  Caused by any factor which systematically affects measurements of variable  Affects the mean  Called as bias  Opposite is accuracy (validity)
  • 44.  Precision: degree to which repeated measurements show same or similar results Also called repeatability, reliability, consistency, reproducibility  Accuracy: degree of closeness of a measured value to its actual/true value
  • 45. Bias  Any systematic error in an epidemiological study occurring during data collection, compilation, analysis, or interpretation. Predominantly 3 types: 1. Subject bias eg: Hawthorne bias, recall bias 2. Observer/ investigator bias eg: selection bias, Berksonian bias 3. Analyser bias
  • 46.  Hawthorne/ attention bias: subjects may alter their behavior when they know they are being observed.  Apprehension bias: certain variables ( BP, heart rate ) may alter from usual levels if the subject is apprehensive  Berksonian bias/ admission rate bias: bias due to hospital cases and controls being systematically different from each other
  • 47.  Selection bias: Selection bias occurs as a result of patients declining to take part in a clinical trial and therefore those who do take part may differ in some way.  Publication bias: Studies with positive or statistically significant results are more likely to be published by scientific journals compared with studies yielding negative results.
  • 48. Measures to minimise bias  Blinding - Single blinding eliminates subject bias - Double blinding eliminates subject and observer bias - Triple blinding eliminates subject, observer and analyzer bias  Randomization – eliminates selection bias. Randomization ensures that the two groups are comparable and that the only difference between them is the intervention of interest.  Matching – eliminates confounding
  • 49. Errors in analysis Type I error ( false rejection of null hypothesis, FALSE POSITIVE)  Also known as α error.  It is the probability of finding a difference when no such difference actually exists.  Type I error can be made small by lowering the level of significance (α); increasing the sample size mainly reduces type II error.  Eg: we proved in our trial that new drug ‘A’ has an analgesic action and accepted it as an analgesic. If we committed a type I error in this experiment, then subsequent trials on this compound will reject our claim that drug ‘A’ has analgesic action
  • 50.  The P value is the probability of obtaining a difference at least as large as the one observed when the null hypothesis is true; it is used to judge the risk of a type 1 error.  Significance level or α level is the maximum tolerable level of type 1 error. α level is fixed in advance.  If p value < significance level, results are declared statistically significant.  Most commonly a P value less than 0.05 or 5% is taken as the significance level. If we adopt a stricter standard like P < 0.01 or 1%, then type 1 error will be reduced.
  • 51. Type II Error (false acceptance of null hypothesis, FALSE NEGATIVE)  This is also called as β error.  It is the probability of inability to detect the difference when it actually exists.  This error is more serious because once we labelled the compound as inactive, there is possibility that nobody will try it again.  Minimized by taking larger sample and by employing sufficient dose of the compound under trial.  Most medical research will accept a β value of 0.2  Study power is the probability that it will detect a statistically significant difference if one exists. It is calculated as (1-β). Acceptable power of a study: 0.8
  • 52. Other measures of probability Odds ratio  This is used to measure the effect of a certain intervention on the probability of an event happening.  An odds ratio of 1 means there is no difference between the 2 groups.  In a case control study, OR is calculated from the 2 by 2 table  OR ( Cross product ratio) = ad/bc  Interpretation of OR : >1 – associated, =1 – not associated, <1 – has protective effect
  • 53. Risk ratio Risk ratio is very similar to odds ratio. The difference is that in risk ratio calculations the denominator is the total number of subjects in each group. Relative risk = incidence among exposed / incidence among non exposed, ie: [a/(a+b)]/[c/(c+d)] Attributable risk: indicates to what extent the disease can be attributed to the exposure = (incidence among exposed - incidence among non exposed) / incidence among exposed
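The odds ratio, relative risk and attributable risk formulas can be applied to a 2 by 2 table as sketched below. The cell counts are hypothetical, and the layout assumed is a = exposed with disease, b = exposed without disease, c = non exposed with disease, d = non exposed without disease.

```python
def odds_ratio(a, b, c, d):
    """Cross product ratio: ad / bc."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """Incidence among exposed divided by incidence among non exposed."""
    return (a / (a + b)) / (c / (c + d))

def attributable_risk(a, b, c, d):
    """(Incidence in exposed - incidence in non exposed) / incidence in exposed."""
    exposed = a / (a + b)
    non_exposed = c / (c + d)
    return (exposed - non_exposed) / exposed

# Hypothetical 2 by 2 table
a, b, c, d = 30, 70, 10, 90

print(f"OR = {odds_ratio(a, b, c, d):.2f}")
print(f"RR = {relative_risk(a, b, c, d):.2f}")
print(f"AR = {attributable_risk(a, b, c, d) * 100:.1f}%")
```

With these counts the odds ratio is larger than the relative risk; the two get closer as the disease becomes rarer.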
  • 54. Sample Size Determination Factors Influencing Sample Size Include: 1) Prevalence of particular event or characteristics- If the prevalence is high, small sample can be taken and vice versa. If prevalence is not known, then it can be obtained by a pilot study. 2) Probability level considered for accuracy of estimate- If we need more safeguard about conclusions on data, we need a larger sample. Hence, the size of sample would be larger when the safeguard is 99% than when it is only 95%. 3) Availability of money, material, and manpower. 4) Time bound study curtails the sample size
  • 55. Sample Size Determination and Variance Estimate The formula requires knowledge of the standard deviation or variance. Frequently used sources for estimating the standard deviation are:  A pilot or preliminary sample may be drawn from the population, and the standard deviation computed from it may be used as an estimate of the population standard deviation. Observations used in the pilot sample may be counted as part of the final sample.  Previous or similar studies
  • 56. 5 points are to be considered very carefully. 1. Assess the minimum expected difference between the groups. 2. Find out the standard deviation of the variables. 3. Set the level of significance (alpha level, generally set at 0.05, i.e. P < 0.05 is considered significant) and the power of the study (1 − beta = 80%). 4. Select the appropriate formula or computer program to obtain the sample size. Various software packages are available free of cost for calculating sample size and power of study. 5. Lastly, appropriate allowances are made for non-compliance and dropouts, and this gives the final sample size for each group in the study.
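The five points can be sketched as a single calculation. The slides do not name a formula, so this example assumes one common choice: the normal-approximation formula for comparing two means, n per group = 2 (z₁₋α/₂ + z₁₋β)² σ² / Δ², followed by inflation for dropouts. The function name, the dropout adjustment n / (1 − dropout rate), and all numbers are assumptions for illustration only.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sd, alpha=0.05, power=0.80, dropout_rate=0.0):
    """Approximate n per group for a two-sided comparison of two means.

    delta: minimum expected difference between groups (point 1)
    sd: standard deviation of the variable (point 2)
    alpha, power: significance level and power (point 3)
    dropout_rate: expected fraction lost to non-compliance/dropout (point 5)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)
    # Inflate so that the required number remains after dropouts.
    return math.ceil(n / (1 - dropout_rate))


# Hypothetical inputs: difference of 10 units, SD of 20 units, 20% dropout.
print(sample_size_per_group(delta=10, sd=20))
print(sample_size_per_group(delta=10, sd=20, dropout_rate=0.2))
```

Dedicated sample size software may use the t distribution or other refinements and can give slightly different numbers.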
  • 57. Power of Study  It is the probability that a study will reveal a difference between the groups if the difference actually exists.  Power of study is very important when calculating sample size.  To be scientifically sound, a study should have at least 80% power. If the power is less than 80%, the probability of missing a real difference is high.  If we increase the power of the study, the required sample size also increases.
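To see how power depends on sample size, here is a rough sketch using a normal approximation for a two-sided comparison of two means with equal group sizes. The approximation (which ignores the negligible opposite tail), the function name, and the numbers are assumptions for illustration, not part of the slides.

```python
import math
from statistics import NormalDist


def approximate_power(n_per_group, delta, sd, alpha=0.05):
    """Approximate power (1 - beta) to detect a true difference `delta`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sd * math.sqrt(2 / n_per_group)
    return NormalDist().cdf(abs(delta) / standard_error - z_alpha)


# Hypothetical study: true difference 10 units, SD 20 units.
for n in (20, 40, 63, 100):
    print(n, round(approximate_power(n, delta=10, sd=20), 2))
```

With these inputs the power passes the 80% threshold only at roughly 63 subjects per group, which illustrates why a higher target power requires a larger sample.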
  • 58. STATISTICAL TESTS  Parametric tests (for Gaussian distribution)  Non-parametric tests (for non-Gaussian distribution)  Non-parametric tests are less powerful than parametric tests when the data are Gaussian. Generally, P values tend to be higher, making it harder to detect real differences. A few systematic steps should be followed to choose the appropriate test for the data. 1. Identify whether the data are qualitative or quantitative 2. For quantitative data, determine the type of distribution 3. Decide how many groups are being compared 4. Determine whether the data are paired or not.
  • 59.
  • 60. Student’s ‘t’ Test  Applied when the sample size is 30 or less. If the sample size is more than 30, the ‘Z’ test is applied.  It is usually applicable to graded (quantitative) data like blood sugar level, body weight, height etc.  When a comparison has to be made between two measurements in the same subjects, e.g. after two consecutive treatments or before and after one treatment, the paired ‘t’ test is used. Eg: when we want to compare the effect of drug A (i.e. decrease in blood sugar) between baseline (before the start of treatment) and after 1 month of treatment with drug A.
  • 61.  When a comparison is made between two measurements in two different groups, the unpaired ‘t’ test is used. For example, when we compare the effects of drug A and drug B (i.e. mean change in blood sugar) after one month from baseline in both groups, the unpaired ‘t’ test is applicable.
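The paired and unpaired ‘t’ statistics can be sketched with the standard library alone. The unpaired version below assumes equal variances in the two groups (the classical pooled Student's test); the function names and the blood sugar values are hypothetical. The P value is then read from a t table at the returned degrees of freedom, or obtained from a statistics package.

```python
import math
from statistics import mean, stdev, variance


def paired_t(before, after):
    """t statistic and degrees of freedom for paired measurements."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def unpaired_t(x, y):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical blood sugar (mg/dl) at baseline and after 1 month of drug A.
baseline = [150, 160, 170, 180, 190]
one_month = [140, 155, 160, 165, 185]
print(paired_t(baseline, one_month))

# Hypothetical mean change in blood sugar with drug A versus drug B.
change_a = [-10, -5, -10, -15, -5]
change_b = [-2, -4, 0, -6, -3]
print(unpaired_t(change_a, change_b))
```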
  • 62. ANOVA  One way ANOVA It compares three or more unmatched groups when the data are categorized in one way. For example, to compare a control group with three different doses of aspirin in rats. Here, there are four unmatched groups of rats.  Two way ANOVA Determines how a response is affected by two factors. For example, to measure the response to three different drugs in both men and women.
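A minimal sketch of the one way ANOVA calculation: the F statistic is the ratio of the between-group mean square to the within-group mean square. The function name and the rat data are hypothetical; the P value is then looked up from an F table at the returned degrees of freedom, or obtained from a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of unmatched groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical response in a control group and three aspirin dose groups.
control = [10, 12, 11, 13]
low_dose = [9, 10, 11, 10]
mid_dose = [7, 8, 9, 8]
high_dose = [5, 6, 7, 6]
print(one_way_anova_f([control, low_dose, mid_dose, high_dose]))
```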
  • 63. Chi-square test  The Chi-square test is a non-parametric test of proportions.  Two events can often be studied for their association, such as smoking and cancer, treatment and outcome of disease, level of cholesterol and coronary heart disease.  The test measures the probability (P) that the observed association is due to chance, and hence whether two events are associated or dependent on each other.  Although the Chi-square test indicates whether two events or characteristics are associated, it does not measure the strength of the association.
  • 64.  Chi-square (χ²) is calculated from the 2×2 contingency table
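As a sketch of that calculation, the shortcut formula for a 2×2 table is χ² = N(ad − bc)² / [(a+b)(c+d)(a+c)(b+d)] with 1 degree of freedom. The version below applies no continuity (Yates) correction, which is an assumption since the slide does not say which variant is intended; the function name and counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic and P value (1 df) for a 2x2 table, uncorrected."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # With 1 df, chi-square is a squared standard normal deviate, so the
    # upper-tail probability can be written with the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical table: rows = smokers / non-smokers, columns = cancer / no cancer.
chi2, p = chi_square_2x2(30, 70, 10, 90)
print(f"chi-square = {chi2:.2f}, P = {p:.4f}")
```

A small P value indicates that the association is unlikely to be due to chance, but, as noted in the previous slide, it says nothing about the strength of the association.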
  • 65. Types of clinical studies RETROSPECTIVE:  Case control  Cross sectional PROSPECTIVE:  Cohort  Randomized/non-randomized interventional controlled trials
  • 66. Retrospective studies  Retrospective studies look backward in time and select study groups based on their exposure to a risk or protective factor in relation to an outcome established at the start of the study Useful in:  rare conditions when a prospective approach would take too long  significant lag period between exposure and disease  situations where a prospective investigation may be unethical  insufficient evidence to justify an interventional
  • 67. Advantages: Retrospective studies are relatively inexpensive and can utilize existing databases and registers. Disadvantages:  recall bias  not possible to randomize the groups – confounding factors may be present
  • 68.  Cross-sectional studies/prevalence studies Examine either a random sample or all of the subjects in a well-defined study population in order to obtain the answer to a specific clinical question. They include surveys and studies which examine the prevalence of a disease. Prevalence = [(new + old cases) / total population] × 100  Case–control studies Patients with a specific disease or condition are selected and matched to a control group. The cases and controls are then compared for potential risk factors or causative agents implicated in the aetiology of the disease. Disadvantage: various types of bias
  • 69.  Observational cohort studies Cohort studies involve the selection of two or more groups and their subsequent follow-up over a number of years. The groups are selected based on the differences in their exposure to a particular agent, and patients are followed up to see who develops the illness. Incidence = (number of new cases / total population at risk) × 1000  Randomized and non-randomized (cohort) interventional controlled trials These evaluate an intervention rather than merely observing two or more groups over time. Systematic bias should be avoided. The groups being compared should ideally only be
  • 70. Basic steps in conducting a RCT: 1. Drawing up a protocol 2. Selecting reference and experimental population 3. Randomization 4. Intervention 5. Follow up 6. Assessment of outcome
  • 71. Sensitivity  Ability of the test to correctly identify those patients with the disease.  A test with 80% sensitivity detects 80% of patients with the disease (true positives) but 20% with the disease go undetected (false negatives).  A high sensitivity is clearly important where the test is used to identify a serious but treatable disease (e.g. cervical cancer).
  • 72. Specificity  Ability of the test to correctly identify those patients without the disease.  A test with 80% specificity correctly reports 80% of patients without the disease as test negative (true negatives), but 20% of patients without the disease are incorrectly identified as test positive (false positives).
  • 73. Positive predictive value  The PPV of a test indicates how likely it is that a patient has the disease given that the test result is positive.  PPV is higher when the prevalence of the disease in the population is high.
  • 74. Negative predictive value  The NPV of a test indicates how likely it is that a patient does not have the disease given that the test result is negative.  The (positive) likelihood ratio is defined as how much more likely a positive test result is in a patient with the disease than in a patient without it, i.e. sensitivity / (1 − specificity).
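The measures in slides 71 to 74 all come from the same 2×2 table of test result against true disease status. The sketch below is illustrative only: the function names and counts are hypothetical, and the second function uses Bayes' theorem (not stated in the slides) to show how PPV changes with prevalence for a fixed sensitivity and specificity.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Summary measures from true/false positive and negative counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """PPV for a given disease prevalence (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical test evaluated in 100 diseased and 100 healthy patients.
print(diagnostic_metrics(tp=80, fp=20, fn=20, tn=80))

# Same test (80% sensitivity and specificity) at different prevalences.
for prevalence in (0.5, 0.1, 0.01):
    print(prevalence, round(ppv_at_prevalence(0.8, 0.8, prevalence), 3))
```

The second loop shows the PPV falling sharply as the disease becomes rarer, even though sensitivity and specificity stay the same.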