Biostats Lec-2.pdf

ONLINE
BIOSTATISTICS
Lecture - 2

2. Normal Distribution:
• Observations vary continuously on both sides of the
central value
• Probability distribution is symmetric about the mean

Examples of Normal distribution:
1. Measures of size of living tissue:
Length, height, skin area, weight
2. The length of inert appendages:
Hair, claws, nails, teeth
3. Certain physiological
measurements, such as blood
pressure of adult humans.

The curve plotted with the help of data of normal distribution presents a bell
shaped symmetrical curve called "normal distribution curve". This curve is also
known as the "Gaussian curve".
Mean = Median = Mode
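
The symmetry of the normal curve can be checked numerically. The sketch below uses Python's standard `statistics.NormalDist`; the mean of 170 cm and standard deviation of 10 cm are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical example: adult heights with mean 170 cm and SD 10 cm.
heights = NormalDist(mu=170, sigma=10)

# Symmetry: the density is the same at equal distances on either side of the mean.
left = heights.pdf(160)
right = heights.pdf(180)
print(left, right)

# For a normal distribution the mean, median and mode coincide.
print(heights.mean, heights.median, heights.mode)

# Proportion of values within one standard deviation of the mean.
within_one_sd = heights.cdf(180) - heights.cdf(160)
print(round(within_one_sd, 4))
```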

3. Poisson Distribution :
It expresses the probability of a given number of events occurring in a fixed interval of
time or space, if these events occur with a known constant mean rate and
independently of the time since the last event
E.g. Telephone calls per hour
Online orders per day
Number of radioactive decay events per second
Number of plants per square metre
Number of mutations in a DNA strand per unit length
The proportion of cells that will be infected at a given MOI
The number of deaths per year in a given age group.
The number of bacteria in a certain amount of liquid.

Conditions for Poisson Distribution:
• An event can occur any number of times during a time period.
• Events occur independently. In other words, if an event
occurs, it does not affect the probability of another event
occurring in the same time period.
• The rate of occurrence is constant; that is, the rate does not
change based on time.
• The expected number of events is proportional to the
length of the time period. For example, twice as many
events are expected in a 2-hour period as in a 1-hour
period.

• Let X be the discrete random variable that represents the
number of events observed over a given time period.
• Let λ be the average number of events per interval.
• k is the number of times an event occurs in an interval
(k can take values 0, 1, 2, ...).
The probability of observing exactly k events is
P(X = k) = λ^k e^(−λ) / k!
where e is Euler's number.
There is no upper limit on the value of k for this formula,
though the probability rapidly approaches 0 as k increases.
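
The Poisson probability P(X = k) = λ^k e^(−λ) / k! can be evaluated directly. This is a minimal sketch; the function name `poisson_pmf` and the rate λ = 3 events per interval are assumptions made for illustration.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the mean number per interval is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # hypothetical average number of events per interval

# Probabilities for k = 0..10; they shrink quickly once k exceeds lam.
for k in range(11):
    print(f"P(X = {k}) = {poisson_pmf(k, lam):.4f}")

# k has no upper limit, but the probabilities still sum to (almost exactly) 1.
total = sum(poisson_pmf(k, lam) for k in range(50))
print(total)
```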

IFAS ONLINE
Apply Your Mind
Given below are the names of statistical distributions (Column I) and their
characteristic features (Column II)
Column I                     Column II
A. Binomial distribution     i.   Each observation represents one of two outcomes
                                  (success or failure)
B. Poisson distribution      ii.  Probability that is symmetric about the mean
C. Normal distribution       iii. Probability of a given number of events happening
                                  in a fixed interval of time
Which one of the following represents a correct match between columns I and II?
(DEC 2018)
(1) A - (ii) ; B - (i) ; C - (iii)
(2) A - (i) ; B - (ii) ; C - (iii)
(3) A - (i); B - (iii) ; C - (ii)
(4) A - (iii) ; B - (ii); C - (i)

Apply Your Mind
Which one of the following statements regarding normal distribution is NOT correct?
(JUNE 2019)
(1) It is symmetric around the mean
(2) It is symmetric around the median
(3) It is symmetric around the variance.
(4) It is symmetric around the mode.

Apply Your Mind
A weed is assumed to be dispersed randomly in a meadow. What statistical distribution
will describe the dispersion correctly? (DEC 2013)
(1) Binomial (2) Negative Binomial
(3) Poisson (4) Normal

Apply Your Mind
A researcher samples n individuals randomly from a population of blackbuck and
identifies their sex. The number of females in the sample follows (DEC 2019)
(1) an exponential distribution
(2) a binomial distribution
(3) a Poisson distribution
(4) a normal distribution

• First, you need to identify the target
population of your research.
• The population is the entire group
that you want to draw conclusions
about.
• The sample is the specific group of
individuals that you will collect data
from.
• The sampling frame is the actual list
of individuals that the sample will
be drawn from.

When you conduct
research about a
group of people, it’s
rarely possible to
collect data from
every person in that
group. Instead, you
select a sample.
There are two types of sampling methods:
Probability sampling involves random
selection, allowing you to make statistical
inferences about the whole group.
Non-probability sampling involves non-
random selection based on convenience or
other criteria, allowing you to easily collect
initial data.

Probability sampling methods
Probability sampling means that every member of the
population has a chance of being selected.
There are four main types of probability samples:

1. Simple Random Sample:
A random sample is a sample where each item of the
population has an equal chance of being included in the
sample
You want to select a
simple random sample
of 100 employees of
Company X. You
assign a number to
every employee in the
company database
from 1 to 1000, and
use a random number
generator to select 100
numbers.
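
The Company X example can be sketched with Python's standard `random` module. The helper name `simple_random_sample` and the seed are assumptions for illustration; the seed only makes the run repeatable.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items so that every item has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Employees numbered 1 to 1000, as in the Company X example.
employee_ids = list(range(1, 1001))
sample = simple_random_sample(employee_ids, 100, seed=42)

print(len(sample))
print(sorted(sample)[:10])
```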

2. Systematic Random sampling
• Population is large, scattered
and not homogeneous.
• Samples are selected at regular
intervals from the population
All employees of the company
are listed in alphabetical order.
From the first 10 numbers, you
randomly select a starting
point: number 2. From number
2 onwards, every 10th person
on the list is selected (2, 12, 22,
32, and so on), and you end
up with a sample of 100
people.
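
A minimal sketch of systematic sampling, assuming the sampling interval is the population size divided by the sample size (1000 / 100 = 10 for a list of 1000 employees). The helper name `systematic_sample` and the seed are hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start within the first interval, then take every interval-th item."""
    interval = len(population) // n      # e.g. 1000 // 100 = 10
    rng = random.Random(seed)
    start = rng.randrange(interval)      # 0-based position within the first interval
    return population[start::interval][:n]

# Employees numbered 1 to 1000 in list order.
employee_ids = list(range(1, 1001))
sample = systematic_sample(employee_ids, 100, seed=3)

print(sample[:5])
print(len(sample))
```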

3. Stratified Random sampling:
Used when the population is large and not homogeneous.
The population is divided into groups (strata), and a probability
sample is selected from within each stratum.
The company has 800 female
employees and 200 male
employees. You want to ensure
that the sample reflects the
gender balance of the company,
so you sort the population into
two strata based on gender.
Then you use random sampling
on each group, selecting 80
women and 20 men, which
gives you a representative
sample of 100 people.
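
The 800/200 example can be sketched as proportional stratified sampling. The helper name `stratified_sample`, the ID labels and the seed are assumptions for illustration.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Sample each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; with awkward proportions the rounding
        # may make the total differ slightly from total_n.
        n_stratum = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, n_stratum)
    return sample

# Hypothetical employee IDs for the two strata in the example.
strata = {
    "female": [f"F{i}" for i in range(1, 801)],
    "male": [f"M{i}" for i in range(1, 201)],
}
sample = stratified_sample(strata, 100, seed=1)

print({name: len(members) for name, members in sample.items()})
```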

4. Clustered sampling:
The population is divided into groups (clusters), and a sample of
whole clusters is chosen using a probability method.
The company has offices
in 10 cities across the
country (all with roughly
the same number of
employees in similar
roles). You don’t have the
capacity to travel to every
office to collect your data,
so you use random
sampling to select 3 offices
– these are your clusters.
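
In cluster sampling the random draw is made over clusters, not individuals. The sketch below assumes 10 offices of 100 employees each; the office names, the helper `cluster_sample` and the seed are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters; everyone in a chosen cluster is included."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical offices in 10 cities, each with 100 employees.
offices = {
    f"City {c}": [f"City {c} employee {i}" for i in range(1, 101)]
    for c in range(1, 11)
}
selected = cluster_sample(offices, 3, seed=7)

print(sorted(selected))
```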

Non-random sampling
• In a non-probability sample,
individuals are selected
based on non-random
criteria, and not every
individual has a chance of
being included.
• This type of sample is easier
and cheaper to access, but
you can’t use it to make valid
statistical inferences about
the whole population.
Appropriate for
exploratory
and qualitative research.
The aim is not to test
a hypothesis about a
broad population, but to
develop an initial
understanding

1. Convenience sampling
• It includes the individuals who happen
to be most accessible to the
researcher.
You are researching
opinions about student
support services at
your university, so after
each of your classes,
you ask your fellow
students to complete
a survey on the topic.
This is an easy and inexpensive way to
gather initial data, but there is no way
to tell if the sample is representative of
the population, so it can’t produce
generalizable results.

2. Opportunity/Voluntary response
sampling:
Only participants available and willing to participate are used.

3. Purposive sampling
• This type of sampling
involves the researcher
using their judgment to
select a sample that is
most useful to the
purposes of the
research.
You want to know more
about the opinions and
experiences of failed
students at your college

4. Snowball sampling
• If the population is hard to
access, snowball sampling can
be used to recruit participants
via other participants.
• The number of people you
have access to “snowballs” as
you get in contact with more
people.
You are researching
experiences of homelessness
in your city. Since there is no list of all
homeless people, probability sampling
isn’t possible.
You meet one person who agrees to
participate in the research, and
she puts you in contact with
other homeless people that she
knows in the area.

Apply Your Mind
Given below are sampling techniques and their features. Which one of the following
options correctly matches sampling techniques with their features? (DEC 2019 ASSAM)
(1) A-(ii); B-(i); C-(iv); D-(iii)
(2) A-(ii); B-(iv); C-(iii); D-(i)
(3) A-(i); B-(iv); C-(iii); D-(ii)
(4) A-(i); B-(iv); C-(ii); D-(iii)

PARAMETRIC AND NON-PARAMETRIC TESTS

Non-parametric tests:
• These tests don’t require that your data follow the normal
distribution.
• They’re also known as distribution-free tests and are
useful in certain situations (e.g. nominal or ordinal data).
• Used when individual variability among the study groups is
high
Example: Chi square test,
Spearman Correlation,
Kruskal Wallis Test,
Mann-Whitney U test,
Mann-Kendall’s test

Parametric tests
• Make assumptions about the parameters of the population
distribution from which the sample is drawn.
• These tests assume that the population data are normally
distributed.
• Parametric tests are more powerful than non-parametric
tests when their assumptions are met.
• Results can be significantly affected by outliers in a
parametric test.
Example:
• Paired/unpaired t-test
• ANOVA
• Pearson correlation

Tests to assess differences in a continuous dependent
variable in two or more groups
• Parametric:
  • t-test is used for differences in a continuous
  dependent variable between two groups.
  • An ANOVA assesses for differences in a continuous
  dependent variable between more than two groups.
• Non-parametric:
  • Mann-Whitney U test is used for differences in a
  continuous dependent variable between two groups.
  • Kruskal-Wallis test assesses for differences in a
  continuous dependent variable between more than
  two groups.
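
To see why outliers matter more for a parametric test, the sketch below computes a t statistic and a Mann-Whitney U statistic for the same two groups, with and without one extreme value. The data are invented, the t statistic is written in Welch's form (no equal-variance assumption), and only the test statistics are computed, not p-values.

```python
from math import sqrt
from statistics import mean, variance

def welch_t_statistic(a, b):
    """t statistic for the difference in means of two independent groups."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

def mann_whitney_u(a, b):
    """U statistic for group a: pairs (x, y) with x > y count 1, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

# Hypothetical measurements.
control = [10, 12, 11, 13, 12]
treated = [15, 17, 16, 18, 16]
treated_with_outlier = [15, 17, 16, 18, 60]

t_clean = welch_t_statistic(treated, control)
t_outlier = welch_t_statistic(treated_with_outlier, control)
u_clean = mann_whitney_u(treated, control)
u_outlier = mann_whitney_u(treated_with_outlier, control)

print(round(t_clean, 2), round(t_outlier, 2))  # the t statistic changes a lot
print(u_clean, u_outlier)                      # the rank-based U does not change
```

Because U depends only on the ordering of the values, replacing 16 with 60 leaves it unchanged, while the inflated variance shrinks the t statistic.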

Test to assess strength of association between two variables
• Parametric test:
• Pearson correlation is used when assessing the relationship
between two continuous variables.
• Non-parametric test:
• Spearman correlation is appropriate when at least one of the
variables is measured on an ordinal scale.
• Kendall rank correlation is a non-parametric test that measures the
strength of dependence between two variables
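
The difference between Pearson and Spearman correlation shows up on data that are monotonic but not linear. This is a hand-written sketch using only the standard library; the helper names and the dose/response values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: the response rises with dose, but not in a straight line.
dose = [1, 2, 3, 4, 5, 6]
response = [d ** 3 for d in dose]

r_pearson = pearson(dose, response)
r_spearman = spearman(dose, response)
print(round(r_pearson, 3), round(r_spearman, 3))
```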

Apply Your Mind
Choose the correct answer from the statements indicated below: (DEC 2018)
(1) Chi square test is parametric.
(2) Non-parametric test assumes normal distribution.
(3) Results can be significantly affected by outliers in a parametric test.
(4) Non-parametric test is more powerful as compared to parametric test.

Apply Your Mind
Two groups (Control, Treated) are to be compared to test the effect of a treatment.
Since individual variability is high in both groups, the appropriate statistical test to use is
(JUNE 2015)
(1) Analysis of variance.
(2) Kendall's test.
(3) Student's t-test.
(4) Mann-Whitney U-test.

Apply Your Mind
The frequency distributions of tree heights in two forest areas with different annual
rainfall are given. Which of the following statistical analyses will you choose to test
whether rainfall has an effect on tree heights? (JUNE 2013)
(1) t-test for comparison of means.
(2) A non-parametric comparison of the two groups
(3) Correlation analysis of rainfall and mean tree heights.
(4) Regression of tree heights on rainfall.

Apply Your Mind
The use of Kruskal Wallis test is most appropriate in which of these cases? (JUNE 2016)
(1) There are more than two groups and each group is normally distributed.
(2) There are more than two groups and the distribution in each group is not normal.
(3) There are two groups and each group is normally distributed.
(4) There are two groups and the distribution in each group is not normal

CONFIDENCE INTERVAL
• A Confidence Interval is a range of values we are
fairly sure our true value lies in.

EXAMPLE: Average Height of humans
• We measure the heights of 100 randomly
chosen men, and get a mean height of
175cm.
• We calculate standard deviation and it comes
out to be 20 cm.
• The 95% Confidence Interval is: 175 cm ±
3.92 cm.
• This says that the true mean height of ALL men (if we
could measure them all) is likely to lie between 171.08 cm
and 178.92 cm; an interval built this way contains the
true mean in 95% of cases.
• So there is a 1-in-20 chance (5%) that our
Confidence Interval does NOT include the
true mean.

How to calculate CI
Step 1:
• Start with the number of
observations (n=100)
• Calculate the mean (μ=175 cm)
• Calculate the standard
deviation (σ=20 cm)

Step 2:
Decide what Confidence Interval we want:
95% or 99% are common choices.
Then find the "Z" value for that Confidence
Interval (1.96 for 95%, 2.576 for 99%).

• Step 3: use that Z value in this formula for the
Confidence Interval: μ ± Z × σ/√n

Observations (n) = 100
Mean (μ) = 175 cm
Standard deviation (σ) = 20 cm
Z (for 95% CI) = 1.96
175 ± 1.96 × 20/√100
= 175 ± 1.96 × 20/10
= 175 ± 1.96 × 2
= 175 ± 3.92 cm
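
The three steps can be sketched in a few lines. The Z value is looked up from the standard normal distribution with `statistics.NormalDist` instead of a printed table; the function name `confidence_interval` is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(mean, sd, n, level=0.95):
    """Return (lower, upper, z) for the interval mean ± z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin, z

# Values from the height example: n = 100, mean = 175 cm, SD = 20 cm.
low, high, z = confidence_interval(175, 20, 100)
print(round(z, 2), round(low, 2), round(high, 2))
```

Passing `level=0.99` gives the wider 99% interval for the same data.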










Apply Your Mind
The mean and standard deviation of serum cholesterol in a population of senior citizens
are assumed to be 200 and 24mg/dl, respectively. In a random sample of 36 senior
citizens, what values of cholesterol (to the nearest whole number) should lead to
rejection of the null hypothesis at 95% confidence level? (JUNE 2015)
(1) above 224
(2) above 248
(3) below 176 and above 224
(4) below 192 and above 208

Apply Your Mind
For the number of seeds in the fruit of a plant species, H0: µ = 30. A random sample of 9
fruits gives the mean number of seeds as 24 with a standard deviation of 6.12.
(a) What are the confidence limits for the sample mean?
(b) Would you reject or accept the null hypothesis at 95% confidence level?
(DEC 2014)
(1) (a) 18 and 30, (b) reject the hypothesis
(2) (a) 20 and 28, (b) reject the hypothesis
(3) (a) 20 and 28, (b) accept the hypothesis
(4) (a) 18 and 30, (b) accept the hypothesis

ERRORS AND LEVEL OF SIGNIFICANCE

Null Hypothesis:
• The null hypothesis, H0, is the commonly accepted
fact.
• It is the opposite of the alternate hypothesis (H1).
• Researchers work to reject, nullify or disprove the
null hypothesis.
• Researchers come up with an alternate hypothesis,
one that they think explains a phenomenon, and
then work to reject the null hypothesis.
Example: Null hypothesis, H0: Coronavirus cannot
cause disease at high temperature.
Alternate hypothesis, H1: Coronavirus can cause disease even at
high temperature.

Hypothesis Testing
• Hypothesis: an explanation based on limited evidence
• No hypothesis is 100% true unless proven
• There is always a chance of drawing incorrect conclusions
(errors)

Two types of errors in testing of hypothesis
1. Type I error: Rejection of null hypothesis which is true.
2. Type II error: Acceptance of null hypothesis which is false.
               Reject H0           Accept H0
H0 is True     Type I error        Correct Decision
H0 is False    Correct Decision    Type II error
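
The decision table can be written as a small lookup. This is only an illustration; the function name `classify_decision` is hypothetical.

```python
def classify_decision(h0_is_true: bool, h0_rejected: bool) -> str:
    """Name the outcome of a test from the truth of H0 and the decision taken."""
    if h0_is_true and h0_rejected:
        return "Type I error"      # a true null hypothesis was rejected
    if not h0_is_true and not h0_rejected:
        return "Type II error"     # a false null hypothesis was accepted
    return "Correct decision"

for h0_is_true in (True, False):
    for h0_rejected in (True, False):
        print(h0_is_true, h0_rejected, classify_decision(h0_is_true, h0_rejected))
```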

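[Figure: 2 × 2 grid comparing the conclusion of research on a drug (beneficial / harmful) with reality (beneficial / harmful). Matching cells are marked OK; the two mismatched cells are the Type I and Type II errors.]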

Significance level and p-value:
• The p-value is used in hypothesis testing to help you
decide whether to reject the null hypothesis.
• The p-value measures the evidence against the null
hypothesis: it is the probability of obtaining a result at least
as extreme as the one observed if the null hypothesis were true.
• The smaller the p-value, the stronger the evidence
that you should reject the null hypothesis.
• The level of significance is the cut-off for p: the
probability of committing a Type I error that the
researcher is willing to accept.
• The smaller the p-value at which the null hypothesis is
rejected, the lower the probability of the error (Type I)
of rejecting a true null hypothesis.
Significance level (p): Error for Rejection of True Null Hypothesis:
If p > 0.10 → “not significant”
If p ≤ 0.10 → “marginally significant”
If p ≤ 0.05 → “significant”
If p ≤ 0.01 → “highly significant”
p ≤ 0.05: This means that the probability of committing a Type I
error (rejecting a true null hypothesis) is at most 5%, which
corresponds to a confidence level of 95%.
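
A sketch of how a p-value might be computed and labelled with the thresholds above. It assumes a two-sided z-test with a known standard deviation, which is a simplification chosen for illustration; the function names and the numbers (sample mean 52, null mean 50, SD 8, n = 64) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(sample_mean, null_mean, sd, n):
    """p-value of a two-sided z-test of the sample mean against null_mean."""
    z = (sample_mean - null_mean) / (sd / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def significance_label(p):
    """Map a p-value to the verbal labels used on the slide."""
    if p <= 0.01:
        return "highly significant"
    if p <= 0.05:
        return "significant"
    if p <= 0.10:
        return "marginally significant"
    return "not significant"

# Hypothetical data: standard error = 8 / sqrt(64) = 1, so z = 2.
p = two_sided_p_value(sample_mean=52, null_mean=50, sd=8, n=64)
print(round(p, 4), significance_label(p))
```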

Apply Your Mind
In the following statement taken from a research paper, what does p in the parenthesis
stand for? (DEC 2011)
“The mean temperature of this region now is significantly higher than the one 50
years ago (p<0.05, t-test)”
(1) Ratio of the mean temperature of the two time periods tested
(2) Probability of the error of rejecting a true null hypothesis
(3) Probability of the error of accepting a false null hypothesis
(4) Probability of the t-test being effective in detecting significant difference in the
mean annual temperatures of the two periods
