3. • INTRODUCTION
• Statistics is defined as "the discipline that concerns the
collection, organization, analysis, summarization,
interpretation and presentation of data".
4. • A. L. Bowley: the science of counting, or the science of averages
• Tuttle: a body of principles and techniques of collecting, classifying,
presenting, comparing and interpreting quantitative data
• Wallis and Roberts: statistics is a body of methods for making
decisions in the face of uncertainty
• Croxton and Cowden: the collection, presentation, analysis
and interpretation of numerical data
5. • Descriptive statistics: methods for the collection, presentation and
characterization of a set of data. These methods describe the
various features of the collected sample data. They include graphical
representations (e.g. bar charts, line graphs) and quantitative measures.
• Inferential statistics
• Helps in characterizing a population, or in making decisions about it,
based on the results from a sample of that population
• The larger unit about which the analysis is made is called the population,
and the fraction or portion of that population that is studied is called the sample
6. • Biostatistics:
• The science of figures concerned with collecting, analyzing and
interpreting the data obtained from an experimental study or a survey
• Biostatistics is the branch of statistics that deals with data relating to living
organisms.
• It is applied to the collection, analysis, and interpretation of biological
data, especially data relating to human biology, health, and medicine
• Biostatistics is the branch of statistics applied to the biological and medical
sciences, nursing, and public health.
7. • Uses of biostatistics
• To check whether the difference between two populations, for a particular
attribute, is real or a chance occurrence
• To evaluate the efficacy of vaccines
• To fix priorities in public health programs
8. • Steps in Biostatistics:
1. Generation of hypothesis.
2. Collection of experimental data.
3. Classification of the collected data.
4. Categorization and analysis of collected data.
5. Interpretation of data.
9. • Data: the set of observations on which statistical analysis and interpretation are carried out
• Frequency distribution: a statistical method for summarizing
data.
• Statistical data are arranged in groups according to conveniently
established divisions of the range of the observations. When the frequencies of these groups are
listed in a table, the table is known as a "frequency distribution" or "frequency table".
• A frequency distribution is a series in which observations
with similar or closely related values are put in separate groups
10. Objectives of Frequency Distribution
1. To estimate the frequencies of the population
2. To facilitate the analysis of data
3. To facilitate the computation of various statistical measures
• In a frequency distribution, raw data are presented in distinct groups,
which are known as classes
Components of frequency distribution:
• Class: a group formed according to the size of the data.
• Class limits: the smallest and largest possible measurements in each
class, called the lower limit and the upper limit
11. Class mark: also known as the middle value.
Class mark = ½ (Lower limit + Upper limit)
Class interval = Upper limit - Lower limit
Class frequency: the number of observations falling in
each class.
Tally mark: a stroke recorded against a class for each observation counted in it.
12. x        Frequency   Tally marks
10-20    2           ||
20-30    5           |||||
30-40    5           |||||
40-50    4           ||||
Classes: 10-20, 20-30, 30-40, 40-50
Class limits (for the class 40-50):
Lower limit = 40
Upper limit = 50
Class mark = ½ (lower + upper)
= ½ (40 + 50)
= 0.5 × 90 = 45
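The class mark and class interval formulas can be checked with a short Python sketch. The function names `class_mark` and `class_interval` are illustrative choices, not part of the slides; the classes are the four from the table above.

```python
def class_mark(lower, upper):
    """Middle value of a class: half the sum of its two limits."""
    return (lower + upper) / 2


def class_interval(lower, upper):
    """Width of a class: upper limit minus lower limit."""
    return upper - lower


# Classes from the marks table (lower limit, upper limit)
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]

marks = [class_mark(lo, hi) for lo, hi in classes]
widths = [class_interval(lo, hi) for lo, hi in classes]

print(marks)   # [15.0, 25.0, 35.0, 45.0]
print(widths)  # [10, 10, 10, 10]
```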
13. Frequency distribution types
1. Discrete or ungrouped frequency distribution: the
data are not arranged in groups; they form an individual series
arranged in ascending order.
There is no continuity from one class to another
The number of times a particular value is repeated is called the frequency of
that class
The exact measurement of each unit is clearly stated
There is a definite difference between the values of the variable in different groups of
items
14. Example:
From the following data, make an ungrouped frequency distribution.
11, 12, 5, 3, 11, 13, 17, 13, 5, 5, 11, 5
x     Frequency   Tally marks
3     1           |
5     4           ||||
11    3           |||
12    1           |
13    2           ||
17    1           |
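The same ungrouped table can be produced programmatically. This is a minimal sketch using `collections.Counter` from the Python standard library; the variable names are illustrative.

```python
from collections import Counter

# Raw observations from the example
data = [11, 12, 5, 3, 11, 13, 17, 13, 5, 5, 11, 5]

# Count how many times each distinct value occurs
frequency = Counter(data)

# Print the table in ascending order of x, with one stroke per observation
print("x   Frequency  Tally")
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value:<3} {count:<10} {'|' * count}")
```

The frequencies add up to 12, the number of observations, which is a quick check that no value was missed.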
15. 2. Grouped frequency distribution: the data are arranged in classes, which form the
frequency distribution table.
Example:
From the following data, construct a grouped frequency
distribution.
3, 8, 5, 2, 15, 16, 13, 12, 10, 19, 18, 11
The class intervals theoretically continue from the beginning of the
frequency distribution to the end without a break
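One possible grouping of this data is sketched below. The slide does not state the classes, so the class width of 5 and the starting limit of 0 are assumptions made here for illustration, and an observation equal to an upper limit is counted in the next class.

```python
data = [3, 8, 5, 2, 15, 16, 13, 12, 10, 19, 18, 11]

WIDTH = 5  # assumed class width (not given on the slide)
START = 0  # assumed lower limit of the first class (not given on the slide)


def grouped_frequency(values, start, width):
    """Count values in classes [lower, upper); assumes every value >= start."""
    n_classes = (max(values) - start) // width + 1
    table = {
        (start + k * width, start + (k + 1) * width): 0
        for k in range(n_classes)
    }
    for v in values:
        lower = start + ((v - start) // width) * width
        table[(lower, lower + width)] += 1
    return table


table = grouped_frequency(data, START, WIDTH)
for (lower, upper), count in table.items():
    print(f"{lower}-{upper}: {count}")
```

With these assumed classes the counts are 2, 2, 4 and 4 for 0-5, 5-10, 10-15 and 15-20. A different width or starting point would give a different but equally valid table.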
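17. • Types of class intervals
• Exclusive method: the upper limit of one class is the lower limit of
the next class (e.g. 10-20, 20-30); an observation equal to the upper limit is counted in the next class
• Inclusive method:
• Overlapping is avoided; both the upper and lower limits are included
in the class interval (e.g. 10-19, 20-29)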
18. • Open-end classes: the lower limit of the first
class interval, the upper limit of the last class interval, or both are
not specified
• This arises in a number of practical situations (e.g. economic and medical
data) when there are a few very high or very low values that are
far apart from the majority of observations
19. Range: the difference between the largest and the smallest value, denoted by
R
R = Largest value - Smallest value
R = L - S
Mid value: the central point of a class interval is called the mid value
or midpoint
It is calculated by adding the upper and lower limits of a class and
dividing by 2
Mid value = (L + U) / 2, where L and U are here the lower and upper limits of the class
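As a quick illustration of the range, the sketch below applies R = L - S to the twelve observations from the grouped frequency example on slide 15. The mid value of a class is shown for the class 20-30 as an arbitrary example.

```python
data = [3, 8, 5, 2, 15, 16, 13, 12, 10, 19, 18, 11]

largest = max(data)    # L in R = L - S
smallest = min(data)   # S in R = L - S
data_range = largest - smallest
print(largest, smallest, data_range)  # 19 2 17

# Mid value of one class: (lower limit + upper limit) / 2
lower_limit, upper_limit = 20, 30
mid_value = (lower_limit + upper_limit) / 2
print(mid_value)  # 25.0
```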
21. Number of class intervals: there should not be too many
• For an ideal frequency distribution, the number of class intervals can vary
from 5 to 15
• The difference between the lower and upper limits of the data (the range) helps to fix the number of class
intervals
Sturges' rule
• K = 1 + 3.322 log10(N)
• N = total number of observations
• K = number of class intervals
• If the number of observations is N = 10, then
• K = 1 + 3.322 log10(10) = 1 + 3.322 × 1 = 4.322 ≈ 4
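Sturges' rule is easy to evaluate for any N. The sketch below assumes the result is rounded to the nearest whole number, as in the worked example for N = 10; the function name `sturges_classes` is illustrative.

```python
import math


def sturges_classes(n_observations):
    """K = 1 + 3.322 * log10(N), returned before rounding."""
    return 1 + 3.322 * math.log10(n_observations)


k = sturges_classes(10)
print(round(k, 3), round(k))  # 4.322 4

# The rule grows slowly with N: larger samples need only a few more classes
for n in (10, 100, 1000):
    print(n, round(sturges_classes(n)))
```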
23. • Cumulative frequency distribution: shows the number of data items
with values less than or equal to the upper class limit of each class
• Cumulative relative frequency distribution: shows the proportion of
data items with values less than or equal to the upper class limit of each class
• Cumulative percentage frequency distribution: shows the percentage
of data items with values less than or equal to the upper class limit of
each class
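The three cumulative distributions can be derived from one column of class frequencies. The sketch below reuses the marks table from slide 12 (frequencies 2, 5, 5, 4) and `itertools.accumulate` from the standard library; the variable names are illustrative.

```python
from itertools import accumulate

classes = ["10-20", "20-30", "30-40", "40-50"]
frequencies = [2, 5, 5, 4]

# Running total of the class frequencies
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# Proportion and percentage of items at or below each upper class limit
relative = [cf / total for cf in cumulative]
percentage = [100 * r for r in relative]

for row in zip(classes, frequencies, cumulative, relative, percentage):
    print(*row)
```

The last cumulative frequency always equals the total number of observations, so the last relative and percentage values are 1 and 100.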
25. Measures of central tendency
• Also known as measures of central value or measures of location.
• A measure of central tendency is a statistical measure that locates the
central point of a distribution, so that a single value describes the whole set of
data
• Averages are values that lie between the smallest and the largest
observations
• Averages are also known as measures of central tendency
26. Importance of central tendency
• To find a representative value: it gives one value for the distribution,
and that value represents the entire distribution
• To condense data
• To make comparisons between two or more distributions
• Helpful in further statistical analysis
• Used in calculating other statistical measures such as dispersion (statistical
dispersion is the extent to which numerical data are likely to vary
about an average value)
27. Properties of a good measure of central tendency
• It should be rigidly defined
• It should be easy to understand and calculate
• It should remain unaffected by extreme values
• It should be capable of being used in further statistical computation
• It should be based on all items in the series
28. Various measures of central tendency are
• Arithmetic mean
• Median
• Mode
• Geometric mean
• Harmonic mean
29. • Geometric mean (GM): defined as the nth root of the product of n
numbers,
• where n is the total number of data values.
GM = (x1 × x2 × ... × xn)^(1/n)
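A minimal sketch of the geometric mean as defined above, the nth root of the product of n values. The two input values are hypothetical, chosen so that the result is easy to verify by hand (the product is 16 and its square root is 4). The helper is compared with `statistics.geometric_mean` from the standard library (Python 3.8 or later).

```python
import math
import statistics


def geometric_mean(values):
    """nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))


values = [2, 8]  # hypothetical data: product 16, square root 4

gm = geometric_mean(values)
gm_stdlib = statistics.geometric_mean(values)
print(gm, gm_stdlib)
```

For long series the direct product can overflow or underflow, which is why library routines usually work with logarithms instead.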
31. • Harmonic mean (HM): defined as the reciprocal of the average of
the reciprocals of the data values.
• It is based on all the observations, and it is rigidly defined.
• The harmonic mean gives less weight to large values and more
weight to small values.
• In general, the harmonic mean is used when there is a need to
give greater weight to the smaller items.
• It is applied to averages of times and rates.
32. • The different central values are classified as given below
Mathematical averages
All the values of the items in the series are used in
computing the average: arithmetic mean, geometric mean, harmonic mean
Positional averages: the average depends on the position of the items rather
than on the values of the items: median, mode, percentiles
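The three mathematical averages can be compared on the same values. The sketch below uses two hypothetical speeds (40 and 60 km/h over equal distances), a typical average-rate case where the harmonic mean is the appropriate choice; the helper name `harmonic_mean` is illustrative.

```python
import math
import statistics


def harmonic_mean(values):
    """Reciprocal of the average of the reciprocals of the values."""
    return len(values) / sum(1 / v for v in values)


speeds = [40, 60]  # hypothetical speeds in km/h over two equal distances

am = statistics.mean(speeds)            # arithmetic mean
gm = math.sqrt(speeds[0] * speeds[1])   # geometric mean of two values
hm = harmonic_mean(speeds)              # harmonic mean

print(am, round(gm, 2), round(hm, 2))
```

For positive values that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean. This reflects the extra weight the harmonic mean gives to small values.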
34. Applications of the arithmetic mean (AM)
o The standard deviation and variance are calculated from the mean.
o Correlation and regression analysis use the mean.
o In bioequivalence studies, the mean (e.g. of AUC and Cmax) and the residual error are
determined.
o Material attributes (e.g. particle size) and product properties are expressed as means,
e.g. mean dissolution, mean product weight, mean disintegration time, mean
content uniformity, mean assay, mean potency, etc.
35. Merits of mean:
It considers all observations
It can be used for comparisons
It is simple to calculate and understand
It can be used in algebraic calculations
It needs no sorting or arrangement of the data (ascending or descending order)
It is stable and little affected by sampling fluctuations
36. Limitations of Arithmetic Mean
• The arithmetic mean is:
1) Very much affected by extreme values.
2) Not determined by inspection; computation is essential.
3) Not suitable for evaluating qualitative (non-numerical) data.
4) Not an appropriate measure for a skewed distribution.
5) Not applicable to nominal or categorical data (e.g. stages of cancer); the results do
not give meaningful conclusions.
37. Characteristics of Arithmetic Mean
• A good average has the following characteristics; the arithmetic mean meets them except that it is affected by extreme values:
1) It leaves no scope for different interpretations.
2) It is not affected by extreme values or fluctuations.
3) It possesses sampling stability.
4) It can be used for statistical comparison.
5) It is easy to calculate and understand.
38. Methods of Calculation of Mean
1. Calculation of arithmetic mean: individual series
2. Calculation of arithmetic mean: discrete series (ungrouped data)
3. Calculation of arithmetic mean: continuous series (grouped data)
43. I. Calculation of Mean - Individual series of data
Prob) The hardness of 6 tablets is measured (kg/cm²) and given below.
Hardness, kg/cm²: 5.2, 4.8, 5.4, 5.0, 4.6, 5.2
Sol) Sum of observations: Σx = 5.2 + 4.8 + 5.4 + 5.0 + 4.6 + 5.2 = 30.2 kg/cm²
Number of observations: n = 6
Mean = Σx / n = 30.2 / 6 ≈ 5.03 kg/cm²
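The individual-series calculation can be reproduced in a few lines. This sketch uses the six hardness values from the problem and divides their sum by the number of observations; the variable names are illustrative.

```python
# Hardness of six tablets in kg/cm2, taken from the problem statement
hardness = [5.2, 4.8, 5.4, 5.0, 4.6, 5.2]

total = sum(hardness)       # sum of observations
n = len(hardness)           # number of observations
mean_hardness = total / n   # arithmetic mean

print(round(total, 1), n, round(mean_hardness, 2))  # 30.2 6 5.03
```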
47. Prob) Tablets (samples) are taken from a batch and weighed. The
weights are close to one another and are reported with their frequencies.
Calculate the mean weight of the tablets for the following data.
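The data table for this problem did not survive in the transcript, so the sketch below shows the discrete-series method on a stand-in: the ungrouped frequency table from slide 14. The mean is the sum of each value multiplied by its frequency, divided by the total frequency. The values are not tablet weights and serve only to show the arithmetic.

```python
# Stand-in data: values and frequencies from the slide 14 example
values = [3, 5, 11, 12, 13, 17]
frequencies = [1, 4, 3, 1, 2, 1]

# Sum of f * x over all distinct values
sum_fx = sum(f * x for x, f in zip(values, frequencies))

# Total number of observations
n = sum(frequencies)

mean = sum_fx / n
print(sum_fx, n, mean)  # 111 12 9.25
```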
51. • Prob) The particle sizes (in a powder) are measured using the
microscopic method. The experimental data are reported in the table
given below. Find the mean particle size using the direct method.
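The table for this problem is also missing from the transcript. As a stand-in, the sketch below applies the direct method to the particle size distribution given on slide 69: each class is represented by its mid value m, and the mean is the sum of f × m divided by the sum of f. Treating that table as the input here is an assumption made only for illustration.

```python
# Stand-in data: size classes (lower, upper) in micrometres and their
# frequencies, taken from the slide 69 table
classes = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
frequencies = [3, 5, 20, 10, 5]

# Mid value of each class
mid_values = [(lower + upper) / 2 for lower, upper in classes]

# Direct method: mean = sum(f * m) / sum(f)
sum_fm = sum(f * m for f, m in zip(frequencies, mid_values))
n = sum(frequencies)
mean_size = sum_fm / n

print(mid_values)
print(sum_fm, n, round(mean_size, 2))
```

The direct method assumes the observations in each class are centred on the mid value, so the result is an approximation of the mean of the raw data.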
54. Merits of median
• Easily defined and understood
• Can be evaluated by graphical methods
• Useful for open-end classes
• Applicable to skewed (unequal) distributions
Demerits
• Unsuitable when large and small items in a series need to be given weight
• Not based on all of the observations (positional average)
• With an even number of observations there is no single middle value, so the median has to be estimated from the two middle values
• Affected by sampling fluctuations more than the mean
55. Applications
The median can be used to understand the features of a data set when
Observations are qualitative in nature
Extreme values are present in the data set
A fast estimate of an average is needed
56. • Methods of Calculation of Median
1. Calculation of median- Individual series
2. Calculation of median- Discrete series (ungrouped data)
3. Calculation of median- Continuous series (grouped data)
60. When n is an even number: there is no single central observation, so the
median is estimated as the mean of the two middle values. The disintegration
times (in seconds) of six tablets are given below.
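The disintegration times for this problem are not present in the transcript. To show the even-n rule, the sketch below uses the six hardness values from slide 43 as a stand-in. It sorts the data, takes the two middle values and averages them, and checks the result against `statistics.median` from the standard library.

```python
import statistics


def median(values):
    """Middle value for odd n; mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Stand-in data (n = 6, even): tablet hardness values from slide 43
observations = [5.2, 4.8, 5.4, 5.0, 4.6, 5.2]

print(sorted(observations))
md = median(observations)
print(round(md, 2), round(statistics.median(observations), 2))
```

Here the sorted series is 4.6, 4.8, 5.0, 5.2, 5.2, 5.4, so the two middle values are 5.0 and 5.2 and the median is their mean.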
63. Illustration: the median Cmax value is calculated from the Cmax data
obtained in bioequivalence studies of a drug formulation (given below).
The given data are arranged in order and the cumulative frequency is obtained.
64. In the above table, the 22nd term first appears in the row having the value of 135
µg/mL. The median Cmax = 135 µg/mL.
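The Cmax table is missing from the transcript, so the discrete-series method is sketched on the ungrouped frequency table from slide 14 instead. The approach assumed here is the one implied by the Cmax illustration: compute the position (n + 1) / 2, then pick the first value whose cumulative frequency reaches that position.

```python
from itertools import accumulate

# Stand-in data: values (already in ascending order) and frequencies
# from the slide 14 example
values = [3, 5, 11, 12, 13, 17]
frequencies = [1, 4, 3, 1, 2, 1]

n = sum(frequencies)
position = (n + 1) / 2  # position of the median term

# Cumulative frequency for each value
cumulative = list(accumulate(frequencies))

# First value whose cumulative frequency reaches the median position
median_value = next(
    v for v, cf in zip(values, cumulative) if cf >= position
)

print(cumulative, position, median_value)
```

If the median position falls exactly between two different values, this simple lookup returns the upper one; averaging the two neighbours would be the stricter treatment.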
68. Illustration: the particle size distribution data of tablets (in a sample) are considered, along with
the number of particles (frequency). The median particle size is calculated as follows.
The size range is a continuous distribution and the class interval is uniform. The given data are
arranged to obtain the cumulative frequency.
69. Size range x (µm)      Frequency (f)   Cumulative frequency (cf)   Observation
20-30                  3               3
30-40                  5               8                           Cumulative frequency of the class preceding the median class (c)
40-50 (median class)   20 (f)          28                          Cumulative frequency of the median class
50-60                  10              38
60-70                  5               43
Σf = 43, or n = 43
n/2 = 43/2 = 21.5
70. • From the table, the median size lies in the class 40-50 µm (not
necessarily at its middle point, because the preceding cumulative frequency is also
considered in the computation). The exact median is estimated using the following equation:
Md = L + ((n/2 - c.f) / f) × i
• Data: L = 40 µm; n = 43; c.f = 8; f = 20; i = 10 µm;
Md = 40 + ((21.5 - 8) / 20) × 10 = 40 + 6.75 = 46.75 µm
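The grouped-data median can be computed with the standard interpolation formula Md = L + ((n/2 - c.f) / f) × i, where L is the lower limit of the median class, c.f the cumulative frequency of the preceding class, f the frequency of the median class and i the class width. The equation itself is not shown in the transcript, so its use here is an assumption based on the symbols listed in the data line. The function name `grouped_median` is illustrative.

```python
from itertools import accumulate

# Size classes (lower, upper) in micrometres and their frequencies
classes = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
frequencies = [3, 5, 20, 10, 5]


def grouped_median(classes, frequencies):
    """Median of grouped data by interpolation within the median class."""
    n = sum(frequencies)
    cumulative = list(accumulate(frequencies))
    # Median class: first class whose cumulative frequency reaches n/2
    idx = next(k for k, cf in enumerate(cumulative) if cf >= n / 2)
    lower, upper = classes[idx]
    cf_before = cumulative[idx - 1] if idx > 0 else 0
    width = upper - lower
    return lower + (n / 2 - cf_before) / frequencies[idx] * width


md = grouped_median(classes, frequencies)
print(round(md, 2))  # 46.75
```

With L = 40, n/2 = 21.5, c.f = 8, f = 20 and i = 10, the interpolation adds 6.75 µm to the lower limit of the median class.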
71. • Mode: defined as the observation that occurs most frequently in the data
• Can be used with nominal scales
Merits
• Simple to find
• Applicable to open-end distributions
• Can be identified by merely examining the data, and its computation is easy
• Only moderately affected by extreme items
• A good representative of the data, as it is the value with the highest frequency
Demerits
• In a bimodal distribution, a single mode value cannot be determined
• Based on only a few observations
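The mode can be found with `statistics.multimode` from the standard library (Python 3.8 or later), which returns every value that shares the highest frequency. The first data set is the one from slide 14; the second is a hypothetical bimodal series added to show why a single mode cannot be reported in that case.

```python
from statistics import multimode

# Observations from the slide 14 example: 5 occurs four times
data = [11, 12, 5, 3, 11, 13, 17, 13, 5, 5, 11, 5]
modes = multimode(data)
print(modes)  # [5]

# Hypothetical bimodal series: 1 and 2 both occur twice
bimodal = [1, 1, 2, 2, 3]
bimodal_modes = multimode(bimodal)
print(bimodal_modes)  # [1, 2]
```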