STATISTICAL VALIDATION METHOD
VIZ. STATISTICAL TREATMENT OF
FINITE SAMPLE
By:
Sachin kumar
M.Pharm. (Pharmacology)
Deptt. of Pharma. Sciences
M.D.U. Rohtak, 124001
CONTENTS
• INTRODUCTION
• ANALYSIS OF DATA
1. MEASURES OF CENTRAL TENDENCY
2. MEASURES OF DISPERSION
3. SKEWNESS
4. CORRELATION
5. REGRESSION
• TEST OF SIGNIFICANCE
1. T-TEST
2. F-TEST
3. ANOVA
• REFERENCES
INTRODUCTION
• Biostatistics :- It is defined as the application of
statistical methods to data derived from the
biological sciences.
• Statistics :- It is the collection of methods used in
planning an experiment and analyzing data in order
to draw accurate conclusions.
- It includes collection, organization,
presentation, analysis and interpretation of
numerical data.
• Data :- Facts or figures from which conclusions can
be drawn.
- It may be qualitative or quantitative.
ANALYSIS OF DATA
• Analysis can be done through different
statistical techniques:-
1. Measures of central tendency
2. Measures of dispersion
3. Skewness
4. Correlation
5. Regression
1. MEASURES OF CENTRAL
TENDENCY
• The observations in a set of data exhibit a tendency
to cluster around a specific value. This
characteristic of data is central tendency.
• The value around which individual observations
are clustered is called the central value.
• There are three main measures of central tendency:
1. Mean
2. Median
3. Mode
• MEAN- It is the mathematical average, denoted
by x̅.
(a) Arithmetic mean (simple mean)-
for ungrouped data- x̅ = ∑xᵢ / n
for grouped data- x̅ = ∑fᵢxᵢ / ∑fᵢ
(b) Geometric mean- G.M. = (x₁ × x₂ × … × xₙ)^(1/n)
(c) Harmonic mean- H.M. = n / ∑(1/xᵢ)
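A minimal Python sketch of the three means, using only the standard-library statistics module. The observations, class mid-points and frequencies are hypothetical values chosen for illustration.

```python
import statistics

values = [2.0, 4.0, 8.0]  # hypothetical ungrouped observations

arithmetic = statistics.mean(values)            # sum of values / n
geometric = statistics.geometric_mean(values)   # nth root of the product
harmonic = statistics.harmonic_mean(values)     # n / sum of reciprocals

# Grouped data: class mid-points with their frequencies (hypothetical)
midpoints = [5, 15, 25]
freqs = [2, 5, 3]
grouped_mean = sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)

print(arithmetic, geometric, harmonic, grouped_mean)
```

For positive values that are not all equal, the harmonic mean is smaller than the geometric mean, which is smaller than the arithmetic mean.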
• MEDIAN- The central value when the observations are
arranged in ascending or descending order. Denoted by ‘M’.
for ungrouped data-
n is odd → M = [(n+1)/2]th value
n is even →
M = [ (n/2)th value + (n/2 +1)th value ] /2
for grouped data-
M = L + [ (n/2 - F)/f ] × c
L= lower limit of median class
n= total number of observations
F= cumulative frequency of the class preceding
the median class
f= frequency of the median class
c= width of the median class
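A short Python sketch of the median for ungrouped and grouped data. The function name grouped_median and the frequency table are hypothetical; the code assumes equal class widths and takes the running (cumulative) frequency of the classes before the median class as F.

```python
import statistics

def grouped_median(lower_limits, freqs, width):
    """Median of grouped data: L + ((n/2 - F) / f) * c."""
    n = sum(freqs)
    cumulative = 0  # cumulative frequency of the classes seen so far (F)
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= n / 2:  # this class contains the (n/2)th value
            return lower + ((n / 2 - cumulative) / f) * width
        cumulative += f
    raise ValueError("empty frequency table")

# Ungrouped data: the statistics module handles odd and even n
odd_median = statistics.median([3, 1, 2])
even_median = statistics.median([1, 2, 3, 4])

# Hypothetical classes 0-10, 10-20, 20-30, 30-40
median_value = grouped_median([0, 10, 20, 30], [5, 8, 12, 5], 10)
print(odd_median, even_median, median_value)
```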
• MODE- The most commonly occurring value.
-for ungrouped data-
the value which occurs the maximum no. of times
-for grouped data-
mode = L + [ (f₁-f₀) / (2f₁-f₀-f₂) ] × c
L= lower limit of mode class
f₀= frequency of class preceding the m.c.
f₁= frequency of mode class
f₂= frequency of class succeeding the m.c.
c= width of mode class
mode class (m.c.)= class which has the maximum frequency
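A minimal Python sketch of the mode. The helper grouped_mode and its data are hypothetical. To avoid any ambiguity in the subscripts, the code names the three frequencies explicitly: f_modal is the frequency of the mode class, f_prev of the class before it, and f_next of the class after it.

```python
import statistics

def grouped_mode(lower_limits, freqs, width):
    """Mode of grouped data by interpolation inside the mode class."""
    i = max(range(len(freqs)), key=freqs.__getitem__)  # index of the mode class
    f_modal = freqs[i]
    f_prev = freqs[i - 1] if i > 0 else 0               # class before the mode class
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0  # class after the mode class
    return lower_limits[i] + (f_modal - f_prev) / (2 * f_modal - f_prev - f_next) * width

# Ungrouped data: the value that occurs most often
ungrouped_mode = statistics.mode([1, 2, 2, 3])

# Hypothetical classes 0-10, 10-20, 20-30, 30-40
mode_value = grouped_mode([0, 10, 20, 30], [5, 8, 12, 5], 10)
print(ungrouped_mode, mode_value)
```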
2. MEASURES OF DISPERSION
• For comparing two data sets, we require a
measure of dispersion. Dispersion indicates the
extent to which a distribution is stretched or squeezed.
• There are five main measures of dispersion:
- Range
- Interquartile range
- Mean deviation
- standard deviation
- variance
• RANGE- It is the simplest measure of dispersion.
Range= L-S
L= largest observation
S= smallest observation
• INTERQUARTILE RANGE- The range is unstable: it varies
from one sample to another and can change when a
new observation is added. So we calculate the interquartile range (I.R.).
I.R.= Q₃-Q₁
Q₁ = first quartile
Q₂= second quartile
Q₃= third quartile
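A small Python sketch of the range and the interquartile range, with hypothetical data. It relies on statistics.quantiles with its default method; other quartile conventions exist and can give slightly different values for small samples.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical observations

data_range = max(data) - min(data)            # Range = L - S

q1, q2, q3 = statistics.quantiles(data, n=4)  # the three quartiles
iqr = q3 - q1                                 # I.R. = Q3 - Q1

print(data_range, q1, q2, q3, iqr)
```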
• MEAN DEVIATION- The average absolute deviation
from the central value of a data set is called mean
deviation.
- For ungrouped data-
M.D. about mean = (∑|xᵢ-x̅|) /n
M.D. about median = (∑|xᵢ-M|) /n
M.D. about mode = (∑|xᵢ-Z|) /n
- For grouped data-
M.D. about mean = (∑fᵢ|xᵢ-x̅|) / ∑fᵢ
M.D. about median = (∑fᵢ|xᵢ-M|) / ∑fᵢ
M.D. about mode = (∑fᵢ|xᵢ-Z|) / ∑fᵢ
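A minimal Python sketch of the mean deviation about any chosen central value. The function names and data are hypothetical; for grouped data the class mid-points stand in for xᵢ.

```python
import statistics

def mean_deviation(values, centre):
    """Average absolute deviation of raw observations from a central value."""
    return sum(abs(x - centre) for x in values) / len(values)

def grouped_mean_deviation(midpoints, freqs, centre):
    """Frequency-weighted average absolute deviation for grouped data."""
    return sum(f * abs(x - centre) for x, f in zip(midpoints, freqs)) / sum(freqs)

scores = [2, 4, 4, 6, 9]  # hypothetical observations
md_mean = mean_deviation(scores, statistics.mean(scores))
md_median = mean_deviation(scores, statistics.median(scores))
md_mode = mean_deviation(scores, statistics.mode(scores))

# Hypothetical grouped data whose mean is 16
md_grouped = grouped_mean_deviation([5, 15, 25], [2, 5, 3], 16)
print(md_mean, md_median, md_mode, md_grouped)
```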
• STANDARD DEVIATION- It tells us how much the
scores deviate from the mean. Denoted by sigma (σ)
or S.
For a sample, S = √[ ∑(xᵢ-x̅)² / (n-1) ]
Standard error of mean (SEM) = S/√n
• VARIANCE- It tells us how far a set of numbers
is spread out from their mean.
-Variance is the square of the standard
deviation (S²).
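A short Python sketch of the sample standard deviation, the variance and the standard error of the mean. The scores are hypothetical. It assumes the sample (n-1) form of the standard deviation, takes the variance as the square of the standard deviation, and takes SEM as S divided by the square root of n.

```python
import math
import statistics

scores = [2, 4, 4, 6, 9]  # hypothetical observations

s = statistics.stdev(scores)            # sample standard deviation (n-1 divisor)
variance = statistics.variance(scores)  # sample variance, equal to s squared
sem = s / math.sqrt(len(scores))        # standard error of the mean

print(s, variance, sem)
```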
3. SKEWNESS
• It is the measure of degree of asymmetry of the
distribution.
(a) Symmetric- Mean, median, mode are the
same.
(b) Skewed left- Mean to the left of the median,
long tail on the left.
(c) Skewed right- Mean to the right of the
median, long tail on the right.
• Coefficient of Skewness = (mean-mode)/ S.D
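A minimal Python sketch of the coefficient of skewness, (mean - mode) / S.D. The function name and the three data sets are hypothetical, and the sample standard deviation is assumed.

```python
import statistics

def skewness_coefficient(values):
    """Coefficient of skewness = (mean - mode) / standard deviation."""
    return (statistics.mean(values) - statistics.mode(values)) / statistics.stdev(values)

right_skewed = skewness_coefficient([1, 2, 2, 2, 3, 10])  # long tail of large values
left_skewed = skewness_coefficient([1, 8, 9, 9, 9, 10])   # long tail of small values
symmetric = skewness_coefficient([1, 2, 2, 2, 3])         # mean equals mode

print(right_skewed, left_skewed, symmetric)
```

A positive coefficient indicates a distribution skewed to the right, a negative one a distribution skewed to the left, and zero a symmetric distribution.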
4. CORRELATION
• In correlation we study the degree of relationship
between two variables.
- Types of correlation:
(a) positive or negative correlation
(b) simple or multiple correlation
• Correlation coefficient- It is a measure of
correlation. Denoted by ‘r’; it lies between -1 and +1.
when r= 1 (perfect +ve correlation)
when r= -1 (perfect -ve correlation)
when r= 0 (no linear correlation)
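A small Python sketch of the correlation coefficient. The slides do not state which formula is meant, so the Pearson product-moment coefficient is assumed here; the function name and data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r_positive = pearson_r([1, 2, 3], [2, 4, 6])  # y rises exactly with x
r_negative = pearson_r([1, 2, 3], [6, 4, 2])  # y falls exactly as x rises
r_none = pearson_r([1, 2, 3], [1, 2, 1])      # no linear trend

print(r_positive, r_negative, r_none)
```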
5. REGRESSION
• It is the functional relationship between two
variables.
-We take the variable whose values are known as the
independent variable and the variable whose
values are to be predicted as the dependent
variable.
Line of regression of Y on X- It is used for
estimation of the variable Y for a given value of
the variable X.
X= Independent variable
Y= Dependent variable
Line of regression of X on Y- It is used for
estimation of the variable X for a given value of
the variable Y.
Y= Independent variable
X= Dependent variable
Regression coefficient- It is the measure of regression
(the slope of the regression line). Denoted by ‘b’.
bxy (X on Y) = ( n∑xy - ∑x∑y ) / [ n∑y² - (∑y)² ]
byx (Y on X) = ( n∑xy - ∑x∑y ) / [ n∑x² - (∑x)² ]
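A minimal Python sketch of the two regression coefficients, computed from the raw sums. The function name and the paired data are hypothetical; each denominator is read as the whole expression n∑x² - (∑x)² or n∑y² - (∑y)².

```python
def regression_coefficients(xs, ys):
    """Return (byx, bxy): the slopes of Y on X and of X on Y."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    byx = numerator / (n * sum_xx - sum_x ** 2)  # Y on X
    bxy = numerator / (n * sum_yy - sum_y ** 2)  # X on Y
    return byx, bxy

xs = [1, 2, 3, 4]  # hypothetical independent variable
ys = [2, 4, 5, 7]  # hypothetical dependent variable
byx, bxy = regression_coefficients(xs, ys)
print(byx, bxy)
```

The line of regression of Y on X then passes through the point of means with slope byx, so an estimate of Y at a given X is y̅ + byx (X - x̅).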
TEST OF SIGNIFICANCE
• It is the formal procedure for comparing
observed data with a claim (also called a
hypothesis) whose truth we want to assess.
1. T-TEST
• Two types of t-test
(a) Unpaired t-test
(b) Paired t-test
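• UNPAIRED T-TEST- Used when there is no link between the
two sets of data, i.e. the data are independent.
- Testing the significance of a single mean-
t = (x̅ - μ) / (S/√n), where μ is the hypothesized mean
Degrees of freedom = n - 1
- Testing the significance of the difference between
two means-
t = (x̅₁ - x̅₂) / [ S √(1/n₁ + 1/n₂) ], where S is the pooled standard deviation
- Degrees of freedom = n₁ + n₂ - 2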
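A short Python sketch of the two t statistics. The function names and data are hypothetical. The two-sample version assumes equal population variances and therefore pools the two sample variances, which is the form that goes with n₁ + n₂ - 2 degrees of freedom.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """t for a single mean against a hypothesized mean mu; returns (t, d.f.)."""
    n = len(sample)
    t = (statistics.mean(sample) - mu) / (statistics.stdev(sample) / math.sqrt(n))
    return t, n - 1

def unpaired_t(a, b):
    """Pooled-variance t for two independent samples; returns (t, d.f.)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

t_single, df_single = one_sample_t([2, 4, 4, 6, 9], 4)
t_two, df_two = unpaired_t([1, 2, 5], [2, 3, 4])
print(t_single, df_single, t_two, df_two)
```

The calculated t is then compared with the tabulated t for the same degrees of freedom at the chosen significance level.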
• PAIRED T-TEST- Used when the two samples are
dependent. Two samples are said to be
dependent when each observation in one sample is
related to an observation in the other.
- When the samples are dependent, they have equal
sample sizes.
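A minimal Python sketch of the paired t-test, assuming the usual approach of working on the differences within each pair. The function name and the before/after readings are hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic on the within-pair differences; returns (t, d.f.)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal sizes")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

before = [10, 12, 9, 11]  # hypothetical readings before treatment
after = [12, 15, 10, 12]  # hypothetical readings on the same subjects after treatment
t_paired, df_paired = paired_t(before, after)
print(t_paired, df_paired)
```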
2. F-TEST
• Used to compare the precision of two sets of data by taking the ratio of their variances.
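A small Python sketch of the F ratio for two data sets. The function name and data are hypothetical; placing the larger variance in the numerator, so that F is at least 1, is a common convention and is assumed here.

```python
import statistics

def f_ratio(a, b):
    """Variance ratio with the larger variance on top.

    Returns (F, numerator d.f., denominator d.f.).
    """
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    if var_a >= var_b:
        return var_a / var_b, len(a) - 1, len(b) - 1
    return var_b / var_a, len(b) - 1, len(a) - 1

f_value, df_num, df_den = f_ratio([1, 2, 5], [2, 3, 4])
print(f_value, df_num, df_den)
```

The calculated F is compared with the tabulated F for the two degrees of freedom; a smaller calculated value indicates no significant difference in precision.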
3. ANOVA
• Developed by Sir Ronald A. Fisher in the 1920s.
• A statistical technique specially designed to
test whether the means of more than two
quantitative populations are equal.
• Types of ANOVA
(a) One way ANOVA
(b) Two way ANOVA
-ONE WAY ANOVA- There is only one factor or
independent variable.
-TWO WAY ANOVA- There are two independent
variables.
ONE WAY ANOVA
• Suppose we have three different groups:
Group-A   Group-B   Group-C
1         2         2
2         4         3
5         2         4
• There are 5 steps:
1. Hypothesis- Two hypotheses.
Null hypothesis H₀ = All means are equal.
Alternate hypothesis H₁ = At least one difference
among the means.
2. Calculate degrees of freedom (d.f.)-
Between the groups = k-1
k= No. of levels (groups)
3-1= 2
Within the groups = N-k
N= total no. of observations
9-3= 6
Total d.f.= N-1 = 8
F-critical value (d.f. 2 and 6, α = 0.05) = 5.14
3. Sum of squared deviations from mean-
Calculate means - X̅ᴀ= 2.67
X̅ʙ= 2.67
X̅ᴄ= 3.00
Grand mean X̅= sum of all observations / total no.
of observations
25/9= 2.78
- Total sum of squares= ∑(X-X̅)²
= 13.56
- Sum of squares within the groups=
∑(Xᴀ-X̅ᴀ)² + ∑(Xʙ-X̅ʙ)² + ∑(Xᴄ-X̅ᴄ)² = 8.67 + 2.67 + 2.00 = 13.33
- Sum of squares between the groups =
total S.S. - S.S. within the groups
= 0.22 (using unrounded values)
4. Calculate variance (mean square)-
between the groups= S.S. between groups / d.f.
between groups
- 0.22/2 = 0.11
within the groups= S.S. within groups / d.f.
within groups
- 13.33/6= 2.22
5. F-value-
variance between the groups / variance
within the groups
0.11/2.22= 0.05
RESULT- 0.05 < 5.14
We fail to reject the null hypothesis. Hence
there is no significant difference between these
three groups.
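The five steps can be checked with a short Python sketch. The function name one_way_anova and the dictionary keys are hypothetical; the data are the three groups of the worked example, and the between-group sum of squares is obtained as total minus within, as in step 3.

```python
def one_way_anova(groups):
    """Sums of squares, mean squares and F for a one-way layout."""
    all_obs = [x for g in groups for x in g]
    n_total, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n_total

    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ss_between = ss_total - ss_within

    ms_between = ss_between / (k - 1)      # d.f. between = k - 1
    ms_within = ss_within / (n_total - k)  # d.f. within = N - k
    return {
        "ss_total": ss_total,
        "ss_within": ss_within,
        "ss_between": ss_between,
        "F": ms_between / ms_within,
    }

# Group A, Group B and Group C from the example
result = one_way_anova([[1, 2, 5], [2, 4, 2], [2, 3, 4]])
for name, value in result.items():
    print(name, round(value, 3))
```

For these three groups the computed F lies far below the critical value of 5.14, so the null hypothesis is not rejected.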
REFERENCES
Mendham J, Denny RC, Barnes JD, Thomas M,
Sivasanker B. Vogel’s textbook of quantitative
chemical analysis. 6th ed. Delhi: Pearson Education
Ltd; 2000. p. 110-133.
Patel GC, Jani GK. Basic biostatistics for pharmacy.
2nd ed. Ahmedabad: Atul Prakashan; 2007-2008.
Manikandan S. Measures of central tendency:
Median and mode. J Pharmacol Pharmacother.
2011;2(3):214-215.
KNOWLEDGE NOT SHARED IS WASTED.
- CLAN JACOBS
