2. Statistics
Statistics is the practice or science of collecting and
analysing numerical data in large quantities,
especially for the purpose of inferring proportions
in a whole from those in a representative sample.
It is used to communicate research findings, to
support hypotheses, and to give credibility to research
methodology and conclusions.
4. Example 1: Is the lipase concentration
significantly different among the various fruits?
Fruit sample   1st Sample   2nd Sample   3rd Sample   Average lipase concentration
Lime           0.564        0.585        0.606        0.585
Lemon          0.104        0.101        0.107        0.104
Grapefruit     0.182        0.183        0.181        0.182
Avocado        0.415        0.637        0.550        0.534
Peanut         0.182        0.328        0.405        0.367
5. [Bar chart: Average lipase concentration in various fruits.
y-axis: Average lipase concentration (U/100 µL), 0 to 0.8; x-axis: Fruits.
Bar values: Lime 0.585, Lemon 0.104, Grapefruit 0.182, Avocado 0.534, Peanut 0.367.]
Chart annotations:
No significant difference between the average lipase
concentration of lime and avocado!!
No observable difference between the average lipase
concentration of lime and avocado
7. Error Bars
Overlap: no observable difference
Overlap: no significant difference if
inferential statistics are used
No overlap: observable difference
No overlap: significant difference if
inferential statistics are used
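The overlap rule can be sketched in a few lines of Python. This is a minimal sketch that assumes each error bar spans the mean plus or minus one sample standard deviation (the slides do not say what the bars represent); the readings are the Lime, Lemon and Avocado rows of Example 1, and the helper names `error_bar` and `bars_overlap` are illustrative only.

```python
from statistics import mean, stdev

# Readings copied from the Example 1 table (lipase concentration, U/100 uL).
readings = {
    "Lime": [0.564, 0.585, 0.606],
    "Lemon": [0.104, 0.101, 0.107],
    "Avocado": [0.415, 0.637, 0.550],
}


def error_bar(values):
    """Return (low, high) for mean +/- one sample standard deviation (assumed bar type)."""
    m, s = mean(values), stdev(values)
    return m - s, m + s


def bars_overlap(a, b):
    """True if the two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


bars = {fruit: error_bar(values) for fruit, values in readings.items()}
lime_vs_avocado = bars_overlap(bars["Lime"], bars["Avocado"])
lime_vs_lemon = bars_overlap(bars["Lime"], bars["Lemon"])

print("Lime vs Avocado bars overlap:", lime_vs_avocado)
print("Lime vs Lemon bars overlap:", lime_vs_lemon)
```

Under that assumption the Lime and Avocado bars overlap (no observable difference), while the Lime and Lemon bars do not (observable difference).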
8. Example 2: Is the average distance
travelled by the shuttlecock
significantly different among the
various shots?
Trials   Set 1   Set 2   Set 3   Set 4   Set 5   Set 6   Average
Shot 1   4.921   4.698   4.598   4.822   5.171   5.096   4.884
Shot 2   4.879   4.772   4.772   4.787   4.808   4.596   4.769
Shot 3   4.483   4.536   4.565   4.430   4.760   4.594   4.561
Shot 4   4.392   4.268   4.096   4.162   4.388   4.462   4.295
Shot 5   4.180   4.122   4.142   4.092   4.238   3.712   4.081
Shot 6   3.612   3.698   3.612   3.962   3.788   3.928   3.767
9. [Bar chart: Average distance travelled by the shuttlecock for
each of the six shots.
y-axis: Average distance (m), 0 to 6; x-axis: Shots.
Bar values: Shot 1 4.884, Shot 2 4.769, Shot 3 4.561, Shot 4 4.295, Shot 5 4.081, Shot 6 3.767.]
10. Student’s Conclusion:
There is a significant difference?? in the
average distance travelled by the
shuttlecock among the six shots.
13. The P-Value approach
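P-Value: the probability of obtaining a
result at least as extreme as the one observed
if the hypothesis being tested (the null
hypothesis) is true.
The smaller the P-Value, the stronger the
evidence against that hypothesis, and the more
likely the results are statistically significant.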
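One way to make the idea of a P-Value concrete is an exact permutation test, sketched below. This is not one of the tools named in these slides; it is shown only because it computes "how often would a difference at least this large arise by chance" directly. The two groups are hypothetical numbers and `permutation_p_value` is an illustrative name.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation P-value for a difference in means.

    Every way of splitting the pooled values into groups of the original
    sizes is treated as equally likely under the null hypothesis.
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = total = 0
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance so floating-point noise does not drop exact ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical data: two groups of three readings each.
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
p = permutation_p_value(group_a, group_b)
print(p)  # 2 of the 20 possible splits are this extreme -> 0.1
```

Even though the two hypothetical groups do not overlap at all, the smallest P-Value this test can give with three readings per group is 0.1, which is one way to see why very small samples cannot support a claim of significance.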
14. So, what is the P-Value for a
statistically significant result?
Generally:
P < 0.05 (Results are statistically
significant)
P < 0.001 (Results are extremely
statistically significant)
15. Example 3: Is there a significant
difference in the absorbance of the
reaction mixture of papain at various
concentrations?
Concentrations
of Papain (%)
Absorbance readings
2
0.51
0.49
0.42
0.35
0.42
0.44
0.53
0.31
10
0.52
5
0.40
0.41
0.36
0.21
0.21
0.33
P = 0.04
There is a significant difference in the average
absorbance among the three concentrations of Papain.
16. Various statistical tools for
generating P-Values
Statistical analyses
• Group comparisons
• Establishing linear relationships between variables
17. Various statistical tools for
generating P-Values (I)
Group comparisons
• 2 groups
  • Sample size n = 5 - 15: Mann-Whitney U-Test
  • Sample size n > 15: T-Test
• More than 2 groups
  • Sample size n = 5 - 15: Kruskal-Wallis H-Test
  • Sample size n > 15: ANOVA
    Post hoc test: Multiple Comparisons
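The selection rules for group comparisons can be written down as a small lookup, sketched here. The function name `choose_group_comparison_test` is illustrative, and the n < 5 branch follows the rule stated later in these slides that inferential statistics should not be used for such small samples.

```python
def choose_group_comparison_test(n_groups, n_per_group):
    """Return the test suggested by the slides, or None if n is too small.

    n_groups    -- number of groups being compared (2 or more)
    n_per_group -- sample size n in each group
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_per_group < 5:
        return None  # slides: no inferential statistics for n < 5
    small_sample = n_per_group <= 15
    if n_groups == 2:
        return "Mann-Whitney U-Test" if small_sample else "T-Test"
    if small_sample:
        return "Kruskal-Wallis Test"
    return "ANOVA, then post hoc multiple comparisons"


print(choose_group_comparison_test(2, 9))   # Mann-Whitney U-Test
print(choose_group_comparison_test(3, 20))  # ANOVA, then post hoc multiple comparisons
print(choose_group_comparison_test(3, 3))   # None
```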
18. Example 4:
An experiment was conducted to find out if the
survival of E. coli differed between those grown
using brass and glass pots.
Since there are two groups to be compared and
n = 9, use the Mann-Whitney Test.
Results: P > 0.05
There is no significant difference in the average
number of bacterial colonies between the two samples.
Number of bacterial colonies in each pot:
Brass pot   Glass pot
405         412
310         231
196         89
63          567
167         134
312         253
675         423
465         134
78          231
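A rough sketch of the Mann-Whitney calculation for this example follows. It ASSUMES the 18 counts alternate brass, glass in reading order (the table layout is not certain), and it computes an exact two-sided P-Value by enumerating every possible split of the ranks rather than using a statistics package; `midranks` and `mann_whitney_exact` are illustrative names.

```python
from itertools import combinations

# ASSUMPTION: counts alternate brass, glass in the order they appear on the slide.
brass = [405, 310, 196, 63, 167, 312, 675, 465, 78]
glass = [412, 231, 89, 567, 134, 253, 423, 134, 231]


def midranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_exact(a, b):
    """Return (U for group a, exact two-sided P-Value)."""
    ranks = midranks(list(a) + list(b))
    n_a = len(a)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    expected = n_a * (len(ranks) + 1) / 2  # mean rank sum under the null
    observed_dev = abs(rank_sum_a - expected)
    extreme = total = 0
    for combo in combinations(ranks, n_a):  # 48,620 splits for 9 vs 9
        total += 1
        if abs(sum(combo) - expected) >= observed_dev - 1e-9:
            extreme += 1
    return u_a, extreme / total


u_brass, p_value = mann_whitney_exact(brass, glass)
print("U (brass):", u_brass)
print("P > 0.05:", p_value > 0.05)
```

Under the stated layout assumption the result agrees with the slide: P > 0.05, so there is no significant difference between the brass and glass pots.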
19. Example 5:
Experiment to find out if temperature
readings differ among the various layers.
Since n < 5 for each group, none of our
statistical tools is appropriate for the
analysis.
20. Example 6:
Experiment to find out if the mean concentration
of ethanol produced differed significantly between
the two methods.
If n > 15 for both groups, use T-Test set at α = 5%
21. Example 7:
Experiment to find out if the mean amount of ion
adsorbed by mango peels differed significantly
among the three groups.
3 T-Tests??
T-Test: P = 0.00005
T-Test: P = 0.00003
T-Test: P = 0.00379
22. Example 7:
For comparing more than 2 means with
n > 15 for each treatment group, use
ANOVA.
DO NOT USE MULTIPLE T-TESTS as
the overall chance of at least one false
positive (Type I error) gets INFLATED!!
If ANOVA shows a significant difference
in the means among the groups, use
Tukey’s Multiple Comparisons to
determine where the difference lies.
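The inflation can be estimated with a short calculation. The sketch below uses the standard formula 1 - (1 - α)^k, which assumes the k tests are independent; pairwise T-Tests on the same groups are not strictly independent, so treat the figure as an approximation. The function names are illustrative.

```python
from math import comb


def pairwise_comparisons(n_groups):
    """Number of two-group comparisons needed to cover every pair of groups."""
    return comb(n_groups, 2)


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


n_tests = pairwise_comparisons(3)            # three groups -> 3 T-Tests
rate = familywise_error_rate(0.05, n_tests)  # each test run at alpha = 5%
print(n_tests, round(rate, 4))               # 3 0.1426
```

So three T-Tests at α = 5% carry roughly a 14% chance of at least one false positive, which is why a single ANOVA followed by a multiple-comparisons test is preferred.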
23. Example 8:
Experiment to determine if there is a significant
difference in the average acid concentration
among the four preparations.
Preparation A   Preparation B   Preparation C   Preparation D
0.45            0.35            0.24            0.34
0.35            0.56            0.12            0.56
0.46            0.24            0.13            0.53
0.24            0.56            0.17            0.43
0.56            0.24            0.45            0.21
Comparing averages among three or more
groups with 5 ≤ n ≤ 15 for each group:
use the Kruskal-Wallis Test.
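For reference, the Kruskal-Wallis statistic H can be computed by hand as sketched below. The three groups are hypothetical numbers chosen so the arithmetic is easy to check, not the acid-concentration data; `kruskal_wallis_h` is an illustrative name, and turning H into a P-Value (usually via a chi-square approximation with k - 1 degrees of freedom) is left to a statistics package.

```python
from collections import Counter


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the usual correction for ties."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    counts = Counter(pooled)

    # Midrank of each distinct value: first 1-based position + (ties - 1) / 2.
    first_position = {}
    for position, value in enumerate(pooled, start=1):
        first_position.setdefault(value, position)
    rank_of = {v: first_position[v] + (counts[v] - 1) / 2 for v in counts}

    weighted = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

    tie_term = sum(c ** 3 - c for c in counts.values())
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Hypothetical data: three groups of five readings each.
h = kruskal_wallis_h(
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
)
print(h)
```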
24. Various statistical tools for
generating P-Values (II)
Establishing linear relationships between variables
• Functional dependence of one variable on another:
  simple linear regression
• No dependence between variables:
  simple linear correlation
25. Simple Linear Regression
Two variables
One variable (dependent/response variable)
depends on the other (independent/predictor
variable)
Represented by scatterplots
Reported with r² and P-value
26. r² and P-value in regression
analysis
r²: coefficient of determination
Measures how much of the variation in
the dependent variable is explained by the
independent variable.
0% ≤ r² ≤ 100%
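A least-squares fit and its r² can be computed directly from the sums of squares, as in this minimal sketch. The x and y values are hypothetical and `linear_regression_r2` is an illustrative name.

```python
from statistics import mean


def linear_regression_r2(x, y):
    """Return (slope, intercept, r_squared) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)  # fraction of variation in y explained by x
    return slope, intercept, r_squared


# Hypothetical data: five (x, y) pairs.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
slope, intercept, r_squared = linear_regression_r2(x, y)
print(slope, intercept, r_squared)  # approximately 0.6, 2.2, 0.6
```

Here r² = 0.6 means 60% of the variation in y is accounted for by the fitted line.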
27. r² and P-value in regression
analysis
P-Value: the probability of obtaining a sample
slope at least as far from zero as that of the
regression line if the actual (population)
slope is zero.
Always report both r² and P-value,
e.g. n = 5, r² = 0.80.
28. Simple Linear Correlation
Two variables
Neither of the two is functionally
dependent on the other
Represented by scatterplots
r (Pearson correlation coefficient):
measures the strength of the linear
relationship between two variables.
Always report both r and P-value
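The sketch below shows why r and the P-value belong together. It computes Pearson's r for hypothetical data and then an exact permutation P-value, which is a stand-in chosen here because it needs no statistical tables; it is not the method a statistics package would normally report. The function names are illustrative.

```python
from itertools import permutations
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_permutation_p(x, y):
    """Exact two-sided P-value: share of reorderings of y with |r| this large."""
    observed = abs(pearson_r(x, y))
    extreme = total = 0
    for shuffled in permutations(y):  # n! orderings, so keep n small
        total += 1
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical data: five (x, y) pairs.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(x, y)
p = correlation_permutation_p(x, y)
print(round(r, 3), round(p, 3))  # 0.775 0.133
```

With only five points, a seemingly strong r of about 0.77 still comes with P of about 0.13, so the correlation is not significant at the 5% level.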
29. Guidelines to interpreting r
Strength of Association   Coefficient, r
                          Positive     Negative
Small                     0.1 to 0.3   -0.1 to -0.3
Medium                    0.3 to 0.5   -0.3 to -0.5
Large                     0.5 to 1.0   -0.5 to -1.0
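These bands translate into a simple lookup, sketched here. Two choices are assumptions of this sketch, not of the guideline: a value exactly on a boundary (0.3 or 0.5) is placed in the stronger band, and values closer to zero than 0.1, which the guideline does not name, are reported as below the smallest band.

```python
def strength_of_association(r):
    """Classify a correlation coefficient using the guideline bands."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the bands are symmetric for positive and negative r
    if size >= 0.5:
        return "large"
    if size >= 0.3:   # assumption: boundary values go to the stronger band
        return "medium"
    if size >= 0.1:
        return "small"
    return "below the smallest band"  # assumption: the guideline gives no label here


print(strength_of_association(0.77))   # large
print(strength_of_association(-0.35))  # medium
print(strength_of_association(0.2))    # small
```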
31. Example 10:
Experiment to find out if there is a significant
correlation between percentage of DPPH
reacted and concentration of fruit peel extract.
•P-Value?
•Scatterplot?
32. 1. The Don’ts……………………
For n < 5, DO NOT analyze your data
with inferential statistics.
E.g. Trying to determine if the amount of
heavy metal ion removed differed
among the three methods
Concentration of heavy metal ion removed
Method 1   Method 2   Method 3
0.421      0.324      0.534
0.521      0.512      0.342
0.654      0.526      0.523
33. 2. The Don’ts………………
When no statistical analysis is being
performed on the data sets, refrain from
using the word ‘Significant’!
You can however claim that ‘there is an
observable difference…’
34. 3. The Don’ts…………………
Data analyses DO NOT PROVE
hypotheses.
The results either support or do not
support the hypotheses.
Refrain from using the words ‘Prove’ or
‘Discover’!!
35. 4. The Don’ts…………………
Do not attempt to analyze too many variables at
the same time!
Analyses of multiple variables at the same time
require Multivariate Statistical Analyses!!
36. The Dos…………
Decide on the appropriate significance
level before statistical analyses (e.g.
5%)
Always factor in the appropriate
statistical tool for analyzing your data at
the planning stage
Always report your significance level
and P-value!
Consult your teachers or Mr Law if you
have any queries.