Basic data analyses skills for science research

The Dos and Don’ts!!

Prepared by Law HL

Statistics

- The practice or science of collecting and analysing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.
- Used to communicate research findings, to support hypotheses, and to give credibility to research methodology and conclusions.

Example 1: Is the lipase concentration significantly different among the various fruits?

Fruit sample   1st sample   2nd sample   3rd sample   Average lipase concentration
Lime           0.564        0.585        0.606        0.585
Lemon          0.104        0.101        0.107        0.104
Grapefruit     0.182        0.183        0.181        0.182
Avocado        0.415        0.637        0.550        0.534
Peanut         0.182        0.328        0.405        0.367

[Figure: bar chart "Average lipase concentration in various fruits". Y-axis: average lipase concentration (U/100 µL), 0 to 0.8; x-axis: fruits. Bar values: Lime 0.585, Lemon 0.104, Grapefruit 0.182, Avocado 0.534, Peanut 0.367.]

Chart annotations:
- No observable difference between the average lipase concentration of lime and avocado.
- No significant difference between the average lipase concentration of lime and avocado!!

Student’s Conclusion:


Lime has a significantly higher ?? lipase
concentration than the other fruit
samples.

Error Bars
- Overlap: no observable difference
- Overlap: no significant difference if inferential stats are used
- No overlap: observable difference
- No overlap: significant difference if inferential stats are used
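
As a rough sketch of the overlap check, the snippet below computes a mean and an error bar for three of the fruits in Example 1 and tests whether two bars overlap. The slides do not say what the error bars represent, so mean ± 1 sample standard deviation is an assumption here, and the helper names `error_bar` and `bars_overlap` are hypothetical.

```python
from statistics import mean, stdev

# Lipase concentrations (U/100 uL) from Example 1, three samples per fruit.
lipase = {
    "Lime": [0.564, 0.585, 0.606],
    "Lemon": [0.104, 0.101, 0.107],
    "Avocado": [0.415, 0.637, 0.550],
}

def error_bar(values):
    """Return (low, high) for mean +/- 1 sample standard deviation (assumed)."""
    m, s = mean(values), stdev(values)
    return (m - s, m + s)

def bars_overlap(a, b):
    """True if the two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

bars = {fruit: error_bar(values) for fruit, values in lipase.items()}

# Lime vs avocado: the bars overlap, so no observable difference can be claimed.
print(bars_overlap(bars["Lime"], bars["Avocado"]))
# Lime vs lemon: the bars do not overlap, so there is an observable difference.
print(bars_overlap(bars["Lime"], bars["Lemon"]))
```

Whether a difference is also "significant" still depends on running an inferential test; the overlap check alone only supports the word "observable".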


Example 2: Is the average distance travelled by the shuttlecock significantly different among the various shots?

Trials   Set 1   Set 2   Set 3   Set 4   Set 5   Set 6   Average
Shot 1   4.921   4.698   4.598   4.822   5.171   5.096   4.884
Shot 2   4.879   4.772   4.772   4.787   4.808   4.596   4.769
Shot 3   4.483   4.536   4.565   4.430   4.760   4.594   4.561
Shot 4   4.392   4.268   4.096   4.162   4.388   4.462   4.295
Shot 5   4.180   4.122   4.142   4.092   4.238   3.712   4.081
Shot 6   3.612   3.698   3.612   3.962   3.788   3.928   3.767

[Figure: bar chart "Average distance travelled by the shuttlecock for each of the six shots". Y-axis: average distance (m), 0 to 6; x-axis: shots. Bar values: Shot 1 4.884, Shot 2 4.769, Shot 3 4.561, Shot 4 4.295, Shot 5 4.081, Shot 6 3.767.]

Student’s Conclusion:


There is a significant difference?? in the
average distance travelled by the
shuttlecock among the six shots.

Statistical Significance

Results are statistically significant when the observed effects are unlikely to have arisen by chance alone, and so are attributed to REAL treatment effects and NOT to chance.

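The P-Value approach
- P-Value: the probability of obtaining a result at least as extreme as the one observed, assuming the hypothesis being tested (the null hypothesis of no effect) is true.
- The smaller the P-Value, the stronger the evidence against the null hypothesis, and the more likely the results are statistically significant.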


So…what is the P-Value for a statistically significant result?
Generally:
- P < 0.05 (results are statistically significant)
- P < 0.001 (results are extremely statistically significant)


Example 3: Is there a significant difference in the absorbance of the papain reaction mixture at various concentrations?
Concentrations
of Papain (%)

Absorbance readings

2

0.51

0.49

0.42

0.35

0.42

0.44

0.53

0.31

10



0.52

5



0.40
0.41

0.36

0.21

0.21

0.33

P = 0.04
There is a significant difference in the average
absorbance among the three concentrations of Papain.

Various statistical tools for generating P-Values

Statistical analyses:
- Group comparisons
- Establishing linear relationships between variables

Generating P-Values (I)

Group comparisons:
- 2 groups
    - Sample size n = 5 - 15: Mann-Whitney U-Test
    - Sample size n > 15: T-Test
- More than 2 groups
    - Sample size n = 5 - 15: Kruskal-Wallis Test
    - Sample size n > 15: ANOVA, followed by a post hoc test (multiple comparisons)
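
The chart above can be written down as a small lookup, which may help when planning an experiment. This is only a sketch: the function name `choose_group_test` is hypothetical, and treating the smallest group as the deciding sample size is an assumption, since the slides do not cover unequal group sizes.

```python
def choose_group_test(n_groups, n_per_group):
    """Pick a test from the group-comparison chart.

    n_groups: number of treatment groups being compared (2 or more).
    n_per_group: smallest sample size among the groups (an assumption).
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_per_group < 5:
        # The slides advise against inferential statistics for n < 5.
        return "no inferential test (n < 5)"
    small = n_per_group <= 15
    if n_groups == 2:
        return "Mann-Whitney U-Test" if small else "T-Test"
    if small:
        return "Kruskal-Wallis Test"
    return "ANOVA + post hoc multiple comparisons"

print(choose_group_test(2, 9))    # two groups, n = 9
print(choose_group_test(3, 20))   # three groups, n > 15
```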

Example 4:
An experiment was conducted to find out if the survival of E. coli differed between cultures grown in brass pots and those grown in glass pots.
- Since there are two groups to be compared and n = 9, use the Mann-Whitney U-Test.
- Results: P > 0.05
- There is no significant difference in the average number of bacterial colonies between the two samples.


Number of bacterial colonies   Number of bacterial colonies
in each brass pot              present in each glass pot
405                            412
310                            231
196                            89
63                             567
167                            134
312                            253
675                            423
465                            134
78                             231
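
A minimal sketch of the Mann-Whitney comparison for Example 4 follows, using only the Python standard library. It computes the U statistic and an exact two-sided permutation P-value by enumerating every possible split of the pooled ranks. Two things are assumptions rather than facts from the slides: the assignment of values to pots (the flattened table is read row by row as brass, glass, brass, ...) and the helper names `average_ranks` and `mann_whitney_exact`.

```python
from itertools import combinations

# Colony counts from Example 4. Which value belongs to which pot type is an
# ASSUMPTION: the flattened table is read row by row (brass, glass, brass, ...).
brass = [405, 310, 196, 63, 167, 312, 675, 465, 78]
glass = [412, 231, 89, 567, 134, 253, 423, 134, 231]

def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_exact(x, y):
    """Return (U for x, two-sided exact permutation P-value)."""
    ranks = average_ranks(x + y)
    n1, n2 = len(x), len(y)
    u_obs = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    centre = n1 * n2 / 2               # expected U when the groups do not differ
    extreme = total = 0
    # Enumerate every way of labelling n1 of the pooled ranks as group x.
    for combo in combinations(ranks, n1):
        u = sum(combo) - n1 * (n1 + 1) / 2
        total += 1
        if abs(u - centre) >= abs(u_obs - centre) - 1e-9:
            extreme += 1
    return u_obs, extreme / total

u, p = mann_whitney_exact(brass, glass)
print(f"U = {u}, P = {p:.3f}")
print("significant at 5%" if p < 0.05 else "no significant difference at 5%")
```

In practice a statistics package would normally be used instead of hand-written code; for instance, SciPy provides `scipy.stats.mannwhitneyu`. The enumeration here is kept only to show where the P-value comes from.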

Example 5:
- Experiment to find out if temperature readings differ among the various layers.
- Since n < 5 for each group, none of our statistical tools is appropriate for the analysis.

Example 6:
- Experiment to find out if the mean concentration of ethanol produced differed significantly between the two methods.
- If n > 15 for both groups, use the T-Test set at α = 5%.

Example 7:
- Experiment to find out if the mean amount of ion adsorbed by mango peels differed significantly among the three groups.
- 3 T-Tests??

T-Test: P = 0.00005
T-Test: P = 0.00003
T-Test: P = 0.00379

Example 7 (continued):
- For comparing more than 2 means with n > 15 for each treatment group, use ANOVA.
- DO NOT USE MULTIPLE T-TESTS, as the error rate (the chance of a false positive) gets INFLATED!!
- If ANOVA shows a significant difference in the means among the groups, use Tukey’s Multiple Comparisons to determine where the difference lies.
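
To see why the error rate inflates, the sketch below computes the chance of at least one false positive when several tests are each run at the same significance level. It assumes the tests are independent, which is a simplification for pairwise T-Tests on shared groups; the function name `familywise_error_rate` is hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

# Three groups need three pairwise T-Tests (A-B, A-C, B-C).
print(round(familywise_error_rate(0.05, 3), 4))   # about 0.1426, not 0.05
```

So three T-Tests at 5% each carry roughly a 14% chance of at least one spurious "significant" result under this assumption, which is the inflation that a single ANOVA avoids.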


Example 8:
- Experiment to determine if there is a significant difference in the average acid concentration among the four preparations.

Preparation A   Preparation B   Preparation C   Preparation D
0.45            0.35            0.24            0.34
0.35            0.56            0.12            0.56
0.46            0.24            0.13            0.53
0.24            0.56            0.17            0.43
0.56            0.24            0.45            0.21

- Comparing averages among three or more groups with 5 ≤ n ≤ 15 for each group: use the Kruskal-Wallis Test.
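
A minimal sketch of the Kruskal-Wallis calculation for Example 8 follows, using only the Python standard library. It computes the H statistic with the usual tie correction and compares it with the chi-square critical value for 3 degrees of freedom at the 5% level. The chi-square comparison is only an approximation with n = 5 per group, the grouping of values assumes each preparation is one column of the table, and the function name `kruskal_wallis_h` is hypothetical.

```python
# Acid concentrations from Example 8; each preparation is one table column
# (ASSUMED layout of the flattened table).
groups = {
    "A": [0.45, 0.35, 0.46, 0.24, 0.56],
    "B": [0.35, 0.56, 0.24, 0.56, 0.24],
    "C": [0.24, 0.12, 0.13, 0.17, 0.45],
    "D": [0.34, 0.56, 0.53, 0.43, 0.21],
}

def kruskal_wallis_h(samples):
    """Kruskal-Wallis H statistic with the usual correction for ties."""
    pooled = sorted(v for s in samples for v in s)
    n = len(pooled)
    # Average rank of each distinct value (ties share the mean of their ranks).
    rank_of, tie_term = {}, 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        t = j - i + 1                  # number of values tied at this level
        rank_of[pooled[i]] = (i + j) / 2 + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_term = sum(
        sum(rank_of[v] for v in s) ** 2 / len(s) for s in samples
    )
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))

h = kruskal_wallis_h(list(groups.values()))
CHI2_CRIT_DF3 = 7.815   # chi-square critical value, alpha = 0.05, df = 4 - 1
print(f"H = {h:.3f}")
print("significant at 5%" if h > CHI2_CRIT_DF3 else "not significant at 5%")
```

A statistics package would normally report an exact or approximate P-value directly; SciPy, for example, offers `scipy.stats.kruskal`.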


Generating P-Values (II)

Establishing linear relationships between variables:
- Functional dependence of one variable on another: simple linear regression
- No dependence between variables: simple linear correlation

Simple Linear Regression
- Two variables
- One variable (dependent/response variable) depends on the other (independent/predictor variable)
- Represented by scatterplots
- Reported with r² and P-value


r² and P-value in regression analysis

- r²: the coefficient of determination.
- Measures how much of the variation in the dependent variable is explained by the independent variable.
- 0% ≤ r² ≤ 100%
- P-Value: the probability of obtaining a sample slope at least as far from zero as that of the fitted regression line if the actual (population) slope is zero.
- Always report both r² and P-value.

[Figure label: n = 5, r² = 0.80]
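
As an illustration of how the slope and r² are obtained, here is a minimal least-squares sketch using only the Python standard library. The data points are hypothetical and not taken from the slides, and the function name `simple_linear_regression` is an assumption. The P-value for the slope is not computed here, since it needs the t-distribution, which a statistics package would supply.

```python
from statistics import mean

# HYPOTHETICAL data for illustration only (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]        # independent / predictor variable
y = [2.1, 3.9, 6.2, 7.8, 10.1]       # dependent / response variable

def simple_linear_regression(x, y):
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, r2)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)      # share of the variation in y explained by x
    return slope, intercept, r2

slope, intercept, r2 = simple_linear_regression(x, y)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, r2 = {r2:.1%}")
```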

Simple Linear Correlation
- Two variables
- Neither of the two is functionally dependent on the other
- Represented by scatterplots
- r (Pearson correlation coefficient): measures the strength of the linear relationship between two variables
- Always report both r and P-value


Guidelines to interpreting r

Strength of association   Coefficient, r
                          Positive       Negative
Small                     0.1 to 0.3     -0.1 to -0.3
Medium                    0.3 to 0.5     -0.3 to -0.5
Large                     0.5 to 1.0     -0.5 to -1.0

Caution:
It is not appropriate to analyze a non-linear relationship using the Pearson correlation coefficient.
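
The caution can be demonstrated with a small sketch. The function `pearson_r` below is a plain standard-library implementation of the Pearson coefficient, and the data are hypothetical: y depends perfectly on x, but along a curve rather than a straight line.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# HYPOTHETICAL data: y depends perfectly on x, but along a curve (y = x squared).
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

print(pearson_r(x, y))                      # r is zero: no LINEAR relationship
print(pearson_r(x, [2 * xi for xi in x]))   # a straight line gives r = 1
```

A scatterplot would reveal the curved relationship immediately, which is why the plot should be inspected before r is reported.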

Example 10:
- Experiment to find out if there is a significant correlation between percentage of DPPH reacted and concentration of fruit peel extract.
- P-Value?
- Scatterplot?

1. The Don’ts…
- For n < 5, DO NOT analyze your data with inferential statistics.
- E.g. trying to determine if the amount of heavy metal ion removed differed among the three methods:

Concentration of heavy metal ion removed

Method 1   Method 2   Method 3
0.421      0.324      0.534
0.521      0.512      0.342
0.654      0.526      0.523

2. The Don’ts…
- When no statistical analysis is being performed on the data sets, refrain from using the word ‘Significant’!
- You can however claim that ‘there is an observable difference…’


3. The Don’ts…
- Data analyses DO NOT PROVE hypotheses.
- The results either support or do not support the hypotheses.
- Refrain from using the words ‘Prove’ or ‘Discover’!!


4. The Don’ts…
- Do not attempt to analyze too many variables at the same time!
- Analyses of multiple variables at the same time require Multivariate Statistical Analyses!!

The Dos…
- Decide on the appropriate significance level before statistical analyses (e.g. 5%).
- Always factor in the appropriate statistical tool for analyzing your data at the planning stage.
- Always report your significance level and P-value!
- Consult your teachers or Mr Law if you have any queries.

