Basic stat tools

BASIC STATISTICAL
TOOLS IN RESEARCH
Mr. Jerome L. Buhay
Mathematics and Statistics Department
DLSU-Dasmariñas

Objectives
At the end of this webinar, the participants will be able to:
• Identify and describe some basic terms in Statistics
• Differentiate parametric and non-parametric tests
• Demonstrate the use of different statistical tests
• Interpret statistical results

Basic Terms
1. Population is the set of all individuals or entities under consideration or study.
2. Variable is a characteristic of interest, measurable for every member of the population, that varies. It may change from group to group, from person to person, or even within one person over time.
Types of Variables
Qualitative Variable – consists of categories or attributes, which have non-numerical characteristics.
Quantitative Variable – consists of numbers representing counts or measurements.

Basic Terms
3. Sample is a part of the population, or a subcollection of elements drawn from a population.
4. Parameter is a numerical measurement describing some characteristic of a population.
5. Statistic is a numerical measurement describing some characteristic of a sample.

Basic Terms
6. Survey is often conducted to gather opinions or feedback about a variety of topics.
- Census survey, referred to as a census, is conducted to gather information from the entire population.
- Sampling survey, referred to as a survey, is conducted to gather information from only a part of the population.

Basic Terms
7. Hypothesis is a statement or a tentative theory that is assumed to be true. It is usually tested using sample data.
Null hypothesis – denoted by H0; it is the hypothesis of "no difference" and is the hypothesis that is being tested.
Alternative hypothesis – denoted by Ha or H1; it is the hypothesis that contradicts the null hypothesis. It is taken to be true when H0 is rejected.

Identify whether each statement is a null or alternative hypothesis.
▪ Drug X is not effective in treating COVID-19.
Ans. H0
▪ There is a significant difference between the academic performance of male and female students.
Ans. Ha
▪ The monthly salary of factory workers is dependent on their educational attainment.
Ans. Ha
▪ There is no significant relationship between patients' age and the number of days of recovery from COVID-19.
Ans. H0
▪ There is no significant difference among the mathematics performance of students under different learning modalities.
Ans. H0

Measurement Scales/Levels
The Nominal Scale
•simply represents qualitative differences in the variable measured
•can only tell us that difference exists without
the possibility of telling the direction or
magnitude of the difference
•e.g. Program in college, race, gender,
occupation, religion, etc.

The Ordinal Scale
•the categories that make up an ordinal scale
form an ordered sequence
•can tell us the direction of the difference but
not the magnitude
•e.g. coffee cup sizes, socioeconomic class, T-
shirt sizes, food preferences

The Interval Scale
•categories on an interval scale are organized
sequentially, and all categories are numerically
measured
•we can determine the direction and the magnitude
of a difference
•may have an arbitrary zero (a convenient point of reference) but has no true zero point
•e.g. temperature in Fahrenheit, clock time

The Ratio Scale
•consists of equal, ordered categories anchored by a zero point that is not arbitrary but meaningful (representing absence of the variable)
•allows us to determine the direction, the magnitude, and the ratio of the difference
•e.g. reaction time, number of errors on a test, scores on a test, speed of cars, weight loss, etc.

Classification of Data Analytic Methods
Dependence Method
The dependence methods test for the presence or absence of a relationship between two sets of variables – the dependent and independent variables. Common dependence methods are the t-test, ANOVA, ANCOVA, regression analysis, chi-square test, MANOVA, discriminant analysis, and logistic regression.

Interdependence methods
Interdependence methods are used for data sets in which it is impossible to conceptually designate one set of variables as dependent and another set as independent. For these types of data sets, the objective is to identify how and why the variables are related among themselves. Common examples are correlation analysis, principal component analysis, and factor analysis.

Relationships of Variables
Dependency
Independent Variables (Demographic Profiles)
•Age
•Gender
•Family Income
•Educational Attainment
Dependent Variables
•Level of Awareness
•Level of Satisfaction
•Level of Performance

Relationships of Variables
Interdependency
•Level of Awareness
•Level of Satisfaction
•Level of Knowledge
•Level of Performance
•Level of Compliance

Parametric VS Non-Parametric Test
Parametric Tests
•Independent Observations
•Normal Distribution
•Interval / Ratio Scale Data
Non-Parametric Tests
•Independent Observations
•Easy to use and understand
•Distribution-free
•Ordinal/Nominal Scale Data

Interpreting Statistical Result
Important Terms
✓ The test statistic is a value computed from the sample data,
and it is used in making the decision about the rejection of
the null hypothesis.
✓ The critical region (or rejection region) is the set of all values of the test statistic that cause us to reject the null hypothesis. Its boundaries are set by the critical value(s).
✓ The significance level (denoted by α) is the probability that the test statistic will fall in the critical region when the null hypothesis is actually true. Common choices for α are 0.05, 0.01, and 0.10.

➢ The statement of the problem/hypothesis is the
basis for interpreting results.
➢ The null hypothesis is either rejected or not rejected.
➢ A result is significant when the null hypothesis is rejected, and not significant when the null hypothesis is not rejected.

Significance can mean any of the following:
– There is a relationship.
– There is an association between or among
variables.
– There is an effect.
– The treatment is effective.
– A variable is dependent on the other variable/s.
– There is a difference/different effect.

Question:
– When and how do you reject or fail to reject
the null hypothesis?
– When do we say that the result is Significant?

Traditional method
➢ Reject H0 if the test statistic falls within the critical region.
➢ Fail to reject H0 if the test statistic does not fall within the
critical region.
[Figure: a distribution curve with a critical value marking each boundary of the critical region]
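The traditional decision rule can be sketched in Python as follows. This is a minimal sketch for a two-tailed test that assumes the test statistic is a z statistic (standard normal under H0), so the critical value comes from the standard library's `statistics.NormalDist`; for a t statistic the critical value would instead come from the t distribution with the appropriate degrees of freedom. The function names are hypothetical.

```python
from statistics import NormalDist

def two_tailed_z_critical(alpha=0.05):
    """Critical value z* such that P(|Z| >= z*) = alpha when H0 is true."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def critical_value_decision(test_stat, alpha=0.05):
    """Traditional method: reject H0 if the statistic falls in the critical region."""
    z_crit = two_tailed_z_critical(alpha)
    # For a two-tailed test the critical region is z <= -z* or z >= z*.
    if abs(test_stat) >= z_crit:
        return "Reject H0"
    return "Fail to reject H0"

print(round(two_tailed_z_critical(0.05), 2))  # 1.96
print(critical_value_decision(2.40))          # Reject H0
print(critical_value_decision(1.10))          # Fail to reject H0
```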

P-value method
➢Reject H0 if P-value ≤ α (where α is the significance level, such as 0.05).
➢Fail to reject H0 if P-value > α.

Basic Parametric Tests
T-test
ANOVA
Pearson Correlation
Linear Regression

T- test
• The t-test is a parametric test commonly used to test the difference between two group means. The means may come from independent or dependent groups.
• It is a dependence method, usually univariate, and is most appropriate when the independent variable is non-metric (categorical).
Example: testing whether the level of job satisfaction differs by gender.

One-sample T-test
➢Used to test a single population mean
➢Usually compares the sample mean with an existing population mean or a standard norm
➢An example is comparing a school's performance in the board exam with the national result
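The one-sample t statistic is t = (sample mean − test value) / (s / √n) with df = n − 1. Below is a minimal Python sketch using only the standard library. It computes the statistic from raw data or from the summary values SPSS prints, but not the P-value, which would need a t-distribution CDF such as one from a statistics package. Function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """t statistic and df for H0: population mean == mu0, from raw data."""
    n = len(data)
    se = stdev(data) / sqrt(n)  # standard error of the mean (sample SD)
    return (mean(data) - mu0) / se, n - 1

def one_sample_t_from_summary(sample_mean, sd, n, mu0):
    """Same statistic computed from summary values (mean, SD, N)."""
    return (sample_mean - mu0) / (sd / sqrt(n)), n - 1

# Summary values from the SPSS one-sample output (test value = 5)
t, df = one_sample_t_from_summary(4.366, 2.68660, 200, 5)
print(round(t, 3), df)  # -3.337 199
```

The result reproduces the t and df in the SPSS output that follows.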

Sample SPSS output
T-Test
One-Sample Statistics
Variable | N | Mean | Std. Deviation | Std. Error Mean
Time to effect | 200 | 4.366 | 2.68660 | 0.18997

One-Sample Test (Test Value = 5)
Variable | t | df | Sig. (2-tailed) | Mean Difference | 95% CI of the Difference: Lower | Upper
Time to effect | -3.337 | 199 | 0.001 | -0.63400 | -1.0086 | -0.2594

T-test for Independent Samples
✓ Also called the two-sample t-test for independent samples
✓ The population variances may be assumed equal or unequal
✓ It is intended to test whether there is a significant difference between the means of two unrelated groups
✓ It is used to test the null hypothesis:
H0: μ1 = μ2
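The two rows of an SPSS independent-samples table correspond to two statistics: Student's pooled t ("equal variances assumed") and Welch's t with Welch-Satterthwaite degrees of freedom ("equal variances not assumed"). The following minimal Python sketch works from summary statistics. Levene's test, which SPSS reports to help choose between the rows, is not implemented here, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's t (equal variances assumed), from summary statistics."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t (equal variances not assumed) with Welch-Satterthwaite df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

def summarize(sample):
    """(mean, SD, N) of raw data, so raw samples can be passed as pooled_t(*summarize(a), *summarize(b))."""
    return mean(sample), stdev(sample), len(sample)

# Female vs. male "time to effect" summary values from the SPSS output
print(pooled_t(4.620, 2.820, 101, 4.107, 2.531, 99))  # t ≈ 1.35, df = 198
print(welch_t(4.620, 2.820, 101, 4.107, 2.531, 99))   # t ≈ 1.35, df ≈ 196.5
```

Small differences from the printed SPSS values come from the rounding of the displayed means and SDs.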

Sample SPSS output
T-Test
Group Statistics (Time to effect)
Gender | N | Mean | Std. Deviation | Std. Error Mean
Female | 101 | 4.620 | 2.820 | 0.281
Male | 99 | 4.107 | 2.531 | 0.254

Independent Samples Test (Time to effect)
Levene's Test for Equality of Variances: F, Sig.; t-test for Equality of Means: remaining columns
Assumption | F | Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper
Equal variances assumed | 2.651 | 0.105 | 1.352 | 198 | 0.178 | 0.513 | 0.379 | -0.235 | 1.260
Equal variances not assumed | | | 1.354 | 196.491 | 0.177 | 0.513 | 0.379 | -0.234 | 1.260

Sample SPSS output
T-Test
Group Statistics (Fuel efficiency)
Vehicle type | N | Mean | Std. Deviation | Std. Error Mean
Truck | 40 | 19.70 | 3.107 | 0.491
Automobile | 114 | 25.30 | 3.646 | 0.341

Independent Samples Test (Fuel efficiency)
Levene's Test for Equality of Variances: F, Sig.; t-test for Equality of Means: remaining columns
Assumption | F | Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper
Equal variances assumed | 0.004 | 0.948 | -8.664 | 152 | 0.000 | -5.597 | 0.646 | -6.874 | -4.321
Equal variances not assumed | | | -9.356 | 79.405 | 0.000 | -5.597 | 0.598 | -6.788 | -4.407

T-test for dependent samples
➢Also called the paired t-test
➢It is intended to test whether there is a significant difference between two means measured on the same group.
➢Mostly used in comparing pre-test and post-test results
➢It is used to test the null hypothesis:
H0: μbefore = μafter
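A paired t-test is a one-sample t-test on the differences d = before − after: t = mean(d) / (SD(d) / √n) with df = n − 1. Here is a minimal Python sketch. The pre-test/post-test scores are hypothetical, and the last line checks the formula against the Pair 2 summary in the SPSS output.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t for H0: mean(before - after) == 0; returns (t, df)."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical pre-test / post-test scores for four students
t, df = paired_t([10, 12, 9, 11], [8, 11, 9, 8])
print(round(t, 3), df)  # 2.324 3

# Check against the SPSS Pair 2 summary: mean difference / (SD / sqrt(N))
print(round(8.063 / (2.886 / sqrt(16)), 3))  # 11.175
```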

Sample SPSS Output
T-Test
Paired Samples Statistics
Pair | Variable | Mean | N | Std. Deviation | Std. Error Mean
Pair 1 | Triglyceride | 138.44 | 16 | 29.040 | 7.260
Pair 1 | Final triglyceride | 124.38 | 16 | 29.412 | 7.353
Pair 2 | Weight | 198.38 | 16 | 33.472 | 8.368
Pair 2 | Final weight | 190.31 | 16 | 33.508 | 8.377

Paired Samples Test (Paired Differences)
Pair | Mean | Std. Deviation | 95% CI of the Difference: Lower | Upper | t | df | Sig. (2-tailed)
Pair 1: Triglyceride - Final triglyceride | 14.063 | 46.875 | -10.915 | 39.040 | 1.200 | 15 | 0.249
Pair 2: Weight - Final weight | 8.063 | 2.886 | 6.525 | 9.600 | 11.175 | 15 | 0.000

ANOVA – Analysis of Variance
➢ It is an appropriate technique for estimating the parameters of a linear model, Y = α + βx + ε, when the independent variables are nominal or categorical.
➢ In practice, it is used to test for significant differences among group means (more than two groups).
➢ It is mostly used in experimental research, especially when a design of experiments is applied.
➢ Example: Consider the case where a medical researcher is interested in the effect of occupation on cholesterol level. The independent variable, occupation, is nominal.
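One-way ANOVA compares the between-groups mean square with the within-groups mean square: F = (SSB / (k − 1)) / (SSW / (N − k)). The following minimal Python sketch computes F from raw groups, and also from the N, mean, and SD columns of an SPSS Descriptives table. It does not compute the P-value (that needs an F-distribution CDF), and the function names are hypothetical.

```python
from statistics import mean, variance

def one_way_anova(groups):
    """F statistic, df_between, df_within for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

def one_way_anova_from_summary(ns, means, sds):
    """Same F statistic computed from group sizes, means and standard deviations."""
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_between, df_within = len(ns) - 1, n_total - len(ns)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

# Descriptives from the SPSS output (four religion groups, 50 respondents each)
f, df1, df2 = one_way_anova_from_summary(
    [50, 50, 50, 50], [2.441, 2.129, 1.993, 2.313], [0.765, 0.677, 0.467, 0.534]
)
print(round(f, 2), df1, df2)  # 5.06 3 196
```

The small gap from the F of 5.062 in the SPSS output comes from the rounding of the printed means and SDs.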

Sample SPSS Output
Descriptives (SEXUALITY)
Group | N | Mean | Std. Deviation
RELIGION 1 | 50 | 2.441 | 0.765
RELIGION 2 | 50 | 2.129 | 0.677
RELIGION 3 | 50 | 1.993 | 0.467
RELIGION 4 | 50 | 2.313 | 0.534
Total | 200 | 2.219 | 0.640

Test of Homogeneity of Variances (SEXUALITY)
Levene Statistic | df1 | df2 | Sig.
5.175 | 3 | 196 | 0.002

ANOVA (SEXUALITY)
Source | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 5.868 | 3 | 1.956 | 5.062 | 0.002
Within Groups | 75.735 | 196 | 0.386 | |
Total | 81.603 | 199 | | |

Sample SPSS Output
Post Hoc Tests
Multiple Comparisons
Dependent Variable: SEXUALITY
Games-Howell
(I) RELIGION | (J) RELIGION | Mean Difference (I-J) | Std. Error | Sig. | 95% CI Lower Bound | 95% CI Upper Bound
RELIGION 1 | RELIGION 2 | 0.312 | 0.144 | 0.141 | -0.065 | 0.690
RELIGION 1 | RELIGION 3 | 0.448* | 0.127 | 0.004 | 0.116 | 0.780
RELIGION 1 | RELIGION 4 | 0.128 | 0.132 | 0.767 | -0.218 | 0.473
RELIGION 2 | RELIGION 1 | -0.312 | 0.144 | 0.141 | -0.690 | 0.065
RELIGION 2 | RELIGION 3 | 0.136 | 0.116 | 0.650 | -0.169 | 0.440
RELIGION 2 | RELIGION 4 | -0.184 | 0.122 | 0.434 | -0.503 | 0.134
RELIGION 3 | RELIGION 1 | -0.448 | 0.127 | 0.004 | -0.780 | -0.116
RELIGION 3 | RELIGION 2 | -0.136 | 0.116 | 0.650 | -0.440 | 0.169
RELIGION 3 | RELIGION 4 | -0.320 | 0.100 | 0.010 | -0.582 | -0.058
RELIGION 4 | RELIGION 1 | -0.128 | 0.132 | 0.767 | -0.473 | 0.218
RELIGION 4 | RELIGION 2 | 0.184 | 0.122 | 0.434 | -0.134 | 0.503
RELIGION 4 | RELIGION 3 | 0.320* | 0.100 | 0.010 | 0.058 | 0.582
*. The mean difference is significant at the 0.05 level.

Correlation Analysis
❖Correlation is a measure of the direction
and strength of linear relationship between
two variables.
➢Direction means positive or negative.
➢Strength can be perfect, strong or high,
moderate, low or zero or no correlation.
❖Correlation between two variables does not
prove X causes Y or Y causes X.

What is correlation?
– Degree/Strength and Direction of Relationship
❖ How well do the data fit a specific form?
❖ Typically, we look for how well the data fit a straight line.
❖ A scatter diagram is an illustrative way to determine the strength and direction of a relationship.
❖ The Pearson correlation coefficient is a numerical measure that can also be used to determine the strength and direction of a relationship.

Pearson correlation coefficient r
Pearson Correlation coefficient is a numerical
value that measures strength and direction of
linear relationship
Symbol: r
✓ r can range from -1.0 to +1.0
✓ Sign (+/-) indicates “direction”
✓ Value indicates “strength”
✓ Measures a “linear” relationship only
✓ Significance of the Pearson r can be tested using a t-test
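Here is a minimal Python sketch of the Pearson r and of the t statistic used to test its significance, t = r√((n − 2) / (1 − r²)) with df = n − 2. The data are hypothetical. The t statistic would still have to be compared with a t critical value or converted to a P-value, which this sketch does not do.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either variable is constant

def r_t_statistic(r, n):
    """t statistic (df = n - 2) for H0: population correlation == 0; undefined at |r| = 1."""
    return r * sqrt((n - 2) / (1 - r**2))

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3), round(r_t_statistic(r, len(x)), 3))  # 0.775 2.121
```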

Pearson correlation coefficient r
Illustration: a scale running from -1 (Perfect Negative Correlation) through 0 (No/Zero Correlation) to +1 (Perfect Positive Correlation).
➢Closer to 0 = weaker
➢Closer to ±1.0 = stronger
➢r = ±1.0 = perfect
➢r ≈ 0 could mean many things:
❖No correlation at all between X & Y
❖Non-linear relationship between X & Y
❖Restricted range on X and/or Y
❖Outlier may be causing problems
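The non-linear case above can be demonstrated concretely. In this small Python example, y is completely determined by x through y = x², but the relationship is U-shaped rather than linear, so Pearson r comes out as exactly 0. The data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]  # perfect, but non-linear (U-shaped), relationship
print(pearson_r(x, y))  # 0.0
```

A scatter diagram would reveal the pattern immediately, which is why plotting the data before relying on r is good practice.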

Activity: Interpret the following r coefficient
1) r = 0.85
2) r = -0.69
3) r = -0.37
4) r = -0.11
5) r = 0.09
6) r = 0.32
7) r = -0.92
8) r = 0.75

Activity: Interpret the following r coefficient
1) r = 0.85 Ans.: Very Strong Positive
2) r = -0.69 Ans.: Strong Negative
3) r = -0.37 Ans.: Weak Negative
4) r = -0.11 Ans.: Very Weak Negative
5) r = 0.09 Ans.: Very Weak Positive
6) r = 0.32 Ans.: Weak Positive
7) r = -0.92 Ans.: Very Strong Negative
8) r = 0.75 Ans.: Strong Positive

Interpreting r
r Verbal Interpretation
-1 Perfect Negative Correlation
-0.8 to -0.99 Very Strong Negative Correlation
-0.6 to -0.79 Strong Negative Correlation
-0.4 to -0.59 Moderate Negative Correlation
-0.2 to -0.39 Weak Negative Correlation
-0.01 to -0.19 Very Weak Negative Correlation
0 No Correlation
0.01 to 0.19 Very Weak Positive Correlation
0.2 to 0.39 Weak Positive Correlation
0.4 to 0.59 Moderate Positive Correlation
0.6 to 0.79 Strong Positive Correlation
0.8 to 0.99 Very Strong Positive Correlation
1 Perfect Positive Correlation
Interpreting Correlation (Evans, 1996)
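The Evans (1996) bands in the table above can be encoded as a simple lookup, as this minimal Python sketch shows. The table leaves small gaps between bands (e.g. between 0.19 and 0.2). This sketch resolves them by assigning a value to the lower band, so 0.195 counts as "Very Weak". That boundary handling is an assumption, not part of the table.

```python
def interpret_r(r):
    """Verbal interpretation of r following the Evans (1996) bands."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "No Correlation"
    direction = "Positive" if r > 0 else "Negative"
    size = abs(r)
    if size == 1:
        strength = "Perfect"
    elif size >= 0.8:
        strength = "Very Strong"
    elif size >= 0.6:
        strength = "Strong"
    elif size >= 0.4:
        strength = "Moderate"
    elif size >= 0.2:
        strength = "Weak"
    else:  # 0 < |r| < 0.2 (gap values such as 0.195 fall here by assumption)
        strength = "Very Weak"
    return f"{strength} {direction} Correlation"

# The activity values from the previous slide
for r in [0.85, -0.69, -0.37, -0.11, 0.09, 0.32, -0.92, 0.75]:
    print(r, interpret_r(r))
```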

Sample SPSS Output
Correlations (N = 34 for every pair)
Variables: (1) RELATIONSHIP TOWARDS ADMINISTRATORS; (2) RELATIONSHIP TOWARDS FELLOW EMPLOYEES; (3) ATTITUDE TOWARDS WORK; (4) PROFESSIONALISM; (5) PUBLIC RELATIONS
Variable | Statistic | (1) | (2) | (3) | (4) | (5)
(1) | Pearson Correlation | 1 | -0.093 | 0.191 | 0.222 | .574**
(1) | Sig. (2-tailed) | | 0.610 | 0.278 | 0.207 | 0.005
(2) | Pearson Correlation | -0.093 | 1 | .518* | 0.327 | .429*
(2) | Sig. (2-tailed) | 0.610 | | 0.004 | 0.059 | 0.011
(3) | Pearson Correlation | 0.191 | .518* | 1 | .665** | .794**
(3) | Sig. (2-tailed) | 0.278 | 0.004 | | 0.000 | 0.000
(4) | Pearson Correlation | 0.222 | 0.327 | .665** | 1 | .687**
(4) | Sig. (2-tailed) | 0.207 | 0.059 | 0.000 | | 0.000
(5) | Pearson Correlation | .574** | .429* | .794** | .687** | 1
(5) | Sig. (2-tailed) | 0.005 | 0.011 | 0.000 | 0.000 |
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Common Nonparametric Tests
Chi-square Test
Wilcoxon Signed rank Test
Wilcoxon Rank-Sum Test
Kruskal-Wallis Test
Wilcoxon-Mann-Whitney Test
Spearman Rank-order Correlation

Chi-Square Test
The chi-square test has two common forms: the chi-square goodness-of-fit test and the chi-square test of independence. In the chi-square test of independence, the frequencies of one nominal variable are compared across the different values of a second nominal variable.
The chi-square test of independence is used when we want to test associations between two categorical variables.
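For the test of independence, each expected count is (row total × column total) / grand total, and the statistic is χ² = Σ (observed − expected)² / expected with df = (rows − 1)(columns − 1). Here is a minimal Python sketch. The 2 × 2 counts are hypothetical, and the sketch does not compute a P-value.

```python
def chi_square_independence(table):
    """Chi-square statistic, df and expected counts for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

observed = [[10, 20],  # hypothetical counts: rows = gender, columns = yes/no answer
            [30, 40]]
chi2, df, expected = chi_square_independence(observed)
print(round(chi2, 4), df)  # 0.7937 1
```

With df = 1 and α = 0.05, the tabled chi-square critical value is 3.841, so these hypothetical counts would not lead to rejecting H0.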

Chi-Square Test
Assumptions
Independent random sampling
Nominal/Ordinal level data
No more than 20% of the cells have an
expected frequency less than 5
No empty cells
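The expected-frequency rules of thumb above can be checked before running the test. In this minimal Python sketch, the function name is hypothetical, and reading "no empty cells" as "no observed count of 0" is an assumption.

```python
def chi_square_assumptions_ok(observed, min_expected=5, max_share=0.20):
    """Rule-of-thumb checks before a chi-square test of independence.

    Assumption of this sketch: 'no empty cells' is read as 'no observed count of 0'.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [r * c / n for r in row_totals for c in col_totals]
    # Share of cells whose expected frequency is below min_expected
    share_small = sum(e < min_expected for e in expected) / len(expected)
    no_empty = all(obs > 0 for row in observed for obs in row)
    return share_small <= max_share and no_empty

print(chi_square_assumptions_ok([[10, 20], [30, 40]]))  # True
print(chi_square_assumptions_ok([[1, 9], [2, 38]]))    # False (2 of 4 expected counts < 5)
```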

Wilcoxon Signed Rank Test
The Wilcoxon signed rank test is a frequently used nonparametric test for paired data (e.g., consisting of pre- and post-treatment measurements) based on independent units of analysis.
It is a nonparametric alternative to the paired t-test.
It tests a hypothesis about the median of the paired differences.
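The signed rank statistics can be computed in a few steps. First take the differences, drop the zeros, and rank the absolute differences, with tied values sharing the average of their ranks. Then sum the ranks separately for positive and negative differences. Here is a minimal Python sketch with hypothetical scores. It stops at W+ and W−; looking up the critical value or P-value is left to a table or a statistics package.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    return [sum(w < v for w in values) + (values.count(v) + 1) / 2 for v in values]

def wilcoxon_signed_rank(before, after):
    """Return (W+, W-): rank sums of |d| for positive and negative d = before - after.

    Zero differences are dropped before ranking (Wilcoxon's original convention).
    """
    diffs = [b - a for b, a in zip(before, after) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

before = [12, 9, 15, 10, 14, 8]   # hypothetical pre-treatment scores
after = [10, 10, 12, 10, 10, 10]  # hypothetical post-treatment scores
print(wilcoxon_signed_rank(before, after))  # (11.5, 3.5)
```

The smaller of W+ and W− is the value usually compared with a tabled critical value.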

Wilcoxon Rank-Sum Test
The Wilcoxon rank-sum test is a nonparametric alternative to the two-sample t-test, based solely on the order in which the observations from the two samples fall.
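"Based solely on the order" means the two samples are pooled and ranked, and the statistic W is the sum of the ranks belonging to one sample. Here is a minimal Python sketch with hypothetical data. Tied values receive average ranks.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    return [sum(w < v for w in values) + (values.count(v) + 1) / 2 for v in values]

def wilcoxon_rank_sum(x, y):
    """Wilcoxon rank-sum statistic W: the sum of the ranks of sample x in the pooled data."""
    ranks = average_ranks(list(x) + list(y))
    return sum(ranks[:len(x)])

print(wilcoxon_rank_sum([1, 3, 5], [2, 4, 6, 8]))  # 9.0
```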

Kruskal –Wallis Test
The Kruskal–Wallis one-way analysis of variance by ranks is a non-parametric method for testing the equality of population medians among groups.
It is essentially a one-way analysis of variance performed on the ranks of the data instead of the original values.
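The Kruskal–Wallis statistic uses the rank sum Ri of each group in the pooled ranking: H = 12 / (N(N + 1)) · Σ Ri² / ni − 3(N + 1), with df = k − 1. When there are ties, H is divided by the usual correction factor. Here is a minimal Python sketch with hypothetical scores. It does not compute the P-value (for moderate group sizes, H is usually compared with a chi-square critical value at k − 1 df).

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    return [sum(w < v for w in values) + (values.count(v) + 1) / 2 for v in values]

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H (tie-corrected) and its df = k - 1 for k independent groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    sum_r2_over_n, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_i, the rank sum of group i
        sum_r2_over_n += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * sum_r2_over_n - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n), t = size of each tie group
    tie_term = sum(pooled.count(v) ** 3 - pooled.count(v) for v in set(pooled))
    h /= 1 - tie_term / (n ** 3 - n)
    return h, len(groups) - 1

h, df = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])  # hypothetical scores
print(round(h, 3), df)  # 7.2 2
```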

Wilcoxon-Mann-Whitney Test
The Wilcoxon-Mann-Whitney test uses the ranks of the data to test the hypothesis that two samples of sizes m and n might come from the same population.
The Mann-Whitney test is nonparametric: it does not assume a particular form (such as normality) for the underlying distributions. It is therefore more widely applicable than the t-test.
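The Mann-Whitney U statistic can be obtained from the Wilcoxon rank sum of the first sample: U1 = R1 − m(m + 1) / 2 and U2 = mn − U1. U1 also equals the number of pairs (x, y) with x > y, with ties counting one half. Here is a minimal Python sketch with hypothetical data. Critical values or P-values are not computed here.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    return [sum(w < v for w in values) + (values.count(v) + 1) / 2 for v in values]

def mann_whitney_u(x, y):
    """Return (U_x, U_y); U_x counts pairs with x > y (ties count 1/2)."""
    m, n = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:m])            # Wilcoxon rank sum of sample x
    u1 = r1 - m * (m + 1) / 2
    return u1, m * n - u1

print(mann_whitney_u([1, 3, 5], [2, 4, 6, 8]))  # (3.0, 9.0)
```

The conversion shows why the Wilcoxon rank-sum test and the Mann-Whitney test lead to the same conclusions.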

Spearman Rank-Order Correlation
➢ Spearman's rank correlation is a technique used to test the direction and strength of the relationship between two variables. In other words, it is a way to show whether one set of numbers is correlated with another set of numbers.
➢ It uses the statistic rs, which falls between -1 and +1.
➢ It is equivalent to the Pearson correlation r computed on the ranks of the two variables.
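Spearman's rs can be computed as the Pearson r of the ranks, as in this minimal Python sketch with hypothetical data. Tied values receive average ranks. The familiar shortcut 1 − 6Σd² / (n(n² − 1)) gives the same value only when there are no ties.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    return [sum(w < v for w in values) + (values.count(v) + 1) / 2 for v in values]

def spearman_rs(x, y):
    """Spearman's rs: the Pearson correlation of the (average) ranks of x and y."""
    rx, ry = average_ranks(list(x)), average_ranks(list(y))
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)

print(spearman_rs([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8
print(spearman_rs([1, 2, 3, 4], [1, 4, 9, 16]))       # 1.0 (monotonic, non-linear)
```

The second call shows that rs reaches 1 for any perfectly increasing relationship, even a non-linear one.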

Summary of Parametric and Nonparametric Tests
Design | Nonparametric: Nominal data | Nonparametric: Ordinal data | Parametric: Interval, ratio data
One group | Chi-square goodness of fit | Wilcoxon signed rank test | One group t-test
Two unrelated groups | Chi-square | Wilcoxon rank sum test, Mann-Whitney test | Student's t-test
Two related groups | McNemar's test | Wilcoxon signed rank test | Paired Student's t-test
K-unrelated groups | Chi-square test | Kruskal-Wallis one-way analysis of variance | ANOVA
K-related groups | (none listed) | Friedman matched samples | ANOVA with repeated measurements

