This document provides guidelines for exploring data and testing assumptions in applied statistics. It covers descriptive statistics and normal distribution tests, and assigns practice questions to student groups. Specifically, it explains how to generate descriptive statistics and histograms in SPSS, introduces the Kolmogorov-Smirnov and Shapiro-Wilk normality tests, and provides examples analysing normality for intrinsic motivation scores and for exam scores from two universities. Students are then assigned to groups and asked questions related to outliers, measures of central tendency, variables, distribution characteristics, within-subjects designs, statistical errors, effect sizes, standard error, p-values, z-scores, and degrees of freedom.
1. Introduction to Applied Statistics and Applied Statistical Methods Practical guidelines
Prof. Dr. Chang Zhu page 1
Table of Contents
LECTURE 2
EXPLORING DATA/ASSUMPTION TESTING
   DESCRIPTIVE STATISTICS
   NORMAL DISTRIBUTION TEST (the Kolmogorov-Smirnov (K-S) test)
ASSIGNMENT
LECTURE 2
EXPLORING DATA/ASSUMPTION TESTING
Before proceeding to certain kinds of analysis, it is important to explore:
• the characteristics of our data (mean, mode, median, variance, standard deviation, and range),
• the distribution of our data, i.e. whether or not they are normally distributed, using (1) the values of skewness and kurtosis (SPSS also provides histograms to visualize the distribution in the Descriptive Statistics menu) and (2) tests of normality,
• the homogeneity of variance between groups (if we intend to compare groups).
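The characteristics listed in the first bullet can also be checked outside SPSS. The following is a minimal sketch using Python's standard statistics module; the function name describe and the scores are hypothetical (they are not taken from sample data 1.sav). Like SPSS, variance() and stdev() use the sample formulas, dividing by N - 1.

```python
import statistics


def describe(scores):
    """Return basic characteristics of a list of scores (hypothetical helper)."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        # With several equally frequent values, Python 3.8+ returns the first one
        "mode": statistics.mode(scores),
        "variance": statistics.variance(scores),  # sample variance (N - 1)
        "sd": statistics.stdev(scores),           # sample standard deviation
        "range": max(scores) - min(scores),
    }


# Hypothetical scores on a 1-5 scale
print(describe([2, 3, 3, 4, 5]))
```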
DESCRIPTIVE STATISTICS
E.g., we want to know the characteristics and distribution of the variable Intrinsic_Motivation_learn (in the file sample data 1.sav).
In SPSS, choose Analyse > Descriptive Statistics > Frequencies
Select the variable Intrinsic_Motivation_learn and move it to the Variable(s) box by clicking the arrow button.
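Click on the Statistics button to open the Frequencies: Statistics dialog box and select the statistics you need (e.g., mean, median, mode, standard deviation, variance, range, skewness and kurtosis). When finished, click Continue.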
Click on the Charts button to open the Frequencies: Charts dialog box. Choose Histograms, tick Show normal curve on histogram, and click Continue.
On the main dialog box, click OK to run the analysis.
In the Output document, you will see a table of descriptive statistics and a histogram with a normal curve, which together give you an idea of the characteristics and the (visual) distribution of your data.
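When skewness and kurtosis are ticked, the Statistics table also reports their standard errors. The sketch below shows how these four values can be computed, assuming SPSS uses the usual bias-adjusted formulas (the same ones spreadsheet SKEW and KURT functions use); please compare the results with your own SPSS output before relying on them. The function name skew_kurtosis is hypothetical.

```python
import math
import statistics


def skew_kurtosis(scores):
    """Return (skewness, SE skewness, kurtosis, SE kurtosis).

    Assumes the bias-adjusted formulas; kurtosis is excess kurtosis,
    so a normal distribution gives values near 0 for both statistics.
    """
    n = len(scores)
    if n < 4:
        raise ValueError("need at least 4 scores")
    m = statistics.mean(scores)
    s = statistics.stdev(scores)  # sample SD (N - 1)
    if s == 0:
        raise ValueError("all scores are identical")
    z = [(x - m) / s for x in scores]
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))
    return skew, se_skew, kurt, se_kurt


# Hypothetical, perfectly symmetric scores: skewness is 0, kurtosis is negative
print(skew_kurtosis([1, 2, 3, 4, 5]))
```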
NORMAL DISTRIBUTION TEST (the Kolmogorov-Smirnov (K-S) test)
To compare our data with a normal distribution that has the same mean and standard deviation, we can use the Kolmogorov-Smirnov (K-S) test and the Shapiro-Wilk test.
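The idea behind the K-S statistic D can be sketched in a few lines: sort the scores, build their cumulative proportions, and find the largest gap between them and the cumulative normal curve with the sample's mean and SD. This sketch (hypothetical function name ks_d, Python 3.8+ for statistics.NormalDist) computes D only. Because the mean and SD are estimated from the data, SPSS reports the Sig. value with the Lilliefors correction; the statsmodels package offers a Lilliefors test (statsmodels.stats.diagnostic.lilliefors) if a p-value is needed outside SPSS, although we have not checked that its p-values match SPSS exactly.

```python
import statistics


def ks_d(scores):
    """Kolmogorov-Smirnov D against a normal curve with the sample mean and SD."""
    x = sorted(scores)
    n = len(x)
    normal = statistics.NormalDist(statistics.mean(x), statistics.stdev(x))
    d = 0.0
    for i, xi in enumerate(x):
        f = normal.cdf(xi)
        # The cumulative proportion of scores jumps from i/n to (i + 1)/n at xi,
        # so check the gap on both sides of the jump (this also handles ties).
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Hypothetical scores
print(round(ks_d([1, 2, 3, 4, 5]), 3))
```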
Ex1. We want to know whether the distribution of scores for the variable Intrinsic_Motivation_learn differs from a normal distribution.
In SPSS, choose Analyse > Descriptive Statistics > Explore
Select the variable Intrinsic_Motivation_learn and move it to the Dependent List box by clicking the arrow button.
Click on the Statistics button to open the dialog box and select Descriptives. When finished, click Continue.
Click on Plots and select the option Normality plots with tests, which provides the Kolmogorov-Smirnov test, the Shapiro-Wilk test, and normal Q-Q plots.
On the main dialog box, click OK to run the analysis.
In the Output document, the most important table we should look at is the table labelled Tests of
Normality.
Tests of Normality
                Kolmogorov-Smirnov(a)        Shapiro-Wilk
                Statistic   df   Sig.        Statistic   df   Sig.
1-5             .189        58   .000        .902        58   .000
a. Lilliefors Significance Correction
According to the results, both the Kolmogorov-Smirnov test and the Shapiro-Wilk test are highly significant (Sig. = .000, i.e. p < .001), indicating that the distribution of scores for the variable Intrinsic_Motivation_learn differs significantly from a normal distribution. In other words, the distribution is not normal.
Ex2. Let us now look at a data set selected from Field (2009), namely SPSSexam.sav, which includes data on student performance on an SPSS exam. The data set contains five variables: exam (scores), computer (a measure of computer literacy, in percent), lecture (percentage of SPSS lectures attended), numeracy (a measure of numerical ability, out of 15), and uni (the university of the participants, either Duncetown or Sussex).
Conduct the Kolmogorov-Smirnov (K-S) test for the variable exam (scores) for each of the two groups (Duncetown/Sussex university) and report the results by filling in the missing information in the following table and statement.
Tip: to conduct the K-S test separately for each group, move the variable uni to the Factor List box in the Explore dialog box; the remaining steps are the same as in Ex1.
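Moving uni to the Factor List simply splits the scores by group and runs the test once per group. A minimal sketch of that splitting, assuming the data are available as (university, score) pairs; the function names and scores are hypothetical, not the values in SPSSexam.sav. As in the SPSS table, the df reported for each group is the number of scores in that group.

```python
import statistics
from collections import defaultdict


def ks_d(scores):
    """Kolmogorov-Smirnov D against a normal curve with the sample mean and SD."""
    x = sorted(scores)
    n = len(x)
    normal = statistics.NormalDist(statistics.mean(x), statistics.stdev(x))
    d = 0.0
    for i, xi in enumerate(x):
        f = normal.cdf(xi)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_by_group(records):
    """records: (group, score) pairs. Returns {group: (D, df)}."""
    groups = defaultdict(list)
    for group, score in records:
        groups[group].append(score)
    return {g: (ks_d(s), len(s)) for g, s in groups.items()}


# Hypothetical exam percentages for the two universities
data = [("Duncetown", 40), ("Duncetown", 55), ("Duncetown", 61), ("Duncetown", 48),
        ("Sussex", 70), ("Sussex", 66), ("Sussex", 81), ("Sussex", 74)]
print(ks_by_group(data))
```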
Tests of Normality
                                                  Kolmogorov-Smirnov(a)      Shapiro-Wilk
                          University              Statistic   df   Sig.      Statistic   df   Sig.
Percentage on SPSS exam   Duncetown University    .106        50   _____     .972        50   _____
                          Sussex University       .073        50   _____     .984        50   _____
a. Lilliefors Significance Correction
*. This is a lower bound of the true significance.
The percentages on the SPSS exam, D(50) = 0.11, p (</>) ___ .05, and D(50) = 0.07, p (</>) ___ .05, for the Duncetown and Sussex groups respectively, are (significantly non-normal / not significantly different from normal) ________________.
(Note: the test statistic for the K-S test is denoted by the letter D in papers).
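When reporting, SPSS's Sig. value of .000 does not mean that p is zero; it is reported as p < .001. The small helper below (hypothetical name report_ks) formats a K-S result in the style of the statement above: D with two decimals and its df in parentheses, and p to three decimals without a leading zero. It is a sketch of one reporting convention, not an official APA tool.

```python
def report_ks(d, df, p):
    """Format a K-S result, e.g. 'D(58) = 0.19, p < .001'."""
    if p < 0.001:
        p_text = "p < .001"  # SPSS prints such values as .000
    else:
        p_text = "p = " + f"{p:.3f}".lstrip("0")
    return f"D({df}) = {d:.2f}, {p_text}"


# Ex1 values from the Tests of Normality table
print(report_ks(0.189, 58, 0.000))  # prints D(58) = 0.19, p < .001
```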
ASSIGNMENT
1) Using your own data set, explore two scale variables with descriptive statistics and normality tests. Report the results in APA format.
2) Subscribe to one of the groups in Pointcarre (based on the group formation in Assignment 1). Students who worked individually can also join one of the groups for assignments that require group discussion.
To do this, open the course Introduction to Applied Statistics and Statistical Methods and click on Course group to see all the groups you can subscribe to (subscribe to one group only).
Post your group’s answer to the question assigned to your group (as indicated below) in the forum labelled Exploring data, under the topic Questions and Answers/basic concepts in statistics and distributions.
When you post the group’s answer to the forum, remember to mention the question, e.g.
Question: What is ..?
Answer: ….
(Briefly state which part of the answer, if any, you still have doubts about and need other groups’ support with.)
Extra credit is given for contributing to other groups’ answers (giving critical feedback or providing more information on the issue for your peers).
Group 1: How can we deal with outliers in case we detect them?
Group 2: Under what circumstances should we be cautious about using the mean as a measure of
central tendency?
Group 3: What are dependent/independent variables? Are there any other ways to refer to
dependent/independent variables in research papers?
Group 4: What are the cut-off (limit) values for skewness and kurtosis?
Group 5: What characteristics should data have to be normally distributed?
Group 6: What is a within-subjects design? What are the possible problems associated with this kind of design?
Group 7: What are Type I and Type II errors in statistics?
Group 8: What is an effect size and how is it measured?
Group 10: What is the standard error of the mean (SE)? How is it calculated?
Group 11: What does a p-value generally tell us? How should p < .05 be interpreted?
Group 12: What is a z-score? How is it calculated?
Group 13: In the equation used to calculate the variance, we divide the sum of squared errors by the number of participants (N) minus 1. In this case, (N – 1) is referred to as the degrees of freedom. Can you explain the concept of “degrees of freedom”?
Group 14: In addition to the K-S test, what are some other ways to examine the distribution of a
data set?
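For reference, the variance equation described in the Group 13 question can be written as follows, with x_i the individual scores, \bar{x} their mean, N the number of participants, and SS the sum of squared errors:

```latex
s^2 = \frac{SS}{N - 1} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}
```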