What is SPSS and How to Use it for Data Analysis

What is SPSS
(originally "Statistical Package for the Social Sciences", later also expanded as "Statistical Product and Service Solutions")
+ A software program used for statistical analysis. It's a tool that helps people understand and work with data.
+ You can import data from many kinds of files (such as Excel spreadsheets or data exported from Google Sheets) and then perform statistical analysis on it.
+ The analysis can range from simple descriptive statistics (like averages, medians, or modes) to more complex inferential statistics (like t-tests, chi-square tests, or regression analysis).

SPSS Screen
Data View
Variable View

Variable View
Suppose your survey asks questions like: "What's your age?", "Are you male or female?", "How much do you like ice cream on a scale of 1 to 10?"
You need to enter all these responses into SPSS to analyze them. Each question from your survey will be a "variable" in SPSS.
❑ Name: A short, no-space name for each question. Like "age", "gender", "icecream".
❖ Uniqueness: Each variable in your dataset must have a unique name. You cannot have two variables with the exact same name.
❖ No Spaces Allowed: Use an underscore instead of a space, for example "ice_cream".
❖ No Special Characters: Avoid characters like %, &, *, etc. (SPSS does accept @, #, and $, but names beginning with # or $ have special meanings, so they are best avoided.)
❖ Starting Character: Variable names must start with a letter, not a number. An underscore may appear later in the name but not as the first character.
❖ Case Sensitivity: SPSS doesn't distinguish between uppercase and lowercase letters. So "TestScore", "TESTSCORE", and "testscore" would all be considered the same variable name.
❑ Type: What kind of answer are you expecting? Numbers? Words?
❖ Numeric: This type is used for variables that contain numbers. These numbers can be either integers (like 1, 2, 3) or decimals (like 1.23,
4.56). You would use this type for a variable like "Age" or "Number of Children".
❖ String: This type is for variables that contain text, or "strings" of characters. For example, a variable like "Name" or "City of Residence"
would be a string.
❖ Date: The Date type is for variables that contain dates. SPSS has many different date formats you can choose from, so you can handle just
about any kind of date data.
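
The naming rules for the "Name" column can be sketched as a small check. This is a minimal Python sketch, not SPSS's own validator: the function name `check_names` and the pattern are assumptions made for illustration, and the pattern is deliberately more conservative than SPSS itself (it only accepts letters, digits, and underscores after a leading letter).

```python
import re

# Assumed, simplified rule: a letter first, then letters, digits, or underscores.
# SPSS accepts a few more characters; this sketch is intentionally stricter.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def check_names(names):
    """Return (name, problem) pairs for names that break the simplified rules."""
    problems = []
    seen = set()
    for name in names:
        key = name.lower()  # names are compared without regard to case
        if not _NAME_PATTERN.fullmatch(name):
            problems.append((name, "invalid characters or starting character"))
        elif key in seen:
            problems.append((name, "duplicate name"))
        seen.add(key)
    return problems

candidates = ["age", "gender", "ice cream", "1stScore", "TestScore", "TESTSCORE"]
for name, problem in check_names(candidates):
    print(f"{name}: {problem}")
```

Here "ice cream" fails because of the space, "1stScore" because it starts with a digit, and "TESTSCORE" because it duplicates "TestScore" once case is ignored.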

Variable View
❑ Width: How long can the answers be?
+ For a numeric variable, width is the total number of characters allowed, including the decimal point and any negative sign. For example, with a width of 4, you can show numbers like -99, 100, or 12.3.
+ For a string (text) variable, width is the total number of characters allowed. For example, with a width
of 5, you can store words like "hello" or "world".
❑ Decimals: If you're using numbers, how many decimal places can they have?
• If you set "Decimals" to 2 for a variable, then the values of that variable will be shown with two digits after the decimal point, like 12.34.
• This doesn't change your stored data; it only controls how the data is displayed.
❑ Label: A longer, more descriptive name for each question. Like "Age of the respondent", "Gender of the respondent", "Rating for ice cream".
• While the "Name" of your variable has to be unique, short, and cannot contain spaces, the Label can be longer, more descriptive, and can contain spaces. This allows you to provide additional context or clarity that the variable name might not be able to communicate.
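
To illustrate the Width and Decimals settings, here is a small Python sketch of the idea that a display format changes what you see, not what is stored. The helper `display` is hypothetical and only mimics the concept; it is not how SPSS formats values internally.

```python
stored_value = 12.3456  # the value actually held in the dataset

def display(value, width, decimals):
    """Format a number using a total width and a number of decimal places."""
    return f"{value:{width}.{decimals}f}"

shown = display(stored_value, width=8, decimals=2)
print(repr(shown))    # the displayed text, padded to 8 characters
print(stored_value)   # the stored value is untouched by the display format
```

The displayed text has two decimal places, while `stored_value` still holds all four.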

❑ Missing: How should SPSS handle it if there's no answer for a question?
1. Here you define how the program should handle missing data for each variable.
2. It's common to have missing data. This could happen for a variety of reasons. For example, a survey respondent might skip a question, or a particular measurement might not be taken.
3. If you've used a specific number or code to represent missing data (like "999" or "-1"), you can tell SPSS that this number means the data is missing, and it will ignore these values when doing analyses.
❑ Columns: How wide should the column be when you're looking at the
data?
• This is only about how the data is displayed when you're looking at it in SPSS's Data View.
❑ Align: Should the data be left-aligned, right-aligned, or centered in the
column?
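
The effect of declaring missing-value codes (the "Missing" property) can be sketched in Python. The codes 999 and -1 come from the example above; the list of ages and the helper `valid_values` are made up for illustration.

```python
from statistics import mean

MISSING_CODES = {999, -1}  # codes chosen to mean "no answer"

ages = [22, 25, 999, 31, -1, 40]  # hypothetical responses

def valid_values(values, missing_codes=MISSING_CODES):
    """Drop any value that has been declared as a missing-data code."""
    return [v for v in values if v not in missing_codes]

valid = valid_values(ages)
print("Valid:", len(valid), "Missing:", len(ages) - len(valid))
print("Mean of valid ages:", mean(valid))
print("Mean if codes are treated as real ages:", mean(ages))
```

Without the declaration, the codes are averaged in as if they were real ages, which distorts the mean badly.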

❑ Measure: What kind of data is it? Categories (like male/female), order (like rankings), or scale (like age)?
Nominal: This is used for variables that represent categories with no inherent order or ranking. For example, "Gender" (male, female) or "Eye Color" (blue, green, brown).
Ordinal: This is used for variables that represent categories with an inherent order. For example, "Education Level" (high school, bachelor's, master's, PhD).
Scale: This covers both interval and ratio data. It is used for numeric variables with a constant interval of measurement. "Age" or "Height" would be scale variables, as they have a constant interval of measurement (one year, one inch, etc.). They also have a true zero point, which makes them ratio data.

Scale
+ Interval Data
1. Time of Day: When measuring time in hours throughout the day, we're dealing
with interval data. The difference between 1 PM and 2 PM is the same as
between 2 PM and 3 PM, which is an hour. However, there is no true zero
point - 0 o'clock doesn't mean the absence of time, it's simply midnight.
2. Elevation Above Sea Level: If we consider places above sea level, the
difference between an elevation of 100 meters and 200 meters is the same as
between 500 meters and 600 meters. Both differences are 100 meters. But
here, 0 doesn't mean the absence of elevation; instead, it represents sea level.
3. Temperature in Degrees Celsius or Fahrenheit: The difference between 15°C
and 25°C is the same as the difference between 25°C and 35°C. Both are
differences of 10°C. However, 0°C or 0°F does not mean the absence of
temperature.
+ In all of these examples, the intervals between the values are equal, but there's no true zero point that indicates an absolute lack of the quality being measured.
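
A short Python sketch shows why ratios are not meaningful for interval data such as temperature in Celsius. The conversion to Kelvin is not part of the slides; it is added here only because Kelvin has a true zero, which makes the comparison concrete.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius to Kelvin, a temperature scale with a true zero."""
    return celsius + 273.15

low_c, high_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference = high_c - low_c

# The ratio of Celsius readings looks like "twice as hot" but is not meaningful.
naive_ratio = high_c / low_c

# The same two temperatures on a scale with a true zero give a very different ratio.
true_ratio = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)

print(difference, naive_ratio, round(true_ratio, 3))
```

The difference is the same on both scales, but the ratio changes with the choice of zero point, so 20°C is not "twice as hot" as 10°C.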

Scale
+ Ratio Data
• Money: If you have $10, $20, or $30, the differences between
these amounts are all the same ($10). But if you have $0, that
means you literally have no money.
• Age: The difference between 10 years old and 20 years old is the same as between 30 years old and 40 years old (a difference of 10 years). An age of 0 means no time has passed since birth.
• Distance: Imagine measuring distance in kilometers. The difference between 10 km and 20 km is the same as between 30 km and 40 km. It's a 10 km difference in both cases. But on this scale, 0 km really means no distance at all; it's a true zero.
+ In ratio data, 0 really means the absence of the thing you're measuring.

❑ Values: If an answer option corresponds to a number (like "1 for male, 2 for female"), you can specify that here.
+ Values are used to assign meaningful labels to the numerical codes of a categorical variable. This is particularly useful when your data contains coded responses.
+ Example: Imagine you have conducted a survey and collected data about people's favorite fruit. You asked the question "What is your favorite fruit?" and gave them three options:
1. Apple
2. Banana
3. Orange
+ Instead of entering these fruit names into your SPSS dataset, you decide to code them numerically:
+ 1 = Apple, 2 = Banana, 3 = Orange
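
The idea behind value labels can be sketched in Python as a lookup from codes to labels. The fruit codes are the ones from the example; the list of responses is made up for illustration.

```python
from collections import Counter

# Value labels: numeric code -> meaningful label
VALUE_LABELS = {1: "Apple", 2: "Banana", 3: "Orange"}

responses = [1, 3, 2, 1, 1, 3]  # hypothetical coded survey answers

# The data stays numeric; the labels are applied only when reporting.
counts = Counter(VALUE_LABELS[code] for code in responses)

for label, count in counts.most_common():
    print(label, count)
```

The dataset stores only the codes, and the labels make the frequency table readable.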

"Permissible Statistical Operations by
Level of Measurement"
Level of Measurement    Mode   Median   Mean   Order (Greater/Less)   Differences   Ratios
Nominal                 Yes    No       No     No                     No            No
Ordinal                 Yes    Yes      No     Yes                    No            No
Interval                Yes    Yes      Yes    Yes                    Yes           No
Ratio (Scale in SPSS)   Yes    Yes      Yes    Yes                    Yes           Yes
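
The table above can also be written as a simple lookup. This Python sketch just encodes the table; the names `PERMISSIBLE` and `is_permissible` are hypothetical and chosen for illustration.

```python
# Each level of measurement maps to the operations the table marks "Yes".
PERMISSIBLE = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median", "order"},
    "interval": {"mode", "median", "mean", "order", "differences"},
    "ratio": {"mode", "median", "mean", "order", "differences", "ratios"},
}

def is_permissible(level, operation):
    """Return True if the operation is allowed for the given level of measurement."""
    return operation.lower() in PERMISSIBLE[level.lower()]

print(is_permissible("Nominal", "mean"))     # mean of a nominal variable
print(is_permissible("Ordinal", "median"))   # median of an ordinal variable
print(is_permissible("Interval", "ratios"))  # ratios on an interval variable
```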

Variable Property - Role
Input: The variable will be used as an input (independent variable). This is the default assignment for variables in SPSS. This is a variable that you think might affect something.
Target: The variable will be used as an outcome or dependent variable.
Both: The variable will be used as both an input (independent variable) and an outcome (dependent variable).
None: The variable has no role assignment. This might be used for variables that are included in your dataset but not used in the analyses.
Partition: The variable will partition the data into separate samples, typically for training, testing, and validation in more advanced analyses.
Split: Included for compatibility with SPSS Modeler, a separate product for predictive modeling and data mining. Variables with this role are not used as split-file variables in SPSS itself.

Variable Property - Role
+ Imagine you are conducting a study to understand the effect of exercise and diet on a person's weight.
+ You collect the following data for each participant:
1. ID number
2. Age
3. Gender
4. Hours spent exercising per week
5. Type of diet (Vegan, Vegetarian, Non-Vegetarian)
6. Weight
+ In this study, each variable might have the following roles:
1. ID number: This would be set to None because it's just used to identify participants, not for analysis.
2. Age: This could be an Input. You think age might have an effect on a person's weight.
3. Gender: This could also be an Input. Gender might influence weight.
4. Hours spent exercising per week: This is another Input. You suspect that more exercise could lead to lower weight.
5. Type of diet: This is also an Input. The type of diet might affect a person's weight.
6. Weight: This is your Target variable. You're trying to understand and predict weight based on the other variables (age,
gender, hours of exercise, type of diet).
+ Note: the "Split" role is more typical when using SPSS Modeler, so it is not assigned to any variable here.

EXAMPLE
+ Input the following
data into SPSS

Briefly explain the concept of “Central Tendency”.
+ The concept of "Central Tendency" refers to ways of quantifying the "centre" or the "average" of a dataset. For our dataset, the mean and median can be applied to the 'Score', 'Age', 'Weight', and 'Height' variables because they are continuous. For the 'Gender' and 'Collection Place' variables, which are categorical (nominal), the mode is the most relevant measure of central tendency. The three main measures of central tendency are:
+ 1. Mean: This is the arithmetic average of a set of values. It's calculated by
adding up all the values and then dividing by the number of values.
+ 2. Median: This is the middle value in a sorted list of values. If the number of
values is odd, the median is the exact middle number. If the number of values
is even, the median is the average of the two middle numbers.
+ 3. Mode: This is the most frequently occurring value in a set of data. For
continuous data, the mode might not be as meaningful, but for categorical data
like 'Gender' and 'Collection Place', the mode tells us the most common
category.
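
The three measures can be tried out with Python's standard `statistics` module. The values below are made up for illustration and are not taken from the survey dataset.

```python
from statistics import mean, median, mode

ages = [22, 22, 25, 30, 41]  # hypothetical values, odd count

print("Mean:", mean(ages))      # sum of values divided by the number of values
print("Median:", median(ages))  # middle value of the sorted list
print("Mode:", mode(ages))      # most frequently occurring value

# With an even number of values, the median is the average of the two middle ones.
even_ages = [22, 25, 30, 41]
print("Median (even count):", median(even_ages))

# The mode also works for categorical data, where mean and median do not apply.
places = ["Mall", "Campus", "Mall"]  # hypothetical categories
print("Mode (categorical):", mode(places))
```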

Statistics
                MUET Score   Age     Weight   Height
N    Valid      770          769     769      753
     Missing    0            1       1        17
Mean            3.4270       30.69   61.29    162.04
Median          3.5000       25.00   60.00    161.00
Mode            3.67         22      60       160

Statistics
                CollectionPlace   Gender
N    Valid      770               770
     Missing    0                 0
Mode            3                 2
Sum             1827              1232

+ Age: The mean age is 30.69 years, indicating that if you averaged all the ages, you would get approximately 30.69 years. The median
age is 25 years, indicating that the middle person's age, when all ages are arranged in order, is 25 years. The mode is 22 years, which
means that the most common age among the people is 22 years.
+ Weight: The mean weight is 61.29 kg, meaning that on average, a person in this group weighs around 61.29 kg. The median weight
is 60 kg, indicating that the middle person's weight, when all weights are arranged in order, is 60 kg. The mode of weight is 60 kg,
meaning the most common weight among the people is 60 kg.
+ Height: The mean height is 162.04 cm, suggesting that the average height of individuals in the group is approximately 162.04 cm.
The median height is 161 cm, which means the height that falls in the middle when all heights are arranged in order is 161 cm. The
mode of the height is 160 cm, suggesting the most common height among the individuals is 160 cm.
+ MUET Score: The mean MUET Score is 3.4270, suggesting that when all MUET scores are averaged, the result is roughly 3.4270. The
median MUET Score is 3.50, which means that when all MUET scores are arranged in ascending or descending order, the score in
the middle is 3.50. The mode for MUET Score is 3.67, indicating that the most frequently occurring MUET score among the
participants is 3.67.
+ Gender: The mode is 2, which is the code for 'Female'. 'Female' is therefore the most common gender in this dataset, meaning that there are more females than males.
+ Collection Place: The 'Mode' is the most frequently occurring category, which in this case is 3, corresponding to "Sunway Pyramid".
This indicates that Sunway Pyramid is the most common collection place in the data.
+ Note that in this case, we don't calculate mean and median for these categorical variables because they're nominal, meaning they
represent categories that don't have an inherent order or ranking. For such variables, it wouldn't make sense to calculate an average
or median.

2-Identify the average MUET score, height, weight and age
Descriptive Statistics
                     N     Mean
MUET Score           770   3.4270
Age                  769   30.69
Weight               769   61.29
Height               753   162.04
Valid N (listwise)   752
"MUET Score": There are 770 valid responses, and the average score is 3.4270.
"Age": There are 769 valid responses, and the average age is 30.69 years.
"Weight": There are 769 valid responses, and the average weight is 61.29 kg.
"Height": There are 753 valid responses, and the average height is 162.04 cm.
The "Valid N (listwise)" refers to the number of cases (individuals, participants, etc.)
that have valid, non-missing values for all of the variables listed. In this case, there
are 752 cases with complete data for MUET Score, Age, Weight, and Height.
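
The difference between the per-variable N and the "Valid N (listwise)" can be sketched in Python. The four rows below are made up for illustration, with `None` standing in for a missing value.

```python
# Hypothetical cases; None marks a missing value.
rows = [
    {"score": 3.5, "age": 22, "weight": 60, "height": 160},
    {"score": 3.0, "age": None, "weight": 55, "height": 158},
    {"score": 4.0, "age": 30, "weight": 70, "height": None},
    {"score": 3.67, "age": 25, "weight": None, "height": None},
]
variables = ["score", "age", "weight", "height"]

# Valid N per variable: cases with a non-missing value for that variable.
valid_n = {v: sum(r[v] is not None for r in rows) for v in variables}

# Listwise valid N: cases with non-missing values for ALL listed variables.
listwise_n = sum(all(r[v] is not None for v in variables) for r in rows)

print(valid_n)
print("Valid N (listwise):", listwise_n)
```

The listwise count can never exceed the smallest per-variable N, which is why 752 is slightly below the 753 valid heights in the output above.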

3-Identify the most frequent MUET score, height, weight and age.
Mode: This is the most frequently occurring value in a dataset.
For the MUET Score, the mode is 3.67, meaning this score
appears most often. For Age, the most frequently occurring age
is 22 years. The most common weight is 60 kg, and the most
common height is 160 cm.
The "Missing" row indicates how many missing values are in
each column. For instance, "Age" and "Weight" each have one
missing value, while "Height" has 17 missing values.
