This document discusses data transformation in SPSS. It describes how to compute new variables using arithmetic, logical, and conditional expressions. It also explains how to recode the values of existing variables into new variables or categories using the recode command. Examples are provided to illustrate computing total scores, averages, increments with conditions, and recoding years of schooling into educational status categories.
Data Transformation
Session Outline
1. Compute Variables
   a. Introduction to Calculator Pad
      Arithmetic Operators
      Relational Operators
      Logical Operators
   b. Functions
      Arithmetic Functions
      Statistical Functions
      Other Functions
   c. Illustrative Examples
2. Compute Variables with Conditions (Conditional Transformation)
   a. Illustrative Examples
3. Recoding Values
   a. Recoding into Same Variable
      Illustrative Examples
   b. Recoding into Different Variables
      Illustrative Examples
Compute Variables
The Compute command computes values for a variable from numeric transformations of other
variables. With this command we can create new variables or replace the values of existing variables (for
new variables we can also specify the variable type and label). Note that we can compute values for numeric
or string (alphanumeric) variables only. We can also compute values selectively for subsets of cases based
on logical conditions. Expressions can use arithmetic and/or logical operators and over 70 built-in
functions, including arithmetic functions, statistical functions and other functions.
The general form of the Compute command is as follows:
compute [new variable] = arithmetic or logical expression
To compute a variable, we follow these steps:
► From the menu choose:
Transform
Compute Variable…
SPSS will then display the Compute Variable dialog box.
► Type the name of a single target variable. It can be an existing variable or a new variable.
► Write an arithmetic or logical expression in the Numeric Expression field.
To build an expression, either paste components from the variable list into the expression field and then
edit them, or type directly in the expression field. A numeric expression can contain existing
variable names, arithmetic operators, constants and functions. To enter these we can use the Calculator
Pad, the Variable List and the Function List.
Calculator Pad
We can use the calculator pad to build an arithmetic or logical expression. To use the calculator pad, click
its buttons with the mouse. Complex expressions can be built this way. The calculator pad offers
three types of operators and one function.
Arithmetic Operators: Arithmetic operators are used to build numeric expressions. The subtraction
operator (-) also serves as the negative sign. The arithmetic operators are:
Operator   Meaning/use
+          Addition
-          Subtraction (or negative sign)
/          Division
*          Multiplication
**         Exponentiation (raising to a power)
Relational Operators: Relational operators compare two elements/variables of the same type.
For instance, a string variable is compared with another string variable, and a numeric variable/value
is compared with another numeric variable/value. The relational operators are:
Operator   Meaning/use
<          Less than
>          Greater than
>=         Greater than or equal
<=         Less than or equal
<> or ~=   Not equal
=          Equal
Logical Operators: Logical operators are used to build more complex expressions. Suppose we
want the people whose age is greater than or equal to 25 and less than or equal to 60; then we can write:
Age >= 25 AND Age <= 60. The AND used in this expression is a logical operator. The logical operators
are as follows:
Operator   Use
AND, &     True when both conditions are true
OR, |      True when at least one of the two conditions is true
NOT        True when the condition is not satisfied
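As a rough illustration of how relational and logical operators combine, the age condition described above can be sketched in Python. This is not SPSS syntax: the function name is_working_age and the sample ages are made up for illustration, and Python's and, or, not stand in for AND, OR, NOT.

```python
def is_working_age(age):
    # Age >= 25 AND Age <= 60: true only when both comparisons hold
    return age >= 25 and age <= 60

ages = [18, 25, 40, 60, 61]          # hypothetical sample values
flags = [is_working_age(a) for a in ages]
print(flags)

# OR is true when at least one comparison holds;
# here it selects exactly the cases that the AND condition rejects.
outside = [(a < 25 or a > 60) for a in ages]
print(outside)
```

The second list is simply the NOT of the first, which is a quick way to check that a compound condition has been written as intended.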
Functions: There are more than 70 built-in functions, including:
Arithmetic Functions
Statistical Functions
Logical Functions
Missing Value Functions etc.
Arithmetic Functions: There are several built-in arithmetic functions in SPSS. Some typical
arithmetic functions are discussed below:
Abs(numexpr): Returns the absolute value of a numeric expression (numexpr stands for
Numeric Expression). For example, if the value of the variable Scale is -4.7, then
Abs(Scale) gives 4.7, and Abs(Scale) + 5 gives 9.7.
Exp(numexpr): Returns e raised to the power numexpr, where e is the base of the
natural logarithms and numexpr is a numeric expression.
Sqrt(numexpr): Returns the positive square root of a numeric expression, which cannot be
negative.
Ln(numexpr): Returns the base-e (natural) logarithm of an expression, which must be numeric and
greater than 0.
Lg10(numexpr): Returns the base-10 logarithm of an expression, which must be numeric and
greater than 0.
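The behaviour of these arithmetic functions can be checked with a small Python sketch. The Python functions used here are only stand-ins for the SPSS functions of similar name; the value -4.7 for Scale comes from the example above, and the other input values are chosen for illustration.

```python
import math

scale = -4.7

abs_scale = abs(scale)           # like Abs(Scale)
abs_plus_5 = abs(scale) + 5      # like Abs(Scale) + 5

e_power = math.exp(1)            # like Exp(1): e raised to the power 1
root = math.sqrt(16)             # like Sqrt(16): argument must not be negative
natural_log = math.log(math.e)   # like Ln(...): argument must be greater than 0
log_base_10 = math.log10(1000)   # like Lg10(1000): argument must be greater than 0

print(abs_scale, round(abs_plus_5, 1), root, natural_log, log_base_10)
```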
Statistical Functions: SPSS also provides a number of statistical functions. Some
of them are discussed below:
Sum(numexpr, numexpr, …): Returns the sum of those arguments that have valid values. The
function requires two or more arguments, which must be numeric.
Mean(numexpr, numexpr, …): Returns the arithmetic mean of those arguments that have valid
values. This function requires two or more arguments, which must be numeric.
Sd(numexpr, numexpr, …): Returns the standard deviation of those arguments that
have valid values. This function requires two or more arguments, which must be numeric.
Variance(numexpr, numexpr, …): Returns the variance of those arguments that have valid
values. This function requires two or more arguments, which must be numeric.
Max(value, value, …): Returns the maximum value of those arguments that have valid values.
This function requires two or more arguments, which must be numeric.
Min(value, value, …): Returns the minimum value of those arguments that have valid values. This
function requires two or more arguments, which must be numeric.
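The phrase "arguments that have valid values" means that missing values are skipped before the statistic is calculated. A minimal Python sketch of that idea follows. It assumes that None represents a missing value and that the standard deviation and variance use the sample (n - 1) definitions; the helper name valid and the sample scores are hypothetical.

```python
import statistics

def valid(values):
    # Keep only the non-missing arguments; None stands in for a missing value
    return [v for v in values if v is not None]

scores = [20, 45, None, 23]          # made-up case with one missing value
v = valid(scores)

total = sum(v)                       # like Sum(...)
mean = statistics.mean(v)            # like Mean(...)
sd = statistics.stdev(v)             # sample standard deviation (assumed n - 1)
variance = statistics.variance(v)    # sample variance (assumed n - 1)
largest, smallest = max(v), min(v)   # like Max(...) and Min(...)

print(total, round(mean, 2), round(sd, 2), round(variance, 2), largest, smallest)
```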
Other Functions: Besides the above, some other functions may be used for transforming data
if necessary, such as:
Normal(stddev): Generates a random number from a normal distribution with mean 0 and
standard deviation stddev.
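To see what such a function produces, here is a hedged Python sketch of a random normal generator. It assumes the distribution has mean 0 and that stddev is the only parameter; the function name normal, the seed 42 and the value 2.0 are chosen purely for illustration.

```python
import random

def normal(stddev, rng=random):
    # Draw one pseudo-random value from a normal distribution
    # with mean 0 (an assumption) and the given standard deviation
    return rng.gauss(0, stddev)

rng = random.Random(42)              # fixed seed so the run is repeatable
draws = [normal(2.0, rng) for _ in range(5)]
print(draws)
```

Fixing the seed makes the same sequence reappear on every run, which is useful when a transformation involving random numbers has to be reproduced.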
Illustrative Example 1
Suppose we want to compute the total marks obtained by the competitors in the written test for a job in
a firm, from the following data:
English   Mathematics   General Knowledge
20        45            23
18        40            22
16        35            19
21        41            21
15        38            20
To get the total marks, we use the following arithmetic formula:
Total Marks = Marks in English + Marks in Mathematics + Marks in General Knowledge.
We will denote the Total Marks as Tmarks. To compute it in SPSS, we follow these
steps:
► In the Compute Variable dialog box, type Tmarks in the Target Variable box at the
upper-left corner of the dialog box.
► Using either the calculator pad or the keyboard, write in the Numeric Expression box
English + Mathematics + General Knowledge
(Enter each subject under its actual variable name in the data file; SPSS variable names cannot
contain spaces.)
► Click OK.
A new variable Tmarks is then created automatically in the right-most column of
the data sheet. The data sheet now looks like this:
English   Mathematics   General Knowledge   Tmarks
20        45            23                  88
18        40            22                  80
16        35            19                  70
21        41            21                  83
15        38            20                  73
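The same row-by-row computation can be sketched in Python to verify the totals. The dictionary keys english, mathematics, gk and tmarks are hypothetical names chosen for this sketch, not the names used in the data file.

```python
rows = [
    {"english": 20, "mathematics": 45, "gk": 23},
    {"english": 18, "mathematics": 40, "gk": 22},
    {"english": 16, "mathematics": 35, "gk": 19},
    {"english": 21, "mathematics": 41, "gk": 21},
    {"english": 15, "mathematics": 38, "gk": 20},
]

# The expression is evaluated once per case (row), as in Compute Variable
for row in rows:
    row["tmarks"] = row["english"] + row["mathematics"] + row["gk"]

tmarks = [row["tmarks"] for row in rows]
print(tmarks)
```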
Suppose we want to compute the average marks obtained in the 3 subjects from the above
data. To get the average marks, we use the following formula:
Average Marks = (Marks in English + Marks in Mathematics + Marks in General
Knowledge)/3.
We will denote the Average Marks as Avmarks. To compute it, we follow these steps:
► In the Compute Variable dialog box, type Avmarks in the Target Variable box at the
upper-left corner of the dialog box.
► Using either the calculator pad or the keyboard, write in the Numeric Expression box
(English + Mathematics + General Knowledge)/3
► Click OK.
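For reference, the averages that this expression should produce can be worked out with a short Python sketch. Rounding to two decimal places is an assumption made here for display only; the variable names are illustrative.

```python
# (English, Mathematics, General Knowledge) for each competitor
marks = [(20, 45, 23), (18, 40, 22), (16, 35, 19), (21, 41, 21), (15, 38, 20)]

# The parentheses matter: the sum is formed first, then divided by 3
avmarks = [round((e + m + g) / 3, 2) for e, m, g in marks]
print(avmarks)
```

Without the parentheses only the last subject would be divided by 3, because division is carried out before addition.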
Illustrative Example 2
Suppose we want to compute the yearly increment of the employees on the basis of their salary, using
the formula:
Increment = (10% of the salary) + 1000
Consider the following salary data, stored in the file Data Set 2.sav.
ID Salary
1 10000
2 15000
3 12000
4 13000
5 14000
6 17000
7 15500
8 16500
9 17500
To do the above computation, we follow these steps:
► In the Compute Variable dialog box, type increm in the Target Variable box at the
upper-left corner of the dialog box.
► Using either the calculator pad or the keyboard, write in the Numeric Expression box
(Salary*0.10) + 1000
► Click OK.
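A quick Python sketch of the same formula shows the increments to expect for the nine employees. The list name salaries is illustrative; the values are those of the table above.

```python
salaries = [10000, 15000, 12000, 13000, 14000, 17000, 15500, 16500, 17500]

# Increment = (10% of the salary) + 1000, evaluated for every case
increm = [salary * 0.10 + 1000 for salary in salaries]

for salary, inc in zip(salaries, increm):
    print(salary, round(inc, 2))
```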
Computing Variables with Conditions (Conditional Transformations)
Conditional transformation, using the If Cases dialog box, allows us to apply data transformations to
selected subsets of cases. A conditional expression returns a value of true, false, or
missing for each case.
If the result of a conditional expression is true, the transformation is applied to the case.
If the result of a conditional expression is false or missing, the transformation is not applied to the
case.
Most conditional expressions use one or more of the relational and logical operators
(discussed earlier).
To specify the conditional expression, click If in the Compute Variable dialog box; SPSS
will show the If Cases dialog box. Then select the option Include if case satisfies condition.
The options are shown in the following figure.
Illustrative Example 1
Suppose we want to find the deduction from salary for the transport facility for the employees of a
firm, using the salary data in Data Set 2.sav. A deduction of 5% of salary applies if the salary
is greater than 12,000 taka. To perform this computation, we follow these steps:
► From the menus choose
Transform
Compute Variable…
The Compute Variable dialog box will open.
► In the Compute Variable dialog box, type deduct in the Target Variable box at the
upper-left corner of the dialog box.
► Using either the calculator pad or the keyboard, write in the Numeric Expression box
salary * 0.05
► Click the If button below the calculator pad. This will open the Compute Variable: If Cases
dialog box.
► Select Include if case satisfies condition, which appears near the top of the dialog
box.
► Using either the calculator pad or the keyboard, write in the Compute Variable: If Cases
dialog box
salary > 12000
This condition specifies that the new variable deduct will be computed only for cases/records for
which the value of the variable salary is greater than 12000. For cases that do not satisfy this
condition, the new variable deduct will be equal to the system-missing value.
► Click Continue to return to the Compute Variable dialog box.
► Click OK.
A new variable deduct has now been created automatically in the right-most column of our
data sheet. The data sheet now looks like the following:
ID Salary deduct
1 10000 .
2 15000 750
3 12000 .
4 13000 650
5 14000 700
6 17000 850
7 15500 775
8 16500 825
9 17500 875
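The effect of the condition can be sketched in Python as follows. None is used here as a stand-in for the system-missing value shown as a dot in the table, and the function name compute_deduct is hypothetical.

```python
def compute_deduct(salary):
    # The transformation is applied only when the condition is true
    if salary > 12000:
        return salary * 0.05
    return None  # condition false: the new variable stays missing

salaries = [10000, 15000, 12000, 13000, 14000, 17000, 15500, 16500, 17500]
deduct = [compute_deduct(s) for s in salaries]

for s, d in zip(salaries, deduct):
    print(s, "." if d is None else round(d, 2))
```

Note that a salary of exactly 12000 does not satisfy salary > 12000, so that case is left missing as well.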
Now what about the employees whose salary is less than or equal to 12,000? Their deduction
should be zero. To perform this computation, we follow these steps:
► From the menus choose
Transform
Compute Variable…
The Compute Variable dialog box will open.
► In the Compute Variable dialog box, type deduct in the Target Variable box at the
upper-left corner of the dialog box.
► Using either the calculator pad or the keyboard, write in the Numeric Expression box
0
► Click the If button below the calculator pad. This will open the Compute Variable: If Cases
dialog box.
► Select Include if case satisfies condition, which appears near the top of the dialog
box.
► Using either the calculator pad or the keyboard, write in the Compute Variable: If Cases
dialog box
salary <= 12000
► Click Continue to return to the Compute Variable dialog box.
► Click OK.
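Taken together, the two conditional computations fill the variable in two passes. A minimal Python sketch of that sequence, with None again standing in for the system-missing value and all names chosen for illustration:

```python
salaries = [10000, 15000, 12000, 13000, 14000, 17000, 15500, 16500, 17500]

# First pass: deduct = salary * 0.05 where salary > 12000, missing elsewhere
deduct = [s * 0.05 if s > 12000 else None for s in salaries]

# Second pass: deduct = 0 where salary <= 12000; other cases keep their value
deduct = [0 if s <= 12000 else d for s, d in zip(salaries, deduct)]

print(deduct)
```

Because the two conditions are mutually exclusive and cover every salary, no case is left missing after the second pass and no value from the first pass is overwritten.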
Illustrative Example 2
Suppose we want to compute the yearly increment of those employees of an institution who satisfy the
following condition, using the following data (Data Set 3.sav).
Condition: Increment = 15% of the salary if Job Category = 3 and Experience is greater than or equal to 5
years.
ID Salary Job Category Experience
1 10000 1 5
2 15000 2 6
3 17000 3 7
4 21000 3 8
5 18000 3 5
Now to find the increment, we follow these steps:
► From the menus choose
Transform
Compute Variable…
The Compute Variable dialog box will open.
► In the Compute Variable dialog box, type increm in the Target Variable box at the
upper-left corner of the dialog box.
► Using either the calculator pad or the keyboard, write in the Numeric Expression box
salary * 0.15
► Click the If button below the calculator pad. This will open the Compute Variable: If Cases
dialog box.
► Select Include if case satisfies condition, which appears near the top of the dialog
box.
► Using either the calculator pad or the keyboard, write in the Compute Variable: If Cases
dialog box
jobcat = 3 & exper >= 5
► Click Continue to return to the Compute Variable dialog box.
► Click OK.
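A Python sketch of this compound condition shows which employees receive a value. The key names salary, jobcat, exper and increm are chosen here for illustration, and None stands in for the system-missing value.

```python
employees = [
    {"id": 1, "salary": 10000, "jobcat": 1, "exper": 5},
    {"id": 2, "salary": 15000, "jobcat": 2, "exper": 6},
    {"id": 3, "salary": 17000, "jobcat": 3, "exper": 7},
    {"id": 4, "salary": 21000, "jobcat": 3, "exper": 8},
    {"id": 5, "salary": 18000, "jobcat": 3, "exper": 5},
]

for e in employees:
    # Both parts must be true: job category 3 AND at least 5 years of experience
    if e["jobcat"] == 3 and e["exper"] >= 5:
        e["increm"] = e["salary"] * 0.15
    else:
        e["increm"] = None  # condition not satisfied: left missing

increments = [e["increm"] for e in employees]
print(increments)
```

Employees 1 and 2 have enough experience but are not in job category 3, so only employees 3, 4 and 5 get an increment.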
Recoding Values
We can modify the data values by recoding them. This is particularly useful for collapsing or combining
categories. We can recode the values within existing variables, or we can create new variables based on
the recoded values of existing variables. That is, two types of recoding are possible:
1. Recode into same variable
2. Recode into different variable
Recode into Same Variable
Recode into Same Variables reassigns the values of existing variables or collapses ranges of existing
values into new values. For example, we can collapse Marks into Marks Range categories. We can
recode numeric and string variables, but we cannot recode numeric and string variables together. If we
select multiple variables, they must all be of the same type. To recode the values of a variable into the
same variable, we follow these steps:
► From the menus choose
Transform
Recode into Same Variables…
► Select the variable which we want to recode. (If we select multiple variables, they must all be of the
same type, numeric or string.)
► Click Old and New Values.
► We shall see the Recode into Same Variables: Old and New Values dialog box.
We can define the values to recode in this dialog box. All value specifications must be of the same data
type (numeric or string) as the variables selected in the main dialog box. The value to be
recoded is entered as the Old Value and, after entering its New Value, we click the Add button. We can
recode more than one Old Value into one New Value, but we cannot recode one Old Value into more
than one New Value.
Old Value : The value(s) to be recoded. We can recode single values or ranges of values.
New Value : The single value into which each old value or range of values is recoded.
Old → New : The list of specifications that will be used to recode the variable(s). We can
add, change and remove specifications from the list.
Illustrative Example 1
Suppose we want to define ‘Educational Status’ on the basis of ‘Year of Schooling’ from the following
data using the following specifications (use Data Set 4.sav).
Year of schooling   New Value (code)   Meaning (value label)
0                   1                  Illiterate
1-5                 2                  Primary
6-10                3                  Secondary
11-12               4                  Higher Secondary
13-16               5                  Graduate
17                  6                  Post Graduate
18+                 7                  Higher
ID Year of Schooling
1 15
2 07
3 14
4 08
5 13
6 00
7 18
8 06
9 20
10 11
11 10
12 05
13 12
14 16
Now, to recode this data into new values, we follow these steps:
► From the menus choose
Transform
Recode into Same Variables…
This will open the Recode into Same Variables dialog box.
► Select yearsch from the variable list (left window) and then click the arrow button to move it
into the list of variables to recode.
► Click Old and New Values.
► We shall see the Recode into Same Variables: Old and New Values dialog box.
► Using the Old Value and New Value options, enter each recoding specification in turn.
► Click Continue to return to the Recode into Same Variables dialog
box.
► Click OK.
We shall see that the values of the existing variable yearsch have been replaced by the recoded values.
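The recoding specifications can be thought of as a list of (range, new value) rules that are tried in order. A minimal Python sketch under that assumption follows; the names RULES and recode are hypothetical, and values that match no rule are left unchanged, as expected when recoding into the same variable.

```python
# (lowest old value, highest old value, new code)
RULES = [
    (0, 0, 1),               # Illiterate
    (1, 5, 2),               # Primary
    (6, 10, 3),              # Secondary
    (11, 12, 4),             # Higher Secondary
    (13, 16, 5),             # Graduate
    (17, 17, 6),             # Post Graduate
    (18, float("inf"), 7),   # Higher (18 and above)
]

def recode(value, rules):
    # Return the new code of the first rule whose range contains the value
    for low, high, new in rules:
        if low <= value <= high:
            return new
    return value  # no rule matched: the old value is kept

yearsch = [15, 7, 14, 8, 13, 0, 18, 6, 20, 11, 10, 5, 12, 16]
yearsch = [recode(y, RULES) for y in yearsch]  # overwrite, as in "same variable"
print(yearsch)
```

Each old value is recoded once, from its original value, so a case recoded from 0 to 1 is not then picked up by the 1-5 rule.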
Illustrative Example 2
Suppose we want to define ‘Social Status’ on the basis of the Income variable given below, using the
following specifications:
Income (monthly)   New Value (code)   Meaning (value label)
Less than 3000     1                  Lower Class
3001-10000         2                  Lower Middle Class
10001-25000        3                  Middle Class
25001-100000       4                  Higher Middle Class
100001+            5                  Higher Class
ID Income
1 20000
2 1800
3 35000
4 56000
5 3200
6 17000
7 78000
8 22000
9 900
10 7000
11 32000
12 125000
13 45000
14 245000
Now, to recode this data into new values, we follow these steps:
► From the menus choose
Transform
Recode into Same Variables…
This will open the Recode into Same Variables dialog box.
► Select income from the variable list (left window) and then click the arrow button to move it
into the list of variables to recode.
► Click Old and New Values.
► We shall see the Recode into Same Variables: Old and New Values dialog box.
► Using the Old Value and New Value options, enter each recoding specification in turn.
► Click Continue to return to the Recode into Same Variables dialog
box.
► Click OK.
We shall see that the values of the existing variable income have been replaced by the recoded values.
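When the categories are consecutive ranges, the recoding amounts to finding which band a value falls into. The Python sketch below does this with the upper limit of each band. The specification does not say where an income of exactly 3000 belongs; this sketch assumes it goes into the first class. The names UPPER_LIMITS and social_status are hypothetical.

```python
import bisect

# Upper limit of classes 1 to 4; anything above the last limit is class 5
UPPER_LIMITS = [3000, 10000, 25000, 100000]

def social_status(income):
    # bisect_left counts how many upper limits lie strictly below the income,
    # so a value equal to a limit stays in the lower class
    return bisect.bisect_left(UPPER_LIMITS, income) + 1

income = [20000, 1800, 35000, 56000, 3200, 17000, 78000,
          22000, 900, 7000, 32000, 125000, 45000, 245000]
status = [social_status(x) for x in income]
print(status)
```

Boundary values such as 10000, 10001 and 100001 are worth testing by hand, because they are where range specifications most often go wrong.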
Recode into Different Variables
Recode into Different Variables reassigns the values of existing variables or collapses ranges of existing
values into new values for a new variable. For example, we can collapse Marks into a new variable
containing Marks-Range categories. We can recode numeric and string variables, but we cannot recode
numeric and string variables together. If we select multiple variables, they must all be of the same type.
We can also recode numeric variables into string variables and string variables into numeric variables.
To recode the values of a variable into a different variable, we follow these steps:
► From the menus choose
Transform
Recode into Different Variables…
► Select the variable which we want to recode. (If we select multiple variables, they must all be of
the same type, numeric or string.)
► Enter an output (new) variable name for each new variable and click Change.
► Click Old and New Values and specify how to recode values.
We can define the values to recode in the Old and New Values dialog box. All value specifications must
be of the same data type (numeric or string) as the variables selected in the main dialog box.
We can recode more than one Old Value into one New Value, but we cannot recode one Old Value
into more than one New Value.
If we want to recode a numeric variable into a string variable, we must also select Output variables are
strings.
Any old values that are not specified are not included in the new variable, and cases with those values
will be assigned the system-missing value for the new variable. To include all old values that do not
require recoding, select All other values for the old value and Copy old value(s) for the new value.
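The difference between leaving unspecified values out and copying them can be sketched in Python. Here None stands in for the system-missing value; the function name recode_into, the copy_others flag and the sample values are all hypothetical.

```python
def recode_into(values, mapping, copy_others=False):
    # Build a new list; the original values are left untouched
    new_values = []
    for v in values:
        if v in mapping:
            new_values.append(mapping[v])   # a specified old value
        elif copy_others:
            new_values.append(v)            # like All other values -> Copy old value(s)
        else:
            new_values.append(None)         # unspecified: missing in the new variable
    return new_values

old = [1, 2, 3, 4, 5]
mapping = {1: 1, 2: 1, 3: 2}   # several old values may share one new value

without_copy = recode_into(old, mapping)
with_copy = recode_into(old, mapping, copy_others=True)
print(without_copy)
print(with_copy)
```

Using a dictionary for the specifications also reflects the rule stated above: one old value can map to only one new value, while several old values may map to the same new value.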
Illustrative Examples
Use Data Set 4.sav and Data Set 5.sav to practise recoding values into different
variables, following the steps above.