Introduction to the Statistical Package for the Social Sciences

Amity Institute of Psychology and Allied Sciences
Introduction to SPSS (STAT626)

Internal Assessment (Test)

Module 1
Introduction

• SPSS means “Statistical Package for the Social Sciences” and was first
launched in 1968. Since SPSS was acquired by IBM in 2009, it's officially
known as IBM SPSS Statistics but most users still just refer to it as “SPSS”.

• SPSS is used by many kinds of researchers for complex statistical data
analysis. The software package was created for the management and statistical
analysis of social science data, and it was originally released by SPSS Inc.
• SPSS is widely used for social-science data analysis, in part because of its
straightforward, English-like command language and its thorough user manual.
• SPSS is used by market researchers, health researchers, survey
companies, government entities, education researchers, marketing
organizations, data miners, and many others for processing and analyzing
survey data.

In computer science, garbage in, garbage out (GIGO) is the concept that
flawed, or nonsense (garbage) input data produces nonsense output.
Rubbish in, rubbish out (RIRO) is an alternate wording.

SPSS - Quick Overview: Main Features
• SPSS is software for editing and analyzing all sorts of data. These data may
come from basically any source: scientific research, a customer database,
Google Analytics, or even the server log files of a website. SPSS can open
the file formats that are commonly used for structured data, such as:
• spreadsheets from MS Excel or OpenOffice;
• plain text files (.txt or .csv);
• relational (SQL) databases;
• Stata and SAS data files.

Data analysis with SPSS: general aspects,
workflow, critical issues
In general:
• Except for graphs, SPSS output should not be presented in reports or presentations.
• Instead, the information from SPSS output should be used to construct proper tables,
or to produce proper conclusions.
• SPSS has two worksheets, and both must be set up correctly (you can switch
between them by clicking the tabs at the bottom of the SPSS window):
• The Data View, which contains the data.
• The Variable View, which contains information about each variable.
• In SPSS (as with most statistical software), each row represents one unit of analysis
in the Data View.

• In SPSS (as with most statistical software), each column represents one variable in
the Data View.
• At the top of each column (but not in Row 1!) is the name of the variable.
• When setting up the Variable View, many columns can be filled in. These are the
important columns to get right (the others aren’t important for our purposes):
• Name: A short name for the variable (with no spaces or punctuation!).
• Type: Usually Numeric (apart from names, which can be String).
• Label: A fuller description of the variable. This is what will appear in tables and
graphs to describe the variable, for example, Diastolic blood pressure (in mm Hg).
• Values: For qualitative variables only: tell SPSS what each number represents (for
example, does a 1 represent females or males?).
• Measure: This is important: You must tell SPSS whether each variable is nominal,
ordinal, or scale (i.e. quantitative) by using the drop-down options.
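
The Variable View columns listed above can be pictured as one record per variable. The Python sketch below is only an illustration and is not part of SPSS: the `VariableDef` class, its field names, and the simplified name check are all assumptions made for this example (the real SPSS naming rules are more detailed).

```python
from dataclasses import dataclass, field

MEASURES = {"nominal", "ordinal", "scale"}


@dataclass
class VariableDef:
    """Hypothetical stand-in for one row of the Variable View."""
    name: str        # short name, no spaces or punctuation
    var_type: str    # "Numeric" or "String"
    label: str       # fuller description shown in tables and graphs
    measure: str     # "nominal", "ordinal", or "scale"
    values: dict = field(default_factory=dict)  # code -> meaning (qualitative only)

    def __post_init__(self):
        # Simplified check (an assumption): letters, digits, and underscores only.
        if not self.name.replace("_", "").isalnum():
            raise ValueError(f"invalid variable name: {self.name!r}")
        if self.measure not in MEASURES:
            raise ValueError(f"unknown measure: {self.measure!r}")


sex = VariableDef("sex", "Numeric", "Sex of respondent", "nominal",
                  values={1: "Female", 2: "Male"})
dbp = VariableDef("dbp", "Numeric", "Diastolic blood pressure (in mm Hg)", "scale")
```

Writing the definition down this way makes the point of the Values and Measure columns concrete: without them, a stored 1 or 2 carries no meaning on its own.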

Workflow

SPSS: general description, functions, menus,
commands
SPSS WINDOWS
Data Editor Window: It displays the contents of the data file. This is the window that opens
automatically when you start an SPSS session. In this window, you can create new data files or
modify existing ones. When you open more than one data file, each data file has a separate
Data Editor Window. The Data Editor Window provides two views of the data:
• Data View: It displays the data values. Each variable is a column. Each row is a case.
• Variable View: It displays a table consisting of variable names and their attributes. You can modify the
properties of each variable, add new variables, or delete existing variables in the Variable View.

Viewer Window: It displays statistical results, tables, and charts. This window opens automatically the first time you
run a procedure that generates output.
Pivot Table Editor: It displays the results in pivot tables. To open this window, right-click on the table, go to Edit Content,
and select “In separate window”. Alternatively, left-click on the table, open the Edit menu, select Edit Content, and then
select “In separate window”. You will then be able to modify the table.
Chart Editor Window: This window is used to edit high-resolution charts and plots.
Text Output Editor Window: This is used to modify text output that is not displayed in pivot tables. To open the window,
right-click on the text output, go to Edit Content, and select “In separate window”. You will then be able to modify the text
output.
Syntax Editor Window: It displays the choices made in the dialog boxes in the form of command syntax. These
commands can be edited and run to produce output. You can also paste an existing SPSS program here and run it.

SPSS file management
There are three types of SPSS files that we will use during this class: data files, which
end in .sav; syntax files, which end in .sps; and output files, which end in .spv.

IBM SPSS Statistics Data File Structure
The basic structure of IBM SPSS Statistics data files is similar to a database table:
Rows (records) are cases. Each row represents a case or an observation. For example,
each individual respondent to a questionnaire is a case.
Columns (fields) are variables. Each column represents a variable or characteristic that is
being measured. For example, each item on a questionnaire is a variable.
IBM SPSS Statistics data files also contain metadata that describes and defines the data
contained in the file. This descriptive information is called the dictionary. The information
contained in the dictionary includes:
• Variable names and descriptive variable labels
• Descriptive value labels
• Missing value definitions
• Print and write formats
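
To make the split between data and dictionary concrete, here is a small Python sketch. It is an analogy only and does not reproduce the .sav format: the `cases` list, the `dictionary` layout, the `read_value` helper, and the code 999 for "not answered" are all made up for this illustration.

```python
# Each dict is one case (row); each key is one variable (column).
cases = [
    {"id": 1, "sex": 1, "age": 34},
    {"id": 2, "sex": 2, "age": 999},   # 999 is a made-up code for "not answered"
    {"id": 3, "sex": 2, "age": 51},
]

# Hypothetical dictionary (metadata) that describes and defines the data.
dictionary = {
    "sex": {"label": "Sex of respondent",
            "value_labels": {1: "Female", 2: "Male"},
            "missing": set()},
    "age": {"label": "Age in years",
            "value_labels": {},
            "missing": {999}},
}


def read_value(case, variable):
    """Return a stored value the way the metadata says it should be read."""
    meta = dictionary[variable]
    value = case[variable]
    if value in meta["missing"]:
        return None                               # user-defined missing value
    return meta["value_labels"].get(value, value)  # labelled code, or raw value


readable = [(read_value(c, "sex"), read_value(c, "age")) for c in cases]
```

The stored numbers never change; only the dictionary decides that a 1 reads as "Female" and that 999 should be treated as missing.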

• Setting the working directory
• Generating a codebook
• Defining, recoding, and computing variables

Module 2
Input and data cleaning

Defining variables
• Defining a variable includes giving it a name, specifying
its type, the values the variable can take (e.g., 1, 2, 3),
etc.
• Without this information, your data will be much harder to
understand and use.
• Whenever you are working with data, it is important to
make sure the variables in the data are defined so that
you (and anyone else who works with the data) can tell
exactly what was measured, and how.

• You can define information about your variables by accessing
the Variable View tab (at the bottom of the Data Editor window).
The Variable View tab displays information about the variables in
your data. You can get to the Variable View window in two ways:
• In the Data Editor window, click the Variable View tab at the bottom.
• In the Data Editor window, in the Data View tab, double-click a
variable name at the top of the column. This method has the
advantage of taking you to the specific variable you clicked.

Manual Input of Data
• Define Variables
• The "one person, one row" Rule
• When you open the SPSS program, you will see a blank
spreadsheet in Data View. If you already have another dataset open
but want to create a new one, click File > New > Data to open a
blank spreadsheet.
• You will notice that each of the columns is labeled “var.” The column
names will represent the variables that you enter in your dataset.
You will also notice that each row is labeled with a number (“1,” “2,”
and so on). The rows will represent cases that will be a part of your
dataset. When you enter values for your data in the spreadsheet
cells, each value will correspond to a specific variable (column) and
a specific case (row).

Automated input of data and file import
• Excel to SPSS
• Text file to SPSS

• If you already have data that are in an SPSS file format
(file extension “.sav”), you can simply open that file to
begin working with your data in SPSS.
• However, if you have data stored in other types of files,
such as an Excel spreadsheet or a text file, you will need
to instruct SPSS how to read the file and then save it in
the SPSS file format (“.sav”).
• Below, we will cover how to import data from two
common types of files: Excel files and text files.

To open your Excel file in SPSS:
• Choose File > Open > Data from the SPSS menu.
• Select the type of file you want to open: Excel (*.xls, *.xlsx, *.xlsm).
• Select the file name.
• Check 'Read variable names' if the first row of the
spreadsheet contains column headings.
• Click Open.
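
The 'Read variable names' option in the steps above treats the first row as variable names and every later row as a case. The Python sketch below shows the same idea on a small in-memory text file; it is an analogy using the standard `csv` module, not how SPSS reads files, and the sample data are made up.

```python
import csv
import io

# Stand-in for a file exported as text; the first row holds the column headings.
raw = io.StringIO("id,sex,age\n1,1,34\n2,2,51\n")

reader = csv.reader(raw)
variable_names = next(reader)   # first row becomes the variable names
# Every remaining row is one case; values arrive as text until they are converted.
cases = [dict(zip(variable_names, row)) for row in reader]
```

If the first row were data rather than headings, it would have to be kept as a case and the variables named separately, which is why the option must match the layout of the file.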

Data Cleaning
• Missing Values
• Invalid values

Transform
• Recoding variables
• Computing variables

Descriptive Analysis of Data

https://study.com/academy/lesson/what-is-descriptive-statistics-examples-lesson-quiz.html

Module III - Descriptive analysis
of data

Descriptive Statistics
Procedures for depicting the main aspects of sample data, without
necessarily generalizing to a larger population.
• Descriptive statistics usually include the mean, median, and mode
to indicate central tendency, as well as the range and standard
deviation, which reveal how widely spread the scores are within the
sample.
• Descriptive statistics could also include charts and graphs such as
a frequency distribution or histogram, among others.
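
As a rough illustration of the measures named above, the following Python sketch computes them with the standard `statistics` module. The scores are made up for the example, and this is not SPSS output.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample scores

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Spread
value_range = max(scores) - min(scores)
sd = statistics.stdev(scores)       # sample standard deviation (n - 1 denominator)

print(f"mean={mean}, median={median}, mode={mode}, range={value_range}, sd={sd:.3f}")
```

Note that `statistics.stdev` uses the sample formula; `statistics.pstdev` would give the population version, which is smaller for the same data.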

Frequencies
• When summarizing quantitative (continuous/interval/ratio) variables,
we are typically interested in questions like:
• What is the "center" of the data? (Mean, median)
• How spread out is the data? (Standard deviation/variance)
• What are the extremes of the data? (Minimum, maximum; Outliers)
• What is the "shape" of the distribution? Is it symmetric or
asymmetric? Are the values mostly clustered about the mean, or are
there many values in the "tails" of the distribution? (Skewness,
kurtosis)

Descriptives
• In SPSS, the Descriptives procedure computes a select set of basic descriptive
statistics for one or more continuous numeric variables. In all, the statistics it can
produce are:
• N valid responses, Mean, Sum, Standard deviation, Variance, Minimum, Maximum,
Range, Standard error of the mean (or S.E. mean), Skewness, Kurtosis
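
The statistics in this list can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Descriptives procedure itself: the `descriptives` function is hypothetical, the data are made up, and skewness and kurtosis are computed with the simple moment-based formulas. Statistical packages commonly apply small-sample adjustments, so their figures can differ slightly.

```python
import math


def descriptives(values):
    """Basic descriptive statistics for a list of numbers (illustrative only)."""
    n = len(values)
    total = sum(values)
    mean = total / n
    # Central moments with n in the denominator.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    variance = m2 * n / (n - 1)          # sample variance (n - 1 denominator)
    sd = math.sqrt(variance)
    return {
        "n": n, "mean": mean, "sum": total,
        "sd": sd, "variance": variance,
        "min": min(values), "max": max(values),
        "range": max(values) - min(values),
        "se_mean": sd / math.sqrt(n),     # standard error of the mean
        "skewness": m3 / m2 ** 1.5,       # moment-based, not bias-adjusted
        "kurtosis": m4 / m2 ** 2 - 3,     # excess kurtosis, moment-based
    }


stats = descriptives([2, 4, 4, 4, 5, 5, 7, 9])
```

A positive skewness, as in this sample, means the longer tail lies to the right of the mean.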

Explore
• The Explore procedure produces detailed univariate statistics and graphs for numeric
scale variables for an entire sample, or for subsets of a sample. It can also be used to
assess the normality of a numeric scale variable with special inferential statistics and
detailed diagnostic plots.
• To run the Explore procedure, click Analyze > Descriptive Statistics > Explore.

Crosstabs
To describe a single categorical variable, we use frequency tables.
To describe the relationship between two categorical variables, we use a special type of
table called a cross-tabulation (or "crosstab" for short).
In a cross-tabulation, the categories of one variable determine the rows of the table, and
the categories of the other variable determine the columns. The cells of the table contain
the number of times that a particular combination of categories occurred. The "edges" (or
"margins") of the table typically contain the total number of observations for that category.
This type of table is also known as a:
• Crosstab.
• Two-way table.
• Contingency table.
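
A cross-tabulation as described above can be built by counting combinations. The Python sketch below uses made-up responses and hypothetical variable names (`sex`, `smoker`); it illustrates the table layout and is not SPSS output.

```python
from collections import Counter

# Made-up responses: (sex, smoker) for each case.
responses = [
    ("Female", "Yes"), ("Female", "No"), ("Male", "No"),
    ("Male", "Yes"), ("Female", "No"), ("Male", "No"),
]

cells = Counter(responses)                               # count per combination
row_totals = Counter(sex for sex, _ in responses)        # margins for the row variable
col_totals = Counter(smoker for _, smoker in responses)  # margins for the column variable

rows = sorted(row_totals)
cols = sorted(col_totals)

# Print the table with the totals in the margins.
print("".ljust(8) + "".join(c.rjust(6) for c in cols) + "Total".rjust(7))
for r in rows:
    counts = "".join(str(cells[(r, c)]).rjust(6) for c in cols)
    print(r.ljust(8) + counts + str(row_totals[r]).rjust(7))
print("Total".ljust(8) + "".join(str(col_totals[c]).rjust(6) for c in cols)
      + str(len(responses)).rjust(7))
```

Each cell holds the number of cases with that combination of categories, and the margins add up to the total number of cases.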

Charts
• Using SPSS to create bar graphs, histograms, line graphs, and scatterplots.
• Editing the graphs and printing selected parts of the output.

Module IV - Statistical tests

Means: the numerical average of a set of scores, computed as the sum of all scores
divided by the number of scores.
T-test: A t-test is a type of inferential statistic used to determine if there is a significant
difference between the means of two groups, which may be related in certain features.
One-way ANOVA: The one-way analysis of variance (ANOVA) is used to determine
whether there are any statistically significant differences between the means of three or
more independent (unrelated) groups.
Nonparametric tests: statistical tests that do not assume the data follow a particular
distribution, such as the normal distribution. They are typically used when the assumptions
of tests such as the t-test or ANOVA are not met, or when the data are ordinal or ranked.
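
To show what a t-test compares, the Python sketch below computes the t statistic for two independent groups. It is a simplified illustration: the `independent_t` function is hypothetical, it assumes equal variances in the two groups (Student's pooled-variance form), the data are made up, and it stops at the statistic and degrees of freedom without computing a p-value.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Student's t statistic for two independent groups, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se_diff, df


t, df = independent_t([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
```

The statistic is the difference between the group means divided by its standard error, so a large absolute value means the means differ by much more than sampling variation alone would suggest.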

Normality tests: procedures for checking whether a variable follows a normal distribution.
The normal distribution is a theoretical distribution in which values pile up in the center at the
mean and fall off into tails at either end. When plotted, it gives the familiar bell-shaped
curve expected when variation about the mean value is random. The normal distribution
has several primary characteristics: It is symmetrical, it has both upper and lower
asymptotes, and its mean, median, and mode are the same value.

Correlation and regression
Correlation: the degree of a relationship (usually linear) between two variables, which
may be quantified as a correlation coefficient.
Regression analysis: any of several statistical techniques that are used to describe, explain,
or predict (or all three) the variance of an outcome or dependent variable using scores on one
or more predictor or independent variables.
For example, a regression analysis could show the extent to which 1st-year grades in
college (outcome) are predicted by such factors as standardized test scores, courses
taken in high school, letters of recommendation, and particular extracurricular activities.
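
The two ideas above can be sketched for the simplest case of one predictor and one outcome. The Python code below is a minimal illustration with made-up numbers: `pearson_r` and `simple_regression` are hypothetical helper names, and the regression is ordinary least squares with a single predictor rather than the multi-predictor analysis in the example.

```python
import math


def _sums(x, y):
    """Return the means and the sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    _, _, sxy, sxx, syy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def simple_regression(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mean_x, mean_y, sxy, sxx, _ = _sums(x, y)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


predictor = [1, 2, 3, 4, 5]   # made-up predictor scores
outcome = [2, 4, 5, 4, 5]     # made-up outcome scores

r = pearson_r(predictor, outcome)
slope, intercept = simple_regression(predictor, outcome)
predicted = intercept + slope * 6   # predicted outcome for a new predictor score of 6
```

The correlation describes how strongly the two variables move together, while the regression line turns that relationship into a prediction rule.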

Module V - Multivariate analysis

Factor Analysis
Factor analysis (FA): a broad family of mathematical procedures for reducing a set of
interrelations among manifest variables to a smaller set of unobserved
latent variables or factors.
For example, a number of tests of mechanical ability might be
intercorrelated, enabling factor analysis to reduce them to a few
factors, such as fine motor coordination, speed, and attention.

Manifest variable: a variable whose values can be directly observed or measured, as
opposed to one whose values must be inferred.
In structural equation modeling and factor analysis, manifest variables
are used to study latent variables. Also called indicator variable.

Latent variable: a theoretical entity or construct that is used to explain one or more
manifest variables. Latent variables cannot be directly observed or
measured but rather are approximated through various measures
presumed to assess part of the given construct.

Factor Analysis
SPSS Anxiety Questionnaire
1. Statistics makes me cry
2. My friends will think I’m stupid for not being able to cope with SPSS
3. Standard deviations excite me
4. I dream that Pearson is attacking me with correlation coefficients
5. I don’t understand statistics
6. I have little experience of computers
7. All computers hate me
8. I have never been good at mathematics

Cluster Analysis
Cluster analysis: a method of multivariate data analysis in which individuals or units are
placed into distinct subgroups based on their strong similarity with
regard to specific attributes.
For example, one might use cluster analysis to form groups of
individual children on the basis of their levels of anxiety, aggression,
delinquency, and cognitive difficulties so as to identify useful typologies
that could increase understanding of co-occurring mental disorders and
lead to more appropriate treatments for specific individuals.
There are several different forms of cluster analysis, including
hierarchical clustering and latent class analysis, and each is
appropriate for use with different types of data. Results of a cluster
analysis often are presented in a dendrogram.

Dendrogram: a type of treelike diagram used in hierarchical clustering. It lists all of
the participants at one end and then directs branches out from those
participants who are similar and connects them with a node that
represents a cluster. A dendrogram could be used, for example, to
cluster individuals into various categories of HIV risk, depending on
their number of sexual partners, their frequency of unprotected sex,
and the perceived risk of their partners. Individuals who had few sexual
partners with little or no unprotected sex and who perceived little or no
partner risk of HIV infection would be branched into a cluster that could
be labeled low risk, whereas individuals with high values on these three
variables would branch into a high-risk cluster, with other individuals
presumably clustering into a medium-risk group.
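
The branching that a dendrogram draws comes from repeatedly merging the two most similar clusters. The Python sketch below shows one way this can work, under several simplifying assumptions: each individual is reduced to a single made-up composite risk score (the example above uses three variables), similarity is the absolute difference between scores, and clusters are joined by single linkage (the distance between their closest members). The `single_linkage` function is a hypothetical name for this illustration.

```python
def single_linkage(points):
    """Agglomerative clustering of 1-D values; returns the merge history."""
    clusters = [[p] for p in points]   # start with every individual alone
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


risk_scores = [1, 2, 9, 10, 5]   # made-up composite risk scores
history = single_linkage(risk_scores)
for left, right, distance in history:
    print(f"merge {left} + {right} at distance {distance}")
```

Each recorded merge corresponds to one node in a dendrogram, and the distance at which it happens corresponds to the height of that node.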

Videos
https://forms.gle/vNpsRtjPH3RoE7MKA
https://docs.google.com/forms/d/e/1FAIpQLSfXnVuKSLI47pOeEvIhX_6YX1M8Fa_cv0Mnt5A7jXHVoOhPfA/viewform?usp=sf_link
https://study.com/academy/lesson/what-is-a-t-test-procedure-interpretation-examples.html
https://study.com/academy/lesson/cluster-analysis-market-segmentation-definition-examples.html
https://www.youtube.com/watch?v=Se28XHI2_xE
