1. INSTRUCTOR: MOHAMMED ABDUL KHALEQ DWIKAT EMAIL:dwikatmo@gmail.com
TOPIC: SPSS TESTS' SUMMARY DATE: 1/31/2020 PAGE: 1 OF 10
All you need in Data Analysis using SPSS
1.DESCRIPTIVE STATISTICS
When to use: when we need to summarize data using statistical
measures. Descriptive statistics apply when we have all the data (the whole
population); the results describe that data exactly, with no sampling error
and no prior assumptions.
SCALE:
DESCRIPTIVES: Mean, Sum, Range, Max, Min, Std. deviation, Skewness,
Kurtosis. Check outlier candidates using standardized values
EXPLORE: Mean, Sum, Range… Check normality, check outlier
candidates using box plots
CATEGORICAL (NOMINAL/ORDINAL): Frequency, percentage of values
FREQUENCIES: For each variable alone, display percentage and count of
variable values, bar chart, pie chart or histogram
CROSSTABS: 2 or more intersected variables, display percentages and counts
RATIO STATISTICS
Describe the ratio between two scale variables.
Example of research question: Is there good uniformity in the ratio between
the appraisal price and sale price of homes in each of five counties?
Output: Median, mean, coefficient of dispersion (COD), median-centered
coefficient of variation, mean-centered coefficient of variation, minimum
and maximum values, the concentration index computed for a user-specified
range or percentage within the median ratio.
From this output we can determine which township's housing values have
changed the most:
Townships with a median ratio closer to 1 have changed the least.
Larger COD values indicate greater variability.
The within-percentage-of-median coefficient of concentration (COC) also measures
variability: it reports the percentage of ratios that fall within a given
percentage of the median. Larger values of this statistic indicate less variability.
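The ratio statistics above can be sketched in a few lines of Python. This is only an illustration: the function name `ratio_stats`, the 10% band and the appraisal and sale figures are assumptions made for the example, not values from the handout.

```python
from statistics import median

def ratio_stats(appraised, sold, within_pct=10):
    """Return (median ratio, COD, COC) for appraisal/sale price ratios."""
    ratios = [a / s for a, s in zip(appraised, sold)]
    med = median(ratios)
    # COD: average absolute deviation from the median, as a % of the median
    cod = 100 * sum(abs(r - med) for r in ratios) / len(ratios) / med
    # COC: % of ratios lying within +/- within_pct % of the median
    low, high = med * (1 - within_pct / 100), med * (1 + within_pct / 100)
    coc = 100 * sum(low <= r <= high for r in ratios) / len(ratios)
    return med, cod, coc

# Hypothetical appraisal and sale prices for five homes
appraised = [90, 100, 110, 120, 150]
sold = [100, 100, 100, 100, 100]
med, cod, coc = ratio_stats(appraised, sold)
print(round(med, 2), round(cod, 2), round(coc, 1))
```

A larger COD and a smaller COC both point to less uniform ratios.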
2.PRETESTS SUMMARY
(Normality, Linearity, Homoscedasticity)
1. Testing Normality
In H0 assume that skewness and Kurtosis are equal to Zero
H0: The population (for variable x) is normally distributed.
Ha: The population (for variable x) is NOT normally distributed.
If Sig <= 0.05 (reject H0): the variable is NOT normally distributed
If Sig > 0.05 (don't reject H0): normality is not rejected; treat the variable as normally distributed
[SPSS] DESCRIPTIVE STATISTICS / EXPLORE, then check "Normality plots with tests" in
the Plots button
[SPSS] NONPARAMETRIC / ONE SAMPLE / KOLMOGOROV-SMIRNOV TEST
from Settings; used to check the Normal, Uniform, Exponential and Poisson
distributions.
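As a quick numerical companion to the normality tests, skewness and excess kurtosis can be computed directly; both are close to zero for normally distributed data. The sketch below uses the simple moment-based formulas, which is an assumption of this example: SPSS reports bias-adjusted sample versions, so its values differ slightly for small samples.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both 0 for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# Illustrative data: symmetric, so the skewness is zero
skew, kurt = skewness_and_excess_kurtosis([1, 2, 3, 4, 5])
print(skew, round(kurt, 2))
```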
2. Testing Linearity
By using simple linear regression y = aX + b
H0: a = 0 (the slope of the best-fit line = 0)
Ha: a ≠ 0 (the slope of the best-fit line ≠ 0)
[SPSS] REGRESSION / LINEAR
If Sig <= 0.05 (reject H0): there is a linear relationship
If Sig > 0.05 (don't reject H0): there is no significant linear relationship
Or we could use the test for linearity offered under Compare Means
[SPSS] COMPARE MEANS / MEANS
The null hypothesis of correlation/linear regression is that the slope of the
best-fit line is equal to zero; in other words, as the X variable gets larger,
the associated Y variable gets neither higher nor lower.
3. Testing Homoscedasticity
The variability in scores for variable X should be similar at all values of
variable Y. It assumes that the samples are obtained from populations of equal
variances.
[SPSS] GENERAL LINEAR MODEL / MULTIVARIATE / (OPTIONS / HOMOGENEITY TESTS)
[SPSS] COMPARE MEANS / INDEPENDENT SAMPLES T TEST (LEVENE'S TEST)
[SPSS] COMPARE MEANS / ONE WAY ANOVA (HOMOGENEITY OF VARIANCE TEST)
How (shortest method):
Run Levene's test in SPSS through One-Way ANOVA by checking "Homogeneity
of variance test" in Options.
H0: population variances of x are equal across group1, group2
Ha: population variances of x are not equal across group1, group2
H0: population variances are equal between Read and Write
Ha: population variances are not equal between Read and Write
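To make the idea behind Levene's test concrete, here is a minimal sketch of its W statistic, assuming the classical version based on absolute deviations from each group mean. The function name and the two small groups are hypothetical, and the p-value (which comes from an F distribution with k-1 and N-k degrees of freedom) is left out to keep the example in the standard library.

```python
def levene_w(groups):
    """Levene's W: a one-way ANOVA F computed on |x - group mean|."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every score from its own group mean
    z = []
    for g in groups:
        m = sum(g) / len(g)
        z.append([abs(v - m) for v in g])
    z_means = [sum(zg) / len(zg) for zg in z]
    z_grand = sum(sum(zg) for zg in z) / n_total
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zg, zm in zip(z, z_means) for v in zg)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical scores for two groups
w = levene_w([[1, 2, 3], [2, 4, 6]])
print(round(w, 3))
```

A large W relative to the F distribution leads to rejecting H0 of equal variances.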
3.CORRELATION
[SPSS] CORRELATE / BIVARIATE
Example of research question: Is there a significant relationship between age
and optimism Scores, if yes what is its magnitude and direction?
Does optimism increase with age?
The null hypothesis (H0) and alternative hypothesis (Ha) of the significance test
for correlation can be expressed as follows
H0: ρ = 0, the population correlation coefficient = 0; there is no correlation
Ha: ρ ≠ 0, the population correlation coefficient ≠ 0; a nonzero correlation exists
If Sig <= 0.05 (reject H0),
there is a significant correlation between X and Y
If Sig > 0.05 (don't reject H0),
there is no significant correlation between X and Y
Strength of the correlation coefficient is interpreted as
Range         Explanation    Same for negatives
[0.0 – 0.3[   Negligible     0 to -0.3
[0.3 – 0.5[   Weak           -0.3 to -0.5
[0.5 – 0.7[   Intermediate   -0.5 to -0.7
[0.7 – 0.9[   Strong         -0.7 to -0.9
[0.9 – 1.0]   Very Strong    -0.9 to -1
Alternative guideline: Small r=.10 to .29, Medium r=.30 to .49, Large r=.50 to 1.0
If the correlation coefficient between X and Y is
Positive: as X increases, Y tends to increase
Negative: as X increases, Y tends to decrease
Zero: there is no linear correlation at all.
The correlation coefficient between a variable and itself (X and X) is always 1
Use the Spearman correlation coefficient for 2 ordinal variables
Use the Pearson correlation coefficient for 2 scale variables
Kendall's tau: used in the same situations as the Spearman correlation coefficient
Phi: used to find the association between 2 nominal variables, each with 2 values
Cramér's V: used to find the association between 2 nominal variables when one
or both of them have more than 2 values
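The Pearson coefficient and the five-band strength reading used in this handout can be sketched as follows. The helper names and the age and optimism values are made up for the illustration; the bands are the ones listed in the strength table of this section.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two scale variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def strength(r):
    """Map |r| to the bands of the handout's strength table."""
    a = abs(r)
    if a < 0.3:
        return "negligible"
    if a < 0.5:
        return "weak"
    if a < 0.7:
        return "intermediate"
    if a < 0.9:
        return "strong"
    return "very strong"

# Hypothetical age groups and optimism scores
age = [1, 2, 3, 4, 5]
optimism = [2, 4, 5, 4, 5]
r = pearson_r(age, optimism)
print(round(r, 3), strength(r))
```

The sign of r gives the direction and the band gives the magnitude; whether r is significant still has to be read from Sig in the SPSS output.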
4.CHECKING RELIABILITY
[SPSS] SCALE / RELIABILITY ANALYSIS
Cronbach Alpha measures internal consistency
Variables used to calculate Cronbach Alpha
All Variables related to our research
Exclude empty variables, One value variables, Serials, ID’s and similar
For scales with only a few items, Cronbach alpha values can be quite small.
In this situation it may be better to calculate and report the mean
inter-item correlation for the items. Optimal mean inter-item correlation
values range from .2 to .4 (as recommended by Briggs & Cheek 1986).
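For reference, Cronbach's alpha follows from the item variances and the variance of the total score: alpha = k/(k-1) * (1 - sum of item variances / variance of totals). A minimal sketch, with a hypothetical function name and made-up answers from four respondents to three items:

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, respondents in the same order."""
    k = len(items)
    # Total score of each respondent across all items
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical data: 3 items answered by 4 respondents
items = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [1, 3, 3, 5],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```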
5.LIKERT SCALE
What is Likert Scale Data?
Responses rated on a 5-point scale, a 3-point scale or any other number of levels
Interpretation of the average – 5 Level Scale
Range         Meaning -ve         Meaning +ve
[1.0 – 1.8[   Strongly agree      Strongly disagree
[1.8 – 2.6[   Agree               Disagree
[2.6 – 3.4[   Neutral             Neutral
[3.4 – 4.2[   Disagree            Agree
[4.2 – 5.0]   Strongly disagree   Strongly agree
Interpretation of the average – 3 Level Scale
Range           Meaning
[1.00 – 1.66[   Agree
[1.66 – 2.33[   Neutral
[2.33 – 3.00]   Disagree
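The interval widths in both tables come from dividing the span of the scale by the number of levels: (5 - 1) / 5 = 0.8 and (3 - 1) / 3 ≈ 0.67. A small sketch of that mapping, assuming equal-width intervals; the function name and label lists are illustrative only.

```python
def likert_label(mean, labels):
    """Map an average score on a 1..k scale to one of k equal-width bands."""
    k = len(labels)
    width = (k - 1) / k  # 0.8 for 5 levels, about 0.67 for 3 levels
    index = min(int((mean - 1) / width), k - 1)  # the top band includes the maximum
    return labels[index]

five_levels = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]
three_levels = ["Agree", "Neutral", "Disagree"]

print(likert_label(3.7, five_levels))   # falls in [3.4 - 4.2[
print(likert_label(2.0, three_levels))  # falls in [1.66 - 2.33[
```

Averages that sit exactly on a boundary are sensitive to floating-point rounding, so it is safer to round means to a fixed number of decimals before classifying them.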
6.INFERENTIAL STATISTICS
When to use: when we have a sample and want to generalize the result to a
population. The generalization carries an error, called alpha, and we test a
hypothesis that we either reject or retain.
Nominal/Ordinal Tests
One Sample Binomial Test (one categorical variable with 2 values only)
[SPSS] ANALYZE / NONPARAMETRIC / ONE SAMPLE
Example of research question: Is the proportion of female spiders = 0.75?
H0: proportion of female spiders = 0.75
Ha: proportion of female spiders ≠ 0.75
When performing the test, the category tested in H0 (here, female) should be
the value of the first case in the data.
Chi Square goodness of fit Test (one categorical/discrete variable with
2 or more values)
[SPSS] ANALYZE / NONPARAMETRIC / ONE SAMPLE
Example of research question: are students interested in different fields
equally
H0: The proportions of MIS, CIS and CS Students are equal
Ha: The proportions of MIS, CIS and CS Students are NOT equal
H0: Students are interested in MIS, CIS and CS equally
Ha: Students are interested in MIS, CIS and CS unequally
It can also be used to compare two sets of proportions:
H0: there is no significant difference between the current smart phone
proportion and the preferred smart phone proportion that the students have.
Ha: there is a significant difference between the current smart phone
proportion and the preferred smart phone proportion that the students have.
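The goodness-of-fit statistic itself is simple to compute: it sums (observed - expected)² / expected over the categories. The sketch below uses made-up counts for the three majors and assumes equal expected proportions; the p-value line relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper tail equals exp(-x/2), so it does not generalize to other category counts.

```python
from math import exp

def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic and its degrees of freedom."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1

# Hypothetical counts of MIS, CIS and CS students; H0 expects 20 in each
observed = [30, 20, 10]
expected = [20, 20, 20]
chi2, df = chi_square_gof(observed, expected)
p_value = exp(-chi2 / 2)  # valid only because df == 2 here
print(chi2, df, round(p_value, 4))
```

Since the p-value is below 0.05, H0 of equal proportions would be rejected for these invented counts.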
7.NONPARAMETRIC TESTS
One Sample Wilcoxon Signed Rank test (One sample median test) (one scale
variable)
[SPSS] ANALYZE / NONPARAMETRIC / ONE SAMPLE
Example of research question: Is there a significant difference between a sample
median and a hypothesized value?
Fisher’s exact test
The Fisher’s exact test is used when you want to conduct a chi-square test
but one or more of your cells have an expected frequency of less than five.
Remember that the chi-square test assumes that each cell has an expected
frequency of five or more, but the Fisher’s exact test has no such assumption
and can be used regardless of how small the expected frequency is.
The Kruskal Wallis test
Is used when you have one independent variable with two or more levels and
an ordinal dependent variable. In other words, it is the non-parametric version
of ANOVA and a generalized form of the Mann-Whitney test method since it permits
two or more groups.
Other Categorical Tests/measures (used in Crosstabs)
Chi Square Test of Independence (two categorical variables, each with 2 or
more values)
[SPSS] ANALYZE / DESCRIPTIVE STATISTICS / CROSSTABS
Example of research question:
Are older people more optimistic than younger people?
Is there an association between gender and smoking behavior?
Are males more likely to be smokers than females?
Is the proportion of males that smoke the same as the proportion of females?
H0: X is independent of Y
(There is no significant association between X and Y.)
Ha: X is NOT independent of Y
(There is a significant association between X and Y.)
H0: Obesity is independent of eating Junk Meals
Ha: Obesity is NOT independent of eating Junk Meals
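A sketch of the test of independence on a crosstab: each expected count is (row total × column total) / grand total, and the statistic sums (observed - expected)² / expected over all cells. The counts below are invented. The p-value is computed only for a 2x2 table (1 degree of freedom), where the chi-square upper tail equals erfc(sqrt(x/2)), and no continuity correction is applied.

```python
from math import erfc, sqrt

def chi_square_independence(table):
    """Pearson chi-square for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    # Closed-form p-value exists for df == 1; otherwise left as None
    p_value = erfc(sqrt(chi2 / 2)) if df == 1 else None
    return chi2, df, p_value

# Hypothetical crosstab: rows = eats junk meals (yes/no), columns = obese (yes/no)
table = [[30, 10],
         [20, 40]]
chi2, df, p_value = chi_square_independence(table)
print(round(chi2, 3), df, p_value)
```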
McNemar Test (two categorical variables, each with 2 values (Yes/No), measuring
the same feature at 2 different times to see the effect of an intervention)
Example of research question: Is there a change in the proportion of the sample
diagnosed with clinical depression prior to, and following, the intervention?
When you have matched or repeated measures designs (e.g. pre-test/post-
test), you cannot use the usual chi-square test. Instead, you need to use
McNemar’s Test. In the health and medical area this might be the presence or
absence of some health condition (0=absent; 1=present), while in a political
context it might be the intention to vote for a particular candidate (0=no,
1=yes) before and after a campaign speech.
H0: there is no significant change in the proportion of participants diagnosed as
clinically depressed prior to and following the program
Ha: there is a significant change in the proportion of participants diagnosed as
clinically depressed prior to and following the program
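McNemar's test looks only at the participants whose status changed. Under H0 a change is equally likely in either direction, so the number of changes in one direction follows a binomial distribution with p = 0.5. The sketch below computes the exact two-sided p-value on that basis; the function name and the counts are hypothetical.

```python
from math import comb

def mcnemar_exact_p(b, c):
    """Exact two-sided McNemar p-value from the two discordant counts.

    b: changed from yes to no, c: changed from no to yes.
    """
    n = b + c
    k = min(b, c)
    # Probability of a split at least this extreme in one direction, p = 0.5
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical: 15 participants improved (depressed -> not), 5 worsened
p_value = mcnemar_exact_p(15, 5)
print(round(p_value, 4))
```

Participants whose diagnosis stayed the same do not enter the calculation at all.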
Cochran’s Q TEST
The McNemar’s Test described in the previous section is suitable if you
have only two time points. If you have three or more time
points (categorical variables), each with 2 values (yes/no), you will need to use
Cochran’s Q Test.
Example of research question: Is there a change in the proportion of
participants diagnosed with clinical depression across the three time
points: (a) prior to the program, (b) following the program and (c) three
months post-program?
Three categorical variables measuring the same characteristic. (e.g.
presence or absence of the characteristic 0=no, 1=yes) collected from each
participant at different time points.
H0: there is no significant change in the proportion of participants diagnosed as
clinically depressed before the program, following the program and three months later
Ha: there is a significant change in the proportion of participants diagnosed as
clinically depressed before the program, following the program and three months later
Risk (Odds Ratio) (two categorical variables, each with 2 values (Yes/No))
A measure of the strength of the association between the presence of a factor
and the occurrence of an event (no null hypothesis).
It quantifies how strongly the presence or absence of property A is associated
with the presence or absence of property B in a given population, where each
individual in the population either does or does not have property "A" and
either does or does not have property "B".
It gives us information such as (for an odds ratio of 1.81):
If you have lung cancer, your odds of being a smoker are 81% higher than if you
didn’t have lung cancer.
If you smoke, your odds of having lung cancer are 81% higher than if you
didn’t smoke.
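The odds ratio and the relative risk both come straight from the four cells of a 2x2 table. The sketch below uses invented counts (they do not reproduce the 81% figure above) and hypothetical function names.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: exposed with event,   b: exposed without event
    c: unexposed with event, d: unexposed without event
    """
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """Risk of the event among the exposed divided by risk among the unexposed."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts: 100 smokers (20 with lung cancer), 100 non-smokers (10 with)
or_value = odds_ratio(20, 80, 10, 90)
rr_value = relative_risk(20, 80, 10, 90)
print(or_value, rr_value)
```

The odds ratio stays the same if rows and columns are swapped, which is why it can be read in both directions; the relative risk cannot.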
KAPPA MEASURE OF AGREEMENT: (Two categorical variables with an equal number of
categories) commonly used in the medical literature to assess inter-rater
agreement (e.g. diagnostic classification from Rater 1 or Test 1: 0=not
depressed, 1=depressed; and the diagnostic classification of the same person
from Rater 2 or Test 2) Or Diagnosis from two different clinicians
Example of research question: How consistent are the diagnostic classifications
of the Edinburgh Postnatal Depression Scale and the Depression, Anxiety and
Stress Scale?
Assumption: Rater 1 and Rater 2 use the same number of categories.
Interpretation of output from Kappa
The main piece of information we are interested in is the table Symmetric
Measures, which shows that the Kappa Measure of Agreement value is .56, with a
significance of p < .0005. According to Peat (2001, p. 228), a value of .5 for
Kappa represents moderate agreement, above .7 represents good agreement, and
above .8 represents very good agreement. So in this example the level of
agreement between the classification of cases as depressed using the EPDS and
the DASS-Dep is moderate.
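Kappa compares the observed agreement with the agreement expected by chance: kappa = (Po - Pe) / (1 - Pe). A minimal sketch on a made-up 2x2 agreement table (the counts are not the EPDS/DASS data discussed above):

```python
def cohen_kappa(table):
    """Kappa for a square agreement table (rows = rater 1, columns = rater 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: share of cases on the diagonal
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical classifications of 100 people by two tests
table = [[40, 10],
         [10, 40]]
kappa = cohen_kappa(table)
print(round(kappa, 2))
```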
Nominal. For nominal data (no intrinsic order, such as Catholic, Protestant, and
Jewish), you can select Contingency coefficient, Phi (coefficient) and Cramér's
V, Lambda (symmetric and asymmetric lambdas and Goodman and Kruskal's tau),
and Uncertainty coefficient.
Contingency coefficient. A measure of association based on chi-square. The value
ranges between 0 and 1, with 0 indicating no association between the row and
column variables and values close to 1 indicating a high degree of association
between the variables. The maximum value possible depends on the number of rows
and columns in a table.
8.COMPARING MEANS FOR SCALE VARIABLES (FOR NORMALLY DISTRIBUTED DATA)
One Sample T Test (One scale variable)
[SPSS] COMPARE MEANS / ONE SAMPLE T TEST
Example of research question: Is there a significant difference between the exam
score average and 70?
H0: Average weight of herring’s body = 400 grams
Ha: Average weight of herring’s body ≠ 400 grams
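The one-sample t statistic is the distance between the sample mean and the hypothesized value, measured in standard errors: t = (mean - μ0) / (s / √n), with n - 1 degrees of freedom. A sketch with invented herring weights; the p-value is left to SPSS or a t table because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, mu0):
    """Return the t statistic and degrees of freedom for H0: mean = mu0."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)
    return (mean(values) - mu0) / standard_error, n - 1

# Hypothetical herring weights in grams, tested against 400 g
weights = [398, 402, 405, 395, 400, 406]
t_stat, df = one_sample_t(weights, 400)
print(round(t_stat, 4), df)
```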
Independent Samples T Test (two variables: one scale test variable and one
discrete grouping variable with only 2 values)
[SPSS] COMPARE MEANS / INDEPENDENT SAMPLES T TEST
Example of research question: Is there a significant difference in the mean self-
esteem scores for males and females?
H0: Average amount spent for males = Average amount spent for females
Ha: Average amount spent for males ≠ Average amount spent for females
Paired Samples T Test (two scale variables, each measuring the same feature, one
before and one after an action)
[SPSS] COMPARE MEANS / PAIRED SAMPLES T TEST
Example of research question: Does the medicine have a significant effect on lowering
the average blood sugar level?
Note: this test is also called the Dependent Samples t test, since both observations are
related. The 2 observations do not have to be taken before and after an event;
for example, we might use this test to check whether the average read score equals
the average write score, since both scores come from the same students.
H0: Average reaction time before drinking a beer = Average reaction time
after drinking a beer
Ha: Average reaction time before drinking a beer ≠ Average reaction time
after drinking a beer
One way ANOVA (one scale variable, one discrete variable with multiple values)
[SPSS] COMPARE MEANS / ONE WAY ANOVA
Example of research question: Is there a difference in optimism scores for young,
middle-aged and old participants?
H0: Average Weight of parsley plants is equal among fertilizers used
Ha: Average Weight of parsley plants is not equal among fertilizers used
Simple Linear Regression (two scale variables: one is independent (Input) and
the other is dependent (Output))
[SPSS] ANALYZE / REGRESSION / LINEAR
Example of research question: How much of the variance in life satisfaction scores
can be explained by self-esteem?
life satisfaction = a * self-esteem + b
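The coefficients a and b of the line above, and the share of variance explained (R²), can be obtained by least squares. A minimal sketch with made-up self-esteem and life satisfaction scores; the function name is an assumption of this example.

```python
def simple_linear_regression(x, y):
    """Least-squares slope a, intercept b and R squared for y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    a = sxy / sxx
    b = my - a * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared

# Hypothetical scores for five participants
self_esteem = [1, 2, 3, 4, 5]
life_satisfaction = [2, 4, 5, 4, 5]
a, b, r_squared = simple_linear_regression(self_esteem, life_satisfaction)
print(round(a, 2), round(b, 2), round(r_squared, 2))
```

Here R² = 0.6 would be read as "60% of the variance in life satisfaction is explained by self-esteem" for this invented data.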
Multiple Linear Regression (3 or more scale variables: two or more are
independent (Input) and one is dependent (Output))
[SPSS] ANALYZE / REGRESSION / LINEAR
Example of research question:
How much of the variance in life satisfaction scores can be explained by the
following set of variables: self-esteem, optimism and perceived control?
Which of these variables is a better predictor of life satisfaction?
life satisfaction = a1 * self-esteem + a2 * optimism + a3 * perceived control + b
If the data are not normally distributed, we should use nonparametric
alternatives such as the Wilcoxon Signed Rank Test, Kruskal-Wallis Test,
Friedman Test, Mann-Whitney U Test and others
Parametric Technique                            Nonparametric Technique
Independent-samples t-test                      Mann-Whitney U Test
Paired-samples t-test                           Wilcoxon Signed Rank Test
One-way between-groups ANOVA                    Kruskal-Wallis Test
One-way repeated-measures ANOVA                 Friedman Test
None                                            Chi-square for goodness of fit
None                                            Chi-square for independence
None                                            McNemar’s Test
None                                            Cochran’s Q Test
None                                            Kappa Measure of Agreement
Two-way analysis of variance (between groups)   None
Mixed between-within groups ANOVA               None
Multivariate analysis of variance (MANOVA)      None
Analysis of covariance                          None
One-way between-groups ANOVA: one independent variable, one dependent variable.
Two-way analysis of variance (between groups): two independent variables, one
dependent variable.
Binary Logistic Regression (one binary dependent variable, one or more independent
variables, either scale or categorical)
[SPSS] ANALYZE / REGRESSION / BINARY LOGISTIC
H0: the model adequately fits the data
Ha: the model does not adequately fit the data
Example of research question:
A catalog company wants to increase the proportion of mailings that result in
sales.
A doctor wants to accurately diagnose a possibly cancerous tumor.
A loan officer wants to know whether the next customer is likely to default.
9.ROC CURVE
[SPSS] ANALYZE / ROC CURVE
SENSITIVITY: Power to identify positives = TP / (TP + FN)
SPECIFICITY: Power to identify negatives = TN / (TN + FP)
FALSE POSITIVE RATE (Whole Model) (α) = FP / (FP + TN)
H0: using the predicted value is not better than guessing. True area = 0.5
Ha: using the predicted value is better than guessing. True area ≠ 0.5
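The quantities behind a ROC curve can be sketched as follows. Sensitivity and specificity are computed at one cut-off, while the area under the curve is computed here as the share of (positive, negative) pairs in which the positive case gets the higher predicted score, with ties counting as half. The labels, scores and the 0.5 cut-off are invented for the example.

```python
def rates_at_threshold(labels, scores, threshold):
    """Sensitivity, specificity and false positive rate at one cut-off."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    return tp / (tp + fn), tn / (tn + fp), fp / (fp + tn)

def area_under_curve(labels, scores):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = positive) and predicted probabilities
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
sensitivity, specificity, fpr = rates_at_threshold(labels, scores, 0.5)
auc = area_under_curve(labels, scores)
print(round(sensitivity, 3), round(specificity, 3), round(fpr, 3), round(auc, 3))
```

An area of 0.5 corresponds to guessing; the further the area is above 0.5, the better the predicted values separate positives from negatives.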
10.GRAPHICS AND PLOTS
CONTROL CHARTS
Control charts are a graphical aid for assessing variation in a manufacturing
process. By distinguishing between common and unusual variation, you can
determine whether a process is functioning normally or needs to be adjusted.
Q-Q PLOTS
Quantiles: values that divide the cases into some number of equal-sized groups.
Percentiles = 100 parts, Deciles = 10, Quintiles = 5, Quartiles = 4, Terciles = 3,
Median = 2 parts
A Q-Q plot plots the quantiles of a variable's distribution against the quantiles
of any of a number of test distributions. It is a graphical method for comparing
two probability distributions by plotting their quantiles against each other.
If the data are normally distributed, the points will fall along the diagonal line.
[SPSS] ANALYZE / DESCRIPTIVE STATISTICS / Q-Q PLOTS
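The points of a normal Q-Q plot can be generated by pairing each sorted observation with the standard normal quantile of its plotting position. The sketch below assumes Blom's plotting position (i - 3/8) / (n + 1/4); other proportion estimation formulas exist and give slightly different positions. The data values are made up.

```python
from statistics import NormalDist

def normal_qq_points(values):
    """Return (theoretical normal quantile, observed value) pairs."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    ordered = sorted(values)
    return [
        (std_normal.inv_cdf((i - 0.375) / (n + 0.25)), x)  # Blom's position
        for i, x in enumerate(ordered, start=1)
    ]

# Hypothetical measurements
points = normal_qq_points([2.1, 3.4, 1.9, 5.0, 4.2])
for theoretical, observed in points:
    print(round(theoretical, 3), observed)
```

Plotting observed against theoretical values and checking how closely they follow a straight line gives the same visual check as the SPSS chart.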
P-P PLOTS
Plots a variable's cumulative proportions, against the cumulative proportions
of any of a number of test distributions.
Probability plots are generally used to determine whether the distribution of a
variable matches a given distribution. If the selected variable matches the test
distribution, the points cluster around a straight line.
[SPSS] ANALYZE / DESCRIPTIVE STATISTICS / P-P PLOTS