SPSS software
1. What is SPSS?
“Statistical Package for the Social Sciences”
One of the most popular statistical
packages, which can perform highly
complex data manipulation and analysis with simple
instructions.
2. SPSS was first released in 1968.
SPSS Inc. was acquired by IBM in 2009.
SPSS is now owned by IBM.
4. Opening SPSS
The default window is the Data Editor.
There are two sheets in the window:
1. Data View 2. Variable View
6. Variable View window
Name
The first character of the variable
name must be alphabetic.
The name can be at most 64 characters long.
Spaces are NOT allowed.
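The naming rules above can be checked mechanically. The following is a minimal sketch, not SPSS's own validation: the function name is hypothetical, the limit is treated as at most 64 characters, and the set of characters allowed after the first letter (letters, digits, underscore, period) is an assumption made for illustration.

```python
import re

# Rules from the slide: starts with a letter, no spaces, at most 64 characters.
# Assumption: after the first letter only letters, digits, "_" and "." are accepted.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_.]*")

def is_valid_variable_name(name: str) -> bool:
    """Return True if `name` passes the three rules listed on the slide."""
    return len(name) <= 64 and _NAME_RE.fullmatch(name) is not None

print(is_valid_variable_name("age_group"))   # True
print(is_valid_variable_name("1st_score"))   # False: starts with a digit
print(is_valid_variable_name("test score"))  # False: contains a space
```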
7. Type
The two basic types of variables that you will use are
numeric and string.
8. Width
Width allows you to determine the number of
characters SPSS will allow to be entered for
the variable
9. Decimals
Number of decimal places.
It has to be less than or equal to 16.
Example: 3.14159265 has 8 decimal places.
10. Label
A description of the variable.
Labels can contain spaces and can be up
to 256 characters long.
11. Values (coding)
This is used to indicate which
numbers represent which categories
when the variable represents a
category.
12. Defining the value labels
Click the cell in the Values column.
The value and the label can each be up to 60
characters.
After defining each value, click Add, and then click OK.
13. MISSING
Missing value:
• Not captured in the data set: errors in
feeding ...
• Empty value: no value in the population
15. Sample Questionnaire
Gender: Male / Female
Age: below 25 / 26-35 / 36-45 / 46-55
Category: Student / Faculty / Research scholar
How did you come to know about this
program? (multiple choice)
Mail / WhatsApp / Friends / Website
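Before entering questionnaire answers like these, each category is usually given a numeric code, and a multiple-choice question is split into one 0/1 variable per option. The sketch below illustrates that idea in Python. The specific codes, the `src_` prefix and the function names are hypothetical choices, not values prescribed by the slides.

```python
# Hypothetical numeric codes; the slides do not prescribe specific values.
GENDER = {1: "Male", 2: "Female"}
CATEGORY = {1: "Student", 2: "Faculty", 3: "Research scholar"}
SOURCES = ["Mail", "WhatsApp", "Friends", "Website"]

def code_single(answer, value_labels):
    """Return the numeric code whose label matches `answer` (case-insensitive)."""
    for code, label in value_labels.items():
        if label.lower() == answer.lower():
            return code
    return None  # no match: the cell stays empty, i.e. a missing value

def code_multiple(answers, options):
    """Turn a multiple-choice answer into one 0/1 variable per option."""
    chosen = {a.lower() for a in answers}
    return {f"src_{opt.lower()}": int(opt.lower() in chosen) for opt in options}

row = {
    "gender": code_single("Female", GENDER),
    "category": code_single("Faculty", CATEGORY),
}
row.update(code_multiple(["Mail", "Friends"], SOURCES))
print(row)
```

Each key of `row` would correspond to one row in Variable View, with the dictionaries above entered as value labels.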
17. Saving the data
To save the data file, click ‘File’ and then click
‘Save As.’ You can save the file in different
formats by choosing one under “Save as type.”
18. Opening the sample data
Open ‘Employee data.sav’, one of the sample files installed with SPSS.
Go to “File,” “Open,” and click “Data.”
30. Leave Model set to "Alpha", which represents
Cronbach's alpha in SPSS Statistics. If you want to
provide a name for the scale, enter it in the Scale label box.
32. Click the Continue button, then
click the OK button to generate the output.
35. Cronbach's Alpha
Usual range: .00 to 1.0
.00 = no consistency in measurement
1.0 = perfect consistency in measurement
.70 = 70% of the variance in the scores is reliable
(acceptable level: .70)
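For readers who want to see what the Alpha model computes, here is a minimal sketch using the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. It uses only the Python standard library. The function name and the data layout (one list of scores per item) are assumptions made for illustration.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for `items`, a list with one list of scores per item.

    All items must have been answered by the same respondents, in the same order.
    """
    k = len(items)
    n = len(items[0])
    if k < 2 or any(len(col) != n for col in items):
        raise ValueError("need at least two items of equal length")
    # Total score of each respondent across all items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Two items that rank the respondents identically give perfect consistency.
print(cronbach_alpha([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]))
```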
39. Hypotheses
Ho = Males and females have the same academic score.
Ha = Males and females do not have the same academic score.
40. Rule of Thumb
If sig. value is > .05: fail to reject Ho
If sig. value is < .05: reject Ho and accept Ha
41. Conditions:
The dependent variable should be measured on an
interval or ratio scale.
The independent variable should consist of two
categorical groups (nominal).
There needs to be homogeneity of variances
(checked with Levene's test for homogeneity).
46. OUTPUT
If Levene's sig. value is > .05, the t-test should be based on
equal variances assumed; otherwise read the row for equal variances not assumed.
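The two versions of the independent-samples t statistic can be sketched as follows: a pooled-variance version for when equal variances are assumed, and Welch's version for when they are not. This is an illustrative standard-library sketch with a hypothetical function name. It returns the t statistic and degrees of freedom only. The significance value that SPSS prints also needs the t distribution, which is left out here.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(a, b, equal_var=True):
    """Return (t, df) for two independent samples `a` and `b`."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 in the denominator)
    diff = mean(a) - mean(b)
    if equal_var:
        # Pooled variance: used when Levene's test is not significant.
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = sqrt(pooled * (1 / na + 1 / nb))
        df = na + nb - 2
    else:
        # Welch's version with Welch-Satterthwaite degrees of freedom.
        se = sqrt(va / na + vb / nb)
        df = (va / na + vb / nb) ** 2 / (
            (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
        )
    return diff / se, df

# Hypothetical scores for two groups.
group_1 = [1, 2, 3, 4, 5]
group_2 = [2, 3, 4, 5, 6]
print(independent_t(group_1, group_2, equal_var=True))
print(independent_t(group_1, group_2, equal_var=False))
```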
48. Paired Samples t Test
Detects a difference between the means of two
related (dependent) measurements.
E.g.:
Measure the employees' performance before and after
training.
49. Conditions
The dependent variable should be on an interval or ratio scale.
Two related (dependent) groups
Normality (of the paired differences)
Homogeneity of variances
53. The sig. value is .000 (that is, p < .001),
which is < .05.
Reject Ho.
There is a significant difference between before and
after training.
We can conclude that the training program is effective.
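The paired-samples t statistic is computed from the differences between each person's two scores: t = mean(d) / (sd(d) / sqrt(n)), with n − 1 degrees of freedom. A minimal standard-library sketch follows. The function name and the before/after scores are hypothetical and not taken from the slides' data.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, df) for paired measurements on the same subjects."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical performance scores for four employees.
before = [1, 2, 3, 4]
after = [2, 4, 5, 5]
print(paired_t(before, after))
```

A positive t here means the scores after training are higher on average, which is worth checking before concluding that the training was effective.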
56. Examples
How strongly are sales related to
advertising expenditure?
Work experience and output.
Exam performance and preparation
time.
Online test score and students' work
experience.
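Questions like these are answered with a correlation coefficient. As an illustration, Pearson's r can be computed by hand as the sum of cross-products divided by the square root of the product of the two sums of squares. The sketch below uses only the standard library. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical advertising expenditure and sales figures.
ad_spend = [10, 20, 30, 40, 50]
sales = [12, 25, 29, 41, 55]
print(pearson_r(ad_spend, sales))
```

Values near +1 or −1 indicate a strong linear relationship, and values near 0 indicate a weak one.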
65. ANOVA is used to determine whether
there are any significant differences
between the means of more than two
independent groups.
Example: to understand whether exam performance
differed based on students' community type
(urban, rural, semi-urban).
66. Conditions:
The dependent variable should be
measured at the interval or ratio
level.
The independent variable should consist
of three or more categorical groups.
Normality
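The F statistic behind a one-way ANOVA compares the variation between group means with the variation within groups. A minimal standard-library sketch is shown below. The function name and the scores for the three community types are hypothetical. Only F and its degrees of freedom are returned, not the significance value.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each score around its own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical exam scores by community type.
urban = [1, 2, 3]
rural = [2, 3, 4]
semi_urban = [3, 4, 5]
print(one_way_anova([urban, rural, semi_urban]))
```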
70. Post hoc
The ANOVA result tells us whether there is a difference
between at least two of the groups, but it does not tell us
which groups differ.
Post hoc tests help us identify which groups are different.
74. Exercise 1
Do students in each of the three
groups of community type have
similar academic score?
75. Two-way ANOVA
Tests for significant differences in a dependent variable
across two independent variables.
Example
Scores: dependent
Age: independent
Gender: independent
76. Conditions:
The dependent variable should be on an interval or ratio scale.
The independent variables should be on a nominal
scale.
Two independent variables
Homogeneity (equality) of
variances (Levene's test)
77. Hypotheses
H1 = Gender has no significant effect
on students' academic score.
H2 = Age has no significant effect on
score.
H3 = The gender and age interaction has
no significant effect on score.
80. Means
Displays the overall mean,
the means for each level of
duration, the means for each
level of modality, and the
means for each
combination of duration by
modality (= the interaction
means).
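The same four kinds of means (overall, per level of each factor, and per combination) can be reproduced by grouping the scores. The sketch below is only an illustration: the function name, the generic factor names `factor_a` and `factor_b`, and the level names and scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def means_table(rows):
    """rows: iterable of (level_a, level_b, score) tuples."""
    by_a, by_b, by_cell = defaultdict(list), defaultdict(list), defaultdict(list)
    scores = []
    for a, b, score in rows:
        by_a[a].append(score)
        by_b[b].append(score)
        by_cell[(a, b)].append(score)
        scores.append(score)
    return {
        "overall": mean(scores),
        "factor_a": {k: mean(v) for k, v in by_a.items()},
        "factor_b": {k: mean(v) for k, v in by_b.items()},
        "cells": {k: mean(v) for k, v in by_cell.items()},  # interaction means
    }

# Hypothetical data: (duration, modality, score).
rows = [
    ("short", "online", 10),
    ("short", "classroom", 20),
    ("long", "online", 30),
    ("long", "classroom", 40),
]
print(means_table(rows))
```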
81. Output
Levene's test. This significant result means
the assumption of equal group variances
has not been met.
82. Output
In this case the analysis is not valid.
84. Exercise: sample 3
Does the place of operations influence
ROCE?
Does the industry type influence
ROCE?
97. Simple linear regression is used when we want to predict the value of a
variable (dependent) based on the value of another
variable (independent).
Use multiple regression if there is
more than one independent variable.
Example: How well can we predict a test score based
on work experience?
98. Conditions
Variables should be measured at the interval or ratio
scale.
There needs to be a linear relationship between the
two variables (check with a scatter plot).
No significant outliers (no observations that
deviate markedly from the rest of the data).
One independent and one dependent variable.
99. Examples
Exam performance can be predicted based on revision
time.
Cigarette consumption can be predicted based on
smoking duration.
Sales can be predicted based on advertising expenditure.
104. R is the correlation value.
R square = how much of the total variation in the
dependent variable can be explained by the
independent variable.
Here R square is 76.2%, which is very large.
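To make the meaning of R square concrete, the sketch below fits a one-predictor least-squares line and computes R square as 1 − (residual sum of squares / total sum of squares). It is a standard-library illustration with a hypothetical function name and made-up data. It does not reproduce the 76.2% result above.

```python
from statistics import mean

def simple_regression(x, y):
    """Return (slope, intercept, r_squared) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Unexplained variation (residuals) versus total variation in y.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical work experience (years) and test scores.
experience = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
slope, intercept, r2 = simple_regression(experience, score)
print(f"score = {intercept:.2f} + {slope:.2f} * experience, R square = {r2:.3f}")
```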
110. Cause and effect relationship
E.g., you could use multiple regression to understand
whether exam performance can be predicted based on
revision time, lecture attendance and work experience.
111. It helps you to determine the overall fit (variance
explained) of the model.
E.g. 2: sales can be predicted from salesperson experience and advertising
expenditure.
112. Conditions
The dependent variable should be on an interval or ratio scale.
You have two or more independent variables.
There needs to be a linear relationship between the
independent and dependent variables (check with scatter plots).
Your data must not show multicollinearity (check the correlations, R, between the independent variables).
There should be no significant outliers.
119. R is the multiple correlation coefficient.
The R2 value is the proportion of variance in the
dependent variable that can be explained by the
independent variables.
121. The table shows that the independent variables
statistically significantly predict the dependent
variable, F(4, 95) = 32.393, p < .0005 (i.e., the
regression model is a good fit of the data).
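The overall F test and R square are two views of the same fit: F = (R²/k) / ((1 − R²)/(n − k − 1)), where k is the number of independent variables and n the number of cases. The sketch below shows the conversion in both directions. The function names are hypothetical. Reading F(4, 95) as k = 4 and n = 100 assumes the model includes an intercept.

```python
def f_from_r_squared(r2, k, n):
    """Return (F, df1, df2) for a model with k predictors, n cases and an intercept."""
    df1, df2 = k, n - k - 1
    return (r2 / df1) / ((1 - r2) / df2), df1, df2

def r_squared_from_f(f, df1, df2):
    """Invert the formula above: the R square implied by an F statistic."""
    return f * df1 / (f * df1 + df2)

# R square implied by the reported F(4, 95) = 32.393 (derived here, not read from the output).
r2 = r_squared_from_f(32.393, 4, 95)
print(round(r2, 3))
print(f_from_r_squared(r2, k=4, n=100))
```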
123. Exploratory Factor Analysis: Basic Concepts
What is Factor Analysis?
The basic assumption of factor analysis is that for a
collection of observed variables there is a set
of underlying variables called factors (fewer in number than
the observed variables) that can explain the
interrelationships among those variables.
124. Exploratory Factor Analysis: Basic Concepts
Sample Size to Run a Factor Analysis
10-15 respondents per variable.
300 or more is a good sample size.
Run the Kaiser-Meyer-Olkin Measure of Sampling Adequacy (KMO) test.
139. Interpreting Exploratory Factor Analysis Output
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .930
Bartlett's Test of Sphericity
  Approx. Chi-Square: 19334.492
  df: 253
  Sig.: .000
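The degrees of freedom of Bartlett's test depend only on the number of variables p in the analysis: df = p(p − 1)/2, the number of distinct correlations among them. The short sketch below (function name hypothetical) shows that df = 253 corresponds to 23 questionnaire items.

```python
def bartlett_df(p):
    """Degrees of freedom of Bartlett's test of sphericity for p variables."""
    return p * (p - 1) // 2

print(bartlett_df(23))  # 253
```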
140. Communalities
Item  Initial  Extraction
Statistics makes me cry  1.000  .435
My friends will think I'm stupid for not being able to cope with SPSS  1.000  .414
Standard deviations excite me  1.000  .530
I dream that Pearson is attacking me with correlation coefficients  1.000  .469
I don't understand statistics  1.000  .343
I have little experience of computers  1.000  .654
All computers hate me  1.000  .545
I have never been good at mathematics  1.000  .739
My friends are better at statistics than me  1.000  .484
Computers are useful only for playing games  1.000  .335
I did badly at mathematics at school  1.000  .690
People try to tell you that SPSS makes statistics easier to understand but it doesn't  1.000  .513
I worry that I will cause irreparable damage because of my incompetence with computers  1.000  .536
Computers have minds of their own and deliberately go wrong whenever I use them  1.000  .488
Computers are out to get me  1.000  .378
I weep openly at the mention of central tendency  1.000  .487
I slip into a coma whenever I see an equation  1.000  .683
SPSS always crashes when I try to use it  1.000  .597
Everybody looks at me when I use SPSS  1.000  .343
I can't sleep for thoughts of eigenvectors  1.000  .484
I wake up under my duvet thinking that I am trapped under a normal distribution  1.000  .550
My friends are better at SPSS than I am  1.000  .464
142. Rotated Component Matrix
Loadings are listed under the component (1 to 4) on which each item loads most strongly.
Component 1
I have little experience of computers  .800
SPSS always crashes when I try to use it  .684
I worry that I will cause irreparable damage because of my incompetence with computers  .647
All computers hate me  .638
Computers have minds of their own and deliberately go wrong whenever I use them  .579
Computers are useful only for playing games  .550
Computers are out to get me  .459
Component 2
I can't sleep for thoughts of eigenvectors  .677
I wake up under my duvet thinking that I am trapped under a normal distribution  .661
Standard deviations excite me  -.567
People try to tell you that SPSS makes statistics easier to understand but it doesn't  .523 (also .473 on component 1)
I dream that Pearson is attacking me with correlation coefficients  .516
I weep openly at the mention of central tendency  .514
Statistics makes me cry  .496
I don't understand statistics  .429
Component 3
I have never been good at mathematics  .833
I slip into a coma whenever I see an equation  .747
I did badly at mathematics at school  .747
Component 4
My friends are better at statistics than me  .648
My friends are better at SPSS than I am  .645
If I'm good at statistics my friends will think I'm a nerd  .586
My friends will think I'm stupid for not being able to cope with SPSS  .543
Everybody looks at me when I use SPSS  .428
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
143. Factors
Fear of Computers
I have little experience of computers  .800
SPSS always crashes when I try to use it  .684
I worry that I will cause irreparable damage because of my incompetence with computers  .647
All computers hate me  .638
Computers have minds of their own and deliberately go wrong whenever I use them  .579
Computers are useful only for playing games  .550
Computers are out to get me  .459
Fear of Statistics
I can't sleep for thoughts of eigenvectors  .677
I wake up under my duvet thinking that I am trapped under a normal distribution  .661
Standard deviations excite me  -.567
People try to tell you that SPSS makes statistics easier to understand but it doesn't  .523 (also .473 on Fear of Computers)
I dream that Pearson is attacking me with correlation coefficients  .516
I weep openly at the mention of central tendency  .514
Statistics makes me cry  .496
I don't understand statistics  .429
Fear of Maths
I have never been good at mathematics  .833
I slip into a coma whenever I see an equation  .747
I did badly at mathematics at school  .747
Peer Evaluation
My friends are better at statistics than me  .648
My friends are better at SPSS than I am  .645
If I'm good at statistics my friends will think I'm a nerd  .586
My friends will think I'm stupid for not being able to cope with SPSS  .543