Conjoint Analysis:

Conjoint analysis is a marketing research technique for determining customer preferences. It is used to
analyse how customers value the different attributes of a product (or service), and so gives an insight
into the trade-offs they are willing to make among those attributes. Put simply, it tells us how much
each feature of a product is worth to consumers.

The study surveys people with a set of attribute combinations, which the survey-takers rank or score by
preference. The responses are then analysed to model customer preferences for different combinations
of attributes. The attributes are termed factors and their different values are termed levels.

In the example we use to demonstrate conjoint analysis in SPSS, we analysed data on a carpet cleaner,
taking Price, Brand, Money-back guarantee, Package design and Seal as the attributes on which
consumers base their preferences. Using two data sets, we estimate the part-worths and derive the
relative weight of each attribute from the preferences that the respondents provided.

Variable name   Variable label            Value label
package         package design            A*, B*, C*
brand           brand name                K2R, Glory, Bissell
price           price                     $1.19, $1.39, $1.59
seal            Good Housekeeping seal    no, yes
money           money-back guarantee      no, yes

Code to import the data and run the analysis:

CONJOINT PLAN='C:UsersAbhiDesktopcarpet_plan.sav'

Model Description

            N of Levels    Relation to Ranks or Scores
package               3    Discrete
brand                 3    Discrete
price                 3    Linear (less)
seal                  2    Linear (more)
money                 2    Linear (more)

Calculation of the part-worth of each attribute

                      Utility Estimate    Std. Error
package   A*                    -2.233          .192
          B*                     1.867          .192
          C*                      .367          .192
brand     K2R                     .367          .192
          Glory                  -.350          .192
          Bissell                -.017          .192
price     $1.19                 -6.595          .988
          $1.39                 -7.703         1.154
          $1.59                 -8.811         1.320
seal      no                     2.000          .287
          yes                    4.000          .575
money     no                     1.250          .287
          yes                    2.500          .575
(Constant)                      12.870         1.282

This table shows the utility (part-worth) scores and their standard errors for each factor level. Higher
utility values indicate greater preference. For the discrete factors (package and brand), the part-worths
of the levels sum to zero, up to rounding. With respect to brand, K2R is preferred over Glory and
Bissell. As expected, there is an inverse relationship between price and utility, with higher prices
corresponding to lower utility. The presence of a seal of approval or a money-back guarantee
corresponds to a higher utility.

The total utility of a combination is the sum of the part-worths of its levels plus the constant. For
example, if the cleaner had package design C*, brand Bissell, price $1.59, a seal of approval and a
money-back guarantee, the total utility would be:

0.367 + (−0.017) + (−8.811) + 4.000 + 2.500 + 12.870 = 10.909.
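The same sum can be checked with a few lines of Python. This is only a sketch: the part-worths are copied from the utilities table, while the names `part_worths`, `CONSTANT` and `total_utility` are our own and not part of SPSS.

```python
# Part-worths copied from the utilities table.
part_worths = {
    "package": {"A*": -2.233, "B*": 1.867, "C*": 0.367},
    "brand": {"K2R": 0.367, "Glory": -0.350, "Bissell": -0.017},
    "price": {"$1.19": -6.595, "$1.39": -7.703, "$1.59": -8.811},
    "seal": {"no": 2.000, "yes": 4.000},
    "money": {"no": 1.250, "yes": 2.500},
}
CONSTANT = 12.870


def total_utility(profile):
    """Add the part-worth of each chosen level to the constant."""
    return CONSTANT + sum(part_worths[f][level] for f, level in profile.items())


profile = {
    "package": "C*",
    "brand": "Bissell",
    "price": "$1.59",
    "seal": "yes",
    "money": "yes",
}
utility = total_utility(profile)
print(f"{utility:.3f}")  # 10.909
```

Any other combination of levels can be scored by changing the entries of `profile`.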

Importance Values

package     35.635
brand       14.911
price       29.410
seal        11.172
money        8.872

The package attribute has the highest importance, followed by price; the money-back guarantee is of
least concern to the consumer. The values are computed by taking the utility range for each factor
separately and dividing by the sum of the utility ranges for all factors. The values thus represent
percentages and sum to 100.
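The range-over-sum rule can be sketched in Python as below. This is an illustration only, applied to the summary utilities from the utilities table. As far as we understand, SPSS computes the importance for each subject separately and then averages, so the figures from this sketch are not expected to match the Importance Values table. The names `levels`, `ranges` and `importance` are our own.

```python
# Summary utilities per factor, copied from the utilities table.
levels = {
    "package": [-2.233, 1.867, 0.367],
    "brand": [0.367, -0.350, -0.017],
    "price": [-6.595, -7.703, -8.811],
    "seal": [2.000, 4.000],
    "money": [1.250, 2.500],
}

# Utility range of each factor: highest part-worth minus lowest.
ranges = {factor: max(u) - min(u) for factor, u in levels.items()}
total_range = sum(ranges.values())

# Importance as a percentage of the summed ranges.
importance = {factor: 100 * r / total_range for factor, r in ranges.items()}

for factor, value in importance.items():
    print(f"{factor:8s} {value:6.2f}")
```

By construction the percentages add up to 100, which is the property described in the text.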

         B Coefficient
price           -5.542
seal             2.000
money            1.250

The utility for a particular factor level is determined by multiplying the level by the coefficient. For
example, the predicted utility for a price of $1.19 was listed as −6.595 in the utilities table. This is
simply the value of the price level, 1.19, multiplied by the price coefficient, −5.542.
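The multiplication can be repeated for every level of the linear factors. The sketch below is ours, not SPSS output. The coding of "no" and "yes" as 1 and 2 is an assumption inferred from the utilities table, and the last digit can differ from that table because the printed coefficient is rounded.

```python
# B coefficients copied from the coefficient table.
B = {"price": -5.542, "seal": 2.000, "money": 1.250}


def linear_utility(level_value, coefficient):
    """Utility of a level of a linear factor: level value times coefficient."""
    return level_value * coefficient


for price in (1.19, 1.39, 1.59):
    print(f"price ${price:.2f}: {linear_utility(price, B['price']):.3f}")

# Assumption: "no" and "yes" are coded 1 and 2, which is what the utilities
# table implies (seal: 2.000 and 4.000; money: 1.250 and 2.500).
for factor in ("seal", "money"):
    for label, code in (("no", 1), ("yes", 2)):
        print(f"{factor} {label}: {linear_utility(code, B[factor]):.3f}")
```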

This table provides measures of the correlation between the observed and estimated preferences.

Preference Scores of Simulations

Card Number    ID     Score
1               1    10.258
2               2    14.292

The real power of conjoint analysis is the ability to predict preference for product profiles that
weren't rated by the subjects. These are referred to as simulation cases.

Preference Probabilities of Simulations

Card Number    ID    Maximum Utility    Bradley-Terry-Luce    Logit
1               1              30.0%                 43.1%    30.9%
2               2              70.0%                 56.9%    69.1%

The maximum utility model determines the probability as the number
of respondents predicted to choose the profile divided by the total
number of respondents. For each respondent, the predicted choice is
simply the profile with the largest total utility.
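The maximum utility rule described above can be sketched as follows. The per-respondent utilities are hypothetical numbers made up for illustration, since the individual utilities are not listed in this report; the function name `max_utility_shares` is our own, and ties are simply given to the first profile.

```python
# Hypothetical total utilities: one row per respondent, one column per
# simulated profile. These numbers are made up for illustration.
utilities = [
    [10.2, 14.1],
    [12.5, 11.0],
    [9.8, 15.3],
    [13.0, 13.9],
]


def max_utility_shares(rows):
    """Share of respondents predicted to choose each profile."""
    wins = [0] * len(rows[0])
    for row in rows:
        # Each respondent is predicted to choose the profile with the
        # largest total utility (the first one if there is a tie).
        wins[row.index(max(row))] += 1
    return [w / len(rows) for w in wins]


shares = max_utility_shares(utilities)
print(shares)  # [0.25, 0.75]
```

With the ten respondents of the study, a split of 3 against 7 predicted choices would give the 30.0% and 70.0% shown in the Maximum Utility column.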

Number of Reversals

Factor      price                       3
            money                       2
            seal                        2
            brand                       0
            package                     0
Subject     1      Subject 1            1
            2      Subject 2            2
            3      Subject 3            0
            4      Subject 4            0
            5      Subject 5            0
            6      Subject 6            1
            7      Subject 7            0
            8      Subject 8            0
            9      Subject 9            1
            10     Subject 10           2

This table displays the number of reversals for each factor and for each subject. For example, three
subjects showed a reversal for price. That is, they preferred product profiles with higher prices.

Reversal Summary

N of Reversals    N of Subjects
1                             3
2                             2
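The reversal summary is just a tally of the per-subject counts in the Number of Reversals table. A small sketch with the counts copied from that table (the variable names are our own):

```python
from collections import Counter

# Reversals per subject and per factor, copied from the table above.
reversals_by_subject = {1: 1, 2: 2, 3: 0, 4: 0, 5: 0, 6: 1, 7: 0, 8: 0, 9: 1, 10: 2}
reversals_by_factor = {"price": 3, "money": 2, "seal": 2, "brand": 0, "package": 0}

# How many subjects showed 0, 1 or 2 reversals.
summary = Counter(reversals_by_subject.values())
print(summary[1], summary[2])  # 3 2

# Both views of the table count the same reversals in total.
total_by_subject = sum(reversals_by_subject.values())
total_by_factor = sum(reversals_by_factor.values())
print(total_by_subject, total_by_factor)  # 7 7
```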

Q. Perform Discriminant Analysis on the given dataset.

The chosen dataset describes a set of people who were given bank loans. It records whether or not each person defaulted, together with various characteristics of that person.

GET
  FILE='E:VGSOMSTUDYSECOND SEMBRMSPSS16Samplesbankloan.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.

Output Created                                       04-Apr-2013 18:39:05
Comments
Input                    Data                        E:VGSOMSTUDYSECOND SEMBRMSPSS16Samplesbankloan.sav
                         Active Dataset              DataSet1
                         File Label                  Bank Loan Default
                         Filter                      <none>
                         Weight                      <none>
                         Split File                  <none>
                         N of Rows in Working
                         Data File
Missing Value Handling   Definition of Missing       User-defined missing values are treated as missing
                                                     in the analysis phase.
                         Cases Used                  In the analysis phase, cases with no user- or
                                                     system-missing values for any predictor variable
                                                     are used. Cases with user-, system-missing, or
                                                     out-of-range values for the grouping variable are
                                                     always excluded from the analysis.
Syntax                                               DISCRIMINANT
                                                       /GROUPS=default(0 1)
                                                       /VARIABLES=employ address age
                                                       /ANALYSIS ALL
                                                       /PRIORS EQUAL
                                                       /STATISTICS=MEAN STDDEV UNIVF BOXM COEFF CORR TABLE
                                                       /PLOT=COMBINED
                                                       /CLASSIFY=NONMISSING POOLED MEANSUB.
Resources                Processor Time              00:00:00.047
                         Elapsed Time                00:00:00.121

[DataSet1] E:VGSOMSTUDYSECOND SEMBRMSPSS16Samplesbankloan.sav

All-Groups Stacked Histogram is no longer displayed.

Analysis Case Processing Summary

Unweighted Cases                                                  N    Percent
Valid                                                           700       82.4
Excluded   Missing or out-of-range group codes                  150       17.6
           At least one missing discriminating variable           0         .0
           Both missing or out-of-range group codes and
           at least one missing discriminating variable           0         .0
           Total                                                150       17.6
Total                                                           850      100.0

Group Statistics

                                                                       Valid N (listwise)
Previously defaulted                     Mean    Std. Deviation    Unweighted    Weighted
No      Years with current employer      9.51             6.664           517     517.000
        Years at current address         8.95             7.001           517     517.000
        Age in years                    35.51             7.708           517     517.000
Yes     Years with current employer      5.22             5.543           183     183.000
        Years at current address         6.39             5.925           183     183.000
        Age in years                    33.01             8.518           183     183.000
Total   Years with current employer      8.39             6.658           700     700.000
        Years at current address         8.28             6.825           700     700.000
        Age in years                    34.86             7.997           700     700.000

Tests of Equality of Group Means

                               Wilks' Lambda         F    df1    df2    Sig.
Years with current employer             .920    60.759      1    698    .000
Years at current address                .973    19.402      1    698    .000
Age in years                            .981    13.482      1    698    .000

Pooled Within-Groups Matrices

                                              Years with          Years at
                                              current employer    current address    Age in years
Correlation   Years with current employer                1.000               .292            .524
              Years at current address                    .292              1.000            .588
              Age in years                                .524               .588           1.000

This matrix shows the correlations between the predictors. The largest correlation is between age and
years at current address (.588), followed by age and years with current employer (.524).

Analysis 1
Box's Test of Equality of Covariance Matrices

Log Determinants

Previously defaulted     Rank    Log Determinant
No                          3             11.012
Yes                         3             10.501
Pooled within-groups        3             10.919

The ranks and natural logarithms of determinants printed are those of the group covariance matrices.

Test Results

Box's M                  28.171
F         Approx.         4.665
          df1                 6
          df2           7.335E5
          Sig.             .000

Tests null hypothesis of equal population covariance matrices.

Summary of Canonical Discriminant Functions

Eigenvalues

Function    Eigenvalue    % of Variance    Cumulative %    Canonical Correlation
1                .100a            100.0           100.0                     .301

a. First 1 canonical discriminant functions were used in the analysis.

Wilks' Lambda

Test of Function(s)    Wilks' Lambda    Chi-square    df    Sig.
1                               .909        66.251     3    .000

Standardized Canonical Discriminant Function Coefficients

                               Function 1
Years with current employer
Years at current address             .436
Age in years                        -.330

Structure Matrix

                               Function 1
Years with current employer
Years at current address             .528
Age in years                         .440

Pooled within-groups correlations between discriminating variables and standardized canonical
discriminant functions. Variables ordered by absolute size of correlation within function.

Functions at Group Centroids

Previously defaulted    Function 1
No                            .188
Yes                          -.530

Unstandardized canonical discriminant functions evaluated at group means.

Classification Statistics

Classification Processing Summary

Processed                                                      850
Excluded    Missing or out-of-range group codes
            At least one missing discriminating variable
Used in Output                                                 850

Prior Probabilities for Groups

                                   Cases Used in Analysis
Previously defaulted    Prior      Unweighted    Weighted
No                       .500             517     517.000
Yes                      .500             183     183.000
Total                   1.000             700     700.000

Classification Function Coefficients

                                  Previously defaulted
                                       No         Yes
Years with current employer          -.192       -.302
Years at current address             -.302       -.348
Age in years                          .797        .827
(Constant)                         -12.588     -12.444

Fisher's linear discriminant functions

Classification Results a

                                                Predicted Group Membership
                       Previously defaulted          No        Yes      Total
Original     Count     No                           300        217        517
                       Yes                           44        139        183
                       Ungrouped cases               81         69        150
             %         No                          58.0       42.0      100.0
                       Yes                         24.0       76.0      100.0
                       Ungrouped cases             54.0       46.0      100.0

a. 62.7% of original grouped cases correctly classified.

The discriminant analysis shows that persons who have previously defaulted are mostly predicted to
default again, while those who have not defaulted earlier are mostly predicted not to default. This
follows from the counts on the diagonal of the table exceeding those off the diagonal: 139 > 44 for
previous defaulters and 300 > 217 for non-defaulters.
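The percentages in the classification table, including the 62.7% in footnote a, can be recomputed from the counts. The sketch below uses only the grouped cases (No and Yes rows); the names `confusion`, `hit_rate` and `per_group` are our own.

```python
# Rows: actual group, columns: predicted group (counts from the table).
confusion = {
    "No": {"No": 300, "Yes": 217},
    "Yes": {"No": 44, "Yes": 139},
}

# Overall share of grouped cases that land on the diagonal.
correct = sum(confusion[group][group] for group in confusion)
total = sum(sum(row.values()) for row in confusion.values())
hit_rate = 100 * correct / total

# Share correctly classified within each actual group.
per_group = {
    group: 100 * row[group] / sum(row.values()) for group, row in confusion.items()
}

print(f"overall: {hit_rate:.1f}%")  # overall: 62.7%
for group, pct in per_group.items():
    print(f"{group}: {pct:.1f}%")
```

The per-group figures show that the model identifies previous defaulters (76.0%) better than non-defaulters (58.0%).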

Q. Perform Factor Analysis on the given dataset.

The chosen dataset is a fictional statistics anxiety questionnaire. It contains the responses given
by students regarding their ease of use, liking and usage of SPSS in statistics.

Using the scree plot, I have chosen 5 factors.

Since a student's answers to different items are likely to be related, I considered the factors to be
inter-related and hence used Oblimin rotation. For example, a student who gives a high score to
"I have little experience of computers" is likely to give a high score to "All computers hate me",
as the two variables are somewhat correlated.

Using the SPSS options, the following Pattern Matrix was generated.
                                        Pattern Matrix a

                                                               Component
                                                     1       2       3       4       5
I have little experience of computers              .903
SPSS always crashes when I try to use it           .732
All computers hate me                              .684
I worry that I will cause irreparable damage       .662
  because of my incompetence with computers
Computers have minds of their own and              .581
  deliberately go wrong whenever I use them
People try to tell you that SPSS makes             .446
  statistics easier to understand but it doesn't
Computers are out to get me                        .333
My friends are better at SPSS than I am                    .661
My friends are better at statistics than me                .655
If I'm good at statistics my friends will                  .622
  think I'm a nerd
My friends will think I'm stupid for not                   .504    .330
  being able to cope with SPSS
Everybody looks at me when I use SPSS                      .358    .358
I can't sleep for thoughts of eigenvectors                        -.728
I wake up under my duvet thinking that I am        .324           -.543
  trapped under a normal distribution
Computers are useful only for playing games        .359            .393           -.366
Standard deviations excite me                              .301    .356            .315
I have never been good at mathematics                                     -.855
I did badly at mathematics at school                                      -.736
I slip into a coma whenever I see an equation                             -.722
Statistics makes me cry                                                           -.772
I don't understand statistics                                                     -.730
I weep openly at the mention of central tendency                                  -.664
I dream that Pearson is attacking me with                                         -.564
  correlation coefficients

Extraction Method: Principal Component Analysis.
Rotation Method: Oblimin with Kaiser Normalization.

a. Rotation converged in 15 iterations.

The total variance explained by each factor is given below.

Total Variance Explained

             Rotation Sums of Squared Loadings a
Component    Total
1            5.522
2            2.452
3            2.383
4            3.535
5            4.913

Extraction Method: Principal Component Analysis.

The percentage is calculated by taking the sum of squared loadings of the factor, dividing it by the
number of variables and multiplying by 100.
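Applying this formula to the rotation sums of squared loadings gives the figures below. This is a sketch of our own: the count of 23 variables is taken from the number of rows in the Pattern Matrix, and because Oblimin is an oblique rotation the components are correlated, so the resulting shares overlap and should not be added together.

```python
# Rotation sums of squared loadings, copied from the table above.
rotation_ssl = {1: 5.522, 2: 2.452, 3: 2.383, 4: 3.535, 5: 4.913}

# Number of questionnaire items, counted from the Pattern Matrix rows.
N_VARIABLES = 23

# Sum of squared loadings divided by the number of variables, times 100.
pct_variance = {
    factor: 100 * ssl / N_VARIABLES for factor, ssl in rotation_ssl.items()
}

for factor, pct in pct_variance.items():
    print(f"Factor {factor}: {pct:.2f}%")
```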

Hence, depending on the loading values, the variables are assigned to factors as follows.

Factor    Variable Nos.
1         1, 2, 3, 4, 5, 6, 7, 14
2         8, 9, 10
3         13
4         17, 18, 19
5         20, 21, 22, 23

Variables 11, 12, 15 and 16 have very close loadings on more than one factor. This is undesirable,
because each such variable is assessing more than one construct. Variable 15 has exactly the same
value on both Factor 2 and Factor 3. These variables are said to have split loadings.

They are hence listed separately.

Factor    Variable Nos.
2         11, 16, 15
3         12, 15

As split loadings are present, this is not a simple structure.

Factor 1: Anxiety about the usage of computers. Its rotation sum of squared loadings is 5.522, which by
the formula above is 5.522 / 23 × 100, or about 24.0% of the total variance. It loads 8 of the variables.

Factor 2: Students' view of their understanding of statistics and SPSS relative to their peers. It
accounts for about 10.7% of the total variance (2.452 / 23 × 100) and loads 3 variables. It also split
loads variables 11, 16 and 15.

Factor 3: Anxiety about eigenvectors. It accounts for only about 10.4% of the total variance
(2.383 / 23 × 100) and loads only 1 variable directly, while it split loads variables 12 and 15.

Factor 4: Students' interest in mathematics. It accounts for about 15.4% of the total variance
(3.535 / 23 × 100) and loads 3 variables.

Factor 5: Dislike for statistics. It accounts for about 21.4% of the total variance (4.913 / 23 × 100)
and loads 4 variables.

Because Oblimin is an oblique rotation, the factors are correlated, so these shares overlap and should
not be added together.


Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the
same group (called a cluster) are more similar, in some sense, to each other than to those in other
groups (clusters). It is a main task of exploratory data mining and a common technique for statistical
data analysis, used in many fields including machine learning, pattern recognition, image analysis and
information retrieval.


Output Created                                                 02-Apr-2013 22:00:05
Input                     Data                                 C:Usersdev maletiaDownloadsClusterAnonFaculty.sav
                          Active Dataset                       DataSet3
                          Filter                               <none>
                          Weight                               <none>
                          Split File                           <none>
                          N of Rows in Working Data File       44
Missing Value Handling    Definition of Missing                User-defined missing values are treated as missing.
                          Cases Used                           Statistics are based on cases with no missing values
                                                               for any variable used.
Syntax                                                         PROXIMITIES Salary FTE Rank Articles Experience
                                                                 OUT('C:UsersDEVMAL~1AppDataLocalTempspss6496spssclus.tmp')
                                                                 /PRINT NONE
                                                                 /STANDARDIZE=VARIABLE Z.
Resources                 Processor Time                       00:00:00.078
                          Elapsed Time                         00:00:00.082
                          Workspace Bytes                      11152
Files Saved               Matrix File                          C:UsersDEVMAL~1AppDataLocalTempspss6496spssclus.tmp

The variables in the dataset are as follows:
• Name – Although faculty salaries are public information under North Carolina state law
• Salary – annual salary in dollars, from the university report available in One Stop.
• FTE – full-time equivalent work load for the faculty member.
• Rank – where 1 = adjunct, 2 = visiting, 3 = assistant, 4 = associate, 5 = professor.
• Articles – number of published scholarly articles, excluding things like comments in newsletters,
  abstracts in proceedings, and the like.
• Experience – number of years working as a full-time faculty member in a Department of Psychology.
• ArticlesAPD – number of published articles as listed in the university's Academic Publications
  Database.
• Sex – biological sex from physical appearance.

In the first step SPSS computes, for each pair of cases, the squared Euclidean distance between the
cases. This is quite simply the sum across variables (from i = 1 to v) of the squared difference between
the score on variable i for the one case (Xi) and the score on variable i for the other case (Yi). The two
cases which are separated by the smallest squared Euclidean distance are identified and then classified
together into the first cluster. At this point there is one cluster with two cases in it.

Next SPSS re-computes the squared Euclidean distances between each entity (case or cluster) and each
other entity. When one or both of the compared entities is a cluster, SPSS computes the average squared
Euclidean distance between members of the one entity and members of the other entity. The two
entities with the smallest squared Euclidean distance are classified together. SPSS then re-computes the
squared Euclidean distances between each entity and each other entity, and the two with the smallest
squared Euclidean distance are classified together. This continues until all of the cases have been
clustered into one big cluster.
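The procedure described above can be sketched in plain Python as follows. This is a minimal illustration of average linkage on squared Euclidean distances, not the SPSS implementation. The five cases are hypothetical and assumed to be already standardized, and the names `sq_euclidean` and `average_linkage` are our own.

```python
from itertools import combinations


def sq_euclidean(x, y):
    """Sum over variables of the squared difference between two cases."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def average_linkage(cases):
    """Merge the two closest entities until one cluster remains.

    Returns one (members_a, members_b, distance) tuple per stage.
    """
    clusters = [[i] for i in range(len(cases))]
    schedule = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average squared distance over all pairs of members.
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            d = sum(sq_euclidean(cases[i], cases[j]) for i, j in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        schedule.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return schedule


# Hypothetical standardized scores on two variables for five cases.
cases = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
schedule = average_linkage(cases)
for stage, (left, right, dist) in enumerate(schedule, start=1):
    print(stage, left, right, dist)
```

With n cases the loop always runs n - 1 stages, which is why 44 cases give 43 stages in the SPSS output.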

The output obtained can be seen below:

                     Case Processing Summary a

                              Cases
       Valid                 Missing                 Total
  N        Percent       N        Percent       N        Percent
  44        100.0%       0            .0%       44        100.0%

a. Squared Euclidean Distance used

In the first stage SPSS clustered case 32 with case 33. The squared Euclidean distance between these
two cases is 0.000. At stages 2-4 SPSS creates three more clusters, each containing two cases. At stage 5
SPSS adds case 39 to the cluster that already contains cases 37 and 38. By the 43rd stage all cases have
been clustered into one entity.
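The first few stages can be replayed to see which cases sit in which cluster. The sketch below is ours: the stage pairs are the Cluster 1 and Cluster 2 entries of stages 1 to 7 of the agglomeration schedule, and it assumes, as the schedule suggests, that a merged cluster keeps the lower case number as its label.

```python
# (Cluster 1, Cluster 2) for stages 1 to 7 of the agglomeration schedule.
stages = [(32, 33), (41, 42), (43, 44), (37, 38), (37, 39), (41, 43), (36, 37)]


def replay(stage_pairs):
    """Return the cluster membership after each stage."""
    clusters = {}
    snapshots = []
    for a, b in stage_pairs:
        # An entity that has not appeared yet is a single case.
        merged = clusters.pop(a, {a}) | clusters.pop(b, {b})
        clusters[min(a, b)] = merged
        snapshots.append({label: set(m) for label, m in clusters.items()})
    return snapshots


snapshots = replay(stages)
print(sorted(snapshots[4][37]))  # [37, 38, 39]
print(sorted(snapshots[6][36]))  # [36, 37, 38, 39]
```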
The results can be seen below:

Average Linkage (Between Groups)

                                 Agglomeration Schedule

             Cluster Combined                        Stage Cluster First Appears
Stage    Cluster 1    Cluster 2    Coefficients    Cluster 1    Cluster 2    Next Stage
1               32           33            .000            0            0             9
2               41           42            .000            0            0             6
3               43           44            .000            0            0             6
4               37           38            .000            0            0             5
5               37           39            .001            4            0             7
6               41           43            .002            2            3            27
7               36           37            .003            0            5            27
8               20           22            .007            0            0            11
9               30           32            .012            0            1            13
10              21           26            .012            0            0            14
11              20           25            .031            8            0            12
12              16           20            .055            0           11            14
13              29           30            .065            0            9            26
14              16           21            .085           12           10            20
15              11           18            .093            0            0            22
16               8            9            .143            0            0            25
17              17           24            .144            0            0            20
18              13           23            .167            0            0            22
19              14           15            .232            0            0            32
20              16           17            .239           14           17            23
21               7           12            .279            0            0            28
22              11           13            .441           15           18            29
23              16           27            .451           20            0            26
24               3           10            .572            0            0            28
25               6            8            .702            0           16            36
26              16           29            .768           23           13            35
27              36           41            .858            7            6            33
28               3            7            .904           24           21            31
29              11           28            .993           22            0            30
30               5           11           1.414            0           29            34
31               3            4           1.725           28            0            36
32              14           31           1.928           19            0            34
33              36           40           2.168           27            0            40
34               5           14           2.621           30           32            35
35               5           16           2.886           34           26            37
36               3            6           3.089           31           25            38
37               5           19           4.350           35            0            39
38               1            3           4.763            0           36            41
39               5           34           5.593           37            0            42
40              35           36           8.389            0           33            43
41               1            2           8.961           38            0            42
42               1            5          11.055           41           39            43

43                 1                     35                    17.237                      42                 40   0

                                       Cluster Membership

Case                  5 Clusters  4 Clusters  3 Clusters  2 Clusters

1:Rosalyn                      1           1           1           1
2:Lawrence                     2           2           1           1
3:Sunila                       1           1           1           1
4:Randolph                     1           1           1           1
5:Mickey                       3           3           2           1
6:Louis                        1           1           1           1
7:Tony                         1           1           1           1
8:Raul                         1           1           1           1
9:Catalina                     1           1           1           1
10:Johnson                     1           1           1           1
11:Beulah                      3           3           2           1
12:Martina                     1           1           1           1
13:Marie                       3           3           2           1
14:Ernest                      3           3           2           1
15:Christopher                 3           3           2           1
16:Ernie                       3           3           2           1
17:Christa                     3           3           2           1
18:Linette                     3           3           2           1
19:Bo                          3           3           2           1
20:Carla                       3           3           2           1
21:Alberto                     3           3           2           1
22:Christina                   3           3           2           1
23:Jonah                       3           3           2           1
24:Tucker                      3           3           2           1
25:Shanta                      3           3           2           1
26:Melissa                     3           3           2           1
27:Jenna                       3           3           2           1
28:Johnny                      3           3           2           1
29:Cleatus                     3           3           2           1
30:Jonas                       3           3           2           1
31:Tad                         3           3           2           1
32:Amaryllis                   3           3           2           1
33:Nathan                      3           3           2           1
34:Deanna                      3           3           2           1
35:Willy                       4           4           3           2
36:Deana                       5           4           3           2
37:Dea                         5           4           3           2
38:Claude                      5           4           3           2
39:Amanda                      5           4           3           2
40:Boris                       5           4           3           2
41:Garrett                     5           4           3           2
42:Stew                        5           4           3           2
43:Bree                        5           4           3           2
44:Karma                       5           4           3           2

Vertical Icicle:
The full vertical icicle plot cannot be displayed in this document, but its results are described below.
For the two-cluster solution, one cluster consists of ten cases (Boris through Willy, followed in the plot by
a column with no X’s). These were our adjunct (part-time) faculty (excepting one), and the second cluster
consists of everybody else.
For the three-cluster solution, the adjunct faculty remain one cluster and the others split into two: Deanna
through Mickey were our junior faculty, and Lawrence through Rosalyn our senior faculty.
For the four-cluster solution, one case (Lawrence) forms a cluster of his own.
The dendrogram below displays essentially the same information as the agglomeration schedule, but in graphic form.
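
The cluster sizes described above can be checked against the Cluster Membership table. The following is only an illustrative sketch: the case numbers are re-typed by hand from that table, and the names `membership`, `solutions` and `cluster_sizes` are assumptions of this example, not SPSS output.

```python
from collections import Counter

# Case numbers (1-44) grouped as in the Cluster Membership table.
group_a = [1, 3, 4, 6, 7, 8, 9, 10, 12]   # cluster 1 in the 5-cluster solution
group_b = [5, 11] + list(range(13, 35))   # cluster 3 in the 5-cluster solution
group_c = list(range(36, 45))             # cluster 5 in the 5-cluster solution


def membership(groups):
    """Map each case number to its cluster label."""
    return {case: label for label, cases in groups.items() for case in cases}


# One membership mapping per solution (5, 4, 3 and 2 clusters).
solutions = {
    5: membership({1: group_a, 2: [2], 3: group_b, 4: [35], 5: group_c}),
    4: membership({1: group_a, 2: [2], 3: group_b, 4: [35] + group_c}),
    3: membership({1: group_a + [2], 2: group_b, 3: [35] + group_c}),
    2: membership({1: group_a + [2] + group_b, 2: [35] + group_c}),
}


def cluster_sizes(k):
    """Number of cases in each cluster of the k-cluster solution."""
    return dict(sorted(Counter(solutions[k].values()).items()))


for k in (2, 3, 4, 5):
    print(k, cluster_sizes(k))
```

The two-cluster tally gives 34 and 10 cases, which matches the ten-case cluster described above, and the four-cluster tally shows the single-case cluster formed by Lawrence.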

 * * * * * * * * * * * H I E R A R C H I C A L       C L U S T E R     A N A L Y S I S * * * * * * * * * *

 Dendrogram using Average Linkage (Between Groups)

                                                    Rescaled Distance Cluster Combine

                C A S E                     0         5        10        15        20        25
  Label                               Num   +---------+---------+---------+---------+---------+

  Amaryllis                            32   ─┐
  Nathan                               33   ─┤
  Jonas                                30   ─┼─┐
  Cleatus                              29   ─┘ │
  Alberto                              21   ─┐ │
  Melissa                              26   ─┤ │
  Carla                                20   ─┤ ├─────┐
  Christina                            22   ─┤ │     │
  Shanta                               25   ─┤ │     │
  Ernie                                16   ─┤ │     │
  Christa                              17   ─┼─┘     │
  Tucker                               24   ─┤       │
  Jenna                                27   ─┘       ├───┐
  Beulah                               11   ─┐       │   │
  Linette                              18   ─┼─┐     │   │
  Marie                                13   ─┤ ├─┐   │   │
  Jonah                                23   ─┘ │ ├─┐ │   │
  Johnny                               28   ───┘ │ │ │   ├───┐
  Mickey                                5   ─────┘ ├─┘   │   │
  Ernest                               14   ─┬───┐ │     │   │
  Christopher                          15   ─┘   ├─┘     │   ├───────────────┐
  Tad                                  31   ─────┘       │   │               │
  Bo                                   19   ─────────────┘   │               │
  Deanna                               34   ─────────────────┘               │
  Raul                                  8   ─┬─┐                             │
  Catalina                              9   ─┘ ├─────┐                       ├───────────────┐
  Louis                                 6   ───┘     │                       │               │
  Tony                                  7   ─┬─┐     ├───┐                   │               │
  Martina                              12   ─┘ ├─┐   │   │                   │               │
  Sunila                                3   ─┬─┘ ├───┘   ├───────────┐       │               │
  Johnson                              10   ─┘   │       │           │       │               │
  Randolph                              4   ─────┘       │           ├───────┘               │
  Rosalyn                               1   ─────────────┘           │                       │
  Lawrence                              2   ─────────────────────────┘                       │
  Garrett                              41   ─┐                                               │
  Stew                                 42   ─┼─┐                                             │
  Bree                                 43   ─┤ │                                             │
  Karma                                44   ─┘ ├───┐                                         │
  Dea                                  37   ─┐ │   │                                         │
  Claude                               38   ─┤ │   ├─────────────────┐                       │
  Amanda                               39   ─┼─┘   │                 │                       │
  Deana                                36   ─┘     │                 ├───────────────────────┘
  Boris                                40   ───────┘                 │
  Willy                                35   ─────────────────────────┘
Multiple Regression Analysis
In this analysis we use a data file that was created by randomly sampling 400 elementary
schools from the California Department of Education's API 2000 dataset. This data file contains a
measure of school academic performance as well as other attributes of the elementary schools, such
as class size, enrolment and poverty.

We now perform a regression analysis using api00 as the outcome variable and the
variables acs_k3, meals and full as predictors. These variables measure the academic performance of
the school (api00), the average class size in kindergarten through 3rd grade (acs_k3), the percentage
of students receiving free meals (meals), which is an indicator of poverty, and the percentage of
teachers who have full teaching credentials (full). We expect that better academic performance would
be associated with smaller class sizes, fewer students receiving free meals, and a higher percentage
of teachers having full teaching credentials. The output is as follows:


Output Created                                               02-Apr-2013 21:48:19

Input                      Data                              C:\Users\Divij\Desktop\SPSS Data\elemapi.sav
                           Active Dataset                    DataSet5
                           Filter                            <none>
                           Weight                            <none>
                           Split File                        <none>
                           N of Rows in Working Data File    400

Missing Value Handling     Definition of Missing             User-defined missing values are treated as
                                                             missing.
                           Cases Used                        Statistics are based on cases with no missing
                                                             values for any variable used.

Syntax                                                       regression
                                                                 /dependent api00
                                                                 /method=enter acs_k3 meals full

Resources                  Processor Time                    00:00:00.063
                           Elapsed Time                      00:00:00.026
                           Memory Required                   2284 bytes
                           Additional Memory Required for    0 bytes
                           Residual Plots

Variables Entered/Removed

Model       Variables Entered                                          Variables Removed    Method

1           pct full credential, avg class size k-3, pct free meals    .                    Enter

a. All requested variables entered.

b. Dependent Variable: api 2000

Model Summary

Model       R             R Square        Adjusted R Square    Std. Error of the Estimate

1           .821          .674            .671                 64.153

a. Predictors: (Constant), pct full credential, avg class size k-3, pct free meals

Model                          Sum of Squares df                 Mean Square       F             Sig.

1           Regression         2634884.261       3               878294.754        213.407       .000

            Residual           1271713.209       309             4115.577

            Total              3906597.470       312

a. Predictors: (Constant), pct full credential, avg class size k-3, pct free meals

b. Dependent Variable: api 2000

                                      Unstandardized Coefficients         Standardized Coefficients

Model                                 B                Std. Error         Beta                         t              Sig.

1           (Constant)                906.739          28.265                                          32.080         .000

            avg class size k-3        -2.682           1.394              -.064                        -1.924         .055

            pct free meals            -3.702           .154               -.808                        -24.038        .000

            pct full credential       .109             .091               .041                         1.197          .232

a. Dependent Variable: api 2000
We now check whether each of the three predictors is statistically significant and, if so, the direction
of the relationship. The average class size (acs_k3, b=-2.682) is not significant at the 0.05 level
(p=0.055), but only just so, and the coefficient is negative, which would indicate that larger class sizes
are related to lower academic performance, which is what we would expect. Next, the effect of meals
(b=-3.702, p<.001, reported by SPSS as .000) is significant and its coefficient is negative, indicating
that the greater the proportion of students receiving free meals, the lower the academic performance.
We cannot say that free meals are causing lower academic performance. The meals variable is highly
related to income level and functions more as a proxy for poverty. Thus, higher levels of poverty are
associated with lower academic performance. Finally, the percentage of teachers with full credentials
(full, b=0.109, p=.232) seems to be unrelated to academic performance. This would seem to indicate
that the percentage of teachers with full credentials is not an important factor in predicting academic
performance, which is unexpected.
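
As a quick check on the Coefficients table, each t statistic is the unstandardized coefficient divided by its standard error, and the fitted equation can be applied to any set of predictor values. The sketch below uses only the figures reported in the table; the names `coefs`, `t_stats` and `predict_api00`, and the example inputs (class size 16, 67% free meals, 76% full credentials), are choices made for this illustration.

```python
# (B, Std. Error) pairs copied from the Coefficients table.
coefs = {
    "const": (906.739, 28.265),
    "acs_k3": (-2.682, 1.394),
    "meals": (-3.702, 0.154),
    "full": (0.109, 0.091),
}

# t = B / Std. Error for every term in the model.
t_stats = {name: b / se for name, (b, se) in coefs.items()}


def predict_api00(acs_k3, meals, full):
    """Fitted api00 from the estimated regression equation."""
    return (
        coefs["const"][0]
        + coefs["acs_k3"][0] * acs_k3
        + coefs["meals"][0] * meals
        + coefs["full"][0] * full
    )


for name, t in t_stats.items():
    print(f"{name:7s} t = {t:8.3f}")

# Hypothetical school: class size 16, 67% free meals, 76% full credentials.
print(round(predict_api00(16, 67, 76), 3))
```

The recomputed t values agree with the table to within the rounding of the printed coefficients, and the example school has a fitted api00 of about 624.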

From these results, we would conclude that smaller class sizes are related to higher performance, that
fewer students receiving free meals is associated with higher performance, and that the percentage of
teachers with full credentials is not related to academic performance in the schools. Before we
write this up as our finding, we should run checks to make sure we can firmly stand behind these
results.

Examining Data

Step 1)

To start examining the data, we look at the first 10 data points for the variables included in our
regression analysis, paying particular attention to the number of missing data points.

api00  acs_k3  meals    full

  693      16     67   76.00
  570      15     92   79.00
  546      17     97   68.00
  571      20     90   87.00
  478      18     89   87.00
  858      20      .  100.00
  918      19      .  100.00
  831      20      .   96.00
  860      20      .  100.00
  737      21     29   96.00

Number of cases read: 10            Number of cases listed: 10

We see that among the first 10 observations, we have four missing values for meals. Keeping this in
mind, we can use the descriptives command with /var=all to get descriptive statistics for all of the
variables, and pay special attention to the number of valid cases for meals.
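
The count of missing values in the listing can be reproduced with a few lines of code. This is a minimal sketch, assuming the ten listed rows are pasted in as text and that a lone "." marks a missing value, as in the SPSS listing; the names `listing`, `names` and `missing` are illustrative.

```python
# First 10 cases from the listing; "." marks a missing value.
listing = """\
693 16 67 76.00
570 15 92 79.00
546 17 97 68.00
571 20 90 87.00
478 18 89 87.00
858 20 . 100.00
918 19 . 100.00
831 20 . 96.00
860 20 . 100.00
737 21 29 96.00
"""

names = ["api00", "acs_k3", "meals", "full"]

# Count the "." entries column by column.
missing = {name: 0 for name in names}
for row in listing.splitlines():
    for name, value in zip(names, row.split()):
        if value == ".":
            missing[name] += 1

print(missing)
```

Only meals has missing entries among these ten rows, four in total.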

Step 2)

Descriptive Statistics

                                          N    Minimum    Maximum       Mean   Std. Deviation

school number                           400         58       6072    2866.81         1543.811
district number                         400         41        796     457.73          184.823
api 2000                                400        369        940     647.62          142.249
api 1999                                400        333        917     610.21          147.136
growth 1999 to 2000                     400        -69        134      37.41           25.247
pct free meals                          315          6        100      71.99           24.386
english language learners               400          0         91      31.45           24.839
year round school                       400          0          1        .23             .421
pct 1st year in school                  399          2         47      18.25            7.485
avg class size k-3                      398        -21         25      18.55            5.005
avg class size 4-6                      397         20         50      29.69            3.841
parent not hsg                          400          0        100      21.25           20.676
parent hsg                              400          0        100      26.02           16.333
parent some college                     400          0         67      19.71           11.337
parent college grad                     400          0        100      19.70           16.471
parent grad school                      400          0         67       8.64           12.131
avg parent ed                           381       1.00       4.62     2.6685           .76379
pct full credential                     400        .42     100.00    66.0568         40.29793
pct emer credential                     400          0         59      12.66           11.746
number of students                      400        130       1570     483.47          226.448
Percentage free meals in 3 categories   400          1          3       2.02             .819

Valid N (listwise)                      295

We now examine the output for the variables used in our regression analysis above,
namely api00, acs_k3, meals and full. For api00, the values range from 369 to 940 and
there are 400 valid values. For acs_k3, the average class size ranges from -21 to 25 and there are 2
missing values. An average class size of -21 cannot be correct. The variable meals ranges from 6%
getting free meals to 100% getting free meals, so these values seem reasonable, but there are only
315 valid values for this variable. The percentage of teachers who are fully credentialed ranges from
.42 to 100, and all 400 values are valid.
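
The same checks can be expressed as a short script. The sketch below is illustrative only: it re-types the valid N, minimum and maximum of the four regression variables from the Descriptive Statistics table, and the names `N_CASES`, `stats`, `missing` and `negative_min` are assumptions of this example.

```python
N_CASES = 400  # rows in the working data file

# (valid N, minimum, maximum) from the Descriptive Statistics table.
stats = {
    "api00": (400, 369, 940),
    "acs_k3": (398, -21, 25),
    "meals": (315, 6, 100),
    "full": (400, 0.42, 100.0),
}

# Missing values per variable = total cases minus valid cases.
missing = {name: N_CASES - n for name, (n, _, _) in stats.items()}

# A class size or a percentage can never be negative.
negative_min = [name for name, (_, low, _) in stats.items() if low < 0]

print(missing)
print(negative_min)
```

This flags 2 missing values for acs_k3, 85 for meals, and the impossible negative minimum for acs_k3.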

This examination has uncovered a number of peculiarities worthy of further investigation. We now
obtain a corrected data set from the same source, in which the shortcomings diagnosed above have
been fixed, and run another multiple regression on the new data set.
New Multiple regression analysis

For this multiple regression example, we will regress the dependent variable, api00, on all of the
predictor variables in the data set.


Output Created                                                 02-Apr-2013 22:54:47

Input                        Data                              C:\Users\Divij\Desktop\SPSS
                             Active Dataset                    DataSet8
                             Filter                            <none>
                             Weight                            <none>
                             Split File                        <none>
                             N of Rows in Working Data File    400

Missing Value Handling       Definition of Missing             User-defined missing values are treated as
                                                               missing.
                             Cases Used                        Statistics are based on cases with no missing
                                                               values for any variable used.

Syntax                                                         regression
                                                                /dependent api00
                                                                /method=enter ell meals yr_rnd mobility
                                                                 acs_k3 acs_46 full emer enroll .

Resources                    Processor Time                    00:00:00.031
                             Elapsed Time                      00:00:00.022
                             Memory Required                   4724 bytes
                             Additional Memory Required for    0 bytes
                             Residual Plots

Variables Entered/Removed

Model       Variables Entered                                         Variables Removed    Method
1           number of students, avg class size 4-6,                   .                    Enter
            pct 1st year in school, avg class size k-3,
            pct emer credential, english language learners,
            year round school, pct free meals, pct full credential

a. All requested variables entered.
b. Dependent Variable: api 2000

Model Summary

Model       R             R Square        Adjusted R Square    Std. Error of the Estimate
1           .919          .845            .841                 56.768

a. Predictors: (Constant), number of students, avg class size 4-6, pct
1st year in school, avg class size k-3, pct emer credential, english
language learners, year round school, pct free meals, pct full credential


Model                             Sum of Squares        df          Mean Square        F           Sig.
1           Regression                6740702.006              9       748966.890     232.409        .000

            Residual                  1240707.781             385        3222.618

            Total                     7981409.787             394

a. Predictors: (Constant), number of students, avg class size 4-6, pct 1st year in school, avg
class size k-3, pct emer credential, english language learners, year round school, pct free
meals, pct full credential
b. Dependent Variable: api 2000


                                        Unstandardized Coefficients      Standardized Coefficients
Model                                   B              Std. Error        Beta                         t          Sig.

1        (Constant)                     758.942        62.286                                         12.185     .000
         english language learners      -.860          .211              -.150                        -4.083     .000
         pct free meals                 -2.948         .170              -.661                        -17.307    .000
         year round school              -19.889        9.258             -.059                        -2.148     .032
         pct 1st year in school         -1.301         .436              -.069                        -2.983     .003
         avg class size k-3             1.319          2.253             .013                         .585       .559
         avg class size 4-6             2.032          .798              .055                         2.546      .011
         pct full credential            .610           .476              .064                         1.281      .201
         pct emer credential            -.707          .605              -.058                        -1.167     .244
         number of students             -.012          .017              -.019                        -.724      .469

a. Dependent Variable: api 2000
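
One way to read the Beta column of the Coefficients table is to rank the predictors by the absolute size of their standardized coefficients. The sketch below is illustrative: the values are copied from the table, keyed by the variable names in the regression syntax (ell, meals, yr_rnd, mobility, acs_k3, acs_46, full, emer, enroll, in the same order as the table rows), and the names `betas` and `ranked` are assumptions of this example.

```python
# Standardized coefficients (Beta) from the Coefficients table.
betas = {
    "ell": -0.150,
    "meals": -0.661,
    "yr_rnd": -0.059,
    "mobility": -0.069,
    "acs_k3": 0.013,
    "acs_46": 0.055,
    "full": 0.064,
    "emer": -0.058,
    "enroll": -0.019,
}

# Rank predictors from largest to smallest absolute Beta.
ranked = sorted(betas, key=lambda name: abs(betas[name]), reverse=True)

for name in ranked:
    print(f"{name:9s} {betas[name]:7.3f}")
```

On this ranking meals comes first and acs_k3 last. The ordering only compares the standardized coefficients and says nothing about statistical significance.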

    1) We first examine the output from this regression analysis. As with the previous regression,
       we look to the p-value of the F-test to see if the overall model is significant. With a p-value
       of zero to three decimal places, the model is statistically significant. The R-squared is 0.845,
       meaning that approximately 85% of the variability of api00 is accounted for by the variables
       in the model. The adjusted R-squared indicates that about 84% of the variability of api00 is
       accounted for by the model, even after taking into account the number of predictor variables
       in the model. The coefficient for each variable indicates the amount of change one could
       expect in api00 given a one-unit change in the value of that variable, given that all other
       variables in the model are held constant. For example, consider the variable ell. We would
       expect a decrease of 0.86 in the api00 score for every one-unit increase in ell, assuming
       that all other variables in the model are held constant.

    2) R-Square is the proportion of variance in the dependent variable (api00) which can be
       predicted from the independent variables (ell, meals, yr_rnd, mobility, acs_k3, acs_46,
       full, emer and enroll). The value of .845 indicates that about 84.5% of the variance
       in api00 can be predicted from these variables.

    3) The beta coefficients are used by some researchers to compare the relative strength of the
       various predictors within the model. Because the beta coefficients are all measured in
       standard deviations, instead of the units of the variables, they can be compared to one
       another. In other words, the beta coefficients are the coefficients that you would obtain if the
       outcome and predictor variables were all transformed to standard scores, also called
       z-scores, before running the regression. In this example, meals has the largest Beta
       coefficient, -0.661, and acs_k3 has the smallest Beta, 0.013. Thus, a one standard deviation
       increase in meals leads to a 0.661 standard deviation decrease in predicted api00, with the
       other variables held constant. A one standard deviation increase in acs_k3, in turn, leads to
       a 0.013 standard deviation increase in predicted api00, with the other variables in the model
       held constant.
    4) The adjusted R-square attempts to yield a more honest value to estimate the R-squared for
       the population. The value of R-square was .8446, while the value of Adjusted R-square was
       .8409.

    5) The F value is the Mean Square Regression (748966.89) divided by the Mean Square
       Residual (3222.61761), yielding F=232.41. The p-value associated with this F value is very
       small (reported as .000). These values are used to answer the question "Do the independent
       variables reliably predict the dependent variable?". The p-value is compared to your alpha
       level (typically 0.05) and, if smaller, you can conclude "Yes, the independent variables
       reliably predict the dependent variable".

    6) The df column gives the degrees of freedom associated with the sources of variance. The
       Total variance has N-1 degrees of freedom (DF). In this case, there were N=395 observations,
       so the DF for total is 394. The Regression DF is 9, the number of predictors, and the
       Residual DF is 394 - 9 = 385.
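
The summary figures discussed in the points above can all be recomputed from the sums of squares and degrees of freedom in the table. The following is a minimal sketch using only the reported Regression and Residual rows; the variable names are choices made for this illustration, and the formulas are the standard ones for R-square, adjusted R-square, the F ratio and the standard error of the estimate.

```python
import math

# Regression and Residual rows of the sums-of-squares table.
ss_regression, df_regression = 6740702.006, 9
ss_residual, df_residual = 1240707.781, 385

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual      # equals N - 1
n_obs = df_total + 1

# Proportion of variance in api00 explained by the predictors.
r_squared = ss_regression / ss_total

# Adjusted R-square penalises for the number of predictors.
adj_r_squared = 1 - (ss_residual / df_residual) / (ss_total / df_total)

# F is the ratio of the two mean squares.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_value = ms_regression / ms_residual

# Standard error of the estimate is the root of the residual mean square.
std_error_estimate = math.sqrt(ms_residual)

print(n_obs, round(r_squared, 4), round(adj_r_squared, 4))
print(round(f_value, 3), round(std_error_estimate, 3))
```

The recomputed values agree with the Model Summary (R Square .845, Adjusted R Square .841, Std. Error of the Estimate 56.768) and with the reported F of 232.409.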

More Related Content

What's hot

Crescent pure
Crescent pureCrescent pure
Crescent pure
Chinmaya Lovekar
Apple Case Study
Apple Case StudyApple Case Study
Apple Case Study
Meherunnesha (Nishat)
Case Analysis - Crescent Pure
Case Analysis - Crescent PureCase Analysis - Crescent Pure
Case Analysis - Crescent Pure
Danh Đỗ
Comparison of Marketing Mix of IKEA in Four Countries
Comparison of Marketing Mix of IKEA in Four CountriesComparison of Marketing Mix of IKEA in Four Countries
Comparison of Marketing Mix of IKEA in Four Countries
Fatima Arshad
J.C. Penney’s Case Study
J.C. Penney’s Case StudyJ.C. Penney’s Case Study
J.C. Penney’s Case Study
Kaitlin Rutledge
Pepsi Lipton Brisk - Harvard Business Review Case
Pepsi Lipton Brisk - Harvard Business Review CasePepsi Lipton Brisk - Harvard Business Review Case
Pepsi Lipton Brisk - Harvard Business Review Case
Strategic Management project on Johnson & Johnson
Strategic Management project on Johnson & Johnson Strategic Management project on Johnson & Johnson
Strategic Management project on Johnson & Johnson
Shobhita Dayal
Mountain Man Brewing Company: Case Analysis
Mountain Man Brewing Company: Case AnalysisMountain Man Brewing Company: Case Analysis
Mountain Man Brewing Company: Case Analysis
Shashank Srivastava
Metabical - Marketing Case Study
Metabical - Marketing Case StudyMetabical - Marketing Case Study
Metabical - Marketing Case Study
Shrishti Gupta
Launching krispy natural case study analysis
Launching krispy natural case study analysisLaunching krispy natural case study analysis
Launching krispy natural case study analysis
Abhishek Pathak
Linear technology case analysis dividend payout policy
Linear technology case analysis dividend payout policyLinear technology case analysis dividend payout policy
Linear technology case analysis dividend payout policy
Himanshu Gulia
Zara: Fast Fashion
Zara: Fast FashionZara: Fast Fashion
Zara: Fast Fashion
Burak Günbal
Atlantic Computers: A Bundle of Pricing Options
Atlantic Computers: A Bundle of Pricing OptionsAtlantic Computers: A Bundle of Pricing Options
Atlantic Computers: A Bundle of Pricing Options
Aqualisa Quartz - Simply A Better Shower (HBR Case Study)
Aqualisa Quartz - Simply A Better Shower (HBR Case Study)Aqualisa Quartz - Simply A Better Shower (HBR Case Study)
Aqualisa Quartz - Simply A Better Shower (HBR Case Study)
Arjun Parekh
Black & decker
Black & deckerBlack & decker
Black & decker
Fouad Al-shaikh
Colgate Palmolive Case Study Analysis
Colgate Palmolive Case Study AnalysisColgate Palmolive Case Study Analysis
Colgate Palmolive Case Study Analysis
Pratik Sanghvi
House of tata - Complete case study
House of tata - Complete case studyHouse of tata - Complete case study
House of tata - Complete case study
JC Penney Failed Pricing Strategy
JC Penney Failed Pricing StrategyJC Penney Failed Pricing Strategy
JC Penney Failed Pricing Strategy
Coda coffee case study
Coda coffee case studyCoda coffee case study
Coda coffee case study
Piyush Sogra


Spss analysis conjoint_cluster_regression_pca_discriminant

  • 1. Conjoint Analysis

Conjoint analysis is a marketing research technique for determining customer preferences. It analyses how customers value the different attributes of a product (or service) and so gives insight into the trade-offs they are willing to make among those attributes. Put simply, it tells how much each feature of a product is worth to consumers.

The study surveys people with a set of attribute combinations, which the survey-takers rank or score by preference. The analysis then models customer preferences for the different combinations of attributes. The attributes are termed factors and their different values are termed levels.

In this example we run conjoint analysis in SPSS on data for a carpet cleaner, using price, brand, money-back guarantee, package design and seal as the attributes on which consumers state their preferences. Using two data sets (the plan file and the preferences file), we calculate the part-worths and the relative importance of each attribute.

Variable name   Variable label           Value label
package         package design           A*, B*, C*
brand           brand name               K2R, Glory, Bissell
price           price                    $1.19, $1.39, $1.59
seal            Good Housekeeping seal   no, yes
money           money-back guarantee     no, yes

Code to import the data and run the analysis:

GET FILE='C:\Users\Abhi\Desktop\carpet_plan.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
GET FILE='C:\Users\Abhi\Desktop\carpet_prefs.sav'.
DATASET NAME DataSet2 WINDOW=FRONT.
CONJOINT PLAN='C:\Users\Abhi\Desktop\carpet_plan.sav'
  /DATA='C:\Users\Abhi\Desktop\carpet_prefs.sav'
  /SEQUENCE=PREF1 PREF2 PREF3 PREF4 PREF5 PREF6 PREF7 PREF8 PREF9 PREF10 PREF11
    PREF12 PREF13 PREF14 PREF15 PREF16 PREF17 PREF18 PREF19 PREF20 PREF21 PREF22
  /SUBJECT=ID
  /FACTORS=PACKAGE BRAND (DISCRETE) PRICE (LINEAR LESS) SEAL (LINEAR MORE) MONEY (LINEAR MORE)
  /PRINT=SUMMARYONLY.
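To see why respondents are shown only a set of attribute combinations rather than every possible product, the levels in the variable table can be enumerated. This is a minimal Python sketch, not part of the SPSS run; the dictionary and variable names are our own, and the count of 22 cards is read off the PREF1 to PREF22 variables in the SEQUENCE subcommand.

```python
from itertools import product

# Factor levels as listed in the variable table.
levels = {
    "package": ["A*", "B*", "C*"],
    "brand": ["K2R", "Glory", "Bissell"],
    "price": [1.19, 1.39, 1.59],
    "seal": ["no", "yes"],
    "money": ["no", "yes"],
}

# Every possible product profile (the full factorial design).
profiles = list(product(*levels.values()))
n_full = len(profiles)

# Cards actually ranked by each respondent (PREF1..PREF22).
n_cards = 22

print(n_full)                       # 108
print(f"{n_cards / n_full:.1%} of all profiles are ranked")
```

Ranking all 108 profiles would be impractical for a respondent, which is why the plan file holds a much smaller set of cards.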
  • 2. Model Description

            N of Levels   Relation to Ranks or Scores
package     3             Discrete
brand       3             Discrete
price       3             Linear (less)
seal        2             Linear (more)
money       2             Linear (more)

Calculation of the part-worth of each attribute

Utilities

                      Utility Estimate   Std. Error
package     A*                  -2.233         .192
            B*                   1.867         .192
            C*                    .367         .192
brand       K2R                   .367         .192
            Glory                -.350         .192
            Bissell              -.017         .192
price       $1.19               -6.595         .988
            $1.39               -7.703        1.154
            $1.59               -8.811        1.320
seal        no                   2.000         .287
            yes                  4.000         .575
money       no                   1.250         .287
            yes                  2.500         .575
(Constant)                      12.870        1.282

  • 3. This table shows the utility (part-worth) scores and their standard errors for each factor level. Higher utility values indicate greater preference. For the discrete factors (package and brand), the part-worths of the levels sum to zero, up to rounding. Among the brands, K2R is preferred over Glory and Bissell. As expected, there is an inverse relationship between price and utility, with higher prices corresponding to lower utility. The presence of a seal of approval or a money-back guarantee corresponds to a higher utility.

The total utility of a combination is the sum of the part-worths of its levels plus the constant. If the cleaner had package design C*, brand Bissell, price $1.59, a seal of approval, and a money-back guarantee, the total utility would be: 0.367 + (−0.017) + (−8.811) + 4.000 + 2.500 + 12.870 = 10.909.

Importance:

Importance Values
package     35.635
brand       14.911
price       29.410
seal        11.172
money        8.872

The package attribute has the most importance, followed by price. The money-back guarantee is of least concern to the consumer. The values are computed by taking the utility range for each factor separately and dividing by the sum of the utility ranges for all factors. The values thus represent percentages and have the property that they sum to 100.
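The total-utility sum and the range-based importance formula described above can be checked with a short Python sketch. The numbers are copied from the Utilities table; the function and variable names are our own. One caveat, stated as an assumption: the importance percentages computed here from the summary utilities will not match the Importance table, because SPSS, to our understanding, computes importance per subject and then averages across subjects.

```python
# Summary part-worths copied from the Utilities table.
utilities = {
    "package": {"A*": -2.233, "B*": 1.867, "C*": 0.367},
    "brand": {"K2R": 0.367, "Glory": -0.350, "Bissell": -0.017},
    "price": {"$1.19": -6.595, "$1.39": -7.703, "$1.59": -8.811},
    "seal": {"no": 2.000, "yes": 4.000},
    "money": {"no": 1.250, "yes": 2.500},
}
CONSTANT = 12.870


def total_utility(profile):
    """Sum the part-worth of each chosen level and add the constant."""
    return CONSTANT + sum(utilities[f][lvl] for f, lvl in profile.items())


profile = {
    "package": "C*",
    "brand": "Bissell",
    "price": "$1.59",
    "seal": "yes",
    "money": "yes",
}
print(round(total_utility(profile), 3))  # 10.909

# Range-based importance: each factor's utility range as a share of
# the sum of all ranges (applied here to the summary utilities only).
ranges = {f: max(u.values()) - min(u.values()) for f, u in utilities.items()}
total_range = sum(ranges.values())
importance = {f: 100 * r / total_range for f, r in ranges.items()}
for factor, pct in importance.items():
    print(f"{factor:8s} {pct:6.2f}")
```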
  • 4. Coefficients

            B Coefficient Estimate
price       -5.542
seal         2.000
money        1.250

The utility for a particular factor level is determined by multiplying the level by the coefficient. For example, the predicted utility for a price of $1.19 was listed as −6.595 in the utilities table. This is simply the value of the price level, 1.19, multiplied by the price coefficient, −5.542.

The correlations table provides measures of the correlation between the observed and estimated preferences.

Preference Scores of Simulations

Card Number   ID   Score
1             1    10.258
2             2    14.292
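The level-times-coefficient rule can be verified against the utilities table with a few lines of Python. This is a sketch under one assumption: the seal and money levels are taken to be coded 1 for "no" and 2 for "yes", which is inferred from the table (2.000 and 4.000 for seal, 1.250 and 2.500 for money) rather than stated in the output. Small differences in the third decimal are expected because the printed coefficient is itself rounded.

```python
# B coefficients from the Coefficients table.
coef = {"price": -5.542, "seal": 2.000, "money": 1.250}

# Numeric value of each level. The 1/2 coding for seal and money is an
# assumption inferred from the utilities, not shown in the SPSS output.
level_values = {
    "price": {"$1.19": 1.19, "$1.39": 1.39, "$1.59": 1.59},
    "seal": {"no": 1, "yes": 2},
    "money": {"no": 1, "yes": 2},
}

# Utilities reported by SPSS for the linear factors.
table = {
    "price": {"$1.19": -6.595, "$1.39": -7.703, "$1.59": -8.811},
    "seal": {"no": 2.000, "yes": 4.000},
    "money": {"no": 1.250, "yes": 2.500},
}

predicted = {
    factor: {label: coef[factor] * value for label, value in lv.items()}
    for factor, lv in level_values.items()
}

for factor, by_level in predicted.items():
    for label, value in by_level.items():
        diff = abs(value - table[factor][label])
        print(f"{factor:6s} {label:6s} predicted={value:8.3f} "
              f"table={table[factor][label]:8.3f} diff={diff:.4f}")
```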
  • 5. The real power of conjoint analysis is the ability to predict preference for product profiles that weren't rated by the subjects. These are referred to as simulation cases.

Preference Probabilities of Simulations

Card Number   ID   Maximum Utility   Bradley-Terry-Luce   Logit
1             1    30.0%             43.1%                30.9%
2             2    70.0%             56.9%                69.1%

The maximum utility model determines the probability as the number of respondents predicted to choose the profile divided by the total number of respondents. For each respondent, the predicted choice is simply the profile with the largest total utility.

Number of Reversals

Factor    price        3
          money        2
          seal         2
          brand        0
          package      0
Subject   1    Subject 1     1
          2    Subject 2     2
          3    Subject 3     0
          4    Subject 4     0
          5    Subject 5     0
          6    Subject 6     1
          7    Subject 7     0
          8    Subject 8     0
          9    Subject 9     1
          10   Subject 10    2

  • 6. This table displays the number of reversals for each factor and for each subject. For example, three subjects showed a reversal for price. That is, they preferred product profiles with higher prices.

Reversal Summary

N of Reversals   N of Subjects
1                3
2                2
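The Reversal Summary is just a tally of the per-subject counts in the Number of Reversals table. A small Python sketch (the variable names are ours; the counts are copied from the table) reproduces it and confirms that the per-factor and per-subject counts agree on the total.

```python
from collections import Counter

# Reversals per subject, copied from the Number of Reversals table.
reversals_by_subject = {1: 1, 2: 2, 3: 0, 4: 0, 5: 0,
                        6: 1, 7: 0, 8: 0, 9: 1, 10: 2}

# Reversals per factor, from the same table.
reversals_by_factor = {"price": 3, "money": 2, "seal": 2,
                       "brand": 0, "package": 0}

# Reversal Summary: how many subjects had 1 reversal, 2 reversals, ...
summary = Counter(n for n in reversals_by_subject.values() if n > 0)
for n_reversals in sorted(summary):
    print(n_reversals, summary[n_reversals])
# 1 3
# 2 2

# Both views of the table count the same reversals.
total_by_subject = sum(reversals_by_subject.values())
total_by_factor = sum(reversals_by_factor.values())
print(total_by_subject, total_by_factor)  # 7 7
```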
  • 7. Q. Perform Discriminant Analysis on the given dataset.

The dataset chosen contains statistics on a set of people who have been given bank loans and have either defaulted or not defaulted, together with their various characteristics.

    GET FILE='E:VGSOMSTUDYSECOND SEMBRMSPSS16Samplesbankloan.sav'.
    DATASET NAME DataSet1 WINDOW=FRONT.
    DISCRIMINANT
      /GROUPS=default(0 1)
      /VARIABLES=employ address age
      /ANALYSIS ALL
      /PRIORS EQUAL
      /STATISTICS=MEAN STDDEV UNIVF BOXM COEFF CORR TABLE
      /PLOT=COMBINED
      /PLOT=CASES
      /CLASSIFY=NONMISSING POOLED MEANSUB.

Discriminant

Notes

    Output Created                                   04-Apr-2013 18:39:05
    Comments
    Input       Data                                 E:VGSOMSTUDYSECOND SEMBRMSPSS16Samplesbankloan.sav
                Active Dataset                       DataSet1
                File Label                           Bank Loan Default
                Filter                               <none>
                Weight                               <none>
                Split File                           <none>
                N of Rows in Working Data File       850
    Missing Value Handling
                Definition of Missing                User-defined missing values are treated as missing in the analysis phase.
                Cases Used                           In the analysis phase, cases with no user- or system-missing values for any predictor variable are used. Cases with user-, system-missing, or out-of-range values for the grouping variable are always excluded.
    Syntax                                           The DISCRIMINANT command listed above
    Resources   Processor Time                       00:00:00.047
                Elapsed Time                         00:00:00.121

[DataSet1] E:VGSOMSTUDYSECOND SEMBRMSPSS16Samplesbankloan.sav
  • 8. Warnings

All-Groups Stacked Histogram is no longer displayed.

Analysis Case Processing Summary

    Unweighted Cases                                                        N      Percent
    Valid                                                                   700     82.4
    Excluded   Missing or out-of-range group codes                          150     17.6
               At least one missing discriminating variable                   0       .0
               Both missing or out-of-range group codes and
               at least one missing discriminating variable                   0       .0
               Total                                                        150     17.6
    Total                                                                   850    100.0

Group Statistics

                                                                            Valid N (listwise)
    Previously defaulted   Variable                       Mean    Std. Deviation   Unweighted   Weighted
    No                     Years with current employer     9.51       6.664           517        517.000
                           Years at current address        8.95       7.001           517        517.000
                           Age in years                   35.51       7.708           517        517.000
    Yes                    Years with current employer     5.22       5.543           183        183.000
                           Years at current address        6.39       5.925           183        183.000
                           Age in years                   33.01       8.518           183        183.000
    Total                  Years with current employer     8.39       6.658           700        700.000
                           Years at current address        8.28       6.825           700        700.000
                           Age in years                   34.86       7.997           700        700.000
  • 9. Tests of Equality of Group Means

                                    Wilks' Lambda      F       df1    df2    Sig.
    Years with current employer         .920         60.759     1     698    .000
    Years at current address            .973         19.402     1     698    .000
    Age in years                        .981         13.482     1     698    .000

Pooled Within-Groups Matrices

                                                 Years with          Years at
                                                 current employer    current address    Age in years
    Correlation   Years with current employer        1.000               .292               .524
                  Years at current address            .292              1.000               .588
                  Age in years                        .524               .588              1.000

This matrix shows the correlations between the predictors. The largest correlation (.588) occurs between years at current address and age in years.

Analysis 1

Box's Test of Equality of Covariance Matrices

Log Determinants

    Previously defaulted     Rank    Log Determinant
    No                        3         11.012
    Yes                       3         10.501
    Pooled within-groups      3         10.919

The ranks and natural logarithms of determinants printed are those of the group covariance matrices.

Test Results

    Box's M              28.171
    F      Approx.        4.665
           df1            6
           df2            7.335E5
           Sig.           .000
  • 10. Tests null hypothesis of equal population covariance matrices.

Summary of Canonical Discriminant Functions

Eigenvalues

    Function    Eigenvalue    % of Variance    Cumulative %    Canonical Correlation
    1             .100(a)        100.0            100.0              .301

    a. First 1 canonical discriminant functions were used in the analysis.

Wilks' Lambda

    Test of Function(s)    Wilks' Lambda    Chi-square    df    Sig.
    1                          .909           66.251       3    .000

Standardized Canonical Discriminant Function Coefficients

                                    Function 1
    Years with current employer        .980
    Years at current address           .436
    Age in years                      -.330
  • 11. Structure Matrix

                                    Function 1
    Years with current employer        .934
    Years at current address           .528
    Age in years                       .440

    Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions.
    Variables ordered by absolute size of correlation within function.

Functions at Group Centroids

    Previously defaulted    Function 1
    No                         .188
    Yes                       -.530

    Unstandardized canonical discriminant functions evaluated at group means.

Classification Statistics

Classification Processing Summary

    Processed                                                        850
    Excluded    Missing or out-of-range group codes                    0
                At least one missing discriminating variable           0
    Used in Output                                                   850
  • 12. Prior Probabilities for Groups

                                         Cases Used in Analysis
    Previously defaulted    Prior      Unweighted    Weighted
    No                       .500         517         517.000
    Yes                      .500         183         183.000
    Total                   1.000         700         700.000

Classification Function Coefficients

                                     Previously defaulted
                                       No         Yes
    Years with current employer       -.192      -.302
    Years at current address          -.302      -.348
    Age in years                       .797       .827
    (Constant)                      -12.588    -12.444

    Fisher's linear discriminant functions

Classification Results(a)

                                              Predicted Group Membership
                      Previously defaulted      No       Yes      Total
    Original  Count   No                        300      217       517
                      Yes                        44      139       183
                      Ungrouped cases            81       69       150
              %       No                       58.0     42.0     100.0
                      Yes                      24.0     76.0     100.0
                      Ungrouped cases          54.0     46.0     100.0

    a. 62.7% of original grouped cases correctly classified.

The Discriminant Analysis shows that persons who have previously defaulted are predicted as likely to default this time as well, and those who haven't defaulted earlier are predicted as less likely to default this time. This is inferred from the classification counts: among previous defaulters, more are predicted to default than not (139 > 44), and among non-defaulters, more are predicted not to default than to default (300 > 217).
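The classification step can be illustrated with a short Python sketch. It recomputes the overall hit rate from the counts in the Classification Results table and scores one case with the Fisher classification functions. The coefficients and counts are those printed above; the applicant's values (10 years with employer, 8 years at address, age 35) are hypothetical and chosen only for illustration.

```python
# Counts from the Classification Results table: actual group -> predicted group.
confusion = {
    "No": {"No": 300, "Yes": 217},
    "Yes": {"No": 44, "Yes": 139},
}
correct = sum(confusion[g][g] for g in confusion)
total = sum(sum(row.values()) for row in confusion.values())
accuracy = correct / total

# Fisher's linear classification functions (coefficients as printed above).
functions = {
    "No": {"employ": -0.192, "address": -0.302, "age": 0.797, "const": -12.588},
    "Yes": {"employ": -0.302, "address": -0.348, "age": 0.827, "const": -12.444},
}

def classify(case):
    """Score the case on each group's function and return the highest-scoring group."""
    scores = {
        group: f["const"] + sum(f[var] * case[var] for var in case)
        for group, f in functions.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical applicant, not taken from the dataset.
applicant = {"employ": 10, "address": 8, "age": 35}
predicted_group, scores = classify(applicant)

print(f"accuracy = {accuracy:.1%}")
print(predicted_group, scores)
```

With equal priors, as requested by /PRIORS EQUAL, assigning the case to the group with the larger function score is the whole decision rule.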
  • 13. Q. Perform Factor Analysis on the given dataset.

The dataset chosen contains a fictional statistics-anxiety questionnaire. It contains the responses given by students regarding their ease of use, liking and usage of SPSS in statistics.

Using the scree plot, I have chosen 5 factors. Since a student's answers to different questions are likely to be related, I considered the variables to be inter-related and hence used Oblimin rotation. For example, a student who gives a high score to "I have little experience of computers" is likely to give a high score to "All computers hate me", as the two variables are somewhat correlated.
  • 14. Using the options of SPSS the following Pattern Matrix was generated.

Pattern Matrix(a) (Components 1 to 5; for each item the printed loadings are listed in column order, blank cells omitted)

    Item                                                                                      Loadings
    I have little experience of computers                                                     .903
    SPSS always crashes when I try to use it                                                  .732
    All computers hate me                                                                     .684
    I worry that I will cause irreparable damage because of my incompetence with computers    .662
    Computers have minds of their own and deliberately go wrong whenever I use them           .581
    People try to tell you that SPSS makes statistics easier to understand but it doesn't     .446
    Computers are out to get me                                                               .333
    My friends are better at SPSS than I am                                                   .661
    My friends are better at statistics than me                                               .655
    If I'm good at statistics my friends will think I'm a nerd                                .622
    My friends will think I'm stupid for not being able to cope with SPSS                     .504   .330
    Everybody looks at me when I use SPSS                                                     .358   .358
    I can't sleep for thoughts of eigen vectors                                               -.728
    I wake up under my duvet thinking that I am trapped under a normal distribution           .324   -.543
  • 15.
    Computers are useful only for playing games                                               .359   .393   -.366
    Standard deviations excite me                                                             .301   .356   .315
    I have never been good at mathematics                                                     -.855
    I did badly at mathematics at school                                                      -.736
    I slip into a coma whenever I see an equation                                             -.722
    Statistics makes me cry                                                                   -.772
    I don't understand statistics                                                             -.730
    I weep openly at the mention of central tendency                                          -.664
    I dream that Pearson is attacking me with correlation coefficients                        -.564

    Extraction Method: Principal Component Analysis.
    Rotation Method: Oblimin with Kaiser Normalization.
    a. Rotation converged in 15 iterations.

The total variance explained by each factor is given below.

Total Variance Explained

                 Rotation Sums of Squared Loadings(a)
    Component    Total
    1            5.522
    2            2.452
    3            2.383
    4            3.535
    5            4.913
  • 16. Extraction Method: Principal Component Analysis.

The percentage of variance for a factor is calculated by taking the sum of squared loadings of the factor, dividing it by the number of variables (23 here) and multiplying by 100. Because the Oblimin rotation allows the factors to correlate, these shares overlap and should not be added together.

Hence the factoring would be as follows, depending on the loading values.

    Factor    Variable Nos.
    1         1, 2, 3, 4, 5, 6, 7, 14
    2         8, 9, 10
    3         13
    4         17, 18, 19
    5         20, 21, 22, 23

Variables 11, 12, 15 and 16 have very close loadings on different factors. This is not desirable, because such a variable is assessing more than one construct. Variable 15 has exactly the same value on both Factor 2 and Factor 3. These variables are said to have split loadings and are hence listed separately.

    Factor    Variable Nos.
    2         11, 16, 15
    3         12, 15

As split loading is present, this is not a simple structure.

Factor 1: Anxiety about the usage of computers. Accounts for about 24.0% of the total variance (5.522 / 23) and loads 8 of the variables.
Factor 2: Students' view of their understanding of statistics and SPSS relative to their peers. Accounts for about 10.7% of the total variance (2.452 / 23) and loads 3 variables. It also split loads variables 11, 16 and 15.
Factor 3: Anxiety about eigenvectors. Corresponds to only about 10.4% of the total variance (2.383 / 23) and loads only 1 variable directly, while it split loads variables 12 and 15.
Factor 4: Students' interest in mathematics. Accounts for about 15.4% of the total variance (3.535 / 23) and loads 3 variables.
Factor 5: Dislike for statistics. Accounts for about 21.4% of the total variance (4.913 / 23) and loads 4 variables.
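As a check on the percentages, the rule stated above (sum of squared loadings divided by the number of variables, times 100) can be applied directly to the rotation sums of squared loadings. This is a minimal Python sketch; the count of 23 variables is an assumption taken from the number of questionnaire items listed in the pattern matrix.

```python
# Rotation sums of squared loadings from the Total Variance Explained table.
rotation_ssl = {1: 5.522, 2: 2.452, 3: 2.383, 4: 3.535, 5: 4.913}

# Assumption: 23 questionnaire items, as listed in the pattern matrix.
n_variables = 23

# Share of total variance = sum of squared loadings / number of variables * 100.
percent_of_variance = {
    factor: round(ssl / n_variables * 100, 1)
    for factor, ssl in rotation_ssl.items()
}

for factor, pct in percent_of_variance.items():
    print(f"Factor {factor}: {pct}%")
```

Each share stays well below 100%, as a single factor's share must. With an oblique rotation such as Oblimin the factors are correlated, so the shares overlap and are not meant to be summed.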
  • 17. CLUSTER ANALYSIS

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics.

Proximities

Notes

    Output Created                                   02-Apr-2013 22:00:05
    Comments
    Input       Data                                 C:UsersdevmaletiaDownloadsClusterAnonFaculty.sav
                Active Dataset                       DataSet3
                Filter                               <none>
                Weight                               <none>
                Split File                           <none>
                N of Rows in Working Data File       44
    Missing Value Handling
                Definition of Missing                User-defined missing values are treated as missing.
                Cases Used                           Statistics are based on cases with no missing values for any variable used.
    Syntax                                           PROXIMITIES Salary FTE Rank Articles Experience
                                                       /MATRIX OUT('C:UsersDEVMAL~1AppDataLocalTempspss6496spssclus.tmp')
                                                       /VIEW=CASE
                                                       /MEASURE=SEUCLID
                                                       /PRINT NONE
                                                       /ID=Name
                                                       /STANDARDIZE=VARIABLE Z.
    Resources   Processor Time                       00:00:00.078
                Elapsed Time                         00:00:00.082
                Workspace Bytes                      11152
    Files Saved Matrix File                          C:UsersDEVMAL~1AppDataLocalTempspss6496spssclus.tmp
  • 18. The variables which I have used in the dataset are as follows:

    • Name – Although faculty salaries are public information under North Carolina state law
    • Salary – annual salary in dollars, from the university report available in One Stop.
    • FTE – full-time-equivalent work load for the faculty member.
    • Rank – where 1 = adjunct, 2 = visiting, 3 = assistant, 4 = associate, 5 = professor.
    • Articles – number of published scholarly articles, excluding things like comments in newsletters, abstracts in proceedings, and the like.
    • Experience – number of years working as a full-time faculty member in a Department of Psychology.
    • ArticlesAPD – number of published articles as listed in the university's Academic Publications
    • Sex – biological sex from physical appearance.

In the first step SPSS computes, for each pair of cases, the squared Euclidean distance between the cases. This is quite simply the sum across variables (from i = 1 to v) of the squared difference between the score on variable i for the one case (Xi) and the score on variable i for the other case (Yi). The two cases which are separated by the smallest Euclidean distance are identified and then classified together into the first cluster. At this point there is one cluster with two cases in it.

Next SPSS re-computes the squared Euclidean distances between each entity (case or cluster) and each other entity. When one or both of the compared entities is a cluster, SPSS computes the average squared Euclidean distance between members of the one entity and members of the other entity. The two entities with the smallest squared Euclidean distance are classified together. SPSS then re-computes the squared Euclidean distances between each entity and each other entity, and the two with the smallest squared Euclidean distance are classified together. This continues until all of the cases have been clustered into one big cluster.
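The procedure described above (squared Euclidean distance between cases, average distance between clusters, repeated merging of the closest pair) can be sketched in plain Python. This is an illustrative re-implementation under simple assumptions, not SPSS's own code: the four cases and their standardized scores are hypothetical, and ties are broken by taking the first closest pair found.

```python
from itertools import combinations

def squared_euclidean(x, y):
    """Sum over variables of the squared difference between two cases."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def average_linkage(cluster_a, cluster_b, cases):
    """Mean squared Euclidean distance over all between-cluster pairs of cases."""
    total = sum(squared_euclidean(cases[i], cases[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def agglomerate(cases):
    """Merge the two closest entities until a single cluster remains.

    Returns a schedule of (members_a, members_b, coefficient) tuples,
    one per stage, analogous to an agglomeration schedule.
    """
    clusters = [(i,) for i in range(len(cases))]
    schedule = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], cases))
        schedule.append((a, b, average_linkage(a, b, cases)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return schedule

# Hypothetical standardized scores for four cases on two variables.
cases = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
schedule = agglomerate(cases)
for members_a, members_b, coefficient in schedule:
    print(members_a, members_b, round(coefficient, 3))
```

With n cases the loop always runs n - 1 stages, which is why 44 cases produce a 43-stage schedule.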
The SPSS output obtained can be seen below:

Case Processing Summary(a)

                        Cases
    Valid               Missing             Total
    N     Percent       N     Percent       N     Percent
    44    100.0%        0     .0%           44    100.0%

    a. Squared Euclidean Distance used
  • 19. On the first step SPSS clustered case 32 with 33. The squared Euclidean distance between these two cases is 0.000. At stages 2-4 SPSS creates three more clusters, each containing two cases. At stage 5 SPSS adds case 39 to the cluster that already contains cases 37 and 38. By the 43rd stage all cases have been clustered into one entity. The results can be seen below:

Average Linkage (Between Groups)

Agglomeration Schedule

             Cluster Combined                       Stage Cluster First Appears
    Stage    Cluster 1    Cluster 2    Coefficients    Cluster 1    Cluster 2    Next Stage
     1          32           33            .000            0            0            9
     2          41           42            .000            0            0            6
     3          43           44            .000            0            0            6
     4          37           38            .000            0            0            5
     5          37           39            .001            4            0            7
     6          41           43            .002            2            3           27
     7          36           37            .003            0            5           27
     8          20           22            .007            0            0           11
     9          30           32            .012            0            1           13
    10          21           26            .012            0            0           14
    11          20           25            .031            8            0           12
    12          16           20            .055            0           11           14
    13          29           30            .065            0            9           26
    14          16           21            .085           12           10           20
    15          11           18            .093            0            0           22
    16           8            9            .143            0            0           25
    17          17           24            .144            0            0           20
    18          13           23            .167            0            0           22
    19          14           15            .232            0            0           32
    20          16           17            .239           14           17           23
    21           7           12            .279            0            0           28
    22          11           13            .441           15           18           29
    23          16           27            .451           20            0           26
    24           3           10            .572            0            0           28
    25           6            8            .702            0           16           36
    26          16           29            .768           23           13           35
    27          36           41            .858            7            6           33
  • 20.
    28           3            7            .904           24           21           31
    29          11           28            .993           22            0           30
    30           5           11           1.414            0           29           34
    31           3            4           1.725           28            0           36
    32          14           31           1.928           19            0           34
    33          36           40           2.168           27            0           40
    34           5           14           2.621           30           32           35
    35           5           16           2.886           34           26           37
    36           3            6           3.089           31           25           38
    37           5           19           4.350           35            0           39
    38           1            3           4.763            0           36           41
    39           5           34           5.593           37            0           42
    40          35           36           8.389            0           33           43
    41           1            2           8.961           38            0           42
    42           1            5          11.055           41           39           43
    43           1           35          17.237           42           40            0

Cluster Membership

    Case              5 Clusters    4 Clusters    3 Clusters    2 Clusters
    1:Rosalyn             1             1             1             1
    2:Lawrence            2             2             1             1
    3:Sunila              1             1             1             1
    4:Randolph            1             1             1             1
    5:Mickey              3             3             2             1
    6:Louis               1             1             1             1
    7:Tony                1             1             1             1
    8:Raul                1             1             1             1
    9:Catalina            1             1             1             1
    10:Johnson            1             1             1             1
    11:Beulah             3             3             2             1
    12:Martina            1             1             1             1
    13:Marie              3             3             2             1
    14:Ernest             3             3             2             1
    15:Christopher        3             3             2             1
    16:Ernie              3             3             2             1
    17:Christa            3             3             2             1
  • 21.
    18:Linette            3             3             2             1
    19:Bo                 3             3             2             1
    20:Carla              3             3             2             1
    21:Alberto            3             3             2             1
    22:Christina          3             3             2             1
    23:Jonah              3             3             2             1
    24:Tucker             3             3             2             1
    25:Shanta             3             3             2             1
    26:Melissa            3             3             2             1
    27:Jenna              3             3             2             1
    28:Johnny             3             3             2             1
    29:Cleatus            3             3             2             1
    30:Jonas              3             3             2             1
    31:Tad                3             3             2             1
    32:Amaryllis          3             3             2             1
    33:Nathan             3             3             2             1
    34:Deanna             3             3             2             1
    35:Willy              4             4             3             2
    36:Deana              5             4             3             2
    37:Dea                5             4             3             2
    38:Claude             5             4             3             2
    39:Amanda             5             4             3             2
    40:Boris              5             4             3             2
    41:Garrett            5             4             3             2
    42:Stew               5             4             3             2
    43:Bree               5             4             3             2
    44:Karma              5             4             3             2

Vertical Icicle: It is not possible to display the full vertical icicle in this document, but its results are described below.

For the two cluster solution you can see that one cluster consists of ten cases (Boris through Willy, followed by a column with no X's). These were our adjunct (part-time) faculty (excepting one), and the second cluster consists of everybody else.

For the three cluster solution you can see the cluster of adjunct faculty and the others split into two. Deanna through Mickey were our junior faculty and Lawrence through Rosalyn our senior faculty.

For the four cluster solution you can see that one case (Lawrence) forms a cluster of his own.
  • 22. Dendrogram

It displays essentially the same information that is found in the agglomeration schedule, but in graphic form.

[Figure: Hierarchical Cluster Analysis. Dendrogram using Average Linkage (Between Groups). Horizontal axis: Rescaled Distance Cluster Combine, 0 to 25. Rows: case label and number.]

Cases in the order shown on the dendrogram, top to bottom: Amaryllis (32), Nathan (33), Jonas (30), Cleatus (29), Alberto (21), Melissa (26), Carla (20), Christina (22), Shanta (25), Ernie (16), Christa (17), Tucker (24), Jenna (27), Beulah (11), Linette (18), Marie (13), Jonah (23), Johnny (28), Mickey (5), Ernest (14), Christopher (15), Tad (31), Bo (19), Deanna (34), Raul (8), Catalina (9), Louis (6), Tony (7), Martina (12), Sunila (3), Johnson (10), Randolph (4), Rosalyn (1), Lawrence (2), Garrett (41), Stew (42), Bree (43), Karma (44), Dea (37), Claude (38), Amanda (39), Deana (36), Boris (40), Willy (35).
  • 23. Multiple Regression Analysis

In this analysis we are using a data file that was created by randomly sampling 400 elementary schools from the California Department of Education's API 2000 dataset. This data file contains a measure of school academic performance as well as other attributes of the elementary schools, such as class size, enrolment and poverty.

We now perform a regression analysis using api00 as the outcome variable and the variables acs_k3, meals and full as predictors. These measure the academic performance of the school (api00), the average class size in kindergarten through 3rd grade (acs_k3), the percentage of students receiving free meals (meals), which is an indicator of poverty, and the percentage of teachers who have full teaching credentials (full). We expect that better academic performance would be associated with lower class size, fewer students receiving free meals, and a higher percentage of teachers having full teaching credentials.

The output is as follows:

Regression

Notes

    Output Created                                   02-Apr-2013 21:48:19
    Comments
    Input       Data                                 C:UsersDivijDesktopSPSS Dataelemapi.sav
                Active Dataset                       DataSet5
                Filter                               <none>
                Weight                               <none>
                Split File                           <none>
                N of Rows in Working Data File       400
    Missing Value Handling
                Definition of Missing                User-defined missing values are treated as missing.
                Cases Used                           Statistics are based on cases with no missing values for any variable used.
    Syntax                                           regression
                                                       /dependent api00
                                                       /method=enter acs_k3 meals full .
    Resources   Processor Time                       00:00:00.063
                Elapsed Time                         00:00:00.026
                Memory Required                      2284 bytes
                Additional Memory Required
                for Residual Plots                   0 bytes

Variables Entered/Removed(b)
  • 24.
    Model    Variables Entered                                                Variables Removed    Method
    1        pct full credential, avg class size k-3, pct free meals(a)       .                    Enter

    a. All requested variables entered.
    b. Dependent Variable: api 2000

Model Summary

    Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
    1        .821(a)    .674        .671                 64.153

    a. Predictors: (Constant), pct full credential, avg class size k-3, pct free meals

ANOVA(b)

    Model            Sum of Squares    df     Mean Square    F          Sig.
    1  Regression    2634884.261         3    878294.754     213.407    .000(a)
       Residual      1271713.209       309      4115.577
       Total         3906597.470       312

    a. Predictors: (Constant), pct full credential, avg class size k-3, pct free meals
    b. Dependent Variable: api 2000

Coefficients(a)

                                Unstandardized Coefficients    Standardized Coefficients
    Model                       B           Std. Error         Beta                         t          Sig.
    1  (Constant)               906.739     28.265                                          32.080     .000
       avg class size k-3        -2.682      1.394             -.064                        -1.924     .055
       pct free meals            -3.702       .154             -.808                       -24.038     .000
       pct full credential         .109       .091              .041                         1.197     .232

    a. Dependent Variable: api 2000
  • 25. Let's test the three predictors on whether they are statistically significant and, if so, the direction of the relationship. The average class size (acs_k3, b=-2.682) is not significant (p=.055), but only just so, and the coefficient is negative, which would indicate that larger class sizes are related to lower academic performance, which is what we would expect. Next, the effect of meals (b=-3.702, p=.000) is significant and its coefficient is negative, indicating that the greater the proportion of students receiving free meals, the lower the academic performance. We cannot say that free meals are causing lower academic performance. The meals variable is highly related to income level and functions more as a proxy for poverty. Thus, higher levels of poverty are associated with lower academic performance. Finally, the percentage of teachers with full credentials (full, b=0.109, p=.232) seems to be unrelated to academic performance. This would seem to indicate that the percentage of teachers with full credentials is not an important factor in predicting academic performance, which is unexpected.

From these results, we would conclude that lower class sizes are related to higher performance (although this effect falls just short of significance at the 0.05 level), that fewer students receiving free meals is associated with higher performance, and that the percentage of teachers with full credentials was not related to academic performance in the schools. Before we write this up as our finding, we should do checks to make sure we can firmly stand behind these results.

Examining Data

Step 1) To start examining the data we have a look at the first 10 data points for the variables included in our regression analysis. We need to focus on the number of missing data points in the given data.

    api00    acs_k3    meals    full
    693      16        67        76.00
    570      15        92        79.00
    546      17        97        68.00
    571      20        90        87.00
    478      18        89        87.00
    858      20        .        100.00
    918      19        .        100.00
    831      20        .         96.00
    860      20        .        100.00
    737      21        29        96.00

    Number of cases read: 10    Number of cases listed: 10

We see that among the first 10 observations, we have four missing values for meals. Keeping this in mind, we can use the descriptives command with /var=all to get descriptive statistics for all of the variables, and pay special attention to the number of valid cases for meals.

Step 2)

Descriptive Statistics

                                             N      Minimum    Maximum    Mean       Std. Deviation
    school number                            400    58         6072       2866.81    1543.811
    district number                          400    41         796         457.73     184.823
  • 26.
    api 2000                                 400    369        940         647.62     142.249
    api 1999                                 400    333        917         610.21     147.136
    growth 1999 to 2000                      400    -69        134          37.41      25.247
    pct free meals                           315    6          100          71.99      24.386
    english language learners                400    0          91           31.45      24.839
    year round school                        400    0          1              .23        .421
    pct 1st year in school                   399    2          47           18.25       7.485
    avg class size k-3                       398    -21        25           18.55       5.005
    avg class size 4-6                       397    20         50           29.69       3.841
    parent not hsg                           400    0          100          21.25      20.676
    parent hsg                               400    0          100          26.02      16.333
    parent some college                      400    0          67           19.71      11.337
    parent college grad                      400    0          100          19.70      16.471
    parent grad school                       400    0          67            8.64      12.131
    avg parent ed                            381    1.00       4.62        2.6685      .76379
    pct full credential                      400    .42        100.00     66.0568    40.29793
    pct emer credential                      400    0          59           12.66      11.746
    number of students                       400    130        1570        483.47     226.448
    Percentage free meals in 3 categories    400    1          3             2.02        .819
    Valid N (listwise)                       295

We now examine the output for the variables we used in our regression analysis above, namely api00, acs_k3, meals and full. For api00, we see that the values range from 369 to 940 and there are 400 valid values. For acs_k3, the average class size ranges from -21 to 25 and there are 2 missing values. An average class size of -21 sounds wrong. The variable meals ranges from 6% getting free meals to 100% getting free meals, so these values seem reasonable, but there are only 315 valid values for this variable. The percentage of teachers who are fully credentialed ranges from .42 to 100, and all of the values are valid.

This has uncovered a number of peculiarities worthy of further examination. We now obtain a corrected data set from the same source. This data set has all the data corrected and is free from the shortcomings diagnosed above. We run another multiple regression on the new data set.
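The screening done above with the case listing and the descriptives table can be mimicked in a few lines of Python. The sketch below uses only the ten observations listed in Step 1, with None standing for the SPSS missing value "."; the variable names and the negative-value check are illustrative.

```python
# First ten observations from the Step 1 listing; None marks a missing value.
rows = [
    {"api00": 693, "acs_k3": 16, "meals": 67, "full": 76.0},
    {"api00": 570, "acs_k3": 15, "meals": 92, "full": 79.0},
    {"api00": 546, "acs_k3": 17, "meals": 97, "full": 68.0},
    {"api00": 571, "acs_k3": 20, "meals": 90, "full": 87.0},
    {"api00": 478, "acs_k3": 18, "meals": 89, "full": 87.0},
    {"api00": 858, "acs_k3": 20, "meals": None, "full": 100.0},
    {"api00": 918, "acs_k3": 19, "meals": None, "full": 100.0},
    {"api00": 831, "acs_k3": 20, "meals": None, "full": 96.0},
    {"api00": 860, "acs_k3": 20, "meals": None, "full": 100.0},
    {"api00": 737, "acs_k3": 21, "meals": 29, "full": 96.0},
]
variables = ["api00", "acs_k3", "meals", "full"]

# Missing values per variable, and cases complete on every variable (listwise).
missing_counts = {v: sum(r[v] is None for r in rows) for v in variables}
complete_cases = [r for r in rows if all(r[v] is not None for v in variables)]

# Implausible values: an average class size cannot be negative.
negative_class_sizes = [r for r in rows
                        if r["acs_k3"] is not None and r["acs_k3"] < 0]

print(missing_counts)
print(len(complete_cases), len(negative_class_sizes))
```

Run on the full file, the same listwise count is what SPSS reports as "Valid N (listwise)", and the range check is what flags the minimum of -21 for avg class size k-3.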
  • 27. New Multiple regression analysis

For this multiple regression example, we will regress the dependent variable, api00, on all of the predictor variables in the data set.

Regression

Notes

    Output Created                                   02-Apr-2013 22:54:47
    Comments
    Input       Data                                 C:UsersDivijDesktopSPSS Dataelemapi2.sav
                Active Dataset                       DataSet8
                Filter                               <none>
                Weight                               <none>
                Split File                           <none>
                N of Rows in Working Data File       400
    Missing Value Handling
                Definition of Missing                User-defined missing values are treated as missing.
                Cases Used                           Statistics are based on cases with no missing values for any variable used.
    Syntax                                           regression
                                                       /dependent api00
                                                       /method=enter ell meals yr_rnd mobility acs_k3 acs_46 full emer enroll .
    Resources   Processor Time                       00:00:00.031
                Elapsed Time                         00:00:00.022
                Memory Required                      4724 bytes
                Additional Memory Required
                for Residual Plots                   0 bytes

Variables Entered/Removed(b)

    Model    Variables Entered    Variables Removed    Method
  • 28.
    1        number of students, avg class size 4-6, pct 1st year in school, avg class size k-3, pct emer credential, english language learners, year round school, pct free meals, pct full credential(a)    .    Enter

    a. All requested variables entered.
    b. Dependent Variable: api 2000

Model Summary

    Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
    1        .919(a)    .845        .841                 56.768

    a. Predictors: (Constant), number of students, avg class size 4-6, pct 1st year in school, avg class size k-3, pct emer credential, english language learners, year round school, pct free meals, pct full credential

ANOVA(b)

    Model            Sum of Squares    df     Mean Square    F          Sig.
    1  Regression    6740702.006         9    748966.890     232.409    .000(a)
       Residual      1240707.781       385      3222.618
       Total         7981409.787       394

    a. Predictors: (Constant), number of students, avg class size 4-6, pct 1st year in school, avg class size k-3, pct emer credential, english language learners, year round school, pct free meals, pct full credential
    b. Dependent Variable: api 2000

Coefficients(a)
  • 29.
                                     Unstandardized Coefficients    Standardized Coefficients
    Model                            B           Std. Error         Beta                         t          Sig.
    1  (Constant)                    758.942     62.286                                          12.185     .000
       english language learners       -.860       .211             -.150                        -4.083     .000
       pct free meals                 -2.948       .170             -.661                       -17.307     .000
       year round school             -19.889      9.258             -.059                        -2.148     .032
       pct 1st year in school         -1.301       .436             -.069                        -2.983     .003
       avg class size k-3              1.319      2.253              .013                          .585     .559
       avg class size 4-6              2.032       .798              .055                         2.546     .011
       pct full credential              .610       .476              .064                         1.281     .201
       pct emer credential             -.707       .605             -.058                        -1.167     .244
       number of students              -.012       .017             -.019                         -.724     .469

    a. Dependent Variable: api 2000

1) Examining the output from this regression analysis: as with the simple regression, we look to the p-value of the F-test to see if the overall model is significant. With a p-value of zero to three decimal places, the model is statistically significant. The R-squared is 0.845, meaning that approximately 85% of the variability of api00 is accounted for by the variables in the model. In this case, the adjusted R-squared indicates that about 84% of the variability of api00 is accounted for by the model, even after taking into account the number of predictor variables in the model. The coefficient for each of the variables indicates the amount of change one could expect in api00 given a one-unit change in the value of that variable, given that all other variables in the model are held constant. For example, consider the variable ell. We would expect a decrease of 0.86 in the api00 score for every one-unit increase in ell, assuming that all other variables in the model are held constant.

2) R-Square is the proportion of variance in the dependent variable (api00) which can be predicted from the independent variables (ell, meals, yr_rnd, mobility, acs_k3, acs_46, full, emer and enroll). This value indicates that about 85% of the variance in api00 can be predicted from the variables ell, meals, yr_rnd, mobility, acs_k3, acs_46, full, emer and enroll.
3) The beta coefficients are used by some researchers to compare the relative strength of the various predictors within the model. Because the beta coefficients are all measured in standard deviations, instead of the units of the variables, they can be compared to one another. In other words, the beta coefficients are the coefficients that you would obtain if the outcome and predictor variables were all transformed to standard scores, also called z-scores, before running the regression. In this example, meals has the largest Beta coefficient, -0.661, and acs_k3 has the smallest Beta, 0.013. Thus, a one standard deviation increase in meals leads to a 0.661 standard deviation decrease in predicted api00, with the other variables held constant. And a one standard deviation increase in acs_k3, in turn, leads to a 0.013 standard deviation increase in predicted api00, with the other variables in the model held constant.

4) The adjusted R-square attempts to yield a more honest value to estimate the R-squared for the population. The value of R-square was .8446, while the value of Adjusted R-square was .8409.

  • 30. 5) The F Value is the Mean Square Regression (748966.89) divided by the Mean Square Residual (3222.61761), yielding F=232.41. The p value associated with this F value is very small (0.0000). These values are used to answer the question "Do the independent variables reliably predict the dependent variable?". The p value is compared to your alpha level (typically 0.05) and, if smaller, you can conclude "Yes, the independent variables reliably predict the dependent variable".

6) These are the degrees of freedom associated with the sources of variance. The Total variance has N-1 degrees of freedom (DF). In this case, there were N=395 observations, so the DF for total is 394.
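The quantities discussed in points 2) to 6) can all be recomputed from the ANOVA table of the second regression. The Python sketch below is only a check of that arithmetic; the sums of squares and degrees of freedom are the printed values, and the adjusted R-square uses the standard formula 1 − (1 − R²)(N − 1)/(N − k − 1), which is assumed here to be what SPSS reports.

```python
# Values from the ANOVA table of the second regression.
ss_regression = 6740702.006
ss_residual = 1240707.781
df_regression = 9      # number of predictors, k
df_residual = 385      # N - k - 1

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual   # N - 1
n_obs = df_total + 1                     # cases actually used

# F = Mean Square Regression / Mean Square Residual.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_value = ms_regression / ms_residual

# R-square and adjusted R-square.
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual

print(n_obs, round(f_value, 2), round(r_squared, 4), round(adj_r_squared, 4))
```

The adjustment shrinks R-square slightly because nine predictors were fitted. With 395 cases the penalty is small, which is why the two values differ only in the third decimal place.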