SAS & SPSS Sample Codes

Qimiao Amy Hu
Sample Multiple Regression Analysis using SPSS:
Based on the Scatter Plot Matrix and Correlation Matrix below, Y is highly correlated with X1, X2 and X3 (r = .980, .986 and .990), while its correlations with the other variables (X4 to X10) are weak (|r| ≤ .312). Only X4 among these is significant at the 0.01 level, so X5 to X10 can be dropped from the model. In addition, both displays show multicollinearity among X1, X2 and X3 (pairwise r from .983 to .995).
Y = # of active physicians
X1 = total population
X2 = total personal income
X3 = number of hospital beds
X4 = % of population aged 18‒34
X5 = % of population 65 or older
X6 = % high school graduates
X7 = % bachelor's degrees
X8 = % below poverty level
X9 = % unemployment
X10 = per capita income
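The correlation screen described above can also be sketched in SAS with PROC CORR. This assumes the physicians data have been read into a SAS data set named Physicians with variables y and x1 to x10; that data set name and the output name Corr_Y are hypothetical. The WITH statement limits the output to the correlation of Y with each candidate predictor, with the two-sided p-value printed under each r.

```sas
/* Correlation screen: Pearson r and two-sided p-values of Y against
   each candidate predictor. Physicians (input) and Corr_Y (output)
   are hypothetical data set names. */
ods output PearsonCorr=Corr_Y;
proc corr data=Physicians pearson;
  with y;
  var x1-x10;
run;
```

Variables whose correlation with Y is small and not significant are the candidates to drop, as in the SPSS discussion.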
[Figure: Scatter Plot Matrix of Y and X1 to X10]

Correlations (r = Pearson Correlation, Sig. = Sig. (2-tailed); N = 77 for every pair)

         Y       X1      X2      X3      X4      X5      X6      X7      X8      X9      X10
Y   r    1       .980**  .986**  .990**  .312**  -.080   -.057   .182    -.034   -.061   .276*
    Sig.         .000    .000    .000    .006    .488    .620    .113    .770    .598    .015
X1  r    .980**  1       .995**  .987**  .303**  -.130   -.081   .106    -.035   -.019   .207
    Sig. .000            .000    .000    .007    .258    .484    .357    .764    .868    .071
X2  r    .986**  .995**  1       .983**  .310**  -.127   -.055   .161    -.072   -.047   .276*
    Sig. .000    .000            .000    .006    .271    .634    .161    .533    .685    .015
X3  r    .990**  .987**  .983**  1       .284*   -.070   -.098   .106    .009    -.021   .205
    Sig. .000    .000    .000            .012    .546    .395    .361    .941    .855    .074
X4  r    .312**  .303**  .310**  .284*   1       -.541** .040    .344**  -.044   -.034   .162
    Sig. .006    .007    .006    .012            .000    .728    .002    .705    .767    .159
X5  r    -.080   -.130   -.127   -.070   -.541** 1       -.115   -.163   .133    .065    -.026
    Sig. .488    .258    .271    .546    .000            .321    .156    .250    .576    .823
X6  r    -.057   -.081   -.055   -.098   .040    -.115   1       .720**  -.832** -.701** .442**
    Sig. .620    .484    .634    .395    .728    .321            .000    .000    .000    .000
X7  r    .182    .106    .161    .106    .344**  -.163   .720**  1       -.618** -.568** .746**
    Sig. .113    .357    .161    .361    .002    .156    .000            .000    .000    .000
X8  r    -.034   -.035   -.072   .009    -.044   .133    -.832** -.618** 1       .576**  -.623**
    Sig. .770    .764    .533    .941    .705    .250    .000    .000            .000    .000
X9  r    -.061   -.019   -.047   -.021   -.034   .065    -.701** -.568** .576**  1       -.391**
    Sig. .598    .868    .685    .855    .767    .576    .000    .000    .000            .000
X10 r    .276*   .207    .276*   .205    .162    -.026   .442**  .746**  -.623** -.391** 1
    Sig. .015    .071    .015    .074    .159    .823    .000    .000    .000    .000

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Sample SAS Codes:
/* Send the output to an Excel workbook through the ExcelXP tagset */
ODS TAGSETS.EXCELXP
  file='/folders/myfolders/sasuser.v94/sales performance.xls'
  STYLE=minimal
  OPTIONS ( Orientation = 'landscape'
            FitToPage = 'yes'
            Pages_FitWidth = '1'
            Pages_FitHeight = '100' );
/* Save the parameter estimates table as a data set */
ods output ParameterEstimates=work.Sales_Regre;
ods graphics on;
title "Linear Regression with Diagnostic Plots";
Proc Reg data=Sales_Reg;
  model y=x1-x8;
  /* Save predicted values, residuals, studentized residuals and Cook's D */
  OUTPUT OUT=OUTREG1 P=PREDICT R=RESID RSTUDENT=RSTUDENT COOKD=COOKD;
run;
quit;
title 'Sales Regression Histogram';
ods select HistogramBins MyHist;
proc univariate data=Sales_Reg;
  histogram x1 / midpercents name='MyHist'
                 endpoints = 3.425 to 3.6 by .025;
run;
/* Close the ExcelXP destination so that the workbook is written */
ods tagsets.excelxp close;
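The SPSS example above flags multicollinearity among X1, X2 and X3 only through pairwise correlations. A PROC REG sketch can quantify it with variance inflation factors (VIF = 1/(1 - R-square of Xj on the other predictors)). This assumes the physicians data are in a SAS data set named Physicians (hypothetical, as is the output name Physicians_PE) and keeps X1 to X4 after dropping X5 to X10. A common rule of thumb treats a VIF above 10 as a sign of serious multicollinearity.

```sas
/* Collinearity diagnostics for the physicians model (a sketch).
   Physicians and Physicians_PE are hypothetical data set names.
   VIF and TOL add variance inflation factors and tolerances to the
   Parameter Estimates table; COLLIN adds eigenvalues and condition
   indices of the scaled cross-products matrix. */
title "Collinearity Diagnostics";
ods output ParameterEstimates=Physicians_PE;
proc reg data=Physicians;
  model y = x1-x4 / vif tol collin;
run;
quit;

/* Tolerance = 1 / VIF; the intercept row has no VIF */
proc print data=Physicians_PE noobs;
  var Variable Estimate Tolerance VarianceInflation;
run;
```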
PROC IMPORT OUT=Demographics DATAFILE='/folders/myfolders/demographics.xls'
  DBMS=xls REPLACE;
  SHEET='sheet1';
run;
/* Value labels for the race and gender codes */
Proc Format;
  Value RC 1='White' 2='African American' 3='Hispanic' 4='Asian' 5-9='Others';
  Value GD 1='Male' 2='Female' 9='Unknown';
run;
/* Chi-square test of independence; OUT= saves the cell counts and percents */
Proc Freq data=Demographics;
  Format Race RC. Gender GD.;
  Tables Race*Gender / chisq out=chisqT;
run;
PROC EXPORT DATA=chisqT
  OUTFILE="C:\desktop\demographics.xls"
  DBMS=xls REPLACE;
  Sheet="ChisqT";
run;
/* Fisher's exact test, useful when many expected cell counts are small */
Proc Freq data=Demographics;
  Format Race RC. Gender GD.;
  Tables Race*Gender / fisher out=fisherT;
run;
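Two PROC FREQ options may help when reading this output. EXPECTED prints the expected count of each cell, which shows why the Fisher exact test is worth requesting: from the row and column totals in the output, 9 of the 15 cells have expected counts below 5, and all three Unknown cells are below 1. The OUT= data set holds only counts and percents, so the test statistics themselves are easier to keep with ODS OUTPUT. The sketch below is an illustration, not the original analysis: it rebuilds the table from the cell counts printed in the FREQ output so that it runs on its own, and the names Race_Gender, ChiSqStats and FisherStats are hypothetical.

```sas
/* Rebuild the race-by-gender table from the printed cell counts.
   Race_Gender, ChiSqStats and FisherStats are hypothetical names. */
data Race_Gender;
  length Race $16 Gender $7;
  input Race $ 1-16 Gender $ Count;
  datalines;
White            Male     6
White            Female   5
White            Unknown  1
African American Male     6
African American Female  10
Hispanic         Male     6
Hispanic         Female  11
Asian            Male     7
Others           Female   2
;
run;

/* EXPECTED prints expected cell counts; ODS OUTPUT keeps the test
   statistics, which the OUT= option does not save. ORDER=DATA keeps
   the row and column order of the original report. */
ods output ChiSq=ChiSqStats FishersExact=FisherStats;
proc freq data=Race_Gender order=data;
  weight Count;
  tables Race*Gender / chisq expected fisher nopercent norow nocol;
run;
```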

The SAS System
The FREQ Procedure

Table of race by gender
Cell contents (top to bottom): Frequency, Percent, Row Pct, Col Pct

race                  Male    Female   Unknown     Total
White                    6         5         1        12
                     11.11      9.26      1.85     22.22
                     50.00     41.67      8.33
                     24.00     17.86    100.00
African American         6        10         0        16
                     11.11     18.52      0.00     29.63
                     37.50     62.50      0.00
                     24.00     35.71      0.00
Hispanic                 6        11         0        17
                     11.11     20.37      0.00     31.48
                     35.29     64.71      0.00
                     24.00     39.29      0.00
Asian                    7         0         0         7
                     12.96      0.00      0.00     12.96
                    100.00      0.00      0.00
                     28.00      0.00      0.00
Others                   0         2         0         2
                      0.00      3.70      0.00      3.70
                      0.00    100.00      0.00
                      0.00      7.14      0.00
Total                   25        28         1        54
                     46.30     51.85      1.85    100.00

Statistics for Table of race by gender

Statistic                     DF     Value      Prob
Chi-Square                     8   15.1896    0.0556
Likelihood Ratio Chi-Square    8   17.9763    0.0214
Mantel-Haenszel Chi-Square     1    1.4866    0.2228
Phi Coefficient                     0.5304
Contingency Coefficient             0.4685
Cramer's V                          0.3750

Sample Size = 54
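The three association measures at the bottom of this output are functions of the Pearson chi-square, the sample size and the table dimensions (5 race levels by 3 gender levels): Phi = sqrt(ChiSq/n), the contingency coefficient = sqrt(ChiSq/(ChiSq + n)), and Cramer's V = sqrt(ChiSq/(n*min(r-1, c-1))). A short DATA step reproduces the printed values from the printed chi-square; Assoc_Check is a hypothetical data set name.

```sas
/* Recompute the association measures from the printed Pearson
   chi-square, sample size and table dimensions.
   Assoc_Check is a hypothetical data set name. */
data Assoc_Check;
  ChiSq = 15.1896;   /* Pearson chi-square from the output */
  n     = 54;        /* sample size                         */
  r     = 5;         /* number of race levels               */
  c     = 3;         /* number of gender levels             */
  Phi      = sqrt(ChiSq / n);
  ContCoef = sqrt(ChiSq / (ChiSq + n));
  CramersV = sqrt(ChiSq / (n * min(r - 1, c - 1)));
run;

proc print data=Assoc_Check noobs;
  var Phi ContCoef CramersV;
  format Phi ContCoef CramersV 6.4;
run;
```

Printed to four decimals, the three values are 0.5304, 0.4685 and 0.3750, matching the FREQ output.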
Obs  race              gender   COUNT   PERCENT
  1  White             Male         6   11.1111
  2  White             Female       5    9.2593
  3  White             Unknown      1    1.8519
  4  African American  Male         6   11.1111
  5  African American  Female      10   18.5185
  6  Hispanic          Male         6   11.1111
  7  Hispanic          Female      11   20.3704
  8  Asian             Male         7   12.9630
  9  Others            Female       2    3.7037
PROC IMPORT OUT=Child_SC
  DATAFILE='/folders/myfolders/sasuser.v94/sugar contents in the cereals.xls'
  DBMS=xls REPLACE;
  SHEET='children';
run;
PROC IMPORT OUT=Adult_SC
  DATAFILE='/folders/myfolders/sasuser.v94/sugar contents in the cereals.xls'
  DBMS=xls REPLACE;
  SHEET='adults';
run;
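/* Children's cereals: accumulate sums over all rows, then keep only the
   last row, where n, the mean and the sample variance are final */
Data CSC_STA;
  set Child_SC (Rename=(Children_cereals=y1)) end=Hu;
  SumY1+Y1;          /* running sum of y1            */
  SSY1+Y1**2;        /* running sum of y1 squared    */
  YY1+2*Y1;          /* running sum of 2*y1          */
  if Hu;             /* continue only on the last row */
  n1=_n_;            /* number of rows read          */
  MeanY1=SumY1/n1;
  VARY1=(SSY1-YY1*MeanY1+n1*(MeanY1)**2)/(n1-1);
  Keep n1 SumY1 MeanY1 VarY1;
run;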
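The variance line relies on expanding the sum of squared deviations. Because YY1 accumulates 2y for every observation, YY1 times the mean is exactly the middle term, so VarY1 is the usual sample variance with divisor n1 - 1 (the same holds for VarY2 in ASC_STA):

```latex
\sum_{i=1}^{n_1}\left(y_{1i}-\bar{y}_1\right)^2
  = \underbrace{\sum_{i=1}^{n_1} y_{1i}^2}_{\texttt{SSY1}}
  - \underbrace{\sum_{i=1}^{n_1} 2y_{1i}}_{\texttt{YY1}}\,\bar{y}_1
  + n_1\bar{y}_1^{\,2},
\qquad
\texttt{VarY1} = \frac{1}{n_1-1}\sum_{i=1}^{n_1}\left(y_{1i}-\bar{y}_1\right)^2
```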
proc print data=CSC_STA noobs;
title "Children Sugar Content Statistics";
run;
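/* Adults' cereals: the same computation as for the children's cereals */
Data ASC_STA;
  Set Adult_SC (Rename=(adults_cereals=y2)) end=Hu;
  SumY2+Y2;          /* running sum of y2            */
  SSY2+Y2**2;        /* running sum of y2 squared    */
  YY2+2*Y2;          /* running sum of 2*y2          */
  if Hu;             /* continue only on the last row */
  n2=_n_;            /* number of rows read          */
  MeanY2=SumY2/n2;
  VARY2=(SSY2-YY2*MeanY2+n2*(MeanY2)**2)/(n2-1);
  Keep n2 SumY2 MeanY2 VarY2;
run;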
proc print data=ASC_STA noobs;
title "Adults Sugar Content Statistics";
run;
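A quick way to check the hand-computed statistics is to let PROC MEANS compute them and compare the two results side by side. This sketch assumes the data sets Child_SC, Adult_SC, CSC_STA and ASC_STA from the steps above; Child_Check, Adult_Check and Stats_Check are hypothetical names. The numbers should agree when the columns have no missing values: PROC MEANS counts only non-missing values, while n1 and n2 above count every row read.

```sas
/* Cross-check the DATA step statistics with PROC MEANS.
   Child_Check, Adult_Check and Stats_Check are hypothetical names.
   PROC MEANS divides the variance by n-1 by default (VARDEF=DF). */
proc means data=Child_SC noprint;
  var Children_cereals;
  output out=Child_Check(drop=_type_ _freq_)
         n=n1_chk sum=SumY1_chk mean=MeanY1_chk var=VarY1_chk;
run;

proc means data=Adult_SC noprint;
  var adults_cereals;
  output out=Adult_Check(drop=_type_ _freq_)
         n=n2_chk sum=SumY2_chk mean=MeanY2_chk var=VarY2_chk;
run;

/* One-to-one merge of four single-row data sets into one row */
data Stats_Check;
  merge CSC_STA Child_Check ASC_STA Adult_Check;
run;

proc print data=Stats_Check noobs;
  title "DATA Step vs. PROC MEANS";
run;
```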
Data SC_STA;
  Set Work.CSC_STA;
  Set Work.ASC_STA;
  /* Alpha = 5% */
  /* NL denotes the sample size of the group with the larger sample variance;
     NS denotes the sample size of the group with the smaller sample variance */
  if max(VarY1,VarY2)=VarY1 then do; NL=n1; NS=n2; end;
  else do; NL=n2; NS=n1; end;
  /* 1) Two-sided F test for equal variances (larger variance on top) */
  F=Max(VarY1, VarY2)/Min(VarY1, VarY2);
  p_value1=min(1, 2*(1 - CDF('F', F, NL-1, NS-1)));
  F_critical1=FINV(1-.05/2, NL-1, NS-1);
  /* 2) Welch t test (unequal variances, Satterthwaite df) */
  t_Sta2=((MeanY1-MeanY2)-0)/sqrt(VARY1/n1+VARY2/n2);
  df2=(VARY1/n1+VARY2/n2)**2/(1/(n1-1)*(VARY1/n1)**2+1/(n2-1)*(VARY2/n2)**2);
  T_critical2=TINV(1-.05/2, df2);
  p_Value2=2*(1-CDF('T', abs(t_Sta2), df2));
  /* 3) Pooled-variance t test (equal variances): pool the variances, not their squares */
  SS_pool=((n1-1)*VarY1+(n2-1)*VarY2)/((n1-1)+(n2-1));
  SE_pool=sqrt(SS_pool)*sqrt(1/n1+1/n2);
  df3=(n1-1)+(n2-1);
  t_Sta3=(MeanY1-MeanY2-0)/SE_pool;
  T_Critical3=TINV(1-0.05/2, df3);
  P_Value3=2*(1-CDF('T', abs(t_Sta3), df3));
  Drop SS_Pool;
Run;
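For reference, these are the textbook formulas for the three two-sided tests in SC_STA, writing s_1^2 = VarY1, s_2^2 = VarY2, and n_L, n_S for the sample sizes of the groups with the larger and smaller variance. The pooled variance weights the sample variances themselves, and each t test's two-sided p-value is 2P(T ≥ |t|) with ν (Welch) or n_1 + n_2 - 2 (pooled) degrees of freedom:

```latex
\begin{aligned}
F &= \frac{\max(s_1^2, s_2^2)}{\min(s_1^2, s_2^2)}, &
p_F &= \min\!\left\{1,\; 2\,P\!\left(F_{n_L-1,\,n_S-1} \ge F\right)\right\}\\
t_{\mathrm{W}} &= \frac{\bar{y}_1-\bar{y}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, &
\nu &= \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1} + \dfrac{(s_2^2/n_2)^2}{n_2-1}}\\
s_p^2 &= \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}, &
t_{\mathrm{P}} &= \frac{\bar{y}_1-\bar{y}_2}{s_p\sqrt{1/n_1 + 1/n_2}}
\end{aligned}
```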
proc transpose data=SC_STA out=Two_sided_T_Test (Rename=(Col1=STA_Value));
run;
Proc Print Data=Two_sided_T_Test noobs;
  Title "Two_sided_Tests_Results";
Run;
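PROC TTEST reports the same three tests in one step (the pooled and Satterthwaite t tests and the folded F test for equality of variances), so it can serve as a cross-check of SC_STA. In this sketch, Sugar_Long, Group, Sugar, TTest_Results and Variance_Test are hypothetical names, and Child_SC and Adult_SC are the imported data sets from above. PROC TTEST orders the class levels itself, so its t statistics may have the opposite sign of t_Sta2 and t_Sta3; compare absolute values and p-values.

```sas
/* Stack both groups into one data set with a group indicator.
   Sugar_Long, Group and Sugar are hypothetical names. */
data Sugar_Long;
  length Group $8;
  set Child_SC (in=inChild rename=(Children_cereals=Sugar))
      Adult_SC (rename=(adults_cereals=Sugar));
  if inChild then Group='Children';
  else Group='Adults';
  keep Group Sugar;
run;

/* Pooled and Satterthwaite t tests plus the folded F test;
   TTest_Results and Variance_Test are hypothetical output names. */
title "PROC TTEST Cross-Check";
ods output TTests=TTest_Results Equality=Variance_Test;
proc ttest data=Sugar_Long alpha=0.05;
  class Group;
  var Sugar;
run;
```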
