SAS & SPSS Sample Codes
Qimiao Amy Hu
Sample Multiple Regression Analysis using SPSS:
Based on the scatter plot matrix and correlation matrix below, Y is highly correlated with X1, X2,
and X3 (r = .980, .986, and .990), while its correlations with the other variables (X4 to X10) are
weak (none larger than .312 in absolute value). We can drop X5 to X10 from the model. In addition,
both displays show multicollinearity among X1, X2, and X3 (pairwise r between .983 and .995,
highlighted in yellow in the original correlation matrix).
Y= # of active physicians
X1 = total population
X2 = total personal income
X3 = number of hospital beds
X4 = % of population aged 18‒34
X5 = % of population 65 or older
X6 = % high school graduates
X7 = % bachelor's degrees
X8 = % below poverty level
X9 = % unemployment
X10 = per capita income
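The multicollinearity among X1, X2, and X3 can be put on a numeric scale with variance inflation factors (VIFs). The sketch below is a minimal illustration in Python, not part of the original SPSS analysis: it takes the three pairwise correlations reported in the correlation matrix and reads each VIF off the diagonal of the inverse 3x3 correlation matrix. The function name `vif_three` is hypothetical, and because the inputs are rounded to three decimals the resulting values are only approximate.

```python
# Pairwise Pearson correlations among X1, X2 and X3, copied from the SPSS matrix.
r12, r13, r23 = 0.995, 0.987, 0.983


def vif_three(r12, r13, r23):
    """Return VIFs for three predictors from their pairwise correlations.

    VIF_j is the j-th diagonal element of the inverse correlation matrix.
    For a 3x3 correlation matrix R that element is (1 - r_kl**2) / det(R),
    where k and l are the other two predictors.
    """
    det = 1 + 2 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2
    return {
        "X1": (1 - r23**2) / det,
        "X2": (1 - r13**2) / det,
        "X3": (1 - r12**2) / det,
    }


vifs = vif_three(r12, r13, r23)
for name, value in vifs.items():
    print(f"VIF({name}) = {value:.1f}")
```

A common rule of thumb treats a VIF above 10 as a sign of serious multicollinearity; all three predictors exceed that level here, which is consistent with the pattern described above.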
[Figure: scatter plot matrix of Y and X1 to X10]
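The significance flags in the correlation matrix can be checked by hand. A minimal Python sketch, assuming the usual test of zero correlation: with N = 77, a Pearson r converts to a t statistic with 75 degrees of freedom. The critical values in the code are approximate figures from a standard t table and are an assumption of this sketch, not part of the SPSS output; the helper names are hypothetical.

```python
import math

N = 77  # sample size reported for every pair in the correlation matrix


def t_from_r(r, n=N):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Approximate two-tailed critical values of t for 75 df (standard t table).
T_CRIT_05 = 1.992
T_CRIT_01 = 2.643


def stars(r):
    """Return the SPSS-style flag: '**' (0.01 level), '*' (0.05 level) or ''."""
    t = abs(t_from_r(r))
    if t >= T_CRIT_01:
        return "**"
    if t >= T_CRIT_05:
        return "*"
    return ""


# A few correlations taken from the matrix.
pairs = [("Y, X4", 0.312), ("Y, X10", 0.276), ("X3, X4", 0.284), ("X1, X10", 0.207)]
for label, r in pairs:
    print(f"r({label}) = {r:.3f}{stars(r)}  t = {t_from_r(r):.2f}")
```

With 77 observations, a correlation of roughly .22 is already significant at the 0.05 level, so a starred coefficient such as .276 is statistically significant yet still weak in practical terms.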
Correlations (SPSS output; N = 77 for every pair)

                                       Y      X1      X2      X3      X4      X5      X6      X7      X8      X9     X10
Y    # of active physicians   r        1    .980**  .986**  .990**  .312** -.080   -.057    .182   -.034   -.061    .276*
                              Sig.          .000    .000    .000    .006    .488    .620    .113    .770    .598    .015
X1   Total population         r     .980**     1    .995**  .987**  .303** -.130   -.081    .106   -.035   -.019    .207
                              Sig.  .000            .000    .000    .007    .258    .484    .357    .764    .868    .071
X2   Total personal income    r     .986**  .995**     1    .983**  .310** -.127   -.055    .161   -.072   -.047    .276*
                              Sig.  .000    .000            .000    .006    .271    .634    .161    .533    .685    .015
X3   # of hospital beds       r     .990**  .987**  .983**     1    .284*  -.070   -.098    .106    .009   -.021    .205
                              Sig.  .000    .000    .000            .012    .546    .395    .361    .941    .855    .074
X4   % of pop aged 18-34      r     .312**  .303**  .310**  .284*      1   -.541**  .040    .344** -.044   -.034    .162
                              Sig.  .006    .007    .006    .012            .000    .728    .002    .705    .767    .159
X5   % of pop 65 or older     r    -.080   -.130   -.127   -.070   -.541**     1   -.115   -.163    .133    .065   -.026
                              Sig.  .488    .258    .271    .546    .000            .321    .156    .250    .576    .823
X6   % of high school grads   r    -.057   -.081   -.055   -.098    .040   -.115       1    .720** -.832** -.701**  .442**
                              Sig.  .620    .484    .634    .395    .728    .321            .000    .000    .000    .000
X7   % of bachelor's degrees  r     .182    .106    .161    .106    .344** -.163    .720**     1   -.618** -.568**  .746**
                              Sig.  .113    .357    .161    .361    .002    .156    .000            .000    .000    .000
X8   % below poverty level    r    -.034   -.035   -.072    .009   -.044    .133   -.832** -.618**     1    .576** -.623**
                              Sig.  .770    .764    .533    .941    .705    .250    .000    .000            .000    .000
X9   % unemployment           r    -.061   -.019   -.047   -.021   -.034    .065   -.701** -.568**  .576**     1   -.391**
                              Sig.  .598    .868    .685    .855    .767    .576    .000    .000    .000            .000
X10  Per capita income        r     .276*   .207    .276*   .205    .162   -.026    .442**  .746** -.623** -.391**     1
                              Sig.  .015    .071    .015    .074    .159    .823    .000    .000    .000    .000

r = Pearson correlation; Sig. = two-tailed significance.
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Sample SAS Codes:

/* Route output to an Excel workbook through the ExcelXP tagset */
ods tagsets.ExcelXP
    file='/folders/myfolders/sasuser.v94/sales performance.xls'
    style=minimal
    options(orientation='landscape'
            fittopage='yes'
            pages_fitwidth='1'
            pages_fitheight='100');

/* Save the parameter estimates and request diagnostic plots */
ods output ParameterEstimates=work.Sales_Regre;
ods graphics on;
title "Linear Regression with Diagnostic Plots";
proc reg data=Sales_Reg;
    model y=x1-x8;
    output out=OUTREG1 p=PREDICT r=RESID rstudent=RSTUDENT cookd=COOKD;
run;
quit;

title 'Sales Regression Histogram';
ods select HistogramBins MyHist;
proc univariate data=Sales_Reg;
    histogram x1 / midpercents name='MyHist'
                   endpoints=3.425 to 3.6 by .025;
run;

/* Close the destination so that the workbook is written */
ods tagsets.ExcelXP close;

/* Import the demographics worksheet */
PROC IMPORT OUT=Demographics DATAFILE='/folders/myfolders/demographics.xls'
    DBMS=xls REPLACE;
    SHEET='sheet1';
RUN;

/* Value labels for the race and gender codes */
Proc Format;
    Value RC 1='White' 2='African American' 3='Hispanic' 4='Asian' 5-9='Others';
    Value GD 1='Male' 2='Female' 9='Unknown';
Run;

/* Chi-square test of independence; OUT= saves the cell counts and percents */
Proc Freq data=Demographics;
    Format race RC. Gender GD.;
    Tables Race*Gender / chisq out=chisqT;
run;

PROC EXPORT DATA=chisqT
    OUTFILE="C:\desktop\demographics.xls"
    DBMS=xls REPLACE;
    Sheet="ChisqT";
RUN;

/* Fisher's exact test for the same table */
Proc Freq data=Demographics;
    Format race RC. Gender GD.;
    Tables Race*Gender / fisher out=fisherT;
run;

The SAS System

The FREQ Procedure

Table of race by gender
Cell contents: Frequency / Percent / Row Pct / Col Pct

race                   Male    Female   Unknown     Total
White                     6         5         1        12
                      11.11      9.26      1.85     22.22
                      50.00     41.67      8.33
                      24.00     17.86    100.00
African American          6        10         0        16
                      11.11     18.52      0.00     29.63
                      37.50     62.50      0.00
                      24.00     35.71      0.00
Hispanic                  6        11         0        17
                      11.11     20.37      0.00     31.48
                      35.29     64.71      0.00
                      24.00     39.29      0.00
Asian                     7         0         0         7
                      12.96      0.00      0.00     12.96
                     100.00      0.00      0.00
                      28.00      0.00      0.00
Others                    0         2         0         2
                       0.00      3.70      0.00      3.70
                       0.00    100.00      0.00
                       0.00      7.14      0.00
Total                    25        28         1        54
                      46.30     51.85      1.85    100.00

Statistics for Table of race by gender

Statistic                        DF       Value      Prob
Chi-Square                        8     15.1896    0.0556
Likelihood Ratio Chi-Square       8     17.9763    0.0214
Mantel-Haenszel Chi-Square        1      1.4866    0.2228
Phi Coefficient                          0.5304
Contingency Coefficient                  0.4685
Cramer's V                               0.3750

Sample Size = 54

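The Pearson chi-square and the association measures in the FREQ output can be recomputed from the cell counts alone. The following Python sketch is an independent cross-check, not SAS code: it rebuilds the expected counts from the row and column totals, sums the cell contributions, and uses a closed-form upper-tail probability that is valid only for an even number of degrees of freedom (8 here). The helper name `chi2_sf_even_df` is hypothetical.

```python
import math

# Observed counts from the race-by-gender table (columns: Male, Female, Unknown).
observed = {
    "White": [6, 5, 1],
    "African American": [6, 10, 0],
    "Hispanic": [6, 11, 0],
    "Asian": [7, 0, 0],
    "Others": [0, 2, 0],
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
n = sum(row_totals)

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
chi2 = 0.0
small_cells = 0
for i, row in enumerate(rows):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected
        if expected < 5:
            small_cells += 1

df = (len(rows) - 1) * (len(col_totals) - 1)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    half = x / 2
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))


p_value = chi2_sf_even_df(chi2, df)
phi = math.sqrt(chi2 / n)
cramers_v = math.sqrt(chi2 / (n * min(len(rows) - 1, len(col_totals) - 1)))

print(f"chi-square = {chi2:.4f}, df = {df}, p = {p_value:.4f}")
print(f"phi = {phi:.4f}, Cramer's V = {cramers_v:.4f}")
print(f"cells with expected count < 5: {small_cells} of {len(rows) * len(col_totals)}")
```

The results should agree with the SAS values (15.1896 on 8 df, p = 0.0556, phi = 0.5304, Cramer's V = 0.3750) up to rounding. Most cells have an expected count below 5, so the chi-square approximation is doubtful for this table, which is a common reason for also requesting Fisher's exact test.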
Obs  race              gender   COUNT  PERCENT
  1  White             Male         6  11.1111
  2  White             Female       5   9.2593
  3  White             Unknown      1   1.8519
  4  African American  Male         6  11.1111
  5  African American  Female      10  18.5185
  6  Hispanic          Male         6  11.1111
  7  Hispanic          Female      11  20.3704
  8  Asian             Male         7  12.9630
  9  Others            Female       2   3.7037

/* Import the children's and adults' cereal worksheets */
PROC IMPORT OUT=Child_SC
    DATAFILE='/folders/myfolders/sasuser.v94/sugar contents in the cereals.xls'
    DBMS=xls REPLACE;
    SHEET='children';
RUN;

PROC IMPORT OUT=Adult_SC
    DATAFILE='/folders/myfolders/sasuser.v94/sugar contents in the cereals.xls'
    DBMS=xls REPLACE;
    SHEET='adults';
RUN;

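/* One-pass summary statistics for the children's cereals */
Data CSC_STA;
    set Child_SC (Rename=(Children_cereals=y1)) end=Hu;
    SumY1 + Y1;        /* running sum of y */
    SSY1 + Y1**2;      /* running sum of y squared */
    YY1 + 2*Y1;        /* running sum of 2y */
    if Hu;             /* continue only on the last observation */
    n1=_n_;
    MeanY1=SumY1/n1;
    /* sample variance: sum((y - mean)**2)/(n - 1), expanded term by term */
    VARY1=(SSY1-YY1*MeanY1+n1*(MeanY1)**2)/(n1-1);
    Keep n1 SumY1 MeanY1 VarY1;
run;
proc print data=CSC_STA noobs;
    title "Children Sugar Content Statistics";
run;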
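The variance expression in the DATA step is the expansion sum((y - mean)^2) = sum(y^2) - 2*mean*sum(y) + n*mean^2, which is why the step accumulates the running sums of y, y squared, and 2y. The Python sketch below mirrors that one-pass accumulation and compares the result with the standard library. The sugar values are hypothetical and are not the worksheet data; `one_pass_stats` is an illustrative name.

```python
import statistics


def one_pass_stats(values):
    """Mirror the DATA step: accumulate running sums, then finish after the last value."""
    n = 0
    sum_y = 0.0  # SumY: running sum of y
    ss_y = 0.0   # SSY: running sum of y**2
    yy = 0.0     # YY: running sum of 2*y
    for y in values:
        n += 1
        sum_y += y
        ss_y += y ** 2
        yy += 2 * y
    mean = sum_y / n
    # sum((y - mean)**2) expanded: sum(y**2) - mean * sum(2y) + n * mean**2
    var = (ss_y - yy * mean + n * mean ** 2) / (n - 1)
    return n, mean, var


# Hypothetical sugar contents, used only to exercise the function.
sample = [40.3, 55.0, 45.7, 43.3, 50.3, 45.9, 53.5, 43.0]
n, mean, var = one_pass_stats(sample)
print(f"n = {n}, mean = {mean:.4f}, variance = {var:.4f}")
print(f"statistics.variance = {statistics.variance(sample):.4f}")
```

The expanded form needs only one pass over the data, but it subtracts large, nearly equal sums and can lose precision when the mean is large relative to the spread; a two-pass calculation or PROC MEANS avoids that.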
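/* One-pass summary statistics for the adults' cereals */
Data ASC_STA;
    Set Adult_SC (Rename=(adults_cereals=y2)) end=Hu;
    SumY2 + Y2;        /* running sum of y */
    SSY2 + Y2**2;      /* running sum of y squared */
    YY2 + 2*Y2;        /* running sum of 2y */
    if Hu;             /* continue only on the last observation */
    n2=_n_;
    MeanY2=SumY2/n2;
    /* sample variance: sum((y - mean)**2)/(n - 1), expanded term by term */
    VARY2=(SSY2-YY2*MeanY2+n2*(MeanY2)**2)/(n2-1);
    Keep n2 SumY2 MeanY2 VarY2;
run;
proc print data=ASC_STA noobs;
    title "Adults Sugar Content Statistics";
run;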
Data SC_STA;
    Set Work.CSC_STA;
    Set Work.ASC_STA;
    /* Alpha = 5% */
    /* NL denotes the sample size of the group with the larger sample variance;
       NS denotes the sample size of the group with the smaller sample variance */
    if max(VarY1,VarY2)=VarY1 then NL=n1;
    else NL=n2;
    If NL=n1 then NS=n2;
    else NS=n1;
    /* Test 1: F test for equality of variances (larger variance in the numerator) */
    F=Max(VarY1, VarY2)/Min(VarY1, VarY2);
    p_value1=1 - CDF('F', F, NL-1, NS-1);
    F_critical1=FINV(1-.05, NL-1, NS-1);
    /* Test 2: t test with unequal variances and Satterthwaite degrees of freedom */
    t_Sta2=((MeanY1-MeanY2)-0)/sqrt(VARY1/n1+VARY2/n2);
    df2=(VARY1/n1+VARY2/n2)**2/(1/(n1-1)*(VARY1/n1)**2+1/(n2-1)*(VARY2/n2)**2);
    T_critical2=TINV(1-.05/2, df2);
    /* abs() keeps the two-sided p-value valid when the statistic is negative */
    p_Value2=2*(1-CDF('T', abs(t_Sta2), df2));
    /* Test 3: pooled-variance t test; VarY1 and VarY2 are already variances,
       so they enter the pooled estimate unsquared */
    SS_pool=((n1-1)*VarY1+(n2-1)*VarY2)/((n1-1)+(n2-1));
    SE_pool=sqrt(SS_pool)*sqrt(1/n1+1/n2);
    df3=(n1-1)+(n2-1);
    t_Sta3=(MeanY1-MeanY2-0)/SE_pool;
    T_Critical3=TINV(1-0.05/2, df3);
    P_Value3=2*(1-CDF('T', abs(t_Sta3), df3));
    Drop SS_Pool;
Run;

proc transpose data=SC_STA out=Two_sided_T_Test (Rename=(Col1=STA_Value));
run;

Proc Print Data=Two_sided_T_Test noobs;
    Title "Two_sided_Tests_Results";
Run;
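The three test statistics in the final DATA step depend only on the two sample means, variances, and sizes, so they are easy to verify outside SAS. The Python sketch below is one way to do that under stated assumptions: the function name and the summary figures are hypothetical, and p-values are omitted because the Python standard library has no F or t distribution functions. The pooled estimate weights the two sample variances (not their squares) by their degrees of freedom.

```python
import math


def two_sample_tests(mean1, var1, n1, mean2, var2, n2):
    """Return the F ratio, the unequal-variance t and the pooled t with their df."""
    # F test: the larger sample variance goes in the numerator.
    if var1 >= var2:
        f_stat, df_num, df_den = var1 / var2, n1 - 1, n2 - 1
    else:
        f_stat, df_num, df_den = var2 / var1, n2 - 1, n1 - 1

    diff = mean1 - mean2

    # Unequal-variance t statistic with Satterthwaite degrees of freedom.
    v1, v2 = var1 / n1, var2 / n2
    t_unequal = diff / math.sqrt(v1 + v2)
    df_unequal = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

    # Pooled-variance t statistic.
    df_pooled = n1 + n2 - 2
    var_pool = ((n1 - 1) * var1 + (n2 - 1) * var2) / df_pooled
    t_pooled = diff / math.sqrt(var_pool * (1 / n1 + 1 / n2))

    return {
        "F": f_stat, "df_num": df_num, "df_den": df_den,
        "t_unequal": t_unequal, "df_unequal": df_unequal,
        "t_pooled": t_pooled, "df_pooled": df_pooled, "var_pool": var_pool,
    }


# Hypothetical summary statistics, not taken from the cereal worksheets.
result = two_sample_tests(mean1=10.0, var1=4.0, n1=10, mean2=8.0, var2=9.0, n2=15)
for key, value in result.items():
    print(f"{key} = {value:.4f}")
```

When the F test does not reject equal variances, the pooled statistic is the usual choice; otherwise the unequal-variance statistic with its fractional degrees of freedom is the safer one.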
