This document summarizes and provides sample code for performing multiple regression and other statistical analyses in SAS. It includes sample SAS code for:
1. Performing a multiple linear regression with diagnostic plots to analyze the relationship between physician numbers (Y) and various population characteristics (X1-X10);
2. Exporting the regression results and diagnostic plots to an Excel file;
3. Conducting univariate analysis to examine the distribution of key independent variables;
4. Performing chi-square and Fisher's exact tests to analyze relationships between demographic variables;
5. Calculating summary statistics and conducting hypothesis tests to compare sugar content in children's and adults' cereals.
1. Qimiao Amy Hu
Sample Multiple Regression Analysis using SPSS:
Based on the scatter plot matrix and correlation matrix below, Y is highly correlated with X1, X2 and X3,
while Y is only weakly correlated with the other variables (X4 to X10). We can drop X5 to X10 from the
model. In addition, both plots exhibit multicollinearity among X1, X2 and X3 (correlations highlighted
in yellow in the original output).
Y= # of active physicians
X1 = total population
X2 = total personal income
X3 = number of hospital beds
X4 = % of population aged 18‒34
X5 = % of population 65 or older
X6 = % high school graduates
X7 = % bachelor's degrees
X8 = % below poverty level
X9 = % unemployment
X10 = per capita income
[Figure: scatter plot matrix of Y and X1 to X10]
Correlations (SPSS output; N = 77 for every pair of variables)

                                  Y       X1       X2       X3       X4       X5       X6       X7       X8       X9      X10
Y: # of active physicians
  Pearson Correlation             1   .980**   .986**   .990**   .312**    -.080    -.057     .182    -.034    -.061    .276*
  Sig. (2-tailed)                       .000     .000     .000     .006     .488     .620     .113     .770     .598     .015
X1: Total population
  Pearson Correlation        .980**        1   .995**   .987**   .303**    -.130    -.081     .106    -.035    -.019     .207
  Sig. (2-tailed)              .000              .000     .000     .007     .258     .484     .357     .764     .868     .071
X2: Total personal income
  Pearson Correlation        .986**   .995**        1   .983**   .310**    -.127    -.055     .161    -.072    -.047    .276*
  Sig. (2-tailed)              .000     .000              .000     .006     .271     .634     .161     .533     .685     .015
X3: # of hospital beds
  Pearson Correlation        .990**   .987**   .983**        1    .284*    -.070    -.098     .106     .009    -.021     .205
  Sig. (2-tailed)              .000     .000     .000              .012     .546     .395     .361     .941     .855     .074
X4: % of pop aged 18-34
  Pearson Correlation        .312**   .303**   .310**    .284*        1  -.541**     .040   .344**    -.044    -.034     .162
  Sig. (2-tailed)              .006     .007     .006     .012              .000     .728     .002     .705     .767     .159
X5: % of pop 65 or older
  Pearson Correlation         -.080    -.130    -.127    -.070  -.541**        1    -.115    -.163     .133     .065    -.026
  Sig. (2-tailed)              .488     .258     .271     .546     .000              .321     .156     .250     .576     .823
X6: % of high school grads
  Pearson Correlation         -.057    -.081    -.055    -.098     .040    -.115        1   .720**  -.832**  -.701**   .442**
  Sig. (2-tailed)              .620     .484     .634     .395     .728     .321              .000     .000     .000     .000
X7: % of bachelor's degrees
  Pearson Correlation          .182     .106     .161     .106   .344**    -.163   .720**        1  -.618**  -.568**   .746**
  Sig. (2-tailed)              .113     .357     .161     .361     .002     .156     .000              .000     .000     .000
X8: % below poverty level
  Pearson Correlation         -.034    -.035    -.072     .009    -.044     .133  -.832**  -.618**        1   .576**  -.623**
  Sig. (2-tailed)              .770     .764     .533     .941     .705     .250     .000     .000              .000     .000
X9: % unemployment
  Pearson Correlation         -.061    -.019    -.047    -.021    -.034     .065  -.701**  -.568**   .576**        1  -.391**
  Sig. (2-tailed)              .598     .868     .685     .855     .767     .576     .000     .000     .000              .000
X10: Per capita income
  Pearson Correlation         .276*     .207    .276*     .205     .162    -.026   .442**   .746**  -.623**  -.391**        1
  Sig. (2-tailed)              .015     .071     .015     .074     .159     .823     .000     .000     .000     .000

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Sample SAS Codes:
ODS TAGSETS.EXCELXP
file='/folders/myfolders/sasuser.v94/sales performance.xls'
STYLE=minimal
OPTIONS ( Orientation = 'landscape'
FitToPage = 'yes'
Pages_FitWidth = '1'
Pages_FitHeight = '100' );
ods output ParameterEstimates=work.Sales_Regre;
ods graphics on;
title "Linear Regression with Diagnostic Plots";
Proc Reg data=Sales_Reg;
model y=x1-x8;
OUTPUT OUT=OUTREG1 P=PREDICT R=RESID RSTUDENT=RSTUDENT COOKD=COOKD;
run;
quit;
/* Close the ExcelXP destination so the workbook is written */
ods tagsets.excelxp close;
title 'Sales Regression Histogram';
ods select HistogramBins MyHist;
proc univariate data=Sales_Reg;
histogram x1 / midpercents name='MyHist'
endpoints = 3.425 to 3.6 by .025;
run;
PROC IMPORT OUT=Demographics DATAFILE='/folders/myfolders/demographics.xls'
DBMS=xls REPLACE;
SHEET='sheet1';
RUN;
Proc Format ;
Value RC 1='White' 2='African American' 3='Hispanic' 4='Asian' 5-9='Others';
Run;
Proc Format ;
Value GD 1='Male' 2='Female' 9='Unknown';
run;
Proc Freq data=Demographics;
Format race RC.;
Format Gender GD.;
Tables Race*Gender/chisq out=chisqT;
run;
PROC EXPORT DATA =chisqT
OUTFILE = "C:\desktop\demographics.xls"
DBMS=xls REPLACE;
Sheet = "ChisqT";
RUN;
Proc Freq data=Demographics;
Format race RC.;
Format Gender GD.;
Tables Race*Gender/fisher out=fisherT;
run;
The SAS System
The FREQ Procedure

Table of race by gender
Each cell lists Frequency, Percent, Row Pct and Col Pct, in that order.

race                    Male    Female   Unknown     Total
White                      6         5         1        12
                       11.11      9.26      1.85     22.22
                       50.00     41.67      8.33
                       24.00     17.86    100.00
African American           6        10         0        16
                       11.11     18.52      0.00     29.63
                       37.50     62.50      0.00
                       24.00     35.71      0.00
Hispanic                   6        11         0        17
                       11.11     20.37      0.00     31.48
                       35.29     64.71      0.00
                       24.00     39.29      0.00
Asian                      7         0         0         7
                       12.96      0.00      0.00     12.96
                      100.00      0.00      0.00
                       28.00      0.00      0.00
Others                     0         2         0         2
                        0.00      3.70      0.00      3.70
                        0.00    100.00      0.00
                        0.00      7.14      0.00
Total                     25        28         1        54
                       46.30     51.85      1.85    100.00

Statistics for Table of race by gender

Statistic                      DF      Value     Prob
Chi-Square                      8    15.1896   0.0556
Likelihood Ratio Chi-Square     8    17.9763   0.0214
Mantel-Haenszel Chi-Square      1     1.4866   0.2228
Phi Coefficient                       0.5304
Contingency Coefficient               0.4685
Cramer's V                            0.3750

Sample Size = 54
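As a rough check on the output above, the Pearson chi-square statistic and its degrees of freedom can be written out from the table margins. This is the standard textbook form, not something the slides state; O_ij, E_ij, R_i and C_j are notation introduced here for the observed count, expected count, row total and column total.

```latex
\begin{aligned}
E_{ij} &= \frac{R_i \, C_j}{n}, \qquad
\chi^2 = \sum_{i=1}^{5} \sum_{j=1}^{3} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\text{DF} = (5 - 1)(3 - 1) = 8, \\
E_{\text{White, Male}} &= \frac{12 \times 25}{54} \approx 5.56, \qquad
E_{\text{Others, Unknown}} = \frac{2 \times 1}{54} \approx 0.04 .
\end{aligned}
```

Several expected counts fall well below 5 (every cell in the Unknown column, for instance), so the chi-square approximation may be unreliable for this table. That is consistent with the code also requesting Fisher's exact test.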
Obs  race              gender   COUNT  PERCENT
1    White             Male         6  11.1111
2    White             Female       5   9.2593
3    White             Unknown      1   1.8519
4    African American  Male         6  11.1111
5    African American  Female      10  18.5185
6    Hispanic          Male         6  11.1111
7    Hispanic          Female      11  20.3704
8    Asian             Male         7  12.9630
9    Others            Female       2   3.7037
PROC IMPORT OUT=Child_SC
DATAFILE='/folders/myfolders/sasuser.v94/sugar contents in the cereals.xls'
DBMS=xls REPLACE;
SHEET='children';
RUN;
PROC IMPORT OUT=Adult_SC
DATAFILE='/folders/myfolders/sasuser.v94/sugar contents in the cereals.xls'
DBMS=xls REPLACE;
SHEET='adults';
RUN;
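Data CSC_STA;
set Child_SC (Rename=(Children_cereals=y1)) end=Hu nobs=no_of_obs1;
SumY1+Y1; /* running sum of y */
SSY1+Y1**2; /* running sum of y squared */
YY1+2*Y1; /* running sum of 2*y */
if Hu; /* keep only the last observation, which holds the totals */
n1=no_of_obs1; /* sample size */
MeanY1=SumY1/n1;
/* sample variance from the expanded sum of squared deviations */
VARY1=(SSY1-YY1*MeanY1+n1*(MeanY1)**2)/(n1-1);
Keep n1 SumY1 MeanY1 VarY1;
run;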
proc print data=CSC_STA noobs;
title "Children Sugar Content Statistics";
run;
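The variance line in the data step above rests on expanding the sum of squared deviations. A short derivation in the data step's own variable names (SumY1, SSY1, YY1, MeanY1, n1) is sketched here; y_i and s_1^2 are notation introduced for the individual sugar values and the sample variance.

```latex
\begin{aligned}
\sum_{i=1}^{n_1} (y_i - \bar{y})^2
  &= \sum_{i=1}^{n_1} y_i^2 \;-\; \bar{y} \sum_{i=1}^{n_1} 2 y_i \;+\; n_1 \bar{y}^2 \\
  &= \mathrm{SSY1} - \mathrm{YY1} \cdot \mathrm{MeanY1} + n_1 \, \mathrm{MeanY1}^2 , \\
s_1^2 = \mathrm{VARY1}
  &= \frac{\mathrm{SSY1} - \mathrm{YY1} \cdot \mathrm{MeanY1} + n_1 \, \mathrm{MeanY1}^2}{n_1 - 1} .
\end{aligned}
```

The adults data step applies the same identity with the index 2 in place of 1.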
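Data ASC_STA;
Set Adult_SC (Rename=(adults_cereals=y2)) end=Hu nobs=no_of_obs2;
SumY2+Y2; /* running sum of y */
SSY2+Y2**2; /* running sum of y squared */
YY2+2*Y2; /* running sum of 2*y */
if Hu; /* keep only the last observation, which holds the totals */
n2=no_of_obs2; /* sample size */
MeanY2=SumY2/n2;
/* sample variance from the expanded sum of squared deviations */
VARY2=(SSY2-YY2*MeanY2+n2*(MeanY2)**2)/(n2-1);
Keep n2 SumY2 MeanY2 VarY2;
run;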
proc print data=ASC_STA noobs;
title "Adults Sugar Content Statistics";
run;
Data SC_STA;
Set Work.CSC_STA;
Set Work.ASC_STA;
/*Alpha=5%*/
/*NL denotes the sample size for the sample group with larger sample variance
NS denotes the sample size for the sample group with smaller sample variance */
if max(VarY1,VarY2)=VarY1 then NL=n1;
else NL=n2;
If NL=n1 then NS=n2;
else NS=n1;
/* 1. F test for equality of variances (larger variance in the numerator) */
F=Max(VarY1, VarY2)/Min(VarY1, VarY2);
p_value1=1 - CDF('F', F, NL-1 , NS-1);
T_critical1=FINV(1-.05 , NL-1 , NS-1); /* critical F value, despite the T_ prefix */
/* 2. Two-sample t test with unequal variances (Satterthwaite df) */
t_Sta2=((MeanY1-MeanY2)-0)/sqrt(VARY1/n1+VARY2/n2);
df2=(VARY1/n1+VARY2/n2)**2/(1/(n1-1)*(VARY1/n1)**2+1/(n2-1)*(VARY2/n2)**2);
T_critical2=TINV(1-.05/2, df2);
p_Value2=2*(1-CDF('T', abs(t_Sta2), df2)); /* abs() keeps the two-sided p-value valid when t is negative */
/* 3. Two-sample t test with pooled variance; VarY1 and VarY2 are already variances, so they are not squared */
SS_pool=((n1-1)*VarY1+(n2-1)*VarY2)/((n1-1)+(n2-1));
SE_pool=sqrt(SS_pool)*sqrt(1/n1+1/n2);
df3=(n1-1)+(n2-1);
t_Sta3=(MeanY1-MeanY2-0)/SE_pool;
T_Critical3=TINV(1-0.05/2, df3);
P_Value3=2*(1-CDF('T', abs(t_Sta3), df3));
Drop SS_Pool;
Run;
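For reference, the three tests computed in this data step correspond to the following standard textbook formulas. This is a sketch in conventional notation rather than the author's own derivation: s_1^2 and s_2^2 stand for VarY1 and VarY2, \bar{y}_1 and \bar{y}_2 for MeanY1 and MeanY2, \nu for df2, s_p^2 for SS_pool, and T_df for the Student t distribution function.

```latex
\begin{aligned}
F &= \frac{\max(s_1^2, s_2^2)}{\min(s_1^2, s_2^2)}, \qquad \text{df} = (n_L - 1,\; n_S - 1) \\[4pt]
t_{\text{unequal}} &= \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \qquad
\nu = \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}
           {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} \\[4pt]
s_p^2 &= \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}, \qquad
t_{\text{pooled}} = \frac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{1/n_1 + 1/n_2}}, \qquad
\text{df} = n_1 + n_2 - 2 \\[4pt]
p_{\text{two-sided}} &= 2 \left( 1 - T_{\text{df}}\!\left( |t| \right) \right)
\end{aligned}
```

Note that the pooled variance weights the sample variances themselves, not their squares, and that the two-sided p-value uses the absolute value of t.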
proc transpose data=SC_STA out=Two_sided_T_Test (Rename=(Col1=STA_Value));
run;
Proc Print Data=Two_sided_T_Test noobs;
run;