SAS and R Code
For
Basic Statistics
Avjinder Singh Kaler
Table of Contents
1. Reading Data
2. Descriptive Statistics (DS)
3. Correlation and Covariance
4. Analysis of Variance (ANOVA)
5. Regression and Multiple Regression
1. Reading Data
1. SAS
I. There are many ways to read data in SAS, but the most common one is a DATA step with inline data:
Data data_name; /* name of the dataset */
Input x y z; /* variables */
Datalines;
1 2 4
4 5 7
. . . Copy and paste data here
;
Run;
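II. Data can be read from a file on the computer, either with the path given directly in INFILE or through a FILENAME reference (both forms are shown):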
data data_name;
infile 'C:Avi soy.txt';
input GE ENV YIELD;
run;
filename avi 'C:Avi soy.txt';
data data_name;
infile avi;
input GE ENV YIELD;
run;
III. Data can be imported using the Import option under the File tab.
2. R
First, set the working directory to the folder that contains the data:
> getwd() # get current working directory
> setwd("C:/MyFolder") # set working directory
If the data is in text format, it can be read as:
I. mydata <- read.table("data_name.txt") # add header=TRUE if the first row holds column names
If the data is in CSV format, it can be read as:
II. mydata <- read.csv("data_name.csv")
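The two reading functions above can be tried without an existing data file. The following is a minimal sketch, assuming a small made-up dataset with the GE, ENV, and YIELD columns used in the SAS example; the temporary files and the values are hypothetical and stand in for real data.

```r
# Write a small, made-up text file to a temporary location (hypothetical data)
tmp_txt <- tempfile(fileext = ".txt")
writeLines(c("GE ENV YIELD",
             "1 1 2.5",
             "2 1 3.1",
             "1 2 2.9"), tmp_txt)

# Whitespace-separated text: header = TRUE tells R the first row holds names
mydata <- read.table(tmp_txt, header = TRUE)
str(mydata)

# Save the same data as CSV and read it back; read.csv assumes a header row
tmp_csv <- tempfile(fileext = ".csv")
write.csv(mydata, tmp_csv, row.names = FALSE)
mydata_csv <- read.csv(tmp_csv)
str(mydata_csv)
```

With real data, replace the temporary paths with the file name inside the working directory.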
2. Descriptive Statistics
1. SAS
Descriptive statistics in SAS can be computed using PROC UNIVARIATE:
PROC UNIVARIATE DATA=data_name;
VAR yield; /* variable to describe */
BY ENV; /* by group; the data must be sorted by ENV first */
Histogram;
RUN;
Other options are PROC FREQ (frequency tables) and PROC MEANS (summary statistics, optionally by class):
PROC FREQ DATA=data_name;
TABLES yield ;
RUN;
PROC MEANS DATA=data_name;
CLASS GE;
VAR yield;
RUN;
2. R
Descriptive statistics in R can be computed in the following ways:
summary(mydata)
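Some R packages can also be loaded and used for DS, such as Hmisc, pastecs, and psych.
library(Hmisc)
Hmisc::describe(mydata)
library(pastecs)
stat.desc(mydata)
library(psych)
psych::describe(mydata) # Hmisc and psych both define describe(), so the prefix avoids ambiguity
Note: Install these packages first, for example with install.packages("Hmisc").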
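Grouped summaries similar to BY ENV or CLASS GE in the SAS examples can also be obtained with base R alone. The following is a minimal sketch, assuming a made-up data frame with a grouping column ENV and a response yield; the values are hypothetical.

```r
# Hypothetical data: yield measured in two environments
mydata <- data.frame(
  ENV   = factor(rep(c("E1", "E2"), each = 3)),
  yield = c(10, 12, 14, 20, 22, 24)
)

# Overall descriptive statistics for one variable
overall <- c(n      = length(mydata$yield),
             mean   = mean(mydata$yield),
             sd     = sd(mydata$yield),
             median = median(mydata$yield),
             min    = min(mydata$yield),
             max    = max(mydata$yield))
print(overall)

# Statistics by group (comparable to BY ENV or CLASS in SAS)
by_env_mean <- tapply(mydata$yield, mydata$ENV, mean)
by_env_sd   <- tapply(mydata$yield, mydata$ENV, sd)
print(by_env_mean)
print(by_env_sd)

# Frequency table of the grouping variable (comparable to PROC FREQ)
freq <- table(mydata$ENV)
print(freq)
```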
3. Correlation and Covariance
Correlation can be computed by different methods, such as Pearson, Spearman, or Kendall.
1. SAS
Correlation and covariance in SAS for different methods:
proc corr cov data=data_name pearson spearman kendall hoeffding plots=all;
var x y z;
run;
2. R
Correlation and covariance in R for different methods:
cor(mydata, use = "complete.obs", method = "pearson")
cov(mydata, use = "complete.obs", method = "pearson")
Note: Here mydata is a numeric data frame. The method can be changed to "spearman" or "kendall".
Packages such as Hmisc also provide correlation functions:
library(Hmisc)
rcorr(as.matrix(mydata), type="pearson") # rcorr expects a numeric matrix; type can be pearson or spearman
Correlation between two variables x and y:
cor(x,y)
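The calls above can be tried on a small numeric data frame. The following is a minimal sketch with made-up values for three hypothetical variables x, y, and z; cor.test() from base R is added here as an assumption about what a reader may want next, since it also reports a p-value for a single pair.

```r
# Hypothetical numeric data frame with three variables
mydata <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.3),
  z = c(9, 7, 8, 4, 3, 1)
)

# Correlation matrices for two methods; the method name must be a quoted string
r_pearson  <- cor(mydata, use = "complete.obs", method = "pearson")
r_spearman <- cor(mydata, use = "complete.obs", method = "spearman")
print(round(r_pearson, 3))
print(round(r_spearman, 3))

# Covariance matrix (Pearson by default)
v <- cov(mydata, use = "complete.obs")
print(round(v, 3))

# Correlation between two variables, with a significance test
ct <- cor.test(mydata$x, mydata$y)
print(ct$estimate)
print(ct$p.value)
```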
4. Analysis of Variance (ANOVA)
1. SAS
PROC ANOVA, PROC GLM, PROC MIXED, PROC GLIMMIX, and PROC HPMIXED can be used for ANOVA.
PROC ANOVA is for balanced data (no missing observations).
PROC GLM is for fixed-effect factors and also handles unbalanced data.
PROC MIXED, PROC GLIMMIX, and PROC HPMIXED are for models with both random and fixed-effect factors.
Proc GLM data=data_name; /* or Proc ANOVA for balanced data */
Class factorvar;
Model responsevar = factorvar;
MEANS factorvar / BON T LSD TUKEY;
Run;
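Proc MIXED data=data_name; /* or Proc GLIMMIX or Proc HPMIXED */
Class A B;
Model responsevar = A; /* A is the fixed-effect factor */
Random B; /* B is the random-effect factor */
lsmeans A / adjust=TUKEY; /* adjust= takes a single method, e.g. BON, T, or TUKEY */
Run;
Note: Select one procedure depending on your dataset.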
2. R
Analysis of variance (ANOVA) can be computed in R using:
x <- aov(responsevar ~ factorvar, data=mydata) # CRD (completely randomized design)
x <- aov(y ~ A + B, data=mydata) # RCBD (A = treatment, B = block)
x <- aov(y ~ A + B + A:B, data=mydata) # factorial design
summary(x)
Multiple comparisons:
TukeyHSD(x)
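The RCBD model and the Tukey comparison above can be run end to end on a small dataset. The following is a minimal sketch, assuming made-up data with three treatment levels (A) in four blocks (B); the variable names follow the formulas above and the values are hypothetical. Both A and B are stored as factors, which aov() and TukeyHSD() need for categorical effects.

```r
# Hypothetical RCBD-style data: 3 treatments (A) in 4 blocks (B)
mydata <- data.frame(
  A = factor(rep(c("a1", "a2", "a3"), each = 4)),
  B = factor(rep(c("b1", "b2", "b3", "b4"), times = 3)),
  y = c(10, 12, 11, 13, 14, 15, 16, 15, 20, 19, 21, 22)
)

# Fit the additive model: treatment plus block
fit <- aov(y ~ A + B, data = mydata)

# ANOVA table (degrees of freedom, sums of squares, F tests)
anova_table <- summary(fit)[[1]]
print(anova_table)

# Tukey multiple comparisons for the treatment factor only
tukey_A <- TukeyHSD(fit, which = "A")
print(tukey_A)
```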
5. Regression and Multiple Regression
1. SAS
Simple Regression
Proc reg data=data_name;
model response_var = factor_var1;
run;
Multiple Regression
Proc reg data=data_name;
model response_var = factor_var1 factor_var2 factor_var3 factor_var4 ;
run;
2. R
Multiple Linear Regression
fit <- lm(y ~ x1 + x2 + x3, data=mydata)
summary(fit) # show results
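A fitted model can be checked against data whose true coefficients are known. The following is a minimal sketch, assuming made-up predictors x1 and x2 and a response built as y = 2 + 3*x1 - x2 plus a small fixed disturbance; all values are hypothetical, and only two predictors are used to keep the example short.

```r
# Hypothetical data: y is built from known coefficients plus a small disturbance
x1 <- 1:8
x2 <- c(5, 3, 8, 1, 7, 2, 6, 4)
e  <- c(0.1, -0.1, 0.1, -0.1, -0.1, 0.1, -0.1, 0.1)
mydata <- data.frame(x1 = x1, x2 = x2, y = 2 + 3 * x1 - x2 + e)

# Multiple linear regression
fit <- lm(y ~ x1 + x2, data = mydata)
summary(fit)   # coefficients, standard errors, R-squared

# Estimated coefficients should be close to 2, 3, and -1
est <- coef(fit)
print(est)

# Predict the response for a new (hypothetical) observation
new_obs <- data.frame(x1 = 4.5, x2 = 4.5)
pred <- predict(fit, newdata = new_obs)
print(pred)
```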
