
SAS Summary Guide
School of Applied Statistics
November, 03

Contents

1. Introduction
   1.1 Structure of a SAS Job
   1.2 SAS Language
   1.3 SAS Variables
   1.4 SAS Data Sets
2. Introduction to the DATA Step
   2.1 DATA Statement
   2.2 Sources of Input
   2.3 Input of Raw Data
   2.4 Formats: Input and Output
   2.5 How SAS Executes a DATA Step
   2.6 Transformation of Data
   2.7 Missing Values
   2.8 Modifying an Existing SAS Data Set
   2.9 Output from a SAS DATA Step
   2.10 Output to Create Stored ASCII Files
3. Introduction to the PROC Step
4. Basic Procedures
5. More on the DATA Step
   5.1 IF - THEN - ELSE Statements
   5.2 Selecting Observations
   5.3 DO and END Statements
   5.4 DO Loops
   5.5 Arrays
   5.6 RETAIN
   5.7 DROP and KEEP
   5.8 RENAME and LABEL
6. Data Management
   6.1 SET
   6.2 MERGE
   6.3 UPDATE
7. Statistical Procedures
8. Graphical Procedures
9. Output Delivery System (ODS)
10. Further Facilities
11. Publications

1. Introduction

This handout is meant as a brief introduction to the syntax of the SAS package, which is available on UNIX workstations and PC computers at The University of Reading. The SAS language is similar for all versions, but there are differences in file access and storage. This document gives a brief synopsis of many basic commands used in the DATA step and of the general structure of some statistical procedures (PROCs). It is by no means complete, and there are numerous specialised manuals published by SAS Institute (some of which are in Room G16 in the School of Applied Statistics).

1.1 Structure of a SAS Job

A SAS program consists of a sequence of one or more steps, and each step may contain several SAS statements. There are two kinds of step:-

• The DATA step, which is used to create and manipulate SAS data sets
• The PROC step, which is used for analysing or processing SAS data sets

A SAS job is made up of any number of these steps. The beginning of one step signifies the end of the previous step; a RUN statement can also be used to end a step explicitly.

1.2 SAS Language

SAS statements can begin in any column of a line and can be continued on subsequent lines. Each SAS statement must end with a semicolon. SAS statements are not case-sensitive (i.e. upper and lower case can be freely mixed in keywords and names), although the values of character data are: 'Male' and 'MALE' are different values.

There are three types of SAS statements:-

• Statements which appear in the DATA step
• Statements which appear in the PROC step
• Statements which can appear anywhere (global statements)

Comments can also be included in a SAS program; these are useful for annotating your program. An asterisk is used to comment out a single statement (the comment ends at the next semicolon), e.g.

    * This is a comment ;

To comment out a block of lines, use the /* and */ delimiter pairs, e.g.

    /* This is a comment which will not be acted upon by SAS */

1.3 SAS Variables

There are two types of SAS variable: numeric and character.
They can have the following attributes:-

    LENGTH     numeric variables 3 - 8 bytes (2 - 8 on IBM mainframes);
               character variables 1 - 32,767 bytes/characters (1 - 200 before Version 7)
    INFORMAT   format SAS uses to read a data value into a variable
    FORMAT     format SAS uses to write each value of a variable
    LABEL      descriptive label of up to 256 characters
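The four attributes can all be set explicitly in a DATA step. The following is a minimal sketch, assuming a made-up data set WORK.PEOPLE with hypothetical variables NAME and DOB: LENGTH fixes the storage length of NAME, INFORMAT tells SAS how to read the date, FORMAT how to print it, and LABEL attaches a description. PROC CONTENTS then lists the attributes SAS has stored with the data set.

```sas
/* Hypothetical example: WORK.PEOPLE, NAME and DOB are illustrative only */
DATA WORK.PEOPLE;
  LENGTH NAME $ 20;             /* character variable stored in 20 bytes        */
  INFORMAT DOB DATE9.;          /* read values such as 03NOV1960 as SAS dates    */
  FORMAT DOB DATE9.;            /* print them back in the same form              */
  LABEL DOB = 'Date of birth';  /* descriptive label used in procedure output    */
  INPUT NAME $ DOB;             /* list input uses the informat assigned above   */
  DATALINES;
Smith 03NOV1960
Jones 15MAY1975
;
RUN;

/* List the variable attributes stored with the data set */
PROC CONTENTS DATA=WORK.PEOPLE;
RUN;
```

DOB is stored as a number (the count of days since 1 January 1960); the DATE9. format only changes how it is displayed.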

1.4 SAS Data Sets

A SAS data set is a collection of data values arranged in a rectangular table, with the rows representing observations and the columns representing variables. Each variable must be given a name of 1 - 32 characters. The name must start with a letter or an underscore and can contain only letters, digits and underscores; avoid special characters such as . or $ in variable names. Special variables within SAS are denoted by names that begin and end with an underscore (e.g. _N_).

SAS data sets can be either temporary or permanent. Temporary data sets are given a one-level name by the user, which is automatically prefixed with WORK. by the SAS system. The name can be omitted altogether, in which case SAS names the data sets DATA1, DATA2, ... for the 1st, 2nd, ... data sets defined. Temporary data sets are erased on leaving the current SAS session. Permanent data sets must be given a two-level name by the user; the first level (the libref) is linked to their storage location by a LIBNAME statement, e.g.

    LIBNAME PERM 'complete_pathname';
    PROC PRINT DATA=PERM.STUDENTS;
    RUN;

Permanent SAS data sets are stored differently in different versions and are given different file extensions. However, all data sets are upward compatible. Several words should not be used as the first level of a SAS data set name (the libref). These include words such as PRINT, EXEC and DATA, and also SAS reserved names such as LIBRARY, MAPS and WORK.

SAS automatically documents a permanent data set to include a data set label, variable attributes and history information. The data are stored in the form in which SAS uses them, which saves computer time and makes it unnecessary to execute INPUT statements each time the data set is used.

2. Introduction to the DATA Step

2.1 DATA Statement

The DATA statement signals the beginning of the DATA step and gives a name to the SAS data set being created. This SAS data set can be used as input to any subsequent DATA or PROC steps. For example:-

a) DATA PERM.PATIENTS;    creates a permanent data set
b) DATA SCHOOL;           creates a temporary data set
c) DATA;                  creates a temporary data set with the default name DATAn
d) DATA _NULL_;           does not create a data set

2.2 Sources of Input

a) The DATALINES or CARDS statement is used when the data are in the same file as the SAS statements:-

    DATA REGRESS;
      INPUT X Y Z;
      DATALINES;
    61 44 29
    17  6 43
    ;

The lines of data follow the DATALINES statement (as many as are needed), and a line containing only a semicolon marks the end of the data.

b) The INFILE statement is used to read data from an external file on your work disk:-

    DATA REGRESS;
      INFILE 'file_identifier';
      INPUT X Y Z;

The file identifier in the INFILE statement is the full pathname and filename of the external data file, residing on your disk, which is to be linked to your SAS program. It must be enclosed in quotes; alternatively, a fileref defined in a FILENAME statement can be given without quotes.

2.3 Input of Raw Data

The INPUT statement is used to describe the raw input data. There are three types of input mode, which can be mixed in one INPUT statement:-

• LIST (or free-field)
• COLUMN
• FORMATTED

a) LIST INPUT

This mode of input simply lists the variables in the order in which they appear in the input data, e.g.

    INPUT NAME $ AGE SEX $;
    INPUT NAME $ Q1-Q32;

where $ is used after a variable name to indicate a character variable, whose value has a default length of 8 and contains no embedded blanks. Values must be separated by at least one space (free format).

b) COLUMN INPUT

With this mode of input the columns within which each variable value is located are specified, e.g.

    INPUT CANNAME $ 1-15 PARTY $ 20-24 VOTES 30-40;

The data values can be read in any order and blank fields are automatically set to missing. Embedded blanks are allowed in character values, since the columns specified give the maximum length of each value.

c) FORMATTED INPUT

This is a very flexible method of input, as it is possible to read data in virtually any form. In formatted input each variable name is followed by an informat (for example 3. or $10.) giving the width and type of the field. SAS keeps track of its position on the input lines with a pointer, which can be moved with pointer controls, e.g.

    INPUT @3 QUEST3 +10 QUEST12 / @60 RESPONSE;

There are various types of pointer controls, each having a different meaning. Listed below are some of the more frequently used ones:-

    @n    move pointer to column n
    +n    move the pointer forward n columns
    #n    move pointer to line n
    /     move to next line

Whichever mode of input is used, the following line-hold specifiers, placed at the end of an INPUT statement, can be used to hold the current input line:-

    @     hold data line for the next INPUT statement in the current DATA step
    @@    hold data line for further executions of the DATA step

2.4 Formats: Input and Output

A set of directions for reading a value is called an INFORMAT and a set of directions for printing a value is called a FORMAT. Formats can be specified for numeric and character variables and also for date and time variables. There are a large number of FORMAT and INFORMAT specifications; refer to SAS Language Reference Version 8 for further information.

2.5 How SAS Executes a DATA Step

A DATA step is executed once for each observation (or input record) read. A DATA step that does not contain an INPUT, SET, MERGE or UPDATE statement is executed once. The SAS variable _N_ is automatically generated for each DATA step; its value is the number of times that SAS has begun executing the step. _N_ is not written to the data set being created, so it is not available outside the current DATA step. All variables referred to in the DATA step, for example the variables named in the INPUT statement and any new variables generated, make up the program data vector.

For each execution of the DATA step:-

• The program data vector is initialised to missing.
• The data values of the current observation are read using the INPUT statement. Any new variables are computed and added to the program data vector, and any variables not wanted are dropped.
• The values in the program data vector are then added to the data set being created.

2.6 Transformation of Data

There is a range of standard functions available in SAS for transforming data. For a full list of these functions consult the SAS Language Reference. Manipulation and transformation of data are carried out in the DATA step, with the resulting variable being added to the data set automatically, e.g.
    SUM = X + X;
    X2 = X * X;     or     X2 = X**2;
    LX = LOG(X);

2.7 Missing Values

Missing values on input are specified in SAS by a full stop or a blank field (a blank field counts as missing only in column or formatted input; in list input a missing value must be entered as a full stop). On output, missing numeric values are displayed as a full stop and missing character values as a blank field. For numeric variables it is also possible to specify up to 27 special missing value symbols (A - Z and _) to distinguish between different kinds of missing data. Special missing values can be assigned in program statements (e.g. X = .A), and the MISSING statement declares the letters that are to be read as special missing values when they appear in the raw data:-
    DATA;
      INPUT X;
      MISSING A B;
      IF X = 99 THEN X = .A;
      IF X = 999 THEN X = .B;
      CARDS;

a) .A (with the full stop) is used to distinguish the special missing value from the variable name A.
b) A variable is set to missing if the input field contains only a full stop or is blank.
c) A variable is set to missing if the input field contains an illegal character (SAS also writes a note to the log).

2.8 Modifying an Existing SAS Data Set

Once data have been read into a SAS data set, they can be modified in other DATA steps, leaving the original data set unchanged and without having to re-input the data from the raw data file. This is easily done by transferring data from the existing SAS data set into another one, e.g.

    DATA NEW;
      SET PERM.PATIENTS;
      DOSE = PILL_A * QTY_A;

Each time the SET statement is executed another observation is transferred from the existing SAS data set PERM.PATIENTS to the SAS data set being created, called NEW.

2.9 Output from a SAS DATA Step

OUTPUT statements allow you to control when an observation is written to one of the SAS data sets currently being created, e.g.

    OUTPUT;
    OUTPUT MISSDATA;

When an OUTPUT statement is executed, SAS immediately writes the current values to the named SAS data set or, if no name is given, to every data set named in the DATA statement. If a DATA step contains an OUTPUT statement, SAS no longer writes an observation automatically at the end of each execution of the step. OUTPUT statements are useful for:-

a) Creating 2 or more observations from 1 record of input data
b) Combining several observations into one observation
c) Creating more than one SAS data set from one input file, e.g.

    DATA HARV1 HARV2;
      SET COMPLETE;
      IF HARVEST = 1 THEN OUTPUT HARV1;
      IF HARVEST = 2 THEN OUTPUT HARV2;
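Item a) can be sketched as follows, assuming a hypothetical file layout with one record per subject holding three scores. Three OUTPUT statements write three observations from each record; because the step contains OUTPUT statements, SAS does not write a further observation automatically at the end of each execution. The data set WORK.LONG and the variables ID, VISIT and SCORE are illustrative names.

```sas
/* Hypothetical example: one input record per ID, three scores per record */
DATA WORK.LONG (KEEP = ID VISIT SCORE);
  INPUT ID SCORE1 SCORE2 SCORE3;
  VISIT = 1; SCORE = SCORE1; OUTPUT;   /* first observation from this record */
  VISIT = 2; SCORE = SCORE2; OUTPUT;   /* second observation                 */
  VISIT = 3; SCORE = SCORE3; OUTPUT;   /* third observation                  */
  DATALINES;
101 12 15 19
102 10 11 14
;
RUN;

PROC PRINT DATA=WORK.LONG;
RUN;
```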

2.10 Output to Create Stored ASCII Files

The FILE and PUT statements are used within a DATA step and are analogous to the INFILE and INPUT statements. The FILE statement links SAS to a specific external file, while the PUT statement specifies the output record format, e.g.

    DATA CREATE;
      SET CLASSNO;
      FILE 'file_identifier';
      PUT NAME $ 1-8 SEX $ 11 AGE 13-14;

If no new SAS data set is needed, DATA _NULL_; can be used in place of DATA CREATE;.

3. Introduction to the PROC Step

Some of the procedures available in SAS are:-

    Basics:      CHART, CONTENTS, CORR, DATASETS, FORMAT, FREQ, MEANS, PLOT, PRINT,
                 SORT, SUMMARY, TABULATE, TRANSPOSE, UNIVARIATE
    Statistics:  ANOVA, CANCORR, CANDISC, CLUSTER, DISCRIM, FACTOR, GLM, PRINCOMP, REG, TTEST
    Graph:       GCHART, GCONTOUR, GMAP, GPLOT, GSLIDE, G3D, G3GRID

SAS procedures analyse and process SAS data sets as follows:-

a) Read SAS data sets
b) Perform the requested task
c) Print results
d) Create SAS output data sets (optional)

Most SAS procedures have default option settings for the more common situations or analyses. However, information can be given to the PROC step to specify:-

a) Which data set to process
b) Which variables to process
c) Whether to process the data in subsets

The PROC statement is used to begin a procedure, e.g.

    PROC MEANS DATA=PERM.PATIENTS MEAN STD;

Some of the more commonly used statements within the PROC step are:-

a) General statements common to many procedures

    VAR      Specifies variables to be analysed
    ID       Specifies a variable whose values identify observations in the SAS data set
    BY       Specifies that the data set is to be processed in groups.
             N.B. The data set must already have been sorted by the BY variables.
    WEIGHT   Specifies a variable whose values are the relative weights for the observations
    WHERE    Subsets the observations to be analysed based on specified criteria

b) Statements specific to individual procedures

    TABLES   Table request in PROC FREQ
    PLOT     Plot request in PROC PLOT
    MODEL    Model specification in PROC ANOVA, PROC GLM, PROC REG etc.

c) Statements describing variable attributes

    FORMAT   Specifies formats for printing variable values
    LABEL    Associates descriptive labels with variable names

Lists of names can be abbreviated:-

    a) Range of variables (in data set order)    VAR SEX -- TEMP;
    b) Numeric suffix range                        VAR Q1 - Q20;
    c) Range of numeric variables only             VAR AGE-NUMERIC-TEMP;
    d) Range of character variables only           VAR NAME-CHARACTER-SEX;
    e) All numeric variables                       VAR _NUMERIC_;
    f) All character variables                     VAR _CHARACTER_;

4. Basic Procedures

PROC CHART

This procedure produces horizontal and vertical bar charts, pie charts, star charts and block charts for numeric and character variables. The charts can represent frequencies and cumulative frequencies, percentages and cumulative percentages, sums and means.

    PROC CHART DATA = data_set_name options ;
      HBAR variable_list ;     /* horizontal bar chart */
      VBAR variable_list ;     /* vertical bar chart   */
      PIE variable_list ;      /* pie chart            */
      STAR variable_list ;     /* star chart           */
      BLOCK variable_list ;    /* block chart          */
      BY variable_list ;
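A minimal PROC CHART sketch, using a small made-up data set WORK.CROPS (VARIETY and YIELD are hypothetical variables). The VBAR request charts frequencies, while the SUMVAR= and TYPE=MEAN options on the HBAR request chart the mean of YIELD for each variety instead.

```sas
/* Hypothetical data: yields for three varieties */
DATA WORK.CROPS;
  INPUT VARIETY $ YIELD;
  DATALINES;
A 4.1
A 4.5
B 3.2
B 3.8
C 5.0
;
RUN;

PROC CHART DATA=WORK.CROPS;
  VBAR VARIETY;                           /* frequency of each variety   */
  HBAR VARIETY / SUMVAR=YIELD TYPE=MEAN;  /* mean YIELD for each variety */
RUN;
```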

PROC CORR

This procedure computes correlation coefficients between variables. Various univariate statistics are also computed.

    PROC CORR DATA = data_set_name options ;
      VAR variable_list ;
      WITH variable_list ;
      WEIGHT variable ;
      FREQ variable ;
      BY variable_list ;

PROC FORMAT

This procedure is used to define formats that specify labels for variable values on output. Formats can be defined for either numeric or character variables. They can be used in PUT statements in a DATA step and in FORMAT statements in a PROC step. They can also be used in FORMAT statements in a DATA step, in which case the format is permanently associated with the variable in the data set being created, unless changed.

    PROC FORMAT options ;
      VALUE format_name value1 = 'label1'
                        value2 = 'label2'
                        ...
                        valuen = 'labeln' ;

    format_name   Must be a unique SAS name, which must begin with a $ for character variables
    values        Can be a single value, a range of numbers, or several numeric or character
                  values separated by commas
    labels        Can contain a maximum of 40 characters in Version 6 (32,767 in later versions)
                  and should be enclosed in quotes

e.g.

    PROC FORMAT;
      VALUE $SEXFMT 'M' = 'Male'
                    'F' = 'Female';
      VALUE AGEFMT  1 - 16    = 'Child'
                    17 - HIGH = 'Adult';

The formats defined above can be used in other procedures as follows:-

    PROC PRINT DATA = PERM.PATIENTS;
      VAR SEX AGE;
      FORMAT SEX $SEXFMT. AGE AGEFMT. ;

N.B. The full stop after SEXFMT and AGEFMT is essential.

PROC FREQ

This procedure produces 1-way to n-way frequency tables of character and numeric variables.

    PROC FREQ DATA = data_set_name options ;
      WEIGHT weighting_variable ;
      BY variable_list ;
      TABLES table_request / options ;

In the TABLES specification the values of the last variable form the columns and the values of the second-last variable form the rows, e.g.

    TABLES VAR1;            /* one-way table */
    TABLES VAR1 * VAR2;     /* two-way table */

PROC MEANS

This procedure is used to produce simple univariate statistics for numeric variables. The options available allow you to specify which statistics you want calculated, e.g. mean, standard deviation, minimum. If no statistics are requested in the PROC MEANS statement, then the variable name, N, mean, standard deviation, minimum and maximum are printed automatically.

    PROC MEANS DATA = data_set_name options ;
      BY variable_list ;
      VAR variable_list ;
      ID variable_list ;
      FREQ variable ;
      WEIGHT weighting_variable ;
      OUTPUT OUT = output_data_set_name statistics ;
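The sketch below uses a small made-up data set (WORK.CROPS, with hypothetical variables VARIETY and YIELD). It sorts the data for BY processing, requests N, MEAN and STD in the PROC statement, and saves the group means and standard deviations in an output data set; MEANYLD and SDYLD are names chosen for illustration.

```sas
/* Hypothetical data, not yet sorted */
DATA WORK.CROPS;
  INPUT VARIETY $ YIELD;
  DATALINES;
A 4.1
B 3.2
A 4.5
C 5.0
B 3.8
;
RUN;

/* BY processing needs the data sorted by VARIETY */
PROC SORT DATA=WORK.CROPS;
  BY VARIETY;
RUN;

PROC MEANS DATA=WORK.CROPS N MEAN STD;
  BY VARIETY;
  VAR YIELD;
  OUTPUT OUT=WORK.YIELDSTATS MEAN=MEANYLD STD=SDYLD;  /* one row per BY group */
RUN;
```

Besides the requested statistics, the output data set contains the automatic variables _TYPE_ and _FREQ_. Variety C has a single observation, so its standard deviation is missing.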

PROC PLOT

This procedure produces line-printer plots for both numeric and character variables. Various options are available for specifying the plotting symbol, scaling the axes, drawing reference lines, superimposing 2 or more plots and drawing contour plots.

    PROC PLOT DATA = data_set_name options ;
      PLOT vertical_variable * horizontal_variable / options ;
      BY variable_list ;

PROC PRINT

This procedure prints the values in a SAS data set.

    PROC PRINT DATA = data_set_name options ;
      BY variable_list ;
      VAR variable_list ;
      ID variable_list ;
      PAGEBY variable ;
      SUM variable_list ;
      SUMBY variable ;

PROC SORT

This procedure rearranges the observations in an existing SAS data set, or creates a new data set containing the rearranged observations (if OUT= is omitted, the sorted data replace the original data set). Multiple sorting groups can be specified, and variables can be sorted in ascending or descending order.

    PROC SORT DATA = data_set_name OUT = output_data_set_name options ;
      BY variable_list ;

Variables are sorted in ascending order by default; for descending order, put DESCENDING before each variable name in the BY statement that is to be sorted in descending order (it applies only to the variable that immediately follows it). The SORT procedure should always be used when subsequent procedures process the data set in groups using the BY statement. It is possible to process a data set without sorting it beforehand by using the NOTSORTED option on the BY statement of the procedure being used. In this case SAS assumes only that consecutive observations with the same BY value are grouped together; the BY values need not be in alphabetic or numeric order.
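The following sketch (made-up data; WORK.CROPS, VARIETY and YIELD are hypothetical names) sorts by VARIETY in ascending order and, within each variety, by YIELD in descending order, so the first observation of each BY group has the highest yield. The DATA step then keeps those observations using the FIRST.VARIETY indicator, which SAS creates automatically when a BY statement accompanies a SET statement; this feature is not otherwise covered in this guide.

```sas
/* Hypothetical data */
DATA WORK.CROPS;
  INPUT VARIETY $ YIELD;
  DATALINES;
A 4.1
B 3.2
A 4.5
C 5.0
B 3.8
;
RUN;

PROC SORT DATA=WORK.CROPS OUT=WORK.SORTED;
  BY VARIETY DESCENDING YIELD;    /* DESCENDING applies to YIELD only */
RUN;

/* Keep the highest-yielding observation of each variety */
DATA WORK.BEST;
  SET WORK.SORTED;
  BY VARIETY;
  IF FIRST.VARIETY;               /* subsetting IF: first row of each group */
RUN;
```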

PROC SUMMARY

This procedure computes the same statistics as the MEANS procedure but stores them in a SAS data set: PROC SUMMARY does not produce any printed output unless the PRINT option is used. Subgroup statistics can be requested with the CLASS statement, in which case the data do not have to be sorted. An OUTPUT statement is needed to save the results, and any number of OUTPUT statements can be used. The VAR statement names the analysis variables (if it is omitted, only a count of the observations is produced) and is normally placed before the OUTPUT statements.

    PROC SUMMARY DATA = data_set_name options ;
      CLASS variable_list ;
      VAR variable_list ;
      BY variable_list ;
      FREQ variable ;
      WEIGHT weighting_variable ;
      ID variable_list ;
      OUTPUT OUT = output_data_set_name statistics ;

PROC TABULATE

This procedure provides a more flexible alternative to the FREQ procedure for producing tables. Each cell in the table contains a descriptive statistic, e.g. mean, standard deviation, etc. TABULATE generates the tables defined by the TABLE statement. Classification variables must be specified in the CLASS statement, while the variables to be tabulated (i.e. those whose values are summarised in the cells) must be specified in the VAR statement. Each expression in the TABLE statement defines the categories for one of the table's dimensions: page, row and column.

    PROC TABULATE DATA = data_set_name options ;
      CLASS variable_list ;
      VAR variable_list ;
      BY variable_list ;
      FREQ variable ;
      WEIGHT weighting_variable ;
      FORMAT variables_format ;
      LABEL variable = 'label' ;
      TABLE page_expression, row_expression, column_expression ;
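A minimal PROC SUMMARY sketch on deliberately unsorted, made-up data (WORK.CROPS, MEANYLD and NYLD are hypothetical names). The CLASS statement produces subgroup statistics without a prior sort. The output data set contains an overall row (_TYPE_ = 0) followed by one row per level of VARIETY (_TYPE_ = 1); adding the NWAY option to the PROC statement would keep only the class-level rows.

```sas
/* Hypothetical data, deliberately not sorted */
DATA WORK.CROPS;
  INPUT VARIETY $ YIELD;
  DATALINES;
C 5.0
A 4.1
B 3.2
A 4.5
B 3.8
;
RUN;

PROC SUMMARY DATA=WORK.CROPS;
  CLASS VARIETY;                            /* no PROC SORT needed */
  VAR YIELD;
  OUTPUT OUT=WORK.SUMM MEAN=MEANYLD N=NYLD;
RUN;

/* PROC SUMMARY prints nothing by default, so print the output data set */
PROC PRINT DATA=WORK.SUMM;
RUN;
```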

PROC TRANSPOSE

This procedure transposes data sets, changing observations into variables and variables into observations. An output data set is always created; if no name is given with the OUT= option, it is named according to the DATAn convention.

    PROC TRANSPOSE DATA = data_set_name options ;
      VAR variable_list ;
      ID variable ;
      IDLABEL variable ;
      COPY variable_list ;
      BY variable_list ;

5. More on the DATA Step

5.1 IF - THEN - ELSE Statements

These statements are used to execute a SAS statement conditionally on the value of an expression.

    IF expression THEN statement ;
    ELSE statement ;

The THEN statement is executed if the expression is true, i.e. non-zero and non-missing.
The ELSE statement is executed if the expression is false, i.e. zero or missing.

There are eight relational operators:-

    LT or <     LE or <=    GT or >     GE or >=
    NL or ~<    NG or ~>    EQ or =     NE or ~=

In addition there are three logical operators:-

    NOT or ~    AND or &    OR or |

e.g.

    DATA ;
      LENGTH SEX $ 6 ;
      IF CODE = 1 OR CODE = 2 THEN SEX = 'MALE' ;
      ELSE SEX = 'FEMALE' ;

Character constants must be enclosed in quotes. The LENGTH statement is needed because otherwise SEX would take its length (4) from 'MALE', the first value assigned to it, and 'FEMALE' would be truncated to 'FEMA'.

e.g.

    DATA ;
      INPUT AGE ;
      IF 0 < AGE < 10 THEN AGEGRP = 1 ;
      IF 10 <= AGE < 19 THEN AGEGRP = 2 ;
      IF AGE >= 19 THEN AGEGRP = 3 ;

Any observations with values not included in one of the categories (here, ages of 0 or below, or missing ages) will have a missing value for AGEGRP.

5.2 Selecting Observations

If not all observations are to be included in the data set being created, they can be excluded with the DELETE statement or the subsetting IF statement. The DELETE statement stops the processing of the current observation, which is then not written to the data set:-

    DATA MALES ;
      INPUT AGE SEX $ ;
      IF SEX = 'F' THEN DELETE ;

The subsetting IF statement allows an observation to pass only if the expression is true:-

    DATA MALES ;
      INPUT AGE SEX $ ;
      IF SEX = 'M' ;

The result from both of the above DATA steps is the same, provided SEX takes only the values M and F.

5.3 DO and END Statements

A DO statement specifies that the statements following it are to be executed as a group, up to the matching END statement, e.g.

    DATA ;
      INPUT AGE SEX $ FAMILY $ ;
      IF SEX = 'F' THEN DO ;
        AGE = AGE - 5 ;
        FAMILY = 'NEW' ;
      END ;
      ELSE AGE = AGE + 3 ;

5.4 DO Loops

DO loops allow a range of statements within a DATA step to be repeated either a specified number of times or while a specified condition holds.

    DO variable = start TO stop ;
    DO variable = start TO stop BY increment ;
    DO WHILE (expression) ;
    DO UNTIL (expression) ;
    DO OVER array_name ;

Each must have a matching END statement to close the loop, e.g.

    DO N = 1 TO 20 ;
    DO N = 1 TO 20 BY 4 ;
    DO WHILE (N < 20) ;
    DO UNTIL (N = 20) ;

DO WHILE evaluates its expression at the top of the loop, whereas DO UNTIL evaluates it at the bottom, so a DO UNTIL loop is always executed at least once.

5.5 Arrays

Arrays in SAS are useful for processing many SAS variables in the same way.

    ARRAY array_name [index_variable] array_elements ;

e.g.

    ARRAY A Q1 - Q5 ;
    DO OVER A ;
      A = LOG(A) ;
    END ;

Array elements are substituted for the array name in SAS statements, depending on the value of the index variable. SAS will use its own internal index variable if none is defined. In the example above the DO group is executed for every element in the array.

5.6 RETAIN

This statement retains the value of a variable from the previous execution of the DATA step. Normally, variables created by INPUT and assignment statements are set to missing before each execution of the DATA step. Initial values can also be assigned to the variables.

    RETAIN variable ;
    RETAIN variable initial_value ;

5.7 DROP and KEEP

The DROP statement excludes the named variables from a data set or analysis, and the KEEP statement includes only the named variables in a data set or analysis. Both can be used as statements in the DATA step, or as data set options written in parentheses after a data set name, for example in a DATA statement or on a PROC step.
e.g.

    DATA PERM.PATIENTS ;
      DROP PATNO ;

    DATA PERM.PATIENTS(DROP = PATNO) ;

    PROC PRINT DATA = PERM.PATIENTS(KEEP = AGE SEX) ;

5.8 RENAME and LABEL

The RENAME statement is used to rename variables.

    RENAME old_name = new_name ;

The LABEL statement assigns labels of up to 256 characters to variables.

    LABEL variable = 'label' ;

6. Data Management

6.1 SET

Reads observations from 1 or more SAS data sets and can interleave observations.

a) Subset the observations

    DATA FEMALES ;
      SET STUDENTS ;
      IF SEX = 'F' ;

b) Subset the variables

    DATA SMALL ;
      SET STUDENTS ;
      DROP WEIGHT AGE ;

c) Add a new variable

    DATA ADD ;
      SET STUDENTS ;
      WTKG = WEIGHT / 2.2 ;

d) Multiple output data sets

    DATA MALES FEMALES ;
      SET STUDENTS ;
      IF SEX = 'M' THEN OUTPUT MALES ;
      IF SEX = 'F' THEN OUTPUT FEMALES ;

e) Multiple input data sets (concatenate)

    DATA ALL ;
      SET MALES FEMALES ;

f) Multiple input data sets (interleave)

    DATA ALL ;
      SET MALES FEMALES ;
      BY NAME ;

For interleaving, each input data set must already be sorted by the BY variable.

6.2 MERGE

Combines observations from two or more SAS data sets and places them side by side.

a) One-to-one Merging

If there are the same number of observations in each data set and the observations are in the same order, they can be combined as shown below. The two data sets are placed side by side in the combined data set being created.

    DATA COUPLES ;
      MERGE HUSBANDS WIVES ;

For any variable name that appears in more than one of the data sets, only the values of that variable from the last-named data set will be saved.

b) Match Merging

The data sets, which must already be sorted by the BY variables, are combined side by side, matching observations that have the same values of the variables in the BY statement.

    DATA STABLE ;
      MERGE HORSE TRAINER ;
      BY OWNER ;

6.3 UPDATE

Updates a master data set with a transaction data set, where the BY variable is the key for matching observations.

    DATA SURGERY ;
      UPDATE SURGERY BLOODCT ;
      BY PATIENT ;

This is useful when there are several changes to be applied to a master data set all in one job. Both data sets must be sorted by the BY variable, and the master data set should contain only one observation for each BY value. Missing values in the transaction data set do not replace existing values in the master data set.
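The sketch below contrasts UPDATE with a match MERGE on the same two data sets. The data and the variable COUNT are hypothetical. A missing value in the transaction data set leaves the master value unchanged under UPDATE, whereas MERGE overwrites the master value with the missing value; patients with no transaction keep their master values in both cases.

```sas
/* Hypothetical master data set: one observation per patient, sorted by PATIENT */
DATA WORK.SURGERY;
  INPUT PATIENT COUNT;
  DATALINES;
1 5.0
2 6.1
3 4.8
;
RUN;

/* Hypothetical transactions: a full stop means "no new value" */
DATA WORK.BLOODCT;
  INPUT PATIENT COUNT;
  DATALINES;
1 .
2 6.5
;
RUN;

/* UPDATE: patient 1 keeps 5.0, patient 2 becomes 6.5 */
DATA WORK.UPDATED;
  UPDATE WORK.SURGERY WORK.BLOODCT;
  BY PATIENT;
RUN;

/* MERGE: the missing value for patient 1 overwrites 5.0 */
DATA WORK.MERGED;
  MERGE WORK.SURGERY WORK.BLOODCT;
  BY PATIENT;
RUN;
```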

7. Statistical Procedures

There is a wide range of statistical procedures available in SAS for carrying out such techniques as analysis of variance and covariance, linear and non-linear regression analysis, multivariate methods and non-parametric methods. A few examples of some of the more widely used procedures are given below. For more details on all the procedures available for statistical analysis, consult the appropriate manuals.

PROC ANOVA

This procedure is used to carry out an analysis of variance of balanced data (see also PROC GLM). Many of the statements which can be used with this procedure are not necessary for standard analyses.

Required statements (must appear in this order):-

    PROC ANOVA DATA = data_set_name options ;
      CLASS variable_list ;
      MODEL dependent_variables = effects / options ;

Statements that must appear before the first RUN statement:-

      BY variable_list ;
      ABSORB variable_list ;
      FREQ variable ;

Statements that can appear after the MODEL statement and can be used interactively:-

      MEANS effects / options ;
      TEST H = effects E = effect ;
      MANOVA H = effects E = effect M = equations / options ;
      REPEATED factor_names / options ;

e.g.

    PROC ANOVA DATA = EXPT ;
      CLASS METHOD VARIETY ;
      MODEL YIELD = METHOD VARIETY METHOD * VARIETY ;
      BY YEAR ;
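A self-contained version of the PROC ANOVA example might look like this, assuming made-up data for a balanced design (2 methods by 2 varieties, 2 replicates per cell). There is no YEAR variable, so the BY statement is omitted. The data are read with the @@ line-hold specifier described in section 2.3, and the MEANS statement asks for Tukey comparisons of the main-effect means.

```sas
/* Hypothetical balanced data: 2 methods x 2 varieties x 2 replicates */
DATA WORK.EXPT;
  INPUT METHOD $ VARIETY $ YIELD @@;   /* @@: several observations per line */
  DATALINES;
M1 V1 10 M1 V1 12 M1 V2 14 M1 V2 15
M2 V1 11 M2 V1 13 M2 V2 18 M2 V2 20
;
RUN;

PROC ANOVA DATA=WORK.EXPT;
  CLASS METHOD VARIETY;
  MODEL YIELD = METHOD VARIETY METHOD*VARIETY;
  MEANS METHOD VARIETY / TUKEY;        /* interactive statement after MODEL */
RUN;
QUIT;                                  /* ends the interactive procedure */
```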

PROC GLM

This procedure can be used to fit general linear models to data, so that statistical methods such as analysis of variance, analysis of covariance, regression analysis (including comparison of regressions) and multivariate analysis of variance can be carried out. Unbalanced data and data with missing values can also be analysed using this procedure. There are numerous statements and options available with this procedure, but most applications use only a few of them.

Statements that must precede the MODEL statement:-

    PROC GLM DATA = data_set_name options ;
      CLASS variable_list ;

Required statement:-

      MODEL dependent_variables = independent_variables / options ;

Statements that must appear before the first RUN statement:-

      ABSORB variable_list ;
      BY variable_list ;
      FREQ variable ;
      ID variable_list ;
      WEIGHT weighting_variable ;

Statements that can appear after the MODEL statement and can be used interactively:-

      CONTRAST 'label' effect_values / options ;
      ESTIMATE 'name' effect_values / options ;
      LSMEANS effects / options ;
      MANOVA H = effects E = effect M = equations / options ;
      MEANS effects / options ;
      OUTPUT OUT = output_data_set_name ;
      RANDOM effects / options ;
      REPEATED factor_names / options ;
      TEST H = effects E = effect / options ;

e.g.

    PROC GLM DATA = EXPT2 ;
      CLASS TREAT SUBJECT TIME ;
      MODEL RESP = TREAT SUBJECT(TREAT) TIME TREAT * TIME ;
      TEST H = TREAT E = SUBJECT(TREAT) ;
      LSMEANS TREAT TIME TREAT*TIME ;
      OUTPUT OUT = NEW P = RHAT R = RESID ;
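PROC GLM accepts unbalanced data. In this minimal sketch the made-up data have 3, 2 and 4 observations for treatments A, B and C (WORK.TRIAL, TREAT and RESP are hypothetical names). LSMEANS with the STDERR option gives least-squares means and their standard errors, and the OUTPUT statement saves predicted values and residuals under the illustrative names PHAT and RESID. With treatment as the only effect, each predicted value is the treatment mean and the residuals sum to zero.

```sas
/* Hypothetical unbalanced data */
DATA WORK.TRIAL;
  INPUT TREAT $ RESP @@;
  DATALINES;
A 5.1 A 4.8 A 5.5 B 6.2 B 6.0 C 4.0 C 4.4 C 4.1 C 3.9
;
RUN;

PROC GLM DATA=WORK.TRIAL;
  CLASS TREAT;
  MODEL RESP = TREAT;
  LSMEANS TREAT / STDERR;                   /* least-squares means and SEs */
  OUTPUT OUT=WORK.GLMOUT P=PHAT R=RESID;    /* predicted values, residuals */
RUN;
QUIT;
```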
PROC TTEST

This procedure carries out a simple t-test on the means of two groups of observations. The grouping variable specified in the CLASS statement must have exactly two levels.

Required statements:
   PROC TTEST DATA = data_set_name options ;
   CLASS variable ;
Optional statements:
   BY variable_list ;
   VAR variable_list ;

e.g.
   PROC TTEST DATA = EXPT5 ;
      CLASS SEX ;
      VAR SCORE ;
   RUN ;

PROC NLIN

This procedure is used to fit nonlinear regression models. You must specify the model to be fitted, the parameters to be estimated and initial guesses for them, and possibly the partial derivatives of the model with respect to each parameter. Some models are difficult to fit, and in these cases the initial guesses can be critical. There is no guarantee that the procedure will be able to fit the model successfully.

Required statements:
   PROC NLIN DATA = data_set_name options ;
   PARMS parameter = values ;
   MODEL dependent_variable = expression ;
Optional statements:
   BOUNDS expressions ;
   BY variable_list ;
   ID variable_list ;
   DER.parameter = expression ;
   OUTPUT OUT = output_data_set_name ;

e.g.
   PROC NLIN DATA = EXPT3 ;
      PARMS B0 = 0.5 B1 = 0.08 ;
      MODEL Y = B0*(1-EXP(-B1*X)) ;
      DER.B0 = 1-EXP(-B1*X) ;
      DER.B1 = B0*X*EXP(-B1*X) ;
   RUN ;
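To see the PROC NLIN example run end to end, the sketch below first generates a hypothetical data set from the same model with B0 = 10 and B1 = 0.2, adding an alternating perturbation of plus or minus 0.1 so that the fit is not exact. The starting values and the output data set name NLOUT are assumptions chosen for illustration; the DER statements supply the two partial derivatives dY/dB0 and dY/dB1.

```sas
/* Hypothetical data from Y = 10*(1 - EXP(-0.2*X)), perturbed by +/-0.1: */
/* +0.1 when X is odd, -0.1 when X is even.                              */
DATA EXPT3 ;
   DO X = 1 TO 20 ;
      Y = 10*(1 - EXP(-0.2*X)) + 0.1*(2*MOD(X, 2) - 1) ;
      OUTPUT ;
   END ;
RUN ;

PROC NLIN DATA = EXPT3 ;
   PARMS B0 = 8 B1 = 0.1 ;              /* initial guesses               */
   MODEL Y = B0*(1 - EXP(-B1*X)) ;
   DER.B0 = 1 - EXP(-B1*X) ;            /* partial derivative w.r.t. B0  */
   DER.B1 = B0*X*EXP(-B1*X) ;           /* partial derivative w.r.t. B1  */
   OUTPUT OUT = NLOUT P = YHAT R = RESID ;
RUN ;
```

If the procedure fails to converge on real data, trying several different sets of initial values in the PARMS statement is usually the first thing to check.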
PROC REG

This procedure is used to fit linear regression models. There are other regression procedures, such as RSQUARE, RSREG and STEPWISE, for selecting subsets of independent variables in a multiple regression analysis, fitting quadratic response surfaces and carrying out stepwise regression, respectively.

Required statements:
   PROC REG DATA = data_set_name options ;
   MODEL dependent_variables = independent_variables / options ;   (required for model fitting; can be used interactively)
Statements that must appear before the first RUN statement:
   VAR variable_list ;
   BY variable_list ;
   FREQ variable ;
   WEIGHT weighting_variable ;
   ID variable ;
Statements that can appear anywhere after a MODEL statement and can be used interactively:
   ADD variable_list ;
   DELETE variable_list ;
   MTEST equations ;
   OUTPUT OUT = output_data_set_name ;
   PLOT y_variate*x_variate ;
   REFIT ;
   RESTRICT equations ;
   REWEIGHT condition ;
   TEST equations ;

e.g.
   PROC REG DATA = EXPT4 ;
      MODEL POP = YEAR ;
      OUTPUT OUT = REGOUT P = EPOP R = RESID ;
   RUN ;
   QUIT ;

8. Graphical Procedures

The majority of procedures that produce high-quality, hard-copy graphical output work in the same way as those mentioned in section 4. Most have names beginning with the letter G, e.g. GCHART and GPLOT. Additional global statements allow the user to specify more precisely the axes, symbols, patterns etc. used to represent the data. This topic is beyond the scope of this Summary Guide, but information can be found in the two-volume SAS/GRAPH manual. The various versions of SAS access graphics devices for hard copy in different ways, so refer to the appropriate SAS Companion Guide for more complete information.
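The sketch below runs the PROC REG example on invented population figures and then uses PROC GPLOT to draw the observed values with the fitted line overlaid. It assumes SAS/GRAPH is available; the two SYMBOL statements are examples of the global statements mentioned above, and all data values are hypothetical.

```sas
/* Hypothetical population figures for 1991-2000. */
DATA EXPT4 ;
   INPUT YEAR POP @@ ;
   DATALINES ;
1991 52.1 1992 52.6 1993 53.0 1994 53.7 1995 54.1
1996 54.4 1997 55.0 1998 55.6 1999 55.9 2000 56.5
;
RUN ;

/* Fit a straight line; save predicted values (EPOP) and residuals (RESID). */
PROC REG DATA = EXPT4 ;
   MODEL POP = YEAR ;
   OUTPUT OUT = REGOUT P = EPOP R = RESID ;
RUN ;
QUIT ;

/* Plot observed points and the fitted line (requires SAS/GRAPH). */
SYMBOL1 VALUE = DOT  INTERPOL = NONE ;   /* observed POP: points only */
SYMBOL2 VALUE = NONE INTERPOL = JOIN ;   /* fitted EPOP: joined line  */
PROC GPLOT DATA = REGOUT ;
   PLOT POP*YEAR EPOP*YEAR / OVERLAY ;
RUN ;
QUIT ;
```

INTERPOL = JOIN connects points in data set order, which works here because the data are already in YEAR order.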
9. Output Delivery System (ODS)

Many procedures could already produce output data sets for use in further calculations, e.g. parameter estimates from a regression analysis. However, some of the more common procedures lacked this facility. Since Version 7, the Output Delivery System (ODS) has made it much simpler to save results as data sets, to produce formatted output for high-resolution printers, and to produce web-quality output in HTML.

ODS also gives more effective control over the output stream, and a greater choice of output objects can be saved as data sets.

ODS is a vast topic with many individual statements. Each statement in the following table has its own set of options, which are not shown here and are best described in the manual.

Table of ODS Statements

Statement      Purpose
ODS EXCLUDE    Specify output objects to exclude from ODS destinations.
ODS HTML       Open, manage or close the HTML destination. If the destination is open, you can create HTML output.
ODS LISTING    Open, manage or close the Listing destination.
ODS OUTPUT     Create a SAS data set from an output object and manage the selection and exclusion lists for the Output destination.
ODS PATH       Specify which locations to search for the definitions that were created by PROC TEMPLATE, as well as the order in which to search for them.
ODS PRINTER    Open, manage or close the Printer destination. If the destination is open, you can create Printer output.
ODS SELECT     Specify output objects for ODS destinations.
ODS SHOW       Write to the SAS log the specified selection or exclusion list.
ODS TRACE      Write to the SAS log a record of each output object that is created, or suppress the writing of this record.
ODS VERIFY     Print or suppress a warning that a style definition or a table definition that is used is not supplied by SAS Institute.
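As a hedged sketch of how several of these statements work together, the program below uses ODS TRACE to report in the log the names of the output objects PROC REG creates, ODS OUTPUT to save the object named ParameterEstimates as a data set, and ODS HTML to write the same results to a web page. The data values, the data set name PEST and the file name regout.html are all hypothetical.

```sas
/* Hypothetical data for a simple linear regression. */
DATA EXPT4 ;
   INPUT YEAR POP @@ ;
   DATALINES ;
1991 52.1 1992 52.6 1993 53.0 1994 53.7 1995 54.1
1996 54.4 1997 55.0 1998 55.6 1999 55.9 2000 56.5
;
RUN ;

ODS TRACE ON ;                           /* log each output object's name      */
ODS HTML FILE = 'regout.html' ;          /* hypothetical HTML output file      */
ODS OUTPUT ParameterEstimates = PEST ;   /* save this output object as PEST    */

PROC REG DATA = EXPT4 ;
   MODEL POP = YEAR ;
RUN ;
QUIT ;

ODS HTML CLOSE ;                         /* close the HTML destination          */
ODS TRACE OFF ;

/* PEST holds one row per model term: the intercept and YEAR. */
PROC PRINT DATA = PEST ;
RUN ;
```

The names written to the log by ODS TRACE are the ones to use in ODS OUTPUT, ODS SELECT and ODS EXCLUDE statements.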
10. Further Facilities

There are many more facilities in SAS in addition to those documented here. These include:
• A macro processing language.
• A full-screen editor (FSP) enabling data to be entered and updated. It also contains a spreadsheet facility.
• Interactive Matrix Language (IML): a very powerful module for programming matrix algebra, useful for statistical and mathematical applications.
• A time-series module (ETS) for carrying out econometric and time-series analysis.

11. Publications

There is a vast range of SAS manuals for both UNIX and PC versions. They can be ordered from:

SAS Software Ltd.
Wittington House
Henley Road
Medmenham
Marlow
SL7 2EB

The Main Library on campus has a few reference manuals for previous versions. In addition, users of SAS at The University of Reading can read the current documentation on-line by registering at http://v8doc.sas.com/sashtml/
