
Understanding SAS Data Step Processing

Learning
Base SAS,
Advanced SAS,
PROC SQL,
ODS,
SAS in financial industry,
Clinical trials,
SAS Macros,
SAS BI,
SAS on Unix,
SAS on Mainframe,
SAS interview Questions and Answers,
SAS Tips and Techniques,
SAS Resources,
SAS Certification questions...

visit http://sastechies.blogspot.com

Published in: Technology, Business

  1. SASTechies [email_address] http://www.sastechies.com
  2. <ul><li>You can use a DATA step to read raw data into a SAS data set from several sources:<ul><li>Instream data: CARDS or DATALINES with INPUT</li><li>External file: INFILE with INPUT</li><li>DBMS: SAS/ACCESS to a DBMS (Oracle, SQL Server, etc.)</li></ul></li></ul> 11/13/09 SAS Techies 2009
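The instream case can be sketched as follows. This is a minimal illustration, not the author's code: the data set name work.inventory is hypothetical, and the two records are borrowed from the Invent file used later in the deck.

```sas
/* Instream data: the records follow the DATALINES statement.        */
/* work.inventory is a hypothetical name chosen for this sketch.     */
data work.inventory;
    input Item $ 1-13 IDnum $ 15-19 InStock 21-22 BackOrd 24-25;
    datalines;
Bird Feeder   LG088  3 20
6 Glass Mugs  SB082  6 12
;
run;
```

Because the INPUT statement uses column input, the data lines must start in column 1 so that each field falls in the stated columns.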
  3. <ul><li>filename fileref 'C:\Temp\some.txt';</li><li>data readdata;</li><li>infile fileref;</li><li>input var1 $ var2;</li><li>run;</li><li>To read the raw data file, the program must give the following instructions to the SAS System:<ul><li>reference the external text file to be read (FILENAME)</li><li>name the SAS data set (DATA)</li><li>identify the external file (INFILE)</li><li>describe the data values to be read (INPUT).</li></ul></li></ul>
  4. <ul><li>During the compilation phase, each statement is scanned for syntax errors. Most syntax errors prevent further processing of the DATA step.</li><li>If the DATA step compiles successfully, the execution phase begins. A DATA step executes once for each observation in the input data set (or each record in a raw data file), unless otherwise directed.</li></ul>
  5. <ul><li>The input buffer, an area of memory, is created to hold one record from the external file. It is a logical concept.</li><li>Note: The input buffer is created only when raw data is read, not when a SAS data set is read.</li><li>Then the program data vector (PDV) is created. The PDV is the area of memory where SAS builds a data set, one observation at a time.</li></ul>
  6. <ul><li>Program Data Vector (PDV): a logical framework that the SAS System uses when creating SAS data sets.</li></ul>
  7. <ul><li>During the compilation phase, SAS also scans each statement in the DATA step, looking for syntax errors. Syntax errors include:<ul><li>missing or misspelled keywords</li><li>invalid variable names</li><li>missing or invalid punctuation</li><li>invalid options.</li></ul></li><li>Variable attributes such as length and type are determined the first time that a variable is encountered.</li></ul>
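A small sketch of why the "first time encountered" rule matters. The data set name work.types is hypothetical; the behaviour shown is the standard one for character variables created by assignment.

```sas
/* The first assignment fixes Type as character with length 5.        */
data work.types;
    Type = 'Fixed';      /* first reference: character, length 5      */
    Type = 'Variable';   /* truncated to 'Varia' to fit length 5      */
    put Type=;           /* writes Type=Varia to the log              */
run;
```

Placing a LENGTH statement (for example, length Type $ 8;) before the first assignment is one way to avoid the truncation.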
  8. Data Set Descriptor. The attributes of Total are determined by the expression in the assignment statement.
      data perm.update;
         infile invent;
         input Item $ 1-13 IDnum $ 15-19 InStock 21-22 BackOrd 24-25;
         Total=instock+backord;
      run;
      Data Set Name: PERM.UPDATE
      Member Type: DATA
      Engine: V8
      Created: 11:25 Friday, August 7, 1998
      Observations: 0
      Variables: 5
      Indexes: 0
      Observation Length: 30
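Descriptor information like the listing above can be displayed with PROC CONTENTS. The sketch below assumes that the libref perm has already been assigned and that perm.update exists.

```sas
/* Print the descriptor portion (name, engine, variables, lengths).   */
/* Assumes the libref PERM is assigned, e.g. by a LIBNAME statement.  */
proc contents data=perm.update;
run;
```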
  9. <ul><li>During execution, each record in the input file is processed, stored in the PDV, and then written to the new data set as an observation, unless otherwise directed.</li><li>The DATA step executes once for each record in the input file, unless otherwise directed.</li><li>At the beginning of the execution phase, the value of _N_ is 1. Because there are no data errors, the value of _ERROR_ is 0. The remaining variables are initialized to missing.</li><li>Next, the INFILE statement identifies the location of the raw data.</li></ul>
  10. <ul><li>When an INPUT statement begins to read data values from a record, it uses an input pointer to keep track of its position.</li><li>At the end of the DATA step, three default actions occur.</li><li>First, the values in the PDV are written to the SAS data set as an observation.</li></ul>
      data perm.update;
         infile invent;
         input Item $ 1-13 IDnum $ 15-19 Instock 21-22 BackOrd 24-25;
         Total=instock+backord;
      run;
      Raw Data File Invent
      ----+----1----+----2----+
      Bird Feeder   LG088  3 20
      6 Glass Mugs  SB082  6 12
      Glass Tray    BQ049 12  6
      Padded Hangrs MN256 15 20
      Jewelry Box   AJ498 23  0
      Red Apron     AQ072  9 12
      Crystal Vase  AQ672 27  0
      Picnic Basket LS930 21  0
      Brass Clock   AN910  2 10
  11. <ul><li>Next, control returns to the top of the DATA step. Then the variable values in the program data vector are reset to missing.</li><li>When reading raw data, SAS sets the value of each variable in the DATA step to missing at the beginning of each iteration, with these exceptions:<ul><li>variables named in a RETAIN statement</li><li>variables created in a sum statement</li><li>data elements in a _TEMPORARY_ array</li><li>any variables created with options in the FILE or INFILE statements</li><li>automatic variables.</li></ul></li></ul>
      SAS data set
      Item         IDnum  InStock  BackOrd  Total
      Bird Feeder  LG088        3       20     23
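The reset rule and one of its exceptions can be observed in the log with a sketch like the one below. It is illustrative only: the variable Running and the two instream records are assumptions made for this example.

```sas
/* PUT before INPUT shows the PDV at the top of each iteration.       */
data _null_;
    put 'Top of iteration: ' _n_= InStock= BackOrd= Total= Running=;
    input InStock BackOrd;
    Total = InStock + BackOrd;   /* reset to missing on each iteration */
    Running + Total;             /* sum statement: value is retained   */
    put 'After assignment: ' _n_= InStock= BackOrd= Total= Running=;
    datalines;
3 20
6 12
;
run;
```

At the top of the second iteration InStock, BackOrd, and Total are missing again, while Running still holds 23. The "Top of iteration" line is also written a third time, just before INPUT finds no more records and the step ends.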
  12. <ul><li>The execution phase continues in this manner until there are no more records in the raw data file to be read and the data portion of the new data set is complete.</li><li>At the end of the execution phase, the SAS log confirms that the raw data file was read and displays the number of observations and variables in the data set.</li></ul>
      SAS log
      NOTE: 9 records were read from the infile INVENT.
      NOTE: The data set PERM.UPDATE has 9 observations and 5 variables.
  13. <ul><li>When reading raw data, use the INFILE statement to indicate which file the data is in.</li><li>INFILE file-specification &lt;options&gt;;</li><li>Ex: infile fileref dlm=',' dsd missover lrecl= obs=</li><li>Common INFILE options: OBS=, PAD, LRECL=, END=, DLM=, DSD, EOF=, FILEVAR=, FIRSTOBS=, LENGTH=, LINESIZE=, MISSOVER, N=, _INFILE_=</li></ul>
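A hedged sketch of several of these options working together on a comma-separated file. The file name inventory.csv, the header row it is assumed to have, and the data set name work.inventory are all hypothetical.

```sas
/* Hypothetical CSV file with a header row and four fields per line.  */
filename invent 'inventory.csv';

data work.inventory;
    /* DLM=','   : fields are separated by commas                     */
    /* DSD       : two adjacent commas mean a missing value           */
    /* MISSOVER  : short records set remaining variables to missing   */
    /* FIRSTOBS=2: skip the header row                                */
    infile invent dlm=',' dsd missover firstobs=2;
    input Item :$13. IDnum :$5. InStock BackOrd;
run;
```

The colon modifier in the INPUT statement reads each delimited field with the given informat instead of a fixed column range.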
  14. <ul><li>INPUT variable &lt;$&gt; startcol-endcol . . . ;</li><li>where<ul><li>variable is the SAS name you assign to the field</li><li>the dollar sign ($) identifies the variable type as character (nothing appears here if the variable is numeric)</li><li>startcol represents the starting column location in the data line for this variable</li><li>endcol represents the ending column location in the data line for this variable</li></ul></li></ul>
  15. <ul><li>Start of Compilation Phase</li><li>When the SET statement is compiled, a slot is added to the program data vector for each variable in the input data set.</li></ul>
      data finance.duejan;
         set finance.loans;
         Interest=amount*(rate/12);
      run;
      SAS Data Set Finance.Loans
      Account   Amount    Rate  Months  Payment
      101-1092   22000  0.1000      60   467.43
      101-1731  114000  0.0950     360   958.57
      101-1289   10000  0.1050      36   325.02
      101-3144    3500  0.1050      12   308.52
  16. <ul><li>At the bottom of the DATA step (in this example, when the RUN statement is encountered), the compilation phase is complete and the descriptor portion of the new SAS data set is created.</li><li>The descriptor portion of the data set includes:<ul><li>name of the data set</li><li>number of observations and variables</li><li>names and attributes of the variables.</li></ul></li><li>Remember, _N_ and _ERROR_ are not written to the data set. There are no observations because the DATA step has not yet executed.</li></ul>
  17. <ul><li>During execution, each observation in the input data set is processed, stored in the program data vector, and then written to the new data set as an observation, unless otherwise directed.</li><li>The SET statement reads the first observation from the input data set and writes the values to the program data vector.</li></ul>
  18. <ul><li>At the end of the DATA step, three default actions occur.</li><li>First, the values in the program data vector are written to the new data set as the first observation.</li><li>Second, control returns to the top of the DATA step.</li><li>Third, SAS retains the values of variables that were read from a SAS data set with the SET statement, or that were created by a sum statement. All other variable values, such as the variable Interest, are set to missing.</li></ul>
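The retain behaviour for SET variables can be checked in the log with a sketch along these lines. It uses the sashelp.class sample data set that ships with SAS rather than Finance.Loans, and the variable Ratio is made up for the example.

```sas
/* PUT before SET shows what survives from the previous iteration.    */
data _null_;
    put 'Before SET: ' _n_= Height= Weight= Ratio=;
    set sashelp.class (obs=3);
    Ratio = Weight / Height;     /* computed: reset at each iteration  */
    put 'After SET:  ' _n_= Height= Weight= Ratio=;
run;
```

From the second iteration onward, the "Before SET" line still shows the Height and Weight of the previous observation, while Ratio is missing again.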
  19. <ul><li>At the beginning of the second iteration, the value of _N_ is set to 2 and the value of _ERROR_ is reset to 0.</li><li>Remember, the automatic variable _N_ keeps track of the number of times the DATA step has begun to execute.</li><li>SAS writes the observation to the output data set, control returns to the top of the DATA step, and the cycle repeats until the last observation has been read.</li></ul>
  21. <ul><li>SAS Log: A note in the SAS log displays the number of observations and variables in the new data set. The log also reports any errors that occurred during compilation or execution.</li><li>Recognizing Errors in a DATA Step Program: This section teaches you how to debug common DATA step programming errors. After completing this section, you will be able to<ul><li>recognize and diagnose syntax errors</li><li>recognize and diagnose execution-time errors</li><li>diagnose errors in programming logic.</li></ul></li></ul>
  22. <ul><li>Compile-time errors, including syntax errors such as missing or invalid punctuation or misspelled keywords.</li><li>Execution-time errors, such as illegal mathematical operations or processing a character variable as a numeric variable. Execution-time errors are detected after compilation, during the execution of the DATA step.</li><li>In addition, errors in your program logic can cause a DATA step program to produce results that differ from what you expect.</li></ul>
  23. <ul><li>When the DATA step compiles, the SAS data set Work.Annual is created. However, due to the syntax error, the DATA step does not execute. The new data set contains no observations or variables.</li><li>Note that SAS does not correct the misspelled word in your program.</li><li>If no syntax errors are detected or if SAS can interpret the syntax errors, the DATA step compiles and then executes.</li></ul>
  24. <ul><li>Most execution-time errors produce warning messages but allow the SAS program to continue executing. Note: If you process a DATA step in noninteractive mode, execution-time errors may cause the program to stop processing.</li><li>The new data set is created and contains nine observations, even though some values are missing.</li></ul>
  25. NOTE: Invalid data for RecHR in line 14 35-37.
      RULE: ----+----1----+----2----+----3----+----4----+----5---
      14 2575 Quigley, M 74 152 Q13 11 26 I
      ID=2575 Name=Quigley, M RestHR=74 MaxHR=152 RecHR=. TimeMin=11 TimeSec=26 Tolerance=I _ERROR_=1 _N_=14
      NOTE: 21 records were read from the infile TESTS.
            The minimum record length was 45.
            The maximum record length was 45.
      NOTE: The data set CLINIC.STRESS has 21 observations and 8 variables.
      NOTE: DATA statement used:
            real time 2.04 seconds
            cpu time 0.06 seconds
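An "Invalid data" note like the one above can be provoked with a very small step. This is a sketch under assumptions: the data set name work.stress_demo is hypothetical and the single record reuses the ID and the bad value Q13 from the log.

```sas
/* RecHR is read as numeric, but the field contains Q13.              */
data work.stress_demo;
    input ID $ RecHR;
    datalines;
2575 Q13
;
run;
```

SAS is expected to write an "Invalid data for RecHR" note to the log, set RecHR to missing and _ERROR_ to 1 for that iteration, and still create the data set with one observation.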
  26. <ul><li>PUT Statement: When the source of program errors is not apparent, you can use the PUT statement to examine variable values and write your own message to the log.</li><li>DATA step debugger</li></ul>
      data test;
         if code='1' then Type='Variable';
         else if code='2' then Type='Fixed';
         else put 'MY NOTE: invalid value: ' code=;
      run;
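The PUT technique can be tried end to end with a self-contained variant of the slide's step. The instream values, the LENGTH statement, and the data set name work.test are additions made for this sketch, not part of the original slide.

```sas
/* Code is read from instream data so the step runs on its own.       */
data work.test;
    length Type $ 8;             /* long enough for 'Variable'         */
    input Code $;
    if Code = '1' then Type = 'Variable';
    else if Code = '2' then Type = 'Fixed';
    else put 'MY NOTE: invalid value: ' Code=;
    datalines;
1
2
9
;
run;
```

Only the third record reaches the final ELSE branch, so the log shows the custom message once, with Code=9.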
  27. <ul><li>proc print data=clinic.admit obs='Patient' label double split='*';</li><li>var age height weight fee;</li><li>where age>30;</li><li>sum fee;</li><li>sumby age; (SUMBY is valid only together with a BY statement)</li><li>label age='Age Today';</li><li>run;</li><li>By default, a PROC PRINT step lists all the variables in a data set. You can select variables and control the order in which they appear by using a VAR statement in your PROC PRINT step.</li><li>To change the text for the Obs heading, you can specify the OBS= option.</li><li>To remove the Obs column, you can specify the NOOBS option.</li></ul>
      Sample Output:
      Patient  Age  Height  Weight     Fee
            1   27      72     168   85.20
            2   34      66     152  124.80
            3   31      61     123  149.75
            4   43      63     137  149.75
            5   51      71     158  124.80
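A runnable sketch of the same PROC PRINT features, using the sashelp.class sample data set that ships with SAS instead of clinic.admit (which is not available here).

```sas
/* NOOBS drops the Obs column; LABEL prints labels as column headers. */
proc print data=sashelp.class noobs label;
    var Name Age Height Weight;   /* select and order the columns      */
    where Age > 13;               /* subset the rows                   */
    sum Weight;                   /* add a total for Weight            */
    label Age = 'Age Today';
run;
```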
  28. <ul><li>If condition then expression;</li><li>If … then … else …;</li><li>Do i=1 to 10 by 3; …statements… end;</li><li>Do while …</li></ul>
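The statement forms above can be fleshed out as follows. This is an illustrative sketch: the data set name work.loops and the variables Parity and Balance are hypothetical.

```sas
/* Iterative DO with IF-THEN/ELSE: i takes the values 1, 4, 7, 10.    */
data work.loops;
    do i = 1 to 10 by 3;
        if mod(i, 2) = 0 then Parity = 'even';
        else Parity = 'odd';
        output;                  /* one observation per loop pass      */
    end;
run;

/* DO WHILE: the condition is tested at the top of each pass.         */
data _null_;
    Balance = 1000;
    do while (Balance < 1100);
        Balance = Balance * 1.05;
    end;
    put Balance=;                /* writes Balance=1102.5 to the log   */
run;
```

The explicit OUTPUT statement in the first step replaces the automatic output at the end of the DATA step, so work.loops gets four observations rather than one.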