Get the Scoop on the Loop: How Best to Write a Loop in the DATA Step
During the execution of the DATA step processing, the DATA step works like a loop, repetitively reading the data and creating observations one at a time. We call this type of loop the implicit loop. Sometimes we need to execute certain SAS® statements repeatedly. In this situation, we need to construct an explicit loop by using the DO, DO WHILE, or DO UNTIL statements. There is a wide range of applications for explicit loops, such as generating random samples, reading multiple external data files, and so forth. However, in some scenarios, creating an explicit loop can be very tricky, even for seasoned programmers. Constructing a successful loop is dependent upon grasping SAS® programming fundamentals, such as understanding that the SAS data set is created one observation at a time in the program data vector (PDV). In this paper, you will learn how to create loops with various applications and what happens in the PDV when creating the explicit loop.

Presentation Transcript

  • Get the Scoop on the Loop: How Best to Write a Loop in the DATA Step. Arthur Li, Department of Information Science, City of Hope Comprehensive Cancer Center, Duarte, CA
  • INTRODUCTION
    • Loops: execute one statement or a group of statements repetitively until a predefined condition is reached
    • For SAS, there are implicit and explicit loops
    • Sometimes programmers can’t distinguish clearly between the two different loops
    • Knowing when the situation calls for creating an explicit loop is one of a programmer’s challenges
  • COMPILATION AND EXECUTION PHASES
    • A DATA step is processed in two sequential phases: the compilation phase and, if there are no syntax errors, the execution phase
    • Compilation phase: each statement is scanned for syntax errors
    • Compilation phase: the PDV is created according to the descriptor portion of the input dataset
    • Execution phase: SAS uses the PDV to build the new dataset
  • IMPLICIT LOOP
    • During the execution phase, the DATA step works like a loop: an implicit loop
    • It repetitively executes statements
      • reads data values
      • creates observations in the PDV one at a time
    • Each pass through the loop is called an iteration
    • Suppose you have the following dataset, Patient, which contains patient IDs for a clinical trial
    • You would like to assign each patient to either a drug or a placebo (a 50% chance of each)
      Patient:
        Obs  ID
        1    M2390
        2    F2390
        3    F2340
        4    M1240
  • IMPLICIT LOOP
    • The RANUNI function
    RANUNI (SEED)
    • It generates a number ~ Uniform(0, 1)
    • e.g. 0.13567, 0.34567, 0.56789, etc
    • SEED is a nonnegative integer
    • The RANUNI function generates a stream of numbers based on SEED
    • When SEED is set to 0, SAS seeds the stream from the system clock, so the generated numbers cannot be reproduced
    • When SEED is a non-zero number, the generated numbers can be reproduced
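As a minimal sketch of the seed behavior described above (the variable names i and x are illustrative and not from the slides), the step below writes three RANUNI values to the log. Rerunning it with the same non-zero seed should print the same three values; changing the seed to 0 should give different values on each run.

```sas
/* Sketch: write three values from the RANUNI stream to the log. */
/* A non-zero seed (2 here, as in the slides) makes the stream   */
/* reproducible from run to run.                                 */
data _null_;
   do i = 1 to 3;
      x = ranuni(2);   /* only the seed of the first call starts the stream */
      put i= x=;
   end;
run;
```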
  • IMPLICIT LOOP: COMPILATION
      data trial1 (drop=rannum);
         set patient;
         rannum = ranuni(2);
         if rannum > 0.5 then group = 'D';
         else group = 'P';
      run;
    • The program is checked for syntax errors
    • The PDV is created (D = dropped, K = kept):
      PDV: _N_ (D), _ERROR_ (D), ID (K), RANNUM (D), GROUP (K)
    • Automatic variable _N_: _N_ = 1 means the 1st observation is being processed, _N_ = 2 means the 2nd observation is being processed, and so on
    • Automatic variable _ERROR_: _ERROR_ = 1 signals a data error in the observation currently being processed
    • Variables that exist in the input dataset (ID): SAS sets each variable to missing in the PDV only before the 1st iteration of the execution phase; the variables then retain their values in the PDV until they are replaced by new values
    • Variables created in the DATA step (RANNUM and GROUP): SAS sets each variable to missing in the PDV at the beginning of every iteration of the execution phase
  • IMPLICIT LOOP: EXECUTION, 1st iteration
    • _N_ is set to 1 and _ERROR_ is set to 0; the rest of the variables are set to missing
      PDV: _N_ = 1, _ERROR_ = 0, ID = ' ', RANNUM = ., GROUP = ' '
    • The SET statement copies the 1st observation to the PDV
      PDV: _N_ = 1, _ERROR_ = 0, ID = 'M2390', RANNUM = ., GROUP = ' '
    • RANNUM is generated
      PDV: _N_ = 1, _ERROR_ = 0, ID = 'M2390', RANNUM = 0.36993, GROUP = ' '
    • GROUP is set to 'P' since RANNUM is not > 0.5
      PDV: _N_ = 1, _ERROR_ = 0, ID = 'M2390', RANNUM = 0.36993, GROUP = 'P'
    • The implicit OUTPUT statement writes the variables marked with K (ID and GROUP) to the final dataset
      Trial1: Obs 1: ID = 'M2390', GROUP = 'P'
  • REVIEW: OUTPUT Statement
      Implicit OUTPUT:
      data trial1 (drop=rannum);
         set patient;
         rannum = ranuni(2);
         if rannum > 0.5 then group = 'D';
         else group = 'P';
      run;
      Explicit OUTPUT:
      data trial1 (drop=rannum);
         set patient;
         rannum = ranuni(2);
         if rannum > 0.5 then group = 'D';
         else group = 'P';
         output;
      run;
    • The explicit OUTPUT statement writes the current observation from the PDV to a SAS dataset immediately, not at the end of the DATA step
    • The implicit OUTPUT statement: without explicit OUTPUT statements, every DATA step contains an implicit OUTPUT statement at the end of the DATA step, which tells SAS to write the observation to the dataset at that point
    • Placing an explicit OUTPUT statement in a DATA step overrides the implicit OUTPUT: SAS adds an observation to a dataset only when an explicit OUTPUT is executed
    • We can use more than one OUTPUT statement in the DATA step
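To illustrate using more than one OUTPUT statement, here is a small sketch that routes each patient to one of two datasets. The dataset names drug and placebo are hypothetical and do not appear in the slides; the Patient dataset is assumed to exist as described earlier.

```sas
/* Sketch: two explicit OUTPUT statements, each naming a target dataset. */
/* Because explicit OUTPUT statements are present, there is no implicit  */
/* OUTPUT at the end of the step.                                        */
data drug placebo;        /* hypothetical output dataset names */
   set patient;
   if ranuni(2) > 0.5 then output drug;
   else output placebo;
run;
```

Each observation read from Patient is written to exactly one of the two datasets, at the moment its OUTPUT statement executes.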
  • IMPLICIT LOOP: EXECUTION, 2nd iteration
    • _N_ is incremented to 2
    • ID is retained since ID comes from the input dataset
    • GROUP and RANNUM are set to missing
      PDV: _N_ = 2, _ERROR_ = 0, ID = 'M2390', RANNUM = ., GROUP = ' '
      Trial1: Obs 1: ID = 'M2390', GROUP = 'P'
    • The SET statement copies the 2nd observation to the PDV
      PDV: _N_ = 2, _ERROR_ = 0, ID = 'F2390', RANNUM = ., GROUP = ' '
    • Let's skip a few iterations
  • IMPLICIT LOOP: EXECUTION, end of the 4th iteration
    • The implicit OUTPUT statement writes the variables marked with K to the final dataset
    • SAS returns to the beginning of the DATA step
      PDV: _N_ = 4, _ERROR_ = 0, ID = 'M1240', RANNUM = 0.51880, GROUP = 'D'
      Trial1:
        Obs  ID     GROUP
        1    M2390  P
        2    F2390  D
        3    F2340  D
        4    M1240  D
  • IMPLICIT LOOP: EXECUTION, 5th iteration
    • _N_ is incremented to 5
    • ID is retained
    • GROUP and RANNUM are set to missing
      PDV: _N_ = 5, _ERROR_ = 0, ID = 'M1240', RANNUM = ., GROUP = ' '
    • SAS reaches the end-of-file marker, which means that there are no more observations to read
    • The execution phase is complete, and SAS goes on to the next DATA or PROC step
  • EXPLICIT LOOP
    • Suppose you don’t have a dataset containing the patient IDs
    • You are asked to assign four patients, 'M2390', 'F2390', 'F2340', and 'M1240', to either the drug or the placebo, with a 50% chance of each
    • You can create each ID and assign it to a group in the same DATA step. For example:
  • EXPLICIT LOOP: assigning IDs in the DATA step
      data trial2 (drop = rannum);
         id = 'M2390';
         rannum = ranuni(2);
         if rannum > 0.5 then group = 'D';
         else group = 'P';
         output;
         id = 'F2390';
         rannum = ranuni(2);
         if rannum > 0.5 then group = 'D';
         else group = 'P';
         output;
         id = 'F2340';
         rannum = ranuni(2);
         if rannum > 0.5 then group = 'D';
         else group = 'P';
         output;
         id = 'M1240';
         rannum = ranuni(2);
         if rannum > 0.5 then group = 'D';
         else group = 'P';
         output;
      run;
    • The step contains 4 explicit OUTPUT statements
    • The step contains 4 almost identical blocks, so:
      • Put the identical code in a loop
      • Loop along the IDs
      • Reduce the amount of coding
  • ITERATIVE DO LOOP
    • General form for an iterative DO loop
      DO INDEX-VARIABLE = VALUE1, VALUE2, …, VALUEN;
         SAS STATEMENTS
      END;
    • INDEX-VARIABLE: contains the value of the current iteration
    • The loop executes once for each value, VALUE1 through VALUEN
    • The VALUES can be either character or numeric
  • ITERATIVE DO LOOP: applying the general form to trial2
    • INDEX-VARIABLE: ID
    • VALUE1 through VALUEN: 'M2390', 'F2390', 'F2340', 'M1240'
    • SAS STATEMENTS:
      rannum = ranuni(2);
      if rannum > 0.5 then group = 'D';
      else group = 'P';
      output;
    • The four repeated blocks collapse into one loop:
      data trial2 (drop = rannum);
         do id = 'M2390', 'F2390', 'F2340', 'M1240';
            rannum = ranuni(2);
            if rannum > 0.5 then group = 'D';
            else group = 'P';
            output;
         end;
      run;
  • ITERATIVE DO LOOP
      DO INDEX-VARIABLE = START TO STOP <BY INCREMENT>;
         SAS STATEMENTS
      END;
    • Usually we use the iterative DO loop to loop along a sequence of integers
    • The loop executes from the START value to the STOP value
    • The optional BY clause specifies an increment between START and STOP
    • The default value for INCREMENT is 1
    • START, STOP, and INCREMENT can be
      • Numbers
      • Variables
      • SAS expressions
    • These values are set upon entry into the DO loop; changing them during the processing of the DO loop does not change the number of iterations
    • INDEX-VARIABLE can be changed within the loop
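The following sketch illustrates the point that STOP is fixed upon entry into the loop. The variable names n and i are illustrative only. Reassigning n inside the loop should not extend it: the loop body runs three times, and the index variable holds 4 once the loop has ended.

```sas
/* Sketch: the STOP value is evaluated once, on entry to the DO loop. */
data _null_;
   n = 3;
   do i = 1 to n;
      n = 10;                    /* has no effect on the number of iterations */
      put 'Inside loop: ' i= n=;
   end;
   put 'After loop: ' i=;        /* i was incremented past the STOP value */
run;
```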
  • ITERATIVE DO LOOP
    • Suppose you are using a sequence of numbers, say 1 to 4, as patient IDs
      data trial3 (drop = rannum);
         do id = 1 to 4;
            rannum = ranuni(2);
            if rannum > 0.5 then group = 'D';
            else group = 'P';
            output;
         end;
      run;
    • INDEX-VARIABLE: ID
    • START: 1
    • STOP: 4
    • INCREMENT: 1 (the default)
  • ITERATIVE DO LOOP: EXECUTION PHASE
      data trial3 (drop = rannum);
         do id = 1 to 4;
            rannum = ranuni(2);
            if rannum > 0.5 then group = 'D';
            else group = 'P';
            output;
         end;
      run;
    • Since we didn't read an input dataset, there will be only one iteration of the DATA step
    • _N_ will be 1 for the entire execution phase
      PDV: _N_ = 1 (D), _ERROR_ = 0 (D), ID = . (K), RANNUM = . (D), GROUP = ' ' (K)
  • ITERATIVE DO LOOP: EXECUTION PHASE, 1st iteration of the DO loop
    • ID is set to 1
      PDV: _N_ = 1, _ERROR_ = 0, ID = 1, RANNUM = ., GROUP = ' '
    • RANNUM is generated
      PDV: _N_ = 1, _ERROR_ = 0, ID = 1, RANNUM = 0.36993, GROUP = ' '
    • GROUP is set to 'P' since RANNUM is not > 0.5
      PDV: _N_ = 1, _ERROR_ = 0, ID = 1, RANNUM = 0.36993, GROUP = 'P'
    • The OUTPUT statement instructs SAS to write the observation to the output dataset
      Trial3: Obs 1: ID = 1, GROUP = 'P'
    • SAS reaches the end of the DO loop
  • ITERATIVE DO LOOP: EXECUTION PHASE, 2nd iteration of the DO loop
    • ID is incremented to 2; since 2 ≤ 4, the 2nd iteration continues
      PDV: _N_ = 1, _ERROR_ = 0, ID = 2, RANNUM = 0.36993, GROUP = 'P'
    • RANNUM is generated
      PDV: _N_ = 1, _ERROR_ = 0, ID = 2, RANNUM = 0.94018, GROUP = 'P'
    • GROUP is set to 'D' since RANNUM > 0.5
      PDV: _N_ = 1, _ERROR_ = 0, ID = 2, RANNUM = 0.94018, GROUP = 'D'
    • The OUTPUT statement instructs SAS to write the observation to the output dataset
      Trial3: Obs 1: ID = 1, GROUP = 'P'; Obs 2: ID = 2, GROUP = 'D'
    • Let's skip two iterations
  • ITERATIVE DO LOOP: EXECUTION PHASE, 4th iteration of the DO loop
    • SAS reaches the end of the DO loop in the 4th iteration
      PDV: _N_ = 1, _ERROR_ = 0, ID = 4, RANNUM = 0.51880, GROUP = 'D'
      Trial3:
        Obs  ID  GROUP
        1    1   P
        2    2   D
        3    3   D
        4    4   D
  • ITERATIVE DO LOOP: EXECUTION PHASE, 5th iteration of the DO loop
    • ID is incremented to 5; since 5 > 4, the loop ends
      PDV: _N_ = 1, _ERROR_ = 0, ID = 5, RANNUM = 0.51880, GROUP = 'D'
    • There is no implicit OUTPUT because the step contains an explicit OUTPUT statement
    • Since we didn't read an input dataset, the DATA step execution ends
  • EXECUTING LOOPS CONDITIONALLY
    • Using an iterative DO loop requires specifying the number of iterations for the DO loop
    • Sometimes you will need to execute statements repetitively until a condition is met
    • In this situation, you need to use either the DO WHILE or the DO UNTIL statement
  • DO WHILE
      DO WHILE (EXPRESSION);
         SAS STATEMENTS
      END;
    • EXPRESSION is evaluated at the top of the DO loop
    • The DO loop will not execute if the EXPRESSION is false
    • Iterative DO loop:
      data trial3 (drop = rannum);
         do id = 1 to 4;
            rannum = ranuni(2);
            if rannum > 0.5 then group = 'D';
            else group = 'P';
            output;
         end;
      run;
    • Equivalent DO WHILE loop:
      data trial4 (drop=rannum);
         do while (id < 4);
            id + 1;
            rannum = ranuni(2);
            if rannum > 0.5 then group = 'D';
            else group = 'P';
            output;
         end;
      run;
  • DO WHILE: at the beginning of the execution phase
    • _N_ is set to 1 and _ERROR_ is set to 0
    • ID is set to 0 because of the SUM statement (id + 1)
    • The rest of the variables are set to missing
      PDV: _N_ = 1 (D), _ERROR_ = 0 (D), ID = 0 (K), RANNUM = . (D), GROUP = ' ' (K)
  • DO WHILE: 1st iteration of the DO WHILE loop
    • Since ID < 4, the loop continues
      PDV: _N_ = 1, _ERROR_ = 0, ID = 0, RANNUM = ., GROUP = ' '
    • ID is incremented to 1
      PDV: _N_ = 1, _ERROR_ = 0, ID = 1, RANNUM = ., GROUP = ' '
    • RANNUM is generated
      PDV: _N_ = 1, _ERROR_ = 0, ID = 1, RANNUM = 0.36993, GROUP = ' '
    • GROUP is set to 'P'
      PDV: _N_ = 1, _ERROR_ = 0, ID = 1, RANNUM = 0.36993, GROUP = 'P'
    • The OUTPUT statement instructs SAS to write the observation to the output dataset
      Trial4: Obs 1: ID = 1, GROUP = 'P'
    • SAS reaches the end of the DO loop
  • DO WHILE: 2nd iteration of the DO WHILE loop
    • Since ID < 4, the loop continues
      PDV: _N_ = 1, _ERROR_ = 0, ID = 1, RANNUM = 0.36993, GROUP = 'P'
    • ID is incremented to 2
      PDV: _N_ = 1, _ERROR_ = 0, ID = 2, RANNUM = 0.36993, GROUP = 'P'
    • Let's skip a few iterations
  • DO WHILE: at the end of the 4th iteration
    • Here are the contents of the PDV at the end of the 4th iteration of the loop
      PDV: _N_ = 1, _ERROR_ = 0, ID = 4, RANNUM = 0.51880, GROUP = 'D'
      Trial4:
        Obs  ID  GROUP
        1    1   P
        2    2   D
        3    3   D
        4    4   D
  • DO WHILE: 5th iteration
    • Now ID is not < 4, so the loop stops
    • The execution phase ends
  • DO UNTIL
    • Unlike DO WHILE loops, the DO UNTIL loop evaluates the condition at the end of the loop
    • The DO UNTIL loop will not continue for another iteration if the EXPRESSION is evaluated to be TRUE at the end of the current loop
    • That means the DO UNTIL loop always executes at least once
      DO UNTIL (EXPRESSION);
         SAS STATEMENTS
      END;
  • DO UNTIL
    • Iterative DO loop:
      data trial3 (drop = rannum);
         do id = 1 to 4;
            rannum = ranuni(2);
            if rannum > 0.5 then group = 'D';
            else group = 'P';
            output;
         end;
      run;
    • DO WHILE loop (will not continue if the EXPRESSION is false):
      data trial4 (drop=rannum);
         do while (id < 4);
            id + 1;
            rannum = ranuni(2);
            if rannum > 0.5 then group = 'D';
            else group = 'P';
            output;
         end;
      run;
    • DO UNTIL loop (will not continue for another iteration if the EXPRESSION is true):
      data trial5 (drop=rannum);
         do until (id >= 4);
            id + 1;
            rannum = ranuni(2);
            if rannum > 0.5 then group = 'D';
            else group = 'P';
            output;
         end;
      run;
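The difference between evaluating at the top and at the bottom shows up most clearly when the stopping condition is already met before the loop starts. In this sketch (the variable n is illustrative only), the DO WHILE body should never run, while the DO UNTIL body should run exactly once.

```sas
/* Sketch: same starting value, opposite evaluation points. */
data _null_;
   n = 5;
   do while (n < 5);       /* evaluated at the top: false, body is skipped */
      put 'DO WHILE body executed: ' n=;
      n = n + 1;
   end;
   do until (n >= 5);      /* evaluated at the bottom: body runs once */
      put 'DO UNTIL body executed: ' n=;
      n = n + 1;
   end;
run;
```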
  • NESTED LOOPS
    • Suppose that you would like to assign 12 patients to either a drug or a placebo
    • These 12 subjects are from 3 cancer centers ("COH", "UCLA", and "USC") with 4 subjects per center
      data trial6;
         length center $ 4;
         do center = "COH", "UCLA", "USC";        /* outer loop */
            do id = 1 to 4;                       /* inner loop */
               if ranuni(2) > 0.5 then group = 'D';
               else group = 'P';
               output;
            end;
         end;
      run;
  • NESTED LOOPS: resulting dataset
      Obs  center  id  group
      1    COH     1   P
      2    COH     2   D
      3    COH     3   D
      4    COH     4   D
      5    UCLA    1   D
      6    UCLA    2   D
      7    UCLA    3   P
      8    UCLA    4   P
      9    USC     1   P
      10   USC     2   P
      11   USC     3   D
      12   USC     4   P
  • COMBINING IMPLICIT AND EXPLICIT LOOPS
    • In the previous program, all the observations were created in a single iteration of the DATA step since we didn't read any input data
    • Suppose the values for CENTER are stored in a SAS dataset, Cancer_center
      Obs  CENTER
      1    COH
      2    UCLA
      3    USC
    • For each center, you need to assign 4 patients to either a drug or a placebo
      data trial7;
         set cancer_center;                       /* DATA step: implicit loop */
         do id = 1 to 4;                          /* explicit loop */
            if ranuni(2) > 0.5 then group = 'D';
            else group = 'P';
            output;
         end;
      run;
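For readers who want to run the trial7 step, here is a self-contained sketch. The DATALINES step that builds Cancer_center and the PROC PRINT at the end are additions for illustration; the slides only state that the dataset exists. The implicit loop runs once per center (3 times) and the explicit loop runs 4 times per center, so trial7 should contain 12 observations.

```sas
/* Assumed setup: build the input dataset described in the slides. */
data cancer_center;
   length center $ 4;
   input center $;
   datalines;
COH
UCLA
USC
;
run;

/* Implicit loop over centers, explicit loop over patient IDs. */
data trial7;
   set cancer_center;
   do id = 1 to 4;
      if ranuni(2) > 0.5 then group = 'D';
      else group = 'P';
      output;
   end;
run;

proc print data=trial7;
run;
```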
  • UTILIZING LOOPS TO CREATE SAMPLES: DIRECT ACCESS MODE
    • When reading a SAS dataset, by default, SAS reads the dataset sequentially
      • SAS reads one observation for each iteration of the DATA step
      • This process stops once SAS reaches the end-of-file marker
      Sbp (read sequentially from observation 1 to the end-of-file marker):
        Obs  ID  SBP
        1    01  145
        2    02  119
        3    03  126
        4    04  106
        5    05  151
        6    06  112
        7    07  127
        8    08  119
        9    09  113
  • DIRECT ACCESS MODE
    • SAS can also access an observation directly via direct-access mode
  • DIRECT ACCESS MODE: Step 1
    • There are 3 important components for using the direct-access mode
    • Step 1: Tell SAS which observation you would like to select by using the POINT= option in the SET statement
      SET SAS-DATA-SET POINT = VARIABLE;
      • VARIABLE is a temporary variable and is not written to the output dataset
      • It is set to 0 in the PDV at the very beginning of the DATA step
      • VARIABLE must be assigned an observation number before the SET statement executes
    • For example, to select the 5th observation of Sbp:
      data sample1;
         obs_n = 5;
         set sbp point = obs_n;
      run;
  • DIRECT ACCESS MODE: Step 2
    • Step 2: Use the STOP statement
      • When using direct-access mode, SAS is not able to detect the end-of-file marker
      • Unless you tell SAS explicitly when to stop processing, the DATA step loops infinitely
      data sample1;
         obs_n = 5;
         set sbp point = obs_n;
         stop;
      run;
  • DIRECT ACCESS MODE: Step 3
    • Step 3: Use the OUTPUT statement
      • Recall: if there is no explicit OUTPUT, SAS writes the observation to the output dataset at the end of the DATA step (implicit OUTPUT)
      • With the STOP statement, DATA step processing stops before the end of the DATA step, so the implicit OUTPUT is never reached
      • Add the OUTPUT statement before the STOP statement
      data sample1;
         obs_n = 5;
         set sbp point = obs_n;
         output;
         stop;
      run;
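The three components can be combined with a DO loop over a list of values to pull several specific observations. This is a sketch only: the dataset name sample_list and the choice of observations 2, 5, and 8 are illustrative, and the first step simply rebuilds Sbp from the values shown in the slides so that the example runs on its own.

```sas
/* Assumed setup: rebuild Sbp from the values shown in the slides. */
data sbp;
   input id $ sbp;
   datalines;
01 145
02 119
03 126
04 106
05 151
06 112
07 127
08 119
09 113
;
run;

/* POINT= selects the observation, OUTPUT writes it, STOP ends the step. */
data sample_list;                 /* hypothetical dataset name */
   do obs_n = 2, 5, 8;
      set sbp point = obs_n;
      output;
   end;
   stop;
run;
```

The result should hold the observations with IDs 02, 05, and 08; obs_n itself is not written because it is the POINT= variable.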
  • CREATING A SYSTEMATIC SAMPLE
    • A systematic sample is created by selecting every kth observation from an original dataset
      • Example: select every 3rd observation
    Sbp:
    Obs  ID  SBP
    1    01  145
    2    02  119
    3    03  126
    4    04  106
    5    05  151
    6    06  112
    7    07  127
    8    08  119
    9    09  113
    Systematic sample (every 3rd observation):
    Obs  ID  SBP
    1    01  145
    2    04  106
    3    07  127
  • CREATING A SYSTEMATIC SAMPLE
    • Reading Sbp sequentially would process every observation; to read only every kth observation, use direct-access mode
    • You can create a systematic sample by using an iterative DO loop
    DO INDEX-VARIABLE = START TO STOP <BY INCREMENT>;
       SAS STATEMENTS
    END;
    • START: 1
    • STOP: the total # of observations
    • INCREMENT: k (select every kth observation)
  • CREATING A SYSTEMATIC SAMPLE
    • To find out the total number of observations, use the NOBS = option in the SET statement
    SET SAS-DATA-SET NOBS = VARIABLE;
    • VARIABLE is a temporary variable that contains the number of observations in SAS-DATA-SET
    • It is not written to the output dataset
    • Its value is assigned automatically during the compilation phase, based on the descriptor portion of SAS-DATA-SET
    • It retains its value throughout the execution phase
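Because the NOBS= value is taken from the descriptor portion at compile time, it is available even if the SET statement never executes. The sketch below illustrates this with a commonly used idiom; it is an illustration rather than code from the slides, and the first step only recreates the Sbp values shown earlier.

```sas
/* Recreate the Sbp dataset from the slides */
data sbp;
   input id $ sbp;
   datalines;
01 145
02 119
03 126
04 106
05 151
06 112
07 127
08 119
09 113
;
run;

/* Report the number of observations without reading any of them */
data _null_;
   if 0 then set sbp nobs = total;   /* the condition is never true, so SET never executes */
   put "Sbp has " total "observations";
   stop;                             /* nothing is read, so end the step explicitly */
run;
```

The PUT statement writes the count to the log, which shows that TOTAL already holds its value before any observation is read.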
  • CREATING A SYSTEMATIC SAMPLE PDV: At the beginning of the execution phase:
    data sample2;
       do choose = 1 to total by 3;
          set sbp point = choose nobs = total;
          output;
       end;
       stop;
    run;
    • _N_ → 1
    • _N_ stays 1 throughout the execution phase: the DATA step iterates only once, because the observations are read by direct access inside the DO loop rather than sequentially
    • CHOOSE → 0
    • TOTAL → 9, based on the descriptor portion of Sbp
    • The rest of the variables → missing
    PDV (_ERROR_ is not shown for simplicity; D = dropped, K = kept):
    _N_ (D)  CHOOSE (D)  TOTAL (D)  ID (K)  SBP (K)
    1        0           9                  .
  • CREATING A SYSTEMATIC SAMPLE PDV: 1st iteration of the DO loop:
    • CHOOSE → 1
      PDV: _N_ = 1, CHOOSE = 1, TOTAL = 9, ID = (blank), SBP = .
    • SAS reads the 1st observation via direct-access mode
      PDV: _N_ = 1, CHOOSE = 1, TOTAL = 9, ID = 01, SBP = 145
    • The OUTPUT statement instructs SAS to write the contents of the PDV to Sample2
      Sample2: obs 1: ID = 01, SBP = 145
    • SAS reaches the end of the 1st iteration
  • CREATING A SYSTEMATIC SAMPLE PDV: 2nd iteration of the DO loop:
    • CHOOSE ↑ 4
    • Since 4 ≤ TOTAL (9), the 2nd iteration continues
      PDV: _N_ = 1, CHOOSE = 4, TOTAL = 9, ID = 01, SBP = 145
    • SAS reads the 4th observation via direct-access mode
      PDV: _N_ = 1, CHOOSE = 4, TOTAL = 9, ID = 04, SBP = 106
    • The OUTPUT statement instructs SAS to write the contents of the PDV to Sample2
      Sample2: obs 1: ID = 01, SBP = 145; obs 2: ID = 04, SBP = 106
    • SAS reaches the end of the 2nd iteration
  • CREATING A SYSTEMATIC SAMPLE PDV: 3rd iteration of the DO loop:
    • CHOOSE ↑ 7
    • Since 7 ≤ TOTAL (9), the 3rd iteration continues
      PDV: _N_ = 1, CHOOSE = 7, TOTAL = 9, ID = 04, SBP = 106
    • SAS reads the 7th observation via direct-access mode
      PDV: _N_ = 1, CHOOSE = 7, TOTAL = 9, ID = 07, SBP = 127
    • The OUTPUT statement instructs SAS to write the contents of the PDV to Sample2
      Sample2: obs 1: ID = 01, SBP = 145; obs 2: ID = 04, SBP = 106; obs 3: ID = 07, SBP = 127
    • SAS reaches the end of the 3rd iteration
  • CREATING A SYSTEMATIC SAMPLE PDV: 4th iteration of the DO loop:
    • CHOOSE ↑ 10
    • Since 10 > TOTAL (9), the loop ends
      PDV: _N_ = 1, CHOOSE = 10, TOTAL = 9, ID = 07, SBP = 127
  • CREATING A SYSTEMATIC SAMPLE PDV:
    • The STOP statement stops the DATA step processing
    Sample2:
    Obs  ID  SBP
    1    01  145
    2    04  106
    3    07  127
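For comparison, the same rows can also be selected with a sequential read and a subsetting IF on _N_. The sketch below is not from the slides: the dataset name sample2_seq and the PROC COMPARE check are assumptions added for illustration. The sequential version reads all nine observations, whereas the direct-access version reads only the three it keeps, which may matter for large datasets.

```sas
/* Recreate the Sbp dataset from the slides */
data sbp;
   input id $ sbp;
   datalines;
01 145
02 119
03 126
04 106
05 151
06 112
07 127
08 119
09 113
;
run;

/* Direct-access version, as shown in the slides */
data sample2;
   do choose = 1 to total by 3;
      set sbp point = choose nobs = total;
      output;
   end;
   stop;
run;

/* Hypothetical sequential version: keep observations 1, 4, 7, ... */
data sample2_seq;
   set sbp;
   if mod(_n_, 3) = 1;   /* subsetting IF: _N_ counts the implicit-loop iterations */
run;

/* Check that both approaches produce the same dataset */
proc compare base = sample2 compare = sample2_seq;
run;
```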
  • CREATING A RANDOM SAMPLE WITH REPLACEMENT
    • A random sample is created by selecting observations from an original dataset at random
    • In a random sample with replacement
      • A chosen observation is returned to the original dataset before the next selection
      • Any observation can be chosen more than once
  • CREATING A RANDOM SAMPLE WITH REPLACEMENT
    Systematic sample:
    data sample2;
       do choose = 1 to total by 3;
          set sbp point = choose nobs = total;
          output;
       end;
       stop;
    run;
    • CHOOSE is incremented by k to create a systematic sample
    • To create a random sample, we need to generate a random integer between 1 and the total # of observations
  • CREATING A RANDOM SAMPLE WITH REPLACEMENT
    data sample3 (drop = i);
       do i = 1 to 3;
          choose = ceil(ranuni(5) * total);
          set sbp point = choose nobs = total;
          output;
       end;
       stop;
    run;
    • How to generate a random integer between 1 and the total # of observations?
    Expression             Result
    RANUNI(SEED)           A randomly generated real number in (0, 1)
    N                      Total number of observations
    RANUNI(SEED)*N         A real number in (0, N)
    CEIL(RANUNI(SEED)*N)   An integer in [1, N]
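The range claimed in the last row can be checked empirically. The sketch below is an illustration rather than code from the slides: it assumes N = 9 (the size of Sbp) and an arbitrary 1,000 draws, and tracks the smallest and largest integer produced. Both values should fall between 1 and 9.

```sas
/* Hypothetical check: draw 1,000 random integers and record the range */
data _null_;
   n = 9;                               /* assumed total number of observations */
   smallest = n;
   largest  = 1;
   do i = 1 to 1000;
      choose = ceil(ranuni(5) * n);     /* same expression as in sample3 */
      smallest = min(smallest, choose);
      largest  = max(largest, choose);
   end;
   put smallest= largest=;              /* written to the log */
run;
```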
  • CREATING A RANDOM SAMPLE WITHOUT REPLACEMENT
    • Self study!
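As a starting point for the self study, here is one possible sketch. It is not the author's solution. It uses a sequential selection approach: each observation is picked with probability (number still needed) / (number still available), so no observation can be chosen twice. The names sample4, need, and left and the sample size of 3 are assumptions made for illustration.

```sas
/* Recreate the Sbp dataset from the slides */
data sbp;
   input id $ sbp;
   datalines;
01 145
02 119
03 126
04 106
05 151
06 112
07 127
08 119
09 113
;
run;

/* Hypothetical example: 3 observations chosen at random, without replacement */
data sample4 (drop = need left);
   need = 3;                                   /* observations still to be selected */
   left = total;                               /* observations not yet considered */
   do choose = 1 to total while (need > 0);
      if ranuni(5) < need / left then do;      /* select with probability need/left */
         set sbp point = choose nobs = total;  /* read only the selected observation */
         output;
         need = need - 1;
      end;
      left = left - 1;
   end;
   stop;
run;
```

When NEED equals LEFT the ratio is 1, so every remaining observation is selected and the sample always reaches the requested size, provided that size does not exceed TOTAL.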
  • UTILIZING LOOPS TO READ A LIST OF EXTERNAL FILES
  • THE INFILE STATEMENT WITH THE END= OPTION
    • To read an external file, you can use the INFILE statement
    • For example, text1.txt is located in “C:”
    text1.txt:
    01 145
    02 119
    data example13;
       infile "C:text1.txt";
       input id $ sbp;
    run;
    • 2 observations → SAS will use 2 DATA step iterations to read the data
    • Like a SAS dataset, the external file also contains an end-of-file marker after its last record
    • When SAS reaches the end-of-file marker, it stops reading
  • THE INFILE STATEMENT WITH THE END= OPTION
    • When reading a SAS dataset …
    Input dataset:
    Obs  ID
    1    M2390
    2    F2390
    3    F2340
    4    M1240
    PDV: _N_ (D) = 4, _ERROR_ (D) = 0, ID (K) = M1240, RANNUM (D) = 0.51880, GROUP (K) = D
    Output dataset:
    Obs  ID     GROUP
    1    M2390  P
    2    F2390  D
    3    F2340  D
    4    M1240  D
  • THE INFILE STATEMENT WITH THE END= OPTION
    • When reading a raw dataset …
    Input dataset:
    01 145
    02 119
    Input buffer (used to hold raw data):
    Column:   1 2 3 4 5 6 …
    Content:  0 2   1 1 9 …
    PDV: _N_ (D) = 2, _ERROR_ (D) = 0, ID (K) = 02, SBP (K) = 119
    Output dataset:
    Obs  ID  SBP
    1    01  145
    2    02  119
  • THE INFILE STATEMENT WITH THE END= OPTION
    • You can use an explicit loop to read the external file
    • To construct an explicit loop, you need to specify
      • the number of iterations for the iterative DO loop
      • or a condition for the DO WHILE /DO UNTIL loops
    • One way to specify a condition is by telling SAS to read the observations until it reads the last record
    • To identify the last record, use the END = option in the INFILE statement
    INFILE FILE-SPECIFICATION END = VARIABLE;
    • VARIABLE is set to 1 when SAS reads the last record of the external file; otherwise it is set to 0
  • THE INFILE STATEMENT WITH THE END= OPTION
    • The following program uses the DO UNTIL loop to read the external file
    data example14;
       infile "C:text1.txt" end = last;
       do until (last = 1);
          input id $ sbp;
          output;
       end;
    run;
    • There’s only one DATA step iteration
    • Within this iteration, the DO UNTIL loop iterates twice to read the two observations in text1.txt .
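The same file can also be read with a DO WHILE loop, which tests the condition before each read instead of after it. The sketch below is a variation on example14, not code from the slides. To keep it self-contained, it first writes a copy of text1.txt into the WORK library directory, which is an assumed location; the macro variable dir and the dataset name example14b are likewise hypothetical.

```sas
/* Assumed location for the demo file: the WORK library directory */
%let dir = %sysfunc(pathname(work));

/* Write a copy of text1.txt with the two records shown in the slides */
data _null_;
   file "&dir/text1.txt";
   put "01 145" / "02 119";
run;

/* Hypothetical variation: DO WHILE checks LAST at the top of the loop */
data example14b;
   infile "&dir/text1.txt" end = last;
   do while (not last);
      input id $ sbp;
      output;
   end;
   stop;   /* end the step after the single DATA step iteration */
run;

proc print data = example14b;
run;
```

The STOP statement is included here because, once LAST is 1, a second DATA step iteration would never execute INPUT and so would never reach the end-of-file condition that normally ends the step.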
  • THE INFILE STATEMENT WITH THE FILEVAR = OPTION
    • Generally, you specify the name and the location of the external file directly in the INFILE statement
    infile "C:text1.txt";
    • Alternatively, you can use the FILEVAR= option in the INFILE statement to read an external file whose name is stored in a variable
    INFILE FILE-SPECIFICATION FILEVAR = VARIABLE;
    • FILE-SPECIFICATION is a placeholder, not an actual filename
    • VARIABLE contains the name of the external file
    • VARIABLE must be created before the INFILE statement
  • THE INFILE STATEMENT WITH THE FILEVAR = OPTION
    • For example,
    data example15;
       filename = "C:text1.txt";
       infile dummy filevar = filename;
       input id $ sbp;
    run;
    Log:
    NOTE: The infile DUMMY is:
          File Name=C:text1.txt,
          RECFM=V,LRECL=256
    NOTE: 2 records were read from the infile DUMMY.
          The minimum record length was 6.
          The maximum record length was 6.
    NOTE: The data set WORK.EXAMPLE15 has 2 observations and 2 variables.
  • READING MULTIPLE EXTERNAL FILES
    • Three external files with an identical format need to be read and concatenated into one dataset
    text1.txt:   text2.txt:   text3.txt:
    01 145       03 126       05 140
    02 119       04 106       06 118
    Resulting dataset:
    Obs  ID  SBP
    1    01  145
    2    02  119
    3    03  126
    4    04  106
    5    05  140
    6    06  118
    • You can read them all in one single DATA step by using the FILEVAR= option in the INFILE statement
    • When the value of the FILEVAR= variable changes, the INFILE statement closes the current input file and opens the new one named by that variable
  • READING MULTIPLE EXTERNAL FILES
    data example15;
       filename = "C:text1.txt";
       infile dummy filevar = filename;
       input id $ sbp;
    run;
    • These three statements (the FILENAME assignment, INFILE, and INPUT) need to be placed inside a loop
    • The names of the external files suggest that you create an iterative DO loop and iterate between 1 and 3
    do i = 1 to 3;
       …
    end;
    • Modify the assignment of the FILENAME variable by using the || (concatenation) operator
    filename = "C:text" || put(i, 1.) || ".txt";
    • Add the OUTPUT statement within the loop
    • The FILEVAR= option controls closing the current input file and opening a new file; SAS will not be able to detect the end-of-file marker
    • Place a STOP statement outside the loop
  • READING MULTIPLE EXTERNAL FILES
    data example15 (drop = i);
       do i = 1 to 3;
          filename = "C:text" || put(i, 1.) || ".txt";
          infile dummy filevar = filename;
          input id $ sbp;
          output;
       end;
       stop;
    run;
    • At the beginning of the DATA step:
      • _N_ → 1
      • Other variables → missing
      PDV: _N_ (D) = 1, I (D) = ., FILENAME (D) = (blank), ID (K) = (blank), SBP (K) = .
  • READING MULTIPLE EXTERNAL FILES 1st iteration of the DO loop:
    • I → 1
    • FILENAME → C:text1.txt
    • INFILE reads the 1st data line from ‘text1.txt’ → input buffer
      Input buffer: 01 145
    • INPUT reads data values: input buffer → PDV
      PDV: _N_ = 1, I = 1, FILENAME = C:text1.txt, ID = 01, SBP = 145
    • OUTPUT tells SAS to write the observation: PDV → output dataset
      Output dataset: obs 1: ID = 01, SBP = 145
    • SAS reaches the end of the DO loop
  • READING MULTIPLE EXTERNAL FILES 2nd iteration of the DO loop:
    • I is incremented to 2
    • FILENAME → C:text2.txt
      PDV: _N_ = 1, I = 2, FILENAME = C:text2.txt, ID = 01, SBP = 145
    • INFILE reads the 1st data line from ‘text2.txt’ (not the 2nd data line from ‘text1.txt’) → input buffer
      Input buffer: 03 126
    • Why? Once one iteration of the DO loop has completed, the following iteration starts to read a new file that is specified by the FILENAME variable
    • As a result, only the 1st record of each file is read
  • READING MULTIPLE EXTERNAL FILES
    • Place the INFILE, INPUT, and OUTPUT statements inside a DO UNTIL loop that uses the END= option
    data example15 (drop = i);
       do i = 1 to 3;
          filename = "C:text" || put(i, 1.) || ".txt";
          do until (last);
             infile dummy filevar = filename end = last;
             input id $ sbp;
             output;
          end;
       end;
       stop;
    run;
    • At the beginning of the DATA step:
      • _N_ → 1
      • LAST → 0
      • Other variables → missing
      PDV: _N_ (D) = 1, I (D) = ., FILENAME (D) = (blank), LAST (D) = 0, ID (K) = (blank), SBP (K) = .
  • READING MULTIPLE EXTERNAL FILES 1st iteration of the DO loop (outer loop):
    • I → 1
    • FILENAME → C:text1.txt
      PDV: _N_ = 1, I = 1, FILENAME = C:text1.txt, LAST = 0, ID = (blank), SBP = .
  • READING MULTIPLE EXTERNAL FILES 1st iteration of the DO loop (outer loop), 1st iteration of the DO UNTIL loop (inner loop):
    • The DO UNTIL loop evaluates the condition at the end of the loop
    • INFILE reads the 1st data line from ‘text1.txt’ → input buffer
      Input buffer: 01 145
    • The INPUT statement reads data values: input buffer → PDV
      PDV: _N_ = 1, I = 1, FILENAME = C:text1.txt, LAST = 0, ID = 01, SBP = 145
    • The OUTPUT statement: PDV → output dataset
      Output dataset: obs 1: ID = 01, SBP = 145
    • SAS reaches the end of the inner loop
    • Since LAST ≠ 1, the inner loop continues
  • READING MULTIPLE EXTERNAL FILES 1st iteration of the DO loop (outer loop), 2nd iteration of the DO UNTIL loop (inner loop):
    • The DO UNTIL loop evaluates the condition at the end of the loop
    • INFILE reads the 2nd data line from ‘text1.txt’ → input buffer
    • LAST → 1
      Input buffer: 02 119
      PDV: _N_ = 1, I = 1, FILENAME = C:text1.txt, LAST = 1, ID = 01, SBP = 145
    • The INPUT statement reads data values: input buffer → PDV
      PDV: _N_ = 1, I = 1, FILENAME = C:text1.txt, LAST = 1, ID = 02, SBP = 119
    • The OUTPUT statement: PDV → output dataset
      Output dataset: obs 1: ID = 01, SBP = 145; obs 2: ID = 02, SBP = 119
    • SAS reaches the end of the inner loop
    • Since LAST = 1, the inner loop ends
  • READING MULTIPLE EXTERNAL FILES 1st iteration of the DO loop (outer loop):
    • SAS reaches the end of the outer loop
  • READING MULTIPLE EXTERNAL FILES 2nd iteration of the DO loop (outer loop):
    • I ↑ 2
    • Since I ≤ 3, the 2nd iteration of the outer loop continues
    • FILENAME → C:text2.txt
      PDV: _N_ = 1, I = 2, FILENAME = C:text2.txt, LAST = 1, ID = 02, SBP = 119
  • READING MULTIPLE EXTERNAL FILES 2nd iteration of the DO loop (outer loop), 1st iteration of the DO UNTIL loop (inner loop):
    • The DO UNTIL loop evaluates the condition at the end of the loop
    • INFILE reads the 1st data line from ‘text2.txt’ → input buffer
    • It is not the last record of ‘text2.txt’, so LAST → 0
      Input buffer: 03 126
      PDV: _N_ = 1, I = 2, FILENAME = C:text2.txt, LAST = 0, ID = 02, SBP = 119
    • The INPUT statement reads data values: input buffer → PDV
      PDV: _N_ = 1, I = 2, FILENAME = C:text2.txt, LAST = 0, ID = 03, SBP = 126
    • The OUTPUT statement: PDV → output dataset
      Output dataset: obs 1: ID = 01, SBP = 145; obs 2: ID = 02, SBP = 119; obs 3: ID = 03, SBP = 126
    • Skip the rest…
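To try the nested-loop program end to end, the sketch below first creates the three text files and then reads them back. It is an illustration under stated assumptions, not the author's code: the files are written to the WORK library directory rather than “C:”, and the names dir, outname, fileno, and example15b are hypothetical. The reading step follows the outer DO loop and inner DO UNTIL loop traced above.

```sas
/* Assumed location for the demo files: the WORK library directory */
%let dir = %sysfunc(pathname(work));

/* Write text1.txt, text2.txt, and text3.txt with the values shown in the slides */
data _null_;
   length outname $ 260 id $ 2;
   input fileno id $ sbp;
   outname = cats("&dir/text", fileno, ".txt");
   file dummy filevar = outname;      /* a change in OUTNAME opens a new output file */
   put id sbp;
   datalines;
1 01 145
1 02 119
2 03 126
2 04 106
3 05 140
3 06 118
;
run;

/* Read all three files in a single DATA step iteration */
data example15b (drop = i);
   length filename $ 260;
   do i = 1 to 3;                                      /* outer loop: one pass per file */
      filename = cats("&dir/text", i, ".txt");
      do until (last);                                 /* inner loop: one pass per record */
         infile dummy filevar = filename end = last;
         input id $ sbp;
         output;
      end;
   end;
   stop;                                               /* end the step once all files are read */
run;

proc print data = example15b;
run;
```

If the files are written as intended, example15b should contain six observations, two from each file, in file order.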
  • ARRAY
    • Loop structures have a wide range of applications in ARRAY processing
    • Since ARRAY is a large and different topic, we are not covering ARRAY in this talk
  • CONCLUSION
    • Loops allow us to write simpler and more efficient programs
    • In order to use loop structures correctly, we need to understand how DATA steps are processed
    • When debugging our programs, we often find that most errors trace back to programming fundamentals, such as understanding how the PDV works
  • CONTACT INFORMATION
    • Arthur Li
    • City of Hope
    • Division of Information Science
    • 1500 East Duarte Road
    • Duarte, CA 91010 - 3000
    • Phone: (626) 256-4673 ext. 65121
    • E-mail: xueli@coh.org