3. Introduction
• A simple DATA step construct first proposed by Ian Whitlock on the SAS-L list with Don Henderson.
• Detailed analysis by Paul Dorfman and Howard Schreier.
• The term DOW (DO-Whitlock) loop was coined by Dorfman.
4. Structure of the DOW Loop
• The original DOW loop is a simple DO loop with a SET statement placed inside it.
Data ... ;
  <Stuff done before break-event> ;
  Do <Index Specs> Until ( Break-Event ) ;
    Set A ;
    <Stuff done for each record> ;
  End ;
  <Stuff done after break-event... > ;
Run ;
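A concrete instance of this template may help. The sketch below assumes the SASHELP.CLASS sample table that ships with SAS is available; the names CLASS_SORTED, WEIGHT_BY_SEX, N_RECORDS and TOTAL_WEIGHT are hypothetical. The break-event is LAST.SEX, so each iteration of the DATA step reads exactly one BY group.

```sas
/* Assumption: SASHELP.CLASS stands in for the input data set */
PROC SORT DATA = SASHELP.CLASS OUT = CLASS_SORTED;
  BY SEX;
RUN;

DATA WEIGHT_BY_SEX (KEEP = SEX N_RECORDS TOTAL_WEIGHT);
  /* Stuff done before break-event: nothing to initialise, because  */
  /* N_RECORDS and TOTAL_WEIGHT are not retained, so they start each */
  /* DATA step iteration (one per SEX group) as missing              */
  DO UNTIL (LAST.SEX);
    SET CLASS_SORTED;
    BY SEX;
    /* Stuff done for each record */
    N_RECORDS    = SUM(N_RECORDS, 1);
    TOTAL_WEIGHT = SUM(TOTAL_WEIGHT, WEIGHT);
  END;
  /* Stuff done after break-event: one observation per SEX group */
  OUTPUT;
RUN;
```

Because the accumulators live only inside one implicit iteration, no RETAIN statement and no explicit reset are needed.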
5. Application 1: Cartesian Join
• A Cartesian join is one of the simple things that can easily be achieved in a single SQL statement.
PROC SQL;
  CREATE TABLE CART_JOIN AS
  SELECT * FROM A, B;
QUIT;
6. Application 1: Cartesian Join
• Cartesian joins can also be done using a simple DOW loop.
• Not as simple or as elegant as the SQL statement.
• Applicable to all versions of SAS.
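DATA CART_JOIN;
  /* READ ONE OBSERVATION OF A PER DATA STEP ITERATION */
  SET A;
  /* COUNT (NUMBER OF OBSERVATIONS IN B) IS SET AT COMPILE TIME BY NOBS= */
  DO J = 1 TO COUNT;
    /* DIRECT ACCESS TO THE J-TH OBSERVATION OF B */
    SET B NOBS=COUNT POINT=J;
    /* ONE OUTPUT ROW PER (A, B) PAIR */
    OUTPUT;
  END;
RUN;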
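To see the loop produce a full Cartesian product, here is a minimal, self-contained run with two small hypothetical data sets: A with 2 observations and B with 3. The variable names ID and CODE are invented for illustration. The result should contain 2 x 3 = 6 rows.

```sas
/* Hypothetical input A: 2 observations */
DATA A;
  INPUT ID $;
  DATALINES;
X
Y
;
RUN;

/* Hypothetical input B: 3 observations */
DATA B;
  INPUT CODE;
  DATALINES;
1
2
3
;
RUN;

DATA CART_JOIN;
  SET A;                            /* outer pass over A              */
  DO J = 1 TO COUNT;                /* COUNT is known at compile time */
    SET B NOBS = COUNT POINT = J;   /* read the J-th row of B         */
    OUTPUT;                         /* one row per (A, B) pair        */
  END;
RUN;
```

Because the inner SET uses POINT=, it never reaches end of file on its own; the step ends when the outer SET A runs out of observations.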
7. The engine
A B C D
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
DATA CART_JOIN;
SET A;
DO J = 1 TO 4;
SET B NOBS=COUNT POINT=J;
OUTPUT;
END;
RUN;
28. Application 2: Geometric Mean
• Most of the mathematical and statistical functions in the SAS DATA step (SUM, MEAN, GEOMEAN and so on) work across the variables of a single observation, so deriving a calculation down a single column, across observations, can be very troublesome.
• The DOW loop provides a simple solution to this problem.
• We can use a DOW loop to loop through the observations and accumulate the answer.
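To illustrate the row-wise behaviour, here is a minimal sketch using the GEOMEAN function on three values stored as three variables of one observation. The data set ROWWISE and the variables X1 to X3 and G are hypothetical.

```sas
DATA ROWWISE;
  /* One observation holding three values in three variables */
  X1 = 1;
  X2 = 2;
  X3 = 4;
  /* GEOMEAN works across its arguments, i.e. across this row: */
  /* the cube root of 1*2*4 = 8, which is 2                    */
  G = GEOMEAN(X1, X2, X3);
  PUT G=;
RUN;
```

If the same three values were stored as three observations of one variable, GEOMEAN would only ever see one value per call, which is why a loop over observations is needed.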
29. Application 2: Geometric Mean
/*DOW LOOP: HAVE MUST BE SORTED BY CATEGORY*/
DATA GEOMEANS (KEEP = CATEGORY GEOMEAN);
  /*SINGLE DO LOOP OVER THE WHOLE FILE*/
  DO UNTIL (EOF);
    SET HAVE END=EOF;
    BY CATEGORY;
    /*RESETTING THE ACCUMULATORS AT THE START OF EACH CATEGORY*/
    IF FIRST.CATEGORY THEN DO;
      TOTAL = 1;
      COUNT = 0;
    END;
    /*ACCUMULATING THE PRODUCT AND THE COUNT*/
    TOTAL = TOTAL*NUMBER;
    COUNT = SUM(COUNT,1);
    /*GEOMETRIC MEAN AT THE END OF EACH CATEGORY*/
    IF LAST.CATEGORY THEN DO;
      /*NTH ROOT OF THE PRODUCT OF THE N VALUES*/
      GEOMEAN = TOTAL**(1/COUNT);
      /*WRITING THE RESULT TO THE LOG AND TO THE OUTPUT DATA SET*/
      PUT CATEGORY= GEOMEAN=;
      OUTPUT;
    END;
  END;
RUN;
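Multiplying raw values, as TOTAL = TOTAL*NUMBER does, can overflow or underflow for long groups. A common alternative, swapped in here rather than taken from the slide, is to average the logarithms and exponentiate. The sketch below assumes strictly positive, non-missing NUMBER values; HAVE and GEOMEANS_LOG are small hypothetical data sets, and it uses LAST.CATEGORY as the break-event instead of EOF.

```sas
/* Hypothetical input, already sorted by CATEGORY */
DATA HAVE;
  INPUT CATEGORY $ NUMBER;
  DATALINES;
A 1
A 4
B 2
B 8
B 4
;
RUN;

DATA GEOMEANS_LOG (KEEP = CATEGORY COUNT GEOMEAN);
  /* One DATA step iteration per CATEGORY */
  DO UNTIL (LAST.CATEGORY);
    SET HAVE;
    BY CATEGORY;
    /* Accumulate LOG(NUMBER) instead of the raw product */
    SUM_LOG = SUM(SUM_LOG, LOG(NUMBER));
    COUNT   = SUM(COUNT, 1);
  END;
  /* EXP(mean of the logs) equals (product) ** (1/COUNT) */
  GEOMEAN = EXP(SUM_LOG / COUNT);
  OUTPUT;
RUN;
```

For this input, category A gives the square root of 1*4 = 4, i.e. 2, and category B gives the cube root of 2*8*4 = 64, i.e. 4.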
30. The engine
DATA _NULL_;
  DO UNTIL (EOF);
    SET A END = EOF;
    COUNT = SUM(COUNT,1);
    VALUE = SUM(VALUE,A);
  END;
RUN;
Data set A:
A B C D
1 2 3 4
2 2 3 4
3 2 3 4
4 2 3 4
COUNT and VALUE after each SET:
Count Value
1 1
2 3
3 6
4 10
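The trace ends with COUNT = 4 and VALUE = 10. A minimal runnable version, assuming data set A is built from the values in the table, can also use the "stuff done after break-event" section to print the result and derive the mean once the loop has finished. TOTALS and MEAN are hypothetical names.

```sas
/* Data set A as shown in the table above */
DATA A;
  INPUT A B C D;
  DATALINES;
1 2 3 4
2 2 3 4
3 2 3 4
4 2 3 4
;
RUN;

DATA TOTALS (KEEP = COUNT VALUE MEAN);
  DO UNTIL (EOF);
    SET A END = EOF;
    COUNT = SUM(COUNT, 1);
    VALUE = SUM(VALUE, A);
  END;
  /* After the loop: COUNT = 4, VALUE = 1+2+3+4 = 10 */
  MEAN = VALUE / COUNT;
  PUT COUNT= VALUE= MEAN=;
  OUTPUT;
RUN;
```

When the step returns to the top, the SET statement tries to read past the end of A and the step stops, so TOTALS holds a single observation.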
42. Multiple DOW Loops
• The double DOW loop structure was first proposed by Howard Schreier on SAS-L as a variation of the original technique.
• Several approaches since then use multiple DOW loops for calculations that require more than one pass through the data.
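A minimal double DOW sketch, assuming SASHELP.CLASS as input: the first loop totals WEIGHT for one SEX group, and the second loop re-reads the same group through a second SET statement so that each record can be compared with that total. WEIGHT_SHARE, GROUP_WEIGHT and SHARE are hypothetical names; this is a common illustration of the pattern, not the code from the original SAS-L post.

```sas
PROC SORT DATA = SASHELP.CLASS OUT = CLASS_SORTED;
  BY SEX;
RUN;

DATA WEIGHT_SHARE;
  /* First DOW loop: read one BY group and total its WEIGHT */
  DO UNTIL (LAST.SEX);
    SET CLASS_SORTED;
    BY SEX;
    GROUP_WEIGHT = SUM(GROUP_WEIGHT, WEIGHT);
  END;
  /* Second DOW loop: a second SET re-reads the same group */
  DO UNTIL (LAST.SEX);
    SET CLASS_SORTED;
    BY SEX;
    SHARE = WEIGHT / GROUP_WEIGHT;   /* record's share of its group */
    OUTPUT;
  END;
RUN;
```

Each SET statement keeps its own read pointer, which is what lets the second loop start again at the first record of the group the first loop has just finished.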
43. Application 3: Bivariate Regression
• To calculate the parameters of the regression, we first have to calculate the means of both the x and y variables.
• A second pass through the data then uses those means to calculate the sum of squares of x and the sum of cross-products of x and y, which are essential for estimating the regression parameters.
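For reference, these are the standard least-squares formulas that the two passes implement. In the macro, the sums below correspond to SUM_XX and SUM_XY, and the estimates to SLOPE and INTERCEPT.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, &
\bar{y} &= \frac{1}{n}\sum_{i=1}^{n} y_i, \\
S_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})^2, &
S_{xy} &= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \\
\hat{\beta}_1 &= \frac{S_{xy}}{S_{xx}}, &
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x}.
\end{aligned}
```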
44. Application 3: Bivariate Regression
/******************************************************************************
-------------------------------------------------------------------------------
DOW LOOP REGRESSION
-------------------------------------------------------------------------------
THIS MACRO IS A SIMPLE DEMONSTRATION OF THE USE OF A DOUBLE DOW LOOP TO COMPUTE
THE SLOPE AND INTERCEPT OF A SIMPLE LINEAR REGRESSION MODEL.
-------------------------------------------------------------------------------
INPUT DESCRIPTION
-------------------------------------------------------------------------------
INPUT INPUT DATASET
Y DEPENDENT VARIABLE
X INDEPENDENT VARIABLE
-------------------------------------------------------------------------------
*******************************************************************************/
%LET INPUT = HTWT;
%LET Y = HEIGHT;
%LET X = WEIGHT;
/******************************************************************************/
45. Application 3: Bivariate Regression
/******************************************************************************
MAIN MACRO
*******************************************************************************/
%MACRO DOW_REG();
/******************************************************************************/
/******************************************************************************
*******************************************************************************/
DATA _NULL_;
/******************************************************************************
FIRST DOW LOOP: SUMS AND RECORD COUNT FOR THE MEANS (ASSUMES NO MISSING X OR Y)
*******************************************************************************/
DO I = 1 BY 1 UNTIL (EOF);
SET &INPUT END=EOF;
SUM_Y = SUM(SUM_Y,&Y);
SUM_X = SUM(SUM_X,&X);
COUNT = SUM(COUNT,1);
END;
/******************************************************************************
MEANS CALCULATION
*******************************************************************************/
MEAN_Y = SUM_Y/COUNT;
MEAN_X = SUM_X/COUNT;
46. Application 3: Bivariate Regression
/******************************************************************************
SECOND DOW LOOP: RE-READ THE DATA FOR THE SUMS OF SQUARES (XX) AND CROSS-PRODUCTS (XY)
*******************************************************************************/
DO _N_ = 1 TO COUNT;
SET &INPUT;
XY = (&Y - MEAN_Y)*(&X - MEAN_X);
XX = (&X - MEAN_X)**2;
SUM_XY = SUM(SUM_XY,XY);
SUM_XX = SUM(SUM_XX,XX);
END;
/******************************************************************************
CALCULATION OF SLOPE AND INTERCEPT
*******************************************************************************/
SLOPE = SUM_XY/SUM_XX;
INTERCEPT = MEAN_Y - SLOPE*MEAN_X;
PUT SLOPE = INTERCEPT = ;
/******************************************************************************
*******************************************************************************/
RUN;
%MEND;
/******************************************************************************
MACRO CALLING
*******************************************************************************/
%DOW_REG();
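One way to check the macro's result is to compare it with PROC REG. The self-contained sketch below assumes SASHELP.CLASS (which has HEIGHT and WEIGHT) in place of HTWT, repeats the two passes without the macro, and writes PROC REG's estimates with OUTEST=. DOW_EST and REG_EST are hypothetical names. In REG_EST the slope appears under the regressor's name, WEIGHT, and the intercept under INTERCEPT.

```sas
/* Double DOW regression of HEIGHT on WEIGHT, without the macro */
DATA DOW_EST (KEEP = SLOPE INTERCEPT);
  /* First pass: sums and count for the means */
  DO UNTIL (EOF1);
    SET SASHELP.CLASS END = EOF1;
    SUM_Y = SUM(SUM_Y, HEIGHT);
    SUM_X = SUM(SUM_X, WEIGHT);
    COUNT = SUM(COUNT, 1);
  END;
  MEAN_Y = SUM_Y / COUNT;
  MEAN_X = SUM_X / COUNT;
  /* Second pass: a second SET re-reads the data from the start */
  DO _N_ = 1 TO COUNT;
    SET SASHELP.CLASS;
    SUM_XY = SUM(SUM_XY, (HEIGHT - MEAN_Y) * (WEIGHT - MEAN_X));
    SUM_XX = SUM(SUM_XX, (WEIGHT - MEAN_X) ** 2);
  END;
  SLOPE     = SUM_XY / SUM_XX;
  INTERCEPT = MEAN_Y - SLOPE * MEAN_X;
  OUTPUT;
  STOP;   /* both passes are complete; end the step explicitly */
RUN;

/* Reference estimates from PROC REG */
PROC REG DATA = SASHELP.CLASS OUTEST = REG_EST NOPRINT;
  MODEL HEIGHT = WEIGHT;
RUN;
QUIT;
```

The two sets of estimates should agree up to floating-point rounding.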
47. Conclusion
• One of the most versatile programming structures in SAS.
• Can be used in a variety of scenarios to achieve the required results.
• Besides data manipulation, DOW loops are also useful for summarizing data, mathematical calculations and even model parameter estimation.
Editor's Notes
The DO loop in the middle is known as the DOW loop structure.
The DOW loop separates the DATA step into three sections.
The first section creates variables that are needed in the DOW loop section.
Typical statements in this section are RETAIN and ARRAY statements.
The second section is the DOW loop itself.
This is the main engine of the structure.
It lets the user create new variables, summarize information and apply other useful tricks.
It is also the section where mistakes are most likely to cause trouble.
The last section is a final calculation step, useful for final summarization or for calculations that depend on the overall summary.
This section is sometimes used to host subsequent DOW loops, as in the double DOW loop structure.
You can clearly see the structure of the DOW loop here.
The first SET statement is the first section of the DOW structure; it reads one observation of the first dataset per DATA step iteration.
The main part, the DOW loop, reads every observation of the second dataset and pairs each one with the current observation of the first dataset.
Each combination is then written out by the single OUTPUT statement.
In this particular DOW loop, only the main section of the DOW structure is used.
The loop uses the END= variable EOF to detect the end of the input data, and a BY statement because, in this case, the geometric mean is calculated for each category.
At the end of each category, the accumulated values are summarized and the geometric mean is calculated accordingly.