Adding Statistical Functionality to the DATA Step with PROC FCMP

Extend and reuse SAS's own procedures within DATA step code. Using PROC FCMP, we show how to create reusable functions, callable from the DATA step, that pull together the power of several procedures and give a much cleaner programming model.

  • Copyright © 2010, SAS Institute Inc. All rights reserved.
  • Slide 12 note: The SEGMENT_TOPDOWN subroutine was called from the DATA step on S&P 500 data from 1950 to 2010. The data contained 15,116 observations.
  • Slide 13 note: 42 segments (with a 20% error reduction threshold).
  • Slide 14 note: 113 segments (with a 15% error reduction threshold).
  • Slide 21 note: The simulation was run in PROC FCMP instead of the DATA step.
  • Transcript of "Adding Statistical Functionality to the DATA Step with PROC FCMP"

    1. Adding Statistical Functionality to the DATA Step with PROC FCMP
       Stacey Christian and Jacques Rioux, SAS Institute Inc., Cary, NC
       Paper 326-2010
    2. Introduction/Motivation
       • Ever want to call a SAS procedure from the DATA step?
       • Ever want to encapsulate a complicated analytical algorithm in a reusable function?
       • This talk will demonstrate how to add statistical functionality to the DATA step through the definition of FCMP function wrappers.
    3. Overview
       • RUN_MACRO function in FCMP
       • Recursive Technique
       • Iterative Technique/The Simulation
       • Meta Programming with FCMP
    4. RUN_MACRO Function in FCMP
       • Executes a predefined SAS macro
       • Syntax:

           rc = run_macro('macro_name', var_1, var_2, ...);

           • rc: return code
           • macro_name: name of the SAS macro to run
           • var_N: variables to pass to/from the macro
    5. See Macro Run

           /* Create a macro called subtract_macro */
           %macro subtract_macro;
              %let difference = %sysevalf(&a - &b);
           %mend subtract_macro;

           /* Use subtract_macro within a function */
           proc fcmp outlib = sasuser.ds.functions;
              function subtract(a, b);
                 rc = run_macro('subtract_macro', a, b, difference);
                 if rc eq 0 then return(difference);
                 else return(.);
              endsub;

              /* Test the call */
              a = 5.3;
              b = 0.7;
              diff = subtract(a, b);
              put diff=;
           run;

       Log output: diff=4.6
    6. See Macro Run in DATA Step

           options cmplib = (sasuser.ds);
           data _null_;
              a = 5.3;
              b = 0.7;
              diff = subtract(a, b);
              put diff=;
           run;

       Log output: diff=4.6
    7. Recursive Technique: Segmenting Time Series Data
       • "Segmenting Time Series: A Survey and Novel Approach," Keogh, Eamonn, et al.
       • Reduce extremely large time series data sets
       • Piecewise linear approximations
       • Top-down recursive algorithm
    8. Top Down Algorithm

           SegmentTopDown ( currentSegment )
           {
              error = run_linear_approximation( currentSegment );
              leftError = run_linear_approximation( leftSegment );
              rightError = run_linear_approximation( rightSegment );
              combinedError = leftError + rightError;
              if (combinedError < error) then {
                 call SegmentTopDown( leftSegment );
                 call SegmentTopDown( rightSegment );
              } else {
                 keep_segment( currentSegment );
              }
           }
    9. Top Down Subroutine

           subroutine segment_topdown(data $, segdata $, var $, start, end, threshold);

              error = linear_approximation(data, var, start, end);

              mid = start + floor((end - start) / 2);
              left_error  = linear_approximation(data, var, start, mid);
              right_error = linear_approximation(data, var, mid+1, end);

              /* Relative error reduction gained by splitting at the midpoint */
              improvement = (error - (left_error + right_error)) / error;
              if (improvement > threshold) then do;
                 call segment_topdown(data, segdata, var, start, mid, threshold);
                 call segment_topdown(data, segdata, var, mid+1, end, threshold);
              end;
              else do;
                 call append_segment(segdata, start, end, error);
              end;

           endsub;
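
The slides call append_segment but do not show it. The following is only a sketch of how such a helper might look, using the same RUN_MACRO wrapper pattern as the rest of the talk. The macro name append_segment_macro, the parameter names, the output column names, the work.funcs.demo library and the values in the test call are all assumptions made for illustration, not the authors' code.

```sas
/* Hypothetical helper: appends one segment record to a results data set */
%macro append_segment_macro;
   /* Character arguments arrive with their quotation marks; strip them */
   %let segdata = %sysfunc(dequote(&segdata));

   /* Build a one-row data set describing the segment */
   data _SEG_;
      first_obs = &first_obs;
      last_obs  = &last_obs;
      sse       = &sse;
   run;

   /* PROC APPEND creates the base data set if it does not exist yet */
   proc append base=&segdata data=_SEG_;
   run;
%mend append_segment_macro;

proc fcmp outlib=work.funcs.demo;
   subroutine append_segment(segdata $, first_obs, last_obs, sse);
      rc = run_macro('append_segment_macro', segdata, first_obs, last_obs, sse);
   endsub;

   /* Quick check with made-up values */
   call append_segment("work.segds_demo", 1, 100, 12.5);
run;
```

Because PROC APPEND adds to whatever is already in the base data set, a results table left over from an earlier run would need to be deleted before segmenting again.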
    10. Linear Approximation Subroutine

           function linear_approximation(ds_in $, var $, first_obs, last_obs);
              rc = run_macro('linear_approximation_macro', ds_in, first_obs, last_obs, var, error);
              return(error);
           endsub;
    11. Linear Approximation Macro

           %macro linear_approximation_macro;

              /* RUN_MACRO passes character values with their quotation marks; remove them */
              %let ds_in = %sysfunc(dequote(&ds_in));
              %let var   = %sysfunc(dequote(&var));

              /* Keep only the segment and add a 1, 2, 3, ... trend variable */
              data _TEMP_;
                 set &ds_in(firstobs=&first_obs obs=&last_obs);
                 retain _TREND_ 0;
                 _TREND_ = _TREND_ + 1;
              run;

              proc reg data=_TEMP_ outest=_EST_ noprint;
                 model &var = _TREND_ / sse;
              run; quit;

              /* Return the sum of squared errors through the macro variable ERROR */
              proc sql noprint; select _SSE_ into :ERROR from _est_; quit;

           %mend linear_approximation_macro;
    12. Recursive Technique: Results

           data _NULL_;
              call segment_topdown("sasuser.snp", "work.segds_20", "close", 1, 15116, 0.2);
              call segment_topdown("sasuser.snp", "work.segds_15", "close", 1, 15116, 0.15);
           run;
    13. Recursive Technique: Graphic Results
        42 Piecewise Linear Segments
    14. Recursive Technique: Graphic Results
        113 Piecewise Linear Segments
    15. Iterative Technique
        • "Minimum Quadratic Distance Estimation for the Proportional Hazards Regression Model with Grouped Data," Jacques Rioux and Andrew Luong
        • Survival models/proportional hazards model
        • PROC PHREG (maximum likelihood) versus minimum distance methods
        • Iteratively reweighted least squares algorithm
    16. Iteratively Reweighted Least Squares Algorithm

           initialize_weights( weights );
           params1 = run_regression( weights );
           while (maxRelativeDifference > criteria)
           {
              update_weights( weights );
              params2 = run_regression( weights );
              maxRelativeDifference = params2 - params1;
              params1 = params2;
           }
    17. Iterative Technique: DATA Step code

           subroutine fit_ph_model(indata $, parmData $, depVars $, weightVars $, indepVars $);
              array params1[3];
              array params2[3];
              call prepare_phdata(indata, "_prepdata_");
              call run_regression("_prepdata_", depVars, indepVars, weightVars, parmData, params1);
              maxRelativeDifference = 1;
              do while (maxRelativeDifference > 0.0001);
                 call update_weights("_prepdata_", weightVars, parmData);
                 call run_regression("_prepdata_", depVars, indepVars, weightVars, parmData, params2);
                 maxRelativeDifference = calc_max_relative_diff(params1, params2);
                 /* Carry the new estimates forward (params1 = params2 in the algorithm) */
                 do i = 1 to dim(params1);
                    params1[i] = params2[i];
                 end;
              end;
           endsub;
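
The convergence test relies on calc_max_relative_diff, which the slides do not show. One plausible definition, given here purely as a sketch, is the largest relative change between two parameter vectors. The fallback to an absolute difference when the old value is zero, the work.funcs.demo library and the test values are assumptions for illustration.

```sas
proc fcmp outlib=work.funcs.demo;
   /* Hypothetical definition: largest relative change from p1 to p2 */
   function calc_max_relative_diff(p1[*], p2[*]);
      maxdiff = 0;
      do i = 1 to dim(p1);
         /* Fall back to the absolute change when the old value is zero */
         if p1[i] ne 0 then d = abs((p2[i] - p1[i]) / p1[i]);
         else d = abs(p2[i] - p1[i]);
         maxdiff = max(maxdiff, d);
      end;
      return(maxdiff);
   endsub;

   /* Quick check: the third element changes by 25%, the largest change */
   array a[3];
   array b[3];
   a[1] = 1;   a[2] = 2; a[3] = 4;
   b[1] = 1.1; b[2] = 2; b[3] = 3;
   m = calc_max_relative_diff(a, b);
   put m=;
run;
```

With these inputs the relative changes are about 0.1, 0 and 0.25, so the check should report m=0.25, well above the 0.0001 tolerance used in fit_ph_model.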
    18. Run_Regression Subroutine

           subroutine run_regression(data $, dependent $, independent $, weight $, parmData $, parmArray[*]);
              outargs parmArray;
              array tmpArray[1] _temporary_;
              rc = RUN_MACRO('run_regression_macro', data, parmData, dependent, independent, weight);
              rc = read_array(parmData, tmpArray);
              do i = 1 to dim(parmArray);
                 parmArray[i] = tmpArray[1,i];
              end;
           endsub;
    19. Run_Regression Macro

           %macro run_regression_macro;
              /* RUN_MACRO passes character values with their quotation marks; remove them */
              %let data        = %sysfunc(dequote(&data));
              %let parmData    = %sysfunc(dequote(&parmData));
              %let dependent   = %sysfunc(dequote(&dependent));
              %let independent = %sysfunc(dequote(&independent));
              %let weight      = %sysfunc(dequote(&weight));

              proc reg data=&data outest=&parmData NOPRINT;
                 model &dependent = &independent / noint;
                 weight &weight;
              quit;

              /* Keep only the coefficient columns */
              data &parmData;
                 set &parmData;
                 keep &independent;
              run;
           %mend run_regression_macro;
    20. The True Glory of Reusable Functions: The Simulation
        • We now have a "fitting routine" for the Proportional Hazards Model (fit_ph_model)
        • Create a function to generate PH data (called generate_ph_data)
        • Create a function to append fits to a results data set (called append_ph_data)
    21. The Simulation Study

           proc fcmp;
              do i = 1 to 1000;
                 call simulate_ph_data("work.simdata");
                 call fit_ph_model("work.simdata", "work.params",
                                   "log_log_Pij", "Weight",
                                   "x1 x2 x3");
                 call append_data("work.simresults", "work.params");
              end;
           run;
    22. Simulation Results
    23. Simulation Graphs
    24. Meta Programming
        • Create your own scoring function dynamically from a fitted model

           subroutine create_score(data $, dependent $, independent $, scoreFunc $, library $);
              paramds = "work.params";
              rc = RUN_MACRO('run_regression_macro', data, paramds, dependent, independent);
              rc = RUN_MACRO('create_score_func_macro', paramds, independent, scoreFunc, library);
           endsub;
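    25. Score Function Macro

           %macro create_score_func_macro;
              /* RUN_MACRO passes character values with their quotation marks; remove them */
              %let paramds     = %sysfunc(dequote(&paramds));
              %let independent = %sysfunc(dequote(&independent));
              %let scoreFunc   = %sysfunc(dequote(&scoreFunc));
              %let library     = %sysfunc(dequote(&library));

              proc transpose data=&paramds out=&paramds._t;
                 var &independent;
              run;

              /* Build "name * coefficient + ..." and the argument list "name , name , ..." */
              proc sql noprint;
                 select trim(_NAME_) || " * " || strip(put(col1, BEST12.))
                    into :theScore separated by " + "
                    from &paramds._t;
                 select trim(_NAME_)
                    into :theArgs separated by " , "
                    from &paramds._t;
              quit;

              data _NULL_;
                 set &paramds;
                 call symputX("Intercept", intercept);
              run;

           (continued)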
    26. Score Function Macro - continued

              proc fcmp outlib=&library..score;
                 function &scoreFunc(&theArgs);
                    return(&Intercept + &theScore);
                 endsub;
              quit;
           %mend create_score_func_macro;
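
To make the meta-programming step concrete, here is a sketch of roughly what the macro would submit once its macro variables resolve, assuming scoreFunc is PredLWage_NoKids, library is sasuser.score and the independent variables are educ, exper and age. The intercept and coefficients below are placeholders chosen for illustration, not fitted values, and the CMPLIB setting is an assumption about how the generated function would be made visible to a later DATA step.

```sas
/* Roughly the code generated by create_score_func_macro.       */
/* The numeric values are placeholders, not estimates from data. */
proc fcmp outlib=sasuser.score.score;
   function PredLWage_NoKids(educ, exper, age);
      return(0.1 + educ * 0.1 + exper * 0.01 + age * -0.001);
   endsub;
run;

/* Make the generated function library visible, then call the function */
options cmplib=sasuser.score;
data _null_;
   lw = PredLWage_NoKids(15, 5, 30);
   put lw=;
run;
```

The design point is that the fitted model ends up as an ordinary compiled function, so scoring new data needs neither the parameter data set nor another procedure call.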
    27. Run Create Score Function

           data _NULL_;
              call create_score("work.mroz", "lwage", "educ exper age kidslt6 kidsge6",
                                "PredLWage_Full", "sasuser.score");
              call create_score("work.mroz", "lwage", "educ exper age",
                                "PredLWage_NoKids", "sasuser.score");
           run;

           data _NULL_;
              educ = 15; exper = 5; age = 30; kidslt6 = 2; kidsge6 = 1;
              PredWage_Full = exp(PredLWage_Full(educ, exper, age, kidslt6, kidsge6));
              put PredWage_Full=;
              PredWage_NoKids = exp(PredLWage_NoKids(educ, exper, age));
              put PredWage_NoKids=;
           run;

        Log output:
           PredWage_Full=3.4199679212
           PredWage_NoKids=3.787216653
    28. Conclusions
        • Users can encapsulate preexisting analytical procedures as building blocks for even larger, more complex statistical analysis methods!
        • PROC FCMP provides the vehicle to write reusable, independent program units (functions and subroutines).
        • These units can be written and tested independently.
    29. Where to find more information
        • http://support.sas.com/saspresents
        • Paper in PDF form
        • Zip file containing all source code
    30. Adding Statistical Functionality to the DATA Step with PROC FCMP
        Paper 326-2010