Adding Statistical Functionality to the DATA Step with PROC FCMP

Extend and reuse SAS's own procedures within DATA step code. Using PROC FCMP, we show how you can create reusable code in the DATA step that pulls together the power of possibly many procedures and yields a much cleaner programming model.

  • Copyright © 2010, SAS Institute Inc. All rights reserved.
  • The SEGMENT_TOPDOWN subroutine was called from the DATA step on S&P 500 data from 1950 to 2010. The data contained 15,116 observations.
  • 42 segments (with a 20% error reduction threshold)
  • 113 segments (with a 15% error reduction threshold)
  • Note: we ran this in PROC FCMP instead of the DATA step.

Adding Statistical Functionality to the DATA Step with PROC FCMP: Presentation Transcript

  • Adding Statistical Functionality to the DATA Step with PROC FCMP Stacey Christian and Jacques Rioux SAS Institute Inc., Cary, NC Paper 326-2010
  • Introduction/Motivation
    • Ever want to call a SAS procedure from the DATA step?
    • Ever want to encapsulate a complicated analytical algorithm in a reusable function?
    • This talk will demonstrate how to add statistical functionality to the DATA step through the definition of FCMP function wrappers.
  • Overview
    • RUN_MACRO function in FCMP
    • Recursive Technique
    • Iterative Technique/The Simulation
    • Meta Programming with FCMP
  • RUN_MACRO Function in FCMP
    • Executes a predefined SAS macro
    • Syntax:
    • rc = run_macro('macro_name', var_1, var_2, …);
      • rc : return code
      • macro_name : name of the SAS macro to run
      • var_N : variables to pass to/from the macro
  • See Macro Run
    • /* Create a macro called subtract_macro */
    • %macro subtract_macro;
    • %let difference = %sysevalf(&a - &b);
    • %mend subtract_macro;
    • /* Use subtract_macro within a function */
    • proc fcmp outlib = sasuser.ds.functions;
    • function subtract(a,b);
    • rc = run_macro('subtract_macro', a, b, difference);
    • if rc eq 0 then return(difference);
    • else return(.);
    • endsub;
    •  
    • /* test the call */
    • a = 5.3;
    • b = 0.7;
    • diff = subtract(a, b);
    • put diff=;
    • run;
    diff=4.6
  • See Macro Run in DATA Step
    •  
    • options cmplib = (sasuser.ds);
    • data _null_;
    • a = 5.3;
    • b = 0.7;
    • diff = subtract(a, b);
    • put diff=;
    • run;
    diff=4.6
  • Recursive Technique: Segmenting Time Series Data
    • "Segmenting Time Series: A Survey and Novel Approach", Eamonn Keogh et al.
    • reduce extremely large time series data sets
    • piecewise linear approximations
    • top-down recursive algorithm
  • Top Down Algorithm
    • SegmentTopDown ( currentSegment )
    • {
    • error = run_linear_approximation( currentSegment );
    • leftError = run_linear_approximation ( leftSegment );
    • rightError = run_linear_approximation ( rightSegment );
    • combinedError = leftError + rightError;
    • if (combinedError < error) then {
    • call SegmentTopDown ( leftSegment ) ;
    • call SegmentTopDown ( rightSegment );
    • } else {
    • keep_segment( currentSegment );
    • }
    • }
  • Top Down Subroutine
    • subroutine segment_topdown (data $, segdata $, var $, start, end, threshold);
    •  
    • error = linear_approximation (data, var, start, end);
    •  
    • mid = start + floor((end-start)/2);
    • left_error = linear_approximation (data, var, start, mid);
    • right_error = linear_approximation (data, var, mid+1, end);
    •  
    • improvement = (error - (left_error + right_error)) / error;
    • if (improvement > threshold) then do;
    • call segment_topdown (data, segdata, var, start, mid, threshold);
    • call segment_topdown (data, segdata, var, mid+1, end, threshold);
    • end;
    • else do;
    • call append_segment (segdata, start, end, error);
    • end;
    •  
    • endsub;
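The split criterion used by the subroutine above can be tried on its own, without any data set. The following is a minimal sketch, not code from the paper: the function name split_improvement, the library work.funcs.demo, and the error values 100, 30, and 45 are all hypothetical and chosen only for illustration. It also guards against a parent error of zero, a case the slide does not address.

```sas
/* Hypothetical helper (not from the paper): the fraction of the parent  */
/* segment's error that is removed by splitting it into two halves.      */
proc fcmp outlib=work.funcs.demo;
   function split_improvement(error, left_error, right_error);
      /* Assumption: a parent error of zero means a perfect fit, */
      /* so there is nothing to gain from splitting.             */
      if error <= 0 then return(0);
      return((error - (left_error + right_error)) / error);
   endsub;
run;

options cmplib=work.funcs;

/* Illustrative numbers only: parent SSE 100, halves 30 and 45. */
data _null_;
   improvement = split_improvement(100, 30, 45);
   put improvement=;
   if improvement > 0.2 then put "split the segment";
   else put "keep the segment";
run;
```

With these made-up numbers the improvement is (100 - 75) / 100 = 0.25, so a 20% threshold would split the segment, while a 30% threshold would keep it whole.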
  • Linear Approximation Subroutine
    • function linear_approximation (ds_in $, var $, first_obs, last_obs);
    • rc = run_macro('linear_approximation_macro', ds_in, first_obs, last_obs, var, error);
    • return(error);
    • endsub;
  • Linear Approximation Macro
    • %macro linear_approximation_macro ;
    •  
    • data _TEMP_;
    • set &ds_in(firstobs=&first_obs obs=&last_obs);
    • retain _TREND_ 0;
    • _TREND_ = _TREND_ + 1;
    • run;
    •  
    • proc reg data=_TEMP_ outest=_EST_ noprint;
    • model &var = _TREND_ / sse;
    • run; quit;
    •  
    • proc sql noprint; select _SSE_ into :ERROR from _est_; quit;
    •  
    • %mend linear_approximation_macro ;
    •  
  • Recursive Technique: Results
    • data _NULL_;
    • call segment_topdown("sasuser.snp", "work.segds_20", "close", 1, 15116, 0.2);
    • call segment_topdown("sasuser.snp", "work.segds_15", "close", 1, 15116, 0.15);
    • run;
  • Recursive Technique: Graphic Results 42 Piecewise Linear Segments
  • Recursive Technique: Graphic Results 113 Piecewise Linear Segments
  • Iterative Technique
      • "Minimum Quadratic Distance Estimation for the Proportional Hazards Regression Model with Grouped Data", Jacques Rioux and Andrew Luong
      • Survival models/proportional hazards model
      • PROC PHREG (maximum likelihood) versus minimum distance methods
      • Iteratively reweighted least squares algorithm
  • Iteratively Reweighted Least Squares Algorithm
    • initialize_weights( weights );
    • params1 = run_regression( weights );
    • maxRelativeDifference = infinity;
    • while (maxRelativeDifference > criteria)
    • {
    • update_weights( weights );
    • params2 = run_regression( weights );
    • maxRelativeDifference = max( |params2 - params1| / |params1| );
    • params1 = params2;
    • }
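The convergence test in the loop above compares two successive parameter vectors. The deck calls a function named calc_max_relative_diff for this but does not show its body, so the following is only one plausible sketch of it, under stated assumptions: the relative difference is measured against the previous estimate, and a small floor (1e-12, an arbitrary choice) protects against division by zero. The library work.funcs.demo and the test values are hypothetical.

```sas
proc fcmp outlib=work.funcs.demo;
   /* Assumed implementation: largest |p2 - p1| / |p1| over all parameters. */
   function calc_max_relative_diff(p1[*], p2[*]);
      maxdiff = 0;
      do i = 1 to dim(p1);
         /* Floor the denominator so a zero estimate does not divide by zero. */
         denom = max(abs(p1[i]), 1e-12);
         maxdiff = max(maxdiff, abs(p2[i] - p1[i]) / denom);
      end;
      return(maxdiff);
   endsub;

   /* Test the call with made-up estimates from two successive iterations. */
   array est_prev[3];
   array est_curr[3];
   est_prev[1] = 1.0;  est_prev[2] = 2.0;  est_prev[3] = 4.0;
   est_curr[1] = 1.1;  est_curr[2] = 2.0;  est_curr[3] = 4.2;
   d = calc_max_relative_diff(est_prev, est_curr);
   put d=;
run;
```

Here the relative changes are roughly 0.1, 0, and 0.05, so the function returns roughly 0.1, and a loop with a criterion of 0.0001 would continue iterating.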
  • Iterative Technique: DATA Step code
    • subroutine fit_ph_model(indata $, parmData $, depVars $, weightVars $, indepVars $ );
    • array params1[3];
    • array params2[3];
    • call prepare_phdata (indata, "_prepdata_");
    • call run_regression ("_prepdata_", depVars, indepVars, weightVars, parmData, params1);
    • maxRelativeDifference = 1;
    • do while( maxRelativeDifference > 0.0001 );
    • call update_weights ("_prepdata_", weightVars, parmData);
    • call run_regression ("_prepdata_", depVars, indepVars, weightVars, parmData, params2);
    • maxRelativeDifference = calc_max_relative_diff (params1, params2);
    • /* carry the latest estimates forward, as in the pseudocode */
    • do i = 1 to dim(params1); params1[i] = params2[i]; end;
    • end;
    • endsub;
  • Run_Regression Subroutine
    • subroutine run_regression ( data $, dependent $, independent $, weight $, parmData $, parmArray[*]);
    • outargs parmArray;
    • array tmpArray[1] _temporary_;
    • rc = RUN_MACRO ('run_regression_macro', data, parmData, dependent, independent, weight);
    • rc = read_array(parmData, tmpArray);
    • do i = 1 to dim(parmArray);
    • parmArray[i] = tmpArray[1,i];
    • end;
    • endsub;
  • Run_Regression Macro
    • %macro run_regression_macro;
    • proc reg data=&data outest=&parmData NOPRINT;
    • model &dependent = &independent/noint;
    • weight &weight;
    • quit;
    • data &parmData;
    • set &parmData;
    • keep &independent;
    • run;
    • %mend run_regression_macro;
    •  
  • The True Glory of Reusable Functions: The Simulation
      • Now have a “fitting routine” for the Proportional Hazard Model (fit_ph_model)
      • Create a function to generate PH data (called generate_ph_data)
      • Create a function to append fits to results data set (called append_ph_data).
  • The Simulation Study
    • proc fcmp;
    • do i=1 to 1000;
    • call simulate_ph_data ("work.simdata");
    • call fit_ph_model("work.simdata", "work.params",
    • "log_log_Pij", "Weight",
    • "x1 x2 x3" );
    • call append_data("work.simresults", "work.params");
    • end;
    • run;
  • Simulation Results
  • Simulation Graphs
  • Meta Programming
    • Create your own scoring function dynamically from a fitted model
    • subroutine create_score( data $, dependent $, independent $, scoreFunc $, library $ );
    • paramds = "work.params";
    • rc = RUN_MACRO('run_regression_macro', data, paramds, dependent, independent);
    • rc = RUN_MACRO('create_score_func_macro', paramds, independent, scoreFunc, library);
    • endsub;
  • Score Function Macro
    • %macro create_score_func_macro;
    • proc transpose data=&paramds out=&paramds._t;
    • var &independent;
    • run;
    • proc sql noprint;
    • select trim(_NAME_) || " * " || strip(put(col1,BEST12.)) into :theScore separated by " + " from &paramds._t;
    • select trim(_NAME_) into :theArgs separated by " , " from &paramds._t;
    • quit;
    • data _NULL_;
    • set &paramds;
    • call symputX("Intercept", intercept);
    • run;
    • <continued>
  • Score Function Macro - continued
    • proc fcmp outlib=&library..score;
    • function &scoreFunc(&theArgs);
    • return(&Intercept + &theScore);
    • endsub;
    • quit;
    • %mend create_score_func_macro;
  • Run Create Score Function
    • data _NULL_;
    • call create_score("work.mroz", "lwage", "educ exper age kidslt6 kidsge6", "PredLWage_Full", "sasuser.score");
    • call create_score("work.mroz", "lwage", "educ exper age", "PredLWage_NoKids", "sasuser.score");
    • run;
    • data _NULL_;
    • educ = 15; exper = 5; age = 30; kidslt6 = 2; kidsge6 = 1;
    • PredWage_Full = exp(PredLWage_Full(educ, exper, age, kidslt6, kidsge6));
    • put PredWage_Full=;
    • PredWage_NoKids = exp(PredLWage_NoKids(educ, exper, age));
    • put PredWage_NoKids=;
    • run;
    PredWage_Full=3.4199679212
    PredWage_NoKids=3.787216653
  • Conclusions
    • Users can encapsulate preexisting analytical procedures as building blocks for even larger, more complex statistical analysis methods.
    • PROC FCMP provides the vehicle for writing reusable, independent program units (functions and subroutines).
    • These units can be written and tested independently.
  • Where to find more information
    • http://support.sas.com/saspresents
    • Paper in PDF form
    • Zip file containing all source code
  • Adding Statistical Functionality to the DATA Step with PROC FCMP Paper 326-2010