Base SAS Statistics Procedures

  • SASTechies.com Sharad C Narnindi Attic Technologies,Inc 2005

    1. SASTechies [email_address] http://www.sastechies.com
    2. TLGs (Tables, Listings, and Graphs). Creating:
       - SAS tables
       - Listings
       - Basic statistics procedures with SAS
       - Graphs
       - ODS HTML
       - PROC REPORT and other utility procedures
    3. Descriptive and distributive statistics procedures
    4. PROC TABULATE creates customized one-, two-, and three-dimensional tables that display any of a large number of descriptive statistics. With PROC TABULATE you can
       - modify virtually every feature of a table
       - calculate percentages
       - produce sub-reports without sorting the data
       - summarize data and produce a report in one step
       - generate multiple tables in one step.

       proc tabulate data=clinic.diabstat;
          class type;
          var premium;
          table type premium;
       run;

       proc tabulate data=clinic.admit;
          class sex;
          var height weight;
          table sex, height*min weight*min;
       run;

       proc tabulate data=clinic.admit;
          class sex actlevel;
          var height weight;
          table actlevel, sex, height*min weight*min;
       run;

       Output of the second step:
                   Height    Weight
                      Min       Min
       Sex
       F            61.00    118.00
       M            69.00    147.00

       Output of the third step (one page per ActLevel):
       ActLevel HIGH
                   Height    Weight
                      Min       Min
       Sex
       F            66.00    140.00
       M            72.00    168.00

       ActLevel LOW
                   Height    Weight
                      Min       Min
       Sex
       F            61.00    118.00
       M            71.00    154.00
    5. To set up a table with PROC TABULATE, identify the data you are analyzing, and then determine
       - which variables, if any, you need to classify your data
       - which variables, if any, you need to analyze your data
       - the type of table you need to represent your data.

       PROC TABULATE   invokes the procedure and identifies your data set
       CLASS           specifies the variables used to classify the data
       VAR             specifies the numeric variables used to analyze the data
       TABLE           defines the table that displays your data; it uses variables and statistics to form the table
    6. Class variables
       - can be character or numeric
       - classify data into groups or categories
       - in most cases have only a few distinct values (PROC TABULATE prints each value of a class variable).
       Analysis variables, unlike class variables,
       - must be numeric
       - are used for arithmetic analysis
       - often contain continuous values.
       The same variable cannot appear in both the CLASS statement and the VAR statement in the same step.
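A self-contained sketch of the CLASS/VAR/TABLE pattern, for readers without the clinic library. The data set work.diabetes_demo is a hypothetical stand-in built from the four Clinic.Diabetes observations listed on slide 20, and the OUT= option of the PROC TABULATE statement writes each table cell to a data set (work.tab_min, also a hypothetical name) so the results can be checked.

```sas
/* Hypothetical stand-in for Clinic.Diabetes: the four observations shown on slide 20 */
data work.diabetes_demo;
  input ID $ Sex $ Age Height Weight Pulse FastGluc PostGluc;
  datalines;
2304 F 16 61 102 100 568 625
1128 M 43 71 218 76 156 208
4425 F 48 66 162 80 244 322
1387 F 57 64 142 70 177 206
;
run;

/* Class variable Sex forms the rows; analysis variables Height and Weight
   are summarized with MIN. OUT= stores each table cell as an observation,
   with statistic columns named variable_statistic (Height_Min, Weight_Min). */
proc tabulate data=work.diabetes_demo out=work.tab_min;
  class sex;
  var height weight;
  table sex, height*min weight*min;
run;
```

With these four rows, the F row shows minimums of 61 (Height) and 102 (Weight), and the M row shows 71 and 218.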
    7. You use the TABLE statement to specify
       - the number of dimensions in the table (page, row, column)
       - the variables in the table (Sex, Height)
       - the statistics to be calculated (MAX).
    8. TABLE page-expression, row-expression, column-expression;
       Dimension expressions contain elements. Dimension expressions can also contain operators that you use when combining elements to produce the table you want. Commas, one type of operator, separate the dimensions of the table.
    9. proc tabulate data=clinic.diabstat;
          class type sex;
          var totalclaim premium;
          table type;
          table type premium;
          table type, premium;
          table type, premium, sum;
       run;

       Two-dimensional tables always have row and column headings; one-dimensional tables only have column headings.

       table type;
                   Type
              I           II
              N            N
           3.00        17.00

       table type premium;
                   Type             Premium
              I           II
              N            N            Sum
           3.00        17.00        3359.15

       table type, premium;
                  Premium
                      Sum
       Type
       I           312.65
       II         3046.50

       table type, premium, sum;
       Type I
                      Sum
       Premium     312.65

       Type II
                      Sum
       Premium    3046.50
    10. Your final task before writing your own PROC TABULATE step is to specify the statistics needed. To request a statistic, you use an operator, the asterisk (*), to attach the statistic to the variable.
        - If you specify only class variables in your TABLE statement,
          - the default statistic is N (frequency)
          - the only statistics you can request are N and PCTN (percent of total frequency).
        - If you specify any analysis variables in your TABLE statement,
          - the default statistic is SUM
          - you can request any statistic to be computed on the analysis variables.
        - In a TABLE statement, you can specify statistics in any dimension, but they must all be in the same dimension.

        proc tabulate data=clinic.admit;
           class sex actlevel;
           var height weight;
           table height*mean weight*max, actlevel;
           table sex*pctn actlevel*n;
        run;
    11. To specify a summary row, add ALL to the row expression of your TABLE statement.

        proc tabulate data=clinic.admit;
           var fee;
           class sex;
           table sex all, fee;
        run;

                   Fee
                   Sum
        Sex
        F      1418.35
        M      1268.60
        All    2686.95

        In a one-dimensional table, ALL adds a summary column instead:

        proc tabulate data=clinic.admit;
           class sex;
           table sex all;
        run;

                 Sex
           F      M      All
           N      N        N
          11     10       21
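A sketch that attaches statistics with the asterisk and adds an ALL summary row, again on the hypothetical work.diabetes_demo data set (the four Clinic.Diabetes rows from slide 20). In the OUT= data set (work.tab_all, a hypothetical name), the ALL row is the observation whose Sex value is missing.

```sas
/* Hypothetical stand-in for Clinic.Diabetes: the four observations shown on slide 20 */
data work.diabetes_demo;
  input ID $ Sex $ Age Height Weight Pulse FastGluc PostGluc;
  datalines;
2304 F 16 61 102 100 568 625
1128 M 43 71 218 76 156 208
4425 F 48 66 162 80 244 322
1387 F 57 64 142 70 177 206
;
run;

/* Rows: each Sex value plus an ALL summary row.
   Columns: mean Height and maximum Weight, attached with the * operator. */
proc tabulate data=work.diabetes_demo out=work.tab_all;
  class sex;
  var height weight;
  table sex all, height*mean weight*max;
run;
```

Here the All row reports a mean Height of 65.5 and a maximum Weight of 218; the F row reports a maximum Weight of 162.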
    12. LABEL changes the headings for variables, and KEYLABEL changes the headings for statistics and for the keyword ALL.

        proc tabulate data=clinic.admit;
           class sex;
           var height weight;
           table (height weight)*mean, sex all;
           label sex='Sex of Patient' height='Height' weight='Weight';
           keylabel min='Lowest Reading' max='Highest Reading' mean='Average Reading';
        run;

        Sample output from a similar step that reports MIN, MAX, and MEAN of Fasting Glucose Level within Sex:
                                       Lowest     Highest     Average
                                      Reading     Reading     Reading
        Sex
        F   Fasting Glucose Level      152.00      568.00      253.45
        M   Fasting Glucose Level      156.00      492.00      354.89
    13. Operators in dimension expressions:
        - Concatenation: use the blank operator to display variables side by side or stacked.
              table height*mean weight*mean, sex all;
        - Crossing: produce hierarchical tables by using the asterisk operator to cross class variables with other variables.
              table sex, actlevel*age*max;
        - Grouping: to group elements and control how expressions are evaluated, use the parentheses operator.
              table type*(sex all);
    14. - You can condense multiple pages into a single page (condensed output):
              table type, fee, sum / condense;
        - You can specify how percentages are calculated; the denominator definition in angle brackets (<type>) sets the base of the percentage:
              table type, fee*sex*pctsum<type>;
        - You can create formats to change the headings for class variable values:
              proc format;
                 value $actfmt 'LOW'='(1) Low'
                               'MOD'='(2) Moderate'
                               'HIGH'='(3) High';
              run;
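Creating a format does not apply it; a FORMAT statement in the PROC TABULATE step does. This sketch assumes a hypothetical data set, work.act_demo, holding a few ActLevel values, and uses ORDER=FORMATTED on the PROC TABULATE statement so that the (1), (2), (3) prefixes control the order of the headings.

```sas
proc format;
  value $actfmt 'LOW'='(1) Low' 'MOD'='(2) Moderate' 'HIGH'='(3) High';
run;

/* Hypothetical ActLevel values, for illustration only */
data work.act_demo;
  input ActLevel $;
  datalines;
HIGH
LOW
MOD
LOW
;
run;

/* The FORMAT statement applies $actfmt. to the class variable;
   ORDER=FORMATTED sorts the headings by their formatted values. */
proc tabulate data=work.act_demo order=formatted out=work.tab_act;
  class actlevel;
  format actlevel $actfmt.;
  table actlevel all;
run;
```

With only a class variable in the TABLE statement, each cell is the default N statistic, so the "(1) Low" heading shows 2.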
    15. Common PROC TABULATE errors and their causes:
        - Error: VARIABLE appears in both CLASS and VAR lists.
          Cause: you specified a variable as both a class variable and an analysis variable.
        - Error: Type of name (VARIABLE) unknown at line n.
          Cause: you did not specify the variable as either a class variable or an analysis variable.
        - Error: Variable VARIABLE in list does not match type prescribed for this list.
          Cause: you specified a character variable as an analysis variable.
    16. Descriptive statistics such as the mean, sum, minimum, and maximum can answer basic questions about numeric data. If you do not specify variables in a VAR statement (for example, var age height weight;), PROC MEANS calculates statistics for all numeric variables.

        proc means data=clinic.diabetes n mean std min max;
        run;

        Variable    N   Mean   Std Dev   Minimum   Maximum
        Age        20     47        13        15        63
        Height     20     67         4        61        75
        Weight     20    175        36       102       240
        Pulse      20     75         8        65       100
        FastGluc   20    299       126       152       568
        PostGluc   20    355       126       206       625

        proc means data=clinic.diabetes min max maxdec=0;
        run;

        Variable   Minimum   Maximum
        Age             15        63
        Height          61        75
        Weight         102       240
        Pulse           65       100
        FastGluc       152       568
        PostGluc       206       625
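To reuse the statistics instead of only printing them, PROC MEANS accepts an OUTPUT statement. A minimal sketch on the hypothetical work.diabetes_demo data set (the four Clinic.Diabetes rows from slide 20); the output names work.stats, MeanHt, MinWt, and MaxFG are arbitrary choices.

```sas
/* Hypothetical stand-in for Clinic.Diabetes: the four observations shown on slide 20 */
data work.diabetes_demo;
  input ID $ Sex $ Age Height Weight Pulse FastGluc PostGluc;
  datalines;
2304 F 16 61 102 100 568 625
1128 M 43 71 218 76 156 208
4425 F 48 66 162 80 244 322
1387 F 57 64 142 70 177 206
;
run;

/* Print N, MEAN, STD, MIN, MAX for three variables and also save
   selected statistics to work.stats (one observation, _TYPE_=0). */
proc means data=work.diabetes_demo n mean std min max maxdec=1;
  var height weight fastgluc;
  output out=work.stats mean(height)=MeanHt min(weight)=MinWt max(fastgluc)=MaxFG;
run;
```

With these four rows, MeanHt is 65.5, MinWt is 102, MaxFG is 568, and _FREQ_ is 4.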
    17. CLASS group processing: you will often want statistics for groups of observations instead of for the observations as a whole.

        proc means data=flights.laguardia median maxdec=0;
           var boarded transferred deplaned;
           class dest;
        run;

        Dest   N Obs   Variable       Median
        CPH        6   Boarded           137
                       Transferred        12
                       Deplaned          149
        FRA        7   Boarded           176
                       Transferred        12
                       Deplaned          189
        LON       20   Boarded           186
                       Transferred        11
                       Deplaned          200
        PAR       13   Boarded           155
                       Transferred        15
                       Deplaned          182
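A sketch of CLASS processing that also saves the group medians, on the hypothetical work.diabetes_demo data set (the four Clinic.Diabetes rows from slide 20). The NWAY option keeps only the observations for the full combination of class variables (here, one per Sex value) and drops the overall summary observation; work.med, MedHt, and MedWt are arbitrary names.

```sas
/* Hypothetical stand-in for Clinic.Diabetes: the four observations shown on slide 20 */
data work.diabetes_demo;
  input ID $ Sex $ Age Height Weight Pulse FastGluc PostGluc;
  datalines;
2304 F 16 61 102 100 568 625
1128 M 43 71 218 76 156 208
4425 F 48 66 162 80 244 322
1387 F 57 64 142 70 177 206
;
run;

/* CLASS does not require sorted data. NWAY: keep only per-Sex rows in OUT=. */
proc means data=work.diabetes_demo median maxdec=0 nway;
  class sex;
  var height weight;
  output out=work.med median(height weight)=MedHt MedWt;
run;
```

The F row gets a median Height of 64 and a median Weight of 142; the M row summarizes a single observation.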
    18. proc sort data=clinic.heart out=work.hartsort;
           by survive sex;
        run;

        proc means data=work.hartsort maxdec=1;
           var arterial heart cardiac urinary;
           by survive sex;
        run;

        - Like the CLASS statement, the BY statement specifies variables to use for categorizing observations.
        - Unlike CLASS processing, BY processing requires that your data already be sorted in the order of the BY variables.
        - BY-group results have a layout that is different from that of CLASS-group results.

        Survive=DIED Sex=2
        Variable   N    Mean   Std Dev   Minimum   Maximum
        Arterial   6    94.2      27.3      72.0     145.0
        Heart      6   103.7      16.7      81.0     130.0
        Cardiac    6   318.3     102.6     156.0     424.0
        Urinary    6   100.3     155.7       0.0     405.0

        Survive=SURV Sex=1
        Variable   N    Mean   Std Dev   Minimum   Maximum
        Arterial   5    77.2      12.2      61.0      88.0
        Heart      5   109.0      32.0      77.0     149.0
        Cardiac    5   298.0     139.8      66.0     410.0
        Urinary    5   100.8      60.2      44.0     200.0
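The same kind of summary with BY processing, where the PROC SORT step must come first. This sketch uses the hypothetical work.diabetes_demo data set (the four Clinic.Diabetes rows from slide 20); work.demo_sorted, work.by_means, and MeanHt are arbitrary names.

```sas
/* Hypothetical stand-in for Clinic.Diabetes: the four observations shown on slide 20 */
data work.diabetes_demo;
  input ID $ Sex $ Age Height Weight Pulse FastGluc PostGluc;
  datalines;
2304 F 16 61 102 100 568 625
1128 M 43 71 218 76 156 208
4425 F 48 66 162 80 244 322
1387 F 57 64 142 70 177 206
;
run;

/* BY processing needs the data sorted by the BY variables */
proc sort data=work.diabetes_demo out=work.demo_sorted;
  by sex;
run;

/* One printed table and one output observation per BY group */
proc means data=work.demo_sorted mean maxdec=1;
  by sex;
  var height;
  output out=work.by_means mean=MeanHt;
run;
```

Omitting the PROC SORT step (if the data were not already in Sex order) would make SAS stop with an error that the data set is not sorted. Here the F group's mean Height prints as 63.7 (191/3).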
    19. By default, PROC FREQ creates a one-way table with the frequency, percent, cumulative frequency, and cumulative percent of every value of all variables in a data set. Frequency distributions work best with variables that contain repeating values. Each variable listed in the TABLES statement gets its own one-way table.

        proc freq data=finance.loans;
           tables rate months;
        run;

        - Frequency: number of observations with the value.
        - Percent: frequency of the value divided by the total number of observations, times 100.
        - Cumulative Frequency: sum of the frequency counts of the value and all other values listed above it in the table.
        - Cumulative Percent: cumulative frequency of the value divided by the total number of observations, times 100.

        Rate     Frequency   Percent   Cumulative Frequency   Cumulative Percent
        9.50%            2     22.22                      2                22.22
        9.75%            1     11.11                      3                33.33
        10.00%           2     22.22                      5                55.56
        10.50%           4     44.44                      9               100.00

        Months   Frequency   Percent   Cumulative Frequency   Cumulative Percent
        12               1     11.11                      1                11.11
        24               1     11.11                      2                22.22
        36               1     11.11                      3                33.33
        48               1     11.11                      4                44.44
        60               2     22.22                      6                66.67
        360              3     33.33                      9               100.00
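The Percent and cumulative columns can be reproduced by saving the frequencies with the OUT= and OUTCUM options of the TABLES statement. The nine rates below are reconstructed from the Rate table above (two loans at 9.50%, one at 9.75%, two at 10.00%, four at 10.50%); the data set names work.loans_demo and work.rate_freq are hypothetical.

```sas
/* Nine hypothetical loans whose rates match the Rate frequency table */
data work.loans_demo;
  input Rate @@;
  format rate percent8.2;
  datalines;
0.095 0.095 0.0975 0.10 0.10 0.105 0.105 0.105 0.105
;
run;

/* OUT= saves COUNT and PERCENT; OUTCUM adds CUM_FREQ and CUM_PCT */
proc freq data=work.loans_demo;
  tables rate / out=work.rate_freq outcum;
run;
```

For 9.50% the saved PERCENT is 100*2/9 (22.22 when printed), and at 10.50% CUM_FREQ reaches 9 and CUM_PCT reaches 100, matching the slide.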
    20. The ORDER= option on the PROC FREQ statement controls the order in which values are listed:
        ORDER=DATA|FORMATTED|FREQ|INTERNAL
        - DATA orders values by their order of appearance in the data set
        - FORMATTED orders values by their formatted values
        - FREQ orders values by descending frequency count
        - INTERNAL orders values by their unformatted values (the default).

        SAS data set Clinic.Diabetes (first observations):
        ID     Sex   Age   Height   Weight   Pulse   FastGluc   PostGluc
        2304   F      16       61      102     100        568        625
        1128   M      43       71      218      76        156        208
        4425   F      48       66      162      80        244        322
        1387   F      57       64      142      70        177        206

        ORDER=DATA (first rows):
        Height   Frequency   Percent   Cumulative Frequency   Cumulative Percent
        61               2     10.00                      2                10.00
        71               2     10.00                      4                20.00
        66               2     10.00                      6                30.00

        ORDER=FREQ (first rows):
        Height   Frequency   Percent   Cumulative Frequency   Cumulative Percent
        64               3     15.00                      3                15.00
        61               2     10.00                      5                25.00
        65               2     10.00                      7                35.00
        66               2     10.00                      9                45.00

        ORDER=FORMATTED, with Height formatted as Short, Medium, or Tall:
        Height   Frequency   Percent   Cumulative Frequency   Cumulative Percent
        Medium           8     40.00                      8                40.00
        Short            7     35.00                     15                75.00
        Tall             5     25.00                     20               100.00
    21. To create a two-way table, join two variables with an asterisk (*) in the TABLES statement of a PROC FREQ step.

        proc format;
           value wtfmt low-139='< 140'
                       140-180='140-180'
                       181-high='> 180';
           value htfmt low-64='< 5''5"'
                       65-70='5''5-10"'
                       71-high='> 5''10"';
        run;

        proc freq data=clinic.diabetes;
           tables weight*height;
           format weight wtfmt. height htfmt.;
        run;

        Table of Weight by Height (each cell: Frequency / Percent / Row Pct / Col Pct)
        Weight    < 5'5"                        5'5-10"                       > 5'10"                        Total
        < 140     2 / 10.00 / 100.00 / 28.57    0 / 0.00 / 0.00 / 0.00        0 / 0.00 / 0.00 / 0.00         2 / 10.00
        140-180   5 / 25.00 / 50.00 / 71.43     5 / 25.00 / 50.00 / 62.50     0 / 0.00 / 0.00 / 0.00        10 / 50.00
        > 180     0 / 0.00 / 0.00 / 0.00        3 / 15.00 / 37.50 / 37.50     5 / 25.00 / 62.50 / 100.00     8 / 40.00
        Total     7 / 35.00                     8 / 40.00                     5 / 25.00                     20 / 100.00
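Formats group the raw values before PROC FREQ counts them. A sketch that saves the formatted two-way cells with OUT=, run on the hypothetical work.diabetes_demo data set (the four Clinic.Diabetes rows from slide 20); work.cells is an arbitrary name.

```sas
proc format;
  value wtfmt low-139='< 140' 140-180='140-180' 181-high='> 180';
  value htfmt low-64='< 5''5"' 65-70='5''5-10"' 71-high='> 5''10"';
run;

/* Hypothetical stand-in for Clinic.Diabetes: the four observations shown on slide 20 */
data work.diabetes_demo;
  input ID $ Sex $ Age Height Weight Pulse FastGluc PostGluc;
  datalines;
2304 F 16 61 102 100 568 625
1128 M 43 71 218 76 156 208
4425 F 48 66 162 80 244 322
1387 F 57 64 142 70 177 206
;
run;

/* Crosstabulate formatted Weight by formatted Height; OUT= keeps
   one observation (COUNT, PERCENT) per nonempty cell. */
proc freq data=work.diabetes_demo;
  tables weight*height / out=work.cells;
  format weight wtfmt. height htfmt.;
run;
```

Each of the four observations falls in a different formatted cell, so work.cells has four observations with COUNT=1.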
    22. In an n-way request such as tables sex*weight*height;, the last two variables form the rows (Weight) and columns (Height) of a two-way table, and the variables before them (Sex) define the levels: PROC FREQ produces one two-way table per level.

        proc freq data=clinic.diabetes;
           tables sex*weight*height;
           format weight wtfmt. height htfmt.;
        run;

        Table 1 of Weight by Height, controlling for Sex=F (each cell: Frequency / Percent / Row Pct / Col Pct)
        Weight    < 5'5"                        5'5-10"                        > 5'10"                 Total
        < 140     2 / 18.18 / 100.00 / 28.57    0 / 0.00 / 0.00 / 0.00         0 / 0.00 / 0.00 / .     2 / 18.18
        140-180   5 / 45.45 / 55.56 / 71.43     4 / 36.36 / 44.44 / 100.00     0 / 0.00 / 0.00 / .     9 / 81.82
        > 180     0 / 0.00 / . / 0.00           0 / 0.00 / . / 0.00            0 / 0.00 / . / .        0 / 0.00
        Total     7 / 63.64                     4 / 36.36                      0 / 0.00               11 / 100.00

        Table 2 of Weight by Height, controlling for Sex=M (each cell: Frequency / Percent / Row Pct / Col Pct)
        Weight    < 5'5"                        5'5-10"                        > 5'10"                        Total
        < 140     0 / 0.00 / . / .              0 / 0.00 / . / 0.00            0 / 0.00 / . / 0.00            0 / 0.00
        140-180   0 / 0.00 / 0.00 / .           1 / 11.11 / 100.00 / 25.00     0 / 0.00 / 0.00 / 0.00         1 / 11.11
        > 180     0 / 0.00 / 0.00 / .           3 / 33.33 / 37.50 / 75.00      5 / 55.56 / 62.50 / 100.00     8 / 88.89
        Total     0 / 0.00                      4 / 44.44                      5 / 55.56                      9 / 100.00
    23. To generate list output for crosstabulations, add a slash (/) and the LIST option to the TABLES statement in your PROC FREQ step:
            TABLES variable-1*variable-2 <*...variable-n> / LIST;

        proc freq data=clinic.diabetes;
           tables sex*weight*height / list;
           format weight wtfmt. height htfmt.;
        run;

        Sex   Weight    Height    Frequency   Percent   Cumulative Frequency   Cumulative Percent
        F     < 140     < 5'5"            2     10.00                      2                10.00
        F     140-180   < 5'5"            5     25.00                      7                35.00
        F     140-180   5'5-10"           4     20.00                     11                55.00
        M     140-180   5'5-10"           1      5.00                     12                60.00
        M     > 180     5'5-10"           3     15.00                     15                75.00
        M     > 180     > 5'10"           5     25.00                     20               100.00
    24. Options that suppress parts of a crosstabulation:
        - NOFREQ suppresses cell frequencies
        - NOPERCENT suppresses cell percentages
        - NOROW suppresses row percentages
        - NOCOL suppresses column percentages.

        proc freq data=clinic.diabetes;
           tables sex*weight / nofreq norow nocol;
           format weight wtfmt.;
        run;

        Table of Sex by Weight (cells show Percent only; the Total row shows Frequency / Percent)
        Sex     < 140        140-180       > 180        Total
        F       10.00        45.00         0.00         55.00
        M       0.00         5.00          40.00        45.00
        Total   2 / 10.00    10 / 50.00    8 / 40.00    20 / 100.00
