Workshop SAS FOR DATA MANAGEMENT Session 1

Duke SSRI EHDi Workshop slides by Lorrie Schmid, April 2015

This workshop, conducted across three sessions, offers an overview of the SAS programming language, focusing on data management activities. Session 1 is a general overview of SAS with a focus on major SAS components (Program Editor, Log and Output), core concepts of SAS programming (DATA and PROC), and issues of importing/exporting, reading, and writing SAS datasets. Session 2 focuses on data modification, including variable creation and variable recoding, as well as adding to and subsetting from particular datasets. Key SAS statements described include: ATTRIB, SET, WHERE, IF-THEN, MERGE. Session 3 focuses on data analysis, specifically descriptive analyses typically used in data management. Key SAS procedures described include: PROC CONTENTS, PROC MEANS, and PROC FREQ. Together, these sessions allow researchers to learn basic data management processes using the SAS statistical system.

Published in: Education

  1. 1. SAS for Data Management Lorrie Schmid Adam Mack March / April, 2015
  2. 2. Session 1: General Overview of SAS
  3. 3. What is SAS? • SAS is a programming language • Data management (Base SAS) • Statistical analysis (SAS/STAT) • Graphical procedures (SAS/GRAPH) • Also includes many other elements that you can access at Duke: SAS/Access, SAS/ETS, SAS/GIS, SAS OLAP, SAS Share, SAS/AF, SAS/FSP, SAS/IML, SAS/OR, SAS/Connect, SAS/Genetics, SAS Integration, SAS/QC
  4. 4. How to get SAS at Duke • If you are part of Duke, but not SSRI, contact OIT: https://oit.duke.edu/comp-print/software/ • If you are part of SSRI, request SAS using the IT ticketing system: https://ssri.duke.edu/ssri-computer-help
  5. 5. Opening SAS for the first time • My first time… • Hopefully, your first time…
  6. 6. Key Components of Session 1 • The SAS system • Importing / Exporting data • Creating and reading data • Data components • Variable Names • SAS statement types • Debugging Programs
  7. 7. SAS Programming System Components • Program Editor • Run • Log • Output • Explorer
  8. 8. LOG PROGRAM EDITOR EXPLORER RUN
  9. 9. Program Editor • Programs are written in this window. • Create and write programs • Edit programs already written • Save programs • Run programs
  10. 10. Running a Program • Click on the running man icon. • F8 key. • Selecting submit command from the run menu. • Output found in output window.
  11. 11. Log Window • SAS messages are displayed in the log. • The log is a tool for diagnosing problems with the program. • Notes appear in blue • Warnings appear in green • Errors (typically) appear in red – and usually cause the program to quit running
  12. 12. Output Window • Results from statistical analyses will be shown in the Output window. • Save as a file or print. • Results window.
  13. 13. Explorer Window • Create and find SAS libraries • Create and view SAS data files • Open SAS data files and view contents • Edit and delete programs, logs and data files • View and delete output
  14. 14. 2-minute statement on SAS Statements • SAS statements include a string of SAS keywords, SAS names, and operators that give information or perform a task. • Key statement types: – Global (e.g., LIBNAME, RUN) – DATA – PROC • Each statement must end with a semicolon!
  15. 15. Importing / Exporting Data • Two ways to import and export data to be discussed. –Using the SAS Import and Export Wizard. –Using StatTransfer to translate from one datafile program to another. https://www.stattransfer.com/
  16. 16. Importing Example
  17. 17. Importing Syntax PROC IMPORT OUT= WORK.test DATAFILE= "C:UsersschmidDesktopSASexample.xls" DBMS=EXCEL REPLACE; RANGE="example"; GETNAMES=YES; MIXED=NO; USEDATE=YES; SCANTIME=YES; RUN;
  18. 18. Importing / Exporting SAS Data • Many different ways to import or export SAS data. –Using a DATA step to create a data file from raw data. –Using Dynamic Data Exchange (DDE) with Excel and Word.
  19. 19. Concerns about Importing / Exporting • Make sure that your SAS version is compatible with your Microsoft Office installation –32-bit versus 64-bit processing • Don’t assume that the data import correctly – check and test! –This includes variable names, labels and formats
  20. 20. Reading SAS data sets • After the import is completed, SAS data sets are available to view, manipulate and analyze. • Depending on the name given in the OUT= option, data sets may be temporary (available only during the current session) or permanent.
  21. 21. Types of SAS Data Sets • Temporary – Only exist during the SAS session – One-part data set name (e.g., survey) or two-part name starting with WORK. • Permanent – Saved to disk – Available for use anytime – Two-part data set name (e.g., my.survey)
  22. 22. Best Practices – Data Sets • Generally, all work should be done as a series of temporary datasets • Important to save code but not all of the in-between SAS datasets created by the code • However, significant changes need to be saved as permanent datasets – with versions, etc.
  23. 23. Permanent Data Sets - Libraries • Permanent SAS data sets are saved in a pre-designated subdirectory or folder • SAS refers to it as a LIBRARY • Libraries can be specified through a graphic interface or by using the libname statement.
  24. 24. Permanent Data Sets – LIBNAME Statement • LIBNAME Statement –Tells SAS where the permanent data set is to be stored or is already stored. –Syntax: LIBNAME mydata 'C:\temp'; (KEYWORD = LIBNAME, LIBREF = mydata, DRIVE:\DIRECTORY = C:\temp)
  25. 25. General Assumptions about Data • Imagine a table or spreadsheet. • A SAS data set consists of: –Variables (columns) –Observations (rows) • Two data types: –Character –Numeric
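      A minimal sketch of how a libref might be used once it is defined. The folder C:\temp and the data set name survey are placeholders borrowed from the slide examples, the folder is assumed to exist already, and the two data lines are illustrative only:

      ```sas
      /* Assign the libref mydata to an existing folder (placeholder path). */
      LIBNAME mydata 'C:\temp';

      /* Temporary data set: one-part name, stored in WORK,
         removed when the SAS session ends. */
      DATA survey;
        INPUT id age;
        DATALINES;
      1101 12
      1102 12
      ;
      RUN;

      /* Permanent data set: two-part name, written to the mydata library. */
      DATA mydata.survey;
        SET survey;
      RUN;
      ```

      In a later session, the same LIBNAME statement would make mydata.survey available again without re-importing anything.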
  26. 26. SAS data files and elements: Our Example for this class
      ID    date       race gender age senseek1 senseek2 senseek3 senseek4 momed livewmom livewdad
      1101  4/18/2002  1    1      12  4        5        3        3        5     1        1
      1102  4/23/2002  1    1      12  5        3        1        4        .     1        1
      1103  5/11/2002  2    0      12  4        1        1        3        5     1        1
      1104  5/14/2002  2    0      12  4        3        2        3        4     1        1
      1105  5/18/2002  2    1      12  5        2        3        4        4     1        1
      1106  5/22/2002  1    0      12  2        2        3        3        3     1        1
      1107  6/18/2001  4    0      13  3        3        4        3        1     1        1
      1108  5/27/2002  1    0      12  4        4        2        4        5     1        0
      1109  6/1/2002   2    1      12  3        4        4        3        5     1        1
      1110  6/6/2002   1    0      12  5        5        5        5        4     1        0
      1111  6/24/2001  1    1      13  5        1        5        5        4     0        0
      1112  6/8/2002   2    0      12  1        2        5        3        3     1        1
      1113  6/30/2001  1    1      13  1        1        1        1        3     1        0
      1114  6/12/2002  1    1      12  3        4        3        3        4     1        0
      1115  6/14/2002  1    1      12  5        5        5        5        3     1        0
      1116  6/18/2002  2    0      12  3        1        1        5        4     1        1
      1117  7/8/2001   2    1      13  5        5        5        4        2     1        1
      1118  6/22/2002  3    1      12  5        4        4        4        .     1        1
      1119  6/27/2002  2    1      12  4        5        5        5        4     1        1
      1120  7/1/2002   1    0      12  1        1        1        1        2     1        1
      1121  7/4/2002   1    1      12  .        .        .        .        4     1        1
  27. 27. Numeric Data • Numeric data are numbers… which can include decimals or negative numbers • Dates and times are a special form of numeric data. • A period (.) represents a missing numeric value. • Most statistical analyses use numeric data.
  28. 28. Character Data • SAS defines character data as “a sequence of letters, numbers and/or special characters” • Character values are enclosed in quotation marks (" "). • Manipulation of character data is possible and often necessary • Some basic analyses can be conducted, but are limited
  29. 29. Converting Character to Numeric (and Numeric to Character) • Step 1: Create a new variable • Step 2: Use either PUT (num -> char) or INPUT (char -> num) to identify the correct new variable type. • Step 3: Delete the original variable • Step 4: Rename the new variable into the original variable name
  30. 30. Example - Converting newrace = PUT(race, 1.0); DROP race; RENAME newrace = race; newid = INPUT(id, 4.); DROP id; RENAME newid = id;
  31. 31. Dates in SAS • Internally, SAS stores a date as the number of days from January 1, 1960. –Negative numbers are dates before –Positive numbers are dates after • Formats can be used to display internal dates in readable form. –DATE9. = 14MAR1978 –MMDDYY10. = 03/14/1978
  32. 32. Dates in SAS • Functions can be used to construct or deconstruct a date – MDY(m, d, y) takes a month, day, and year and combines them into a SAS date – MONTH(), DAY(), YEAR() take a date and return its month, day, or year • Dates can be tricky – make sure that the formats and functions work the way that you want them to.
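      The date functions and formats above can be tried out in a short DATA step. This is only a sketch: the variable names bday, m, d, and y are made up for illustration, and DATA _NULL_ is used so that no data set is created and the results go to the log.

      ```sas
      DATA _NULL_;
        /* Build a SAS date value from month, day, and year. */
        bday = MDY(3, 14, 1978);

        /* Pull the pieces back out of the date value. */
        m = MONTH(bday);
        d = DAY(bday);
        y = YEAR(bday);

        /* Write the internal (unformatted) value, then the same
           value displayed with two different formats. */
        PUT bday=;
        PUT bday= DATE9.;
        PUT bday= MMDDYY10.;
        PUT m= d= y=;
      RUN;
      ```

      Comparing the three PUT lines in the log is a quick way to confirm that a format changes only how a date is displayed, not the number stored.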
  33. 33. Missing Data • Almost all data sets have incomplete data. • Missing character values are blanks (" "). • Missing numeric values are periods (.). • Missing data matter because they reduce the N available for analysis.
  34. 34. Variable name rules (SAS) • Names must be 32 characters or fewer. • Must start with a letter or underscore ( _ ). • Contain only letters, numbers and underscores.
  35. 35. Variable name rules (best practices) • You want variable names to describe useful information about the data • Conform not only to SAS but to other statistical software standards • Can include letters, numbers, and underscores – ONLY! • Case matters in some other packages (SAS itself ignores case in variable names) – so typically use all lowercase • Can be up to 32 characters, but shorter is better
  36. 36. More about SAS programs • Programs consist of SAS statements • They are executed in the order they are entered • Can be UPPER CASE or lower case • Can continue on the next line or be on the same line as other statements • Every SAS statement ends with a semicolon
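      One way to see how missing values affect N is to ask PROC MEANS for both the non-missing and missing counts. In this sketch, the data set name miss is hypothetical, and the three observations reuse the id and momed values of the first three rows of the class example:

      ```sas
      /* Small illustrative data set; the period marks a missing numeric value. */
      DATA miss;
        INPUT id momed;
        DATALINES;
      1101 5
      1102 .
      1103 5
      ;
      RUN;

      /* N counts non-missing values, NMISS counts missing ones. */
      PROC MEANS DATA=miss N NMISS MEAN;
        VAR momed;
      RUN;
      ```

      Here the mean of momed is computed from two observations rather than three, because the missing value is left out of the calculation.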
  37. 37. SAS Building Blocks • DATA step –Introduce and prepare data for use in SAS. • PROC step –Issues procedures to actually perform analysis and generate output.
  38. 38. SAS Global Options • PROC OPTIONS; RUN; • Types of options –Log, output, and procedure options –Data set control options –Error handling options –Reading and writing data options
  39. 39. Run Statement • Run statement tells SAS that the data step or procedure has ended. • End each set of statements with a run statement. • You must submit your programs in order for them to run.
  40. 40. Basic Components of SAS • Every SAS program is constructed using the DATA step and/or PROC statements. • Any combination of DATA steps and/or procedures may be used. • Run statements should be used throughout program.
  41. 41. Example Program LIBNAME example "c:\sample"; DATA test; SET example.sasexample; newrace = PUT(race, 1.0); DROP race; RENAME newrace = race; RUN; PROC FREQ; TABLES race; TITLE 'Check creation of new race variable'; RUN; PROC CONTENTS; TITLE 'Check creation of new race variable'; RUN;
  42. 42. Other Important Statements • Comments • Variable labels • Formats
  43. 43. Comment Syntax • Two styles of comments –One starts with an asterisk ( * ) and ends with a semicolon. –The other starts with a slash asterisk ( /*) and ends with an asterisk slash ( */).
  44. 44. Comments • Explain the program using comments. • Concepts to include: • Program Name • Date program written / revised • Input and output file names (if applicable) • Purpose of the program
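      A sketch of a program header that uses both comment styles and covers the items listed above. The program name, date, and file names are placeholders, not taken from the workshop:

      ```sas
      * Program: session1_example.sas (placeholder name);
      * Written: (date written or revised goes here);
      * Input:   example.sasexample (placeholder);
      * Output:  none;

      /* Purpose: show both comment styles.
         A slash-asterisk comment can span several lines. */

      PROC OPTIONS;   /* this style can also follow a statement */
      RUN;
      ```

      It is safest to avoid semicolons and unmatched quotation marks inside the asterisk style, since the first semicolon ends that kind of comment.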
  45. 45. Variable Labels • By default, SAS uses variable names as labels. • LABEL statement allows you to create your own. • Syntax: LABEL <var name> = 'Label description';
  46. 46. Variable Formats • A format is an instruction that SAS uses to write data values. • Formats can be used to control the appearance of data values or group values together for analysis.
  47. 47. Syntax for Variable Formats • <$> format <w>.<d> where: • $ : indicates a character format; its absence indicates a numeric format. • format: names the format. • w: specifies the format width. • d: specifies an optional number of decimal places. • Formats ALWAYS contain a period (.)
  48. 48. Format and Label Syntax LIBNAME format "C:sgsgsdjljd"; PROC FORMAT; VALUE ss1to5f 1 = 'Strongly disagree' 2 = 'Disagree' 3 = 'Neither Disagree nor Agree' 4 = 'Agree' 5 = 'Strongly Agree'; RUN; DATA example; SET format.example; LABEL ID = 'student ID' Age = 'student age' race = 'student race/ethnicity' gender = 'student gender'; FORMAT gender best8. senseek1 senseek2 senseek3 senseek4 ss1to5f.; RUN;
  49. 49. Debugging Programs • Reading and understanding the SAS log • Debug your programs one step at a time • Common errors: –Missing semicolon –Misspelled commands –Incorrect variable definitions –Using the DATA and PROC steps incorrectly or with the wrong options
  50. 50. Example: Problems with Programs LIBNAME “c:temp”; DATA one SET schmid.data; newrace = PUT(race, 1.0); drop race; rename newrace = race; RUN; PROC FREQ; VAR race; RUN; PROC MEANS n mean std; VAR race; RUN;
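      One possible repaired version of the program above, offered as a sketch rather than the workshop's own answer. It assumes that the intended libref is schmid (the name used in the SET statement), that the folder is c:\temp, that race is numeric in the input data set, and that a numeric variable named age exists:

      ```sas
      /* The LIBNAME statement was missing its libref. */
      LIBNAME schmid 'c:\temp';

      /* The DATA statement needed a closing semicolon. */
      DATA one;
        SET schmid.data;
        newrace = PUT(race, 1.0);
        DROP race;
        RENAME newrace = race;
      RUN;

      /* PROC FREQ takes a TABLES statement, not VAR. */
      PROC FREQ DATA=one;
        TABLES race;
      RUN;

      /* race is now character, so PROC MEANS cannot summarize it;
         a numeric variable (age, assumed here) is used instead. */
      PROC MEANS DATA=one N MEAN STD;
        VAR age;
      RUN;
      ```

      Running the original one step at a time and reading the log after each step should surface these problems in roughly this order.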
  51. 51. SAS Resources: Helpful Books and Websites
  52. 52. SAS Websites of Note • SAS Online Documentation http://support.sas.com/documentation/ • UCLA SAS Portal http://www.ats.ucla.edu/stat/sas/
  53. 53. Recommended Books on SAS • Andrews, F. et al. (1998). Selecting Statistical Techniques for Social Science Data: A Guide for SAS. Cary, NC: SAS Institute. • Delwiche, L. & Slaughter, S (2012). The Little SAS Book, 5th Edition. Cary, NC: SAS Publishing. • Hatcher, L. & Stepanski, E. (1994). A Step-by-Step Approach to Using the SAS System for Univariate and Multivariate Statistics. Cary, NC: SAS Publishing. • Cody, R. & Smith, J. (2005). Applied Statistics and the SAS Programming Language, 5th edition. Upper Saddle River, NJ: Prentice Hall. • Cody, R. (2004). SAS Functions by Example. Cary, NC: SAS Publishing.
