Workshop SAS FOR DATA MANAGEMENT Session 2


Duke SSRI EHDi Data Management Workshop, by Lorrie Schmid, April 2015

This workshop, conducted across three sessions, offers an overview of the SAS programming language, focusing on data management activities. Session 1 is a general overview of SAS with a focus on major SAS components (Program Editor, Log and Output), core concepts of SAS programming (DATA and PROC), and issues of importing/exporting, reading, and writing SAS datasets. Session 2 focuses on data modification, including variable creation and variable recoding, as well as adding to and subsetting from particular datasets. Key SAS statements described include: ATTRIB, SET, WHERE, IF-THEN, MERGE. Session 3 focuses on data analysis, specifically descriptive analyses typically used in data management. Key SAS statements described include: PROC CONTENTS, PROC MEANS, and PROC FREQ. Together, these sessions allow researchers to learn basic data management processes using the SAS statistical system.


  1. Session II: Modifying and Managing SAS data
  2. Introduction to Data Modification
      • Problem: Data is not in the form needed
        – Data manipulation: SAS statements that transform the data set in some way.
        – Data subsetting: SAS statements that specify a subgroup.
        – Concatenation: adding new cases / rows to a dataset.
        – Merging: adding new variables / columns to a dataset.
  3. Session 2 Outline
      • Creating and redefining variables
        – Variable and Mathematical expressions
        – SAS functions
        – IF-THEN statement, ELSE and DO
      • Subsetting data
        – IF-THEN statement
        – WHERE statement
      • Concatenation
      • Merge / Matched merge
  4. Data Modification – Best Practices
      • Do not overwrite the original data set or variables.
      • Use arithmetic operations and functions to create new variables.
      • You can also use IF-THEN (IF-THEN/ELSE) statements to create new variables.
  5. Creating new variables
      • Very easy to create new variables
      • Syntax:
          variable-name = expression;
      • Example:
          Schmid = 42;
      • Naming convention for variables.
  6. Creating variables – ATTRIB Statement
      • ATTRIB statement
          LENGTH = <$> length
          FORMAT = format
          LABEL = 'label'
      • Example (straight quotes are required; a numeric length below 3 is rejected on Windows and UNIX):
          ATTRIB schmid LENGTH = 3 LABEL = "constant for schmid";
  7. Different Variable Expressions
      • Use the same form: variable-name = expression;
      • Numeric constant:
          GRADE = 7;
      • Character constant:
          GRADE = 'seventh';
      • Variable duplicated:
          GRADE = VAR1;
  8. Different Mathematical Expressions
      • Addition:
          VARADD = OLDVAR + 10;
      • Subtraction:
          VARSUB = OLDVAR - ANOTHERVAR;
      • Multiplication:
          VARMULT = OLDVAR * 7;
      • Division:
          VARDIV = OLDVAR / VARSUB;
  9. Execution of Mathematical Expressions
      • SAS follows the standard mathematical rules of precedence.
        – Exponents first
        – Then multiplication and division (equal priority, evaluated left to right)
        – Then addition and subtraction (equal priority, evaluated left to right)
      • You can use parentheses to override the order.
      • Also, parentheses help others read your programs.
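      The precedence rules above can be checked with a small DATA step. This is only a sketch: the data set name `precedence` and the values 10 and 4 are made up for illustration, while the variable names echo the earlier slides.

```sas
/* Illustrative values only; nothing here comes from the workshop data */
DATA precedence;
    oldvar     = 10;
    anothervar = 4;

    /* Multiplication is evaluated before addition: 10 + (4 * 2) */
    noparens   = oldvar + anothervar * 2;      /* 18 */

    /* Parentheses force the addition first: (10 + 4) * 2 */
    withparens = (oldvar + anothervar) * 2;    /* 28 */

    /* Exponentiation is evaluated before multiplication: 2 * (3 ** 2) */
    power      = 2 * 3 ** 2;                   /* 18 */
RUN;

PROC PRINT DATA = precedence;
RUN;
```

      Writing the parentheses even when they are not strictly needed makes the intended order obvious to the next reader.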
  10. SAS Functions
      • Sometimes a simple equation doesn’t give you what you want.
      • Functions are prewritten SAS routines that you call from your own code.
      • Syntax: function-name (argument, argument, ...)
  11. Multitude of Functions
      • Arithmetic: abs, max, min, mod, sqrt, sign
      • Array: dim, hbound, lbound
      • Character: compress, index, scan, substr
      • Date & Time: date, intck, mdy, timepart
      • Mathematical: log, ordinal
      • Probability: poisson, probchi, probnorm
      • Random Number: normal, ranuni
      • Sample Statistic: mean, n, std, num
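      A few of the functions listed above, shown in one DATA step. This is a sketch rather than workshop code: the data set name `fxdemo`, the character value, and the two dates are hypothetical inputs chosen only to make the results easy to check by hand.

```sas
/* Hypothetical inputs to show a handful of the listed functions */
DATA fxdemo;
    name   = 'Schmid, Lorrie';

    /* Character functions */
    last   = SCAN(name, 1, ',');          /* first comma-delimited word: Schmid */
    init   = SUBSTR(name, 1, 1);          /* one character from position 1: S   */
    pos    = INDEX(name, ',');            /* position of the first comma: 7     */
    nosp   = COMPRESS(name);              /* same string with blanks removed    */

    /* Date functions: MDY builds a SAS date, INTCK counts interval boundaries */
    start  = MDY(2, 25, 2015);
    finish = MDY(3, 9, 2015);
    days   = INTCK('day', start, finish); /* 12 */

    /* Arithmetic functions can be nested */
    root   = SQRT(ABS(-16));              /* 4 */

    FORMAT start finish DATE9.;
RUN;

PROC PRINT DATA = fxdemo;
RUN;
```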
  12. Two function examples: SUM and MEAN
      LIBNAME sas " ";
      DATA examplefx;
        SET sas.example;
        ATTRIB senseek label = "sensation seeking - sum fx";
        senseek = sum (senseek1, senseek2, senseek3, senseek4);
        ATTRIB senseeksum label = "sensation seeking - sum";
        senseeksum = senseek1 + senseek2 + senseek3 + senseek4;
        ATTRIB senseekm label = "sensation seeking - mean fx";
        senseekm = mean (senseek1, senseek2, senseek3, senseek4);
        ATTRIB senseekmean label = "sensation seeking - mean";
        senseekmean = (senseek1 + senseek2 + senseek3 + senseek4)/4;
        ATTRIB senseekmeansum label = "another mean - using sum";
        senseekmeansum = senseeksum/4;
      RUN;
  13. Output – Functions: SUM and MEAN
      ID    senseek1  senseek2  senseek3  senseek4  senseek  senseeksum  senseekm  senseekmean  senseekmeansum
      1101  4         5         3         3         15       15          3.75      3.75         3.75
      1102  5         3         1         4         13       13          3.25      3.25         3.25
      1103  4         1         1         3         9        9           2.25      2.25         2.25
      1104  4         3         2         3         12       12          3         3            3
      1105  5         2         3         4         14       14          3.5       3.5          3.5
      1106  2         2         3         3         10       10          2.5       2.5          2.5
      1107  3         3         4         3         13       13          3.25      3.25         3.25
      1108  4         4         2         4         14       14          3.5       3.5          3.5
      1109  3         4         4         3         14       14          3.5       3.5          3.5
      1110  5         5         5         5         20       20          5         5            5
      1111  5         1         5         5         16       16          4         4            4
      1112  1         2         5         3         11       11          2.75      2.75         2.75
      1113  1         1         1         1         4        4           1         1            1
      1114  3         4         3         3         13       13          3.25      3.25         3.25
      1115  5         5         5         5         20       20          5         5            5
      1116  3         1         1         5         10       10          2.5       2.5          2.5
      1117  5         5         5         4         19       19          4.75      4.75         4.75
      1118  5         4         4         4         17       17          4.25      4.25         4.25
      1119  4         5         5         5         19       19          4.75      4.75         4.75
      1120  1         1         1         1         4        4           1         1            1
      1121  .         .         .         .         .        .           .         .            .
      1122  4         4         5         5         18       18          4.5       4.5          4.5
  14. Review: Missing data
      • Almost all datasets have some set of incomplete data.
      • Typically, missing numeric data are signified by a period (.)
      • Know the amount of missing data that you have within and across variables before you create new variables.
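  15. Missing data and mathematical expressions
      • Any missing item in an arithmetic expression (+, -, *, /) makes the created variable missing.
      • Sample-statistic functions such as SUM and MEAN instead use only the nonmissing arguments; they return missing only when every argument is missing.
      • The SAS log can alert you: it prints a note when missing values were generated.
      • Imputing missing values can be done.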
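      The difference between the `+` operator and the SUM and MEAN functions shows up as soon as one item is missing. The sketch below assumes a small inline data set (`missingdemo`, a made-up name) with three rows patterned on values that appear in the slides; the rule "score only if at most one item is missing" is an illustrative choice, not the workshop's.

```sas
/* Rows patterned on the slide data: complete, partly missing, all missing */
DATA missingdemo;
    INPUT id senseek1-senseek4;

    /* + propagates missing: one missing item makes the total missing */
    total_plus = senseek1 + senseek2 + senseek3 + senseek4;

    /* SUM and MEAN use only the nonmissing arguments */
    total_fx = SUM(OF senseek1-senseek4);
    mean_fx  = MEAN(OF senseek1-senseek4);

    /* NMISS counts how many arguments are missing */
    n_missing = NMISS(OF senseek1-senseek4);

    /* Illustrative rule: keep the mean only when at most one item is missing */
    IF n_missing <= 1 THEN mean_strict = mean_fx;

    DATALINES;
1101 4 5 3 3
1103 1 . . .
1121 . . . .
;
RUN;

PROC PRINT DATA = missingdemo;
RUN;
```

      For the second row `total_plus` is missing while `total_fx` is 1, which is why it pays to decide up front how many answered items a score needs.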
  16. IF-THEN STATEMENTS
      • You sometimes want to apply a statement to some observations but not others… or create a new variable based on some expression.
      • Syntax: IF condition THEN action;
      • Example: IF gender = 4 THEN delete;
  17. Comparison Operators
      Symbolic  Mnemonic  Meaning
      =         EQ        Equal
      ^=        NE        Not Equal
      >         GT        Greater Than
      <         LT        Less Than
      >=        GE        Greater Than or Equal
      <=        LE        Less Than or Equal
      &         AND       All comparisons must be true
      !         OR        At least one comparison must be true
  18. Example: IF-THEN statement
      *** Create a new variable based on values in the old variable ***;
      IF gender = 1 THEN genfmt = 'm';
      IF gender = 0 THEN genfmt = 'f';
      *** Deletes data based on values in a variable***;
      IF race = 1 THEN delete;
      *** Sets items above 4 to 10 in senseek variable ***;
      *** Could use GE or >= here ***;
      IF senseek1 GE 4 THEN senseek1 = 10;
      RUN;
  19. OUTPUT – IF – THEN EXAMPLE
      gender  genfmt  Frequency  Percent  Cumulative Frequency  Cumulative Percent
      0       f       48         53.93    48                    53.93
      1       m       41         46.07    89                    100.00

      race  Frequency  Percent
      2     68         76.40
      3     12         13.48
      4     5          5.62
      5     1          1.12
      6     3          3.37

      Analysis Variable : senseek1
      N   Mean   Minimum  Maximum
      86  7.546  1.000    10.000
  20. IF-THEN / ELSE Statement
      • More efficient IF-THEN statement for grouping observations.
      • Syntax:
          IF condition THEN action;
          ELSE IF condition THEN action;
          ELSE IF condition THEN action;
          ELSE action;
  21. IF-THEN / ELSE Statement Program
      *** This shows an example of a new variable - momedcode ***
      *** created by using an IF-THEN/ELSE set of statements ***;
      *** Set the length first so the longest value is not truncated to 5 characters ***;
      LENGTH momedcode $ 9;
      IF momed = 1 THEN momedcode = 'No HS';
      ELSE IF momed = 2 THEN momedcode = 'HS';
      ELSE IF momed = 3 THEN momedcode = 'Assoc';
      ELSE IF momed = 4 THEN momedcode = 'BA/BS';
      ELSE IF momed = 5 THEN momedcode = 'MA/MS';
      ELSE IF momed = 6 THEN momedcode = 'PhD JD MD';
      RUN;
      *** Note, to gain more efficiency, I could have found the highest frequency ***
      *** of momed first and used that in the first IF-THEN statement, then ***
      *** found the second highest frequency, and so on ***;
      PROC FREQ;
        TABLES momed momedcode momed*momedcode/LIST;
      RUN;
  22. IF-THEN/ELSE RESULTS
      momed  Frequency  Percent
      1      8          5.16
      2      24         15.48
      3      18         11.61
      4      67         43.23
      5      38         24.52

      momedcode  Frequency  Percent
      Assoc      18         11.61
      BA/BS      67         43.23
      HS         24         15.48
      MA/MS      38         24.52
      No HS      8          5.16

      momed  momedcode  Frequency  Percent
      1      No HS      8          5.16
      2      HS         24         15.48
      3      Assoc      18         11.61
      4      BA/BS      67         43.23
      5      MA/MS      38         24.52
  23. IF-THEN, DO
      • A DO group allows more than one action to be completed when a condition is true.
      • Syntax:
          IF condition THEN DO;
            action;
            action;
            (…)
          END;
  24. Example: IF-THEN, DO
      *** Creation of several new variables based on who a student lives with ***
      *** using an IF-THEN DO group ***;
      IF livewmom = 1 AND livewdad = 1 THEN DO;
        totfam = 3;
        tradfam = 'yes';
        nontrad = 'no';
        *** You could add as many actions here as you wanted ***;
      END;
      RUN;
      PROC FREQ;
        TABLES tradfam nontrad/missing;
      RUN;
      PROC MEANS;
        VAR totfam;
      RUN;
  25. IF-THEN, DO RESULTS
      (the first row of each table is the missing / blank category)

      tradfam  Frequency  Percent  Cumulative Frequency  Cumulative Percent
               92         45.32    92                    45.32
      y        111        54.68    203                   100.00

      nontrad  Frequency  Percent  Cumulative Frequency  Cumulative Percent
               92         45.32    92                    45.32
      n        111        54.68    203                   100.00

      totfam   Frequency  Percent  Cumulative Frequency  Cumulative Percent
               92         45.32    92                    45.32
      3        111        54.68    203                   100.00
  26. Reducing or Subsetting a data set
      • Occasionally, you may want to reduce the size of a data set by:
        – Reducing the number of cases (rows) using the WHERE statement
        – Reducing the number of variables (columns) using the DROP and KEEP statements.
  27. WHERE Statement
      • WHERE statements are typically used to reduce the number of usable data cases
      • WHERE statement can also be very useful when doing data checks.
        – Missing values
        – Out of range values
        – Logic checks
      • Syntax: WHERE condition;
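      The three kinds of data checks listed above can each be written as a WHERE statement inside a PROC step, so the data set itself is left untouched. This is a sketch on a made-up data set (`checks`); the valid ranges used here (items scored 1 to 5, gender coded 0 or 1) are assumptions based on the values shown in the example output.

```sas
/* Hypothetical rows: one clean, one with a missing item, one with bad codes */
DATA checks;
    INPUT id gender senseek1;
    DATALINES;
1101 1 4
1102 0 .
1103 4 9
;
RUN;

/* Missing values */
PROC PRINT DATA = checks;
    WHERE senseek1 IS MISSING;
RUN;

/* Out-of-range values; missing is excluded explicitly because a */
/* missing numeric value compares as smaller than any number     */
PROC PRINT DATA = checks;
    WHERE senseek1 IS NOT MISSING AND (senseek1 < 1 OR senseek1 > 5);
RUN;

/* Logic check: gender is assumed to be coded 0 or 1 only */
PROC PRINT DATA = checks;
    WHERE gender NOT IN (0, 1);
RUN;
```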
  28. KEEP and DROP Options
      • KEEP and DROP can be used as data set options on the DATA statement, or as separate statements after the DATA and SET statements
      • Example Syntax:
          DATA one (KEEP = <variable list>);
          DATA two (DROP = <variable list>);
          DATA one; SET data; KEEP var1; DROP var2;
  29. EXAMPLE: WHERE and DROP
      *** This program looks at a couple of different ways for reducing the size of the data set through the ***
      *** WHERE statement and the DROP statement. WHERE statements can reduce the number of ***
      *** observations (rows) while DROP statements can reduce the number of variables (columns) ***;
      *** This example subsets the data set by only looking at females (gender = 2) ***;
      DATA reduce1;
        SET reduce.example;
        WHERE gender = 2;
      RUN;
      *** This example subsets the data by only looking at white females (race = 2) and (gender = 2) ***;
      DATA reduce2;
        SET reduce.example;
        WHERE ((gender = 2) and (race = 2));
      RUN;
      *** This example shows how to drop variables not being used from the dataset ***;
      DATA reduce3 (DROP = momed momwork daded dadwork);
        SET reduce.example;
      RUN;
  30. Revisiting DATA step processing
      • Reading data
        – Opening and bringing data in for processing
        – Most data management and all analysis can be done just by reading data sets
      • Combining data
        – Adding cases: concatenation
        – Adding variables: merging
      • Modifying data
        – Using MODIFY statement
  31. Before Concatenating or Merging…
      • Know your data sets
      • Identify and fix sources of common problems
      • Ensure observations are in order
      • Test, test, test
  32. CONCATENATION
      • Using the SET statement to concatenate.
      • # of Observations in new data set = sum of Observations in old data sets.
      • Missing variables and what causes them.
      • Example Syntax:
          DATA example;
            SET <name of dataset(s)>;
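      One way to see where missing values come from after concatenation is to tag each row with its source data set. The sketch below uses the IN= data set option; the data sets `wave1` and `wave2`, their values, and the variable `source` are hypothetical and exist only to make the example self-contained.

```sas
/* Two small made-up data sets; wave2 has no senseek4 variable */
DATA wave1;
    INPUT id senseek1 senseek4;
    DATALINES;
1101 4 3
1102 5 4
;
RUN;

DATA wave2;
    INPUT id senseek1;
    DATALINES;
2101 2
2102 3
2103 1
;
RUN;

DATA all;
    /* Rows of wave1 come first, then rows of wave2: 2 + 3 = 5 observations */
    SET wave1 (IN = in1) wave2 (IN = in2);

    /* IN= variables are temporary, so copy them into a kept variable */
    IF in1 THEN source = 'wave1';
    ELSE IF in2 THEN source = 'wave2';
RUN;

/* senseek4 is missing for every wave2 row because wave2 never had it */
PROC FREQ DATA = all;
    TABLES source * senseek4 / LIST MISSING;
RUN;
```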
  33. “Perfect” Concatenation Example
      DATA all;
        SET sas.example sas.new;
      RUN;
      PROC CONTENTS DATA = all;
      RUN;
  34. Concatenation Example Output
      Data Set Name  Member Type  Engine  Observations  Variables  Indexes  Observation Length  Created
      SAS.EXAMPLE    DATA         V9      203           17         0        104                 Wed, Feb 25, 2015 11:57:34 AM
      SAS.NEW        DATA         V9      191           17         0        104                 Monday, March 09, 2015 10:39:41 AM
      WORK.ALL       DATA         V9      394           17         0        104                 Monday, March 09, 2015 10:42:57 AM
  35. Concatenation Example Output
      Alphabetic List of Variables and Attributes
      #   Variable           Type  Len  Format  Informat  Label
      1   ID                 Num   8                      ID
      5   age                Num   8                      age
      2   date               Num   8    DATE9.  DATE9.    date
      4   gender             Num   8                      gender
      14  livewdad           Num   8                      livewdad
      13  livewmom           Num   8                      livewmom
      12  momcode            Char  1    $1.     $1.       momcode
      11  momed              Num   8                      momed
      17  nontrad            Char  1    $1.     $1.       nontrad
      3   race               Num   8                      race
      6   sensation_seeking  Char  1    $1.     $1.       sensation seeking
      7   senseek1           Num   8                      senseek1
      8   senseek2           Num   8                      senseek2
      9   senseek3           Num   8                      senseek3
      10  senseek4           Num   8                      senseek4
      15  totfam             Char  1    $1.     $1.       totfam
      16  tradfam            Char  1    $1.     $1.       tradfam
  36. Less Ideal Concatenation
      DATA all;
        SET sas.example sas.newb;
      RUN;
      PROC CONTENTS DATA = all;
      RUN;
  37. Concatenation Example Output
      Data Set Name  Member Type  Engine  Observations  Variables  Indexes  Observation Length  Created
      SAS.EXAMPLE    DATA         V9      203           17         0        104                 Wed, Feb 25, 2015 11:57:34 AM
      SAS.NEWB       DATA         V9      168           14         0        104                 Monday, March 09, 2015 10:53:24 AM
      WORK.ALL       DATA         V9      371           18         0        112                 Monday, March 09, 2015 10:53:47 AM
  38. Concatenation Example Output
      #   Variable           In Example  In NewB  Label
      1   ID                 YES         YES      ID
      5   age                YES         YES      age
      2   date               YES         YES      date
      4   gender             YES         YES      gender
      14  livewdad           YES         YES      livewdad
      13  livewmom           YES         YES      livewmom
      18  livewsib                       YES      livewsib
      12  momcode            YES         YES      momcode
      11  momed              YES         YES      momed
      17  nontrad            YES                  nontrad
      3   race               YES         YES      race
      6   sensation_seeking  YES         YES      sensation seeking
      7   senseek1           YES         YES      senseek1
      8   senseek2           YES         YES      senseek2
      9   senseek3           YES         YES      senseek3
      10  senseek4           YES                  senseek4
      15  totfam             YES                  totfam
      16  tradfam            YES                  tradfam
  39. Concatenation Example Output
      senseek4  Frequency  Percent  Cumulative Frequency  Cumulative Percent
      .         213        57.41    213                   57.41
      1         28         7.55     241                   64.96
      2         11         2.96     252                   67.92
      3         47         12.67    299                   80.59
      4         32         8.63     331                   89.22
      5         40         10.78    371                   100.00
  40. MERGE statement
      – Combines two or more data sets.
      – Often used in conjunction with the BY statement.
      – Leaving off the BY statement can lead to unexpected and possibly disastrous results.
      – WARNING! If a variable exists in both data sets, the output data set will contain the value of the variable from the rightmost data set in the MERGE statement.
  41. Matched Merge – BY statement
      • Almost always used in conjunction with the BY statement.
      • When using a matched merge, the variable you want to match on is identified.
      • Each data set must be sorted by the match variable before the merge.
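      When the two data sets do not contain the same IDs, it helps to record which side each merged row came from. The sketch below adds IN= flags to a matched merge; the data sets `demo` and `senseek` here are tiny made-up stand-ins (not the workshop files), and `has_demo` / `has_ss` are hypothetical flag names.

```sas
/* Made-up stand-ins: three demographic rows, two sensation-seeking rows */
DATA demo;
    INPUT id age;
    DATALINES;
1101 12
1102 13
1103 12
;
RUN;

DATA senseek;
    INPUT id senseek1;
    DATALINES;
1101 4
1103 4
;
RUN;

/* Both inputs must be sorted by the match variable */
PROC SORT DATA = demo;
    BY id;
RUN;

PROC SORT DATA = senseek;
    BY id;
RUN;

DATA merged;
    MERGE demo (IN = indemo) senseek (IN = inss);
    BY id;
    /* IN= variables are temporary (1 = row present in that data set) */
    has_demo = indemo;
    has_ss   = inss;
RUN;

/* Cross-tabulate the flags to count matched and unmatched IDs */
PROC FREQ DATA = merged;
    TABLES has_demo * has_ss / LIST MISSING;
RUN;
```

      In this sketch ID 1102 has demographics but no items, so its `senseek1` is missing in `merged`; the frequency table makes that count visible before any analysis.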
  42. “Perfect” Merging Example
      *** This program merges two files – demographics and sensation seeking items ***
      *** to create the example dataset using the ID variable to do a matched merge ***;
      PROC SORT data = sas.senseek;
        BY id;
      RUN;
      PROC SORT data = sas.demo;
        BY id;
      RUN;
      DATA example;
        MERGE sas.senseek sas.demo;
        BY id;
      RUN;
      PROC CONTENTS data = sas.senseek;
      PROC CONTENTS data = sas.demo;
      PROC CONTENTS data = example;
      RUN;
  43. Inventory - Demographics
      Data Set Name  Member Type  Engine  Observations  Variables  Indexes  Observation Length  Created
      SAS.DEMO       DATA         V9      203           12         0        72                  Monday, March 09, 2015 11:18:42 AM

      Alphabetic List of Variables and Attributes
      #   Variable  Type  Len  Format  Informat  Label
      1   ID        Num   8                      ID
      5   age       Num   8                      age
      2   date      Num   8    DATE9.  DATE9.    date
      4   gender    Num   8                      gender
      10  livewdad  Num   8                      livewdad
      9   livewmom  Num   8                      livewmom
      8   momcode   Char  1    $1.     $1.       momcode
      7   momed     Num   8                      momed
      13  nontrad   Char  1    $1.     $1.       nontrad
      3   race      Num   8                      race
      11  totfam    Char  1    $1.     $1.       totfam
      12  tradfam   Char  1    $1.     $1.       tradfam
  44. Inventory – Sensation Seeking
      Data Set Name  Member Type  Engine  Observations  Variables  Indexes  Observation Length  Created
      SAS.SENSEEK    DATA         V9      168           6          0        48                  Monday, March 09, 2015 11:18:42 AM

      Alphabetic List of Variables and Attributes
      #  Variable           Type  Len  Format  Informat  Label
      1  ID                 Num   8                      ID
      9  sensation_seeking  Char  1    $1.     $1.       sensation seeking
      2  senseek1           Num   8                      senseek1
      3  senseek2           Num   8                      senseek2
      4  senseek3           Num   8                      senseek3
      5  senseek4           Num   8                      senseek4
  45. Inventory – Merged Data Sets
      Data Set Name  Member Type  Engine  Observations  Variables  Indexes  Observation Length  Created
      WORK.EXAMPLE   DATA         V9      203           17         0        104                 Monday, March 09, 2015 11:18:42 AM

      Alphabetic List of Variables and Attributes
      #   Variable           Type  Len  Format  Informat  Label
      1   ID                 Num   8                      ID
      13  age                Num   8                      age
      10  date               Num   8    DATE9.  DATE9.    date
      12  gender             Num   8                      gender
      17  livewdad           Num   8                      livewdad
      16  livewmom           Num   8                      livewmom
      15  momcode            Char  1    $1.     $1.       momcode
      14  momed              Num   8                      momed
      9   nontrad            Char  1    $1.     $1.       nontrad
      11  race               Num   8                      race
      2   sensation_seeking  Char  1    $1.     $1.       sensation seeking
      3   senseek1           Num   8                      senseek1
      4   senseek2           Num   8                      senseek2
      5   senseek3           Num   8                      senseek3
      6   senseek4           Num   8                      senseek4
      7   totfam             Char  1    $1.     $1.       totfam
      8   tradfam            Char  1    $1.     $1.       tradfam
  46. Overwriting in Merges
      • If the two data sets have variables with the same name, the value of the variable in the data set named last will overwrite the value of the variable in the data set named first.
      • Option: rename those variables as part of the merge.
      • Syntax: dataset (RENAME = (oldvariable = newvariable)), a data set option written right after the data set name in the MERGE or SET statement of a DATA step
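      A minimal sketch of the RENAME= data set option protecting a same-named variable during a merge. The data sets `visit1` and `visit2` and the names `age_v1` / `age_v2` are hypothetical; both inputs are already in ID order, so no PROC SORT is shown.

```sas
/* Two made-up data sets that both carry a variable called age */
DATA visit1;
    INPUT id age;
    DATALINES;
1101 12
1102 13
;
RUN;

DATA visit2;
    INPUT id age;
    DATALINES;
1101 13
1102 14
;
RUN;

/* Without RENAME=, age from visit2 would overwrite age from visit1. */
/* Renaming on the way in keeps both values side by side.            */
DATA both;
    MERGE visit1 (RENAME = (age = age_v1))
          visit2 (RENAME = (age = age_v2));
    BY id;
RUN;

PROC PRINT DATA = both;
RUN;
```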
  47. Key Takeaways for Concatenation and Merging
      • Know your data
      • Plan your transformations
      • Always assume you did it wrong
      • Test, test, test
  48. Transforming Longitudinal Data
      • Different researchers and different analysis types expect data to be “wide” or “long”
      • What does this mean?
      • Wide: one subject across a row, with time elements as part of the variables
      • Long: each subject spans multiple rows, with time as a variable
  49. Example: Long data
      ID    Time  Senseek1  Senseek2  Senseek3  Senseek4
      1101  1     4         5         3         3
      1101  2     1         4         5         5
      1101  3     1         4         1         3
      1101  4     1         4         4         5
      1102  1     5         3         1         4
      1102  2     1         1         1         1
      1102  3     1         4         2         3
      1102  4     1         3         3         3
      1103  1     4         1         1         3
      1103  2     1         .         .         .
      1103  3     1         4         3         3
      1103  4     1         3         2         2
  50. Example: Wide Data
      ID    SS1_T1 SS2_T1 SS3_T1 SS4_T1 SS1_T2 SS2_T2 SS3_T2 SS4_T2 SS1_T3 SS2_T3 SS3_T3 SS4_T3 SS1_T4 SS2_T4 SS3_T4 SS4_T4
      1101  4      5      3      3      1      4      5      5      1      4      1      3      1      4      4      5
      1102  5      3      1      4      1      1      1      1      1      4      2      3      1      3      3      3
      1103  4      1      1      3      1      .      .      .      1      4      3      3      1      3      2      2
      1104  4      3      2      3      1      4      4      5      1      4      4      2      1      .      .      .
  51. Transforming data from long to wide: PROC TRANSPOSE
      • This is not something that you want to hand code!
      • PROC TRANSPOSE can help change files from long to wide and back again.
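      The "and back again" direction is not shown in the slides, so here is a hedged sketch of wide to long for a single item. It assumes a wide data set shaped like the slides' `wide1` (columns `ss1_t1` to `ss1_t4`); the two inline rows, the output name `long1`, and the SCAN-based recovery of the time point are illustrative choices.

```sas
/* Stand-in for wide1: one row per ID, one column per time point */
DATA wide1;
    INPUT id ss1_t1-ss1_t4;
    DATALINES;
1101 4 1 1 1
1102 5 1 1 1
;
RUN;

/* With BY and VAR, each listed column becomes its own row per ID. */
/* The values land in COL1 and the old column name in _NAME_.      */
PROC TRANSPOSE DATA = wide1 OUT = long1 (RENAME = (COL1 = senseek1));
    BY id;
    VAR ss1_t1-ss1_t4;
RUN;

DATA long1;
    SET long1;
    /* _NAME_ looks like ss1_t3; splitting on "_" and "t" leaves */
    /* the time point as the second piece                        */
    time = INPUT(SCAN(_NAME_, 2, '_t'), 8.);
    DROP _NAME_;
RUN;

PROC PRINT DATA = long1;
RUN;
```

      Repeating this for the other three items and merging the results by `id` and `time` would rebuild the full long layout.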
  52. PROC TRANSPOSE syntax
      PROC TRANSPOSE DATA=sas.transpose OUT=wide1 PREFIX=ss1_t;
        BY ID ;
        ID time;
        VAR senseek1;
      RUN;
      …
      PROC TRANSPOSE DATA=sas.transpose OUT=wide4 PREFIX=ss4_t;
        BY ID ;
        ID time;
        VAR senseek4;
      RUN;
      DATA wide;
        MERGE wide1 wide2 wide3 wide4;
        BY ID;
      RUN;
  53. PROC TRANSPOSE Output
      ID    ss1_t1 ss1_t2 ss1_t3 ss1_t4 ss2_t1 ss2_t2 ss2_t3 ss2_t4 ss3_t1 ss3_t2 ss3_t3 ss3_t4 ss4_t1 ss4_t2 ss4_t3 ss4_t4
      1101  4      1      1      1      5      4      4      4      3      5      1      4      3      5      3      5
      1102  5      1      1      1      3      1      4      3      1      1      2      3      4      1      3      3
      1103  4      1      1      1      1      .      4      3      1      .      3      2      3      .      3      2
      1104  4      1      1      1      3      4      4      .      2      4      4      .      3      5      2      .
      1105  5      1      1      1      2      3      3      3      3      3      3      4      4      3      2      1
      1106  2      1      1      1      2      3      4      4      3      2      1      3      3      2      5      3
  54. PROC SQL
      • PROC SQL brings SQL functionality into SAS; those who already use SQL can easily use PROC SQL
      • PROC SQL can be used in place of some DATA and PROC steps
      • Some processes are more efficient… if you are doing them correctly
      • Especially useful when dealing with BIG data
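      As one illustration of PROC SQL standing in for a DATA step, the sketch below performs the same kind of ID match as a matched merge using a left join. The two inline data sets and the output name `example_sql` are hypothetical; unlike MERGE, the join does not require the inputs to be sorted first.

```sas
/* Made-up stand-ins for the demographics and item files */
DATA demo;
    INPUT id age;
    DATALINES;
1103 12
1101 12
1102 13
;
RUN;

DATA senseek;
    INPUT id senseek1;
    DATALINES;
1101 4
1103 4
;
RUN;

PROC SQL;
    /* LEFT JOIN keeps every demographics row; senseek1 is missing */
    /* for IDs that have no row in senseek                         */
    CREATE TABLE example_sql AS
    SELECT d.id, d.age, s.senseek1
    FROM demo AS d
    LEFT JOIN senseek AS s
        ON d.id = s.id
    ORDER BY d.id;
QUIT;

PROC PRINT DATA = example_sql;
RUN;
```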
  55. Types of SAS Errors
      • Four major types of errors
        – Syntax errors
        – Execution-time errors
        – Data errors
        – Semantic errors
  56. How to Troubleshoot Errors
      • Unit testing -> do not submit multiple commands without testing
      • Read your log -> and start from the top
        – Multiple errors can often be fixed by solving one problem
      • “Google it”
      • SAS / SAS User Group International: http://support.sas.com/events/sasglobalforum/previous/online.html
