2. Agenda
Introduction
Read data from raw data file
Formatting the data
Data Manipulation and statistical analysis
Combining and subsetting data
Processing the data iteratively
Report Production
3. Turning data into information
Data -> DATA Step -> SAS Data Sets -> PROC Steps -> Information
4. Design of SAS System
MultiVendor Architecture
90% independent, 10% dependent
PC, Workstation, Servers/Midrange, Mainframe, Super Computer
6. SAS Program
A SAS program is a sequence of steps that the
user submits for execution.
DATA steps are typically used to create
SAS data sets.
PROC steps are typically used to process
SAS data sets (that is, generate reports
and graphs, edit data, sort data).
Raw Data -> DATA Step -> SAS Data Set -> PROC Step -> Report
7. SAS Syntax Rules
SAS statements
• usually begin with an identifying keyword
• always end with a semicolon.
data work.staff;
infile 'emplist.dat';
input LastName $ 1-20 FirstName $ 21-30
JobTitle $ 36-43 Salary 54-59;
run;
proc print data=work.staff;
run;
proc means data=work.staff mean max;
class JobTitle;
var Salary;
run;
8. SAS Syntax Rules
SAS statements are free-format.
They can begin and end in any column.
One or more blanks or special characters can be used to
separate words.
A single statement can span multiple lines.
Several statements can be on the same line.
Unconventional Spacing
data work.staff;
infile 'emplist.dat';
input LastName $ 1-20 FirstName $ 21-30
JobTitle $ 36-43 Salary 54-59;
run;
proc means data=work.staff mean max;
class JobTitle; var Salary;run;
12. SAS Comments
Type /* comment */ for a comment that can span
multiple lines
Type * to begin a single-statement comment, and end it
with ;
/* create dataset work.staff */
data work.staff;
infile 'emplist.dat';
input LastName $ 1-20 FirstName $ 21-30
JobTitle $ 36-43 Salary 54-59;
run;
13. SAS Data Library
When you invoke SAS, you automatically
have access to a temporary and a permanent
SAS data library.
WORK - temporary library
SASUSER - permanent library
You can create and access your
own permanent libraries.
IA - permanent library
14. Create SAS Data Library
By Statement
libname IA 'C:SAS Institute';
By wizard
16. What is the Import Wizard ?
The Import Wizard is a point-and-click graphical
interface that enables you to create a SAS data
set from several types of external files including
– dBASE files (*.DBF)
– Excel spreadsheets (*.XLS)
– Microsoft Access tables
– delimited files (*.*)
– comma-separated values (*.CSV).
17. The Import Procedure
PROC IMPORT OUT=SAS-data-set
DATAFILE='external-file-name'
DBMS=file-type;
RUN;
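A minimal sketch of the general form above, reading a comma-separated file. The file name sales.csv and the data set name work.sales_import are hypothetical; REPLACE and GETNAMES are optional additions that are not on the slide.

```sas
/* Hypothetical file and data set names; point DATAFILE= at a real CSV file. */
proc import out=work.sales_import
            datafile='sales.csv'
            dbms=csv
            replace;          /* overwrite work.sales_import if it already exists */
  getnames=yes;               /* take variable names from the first row */
run;

/* Check the variable names and types that PROC IMPORT chose. */
proc contents data=work.sales_import;
run;
```

Because PROC IMPORT guesses variable types from the file contents, it is worth reviewing the PROC CONTENTS output before using the data set.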
19. Writing the data step (1)
data work.empdata (DROP = …);
infile 'employee.dat';
input EmpID $ 1-4
LastName $ 5-17
FirstName $ 18-30
JobCode $ 31-36
Salary 37-45;
run;
20. Writing the data step (2)
data work.aircraft (DROP = ..);
infile 'aircraft.dat';
input @1 Model $16.
@18 AircraftID $6.
@25 InService mmddyy10.
@36 LastMaint mmddyy10.;
run;
21. Reading delimited Raw data file
data work.aircraft (KEEP = …);
infile 'aircraft.dat' DLM=',' DSD;
input Model :$16.
AircraftID :$6.
InService :mmddyy10.
LastMaint :mmddyy10.;
run;
22. Testing the data step
data work.aircraft;
input @1 Model $16.
@18 AircraftID $6.
@25 InService mmddyy10.
@36 LastMaint mmddyy10.;
datalines;
JetCruise LF5200 030003 04/05/1994 03/11/2001
JetCruise LF5200 030005 02/15/1999 07/05/2001
run;
23. Retain the Variable(s)
The RETAIN statement is used to :
Prevent initialization of variables to missing each time the DATA
step executes
Give an initial value to a retained variable
RETAIN variable(s) <initial-value>;
Sample :
data work.grand_salary(KEEP = emp_id emp_salary tot_sal);
set ia.employee_data;
retain tot_sal 0;
tot_sal = tot_sal + emp_salary;
run;
26. SAS Formats/Informats
Selected SAS formats:
w.d standard numeric format
$w. standard character format
COMMAw.d commas in a number: 12,234.21
DOLLARw.d dollar signs and commas in a
number: $12,234.41
27. SAS Formats/Informats
Stored Value Format Displayed Value
27134.2864 COMMA12.2 27,134.29
27134.2864 12.2 27134.29
27134.2864 DOLLAR12.2 $27,134.29
27134.2864 DOLLAR9.2 $27134.29
27134.2864 DOLLAR8.2 27134.29
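The table above can be checked with a short DATA step that writes one stored value to the log with each format. This is a sketch, not part of the original deck; the variable name amount is hypothetical.

```sas
/* DATA _NULL_ runs the step without creating a data set. */
data _null_;
  amount = 27134.2864;        /* the stored value from the table */
  put amount comma12.2;       /* commas, two decimal places */
  put amount 12.2;            /* standard numeric format */
  put amount dollar12.2;      /* dollar sign and commas */
  put amount dollar9.2;       /* width too small for the comma */
  put amount dollar8.2;       /* width too small for the dollar sign as well */
run;
```

The stored value never changes; only the width and format name control what is displayed.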
28. SAS Formats/Informats
Selected SAS date formats:
MMDDYYw. 101692 (MMDDYY6.)
10/16/92 (MMDDYY8.)
10/16/1992 (MMDDYY10.)
DATEw. 16OCT92 (DATE7.)
16OCT1992 (DATE9.)
29. SAS Formats/Informats
Stored Value Format Displayed Value
0 MMDDYY8. 01/01/60
0 MMDDYY10. 01/01/1960
0 DATE9. 01JAN1960
0 DDMMYY10. 01/01/1960
0 WORDDATE. January 1, 1960
0 WEEKDATE. Friday, January 1, 1960
31. Creating User defined Format
Format-name
names the format you are creating
for character values, must have a dollar sign
($) as the first character and no more than
seven additional characters, numbers, and
underscores
for numeric values, can be up to eight
characters, numbers, and underscores
cannot end in a number
32. Creating User defined Format
Format-name
cannot be the name of a SAS System format
does not end with a period in the VALUE
statement.
Labels must be
200 characters or fewer in length
enclosed in quotes.
33. Creating User defined Formats
Assign labels to single numbers.
proc format;
value gender 1='Female'
2='Male'
other='Miscoded';
run;
gender is the numeric format name, 1 and 2 are numeric data values,
the quoted labels are the formatted values, and OTHER is a keyword.
34. Creating User defined Formats
Assign labels to ranges of numbers.
proc format;
value boardfmt low-49='Below'
50-99='Average'
100-high='Above Average';
run;
LOW and HIGH are keywords; low-49, 50-99, and 100-high are numeric data ranges.
35. Creating User defined Format
Assign labels to character values and ranges
of character values.
proc format;
value $grade 'A'='Good'
'B'-'D'='Fair'
'F'='Poor'
'I','U'='See Instructor'
other='Miscoded';
run;
$grade is the character format name, 'B'-'D' is a character value range,
'I','U' are discrete character values, and OTHER is a keyword.
36. Creating User defined Format
proc format;
value money low-<25000 ='< 25,000'
25000-50000='25,000 - 50,000'
50000<-high='> 50,000';
run;
proc print data=work.empdata;
format Salary money.;
run;
37. User defined Informat
PROC FORMAT;
INVALUE format-name range1='label'
range2='label'
…;
RUN;
39. Creating Multiple SAS Dataset
data work.north_america work.europe work.other;
set ia.employee_data;
select(emp_country);
when ('USA','CANADA')
output work.north_america;
when ('DENMARK','SWEDEN','ITALY',
'SPAIN','FRANCE')
output work.europe;
otherwise
output work.other;
end;
run;
40. Create normal Variable
data ia.comparison;
merge ia.sales(rename=(SaleMon=Month))
work.goals;
by Month Region;
FClass=FSales-FGoal;
run;
41. Create Variable through conditional processing
TotPassCap Size
100 Small
207 Large
98 Small
188 Mediu
if TotPassCap<=150 then Size='Small';
else if 150<TotPassCap<=200 then
Size='Medium';
else if 200<TotPassCap then
Size='Large';
42. The LENGTH Statement
You can use the LENGTH statement to define the
length of a variable explicitly.
General form of the LENGTH statement:
LENGTH variable(s) $ length;
Example:
length Size $ 6;
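A sketch of why the LENGTH statement matters for the conditional logic on the previous slide: without it, Size takes its length (5) from the first value assigned, 'Small', so 'Medium' is stored as 'Mediu'. The data set name work.sizes and the inline data lines are assumptions made for illustration.

```sas
data work.sizes;                     /* hypothetical data set name */
  length Size $ 6;                   /* declare the length before the first assignment */
  input TotPassCap;
  if TotPassCap <= 150 then Size = 'Small';
  else if 150 < TotPassCap <= 200 then Size = 'Medium';
  else if 200 < TotPassCap then Size = 'Large';
  datalines;
100
207
98
188
;
run;

proc print data=work.sizes;
run;
```

Removing the LENGTH statement and rerunning the step reproduces the truncated value shown on the earlier slide.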
43. Conditionally Executing Multiple Statements
You can use DO and END statements to execute a group
of statements based on a condition.
General form of DO and END statements:
IF expression THEN
DO;
executable statements
END;
ELSE
DO;
executable statements
END;
45. SUBSTR Function
The SUBSTR function extracts a portion of a character
data value, given a starting position and the number of
characters to retrieve
var1 = SUBSTR (var, start <, number-of-characters>);
Example :
name1=substr(name, 1, 3);
NAME (before): Dorothy E
NAME1 (after): Dor
Sample application: retrieving the first initial
46. Retrieve Middle Initial
Problems :
Not all middle initials are in the same location,
so you can’t use the SUBSTR function
Not all people have middle initial
47. SCAN Function
The SCAN function extracts a portion of a character
data value based on which word number to retrieve.
var1 = SCAN (var, word-number <, delimiter(s)>);
Example :
name1=scan(name, 2, ' ');
NAME (before): Dorothy Edgar
NAME1 (after): Edgar
48. Concatenation Operator
The concatenation operator joins character data
values together
var = var1 !! var2;
Besides !!, the other concatenation operators are
two vertical bars (||) and two broken vertical bars (¦¦)
49. Concatenation Operator
Example : newname = name1 !! name2;
Compilation :
NAME1 ($ 9) + NAME2 ($ 6) = NEWNAME ($ 15)
Dorothy + E = Dorothy  E (2 spaces between the values)
50. TRIM Function
The TRIM function removes trailing blanks from a
character data value during execution
var = TRIM (var1) !! var2;
Example :
NAME1 ($ 9) + NAME2 ($ 6) = NEWNAME ($ 15)
Dorothy + E = DorothyE (0 spaces between the values)
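The two concatenation slides can be reproduced in one step. This is a sketch with hypothetical variable names (padded, trimmed, spaced); the lengths of NAME1 and NAME2 follow the slides.

```sas
data _null_;
  length name1 $ 9 name2 $ 6;
  name1 = 'Dorothy';                      /* 7 characters, padded to 9 */
  name2 = 'E';
  padded  = name1 !! name2;               /* keeps the 2 trailing blanks of name1 */
  trimmed = trim(name1) !! name2;         /* trailing blanks removed first */
  spaced  = trim(name1) !! ' ' !! name2;  /* exactly one blank between the values */
  put padded= / trimmed= / spaced=;
run;
```

The third assignment is the usual pattern when a single separating blank is wanted rather than none.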
52. SUM function
The SUM function adds the values of the arguments
and ignores missing values.
General form of the SUM function to create a new
variable:
variable = SUM(argument1, argument2);
variable variable you want to create
argument variables, literals, or expressions to
be summed.
53. SUM function
When you use a variable list such as var1-varN, put the
keyword OF in front of the first variable name to
prevent subtraction from occurring.
variable = SUM(OF var1-varN);
54. MEAN function
The MEAN function returns the arithmetic mean
(average) and ignores missing values.
variable = MEAN(argument1, argument2);
variable variable you want to create
argument variables, literals, or expressions to
be averaged.
55. ROUND function
The ROUND function returns a value rounded to the
nearest round-off unit. If the round-off unit is not
provided, the value is rounded to the nearest
integer.
variable = ROUND(var1 <, round-off-unit>);
Any number or fractional value can be used as a
round-off unit
56. INT function
The INT function returns the integer portion of an
argument.
var1 = INT(var);
57. Char-to-Num conversion
You can perform explicit character-to-numeric conversion with
the INPUT function.
var = INPUT(var1, informat-name);
Note : you can't accomplish the type conversion by assigning
the result to a variable with the same name. This does not
change the type of emp_salary:
emp_salary = INPUT(emp_salary, informat-name);
58. Num-to-Char conversion
You can perform explicit numeric-to-character conversion with
the PUT function.
var = PUT(var1, format-name);
Note : you can't accomplish the type conversion by assigning
the result to a variable with the same name. This does not
change the type of emp_salary:
emp_salary = PUT(emp_salary, format-name);
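One common way around the same-name restriction is to rename the original variable on the way in and then create the new variable under the old name. The sketch below assumes a small inline data set; work.raw, work.converted, salary_char, and salary_text are hypothetical names.

```sas
/* Build a small test data set in which emp_salary is character. */
data work.raw;
  input emp_id $ emp_salary $;
  datalines;
E001 52000
E002 61500
;
run;

data work.converted;
  /* Rename the character variable so the name emp_salary is free. */
  set work.raw(rename=(emp_salary=salary_char));
  emp_salary  = input(salary_char, 8.);      /* character to numeric */
  salary_text = put(emp_salary, dollar10.);  /* numeric to character */
  drop salary_char;
run;

proc contents data=work.converted;
run;
```

PROC CONTENTS should list emp_salary as numeric and salary_text as character.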
59. Working with Date Values
Date values that are stored as SAS dates are special
numeric values.
A SAS date value is interpreted as the number
of days between January 1, 1960, and a specific date.
Date read with an informat    SAS date value    Date written with a format
01JAN1959                     -365              01/01/1959
01JAN1960                     0                 01/01/1960
01JAN1961                     366               01/01/1961
60. Converting Dates to SAS Date Values
SAS uses date informats to read and convert
dates to SAS date values, for example,
Stored Value Informat Converted Value
10/29/1999 MMDDYY10. 14546
29OCT1999 DATE9. 14546
29/10/1999 DDMMYY10. 14546
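The conversions in the table can be tried with the INPUT function, which applies an informat to a character string. A sketch with hypothetical variable names d1 to d3:

```sas
data _null_;
  d1 = input('10/29/1999', mmddyy10.);
  d2 = input('29OCT1999', date9.);
  d3 = input('29/10/1999', ddmmyy10.);
  /* Per the table above, all three should be the same SAS date value, 14546. */
  put d1= d2= d3=;
  /* The same value written back with a date format. */
  put d1= date9.;
run;
```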
61. Writing SAS Date Values
SAS uses date formats to write values from
columns that represent dates, for example,
Stored Value Format Displayed Value
0 MMDDYY10. 01/01/1960
0 DATE9. 01JAN1960
0 DDMMYY10. 01/01/1960
0 WEEKDATE. Friday, January 1, 1960
62. SAS Time Values
SAS time informats and formats can be used
to read and write SAS time values.
Time        SAS time value
12:00 AM    0
9:30 AM     34200
63. SAS Datetime Values
SAS datetimes are a combination of dates and times,
and are measured as the number of seconds since
midnight, January 1, 1960
'ddmmmyyyy:hh:mm<:ss.s>'DT
SAS datetime informats and formats can be used to
read and write SAS datetime values.
Datetime              SAS datetime value
12:00 AM 01JAN1960    0
9:30 AM 05JUN1989     928661400
64. SAS Times
Just as SAS has a starting point of dates, it also
has a starting point of times
Time is measured as the number of seconds
since midnight
‘hh:mm<:ss.s>’T
65. INTNX Function
The INTNX function advances a date, time, or datetime
value by a given interval, and returns a date, time or a
datetime value.
var = INTNX ('interval', start-from, increment);
Example :
var=intnx('year', var1, 1);
VAR1 (before): 17787 (SAS date for 12SEP2008)
VAR (after): 17898 (SAS date for 01JAN2009)
66. TODAY () Function
The TODAY() function returns the current date
as SAS date
var = TODAY ();
67. INTCK Function
The INTCK function returns the number of time
intervals in a given time span
var = intck ('interval', from, to);
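A sketch that uses INTNX and INTCK together, starting from the 12SEP2008 date in the INTNX example. The variable names are hypothetical.

```sas
data _null_;
  start = '12SEP2008'd;                      /* SAS date constant */
  /* Advance to the start of the next year interval (01JAN2009 on the slide). */
  next_year = intnx('year', start, 1);
  /* Count the interval boundaries crossed between the two dates. */
  years_crossed = intck('year', start, next_year);
  days_between  = intck('day',  start, next_year);
  put start= date9. next_year= date9. years_crossed= days_between=;
run;
```

By default INTCK counts interval boundaries rather than elapsed whole intervals, so crossing a single January 1 counts as one year even though fewer than 365 days separate the dates.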
68. Other DATE Functions
var = YEAR (var1);
var = MONTH (var1);
var = DAY (var1);
var = QTR (var1);
var = WEEKDAY (var1); (1-7, 1=Sunday)
var = DATEPART (var1); (date portion of a datetime value)
var = MDY (month, day, year);
69. JULIAN date
To convert a Julian date to SAS date-value
sas_date = DATEJUL (julian_date);
To convert a SAS date to a Julian date-value :
jul_date = JULDATE(sas_date);
70. Creating a SAS date
General Form of the MDY function
MDY (month, day, year)
Example :
emp_hire_date = MDY(mon, day, year);
72. The permitted directives
%a Abbreviated weekday name
%A Full weekday name
%b Abbreviated month name
%B Full month name
%d Day of the month as a decimal number (1-31), with no leading zero
%H Hour (24-hour clock) as a decimal number (0-23) with no leading zero
%I Hour (12-hour clock) as a decimal number (1-12) with no leading zero
%j Day of the year as a decimal number (1-366), with no leading
zero
%m Month as a decimal number (1-12) with no leading zero
%M Minute as a decimal number (0-59) with no leading zero
%p AM or PM
%S Second as a decimal number (0-59) with no leading zero
%U Week number of the year (Sunday as the first day of the week) as a
decimal number (0-53) with no leading zero
%Y Year with century as a decimal number
75. UPCASE Function
The UPCASE function converts all letters in the data
value into uppercase
var = UPCASE (var);
Example :
country=upcase(country);
COUNTRY (before): france
COUNTRY (after): FRANCE
Use the LOWCASE function to convert data values to lowercase
76. COMPBL Function
The COMPBL function compresses multiple consecutive blanks in a
data value into one blank. Since the length of a variable is set at
compilation, the resulting data value is padded with blanks.
var = COMPBL (var);
Example :
name = compbl(name);
NAME (before): DE PABLOS (with several blanks between the words)
NAME (after): DE PABLOS
77. TRANWRD Function
The TRANWRD function replaces all occurrences of a pattern of
characters in a data value with another pattern of characters.
var = TRANWRD (var, target, replacement);
Example :
name = tranwrd (name, 'Miss', 'Ms');
NAME (before): Miss. Joy Ho
NAME (after): Ms. Joy Ho
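The UPCASE, COMPBL, and TRANWRD slides can be combined into one cleanup step. This is a sketch; the input string and the variable name upper are assumptions made for illustration.

```sas
data _null_;
  length name $ 20;
  name = 'Miss. Joy   Ho';               /* extra blanks inserted on purpose */
  name = compbl(name);                   /* collapse repeated blanks to one */
  name = tranwrd(name, 'Miss', 'Ms');    /* replace every occurrence of the pattern */
  upper = upcase(name);                  /* uppercase copy */
  put name= / upper=;
run;
```

Because NAME has a declared length of 20, each result is padded back to that length, as the COMPBL slide notes.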
78. Calculating Summary Statistics
Model AircraftID InService TotPassCap
MF4000 010012 10890 267
LF5200 030006 10300 207
LF5200 030008 11389 207
proc means data=ia.aircraftcap maxdec=2;
var TotPassCap;
class Model;
run;
80. Calculating Summary Statistics
By default, PROC MEANS displays all classification
variables and the following statistics:
N
Mean
Std Dev
Minimum
Maximum
82. Concatenating SAS Data Sets
IA.SALES1 (Jan, Feb, Mar, Apr, May, Jun) + IA.SALES2 (Jul, Aug, Sep, Oct, Nov, Dec) =
one data set with the observations from IA.SALES1 (Jan-Jun) followed by
the observations from IA.SALES2 (Jul-Dec)
83. Concatenating SAS Data Sets
Example program to concatenate two
SAS data sets:
data ia.personnel;
set ia.employees ia.departments;
run;
Example program to concatenate several
SAS data sets:
data ia.airlines;
set ia.airport ia.aircraft ia.schedule
ia.budget ia.sales ia.personnel;
run;
84. Interleaving SAS data sets
Example program to interleave two
SAS data sets:
data ia.personnel;
set ia.employees (RENAME=(old=new))
ia.departments;
by id;
run;
85. Preparing Data for Merging
Often you must manipulate data before you
can perform a merge. You might have to
rename variables
sort the data.
86. Sorting the data
PROC SORT DATA = sas-data-set <OUT = sas-data-set>;
BY <DESCENDING> variable(s);
RUN;
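A sketch of the general form applied to the IA.SALES data set used in the merge example that follows. It assumes the IA library is already assigned; work.sales_sorted is a hypothetical output name.

```sas
/* OUT= writes the sorted copy to a new data set and leaves ia.sales unchanged. */
proc sort data=ia.sales out=work.sales_sorted;
  /* DESCENDING applies only to the variable that follows it. */
  by Region descending FSales;
run;
```

Without OUT=, PROC SORT replaces the input data set with the sorted version.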
87. Performing a Match MERGE
IA.SALES WORK.GOALS
SaleMon Region FSales Month Region FGoal
1 Europe 2118222.62 1 Europe 2127742.48
1 North America 3135765.34 1 North America 2934441.72
2 Europe 1960034.47 2 Europe 1920751.20
DATA STEP
data ia.comparison;
merge ia.sales(rename=(SaleMon=Month))
work.goals;
by Month Region;
FClass=FSales-FGoal;
run;
IA.COMPARISON
Month Region FSales FGoal FClass
1 Europe 2118222.62 2127742.48 -9519.86
1 North America 3135765.34 2934441.72 201323.62
2 Europe 1960034.47 1920751.20 39283.27
88. Other Merges (Self-study)
The DATA step merge works with many other kinds of
data combinations:
One-to-many Unique BY values are in one
data set and duplicate
matching BY values are in the
other data set.
Many-to-many Duplicate matching BY values
are in both data sets.
Non-matches Some BY values in one data
set have no matching BY
values in the other data set.
89. One-To-Many Merging
WORK.ONE WORK.TWO
X Y X Z
1 A 1 A1
2 B 1 A2
3 C 2 B1
3 C1
3 C2
data work.three;
merge work.one work.two;
by X;
run;
WORK.THREE
X Y Z
1 A A1
1 A A2
2 B B1
3 C C1
3 C C2
90. One-To-Many Merging
IA.ALLSALES IA.ALLGOALS
Month Region Sales Month Goal
1 Europe 2118222.62 1 2127742.48
1 North America 3135765.34 2 1920751.20
2 Europe 1960034.47 3 2125112.75
2 North America 2926929.91
data ia.allcompare;
merge ia.allsales
ia.allgoals;
by Month;
Difference=Sales-Goal;
run;
Month Region Sales Goal Difference
1 Europe 2118222.62 2127742.48 -9519.86
1 North America 3135765.34 2127742.48 1008022.86
2 Europe 1960034.47 1920751.20 39283.27
2 North America 2926929.91 1920751.20 1006178.71
91. Many-To-Many Merging
WORK.ONE WORK.TWO
X Y X Z
1 A1 1 AA1
1 A2 1 AA2
2 B1 1 AA3
2 B2 2 BB1
2 BB2
data work.three;
merge work.one work.two;
by X;
run;
WORK.THREE
X Y Z
1 A1 AA1
1 A2 AA2
1 A2 AA3
2 B1 BB1
2 B2 BB2
93. Merging With Non-matches
WORK.ONE WORK.TWO
X Y X Z
1 A 1 A1
2 B 3 C1
3 C 4 D1
data work.three;
merge work.one work.two;
by X;
run;
WORK.THREE
X Y Z
1 A A1
2 B
3 C C1
4   D1
Z is missing for X=2 and Y is missing for X=4.
95. Review the Match-merge (1)
data work.tot_sales;
merge ia.sales (in=a)
ia.transaction (in=b);
by sales_id;
if a and b;
run;
96. Review the Match-merge (2)
data not_in_a not_in_b;
merge work.a (in=a) work.b (in=b);
by num;
IF a and not b THEN OUTPUT not_in_b;
ELSE IF b and not a THEN OUTPUT not_in_a;
run;
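A self-contained sketch of the IN= technique from the two review slides, routing matches and non-matches to separate data sets. All data set and variable names here (work.a, work.b, both, only_a, only_b, in_a, in_b) are hypothetical, and the inputs are already in BY-variable order.

```sas
data work.a;
  input num;
  datalines;
1
2
3
;
run;

data work.b;
  input num;
  datalines;
2
3
4
;
run;

data both only_a only_b;
  merge work.a(in=in_a) work.b(in=in_b);
  by num;
  /* in_a and in_b are temporary 0/1 flags: did this BY value come from a or b? */
  if in_a and in_b then output both;
  else if in_a then output only_a;
  else output only_b;
run;
```

With these inputs, num=1 appears only in work.a, num=4 only in work.b, and 2 and 3 in both.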
97. Reading a subset of Raw Data
Use the DATA step that was written earlier.
Add a subsetting IF statement to process only the subset
in which the value of AGE is at least 15.
data work.aircraft;
set ia.aircraft (firstobs=5 obs=10);
YrInService=year(InService);
Age=year(today())-YrInService;
if Age>=15;
run;
98. Subsetting Your Data with the WHERE Statement
The WHERE statement enables you to select observations
that meet a certain condition before SAS brings the
observation into the PROC REPORT step.
Date FlightID TotPassCap TotPass TotRev
04JAN1999 IA00300 207 186 $140,170.00
26NOV1999 IA00300 207 176 $133,704.00
31DEC1999 IA00401 207 171 $129,491.00
proc report data=ia.sales1999 nowd;
where Date between '24nov1999'd
and '03jan2000'd;
run;
Date FlightID TotPassCap TotPass TotRev
26NOV1999 IA00300 207 176 $133,704.00
31DEC1999 IA00401 207 171 $129,491.00
99. WHERE or IF
Step and Usage                        WHERE Statement   IF Statement
PROC step                             Yes               No
DATA step (source of variable)
  INPUT statement                     No                Yes
  Assignment statement                No                Yes
  SET statement (single data set)     Yes               Yes
  SET/MERGE (multiple data sets)
    Variable in ALL data sets         Yes               Yes
    Variable not in ALL data sets     No                Yes
100. Operators
The WHERE statement can be used with
– comparison operators
– logical operators.
You can also use the WHERE statement with
special operators.
101. Comparison Operators
Mnemonic Symbol Definition
EQ = equal to
NE ^= not equal to
GT > greater than
LT < less than
GE >= greater than or equal to
LE <= less than or equal to
IN equal to one of a list
104. Special Operators
The following are special operators :
LIKE selects observations by comparing character
values to specified patterns. A
percent sign (%) replaces any number of characters
and an underscore (_) replaces
one character.
where Code like 'E_U%';
(E, a single character, U, followed by any
characters.)
105. Special Operators
The sounds-like (=*) operator selects observations
that contain a spelling variation
of the word or words specified.
where Name=*'SMITH';
(Selects the names Smythe, Smitt, and so on.)
CONTAINS or ? selects observations that include the
specified substring.
where Word ? 'LAM';
(BLAME, LAMENT, and BEDLAM are selected.)
106. Special Operators
IS NULL or IS MISSING selects observations in which
the value of the variable is missing.
where Flight is missing;
BETWEEN-AND selects observations in which the
value of the variable falls within a range of values.
where Date between '01mar1999'd
and '01apr1999'd;
112. DO WHILE syntax
DO WHILE (expression);
SAS statements;
END;
The statements in the loop iteratively execute while
the expression is true
The expression is evaluated at the top of the loop
The statements in the loop are never executed if the
expression is initially false.
113. DO UNTIL syntax
DO UNTIL (expression);
SAS statements;
END;
The statements in the loop iteratively execute until
the expression becomes true
The expression is evaluated at the bottom of the
loop
The statements in the loop are executed at least
once
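The difference between evaluating the condition at the top of the loop (DO WHILE) and at the bottom (DO UNTIL) shows up when the condition already fails or succeeds on entry. A sketch with hypothetical counter variables:

```sas
data _null_;
  /* DO WHILE: n < 5 is false at the top, so the body never runs. */
  n = 5;
  while_runs = 0;
  do while (n < 5);
    while_runs = while_runs + 1;
    n = n + 1;
  end;

  /* DO UNTIL: the body runs once before n >= 5 is checked at the bottom. */
  n = 5;
  until_runs = 0;
  do until (n >= 5);
    until_runs = until_runs + 1;
    n = n + 1;
  end;

  put while_runs= until_runs=;   /* while_runs=0 until_runs=1 */
run;
```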
114. Sample of programs
data work.retire;
set ia.employee_data;
service_yrs = year(today()) - year(hire_date);
do while (service_yrs <= 30);
emp_salary = emp_salary * 1.05;
service_yrs = service_yrs + 1;
end;
year_30 = year(intnx('year', hire_date, 30));
retire_date = mdy(month(hire_date),
day(hire_date), year_30);
run;
116. Sampling of SAS data set (1)
row_to_read = 10;
set ia.employee_data
point = row_to_read nobs = total_rows;
output;
stop;
Notes :
NOBS= option creates a new temporary variable that contains the
number of observations in SAS data set.
This value is assigned during compilation, which means you can
reference this variable before the SET statement
117. Sampling of SAS data set (2)
sample_size = 100;
do while (sample_size >0);
SAS statements…;
end;
stop;
Notes :
NOBS= option creates a new temporary variable that contains
the number of observations in SAS data set.
This value is assigned during compilation, which means you can
reference this variable before the SET statement
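One way the loop body could be filled in is to pick a random row number on each pass and read it directly with POINT=. This is a sketch of simple random sampling with replacement, not necessarily the deck's intended program; it assumes the IA library is assigned, and work.sample is a hypothetical name.

```sas
data work.sample;
  sample_size = 100;
  do while (sample_size > 0);
    /* total_rows is set at compile time by NOBS=, so it is usable here. */
    row_to_read = ceil(ranuni(0) * total_rows);
    set ia.employee_data point=row_to_read nobs=total_rows;
    output;                              /* write the selected row */
    sample_size = sample_size - 1;
  end;
  stop;                                  /* required: POINT= never hits end of file */
  drop sample_size;
run;
```

The same row can be drawn more than once. The POINT= and NOBS= variables are temporary and are not written to the output data set.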
120. Creating a List Report (1)
Model AircraftID InService TotPassCap Size
MF4000 010012 10890 267 Large
LF5200 030006 10300 207 Large
LF5200 030008 11389 207 Large
proc print data=ia.aircraftcap;
var AircraftID Size TotPassCap;
run;
AircraftID Size TotPassCap
010012 Large 267
030006 Large 207
030008 Large 207
121. Creating a List Report (2)
Model AircraftID InService TotPassCap Size
MF4000 010012 10890 267 Large
LF5200 030006 10300 207 Large
LF5200 030008 11389 207 Large
proc report data=ia.aircraftcap nowd;
column AircraftID Size TotPassCap;
run;
AircraftID Size TotPassCap
010012 Large 267
030006 Large 207
030008 Large 207
122. The DEFINE Statement
General form of the DEFINE statement:
DEFINE variable /<usage> <attribute-list>;
You can define options (usage and attributes) in the
DEFINE statement in any order.
Default usage for character variables is DISPLAY.
– The report lists all of the variable’s values from the
data set.
123. The DEFINE Statement
Default usage for numeric variables is ANALYSIS.
If the report contains at least one display variable
and no group variables, the report lists all of the
values of the numeric variable.
If the report contains only numeric variables, the
report displays grand totals for the numeric
variables.
If the report contains group variables, the report
displays the sum of the numeric variables’ values for
each group.
124. The DEFINE Statement
Other available statistics include
N number of nonmissing values
MEAN average
MAX maximum value
MIN minimum value
125. The DEFINE Statement
Additional usage:
ORDER determines the order of the rows in
the report.
• The default order is ascending.
• To force the order to be descending,
include the DESCENDING option on
the DEFINE statement.
• Repetitious printing of values is
suppressed.
126. The DEFINE Statement
Selected attributes:
FORMAT= assigns a format to a variable.
• If there is a format stored in the
descriptor portion of the data set
it is the default format.
WIDTH= controls the width of a report
column.
• The default width is the variable
length.
127. The DEFINE Statement
CENTER, LEFT, and RIGHT identify the justification of
values and the header within the report column.
• The default is LEFT for character
values and RIGHT for numeric
values.
128. The DEFINE Statement
'report-column-header' defines the report column header.
• If there is a label stored in the descriptor portion of
the data set it is the default header.
129. Creating an Enhanced List Report
The enhanced aircraft capacity list report
includes
– appropriate report column headings
– formatted values for the INSERVICE variable
– column widths wide enough for the headings
– values and headings centered within the
columns
– rows of the report ordered by descending
values of the variable SIZE.
130. Adding Options to Enhance Report Appearance
Selected PROC REPORT options:
HEADLINE underlines all column headers
and the spaces between them.
HEADSKIP writes a blank line beneath all
column headers.
131. Writing A PROC REPORT step
Use DEFINE statements to define the variables
as display variables.
– Add column headers and specify column width.
– Add formats and specify alignment. Add titles.
proc report data=ia.sales1999 nowd headline
headskip;
column Date FlightId TotPass TotRev;
define Date / display center 'Sales Date';
define FlightId / display center 'Flight';
define TotPass / display format=3. width=10
center 'Total Passengers';
define TotRev / display format=dollar11.2
center 'Total Revenue';
title 'Sales and Passenger Data for 1999';
run;
132. Controlling Report Appearance
Use the HEADLINE option to underline the column headers and
the HEADSKIP option to add a blank line beneath the column
headers. Add titles.
title1 'Sales and Passenger Data, by Day of Week';
title2 'Sunday through Saturday';
proc report data=ia.sales1999 headline headskip
nowd;
column Day TotPass TotRev;
define Day / group 'Day of Week';
define TotPass / sum format=comma5.
width=10 center
'Total Passengers';
define TotRev / sum format=dollar13.2
center 'Total Revenue';
run;
133. PROC SUMMARY
PROC SUMMARY DATA = sas-data-set;
VAR analysis-variable(s);
CLASS class-variable(s);
OUTPUT OUT = output-data-set
STATISTIC = variable(s);
RUN;
Statistics in PROC SUMMARY :
• N Number of observations with no missing values
• MEAN average
• STD Standard Deviation
• MIN Minimum value
• MAX Maximum value
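A sketch of the general form applied to the IA.AIRCRAFTCAP data set from the PROC MEANS slides. It assumes the IA library is assigned; work.cap_stats, AvgCap, and MaxCap are hypothetical names.

```sas
proc summary data=ia.aircraftcap;
  class Model;                  /* one summary row per Model, plus an overall row */
  var TotPassCap;
  output out=work.cap_stats
         mean=AvgCap
         max=MaxCap;
run;

/* PROC SUMMARY prints nothing by default, so display the output data set. */
proc print data=work.cap_stats;
run;
```

The output data set also contains the automatic variables _TYPE_ and _FREQ_, which identify the level of summarization and the number of observations in each row.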
134. PROC TABULATE
PROC TABULATE DATA = sas-data-set;
CLASS class-variable(s);
VAR analysis-variable(s);
TABLE class_var<*analysis var* Stat>,
class_var<*analysis var * stat>;
RUN;
135. PROC TABULATE (detail)
PROC TABULATE DATA=sas-dataset (WHERE=(condition))
FORMAT=COMMA20.0;
CLASS Card_group Type;
VAR Balance N_account Crd_limit;
TABLE (Type ALL='Total'*{s={foreground=#002288
Background=white}}),
N_account*(SUM PCTSUM)
Balance='Current Balance'*(SUM PCTSUM)
crd_limit='Credit Limit'*(SUM PCTSUM MEAN)
Balance='Percentage Balance to Limit'*PCTSUM
<Crd_limit> *f=comma25.2;
BY Card_Group;
RUN;
136. The Output Delivery System
ODS statements enable you to create output in a
variety of forms.
SAS Report -> ODS -> SAS Output Window or HTML File
137. Generating HTML Files
The ODS HTML statement opens, closes, or manages
the HTML destination.
General form of the ODS statement to create an HTML
file:
ODS HTML FILE='HTML-file-specification'
<options>;
SAS code generating output
ODS HTML CLOSE;
138. Creating an HTML Report
Create a report and close the HTML destination.
ods html file='listing.html';
proc report data=ia.comparison nowd;
column Month …;
define Month /…;
other statements
run;
ods html close;