Ayapparaj / Praxis Business School
Chapter 7
Performing Conditional Processing
/* 2. Using the SAS data set Hosp, use PROC PRINT to list observations for
Subject values of 5, 100, 150, and 200. Do this twice, once using OR
operators and once using the IN operator. Note: Subject is a numeric
variable */
data a15009.hospques2;
set a15009.hosp;
where Subject = 5 or Subject = 100 or Subject = 150 or Subject = 200;
*using OR operators in the WHERE statement to select the four Subject values;
run;
proc print data=a15009.hospques2;
run;
/* OR */
data a15009.hospques22;
set a15009.hosp;
where Subject in(5,100,150,200);
*using the IN operator in the WHERE statement to select the same four Subject values;
run;
proc print data=a15009.hospques22;
run;
/*4. Using the Sales data set, create a new, temporary SAS data set
containing Region and TotalSales plus a new variable called Weight with
values of 1.5 for the North Region, 1.7 for the South Region, and 2.0 for
the West and East Regions. Use a SELECT statement to do this */
data a15009.salesques4;
set a15009.sales (keep = TotalSales Region);
*dataset is in the blog folder uploaded in Dropbox;
select;
*using select statement to list the conditions and the Weight value
associated with each condition;
when (Region = 'North') Weight = 1.5;
when (Region = 'South') Weight = 1.7;
when (Region = 'East') Weight = 2.0;
when (Region = 'West') Weight = 2.0;
otherwise;
end;
run;
proc print data=a15009.Salesques4;
run;
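/* A possible alternative for Problem 4, shown only as a sketch: naming
Region in the SELECT statement lets a single WHEN list cover both East and
West. The output data set name salesques4b is hypothetical. */
data a15009.salesques4b;
set a15009.sales (keep = TotalSales Region);
select (Region);
when ('North') Weight = 1.5;
when ('South') Weight = 1.7;
when ('East','West') Weight = 2.0;
otherwise;
end;
run;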
/*6. Using the Sales data set, list all the observations where Region is
North and Quantity is less than 60. Include in this list any observations
where the customer name (Customer) is Pet's are Us */
data a15009.salesques6;
set a15009.sales;
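where (Region = 'North' and Quantity < 60) or Customer = "Pet's are Us";
*using where statement to keep North region observations with quantity below 60
plus any observation for the customer named in the problem;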
run;
proc print data=a15009.Salesques6;
run;
Chapter 8
Performing Iterative Processing: Looping
/*2. Run the program here to create a temporary SAS data set (MonthSales):
data monthsales;
input month sales @@;
---add your line(s) here---
datalines;
1 4000 2 5000 3 . 4 5500 5 5000 6 6000 7 6500 8 4500
9 5100 10 5700 11 6500 12 7500
;
Modify this program so that a new variable, SumSales, representing Sales to
date, is added to the data set. Be sure that the missing value for Sales in
month 3 does not result in a missing value for SumSales */
data a15009.monthsales;
input month sales @@;
sumsales+sales;
*using a sum statement, which retains sumsales, starts it at 0, and ignores missing sales values;
retain sumsales 0;
*the retain statement is redundant here because the sum statement already retains sumsales and initializes it to 0;
datalines;
1 4000 2 5000 3 . 4 5500 5 5000 6 6000
7 6500 8 4500 9 5100 10 5700 11 6500 12 7500
;
proc print data=a15009.monthsales;
run;
/*4. Count the number of missing values for the variables A, B, and C in
the Missing data set. Add the cumulative number of missing values to each
observation (use variable names MissA, MissB, and MissC). Use the MISSING
function to test for the missing values */
data a15009.missingdata;
input X $ Y Z A;
if missing(X) then misscounterX+1;
if missing(Y) then misscounterY+1;
if missing(Z) then misscounterZ+1;
if missing(A) then misscounterA+1;
*using sum statements to count the missing values in each variable;
datalines;
M 56 68 89
F 33 60 71
M 45 91 .
F 35 35 68
M . 71 81
M 50 68 71
. 23 60 46
M 65 72 103
. 35 65 67
M 15 71 75
;
proc print data=a15009.missingdata;run;
/*6. Repeat Problem 5, except have the range of N go from 5 to 100 by 5 */
data a15009.logy2;
do N=5 to 100 by 5;
*using do loop to initialize N variable and assign values from 5 to 100 in
increments of 5;
LogN=LOG(N);
output;
end;
run;
proc print data=a15009.logy2;run;
/*8. Use an iterative DO loop to plot the following equation:
Logit = log(p / (1 - p))
Use values of p from 0 to 1 (with a point at every .05). Using the
following GPLOT
statements will produce a very nice plot. (If you do not have SAS/GRAPH
software, use PROC PLOT to plot your points).
goptions reset=all
ftext='arial'
htext=1.0
ftitle='arial/bo'
htitle=1.5
colors=(black);
symbol v=none i=sm;
title "Logit Plot";
proc gplot data=logitplot;
plot Logit * p;
run;
quit;*/
data a15009.logitplot;
do p=0 to 1 by 0.05;
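*using do loop to step p from 0 to 1 in increments of 0.05;
*Logit is undefined at p=0 and p=1, and SAS returns a missing value and writes
a note to the log whenever the argument is invalid;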
Logit=LOG(p/(1-p));
output;
end;
run;
goptions reset=all ftext='arial' htext=1.0 ftitle='arial/bo' htitle=1.5
colors=(black);
symbol v=none i=sm;
title "Logit Plot";
proc gplot data=a15009.logitplot;
*using the gplot procedure to plot Logit against p;
plot Logit * p;
run;
quit;
/*10. You are testing three speed-reading methods (A, B, and C) by randomly
assigning 10 subjects to each of the three methods. You are given the
results as three lines of reading speeds, each line representing the
results from each of the three methods,
respectively. Here are the results:
250 255 256 300 244 268 301 322 256 333
267 275 256 320 250 340 345 290 280 300
350 350 340 290 377 401 380 310 299 399
Create a temporary SAS data set from these three lines of data. Each
observation should contain Method (A, B, or C), and Score. There should be
30 observations in this data set. Use a DO loop to create the Method
variable and remember to use a single trailing @ in your INPUT statement.
Provide a listing of this data set using PROC PRINT */
data a15009.reading;
do Method = 'MethodA','MethodB','MethodC';
*using do loop to initialize method variable with three values;
do SNo=1 to 10;
input score @;
output;end;end;
datalines;
250 255 256 300 244 268 301 322 256 333
267 275 256 320 250 340 345 290 280 300
350 350 340 290 377 401 380 310 299 399
;
proc print data=a15009.reading noobs;
var Method score;
run;
/* 12. You place money in a fund that returns a compound interest of 4.25%
annually. You deposit $1,000 every year. How many years will it take to
reach $30,000? Do not use compound interest formulas. Rather, use “brute
force” methods with DO WHILE or DO UNTIL statements to solve this problem
*/
data a15009.inte;
interest = 0.0425; *annual interest rate;
total = 0; *initializing the total variable;
do year = 1 to 100 by 1 until (total ge 30000);
*looping over the years and stopping as soon as total reaches 30000;
total = total + 1000; *yearly deposit, assumed to be made at the start of the year;
total = total + interest*total; *adding the interest earned during the year;
output;
end;
format total dollar11.2; *specifying format for total variable;
run;
proc print data=a15009.inte;run;
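/* A minimal sketch of the same problem with a plain DO UNTIL loop and no
iterative index, assuming each 1,000 deposit is made at the start of the
year and interest is added at the end of the year. The data set name
inte_until is hypothetical. The last observation printed shows the year in
which the total first reaches 30,000. */
data a15009.inte_until;
interest = 0.0425;
total = 0;
year = 0;
do until (total ge 30000);
year = year + 1;
total = total + 1000;
total = total + interest*total;
output;
end;
format total dollar11.2;
run;
proc print data=a15009.inte_until;run;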
/*14. Generate a table of integers and squares starting at 1 and ending
when the square value is greater than 100. Use either a DO UNTIL or DO
WHILE statement to accomplish this*/
*using DO UNTIL;
data a15009.square;
do Integers = 1 to 100 until (squares ge 100);
*using do until taking values from 1 to 100 and
specifying the condition for squares variable to
stop the loop when it reaches 100;
Squares = Integers * integers;
output;end;run;
proc print data=a15009.square;run;
*using IF STMT;
data a15009.square;
do Integers = 1 to 100 by 1;
Squares = Integers * integers;
if Squares gt 100 then leave;
output;end;run;
proc print data=a15009.square;run;
Chapter 9 Working with Dates
/* 2. Using the following lines of data, create a temporary SAS data set
called ThreeDates. Each line of data contains three dates, the first two in
the form mm/dd/yyyy and the last in the form ddmmmyyyy. Name the
three date variables Date1, Date2, and Date3. Format all three using the
MMDDYY10. format. Include in your data set the number of years from Date1
to Date2 (Year12) and the number of years from Date2 to Date3 (Year23).
Round these values to the nearest year. Here are the lines of data (note
that the columns do not line up):
01/03/1950 01/03/1960 03Jan1970
05/15/2000 05/15/2002 15May2003
10/10/1998 11/12/2000 25Dec2005 */
*loading the values as a separate data set in permanent library;
data a15009.three;
input @1 Date1 mmddyy10.
@12 Date2 mmddyy10.
@23 Date3 date9.;
format Date1 Date2 Date3 mmddyy10.;
datalines;
01/03/1950 01/03/1960 03Jan1970
05/15/2000 05/15/2002 15May2003
10/10/1998 11/12/2000 25Dec2005
;
*reading the values from the above dataset with the SET statement
using the yrdif function to compute the years from Date1 to Date2 and from
Date2 to Date3 and rounding each result with the round function;
data a15009.threedates;
set a15009.three;
year12=round(yrdif(Date1,Date2,'Actual'));
year23=round(yrdif(Date2,Date3,'Actual'));
run;
proc print data=a15009.threedates;
run;
/* 4. Using the Hosp data set, compute the subject’s ages two ways: as of
January 1, 2006 (call it AgeJan1), and as of today’s date (call it
AgeToday). The variable DOB represents the date of birth. Take the integer
portion of both ages. List the first 10 observations */
data a15009.hospques4;
set a15009.hosp;
AgeToday=int(yrdif(DOB,today(),'Actual'));
AgeJan1=int(yrdif(DOB,'01Jan2006'd,'Actual'));
*using yrdif to find the years between DOB and each reference date and int to
keep only the integer part of the difference;
run;
proc print data=a15009.hospques4 (obs=10);run;
/* 6. Using the Medical data set, compute frequencies for the days of the
week for the date of the visit (VisitDate). Supply a format for the days of
the week and months of the year */
*loading the medical dataset in the permanent library;
data a15009.medical;
input @1 VisitDate mmddyy10. @12 patno $3.;
datalines;
11/29/2003 879
11/30/2003 880
09/04/2003 883
08/28/2003 884
09/04/2003 885
08/26/2003 886
08/31/2003 887
08/25/2003 888
11/16/2003 913
11/15/2003 914
;
run;
data a15009.sevenques6;
set a15009.medical(keep=VisitDate); *reading the medical data with the SET statement;
Days = weekday(VisitDate); *extracting the day of the week (1=Sunday) from VisitDate;
run;
proc format; *providing format for days variable;
value days 1='Sun' 2='Mon' 3='Tue'
4='Wed' 5='Thu' 6='Fri'
7='Sat';
run;
title "Frequencies for Visit Dates";
proc freq data=a15009.sevenques6;
tables Days / nocum nopercent;
format Days days.; run;
/* 8. Using the values for Day, Month, and Year in the raw data below,
create a temporary SAS data set containing a SAS date based on these values
(call it Date) and format this value using the MMDDYY10. format. Here are
the Day, Month, and Year values:
25 12 2005
1 1 1960
21 10 1946 */
*storing the data in the permanent library;
data a15009.dataset;
input Day Month Year;
datalines;
25 12 2005
1 1 1960
21 10 1946
;
data a15009.sevenques8;
set a15009.dataset;
Date = mdy(Month,Day,Year);
*combining the Month, Day, and Year values into a SAS date with the mdy function;
format Date mmddyy10.;
run;
proc print data=a15009.sevenques8;run;
/* 10. Using the Hosp data set, compute the number of months from the
admission date (AdmitDate) and December 31, 2007 (call it MonthsDec). Also,
compute the number of months from the admission date to today's date (call
it MonthsToday). Use a date interval function to solve this problem. List
the first 20 observations for your solution */
data a15009.sevenques10;
set a15009.hosp; *you can find hosp dataset in the blog folder uploaded in
the dropbox;
MonthsDec = intck('month',AdmitDate,'31Dec2007'd);
*using intck function to count the month boundaries between AdmitDate and
31Dec2007;
MonthsToday = intck('month',AdmitDate,today());
run;
proc print data=a15009.sevenques10 (obs=20);
run;
/* 12. You want to see each patient in the Medical data set on the same day
of the week 5 weeks after they visited the clinic (the variable name is
VisitDate). Provide a listing of the patient number (Patno), the visit
date, and the date for the return visit */
data a15009.sevenques12;
set a15009.medical;
Followdate=intnx('week',VisitDate,5,'sameday');
*using intnx function with the week interval and sameday alignment to get the
same day of the week 5 weeks after VisitDate;
run;
proc print data=a15009.sevenques12;
format Followdate VisitDate date9.;
run;
Chapter 10
Subsetting and Combining SAS Data Sets
/* 2.Using the SAS data set Hosp, create a temporary SAS data set called
Monday2002, consisting of observations from Hosp where the admission date
(AdmitDate) falls on a Monday and the year is 2002. Include in this new
data set a variable called Age, computed as the person’s age as of the
admission date, rounded to the nearest year */
data a15009.monday2002;
set a15009.hosp;
*you can take hosp dataset from blog folder uploaded in dropbox;
where year(AdmitDate) eq 2002 and
weekday(AdmitDate) eq 2;
*using where statement to select admission dates that fall on a Monday in 2002
weekday(AdmitDate) returns 2 for Monday because the numbering starts at 1 for Sunday
year(AdmitDate) returns the year of AdmitDate;
Age = round(yrdif(DOB,AdmitDate,'Actual'));
*using yrdif function to find difference between DOB and AdmitDate;
run;
title "Listing of MONDAY2002";
proc print data=a15009.monday2002;
run;
/* 4. Using the SAS data set Bicycles, create two temporary SAS data sets
as follows: Mountain_USA consists of all observations from Bicycles where
State is Uttar Pradesh and Model is Mountain. Road_France consists of all
observations from Bicycles where State is Maharastra and Model is Road
Bike. Print these two data sets */
data a15009.Mountain_USA a15009.Road_France;
set a15009.Bicycles;
*bicycle dataset is available in the blog folder uploaded in dropbox;
if State="Uttar Pradesh" and Model="Mountain Bike" then output
a15009.Mountain_USA;
else if State="Maharastra" and Model="Road Bike" then output
a15009.Road_France;
run;
*naming two output datasets, a15009.Mountain_USA and a15009.Road_France, and
writing each observation to the dataset whose condition it meets;
proc print data= a15009.Mountain_USA;run;
proc print data= a15009.Road_France;run;
/*6. Repeat Problem 5, except this time sort Inventory and NewProducts
first (create two temporary SAS data sets for the sorted observations).
Next, create a new, temporary SAS data set (Updated) by interleaving the
two temporary, sorted SAS data sets. Print out the result.*/
*sorting inventory dataset by model variable;
proc sort data=a15009.inventory out=a15009.inventory;
by Model;
run;
*sorting newproducts dataset by model variable;
proc sort data=a15009.newproducts out=a15009.newproducts;
by Model;
run;
*interleaving the two sorted datasets by Model into a single dataset updated;
data a15009.updated;
set a15009.inventory a15009.newproducts;
by Model;
run;
title "Listing of UPDATED";
proc print data=a15009.updated;
run;
/* 8. Run the program here to create a SAS data set called Markup:
data markup;
input manuf : $10. Markup;
datalines;
Cannondale 1.05
Trek 1.07
;
Combine this data set with the Bicycles data set so that each observation
in the Bicycles data set now has a markup value of 1.05 or 1.07, depending
on whether the bicycle is made by Cannondale or Trek. In this new data set
(call it Markup_Prices),create a new variable (NewTotal) computed as
TotalCost times Markup */
*creating the markup2 dataset that holds the markup value for each manufacturer;
data a15009.markup2;
input manuf : $10. Markup;
datalines;
Atlas 1.05
Hero 1.07
;
*sorting markup2 data by manuf variable;
proc sort data=a15009.markup2;
by manuf;
run;
*sorting bicycles data by manuf variable
note that Manufacturer is the label and not the variable name;
proc sort data=a15009.Bicycles;
by Manuf;
run;
*combining both sorted datasets using manuf variable;
data a15009.combi;
merge a15009.bicycles a15009.markup2;
by manuf;
newtotal=unitcost*Markup;
*NewTotal is the cost times Markup, with unitcost standing in for the
TotalCost variable named in the problem;
run;
proc print data=a15009.combi;run;
/*10 Using the Purchase and Inventory data sets, provide a list of all
Models (and the Price) that were not purchased*/
*sorting the inventory dataset by Model Variable;
proc sort data=a15009.inventory out=a15009.inventory;
by Model;
run;
*sorting the purchase dataset by Model Variable;
proc sort data=a15009.purchase out=a15009.purchase;
by Model;
run;
*merging two datasets by Model variable
using "IN=" flags to find the models that were not purchased
along with the price;
data a15009.notpurchased;
merge a15009.inventory(in=InInventory) a15009.purchase(in=InPurchase);
by Model;
if InInventory and not InPurchase;
keep Model Price;
run;
title "Listing of NOT_BOUGHT";
proc print data=a15009.notpurchased noobs;
run;
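/* The same list could also be sketched with PROC SQL, which does not need
the data sets to be sorted first. This assumes, as in the merge above, that
Inventory holds Model and Price and that Purchase holds Model. */
title "Listing of NOT_BOUGHT";
proc sql;
select Model, Price
from a15009.inventory
where Model not in (select Model from a15009.purchase);
quit;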
/*12 You want to merge two SAS data sets, Demographic and Survey1, based on
an identifier. In Demographic, this identifier is called ID; in Survey1,
the identifier is called Subj. Both are character variables.*/
*you can find both demographictwo and survey1 dataset in the blog folder
uploaded in dropbox;
proc sort data=a15009.demographictwo out=a15009.demographictwo;
by ID;
run;
proc sort data=a15009.survey1 out=a15009.survey1;
by Subj;
run;
data a15009.combine12ten;
merge a15009.demographictwo a15009.survey1 (rename=(Subj = ID));
by ID;
run;
proc print data=a15009.combine12ten ;
run;
/*14 Data set Inventory contains two variables: Model (an 8-byte character
variable) and Price (a numeric value). The price of Model M567 has changed
to 25.95 and the price of Model X999 has changed to 35.99. Create a
temporary SAS data set (call it NewPrices) by updating the prices in the
Inventory data set*/
data a15009.modelnew;
input Model $ Price;
datalines;
M567 25.95
X999 35.99
;
*sorting inventory data by model variable;
proc sort data=a15009.inventory out=a15009.inventory;
by Model;
run;
*updating inventory data with modelnew for price for the models;
data a15009.newprices;
update a15009.inventory a15009.modelnew;
by Model;
run;
proc print data=a15009.newprices ;
run;
Chapter 11
Working with Numeric Functions
/* 2. Count the number of missing values for WBC, RBC, and Chol in the
Blood data set. Use the MISSING function to detect missing values */
data a15009.choly;
set a15009.blood;
*blood dataset is present in the blog folder uploaded in dropbox folder;
if missing(Gender) then MissG+1;
if missing(WBC) then MissWBC+1;
if missing(RBC) then MissRBC+1;
if missing(Chol) then MissChol+1;
*using sum statements to count the missing values in each variable;
run;
proc print data=a15009.choly;run;
/* 4. The SAS data set Psych contains an ID variable, 10 question responses
(Ques1– Ques10), and 5 scores (Score1–Score5). You want to create a new,
temporary SAS data set (Evaluate) containing the following:
a. A variable called QuesAve computed as the mean of Ques1–Ques10. Perform
this computation only if there are seven or more non-missing question
values.
b. If there are no missing Score values, compute the minimum score
(MinScore), the maximum score (MaxScore), and the second highest score
(SecondHighest) */
data a15009.evaluate;
set a15009.psych;
*psych dataset is present in the blog folder uploaded in dropbox folder;
if n(of Ques1-Ques10) ge 7 then QuesAve=mean(of Ques1-Ques10);
if n(of Score1-Score5) eq 5 then maxscore=max(of Score1-Score5);
if n(of Score1-Score5) eq 5 then Minscore=min(of Score1-Score5);
if n(of Score1-Score5) eq 5 then SecondHighest=largest(2,of Score1-Score5);
*the n function counts non-missing values, so the maximum, minimum, and second
highest scores are computed only when all five scores are present;
run;
proc print data=a15009.evaluate;run;
/* 6. Write a short DATA _NULL_ step to determine the largest integer you
can store on your computer in 3, 4, 5, 6, and 7 bytes */
data _null_;
set a15009.cons;
put int3= int4= int5= int6= int7= ;
run;
*output will appear in the log window;
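/* If the cons data set is not available, the CONSTANT function with the
'EXACTINT' argument is one way to get these values directly. This is only a
sketch and not necessarily how cons was built. The variable names nbytes and
MaxInt are hypothetical. */
data _null_;
do nbytes = 3 to 7;
MaxInt = constant('exactint',nbytes);
put nbytes= MaxInt= comma25.;
end;
run;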
/* 8. Create a temporary SAS data set (Random) consisting of 1,000
observations, each with a random integer from 1 to 5. Make sure that all
integers in the range are equally likely. Run PROC FREQ to test this
assumption */
data a15009.random;
do i=1 to 1000;
x=int(rand('uniform')*5)+1; /* OR: x=int(ranuni(0)*5+1); */
output;
end;
*using the rand function to get a random integer from 1 to 5;
run;
proc freq data=a15009.random;
tables x/missing;run;
/* 10. Data set Char_Num contains character variables Age and Weight and
numeric variables SS and Zip. Create a new, temporary SAS data set called
Convert with new variables NumAge and NumWeight that are numeric values of
Age and Weight, respectively, and CharSS and CharZip that are character
variables created from SS and Zip. CharSS should contain leading 0s and
dashes in the appropriate places for Social Security numbers and CharZip
should contain leading 0s. Hint: The Z5. format includes leading 0s for the
ZIP code */
data a15009.convert;
set a15009.char_num;
*char_num dataset is present in the blog folder uploaded in dropbox folder;
NumAge = input(Age,8.);
NumWeight = input(weight,8.);
*converting character variables weight and age into numeric variables;
CharSS = put(SS,ssn11.);
CharZip = put(Zip,z5.);
*converting numeric variables SS and Zip into character variables;
run;
proc print data=a15009.convert;
run;
/* 12. Using the Stocks data set (containing variables Date and Price),
compute daily changes in the prices. Use the statements here to create the
plot.
Note: If you do not have SAS/GRAPH installed, use PROC PLOT and omit the
GOPTIONS and SYMBOL statements.
goptions reset=all colors=(black) ftext=swiss htitle=1.5;
symbol1 v=dot i=smooth;
title "Plot of Daily Price Differences";
proc gplot data=difference;
plot Diff*Date;
run;
quit; */
data a15009.difference;
set a15009.stocks;
Diff = Dif(Price);
*using dif function to calculate the difference in the price compared to
the previous value;
run;
goptions reset=all colors=(black) ftext=swiss htitle=1.5;
symbol1 v=dot i=smooth;
title "Plot of Daily Price Differences";
proc gplot data=a15009.difference;
plot Diff * Date;
run;quit;
Chapter 12
Working with Character Functions
/*2 Using the data set Mixed, create a temporary SAS data set (also called
Mixed) with the following new variables:
a. NameLow – Name in lowercase
b. NameProp – Name in proper case
c. (Bonus – difficult) NameHard – Name in proper case without using the
PROPCASE function*/
data a15009.mixed;
set a15009.mixed;
*you can find mixed dataset in the blog folder uploaded in dropbox;
length First Last $ 15 NameHard $ 20;
NameLow = lowcase(Name);
*converting entire word into lower case;
NameProp = propcase(Name);
*converting the first letter of each word to uppercase;
First = lowcase(scan(Name,1,' '));
*extracting the first name and converting it to lower case;
Last = lowcase(scan(Name,2,' '));
*extracting the last name and converting it to lower case;
substr(First,1,1) = upcase(substr(First,1,1));
*converting the first letter of the first name to upper case;
substr(Last,1,1) = upcase(substr(Last,1,1));
*converting the first letter of the last name to upper case;
NameHard = catx(' ',First,Last);
*using catx to join the two parts into a proper-case name without using
propcase;
drop First Last;
run;
proc print data=a15009.mixed;
run;
/*4 Data set Names_And_More contains a character variable called Height. As
you can see in the listing in Problem 3, the heights are in feet and
inches. Assume that these units can be in upper- or lowercase and there may
or may not be a period following the units. Create a temporary SAS data set
(Height) that contains a numeric variable (HtInches) that is the height in
inches.*/
data a15009.height;
set a15009.names_and_more(keep = Height);
Height = compress(Height,'INFT.','i');
*using compress function with the "i" modifier to remove the unit letters and
periods while ignoring case;
/* Alternative
Height = compress(Height,' ','kd');
*keep digits and blanks;
*/
Feet = input(scan(Height,1,' '),8.);
Inches = input(scan(Height,2,' '),?? 8.);
*using scan function with a blank delimiter to extract the values
word 1 is the feet value and word 2 is the inches value
the ?? modifier suppresses invalid data messages if the inches value cannot be
read as a number;
if missing(Inches) then HtInches = 12*Feet;
else HtInches = 12*Feet + Inches;
drop Feet Inches;
run;
proc print data=a15009.height;
run;
/*6 Data set Study (shown here) contains the character variables Group and
Dose. Create a new, temporary SAS data set (Study) with a variable called
GroupDose by putting these two values together, separated by a dash. The
length of the resulting variable should be 6 (test this using PROC CONTENTS
or the SAS Explorer). Make sure that there are no blanks (except trailing
blanks) in this value. Try this problem two ways: first using one of the
CAT functions, and second without using any CAT functions*/
*Using CAT functions;
data a15009.study;
set a15009.study;
length GroupDose $ 6;
GroupDose = catx('-',Group,Dose);
*here we are using catx to supply "-" as a separator between Group and Dose
variables;
run;
proc print data=a15009.study;
run;
*Without using CAT functions;
data a15009.study;
set a15009.study;
length GroupDose $ 6;
GroupDose = trim(Group) || '-' || Dose;
*using trim function to remove trailing blanks from Group and then joining
Group, "-" and Dose with the concatenation operator;
run;
proc print data=a15009.study;
run;
/*8 Notice in the listing of data set Study in Problem 6 that the variable
called Weight contains units (either lbs or kgs). These units are not
always consistent in case and may or may not contain a period. Assume an
upper- or lowercase LB indicates pounds and an upper- or lowercase KG
indicates kilograms. Create a new, temporary SAS data set (Study) with a
numeric variable also called Weight (careful here) that represents weight
in pounds, rounded to the nearest 10th of a pound. Note: 1 kilogram = 2.2
pounds*/
data a15009.study;
set a15009.study(keep=Weight rename=(Weight = WeightUnits));
Weight = input(compress(WeightUnits,,'kd'),8.);
*using compress with the kd modifiers inside the input function to keep only
the digits of the string and convert them to a numeric value;
*using find function with the "i" modifier to search for the units while
ignoring case;
if find(WeightUnits,'KG','i') then Weight = round(2.2*Weight,.1);
else if find(WeightUnits,'LB','i') then Weight = round(Weight,.1);
run;
proc print data=a15009.study;
run;
/*10 Data set Errors contains character variables Subj (3 bytes) and
PartNumber (8 bytes). (See the partial listing here.) Create a temporary
SAS data set (Check1) with any observation in Errors that violates either
of the following two rules: first, Subj should contain only digits, and
second, PartNumber should contain only the uppercase letters L and S and
digits. Here is a partial listing of Errors:*/
data a15009.violates_rules;
set a15009.errors;
where notdigit(trim(Subj)) or
verify(trim(PartNumber),'0123456789LS');
*using notdigit to find any non-digit character in Subj and verify to find
any character in PartNumber other than digits, L, and S
the trim function is needed because without it both functions would
return the position of the first trailing blank in each of the character
values;
run;
proc print data=a15009.violates_rules;
run;
/*12 List the subject number (Subj) for any observations in Errors where
PartNumber contains an upper- or lowercase X or D.*/
proc print data=a15009.errors;
where findc(PartNumber,'XD','i');
*using findc function with the "i" modifier to find an X or D in either case;
var Subj PartNumber;
run;
/*14. List all patients in the Medical data set where the word antibiotics
is in the comment field (Comment).*/
title "Observations Involving the word Antibiotics";
proc print data=a15009.medicaltwo;
where findw(Comment,'antibiotics');
*using findw function to find if the comment variable contains the word
"antibiotics" in its values;
run;
/* Medicaltwo dataset */
/*16 Provide a list, in alphabetical order by last name, of the
observations in the Names_And_More data set. Set the length of the last
name to 15 and remove multiple blanks from Name. Note: The variable Name
contains a first name, one or more spaces, and then a last name.*/
data a15009.names;
set a15009.names_and_more;
length Last $ 15;
Name = compbl(Name);
*using compbl function to replace multiple blanks with a single blank;
Last = scan(Name,2,' ');
*using scan function to take the second word of Name and store it in the
Last variable;
run;
*sorting the data in names dataset based on last variable values;
proc sort data=a15009.names;
by Last;
run;
proc print data=a15009.names;
id Name;
var Phone Height Mixed;
run;
Chapter 13 Working with Arrays
/* 1 Using the SAS data set Survey1, create a new, temporary SAS data set
(Survey1) where the values of the variables Ques1–Ques5 are reversed as
follows: 1 -> 5; 2 -> 4; 3 -> 3; 4 -> 2; 5 -> 1.
Note: Ques1–Ques5 are character variables. Accomplish this using an
array.*/
*Data set SURVEY;
proc format library=a15009;
value $gender 'M' = 'Male'
'F' = 'Female'
' ' = 'Not entered'
other = 'Miscoded';
value age low-29 = 'Less than 30'
30-50 = '30 to 50'
51-high = '51+';
value $likert '1' = 'Strongly disagree'
'2' = 'Disagree'
'3' = 'No opinion'
'4' = 'Agree'
'5' = 'Strongly agree';
run;
data a15009.survey12;
set a15009.survey1;
array Ques{5} $ Q1-Q5;
*creating an array of 5 elements that refers to the character variables Q1 to Q5;
do i = 1 to 5;
Ques{i} = translate(Ques{i},'54321','12345');
*looping i from 1 to 5 and reversing each answer with the translate function
applied to the matching element of the Ques array;
end;
drop i;
run;
proc print data=a15009.survey12;
run;
/* 2.Redo Problem 1, except use data set Survey2. Note: Ques1–Ques5 are
numeric variables.*/
data a15009.survey22;
set a15009.survey2;
array Ques{5} Q1-Q5;
do i = 1 to 5;
Ques{i} = 6 - Ques{i};
end;
drop i;
run;
proc print data=a15009.survey22;
run;
/* 4.Data set Survey2 has five numeric variables (Q1–Q5), each with values
of 1, 2, 3, 4, or 5. You want to determine for each subject (observation)
if they responded with a 5 on any of the five questions. This is easily
done using the OR or the IN operators. However, for this question, use an
array to check each of the five questions. Set variable (ANY5) equal to Yes
if any of the five questions is a 5 and No otherwise.*/
data a15009.any5;
set a15009.survey2;
array Ques{5} Q1-Q5;
Any5 = 'No ';
do i = 1 to 5;
if Ques{i} = 5 then do;
Any5 = 'Yes';
leave;
end;
end;
drop i;
run;
proc print data=a15009.any5;
run;
Chapter 14 Displaying Your Data
/*1 List the first 10 observations in data set Blood. Include only the
variables Subject,WBC (white blood cell), RBC (red blood cell), and Chol.
Label the last three variables “White Blood Cells,” “Red Blood Cells,” and
“Cholesterol,” respectively. Omit the Obs column, and place Subject in the
first column. Be sure the column headings are the variable labels, not the
variable names.*/
proc print data=a15009.blood (obs=10) label;
id Subject;
var WBC RBC Chol;
label WBC = 'White Blood Cells'
RBC = 'Red Blood Cells'
Chol = 'Cholesterol';
run;
/*2 Using the data set Sales, create the report shown here:*/
proc sort data=a15009.sales out=a15009.sales;
by Region;
run;
proc print data=a15009.sales;
by Region;
id Region;
var Quantity TotalSales;
sumby Region;
run;
/*4.List the first five observations from data set Blood. Print only
variables Subject, Gender, and BloodType. Omit the Obs column.*/
proc print data=a15009.blood(obs=5) noobs;
var Subject Gender BloodType;
run;
Chapter 15 Creating Customized Reports
/*2 Using the Blood data set, produce a summary report showing the average
WBC and RBC count for each value of Gender as well as an overall average.
Your report should look like this:*/
proc report data=a15009.blood nowd headline;
column Gender WBC RBC;
define Gender / group width=6;
define WBC / analysis mean "Average WBC"
width=7 format=comma6.0;
define RBC / analysis mean "Average RBC"
width=7 format=5.2;
rbreak after / dol summarize;
run;
quit;
/*4 Using the SAS data set BloodPressure, compute a new variable in your
report. This variable (Hypertensive) is defined as Yes for females
(Gender=F) if the SBP is greater than 138 or the DBP is greater than 88 and
No otherwise. For males (Gender=M), Hypertensive is defined as Yes if the
SBP is over 140 or the DBP is over 90 and No otherwise. Your report should
look like this:*/
*Data set BLOODPRESSURE;
proc report data=a15009.bloodpressure nowd;
column Gender SBP DBP Hypertensive;
define Gender / Group width=6;
define SBP / display width=5;
define DBP / display width=5;
define Hypertensive / computed "Hypertensive?" width=13;
compute Hypertensive / character length=3;
*one IF / ELSE IF chain, so the male test cannot overwrite a female Yes with No;
if Gender = 'F' and (SBP gt 138 or DBP gt 88)
then Hypertensive = 'Yes';
else if Gender = 'M' and
(SBP gt 140 or DBP gt 90)
then Hypertensive = 'Yes';
else Hypertensive = 'No';
endcomp;
run;
quit;
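/* Cross-check sketch, not the author's method: the same hypertension rule
written as a DATA step, so the computed report column can be verified
against a plain listing. The data set name work.bp_check is hypothetical. */
data work.bp_check;
set a15009.bloodpressure;
length Hypertensive $ 3;
*females: SBP over 138 or DBP over 88. Males: SBP over 140 or DBP over 90;
if Gender = 'F' and (SBP gt 138 or DBP gt 88) then Hypertensive = 'Yes';
else if Gender = 'M' and (SBP gt 140 or DBP gt 90) then Hypertensive = 'Yes';
else Hypertensive = 'No';
run;
proc print data=work.bp_check noobs;
var Gender SBP DBP Hypertensive;
run;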
/*6 Using the SAS data set BloodPressure, produce a report showing Gender,
Age, SBP, and DBP. Order the report in Gender and Age order as shown
here:*/
proc report data=a15009.bloodpressure nowd;
column Gender Age SBP DBP;
define Gender / order width=6;
define Age / order width=5;
define SBP / display "Systolic Blood Pressure" width=8;
define DBP / display "Diastolic Blood Pressure" width=9;
run;
quit;
/*8 Using the data set Blood, produce a report like the one here. The
numbers in the table are the average WBC and RBC counts for each
combination of blood type and gender.*/
proc report data=a15009.bloodnew nowd headline;
column BloodType Gender,WBC Gender,RBC;
define BloodType / group 'Blood Type' width=5;
define Gender / across width=8 '-Gender-';
define WBC / analysis mean format=comma8.;
define RBC / analysis mean format=8.2;
run;
quit;
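/* Cross-check sketch, not part of the original solution: PROC MEANS with a
CLASS statement lists the mean WBC and RBC for each combination of
BloodType and Gender, which should match the cells of the report above.
It reads the same bloodnew data set as the PROC REPORT step. */
proc means data=a15009.bloodnew mean maxdec=2;
class BloodType Gender;
var WBC RBC;
run;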

SAS Ron Cody Solutions for even Number problems from Chapter 7 to 15

  • 1.
    Ayapparaj / PraxisBusiness School 1Chapter 7 Chapter 7 Performing Conditional Processing /* 2. Using the SAS data set Hosp, use PROC PRINT to list observations for Subject values of 5, 100, 150, and 200. Do this twice, once using OR operators and onceusing the IN operator. Note: Subject is a numeric variable */ data a15009.hospques2; set a15009.hosp; where Subject = 5 or Subject = 100 or Subject = 150 or Subject = 200; *using or function in where statement to give the condition given; run; /* OR */ data a15009.hospques22; set a15009.hosp; where Subject in(5,100,150,200); *using or function in where statement to give the condition given; run; proc print data=a15009.hospques22; run; /*4. Using the Sales data set, create a new, temporary SAS data set containing Region and TotalSales plus a new variable called Weight with values of 1.5 for the North Region, 1.7 for the South Region, and 2.0 for the West and East Regions. Use a SELECT statement to do this */ data a15009.salesques4; set a15009.sales (keep = TotalSales Region); *dataset is in the blog folder uploaded in Dropbox; select; *using select statement for giving the conditions and values associated each condition; when (Region = 'North') Weight = 1.5; when (Region = 'South') Weight = 1.7; when (Region = 'East') Weight = 2.0; when (Region = 'West') Weight = 2.0; otherwise; end; run;
  • 2.
    Ayapparaj / PraxisBusiness School 2Chapter 8 proc print data=a15009.Salesques4; run; /*6. Using the Sales data set, list all the observations where Region is North and Quantity is less than 60. Include in this list any observations where the customer name (Customer) is Pet's are Us */ data a15009.salesques6; set a15009.sales; where Region = 'North' and Quantity < 60; *using where statement to specify condition for region and quantity; run; proc print data=a15009.Salesques6; run; Chapter 8 Performing Iterative Processing: Looping /*2. Run the program here to create a temporary SAS data set (MonthSales): data monthsales; input month sales @@; ---add your line(s) here--- datalines; 1 4000 2 5000 3 . 4 5500 5 5000 6 6000 7 6500 8 4500 9 5100 10 5700 11 6500 12 7500 ;
  • 3.
    Ayapparaj / PraxisBusiness School 3Performing Iterative Processing: Looping Modify this program so that a new variable, SumSales, representing Sales to date, is added to the data set. Be sure that the missing value for Sales in month 3 does not result in a missing value for SumSales */ data a15009.monthsales; input month sales @@; sumsales+sales; *using sum function above for both sum of sales and sum of sales; retain sumsales 0; *using retain function initializing sumsales variable to 0; datalines; 1 4000 2 5000 3 . 4 5500 5 5000 6 6000 7 6500 8 4500 9 5100 10 5700 11 6500 12 7500 ; proc print data=a15009.monthsales; run; /*4. Count the number of missing values for the variables A, B, and C in the Missing data set. Add the cumulative number of missing values to each observation (use variable names MissA, MissB, and MissC). Use the MISSING function to test for the missing values */ data a15009.missingdata; input X $ Y Z A; if missing(X) then misscounterX+1; if missing(Y) then misscounterY+1; if missing(Z) then misscounterZ+1; if missing(A) then misscounterA+1; *using sum function for finding number of missing values in each variable; datalines; M 56 68 89 F 33 60 71 M 45 91 . F 35 35 68 M . 71 81 M 50 68 71 . 23 60 46 M 65 72 103 . 35 65 67 M 15 71 75 ; proc print data=a15009.missingdata;run;
  • 4.
    Ayapparaj / PraxisBusiness School 4Performing Iterative Processing: Looping /*6. Repeat Problem 5, except have the range of N go from 5 to 100 by 5 */ data a15009.logy2; do N=5 to 100 by 5; *using do loop to initialize N variable and assign values from 5 to 100 in increments of 5; LogN=LOG(N); output; end; run; proc print data=a15009.logy2;run; *8. Use an iterative DO loop to plot the following equation: Logit = log(p / (1 – p)) Use values of p from 0 to 1 (with a point at every .05). Using the following GPLOT statements will produce a very nice plot. (If you do not have SAS/GRAPH software, use PROC PLOT to plot your points). goptions reset=all ftext='arial' htext=1.0 ftitle='arial/bo' htitle=1.5 colors=(black); symbol v=none i=sm; title "Logit Plot"; proc gplot data=logitplot; plot Logit * p;
  • 5.
    Ayapparaj / PraxisBusiness School 5Performing Iterative Processing: Looping run; quit;*/ data a15009.logitplot; do p=0 to 1 by 0.05; *using do loop to initialize p variable with values from 0 to 1 increasing by 0.05; Logit=LOG(p/(1-p)); output; end; run; goptions reset=all ftext='arial' htext=1.0 ftitle='arial/bo' htitle=1.5 colors=(black); symbol v=none i=sm; title "Logit Plot"; proc gplot data=a15009.logitplot; *using gplot procedure to make plotting; plot Logit * p; run; quit; /*10. You are testing three speed-reading methods (A, B, and C) by randomly assigning 10 subjects to each of the three methods. You are given the results as three lines of reading speeds, each line representing the results from each of the three methods, respectively. Here are the results: 250 255 256 300 244 268 301 322 256 333 267 275 256 320 250 340 345 290 280 300 350 350 340 290 377 401 380 310 299 399 Create a temporary SAS data set from these three lines of data. Each observation should contain Method (A, B, or C), and Score. There should be 30 observations in this data set. Use a DO loop to create the Method variable and remember to use a single trailing @ in your INPUT statement. Provide a listing of this data set using PROC PRINT */ data a15009.reading; do Method = 'MethodA','MethodB','MethodC'; *using do loop to initialize method variable with three values; do SNo=1 to 10; input score @; output;end;end; datalines; 250 255 256 300 244 268 301 322 256 333 267 275 256 320 250 340 345 290 280 300
  • 6.
    Ayapparaj / PraxisBusiness School 6Performing Iterative Processing: Looping 350 350 340 290 377 401 380 310 299 399 ; proc print data=a15009.reading noobs; var Method score; run; /* 12. You place money in a fund that returns a compound interest of 4.25% annually. You deposit $1,000 every year. How many years will it take to reach $30,000? Do not use compound interest formulas. Rather, use “brute force” methods with DO WHILE or DO UNTIL statements to solve this problem */ data a15009.inte; interest = 0.0425;*initializing the interest variable; total=1000; *initializing the total valuable; do year = 1 to 100 by 1 until (total ge 30000); *specifying values for year and condition for total to stop the loop when the value reaches 30000; total=total+interest*total; output; end; format total dollar11.2; *specifying format for total variable; run; proc print data=a15009.inte;run;
  • 7.
    Ayapparaj / PraxisBusiness School 7Performing Iterative Processing: Looping /*14. Generate a table of integers and squares starting at 1 and ending when the square value is greater than 100. Use either a DO UNTIL or DO WHILE statement to accomplish this*/ *using DO UNTIL; data a15009.square; do Integers = 1 to 100 until (squares ge 100); *using do until taking values from 1 to 100 and specifying the condition for squares variable to stop the loop when it reaches 100; Squares = Integers * integers; output;end;run; proc print data=a15009.square;run; *using IF STMT; data a15009.square; do Integers = 1 to 100 by 1; Squares = Integers * integers; if Squares gt 100 then leave; output;end;run; proc print data=a15009.square;run;
  • 8.
    Ayapparaj / PraxisBusiness School 8Chapter 9 Working with Dates Chapter 9 Working with Dates /* 2. Using the following lines of data, create a temporary SAS data set called ThreeDates. Each line of data contains three dates, the first two in the form mm/dd/yyyy descenders and the last in the form ddmmmyyyy. Name the three date variables Date1, Date2, and Date3. Format all three using the MMDDYY10. format. Include in your data set the number of years from Date1 to Date2 (Year12) and the number of years from Date2 to Date3 (Year23). Round these values to the nearest year. Here are the lines of data (note that the columns do not line up): 01/03/1950 01/03/1960 03Jan1970 05/15/2000 05/15/2002 15May2003 10/10/1998 11/12/2000 25Dec2005 */ *loading the values as a separate data set in permanent library; data a15009.three; input @1 Date1 mmddyy10. @12 Date2 mmddyy10. @23 Date3 date9.; format Date1 Date2 Date3 mmddyy10.; datalines; 01/03/1950 01/03/1960 03Jan1970 05/15/2000 05/15/2002 15May2003 10/10/1998 11/12/2000 25Dec2005 ; *accessing the values from the above dataset using set function Using yrdif function to calculate difference between date1,date2 and date3 variables and rounding them using round command along with yrdif; data a15009.threedates; set a15009..three; year12=round(yrdif(Date1,Date2,'Actual')); year23=round(yrdif(Date2,Date3,'Actual')); run; proc print data=threedates; run; proc print data=a15009.threedates;run; /* 4. Using the Hosp data set, compute the subject’s ages two ways: as of January 1, 2006 (call it AgeJan1), and as of today’s date (call it AgeToday). The variable DOB represents the date of birth. Take the integer portion of both ages. List the first 10 observations */ data a15009.hospques4; set a15009.hosp; AgeToday=int(yrdif(DOB,today(),'Actual'));
  • 9.
    Ayapparaj / PraxisBusiness School 9Chapter 9 Working with Dates AgeJan1=int(yrdif(DOB,'01Jan2006'd,'Actual')); *using yrdif to find the difference between DOB and today’s date and int to get only integer value of the difference; run; proc print data=a15009.hospques4;run; /* 6. Using the Medical data set, compute frequencies for the days of the week for the date of the visit (VisitDate). Supply a format for the days of the week and months of the year */ *loading the medical dataset in the permanent library; data a15009.medical; input @1 VisitDate mmddyy10. @12 patno $3.; datalines; 11/29/2003 879 11/30/2003 880 09/04/2003 883 08/28/2003 884 09/04/2003 885 08/26/2003 886 08/31/2003 887 08/25/2003 888 11/16/2003 913 11/15/2003 914 ; run; data a15009.sevenques6; set a15009.medical(keep=VisitDate); *taking medical data using set function; Days = weekday(VisitDate); *fetching weekday from visitdate variable; run; proc format; *providing format for days variable; value days 1='Sun' 2='Mon' 3='Tue' 4='Wed' 5='Thu' 6='Fri' 7='Sat'; run;
  • 10.
    Ayapparaj / PraxisBusiness School 10Chapter 9 Working with Dates title "Frequencies for Visit Dates"; proc freq data=a15009.sevenques6; tables Days / nocum nopercent; format Days days.; run; /* 8. Using the values for Day, Month, and Year in the raw data below, create a temporary SAS data set containing a SAS date based on these values (call it Date) and format this value using the MMDDYY10. format. Here are the Day, Month, and Year values: 25 12 2005 1 1 1960 21 10 1946 */ *storing the data in the permanent library; data a15009.dataset; input Day Month Year; datalines; 25 12 2005 1 1 1960 21 10 1946 ; data a15009.sevenques8; set a15009.dataset; Date = mdy(Month,Day,Year); *merging the day month year values into mmddyyyy format; format Date mmddyy10.; run; proc print data=a15009.sevenques8;run; /* 10. Using the Hosp data set, compute the number of months from the admission date (AdmitDate) and December 31, 2007 (call it MonthsDec). Also, compute the number of months from the admission date to today's date (call it MonthsToday). Use a date interval function to solve this problem. List the first 20 observations for your solution */
  • 11.
    Ayapparaj / PraxisBusiness School 11Chapter 9 Working with Dates data a15009.sevenques10; set a15009.hosp; *you can find hosp dataset in the blog folder uploaded in the dropbox; MonthDec = intck('month',AdmitDate,'31Dec2007'd); *using intck function to find month difference between admitdate and 31Dec2007; MonthToday = intck('month',AdmitDate,today()); run; proc print data=a15009.sevenques10; run; /* 12. You want to see each patient in the Medical data set on the same day of the week 5 weeks after they visited the clinic (the variable name is VisitDate). Provide a listing of the patient number (Patno), the visit date, and the date for the return visit */ data a15009.sevenques12; set a15009.medical; Followdate=intnx('month',VisitDate,5,'sameday'); *using intcx function to execute the specified condition; run; proc print data=a15009.sevenques12; format Followdate VisitDate date9.; run;
  • 12.
    Ayapparaj / PraxisBusiness School 12Chapter 10 Chapter 10 Subsetting and Combining SAS Data Sets /* 2.Using the SAS data set Hosp, create a temporary SAS data set called Monday2002, consisting of observations from Hosp where the admission date (AdmitDate) falls on a Monday and the year is 2002. Include in this new data set a variable called Age, computed as the person’s age as of the admission date, rounded to the nearest year */ data a15009.monday2002; set a15009.hosp; *you can take hosp dataset from blog folder uploaded in dropbox; where year(AdmitDate) eq 2002 and weekday(AdmitDate) eq 2; *using where statement to specify the condition for AdmitDate Weekday gives value of Monday as 2 as series starts from 1 for Sunday Year(admitdate) gives year value of admitdate; Age = round(yrdif(DOB,AdmitDate,'Actual')); *using yrdif function to find difference between DOB and AdmitDate; run; title "Listing of MONDAY2002"; proc print data=a15009.monday2002; run; /* 4. Using the SAS data set Bicycles, create two temporary SAS data sets as follows: Mountain_USA consists of all observations from Bicycles where State is Uttar Pradesh and Model is Mountain. Road_France consists of all
  • 13.
    Ayapparaj / PraxisBusiness School 13Subsetting and Combining SAS Data Sets observations from Bicycles where State is Maharastra and Model is Road Bike. Print these two data sets */ data a15009.Mountain_USA a15009.Road_France; set a15009.Bicycles; *bicycle dataset is available in the blog folder uploaded in dropbox; if State="Uttar Pradesh" and Model="Mountain Bike" then output a15009.Mountain_USA; else if State="Maharastra" and Model="Road Bike" then output a15009.Road_France; run; *introducing two new datasets as a15009.Mountain_USA a15009.Road_France and saving the observations to both the datasets based on the conditions given; proc print data= a15009.Mountain_USA;run; proc print data= a15009.Road_France;run; /*6. Repeat Problem 5, except this time sort Inventory and NewProducts first (create two temporary SAS data sets for the sorted observations). Next, create a new, temporary SAS data set (Updated) by interleaving the two temporary, sorted SAS data sets. Print out the result.*/ *sorting inventory dataset by model variable; proc sort data=a15009.inventory out=a15009.inventory; by Model; run; *sorting newproducts dataset by model variable; proc sort data=a15009.newproducts out=a15009.newproducts; by Model; run; *merging all the rows of both the datasets into a single dataset updated; data a15009.updated; set a15009.inventory a15009.newproducts; by Model; run; title "Listing of UPDATED"; proc print data=a15009.updated; run;
  • 14.
    Ayapparaj / PraxisBusiness School 14Subsetting and Combining SAS Data Sets /* 8. Run the program here to create a SAS data set called Markup: data markup; input manuf : $10. Markup; datalines; Cannondale 1.05 Trek 1.07 ; Combine this data set with the Bicycles data set so that each observation in the Bicycles data set now has a markup value of 1.05 or 1.07, depending on whether the bicycle is made by Cannondale or Trek. In this new data set (call it Markup_Prices),create a new variable (NewTotal) computed as TotalCost times Markup */ *combining both datasets using manuf variable; data a15009.combi; merge a15009.bicycles (rename=(Manuf=manuf)) a15009.markup2; by manuf; newtotal=sum(unitcost); run; proc print data=a15009.combi;run; data a15009.markup2; input manuf : $10. Markup; datalines; Atlas 1.05 Hero 1.07 ; *sorting markup2 data by manuf variable; proc sort data=a15009.markup2; by manuf; run; *sorting markup2 data by manuf variable here the thing to note is manufacturer is the label name not variable name; proc sort data=a15009.Bicycles; by Manuf; run;
  • 15.
    Ayapparaj / PraxisBusiness School 15Subsetting and Combining SAS Data Sets /*10 Using the Purchase and Inventory data sets, provide a list of all Models (and the Price) that were not purchased*/ *sorting the inventory dataset by Model Variable; proc sort data=a15009.inventory out=a15009.inventory; by Model; run; *sorting the purchase dataset by Model Variable; proc sort data=a15009.purchase out=a15009.purchase; by Model; run; *merging two datasets by Model variable using "IN=" to filter the datsets to find model that were not purchased along with the proce; data a15009.notpurchased; merge a15009.inventory(in=InInventory)a15009.purchase(in=InPurchase); by Model; if InInventory and not InPurchase; keep Model Price; run; title "Listing of NOT_BOUGHT"; proc print data=a15009.notpurchased noobs; run; /*12 You want to merge two SAS data sets, Demographic and Survey1, based on an identifier. In Demographic, this identifier is called ID; in Survey1, the identifier is called Subj. Both are character variables.*/ *you can find both demographictwo and survey1 dataset in the blog folder uploaded in dropbox; proc sort data=a15009.demographictwo out=a15009.demographictwo; by ID;
  • 16.
    Ayapparaj / PraxisBusiness School 16Subsetting and Combining SAS Data Sets run; proc sort data=a15009.survey1 out=a15009.survey1; by Subj; run; data a15009.combine12ten; merge a15009.demographictwo a15009.survey1 (rename=(Subj = ID)); by ID; run; proc print data=a15009.combine12ten ; run; /*14 Data set Inventory contains two variables: Model (an 8-byte character variable) and Price (a numeric value). The price of Model M567 has changed to 25.95 and the price of Model X999 has changed to 35.99. Create a temporary SAS data set (call it NewPrices) by updating the prices in the Inventory data set*/ data a15009.modelnew; input Model $ Price; datalines; M567 25.95 X999 35.99 ; *sorting inventory data by model variable; proc sort data=a15009.inventory out=a15009.inventory; by Model; run; *updating inventory data with modelnew for price for the models; data a15009.newprices; update a15009.inventory a15009.modelnew; by Model; run; proc print data=a15009.newprices ; run;
  • 17.
    Ayapparaj / PraxisBusiness School 17Chapter 11 Chapter 11 Working with Numeric Functions /* 2. Count the number of missing values for WBC, RBC, and Chol in the Blood data set. Use the MISSING function to detect missing values */ data a15009.choly; set a15009.blood; *blood dataset is present in the blog folder uploaded in dropbox folder; if missing(Gender) then MissG+1; if missing(WBC) then MissWBC+1; if missing(RBC) then MissRBC+1; if missing(Chol) then MissChol+1; *using sum function to find the number of missing values in each variable; run; proc print data=a15009.choly;run; /* 4. The SAS data set Psych contains an ID variable, 10 question responses (Ques1– Ques10), and 5 scores (Score1–Score5). You want to create a new, temporary SAS data set (Evaluate) containing the following: a. A variable called QuesAve computed as the mean of Ques1–Ques10. Perform this computation only if there are seven or more non-missing question values. b. If there are no missing Score values, compute the minimum score (MinScore), the maximum score (MaxScore), and the second highest score (SecondHighest) */ data a15009.evaluate; set a15009.psych; *pysch dataset is present in the blog folder uploaded in dropbox folder; if n(of Ques1-Ques10) ge 7 then QuesAve=mean(of Ques1-Ques10); if n(of Score1-Score5) eq 5 then maxscore=max(of Score1-Score5); if n(of Score1-Score5) eq 5 then Minscore=min(of Score1-Score5); if n(of Score1-Score5) eq 5 then SecondHighest=largest(2,of Score1-Score5); *using if then stmt to find max score min score secondhighest of the score variables; run; proc print data=a15009.evaluate;run;
  • 18.
    Ayapparaj / PraxisBusiness School 18Working with Numeric Functions /* 6. Write a short DATA _NULL_ step to determine the largest integer you can score on your computer in 3, 4, 5, 6, and 7 bytes */ data _null_; set a15009.cons; put int3= int4= int5= int6= int7= ; run; *output will appear in the log window; /* 8. Create a temporary SAS data set (Random) consisting of 1,000 observations, each with a random integer from 1 to 5. Make sure that all integers in the range are equally likely. Run PROC FREQ to test this assumption */ data a15009.random; do i=1 to 1000; x=int(rand('uniform')*5)+1 /*OR*/ x=int(ranuni(0)*5+1);output ;end; *here am using rand function to get random value between 1 and 5; run; proc freq data=a15009.random; tables x/missing;run; /* 10. Data set Char_Num contains character variables Age and Weight and numeric variables SS and Zip. Create a new, temporary SAS data set called Convert with new variables NumAge and NumWeight that are numeric values of Age and Weight, respectively, and CharSS and CharZip that are character variables created from SS and Zip. CharSS should contain leading 0s and dashes in the appropriate places for Social Security numbers and CharZip should contain leading 0s Hint: The Z5. format includes leading 0s for the ZIP code */
  • 19.
    Ayapparaj / PraxisBusiness School 19Working with Numeric Functions data a15009.convert; set a15009.char_num; *char_num dataset is present in the blog folder uploaded in dropbox folder; NumAge = input(Age,8.); NumWeight = input(weight,8.); *converting character variables weight and age into numeric variables; CharSS = put(SS,ssn11.); CharZip = put(Zip,z5.); *converting numeric variables SS and Zip into character variables; run; proc print data=a15009.convert; run; /* 12. Using the Stocks data set (containing variables Date and Price), compute daily changes in the prices. Use the statements here to create the plot. Note: If you do not have SAS/GRAPH installed, use PROC PLOT and omit the GOPTIONS and SYMBOL statements. goptions reset=all colors=(black) ftext=swiss htitle=1.5; symbol1 v=dot i=smooth; title "Plot of Daily Price Differences"; proc gplot data=difference; plot Diff*Date; run; quit; */ data a15009.difference; set a15009.stocks; Diff = Dif(Price); *using dif function to calculate the difference in thr price compared to the previous value; run; goptions reset=all colors=(black) ftext=swiss htitle=1.5; symbol1 v=dot i=smooth; title "Plot of Daily Price Differences"; proc gplot data=a15009.difference; plot Diff * Date; run;quit;
  • 20.
    Ayapparaj / PraxisBusiness School 20Chapter 12 Chapter 12 Working with Character Functions /*2 Using the data set Mixed, create a temporary SAS data set (also called Mixed) with the following new variables: a. NameLow – Name in lowercase b. NameProp – Name in proper case c. (Bonus – difficult) NameHard – Name in proper case without using the PROPCASE function*/ data a15009.mixed; set a15009.mixed; *you can find mixed dataset in the blog folder uploaded in dropbox; length First Last $ 15 NameHard $ 20; NameLow = lowcase(Name); *converting entire word into lower case; NameProp = propcase(Name); *making first letter of each work into uppercase; First = lowcase(scan(Name,1,' ')); *converting entire word into lower case; Last = lowcase(scan(Name,2,' ')); *converting entire word into lower case; substr(First,1,1) = upcase(substr(First,1,1)); *converting entire word into upper case; substr(Last,1,1) = upcase(substr(Last,1,1)); *converting entire word into upper case; NameHard = catx(' ',First,Last); *using catx making first letter of each work into uppercase,without using propcase; drop First Last; run; proc print data=a15009.mixed;
  • 21.
    Ayapparaj / PraxisBusiness School 21Working with Character Functions run; /*4 Data set Names_And_More contains a character variable called Height. As you can see in the listing in Problem 3, the heights are in feet and inches. Assume that these units can be in upper- or lowercase and there may or may not be a period following the units. Create a temporary SAS data set (Height) that contains a numeric variable (HtInches) that is the height in inches.*/ data a15009.height; set a15009.names_and_more(keep = Height); Height = compress(Height,'INFT.','i'); *using compress function with "i" argument to remove characters and to ignore cases; /* Alternative Height = compress(Height,' ','kd'); *keep digits and blanks; */ Feet = input(scan(Height,1,' '),8.); Inches = input(scan(Height,2,' '),?? 8.); *using scan function to extract values around the characters from the variable 1 value before space and 2 for value after two for ; if missing(Inches) then HtInches = 12*Feet; else HtInches = 12*Feet + Inches; drop Feet Inches; run; proc print data=a15009.height; run; /*6 Data set Study (shown here) contains the character variables Group and Dose. Create a new, temporary SAS data set (Study) with a variable called GroupDose by putting these two values together, separated by a dash. The length of the resulting variable should be 6 (test this using PROC CONTENTS or the SAS Explorer). Make sure that there are no blanks (except trailing blanks) in this value. Try this problem two ways: first using one of the CAT functions, and second without using any CAT functions*/ *Using CAT functions;
  • 22.
    Ayapparaj / PraxisBusiness School 22Working with Character Functions data a15009.study; set a15009.study; length GroupDose $ 6; GroupDose = catx('-',Group,Dose); *here we are using catx to supply "-" as a separator between Group and Dose variables; run; proc print data=a15009.study; run; *Without using CAT functions; data a15009.study; set a15009.study; length GroupDose $ 6; GroupDose = trim(Group) || '-' || Dose; *using trim function to trim any space around thr values in Group and Dose and join them and supply "-" in between the two values; run; proc print data=a15009.study; run; /*8 Notice in the listing of data set Study in Problem 6 that the variable called Weight contains units (either lbs or kgs). These units are not always consistent in case and may or may not contain a period. Assume an upper- or lowercase LB indicates pounds and an upper- or lowercase KG indicates kilograms. Create a new, temporary SAS data set (Study) with a numeric variable also called Weight (careful here) that represents weight in pounds, rounded to the nearest 10th of a pound. Note: 1 kilogram = 2.2 pounds*/ data a15009.study; set a15009.study(keep=Weight rename=(Weight = WeightUnits)); Weight = input(compress(WeightUnits,,'kd'),8.); *using compress(kd)inside input function to keep numerical values alone from the string and change if character variables present to numerical; if find(WeightUnits,'KG','i') then Weight = round(2.2*Weight,.1); *using find function with "i" argument to remove characters and to ignore cases; else if find(WeightUnits,'LB','i') then Weight = round(Weight,.1); run; proc print data=a15009.study; run;
  • 23.
    Ayapparaj / PraxisBusiness School 23Working with Character Functions /*10 Data set Errors contains character variables Subj (3 bytes) and PartNumber (8 bytes). (See the partial listing here.) Create a temporary SAS data set (Check1) with any observation in Errors that violates either of the following two rules: first, Subj should contain only digits, and second, PartNumber should contain only the uppercase letters L and S and digits. Here is a partial listing of Errors:*/ data a15009.violates_rules; set a15009.errors; where notdigit(trim(Subj)) or verify(trim(PartNumber),'0123456789LS'); *using notdigit to check any invalid character type value present Here you should use trim function along with notdigit because Without the TRIM function "not" function used here would return the position of the first trailing blank in each of the character values; run; proc print data=a15009.violates_rules; run; /*12 List the subject number (Subj) for any observations in Errors where PartNumber contains an upper- or lowercase X or D.*/ proc print data=a15009.errors; where findc(PartNumber,'XD','i'); *using findc function with argument "i" to find if the variable values contain any case ; var Subj PartNumber;
  • 24.
    Ayapparaj / PraxisBusiness School 24Working with Character Functions run; /*14. List all patients in the Medical data set where the word antibiotics is in the comment field (Comment).*/ title "Observations Involving the word Antibiotics"; proc print data=a15009.medicaltwo; where findw(Comment,'antibiotics'); *using findw function to find if the comment variable contain the word "antiboitics" in its values; run; << Medicaltwo dataset >> /*16 Provide a list, in alphabetical order by last name, of the observations in the Names_And_More data set. Set the length of the last name to 15 and remove multiple blanks from Name. Note: The variable Name contains a first name, one or more spaces, and then a last name.*/ data a15009.names; set a15009.names_and_more; length Last $ 15; Name = compbl(Name); *using compbl function to compress any blanks values present; Last = scan(Name,2,' '); *using scan function to take only second part of the name and store it the last vsriable; run; *sorting the data in names dataset based on last variable values; proc sort data=a15009.names; by Last; run; proc print data=a15009.names;
  • 25.
    Ayapparaj / PraxisBusiness School 25Chapter 13 Working with Arrays id Name; var Phone Height Mixed; run; Chapter 13 Working with Arrays /* 1 Using the SAS data set Survey1, create a new, temporary SAS data set (Survey1) where the values of the variables Ques1–Ques5 are reversed as follows: 1 ?? 5; 2 ?? 4; 3 ?? 3; 4 ?? 2; 5 ?? 1. Note: Ques1–Ques5 are character variables. Accomplish this using an array.*/ *Data set SURVEY; proc format library=a15009; value $gender 'M' = 'Male' 'F' = 'Female' ' ' = 'Not entered' other = 'Miscoded'; value age low-29 = 'Less than 30' 30-50 = '30 to 50' 51-high = '51+'; value $likert '1' = 'Strongly disagree' '2' = 'Disagree' '3' = 'No opinion' '4' = 'Agree' '5' = 'Strongly agree'; run; data a15009.survey12; set a15009.survey1; array Ques{5} $ Q1-Q5; *creating array with 5 values for storing variables from Q1 to Q5; do i = 1 to 5; Ques{i} = translate(Ques{i},'54321','12345'); *using do loop to create "i" variable with values from 1 to 5 and to reverse the question using translate function inside the Ques array; end; drop i; run; proc print data=a15009.survey12; run; /* 2.Redo Problem 1, except use data set Survey2. Note: Ques1–Ques5 are numeric variables.*/ data a15009.survey22; set a15009.survey2; array Ques{5} Q1-Q5;
do i = 1 to 5;
Ques{i} = 6 - Ques{i};
end;
drop i;
run;
proc print data=a15009.survey22;
run;
/* 4.Data set Survey2 has five numeric variables (Q1–Q5), each with values
of 1, 2, 3, 4, or 5. You want to determine for each subject (observation)
if they responded with a 5 on any of the five questions. This is easily
done using the OR or the IN operators. However, for this question, use an
array to check each of the five questions. Set variable (ANY5) equal to
Yes if any of the five questions is a 5 and No otherwise.*/
data a15009.any5;
set a15009.survey2;
array Ques{5} Q1-Q5;
Any5 = 'No ';
do i = 1 to 5;
if Ques{i} = 5 then do;
Any5 = 'Yes';
leave;
end;
end;
drop i;
run;
proc print data=a15009.any5;
run;
Chapter 14
Displaying Your Data
/*1 List the first 10 observations in data set Blood. Include only the
variables Subject, WBC (white blood cell), RBC (red blood cell), and Chol.
Label the last three variables “White Blood Cells,” “Red Blood Cells,” and
“Cholesterol,” respectively. Omit the Obs column, and place Subject in the
first column. Be sure the column headings are the variable labels, not the
variable names.*/
proc print data=a15009.blood (obs=10) label;
id Subject;
var WBC RBC Chol;
label WBC = 'White Blood Cells'
RBC = 'Red Blood Cells'
Chol = 'Cholesterol';
run;
/*2 Using the data set Sales, create the report shown here:*/
proc sort data=a15009.sales out=a15009.sales;
by Region;
run;
proc print data=a15009.sales;
by Region;
id Region;
var Quantity TotalSales;
sumby Region;
run;
/*4.List the first five observations from data set Blood. Print only
variables Subject, Gender, and BloodType. Omit the Obs column.*/
proc print data=a15009.blood(obs=5) noobs;
var Subject Gender BloodType;
run;
Chapter 15
Creating Customized Reports
/*2 Using the Blood data set, produce a summary report showing the average
WBC and RBC count for each value of Gender as well as an overall average.
Your report should look like this:*/
proc report data=a15009.blood nowd headline;
column Gender WBC RBC;
define Gender / group width=6;
define WBC / analysis mean "Average WBC" width=7 format=comma6.0;
define RBC / analysis mean "Average RBC" width=7 format=5.2;
rbreak after / dol summarize;
run;
quit;
/*4 Using the SAS data set BloodPressure, compute a new variable in your
report. This variable (Hypertensive) is defined as Yes for females
(Gender=F) if the SBP is greater than 138 or the DBP is greater than 88
and No otherwise. For males (Gender=M), Hypertensive is defined as Yes if
the SBP is over 140 or the DBP is over 90 and No otherwise. Your report
should look like this:*/
*Data set BLOODPRESSURE;
proc report data=a15009.bloodpressure nowd;
column Gender SBP DBP Hypertensive;
define Gender / Group width=6;
define SBP / display width=5;
define DBP / display width=5;
define Hypertensive / computed "Hypertensive?" width=13;
compute Hypertensive / character length=3;
/* one if / else-if chain, so the test for males cannot overwrite a Yes
already assigned to a female */
if Gender = 'F' and (SBP gt 138 or DBP gt 88) then Hypertensive = 'Yes';
else if Gender = 'M' and (SBP gt 140 or DBP gt 90) then Hypertensive = 'Yes';
else Hypertensive = 'No';
endcomp;
run;
quit;
/*6 Using the SAS data set BloodPressure, produce a report showing Gender,
Age, SBP, and DBP. Order the report in Gender and Age order as shown
here:*/
proc report data=a15009.bloodpressure nowd;
column Gender Age SBP DBP;
define Gender / order width=6;
define Age / order width=5;
define SBP / display "Systolic Blood Pressure" width=8;
define DBP / display "Diastolic Blood Pressure" width=9;
run;
quit;
/*8 Using the data set Blood, produce a report like the one here. The
numbers in the table are the average WBC and RBC counts for each
combination of blood type and gender.*/
proc report data=a15009.bloodnew nowd headline;
column BloodType Gender,WBC Gender,RBC;
define BloodType / group 'Blood Type' width=5;
define Gender / across width=8 '-Gender-';
define WBC / analysis mean format=comma8.;
define RBC / analysis mean format=8.2;
run;
quit;