How to Speed Up Your Validation
Process Without Really Trying?
Alice Cheng
Chiltern International
September 10, 2015
Objective:
To Provide Tips and Techniques to Speed Up
Your Validation Process of 2 datasets.
2 #WUSS15
Introduction
Structure of this Paper
Without Automation: Orthodox Use, Unorthodox Use
With Automation: %QCDATA, &SYSINFO, %QCDIR
Why Validate?
The Latin motto Finis origine pendet
means “The end depends on the
beginning”.
So if your data is not clean,
your analysis result
cannot be accurate.
Overview
PROC COMPARE allows users to:
• Compare 2 datasets;
• Compare variables against variables of the same dataset;
• Compare variables against variables of different names between 2 datasets.
Scope
We will concentrate on:
Comparing 2 Datasets
Scope (continued)
• Compare 2 datasets and their attributes (Type, Creation and Modified Dates, Number of Variables, Number of Observations and Labels).
• Check whether the same variables are in both datasets.
• Check Variable Attributes (Type, Length, Format, Informat, Label).
• Check whether observations match on the ID variables.
• Check whether there are duplicate observations based on the ID variables.
• Compare variable values.
Syntax
PROC COMPARE <option(s)>;
BY <DESCENDING> variable-1
<DESCENDING> variable-2 …
<NOTSORTED>;
ID <DESCENDING> variable-1 …
<DESCENDING> variable-2 …
<NOTSORTED>;
VAR variable(s);
WITH variable(s);
run;
Some Useful Options in PROC COMPARE
Statement
• BASE= option specifies the dataset to use as the base dataset for comparisons. Alias: DATA= option.
• COMPARE= option specifies the dataset to be used as the comparison dataset.
• LISTVAR lists all variables found in only one dataset.
• LISTOBS lists all observations that are found in only one dataset.
• LISTALL is equivalent to applying both LISTVAR and LISTOBS.
• LISTEQUALVAR lists variables whose values match.
Some Useful Options in PROC COMPARE
Statement (continued)
• ALLOBS includes in the report of value comparison results the values and, for numeric variables, the differences for all matching observations, even if they are judged equal.
• ALLSTATS prints a table of summary statistics for all pairs of matching variables.
• METHOD=ABSOLUTE|EXACT|PERCENT|RELATIVE specifies the method for judging the equality of numeric values. By default, METHOD=EXACT when CRITERION= is omitted.
• CRITERION=gamma specifies the criterion for judging the equality of numeric values. Default value is 0.00001.
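As a hedged illustration of METHOD= and CRITERION=, the sketch below builds two tiny stand-in datasets (WORK.BASE and WORK.COMP, with invented values) and compares them with an absolute tolerance. The tolerance of 0.001 is an assumption chosen for illustration only.

```sas
/* Two tiny stand-in datasets (values invented for illustration). */
data work.base;
  input id x;
  datalines;
1 10.0000
2 20.0004
;
run;

data work.comp;
  input id x;
  datalines;
1 10.0000
2 20.0000
;
run;

/* With METHOD=ABSOLUTE, two numeric values are judged equal */
/* when ABS(base - compare) <= CRITERION.                    */
proc compare base=work.base compare=work.comp
             method=absolute criterion=0.001;
  id id;
run;
```

Under these settings the difference of 0.0004 for ID 2 falls within the tolerance, so the values should be judged equal; without CRITERION= the default exact method would flag it.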
Some Useful Options in PROC COMPARE
Statement (continued)
• TRANSPOSE prints the reports of value differences by observation instead of by variable.
• BRIEFSUMMARY prints only a short comparison summary.
• NOVALUES suppresses the report of the value comparison results.
• NOPRINT suppresses all printed output.
• MAXPRINT=total|(per-variable, total) specifies the maximum number of differences to be printed.
Some Useful Options in PROC COMPARE
Statement (continued)
• OUTBASE, OUTCOMP, OUTDIF, OUTPERCENT and OUT= options generate an output dataset containing, for each pair of matching observations, the observation from the BASE dataset, the observation from the COMPARE dataset, an observation with the differences and an observation with the percent differences between the 2 datasets.
• OUTALL is equivalent to using these 4 options: OUTBASE, OUTCOMP, OUTDIF and OUTPERCENT.
• OUTNOEQUAL suppresses the writing of an observation to the output dataset when all values in the observation are judged equal.
PROC COMPARE Statements
• BY Statement allows BY group comparison. Please make sure the BASE and COMPARE datasets are sorted by the variables stated in the BY statement.
• ID Statement helps one to easily identify the observation(s) with discrepancies. Sorting the BASE and COMPARE datasets by the ID variables is highly recommended, even though a NOTSORTED option is available.
• VAR Statement restricts the comparison to the values of the variables named in this statement.
• WITH Statement compares variables in the BASE dataset with variables of different names in the COMPARE dataset. It is used together with the VAR statement.
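The VAR and WITH statements can be sketched as follows. This is a minimal example under stated assumptions: the datasets WORK.BASE and WORK.COMP, their values, and the variable name HEIGHT in the comparison dataset are all hypothetical and exist only to show a comparison between differently named variables.

```sas
/* Base dataset: height is stored in HEIGHTBL. */
data work.base;
  input subjid $ heightbl;
  datalines;
101 170
102 165
;
run;

/* Comparison dataset: the same measurement is stored in HEIGHT */
/* (a hypothetical name used only for this illustration).       */
data work.comp;
  input subjid $ height;
  datalines;
101 170
102 166
;
run;

/* VAR names the BASE variable; WITH names its counterpart in COMPARE. */
proc compare base=work.base compare=work.comp listall;
  id subjid;
  var heightbl;
  with height;
run;
```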
A Simple Example Using PROC COMPARE –
BASE Dataset PROD.ADSL
Note: The RED text above represents problematic data.
A Simple Example Using PROC COMPARE –
COMPARE Dataset QC.ADSL
Note: The RED text above represents problematic data.
Discrepancies: PROD.ADSL vs QC.ADSL
• PROD.ADSL has 8 variables and 8 observations while QC.ADSL has 7 variables and 7 observations.
• Variable AGEU exists only in PROD.ADSL.
• There are some discrepancies in the values of variables HEIGHTBL, BMIBL and COUNTRY.
• Subject 302 has an observation only in PROD.ADSL.
• Subject 303 has 2 observations in PROD.ADSL, even though ADSL is supposed to have only one record per subject.
• Subject 303 has only a single observation in QC.ADSL.
• Subject 304 exists only in QC.ADSL.
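A duplicate check like the one implied for Subject 303 can be sketched with PROC SQL. The dataset WORK.ADSL below is a hypothetical stand-in with invented subject IDs, not the actual PROD.ADSL.

```sas
/* Stand-in dataset: subject 303 appears twice (invented values). */
data work.adsl;
  input subjid $;
  datalines;
301
302
303
303
;
run;

/* Keep only the ID values that occur more than once. */
proc sql;
  create table work.dups as
  select subjid, count(*) as n_records
  from work.adsl
  group by subjid
  having count(*) > 1;
quit;

proc print data=work.dups;
run;
```

Running the same query against each real dataset before PROC COMPARE makes it clear whether an unmatched observation is a true mismatch or a duplicated key.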
EXAMPLE 1: COMPARE PROD.ADSL vs QC.ADSL
PROC COMPARE
BASE=PROD.ADSL COMPARE=QC.ADSL
LISTVAR LISTOBS LISTEQUALVAR
MAXPRINT=(20, 2400);
ID SUBJID;
RUN;
Again, the meanings of these options are:
• LISTVAR lists all variables found in only one dataset.
• LISTOBS lists all observations that are found in only one dataset.
• LISTEQUALVAR lists variables whose values match.
• MAXPRINT=total|(per-variable, total) specifies the maximum number of differences to be printed.
EXAMPLE 1: PROC COMPARE OUTPUT
(1). Data Set Summary
(2). Variable Summary
(3). Comparison Results for Observations
(4). Observation Summary
(5). Value Comparison Results for Variables
Note: BMIBL is partly based on HEIGHTBL, so the discrepancy in BMIBL for Subject 202 is most likely the result of the discrepancy in the HEIGHTBL value.
Note: The value for COUNTRY has been truncated because PROC COMPARE limits the display to 20 characters. This is true for the ID variables as well as for the variables being compared in this report. The plus symbols (+) most likely indicate that the variable is longer than the display width.
EXAMPLE 2: COMPARE PROD.ADSL vs.
PROD.ADSL PERFECT MATCH
PROC COMPARE
BASE=PROD.ADSL COMPARE=PROD.ADSL
LISTVAR LISTOBS LISTEQUALVAR;
ID SUBJID;
run;
Note: Since the BASE dataset and the COMPARE dataset are
the same, it is guaranteed to be a PERFECT MATCH!
EXAMPLE 2: COMPARE PROD.ADSL vs.
PROD.ADSL PERFECT MATCH (Continued)
Eureka!!! This is the favorite text that makes every
validator rejoice!
EXAMPLE 2: COMPARE PROD.ADSL vs.
PROD.ADSL PERFECT MATCH (Continued)
NOTE: No unequal values were found.
All values compared are exactly
equal.
Now for Example 2, we know these statements are true
since we are comparing PROD.ADSL with itself!!!
However, these statements are NOT the Holy Grail to
judge if this is truly a PERFECT MATCH!!!
A Highly Recommended Paper
Don’t Get Blindsided by PROC COMPARE
Joshua Horstman and Roger Muller
This paper contains 4 excellent examples why the
following statements
NOTE: No unequal values were found.
All values compared are exactly equal.
are NOT the ‘Holy Grail’, as described by Horstman and
Muller!!!
What is a Holy Grail?
The Holy Grail is a dish, plate, stone or cup that is part of
an important theme of Arthurian literature.
According to legend, it has special powers and is designed
to provide Happiness, Eternal Youth and Food in Infinite
Abundance. (Wikipedia)
What is the Holy Grail in validation?
We will get to this in detail later …
Another Highly Recommended Paper
A highly recommended paper on PROC COMPARE
introduction:
PROC COMPARE – Worth Another Look!
Christianna S. Williams
This paper contains 10 excellent examples starting from
the very basic.
Without Automation:
Orthodox Use
ORTHODOX USE TO SPEED UP VALIDATION
• Use Divide and Conquer Techniques
Use WHERE statement
PROC COMPARE BASE=PROD.ADSL
COMPARE=QC.ADSL LISTVAR LISTOBS
LISTEQUALVAR;
ID SUBJID;
WHERE SUBJID='303';
RUN;
ORTHODOX USE TO SPEED UP VALIDATION
(Continued)
• Use Divide and Conquer Techniques
Use VAR statement
PROC COMPARE BASE=PROD.ADSL
COMPARE=QC.ADSL LISTVAR LISTOBS
LISTEQUALVAR;
ID SUBJID;
VAR HEIGHTBL WEIGHTBL BMIBL;
RUN;
ORTHODOX USE TO SPEED UP VALIDATION
(Continued)
• Use Divide and Conquer Techniques
Apply DROP= to both BASE and COMPARE datasets
PROC COMPARE
BASE=PROD.ADSL (drop=HEIGHTBL WEIGHTBL BMIBL)
COMPARE=QC.ADSL (drop=HEIGHTBL WEIGHTBL BMIBL)
LISTVAR LISTOBS LISTEQUALVAR;
ID SUBJID;
RUN;
ORTHODOX USE TO SPEED UP VALIDATION
(Continued)
• Be Parsimonious in ID Variables
Because space on a page is limited, SAS can print only a limited number of ID variables. Keep only the ID variables that are critical.
Example: ID Variables for ADLB in a Single Study
Study ID, Subject ID, Lab Category, Lab Parameter, Analysis Visit, Analysis Date
ORTHODOX USE TO SPEED UP VALIDATION
(Continued)
• Be Parsimonious in ID Variable Values
Remember that PROC COMPARE output prints at most 20 characters for each variable. So keep the part of the value that is meaningful!
Example: Use SUBJID instead of USUBJID
USUBJID = 'ABCD-1234-56-0000000001'
SUBJID = '0000000001'
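If only USUBJID is available, a shorter ID can be derived before the comparison. The sketch below assumes that the subject number is the last hyphen-delimited segment of USUBJID, as in the example value above; that layout is an assumption and may differ between studies.

```sas
data work.adsl_short;
  length usubjid $40 subjid $20;
  usubjid = 'ABCD-1234-56-0000000001';
  /* A negative count makes SCAN read segments from the right, */
  /* so -1 returns the last hyphen-delimited segment.          */
  subjid = scan(usubjid, -1, '-');
  put subjid=;
run;
```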
ORTHODOX USE TO SPEED UP VALIDATION
(Continued)
• Reduce the Amount of Printed Output
Use MAXPRINT = total | (per-variable, total)
total = Maximum total number of differences to be printed.
per-variable = Maximum number of differences to be printed for each variable within a BY group.
ORTHODOX USE TO SPEED UP VALIDATION
(Continued)
• Output Results in a Dataset, Instead of Printing
• Use OUTBASE, OUTCOMP, OUTDIF, OUTPERCENT, OUTNOEQUAL and OUT= options.
• Suppress printing with NOPRINT option.
PROC COMPARE BASE=PROD.ADSL COMPARE=QC.ADSL
OUT=OUT_EX1 OUTBASE OUTCOMP OUTDIF
OUTPERCENT OUTNOEQUAL NOPRINT;
ID SUBJID;
RUN;
OUT_EX1 generated by OUT=OUT_EX1, OUTBASE, OUTCOMP, OUTDIF and
OUTPERCENT Options
Art Carpenter’s Method to Output Data with Discrepancies
• Transpose the output dataset from PROC COMPARE.
• This method creates a report that displays all
discrepancies for different variables one subject at a
time.
Reference: Carpenter’s Guide to Innovative SAS® Techniques
Without Automation:
Unorthodox Use
UNORTHODOX USE TO SPEED UP VALIDATION
Recall that when comparing PROD.ADSL against QC.ADSL, there are 2 discrepancies in BMIBL.
$\mathrm{BMI} = \dfrac{\text{Weight in kg}}{(\text{Height in m})^2}$
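The BMI formula can be used to re-derive the value and flag records where the stored BMIBL disagrees. This is a minimal sketch under several assumptions: WEIGHTBL is in kg, HEIGHTBL is in cm, BMIBL is stored to one decimal place, and the two records are invented for illustration.

```sas
data work.bmi_check;
  length flag $1;
  input subjid $ weightbl heightbl bmibl;
  /* Convert height from cm to m (assumed unit), then apply the formula. */
  bmi_recalc = round(weightbl / (heightbl / 100)**2, 0.1);
  /* Flag records whose stored BMI differs from the recalculated one. */
  if abs(bmi_recalc - bmibl) > 0.05 then flag = 'Y';
  else flag = 'N';
  datalines;
201 70 175 22.9
202 80 180 26.7
;
run;

proc print data=work.bmi_check;
run;
```

A flagged record with identical source values in both datasets points toward a derivation problem rather than a data problem.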
UNORTHODOX USE TO SPEED UP VALIDATION
(Continued)
• Put Source Variables from BASE and COMPARE Datasets in the ID Statement
PROC SORT DATA=PROD.ADSL OUT=PROD_ADSL;
BY SUBJID WEIGHTBL HEIGHTBL; RUN;
PROC SORT DATA=QC.ADSL OUT=QC_ADSL;
BY SUBJID WEIGHTBL HEIGHTBL; RUN;
PROC COMPARE BASE=PROD_ADSL COMPARE=QC_ADSL
LISTVAR LISTOBS LISTEQUALVAR;
ID SUBJID WEIGHTBL HEIGHTBL;
RUN;
UNORTHODOX USE TO SPEED UP VALIDATION
(Continued)
OUTPUT:
Note: For Subject 202, the value for HEIGHTBL in PROD_ADSL is different from that in QC_ADSL. Hence, the discrepancy in the BMIBL value is likely due to the discrepancy in the source variable HEIGHTBL.
Note: Where the values for the source variables HEIGHTBL and WEIGHTBL are the same, the discrepancy in BMIBL is likely due to a miscalculation or a misunderstanding of its definition.
UNORTHODOX USE TO SPEED UP VALIDATION
(Continued)
• Put Both Source Variables and the Variable to be Compared in the ID Statement
PROC SORT DATA=PROD.ADSL OUT=PROD_ADSL;
BY SUBJID WEIGHTBL HEIGHTBL BMIBL; RUN;
PROC SORT DATA=QC.ADSL OUT=QC_ADSL;
BY SUBJID WEIGHTBL HEIGHTBL BMIBL; RUN;
PROC COMPARE BASE=PROD_ADSL COMPARE=QC_ADSL
LISTVAR LISTOBS LISTEQUALVAR;
ID SUBJID WEIGHTBL HEIGHTBL BMIBL;
RUN;
UNORTHODOX USE TO SPEED UP VALIDATION
(Continued)
OUTPUT:
Now the 2 pairs of observations line up nicely for you to see!
• Same HEIGHTBL and WEIGHTBL values, but a different BMIBL value: most likely a miscalculation!
• Different HEIGHTBL values and different BMIBL values: most likely, the discrepancy is due to the values in the source variable HEIGHTBL!
UNORTHODOX USE TO SPEED UP VALIDATION
(Continued)
Caveat
• Of course, the source variables must still be present in both the BASE and COMPARE datasets.
• For this method, please put variables in the ID statement in the following order:
ID Original ID Variables
Source Variables
Variable to be Compared;
UNORTHODOX USE TO SPEED UP VALIDATION
(Continued)
Caveat
• There must be at least one variable other than the original ID variables, the source variables and the variable to be compared in both the BASE and COMPARE datasets. Otherwise, you will get a message like:
NOTE: Except for the 4 ID variables, the data
sets WORK.PROD_ADSL and WORK.QC_ADSL have no
variables to compare. Comparisons of data values
not performed.
With Automation:
%QCDATA
%QCDATA to Validate Individual Dataset
%QCDATA achieves the following:
• Sort BASE and COMPARE datasets based on ID variables.
• Identify and print out duplicate observations of BASE and COMPARE datasets upon request, if applicable.
• Allow users to specify ID variables to be used in PROC COMPARE.
• Allow users to drop variables not to be compared.
• Perform PROC COMPARE.
• Generate the PROC COMPARE report, an output dataset, and a dataset with the flag variables for duplication, the return code from PROC COMPARE and the status of 16 conditions based on &SYSINFO.
%QCDATA (Continued)
%macro QCDATA (
BASE= /*Specify BASE dataset. */
,COMPARE= /*Specify COMPARE dataset. */
,IDVARS= /*Specify ID variables (used in the invocations). */
,VARS= /*Specify variables to be compared. */
,DROPBASEVARS= /*Specify BASE variable to be dropped. */
,DROPCOMPVARS= /*Specify COMPARE variable to be dropped. */
,PRINTDUPS=Y /*Print Duplicate Observations. */
,COMPRPT=Y /*Generate PROC COMPARE report. */
,CREATEOUTDS=Y /*Create Output Dataset from PROC COMPARE.*/
,OUTDATA= /*Specify the name of Output Dataset. */
,OUTRC= /*Specify the Flag Variable/RC Data Name. */
,REFRESH=N /*Refresh dataset specified in OUTRC. */
);
%QCDATA (Continued)
LIBNAME PROD "C:STUDYABCABC-CL-1001DATAPRODADAM";
LIBNAME QC "C:STUDYABCABC-CL-1001DATAQCADAM";
LIBNAME MACROS "C:STUDYABCABC-CL-1001DATAQCADAM";
OPTIONS SASAUTOS=(MACROS, SASAUTOS);
%QCDATA(BASE=PROD.ADSL,COMPARE=QC.ADSL,IDVARS=SUBJID,
PRINTDUPS=Y,COMPRPT=Y,OUTDATA=COMPARE1,OUTRC=RC,REFRESH=Y);
%QCDATA(BASE=PROD.ADSL,COMPARE=PROD.ADSL,IDVARS=SUBJID,
PRINTDUPS=Y,COMPRPT=Y,OUTDATA=COMPARE2,OUTRC=RC,REFRESH=N);
%QCDATA(BASE=QC.ADSL,COMPARE=QC.ADSL,IDVARS=SUBJID,
PRINTDUPS=Y,COMPRPT=Y,OUTDATA=COMPARE3,OUTRC=RC,REFRESH=N);
%QCDATA(BASE=PROD.ADAE,COMPARE=QC.ADAE,IDVARS=SUBJID,
PRINTDUPS=Y,COMPRPT=Y,OUTDATA=COMPARE4,OUTRC=RC,REFRESH=N);
Notes on the invocations above:
• OUTDATA= generates an output dataset for each comparison result.
• REFRESH=Y at the first %QCDATA invocation refreshes the RC dataset, and OUTRC=RC generates the RC dataset.
%QCDATA (Continued)
RC Dataset Generated by 4 Invocations of %QCDATA.
• Flag Variable for Duplicate Observations in BASE Dataset
• Flag Variable for Duplicate Observations in COMPARE Dataset
• The remaining values are generated by/derived from &SYSINFO.
What is &SYSINFO?
Introduction to &SYSINFO
Generated by PROC COMPARE
Introduction to &SYSINFO from
PROC COMPARE
&SYSINFO contains a return code
summarizing the result of
comparison of the BASE dataset
against the COMPARE dataset.
Introduction to &SYSINFO from
PROC COMPARE (Continued)
PROC COMPARE …;
… more codes …;
run;
%let RC = &SYSINFO;
Note: It is critical to get the value of &SYSINFO immediately after PROC COMPARE, or its value may be overwritten by a later step.
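A concrete version of the pattern above might look like the following sketch. It compares SASHELP.CLASS with itself purely so that the example runs anywhere; the macro variable name RC follows the slide, and everything else is illustrative.

```sas
/* Compare a dataset with itself; NOPRINT keeps the output window clean. */
proc compare base=sashelp.class compare=sashelp.class noprint;
run;

/* Capture the return code right after the step boundary, */
/* before any other step has a chance to change it.       */
%let rc = &sysinfo;
%put NOTE: PROC COMPARE return code is &rc;
```

Because the two datasets are identical, the captured value should be 0 here.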
Introduction to &SYSINFO from
PROC COMPARE
So exactly what is &SYSINFO?
Introduction to &SYSINFO from
PROC COMPARE (Continued)
&SYSINFO is the sum of the code
for each condition that fails and
the conditions are …
Note: The codes are powers of 2: condition X has code 2**(X-1).
Introduction to &SYSINFO from
PROC COMPARE (Continued)
Because the code for each Failure Description is a distinct power of 2 (2**(X-1) for X = 1, …, 16) and no code is repeated in a test, once we know the sum of the codes (i.e., the value of &SYSINFO), we know which condition(s), if any, has/have been violated.
Introduction to &SYSINFO from
PROC COMPARE (Continued)
Example
Say, &SYSINFO resolves to 48.
From the table, we know that is the result of
16 + 32 = 48
So Condition 5 (Variable has different length) and Condition 6 (Variable has different label) have been violated.
Introduction to &SYSINFO from
PROC COMPARE (Continued)
48 is an easy number to decompose using the table.
What if we have, say, 5357? Which codes add up to this?
We can write 5357 in BINARY16. format and find the positions of the 1s.
Better yet, perform Bit Map Testing.
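The decomposition of a value such as 5357 can be sketched with the BINARY16. format and the BAND function. This is not the code from %QCDATA; it is an illustrative alternative, and the condition descriptions are given as recalled from the SAS documentation for PROC COMPARE (conditions 5 and 6 match the example in the text), so they should be checked against the documentation before use.

```sas
%let rc = 5357;   /* example value from the text */

data _null_;
  /* Descriptions as recalled from the SAS documentation (assumption). */
  array descr[16] $60 _temporary_ (
    'Data set labels differ'
    'Data set types differ'
    'Variable has different informat'
    'Variable has different format'
    'Variable has different length'
    'Variable has different label'
    'Base data set has observation not in comparison'
    'Comparison data set has observation not in base'
    'Base data set has BY group not in comparison'
    'Comparison data set has BY group not in base'
    'Base data set has variable not in comparison'
    'Comparison data set has variable not in base'
    'A value comparison was unequal'
    'Conflicting variable types'
    'BY variables do not match'
    'Fatal error: comparison not done'
  );
  rc = &rc;
  /* Show the 16-bit pattern; the rightmost bit is condition 1. */
  put rc= binary16.;
  do condition = 1 to 16;
    code = 2**(condition - 1);
    /* BAND returns a nonzero value when this bit is set in RC. */
    if band(rc, code) > 0 then
      put 'Condition ' condition 2. ' (code ' code 5. '): ' descr[condition];
  end;
run;
```

Since 5357 = 1 + 4 + 8 + 32 + 64 + 128 + 1024 + 4096, the loop should report conditions 1, 3, 4, 6, 7, 8, 11 and 13.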
Introduction to &SYSINFO from
PROC COMPARE (Continued)
Code for Bit Map Testing (found within %QCDATA)
/* Test for Data Set Label. */
if RC='1'b then do;
RC_1='Y';
put '<<< Data set labels differ.';
end;
/* Test for Data Set Types. */
if RC='1.'b then do;
RC_2='Y';
put '<<< Data set types differ.';
end;
Introduction to &SYSINFO from
PROC COMPARE (Continued)
• A &SYSINFO < 64 means the differences
are only due to ATTRIBUTES.
Introduction to &SYSINFO from
PROC COMPARE (Continued)
• Remember the Holy Grail?
It is &SYSINFO=0!!!
It means NO ERROR Found!!!
In my RC dataset output,
&SYSINFO=0 is equivalent
to RC_0=‘Y’.
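The two rules of thumb above (0 means no differences, a value below 64 means attribute differences only) can be turned into a small triage macro. The macro name %RC_CLASS and the message wording are hypothetical and are not part of %QCDATA.

```sas
%macro rc_class(rc);
  /* 0: none of the 16 conditions was flagged. */
  %if &rc = 0 %then
    %put NOTE: No differences detected (SYSINFO=0).;
  /* 1 to 63: only the attribute bits (codes 1 through 32) are set. */
  %else %if &rc < 64 %then
    %put NOTE: Attribute differences only (SYSINFO=&rc).;
  /* 64 or more: at least one observation, variable or value bit is set. */
  %else
    %put WARNING: Observation, variable or value differences (SYSINFO=&rc).;
%mend rc_class;

%rc_class(0)
%rc_class(48)
%rc_class(5357)
```

A triage like this lets a validator skip straight to the datasets that need a closer look.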
%QCDIR to Validate Datasets in
Production and QC Directories
%QCDIR to validate datasets in directory
PROD and QC are the directories and ADXXs are the datasets.
Note: DICT.IDVARS contains the IDVARS for each dataset.
By means of PROC CONTENTS of PROD._ALL_ and QC._ALL_, one can identify all datasets within each directory, then merge with DICT.IDVARS to create WORK.ALLDS. This is performed within %QCDIR.
Parameters in %QCDIR …
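The construction of WORK.ALLDS described above might be sketched as follows. This is not the code from %QCDIR. It assumes that the PROD, QC and DICT librefs are already assigned and that DICT.IDVARS holds one row per dataset with the columns MEMNAME (uppercase dataset name) and IDVARS; those column names are assumptions.

```sas
/* One row per variable per dataset; keep only the dataset name. */
proc contents data=PROD._ALL_ noprint out=work.prod_members(keep=memname);
run;
proc contents data=QC._ALL_ noprint out=work.qc_members(keep=memname);
run;

/* Reduce to one row per dataset and sort for the merge. */
proc sort data=work.prod_members nodupkey; by memname; run;
proc sort data=work.qc_members nodupkey; by memname; run;
proc sort data=DICT.IDVARS out=work.idvars; by memname; run;

/* Keep datasets present in both libraries and attach their ID variables. */
data work.allds;
  merge work.prod_members(in=in_prod)
        work.qc_members(in=in_qc)
        work.idvars;
  by memname;
  if in_prod and in_qc;
  length base compare $41;
  base    = catx('.', 'PROD', memname);
  compare = catx('.', 'QC', memname);
run;
```

Datasets found in only one library are dropped here; reporting them separately would be a reasonable extension.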
%QCDIR to validate datasets in directory
(Continued)
Code from %QCDIR
Note: Using CALL EXECUTE('%NRSTR(…)') to execute %QCDATA is important. CALL EXECUTE(…) by itself will generate incorrect results. This is because CALL EXECUTE executes macro statements immediately, while the DATA step is still running and before the generated PROC COMPARE steps have run, but here we need to resolve the value of &SYSINFO after each comparison. Adding %NRSTR( ) delays the macro execution until after the DATA step ends.
CALL EXECUTE('%NRSTR(%QCDATA(BASE='||BASE||
',COMPARE='||COMPARE||
',IDVARS='||IDVARS||
',OUTRC=ALLRC,REFRESH=Y));');
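The timing issue can be demonstrated with a small stand-in macro. %CMP_RC, WORK.CLASS2 and WORK.PAIRS are hypothetical names used only for this sketch; %CMP_RC is a simplified substitute for %QCDATA that just reports &SYSINFO.

```sas
/* Stand-in for %QCDATA: run one comparison and report its return code. */
%macro cmp_rc(base=, compare=);
  %local rc;
  proc compare base=&base compare=&compare noprint;
  run;
  %let rc = &sysinfo;
  %put NOTE: &base vs &compare: SYSINFO=&rc;
%mend cmp_rc;

/* A copy of SASHELP.CLASS with one changed value. */
data work.class2;
  set sashelp.class;
  if _n_ = 1 then age = age + 1;
run;

/* Driver dataset: one row per comparison to run. */
data work.pairs;
  length base compare $41;
  base = 'SASHELP.CLASS'; compare = 'SASHELP.CLASS'; output;
  base = 'SASHELP.CLASS'; compare = 'WORK.CLASS2';   output;
run;

/* %NRSTR hides the macro call from the macro processor until the   */
/* generated code is read back after this DATA step, so each %LET   */
/* runs after its own PROC COMPARE has finished.                    */
data _null_;
  set work.pairs;
  call execute('%nrstr(%cmp_rc(base=' || strip(base) ||
               ', compare=' || strip(compare) || '))');
run;
```

Removing %NRSTR from the string makes the %LET and %PUT statements run while the driver DATA step is still executing, so the reported values no longer belong to the comparisons they are printed with.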
Figure 5: Comparison of Datasets in Production vs. Validation Process
Conclusion
In this paper, the author goes through tips and techniques to speed up the validation process. For the 'Without Automation' techniques, Orthodox and Unorthodox methods are introduced.
For the 'With Automation' part, we have introduced %QCDATA, &SYSINFO and %QCDIR.
'How to Speed Up Your Validation Process Without Really Trying?' Maybe you do need to try a bit. But hopefully, these techniques will speed up your Validation Process. Happy Compare-ing!!!
Acknowledgment
• Art Carpenter, Caloxy Inc.
• Jane Eslinger, SAS Technical Support
• Boyd Roloff, Chiltern International
• Kevin Russell, SAS Technical Support
Contact Information
Name: Alice Cheng
Enterprise: Chiltern International
E-mail: alice.cheng@chiltern.com
alice_m_cheng@yahoo.com
The End

More Related Content

Similar to Speed Up Validation Without Really Trying

Guntur step by step survey logistic
Guntur   step by step survey logisticGuntur   step by step survey logistic
Guntur step by step survey logisticsigeledek
 
DESIGN OF 8-BIT COMPARATORS
DESIGN OF 8-BIT COMPARATORSDESIGN OF 8-BIT COMPARATORS
DESIGN OF 8-BIT COMPARATORSIRJET Journal
 
Database Management System Review
Database Management System ReviewDatabase Management System Review
Database Management System ReviewKaya Ota
 
Svd filtered temporal usage clustering
Svd filtered temporal usage clusteringSvd filtered temporal usage clustering
Svd filtered temporal usage clusteringLiang Xie, PhD
 
Data warehousing testing strategies cognos
Data warehousing testing strategies cognosData warehousing testing strategies cognos
Data warehousing testing strategies cognosSandeep Mehta
 
Session 3 MEASUREMENT MODEL EVALUATION
Session 3 MEASUREMENT MODEL EVALUATIONSession 3 MEASUREMENT MODEL EVALUATION
Session 3 MEASUREMENT MODEL EVALUATIONDr. Firdaus Basbeth
 
Eli plots visualizing innumerable number of correlations
Eli plots   visualizing innumerable number of correlationsEli plots   visualizing innumerable number of correlations
Eli plots visualizing innumerable number of correlationsLeonardo Auslender
 
6 integrity and security
6 integrity and security6 integrity and security
6 integrity and securityDilip G R
 
Part 4 of RNA-seq for DE analysis: Extracting count table and QC
Part 4 of RNA-seq for DE analysis: Extracting count table and QCPart 4 of RNA-seq for DE analysis: Extracting count table and QC
Part 4 of RNA-seq for DE analysis: Extracting count table and QCJoachim Jacob
 
ETL Validator Usecase -Metadata Comparison
ETL Validator Usecase -Metadata ComparisonETL Validator Usecase -Metadata Comparison
ETL Validator Usecase -Metadata ComparisonDatagaps Inc
 
ETL Validator Usecase -Metadata Comparison
ETL Validator Usecase -Metadata ComparisonETL Validator Usecase -Metadata Comparison
ETL Validator Usecase -Metadata ComparisonVasavi Chinta
 
Think of oracle and mysql bind value
Think of oracle and mysql bind value Think of oracle and mysql bind value
Think of oracle and mysql bind value Louis liu
 
On the Robustness and Discriminative Power of IR Metrics for Top-N Recommenda...
On the Robustness and Discriminative Power of IR Metrics for Top-N Recommenda...On the Robustness and Discriminative Power of IR Metrics for Top-N Recommenda...
On the Robustness and Discriminative Power of IR Metrics for Top-N Recommenda...Daniel Valcarce
 
Advanced MySQL Query Optimizations
Advanced MySQL Query OptimizationsAdvanced MySQL Query Optimizations
Advanced MySQL Query OptimizationsDave Stokes
 
R Programming For Beginners | R Language Tutorial | R Tutorial For Beginners ...
R Programming For Beginners | R Language Tutorial | R Tutorial For Beginners ...R Programming For Beginners | R Language Tutorial | R Tutorial For Beginners ...
R Programming For Beginners | R Language Tutorial | R Tutorial For Beginners ...Edureka!
 
Exploring Microoptimizations Using Tizen Code as an Example
Exploring Microoptimizations Using Tizen Code as an ExampleExploring Microoptimizations Using Tizen Code as an Example
Exploring Microoptimizations Using Tizen Code as an ExamplePVS-Studio
 
Bioinformatics t4-alignments wim_vancriekingev2013
Bioinformatics t4-alignments wim_vancriekingev2013Bioinformatics t4-alignments wim_vancriekingev2013
Bioinformatics t4-alignments wim_vancriekingev2013Prof. Wim Van Criekinge
 
A Verified Modern SAT Solver
A Verified Modern SAT SolverA Verified Modern SAT Solver
A Verified Modern SAT SolverKatie Naple
 
UsingAHP_02
UsingAHP_02UsingAHP_02
UsingAHP_02pbaxter
 

Similar to Speed Up Validation Without Really Trying (20)

Guntur step by step survey logistic
Guntur   step by step survey logisticGuntur   step by step survey logistic
Guntur step by step survey logistic
 
DESIGN OF 8-BIT COMPARATORS
DESIGN OF 8-BIT COMPARATORSDESIGN OF 8-BIT COMPARATORS
DESIGN OF 8-BIT COMPARATORS
 
Database Management System Review
Database Management System ReviewDatabase Management System Review
Database Management System Review
 
Svd filtered temporal usage clustering
Svd filtered temporal usage clusteringSvd filtered temporal usage clustering
Svd filtered temporal usage clustering
 
Data warehousing testing strategies cognos
Data warehousing testing strategies cognosData warehousing testing strategies cognos
Data warehousing testing strategies cognos
 
Session 3 MEASUREMENT MODEL EVALUATION
Session 3 MEASUREMENT MODEL EVALUATIONSession 3 MEASUREMENT MODEL EVALUATION
Session 3 MEASUREMENT MODEL EVALUATION
 
Eli plots visualizing innumerable number of correlations
Eli plots   visualizing innumerable number of correlationsEli plots   visualizing innumerable number of correlations
Eli plots visualizing innumerable number of correlations
 
6 integrity and security
Speed Up Validation Without Really Trying

  • 1. How to Speed Up Your Validation Process Without Really Trying? Alice Cheng Chiltern International September 10, 2015
  • 2. Objective: To Provide Tips and Techniques to Speed Up Your Validation Process of 2 datasets. #WUSS15
  • 3. Introduction: Structure of this Paper. Without Automation: Orthodox Use, Unorthodox Use. With Automation: %QCDATA, &SYSINFO, %QCDIR.
  • 5. Why Validate? The Latin motto Finis origine pendet means “The end depends on the beginning”. So if your data is not clean, your analysis result cannot be accurate.
  • 6. Overview: PROC COMPARE allows users to: compare 2 datasets; compare variables against other variables of the same dataset; compare variables against variables of different names between 2 datasets.
  • 7. Scope: We will concentrate on comparing 2 datasets.
  • 8. Scope (continued): Compare 2 datasets and their attributes (Type, Creation and Modified Dates, Number of Observations, Number of Variables and Labels). Check if the same variables are in both datasets. Check the variable attributes (Type, Length, Format, Informat, Label). Check if observations match on the ID variables. Check if there are duplicate observations based on the ID variables. Compare variable values.
  • 9. Syntax: PROC COMPARE <option(s)>; BY <DESCENDING> variable-1 <DESCENDING> variable-2 … <NOTSORTED>; ID <DESCENDING> variable-1 <DESCENDING> variable-2 … <NOTSORTED>; VAR variable(s); WITH variable(s); RUN;
  • 10. Syntax Used Here: PROC COMPARE <option(s)>; BY <DESCENDING> variable-1 <DESCENDING> variable-2 … <NOTSORTED>; ID <DESCENDING> variable-1 <DESCENDING> variable-2 … <NOTSORTED>; VAR variable(s); WITH variable(s); RUN;
  • 11. Some Useful Options in PROC COMPARE Statement: BASE= specifies the dataset to be used as the base dataset for comparisons (alias: DATA=). COMPARE= specifies the dataset to be used as the comparison dataset. LISTVAR lists all variables found in only one dataset. LISTOBS lists all observations found in only one dataset. LISTALL is equivalent to applying both LISTVAR and LISTOBS. LISTEQUALVAR lists variables whose values are judged equal.
  • 12. Some Useful Options in PROC COMPARE Statement (continued): ALLOBS includes in the value comparison report the values and, for numeric variables, the differences for all matching observations, even if they are judged equal. ALLSTATS prints a table of summary statistics for all pairs of matching variables. METHOD=ABSOLUTE|EXACT|PERCENT|RELATIVE specifies the method for judging the equality of numeric values. By default, METHOD=EXACT unless CRITERION= is specified. CRITERION=gamma specifies the criterion for judging the equality of numeric values. The default value is 0.00001.
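The interaction of METHOD= and CRITERION= can be sketched as follows. This is a minimal illustration only: the datasets TOL_BASE and TOL_COMP and their values are hypothetical and are not taken from the slides.

```sas
/* Hypothetical demo data: X differs by 0.00004 for subject 1. */
data tol_base;
  subj = 1; x = 10.00000; output;
  subj = 2; x = 20.00000; output;
run;

data tol_comp;
  subj = 1; x = 10.00004; output;
  subj = 2; x = 20.00000; output;
run;

/* Default (METHOD=EXACT): the small difference in X is reported as unequal. */
proc compare base=tol_base compare=tol_comp;
  id subj;
run;

/* METHOD=ABSOLUTE with CRITERION=0.0001: two values are judged equal
   when their absolute difference does not exceed the criterion. */
proc compare base=tol_base compare=tol_comp
             method=absolute criterion=0.0001;
  id subj;
run;
```

Comparing the two reports side by side shows whether a discrepancy is a genuine disagreement or only rounding noise.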
  • 13. Some Useful Options in PROC COMPARE Statement (continued): TRANSPOSE prints the reports of value differences by observation instead of by variable. BRIEFSUMMARY prints only a short comparison summary. NOVALUES suppresses the report of the value comparison results. NOPRINT suppresses all printed output. MAXPRINT=total|(per-variable, total) specifies the maximum number of differences to be printed.
  • 14. Some Useful Options in PROC COMPARE Statement (continued): OUT= names the output dataset. OUTBASE, OUTCOMP, OUTDIF and OUTPERCENT write to it, for each pair of matching observations, the observation from the BASE dataset, the observation from the COMPARE dataset, an observation with the differences and an observation with the percent differences, respectively. OUTALL is equivalent to using these 4 options: OUTBASE, OUTCOMP, OUTDIF and OUTPERCENT. OUTNOEQUAL suppresses the writing of an observation to the output dataset when all values in the observation are judged equal.
  • 15. PROC COMPARE Statements: The BY statement allows BY-group comparison. Please make sure the BASE and COMPARE datasets are sorted by the variables named in the BY statement. The ID statement helps one to easily identify the observation(s) with discrepancies. Sorting the BASE and COMPARE datasets by the ID variables is highly recommended, even though a NOTSORTED option is available. The VAR statement restricts the comparison to the variables named in it. The WITH statement names the variables in the COMPARE dataset to be compared against the variables in the VAR statement, which allows variables of different names to be compared.
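A short sketch of the VAR and WITH statements working together may help here. The datasets VW_BASE and VW_COMP and the variable names HEIGHT and HT are hypothetical, chosen only to show two differently named variables being compared.

```sas
/* Hypothetical data: the same measurement is named HEIGHT in the
   BASE dataset and HT in the COMPARE dataset. */
data vw_base;
  subjid = '101'; height = 170; output;
  subjid = '102'; height = 165; output;
run;

data vw_comp;
  subjid = '101'; ht = 170; output;
  subjid = '102'; ht = 166; output;
run;

proc compare base=vw_base compare=vw_comp;
  id subjid;
  var height;   /* variable in the BASE dataset                 */
  with ht;      /* matching variable in the COMPARE dataset     */
run;
```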
  • 16. A Simple Example Using PROC COMPARE – BASE Dataset PROD.ADSL. Note: The RED text above represents problematic data.
  • 17. A Simple Example Using PROC COMPARE – COMPARE Dataset QC.ADSL. Note: The RED text above represents problematic data.
  • 18. Discrepancies: PROD.ADSL vs QC.ADSL. PROD.ADSL has 8 variables and 8 observations while QC.ADSL has 7 variables and 7 observations. Variable AGEU exists only in PROD.ADSL. There are some discrepancies in the values of variables HEIGHTBL, BMIBL and COUNTRY. Subject 302 has an observation only in PROD.ADSL. Subject 303 has 2 observations in PROD.ADSL, even though ADSL is supposed to have only one record per subject. Subject 303 has only one observation in QC.ADSL. Subject 304 exists only in QC.ADSL.
  • 19. EXAMPLE 1: COMPARE PROD.ADSL vs QC.ADSL. PROC COMPARE BASE=PROD.ADSL COMPARE=QC.ADSL LISTVAR LISTOBS LISTEQUALVAR MAXPRINT=(20, 2400); ID SUBJID; RUN;
  • 20. Again, the meanings of these options are: LISTVAR lists all variables found in only one dataset. LISTOBS lists all observations found in only one dataset. LISTEQUALVAR lists variables whose values are judged equal. MAXPRINT=total|(per-variable, total) specifies the maximum number of differences to be printed.
  • 21. EXAMPLE 1: PROC COMPARE OUTPUT (1). Data Set Summary
  • 22. EXAMPLE 1: PROC COMPARE OUTPUT (2). Variable Summary
  • 23. EXAMPLE 1: PROC COMPARE OUTPUT (3). Comparison Results for Observations
  • 24. EXAMPLE 1: PROC COMPARE OUTPUT (4). Observation Summary
  • 25. EXAMPLE 1: PROC COMPARE OUTPUT (Continued) (5). Value Comparison Results for Variables
  • 26. EXAMPLE 1: PROC COMPARE OUTPUT (Continued) (5). Value Comparison Results for Variables. Note: BMIBL is partly based on HEIGHTBL, so the discrepancy in BMIBL for Subject 202 is most likely the result of the discrepancy in the HEIGHTBL value.
  • 27. EXAMPLE 1: PROC COMPARE OUTPUT (Continued) (5). Value Comparison Results for Variables. Note: The value for COUNTRY has been truncated because PROC COMPARE limits the display to 20 characters. This is true for the ID variables as well as for the variables being compared in this report. The plus symbol (+) most likely indicates that the value is longer than what is displayed.
  • 28. EXAMPLE 2: COMPARE PROD.ADSL vs. PROD.ADSL, PERFECT MATCH. PROC COMPARE BASE=PROD.ADSL COMPARE=PROD.ADSL LISTVAR LISTOBS LISTEQUALVAR; ID SUBJID; RUN; Note: Since the BASE dataset and the COMPARE dataset are the same, it is guaranteed to be a PERFECT MATCH!
  • 29. EXAMPLE 2: COMPARE PROD.ADSL vs. PROD.ADSL, PERFECT MATCH (Continued). Eureka!!! This is the favorite text that makes every validator rejoice!
  • 30. EXAMPLE 2: COMPARE PROD.ADSL vs. PROD.ADSL, PERFECT MATCH (Continued). NOTE: No unequal values were found. All values compared are exactly equal. For Example 2, we know these statements are true since we are comparing PROD.ADSL with itself!!! However, these statements are NOT the Holy Grail for judging whether this is truly a PERFECT MATCH!!!
  • 31. A Highly Recommended Paper: Don’t Get Blindsided by PROC COMPARE, by Joshua Horstman and Roger Muller. This paper contains 4 excellent examples of why the statements "NOTE: No unequal values were found. All values compared are exactly equal." are NOT the ‘Holy Grail’, as described by Horstman and Muller!!!
  • 32. What is a Holy Grail? The Holy Grail is a dish, plate, stone or cup that is part of an important theme of Arthurian literature. According to legend, it has special powers and is designed to provide Happiness, Eternal Youth and Food in Infinite Abundance. (Wikipedia)
  • 33. What is the Holy Grail in validation? We will get to this in detail later …
  • 34. Another Highly Recommended Paper: A highly recommended introduction to PROC COMPARE is PROC COMPARE – Worth Another Look! by Christianna S. Williams. This paper contains 10 excellent examples, starting from the very basics.
  • 37. ORTHODOX USE TO SPEED UP VALIDATION: Use Divide and Conquer Techniques. Use the WHERE statement. PROC COMPARE BASE=PROD.ADSL COMPARE=QC.ADSL LISTVAR LISTOBS LISTEQUALVAR; ID SUBJID; WHERE SUBJID='303'; RUN;
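As a variation on the WHERE example, the subset can be written as a WHERE= dataset option on each input, which makes the restriction explicit for both the BASE and COMPARE datasets. This is only a sketch; it assumes that the PROD and QC libraries are already assigned and that SUBJID is a character variable.

```sas
/* Restrict both inputs to one subject with WHERE= dataset options. */
proc compare base=prod.adsl(where=(subjid='303'))
             compare=qc.adsl(where=(subjid='303'))
             listvar listobs listequalvar;
  id subjid;
run;
```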
  • 38. ORTHODOX USE TO SPEED UP VALIDATION (Continued): Use Divide and Conquer Techniques. Use the VAR statement. PROC COMPARE BASE=PROD.ADSL COMPARE=QC.ADSL LISTVAR LISTOBS LISTEQUALVAR; ID SUBJID; VAR HEIGHTBL WEIGHTBL BMIBL; RUN;
  • 39. ORTHODOX USE TO SPEED UP VALIDATION (Continued): Use Divide and Conquer Techniques. Apply DROP= to both the BASE and COMPARE datasets. PROC COMPARE BASE=PROD.ADSL (DROP=HEIGHTBL WEIGHTBL BMIBL) COMPARE=QC.ADSL (DROP=HEIGHTBL WEIGHTBL BMIBL) LISTVAR LISTOBS LISTEQUALVAR; ID SUBJID; RUN;
  • 40. ORTHODOX USE TO SPEED UP VALIDATION (Continued): Be Parsimonious in ID Variables. Because space on a page is limited, SAS can print only a limited number of ID variables, so keep only the ID variables that are critical. Example: ID variables for ADLB in a single study: Study ID, Subject ID, Lab Category, Lab Parameter, Analysis Visit, Analysis Date.
  • 42. ORTHODOX USE TO SPEED UP VALIDATION (Continued): Be Parsimonious in ID Variable Values. Remember that PROC COMPARE output prints at most 20 characters for each variable, so keep the part of the value that is meaningful! Example: Use SUBJID instead of USUBJID. USUBJID = 'ABCD-1234-56-0000000001'; SUBJID = '0000000001'.
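If only the long identifier is available, a short one can be derived before the comparison. The sketch below assumes, purely for illustration, that the subject number is the last hyphen-delimited piece of USUBJID, as in the example value above; a real study may build USUBJID differently.

```sas
data _null_;
  length usubjid $23 subjid $10;
  usubjid = 'ABCD-1234-56-0000000001';
  /* Assumption: the short subject ID is the last hyphen-delimited piece. */
  subjid = scan(usubjid, -1, '-');
  put usubjid= subjid=;
run;
```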
  • 43. ORTHODOX USE TO SPEED UP VALIDATION (Continued): Reduce the Amount of Printed Output. Use MAXPRINT=total | (per-variable, total), where total is the maximum total number of differences to be printed and per-variable is the maximum number of differences to be printed for each variable within a BY group.
  • 45. ORTHODOX USE TO SPEED UP VALIDATION (Continued): Output Results in a Dataset, Instead of Printing. Use the OUTBASE, OUTCOMP, OUTDIF, OUTPERCENT, OUTNOEQUAL and OUT= options. Suppress printing with the NOPRINT option. PROC COMPARE BASE=PROD.ADSL COMPARE=QC.ADSL OUT=OUT_EX1 OUTBASE OUTCOMP OUTDIF OUTPERCENT OUTNOEQUAL NOPRINT; ID SUBJID; RUN;
  • 46. ORTHODOX USE TO SPEED UP VALIDATION (Continued): Output Results in a Dataset, Instead of Printing. OUT_EX1 is generated by the OUT=OUT_EX1, OUTBASE, OUTCOMP, OUTDIF and OUTPERCENT options.
  • 47. ORTHODOX USE TO SPEED UP VALIDATION (Continued): Output Results in a Dataset, Instead of Printing. Art Carpenter’s Method to Output Data with Discrepancies: Transpose the output dataset from PROC COMPARE. This method creates a report that displays all discrepancies for different variables one subject at a time. Reference: Carpenter’s Guide to Innovative SAS® Techniques.
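The idea of transposing the PROC COMPARE output dataset can be sketched as below. This is not Carpenter's code, only a minimal illustration of the general approach; the datasets TR_BASE and TR_COMP are hypothetical, each subject appears once, and only numeric variables are transposed to keep the result simple.

```sas
/* Hypothetical demo data with one discrepancy in HEIGHTBL. */
data tr_base;
  input subjid $ heightbl weightbl;
  datalines;
101 170 70
102 165 60
;
run;

data tr_comp;
  input subjid $ heightbl weightbl;
  datalines;
101 170 70
102 166 60
;
run;

/* Write BASE, COMPARE and DIF records only for observations that differ. */
proc compare base=tr_base compare=tr_comp
             out=cmp_out outbase outcomp outdif outnoequal noprint;
  id subjid;
run;

/* One row per subject and variable, one column per record type (_TYPE_). */
proc transpose data=cmp_out out=cmp_long(rename=(_name_=variable));
  by subjid;
  id _type_;
  var heightbl weightbl;
run;

proc print data=cmp_long noobs;
run;
```

The ID _TYPE_ statement requires that each record type occur only once per subject, so duplicate ID values would need to be resolved first.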
  • 50. UNORTHODOX USE TO SPEED UP VALIDATION: Recall that when comparing PROD.ADSL against QC.ADSL, there are 2 discrepancies in BMIBL. BMI = (Weight in kg) / (Height in meters)^2.
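The BMI formula can be written out in a DATA step as a quick check of a derived value. The numbers below are hypothetical, and the sketch assumes that HEIGHTBL is recorded in centimeters, which the slides do not state.

```sas
data _null_;
  weightbl = 70;    /* kg, hypothetical value                 */
  heightbl = 175;   /* cm, hypothetical value and unit        */
  /* Convert height to meters before squaring. */
  bmibl = weightbl / ((heightbl / 100) ** 2);
  put weightbl= heightbl= bmibl= 6.2;
run;
```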
  • 52. UNORTHODOX USE TO SPEED UP VALIDATION (Continued): Put Source Variables from the BASE and COMPARE Datasets in the ID Statement. PROC SORT DATA=PROD.ADSL OUT=PROD_ADSL; BY SUBJID WEIGHTBL HEIGHTBL; RUN; PROC SORT DATA=QC.ADSL OUT=QC_ADSL; BY SUBJID WEIGHTBL HEIGHTBL; RUN; PROC COMPARE BASE=PROD_ADSL COMPARE=QC_ADSL LISTVAR LISTOBS LISTEQUALVAR; ID SUBJID WEIGHTBL HEIGHTBL; RUN;
  • 54. UNORTHODOX USE TO SPEED UP VALIDATION (Continued): OUTPUT. Note: For Subject 202, the value of HEIGHTBL in PROD_ADSL is different from that in QC_ADSL. Hence, the discrepancy in the BMIBL value is likely due to the discrepancy in the source variable HEIGHTBL.
  • 55. UNORTHODOX USE TO SPEED UP VALIDATION (Continued): OUTPUT. Note: The values of the source variables HEIGHTBL and WEIGHTBL are the same, so the discrepancy in BMIBL is likely due to a miscalculation or a misunderstanding of its definition.
  • 56. UNORTHODOX USE TO SPEED UP VALIDATION (Continued): Put Both the Source Variables and the Variable to be Compared in the ID Statement. PROC SORT DATA=PROD.ADSL OUT=PROD_ADSL; BY SUBJID WEIGHTBL HEIGHTBL BMIBL; RUN; PROC SORT DATA=QC.ADSL OUT=QC_ADSL; BY SUBJID WEIGHTBL HEIGHTBL BMIBL; RUN; PROC COMPARE BASE=PROD_ADSL COMPARE=QC_ADSL LISTVAR LISTOBS LISTEQUALVAR; ID SUBJID WEIGHTBL HEIGHTBL BMIBL; RUN;
  • 58. UNORTHODOX USE TO SPEED UP VALIDATION (Continued): OUTPUT. Now the 2 pairs of observations line up nicely for you to see! Same HEIGHTBL and WEIGHTBL values, but different BMIBL values. Most likely a miscalculation!
  • 59. UNORTHODOX USE TO SPEED UP VALIDATION (Continued): OUTPUT. Now the 2 pairs of observations line up nicely for you to see! Different HEIGHTBL values and different BMIBL values. Most likely, the discrepancy is due to the values of the source variable HEIGHTBL!
  • 60. UNORTHODOX USE TO SPEED UP VALIDATION (Continued): Caveat. Of course, the source variables must still be present in both the BASE and COMPARE datasets. For this method, please list the variables in the ID statement in the following order: ID <original ID variables> <source variables> <variable to be compared>;
  • 61. UNORTHODOX USE TO SPEED UP VALIDATION (Continued): Caveat. There must be at least one variable other than the original ID variables, the source variables and the variable to be compared in both the BASE and COMPARE datasets. Otherwise, you will get a message like: NOTE: Except for the 4 ID variables, the data sets WORK.PROD_ADSL and WORK.QC_ADSL have no variables to compare. Comparisons of data values not performed.
  • 64. %QCDATA to Validate Individual Dataset
  • 65. %QCDATA achieves the following: Sort the BASE and COMPARE datasets by the ID variables. Identify and, upon request, print duplicate observations of the BASE and COMPARE datasets, if any. Allow users to specify the ID variables to be used in PROC COMPARE.
  • 66. %QCDATA (Continued) also achieves the following: Allow users to drop variables not to be compared. Perform PROC COMPARE. Generate the PROC COMPARE report, an output dataset, and a dataset with the flag variables for duplication, the return code from PROC COMPARE and the status of the 16 conditions based on &SYSINFO.
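The duplicate check described above can be sketched with PROC SORT. This is one common way to find duplicates by key and is not necessarily how %QCDATA does it; the dataset DUP_DEMO and its values are hypothetical.

```sas
/* Hypothetical data in which SUBJID 303 appears twice. */
data dup_demo;
  input subjid $ age;
  datalines;
301 45
303 52
303 52
;
run;

/* NODUPKEY keeps the first record per SUBJID; DUPOUT= collects the rest. */
proc sort data=dup_demo out=dup_demo_unique nodupkey dupout=dup_demo_dups;
  by subjid;
run;

title 'Duplicate observations by SUBJID';
proc print data=dup_demo_dups noobs;
run;
title;
```

An empty DUPOUT= dataset means the ID variables identify each observation uniquely, which is what PROC COMPARE needs for a clean match.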
  • 67. %QCDATA (Continued): %macro QCDATA (BASE= /*Specify BASE dataset. */ ,COMPARE= /*Specify COMPARE dataset. */ ,VARS= /*Specify variables to be compared. */ ,DROPBASEVARS= /*Specify BASE variables to be dropped. */ ,DROPCOMPVARS= /*Specify COMPARE variables to be dropped. */ ,PRINTDUPS=Y /*Print duplicate observations. */ ,COMPRPT=Y /*Generate PROC COMPARE report. */ ,CREATEOUTDS=Y /*Create output dataset from PROC COMPARE. */ ,OUTDATA= /*Specify the name of the output dataset. */ ,OUTRC= /*Specify the flag variable/RC dataset name. */ ,REFRESH=N /*Refresh dataset specified in OUTRC. */ );
  • 68. %QCDATA (Continued): LIBNAME PROD "C:\STUDY\ABC\ABC-CL-1001\DATA\PROD\ADAM"; LIBNAME QC "C:\STUDY\ABC\ABC-CL-1001\DATA\QC\ADAM"; LIBNAME MACROS "C:\STUDY\ABC\ABC-CL-1001\DATA\QC\ADAM"; OPTIONS SASAUTOS=(MACROS, SASAUTOS); %QCDATA(BASE=PROD.ADSL,COMPARE=QC.ADSL,IDVARS=SUBJID, PRINTDUP=Y,COMPRPT=Y,OUTDATA=COMPARE1,OUTRC=RC,REFRESH=Y); %QCDATA(BASE=PROD.ADSL,COMPARE=PROD.ADSL,IDVARS=SUBJID, PRINTDUP=Y,COMPRPT=Y,OUTDATA=COMPARE2,OUTRC=RC,REFRESH=N); %QCDATA(BASE=QC.ADSL,COMPARE=QC.ADSL,IDVARS=SUBJID, PRINTDUP=Y,COMPRPT=Y,OUTDATA=COMPARE3,OUTRC=RC,REFRESH=N); %QCDATA(BASE=PROD.ADAE,COMPARE=QC.ADAE,IDVARS=SUBJID, PRINTDUP=Y,COMPRPT=Y,OUTDATA=COMPARE4,OUTRC=RC,REFRESH=N);
  • 69. %QCDATA (Continued): In the 4 invocations (OUTDATA=COMPARE1 to COMPARE4), an output dataset is generated for each comparison result.
  • 70. %QCDATA (Continued): The first %QCDATA invocation uses REFRESH=Y to refresh the RC dataset, and the RC dataset is generated.
  • 71. %QCDATA (Continued): RC Dataset Generated by 4 Invocations of %QCDATA.
  • 72. %QCDATA (Continued): RC Dataset Generated by 4 Invocations of %QCDATA. Flag Variable for Duplicate Observations in BASE Dataset.
  • 73. %QCDATA (Continued): RC Dataset Generated by 4 Invocations of %QCDATA. Flag Variable for Duplicate Observations in COMPARE Dataset.
  • 74. %QCDATA (Continued): RC Dataset Generated by 4 Invocations of %QCDATA. These values are generated by or derived from &SYSINFO. What???
  • 75. Introduction to &SYSINFO Generated by PROC COMPARE
  • 77. Introduction to &SYSINFO from PROC COMPARE: &SYSINFO contains a return code summarizing the result of the comparison of the BASE dataset against the COMPARE dataset.
  • 78. Introduction to &SYSINFO from PROC COMPARE (Continued): PROC COMPARE …; … more code …; RUN; %LET RC = &SYSINFO; Note: It is critical to capture the value of &SYSINFO immediately after PROC COMPARE, because later steps can change its value.
  • 79. Introduction to &SYSINFO from PROC COMPARE: So exactly what is &SYSINFO?
  • 80. Introduction to &SYSINFO from PROC COMPARE (Continued): &SYSINFO is the sum of the codes of all conditions that fail, and the conditions are …
  • 81. Introduction to &SYSINFO from PROC COMPARE (Continued): table of the 16 &SYSINFO codes and their failure descriptions.
  • 83. Introduction to &SYSINFO from PROC COMPARE (Continued): Note: The codes are in 2**X format.
  • 85. Introduction to &SYSINFO from PROC COMPARE (Continued): Because the code for each failure description is in 2**X format (X = 0, …, 15) and no code is repeated in a test, once we know the sum of the codes (i.e., the value of &SYSINFO), we know which condition(s), if any, have been violated.
  • 86. Introduction to &SYSINFO from PROC COMPARE (Continued): Example: Say &SYSINFO resolves to 48. From the table, we know that this is the result of 16 + 32 = 48.
  • 87. Introduction to &SYSINFO from PROC COMPARE (Continued): So &SYSINFO = 48 = 16 + 32. This indicates that Condition 5 (Variable has different length) and Condition 6 (Variable has different label) have been violated.
  • 89. Introduction to &SYSINFO from PROC COMPARE (Continued): With 48, it is easy to see from the table which codes make up the sum. What if we have, say, 5357? Which codes add up to this? We can put 5357 in BINARY16. format and find the positions with 1s. Better yet, perform bit map testing.
  • 92. Introduction to &SYSINFO from PROC COMPARE (Continued): Code for Bit Map Testing (found within %QCDATA): /* Test for Data Set Label. */ if RC='1'b then do; RC_1='Y'; put '<<< Data set labels differ.'; end; /* Test for Data Set Types. */ if RC='1.'b then do; RC_2='Y'; put '<<< Data set types differ.'; end;
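The decoding of a &SYSINFO value into its individual codes can also be sketched with the BAND function in a loop, as an alternative to one bit mask test per condition. This is not the code inside %QCDATA; the variable and array names are hypothetical, and the descriptions follow the return code table in the PROC COMPARE documentation.

```sas
/* Example value from the text; in practice use %let rc = &sysinfo;
   immediately after PROC COMPARE. */
%let rc = 5357;

data _null_;
  length msg $50;
  /* Failure descriptions in bit order: element k+1 has code 2**k. */
  array descr[16] $50 _temporary_ (
    'Data set labels differ'
    'Data set types differ'
    'Variable has different informat'
    'Variable has different format'
    'Variable has different length'
    'Variable has different label'
    'Base data set has observation not in comparison'
    'Comparison data set has observation not in base'
    'Base data set has BY group not in comparison'
    'Comparison data set has BY group not in base'
    'Base data set has variable not in comparison'
    'Comparison data set has variable not in base'
    'A value comparison was unequal'
    'Conflicting variable types'
    'BY variables do not match'
    'Fatal error: comparison not done'
  );
  rc = &rc;
  if rc = 0 then put 'No differences found.';
  do bit = 0 to 15;
    code = 2**bit;
    /* BAND is nonzero when this bit is set in RC. */
    if band(rc, code) then do;
      msg = descr[bit + 1];
      put code= msg=;
    end;
  end;
run;
```

For 5357 the codes that are set are 1, 4, 8, 32, 64, 128, 1024 and 4096, which sum to 5357.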
  • 93. Introduction to &SYSINFO from PROC COMPARE (Continued) • A non-zero &SYSINFO < 64 means the differences are only due to ATTRIBUTES, because the codes 1 to 32 (Conditions 1 to 6) all describe attributes.
  • 94. Introduction to &SYSINFO from PROC COMPARE (Continued) • Remember the Holy Grail? It is &SYSINFO=0!!! It means NO ERROR Found!!! In my RC dataset output, &SYSINFO=0 is equivalent to RC_0='Y'.
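A minimal sketch of how the three cases (0, below 64, 64 and above) might be reported follows. The datasets WORK.BASE and WORK.COMP and the macro %REPORT_RC are hypothetical names invented for this illustration. One detail worth keeping in mind: &SYSINFO is reset by later steps, so the sketch copies it into its own macro variable right after PROC COMPARE ends.

```sas
/* Hypothetical datasets for illustration. */
data work.base; x = 1; run;
data work.comp; x = 1; run;

proc compare base=work.base compare=work.comp noprint;
run;
%let rc = &sysinfo;   /* save at once, before any other step runs */

/* Hypothetical helper: classify a saved PROC COMPARE return code. */
%macro report_rc(rc);
   %if &rc = 0 %then %put NOTE: No differences found.;
   %else %if &rc < 64 %then %put NOTE: Only attribute differences found (code &rc).;
   %else %put WARNING: Differences beyond attributes found (code &rc).;
%mend report_rc;

%report_rc(&rc)
```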
  • 95. %QCDIR to Validate Datasets in Production and QC Directories
  • 96. Structure of this Paper (roadmap slide): Without Automation (Orthodox Use, Unorthodox Use); With Automation (%QCDATA, &SYSINFO, %QCDIR).
  • 97. %QCDIR to validate datasets in directory: PROD and QC are the directories, and the ADXXs are the datasets.
  • 98. %QCDIR to validate datasets in directory (Continued) Note: DICT.IDVARS contains the ID variables (IDVARS) for each dataset.
  • 99. %QCDIR to validate datasets in directory (Continued) By running PROC CONTENTS on PROD._ALL_ and QC._ALL_, one can identify all datasets within each directory, then merge the result with DICT.IDVARS to create WORK.ALLDS.
  • 100. %QCDIR to validate datasets in directory (Continued) This step is performed within %QCDIR. Parameters in %QCDIR …
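The step just described might look roughly like the sketch below. It is an assumption, not the %QCDIR source: it presumes the librefs PROD, QC, and DICT are already assigned, that DICT.IDVARS has one row per dataset with the columns MEMNAME (upper case) and IDVARS, and that WORK.ALLDS needs the columns BASE, COMPARE, and IDVARS. The intermediate dataset names are invented for the example.

```sas
/* List every dataset in each directory; MEMNAME is the dataset name. */
proc contents data=prod._all_ out=work.prod_ds(keep=memname) noprint;
run;
proc contents data=qc._all_ out=work.qc_ds(keep=memname) noprint;
run;

/* PROC CONTENTS writes one row per variable, so reduce to one row per dataset. */
proc sort data=work.prod_ds nodupkey; by memname; run;
proc sort data=work.qc_ds   nodupkey; by memname; run;
proc sort data=dict.idvars out=work.idvars; by memname; run;

/* Keep datasets present in both directories and attach their ID variables. */
data work.allds;
   merge work.prod_ds(in=inprod) work.qc_ds(in=inqc) work.idvars;
   by memname;
   if inprod and inqc;
   length base compare $41;
   base    = cats('PROD.', memname);
   compare = cats('QC.', memname);
run;
```

Datasets found in only one of the two directories are dropped here; a real validation run would probably want to report them as well.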
  • 101. %QCDIR to validate datasets in directory (Continued) Code from %QCDIR: use CALL EXECUTE('%NRSTR(…)') to execute %QCDATA.
  • 102. %QCDIR to validate datasets in directory (Continued) Code from %QCDIR. Note: using CALL EXECUTE('%NRSTR(…)') to execute %QCDATA is important. CALL EXECUTE(…) by itself will generate incorrect results. This is because a macro call passed to CALL EXECUTE executes immediately, while the DATA step is still running, so a reference to &SYSINFO inside %QCDATA resolves before its PROC COMPARE has actually run. Here we need to resolve the value of &SYSINFO for each comparison. Wrapping the call in %NRSTR( ) delays the execution of the macro until after the DATA step has finished.
      CALL EXECUTE('%NRSTR(%QCDATA(BASE='||BASE||
                   ',COMPARE='||COMPARE||
                   ',IDVARS='||IDVARS||
                   ',OUTRC=ALLRC, REFRESH=Y));');
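The timing issue can be tried out with a small stand-in for %QCDATA. Everything in this sketch is hypothetical (the macro %DEMO_COMPARE, the datasets WORK.A and WORK.B, and the driver table WORK.PAIRS); it only demonstrates the %NRSTR technique named above.

```sas
/* Hypothetical stand-in for %QCDATA: run a comparison, then read &SYSINFO. */
%macro demo_compare(base=, compare=);
   proc compare base=&base compare=&compare noprint;
   run;
   %put NOTE: &base vs &compare returned SYSINFO=&sysinfo;
%mend demo_compare;

data work.a; x = 1; run;
data work.b; x = 2; run;

/* Hypothetical driver table: one row per pair of datasets to compare. */
data work.pairs;
   length base compare $41;
   base = 'WORK.A'; compare = 'WORK.A'; output;
   base = 'WORK.A'; compare = 'WORK.B'; output;
run;

data _null_;
   set work.pairs;
   /* %NRSTR masks the macro call, so it is queued as text and only */
   /* executes after this DATA step has ended.                      */
   call execute(cats('%nrstr(%demo_compare(base=', base,
                     ',compare=', compare, '));'));
run;
```

If the %NRSTR( ) wrapper is removed, the %PUT statement runs while the DATA step is still executing, so the log shows whatever value &SYSINFO held beforehand rather than the result of each comparison.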
  • 103. %QCDIR to validate datasets in directory (Continued)
  • 104. %QCDIR to validate datasets in directory (Continued) Figure 5: Comparison of Datasets in Production vs. Validation Process (Continued)
  • 105. Conclusion: In this paper, the author has gone through tips and techniques to speed up the validation process. For the 'Without Automation' part, Orthodox and Unorthodox methods were introduced. For the 'With Automation' part, we introduced %QCDATA, &SYSINFO and %QCDIR. 'How to Speed Up Your Validation Process Without Really Trying?' Maybe you do need to try a bit. But hopefully, these techniques have sped up your Validation Process. Happy Compare-ing!!!
  • 106. Acknowledgment: Art Carpenter, Caloxy Inc.; Jane Eslinger, SAS Technical Support; Boyd Roloff, Chiltern International; Kevin Russell, SAS Technical Support.
  • 107. Contact Information. Name: Alice Cheng. Enterprise: Chiltern International. E-mail: alice.cheng@chiltern.com, alice_m_cheng@yahoo.com