“On visualizing Direct and Partial Correlations – ELI plots”
Leonardo E. Auslender
SAS Institute, Inc., Bedminster, NJ
1. Introduction
Statisticians and data analysts focus on correlations among pairs of
variables to understand the strength of linear relationships in the data.
Since correlations measure relations among pairs of variables, the
standard output is in matrix form, which tends to be difficult to interpret
for a large number of variables. The superlative analyst may also
incorporate partial correlations to further deepen the analysis, which at
least doubles the standard output. The hapless data miner who faces
hundreds, if not thousands, of variables does not long to wade through
reams of correlation output to find “interesting” patterns. [1]
In this paper, I present a method for visualizing any number of
Pearson (and partial) correlations using a Proc Timeplot-like output that I
call Exploratory Linear Information (ELI) plots. Proc Timeplot is a
procedure in Base SAS software that has been available since at least
version 5.18 [2]. Proc Timeplot “plots one or more variables
over time intervals” (SAS Procedures Guide, v. 6, 3rd edition, p. 579);
the time interval variable acts as an index for the observations being
plotted. Notice that the index variable is itself not plotted and, moreover,
that it is not at all necessary to have a time variable as an index (p. 581
of the same manual, ‘date’ variable). In this paper, the index is a
variable that contains the names of the variables being correlated against
a ‘with’ variable, and we plot correlations (and partial correlations if so
desired) in an overlay fashion.
The proposed method, embedded in a SAS macro, allows the user to:
a) Plot the correlations of either all variables against each
other or all variables against a single ‘with’ variable,
sorted by the absolute value of the correlation.
b) Plot, on the same graph described in a), the n largest
partial correlations in absolute value, where n is a
parameter chosen according to how crowded the plot
may become.
c) Print the correlation and p-value matrices in tabulated
form. The standard output is usually difficult to read
because long sequences of numbers are hard to take
in. [3] The tabulated presentation, neater but still
difficult to interpret, is necessary for documentation.
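As a minimal sketch of item a), assuming the correlations have already been arranged with one row per pair of variables, the ordering by absolute value could be obtained as shown here. The data set WORK.CORR_LONG and the variable _ABSCORR are hypothetical names introduced for illustration; the four correlations are copied from the Proc Corr printout of the case study.

```sas
/* Hypothetical long-format input: one row per (_WITH, _VAR) pair. */
data work.corr_long;
   input _with $ _var $ _corr;
   _abscorr = abs(_corr);          /* sort key: strength regardless of sign */
   datalines;
LN_DAY N_DAYLS2 0.76645
LN_DAY RESPONSE 0.99097
LN_DAY SEXUNKN -0.21161
LN_DAY V2 -0.00137
;
run;

/* Strongest correlations first within each 'with' variable. */
proc sort data=work.corr_long;
   by _with descending _abscorr;
run;

proc print data=work.corr_long noobs;
   var _with _var _corr;
run;
```

After the sort, the rows appear in the order RESPONSE, N_DAYLS2, SEXUNKN, V2, so that a negative but sizable correlation such as SEXUNKN is listed ahead of a negligible one.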
2. Exploratory data analysis, variable selection and correlation matrices.
The typical practice of data analysis includes, at least in principle,
exploratory data analysis, as espoused by Tukey (1977). More recently,
Cleveland (1993) emphasized visualization techniques, and many
research papers investigate the topic. This paper addresses the issue of
visualizing correlations, itself a component of EDA, with simple tools
available in the SAS System.
In addition, the hurried data mining practitioner is often trying to
select variables for a model, a segmentation algorithm or a
customer profile, in an environment of hundreds and perhaps thousands
of variables. Stepwise methods, however much criticized, are among the
methods currently used to address variable selection.
In addition to variable selection techniques, practitioners also look at
correlations among variables to investigate linear dependencies. Less
frequently, practitioners look at squared (first-order) partial correlation
coefficients. Given the linear model Y = α + β X + δ Z + ε with the
typical assumptions, the squared partial correlation of Y and Z given X
measures the proportion of the variation in Y not explained by X that is
explained by Z. The unsquared coefficient is the correlation between Y
and Z holding X constant; exchanging the roles of X and Z gives the
partial correlation of Y and X given Z. Direct and indirect effects of X
and Z on Y can be assessed with the partial correlation coefficients. In
the same vein, second-order partial correlation coefficients can be defined
by partialling out an additional variable from a first-order partial
correlation, and third-, fourth- and higher-order coefficients follow in
the same way.
Specifically, given X, Y and Z, the zero order correlation between X and
Y is given by:
$$ r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}} $$
where the bar denotes the mean value.
The partial correlation of x and y, given z, is:
$$ r_{xy.z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}} $$
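The formula can be checked with a few lines of arithmetic. The sketch below plugs in three zero-order correlations copied from the case-study printout, taking x = LN_DAY, y = N_DAYLS2 and z = RESPONSE; the variable names r_xy, r_xz, r_yz and r_xy_z are introduced here for illustration only. Because those printed correlations were computed pairwise on different numbers of observations, the result is an arithmetic illustration and need not coincide with what the macro reports.

```sas
/* Zero-order correlations copied from the case-study printout:         */
/* x = LN_DAY, y = N_DAYLS2, z = RESPONSE (the variable partialled out). */
data _null_;
   r_xy = 0.76645;
   r_xz = 0.99097;
   r_yz = 0.67704;
   r_xy_z = (r_xy - r_xz * r_yz)
            / sqrt((1 - r_xz**2) * (1 - r_yz**2));
   put 'Partial correlation r_xy.z = ' r_xy_z 8.4;
run;
```

With these inputs the partial correlation comes out at about 0.97, noticeably larger than the zero-order value of 0.77. When the raw data are at hand, the PARTIAL statement of Proc Corr computes partial correlations directly.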
3. Programming considerations.
The Corr Procedure (with which the reader should be familiar to fully
understand this paper) is the basic tool for finding correlations, as in the
following code embedded in a macro:
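PROC CORR DATA = &INDATA.
          OUTP = &OUTDATA. (WHERE = (_TYPE_ IN ("CORR", "N"))
                            RENAME = (_NAME_ = WITH)) NOPRINT;
   /* Emit a WITH statement only when the macro variable WITH is not empty */
   %IF %NRBQUOTE(&WITH.) > %THEN WITH &WITH.; %STR(;)
   /* List the analysis variables held in macro variables VAR1, VAR2, ... */
   VAR %DO K = 1 %TO &NUMVAR.; &&VAR&K. %END; %STR(;)
RUN;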
In this macro code, NOPRINT suppresses the printed correlations, which
are kept instead in the data set &OUTDATA. The rest of the code allows
for an optional ‘with’ variable and for selected VAR variables. The
names of the variables are kept in macro variables var1 through
var&numvar. (&numvar. being the number of variables) because we
require the variables to be alphabetically ordered to search for missing
values later on. The standard output data set referenced by &OUTDATA.
provides the correlations but not the number of observations for the
‘with’ variable. This number is critical in determining p-values and,
given the prevalence of missing values in large databases, it forces us to
recapture that information [4]. (See the case study below for the typical
Proc Corr output.)
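One way to recapture a pairwise count is sketched below. This is an assumption about how it could be done, not the macro's actual code, which the paper does not show. SASHELP.CLASS and its HEIGHT and WEIGHT columns stand in for &INDATA. and for one pair of analysis variables; _NUMOBS is the name used for the count in the p-value code later in this section.

```sas
/* SASHELP.CLASS stands in for &INDATA.; HEIGHT and WEIGHT for one pair. */
proc sql;
   /* A logical expression evaluates to 1 or 0, so the sum counts the   */
   /* observations in which both variables are present.                 */
   select sum(height is not missing and weight is not missing)
             as _numobs
      from sashelp.class;
quit;
```

For many pairs, the same expression would have to be generated once per pair, for instance inside a macro loop over the variable names.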
OUTDATA AFTER PROC CORR
OBS  _TYPE_  _WITH       LN_DAY  N_DAYLS2  N_DAYLST  N_DAYSEX  N_INTRST  RESPONSE
  1  N                 26610.00  38185.00  38185.00  38185.00  38185.00  22931.00
  2  CORR    LN_DAY        1.00      0.77      0.92      0.72      0.11      0.99
  3  CORR    N_DAYLS2      0.77      1.00      0.95      0.86      0.03      0.68
  4  CORR    N_DAYLST      0.92      0.95      1.00      0.85      0.06      0.87
  5  CORR    N_DAYSEX      0.72      0.86      0.85      1.00      0.03      0.66
  6  CORR    N_INTRST      0.11      0.03      0.06      0.03      1.00      0.12
  7  CORR    RESPONSE      0.99      0.68      0.87      0.66      0.12      1.00
  8  CORR    SEXUNKN      -0.21      0.02     -0.08      0.32     -0.07     -0.24
  9  CORR    TENURE       -0.05      0.01     -0.01      0.03     -0.05     -0.04
Due to the likelihood of the presence of missing values, it is necessary to
find out the number of non-missing observations for every pair of
variables. Since the &outdata. data set provides the number of present
observations for individual variables (but not for the ‘with’ variable), it
is necessary to obtain the information for those pairs in which at least
one variable has missing values. Once the number of non-missing values
is determined for every pair of variables, the p-values are computed by:
$$ \frac{\sqrt{N - 2}\;\mathrm{Corr}}{\sqrt{1 - \mathrm{Corr}^2}} \;\sim\; t(N - 2), $$
which can be programmed as:
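/* Absolute value of the t statistic for a zero correlation */
_STAT = ABS (SQRT(_NUMOBS - 2) * _CORR / SQRT ( 1 - (_CORR * _CORR)));
/* Large N or extreme statistic: normal approximation */
IF _NUMOBS > 100 OR _STAT > 40
THEN _P_VAL = ROUND ( 2 * (1 - PROBNORM (_STAT)),.00001);
/* Otherwise: two-sided p-value from Student's t with N - 2 d.f. */
ELSE IF _STAT > . THEN _P_VAL =
ROUND ( 2 * (1 - PROBT ( _STAT, _NUMOBS - 2 ,0 )),.00001);
/* Statistic missing (e.g. a correlation of exactly 1 or -1) */
ELSE _P_VAL = .;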
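The statements above can be exercised on a single pair taken from the case-study printout: the correlation between N_DAYLS2 and TENURE (0.00980, based on 38185 observations), for which Proc Corr prints a p-value of 0.0555. Wrapping the statements in a DATA _NULL_ step with hard-coded inputs, as done here, is purely an illustrative device.

```sas
data _null_;
   _corr   = 0.00980;    /* N_DAYLS2 with TENURE, from the printout */
   _numobs = 38185;      /* observations used for that pair         */
   _stat = abs(sqrt(_numobs - 2) * _corr / sqrt(1 - (_corr * _corr)));
   if _numobs > 100 or _stat > 40
      then _p_val = round(2 * (1 - probnorm(_stat)), .00001);
   else if _stat > . then _p_val =
      round(2 * (1 - probt(_stat, _numobs - 2, 0)), .00001);
   else _p_val = .;
   put _stat= 8.4 _p_val= 8.5;
run;
```

The statistic is about 1.915 and, since _NUMOBS exceeds 100, the normal branch is used; the resulting p-value is close to the 0.0555 printed by Proc Corr.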
At this point, we have obtained or calculated the correlations and
p-values needed to “timeplot”. Since p-value information is available
(in SAS data set &SASWORK.7 below), the analyst may wish to plot only
significant correlations, as determined by a p-value threshold. The
Timeplot code is:
PROC TIMEPLOT DATA = &SASWORK.7;
PLOT _CORR = "0" %IF &PARTIAL. = Y %THEN %DO K = 1 %TO &N_PRTLS.;
MXPART&K. = "&K."
%END;
/ OVERLAY NPP POS = 60 HILOC REF = 0 REFCHAR = '|' OVPCHAR = "*"
AXIS = -1 TO 1 BY .02 ;
ID _VARLBL ; /* VAR NAME + LABEL */
BY _WITH; /* SET OF WITH VARS */
TITLE2
%IF &PARTIAL. = Y %THEN "CORRS BY #BYVAL1, &N_PRTLS. PARTIALS REQUESTED";
%ELSE "CORRELATIONS BY #BYVAL1";
%STR(;)
%IF &SGNFCNT. = Y %THEN TITLE3 "SIGNIFICANT CORRS 95% ONLY"; %STR(;)
RUN;
This code requests, at a minimum, a plot of the correlations between the
‘with’ and ‘var’ variables (_WITH, _CORR), identified in the plot by the
symbol 0 (zero-order correlation). If partial correlations, calculated in
a “PROC IML” step, are requested as well (“%DO K = 1 %TO
&N_PRTLS. …”), their values are identified by 1, 2, 3 … &N_prtls. in
descending order of absolute value, where &n_prtls. is a user-determined
parameter. The names of the variables partialled out corresponding to
1, 2, 3 … are found in a later printout under the names PART1, PART2,
PART3 … . We use * to denote overprinting (OVPCHAR option).
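To make the macro logic concrete, the sketch below shows what the Timeplot step resolves to when &PARTIAL. = Y and &N_PRTLS. = 1, applied to a tiny stand-in for &SASWORK.7. The data set WORK.ELI_DEMO is hypothetical and the MXPART1 values are invented for illustration; only the PLOT options are taken from the code above.

```sas
/* Hypothetical stand-in for &SASWORK.7; the MXPART1 values are invented. */
data work.eli_demo;
   input _with $ _varlbl $ _corr mxpart1;
   datalines;
LN_DAY N_DAYLS2 0.77 0.95
LN_DAY SEXUNKN -0.21 -0.60
LN_DAY V2 0.00 0.05
;
run;

/* Resolved form of the macro code for &PARTIAL. = Y and &N_PRTLS. = 1. */
proc timeplot data=work.eli_demo;
   plot _corr = "0" mxpart1 = "1"
        / overlay npp pos = 60 hiloc ref = 0 refchar = '|' ovpchar = "*"
          axis = -1 to 1 by .02;
   id _varlbl;                     /* row identifier                 */
   by _with;                       /* one plot per 'with' variable   */
   title2 "CORRS BY #BYVAL1, 1 PARTIALS REQUESTED";
run;
```

Each observation becomes one row of the plot, with ‘0’ and ‘1’ placed along the -1 to 1 axis and joined by hyphens because of HILOC. The BY statement assumes the data are sorted by _WITH, which holds trivially here.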
4. Case Study.
I present one case, without a ‘with’ variable [5]. The ‘with’ variable case is
merely a subset of the more general case. All the variables are
continuous and their meaning is unimportant for this exercise. The usual
(clipped) printout of Proc Corr and the (clipped) Output data set
generated in this case are:
LN_DAY
    LN_DAY  RESPONSE  N_DAYLST  N_DAYLS2  N_DAYSEX  TOT_RCVD
   1.00000   0.99097   0.92451   0.76645   0.72429   0.22447
       0.0    0.0001    0.0001    0.0001    0.0001    0.0001
     26610     16057     26610     26610     26610     26610

   SEXUNKN  N_INTRST    TENURE        V3        V1        V2
  -0.21161   0.10958  -0.05324  -0.01432   0.00437  -0.00137
    0.0001    0.0001    0.0001    0.0195    0.4757    0.8228
     26610     26610     26610     26610     26610     26610

N_DAYLS2
  N_DAYLS2  N_DAYLST  N_DAYSEX    LN_DAY  RESPONSE  TOT_RCVD
   1.00000   0.95119   0.86207   0.76645   0.67704   0.19900
       0.0    0.0001    0.0001    0.0001    0.0001    0.0001
     38185     38185     38185     26610     22931     38185

  N_INTRST   SEXUNKN    TENURE        V3        V1        V2
   0.02730   0.01862   0.00980  -0.00816   0.00204   0.00102
    0.0001    0.0003    0.0555    0.1109    0.6904    0.8423
     38185     38185     38185     38185     38185     38185
In the Proc Corr output, the first line of numbers under each set of
variable names contains the correlation coefficients, the second the
corresponding p-values and the third the number of observations used.
For hundreds or thousands of variables, this presentation is
uninformative, and the wrap-around effect makes it tedious to
review. It becomes even more cumbersome when the analyst wants to
simplify the task by looking only at correlations with significant
p-values. In this light, we propose the following Timeplot-like output
(which corresponds to the set of correlations associated with LN_DAY),
adapted for visualization:
[ELI plot: CORRELATIONS BY LN_DAY (WITH = LN_DAY). Rows, identified by
VAR_NAME_+_LABEL: N_DAYLS2, N_DAYLST (#_days lst_clkth), N_DAYSEX,
N_INTRST (#_intrsts e_intr), RESPONSE, SEXUNKN, TENURE (# days since bec),
TOT_RCVD (tot rcvd e_rcvd), V1, V2, V3. Horizontal axis from -1 (min) to
1 (max), with a reference line ‘|’ at zero; ‘0’ marks the zero-order
correlation in each row.]
The previous ELI plot illustrates the correlation patterns among the
variables. ‘0’ marks direct (or zero-order) correlations. The plot allows
the ‘stepwise-prone’ analyst interested in variable selection to focus
directly on areas of high correlation, in this case N_DAYLS2,
N_DAYLST, N_DAYSEX, etc. These areas are the ones closest to the ends
of the axis (-1, +1). The midpoint of the plot marks zero correlation.
Further, for every “(with, var)” pair, we can also plot the four (or any
number desired) largest first-order partial correlations, denoted by the
numbers 1 through 4. Overlaps are denoted by ‘*’. The printout titled
“DIRECT & PARTIAL VAR NAMES” details the names of the
variables for each of the plotted correlations.
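Selecting the largest partials for each pair can be sketched as a sort followed by a ranking step. The code below is an assumption about how such a selection might be written, not the macro's own logic, which computes the partials in PROC IML. The data sets WORK.PARTIALS and WORK.TOP_PARTIALS, the variables _PART, _PCORR, _ABSPCORR and _RANK, and all names and values in the data lines are invented for illustration.

```sas
%let n_prtls = 2;                 /* number of partials to keep per pair */

/* Hypothetical input: first-order partials of (_WITH, _VAR), one row   */
/* per partialled-out variable _PART. All names and values are invented. */
data work.partials;
   input _with $ _var $ _part $ _pcorr;
   _abspcorr = abs(_pcorr);
   datalines;
X Y Z1 0.40
X Y Z2 -0.85
X Y Z3 0.10
X W Z1 -0.30
X W Z2 0.20
X W Z3 0.65
;
run;

proc sort data=work.partials;
   by _with _var descending _abspcorr;
run;

/* Rank within each pair; keep the &N_PRTLS. largest in absolute value. */
data work.top_partials;
   set work.partials;
   by _with _var;
   if first._var then _rank = 0;
   _rank + 1;                     /* sum statement: _RANK is retained */
   if _rank <= &n_prtls.;
run;
```

For the pair (X, Y) this keeps Z2 and Z1, and for (X, W) it keeps Z3 and Z1; _RANK then plays the role of the plotting symbols 1, 2, … used in the ELI plot.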
[ELI plot: CORRS BY LN_DAY, 4 PARTIALS REQUESTED (WITH = LN_DAY). Rows,
identified by VAR_NAME_+_LABEL: N_DAYLS2, N_DAYLST, N_DAYSEX, N_INTRST,
RESPONSE, SEXUNKN, TENURE, TOT_RCVD, V1, V2, V3. Horizontal axis from -1
(min) to 1 (max), with a reference line ‘|’ at zero. In each row, ‘0’ marks
the zero-order correlation, ‘1’ through ‘4’ the four largest first-order
partial correlations and ‘*’ an overlap; the symbols are joined by hyphens.]

[ELI plot: CORRS BY N_DAYLS2, 4 PARTIALS REQUESTED (WITH = N_DAYLS2). Rows,
identified by VAR_NAME_+_LABEL: LN_DAY, N_DAYLST, N_DAYSEX, N_INTRST,
RESPONSE, SEXUNKN, TENURE, TOT_RCVD, V1, V2, V3. Axis and plotting symbols
as in the plot for LN_DAY.]
Let us concentrate on a specific example. The first line of the first of
the two diagrams above (repeated just below for clarity of exposition)
plots LN_DAY (the ‘with’ variable) against N_DAYLS2, together with four
first-order partials in decreasing order of absolute magnitude. The
correlations are joined by hyphens, which allows for a more compact view.
‘1’ in the first line of the graph corresponds to the correlation between
LN_DAY and N_DAYLS2 after partialling out RESPONSE (variable PART1 in
the first observation of the printout below). ‘2’ corresponds to the next
largest absolute partial correlation, obtained by partialling out
N_DAYLST, etc. In the diagram, there is an overlap between the
zero-order correlation and the partial corresponding to N_INTRST
(PART4), denoted by ‘*’. Given the distance of all these correlations
from the midpoint of zero correlation, the analyst might deem these
variables worth further study. While p-values for direct correlations are
given in a table below, p-values for the partial correlations are not
calculated at present.
[ELI plot excerpt, WITH = LN_DAY, row N_DAYLS2, axis from -1 (min) to
1 (max): reading from left to right, ‘2’ lies to the left of the zero
reference line ‘|’, while ‘*’, ‘3’ and ‘1’ lie to its right; the symbols
are joined by hyphens.]
DIRECT & PARTIAL VAR NAMES
WITH=LN_DAY
OBS VAR PART1 PART2 PART3 PART4
1 N_DAYLS2 RESPONSE N_DAYLST SEXUNKN N_INTRST
2 N_DAYLST RESPONSE N_DAYLS2 SEXUNKN TENURE
3 N_DAYSEX SEXUNKN TENURE N_INTRST V1
4 N_INTRST N_DAYLS2 N_DAYLST N_DAYSEX V3
5 RESPONSE N_DAYLST N_DAYLS2 V1 TENURE
6 SEXUNKN N_DAYSEX N_DAYLST N_DAYLS2 TOT_RCVD
7 TENURE N_DAYLST N_DAYSEX N_DAYLS2 RESPONSE
8 TOT_RCVD SEXUNKN TENURE V3 V1
9 V1 RESPONSE N_DAYSEX N_DAYLST TOT_RCVD
10 V2 RESPONSE N_DAYLS2 N_DAYLST N_DAYSEX
11 V3 RESPONSE TOT_RCVD N_INTRST V2
WITH=N_DAYLS2
OBS VAR PART1 PART2 PART3 PART4
12 LN_DAY RESPONSE N_DAYLST SEXUNKN N_INTRST
13 N_DAYLST LN_DAY RESPONSE SEXUNKN N_INTRST
14 N_DAYSEX SEXUNKN TENURE V1 V2
15 N_INTRST N_DAYLST LN_DAY RESPONSE TOT_RCVD
16 RESPONSE LN_DAY N_DAYLST SEXUNKN N_INTRST
17 SEXUNKN N_DAYSEX N_DAYLST LN_DAY RESPONSE
18 TENURE LN_DAY N_DAYLST RESPONSE N_DAYSEX
19 TOT_RCVD N_INTRST V3 V1 V2
20 V1 RESPONSE N_DAYSEX TOT_RCVD TENURE
21 V2 N_DAYLST LN_DAY N_DAYSEX RESPONSE
22 V3 TOT_RCVD SEXUNKN TENURE N_INTRST
ELI plots allow for a different configuration as well. Instead of plotting
the largest first-order partial correlations in addition to the zero order
one, we can plot the largest of the first-order, second largest, third
largest, etc. For the sake of brevity, this excursion is omitted.
Finally, and for documentation purposes, the correlation coefficients and
corresponding p-values are also tabulated:
UPPER TRIANGULAR MATRIX
ALPHABETICALLY ORDERED

In the output, the columns for N_DAYLST, N_INTRST, TENURE and TOT_RCVD
are headed by their labels (#_days lst_clkthru &_dec.16.99; #_intrsts
e_intrs2; # days since became member; tot rcvd e_rcvd).

CORRELATIONS
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|VARIABLE|  LN_DAY|N_DAYLS2|N_DAYLST|N_DAYSEX|N_INTRST|RESPONSE| SEXUNKN|  TENURE|TOT_RCVD|
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|LN_DAY  |        |    0.77|    0.92|    0.73|    0.10|    0.99|   -0.21|   -0.04|    0.23|
|N_DAYLS2|        |        |    0.95|    0.86|    0.03|    0.68|    0.02|    0.01|    0.20|
|N_DAYLST|        |        |        |    0.85|    0.06|    0.87|   -0.08|   -0.01|    0.22|
|N_DAYSEX|        |        |        |        |    0.03|    0.66|    0.32|    0.03|    0.22|
|N_INTRST|        |        |        |        |        |    0.12|   -0.07|   -0.05|    0.29|
|RESPONSE|        |        |        |        |        |        |   -0.24|   -0.04|    0.22|
|SEXUNKN |        |        |        |        |        |        |        |    0.11|    0.06|
|TENURE  |        |        |        |        |        |        |        |        |    0.00|
|TOT_RCVD|        |        |        |        |        |        |        |        |        |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+

P_VALS OF CORRS
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|VARIABLE|  LN_DAY|N_DAYLS2|N_DAYLST|N_DAYSEX|N_INTRST|RESPONSE| SEXUNKN|  TENURE|TOT_RCVD|
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|LN_DAY  |        |   0.000|   0.000|   0.000|   0.000|   0.000|   0.000|   0.000|   0.000|
|N_DAYLS2|        |        |   0.000|   0.000|   0.000|   0.000|   0.000|   0.056|   0.000|
|N_DAYLST|        |        |        |   0.000|   0.000|   0.000|   0.000|   0.028|   0.000|
|N_DAYSEX|        |        |        |        |   0.000|   0.000|   0.000|   0.000|   0.000|
|N_INTRST|        |        |        |        |        |   0.000|   0.000|   0.000|   0.000|
|RESPONSE|        |        |        |        |        |        |   0.000|   0.000|   0.000|
|SEXUNKN |        |        |        |        |        |        |        |   0.000|   0.000|
|TENURE  |        |        |        |        |        |        |        |        |   0.831|
|TOT_RCVD|        |        |        |        |        |        |        |        |        |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
Since many correlations may not be significant at a confidence
level of, say, 95% (alpha = 0.05), the ELI graphs can be made to
portray significant correlations only. In our example, however, we
presented all possible effects with corresponding partial
correlations.
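Restricting the plot to significant correlations amounts to subsetting the plotting data set on the p-value. The sketch below assumes a data set with the variables _CORR and _P_VAL used earlier; WORK.CORR_PVALS and WORK.SIGNIF are hypothetical names, the first three rows are copied from the tabulated output above, and the last row, with a missing p-value, is invented to illustrate a pitfall.

```sas
/* Hypothetical plotting data set; the first three rows come from the  */
/* tabulated output, the last row (missing p-value) is invented.       */
data work.corr_pvals;
   input _with $ _var $ _corr _p_val;
   datalines;
N_DAYLS2 TENURE 0.01 0.056
N_DAYLST TENURE -0.01 0.028
TENURE TOT_RCVD 0.00 0.831
LN_DAY LN_DAY 1.00 .
;
run;

data work.signif;
   set work.corr_pvals;
   /* A missing value compares as smaller than any number in SAS, so   */
   /* the lower bound keeps rows with a missing p-value out.           */
   if 0 <= _p_val < 0.05;
run;
```

Only the N_DAYLST and TENURE pair (p-value 0.028) survives the filter. Without the lower bound, the row with the missing p-value would be kept as well.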
6. Trademarks.
SAS and all other SAS Institute Inc. product or service names
are registered trademarks or trademarks of SAS Institute Inc.
in the USA and other countries. ® indicates USA registration.
Other brand and product names are registered trademarks or
trademarks of their respective companies.
7. End Notes.
1
Data mining has often been defined as the search for
patterns, interesting or otherwise. Curiously, “interesting” is
in the eye of the beholder, and patterns are not well defined.
Ergo, any tool that purports to find interesting patterns
belongs under the rubric of data mining, which thus cannot
properly define any scientific application, since almost
anything can belong to it. My own preference is “Giga-data
analysis” (as opposed to the more traditional statistician’s
“small data set analysis”). It is in this spirit that I envision
this paper.
Since information from data requires the processes of
summarization, conceptualization, interpretation and
application, the data analyst victorious in all these steps after
successful perusal of reams of pages might require
hospitalization as well.
2
Yes, I am that old. This paper deals only with Pearson
correlation coefficients, but the additional use of other
measures contained in Proc Corr is straightforward.
Programming Timeplot-like diagrams in other software
should not pose an insurmountable task. I created my first
diagram in Basic in 1980.
Additionally, the adjustment necessary for correlations
among continuous and categorical as well as among
categorical variables can be easily added.
3
I consider the name Timeplot a limiting and misleading
denomination. C’est la vie.
5
Partial correlations can also be understood as the
correlation between two sets of residuals: for the partial
correlation of Y and Z given X, the residuals of the regression
of Y on X and those of the regression of Z on X. See Cohen and
Cohen (1983) for an overall discussion, and Leahy (1996) for
suppression effects in the area of data base marketing.
6
The skillful programmer might be enticed to utilize Proc
Printto. My preference for a more arduous route is based on
the additional flexibility provided to enhance the overall
procedure, such as including partial correlations in one step,
multiple comparisons of correlations, Drezner’s Multirelation
(1995), etc.
Missing values are excluded from the calculation of
correlations in a pair-wise form. For a proposed solution to
the problem of missing values in the context of large
databases, see Auslender (1997).
7
The macro at present accepts only one ‘with’ variable. It is
a straightforward modification to enhance the code to accept
multiple ‘with’ variables.
8. Bibliography
Auslender L., Missing Value Imputation Methods for Large
DataBases, Proceedings of the 1997 northeastern SAS Users
Group Meeting, 1997.
Cleveland W., Visualizing Data, Hobart Press, USA, 1993.
Cohen J., Cohen P. Applied Multiple Regression/Correlation
Analysis for the Behavioral Sciences, Lawrence Erlbaum
Associates, Publishers, 1983.
Drezner, Z., Multirelation – Correlation among more than
two variables, Computational Statistics and Data Analysis,
1995, March.
Hoaglin D., Mosteller F., Tukey J., Understanding Robust
and Exploratory Data Analysis, John Wiley & Sons, 1983.
Leahy K., Nature, prevalence, and benefits of suppression
effects in direct response segmentation, Proceedings of the
American Statistical Association 1995 Meeting, 1996.
9. Contact Information
Your comments and questions are valued and encouraged.
Contact the author at:
Leonardo E. Auslender
SAS Institute
1545 Rt. 206 N, Suite 270
Bedminster, NJ 07921
908 470 0080 x 8217 (o)
908 470 0081 (f)
leonardo.auslender@sas.com

Eli plots visualizing innumerable number of correlations

  • 1. “On visualizing Direct and Partial Correlations – ELI plots” Leonardo E. Auslender SAS Institute, Inc., Bedminster, NJ 1. Introduction Statisticians and data analysts focus on correlations among pairs of variables to understand the strength of linear relationships in the data. Since correlations measure relations among pairs of variables, the standard output is in matrix form, which tends to be difficult to interpret for a large number of variables. The superlative analyst may also incorporate partial correlations to further deepen the analysis, which at least doubles the standard output. The hapless data-miner who faces hundreds, if not thousands, of variables does not long to wade through reams of outputs of correlations to find “interesting” patterns.1 In this paper, I present a method that enables to visualize any number of Pearson (and partial) correlations by using a Proc-Timeplot-like output I call Exploratory Linear Information (ELI) plots. Proc Timeplot is a procedure available in SAS Base, of the SAS Institute SAS software, since at least version 5.18. 2 Proc Timeplot “plots one or more variables over time intervals” (SAS Procedures Guide, v. 6, 3rd . edition, p. 579); the time interval variable acts as an index for the observations being plotted. Notice that the index variable is itself not plotted and, moreover, that it is not at all necessary to have a time variable as an index (p. 581 of the same manual, ‘date’ variable.). In this paper, our index is a variable that contains the names of the variables being correlated against a ‘with’ variable, and we plot correlations (and partial correlations if so desired) in an overlay fashion. The proposed method, embedded in a SAS macro, allows to: a) Plot correlations of either all variables against each other or against a single 'with' variable, properly sorted by the absolute value of the correlation. 
b) Plot on the same graph described in a) the first ‘nth’ largest absolute value partial correlations, ‘n’ being a chosen parameter dependent upon the desired crowding of information in the plot. c) Print the correlation and p-value matrices in a tabulate fashion. The standard output is usually difficult to read due to the intricacies of conceptualizing of long sequences of numbers. 3 The tabulate presentation, neater but still difficult to interpret, is necessary for documentation. 2. Exploratory data analysis, variable selection and correlation matrices. The typical practice of data analysis includes, at least in principle, exploratory data analysis, as espoused by Tukey (1977). More recently, Cleveland (1993) emphasized visualization techniques, and many research papers investigate the topic. This paper addresses the issue of visualizing correlations, itself a component of EDA, with simple tools available in the SAS System. In addition, the hurried data mining practitioner finds himself/herself in search of selecting variables for a model, a segmentation algorithm or a customer profile, in an environment of hundreds and perhaps thousands of variables. Stepwise methods, however much criticized, are one of the present methodologies used to address variable selection. In addition to variable selection techniques, practitioners also look at correlations among variables to investigate linear dependencies. Less frequently, practitioners look at squared partial (first order) correlation coefficients. Given the linear model Y = α + β X + δ Z + ε with the typical assumptions, these coefficients measure the proportion of variation of a variable Y not estimated by X that is estimated by Z in linear models. Equivalently, they measure the correlation between Y and X holding Z constant. Direct and indirect effects of X and Z on Y can be measured by the partial correlation coefficients. 
In the same vein, second order partial correlation coefficients can be defined by partialling out an additional variable from a first-order partial correlation. And third, fourth, etc. Specifically, given X, Y and Z, the zero order correlation between X and Y is given by: rxy = ( Σ (xi - x’) (yi -y’)) / √ Σ (xi - x’)2 Σ(yi - y’)2 where the apostrophe denotes mean value. The partial correlation of x and y, given z, is: rxy.z = ( rxy - rxz ryz) / √ (1 – rxz 2 ) (1 – ryz 2 ). 3. Programming considerations. The Corr Procedure (with which the reader should be familiar to fully understand this paper) is the basic tool for finding correlations, as in the following code embedded in a macro: PROC CORR DATA = &INDATA. OUTP = &OUTDATA. (WHERE = (_TYPE_ IN (“CORR”, “N)) RENAME = (_NAME_ = WITH)) NOPRINT; %IF %NRBQUOTE(&WITH.) > %THEN WITH &WITH.; %STR(;) VAR %DO K = 1 %TO &NUMVAR.; &&VAR&K. %END; %STR(;) RUN; In this macro-code, we are requesting not to print (NOPRINT) the correlations, but to keep them in the data set &OUTDATA. The rest of the code allows for the use of a ‘with’ variable and of selected VAR variables. The names of the variables have been kept in macro variables var1 through var&numvar. (&numvar. being the number of variables) because we require the variables to be alphabetically ordered to search for missing values later on. The standard output data set referenced by &OUTDATA. provides the correlations but not the number of observations for the ‘with’ variable. This number is critical in determining p-values, and given the prevalence of missing values in large databases, it forces us to re-capture that information. 4 (See section 3 below the typical Proc corr output).
  • 2. OUTDATA AFTER PROC CORROUTDATA AFTER PROC CORROUTDATA AFTER PROC CORROUTDATA AFTER PROC CORR OBS _TYPE_ _WITH LN_DAY N_DAYLS2 N_DAYLST N_DAYSEX N_INTRST RESPONSEOBS _TYPE_ _WITH LN_DAY N_DAYLS2 N_DAYLST N_DAYSEX N_INTRST RESPONSEOBS _TYPE_ _WITH LN_DAY N_DAYLS2 N_DAYLST N_DAYSEX N_INTRST RESPONSEOBS _TYPE_ _WITH LN_DAY N_DAYLS2 N_DAYLST N_DAYSEX N_INTRST RESPONSE 1 N 26610.00 38185.00 38185.00 38185.00 38185.00 22931.001 N 26610.00 38185.00 38185.00 38185.00 38185.00 22931.001 N 26610.00 38185.00 38185.00 38185.00 38185.00 22931.001 N 26610.00 38185.00 38185.00 38185.00 38185.00 22931.00 2 CORR LN_DAY 1.00 0.77 0.92 0.72 0.11 0.992 CORR LN_DAY 1.00 0.77 0.92 0.72 0.11 0.992 CORR LN_DAY 1.00 0.77 0.92 0.72 0.11 0.992 CORR LN_DAY 1.00 0.77 0.92 0.72 0.11 0.99 3 CORR N_DAYLS2 0.77 1.00 0.95 0.86 0.03 0.683 CORR N_DAYLS2 0.77 1.00 0.95 0.86 0.03 0.683 CORR N_DAYLS2 0.77 1.00 0.95 0.86 0.03 0.683 CORR N_DAYLS2 0.77 1.00 0.95 0.86 0.03 0.68 4 CORR N_DAYLST 0.924 CORR N_DAYLST 0.924 CORR N_DAYLST 0.924 CORR N_DAYLST 0.92 0.95 1.00 0.85 0.06 0.870.95 1.00 0.85 0.06 0.870.95 1.00 0.85 0.06 0.870.95 1.00 0.85 0.06 0.87 5 CORR N_DAYSEX 0.72 0.86 0.85 1.00 0.03 0.665 CORR N_DAYSEX 0.72 0.86 0.85 1.00 0.03 0.665 CORR N_DAYSEX 0.72 0.86 0.85 1.00 0.03 0.665 CORR N_DAYSEX 0.72 0.86 0.85 1.00 0.03 0.66 6 CORR N_INTRST 0.11 0.03 0.06 0.03 1.006 CORR N_INTRST 0.11 0.03 0.06 0.03 1.006 CORR N_INTRST 0.11 0.03 0.06 0.03 1.006 CORR N_INTRST 0.11 0.03 0.06 0.03 1.00 0.120.120.120.12 7 CORR RESPONSE 0.99 0.68 0.87 0.66 0.12 1.007 CORR RESPONSE 0.99 0.68 0.87 0.66 0.12 1.007 CORR RESPONSE 0.99 0.68 0.87 0.66 0.12 1.007 CORR RESPONSE 0.99 0.68 0.87 0.66 0.12 1.00 8 CORR SEXUNKN8 CORR SEXUNKN8 CORR SEXUNKN8 CORR SEXUNKN ----0.21 0.020.21 0.020.21 0.020.21 0.02 ----0.08 0.320.08 0.320.08 0.320.08 0.32 ----0.070.070.070.07 ----0.240.240.240.24 9 CORR TENURE9 CORR TENURE9 CORR TENURE9 CORR TENURE ----0.050.050.050.05 0.010.010.010.01 ----0.01 0.030.01 0.030.01 0.030.01 0.03 
----0.050.050.050.05 ----0.040.040.040.04 Due to the likelihood of the presence of missing values, it is necessary to find out the number of non-missing observations for every pair of variables. Since the &outdata. data set provides the number of present observations for individual variables (but not for the ‘with’ variable), it is necessary to obtain the information for those pairs in which at least one variable has missing values. Once the number of non-missing values is determined for every pair of variables, the p-values are computed by: √√√√ (N – 2). Corr ____________ , ∼∼∼∼ t (N - 2). √√√√ (1 – Corr2 ) which can be programmed as: _STAT = ABS (SQRT(_NUMOBS - 2) * _CORR / SQRT ( 1 - (_CORR * _CORR))); IF _NUMOBS > 100 OR _STAT > 40 THEN _P_VAL = ROUND ( 2 * (1 - PROBNORM (_STAT)),.00001); ELSE IF _STAT > . THEN _P_VAL = ROUND ( 2 * (1 - PROBT ( _STAT, _NUMOBS - 2 ,0 )),.00001); ELSE _P_VAL = .; At this point, we have obtained or calculated correlations and p-values that allow us to “timeplot”. Since we have p-value information (in sas data set &SASWORK.7 below), the analyst may desire to plot only significant correlations, usually given by a p-value threshold. The Timeplot code is: PROC TIMEPLOT DATA = &SASWORK.7; PLOT _CORR = "0" %IF &PARTIAL. = Y %THEN %DO K = 1 %TO &N_PRTLS.; MXPART&K. = "&K." %END; / OVERLAY NPP POS = 60 HILOC REF = 0 REFCHAR = '|' OVPCHAR = "*" AXIS = -1 TO 1 BY .02 ; ID _VARLBL ; /* VAR NAME + LABEL */ BY _WITH; /* SET OF WITH VARS */ TITLE2 %IF &PARTIAL. = Y %THEN "CORRS BY #BYVAL1, &N_PRTLS. PARTIALS REQUESTED"; %ELSE "CORRELATIONS BY #BYVAL1"; %STR(;) %IF &SGNFCNT. = Y %THEN TITLE3 "SIGNIFICANT CORRS 95% ONLY"; %STR(;) RUN;
In this code, we request at a minimum the plot of the correlation between a set of ‘with’ and ‘var’ variables (_WITH, _CORR), identified in the plot by the value 0 (zero-order correlation). If partial correlations, calculated in a “PROC IML” step, are requested as well (“%DO K = 1 %TO &N_PRTLS. …”), their values are identified by 1, 2, 3, …, &N_PRTLS. in descending order, where &N_PRTLS. is a user-determined parameter. The names of the variables partialled out corresponding to 1, 2, 3, … are found in a later printout under the names PART1, PART2, PART3, … . We use * to denote overprinting (OVPCHAR option).

3. Case Study.

I present one case, without a ‘with’ variable.5 The ‘with’ variable case is merely a subset of the more general case. All the variables are continuous and their meaning is unimportant for this exercise. The usual (clipped) printout of Proc Corr and the (clipped) output data set generated in this case are:

LN_DAY
     LN_DAY   RESPONSE   N_DAYLST   N_DAYLS2   N_DAYSEX   TOT_RCVD
    1.00000    0.99097    0.92451    0.76645    0.72429    0.22447
        0.0     0.0001     0.0001     0.0001     0.0001     0.0001
      26610      16057      26610      26610      26610      26610

    SEXUNKN   N_INTRST     TENURE         V3         V1         V2
   -0.21161    0.10958   -0.05324   -0.01432    0.00437   -0.00137
     0.0001     0.0001     0.0001     0.0195     0.4757     0.8228
      26610      26610      26610      26610      26610      26610

N_DAYLS2
   N_DAYLS2   N_DAYLST   N_DAYSEX     LN_DAY   RESPONSE   TOT_RCVD
    1.00000    0.95119    0.86207    0.76645    0.67704    0.19900
        0.0     0.0001     0.0001     0.0001     0.0001     0.0001
      38185      38185      38185      26610      22931      38185

   N_INTRST    SEXUNKN     TENURE         V3         V1         V2
    0.02730    0.01862    0.00980   -0.00816    0.00204    0.00102
     0.0001     0.0003     0.0555     0.1109     0.6904     0.8423
      38185      38185      38185      38185      38185      38185

In the Proc Corr output, the first line of numbers under each set of variable names holds the correlation coefficients, the second the corresponding p-values, and the third the number of non-missing observations for the pair. With hundreds or thousands of variables this presentation is uninformative, and the wrap-around effect makes it tedious to review.
It becomes more cumbersome when the analyst wants to simplify the task by looking only at correlations with significant p-values. In this light, we propose the following Timeplot-like output (which corresponds to the set of correlations associated with LN_DAY), adapted for visualization:

ELI PLOT: CORRELATIONS BY LN_DAY
(character plot; column positions within rows are approximate)

WITH:=LN_DAY

VAR_NAME_+_LABEL              min -1                                                  max 1
                              *------------------------------------------------------------*
N_DAYLS2:                     | | 0 |
N_DAYLST: #_days lst_clkth    | | 0 |
N_DAYSEX:                     | | 0 |
N_INTRST: #_intrsts e_intr    | | 0 |
RESPONSE:                     | | 0 |
SEXUNKN:                      | 0 | |
TENURE: # days since bec      | 0 | |
TOT_RCVD: tot rcvd e_rcvd     | | 0 |
V1:                           | 0 |
V2:                           | 0 |
V3:                           | 0| |
                              *------------------------------------------------------------*
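The layout of the plot above (one row per variable, a reference bar at zero, a symbol placed in proportion to the correlation) is simple enough to sketch outside SAS. The following Python fragment is a rough illustration, not a port of Proc Timeplot: the names `eli_row`, `eli_plot` and `LABEL_WIDTH` are hypothetical, the 61-column width is an arbitrary choice, and the hyphens that join symbols (HILOC) are omitted. The correlations in the usage lines are taken from the Proc Corr printout for LN_DAY.

```python
LABEL_WIDTH = 12   # characters reserved for the variable name (arbitrary choice)


def eli_row(label, marks, width=61):
    """Render one plot row; `marks` maps a one-character symbol to a correlation."""
    cells = [" "] * width
    cells[(width - 1) // 2] = "|"            # reference line at zero correlation
    used = set()
    for symbol, corr in marks.items():
        corr = max(-1.0, min(1.0, corr))     # keep the column inside the axis
        col = round((corr + 1) / 2 * (width - 1))
        cells[col] = "*" if col in used else symbol   # '*' marks overprinting
        used.add(col)
    return f"{label:<{LABEL_WIDTH}}|{''.join(cells)}|"


def eli_plot(with_var, rows, width=61):
    """rows is a list of (variable name, {symbol: correlation}) pairs."""
    border = " " * LABEL_WIDTH + "*" + "-" * width + "*"
    lines = [f"ELI PLOT: CORRELATIONS BY {with_var}", border]
    lines += [eli_row(f"{name}:", marks, width) for name, marks in rows]
    lines.append(border)
    return "\n".join(lines)


rows = [
    ("N_DAYLS2", {"0": 0.76645}),
    ("RESPONSE", {"0": 0.99097}),
    ("SEXUNKN", {"0": -0.21161}),
    ("V1", {"0": 0.00437}),
]
print(eli_plot("LN_DAY", rows))
```

As in the SAS output, a symbol that lands on the zero reference column replaces the bar, which is why rows such as V1 show no separate mid-line.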
The previous ELI plot illustrates the correlation patterns among the variables. ‘0’ marks direct (or zero-order) correlations. The plot allows the ‘stepwise-prone’ analyst interested in variable selection to focus directly on areas of high correlation, in this case N_DAYLS2, N_DAYLST, N_DAYSEX, etc. These areas are the ones closer to the (-1, +1) ends of the axis; the midpoint of the plot marks zero correlation. Further, for every “(with, var)” pair, we can also plot the four (or any desired number of) largest first-order partial correlations, denoted by the numbers 1 through 4. Overlaps are denoted by ‘*’. The printout titled “DIRECT & PARTIAL VAR NAMES” details the names of the variables for each of the plotted correlations.

ELI PLOT: CORRS BY LN_DAY, 4 PARTIALS REQUESTED
(character plot; column positions within rows are approximate)

WITH:=LN_DAY

VAR_NAME_+_LABEL              min -1                                                  max 1
                              *------------------------------------------------------------*
N_DAYLS2:                     | 2---------------------------|---------------------*3----1 |
N_DAYLST: #_days lst_clkth    | | *3* |
N_DAYSEX:                     | | *---1 |
N_INTRST: #_intrsts e_intr    | | ** |
RESPONSE:                     | | * |
SEXUNKN:                      | 1--------*---40 | |
TENURE: # days since bec      | *3* | |
TOT_RCVD: tot rcvd e_rcvd     | | *1 |
V1:                           | 1-* |
V2:                           | * |
V3:                           | *|1 |
                              *------------------------------------------------------------*

ELI PLOT: CORRS BY N_DAYLS2, 4 PARTIALS REQUESTED

WITH:=N_DAYLS2

VAR_NAME_+_LABEL              min -1                                                  max 1
                              *------------------------------------------------------------*
LN_DAY:                       | 2---------------------------|---------------------*3----1 |
N_DAYLST: #_days lst_clkth    | | ** |
N_DAYSEX:                     | | *-1 |
N_INTRST: #_intrsts e_intr    | 1*4|0 |
RESPONSE:                     | *---------------------------|-------------------*3 |
SEXUNKN:                      | 1---------------|0------*2 |
TENURE: # days since bec      | 40-* |
TOT_RCVD: tot rcvd e_rcvd     | | * |
V1:                           | 1* |
V2:                           | * |
V3:                           | * |
                              *------------------------------------------------------------*

Let us concentrate on a specific example. The first line of the first diagram above (repeated just below for clarity of exposition) plots LN_DAY (the ‘with’ variable) against N_DAYLS2, together with four first-order partials in decreasing order of absolute magnitude. The correlations are joined by hyphens, which allows for a more compact view. ‘1’ in the first line of the graph corresponds to the correlation between LN_DAY and N_DAYLS2 after partialling out RESPONSE (which corresponds to variable PART1 in the first observation of the printout below). ‘2’ corresponds to the next largest absolute partial correlation, obtained by partialling out N_DAYLST, and so on. In the diagram, there is an overlap between the zero-order correlation and the partial corresponding to N_INTRST (PART4), denoted by ‘*’. Given the distance of all these correlations from the mid-point of zero correlation, the analyst might deem these variables worthy of further study. While p-values for direct correlations are given in a table below, the corresponding p-values for the partial correlations are not calculated at present.

WITH:=LN_DAY

VAR_NAME_+_LABEL              min -1                                                  max 1
                              *------------------------------------------------------------*
N_DAYLS2:                     | 2---------------------------|---------------------*3----1 |

DIRECT & PARTIAL VAR NAMES

WITH=LN_DAY

OBS   VAR        PART1      PART2      PART3      PART4
  1   N_DAYLS2   RESPONSE   N_DAYLST   SEXUNKN    N_INTRST
  2   N_DAYLST   RESPONSE   N_DAYLS2   SEXUNKN    TENURE
  3   N_DAYSEX   SEXUNKN    TENURE     N_INTRST   V1
  4   N_INTRST   N_DAYLS2   N_DAYLST   N_DAYSEX   V3
  5   RESPONSE   N_DAYLST   N_DAYLS2   V1         TENURE
  6   SEXUNKN    N_DAYSEX   N_DAYLST   N_DAYLS2   TOT_RCVD
  7   TENURE     N_DAYLST   N_DAYSEX   N_DAYLS2   RESPONSE
  8   TOT_RCVD   SEXUNKN    TENURE     V3         V1
  9   V1         RESPONSE   N_DAYSEX   N_DAYLST   TOT_RCVD
 10   V2         RESPONSE   N_DAYLS2   N_DAYLST   N_DAYSEX
 11   V3         RESPONSE   TOT_RCVD   N_INTRST   V2
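The PART1 to PART4 columns in the printout above name, for each (with, var) pair, the variables whose removal yields the largest absolute first-order partial correlations. A minimal Python sketch of that ranking follows, using the standard first-order partial correlation formula. It is not the macro's PROC IML code: the function names `partial_corr` and `top_partials` and the dictionary layout of the correlation matrix are assumptions made for illustration, and the sketch ignores the pairwise differences in the number of observations that missing values introduce.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        return None      # undefined when z is perfectly correlated with x or y
    return (r_xy - r_xz * r_yz) / denom


def top_partials(corr, x, y, n=4):
    """Return up to n (z, partial) pairs with the largest absolute partials.

    `corr` is a dict keyed by frozenset({a, b}) holding pairwise correlations.
    """
    others = {v for pair in corr for v in pair} - {x, y}
    scored = []
    for z in sorted(others):
        p = partial_corr(corr[frozenset((x, y))],
                         corr[frozenset((x, z))],
                         corr[frozenset((y, z))])
        if p is not None:
            scored.append((z, p))
    # Largest absolute partial first, matching the 1, 2, 3, ... plot symbols.
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:n]
```

The first element of the returned list would play the role of PART1, the second of PART2, and so on.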
WITH=N_DAYLS2

OBS   VAR        PART1      PART2      PART3      PART4
 12   LN_DAY     RESPONSE   N_DAYLST   SEXUNKN    N_INTRST
 13   N_DAYLST   LN_DAY     RESPONSE   SEXUNKN    N_INTRST
 14   N_DAYSEX   SEXUNKN    TENURE     V1         V2
 15   N_INTRST   N_DAYLST   LN_DAY     RESPONSE   TOT_RCVD
 16   RESPONSE   LN_DAY     N_DAYLST   SEXUNKN    N_INTRST
 17   SEXUNKN    N_DAYSEX   N_DAYLST   LN_DAY     RESPONSE
 18   TENURE     LN_DAY     N_DAYLST   RESPONSE   N_DAYSEX
 19   TOT_RCVD   N_INTRST   V3         V1         V2
 20   V1         RESPONSE   N_DAYSEX   TOT_RCVD   TENURE
 21   V2         N_DAYLST   LN_DAY     N_DAYSEX   RESPONSE
 22   V3         TOT_RCVD   SEXUNKN    TENURE     N_INTRST

ELI plots allow for a different configuration as well. Instead of plotting the largest first-order partial correlations in addition to the zero-order one, we can plot the largest of the first-order, second largest, third largest, etc. For the sake of brevity, this excursion is omitted.

Finally, and for documentation purposes, the correlation coefficients and corresponding p-values are also tabulated:

UPPER TRIANGULAR MATRIX ALPHABETICALLY ORDERED

CORRELATIONS

VARIABLE      LN_DAY  N_DAYLS2  N_DAYLST  N_DAYSEX  N_INTRST  RESPONSE   SEXUNKN    TENURE  TOT_RCVD
LN_DAY                    0.77      0.92      0.73      0.10      0.99     -0.21     -0.04      0.23
N_DAYLS2                            0.95      0.86      0.03      0.68      0.02      0.01      0.20
N_DAYLST                                      0.85      0.06      0.87     -0.08     -0.01      0.22
N_DAYSEX                                                0.03      0.66      0.32      0.03      0.22
N_INTRST                                                          0.12     -0.07     -0.05      0.29
RESPONSE                                                                   -0.24     -0.04      0.22
SEXUNKN                                                                               0.11      0.06
TENURE                                                                                          0.00
TOT_RCVD

Column labels in the printout: N_DAYLST = “#_days lst_clkthru &_dec.16.99”, N_INTRST = “#_intrsts e_intrs2”, TENURE = “# days since became member”, TOT_RCVD = “tot rcvd e_rcvd”.
P_VALS OF CORRS

VARIABLE      LN_DAY  N_DAYLS2  N_DAYLST  N_DAYSEX  N_INTRST  RESPONSE   SEXUNKN    TENURE  TOT_RCVD
LN_DAY                   0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000
N_DAYLS2                           0.000     0.000     0.000     0.000     0.000     0.056     0.000
N_DAYLST                                     0.000     0.000     0.000     0.000     0.028     0.000
N_DAYSEX                                               0.000     0.000     0.000     0.000     0.000
N_INTRST                                                         0.000     0.000     0.000     0.000
RESPONSE                                                                   0.000     0.000     0.000
SEXUNKN                                                                              0.000     0.000
TENURE                                                                                         0.831
TOT_RCVD

Since many correlations may not be significant at a confidence level of, say, 95% (alpha = 0.05), the ELI graphs can be made to portray significant correlations only. In our example, however, we presented all possible effects with their corresponding partial correlations.

6. Trademarks.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

7. End Notes.

1 Data mining has often been defined as the search for patterns, interesting or otherwise. Curiously, “interesting” is in the eye of the beholder, and patterns are not well defined. Ergo, any tool that purports to find interesting patterns belongs under the rubric of data mining, which thus cannot properly define any scientific application, since almost anything can belong to it. My own preference is “Giga-data analysis” (as opposed to the more traditional statistician’s “small data set analysis”). It is in this spirit that I envision this paper. Since information from data requires the processes of summarization, conceptualization, interpretation and application, the data analyst victorious in all these steps after successful perusal of reams of pages might require hospitalization as well.

2 Yes, I am that old. This paper deals only with Pearson correlation coefficients, but the additional use of other measures contained in Proc Corr is straightforward. Programming Timeplot-like diagrams in other software should not pose an insurmountable task. I created my first diagram in Basic in 1980. Additionally, the adjustments necessary for correlations between continuous and categorical variables, as well as among categorical variables, can be easily added.

3 I consider the name Timeplot a limiting and misleading denomination. C’est la vie.

5 The partial correlation between Y and X given Z can also be understood as the correlation between the residuals of a regression of Y on Z and the residuals of a regression of X on Z. See Cohen and Cohen (1983) for an overall discussion, and Leahy (1996) for suppression effects in the area of data base marketing.

6 The skillful programmer might be enticed to utilize Proc Printto. My preference for a more arduous route is based on the additional flexibility provided to enhance the overall procedure, such as including partial correlations in one step, multiple comparisons of correlations, Drezner’s Multirelation (1995), etc. Missing values are excluded from the calculation of correlations in a pair-wise form. For a proposed solution to the problem of missing values in the context of large databases, see Auslender (1997).
7 The macro at present accepts only one ‘with’ variable. It is a straightforward modification to enhance the code to accept multiple ‘with’ variables.

8. Bibliography

Auslender L., Missing Value Imputation Methods for Large DataBases, Proceedings of the 1997 Northeastern SAS Users Group Meeting, 1997.
Cleveland W., Visualizing Data, Hobart Press, USA, 1993.
Cohen J., Cohen P., Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, Lawrence Erlbaum Associates, Publishers, 1983.
Drezner Z., Multirelation – Correlation among more than two variables, Computational Statistics and Data Analysis, March 1995.
Hoaglin D., Mosteller F., Tukey J., Understanding Robust and Exploratory Data Analysis, John Wiley & Sons, 1983.
Leahy K., Nature, prevalence, and benefits of suppression effects in direct response segmentation, Proceedings of the American Statistical Association 1995 Meeting, 1996.

9. Contact Information

Your comments and questions are valued and encouraged. Contact the author at:

Leonardo E. Auslender
SAS Institute
1545 Rt. 206 N, Suite 270
Bedminster, NJ 07921
908 470 0080 x 8217 (o)
908 470 0081 (f)
leonardo.auslender@sas.com