For Problem 2, you are to evaluate the given analysis and interpretation for clarity, completeness,
sufficiency, accuracy, and consistency. Indicate what you think is good, not good, and what you
would do differently. Note: points will be deducted for comments on format. The critique must
be about predictive analytics, not layout. Do not copy the report into your exam, rather use the
2 Return Surgeries, 15 points
For this problem, you are to evaluate the analysis and interpretation for clarity, completeness,
sufficiency, accuracy, and consistency. Indicate what you think is good, not good, and what you
would do differently. Keep your assessments by the numbering system to assure your criticisms
coincide with the respective material.
I do not want comments on format. Whether a figure is not in a convenient place is of no
interest. Grammar and spelling are of no interest. I will mark down for non-essential criticism.
Focus on the analysis and the interpretation of the results.
2.1 Introduction
A continuing question on daily return surgeries time series indices is whether any one series is
interchangeable with another; i.e., does one surgery index time series have the same daily counts
as some other particular surgery index time series within specified statistical error? Statistical
differences in surgery index time series include, e.g., networks of multiple observers or counting
methodology. The Debrecen index is compared to the Surgery Tracking And Recognition Algorithm
(STARA) index, and with the Addendum of Authenticated and Verified Surgery Observations
(AAVSO) index.
A pairwise comparison of index counts is confounded by the possible autocorrelation of each
series, and hence a traditional regression-type comparison is inappropriate as the autocorrelation
violates the regression independence assumption. In addition, two aspects of a time series must
be examined for a comparison; when the count occurred and the count magnitude. The analytical
methodologies include autocorrelation and cross-correlation from statistical time series analysis to
determine when a count occurred, and the nonparametric Wilcoxon signed rank test to compare
magnitudes of the count. The time series analyses are used to determine pairwise day-by-day
alignment. Once the paired series are time-aligned, the count magnitudes are compared using the
Wilcoxon signed rank test as the counts data do not follow a normal (Gaussian) distribution.
Section 2.2 describes the three returning time series data sets; Section 2.3 discusses
the time series statistical analyses of each of the three data sets; Section 2.4 presents the data set
comparisons, or statistical time series cross-correlation analysis, including a brief explanation of why
regression is inappropriate; Section 2.5 gives the count magnitude comparisons; and Section 2.6 gives
the conclusions.
2.2 Data Sets
This section describes the daily returning surgeries time series of the AAVSO, Debrecen, and
STARA data sets. The descriptions indicate some of the
characteristics that must be accounted for
prior to a statistical comparison. The AAVSO series is
described first, followed by the Debrecen
series, and ending with the STARA series.
2.2.1 AAVSO Data
The AAVSO’s program of data-gathering and analysis of
surgeries has been active since its inception
in 1944. AAVSO raw data are submitted monthly as sets of
date- and time-stamped values. The
pre-scrubbed AAVSO data contain 34,435 returning surgery
counts that span from May 1, 2010
through July 12, 2013. The left panel of Figure 1 shows that
these data are truncated on the left
at zero counts and skewed to the right. The histogram suggests
these count data follow a Poisson
distribution.
2.2.2 Debrecen Data
The pre-scrubbed Debrecen data contain 41,866 daily returning surgery counts that span from
December 4, 1981 through January 5, 2011. As with the AAVSO data, the middle panel of Figure
1 shows that these data are truncated on the left at zero counts and skewed to the right. The
histogram suggests these data follow a Poisson distribution.
Figure 1: AAVSO, Debrecen, and STARA index counts histograms of the pre-scrubbed data. The
green dashed curves are best-fit exponential distributions, and the black solid curves are best-fit
gamma distributions.
2.2.3 STARA Data
The STARA data contain 1,152 daily returning surgery
counts that span from May 1, 2010
through July 12, 2013. The right panel of Figure 1 shows that
these data are truncated on the
left at zero counts, and are skewed to the right. This suggests
these counts data follow a Poisson
distribution.
The Poisson distributions of each of these data sets affect the
accuracy of the paired count
magnitude comparisons, as will be seen below.
2.3 Autocorrelation Analysis
A time series is a stochastic process where the index set is of countable time increments; i.e., a
time series is a set of observations, $x_t$, each recorded at a specified time $t$. To allow for the
possibly unpredictable nature of future observations we may suppose that each observation is a
realization of a random variable $X_t$. The time series $\{x_t, t \in T_0\}$ is a realization of the
family of random variables $\{X_t, t \in T\}$, where $T \geq T_0$; i.e., the realization $x_t$ is a
subset of all possible values of $X_t$.
The following time series process analyses assess whether the
count pairings are index set (time)
aligned. This alignment is necessary before paired counts
magnitude comparisons can be made.
The time series autocorrelation analysis is preceded by a
descriptive analysis of the data sets.
2.3.1 Descriptive Analysis
The AAVSO and Debrecen series have days with multiple
observations which we summarize by the
count median. Further, the time span of each data set must be
matched. The common span is
found to be from May 1, 2010 through January 5, 2011. The
time series that result from using the
daily median and matched spans are displayed in Figures 2 and
3. Figure 2 depicts the three series
in a stacked, matched-span plot. The AAVSO data are in the
upper panel, the Debrecen data are
in the middle panel, and the lower panel has the STARA data.
These plots show ambiguously
matched count magnitudes.
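The two preparation steps described above (summarizing multiple same-day observations by their median, then restricting every series to the common span) can be sketched as follows. This is a minimal illustration only: the function names and the toy observations are hypothetical and are not taken from the report, and the sketch does not handle days that are missing from one series inside the common span.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def daily_median(observations):
    """Collapse (date, count) observations to one median count per day."""
    by_day = defaultdict(list)
    for day, count in observations:
        by_day[day].append(count)
    return {day: median(counts) for day, counts in by_day.items()}


def match_span(series_list):
    """Restrict each {date: value} series to the span common to all series."""
    start = max(min(s) for s in series_list)  # latest first day
    end = min(max(s) for s in series_list)    # earliest last day
    return [{d: v for d, v in s.items() if start <= d <= end} for s in series_list]


# Hypothetical toy observations, not the report's data.
a = daily_median([(date(2010, 5, 1), 4), (date(2010, 5, 1), 8), (date(2010, 5, 2), 3)])
b = daily_median([(date(2010, 5, 2), 5), (date(2010, 5, 3), 7)])
a_matched, b_matched = match_span([a, b])
```

In this toy case the only day shared by both series is May 2, 2010, so each matched series keeps a single value.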
Figure 2: The AAVSO (top panel), Debrecen (middle panel),
and STARA (bottom panel)
matched-span time series plot. The data are daily.
Figure 3 has the three matched-span series superimposed over each other. The AAVSO series
is the solid black curve, the Debrecen series is the dashed red curve, and the STARA series is the
dotted green curve. As with the stacked plot, this plot also shows no obvious coincidence in count
magnitude.
Figure 3: The three matched-span series superimposed. The AAVSO series is the solid black
curve, the Debrecen series is the dashed red curve, and the STARA series is the dotted green curve.
The data are daily.
Fortunately, statistical time series analysis is able to remove
much of the apparent ambiguity.
Time series analysis will help determine if the counts are time-
aligned. Once this outcome is
available, a magnitude comparison is possible.
2.3.2 Autocorrelation Models
Before the counts time series magnitudes can be compared, the
individual time series must be
examined for autocorrelation, as autocorrelation inflates the
series variability. A critical property of
any time series is stationarity, which is required to assess the
autocorrelations and cross-correlations.
Stationarity is the property of a time series in which, over a specified time span, the mean and
variance of the series are constant. This is the time series analysis equivalent of the mean zero,
constant variance assumption requirement for such statistical methods as the t-test, analysis of
variance, and regression. If a time series follows a Gaussian distribution, then it can be shown that
the time series is stationary.
We saw above that the three counts time series do not follow a normal distribution, and hence
stationarity may not be assumed. A commonly used
transformation to obtain a stationary time
series is differencing. A first difference transformation is
$$\nabla X_t = X_t - X_{t-1} = X_t - B X_t = (1 - B) X_t, \qquad (1)$$
where $\nabla X_t$ is the $t$th first difference operation between the $t$th and the $(t-1)$st values
of the random variable $X$, and $B$ is the back shift operator such that $B X_t = X_{t-1}$. The
differencing operator may be extended to second ($\nabla^{(2)}$), third ($\nabla^{(3)}$), etc.,
differences, as can the back shift operator $B$,
but higher order differencing is not needed for the return
surgeries time series. The first difference
transformation results in stationarity for each of the three
series.
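As a small illustration of the first difference transformation in Equation (1), the sketch below applies the operator (1 - B) to a short list of counts. The function name and the counts are hypothetical; the order argument only shows how repeated differencing would work and is not needed for the report's series.

```python
def difference(series, order=1):
    """Apply the first-difference operator (1 - B) to a series `order` times."""
    out = list(series)
    for _ in range(order):
        # Each pass replaces X_t by X_t - X_{t-1} and shortens the series by one.
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


counts = [3, 5, 4, 9, 7]        # hypothetical daily counts
first = difference(counts)      # [2, -1, 5, -2]
second = difference(counts, 2)  # [-3, 6, -7]
```

Note that each differencing pass loses one observation, so a first-differenced series is one value shorter than the original.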
With stationarity established, we can examine each series for
autocorrelation. The sample
autocorrelation function (ACF) and the sample partial
autocorrelation function (PACF), and their
associated plots, are used to identify if and what types of
autocorrelation exist. The sample ACF
measures time series white noise autocorrelation as a moving
average order. The sample PACF
measures time series autocorrelation as the order of
autoregression.
In Figures 4 and 5, the panels on the diagonal depict the first-differenced (lag 1) series sample
ACF and sample PACF respectively. The off-diagonal panels are unadjusted cross-correlations
between paired series, and are here ignored pending further time
series analysis. In each figure, the
row one column one panel is the AAVSO series, the second row
second column panel is the Debrecen
series, and the third row third column is the STARA series, each
after taking first differences. We
are interested in the plot lag values of each panel that extend
above or below the horizontal blue
dashed 95% confidence interval (CI) lines. Each series has 211
days of return surgery counts, and
at the 95% CI, this suggests that there are 0.05 × 211 ≈ 11
expected CI marginal overreaches. We
therefore are interested in those lag patterns that strongly
extend outside the CI band.
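For readers who want to see what the sample ACF panels compute, here is a minimal sketch. It assumes the dashed lines are the usual large-sample white-noise band at plus or minus 1.96 divided by the square root of n; the report does not state how its band was computed. The function names and the toy data are hypothetical.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag, using the 1/n autocovariance estimator."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    acf = []
    for k in range(max_lag + 1):
        ck = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf


def white_noise_band(n, z=1.96):
    """Half-width of the approximate 95% band for the ACF of white noise."""
    return z / math.sqrt(n)


x = [2, -1, 5, -2, 0, 3, -4, 1]  # hypothetical differenced counts
rho = sample_acf(x, max_lag=3)
band = white_noise_band(len(x))
# Lags (excluding lag 0) whose sample autocorrelation falls outside the band.
outside = [k for k, r in enumerate(rho) if k > 0 and abs(r) > band]
```

For a series of 211 values, the band under this assumption is roughly plus or minus 0.135.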
Figure 4 is the sample ACF of the three series. The zeroth lag (t
= t) is ignored in each sample
ACF plot. The AAVSO plot suggests a lag 1 (preceding day)
moving average model should be
examined. The Debrecen plot indicates that both a lag 1 and a
lag 3 moving average model may
be appropriate. The STARA plot, like the AAVSO plot, suggests
a lag 1 moving average model
should be tested.
Figure 4: The sample ACFs of the AAVSO, Debrecen, and
STARA time series.
Figure 5 is the sample PACF of the three series. In each panel
on the diagonal of the plot, there
are no systematic overreaches of the CIs, i.e., the overreaches
appear random, which suggests no
autoregressive behavior in these three series.
Figure 5: The sample PACFs of the AAVSO, Debrecen, and
STARA time series.
The sample ACF and sample PACF suggest the types of time
series models for each surgery
count source. The models take the form of Autoregressive Integrated Moving Average (ARIMA)
models. The models are denoted as ARIMA(p,d,q), where AR refers to the autoregressive component,
I refers to the integrated component which determines the order of differencing to establish
stationarity, MA refers to the moving average component, and p, d, and q are the non-negative
integers indicating the orders of autoregression, integration, and moving averaging, respectively.
The ARIMA analysis of the AAVSO series gives an ARIMA(0, 1, 1) model, the Debrecen model is
ARIMA(0, 1, 3), and the STARA model is ARIMA(1, 1, 3).
Goodness-of-fit indicators for the ARIMA models are
cumulative periodograms of the model
standardized residuals, and time series plots of the standardized
residuals. The behavior of the
model residuals is particularly important for the cross-correlation
analysis below. Figures ??
and ?? are the diagnostics for the AAVSO ARIMA(0, 1, 1)
model. Figure ?? is the cumulative
periodogram. The blue dashed diagonal lines define a 95% CI
band that, if the black cumulative
periodogram curve lies within, suggests the model is adequate.
Containment of the curve within the
CI band suggests it follows a normal distribution, which is an
indicator of model adequacy. Figure
?? has three diagnostic plots. The top panel is the standardized
residuals time series plot which
indicates an adequate model when no more than 11 residuals
exceed the plus or minus 3 standard
deviation levels. The middle panel is the sample ACF of the
residuals which suggest the ARIMA
model is adequate as all the lags lie within the horizontal red
dashed 95% CI levels. The bottom
panel is the Ljung-Box p-value plot in which no p-values fall
below the threshold line indicating an
adequate model. Hence, the ARIMA(0, 1, 1) may be considered
a reasonable model of the AAVSO
series.
Figure 6: AAVSO series ARIMA model diagnostic plots.
Figures ?? and ?? are the diagnostics for the Debrecen
ARIMA(0, 1, 3) model. Figure ?? is
the cumulative periodogram which suggests it follows a normal
distribution. Figure ?? has the
time-based diagnostic plots. The standardized residuals time
series plot has only 2 of the possible
11 values that lie outside ±3 standard deviations. The sample
ACF of the residuals suggest the
ARIMA model has all the lags within the 95% CI band. The
Ljung-Box p-value plot has no p-
values below the horizontal red threshold line. Hence, the
ARIMA(0, 1, 3) may be considered a
reasonable model of the Debrecen series.
Figure ?? and ?? are the diagnostics for the STARA ARIMA(1,
1, 3) model. Figure ?? is the
cumulative periodogram which suggests the periodogram is
normally distributed. Figure ?? has
the three time series diagnostic plots. The standardized
residuals time series plot has no residuals
outside the plus or minus 3 standard deviation levels. The
sample ACF of the residuals has all the
lags within the horizontal red dashed 95% CI levels. The Ljung-
Box p-value plot has no p-values
below the horizontal red threshold line. Hence, the ARIMA(1,
1, 3) may be considered a reasonable
model of the STARA series.
The autocorrelation of each of the three return surgery data sets
has been identified and described. The residuals analyses show that the residuals of each
time series are reduced to white
noise, and thus the residuals are independent between any series
pair. This is an important property
for the series comparisons. We may now make pairwise
comparisons of the data sets.
Figure 7: Debrecen series ARIMA model diagnostic plots.
2.4 Cross-Correlation Analysis
The panel of scatter plots of the count sources in Figure 9 show
the paired series associations. The
second row, column one panel shows that the Debrecen versus
AAVSO data have a clear nonlinear
relationship with the smaller counts having the greater
nonlinearity, and the large counts have the
greater variability. A similar nonlinear relationship exists
between the Debrecen and STARA series,
which is depicted in the second row, column three panel.
However, the STARA versus AAVSO data
exhibit a more nearly linear relationship, though the variability
of the larger counts is greater. This
relationship is shown in the panel in the third row of the first
column. Some of these characteristics
have been addressed by constructing ARIMA models for each
series, and it is with these models
that the cross-correlations, i.e., the time-based alignment, may
be developed.
With autocorrelated data it is difficult to assess the dependence
or comparison between any
two time series. It is therefore necessary to disentangle the
linear association between any two
series from their respective autocorrelations. Another property
that must be satisfied is that the
two series must be stationary and independent of each other.
While the data may be stationary,
they must still be transformed to white noise to assure
independence. The transformation may be
accomplished by using the residuals from the respective series
ARIMA models. We saw from the
ARIMA model diagnostics that the residuals from the series
ARIMA models are white noise, thus
implying that the residuals of the ARIMA models are
independent. For example, it was shown
that the AAVSO data are adequately modeled by an ARIMA(0, 1, 1) with no intercept term, so,
for $x_t$ representing the AAVSO counts,
$$\bar{x}_t = z_t - \theta z_{t-1} = (1 - \theta B) z_t, \qquad (2)$$
where $\bar{x}_t$ is the white noise model return surgery count at time $t$, $z_t$ is the white noise
value at time $t$, and $\theta$ is the white noise parameter that is estimated from the ARIMA model
analysis. The ARIMA model residuals $\bar{x}_t$, $t = 0, \pm 1, \pm 2, \cdots$, are white noise and
this process is known as prewhitening.
Figure 8: STARA series ARIMA model diagnostic plots.
We now compare the two series using the cross-correlation
function (CCF) by prewhitening one
series with its ARIMA model. The other series then is filtered
through this same ARIMA model.
Stationarity is assured by incorporating the first difference in
the ARIMA filter. As prewhitening is a
linear operation, any linear relationship between the two series
will be preserved after prewhitening.
For example, to compare the AAVSO data with the Debrecen
data, first prewhiten the AAVSO
data using its ARIMA model. Then filter the Debrecen data with
the AAVSO ARIMA model.
Finally, use the CCF to look for lags between the two series.
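The three steps just listed can be sketched for the simplest case, an ARIMA(0, 1, 1) filter with no intercept. This is an illustration under stated assumptions, not the report's code: the value of theta, the function names, and the two toy series are hypothetical, the moving average filter is inverted recursively starting from zero, and the lag convention (lag k pairs x at time t + k with y at time t) follows the one used by R's ccf function.

```python
import math


def difference(x):
    """First difference (1 - B) x_t."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]


def ma1_filter(w, theta):
    """Invert w_t = z_t - theta * z_{t-1}, assuming z = 0 before the data start."""
    z, prev = [], 0.0
    for value in w:
        prev = value + theta * prev
        z.append(prev)
    return z


def sample_ccf(x, y, max_lag):
    """Sample correlation between x[t + k] and y[t] for k = -max_lag..max_lag."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    ccf = {}
    for k in range(-max_lag, max_lag + 1):
        pairs = [(x[t + k], y[t]) for t in range(n) if 0 <= t + k < n]
        ccf[k] = sum((a - mx) * (b - my) for a, b in pairs) / (n * sx * sy)
    return ccf


theta = 0.5  # hypothetical MA(1) estimate, not a value from the report
series_a = [3, 5, 4, 9, 7, 8, 6, 10, 12, 9]    # hypothetical counts
series_b = [4, 6, 5, 10, 8, 9, 7, 11, 13, 10]  # series_a shifted up by 1
za = ma1_filter(difference(series_a), theta)   # prewhitened series
zb = ma1_filter(difference(series_b), theta)   # other series, same filter
ccf = sample_ccf(za, zb, max_lag=3)
peak_lag = max(ccf, key=lambda k: abs(ccf[k]))
```

Because the second toy series differs from the first only by a constant, the filtered series coincide and the largest cross-correlation sits at lag 0, which is the pattern the text describes as time alignment.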
Often a regression model is used to measure the relationship of
one counts series to another. The
fallacy of this method arises from the violation of two
assumptions of regression: (i) the response
must follow a normal distribution, and (ii) the two series must
be independent. The first assumption
was shown above to be violated as the counts follow a Poisson
distribution. The second assumption
is violated as demonstrated by the autocorrelation identified in
the ARIMA model analyses, which
is an indictment of non-independence.
Figure 9: Scatter plots of the return surgeries count sources show the paired series associations.
Figure 10 is the sample CCF between the ARIMA(0, 1, 1) filtered Debrecen counts and the
ARIMA(0, 1, 1) prewhitened AAVSO counts. It is clear from the plot that the only lag is at zero,
which suggests that the two series are nearly aligned in time.
Figure 11 is the sample CCF between the ARIMA(0, 1, 1)
filtered STARA counts and the
ARIMA(0, 1, 1) prewhitened AAVSO counts. The plot shows balance between the AAVSO and the
STARA data. The AAVSO series and the STARA series are balanced at lag 0. This balance
suggests that the two series are aligned in time.
Figure 12 is the sample CCF between the ARIMA(1, 1, 3)
filtered Debrecen counts and the
ARIMA(1, 1, 3) prewhitened STARA counts. The AAVSO series and the STARA series are
balanced at lag 0. This balance suggests that the two series are aligned in time.
The cross-correlation analysis gives the pairwise time
alignments to compare the magnitude of
the counts for each series. The cross-correlation between the
AAVSO and Debrecen series have zero
lag and hence they are aligned. The same result holds for the
cross-correlation between the AAVSO
and STARA data, i.e., they are aligned. Similarly, the cross-
correlation between the STARA and
Debrecen data show they are aligned.
2.5 Magnitude Comparison
Figure 10: The sample CCF between the ARIMA(0, 1, 1) filtered Debrecen counts and the
ARIMA(0, 1, 1) prewhitened AAVSO count residuals.
Figure 11: The sample CCF between the ARIMA(0, 1, 1) filtered STARA counts and the
ARIMA(0, 1, 1) prewhitened AAVSO count residuals.
Figure 12: The sample CCF between the ARIMA(1, 1, 3) filtered Debrecen counts and the
ARIMA(1, 1, 3) prewhitened STARA count residuals.
With the appropriate shifts for each return surgery counts series if needed, the counts magnitude
comparison is tested with the Wilcoxon signed ranks test. This test is used over the t-test as the
counts data do not follow a normal distribution, which is an assumption required for the t-test. For
the $n^*$ time-ordered data pairs $(x_{1,1}, x_{2,1}), (x_{1,2}, x_{2,2}), \cdots, (x_{1,n^*}, x_{2,n^*})$,
the absolute values of the differences $|D_i|$ are found, where
$$D_i = x_{1,i} - x_{2,i}, \quad i = 1, \ldots, n^*. \qquad (3)$$
Simplistically, all differences with the value 0 are eliminated so the remaining differences are
$n \leq n^*$. The $n$ $|D_i|$ differences are ordered from lowest to highest, and then are ranked 1
to $n$. The $i$th rank $R_i$ is designated as a positive rank if $D_i > 0$, or $R_i$ is designated as
a negative rank if $D_i < 0$. The test statistic is the sum of the positive signed ranks:
$$T^* = \sum R_i, \quad \forall R_i \ni D_i > 0, \quad i = 1, \ldots, n. \qquad (4)$$
The test statistic $T^*$ is compared to the quantiles of a distribution whose shape varies depending
on conditions.
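The computation of the test statistic described above can be sketched as follows. The function name and the toy pairs are hypothetical. One assumption goes beyond the text: tied absolute differences are given the average of the ranks they span (midranks), a common convention that the report does not specify.

```python
def signed_rank_statistic(x1, x2):
    """Return (T*, n): the sum of positive signed ranks and the number of nonzero differences."""
    # Paired differences D_i, dropping pairs whose difference is exactly 0.
    diffs = [a - b for a, b in zip(x1, x2) if a != b]
    # Indices of the differences, ordered by |D_i| from lowest to highest.
    ordered = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(ordered):
        end = start
        # Extend the block while the next |D_i| ties with the current one.
        while (end + 1 < len(ordered)
               and abs(diffs[ordered[end + 1]]) == abs(diffs[ordered[start]])):
            end += 1
        midrank = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[ordered[pos]] = midrank
        start = end + 1
    t_star = sum(r for r, d in zip(ranks, diffs) if d > 0)
    return t_star, len(diffs)


# Hypothetical paired daily counts, not the report's data.
t_star, n = signed_rank_statistic([5, 3, 8, 6, 7], [4, 3, 10, 3, 8])
```

In the toy example one pair is tied at zero and is dropped, leaving four differences (1, -2, 3, -1); the two positive differences carry ranks 1.5 and 4.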
Table 2 lists the surgery counts time series pairs and their
respective Wilcoxon signed rank
test statistics. The last column in the table indicates if the count
magnitudes may be considered
statistically equal. Only the STARA and Debrecen time series
have statistically identical daily
counts.
Table 2: Wilcoxon rank sum test with continuity correction counts magnitude comparison.
X        Y          n     W         P(>W)       X = Y
AAVSO    Debrecen   211   35368.5   < 2.2e-16   no
AAVSO    STARA      211   34903     < 2.2e-16   no
STARA    Debrecen   210   22286.5   0.8468      yes
2.6 Conclusions
Three time series of daily returning surgeries counts were
compared for interchangeability; i.e.,
does one return surgery time series have the same daily counts
as some other particular time series
within specified statistical error? Each series had peculiarities,
e.g., networks of multiple observers
or counting methodology, for which some adjustments were
made in the time series and magnitude
analyses.
The Debrecen time series was compared to the STARA time
series, and with the AAVSO time
series. Also, the STARA and AAVSO series were compared.
These daily time series were shown to
be autocorrelated which was accounted for before the series
were compared.
Each time series was made stationary by taking the first
difference. The autocorrelation function
and the partial autocorrelation function were used to identify
the order and type of autocorrelation for each of the series. The analysis of the AAVSO series
gave the ARIMA(0, 1, 1) model,
the Debrecen series analysis gave the ARIMA(0, 1, 3) model,
and the STARA analysis gave the
ARIMA(1, 1, 3) model.
The cross-correlation function (CCF) between the ARIMA(0, 1,
1) filtered Debrecen counts
and the ARIMA(0, 1, 1) prewhitened AAVSO counts showed the
count changes occurred on the
same days. It was clear from the plot that there was no lagging,
which suggested that the two
series were time-aligned. The CCF between the ARIMA(0, 1, 1)
filtered STARA counts and the
ARIMA(0, 1, 1) prewhitened AAVSO counts showed that the
count series were time-aligned. The
CCF between the ARIMA(1, 1, 3) filtered Debrecen counts and
the ARIMA(1, 1, 3) prewhitened
STARA counts suggested that the Debrecen series and the
STARA data are time aligned.
After the appropriate series shifts were made, the magnitude of
the series counts was compared.
Table 2 gives the details of the counts magnitude comparisons,
and the table shows that only the
STARA and Debrecen series are interchangeable.
We showed that returning surgeries time series counts
comparisons are best made after a statistical time series analysis is performed. We also showed that, as
the counts do not follow a normal
distribution, the appropriate magnitude comparison statistical
method is the Wilcoxon signed ranks
test provided the series pairings first are time-aligned. The
results showed that only the Debrecen
series and the STARA series are interchangeable.
3 Bonus, 3D VAR(2) Model, 5 points
Set up the three-dimensional (3D) VAR(2) where the third
variable does not Granger-cause the
first variable. The Bonus.R script may help.
4 Bonus, “Best Model”, 5 points
Give criteria for aiding in the choice of a “best” time series
model when two or more such models
are available. What is, arguably, the most important criterion?