Forecasting the Term Structure using Nelson-Siegel
         Factors and Combined Forecasts
              Econ 220F, Prof. Gordon Dahl, Spring 2007
                                Michael D. Bauer

                                   June 12, 2007


                                       Abstract

    This paper attempts to replicate the good performance of the DL-approach to
 term structure forecasting, documented in Diebold and Li (2006), in a newer dataset.
 It finds that in the original specification, which uses AR(1) processes to forecast the
 Nelson-Siegel factors, this approach does not perform well. A better alternative is to
 model the factors as martingales, and therefore simply to predict future yields by today's
 fitted yields. Furthermore, the persistence of the individual yields and their pricing
 errors allows us to reduce the forecast error variance by shrinking the DL-forecasts
 towards the current yields. Forecast combinations that incorporate DL-forecasts
 and other yield forecasts perform best among the considered competitors.
1    Introduction
Accurate forecasts of the term structure of interest rates are crucial in bond portfolio
management, derivative pricing, and risk management. Affine term structure models,
which are heavily used in practice to price options and other derivatives on fixed income
instruments, perform poorly in out-of-sample forecasting (Duffee, 2002). An important
recent contribution in this field is Diebold and Li (2006), who adapt the Nelson-Siegel
(1987) framework to the purpose of forecasting the entire term structure. They exploit
the parsimony of this framework by forecasting just three latent factors (the Nelson-
Siegel factors, henceforth NS-factors), which are modeled as univariate AR(1) processes
and interpreted as level, slope and curvature of the yield curve. From the forecasts of these
factors, the entire term structure at future points in time can be generated. This approach
is now commonly known as the Diebold-Li (DL) approach to forecasting the yield curve,
and it outperforms all considered competitors over a forecast horizon of 12 months in US
data in the forecast exercise of these authors.
   In the present paper, we attempt to replicate these findings, using first data over an
identical sample window, and then a bigger sample including data up until December
2006. We extract the factors and analyze their dynamic properties. Then we compare
fitted and empirical yield curves, and find that pricing errors are persistent. The main
task is the forecasting exercise: We compare the original DL-approach and some variants to
several competitor models. In particular, we vary the specification of the factor dynamics,
considering Random Walks as well as AR(1) processes. A decomposition of the forecast
error variance into the contribution of the factor prediction errors and the pricing errors
provides insight into how different DL-specifications compare relative to each other, and
how one could improve upon the DL-method. Furthermore we include combined forecasts
in the analysis, using both equal weights and performance weights, that each combine the
forecasts of all competing individual forecast models.
    Our results are disappointing if one believes in the superior performance of the DL-
approach in its original specification: While for the sample including the same time periods
as DL, the DL-approach does outperform some competitors at selected maturities and
forecast horizons, the outperformance is less pronounced than in the original paper. Since
summary statistics and yield curves on selected dates agree quite closely between us and
DL, it is surprising to find such different results. Using the full sample, the performance of
the original DL-approach is even worse. Specifying AR(1) processes to forecast the factors,
one can in no case beat the random walk significantly in terms of predictive accuracy. We
find that no-change forecasts for the NS-factors generally improve the forecast. The


evidence suggests that one should not try to forecast the factors, since the additional
estimation uncertainty contaminates the forecasts. Today’s fitted yield is usually the best
DL-forecast.
    Another conclusion is one that is made frequently: The methods that perform by
far the best are the combined forecasts. The diversification gains from including several
competing forecasts into a combined forecast are considerable, and make this approach to
yield curve forecasting our method of choice.
    The paper proceeds as follows: Section 2 explains the zero curve and its construction
and presents summary statistics for the zero curves for the US that we bootstrapped from
the CRSP bond price data. In section 3 we analyze the NS-factors as they are extracted
from the US data, compare them to empirical factors (level, slope and curvature as they
are usually calculated), and assess the fit that the NS-factors provide to empirical yield
curves. In section 4 we describe and discuss the DL-approach to forecasting, present its
competitors and the method to compare predictive accuracy. Then the results of the
forecast exercise are presented and discussed. Section 5 concludes.


2     The Zero Curve
With the CRSP Monthly Treasury Database, we have an excellent data source at hand for
constructing the US term structure of interest rates. It is very well documented, updated
regularly, and checked for consistency. We defer the details of the data processing to
appendix A. In the following subsection we explain why it is necessary to bootstrap spot
rates from the data and outline how we performed this task. Then we present summary
statistics for the US yield curve.


2.1    Bootstrapping the Zero Curve from Bond Prices
The yields we are interested in are the spot rates: the net return obtained on
a τ -period investment. The cross-section (across different maturities) of spot rates at a
particular point in time is called the term structure of interest rates, or the yield/spot rate/zero
curve. For τ -period discount bonds, the yield to maturity (YTM) is equal to the spot rate
for that period, yt (τ ). For a coupon bond, the YTM is the discount rate that makes
the present value of future coupon and principal payments equal to the cash price of the
issue. This yield is not equal to a spot rate, since coupon payments cannot generally be
reinvested at the same rate. Therefore the YTM across different maturities is not equal to
the zero curve. This latter has to be constructed from observed bond prices, and we do so


using the methodology of Fama and Bliss (1987): Simply put, “the discount rate function
is extended each step by computing the forward rate necessary to price successively longer
maturity bonds given the discount rate function fitted to the previously included issues”
(Bliss, 1996, p.10). Since there is a one-to-one mapping between zero curve and discount
rate function, this achieves the goal of constructing the zero curve. We provide details
about the procedure in the appendix. After obtaining spot rates for all maturities at
a certain point in time, we linearly interpolate these rates onto 17 fixed maturities, as DL
did.
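As an illustration, the final interpolation step can be sketched in a few lines. The 17-maturity grid below (3 to 120 months) is the one used by DL; the observed maturities and rates in the usage example are purely hypothetical.

```python
import numpy as np

# The 17 fixed maturities (in months) onto which we interpolate, as in DL.
FIXED_MATURITIES = np.array([3, 6, 9, 12, 15, 18, 21, 24, 30,
                             36, 48, 60, 72, 84, 96, 108, 120])

def interpolate_to_fixed_maturities(maturities, spot_rates):
    """Linearly interpolate one month's bootstrapped spot rates
    (observed at irregular maturities) onto the fixed grid."""
    return np.interp(FIXED_MATURITIES, maturities, spot_rates)

# Hypothetical cross-section of bootstrapped rates for a single month:
obs_mat = np.array([1, 4, 10, 25, 61, 119])          # months
obs_yld = np.array([4.8, 5.0, 5.2, 5.6, 6.1, 6.4])   # percent
curve = interpolate_to_fixed_maturities(obs_mat, obs_yld)  # shape (17,)
```

Note that `np.interp` clamps to the end values outside the observed maturity range, so the grid should in practice lie within the range of bootstrapped maturities.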


2.2    Summary Statistics
Figure 1 provides a three-dimensional plot of the US yield curve over the entire time
horizon. The variation in the level of the yield curve is much stronger than the variation
in slope and curvature, yet the latter are obviously important as well.
    In table 1 we show the summary statistics for the monthly term structure in the US.
We see some of the usual patterns:

    • The average yield curve is upward sloping.

    • The short end of the yield curve is more volatile than the long end.

    • Yields of all maturities are highly persistent.

Like DL we find short rates in our sample to be more persistent than long rates. In sum,
our table is qualitatively and quantitatively very similar to table 1 in DL.


3     Fitting the Yield Curve using Nelson-Siegel Factors
3.1    Extracting the Factors
In extracting the NS-factors from the yield data, we follow exactly the approach of Diebold
and Li (2006): For each month of data, we regress the yields of 17 maturities on the factor
loadings. The factor loadings are

             loadings(τ) = ( 1,  (1 − exp(−λτ))/(λτ),  (1 − exp(−λτ))/(λτ) − exp(−λτ) ),        (1)




where λ is fixed at the value 0.0609 as in DL.¹ Now the regressions

                      y_t(τ) = β_t′ loadings(τ) + ε_t(τ),    for τ = τ1, …, τ17,                  (2)

where β_t are the regression coefficients, will give us three time series for the three Nelson-
Siegel factors, {β_{1t}, β_{2t}, β_{3t}}_{t=1}^T.²
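The month-by-month cross-sectional regressions of equation (2) on the loadings in (1) can be sketched as follows (a minimal numpy version; function and variable names are ours):

```python
import numpy as np

LAMBDA = 0.0609  # fixed as in Diebold and Li (2006)

def ns_loadings(tau, lam=LAMBDA):
    """Nelson-Siegel loadings of equation (1) for maturity tau (months)."""
    x = lam * tau
    slope_load = (1.0 - np.exp(-x)) / x
    return np.array([1.0, slope_load, slope_load - np.exp(-x)])

def extract_factors(maturities, yields):
    """One month's cross-sectional OLS of yields on the loadings, eq. (2).
    Returns the three NS-factors (beta_1t, beta_2t, beta_3t)."""
    X = np.vstack([ns_loadings(tau) for tau in maturities])
    beta, *_ = np.linalg.lstsq(X, yields, rcond=None)
    return beta
```

Running `extract_factors` for every month in the sample produces the three factor time series. One can verify numerically that with λ = 0.0609 the curvature loading peaks near 30 months.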




3.2     Model-based Factors and Empirical Factors
We will now compare the NS-factors to their empirical counterparts as they are usually
calculated: The level is taken to be the yield at the longest maturity (10 years), the slope
is the difference between the 10-year and 3-month yields, and the curvature is twice the
2-year yield minus the sum of 10-year and 3-month yields.
    As detailed in DL, the NS-factors can be interpreted as level, slope and curvature
of the yield curve. The first factor, if increased, raises yields at all maturities, since its
loading is constant, and can therefore be interpreted as the level of the yield curve. The
second factor, if increased, increases short yields more strongly than long yields, since short
yields load more heavily on it. It can therefore be interpreted as the negative of the usually
employed measure of the yield curve slope (long minus short yield). The third factor, if
increased, will increase medium-term yields the most, since long and short yields do not
load strongly on this factor. Because 2yt (24) − yt (3) − yt (120) = 0.00053β2t + 0.37β3t , we
multiply this third factor by 0.3 and it will then closely correspond to the usual curvature
measure.
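These empirical counterparts are computed directly from three points on the curve; a small sketch, assuming the 3-, 24- and 120-month yields are available:

```python
def empirical_factors(y3, y24, y120):
    """Empirical level, slope and curvature from one month's yield curve,
    using the 3-month, 2-year and 10-year yields (in percent)."""
    level = y120                      # 10-year yield
    slope = y120 - y3                 # 10y minus 3m
    curvature = 2 * y24 - y3 - y120   # twice the 2y minus (3m plus 10y)
    return level, slope, curvature
```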
   In figure 2 we see that in fact the empirical counterparts correspond very closely to
the estimated factors.


3.3     Dynamic Properties of Nelson-Siegel Factors
In table 2 we present summary statistics for the model-based factors. Like their empirical
counterparts, they are very persistent. In the last column we present the results of the
Augmented Dickey-Fuller test for a unit root. The MacKinnon critical value
for rejecting the null hypothesis of a unit root is -2.57 for a sample size of 250 (our sample
   ¹ For this value the loading on the curvature factor (the third factor) is maximized at 30 months.
Usually maturities of two or three years are used to calculate curvature (by subtracting from twice this
yield the sum of short and long yields), and 30 months is right in between.
   ² We differ in our notation from DL in that we denote the actually observed NS-factors by β rather
than β̂. This will be useful because the concept of some “true” unobservable NS-factors will not be
needed, and we can distinguish better between actual and forecasted factors.



size is 264), so we cannot reject the null for the level and slope factors. They may
well contain unit roots, whereas the curvature factor probably does not.
    At this point the autocorrelation and partial autocorrelation functions of the NS-
factors should be considered, and we show these in figure 3. The level factor probably
has long-memory, and might be non-stationary. The persistence is also high for both
slope and curvature, yet it is decidedly smaller than the persistence of the level factor.
Linear models other than the Random Walk (RW) that could capture such correlation
structures are, for example, AR, ARMA, ARIMA, or ARFIMA models. We fit AR(1)
models to the factors and present the autocorrelation functions of the residuals in figure
4, together with Bartlett confidence bands. Not all serial correlation is captured, since
some autocorrelations at low lags are significant, and the Ljung-Box test confirms this
(results not included). We postpone further discussion of the NS-factors until we discuss
the forecast approach based on these.
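The AR(1) fits and the residual autocorrelation diagnostics can be reproduced with a few lines of numpy (a sketch; for the Ljung-Box test one would turn to a statistics package such as statsmodels):

```python
import numpy as np

def fit_ar1(x):
    """OLS of x_t = b0 + b1 * x_{t-1} + e_t; returns (b0, b1, residuals)."""
    X = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    b0, b1 = np.linalg.lstsq(X, x[1:], rcond=None)[0]
    return b0, b1, x[1:] - (b0 + b1 * x[:-1])

def acf(x, nlags):
    """Sample autocorrelations at lags 1..nlags."""
    xc = x - x.mean()
    denom = (xc ** 2).sum()
    return np.array([(xc[k:] * xc[:-k]).sum() / denom
                     for k in range(1, nlags + 1)])

# Bartlett band for white-noise residuals: +/- 1.96 / sqrt(n);
# residual autocorrelations outside it indicate remaining serial correlation.
```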


3.4    Actual and Fitted Yield Curves
In figure 5 we plot the fitted yield curve using the average of the factors, together with the
average empirical yield curve. We see that the fitted curve provides a very good fit. This
is what one would expect: The average yield curve is very smooth, and so three factors
should be sufficient to capture its shape and generate a good fit.
   In figure 6 we show the actual and the fitted yield curve on selected dates, choosing
the same dates as DL in their figure 5. Our plots are essentially identical to those of DL.
The important point to notice is that at some dates the fit of the NS-fitted yield curve is
much better than at other dates, with the difficulties arising when the actual yield curve is
very unsmooth and dispersed. Since yield curves seldom look like the one in August 1998,
where the fit is particularly bad, the fit is usually satisfactory.
   To obtain a better understanding of how well the Nelson-Siegel fitted yield curves fit
the actual yield curves, we provide a three-dimensional plot of the residuals in figure 7.
Overall the fit seems to be sufficiently good: The residuals are usually within -0.2 and 0.2.
No long, persistent deviations from zero are apparent.
   But the dynamic properties of these pricing errors should be looked at more closely.
We present summary statistics for these in table 3. Of course the means are close to zero
and never significantly different from it. The RMSE’s indicate that at the shortest and
longest maturity the fit is worst. The most important insight from this table is that pricing
errors are in fact persistent, with first order autocorrelations of the residuals ranging from
0.23 to 0.88, usually being larger than 0.5. This persistence implies that we can possibly
improve upon the DL-forecast, which predicts using fitted yields. This issue, which was

not considered in the original DL paper, will be discussed further in the following section.


4     Forecasting the Yield Curve
Applying simple forecasting techniques to the three NS-factors, and then using the fitted
yield curve as the forecast is the basis of the DL approach. In the following we go into
more details on this approach and possible variations and extensions, and then discuss
competing forecast models, forecast combination, and the method of choice to compare
predictive accuracy (Diebold-Mariano). Finally we present the results for the US.
    Throughout we use the following notation: The number of observations in the sample
is T , with R observations being in the initial estimation window. The forecast horizon
is h (months) and the first forecast is therefore for R + h. In the out-of-sample window
we have P observations (R + P = T ). For emphasis we sometimes denote the time the
forecast is made as tf . So tf = R, . . . , T − h.


4.1    Diebold-Li Forecast Approach
The approach to forecast the yield curve chosen by Diebold-Li (DL) consists of forecasting
the NS-factors, and then using the fitted future yield as the predicted value. The time t
forecast of the zero spot rate at t + h (maturity τ ) is given by

                                    ŷ_{t+h|t}(τ) = l(τ)′ β̂_{t+h|t},                          (3)

where l(τ) is the column vector with the three NS factor loadings for maturity τ, and the
3 × 1 vector β̂_{t+h|t} contains the forecasts for the factors. Univariate AR(1) processes and
first order VARs were used as models for the factors in DL’s forecasting exercise.
    When we forecast a factor i at time tf using an AR(1) regression, the forecast is
β̂^i_{t+h|t} = b0 + b1 β^i_{tf}, with b0 and b1 being the OLS coefficients from regressing
{β^i_t}_{t=h}^{tf} on {β^i_t}_{t=1}^{tf−h}.
This specification will be called DL-AR.
    One interesting question is whether the AR(1) specification can improve at all upon
a naive no-change forecast. Modeling all three factors as martingales leads to an
extremely simple forecast: today's fitted yield, ŷ_{t+h|t}(τ) = l(τ)′ β_t. We include this
specification (DL-RW) in our forecasting exercise in order to assess its performance
compared to the original DL specification.
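Both DL specifications can be sketched compactly. Here `factor_history` is the (T × 3) matrix of NS-factors extracted up to the forecast date, and `loadings` is l(τ); the names are ours:

```python
import numpy as np

def dl_rw_forecast(beta_now, loadings):
    """DL-RW: martingale factors, so the forecast is today's fitted yield."""
    return loadings @ beta_now

def dl_ar_forecast(factor_history, h, loadings):
    """DL-AR: for each factor, a direct h-step regression of beta_t on
    beta_{t-h}; the yield forecast is then the fitted yield of eq. (3)."""
    beta_hat = np.empty(3)
    for i in range(3):
        x = factor_history[:, i]
        X = np.column_stack([np.ones(len(x) - h), x[:-h]])
        b0, b1 = np.linalg.lstsq(X, x[h:], rcond=None)[0]
        beta_hat[i] = b0 + b1 * x[-1]   # forecast from the current factor value
    return loadings @ beta_hat
```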
    The Dickey-Fuller tests indicated that level and slope factors might have unit roots,
so an obvious further model choice is to model these as random walks, and choose AR(1)
for the curvature (DL-RW2AR1). The ACF on the other hand gives the feeling that a

unit root seems decidedly more likely in the level factor than in the other two, and so we
also include a specification where we choose RW for the level, and AR(1) for slope and
curvature (DL-RW1AR2). There are eight combinations when choosing between AR(1)
and RW for each factor – we content ourselves with just four of these.³
    The accuracy of these forecasts is determined by how well the future factors are pre-
dicted, and how close the actual future yield is to its fitted yield. The forecast error can
be decomposed as follows:

                 y_{t+h}(τ) − ŷ_{t+h|t}(τ) = β′_{t+h} l(τ) + pe_{t+h}(τ) − β̂′_{t+h|t} l(τ)
                                           = (β_{t+h} − β̂_{t+h|t})′ l(τ) + pe_{t+h}(τ)
                                           = fpe′_{t+h} l(τ) + pe_{t+h}(τ)

    The factor prediction error (fpe) is a 3 × 1 random vector containing the errors we make
in predicting the factors. These are weighed differently, depending on the three loadings
l(τ) for maturity τ. The pricing error (pe) is the deviation of the actual future yield from the
fitted yield curve. Assuming that time t pricing error and factor prediction error are
uncorrelated (an assumption that seems plausible and could be tested) the expected loss
of the forecast is

                          E(e_t²) = Var(fpe′_t l(τ)) + Var(pe_t(τ))
                                  = l(τ)′ Σ l(τ) + Var(y_t(τ) − β′_t l(τ))                 (4)
                                  = V^f + V^pe                                             (5)

where Σ = E(fpe_t fpe′_t) is the unconditional contemporaneous covariance matrix of the
factor prediction errors. We denote by V^f the contribution of the factor prediction
errors, and by V^pe the contribution of the pricing errors to the expected loss. The diagonal
elements of Σ are the forecast error variances for the chosen factor prediction method, the
off-diagonal elements the covariances of the factor prediction errors. How important these
elements are depends on how heavily a yield of the chosen maturity loads on each factor.
Intuitively, when predicting short yields, the errors in predicting level and slope are both
    ³ Since these model choices depend on the observed persistence in our sample, one could argue that
the researcher has information about the factor dynamics only until the time of the forecast. Yet the fact
that we forecast and estimate persistence partly in the same data would only be problematic (bias our
results) if the persistence features changed over time and it actually pays off to the forecaster to have
up-to-date information on these. We are not worried about this bias, because qualitatively similar results
on Dickey-Fuller tests and ACF functions obtain in the sample without the forecast window. Also this
is a minor point because the comparison between AR(1) and RW is in our case not motivated by the
observed data.



important, but when predicting a long yield V^f depends only on the forecast error variance
of the level factor, not on those of slope and curvature.
    If one is willing to assume independence between factor prediction errors and pricing
errors, the performance of any DL-specification depends entirely on how well it predicts
the factors. Of course it is possible that a bad forecast for the factor produces a fitted
yield that is closer to the future realization than the fitted yield from the actual future
factors, that is |y_{t+h}(τ) − β̂′_{t+h|t} l(τ)| < |y_{t+h}(τ) − β′_{t+h} l(τ)| = |pe_{t+h}(τ)|. This is the case if
fpe′_{t+h} l(τ) is of the opposite sign to pe_{t+h}(τ). If this happens systematically, that is if there is
negative correlation between these two, the expected loss will be decreased, and might be
smaller than if we predict the factors well but have no correlation. Under the assumption
of no correlation however, a DL-forecast will outperform a competitor of its kind if and
only if it can predict the factors better. Again, against other methods, the performance
of DL-forecasts depends on both how well NS-fitted yield curves fit actual yields (in the
future) and how well its factors can be predicted.
   The pricing error variance is one common source of loss for all DL specifications. We
may hope to take advantage of the persistence in the pricing errors for better predictions,
yet for this we have to deviate in our forecast from a fitted yield. Instead of explicitly
modeling and forecasting the pricing error, a much simpler approach can do the same job:
Namely to just shrink towards the actual yield at the time the forecast is made, e.g. simply
taking a (possibly weighted) average of the DL-forecast for the yield and its current value.
The reason this reduces pricing errors is intuitive: A yield that is off quite a bit from the
fitted curve will probably still be off in the future, with a pricing error of the same sign (at
least over shorter horizons). Since the yield itself is persistent, shrinking the fitted yield
forecast towards last period’s value should reduce the pricing error. So this shrinking
method exploits the persistence in the pricing errors, and it does not need to estimate any
additional parameters. Following this reasoning, a model within the DL class would then
usually be beaten by a shrinkage towards the current actual yield. We denote the forecast
based on a simple average of DL-RW and the current yield as DL-RWS (S for shrinkage),
and assess our hypothesis in the last table.
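The shrinkage forecast itself is trivial to implement; with equal weights it is just the midpoint of the DL-RW forecast and the current yield:

```python
def dl_rws_forecast(dl_rw_yhat, current_yield, weight=0.5):
    """DL-RWS: shrink the DL-RW fitted-yield forecast towards the current
    actual yield; weight=0.5 gives the simple average used in the text."""
    return weight * dl_rw_yhat + (1.0 - weight) * current_yield
```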


4.2     Competing Forecast Models
In the following we present the different models that compete with the DL-forecasts. We
denote by b0 and b1 in each case the OLS coefficients estimated from the regressions in
the data available at the time of the forecast.

Random Walk The first and most important competitor is the simple no change yield

forecast, ŷ_{t+h|t}(τ) = y_t(τ).

AR(1) on yield levels Yields at each maturity are predicted using an AR(1) that is
        estimated on the available data for that maturity, {y_t(τ)}_{t=1}^{tf}. The yields are regressed
        on h-period lagged values, to produce optimal MSE linear forecasts: ŷ_{t+h|t}(τ) =
        b0 + b1 y_t(τ).

AR(1) on yield changes Here we regress h-period yield changes on their past values
     in order to obtain a prediction for future yield changes: ŷ_{t+h|t}(τ) − y_t(τ) = b0 +
     b1 (y_t(τ) − y_{t−h}(τ)).

Slope regression The forecasted yield change is the predicted value from regressing his-
      torical (h-period) yield changes on yield curve slopes: ŷ_{t+h|t}(τ) − y_t(τ) = b0 + b1 (y_t(τ) −
      y_t(3)).

Fama-Bliss forward rate regression The forecasted yield change is the predicted value
     from regressing historical yield changes on forward premia: ŷ_{t+h|t}(τ) − y_t(τ) =
     b0 + b1 (f_t^h(τ) − y_t(τ)), where f_t^h(τ) is the forward rate, observed at time t, for
     investments from t + h to t + h + τ.

Regression on three AR(1) principal components Extracting three principal com-
    ponents from the 17 yields (of course using data only until tf ), we forecast these
    using AR(1) models, and generate the yield forecasts from the predicted principal
    components (see DL, p.359).
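Two of these competitors can be sketched as direct h-step regressions (minimal numpy versions; function names are ours):

```python
import numpy as np

def _ols2(y, x):
    """OLS of y on a constant and x; returns (b0, b1)."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ar1_on_levels(y_hist, h):
    """AR(1) on yield levels: regress y_{t+h}(tau) on y_t(tau),
    then forecast from the last observed yield."""
    b0, b1 = _ols2(y_hist[h:], y_hist[:-h])
    return b0 + b1 * y_hist[-1]

def slope_regression(y_hist, y3_hist, h):
    """Slope regression: regress h-period yield changes on the
    current slope y_t(tau) - y_t(3), then add the predicted change."""
    b0, b1 = _ols2(y_hist[h:] - y_hist[:-h], (y_hist - y3_hist)[:-h])
    return y_hist[-1] + b0 + b1 * (y_hist[-1] - y3_hist[-1])
```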


4.3       Forecast Combination
It has been known for quite some time that “combining multiple forecasts leads to increased
forecast accuracy” (Clemen, 1989), as opposed to just using a single ex-ante optimal
forecast. 4 We will evaluate the performance of two combination strategies, equal weights
and performance weights.
   Given N individual forecasts for a yield (at maturity τ , which we omit in the following
notation) the linearly combined forecast is given by

                            ŷ^c_{t+h|t} = w′_t ŷ_{t+h|t} = Σ_{i=1}^N w_{t,i} ŷ^i_{t+h|t},


where the N × 1 vector of weights wt can be time varying. For equal weights, each element
of this vector is equal to 1/N . For performance weights, each forecast is weighed by the
   ⁴ For a detailed analysis of the subject see Timmermann (2005).

inverse of its MSE over the last 24 months (or over as many months as are available):

                                w_{tf,i} = (1/MSE_{tf,i}) / Σ_{j=1}^N (1/MSE_{tf,j})

                              MSE_{tf,i} = (1/(v+1)) Σ_{t=tf−v}^{tf} e²_{t,i}

                                       v = min(tf − R, 24),

where et,i is the forecast error of forecast model i at time t. As before tf is the time
at which the forecast is made. Naturally we cannot use data on the performance of the
individual forecasts after tf , e.g. the MSPEs over the whole out-of-sample period. The
choice of using the last 24 periods is of course arbitrary, and is based on a trade-off between
precise estimation of the performance (which calls for many periods) and the possibility
that performance changes over time (which calls for fewer periods).
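The performance-weighting scheme can be sketched as follows (an illustrative implementation; the indexing convention is ours):

```python
import numpy as np

def performance_weights(errors, tf, R, max_window=24):
    """Inverse-MSE combination weights at forecast time tf (no look-ahead).

    errors : (T, N) array of past forecast errors of the N models;
             only rows up to and including tf are used
    R      : end of the initial estimation window
    """
    v = min(tf - R, max_window)
    recent = errors[tf - v : tf + 1]         # last v+1 error observations
    inv_mse = 1.0 / (recent ** 2).mean(axis=0)
    return inv_mse / inv_mse.sum()           # normalize so weights sum to one
```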


4.4    Comparing Predictive Power
For each maturity and each forecast horizon, a forecast model produces errors e =
(e_{R+h}, …, e_T)′, with the individual error observation being e_t = y_t − ŷ_{t|t−h}. There are
n_e = T − R − h + 1 = P − h + 1 error observations. As usual we assess the forecast
accuracy in terms of squared error loss by estimating the expected loss E(e_t²) with its
empirical counterpart, the mean squared prediction error (MSE), e′e/n_e.
   It should be noted that this estimator is biased downward for models that involve pa-
rameter estimation (Efron, 1983), since it does not account for this estimation uncertainty.
In our case, the uncertainty in the parameter estimation is obviously different across the
models. The RW model for yields needs no estimation at all. It seems intuitive that in a
performance comparison using MSEs a bias could arise in favor of models that rely heavily
on parameter estimation, because MSE does not account for this estimation uncertainty.
But we will see that the RW performs remarkably well, and its relative performance would
only be further improved by employing more accurate strategies to estimate the expected
loss (e.g. the bootstrap).
    To rigorously test whether two models forecast with the same accuracy or one significantly
outperforms the other, we use the test statistic advocated by Diebold and Mariano (1995),
henceforth DM. We estimate the long-run variance of the loss-differential series by estimat-
ing the spectral density at frequency zero with the well-known Bartlett Kernel. The lag
length for the included autocovariances is chosen to be equal to the forecast horizon. This
test statistic converges in law to a standard normal distribution, and so we can compare


it to the usual values of ±1.65 (10% sig. level) and ±1.96 (5% sig. level).
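A minimal implementation of the DM statistic under squared error loss, with the Bartlett-kernel long-run variance and lag truncation equal to the forecast horizon h (a sketch, not the exact code used here):

```python
import numpy as np

def diebold_mariano(e1, e2, h):
    """DM statistic for equal predictive accuracy under squared error loss.
    Negative values favour model 1 (smaller squared errors)."""
    d = e1 ** 2 - e2 ** 2                        # loss differential
    n = len(d)
    dc = d - d.mean()
    lrv = (dc ** 2).sum() / n                    # gamma_0
    for k in range(1, h + 1):
        gamma_k = (dc[k:] * dc[:-k]).sum() / n   # autocovariance at lag k
        lrv += 2.0 * (1.0 - k / (h + 1.0)) * gamma_k  # Bartlett weight
    return d.mean() / np.sqrt(lrv / n)
```

The resulting statistic is compared against the standard normal critical values of ±1.65 and ±1.96.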
    We would like to point out that this test has asymptotically correct coverage for a
hypothesis like “DL-RW performs equally well as the random walk for maturities of τ
months at a horizon of h months.” The problem of testing multiple hypotheses naturally
arises in our context, since we are interested in the performance at different maturities and
over different forecast horizons. This is one area where an extension could be valuable:
Controlling the family-wise error rate and the false discovery rate is necessary for correct
inference.


4.5       Results for the US
The sample choice of DL is to take Jan-1985 to Jan-1994 as the initial estimation window,
with data available until Dec-2000. We initially chose that exact same sample for our
forecasting exercise. For short forecast horizons, we obtain numerically similar results to
DL – their method does not fare that well. For longer horizons, in particular 12 months,
where DL find that their method significantly outperforms, we find neither quantitatively
nor qualitatively similar results. DL-AR does not significantly outperform the RW at any
maturity considered by DL. Using exactly the same sample choice, the DL strategy does
not fare as well in our data set as in the original paper.
   The full sample, Jan-1985 to Dec-2006, was split in half to obtain the initial estimation
window, so T = 264 and R = P = 132. Results in this sample are the ones we actually
present in the tables. They provide new evidence whether the different DL specifications
perform well, as compared to the no-change forecast for each yield.
    The results of our forecast exercise for a horizon of h = 1 month are presented in table
4. We forecast yields at maturities 3, 12, and 60 months. The columns show the mean
and standard deviation of the forecast errors, the root MSE, autocorrelations at one and
twelve months lags, as well as the Diebold-Mariano test statistic assessing the hypothesis
of equal forecast performance.⁵ For the short rate, the combined forecast (CF) methods do best and have
MSEs that are significantly smaller than that of the RW. For the one year yield, no model
outperforms the random walk, and in fact the DL strategies perform significantly worse.
For the five year yield, again no model outperforms the random walk. Forecasting yields
one month ahead better than with the naive no change forecast seems difficult, yet forecast
combinations perform comparatively well.
   In table 5 results for the six months forecast horizon are shown. Again we can only beat
the RW for the short rate, with the DL strategy employing RW factors, and with the CF
5 The slope regression method is not included for τ = 3 since it is not applicable.



strategies. It is noteworthy that the DM statistic for the CF strategy using performance
weights is again strongly significant. For maturities other than 3 months, no model can
significantly outperform the no-change forecast, but DL-RW and the combined forecasts
have smaller MSE than the RW for the 5y yield.
   Finally table 6 shows the results for the year-ahead forecasts. Only for the 5 year yield
can one model, DL with RW factors, outperform the RW. The CF methods have smaller
MSE than the RW for the 3m and 5y maturities, yet not significantly so. Noteworthy is
the better performance of DL-RW compared to DL-AR at all three maturities.
   In sum we find markedly different results than DL: The DL-AR strategy never beats
the RW, and is often significantly worse. There are two reasons for this: First the data
has been constructed in a different way, and the results are obviously sensitive to this.
Secondly, on the data from 2000 to 2006, which is included in our sample but not in DL’s,
the DL strategy performs particularly badly – the results deteriorate from the DL sample
choice to the full sample. It should also be emphasized that DL-RW performs markedly
better than DL-AR. The former strategy has smaller RMSE in
all of the nine settings considered, and it beats the RW in two cases, whereas the latter
strategy never does. Today’s factor values seem to be decidedly better forecasts for future
factors than an AR(1) forecast. Intuitively, the factors are so close to martingales that
the added estimation uncertainty from estimating the AR(1) coefficients leads to worse
forecasts. According to our evidence, a forecast based on NS-factors is therefore best made
by simply predicting today's fitted yield.
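The two competing rules for forecasting a factor can be made concrete with a minimal sketch (illustrative Python, not the paper's Matlab code). For a highly persistent series the OLS-estimated AR(1) adds parameter noise that the martingale forecast avoids by construction:

```python
import numpy as np

def ar1_forecast(x, h=1):
    """h-step forecast from an AR(1) with intercept, estimated by OLS
    on the history x (the DL-AR rule for a single factor)."""
    x = np.asarray(x, float)
    X = np.column_stack([np.ones(x.size - 1), x[:-1]])
    c, phi = np.linalg.lstsq(X, x[1:], rcond=None)[0]
    f = x[-1]
    for _ in range(h):          # iterate the fitted recursion h times
        f = c + phi * f
    return f

def rw_forecast(x, h=1):
    """Martingale (random-walk) forecast: today's value at any horizon
    (the DL-RW rule)."""
    return float(np.asarray(x)[-1])
```

The forecast-error comparison in our tables is exactly between yields generated from these two factor rules.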
    The DL-RW method fares well in comparison to the RW. In only three out of nine cases
does it have a larger MSE. Twice it beats the RW significantly. So there are obviously
gains to be had from the DL-forecast methodology. The factors contain less noise than
the individual yields, so one is well advised to base a yield forecast at least partly on the
information in them.
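The cross-sectional step that produces the fitted yields can be sketched as an OLS regression of the observed yields on the Nelson-Siegel loadings, with the decay parameter held fixed (λ = 0.0609 is the value DL fix for maturities measured in months; the code below is an illustrative sketch, not our estimation code):

```python
import numpy as np

def ns_loadings(tau, lam=0.0609):
    """Nelson-Siegel loadings (level, slope, curvature) at maturities tau,
    in months, for fixed decay parameter lam."""
    tau = np.asarray(tau, float)
    x = lam * tau
    slope = (1 - np.exp(-x)) / x
    curv = slope - np.exp(-x)
    return np.column_stack([np.ones_like(tau), slope, curv])

def fit_ns_factors(yields, tau, lam=0.0609):
    """Cross-sectional OLS of yields on the NS loadings; returns the
    factor vector (beta1, beta2, beta3) for one date."""
    X = ns_loadings(tau, lam)
    beta, *_ = np.linalg.lstsq(X, np.asarray(yields, float), rcond=None)
    return beta
```

The fitted yield curve is then `ns_loadings(tau) @ beta`, which under DL-RW is itself the forecast of the whole curve.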
    Forecast combination also shows its merit in our study. Although the CF strategies
do not beat the RW throughout, their RMSE is mostly smaller. In particular for the
short rate they outperform remarkably (though not significantly so for the year-ahead
forecasts). The combined forecasts are particularly good whenever at least some of the
individual forecasts have lower MSE than the RW. Of the four DL specifications we
included, mostly one or two perform well, and these contribute to the good performance
of the combined forecasts.
    In addition to the previous results, we look at the DL-RW strategy from another
perspective: how does it fare in comparison with its shrinkage version? What we call
shrinkage here is really a simple average of last period's yield and last period's fitted


yield. This is also a combined forecast, a particularly simple one. Table 7 confirms our
hypothesis that a DL model is usually outperformed by a combination of itself with the
previous period's yield: at all maturities and forecast horizons, the MSE is smaller for
the shrinkage specification. Again, this is because yields and pricing errors are persistent
and the shrinkage method takes advantage of this fact.
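In code, the shrinkage forecast is nothing more than a convex combination of the two ingredients (a sketch; w = 0.5 is the simple average used in the text):

```python
import numpy as np

def shrinkage_forecast(y_today, y_fitted_today, w=0.5):
    """Shrink the DL-RW forecast (today's NS-fitted yield) towards
    today's observed yield; w = 0.5 gives the simple average."""
    return w * np.asarray(y_today, float) + (1 - w) * np.asarray(y_fitted_today, float)
```

Because yields and NS pricing errors are persistent, keeping half of today's observed yield retains part of the persistent pricing error in the forecast and thereby lowers the forecast-error variance.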


5    Further Directions
In the following we propose various extensions to the analysis of this paper.
    As mentioned earlier, yield forecast methods are compared at different horizons and
maturities. Therefore we run into the problem of multiple hypothesis testing. Since
many hypothesis tests (for equal forecast accuracy) are performed at once, one should
control the false discovery rate (Benjamini and Hochberg, 1995). Whenever we want to
aggregate the evidence, and not only forecast one yield over one forecast horizon, our
inference will be incorrect without appropriately controlling the coverage of our tests.
One method that seems promising has been proposed by Storey (2002), who fixes
the rejection region and then estimates its corresponding error rate, which “offers increased
applicability, accuracy and power”. Whichever method we choose, it will enable us to test
hypotheses about the relative performance of two methods over several forecast horizons
and/or maturities. This will allow for more concrete conclusions than just eyeballing
several DM test results and aggregating them only verbally.
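For illustration, the classical step-up procedure of Benjamini and Hochberg (1995), which could be applied to a panel of DM p-values across maturities and horizons, is a generic sketch (not tied to our data):

```python
import numpy as np

def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: returns a boolean mask of
    rejected hypotheses, controlling the false discovery rate at q."""
    p = np.asarray(pvals, float)
    m = p.size
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m       # BH critical values k*q/m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        kmax = np.max(np.nonzero(below)[0])    # largest k with p_(k) <= k*q/m
        reject[order[:kmax + 1]] = True        # reject all smaller p-values too
    return reject
```

Applied to the DM tests of this paper, such a procedure would replace verbal aggregation of many individual test results with controlled simultaneous inference.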
    Estimating the contribution of the factor prediction errors and of the pricing errors
to the MSE of the forecast, as well as the correlation between the two would certainly
be interesting. We could then understand how important the pricing errors are in the
forecast error variance, and compare different competitors in the DL class with regard to
estimates of their Vf and of the correlation between factor prediction errors and pricing
errors. Furthermore the combination of DL-forecasts with forecasts that do not rely on NS-
factors, as shown, generally improves performance, because it reduces the pricing errors.
Further investigations in this direction seem promising. In particular, a weighting scheme
for combining DL-forecasts with individual yield forecasts (which do not rely on NS-factors)
should take into account how important pricing errors have been in the past compared
to factor prediction errors. If pricing errors are relatively important, individual yield
forecasts should be weighted more heavily. It might be possible to derive ex-ante optimal
weights for this problem, either time-constant or time-varying, which would be a very
fruitful extension, and could be empirically promising.
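The performance-weighted combinations (CF, perf. weights) in our tables weight each candidate forecast by its inverse past MSE; a scheme of the kind proposed above would generalize this. A minimal sketch of inverse-MSE weights (illustrative, not our exact implementation):

```python
import numpy as np

def perf_weights(errors):
    """Inverse-MSE combination weights from a (T x K) matrix of past
    forecast errors of K competing forecasts; weights sum to one."""
    mse = np.mean(np.asarray(errors, float) ** 2, axis=0)
    inv = 1.0 / mse
    return inv / inv.sum()

def combine(forecasts, weights):
    """Combined point forecast as the weighted average of K forecasts."""
    return float(np.dot(weights, forecasts))
```

A forecast with four times the past MSE of a competitor receives a quarter of its weight, so poorly performing ingredients are automatically downweighted.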
    Other obvious extensions include attempts to forecast the factors better using either


univariate (e.g. ARFIMA) or multivariate models. The key here is a parsimonious
configuration. Also, one could include more information. Important sources of information could
be among others macro factors, international yield curve factors, risk measures (e.g. ex-
pected risk premia), volatility measures, and latent factors. The task is then to find ways
to incorporate relevant information without introducing too much estimation uncertainty.
For linear factor forecasts, one possibility for incorporating other information is transfer
functions (see, for example, Liu, 2006, chap. 5), which are just highly restricted VARs. They could
provide a reasonable middle-ground between too much additional estimation uncertainty
and ignoring possibly valuable information.
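A stylized single-input transfer-function forecast of the kind meant here can be sketched as follows (the specification and the OLS estimation are our own illustration, not taken from Liu, 2006):

```python
import numpy as np

def fit_transfer(f, z):
    """OLS of f_t on (1, f_{t-1}, z_{t-1}): a one-input transfer function,
    i.e. one heavily restricted row of a VAR.  f is a factor series, z an
    exogenous input (e.g. a macro factor).  Returns (c, phi, b)."""
    X = np.column_stack([np.ones(f.size - 1), f[:-1], z[:-1]])
    return np.linalg.lstsq(X, f[1:], rcond=None)[0]

def transfer_forecast(f, z, coef):
    """One-step factor forecast f_{T+1} = c + phi * f_T + b * z_T."""
    c, phi, b = coef
    return c + phi * f[-1] + b * z[-1]
```

Restricting the system to one input per factor keeps the parameter count small, which is the middle ground between additional estimation uncertainty and ignoring outside information.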
    Constructing reliable yield curve data for many countries, and testing newly developed
forecast strategies in new data is an important further task. If we keep improving on our
strategies and test them repeatedly in the same data, we obviously run into the problem
of data snooping. Extending the available data sources is therefore important.
    Any research program that aims at improving term structure forecasts should not be
oblivious to the advantages the DL factor approach brings with it. The above extensions
could possibly enable applied researchers to profit from this approach considerably.


6    Conclusion
Forecasting the yield curve accurately is a difficult task. Among the competing forecast
models considered in this paper, none could consistently beat the random walk. However,
at almost all maturities and forecast horizons, at least some of the models perform well.
This gives particular appeal to combined forecasts, which profit from this fact. They
mostly have smaller MSE than the RW, and sometimes beat it significantly.
   With regard to the DL strategy, we cannot confirm the positive results in the original
paper. The method DL-AR by itself does worse in our data, in particular in the full
sample. There is light on the horizon though: Getting rid of the estimation uncertainty
and simply modeling some or all factors as martingales improves upon the performance
of the DL method. Moreover, mostly at least one of the alternative models performs well,
which makes them valuable ingredients for combined forecasts.
   The relatively good performance of the DL-RW specification indicates that the ap-
proach to forecast yields via a fitted yield curve is promising. From our results we conclude
that trying to forecast the factors can be counterproductive. The added estimation un-
certainty is a candidate explanation for why today’s fitted yield curve is a better forecast
than the one based on forecasted factors.



References
Benjamini, Yoav and Yosef Hochberg, “Controlling the False Discovery Rate: A
 Practical and Powerful Approach to Multiple Testing,” Journal of the Royal Statistical
 Society, Series B, 1995, 57 (1), 289–300.

Bliss, Robert R., “Testing term structure estimation methods,” Working Paper 96-12,
  Federal Reserve Bank of Atlanta 1996.

Clemen, Robert T., “Combining forecasts: A review and annotated bibliography,”
  International Journal of Forecasting, 1989, 5 (4), 559–583.

Diebold, Francis X. and Canlin Li, “Forecasting the Term Structure of Government
 Bond Yields,” Journal of Econometrics, February 2006, 130 (2), 337–364.

Diebold, Francis X. and Roberto S. Mariano, “Comparing Predictive Accuracy,”
 Journal of Business & Economic Statistics, July 1995, 13 (3), 253–263.

Diebold, Francis X., Canlin Li, and Vivian Z. Yue, “Global Yield Curve Dynamics
 and Interactions: A Generalized Nelson-Siegel Approach,” Manuscript, Department of
  Economics, University of Pennsylvania June 2006.

  , Glenn D. Rudebusch, and S. Boragan Aruoba, “The Macroeconomy and the
  Yield Curve: A Dynamic Latent Factor Approach,” Journal of Econometrics, March-
  April 2006, 131 (1-2), 309–338.

Duffee, Gregory R., “Term Premia and Interest Forecasts in Affine Models,” Journal
  of Finance, February 2002, 57 (1), 405–443.

Efron, Bradley, “Estimating the Error Rate of a Prediction Rule: Improvement on
  Cross-Validation,” Journal of the American Statistical Association, June 1983, 78 (382),
  316–331.

Fama, Eugene F. and Robert R. Bliss, “The Information in Long-Maturity Forward
  Rates,” American Economic Review, September 1987, 77 (4), 680–692.

Jeffrey, Andrew, Oliver Linton, and Thong Nguyen, “Flexible Term Structure
  Estimation: Which Method is Preferred?,” Metrika, March 2006, 63 (1), 99–122.

Liu, Lon-Mu, Time Series Analysis and Forecasting, 2nd ed., Scientific Computing As-
  sociates Corp., 2006.


Storey, John D., “A direct approach to false discovery rates,” Journal of the Royal
  Statistical Society, Series B, 2002, 64 (3), 479–498.

Timmermann, Allan, “Forecast Combinations,” 2005. Forthcoming in Handbook of
  Economic Forecasting.



A      Data
A.1      Data Source
Our data source for monthly data on US government bonds is the CRSP Monthly US
Treasury Database.6 We use the cross-sectional file, which includes monthly data on all
outstanding Treasury bills, notes and bonds. In particular, all dead bonds that have long
been redeemed are also available, with the same data quality as today’s issues. Our sample
includes all observations from January 1985 to December 2006.


A.2      Filters
We include only non-callable, fully taxable, non-flower bonds, since the pricing of non-
standard issues deviates from the usual well-known bond pricing theory. The relevant
price variable is the mean of bid and ask price. These are flat prices, that is they do not
include accrued interest. We call our mean price simply the price and the mean price plus
accrued interest the cash price.
    First we exclude those quotations where the price is lower than 50 or higher than 130,
since issues with discounts/premiums of that magnitude usually show thin trading and
the prices are therefore subject to idiosyncratic variations.
    We exclude quotations where the yield differs significantly from the yield at nearby
maturities: We generate two moving averages, one including the three issues of shorter
maturity, and one using the three issues of longer maturity, and include an issue only if its
yield is within 0.2 percentage points of either moving average or lies between the two. This procedure
is an adapted (simplified) version of the methodology employed by CRSP to construct the
Fama-Bliss files.
   Also excluded from the analysis are all issues with maturity of one month or less or 15
years or more, since again, there is thin trading for these issues.
   Our filtered bond price data includes the following variables:
6 Source: CRSP, Center for Research in Security Prices, Graduate School of Business, The University
of Chicago 2007. Used with permission. All rights reserved. www.crsp.uchicago.edu


• Date of quotation, date of maturity, date of first coupon payment, days to maturity.

      • Coupon rate, value of first coupon, accrued interest.

      • Price, yield to maturity (annualized).

      The date of the first coupon payment starts the semiannual cycle of coupon payments.
All coupon payments are exactly half the coupon rate times the face value ($100), except
possibly for the first coupon payment, in case it did not occur exactly half a year after
the date on which the issue was dated by the Treasury.


A.3        Bootstrapping the Zero Curve
Although the CRSP Monthly Treasury Database includes Fama-Bliss yields, these are
only available at maturities from one to five years, so that we had to construct Fama-Bliss
yields ourselves from the available bond price data. In the following we briefly outline our
algorithm.7
    The underlying pricing assumption is that the daily forward rates are constant between
two successive maturities. The forward rate function is therefore a step function with
jumps at the maturities of the available issues. For each point in time, the issue with
the shortest maturity starts the iteration. If it is a discount bond, the forward rate
follows easily from the formula relating the cash price and the forward rate: pcash =
100 exp(−τ1 F1 ), where τ1 is the maturity of the first issue. If the first issue is a coupon
bond, the forward rate is its yield to maturity, which we take from the data source. Now
for each successive issue, the forward rate is calculated so that, given previous forward
rates, it exactly prices that issue. Since coupons are paid at half-year intervals, and the
difference in maturities of two successive issues is always smaller than half a year, there
is only one cash-flow that is discounted using that forward rate, and there is a simple
closed form solution. If there is more than one issue at a particular maturity, we calculate
a forward rate for each of them, and then average these forward rates. This is common
practice. It should be noted that in these cases the bonds are naturally not exactly priced
by the averaged forward rate.
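For the discount-bond case, the iteration can be sketched as follows (illustrative Python, not our Matlab code; coupon bonds, which require discounting the intermediate coupons with the already-bootstrapped forwards, are omitted for brevity; maturities are in years, face value $100):

```python
import numpy as np

def bootstrap_forwards(maturities, cash_prices):
    """Piecewise-constant forward-rate bootstrap for discount bonds:
    each forward rate prices the next bond exactly, given the discount
    factor accumulated from the previous maturities."""
    fwd, disc_prev, tau_prev = [], 1.0, 0.0
    for tau, p in zip(maturities, cash_prices):
        # p = 100 * disc_prev * exp(-(tau - tau_prev) * f)  =>  solve for f
        f = -np.log(p / (100.0 * disc_prev)) / (tau - tau_prev)
        fwd.append(f)
        disc_prev *= np.exp(-(tau - tau_prev) * f)
        tau_prev = tau
    return np.array(fwd)

def spots_from_forwards(maturities, fwd):
    """Spot rates as maturity-weighted averages of the step forwards."""
    taus = np.asarray(maturities, float)
    dt = np.diff(np.concatenate([[0.0], taus]))
    return np.cumsum(fwd * dt) / taus
```

For the first issue this reduces to the formula in the text, pcash = 100 exp(−τ1 F1), since the accumulated discount factor is one.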
    After bootstrapping the spot rates for all maturities at each point in time, we pool
these rates into fixed maturities using linear interpolation. The fixed maturities are
3,6,9,12,15,18,21,24,30,36,48,60,72,84,96,108 and 120 months. In some cases there is no
issue with a maturity of at least 120 months so we extrapolate the spot rate of earlier
maturities.
7 A lucid explanation of the methodology can be found in Jeffrey et al. (2006).
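The pooling step is plain linear interpolation onto the fixed maturity grid. The sketch below uses flat extrapolation beyond the longest observed maturity, which is one simple choice; the text leaves the exact extrapolation method unspecified:

```python
import numpy as np

# the fixed maturity grid (in months) used in the text
FIXED_MATS = np.array([3, 6, 9, 12, 15, 18, 21, 24, 30, 36,
                       48, 60, 72, 84, 96, 108, 120])

def pool_to_fixed(mats, spots, fixed=FIXED_MATS):
    """Linearly interpolate bootstrapped spot rates onto the fixed grid;
    np.interp holds the boundary values flat outside the observed range."""
    return np.interp(fixed, mats, spots)
```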


A.4     Data for Other Countries: UK, Germany, Japan
Our initial goal was to compare the forecast performance of the Diebold-Li approach
across different countries: US, UK, Germany, and Japan. Data on government bonds
for all of these countries is available via Thomson Datastream. We downloaded the data
from Datastream using Excel, imported it into Stata, and developed algorithms to bring
it into a format amenable to analysis in Matlab. The documentation and data
quality, in particular on dead government bonds, is much worse in Datastream than in the
CRSP Monthly Treasury Database. For example, convertible and callable bonds are
not reliably marked as such, and quotations are sometimes ex-dividend (so that accrued
interest is negative).
    We ran the same filters as for the CRSP data, and since type-of-issue indicators (like
convertible, tax-free, callable) were unreliable, excluded issues with names including “con-
version” and other obscure names (e.g. “paid”). Those issues were usually associated with
markedly different pricing. After the initial data processing, we attempted to bootstrap
the spot rate curve from the bond prices. A difficult issue was that the accrued interest
quotation convention is not standardized across countries. Therefore we calculated ac-
crued interest ourselves. Yet in the end we were not able to confirm consistency between
YTM and cash prices, an important check before even proceeding to the bootstrap. As
expected, the zero curves that we extracted from the bond prices were badly behaved and
inconsistent, with large outliers and occasionally negative yields. Given the available time,
we were unfortunately not able to extract meaningful and consistent zero curves from the
Datastream data.


B     The Kalman Filter
The Diebold-Li forecast approach can be cast into a state-space representation, as detailed
in Diebold et al. (2006b). This makes it possible to include other factors in the dynamics
of the NS-factors, and it provides correct inference about the estimated parameters of the
factor dynamics (as opposed to the two-step estimation of DL, where first the factors and
then their dynamics are estimated).
   During our attempt to tackle the forecasting exercise using a state-space representation
throughout, several issues came up and led to a decision in favor of the simple two-step
approach.

    • The estimation procedure is much more complex, since numerical optimization has
      to be employed to maximize the likelihood function. The results are sensitive to


initial values, and (naturally) to the restrictions imposed. This complexity does not
      pay off in the forecast exercise.

    • Using a rolling or recursive forecast scheme, the computational costs quickly become
      very large.

    • We are not interested in the inference about the factor dynamics, but in the inference
      about the forecast performance. Therefore the correct inference about the estimation
      of the dynamic process for the factors, which is possible with the one-step estimation,
      is not helpful in our case.

    • The state-space approach requires estimating variances of the measurement and
      transition equations. Several different restrictions are possible (and some necessary
      for computational tractability) on these covariance matrices. Again, this adds un-
      necessary complexity if the task is simply to forecast the factors and yields.

   We will investigate the estimation of the factors and their dynamics via state-space
models further in the future, since it does provide some advantages if one is willing to
pay the price of considerable additional complexity: Including macro factors in the dy-
namic system, or global yield curve factors (Diebold et al., 2006a) might improve forecast
performance. Also, heteroskedasticity and missing observations can be dealt with.
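For completeness, the filter's core recursion for a generic linear Gaussian state space, y_t = Z a_t + e_t (measurement) and a_t = T a_{t−1} + n_t (transition), is short. This is a textbook sketch, not our estimation code; initialization, numerical safeguards, and the likelihood evaluation are omitted:

```python
import numpy as np

def kalman_step(a, P, y, Z, T, H, Q):
    """One predict/update step of the Kalman filter for
    y_t = Z a_t + e_t (Var H),  a_t = T a_{t-1} + n_t (Var Q)."""
    # predict the state and its covariance forward one period
    a_pred = T @ a
    P_pred = T @ P @ T.T + Q
    # update with the new observation
    F = Z @ P_pred @ Z.T + H                 # forecast-error variance
    K = P_pred @ Z.T @ np.linalg.inv(F)      # Kalman gain
    v = y - Z @ a_pred                       # innovation
    a_new = a_pred + K @ v
    P_new = P_pred - K @ Z @ P_pred
    return a_new, P_new
```

In the one-step approach, Z would stack the NS loadings, a_t the three factors, and T their joint dynamics; maximizing the Gaussian likelihood built from the innovations v and their variances F is the costly part alluded to above.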


C      Stata and Matlab Code
In the following we give an overview of the most important code modules we developed
during the course of this project. Table 8 lists and describes the do-files that were written
to facilitate data processing. Table 9 lists and describes the most important Matlab
scripts that were written to carry out the analysis. Various further supporting functions
were written in Matlab which, for sake of brevity, are not included in these tables.




Maturity       Mean     Std.Dev.       Min.      Max.       ρ̂(1)
     3              4.8162   2.0117         0.8148    9.1087    0.9921
     6              4.9699   2.0379         0.9443    9.4404    0.9924
     9              5.1130   2.0696         0.9781    9.5942    0.9915
     12             5.2088   2.0721         1.0393    9.6742    0.9902
     15             5.3124   2.0782         1.0657    9.9827    0.9898
     18             5.3969   2.0628         1.1436   10.1823    0.9894
     21             5.4687   2.0405         1.2187   10.2632    0.9888
     24             5.5139   2.0110         1.2990   10.4049    0.9878
     30             5.6608   1.9864         1.4443   10.7367    0.9870
     36             5.7709   1.9439         1.6173   10.7781    0.9862
     48             5.9824   1.9065         1.9962   11.2589    0.9843
     60             6.0994   1.8516         2.3482   11.3029    0.9845
     72             6.2623   1.8490         2.6606   11.6440    0.9848
     84             6.3534   1.7986         2.9993   11.8313    0.9847
     96             6.4446   1.7690         3.2172   11.5174    0.9843
     108            6.5048   1.7726         3.3858   11.7241    0.9856
     120 (Level)    6.4592   1.7325         3.4678   11.6604    0.9839
     Slope          1.6430   1.2574        -0.8975    3.9835    0.9698
     Curvature     -0.2477   0.7593        -2.1724    1.5963    0.9217

Table 1: Summary statistics, US term structure, Jan. 85 - Dec. 06




   Factor    Mean     Std.dev.    Min.        Max.      ρ̂(1)      ADF
   β1t       6.9092   1.7126      3.8639     12.1111   0.9837    -2.3358
   β2t      -2.1774   1.7239     -5.5527      1.0354   0.9780    -1.8491
   β3t      -0.7766   2.0031     -5.9966      4.1809   0.9264    -3.2295

Table 2: Summary statistics and unit-root tests for the NS-factors
Maturity    Mean     Std.Dev.    Min.      Max.    MAE      RMSE      ρ̂(1)
3          -0.0401   0.0997     -0.5010   0.2471   0.0805   0.1073   0.7536
6          -0.0038   0.0485     -0.1228   0.2956   0.0351   0.0485   0.2279
9           0.0289   0.0654     -0.2019   0.2666   0.0522   0.0714   0.5549
12          0.0213   0.0641     -0.1810   0.2221   0.0536   0.0674   0.6895
15          0.0282   0.0505     -0.1596   0.1999   0.0463   0.0578   0.7253
18          0.0225   0.0356     -0.0955   0.1015   0.0349   0.0420   0.6245
21          0.0102   0.0293     -0.0981   0.1060   0.0239   0.0310   0.4652
24         -0.0230   0.0434     -0.2164   0.0717   0.0357   0.0490   0.6036
30         -0.0167   0.0384     -0.2022   0.1516   0.0318   0.0418   0.5726
36         -0.0281   0.0516     -0.1879   0.1646   0.0466   0.0587   0.7341
48         -0.0124   0.0635     -0.1878   0.2032   0.0506   0.0646   0.7120
60         -0.0425   0.0578     -0.1774   0.2095   0.0594   0.0716   0.6338
72          0.0087   0.0693     -0.1276   0.3643   0.0481   0.0697   0.8772
84          0.0134   0.0554     -0.2388   0.2946   0.0387   0.0569   0.6199
96          0.0369   0.0437     -0.1596   0.1627   0.0467   0.0572   0.7707
108         0.0430   0.0427     -0.0772   0.1966   0.0498   0.0605   0.6706
120        -0.0466   0.1003     -0.5511   0.1370   0.0790   0.1104   0.8579

           Table 3: Summary statistics, yield curve residuals
Method                      Mean      Std.Dev.   RMSE      ρ̂(1)     ρ̂(12)     DM
τ = 3 months
  Diebold-Li, AR(1) factors   -0.0786   0.1933     0.2080   0.2791    -0.0699   -0.0928
  Diebold-Li, RW factors      -0.0366   0.1901     0.1929   0.2519    -0.0706   -1.6078
  Diebold-Li, RW×2/AR(1)×1    -0.0383   0.1931     0.1962   0.2728    -0.0796   -1.3631
  Diebold-Li, RW×1/AR(1)×2    -0.0388   0.1925     0.1956   0.2759    -0.0437   -1.5612
  Random walk                 -0.0006   0.2097     0.2089   0.3012     0.1168
  AR(1) for yield levels       0.0052   0.2109     0.2102   0.3144     0.1126    1.0777
  AR(1) for yield changes      0.0203   0.2017     0.2019   0.0941     0.1082   -1.4723
  Fama-Bliss                   0.0543   0.1871     0.1942   0.2631     0.0791   -1.6645
  Principal components        -0.0258   0.1955     0.1965   0.2942     0.0260   -1.9504
  CF, equal weights           -0.0154   0.1925     0.1924   0.2413     0.0177   -3.0409
  CF, perf. weights           -0.0165   0.1921     0.1921   0.2344     0.0110   -2.9190
τ = 12 months
  Diebold-Li, AR(1) factors   -0.0197   0.2532     0.2530   0.4765    0.1480    2.7697
  Diebold-Li, RW factors       0.0253   0.2360     0.2365   0.3631    0.1081    2.0668
  Diebold-Li, RW×2/AR(1)×1     0.0205   0.2495     0.2494   0.4580    0.1352    3.1367
  Diebold-Li, RW×1/AR(1)×2     0.0201   0.2563     0.2561   0.4853    0.1705    3.4298
  Random walk                 -0.0011   0.2252     0.2243   0.2644    0.0123
  AR(1) for yield levels      -0.0022   0.2274     0.2266   0.2793    0.0180     1.4805
  AR(1) for yield changes      0.0240   0.2181     0.2186   0.0187    0.0082    -0.8586
  Slope regression             0.0328   0.2232     0.2248   0.2147    0.0103     0.1127
  Fama-Bliss                   0.0643   0.2204     0.2287   0.1783    0.0213     0.4759
  Principal components        -0.0106   0.2345     0.2338   0.3322    0.0920     1.6177
  CF, equal weights            0.0153   0.2293     0.2290   0.2994    0.0662     1.2815
  CF, perf. weights            0.0179   0.2279     0.2277   0.2807    0.0532     1.0222
τ = 60 months
  Diebold-Li, AR(1) factors   -0.0891   0.2818     0.2945    0.0999   -0.0219    0.7527
  Diebold-Li, RW factors      -0.0441   0.2809     0.2833    0.0527   -0.0272   -0.7370
  Diebold-Li, RW×2/AR(1)×1    -0.0491   0.2823     0.2855    0.0936   -0.0173   -0.2260
  Diebold-Li, RW×1/AR(1)×2    -0.0493   0.2833     0.2865    0.0978   -0.0094   -0.0743
  Random walk                 -0.0052   0.2881     0.2870    0.0704   -0.0022
  AR(1) for yield levels      -0.0393   0.2879     0.2895    0.0802    0.0022    0.6287
  AR(1) for yield changes      0.0261   0.2890     0.2891   -0.0559    0.0102    0.4312
  Slope regression             0.0154   0.2892     0.2886    0.0818   -0.0100    0.5558
  Fama-Bliss                   0.0380   0.2894     0.2908    0.0639    0.0068    0.8896
  Principal components        -0.0224   0.2813     0.2811    0.0798   -0.0351   -1.2700
  CF, equal weights           -0.0219   0.2833     0.2831    0.0566   -0.0113   -1.3693
  CF, perf. weights           -0.0218   0.2842     0.2840    0.0552   -0.0113   -1.1154

              Table 4: Performance of one-month-ahead forecasts
Method                      Mean      Std.Dev.   RMSE      ρ̂(6)     ρ̂(18)     DM
τ = 3 months
  Diebold-Li, AR(1) factors   -0.3470   0.8396     0.9054   0.6172    -0.0863    1.4891
  Diebold-Li, RW factors      -0.0385   0.7709     0.7688   0.5104    -0.1408   -2.1124
  Diebold-Li, RW×2/AR(1)×1    -0.0505   0.7938     0.7923   0.5105    -0.1498   -0.5913
  Diebold-Li, RW×1/AR(1)×2    -0.0432   0.8575     0.8552   0.6409    -0.0021    1.0289
  Random walk                 -0.0053   0.8048     0.8016   0.5431    -0.0781
  AR(1) for yield levels      -0.0677   0.8281     0.8276   0.6029    -0.0511    0.7917
  AR(1) for yield changes      0.0615   0.7100     0.7099   0.2725    -0.1422   -1.4233
  Fama-Bliss                   0.2863   0.7323     0.7835   0.5344    -0.0364   -0.2332
  Principal components        -0.1674   0.7875     0.8020   0.6294    -0.0229    0.0073
  CF, equal weights           -0.0413   0.7679     0.7660   0.5445    -0.0845   -1.8778
  CF, perf. weights           -0.0132   0.7595     0.7566   0.5335    -0.0817   -2.5483
τ = 12 months
  Diebold-Li, AR(1) factors   -0.3255   0.9102     0.9632   0.6702    -0.0198   1.7219
  Diebold-Li, RW factors       0.0064   0.8173     0.8141   0.5326    -0.0690   1.0007
  Diebold-Li, RW×2/AR(1)×1    -0.0273   0.8671     0.8641   0.6014    -0.0628   1.7920
  Diebold-Li, RW×1/AR(1)×2    -0.0216   0.9430     0.9396   0.6737     0.0544   1.8478
  Random walk                 -0.0188   0.8057     0.8027   0.5091    -0.0999
  AR(1) for yield levels      -0.1248   0.8354     0.8414   0.5914    -0.0598    0.9063
  AR(1) for yield changes      0.0668   0.7452     0.7453   0.2987    -0.1690   -1.2765
  Slope regression             0.1869   0.8039     0.8223   0.4545    -0.0926    0.4804
  Fama-Bliss                   0.1198   0.8480     0.8532   0.5158    -0.0599    2.0753
  Principal components        -0.1624   0.8702     0.8818   0.6218    -0.0095    1.3180
  CF, equal weights           -0.0301   0.8201     0.8174   0.5480    -0.0637    0.6904
  CF, perf. weights           -0.0138   0.8103     0.8072   0.5428    -0.0681    0.2564
τ = 60 months
  Diebold-Li, AR(1) factors   -0.4364   0.6890     0.8133    0.2037   -0.1692    1.4308
  Diebold-Li, RW factors      -0.0992   0.6989     0.7032   -0.0499   -0.2004   -1.3044
  Diebold-Li, RW×2/AR(1)×1    -0.1348   0.6952     0.7055    0.1337   -0.1445   -0.3385
  Diebold-Li, RW×1/AR(1)×2    -0.1326   0.7285     0.7376    0.1884   -0.0875    0.5077
  Random walk                 -0.0596   0.7159     0.7155   -0.0280   -0.1926
  AR(1) for yield levels      -0.3393   0.6932     0.7693    0.1139   -0.1668    0.9695
  AR(1) for yield changes      0.0940   0.7322     0.7353   -0.0305   -0.2000    0.7273
  Slope regression             0.0582   0.7372     0.7366    0.0723   -0.2003    0.6282
  Fama-Bliss                   0.0694   0.7472     0.7475    0.0163   -0.2088    1.1200
  Principal components        -0.2530   0.6740     0.7174    0.0857   -0.1946    0.0490
  CF, equal weights           -0.1233   0.6960     0.7042    0.0416   -0.1899   -0.7840
  CF, perf. weights           -0.1178   0.6887     0.6960    0.0324   -0.1899   -1.3672

               Table 5: Performance of six-month-ahead forecasts
Method                      Mean      Std.Dev.   RMSE     ρ̂(12)     ρ̂(24)     DM
τ = 3 months
  Diebold-Li, AR(1) factors   -0.7503   1.4445     1.6224   0.3464    -0.2302    0.8095
  Diebold-Li, RW factors      -0.0583   1.4221     1.4174   0.1912    -0.1360   -1.0574
  Diebold-Li, RW×2/AR(1)×1    -0.0797   1.4330     1.4293   0.1859    -0.1438   -0.4434
  Diebold-Li, RW×1/AR(1)×2    -0.0583   1.5366     1.5313   0.4574    -0.1920    0.4564
  Random walk                 -0.0293   1.4448     1.4391   0.2399    -0.1514
  AR(1) for yield levels      -0.3799   1.4685     1.5109   0.3836    -0.2243    0.4870
  AR(1) for yield changes      0.1006   1.4788     1.4761   0.1517    -0.1329    0.7043
  Fama-Bliss                   0.4055   1.4434     1.4936   0.3098    -0.1931    0.4384
  Principal components        -0.4165   1.4463     1.4993   0.4119    -0.2008    0.3468
  CF, equal weights           -0.1407   1.4053     1.4065   0.2885    -0.1916   -0.5146
  CF, perf. weights           -0.0820   1.4096     1.4061   0.3194    -0.1963   -0.6105
τ = 12 months
  Diebold-Li, AR(1) factors   -0.7708   1.4583     1.6441   0.3883    -0.2501   0.8627
  Diebold-Li, RW factors      -0.0353   1.4302     1.4247   0.2670    -0.1990   0.3973
  Diebold-Li, RW×2/AR(1)×1    -0.0954   1.4466     1.4437   0.2954    -0.2225   0.3992
  Diebold-Li, RW×1/AR(1)×2    -0.0788   1.5668     1.5623   0.4625    -0.1982   0.6371
  Random walk                 -0.0586   1.4231     1.4185   0.2435    -0.1893
  AR(1) for yield levels      -0.4729   1.4701     1.5385   0.3942    -0.2511   0.6570
  AR(1) for yield changes      0.0592   1.4834     1.4784   0.2495    -0.2035   1.7483
  Slope regression             0.2950   1.4696     1.4929   0.2410    -0.2016   0.8343
  Fama-Bliss                   0.2170   1.5061     1.5155   0.2721    -0.1909   1.3848
  Principal components        -0.4602   1.5012     1.5642   0.4216    -0.2302   0.7422
  CF, equal weights           -0.1401   1.4285     1.4295   0.3148    -0.2247   0.1490
  CF, perf. weights           -0.0852   1.4292     1.4258   0.3327    -0.2423   0.1164
τ = 60 months
  Diebold-Li, AR(1) factors   -0.9261   0.9227     1.3046    0.0173   -0.1730    1.6833
  Diebold-Li, RW factors      -0.1768   0.9712     0.9832   -0.1768   -0.0747   -1.7332
  Diebold-Li, RW×2/AR(1)×1    -0.2403   0.9573     0.9832   -0.0205   -0.1452   -0.4089
  Diebold-Li, RW×1/AR(1)×2    -0.2341   1.0528     1.0743    0.0454   -0.0814    0.6023
  Random walk                 -0.1376   0.9994     1.0048   -0.1677   -0.0695
  AR(1) for yield levels      -0.7332   0.9467     1.1944    0.0360   -0.1351    1.1771
  AR(1) for yield changes      0.0642   1.0209     1.0187    0.1082   -0.2409    0.1926
  Slope regression             0.0953   1.0679     1.0677   -0.0121   -0.1290    0.7539
  Fama-Bliss                   0.1253   1.0999     1.1025   -0.0893   -0.0710    1.2509
  Principal components        -0.6546   0.9330     1.1366    0.0454   -0.1725    0.9332
  CF, equal weights           -0.2818   0.9629     0.9995   -0.0549   -0.1321   -0.1122
  CF, perf. weights           -0.2359   0.9522     0.9772   -0.0713   -0.1516   -0.8154

             Table 6: Performance of twelve-month-ahead forecasts
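The error statistics reported in the tables above (mean, standard deviation, RMSE, and the error autocorrelations ρ̂(12) and ρ̂(24)) can be computed directly from a series of forecast errors. The paper's own code is in Matlab (dl forecast.m); the following Python sketch is purely illustrative, and the function name and interface are my own.

```python
import numpy as np

def forecast_error_stats(errors, lags=(12, 24)):
    """Summary statistics for a series of forecast errors:
    mean, standard deviation, RMSE, and sample autocorrelations
    of the errors at the given lags."""
    e = np.asarray(errors, dtype=float)
    stats = {
        "mean": e.mean(),
        "std": e.std(ddof=1),
        "rmse": np.sqrt(np.mean(e ** 2)),
    }
    e_c = e - e.mean()          # center before computing autocovariances
    denom = np.sum(e_c ** 2)    # lag-0 autocovariance (times T)
    for k in lags:
        stats[f"rho({k})"] = np.sum(e_c[k:] * e_c[:-k]) / denom
    return stats
```

Large ρ̂(12) and ρ̂(24) values, as in several rows of Tables 5 and 6, indicate persistent forecast errors, which is what motivates shrinking the DL-forecasts towards current yields.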
Maturity          h=1                h=6               h = 12
                   DL-RW DL-RWS       DL-RW DL-RWS       DL-RW DL-RWS
        3         -1.6078 -2.6504    -2.1124 -2.2599    -1.0574 -1.1257
        6           2.0583  1.4747     1.9878  1.8932     1.2586   1.2003
        9           1.0717  0.1421     0.3481  0.2165     0.1665 -0.2141
        12          2.0668  1.3091     1.0007  0.8895     0.3973   0.3512
        24          2.2482  2.0024     1.0362  0.9909     1.1089   1.0887
        36         -1.0325 -1.5159    -1.2929 -1.3990    -0.8712 -0.9132
        60         -0.7370 -1.3395    -1.3044 -1.4262    -1.7332 -1.7956
        120         1.8357  0.6831     1.3764  1.1322     1.9675   1.8284

          Table 7: Comparison of DL-RW to its shrinkage version DL-RWS
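The DM columns in Tables 5–7 are Diebold–Mariano statistics for equal predictive accuracy. The paper's exact implementation is not reproduced here; the sketch below is a minimal version under squared-error loss with a rectangular-kernel HAC variance using h−1 autocovariances, a standard choice for h-step-ahead forecasts. The function name and these implementation details are assumptions, not taken from the text.

```python
import numpy as np

def diebold_mariano(e1, e2, h=1):
    """Diebold-Mariano statistic comparing two forecast-error series
    under squared-error loss; negative values favour the first forecast.
    h is the forecast horizon, used to truncate the HAC variance."""
    d = np.asarray(e1, dtype=float) ** 2 - np.asarray(e2, dtype=float) ** 2
    T = d.size
    d_c = d - d.mean()
    # long-run variance: gamma_0 + 2 * sum_{k=1}^{h-1} gamma_k
    lrv = np.sum(d_c ** 2) / T
    for k in range(1, h):
        lrv += 2.0 * np.sum(d_c[k:] * d_c[:-k]) / T
    return d.mean() / np.sqrt(lrv / T)
```

Under the null of equal accuracy the statistic is asymptotically standard normal, so the entries around −1.7 in Table 7 (e.g. DL-RW at τ = 60 for h = 12) are borderline significant at the 10% level.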




Module Functionality
us-bonds.do
        Preparation of CRSP data: filtering, reformatting,
        consistency checks, export to csv-format
prepare datastream issues.do
        Preparation of Datastream data on issues: analysis,
        filtering, reformatting, export to csv-format
prepare datastream data.do
        Preparation of Datastream data on prices, ytm, and accrued interest:
        reshaping, reformatting, export to csv-format
prepare complete.do
        Preparation of consolidated Datastream data: merging, reshaping,
        reformatting, filtering, consistency checks, export to csv-format
create reformat date.do
        Function to convert date strings into a Stata-readable format

                              Table 8: Stata do-files
Module Functionality
data prep/us data prep.m
         Construction of the zero curve and forward rates for the US,
         using CRSP data
data prep/uk data prep.m
         Construction of the zero curve and forward rates for the UK,
         using Datastream data (does not produce consistent results)
data summary.m
         Summary statistics for the zero curve
dl create factors etc.m
         Creation of NS-factors, comparison of NS-factors and empirical factors,
         fitted and empirical yield curves, graphing of spot rate and residuals
dl factor dynamics.m
         Analysis of dynamic properties of the factors, ACF/PACFs,
         ARMA-modelling, residual autocorrelations, tests for white noise
dl forecast.m
         Out-of-sample forecasting and comparison of predictive accuracy
kalman.m
         Estimation of state-space model
kalman loglik.m
         Implementation of the Kalman filter

                              Table 9: Matlab modules
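The module dl create factors etc.m constructs the NS-factors date by date. A standard way to do this, which the sketch below illustrates in Python (the paper's code is Matlab), is cross-sectional OLS of the observed yields on the Nelson-Siegel loadings with the decay parameter held fixed; λ = 0.0609, for maturities in months, is the value used by Diebold and Li (2006) and is assumed here.

```python
import numpy as np

LAMBDA = 0.0609  # Diebold-Li decay parameter, maturities in months

def ns_loadings(tau, lam=LAMBDA):
    """Nelson-Siegel factor loadings at maturities tau (in months):
    a column of ones (level), the slope loading, and the curvature loading."""
    tau = np.asarray(tau, dtype=float)
    x = lam * tau
    slope = (1.0 - np.exp(-x)) / x
    curv = slope - np.exp(-x)
    return np.column_stack([np.ones_like(tau), slope, curv])

def fit_ns_factors(tau, yields, lam=LAMBDA):
    """Cross-sectional OLS estimate of (beta1, beta2, beta3) for one date."""
    X = ns_loadings(tau, lam)
    beta, *_ = np.linalg.lstsq(X, np.asarray(yields, dtype=float), rcond=None)
    return beta
```

Repeating the regression for every month in the sample yields the three factor time series whose dynamics are analysed in dl factor dynamics.m.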
[Figure: three-dimensional surface of yield (percent) against maturity (months) and time]

                                 Figure 1: Yield curves, Jan. 85 - Dec. 06
[Figure: three time-series panels, 1987-2004 — Level (model-based vs. empirical),
Slope (negative of model-based slope vs. empirical), Curvature (0.3 × model-based
curvature vs. empirical)]

                          Figure 2: NS-factors vs. empirical factors
[Figure: ACF and PACF panels for the factors β1, β2, and β3, lags 0-40]

          Figure 3: Autocorrelations and partial autocorrelations of NS-factors
[Figure: residual autocorrelation panels for the AR(1) models of β1, β2, and β3, lags 0-40]

     Figure 4: Autocorrelations of the residuals of AR(1) models for the NS-factors
[Figure: average empirical and fitted yields (percent) against maturity (months, 0-120)]

              Figure 5: Average fitted vs. average empirical yield curve
[Figure: four panels of empirical vs. fitted yield curves, maturity 0-120 months,
on 1989-03-31, 1989-07-31, 1997-05-30, and 1998-08-31]

               Figure 6: Fitted vs. empirical yield curves on selected dates
[Figure: three-dimensional surface of yield residuals (percent) against maturity (months) and time]

                      Figure 7: Yield curve residuals (pricing errors)

220 F

is now commonly known as the Diebold-Li (DL) approach to forecasting the yield curve, and in the forecast exercise of these authors it outperforms all considered competitors over a forecast horizon of 12 months in US data.

In the present paper, we attempt to replicate these findings, using first data over an identical sample window, and then a larger sample including data up until December 2006. We extract the factors and analyze their dynamic properties. Then we compare fitted and empirical yield curves, and find that pricing errors are persistent. The main task is the forecasting exercise: we compare the original DL-approach and some variants to several competitor models. In particular, we vary the specification of the factor dynamics, considering random walks as well as AR(1) processes. A decomposition of the forecast error variance into the contributions of the factor prediction errors and the pricing errors provides insight into how different DL-specifications compare relative to each other, and how one could improve upon the DL-method.
Furthermore, we include combined forecasts in the analysis, using both equal weights and performance weights, each of which combines the forecasts of all competing individual forecast models.

Our results are disappointing if one believes in the superior performance of the DL-approach in its original specification: while for the sample including the same time periods as DL, the DL-approach does outperform some competitors at selected maturities and forecast horizons, the outperformance is less pronounced than in the original paper. Since summary statistics and yield curves on selected dates agree quite closely between us and DL, it is surprising to find such different results. Using the full sample, the performance of the original DL-approach is even worse. Specifying AR(1) processes to forecast the factors, one can in no case beat the random walk significantly in terms of predictive accuracy. We find that no-change forecasts for the NS-factors generally improve the forecasts. The evidence suggests that one should not try to forecast the factors, since the additional estimation uncertainty contaminates the forecasts. Today's fitted yield is usually the best DL-forecast. Another conclusion is one that is made frequently: the methods that perform by far the best are the combined forecasts. The diversification gains from including several competing forecasts in a combined forecast are considerable, and make this approach to yield curve forecasting our method of choice.

The paper proceeds as follows: Section 2 explains the zero curve and its construction and presents summary statistics for the zero curves for the US that we bootstrapped from the CRSP bond price data. In section 3 we analyze the NS-factors as they are extracted from the US data, compare them to empirical factors (level, slope and curvature as they are usually calculated), and assess the fit that the NS-factors provide to empirical yield curves. In section 4 we describe and discuss the DL-approach to forecasting, present its competitors and the method to compare predictive accuracy. Then the results of the forecast exercise are presented and discussed. Section 5 concludes.

2 The Zero Curve

With the CRSP Monthly Treasury Database, we have an excellent data source at hand for constructing the US term structure of interest rates. It is very well documented, updated regularly, and checked for consistency. We defer the details of the data processing to appendix A. In the following subsection we explain why it is necessary to bootstrap spot rates from the data and outline how we performed this task. Then we present summary statistics for the US yield curve.

2.1 Bootstrapping the Zero Curve from Bond Prices

The yields we are interested in are the spot rates: the net return that is obtained on a τ-period investment.
The cross-section (across different maturities) of spot rates at a particular point in time is called the term structure of interest rates, or yield/spot rate/zero curve. For τ-period discount bonds, the yield to maturity (YTM) is equal to the spot rate for that period, y_t(τ). For a coupon bond, the YTM is the discount rate that makes the present value of future coupon and principal payments equal to the cash price of the issue. This yield is not equal to a spot rate, since coupon payments cannot generally be reinvested at the same rate. Therefore the YTM across different maturities is not equal to the zero curve. The latter has to be constructed from observed bond prices, and we do so using the methodology of Fama and Bliss (1987). Simply put, "the discount rate function is extended each step by computing the forward rate necessary to price successively longer maturity bonds given the discount rate function fitted to the previously included issues" (Bliss, 1996, p. 10). Since there is a one-to-one mapping between the zero curve and the discount rate function, this achieves the goal of constructing the zero curve. We provide details about the procedure in the appendix. After obtaining spot rates for all maturities at a certain point in time, we linearly interpolate these rates to 17 fixed maturities, as DL did.

2.2 Summary Statistics

Figure 1 provides a three-dimensional plot of the US yield curve over the entire time horizon. The variation in the level of the yield curve is much stronger than the variation in slope and curvature, yet the latter are obviously important as well. In table 1 we show the summary statistics for the monthly term structure in the US. We see some of the usual patterns:

• The average yield curve is upward sloping.
• The short end of the yield curve is more volatile than the long end.
• Yields of all maturities are highly persistent.

Like DL we find short rates in our sample to be more persistent than long rates. In sum, our table is qualitatively and quantitatively very similar to table 1 in DL.

3 Fitting the Yield Curve using Nelson-Siegel Factors

3.1 Extracting the Factors

In extracting the NS-factors from the yield data, we follow exactly the approach of Diebold and Li (2006): for each month of data, we regress the yields of 17 maturities on the factor loadings. The factor loadings are

loadings(τ) = ( 1, (1 − exp(−λτ))/(λτ), (1 − exp(−λτ))/(λτ) − exp(−λτ) ),   (1)
where λ is fixed at the value 0.0609 as in DL.[1] Now the regressions

y_t(τ) = β_t' loadings(τ) + ε_t(τ),   for τ = τ_1, ..., τ_17,   (2)

where β_t are the regression coefficients, will give us three time series for the three Nelson-Siegel factors, {β_1t, β_2t, β_3t}, t = 1, ..., T.[2]

3.2 Model-based Factors and Empirical Factors

We will now compare the NS-factors to their empirical counterparts as they are usually calculated: the level is taken to be the yield at the longest maturity (10 years), the slope is the difference between the 10-year and 3-month yields, and the curvature is twice the 2-year yield minus the sum of the 10-year and 3-month yields. As detailed in DL, the NS-factors can be interpreted as level, slope and curvature of the yield curve. The first factor, if increased, raises yields at all maturities, since its loading is constant, and can therefore be interpreted as the level of the yield curve. The second factor, if increased, increases short yields more strongly than long yields, since short yields load more heavily on it. It can therefore be interpreted as the negative of the usually employed measure of the yield curve slope (long minus short yield). The third factor, if increased, will increase medium-term yields the most, since long and short yields do not load strongly on this factor. Because 2y_t(24) − y_t(3) − y_t(120) = 0.00053 β_2t + 0.37 β_3t, we multiply this third factor by 0.3 and it will then closely correspond to the usual curvature measure. In figure 2 we see that in fact the empirical counterparts correspond very closely to the estimated factors.

3.3 Dynamic Properties of Nelson-Siegel Factors

In table 2 we present summary statistics for the model-based factors. Like their empirical counterparts, they are very persistent. In the last column we present the results of the Augmented Dickey-Fuller test, in order to test for a unit root.
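The extraction in equations (1) and (2) amounts to one cross-sectional OLS regression per month with fixed regressors. A minimal sketch (function and variable names are ours, not DL's; maturities in months):

```python
import numpy as np

def ns_loadings(tau, lam=0.0609):
    """Nelson-Siegel loadings of eq. (1) for a maturity tau in months."""
    decay = np.exp(-lam * tau)
    slope = (1.0 - decay) / (lam * tau)
    return np.array([1.0, slope, slope - decay])

def extract_factors(yields, maturities, lam=0.0609):
    """One month's cross-sectional OLS of eq. (2): regress the observed
    yields on the fixed loadings to obtain (beta_1t, beta_2t, beta_3t)."""
    X = np.vstack([ns_loadings(tau, lam) for tau in maturities])
    beta, *_ = np.linalg.lstsq(X, np.asarray(yields), rcond=None)
    return beta
```

With λ = 0.0609 the curvature loading peaks at a maturity of 30 months, consistent with footnote [1].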
The MacKinnon critical value for rejecting the null hypothesis of a unit root is -2.57 for a sample size of 250 (our sample size is 264), wherefore we cannot reject the null for the level and slope factors. They may well contain unit roots, whereas the curvature factor probably does not.

At this point the autocorrelation and partial autocorrelation functions of the NS-factors should be considered, and we show these in figure 3. The level factor probably has long memory, and might be non-stationary. The persistence is also high for both slope and curvature, yet it is decidedly smaller than the persistence of the level factor. Linear models that could explain correlation structures other than the Random Walk (RW) model are, for example, AR, ARMA, ARIMA, or ARFIMA models. We fit AR(1) models to the factors and present the autocorrelation functions of the residuals in figure 4, together with Bartlett confidence bands. Not all serial correlation is captured, since some autocorrelations at low lags are significant, and the Ljung-Box test confirms this (results not included). We postpone further discussion of the NS-factors until we discuss the forecast approach based on them.

3.4 Actual and Fitted Yield Curves

In figure 5 we plot the fitted yield curve using the average of the factors, together with the average empirical yield curve. We see that the fitted curve provides a very good fit. This is what one would expect: the average yield curve is very smooth, and so three factors should be sufficient to capture its shape and generate a good fit. In figure 6 we show the actual and the fitted yield curve on selected dates, choosing the same dates as DL in their figure 5. Our plots are essentially identical to those of DL. The important point to notice is that at some dates the fit of the NS-fitted yield curve is much better than at other dates, with the difficulties arising when the actual yield curve is very unsmooth and dispersed. Since yield curves seldom look like that of August 1998, where the fit is particularly bad, the fit is usually satisfactory.

[1] For this value the loading on the curvature factor (the third factor) is maximized at 30 months. Usually maturities of two or three years are used to calculate curvature (by subtracting from twice this yield the sum of short and long yields), and 30 months is right in between.
[2] We differ in our notation from DL, in that we denote the actually observed NS-factors by β and not by β̂. This will be useful because the concept of some "true" unobservable NS-factors will not be needed, and we can distinguish better between actual and forecasted factors.
To obtain a better understanding of how well the Nelson-Siegel fitted yield curves fit the actual yield curves, we provide a three-dimensional plot of the residuals in figure 7. Overall the fit seems to be sufficiently good: the residuals are usually within -0.2 and 0.2. No long, persistent deviations from zero are apparent. But the dynamic properties of these pricing errors should be looked at more closely. We present summary statistics for them in table 3. Of course the means are close to zero and never significantly different from it. The RMSEs indicate that the fit is worst at the shortest and longest maturities. The most important insight from this table is that pricing errors are in fact persistent, with first-order autocorrelations of the residuals ranging from 0.23 to 0.88, usually being larger than 0.5. This persistence implies that we can possibly improve upon the DL-forecast, which predicts using fitted yields. This issue, which was not considered in the original DL paper, will be discussed further in the following section.

4 Forecasting the Yield Curve

Applying simple forecasting techniques to the three NS-factors, and then using the fitted yield curve as the forecast, is the basis of the DL approach. In the following we go into more detail on this approach and possible variations and extensions, and then discuss competing forecast models, forecast combination, and the method of choice to compare predictive accuracy (Diebold-Mariano). Finally we present the results for the US. Throughout we use the following notation: the number of observations in the sample is T, with R observations being in the initial estimation window. The forecast horizon is h (months) and the first forecast is therefore for R + h. In the out-of-sample window we have P observations (R + P = T). For emphasis we sometimes denote the time the forecast is made as t_f, so t_f = R, ..., T − h.

4.1 Diebold-Li Forecast Approach

The approach to forecast the yield curve chosen by Diebold-Li (DL) consists of forecasting the NS-factors, and then using the fitted future yield as the predicted value. The time t forecast of the zero spot rate at t + h (maturity τ) is given by

ŷ_{t+h|t}(τ) = l(τ)' β̂_{t+h|t},   (3)

where l(τ) is the column vector with the three NS factor loadings for maturity τ, and the 3 × 1 vector β̂_{t+h|t} contains the forecasts for the factors. Univariate AR(1) processes and first-order VARs were used as models for the factors in DL's forecasting exercise. When we forecast a factor (i) at time t_f using an AR(1) regression, the forecast is β̂^i_{t+h|t} = b_0 + b_1 β^i_t, with b_0 and b_1 being the OLS coefficients from regressing β^i_t on β^i_{t−h} over the available sample t = h+1, ..., t_f. This specification will be called DL-AR. One interesting question is whether the AR(1) specification can at all improve upon a naive no-change forecast.
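The two basic factor-forecasting rules, the direct h-step AR(1) regression (DL-AR) and the no-change martingale forecast (DL-RW), can be sketched as follows; `ns_loadings` rebuilds the loadings of eq. (1), and all names are ours, not DL's:

```python
import numpy as np

def ns_loadings(tau, lam=0.0609):
    decay = np.exp(-lam * tau)
    slope = (1.0 - decay) / (lam * tau)
    return np.array([1.0, slope, slope - decay])

def ar1_direct_forecast(series, h):
    """DL-AR: regress beta_t on beta_{t-h} by OLS and project h steps ahead."""
    y, x = series[h:], series[:-h]
    b1, b0 = np.polyfit(x, y, 1)
    return b0 + b1 * series[-1]

def dl_forecast(betas, maturities, h, factor_rule="AR"):
    """Forecast the three factors, then evaluate eq. (3) at each maturity.
    factor_rule="RW" is the no-change (martingale) factor forecast."""
    betas = np.asarray(betas)
    if factor_rule == "RW":
        beta_hat = betas[-1]                    # today's factors, unchanged
    else:
        beta_hat = np.array([ar1_direct_forecast(betas[:, i], h) for i in range(3)])
    X = np.vstack([ns_loadings(tau) for tau in maturities])
    return X @ beta_hat
```

Under the RW rule the forecast is simply today's fitted yield curve, which is why no additional parameters need to be estimated.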
Modeling all three factors as martingales leads to an extremely simple forecast: today's fitted yield, ŷ_{t+h|t}(τ) = l(τ)' β_t. We include this specification (DL-RW) in our forecasting exercise in order to assess its performance compared to the original DL specification.

The Dickey-Fuller tests indicated that level and slope factors might have unit roots, so an obvious further model choice is to model these as random walks, and choose AR(1) for the curvature (DL-RW2AR1). The ACF on the other hand gives the feeling that a unit root seems decidedly more likely in the level factor than in the other two, and so we also include a specification where we choose RW for the level, and AR(1) for slope and curvature (DL-RW1AR2). There are eight combinations when choosing between AR(1) and RW for each factor – we content ourselves with just looking at four of these.[3]

The accuracy of these forecasts is determined by how well the future factors are predicted, and how close the actual future yield is to its fitted yield. The forecast error can be decomposed as follows:

y_{t+h}(τ) − ŷ_{t+h|t}(τ) = β_{t+h}' l(τ) + pe_{t+h}(τ) − β̂_{t+h|t}' l(τ)
                          = (β_{t+h} − β̂_{t+h|t})' l(τ) + pe_{t+h}(τ)
                          = fpe_{t+h}' l(τ) + pe_{t+h}(τ)

The factor prediction error (fpe) is a 3 × 1 random vector with the errors we make in predicting the factors. These are weighed differently, depending on the three loadings l for maturity τ. The pricing error (pe) is the deviation of the actual future yield from the fitted yield curve. Assuming that time t pricing error and factor prediction error are uncorrelated (an assumption that seems plausible and could be tested), the expected loss of the forecast is

E(e_t^2) = Var(fpe_t' l(τ)) + Var(pe_t(τ))
         = l(τ)' Σ l(τ) + Var(y_t − β_t' l(τ))   (4)
         = V^f + V^pe,   (5)

where Σ = E(fpe_t · fpe_t') is the unconditional contemporaneous covariance matrix of the three factor prediction errors. We denote by V^f the contribution of the factor prediction errors, and by V^pe the contribution of the pricing errors to the expected loss. The diagonal elements of Σ are the forecast error variances for the chosen factor prediction method, the off-diagonal elements the covariances of the factor prediction errors. How important these elements are depends on how heavily a yield of the chosen maturity loads on each factor.
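Given realized factor prediction errors and pricing errors from a forecast run, the two components of eq. (4)-(5) can be estimated directly. A sketch, under the assumption of mean-zero errors so that sample second moments estimate Σ and Var(pe):

```python
import numpy as np

def loss_decomposition(fpe, pe, l_tau):
    """Estimate V^f = l' Sigma l and V^pe from realized errors.
    fpe: (T, 3) factor prediction errors; pe: (T,) pricing errors;
    l_tau: the three loadings for the maturity of interest."""
    fpe = np.asarray(fpe)
    Sigma = fpe.T @ fpe / len(fpe)     # second-moment estimate of Sigma
    Vf = l_tau @ Sigma @ l_tau
    Vpe = np.mean(np.asarray(pe) ** 2)
    return Vf, Vpe
```

Under the no-correlation assumption, V^f + V^pe should be close to the empirical MSE of the yield forecast.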
Intuitively, when predicting short yields, the errors in predicting level and slope are both important, but when predicting a long yield V^f depends only on the forecast error variance of the level factor and not on those of slope and curvature. If one is willing to assume independence between factor prediction errors and pricing errors, the performance of any DL-specification depends entirely on how well it predicts the factors. Of course it is possible that a bad forecast for the factors produces a fitted yield that is closer to the future realization than the fitted yield from the actual future factors, that is, |y_{t+h}(τ) − β̂_{t+h|t}' l(τ)| < |y_{t+h}(τ) − β_{t+h}' l(τ)| = |pe_{t+h}|. This is the case if fpe_{t+h}' l(τ) is of opposite sign to pe_{t+h}. If this happens systematically, that is, if there is negative correlation between these two, the expected loss will be decreased, and might be smaller than if we predict the factors well but have no correlation. Under the assumption of no correlation, however, a DL-forecast will outperform a competitor of its kind if and only if it can predict the factors better. Against other methods, the performance of DL-forecasts depends on both how well NS-fitted yield curves fit actual yields (in the future) and how well its factors can be predicted.

The pricing error variance is one common source of loss for all DL specifications. We may hope to take advantage of the persistence in the pricing errors for better predictions, yet for this we have to deviate in our forecast from a fitted yield. Instead of explicitly modeling and forecasting the pricing error, a much simpler approach can do the same job: namely to just shrink towards the actual yield at the time the forecast is made, e.g. simply taking a (possibly weighted) average of the DL-forecast for the yield and its current value. The reason this reduces pricing errors is intuitive: a yield that is off quite a bit from the fitted curve will probably still be off in the future, with a pricing error of the same sign (at least over shorter horizons).

[3] Since these model choices depend on the observed persistence in our sample, one could argue that the researcher has information about the factor dynamics only until the time of the forecast. Yet the fact that we forecast and estimate persistence partly in the same data would only be problematic (bias our results) if the persistence features changed over time and it actually pays off for the forecaster to have up-to-date information on these. We are not worried about this bias, because qualitatively similar results on Dickey-Fuller tests and ACF functions obtain in the sample without the forecast window. Also this is a minor point because the comparison between AR(1) and RW is in our case not motivated by the observed data.
Since the yield itself is persistent, shrinking the fitted yield forecast towards last period's value should reduce the pricing error. So this shrinking method exploits the persistence in the pricing errors, and it does not need to estimate any additional parameters. Following this reasoning, a model within the DL class should usually be beaten by a shrinkage towards the current actual yield. We denote the forecast based on a simple average of DL-RW and the current yield as DL-RWS (S for shrinkage), and assess our hypothesis in the last table.

4.2 Competing Forecast Models

In the following we present the different models that compete with the DL-forecasts. We denote by b_0 and b_1 in each case the OLS coefficients estimated from the regressions in the data available at the time of the forecast.

Random Walk The first and most important competitor is the simple no-change yield forecast, ŷ_{t+h|t}(τ) = y_t(τ).

AR(1) on yield levels Yields at each maturity are predicted using an AR(1) that is estimated on the available data for that maturity, {y_t(τ)}, t = 1, ..., t_f. The yields are regressed on h-period lagged values, to produce optimal MSE linear forecasts: ŷ_{t+h|t}(τ) = b_0 + b_1 y_t(τ).

AR(1) on yield changes Here we regress h-period yield changes on their past values in order to obtain a prediction for future yield changes: ŷ_{t+h|t}(τ) − y_t(τ) = b_0 + b_1 (y_t(τ) − y_{t−h}(τ)).

Slope regression The forecasted yield change is the predicted value from regressing historical (h-period) yield changes on yield curve slopes: ŷ_{t+h|t}(τ) − y_t(τ) = b_0 + b_1 (y_t(τ) − y_t(3)).

Fama-Bliss forward rate regression The forecasted yield change is the predicted value from regressing historical yield changes on forward premia: ŷ_{t+h|t}(τ) − y_t(τ) = b_0 + b_1 (f_t^h(τ) − y_t(τ)), where f_t^h(τ) is the forward rate for investments from t + h to t + h + τ, observed at time t.

Regression on three AR(1) principal components Extracting three principal components from the 17 yields (of course using data only until t_f), we forecast these using AR(1) models, and generate the yield forecasts from the predicted principal components (see DL, p. 359).

4.3 Forecast Combination

It has been known for quite some time that "combining multiple forecasts leads to increased forecast accuracy" (Clemen, 1989), as opposed to just using a single ex-ante optimal forecast.[4] We will evaluate the performance of two combination strategies, equal weights and performance weights. Given N individual forecasts for a yield (at maturity τ, which we omit in the following notation), the linearly combined forecast is given by

ŷ^c_{t+h|t} = w_t' ŷ_{t+h|t} = Σ_{i=1}^N w_{t,i} ŷ^i_{t+h|t},

where the N × 1 vector of weights w_t can be time-varying. For equal weights, each element of this vector is equal to 1/N.
For performance weights, each forecast is weighed by the inverse of its MSE over the last 24 months (or over as many months as are available):

w_{t_f,i} = (1/MSE_{t_f,i}) / Σ_{j=1}^N (1/MSE_{t_f,j})

MSE_{t_f,i} = 1/(v+1) Σ_{t=t_f−v}^{t_f} e^2_{t,i}

v = min(t_f − R, 24),

where e_{t,i} is the forecast error of forecast model i at time t. As before, t_f is the time at which the forecast is made. Naturally we cannot use data on the performance of the individual forecasts after t_f, e.g. the MSPEs over the whole out-of-sample period. The choice of using the last 24 periods is of course arbitrary, and is based on a trade-off between precise estimation of the performance (which calls for many periods) and the fact that performance might change over time (which calls for fewer periods).

4.4 Comparing Predictive Power

For each maturity and each forecast horizon, a forecast model produces errors e = (e_{R+h}, ..., e_T)', with the individual error observation being e_t = y_t − ŷ_{t|t−h}. There are n_e = T − R − h + 1 = P − h + 1 error observations. As usual, we assess the forecast accuracy in terms of squared error loss by estimating the expected loss E(e_t^2) with its empirical counterpart, the mean squared prediction error (MSE), e'e/n_e.

It should be noted that this estimator is biased downward for models that involve parameter estimation (Efron, 1983), since it does not account for this estimation uncertainty. In our case, the uncertainty in the parameter estimation is obviously different across the models. The RW model for yields needs no estimation at all. It seems intuitive that in a performance comparison using MSEs a bias could arise in favor of models that rely heavily on parameter estimation, because the MSE does not account for this estimation uncertainty. But we will see that the RW performs remarkably well, and its relative performance would only be further improved by employing more accurate strategies to estimate the expected loss (e.g. the bootstrap).

[4] For a detailed analysis of the subject see Timmermann (2005).
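The performance-weighting scheme above can be sketched as follows (the error-matrix layout and names are ours):

```python
import numpy as np

def performance_weights(errors, t_f, R, window=24):
    """Inverse-MSE combination weights at forecast time t_f.
    errors: (T, N) array of past forecast errors of the N models;
    each model's MSE is taken over the last v+1 observations,
    with v = min(t_f - R, window)."""
    v = min(t_f - R, window)
    recent = np.asarray(errors)[t_f - v : t_f + 1]
    inv_mse = 1.0 / np.mean(recent ** 2, axis=0)
    return inv_mse / inv_mse.sum()      # weights sum to one
```

Equal weights correspond to replacing `inv_mse` by a vector of ones.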
To rigorously test whether models forecast with the same accuracy or one provides evidence of superior accuracy, we use the test statistic advocated by Diebold and Mariano (1995), henceforth DM. We estimate the long-run variance of the loss-differential series by estimating the spectral density at frequency zero with the well-known Bartlett kernel. The lag length for the included autocovariances is chosen to be equal to the forecast horizon. This test statistic converges in law to a standard normal distribution, and so we can compare it to the usual values of ±1.65 (10% significance level) and ±1.96 (5% significance level).

We would like to point out that this test has asymptotically correct coverage for a hypothesis like "DL-RW performs equally well as the random walk for maturities of τ months at a horizon of h months." The problem of testing multiple hypotheses naturally arises in our context, since we are interested in the performance at different maturities and over different forecast horizons. This is one area where an extension could be valuable: controlling the family-wise error rate and the false discovery rate is necessary for correct inference.

4.5 Results for the US

The sample choice of DL is to take Jan-1985 to Jan-1994 as the initial estimation window, with data available until Dec-2000. We initially chose that exact same sample for our forecasting exercise. For short forecast horizons, we obtain numerically similar results to DL – their method does not fare that well. For longer horizons, in particular 12 months, where DL find that their method significantly outperforms, we find neither quantitatively nor qualitatively similar results. DL-AR does not significantly outperform the RW at any maturity considered by DL. Using exactly the same sample choice, the DL strategy does not fare as well in our data set as in the original paper.

The full sample, Jan-1985 to Dec-2006, was split in half to obtain the initial estimation window, so T = 264 and R = P = 132. Results in this sample are the ones we actually present in the tables. They provide new evidence on whether the different DL specifications perform well, as compared to the no-change forecast for each yield.

The results of our forecast exercise for a horizon of h = 1 month are presented in table 4. We forecast yields at maturities 3, 12, and 60 months.
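The DM comparison described in section 4.4 can be sketched as follows, with the long-run variance of the squared-loss differential estimated by a Bartlett-weighted sum of autocovariances up to the forecast horizon (a generic sketch of the standard statistic, not the authors' exact implementation):

```python
import numpy as np

def diebold_mariano(e1, e2, h):
    """DM statistic for equal squared-error loss of two forecast error series.
    Negative values favor model 1. Compare to +/-1.65 (10%) or +/-1.96 (5%)."""
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2    # loss differential
    n = len(d)
    dc = d - d.mean()
    lrv = np.sum(dc ** 2) / n                        # lag-0 autocovariance
    for k in range(1, h + 1):                        # Bartlett-weighted lags
        gamma = np.sum(dc[k:] * dc[:-k]) / n
        lrv += 2.0 * (1.0 - k / (h + 1)) * gamma
    return d.mean() / np.sqrt(lrv / n)
```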
The columns show the mean and standard deviation of the forecast errors, the root MSE, autocorrelations at one- and twelve-month lags, as well as the Diebold-Mariano test statistic assessing the hypothesis of equal forecast performance.[5] For the short rate, the CF methods do best and have MSEs that are significantly smaller than that of the RW. For the one-year yield, no model outperforms the random walk, and in fact the DL strategies perform significantly worse. For the five-year yield, again no model outperforms the random walk. Forecasting yields one month ahead better than with the naive no-change forecast seems difficult, yet forecast combinations perform comparatively well.

In table 5, results for the six-month forecast horizon are shown. Again we can only beat the RW for the short rate, with the DL strategy employing RW factors, and with the CF strategies. It is noteworthy that the DM statistic for the CF strategy using performance weights is again strongly significant. For maturities other than 3 months, no model can significantly outperform the no-change forecast, but DL-RW and combined forecasts have smaller MSE than the RW for the 5y yield.

Finally, table 6 shows the results for the year-ahead forecasts. Only for the 5-year yield can one model, DL with RW factors, outperform the RW. The CF methods have smaller MSE than the RW for the 3m and 5y maturities, yet not significantly so. Noteworthy is the better performance of DL-RW compared to DL-AR at all three maturities.

In sum, we find markedly different results than DL: the DL-AR strategy never beats the RW, and is often significantly worse. There are two reasons for this: first, the data has been constructed in a different way, and the results are obviously sensitive to this. Secondly, on the data from 2000 to 2006, which is included in our sample but not in DL's, the DL strategy performs particularly badly – the results deteriorate from the DL sample choice to the full sample. What should certainly be pointed out is the fact that DL-RW performs remarkably better than DL-AR. The former strategy has smaller RMSE in all of the nine settings considered, and it beats the RW in two cases, whereas the latter strategy never does. Today's factor values seem to be decidedly better forecasts for future factors than an AR(1) forecast. Intuitively, the factors are so close to martingales that the added estimation uncertainty from estimating the AR(1) coefficients leads to worse forecasts. So a forecast using NS-factors should, according to our evidence, best be done by just predicting today's fitted yield.

The DL-RW method fares well in comparison to the RW. In only three out of nine cases does it have a larger MSE. Twice it beats the RW significantly. So there are obviously gains to be had from the DL-forecast methodology.

[5] The slope regression method is not included for τ = 3 since it is not applicable.
The information in the factors contains less noise than the individual yields, so one is well advised to base a yield forecast at least partly on the information in these factors. The combination of forecasts shows its merit also in our study. Although the CF strategies do not beat the RW throughout, their RMSE is mostly smaller. In particular for the short rate they outperform remarkably (yet not significantly so for the year-ahead forecasts). The combined forecasts are particularly good if at least some of the individual forecasts have lower MSE than the RW. The four different DL-specifications that we included have mostly one or two among them that performs well. Therefore they contribute to the good performance of the combined forecasts. In addition to the previous results, we look at the DL-RW strategy from another perspective. How does it fare in comparison with it’s shrinkage version? What we call shrinkage here is really a simple average between last periods yield and last periods fitted 12
yield. This is also a combined forecast, a particularly simple one. Table 7 confirms our hypothesis that a DL-model is usually outperformed by a combination of itself with the previous period's yield: at all maturities and forecast horizons, the MSE is smaller for the shrinkage specification. Again, this is because yields and pricing errors are persistent, and the shrinkage method takes advantage of this fact.

5 Further Directions

In the following we propose various extensions to the analysis of this paper.

As mentioned earlier, yield forecast methods are compared at different horizons and maturities, so we run into the problem of multiple hypothesis testing. Since many hypothesis tests (for equal forecast accuracy) are performed at once, one should control the false discovery rate (Benjamini and Hochberg, 1995). Whenever we want to aggregate the evidence, and not only forecast one yield over one forecast horizon, our inference will be incorrect without appropriately controlling the coverage of our tests. One method that seems promising has been proposed by Storey (2002), who fixes the rejection region and then estimates its corresponding error rate, which "offers increased applicability, accuracy and power". Whichever method we choose, it will enable us to test hypotheses about the relative performance of two methods over several forecast horizons and/or maturities. This will allow for more concrete conclusions than just eyeballing several DM test results and aggregating them only verbally.

Estimating the contribution of the factor prediction errors and of the pricing errors to the MSE of the forecast, as well as the correlation between the two, would certainly be interesting. We could then understand how important the pricing errors are in the forecast error variance, and compare different competitors in the DL class with regard to estimates of their Vf and of the correlation between factor prediction errors and pricing errors.
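To make the multiple-testing proposal concrete, here is a minimal sketch of the Benjamini-Hochberg (1995) step-up procedure applied to a collection of DM-test p-values; the function name and interface are our own illustration. With m p-values sorted in ascending order, one rejects the k smallest, where k is the largest rank satisfying p_(k) <= k q / m.

```python
def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg (1995) step-up procedure.
    Returns a list of booleans, True where H0 is rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest 1-based rank with p_(k) <= k*q/m
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            reject[i] = True
    return reject
```

For example, with DM p-values [0.01, 0.02, 0.03, 0.5] and q = 0.05, the thresholds are 0.0125, 0.025, 0.0375, 0.05, so the three smallest nulls are rejected.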
Furthermore, the combination of DL-forecasts with forecasts that do not rely on NS-factors, as shown, generally improves performance, because it reduces the pricing errors. Further investigations in this direction seem promising. In particular, a weighting scheme for combining DL-forecasts and individual yield forecasts (which do not rely on NS-factors) should take into account how important pricing errors have been in the past compared to factor prediction errors. If pricing errors are relatively important, individual yield forecasts should be weighted more strongly. It might be possible to derive ex-ante optimal weights for this problem, either time-constant or time-varying; this would be a very fruitful extension and could be empirically promising.

Other obvious extensions include attempts to forecast the factors better using either
univariate (e.g. ARFIMA) or multivariate models. The key here is a parsimonious configuration. One could also include more information: important sources could be, among others, macro factors, international yield curve factors, risk measures (e.g. expected risk premia), volatility measures, and latent factors. The task is then to find ways to incorporate relevant information without introducing too much estimation uncertainty. For linear factor forecasts, one possibility for including other information is transfer functions (see for example Liu, 2006, chap. 5), which are just highly restricted VARs. They could provide a reasonable middle ground between too much additional estimation uncertainty and ignoring possibly valuable information.

Constructing reliable yield curve data for many countries, and testing newly developed forecast strategies on new data, is an important further task. If we keep improving our strategies and testing them repeatedly on the same data, we obviously run into the problem of data snooping. Extending the available data sources is therefore important.

Any research program that aims at improving term structure forecasts should not be oblivious to the advantages the DL factor approach brings with it. The above extensions could enable applied researchers to profit from this approach considerably.

6 Conclusion

Forecasting the yield curve accurately is a difficult task. Among the competing forecast models considered in this paper, none could consistently beat the random walk. However, at almost all maturities and forecast horizons, at least some of the models perform well. This gives particular appeal to combined forecasts, which profit from this fact. They mostly have smaller MSE than the RW, and sometimes beat it significantly. With regard to the DL strategy, we cannot confirm the positive results of the original paper: the DL-AR method does worse in our data, in particular in the full sample.
There is light on the horizon though: getting rid of the estimation uncertainty and simply modeling some or all factors as martingales improves upon the performance of the DL method. Moreover, mostly at least one of the alternative models performs well, which makes them valuable ingredients for combined forecasts.

The relatively good performance of the DL-RW specification indicates that the approach of forecasting yields via a fitted yield curve is promising. From our results we conclude that trying to forecast the factors can be counterproductive: the added estimation uncertainty is a candidate explanation for why today's fitted yield curve is a better forecast than one based on forecasted factors.
References

Benjamini, Yoav and Yosef Hochberg, "Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing," Journal of the Royal Statistical Society, Series B, 1995, 57 (1), 289–300.

Bliss, Robert R., "Testing term structure estimation methods," Working Paper 96-12, Federal Reserve Bank of Atlanta, 1996.

Clemen, Robert T., "Combining forecasts: A review and annotated bibliography," International Journal of Forecasting, 1989, 5 (4), 559–583.

Diebold, Francis X. and Canlin Li, "Forecasting the Term Structure of Government Bond Yields," Journal of Econometrics, February 2006, 130 (2), 337–364.

Diebold, Francis X. and Roberto S. Mariano, "Comparing Predictive Accuracy," Journal of Business & Economic Statistics, July 1995, 13 (3), 253–263.

Diebold, Francis X., Canlin Li, and Vivian Z. Yue, "Global Yield Curve Dynamics and Interactions: A Generalized Nelson-Siegel Approach," Manuscript, Department of Economics, University of Pennsylvania, June 2006.

Diebold, Francis X., Glenn D. Rudebusch, and S. Boragan Aruoba, "The Macroeconomy and the Yield Curve: A Dynamic Latent Factor Approach," Journal of Econometrics, March-April 2006, 131 (1-2), 309–338.

Duffee, Gregory R., "Term Premia and Interest Rate Forecasts in Affine Models," Journal of Finance, February 2002, 57 (1), 405–443.

Efron, Bradley, "Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation," Journal of the American Statistical Association, June 1983, 78 (382), 316–331.

Fama, Eugene F. and Robert R. Bliss, "The Information in Long-Maturity Forward Rates," American Economic Review, September 1987, 77 (4), 680–692.

Jeffrey, Andrew, Oliver Linton, and Thong Nguyen, "Flexible Term Structure Estimation: Which Method is Preferred?," Metrika, March 2006, 63 (1), 99–122.

Liu, Lon-Mu, Time Series Analysis and Forecasting, 2nd ed., Scientific Computing Associates Corp., 2006.
Storey, John D., "A direct approach to false discovery rates," Journal of the Royal Statistical Society, Series B, 2002, 64 (3), 479–498.

Timmermann, Allan, "Forecast Combinations," 2005. Forthcoming in Handbook of Economic Forecasting.

A Data

A.1 Data Source

Our data source for monthly data on US government bonds is the CRSP Monthly US Treasury Database6. We use the cross-sectional file, which includes monthly data on all outstanding Treasury bills, notes and bonds. In particular, all dead bonds that have long been redeemed are also available, with the same data quality as today's issues. Our sample includes all observations from January 1985 to December 2006.

A.2 Filters

We include only non-callable, fully taxable, non-flower bonds, since the pricing of non-standard issues deviates from the usual well-known bond pricing theory. The relevant price variable is the mean of bid and ask price. These are flat prices, that is, they do not include accrued interest. We call the mean price simply the price, and the mean price plus accrued interest the cash price.

First we exclude those quotations where the price is lower than 50 or higher than 130, since issues with discounts/premiums of that magnitude usually show thin trading, and the prices are therefore subject to idiosyncratic variations. We also exclude quotations where the yield differs significantly from the yields at nearby maturities: we generate two moving averages, one including the three issues of shorter maturity and one using the three issues of longer maturity, and include an issue only if it is within .2 percentage points of either moving average or lies between the two. This procedure is an adapted (simplified) version of the methodology employed by CRSP to construct the Fama-Bliss files. Also excluded from the analysis are all issues with maturity of one month or less or 15 years or more, since, again, there is thin trading in these issues.
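The moving-average yield filter just described can be sketched as follows. This is our own illustrative reimplementation: the function name is hypothetical, and the handling of issues at the very short and long ends of the maturity grid, where fewer than three neighbors exist, is our assumption rather than something the text specifies.

```python
import numpy as np

def ma_yield_filter(yields, window=3, tol=0.2):
    """Keep issues (sorted by maturity) whose yield is within `tol` percentage
    points of the moving average of the `window` shorter-maturity issues or of
    the `window` longer-maturity issues, or lies between those two averages."""
    y = np.asarray(yields, dtype=float)
    keep = np.zeros(len(y), dtype=bool)
    for i in range(len(y)):
        below = y[max(0, i - window):i]      # neighbors at shorter maturities
        above = y[i + 1:i + 1 + window]      # neighbors at longer maturities
        ma_lo = below.mean() if below.size else y[i]
        ma_hi = above.mean() if above.size else y[i]
        near = abs(y[i] - ma_lo) <= tol or abs(y[i] - ma_hi) <= tol
        between = min(ma_lo, ma_hi) <= y[i] <= max(ma_lo, ma_hi)
        keep[i] = near or between
    return keep
```

An issue quoted far away from both neighboring moving averages, such as a 9% yield surrounded by quotes near 5%, is flagged for exclusion.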
Our filtered bond price data includes the following variables:

[Footnote 6: Source: CRSP, Center for Research in Security Prices, Graduate School of Business, The University of Chicago, 2007. Used with permission. All rights reserved. www.crsp.uchicago.edu]
• Date of quotation, date of maturity, date of first coupon payment, days to maturity.
• Coupon rate, value of first coupon, accrued interest.
• Price, yield to maturity (annualized).

The date of the first coupon payment starts the semiannual cycle of coupon payments. All coupon payments are exactly half the coupon rate times the face value ($100), except possibly the first coupon payment, in case it did not occur exactly half a year after the date the issue was dated by the Treasury.

A.3 Bootstrapping the Zero Curve

Although the CRSP Monthly Treasury Database includes Fama-Bliss yields, these are only available at maturities from one to five years, so we had to construct Fama-Bliss yields ourselves from the available bond price data. In the following we briefly outline our algorithm.7

The underlying pricing assumption is that the daily forward rates are constant between two successive maturities. The forward rate function is therefore a step function with jumps at the maturities of the available issues. For each point in time, the issue with the shortest maturity starts the iteration. If it is a discount bond, the forward rate follows easily from the formula relating the cash price and the forward rate: pcash = 100 exp(−τ1 F1), where τ1 is the maturity of the first issue. If the first issue is a coupon bond, the forward rate is its yield to maturity, which we take from the data source. Now for each successive issue, the forward rate is calculated such that, given the previous forward rates, it exactly prices that issue. Since coupons are paid at half-year intervals, and the difference in maturities of two successive issues is always smaller than half a year, there is only one cash flow that is discounted using that forward rate, and there is a simple closed-form solution. If there is more than one issue at a particular maturity, we calculate a forward rate for each of them and then average these forward rates; this is common practice.
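The two pricing relations just described can be sketched in code. The first function inverts pcash = 100 exp(−τ1 F1) for the shortest-maturity discount bond; the second solves in closed form for the forward rate that exactly prices the next issue, exploiting the fact that only its final cash flow is discounted with the new rate. Function names and the cash-flow interface are our own illustration, not the paper's actual code.

```python
import math

def first_forward_from_discount(p_cash, tau1):
    """Invert p_cash = 100*exp(-tau1*F1) for the shortest-maturity discount bond."""
    return -math.log(p_cash / 100.0) / tau1

def next_forward(p_cash, cashflows, knots, forwards):
    """Forward rate over (knots[-1], t_last] that exactly prices the next issue.

    cashflows: list of (time, amount); only the last falls beyond knots[-1].
    knots/forwards: maturities and piecewise-constant forward rates found so far.
    """
    def discount(t):
        # integrate the step-function forward curve up to t (t <= knots[-1])
        acc, prev = 0.0, 0.0
        for k, f in zip(knots, forwards):
            acc += f * (min(t, k) - prev)
            prev = k
            if t <= k:
                break
        return math.exp(-acc)

    *known, (t_last, cf_last) = cashflows
    pv_known = sum(cf * discount(t) for t, cf in known)
    d_prev = discount(knots[-1])
    return -math.log((p_cash - pv_known) / (cf_last * d_prev)) / (t_last - knots[-1])
```

Each iteration appends the solved rate and the issue's maturity to `forwards` and `knots`, so the step function grows one segment per issue.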
It should be noted that in these cases the bonds are naturally not exactly priced by the averaged forward rate.

After bootstrapping the spot rates for all maturities at each point in time, we pool these rates into fixed maturities using linear interpolation. The fixed maturities are 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108 and 120 months. In some cases there is no issue with a maturity of at least 120 months, so we extrapolate the spot rate from earlier maturities.

[Footnote 7: A lucid explanation of the methodology can be found in Jeffrey et al. (2006).]
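The pooling step can be sketched with numpy.interp, which interpolates linearly within the observed maturity range. Note that np.interp holds the endpoint values flat outside that range; flat extrapolation is our simplifying assumption for the missing 120-month case, not necessarily the paper's exact rule.

```python
import numpy as np

# The paper's fixed maturity grid, in months.
FIXED_MATURITIES = [3, 6, 9, 12, 15, 18, 21, 24, 30, 36,
                    48, 60, 72, 84, 96, 108, 120]

def pool_to_fixed(maturities, spot_rates, grid=FIXED_MATURITIES):
    """Linearly interpolate bootstrapped spot rates onto the fixed grid,
    extrapolating flat beyond the longest observed maturity."""
    return np.interp(grid, maturities, spot_rates)
```

For example, with bootstrapped maturities [3, 60, 100] months, the 120-month grid point simply inherits the 100-month rate.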
A.4 Data for Other Countries: UK, Germany, Japan

Our initial goal was to compare the forecast performance of the Diebold-Li approach across different countries: US, UK, Germany, and Japan. Data on government bonds for all of these countries is available via Thomson Datastream. We downloaded the data from Datastream using Excel, imported it into Stata, and developed algorithms to bring it into a format that is amenable to analysis in Matlab.

The documentation and data quality, in particular on dead government bonds, is much worse in Datastream than in the CRSP Monthly Treasury Database. For example, convertible bonds and callable bonds are not reliably marked as such, and quotations are sometimes ex-dividend (so that accrued interest is negative). We ran the same filters as for the CRSP data and, since type-of-issue indicators (like convertible, tax-free, callable) were unreliable, excluded issues with names including "conversion" and other obscure names (e.g. "paid"). Those issues were usually associated with markedly different pricing.

After the initial data processing, we attempted to bootstrap the spot rate curve from the bond prices. A difficult issue was that the accrued-interest quotation convention is not standardized across countries, so we calculated accrued interest ourselves. Yet in the end we were not able to confirm consistency between YTM and cash prices, an important check before even proceeding to the bootstrap. As expected, the zero curves that we extracted from the bond prices were badly behaved and inconsistent, with large outliers and occasionally negative yields. Given the available time, we were unfortunately not able to extract meaningful and consistent zero curves from the Datastream data.

B The Kalman Filter

The Diebold-Li forecast approach can be cast into a state-space representation, as detailed in Diebold et al. (2006b).
This makes it possible to include other factors in the dynamics of the NS-factors, and it provides correct inference about the estimated parameters of the factor dynamics (as opposed to the two-step estimation of DL, where first the factors and then their dynamics are estimated). During our attempt to tackle the forecasting exercise using a state-space representation throughout, several issues came up and led to a decision in favor of the simple two-step approach.

• The estimation procedure is much more complex, since numerical optimization has to be employed to maximize the likelihood function. The results are sensitive to
initial values and (naturally) to the restrictions imposed. This complexity does not pay off in the forecast exercise.

• Using a rolling or recursive forecast scheme, the computational costs quickly become very large.

• We are not interested in inference about the factor dynamics, but in inference about the forecast performance. Therefore the correct inference about the estimation of the dynamic process for the factors, which the one-step estimation makes possible, is not helpful in our case.

• The state-space approach requires estimating the variances of the measurement and transition equations. Several different restrictions on these covariance matrices are possible (and some are necessary for computational tractability). Again, this adds unnecessary complexity if the task is simply to forecast the factors and yields.

We will investigate the estimation of the factors and their dynamics via state-space models further in the future, since it does provide some advantages if one is willing to pay the price of considerable additional complexity: including macro factors in the dynamic system, or global yield curve factors (Diebold et al., 2006a), might improve forecast performance. Also, heteroskedasticity and missing observations can be dealt with.

C Stata and Matlab Code

In the following we give an overview of the most important code modules we developed during the course of this project. Table 8 lists and describes the do-files that were written to facilitate data processing. Table 9 lists and describes the most important Matlab scripts that were written to carry out the analysis. Various further supporting functions were written in Matlab which, for the sake of brevity, are not included in these tables.
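For reference, the kind of linear-Gaussian Kalman filter that Appendix B refers to can be sketched in a few lines. This is our own generic illustration, not the paper's kalman.m or kalman loglik.m: in the NS state-space form, the state vector would hold the three factors, Z the Nelson-Siegel loadings, and T the factor transition matrix; all variable names here are illustrative.

```python
import numpy as np

def kalman_filter(y, Z, T, H, Q, a0, P0):
    """Standard linear-Gaussian Kalman filter.
    Measurement: y_t = Z a_t + eps_t,     Var(eps) = H
    Transition:  a_t = T a_{t-1} + eta_t, Var(eta) = Q
    Returns the filtered states and the Gaussian log-likelihood."""
    a, P = a0.copy(), P0.copy()
    n = len(y)
    filtered = np.zeros((n, len(a0)))
    loglik = 0.0
    for t in range(n):
        # prediction step
        a = T @ a
        P = T @ P @ T.T + Q
        # update step
        v = y[t] - Z @ a                    # innovation
        F = Z @ P @ Z.T + H                 # innovation variance
        K = P @ Z.T @ np.linalg.inv(F)      # Kalman gain
        a = a + K @ v
        P = P - K @ Z @ P
        loglik += -0.5 * (np.log(np.linalg.det(F))
                          + v @ np.linalg.inv(F) @ v
                          + len(v) * np.log(2 * np.pi))
        filtered[t] = a
    return filtered, loglik
```

Maximizing `loglik` over the free parameters of Z, T, H and Q is the numerically demanding one-step estimation discussed above.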
Maturity      Mean     Std.Dev.   Min.      Max.     ρ̂(1)
3            4.8162    2.0117    0.8148    9.1087   0.9921
6            4.9699    2.0379    0.9443    9.4404   0.9924
9            5.1130    2.0696    0.9781    9.5942   0.9915
12           5.2088    2.0721    1.0393    9.6742   0.9902
15           5.3124    2.0782    1.0657    9.9827   0.9898
18           5.3969    2.0628    1.1436   10.1823   0.9894
21           5.4687    2.0405    1.2187   10.2632   0.9888
24           5.5139    2.0110    1.2990   10.4049   0.9878
30           5.6608    1.9864    1.4443   10.7367   0.9870
36           5.7709    1.9439    1.6173   10.7781   0.9862
48           5.9824    1.9065    1.9962   11.2589   0.9843
60           6.0994    1.8516    2.3482   11.3029   0.9845
72           6.2623    1.8490    2.6606   11.6440   0.9848
84           6.3534    1.7986    2.9993   11.8313   0.9847
96           6.4446    1.7690    3.2172   11.5174   0.9843
108          6.5048    1.7726    3.3858   11.7241   0.9856
120 (Level)  6.4592    1.7325    3.4678   11.6604   0.9839
Slope        1.6430    1.2574   -0.8975    3.9835   0.9698
Curvature   -0.2477    0.7593   -2.1724    1.5963   0.9217

Table 1: Summary statistics, US term structure, Jan. 85 - Dec. 06

Factor   Mean      Std.dev.   Min.      Max.      ρ̂(1)    ADF
β1t      6.9092    1.7126     3.8639   12.1111    0.9837  -2.3358
β2t     -2.1774    1.7239    -5.5527    1.0354    0.9780  -1.8491
β3t     -0.7766    2.0031    -5.9966    4.1809    0.9264  -3.2295

Table 2: Summary statistics and unit-root tests for the NS-factors
Maturity   Mean     Std.Dev.   Min.      Max.     MAE      RMSE     ρ̂(1)
3         -0.0401   0.0997    -0.5010   0.2471   0.0805   0.1073   0.7536
6         -0.0038   0.0485    -0.1228   0.2956   0.0351   0.0485   0.2279
9          0.0289   0.0654    -0.2019   0.2666   0.0522   0.0714   0.5549
12         0.0213   0.0641    -0.1810   0.2221   0.0536   0.0674   0.6895
15         0.0282   0.0505    -0.1596   0.1999   0.0463   0.0578   0.7253
18         0.0225   0.0356    -0.0955   0.1015   0.0349   0.0420   0.6245
21         0.0102   0.0293    -0.0981   0.1060   0.0239   0.0310   0.4652
24        -0.0230   0.0434    -0.2164   0.0717   0.0357   0.0490   0.6036
30        -0.0167   0.0384    -0.2022   0.1516   0.0318   0.0418   0.5726
36        -0.0281   0.0516    -0.1879   0.1646   0.0466   0.0587   0.7341
48        -0.0124   0.0635    -0.1878   0.2032   0.0506   0.0646   0.7120
60        -0.0425   0.0578    -0.1774   0.2095   0.0594   0.0716   0.6338
72         0.0087   0.0693    -0.1276   0.3643   0.0481   0.0697   0.8772
84         0.0134   0.0554    -0.2388   0.2946   0.0387   0.0569   0.6199
96         0.0369   0.0437    -0.1596   0.1627   0.0467   0.0572   0.7707
108        0.0430   0.0427    -0.0772   0.1966   0.0498   0.0605   0.6706
120       -0.0466   0.1003    -0.5511   0.1370   0.0790   0.1104   0.8579

Table 3: Summary statistics, yield curve residuals
Method                      Mean     Std.Dev.  RMSE     ρ̂(1)     ρ̂(12)    DM

τ = 3 months
Diebold-Li, AR(1) factors  -0.0786   0.1933   0.2080    0.2791   -0.0699  -0.0928
Diebold-Li, RW factors     -0.0366   0.1901   0.1929    0.2519   -0.0706  -1.6078
Diebold-Li, RW×2/AR(1)×1   -0.0383   0.1931   0.1962    0.2728   -0.0796  -1.3631
Diebold-Li, RW×1/AR(1)×2   -0.0388   0.1925   0.1956    0.2759   -0.0437  -1.5612
Random walk                -0.0006   0.2097   0.2089    0.3012    0.1168      –
AR(1) for yield levels      0.0052   0.2109   0.2102    0.3144    0.1126   1.0777
AR(1) for yield changes     0.0203   0.2017   0.2019    0.0941    0.1082  -1.4723
Fama-Bliss                  0.0543   0.1871   0.1942    0.2631    0.0791  -1.6645
Principal components       -0.0258   0.1955   0.1965    0.2942    0.0260  -1.9504
CF, equal weights          -0.0154   0.1925   0.1924    0.2413    0.0177  -3.0409
CF, perf. weights          -0.0165   0.1921   0.1921    0.2344    0.0110  -2.9190

τ = 12 months
Diebold-Li, AR(1) factors  -0.0197   0.2532   0.2530    0.4765    0.1480   2.7697
Diebold-Li, RW factors      0.0253   0.2360   0.2365    0.3631    0.1081   2.0668
Diebold-Li, RW×2/AR(1)×1    0.0205   0.2495   0.2494    0.4580    0.1352   3.1367
Diebold-Li, RW×1/AR(1)×2    0.0201   0.2563   0.2561    0.4853    0.1705   3.4298
Random walk                -0.0011   0.2252   0.2243    0.2644    0.0123      –
AR(1) for yield levels     -0.0022   0.2274   0.2266    0.2793    0.0180   1.4805
AR(1) for yield changes     0.0240   0.2181   0.2186    0.0187    0.0082  -0.8586
Slope regression            0.0328   0.2232   0.2248    0.2147    0.0103   0.1127
Fama-Bliss                  0.0643   0.2204   0.2287    0.1783    0.0213   0.4759
Principal components       -0.0106   0.2345   0.2338    0.3322    0.0920   1.6177
CF, equal weights           0.0153   0.2293   0.2290    0.2994    0.0662   1.2815
CF, perf. weights           0.0179   0.2279   0.2277    0.2807    0.0532   1.0222

τ = 60 months
Diebold-Li, AR(1) factors  -0.0891   0.2818   0.2945    0.0999   -0.0219   0.7527
Diebold-Li, RW factors     -0.0441   0.2809   0.2833    0.0527   -0.0272  -0.7370
Diebold-Li, RW×2/AR(1)×1   -0.0491   0.2823   0.2855    0.0936   -0.0173  -0.2260
Diebold-Li, RW×1/AR(1)×2   -0.0493   0.2833   0.2865    0.0978   -0.0094  -0.0743
Random walk                -0.0052   0.2881   0.2870    0.0704   -0.0022      –
AR(1) for yield levels     -0.0393   0.2879   0.2895    0.0802    0.0022   0.6287
AR(1) for yield changes     0.0261   0.2890   0.2891   -0.0559    0.0102   0.4312
Slope regression            0.0154   0.2892   0.2886    0.0818   -0.0100   0.5558
Fama-Bliss                  0.0380   0.2894   0.2908    0.0639    0.0068   0.8896
Principal components       -0.0224   0.2813   0.2811    0.0798   -0.0351  -1.2700
CF, equal weights          -0.0219   0.2833   0.2831    0.0566   -0.0113  -1.3693
CF, perf. weights          -0.0218   0.2842   0.2840    0.0552   -0.0113  -1.1154

Table 4: Performance of one-month-ahead forecasts
Method                      Mean     Std.Dev.  RMSE     ρ̂(6)     ρ̂(18)    DM

τ = 3 months
Diebold-Li, AR(1) factors  -0.3470   0.8396   0.9054    0.6172   -0.0863   1.4891
Diebold-Li, RW factors     -0.0385   0.7709   0.7688    0.5104   -0.1408  -2.1124
Diebold-Li, RW×2/AR(1)×1   -0.0505   0.7938   0.7923    0.5105   -0.1498  -0.5913
Diebold-Li, RW×1/AR(1)×2   -0.0432   0.8575   0.8552    0.6409   -0.0021   1.0289
Random walk                -0.0053   0.8048   0.8016    0.5431   -0.0781      –
AR(1) for yield levels     -0.0677   0.8281   0.8276    0.6029   -0.0511   0.7917
AR(1) for yield changes     0.0615   0.7100   0.7099    0.2725   -0.1422  -1.4233
Fama-Bliss                  0.2863   0.7323   0.7835    0.5344   -0.0364  -0.2332
Principal components       -0.1674   0.7875   0.8020    0.6294   -0.0229   0.0073
CF, equal weights          -0.0413   0.7679   0.7660    0.5445   -0.0845  -1.8778
CF, perf. weights          -0.0132   0.7595   0.7566    0.5335   -0.0817  -2.5483

τ = 12 months
Diebold-Li, AR(1) factors  -0.3255   0.9102   0.9632    0.6702   -0.0198   1.7219
Diebold-Li, RW factors      0.0064   0.8173   0.8141    0.5326   -0.0690   1.0007
Diebold-Li, RW×2/AR(1)×1   -0.0273   0.8671   0.8641    0.6014   -0.0628   1.7920
Diebold-Li, RW×1/AR(1)×2   -0.0216   0.9430   0.9396    0.6737    0.0544   1.8478
Random walk                -0.0188   0.8057   0.8027    0.5091   -0.0999      –
AR(1) for yield levels     -0.1248   0.8354   0.8414    0.5914   -0.0598   0.9063
AR(1) for yield changes     0.0668   0.7452   0.7453    0.2987   -0.1690  -1.2765
Slope regression            0.1869   0.8039   0.8223    0.4545   -0.0926   0.4804
Fama-Bliss                  0.1198   0.8480   0.8532    0.5158   -0.0599   2.0753
Principal components       -0.1624   0.8702   0.8818    0.6218   -0.0095   1.3180
CF, equal weights          -0.0301   0.8201   0.8174    0.5480   -0.0637   0.6904
CF, perf. weights          -0.0138   0.8103   0.8072    0.5428   -0.0681   0.2564

τ = 60 months
Diebold-Li, AR(1) factors  -0.4364   0.6890   0.8133    0.2037   -0.1692   1.4308
Diebold-Li, RW factors     -0.0992   0.6989   0.7032   -0.0499   -0.2004  -1.3044
Diebold-Li, RW×2/AR(1)×1   -0.1348   0.6952   0.7055    0.1337   -0.1445  -0.3385
Diebold-Li, RW×1/AR(1)×2   -0.1326   0.7285   0.7376    0.1884   -0.0875   0.5077
Random walk                -0.0596   0.7159   0.7155   -0.0280   -0.1926      –
AR(1) for yield levels     -0.3393   0.6932   0.7693    0.1139   -0.1668   0.9695
AR(1) for yield changes     0.0940   0.7322   0.7353   -0.0305   -0.2000   0.7273
Slope regression            0.0582   0.7372   0.7366    0.0723   -0.2003   0.6282
Fama-Bliss                  0.0694   0.7472   0.7475    0.0163   -0.2088   1.1200
Principal components       -0.2530   0.6740   0.7174    0.0857   -0.1946   0.0490
CF, equal weights          -0.1233   0.6960   0.7042    0.0416   -0.1899  -0.7840
CF, perf. weights          -0.1178   0.6887   0.6960    0.0324   -0.1899  -1.3672

Table 5: Performance of six-month-ahead forecasts
Method                      Mean     Std.Dev.  RMSE     ρ̂(12)    ρ̂(24)    DM

τ = 3 months
Diebold-Li, AR(1) factors  -0.7503   1.4445   1.6224    0.3464   -0.2302   0.8095
Diebold-Li, RW factors     -0.0583   1.4221   1.4174    0.1912   -0.1360  -1.0574
Diebold-Li, RW×2/AR(1)×1   -0.0797   1.4330   1.4293    0.1859   -0.1438  -0.4434
Diebold-Li, RW×1/AR(1)×2   -0.0583   1.5366   1.5313    0.4574   -0.1920   0.4564
Random walk                -0.0293   1.4448   1.4391    0.2399   -0.1514      –
AR(1) for yield levels     -0.3799   1.4685   1.5109    0.3836   -0.2243   0.4870
AR(1) for yield changes     0.1006   1.4788   1.4761    0.1517   -0.1329   0.7043
Fama-Bliss                  0.4055   1.4434   1.4936    0.3098   -0.1931   0.4384
Principal components       -0.4165   1.4463   1.4993    0.4119   -0.2008   0.3468
CF, equal weights          -0.1407   1.4053   1.4065    0.2885   -0.1916  -0.5146
CF, perf. weights          -0.0820   1.4096   1.4061    0.3194   -0.1963  -0.6105

τ = 12 months
Diebold-Li, AR(1) factors  -0.7708   1.4583   1.6441    0.3883   -0.2501   0.8627
Diebold-Li, RW factors     -0.0353   1.4302   1.4247    0.2670   -0.1990   0.3973
Diebold-Li, RW×2/AR(1)×1   -0.0954   1.4466   1.4437    0.2954   -0.2225   0.3992
Diebold-Li, RW×1/AR(1)×2   -0.0788   1.5668   1.5623    0.4625   -0.1982   0.6371
Random walk                -0.0586   1.4231   1.4185    0.2435   -0.1893      –
AR(1) for yield levels     -0.4729   1.4701   1.5385    0.3942   -0.2511   0.6570
AR(1) for yield changes     0.0592   1.4834   1.4784    0.2495   -0.2035   1.7483
Slope regression            0.2950   1.4696   1.4929    0.2410   -0.2016   0.8343
Fama-Bliss                  0.2170   1.5061   1.5155    0.2721   -0.1909   1.3848
Principal components       -0.4602   1.5012   1.5642    0.4216   -0.2302   0.7422
CF, equal weights          -0.1401   1.4285   1.4295    0.3148   -0.2247   0.1490
CF, perf. weights          -0.0852   1.4292   1.4258    0.3327   -0.2423   0.1164

τ = 60 months
Diebold-Li, AR(1) factors  -0.9261   0.9227   1.3046    0.0173   -0.1730   1.6833
Diebold-Li, RW factors     -0.1768   0.9712   0.9832   -0.1768   -0.0747  -1.7332
Diebold-Li, RW×2/AR(1)×1   -0.2403   0.9573   0.9832   -0.0205   -0.1452  -0.4089
Diebold-Li, RW×1/AR(1)×2   -0.2341   1.0528   1.0743    0.0454   -0.0814   0.6023
Random walk                -0.1376   0.9994   1.0048   -0.1677   -0.0695      –
AR(1) for yield levels     -0.7332   0.9467   1.1944    0.0360   -0.1351   1.1771
AR(1) for yield changes     0.0642   1.0209   1.0187    0.1082   -0.2409   0.1926
Slope regression            0.0953   1.0679   1.0677   -0.0121   -0.1290   0.7539
Fama-Bliss                  0.1253   1.0999   1.1025   -0.0893   -0.0710   1.2509
Principal components       -0.6546   0.9330   1.1366    0.0454   -0.1725   0.9332
CF, equal weights          -0.2818   0.9629   0.9995   -0.0549   -0.1321  -0.1122
CF, perf. weights          -0.2359   0.9522   0.9772   -0.0713   -0.1516  -0.8154

Table 6: Performance of twelve-month-ahead forecasts
Maturity     h = 1                h = 6                h = 12
             DL-RW     DL-RWS     DL-RW     DL-RWS     DL-RW     DL-RWS
τ = 3       -1.6078   -2.6504    -2.1124   -2.2599    -1.0574   -1.1257
6            2.0583    1.4747     1.9878    1.8932     1.2586    1.2003
9            1.0717    0.1421     0.3481    0.2165     0.1665   -0.2141
12           2.0668    1.3091     1.0007    0.8895     0.3973    0.3512
24           2.2482    2.0024     1.0362    0.9909     1.1089    1.0887
36          -1.0325   -1.5159    -1.2929   -1.3990    -0.8712   -0.9132
60          -0.7370   -1.3395    -1.3044   -1.4262    -1.7332   -1.7956
120          1.8357    0.6831     1.3764    1.1322     1.9675    1.8284

Table 7: Comparison of DL-RW to its shrinkage version DL-RWS

Module / Functionality:

us-bonds.do: Preparation of CRSP data (filtering, reformatting, consistency checks, export to csv format)
prepare datastream issues.do: Preparation of Datastream data on issues (analysis, filtering, reformatting, export to csv format)
prepare datastream data.do: Preparation of Datastream data on prices, YTM, and accrued interest (reshaping, reformatting, export to csv format)
prepare complete.do: Preparation of consolidated Datastream data (merging, reshaping, reformatting, filtering, consistency checks, export to csv format)
create reformat date.do: Function to convert date strings into a Stata-readable format

Table 8: Stata do-files
Module / Functionality:

data prep/us data prep.m: Construction of the zero curve and forward rates for the US, using CRSP data
data prep/uk data prep.m: Construction of the zero curve and forward rates for the UK, using Datastream data (does not produce consistent results)
data summary.m: Summary statistics for the zero curve
dl create factors etc.m: Creation of NS-factors, comparison of NS-factors and empirical factors, fitted and empirical yield curves, graphing of spot rate and residuals
dl factor dynamics.m: Analysis of dynamic properties of the factors, ACF/PACFs, ARMA modelling, residual autocorrelations, tests for white noise
dl forecast.m: Out-of-sample forecasting and comparison of predictive accuracy
kalman.m: Estimation of state-space model
kalman loglik.m: Implementation of the Kalman filter

Table 9: Matlab modules
[Figure 1: Yield curves, Jan. 85 - Dec. 06. Three-dimensional surface plot; axes: Time, Maturity (months), Yield (percent).]

[Figure 2: NS-factors vs. empirical factors. Three panels (Level, Slope, Curvature) comparing the (scaled) model-based factors with their empirical counterparts over 1987-2004.]

[Figure 3: Autocorrelations and partial autocorrelations of NS-factors. ACF/PACF panels for β1, β2 and β3, lags 0-40.]

[Figure 4: Autocorrelations of the residuals of AR(1) models for the NS-factors (β1, β2, β3), lags 0-40.]

[Figure 5: Average fitted vs. average empirical yield curve; axes: Maturity (months), Yield (percent).]

[Figure 6: Fitted vs. empirical yield curves on selected dates: 1989-03-31, 1989-07-31, 1997-05-30, 1998-08-31.]

[Figure 7: Yield curve residuals (pricing errors). Three-dimensional surface plot; axes: Time, Maturity (months), Yield (percent).]