THE STATE OF ECONOMETRICS AFTER JOHN W. PRATT, ROBERT
SCHLAIFER, BRIAN SKYRMS, AND ROBERT L. BASMANN
By P. A. V. B. SWAMY
Federal Reserve Board (Retired), Washington, DC 20551, USA; swamyparavastu@hotmail.com
PETER VON ZUR MUEHLEN
Federal Reserve Board (Retired), Washington, DC 20551, USA; pmuehlen@verizon.net
J. S. MEHTA
Department of Mathematics (Retired), Temple University, Philadelphia, PA 19122, USA;
mehta1007@comcast.net
and
I-LOK CHANG
Department of Mathematics (Retired), American University, Washington, DC 20016, USA;
ilchang@verizon.net
Correspondence: swamyparavastu@hotmail.com; Tel.: 571-435-4979
SUMMARY. Thirty-five years ago, introducing a distinction between factors and
concomitants in regressions, John W. Pratt and Robert Schlaifer determined that the error
term in a regression represents the net effect of omitted relevant regressors. As this paper
demonstrates, this assumption poses a problem whenever the purpose of a model is to explain
an economic phenomenon, because the estimated coefficients as well as the error will be
wrong in the sense that they are not unique. But a model that is not unique cannot be a causal
description of unique events in the real world. For a remedy, this paper presents a
methodology based on conditions under which the error term and the coefficients on
regressors included in a model do become unique, where the latter represent the sums of
direct and indirect effects on the dependent variable, with omitted but relevant regressors
having been chosen to define both these effects. The two effects corresponding to any
particular omitted relevant regressor can be learned only by converting that regressor into an
included regressor. For those cases where omitted relevant regressors are not identified,
thereby preventing a meaningful distinction between direct and indirect effects, we introduce
so-called coefficient drivers and a feasible method of generalized least squares, permitting a
“total-effect” causal interpretation of the coefficient on each regressor included in a model.
Key words: unique time-varying coefficient and unique error term; direct effect; indirect
effect; and total effect of a regressor; omitted relevant regressor; coefficient driver;
measurement-error bias
JEL Classifications: C16; C21; C22
1. INTRODUCTION
This paper seeks to address and then remedy a problem that, in our view, has beset
econometrics since its inception: the lack of uniqueness of results inherent in any of the
regression methodologies currently in use. As we will show, this problem is compounded by
an apparent disregard of omitted-regressor biases that afflict current practice in
econometrics. This paper is not the first to call attention to this issue, the first alarms having
been raised as long as 35 years ago by Pratt and Schlaifer’s (1984, 1988) papers calling
attention to a fundamental mistake in econometrics: the common assumption that the
included regressors in a model are uncorrelated, mean independent, or independent of the
model error term. Since the error term of a model is made up of omitted relevant regressors
which are unknown, and since, according to the authors, the included regressors in the model
cannot be uncorrelated with every omitted relevant regressor that affects the dependent
variable in the regression, the pervasive assumption that the included regressors are not
correlated with the effect of unidentified omitted relevant regressors is meaningless. An even
stronger assumption that the included regressors are not correlated with any omitted relevant
regressor is patently false. Compelling and important as their contribution has been, it
appears that up to now, it has been widely ignored. One hurdle, aside from a possible refusal
to accept Pratt and Schlaifer’s very arguments, is the perceived difficulty of any remedy that
would accommodate their critique in meaningful ways, including the techniques presented
here and in earlier writings by some of the co-authors of this paper. To date, only Swamy,
Peter von zur Muehlen, I-Lok Chang, Jathinder Mehta, George Tavlas, and various
coauthors have actually estimated unique models using their proposed methodology, one
that has taken time and effort to refine. A critical element in Swamy and I-Lok Chang’s
estimation method is the use of so-called “coefficient drivers,” observable variables that
enter a regression model but not in ways with which econometricians are familiar. The
necessary search for coefficient drivers, which relies on practical experience and theoretical
insight, and the subtleties of their application, may be a third obstacle to wider adoption of
methodologies that assure accurate estimation of unique coefficients and error terms.
Econometrics serves two functions, one being practical and the other metaphysical.
Its most practical function is to predict economic events where “why” matters less than
“how.” If a model predicts well over time, it is considered successful, regardless of how it
was constructed. By contrast, the metaphysical purpose of econometrics is to explain
empirical relationships and to provide causal interpretations. It is this latter -- conceptually
far more complex -- endeavor that this paper seeks to treat. For purposes of prediction,
Swamy and Tinsley (1980, pp. 111 and 112) developed a feasible minimum mean square
error linear predictor of needed future values of the dependent variable in a general
predictive model under the novel (for that time) assumption that the coefficients of an
econometric model follow a vector autoregressive and moving average (ARMA) process.
Regarding the metaphysical purpose of providing causal interpretations, an early
contribution by Swamy and von zur Muehlen (1988) addressed a number of fundamental
issues that arise when the task is to determine causality in probabilistic models, with special
attention given to inductive procedures used in the literature that had seemed to be in
violation of Jeffreys’ rules 1 and 2 involving the requirement of logical consistency.1
In the present context, the issue of causal interpretation arises when, as is often the
case, the error term in a regression is thought to represent the effects of relevant regressors
left out of the equation. As we shall demonstrate, in such instances, the coefficients and error
terms themselves are not unique and therefore lack the interpretation of causal effects,
because when coefficients cannot be given unique interpretations, they cannot be reflections
of the real world which, by definition, is unique. In this paper, we shall show that proper
causal interpretations are possible if we replace commonly held restrictive assumptions
underlying models of the kind considered here with meaningful alternatives. A precedent for
this is Freedman (2005), who, drawing on Pratt and Schlaifer’s (1984, 1988) insight that if
the error represents the net effect of omitted relevant regressors, then it must be correlated
with the included regressors, also concluded that standard assumptions fail. For the special
case of a model in which subjects are independently and identically distributed (iid) and all
variables are jointly normal with expectation zero, he showed that necessary conditions to
prevent estimating the “wrong parameters” were that the combined effect of the omitted
regressors (i) is independent of each variable included in the equation, (ii) is independent
across subjects, and (iii) has expectation 0. But he also indicated that these assumptions are
unrealistic and therefore “hard to swallow.” The purpose of this paper then is to develop
conditions that are realistic and palatable and to embed them in a methodology under which
1
The importance of the distinctions raised by Swamy and von zur Muehlen (1988) was highlighted in Aigner
and Zellner (1988, p. 3) and Zellner (1988, pp. 7 and 8).
valid causal interpretations of a model do become possible and even natural. Like Freedman,
we take our cue from Pratt and Schlaifer (1984, 1988) (hereafter PS), who defined a class of
relations called “laws” and provided conditions under which such laws can be observed in
data. PS did not dwell on exact epistemologies of “causation,” so we shall follow Basmann
(1988) who defined causality as designating a property of the real world, meaning that all
relations considered to be causal orderings must be unique descriptions of the real world. The
ideas of PS and Basmann lead us to consider as causal models only those that are free of
mis-specifications in the sense that their coefficients and error terms are unique.2
Any discussion of causation in the present context would not be complete without
considering the distinction between deterministic and statistical causation, so important for
any treatment of causation in econometrics. Swamy, Conway, and von zur Muehlen (1985)
discussed such a distinction in the context of a treatise on the possibilities and limitations of
logical and causal inference in an Aristotelian framework when some statements are
probabilistic. Subsequently, Swamy and von zur Muehlen (1988, pp. 141-144) extended the
concept of logical entailment -- the possibility of inferring some Q from P via modus ponens
-- to probabilistic entailment -- the possibility of valid probabilistic inference of some
statement S with given probability from some other statement S also having some
probability, without violating basic Aristotelian principles of logic. Finally, in a paper that
has specific applicability to the present discussion, Skyrms (1988, p. 59) wrote that statistical
2
Here, “mis-specifications-free models” describe “real-world relations,” and mis-specifications-free models
with unique coefficients and error terms describe unique real-world relations. Since causality designates a
property of the real-world, causal relations being unique in the real-world, causality is used here to designate
a property of mis-specifications-free models with unique coefficients and error terms.
causation is positive statistical relevance which does not disappear when we control for all
relevant pre-existing conditions, so that true correlation (or statistical causation) is not
reduced to zero when we control for all relevant pre-existing conditions, a point that we shall
take quite seriously.
The remainder of this paper is divided into three sections. Section 2 considers the
question of why models with non-unique coefficients and error terms cannot be causal. The
section also shows why a random coefficient model is based on an assumption that cannot
be satisfied and why so-called state-space models have non-unique coefficients and error
terms. Section 3 contains the body of this paper, giving a step-by-step derivation of a non-
linear model with unique coefficients and error term. The concepts involved in this
derivation are: time-variability of coefficients and “sufficient sets” of omitted relevant
regressors, i.e., of omitted regressors that affect the dependent variable of the model.
Accordingly, we show that the unknown true functional form of a unique causal model
requires all of its coefficients to be time-variant. The two most important take-aways from
this paper are: (1) when omitted relevant regressors are not identified, there exists no
meaningful distinction between direct and indirect effects on the dependent variable, which,
along with measurement-error biases, arise as components of the unique coefficients on the
included regressors; and (2) we derive a feasible methodology based on the utilization of
coefficient drivers that enables a representation of unique regression coefficients as the sums
of direct and indirect effects (or total effects), thereby permitting valid causal interpretations
involving real-world phenomena.
To present these main results, Section 3 is divided into six subsections. Section 3.1
introduces the concept of coefficient drivers and the admissibility condition they should
satisfy. Section 3.2 shows that these drivers are useful for estimating the model when omitted
relevant regressors are not identified. Section 3.3 gives a matrix formulation of the model
when its time-varying coefficients are represented as linear stochastic functions of certain
observable time-varying coefficient drivers. Section 3.4 states the assumptions under which
the model in Section 3.3 is estimated. Section 3.5 derives an optimal predictor of the error
term and the estimators of the coefficients and their components. Section 3.6 derives the
second-order properties of the feasible generalized least squares (GLS) estimators of the
coefficients of the model presented in Section 3.3, utilizing a methodology introduced by
Cavanagh and Rothenberg (1995) with a modification. Section 4 concludes.
2. REQUIREMENTS OF A MODEL TO BE CAUSAL
Consider the linear model
$Y = X\beta + U$ (1)
Here it is assumed that $y$ is a $T \times 1$ vector of values taken by the dependent variable $Y_t$, t =
1, …, T, which are random and written as a column vector $Y$, as in equation (1), $X$ is a $T \times K$
nonrandom design matrix of observations on K regressors which are called “the included
regressors,” the rank of $X$ is K, $u$ is a $T \times 1$ vector of unobserved values taken by the
random error terms $U_t$, t = 1, …, T, which are random and written as a column vector $U$, as
in (1), and T is the number of periods. It is further assumed that $E(U_t \mid X) = 0$, $\forall t$; and
$E(U_t U_s \mid X)$ is an element of a nonsingular matrix that is a smooth function of an
unknown finite-dimensional parameter vector; the sample size T is large relative to K and
the number of rows of that parameter vector.3
In this section, we do not assume that $y$
and $X$ contain measurement errors. We relax this assumption in Section 3.
2.1 Three interpretations of the error term in a model
In order to delve more deeply into the nature of causal modeling, it is useful to discuss
three interpretations of u that have been either explicit or implicit in the econometrics and
statistics literature.
Interpretation (I). The value (u) of the disturbance (U) in (1) arises because it is
extremely unlikely that in a model such as (1), all influences on y have been captured, no
matter how thorough and careful its specification. If K is less than the total number of all
these influences, then u represents all omitted regressors that affect y or simply all omitted
relevant regressors. To provide intuition for their principal argument regarding the nature of
u, Pratt and Schlaifer (1984, 1988) provided two examples, which we now repeat.
Example 1. To show that a law can be recognized if it is known which excluded variable its
error term depends on, Pratt and Schlaifer (1984, pp. 11-12) ask us to imagine a cylinder
sealed at the bottom and closed at the top by a movable piston. Now, let P, V, and T denote
the logarithms of the pressure, volume, and absolute temperature of the air. The deterministic
law P = C - V + T has an alternative form P = C - 1.4V + S because at ordinary temperatures,
the entropy of the air in the cylinder, denoted by S, is a nearly linear function of T and V,
i.e., S = T + 0.4V. If it is known that T is held constant, then the relation P = C - V + T between
P and V is called “Boyle’s law.” If it is known that S is held constant (by allowing no heat
3
Throughout this paper, following Lehmann and Casella (1998, pp. 180-181), we maintain the distinction
between each random variable and its value by using an upper-case symbol to denote the former and a lower-
case symbol to denote the latter.
flow into or out of the cylinder), then the resulting relation, P = C - 1.4V + S, between P and
V is called “the adiabatic law.” Alternatively, if some non-experimental data on P and V are
available but none on T or S, then the error term u in the presumed linear stochastic law
$P = \beta_0 + V\beta_1 + u$ is only known as being made up of one of the two excluded variables T and
S, although it is not known which one. Since both T and S are excluded from the proposed
linear stochastic law, $P = \beta_0 + V\beta_1 + u$, and since by the previous relation S = T + 0.4V, at
least one of either T or S must be correlated with V, an assumption that V is not correlated
with the effect of an unidentified excluded variable is meaningless, and the stronger
assumption that V is not correlated with any excluded variable is surely false.
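The point of Example 1 can be checked numerically. The following Python fragment is only an illustrative sketch and not part of PS's argument: the joint behavior of V and T, the slope `t_on_v`, the constant `C`, and the sample size are all hypothetical choices. It uses the relations P = C - V + T and S = T + 0.4V from the text and shows that cov(S, V) = cov(T, V) + 0.4 var(V), so that T and S cannot both be uncorrelated with V unless V does not vary.

```python
import random


def cov(a, b):
    """Sample covariance with divisor n - 1."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(t_on_v, n=5000, seed=0):
    """Draw V and T, build S and P from the text's relations, return moments.

    t_on_v is a hypothetical slope of T on V, used only for illustration.
    """
    rng = random.Random(seed)
    C = 1.0  # arbitrary constant
    V = [rng.gauss(0, 1) for _ in range(n)]
    T = [t_on_v * v + rng.gauss(0, 1) for v in V]
    S = [t + 0.4 * v for t, v in zip(T, V)]   # S = T + 0.4V
    P = [C - v + t for v, t in zip(V, T)]     # P = C - V + T
    var_v = cov(V, V)
    return {
        "cov_TV": cov(T, V),
        "cov_SV": cov(S, V),
        "var_V": var_v,
        "slope_P_on_V": cov(P, V) / var_v,    # least-squares slope of P on V
    }


# T unrelated to V: the fitted slope is near the Boyle value of -1.
boyle = simulate(t_on_v=0.0)
# S unrelated to V (T = -0.4V + noise): the fitted slope is near -1.4.
adiabatic = simulate(t_on_v=-0.4)
print(boyle)
print(adiabatic)
```

In both runs the fitted slope equals -1 + cov(T, V)/var(V), so what a regression of P on V alone recovers depends entirely on which excluded variable happens to be unrelated to V in the data.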
Example 2. To show that excluded variables are not unique, Pratt and Schlaifer (1988, pp.
49-50), ask us to imagine a land, called Utopia, in which stocks are traded on only one day
of each year, such that the price of each firm’s stock is wholly determined by the firm’s
known earnings and dividends. Then the price is also wholly determined by earnings and
retained earnings. PS conclude that “if one wants to estimate a law relating price to earnings
only, then one cannot meaningfully talk about ‘the’ excluded variable (singular) because it
could equally well be either dividends or retained earnings or some other function of
earnings and dividends.” So, its error term depends on an omitted relevant regressor, which
is “either dividends or retained earnings or some other function of earnings and dividends.”
Thus, excluded variables are not unique! Furthermore, if the variable called “earnings” is
independent of dividends, then it is dependent on retained earnings and vice versa, meaning
that earnings cannot be independent of ‘the’ excluded variables (plural). If it is possible to
make a forecast of dividends, then price is functionally determined by earnings, that forecast,
and the residual of the regression of dividends on the forecast. If we regress stock price on
earnings only, then all that we can hope to learn is the ‘total’ effect of earnings, consisting
in part of the ‘direct’ effect of earnings given (say) dividends or retained earnings, in part of
the ‘indirect’ effect that may result from the fact that earnings may affect dividends or
retained earnings, and dividends or retained earnings that may in turn affect price. Crucially,
although the direct and indirect effects of earnings depend on the omitted regressor chosen
to define them, the total effect does not!
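A small numerical sketch may help fix ideas for Example 2. Everything in it is an assumption made for illustration: the linear pricing rule with weights `a` and `b`, the payout ratio, and the distributions are hypothetical and are not taken from PS. The fragment writes the same pricing law once in terms of dividends and once in terms of retained earnings, and compares the direct, indirect, and total effects of earnings under the two choices.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a_ - mx) * (b_ - my) for a_, b_ in zip(x, y))
    sxx = sum((a_ - mx) ** 2 for a_ in x)
    return sxy / sxx


rng = random.Random(1)
n = 5000
a, b = 2.0, 3.0   # hypothetical law: price = a*earnings + b*dividends
payout = 0.6      # hypothetical average payout ratio

earnings = [rng.gauss(10, 2) for _ in range(n)]
dividends = [payout * e + rng.gauss(0, 0.5) for e in earnings]
retained = [e - d for e, d in zip(earnings, dividends)]
price = [a * e + b * d for e, d in zip(earnings, dividends)]

# The same law with retained earnings as the second variable:
# price = (a + b)*earnings - b*retained
price_alt = [(a + b) * e - b * r for e, r in zip(earnings, retained)]

# Direct effects differ with the choice of the omitted regressor ...
direct_given_dividends = a
direct_given_retained = a + b
# ... and so do the indirect effects ...
indirect_via_dividends = b * ols_slope(earnings, dividends)
indirect_via_retained = -b * ols_slope(earnings, retained)
# ... but the total effect from regressing price on earnings alone does not.
total = ols_slope(earnings, price)
print(total,
      direct_given_dividends + indirect_via_dividends,
      direct_given_retained + indirect_via_retained)
```

The three printed numbers agree up to rounding error, while the direct and indirect components taken separately depend on whether dividends or retained earnings is treated as the omitted regressor.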
Interpretation (II). Heckman and Schmierer (2010, p. 1356) and others working on
nonparametric regressions write equation (1) as Y = E(Y|X) + U and assume that E(U|X) = 0.
In words, the disturbance is the deviation of Y from the conditional mean E(Y|X) given X
(see Greene 2012, p. 212). Heckman used this formulation to counter those critics of
econometrics who had argued that econometricians work with models that have mysterious
error terms. However, the conditional mean E(Y|X) does not always exist.4 According to
Heckman and Schmierer, U represents omitted relevant regressors subject to the condition
that E(U|X) = 0, but they impose this condition without the qualification that it applies only
when E(Y|X) exists. Note that interpretation (II) is more restrictive than interpretation (I).
Interpretation (III). Consider the following model consisting of a sampling model
$\hat{y} = y + e_1 + e_2$ and a linking model $Y = X\beta + U$ with $U = ZV$, where $\hat{y}$, $y$, $Y$, and $U$
are $T \times 1$ vectors, $\hat{y}$ is the vector of survey estimates of the unknown elements of $y$, which
is the vector of values taken by the random vector $Y$, $e_1$ is a $T \times 1$ vector of unknown
sampling errors, $e_2$ is a $T \times 1$ vector of unknown non-sampling errors, $Z$ is a known $T \times h$
matrix, the random vector $V$ and its value $v$ are $h \times 1$ vectors, and the observation subscript
t indexes areas. Here, the model error is $U = ZV$, and its value is denoted by $Zv$. As special
instances of this case, the small area models in Rao, J.N.K. (2003, p. 96) are based on two
approximations: (i) $\hat{y} = X\beta + Zv + e_1$, under the assumption that the survey weights in $\hat{y}$
are adjusted so that $e_2 = 0$, (ii) the model error is $Zv$ that does not represent any omitted
relevant regressors. By contrast, Swamy, Mehta, Tavlas and Hall (2014, pp. 207-211)
assume that $e_2 \neq 0$, so that $\hat{y} = y + e_1 + e_2$. They estimate this model based on a method
of simultaneously estimating the common estimand $y$ of two sample estimators and the
sums of their sampling and non-sampling errors.
4
Sufficient conditions for its existence are given in Rao, C.R. (1973, p. 97).
If a model such as equation (1) is meant to identify a true and causal relationship,
then the two approximations just described present a serious problem. In many instances, a
researcher may not care about causal interpretations and be principally focused on prediction
and correlation, so our concern with approximations may not matter. However, we suspect
that frequently, the temptation to interpret the coefficients estimated in such models as
“effects” rather than mere “correlations” is difficult to resist even by the analyst. In such
instances, the above criticism of two approximations cannot be ignored.
2.2 Why model (1) is not causal
We now adopt interpretation (I) of $u$ and summarize why and how model (1) is non-
causal. In order to do so, we need (i) to define what we mean by causal and (ii) why model
(1) does not satisfy Skyrms’ (1988, p. 59) conditions for statistical causation cited in the
Introduction and adopted here. We resort to an idea, first offered by PS (1984, p. 11), that
without interpreting the error vector, $u$, often called “disturbance,” it is not possible to show
whether an estimator of $\beta$ is consistent.5
A natural and conventional interpretation of $u$ is
that it is made up of omitted relevant regressors. But when relevant pre-existing conditions
cited by Skyrms (1988, p. 59) are unknown, as they usually are, their control is possible only
if the omitted relevant regressors are augmented by all relevant pre-existing conditions in
ways that we demonstrate in Section 3. To make the case, consider the model in (1) and let
$u = W\omega$, where $W$ is an unknown $T \times L$ matrix containing all omitted relevant regressors
and all relevant pre-existing conditions for all T periods, and $\omega$ is an unknown $L \times 1$ vector
of coefficients. The columns of $X$ in conjunction with those columns of $W$ that contain all
omitted relevant regressors, are at least sufficient to determine the value of $y$. The
remaining columns of $W$ containing all relevant pre-existing conditions can be controlled to
reduce any spurious correlations implied by (1) to zero, as we show below. To avoid leaving
out any omitted relevant regressors or relevant pre-existing conditions in $W$, we assume that
L is unknown.
For a single observation, say the tth, write model (1) as
ty tx tu tw= + (= ) (2)
ty y tx 0tx 1tx 1,K tx −where t indexes time or observations, is the tth element of , = ( , , …, )
0tx  X tu u tw Wwith = 1, t, is the tth row of , is the tth element of , is the tth row of ,
 and the coefficient vectors and are not known.
5
This is a conclusion also reached by Freedman (2005).
To establish that model (2) cannot be causal, we note the following:
(i) Under Swamy, Mehta and Chang’s (2017) definition of uniqueness applied to the
coefficients and error term in any model, the coefficients and error term of equation (2)
cannot be unique. Therefore, it is incorrect to refer to “the” omitted relevant regressors or to
“the” omitted pre-existing conditions of (2). Fortunately, we are able to show in Section 3
that with certain changes, the coefficients and error term in (2) can be made to be unique.
(ii) The included regressors $x_t$ cannot be uncorrelated with every omitted relevant
regressor (in $w_t$), as PS (1984, pp. 13-14) have shown. We will also show in Section 3 that
the unique error term is made up of certain “sufficient sets” of omitted relevant regressors
or omitted pre-existing conditions. The included regressors can be independent of such
sufficient sets.
(iii) The vectors $w_t$, $\beta$, and $\omega$ are not unique, as PS (1984, pp. 13-14) have also
shown.
Result (ii) implies that at least some of the elements of $w_t$ must be correlated with $x_t$.
Therefore, an assumption that the included regressors $x_t$ are not correlated with unidentified
omitted relevant regressors themselves is meaningless (see PS 1988, p. 34). This result
compels us not to make such a meaningless assumption below. Assertions (i)-(iii) lay the
foundation for why $x_t$ (which is not randomized) cannot be exogenous and is therefore
correlated with $u_t$, implying that Pratt and Schlaifer’s (1988) conditions for a law to be
observed in data are not satisfied and model (2) is therefore not causal.6
The next few
subsections flesh out some further details to motivate the approach taken in Section 3.
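Assertions (i)-(iii) can be made concrete with standard least-squares algebra applied to model (1) when the error is written as $u = W\omega$, with $\omega$ denoting the coefficient vector on $W$. What follows is only a sketch under that notation: $\hat{\beta}$ denotes the ordinary least squares estimator, and $A$ is a hypothetical, arbitrary $K \times L$ matrix introduced here purely for illustration.

```latex
\hat{\beta} = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'W\omega ,
\qquad
y = X\beta + W\omega = X(\beta + A\omega) + (W - XA)\omega .
```

The first expression shows that $\hat{\beta}$ equals $\beta$ only if $X'W\omega = 0$, a condition that generally fails when elements of $w_t$ are correlated with $x_t$. The second shows that, absent further restrictions, each choice of $A$ produces a different coefficient vector and a different error term for the same $y$.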
2.2 Digression on different types of causation
Skyrms’ (1988) discussion of the three types of causation, viz., deterministic,
probabilistic, and statistical, is a treatise par excellence and is relevant for this paper which
adopts probabilistic and statistical concepts of causation.7
As a historical note, we mention
Zellner’s (1979) enthusiasm for Feigl’s definition of causality as “predictability according
to a law or set of laws,” and his subsequent observation (Zellner 1988, p. 12) that in the
preceding two decades, not a single new causal economic law had been produced by all the
work done on definitions of causality and tests for causality. Feigl’s definition pointedly
raises this question: What then is a law? In answer, PS (1988, pp. 28 and 35) used Rubin’s
(1978) potential-value notation to formulate a law relating the dependent variable of (2) to
its included regressors, asserting that it is the existence of these “potential values” that
distinguishes a law from a statistical association, and that it is only in terms of these potential
values that it is possible to state the conditions under which a law can be observed in data.
Unfortunately, the correct functional form of such a law is typically unknown, and so these
conditions are difficult to verify. Indeed, in their path-breaking work, PS (1988) enumerate
conditions for observability of laws in data that are, as just noted, unverifiable. A related
issue concerns measurement errors that have also not been adequately treated in econometric
6
Goldberger (1964, pp. 380-388) showed that only incomplete theories can have exogenous variables. Here,
we have established that even incomplete theories represented by single-equation models with errors made up
of omitted relevant regressors cannot have exogenous regressors.
7
Skyrms (1988, p. 57) pointed out that the relations between different conceptions of probability are of central
importance to questions of probabilistic causation. For this reason, we need to emphasize here that we use
frequentist probability.
work. We take this up later, when we develop our own alternative methodology that depends
on utilization of “coefficient drivers” to be defined shortly. As we show in the next section,
unlike any other method in the literature, ours controls up front for all relevant pre-existing
conditions, leading to a model whose coefficients and error term are unique. Our approach
has the additional and desirable advantage of consistency with Basmann’s (1988) definition
of causality as a property describing the real world in which causal relations and orderings
are unique.8
As we show later, only a correct mis-specifications-free model with unique
coefficients and error term can be considered as describing a real-world relation, where the
exact meaning of the term in italics, so central to the paper, will become clear in Section 3.
Before we do so, we describe two common instances covering a broad range of econometric
modeling, in which neither coefficients nor error terms are unique.
2.3 Random coefficients in cross-section estimation
Suppose we wish to estimate (2) using a single cross-section data set. To do so,
change the subscript t in (2) to i which indexes cross-sectional units. As in other cases, inter-
individual heterogeneity may be present in this cross-sectional study. For this reason, we
need to change (2) to Hildreth and Houck’s (1968) representation, which is
$y_i = \sum_{j=0}^{K-1} x_{ji}(\beta_j + \varepsilon_{ji})$ (i = 1, …, n) (3)
8
A referee has thoughtfully suggested that this definition of causality may need some further discussion, as
there are other definitions around that do not require uniqueness for a causal description. But are these other
definitions correct or relevant? Basmann (1988, p. 99) answered this question with the observation that “None
of the generally accepted meanings of ‘causality’ fails to involve the notion that causation is a real-world,
invariant relation between events rather than a mere property of a linguistic representation. To use ‘causality’
in the latter sense may court eventual public … [devaluation] and dismissal of econometric research and
econometricians.” Any relationship with non-unique coefficients and error term is by definition mis-specified.
But how can a mis-specified relation reflect any kind of acceptable causation?
ji j ji + ji jiwhere = , j = 0, 1, …, K – 1, and are the values taken by the random
ji ji ji jvariables and , respectively, is distributed with mean and where it is further
assumed that
ji X  ji j i   X
2
if = and =
0 otherwise{ j j j i i  
E( | ) = 0, , j and i and E( | ) = (4)
0ix  0iSuppose that = 1, i, and that the error term is made up of all omitted
iyrelevant regressors that affect . Hildreth and Houck (1968) did not have the benefit of PS’
0i(1984, 1988) earlier insight that when the disturbance is composed of all omitted
ix 1 1,(1, ,..., )i K ix x − relevant regressors, as it is in (2), the included regressors with = as the
vector of their values cannot be uncorrelated with every one of those omitted relevant
0iregressors (or with ) and can therefore not be exogenous. Also, in (3), the included
ixregressors with as the vector of their values are correlated with their random coefficients
in (3). As a consequence, the assumption in (4) is not satisfied. As we shall demonstrate,
ixwhen is not the value of the vector of exogenous variables, contrary to Hildreth and Houck
(1968), it cannot be treated as fixed.
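As a brief worked consequence of (3) and (4), taken at face value, the composite error of the Hildreth and Houck representation can be written out explicitly. This is a sketch only: $\beta_j$ denotes the mean of the jth coefficient, $\varepsilon_{ji}$ its random deviation for unit i, $\sigma_j^2$ the variance appearing in (4), and $e_i$ is a label introduced here for the composite error.

```latex
y_i = \sum_{j=0}^{K-1} x_{ji}\beta_j + e_i , \qquad
e_i = \sum_{j=0}^{K-1} x_{ji}\varepsilon_{ji} , \qquad
E(e_i \mid X) = 0 , \qquad
\mathrm{Var}(e_i \mid X) = \sum_{j=0}^{K-1} x_{ji}^{2}\sigma_j^{2} .
```

Even when (4) is granted, the error variance depends on the included regressors; the argument of this subsection is that (4) itself fails once $\varepsilon_{0i}$ is made up of omitted relevant regressors.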
2.4 State-space models
Durbin and Koopman (2001), among others, treat general linear Gaussian state-space
models, which, for a single dependent variable, can be written as
ty t tx  tu= + (5)
t  1tF − tv= + + (6)
17
 tvwhere is a fixed vector and is the value taken by a vector of errors treated as random
variables. Equation (5) is called the “observation equation,” and equation (6) is the “state
txequation.” A routine assumption is that in (5) is the value taken by the vector of included
tu tvexogenous regressors and that the random variables taking the values and ,
respectively, are independent, although the above cited authors do not offer an interpretation
tuof the random error term taking the value in (5) or an explicit or implicit
tuacknowledgement of the presence of pre-exiting conditions in (2). But, if consists of all
omitted relevant regressors, as is often assumed, then it follows from PS’ (1984, 1988)
txarticles that the vector in (5) cannot be the value of a vector of exogenous variables, since
tuthese variables are correlated with the random variable taking the value . Furthermore, the
txcorrelation between the included regressors with as the vector of their values and their
random coefficients are natural in models of the type (5). As a consequence, the coefficients
and the error term in (5) are not unique, and likewise, equation (5) will have the same
problems we previously identified for equation (2) above. In essence, as posited, equation
(6) implicitly contains an assumption that the coefficients in (5) are non-unique. Therefore,
if the representation in (2) is unsatisfactory, then so are state-space models, such as (5) and
(6).
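To make the structure of (5) and (6) concrete, the fragment below simulates a scalar version of the two equations in Python. It is a sketch under stated assumptions only: a single regressor, Gaussian shocks, and the hypothetical names `mu` (the fixed term in the state equation), `F`, `sd_v`, `sd_u`, and `beta0` are choices made here and are not taken from Durbin and Koopman (2001). The regressor and the errors are drawn independently, which is exactly the routine assumption questioned above.

```python
import random


def simulate_state_space(T, mu, F, sd_v, sd_u, beta0, seed=0):
    """Simulate a scalar observation equation and state equation.

    Observation: y_t = x_t * beta_t + u_t
    State:       beta_t = mu + F * beta_{t-1} + v_t
    Returns a list of (y_t, x_t, beta_t) tuples.
    """
    rng = random.Random(seed)
    beta_prev = beta0
    rows = []
    for _ in range(T):
        x_t = rng.gauss(0, 1)                          # regressor, drawn independently
        beta_t = mu + F * beta_prev + rng.gauss(0, sd_v)
        y_t = x_t * beta_t + rng.gauss(0, sd_u)
        rows.append((y_t, x_t, beta_t))
        beta_prev = beta_t
    return rows


# Noisy path with hypothetical parameter values.
path = simulate_state_space(T=200, mu=0.5, F=0.5, sd_v=0.1, sd_u=0.2, beta0=0.0)

# With both shocks switched off, the coefficient settles at mu / (1 - F).
quiet = simulate_state_space(T=200, mu=0.5, F=0.5, sd_v=0.0, sd_u=0.0, beta0=0.0)
print(quiet[-1][2])
```

Nothing in this simulation ties the drawn error to omitted regressors or to the regressor itself; building in such a dependence is what the argument above says a realistic data-generating process would require.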
3. A MODEL WITH UNIQUE COEFFICIENTS AND ERROR TERM
We turn to our main task and show how to construct and estimate a model whose
coefficients and error term are unique. We begin by introducing some notation. Let
$y_t = y_t^* + \nu_{0t}^*$, and for j = 1, …, K – 1, let $x_{jt} = x_{jt}^* + \nu_{jt}^*$, where $y_t$ and $x_{jt}$ are the observed
variables, $y_t^*$ and $x_{jt}^*$ are the unobserved true values, and $\nu_{0t}^*$ and $\nu_{jt}^*$ are unobserved
measurement errors. Also, for $\ell$ = 1, …, L, let $w_{\ell t}^*$ denote the true value of the $\ell$-th omitted
relevant regressor (or relevant pre-existing condition). Further, partition W in Section 2 as
follows: define integers $L_1$ and $L_2$, $L_1 + L_2 = L$, such that (i) $w_{\ell t}^*$, $\ell$ = 1, …, $L_1$, denotes
omitted relevant regressors, and (ii) $w_{\ell t}^*$, $\ell$ = $L_1$ + 1, …, $L_1 + L_2$ = L, denotes all relevant
pre-existing conditions. Finally, assume that $L_1$ and $L_2$ are unknown. We are now prepared
to implement PS’ (1984) two conditions that must be satisfied for $y_t^*$ to be related to the
$x_{jt}^*$’s. We do this using a convenient class of nonlinear functional forms and a pair of
equations, one deterministic and the other stochastic. First, write the deterministic
relationship between $y_t^*$ and the $x_{jt}^*$'s and $w_{\ell t}^*$'s,
$$y_t^* = \alpha_{0t}^* + \sum_{j=1}^{K-1} x_{jt}^*\alpha_{jt}^* + \sum_{\ell=1}^{L} w_{\ell t}^*\omega_{\ell t}^* \tag{7}$$
where $\alpha_{0t}^*$ is the intercept, the $\alpha_{jt}^*$ with j > 0 are the coefficients of the $x_{jt}^*$’s, and the
$\omega_{\ell t}^*$ are the coefficients of the $w_{\ell t}^*$’s. Observe that formally, none of the coefficients, omitted
relevant regressors, and relevant pre-existing conditions in (2) and (7) are unique. So, if we
take the last term on the right-hand side, $\sum_{\ell=1}^{L} w_{\ell t}^*\omega_{\ell t}^*$, as the error term of (7), then this model
will have the same problems that haunt equation (2). But what if more could be said about
that third term? Could one somehow relate every omitted relevant regressor (or every
relevant pre-existing condition) $w_{\ell t}^*$ to the true values of the included regressors? To answer
this, we introduce the next equation that we allow to be stochastic:
$$w_{\ell t}^* = \lambda_{\ell 0t}^* + \sum_{j=1}^{K-1} x_{jt}^*\lambda_{\ell jt}^* \qquad (\ell = 1, \dots, L) \tag{8}$$
where $\lambda_{\ell 0t}^*$ is the value of an error term representing that portion of the omitted relevant
regressor (or the relevant pre-existing condition) $w_{\ell t}^*$ that remains after the effect
$\sum_{j=1}^{K-1} x_{jt}^*\lambda_{\ell jt}^*$ of the included regressors $x_{jt}^*$'s on $w_{\ell t}^*$ has been removed from it, and $\lambda_{\ell jt}^*$ is the coefficient
of $x_{jt}^*$. It is important to emphasize that not all equations in (8) are vacuous or empty,
because, as PS (1984, p. 14) proved, the included regressors taking the values $x_{jt}^*$'s cannot
all be uncorrelated with every omitted relevant regressor taking the value $w_{\ell t}^*$. This means
there are some omitted regressors that must be (stochastically) related to the included
regressors taking the values $x_{jt}^*$'s, and it is these relationships that we write down as equation
(8). Accordingly, when $w_{\ell t}^*$ represents a relevant pre-existing condition, it can be controlled
using the included regressors via (8). It follows from Skyrms (1988, p. 59) that the controls
of relevant pre-existing conditions, activated by specifying an equation like (8), make the
spurious correlations implied by (7) disappear.
Notably, the coefficients of (7) and (8) are time-varying, and this property needs
explanation. The problem here is that their correct functional forms are not actually known,
leading to a potential problem of mis-specification. PS themselves did not specifically
address or solve the issue of unknown functional forms in this setting, but they did offer the
following two requirements for $y^*$ to be related to $x^*$ by a law: (i) for every possible value
of $x_t^*$ there exists an identically and independently distributed sequence {$U_t$}, where $U_t$
depends on the considered possible value of $x_t^*$, and (ii) there exists a function f which, on
the tth observation, associates with every possible value of $x_t^*$ a random variable taking the
value $y_t^*$ = f(the considered possible value of $x_t^*$, $u_t$), where both $y_t^*$ and $u_t$ depend on
the considered possible value $x_t^*$ (see PS 1988, p. 28). We refer to this imperative as the
“correct function f” and assume that the coefficients of (7) lie on that function. We assume
further that a similar argument is true about (8). A most general approach to solving the issue
of unknown functional forms for the models that also satisfy the preceding conditions is to
make the coefficients of (7) and (8) depend on the observation index t, thereby opening a
rich class of functional forms able to cover the respective correct functional forms as special
cases. We call these functional forms “linear in variables and nonlinear in coefficients.”
The important point to note here is that no parameters in the relationship in (7) are
treated as constant. Non-constancy of coefficients vindicates an old admonition by
Goldberger (1987) that the particular choice by Barten and Theil of quantities in the
Rotterdam School demand functions (and, by implication, in any other econometric
relationship) to be treated as constants may be questioned.
We now assert that the combination of (7) and (8) constitutes a model that is
sufficient for the determination of $y_t^*$ exactly. This model is obtained by substituting the
right-hand side of (8) for $w_{\ell t}^*$ in (7):
$$y_t^* = \alpha_{0t}^* + \sum_{\ell=1}^{L}\lambda_{\ell 0t}^*\omega_{\ell t}^* + \sum_{j=1}^{K-1} x_{jt}^*\left(\alpha_{jt}^* + \sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*\right) \tag{9}$$
which clearly shows that the portions $\lambda_{\ell 0t}^*$, $\ell$ = 1, …, L, of all omitted relevant regressors
and all relevant pre-existing conditions, in conjunction with the included regressors $x_{jt}^*$, j =
1, …, K – 1, are sufficient to determine the value of $y_t^*$ exactly. PS (1988, p. 51) referred
to $\lambda_{\ell 0t}^*$, $\ell$ = 1, …, $L_1$, as certain “sufficient sets” of omitted relevant regressors. PS (1988,
p. 34) further proved that although the included regressors taking the values ($x_{jt}^*$’s) cannot
be independent of every omitted relevant regressor, they can be independent of the sufficient
set of every such variable.$^{9}$ Thus, taking the second term ($\sum_{\ell=1}^{L}\lambda_{\ell 0t}^*\omega_{\ell t}^*$) on the right-hand side
of equation (9) as the value of its error term, we can conclude that in this equation, there is
no problem of the correlation between the included regressors and the error term.
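The substitution that leads from (7) and (8) to (9) can be checked numerically at a single date t. The following is a minimal sketch, not part of the paper's method: all numerical values are randomly drawn for illustration, and the function names (`y_via_7_and_8`, `y_via_9`) are hypothetical.

```python
import random

def y_via_7_and_8(alpha0, alpha, omega, lam0, lam, x):
    """Compute each w_l from (8), then plug the w's into (7)."""
    n_x, n_w = len(x), len(omega)
    w = [lam0[l] + sum(x[j] * lam[l][j] for j in range(n_x)) for l in range(n_w)]
    return (alpha0
            + sum(x[j] * alpha[j] for j in range(n_x))
            + sum(w[l] * omega[l] for l in range(n_w)))

def y_via_9(alpha0, alpha, omega, lam0, lam, x):
    """Evaluate (9): intercept + error term + sum of x_j times its total effect."""
    n_x, n_w = len(x), len(omega)
    error_term = sum(lam0[l] * omega[l] for l in range(n_w))
    total = [alpha[j] + sum(lam[l][j] * omega[l] for l in range(n_w))
             for j in range(n_x)]
    return alpha0 + error_term + sum(x[j] * total[j] for j in range(n_x))

rng = random.Random(0)   # arbitrary seed; the values are illustrative only
n_x, n_w = 3, 4          # K - 1 included regressors and L omitted ones (assumed sizes)
x = [rng.uniform(-2, 2) for _ in range(n_x)]
alpha0 = rng.uniform(-1, 1)
alpha = [rng.uniform(-1, 1) for _ in range(n_x)]
omega = [rng.uniform(-1, 1) for _ in range(n_w)]
lam0 = [rng.uniform(-1, 1) for _ in range(n_w)]
lam = [[rng.uniform(-1, 1) for _ in range(n_x)] for _ in range(n_w)]

y7 = y_via_7_and_8(alpha0, alpha, omega, lam0, lam, x)
y9 = y_via_9(alpha0, alpha, omega, lam0, lam, x)
print(abs(y7 - y9) < 1e-12)   # the two routes agree up to rounding
```

The check only confirms the algebra; it says nothing about the statistical independence between the included regressors and the error term of (9).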
$^{9}$ This echoes Freedman’s (2005) assumption (i).

Among some statisticians, time-varying coefficients are usually thought to be
uninterpretable. Yet, in equation (9), we are able to offer the following valid and
unambiguous interpretation: the jth time-varying coefficient $(\alpha_{jt}^* + \sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*)$ of (9) is the
“total” effect of $x_{jt}^*$ on $y_t^*$. This total effect is the sum of two time-varying coefficients,
$\alpha_{jt}^*$ and $\sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*$, where the coefficient $\alpha_{jt}^*$ is a direct effect of $x_{jt}^*$ on $y_t^*$ which appears
in (7), and the sum of products $\sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*$ is an indirect effect of $x_{jt}^*$ (on $y_t^*$) due to the effect
of $x_{jt}^*$ on every omitted relevant regressor which appears in (8), and the effect of every
omitted relevant regressor on $y_t^*$ which appears in (7).$^{10}$ These effects are time dependent.
However, if, as usual, the omitted relevant regressors are not identified, then there exists no
meaningful distinction between direct and indirect effects (see PS 1984, p. 12). Note that
although the direct ($\alpha_{jt}^*$) and indirect ($\sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*$) effects of $x_{jt}^*$ on $y_t^*$ depend on the
omitted relevant regressors chosen to define them, the total effect $(\alpha_{jt}^* + \sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*)$ does
not. This result, obvious from (9), is due to PS (1988, p. 50). It justifies estimation of the
total effect of each included regressor on the dependent variable when the omitted relevant
regressors are not identified. But instead of being arbitrary and “hard to swallow,” it is now
a condition at once palatable and implementable.
$^{10}$ In this measure $\sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*$ of the indirect effect, $\lambda_{\ell jt}^*$ and $\omega_{\ell t}^*$ are set equal to zero if the dependent variable
of the $\ell$-th equation in (8) is a relevant pre-existing condition. This restriction is needed because the vector
$w_t$ is assumed to contain all relevant pre-existing conditions, and the effect of $x_{jt}^*$ on each relevant
pre-existing condition is not part of an indirect effect of $x_{jt}^*$ on $y_t^*$.

Swamy, Mehta, Tavlas and Hall (2014, pp. 217-219) proved earlier that the
coefficients (or the sums $\alpha_{jt}^* + \sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*$) and error term of (9) are unique, even though the
coefficients and error terms of (7) and (8) are not. This means that even though the
components of the coefficient on each included regressor in (9), measuring direct and
indirect effects, are not unique (see PS 1984, p. 13), its total effect is unique.
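A small numerical sketch may help fix the idea that the split into direct and indirect effects depends on how the omitted regressor is defined, while the total effect does not. The reparameterization used here, replacing an omitted regressor w by w − c·x for an arbitrary constant c, is an illustrative device chosen for this sketch and is not taken from PS or from the paper; the numbers and function names are hypothetical.

```python
def decompose(alpha1, omega, lam1):
    """Return (direct, indirect, total) effects of x on y for one omitted regressor."""
    direct = alpha1
    indirect = lam1 * omega
    return direct, indirect, direct + indirect

def redefine_omitted(alpha1, omega, lam1, c):
    """Re-express the same model with the omitted regressor w_tilde = w - c*x.

    From y = a0 + alpha1*x + omega*w and w = lam0 + lam1*x it follows that
    y = a0 + (alpha1 + omega*c)*x + omega*w_tilde and w_tilde = lam0 + (lam1 - c)*x.
    """
    return alpha1 + omega * c, omega, lam1 - c

alpha1, omega, lam1 = 1.2, 0.8, -0.4   # hypothetical coefficient values at one date t
results = [decompose(*redefine_omitted(alpha1, omega, lam1, c))
           for c in (0.0, 0.5, -2.0)]
for direct, indirect, total in results:
    print(f"direct={direct:+.3f} indirect={indirect:+.3f} total={total:+.3f}")
```

Each choice of c yields a different direct and indirect effect, but the sum of the two is the same in every row.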
It may be useful here to refer to Simpson’s paradox, also known as the Yule-Simpson
effect, which is a phenomenon in probability or statistics whereby the association between a
pair of variables (X, Y) reverses sign upon conditioning on a third variable, Z, regardless of
the value taken by Z. To illustrate this phenomenon in the present context, suppose that all
but the first regressor, $x_{jt}^*$, j = 2, …, K – 1, are deleted from the right-hand side of (9) and
are added to its list of omitted relevant regressors. Then the direct effect $\alpha_{1t}^*$ of the remaining
regressor $x_{1t}^*$ on $y_t^*$ does not change sign no matter how many of the deleted regressors,
$x_{jt}^*$, j = 2, …, K – 1, are included back in (9). However, with one or more of the $x_{jt}^*$, j = 2,
…, K – 1, included back in (9), the indirect-effect component of the coefficient on $x_{1t}^*$
changes. Simpson and his followers implicitly mistook the change in the total effect of $x_{1t}^*$
due to changes in its indirect effects for a change in its direct effect. Thus, Simpson’s paradox
cannot arise in the context of equations of the form (9) with unique coefficients and error
term (see Swamy, Mehta, Tavlas and Hall 2015, pp. 5 and 6).
In (7)-(9), all omitted relevant regressors and all relevant pre-existing conditions are
treated identically without any distinction. The question then becomes: is this treatment
appropriate? We now consider the details of our methodology.
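Measurement-error biases: In terms of the observed values introduced at the beginning of this
section, (9) can be written as
$$y_t = \gamma_{0t} + \sum_{j=1}^{K-1} x_{jt}\gamma_{jt} \tag{10}$$
where
$$\gamma_{0t} = \nu_{0t}^* + \alpha_{0t}^* + \sum_{\ell=1}^{L}\lambda_{\ell 0t}^*\omega_{\ell t}^* \tag{11}$$
and for j = 1, …, K – 1,
$$\gamma_{jt} = \left(1 - \frac{\nu_{jt}^*}{x_{jt}}\right)\left(\alpha_{jt}^* + \sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*\right) \tag{12}$$
where we do not assume that the measurement errors $\nu_{jt}^*$, j = 0, 1, …, K – 1, are random
variables distributed with zero means. The measurement-error bias component of the
coefficient $\gamma_{jt}$ of $x_{jt}$ in equation (10) is precisely
$$-\left(\frac{\nu_{jt}^*}{x_{jt}}\right)\left(\alpha_{jt}^* + \sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*\right) \tag{13}$$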
Equations (7)-(13) clarify that the model in (2), whose coefficients and error term
have been shown to be non-unique, suffers from a variety of specification errors that doom
the prospect of obtaining consistent estimators. As we have shown, model (10) is free from
this deficiency and is therefore to be preferred to (2). Observe that the time-varying
coefficients of (10) make $Y_t$ non-stationary. We now consider a feasible approach to
estimating model (10).
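The bookkeeping in (10)-(13) can be verified with a few numbers. The sketch below is illustrative only: the values are made up, the function name `observed_coefficients` is hypothetical, and `total` stands for the total effects $\alpha_{jt}^* + \sum_\ell \lambda_{\ell jt}^*\omega_{\ell t}^*$ at one date t.

```python
def observed_coefficients(nu0, alpha0, error_term, nu, x_obs, total):
    """Coefficients of (10) built from (11) and (12), plus the bias terms (13).

    nu0, nu    : measurement errors in y and in the x's (not assumed mean zero)
    x_obs      : observed regressors x_jt (must be nonzero)
    total      : total effects of the true regressors on the true y
    error_term : the sum over l of lambda*_{l0t} * omega*_{lt}
    """
    gamma0 = nu0 + alpha0 + error_term                                   # (11)
    gamma = [(1 - n / x) * tau for n, x, tau in zip(nu, x_obs, total)]   # (12)
    bias = [-(n / x) * tau for n, x, tau in zip(nu, x_obs, total)]       # (13)
    return gamma0, gamma, bias

# Hypothetical values at one date t
x_true = [2.0, -1.5]
nu = [0.1, -0.3]
total = [0.9, 0.4]
alpha0, error_term, nu0 = 1.0, -0.2, 0.05

y_true = alpha0 + error_term + sum(xs * tau for xs, tau in zip(x_true, total))  # (9)
y_obs = y_true + nu0
x_obs = [xs + n for xs, n in zip(x_true, nu)]

gamma0, gamma, bias = observed_coefficients(nu0, alpha0, error_term, nu, x_obs, total)
y_from_10 = gamma0 + sum(x * g for x, g in zip(x_obs, gamma))                   # (10)
print(abs(y_from_10 - y_obs) < 1e-12)
```

Each coefficient of (10) equals the corresponding total effect plus its bias term (13), so the bias vanishes only when the measurement error in that regressor is zero.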
In what follows we limit ourselves to cases where all omitted relevant regressors are
unidentified. As we have already shown above, only the total effects of $x_{jt}^*$’s on $y_t^*$ can be
meaningfully estimated in these cases. Furthermore, assuming that the coefficients of
equation (10) are distributed with finite means, these means cannot be consistently estimated
by regression of $y_t$ on the $x_{jt}$’s alone because the included regressors with the $x_{jt}$’s as their
values are correlated with their own coefficients in (10). This is because $\gamma_{jt}$ in (10) is a
function of $x_{jt}$. Therefore, we proceed as follows.
3.1 Coefficient drivers
To impose the restrictions implied by equations (11) and (12) on the coefficients of
(10), we make the following assumption:
Assumption I: For j = 0, 1, …, K – 1, the coefficients of (10) satisfy the equations
$$\gamma_{jt} = \pi_{j0} + \sum_{h=1}^{p-1}\pi_{jh}z_{ht} + u_{jt} \tag{14}$$
where the $z_{ht}$’s are called the “coefficient drivers,” which are potentially observable
variables that influence the component $\sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*$ of $\gamma_{jt}$.$^{11}$
It follows from equation (14) that the coefficients of (10) are non-stationary.

$^{11}$ After clarifying the complications that arise in the Bayesian analyses of laws, PS (1988, p. 49) concluded
that a Bayesian will do much better to search like a non-Bayesian for variables that absorb “proxy effects” for
omitted regressors. These effects can be equal to the indirect-effect component $\sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*$ of $\gamma_{jt}$, j = 1, …,
K – 1. Taking a hint from this conclusion, we choose the coefficient drivers that absorb the indirect effects of
included regressors in (10). This will be made clear in equation (15)(ii) below.

In specifying equation (14), we impose another level on model (10), so that, together,
(10) and (14) form a bi-level formulation of the relationship between $y_t$ and the $x_{jt}$’s. From
(9) and (14) it follows that (10) has two sources of errors: (i) certain “sufficient sets” of all
omitted relevant regressors and all relevant pre-existing conditions, $\lambda_{\ell 0t}^*$’s, and (ii) all the
coefficients of (10) including its intercept. Note that the model with equations (10) and (14)
does not provide a hierarchical Bayes model, as (14) is separate and yet inseparable from
(10) and is not equivalent to the prior distribution specified in a hierarchy. Actually, Pratt
and Schlaifer (1988, p. 49) show how to apply Bayesian analysis to a law when it is not
known whether the condition for observability is satisfied or not. Equations (10) and (14)
are consistent with their procedure, as we show below.
Next, we indicate conditions that need to be imposed on the coefficient drivers.
Assumption II (Admissibility Condition): The vector $Z_t = (1, Z_{1t}, \dots, Z_{p-1,t})'$ in equation
(14) is an admissible vector of coefficient drivers if, given $Z_t = z_t$, the value $\gamma_t =
(\gamma_{0t}, \gamma_{1t}, \dots, \gamma_{K-1,t})'$ that the coefficient vector of (10) would take had $X_t = (1, X_{1t}, \dots,
X_{K-1,t})'$ been $x_t = (1, x_{1t}, \dots, x_{K-1,t})'$ is independent of $X_t = (1, X_{1t}, \dots, X_{K-1,t})'$, for all t.
This first condition, requiring the coefficient vector of (10), having $\gamma_t$ as its value,
and $X_t$ to be conditionally independent, given $Z_t$, is needed to achieve consistent
estimators of the coefficients of (14). Our experience with equation (14) suggests that
statistically significant estimates of the coefficients of (14) cannot be obtained unless certain
further exclusion restrictions are imposed on these coefficients.
3.2 Estimation of the coefficients of (10) with unidentified omitted relevant regressors
We assert that the coefficient drivers included in (14) are appropriate and adequate
and that our guesses about the measurement errors in (11) and (12), denoted by $\hat\nu_{jt}^*$, j = 1,
…, K – 1, are accurate if the coefficient drivers are admissible and the following assumption
is satisfied:
Assumption III: For j = 1, …, K – 1, the equation
$$\gamma_{jt} = \left(1 - \frac{\hat\nu_{jt}^*}{x_{jt}}\right)\left(\alpha_{jt}^* + \sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*\right) = \pi_{j0} + \sum_{h=1}^{p-1}\pi_{jh}z_{ht} + u_{jt}$$
holds, such that
$$\text{(i)}\ \ \alpha_{jt}^* = \left(1 - \frac{\hat\nu_{jt}^*}{x_{jt}}\right)^{-1}\pi_{j0} \quad\text{and}\quad \text{(ii)}\ \ \sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^* = \left(1 - \frac{\hat\nu_{jt}^*}{x_{jt}}\right)^{-1}\left(\sum_{h=1}^{p-1}\pi_{jh}z_{ht} + u_{jt}\right),\ \ \forall t \tag{15}$$
(see Swamy, von zur Muehlen, Mehta and Chang 2019).
The rationale for the two equations in (15) derives from an argument in PS (1988, p.
49) that a Bayesian will do much better to search like a non-Bayesian for variables that
absorb “proxy effects” for omitted relevant regressors. The term $\sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*$ representing the
indirect effects is one of the two terms of the second factor of the jth coefficient, $\gamma_{jt}$, in (12).
The estimator on the right-hand side of (15)(ii) is a good estimator of indirect effects on its
left-hand side in the sense that the coefficient drivers included in (14) and its error term
absorb those effects completely, i.e., the equality sign in (15)(ii) holds. Below, we show how
to estimate the $\pi$’s and u’s in (15)(i) and (15)(ii).
3.3 Combining equations (10) and (14)
Using vector and matrix notation, (10) and (14) can be combined, yielding
$$Y = X_z\pi + D_xU \tag{16}$$
where $y = (y_1, \dots, y_T)'$ is a T × 1 vector of observations on the random vector Y in equation
(16), $z_t = (z_{0t}, z_{1t}, \dots, z_{p-1,t})'$ with $z_{0t}$ = 1, $\forall$ t, is a p-vector of observations on the coefficient
drivers in (14), $x_t = (x_{0t}, x_{1t}, \dots, x_{K-1,t})'$ with $x_{0t}$ = 1, $\forall$ t, is a K-vector of observations on
the included regressors in (10), $X_z$ is a (T × Kp) matrix having the Kronecker product $(z_t' \otimes x_t')$
as its tth row, $\pi$ is a Kp-vector of the coefficients of equations in (14), $D_x$ = diag
$[x_1', \dots, x_T']$ is a T × TK block diagonal matrix, $u = (u_1', \dots, u_T')'$ is a TK-vector of values
taken by the error vector U in (16), and for t = 1, …, T, $u_t = (u_{0t}, u_{1t}, \dots, u_{K-1,t})'$ is a K-
vector of errors in (14). Note that both the conditional mean, $X_z\pi$, and the error term,
$D_xU$, of Y on the right-hand side of equation (16) depend on the included regressors. It is this
property of (16) that precludes the existence of instrumental variables for $X_z$.
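To make the stacking in (16) concrete, the following sketch builds $X_z$ and $D_x$ from simulated data and checks that the stacked form reproduces the bi-level form (10) and (14) observation by observation. It is a sketch under stated assumptions: the dimensions, the random data, and the convention that $\pi$ is the column-by-column stacking of a K × p matrix `Pi` with `Pi[j, h]` = $\pi_{jh}$ are choices made here so that the Kronecker row $(z_t' \otimes x_t')$ lines up with $\pi$; the paper's text does not spell out the ordering.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, p = 6, 3, 2                      # assumed sizes, for illustration only

# Rows are x_t' and z_t', each with a leading 1 (x_0t = z_0t = 1)
x = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
z = np.column_stack([np.ones(T), rng.normal(size=(T, p - 1))])
Pi = rng.normal(size=(K, p))           # Pi[j, h] = pi_jh in (14)
u = rng.normal(size=(T, K))            # rows are u_t'

# Bi-level form: gamma_t = Pi z_t + u_t  (14), then y_t = x_t' gamma_t  (10)
gamma = z @ Pi.T + u
y_bilevel = np.einsum("tk,tk->t", x, gamma)

# Stacked form (16): Y = X_z pi + D_x u
X_z = np.vstack([np.kron(z[t], x[t]) for t in range(T)])   # T x Kp, row t = z_t' (x) x_t'
pi_vec = Pi.reshape(-1, order="F")                          # column-stacked Pi
D_x = np.zeros((T, T * K))                                  # block diagonal with blocks x_t'
for t in range(T):
    D_x[t, t * K:(t + 1) * K] = x[t]
y_stacked = X_z @ pi_vec + D_x @ u.reshape(-1)

print(np.allclose(y_bilevel, y_stacked))
```

Both the mean part `X_z @ pi_vec` and the error part `D_x @ u.reshape(-1)` are built from the same regressor rows `x[t]`, which is the dependence the text points to.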
3.4 An assumption about the error term of model (16)
Assumption IV: The errors $u_t$ (t = 1, …, T) are the realizations of a vector
stationary stochastic process following the first-order vector autoregressive equation
$$u_t = \Phi u_{t-1} + a_t \tag{17}$$
where $\Phi$ is a K × K diagonal matrix and $a_t$, t = 1, …, T, is a realization of a sequence of
uncorrelated K-vector variables $A_t$ with
$$E(A_t \mid z_t, x_t) = 0, \qquad E(A_tA_{t'}' \mid z_t, x_t) = \begin{cases}\sigma_a^2\Delta_a & \text{if } t = t' \\ 0 & \text{if } t \neq t'\end{cases} \tag{18}$$
and where $A_t$ is a random vector taking the value $a_t$, and $\Delta_a$ is a K × K nonnegative definite
matrix.
Note that under Assumption IV, the error term $D_xU$ in model (16) is a vector non-
stationary process obtained by passing the stationary process {U} through the time-
dependent filter $D_x$.
Assumption IV also implies that
$$E(UU' \mid z_t, x_t) = \sigma_a^2\Sigma_u = \sigma_a^2\begin{bmatrix}\Gamma_0 & \Gamma_0\Phi' & \Gamma_0\Phi'^{2} & \cdots & \Gamma_0\Phi'^{T-1} \\ \Phi\Gamma_0 & \Gamma_0 & \Gamma_0\Phi' & \cdots & \Gamma_0\Phi'^{T-2} \\ \vdots & \vdots & \vdots & & \vdots \\ \Phi^{T-1}\Gamma_0 & \Phi^{T-2}\Gamma_0 & \Phi^{T-3}\Gamma_0 & \cdots & \Gamma_0\end{bmatrix} \tag{19}$$
where U is a random vector taking the value u, $U_t$ is a random vector taking the value
$u_t$, $E(U_tU_t' \mid z_t, x_t) = \sigma_a^2\Gamma_0 = \Phi\sigma_a^2\Gamma_0\Phi' + \sigma_a^2\Delta_a$ (see Anderson 1971, pp. 178-182), and the
conditional covariance matrix, $\Sigma_y$, of Y taking the value y in (16) given $X_z$ is
$$\Sigma_y = D_x\sigma_a^2\Sigma_uD_x' \tag{20}$$
where the dependence of $\Sigma_y$ on the $x_t$’s is explicit. We note that in the general formulation
of Cavanagh and Rothenberg’s (1995) (hereafter CR) model, reproduced in (1), such
dependence is not present. The distinct nonzero elements of $\Phi$, $\Delta_a$ and $\sigma_a^2$ are the elements
of the unknown parameter vector, denoted by $\theta$, on which $\Sigma_y$ depends, where the number
of rows of the column vector $\theta$ is K + K(K + 1)/2 + 1 = m, say.
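The covariance structure in (19) and (20) can be assembled directly once $\Phi$, $\Delta_a$ and $\sigma_a^2$ are given. The sketch below is illustrative and assumes these parameters are known and that every diagonal element of $\Phi$ is less than one in absolute value; the function names and numerical values are hypothetical. For diagonal $\Phi$, the stationarity equation for $\Gamma_0$ can be solved element by element.

```python
import numpy as np

def gamma0_for_diagonal_phi(phi, delta_a):
    """Solve Gamma_0 = Phi Gamma_0 Phi' + Delta_a when Phi = diag(phi), |phi_i| < 1."""
    return delta_a / (1.0 - np.outer(phi, phi))

def sigma_u(phi, delta_a, T):
    """TK x TK block matrix Sigma_u of (19), without the sigma_a^2 factor."""
    K = len(phi)
    Phi = np.diag(phi)
    G0 = gamma0_for_diagonal_phi(phi, delta_a)
    S = np.zeros((T * K, T * K))
    for t in range(T):
        for s in range(T):
            if t >= s:   # on or below the block diagonal: Phi^(t-s) Gamma_0
                block = np.linalg.matrix_power(Phi, t - s) @ G0
            else:        # above the block diagonal: Gamma_0 Phi'^(s-t)
                block = G0 @ np.linalg.matrix_power(Phi.T, s - t)
            S[t * K:(t + 1) * K, s * K:(s + 1) * K] = block
    return S

def sigma_y(x, phi, delta_a, sigma2_a):
    """Sigma_y = D_x (sigma_a^2 Sigma_u) D_x' as in (20); x has rows x_t'."""
    T, K = x.shape
    D_x = np.zeros((T, T * K))
    for t in range(T):
        D_x[t, t * K:(t + 1) * K] = x[t]
    return D_x @ (sigma2_a * sigma_u(phi, delta_a, T)) @ D_x.T

# Hypothetical parameter values and regressor observations (K = 2, T = 4)
phi = np.array([0.5, -0.3])
delta_a = np.array([[1.0, 0.2], [0.2, 0.5]])
x = np.array([[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, 3.0]])

G0 = gamma0_for_diagonal_phi(phi, delta_a)
S_u = sigma_u(phi, delta_a, T=4)
S_y = sigma_y(x, phi, delta_a, sigma2_a=2.0)
print(np.round(np.diag(S_y), 3))   # variances differ across t because x_t differs
```

The diagonal of `S_y` changes with `x[t]`, which illustrates the explicit dependence of $\Sigma_y$ on the included regressors noted in the text.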
Before we apply the non-Bayesian generalized least squares method to model (16)
under Assumption IV, we note that a Bayesian analysis of (16) has certain problems, which
are in addition to those already pointed out by Pratt and Schlaifer (1988, p. 49). It is known
that the likelihood function, which is one of the factors of the posterior distribution derived
via Bayes’ theorem, is model based. Swamy, von zur Muehlen, Mehta and Chang (2019, pp.
324-325) improved this derivation by requiring that the likelihood functions be based on
unique coefficients and error term of an appropriate model. Under Assumption IV, model
(16) is such a model. The difficulty of applying Bayes’ theorem to (16) though is that the
covariance matrix in equation (20) depends on the included regressors. But subjective
Bayesians, such as Bruno de Finetti and Leonard J. Savage, require that the prior probability
density function (pdf) for this covariance matrix be independent of the likelihood function.
The dependence of $\Sigma_y$ in (20) on $D_x$ amounts to the dependence of the prior pdf for $\Sigma_y$ on
the likelihood function. Therefore, the task is how to choose a prior pdf for the covariance
matrix in (20) that does not depend on the likelihood function. This is not possible unless
the included regressors are removed from the covariance matrix in (20). But to achieve this,
one has to place unreasonable restrictions on model (10) by making the error term
$\sum_{\ell=1}^{L}\lambda_{\ell 0t}^*\omega_{\ell t}^*$ in (11) independent of the other terms on the right-hand side of equations (11)
and (12), a step that we must reject.
3.5 Estimating the coefficients of (16)
The generalized least squares (GLS) estimator of $\pi$ is
$$\hat\pi = (X_z'\Sigma_y^{-1}X_z)^{-1}X_z'\Sigma_y^{-1}Y \tag{21}$$
where all the regular inverses are assumed to exist. Our discussion in Section 2 above shows
that the application of the generalized least squares method to the model in (1) does not lead
to consistent estimators of its coefficients. As should be clear by now, this problem goes
away in (16), the reasons being that, in (9), the included regressors taking the values ($x_{jt}^*$’s)
are independent of the sufficient set ($\lambda_{\ell 0t}^*$ in (8)) of every omitted relevant regressor, and in
(16), the two random variables taking the values $\gamma_t$ and $X_t$ are conditionally independent,
given $Z_t$.
Based on the preceding discussion, the best linear unbiased predictor of u is
$$\hat u = \sigma_a^2\Sigma_uD_x'\Sigma_y^{-1}(Y - X_z\hat\pi) \tag{22}$$
Starting with the arbitrary initial values $\Phi$ = 0 and $\Delta_a$ = I, Chang, Hallahan and
Swamy (1992) and Chang, Swamy, Hallahan and Tavlas (2000) iteratively solved equations
(21) and (22) until the estimates of $\pi$, $\Phi$, $\Delta_a$, and $\sigma_a^2$ were stabilized.$^{12}$ This iterative
procedure eliminates the dependence of these final estimates on the arbitrary starting values.
The estimates of $\Phi$, $\Delta_a$, and $\sigma_a^2$ obtained in the final iteration are called “the residual-
based estimates,” since they rely on the residuals ($y - X_z\hat{\hat\pi}$), where $\hat{\hat\pi}$ is explained below.
Inserting these estimates into (20) gives an estimate of $\Sigma_y$, denoted by $\hat\Sigma_y$. The feasible
GLS estimator of $\pi$ is
$$\hat{\hat\pi} = (X_z'\hat\Sigma_y^{-1}X_z)^{-1}X_z'\hat\Sigma_y^{-1}Y \tag{23}$$
and the feasible best linear unbiased predictor of u is
$$\hat{\hat u} = \hat\sigma_a^2\hat\Sigma_uD_x'\hat\Sigma_y^{-1}(Y - X_z\hat{\hat\pi}) \tag{24}$$
where $\hat\sigma_a^2$ is the residual-based estimate of $\sigma_a^2$ and $\hat\Sigma_u$ is the residual-based estimate
obtained using the residual-based estimates of $\Phi$ and $\Delta_a$ in place of their true values used
in $\Sigma_u$.
The estimates of the coefficients of (10) are
$$\hat\gamma_{jt} = \hat{\hat\pi}_{j0} + \sum_{h=1}^{p-1}\hat{\hat\pi}_{jh}z_{ht} + \hat{\hat u}_{jt} \qquad (j = 0, 1, \dots, K - 1) \tag{25}$$

$^{12}$ This stabilization does not take place if $\Phi$ is non-diagonal.
where the $\hat{\hat\pi}$’s and $\hat{\hat u}$’s are the elements of $\hat{\hat\pi}$ and $\hat{\hat u}$, respectively. The estimates of direct
and indirect effects used as intermediary estimates in the computation of the total effect of
the jth regressor $x_{jt}^*$ of (9) on its dependent variable $y_t^*$ at time t are
$$\hat\alpha_{jt}^* = \left(1 - \frac{\hat\nu_{jt}^*}{x_{jt}}\right)^{-1}\hat{\hat\pi}_{j0}, \qquad \widehat{\sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*} = \left(1 - \frac{\hat\nu_{jt}^*}{x_{jt}}\right)^{-1}\left(\sum_{h=1}^{p-1}\hat{\hat\pi}_{jh}z_{ht} + \hat{\hat u}_{jt}\right), \qquad\text{and}\qquad \left(1 - \frac{\hat\nu_{jt}^*}{x_{jt}}\right)^{-1}\hat\gamma_{jt}, \tag{26}$$
respectively.
Given a set of coefficient drivers, we need to experiment with different exclusion
restrictions on the coefficients of (14) and compare results. Usually, omitted relevant
regressors are not identified and, therefore, we can only use the estimate, $\left(1 - \hat\nu_{jt}^*/x_{jt}\right)^{-1}\hat\gamma_{jt}$, of
the total effect of $x_{jt}^*$ on $y_t^*$ for all j = 1, …, K – 1.
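The estimation steps of this section can be sketched end to end on simulated data. This is a simplified illustration rather than the procedure of Chang, Hallahan and Swamy (1992): here $\Phi$, $\Delta_a$ and $\sigma_a^2$ are treated as known, so the sketch computes the GLS estimator (21) and the predictor (22) once instead of iterating to the feasible versions (23) and (24). The guesses $\hat\nu_{jt}^*$ are set to zero, the ordering of $\pi$ is assumed to be the column stacking of a K × p matrix, and all numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
T, K, p = 8, 2, 2
sigma2_a = 0.5                                  # assumed known in this sketch
phi = np.array([0.4, -0.2])                     # diagonal of Phi, assumed known
delta_a = np.array([[1.0, 0.3], [0.3, 0.8]])    # Delta_a, assumed known

x = np.column_stack([np.ones(T), rng.uniform(1.0, 3.0, size=(T, K - 1))])
z = np.column_stack([np.ones(T), rng.normal(size=(T, p - 1))])
X_z = np.vstack([np.kron(z[t], x[t]) for t in range(T)])
D_x = np.zeros((T, T * K))
for t in range(T):
    D_x[t, t * K:(t + 1) * K] = x[t]

# Sigma_u as in (19) for diagonal Phi, then Sigma_y as in (20)
Phi = np.diag(phi)
G0 = delta_a / (1.0 - np.outer(phi, phi))
Sigma_u = np.zeros((T * K, T * K))
for t in range(T):
    for s in range(T):
        blk = np.linalg.matrix_power(Phi, abs(t - s)) @ G0
        Sigma_u[t * K:(t + 1) * K, s * K:(s + 1) * K] = blk if t >= s else blk.T
Sigma_y = sigma2_a * D_x @ Sigma_u @ D_x.T

# Simulate u_t from (17) and y from (16)
Pi_true = np.array([[1.0, 0.5], [2.0, -0.7]])
u = np.zeros((T, K))
u[0] = np.linalg.cholesky(sigma2_a * G0) @ rng.normal(size=K)
chol_a = np.linalg.cholesky(sigma2_a * delta_a)
for t in range(1, T):
    u[t] = Phi @ u[t - 1] + chol_a @ rng.normal(size=K)
y = X_z @ Pi_true.reshape(-1, order="F") + D_x @ u.reshape(-1)

# GLS estimator (21) and best linear unbiased predictor (22)
Sy_inv_X = np.linalg.solve(Sigma_y, X_z)
pi_hat = np.linalg.solve(X_z.T @ Sy_inv_X, Sy_inv_X.T @ y)
resid = y - X_z @ pi_hat
u_hat = (sigma2_a * Sigma_u @ D_x.T @ np.linalg.solve(Sigma_y, resid)).reshape(T, K)

# Coefficient estimates as in (25)
Pi_hat = pi_hat.reshape(K, p, order="F")
gamma_hat = z @ Pi_hat.T + u_hat

# Decomposition as in (26) for regressor j = 1, with nu_hat = 0 (an assumption)
j = 1
nu_hat = np.zeros(T)
factor = 1.0 / (1.0 - nu_hat / x[:, j])
direct = factor * Pi_hat[j, 0]
indirect = factor * (z[:, 1:] @ Pi_hat[j, 1:] + u_hat[:, j])
total = factor * gamma_hat[:, j]
print(np.round(total, 3))
```

With the predictor (22), the fitted coefficients reproduce the data exactly, since $D_x\hat u = \Sigma_y\Sigma_y^{-1}(y - X_z\hat\pi)$; the split of `total` into `direct` and `indirect` follows the rule in (15) and carries no independent information when the omitted regressors are not identified.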
3.6 Second-order properties of the feasible GLS estimator (23)
In this section, we study the second-order properties of (23). For this purpose, we use
Cavanagh and Rothenberg’s (1995) method but not their model in (1). Because of PS’ (1984,
1988) results listed in Section 2, CR’s (1995) GLS and feasible GLS estimators are replaced
by those in (21) and (23), respectively. Following CR, we standardize (21) as
$$h(\hat\pi, \theta) = \frac{l'(\hat\pi - \pi)}{[l'(X_z'\Sigma_y^{-1}X_z)^{-1}l]^{1/2}} = \frac{x_{1z}'\varepsilon_x}{\sqrt{T}} \tag{27}$$
where l is any Kp-vector of constants, $x_{1z} = \sqrt{T}\,L'X_z(X_z'\Sigma_y^{-1}X_z)^{-1}l\,[l'(X_z'\Sigma_y^{-1}X_z)^{-1}l]^{-1/2}$
is a column vector with T rows, $\Sigma_y^{-1} = LL'$, and L is a T × T
lower triangular matrix, the dependence of which on the $x_t$’s should not go unnoticed, and
$\varepsilon_x = L'D_xU$ is a T × 1 vector. One may verify that $x_{1z}'x_{1z}$ = T and the elements of $\varepsilon_x$ are
uncorrelated, mean zero random variables with unit variance. The standardization of (23),
which is the same as that of $h(\hat\pi, \theta)$, is
$$h(\hat{\hat\pi}, \hat\theta) = \frac{l'(\hat{\hat\pi} - \pi)}{[l'(X_z'\Sigma_y^{-1}X_z)^{-1}l]^{1/2}} \tag{28}$$
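The two claims attached to (27), that $x_{1z}'x_{1z} = T$ and that $\varepsilon_x = L'D_xU$ has an identity covariance matrix, can be checked numerically. In the sketch below a generic full-rank matrix `X` and a generic positive definite matrix `Sigma_y` stand in for $X_z$ and $\Sigma_y$, and the vector `e` plays the role of $D_xU$; these stand-ins, the dimensions, and the numerical values are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
T, m = 10, 3
X = rng.normal(size=(T, m))                 # stands in for X_z
A = rng.normal(size=(T, T))
Sigma_y = A @ A.T + T * np.eye(T)           # any symmetric positive definite matrix
pi_true = np.array([1.0, -0.5, 2.0])
l = np.array([1.0, 0.0, -1.0])              # arbitrary vector of constants

Si = np.linalg.inv(Sigma_y)
L = np.linalg.cholesky(Si)                  # lower triangular, Sigma_y^{-1} = L L'

e = rng.multivariate_normal(np.zeros(T), Sigma_y)   # plays the role of D_x U
y = X @ pi_true + e

M = np.linalg.inv(X.T @ Si @ X)
pi_hat = M @ X.T @ Si @ y                   # GLS estimator as in (21)
scale = np.sqrt(l @ M @ l)

h_direct = l @ (pi_hat - pi_true) / scale   # first expression in (27)
x1z = np.sqrt(T) * (L.T @ X @ M @ l) / scale
eps = L.T @ e
h_via_x1z = x1z @ eps / np.sqrt(T)          # second expression in (27)

print(np.isclose(h_direct, h_via_x1z), np.isclose(x1z @ x1z, T))
```

The identity covariance of $\varepsilon_x$ corresponds to `L.T @ Sigma_y @ L` being the identity matrix, which holds for any factorization $\Sigma_y^{-1} = LL'$.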
To develop the $o(T^{-1})$ approximation to the difference between the distributions of
$h(\hat{\hat\pi}, \hat\theta)$ and $h(\hat\pi, \theta)$, we assume that the number of rows of the column vectors $\pi$ and $\theta$
remains fixed, while the number of the rows of $X_z$ and the number of rows and columns of
$\Sigma_y$ increase as T → ∞ in such a way that $X_z'\Sigma_y^{-1}X_z/T$ converges in probability to a
positive definite matrix, given $Z_t$.$^{13}$ Let $\hat\theta$ denote the residual-based estimate of $\theta$
developed by Chang, Hallahan and Swamy (1992) and Chang, Swamy, Hallahan and Tavlas
(2000); and let $d = \sqrt{T}(\hat\theta - \theta)$. We assume that d converges in distribution to a normal
distribution uniformly in $\theta$.$^{14}$ Note that CR (1995) do not require uniform convergence. We
further assume that the elements of $\Sigma_y$ are differentiable with respect to $\theta$ up to the third
order.

$^{13}$ Time-series settings that involve time trends, polynomial time series, and trending variables give cases where
this assumption is not satisfied. In these cases, we use Grenander’s conditions presented in Greene (2012, p.
65).
$^{14}$ See Lehmann and Casella (1998, p. 441) for the importance of uniform convergence, defined in Lehmann
(1999, pp. 93-97).

Expanding $h(\hat{\hat\pi}, \hat\theta)$ in a stochastic Taylor series gives
$$h(\hat{\hat\pi}, \hat\theta) = h(\hat\pi, \theta) + \frac{1}{\sqrt{T}}\,b'd + \frac{1}{T}\,d'Cd + R \tag{29}$$
where $b = \partial h(\hat{\hat\pi}, \hat\theta)/\partial\hat\theta$ is the m-dimensional vector, $C = \frac{1}{2}\,\partial^2 h(\hat{\hat\pi}, \hat\theta)/\partial\hat\theta\,\partial\hat\theta'$ is the m × m matrix, the
derivatives in both b and C are evaluated at the true value $\theta$ and are stochastically bounded
as T → ∞, and R is $O_p(T^{-3/2})$.
The desirable property of equation (29) is that it is based on (9) with unique
coefficients and error term and on Assumptions I-IV. The derivative
$$b = \frac{\partial}{\partial\hat\theta}\left\{\frac{l'(X_z'\hat\Sigma_y^{-1}X_z)^{-1}X_z'\hat\Sigma_y^{-1}D_xU}{[l'(X_z'\Sigma_y^{-1}X_z)^{-1}l]^{1/2}}\right\} \tag{30}$$
is a column vector with m rows which is evaluated at the true value $\theta$. Its ith element is
$$\frac{1}{[l'(X_z'\Sigma_y^{-1}X_z)^{-1}l]^{1/2}}\; l'\left[(X_z'\hat\Sigma_y^{-1}X_z)^{-1}\frac{\partial(X_z'\hat\Sigma_y^{-1}D_xU)}{\partial\hat\theta_i} + \frac{\partial(X_z'\hat\Sigma_y^{-1}X_z)^{-1}}{\partial\hat\theta_i}X_z'\hat\Sigma_y^{-1}D_xU\right]$$
where $\hat\theta_i$ is the ith element of $\hat\theta$. Let $b_i$ denote the ith element of b. This element can be
written as
$$\frac{1}{[l'(X_z'\Sigma_y^{-1}X_z)^{-1}l]^{1/2}}\; l'\left[-(X_z'\hat\Sigma_y^{-1}X_z)^{-1}X_z'\hat\Sigma_y^{-1}\frac{\partial\hat\Sigma_y}{\partial\hat\theta_i}\hat\Sigma_y^{-1}D_xU + (X_z'\hat\Sigma_y^{-1}X_z)^{-1}X_z'\hat\Sigma_y^{-1}\frac{\partial\hat\Sigma_y}{\partial\hat\theta_i}\hat\Sigma_y^{-1}X_z(X_z'\hat\Sigma_y^{-1}X_z)^{-1}X_z'\hat\Sigma_y^{-1}D_xU\right] \tag{31}$$
where $\partial\hat\Sigma_y/\partial\hat\theta_i$ is the T × T matrix with zero in every position except for the elements of $\hat\Sigma_y$
which are functions of $\hat\theta_i$. The element $b_i$ can be compactly written as
$$-\frac{1}{[l'(X_z'\Sigma_y^{-1}X_z)^{-1}l]^{1/2}}\; l'(X_z'\Sigma_y^{-1}X_z)^{-1}X_z'\Sigma_y^{-1}\frac{\partial\hat\Sigma_y}{\partial\hat\theta_i}\left[I - \Sigma_y^{-1}X_z(X_z'\Sigma_y^{-1}X_z)^{-1}X_z'\right]\Sigma_y^{-1}D_xU \tag{32}$$
From this result it follows that C is the m × m matrix having typical element
$$-\frac{1}{[l'(X_z'\Sigma_y^{-1}X_z)^{-1}l]^{1/2}}\; l'(X_z'\Sigma_y^{-1}X_z)^{-1}X_z'\Sigma_y^{-1}\frac{\partial^2\hat\Sigma_y}{\partial\hat\theta_i\,\partial\hat\theta_j}\left[I - \Sigma_y^{-1}X_z(X_z'\Sigma_y^{-1}X_z)^{-1}X_z'\right]\Sigma_y^{-1}D_xU \tag{33}$$
where $\hat\theta_j$ is the jth element of $\hat\theta$, and the T × T matrix $\partial^2\hat\Sigma_y/\partial\hat\theta_i\,\partial\hat\theta_j$ is evaluated at the true
value $\theta$.
The ith element of b in (32) and the (i,j)th element of C in (33) can be written as
$$b_i = \frac{x_{2z,i}'\varepsilon_x}{\sqrt{T}}, \qquad c_{ij} = \frac{x_{2z,ij}'\varepsilon_x}{\sqrt{T}} \tag{34}$$
where the T-dimensional vectors $x_{2z,i}$ and $x_{2z,ij}$ are orthogonal to $x_{1z}$. From this result it
follows that b and C are uncorrelated with $h(\hat\pi, \theta)$ given $X_z$.
If the error term $D_xU$ of (16) is normal, then conditional on $X_z$, the variables $h(\hat\pi, \theta)$,
b, and C are also normal. In this case, if the remainder term R in (29) is “well behaved” in
the sense to be defined presently, allowing it to be ignored, the distribution of $h(\hat{\hat\pi}, \hat\theta)$ can
be approximated to order $o(T^{-1})$ using the asymptotic distribution of d. If the distribution of
the error term of (16) is not normal, then the approximate distribution of $h(\hat{\hat\pi}, \hat\theta)$
can be obtained as long as the joint distribution of $h(\hat\pi, \theta)$, b, C, and d is asymptotically
normal and possesses an Edgeworth expansion.
To proceed with estimating model (16), we need to determine its regularity
conditions when the error covariance is given by (20). These conditions should also imply
that an approximation to the distribution of $h(\hat{\hat\pi}, \hat\theta)$ with error $o(T^{-1})$ can be calculated
from the moments of the approximate distribution of a random variable, $\xi$, defined in the
regularity conditions of Assumption V stated next.
Assumption V: (Regularity condition) In the stochastic expansion (29), which
depends on unique coefficients and the unique error term of (9), and which satisfies
Assumptions I-IV, (i) the remainder term R satisfies Pr[T log T |R| > 1] = $o(T^{-1})$; (ii) the
elements of the vectors $x_{1z}$, $x_{2z,i}$ and $x_{2z,ij}$ appearing in (27) and (34) are uniformly
bounded as T tends to infinity; (iii) the error term $\varepsilon_x$ possesses moments up to the fourth
order, and (iv) the vector $\xi$, consisting of the distinct elements of ($h(\hat\pi, \theta)$, b, C, d),
possesses a valid Edgeworth expansion to order $T^{-1}$, that is, the joint density function for
$\xi$ has the approximation
$$\phi_\Omega(\xi)\left[1 + \frac{\sum_i a_i\xi_i + \sum_{i,j,k} a_{ijk}\xi_i\xi_j\xi_k}{\sqrt{T}} + \frac{p(\xi)}{T}\right] + o(T^{-1}), \tag{35}$$
where $\phi_\Omega$ is the density function of a normal random vector with mean zero and covariance
matrix $\Omega$; $p(\xi)$ is an even-order polynomial whose coefficients, along with the a’s, are
uniformly bounded scalars.
CR (1995) also adopted Assumption V for their model (1), but, as we now
demonstrate, unless modified in ways spelled out presently, a model of type (16) violates
Assumption V: as presented, its error covariance matrix in (20) varies over time
as a function of the included regressors, whereas $\Omega$ in (35) is assumed to be constant. To
remedy this shortcoming, we propose to replace each $\Omega$ in CR’s (1995) paper with $\Omega^*$
which, given the definitions of b and C in (29), depends upon $D_x$. This expanded definition
of $\Omega^*$ is further justified because $\xi$ consists of the distinct elements of ($h(\hat\pi, \theta)$, b, C, d).
Note that, more generally, Assumption V is inappropriate for model (2), which is the
same as CR’s model (1) and other models employed in applied work, because, given results
(i)-(iii) at the end of Section 2.2, such models are plagued by non-uniqueness of their
coefficients and error terms. Since (16) is free of these problems when Assumptions I-IV
hold, Assumption V with $\Omega$ replaced by $\Omega^*$ appears to be appropriate for (16) and implies
that an approximation to the distribution of $h(\hat{\hat\pi}, \hat\theta)$ with error $o(T^{-1})$ can be calculated
from the moments of the approximate distribution of $\xi$.
It follows from Cavanagh (1983) and Rothenberg (1984, 1988) that the distribution
of $h(\hat{\hat\pi}, \hat\theta)$ is the same, to order $T^{-1}$, as the distribution of
$$h(\hat\pi, \theta) + \frac{1}{\sqrt{T}}\,E\big(b'd \mid h(\hat\pi, \theta)\big) + \frac{1}{T}\left[E\big(d'Cd \mid h(\hat\pi, \theta)\big) + \frac{1}{2}\left(\frac{\partial}{\partial h(\hat\pi, \theta)} - h(\hat\pi, \theta)\right)\operatorname{var}\big(b'd \mid h(\hat\pi, \theta)\big)\right] \tag{36}$$
CR (1995, p. 279) established that $h(\hat\pi, \theta)$ is asymptotically independent of b and C.
We use the m × m matrices $\Sigma_{bb}$, $\Sigma_{dd}$, and $\Sigma_{bd}$ to denote the asymptotic variance and
covariance matrices for the vectors b and d and the m-vector $\sigma_{Ad}$ to denote the asymptotic
covariance between $h(\hat\pi, \theta)$ and d, as in CR (1995, p. 279). We can use CR’s (1995) method
of proof to show that, based on moments of the $o(T^{-1})$ Edgeworth approximation to the
distributions,
(i) the skewness of $h(\hat{\hat\pi}, \hat\theta)$ is always the same as that of $h(\hat\pi, \theta)$;
(ii) if $\Sigma_{bd}$ = 0, then the mean of $h(\hat{\hat\pi}, \hat\theta)$ is the same as that of $h(\hat\pi, \theta)$;
(iii) if $\sigma_{Ad}$ = 0, then the kurtosis of $h(\hat{\hat\pi}, \hat\theta)$ is the same as that of $h(\hat\pi, \theta)$.
It can further be shown that if $h(\hat\pi, \theta)$ is asymptotically uncorrelated with d, then
$$\operatorname{var}\big(h(\hat{\hat\pi}, \hat\theta)\big) \approx \operatorname{var}\big(h(\hat\pi, \theta)\big) + \frac{\operatorname{var}(b'd)}{T} + \frac{2\sqrt{T}\,E\big(h(\hat\pi, \theta)\,b'd\big)}{T} \tag{37}$$
This approximate variance of $h(\hat{\hat\pi}, \hat\theta)$ is necessarily greater than the variance of $h(\hat\pi, \theta)$ if
the third term on the right-hand side of approximate equation (37) is nonnegative.
4. CONCLUSIONS
Econometric work seeking to identify unique causal relationships cannot produce
consistent estimators if it does not take into account the implications of PS’ (1984, 1988)
conclusion that in any model, the included regressors cannot be uncorrelated with every
omitted relevant regressor. This result implies that in instances where these omitted relevant
regressors constitute the error term of a model, the generalized least squares estimators of its
coefficients are inconsistent. Also, in this case, the coefficients and error term of the model
are not unique. This paper offers a way around this dilemma with a methodology that does
yield unique coefficients and error terms. The proposed estimator involves the use of
functional forms that may not be known, a problem that we solve by allowing all the
coefficients in a model to vary freely with the observations on the dependent variable and
the included regressors. As we show, the unique coefficient on each included regressor of a
model with unique error term is the sum of direct and indirect effects of the regressor on the
dependent variable. An issue arises if the omitted relevant regressors chosen to define these
direct and indirect effects are not identified, because in this instance, there exists no
meaningful distinction between direct and indirect effects. To solve this problem, we offer
a two-level formulation of a time-varying coefficient model, where each coefficient on an
included regressor is expressed as a function of certain coefficient drivers that absorb indirect
effects of the included regressor present in the coefficient. For completeness, we study the
second-order properties of the feasible GLS estimator of the coefficients of such a two-level
formulation. To conclude, the methodology described in this paper makes it possible to write
down and estimate models that allow valid causal interpretations of their coefficients, a result
that we hope will be welcomed in the profession.
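The two-level formulation summarized above can be sketched in Python under strong simplifying assumptions that are ours, not the paper's: one included regressor `x`, one coefficient driver `z`, coefficients that are linear in the driver plus mutually uncorrelated random errors, and variance components estimated by regressing squared OLS residuals on (1, x²). All names (`fgls_two_level`, `pi_true`, and so on) are hypothetical. This is a minimal sketch of a feasible GLS step for such a model, not the estimator whose second-order properties are studied in this paper.

```python
import numpy as np


def fgls_two_level(y, x, z):
    """Feasible GLS for y_t = g0_t + g1_t * x_t, with g_jt = p_j0 + p_j1 * z_t + e_jt.

    Substituting the second level into the first gives
        y_t = p_00 + p_01 z_t + p_10 x_t + p_11 z_t x_t + (e_0t + e_1t x_t),
    whose error variance is s0 + s1 * x_t**2 when e_0t and e_1t are
    uncorrelated (an assumption of this sketch).
    """
    T = len(y)
    W = np.column_stack([np.ones(T), z, x, z * x])  # stacked regressors

    # Step 1: OLS to obtain residuals.
    pi_ols, *_ = np.linalg.lstsq(W, y, rcond=None)
    u2 = (y - W @ pi_ols) ** 2

    # Step 2: estimate variance components from squared residuals.
    V = np.column_stack([np.ones(T), x ** 2])
    s, *_ = np.linalg.lstsq(V, u2, rcond=None)
    v = np.clip(V @ s, 1e-6, None)  # guard against nonpositive fitted variances

    # Step 3: weighted least squares with the estimated variances.
    w = 1.0 / np.sqrt(v)
    pi_fgls, *_ = np.linalg.lstsq(W * w[:, None], y * w, rcond=None)
    return pi_ols, pi_fgls, s


# Simulated data (all parameter values are illustrative assumptions).
rng = np.random.default_rng(1)
T = 4000
z = rng.standard_normal(T)            # coefficient driver
x = 1.0 + rng.standard_normal(T)      # included regressor
pi_true = np.array([1.0, 0.5, 2.0, -0.7])  # (p_00, p_01, p_10, p_11)

g0 = pi_true[0] + pi_true[1] * z + 0.5 * rng.standard_normal(T)
g1 = pi_true[2] + pi_true[3] * z + 0.3 * rng.standard_normal(T)
y = g0 + g1 * x

pi_ols, pi_fgls, s = fgls_two_level(y, x, z)
print("FGLS estimates:", pi_fgls)
```

In this sketch the coefficient on `x` in period t is `p_10 + p_11 * z_t` plus an error, so the driver `z` carries the systematic variation in that coefficient. The weighting step only corrects for the heteroskedasticity that the random coefficient errors induce.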
REFERENCES
AIGNER, D. J. and ZELLNER, A. (1988). Editors’ introduction. J. Econom., 39, 1-5.
ANDERSON, T.W. (1971). The Statistical Analysis of Time Series, John Wiley & Sons, New
York.
BASMANN, R. L. (1988). Causality tests and observationally equivalent representations of
econometric models. J. Econom., 39, 69-104.
CAVANAGH, C. L. (1983). Hypothesis testing in models with discrete dependent variables.
PhD thesis, University of California at Berkeley.
CAVANAGH, C. L. and ROTHENBERG, T. J. (1995). Generalized least squares with
nonnormal errors. In Advances in econometrics and quantitative economics, G. S.
Maddala, P. C. B. Phillips, T. N. Srinivasan (Eds.). Blackwell Publishers, Inc.,
Cambridge, MA, USA.
CHANG, I., HALLAHAN, C. and SWAMY, P. A. V. B. (1992). Efficient computation of
stochastic coefficients models. In Computational economics and econometrics, H. M.
Amman, D. A. Belsley, L. F. Pau (Eds.). Kluwer Academic Publishers, Boston, MA.
CHANG, I., SWAMY, P. A. V. B., HALLAHAN, C. and TAVLAS, G. S. (2000). A computational
approach to finding causal economic laws. Comput. Econ., 16, 105-136.
DURBIN, J. and KOOPMAN, S. J. (2001). Time series analysis by state space methods. Oxford
University Press, Oxford.
FREEDMAN, D. A. (2005). What is the error term in a regression equation?
https://www.stat.berkeley.edu/~census/epsilon.pdf.
GOLDBERGER, A. S. (1964). Econometric theory. John Wiley & Sons, New York.
GOLDBERGER, A. S. (1987). Functional form and utility: A review of consumer demand
theory. Westview Press, Boulder.
GREENE, W. H. (2012). Econometric analysis, Seventh edition. Prentice Hall-Pearson,
Upper Saddle River, NJ.
HECKMAN, J. J. and SCHMIERER, D. (2010). Tests of hypotheses arising in the correlated
random coefficient model. Economic Modelling, 27, 1355-1367.
HILDRETH, C. and HOUCK, J. P. (1968). Some estimators for a model with random
coefficients, J. Am. Stat. Assoc., 63, 584-595.
LEHMANN, E. L. and CASELLA, G. (1998). Theory of point estimation, Second edition.
Springer, New York.
LEHMANN, E. L. (1999). Elements of large sample theory. Springer, New York.
PRATT, J. W. and SCHLAIFER, R. (1984). On the nature and discovery of structure (with
discussion). J. Am. Stat. Assoc., 79, 9-21, 29-33.
PRATT, J. W. and SCHLAIFER, R. (1988). On the interpretation and observation of laws. J.
Econom., 39, 23-52.
RAO, C. R. (1973). Linear Statistical Inference and Its Applications, Second edition. John
Wiley & Sons, New York.
RAO, J. N. K. (2003). Small area estimation. John Wiley & Sons, Inc., New York.
ROTHENBERG, T. J. (1984). Approximating the distribution of econometric estimators and
test statistics. In Handbook of econometrics, Volume 2, Z. Griliches and M. D. Intriligator
(Eds.). North-Holland, Amsterdam.
ROTHENBERG, T. J. (1988). Approximate power functions for some robust tests of
regression coefficients. Econometrica, 56, 997-1019.
RUBIN, D. B. (1978). Bayesian inference for causal effects. Ann. Statist., 6, 34-58.
SKYRMS, B. (1988). Probability and causation. J. Econom., 39, 53-68.
SWAMY, P. A. V. B. and TINSLEY, P. A. (1980). Linear prediction and estimation methods
for regression models with stationary stochastic coefficients. J. Econom., 12, 103-142.
SWAMY, P. A. V. B., CONWAY, R. K. and VON ZUR MUEHLEN, P. (1985). The foundations
of econometrics – Are there any? (with discussion). Econometric Reviews, 4, 1-61, 101-119.
SWAMY, P. A. V. B. and VON ZUR MUEHLEN, P. (1988). Further thoughts on testing for
causality with econometric models. J. Econom., 39, 105-147.
SWAMY, P. A. V. B., MEHTA, J. S., TAVLAS, G. S. and HALL, S. G. (2014). Small area
estimation with correctly specified linking models. In Recent advances in estimating
nonlinear models with applications in economics and finance, J. Ma and M. Wohar
(Eds.). Springer, New York.
SWAMY, P. A. V. B., MEHTA, J. S., TAVLAS, G. S. and HALL, S. G. (2015). Two applications
of the random coefficient procedure: Correcting for misspecifications in a small area
level model and resolving Simpson’s paradox. Economic Modelling, 45, 93-98.
SWAMY, P. A. V. B., MEHTA, J. S. and CHANG, I. (2017). Endogeneity, time-varying
coefficients, and incorrect vs. correct ways of specifying the error terms of econometric
models. Econometrics, 5, 8; doi: 10.3390/econometrics5010008.
SWAMY, P. A. V. B., VON ZUR MUEHLEN, P., MEHTA, J. S. and CHANG, I. (2018).
Alternative approaches to the econometrics of panel data. In Panel data econometrics
theory, Mike Tsionas (Ed.). Academic Press/Elsevier, 50 Hampshire Street, 5th Floor,
Cambridge, MA.
ZELLNER, A. (1979). Causality and econometrics. In Three aspects of policy and
policymaking, K. Brunner and A.H. Meltzer (Eds.). North Holland, Amsterdam.
ZELLNER, A. (1988). Causality and causal laws in economics. J. Econom., 39, 7-21.

The state of econometrics

  • 1. 1 THE STATE OF ECONOMETRICS AFTER JOHN W. PRATT, ROBERT SCHLAIFER, BRIAN SKYRMS, AND ROBERT L. BASMANN By P. A. V. B. SWAMY Federal Reserve Board (Retired), Washington, DC 20551, USA; swamyparavastu@hotmail.com PETER VON ZUR MUEHLEN Federal Reserve Board (Retired), Washington, DC 20551, USA; pmuehlen@verizon.net J. S. MEHTA Department of Mathematics (Retired), Temple University, Philadelphia, PA 19122, USA; mehta1007@comcast.net and I-LOK CHANG Department of Mathematics (Retired), American University, Washington, DC 20016, USA; ilchang@verizon.net Correspondence: swamyparavastu@hotmail.com; Tel.: 571-435-4979 SUMMARY. Thirty-five years ago, introducing a distinction between factors and concomitants in regressions, John W. Pratt and Robert Schlaifer determined that the error term in a regression represents the net effect of omitted relevant regressors. As this paper demonstrates, this assumption poses a problem whenever the purpose of a model is to explain an economic phenomenon, because the estimated coefficients as well as the error will be wrong in the sense that they are not unique. But a model that is not unique cannot be a causal description of unique events in the real world. For a remedy, this paper presents a methodology based on conditions under which the error term and the coefficients on regressors included in a model do become unique, where the latter represent the sums of direct and indirect effects on the dependent variable, with omitted but relevant regressors having been chosen to define both these effects. The two effects corresponding to any particular omitted relevant regressor can be learned only by converting that regressor into an included regressor. 
For those cases where omitted relevant regressors are not identified, thereby preventing a meaningful distinction between direct and indirect effects, we introduce so-called coefficient drivers and a feasible method of generalized least squares, permitting a “total-effect” causal interpretation of the coefficient on each regressor included in a model. Key words: unique time-varying coefficient and unique error term; direct effect; indirect effect; and total effect of a regressor; omitted relevant regressor; coefficient driver; measurement-error bias JEL Classifications: C16; C21; C22
  • 2. 2 1. INTRODUCTION This paper seeks to address and then remedy a problem that, in our view, has beset econometrics since its inception: the lack of uniqueness of results inherent in any of the regression methodologies currently in use. As we will show, this problem is compounded by an apparent disregard of omitted-regressor biases that afflict current practice in econometrics. This paper is not the first to call attention to this issue, the first alarms having been raised as long as 35 years ago by Pratt and Schlaifer’s (1984, 1988) papers calling attention to a fundamental mistake in econometrics: the common assumption that the included regressors in a model are uncorrelated, mean independent, or independent of the model error term. Since the error term of a model is made up of omitted relevant regressors which are unknown, and since, according to the authors, the included regressors in the model cannot be uncorrelated with every omitted relevant regressor that affects the dependent variable in the regression, the pervasive assumption that the included regressors are not correlated with the effect of unidentified omitted relevant regressors is meaningless. An even stronger assumption that the included regressors are not correlated with any omitted relevant regressor is patently false. Compelling and important as their contribution has been, it appears that up to now, it has been widely ignored. One hurdle, aside from a possible refusal to accept Pratt and Schlaifer’s very arguments, is the perceived difficulty of any remedy that would accommodate their critique in meaningful ways, including the techniques presented here and in earlier writings by some of the co-authors of this paper. To date, only Swamy, Peter von zur Muehlen, I-Lok Chang, Jathinder Mehta, George Tavlas, and various coauthors have actually estimated unique models using their proposed methodology, one
  • 3. 3 that has taken time and effort to refine. A critical element in Swamy and I-Lok Chang’s estimation method is the use of so-called “coefficient drivers,” observable variables that enter a regression model but not in ways with which econometricians are familiar. The necessary search for coefficient drivers, which relies on practical experience and theoretical insight, and the subtleties of their application, may be a third obstacle to wider adoption of methodologies that assure accurate estimation of unique coefficients and error terms. Econometrics serves two functions, one being practical and the other metaphysical. Its most practical function is to predict economic events where “why” matters less than “how.” If a model predicts well over time, it is considered successful, regardless of how it was constructed. By contrast, the metaphysical purpose of econometrics is to explain empirical relationships and to provide causal interpretations. It is this latter -- conceptually far more complex -- endeavor that this paper seeks to treat. For purposes of prediction Swamy and Tinsley (1980, pp. 111 and 112) developed a feasible minimum mean square error linear predictor of needed future values of the dependent variable in a general predictive model under the novel (for that time) assumption that the coefficients of an econometric model follow a vector autoregressive and moving average (ARMA) process. Regarding the metaphysical purpose of providing causal interpretations, an early contribution by Swamy and von zur Muehlen (1988) addressed a number of fundamental issues that arise when the task is to determine causality in probabilistic models, with special
  • 4. 4 attention given to inductive procedures used in the literature that had seemed to be in violation of Jeffreys’ rules 1 and 2 involving the requirement of logical consistency.1 In the present context, the issue of causal interpretation arises when, as is often the case, the error term in a regression is thought to represent the effects of relevant regressors left out of the equation. As we shall demonstrate, in such instances, the coefficients and error terms themselves are not unique and therefore lack the interpretation of causal effects, because when coefficients cannot be given unique interpretations, they cannot be reflections of the real world which, by definition, is unique. In this paper, we shall show that proper causal interpretations are possible if we replace commonly held restrictive assumptions underlying models of the kind considered here with meaningful alternatives. A precedent for this is Freedman (2005), who, drawing on Pratt and Schlaifer’s (1984, 1988) insight that if the error represents the net effect of omitted relevant regressors, then it must be correlated with the included regressors, also concluded that standard assumptions fail. For the special case of a model in which subjects are independently and identically distributed (iid) and all variables are jointly normal with expectation zero, he showed that necessary conditions to prevent estimating the “wrong parameters” were that the combined effect of the omitted regressors (i) is independent of each variable included in the equation, (ii) is independent across subjects, and (iii) has expectation 0. But he also indicated that these assumptions are unrealistic and therefore “hard to swallow.” The purpose of this paper then is to develop conditions that are realistic and palatable and to embed them in a methodology under which 1 The importance of the distinctions raised by Swamy and von zur Muehlen (1988) was highlighted in Aigner and Zellner (1988, p. 3) and Zellner (1988, pp. 
7 and 8).
  • 5. 5 valid causal interpretations of a model do become possible and even natural. Like Freedman, we take our cue from Pratt and Schlaifer (1984, 1988) (hereafter PS), who defined a class of relations called “laws” and provided conditions under which such laws can be observed in data. PS did not dwell on exact epistemologies of “causation,” so we shall follow Basmann (1988), who defined causality as designating a property of the real world, meaning that all relations considered to be causal orderings must be unique descriptions of the real world. The ideas of PS and Basmann lead us to consider as causal models only those that are free of mis-specifications in the sense that their coefficients and error terms are unique.2 Any discussion of causation in the present context would not be complete without considering the distinction between deterministic and statistical causation, so important for any treatment of causation in econometrics. Swamy, Conway, and von zur Muehlen (1985) discussed such a distinction in the context of a treatise on the possibilities and limitations of logical and causal inference in an Aristotelian framework when some statements are probabilistic. Subsequently, Swamy and von zur Muehlen (1988, pp. 141-144) extended the concept of logical entailment -- the possibility of inferring some Q from P via modus ponens -- to probabilistic entailment -- the possibility of valid probabilistic inference of some statement S with given probability from some other statement S also having some probability, without violating basic Aristotelian principles of logic. Finally, in a paper that has specific applicability to the present discussion, Skyrms (1988, p. 59) wrote that statistical 2 Here, “mis-specifications-free models” describe “real-world relations,” and mis-specifications-free models with unique coefficients and error terms describe unique real-world relations. 
Since causality designates a property of the real-world, causal relations being unique in the real-world, causality is used here to designate a property of mis-specifications-free models with unique coefficients and error terms.
  • 6. 6 causation is positive statistical relevance which does not disappear when we control for all relevant pre-existing conditions, so that true correlation (or statistical causation) is not reduced to zero when we control for all relevant pre-existing conditions, a point that we shall take quite seriously. The remainder of this paper is divided into three sections. Section 2 considers the question of why models with non-unique coefficients and error terms cannot be causal. The section also shows why a random coefficient model is based on an assumption that cannot be satisfied and why so-called state-space models have non-unique coefficients and error terms. Section 3 contains the body of this paper, giving a step-by-step derivation of a non-linear model with unique coefficients and error term. The concepts involved in this derivation are: time-variability of coefficients and “sufficient sets” of omitted relevant regressors, i.e., of omitted regressors that affect the dependent variable of the model. Accordingly, we show that the unknown true functional form of a unique causal model requires all of its coefficients to be time-variant. The two most important take-aways from this paper are: (1) when omitted relevant regressors are not identified, there exists no meaningful distinction between direct and indirect effects on the dependent variable, which, along with measurement-error biases, arise as components of the unique coefficients on the included regressors; and (2) we derive a feasible methodology based on the utilization of coefficient drivers that enables a representation of unique regression coefficients as the sums of direct and indirect effects (or total effects), thereby permitting valid causal interpretations involving real-world phenomena.
  • 7. 7 To present these main results, Section 3 is divided into six subsections. Section 3.1 introduces the concept of coefficient drivers and the admissibility condition they should satisfy. Section 3.2 shows that these drivers are useful for estimating the model when omitted relevant regressors are not identified. Section 3.3 gives a matrix formulation of the model when its time-varying coefficients are represented as linear stochastic functions of certain observable time-varying coefficient drivers. Section 3.4 states the assumptions under which the model in Section 3.3 is estimated. Section 3.5 derives an optimal predictor of the error term and the estimators of the coefficients and their components. Section 3.6 derives the second-order properties of the feasible generalized least squares (GLS) estimators of the coefficients of the model presented in Section 3.3, utilizing a methodology introduced by Cavanagh and Rothenberg (1995) with a modification. Section 4 concludes. 2. REQUIREMENTS OF A MODEL TO BE CAUSAL Consider the linear model $Y = X\beta + U$ (1) Here it is assumed that $y$ is a $T \times 1$ vector of values taken by the dependent variable $Y_t$, t = 1, …, T, which are random and written as a column vector Y, as in equation (1), X is a $T \times K$ nonrandom design matrix of observations on K regressors which are called “the included regressors,” the rank of X is K, $u$ is a $T \times 1$ vector of unobserved values taken by the random error terms $U_t$, t = 1, …, T, which are random and written as a column vector U, as in (1), and T is the number of periods. It is further assumed that $E(U_t|X) = 0$, $\forall\, t$; and $E(U_t U_s|X)$ is an element of the error covariance matrix, which is nonsingular and a smooth function of an unknown finite-dimensional parameter vector, the sample size T is large relative to K and
  • 8. 8 the number of rows of that parameter vector.3 In this section, we do not assume that $y$ and X contain measurement errors. We relax this assumption in Section 3. 2.1 Three interpretations of the error term in a model In order to dwell more deeply on the nature of causal modeling, it is useful to discuss three interpretations of u that have been either explicit or implicit in the econometrics and statistics literature. Interpretation (I). The value (u) of the disturbance (U) in (1) arises because it is extremely unlikely that in a model such as (1), all influences on y have been captured, no matter how thorough and careful its specification. If K is less than the total number of all these influences, then u represents all omitted regressors that affect y or simply all omitted relevant regressors. To provide intuition for their principal argument regarding the nature of u, Pratt and Schlaifer (1984, 1988) provided two examples, which we now repeat. Example 1. To show that a law can be recognized if it is known which excluded variable its error term depends on, Pratt and Schlaifer (1984, pp. 11-12) ask us to imagine a cylinder sealed at the bottom and closed at the top by a movable piston. Now, let P, V, and T denote the logarithms of the pressure, volume, and absolute temperature of the air. The deterministic law P = C - V + T has an alternative form P = C - 1.4V + S because at ordinary temperatures, the entropy of the air in the cylinder, denoted by S, is a nearly linear function of T and V, i.e., S = T + 0.4V. If it is known that T is held constant, then the relation P = C - V + T between P and V is called “Boyle’s law.” If it is known that S is held constant (by allowing no heat 3 Throughout this paper, following Lehmann and Casella (1998, pp. 180-181), we maintain the distinction between each random variable and its value by using an upper-case symbol to denote the former and a lower-case symbol to denote the latter.
  • 9. 9 flow into or out of the cylinder), then the resulting relation, P = C - 1.4V + S, between P and V is called “the adiabatic law.” Alternatively, if some non-experimental data on P and V are available but none on T or S, then the error term u in the presumed linear stochastic law $P = \beta_0 + V\beta_1 + u$ is only known as being made up of one of the two excluded variables T and S, although it is not known which one. Since both T and S are excluded from the proposed linear stochastic law, $P = \beta_0 + V\beta_1 + u$, and since by the previous relation S = T + 0.4V, at least one of either T or S must be correlated with V, an assumption that V is not correlated with the effect of an unidentified excluded variable is meaningless, and the stronger assumption that V is not correlated with any excluded variable is surely false. Example 2. To show that excluded variables are not unique, Pratt and Schlaifer (1988, pp. 49-50) ask us to imagine a land, called Utopia, in which stocks are traded on only one day of each year, such that the price of each firm’s stock is wholly determined by the firm’s known earnings and dividends. Then the price is also wholly determined by earnings and retained earnings. PS conclude that “if one wants to estimate a law relating price to earnings only, then one cannot meaningfully talk about ‘the’ excluded variable (singular) because it could equally well be either dividends or retained earnings or some other function of earnings and dividends.” So, its error term depends on an omitted relevant regressor, which is “either dividends or retained earnings or some other function of earnings and dividends.” Thus, excluded variables are not unique! Furthermore, if the variable called “earnings” is independent of dividends, then it is dependent on retained earnings and vice versa, meaning that earnings cannot be independent of ‘the’ excluded variables (plural). 
If it is possible to make a forecast of dividends, then price is functionally determined by earnings, that forecast,
  • 10. 10 and the residual of the regression of dividends on the forecast. If we regress stock price on earnings only, then all that we can hope to learn is the ‘total’ effect of earnings, consisting in part of the ‘direct’ effect of earnings given (say) dividends or retained earnings, and in part of the ‘indirect’ effect that may result from the fact that earnings may affect dividends or retained earnings, which may in turn affect price. Crucially, although the direct and indirect effects of earnings depend on the omitted regressor chosen to define them, the total effect does not! Interpretation (II). Heckman and Schmierer (2010, p. 1356) and others working on nonparametric regressions write equation (1) as Y = E(Y|X) + U and assume that E(U|X) = 0. In words, the disturbance is the deviation of Y from the conditional mean E(Y|X) given X (see Greene 2012, p. 212). Heckman used this formulation to counter those critics of econometrics who had argued that econometricians work with models that have mysterious error terms. However, the conditional mean E(Y|X) does not always exist.4 According to Heckman and Schmierer, U represents omitted relevant regressors subject to the condition that E(U|X) = 0, without the qualification “whenever E(Y|X) exists.” Note that this interpretation (II) is more restrictive than interpretation (I). Interpretation (III). Consider the following model consisting of a sampling model $\hat{y} = y + e_1 + e_2$ and a linking model $Y = X\beta + U$ with U = ZV, where $\hat{y}$, $y$, Y, and U are $T \times 1$ vectors, $\hat{y}$ is the vector of survey estimates of the unknown elements of $y$, which is the vector of values taken by the random vector Y, $e_1$ is a $T \times 1$ vector of unknown 4 Sufficient conditions for its existence are given in Rao, C.R. (1973, p. 97).
  • 11. 11 sampling errors, $e_2$ is a $T \times 1$ vector of unknown non-sampling errors, Z is a known $T \times h$ matrix, the random vector V and its value v are $h \times 1$ vectors, and the observation subscript t indexes areas. Here, the model error is U = ZV, and its value is denoted by Zv. As special instances of this case, the small area models in Rao, J.N.K. (2003, p. 96) are based on two approximations: (i) $\hat{y} = X\beta + Zv + e_1$, under the assumption that the survey weights in $\hat{y}$ are adjusted so that $e_2 = 0$, (ii) the model error is Zv that does not represent any omitted relevant regressors. By contrast, Swamy, Mehta, Tavlas and Hall (2014, pp. 207-211) assume that $e_2 \neq 0$, so that $\hat{y} = y + e_1 + e_2$. They estimate this model based on a method of simultaneously estimating the common estimand $y$ of two sample estimators and the sums of their sampling and non-sampling errors. If a model such as equation (1) is meant to identify a true and causal relationship, then the two approximations just described present a serious problem. In many instances, a researcher may not care about causal interpretations and be principally focused on prediction and correlation, so our concern with approximations may not matter. However, we suspect that frequently, the temptation to interpret the coefficients estimated in such models as “effects” rather than mere “correlations” is difficult to resist even by the analyst. In such instances, the above criticism of two approximations cannot be ignored. 2.2 Why model (1) is not causal We now adopt interpretation (I) of $u$ and summarize why and how model (1) is non-causal. In order to do so, we need (i) to define what we mean by causal and (ii) to explain why model (1) does not satisfy Skyrms’ (1988, p. 59) conditions for statistical causation cited in the Introduction and adopted here. We resort to an idea, first offered by PS (1984, p. 11), that
  • 12. 12 without interpreting the error vector, $u$, often called “disturbance,” it is not possible to show whether an estimator of $\beta$ is consistent.5 A natural and conventional interpretation of $u$ is that it is made up of omitted relevant regressors. But when relevant pre-existing conditions cited by Skyrms (1988, p. 59) are unknown, as they usually are, their control is possible only if the omitted relevant regressors are augmented by all relevant pre-existing conditions in ways that we demonstrate in Section 3. To make the case, consider the model in (1) and let $u = W\omega$, where $W$ is an unknown $T \times L$ matrix containing all omitted relevant regressors and all relevant pre-existing conditions for all T periods, and $\omega$ is an unknown $L \times 1$ vector of coefficients. The columns of X in conjunction with those columns of W that contain all omitted relevant regressors, are at least sufficient to determine the value of $y$. The remaining columns of W containing all relevant pre-existing conditions can be controlled to reduce any spurious correlations implied by (1) to zero, as we show below. To avoid leaving out any omitted relevant regressors or relevant pre-existing conditions in W, we assume that L is unknown. For a single observation, say the tth, write model (1) as $y_t = x_t\beta + u_t$, with $u_t = w_t\omega$ (2) where t indexes time or observations, $y_t$ is the tth element of $y$, $x_t = (x_{0t}, x_{1t}, \ldots, x_{K-1,t})$ with $x_{0t} = 1$, $\forall\, t$, is the tth row of $X$, $u_t$ is the tth element of $u$, $w_t$ is the tth row of $W$, and the coefficient vectors $\beta$ and $\omega$ are not known. 5 This is a conclusion also reached by Freedman (2005).
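The non-uniqueness of the omitted regressors in a model such as (2) can be illustrated numerically with the Utopia example of Section 2.1. What follows is a minimal sketch and not part of the original analysis: the pricing law (price = 3 × earnings + 2 × dividends), the payout rule, the sample size, and the helper `ols` are all hypothetical choices made for illustration. Under these assumptions, it checks that the direct and indirect effects of earnings change with the omitted regressor chosen to define them (dividends or retained earnings), while the total effect obtained by regressing price on earnings alone does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: dividends depend partly on earnings (assumed payout rule).
earnings = rng.normal(10.0, 2.0, n)
dividends = 0.4 * earnings + rng.normal(0.0, 0.5, n)
retained = earnings - dividends

# Hypothetical deterministic pricing law (the coefficients 3 and 2 are assumptions).
price = 3.0 * earnings + 2.0 * dividends


def ols(y, *regressors):
    """Least-squares coefficients of y on an intercept and the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Direct effect of earnings: depends on which other regressor is held fixed.
coef_div = ols(price, earnings, dividends)
coef_ret = ols(price, earnings, retained)
direct_div = coef_div[1]
direct_ret = coef_ret[1]

# Indirect effect = (effect of the other regressor on price)
#                 * (effect of earnings on that regressor).
indirect_div = coef_div[2] * ols(dividends, earnings)[1]
indirect_ret = coef_ret[2] * ols(retained, earnings)[1]

# Total effect: regress price on earnings only.
total = ols(price, earnings)[1]

print(f"dividends as omitted regressor: direct {direct_div:.3f}, indirect {indirect_div:.3f}")
print(f"retained as omitted regressor:  direct {direct_ret:.3f}, indirect {indirect_ret:.3f}")
print(f"total effect of earnings:       {total:.3f}")
```

In this sketch the two decompositions disagree on the direct and indirect effects yet sum to the same total, which is the sense in which only the total effect is well defined when the omitted regressor is not identified.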
  • 13. 13 To establish that model (2) cannot be causal, we note the following: (i) Under Swamy, Mehta and Chang’s (2017) definition of uniqueness applied to the coefficients and error term in any model, the coefficients and error term of equation (2) cannot be unique. Therefore, it is incorrect to refer to “the” omitted relevant regressors or to “the” omitted pre-existing conditions of (2). Fortunately, we are able to show in Section 3 that with certain changes, the coefficients and error term in (2) can be made to be unique. (ii) The included regressors $x_t$ cannot be uncorrelated with every omitted relevant regressor (in $w_t$), as PS (1984, pp. 13-14) have shown. We will also show in Section 3 that the unique error term is made up of certain “sufficient sets” of omitted relevant regressors or omitted pre-existing conditions. The included regressors can be independent of such sufficient sets. (iii) The vectors $w_t$, $\beta$ and $\omega$ are not unique, as PS (1984, pp. 13-14) have also shown. Result (ii) implies that at least some of the elements of $w_t$ must be correlated with $x_t$. Therefore, an assumption that the included regressors are not correlated with unidentified omitted relevant regressors themselves is meaningless (see PS 1988, p. 34). This result compels us not to make such a meaningless assumption below. Assertions (i)-(iii) lay the foundation for why $x_t$ (which is not randomized) cannot be exogenous and is therefore correlated with $u_t$, implying that Pratt and Schlaifer’s (1988) conditions for a law to be
  • 14. 14 observed in data are not satisfied and model (2) is therefore not causal.6 The next few subsections flesh out some further details to motivate the approach taken in Section 3. 2.2 Digression on different types of causation Skyrms’ (1988) discussion of the three types of causation, viz., deterministic, probabilistic, and statistical, is a treatise par excellence and is relevant for this paper which adopts probabilistic and statistical concepts of causation.7 As a historical note, we mention Zellner’s (1979) enthusiasm for Feigl’s definition of causality as “predictability according to a law or set of laws,” and his subsequent observation (Zellner 1988, p. 12) that in the preceding two decades, not a single new causal economic law had been produced by all the work done on definitions of causality and tests for causality. Feigl’s definition pointedly raises this question: What then is a law? In answer, PS (1988, pp. 28 and 35) used Rubin’s (1978) potential-value notation to formulate a law relating the dependent variable of (2) to its included regressors, asserting that it is the existence of these “potential values” that distinguishes a law from a statistical association, and that it is only in terms of these potential values that it is possible to state the conditions under which a law can be observed in data. Unfortunately, the correct functional form of such a law is typically unknown, and so these conditions are difficult to verify. Indeed, in their path-breaking work, PS (1988) enumerate conditions for observability of laws in data that are, as just noted, unverifiable. A related issue concerns measurement errors that have also not been adequately treated in econometric 6 Goldberger (1964, pp. 380-388) showed that only incomplete theories can have exogenous variables. Here, we have established that even incomplete theories represented by single-equation models with errors made up of omitted relevant regressors cannot have exogenous regressors. 
7 Skyrms (1988, p. 57) pointed out that the relations between different conceptions of probability are of central importance to questions of probabilistic causation. For this reason, we need to emphasize here that we use frequentist probability.
  • 15. 15 work. We take this up later, when we develop our own alternative methodology that depends on utilization of “coefficient drivers” to be defined shortly. As we show in the next section, unlike any other method in the literature, ours controls up front for all relevant pre-existing conditions, leading to a model whose coefficients and error term are unique. Our approach has the additional and desirable advantage of consistency with Basmann’s (1988) definition of causality as a property describing the real world in which causal relations and orderings are unique.8 As we show later, only a correct mis-specifications-free model with unique coefficients and error term can be considered as describing a real-world relation, where the exact meaning of the term in italics, so central to the paper, will become clear in Section 3. Before we do so, we describe two common instances covering a broad range of econometric modeling, in which neither coefficients nor error terms are unique. 2.3 Random coefficients in cross-section estimation Suppose we wish to estimate (2) using a single cross-section data set. To do so, change the subscript t in (2) to i which indexes cross-sectional units. As in other cases, inter-individual heterogeneity may be present in this cross-sectional study. For this reason, we need to change (2) to Hildreth and Houck’s (1968) representation, which is $y_i = \sum_{j=0}^{K-1} x_{ji}(\beta_j + \varepsilon_{ji})$ (i = 1, …, n) (3) 8 A referee has thoughtfully suggested that this definition of causality may need some further discussion, as there are other definitions around that do not require uniqueness for a causal description. But are these other definitions correct or relevant? Basmann (1988, p. 99) answered this question with the observation that “None of the generally accepted meanings of ‘causality’ fails to involve the notion that causation is a real-world, invariant relation between events rather than a mere property of a linguistic representation. 
To use ‘causality’ in the latter sense may court eventual public … [devaluation] and dismissal of econometric research and econometricians.” Any relationship with non-unique coefficients and error term is by definition mis-specified. But how can a mis-specified relation reflect any kind of acceptable causation?
  • 16. 16 where $\beta_{ji} = \beta_j + \varepsilon_{ji}$, j = 0, 1, …, K – 1, $\beta_{ji}$ and $\varepsilon_{ji}$ are the values taken by the corresponding random coefficient and random error, respectively, the random coefficient is distributed with mean $\beta_j$, and where it is further assumed that $E(\varepsilon_{ji}|X) = 0$, $\forall$ j and i, and $E(\varepsilon_{ji}\varepsilon_{j'i'}|X) = \sigma_j^2$ if $j = j'$ and $i = i'$, and = 0 otherwise (4) Suppose that $x_{0i} = 1$, $\forall\, i$, and that the error term $\varepsilon_{0i}$ is made up of all omitted relevant regressors that affect $y_i$. Hildreth and Houck (1968) did not have the benefit of PS’ (1984, 1988) later insight that when the disturbance $\varepsilon_{0i}$ is composed of all omitted relevant regressors, as it is in (2), the included regressors with $x_i = (1, x_{1i}, \ldots, x_{K-1,i})$ as the vector of their values cannot be uncorrelated with every one of those omitted relevant regressors (or with $\varepsilon_{0i}$) and can therefore not be exogenous. Also, in (3), the included regressors with $x_i$ as the vector of their values are correlated with their random coefficients in (3). As a consequence, the assumption in (4) is not satisfied. As we shall demonstrate, when $x_i$ is not the value of the vector of exogenous variables, contrary to Hildreth and Houck (1968), it cannot be treated as fixed. 2.4 State-space models Durbin and Koopman (2001), among others, treat general linear Gaussian state-space models, which, for a single dependent variable, can be written as $y_t = x_t\beta_t + u_t$ (5) $\beta_t = \mu + F\beta_{t-1} + v_t$ (6)
  • 17. 17 where $\mu$ is a fixed vector and $v_t$ is the value taken by a vector of errors treated as random variables. Equation (5) is called the “observation equation,” and equation (6) is the “state equation.” A routine assumption is that $x_t$ in (5) is the value taken by the vector of included exogenous regressors and that the random variables taking the values $u_t$ and $v_t$, respectively, are independent, although the above cited authors do not offer an interpretation of the random error term taking the value $u_t$ in (5) or an explicit or implicit acknowledgement of the presence of pre-existing conditions in (2). But, if $u_t$ consists of all omitted relevant regressors, as is often assumed, then it follows from PS’ (1984, 1988) articles that the vector $x_t$ in (5) cannot be the value of a vector of exogenous variables, since these variables are correlated with the random variable taking the value $u_t$. Furthermore, the correlation between the included regressors with $x_t$ as the vector of their values and their random coefficients is natural in models of the type (5). As a consequence, the coefficients and the error term in (5) are not unique, and likewise, equation (5) will have the same problems we previously identified for equation (2) above. In essence, as posited, equation (6) implicitly contains an assumption that the coefficients in (5) are non-unique. Therefore, if the representation in (2) is unsatisfactory, then so are state-space models, such as (5) and (6). 3. A MODEL WITH UNIQUE COEFFICIENTS AND ERROR TERM We turn to our main task and show how to construct and estimate a model whose coefficients and error term are unique. We begin by introducing some notation. Let $y_t = y_t^* + \nu_{0t}^*$, and for j = 1, …, K - 1, let $x_{jt} = x_{jt}^* + \nu_{jt}^*$, where $y_t$ and $x_{jt}$ are the observed
  • 18. 18 variables, $y_t^*$ and $x_{jt}^*$ are the unobserved true values, and $\nu_{0t}^*$ and $\nu_{jt}^*$ are unobserved measurement errors. Also, for $\ell$ = 1, …, L, let $w_{\ell t}^*$ denote the true value of the $\ell$-th omitted relevant regressor (or relevant pre-existing condition). Further, partition W in Section 2 as follows: define integers $L_1$ and $L_2$, $L_1 + L_2 = L$, such that (i) $w_{\ell t}^*$, $\ell = 1, \ldots, L_1$, denotes omitted relevant regressors, and (ii) $w_{\ell t}^*$, $\ell = L_1 + 1, \ldots, L_1 + L_2 = L$, denotes all relevant pre-existing conditions. Finally, assume that $L_1$ and $L_2$ are unknown. We are now prepared to implement PS’ (1984) two conditions that must be satisfied for $y_t^*$ to be related to the $x_{jt}^*$’s. We do this using a convenient class of nonlinear functional forms and a pair of equations, one deterministic and the other stochastic. First, write the deterministic relationship between $y_t^*$ and the $x_{jt}^*$'s and $w_{\ell t}^*$'s, $y_t^* = \alpha_{0t}^* + \sum_{j=1}^{K-1} x_{jt}^*\alpha_{jt}^* + \sum_{\ell=1}^{L} w_{\ell t}^*\omega_{\ell t}^*$ (7) where $\alpha_{0t}^*$ is the intercept, the $\alpha_{jt}^*$ with j > 0 are the coefficients of the $x_{jt}^*$’s, and the $\omega_{\ell t}^*$ are the coefficients of the $w_{\ell t}^*$’s. Observe that formally, none of the coefficients, omitted relevant regressors, and relevant pre-existing conditions in (2) and (7) are unique. So, if we take the last term $\sum_{\ell=1}^{L} w_{\ell t}^*\omega_{\ell t}^*$ on the right-hand side as the error term of (7), then this model will have the same problems that haunt equation (2). But what if more could be said about that third term? Could one somehow relate every omitted relevant regressor (or every
  • 19. 19 relevant pre-existing condition) $w_{\ell t}^*$ to the true values of the included regressors? To answer this, we introduce the next equation that we allow to be stochastic: $w_{\ell t}^* = \lambda_{\ell 0t}^* + \sum_{j=1}^{K-1} x_{jt}^*\lambda_{\ell jt}^*$ ($\ell = 1, \ldots, L$) (8) where $\lambda_{\ell 0t}^*$ is the value of an error term representing that portion of the omitted relevant regressor $w_{\ell t}^*$ (or the relevant pre-existing condition) that remains after the effect $\sum_{j=1}^{K-1} x_{jt}^*\lambda_{\ell jt}^*$ of the included regressors $x_{jt}^*$'s on $w_{\ell t}^*$ has been removed from it, and $\lambda_{\ell jt}^*$ is the coefficient of $x_{jt}^*$. It is important to emphasize that not all equations in (8) are vacuous or empty, because, as PS (1984, p. 14) proved, the included regressors taking the values $x_{jt}^*$'s cannot all be uncorrelated with every omitted relevant regressor taking the value $w_{\ell t}^*$. This means there are some omitted regressors that must be (stochastically) related to the included regressors taking the values $x_{jt}^*$'s, and it is these relationships that we write down as equation (8). Accordingly, when $w_{\ell t}^*$ represents a relevant pre-existing condition, it can be controlled using the included regressors via (8). It follows from Skyrms (1988, p. 59), that the controls of relevant pre-existing conditions, activated by specifying an equation like (8), make the spurious correlations implied by (7) disappear. Notably, the coefficients of (7) and (8) are time-varying; so, this property needs explanation. The problem here is that their correct functional forms are not actually known, leading to a potential problem of mis-specification. PS themselves did not specifically address or solve the issue of unknown functional forms in this setting, but they did offer the
  • 20. 20 following two requirements for $y^*$ to be related to $x^*$ by a law: (i) for every possible value of $x_t^*$ there exists an identically and independently distributed sequence {$U_t$}, where $U_t$ depends on the considered possible value of $x_t^*$, and (ii) there exists a function f which on the tth observation, associates with every possible value of $x_t^*$ a random variable taking the value $y_t^*$ = f(the considered possible value of $x_t^*$, $u_t$), where both $y_t^*$ and $u_t$ depend on the considered possible value of $x_t^*$ (see PS 1988, p. 28). We refer to this imperative as the “correct function f” and assume that the coefficients of (7) lie on that function. We assume further that a similar argument is true about (8). A most general approach to solving the issue of unknown functional forms for the models that also satisfy the preceding conditions is to make the coefficients of (7) and (8) depend on the observation index t, thereby opening a rich class of functional forms able to cover the respective correct functional forms as special cases. We call these functional forms “linear in variables and nonlinear in coefficients.” The important point to note here is that no parameters in the relationship in (7) are treated as constant. Non-constancy of coefficients vindicates an old admonition by Goldberger (1987) that the particular choice by Barten and Theil of quantities in the Rotterdam School demand functions (and, by implication, in any other econometric relationship) to be treated as constants may be questioned. We now assert that the combination of (7) and (8) constitutes a model that is sufficient for the determination of $y_t^*$ exactly. This model is obtained by substituting the right-hand side of (8) for $w_{\ell t}^*$ in (7):
  • 21. 21 $y_t^* = \alpha_{0t}^* + \sum_{\ell=1}^{L}\lambda_{\ell 0t}^*\omega_{\ell t}^* + \sum_{j=1}^{K-1} x_{jt}^*\left(\alpha_{jt}^* + \sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*\right)$ (9) which clearly shows that the portions $\lambda_{\ell 0t}^*$, $\ell$ = 1, …, L, of all omitted relevant regressors and all relevant pre-existing conditions, in conjunction with the included regressors $x_{jt}^*$, j = 1, …, K – 1, are sufficient to determine the value of $y_t^*$ exactly. PS (1988, p. 51) referred to $\lambda_{\ell 0t}^*$, $\ell = 1, \ldots, L_1$, as certain “sufficient sets” of omitted relevant regressors. PS (1988, p. 34) further proved that although the included regressors taking the values ($x_{jt}^*$’s) cannot be independent of every omitted relevant regressor, they can be independent of the sufficient set of every such variable.9 Thus, taking the second term ($\sum_{\ell=1}^{L}\lambda_{\ell 0t}^*\omega_{\ell t}^*$) on the right-hand side of equation (9) as the value of its error term, we can conclude that in this equation, there is no problem of the correlation between the included regressors and the error term. Among some statisticians, time-varying coefficients are usually thought to be uninterpretable. Yet, in equation (9), we are able to offer the following valid and unambiguous interpretation: the jth time-varying coefficient $(\alpha_{jt}^* + \sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*)$ of (9) is the “total” effect of $x_{jt}^*$ on $y_t^*$. This total effect is the sum of two time-varying coefficients, $\alpha_{jt}^*$ and $\sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*$, where the coefficient $\alpha_{jt}^*$ is a direct effect of $x_{jt}^*$ on $y_t^*$ which appears 9 This echoes Freedman’s (2005) assumption (i).
  • 22. 22 in (7), and the sum of products $\sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*$ is an indirect effect of $x_{jt}^*$ (on $y_t^*$) due to the effect of $x_{jt}^*$ on every omitted relevant regressor which appears in (8), and the effect of every omitted relevant regressor on $y_t^*$ which appears in (7).10 These effects are time dependent. However, if, as usual, the omitted relevant regressors are not identified, then there exists no meaningful distinction between direct and indirect effects (see PS 1984, p. 12). Note that although the direct ($\alpha_{jt}^*$) and indirect ($\sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*$) effects of $x_{jt}^*$ on $y_t^*$ depend on the omitted relevant regressors chosen to define them, the total effect $(\alpha_{jt}^* + \sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*)$ does not. This result, obvious from (9), is due to PS (1988, p. 50). It justifies estimation of the total effect of each included regressor on the dependent variable when the omitted relevant regressors are not identified. But instead of being arbitrary and “hard to swallow,” it is now a condition at once palatable and implementable. Swamy, Mehta, Tavlas and Hall (2014, pp. 217-219) proved earlier that the coefficients (or the sums $\alpha_{jt}^* + \sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*$) and error term of (9) are unique, even though the coefficients and error terms of (7) and (8) are not. This means that even though the 10 In this measure $\sum_{\ell=1}^{L}\lambda_{\ell jt}^*\omega_{\ell t}^*$ of the indirect effect, $\lambda_{\ell jt}^*$ and $\omega_{\ell t}^*$ are set equal to zero if the dependent variable of the $\ell$-th equation in (8) is a relevant pre-existing condition. This restriction is needed because the vector $w_t$ is assumed to contain all relevant pre-existing conditions, and the effect of $x_{jt}^*$ on each relevant pre-existing condition is not part of an indirect effect of $x_{jt}^*$ on $y_t^*$.
components of the coefficient on each included regressor in (9), measuring direct and indirect effects, are not unique (see PS 1984, p. 13), its total effect is unique.
It may be useful here to refer to Simpson’s paradox, also known as the Yule-Simpson effect, which is a phenomenon in probability or statistics whereby the association between a pair of variables (X, Y) reverses sign upon conditioning on a third variable, Z, regardless of the value taken by Z. To illustrate this phenomenon in the present context, suppose that all but the first regressor, $x_{jt}^{*}$, $j = 2, \ldots, K-1$, are deleted from the right-hand side of (9) and are added to its list of omitted relevant regressors. Then the direct effect $\alpha_{1t}^{*}$ of the remaining regressor $x_{1t}^{*}$ on $y_t^{*}$ does not change sign no matter how many of the deleted regressors, $x_{jt}^{*}$, $j = 2, \ldots, K-1$, are included back in (9). However, with one or more of the $x_{jt}^{*}$, $j = 2, \ldots, K-1$, included back in (9), the indirect-effect component of the coefficient on $x_{1t}^{*}$ changes. Simpson and his followers implicitly mistook the change in the total effect of $x_{1t}^{*}$ due to changes in its indirect effects for a change in its direct effect. Thus, Simpson’s paradox cannot arise in the context of equations of the form (9) with unique coefficients and error term (see Swamy, Mehta, Tavlas and Hall 2015, pp. 5 and 6).
In (7)-(9), all omitted relevant regressors and all relevant pre-existing conditions are treated identically without any distinction. The question then becomes: is this treatment appropriate? We now consider the details of our methodology.
Measurement-error biases: In terms of the observed values introduced at the beginning of this section, (9) can be written as
$$y_t = \gamma_{0t} + \sum_{j=1}^{K-1} x_{jt}\,\gamma_{jt} \tag{10}$$
where
$$\gamma_{0t} = \nu_{0t}^{*} + \alpha_{0t}^{*} + \sum_{\ell=1}^{L} \lambda_{\ell 0t}^{*}\,\omega_{\ell t}^{*} \tag{11}$$
and for $j = 1, \ldots, K-1$,
$$\gamma_{jt} = \Bigl(1 - \frac{\nu_{jt}^{*}}{x_{jt}}\Bigr)\Bigl(\alpha_{jt}^{*} + \sum_{\ell=1}^{L} \lambda_{\ell jt}^{*}\,\omega_{\ell t}^{*}\Bigr) \tag{12}$$
where we do not assume that the measurement errors $\nu_{jt}^{*}$, $j = 0, 1, \ldots, K-1$, are random variables distributed with zero means. The measurement-error bias component of the coefficient $\gamma_{jt}$ of $x_{jt}$ in equation (10) is precisely
$$-\frac{\nu_{jt}^{*}}{x_{jt}}\Bigl(\alpha_{jt}^{*} + \sum_{\ell=1}^{L} \lambda_{\ell jt}^{*}\,\omega_{\ell t}^{*}\Bigr) \tag{13}$$
Equations (7)-(13) clarify that the model in (2), whose coefficients and error term have been shown to be non-unique, suffers from a variety of specification errors that doom the prospect of obtaining consistent estimators. As we have shown, model (10) is free from this deficiency and is therefore to be preferred to (2). Observe that the time-varying coefficients of (10) make $Y_t$ non-stationary.
We now consider a feasible approach to estimating model (10). In what follows we limit ourselves to cases where all omitted relevant regressors are unidentified. As we have already shown above, only the total effects of the $x_{jt}^{*}$’s on $y_t^{*}$ can be meaningfully estimated in these cases. Furthermore, assuming that the coefficients of
equation (10) are distributed with finite means, these means cannot be consistently estimated by regression of $y_t$ on the $x_{jt}$’s alone because the included regressors with the $x_{jt}$’s as their values are correlated with their own coefficients in (10). This is because $\gamma_{jt}$ in (10) is a function of $x_{jt}$. Therefore, we proceed as follows.
3.1 Coefficient drivers
To impose the restrictions implied by equations (11) and (12) on the coefficients of (10), we make the following assumption:
Assumption I: For $j = 0, 1, \ldots, K-1$, the coefficients of (10) satisfy the equations
$$\gamma_{jt} = \pi_{j0} + \sum_{h=1}^{p-1} \pi_{jh}\,z_{ht} + u_{jt} \tag{14}$$
where the $z_{ht}$’s are called the “coefficient drivers,” which are potentially observable variables that influence the component $\sum_{\ell=1}^{L} \lambda_{\ell jt}^{*}\omega_{\ell t}^{*}$ of $\gamma_{jt}$.¹¹ It follows from equation (14) that the coefficients of (10) are non-stationary.
¹¹ After clarifying the complications that arise in the Bayesian analyses of laws, PS (1988, p. 49) concluded that a Bayesian will do much better to search like a non-Bayesian for variables that absorb “proxy effects” for omitted regressors. These effects can be equal to the indirect-effect component $\sum_{\ell=1}^{L} \lambda_{\ell jt}^{*}\omega_{\ell t}^{*}$ of $\gamma_{jt}$, $j = 1, \ldots, K-1$. Taking a hint from this conclusion, we choose the coefficient drivers that absorb the indirect effects of included regressors in (10). This will be made clear in equation (15)(ii) below.
In specifying equation (14), we impose another level on model (10), so that, together, (10) and (14) form a bi-level formulation of the relationship between $y_t$ and the $x_{jt}$’s. From (9) and (14) it follows that (10) has two sources of errors: (i) certain “sufficient sets” of all omitted relevant regressors and all relevant pre-existing conditions, the $\lambda_{\ell 0t}^{*}$’s, and (ii) all the
coefficients of (10) including its intercept. Note that the model with equations (10) and (14) does not provide a hierarchical Bayes model, as (14) is separate and yet inseparable from (10) and is not equivalent to the prior distribution specified in a hierarchy. Actually, Pratt and Schlaifer (1988, p. 49) show how to apply Bayesian analysis to a law when it is not known whether the condition for observability is satisfied or not. Equations (10) and (14) are consistent with their procedure, as we show below.
Next, we indicate conditions that need to be imposed on the coefficient drivers.
Assumption II (Admissibility Condition): The vector $Z_t = (1, Z_{1t}, \ldots, Z_{p-1,t})'$ in equation (14) is an admissible vector of coefficient drivers if, given $Z_t = z_t$, the value $\gamma_t = (\gamma_{0t}, \gamma_{1t}, \ldots, \gamma_{K-1,t})'$ that the coefficient vector of (10) would take had $X_t = (1, X_{1t}, \ldots, X_{K-1,t})'$ been $x_t = (1, x_{1t}, \ldots, x_{K-1,t})'$ is independent of $X_t = (1, X_{1t}, \ldots, X_{K-1,t})'$, $\forall\, t$.
This first condition, requiring the coefficient vector of (10), having $\gamma_t$ as its value, and $X_t$ to be conditionally independent, given $Z_t$, is needed to achieve consistent estimators of the coefficients of (14). Our experience with equation (14) suggests that statistically significant estimates of the coefficients of (14) cannot be obtained unless certain further exclusion restrictions are imposed on these coefficients.
3.2 Estimation of the coefficients of (10) with unidentified omitted relevant regressors
We assert that the coefficient drivers included in (14) are appropriate and adequate and that our guesses about the measurement errors in (11) and (12), denoted by $\hat{\nu}_{jt}^{*}$, $j = 1, \ldots, K-1$, are accurate if the coefficient drivers are admissible and the following assumption is satisfied:
Assumption III: For $j = 1, \ldots, K-1$, the equation
$$\gamma_{jt} = \Bigl(1 - \frac{\hat{\nu}_{jt}^{*}}{x_{jt}}\Bigr)\Bigl(\alpha_{jt}^{*} + \sum_{\ell=1}^{L} \lambda_{\ell jt}^{*}\,\omega_{\ell t}^{*}\Bigr) = \pi_{j0} + \sum_{h=1}^{p-1} \pi_{jh}\,z_{ht} + u_{jt}$$
holds, such that
$$\text{(i)}\;\; \alpha_{jt}^{*} = \Bigl(1 - \frac{\hat{\nu}_{jt}^{*}}{x_{jt}}\Bigr)^{-1}\pi_{j0} \quad\text{and}\quad \text{(ii)}\;\; \sum_{\ell=1}^{L} \lambda_{\ell jt}^{*}\,\omega_{\ell t}^{*} = \Bigl(1 - \frac{\hat{\nu}_{jt}^{*}}{x_{jt}}\Bigr)^{-1}\Bigl(\sum_{h=1}^{p-1} \pi_{jh}\,z_{ht} + u_{jt}\Bigr),\;\; \forall\, t \tag{15}$$
(see Swamy, von zur Muehlen, Mehta and Chang 2019).
The rationale for the two equations in (15) derives from an argument in PS (1988, p. 49) that a Bayesian will do much better to search like a non-Bayesian for variables that absorb “proxy effects” for omitted relevant regressors. The term $\sum_{\ell=1}^{L} \lambda_{\ell jt}^{*}\omega_{\ell t}^{*}$ representing the indirect effects is one of the two terms of the second factor of the jth coefficient, $\gamma_{jt}$, in (12). The estimator on the right-hand side of (15)(ii) is a good estimator of the indirect effects on its left-hand side in the sense that the coefficient drivers included in (14) and its error term absorb those effects completely, i.e., the equality sign in (15)(ii) holds. Below, we show how to estimate the $\pi$’s and $u$’s in (15)(i) and (15)(ii).
3.3 Combining equations (10) and (14)
Using vector and matrix notation, (10) and (14) can be combined, yielding
$$Y = X_z\,\pi + D_x\,U \tag{16}$$
where $y = (y_1, \ldots, y_T)'$ is a $T \times 1$ vector of observations on the random vector $Y$ in equation (16), $z_t = (z_{0t}, z_{1t}, \ldots, z_{p-1,t})'$ with $z_{0t} = 1$, $\forall\, t$, is a p-vector of observations on the coefficient drivers in (14), $x_t = (x_{0t}, x_{1t}, \ldots, x_{K-1,t})'$ with $x_{0t} = 1$, $\forall\, t$, is a K-vector of observations on
the included regressors in (10), $X_z$ is a $(T \times Kp)$ matrix having the Kronecker product $(z_t' \otimes x_t')$ as its tth row, $\pi$ is a Kp-vector of the coefficients of the equations in (14), $D_x = \mathrm{diag}[x_1', \ldots, x_T']$ is a $T \times TK$ block diagonal matrix, $u = (u_1', \ldots, u_T')'$ is a TK-vector of values taken by the error vector $U$ in (16), and for $t = 1, \ldots, T$, $u_t = (u_{0t}, u_{1t}, \ldots, u_{K-1,t})'$ is a K-vector of errors in (14). Note that both the conditional mean, $X_z\pi$, and the error term, $D_xU$, of $Y$ on the right-hand side of equation (16) depend on the included regressors. It is this property of (16) that precludes the existence of instrumental variables for $X_z$.
3.4 An assumption about the error term of model (16)
Assumption IV: The errors $u_t$ ($t = 1, \ldots, T$) are the realizations of a vector stationary stochastic process following the first-order vector autoregressive equation
$$u_t = \Phi\,u_{t-1} + a_t \tag{17}$$
where $\Phi$ is a $K \times K$ diagonal matrix and $a_t$, $t = 1, \ldots, T$, is a realization of a sequence of uncorrelated K-vector variables $A_t$ with
$$E(A_t \mid z_t, x_t) = 0, \qquad E(A_t A_{t'}' \mid z_t, x_t) = \begin{cases} \sigma_a^2\,\Delta_a & \text{if } t = t' \\ 0 & \text{if } t \neq t' \end{cases} \tag{18}$$
and where $A_t$ is a random vector taking the value $a_t$, and $\Delta_a$ is a $K \times K$ nonnegative definite matrix.
Note that under Assumption IV, the error term $D_xU$ in model (16) is a vector non-stationary process obtained by passing the stationary process $\{U\}$ through the time-dependent filter $D_x$. Assumption IV also implies that
$$E(UU' \mid z_t, x_t) = \sigma_a^2\,\Sigma_u = \sigma_a^2 \begin{bmatrix} \Gamma_0 & \Gamma_0\Phi' & \Gamma_0(\Phi')^{2} & \cdots & \Gamma_0(\Phi')^{T-1} \\ \Phi\Gamma_0 & \Gamma_0 & \Gamma_0\Phi' & \cdots & \Gamma_0(\Phi')^{T-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \Phi^{T-1}\Gamma_0 & \Phi^{T-2}\Gamma_0 & \Phi^{T-3}\Gamma_0 & \cdots & \Gamma_0 \end{bmatrix}, \tag{19}$$
where $U$ is a random vector taking the value $u$, $U_t$ is a random vector taking the value $u_t$, $E(U_tU_t' \mid z_t, x_t) = \sigma_a^2\,\Gamma_0 = \Phi\,\sigma_a^2\,\Gamma_0\,\Phi' + \sigma_a^2\,\Delta_a$ (see Anderson 1971, pp. 178-182), and the conditional covariance matrix $\Sigma_y$ of $Y$, taking the value $y$ in (16), given $X_z$ is
$$\Sigma_y = D_x\,\sigma_a^2\,\Sigma_u\,D_x' \tag{20}$$
where the dependence of $\Sigma_y$ on the $x_t$’s is explicit. We note that in the general formulation of Cavanagh and Rothenberg’s (1995) (hereafter CR) model, reproduced in (1), such dependence is not present. The distinct nonzero elements of $\Phi$, $\Delta_a$ and $\sigma_a^2$ are the elements of the unknown parameter vector, denoted by $\theta$, on which $\Sigma_y$ depends, where the number of rows of the column vector $\theta$ is $K + K(K+1)/2 + 1 = m$, say.
Before we apply the non-Bayesian generalized least squares method to model (16) under Assumption IV, we note that a Bayesian analysis of (16) has certain problems, which are in addition to those already pointed out by Pratt and Schlaifer (1988, p. 49). It is known that the likelihood function, which is one of the factors of the posterior distribution derived via Bayes’ theorem, is model based. Swamy, von zur Muehlen, Mehta and Chang (2019, pp. 324-325) improved this derivation by requiring that the likelihood functions be based on unique coefficients and error term of an appropriate model. Under Assumption IV, model (16) is such a model. The difficulty of applying Bayes’ theorem to (16), though, is that the covariance matrix in equation (20) depends on the included regressors. But subjective
Bayesians, such as Bruno de Finetti and Leonard J. Savage, require that the prior probability density function (pdf) for this covariance matrix be independent of the likelihood function. The dependence of $\Sigma_y$ in (20) on $D_x$ amounts to the dependence of the prior pdf for $\Sigma_y$ on the likelihood function. Therefore, the task is how to choose a prior pdf for the covariance matrix in (20) that does not depend on the likelihood function. This is not possible unless the included regressors are removed from the covariance matrix in (20). But to achieve this, one has to place unreasonable restrictions on model (10) by making the error term $\sum_{\ell=1}^{L} \lambda_{\ell 0t}^{*}\omega_{\ell t}^{*}$ in (11) independent of the other terms on the right-hand side of equations (11) and (12), a step that we must reject.
3.5 Estimating the coefficients of (16)
The generalized least squares (GLS) estimator of $\pi$ is
$$\hat{\pi} = (X_z'\,\Sigma_y^{-1}X_z)^{-1}X_z'\,\Sigma_y^{-1}\,Y \tag{21}$$
where all the regular inverses are assumed to exist. Our discussion in Section 2 above shows that the application of the generalized least squares method to the model in (1) does not lead to consistent estimators of its coefficients. As should be clear by now, this problem goes away in (16), the reasons being that, in (9), the included regressors taking the values $x_{jt}^{*}$ are independent of the sufficient set ($\lambda_{\ell 0t}^{*}$ in (8)) of every omitted relevant regressor, and in (16), the random coefficient vector taking the value $\gamma_t$ and the regressor vector $X_t$ are conditionally independent, given $Z_t$.
Based on the preceding discussion, the best linear unbiased predictor of $u$ is
$$\hat{u} = \sigma_a^2\,\Sigma_u\,D_x'\,\Sigma_y^{-1}\,(Y - X_z\hat{\pi}) \tag{22}$$
Starting with the arbitrary initial values $\Phi = 0$ and $\Delta_a = I$, Chang, Hallahan and Swamy (1992) and Chang, Swamy, Hallahan and Tavlas (2000) iteratively solved equations (21) and (22) until the estimates of $\pi$, $\Phi$, $\Delta_a$, and $\sigma_a^2$ were stabilized.¹² This iterative procedure eliminates the dependence of these final estimates on the arbitrary starting values.
¹² This stabilization does not take place if $\Phi$ is non-diagonal.
The estimates of $\Phi$, $\Delta_a$, and $\sigma_a^2$ obtained in the final iteration are called “the residual-based estimates,” since they rely on the residuals $(y - X_z\hat{\hat{\pi}})$, where $\hat{\hat{\pi}}$ is explained below. Inserting these estimates into (20) gives an estimate of $\Sigma_y$, denoted by $\hat{\Sigma}_y$. The feasible GLS estimator of $\pi$ is
$$\hat{\hat{\pi}} = (X_z'\,\hat{\Sigma}_y^{-1}X_z)^{-1}X_z'\,\hat{\Sigma}_y^{-1}\,Y \tag{23}$$
and the feasible best linear unbiased predictor of $u$ is
$$\hat{\hat{u}} = \hat{\sigma}_a^2\,\hat{\Sigma}_u\,D_x'\,\hat{\Sigma}_y^{-1}\,(Y - X_z\hat{\hat{\pi}}) \tag{24}$$
where $\hat{\sigma}_a^2$ is the residual-based estimate of $\sigma_a^2$ and $\hat{\Sigma}_u$ is the residual-based estimate obtained using the residual-based estimates of $\Phi$ and $\Delta_a$ in place of their true values used in $\Sigma_u$. The estimates of the coefficients of (10) are
$$\hat{\gamma}_{jt} = \hat{\hat{\pi}}_{j0} + \sum_{h=1}^{p-1} \hat{\hat{\pi}}_{jh}\,z_{ht} + \hat{\hat{u}}_{jt} \qquad (j = 0, 1, \ldots, K-1) \tag{25}$$
where the $\hat{\hat{\pi}}$’s and $\hat{\hat{u}}$’s are the elements of $\hat{\hat{\pi}}$ and $\hat{\hat{u}}$, respectively. The estimates of direct and indirect effects used as intermediary estimates in the computation of the total effect of the jth regressor $x_{jt}^{*}$ of (9) on its dependent variable $y_t^{*}$ at time t are
$$\hat{\alpha}_{jt}^{*} = \Bigl(1 - \frac{\hat{\nu}_{jt}^{*}}{x_{jt}}\Bigr)^{-1}\hat{\hat{\pi}}_{j0}, \qquad \widehat{\sum_{\ell=1}^{L} \lambda_{\ell jt}^{*}\,\omega_{\ell t}^{*}} = \Bigl(1 - \frac{\hat{\nu}_{jt}^{*}}{x_{jt}}\Bigr)^{-1}\Bigl(\sum_{h=1}^{p-1} \hat{\hat{\pi}}_{jh}\,z_{ht} + \hat{\hat{u}}_{jt}\Bigr), \quad\text{and}\quad \Bigl(1 - \frac{\hat{\nu}_{jt}^{*}}{x_{jt}}\Bigr)^{-1}\hat{\gamma}_{jt}, \tag{26}$$
respectively. Given a set of coefficient drivers, we need to experiment with different exclusion restrictions on the coefficients of (14) and compare results. Usually, omitted relevant regressors are not identified and, therefore, we can only use the estimate, $\bigl(1 - \hat{\nu}_{jt}^{*}/x_{jt}\bigr)^{-1}\hat{\gamma}_{jt}$, of the total effect of $x_{jt}^{*}$ on $y_t^{*}$ for all $j = 1, \ldots, K-1$.
3.6 Second-order properties of the feasible GLS estimator (23)
In this section, we study the second-order properties of (23). For this purpose, we use Cavanagh and Rothenberg’s (1995) method but not their model in (1). Because of PS’ (1984, 1988) results listed in Section 2, CR’s (1995) GLS and feasible GLS estimators are replaced by those in (21) and (23), respectively. Following CR, we standardize (21) as
$$h(\hat{\pi}, \theta) = \frac{l'(\hat{\pi} - \pi)}{\bigl[l'(X_z'\,\Sigma_y^{-1}X_z)^{-1}l\bigr]^{1/2}} = \frac{x_{z1}'\,x_{\varepsilon}}{\sqrt{T}} \tag{27}$$
where $l$ is any Kp-vector of constants,
$$x_{z1} = \frac{\sqrt{T}\,L\,X_z\,(X_z'\,\Sigma_y^{-1}X_z)^{-1}\,l}{\bigl[l'(X_z'\,\Sigma_y^{-1}X_z)^{-1}l\bigr]^{1/2}}$$
is a column vector with T rows, $\Sigma_y^{-1} = L'L$, and $L$ is a $T \times T$ lower triangular matrix, the dependence of which on the $x_t$’s should not go unnoticed, and
$x_{\varepsilon} = L\,D_x\,U$ is a $T \times 1$ vector. One may verify that $x_{z1}'x_{z1} = T$ and the elements of $x_{\varepsilon}$ are uncorrelated, mean zero random variables with unit variance. The standardization of (23), which is the same as that of $h(\hat{\pi}, \theta)$, is
$$h(\hat{\hat{\pi}}, \hat{\theta}) = \frac{l'(\hat{\hat{\pi}} - \pi)}{\bigl[l'(X_z'\,\Sigma_y^{-1}X_z)^{-1}l\bigr]^{1/2}} \tag{28}$$
To develop the $o(T^{-1})$ approximation to the difference between the distributions of $h(\hat{\hat{\pi}}, \hat{\theta})$ and $h(\hat{\pi}, \theta)$, we assume that the number of rows of the column vectors $\pi$ and $\theta$ remains fixed, while the number of rows of $X_z$ and the number of rows and columns of $\Sigma_y$ increase as $T \to \infty$ in such a way that $X_z'\,\Sigma_y^{-1}X_z / T$ converges in probability to a positive definite matrix, given $Z_t$.¹³ Let $\hat{\theta}$ denote the residual-based estimate of $\theta$ developed by Chang, Hallahan and Swamy (1992) and Chang, Swamy, Hallahan and Tavlas (2000); and let $d = \sqrt{T}(\hat{\theta} - \theta)$. We assume that $d$ converges in distribution to a normal distribution uniformly in $\theta$.¹⁴ Note that CR (1995) do not require uniform convergence. We further assume that the elements of $\Sigma_y$ are differentiable with respect to $\theta$ up to the third order.
¹³ Time-series settings that involve time trends, polynomial time series, and trending variables give cases where this assumption is not satisfied. In these cases, we use Grenander’s conditions presented in Greene (2012, p. 65).
¹⁴ See Lehmann and Casella (1998, p. 441) for the importance of uniform convergence, defined in Lehmann (1999, pp. 93-97).
Expanding $h(\hat{\hat{\pi}}, \hat{\theta})$ in a stochastic Taylor series gives
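$$h(\hat{\hat{\pi}}, \hat{\theta}) = h(\hat{\pi}, \theta) + \frac{1}{\sqrt{T}}\,b'd + \frac{1}{T}\,d'Cd + R \tag{29}$$
where $b = \partial h(\hat{\hat{\pi}}, \hat{\theta})/\partial\hat{\theta}$ is the m-dimensional vector, $C = \tfrac{1}{2}\,\partial^{2} h(\hat{\hat{\pi}}, \hat{\theta})/\partial\hat{\theta}\,\partial\hat{\theta}'$ is the $m \times m$ matrix, the derivatives in both $b$ and $C$ are evaluated at the true value $\theta$ and are stochastically bounded as $T \to \infty$, and $R$ is $O_p(T^{-3/2})$. The desirable property of equation (29) is that it is based on (9) with unique coefficients and error term and on Assumptions I-IV. The derivative
$$b = \frac{1}{\bigl[l'(X_z'\,\Sigma_y^{-1}X_z)^{-1}l\bigr]^{1/2}}\;\frac{\partial\,\bigl[l'(X_z'\,\hat{\Sigma}_y^{-1}X_z)^{-1}X_z'\,\hat{\Sigma}_y^{-1}D_xU\bigr]}{\partial\hat{\theta}} \tag{30}$$
is a column vector with m rows which is evaluated at the true value $\theta$. Its ith element is
$$\frac{1}{\bigl[l'(X_z'\,\Sigma_y^{-1}X_z)^{-1}l\bigr]^{1/2}}\;l'\Bigl\{(X_z'\,\hat{\Sigma}_y^{-1}X_z)^{-1}\,\frac{\partial\,(X_z'\,\hat{\Sigma}_y^{-1}D_xU)}{\partial\hat{\theta}_i} + \frac{\partial\,(X_z'\,\hat{\Sigma}_y^{-1}X_z)^{-1}}{\partial\hat{\theta}_i}\,X_z'\,\hat{\Sigma}_y^{-1}D_xU\Bigr\}$$
where $\hat{\theta}_i$ is the ith element of $\hat{\theta}$. Let $b_i$ denote the ith element of $b$. This element can be written as
$$b_i = \frac{1}{\bigl[l'(X_z'\,\Sigma_y^{-1}X_z)^{-1}l\bigr]^{1/2}}\;l'\Bigl\{-(X_z'\,\hat{\Sigma}_y^{-1}X_z)^{-1}X_z'\,\hat{\Sigma}_y^{-1}\,\frac{\partial\Sigma_y(\hat{\theta})}{\partial\hat{\theta}_i}\,\hat{\Sigma}_y^{-1}D_xU + (X_z'\,\hat{\Sigma}_y^{-1}X_z)^{-1}X_z'\,\hat{\Sigma}_y^{-1}\,\frac{\partial\Sigma_y(\hat{\theta})}{\partial\hat{\theta}_i}\,\hat{\Sigma}_y^{-1}X_z\,(X_z'\,\hat{\Sigma}_y^{-1}X_z)^{-1}X_z'\,\hat{\Sigma}_y^{-1}D_xU\Bigr\} \tag{31}$$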
where $\partial\Sigma_y(\hat{\theta})/\partial\hat{\theta}_i$ is the $T \times T$ matrix with zero in every position except for the elements of $\hat{\Sigma}_y$ which are functions of $\hat{\theta}_i$. The element $b_i$ can be compactly written as
$$b_i = -\frac{1}{\bigl[l'(X_z'\,\Sigma_y^{-1}X_z)^{-1}l\bigr]^{1/2}}\;l'(X_z'\,\Sigma_y^{-1}X_z)^{-1}X_z'\,\Sigma_y^{-1}\,\frac{\partial\Sigma_y(\hat{\theta})}{\partial\hat{\theta}_i}\,\bigl[I - \Sigma_y^{-1}X_z\,(X_z'\,\Sigma_y^{-1}X_z)^{-1}X_z'\bigr]\,\Sigma_y^{-1}D_xU \tag{32}$$
From this result it follows that $C$ is the $m \times m$ matrix having typical element
$$-\frac{1}{\bigl[l'(X_z'\,\Sigma_y^{-1}X_z)^{-1}l\bigr]^{1/2}}\;l'(X_z'\,\Sigma_y^{-1}X_z)^{-1}X_z'\,\Sigma_y^{-1}\,\frac{\partial^{2}\Sigma_y(\hat{\theta})}{\partial\hat{\theta}_i\,\partial\hat{\theta}_j}\,\bigl[I - \Sigma_y^{-1}X_z\,(X_z'\,\Sigma_y^{-1}X_z)^{-1}X_z'\bigr]\,\Sigma_y^{-1}D_xU \tag{33}$$
where $\hat{\theta}_j$ is the jth element of $\hat{\theta}$, and the $T \times T$ matrix $\partial^{2}\Sigma_y(\hat{\theta})/\partial\hat{\theta}_i\,\partial\hat{\theta}_j$ is evaluated at the true value $\theta$. The ith element of $b$ in (32) and the (i,j)th element of $C$ in (33) can be written as
$$b_i = \frac{x_{z2i}'\,x_{\varepsilon}}{\sqrt{T}}, \qquad c_{ij} = \frac{x_{z2ij}'\,x_{\varepsilon}}{\sqrt{T}} \tag{34}$$
where the T-dimensional vectors $x_{z2i}$ and $x_{z2ij}$ are orthogonal to $x_{z1}$. From this result it follows that $b$ and $C$ are uncorrelated with $h(\hat{\pi}, \theta)$ given $X_z$. If the error term $U$ of (16) is normal, then conditional on $X_z$, the variables $h(\hat{\pi}, \theta)$, $b$, and $C$ are also normal. In this case, if the remainder term $R$ in (29) is “well behaved” in
the sense to be defined presently, allowing it to be ignored, the distribution of $h(\hat{\hat{\pi}}, \hat{\theta})$ can be approximated to order $o(T^{-1})$ using the asymptotic distribution of $d$. If the distribution of the error term of (16) is not normal, then the approximate distribution of $h(\hat{\hat{\pi}}, \hat{\theta})$ can be obtained as long as the joint distribution of $h(\hat{\pi}, \theta)$, $b$, $C$, and $d$ is asymptotically normal and possesses an Edgeworth expansion.
To proceed with estimating model (16), we need to determine its regularity conditions when the error covariance is given by (20). These conditions should also imply that an approximation to the distribution of $h(\hat{\hat{\pi}}, \hat{\theta})$ with error $o(T^{-1})$ can be calculated from the moments of the approximate distribution of a random variable, $\zeta$, defined in the regularity conditions of Assumption V stated next.
Assumption V (Regularity condition): In the stochastic expansion (29), which depends on unique coefficients and the unique error term of (9), and which satisfies Assumptions I-IV, (i) the remainder term $R$ satisfies $\Pr[\,T \log T\,|R| > 1\,] = o(T^{-1})$; (ii) the elements of the vectors $x_{z1}$, $x_{z2i}$ and $x_{z2ij}$ appearing in (27) and (34) are uniformly bounded as T tends to infinity; (iii) the error term $x_{\varepsilon}$ possesses moments up to the fourth order; and (iv) the vector $\zeta$, consisting of the distinct elements of $(h(\hat{\pi}, \theta), b, C, d)$, possesses a valid Edgeworth expansion to order $T^{-1}$, that is, the joint density function for $\zeta$ has the approximation
$$\varphi(\zeta)\Bigl[1 + \frac{\sum_i \kappa_i\,\zeta_i + \sum_{i,j,k} \kappa_{ijk}\,\zeta_i\,\zeta_j\,\zeta_k}{\sqrt{T}} + \frac{\rho(\zeta)}{T}\Bigr] + o(T^{-1}), \tag{35}$$
where $\varphi(\zeta)$ is the density function of a normal random vector with mean zero and covariance matrix $\Omega$; $\rho(\zeta)$ is an even-order polynomial whose coefficients, along with the $\kappa$’s, are uniformly bounded scalars.
CR (1995) also adopted Assumption V for their model (1), but, as we now demonstrate, unless modified in ways spelled out presently, a model of type (16) violates Assumption V, given that, as presented, its error covariance matrix in (20) varies over time as a function of the included regressors, whereas $\Omega$ in (35) is assumed to be constant. To remedy this shortcoming, we propose to replace each $\Omega$ in CR’s (1995) paper with $\Omega^{*}$ which, given the definitions of $b$ and $C$ in (29), depends upon $D_x$. This expanded definition of $\Omega^{*}$ is further justified because $\zeta$ consists of the distinct elements of $(h(\hat{\pi}, \theta), b, C, d)$. Note that, more generally, Assumption V is inappropriate for model (2), which is the same as CR’s model (1) and other models employed in applied work, because, given results (i)-(iii) at the end of Section 2.2, such models are plagued by non-uniqueness of their coefficients and error terms. Since (16) is free of these problems when Assumptions I-IV hold, Assumption V with $\Omega$ replaced by $\Omega^{*}$ appears to be appropriate for (16) and implies that an approximation to the distribution of $h(\hat{\hat{\pi}}, \hat{\theta})$ with error $o(T^{-1})$ can be calculated from the moments of the approximate distribution of $\zeta$.
It follows from Cavanagh (1983) and Rothenberg (1984, 1988) that the distribution of $h(\hat{\hat{\pi}}, \hat{\theta})$ is the same, to order $T^{-1}$, as the distribution of
$$h(\hat{\pi}, \theta) + \frac{1}{\sqrt{T}}\,E\bigl(b'd \mid h(\hat{\pi}, \theta)\bigr) + \frac{1}{T}\Bigl\{E\bigl(d'Cd \mid h(\hat{\pi}, \theta)\bigr) + \frac{1}{2}\Bigl[h(\hat{\pi}, \theta) - \frac{\partial}{\partial h(\hat{\pi}, \theta)}\Bigr]\mathrm{var}\bigl(b'd \mid h(\hat{\pi}, \theta)\bigr)\Bigr\} \tag{36}$$
CR (1995, p. 279) established that $h(\hat{\pi}, \theta)$ is asymptotically independent of $b$ and $C$. We use the $m \times m$ matrices $\Sigma_{bb}$, $\Sigma_{dd}$, and $\Sigma_{bd}$ to denote the asymptotic variance and covariance matrices for the vectors $b$ and $d$, and the m-vector $\sigma_{Ad}$ to denote the asymptotic covariance between $h(\hat{\pi}, \theta)$ and $d$, as in CR (1995, p. 279). We can use CR’s (1995) method of proof to show that, based on moments of the $o(T^{-1})$ Edgeworth approximation to the distributions,
(i) the skewness of $h(\hat{\hat{\pi}}, \hat{\theta})$ is always the same as that of $h(\hat{\pi}, \theta)$;
(ii) if $\Sigma_{bd} = 0$, then the mean of $h(\hat{\hat{\pi}}, \hat{\theta})$ is the same as that of $h(\hat{\pi}, \theta)$;
(iii) if $\sigma_{Ad} = 0$, then the kurtosis of $h(\hat{\hat{\pi}}, \hat{\theta})$ is the same as that of $h(\hat{\pi}, \theta)$.
It can further be shown that if $h(\hat{\pi}, \theta)$ is asymptotically uncorrelated with $d$, then
$$\mathrm{var}\bigl(h(\hat{\hat{\pi}}, \hat{\theta})\bigr) \approx \mathrm{var}\bigl(h(\hat{\pi}, \theta)\bigr) + \frac{\mathrm{var}(b'd)}{T} + \frac{2\sqrt{T}\,E\bigl(h(\hat{\pi}, \theta)\,b'd\bigr)}{T} \tag{37}$$
This approximate variance of $h(\hat{\hat{\pi}}, \hat{\theta})$ is necessarily greater than the variance of $h(\hat{\pi}, \theta)$ if the third term on the right-hand side of approximate equation (37) is nonnegative.
4. CONCLUSIONS
Econometric work seeking to identify unique causal relationships cannot produce consistent estimators if it does not take into account the implications of PS’ (1984, 1988) conclusion that in any model, the included regressors cannot be uncorrelated with every omitted relevant regressor. This result implies that in instances where these omitted relevant
regressors constitute the error term of a model, the generalized least squares estimators of its coefficients are inconsistent. Also, in this case, the coefficients and error term of the model are not unique. This paper offers a way around this dilemma with a methodology that does yield unique coefficients and error terms. The proposed estimator involves the use of functional forms that may not be known, a problem that we solve by allowing all the coefficients in a model to vary freely with the observations on the dependent variable and the included regressors. As we show, the unique coefficient on each included regressor of a model with unique error term is the sum of direct and indirect effects of the regressor on the dependent variable. An issue arises if the omitted relevant regressors chosen to define these direct and indirect effects are not identified, because in this instance, there exists no meaningful distinction between direct and indirect effects. To solve this problem, we offer a two-level formulation of a time-varying coefficient model, where each coefficient on an included regressor is expressed as a function of certain coefficient drivers that absorb indirect effects of the included regressor present in the coefficient. For completeness, we study the second-order properties of the feasible GLS estimator of the coefficients of such a two-level formulation. To conclude, the methodology described in this paper makes it possible to write down and estimate models that allow valid causal interpretations of their coefficients, a result that we hope will be welcomed in the profession.
REFERENCES
AIGNER, D. J. and ZELLNER, A. (1988). Editors’ introduction. J. Econom., 39, 1-5.
ANDERSON, T. W. (1971). The Statistical Analysis of Time Series. John Wiley & Sons, New York.
BASMANN, R. L. (1988). Causality tests and observationally equivalent representations of
econometric models. J. Econom., 39, 69-104.
CAVANAGH, C. L. (1983). Hypothesis testing in models with discrete dependent variables. PhD thesis, University of California at Berkeley.
CAVANAGH, C. L. and ROTHENBERG, T. J. (1995). Generalized least squares with nonnormal errors. In Advances in econometrics and quantitative economics, G. S. Maddala, P. C. B. Phillips, T. N. Srinivasan (Eds.). Blackwell Publishers, Inc., Cambridge, MA, USA.
CHANG, I., HALLAHAN, C. and SWAMY, P. A. V. B. (1992). Efficient computation of stochastic coefficients models. In Computational economics and econometrics, H. M. Amman, D. A. Belsley, L. F. Pau (Eds.). Kluwer Academic Publishers, Boston, MA.
CHANG, I., SWAMY, P. A. V. B., HALLAHAN, C. and TAVLAS, G. S. (2000). A computational approach to finding causal economic laws. Comput. Econ., 16, 105-136.
DURBIN, J. and KOOPMAN, S. J. (2001). Time series analysis by state space methods. Oxford University Press, Oxford.
FREEDMAN, D. A. (2005). What is the error term in a regression equation? https://www.stat.berkeley.edu/~census/epsilon.pdf.
GOLDBERGER, A. S. (1964). Econometric theory. John Wiley & Sons, New York.
GOLDBERGER, A. S. (1987). Functional form and utility: A review of consumer demand theory. Westview Press, Boulder.
GREENE, W. H. (2012). Econometric analysis, Seventh edition. Prentice Hall-Pearson, Upper Saddle River, NJ.
HECKMAN, J. J. and SCHMIERER, D. (2010). Tests of hypotheses arising in the correlated random coefficient model. Economic Modelling, 27, 1355-1367.
HILDRETH, C. and HOUCK, J. P. (1968). Some estimators for a model with random coefficients. J. Am. Stat. Assoc., 63, 584-595.
LEHMANN, E. L. and CASELLA, G. (1998). Theory of point estimation, Second edition. Springer, New York.
LEHMANN, E. L. (1999). Elements of large sample theory. Springer, New York.
PRATT, J. W. and SCHLAIFER, R. (1984). On the nature and discovery of structure (with discussion). J. Am. Stat. Assoc., 79, 9-21, 29-33.
PRATT, J. W. and SCHLAIFER, R. (1988). On the interpretation and observation of laws. J. Econom., 39, 23-52.
RAO, C. R. (1973). Linear Statistical Inference and Its Applications, Second edition. John Wiley & Sons, New York.
RAO, J. N. K. (2003). Small area estimation. John Wiley & Sons, Inc., New York.
ROTHENBERG, T. J. (1984). Approximating the distribution of econometric estimators and test statistics. In Handbook of econometrics, Volume 2, Z. Griliches and M. D. Intriligator (Eds.). North-Holland, Amsterdam.
ROTHENBERG, T. J. (1988). Approximate power functions for some robust tests of regression coefficients. Econometrica, 56, 997-1019.
RUBIN, D. B. (1978). Bayesian inference for causal effects. Ann. Statist., 6, 34-58.
SKYRMS, B. (1988). Probability and causation. J. Econom., 39, 53-68.
SWAMY, P. A. V. B. and TINSLEY, P. A. (1980). Linear prediction and estimation methods for regression models with stationary stochastic coefficients. J. Econom., 12, 103-142.
SWAMY, P. A. V. B., CONWAY, R. K. and VON ZUR MUEHLEN, P. (1985). The foundations of econometrics – Are there any? (with discussion). Econometric Reviews, 4, 1-61, 101-119.
SWAMY, P. A. V. B. and VON ZUR MUEHLEN, P. (1988). Further thoughts on testing for causality with econometric models. J. Econom., 39, 105-147.
SWAMY, P. A. V. B., MEHTA, J. S., TAVLAS, G. S. and HALL, S. G. (2014). Small area estimation with correctly specified linking models. In Recent advances in estimating nonlinear models with applications in economics and finance, J. Ma and M. Wohar (Eds.). Springer, New York.
SWAMY, P. A. V. B., MEHTA, J. S., TAVLAS, G. S. and HALL, S. G. (2015). Two applications of the random coefficient procedure: Correcting for misspecifications in a small area level model and resolving Simpson’s paradox. Economic Modelling, 45, 93-98.
SWAMY, P. A. V. B., MEHTA, J. S. and CHANG, I. (2017). Endogeneity, time-varying coefficients, and incorrect vs. correct ways of specifying the error terms of econometric models. Econometrics, 5, 8; doi: 10.3390/econometrics5010008.
SWAMY, P. A. V. B., VON ZUR MUEHLEN, P., MEHTA, J. S. and CHANG, I. (2018). Alternative approaches to the econometrics of panel data. In Panel data econometrics theory, Mike Tsionas (Ed.). Academic Press/Elsevier, 50 Hampshire Street, 5th Floor, Cambridge, MA.
ZELLNER, A. (1979). Causality and econometrics. In Three aspects of policy and policymaking, K. Brunner and A. H. Meltzer (Eds.). North Holland, Amsterdam.
ZELLNER, A. (1988). Causality and causal laws in economics. J. Econom., 39, 7-21.
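As a numerical illustration of the stacked form (16) and the GLS formula (21), the following is a minimal sketch and not the authors' implementation. It assumes simulated data, $\Phi = 0$ and $\Delta_a = I$ (so that $\Sigma_u = I$), and hypothetical values for the $\pi$'s and $\sigma_a^2$; the function names `stack_model` and `gls` are illustrative only. Because the tth row of $X_z$ is $z_t' \otimes x_t'$, the vector $\pi$ has to be stacked driver by driver, which the code makes explicit.

```python
import numpy as np


def stack_model(x, z):
    """Build X_z (T x Kp) and D_x (T x TK) of the stacked form (16).

    x: (T, K) array of included regressors, first column all ones.
    z: (T, p) array of coefficient drivers, first column all ones.
    """
    T, K = x.shape
    # Row t of X_z is the Kronecker product z_t' (x) x_t'.
    X_z = np.stack([np.kron(z[t], x[t]) for t in range(T)])
    # D_x = diag(x_1', ..., x_T'): x_t' sits in row t, columns tK .. (t+1)K-1.
    D_x = np.zeros((T, T * K))
    for t in range(T):
        D_x[t, t * K:(t + 1) * K] = x[t]
    return X_z, D_x


def gls(y, X_z, Sigma_y):
    """GLS formula (21), computed with linear solves instead of inverses."""
    Si_X = np.linalg.solve(Sigma_y, X_z)
    Si_y = np.linalg.solve(Sigma_y, y)
    return np.linalg.solve(X_z.T @ Si_X, X_z.T @ Si_y)


rng = np.random.default_rng(0)
T, K, p = 40, 2, 2
x = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
z = np.column_stack([np.ones(T), rng.normal(size=(T, p - 1))])

Pi = np.array([[1.0, 0.5],      # Pi[j, h] = pi_jh; hypothetical values
               [2.0, -0.3]])
sigma2_a = 0.01                 # hypothetical error variance
# Errors of (14) under the simplifying assumptions Phi = 0, Delta_a = I.
u = np.sqrt(sigma2_a) * rng.normal(size=(T, K))

gamma = z @ Pi.T + u            # (14): gamma_jt = sum_h pi_jh z_ht + u_jt
y = np.sum(x * gamma, axis=1)   # (10): y_t = sum_j x_jt gamma_jt

X_z, D_x = stack_model(x, z)
pi_vec = Pi.flatten(order="F")  # stacked by driver h, matching kron(z_t, x_t)
Sigma_y = D_x @ (sigma2_a * np.eye(T * K)) @ D_x.T  # (20) with Sigma_u = I

pi_hat = gls(y, X_z, Sigma_y)
print(np.round(pi_hat, 3))      # estimates of (pi_00, pi_10, pi_01, pi_11)
```

In this sketch the true covariance matrix is plugged in directly; the feasible estimator (23) would replace it with the residual-based estimate obtained from the iterative procedure described in Section 3.5, which is not reproduced here.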