A PRIMER ON CAUSALITY
Marc F. Bellemare∗
Introduction
This is the second of two handouts written to help students
understand quantitative methods in the social
sciences. This handout discusses some of the ways in which one can identify causal
relationships in the social sciences. In keeping with the notation introduced in the handout on linear
regression, let $D$ be our variable of interest; $y$ be an outcome of interest; and the vector $x = (x_1, \ldots, x_K)$
represent other factors (or control variables) for which we have data. For the purposes of this discussion,
let $D$ measure a given policy, $y$ measure welfare, and the vector $x$ measure the various control variables the
researcher has seen fit to include. See my “A Primer on Linear Regression” for a more basic handout.
Mechanics
Recall that the regression of $y$ on $(D, x_1, \ldots, x_K)$ is written as
$$y_i = \alpha + \beta_1 x_{1i} + \cdots + \beta_K x_{Ki} + \gamma D_i + \epsilon_i, \qquad (1)$$
where i denotes a unit of observation. In the example of wages
and education, the unit of observation would
be an individual, but units of observations can be individuals,
households, plots, firms, villages, communities,
countries, etc. Just as the research question should drive the choice of what to measure for $y$, $D$, and $x$, the
research question also drives the choice of the relevant unit of observation.
The problem is that unless the researcher runs an experiment in which she randomly assigns the level of $D$ to
each unit of observation $i$, the estimated relationship between $D$ and $y$ will not, in general, be causal. That is,
$\gamma$ will not truly capture the impact of $D$ on $y$, as it will be “contaminated” by the presence of unobservable
factors. Some of the relevant factors can be included in $x = (x_1, \ldots, x_K)$, of course, but it is in general
impossible to fully control for every relevant factor. This is especially true when unobservable or
costly-to-observe factors (e.g., risk aversion, technical ability, soil quality, etc.) play an important role in
determining both $D$ and $y$. So even if we get an estimate of $\gamma$ that is statistically significant, we cannot
necessarily assume that the relationship between the variable of interest and the outcome variable is causal. In
other words, correlation does not imply causation.
For example, suppose $D$ is an individual’s consumption of orange juice and $y$ is some indicator of health.
We have often discussed in lecture how a simple regression of $y$ on $D$ would provide us with a biased
estimate of $\gamma$ because orange juice consumption is nonrandom and not exogenous to health. That is, there
are factors other than orange juice consumption which determine health. Some are observable (e.g., how
much someone exercises; whether they smoke; their diet; etc.), but several are unobservable (e.g., their
willingness to pay for orange juice; their subjective valuation of health; their level of risk aversion; their
genes; etc.). Thus, it really isn’t sufficient to run a kitchen-sink regression (i.e., a regression in which
everything observable is thrown in as a control) to properly identify the causal impact of $D$ on $y$.
∗ Associate Professor, Department of Applied Economics, and
Director, Center for International Food and
Agricultural Policy, University of Minnesota, 1994 Buford Ave,
Saint Paul, MN 55113. This is
the August 2017 version of this handout.
Identification
So how do we identify causality? The best way to do so is to run a randomized controlled trial (RCT), which
we have discussed in lecture. In this case, the idea would be to get a random sample of individuals of size $N$
and to assign half of the sample (i.e., $N/2$) to a control group and half to a treatment group. The treatment group
would be told to consume, say, one glass of orange juice every morning, and the control group would be told not
to do so. Then, after a suitable period of time, we would compare the mean of $y$ between groups. The null
hypothesis would of course be that the mean health of the treatment group is equal to the mean health of
the control group. A rejection of the null in favor of finding that the mean health of the treatment group is
higher than the mean health of the control group would then be evidence in favor of the hypothesis that
orange juice is good for one’s health. More than that, it would be evidence that orange juice
consumption causes good health.
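The comparison of means described above can be sketched as follows. Everything here is hypothetical: the sample size, the assumed treatment `effect`, and the simulated health index are made up for illustration, and the test shown is a large-sample unequal-variance (Welch-type) t-statistic, which is one common way to compare two means rather than necessarily the test used in lecture.

```python
import math
import random
import statistics

rng = random.Random(1)
n = 2000          # hypothetical sample size N
effect = 0.3      # hypothetical causal effect of juice on the health index

# Randomly assign half of the sample to treatment and half to control.
ids = list(range(n))
rng.shuffle(ids)
treated = set(ids[: n // 2])

# Simulated health outcome: baseline noise plus the effect if treated.
health = [rng.gauss(0, 1) + (effect if i in treated else 0.0) for i in range(n)]
y_t = [health[i] for i in range(n) if i in treated]
y_c = [health[i] for i in range(n) if i not in treated]

# Difference in means and its standard error (unequal variances allowed).
diff = statistics.mean(y_t) - statistics.mean(y_c)
se = math.sqrt(statistics.variance(y_t) / len(y_t)
               + statistics.variance(y_c) / len(y_c))
t_stat = diff / se
print(f"difference in means: {diff:.3f}, t-statistic: {t_stat:.2f}")
```

A t-statistic well above conventional critical values (roughly 1.96 for a two-sided test at the 5 percent level in large samples) leads to a rejection of the null of equal means.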
The problem is that it is not always possible to run an RCT, and even the simple example described above
would be subject to important problems. For example, the individuals in the treatment group may not
comply with the experimenters’ instructions, especially if they don’t like orange juice. More generally, they
may simply forget to consume orange juice every morning. Likewise, the individuals in the control group may
end up inadvertently consuming orange juice when they are not supposed to. These reasons, and others,
would contaminate one’s estimate of $\gamma$ in equation 1 and would invalidate the test of equality of means
described above. So what is one to do?
Instrumental Variables Estimation
When one only has observational (i.e., nonexperimental) data at one’s disposal, the best way to identify
causality is to find an instrumental variable (IV) for the endogenous variable. In the example above, the
endogenous variable is $D$, which is said to be endogenous to $y$.
What is an IV? It is a variable $z$ that is (i) correlated with $D$ but (ii) uncorrelated with the error term
$\epsilon$, and which is used to make $D$ exogenous to $y$. How does an IV exogenize an endogenous variable? By
virtue of being correlated with the endogenous variable, yet uncorrelated with the error term, which is the
definition of an instrument.
I realize that this sounds tautological, so consider an example. Angrist (1990) studies the impact of
education ($D$) on wages ($y$). The problem is that education is endogenous to wage, if only because people
acquire education in expectation of the wage they think this will get them. In other words, even if we find a
positive coefficient for education in a regression of wage on explanatory variables, this is merely a
correlation, and it does not necessarily indicate that education causally affects wages.
To instrument for this, Angrist had to find a variable that would be correlated with how much education
someone would get, but uncorrelated with anything unobserved, so that it would affect wage only through
how much education they acquire. The instrument he settled upon was an individual’s Vietnam draft
lottery number: it correlates with whether one goes to war and is then subject to the GI Bill, and
since those numbers are randomly generated, they are uncorrelated with unobservables.
How does IV estimation work, mechanically speaking? Recall that our equation of interest is
$$y_i = \alpha + \beta_1 x_{1i} + \cdots + \beta_K x_{Ki} + \gamma D_i + \epsilon_i. \qquad (1)$$
The way IV estimation proceeds is to first regress the endogenous variable $D$ on the instrument $z$ as well as
on the control variables in $x = (x_1, \ldots, x_K)$, such that
$$D_i = \delta + \theta_1 x_{1i} + \cdots + \theta_K x_{Ki} + \pi z_i + \nu_i. \qquad (2)$$
Once equation 2 is estimated, it is possible to predict the variable $D$, whose prediction we label $\hat{D}$ (the
circumflex accent, or “hat”, denotes a predicted variable in econometrics) and to then estimate equation 1
as follows
$$y_i = \alpha + \beta_1 x_{1i} + \cdots + \beta_K x_{Ki} + \gamma \hat{D}_i + \epsilon_i. \qquad (1')$$
Note what has been done here: we have replaced the endogenous variable with an exogenized version of the
same variable. It has been exogenized by regressing it on the IV, which is exogenous to the
outcome of interest, and obtaining its predicted value, which we then use in lieu of the original endogenous
variable.
The first requirement of an instrument (i.e., that it be correlated with $D$) is easily testable: we only need
to check that the coefficient $\pi$ in equation 2 is significantly different from zero. The second
requirement of an instrument (i.e., that it only affect the outcome of interest $y$ through the treatment
variable) cannot be tested for. Rather, one must make the case that it is truly exogenous to the outcome of
interest. This is easier said than done in most cases, as some people have devoted entire careers to finding
good IVs.
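The two steps above can be sketched on simulated data. This is a minimal illustration under invented assumptions: there are no control variables, the instrument `z` is random by construction, an unobserved factor `u` makes `D` endogenous, and the true effect `true_gamma` is set to 1. The helper `simple_ols` is hypothetical and written only for this example.

```python
import random

def simple_ols(x, y):
    """Return (intercept, slope) from a regression of y on a single x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

rng = random.Random(2)
n = 50000
true_gamma = 1.0  # assumed causal effect of D on y

u = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
z = [rng.gauss(0, 1) for _ in range(n)]   # instrument, independent of u
d = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [true_gamma * di + ui + rng.gauss(0, 1) for di, ui in zip(d, u)]

# First stage: regress D on z and form the prediction D-hat.
fs_intercept, fs_slope = simple_ols(z, d)
d_hat = [fs_intercept + fs_slope * zi for zi in z]

# Second stage: regress y on D-hat instead of D.
_, gamma_iv = simple_ols(d_hat, y)

# Naive regression of y on D, for comparison.
_, gamma_ols = simple_ols(d, y)
print(f"first-stage slope: {fs_slope:.3f}")
print(f"OLS: {gamma_ols:.3f}, IV: {gamma_iv:.3f}")
```

Running the second stage by hand like this recovers the IV point estimate, but the standard errors from that second regression are not the correct ones; statistical packages with a dedicated IV routine adjust them.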
References
Angrist, Joshua D. (1990), “Lifetime Earnings and the Vietnam
Era Draft Lottery: Evidence from the Social
Security Administrative Records,” American Economic Review
80(3): 313-336.
Therapeutic Communication
1. In the movie Shutter Island, Dr. Crawley asked Teddy (Andrew) to explain what happened when
he discovered his deceased children. What therapeutic communication technique is this an example of?
a. Closed ended questions
b. Using silence
c. Giving broad opening
d. Open ended questions
Rationale: Open ended questions is correct because this is giving Andrew a chance to answer with
more than a simple yes or no. Using silence is incorrect because he is not being quiet. Closed ended
questions is incorrect because he cannot answer the question with just a yes or no answer. Giving broad
opening is incorrect because Dr. Crawley did not ask Andrew to pick the topic and express his thoughts.
2. Dr. Crawley told Andrew if he had another episode, he would
be lobotomized. In the ending of
the movie, Teddy called Dr. Sheehan, Chuck. Dr. Sheehan nods
his head at Dr. Crawley giving a signal.
What therapeutic communication is this?
a. Making observations
b. Presenting reality
c. Offering self
d. Accepting
Rationale: Making observations is correct because Dr. Sheehan observed that Teddy called him Chuck,
verifying that he was having another episode. Presenting reality is incorrect because he was not trying to
bring him back into reality. Offering self is incorrect because the doctors are not offering their presence or
interest to Teddy. Accepting is incorrect because they are not accepting Teddy’s behavior as normal.
3. In the movie, Shutter Island, the staff of Ashecliffe allowed
Andrew to play out the role of Teddy
hoping to cure his conspiracy insanity. This is an example of
what kind of therapeutic communication?
a. Accepting
b. Exploring
c. Engaging into fantasy
d. Denial
Rationale: Engaging into fantasy is correct because the staff
played into his fantasy of finding Andrew
Laeddis. Exploring is incorrect because exploring is delving
further into a subject, idea, experience, or
relationship. Denial is incorrect because no one was in denial
about his state of mind. Accepting is
incorrect because they know this act is only to cure his disease.
4. Which question shows an example of the therapeutic
communication placing events in
sequence?
a. “You feel angry when he doesn’t help.”
b. “Are you feeling…”
c. “What could you do to let your anger out harmlessly?”
d. “Will you please tell me more about the situation with all the
details?”
Rationale: “Will you please tell me more about the situation
with all the details” is an example of placing
events in sequence because it will tell you more about the
situation and you can piece together when
they happened. The others are incorrect because “You feel
angry when he doesn’t help” is an example
of focusing, “Are you feeling…” is an example of verbalizing
the implied and, “What could you do to let
your anger out harmlessly?” is an example of formulating a plan
of action.
5. Which of the following statements by Andrew Laeddis shows
the therapeutic communication of
understanding?
a. I feel ok.
b. This is a very difficult situation.
c. Yes, I understand that Teddy Daniels does not exist.
d. You are not listening to me.
Rationale: “Yes, I understand that Teddy Daniels does not exist” is an example of understanding because
it conveys an attitude of receptivity and regard. The other statements “I feel ok”, “This is a very difficult
situation”, and “You are not listening to me” are incorrect because they are responses to implied
questions.
Defense Mechanism
1. What defense mechanism does Teddy exhibit in the movie Shutter Island?
a. Sublimation
b. Rationalization
c. Projection
d. Fantasy
Rationale: Teddy is exhibiting fantasy because he is gratifying frustrated desires by imaginary
achievements. In the movie Teddy is creating his own fantasy world where he is still a U.S. Marshal, and
he creates his own story of what happened to his wife because he does not want to face the reality of
what really happened. Sublimation is incorrect because he is not channeling an unacceptable impulse
in a socially acceptable direction. Rationalization is incorrect because he is not trying to justify attitudes,
beliefs, or behaviors. Projection is incorrect because he is not attributing his own unacceptable behavior
to someone else.
NUR 114 Nursing Concept II
TYPES OF QUESTIONS
• Open questions
These are useful in getting another person to speak. They often
begin with the words: What, Why,
When, Who
Sometimes they are statements: “tell me about”, “give me
examples of”.
They can provide you with a good deal of information.
• Closed questions
These are questions that require a yes or no answer and are
useful for checking facts. They should
be used with care - too many closed questions can cause
frustration and shut down conversation.
• Specific questions
These are used to determine facts. For example, “How much did you spend on that?”
• Probing questions
These check for more detail or clarification. Probing questions
allow you to explore specific areas.
However, be careful, because they can easily make people feel they are being interrogated.
• Hypothetical questions
These pose a theoretical situation in the future. For example, “What would you do if…?” These
can be used to get others to think of new situations. They can
also be used in interviews to find
out how people might cope with new situations.
• Reflective questions
You can use these to reflect back what you think a speaker has
said, to check understanding. You
can also reflect the speaker’s feelings, which is useful in
dealing with angry or difficult people and
for defusing emotional situations.
• Leading questions
These are used to gain acceptance of your view – they are not
useful in providing honest views
and opinions. If you say to someone ‘you will be able to cope,
won’t you?’ they may not like to
disagree.
You can use a series of different types of questions to “funnel” information. This is a way of
structuring information in sequence to explore a topic and to get
to the heart of the issues. You may use
an open question, followed by a probing question, then a
specific question and a reflective question.
A PRIMER ON LINEAR REGRESSION
Marc F. Bellemare∗
Introduction
This set of lecture notes was written to allow you to understand the classical linear regression model, which
is one of the most common tools of statistical analysis in the social sciences. Among other things, a regression
allows the researcher to estimate the impact of a variable of interest $D$ on an outcome of interest $y$ holding
other included factors $x = (x_1, \ldots, x_K)$ constant. For the purposes of this discussion, let $D$ measure a given
policy, $y$ measure welfare, and the vector $x$ measure the various control variables the researcher has seen fit
to include.
Example
For example, one might be interested in the impact of individuals’ years of education $D$ on their wage $y$ while
controlling for age, gender, race, state, sector of employment, etc. in $x$. Generally, social science research is
interested in the impact of a specific variable of interest on an outcome of interest, i.e., in the impact of $D$ on
$y$.
Mechanics
The regression of $y$ on $(D, x_1, \ldots, x_K)$ is typically written as
$$y_i = \alpha + \beta_1 x_{1i} + \cdots + \beta_K x_{Ki} + \gamma D_i + \epsilon_i, \qquad (1)$$
where i denotes a unit of observation. In the example of wages
and education, the unit of observation would
be an individual, but units of observations can be individuals,
households, plots, firms, villages, communities,
countries, etc. Just as the research question should drive the choice of what to measure for $y$, $D$, and $x$, the
research question also drives the choice of the relevant unit of observation.
When estimating equation 1, the researcher will have data on $N$ units of observation, so $i = 1, \ldots, N$.
Alternatively, we say that $N$ is the sample size. For each of those $N$ units, the researcher will have data on $y$,
$D$, and $x$. In other words, we will ignore the problem of missing data, as observations with missing data are
usually dropped by most statistical packages.
The role of regression analysis is to estimate the coefficients $(\alpha, \beta_1, \ldots, \beta_K, \gamma)$. To differentiate the “true”
coefficients from coefficient estimates, we will use a circumflex accent (i.e., a “hat”) to denote estimated
coefficients. Therefore, the estimated $(\alpha, \beta_1, \ldots, \beta_K, \gamma)$ will be denoted $(\hat{\alpha}, \hat{\beta}_1, \ldots, \hat{\beta}_K, \hat{\gamma})$.
Going back to our interest in estimating the impact of $D$ on $y$ at the margin, this impact is represented in the
context of equation 1 by the parameter $\gamma$. Indeed, if you remember your partial derivatives, the marginal
impact of $D$ on $y$ is equal to $\partial y / \partial D = \gamma$. Moreover, the partial derivative is such that only $D$
varies. In other words, $\gamma$ measures the impact of a change in $D$ on $y$ holding everything else constant, or
ceteris paribus. In this case, what we mean by “everything else” is limited only to the factors that are included
in the vector $x$ of control variables. Whatever is not included among the variables $x$ is not held constant by
regression analysis.
Indeed, the relationship in equation 1 is not deterministic in the sense that even if we have data for $y$, $D$, and
$x$ and credible parameter estimates $(\hat{\alpha}, \hat{\beta}_1, \ldots, \hat{\beta}_K, \hat{\gamma})$, we will still not be able to perfectly forecast
$y$. That
is because there are several things about any given problem that
we, as social science researchers, do not
observe and are not privy to. Individuals have intrinsic
motivations that even they may have difficulty
expressing. Individuals make errors. Individuals experience
unforeseen events. There are factors which are
very important in determining $y$ but which we simply do not observe.
For all these reasons, we add an error term $\epsilon$ at the end of equation 1. The error term simply represents our
ignorance about the problem. As such, it includes all of the things that we did not think of including on the
right-hand side of equation 1, as well as all of the things that we could not include on the right-hand side of
equation 1. The error term $\epsilon$ thus embodies our ignorance about the relationship between two variables.
So how does a linear regression actually work? To take an example I know well, suppose we are looking at
only two variables: rice yield (i.e., kg/are), which will be our outcome of interest $y$ since it represents
agricultural productivity, and cultivated area (number of ares, or hundredths of a hectare, or 100 square
meters), which will be our variable of interest $D$. Indeed, the inverse relationship between farm or plot size
and productivity has been a longstanding empirical puzzle in development microeconomics (Barrett et al.,
2010). So, plotting some data on this question, we get the following figure.
The scatter plot in figure 1 directly shows that the relationship
between yield and cultivated area is not
deterministic. That is, the relationship between the two
variables is not a straight line, and the fact that the
relationship is scattered indicates that there are other factors
besides cultivated area that contributed to
determining rice productivity.
The role of the regression – and, as we will soon understand, of
the error term – is to linearly approximate as
best as possible the relationship between two variables. In other
words, to do something that looks like the
red line in figure 2.
Figure 1. Rice Productivity Scatter (vertical axis: Yield; horizontal axis: Cultivated Area)
Figure 2. Rice Productivity Scatter and Regression Line (Yield and Fitted values against Cultivated Area)
Note that we indeed find an inverse relationship between plot size and productivity, since the regression line
slopes downward, which means that in this context, $\hat{\gamma} < 0$. Indeed, running a simple regression of rice yield
on cultivated area yields $\hat{\alpha} = 4.187$ (and we can see from the graph that 4.187 would indeed be the value of
rice productivity at a cultivated area of zero ares) and $\hat{\gamma} = -0.356$, with both coefficients statistically
different from zero at less than the 1 percent level (i.e., if $\alpha$ and $\gamma$ were truly zero, estimates this far from
zero would arise by chance less than one percent of the time). In other words, the finding here is that
on average,
$$y = 4.187 - 0.356 D, \qquad (2)$$
or for every 1 percent increase in cultivated area, rice productivity decreases by 0.356 percent. This may
seem counterintuitive, but remember that productivity is not total output; it is only a measure of average
productivity on the plot.
So how does the apparatus of the linear regression determine the value of the intercept and the value of the
slope of the regression line in figure 2? This is where our error term comes into play. Indeed, a linear
regression will choose, among all possible lines, the one that minimizes the sum of the distances between
each point in the scatter and the line itself, under the assumption that the error is on average equal to zero
(i.e., that our predictions are right on average). Assuming that the error term is equal to zero on average and
minimizing the sum of all point-line distances (technically, the sum of squared errors) allows us to obtain
estimates $\hat{\alpha}$ and $\hat{\gamma}$ of the true parameters $\alpha$ and $\gamma$.
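To make the minimization concrete, here is a small sketch that fits a line by least squares and checks the two properties just mentioned. The data are simulated, not the rice data from the handout: the values 4.187 and -0.356 are borrowed from equation 2 only as assumed "true" parameters, and the noise level is invented. For a single regressor, the least-squares slope has the well-known closed form of the sample covariance divided by the sample variance.

```python
import random

def fit_line(x, y):
    """Least-squares intercept and slope for a regression of y on one x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def sse(alpha, gamma, x, y):
    """Sum of squared errors of the line alpha + gamma * x."""
    return sum((b - alpha - gamma * a) ** 2 for a, b in zip(x, y))

rng = random.Random(3)
n = 466
# Simulated plots: assumed true line plus invented noise.
area = [rng.uniform(-2, 6) for _ in range(n)]
yield_ = [4.187 - 0.356 * a + rng.gauss(0, 0.5) for a in area]

alpha_hat, gamma_hat = fit_line(area, yield_)
residuals = [b - alpha_hat - gamma_hat * a for a, b in zip(area, yield_)]

# The fitted residuals average to zero (up to floating-point error) ...
mean_residual = sum(residuals) / n
# ... and moving away from the fitted line increases the sum of squares.
best = sse(alpha_hat, gamma_hat, area, yield_)
worse = sse(alpha_hat + 0.1, gamma_hat - 0.05, area, yield_)
print(f"alpha_hat={alpha_hat:.3f}, gamma_hat={gamma_hat:.3f}")
print(f"mean residual={mean_residual:.2e}, SSE best={best:.1f}, SSE perturbed={worse:.1f}")
```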
A few remarks are in order. First off, note that the constant term (or the intercept) $\alpha$ does not have an
economic interpretation in this case, since a cultivated area of zero really entails a yield of zero. Second, since
the only factor we included on the right-hand side of equation 1 was cultivated area, the error term includes
a lot of things which may be potentially crucial in determining yield. For example, the plot’s position on the
toposequence, the quality of the soil, the source of irrigation of the plot, various characteristics of the
household operating the plot, etc. So because we are typically interested in the impact of $D$ on $y$ controlling
for a number of factors $x$, regression results will not be presented in the form of figure 2. Indeed, regression
results are typically presented in the form of table 1 at the end of this document.
How do we interpret table 1? First off, note that N = 466. That
is, we have data on 466 plots. The first column
tells us what variables are included on the right-hand side of
equation 1, viz. cultivated area; land value; total
land owned by the household; household size (number of
individuals); household dependency ratio
(proportion of dependents within the household); whether the
household head is a single female or a single
male; whether the plot is irrigated by a dam, a spring, or
rainfed; soil quality measurements (carbon,
nitrogen, potassium percentages; soil pH; clay, silt, and sand
percentages); and an intercept. The second column shows the estimated coefficients for the first
specification of equation 1 (in this case, a pooled cross-section of all the plots and all the households, i.e., a
specification which ignores the fact that some households own more than just one plot in the sample); and the
third column shows the standard errors around each estimated coefficient.
These standard errors are used to determine whether each
coefficient is statistically significantly different
from zero or not. To make life simpler, table 1 shows whether
coefficients are significant at the 10, 5, or 1
percent levels by using the symbols *, **, and *** respectively.
Note that in all cases, there is a (significant)
inverse relationship between cultivated area and rice
productivity.
Taking column 1 as an example of how to interpret regression
results, what can we say? First and foremost,
note that for a 1 percent increase in cultivated area, there is an
associated productivity decrease of 0.27
percent (alternatively, a doubling of the size of the plot would
be associated with a 27-percent decrease in
productivity). Moreover, we can note three things. First off, the
more valuable a plot, the more productive it
is; second, plots irrigated by a dam are more productive than
plots without any irrigation; and third, plots
irrigated by a spring are more productive than plots without any
irrigation. In fact, comparing the magnitude
of the coefficient estimates for irrigation by a dam and
irrigation by a spring, we see that the impacts of these
two types of irrigation are essentially the same.
Another thing of note in table 1 is how the coefficient on the
variable of interest changes depending on what
is included on the right-hand side of equation 1. Comparing the first two specifications (i.e., pooled
cross-section vs. household fixed effects), note how the magnitude of the inverse relationship between
productivity and cultivated area is reduced from -0.271 to -0.176 when household fixed effects (i.e., controls
for household-specific unobservable characteristics, which is made possible here because there are 286
households for 466 plots; in other words, there are some households who own more than one plot in the
sample) are included. This indicates that a great deal of the
inverse relationship can be attributed to
household-specific, otherwise unobservable factors. Likewise,
comparing specifications 1 and 3 (i.e., pooled
cross-section vs. soil quality), note again how the magnitude of
the inverse relationship between productivity
and cultivated area is reduced from -0.271 to -0.265 when soil
quality measurements are included. Overall,
this indicates that household-specific, unobserved factors are
more important in driving the inverse
relationship than the omission of soil quality measurements. In
any event, a comparison of specifications 1
and 2 and of specifications 1 and 3 point to an important
endogeneity problem (in this case, an omitted
variables problem) caused by the omission, respectively, of
household fixed effects and of soil quality
measurements.
References
Barrett, Christopher B., Marc F. Bellemare, and Janet Y. Hou
(2010), “Reconsidering Conventional
Explanations of the Inverse Productivity—Size Relationship,”
World Development 38(1): 88-97.
Table 1 – Yield Approach Estimation Results (n=466)
Dependent Variable: Rice Yield (Kilograms/Are)
Specifications: (1) Pooled Cross-Section; (2) Household Fixed Effects; (3) Soil Quality; (4) Household Fixed Effects and Soil Quality
Each cell reports Coefficient (Std. Err.).

Variable                   | (1)               | (2)               | (3)               | (4)
Cultivated Area            | -0.271*** (0.038) | -0.176*** (0.046) | -0.265*** (0.048) | -0.187*** (0.052)
Total Land Area            | -0.055 (0.038)    |                   | -0.054 (0.047)    |
Land Value                 | 0.183*** (0.031)  | 0.303*** (0.069)  | 0.176*** (0.032)  | 0.287*** (0.063)
Household Characteristics
Household Size             | -0.007 (0.009)    |                   | -0.008 (0.008)    |
Dependency Ratio           | -0.073 (0.130)    |                   | -0.083 (0.144)    |
Single Female              | -0.056 (0.111)    |                   | -0.070 (0.119)    |
Single Male                | 0.133 (0.134)     |                   | 0.122 (0.155)     |
Plot Characteristics
Irrigated by Dam           | 0.389** (0.171)   | 0.228 (0.202)     | 0.450** (0.211)   | 0.545 (0.402)
Irrigated by Spring        | 0.365** (0.175)   | 0.250 (0.214)     | 0.425** (0.214)   | 0.541 (0.389)
Irrigated by Rain          | 0.184 (0.180)     | 0.024 (0.217)     | 0.249 (0.220)     | 0.313 (0.431)
Soil Quality Measurements
Carbon                     |                   |                   | -1.361 (1.510)    | -0.001 (1.844)
Nitrogen                   |                   |                   | 1.668 (1.781)     | -0.007 (2.750)
pH                         |                   |                   | -1.064 (7.163)    | -17.969 (14.459)
Potassium                  |                   |                   | 1.183 (1.412)     | -5.528* (3.035)
Clay                       |                   |                   | 0.293 (3.115)     | -5.183 (4.174)
Silt                       |                   |                   | 0.521 (5.751)     | 4.485 (11.681)
Sand                       |                   |                   | -0.135 (5.607)    | 5.261 (7.106)
Intercept                  | -1.847*** (0.372) | -3.162*** (0.694) | -2.276 (1.552)    | -3.328*** (0.819)
Number of Households       | –                 | 286               | –                 | 286
Bootstrap Replications     | –                 | –                 | 500               | 500
Village Fixed Effects      | Yes               | Dropped           | Yes               | Dropped
R²                         | 0.45              | 0.97              | 0.46              | 0.97
p-value (All Coefficients) | 0.00              | 0.00              | 0.00              | 0.00
p-value (Fixed Effects)    | –                 | 0.00              | –                 | 0.00
p-value (Soil Quality)     | –                 | –                 | 0.79              | 0.52
***, ** and * indicate statistical significance at the one, five and ten percent levels, respectively.
Ordinary Least-Squares
One-dimensional regression
Find a line $y = ax$ that represents the “best” linear relationship between $x$ and $y$.
• Problem: the data do not go through a line, so each point leaves an error $e_i = y_i - x_i a$.
• Find the line that minimizes the sum $\sum_i (y_i - x_i a)^2$.
• We are looking for the $\hat{a}$ that minimizes $e(a) = \sum_i (y_i - x_i a)^2$.
Multidimensional linear regression
Using a model with $m$ parameters,
$$y = a_1 x_1 + \cdots + a_m x_m = \sum_j a_j x_j,$$
and $n$ measurements,
$$e(a) = \sum_{i=1}^{n} \Big[ y_i - \sum_{j=1}^{m} x_{i,j} a_j \Big]^2 = \lVert y - Xa \rVert^2,$$
where $X$ collects the $x_{i,j}$ in an $n \times m$ matrix.
Minimizing $e(a)$
$a_{\min}$ denotes the value of $a$ that minimizes $e(a)$.
The remaining slides contrast the following cases (figures not reproduced):
• no errors in $x_i$ versus errors in $x_i$
• homogeneous errors versus non-homogeneous errors
• no outliers versus outliers
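One standard way to carry out the multidimensional minimization is to solve the normal equations $X^\top X a = X^\top y$. The slides do not say which method they use, so the sketch below is an assumption on that point; the helper names `solve` and `least_squares` and the tiny data set are made up for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def least_squares(X, y):
    """Return the a that minimizes ||y - X a||^2 via the normal equations."""
    m = len(X[0])
    XtX = [[sum(row[j] * row[k] for row in X) for k in range(m)]
           for j in range(m)]
    Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(m)]
    return solve(XtX, Xty)

# Four measurements, two parameters; the first column of ones acts as an intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]  # generated exactly from y = 1 + 2 x
a_hat = least_squares(X, y)
print(a_hat)
```

Because the example data lie exactly on a line, the fitted parameters reproduce the generating values; with noisy data the same function returns the best-fitting parameters in the least-squares sense. Solving the normal equations directly can be numerically fragile for ill-conditioned problems, where QR- or SVD-based solvers are usually preferred.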

  • 3. genes; etc.) Thus, it really isn’t sufficient to run a kitchen-sink regression (i.e., a regression in which everything observable is thrown in as a control) to properly identify the causal impact of �� on ��. ∗ Associate Professor, Department of Applied Economics, and Director, Center for International Food and Agricultural Policy, University of Minnesota, 1994 Buford Ave, Saint Paul, MN 55113, [email protected] This is the August 2017 version of this handout. mailto:[email protected] 2 Identification So how do we identify causality? The best way to do so is to run a randomized controlled trial (RCT), which we have discussed in lecture. In this case, the idea would be to get a random sample of individuals of size �� and to assign half of the sample (i.e., �� 2⁄ ) to a control group and half to a treatment group. The latter group would be told to consume, say, one glass of orange juice every morning, and the other half would be told not to do so. Then, after a suitable period of time, we would compare the mean of �� between groups. The null hypothesis would of course be that the mean health of the treatment group is equal to the mean health of the control group. A rejection of the null in favor of finding that the mean health of the treatment group is higher than the mean health of the control group would then be evidence in favor of the hypothesis that orange juice is good for one’s health. More than that – it would
  • 4. be evidence in favor that orange juice consumption causes good health. The problem is that it is not always possible to run an RCT, and even the simple example described above would be subject to important problems. For example, the individuals in the treatment group may not comply with the experimenters instructions, especially if they don’t like orange juice. More generally, they may simply forget to consume orange juice every morning. Likewise, the individuals in the control group may end up inadvertently consuming orange juice when they are not supposed to. These reasons – and others – would contaminate one’s estimate of �� in equation 1 and would invalidate the test of equality of means described above. So what is one to do? Instrumental Variables Estimation When one only has observational (i.e., nonexperimental) data at one’s disposal, the best way to identify causality is to find an instrumental variable (IV) for the endogenous variable. In the example above, the endogenous variable is ��, which is said to be endogenous to ��. What is an IV? It is a variable �� that is (i) correlated with ��; but (ii) uncorrelated with �� and which is used to make �� exogenous to ��. How does an IV exogenize an endogenous variable? By virtue of being correlated with the endogenous variable, yet uncorrelated with the error term, which is the definition of an instrument. I realize that this sounds tautological, so for example, Angrist (1990) studies the impact of education (��)
  • 5. on wages (��). The problem is that education is endogenous to wage, if anything because people acquire education in expectation of the wage they think this will get them. In other words, even if we find a positive coefficient for education in a regression of wage on explanatory variables, this is merely a correlation, and it does not necessarily indicate that education causally affects wages. To instrument for this, Angrist had to find a variable that would be correlated with how much education someone would get, but uncorrelated with anything unobserved and would affect wage only through how much education they acquire. The instrument he settled upon was an individual’s Vietnam draft lottery number, since this correlates with whether one goes to war and is then subject to the GI Bill, but since those numbers are randomly generated, they are uncorrelated with unobservables. How does IV estimation work, mechanically speaking? Recall that our equation of interest is 3 ���� = �� + ��1��1�� + ⋯ + ���������� + ������ + ����. (1) The way IV estimation proceeds is to first regress the endogenous variable �� on the instrument �� as well as on the control variables in �� = (��1, … , ����), such that ���� = �� + ��1��1�� + ⋯ + ���������� +
  • 6. ������ + ����. (2) Once equation 2 is estimated, it is possible to predict the variable ��, whose prediction we label ��� (the circumflex accent – or “hat” – denotes a predicted variable in econometrics) and to then estimate equation 1 as follows ���� = �� + ��1��1�� + ⋯ + ���������� + ������� + ����. (1’) Note what has been done here: we have replaced the endogenous variable with an exogenized version of the same variable. The way it has been exogenized has been by regressing it on the IV, which is exogenous to the outcome of interest, and to obtain its predicted value, which we then use in lieu of the original endogenous variable. The first requirement of an instrument – i.e., that it be correlated with �� – is easily testable: we only need to check that the coefficient �� in equation 2 is significantly different enough from zero. The second requirement of an instrument – i.e., that it only affect the outcome of interest �� through the treatment variable – cannot be tested for. Rather, one must make the case that it is truly exogenous to the outcome of interest. This is easier said than done in most cases, as some people have devoted entire careers to finding good IVs. References Angrist, Joshua D. (1990), “Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from the Social Security Administrative Records,” American Economic Review
  • 7. 80(3): 313-336. IntroductionMechanicsIdentificationInstrumental Variables EstimationReferences Therapeutic Communication 1. In the movie, Shutter Island, Dr. Crawley asked Teddy (Andrew) to explain what happened when he discovered his deceased children? What therapeutic communication is this an example of? a. Closed ended questions b. Using silence c. Giving broad opening d. Open ended questions Rationale: Opened ended questions is correct because this i s giving Andrew a chance to answer with more than a simple yes or no. Using silence is incorrect because he is not being quiet. Closed ended questions is incorrect because he cannot answer the question with just a yes or no answer. Giving broad
  • 8. opening is incorrect because Dr. Crawley did ask Andrew to pick the topic and express his thoughts. 2. Dr. Crawley told Andrew if he had another episode, he would be lobotomized. In the ending of the movie, Teddy called Dr. Sheehan, Chuck. Dr. Sheehan nods his head at Dr. Crawley giving a signal. What therapeutic communication is this? a. Making observations b. Presenting reality c. Offering self d. Accepting Rationale: Making observations is correct because Dr. Sheehan observed that Teddy called him Chuck, verifying that he was having another episode. Presenting reality is incorrect because he was not trying to bring him back into reality. Offering self is incorrect because the doctors are not “interested” into Teddy. Accepting is incorrect because they are not accepting Teddy behavior as normal. 3. In the movie, Shutter Island, the staff of Ashecliffe allowed Andrew to play out the role of Teddy hoping to cure his conspiracy insanity. This is an example of what kind of therapeutic communication?
  • 9. a. Accepting b. Exploring c. Engaging into fantasy d. Denial Rationale: Engaging into fantasy is correct because the staff played into his fantasy of finding Andrew Laeddis. Exploring is incorrect because exploring is delving further into a subject, idea, experience, or relationship. Denial is incorrect because no one was in denial about his state of mind. Accepting is incorrect because they know this act is only to cure his disease. 4. Which question shows an example of the therapeutic communication placing events in sequence? a. “You feel angry when he doesn’t help.” b. “Are you feeling…” c. “What could you do to let your anger out harmlessly?” d. “Will you please tell me more about the situation with all the details?”
  • 10. Rationale: “Will you please tell me more about the situation with all the details” is an example of placing events in sequence because it will tell you more about the situation and you can piece together when they happened. The others are incorrect because “You feel angry when he doesn’t help” is an example of focusing, “Are you feeling…” is an example of verbalizing the implied and, “What could you do to let your anger out harmlessly?” is an example of formulating a plan of action. 5. Which of the following statements by Andrew Laeddis shows the therapeutic communication of understanding? a. I feel ok. b. This is a very difficult situation. c. Yes, I understand that Teddy Daniels does not exist. d. You are not listening to me. Rationale: “Yes, I understand that Teddy Daniels does not exist” is an example of understanding because it coveys an attitude of receptivity and regard. The other statements “I feel ok”, “This is a very difficult situation”, and “You are not listening to me” are incorrect
  • 11. because they are responses of implied questions. Defense Mechanism 1. What defense mechanism does Teddy exhibit in the movie Shutter Island? a. Sublimination b. Rationalization c. Projection d. Fantasy Rationale: Teddy is exhibiting fantasy because he is gratifying frustrated desires by imaginary achievements. In the movie Teddy is creating his own fantasy world where he is still a US Marshall, and he creates his own story of what happened to his wife because he does not want to face the reality of what really happened. Sublimination is incorrect because he is not channeling an unacceptable impulse in a socially acceptable direction. Rationalization is incorrect because he is not trying to justify attitudes, beliefs, or behaviors. Projection is incorrect because he is not attributing his own unacceptable behavior unto someone else.
  • 12. NUR 114 Nursing Concept II TYPES OF QUESTIONS • Open questions These are useful in getting another person to speak. They often begin with the words: What, Why, When, Who Sometimes they are statements: “tell me about”, “give me examples of”. They can provide you with a good deal of information. • Closed questions These are questions that require a yes or no answer and are useful for checking facts. They should be used with care - too many closed questions can cause frustration and shut down conversation. • Specific questions These are used to determine facts. For example “How much did you spend on that” • Probing questions
  • 13. These check for more detail or clarification. Probing questions allow you to explore specific areas. However be careful because they can easily make people feel they are being interrogated . • Hypothetical questions These pose a theoretical situation in the future. For example, “What would you do if…?’ These can be used to get others to think of new situations. They can also be used in interviews to find out how people might cope with new situations. • Reflective questions You can use these to reflect back what you think a speaker has said, to check understanding. You can also reflect the speaker’s feelings, which is useful in dealing with angry or difficult people and for defusing emotional situations. • Leading questions. These are used to gain acceptance of your view – they are not useful in providing honest views and opinions. If you say to someone ‘you will be able to cope, won’t you?’ they may not like to disagree. You can use a series of different type of questions to “funnel” information. This is a way of structuring information in sequence to explore a topic and to get to the heart of the issues. You may use an open question, followed by a probing question, then a specific question and a reflective question.
  • 14. A PRIMER ON LINEAR REGRESSION. Marc F. Bellemare∗. Introduction: This set of lecture notes was written to allow you to understand the classical linear regression model, which is one of the most common tools of statistical analysis in the social sciences. Among other things, a regression allows the researcher to estimate the impact of a variable of interest $D$ on an outcome of interest $y$ holding other included factors $x = (x_1, \ldots, x_K)$ constant. For the purposes of this discussion, let $D$ measure a given policy, $y$ measure welfare, and the vector $x$ measure the various control variables the researcher has seen fit to include. Example: One might be interested in the impact of individuals’ years of education $D$ on their wage $y$ while controlling for age, gender, race, state, sector of employment, etc. in $x$. Generally, social science research is interested in the impact of a specific variable of interest on an outcome of interest, i.e., in the impact of $D$ on $y$. Mechanics: The regression of $y$ on $(D, x_1, \ldots, x_K)$ is typically written as
  • 15. $y_i = \alpha + \beta_1 x_{1i} + \cdots + \beta_K x_{Ki} + \gamma D_i + \epsilon_i$, (1) where $i$ denotes a unit of observation. In the example of wages and education, the unit of observation would be an individual, but units of observation can be individuals, households, plots, firms, villages, communities, countries, etc. Just as the research question should drive the choice of what to measure for $y$, $D$, and $x$, the research question also drives the choice of the relevant unit of observation. When estimating equation 1, the researcher will have data on $N$ units of observation, so $i = 1, \ldots, N$. Alternatively, we say that $N$ is the sample size. For each of those $N$ units, the researcher will have data on $y$, $D$, and $x$. In other words, we will ignore the problem of missing data, as observations with missing data are usually dropped by most statistical packages. The role of regression analysis is to estimate the coefficients $(\alpha, \beta_1, \ldots, \beta_K, \gamma)$. To differentiate the “true” coefficients from coefficient estimates, we will use a circumflex accent (i.e., a “hat”) to denote estimated coefficients. Therefore, the estimated $(\alpha, \beta_1, \ldots, \beta_K, \gamma)$ will be denoted $(\hat{\alpha}, \hat{\beta}_1, \ldots, \hat{\beta}_K, \hat{\gamma})$. Going back to our interest in estimating the impact of $D$ on $y$ at the margin, this impact is represented in the context of equation 1 by the parameter $\gamma$. Indeed, if you remember your partial derivatives, the marginal
  • 16. impact of $D$ on $y$ is equal to $\partial y / \partial D = \gamma$. Moreover, the partial derivative is such that only $D$ varies. In other words, $\gamma$ measures the impact of a change in $D$ on $y$ holding everything else constant, or ceteris paribus. In this case, what we mean by “everything else” is limited only to the factors that are included in the vector $x$ of control variables. Whatever is not included among the variables $x$ is not held constant by regression analysis. Indeed, the relationship in equation 1 is not deterministic in the sense that even if we have data for $y$, $D$, and $x$ and credible parameter estimates $(\hat{\alpha}, \hat{\beta}_1, \ldots, \hat{\beta}_K, \hat{\gamma})$, we will still not be able to perfectly forecast $y$. That is because there are several things about any given problem that we, as social science researchers, do not observe and are not privy to. Individuals have intrinsic motivations that even they may have difficulty expressing. Individuals make errors. Individuals experience unforeseen events. There are factors which are very important in determining $y$ but which we simply do not observe. ∗ Associate Professor, Department of Applied Economics, and Director, Center for International Food and Agricultural Policy, University of Minnesota, 1994 Buford Ave, Saint Paul, MN 55113, [email protected] This is the August 2017 version of this handout.
  • 17. For all these reasons, we add an error term $\epsilon$ at the end of equation 1. The error term simply represents our ignorance about the problem. As such, it includes all of the things that we did not think of including on the right-hand side of equation 1, as well as all of the things that we could not include on the right-hand side of equation 1. The error term $\epsilon$ thus embodies our ignorance about the relationship between two variables. So how does a linear regression actually work? To take an example I know well, suppose we are looking at only two variables: rice yield (i.e., kg/are), which will be our outcome of interest $y$ since it represents agricultural productivity, and cultivated area (number of ares; one are is a hundredth of a hectare, or 100 square meters), which will be our variable of interest $D$. Indeed, the inverse relationship between farm or plot size and productivity has been a longstanding empirical puzzle in development microeconomics (Barrett et al., 2010). So, plotting some data on this question, we get the following figure. The scatter plot in figure 1 directly shows that the relationship between yield and cultivated area is not deterministic. That is, the relationship between the two variables is not a straight line, and the fact that the relationship is scattered indicates that there are other factors besides cultivated area that contributed to determining rice productivity. The role of the regression (and, as we will soon understand, of the error term) is to linearly approximate as best as possible the relationship between two variables. In other words, to do something that looks like the red line in figure 2.
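As a minimal sketch of what "linearly approximate as best as possible" can look like in practice, the snippet below fits an intercept and a slope to synthetic data using the usual least-squares formulas. The data, the sample size, and the "true" values 4.0 and -0.35 are assumptions made up for illustration; they are not the handout's rice data, and the names `D`, `y`, `alpha_hat`, and `gamma_hat` are chosen here only to echo the notation of equation 1.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not the handout's data set).
rng = np.random.default_rng(0)
n = 200
D = rng.uniform(-2.0, 6.0, size=n)      # variable of interest (e.g. cultivated area)
eps = rng.normal(0.0, 0.5, size=n)      # error term: everything we do not observe
y = 4.0 - 0.35 * D + eps                # outcome of interest (e.g. yield)

# Least-squares slope and intercept for a regression of y on D alone.
D_bar, y_bar = D.mean(), y.mean()
gamma_hat = np.sum((D - D_bar) * (y - y_bar)) / np.sum((D - D_bar) ** 2)
alpha_hat = y_bar - gamma_hat * D_bar

# Residuals: the part of y the fitted line does not explain.
residuals = y - (alpha_hat + gamma_hat * D)

print("intercept estimate:", alpha_hat)
print("slope estimate:", gamma_hat)
print("mean residual:", residuals.mean())
```

The fitted line does not pass through every point; the residuals are the scatter around it, and with an intercept in the model they average out to zero.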
  • 18. Figure 1. Rice Productivity Scatter (vertical axis: Yield; horizontal axis: Cultivated Area). Figure 2. Rice Productivity Scatter and Regression Line (vertical axis: Yield and Fitted values; horizontal axis: Cultivated Area).
  • 19. Note that we indeed find an inverse relationship between plot size and productivity, since the regression line slopes downward, which means that in this context, $\hat{\gamma} < 0$. Indeed, running a simple regression of rice yield on cultivated area yields $\hat{\alpha} = 4.187$ (and we can see from the graph that 4.187 would indeed be the value of rice productivity at a cultivated area of zero ares) and $\hat{\gamma} = -0.356$, with both coefficients statistically different from zero at less than the 1 percent level (i.e., if the true $\alpha$ and $\gamma$ were zero, estimates this far from zero would arise by chance less than one percent of the time). In other words, the finding here is that on average, $y = 4.187 - 0.356D$, (2) or for every 1 percent increase in cultivated area, rice productivity decreases by 0.356 percent. This may seem counterintuitive, but remember that productivity is not total output; it is output per unit of area on the plot (here, kg/are). So how does the apparatus of the linear regression determine the value of the intercept and the value of the slope of the regression line in figure 2? This is where our error term comes into play. Indeed, a linear
  • 20. regression will choose, among all possible lines, the one that minimizes the sum of the distances between each point in the scatter and the line itself, under the assumption that the error is on average equal to zero (i.e., that our predictions are right on average). Assuming that the error term is equal to zero on average and minimizing the sum of all point-line distances (technically, the sum of squared errors) allows us to obtain estimates $\hat{\alpha}$ and $\hat{\gamma}$ of the true parameters $\alpha$ and $\gamma$. A few remarks are in order. First off, note that the constant term (or the intercept) $\alpha$ does not have an economic interpretation in this case, since a cultivated area of zero really entails a yield of zero. Second, since the only factor we included on the right-hand side of equation 1 was cultivated area, the error term includes a lot of things which may be potentially crucial in determining yield: for example, the plot’s position on the toposequence, the quality of the soil, the source of irrigation of the plot, various characteristics of the household operating the plot, etc. So because we are typically interested in the impact of $D$ on $y$ controlling for a number of factors $x$, regression results will not be presented in the form of figure 2. Indeed, regression results are typically presented in the form of table 1 at the end of this document. How do we interpret table 1? First off, note that N = 466. That is, we have data on 466 plots. The first column tells us what variables are included on the right-hand side of equation 1, viz. cultivated area; land value; total land owned by the household; household size (number of individuals); household dependency ratio (proportion of dependents within the household); whether the household head is a single female or a single male; whether the plot is irrigated by a dam, a spring, or
  • 21. rainfed; soil quality measurements (carbon, nitrogen, potassium percentages; soil pH; clay, silt, and sand percentages); and an intercept. The second column shows the estimated coefficients for the first specification of equation 1 (in this case, a pooled cross-section of all the plots and all the households, i.e., a specification which ignores the fact that some households own more than just one plot in the sample); and the third column shows the standard errors around each estimated coefficient. These standard errors are used to determine whether each coefficient is statistically significantly different from zero or not. To make life simpler, table 1 shows whether coefficients are significant at the 10, 5, or 1 percent levels by using the symbols *, **, and *** respectively. Note that in all cases, there is a (significant) inverse relationship between cultivated area and rice productivity. Taking column 1 as an example of how to interpret regression results, what can we say? First and foremost, note that for a 1 percent increase in cultivated area, there is an associated productivity decrease of 0.27 percent (alternatively, a doubling of the size of the plot would be associated with a 27-percent decrease in productivity). Moreover, we can note three things. First off, the more valuable a plot, the more productive it is; second, plots irrigated by a dam are more productive than plots without any irrigation; and third, plots irrigated by a spring are more productive than plots without any
  • 22. irrigation. In fact, comparing the magnitude of the coefficient estimates for irrigation by a dam and irrigation by a spring, we see that the impacts of these two types of irrigation are essentially the same. Another thing of note in table 1 is how the coefficient on the variable of interest changes depending on what is included on the right-hand side of equation 1. Comparing the first two specifications (i.e., pooled cross-section vs. household fixed effects), note how the magnitude of the inverse relationship between productivity and cultivated area is reduced from -0.271 to -0.176 when household fixed effects are included (i.e., controls for household-specific unobservable characteristics, which is made possible here because there are 286 households for 466 plots; in other words, there are some households who own more than one plot in the sample). This indicates that a great deal of the inverse relationship can be attributed to household-specific, otherwise unobservable factors. Likewise, comparing specifications 1 and 3 (i.e., pooled cross-section vs. soil quality), note again how the magnitude of the inverse relationship between productivity and cultivated area is reduced from -0.271 to -0.265 when soil quality measurements are included. Overall, this indicates that household-specific, unobserved factors are more important in driving the inverse relationship than the omission of soil quality measurements. In any event, a comparison of specifications 1 and 2 and of specifications 1 and 3 points to an important endogeneity problem (in this case, an omitted variables problem) caused by the omission, respectively, of household fixed effects and of soil quality measurements.
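The way a coefficient moves when a previously omitted factor is added can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are synthetic, the coefficients (-0.2, -0.5, 0.8) are made up, `x` stands in for a hypothetical unobserved factor, and the helper `ols` is defined here for illustration. None of this reproduces table 1.

```python
import numpy as np

# Synthetic example of an omitted variables problem (all numbers are assumptions).
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)                        # factor that is often unobserved
D = 0.8 * x + rng.normal(size=n)              # variable of interest, correlated with x
y = 1.0 - 0.2 * D - 0.5 * x + rng.normal(size=n)

def ols(columns, outcome):
    """Least-squares coefficients for an intercept plus the given regressors."""
    X = np.column_stack([np.ones(len(outcome))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef

short = ols([D], y)       # x omitted: its effect is absorbed into the coefficient on D
long = ols([D, x], y)     # x included as a control

print("coefficient on D, x omitted: ", short[1])
print("coefficient on D, x included:", long[1])
```

In this setup the coefficient on `D` is larger in magnitude when `x` is left out, which is the same qualitative pattern as the drop from -0.271 to -0.176 discussed above.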
  • 23. References Barrett, Christopher B., Marc F. Bellemare, and Janet Y. Hou (2010), “Reconsidering Conventional Explanations of the Inverse Productivity—Size Relationship,” World Development 38(1): 88-97.
  • 24. Table 1 – Yield Approach Estimation Results (n=466). Dependent variable: rice yield (kilograms/are). Cells show the coefficient with its standard error in parentheses.
    Variable | (1) Pooled Cross-Section | (2) Household Fixed Effects | (3) Soil Quality | (4) Household Fixed Effects and Soil Quality
    Cultivated Area | -0.271*** (0.038) | -0.176*** (0.046) | -0.265*** (0.048) | -0.187*** (0.052)
    Total Land Area | -0.055 (0.038) | – | -0.054 (0.047) | –
    Land Value | 0.183*** (0.031) | 0.303*** (0.069) | 0.176*** (0.032) | 0.287*** (0.063)
    Household Characteristics
    Household Size | -0.007 (0.009) | – | -0.008 (0.008) | –
    Dependency Ratio | -0.073 (0.130) | – | -0.083 (0.144) | –
    Single Female | -0.056 (0.111) | – | -0.070 (0.119) | –
    Single Male | 0.133 (0.134) | – | 0.122 (0.155) | –
    Plot Characteristics
    Irrigated by Dam | 0.389** (0.171) | 0.228 (0.202) | 0.450** (0.211) | 0.545 (0.402)
    Irrigated by Spring | 0.365** (0.175) | 0.250 (0.214) | 0.425** (0.214) | 0.541 (0.389)
    Irrigated by Rain | 0.184 (0.180) | 0.024 (0.217) | 0.249 (0.220) | 0.313 (0.431)
    Soil Quality Measurements
    Carbon | – | – | -1.361 (1.510) | -0.001 (1.844)
    Nitrogen | – | – | 1.668 (1.781) | -0.007 (2.750)
    pH | – | – | -1.064 (7.163) | -17.969 (14.459)
    Potassium | – | – | 1.183 (1.412) | -5.528* (3.035)
    Clay | – | – | 0.293 (3.115) | -5.183 (4.174)
    Silt | – | – | 0.521 (5.751) | 4.485 (11.681)
    Sand | – | – | -0.135 (5.607) | 5.261 (7.106)
    Intercept | -1.847*** (0.372) | -3.162*** (0.694) | -2.276 (1.552) | -3.328*** (0.819)
    Number of Households | – | 286 | – | 286
    Bootstrap Replications | – | – | 500 | 500
    Village Fixed Effects | Yes | Dropped | Yes | Dropped
    R2 | 0.45 | 0.97 | 0.46 | 0.97
    p-value (All Coefficients) | 0.00 | 0.00 | 0.00 | 0.00
    p-value (Fixed Effects) | – | 0.00 | – | 0.00
    p-value (Soil Quality) | – | – | 0.79 | 0.52
    ***, ** and * indicate statistical significance at the one, five and ten percent levels, respectively.
  • 25. Ordinary Least-Squares. One-dimensional regression (figure: data points plotted against x and y axes). Model: $y = ax$
  • 26. Find a line that represents the “best” linear relationship. Problem: the data does not go through a line, so each point $i$ leaves a residual $e_i = y_i - x_i a$.
  • 27. Find the line that minimizes the sum $\sum_i (y_i - x_i a)^2$. We are looking for the $\hat{a}$ that minimizes $e(a) = \sum_i (y_i - x_i a)^2$.
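For the one-dimensional model without an intercept, setting the derivative of the sum of squared residuals with respect to $a$ to zero gives the standard closed form $\hat{a} = \sum_i x_i y_i / \sum_i x_i^2$. The sketch below checks this on a handful of made-up points; the data values and the names `e` and `a_hat` are assumptions chosen for illustration.

```python
import numpy as np

# Made-up data that lie roughly, but not exactly, on a line through the origin.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

def e(a):
    """Sum of squared residuals for the model y = a * x."""
    return np.sum((y - x * a) ** 2)

# Closed-form minimizer of e(a) for the no-intercept model.
a_hat = np.sum(x * y) / np.sum(x ** 2)

print("a_hat:", a_hat)
print("e(a_hat):", e(a_hat))
print("e at nearby slopes:", e(a_hat - 0.05), e(a_hat + 0.05))
```

Nudging the slope in either direction increases the sum of squares, which is what being the minimizer means here.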
  • 28. Ordinary Least-Squares. Multidimensional linear regression, using a model with $m$ parameters: $y = a_1 x_1 + \cdots + a_m x_m = \sum_j a_j x_j$
  • 29. Ordinary Least-Squares. Multidimensional linear regression, using a model with $m$ parameters (figures: the model drawn against axes $x_1$, $x_2$, $y$, and against axes $a_1$, $a_2$, $b$): $b = a_1 x_1 + \cdots + a_m x_m = \sum_j a_j x_j$
  • 30. Ordinary Least-Squares. Multidimensional linear regression, using a model with $m$ parameters and $n$ measurements: $y = a_1 x_1 + \cdots + a_m x_m = \sum_j a_j x_j$, with error $e(a) = \sum_{i=1}^{n} \left[ y_i - \sum_{j=1}^{m} x_{i,j} a_j \right]^2 = \lVert y - Xa \rVert^2$, where $X$ is the $n \times m$ matrix with entries $x_{i,j}$.
  • 33. Ordinary Least-Squares. Minimizing $e(a)$: $a_{\min}$ minimizes $e(a)$ if
  • 34. Ordinary Least-Squares (figure panels, each plotting y against x with errors $e_i$): no errors in $x_i$ versus errors in $x_i$.
  • 35. Ordinary Least-Squares (figure panels, continued): homogeneous errors versus non-homogeneous errors; no outliers.
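One way to read the "no outliers" panel is that a least-squares fit can be pulled noticeably by a single extreme point. The sketch below illustrates this with NumPy's `numpy.linalg.lstsq` on made-up data; the line $y = 2x + 1$, the size of the shift (30), and the variable names are assumptions for illustration only.

```python
import numpy as np

# Ten points lying exactly on the line y = 2x + 1 (made-up data).
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0

# Design matrix with a column of ones for the intercept and a column for x.
X = np.column_stack([np.ones_like(x), x])

# Fit on clean data: least squares recovers the intercept and slope.
clean, *_ = np.linalg.lstsq(X, y, rcond=None)

# Shift a single observation far off the line and refit.
y_out = y.copy()
y_out[-1] += 30.0
with_outlier, *_ = np.linalg.lstsq(X, y_out, rcond=None)

print("clean fit (intercept, slope):     ", clean)
print("one outlier (intercept, slope):   ", with_outlier)
```

Because squared errors weight large deviations heavily, the one shifted point changes the slope for all ten observations.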