Project -- Second Deliverable
Introduction
After reviewing the comments on our first deliverable, we
identified several problems and corrected them in this second
deliverable. First, we did not give a sufficiently thorough
introduction to the background of the database, which made it
difficult for readers to follow our analysis of the data.
Second, we gave too many unnecessary details, such as variable
names in the database and the coding of data values. These
were confusing because readers cannot see the Stata output and
therefore did not know what we were referring to. In this
deliverable, we pay more attention to clarifying what each
variable represents and how the dependent variable relates to
the independent variables. Moreover, one label in our table was
misleading: “family member number” was meant to represent
family size, the number of people in each household, but
readers could interpret it in different ways. We also
concluded, based on its small t-statistic and large p-value,
that one control variable, employment status, has little
explanatory power for the dependent variable. In this
deliverable we therefore replace it with region, which has a
much more significant effect on the cost of water.
Regression Table:
Discussion of results from new regression analysis
a. Different specifications considered
Besides the linear regression model, we estimated two
alternative specifications, a log regression and a quadratic
regression, and chose the preferred one by comparing their R
squares. Because all three regressions have exactly the same
number of control variables on the right-hand side, comparing
R squares ranks the models the same way as comparing adjusted
R squares (for a given sample size).
To estimate which factors affect the dependent variable, the
annual water cost of a household measured in dollars, both the
linear and log regressions include the following control
variables: total household income measured in thousands of
dollars, family size, whether the household is located on a
farm, the value of the house measured in thousands of dollars,
and the region of the household. The quadratic regression
instead uses the squares of total income and house value in
place of their first-order terms.
After running the regression models in Stata, we obtained an R
square and an adjusted R square for each of the three
regressions. To choose between the linear and log regressions,
however, the fitted values of the log model must first be
converted back from logs to dollars, so that we can compute the
squared correlation between actual annual water cost and
estimated annual water cost. An R-square comparison is
meaningful only if the dependent variable is the same in both
models. For the log model, the reported R square measures the
share of variation explained in ln(watercost), not in the cost
of water itself.
The squared correlation between annual water cost and
estimated annual water cost equals (0.2305)^2, which is 0.053.
The R square of the linear regression is 0.0646, so the
independent variables in the linear model explain a higher
proportion of the variation in the dependent variable and fit
the observations better. We then compared the R square of the
linear regression with that of the quadratic regression, which
is 0.0547. The linear regression is still better, so we chose
it as our preferred model.
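The comparison described above can be sketched in Python. This is only an illustration under stated assumptions: the data are made up (they are not the project's sample), the model has a single regressor, and the helper names `ols_fit` and `squared_corr` are hypothetical. It shows why the log model's fitted values are exponentiated before the squared correlation with actual water cost is computed.

```python
import math

def ols_fit(x, y):
    """Least-squares intercept and slope for a single regressor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def squared_corr(a, b):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab ** 2 / (saa * sbb)

# Hypothetical data, NOT the project's sample.
income = [20, 35, 50, 65, 80, 120]       # household income, $1000s
cost = [310, 280, 420, 390, 510, 470]    # annual water cost, dollars

# Linear model: cost regressed on income.
a_lin, b_lin = ols_fit(income, cost)
fitted_lin = [a_lin + b_lin * x for x in income]
r2_linear = squared_corr(cost, fitted_lin)

# Log model: ln(cost) regressed on income, with the fitted values
# converted back to dollars so both models are judged on the same variable.
a_log, b_log = ols_fit(income, [math.log(c) for c in cost])
fitted_log_dollars = [math.exp(a_log + b_log * x) for x in income]
r2_log_in_dollars = squared_corr(cost, fitted_log_dollars)

print(f"linear R^2: {r2_linear:.4f}")
print(f"log model, squared correlation in dollars: {r2_log_in_dollars:.4f}")

# The figure reported in the text: a correlation of 0.2305, squared.
print(round(0.2305 ** 2, 3))
```

For a linear regression with an intercept, the squared correlation between actual and fitted values equals the usual R square, which is why the two numbers are directly comparable.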
b. Test statistics used to decide between specifications
We tested for heteroskedasticity with both the Breusch-Pagan
and White tests. The test statistics are statistically
significant for all three regressions, so we reject the null
hypothesis of homoskedasticity: all three regressions have
heteroskedasticity. To address this problem, we computed robust
standard errors for each of them. Heteroskedasticity means that
the variance of the residual varies with the values of the
control variables. For example, in the Breusch-Pagan test for
the linear regression model, the coefficient on the control
variable household income is 212.7275 and it is statistically
significant. That means higher household income raises the
variance of the residual: each additional 1000 dollars of
income raises the variance of the residual by 212.7275.
Negative and statistically significant coefficients indicate
characteristics that lower the variance of the residual. Also,
the variance of the residual is lower for households located in
the New England division than for households in other regions,
because New England is the omitted region, so its coefficient
is zero, and the coefficients on the other regions are all
positive.
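The mechanics of the Breusch-Pagan test described above can be sketched in Python. This is a minimal illustration, not the Stata procedure used for the project: it assumes the n times R square (Lagrange multiplier) form of the test, one regressor, and made-up data whose noise grows with income. The helper `ols_fit` and all variable names are hypothetical.

```python
def ols_fit(x, y):
    """Least-squares intercept and slope for a single regressor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: the noise term grows in size with income.
income = list(range(1, 21))
noise = [(-1) ** i * 0.1 * i for i in income]
cost = [2 + 0.5 * x + e for x, e in zip(income, noise)]

# Step 1: run the original regression and square its residuals.
a, b = ols_fit(income, cost)
resid_sq = [(y - a - b * x) ** 2 for x, y in zip(income, cost)]

# Step 2: auxiliary regression of the squared residuals on the regressor.
a_aux, b_aux = ols_fit(income, resid_sq)
fitted_aux = [a_aux + b_aux * x for x in income]
mean_rs = sum(resid_sq) / len(resid_sq)
ssr_aux = sum((r - f) ** 2 for r, f in zip(resid_sq, fitted_aux))
sst_aux = sum((r - mean_rs) ** 2 for r in resid_sq)
r2_aux = 1 - ssr_aux / sst_aux

# Step 3: LM statistic = n * R^2, compared with a chi-square critical value
# (one degree of freedom here because there is one regressor).
lm_stat = len(income) * r2_aux
CHI2_CRIT_1DF_5PCT = 3.841

print(f"auxiliary slope on income: {b_aux:.4f}")
print(f"LM statistic: {lm_stat:.2f}")
print("reject homoskedasticity:", lm_stat > CHI2_CRIT_1DF_5PCT)
```

A positive slope in the auxiliary regression plays the same role as the positive coefficient on household income in the text: it marks a characteristic associated with a larger residual variance.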
Statistical significance does not imply economic significance.
According to the linear regression with robust standard errors,
the coefficient on household income is 0.216, with a
t-statistic of 22.51 and a p-value of 0.000. These statistics
indicate that household income has an effect on annual water
cost. However, the effect is not economically significant,
because a 1000 dollar increase in income raises annual water
cost by only 0.216 dollars. Similarly, the effect of house
value is statistically significant but not economically
significant, because a 1000 dollar increase in house value
raises annual water cost by only 0.18 dollars.
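The size of these effects can be checked with simple arithmetic. The Python sketch below uses the two coefficients reported in the text; the sizes of the changes (10 and 50 thousand dollars) are hypothetical values chosen only for illustration, and `implied_change` is a hypothetical helper name.

```python
# Coefficients reported in the text: dollars of annual water cost
# per additional $1000 of the control variable.
COEF_INCOME = 0.216
COEF_HOUSE_VALUE = 0.18

def implied_change(coefficient, change_in_thousands):
    """Predicted change in annual water cost for a given change in a control."""
    return coefficient * change_in_thousands

# Hypothetical changes, chosen only for illustration.
income_effect = implied_change(COEF_INCOME, 10)       # income up by $10,000
house_effect = implied_change(COEF_HOUSE_VALUE, 50)   # house value up by $50,000

print(f"$10,000 more income: {income_effect:.2f} dollars per year")
print(f"$50,000 more house value: {house_effect:.2f} dollars per year")
```

Even fairly large changes in income or house value move the predicted annual water bill by only a few dollars, which is the sense in which the effects are economically small.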
Discussion of whether the key control variable and at least two
others fit our prior expectations
We expected the key control variable, household income, to have
a positive effect on annual water cost. Holding everything else
constant, the more people earn, the more they are willing to
spend to improve their standard of living. Family size was
also expected to be positively related to annual water cost,
because more people need more water. Last, we expected the
coefficient on households in the West South Central division to
be positive, meaning that households in that region spend more
on water than households in the New England division, because
the long-lasting warm weather there leads people to drink and
wash more. The output of our preferred model matches our
expectations. Also, all effects are statistically significant
at the 0.05 level except the effect of the West North Central
division, which means we cannot reject the hypothesis that
annual water cost is the same for households in the New
England and West North Central divisions.
Architecture
1. The representation for floating point that we learned is
single precision. In that IEEE 754 floating point format,
represent the decimal value 63.25.
2. Represent the decimal value -1.125 in our IEEE 754
floating point format.
3. What result am I likely to get by adding 1E10 and 1E-32 in
our architecture, using IEEE 754 single precision? Why?
4. Suppose that I have a list of n random values, ranging from
1E-32 to 1E10 in size. Assume I have lots of values at each
available order of magnitude. If I wish to calculate the most
accurate sum that I can within the limitations of our
architecture, what is one simple thing I can do to make the sum
more accurate?
5. Suppose we have a 5 stage pipeline with our MIPS
architecture. The latencies of the pipeline stages are 250ps for
IF, 350ps for ID, 150ps for EX, 300ps for MEM, and 200ps for
WB. Further, a program we wish to run has 45% arithmetic
instructions, 20% branch instructions, and 35% load/store
instructions.
a. What is the clock cycle time for both pipelined and non-
pipelined processors?
b. What is the total latency of a load instruction in each of a
pipelined and non-pipelined processor?
c. If we could split one stage of the pipelined processor into
two stages, each with half the latency of the original stage,
which stage would you split and what is the new clock cycle
time of the processor?
d. Suppose we can double the number of registers. Doing so
would reduce the number of load/store instructions by 10% for
the program above, but increase the register latency by 50ps.
i. What is the speedup achieved by this proposed
improvement?
ii. What effect could this change have on the number of
instructions represented in the architecture?
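The arithmetic behind several of these questions can be explored with a short Python sketch. It is a hedged illustration, not an answer key: it assumes the non-pipelined processor is a single-cycle design, ignores pipeline register overhead, and emulates single precision by round-tripping values through the standard `struct` module. The helper names `float32_fields` and `to_float32` are hypothetical.

```python
import struct

def float32_fields(value):
    """Split value's IEEE 754 single-precision encoding into bit strings."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    bits = f"{raw:032b}"
    return bits[0], bits[1:9], bits[9:]   # sign, exponent, fraction

def to_float32(value):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", value))[0]

# 63.25 = 1.1111101b x 2^5, so the stored exponent is 5 + 127 = 132.
print(float32_fields(63.25))
# -1.125 = -1.001b x 2^0, so the stored exponent is 0 + 127 = 127.
print(float32_fields(-1.125))

# Adding a tiny value to a huge one: the small term falls far below the
# 24-bit significand of the large one and is lost.
big, small = to_float32(1e10), to_float32(1e-32)
print(to_float32(big + small) == big)

# Pipeline timing with the stage latencies from question 5.
STAGE_LATENCY_PS = {"IF": 250, "ID": 350, "EX": 150, "MEM": 300, "WB": 200}
pipelined_cycle_ps = max(STAGE_LATENCY_PS.values())    # slowest stage sets the clock
single_cycle_ps = sum(STAGE_LATENCY_PS.values())       # assumed single-cycle design
load_latency_pipelined_ps = len(STAGE_LATENCY_PS) * pipelined_cycle_ps

# Splitting the slowest stage into two equal halves.
slowest = max(STAGE_LATENCY_PS, key=STAGE_LATENCY_PS.get)
after_split = [lat for name, lat in STAGE_LATENCY_PS.items() if name != slowest]
after_split += [STAGE_LATENCY_PS[slowest] / 2] * 2
new_cycle_ps = max(after_split)

print(pipelined_cycle_ps, single_cycle_ps, load_latency_pipelined_ps)
print(slowest, new_cycle_ps)
```

The floating-point part also hints at question 4: because small terms vanish when added to a large running total, the order in which values are accumulated matters.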
Eco311 Project, Final Deliverable, Due Thursday 12/7 at 5 p.m.
Note. There will be a 20 point penalty for each day (or part
thereof) that the assignment is late.
This deliverable should include individual results for your
secondary topic and a discussion of your
conclusions. For your secondary topic, provide the following
steps in your analysis. Your grade on the
final project will be based on the content of your analysis, but
also whether you are able to generate a
document that is professional in its appearance and content.
Your intended audience is someone who
would have the knowledge that is expected of someone who has
mastered the content in Economics
311.
1. A title page that includes a descriptive title, the author’s
name, a date, and a subtitle indicating that
it is a deliverable to Prof. William Even for Eco311 in Fall
2017.
2. Provide an introductory section with the following:
a. A review of the main findings of your first two deliverables.
This shouldn’t be a discussion
of the details of your data and variables, but rather a simple
discussion of the key findings
from your regression. For example, “In our earlier deliverables,
we learned that there are
several important determinants of whether a person over age 55
is employed. First, …..”.
b. A discussion of what is new in this deliverable. For
example, “In this deliverable, I will
extend our earlier analysis to examine differences in
employment rates between men and
women and attempt to understand why women have lower
employment rates than men. I
will also investigate whether aging has a differential effect
on employment rates. I find
that ….”.
3. Background. This should include a discussion of the main
hypotheses you are testing, the data you
will use, and a brief summary of your major findings.
4. Provide a table of summary statistics for the dependent and
control variables for the two groups
created by your secondary variable (i.e. by sex, race, location,
marital status, or year). If your
secondary variable is year, be sure to convert all variables
measured in dollars into current dollars
using the CPI. Included in the summary statistics should be a
t-statistic that tests the null
hypothesis that the means are equal for the two groups and
asterisks indicating whether the
difference in means is significantly different from zero at the
.10 (*), .05 (**) or .01 (***) level. You may
either use the Stata command ttest, or a regression of the
relevant dependent variable on the group
dummy. For example,
ttest incss, by(female)
or
reg incss female
The means, test statistic, and p-value for the null hypothesis
should be included in a single
professional table. Be sure your variable names are self-
explanatory, that you have an appropriate
title, and that your footnotes clearly define the sample for your
analysis.
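The two approaches mentioned above give the same test statistic up to sign, which the following Python sketch illustrates. It assumes the equal-variance (pooled) two-sample t-test and conventional (non-robust) OLS standard errors; the data are made up, and the function names are hypothetical. The variable names `incss` and `female` follow the Stata example.

```python
import math

def pooled_t(group0, group1):
    """Equal-variance two-sample t-statistic for mean(group1) - mean(group0)."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = sum(group0) / n0, sum(group1) / n1
    ss0 = sum((v - m0) ** 2 for v in group0)
    ss1 = sum((v - m1) ** 2 for v in group1)
    pooled_var = (ss0 + ss1) / (n0 + n1 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))

def dummy_regression_t(y, dummy):
    """t-statistic on the slope from an OLS regression of y on a 0/1 dummy."""
    n = len(y)
    mean_d, mean_y = sum(dummy) / n, sum(y) / n
    sdd = sum((d - mean_d) ** 2 for d in dummy)
    slope = sum((d - mean_d) * (v - mean_y) for d, v in zip(dummy, y)) / sdd
    intercept = mean_y - slope * mean_d
    ssr = sum((v - intercept - slope * d) ** 2 for d, v in zip(dummy, y))
    std_err = math.sqrt(ssr / (n - 2) / sdd)
    return slope / std_err

# Hypothetical values of incss for each group (not real data).
incss_male = [12.0, 15.5, 9.0, 14.0, 11.5]           # female == 0
incss_female = [10.0, 8.5, 13.0, 7.5, 9.5, 11.0]     # female == 1

incss = incss_male + incss_female
female = [0] * len(incss_male) + [1] * len(incss_female)

t_from_ttest = pooled_t(incss_male, incss_female)
t_from_regression = dummy_regression_t(incss, female)
print(f"{t_from_ttest:.4f} {t_from_regression:.4f}")
```

The equivalence holds because the slope on the dummy is the difference in group means and the regression's residual variance is the pooled variance. It no longer holds exactly once robust standard errors are used.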
NOTE: You should try to get your table to fit on a single page.
This is much simpler if you start the
table at the top of a new page. Your table should not be split
across pages unless it is impossible to
fit it all on a single page.
Table 1. Summary Statistics by Marital Status for Analysis of
Determinants of Number of Children. (Sample size = 446,480)a
Variable              Mean for        Mean for         t-statistic for equality
                      Single People   Married People   (p-value in parentheses)
Number of childrenb   0.57            1.40             233.31
                                                       (0.000)
Age                   28.33           30.17            155.98
                                                       (0.000)
Etc… for all the other controls
a Sample is drawn from 2016 American Community Survey and
restricted to people aged 21-35.
Excludes people living in group quarters.
b Number of children represents number of own children living
in same household.
5. Provide a regression analysis that allows you to test whether
the between group difference in the
dependent variable is “explained” by differences in the control
variables. After performing your
regression analysis, discuss at least two sets of key variables
(e.g. a group of education dummies
would count as one set of key variables) in your regression and
indicate whether they help explain
why there is a gap in the dependent variable between the two
groups.
A sample table is provided below. The first column gives the
raw difference in the dependent
variable across groups (married in this case). Notice that this
exactly matches the difference in the
dependent variable provided in table 1. The second
specification is from a regression of number of
children on all of your control variables. The third
specification is included for the final question.
Keep in mind that this table has only age and its square as a
control variable. Your table should
have all or most of your control variables. If all of the control
variables aren’t listed, provide a list
of other controls that were included in a footnote to your table.
Table 2. Regression Analysis of Determinants of Number of Children.a
                          Specification 1   Specification 2   Specification 3
Married                   0.829***          0.701***          0.462**
                          (233.3)           (197.6)           (2.480)
Age                                         -0.0733***        0.00143
                                            (-11.51)          (0.148)
Age2                                        0.00253***        0.000871***
                                            (22.95)           (5.087)
Education (omitted group has less than a high school degree)
  High school Degree                        ---
                                            ---
  Some College                              ---
                                            ---
  College Degree                            ---
                                            ---
Married*Age                                 ---               -0.0219*
                                            ---               (-1.668)
Married*Age2                                ---               0.00101***
                                            ---               (4.430)
Constant                  0.574***          0.573***          -0.180
                          (204.2)           (6.345)           (-1.335)
Observations              446,880           446,880           446,880
R-squared                 0.109             0.161             0.164
F-test/p-valuec                                               ---
a Sample is drawn from 2014 American Community Survey and
restricted to people aged 21-35.
Excludes people living in group quarters.
b t-statistics are in parentheses and are calculated using robust
standard errors.
*** indicates p-value below .01; ** below 0.05, and * below
0.1.
c F-test and associated p-value are for null hypothesis that …..
You should discuss your results and refer to the relevant table
and/or specification. An example of
such writing is below:
Based on a White/Breusch-Pagan test using the residuals from
specification (2), it was determined that
the regression model had heteroscedasticity. As a
consequence, all of the t-statistics in table 2 are
based on robust standard errors.
Based on a comparison of the coefficients on the married
dummy variable in specifications 1 and 2 of
table 2, we can see that the control variables we added account
for married people having .128 more
children than single people. One explanation for this is that
married people are, on average, 1.84 years
older than single people (see table 1). Moreover, over the 21-35
year old age range in the sample, age
has a positive marginal effect on the number of children that is
increasing in age,
based on estimates of the quadratic in age in the regression.1
Consequently, an important reason that
married people have more children than single people is that
they are, on average, older.
You should discuss at least 2 “important” variables (or sets of
variables) and indicate whether they help
explain why the dependent variable differs across your two
groups.
6. Provide a test of whether the effect of at least
one key variable differs across your
two groups, where the null hypothesis is that the effect is the
same for both groups. Describe the results of your test and
explain the implications for how the variable has
differential effects on the dependent variable for the two groups
you are examining.
To provide a test that a variable has a differential impact across
groups, use interaction terms. For
example, if you want to test that age has a differential effect
across married and single people, create
interactions between married and age, age2. I have included
these in specification 3 of table 2. Be
sure to discuss the statistical and economic significance of the
interaction terms. If you have multiple
interaction terms, perform a test for the joint significance of
them all and include this in your table (as
illustrated in table 2). For example,
In specification (3) of table 2, interactions between married and
age and its square are added to the
regression. Given the quadratic in age, a comparison of
the marginal effect of age on children
for married and single people is not straightforward.
The marginal effect for single people is given by
\[ \frac{\partial\,\text{children}}{\partial\,\text{age}} = .00143 + 2(.000871)\,\text{age} = .00143 + .001742\,\text{age} \]
The marginal effect for married people is
\[ \frac{\partial\,\text{children}}{\partial\,\text{age}} = (.00143 - .0219) + 2(.000871 + .00101)\,\text{age} = -.02047 + .003762\,\text{age} \]
A comparison of these marginal effects reveals that the
marginal effect of age is greater for married
than single people at every age in the 21-35 sample range. The
difference between the two, -.0219 + .00202 * age, is positive
for any age above about 10.8.
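Differentiating a quadratic b1*age + b2*age^2 with respect to age gives b1 + 2*b2*age. The Python sketch below applies that formula to the coefficients in Table 2. It is only an illustration: the function name `marginal_effect` and the ages printed are hypothetical choices, and the last lines use the specification 2 coefficients to locate the age at which the marginal effect changes sign.

```python
# Specification 3 coefficients from Table 2.
B_AGE, B_AGE2 = 0.00143, 0.000871
B_MARRIED_AGE, B_MARRIED_AGE2 = -0.0219, 0.00101

def marginal_effect(age, married):
    """Derivative of predicted children with respect to age in specification 3."""
    linear = B_AGE + (B_MARRIED_AGE if married else 0.0)
    quadratic = B_AGE2 + (B_MARRIED_AGE2 if married else 0.0)
    return linear + 2 * quadratic * age

for age in (21, 28, 35):
    single = marginal_effect(age, married=False)
    married = marginal_effect(age, married=True)
    print(f"age {age}: single {single:.4f}, married {married:.4f}")

# Specification 2: age at which the marginal effect of age is zero,
# found by solving -0.0733 + 2 * 0.00253 * age = 0.
turning_point = 0.0733 / (2 * 0.00253)
print(f"turning point: {turning_point:.1f}")
```

The turning point of roughly 14.5 years lies well below the 21-35 sample range, so within the sample the marginal effect of age in specification 2 is positive throughout.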
An F-test of the null hypothesis that the coefficients on the
interaction terms are zero is provided at the
bottom of specification (3) in table 2. The results indicate that
the null ……
1 In fact, the quadratic in age implies that the marginal effect of
age is negative until age 14.5 and is positive for all
ages beyond 14.5.
Second deliverable for final project – due Thursday 11/30 5:00
p.m.
Along with other students who have chosen the same primary
topic, create a single table of regression
analysis and discuss the results in the text of your paper. Be
sure to correct any problems that were
mentioned in the review of your first draft. Your grade on this
deliverable will be based on the content
of your analysis, but also whether you are able to generate a
document that is professional in its
appearance and content. Your intended audience is someone
who would have the knowledge that is
expected of someone who has mastered the content in
Economics 311.
1. Provide at least 2 tests of alternative specifications (e.g. log
vs linear, linear vs quadratic, dummy
variables vs continuous, etc.) In the text, describe the
specifications you compared and the
preferred specification based on your analysis. Present the
results of your regression analysis
and the relevant test-statistics in a professional table. Be sure
to discuss the results of your
analysis in the text.
2. For each specification considered in part (1), provide a
Breusch-Pagan test and the simple
version of the White test (2nd form discussed in notes) for
heteroscedasticity. Include the test
statistic and corresponding p-values for these test statistics in
your regression table. In the text
of your deliverable, describe the basis for the conclusions you
draw from your
heteroscedasticity tests. If you find heteroscedasticity, are
there specific characteristics that
cause the variance of the residual to be higher or lower?
Explain how you came to this
conclusion.
3. If there is evidence of heteroscedasticity, provide standard
errors (or t-statistics) that are
properly corrected by using robust standard errors. However,
if you are estimating a linear
probability model, use weighted least squares (instead of robust
standard errors) if possible –
but be sure to investigate whether WLS would result in negative
weights. If WLS results in
negative weights, discuss how you determined this.
4. Discuss whether your expected effects for your key control
variable and at least two others that
you included in your first deliverable are confirmed by the
preferred specification you identified
above. Discuss whether these effects are statistically
significant at the .05 level.
5. Discuss the “economic significance” of the effects for your
two control variables. For example,
describe the effect of a one standard deviation change in
continuous control variables on the
dependent variable; or a switch from 0 to 1 for a dummy
variable.
6. In order that your table be deemed professional, review the
document posted on my website.
The regression table should be self-explanatory. The reader
should be able to determine what
kind of regression was estimated, how the sample was created,
and what all the variables
measure without referring to the text. Examples of appropriate
tables are provided in the
document “Creating effective tables” that is posted in the
Canvas project module.
Specific elements of the table that should make the table self-
explanatory are as follows:
a. Title (make it clear what the table is about – e.g.
Determinants of Household Electricity
Expenditures in 2016).
b. Column and row headings (See sample tables for examples
of relevant column
headers).
c. Notes attached to the table that explain
i. The source of data for the table, including relevant sample
restrictions (e.g.
households aged 25-55 who have a mortgage).
ii. Whether the table has t-statistics or standard errors in
parentheses and the
type of regression (e.g. OLS, linear probability model estimated
with OLS, etc).
If robust standard errors are used for calculation of standard
errors or t-
statistics, make that clear. (e.g. t-statistics are in parentheses.
Robust
standard errors are used in specifications 3 and 4.)
iii. Anything that needs to be clarified about variables or
methods can be stated in
the list of variables or footnotes to the table. (e.g. income is
measured in 1000s
of 2016 dollars)
iv. See Miller for examples of how to use notes in tables.
Notice that notes in
tables are referenced with a letter (not a number).
d. Variable names that are easily understood. Some examples:
i. years of education, not educ
ii. Number of children, not NCHILd
iii. Household Income in 2016 dollars, not income
iv. Be sure units of measurement are clear (e.g. 1000s of
dollars, birthweight in
pounds).
v. If you are using dummy variables for categories, group them
together and make
it clear which dummy was omitted.
e. Make it clear what the dependent variable is and also indicate
sample size and either R2
or adjusted R2 (or both).
f. Regression tables may also present only a subset of
coefficients and mention in a note
or row whether other variables were included in the regression.
See the sample tables
provided on the projects webpage. Miller (table 4.2) provides
guidance on the
appropriate number of digits. If coefficients are “too small”,
rescale the relevant control
variable to adjust (e.g. measure income in 1000s of dollars
instead of dollars).
g. Regression tables should start at the top of the page, and if
they must span across
pages, be sure to “break” them at a reasonable point (e.g. don’t
split a row with
coefficients on one table and t-statistics on the next).
The sections of your paper should include the following.
1. Title page (title, authors, date)
2. Introduction: Summarize what was learned in first
deliverable and what is added in this
deliverable.
3. Discussion of results from new regression analysis (Table 1).
a. Different specifications considered and the test statistics you
used to decide between
the specifications.
b. The implications of the heteroskedasticity tests for each
specification (i.e. do you find
evidence of heteroskedasticity) and how this affected your
decision to use robust
standard errors or weighted least squares in each specification.
4. Discussion of whether the key control variable and at least two others fit your prior expectations. Be sure to explain why you expected the effect of the control variable on the dependent variable to be either positive or negative.
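One common way to check for heteroskedasticity is a Breusch-Pagan style test: regress the squared OLS residuals on the regressors and compare n·R² from that auxiliary regression with a chi-squared critical value. The sketch below is only an assumption about which test might be used, since the handout does not name one. It uses a single regressor, made-up data, and hypothetical helper names. In Stata, `estat hettest` after `regress` runs a Breusch-Pagan / Cook-Weisberg test, and `vce(robust)` requests robust standard errors.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def r_squared(x, y):
    """R-squared from regressing y on x with a constant."""
    a, b = fit_line(x, y)
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def breusch_pagan_lm(x, y):
    """LM statistic n * R2 from regressing squared residuals on x."""
    a, b = fit_line(x, y)
    squared_resid = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, squared_resid)


# 5 percent critical value of chi-squared with 1 degree of freedom.
CHI2_1DF_5PCT = 3.841

# Made-up data: deterministic "errors" that alternate in sign.
x = list(range(1, 21))
signs = [1 if xi % 2 == 0 else -1 for xi in x]
y_constant_spread = [2 * xi + s for xi, s in zip(x, signs)]      # error size fixed
y_growing_spread = [2 * xi + s * xi for xi, s in zip(x, signs)]  # error size grows with x

for label, y in [("constant", y_constant_spread), ("growing", y_growing_spread)]:
    lm = breusch_pagan_lm(x, y)
    verdict = "evidence of heteroskedasticity" if lm > CHI2_1DF_5PCT else "no evidence"
    print(f"{label} spread: LM = {lm:.2f} ({verdict})")
```

When the statistic exceeds the critical value, the usual standard errors are unreliable, which is the situation where robust standard errors or weighted least squares are considered.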
First deliverable for final project – due Tuesday 11/21 by 5 p.m.
Your first deliverable should include a Word document that has the following parts. You will also separately submit the do-file that created all of the results and its log-file. All deliverables will be submitted via Canvas by 5 p.m. There is a 20 percent penalty for any paper submitted after the deadline but before 11/23 at 5 p.m. No submissions will be accepted after 11/23 at 5 p.m.
1) A description of the data set you will use for your primary analysis. This should include a description of the primary data source (the American Community Survey), the year(s) of the data, and the restrictions that I described above. Note: if you are investigating a “household level” variable (e.g. homeownership, home value, rent), you should use only the head of the household (pernum==1) for your analysis. If you are investigating an “individual level” variable (e.g. employment, marital status), use both the reference person and his/her spouse.
2) A description of any sample restrictions that you are making (e.g. omitted people in certain age ranges, dropped people with missing data, etc.). Make sure that you have “cleaned” your data so that you have the same number of observations on the dependent and control variables. Also, describe any variables that you create or modify. For example, be sure to study the codebook to understand how variables are coded when values are missing, and drop such observations. You may also want to adjust the units that certain variables are measured in (e.g. housing values might be converted to 1000s of dollars instead of dollars).
3) A description of the dependent variable and the key control variable in your analysis and a discussion of why you believe the key control variable will have either a positive or negative effect on the dependent variable. You should show the relationship between the dependent variable and the key control variable of interest with both a table and a graph. The table and graph should be professionally designed and be self-explanatory. That is, anyone who looks at the table or graph should be able to determine what statistics are presented, the meaning of the variables, and where the data came from.
4) A description of at least two other control variables that you believe will help explain variation in your dependent variable. Describe why you think the direction of the expected effect of each control variable on the dependent variable would be either positive or negative. Include a professional table showing, for the dependent variable and each control variable in your data, the number of observations, the sample mean, the standard deviation, and the minimum and maximum values. Be sure to explain any modifications you made to the variables in the data set (e.g. did you have to recode variables that were missing? Did you have to convert a categorical variable to a continuous variable? Did you make dummy variables?)
5) A simple linear regression of your dependent variable on the key variable(s) of interest and other control variables that you believe are important. The results should be presented in a professional table created with the Stata routine esttab. The results should include coefficients, t-statistics, R² and adjusted R², and the number of observations.
6) The structure of your Word document should include the following sections.
(a) Introduction.
(i) Describes the objective of the deliverable and a couple of the major findings. For example, this study will use data from the ACS to understand the factors that determine whether people are employed. We find several important determinants of employment. For example, …..
(b) The data
(i) Description of ACS data.
(ii) The dependent variable to be studied, the key control variable, and other controls you think are important.
(iii) Expected effect of each of the control variables and why.
(iv) Sample restrictions for your analysis.
(v) Any modifications you made to the variables that you use in your analysis.
(c) Results of data analysis.
(i) The Word document should provide a brief discussion of the results in the tables and figures presented below. All the tables and figures should be numbered and added to the end of your Word document.
(ii) Table 1. A professional table showing sample statistics (means, min, max, observations, etc.).
(iii) Figure 1. A professional figure showing the relationship between the dependent variable and the key control variable.
(iv) Table 2. A professional table showing the relationship between the dependent variable and the key control variable.
(v) Table 3. A professional table showing regression results.
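The sample restrictions in parts 1) and 2) and the summary statistics in part 4) can be sketched as follows. This is an illustration in Python with made-up records, not the required Stata do-file. The field names `hhincome` and `valueh`, the helper names `clean` and `describe`, and the use of `None` for missing values are all assumptions for the example; the ACS codebook defines the actual variable names and missing-value codes.

```python
import statistics

# Made-up person-level records (an assumption, not ACS data).
records = [
    {"pernum": 1, "hhincome": 52000, "valueh": 180000},
    {"pernum": 2, "hhincome": 52000, "valueh": 180000},  # not the household head
    {"pernum": 1, "hhincome": None, "valueh": 250000},   # missing income
    {"pernum": 1, "hhincome": 98000, "valueh": 410000},
    {"pernum": 1, "hhincome": 31000, "valueh": 95000},
]


def clean(rows, variables):
    """Keep household heads with no missing values; convert dollars to 1000s."""
    heads = [r for r in rows if r["pernum"] == 1]
    complete = [r for r in heads if all(r[v] is not None for v in variables)]
    return [{v: r[v] / 1000 for v in variables} for r in complete]


def describe(rows, variable):
    """Observations, mean, standard deviation, minimum, and maximum."""
    values = [r[variable] for r in rows]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


variables = ["hhincome", "valueh"]
sample = clean(records, variables)
for name in variables:
    print(name, describe(sample, name))
```

Because incomplete records are dropped before any statistic is computed, every variable reports the same number of observations, which is the condition part 2) asks for.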