Project -- Second Deliverable
Introduction
After reviewing the comments on our first deliverable, we
identified several problems and corrected them in this second
deliverable. First, we did not give a sufficient introduction to
the database’s background, which made it difficult for readers
to follow our analysis of the data. Second, we gave too many
unnecessary details, such as the names of variables in the
database and the meanings of data values. These were confusing
because readers cannot see the Stata output and therefore do not
know what we were referring to. In this deliverable, we pay more
attention to clarifying what each variable represents and how
the independent variables relate to the dependent variable.
Moreover, one label in our table was misleading: “family member
number” was supposed to represent family size, the number of
people in each household, but readers could interpret it in
different ways. We also concluded, based on small t statistics
and large p values, that one control variable, employment
status, has little explanatory power for the dependent variable.
In the second deliverable, we therefore replace it with region,
which has a much more significant effect on the cost of water.
Regression Table:
Discussion of results from new regression analysis
a. Different specifications considered
Besides the linear regression model, we estimated two
alternative specifications, a log regression and a quadratic
regression, and chose the preferred one by comparing their
R-squares. Because all three regressions have exactly the same
number of control variables on the right-hand side, comparing
R-squares gives the same ranking as comparing adjusted
R-squares.
To estimate what affects the dependent variable, the annual
water cost of households measured in dollars, both the linear
and log regressions include the following control variables:
total household income measured in thousands of dollars, family
size, whether the household is located on a farm, house value
measured in thousands of dollars, and the region of the
household. The quadratic regression, however, uses the squares
of total income and house value instead of their first-order
terms.
After running the regression models in Stata, we obtained an
R-square and an adjusted R-square for each of the three
regressions. To choose between the linear and log regressions,
however, it is first necessary to convert the predictions of the
log model back to unlogged annual water cost and compute the
squared correlation between actual and estimated annual water
cost. An R-square comparison is meaningful only if the dependent
variable is the same for both models. For the log model, the
R-square measures the share of variation explained in
ln(watercost), not in the cost of water itself.
The squared correlation between annual water cost and
estimated annual water cost equals (0.2305)^2, which is
0.053. The R-square of the linear regression is 0.0646. So the
independent variables in the linear regression model explain a
higher proportion of the variation in the dependent variable,
and the model fits the observations better. We then compared the
R-square of the linear regression with that of the quadratic
regression, which is 0.0547. The linear regression is still
better, so we chose it as the best-fitting model.
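The comparison above can be sketched in a few lines of Python. This is only an illustration, not the Stata code we ran: the function name `squared_correlation` and the five toy observations are assumptions made for the example, while the three fit figures are the ones reported in the text.

```python
import math

def squared_correlation(actual, predicted):
    """Squared Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_a * var_p)

# Hypothetical toy data: actual water cost and fitted ln(cost) from a log model.
actual_cost = [120.0, 300.0, 80.0, 450.0, 200.0]
fitted_log_cost = [5.0, 5.5, 4.6, 5.9, 5.2]

# Undo the log before comparing, so both models are judged on cost in dollars.
predicted_cost = [math.exp(v) for v in fitted_log_cost]
toy_fit = squared_correlation(actual_cost, predicted_cost)

# Figures reported in the text, all measured against unlogged water cost.
log_model_fit = 0.2305 ** 2
fit = {"linear": 0.0646, "log": log_model_fit, "quadratic": 0.0547}
preferred = max(fit, key=fit.get)
print(round(log_model_fit, 4), preferred)
```

With the reported figures, the linear specification has the largest value, which is the basis for our choice.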
b. Test statistics used to choose between specifications
After testing for heteroskedasticity with both the
Breusch-Pagan and White tests, we found that the test statistics
are statistically significant for all three regressions. That
means all three regressions exhibit heteroskedasticity. To
address that problem, we computed robust standard errors for
each of them. Heteroskedasticity means that the variance of the
residual changes with the values of the control variables. For
example, in the Breusch-Pagan test of the linear regression
model, the coefficient on the control variable household income
is 212.7275 and it is statistically significant. That means
higher household income is associated with a higher residual
variance; each additional 1000 dollars of income raises the
variance of the residual by 212.7275. Negative and statistically
significant coefficients indicate characteristics that lower the
variance of the residual. Also, the variance of the residual for
households located in the New England division is lower than for
households in other regions, because New England is the
reference category (its coefficient is zero) and the
coefficients on the other regions are all positive.
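The logic of the Breusch-Pagan test can be sketched as follows. This is a simplified illustration with a single regressor, not our Stata output: the helper names, the toy data, and the use of the n × R-square form of the statistic are assumptions made for the example.

```python
import math

def ols_simple(x, y):
    """Return (intercept, slope, r_squared) for a one-regressor OLS fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot

def breusch_pagan(x, y):
    """Regress squared residuals on x; the statistic is n times that R-square."""
    a, b, _ = ols_simple(x, y)
    resid_sq = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    _, aux_slope, aux_r2 = ols_simple(x, resid_sq)
    lm = len(x) * aux_r2
    p_value = math.erfc(math.sqrt(lm / 2))  # chi-square tail with 1 df
    return lm, p_value, aux_slope

# Hypothetical data: the error spread grows with income (in 1000s of dollars).
income = list(range(1, 41))
cost = [50 + 0.2 * xi + 0.5 * xi * (1 if i % 2 else -1)
        for i, xi in enumerate(income)]

lm, p_value, aux_slope = breusch_pagan(income, cost)
print(lm > 3.84, p_value < 0.05, aux_slope > 0)
```

A positive and significant slope in the auxiliary regression plays the same role as the coefficient on household income discussed above: it indicates that the residual variance rises with that variable.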
Statistical significance does not imply economic significance.
According to the robust linear regression, the coefficient on
household income in the annual water cost regression is 0.216,
with a t statistic of 22.51 and a p-value of 0.000. These
statistics indicate that household income has an effect on
annual water cost. However, the effect is not economically
significant, because a 1000 dollar increase in income raises
annual water cost by only 0.216 dollars. Similarly, the effect
of house value is statistically significant but not economically
significant, because a 1000 dollar increase in house value
raises annual water cost by only 0.18 dollars.
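The arithmetic behind this judgment is a coefficient multiplied by a change in the control variable. The sketch below uses the two coefficients reported above; the function name and the 50 thousand dollar standard deviation of income are assumed values chosen only to illustrate the calculation.

```python
def effect_on_water_cost(coefficient, change_in_thousands):
    """Predicted change in annual water cost (dollars) for a given change."""
    return coefficient * change_in_thousands

income_coef = 0.216       # dollars of water cost per 1000 dollars of income
house_value_coef = 0.18   # dollars of water cost per 1000 dollars of house value

effect_income_1k = effect_on_water_cost(income_coef, 1)
effect_house_1k = effect_on_water_cost(house_value_coef, 1)

# Assumed, not estimated: a one standard deviation change in income.
hypothetical_sd_income = 50.0
effect_income_sd = effect_on_water_cost(income_coef, hypothetical_sd_income)

print(effect_income_1k, effect_house_1k, round(effect_income_sd, 2))
```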
Discussion of whether the key control variable and at least two
others fit your prior expectations
We expected the key control variable, household income, to have
a positive effect on annual water cost. Holding everything else
constant, the more people earn, the more they are willing to
spend to improve their standard of living. Family size was also
expected to be positively related to annual water cost, because
more people need more water. Last, we expected the coefficient
on households in the West South Central division to be positive,
which means households in that region spend more on water than
households in the New England division. This is because the
weather there is warm for much of the year, so people need to
drink and wash more. Our preferred model’s output matches our
expectations. Also, all effects are statistically significant at
the 0.05 level except the effect of the West North Central
division, which means there is no statistically significant
difference in annual water cost between households in the New
England division and the West North Central division.
Architecture
1. The representation for floating point that we learned is
single precision. In that IEEE 754 floating point format,
represent the decimal value 63.25.
2. Represent the decimal value -1.125 in our IEEE 754
floating point format.
3. What result am I likely to get by adding 1E10 and 1E-32 in
our architecture, using IEEE 754 single precision? Why?
4. Suppose that I have a list of n random values, ranging from
1E-32 to 1E10 in size. Assume I have lots of values at each
available order of magnitude. If I wish to calculate the most
accurate sum that I can within the limitations of our
architecture, what is one simple thing I can do to make the sum
more accurate?
5. Suppose we have a 5 stage pipeline with our MIPS
architecture. The latencies of the pipeline stages are 250ps for
IF, 350ps for ID, 150ps for EX, 300ps for MEM, and 200ps for
WB. Further, a program we wish to run has 45% arithmetic
instructions, 20% branch instructions, and 35% load/store
instructions.
a. What is the clock cycle time for both pipelined and non-
pipelined processors?
b. What is the total latency of a load instruction in each of a
pipelined and non-pipelined processor?
c. If we could split one stage of the pipelined processor into
two stages, each with half the latency of the original stage,
which stage would you split and what is the new clock cycle
time of the processor?
d. Suppose we can double the number of registers. Doing so
would reduce the number of load/store instructions by 10% for
the program above, but increase the register latency by 50ps.
i. What is the speedup achieved by this proposed
improvement?
ii. What effect could this change have on the number of
instructions represented in the architecture?
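Several of the questions above come down to arithmetic that can be checked mechanically. The following Python sketch is one way to do that under stated assumptions: single precision is emulated by rounding through the standard `struct` module, the helper names are invented for the example, and the non-pipelined processor is assumed to be a single-cycle design whose clock must cover all five stages.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def f32_fields(x):
    """Return the sign, exponent, and fraction bit strings of x as a single."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{word:032b}"
    return bits[0], bits[1:9], bits[9:]

def f32_sum(values):
    """Accumulate in single precision, rounding after every addition."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total

print(f32_fields(63.25))   # ('0', '10000100', '11111010000000000000000')
print(f32_fields(-1.125))  # ('1', '01111111', '00100000000000000000000')

# The tiny addend is far below the spacing of singles near 1E10, so it is lost.
absorbed = f32(f32(1e10) + f32(1e-32)) == f32(1e10)

# Adding the small values first lets them accumulate before meeting the big one.
values = [1e8] + [1.0] * 1000
large_first = f32_sum(sorted(values, reverse=True))
small_first = f32_sum(sorted(values))
print(absorbed, large_first, small_first)

# Pipeline timing, using the stage latencies from question 5 (picoseconds).
stage_ps = {"IF": 250, "ID": 350, "EX": 150, "MEM": 300, "WB": 200}
pipelined_cycle = max(stage_ps.values())    # the slowest stage sets the clock
single_cycle = sum(stage_ps.values())       # assumes a single-cycle design
load_latency_pipelined = len(stage_ps) * pipelined_cycle

# Splitting the slowest stage in half leaves the next slowest stage in charge.
slowest = max(stage_ps, key=stage_ps.get)
after_split = [v for k, v in stage_ps.items() if k != slowest]
after_split += [stage_ps[slowest] / 2] * 2
new_cycle = max(after_split)
print(pipelined_cycle, single_cycle, load_latency_pipelined, slowest, new_cycle)
```

The summation example uses 1E8 and 1.0 rather than the full range in question 4 so that the effect is visible with a short list; the same ordering argument applies to wider ranges.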
Eco311 Project, Final Deliverable, Due Thursday 12/7 at 5 p.m.
Note. There will be a 20 point penalty for each day (or part
thereof) that the assignment is late.
This deliverable should include individual results for your
secondary topic and a discussion of your
conclusions. For your secondary topic, provide the following
steps in your analysis. Your grade on the
final project will be based on the content of your analysis, but
also whether you are able to generate a
document that is professional in its appearance and content.
Your intended audience is someone who
would have the knowledge that is expected of someone who has
mastered the content in Economics
311.
1. A title page that includes a descriptive title, the author’s
name, a date, and a subtitle indicating that
it is a deliverable to Prof. William Even for Eco311 in Fall
2017.
2. Provide an introductory section with the following:
a. A review of the main findings of your first two deliverables.
This shouldn’t be a discussion
of the details of your data and variables, but rather a simple
discussion of the key findings
from your regression. For example, “In our earlier deliverables,
we learned that there are
several important determinants of whether a person over age 55
is employed. First, …..).
b. A discussion of what is new in this deliverable. For
example, “In this deliverable, I will
extend our earlier analysis to examine differences in
employment rates between men and
women and attempt to understand why women have lower
employment rates than men. I
will also investigate whether aging has a differential effect
on employment rates. I find
that ….”.
3. Background. This should include a discussion of the main
hypotheses you are testing, the data you
will use, and a brief summary of your major findings.
4. Provide a table of summary statistics for the dependent and
control variables for the two groups
created by your secondary variable (i.e. by sex, race, location,
marital status, or year). If your
secondary variable is year, be sure to convert all variables
measured in dollars into current dollars
using the CPI. Included in the summary statistics should be a
t-statistic that tests the null
hypothesis that the means are equal for the two groups and
asterisks indicating whether the
difference in means is significantly different from zero at the
.10 (*), .05 (**) or .01 (***) level. You may
either use the stata command ttest, or a regression of the
relevant dependent variable on the group
dummy. For example,
ttest incss, by(female)
or
reg incss female
The means, test statistic, and p-value for the null hypothesis
should be included in a single
professional table. Be sure your variable names are self-
explanatory, that you have an appropriate
title, and that your footnotes clearly define the sample for your
analysis.
NOTE: You should try to get your table to fit on a single page.
This is much simpler if you start the
table at the top of a new page. Your table should not be split
across pages unless it is impossible to
fit it all on a single page.
Table 1. Summary Statistics by Marital Status for Analysis of
Determinants of Number of Children. (Sample size = 446,480)a

Variable               Mean for        Mean for         t-statistics for equality
                       Single People   Married People   (p-value in parentheses)
Number of childrenb    0.57            1.40             233.31
                                                        (0.000)
Age                    28.33           30.17            155.98
                                                        (0.000)
Etc… for all the other controls

a Sample is drawn from 2016 American Community Survey and
restricted to people aged 21-35. Excludes people living in group
quarters.
b Number of children represents number of own children living
in same household.
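The test of equal means requested in item 4 can be sketched in Python as below. This is an illustration only: the two small samples are invented, the function names are assumptions, the statistic is the pooled equal-variance form, and the p-value uses a large-sample normal approximation rather than the exact t distribution.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(group0, group1):
    """Two-sample t-statistic for equal means, assuming equal variances."""
    n0, n1 = len(group0), len(group1)
    pooled_var = ((n0 - 1) * variance(group0)
                  + (n1 - 1) * variance(group1)) / (n0 + n1 - 2)
    se = math.sqrt(pooled_var * (1 / n0 + 1 / n1))
    return (mean(group1) - mean(group0)) / se

def approx_two_sided_p(t_stat):
    """Large-sample (normal) approximation to the two-sided p-value."""
    return math.erfc(abs(t_stat) / math.sqrt(2))

def stars(p_value):
    """Asterisks for the .10, .05 and .01 significance levels."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""

# Hypothetical number of children for eight single and eight married people.
single = [0, 0, 1, 0, 1, 0, 0, 1]
married = [1, 2, 1, 2, 0, 3, 1, 2]

t_stat = pooled_t_statistic(single, married)
p_value = approx_two_sided_p(t_stat)
print(round(t_stat, 2), stars(p_value))
```

A regression of the variable on the group dummy gives the same t-statistic on the dummy when ordinary (non-robust) standard errors are used, which is why either approach is acceptable for the table.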
5. Provide a regression analysis that allows you to test whether
the between group difference in the
dependent variable is “explained” by differences in the control
variables. After performing your
regression analysis, discuss at least two sets of key variables
(e.g. a group of education dummies
would count as one set of key variables) in your regression and
indicate whether they help explain
why there is a gap in the dependent variable between the two
groups.
A sample table is provided below. The first column gives the
raw difference in the dependent
variable across groups (married in this case). Notice that this
exactly matches the difference in the
dependent variable provided in table 1. The second
specification is from a regression of number of
children on all of your control variables. The third
specification is included for the final question.
Keep in mind that this table has only age and its square as a
control variable. Your table should
have all or most of your control variables. If all of the control
variables aren’t listed, provide a list
of other controls that were included in a footnote to your table.
Table 2. Regression Analysis of Determinants of Number of
Children.a

                                Specification 1   Specification 2   Specification 3
Married                         0.829***          0.701***          0.462**
                                (233.3)           (197.6)           (2.480)
Age                                               -0.0733***        0.00143
                                                  (-11.51)          (0.148)
Age2                                              0.00253***        0.000871***
                                                  (22.95)           (5.087)
Education (omitted group has
less than a high school degree)
  High school Degree                              ---
                                                  ---
  Some College                                    ---
                                                  ---
  College Degree                                  ---
                                                  ---
Married*Age                                       ---               -0.0219*
                                                  ---               (-1.668)
Married*Age2                                      ---               0.00101***
                                                  ---               (4.430)
Constant                        0.574***          0.573***          -0.180
                                (204.2)           (6.345)           (-1.335)
Observations                    446,880           446,880           446,880
R-squared                       0.109             0.161             0.164
F-test/p-valuec                                                     ---

a Sample is drawn from 2014 American Community Survey and
restricted to people aged 21-35. Excludes people living in group
quarters.
b t-statistics are in parentheses and are calculated using robust
standard errors.
*** indicates p-value below .01; ** below 0.05, and * below
0.1.
c F-test and associated p-value are for null hypothesis that …..
You should discuss your results and refer to the relevant table
and/or specification. An example of
such writing is below:
Based on a White/Breusch-Pagan test using the residuals from
specification (2), it was determined that
the regression model had heteroscedasticity. As a
consequence, all of the t-statistics in table 2 are
based on robust standard errors.
Based on a comparison of the coefficients on the married
dummy variable in specifications 1 and 2 of table 2, we can see
that the control variables we added account for married people
having .128 more children than single people. One explanation
for this is that married people are, on average, 1.84 years
older than single people (see table 1). Moreover, over the 21-35
year old age range in the sample, age has a positive marginal
effect on the number of children that is increasing in age,
based on estimates of the quadratic in age in the regression.1
Consequently, an important reason that married people have more
children than single people is that they are, on average, older.
You should discuss at least 2 “important” variables (or sets of
variables) and indicate whether they help
explain why the dependent variable differs across your two
groups.
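The figures in the sample discussion above can be reproduced from tables 1 and 2. The short Python sketch below does that arithmetic; the variable and function names are assumptions made for the example, and the numbers are the ones printed in the tables.

```python
# Coefficient on the married dummy in specifications 1 and 2 of table 2.
raw_gap = 0.829
adjusted_gap = 0.701
explained_by_controls = raw_gap - adjusted_gap
share_explained = explained_by_controls / raw_gap

# Mean ages from table 1.
age_gap = 30.17 - 28.33

# Quadratic in age from specification 2.
b_age, b_age2 = -0.0733, 0.00253

def marginal_effect_of_age(age):
    """Derivative of b_age * age + b_age2 * age**2 with respect to age."""
    return b_age + 2 * b_age2 * age

turning_point = -b_age / (2 * b_age2)

print(round(explained_by_controls, 3), round(share_explained, 3))
print(round(age_gap, 2), round(turning_point, 1))
print(marginal_effect_of_age(21) > 0,
      marginal_effect_of_age(35) > marginal_effect_of_age(21))
```

The turning point is the age referred to in footnote 1: below it the estimated marginal effect of age is negative, and above it the effect is positive and rising.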
6. Provide a test of the null hypothesis that the effect of at least
one key variable differs across your
two groups. Describe the results of your test and explain the
implications for how the variable has
differential effects on the dependent variable for the two groups
you are examining.
To provide a test that a variable has a differential impact across
groups, use interaction terms. For
example, if you want to test that age has a differential effect
across married and single people, create
interactions between married and age, age2. I have included
these in specification 3 of table 2. Be
sure to discuss the statistical and economic significance of the
interaction terms. If you have multiple
interaction terms, perform a test for the joint significance of
them all and include this in your table (as
illustrated in table 2). For example,
In specification (3) of table 2, interactions between married and
age and its square are added to the
regression. Given the quadratic in age, comparing the marginal
effect of age on children for married and single people is not
simple.
The marginal effect for single people is given by
∂nchild/∂age = .00143 + .000871 ∗ age
The marginal effect for married people is
∂nchild/∂age = −.02047 + .001881 ∗ age
A comparison of these marginal effects reveals that the
marginal effect of age is greater for married
than single people for ages 22-35. The marginal effect of age
is slightly larger for singles than married
at age 21.
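The comparison can be checked numerically. The sketch below takes the two expressions exactly as they are written above and builds the married expression from the specification 3 coefficients in table 2; the function names are assumptions made for the example.

```python
# Single people: intercept and slope as printed in the expression above.
single_intercept, single_slope = 0.00143, 0.000871

# Interaction coefficients from specification 3 of table 2.
married_age, married_age2 = -0.0219, 0.00101

# Married people: the single terms plus the interaction terms.
married_intercept = single_intercept + married_age
married_slope = single_slope + married_age2

def effect_single(age):
    return single_intercept + single_slope * age

def effect_married(age):
    return married_intercept + married_slope * age

# Age at which the two expressions are equal.
crossover_age = -married_age / married_age2

ages_larger_for_married = [age for age in range(21, 36)
                           if effect_married(age) > effect_single(age)]

print(round(married_intercept, 5), round(married_slope, 6))
print(round(crossover_age, 1), ages_larger_for_married[0])
```

The two expressions cross between ages 21 and 22, which is why the effect is larger for singles only at age 21.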
An F-test of the null hypothesis that the coefficients on the
interaction terms are zero is provided at the
bottom of specification (3) in table 2. The results indicate that
the null ……
1 In fact, the quadratic in age implies that the marginal effect of
age is negative until age 14.5 and is positive for all
ages beyond 14.5.
Second deliverable for final project – due Thursday 11/30 5:00
p.m.
Along with other students who have chosen the same primary
topic, create a single table of regression
analysis and discuss the results in the text of your paper. Be
sure to correct any problems that were
mentioned in the review of your first draft. Your grade on this
deliverable will be based on the content
of your analysis, but also whether you are able to generate a
document that is professional in its
appearance and content. Your intended audience is someone
who would have the knowledge that is
expected of someone who has mastered the content in
Economics 311.
1. Provide at least 2 tests of alternative specifications (e.g. log
vs linear, linear vs quadratic, dummy
variables vs continuous, etc.) In the text, describe the
specifications you compared and the
preferred specification based on your analysis. Present the
results of your regression analysis
and the relevant test-statistics in a professional table. Be sure
to discuss the results of your
analysis in the text.
2. For each specification considered in part (1), provide a
Breusch-Pagan test and the simple
version of the White test (2nd form discussed in notes) for
heteroscedasticity. Include the test
statistic and corresponding p-values for these test statistics in
your regression table. In the text
of your deliverable, describe the basis for the conclusions you
draw from your
heteroscedasticity tests. If you find heteroscedasticity, are
there specific characteristics that
cause the variance of the residual to be higher or lower?
Explain how you came to this
conclusion.
3. If there is evidence of heteroscedasticity, provide standard
errors (or t-statistics) that are
properly corrected by using robust standard errors. However,
if you are estimating a linear
probability model, use weighted least squares (instead of robust
standard errors) if possible –
but be sure to investigate whether WLS would result in negative
weights. If WLS results in
negative weights, discuss how you determined this.
4. Discuss whether your expected effects for your key control
variable and at least two others that
you included in your first deliverable are confirmed by the
preferred specification you identified
above. Discuss whether these effects are statistically
significant at the .05 level.
5. Discuss the “economic significance” of the effects for your
two control variables. For example,
describe the effect of a one standard deviation change in
continuous control variables on the
dependent variable; or a switch from 0 to 1 for a dummy
variable.
6. In order that your table be deemed professional, review the
document posted on my website.
The regression table should be self-explanatory. The reader
should be able to determine what
kind of regression was estimated, how the sample was created,
and what all the variables
measure without referring to the text. Examples of appropriate
tables are provided in the
document “Creating effective tables” that is posted in the
Canvas project module.
Specific elements of the table that should make the table self-
explanatory are as follows:
a. Title (make it clear what the table is about – e.g.
Determinants of Household Electricity
Expenditures in 2016).
b. Column and row headings (See sample tables for examples
of relevant column
headers).
c. Notes attached to the table that explain
i. The source of data for the table, including relevant sample
restrictions (e.g.
households aged 25-55 who have a mortgage).
ii. Whether the table has t-statistics or standard errors in
parentheses and the
type of regression (e.g. OLS, linear probability model estimated
with OLS, etc).
If robust standard errors are used for calculation of standard
errors or t-
statistics, make that clear. (e.g. t-statistics are in parentheses.
Robust
standard errors are used in specifications 3 and 4.)
iii. Anything that needs to be clarified about variables or
methods can be stated in
the list of variables or footnotes to the table. (e.g. income is
measured in 1000s
of 2016 dollars)
iv. See Miller for examples of how to use notes in tables.
Notice that notes in
tables are referenced with a letter (not a number).
d. Variable names that are easily understood. Some examples:
i. years of education, not educ
ii. Number of children, not NCHILD
iii. Household Income in 2016 dollars, not income
iv. Be sure units of measurement are clear (e.g. 1000s of
dollars, birthweight in
pounds).
v. If you are using dummy variables for categories, group them
together and make
it clear which dummy was omitted.
e. Make it clear what the dependent variable is and also indicate
sample size and either R2
or adjusted R2 (or both).
f. Regression tables may also present only a subset of
coefficients and mention in a note
or row whether other variables were included in the regression.
See the sample tables
provided on the projects webpage. Miller (table 4.2) provides
guidance on the
appropriate number of digits. If coefficients are “too small”,
rescale the relevant control
variable to adjust (e.g. measure income in 1000s of dollars
instead of dollars).
g. Regression tables should start at the top of the page, and if
they must span across
pages, be sure to “break” them at a reasonable point (e.g. don’t
split a row with
coefficients on one page and t-statistics on the next).
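For the linear probability model case in item 3 above, one way to investigate whether weighted least squares would produce negative weights is to inspect the fitted probabilities. The Python sketch below assumes the usual weight of 1 / (p(1 − p)) built from the OLS fitted values; the function name and the five fitted values are invented for illustration.

```python
def lpm_wls_weights(fitted_probabilities):
    """Return 1 / (p * (1 - p)) per observation, or None where it is unusable."""
    weights = []
    for p in fitted_probabilities:
        h = p * (1 - p)  # estimated residual variance in the LPM
        weights.append(1 / h if h > 0 else None)
    return weights

# Hypothetical fitted values from an OLS linear probability model.
fitted = [0.12, 0.47, 0.93, 1.04, -0.02]

weights = lpm_wls_weights(fitted)
outside_unit_interval = [p for p in fitted if not 0 < p < 1]

print(outside_unit_interval)
print([w is not None for w in weights])
```

Any fitted value at or outside 0 and 1 makes p(1 − p) zero or negative, so reporting how many such observations exist is a simple way to document the problem.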
The sections of your paper should include the following.
1. Title page (title, authors, date)
2. Introduction: Summarize what was learned in first
deliverable and what is added in this
deliverable.
3. Discussion of results from new regression analysis (Table 1).
a. Different specifications considered and the test statistics you
used to decide between
the specifications.
b. The implications of the heteroskedasticity tests for each
specification (i.e. do you find
evidence of heteroskedasticity) and how this affected your
decision to use robust
standard errors or weighted least squares in each specification.
4. Discussion of whether the key control variable and at least
two others fit your prior
expectations. Be sure to explain why you expected the effect of
the control variable on the
dependent variable to be either positive or negative.
First deliverable for final project – due Tuesday 11/21 by 5 p.m.
Your first deliverable should include a word document that has
the following parts. You will also
separately submit a do-file and log-file that created all of the
results. All deliverables will be submitted
via Canvas by 5 p.m. There is a 20 percent penalty for any
paper submitted after the deadline but
before 11/23 at 5 p.m. No submissions will be accepted after
11/23 at 5 p.m.
1) A description of the data set you will use for your primary
analysis. This should include a
description of the primary data source (the American
Community Survey), the year(s) of
the data, and the restrictions that I described above. Note: if
you are investigating a
“household level variable” (e.g. homeownership, home value,
rent,) you should use only
the head of the household (pernum==1) for your analysis. If
you are investigating an
“individual level” variable (e.g. employment, marital status),
use both the reference
person and his/her spouse.
2) A description of any sample restrictions that you are making
(e.g. omitted people in
certain age ranges, dropped people with missing data, etc.).
Make sure that you have
“cleaned” your data so that you have the same number of
observations on the
dependent and control variables. Also, describe any variables
that you create or modify.
For example, be sure to study the codebook to understand how
variables might be
coded if there are missing values (drop such observations). You
may also want to adjust
the units that certain variables are measured in (e.g. housing
values might be converted
to 1000s of dollars instead of dollars).
3) A description of the dependent variable and the key control
variable in your analysis and
a discussion of why you believe the key control variable will
have either a positive or
negative effect on the dependent variable. You should show the
relationship between
the dependent variable and the key control variable of interest
with both a table and a
graph. The table and graph should be professionally designed
and be self-explanatory.
That is, anyone who looks at the table or graph should be able
to determine what
statistics are presented, the meaning of the variables, and where
the data came from.
4) A description of at least two other control variables that you
believe will help explain
variation in your dependent variable. Describe why you think
the direction of the
expected effect of each control variable on the dependent
variable would be either
positive or negative. Provide a professional table showing, for
the dependent variable and each control variable in your data,
the number of observations, the sample mean, standard deviation,
and minimum and maximum values. Be sure to explain any
modifications you
made to the variables in
the data set (e.g. did you have to recode variables that were
missing? Did you have to
convert a categorical variable to a continuous variable? Did you
make dummy
variables?)
5) A simple linear regression of your dependent variable on the
key variable(s) of interest
and other control variables that you believe are important. The
results should be
presented in a professional table created with the Stata routine
esttab. The results
should include coefficients, t-statistics, R² and adjusted R²,
and the number of
observations.
6) The structure of your Word document should include the
following sections.
(a) Introduction.
(i) Describes the objective of the deliverable and a couple of the
major findings.
For example, this study will use data from the ACS to
understand the factors
that determine whether people are employed. We find several
important
determinants of employment. For example, …
(b) The data
(i) Description of ACS data.
(ii) The dependent variable to be studied, the key control
variable, and other
controls you think are important.
(iii) Expected effect of each of the control variables and why.
(iv) Sample restrictions for your analysis.
(v) Any modifications you made to the variables that you use in
your analysis.
(c) Results of data analysis.
(i) The Word document should provide a brief discussion of the
results in the tables
and figures presented below. All the tables and figures should
be numbered
and added to the end of your Word document.
(ii) Table 1. A professional table showing sample statistics
(means, min, max,
observations, etc.)
(iii) Figure 1. A professional figure showing relationship
between dependent
variable and key control variable.
(iv) Table 2. A professional table showing relationship
between dependent variable
and key control variable.
(v) Table 3. A professional table showing regression results.
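
The cleaning, summary-statistics, and regression steps listed above
can be sketched as follows. The assignment itself calls for Stata and
esttab, so this Python version is only an illustration of the
arithmetic behind those steps, not a substitute for them. The variable
names (water_cost, house_value), the six sample records, and the
choice of a single regressor are all assumptions made for the example.

```python
import math
import statistics

# Hypothetical records, made up for illustration; None marks a missing value.
raw = [
    {"water_cost": 300.0, "house_value": 100000.0},
    {"water_cost": 500.0, "house_value": 200000.0},
    {"water_cost": None, "house_value": 150000.0},
    {"water_cost": 600.0, "house_value": 300000.0},
    {"water_cost": 800.0, "house_value": None},
    {"water_cost": 1000.0, "house_value": 400000.0},
]
VARIABLES = ("water_cost", "house_value")


def clean(records, variables):
    """Drop records with any missing value so every variable has the same N."""
    return [r for r in records if all(r[v] is not None for v in variables)]


def summarize(values):
    """Statistics for a sample-statistics table: N, mean, sd, min, max."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def simple_ols(y, x):
    """OLS of y on a constant and one regressor x."""
    n, k = len(y), 1  # k = number of regressors, excluding the constant
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ssr / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    se_slope = math.sqrt(ssr / (n - k - 1) / sxx)  # conventional (non-robust) SE
    return {
        "slope": slope,
        "intercept": intercept,
        "t_slope": slope / se_slope,
        "r2": r2,
        "adj_r2": adj_r2,
        "n": n,
    }


cleaned = clean(raw, VARIABLES)
# Convert house value from dollars to thousands of dollars.
for record in cleaned:
    record["house_value"] = record["house_value"] / 1000.0

table1 = {v: summarize([r[v] for r in cleaned]) for v in VARIABLES}
fit = simple_ols(
    [r["water_cost"] for r in cleaned],
    [r["house_value"] for r in cleaned],
)

for name, stats in table1.items():
    print(name, stats)
print(fit)
```

Dropping incomplete records before computing anything is what keeps
the observation count identical across the summary table and the
regression. The adjusted R² uses n - k - 1 degrees of freedom, so it
falls below R² whenever the fit is imperfect.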
  • 1. Project -- Second Deliverable Introduction After reviewing the comments of first deliverable, we learned several things and fixed these problems in the second deliverable. First of all, we did not give sufficient and thorough introductions of the database’s background, which made readers have difficulties understanding our analysis based on the data. Second, we gave too many unnecessary details, such as data names in database, meanings of data values, which were confusing because readers cannot see outputs from Stata, therefore they do not know what we were referring to. So, in this deliverable, we will pay more attention to clarify each variable’s representation and relations between dependent variable and independent variables. Moreover, one label in our table was misleading. “family member number” was supposed to represent family size, the amount of people in each household, but readers may interpret that in different ways. We also concluded that, based on small t statistics and large p values, a control variable, employment status does not have much explanatory power with relation to the dependent variable. So in the second deliverable, we will replace it by region, which affects the cost of water much more significantly. Regression Table: Discussion of results from new regression analysis a. different specification considered
  • 2. Besides the linear regression model, we generated two more alternative specifications, log regression and quadratic regression, and determined the preferred one based on a comparison of their R squares. Because all three regressions have exactly same amount of control variables on the right side of regressions, comparing R square is as unbiased as comparing the adjusted R square. To estimate what elements affect the dependent variable, annual water cost of households measured in dollars, both Linear and log regressions have control variables: total income each household earns measured in thousands of dollars, family size, whether household is located at farm or not, value of house measured in thousands of dollars, and region of household. The quadratic regression, however, has the square of total income and house value instead of their original first order terms. After running regression models in Stata, we got a R square and an adjusted R square for all three regressions. To determine the preferred one between linear and log regressions, however, it’s necessary to transform logged dependent variable to unlogged dependent variable first (generating the squared correlation between annual water cost and estimated annual water cost). An R-square comparison is meaningful only if the dependent variable is the same for both models. For log model, the R- square measures the amount of variation in ln(watercost), but not true variation in cost of water. The squared correlation between annual water cost and estimated annual water cost equals to (0.2305)^2, which is 0.053. The R square of linear regression is 0.0646. So, independent variables in linear regression model explain higher proportion of the variation in the dependent variable and fits observations better. Then we compared the R square of linear regression with R square of quadratic regression, which is 0.0547. The linear regression is still better. We finally chose linear regression as the best fitted model. 
b. test statistics to determine between specifications
  • 3. After testing the heteroskedasticity by both Breusch-Pagan and White tests, we noticed that all three regessions are statistically significant. That means all three regressions have heteroskedaticity. To solve that probelm, we generated robust standard deviation for each of them. The exitsence of heteroskedasticity causes the varaince of residual varies with changes of control variables’ values. For example, in the Breusch-Pagan test of linear regression model, coefficient on control varaiable household income is 212.7275 and it is statistically significant. That means, household income causes the variance of residual to be higher; every 1000 more income rises the variance of residual by 212.7275. Negative and statistically significant coefficeints cause the variance of residual lower. Also, the variance of residual of households located in new england division is lower than households in other regions becuase the cofficient on new england is zero and others are all positive. Statical significant dose not mean economically significant. According to the robust linear regression, the coefficent between hosehold income and annual water cost is 0.216, with t statistics of 22.51 and P value of 0.000. Such statistics indicate that household income have effcet on annual water cost. However, it is not economically signifcant because every 1000 dollars increase in income just incrased the annual water cost by 0.216 dollar. Similary, effecr of variable house value is statistically significant, but it’s not economically significant becuase 1000 dollars increase in house value just increase the annual water cost by 0.18 dollar. Discussion of whether the key control variable and at least two others fit your prior expectations We expected the key control variable, household income has positive effect on annual water costs of water. Keeping all other elements constant, the more people earn, the more they would like to spend to improve their standard of living qualities. 
Family size were supposed to be positively related to annual water cost, too. Because more people means more requirements
  • 4. of water. Last, we assumed that the coefficient on households in west south central division is positive, which means households in that region spend more on water than people in the new England division. This is because of the long-lasting warm weather there and people need to drink and wash more. Our preferred model’s outputs match our expectations. Also, all effects are statistically significant at the 0.05 level except the effect of west north central division, which means that there are no big differences in annual cost of water between households in new England division and west north central division. Architecture 1. The representation for floating point that we learned is single precision. In that IEEE 754 floating point format, represent the decimal value 63.25. 2. Represent the decimal value -1.125 in our IEEE 754 floating point format. 3. What result am I likely to get by adding 1E10 and 1E-32 in our architecture, using IEEE 754 single precision? Why? 4. Suppose that I have a list of n random values, ranging from 1E-32 to 1E10 in size. Assume I have lots of values at each available order of magnitude. If I wish to calculate the most accurate sum that I can within the limitations of our architecture, what is one simple thing I can do to make the sum more accurate?
  • 5. 5. Suppose we have a 5 stage pipeline with our MIPS architecture. The latencies of the pipeline states are 250ps for IF, 350ps for ID, 150ps for EX, 300ps for MEM, and 200ps for WB. Further, a program we wish to run has 45% arithmetic instructions, 20% branch instructions, and 35% load/store instructions. a. What is the clock cycle time for both pipelined and non- pipelined processors? b. What is the total latency of a load instruction in each of a pipelined and non-pipelined processor? c. If we could split one stage of the pipelined processor into two stages, each with half the latency of the original stage, which stage would you split and what is the new clock cycle time of the processor? d. Suppose we can double the number of registers. Doing so would reduce the number of load/store instructions by 10% for the program above, but increase the register latency by 50ps. i. What is the speedup achieved by this proposed improvement? ii. What effect could this change have on the number of instructions represented in the architecture? Project -- Second Deliverable Introduction After reviewing the comments of first deliverable, we learned several things and fixed these problems in the second deliverable. First of all, we did not give sufficient and thorough introductions of the database’s background, which made readers have difficulties understanding our analysis based on the data. Second, we gave too many unnecessary details, such as data names in database, meanings of data values, which were
  • 6. confusing because readers cannot see outputs from Stata, therefore they do not know what we were referring to. So, in this deliverable, we will pay more attention to clarify each variable’s representation and relations between dependent variable and independent variables. Moreover, one label in our table was misleading. “family member number” was supposed to represent family size, the amount of people in each household, but readers may interpret that in different ways. We also concluded that, based on small t statistics and large p values, a control variable, employment status does not have much explanatory power with relation to the dependent variable. So in the second deliverable, we will replace it by region, which affects the cost of water much more significantly. Regression Table: Discussion of results from new regression analysis a. different specification considered Besides the linear regression model, we generated two more alternative specifications, log regression and quadratic regression, and determined the preferred one based on a comparison of their R squares. Because all three regressions have exactly same amount of control variables on the right side of regressions, comparing R square is as unbiased as comparing the adjusted R square. To estimate what elements affect the dependent variable, annual water cost of households measured in dollars, both Linear and log regressions have control variables: total income each household earns measured in thousands of dollars, family size, whether household is located at farm or not, value of house
  • 7. measured in thousands of dollars, and region of household. The quadratic regression, however, has the square of total income and house value instead of their original first order terms. After running regression models in Stata, we got a R square and an adjusted R square for all three regressions. To determine the preferred one between linear and log regressions, however, it’s necessary to transform logged dependent variable to unlogged dependent variable first (generating the squared correlation between annual water cost and estimated annual water cost). An R-square comparison is meaningful only if the dependent variable is the same for both models. For log model, the R- square measures the amount of variation in ln(watercost), but not true variation in cost of water. The squared correlation between annual water cost and estimated annual water cost equals to (0.2305)^2, which is 0.053. The R square of linear regression is 0.0646. So, independent variables in linear regression model explain higher proportion of the variation in the dependent variable and fits observations better. Then we compared the R square of linear regression with R square of quadratic regression, which is 0.0547. The linear regression is still better. We finally chose linear regression as the best fitted model. b. test statistics to determine between specifications After testing the heteroskedasticity by both Breusch-Pagan and White tests, we noticed that all three regessions are statistically significant. That means all three regressions have heteroskedaticity. To solve that probelm, we generated robust standard deviation for each of them. The exitsence of heteroskedasticity causes the varaince of residual varies with changes of control variables’ values. For example, in the Breusch-Pagan test of linear regression model, coefficient on control varaiable household income is 212.7275 and it is statistically significant. 
This means that higher household income is associated with a higher residual variance: each additional 1000 dollars of income raises the variance of the residual by 212.7275. Negative and
  • 8. statistically significant coefficients are associated with a lower residual variance. Also, the residual variance for households located in the New England division is lower than for households in other regions, because the coefficient on New England is zero and the coefficients on all other regions are positive. Statistical significance does not imply economic significance. According to the robust linear regression, the coefficient on household income is 0.216, with a t-statistic of 22.51 and a p-value of 0.000. These statistics indicate that household income has an effect on annual water cost. However, the effect is not economically significant, because each 1000-dollar increase in income raises annual water cost by only 0.216 dollars. Similarly, the effect of house value is statistically significant but not economically significant, because a 1000-dollar increase in house value raises annual water cost by only 0.18 dollars. Discussion of whether the key control variable and at least two others fit your prior expectations We expected the key control variable, household income, to have a positive effect on annual water cost. Keeping all other factors constant, the more people earn, the more they are willing to spend to improve their standard of living. Family size was also expected to be positively related to annual water cost, because more people require more water. Last, we assumed that the coefficient on households in the West South Central division is positive, which means households in that region spend more on water than households in the New England division. This is because of the long-lasting warm weather there, which leads people to drink and wash more. Our preferred model’s output matches our expectations. 
Also, all effects are statistically significant at the 0.05 level except the effect of the West North Central division, which means that there is no statistically significant difference in annual water cost between households in the New England division and the West North Central division.
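The linear-versus-log comparison described in part (a) can be sketched as follows. This is a minimal Python illustration, not the Stata code used for the paper: the data, the single regressor, and the helper names `ols_fit` and `squared_corr` are all made up for the example. The idea is to fit the log model, convert its fitted values back to dollars with exp, and use the squared correlation between actual and predicted cost as an R-squared that is comparable to the linear model's.

```python
import math

def ols_fit(x, y):
    """One-regressor OLS with a constant; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def squared_corr(a, b):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab * sab / (saa * sbb)

# Hypothetical data: income in 1000s of dollars, annual water cost in dollars.
income = [20, 35, 50, 65, 80, 95, 110, 125]
water_cost = [310, 295, 360, 340, 420, 380, 450, 470]

# Linear model: R-squared equals the squared correlation of actual and fitted values.
a_lin, b_lin = ols_fit(income, water_cost)
fitted_lin = [a_lin + b_lin * x for x in income]
r2_linear = squared_corr(water_cost, fitted_lin)

# Log model: fit ln(cost), then move the predictions back to dollars.
log_cost = [math.log(c) for c in water_cost]
a_log, b_log = ols_fit(income, log_cost)
fitted_dollars = [math.exp(a_log + b_log * x) for x in income]
r2_log_comparable = squared_corr(water_cost, fitted_dollars)

preferred = "linear" if r2_linear >= r2_log_comparable else "log"
print(f"linear R2 = {r2_linear:.4f}, comparable log R2 = {r2_log_comparable:.4f}")
print("preferred specification:", preferred)
```

In the paper, the same comparison uses 0.0646 for the linear model against (0.2305)^2 for the log model.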
  • 9. Eco311 Project, Final Deliverable, Due Thursday 12/7 at 5 p.m. Note. There will be a 20 point penalty for each day (or part thereof) that the assignment is late. This deliverable should include individual results for your secondary topic and a discussion of your conclusions. For your secondary topic, provide the following steps in your analysis. Your grade on the final project will be based not only on the content of your analysis, but also on whether you are able to generate a document that is professional in its appearance and content. Your intended audience is someone who would have the knowledge that is expected of someone who has mastered the content in Economics 311. 1. A title page that includes a descriptive title, the author’s name, a date, and a subtitle indicating that it is a deliverable to Prof. William Even for Eco311 in Fall 2017. 2. Provide an introductory section with the following: a. A review of the main findings of your first two deliverables. This shouldn’t be a discussion of the details of your data and variables, but rather a simple
  • 10. discussion of the key findings from your regression. For example, “In our earlier deliverables, we learned that there are several important determinants of whether a person over age 55 is employed. First, …..” b. A discussion of what is new in this deliverable. For example, “In this deliverable, I will extend our earlier analysis to examine differences in employment rates between men and women and attempt to understand why women have lower employment rates than men. I will also investigate whether aging has a differential effect on employment rates. I find that ….”. 3. Background. This should include a discussion of the main hypotheses you are testing, the data you will use, and a brief summary of your major findings. 4. Provide a table of summary statistics for the dependent and control variables for the two groups created by your secondary variable (i.e. by sex, race, location, marital status, or year). If your secondary variable is year, be sure to convert all variables measured in dollars into current dollars using the CPI. Included in the summary statistics should be a t-statistic that tests the null hypothesis that the means are equal for the two groups and asterisks indicating whether the difference in means is significantly different from zero at the .10 (*), .05 (**) or .01 (***) level. You may either use the Stata command ttest, or a regression of the
  • 11. relevant dependent variable on the group dummy. For example,
    ttest incss, by(female)
or
    reg incss female
The means, test statistic, and p-value for the null hypothesis should be included in a single professional table. Be sure your variable names are self-explanatory, that you have an appropriate title, and that your footnotes clearly define the sample for your analysis. NOTE: You should try to get your table to fit on a single page. This is much simpler if you start the table at the top of a new page. Your table should not be split across pages unless it is impossible to fit it all on a single page.
  • 12. Table 1. Summary Statistics by Sex for Analysis of Determinants of Number of Children. (Sample size = 446,480)[a]

Variable                 Mean for Single People   Mean for Married People   t-statistics for equality (p-value in parentheses)
Number of children[b]    0.57                     1.40                      233.31 (0.000)
Age                      28.33                    30.17                     155.98 (0.000)
Etc… for all the other controls

[a] Sample is drawn from 2016 American Community Survey and restricted to people aged 21-35. Excludes people living in group quarters.
[b] Number of children represents number of own children living in same household.

5. Provide a regression analysis that allows you to test whether the between group difference in the dependent variable is “explained” by differences in the control variables. After performing your regression analysis, discuss at least two sets of key variables (e.g. a group of education dummies would count as one set of key variables) in your regression and indicate whether they help explain why there is a gap in the dependent variable between the two groups. A sample table is provided below. The first column gives the raw difference in the dependent
  • 13. variable across groups (married in this case). Notice that this exactly matches the difference in the dependent variable provided in table 1. The second specification is from a regression of number of children on all of your control variables. The third specification is included for the final question. Keep in mind that this table has only age and its square as a control variable. Your table should have all or most of your control variables. If all of the control variables aren’t listed, provide a list of other controls that were included in a footnote to your table.
  • 14. Table 2. Regression Analysis of Determinants of Number of Children.[a]

                          Specification 1   Specification 2   Specification 3
Married                   0.829***          0.701***          0.462**
                          (233.3)           (197.6)           (2.480)
Age                                         -0.0733***        0.00143
                                            (-11.51)          (0.148)
Age²                                        0.00253***        0.000871***
                                            (22.95)           (5.087)
Education (omitted group has less than a high school degree)
  High school Degree                        ---               ---
  Some College                              ---               ---
  College Degree                            ---               ---
Married*Age                                 ---               -0.0219*
                                            ---               (-1.668)
Married*Age²                                ---               0.00101***
                                            ---               (4.430)
Constant                  0.574***          0.573***          -0.180
                          (204.2)           (6.345)           (-1.335)
Observations              446,880           446,880           446,880
R-squared                 0.109             0.161             0.164
F-test/p-value[c]                                             ---

[a] Sample is drawn from 2014 American Community Survey and restricted to people aged 21-35. Excludes people living in group quarters.
[b] t-statistics are in parentheses and are calculated using robust standard errors. *** indicates p-value below .01; ** below 0.05, and * below 0.1.
[c] F-test and associated p-value are for null hypothesis that …..
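The remark that specification 1 exactly matches the difference in means from table 1 follows from how OLS treats a single dummy regressor: the intercept is the mean of the omitted group and the slope is the gap between the two group means. The sketch below checks this in Python on a small made-up sample; the data and the function name `dummy_regression` are hypothetical and are not part of the assignment.

```python
def dummy_regression(y, d):
    """OLS of y on a constant and a 0/1 dummy d; returns (intercept, slope)."""
    n = len(y)
    my, md = sum(y) / n, sum(d) / n
    sdd = sum((di - md) ** 2 for di in d)
    sdy = sum((di - md) * (yi - my) for di, yi in zip(d, y))
    slope = sdy / sdd
    return my - slope * md, slope

# Hypothetical sample: number of children and a married dummy.
children = [0, 1, 0, 2, 1, 3, 2, 1]
married = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, slope = dummy_regression(children, married)

# Group means computed directly, for comparison with the regression output.
mean_single = sum(c for c, m in zip(children, married) if m == 0) / married.count(0)
mean_married = sum(c for c, m in zip(children, married) if m == 1) / married.count(1)

print("intercept:", intercept, "mean for single:", mean_single)
print("slope:", slope, "difference in means:", mean_married - mean_single)
```

In the sample tables, the same relationship appears as 1.40 - 0.57 = 0.83 in table 1 against a coefficient of 0.829 on Married in specification 1 of table 2.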
  • 15. You should discuss your results and refer to the relevant table and/or specification. An example of such writing is below: Based on a White/Breusch-Pagan test using the residuals from specification (2), it was determined that the regression model had heteroscedasticity. As a consequence, all of the t-statistics in table 2 are based on robust standard errors. Based on a comparison of the coefficients on the married dummy variable in specifications 1 and 2 of table 2, we can see that the control variables we added account for married people having .128 more children than single people. One explanation for this is that married people are, on average, 1.84 years older than single people (see table 1). Moreover, over the 21-35 year old age range in the sample, age has a positive marginal effect on the number of children that is increasing in age, based on estimates of the quadratic in age in the regression.1 Consequently, an important reason that married people have more children than single people is that they are, on average, older. You should discuss at least 2 “important” variables (or sets of variables) and indicate whether they help explain why the dependent variable differs across your two groups. 6. Provide a test of the null hypothesis that the effect of at least
  • 16. one key variable differs across your two groups. Describe the results of your test and explain the implications for how the variable has differential effects on the dependent variable for the two groups you are examining. To provide a test that a variable has a differential impact across groups, use interaction terms. For example, if you want to test that age has a differential effect across married and single people, create interactions between married and age and between married and age squared. I have included these in specification 3 of table 2. Be sure to discuss the statistical and economic significance of the interaction terms. If you have multiple interaction terms, perform a test for the joint significance of them all and include this in your table (as illustrated in table 2). For example, In specification (3) of table 2, interactions between married and age and its square are added to the regression. Given the quadratic in age, comparing the marginal effect of age on children for married and single people is not straightforward. The marginal effect for single people is given by
$$\frac{\partial\,\text{Children}}{\partial\,\text{Age}} = .00143 + .000871 \times \text{Age}$$
The marginal effect for married people is
$$\frac{\partial\,\text{Children}}{\partial\,\text{Age}} = -.02047 + .001881 \times \text{Age}$$
  • 17. A comparison of these marginal effects reveals that the marginal effect of age is greater for married than single people for ages 22-35. The marginal effect of age is slightly larger for singles than married at age 21. An F-test of the null hypothesis that the coefficients on the interaction terms are zero is provided at the bottom of specification (3) in table 2. The results indicate that the null …… 1 In fact, the quadratic in age implies that the marginal effect of age is negative until age 14.5 and is positive for all ages beyond 14.5. Second deliverable for final project – due Thursday 11/30 5:00 p.m. Along with other students who have chosen the same primary topic, create a single table of regression analysis and discuss the results in the text of your paper. Be sure to correct any problems that were mentioned in the review of your first draft. Your grade on this deliverable will be based on the content of your analysis, but also whether you are able to generate a document that is professional in its
  • 18. appearance and content. Your intended audience is someone who would have the knowledge that is expected of someone who has mastered the content in Economics 311. 1. Provide at least 2 tests of alternative specifications (e.g. log vs linear, linear vs quadratic, dummy variables vs continuous, etc.) In the text, describe the specifications you compared and the preferred specification based on your analysis. Present the results of your regression analysis and the relevant test-statistics in a professional table. Be sure to discuss the results of your analysis in the text. 2. For each specification considered in part (1), provide a Breusch-Pagan test and the simple version of the White test (2nd form discussed in notes) for heteroscedasticity. Include the test statistic and corresponding p-values for these test statistics in your regression table. In the text of your deliverable, describe the basis for the conclusions you draw from your heteroscedasticity tests. If you find heteroscedasticity, are there specific characteristics that cause the variance of the residual to be higher or lower? Explain how you came to this conclusion. 3. If there is evidence of heteroscedasticity, provide standard errors (or t-statistics) that are properly corrected by using robust standard errors. However, if you are estimating a linear
  • 19. probability model, use weighted least squares (instead of robust standard errors) if possible – but be sure to investigate whether WLS would result in negative weights. If WLS results in negative weights, discuss how you determined this. 4. Discuss whether your expected effects for your key control variable and at least two others that you included in your first deliverable are confirmed by the preferred specification you identified above. Discuss whether these effects are statistically significant at the .05 level. 5. Discuss the “economic significance” of the effects for your two control variables. For example, describe the effect of a one standard deviation change in continuous control variables on the dependent variable; or a switch from 0 to 1 for a dummy variable. 6. In order that your table be deemed professional, review the document posted on my website. The regression table should be self-explanatory. The reader should be able to determine what kind of regression was estimated, how the sample was created, and what all the variables measure without referring to the text. Examples of appropriate tables are provided in the document “Creating effective tables” that is posted in the Canvas project module. Specific elements of the table that should make the table self-explanatory are as follows:
  • 20. a. Title (make it clear what the table is about – e.g. Determinants of Household Electricity Expenditures in 2016). b. Column and row headings (See sample tables for examples of relevant column headers). c. Notes attached to the table that explain i. The source of data for the table, including relevant sample restrictions (e.g. households aged 25-55 who have a mortgage). ii. Whether the table has t-statistics or standard errors in parentheses and the type of regression (e.g. OLS, linear probability model estimated with OLS, etc). If robust standard errors are used for calculation of standard errors or t-statistics, make that clear. (e.g. t-statistics are in parentheses. Robust standard errors are used in specifications 3 and 4.) iii. Anything that needs to be clarified about variables or methods can be stated in the list of variables or footnotes to the table. (e.g. income is measured in 1000s of 2016 dollars.) iv. See Miller for examples of how to use notes in tables. Notice that notes in tables are referenced with a letter (not a number).
  • 21. d. Variable names that are easily understood. Some examples: i. years of education, not educ ii. Number of children, not NCHILd iii. Household Income in 2016 dollars, not income iv. Be sure units of measurement are clear (e.g. 1000s of dollars, birthweight in pounds). v. If you are using dummy variables for categories, group them together and make it clear which dummy was omitted. e. Make it clear what the dependent variable is and also indicate sample size and either R² or adjusted R² (or both). f. Regression tables may also present only a subset of coefficients and mention in a note or row whether other variables were included in the regression. See the sample tables provided on the projects webpage. Miller (table 4.2) provides guidance on the appropriate number of digits. If coefficients are “too small”, rescale the relevant control variable to adjust (e.g. measure income in 1000s of dollars instead of dollars). g. Regression tables should start at the top of the page, and if they must span across pages, be sure to “break” them at a reasonable point (e.g. don’t split a row with coefficients on one table and t-statistics on the next).
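Item f suggests rescaling a control variable when its coefficient is too small to report cleanly. As a rough illustration, the Python sketch below shows that measuring income in 1000s of dollars multiplies the slope by 1000 while leaving R-squared unchanged. The data and the helper name `slope_and_r2` are made up for this example.

```python
import math

def slope_and_r2(x, y):
    """One-regressor OLS with a constant; returns (slope, R-squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx, sxy * sxy / (sxx * syy)

# Hypothetical data: income in dollars, annual water cost in dollars.
income_dollars = [20000, 35000, 50000, 65000, 80000]
water_cost = [300, 310, 330, 325, 350]

# Same variable, rescaled to 1000s of dollars.
income_thousands = [v / 1000 for v in income_dollars]

slope_dollars, r2_dollars = slope_and_r2(income_dollars, water_cost)
slope_thousands, r2_thousands = slope_and_r2(income_thousands, water_cost)

print(f"slope per dollar:        {slope_dollars:.6f}")
print(f"slope per 1000 dollars:  {slope_thousands:.3f}")
print(f"R-squared (both scales): {r2_dollars:.4f}, {r2_thousands:.4f}")
```

Only the units of the coefficient change, so the table footnote should state the units (for example, income measured in 1000s of dollars).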
  • 22. The sections of your paper should include the following. 1. Title page (title, authors, date) 2. Introduction: Summarize what was learned in first deliverable and what is added in this deliverable. 3. Discussion of results from new regression analysis (Table 1). a. Different specifications considered and the test statistics you used to decide between the specifications. b. The implications of the heteroskedasticity tests for each specification (i.e. do you find evidence of heteroskedasticity) and how this affected your decision to use robust standard errors or weighted least squares in each specification. 4. Discussion of whether the key control variable and at least two others fit your prior expectations. Be sure to explain why you expected the effect of the control variable on the dependent variable to be either positive or negative. First deliverable for final project – due Tuesday 11/21 by 5 p.m. Your first deliverable should include a word document that has the following parts. You will also
  • 23. separately submit a do-file and log-file that created all of the results. All deliverables will be submitted via Canvas by 5 p.m. There is a 20 percent penalty for any paper submitted after the deadline but before 11/23 at 5 p.m. No submissions will be accepted after 11/23 at 5 p.m. 1) A description of the data set you will use for your primary analysis. This should include a description of the primary data source (the American Community Survey), the year(s) of the data, and the restrictions that I described above. Note: if you are investigating a “household level variable” (e.g. homeownership, home value, rent) you should use only the head of the household (pernum==1) for your analysis. If you are investigating an “individual level” variable (e.g. employment, marital status), use both the reference person and his/her spouse. 2) A description of any sample restrictions that you are making (e.g. omitted people in certain age ranges, dropped people with missing data, etc.). Make sure that you have “cleaned” your data so that you have the same number of observations on the dependent and control variables. Also, describe any variables that you create or modify. For example, be sure to study the codebook to understand how variables might be coded if there are missing values (drop such observations). You may also want to adjust the units that certain variables are measured in (e.g. housing values might be converted to 1000s of dollars instead of dollars).
  • 24. 3) A description of the dependent variable and the key control variable in your analysis and a discussion of why you believe the key control variable will have either a positive or negative effect on the dependent variable. You should show the relationship between the dependent variable and the key control variable of interest with both a table and a graph. The table and graph should be professionally designed and be self-explanatory. That is, anyone who looks at the table or graph should be able to determine what statistics are presented, the meaning of the variables, and where the data came from. 4) A description of at least two other control variables that you believe will help explain variation in your dependent variable. Describe why you think the direction of the expected effect of each control variable on the dependent variable would be either positive or negative. Provide a professional table showing the number of observations, the sample mean, standard deviation, minimum and maximum value for the dependent variable and each of the control variables in your data. Be sure to explain any modifications you made to the variables in the data set (e.g. did you have to recode variables that were missing? Did you have to convert a categorical variable to a continuous variable? Did you make dummy variables?)
  • 25. 5) A simple linear regression of your dependent variable on the key variable(s) of interest and other control variables that you believe are important. The results should be presented in a professional table created with the Stata routine esttab. The results should include coefficients, t-statistics, R² and adjusted R², and the number of observations. 6) The structure of your Word document should include the following sections. (a) Introduction. (i) Describes the objective of the deliverable and a couple of the major findings. For example, this study will use data from the ACS to understand the factors that determine whether people are employed. We find several important determinants of employment. For example, ….. (b) The data (i) Description of ACS data. (ii) The dependent variable to be studied, the key control variable, and other controls you think are important.
  • 26. (iii) Expected effect of each of the control variables and why. (iv) Sample restrictions for your analysis. (v) Any modifications you made to the variables that you use in your analysis. (c) Results of data analysis. (i) The word document should provide a brief discussion of the results in the tables and figures presented below. All the tables and figures should be numbered and added to the end of your word document. (ii) Table 1. A professional table showing sample statistics (means, min, max, observations, etc.) (iii) Figure 1. A professional figure showing relationship between dependent variable and key control variable. (iv) Table 2. A professional table showing relationship between dependent variable and key control variable. (v) Table 3. A professional table showing regression results.
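The contents of the sample-statistics table (Table 1 in the list above) can be sketched as follows. This is only an illustration in Python with made-up data and a hypothetical helper `summarize`; the assignment itself expects the table to be produced from Stata output.

```python
import statistics

def summarize(values):
    """Observations, mean, sample standard deviation, minimum, and maximum."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

# Hypothetical data with self-explanatory labels that state the units.
data = {
    "Annual water cost (dollars)": [300, 310, 330, 325, 350],
    "Household income (1000s of dollars)": [20, 35, 50, 65, 80],
}

rows = {name: summarize(vals) for name, vals in data.items()}

# Print one row per variable under a header row.
print(f"{'Variable':<38}{'Obs':>5}{'Mean':>10}{'Std. Dev.':>11}{'Min':>8}{'Max':>8}")
for name, s in rows.items():
    print(f"{name:<38}{s['n']:>5}{s['mean']:>10.2f}{s['sd']:>11.2f}{s['min']:>8}{s['max']:>8}")
```

Putting the units in the row labels keeps the table readable without reference to the text.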