We can define heteroscedasticity as the condition in which the variance of the error term (the residual) in a regression model varies across observations. In a scatter diagram, homoscedastic data points are evenly dispersed about the regression line, while heteroscedastic data points are not.
Two conditions:
1. Known variance
2. Unknown variance
It is essential for all regression models that the relationship between the independent and dependent variables is represented correctly; the functional form tries to do exactly this. A functional form gives an equation for the dependent and independent variables so that hypothesis tests can be carried out properly. More information on functional forms of regression analysis: http://www.transtutors.com/homework-help/economics/functional-forms-of-regression-models.aspx
The linked video deals with various functional forms in regression, along with the derivation and interpretation of the slope and elasticity values of each model. The frequently used log-lin, lin-log and log-log models are also elaborated; the PowerPoint used in the video is linked separately as a pinned comment.
Heteroscedasticity refers to the violation of the homoscedasticity assumption of the linear regression model used in econometrics. In simple words, it is the situation in which the variance of the residual terms changes with the fitted value of the variable. More information on heteroscedasticity: http://www.transtutors.com/homework-help/economics/heteroscedasticity.aspx
The presentation aims to explain the meaning of ECONOMETRICS and why this subject is studied as a separate discipline.
The reference is based on the book "BASIC ECONOMETRICS" by Damodar N. Gujarati.
For further explanation, see the YouTube link:
https://youtu.be/S3SUDiVpUGU
Brief notes on heteroscedasticity, very helpful for beginners to econometrics. I taught this course to BS Economics students; these notes include all the necessary proofs.
This presentation explains almost all the concepts that need to be understood before running an OLS regression. The concepts of unconditional and conditional means are discussed in detail, along with the differences between the PRF and SRF.
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets (Derek Kane)
This lecture provides an overview of some modern regression techniques, including the bias-variance tradeoff for regression errors and shrinkage estimators. This leads into an overview of ridge regression, LASSO, and elastic nets, followed by calibration/diagnostics and a practical example highlighting the techniques.
This 10-hour class is intended to give students the basis to solve statistical problems empirically. Talk 1 serves as an introduction to the statistical software R and presents how to calculate basic measures such as the mean, variance, correlation and Gini index. Talk 2 shows how the central limit theorem and the law of large numbers work empirically. Talk 3 presents point estimates, confidence intervals and hypothesis tests for the most important parameters. Talk 4 introduces the linear regression model, and Talk 5 the bootstrap; Talk 5 also presents a simple example of a Markov chain.
All the talks are supported by scripts in the R language.
FSE 200 (Adkins)
Simple Linear Regression
Correlation only measures the strength and direction of the linear relationship between two quantitative variables. If the relationship is linear, then we would like to try to model that relationship with the equation of a line. We will use a regression line to describe the relationship between an explanatory variable and a response variable.
A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.
Ex. It has been suggested that there is a relationship between sleep deprivation of employees and the ability to complete simple tasks. To evaluate this hypothesis, 12 people were asked to solve simple tasks after having been without sleep for 15, 18, 21, and 24 hours. The sample data are shown below.
Subject | Hours without sleep, x | Tasks completed, y
1       | 15                     | 13
2       | 15                     | 9
3       | 15                     | 15
4       | 18                     | 8
5       | 18                     | 12
6       | 18                     | 10
7       | 21                     | 5
8       | 21                     | 8
9       | 21                     | 7
10      | 24                     | 3
11      | 24                     | 5
12      | 24                     | 4
Draw a scatterplot and describe the relationship. Lay a straight-edge on top of the plot and move it around until you find what you think might be a “line of best fit.” Then try to predict the number of tasks completed for someone having been without sleep 16 hours.
Was your line the same as that of the classmate sitting next to you? Probably not. We need a method to find the "best" regression line to use for prediction: the method of least squares. No line will pass exactly through all the points in the scatterplot. When we use the line to predict y for a given x value, and there is a data point with that same x value, we can compute the error (residual): residual = observed y − predicted ŷ.
Our goal is going to be to make the vertical distances from the line as small as possible. The most commonly used method for doing this is the least-squares method.
The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
Equation of the Least-Squares Regression Line
· Least-Squares Regression Line: ŷ = b₀ + b₁x
· Slope of the Regression Line: b₁ = r(s_y / s_x)
· Intercept of the Regression Line: b₀ = ȳ − b₁x̄
Generally, regression is performed using statistical software. Clearly, given the appropriate information, the above formulas are simple to use.
Once we have the regression line, how do we interpret it, and what can we do with it?
The slope of a regression line is the rate of change: the amount of change in ŷ when x increases by 1.
The intercept of the regression line is the value of ŷ when x = 0. It is statistically meaningful only when x can take on values close to zero.
To make a prediction, just substitute an x-value into the equation and find ŷ.
To plot the line on a scatterplot, find two points on the regression line, one near each end of the range of x in the data. Plot the points and connect them with a line.
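As a quick sketch in plain Python (no libraries), the least-squares formulas above applied to the sleep-deprivation data give the slope, the intercept, and the prediction for 16 hours without sleep:

```python
# Least-squares fit of the sleep-deprivation example data.
x = [15, 15, 15, 18, 18, 18, 21, 21, 21, 24, 24, 24]  # hours without sleep
y = [13, 9, 15, 8, 12, 10, 5, 8, 7, 3, 5, 4]          # tasks completed

n = len(x)
x_bar = sum(x) / n          # 19.5
y_bar = sum(y) / n          # 8.25

# b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2), b0 = y_bar - b1 * x_bar
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx              # slope: about -0.944 tasks per extra hour awake
b0 = y_bar - b1 * x_bar     # intercept: about 26.67

# Prediction for someone without sleep for 16 hours (about 11.6 tasks).
y_hat_16 = b0 + b1 * 16
print(b1, b0, y_hat_16)
```

Note the negative slope: each additional hour without sleep predicts roughly one fewer completed task, consistent with the scatterplot's downward trend.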
This article provides a brief discussion on several statistical parameters that are most commonly used in any measurement and analysis process. There are a plethora of such parameters but the most important and widely used are briefed in here.
A logistic model (or logit model) is used to model the probability of events falling in one of two classes, such as alive/dead or healthy/sick. This can be extended to several classes of events, such as determining whether an image contains a cat, dog, lion, etc.; each class is assigned a probability between 0 and 1, with the probabilities summing to one.
A data analysis project presentation on predicting product ad-campaign performance, showing how data-driven insights can optimize marketing strategies and enhance campaign effectiveness. More details: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices that have already converged can save iteration time. Skipping in-identical vertices (those with the same in-links) reduces duplicate computation and thus iteration time. Road networks often have chains that can be short-circuited before PageRank computation, since the final ranks of chain nodes are easy to calculate; this reduces both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce iteration time and the number of iterations, and also enables multi-iteration concurrency. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
2. Introduction
We can define heteroscedasticity as the condition in which the variance of the error term (the residual) in a regression model varies across observations. As the diagram on this slide shows, under homoscedasticity the data points are equally scattered, while under heteroscedasticity they are not.
3. Possible reasons for heteroscedasticity
1. It often occurs in data sets with a large range between the largest and smallest observed values, i.e. when there are outliers.
2. When the model is not correctly specified.
3. When observations with different scales of measurement are mixed.
4. When an incorrect transformation of the data is used to perform the regression.
5. Skewness in the distribution of a regressor, among other possible sources.
4. Effects of Heteroscedasticity
• OLS (Ordinary Least Squares) estimators are no longer the Best Linear Unbiased Estimator (BLUE): their variance is not the lowest among all unbiased estimators.
• The estimators are no longer best/efficient.
• Hypothesis tests (such as the t-test and F-test) are no longer valid, because the covariance matrix of the estimated regression coefficients is inconsistent.
6. Weighted Least Squares (WLS) Estimator
• The Weighted Least Squares estimator is the OLS estimator applied to a transformed model, obtained by multiplying each term on both sides of the regression equation by a "weight", denoted wᵢ. For instance, consider the following general linear regression model with heteroscedasticity:
• Yᵢ = β₀ + β₁Xᵢ₁ + uᵢ ; i = 1, 2, …, n
• Var(uᵢ) = σ²Zᵢ² , where Zᵢ is some function of Xᵢ
• To obtain the WLS estimator, the transformed model is:
• wᵢYᵢ = wᵢβ₀ + β₁(wᵢXᵢ₁) + wᵢuᵢ ; i = 1, 2, …, n
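This weighting scheme can be sketched numerically. The snippet below (not from the slides) assumes for illustration that Zᵢ = Xᵢ, so wᵢ = 1/Xᵢ, and fits the WLS estimator by running ordinary least squares on the weighted variables; the true coefficients 2 and 3 are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate Y_i = b0 + b1*X_i + u_i with Var(u_i) = sigma^2 * X_i^2 (so Z_i = X_i).
n = 500
X = rng.uniform(1.0, 10.0, n)
u = rng.normal(0.0, 1.0, n) * X          # heteroscedastic errors, sd proportional to X
Y = 2.0 + 3.0 * X + u

# WLS = OLS on the model multiplied through by w_i = 1/Z_i = 1/X_i.
w = 1.0 / X
A = np.column_stack([w, w * X])          # transformed regressors: w_i*1 and w_i*X_i
b = np.linalg.lstsq(A, w * Y, rcond=None)[0]
print(b)                                 # estimates of (b0, b1), close to (2, 3)
```

Note that no special WLS routine is needed: multiplying every term by wᵢ turns the problem into an ordinary least-squares fit on the transformed data, exactly as the slide describes.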
7. Question: For the model Yᵢ = βXᵢ + uᵢ with Var(uᵢ) = σ²Zᵢ², prove that the WLS estimator of β has lower variance than its OLS estimator, where the weight is wᵢ = 1/Zᵢ.
ANSWER: For the model Yᵢ = βXᵢ + uᵢ with Var(uᵢ) = σ²Zᵢ²:
The OLS estimator of β is β̂ = ΣXᵢYᵢ / ΣXᵢ², with
Var(β̂) = ΣXᵢ² Var(uᵢ) / (ΣXᵢ²)² = σ² ΣXᵢ²Zᵢ² / (ΣXᵢ²)²
If we divide the entire equation by Zᵢ:
Yᵢ/Zᵢ = β(Xᵢ/Zᵢ) + uᵢ/Zᵢ , or yᵢ = βxᵢ + vᵢ
Here Var(vᵢ) = Var(uᵢ)/Zᵢ² = σ²Zᵢ²/Zᵢ² = σ² (constant: homoscedasticity).
The WLS estimator of β is β* = Σxᵢyᵢ / Σxᵢ² = Σxᵢ(βxᵢ + vᵢ) / Σxᵢ² = β + Σxᵢvᵢ / Σxᵢ²
Var(β*) = E(β* − β)² = Σxᵢ² E(vᵢ²) / (Σxᵢ²)² = σ² Σxᵢ² / (Σxᵢ²)² = σ² / Σxᵢ²   [using E(uᵢuⱼ) = 0 for i ≠ j, since the errors are independent]
Finally, by the Cauchy-Schwarz inequality, (ΣXᵢ²)² = (Σ(Xᵢ/Zᵢ)(XᵢZᵢ))² ≤ Σ(Xᵢ²/Zᵢ²) · ΣXᵢ²Zᵢ², so
Var(β*) = σ² / Σ(Xᵢ²/Zᵢ²) ≤ σ² ΣXᵢ²Zᵢ² / (ΣXᵢ²)² = Var(β̂)
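The variance ranking in this derivation can be checked by Monte Carlo simulation. The sketch below assumes, for illustration, Zᵢ = Xᵢ and the no-intercept model Yᵢ = βXᵢ + uᵢ from the question; β = 3, σ = 1 and the X grid are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
beta, sigma = 3.0, 1.0
X = np.linspace(1.0, 5.0, 50)
Z = X                                            # assume Z_i = X_i for the illustration

ols, wls = [], []
for _ in range(2000):
    u = rng.normal(0.0, sigma, X.size) * Z       # Var(u_i) = sigma^2 * Z_i^2
    Y = beta * X + u
    ols.append(np.sum(X * Y) / np.sum(X**2))     # OLS estimator (regression through origin)
    x, y = X / Z, Y / Z                          # weighted (transformed) variables
    wls.append(np.sum(x * y) / np.sum(x**2))     # WLS estimator

# The sampling variance of the WLS estimates is smaller, as the proof predicts.
print(np.var(ols), np.var(wls))
```

With Zᵢ = Xᵢ, the theoretical Var(β*) is σ²/Σ(Xᵢ/Zᵢ)² = σ²/n = 0.02 here, which the simulated variance should approximate.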
13.
• The WLS method makes an implicit assumption that the true error variance (σᵢ²) is known. In reality, however, it is difficult to know the true error variance, so we need other methods to obtain a consistent estimate of the variance of the error term.
• In this method, we make some assumptions about the true error variance (σᵢ²) and transform the original regression model; after transformation, the new model satisfies the homoscedasticity assumption. Say the original regression model is:
Yᵢ = β₁ + β₂Xᵢ + uᵢ and Var(uᵢ) = σᵢ² ; i = 1, 2, …, n
14. When the error variance is proportional to Xᵢ
Run the original OLS regression and obtain the residuals. Plot the squares of these residuals (σ̂ᵢ²) against the explanatory variable X. If we get a pattern similar to figure 1, we say that the error variance is proportional to (linearly related to) Xᵢ, and σ² is the factor of proportionality, which is a constant. Symbolically, E(uᵢ²) = σ²Xᵢ ; i = 1, 2, …, n.
Now we transform the original regression model by dividing the regression equation by √Xᵢ:
Yᵢ/√Xᵢ = β₁/√Xᵢ + β₂Xᵢ/√Xᵢ + uᵢ/√Xᵢ
       = β₁/√Xᵢ + β₂√Xᵢ + vᵢ ; i = 1, 2, …, n
Here vᵢ = uᵢ/√Xᵢ and Xᵢ > 0. This transformed regression equation is called the "square root transformation", and the error term vᵢ is homoscedastic.
Proof: E(vᵢ²) = E(uᵢ/√Xᵢ)² = E(uᵢ²)/Xᵢ = σ²
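A small simulation can illustrate the square root transformation. The data-generating process below (E(uᵢ²) = σ²Xᵢ with σ² = 4) is made up for the example; dividing the errors by √Xᵢ should make their spread roughly constant across low and high values of X.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
X = rng.uniform(1.0, 9.0, n)
sigma2 = 4.0
u = rng.normal(0.0, np.sqrt(sigma2 * X))   # E(u_i^2) = sigma^2 * X_i: variance grows with X
Y = 1.0 + 0.5 * X + u

v = u / np.sqrt(X)                         # transformed error after dividing by sqrt(X_i)

# Compare error spread on the low-X and high-X halves of the sample.
lo, hi = X < 5.0, X >= 5.0
print(u[lo].var(), u[hi].var())            # clearly different: heteroscedastic
print(v[lo].var(), v[hi].var())            # both close to sigma^2 = 4: homoscedastic
```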
15. When the error variance is proportional to Xᵢ²
Run the original OLS regression and obtain the residuals. Plot the squares of these residuals (σ̂ᵢ²) against the explanatory variable X. If we get a pattern similar to figure 2, we say that the error variance is proportional to Xᵢ² (non-linearly related to Xᵢ), and σ² is the factor of proportionality, which is a constant. Symbolically, E(uᵢ²) = σ²Xᵢ² ; i = 1, 2, …, n.
Now we transform the original regression model by dividing the regression equation by Xᵢ:
Yᵢ/Xᵢ = β₁/Xᵢ + β₂Xᵢ/Xᵢ + uᵢ/Xᵢ
      = β₁/Xᵢ + β₂ + vᵢ ; i = 1, 2, …, n
Here vᵢ = uᵢ/Xᵢ and Xᵢ > 0. This transformed regression equation is called the "square transformation", and the error term vᵢ is homoscedastic.
Proof: E(vᵢ²) = E(uᵢ/Xᵢ)² = E(uᵢ²)/Xᵢ² = σ²
16. When the error variance is proportional to the square of the mean value of Y
According to this assumption, the error variance is proportional to the square of the mean value of Y, and σ² is a constant. Symbolically, E(uᵢ²) = σ²[E(Yᵢ)]² ; i = 1, 2, …, n.
Now we transform the original regression model by dividing it by E(Yᵢ), where E(Yᵢ) = β₁ + β₂Xᵢ:
Yᵢ/E(Yᵢ) = β₁/E(Yᵢ) + β₂Xᵢ/E(Yᵢ) + uᵢ/E(Yᵢ)
         = β₁/E(Yᵢ) + β₂Xᵢ/E(Yᵢ) + vᵢ ; i = 1, 2, …, n
where vᵢ = uᵢ/E(Yᵢ). We can show that the error term vᵢ is homoscedastic.
Proof: E(vᵢ²) = E(uᵢ/E(Yᵢ))² = E(uᵢ²)/[E(Yᵢ)]² = σ²
17.
E(Yᵢ) depends on β₁ and β₂, which are unknown. We know
Ŷᵢ = β̂₁ + β̂₂Xᵢ
which is an estimator of E(Yᵢ).
First, we run the usual OLS regression, disregarding the heteroscedasticity problem, and obtain Ŷᵢ; then, using the estimated Ŷᵢ, we transform our model:
Yᵢ/Ŷᵢ = β₁(1/Ŷᵢ) + β₂(Xᵢ/Ŷᵢ) + uᵢ/Ŷᵢ ; i = 1, 2, …, n
The transformation will perform satisfactorily in practice if the sample size is reasonably large.
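This two-step procedure can be sketched as follows. The model Yᵢ = 5 + 2Xᵢ with errors proportional to E(Yᵢ) is a made-up illustration; step 1 is the ordinary OLS fit that ignores heteroscedasticity, and step 2 reruns OLS on the model divided through by the fitted values Ŷᵢ.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
X = rng.uniform(1.0, 10.0, n)
mean_Y = 5.0 + 2.0 * X
Y = mean_Y + rng.normal(0.0, 0.3, n) * mean_Y    # E(u_i^2) = sigma^2 * [E(Y_i)]^2

# Step 1: usual OLS, disregarding heteroscedasticity, to get fitted values Y_hat.
A = np.column_stack([np.ones(n), X])
b_ols = np.linalg.lstsq(A, Y, rcond=None)[0]
Y_hat = A @ b_ols

# Step 2: OLS on the model divided through by Y_hat.
A2 = np.column_stack([1.0 / Y_hat, X / Y_hat])
b_fgls = np.linalg.lstsq(A2, Y / Y_hat, rcond=None)[0]
print(b_fgls)                                    # close to the true (5, 2)
```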
18. Log Transformation
A log transformation of the original regression model can help reduce the problem of heteroscedasticity. Symbolically,
log(Yᵢ) = β₁ + β₂ log(Xᵢ) + uᵢ ; i = 1, 2, …, n
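A quick sketch of why this helps: if the error is multiplicative in levels (so the spread of Y grows with its mean), the log-log model has a homoscedastic additive error. The parameter values below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
X = rng.uniform(1.0, 100.0, n)

# Multiplicative error in levels: Y = exp(b1) * X^b2 * exp(u).
# In levels the spread of Y grows with X; after taking logs, u is homoscedastic.
u = rng.normal(0.0, 0.2, n)
Y = np.exp(1.0) * X**0.8 * np.exp(u)

# OLS on log(Y) = b1 + b2*log(X) + u.
A = np.column_stack([np.ones(n), np.log(X)])
b = np.linalg.lstsq(A, np.log(Y), rcond=None)[0]
print(b)                                   # close to the true (1.0, 0.8)
```

In this log-log form the slope β₂ is also directly interpretable as an elasticity, which is a second common reason for using the transformation.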
25. Some "problems" associated with this transformation method
• In multiple regression models, we may not be able to decide which of the X variables should be chosen for transforming the data.
• The log transformation is not applicable if some of the Y and X values are zero or negative.
• It may happen that ratios of variables are correlated even though the original variables are uncorrelated or random. For instance, in the model Yᵢ = β₁ + β₂Xᵢ + uᵢ, Y and X may not be correlated, but in the transformed model Yᵢ/Xᵢ = β₁/Xᵢ + β₂ + uᵢ/Xᵢ, the ratios Yᵢ/Xᵢ and 1/Xᵢ are often found to be correlated. Hence there is a problem of spurious correlation.
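The spurious-correlation point is easy to demonstrate numerically: below, X and Y are independent by construction, yet the ratios Y/X and 1/X are strongly correlated because they share the common factor 1/X.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
X = rng.uniform(0.5, 5.0, n)
Y = rng.uniform(0.5, 5.0, n)                 # X and Y drawn independently

r_raw = np.corrcoef(Y, X)[0, 1]              # near 0: no real relationship
r_ratio = np.corrcoef(Y / X, 1.0 / X)[0, 1]  # clearly positive: spurious correlation
print(round(r_raw, 3), round(r_ratio, 3))
```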
26. Summary
All of the remedial measures discussed above are just ways to speculate about the nature of the population error variance σᵢ². Which method to use depends on the nature of the problem and the severity of the heteroscedasticity.