1. The document discusses the simple linear regression model, which relates a dependent variable Y to an independent variable X using a straight line. It defines key terms like the population regression function, sample regression function, and error term.
2. It describes how ordinary least squares regression estimates the parameters in the sample regression function by minimizing the sum of squared residuals. This provides estimated values for the intercept and slope.
3. It discusses some algebraic properties of ordinary least squares estimates, including that the sum of residuals is 0 and their sample covariance with the independent variable is 0. It also defines other summary statistics like R-squared and total, explained, and residual sum of squares.
This presentation explains almost all the concepts that need to be understood before running OLS in regression analysis. The concepts of unconditional and conditional means are discussed in detail, along with the differences between the PRF and the SRF.
2.1 The Simple Regression Model
IMPORTANT NOTE: The equations and the graphs use different symbols (with hats) for the intercept.
This is my mistake, but owing to time constraints I have not been able to correct it, so kindly cooperate with me on this issue.
The difference in notation applies ONLY TO THE GRAPHS.
The Simple Regression Model:
Regression Analysis:
Y and X are two variables representing a population, and we are interested in explaining y in terms of x.
Y is the dependent variable; it depends on X, which is the independent variable.
How do we choose between the independent and the dependent variable?
Income is the cause of consumption. Thus income is the independent variable, and consumption, the effect, is the dependent variable.
The model is also called the two-variable linear regression model or bivariate linear regression model because it relates the two variables x and y.
Regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one or more independent or explanatory variables, with a view to estimating or predicting the population mean of the former in terms of the known or fixed (in repeated sampling) values of the latter, i.e.
$$y = \beta_0 + \beta_1 x + u$$
The variable u, called the error term or disturbance in the relationship, represents factors other than x that affect y, the "unobserved" factors.
If the other factors in u are held fixed, so that the change in u is zero, $\Delta u = 0$, then x has a linear effect on y:
$$\Delta y = \beta_1 \Delta x \quad \text{if } \Delta u = 0$$
Thus, the change in y is simply $\beta_1$ multiplied by the change in x.
Terminology and Notation:
Y                      X
Dependent Variable     Independent Variable
Explained Variable     Explanatory Variable
Predicted Variable     Predictor
Regressand             Regressor
Response               Stimulus
Endogenous             Exogenous
Outcome                Covariate
Controlled Variable    Control Variable
$\beta_0$ and $\beta_1$ are two unknown but fixed parameters known as the regression coefficients.
Eg: Suppose a "total population" of 60 families lives in a community called XYZ, and their weekly income (X) and weekly consumption (Y) are both measured in dollars.
X                     80   100   120   140   160   180   200   220   240   260
Y                     55    65    79    80   102   110   120   135   137   150
                      60    70    84    93   107   115   136   137   145   152
                      65    74    90    95   110   120   140   140   155   175
                      70    80    94   103   116   130   144   152   165   178
                      75    85    98   108   118   135   145   157   175   180
                       -    88     -   113   125   140     -   160   189   185
                       -     -     -   115     -     -     -   162     -   191
Total                325   462   445   707   678   750   685  1043   966  1211
Conditional means
of Y, E(Y|X)          65    77    89   101   113   125   137   149   161   173
(A dash marks a cell with no family.)
The 60 families are divided into 10 income groups, from $80 to $260.
The 10 values of X are "fixed", and each has a corresponding Y subpopulation.
There is considerable variation in consumption within each income group.
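The conditional means in the last row of the table can be reproduced directly from the data. The following is a minimal Python sketch; the dictionary layout and variable names are illustrative choices, not part of the original slides.

```python
# Weekly consumption (Y) of the families in each income group (X),
# copied from the table of 60 families.
consumption_by_income = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}

# E(Y | X) is the average of Y within each fixed income group.
conditional_means = {
    x: sum(ys) / len(ys) for x, ys in consumption_by_income.items()
}

for x, mean_y in conditional_means.items():
    print(f"E(Y | X = {x}) = {mean_y:.0f}")
```

Each group mean is computed only from the families that share the same income, which is exactly what "conditional on X" means here.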
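Geometrically, then, a population regression curve is simply the locus of the conditional means of the dependent variable for the fixed values of the explanatory variable(s).
The conditional mean is
$$E(Y \mid X_i) = f(X_i)$$
where $f(X_i)$ denotes some function of the explanatory variable X.
Here $E(Y \mid X_i)$ is a linear function of $X_i$, say of the type:
$$E(Y \mid X_i) = \beta_0 + \beta_1 X_i$$
Meaning of the term "linear":
1. Linear in variables, i.e. in X. For example, $E(Y \mid X_i) = \beta_0 + \beta_1 X_i^2$ is not linear in the variable.
2. Linear in parameters, i.e. in the $\beta$'s. For example, $E(Y \mid X_i) = \beta_0 + \beta_1^2 X_i$ is not linear in the parameters.
Eg: Linear in parameters: $E(Y \mid X_i) = \beta_0 + \beta_1 X_i^2$
But for now, whenever we refer to "linear" regression we mean only linear in the parameters, the $\beta$'s.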
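To see why linearity in the parameters is what matters, the sketch below fits a model that is nonlinear in X but linear in the betas by simply treating Z = X squared as the regressor. The data are hypothetical and generated exactly from Y = 2 + 3 X^2; the helper `ols` uses the standard one-regressor formulas (slope = sum of cross-deviations over sum of squared deviations) and is an illustrative name, not something from the slides.

```python
def ols(z, y):
    """Return (intercept, slope) for a one-regressor least squares fit."""
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    slope = (sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
             / sum((zi - z_bar) ** 2 for zi in z))
    intercept = y_bar - slope * z_bar
    return intercept, slope

# Hypothetical data: Y = 2 + 3 * X^2, nonlinear in X, linear in the parameters.
x = [1, 2, 3, 4, 5]
y = [2 + 3 * xi ** 2 for xi in x]

# Transform the variable, then run an ordinary linear fit on Z = X^2.
z = [xi ** 2 for xi in x]
beta0_hat, beta1_hat = ols(z, y)
print(beta0_hat, beta1_hat)
```

No such transformation of the variable would help with a model like beta0 + beta1^2 X, because there the nonlinearity sits in the parameter itself.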
[Figure: two-way scatter plot of income and consumption, showing the population regression line]
The population regression line passes through the "average" values of consumption, $E(Y \mid X)$, which are also known as the conditional expected values (CEV).
The CEV tells us the expected value of weekly consumption expenditure for a family whose income is $80, $100, ...
Unconditional Expected Value: The unconditional expected value of weekly consumption expenditure is given by E(Y); it disregards the income levels of the various families.
E(Y) = 7272/60 = $121.20
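The figure 7272/60 can be checked from the group totals, and it also equals the average of the conditional means weighted by group size. A minimal Python sketch follows; the list names are illustrative, and the group sizes are counted from the table of 60 families.

```python
# Group totals and group sizes for the ten income levels (80, 100, ..., 260).
group_totals = [325, 462, 445, 707, 678, 750, 685, 1043, 966, 1211]
group_sizes = [5, 6, 5, 7, 6, 6, 5, 7, 6, 7]
conditional_means = [t / n for t, n in zip(group_totals, group_sizes)]

n_families = sum(group_sizes)

# Direct route: total consumption divided by the number of families.
unconditional_mean = sum(group_totals) / n_families

# Indirect route: size-weighted average of the conditional means.
weighted_mean = sum(
    n * m for n, m in zip(group_sizes, conditional_means)
) / n_families

print(f"E(Y) = {unconditional_mean:.2f}")
```

Either way the result is the same unconditional expected value E(Y).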
It tells us the expected value of weekly consumption expenditure of
"any" family.
Thus:
The conditional mean $E(Y \mid X_i)$ is a function of $X_i$, where $X_i = X_1, X_2$, and so on.
It is a linear function, and is also known as the conditional expectation function, the population regression function (PRF) or the population function:
$$E(Y \mid X_i) = \beta_0 + \beta_1 X_i$$
where $\beta_0$ and $\beta_1$ are two unknown but fixed parameters known as the regression coefficients; $\beta_0$ is the intercept and $\beta_1$ is the slope.
The main objective of regression analysis is to estimate the values of the unknowns $\beta_0$ and $\beta_1$ on the basis of observations on Y and X.
We saw previously that as a family's income increases, its consumption expenditure on average increases too.
But what about the individual family?
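For example, as income increases from $80 to $100, one particular family's consumption is $65, which is less than the consumption expenditure of two families whose weekly income is $80.
Thus we express the deviation of an individual $Y_i$ from its conditional mean as:
$$u_i = Y_i - E(Y \mid X_i) \quad \text{or} \quad Y_i = E(Y \mid X_i) + u_i$$
The expenditure of an individual family, given its income level, can therefore be expressed as the sum of two components:
1. $E(Y \mid X_i)$ = systematic or deterministic component, and
2. $u_i$ = nonsystematic component, which cannot be determined.
$$Y_i = E(Y \mid X_i) + u_i = \beta_0 + \beta_1 X_i + u_i$$
Taking the expected value on both sides, conditional on $X_i$:
$$E(Y_i \mid X_i) = E(Y \mid X_i) + E(u_i \mid X_i)$$
Since $E(Y_i \mid X_i)$ is the same as $E(Y \mid X_i)$, this implies $E(u_i \mid X_i) = 0$.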
Before we state how u and x are related, we make one important assumption: as long as we include the intercept in the equation, nothing is lost by assuming that the average value of u in the "population" is zero, i.e. E(u) = 0.
Relationship between u and x:
A natural first assumption is that u and x are not correlated, i.e. u and x are not linearly related.
However, it is possible for u to be uncorrelated with x while being correlated with functions of x, such as $x^2$.
Thus the better assumption is that the expected value of u given x is zero: $E(u \mid x) = E(u) = 0$.
This is called the zero conditional mean assumption.
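A small constructed example may help show why zero correlation is weaker than a zero conditional mean. In the sketch below, x takes the values -1, 0, 1 with equal probability and u is defined as x^2 minus its mean. Both the distribution and the form of u are assumptions made purely for illustration; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

# Hypothetical discrete distribution: x is -1, 0 or 1, each with probability 1/3.
x_values = [-1, 0, 1]
prob = Fraction(1, 3)

mean_x = sum(prob * x for x in x_values)
mean_x_squared = sum(prob * x ** 2 for x in x_values)

def u(x):
    """Illustrative error term: a function of x^2, centred so that E(u) = 0."""
    return x ** 2 - mean_x_squared

mean_u = sum(prob * u(x) for x in x_values)

# Covariance between x and u: zero, so u and x are uncorrelated.
cov_xu = sum(prob * (x - mean_x) * (u(x) - mean_u) for x in x_values)

# Because u is a deterministic function of x here, E(u | x) is just u(x).
conditional_mean_u = {x: u(x) for x in x_values}

print("E(u) =", mean_u)
print("Cov(x, u) =", cov_xu)
print("E(u | x) =", conditional_mean_u)
```

Here E(u) = 0 and Cov(x, u) = 0, yet E(u | x) changes with x, so the zero conditional mean assumption fails even though u and x are uncorrelated.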
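The sample regression function:
So far we have only talked about the population of Y values corresponding to the fixed X's.
When collecting data, it is almost impossible to observe the entire population.
Thus in most practical situations what we have is a sample of Y values corresponding to some fixed X's.
Our task is therefore to estimate the PRF on the basis of the sample information, using the sample regression function (SRF):
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$
where $\hat{Y}_i$ is the estimator of $E(Y \mid X_i)$, $\hat{\beta}_0$ is the estimator of $\beta_0$, and $\hat{\beta}_1$ is the estimator of $\beta_1$.
The numerical value obtained by the estimator is known as the "estimate".
In its stochastic form, the SRF can be written as:
$$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i$$
where $\hat{u}_i$ is the residual term.
Conceptually, $\hat{u}_i$ is analogous to $u_i$ and can be regarded as the estimate of $u_i$.
So far:
PRF: $Y_i = \beta_0 + \beta_1 X_i + u_i$
SRF: $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i$
In terms of the SRF: $Y_i = \hat{Y}_i + \hat{u}_i$
In terms of the PRF: $Y_i = E(Y \mid X_i) + u_i$
It is almost impossible for the SRF and the PRF to be the same because of sampling fluctuations; thus our main objective is to choose $\hat{\beta}_0$ and $\hat{\beta}_1$ so that the SRF replicates the PRF as closely as possible.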
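The gap between the PRF and an SRF can be illustrated with the 60-family population. The sketch below fits a line to the whole population, then to two random samples that pick one family per income group. The sampling scheme, the fixed seed and the helper names `ols` and `draw_srf` are assumptions made for illustration, and the fit uses the standard one-regressor least squares formulas.

```python
import random

# The "total population" of 60 families: income -> list of consumption values.
population = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}

def ols(x, y):
    """Return (intercept, slope) for a one-regressor least squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

# Line fitted to all 60 families: this plays the role of the PRF.
all_x = [x for x, ys in population.items() for _ in ys]
all_y = [y for ys in population.values() for y in ys]
prf_intercept, prf_slope = ols(all_x, all_y)

rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability

def draw_srf():
    """Pick one family per income group and fit a sample regression line."""
    xs = list(population)
    ys = [rng.choice(population[x]) for x in xs]
    return ols(xs, ys)

srf_a = draw_srf()
srf_b = draw_srf()

print("PRF:  ", prf_intercept, prf_slope)
print("SRF 1:", srf_a)
print("SRF 2:", srf_b)
```

Because the conditional means in this population lie exactly on a straight line, the population fit returns an intercept of 17 and a slope of 0.6, while each sample generally gives somewhat different estimates.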
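How is the SRF itself determined, since the PRF is never known?
Ordinary Least Squares:
PRF: $Y_i = \beta_0 + \beta_1 X_i + u_i$
SRF: $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i = \hat{Y}_i + \hat{u}_i$
One option is to choose the SRF in such a way that the sum of the residuals $\sum \hat{u}_i = \sum (Y_i - \hat{Y}_i)$ is as small as possible.
But if we adopt the criterion of minimizing $\sum \hat{u}_i$, every residual receives equal weight, no matter how far from or how close to the SRF the observation lies.
Large positive and negative residuals can then cancel, so the sum can be small even when the points are widely scattered about the SRF.
This problem is avoided by adopting the least squares criterion, which states that the SRF should be fixed in such a way that
$$\sum \hat{u}_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2$$
is as small as possible.
Thus our goal is to choose $\hat{\beta}_0$ and $\hat{\beta}_1$ in such a way that $\sum \hat{u}_i^2$ is as small as possible, which is what OLS does.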
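The weakness of the "sum of residuals" criterion shows up clearly on a tiny hypothetical data set. In the sketch below, two candidate lines both have residuals that sum to zero, but only the sum of squared residuals tells them apart. The data and both candidate lines are made up for illustration.

```python
# Hypothetical data lying exactly on the line y = 2x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

def residuals(intercept, slope):
    """Residuals y_i - (intercept + slope * x_i) for a candidate line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Candidate A: the line y = 2x, which passes through every point.
res_a = residuals(0, 2)
# Candidate B: the flat line y = 6, which ignores x entirely.
res_b = residuals(6, 0)

sum_a, sum_b = sum(res_a), sum(res_b)
ssr_a = sum(r ** 2 for r in res_a)
ssr_b = sum(r ** 2 for r in res_b)

print("Sum of residuals:        ", sum_a, sum_b)
print("Sum of squared residuals:", ssr_a, ssr_b)
```

Both lines score zero on the sum of residuals because positive and negative errors cancel for the flat line, whereas squaring removes the cancellation and penalizes the flat line heavily.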
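Let $Q = \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2$.
So we want to minimize Q.
Taking partial derivatives with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$ and setting them to zero:
$$\frac{\partial Q}{\partial \hat{\beta}_0} = -2 \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$$
$$\frac{\partial Q}{\partial \hat{\beta}_1} = -2 \sum X_i (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$$
The first condition gives $\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}$, so
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$
Plugging this value of $\hat{\beta}_0$ into the second condition:
$$\sum X_i \left( Y_i - (\bar{Y} - \hat{\beta}_1 \bar{X}) - \hat{\beta}_1 X_i \right) = 0$$
Upon rearranging, this gives:
$$\sum X_i (Y_i - \bar{Y}) = \hat{\beta}_1 \sum X_i (X_i - \bar{X})$$
Using $\sum X_i (Y_i - \bar{Y}) = \sum (X_i - \bar{X})(Y_i - \bar{Y})$ and $\sum X_i (X_i - \bar{X}) = \sum (X_i - \bar{X})^2$, and provided that $\sum (X_i - \bar{X})^2 > 0$:
$$\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
Thus $\hat{\beta}_1$ is the sample covariance between X and Y divided by the sample variance of X, just as $\beta_1$ equals the population covariance divided by the variance of X when $E(u) = 0$ and $\mathrm{Cov}(X, u) = 0$.
Which concludes:
If X and Y are positively correlated in the sample, then $\hat{\beta}_1$ is positive, and
if X and Y are negatively correlated in the sample, then $\hat{\beta}_1$ is negative.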
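The closed-form OLS solution (slope equal to the sum of cross-deviations over the sum of squared deviations of X, intercept equal to the mean of Y minus the slope times the mean of X) can be applied by hand to a small sample. The sketch below uses a hypothetical sample of ten families, one per income group, taken from the first Y row of the income and consumption table; treating that row as "the sample" is an assumption made for illustration.

```python
# Hypothetical sample: one family per income group.
income = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
consumption = [55, 65, 79, 80, 102, 110, 120, 135, 137, 150]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(consumption) / n

# Numerator and denominator of the slope formula.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, consumption))
s_xx = sum((x - x_bar) ** 2 for x in income)

beta1_hat = s_xy / s_xx                 # slope
beta0_hat = y_bar - beta1_hat * x_bar   # intercept

# The two first order conditions should hold (up to rounding) at the solution.
foc_intercept = sum(y - beta0_hat - beta1_hat * x
                    for x, y in zip(income, consumption))
foc_slope = sum(x * (y - beta0_hat - beta1_hat * x)
                for x, y in zip(income, consumption))

print(f"slope = {beta1_hat:.4f}, intercept = {beta0_hat:.4f}")
```

Income and consumption move together in this sample, so the estimated slope is positive, in line with the sign rule for the slope.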
Fitted Values and Residuals:
We assume that the intercept and slope estimates, $\hat{\beta}_0$ and $\hat{\beta}_1$, have been obtained for a given sample of data.
Given $\hat{\beta}_0$ and $\hat{\beta}_1$, we can obtain the fitted value $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ for each observation.
By definition, each fitted value is on the OLS line.
The OLS residual associated with observation i, $\hat{u}_i$, is the difference between $Y_i$ and its fitted value: $\hat{u}_i = Y_i - \hat{Y}_i$.
If $\hat{u}_i$ is positive, the line underpredicts $Y_i$; if $\hat{u}_i$ is negative, the line overpredicts $Y_i$.
The ideal case for observation i is when $\hat{u}_i = 0$, but in most cases every residual is not equal to zero.
Algebraic Properties of OLS Statistics:
There are several useful algebraic properties of OLS estimates and their associated statistics. We now cover the three most important of these.
(1) The sum, and therefore the sample average, of the OLS residuals is zero. Mathematically,
$$\sum \hat{u}_i = 0$$
This follows immediately from the OLS first order condition for the intercept.
This means the OLS estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are chosen to make the residuals add up to zero (for any data set). This says nothing about the residual for any particular observation.
(2) The sample covariance between the regressor and the OLS residuals is zero. This can be written as:
$$\sum X_i \hat{u}_i = 0$$
Because the sample average of the OLS residuals is zero, the left-hand side is proportional to the sample covariance between $X_i$ and $\hat{u}_i$.
Example: u captures all the factors not included in the model, e.g. aptitude, ability and so on.
(3) The point $(\bar{X}, \bar{Y})$ is always on the OLS regression line.
Writing each $Y_i$ as its fitted value plus its residual provides another way to interpret an OLS regression.
For each i, write: $Y_i = \hat{Y}_i + \hat{u}_i$.
From property (1) above, the average of the residuals is zero; equivalently, the sample average of the fitted values, $\hat{Y}_i$, is the same as the sample average of the $Y_i$, or $\bar{\hat{Y}} = \bar{Y}$.
Further, properties (1) and (2) can be used to show that the sample covariance between $\hat{Y}_i$ and $\hat{u}_i$ is zero.
Thus, we can view OLS as decomposing each $Y_i$ into two parts, a fitted value and a residual.
The fitted values and residuals are uncorrelated in the sample.
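These algebraic properties can be checked numerically on any sample. The sketch below does so for a hypothetical sample of ten families (one per income group, taken from the first Y row of the income and consumption table); the variable names are illustrative, and the checks hold only up to floating-point rounding.

```python
# Hypothetical sample: one family per income group.
x = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
y = [55, 65, 79, 80, 102, 110, 120, 135, 137, 150]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates from the closed-form formulas.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Property (1): the residuals sum to zero.
sum_resid = sum(resid)
# Property (2): the residuals are orthogonal to the regressor.
sum_x_resid = sum(xi * ri for xi, ri in zip(x, resid))
# Property (3): the point of means lies on the fitted line.
gap_at_means = y_bar - (intercept + slope * x_bar)
# Consequences: mean of fitted values equals mean of y, and the
# fitted values are uncorrelated with the residuals in the sample.
mean_fitted = sum(fitted) / n
sum_fitted_resid = sum((fi - y_bar) * ri for fi, ri in zip(fitted, resid))

print(sum_resid, sum_x_resid, gap_at_means, mean_fitted - y_bar, sum_fitted_resid)
```

All five printed quantities should be zero apart from rounding error, whatever sample is used.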
Precision or Standard Errors of Least Squares Estimates:
Thus far we know that the least squares estimates are functions of the SAMPLE data.
Our estimates will therefore change from sample to sample.
A proper measure of reliability and precision is therefore needed, and such precision/reliability is measured by the STANDARD ERROR.
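Define the total sum of squares (SST), the explained sum of squares (SSE), and the residual sum of squares (SSR) (also known as the sum of squared residuals), as follows:
SST = $\sum (Y_i - \bar{Y})^2$
SSE = $\sum (\hat{Y}_i - \bar{Y})^2$
SSR = $\sum \hat{u}_i^2$
SST is a measure of the total sample variation in the $Y_i$; that is, it measures how spread out the $Y_i$ are in the sample.
If we divide SST by n-1 we obtain the sample variance of y.
Similarly, SSE measures the sample variation in the $\hat{Y}_i$ (where we use the fact that $\bar{\hat{Y}} = \bar{Y}$), and SSR measures the sample variation in the $\hat{u}_i$.
The total variation in y can always be expressed as the sum of the explained variation SSE and the unexplained variation SSR. Thus,
SST = SSE + SSR.
PROOF:
$$\sum (Y_i - \bar{Y})^2 = \sum \left[ (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y}) \right]^2 = \sum \left[ \hat{u}_i + (\hat{Y}_i - \bar{Y}) \right]^2$$
$$= \sum \hat{u}_i^2 + 2 \sum \hat{u}_i (\hat{Y}_i - \bar{Y}) + \sum (\hat{Y}_i - \bar{Y})^2 = SSR + 2 \sum \hat{u}_i (\hat{Y}_i - \bar{Y}) + SSE$$
Since the sample covariance between the residuals and the fitted values is zero, $\sum \hat{u}_i (\hat{Y}_i - \bar{Y}) = 0$.
We have
SST = SSE + SSR.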
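The decomposition can also be confirmed numerically. The sketch below computes the three sums of squares and the cross term for a hypothetical sample of ten families (one per income group, first Y row of the income and consumption table); names are illustrative, and equality holds up to floating-point rounding.

```python
# Hypothetical sample: one family per income group.
x = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
y = [55, 65, 79, 80, 102, 110, 120, 135, 137, 150]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

sst = sum((yi - y_bar) ** 2 for yi in y)       # total variation
sse = sum((fi - y_bar) ** 2 for fi in fitted)  # explained variation
ssr = sum(ri ** 2 for ri in resid)             # unexplained variation

# Cross term from the proof; it vanishes because residuals and
# fitted values are uncorrelated in the sample.
cross_term = 2 * sum(ri * (fi - y_bar) for ri, fi in zip(resid, fitted))

print(f"SST = {sst:.4f}, SSE + SSR = {sse + ssr:.4f}")
```

Dividing `sst` by n - 1 would give the sample variance of y.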
Goodness of Fit:
So far, we have no way of measuring how well the explanatory or independent variable, x, explains the dependent variable, y.
It is often useful to compute a number that summarizes how well the OLS regression line fits the data.
Assuming that the total sum of squares, SST, is not equal to zero, which is true except in the very unlikely event that all the $Y_i$ equal the same value, we can divide both sides by SST to obtain:
$$1 = \frac{SSE}{SST} + \frac{SSR}{SST}$$
Alternatively:
$$\frac{SSE}{SST} = 1 - \frac{SSR}{SST}$$
The R-squared of the regression, sometimes called the coefficient of determination, is defined as
$$R^2 = \frac{SSE}{SST} \quad \text{or} \quad R^2 = 1 - \frac{SSR}{SST}$$
$R^2$ is the ratio of the explained variation to the total variation, and thus it is interpreted as the fraction of the sample variation in y that is explained by x.
$R^2$ is always between zero and one, since SSE can be no greater than SST.
When interpreting $R^2$, we usually multiply it by 100 to change it into a percent: $100 \cdot R^2$ is the percentage of the sample variation in y that is explained by x.
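As a closing illustration, the sketch below computes R-squared both ways for a hypothetical sample of ten families (one per income group, first Y row of the income and consumption table) and converts it to a percentage. The sample choice and variable names are assumptions made for illustration.

```python
# Hypothetical sample: one family per income group.
x = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
y = [55, 65, 79, 80, 102, 110, 120, 135, 137, 150]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((fi - y_bar) ** 2 for fi in fitted)
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

r_squared = sse / sst          # explained share of the variation
r_squared_alt = 1 - ssr / sst  # one minus the unexplained share

print(f"R-squared = {r_squared:.4f} ({100 * r_squared:.1f}% of the variation in y)")
```

The two formulas agree because SST = SSE + SSR, and the result always lies between zero and one.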