The document discusses simple linear regression and correlation. Simple linear regression predicts a dependent variable based on an independent variable. The linear regression equation is represented as y = β0 + β1x + ε, where β0 is the y-intercept, β1 is the slope, and ε is the error term. Multiple linear regression extends this to use two or more independent variables. Correlation is measured on a scale of -1 to 1 and indicates the strength and direction of the linear relationship between two variables. The coefficient of correlation r is calculated to quantify the correlation between variables.
To get a copy of the slides for free Email me at: japhethmuthama@gmail.com
You can also support my PhD studies by donating a 1 dollar to my PayPal.
PayPal ID is japhethmuthama@gmail.com
2. • Simple linear regression predicts the value of one variable (the dependent variable) from the value of another variable (the independent variable).
• Simple linear regression, as presented here, is used when one has two continuous variables: an independent variable and a dependent variable.
11/7/2023
Simple Linear Regression and
Correlations
2
4. • A simple regression model is a two-variable (bivariate) linear regression model because it relates the two variables x and y.
• Multiple linear regression (MLR) is used to predict the outcome of a variable based on the values of two or more other variables.
6. Example:
• Suppose the relationship between
expenditure (Y) and income (X) of
households is expressed as:
Y = 0.6X + 120
• Here, on the basis of income, we can predict expenditure. For an income level of Br 1,500, the estimated expenditure will be:
Expenditure = 0.6(1500) + 120 = Br 1,020
• This functional relationship is
deterministic or exact, that is, given
income we can determine the exact
expenditure of a household.
7. • But in reality this rarely happens:
different households with the same
income are not expected to spend equal
amounts due to habit, preference,
geographical and time variation, etc.
• Thus, we should express the regression
model as:
$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$
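The difference between the deterministic relationship and the model with an error term can be sketched in a few lines of Python. The intercept and slope come from the expenditure example above; the normal distribution and the standard deviation of 50 for the error term are assumptions made purely for illustration, as are the function names.

```python
import random

BETA0, BETA1 = 120.0, 0.6  # intercept and slope from the expenditure example


def systematic_part(income):
    """Deterministic part of the model: beta0 + beta1 * x."""
    return BETA0 + BETA1 * income


def simulate_expenditure(income, rng, noise_sd=50.0):
    """Add a random error term to the deterministic part.

    ASSUMPTION: the error is drawn from a normal distribution with mean 0
    and standard deviation noise_sd; the slides do not specify this.
    """
    return systematic_part(income) + rng.gauss(0.0, noise_sd)


rng = random.Random(42)  # fixed seed so the run is repeatable
households = [simulate_expenditure(1500, rng) for _ in range(5)]

print("Deterministic prediction:", systematic_part(1500))
print("Simulated households:", [round(v, 1) for v in households])
```

Every simulated household has the same income of Br 1,500, yet each spends a different amount, which is the behaviour the error term is meant to capture.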
8. Generally, the reasons for including the error term are:
i. Omitted variables: a model is a simplification of reality. It is not always possible to include all relevant variables in a functional form. Excluding variables from the model introduces an error.
ii. Measurement error: inaccuracy in the collection and measurement of sample data.
iii. Sampling error: the sample used for estimation may not be fully representative of the population.
9. Stochastic and Non-stochastic Relationships
• If the relationship between x and y is such that for a particular value of x there is only one corresponding value of y, it is known as a deterministic (non-stochastic) relationship. The other factors in $\epsilon_i$ are held fixed, so that the change in $\epsilon_i$ is zero:
$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}$
• Taking into account the sources of error, with a stochastic term $\epsilon_i$ (or $u_i$), the function becomes:
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \epsilon_i$
10. A simple regression analysis effectively treats all factors affecting y other than x as being unobserved.
$y = \beta_0 + \beta_1 x$
Let's start by noting the following:
$\bar{x} = \frac{\sum x_i}{n}$, which gives $\sum x_i = n\bar{x}$; similarly, $\sum y_i = n\bar{y}$.
Also,
$\sum (x_i - \bar{x})^2 = \sum (x_i^2 - 2x_i\bar{x} + \bar{x}^2)$
$= \sum x_i^2 - 2\bar{x}\sum x_i + \sum \bar{x}^2$
$= \sum x_i^2 - 2\bar{x}\,n\bar{x} + n\bar{x}^2$
$= \sum x_i^2 - n\bar{x}^2$
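11. • Now we can take the first derivative of the sum of squared errors with respect to $\beta_0$.
The model is
$y_i = \beta_0 + \beta_1 x_i + \mu_i$
and the fitted value is
$\hat{y}_i = \beta_0 + \beta_1 x_i$
The sum of squares of the errors (SSE) is:
$SSE = \sum \varepsilon_i^2 = \sum (y_i - \hat{y}_i)^2$
where $\varepsilon_i = y_i - \hat{y}_i$ is the residual. The OLS estimates are the values of $\beta_0$ and $\beta_1$ that minimize the SSE.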
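12. $-2\sum (y_i - \beta_0 - \beta_1 x_i) = 0$
$\sum (y_i - \beta_0 - \beta_1 x_i) = 0$
$\sum y_i - \sum \beta_0 - \sum \beta_1 x_i = 0$
$\sum y_i - n\beta_0 - \beta_1 \sum x_i = 0$
$n\bar{y} - n\beta_0 - \beta_1 n\bar{x} = 0$
$\bar{y} - \beta_0 - \beta_1\bar{x} = 0$
$\beta_0 = \bar{y} - \beta_1\bar{x}$ ……………………… I
Note: This implies that the OLS line passes through the means $\bar{x}$ and $\bar{y}$.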
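The step from equation I to the expression for $\beta_1$ relies on the derivative of the SSE with respect to $\beta_1$. A sketch of that step, assuming the same SSE and notation as above, could read:

```latex
\begin{align*}
\frac{\partial SSE}{\partial \beta_1}
  &= -2\sum x_i\,(y_i - \beta_0 - \beta_1 x_i) = 0 \\
\sum x_i y_i - \beta_0 \sum x_i - \beta_1 \sum x_i^2 &= 0 \\
\sum x_i y_i - (\bar{y} - \beta_1\bar{x})\,n\bar{x} - \beta_1 \sum x_i^2 &= 0
  \qquad \text{(substituting I and } \textstyle\sum x_i = n\bar{x}\text{)} \\
\sum x_i y_i - n\bar{x}\bar{y} &= \beta_1 \sum x_i^2 - \beta_1 n\bar{x}^2
\end{align*}
```

The last line is the relation that the identity $\sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2$ then simplifies.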
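14. But we know that $\sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2$, and also $n\bar{x}^2 = \sum \bar{x}^2$.
$\sum x_i y_i - n\bar{x}\bar{y} = \beta_1 \sum x_i^2 - \beta_1 \sum \bar{x}^2$
$\sum x_i y_i - n\bar{x}\bar{y} = \beta_1 \sum (x_i - \bar{x})^2$
Also $\sum x_i y_i - n\bar{x}\bar{y} = \sum (x_i - \bar{x})(y_i - \bar{y})$
Hence $\sum (x_i - \bar{x})(y_i - \bar{y}) = \beta_1 \sum (x_i - \bar{x})^2$
$\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ ……………………… II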
15. Example: For the data given below, develop the linear regression line.
X   2   3   4   5   6   7
Y   7   2   8   14  12  10
$\sum x_i = 27$, $\sum y_i = 53$
$\bar{x} = \frac{\sum x_i}{n} = \frac{27}{6}$, $\bar{y} = \frac{\sum y_i}{n} = \frac{53}{6}$
$\sum (x_i - \bar{x})^2 = 17.5$
$\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - n\bar{x}\bar{y} = 25.5$
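16. Hence
$\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{25.5}{17.5} \approx 1.46$
$\beta_0 = \bar{y} - \beta_1\bar{x} = \frac{53}{6} - 1.46\left(\frac{27}{6}\right) \approx 2.3$
The regression line will be
$y = 2.3 + 1.46x$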
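The hand calculation above can be checked with a short Python sketch that applies equations I and II directly. The function name `fit_simple_ols` is illustrative and not taken from the slides.

```python
def fit_simple_ols(xs, ys):
    """Return (beta0, beta1) for the least-squares line y = beta0 + beta1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of equation II.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = s_xy / s_xx
    # Equation I: the fitted line passes through (x_bar, y_bar).
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


xs = [2, 3, 4, 5, 6, 7]
ys = [7, 2, 8, 14, 12, 10]

beta0, beta1 = fit_simple_ols(xs, ys)
print(f"y = {beta0:.4f} + {beta1:.4f}x")
```

Without intermediate rounding, the slope is about 1.4571 and the intercept about 2.2762; the values 1.46 and 2.3 in the hand calculation are rounded versions of these.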
[Figure: scatter plot of Y against X with the fitted line y = 1.4571x + 2.2762]
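17. • The coefficient of x ($\beta_1$) can be expressed in other terms.
• Multiplying both the numerator and the denominator of $\beta_1$ by $\frac{1}{n}$ gives
$\beta_1 = \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{n}\sum (x_i - \bar{x})^2}$
$\beta_1 = \frac{Cov(x, y)}{Var(x)}$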
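The covariance-over-variance form can be illustrated with a small Python sketch. The helper names and the `ddof` parameter are assumptions introduced here; `ddof=0` corresponds to the $\frac{1}{n}$ divisor used above, and `ddof=1` to the $\frac{1}{n-1}$ divisor often used for sample statistics.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys, ddof=0):
    """Covariance with divisor n - ddof (ddof=0 gives the 1/n form)."""
    x_bar, y_bar = mean(xs), mean(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (len(xs) - ddof)


def variance(xs, ddof=0):
    """Variance is the covariance of a variable with itself."""
    return covariance(xs, xs, ddof)


xs = [2, 3, 4, 5, 6, 7]
ys = [7, 2, 8, 14, 12, 10]

slope_population = covariance(xs, ys) / variance(xs)
slope_sample = covariance(xs, ys, ddof=1) / variance(xs, ddof=1)
print(slope_population, slope_sample)
```

Because the same divisor appears in the numerator and the denominator, it cancels, so both versions give the same slope.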
18. COEFFICIENT OF CORRELATION ($r$)
• It measures the strength and direction of the linear relationship between two variables.
• It ranges from -1 to 1.
• A value of 1 indicates that the two variables move in unison: they rise and fall together and are perfectly positively correlated.
• A value of -1 means that the two variables move in exactly opposite directions: they are perfectly negatively correlated.
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$
or
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
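Both forms of the correlation formula can be sketched in Python to show that they agree. The function names are illustrative, and the six-point data set from the earlier regression example is reused here only as sample input.

```python
from math import sqrt


def pearson_r_raw_sums(xs, ys):
    """Correlation coefficient from raw sums (first form of the formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def pearson_r_deviations(xs, ys):
    """Correlation coefficient from deviations about the means (second form)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


xs = [2, 3, 4, 5, 6, 7]
ys = [7, 2, 8, 14, 12, 10]

r_raw = pearson_r_raw_sums(xs, ys)
r_dev = pearson_r_deviations(xs, ys)
print(f"r (raw sums) = {r_raw:.4f}, r (deviations) = {r_dev:.4f}")
```

The raw-sums form is convenient when only totals such as $\sum x$, $\sum y$ and $\sum xy$ are available, while the deviation form mirrors the covariance and variance definitions.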
19. • Example: It looks as if there exists a positive linear correlation between the average interest rate and yearly investment. This means that when the average interest rate increases, yearly investment tends to increase as well.
20. Example: It looks as if there exists a positive linear correlation between average interest rate and yearly investment.
[Figure: scatter plot of Average Investment (Y) against Average Interest (X)]
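23. The equation of the straight line is
$y = \beta_0 + \beta_1 x$
$\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2}$
$\beta_1 = \frac{10(222{,}569) - (149.1)(14{,}730)}{10(2{,}229.03) - (149.1)^2}$
$\beta_1 = \frac{29{,}447}{59.49}$
$\beta_1 = 494.99$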
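24. And, writing $a = \beta_0$ and $b = \beta_1$,
$a = \frac{\sum_{i=1}^{10} y_i}{n} - b\,\frac{\sum_{i=1}^{10} x_i}{n} = \frac{14{,}730}{10} - \frac{494.99\,(149.1)}{10} = -5907.30$
Thus,
$y = -5907.30 + 494.99x$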
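When only the column totals are available, the slope and intercept can be computed from them directly. The sketch below uses the totals quoted in the example. The value used for $\sum xy$ (222,569) is an assumption: it is the value consistent with the reported slope of 494.99 and the other sums. The function names are illustrative.

```python
# Summary totals for the n = 10 observations in the example.
n = 10
sum_x = 149.1      # sum of average interest rates
sum_y = 14_730     # sum of yearly investments
sum_x2 = 2_229.03  # sum of squared interest rates
sum_xy = 222_569   # ASSUMPTION: inferred so that the slope is close to 494.99


def slope_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Slope of the least-squares line from raw totals."""
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


def intercept_from_sums(n, sum_x, sum_y, slope):
    """Intercept: mean of y minus slope times mean of x."""
    return sum_y / n - slope * sum_x / n


slope = slope_from_sums(n, sum_x, sum_y, sum_x2, sum_xy)
intercept = intercept_from_sums(n, sum_x, sum_y, slope)
print(f"y = {intercept:.2f} + {slope:.2f}x")
```

The results should be close to the reported 494.99 and -5907.30; small differences in the last digits come from rounding the slope before computing the intercept.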
[Figure: scatter plot of Average Investment (Y) against Average Interest (X) with the fitted line y = 494.99x - 5907.3]
25. COEFFICIENT OF DETERMINATION ($r^2$)
• The coefficient of determination measures how much of the variability in one variable is explained by its linear relationship with another variable.
• It can be thought of as a percentage.
• Values of $r^2$ lie between 0 and 1.
• In the example above, the coefficient of determination is $r^2 = 0.8989^2 = 0.8080$. This means that almost 81% of the variation in yearly investment can be explained by the average interest rate.
• An $r^2$ closer to 1 indicates a better goodness of fit: the observations lie close to the regression line.
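The "share of variation explained" reading of $r^2$ can be sketched in Python by comparing the residual variation with the total variation. The function names are illustrative, and the six-point data set from the earlier regression example is reused as sample input because the interest-rate data table is not reproduced in this text.

```python
from math import sqrt


def r_squared(xs, ys):
    """Coefficient of determination computed as 1 - SSE / SST."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    # SSE: variation left unexplained by the fitted line.
    sse = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    # SST: total variation of y about its mean.
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


xs = [2, 3, 4, 5, 6, 7]
ys = [7, 2, 8, 14, 12, 10]

r2 = r_squared(xs, ys)
r = pearson_r(xs, ys)
print(f"r^2 = {r2:.4f}, r squared directly = {r * r:.4f}")
```

For simple linear regression the two quantities coincide, which is why the coefficient of determination can be obtained by squaring the correlation coefficient.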
26. Example: A study was undertaken at eight garages to determine how the resale value of a car is affected by its age. The following data were obtained:
Garage   Age of car (in years)   Resale value (in Birr)
1        1                       41,250
2        6                       10,250
3        4                       24,310
4        2                       38,720
5        5                       8,740
6        4                       26,110
7        1                       38,650
8        2                       36,200
27. The garage manager suspects a linear relationship between the two variables. Fit a line of the form y = a + bx to the data.
The equation for the regression line is
$y = 48{,}644.17 - 6{,}596.93x$
The correlation coefficient is
$r = -0.9601$
$r^2 \approx 0.9218$
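The garage example can be reproduced with a short Python sketch that fits the line and computes the correlation from the eight observations in the table. The variable names are illustrative.

```python
from math import sqrt

# Data from the eight garages: age in years, resale value in Birr.
ages = [1, 6, 4, 2, 5, 4, 1, 2]
values = [41250, 10250, 24310, 38720, 8740, 26110, 38650, 36200]

n = len(ages)
age_mean = sum(ages) / n
value_mean = sum(values) / n

# Sums of squares and cross-products about the means.
s_xy = sum((x - age_mean) * (y - value_mean) for x, y in zip(ages, values))
s_xx = sum((x - age_mean) ** 2 for x in ages)
s_yy = sum((y - value_mean) ** 2 for y in values)

b = s_xy / s_xx                 # slope: change in resale value per year of age
a = value_mean - b * age_mean   # intercept: fitted value at age 0
r = s_xy / sqrt(s_xx * s_yy)    # correlation coefficient

print(f"y = {a:.2f} + ({b:.2f})x")
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

The negative slope says that, within the range of ages observed, each additional year of age is associated with a drop of roughly Br 6,600 in resale value.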