3-Dimensional Scatterplots
Multiple Linear Regressions on
Overall Well-being
Hubert Lo, Crystal Macias, and Gregory Knothe
California Polytechnic State University
1 Introduction
Happiness is easy to recognize but difficult to define. One way to define it is as a sense of
one’s well-being. However one defines happiness, most people try to attain it. Happiness is
important because it moves people forward in a positive way (i.e., enjoying life, having good
thoughts, and making good choices). In reality, people do not always have complete control over
their happiness. Research suggests that genetics, environmental stress and other environmental
factors, and physical health all play a role in one’s overall well-being. The National Health
and Nutrition Examination Survey (NHANES) is a program of studies assessing the health and
nutritional status of adults and children in the US. NHANES has found that employed women had a
higher sense of well-being than their unemployed counterparts. Health and Medicine Week
published an article indicating that going to the dentist is linked to one’s overall well-being.
Findings like these encourage people to manage the factors they can control.
This leads to the question, “What is the happiest city in the nation?” The Gallup-Healthways
Well-Being Index collects data on factors that may define “happiness” and uses them to calculate
an overall well-being score. Our project investigates numerical and categorical predictors and
how they affect a city’s overall well-being score, as well as how numerical and categorical
predictors interact with one another in affecting that score.
2 Methods
Our method can be broken down into three key steps.
2.1 Gathering the Data
We first had to gather data, preferably interesting data that would be meaningful to analyze. We
looked at several sites, including “http://www.quandl.com/”, “google public data”, “Federal
Reserve Bank of St. Louis - economic data”, and others. However, most of the data we found
did not include a categorical variable, which we needed for this project.
Since we live in San Luis Obispo, which many remember being named the “Happiest Place in
America” a few years ago, we took our research in this direction. Ultimately, we found that
Gallup-Healthways produces an annual ranking of cities by overall well-being based on a
detailed poll in every major city. Our group found this study very interesting, and it fit our
criteria because it contained both categorical and quantitative variables. We used the
Gallup-Healthways Well-Being Index data from 2012 to 2013, which ranks U.S. cities by an
overall well-being score calculated from various factors including: percent Obese, percent
Exercise frequently, percent Eat produce frequently, percent Smoke, percent With daily Stress,
and percent Uninsured. Since the dataset could be sorted by size and state, we decided to
create two new variables. The first new variable, population size, groups cities into three
categories coded 0, 1, and 2: 0 is a population of less than 300,000, 1 is a population between
300,000 and 1,000,000, and 2 is a population of 1,000,000 or more. The second new variable is
region; we sorted the states into four regions coded 0, 1, 2, and 3: 0 is the Western region, 1
is the Midwest region, 2 is the South region, and 3 is the Northeast region. Figure 1 displays
all the variables and metric descriptions. The source link for the 2012-2013 Gallup-Healthways
Well-Being Index data can be found in the bibliography.
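The recoding described above can be sketched in R roughly as follows. This is a minimal illustration on made-up rows, not the code we ran: the column names `population` and `state`, the example cities, and the handling of the exact 300,000 and 1,000,000 boundaries are all assumptions.

```r
# Toy rows only; `population` and `state` are hypothetical column names,
# not columns from the Gallup-Healthways file.
cities <- data.frame(
  population = c(250000, 300000, 750000, 1000000, 4000000),
  state      = c("CA", "OH", "TX", "NY", "WA"),
  stringsAsFactors = FALSE
)

# Population size: 0 = under 300,000; 1 = 300,000 up to 1,000,000; 2 = 1,000,000 or more.
# Which side the boundary values fall on is an assumption made for this sketch.
cities$Population.size <- as.integer(cut(
  cities$population,
  breaks = c(-Inf, 300000, 1000000, Inf),
  right = FALSE
)) - 1L

# Region: 0 = West, 1 = Midwest, 2 = South, 3 = Northeast.
# Only the states used in this toy example are mapped.
region_code <- c(CA = 0L, WA = 0L, OH = 1L, TX = 2L, NY = 3L)
cities$Region <- unname(region_code[cities$state])

print(cities)
```

Keeping the codes numeric matches how the variables are stored in the data file; they can be converted with factor() when a model needs them as categories.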
Figure 1: Data Variables
2.2 Choosing the Right Variables
In order to make meaningful regression models between the different variables, we decided it
was important to first create a matrix plot as shown in Figure 2. The matrix plot displayed all the
variables, making it easier to analyze which variables have a strong relationship with Overall
Well-being.
The matrix plot shows the correlation of each variable with Overall Well-being. Observing the
top row, we see that x5: Percent Smoke has a strong relationship with Overall Well-being, with
a correlation of -0.718. x2: Percent Obese and x3: Percent Exercise also have moderately
strong relationships with Overall Well-being, with correlations of -0.661 and -0.532
respectively. Moreover, x4: Percent Eat Produce Frequently and x9: Region have weaker
relationships, with correlations of 0.267 and -0.343 respectively. Therefore, we decided to focus
on Percent Obese, Percent Exercise, Percent Eat Produce Frequently, Percent Smoke, and
Region for creating the regression models.
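The top row of the matrix plot can also be read off numerically with cor(). The sketch below uses a synthetic data frame (the values and the data frame name `toy` are made up) purely to show the pattern of extracting each predictor's correlation with the response and sorting by strength.

```r
# Synthetic stand-in for the well-being data; values are invented for illustration.
set.seed(1)
n <- 50
toy <- data.frame(percent.Smoke = runif(n, 6, 35),
                  percent.Eat.Produce.Frequently = runif(n, 47, 65))
toy$Overall.Wellbeing <- 70 - 0.3 * toy$percent.Smoke +
  0.1 * toy$percent.Eat.Produce.Frequently + rnorm(n)

# Correlation of every predictor with the response, strongest first.
r <- cor(toy)[, "Overall.Wellbeing"]
r <- r[names(r) != "Overall.Wellbeing"]
print(round(r[order(-abs(r))], 3))
```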
Figure 2: Matrix Plot of All Variables
(Matrix plot is created with GGally package and
ggpairs function)
Legend
x1: Overall Well-Being
x2: Percent Obese
x3: Percent Exercise
x4: Percent Eat Produce Frequently
x5: Percent Smoke
x6: Percent with Daily Stress
x7: Percent Uninsured
x8: Population Size
x9: Region
2.3 Creating Regression Models & 3D-Scatter Plots
Following the project criteria, we were asked to produce four types of regression models: 1) two
quantitative predictors; 2) one quantitative predictor and one categorical predictor; 3) one
quantitative predictor, one categorical predictor, and an interaction term; 4) two quantitative
predictors and an interaction term. To create the regression models and 3D plots of the models,
we used different R functions and packages for each of the regression models.
For regression 1, Overall Well-being regressed on percent Smoke and percent Eat Produce
Frequently, we used the lm function to run the regression and the summary function to
produce the output. We then installed the “scatterplot3d” package, which creates a 3D scatterplot
of the data. We passed percent Smoke as the x variable, percent Eat Produce Frequently as the y
variable, and Overall Well-being as the z variable to the scatterplot3d function, along with
color and label options. Lastly, we saved the regression model into a variable and passed it to
the plane3d function, which is returned by scatterplot3d, to draw the regression plane on the
scatterplot.
For regression 2, Overall Well-being was regressed on Region (categorical) and percent Smoke.
We used the factor function to convert the Region variable into a categorical variable, then
used the lm function to fit the regression. For this model we again used the scatterplot3d
package, passing Region as the x variable, percent Smoke as the y variable, and Overall
Well-being as the z variable to the scatterplot3d function, along with color and label options.
Moreover, to create a regression line for each region, we subsetted the data into the four
regions and ran four more regressions, one per region, of Overall Well-being on percent Smoke.
We then used the four regression models to calculate predicted values for each region, and used
the points3d function to plot the predicted lines.
For regression 3, Overall Well-being was regressed on Region (categorical), percent Eat Produce
Frequently, and the interaction term of the two. The Region variable had already been converted
to a categorical variable for model 2. Here we used a different package called “rockchalk”,
which has many functions that are useful for presenting regression models. Its mcGraph3
function creates a 3-dimensional scatterplot with a plane and supports an “interaction” option,
something the scatterplot3d function was unable to do directly. We passed Region as the x1
variable, percent Eat Produce Frequently as the x2 variable, Overall Well-being as the y
variable, interaction = TRUE, and other options to the mcGraph3 function, including the theta
and phi options, which rotate the displayed 3D plot.
For regression 4, Overall Well-being was regressed on percent Smoke, percent Eat Produce
Frequently, and the interaction term of percent Smoke and percent Eat Produce Frequently. We used
the rockchalk package again and, as in regression 3, the mcGraph3 function. We passed percent
Smoke as the x1 variable, percent Eat Produce Frequently as the x2 variable, Overall Well-being
as the y variable, interaction = TRUE, and other options including the theta option. For more
information on the rockchalk and scatterplot3d packages, please see the bibliography. Figure 3
displays the four regression models and the main functions and packages used in each.
Figure 3: Regression Models
1) Regression Model 1: Two quantitative predictors
Overall well being ~ percent.Smoke + percent.Eat.Produce.Frequently
-- Main functions & packages used: lm function, summary function, scatterplot3d package.
2) Regression Model 2: One quantitative predictor & one categorical predictor
Overall well being ~ Region + percent.Smoke
-- Main functions & packages used: subset function, lm function, summary function, points3d
function, scatterplot3d package.
3) Regression Model 3: One quantitative predictor, one categorical predictor, and interaction term
Overall well being ~ Region + percent Eat Produce Frequently + percent Eat Produce Frequently * Region
-- Main functions & packages used: lm function, summary function, mcGraph3 function, rockchalk
package.
4) Regression Model 4: Two quantitative predictors and interaction term
Overall well being ~ percent Smoke + percent Eat Produce Frequently + percent Eat Produce Frequently * percent Smoke
-- Main functions & packages used: lm function, summary function, mcGraph3 function, rockchalk
package.
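A note on the interaction formulas in Models 3 and 4: in R's formula syntax, `a * b` already expands to `a + b + a:b`, so writing the main effects and then the `*` term fits the same model as the `*` term alone. The sketch below checks this on synthetic data; the variable names `x1`, `x2`, and `y` are placeholders, not columns from our data set.

```r
# Synthetic data only; x1, x2 and y are placeholder names.
set.seed(2)
toy <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
toy$y <- 1 + 2 * toy$x1 - toy$x2 + 0.5 * toy$x1 * toy$x2 + rnorm(30)

long_form  <- lm(y ~ x1 + x2 + x1 * x2, data = toy)  # style used in Figure 3
short_form <- lm(y ~ x1 * x2, data = toy)            # shorthand
explicit   <- lm(y ~ x1 + x2 + x1:x2, data = toy)    # main effects plus x1:x2

# All three formulas produce the same four coefficients.
print(names(coef(long_form)))
print(all.equal(coef(long_form), coef(short_form)))
```

In the summary() output, the interaction row is labeled with a colon (for example `x1:x2`), regardless of which form was typed.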
3 Results
3.1 Regression Model 1: Overall Well-Being ~ %Smoke + %Eat Produce Frequently (%EPF)
Figure 5: 3D scatter plot of Model 1 with regression plane. (Created in scatterplot3d)
Figure 4: R regression output for Model 1.
The model includes two quantitative predictor variables, %Smoke and %Eat Produce Frequently,
with our response variable of Overall Well-Being. The R regression output for model 1 can be
seen in Figure 4. Both predictor variables prove to be statistically significant in the presence
of one another, but the coefficients show two different relationships with respect to our
response variable.
%Smoke shows a negative relationship with Overall Well-Being: each one-percentage-point increase
in %Smoke is associated with an expected decrease of 0.33908 in Overall Well-being, in the
presence of %EPF.
On the other hand, %EPF has a positive relationship with Overall Well-Being: each
one-percentage-point increase in %EPF is associated with an expected increase of 0.10848 in
Overall Well-Being, in the presence of %Smoke.
These relationships can be seen visually in Figure 5. Looking along the x-axis, labeled %Smoke,
you can see the strong negative slope of the plane. Looking along the y-axis, labeled %Eat
Produce Frequently, you can see the moderate positive slope of the plane.
3.2 Regression Model 2: Overall Well-Being ~ Region + %Smoke
The model consists of our response variable, Overall Well-Being, and two predictor variables,
Region and %Smoke, which are categorical and quantitative respectively. %Smoke proved to be
significant in the presence of Region. Only one Region category produced a significant p-value,
but this is acceptable for what we wish to do with the data.
Figure 6: R regression output for Model 2.
Figure 7: 3D scatter plot of Model 2 with each Region’s regression line. (Created in
scatterplot3d)
We can see in Figure 6 that %Smoke has a negative coefficient, which is expected. This can be
interpreted as follows: for each one-percentage-point increase in %Smoke, there is an expected
decrease of 0.365404 in Overall Well-Being, in the presence of the factor Region.
If we were to calculate the regression equations from these coefficients, the result would be the
same slope for all regions with slightly different intercepts. Instead, for Figure 7 we used a
different method that gives each region its own slope, to better display each Region’s
relationship between %Smoke and Overall Well-Being.
To put it simply, we subsetted our data by region, then ran a regression of Overall Well-Being ~
%Smoke on each of the new data sets. We used those coefficients to create a separate regression
equation for each region, then plotted each equation on the 3D scatterplot as a red regression
line to best depict the effect of %Smoke on Overall Well-Being within each Region.
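For readers comparing the two approaches: fitting a separate regression on each regional subset yields the same lines as a single model that gives every region its own intercept and its own %Smoke slope, whereas the additive Model 2 forces one shared slope. The sketch below illustrates this on synthetic data (all values are made up; only the column names mirror our data set).

```r
# Synthetic data; the true slopes below are invented for illustration.
set.seed(3)
toy <- data.frame(Region = rep(0:3, each = 15),
                  percent.Smoke = runif(60, 6, 35))
true_slope <- c(-0.2, -0.3, -0.4, -0.5)
toy$Overall.Wellbeing <- 72 + true_slope[toy$Region + 1] * toy$percent.Smoke + rnorm(60)
toy$fregion <- factor(toy$Region)

# Additive model: one shared %Smoke slope, region-specific intercept shifts.
additive <- lm(Overall.Wellbeing ~ fregion + percent.Smoke, data = toy)

# Subsetting approach: one regression per region (row 1 = intercepts, row 2 = slopes).
by_region <- sapply(0:3, function(r) {
  coef(lm(Overall.Wellbeing ~ percent.Smoke, data = subset(toy, Region == r)))
})

# Single model with a separate intercept and a separate slope for each region.
separate <- lm(Overall.Wellbeing ~ 0 + fregion + fregion:percent.Smoke, data = toy)

print(round(by_region, 3))
print(round(coef(separate), 3))
```

The single-model form has the added benefit of one summary() table, which makes it easier to compare the regional slopes side by side.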
Figure 9: Linear Regression Model of the Region (categorical) predictor and %Eat Produce
Frequently (numerical) predictor on the Overall Well-being Score (numerical response variable)
without an interaction.
Figure 8: 3D graph of Regression
Model 3 with slopes of %Eat
Produce Frequently with Overall
Well-being Score for each Region.
*Blue is Western Region.
*Green is Midwest Region.
*Black is South Region.
*Purple is Northeast Region.
Note: Without interaction
3.3 Regression Model 3: Overall Well-Being ~ Region + %Eat Produce Frequently + Interaction Term (R * %EPF)
This model regresses our response variable (Overall Well-Being) on one numeric and one
categorical predictor, with an interaction term added to see how the two predictor variables
modify one another’s influence on the response variable. We will look at the model without the
interaction term first.
The Western Region is our reference level, so the indicator variables represent the effect of
each other level of the categorical variable on the response relative to the West. The model
has an estimated intercept of 58.17639 (the Western Region baseline); the Midwest Region has an
estimated coefficient of -0.45399, the South Region -1.30124, and the Northeast Region
-1.57456, and percent Eat Produce Frequently has an estimated slope of 0.15593 when predicting
Overall Well-Being. The regional differences can be seen in Figure 8.
Looking at the p-values in Figure 9, only the Midwest Region is not significant at the alpha =
0.05 level. Thus the South and Northeast Regions (each compared to the Western Region) and
percent Eat Produce Frequently are statistically significant at the 0.05 level.
The positive coefficient indicates that for every additional percentage point of %Eat Produce
Frequently, you can expect Overall Well-being to increase by an average of 0.16, holding Region
fixed.
The negative coefficients indicate that you can expect Overall Well-being to be 1.30 lower in
the South and 1.57 lower in the Northeast compared to the Western Region, at the same %Eat
Produce Frequently.
Figure 10: Linear Regression Model of the Region (categorical) predictor and %Eat Produce
Frequently (numerical) predictor on the Overall Well-being Score (numerical response variable)
with an interaction.
Figure 11: 3D graph of Regression Model 3
with slopes of %Eat Produce Frequently with
Overall Well-being Score for each Region with
interaction plane.
Now introducing the interaction term, the slope coefficients for the two variables become
functions of one another. The slopes of the three other regions change with the interaction of
%Eat Produce Frequently, compared to the West region as the “baseline”.
For simplicity, let’s refer to %Eat Produce Frequently as X in the following equations.
From Figure 10:
Regression line for the Western Region: R0 = 57.02 + 0.175*X.
Regression line for the Midwest Region: R1 = 66.1006 + 0.00668*X.
Regression line for the South Region: R2 = 38.18 + 0.473*X.
Regression line for the Northeast Region: R3 = 62.210 + 0.0577*X.
Figure 10 shows that only the South Region, percent Eat Produce Frequently, and the interaction
of the South Region with percent Eat Produce Frequently are statistically significant at alpha
= 0.10. The interaction term causes the regression surface to curve slightly, because the slope
for each level of the categorical variable changes with the value of the numeric variable, as
shown in Figure 11.
To interpret the statistically significant predictor variables, consider R2 = 38.18 + 0.473*X.
A value X of percent Eat Produce Frequently (ranging from 47 to 65 percent) multiplied by 0.473,
plus 38.18, gives the predicted Overall Well-being Score for the South Region. For example, for
a percent Eat Produce Frequently value of 56, 38.18 + (0.473)*(56) gives a predicted Overall
Well-being Score of 64.668 for the South Region. Continuing the example, as %Eat Produce
Frequently increases, the predicted Overall Well-being Score increases. This implies that
eating five or more servings of fruits and vegetables four or more days per week has a positive
effect on the Overall Well-being score.
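The four regional equations reported in the text can be evaluated together with a few lines of R. This is only a convenience sketch: the intercepts and slopes are copied from the equations in the text, and the names `region_lines` and `predict_score` are ours for illustration.

```r
# Intercepts and slopes as reported in the text for the four regional equations.
region_lines <- data.frame(
  region    = c("West", "Midwest", "South", "Northeast"),
  intercept = c(57.02, 66.1006, 38.18, 62.210),
  slope     = c(0.175, 0.00668, 0.473, 0.0577)
)

# Predicted Overall Well-being score in each region at a given %Eat Produce Frequently.
predict_score <- function(x) {
  setNames(region_lines$intercept + region_lines$slope * x, region_lines$region)
}

print(predict_score(56))          # the South entry is 38.18 + 0.473 * 56 = 64.668
print(predict_score(c(47, 65)[1]))  # lower end of the observed range
```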
3.4 Regression Model 4: Overall Well-Being ~ %Smoke + %Eat Produce Frequently + Interaction Term (%Smoke * %EPF)
Figure 12 displays the model of Overall Well-Being regressed on two quantitative variables, %
Smoke and % Eat Produce Frequently, with an interaction term added to the model, allowing us to
see how the two predictor variables modify one another’s influence on the response variable.
Without an interaction term, % Smoke has a slope of -0.33908 and % Eat Produce Frequently has a
slope of 0.10845 in the presence of one another when predicting Overall Well-Being (more
information about these relationships can be found in Model 1). These individual trends still
exist after adding the interaction term, but become intertwined with one another through it.
The interaction term proved to be significant at an alpha level of 0.1, and the model captured
53.9% of the variation in Overall Well-Being, as seen in the adjusted R-squared value.
Figure 12: R regression output for
Regression Model 4.
Upon introducing the interaction term, the slope coefficients for the two variables become
functions of one another. The slope of %Smoke becomes: 0.6066 - 0.016449*%EPF, and the
slope of %Eat Produce Frequently becomes: 0.444170 - 0.016449*%Smoke. This interaction causes
the regression plane to curve, because the slope for each predictor variable changes with the
value of the other predictor variable, as seen in Figure 13. For example, the slope of %Eat
Produce Frequently is 0.4914 when %Smoke is equal to 7, while the slope changes to 0.0424 when
%Smoke is equal to 34.3.
A simple way to interpret this is to once again look at each slope as a function of the other
variable. Continuing the previous example, as a city’s smoking rate increases, the positive
effect of eating produce diminishes. It is also worth pointing out that if the trend continues,
say for a city with a smoking rate above 40%, the slope of %Eat Produce Frequently would
actually become negative, so the negative effects of smoking would outweigh the positive
effects of eating well.
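The idea of a slope that depends on the other predictor can be sketched generically from any fitted interaction model. The example below uses synthetic data with invented coefficients (it does not reproduce the Figure 12 output); `epf_slope` and `crossover` are hypothetical helper names.

```r
# Synthetic data; the generating coefficients are invented, not our estimates.
set.seed(4)
toy <- data.frame(percent.Smoke = runif(100, 6, 35),
                  percent.Eat.Produce.Frequently = runif(100, 47, 65))
toy$Overall.Wellbeing <- 50 + 0.5 * toy$percent.Smoke +
  0.45 * toy$percent.Eat.Produce.Frequently -
  0.016 * toy$percent.Smoke * toy$percent.Eat.Produce.Frequently +
  rnorm(100, sd = 0.5)

fit <- lm(Overall.Wellbeing ~ percent.Smoke * percent.Eat.Produce.Frequently, data = toy)
b <- coef(fit)

# Slope of %EPF at a given smoking rate: b_EPF + b_interaction * smoke.
epf_slope <- function(smoke) {
  unname(b["percent.Eat.Produce.Frequently"] +
         b["percent.Smoke:percent.Eat.Produce.Frequently"] * smoke)
}

# Smoking rate at which the %EPF slope changes sign.
crossover <- unname(-b["percent.Eat.Produce.Frequently"] /
                    b["percent.Smoke:percent.Eat.Produce.Frequently"])

print(epf_slope(c(10, 20, 30)))
print(crossover)
```

The same pattern, with the roles of the two predictors swapped, gives the slope of %Smoke at a given %EPF.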
Figure 13: Two 3D graphs of Regression Model 4 with
an interaction plane.
4 Discussion
4.1 Findings
Based on the regression models and 3-dimensional scatterplots that we created, our key findings
include:
I) Positive association between eating produce frequently and overall well-being (Models 1, 3, & 4)
II) Negative association between smoking and overall well-being (Models 1, 2, & 4)
III) The effects of eating produce frequently and of smoking differ depending on region (Models 2 & 3).
IV) A significant interaction between the two variables, eating produce frequently and smoking,
on overall well-being (Model 4).
4.2 Learnings
We learned how to effectively create detailed regression models and 3-dimensional scatter-plots
in R software with a list of diverse functions in the rockchalk and scatterplot3d packages. We
learned how to interpret the different regression models that we created, especially the models
with interaction terms within.
4.3 Challenges
Our main challenge was finding an appropriate package that would allow us to create and graph
3-dimensional regression models with interaction terms. We struggled with this immensely until
we found the rockchalk package, which allowed us to overcome this obstacle.
4.4 Suggestions
Explore other packages to find the most useful one with respect to your study before delving
deeper into one package only to ditch it later. Use a data set with many varying types of variables
to produce better models. Find a dataset that has a bigger sample size to create better models
with higher power. For example, our dataset only looked at 189 cities; there are many smaller
communities with populations under 300,000, and a researcher could gather more data from them.
4.5 Going Onward
Some other research ideas that we can pursue from this project:
I) What produce actually has the biggest impact on overall well-being? E.g., fruits: apples,
oranges, strawberries; vegetables: mushrooms, asparagus, peas, etc.
II) How many cigarettes does one have to smoke to feel the negative effects on overall well-being?
III) Since this project was done on a macro level, looking at U.S. cities, we can attempt to do
research on a micro level, on an individual basis. For example, we can survey a random
sample of college students, working professionals, and senior citizens, and collect more variables
and factors affecting their overall well-being.
5 Bibliography
Barret Schloerke, Jason Crowley, Di Cook, Heike Hofmann, Hadley Wickham, Francois Briatte
and Moritz Marbach (2014). GGally: Extension to ggplot2. R package version 0.4.6.
http://CRAN.R-project.org/package=GGally
Ligges, U. and Mächler, M. (2003). Scatterplot3d - an R Package for Visualizing Multivariate
Data. Journal of Statistical Software 8(11), 1-20.
Paul E. Johnson (2013). rockchalk: Regression Estimation and Presentation. R package version
1.8.0. http://CRAN.R-project.org/package=rockchalk
R Core Team (2013). R: A language and environment for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
"U.S. Community Well-Being Tracking." U.S. Community Well-Being Tracking. Gallup, 1 Jan. 2013.
Web. 7 June 2014. http://www.gallup.com/poll/145913/City-Wellbeing-Tracking.aspx?ref=s
6 Annotated Code
###########################################
# Stat 331 Geometry Proj #
# Hubert Lo Gregory Knothe Crystal Macias #
###########################################
#Setting up R, loading/inspecting the data set
rm(list=ls())
getwd()
wb <- read.csv(file.choose())
head(wb)
names(wb)
str(wb)
#Installing needed R packages
install.packages("scatterplot3d")
library(scatterplot3d)
install.packages("rockchalk")
library(rockchalk)
install.packages("ggplot2")
library(ggplot2)
install.packages("GGally")
library(GGally)
#Creating a correlation matrix for initial analysis with the help of the package GGally
ggpairs(wb, columns= 2:10)
##############################################################################
###############################
# Regression Model 1 #
#Using factor() to convert wb$Region into a factor (Note: we use this variable in later models)
fregion=factor(wb$Region, labels=c("Western Region", "Midwest Region", "South Region",
"Northeast Region"))
is.factor(fregion)
#Using factor() to convert wb$Population.size into a factor
#(Note: we ended up not using this variable)
fpopsize <- factor(wb$Population.size, labels=c("Population<300,000",
"Population 300,000-1,000,000", "Population 1,000,000+"))
is.factor(fpopsize)
#Running a regression of %Smoke and %EPF on Overall.Wellbeing with the lm() function
mod1=lm(Overall.Wellbeing~ percent.Smoke + percent.Eat.Produce.Frequently, data=wb)
summary(mod1)
#Creating a 3D scatterplot of Model 1 with appropriate labels
s3d<-scatterplot3d(wb$percent.Smoke, wb$percent.Eat.Produce.Frequently,
wb$Overall.Wellbeing, color="red", pch=19, type="h",
xlab="% Smoke", ylab="% Eat Produce Frequently", zlab="Overall well being",
main="Overall well being regressed on % smoke & % eat produce frequently")
#Saving the regression Model 1 into a variable "plane"
plane<- lm(wb$Overall.Wellbeing ~ wb$percent.Smoke + wb$percent.Eat.Produce.Frequently)
#Graphing the regression plane of Model 1 onto the existing 3D scatterplot
s3d$plane3d(plane)
##############################################################################
###############################
# Regression Model 2 #
#Running a regression of %Smoke and Region on Overall.Wellbeing with the lm() function
#(the %Smoke column is named percent.Smoke; wb$Smoke does not exist)
mod2 = lm(wb$Overall.Wellbeing ~ factor(wb$Region) + wb$percent.Smoke)
summary(mod2)
#Creating a 3D scatterplot of Model 2 with appropriate labels
s3d = scatterplot3d(wb$Region, wb$percent.Smoke, wb$Overall.Wellbeing,
pch=16, highlight.3d=TRUE, main="Overall Wellbeing ~ Region + Smoke",
xlab = "Region", ylab="%Smoke", zlab="Overall Wellbeing")
#Creating a vector filled with values of %Smoke to later use to graph the regression lines
#for each region
x = seq(5.7,35,.1)
#For each region code (0 = West, 1 = Midwest, 2 = South, 3 = Northeast):
#subset the data, run a regression of Overall.Wellbeing on percent.Smoke within that region,
#calculate predicted values for each value in "x", and plot them as a red regression line
for (r in 0:3) {
  Rr = subset(wb, Region == r)
  lmr = lm(Overall.Wellbeing ~ percent.Smoke, data = Rr)
  pred = coef(lmr)[1] + coef(lmr)[2]*x
  s3d$points3d(rep(r,length(x)), x, pred, col="red", pch=16, cex=.5)
}
##############################################################################
###############################
# Regression 3 #
#Running a regression of Region and %EPF on Overall.Wellbeing with the lm() function
rm3=lm(Overall.Wellbeing~fregion + percent.Eat.Produce.Frequently, data=wb)
summary(rm3)
#Color coordinates the regions
#(as.character() so the color names, not the factor codes, are passed on)
col = as.character(factor(wb$Region, labels=c("blue", "green", "black", "purple")))
#Creating a 3D scatterplot of Model 3 with appropriate labels
RM3 <- scatterplot3d(x=wb$Region, y=wb$percent.Eat.Produce.Frequently,
z=wb$Overall.Wellbeing,
highlight.3d = FALSE,
color=col,
main = "3D Scatterplot: Region vs % Eat Produce Frequently",
xlab="4 different Regions",
ylab="%Eat Produce Frequently",
zlab="Overall Well being")
#Creating a vector filled with values of %EPF to later use to graph the regression lines
#for each region
x = seq(45,70,.01)
#For each region code (0 = West, 1 = Midwest, 2 = South, 3 = Northeast):
#subset the data, run a regression of Overall.Wellbeing on percent.Eat.Produce.Frequently
#within that region, calculate predicted values for each value in "x",
#and plot them as a red regression line
for (r in 0:3) {
  Rr = subset(wb, Region == r)
  lmr = lm(Overall.Wellbeing ~ percent.Eat.Produce.Frequently, data = Rr)
  pred = coef(lmr)[1] + coef(lmr)[2]*x
  RM3$points3d(rep(r,length(x)), x, pred, col="red", pch=16, cex=.5)
}
#Running a regression of Region, %EPF, and an interaction term on Overall.Wellbeing
#with the lm() function
mod3=lm(Overall.Wellbeing~ fregion + percent.Eat.Produce.Frequently +
fregion*percent.Eat.Produce.Frequently, data=wb)
summary(mod3)
#Creates a 3D graph of Model 3 (with the interaction term) with the mcGraph3() function
#in the rockchalk package with all the appropriate labels and an interaction plane
mcGraph3(x1=wb$Region, x2=wb$percent.Eat.Produce.Frequently, y=wb$Overall.Wellbeing,
interaction=TRUE,
main = "3D Graphic with Region & Eat healthy Frequently\nInteraction Term",
x1lab="4 different regions\nWest, Midwest, South, Northeast",
x2lab="\nServings of fruits &\nvegetables days per week",
ylab="Overall Wellbeing")
##############################################################################
###############################
# Regression 4 #
#Running a regression of %Smoke, %EPF, and an interaction term on Overall.Wellbeing
#with the lm() function
mod4=lm(Overall.Wellbeing ~ percent.Smoke + percent.Eat.Produce.Frequently +
percent.Smoke*percent.Eat.Produce.Frequently, data= wb)
summary(mod4)
#Splits the graphics device in R into two parts to plot both points of view of the
#regression model in the following code
par(mfrow=c(1,2))
#Creates a 3D graph of Model 4 (with the interaction term) with the mcGraph3() function
#in the rockchalk package with all the appropriate labels and an interaction plane
#at 90 degrees to better view %EPF's slopes
mcGraph3(x1= wb$percent.Smoke, x2= wb$percent.Eat.Produce.Frequently,
y=wb$Overall.Wellbeing, interaction = TRUE,
main=" Overall Well Being regressed on smoke,\neat produce frequently, &\npercent smoke *percent frequently - 90 degrees",
theta= 90,
cex.main=1)
#Same as the above code, but the graph is displayed at a 0 degree angle to better view
#%Smoke's slopes
mcGraph3(x1= wb$percent.Smoke, x2= wb$percent.Eat.Produce.Frequently,
y=wb$Overall.Wellbeing, interaction = TRUE,
main=" Overall Well Being regressed on smoke,\neat produce frequently, &\npercent smoke *percent frequently - 0 degrees",
theta= 0,
cex.main=1)

3D Scatterplot - R programming

  • 1. 3-Dimensional Scatterplots Multiple Linear Regressions on Overall Well-being Hubert Lo, Crystal Macias, and Gregory Knothe California Polytechnic State University
  • 2. 1 Introduction One can easily see happiness, but it’s difficult to define. One way to define it is as a sense of one’s well-being. However one may define happiness, a majority of people try to attain it. Happiness is important because it moves people forward in a positive way (i.e. enjoying life, having good thoughts and making good choices). In reality, people do not always have complete control over their happiness. Research suggests that genetics, environmental stress and factors, and physical health play a role on one’s overall well-being. The National health and Nutrition Examination Survey (NHANES) are studies assessing health and nutritional status of adults and children in the US. NHANES has found that employed women had a higher sense of well-being than their unemployed counterparts. The Health and Medicine Week published a journal indicating going to the dentist is linked to one’s overall well-being. Thus, generating people to manage any factors they can control. Thus, leading to the question, “What is the happiest city in the nation?” The collected data from the Gallup-Healthways Well-Being Index, factors of what may define “happiness”, calculates one’s overall well-being score. Our project shall investigate numerical and categorical predictors and how they affect a city’s overall well-being score. As well as how numerical and categorical interacts with one another and how they affect a city’s overall well-being score. 2 Methods The method process for accomplishing the project can be broken down into three key steps. 2.1 Gathering the Data We had to gather data, preferably interesting data that was meaningful to analyze. We looked around on different sites including:“http://www.quandl.com/”, “google public data”, “Federal Reserve Bank of St. Louis - economic data”, and others. However, most of the data we found didn’t have a categorical variable which we needed that for this project. 
Since we live in San Luis Obispo and most remember that we were named as the “Happiest Place in America” few years ago, we began to do our research in this route. Ultimately, we were able to find out that Gallup-Healthways produces an annual ranking of cities in overall-well- being by doing a detailed poll in every major city. Our group decided that this study was very interesting. It also fitted our criteria as it contained categorical variables and quantitative variables. We utilized Gallup-Healthways Well Being Index Data from 2012 to 2013, ranking U.S. cities for overall well-being calculated from various factors including: percent Obese, percent Exercise frequently, percent Eat produce frequently, percent Smoke, percent With daily Stress, and percent Uninsured. Since the dataset was given in such a way that one can sort by size and state, we decided to create two new variables. The first new variable, population size, is
  • 3. grouped three population sizes into 0, 1, and 2. All the listed 0 is population size less than 300,000, 1 is population size between 300,000 and 1,000,000, and 2 is population size with 1,000,000 +. The second new variable created is region; we sorted the states into the appropriate regions being grouped into four regions of 0, 1, 2, and 3. All the listed 0 is Western region, 1 is Midwest region, 2 is South region, and 3 is Northeast region. Figure 1 displays all the variables and metric descriptions. The 2012- 2013 GallUp-Healthways Well Being Index Data source link can be found in the bibliography citations. Figure 1: Data Variables
  • 4. 2.2 Choosing the Right Variables In order to make meaningful regression models between the different variables, we decided it was important to first create a matrix plot as shown in Figure 2. The matrix plot displayed all the variables, making it easier to analyze which variables have a strong relationship with Overall Well-being. We are able to see the correlation for each variable with its relationship with Overall Well-being. Observing the top row, we see that x5: Percent Smoke has a strong relationship with Overall well-being with a correlation of -0.718. x2: Percent Obese and x3: Percent Exercise also has a moderately strong relationship with overall well-being with a correlation of - 0.661 and -0.532 respectively. Moreover, x4: Percent Eat Produce Frequently and x9: Region also have a slight relationship with a correlation of 0.267 and -0.343 respectively. Therefore, we decided to focus on; Percent Obese, Percent Exercise, Percent Eat Produce Frequently, Percent Smoke, and Region for creating the regression models. Figure 2: Matrix Plot of All Variables (Matrix plot is created with GGally package and ggpairs function) Legend x1: Overall Well-Being x2: Percent Obese x3: Percent Exercise x4: Percent Eat Produce Frequently x5: Percent Smoke x6: Percent with Daily Stress x7: Percent Uninsured x8: Population Size x9: Region
  • 5. 2.3 Creating Regression Models & 3D-Scatter Plots Following the project criteria: we are asked to produce four types of regression models: 1) two quantitative predictors 2) one quantitative predictor & one categorical predictor 3) one quantitative predictor, one categorical predictor, and interaction term 4) two quantitative predictors and interaction term. In order to create regression models and 3D plots of the models, we had to utilize different R functions and packages. We used a different functions and packages for each of the regression models. For regression 1, Overall Well-being regressed on percent Smoke and percent Eat Produce Frequently, we utilized the lm function to run the regression and the summary function to produce the output. Then we installed the “scatterplot3d” package which creates a 3D scatterplot of the regression model. Then we inserted percent Smoke as x variable, percent Eat Produce Frequently as y variable and Overall well-being as z variable as inputs into the scatterplot3D function, as well as other color and label options. Lastly, we created a plane, by saving the regression model into a variable, then using the plane3d function which is within the package to create a plane with the scatterplot. For regression 2, Overall Well-being regressed on Region (categorical), and percent Smoke. We had to use the factor function to make the region variable into categorical variables. Then we used the lm function to create the regression. For this model, we also used the scatterplot3D package, and inputted Region as x variable, percent Smoke as y variable, and Overall Well-being as z variable, as inputs into the scatterplot3d function, and used other color and label options as well. Moreover, in order to create the regression lines per region, we subsetted the data into four regions, then ran four more regressions of each region on Overall Well-being predicted by percent Smoke. 
Then we utilized the four regression models to calculate predicted values for each region, and used the function points3d to plot the predicted lines. For regression 3, Overall Well-being regressed on Region (categorical), percent Eat Produce Frequently, and interaction term of the two. We have the Region variable already created as a categorical variable in model 2. We used a different package called “rockchalk”. The rockchalk package has many different functions that are useful in presenting regression models. Using the mcGraph3 function to create 3 dimensional scatterplot with a plane, it enables the usage of “interaction”. Something the scatterplot3d function was unable to do directly. We inputted percent Smoke as x1 variable, percent Eat Produce Frequently as x2 variable, Overall Well- being as y variable and interaction = TRUE, and other options in the mcGraph3 function, including theta and phi option which rotates the displayed 3D plot. For regression 4, Overall Well-being regressed on percent Smoke, percent Eat Produce Frequently, and interaction term of percent Smoke and percent Eat Produce Frequently. We used the rockchalk package again. Similarly to regression 3 model, we used mcGraph3 function. We inputted percent Smoke as x1 variable, percent Eat Produce Frequently as x2 variable, and Overall Well-being as y variable, and interaction = TRUE and other options including the theta
  • 6. option. For more information on Rockchalk and Scatterplot3D packages, please see bibliography citations. Figure 3 displays the four regression models and main functions and packages used in each. ‘ Figure 3: Regression Models 1) Regression Model 1: Two quantitative predictors Overall well being ~ percent.Smoke + percent.Eat.Produce.Frequently -- Main Functions & Packages used: lm function, summary function, Scatterplot3d package. 2) Regression Model 2: One quantitative predictor & one categorical predictor Overall well being ~ Region + percent.Smoke -- Subset function, lm function, summary function, points3d function, Scatterplot3d package. 3) Regression Model 3: One quantitative predictor, one categorical predictor, and interaction term Overall well being ~ Region + percent Eat Produce Frequently + percent Eat Produce Frequently * Region -- Lm function, summary function, mcGraph3 function, rockchalk package. 4) Regression Model 4: Two quantitative predictors and interaction term Overall well being ~ percent Smoke + percent Eat Produce Frequently + percent Eat Produce Frequently * percent Smoke -- Lm function, summary function, mcGraph3 function, rockchalk package.
  • 7. 3 Results 3.1 Regression Model 1: Overall Well-Being ~ %Smoke + %Eat Produce Frequently (%EPF) Figure 5: 3D scatter plot of Model 1 with regression plan. (Created in scatterplot3D) Figure 4: R regression output for Model 1. The model includes two quantitative predictor variables, %Smoke and %Eat Produce Frequently, with our response variable of Overall Well-Being. The R regression output for model 1 can be seen in Figure 4. Both predictor variables prove to be statistically significant in the presence of one another. But the coefficients show two different relationships in respect to our response variable. %Smoke shows a negative relationship with Overall Well-Being. It can be interpreted as, for each percent increase in %Smoke results in an expected decrease of 0.33908 in Overall Well- being, in the presence of %EPF. On the other hand, %EPF has a positive relationship with Overall Well-Being. For each percent increase in %EPF there is an associated increase of 0.10848 increases in expected Overall Well-Being, in the presence of %Smoke. These relationships can be seen visually in Figure 5. If you are to look along the x-axis, labeled as %Smoke, you can see strong the negative slope of %Smoke. While if you look at the y-axis, labeled as %Eat Produce Frequently, you can see the moderate positive slope of the plane.
  • 8. 3.2 Regression Model 2: Overall Well-Being ~ Region + %Smoke The model consists of our response variable, Overall Well-Being, and two predictor variables, Region and %Smoke, which are categorical and quantitative variables respectively. The variable %Smoke proved to be significant in the presence of Region, while only one Region category produced a significant p-value, but is acceptable for what we wish to do with the data. Figure 6: R regression output for Model 2. Figure 7: 3D scatter plot of Model 2 with each Regions’ regression slope. (Created in scatterplot3D) We can see in Figure 6, that %Smoke has a negative coefficient, which is expected. This can be interpreted as for each percent increase in %Smoke there is an expected decrease in Overall Well-Being by .365404 while in the presence of the factor Region. If we were to calculate the regression equations for these coefficients, it would result in the same slope for all regions with slightly different intercepts, so instead we used a different method to produce more accurate regression equations. Figure 7, to better display Regions relationship in respect to %Smoke on Overall Well-Being. To put it simply, we subsetted our data by region, then ran a regression on each of our new data sets of Overall Well-Being ~ %Smoke. We used those coefficients to create new, more accurate, regression equations to plot as our regression lines for each region. Then we plotted the new regression equations on the 3D scatterplot as a regression line in red to best depict the effect of %Smoke per Region on Overall Well-Being.
  • 9. Figure 9: Linear Regression Model of the Regions (categorical) predictor and %Eat Produce Frequently (numerical) predictor on one’s Over Well-being Score (numerical response variable) without an interaction. Figure 8: 3D graph of Regression Model 3 with slopes of %Eat Produce Frequently with Overall Well-being Score for each Region. *Blue is Western Region. *Green is Midwest Region. *Black is South Region. *Purple is Northeast Region. Note: Without interaction 3.3 Regression Model 3: Overall Well-Being ~ Region + %Eat Produce Frequently + Interaction Term (R * %EPF) This model has one numeric and one categorical predictor regressed onto our response variable (Overall Well-Being) with an interaction term added to the model to see how two predictor variables interact with one another’s influence on the response variable. We will look at the model without the interaction term first. The Western Region shall be our indicator variable to represent effects of levels of the categorical variables on the response. The Western Region has an estimated slope of 58.17639, Midwest Region has an estimated slope of - 0.45399, South Region has an estimated slope of -1.30124, Northeast Region has an estimated slope of -1.57456 and percent Eat Produce Frequently has an estimated slope of 0.15593 in the presence of the Overall Well-Being predicting variable. The slope differences can be seen in Figure 8. Noticing the predicted P-values in Figure 9, only the Midwest Region does not meet the alpha = 0.05 level. Thus the South, Northeast, and percent Eat Produce Frequently are statistically significant at the 0.05 level compared to the Western Region. The positive coefficient indicated that for every additional %Eat Produce Frequently you can expect one’s Overall Well- being to increase an average of 0.16. The negative coefficient indicated that you can expect one’s Overall Well-being to be 1.30 lower in the South compared to the Western Region. 
The negative coefficient indicated that you can expect one’s Overall Well-being to be 1.57 lower in the Northeast compared to the Western Region.
  • 10. Figure 10: Linear regression model of the Region (categorical) predictor and the %Eat Produce Frequently (numerical) predictor on the Overall Well-being Score (numerical response variable), with an interaction.
Figure 11: 3D graph of Regression Model 3 showing the slope of %Eat Produce Frequently against Overall Well-being Score for each Region, with the interaction plane.
With the interaction term introduced, the slope coefficients for the two variables become functions of one another. The slope of %Eat Produce Frequently now differs from region to region, with the Western Region serving as the baseline. For simplicity, let X denote %Eat Produce Frequently in the following fitted lines, taken from Figure 10:
Western Region: R0 = 57.02 + 0.175*X
Midwest Region: R1 = 66.1006 + 0.00668*X
South Region: R2 = 38.18 + 0.473*X
Northeast Region: R3 = 62.210 + 0.0577*X
Figure 10 shows that only the South Region, %Eat Produce Frequently, and the interaction of the South Region with %Eat Produce Frequently are statistically significant at alpha = 0.10. Because of the interaction term, the fitted surface curves slightly: the slope of the numeric variable is different for each level of the categorical variable, as shown in Figure 11. To interpret the statistically significant predictors, consider the South Region line, R2 = 38.18 + 0.473*X. Multiplying a value of %Eat Produce Frequently (which ranges from 47 to 65 percent) by 0.473 and then adding 38.18 gives the predicted Overall Well-being Score for the South Region. For example, a %Eat Produce Frequently value of 56 gives 38.18 + (0.473)*(56) = 64.668 as the predicted Overall Well-being Score for the South Region. As %Eat Produce Frequently increases, the predicted Overall Well-being Score increases as well, implying that eating five or more servings of fruits and vegetables four or more days per week is positively associated with one's Overall Well-being Score.
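As a rough illustration of the region-specific lines, the following R sketch stores the four intercepts and slopes reported in the text and evaluates them at a chosen value of %Eat Produce Frequently. The data frame `region_lines` and the helper `predict_wellbeing()` are hypothetical names introduced only for this example; they are not part of the original analysis.

```r
# Intercepts and slopes of the four region lines, as reported in the text.
region_lines <- data.frame(
  region    = c("West", "Midwest", "South", "Northeast"),
  intercept = c(57.02, 66.1006, 38.18, 62.210),
  slope     = c(0.175, 0.00668, 0.473, 0.0577),
  stringsAsFactors = FALSE
)

# Hypothetical helper: predicted Overall Well-being Score for one region
# at a given %Eat Produce Frequently value x.
predict_wellbeing <- function(region, x) {
  line <- region_lines[region_lines$region == region, ]
  stopifnot(nrow(line) == 1)
  line$intercept + line$slope * x
}

# The worked example from the text: South Region at 56 percent.
south_at_56 <- predict_wellbeing("South", 56)
print(south_at_56)

# Slope differences from the Western baseline. In a dummy-coded model these
# differences are what the region-by-produce interaction terms estimate.
west_slope <- region_lines$slope[region_lines$region == "West"]
region_lines$slope_vs_west <- region_lines$slope - west_slope
print(region_lines)
```

Storing the lines in a small table keeps the baseline comparison explicit: each non-baseline slope is the Western slope plus that region's interaction coefficient.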
  • 11. 3.4 Regression Model 4: Overall Well-Being ~ %Smoke + %Eat Produce Frequently + Interaction Term (%Smoke * %EPF)
Figure 12 displays the model in which Overall Well-Being is regressed on two quantitative variables, %Smoke and %Eat Produce Frequently, together with their interaction term. The interaction term lets us see how each predictor's influence on the response variable depends on the other predictor. Without an interaction term, %Smoke has a slope of -0.33908 and %Eat Produce Frequently has a slope of 0.10845, each in the presence of the other, when predicting Overall Well-Being (more information about these relationships can be found under Model 1). These individual trends still exist after adding the interaction term, but they become intertwined with one another through it. The interaction term proved to be significant at an alpha level of 0.1, and the model captured 53.9% of the variation in Overall Well-Being, as measured by the adjusted R-squared value.
Figure 12: R regression output for Regression Model 4.
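The quantities discussed for Model 4 (the interaction p-value and the adjusted R-squared) can be pulled out of a fitted `lm` object directly. The sketch below uses simulated stand-in data, since the city data set is not reproduced here; the column names `smoke`, `epf`, and `wellbeing` and all coefficient values used to generate the data are assumptions made for illustration only.

```r
# Simulated stand-in data (assumed values, not the Gallup-Healthways data).
set.seed(331)
n <- 189
sim <- data.frame(
  smoke = runif(n, 7, 34),
  epf   = runif(n, 47, 65)
)
sim$wellbeing <- 40 + 0.6 * sim$smoke + 0.45 * sim$epf -
  0.016 * sim$smoke * sim$epf + rnorm(n, sd = 1.5)

# Additive model, and the model with an interaction term.
# In an R formula, smoke * epf expands to smoke + epf + smoke:epf.
fit_add <- lm(wellbeing ~ smoke + epf, data = sim)
fit_int <- lm(wellbeing ~ smoke * epf, data = sim)

# Coefficient table of the interaction model and the p-value of the interaction.
coef_table    <- coef(summary(fit_int))
p_interaction <- coef_table["smoke:epf", "Pr(>|t|)"]

# Adjusted R-squared of the interaction model.
adj_r2 <- summary(fit_int)$adj.r.squared

print(coef_table)
print(p_interaction < 0.10)  # is the interaction significant at alpha = 0.10?
print(adj_r2)

# Nested-model comparison of the additive and interaction fits.
print(anova(fit_add, fit_int))
```

Because `smoke * epf` already includes both main effects, writing the main effects and the product together in the formula, as the annotated code does, fits the same model.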
  • 12. Upon introducing the interaction term, the slope coefficients for the two variables become functions of one another. The slope of %Smoke becomes 0.6066 - 0.016449*%EPF, and the slope of %Eat Produce Frequently becomes 0.444170 - 0.016449*%Smoke. This interaction causes the regression plane to curve, because the slope of each predictor variable changes with the value of the other predictor variable, as seen in Figure 13. For example, the slope of %Eat Produce Frequently is 0.444170 - 0.016449*7 = 0.3290 when %Smoke is equal to 7, while it falls to 0.444170 - 0.016449*34.3 = -0.1200 when %Smoke is equal to 34.3. A simple way to interpret this is to look again at the slope of each variable as a function of the other: as a city's smoking rate increases, the positive effect of eating produce diminishes. The slope reaches zero at a smoking rate of about 27% (0.444170/0.016449), so for cities with smoking rates above that level the slope of %Eat Produce Frequently is negative, meaning the negative effects of smoking outweigh the positive effects of eating well.
Figure 13: Two 3D graphs of Regression Model 4 with an interaction plane.
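The idea that each slope is a function of the other predictor can be sketched in a few lines of R. For a model y = b0 + b1*x1 + b2*x2 + b3*x1*x2, the slope of x2 at a given x1 is b2 + b3*x1, and it changes sign at x1 = -b2/b3. The helper names `conditional_slope()` and `zero_crossing()` and the coefficient values below are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the Figure 12 estimates.

```r
# Hypothetical helper: slope of one predictor at given values of the other
# predictor, for a model with a single two-way interaction term.
conditional_slope <- function(b_main, b_interaction, other_value) {
  b_main + b_interaction * other_value
}

# Hypothetical helper: value of the other predictor at which the
# conditional slope equals zero.
zero_crossing <- function(b_main, b_interaction) {
  -b_main / b_interaction
}

# Illustrative coefficients only (assumed, not estimates from the paper).
b_main        <- 0.5
b_interaction <- -0.02

# Conditional slopes at three values of the other predictor.
slopes <- conditional_slope(b_main, b_interaction, c(10, 20, 30))
print(slopes)

# Point at which the slope changes sign.
crossing <- zero_crossing(b_main, b_interaction)
print(crossing)
```

With a fitted model such as `mod4` from the annotated code, the same helpers could be fed from `coef(mod4)`, for example using the entries named `percent.Eat.Produce.Frequently` and `percent.Smoke:percent.Eat.Produce.Frequently`.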
  • 13. 4 Discussion
4.1 Findings
Based on the regression models and 3-dimensional scatterplots that we created, our key findings include:
I) A positive association between eating produce frequently and overall well-being (Models 1, 3, & 4).
II) A negative association between smoking and overall well-being (Models 1, 2, & 4).
III) The effects of eating produce frequently and of smoking differ depending on region (Models 2 & 3).
IV) A significant interaction between eating produce frequently and smoking in their effect on overall well-being (Model 4).
4.2 Learnings
We learned how to create detailed regression models and 3-dimensional scatterplots in R using a range of functions from the rockchalk and scatterplot3d packages. We also learned how to interpret the different regression models that we created, especially the models containing interaction terms.
4.3 Challenges
Our main challenge was finding a package that would allow us to create and graph 3-dimensional regression models with interaction terms. We struggled with this immensely until we found the rockchalk package, which allowed us to overcome this obstacle.
4.4 Suggestions
Explore several packages to find the one most useful for your study before delving deeply into a single package, only to ditch it later. Use a data set with many varying types of variables to produce better models. Find a data set with a bigger sample size to create better models with higher power. For example, our data set only looked at 189 cities; there are many smaller counties with populations under 300,000, and a researcher could gather more data by including them.
4.5 Going Onward
Some other research ideas that we can pursue from this project:
I) What produce actually has the biggest impact on overall well-being? For example, fruits such as apples, oranges, and strawberries, or vegetables such as mushrooms, asparagus, and peas.
II) How many cigarettes does one have to smoke to feel the negative effects on overall well-being?
III) Since this project was done at a macro level, looking at U.S. cities, we can attempt research at a micro level, on an individual basis. For example, we can survey a random sample of college students, working professionals, and senior citizens, and collect more variables and factors affecting their overall well-being.
  • 14. 5 Bibliography
Barret Schloerke, Jason Crowley, Di Cook, Heike Hofmann, Hadley Wickham, Francois Briatte and Moritz Marbach (2014). GGally: Extension to ggplot2. R package version 0.4.6. http://CRAN.R-project.org/package=GGally
Ligges, U. and Mächler, M. (2003). Scatterplot3d - an R Package for Visualizing Multivariate Data. Journal of Statistical Software 8(11), 1-20.
Paul E. Johnson (2013). rockchalk: Regression Estimation and Presentation. R package version 1.8.0. http://CRAN.R-project.org/package=rockchalk
R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
"U.S. Community Well-Being Tracking." U.S. Community Well-Being Tracking. Gallup, 1 Jan. 2013. Web. 7 June 2014. http://www.gallup.com/poll/145913/City-Wellbeing-Tracking.aspx?ref=s
  • 15. 6 Annotated Code
###########################################
# Stat 331 Geometry Proj                  #
# Hubert Lo Gregory Knothe Crystal Macias #
###########################################

#Setting up R, loading/inspecting the data set
rm(list=ls())
getwd()
wb <- read.csv(file.choose())
head(wb)
names(wb)
str(wb)

#Installing needed R packages
install.packages("scatterplot3d")
library(scatterplot3d)
install.packages("rockchalk")
library(rockchalk)
install.packages("ggplot2")
library(ggplot2)
install.packages("GGally")
library(GGally)

#Creating a correlation matrix for initial analysis with the help of the package GGally
ggpairs(wb, columns= 2:10)

##############################################################################
# Regression Model 1 #

#Using factor() to mold wb$Region into a factor (Note: We use this variable in later models)
fregion=factor(wb$Region, labels=c("Western Region", "Midwest Region", "South Region", "Northeast Region"))
is.factor(fregion)

#Using factor() to mold wb$Population.size into a factor (Note: We ended up not using this variable)
fpopsize <- factor(wb$Population.size, labels=c("Population<300,000", "Population 300,000-1,000,000", "Population 1,000,000+"))
is.factor(fpopsize)

#Running a regression of %Smoke and %EPF on Overall.Wellbeing with the lm() function
mod1=lm(Overall.Wellbeing~ percent.Smoke + percent.Eat.Produce.Frequently, data=wb)
summary(mod1)

#Creating a 3D scatterplot of Model 1 with appropriate labels
s3d<-scatterplot3d(wb$percent.Smoke, wb$percent.Eat.Produce.Frequently, wb$Overall.Wellbeing,
    color="red", pch=19, type= "h",
    xlab="% Smoke", ylab="% Eat Produce Frequently", zlab="Overall well being",
    main= "Overall well being regressed on % smoke & % eat produce frequently")

#Saving the regression Model 1 into a variable "plane"
plane<- lm(wb$Overall.Wellbeing ~ wb$percent.Smoke + wb$percent.Eat.Produce.Frequently)

#Graphing the regression plane of Model 1 onto the existing 3D scatterplot
s3d$plane3d(plane)

##############################################################################
# Regression Model 2 #

#Running a regression of %Smoke and Region on Overall.Wellbeing with the lm() function
#(the %Smoke column is named percent.Smoke, as in Model 1)
mod2 = lm(wb$Overall.Wellbeing ~ factor(wb$Region) + wb$percent.Smoke)

#Creating a 3D scatterplot of Model 2 with appropriate labels
s3d = scatterplot3d(wb$Region, wb$percent.Smoke, wb$Overall.Wellbeing, pch=16, highlight.3d=TRUE,
    main="Overall Wellbeing ~ Region + Smoke",
    xlab = "Region", ylab="%Smoke", zlab="Overall Wellbeing")

#Creates subsets of data for each Region to perform regression on to obtain the slopes for each region
R0 = subset(wb, wb$Region==0)
R1 = subset(wb, wb$Region==1)
R2 = subset(wb, wb$Region==2)
R3 = subset(wb, wb$Region==3)

#Creating a vector filled with values of %Smoke to later use to graph the regression lines for each region
x = seq(5.7,35,.1)

#Creating a regression equation for Region0, then using a for loop to create predicted values for each value
#in the array "x" to plot the regression line for Region0
lm0 = lm(R0$Overall.Wellbeing ~ R0$percent.Smoke)
pred=NULL
for (i in 1:length(x)) {
  pred[i] = coef(lm0)[1] + coef(lm0)[2]*x[i]
}
#Plotting the predicted values in a line to display the regression equation for Region0
s3d$points3d(rep(0,length(x)), x, pred, col="red", pch=16, cex=.5)

#Creating a regression equation for Region1, then using a for loop to create predicted values for each value
#in the array "x" to plot the regression line for Region1
lm1 = lm(R1$Overall.Wellbeing ~ R1$percent.Smoke)
pred=NULL
for (i in 1:length(x)) {
  pred[i] = coef(lm1)[1] + coef(lm1)[2]*x[i]
}
#Plotting the predicted values in a line to display the regression equation for Region1
s3d$points3d(rep(1,length(x)), x, pred, col="red", pch=16, cex=.5)

#Creating a regression equation for Region2, then using a for loop to create predicted values for each value
#in the array "x" to plot the regression line for Region2
lm2 = lm(R2$Overall.Wellbeing ~ R2$percent.Smoke)
pred=NULL
for (i in 1:length(x)) {
  pred[i] = coef(lm2)[1] + coef(lm2)[2]*x[i]
}
#Plotting the predicted values in a line to display the regression equation for Region2
s3d$points3d(rep(2,length(x)), x, pred, col="red", pch=16, cex=.5)

#Creating a regression equation for Region3, then using a for loop to create predicted values for each value
#in the array "x" to plot the regression line for Region3
lm3 = lm(R3$Overall.Wellbeing ~ R3$percent.Smoke)
pred=NULL
for (i in 1:length(x)) {
  pred[i] = coef(lm3)[1] + coef(lm3)[2]*x[i]
}
#Plotting the predicted values in a line to display the regression equation for Region3
s3d$points3d(rep(3,length(x)), x, pred, col="red", pch=16, cex=.5)

##############################################################################
# Regression 3 #

#Running a regression of Region and %EPF on Overall.Wellbeing with the lm() function
rm3=lm(Overall.Wellbeing~fregion + percent.Eat.Produce.Frequently, data=wb)
summary(rm3)

#Color coordinates the regions (converted to a character vector so the labels are read as color names)
col = as.character(factor(wb$Region, labels=c("blue", "green", "black", "purple")))

#Creating a 3D scatterplot of Model 3 with appropriate labels
RM3 <- scatterplot3d(x=wb$Region, y=wb$percent.Eat.Produce.Frequently, z=wb$Overall.Wellbeing,
    highlight.3d = FALSE, color=col,
    main = "3D Scatterplot: Region vs % Eat Produce Frequently",
    xlab="4 different Regions", ylab="%Eat Produce Frequently", zlab="Overall Well being")

#Creates subsets of data for each Region to perform regression on to obtain the slopes for each region
R0 = subset(wb, wb$Region==0)
R1 = subset(wb, wb$Region==1)
R2 = subset(wb, wb$Region==2)
R3 = subset(wb, wb$Region==3)

#Creating a vector filled with values of %EPF to later use to graph the regression lines for each region
x=NULL
x = seq(45,70,.01)

#Creating a regression equation for Region0, then using a for loop to create predicted values for each value
#in the array "x" to plot the regression line for Region0
lm0 = lm(R0$Overall.Wellbeing ~ R0$percent.Eat.Produce.Frequently)
pred=NULL
for (i in 1:length(x)) {
  pred[i] = coef(lm0)[1] + coef(lm0)[2]*x[i]
}
#Plotting the predicted values in a line to display the regression equation for Region0
RM3$points3d(rep(0,length(x)), x, pred, col="red", pch=16, cex=.5)

#Creating a regression equation for Region1, then using a for loop to create predicted values for each value
#in the array "x" to plot the regression line for Region1
lm1 = lm(R1$Overall.Wellbeing ~ R1$percent.Eat.Produce.Frequently)
pred=NULL
for (i in 1:length(x)) {
  pred[i] = coef(lm1)[1] + coef(lm1)[2]*x[i]
}
#Plotting the predicted values in a line to display the regression equation for Region1
RM3$points3d(rep(1,length(x)), x, pred, col="red", pch=16, cex=.5)

#Creating a regression equation for Region2, then using a for loop to create predicted values for each value
#in the array "x" to plot the regression line for Region2
lm2 = lm(R2$Overall.Wellbeing ~ R2$percent.Eat.Produce.Frequently)
pred=NULL
for (i in 1:length(x)) {
  pred[i] = coef(lm2)[1] + coef(lm2)[2]*x[i]
}
#Plotting the predicted values in a line to display the regression equation for Region2
RM3$points3d(rep(2,length(x)), x, pred, col="red", pch=16, cex=.5)

#Creating a regression equation for Region3, then using a for loop to create predicted values for each value
#in the array "x" to plot the regression line for Region3
lm3 = lm(R3$Overall.Wellbeing ~ R3$percent.Eat.Produce.Frequently)
pred=NULL
for (i in 1:length(x)) {
  pred[i] = coef(lm3)[1] + coef(lm3)[2]*x[i]
}
#Plotting the predicted values in a line to display the regression equation for Region3
RM3$points3d(rep(3,length(x)), x, pred, col="red", pch=16, cex=.5)

#Running a regression of Region, %EPF, and an interaction term on Overall.Wellbeing with the lm() function
mod3=lm(Overall.Wellbeing~ fregion + percent.Eat.Produce.Frequently + fregion*percent.Eat.Produce.Frequently, data=wb)
summary(mod3)

#Creates a 3D graph of Model 3 (with the interaction term) with the mcGraph3() function in the rockchalk package
#with all the appropriate labels and an interaction plane
mcGraph3(x1=wb$Region, x2=wb$percent.Eat.Produce.Frequently, y=wb$Overall.Wellbeing, interaction=TRUE,
    main = "3D Graphic with Region & Eat healthy Frequently Interaction Term",
    x1lab="4 different regions West, Midwest, South, Northeast",
    x2lab="Servings of fruits & vegetables days per week",
    ylab="Overall Wellbeing")

##############################################################################
# Regression 4 #

#Running a regression of %Smoke, %EPF, and an interaction term on Overall.Wellbeing with the lm() function
mod4=lm(Overall.Wellbeing ~ percent.Smoke + percent.Eat.Produce.Frequently + percent.Smoke*percent.Eat.Produce.Frequently, data= wb)
summary(mod4)

#Splits the graphics device in R into two parts to plot both points of view of the regression model in following code
par(mfrow=c(1,2))

#Creates a 3D graph of Model 4 (with the interaction term) with the mcGraph3() function in the rockchalk package
#with all the appropriate labels and an interaction plane at 90 degrees to better view %EPF's slopes
mcGraph3(x1= wb$percent.Smoke, x2= wb$percent.Eat.Produce.Frequently, y=wb$Overall.Wellbeing, interaction = TRUE,
    main="Overall Well Being regressed on smoke, eat produce frequently, & percent smoke *percent frequently - 90 degrees",
    theta= 90, cex.main=1)

#Same as the above code, but the graph is displayed at a 0 degree angle to better view %Smoke's slopes
mcGraph3(x1= wb$percent.Smoke, x2= wb$percent.Eat.Produce.Frequently, y=wb$Overall.Wellbeing, interaction = TRUE,
    main="Overall Well Being regressed on smoke, eat produce frequently, & percent smoke *percent frequently - 0 degrees",
    theta= 0, cex.main=1)
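The annotated code repeats the same fit-and-predict loop for each region. One possible way to shorten it, shown here only as a sketch, is to split the data by region, fit each line with `lm(..., data = d)`, and let `predict()` evaluate the line over a grid. The data below are simulated stand-ins with the same column names as the annotated code; the generating values and the names `region_fits`, `region_preds`, and `grid` are assumptions made for this example.

```r
# Simulated stand-in for the city data (assumed values, same column names).
set.seed(1)
sim <- data.frame(
  Region        = rep(0:3, each = 30),
  percent.Smoke = runif(120, 5.7, 35)
)
sim$Overall.Wellbeing <- 70 - 0.3 * sim$percent.Smoke + rnorm(120)

# Grid of %Smoke values at which each region's line is evaluated.
grid <- data.frame(percent.Smoke = seq(5.7, 35, by = 0.1))

# One simple regression per region. Using bare column names with data = d
# (rather than d$column inside the formula) is what lets predict() accept newdata.
region_fits <- lapply(split(sim, sim$Region), function(d) {
  lm(Overall.Wellbeing ~ percent.Smoke, data = d)
})

# Predicted values for every region over the grid, with no explicit loop.
region_preds <- lapply(region_fits, predict, newdata = grid)

# The explicit loop from the annotated code, for Region 0, for comparison.
fit0 <- region_fits[["0"]]
pred_loop <- numeric(nrow(grid))
for (i in seq_len(nrow(grid))) {
  pred_loop[i] <- coef(fit0)[1] + coef(fit0)[2] * grid$percent.Smoke[i]
}

# Both approaches give the same fitted line.
print(all.equal(unname(region_preds[["0"]]), pred_loop))
```

Each element of `region_preds` could then be passed to `points3d()` in place of `pred`, with `rep()` supplying the region code as in the original calls.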