1 Introduction
Happiness is easy to recognize but difficult to define. One way to define it is as a sense of
one’s well-being. However one defines happiness, most people try to attain it. Happiness is
important because it moves people forward in a positive way (e.g., enjoying life, having good
thoughts, and making good choices). In reality, people do not always have complete control over
their happiness. Research suggests that genetics, environmental stress and other environmental
factors, and physical health play a role in one’s overall well-being. The National Health and
Nutrition Examination Survey (NHANES) is a program of studies assessing the health and
nutritional status of adults and children in the US. NHANES has found that employed women had a
higher sense of well-being than their unemployed counterparts. Health and Medicine Week
published an article indicating that going to the dentist is linked to one’s overall well-being.
Findings like these encourage people to manage whatever factors they can control.
This leads to the question, “What is the happiest city in the nation?” The Gallup-Healthways
Well-Being Index collects data on factors that may define “happiness” and uses them to calculate
an overall well-being score. Our project investigates how numerical and categorical predictors
affect a city’s overall well-being score, as well as how numerical and categorical predictors
interact with one another in affecting that score.
2 Methods
The process for carrying out the project can be broken down into three key steps.
2.1 Gathering the Data
We had to gather data, preferably interesting data that was meaningful to analyze. We looked
at several sites, including “http://www.quandl.com/”, “google public data”, “Federal
Reserve Bank of St. Louis - economic data”, and others. However, most of the data we found
didn’t have a categorical variable, which we needed for this project.
Since we live in San Luis Obispo, and most of us remember that it was named the “Happiest
Place in America” a few years ago, we began our research in this direction. Ultimately, we found
that Gallup-Healthways produces an annual ranking of cities by overall well-being, based on a
detailed poll in every major city. Our group found this study very interesting. It also fit our
criteria, as it contained both categorical and quantitative variables. We used the
Gallup-Healthways Well-Being Index data from 2012 to 2013, which ranks U.S. cities by overall
well-being calculated from various factors, including percent Obese, percent Exercise
frequently, percent Eat produce frequently, percent Smoke, percent With daily Stress, and
percent Uninsured. Since the dataset could be sorted by size and state, we decided to create two
new variables. The first new variable, population size, groups cities into three population
sizes coded 0, 1, and 2: 0 is a population of less than 300,000, 1 is a population between
300,000 and 1,000,000, and 2 is a population of 1,000,000 or more. The second new variable is
region: we sorted the states into four regions coded 0, 1, 2, and 3, where 0 is the Western
region, 1 is the Midwest region, 2 is the South region, and 3 is the Northeast region. Figure 1
displays all the variables and metric descriptions. The source link for the 2012-2013
Gallup-Healthways Well-Being Index data can be found in the bibliography.
Figure 1: Data Variables
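The two recodings described above can be sketched in R as follows. This is only an illustration: the `population` values, the `state` abbreviations, and the small `region_lookup` table are hypothetical, and the handling of a city sitting exactly on a boundary (300,000 or 1,000,000) is an assumption, since the text does not say which group it belongs to.

```r
# Hypothetical city populations (not taken from the Gallup-Healthways data)
population <- c(150000, 300000, 750000, 1000000, 2500000)

# Code population size as 0, 1, 2.
# right = FALSE makes the bins [.., 300000), [300000, 1000000), [1000000, ..),
# so a city at exactly 300,000 lands in group 1 (an assumption).
pop_group <- cut(population,
                 breaks = c(-Inf, 300000, 1000000, Inf),
                 labels = c(0, 1, 2),
                 right = FALSE)
pop_code <- as.integer(as.character(pop_group))

# Hypothetical state-to-region lookup:
# 0 = West, 1 = Midwest, 2 = South, 3 = Northeast
region_lookup <- c(CA = 0, IL = 1, TX = 2, NY = 3)
state <- c("CA", "TX", "NY", "IL", "CA")
region_code <- unname(region_lookup[state])

# Show the recoded variables side by side
data.frame(population, pop_code, state, region_code)
```

Keeping the codes numeric mirrors the dataset described here; they still need to be wrapped in `factor()` before being used as categorical predictors.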
2.2 Choosing the Right Variables
To build meaningful regression models, we decided it was important to first create a matrix
plot, shown in Figure 2. The matrix plot displays all the variables, making it easier to see
which variables have a strong relationship with Overall Well-being.
The top row of the matrix plot gives the correlation of each variable with Overall Well-being.
We see that x5: Percent Smoke has a strong relationship with Overall Well-being, with a
correlation of -0.718. x2: Percent Obese and x3: Percent Exercise also have moderately strong
relationships with Overall Well-being, with correlations of -0.661 and -0.532 respectively.
Moreover, x4: Percent Eat Produce Frequently and x9: Region have weak relationships, with
correlations of 0.267 and -0.343 respectively. Therefore, we decided to focus on Percent Obese,
Percent Exercise, Percent Eat Produce Frequently, Percent Smoke, and Region for creating the
regression models.
Figure 2: Matrix Plot of All Variables
(Matrix plot created with the ggpairs function from the GGally package)
Legend
x1: Overall Well-Being
x2: Percent Obese
x3: Percent Exercise
x4: Percent Eat Produce Frequently
x5: Percent Smoke
x6: Percent with Daily Stress
x7: Percent Uninsured
x8: Population Size
x9: Region
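As a numeric companion to the matrix plot, the correlation of each predictor with the response can be pulled out and ranked directly. The following is a minimal sketch using base R's `cor()`; the data frame `demo` is simulated for illustration only (its column names mirror the legend, but the values are not the Gallup-Healthways data).

```r
set.seed(1)
n <- 50

# Simulated predictors (illustrative values only)
smoke <- runif(n, 6, 35)
obese <- runif(n, 15, 35)

# Simulated response that decreases with both predictors
demo <- data.frame(
  Overall.Wellbeing = 75 - 0.3 * smoke - 0.2 * obese + rnorm(n),
  percent.Smoke = smoke,
  percent.Obese = obese
)

# Correlation of every predictor with the response (drop the response itself)
cors <- cor(demo)["Overall.Wellbeing", -1]

# Rank predictors by the strength (absolute value) of the correlation
ranked <- cors[order(abs(cors), decreasing = TRUE)]
round(ranked, 3)
```

Ranking by absolute value keeps strong negative relationships, such as the one reported for Percent Smoke, at the top of the list.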
2.3 Creating Regression Models & 3D-Scatter Plots
Following the project criteria, we were asked to produce four types of regression models: 1) two
quantitative predictors; 2) one quantitative predictor and one categorical predictor; 3) one
quantitative predictor, one categorical predictor, and an interaction term; 4) two quantitative
predictors and an interaction term. To create the regression models and 3D plots of the models,
we used different R functions and packages for each model.
For regression 1, Overall Well-being regressed on percent Smoke and percent Eat Produce
Frequently, we used the lm function to run the regression and the summary function to produce
the output. Then we installed the “scatterplot3d” package, which creates a 3D scatterplot of the
regression model. We passed percent Smoke as the x variable, percent Eat Produce Frequently as
the y variable, and Overall Well-being as the z variable to the scatterplot3d function, along
with other color and label options. Lastly, we added a plane by saving the regression model into
a variable and then using the plane3d function, which is part of the package, to draw the plane
on the scatterplot.
For regression 2, Overall Well-being was regressed on Region (categorical) and percent Smoke. We
used the factor function to convert the Region variable into a categorical variable. Then we
used the lm function to create the regression. For this model, we also used the scatterplot3d
package, passing Region as the x variable, percent Smoke as the y variable, and Overall
Well-being as the z variable to the scatterplot3d function, along with other color and label
options. Moreover, in order to create a regression line for each region, we subsetted the data
into the four regions, then ran four more regressions, one per region, of Overall Well-being on
percent Smoke. Then we used the four regression models to calculate predicted values for each
region, and used the points3d function to plot the predicted lines.
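To illustrate what `factor()` does for the regression, the sketch below shows how R expands a four-level region code into indicator columns, with the first level (here 0, the Western region) acting as the baseline. The `region` vector is a made-up example, and the short labels are chosen only for readability.

```r
# Made-up region codes: 0 = West, 1 = Midwest, 2 = South, 3 = Northeast
region <- c(0, 1, 2, 3, 2, 0)

# Convert the numeric codes to a factor with readable labels
fregion <- factor(region,
                  labels = c("Western", "Midwest", "South", "Northeast"))

# The design matrix lm() would build: an intercept plus one
# indicator column for each non-baseline level
X <- model.matrix(~ fregion)
colnames(X)
X
```

There is no column for the Western region: its effect is absorbed into the intercept, which is why the other regions' coefficients are read as differences from the West.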
For regression 3, Overall Well-being was regressed on Region (categorical), percent Eat Produce
Frequently, and the interaction term of the two. The Region variable had already been converted
to a categorical variable for model 2. We used a different package called “rockchalk”, which has
many functions that are useful for presenting regression models. Its mcGraph3 function creates a
3-dimensional scatterplot with a plane and supports an “interaction” option, something the
scatterplot3d function was unable to do directly. We passed Region as the x1 variable, percent
Eat Produce Frequently as the x2 variable, Overall Well-being as the y variable, and
interaction = TRUE, along with other options of the mcGraph3 function, such as the theta and phi
options, which rotate the displayed 3D plot.
For regression 4, Overall Well-being was regressed on percent Smoke, percent Eat Produce
Frequently, and the interaction term of percent Smoke and percent Eat Produce Frequently. We
used the rockchalk package again. As with regression 3, we used the mcGraph3 function. We passed
percent Smoke as the x1 variable, percent Eat Produce Frequently as the x2 variable, Overall
Well-being as the y variable, and interaction = TRUE, along with other options including the
theta option. For more information on the rockchalk and scatterplot3d packages, please see the
bibliography. Figure 3 displays the four regression models and the main functions and packages
used in each.
Figure 3: Regression Models
1) Regression Model 1: Two quantitative predictors
Overall well being ~ percent Smoke + percent Eat Produce Frequently
-- Main functions & packages used: lm function, summary function, scatterplot3d package.
2) Regression Model 2: One quantitative predictor & one categorical predictor
Overall well being ~ Region + percent Smoke
-- Main functions & packages used: subset function, lm function, summary function, points3d
function, scatterplot3d package.
3) Regression Model 3: One quantitative predictor, one categorical predictor, and interaction term
Overall well being ~ Region + percent Eat Produce Frequently + percent Eat Produce Frequently * Region
-- Main functions & packages used: lm function, summary function, mcGraph3 function, rockchalk
package.
4) Regression Model 4: Two quantitative predictors and interaction term
Overall well being ~ percent Smoke + percent Eat Produce Frequently + percent Eat Produce Frequently * percent Smoke
-- Main functions & packages used: lm function, summary function, mcGraph3 function, rockchalk
package.
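In R's formula syntax, `a * b` already expands to both main effects plus their interaction, so the formulas for Models 3 and 4 can be written more compactly without changing the fitted model. A quick check using `terms()`, with placeholder variable names `y`, `a`, and `b` that stand in for the real columns:

```r
# Three ways of writing the same model (placeholder names y, a, b)
long_form  <- y ~ a + b + a * b   # style used in Figure 3
short_form <- y ~ a * b           # compact equivalent
explicit   <- y ~ a + b + a:b     # main effects and interaction spelled out

# Extract the model terms each formula expands to
labels_long     <- attr(terms(long_form), "term.labels")
labels_short    <- attr(terms(short_form), "term.labels")
labels_explicit <- attr(terms(explicit), "term.labels")

labels_long
identical(labels_long, labels_short)
identical(labels_long, labels_explicit)
```

All three formulas yield the terms `a`, `b`, and `a:b`, so the choice between them is purely a matter of readability.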
3 Results
3.1 Regression Model 1: Overall Well-Being ~ %Smoke + %Eat Produce Frequently (%EPF)
Figure 4: R regression output for Model 1.
Figure 5: 3D scatter plot of Model 1 with regression plane. (Created in scatterplot3d)
The model includes two quantitative predictor variables, %Smoke and %Eat Produce Frequently,
with Overall Well-Being as the response variable. The R regression output for Model 1 can be
seen in Figure 4. Both predictor variables prove to be statistically significant in the presence
of one another, but their coefficients show opposite relationships with the response variable.
%Smoke shows a negative relationship with Overall Well-Being: each one-percentage-point increase
in %Smoke is associated with an expected decrease of 0.33908 in Overall Well-Being, in the
presence of %EPF.
On the other hand, %EPF has a positive relationship with Overall Well-Being: each
one-percentage-point increase in %EPF is associated with an expected increase of 0.10848 in
Overall Well-Being, in the presence of %Smoke.
These relationships can be seen visually in Figure 5. Looking along the x-axis, labeled %Smoke,
you can see the strong negative slope of the plane. Looking along the y-axis, labeled %Eat
Produce Frequently, you can see the moderate positive slope of the plane.
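As a worked illustration of the two Model 1 coefficients quoted in this section (0.33908 for %Smoke and 0.10848 for %EPF), the expected change in Overall Well-Being for a change of ΔSmoke and ΔEPF percentage points can be written as follows. The Δ notation is introduced here only for the example.

```latex
\Delta \widehat{\text{WellBeing}}
  = -0.33908\,\Delta\text{Smoke} + 0.10848\,\Delta\text{EPF}
```

For instance, comparing two cities where one has a smoking rate 5 points higher and a produce-eating rate 10 points higher, the expected difference is -0.33908(5) + 0.10848(10) = -0.6106 points of Overall Well-Being.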
3.2 Regression Model 2: Overall Well-Being ~ Region + %Smoke
The model consists of our response variable, Overall Well-Being, and two predictor variables,
Region and %Smoke, which are categorical and quantitative respectively. The variable %Smoke
proved to be significant in the presence of Region. Only one Region category produced a
significant p-value, but this is acceptable for what we wish to do with the data.
Figure 6: R regression output for Model 2.
Figure 7: 3D scatter plot of Model 2 with each Region’s regression line. (Created in
scatterplot3d)
We can see in Figure 6 that %Smoke has a negative coefficient, which is expected. This can be
interpreted as follows: for each one-percentage-point increase in %Smoke, there is an expected
decrease in Overall Well-Being of 0.365404, in the presence of the factor Region.
If we were to calculate the regression equations from these coefficients, the result would be
the same slope for all regions with slightly different intercepts. Instead, we used a different
method that gives each region its own slope, producing the regression lines in Figure 7, which
better display each Region’s relationship between %Smoke and Overall Well-Being.
To put it simply, we subsetted our data by region, then ran a regression of Overall Well-Being ~
%Smoke on each of the new data sets. We used those coefficients to create a separate regression
equation for each region, then plotted each equation on the 3D scatterplot as a red regression
line to best depict the effect of %Smoke on Overall Well-Being per Region.
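The subset-and-refit approach can be sketched compactly with `split()` and `lapply()`. The data below are simulated purely for illustration (including the `true_slopes` used to generate them), and the column names are assumed to match those in the annotated code.

```r
set.seed(42)
n <- 80

# Simulated data: four regions (0-3) with different true smoking slopes
wb_demo <- data.frame(
  Region = rep(0:3, each = n / 4),
  percent.Smoke = runif(n, 6, 35)
)
true_slopes <- c(-0.2, -0.3, -0.4, -0.5)
wb_demo$Overall.Wellbeing <- 72 +
  true_slopes[wb_demo$Region + 1] * wb_demo$percent.Smoke +
  rnorm(n, sd = 0.5)

# One regression per region: split the data, then fit lm() on each piece
fits <- lapply(split(wb_demo, wb_demo$Region),
               function(d) lm(Overall.Wellbeing ~ percent.Smoke, data = d))

# Collect intercept and slope for each region into one matrix
region_coefs <- t(sapply(fits, coef))
region_coefs

# For comparison: the additive model forces one shared slope
additive <- lm(Overall.Wellbeing ~ factor(Region) + percent.Smoke,
               data = wb_demo)
coef(additive)["percent.Smoke"]
```

The per-region fits recover a different slope for each region, while the additive model reports a single slope with region-specific intercepts, which is the contrast described in this section.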
3.3 Regression Model 3: Overall Well-Being ~ Region + %Eat Produce Frequently + Interaction
Term (R * %EPF)
Figure 8: 3D graph of Regression Model 3 with slopes of %Eat Produce Frequently with Overall
Well-being Score for each Region. Note: Without interaction.
*Blue is Western Region.
*Green is Midwest Region.
*Black is South Region.
*Purple is Northeast Region.
Figure 9: Linear Regression Model of the Regions (categorical) predictor and %Eat Produce
Frequently (numerical) predictor on one’s Overall Well-being Score (numerical response
variable) without an interaction.
This model has one numeric and one categorical predictor, with Overall Well-Being as the
response variable and an interaction term added to see how the two predictor variables interact
in their influence on the response variable. We will look at the model without the interaction
term first.
The Western Region serves as the reference (baseline) level of the categorical variable, so the
effects of the other regions are measured relative to it. The estimated intercept, which
corresponds to the Western Region, is 58.17639. The Midwest, South, and Northeast Regions have
estimated coefficients of -0.45399, -1.30124, and -1.57456 respectively, and percent Eat
Produce Frequently has an estimated slope of 0.15593. The differences between regions can be
seen in Figure 8.
Looking at the p-values in Figure 9, only the Midwest Region is not significant at the alpha =
0.05 level. Thus the South and Northeast Regions (each compared to the Western Region) and
percent Eat Produce Frequently are statistically significant at the 0.05 level.
The positive coefficient indicates that for every additional percentage point of %Eat Produce
Frequently, you can expect Overall Well-being to increase by an average of 0.16.
The negative coefficients indicate that you can expect Overall Well-being to be 1.30 lower in
the South, and 1.57 lower in the Northeast, compared to the Western Region.
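Written out with indicator variables (each equal to 1 if the city is in that region and 0 otherwise) and X for %Eat Produce Frequently, the no-interaction model can be summarized as the single fitted equation below. This assumes that 58.17639 is the intercept for the baseline Western Region; the indicator notation is introduced here for illustration.

```latex
\widehat{\text{WellBeing}}
  = 58.17639
  - 0.45399\, I_{\text{Midwest}}
  - 1.30124\, I_{\text{South}}
  - 1.57456\, I_{\text{Northeast}}
  + 0.15593\, X
```

For a Western city all three indicators are 0, leaving 58.17639 + 0.15593 X. Each other region shifts the intercept by its coefficient while sharing the same slope in X.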
Figure 10: Linear Regression Model of the Regions (categorical) predictor and %Eat Produce
Frequently (numerical) predictor on one’s Overall Well-being Score (numerical response
variable) with an interaction.
Figure 11: 3D graph of Regression Model 3 with slopes of %Eat Produce Frequently with Overall
Well-being Score for each Region with interaction plane.
Once the interaction term is introduced, the slope coefficients for the two variables become
functions of one another. Each of the other three regions now gets its own slope for %Eat
Produce Frequently, expressed relative to the Western Region as the baseline.
For simplicity, let’s refer to %Eat Produce Frequently as X in the following equations.
From Figure 10, the fitted regression line for each region is:
Western Region: R0 = 57.02 + 0.175*X
Midwest Region: R1 = 66.1006 + 0.00668*X
South Region: R2 = 38.18 + 0.473*X
Northeast Region: R3 = 62.210 + 0.0577*X
Figure 10 shows that only the South Region, percent Eat Produce Frequently, and the interaction
of the South Region with percent Eat Produce Frequently are statistically significant at alpha =
0.10. The interaction term causes the regression plane to curve slightly, because the effect of
the categorical variable changes with each value of the numeric variable, as shown in Figure 11.
To interpret the statistically significant predictor variables, consider R2 = 38.18 + 0.473*X.
A value X of percent Eat Produce Frequently (ranging from 47 to 65 percent) multiplied by 0.473,
plus 38.18, gives the predicted Overall Well-being Score for the South Region. For example, a
value of 56 for percent Eat Produce Frequently gives 38.18 + (0.473)*(56), a predicted Overall
Well-being Score of 64.668 for the South Region. Continuing the example, as %Eat Produce
Frequently increases, the predicted Overall Well-being Score increases. This suggests that
eating five or more servings of fruits and vegetables four or more days per week is positively
associated with one’s Overall Well-being score.
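The worked example for the South Region can be reproduced for any region with a small helper. The intercepts and slopes below are the ones listed for Figure 10; the data frame `region_lines` and the function `predict_wellbeing()` are hypothetical names introduced only for this sketch.

```r
# Intercepts and slopes of the per-region lines, as listed for Figure 10
region_lines <- data.frame(
  region    = c("West", "Midwest", "South", "Northeast"),
  intercept = c(57.02, 66.1006, 38.18, 62.210),
  slope     = c(0.175, 0.00668, 0.473, 0.0577)
)

# Hypothetical helper: predicted Overall Well-being for a region at a
# given value of %Eat Produce Frequently (epf)
predict_wellbeing <- function(region, epf) {
  row <- region_lines[region_lines$region == region, ]
  row$intercept + row$slope * epf
}

# The example from the text: South Region at 56 percent
south_56 <- predict_wellbeing("South", 56)
south_56  # 64.668
```

Because the lines only describe the observed data, the helper is meaningful for values of %Eat Produce Frequently within roughly the 47 to 65 percent range mentioned above.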
3.4 Regression Model 4: Overall Well-Being ~ %Smoke + %Eat Produce Frequently + Interaction
Term (%Smoke * %EPF)
Figure 12 displays the model in which Overall Well-Being is regressed on two quantitative
variables, %Smoke and %Eat Produce Frequently, with an interaction term added. This allows us to
see how the two predictor variables interact in their influence on the response variable.
Without an interaction term, %Smoke has a slope of -0.33908 and %Eat Produce Frequently has a
slope of 0.10845 in the presence of one another when predicting Overall Well-Being (more
information about these relationships can be found under Model 1). These individual trends still
exist after adding the interaction term, but they become intertwined with one another. The
interaction term proved to be significant at an alpha level of 0.1, and the model was able to
capture 53.9% of the variation in Overall Well-Being, as seen from the adjusted R-squared value.
Figure 12: R regression output for
Regression Model 4.
Upon introducing the interaction term, the slope coefficients for the two variables become
functions of one another. The slope of %Smoke becomes 0.6066 - 0.016449*%EPF, and the slope of
%Eat Produce Frequently becomes 0.444170 - 0.016449*%Smoke. This interaction causes the
regression plane to curve, because the slope for each predictor variable changes with each value
of the other predictor variable, as seen in Figure 13. For example, the slope of %Eat Produce
Frequently is 0.4914 when %Smoke is equal to 7, while the slope changes to 0.0424 when %Smoke is
equal to 34.3.
A simple way to interpret this is to look once again at the slope of each variable as a
function of the other. Continuing the previous example, as smoking rates increase across
cities, the positive effect of eating produce diminishes. It is also worth pointing out that if
the trend continues, say for a city with a smoking rate above 40%, the slope of %Eat Produce
Frequently would actually become negative, so the negative effects of smoking would outweigh the
positive effects of eating well.
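In general terms, and without relying on any specific estimate, a two-predictor model with an interaction has the form below, where S stands for %Smoke, E for %Eat Produce Frequently, and the β symbols for the fitted coefficients. This notation is introduced here for illustration and does not appear in the R output.

```latex
\hat{y} = \beta_0 + \beta_1 S + \beta_2 E + \beta_3 S E,
\qquad
\frac{\partial \hat{y}}{\partial S} = \beta_1 + \beta_3 E,
\qquad
\frac{\partial \hat{y}}{\partial E} = \beta_2 + \beta_3 S
```

With a negative interaction coefficient β3, the slope with respect to E shrinks as S grows and changes sign at S = -β2/β3, which is the mechanism behind the diminishing effect of eating produce described above.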
Figure 13: Two 3D graphs of Regression Model 4 with
an interaction plane.
4 Discussion
4.1 Findings
Based on the regression models and 3-dimensional scatterplots that we have created, some of our
key findings include:
I) Positive association between eating produce frequently and overall well-being (Models 1, 3, & 4)
II) Negative association between smoking and overall well-being (Models 1, 2, & 4)
III) The associations of eating produce frequently and smoking with overall well-being differ
depending on region (Models 2 & 3).
IV) A significant interaction between the two variables, eating produce frequently and smoking,
in their effect on overall well-being (Model 4).
4.2 Learnings
We learned how to effectively create detailed regression models and 3-dimensional scatter-plots
in R software with a list of diverse functions in the rockchalk and scatterplot3d packages. We
learned how to interpret the different regression models that we created, especially the models
with interaction terms within.
4.3 Challenges
Our main challenge was finding an appropriate package that would allow us to create and graph
3-dimensional regression models with interaction terms. We struggled with this immensely until
we found the rockchalk package, which allowed us to overcome this obstacle.
4.4 Suggestions
Explore other packages to find the most useful one with respect to your study before delving
deeper into one package only to ditch it later. Use a data set with many varying types of
variables to produce better models. Find a dataset with a bigger sample size to create better
models with higher power. For example, our dataset only looked at 189 cities. There are many
smaller counties with populations of less than 300,000, so a researcher could gather more data
from these smaller counties.
4.5 Going Onward
Some other research ideas that we could pursue from this project:
I) Which produce actually has the biggest impact on overall well-being? E.g., fruits: apples,
oranges, strawberries; vegetables: mushrooms, asparagus, peas, etc.
II) How many cigarettes does one have to smoke to feel the negative effects on overall
well-being?
III) Since this project was done at a macro level, looking at U.S. cities, we could attempt
research at a micro level, on an individual basis. For example, we could survey a random sample
of college students, working professionals, and senior citizens, and collect more variables and
factors affecting their overall well-being.
5 Bibliography
Barret Schloerke, Jason Crowley, Di Cook, Heike Hofmann, Hadley Wickham, Francois Briatte
and Moritz Marbach (2014). GGally: Extension to ggplot2. R package version 0.4.6.
http://CRAN.R-project.org/package=GGally
Ligges, U. and Mächler, M. (2003). Scatterplot3d - an R Package for Visualizing Multivariate
Data. Journal of Statistical Software 8(11), 1-20.
Paul E. Johnson (2013). rockchalk: Regression Estimation and Presentation. R package version
1.8.0. http://CRAN.R-project.org/package=rockchalk
R Core Team (2013). R: A language and environment for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
"U.S. Community Well-Being Tracking." U.S. Community Well-Being Tracking. Gallup, 1 Jan. 2013.
Web. 7 June 2014. http://www.gallup.com/poll/145913/City-Wellbeing-Tracking.aspx?ref=s
6 Annotated Code
###########################################
# Stat 331 Geometry Proj #
# Hubert Lo Gregory Knothe Crystal Macias #
###########################################
#Setting up R, loading/inspecting the data set
rm(list=ls())
getwd()
wb <- read.csv(file.choose())
head(wb)
names(wb)
str(wb)
#Installing needed R packages
install.packages("scatterplot3d")
library(scatterplot3d)
install.packages("rockchalk")
library(rockchalk)
install.packages("ggplot2")
library(ggplot2)
install.packages("GGally")
library(GGally)
#Creating a correlation matrix for initial analysis with the help of the package GGally
ggpairs(wb, columns= 2:10)
##############################################################################
###############################
# Regression Model 1 #
#Using factor() to mold wb$Region into a factor (Note: We use this variable in later models)
fregion=factor(wb$Region, labels=c("Western Region", "Midwest Region", "South Region",
"Northeast Region"))
is.factor(fregion)
#Using factor() to mold wb$Population.size into a factor (Note: We ended up not using this
#variable)
fpopsize <- factor(wb$Population.size, labels=c("Population<300,000",
"Population 300,000-1,000,000","Population 1,000,000+"))
is.factor(fpopsize)
#Running a regression of %Smoke and %EPF on Overall.Wellbeing with the lm() function
mod1=lm(Overall.Wellbeing~ percent.Smoke + percent.Eat.Produce.Frequently, data=wb)
summary(mod1)
#Creating a 3D scatterplot of Model 1 with appropriate labels
s3d<-scatterplot3d(wb$percent.Smoke, wb$percent.Eat.Produce.Frequently,
wb$Overall.Wellbeing,color="red", pch=19, type= "h",
xlab="% Smoke", ylab="% Eat Produce Frequently", zlab="Overall well being",
main= "Overall well being regressed on % smoke & % eat produce frequently")
#Saving the regression Model 1 into a variable "plane"
plane<- lm(wb$Overall.Wellbeing ~ wb$percent.Smoke + wb$percent.Eat.Produce.Frequently)
#Graphing the regression plane of Model 1 onto the existing 3D scatterplot
s3d$plane3d(plane)
##############################################################################
###############################
# Regression Model 2 #
#Running a regression of %Smoke and Region on Overall.Wellbeing with the lm() function
#(the smoking column is named percent.Smoke, as in the other models)
mod2 = lm(wb$Overall.Wellbeing ~ factor(wb$Region) + wb$percent.Smoke)
summary(mod2)
#Creating a 3D scatterplot of Model 2 with appropriate labels
s3d = scatterplot3d(wb$Region, wb$percent.Smoke, wb$Overall.Wellbeing,
pch=16, highlight.3d=TRUE, main="Overall Wellbeing ~ Region + Smoke",
xlab = "Region", ylab="%Smoke", zlab="Overall Wellbeing")
#Creates subsets of data for each Region to perform a regression on, to obtain the slopes for
#each region
R0 = subset(wb, wb$Region==0)
R1 = subset(wb, wb$Region==1)
R2 = subset(wb, wb$Region==2)
R3 = subset(wb, wb$Region==3)
#Creating a vector filled with values of %Smoke to later use to graph the regression lines for
#each region
x = seq(5.7,35,.1)
#Creating a regression equation for Region0, then using a for loop to create predicted values
#for each value in the array "x" to plot the regression line for Region0
lm0 = lm(R0$Overall.Wellbeing ~ R0$percent.Smoke)
pred=NULL
for (i in 1:length(x)) {
pred[i] = coef(lm0)[1] + coef(lm0)[2]*x[i] }
#Plotting the predicted values in a line to display the regression equation for Region0
s3d$points3d(rep(0,length(x)), x, pred, col="red", pch=16, cex=.5)
#Creating a regression equation for Region1, then using a for loop to create predicted values
#for each value in the array "x" to plot the regression line for Region1
lm1 = lm(R1$Overall.Wellbeing ~ R1$percent.Smoke)
pred=NULL
for (i in 1:length(x)) {
pred[i] = coef(lm1)[1] + coef(lm1)[2]*x[i] }
#Plotting the predicted values in a line to display the regression equation for Region1
s3d$points3d(rep(1,length(x)), x, pred, col="red", pch=16, cex=.5)
#Creating a regression equation for Region2, then using a for loop to create predicted values
#for each value in the array "x" to plot the regression line for Region2
lm2 = lm(R2$Overall.Wellbeing ~ R2$percent.Smoke)
pred=NULL
for (i in 1:length(x)) {
pred[i] = coef(lm2)[1] + coef(lm2)[2]*x[i] }
#Plotting the predicted values in a line to display the regression equation for Region2
s3d$points3d(rep(2,length(x)), x, pred, col="red", pch=16, cex=.5)
#Creating a regression equation for Region3, then using a for loop to create predicted values
#for each value in the array "x" to plot the regression line for Region3
lm3 = lm(R3$Overall.Wellbeing ~ R3$percent.Smoke)
pred=NULL
for (i in 1:length(x)) {
pred[i] = coef(lm3)[1] + coef(lm3)[2]*x[i] }
#Plotting the predicted values in a line to display the regression equation for Region3
s3d$points3d(rep(3,length(x)), x, pred, col="red", pch=16, cex=.5)
##############################################################################
###############################
# Regression 3 #
#Running a regression of Region and %EPF on Overall.Wellbeing with the lm() function
rm3=lm(Overall.Wellbeing~fregion + percent.Eat.Produce.Frequently, data=wb)
summary(rm3)
#Color coordinates the regions (as.character() passes the color names rather than a factor)
col = as.character(factor(wb$Region, labels=c("blue", "green", "black", "purple")))
#Creating a 3D scatterplot of Model 3 with appropriate labels
RM3 <- scatterplot3d(x=wb$Region, y=wb$percent.Eat.Produce.Frequently,
z=wb$Overall.Wellbeing,
highlight.3d = F,
color=col,
main = "3D Scatterplot: Region vs % Eat Produce Frequently",
xlab="4 different Regions",
ylab="%Eat Produce Frequently",
zlab="Overall Well being")
#Creates subsets of data for each Region to perform a regression on, to obtain the slopes for
#each region
R0 = subset(wb, wb$Region==0)
R1 = subset(wb, wb$Region==1)
R2 = subset(wb, wb$Region==2)
R3 = subset(wb, wb$Region==3)
#Creating a vector filled with values of %EPF to later use to graph the regression lines for
#each region
x=NULL
x = seq(45,70,.01)
#Creating a regression equation for Region0, then using a for loop to create predicted values
#for each value in the array "x" to plot the regression line for Region0
lm0 = lm(R0$Overall.Wellbeing ~ R0$percent.Eat.Produce.Frequently)
pred=NULL
for (i in 1:length(x)) {
pred[i] = coef(lm0)[1] + coef(lm0)[2]*x[i] }
#Plotting the predicted values in a line to display the regression equation for Region0
RM3$points3d(rep(0,length(x)), x, pred, col="red", pch=16, cex=.5)
#Creating a regression equation for Region1, then using a for loop to create predicted values
#for each value in the array "x" to plot the regression line for Region1
lm1 = lm(R1$Overall.Wellbeing ~ R1$percent.Eat.Produce.Frequently)
pred=NULL
for (i in 1:length(x)) {
pred[i] = coef(lm1)[1] + coef(lm1)[2]*x[i] }
#Plotting the predicted values in a line to display the regression equation for Region1
RM3$points3d(rep(1,length(x)), x, pred, col="red", pch=16, cex=.5)
#Creating a regression equation for Region2, then using a for loop to create predicted values
#for each value in the array "x" to plot the regression line for Region2
lm2 = lm(R2$Overall.Wellbeing ~ R2$percent.Eat.Produce.Frequently)
pred=NULL
for (i in 1:length(x)) {
pred[i] = coef(lm2)[1] + coef(lm2)[2]*x[i] }
#Plotting the predicted values in a line to display the regression equation for Region2
RM3$points3d(rep(2,length(x)), x, pred, col="red", pch=16, cex=.5)
#Creating a regression equation for Region3, then using a for loop to create predicted values
#for each value in the array "x" to plot the regression line for Region3
lm3 = lm(R3$Overall.Wellbeing ~ R3$percent.Eat.Produce.Frequently)
pred=NULL
for (i in 1:length(x)) {
pred[i] = coef(lm3)[1] + coef(lm3)[2]*x[i] }
#Plotting the predicted values in a line to display the regression equation for Region3
RM3$points3d(rep(3,length(x)), x, pred, col="red", pch=16, cex=.5)
#Running a regression of Region, %EPF, and an interaction term on Overall.Wellbeing with the
#lm() function
mod3=lm(Overall.Wellbeing~ fregion + percent.Eat.Produce.Frequently +
fregion*percent.Eat.Produce.Frequently, data=wb)
summary(mod3)
#Creates a 3D graph of Model 3 (with the interaction term) with the mcGraph3() function in the
#rockchalk package, with all the appropriate labels and an interaction plane
mcGraph3(x1=wb$Region, x2=wb$percent.Eat.Produce.Frequently, y=wb$Overall.Wellbeing,
interaction=TRUE,
main = "3D Graphic with Region & Eat healthy Frequently
Interaction Term",
x1lab="4 different regions
West, Midwest, South, Northeast",
x2lab="
Servings of fruits &
vegetables days per week",
ylab="Overall Wellbeing")
##############################################################################
###############################
# Regression 4 #
#Running a regression of %Smoke, %EPF, and an interaction term on Overall.Wellbeing with
#the lm() function
mod4=lm(Overall.Wellbeing ~ percent.Smoke + percent.Eat.Produce.Frequently +
percent.Smoke*percent.Eat.Produce.Frequently, data= wb)
summary(mod4)
#Splits the graphics device in R into two parts to plot both points of view of the regression
#model in the following code
par(mfrow=c(1,2))
#Creates a 3D graph of Model 4 (with the interaction term) with the mcGraph3() function in the
#rockchalk package, with all the appropriate labels and an interaction plane at 90 degrees to
#better view %EPF's slopes
mcGraph3(x1= wb$percent.Smoke, x2= wb$percent.Eat.Produce.Frequently,
y=wb$Overall.Wellbeing, interaction = TRUE,
main=" Overall Well Being regressed on smoke,
eat produce frequently, &
percent smoke *percent frequently - 90 degrees", theta= 90,
cex.main=1)
#Same as the above code, but the graph is displayed at a 0 degree angle to better view
#%Smoke's slopes
mcGraph3(x1= wb$percent.Smoke, x2= wb$percent.Eat.Produce.Frequently,
y=wb$Overall.Wellbeing, interaction = TRUE,
main=" Overall Well Being regressed on smoke,
eat produce frequently, &
percent smoke *percent frequently - 0 degrees", theta= 0,
cex.main=1)