# Constructing a Simple Linear Regression with R (not for beginners)
## Simple Linear Regression With R ##
# A basic grounding in statistics is needed to follow this work sheet and the terms used in it.
# Here we use mtcars, a data set built into R; external data can be used as well
?mtcars # open the help page to understand what the data set contains
data(mtcars) # load the built-in data set mtcars into the workspace
names(mtcars) # see the names of the variables in the data set mtcars
attach(mtcars) # attach mtcars to the search path so its columns can be referred to by name
plot(cyl, hp,main = "Scatterplot of Number of cylinders & Gross horsepower") #scatterplot of the data
cor(cyl, hp) #to find the correlation between two variables
fix(mtcars) #to change the data set if there is any mistake
?lm # help page of the function for fitting linear models
reg1 <- lm(hp ~ cyl) # fit the regression for the bivariate data hp and cyl, predicting horsepower from the number of cylinders: hp = y and cyl = x
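# A possible alternative sketch: the same model can be fitted without attach() by passing the data frame through the data argument of lm(). The object name fit_data is only illustrative and is not part of the original work sheet.

```r
# Fit hp on cyl straight from the data frame, so no attach() is needed
fit_data <- lm(hp ~ cyl, data = mtcars)

# Intercept and slope of the fitted line
coef(fit_data)
```

# Passing data = mtcars keeps the column lookup explicit, which avoids surprises if another object named hp or cyl exists in the workspace.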
reg1 # to see the object reg1
summary(reg1) # gives the whole summary for the regression equation
# The summary first describes the residuals (errors): minimum, first quartile (1Q), median, third quartile (3Q) and maximum
# Estimate gives the coefficients of the regression equation, here y = -51.054 + 31.958x
# Std. Error is the standard error of the intercept and of the slope (the coefficient of the independent variable x)
# t value is the test statistic for the null hypothesis that the intercept, or the slope, equals 0; the test on the intercept is often not of interest.
# Pr(>|t|) is the p-value of each of those tests; for cyl it is 3.48e-09, far below the standard cut-off of 0.05, so cyl is a statistically significant predictor of hp
# Residual standard error measures the variation of individual observations around the regression line; it is the square root of the residual mean square, where each residual is e = y - yhat
# R-squared is 0.693, which means about 69% of the variation in the dependent variable (y) is explained by the independent variable (x).
# Adjusted R-squared is R-squared penalised for the number of predictors in the model
# F-statistic is the F ratio for the hypothesis test that all the slope coefficients in the model are 0. It is used to see whether the independent variables, taken together, are significant explanatory variables of the dependent variable (y). With a single predictor it is equivalent to the t-test on the slope.
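# As a rough check on the summary output described above, R-squared, adjusted R-squared and the F statistic can be recomputed from the sums of squares. This is only a sketch: the object names (fit, sse, sst and so on) are illustrative and not part of the original work sheet.

```r
# Same hp ~ cyl regression, fitted with the data argument
fit <- lm(hp ~ cyl, data = mtcars)

sse <- sum(residuals(fit)^2)                  # residual (error) sum of squares
sst <- sum((mtcars$hp - mean(mtcars$hp))^2)   # total sum of squares
n   <- nrow(mtcars)                           # number of observations
p   <- 1                                      # number of predictors

r2     <- 1 - sse / sst                                   # R-squared
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)            # adjusted R-squared
f_stat <- ((sst - sse) / p) / (sse / (n - p - 1))         # F statistic
f_pval <- pf(f_stat, p, n - p - 1, lower.tail = FALSE)    # p-value of the F test

c(r2 = r2, adj_r2 = adj_r2, f_stat = f_stat, f_pval = f_pval)
```

# In simple linear regression r2 also equals the squared correlation between x and y, i.e. cor(mtcars$cyl, mtcars$hp)^2.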
attributes(reg1) #we can see the list of attributes associated with the object reg1
reg1$rank # each attribute of the object can be accessed with $, here the rank of the model
reg1$coefficients # We ask for the coefficient of the model reg1
abline(reg1, lwd = 3, col = 2) # plot the regression line on the scatterplot
confint(reg1) # 95% confidence intervals for the coefficients
confint(reg1, level = 0.99) # confidence intervals at the 99% confidence level
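# To see where the limits returned by confint() come from, they can be rebuilt as estimate -/+ t critical value * standard error. A minimal sketch follows; the names fit, est, se, t_crit and ci_manual are illustrative only.

```r
# Same hp ~ cyl regression, fitted with the data argument
fit <- lm(hp ~ cyl, data = mtcars)

est    <- coef(summary(fit))[, "Estimate"]     # coefficient estimates
se     <- coef(summary(fit))[, "Std. Error"]   # their standard errors
t_crit <- qt(0.975, df = df.residual(fit))     # two-sided 95% critical value

# Lower and upper limits: estimate -/+ t critical value * standard error
ci_manual <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
ci_manual
```

# For a 99% interval the quantile 0.975 would be replaced by 0.995.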
predict(reg1) # predicted hp for each car, based on its number of cylinders
predict(reg1, interval = "prediction") # prediction intervals for the predicted values
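# predict() can also be applied to new observations through its newdata argument. The sketch below assumes three hypothetical cars with 4, 6 and 8 cylinders; the names fit, new_cars, pred and by_hand are illustrative and not part of the original work sheet.

```r
# Same hp ~ cyl regression, fitted with the data argument
fit <- lm(hp ~ cyl, data = mtcars)

# Hypothetical new observations: the column name must match the predictor
new_cars <- data.frame(cyl = c(4, 6, 8))

# Point predictions with 95% prediction intervals (columns fit, lwr, upr)
pred <- predict(fit, newdata = new_cars, interval = "prediction")
pred

# The point prediction is simply intercept + slope * cyl
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["cyl"]] * new_cars$cyl
by_hand
```

# Using interval = "confidence" instead would give the narrower interval for the mean hp at each number of cylinders.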
anova(reg1) #ANOVA table for the regression model
sqrt(1491) # square root of the residual mean square from the ANOVA table = residual standard error
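# Instead of typing the mean square by hand, the value can be read from the ANOVA table itself. A small sketch, with illustrative object names (fit, aov_tab, ms_res, rse):

```r
# Same hp ~ cyl regression, fitted with the data argument
fit <- lm(hp ~ cyl, data = mtcars)

aov_tab <- anova(fit)                        # ANOVA table of the model
ms_res  <- aov_tab["Residuals", "Mean Sq"]   # residual mean square
rse     <- sqrt(ms_res)                      # residual standard error
rse
```

# The result should agree with the residual standard error stored in summary(fit)$sigma.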
rm(list = ls()) # remove all objects from the workspace
## By Jerome Gomes ##