Modeling GPA of NC State students using multiple linear regression
2015
ST 495 FINAL PROJECT
OLUSHEYE ERUJA
NORTH CAROLINA STATE UNIVERSITY
Data Description:
The dataset contains the GPA (grade point average), Study hours (hours spent studying per week),
Working hours (hours spent working per week) and Sleep hours (hours spent sleeping per week) for 30
North Carolina State University students who work and go to college. The variable “GPA” is the
dependent variable, while the variables “Study”, “Working” and “Sleep” are the independent variables
or predictors.
Objectives:
We wish to determine whether study hours, working hours and sleep hours are useful in predicting the GPA of
students. We also wish to find the 95% confidence interval for the regression coefficient of the
“Study” predictor. We therefore model the GPA of students using multiple linear regression.
Methods:
We use multiple linear regression (MLR) to analyze the data. We start by using scatter plots with fitted
regression lines to check the linear relationship between the dependent variable and each predictor. Then
we use PROC CORR to check the correlation between all the variables (both x and y variables). We then
check the assumptions of regression: i) linearity between the y variable and the predictors, ii) constant
variance of the residuals, iii) normality of the residuals. Next we find the confidence interval for the slope of
the predictor “Study” (I chose the “Study” parameter because it is the only x variable that has a strong
positive correlation with GPA). Finally, we use MLR to model the GPA of students using the hours of
study, hours of working and hours of sleeping as predictors.
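The model described in the Methods can be written out explicitly. The block below is a sketch in standard MLR notation; the symbols \beta_0 to \beta_3 correspond to the intercept and to B1, B2, B3 used in this report, and the index i and the error term \varepsilon_i are notation assumed here, not taken from the original.

```latex
\mathrm{GPA}_i = \beta_0 + \beta_1\,\mathrm{Study}_i + \beta_2\,\mathrm{Working}_i
               + \beta_3\,\mathrm{Sleep}_i + \varepsilon_i,
\qquad i = 1, \dots, 30,
\qquad \varepsilon_i \sim N(0, \sigma^2) \text{ independently}
```

The assumptions listed above (linearity, constant variance, normality of residuals) are exactly the conditions placed on the mean function and on \varepsilon_i in this form.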
Results:
[Figure: regression plot for GPA; x-axis: Study, y-axis: GPA]
Parameter Estimates
Variable   Label      DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  95% Confidence Limits
Intercept  Intercept   1             3.73197         0.40635     9.18    <.0001    2.89671    4.56724
Study      Study       1             0.01935         0.00606     3.19    0.0036    0.00690    0.03179
Working    Working     1            -0.02649         0.00486    -5.45    <.0001   -0.03647   -0.01651
Sleep      Sleep       1            -0.01048         0.00352    -2.98    0.0062   -0.01771   -0.00325
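Reading the parameter estimates as a fitted equation may make them easier to use. The coefficients below are taken from the table; the input values in the worked example (30 study hours, 20 working hours, 50 sleep hours per week) are hypothetical and chosen only for illustration.

```latex
\widehat{\mathrm{GPA}} = 3.73197 + 0.01935\,\mathrm{Study}
                       - 0.02649\,\mathrm{Working} - 0.01048\,\mathrm{Sleep}

% Hypothetical student: Study = 30, Working = 20, Sleep = 50
\widehat{\mathrm{GPA}} = 3.73197 + 0.01935(30) - 0.02649(20) - 0.01048(50)
                       = 3.73197 + 0.5805 - 0.5298 - 0.5240
                       \approx 3.26
```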
From the residual plots, i.e. “Residual vs Predicted value”, “Residual vs Study”, “Residual vs
Working” and “Residual vs Sleep”, we observe that the points are scattered randomly in a
horizontal band around zero, so we conclude that the residuals have constant variance. We also observe
that the points in the “Residual vs Quantile” plot follow the fitted line, so we conclude that the residuals are
normally distributed. All predictors have a linear relationship with GPA, the residuals have constant variance
and the residuals are normally distributed, therefore none of the assumptions for MLR seems to be violated.
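The residual checks described here could be reproduced along the following lines. This is a minimal sketch, not the code used for the report: the dataset name resid_out and the variable names predicted and residual are illustrative, and the formal normality tests requested with the NORMAL option are an addition that the original analysis does not mention.

```sas
ods graphics on;

/* Refit the model, request diagnostic and residual-by-regressor plots,
   and save predicted values and residuals (names are illustrative) */
proc reg data=students plots=(diagnostics residuals);
    model GPA = Study Working Sleep;
    output out=resid_out p=predicted r=residual;
run;
quit;

/* Q-Q plot of the residuals plus formal normality tests
   (the tests are an assumption of this sketch, not part of the report) */
proc univariate data=resid_out normal;
    var residual;
    qqplot residual / normal(mu=est sigma=est);
run;
```

With only 30 observations the plots remain the main evidence; the tests are best read as a supplement to them.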
Using the output generated from the PROC REG statement, we test whether Study hours, Working hours
and Sleeping hours are useful for predicting the GPA of students (H0: B1 = B2 = B3 = 0 vs. HA: at least
one of B1, B2, B3 is not equal to 0). The F-value is 97.70 and the p-value is “< 0.0001”. Since the p-value is
less than the significance level of 0.05, we have enough evidence to reject the null hypothesis and conclude
that the partial slope of at least one of Study, Working, Sleep is significantly different from 0.
So we conclude that the overall model is statistically significant.
[Figure: Residual by Regressors for GPA; panels: Sleep, Working, Study; y-axis: Residual]
The R^2 value indicates that 91.85% of the variation in GPA of students can be explained by the study
hours, working hours and sleeping hours.
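The R^2 value and the overall F statistic reported for this model are two views of the same fit. As a consistency check, the standard identity below recovers the F-value from R^2, using n = 30 students and k = 3 predictors as stated in the report; the small difference from 97.70 comes from rounding R^2 to four decimals.

```latex
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
  = \frac{0.9185 / 3}{(1 - 0.9185) / 26}
  \approx \frac{0.30617}{0.003135}
  \approx 97.7
```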
The t-values are used to test the significance of individual predictors. H0: B1 = 0, H0: B2 = 0, H0: B3 = 0
(each at the 0.05 significance level). The predictor “Study” has a t-value of 3.19 and a p-value of 0.0036, so we
reject the null hypothesis and conclude that “Study” is a statistically significant predictor and B1 is not
equal to 0. The predictor “Working” has a t-value of -5.45 and a p-value of “<0.0001”, which is less than
0.05, so we reject the null hypothesis for this predictor and conclude that it is a statistically significant
predictor. The “Sleep” predictor has a t-value of -2.98 and a p-value of 0.0062, which is also less than
0.05, so we reject the null hypothesis (H0: B3 = 0) and conclude it is a statistically significant
predictor.
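Each t-value is the parameter estimate divided by its standard error, with n - k - 1 = 30 - 3 - 1 = 26 degrees of freedom. The arithmetic below uses the estimates and standard errors from the Parameter Estimates table; the symbols b_j and SE(b_j) are notation assumed here.

```latex
t_j = \frac{b_j}{\mathrm{SE}(b_j)}, \qquad \mathrm{df} = 26

t_{\mathrm{Study}}   = \frac{0.01935}{0.00606} \approx 3.19, \qquad
t_{\mathrm{Working}} = \frac{-0.02649}{0.00486} \approx -5.45, \qquad
t_{\mathrm{Sleep}}   = \frac{-0.01048}{0.00352} \approx -2.98
```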
B1 (partial slope parameter for Study) is 0.01935. This can be interpreted as: for every 1 hr increase in
Study hours, the GPA is expected to increase by 0.01935 points, holding Working and Sleep hours fixed.
B2 (partial slope parameter for Working) is -0.02649. This can be interpreted as: for every 1 hr increase in
Working hours, the GPA is expected to decrease by 0.02649 points, holding Study and Sleep hours fixed.
B3 (partial slope parameter for Sleep) is -0.01048. This can be interpreted as: for every 1 hr increase in
Sleep hours, the GPA is expected to decrease by 0.01048 points, holding Study and Working hours fixed.
The 95% confidence interval for B1 (Study parameter) is (0.00690, 0.03179), which can be interpreted as “We
are 95% confident that the true slope of the parameter is between 0.00690 and 0.03179” or “We are
95% confident that for every 1 hr increase in Study hours, the GPA will increase by between 0.00690 and
0.03179 points on average, given fixed values of Working and Sleep hours.”
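The interval for B1 can be reproduced by hand from the estimate and standard error of “Study”. The critical value t_{0.975, 26} \approx 2.056 is a standard table value assumed here, not a number reported in the SAS output.

```latex
b_1 \pm t_{0.975,\,26}\,\mathrm{SE}(b_1)
  = 0.01935 \pm 2.056 \times 0.00606
  \approx 0.01935 \pm 0.01246
  \approx (0.0069,\ 0.0318)
```

This agrees with the reported limits (0.00690, 0.03179) up to rounding of the estimate and standard error.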
Summary:
In conclusion, we modeled the GPA of 30 NC State students using MLR with their study hours, working
hours and sleeping hours as predictors. No MLR assumptions for the residuals were violated. All the predictors
are statistically significant according to the t-tests conducted. The overall model is statistically significant
according to the F and p values, which means at least one of B1, B2 and B3 is not equal to 0. The results
suggest that it is advisable for students to work and sleep less and study more to have a high GPA.
Appendix:
proc import out=students
    datafile="/folders/myshortcuts/myfolder/sasfinalproject (1).xlsx"
    dbms=xlsx;
    getnames=yes;
run;

/* Examining linear relationship between GPA and Study hours per week */
proc sgplot data=students;
    scatter x=Study y=GPA;
    reg x=Study y=GPA;
run;

/* Examining linear relationship between GPA and Working hours */
proc sgplot data=students;
    scatter x=Working y=GPA;
    reg x=Working y=GPA;
run;

/* Examining linear relationship between GPA and Sleep hours */
proc sgplot data=students;
    scatter x=Sleep y=GPA;
    reg x=Sleep y=GPA;
run;

/* Examining relationship between all variables */
proc corr data=students;
    var GPA Study Working Sleep;
run;

/* Fitting multiple regression model */
proc reg data=students;
    model GPA = Study Working Sleep;
run;
quit;

/* Refitting with 95% confidence limits for the parameters (clb),
   the mean response (clm) and individual predictions (cli) */
proc reg data=students;
    model GPA = Study Working Sleep / clb clm cli;
run;
quit;
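As an alternative to PROC IMPORT, the values in the DATA listing could be read inline with a DATA step, which removes the dependency on the Excel file path. The sketch below is an assumption about how that might look, not part of the original appendix; only the first three rows of the listing are shown, and the remaining rows would be added before the closing semicolon.

```sas
/* Inline alternative to PROC IMPORT (sketch; only the first three rows
   of the DATA listing are included here, the rest follow the same pattern) */
data students;
    input GPA Study Working Sleep;
    datalines;
3.8 50 10 46
2.34 25 48 53
2.409 28 40 56
;
run;
```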
DATA:
GPA Study Working Sleep
3.8 50 10 46
2.34 25 48 53
2.409 28 40 56
2.84 30 40 50
4 55 20 40
3.125 34 25 54
3.67 30 15 40
2.25 25 50 60
2 18 51 80
3.51 32 20 56
2 15 58 40
3.999 55 18 38
2.46 29 38 56
2.7 25 25 56
3.6 35 30 46
2.9 30 40 45
3 35 20 59
4 60 10 40