Scott Cunningham
Final Project
STAT 512
31/7/2015
PART I
1) Looking at the graph below, it is clear that the two pieces are not the same line.
This is confirmed by the sameline test I performed, as shown below.
The p-value misses the significance cutoff by only 0.0003, which is too small a
margin for me to call the result insignificant, so I treat it as just barely
significant. Because we have a significant p-value, we can reject the null
hypothesis β2 = 0 in favor of the alternative hypothesis β2 ≠ 0, where β2 is the
coefficient of the change-in-slope term.
2) (a) Extra sum of squares = SSE(R) – SSE(F) = 306.48106 – 277.86404 = 28.61702
F-value: F(1, 34) = [(SSE(R) – SSE(F)) / 1] / [SSE(F) / 34] = 28.61702 / (277.86404 / 34) = 3.50163
(b) F(1, 34) = 3.50, p-value = 0.0699. Conclusion: fail to reject H0: β7 = 0; there is
not enough evidence to conclude that β7 ≠ 0.
(c) t* = −1.87, p-value = 0.0699, (t*)² = 3.4969
A t-test with n degrees of freedom is equivalent to an F-test with (1, n) degrees of
freedom. The t-value squared matches the F-value from the previous parts of the
problem up to rounding (3.4969 versus 3.50), so both tests produce the same p-value.
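The arithmetic in parts (a) to (c) can be checked with a short DATA step. This is a minimal sketch, not part of the original submission: the variable names are hypothetical, the SSE values and degrees of freedom are copied from above, and PROBF is the SAS F-distribution CDF function.

```sas
data _null_;
  sse_r  = 306.48106;   /* SSE of the reduced model (without sum) */
  sse_f  = 277.86404;   /* SSE of the full model (with sum) */
  df_num = 1;           /* one coefficient is being tested */
  df_den = 34;          /* error degrees of freedom of the full model */
  f_stat = ((sse_r - sse_f) / df_num) / (sse_f / df_den);
  p_val  = 1 - probf(f_stat, df_num, df_den);  /* upper-tail p-value */
  t_sq   = (-1.87)**2;  /* squared t statistic from part (c) */
  put f_stat= p_val= t_sq=;
run;
```

The values written to the log should agree with the F-value, p-value and (t*)² reported above, up to rounding.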
3) Σ Type-I SS = 632.17685
Σ Type-II SS = 104.00489
The Type-I SS sum to SSM.
The two types of SS are equal for the Danger predictor. Danger is the last variable
in the model statement, so its Type-I (sequential) SS and its Type-II (partial) SS
both condition on all of the other predictors:
SS(Danger | BodyWt BrainWt Dreaming LifeSpan Gestation Predation Exposure)
so we get the same value.
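One way to see this is to refit the model with Danger listed first. The following is a sketch only, assuming the same sleep data set used in the appendix: the Type-II SS for Danger should be unchanged, while its Type-I SS becomes SS(Danger) on its own and will generally differ.

```sas
proc reg data = sleep;
  /* Danger moved to the front; ss1 and ss2 request Type-I and Type-II SS */
  model totalsleep = danger bodywt brainwt dreaming lifespan gestation
                     predation exposure / ss1 ss2;
run;
quit;
```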
4)
R²       Explanatory Variables
0.6952   BodyWt BrainWt Dreaming LifeSpan Gestation Predation Exposure Danger
0.1175   BodyWt
0.1136   BrainWt
0.5287   Dreaming
0.9364   NonDreaming
0.1463   LifeSpan
0.3776   Gestation
0.0078   Predation
0.3861   Exposure
0.3652   Danger
0.1186   BodyWt BrainWt
0.4118   BodyWt BrainWt Sum
0.3780   LifeSpan Gestation
0.6346   Dreaming Sum
0.9376   NonDreaming Sum
0.5404   Predation Exposure Gestation
0.5991   BodyWt BrainWt Dreaming
PART II
1) These are the initial scatter plots for each variable:
I decided that BodyWt, BrainWt, Gestation and Lifespan needed to be transformed
because their relationships with TotalSleep are all non-linear.
At the professor’s suggestion, I took the ratio of BrainWt to BodyWt to transform
those two. I also took the inverse of both Gestation and Lifespan.
LifeInverse was still fairly non-linear, so I checked two further
transformations: square root and log. I decided to go with logLifeInv as it is the
more linear of the two.
Note: From here onward I use NonDreaming and drop Dreaming
because NonDreaming has a more linear relationship with TotalSleep.
NonDreaming also has a lower or equal correlation with every other explanatory
variable compared to Dreaming (see below).
I ran the correlation procedure and found that Predation, Exposure and Danger
were all highly correlated. Thus, I removed Danger, as it had the highest
correlation with the other two (it is also the least linear of the three).
Box-Cox gave the optimal λ as 0.75, which is close to 1, so I decided Y did not
need to be transformed.
I ran regression with the following model to obtain residuals:
TotalSleep = NonDreaming + Predation + Exposure + BrainBody + GestInv + logLifeInv
Thus, I obtained the following residual plots:
I also produced the following histogram and QQ-plot, which both indicate the
residuals are approximately Normal.
I also checked TotalSleep with a QQ-plot and found that it was approximately
Normal as well.
In summary, I conclude that
TotalSleep = NonDreaming + Predation + Exposure + BrainBody + GestInv + logLifeInv
is a good starting model.
2) Mallows’ Cp reported the following models (I’ve only included the first 12):
I’ve highlighted my selection for best model in blue. I would use this model
because it has Cp < p, which is good, and a very high R². Adding more variables
doesn’t increase R² very much, so it would be unnecessary. I don’t
want to include the BrainBody term since a negative coefficient would not make
sense for this variable: it is positively correlated with TotalSleep, as
evidenced by its scatter plot. Given the somewhat high (~0.62) correlation
between Predation and Exposure, it does not make sense to have both in the
model, and as you can see, switching Exposure with Predation (last entry) results
in Cp > p, which would not be a good model.
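For reference, the criterion being compared against p is usually defined as below. This is the standard textbook form and is not taken from the SAS output; SSE_p is the error sum of squares of the candidate model with p parameters (including the intercept), MSE_full is the mean squared error of the model containing all candidate predictors, and n is the number of observations.

```latex
C_p = \frac{SSE_p}{MSE_{\text{full}}} - (n - 2p)
```

A candidate model with little bias has C_p close to or below p, which is the check applied above.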
The following is a summary of the part of the table I have removed.
The Cp values slowly increase as different combinations of variables are tried,
with NonDreaming remaining constant among them. Once NonDreaming is
removed, the Cp values skyrocket, increasing by approximately 2000% on the
“best” model without NonDreaming (highlighted in yellow below), and increasing
further from there on as different combinations without NonDreaming are tried.
This result makes sense because TotalSleep = NonDreaming + Dreaming, so
NonDreaming is the most influential explanatory variable.
In conclusion, I choose the best model to be:
TotalSleep = β0 + β1NonDreaming – β2Predation + β3GestationInverse
3) The stepwise selection method produced the following result:
As this is the same model as selected above, I will not restate my reasons for
selecting it. However, some points are worth noting:
• Predation contributes very little to R²; however, if we want to satisfy Cp < p,
it is necessary to include it. This is also true of GestationInverse.
• As pointed out above, NonDreaming contributes very heavily to
TotalSleep, which is why it has such a large partial R².
• The stepwise selection produces the same “best” model as the Cp
criterion.
To reiterate, I choose the best model to be:
TotalSleep = β0 + β1NonDreaming – β2Predation + β3GestationInverse*
*Note: On Mixable I saw Professor Sharabati say he would suggest keeping
Dreaming and throwing out NonDreaming. I tested my model at every step after
switching NonDreaming with Dreaming and had much worse results, which
prompted me to continue using NonDreaming.
4) The residual plot for NonDreaming appears to be okay, except for a possible
outlier at ~ −2.75.
The residual plot for Predation is fine, with no discernible pattern.
However, the residual plot for GestationInverse indicates that the constant variance
assumption may be violated. We can also see the possible outlier at ~ −2.75.
Summary of residuals:
The GestationInverse residual plot makes me cautious, but I would not
denounce the model just yet.
I see no reason to assume the responses are not independent.
The following histogram and QQ-plot indicate that the residuals are
approximately Normally distributed. The histogram tells me that the possible
outlier is probably not an outlier, but I will confirm this in the next question.
Based on the scatter plots in Question 1, I would say the linearity assumption is
not violated.
Overall, I would say this is an acceptable model to use with some caution, because
the GestationInverse residuals show a slight problem with constant variance.
5) I use the Studentized Residuals and Cook’s Distance to check for outliers and
influential observations. I use VIF to check for multicollinearity.
VIF results: All VIF scores are well below the threshold for determining
multicollinearity. I conclude that multicollinearity is not a problem in the model.
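For reference, the variance inflation factor of predictor j is usually defined as below, where R_j² is the R² obtained by regressing predictor j on the other predictors in the model. The threshold used for this check is not stated here; 10 is a commonly quoted rule of thumb and is only an assumption.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```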
For the residuals I only include output for the most influential/unusual points:
Looking at the Studentized Residuals and fences, none of the largest are
considered outliers.
Cook’s Distance critical value = F(0.5; 4, 40) = 0.85356585
As you can see, none of the largest Cook’s D values come close to exceeding
the critical value.
In conclusion, the diagnostics show that the suspected outlier from the residual
and QQ-plots is in fact not an outlier, and that there are no influential
observations.
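The Cook’s Distance check can be sketched in SAS as follows. This is an illustration under assumptions: the data set names sleepdiag and flagged and the variables sresid, cooksd and cutoff are hypothetical, and gestinv is assumed to have been created as in the appendix. FINV returns a quantile of the F distribution, so FINV(0.5, 4, 40) is the 50th percentile used as the critical value above.

```sas
proc reg data = sleep noprint;
  model totalsleep = nondreaming predation gestinv;
  /* save studentized residuals and Cook's D for each observation */
  output out = sleepdiag student = sresid cookd = cooksd;
run;
quit;

data flagged;
  set sleepdiag;
  cutoff = finv(0.5, 4, 40);  /* median of F(4, 40) */
  if cooksd > cutoff;         /* keep only influential observations */
run;

proc print data = flagged;
run;
```

If no observation exceeds the cutoff, the flagged data set is empty, which would match the conclusion above.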
6) (a) Ŷ (predicted TotalSleep) = 1.153133 + 1.05652(NonDreaming) – 0.28902(Predation)
+ 35.46003(GestationInverse)
(b) 90% C.I. for µh: Highlighted in green below (first 20 obs.)
(c) 90% P.I. for Yh(new): Highlighted in pink below (first 20 obs.)
(d) 90% C.I. for βi: Highlighted in blue below
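As a quick illustration of how the fitted equation in (a) is used, the DATA step below evaluates it for one set of inputs. The input values are hypothetical and are not taken from the data set; only the coefficients come from the fitted model.

```sas
data _null_;
  /* hypothetical inputs, chosen only for illustration */
  nondreaming = 8;
  predation   = 3;
  gestation   = 100;
  gestinv     = 1 / gestation;  /* GestationInverse */
  yhat = 1.153133 + 1.05652*nondreaming - 0.28902*predation
         + 35.46003*gestinv;
  put yhat=;
run;
```

For these inputs the predicted TotalSleep works out to about 9.09.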
SAS CODE
*data imported using File menu
PART 1 ;
symbol1 v=dot i=sm75S;
proc gplot data = sleep;
plot TotalSleep* (BodyWt BrainWt NonDreaming Dreaming Lifespan
Gestation Predation Exposure Danger);
run; *I used this to figure out which variable I wanted to use;
quit;
data sleep; *I decided on gestation, so I create the cslope term;
set sleep;
if gestation le 175
then cslope=0;
if gestation gt 175
then cslope=(gestation-175);
proc reg data=sleep; *regression to get equation;
model totalsleep=gestation cslope / p;
output out=sleepoutpred p=pred;
sameline: test cslope; *sameline test;
run;
quit;
symbol1 v=circle i=none c=black;
symbol2 v=none i=join c=red;
title1 'Question 1 - Piecewise Regression';
title2 'Scott Cunningham';
axis1 label = (angle=90 'TotalSleep');
proc sort data=sleepoutpred; by gestation;
proc gplot data=sleepoutpred;
plot (totalsleep pred)*gestation / overlay
vaxis=axis1;
run;
quit; *plotting the graph;
* END PROBLEM 1
--------------------------------------------------------------
PROBLEM 2 ;
data sleep; *creating sum;
set sleep;
sum = lifespan+gestation;
proc reg data = sleep; *running the two regressions;
model totalsleep = bodywt brainwt dreaming predation exposure danger;
model totalsleep = bodywt brainwt dreaming predation exposure danger
sum;
nilsum: test sum; *F-test;
run;
quit;
* END PROBLEM 2
-------------------------------------------------------------
PROBLEM 3 ;
proc reg data = sleep;
model totalsleep = bodywt brainwt dreaming lifespan gestation predation
exposure danger / ss1 ss2;
run;
quit;
* END PROBLEM 3
--------------------------------------------------------------
PROBLEM 4 ;
proc reg data = sleep;
model totalsleep = bodywt;
model totalsleep = brainwt;
model totalsleep = dreaming;
model totalsleep = nondreaming;
model totalsleep = lifespan;
model totalsleep = gestation;
model totalsleep = predation;
model totalsleep = exposure;
model totalsleep = danger;
model totalsleep = bodywt brainwt;
model totalsleep = bodywt brainwt sum;
model totalsleep = lifespan gestation;
model totalsleep = dreaming sum;
model totalsleep = nondreaming sum;
model totalsleep = predation exposure danger;
model totalsleep = bodywt brainwt dreaming;
run;
quit;
* END PROBLEM 4
--------------------------------
PART 2
PROBLEM 1 ;
symbol1 v=dot i=sm75S;
title1 'Question 1 - Scatter Plot with Smoothing Curve';
title2 'Scott Cunningham';
proc gplot data = sleep;
plot TotalSleep* (BodyWt BrainWt NonDreaming Dreaming Lifespan
Gestation Predation Exposure Danger) / vaxis=axis1;
run; *to examine each explanatory variable against the response;
quit;
data sleep; *creating transforms of the variables I think need it;
set sleep;
brainbody = brainwt/bodywt;
gestinv = 1/gestation;
lifeinv = 1/lifespan;
proc gplot data = sleep;
plot TotalSleep*(brainbody gestinv lifeinv) / vaxis=axis1;
run; *checking the new transformed variables;
quit;
data sleep; *checking two possible further transformations;
set sleep;
loglifeinv=log(lifeinv);
sqrtlifeinv=sqrt(lifeinv);
proc gplot data = sleep; *checking again;
plot TotalSleep*(loglifeinv sqrtlifeinv) / vaxis=axis1;
run;
quit;
proc corr data=sleep; *checking correlation between explanatory variables;
var nondreaming dreaming predation exposure danger brainbody gestinv
loglifeinv;
proc transreg data = sleep; *performing Box-Cox to check if Y needs to be
transformed;
model boxcox(totalsleep)=identity(nondreaming predation exposure
brainbody gestinv loglifeinv);
run;
quit;
proc reg data = sleep;
model totalsleep = nondreaming predation exposure brainbody gestinv
loglifeinv / r;
output out=sleepoutresid r=resid;
run; *computing the residuals;
quit;
symbol1 v=dot i=none;
title1 'Question 1 - Residual Plot';
title2 'Scott Cunningham';
axis1 label = (angle=90 'Residual');
proc gplot data = sleepoutresid;
plot resid*(nondreaming predation exposure brainbody gestinv
loglifeinv) / vref=0
vaxis=axis1;
proc univariate data=sleepoutresid noprint;
qqplot resid totalsleep / normal (L=1 mu=est sigma=est)
odstitle='Question 1 - QQ-plot'
odstitle2='Scott Cunningham';
histogram resid / odstitle='Question 1 - Histogram'
odstitle2='Scott Cunningham'
normal(noprint);
run;
quit; *these graphs are just sort of a final check to see if I did well in
refining the model;
*END PROBLEM 1
---------------------------------------------------
PROBLEMS 2 & 3 ;
proc reg data = sleep;
model totalsleep = nondreaming predation exposure brainbody gestinv
loglifeinv / selection=cp b;
proc reg data = sleep;
model totalsleep = nondreaming predation exposure brainbody gestinv
loglifeinv / selection=stepwise;
run;
quit;
*END PROBLEMS 2 & 3
------------------------------------------------------
PROBLEMS 4, 5 & 6;
proc reg data=sleep;
model totalsleep = nondreaming predation gestinv / r vif clm cli clb
alpha=0.1;
output out=sleepresidbest r=resid;
run; *regression with all the options I need;
quit;
symbol1 v=dot i=none;
title1 'Question 4 - Residual Plot';
title2 'Scott Cunningham';
axis1 label = (angle=90 'Residual');
proc gplot data=sleepresidbest;
plot resid*(nondreaming predation gestinv) / vref=0
vaxis=axis1;
proc univariate data=sleepresidbest noprint;
qqplot resid / normal (L=1 mu=est sigma=est)
odstitle='Question 4 - QQ-plot'
odstitle2='Scott Cunningham';
histogram resid / odstitle='Question 4 - Histogram'
odstitle2='Scott Cunningham'
normal(noprint);
run;
quit; *graphs to check assumptions (problem 4);
*END PROBLEMS 4, 5 & 6;

Scott Cunningham STAT512 Final Project

  • 2. PART I 1) Looking at the graph below, it is clear that the two pieces are not the same line. This is confirmed by the sameline test I performed, as shown below. The p-value is just barely significant (if we don’t get too picky about thousandths and ten-thousandths). Thus, the extra 0.0003 is not enough for me to call it insignificant. Because we have a significant p-value, we can reject the null hypothesis β2 = 0 in favor of the alternative hypothesis β2 ≠ 0.
  • 3. 2) (a) Extra sum of squares = SSE(R)–SSE(F) = 306.48106–277.86404 = 28.61702 F-value: F(1, 34) = = = 3.50163 (b) F(1, 34) = 3.50 p-value = 0.0699 Conclusion: Fail to reject H0, and β7 = 0 (c) t* = −1.87 p-value = 0.0699 (t*)2 = 3.4969 A t-test with n degrees of freedom is equivalent to an F-test with (1, n) degrees of freedom. As you can see the t-value squared gives the F-value from the previous parts of the problem. Hence, both produce the same p-value.
  • 4. 3) Σ Type-I SS = 632.17685 Σ Type-II SS = 104.00489 The Type-I SS sum to SSM. The two types of SS are equal for the danger predictor. This is because the conditional probability is the same in both cases: SS(Danger | BodyWt BrainWt Dreaming LifeSpan Gestation Predation Exposure) so we get the same value. 4) Explanatory Variables R2 BodyWt BrainWt Dreaming LifeSpan Gestation Predation Exposure Danger 0.6952 BodyWt 0.1175 BrainWt 0.1136 Dreaming 0.5287 NonDreaming 0.9364 LifeSpan 0.1463 Gestation 0.3776 Predation 0.0078 Exposure 0.3861 Danger 0.3652 BodyWt BrainWt 0.1186 BodyWt BrainWt Sum 0.4118 LifeSpan Gestation 0.3780 Dreaming Sum 0.6346 NonDreaming Sum 0.9376 Predation Exposure Gestation 0.5404 BodyWt BrainWt Dreaming 0.5991
  • 5. Part II 1) These are the initial scatter plots for each variable:
  • 6.
  • 7.
  • 8.
  • 9. I decided that BodyWt, BrainWt, Gestation and Lifespan needed transformed because they are all non-linear. At the professor’s suggestion, I took the ratio of BrainWt to BodyWt to transform those two. I also took the inverse of both Gestation and Lifespan.
  • 10. LifeInverse ended up being fairly non-linear so I checked two further transformations: square root and log. I decided to go with logLifeInv as it is more linear.
  • 11. Note: From here onward I use NonDreaming and threw out Dreaming because NonDreaming had a more linear relationship with TotalSleep. NonDreaming also has lower or the same correlation with all the variables compared to Dreaming (see below). I ran the correlation procedure and found that Predation, Exposure and Danger, were all highly correlated. Thus, I removed Danger as it had the highest correlation with the other two (it’s also the least linear of the three).
  • 12. Box-Cox gave optimal λ as 0.75 so I decided Y did not need transformed. I ran regression with the following model to obtain residuals: TotalSleep = NonDreaming + Predation + Exposure + BrainBody + GestInv + logLifeInv Thus, I obtained the following residual plots:
  • 13.
  • 14.
  • 15.
  • 16. I also produced the following histogram and QQ-plot, which both indicate the residual are approximately Normal.
  • 17. I also checked TotalSleep with a QQ-plot and found that it was approximately Normal as well. In summation, I would conclude that TotalSleep = NonDreaming + Predation + Exposure + BrainBody + GestInv + logLifeInv is a good model to begin with as a starting point.
  • 18. 2) Mallow’s Cp reported the following models (I’ve only included the first 12): I’ve highlighted my selection for best model in blue. I would use this model because it has a Cp < p which is good, and a very high R2 . Adding more variables doesn’t increase the R2 value very much, so it would be unnecessary. I don’t want to include the BrainBody term since a negative coefficient would not make sense for this variable, because it is positively correlated with TotalSleep, as evidenced by its scatter plot. Given the somewhat high (~0.62) correlation between Predation and Exposure, it does not make sense to have both in the model, and as you can see, switching Exposure with Predation (last entry) results in a Cp > p, which would not be a good model. The following is a summary of the part of the table I have removed. The Cp values slowly increase as different combinations of variables are tried, with NonDreaming remaining constant among them. Once NonDreaming is removed, the Cp values skyrocket, increasing by approximately 2000% on the “best” model without NonDreaming (highlighted in yellow below), and increasing further from thereon as different combinations without NonDreaming are tried. This result makes sense as TotalSleep = NonDreaming + Dreaming, so it is the most influential explanatory variable. In conclusion, I choose the best model to be: TotalSleep = β0 + β1NonDreaming – β2Predation + β3GestationInverse
  • 19. 3) The stepwise selection method produced the following result: As this is the same model as selected above, I will not restate my reasons for selecting it. Some points of note:
      - Predation contributes very little to R²; however, if we want to satisfy Cp < p, it is necessary to include it. This is also true of GestationInverse.
      - As pointed out above, NonDreaming contributes very heavily to TotalSleep, which is why it has such a large partial R².
      - The stepwise selection produces the same “best” model as the Cp criterion.
      To reiterate, I choose the best model to be: TotalSleep = β0 + β1NonDreaming – β2Predation + β3GestationInverse*
      *Note: On Mixable I saw Professor Sharabati say he would suggest keeping Dreaming and throwing out NonDreaming. I tested my model at every step after switching NonDreaming with Dreaming and had much worse results, which prompted me to continue using NonDreaming.
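The stepwise run in the appendix relies on the default entry and stay significance levels. A minimal sketch of the same call with those thresholds written out is given below; the value 0.15 is an assumption meant to match the PROC REG defaults for selection=stepwise, and the data set is assumed to already contain the transformed variables created in Question 1.

```sas
/* Sketch only: same candidate predictors as the stepwise run in the appendix. */
/* slentry= and slstay= are spelled out here; 0.15 is assumed to equal the     */
/* PROC REG defaults for selection=stepwise, so the result should not change.  */
proc reg data = sleep;
  model totalsleep = nondreaming predation exposure brainbody gestinv loglifeinv
        / selection=stepwise slentry=0.15 slstay=0.15;
run;
quit;
```

Writing the thresholds explicitly makes it easier to see how sensitive the selected model is to them, for example by rerunning with a stricter value.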
  • 20. 4) The residual plot for NonDreaming appears to be okay, except for a possible outlier at about −2.75. The residual plot for Predation is fine, with no discernible pattern.
  • 21. However, the residual plot for GestationInverse indicates that the constant variance assumption may be violated. We can also see that possible outlier at about −2.75. Summary of residuals: The GestationInverse residual plot makes me cautious, but I would not denounce the model just yet. I see no reason to assume the responses are not independent. The following histogram and QQ-plot indicate that the residuals are approximately Normally distributed. The histogram tells me that the possible outlier is probably not an outlier, but I will confirm this in the next question. Based on the scatter plots in Question 1, I would say the linearity assumption is not violated. Overall, I would say this is an acceptable model to use with some caution, because the GestationInverse residuals have a slight problem with constant variance.
  • 22.
  • 23. 5) I use the Studentized Residuals and Cook’s Distance to check for outliers and influential observations, and VIF to check for multicollinearity. VIF results: All VIF scores are well below the threshold for determining multicollinearity, so I conclude that multicollinearity is not a problem in the model. For the residuals I only include output for the most influential/unusual points: Looking at the Studentized Residuals and fences, none of the largest are considered outliers. Cook’s Distance: critical F-value = F(0.5; 4, 40) = 0.85356585. As you can see, none of the largest Cook’s D values come close to exceeding the critical value. In conclusion, the diagnostics indicate that the suspected outlier from the residual and QQ-plots is in fact not an outlier, and that there are no influential observations.
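The Cook's Distance cutoff quoted in Question 5 is the median of an F distribution, and it can be recomputed with a short data step. This is a sketch under the assumption that the model has p = 4 parameters and 40 error degrees of freedom, as the F(4,40) in the text implies; the data set and variable names are hypothetical.

```sas
/* Sketch only: recomputes the 50th percentile of F(4, 40).        */
/* p and dfe are assumptions read off the F(4,40) quoted in the    */
/* report, and cookcrit and crit are hypothetical names.           */
data cookcrit;
  p    = 4;                     /* number of regression parameters */
  dfe  = 40;                    /* error degrees of freedom, n - p */
  crit = finv(0.5, p, dfe);     /* median of the F(p, n-p) distribution */
  put crit=;                    /* writes the cutoff to the SAS log */
run;
```

An observation would be flagged as influential only if its Cook's D exceeded this value, which should agree with the cutoff reported in the text.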
  • 24. 6) (a) Ŷ = 1.153133 + 1.05652(NonDreaming) – 0.28902(Predation) + 35.46003(GestationInverse)
      (b) 90% C.I. for µh: Highlighted in green below (first 20 obs.)
      (c) 90% P.I. for Yh(new): Highlighted in pink below (first 20 obs.)
      (d) 90% C.I. for βi: Highlighted in blue below
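The three kinds of 90% interval in Question 6 come from the clm, cli and clb options with alpha=0.1 in the appendix code. As a reminder of what each one computes, the standard textbook forms are sketched below; the notation is an assumption and is not copied from the SAS output. Here p is the number of regression parameters, n is the number of observations used, and t(0.95; n − p) is the 95th percentile of the t distribution, which gives two-sided 90% coverage.

```latex
\begin{align*}
\text{(b) mean response at } X_h:\quad
  & \hat{Y}_h \pm t(0.95;\, n-p)\; s\{\hat{Y}_h\} \\
\text{(c) new observation at } X_h:\quad
  & \hat{Y}_h \pm t(0.95;\, n-p)\; s\{\text{pred}\},
  \qquad s^2\{\text{pred}\} = MSE + s^2\{\hat{Y}_h\} \\
\text{(d) coefficient } \beta_k:\quad
  & b_k \pm t(0.95;\, n-p)\; s\{b_k\}
\end{align*}
```

Because s²{pred} adds MSE to the variance of the fitted mean, each prediction interval is wider than the confidence interval for the mean response at the same X_h.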
  • 25. SAS CODE
      *data imported using File menu;
      * PART 1;
      symbol1 v=dot i=sm75S;
      proc gplot data = sleep;
        plot TotalSleep*(BodyWt BrainWt NonDreaming Dreaming Lifespan Gestation Predation Exposure Danger);
      run; *I used this to figure out which variable I wanted to use;
      quit;

      data sleep; *I decided on gestation, so I create the cslope term;
        set sleep;
        if gestation le 175 then cslope=0;
        if gestation gt 175 then cslope=(gestation-175);
      run;

      proc reg data=sleep; *regression to get equation;
        model totalsleep=gestation cslope / p;
        output out=sleepoutpred p=pred;
        sameline: test cslope; *sameline test;
      run;
      quit;

      symbol1 v=circle i=none c=black;
      symbol2 v=none i=join c=red;
      title1 'Question 1 - Piecewise Regression';
      title2 'Scott Cunningham';
      axis1 label = (angle=90 'TotalSleep');
      proc sort data=sleepoutpred;
        by gestation;
      run;
      proc gplot data=sleepoutpred;
        plot (totalsleep pred)*gestation / overlay vaxis=axis1;
      run;
      quit; *plotting the graph;
      * END PROBLEM 1 --------------------------------------------------------------;

      * PROBLEM 2;
      data sleep; *creating sum;
        set sleep;
        sum = lifespan+gestation;
      run;

      proc reg data = sleep; *running the two regressions;
        model totalsleep = bodywt brainwt dreaming predation exposure danger;
        model totalsleep = bodywt brainwt dreaming predation exposure danger sum;
        nilsum: test sum; *F-test;
      run;
      quit;
      * END PROBLEM 2 -------------------------------------------------------------;
  • 26.
      * PROBLEM 3;
      proc reg data = sleep;
        model totalsleep = bodywt brainwt dreaming lifespan gestation predation exposure danger / ss1 ss2;
      run;
      quit;
      * END PROBLEM 3 --------------------------------------------------------------;

      * PROBLEM 4;
      proc reg data = sleep;
        model totalsleep = bodywt;
        model totalsleep = brainwt;
        model totalsleep = dreaming;
        model totalsleep = nondreaming;
        model totalsleep = lifespan;
        model totalsleep = gestation;
        model totalsleep = predation;
        model totalsleep = exposure;
        model totalsleep = danger;
        model totalsleep = bodywt brainwt;
        model totalsleep = bodywt brainwt sum;
        model totalsleep = lifespan gestation;
        model totalsleep = dreaming sum;
        model totalsleep = nondreaming sum;
        model totalsleep = predation exposure danger;
        model totalsleep = bodywt brainwt dreaming;
      run;
      quit;
      * END PROBLEM 4 --------------------------------;

      * PART 2 PROBLEM 1;
      symbol1 v=dot i=sm75S;
      title1 'Question 1 - Scatter Plot with Smoothing Curve';
      title2 'Scott Cunningham';
      proc gplot data = sleep;
        plot TotalSleep*(BodyWt BrainWt NonDreaming Dreaming Lifespan Gestation Predation Exposure Danger) / vaxis=axis1;
      run; *to examine each explanatory variable against the response;
      quit;

      data sleep; *creating transforms of the variables I think need it;
        set sleep;
        brainbody = brainwt/bodywt;
        gestinv = 1/gestation;
        lifeinv = 1/lifespan;
      run;

      proc gplot data = sleep;
        plot TotalSleep*(brainbody gestinv lifeinv) / vaxis=axis1;
      run; *checking the new transformed variables;
      quit;
  • 27.
      data sleep; *checking two possible further transformations;
        set sleep;
        loglifeinv=log(lifeinv);
        sqrtlifeinv=sqrt(lifeinv);
      run;

      proc gplot data = sleep; *checking again;
        plot TotalSleep*(loglifeinv sqrtlifeinv) / vaxis=axis1;
      run;
      quit;

      proc corr data=sleep; *checking correlation between explanatory variables;
        var nondreaming dreaming predation exposure danger brainbody gestinv loglifeinv;
      run;

      proc transreg data = sleep; *performing Box-Cox to check if Y needs to be transformed;
        model boxcox(totalsleep)=identity(nondreaming predation exposure brainbody gestinv loglifeinv);
      run;
      quit;

      proc reg data = sleep;
        model totalsleep = nondreaming predation exposure brainbody gestinv loglifeinv / r;
        output out=sleepoutresid r=resid;
      run; *computing the residuals;
      quit;

      symbol1 v=dot i=none;
      title1 'Question 1 - Residual Plot';
      title2 'Scott Cunningham';
      axis1 label = (angle=90 'Residual');
      proc gplot data = sleepoutresid;
        plot resid*(nondreaming predation exposure brainbody gestinv loglifeinv) / vref=0 vaxis=axis1;
      run;
      quit;

      proc univariate data=sleepoutresid noprint;
        qqplot resid totalsleep / normal (L=1 mu=est sigma=est)
               odstitle='Question 1 - QQ-plot' odstitle2='Scott Cunningham';
        histogram resid / odstitle='Question 1 - Histogram' odstitle2='Scott Cunningham'
                  normal(noprint);
      run;
      quit; *these graphs are just sort of a final check to see if I did well in refining the model;
      * END PROBLEM 1 ---------------------------------------------------;

      * PROBLEMS 2 & 3;
  • 28.
      proc reg data = sleep;
        model totalsleep = nondreaming predation exposure brainbody gestinv loglifeinv / selection=cp b;
      run;
      quit;

      proc reg data = sleep;
        model totalsleep = nondreaming predation exposure brainbody gestinv loglifeinv / selection=stepwise;
      run;
      quit;
      * END PROBLEMS 2 & 3 ------------------------------------------------------;

      * PROBLEMS 4, 5 & 6;
      proc reg data=sleep;
        model totalsleep = nondreaming predation gestinv / r vif clm cli clb alpha=0.1;
        output out=sleepresidbest r=resid;
      run; *regression with all the options I need;
      quit;

      symbol1 v=dot i=none;
      title1 'Question 4 - Residual Plot';
      title2 'Scott Cunningham';
      axis1 label = (angle=90 'Residual');
      proc gplot data=sleepresidbest;
        plot resid*(nondreaming predation gestinv) / vref=0 vaxis=axis1;
      run;
      quit;

      proc univariate data=sleepresidbest noprint;
        qqplot resid / normal (L=1 mu=est sigma=est)
               odstitle='Question 4 - QQ-plot' odstitle2='Scott Cunningham';
        histogram resid / odstitle='Question 4 - Histogram' odstitle2='Scott Cunningham'
                  normal(noprint);
      run;
      quit; *graphs to check assumptions (problem 4);
      * END PROBLEMS 4, 5 & 6;
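For readers following the Part I, Problem 1 code, the cslope variable turns the regression on Gestation into a two-piece line with a knot at 175. The model that the code fits can be sketched as below; the notation is an assumption chosen to match the β2 used in the Part I write-up.

```latex
E\{\text{TotalSleep}\} = \beta_0 + \beta_1\,\text{Gestation} + \beta_2\,\text{cslope},
\qquad
\text{cslope} = \max(0,\ \text{Gestation} - 175)
```

The slope is β1 for Gestation up to 175 and β1 + β2 beyond it, so the sameline test of H0: β2 = 0 asks whether the two pieces share a single slope.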