Braedon Churchill
Audrey Fu
Stats 431
Computing Project
1. Background
a. Description of the problem – As a professional golfer playing in
tournaments/events, you want to maximize the earnings you receive from each
event. To do this, you want to find out which aspects of a golfer's
performance correlate most strongly with the average earnings attained in a
given event. Such aspects include number of events played in, average score per
round, percentage of greens hit in regulation, average driving distance, driving
accuracy, and average putts per round. By discovering which aspects correlate
with higher earnings, you learn which aspects of your own game to focus on
in order to earn more money.
b. Description of statistical questions – Do any of the variables significantly explain
or predict the average earnings per event of golfers?
2. Results
- Exploratory Analysis
- Using the pairs function in R, the pairwise relationships in the data are observed.
Appendix I shows the results. Although the points appear fairly scattered, a roughly
linear relationship can be seen between most pairs of variables.
- Appendix II shows the residual plot of the linear model. The residuals εi are
scattered randomly around zero with no visible pattern, which is consistent with
the assumption that they are normally distributed. Independence of the residuals
is assumed.
- Hypothesis Testing
- Conducted an F test to see whether at least one of the variables has
predictive value for the average earnings per event of golfers. Results are shown in
Appendix III. The test gives significant evidence that at least one of
the variables has predictive value.
- Summary of the data
- Computed the variance inflation factor (VIF) for each variable in R. The VIF
measures how much the variance of an estimated coefficient is inflated by linear
dependence among the variables, compared to when they are not linearly related.
The results are seen in Appendix IV. All VIF values are below 4 (the largest is
3.60), so the variables are not very highly correlated with one another and each
coefficient can be interpreted on its own.
- Performing the summary function in R, as seen in Appendix V, every variable's
coefficient has a p value below .05 (only the intercept does not, with p = .31),
showing that each variable has predictive value for average earnings per event.
The R2 value is .82, which shows that 82% of the variability in average earnings
per event is explained by the model. The adjusted R2 is .76.
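
As a rough check on the R2 figures, the adjusted R2 can be recomputed from the reported R2. This is only a sketch: it assumes n = 25 golfers, which is inferred from the 18 residual degrees of freedom and 6 predictors in the Appendix V output (n = 18 + 6 + 1), and the variable names are illustrative.

```r
# Values taken from the summary output in Appendix V
r2 <- 0.8234      # multiple R-squared
k <- 6            # number of predictors
df_resid <- 18    # residual degrees of freedom

# Assumption: sample size inferred as residual df + k + 1
n <- df_resid + k + 1

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 3))
```

The result agrees with the adjusted R2 in Appendix V up to rounding of the reported R2.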
3. Discussion
- An F test was performed to see whether at least one of the variables has
predictive value for earnings per event. The limitation of this test is that it
doesn't show which of the variables have predictive value, only that at least
one of them does.
- A t test on each individual coefficient can be used to see which of the
variables have predictive value; the p values for these tests are reported in
the summary output in Appendix V. A data set that includes the stats of more
individual golfers could also be used to obtain more precise estimates and more
reliable test results.
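
The test on an individual coefficient can be sketched directly from the numbers in Appendix V. The example below uses the reported estimate and standard error for X2 (percentage of greens in regulation) and the 18 residual degrees of freedom from the same output; the variable names are illustrative and not part of the original code.

```r
# Estimate and standard error for X2, copied from Appendix V
estimate <- 22564.0
std_error <- 4727.3
df_resid <- 18   # residual degrees of freedom from Appendix V

# t statistic for H0: the coefficient of X2 is zero
t_value <- estimate / std_error

# Two-sided p value from the t distribution
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)
print(c(t = t_value, p = p_value))
```

The same calculation applies to any other row of the coefficient table.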
4. Appendix
a. Details of Statistical Analyses (Appendix I – V)
Appendix I
Appendix II
Appendix III
H0: β1=β2=β3=β4=β5=β6=0 Ha: At least one βi ≠ 0 (βi is the coefficient of Xi)
k = 6 n = 25 (Appendix V reports 18 residual degrees of freedom)
DF1 = k = 6 DF2 = n-(k+1) = 18 α = .05
F Statistic = 13.99 with Fα, DF1=6, DF2=18 = 2.66
Reject H0 if F > Fα
Since 13.99 > 2.66 I reject the null hypothesis
It can be concluded that at least one of the variables has a predictive value on the average
earnings per event of golfers. The residual plot (Appendix II) supports the assumption that
the residuals are normally distributed. Independence is assumed.
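
The pieces of this test can be reproduced in R as a rough check. The sketch below uses the degrees of freedom that R reports in Appendix V (6 and 18) and the reported R2 of 0.8234; the variable names are illustrative.

```r
k <- 6            # numerator degrees of freedom (number of predictors)
df_resid <- 18    # denominator degrees of freedom, from Appendix V
alpha <- 0.05
r2 <- 0.8234      # multiple R-squared, from Appendix V

# Overall F statistic expressed in terms of R-squared
f_stat <- (r2 / k) / ((1 - r2) / df_resid)

# Critical value and p value from the F distribution
f_crit <- qf(1 - alpha, df1 = k, df2 = df_resid)
p_value <- pf(f_stat, df1 = k, df2 = df_resid, lower.tail = FALSE)

print(c(F = f_stat, critical = f_crit, p = p_value))
```

Because the F statistic is far above the critical value, the decision to reject the null hypothesis does not depend on small rounding differences.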
Appendix IV
> vif(lm)
      X1       X2       X3       X4       X5       X6
2.937727 3.598127 1.846040 1.830498 1.742528 2.145418
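
For readers unfamiliar with the VIF, the standard definition is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The sketch below computes it by hand in base R on simulated data; the data and the names x1, x2, x3 are hypothetical and are not the golf data.

```r
set.seed(1)
n <- 50

# Hypothetical predictors: x2 is deliberately correlated with x1
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)
x3 <- rnorm(n)
predictors <- data.frame(x1, x2, x3)

# Regress each predictor on the others and convert R-squared to a VIF
manual_vif <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  r2_j <- summary(lm(reformulate(others, response = v), data = predictors))$r.squared
  1 / (1 - r2_j)
})
print(round(manual_vif, 3))
```

A VIF of 1 means a predictor is uncorrelated with the others; larger values mean its coefficient is estimated less precisely because of overlap with the other predictors.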
Appendix V
Call:
lm(formula = Y ~ X1 + X2 + X3 + X4 + X5 + X6)
Y = Average earnings per event
X1 = Average score per round
X2 = Percentage of greens in regulation
X3 = Driving accuracy
X4 = Average putts per round
X5 = Number of events
X6 = Average driving distance
Residuals:
   Min     1Q Median     3Q    Max
-50215 -21877   1518  18345  37626

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1411171.4  1350234.6   1.045 0.309796
X1           -44490.9    19562.2  -2.274 0.035418 *
X2            22564.0     4727.3   4.773 0.000152 ***
X3            -5463.8     1453.6  -3.759 0.001437 **
X4            57686.9    23583.6   2.446 0.024946 *
X5            -4751.8     1288.8  -3.687 0.001687 **
X6            -3466.1      999.3  -3.469 0.002742 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 29530 on 18 degrees of freedom
Multiple R-squared: 0.8234, Adjusted R-squared: 0.7646
F-statistic: 13.99 on 6 and 18 DF, p-value: 6.497e-06
b. R Code
#load the data set
Golfstats <- read.delim("~/Golfstats.txt")
View(Golfstats)
#vif() is not in base R; the car package is assumed here
library(car)
#name the variables
Y <- Golfstats$Earnings.Event
X1 <- Golfstats$Avg..Score
X2 <- Golfstats$GIR.....
X3 <- Golfstats$Driving.Accuracy....
X4 <- Golfstats$Putts.Round
X5 <- Golfstats$Events
X6 <- Golfstats$Driving.Distance
#Perform Exploratory Analysis
pairs(Golfstats)
#create linear model (named fit so the lm() function is not masked)
fit <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6)
fit
#create residual plot to test the residuals
plot(fitted(fit), resid(fit))
abline(h = 0, lty = 2)
#check the significance of the variables and find test statistics
summary(fit)
#check for multicollinearity of the variables
vif(fit)
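
A residual-versus-fitted plot mainly shows whether the residuals have a pattern; a normal Q-Q plot or a Shapiro-Wilk test looks at normality more directly. The sketch below shows both checks on simulated data. The data, the model y ~ x, and all names are hypothetical stand-ins, not the golf data or the author's analysis.

```r
set.seed(42)

# Hypothetical data standing in for a real data set
n <- 25
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)

fit <- lm(y ~ x)
res <- resid(fit)

# Normal Q-Q plot: points close to the line suggest roughly normal residuals
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: a small p value would indicate a departure from normality
sw <- shapiro.test(res)
print(sw$p.value)
```

With a small sample, neither check is conclusive, so the two are best read together with the residual plot.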
