
# NFL Data Predictor Model: How the Past Predicts the Future


Due to a horrible 2010 season performance, the General Manager of the Carolina Panthers, Marty Hurney, wants to replace Jimmy Clausen with a new starting quarterback.

Managerial Question: How can Hurney rate quarterbacks to determine whom to sign for the 2011 season?

To complete this task, we developed a multi-predictor statistical model using the 2010 NFL quarterbacks’ passing statistics.

*Our study/presentation has no affiliation with www.nfl.com and is only for educational purposes.

• Prediction interval range for the Stepwise Model: 43.93 (155.24 - 111.31)

1. NFL Data Predictor Model: How the Past Predicts the Future

   Ryan Kunes, Kristy Huffman, Maziar Mahboubian
2. Enterprise

   Due to a horrible 2010 season performance, the General Manager of the Carolina Panthers, Marty Hurney, wants to replace Jimmy Clausen with a new starting quarterback.

   Managerial Question: How can Hurney rate quarterbacks to determine whom to sign for the 2011 season?

   To complete this task, we developed a multi-predictor statistical model using the 2010 NFL quarterbacks’ passing statistics.
3. Executive Summary

   We created a 15-predictor model to gauge quarterback performance. To help Hurney, we used the statistics of Player X, a number one pick, as our new observation.

   First, we fit a multiple linear regression model with all predictors and reached an R² of 82.2%. Then a stepwise regression eliminated 10 variables as unreliable predictors of quarterback success, and the R² dropped to 78.73%. To improve the model, we tried using LnRate and eliminating extreme observations. Our best model is our Without Extreme Observations Model.

   Predicted values for the new observation (15-predictor model):

   | New Obs | Fit | SE Fit | 95% CI | 95% PI |
   |---|---|---|---|---|
   | 1 | 136.13 | 30.79 | (74.60, 197.66) | (71.53, 200.74) XX |

   XX denotes a point that is an extreme outlier in the predictors.

   Values of predictors for the new observation:

   | New Obs | Comp | Att | Pct | Att/G | Yds | Avg | Yds/G | TD | Int | 1st | 1st% | Lng | 20+ | 40+ | Sck |
   |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
   | 1 | 273 | 383 | 71.3 | 25.5 | 3845 | 10.0 | 256 | 35.0 | 6.00 | 130 | 35.5 | 83.0 | 39.0 | 5.00 | 6.00 |

   Without Extreme Observations Model. The regression equation is

   Rate = - 0.97 + 5.45 Avg + 0.665 Pct + 0.931 TD - 1.12 Int + 0.0285 Lng

   S = 6.24771, R-Sq = 84.8%, R-Sq(adj) = 83.7%

   Predicted values for the new observation:

   | New Obs | Fit | SE Fit | 95% CI | 95% PI |
   |---|---|---|---|---|
   | 1 | 129.179 | 3.002 | (123.186, 135.173) | (115.340, 143.019) |

   Values of predictors for the new observation:

   | New Obs | Avg | Pct | TD | Int | Lng |
   |---|---|---|---|---|---|
   | 1 | 10.0 | 71.3 | 35.0 | 6.00 | 83.0 |
4. Data Summary

   Variable definitions:

   | Variable | Definition | Variable | Definition |
   |---|---|---|---|
   | Comp | Number of pass completions | TD | Number of touchdowns |
   | Att | Number of pass attempts | Int | Number of interceptions |
   | Pct | Completion percentage | 1st | Number of first downs |
   | Att/G | Number of attempts per game | 1st% | Percentage rate of first downs |
   | Yds | Number of passing yards | Lng | Longest pass in yards |
   | Avg | Average number of yards per attempt | 20+ | Number of passes over 20 yards |
   | Yds/G | Number of yards per game | 40+ | Number of passes over 40 yards |
   | | | Sck | Number of sacks |

   Note: The extreme range of the predictors leads to many outliers. This is partially explained by the difference between the statistics of starting QBs and backup QBs.

   Descriptive statistics:

   | Variable | N | N* | Mean | SE Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum |
   |---|---|---|---|---|---|---|---|---|---|---|
   | Rate | 79 | 0 | 74.94 | 2.36 | 21.00 | 5.90 | 65.50 | 79.90 | 90.90 | 111.00 |
   | Comp | 79 | 0 | 132.2 | 14.9 | 132.7 | 1.0 | 13.0 | 73.0 | 257.0 | 450.0 |
   | Att | 79 | 0 | 217.5 | 23.7 | 210.6 | 1.0 | 27.0 | 133.0 | 432.0 | 679.0 |
   | Pct | 79 | 0 | 59.42 | 1.16 | 10.34 | 37.00 | 52.90 | 59.60 | 63.40 | 100.00 |
   | Att/G | 79 | 0 | 22.08 | 1.34 | 11.95 | 0.20 | 12.60 | 25.30 | 31.70 | 42.40 |
   | Yds | 79 | 0 | 1525 | 172 | 1533 | 6 | 122 | 857 | 3018 | 4710 |
   | Avg | 79 | 0 | 6.542 | 0.157 | 1.392 | 2.000 | 5.900 | 6.700 | 7.400 | 9.500 |
   | Yds/G | 79 | 0 | 148.0 | 10.0 | 89.1 | 1.2 | 67.0 | 171.0 | 221.0 | 294.4 |
   | TD | 79 | 0 | 9.43 | 1.20 | 10.64 | 0.00 | 0.00 | 5.00 | 17.00 | 36.00 |
   | Int | 79 | 0 | 6.430 | 0.701 | 6.232 | 0.000 | 1.000 | 4.000 | 10.000 | 25.000 |
   | 1st | 79 | 0 | 73.95 | 8.57 | 76.20 | 0.00 | 7.00 | 39.00 | 149.00 | 253.00 |
   | 1st% | 79 | 0 | 31.96 | 1.33 | 11.84 | 0.00 | 26.50 | 32.30 | 35.60 | 100.00 |
   | Lng | 79 | 0 | 51.90 | 2.73 | 24.25 | 6.00 | 31.00 | 53.00 | 73.00 | 92.00 |
   | 20+ | 79 | 0 | 19.22 | 2.20 | 19.59 | 0.00 | 1.00 | 10.00 | 38.00 | 65.00 |
   | 40+ | 79 | 0 | 3.380 | 0.432 | 3.837 | 0.000 | 0.000 | 2.000 | 6.000 | 14.000 |
   | Sck | 79 | 0 | 14.19 | 1.48 | 13.16 | 0.00 | 2.00 | 9.00 | 25.00 | 52.00 |
5. Data Correlation Matrices

   Pearson correlations, with p-values in parentheses:

   | | Rate | Comp | Att | Pct | Att/G | Yds | Avg | Yds/G | TD | Int | 1st | 1st% | Lng | 20+ | 40+ |
   |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
   | Comp | 0.481 (0.000) | | | | | | | | | | | | | | |
   | Att | 0.462 (0.000) | 0.996 (0.000) | | | | | | | | | | | | | |
   | Pct | 0.643 (0.000) | 0.170 (0.133) | 0.139 (0.220) | | | | | | | | | | | | |
   | Att/G | 0.273 (0.015) | 0.801 (0.000) | 0.809 (0.000) | -0.077 (0.498) | | | | | | | | | | | |
   | Yds | 0.504 (0.000) | 0.993 (0.000) | 0.991 (0.000) | 0.163 (0.150) | 0.792 (0.000) | | | | | | | | | | |
   | Avg | 0.695 (0.000) | 0.367 (0.001) | 0.355 (0.001) | 0.360 (0.001) | 0.217 (0.055) | 0.411 (0.000) | | | | | | | | | |
   | Yds/G | 0.436 (0.000) | 0.853 (0.000) | 0.854 (0.000) | 0.033 (0.774) | 0.967 (0.000) | 0.863 (0.000) | 0.406 (0.000) | | | | | | | | |
   | TD | 0.556 (0.000) | 0.950 (0.000) | 0.937 (0.000) | 0.189 (0.096) | 0.734 (0.000) | 0.958 (0.000) | 0.424 (0.000) | 0.817 (0.000) | | | | | | | |
   | Int | 0.224 (0.048) | 0.857 (0.000) | 0.863 (0.000) | 0.065 (0.571) | 0.731 (0.000) | 0.839 (0.000) | 0.245 (0.030) | 0.742 (0.000) | 0.755 (0.000) | | | | | | |
   | 1st | 0.500 (0.000) | 0.995 (0.000) | 0.991 (0.000) | 0.169 (0.136) | 0.786 (0.000) | 0.997 (0.000) | 0.395 (0.000) | 0.851 (0.000) | 0.961 (0.000) | 0.837 (0.000) | | | | | |
   | 1st% | 0.534 (0.000) | 0.193 (0.089) | 0.180 (0.113) | 0.477 (0.000) | -0.023 (0.842) | 0.209 (0.064) | 0.577 (0.000) | 0.100 (0.378) | 0.236 (0.036) | 0.101 (0.375) | 0.217 (0.055) | | | | |
   | Lng | 0.389 (0.000) | 0.664 (0.000) | 0.673 (0.000) | -0.058 (0.612) | 0.726 (0.000) | 0.679 (0.000) | 0.438 (0.000) | 0.770 (0.000) | 0.664 (0.000) | 0.623 (0.000) | 0.657 (0.000) | 0.051 (0.658) | | | |
   | 20+ | 0.518 (0.000) | 0.947 (0.000) | 0.948 (0.000) | 0.150 (0.186) | 0.765 (0.000) | 0.976 (0.000) | 0.458 (0.000) | 0.860 (0.000) | 0.933 (0.000) | 0.796 (0.000) | 0.961 (0.000) | 0.218 (0.053) | 0.680 (0.000) | | |
   | 40+ | 0.497 (0.000) | 0.869 (0.000) | 0.867 (0.000) | 0.141 (0.215) | 0.691 (0.000) | 0.901 (0.000) | 0.447 (0.000) | 0.790 (0.000) | 0.860 (0.000) | 0.676 (0.000) | 0.883 (0.000) | 0.205 (0.070) | 0.697 (0.000) | 0.905 (0.000) | |
   | Sck | 0.410 (0.000) | 0.847 (0.000) | 0.866 (0.000) | 0.080 (0.482) | 0.692 (0.000) | 0.874 (0.000) | 0.371 (0.001) | 0.760 (0.000) | 0.782 (0.000) | 0.742 (0.000) | 0.853 (0.000) | 0.148 (0.194) | 0.642 (0.000) | 0.882 (0.000) | 0.811 (0.000) |

   [Figure: Matrix plot of Rate, Comp, Att, Pct, Att/G, Yds, Avg, Yds/G, TD, Int, 1st, 1st%, Lng, 20+, 40+, Sck]
6. Kitchen Sink Model

   Regression equation:

   Rate = - 0.6 - 0.100 Comp + 0.222 Att + 0.856 Pct - 2.59 Att/G - 0.0136 Yds + 1.80 Avg + 0.417 Yds/G + 0.880 TD - 1.62 Int - 0.109 1st + 0.126 1st% + 0.128 Lng - 0.145 20+ - 0.907 40+ - 0.035 Sck

   | Predictor | Coef | SE Coef | T | P |
   |---|---|---|---|---|
   | Constant | -0.60 | 13.21 | -0.05 | 0.964 |
   | Comp | -0.1005 | 0.2127 | -0.47 | 0.638 |
   | Att | 0.2219 | 0.1005 | 2.21 | 0.031 |
   | Pct | 0.8562 | 0.1445 | 5.93 | 0.000 |
   | Att/G | -2.5928 | 0.9863 | -2.63 | 0.011 |
   | Yds | -0.01360 | 0.03102 | -0.44 | 0.662 |
   | Avg | 1.798 | 1.678 | 1.07 | 0.288 |
   | Yds/G | 0.4170 | 0.1547 | 2.70 | 0.009 |
   | TD | 0.8802 | 0.4970 | 1.77 | 0.081 |
   | Int | -1.6194 | 0.4150 | -3.90 | 0.000 |
   | 1st | -0.1085 | 0.3263 | -0.33 | 0.741 |
   | 1st% | 0.1260 | 0.1316 | 0.96 | 0.342 |
   | Lng | 0.12816 | 0.09180 | 1.40 | 0.168 |
   | 20+ | -0.1446 | 0.5568 | -0.26 | 0.796 |
   | 40+ | -0.9067 | 0.9662 | -0.94 | 0.352 |
   | Sck | -0.0351 | 0.2440 | -0.14 | 0.886 |

   S = 9.85916, R-Sq = 82.2%, R-Sq(adj) = 77.9%

   P-values for Att, Pct, Att/G, Yds/G, and Int are significant at the 95% confidence level; TD (p = 0.081) falls just short.

   The 15-predictor “Kitchen Sink” MLR model has an R² of 82.2% and an S value of 9.86.

   Analysis of Variance:

   | Source | DF | SS | MS | F | P |
   |---|---|---|---|---|---|
   | Regression | 15 | 28259.6 | 1884.0 | 19.38 | 0.000 |
   | Residual Error | 63 | 6123.8 | 97.2 | | |
   | Total | 78 | 34383.3 | | | |

   | Source | DF | Seq SS |
   |---|---|---|
   | Comp | 1 | 7971.0 |
   | Att | 1 | 1512.0 |
   | Pct | 1 | 9621.5 |
   | Att/G | 1 | 13.1 |
   | Yds | 1 | 2194.5 |
   | Avg | 1 | 3364.8 |
   | Yds/G | 1 | 671.5 |
   | TD | 1 | 1294.2 |
   | Int | 1 | 1327.7 |
   | 1st | 1 | 7.7 |
   | 1st% | 1 | 55.2 |
   | Lng | 1 | 139.4 |
   | 20+ | 1 | 0.4 |
   | 40+ | 1 | 84.6 |
   | Sck | 1 | 2.0 |

   Predicted QB rating for Player X:

   | New Obs | Fit | SE Fit | 95% CI | 95% PI |
   |---|---|---|---|---|
   | 1 | 136.13 | 30.79 | (74.60, 197.66) | (71.53, 200.74) XX |

   XX denotes a point that is an extreme outlier in the predictors.

   Values of predictors for the new observation:

   | New Obs | Comp | Att | Pct | Att/G | Yds | Avg | Yds/G | TD | Int | 1st | 1st% | Lng | 20+ | 40+ | Sck |
   |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
   | 1 | 273 | 383 | 71.3 | 25.5 | 3845 | 10.0 | 256 | 35.0 | 6.00 | 130 | 35.5 | 83.0 | 39.0 | 5.00 | 6.00 |
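The summary figures quoted for the Kitchen Sink model follow from its Analysis of Variance table. The sketch below is a minimal Python illustration, assuming only the sums of squares and degrees of freedom printed on the slide; the function name `anova_summary` is hypothetical and is not part of the original Minitab analysis.

```python
import math

def anova_summary(ss_regression, ss_error, df_regression, df_error):
    """Derive R-Sq, R-Sq(adj), F and S from an ANOVA table (illustrative helper)."""
    ss_total = ss_regression + ss_error
    df_total = df_regression + df_error
    ms_regression = ss_regression / df_regression
    ms_error = ss_error / df_error
    return {
        "r_sq": ss_regression / ss_total,                   # share of variation explained
        "r_sq_adj": 1 - ms_error / (ss_total / df_total),   # penalised for predictor count
        "f": ms_regression / ms_error,                      # overall F statistic
        "s": math.sqrt(ms_error),                           # standard error of the regression
    }

# Sums of squares and degrees of freedom copied from the Kitchen Sink slide.
kitchen_sink = anova_summary(28259.6, 6123.8, 15, 63)
for name, value in kitchen_sink.items():
    print(f"{name}: {value:.4f}")
```

With the slide's inputs this should reproduce, up to rounding, the reported R-Sq of 82.2%, R-Sq(adj) of 77.9%, F of 19.38 and S of 9.86.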
7. Assessment: Kitchen Sink Model

   - Independence: The plots do not appear to make a particular shape
   - Constant Variance: The plots look scattered but show some fanning out
   - Mean Zero: The plots are mirrored above and below zero
   - Normal: Residual plot distribution is normal
8. Stepwise Regression

   Alpha-to-Enter: 0.15, Alpha-to-Remove: 0.15. Response is Rate on 15 predictors, with N = 79.

   | Step | 1 | 2 | 3 | 4 | 5 |
   |---|---|---|---|---|---|
   | Constant | 6.342 | -32.066 | -24.042 | -17.452 | -23.056 |
   | Avg | 10.49 | 8.04 | 6.19 | 5.85 | 5.06 |
   | T-Value | 8.48 | 7.43 | 5.92 | 6.01 | 4.95 |
   | P-Value | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
   | Pct | | 0.92 | 0.89 | 0.85 | 0.94 |
   | T-Value | | 6.29 | 6.86 | 7.07 | 7.51 |
   | P-Value | | 0.000 | 0.000 | 0.000 | 0.000 |
   | TD | | | 0.59 | 1.08 | 0.96 |
   | T-Value | | | 4.55 | 6.01 | 5.16 |
   | P-Value | | | 0.000 | 0.000 | 0.000 |
   | Int | | | | -1.05 | -1.22 |
   | T-Value | | | | -3.67 | -4.18 |
   | P-Value | | | | 0.000 | 0.000 |
   | Lng | | | | | 0.149 |
   | T-Value | | | | | 2.09 |
   | P-Value | | | | | 0.040 |
   | S | 15.2 | 12.4 | 11.1 | 10.2 | 10.0 |
   | R-Sq | 48.30 | 66.02 | 73.37 | 77.46 | 78.73 |
   | R-Sq(adj) | 47.63 | 65.12 | 72.30 | 76.24 | 77.27 |
   | Mallows Cp | 107.9 | 47.2 | 23.2 | 10.7 | 8.2 |

   To reduce the effect of multicollinearity, as seen in the original model for the predictors Comp, Att, Yds, 1st, 20+, and 40+, we ran a stepwise regression.

   To improve our model, we chose step five with the predictors Avg, Pct, TD, Int, and Lng.

   Criteria for selection:

   - R²: 78.73%
   - S: 10
   - P-value: all predictors significant at the 95% confidence level (p < 0.05)
   - Number of Variables: Dropped from 15 to 5

   Even though the R² decreased and S increased, we feel that this is an improved model because all of the predictors are significant at the 95% confidence level.
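The R-Sq(adj) row of the stepwise table can be checked against the R-Sq row, which shows how the adjustment penalises each added predictor. This is a minimal sketch, assuming N = 79 and that step k contains exactly k predictors (true for the five steps listed); the function name `adjusted_r_squared` is illustrative.

```python
N_OBS = 79  # observations, from the stepwise output

# R-Sq and R-Sq(adj) in percent, copied from the stepwise table.
R_SQ_BY_STEP = {1: 48.30, 2: 66.02, 3: 73.37, 4: 77.46, 5: 78.73}
REPORTED_ADJ = {1: 47.63, 2: 65.12, 3: 72.30, 4: 76.24, 5: 77.27}

def adjusted_r_squared(r_sq_pct, n_obs, n_predictors):
    """Adjusted R-squared in percent: 1 - (1 - R2)(n - 1)/(n - p - 1)."""
    unexplained = 1 - r_sq_pct / 100
    penalty = (n_obs - 1) / (n_obs - n_predictors - 1)
    return 100 * (1 - unexplained * penalty)

# Assumption: step k has k predictors, as in the table above.
adjusted = {
    step: adjusted_r_squared(r_sq, N_OBS, step)
    for step, r_sq in R_SQ_BY_STEP.items()
}

for step in sorted(adjusted):
    print(f"step {step}: computed {adjusted[step]:.2f}, reported {REPORTED_ADJ[step]:.2f}")
```

Small differences in the last digit are expected because the table's R-Sq values are already rounded.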
9. Stepwise Model: Rate v. Avg, Pct, TD, Int, Lng

   The regression equation is

   Rate = - 23.1 + 5.06 Avg + 0.942 Pct + 0.958 TD - 1.22 Int + 0.149 Lng

   S = 10.0093, R-Sq = 78.7%, R-Sq(adj) = 77.3%

   R² decreased to 78.7% and S increased to 10.

   Analysis of Variance:

   | Source | DF | SS | MS | F | P |
   |---|---|---|---|---|---|
   | Regression | 5 | 27069.8 | 5414.0 | 54.04 | 0.000 |
   | Residual Error | 73 | 7313.6 | 100.2 | | |
   | Total | 78 | 34383.3 | | | |

   | Source | DF | Seq SS |
   |---|---|---|
   | Avg | 1 | 16608.5 |
   | Pct | 1 | 6091.0 |
   | TD | 1 | 2526.1 |
   | Int | 1 | 1408.1 |
   | Lng | 1 | 436.1 |

   Predicted values for the new observation:

   | New Obs | Fit | SE Fit | 95% CI | 95% PI |
   |---|---|---|---|---|
   | 1 | 133.28 | 4.62 | (124.08, 142.48) | (111.31, 155.24) |

   Values of predictors for the new observation:

   | New Obs | Avg | Pct | TD | Int | Lng |
   |---|---|---|---|---|---|
   | 1 | 10.0 | 71.3 | 35.0 | 6.00 | 83.0 |

   Pearson correlations, with p-values in parentheses:

   | | Rate | Avg | Pct | TD | Int |
   |---|---|---|---|---|---|
   | Avg | 0.695 (0.000) | | | | |
   | Pct | 0.643 (0.000) | 0.360 (0.001) | | | |
   | TD | 0.556 (0.000) | 0.424 (0.000) | 0.189 (0.096) | | |
   | Int | 0.224 (0.048) | 0.245 (0.030) | 0.065 (0.571) | 0.755 (0.000) | |
   | Lng | 0.389 (0.000) | 0.438 (0.000) | -0.058 (0.612) | 0.664 (0.000) | 0.623 (0.000) |

   [Figure: Matrix Plot of Rate, Avg, Pct, TD, Int, Lng]

   [Figure: Residual Plots for Rate (Normal Probability Plot, Versus Fits, Histogram, Versus Order)]

   - Independence: The plots do not appear to make a particular shape
   - Constant Variance: The plots look scattered but show some fanning out
   - Mean Zero: The plots are mirrored above and below zero
   - Normal: Residual plot distribution is normal
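The fitted rating for Player X can be reproduced by plugging his statistics into the Stepwise Model equation. The sketch below is illustrative only: it uses the rounded coefficients printed on the slide, so the result differs slightly from Minitab's 133.28, and the names `predict_rate` and `interval_width` are hypothetical helpers.

```python
# Rounded coefficients from the Stepwise Model equation on the slide.
STEPWISE = {
    "Constant": -23.1,
    "Avg": 5.06,
    "Pct": 0.942,
    "TD": 0.958,
    "Int": -1.22,
    "Lng": 0.149,
}

# Player X's values for the five retained predictors.
PLAYER_X = {"Avg": 10.0, "Pct": 71.3, "TD": 35.0, "Int": 6.0, "Lng": 83.0}

def predict_rate(coefficients, stats):
    """Evaluate the linear regression equation for one observation."""
    fit = coefficients["Constant"]
    for name, value in stats.items():
        fit += coefficients[name] * value
    return fit

def interval_width(lower, upper):
    """Width of an interval, used here for the 95% prediction interval."""
    return upper - lower

fit = predict_rate(STEPWISE, PLAYER_X)
pi_width = interval_width(111.31, 155.24)  # 95% PI bounds from the slide

print(f"fit from rounded coefficients: {fit:.2f}")
print(f"95% PI width: {pi_width:.2f}")
```

The rounded equation gives a fit of about 133.24, and the prediction interval spans 43.93 rating points.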
10. LnRate Stepwise Model: LnRate v. Avg, Pct, TD, Int, Lng

    The regression equation is

    LnRate = 2.14 + 0.114 Avg + 0.0194 Pct + 0.00954 TD - 0.0166 Int + 0.00428 Lng

    S = 0.278903, R-Sq = 63.7%, R-Sq(adj) = 61.2%

    | Predictor | Coef | SE Coef | T | P | VIF |
    |---|---|---|---|---|---|
    | Constant | 2.1439 | 0.2277 | 9.42 | 0.000 | |
    | Avg | 0.11391 | 0.02852 | 3.99 | 0.000 | 1.580 |
    | Pct | 0.019426 | 0.003496 | 5.56 | 0.000 | 1.310 |
    | TD | 0.009539 | 0.005170 | 1.84 | 0.069 | 3.037 |
    | Int | -0.016590 | 0.008130 | -2.04 | 0.045 | 2.574 |
    | Lng | 0.004275 | 0.001986 | 2.15 | 0.035 | 2.326 |

    The lower R² indicates that taking the natural log does not yield a more accurate prediction interval. Back-transformed, the 95% PI runs from e^(4.6450) = 104.063 to e^(5.8692) = 353.965, so the prediction interval range has increased significantly, to 249.902.

    *A few extreme outliers prevent the transformation from improving our model.

    TD’s VIF of 3.037 indicates multicollinearity, which is creating skewed results. In particular, TD’s p-value increased to 0.069 and is no longer significant at the 95% confidence level.

    Analysis of Variance:

    | Source | DF | SS | MS | F | P |
    |---|---|---|---|---|---|
    | Regression | 5 | 9.9458 | 1.9892 | 25.57 | 0.000 |
    | Residual Error | 73 | 5.6785 | 0.0778 | | |
    | Total | 78 | 15.6242 | | | |

    - Independence: The plots appear to make an inverted parabola
    - Constant Variance: The plots fan inward
    - Mean Zero: The plots are mirrored above and below zero
    - Normal: Residual plot distribution is left skewed

    Predicted values for the new observation:

    | New Obs | Fit | SE Fit | 95% CI | 95% PI |
    |---|---|---|---|---|
    | 1 | 5.2571 | 0.1286 | (5.0008, 5.5135) | (4.6450, 5.8692) |

    Values of predictors for the new observation:

    | New Obs | Avg | Pct | TD | Int | Lng |
    |---|---|---|---|---|---|
    | 1 | 10.0 | 71.3 | 35.0 | 6.00 | 83.0 |
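Because this model predicts LnRate, its prediction interval has to be exponentiated before it can be compared with the intervals of the Rate models. A minimal sketch of that back-transformation, using the 95% PI bounds from the slide (the helper name `back_transform_interval` is an assumption for illustration):

```python
import math

def back_transform_interval(log_lower, log_upper):
    """Convert an interval on the natural-log scale back to the original scale."""
    lower = math.exp(log_lower)
    upper = math.exp(log_upper)
    return lower, upper, upper - lower

# 95% prediction interval for LnRate, copied from the slide.
lower, upper, width = back_transform_interval(4.6450, 5.8692)

print(f"Rate-scale 95% PI: ({lower:.3f}, {upper:.3f}), width {width:.3f}")
```

The back-transformed interval is asymmetric around the fitted value, which is one reason a log model can produce a much wider range on the original scale.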
11. Without Extreme Observations Model: Rate v. Avg, Pct, TD, Int, Lng

    After analyzing the players who had unusual data, we removed their extreme observations because they represent second-string players who did not get enough playing time to produce adequate data.

    The regression equation is

    Rate = - 0.97 + 5.45 Avg + 0.665 Pct + 0.931 TD - 1.12 Int + 0.0285 Lng

    S = 6.24771, R-Sq = 84.8%, R-Sq(adj) = 83.7%

    | Predictor | Coef | SE Coef | T | P | VIF |
    |---|---|---|---|---|---|
    | Constant | -0.971 | 6.378 | -0.15 | 0.879 | |
    | Avg | 5.4514 | 0.6974 | 7.82 | 0.000 | 1.374 |
    | Pct | 0.66521 | 0.08587 | 7.75 | 0.000 | 1.273 |
    | TD | 0.9308 | 0.1180 | 7.89 | 0.000 | 2.910 |
    | Int | -1.1227 | 0.1829 | -6.14 | 0.000 | 2.457 |
    | Lng | 0.02849 | 0.04806 | 0.59 | 0.555 | 2.284 |

    R² responded well by increasing substantially.

    The p-values of the predictors are 0.000, except for Lng, which is 0.555. However, its VIF of 2.284 is not the highest.

    The prediction interval range favorably decreased to 27.679.

    Analysis of Variance:

    | Source | DF | SS | MS | F | P |
    |---|---|---|---|---|---|
    | Regression | 5 | 14378.9 | 2875.8 | 73.67 | 0.000 |
    | Residual Error | 66 | 2576.2 | 39.0 | | |
    | Total | 71 | 16955.2 | | | |

    - Independence: The plots do not appear to make a particular shape
    - Constant Variance: The plots look scattered
    - Mean Zero: The plots are mirrored above and below zero
    - Normal: Residual plot distribution is normal

    Predicted values for the new observation:

    | New Obs | Fit | SE Fit | 95% CI | 95% PI |
    |---|---|---|---|---|
    | 1 | 129.179 | 3.002 | (123.186, 135.173) | (115.340, 143.019) |

    Values of predictors for the new observation:

    | New Obs | Avg | Pct | TD | Int | Lng |
    |---|---|---|---|---|---|
    | 1 | 10.0 | 71.3 | 35.0 | 6.00 | 83.0 |
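The significance pattern in the coefficient table can be read off the T column, which is each coefficient divided by its standard error. The sketch below recomputes those ratios and flags predictors whose absolute t-value exceeds a cutoff. The cutoff of 2.0 is an assumption, an approximation to the two-sided 5% critical value of a t distribution with 66 error degrees of freedom; exact p-values would need a statistics library.

```python
# (coefficient, standard error) pairs copied from the slide's table.
COEFFICIENTS = {
    "Avg": (5.4514, 0.6974),
    "Pct": (0.66521, 0.08587),
    "TD": (0.9308, 0.1180),
    "Int": (-1.1227, 0.1829),
    "Lng": (0.02849, 0.04806),
}

# Assumption: approximate two-sided 5% critical value for 66 error df.
T_CRITICAL = 2.0

def t_statistics(table):
    """t = coefficient / standard error for each predictor."""
    return {name: coef / se for name, (coef, se) in table.items()}

t_values = t_statistics(COEFFICIENTS)
significant = sorted(name for name, t in t_values.items() if abs(t) > T_CRITICAL)

for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}")
print("significant at roughly the 5% level:", significant)
```

Under this cutoff every predictor except Lng is flagged, which matches the p-values in the table.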
12. LnRate Without Extreme Observations Model: LnRate v. Avg, Pct, TD, Int, Lng

    The regression equation is

    LnRate = 3.09 + 0.0890 Avg + 0.00984 Pct + 0.0103 TD - 0.0128 Int + 0.000813 Lng

    | Predictor | Coef | SE Coef | T | P | VIF |
    |---|---|---|---|---|---|
    | Constant | 3.0950 | 0.1172 | 26.40 | 0.000 | |
    | Avg | 0.08904 | 0.01282 | 6.95 | 0.000 | 1.374 |
    | Pct | 0.009838 | 0.001578 | 6.23 | 0.000 | 1.273 |
    | TD | 0.010264 | 0.002170 | 4.73 | 0.000 | 2.910 |
    | Int | -0.012753 | 0.003362 | -3.79 | 0.000 | 2.457 |
    | Lng | 0.0008131 | 0.0008834 | 0.92 | 0.361 | 2.284 |

    S = 0.114843, R-Sq = 77.2%, R-Sq(adj) = 75.4%

    R² decreased to 77.2%.

    The p-values of our predictors are 0.000, except for Lng, whose p-value decreased to 0.361.

    Back-transformed, the prediction interval runs from e^(4.7826) = 119.4144 to e^(5.2914) = 198.6213, a range of about 79.21, which is nearly three times as wide as the 27.679 of the previous model. Therefore the natural log does not improve the model.

    Analysis of Variance:

    | Source | DF | SS | MS | F | P |
    |---|---|---|---|---|---|
    | Regression | 5 | 2.93998 | 0.58800 | 44.58 | 0.000 |
    | Residual Error | 66 | 0.87047 | 0.01319 | | |
    | Total | 71 | 3.81045 | | | |

    - Independence: The plots do not appear to make a particular shape
    - Constant Variance: The plots look scattered
    - Mean Zero: The plots are mirrored above and below zero
    - Normal: Residual plot distribution is normal

    Predicted values for the new observation:

    | New Obs | Fit | SE Fit | 95% CI | 95% PI |
    |---|---|---|---|---|
    | 1 | 5.0370 | 0.0552 | (4.9269, 5.1472) | (4.7826, 5.2914) |

    Values of predictors for the new observation:

    | New Obs | Avg | Pct | TD | Int | Lng |
    |---|---|---|---|---|---|
    | 1 | 10.0 | 71.3 | 35.0 | 6.00 | 83.0 |
13. Conclusion

    Criteria for choice:

    The Without Extreme Observations Model improves both R² and S, but the predictor Lng is no longer significant at the 95% confidence level. We feel that this does put a damper on our results. However, the increased reliability of the regression equation due to the improvement, and the appearance of more constant variance as compared to the Stepwise Model, lead us to conclude it is the best model.

    *Many improvements can be made to this model. For instance, one can observe only starting quarterbacks. Because we needed to find 50 observations, this adjustment did not meet the objective requirements.

    Because the R² is 84.8% and the S is 6.25, Hurney should use caution when using this model. The model does not take into account all the information.
14. Statistics For-The-Win!!!