Undercover Boss: Stripping Away the Disguise to Analyze the Financial Perform...
Statistics Report
John Worth
To determine the best method for predicting the total gross revenue of major Hollywood movies
in United States theaters, a study was conducted on 52 Hollywood movies released in 2012. The
independent variables chosen to explain total gross were opening weekend performance, the
season in which the film was released, the estimated budget of the film, and the number of
screens the film was initially shown on in the USA. Based on the data observed, I can say with
95% confidence that the mean total gross of a major Hollywood movie released in the USA lies
between about $97.6 million and $164.4 million. That said, there is reason to believe that the
performance of films is explained by at least one of the variables analyzed.
Prior to any statistical analysis, predictions were made about the relationship between each
independent variable and the dependent variable. Opening weekend performance, estimated
budget, and number of screens shown in the USA were each expected to have a positive
relationship with total gross (as each increases, so does total gross). Of the seasons, summer
films were expected to generate the highest total gross. A regression model of the entire data
set can be seen in the attached appendix. It is believed a more accurate model can be formed.
A correlation test did not indicate multicollinearity in the model, so an F test was conducted
for overall significance, which was found to be present. Next, t tests were conducted for the
individual significance of each independent variable in explaining total gross. From these
tests, season released was found to be insignificant and was removed. A partial F test was then
conducted to check whether the reduced model was preferable to the full model, and it was.
When the observations for “Taken 2” (see data) were entered into this model, the result was
$121.12M. It is believed that a more accurate model could still be formed.
VIF tests were then conducted, and the VIF for opening weekend was found to be dangerously
high, so it was subsequently removed. The new model includes only estimated budget and screens
shown in the USA as independent variables. The next test determined whether there was overall
significance in this model, and the results showed that there was. Tests for individual
significance of the two remaining independent variables were conducted afterwards. Estimated
budget was still found to have individual significance. Screens shown in the USA, however, no
longer held individual significance and was consequently removed as well. The model now
contains only estimated budget as the independent variable explaining total USA gross. Now that
a bivariate model has been formed, it must be checked for any violations of the assumptions of
linear regression.
Of the checks performed (scatter plots, residual plots, and Durbin-Watson tests), visual
inspection indicated a possible violation of the assumption that the relationship between the
variables is linear. The appropriate polynomial prescription was applied to determine whether
this would improve the model: a new variable, budget squared, was added and a regression was
run. This new model increased the percent of variability in the dependent variable explained by
the equation and reduced the standard error. A test for overall significance in the model showed
that it was present. However, in the tests of individual significance neither variable was
significant, so the polynomial variable was removed. When entered into the bivariate model, the
“Taken 2” observation gives 34.89+1.13(45)=$85.72M. Based on the findings of the tests
conducted, this is believed to be the best model to predict total USA gross for major Hollywood
movies.
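The point estimate above can be reproduced with a few lines of Python. This is only a sketch: the coefficients are copied from the bivariate regression output in the appendix, the budget and actual gross for “Taken 2” come from the data set, and the function name `predict_gross` is a hypothetical helper, not part of the original analysis.

```python
# Coefficients from the bivariate regression output in the appendix
# (Total USA Gross regressed on Estimated Budget, both in millions of dollars).
INTERCEPT = 34.89425938
SLOPE = 1.129544142


def predict_gross(budget_millions: float) -> float:
    """Point estimate of total USA gross (in $M) for a given budget (in $M)."""
    return INTERCEPT + SLOPE * budget_millions


taken2_budget = 45.0    # Estimated Budget for "Taken 2" in the data set
taken2_actual = 139.0   # Total USA Gross for "Taken 2" in the data set

estimate = predict_gross(taken2_budget)
residual = taken2_actual - estimate   # positive means the model under-predicts

print(f"estimate = {estimate:.2f}M, residual = {residual:.2f}M")
```

Comparing the estimate with the recorded gross gives a sense of how far a single prediction from this model can fall from the observed value.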
Appendices
                                Opening Weekend   Estimated Budget  Screens Shown
                                (in millions)     (in millions)     in USA
Opening Weekend (in millions)   1
Estimated Budget (in millions)  0.60686209        1
Screens Shown in USA            0.604160598       0.597862186       1
Total USA Gross (in millions)
Mean 130.9815
Standard Error 16.64957
Median 98
Mode 126
Standard Deviation 120.0618
Sample Variance 14414.83
Kurtosis 5.217573
Skewness 2.043474
Range 622.96
Minimum 0.04
Maximum 623
Sum 6811.04
Count 52
Confidence Level(95.0%) 33.42542
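The 95% confidence interval for mean total gross follows directly from the descriptive statistics above. The sketch below uses only the reported values; the variable names are illustrative, and the critical t value is backed out from the reported margin rather than looked up.

```python
import math

# Values copied from the descriptive statistics table above.
mean = 130.9815        # Mean of Total USA Gross (in millions)
std_dev = 120.0618     # Standard Deviation
n = 52                 # Count
margin = 33.42542      # "Confidence Level(95.0%)" row, i.e. the margin of error

std_error = std_dev / math.sqrt(n)   # should agree with the Standard Error row
implied_t = margin / std_error       # critical t value implied by the table
lower = mean - margin
upper = mean + margin

print(f"standard error = {std_error:.5f}")
print(f"implied t      = {implied_t:.4f}")
print(f"95% CI for the mean: {lower:.2f}M to {upper:.2f}M")
```

The interval describes the mean gross of such films, not the gross of any single film.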
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.95803782
R Square 0.917836465
Adjusted R Square 0.906881327
Standard Error 36.63727831
Observations 52
ANOVA
df SS MS F Significance F
Regression 6 674753.4466 112458.9078 83.78136928 9.29657E-23
Residual 45 60403.0573 1342.290162
Total 51 735156.5039
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 90.0% Upper 90.0%
Intercept 109.995316 24.08266359 4.567406575 3.82446E-05 61.49034168 158.5002904 69.55023111 150.440401
Opening Weekend (in millions) 2.729583269 0.166160103 16.42742881 1.2431E-20 2.394919641 3.064246896 2.450529439 3.008637098
Fall -23.82059761 16.58196578 -1.436536411 0.157766127 -57.21839107 9.577195857 -51.66880516 4.027609941
Summer -28.54918988 16.92093063 -1.687211567 0.09848373 -62.6296936 5.531313848 -56.96666428 -0.131715467
Spring -40.67183284 17.60329404 -2.310467162 0.025502793 -76.12668703 -5.21697865 -70.23528706 -11.10837862
Estimated Budget (in millions) 0.352391577 0.102653087 3.432839562 0.001292254 0.145637647 0.559145506 0.179993171 0.524789982
Screens Shown in USA -0.028689901 0.007449963 -3.851012523 0.00036959 -0.043694897 -0.013684905 -0.041201574 -0.016178229
Full Regression F Test
H0: β1=β2=β3=β4=β5=β6=0 α=0.05 F Critical=2.34
Ha: At least one β≠0 I reject H0 if F>2.34
F=83.78 83.78>2.34 I REJECT H0, THERE IS OVERALL SIGNIFICANCE.
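As a check on the overall F test, the F statistic and R Square can be recomputed from the ANOVA rows of the full regression output. This is a minimal sketch using the reported sums of squares; the critical value 2.34 is the one used in the report, and the variable names are illustrative.

```python
# ANOVA figures from the full (six-variable) regression output.
ss_regression = 674753.4466
ss_residual = 60403.0573
df_regression = 6
df_residual = 45

ms_regression = ss_regression / df_regression   # mean square, regression
ms_residual = ss_residual / df_residual         # mean square, residual (MSE)

f_stat = ms_regression / ms_residual
r_squared = ss_regression / (ss_regression + ss_residual)

F_CRITICAL = 2.34   # critical value used in the report
reject_h0 = f_stat > F_CRITICAL

print(f"F = {f_stat:.2f}, R Square = {r_squared:.4f}, reject H0: {reject_h0}")
```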
Individual t Tests
Opening Weekend:
H0: β=0 α=0.05 (α/2=0.025) t Critical=2.021
Ha: β≠0 I reject H0 if t>2.021 or t<-2.021
t=16.43 16.43>2.021 I REJECT H0, THERE IS INDIVIDUAL SIGNIFICANCE.
Estimated Budget:
H0: β=0 α=0.05 (α/2=0.025) t Critical=2.021
Ha: β≠0 I reject H0 if t>2.021 or t<-2.021
t=3.43 3.43>2.021 I REJECT H0, THERE IS INDIVIDUAL SIGNIFICANCE.
Screens Shown in USA:
H0: β=0 α=0.05 (α/2=0.025) t Critical=2.021
Ha: β≠0 I reject H0 if t>2.021 or t<-2.021
t=-3.85 -3.85<-2.021 I REJECT H0, THERE IS INDIVIDUAL SIGNIFICANCE.
Seasons:
Fall:
H0: β=0 α=0.05 (α/2=0.025) t Critical=2.021
Ha: β≠0 I reject H0 if t>2.021 or t<-2.021
t=-1.44 -2.021<-1.44<2.021 I DO NOT REJECT H0, THERE IS NO INDIVIDUAL SIGNIFICANCE.
QUALITATIVE VARIABLE AS A WHOLE IS INSIGNIFICANT
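Each t statistic above is the coefficient divided by its standard error. The sketch below applies the same decision rule to every row of the full regression output, using the report's critical value of 2.021. Taken on its own, the Spring dummy exceeds the critical value by this rule, while Fall and Summer do not; the report treats the season variable as a whole and tests the three dummies jointly in the partial F test.

```python
T_CRITICAL = 2.021   # two-tailed critical value used in the report

# (coefficient, standard error) pairs from the full regression output.
coefficients = {
    "Opening Weekend": (2.729583269, 0.166160103),
    "Fall": (-23.82059761, 16.58196578),
    "Summer": (-28.54918988, 16.92093063),
    "Spring": (-40.67183284, 17.60329404),
    "Estimated Budget": (0.352391577, 0.102653087),
    "Screens Shown in USA": (-0.028689901, 0.007449963),
}

# t statistic = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se) in coefficients.items()}

# Reject H0 when the t statistic falls outside (-T_CRITICAL, +T_CRITICAL).
significant = {name: abs(t) > T_CRITICAL for name, t in t_stats.items()}

for name, t in t_stats.items():
    verdict = "reject H0" if significant[name] else "do not reject H0"
    print(f"{name:22s} t = {t:7.2f}  {verdict}")
```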
Correlation Test
Multicollinearity does not exist.
Reduced Regression Model
Partial F Test
H0: β(Fall)=β(Summer)=β(Spring)=0 α=0.05 F Critical=2.84
Ha: At least one of these β≠0 I reject H0 if F>2.84
SSEr=67751.94 SSEf=60403.06 K=6 L=3 MSEf=1342.29
F=[(SSEr-SSEf)/(K-L)]/MSEf=[(67751.94-60403.06)/(6-3)]/1342.29=1.82
1.82<2.84 I DO NOT REJECT H0. SUGGEST REDUCED MODEL.
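The partial F statistic can be recomputed from the two regression outputs. In those outputs the residual sum of squares is 60403.06 for the full six-variable model and 67751.94 for the reduced three-variable model, so the larger value belongs to the reduced model. A minimal sketch, with illustrative variable names:

```python
# Residual sums of squares from the two regression outputs.
sse_full = 60403.0573       # six-variable model (with season dummies)
sse_reduced = 67751.93939   # three-variable model (season dummies removed)

k_full = 6                  # independent variables in the full model
k_reduced = 3               # independent variables in the reduced model
mse_full = 1342.290162      # residual mean square of the full model

# Partial F: extra error per dropped variable, relative to the full-model MSE.
partial_f = ((sse_reduced - sse_full) / (k_full - k_reduced)) / mse_full

F_CRITICAL = 2.84           # critical value used in the report
prefer_reduced = partial_f < F_CRITICAL

print(f"partial F = {partial_f:.2f}, prefer reduced model: {prefer_reduced}")
```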
Total Gross=83.84+2.69(Opening Weekend)+0.34(Estimated Budget)-0.03(Screens)
Point Estimate (Based on final observation)
=83.84+2.69(49)+0.34(45)-0.03(3661)
=121.12
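The point estimate above uses coefficients rounded to two decimal places. Because the screens coefficient is multiplied by a value in the thousands, that rounding has a noticeable effect. The sketch below compares the rounded calculation with the same calculation using the unrounded coefficients from the reduced regression output; the dictionary and function names are illustrative only.

```python
# "Taken 2" observation from the data set.
opening_weekend = 49.0   # in millions
budget = 45.0            # in millions
screens = 3661.0

# Coefficients as rounded in the report's equation.
rounded = {"intercept": 83.84, "opening": 2.69, "budget": 0.34, "screens": -0.03}

# Unrounded coefficients from the reduced regression output.
full = {
    "intercept": 83.84209661,
    "opening": 2.690003668,
    "budget": 0.343073737,
    "screens": -0.02784698,
}


def point_estimate(c: dict) -> float:
    """Predicted total USA gross (in $M) for the Taken 2 observation."""
    return (c["intercept"] + c["opening"] * opening_weekend
            + c["budget"] * budget + c["screens"] * screens)


estimate_rounded = point_estimate(rounded)
estimate_full = point_estimate(full)

print(f"rounded coefficients:   {estimate_rounded:.2f}M")
print(f"unrounded coefficients: {estimate_full:.2f}M")
```

Carrying more decimal places for the screens coefficient would bring the estimate closer to what the fitted model itself predicts.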
                                Total USA Gross   Opening Weekend   Estimated Budget  Screens Shown
                                (in millions)     (in millions)     (in millions)     in USA
Total USA Gross (in millions)   1
Opening Weekend (in millions)   0.933995579       1
Estimated Budget (in millions)  0.637269456       0.60686209        1
Screens Shown in USA            0.466753369       0.604160598       0.597862186       1
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.952806441
R Square 0.907840114
Adjusted R Square 0.902080121
Standard Error 37.56991798
Observations 52
ANOVA
df SS MS F Significance F
Regression 3 667404.5645 222468.1882 157.6113264 7.55342E-25
Residual 48 67751.93939 1411.498737
Total 51 735156.5039
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 83.84209661 19.02099695 4.407870777 5.852E-05 45.59781902 122.0863742 45.59781902 122.0863742
Opening Weekend (in millions) 2.690003668 0.168337033 15.97986863 7.54681E-21 2.351539379 3.028467957 2.351539379 3.028467957
Estimated Budget (in millions) 0.343073737 0.104961234 3.268575672 0.002001929 0.132035031 0.554112443 0.132035031 0.554112443
Screens Shown in USA -0.02784698 0.007340335 -3.793693302 0.000415794 -0.042605713 -0.013088247 -0.042605713 -0.013088247
Bivariate Regression Models
Opening Weekend:
Estimated Budget:
Screens Shown in USA:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.637269456
R Square 0.40611236
Adjusted R Square 0.394234607
Standard Error 93.44520973
Observations 52
ANOVA
df SS MS F Significance F
Regression 1 298556.1428 298556.1428 34.19100961 3.77646E-07
Residual 50 436600.361 8732.007221
Total 51 735156.5039
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 34.89425938 20.9274484 1.667391968 0.101688883 -7.139757794 76.92827655 -7.139757794 76.92827655
Estimated Budget (in millions) 1.129544142 0.193173365 5.847307895 3.77646E-07 0.74154402 1.517544264 0.74154402 1.517544264
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.933995579
R Square 0.872347742
Adjusted R Square 0.869794697
Standard Error 43.32306257
Observations 52
ANOVA
df SS MS F Significance F
Regression 1 641312.1164 641312.1164 341.6891161 5.36337E-24
Residual 50 93844.3875 1876.88775
Total 51 735156.5039
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 28.63753479 8.169972732 3.505217915 0.000972696 12.22766161 45.04740797 12.22766161 45.04740797
Opening Weekend (in millions) 2.639380358 0.142786257 18.48483476 5.36337E-24 2.352585721 2.926174995 2.352585721 2.926174995
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.466753369
R Square 0.217858707
Adjusted R Square 0.202215881
Standard Error 107.237704
Observations 52
ANOVA
df SS MS F Significance F
Regression 1 160160.2455 160160.2455 13.9270685 0.000486616
Residual 50 574996.2584 11499.92517
Total 51 735156.5039
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -48.89307659 50.44122658 -0.969307844 0.337057283 -150.2072619 52.42110869 -150.2072619 52.42110869
Screens Shown in USA 0.058006437 0.015543411 3.731898779 0.000486616 0.026786577 0.089226297 0.026786577 0.089226297
VIFs for Reduced Model
Opening Weekend regressed on Total USA Gross, Budget, Screens: VIF=11.11
Budget regressed on Opening Weekend, Total USA Gross, Screens: VIF=2.22
Screens regressed on Budget, Opening Weekend, Total USA Gross: VIF=2.33
Total USA Gross regressed on Budget, Opening Weekend, Screens: VIF=11.11
REMOVE VARIABLE “OPENING WEEKEND”. VIF TOO HIGH.
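A VIF is 1/(1-R²), where R² comes from regressing one variable on the others; a VIF of 11.11 corresponds to an R² of about 0.91. The figures above include Total USA Gross among the regressors. Under the conventional definition, each independent variable is regressed on the other independent variables only, and with three predictors that R² can be computed from the pairwise correlations in the correlation matrix. The sketch below does this as a cross-check; the function names are hypothetical and the closed-form expression applies only to the case of exactly two other predictors.

```python
# Pairwise correlations among the independent variables (from the correlation matrix).
r_ow_budget = 0.60686209        # Opening Weekend vs Estimated Budget
r_ow_screens = 0.604160598      # Opening Weekend vs Screens Shown in USA
r_budget_screens = 0.597862186  # Estimated Budget vs Screens Shown in USA


def r_squared_on_two_others(r_a: float, r_b: float, r_others: float) -> float:
    """R^2 from regressing a variable on two others.

    r_a, r_b: correlations of the target with each of the two regressors;
    r_others: correlation between the two regressors.
    """
    return (r_a**2 + r_b**2 - 2 * r_a * r_b * r_others) / (1 - r_others**2)


def vif(r_squared: float) -> float:
    """Variance inflation factor for a given auxiliary R^2."""
    return 1.0 / (1.0 - r_squared)


vifs = {
    "Opening Weekend": vif(r_squared_on_two_others(r_ow_budget, r_ow_screens, r_budget_screens)),
    "Estimated Budget": vif(r_squared_on_two_others(r_ow_budget, r_budget_screens, r_ow_screens)),
    "Screens Shown in USA": vif(r_squared_on_two_others(r_ow_screens, r_budget_screens, r_ow_budget)),
}

for name, value in vifs.items():
    print(f"{name:22s} VIF = {value:.2f}")

# For comparison: the auxiliary R^2 that a VIF of 11.11 implies.
print(f"VIF at R^2 = 0.91: {vif(0.91):.2f}")
```

Computed this way, the values differ from those listed above because the dependent variable is left out of the auxiliary regressions.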
New Reduced Regression Model
Individual t Tests
Budget:
H0: β=0 α=0.05 (α/2=0.025) t Critical=2.021
Ha: β≠0 I reject H0 if t>2.021 or t<-2.021
t=4.10 4.10>2.021 I REJECT H0. THERE IS INDIVIDUAL SIGNIFICANCE.
Screens:
H0: β=0 α=0.05 (α/2=0.025) t Critical=2.021
Ha: β≠0 I reject H0 if t>2.021 or t<-2.021
t=0.98 -2.021<0.98<2.021 I DO NOT REJECT H0. VARIABLE IS NOT SIGNIFICANT. REMOVE.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.646186363
R Square 0.417556816
Adjusted R Square 0.393783624
Standard Error 93.47998751
Observations 52
ANOVA
df SS MS F Significance F
Regression 2 306969.6087 153484.8044 17.56418867 1.77301E-06
Residual 49 428186.8952 8738.508065
Total 51 735156.5039
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -4.505973741 45.28395337 -0.099504867 0.921143411 -95.50748506 86.49553758 -95.50748506 86.49553758
Estimated Budget (in millions) 0.988120623 0.241074754 4.098814197 0.00015567 0.503662767 1.472578478 0.503662767 1.472578478
Screens Shown in USA 0.016585523 0.016902866 0.981225493 0.331301683 -0.017382058 0.050553105 -0.017382058 0.050553105
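Adjusted R Square gives another view of the decision to drop screens, since it penalizes each added variable. The sketch below recomputes it from R Square for the two-variable model above and for the budget-only bivariate model, using the standard formula 1-(1-R²)(n-1)/(n-k-1); the function name is illustrative.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k independent variables."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


N_OBSERVATIONS = 52

# R Square values copied from the regression outputs.
two_variable = adjusted_r_squared(0.417556816, N_OBSERVATIONS, k=2)  # budget + screens
budget_only = adjusted_r_squared(0.40611236, N_OBSERVATIONS, k=1)    # budget alone

print(f"budget + screens: {two_variable:.6f}")
print(f"budget only:      {budget_only:.6f}")
```

The budget-only model has the slightly higher adjusted value, which is consistent with removing screens.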
New Reduced Model
Assumptions Check
[Figure: Estimated Budget (in millions) Residual Plot. x-axis: Estimated Budget (in millions), 0 to 300; y-axis: Residuals, -500 to 500]
Original Data Set
Movie Title                     Total USA Gross  Opening Weekend  Fall  Summer  Spring  Estimated Budget  Screens Shown
                                (in millions)    (in millions)                          (in millions)     in USA
The Avengers                    623              207              0     0       1       220               4349
Skyfall                         304              88               1     0       0       200               3505
The Dark Knight Rises           448              160              0     1       0       250               4404
The Hobbit                      303              84               0     0       0       180               4045
Ice Age                         161              46               0     1       0       95                3881
Twilight                        292              141              1     0       0       120               4070
Amazing Spider Man              262              62               0     1       0       230               4318
Madagascar 3                    216              60               0     1       0       145               4258
Men in Black 3                  179              54               0     0       1       225               4248
The Hunger Games                408              152              0     0       1       78                4137
This is 40                      67               11               0     0       0       35                2913
Argo                            136              19               1     0       0       44.5              3232
Ted                             218              54               0     1       0       50                3239
21 Jump Street                  138              36               0     0       1       42                3121
Prometheus                      126              51               0     1       0       51                3396
Dictator                        59               17               0     0       1       65                3008
Safe House                      126              40               0     0       0       85                3119
The Bourne Legacy               113              38               0     1       0       125               3745
Django                          162              30               0     0       0       100               3010
Rise of the Guardians           103              23               1     0       0       145               3653
Paranormal Activity 4           53               29               1     0       0       5                 3412
Looper                          66               20               1     0       0       30                2992
Dark Shadows                    79               29               0     0       1       150               3755
Snow White and the Huntsmen     155              56               0     1       0       170               3773
Dredd                           13               6                1     0       0       35                2506
Step up Revolution              35               11               0     1       0       33                2567
Silver Linings Playbook         132              0.4              1     0       0       21                16
Wreck-It Ralph                  189              49               1     0       0       165               3752
Cloud Atlas                     27               9                1     0       0       102               2008
Les Miserables                  148              28               0     0       0       61                2808
Cabin in the Woods              42               14               0     0       1       30                2811
Magic Mike                      113              39               0     1       0       7                 2930
Lincoln                         182              0.9              1     0       0       65                11
Jack Reacher                    80               15               0     0       0       60                3352
Flight                          93               4                1     0       0       31                1884
Savages                         47               16               0     1       0       45                2628
End of Watch                    41               13               1     0       0       7                 2730
Hotel Transylvania              148              42               1     0       0       85                3349
Expendables 2                   85               28               0     1       0       92                3316
LOL                             0.04             0.04             0     0       1       11                105
American Reunion                56               21               0     0       1       50                3192
Total Recall                    58               25               0     1       0       125               3601
Abraham Lincoln Vampire Slayer  37               16               0     1       0       69                3108
Red Dawn                        44               14               1     0       0       65                2725
Project X                       54               21               0     0       1       12                3055
Battleship                      65               25               0     0       1       209               3690
Chronicle                       64               22               0     0       0       12                2907
Here Comes the Boom             45               11               1     0       0       42                3014
The Watch                       34               12               0     1       0       68                3168
The Chernobyl Diaries           18               7                0     0       1       1                 2433
Alex Cross                      25               11               1     0       0       35                2339
Taken 2                         139              49               1     0       0       45                3661
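The final bivariate model can be refit directly from the data set. The sketch below is a plain least-squares fit of total USA gross on estimated budget in pure Python, offered only as a cross-check of the reported intercept and slope and not as the procedure used in the report. The list `DATA` holds (estimated budget, total USA gross) pairs transcribed from the table in row order, and `fit_line` is a hypothetical helper.

```python
# (Estimated Budget, Total USA Gross) in millions, transcribed in data set row order.
DATA = [
    (220, 623), (200, 304), (250, 448), (180, 303), (95, 161), (120, 292),
    (230, 262), (145, 216), (225, 179), (78, 408), (35, 67), (44.5, 136),
    (50, 218), (42, 138), (51, 126), (65, 59), (85, 126), (125, 113),
    (100, 162), (145, 103), (5, 53), (30, 66), (150, 79), (170, 155),
    (35, 13), (33, 35), (21, 132), (165, 189), (102, 27), (61, 148),
    (30, 42), (7, 113), (65, 182), (60, 80), (31, 93), (45, 47),
    (7, 41), (85, 148), (92, 85), (11, 0.04), (50, 56), (125, 58),
    (69, 37), (65, 44), (12, 54), (209, 65), (12, 64), (42, 45),
    (68, 34), (1, 18), (35, 25), (45, 139),
]


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(DATA)
print(f"Total Gross = {intercept:.2f} + {slope:.2f}(Estimated Budget)")
```

The same helper could be pointed at opening weekend or screens by swapping the first element of each pair.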