Domestic Box Office Success (1980-2016)
I. Introduction
The movie industry has been developing for more than a century, from the late 19th
century to what it is today in the 21st century. It is a pivotal part of the entertainment
industry and has brought enjoyment to many people. Hundreds of films are released each
year, and they differ greatly in their success. With this in mind, we decided to research
what makes a film more or less successful. The purpose of this paper is to measure a film’s
success in the U.S. domestic market and to determine whether certain variables make a
movie more or less successful. We estimate an Ordinary Least Squares regression to explain
monetary success using variables such as budget, genre, and rating. Do observable factors
such as critical ratings, genre, or MPAA rating affect the success of one film relative to
another? This question is the foundation of the research project, and we expand on it as
the project moves forward.
II. Literature Review
Although our project was built almost entirely on intuition and our own knowledge of
the movie industry, we looked for existing work by other economists on what determines
success in the film industry. We found a few articles and research papers, and we base our
work on a paper that attempts to answer nearly the same question, “Examining Success in
the Motion Picture Industry” by Pat Topf (2009). Topf uses total revenue as the dependent
variable, with the explanatory variables Production Costs, Star Power, Age-Appropriate
rating as a dummy variable, Genre, Sequel, Summer/Winter Release, Holiday Release, and an
interaction variable between advertising costs and professional review scores. We use this
paper as our basis. Our model differs from Topf’s in that we use a few different variables
while pursuing the same goal of explaining the success of a film in the motion picture
industry.
III. Theoretical Model
We adjusted all monetary data for ticket price inflation to give a more accurate
representation of each movie’s impact at the box office. The dependent variable is
Domestic Gross in 2016 dollars. We test whether it is affected by: the production budget,
the maximum number of theaters in which the film was shown, the number of weeks it was in
theaters, whether or not it is a sequel, the IMDB rating, the Rotten Tomatoes rating, the
Motion Picture Association of America (MPAA) film rating (G, PG, PG-13, R, or Unrated),
the quarter of the year in which it was released, and the genres under which it is listed
on Box Office Mojo.
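The ticket price adjustment described above can be sketched as follows. The paper does not state its exact formula, so this assumes a common approach: scale the nominal gross by the ratio of the 2016 average ticket price to the average ticket price in the release year. The function name and the prices are hypothetical placeholders, not real data.

```python
# Hypothetical average ticket prices in USD; placeholders for illustration only.
AVG_TICKET_PRICE = {1980: 2.50, 2000: 5.00, 2016: 8.00}

def adjust_gross(nominal_gross, release_year, base_year=2016, prices=AVG_TICKET_PRICE):
    """Scale a nominal gross to base-year dollars by the ticket price ratio."""
    return nominal_gross * prices[base_year] / prices[release_year]

# A film that grossed 100 (million) in 1980, expressed in 2016 ticket prices.
print(adjust_gross(100.0, 1980))
```

Under this assumption the adjustment amounts to valuing the estimated number of tickets sold at 2016 prices, which keeps older films comparable with recent ones.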
IV. Data
The dataset that we analyzed consists of the top 102 films by domestic box office gross
released in the United States from 1980 to 2016. We restricted the sample to films released
from 1980 onward because we felt that the production budget has one of the largest impacts
on domestic gross, and this choice gives us complete data for every film in the sample.
Because the sample consists of the 102 top-grossing movies in that time frame, the results
are not representative of every movie released and cannot be read as a guarantee that a
movie will earn a given amount from certain variables. Rather, they describe which
characteristics of the most successful movies are associated with more or less success.
All of the data are available from three websites that are among the most popular data
sources in the movie industry: the Internet Movie Database, Rotten Tomatoes, and Box
Office Mojo.
Our dependent variable is the domestic gross of a film, in millions of dollars. This
variable is adjusted for inflation and is denoted (Domestic Gross). It is the revenue that
a film grossed during its theatrical release.
Our first independent variable, which we consider the most important, is the production
budget, denoted (Production). It is the amount of money that the directors and producers
of a film are given by their production company to make the film. Our second variable,
(Theaters), is the maximum number of theaters in which the movie was shown during its
theatrical release. (Weeks) is the number of weeks that the movie was in theaters. We use
the Internet Movie Database rating, the average user review on a scale of 1-10, denoted
(IMDB), and the Rotten Tomatoes critics’ score, on a scale of 1-100, denoted (RoT).
Our data include several dummy variables, each coded 1 if the film falls in the category
and 0 otherwise. The first is (ReRelease), which indicates whether or not the movie was
re-released in theaters at any time, which we believe is a pivotal factor in how much
revenue a movie can earn. The second is (Sequel), which may give some movies an edge in
the market because they are part of a series. We code this variable as 1 only if the film
is truly a sequel to a previous film. We also use a set of dummy variables for the rating
that the Motion Picture Association of America gives a film, known as the MPAA rating,
which ranges from G to NC-17. We removed NC-17 and Unrated because no films in our sample
carried either rating. The last set of dummy variables indicates the quarter in which the
movie was released. We originally wanted to measure whether the movie was released in the
summer or winter months, as we felt that would be a good indicator (Topf, 2009). However,
many movies were released not in those specific seasons but two or three weeks before
them, so we opted for the quarter system, as many companies report their financial
statements quarterly.
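The dummy coding described above can be sketched as follows. The function and argument names are hypothetical. G and the first quarter are treated as the omitted reference categories, which is an assumption based on the model equations listing only PG, PG-13, R and Q2 through Q4.

```python
def encode_dummies(mpaa, release_month, is_sequel, was_rereleased):
    """Return 0/1 indicators; G and Q1 are the omitted reference categories."""
    quarter = (release_month - 1) // 3 + 1  # months 1-3 -> Q1, ..., 10-12 -> Q4
    row = {"Sequel": int(is_sequel), "ReRelease": int(was_rereleased)}
    for rating in ("PG", "PG-13", "R"):
        row[rating] = int(mpaa == rating)
    for q in (2, 3, 4):
        row[f"Q{q}"] = int(quarter == q)
    return row

# A PG-13 sequel released in July that was never re-released.
print(encode_dummies("PG-13", 7, is_sequel=True, was_rereleased=False))
```

Leaving one category out of each group avoids perfect collinearity with the constant term; each included dummy is then interpreted relative to the omitted category.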
V. Empirical Model
For our project we ran several models, making adjustments as needed, to find the best
feasible model given the data that was readily available to us. After much discussion as a
group, we settled on two models to analyze. Our preliminary model includes all of the
independent variables previously mentioned, while our adjusted model omits the variables
Sequel, IMDB Rating, and Rotten Tomatoes Rating.
Preliminary Model:
Domestic Gross = β0 + β1(Production) + β2(Theaters) + β3(Weeks) + β4(IMDB) + β5(RoT) +
β6(Sequel) + β7(PG) + β8(PG-13) + β9(R) + β10(UR) + β11(Q2) + β12(Q3) + β13(Q4) + ε
After further consideration, we omitted some variables from our original model to make an
adjusted model. This second model omits Sequel, IMDB rating, and Rotten Tomatoes rating,
as these were statistically insignificant at the 5% level in our preliminary testing, as
seen in table 1.3.
Adjusted Model:
Domestic Gross = β0 + β1(Production) + β2(Theaters) + β3(Weeks) + β4(ReRelease) +
β5(PG) + β6(PG-13) + β7(R) + β8(Q2) + β9(Q3) + β10(Q4) + ε
We expect the adjusted model to perform better after omitting variables that we
previously thought were important but that were not statistically significant in
explaining success in the motion picture industry.
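As a minimal sketch of how a model of this form is estimated, the code below fits an OLS regression by solving the normal equations with the Python standard library only. The data are made up and generated exactly from known coefficients, the function name `ols` is hypothetical, and only three of the paper's regressors are used to keep the example short. This is not the software or the data used in the paper.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination.

    X is a list of rows without the constant; an intercept column is added here.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Made-up rows: (Production, Weeks, ReRelease). y is generated exactly from
# y = 10 + 2*Production + 3*Weeks + 5*ReRelease, so OLS should recover it.
X = [(50, 10, 0), (100, 20, 1), (150, 15, 0), (200, 30, 1), (80, 12, 1), (120, 25, 0)]
y = [10 + 2 * p + 3 * w + 5 * d for p, w, d in X]
beta = ols(X, y)
print([round(b, 6) for b in beta])
```

Because the made-up response has no error term, the estimates match the generating coefficients up to floating-point error. With real data a statistics package would also be needed to report standard errors, t-values and the F-statistic.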
VI. Empirical Results
After running a regression on our first model, we obtained an R-squared value of
0.452 and an adjusted R-squared value of 0.370, as seen in table 1.1. These measure the
overall fit of our model. An adjusted R-squared of 0.370 does not indicate a strong model,
so we ran further tests to learn more about it. With this model it is hard to explain the
dependent variable; a higher adjusted R-squared would give a better answer to the question
we are trying to answer. Although we were not content with the model, we ran an F-test of
overall significance with H0: B1 = B2 = B3 = B4 = B5 = B6 = B7 = B8 = B9 = B10 = B11 = B12 =
B13 = 0 and 13 degrees of freedom in the numerator. The critical F value for our numerator
and denominator degrees of freedom was 1.83, and the F-value was 5.513, as shown in table
1.2. As 5.513 is clearly greater than the critical F, we rejected the null. We also ran a
t-test on each of the variables to assess whether the true value of its slope differs from
zero. The critical t value was 1.984, and we rejected the null for every variable whose
t-value exceeded the critical t value in absolute value. There were no signs of
multicollinearity in our model, as the Variance Inflation Factor was less than 5 for the
majority of our variables that were not dummy variables, so we concluded that there is no
multicollinearity in our model for our given dataset.
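The reported fit statistics can be cross-checked against each other with the standard formulas for adjusted R-squared and the overall F-statistic. The sketch below assumes n = 101 observations, which is not stated in the text; it is inferred from the total degrees of freedom of 100 in the White Test ANOVA table in the appendix. The function names are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k slope coefficients."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def overall_f(r2, n, k):
    """F statistic for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

n, k = 101, 13  # n is an assumption; k = 13 slopes in the preliminary model
print(round(adjusted_r2(0.452, n, k), 3))
print(round(overall_f(0.452, n, k), 2))
```

With the rounded R-squared of 0.452, these formulas give an adjusted R-squared of about 0.370 and an F of about 5.52, close to the reported 5.513. The small gap is consistent with R-squared having been rounded to three decimals.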
After running several models, we decided to proceed with the adjusted model, as it
seemed the best feasible model given the data that we acquired. This is due to the
significance of every variable that we kept: Production, Theaters, Weeks, Re-release,
MPAA Ratings, and Quarters. The significance level (p-value) of every variable was below
8.5%, which we believe is the best we can achieve without finding the omitted variables
that may be missing from our model. The F-value increased from 5.513 in our preliminary
model to 6.950 in our adjusted model, so it is unlikely that all of the slope coefficients
are jointly equal to zero, and the overall fit of our model is marginally better. In the
t-tests for the adjusted model, the majority of our variables had t-values greater than
the critical t, so we rejected the null for them. Our F-value also exceeds the critical F
value of 1.94, so we reject the null of the F-test, which shows that our adjusted model is
marginally more significant than our preliminary model.
VII. Empirical Testing
One of the “diseases” that we tested for was multicollinearity, which was
straightforward, as we used the Variance Inflation Factors (VIFs). Although we decided
to use the adjusted model, we computed the VIFs for both models to get a general idea of
how they change between the two. Most of the variables did not have VIFs above 5.0. The
exceptions were the dummy variables, which we set aside because they are binary variables
coded 0 or 1 according to whether or not a film meets the category. We therefore
concluded that we do not have multicollinearity among the variables in our model. The
VIF values are in tables 1.3 and 2.3, and the result is consistent for both models: there
are no signs of multicollinearity, with the exception of the dummy variables.
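For reference, the VIF of a regressor is 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing that variable on all of the other independent variables. The short sketch below, with a hypothetical function name and illustrative R_j^2 values, shows what the cutoff of 5 used above corresponds to.

```python
def vif(r2_aux):
    """VIF of a regressor given the R-squared from regressing it on the others."""
    return 1.0 / (1.0 - r2_aux)

# Illustrative auxiliary R-squared values, not taken from the paper's tables.
for r2_aux in (0.0, 0.5, 0.8, 0.9):
    print(r2_aux, round(vif(r2_aux), 2))
```

A VIF of 5 therefore means that 80% of the variation in that regressor is explained by the other regressors.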
After checking for multicollinearity, we checked whether our model has
heteroscedasticity. A model should satisfy all of the classical assumptions, one of
which is that the variance of the error term is constant (homoscedastic). After plotting
the residuals against the standardized predicted values in table 3.1, and the frequency
of the residuals in a histogram in table 3.2, we suspected that we may have
heteroscedasticity, so we decided to test for it and, if necessary, correct for it. We
used the White Test, because we do not know the proportionality factor Z that the Park
Test requires.
To run the White Test, we regressed the squared residuals on all of the independent
variables from the adjusted model, the squares of the independent variables, and the
products of each pair of independent variables, covering all possible combinations. We did
not include the dummy variables when squaring or multiplying the independent variables.
Running this regression gives tables 3.3 and 3.4. The critical chi-square value is 26.30
at the 5% significance level with 16 degrees of freedom, the number of explanatory
variables in the White Test regression. We obtained a chi-square statistic of 23.5 by
multiplying the sample size by the adjusted R-squared from the White Test regression.
Since 23.5 is not greater than 26.3, we do not reject the null of homoscedasticity, so
we conclude that there is no heteroscedasticity present in our model.
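The degrees of freedom and the chi-square comparison above can be sketched as follows. The variable names are hypothetical labels for the adjusted model's regressors. The tail probability uses the closed form that holds for even degrees of freedom, so only the standard library is needed. The textbook White statistic is n times the unadjusted R-squared of the auxiliary regression, whereas the paper reports using the adjusted R-squared, so the sketch takes the statistic as an input and does not fix either choice.

```python
import math
from itertools import combinations

continuous = ["Production", "Theaters", "Weeks"]
dummies = ["ReRelease", "PG", "PG-13", "R", "Q2", "Q3", "Q4"]

# Auxiliary regressors: levels of every variable, plus squares and pairwise
# products of the continuous variables only (dummies enter in levels).
terms = (continuous + dummies
         + [f"{v}^2" for v in continuous]
         + [f"{a}*{b}" for a, b in combinations(continuous, 2)])
df = len(terms)

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable; closed form for even df."""
    if df % 2:
        raise ValueError("the closed form used here needs an even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

print(df)                            # number of auxiliary regressors
print(round(chi2_sf(26.30, df), 3))  # tail area at the paper's critical value
print(round(chi2_sf(23.5, df), 3))   # tail area at the paper's reported statistic
```

Counting the terms reproduces the 16 degrees of freedom quoted above: 10 levels, 3 squares and 3 cross products.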
VIII. Conclusion
The results from our research suggest that Production Budget, MPAA Rating, the
number of weeks the movie was in theaters, and whether or not the movie was
re-released each have a positive and statistically significant effect on the domestic
gross revenue of a film.
We have also concluded that the quarter in which the movie was released is
inconclusive in our model. It could be replaced by whether or not the movie was released
on or near a holiday to give our model a better fit. Ratings from websites such as the
Internet Movie Database and Rotten Tomatoes, as widely popular as they are in the film
industry, are insignificant in our model, as is whether or not a movie was a sequel, so
we omitted those variables from our original model.
After testing for both multicollinearity and heteroscedasticity, the implication of our
findings is that although our model may not be extremely strong, there is still some
significance to it. Production budget is probably the most important factor in determining
how much a movie will make at the box office. Other variables matter as well, such as the
MPAA rating or how long the movie was in theaters, although it can be argued that the
length of time a movie stays in theaters is a result of the movie making a large amount
of money rather than a factor in determining how much it will make. There are many
factors that we did not take into account. The model could be extended by adding more
movies to the data set, since a larger sample than the top 102 movies that we chose for
this project would be more representative, and by adding other dummy variables, such as
which production company helped the movie acquire funds and advertising, or whether a
movie was released on a holiday.
Consequently, we can conclude that the film industry is very volatile and risky, and
certain variables can substantially change the domestic box office success of a film,
although production companies are not completely transparent with data such as
advertising budgets. The movie industry is extremely complex, and its success cannot be
narrowed down to a few variables. Rather, it is determined by the large numbers of
individuals willing and able to see a movie, which is hard to quantify statistically,
especially whether they will see the movie in theaters or not.
Appendix
Table 1.1
Table 1.2
Table 1.3
Table 2.1
Table 2.2
Table 2.3
Table 3.1
Table 3.2
Table 3.3
Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .592 (a)   .351       .227                28238.27525
a. Predictors: (Constant), ReleaseWeeks, Quarter 2, PG, Re-Release, R, Quarter 3,
ProductionSquared, WeeksSquared, PG-13, Quarter 4, ReleaseSquared,
ProductionWeeks, # of Weeks, Production Budget (in millions of dollars),
Widest Release (theatres), ProductionRelease
b. Dependent Variable: ResidualSquared
Table 3.4
ANOVA (a)
Model           Sum of Squares      df    Mean Square       F       Sig.
1  Regression   36222956190.000     16    2263934762.000    2.839   .001 (b)
   Residual     66981615870.000     84    797400188.900
   Total        103204572100.000    100
a. Dependent Variable: ResidualSquared
b. Predictors: (Constant), ReleaseWeeks, Quarter 2, PG, Re-Release, R, Quarter 3,
ProductionSquared, WeeksSquared, PG-13, Quarter 4, ReleaseSquared,
ProductionWeeks, # of Weeks, Production Budget (in millions of dollars), Widest
Release (theatres), ProductionRelease
Works Cited
“Box Office Mojo” BoxOfficeMojo. 5/2/2016. http://www.boxofficemojo.com/
“The Internet Movie Database” IMDB. 1980-2016. http://www.imdb.com/
“Rotten Tomatoes” RottenTomatoes. 5/2/2016. http://www.rottentomatoes.com/
Topf, P. (2009). Examining Success in the Motion Picture Industry. Retrieved May 2, 2016, from
https://www.iwu.edu/economics/PPE18/8Topf.pdf

More Related Content

Similar to Econometric Paper Final Draft

movieRecommendation_FinalReport
movieRecommendation_FinalReportmovieRecommendation_FinalReport
movieRecommendation_FinalReportSohini Sarkar
 
Predicting movie success from search
Predicting movie success from searchPredicting movie success from search
Predicting movie success from searchijaia
 
Media evaluation question 3
Media evaluation question 3Media evaluation question 3
Media evaluation question 3cam_burden
 
Ibm advanced analytics platform for m&e
Ibm advanced analytics platform for m&eIbm advanced analytics platform for m&e
Ibm advanced analytics platform for m&eUnited Partners
 
Media A2 Evaluation
Media A2 EvaluationMedia A2 Evaluation
Media A2 Evaluationadamsims1992
 
What kind of media institution might distribute your media product and why?
What kind of media institution might distribute your media product and why?What kind of media institution might distribute your media product and why?
What kind of media institution might distribute your media product and why?hemal17
 
What kind of media institution might distribute your media product and why?
What kind of media institution might distribute your media product and why?  What kind of media institution might distribute your media product and why?
What kind of media institution might distribute your media product and why? hemal17
 
What kind of media institution might distribute your media product and why?
What kind of media institution might distribute your media product and why? What kind of media institution might distribute your media product and why?
What kind of media institution might distribute your media product and why? hemal17
 
Movies Sample PresentationSCM 315 – Business Decision Models.docx
Movies Sample PresentationSCM 315 – Business Decision Models.docxMovies Sample PresentationSCM 315 – Business Decision Models.docx
Movies Sample PresentationSCM 315 – Business Decision Models.docxgilpinleeanna
 
Overview of IMDB and Netflix Data
Overview of IMDB and Netflix DataOverview of IMDB and Netflix Data
Overview of IMDB and Netflix DataUrvashiChoudhary11
 
Comparision between digital platforms.pdf
Comparision between digital platforms.pdfComparision between digital platforms.pdf
Comparision between digital platforms.pdfasphunk999
 
movie_notebook.pdf
movie_notebook.pdfmovie_notebook.pdf
movie_notebook.pdfpinstechwork
 
Genre market analysis
Genre market analysisGenre market analysis
Genre market analysisAaronGemmell
 
Short film complete pro forma
Short film complete pro formaShort film complete pro forma
Short film complete pro formaJack Ward
 

Similar to Econometric Paper Final Draft (20)

Movies and Market Share
Movies and Market ShareMovies and Market Share
Movies and Market Share
 
GROUP 8
GROUP 8GROUP 8
GROUP 8
 
movieRecommendation_FinalReport
movieRecommendation_FinalReportmovieRecommendation_FinalReport
movieRecommendation_FinalReport
 
Predicting movie success from search
Predicting movie success from searchPredicting movie success from search
Predicting movie success from search
 
Media evaluation question 3
Media evaluation question 3Media evaluation question 3
Media evaluation question 3
 
Ibm advanced analytics platform for m&e
Ibm advanced analytics platform for m&eIbm advanced analytics platform for m&e
Ibm advanced analytics platform for m&e
 
Media A2 Evaluation
Media A2 EvaluationMedia A2 Evaluation
Media A2 Evaluation
 
What kind of media institution might distribute your media product and why?
What kind of media institution might distribute your media product and why?What kind of media institution might distribute your media product and why?
What kind of media institution might distribute your media product and why?
 
What kind of media institution might distribute your media product and why?
What kind of media institution might distribute your media product and why?  What kind of media institution might distribute your media product and why?
What kind of media institution might distribute your media product and why?
 
What kind of media institution might distribute your media product and why?
What kind of media institution might distribute your media product and why? What kind of media institution might distribute your media product and why?
What kind of media institution might distribute your media product and why?
 
Movies Sample PresentationSCM 315 – Business Decision Models.docx
Movies Sample PresentationSCM 315 – Business Decision Models.docxMovies Sample PresentationSCM 315 – Business Decision Models.docx
Movies Sample PresentationSCM 315 – Business Decision Models.docx
 
Overview of IMDB and Netflix Data
Overview of IMDB and Netflix DataOverview of IMDB and Netflix Data
Overview of IMDB and Netflix Data
 
Comparision between digital platforms.pdf
Comparision between digital platforms.pdfComparision between digital platforms.pdf
Comparision between digital platforms.pdf
 
Evaluation
EvaluationEvaluation
Evaluation
 
movie_notebook.pdf
movie_notebook.pdfmovie_notebook.pdf
movie_notebook.pdf
 
IMC Project
IMC ProjectIMC Project
IMC Project
 
Genre market analysis
Genre market analysisGenre market analysis
Genre market analysis
 
Institutions
InstitutionsInstitutions
Institutions
 
Short film complete pro forma
Short film complete pro formaShort film complete pro forma
Short film complete pro forma
 
Evaulation
EvaulationEvaulation
Evaulation
 

Econometric Paper Final Draft

  • 1. Domestic Box Office Success (1980-2016) I. Introduction The movie industry has been developing for hundreds of years, especially from the late 19th century until what it is today in the 21st century. It is a pivotal industry in entertainment today, which has brought enjoyment for many people. There are hundreds of films being released each year, which greatly differ in successes. From this knowledge we decided to research more into what makes a filmmore or less successful. The purpose is to determine a film’s success in U.S. domestic markets and to find various data that would determine if certain variables will or will not make a movie more or less successful. We will estimate an Ordinary Least Squares regression to attempt to explain the monetary success based off of variables such as budget, genre, rating, etc. Will the factors that we see such as rating, genre or MPAA affect the success of a filmrelative to another? The foundation of the research project is based off of this question, where it will be expanded the further we move forward with this project. II. Literature Review Although our project was almost entirely created from intuition and self-knowledge of what we know of the movie industry we decided to find out if there were any other tests by other economists to determine the success in the film industry. From that, we found a few articles and research papers, but we will be using our basis off of a paper that attempts to answer nearly the same question, called “Examining Success in the Motion Picture Industry” by Pat Topf. Topf has decided to use Total revenue as the basis for the project, with the variables of Production Costs, Star Power, Age-Appropriate rating as a dummy variable,
  • 2. Genre, Sequel, Summer/Winter Release, Holiday Release, as well as an interaction variable between advertising costs and professional review scores. From this research we have decided to use it as our basis and how our model will differ from Topf’s model as we will be using a few different variables, while trying to achieve the same result in determining the success of a filmin the motion picture industry. III. Theoretical Model We have decided to adjust all of the data for ticket price inflation to give a more accurate representation of the impact of the movie in the box office through numerous variables. The dependent variable is Domestic Gross adjusted in 2016 dollars, which we will be determine if it is affected by: the production budget, the maximum number of theaters that it is released in, the maximum number of weeks that it was released in a theater, whether or not it is a sequel or not, the IMDB rating, Rotten Tomatoes Rating, the Motion Picture Association of America (MPAA) filmrating system of G, PG, PG-13, R and Unrated. As well as the quarter of the year that it was released in, as well as all of the genres that it is considered from Box Office Mojo. IV. Data The dataset that we analyzed is the top 102 films in box office gross, released in the United States from the year 1980 to 2016. The data was chosen to be after the year 1980 as we mainly looked pre-1980, as we felt that the production budget has one of the largest impacts on the domestic gross of the movies. The choice of data, ensures accuracy for our research, as it would give us complete data for the entirety of our sample size. We have decided to use the top 102 grossing movies of all time from the given time frame as our
  • 3. sample size, where it would not be an accurate representation of any movie being released to make a guaranteed x amount from certain variables, but rather the most successful movies have certain variables in them that would or would not have made them more or less successful. All of the data will be available from three websites that we have chosen that are considered one of the most popular data websites in the movie industry. We will be using the Internet Movie Database, Rotten Tomatoes, and Box Office Mojo. For our dependent model we will be determining the domestic gross of a film, in millions of dollars. This variable is adjusted for inflation and will be represented as (Domestic Gross). It is the revenue that a film has grossed during its theatrical release. For our first independent variable, we have felt that this is our most important variable which is the production budget, which will be denoted as (Production), as it is the amount of money that the directors and producers of a filmis given in order to produce a film by their production company. For our second variable, we have decided to use (Theaters), which will be the maximum number of theatres that the movie was widely released in during its theatrical release. While (Weeks) will be the number of weeks that the given movie in our population was in theaters at any given time. We have decided to use the ranking from the Internet Movie Database from a scale of 1-10 based off of the average review by an IMDB user, which is going to represented as (IMDB), which we will do the same for Rotten Tomatoes critics, but instead being rated from a scale of 1-100, which will be denoted as (RoT). We have quite a few dummy variables in our data, which will be represented by a 1 if it fulfills the category, and 0 if otherwise not fulfilling the specific category. For these dummy
  • 4. variables we will be using (Release), which determines whether or not the movie was re- released in theatre at any time, which we believe is a pivotal factor in how much revenue that a movie can potentially earn. As well as (Sequel), which may give some movies an edge in the market as it is part of a film in a series. We will only be using this variable if it is truly a sequel to a previous film. We will also be using a few dummy variables, which will be the rating that the Motion Picture Association of America gives a film, also known as the MPAA rating which scales from G to NC-17, while we chose to remove NC-17 and Unrated as we have found that no films in our sample size were considered either of those. The last dummy variable that we will be using is which quarter that the movie was released in. We originally wanted to measure whether or not it was released in the summer or winter months as we felt that it would be a good indicator (Topf, 2009). But, we have decided not to as many movies were not released in those specific seasons, but rather two or three weeks before, so we have decided to just opt with using the quarter systemas many companies do quarterly releases on their financial statements. V. Empirical Model For our project we have decided to run a few models making adjustments as needed in order to try to find the best feasible model given the data that was readily available for us. After much discussion as a group we have decided to come up with two models that will be analyzed, with our preliminary model involving all of the independent variables being previously mention. While on the other had our Adjusted model omitting the variables: Sequel, IMDB Rating, as well as Rotten Tomatoes Rating.
  • 5. Preliminary Model: Domestic Gross = β0 + β1(Production) + β2(Theaters) + β3(Weeks) + β4(IDBM) + β5(RoT) + β6(Sequel) + β7(PG) + β8(PG-13) + β9(R) + β10(UR) + β11(Q2) + β12(Q3) + β13(Q4) + ε After some theory we have decided to omit some variables from our original model to make an adjusted model. Our second model that we examining is the model where we will omit Sequel, IMDB rating, as well as Rotten Tomatoes Rating, as we saw these were insignificant at 5% from our preliminary testing as seen in table 1.3, where we would decide whether or not these variables were statistically significant or not. Adjusted Model: Domestic Gross = β0 + β1(Production) + β2(Theaters) + β3(Weeks) + β4(ReRelease) + β5(PG) + β6(PG-13) + β7(R) + β8(Q2) + β9(Q3) + β10(Q4) + ε The adjusted model is expected to receive better results after omitting variables that we previously though were important, but statistically they would not be significant in our hypothesis on the determination of success in the motion picture industry. VI. Empirical Results After running a regression on our first model, we received a R-squared value of 0.452 with an adjusted R-squared value of 0.370 as seen in table 1.1. This shows the overall fitness of our model, and while it does not seem that strong of a model at a 0.370 value from the adjusted R-squared, we decided to do other tests to find out more about our model. From this model, it is hard to determine the dependent variable, as we believe that a higher adjusted R-squared would give a better explanation of the hypothesis that we are attempting to answer. In this model we decided to do a F-test, although we were not
  • 6. content with the model with the H0: B1 = B2 = B3 = B4 = B5 = B6 = B7 = B9 = B10 = B11 = B12 = B13 = 0 and 13 degrees of freedom, we found the Critical-F value of 1.83 from our degrees of freedom in our numerator and our denominator, with the F-value being 5.513, where all of the data can be found on table 1.2. As 5.513 is clearly greater than our critical F, we rejected the null. Since our preliminary model should still consist of running tests, we ran a T-test on all of the variables, where we it would help assess the chance of the slope’s true value. The Critical-T value was 1.984 from the formula, where we rejected the null on whatever variable was greater than the Critical-T value. There were no signs of multicollinearity in our model as the Variance Inflation Factor was less than 5 for the majority of our variables that were not dummy variables, so we concluded that there is no multicollinearity in our model from our given dataset. After a few of our models that were ran, we have decided to proceed with the adjusted model, as it seemed as the best feasible model based off of our data that we have acquired. This is due to the significance of every variable that we have decided to keep, which were: Production, Theatres, Weeks, Re-release, MPAA Ratings, and Quarters. The significance of every variable was less than 8.5%, which is the closest we believe that we can achieve, without finding the omitted variables which may be present in our model. After the improvement of our F-Value increasing from 5.513 to 6.950 from our preliminary model to our adjusted model, we have decided that there is not a high chance that the variance of the variables is equal to each other, which makes the overall fitness of our model marginally better. In running our T-test for the adjusted model, the majority of our variables T-values were greater than the Critical-T, so they passed the test with flying colors
  • 7. while rejecting the null. Our F-value also exceeded the critical F-value of 1.94, so we rejected the null there as well, which shows that the adjusted model is marginally more significant than our preliminary model.
VII. Empirical Testing
One of the “diseases” that we tested for was multicollinearity, which was straightforward, as we simply used the Variance Inflation Factors (VIFs). Although we decided to use the adjusted model, we computed VIFs for both models to get a general idea of how they change between the two. Most of the variables did not have VIFs above 5.0. The exceptions were the dummy variables, which we set aside because they are binary indicators coded 0 or 1 according to whether an observation meets the given condition. We therefore concluded that we do not have multicollinearity among the variables in our model. The VIF values are in tables 1.3 and 2.3 and are consistent across both models: there are no signs of multicollinearity, with the exception of the dummy variables.
After checking for multicollinearity, we checked whether our model has heteroscedasticity, since a model should satisfy all of the classical assumptions, one of which is that the variance of the error term is constant (homoscedastic). After plotting the residuals against the standardized predicted values in table 3.1, and the frequency of the residuals in a histogram in table 3.2, we saw that we may have heteroscedasticity, so we decided to test for it and attempt to correct for it. We tested for it using the White Test: because we do not know the Z factor required by the Park Test, we felt that the White Test would be the best choice.
  • 8. To run the White Test, we regressed the squared residuals on all of the independent variables from the adjusted model, the squares of those independent variables, and the products of each pair of independent variables, covering all of the possible combinations, while leaving out the dummy variables when squaring or multiplying the independent variables against one another. Running the regression, we get tables 3.3 and 3.4. The critical chi-square value is 26.30 at a 5% significance level with 16 degrees of freedom, after accounting for the White Test variables. We obtained a chi-square test statistic of 23.5 by multiplying the sample size by the adjusted R-squared from the White Test regression. Since 23.5 is not greater than 26.30, we do not reject the null, so we find no evidence of heteroscedasticity in our model.
VIII. Conclusion
The results from our research suggest that Production Budget, MPAA Rating, the number of weeks the movie was in theaters, and whether or not the movie was re-released have a positive and statistically significant effect on the domestic gross revenue of a film. We have also concluded that the quarter in which the movie was released is inconclusive in our model; it could be replaced by whether or not the movie was released on or near a holiday to give our model a better fit. Ratings by internet websites such as the Internet Movie Database and Rotten Tomatoes, as widely popular as they
  • 9. are in the film industry, are insignificant in our model, as is whether or not a movie was a sequel, so we decided to omit those variables from our original model. After testing for both multicollinearity and heteroscedasticity, the implication of our findings is that although our model may not be extremely strong, there is still some significance to it. Production budget is probably the most important factor in determining how much a movie will make at the box office. There are numerous other variables, such as the MPAA rating or how long the movie was in theatres, although it can be argued that the length of time a movie stays in theatres is a result of the movie making a large amount of money and not a factor in determining how much it will make.
Our findings leave many variables that we did not include and that could be added. More movies could be added to the data set to get a larger sample size and a more accurate representation, instead of only the top 102 movies that we chose for this study. Other dummy variables could also be added, such as which production company helped the movie acquire funds and advertising, or whether a movie was released on a holiday.
Consequently, we can conclude that the film industry is very volatile and risky, and that certain variables can substantially change the domestic box office success of a film, although production companies are not completely transparent about data such as advertising budgets. The movie industry is extremely complex, and its success cannot be narrowed down to a few variables. Rather, it is determined by large numbers of individuals willing and able to see a movie, which is hard to quantify statistically, especially whether they will see the movie in theatres or not.
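The White Test arithmetic described in Section VII can be checked with a short Python sketch. This is an illustration only, not the authors' own procedure: the function names are hypothetical, and the sample size of 101 is an assumption inferred from the total of 100 degrees of freedom in the Table 3.4 ANOVA output (the text mentions 102 movies). The R-squared values and the critical chi-square of 26.30 are taken from the paper. The textbook form of White's statistic multiplies the sample size by the unadjusted R-squared of the auxiliary regression, so the sketch computes that version alongside the adjusted R-squared version used in the text. The two can lead to different decisions.

```python
def white_lm_statistic(n_obs: int, r_squared: float) -> float:
    """Return n * R^2 from the auxiliary regression of squared residuals."""
    return n_obs * r_squared


def rejects_homoscedasticity(statistic: float, critical_value: float) -> bool:
    """Reject the null of constant error variance when the statistic exceeds the critical value."""
    return statistic > critical_value


# Values reported in Table 3.4 and Section VII of the paper.
N_OBS = 101          # assumption: total df of 100 in the ANOVA table, plus 1
R2_AUX = 0.351       # R Square of the White Test regression
ADJ_R2_AUX = 0.227   # Adjusted R Square of the White Test regression
CHI2_CRIT = 26.30    # 5% critical chi-square with 16 degrees of freedom

# Textbook White statistic (unadjusted R-squared).
lm_unadjusted = white_lm_statistic(N_OBS, R2_AUX)
# Variant described in the text (adjusted R-squared).
lm_adjusted = white_lm_statistic(N_OBS, ADJ_R2_AUX)

for label, stat in [("unadjusted R^2", lm_unadjusted), ("adjusted R^2", lm_adjusted)]:
    decision = "reject" if rejects_homoscedasticity(stat, CHI2_CRIT) else "do not reject"
    print(f"n * {label}: {stat:.2f} vs critical {CHI2_CRIT:.2f} -> {decision} H0")
```

Changing `N_OBS` to 102 shifts both statistics only slightly, so the comparison against 26.30 is not sensitive to that assumption.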
  • 13. Table 3.2
Table 3.3
Table 3.4
Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .592 (a)   .351       .227                28238.27525
a. Predictors: (Constant), ReleaseWeeks, Quarter 2, PG, Re-Release, R, Quarter 3, ProductionSquared, WeeksSquared, PG-13, Quarter 4, ReleaseSquared, ProductionWeeks, # of Weeks, Production Budget (in millions of dollars), Widest Release (theatres), ProductionRelease
b. Dependent Variable: ResidualSquared
ANOVA (a)
Model          Sum of Squares      df    Mean Square       F       Sig.
1 Regression   36222956190.000     16    2263934762.000    2.839   .001 (b)
  Residual     66981615870.000     84    797400188.900
  Total        103204572100.000    100
a. Dependent Variable: ResidualSquared
b. Predictors: (Constant), ReleaseWeeks, Quarter 2, PG, Re-Release, R, Quarter 3, ProductionSquared, WeeksSquared, PG-13, Quarter 4, ReleaseSquared, ProductionWeeks, # of Weeks, Production Budget (in millions of dollars), Widest Release (theatres), ProductionRelease
  • 14. Works Cited
“Box Office Mojo” BoxOfficeMojo. 5/2/2016. http://www.boxofficemojo.com/
“The Internet Movie Database” IMDB. 1980-2016. http://www.imdb.com/
  • 15. “Rotten Tomatoes” RottenTomatoes. 5/2/2016. http://www.rottentomatoes.com/
Topf, P. (2009). Examining Success in the Motion Picture Industry. Retrieved May 2, 2016, from https://www.iwu.edu/economics/PPE18/8Topf.pdf