IMDB Dataset
Aaron McClellan, Management & Strategic Leadership, Business Analytics
Introduction
For our final project, I have chosen to analyze a movie dataset. The dataset lists over 5,000 movie titles,
each with several attributes to assist in the analysis. What I will be extracting from the dataset is the significance of
the attributes that result in a large gross revenue for a movie. The goal of analyzing this dataset is to figure out
which attributes are the most significant when determining the future success of a movie title before it is released. Critics
and human instinct, when it comes to movies, are sometimes unreliable. I want to be able to accurately predict which
attributes influence movie success based on several characteristics in specific areas such as Facebook and the IMDB site.
Background
Creating a predictive model for this dataset is not vital to human existence; however, it would be useful for some
movie-goers. This analysis pertains to the entertainment/movie industry. It can help producers, actors, actresses, directors,
film investors, and movie-goers determine how successful a proposed movie will be. Without predictive modeling,
there would only be gut decisions and personal preferences about how a movie will turn out. Not everyone thinks that a
certain actor or actress is amazing and, on that basis alone, that the entire movie is amazing. Putting it in terms of analytical
processing makes the prediction more stable and unbiased. This project would be significant to the group of
people mentioned previously because it provides an unbiased predictive dataset that can be used to estimate gross
revenue. Every producer and director believes their movie will be one of the greatest, and they will do everything in their
power to make it the greatest. However, the majority of the time, this turns out to be false. They can take this dataset and
implement it into their thought process when planning their movie. On the flip side, I often hear that people
will go see a movie and say, "I just wasted x amount of money to see that horrible film!" Movie-goers can use this
dataset to make the same predictions once the movie is announced with its primary and supporting actors/actresses. It
could possibly save movie-goers money when debating whether to go see a movie or not.
Goals
There are several goals that I wish to achieve with this dataset:
• Assist directors and producers in maximizing the potential revenue of a proposed film
• Save money, or spend money wisely, when debating whether to see a new film
• Gain practice in using multiple linear regression
• Develop more skill in pre-processing techniques such as data partitioning and handling missing data
• Learn more about the post-processing technique of sensitivity analysis
Literature Review
There are other people like me who have had the same idea of analyzing a movie database. One group of people
worked on the analysis of temporal multivariate networks derived from IMDB. They used methods such as (p,q)-core and
4-ring to identify subgraphs and short cycles [1]. Another group of individuals from Stony Brook University analyzed a movie
dataset using regression and k-nearest neighbor methods [2]. Another individual wanted to see how his movie preferences
correlated with Oscar-winning titles. He used a linear regression model for his analysis [3].
Methodology
I obtained my original dataset from the data science website Kaggle [4]. The original dataset contained 28 different variables.
The variables in the dataset were both categorical and numerical. When I first started working on this dataset, I
wanted to include the majority of the variables in my analysis; however, I ran into a problem when I
was trying to transform my categorical data into numerical data. XLMiner is a great software program that allows for this
type of transformation; however, my dataset contained numerous attributes that had more than 30 distinct categories. For
example, there were 30+ directors and 30+ actors/actresses. In Figure 1, you will find a sample of actors/actresses.
Figure 1 Figure 2
XLMiner has a limit of 30 distinct categories per categorical variable. Because of this, I was forced to do one of two things. The first
was to use the Reduce Categories handle of XLMiner for all the categorical data. The only negative of this is that it cuts out
a lot of data and forces it into a category. Knowing that this isn't what I wanted to become of my dataset, I had to take
the other route. The other route was to pick and choose which attributes I deemed acceptable to use in my analysis. So,
I did not choose attributes such as director name and actor/actress name. I will explain further in the pre-processing
portion of this report.
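
The 30-category cap described above can be checked before any dummy variables are created. The following is a minimal sketch in plain Python rather than XLMiner; the column names, the sample rows, and the helper `columns_over_cap` are hypothetical, and only the limit of 30 comes from this report.

```python
from collections import defaultdict

CATEGORY_CAP = 30  # limit on distinct categories reported for XLMiner


def columns_over_cap(rows, categorical_columns, cap=CATEGORY_CAP):
    """Return {column: distinct_count} for columns with more than `cap` categories."""
    distinct = defaultdict(set)
    for row in rows:
        for column in categorical_columns:
            value = row.get(column)
            if value not in (None, ""):  # ignore missing entries
                distinct[column].add(value)
    return {c: len(v) for c, v in distinct.items() if len(v) > cap}


# Hypothetical data: 40 different directors, 2 color categories.
rows = [
    {"director_name": f"Director {i}", "color": "Color" if i % 2 else "Black and White"}
    for i in range(40)
]
over_cap = columns_over_cap(rows, ["director_name", "color"])
print(over_cap)
```

Any column reported here would need either category reduction or exclusion, which are the two routes weighed above.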
Pre-processing
As stated before, I had to pick and choose which attributes to use in my modeling. There were a couple of attributes that
I thought were interesting and wanted to see if they were significant. They had to do with the number of likes on the social
media website Facebook. The majority of the attributes in my analysis had to do with this. In addition to not choosing some
attributes because of the categorical cap on creating dummies, I did not choose attributes that were really a "make-or-break"
attribute when it comes to the success of a film. The following attributes were eliminated from my analysis during
pre-processing: color, director name, actor 1 name, actor 2 name, actor 3 name, movie title, number of voted
users, face number in poster, plot keywords, movie IMDB link, number of users for reviews, language, country, content
rating, title year, budget, and aspect ratio. Figure 2 displays the attributes that I kept for the
analysis.
Once I determined the attributes to use, I started working with the data. I first noticed that there was a lot of
missing data in the dataset. I decided that missing data made the entire record unusable because, without that data, the
record is incomplete and would mess up my model. The records that had missing data would have negatively impacted
my model, so getting rid of them was my only option. I used the Missing Data handle feature of XLMiner and deleted
those records. After using this feature, the number of records in my dataset decreased from 5,043 to 3,879.
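
Deleting every record that has a missing value can be sketched outside XLMiner as follows. This is an illustration in plain Python, not the XLMiner feature itself; the helper `drop_incomplete`, the column names, and the sample values are all hypothetical.

```python
def drop_incomplete(rows, columns):
    """Keep only the rows in which every listed column has a value."""
    def is_missing(value):
        return value is None or (isinstance(value, str) and value.strip() == "")

    return [row for row in rows if not any(is_missing(row.get(c)) for c in columns)]


# Hypothetical records; None and "" stand in for blank spreadsheet cells.
records = [
    {"gross": 1_000_000, "imdb_score": 7.0, "director_facebook_likes": 0},
    {"gross": None, "imdb_score": 6.1, "director_facebook_likes": 500},
    {"gross": 2_500_000, "imdb_score": "", "director_facebook_likes": 0},
    {"gross": 4_000_000, "imdb_score": 8.2, "director_facebook_likes": 20_000},
]
columns = ["gross", "imdb_score", "director_facebook_likes"]
complete = drop_incomplete(records, columns)
print(f"{len(records)} records -> {len(complete)} complete records")
```

A value of 0 is treated as present, not missing, so a director with zero Facebook likes is still kept.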
With a new dataset containing no missing data, I then partitioned the data. I used a 60/40 split, with 60% being
assigned to training and 40% going to validation. I chose to partition my data because I felt that it would help when
evaluating performance. Partitioning the data into segments that are easily preserved and retrieved made my
performance evaluation run smoothly.
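
A 60/40 random partition like the one described above can be sketched in a few lines. This is an illustration in plain Python, not XLMiner's partitioning routine; the helper `partition` and the seed value are arbitrary assumptions.

```python
import random


def partition(rows, train_fraction=0.6, seed=12345):
    """Shuffle a copy of `rows` and split it into (training, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 3,879 is the record count reported after the incomplete records were deleted.
record_ids = list(range(3879))
training, validation = partition(record_ids)
print(len(training), len(validation))
```

With 3,879 records this gives 2,327 training records and 1,552 validation records, and every record lands in exactly one of the two sets.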
Model #1
For my first model, I chose to create a standard Multiple Linear Regression analysis to see which attributes were the
most sensitive when outputting gross revenue. When I first ran my analysis, I had included the variable budget.
After looking at my model, I saw that budget could be deemed an outlier. This would skew my results when
determining the most sensitive attribute. Therefore, I decided that it did not fit with the rest of the variables and would
not be compared with the attributes listed in Figure 2. My output was gross revenue. For this first model, I did not use
any variable selection method. I wanted to compare this model with my next model, which used a variable selection
method. I took the data generated from XLMiner's Multiple Linear Regression handle and began a sensitivity analysis for
post-processing.
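
For readers without XLMiner, a multiple linear regression with an intercept can be fitted by ordinary least squares. The sketch below solves the normal equations in plain Python; it is an illustration under stated assumptions, not the routine XLMiner uses, and the helper names and toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_linear_regression(features, target):
    """Return [intercept, b1, ..., bk] minimizing squared error via the normal equations."""
    design = [[1.0] + list(row) for row in features]  # leading 1 is the intercept term
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, target)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical toy data generated from y = 5 + 2*x1 + 3*x2, so the fit should recover 5, 2, 3.
features = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 7), (6, 5)]
target = [5 + 2 * x1 + 3 * x2 for x1, x2 in features]
coefficients = fit_linear_regression(features, target)
print([round(c, 6) for c in coefficients])
```

The first returned value is the intercept and the rest are the attribute coefficients, which are the inputs that a sensitivity analysis needs.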
Model #2
For my second model, I generated another Multiple Linear Regression. However, this time I used a variable selection
method to see how it would compare with just a standard Multiple Linear Regression. I used the stepwise variable
selection method in this model. I used this method because it is a combination of the backward elimination and forward
selection methods. I believed that stepwise would give me a more accurate prediction. Before running this model, I used
the default values for FOUT (2.71) and FIN (3.84). I used the same variables and the same output as in my first model. After
running the model, I chose the last subset that was generated because it had the lowest Cp value as well as the highest
adjusted R-squared and probability. I again took the data generated and worked on a sensitivity analysis.
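
The statistics named above follow standard textbook formulas, sketched here in plain Python. The function names and the sums of squares are hypothetical, and the enter/stay comparisons show the usual stepwise rule, not XLMiner's exact implementation; only the thresholds 3.84 and 2.71 come from this report.

```python
def partial_f(sse_without, sse_with, n, k_with):
    """F statistic for the one predictor that separates two nested models.

    sse_with is the residual sum of squares of the larger model, which has
    k_with predictors plus an intercept and is fitted to n records.
    """
    return (sse_without - sse_with) / (sse_with / (n - k_with - 1))


def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R-squared for a model with k predictors plus an intercept."""
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))


def mallows_cp(sse_subset, k_subset, mse_full, n):
    """Mallows's Cp for a subset with k_subset predictors plus an intercept."""
    return sse_subset / mse_full - n + 2 * (k_subset + 1)


F_IN, F_OUT = 3.84, 2.71  # thresholds quoted in the report

# Hypothetical sums of squares: a 3-predictor model and the same model plus one predictor.
n = 100
sse_3, sse_4, sst = 520.0, 480.0, 1000.0
f_value = partial_f(sse_3, sse_4, n, k_with=4)
enters = f_value > F_IN    # forward step: add the candidate predictor
stays = f_value >= F_OUT   # backward step: keep it once it is in the model
mse_full = sse_4 / (n - 4 - 1)
print(round(f_value, 3), enters, stays)
print(round(adjusted_r_squared(sse_4, sst, n, 4), 4))
print(round(mallows_cp(sse_4, 4, mse_full, n), 4))
```

For the full model, Cp always equals the number of fitted parameters (here 5), which is why subsets are judged by how close to, or below, their own parameter count the Cp value falls.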
Results
The models that I created are both continuous methods. To analyze the models further, I needed to find a
post-processing method that corresponded with my methods. I chose to do a sensitivity analysis for both models to see what
the relationship was between attributes in the standard Multiple Linear Regression and the stepwise Multiple Linear
Regression. I wanted to take the means, minimums, maximums, and steps of the original data and run them through a
what-if analysis using 10 steps for the sensitivity analysis of each attribute. To compare the attributes, I then took the standard
deviation of the results. I created three graphs after generating the standard deviations: 1) Sensitivity About the Mean, 2) Most Sensitive
Attribute, and 3) Least Sensitive Attribute.
Performance Measures of Model #1
For my first model, I first listed the coefficients for each of the attributes as well as the intercept. I then gathered the
mean, minimum, maximum, and step for each of the attributes from the new dataset (after using the Missing Data handle). I
then calculated an output of gross revenue by taking the intercept plus the sum of the products of each attribute's coefficient and its
mean. I then took this gross revenue number and put it into the data tables for the what-if analysis. Before I could run
the what-if analysis, I had to insert values for each attribute in the data table. These values were calculated by taking the
minimum plus the step value multiplied by the step number minus one, that is, minimum + (step number - 1) x step value. Now my data table was
ready for the what-if analysis. I used each attribute's mean as the column input for the analysis. After generating values
for gross revenue, I then took the standard deviation of those values to compare them. Figure 3 displays the results of
the standard deviation.
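
The what-if procedure described above can be sketched in plain Python. The coefficients and summary statistics below are hypothetical, not the fitted model from this report, and the step size of (maximum - minimum) / 9 is an assumption, since the report does not say how the step value was calculated.

```python
from statistics import stdev

STEPS = 10  # number of what-if steps used in the report


def predict(intercept, coefficients, values):
    """Linear model output: intercept plus the sum of coefficient * value."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)


def grid(minimum, maximum, steps=STEPS):
    """Values minimum + (step_number - 1) * step for step_number = 1..steps.

    The step size (maximum - minimum) / (steps - 1) is an assumption; it makes
    the last step land on the maximum.
    """
    step = (maximum - minimum) / (steps - 1)
    return [minimum + (number - 1) * step for number in range(1, steps + 1)]


def sensitivity(intercept, coefficients, stats):
    """Sample standard deviation of the output as each attribute is varied alone."""
    means = {name: s["mean"] for name, s in stats.items()}
    result = {}
    for name, s in stats.items():
        outputs = [
            predict(intercept, coefficients, {**means, name: value})
            for value in grid(s["min"], s["max"])
        ]
        result[name] = stdev(outputs)  # sample standard deviation, like Excel's STDEV.S
    return result


# Hypothetical coefficients and summary statistics, for illustration only.
intercept = 1_000_000.0
coefficients = {"cast_total_facebook_likes": 900.0, "imdb_score": 50_000.0}
stats = {
    "cast_total_facebook_likes": {"mean": 11_000.0, "min": 0.0, "max": 650_000.0},
    "imdb_score": {"mean": 6.5, "min": 1.6, "max": 9.3},
}
baseline = predict(intercept, coefficients, {n: s["mean"] for n, s in stats.items()})
scores = sensitivity(intercept, coefficients, stats)
for name, value in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:,.2f}")
```

Because the model is linear, each score equals the absolute value of the coefficient times the standard deviation of that attribute's grid. The ranking therefore reflects both the size of the coefficient and the range of the attribute.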
Figure 3
The next thing I did was look at this graph to see which attributes were the most and least sensitive when it came to gross
revenue. As seen from the graph, cast total Facebook likes was the most sensitive attribute. It is hard to see which
attribute was the least sensitive from this graph; however, it was the IMDB score. It turns out that the IMDB score attribute
has little influence on the gross revenue of a film.
Performance Measures of Model #2
The same process described above went into creating the sensitivity analysis for my second model. This time, the
stepwise Multiple Linear Regression model had some changes. The first change was that it had a lower gross revenue
output. The second change was that it had a different least sensitive attribute. In Figure 4, you will find the standard
deviations of each attribute compared to one another.
Figure 4
[Two bar charts, each titled "Sensitivity About The Mean", with STDEV.S on the vertical axis and Attribute on the horizontal axis. The first chart's vertical axis runs to 200,000,000,000 and the second's to 3,000,000,000.]
As shown in the graph, the title of most sensitive attribute was in competition. Actor 1 Facebook likes came in a very close
second and almost took over as the most sensitive attribute. However, the most sensitive attribute was again cast
total Facebook likes. It is important to realize how close these deviations were because you do not simply want to
disregard the number of Facebook likes for the primary actor in the film. The least sensitive attribute in this model was
the number of Facebook likes for the director. According to this model, it does not really matter who the director of the film is.
Conclusion
In conclusion, this analysis compares certain attributes from Facebook and the IMDB site against the gross revenue of a
film. A higher number of Facebook likes for the primary actor and supporting actors plays a significant role in
generating revenue from a film. Through both models and the sensitivity analysis, someone can easily see the support for
this conclusion. Directors and producers can take this dataset and implement it into their thought process when
planning their movie. Movie-goers can use this dataset to make the same predictions once the movie is announced with its
primary and supporting actors/actresses. It could possibly save movie-goers money when debating whether to go see
a movie or not.
This project has helped me substantially in practicing running analyses on certain topics and generating a result. It
has developed my skill in Excel and XLMiner through using the Missing Data handle, the Reduce Categories handle, the Data
Partition handle, the Multiple Linear Regression handle, and a sensitivity analysis. Overall, this
project was very useful for me in preparing for my career. I can take this project as proof of knowledge in these
areas as well as of the associated terms.
Sources
[1] https://www.computer.org/csdl/proceedings/apvis/2007/0808/00/04126213.pdf
[2] https://www.cs.cmu.edu/~nasmith/TDF/ZhangWenbinISF2009Paper.pdf
[3] https://www.r-bloggers.com/predicting-movie-ratings-with-imdb-data-and-r/
[4] https://www.kaggle.com/deepmatrix/imdb-5000-movie-dataset

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 

IMDB Movie Dataset Analysis Predicts Box Office Success

  • 1. IMDB Dataset
Aaron McClellan, Management & Strategic Leadership, Business Analytics

Introduction
For our final project, I have chosen to analyze a movie dataset. The dataset lists over 5,000 movie titles with several different inputs to assist in the analysis. What I will be extracting from the dataset is the significance of the attributes that result in a large gross revenue for a movie. The goal of analyzing this dataset is to figure out which attributes are the most significant when determining the future success of a movie title before it is released. Critics and human instinct, when it comes to movies, are sometimes unreliable. I want to be able to accurately predict which attributes influence movie success based on several characteristics in specific areas such as Facebook and the IMDB site.

Background
Creating a predictive model for this dataset is not vital to human existence; however, it would be useful for some movie-goers. This analysis pertains to the entertainment/movie industry. It can help producers, actors, actresses, directors, film investors, and movie-goers determine how successful a proposed movie will be. Without predictive modeling, there would only be gut decisions and personal preferences about how a movie will turn out. Not everyone thinks that a certain actor or actress is amazing and therefore that the entire movie is amazing. Putting it in terms of analytical processing makes the prediction more stable and unbiased. This project would be significant to the group of people mentioned previously because it will be an unbiased predictive dataset that will be utilized to determine gross revenue. Every producer and director believes their movie will be one of the greatest, and they will do everything in their power to make it the greatest. However, the majority of the time, this turns out to be false. They can take this dataset and implement it into their thought process when planning their movie.
On the flip side, I often hear that people will go see a movie and say, “I just wasted x amount of money to see that horrible film!” Movie-goers can use this dataset to make the same predictions once the movie is announced with primary and supporting actors/actresses. It could possibly save movie-goers money when debating whether to go see a movie or not.

Goals
There are a couple of goals that I wish to achieve with this dataset:
- Assist directors and producers in maximizing the potential revenue of a proposed film
- Save money or spend money wisely when debating on seeing a new film
- Gain practice in using multiple linear regression
- Develop more skill in pre-processing techniques such as data partitioning and handling missing data
- Learn more about the post-processing technique of sensitivity analysis

Literature Review
There are other people like me who have had the same idea of analyzing a movie database. One group worked on the analysis of temporal multivariate networks derived from IMDB. They used methods such as (p,q)-core and 4-ring to identify subgraphs and short cycles [1]. Another group of individuals from Stony Brook University analyzed a movie dataset using regression and k-nearest neighbor methods [2]. Another individual wanted to see how his movie preferences correlated with Oscar-winning titles. He used a linear regression model for his analysis [3].

Methodology
I obtained my original dataset from the data science website Kaggle [4]. The original dataset contained 28 different variables. The variables in the dataset were both categorical and numerical. When I first started working on this dataset, I wanted to include the majority of the variables in my analysis; however, I ran into a problem when trying to transform my categorical data into numerical data. XLMiner is a great software program that allows for this type of transformation; however, my dataset contained numerous attributes that had more than 30 different categories. For example, there were 30+ directors and 30+ actors/actresses. In Figure 1, you will find a sample of actors/actresses.
  • 2. Figure 1
Figure 2

XLMiner has a limit of 30 different categories per categorical variable. Because of this, I was forced to do one of two things. The first was to use the Reduce Categories handle of XLMiner for all the categorical data. The only negative of this is that it cuts out a lot of data and forces it into a category. Knowing that this isn't what I wanted to become of my dataset, I had to take the other route. The other route was to pick and choose which attributes I deemed acceptable to use in my analysis. So, I did not choose attributes such as director name and actor/actress name. I will explain further in the pre-processing portion of this report.

Pre-processing
As stated before, I had to pick and choose which attributes to use in my modeling. There were a couple of attributes that I thought were interesting and wanted to see if they were significant. They had to do with the number of likes on the social media website Facebook. The majority of the attributes in my analysis had to do with this. In addition to not choosing some attributes because of the categorical cap on creating dummies, I did not choose attributes that were really a “make-or-break” attribute when it comes to the success of a film. The following attributes were eliminated from my analysis during pre-processing: color, director name, actor 1 name, actor 2 name, actor 3 name, movie title, number of voted users, face number in poster, plot keywords, movie IMDB link, number of users for reviews, language, country, content rating, title year, budget, and aspect ratio. Figure 2 displays the attributes that I kept for the analysis.

Once I determined the attributes to use, I started working with the data. I first noticed that there was a lot of missing data in the dataset. I decided that missing data made the entire record insignificant because without data, the record is incomplete and would mess up my model. The records that had missing data would have negatively impacted my model, so getting rid of them was my only option. I used the Missing Data handle feature of XLMiner and deleted those records. After using this feature, the number of records in my dataset decreased from 5,043 to 3,879.
Upon receiving a new dataset with no missing data, I then partitioned the data. I used a 60/40 split, with 60% attributed to training and 40% going to validation. I chose to partition my data because I felt that it would help during the performance period. Partitioning the data into segments that are easily preserved and retrieved made my performance run smoothly.
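
The two pre-processing steps described here (deleting records with missing values, then a 60/40 training/validation split) can be sketched outside XLMiner roughly as follows. This is only an illustration in plain Python: the function names, the toy rows, the column names, and the use of a seeded random shuffle are all assumptions, not the XLMiner procedure itself.

```python
import random

def drop_incomplete(records):
    """Keep only records in which no field is missing (None or empty string)."""
    return [r for r in records
            if all(v is not None and v != "" for v in r.values())]

def partition(records, train_fraction=0.6, seed=0):
    """Shuffle a copy of the records and split it into training and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy rows; the column names and values are made up for illustration.
movies = [
    {"gross": 100.0, "cast_total_facebook_likes": 5000},
    {"gross": None, "cast_total_facebook_likes": 1200},   # missing gross
    {"gross": 40.0, "cast_total_facebook_likes": 300},
    {"gross": 75.0, "cast_total_facebook_likes": ""},     # missing likes
    {"gross": 20.0, "cast_total_facebook_likes": 50},
    {"gross": 60.0, "cast_total_facebook_likes": 900},
    {"gross": 10.0, "cast_total_facebook_likes": 10},
]

complete = drop_incomplete(movies)   # incomplete records are deleted outright
train, valid = partition(complete)   # 60% training, 40% validation
```

Deleting whole records is the simplest way to handle missing data, at the cost of shrinking the dataset, which matches the drop in record count reported above.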
  • 3. Model #1
For my first model, I chose to create a standard Multiple Linear Regression analysis to see which attributes were the most sensitive when outputting gross revenue. When I first ran my analysis, I had included the variable budget. After looking at my model, I saw that budget could be deemed an outlier. This would skew my dataset when determining the most sensitive attribute. Therefore, I decided that it did not fit with the rest of the variables and would not be compared with the attributes listed in Figure 2. My output was gross revenue. For this first model, I did not use any variable selection method. I wanted to compare this model with my next model, which used a variable selection method. I took the data generated from XLMiner's Multiple Linear Regression handle and began a sensitivity analysis for post-processing.

Model #2
For my second model, I generated another Multiple Linear Regression. However, this time I used a variable selection method to see how it would compare with just a standard Multiple Linear Regression. I used the stepwise variable selection method in this model. I used this method because it is a combination of the backward elimination and forward selection methods. I believed that stepwise would give me a more accurate prediction. Before running this model, I used the default values for FOUT (2.71) and FIN (3.84). I used the same variables and the same output as in my first model. After running the model, I chose the last subset that was generated because it had the lowest Cp value as well as the highest adjusted R-squared and probability. I again took the data generated and worked on sensitivity analysis.

Results
The models that I created are both of continuous methods. To analyze the models further, I needed to find a post-processing method that corresponded with my methods. I chose to do a sensitivity analysis for both models to see what the relationship was between attributes in the standard Multiple Linear Regression and the stepwise Multiple Linear Regression.
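
For readers without XLMiner, a standard multiple linear regression like the one used in both models can be sketched in plain Python. This is a minimal illustration that solves the least-squares normal equations directly; it does not reproduce XLMiner's output or its stepwise selection, and the function names and toy data are assumptions made for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_linear(rows, y):
    """Return [intercept, b1, ..., bk] that minimizes the squared error."""
    X = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, row):
    """Intercept plus the sum of coefficient times attribute value."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], row))

# Toy data generated from y = 2 + 3*x1 - 1*x2 (made up for illustration).
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [2 + 3 * x1 - x2 for x1, x2 in rows]
coefs = fit_linear(rows, y)
```

The fitted intercept and coefficients are exactly the quantities that the sensitivity analysis takes as its inputs.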
I wanted to take the means, minimums, maximums, and steps of the original data and run them through a what-if analysis using 10 steps for the sensitivity analysis for each attribute. To compare, I then had to take the standard deviation. I created three graphs after generating the standard deviations: 1) Sensitivity About the Mean, 2) Most Sensitive Attribute, and 3) Least Sensitive Attribute.

Performance Measures of Model #1
For my first model, I first listed the coefficients for each of the attributes as well as the intercept. I then gathered the mean, minimum, maximum, and step for each of the attributes from the new dataset (after using the Missing Data handle). I then calculated an output of gross revenue by taking the intercept plus the sum of the products of each attribute coefficient and its mean. I then took this gross revenue number and put it into the data tables for the what-if analysis. Before I could run the what-if analysis, I had to insert values for each attribute in the data table. These values were calculated by taking the minimum plus the step number minus one, multiplied by the calculated step value. Now my data table was ready for the what-if analysis. I used each attribute mean as the column input for the analysis. After generating values for gross revenue, I then took the standard deviation of those values to compare them. Figure 3 displays the results of the standard deviation.
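
The sensitivity calculation described above can be sketched in plain Python as follows. It is a hedged illustration, not the Excel what-if workbook itself: the step value is assumed to be (maximum - minimum) / (number of steps - 1), which the report does not state, and the intercept, coefficients, attribute statistics, and function name are made up for the example. The sample standard deviation corresponds to Excel's STDEV.S.

```python
import statistics

def sensitivity(intercept, coefs, stats, n_steps=10):
    """Return the sample standard deviation of predicted output per attribute.

    coefs maps attribute -> regression coefficient.
    stats maps attribute -> (mean, minimum, maximum).
    """
    # Baseline output: intercept plus each coefficient times its attribute mean.
    baseline = intercept + sum(coefs[a] * stats[a][0] for a in coefs)
    result = {}
    for a in coefs:
        mean, lo, hi = stats[a]
        step = (hi - lo) / (n_steps - 1)  # assumed definition of the step value
        outputs = []
        for k in range(1, n_steps + 1):
            value = lo + (k - 1) * step   # minimum + (step number - 1) * step
            # Vary only this attribute; all others stay at their means.
            outputs.append(baseline + coefs[a] * (value - mean))
        result[a] = statistics.stdev(outputs)  # same as STDEV.S
    return result

# Hypothetical model and attribute statistics, for illustration only.
coefs = {"cast_total_facebook_likes": 2.0, "imdb_score": 100.0}
stats = {
    "cast_total_facebook_likes": (5000.0, 0.0, 9000.0),
    "imdb_score": (6.5, 1.0, 10.0),
}
result = sensitivity(1000.0, coefs, stats)
most_sensitive = max(result, key=result.get)
```

Because the model is linear, each standard deviation is proportional to the absolute coefficient times the attribute's range, so an attribute with a very wide range (such as a count of Facebook likes) can dominate one with a narrow range (such as a 1 to 10 score) even when its coefficient is small.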
  • 4. Figure 3: Sensitivity About The Mean (STDEV.S by attribute)

The next thing I did was look at this graph and see which attributes were the most and least sensitive when it came to gross revenue. As seen from the graph, the cast total Facebook likes was the most sensitive attribute. It is hard to see which attribute was the least sensitive from this graph; however, it was the IMDB score. It turns out that the IMDB score attribute has little influence on the gross revenue of a film.

Performance Measures of Model #2
The same process described above went into creating the sensitivity analysis for my second model. This time, the stepwise Multiple Linear Regression model had some changes. The first change was that it had a lower gross revenue output. The second change was that it had a different least sensitive attribute. In Figure 4, you will find the standard deviations of each attribute compared to one another.

Figure 4: Sensitivity About The Mean (STDEV.S by attribute)
  • 5. As shown in the graph, the most sensitive attribute was in competition. The actor 1 Facebook likes came in a very close second and almost took over as the most sensitive attribute. However, the most sensitive attribute was again the cast total Facebook likes. It is important to realize how close these deviations were because you do not simply want to disregard the number of Facebook likes for the primary actor in the film. The least sensitive attribute in this model was the number of Facebook likes for the director. It turns out that it does not really matter who the director of the film is.

Conclusion
In conclusion, this analysis compares certain attributes regarding Facebook and the IMDB site against the gross revenue of a film. A higher number of Facebook likes for the primary actor and supporting actors plays a significant role in generating revenue from a film. Through both models and the sensitivity analysis, someone can easily see the support for this conclusion. Directors and producers can take this dataset and implement it into their thought process when planning their movie. Movie-goers can use this dataset to make the same predictions once the movie is announced with primary and supporting actors/actresses. It could possibly save movie-goers money when debating whether to go see a movie or not.

This project has helped me substantially in practicing running analyses on certain topics and generating a result. It has developed my skill in Excel and XLMiner by using the Missing Data handle, the Reduce Categories handle, the Data Partition handle, the Multiple Linear Regression handle, and a sensitivity analysis. Overall, this project was very useful for me in preparing for my career. I can take this project as proof of knowledge in these areas as well as of the associated terms.