Movie business ≠ Guaranteed profit
Is the probability of making a profitable movie
similar to flipping a coin???
https://sciencelens.co.nz/2012/06/01/flip-a-coin-day/
How to tackle this problem?
Step 1: Data wrangling (preprocess raw data)
Step 2: Storytelling & inference (explore insights)
Step 3: Modeling & results (classification models)
Data source & feature classification
IMDB movie dataset
Observations: 5043 movies
Features: 28 variables
IMDB information:
IMDB score
Number of critic reviews
Number of user reviews
Number of voted users
Social media information (Facebook likes):
Director
Major actors
Cast
Movie
Descriptive information:
color
duration
movie_title
facenumber_in_poster
plot_keywords
aspect_ratio
content_rating
title_year
language
country
genres
movie_imdb_link
People information (name):
director
actor
$$$:
gross
budget
revenue
[Legend: predictor variables, response variables]
Data wrangling
[Cleaning step]
• Check the percentage of missing values in each variable (column) and observation (row)
• This tells me how to prioritize the recovery steps
• Remove duplicates
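The cleaning checks above can be sketched in pandas as follows. This is a minimal illustration on a made-up three-column table, not the author's code; the toy values and the variable names (`raw`, `deduped`) are assumptions.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw IMDB table (values are made up for illustration).
raw = pd.DataFrame({
    "movie_title": ["A", "B", "B", "C"],
    "budget": [10.0, np.nan, np.nan, 30.0],
    "gross": [25.0, 5.0, 5.0, np.nan],
})

# Percentage of missing values per variable (column), worst first.
missing_by_column = raw.isna().mean().mul(100).sort_values(ascending=False)

# Percentage of missing values per observation (row).
missing_by_row = raw.isna().mean(axis=1).mul(100)

# Drop exact duplicate rows, keeping the first occurrence.
deduped = raw.drop_duplicates().reset_index(drop=True)

print(missing_by_column)
print(len(raw), "->", len(deduped), "rows after duplicate removal")
```

Sorting the per-column percentages is one way to decide which columns to recover first and which to impute or drop.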
[Categorical variables]
• Proofread “movie_title” column
• Remove unnecessary words and spaces
• Manually fix “color”, “country”, “language” columns
• Fill in NaN values
• Use one-hot encoding
• “content_rating” column
– Remove TV series
– Fill in NaN by web scraping (however, the scraped data shows that most of these rows are TV series or not rated, so I skip the fill-in)
– Group the ratings into 4 categories and dummify them
• Dummify “genres” column
• Replace “Actor_name” and “director_name” columns with their frequencies
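A rough sketch of the last two categorical steps, dummifying “genres” and replacing names with frequencies, might look like this. It assumes that a movie's genres are stored as one `|`-separated string, which the slides do not state; the toy rows and the `director_freq` column name are also hypothetical.

```python
import pandas as pd

# Toy rows; the '|' separator inside "genres" is an assumption.
movies = pd.DataFrame({
    "genres": ["Action|Adventure", "Comedy", "Action|Comedy"],
    "director_name": ["X", "Y", "X"],
})

# One 0/1 indicator column per genre.
genre_dummies = movies["genres"].str.get_dummies(sep="|")

# Replace each director name by the number of times it appears in the data.
director_freq = movies["director_name"].map(
    movies["director_name"].value_counts()
)

encoded = pd.concat(
    [
        movies.drop(columns=["genres", "director_name"]),
        genre_dummies,
        director_freq.rename("director_freq"),
    ],
    axis=1,
)
print(encoded)
```

Frequency encoding keeps a high-cardinality name column as a single numeric feature instead of thousands of dummy columns.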
[Numeric variables]
• Fill in “title_year” column with web-scraped data and subgroup it
• Fill in “budget” column with web-scraped data
• Fill in “gross” column with web-scraped data
• Add “month” column from web-scraped data
• Impute “num_critic_for_reviews”, “director_facebook_likes”, “actor_3_facebook_likes”,
“actor_1_facebook_likes”, “facenumber_in_poster”, “actor_2_facebook_likes”,
“aspect_ratio”, “duration”, “num_user_for_reviews” columns with the median
[Final steps]
• Remove “movie_imdb_link” column
• Remove all rows with NaN
• Save it to ‘final_wrangle.csv’
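The median imputation listed under [Numeric variables] can be sketched as below. Only two of the listed columns are shown, and the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy data with gaps in two of the numeric columns named in the slides.
movies = pd.DataFrame({
    "duration": [90.0, np.nan, 120.0, 100.0],
    "aspect_ratio": [1.85, 2.35, np.nan, 1.85],
})

numeric_cols = ["duration", "aspect_ratio"]

# Column-wise medians, computed while ignoring NaN.
medians = movies[numeric_cols].median()

# Fill each column's gaps with that column's own median.
movies[numeric_cols] = movies[numeric_cols].fillna(medians)
print(movies)
```

The median is less sensitive than the mean to the long right tails that count-like features such as Facebook likes tend to have.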
Data preprocessing
[Prepare target variable: revenue]
• Create a new column called ‘revenue’, computed as ‘gross’ - ‘budget’
• Express it in units of 1 million
[Outliers]
• Use seaborn.pairplot to get histograms of all predictor variables
• Check the target variable
[High correlation between predictors]
• Create a correlation heatmap
• Identify strongly positive and strongly negative correlations between variables
• For each strongly correlated pair (positive or negative), remove one of the two variables
[Save the preprocessed data]
• final_pre.csv
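A minimal sketch of the target construction and the correlation-based pruning is given below. The data are synthetic, the `cast_total_facebook_likes` column and the 0.9 threshold are assumptions (the slides give no cut-off), and the rule "drop the later column of each correlated pair" is only one possible choice.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
likes = rng.normal(size=n)

# Synthetic data; column names other than gross/budget are illustrative.
df = pd.DataFrame({
    "gross": rng.uniform(1e6, 2e8, size=n),
    "budget": rng.uniform(1e6, 1e8, size=n),
    "cast_total_facebook_likes": likes,
    "actor_1_facebook_likes": 0.9 * likes + rng.normal(scale=0.05, size=n),
    "duration": rng.normal(110, 15, size=n),
})

# Target: revenue = gross - budget, expressed in millions.
df["revenue"] = (df["gross"] - df["budget"]) / 1e6

# Absolute pairwise correlations between the predictors only.
predictors = df[["cast_total_facebook_likes", "actor_1_facebook_likes", "duration"]]
corr = predictors.corr().abs()

# Keep the strict upper triangle so each pair is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

threshold = 0.9  # assumed cut-off, not taken from the slides
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = predictors.drop(columns=to_drop)
print("dropped:", to_drop)
```

Taking absolute values handles strongly negative correlations the same way as strongly positive ones.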
Data storytelling & Data inference
[Strategy for numerical features]
Here I simulate the null hypothesis to test the significance (p-value) of the correlation between each
predictor and the revenue.
module.py includes functions:
• ‘pearson_permuttion_plot’ for null hypothesis simulation, p-value calculation,
and plotting
[Strategy for categorical features]
Here I calculate the difference in mean revenue between the categories and test it against a simulated
null hypothesis.
module.py includes functions:
• ‘mean_diff_testing’ for plotting
• ‘mean_diff_p’ for calculating p-value between different means
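The contents of module.py are not shown in the slides, so the following is only a sketch of how the two simulated-null tests could be written with NumPy. The function names `pearson_permutation_p` and `mean_diff_permutation_p`, the two-sided p-value, and the permutation count are all assumptions rather than the author's implementation.

```python
import numpy as np

def pearson_permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.corrcoef(x, y)[0, 1]
    perm = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling x breaks any link with y, simulating the null hypothesis.
        perm[i] = np.corrcoef(rng.permutation(x), y)[0, 1]
    p_value = np.mean(np.abs(perm) >= abs(observed))
    return observed, p_value

def mean_diff_permutation_p(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference of two group means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    perm = np.empty(n_perm)
    for i in range(n_perm):
        # Reassign group labels at random while keeping the group sizes.
        shuffled = rng.permutation(pooled)
        perm[i] = shuffled[:a.size].mean() - shuffled[a.size:].mean()
    p_value = np.mean(np.abs(perm) >= abs(observed))
    return observed, p_value

# Synthetic check: a predictor that is truly related to the target.
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 0.5 * x + rng.normal(size=300)
r, p_corr = pearson_permutation_p(x, y)

# Synthetic check: two groups whose true means differ by 1.
diff, p_diff = mean_diff_permutation_p(rng.normal(1, 1, 100), rng.normal(0, 1, 100))
print(r, p_corr, diff, p_diff)
```

With this kind of test, a p-value of 0.0 means that none of the permuted statistics was as extreme as the observed one, so the true p-value is below 1/n_perm rather than exactly zero.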
Is IMDB score a good indicator of the revenue?
The correlation is 24% and significant.
How do reviews affect the revenue?
[Correlation & significance]
• Voted users: 49%, significant
• Users for reviews: 38%, significant
• Critics for reviews: 24%, significant
Is the budget correlated with the revenue? Invest
more, earn more back?
1. No correlation between the total budget and revenue.
2. Positive revenue and negative revenue are each correlated with budget, respectively.
Recommendation:
There is a trend that most failed
movies (negative revenue) are
supported by a big budget.
[Plot panels: Positive revenue, Negative revenue, Total budget]
How do seasonality and title year affect the revenue?
[Month]
The mean difference is significant.
(1: June and December; 0: the rest of the months)
[Title year]
The mean difference is not significant.
(1: after 1966; 0: before 1966)
How do genres affect the revenue?
[Significant genres and content ratings & p-value]
PG-13 0.0067
R 0.0
Adventure 0.0
Animation 0.0001
Comedy 0.0018
Crime 0.0003
Drama 0.0
Family 0.0
Fantasy 0.0002
History 0.0
Sci-Fi 0.0453
Thriller 0.0
War 0.0002
Feature importance
[Top10 features]
• Voted users
• Users for reviews
• Critics for reviews
• IMDB score
• Social network-related features
• Primary actor’s name frequency
Modeling & results
Model Accuracy
Linear regression 0.32
Logistic regression 0.69
SVM 0.70
Kernel SVM 0.55
KNN 0.68
Random forest 0.72
Gradient boosting classifier 0.71
[Figure labels: Logistic regression, SVM, Random forest, Gradient boosting]
Future plans
• Test this dataset with a neural network
• Merge more features from other movie
datasets
– NLP: voters’ reviews (text)
– other scoring systems (Rotten Tomatoes)

Foresee your movie revenue
