Understanding Movie Success
Piyush Bhargava
Jason Helgren
Abhishek Singh
The Questions
How do we measure a movie’s success? What are the indicators?
How do other variables correlate with these indicators?
Other interesting insights?
Methodology
● Scrape 13,000 movie URLs from Box Office Mojo using Python.
● Randomly select 5% and download JSON blobs from OMDB.
● Clean data and check for consistency. Export to PostgreSQL.
Problems along the way:
● Couldn’t get an API key
● Lots of missing values
● Not all URLs had JSON files
Final Result: database of about 500 movies
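The sampling and cleaning steps of the methodology could look roughly like the sketch below. It is not the authors' code: the function names, the seed, and the output column names are hypothetical, the URLs are placeholders, and it assumes OMDB-style records that mark missing values with the string "N/A" and format vote counts with thousands separators.

```python
import random


def sample_urls(urls, fraction=0.05, seed=0):
    """Return a reproducible simple random sample of the scraped URL list."""
    k = max(1, round(len(urls) * fraction))
    return random.Random(seed).sample(urls, k)


def to_number(value):
    """Convert strings such as '7.9' or '1,234' to float; None if missing."""
    if value is None or value in ("", "N/A"):
        return None
    try:
        return float(str(value).replace(",", ""))
    except ValueError:
        return None


def clean_record(raw):
    """Map one raw JSON record to a flat row ready for a database insert.

    The keys read here (Title, Metascore, imdbRating, imdbVotes) are assumed
    field names; adjust them to whatever the downloaded JSON actually has.
    """
    return {
        "movie": raw.get("Title"),
        "metascore": to_number(raw.get("Metascore")),
        "imdb_rating": to_number(raw.get("imdbRating")),
        "imdb_votes": to_number(raw.get("imdbVotes")),
    }


# Placeholder URLs standing in for the 13,000 scraped ones.
urls = [f"https://example.invalid/movies/{i}" for i in range(13000)]
sample = sample_urls(urls)

# A made-up record with one missing value.
row = clean_record(
    {"Title": "X", "Metascore": "N/A", "imdbRating": "7.9", "imdbVotes": "1,234"}
)
```

A 5% sample of 13,000 URLs is 650 records, so missing JSON files and dropped rows would account for the gap down to about 500 movies.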
ER Diagram
Sample Query: User vs Critic Rating
SELECT AVG(metascore) AS critic_avg,
STDDEV_POP(metascore) AS critic_std_dev,
AVG(imdb_rating) AS imdb_avg,
STDDEV_POP(imdb_rating) AS imdb_std_dev,
CORR(metascore, imdb_rating) AS correlation
FROM movies;
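For readers who want to see what these aggregates compute, here is a minimal Python sketch of the same summary. The function names and the sample rows are made up for illustration. It assumes, as in PostgreSQL, that AVG and STDDEV_POP skip NULLs per column and that CORR uses only rows where both values are present.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rating_summary(rows):
    """Summarize (metascore, imdb_rating) pairs; None stands in for NULL."""
    critic = [m for m, _ in rows if m is not None]
    user = [r for _, r in rows if r is not None]
    paired = [(m, r) for m, r in rows if m is not None and r is not None]
    return {
        "critic_avg": mean(critic),
        "critic_std_dev": pstdev(critic),  # population std dev, like STDDEV_POP
        "imdb_avg": mean(user),
        "imdb_std_dev": pstdev(user),
        "correlation": pearson([m for m, _ in paired], [r for _, r in paired]),
    }


# Made-up rows; the last one has no metascore.
rows = [(50, 5.0), (60, 6.0), (70, 7.0), (None, 8.0)]
summary = rating_summary(rows)
```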
User vs Critic Rating
Sample Query: Summer Blockbuster Effect?
SELECT COUNT(*) AS count, month, genre FROM
  (SELECT lhs.movie, lhs.month, rhs.genre FROM
    (SELECT movie, month FROM movies) AS lhs
    LEFT JOIN
    (SELECT movie, genre FROM genres) AS rhs
    USING(movie)) AS iq
WHERE genre IN ('Action', 'Comedy', 'Drama', 'Romance', 'Adventure', 'Crime')
GROUP BY month, genre ORDER BY month;
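The logic of the query above (join each movie's release month to its genres, keep six genres, count per month and genre) can be sketched in plain Python as follows. The function name and the sample rows are hypothetical, and the sketch assumes each movie appears once in the movies data and may have several genre rows.

```python
from collections import Counter

GENRES = {"Action", "Comedy", "Drama", "Romance", "Adventure", "Crime"}


def count_by_month_and_genre(movies, genres):
    """Count movies per (month, genre) for the selected genres.

    movies: iterable of (movie, month) pairs, one per movie.
    genres: iterable of (movie, genre) pairs, possibly several per movie.
    """
    month_of = dict(movies)
    counts = Counter(
        (month_of[movie], genre)
        for movie, genre in genres
        if movie in month_of and genre in GENRES
    )
    # Sorted by month, then genre, so the output order is deterministic.
    return sorted((month, genre, n) for (month, genre), n in counts.items())


# Made-up rows for illustration.
movies = [("A", 6), ("B", 6), ("C", 12)]
genres = [
    ("A", "Action"),
    ("A", "Adventure"),
    ("B", "Action"),
    ("C", "Drama"),
    ("C", "Biography"),
]
result = count_by_month_and_genre(movies, genres)
```

Because a movie can carry several genres, it is counted once for each matching genre, just as in the SQL join.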
Summer Blockbuster Effect
Most Common Genres and MPAA Rating
Awards/Nominations Versus Votes
Panels: Nominations, Wins, Major Wins*
ρ = 0.65, ρ = 0.53
*Major Win - Oscar/BAFTA/Golden Globe Wins
Awards/Nominations vs other variables
*Major Win/Nomination - Oscar/BAFTA/Golden Globe Wins/Nominations
Revenue Vs IMDB Rating
● Higher-rated movies earn more worldwide than in the US.
● Worldwide revenue is potentially influenced by IMDB rating and number of votes.
Takeaways and Next Steps
● Scrape more data to get a normal distribution!
● Build a model to predict movie success.
