Data Mining about
Movie Industry
Department of Chinese Language and Literature, 2011312723, 홍경택
Department of Library and Information Science, 2014314637, 박지원
Contents
1. Overview
2. Data Present Condition
3. Exploratory Data Analysis
4. Modeling
5. Analysis Result & Verification
6. Result
1. Overview
< Movie Theater Sales, Attendance, and Admissions per Person, 2011-2015 >
Year                                   2011     2012     2013     2014     2015
Theater sales (100 million KRW)
  Total                              12,358   14,551   15,513   16,641   17,154
  Change rate                          5.8%    17.8%     6.6%     7.3%     3.1%
Attendance (10,000 persons)
  Total attendance                   15,972   19,489   21,335   21,506   21,729
  Change rate                          7.1%    22.0%     9.5%     0.8%     1.0%
Admissions per person (times)          3.15     3.83     4.17     4.19     4.22
Over the five years from 2011 to 2015, movie theater sales, total attendance,
and the number of admissions per person all continued to increase.
Hit Movies of 2015 and Their Total Attendance
13,414,009
6,129,681
12,705,700
???
Topic
"A Study on a Prediction Model of Movie Success"
Purpose
• To understand the factors that influence movie success using various analysis techniques
• To predict total attendance by building a prediction model of movie success
• To use the results for marketing strategy development
2. Data Present Condition
Annual Accounts Report
It provides an overview of the 2015 Korean film industry, including box-office
figures, production costs with investment returns, and overseas sales.
Numerical Data
Data in Excel files, including:
• Title
• Release month
• Genre
• Number of screens
• Nationality
• Distributor
• Number of screenings
• Screen share
• Total attendance
Preprocessing
• Remove the sales variable, which carries nearly the same information as attendance
• Restrict the data to the period 2013 to 2016
• Remove films with total attendance below 10,000
  (no outliers were removed, because the figures are actual performance records)
• Convert release date into release month to distinguish peak and off seasons
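The preprocessing steps above can be sketched in R roughly as follows. This is only an illustration: the toy data frame and the column names (`release_date`, `sales`, `attendance`) are assumptions, not the authors' actual Excel schema.

```r
# Toy input; all column names and values are hypothetical.
raw <- data.frame(
  title        = c("A", "B", "C", "D"),
  release_date = as.Date(c("2012-12-25", "2014-07-30", "2015-03-12", "2016-10-05")),
  sales        = c(5.1e9, 1.3e11, 4.0e7, 2.2e10),
  attendance   = c(700000, 17000000, 5200, 2900000),
  stringsAsFactors = FALSE
)

preprocess <- function(df) {
  # Drop sales: it tracks attendance closely, so it adds little as a predictor.
  df$sales <- NULL
  # Keep only films released in 2013-2016 with at least 10,000 admissions.
  year <- as.integer(format(df$release_date, "%Y"))
  keep <- year >= 2013 & year <= 2016 & df$attendance >= 10000
  df <- df[keep, ]
  df$release_year <- year[keep]
  # Replace the full date with the release month (1-12) for seasonality.
  df$release_month <- as.integer(format(df$release_date, "%m"))
  df$release_date <- NULL
  df
}

movie <- preprocess(raw)
train <- movie[movie$release_year <= 2015, ]  # 2013-2015
test  <- movie[movie$release_year == 2016, ]  # 2016
print(movie)
```

Splitting by release year mirrors the train (2013 to 2015) and test (2016) division described in the slides.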
Baseline statistics (train data [2013-2015])
After data cleaning
• Train data [2013-2015]: 9,663 -> 836
• Test data [2016]: 4,120 -> 294
3. Exploratory Data Analysis
Histograms of numeric data
[Histograms: Total attendance, Showing frequency, Number of screens]
• The distributions of total attendance and showing frequency are about the same.
Histogram and bar chart
• January has the highest share of releases; the other months are similar.
• October shows a larger audience than the other months.
Plots by factor
• Plots of total attendance against each factor show the distribution of the data.
• Labels on the plots: U.S.A, Korea, Action, drama, Over 15
Discovering a problem
• Total attendance takes too many distinct numeric values, which makes analysis difficult.
• So we grouped total attendance into categories.
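What is changed?
Categorize and make a new data column
> table(movie$관객수범주)
100만미만  100만  200만  300만  400만  500만  600만  700만  800만  900만  1000만이상
     687     59     32     18     10     10      5      2      2      3          8
(관객수범주 = attendance category; 100만미만 = under 1 million, 100만 to 900만 = the
1 million to 9 million bands, 1000만이상 = 10 million or more)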
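A new category column like the one tabulated above could be built with `cut()`. This is a minimal sketch under assumptions: the bin boundaries are inferred from the level names (each band is taken to start at its label and be one million wide), and the English labels stand in for the Korean levels on the slide.

```r
# Hypothetical attendance figures chosen to hit the band boundaries.
attendance <- c(52000, 999999, 1000000, 3400000, 9999999, 10000000, 14200000)

# Assumed boundaries: [0, 1M), [1M, 2M), ..., [9M, 10M), [10M, Inf)
breaks <- c(0, seq(1e6, 1e7, by = 1e6), Inf)
labels <- c("<1M", paste0(1:9, "M"), "10M+")

# right = FALSE makes each interval closed on the left, so exactly
# 1,000,000 falls in "1M" rather than "<1M".
category <- cut(attendance, breaks = breaks, labels = labels, right = FALSE)

print(table(category))
```

`cut()` returns an ordered set of factor levels even for bands with no films, which keeps the later tables and plots aligned across train and test data.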
mosaicplot
• After categorization, finding meaning in the data has become easier.
• With prop.table, the proportions can be read directly from the data.
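As a rough illustration of the two functions named above, the sketch below cross-tabulates a categorical factor against the attendance category, converts the counts to row proportions with `prop.table()`, and draws a mosaic plot. The toy data and the column names `nationality` and `category` are assumptions.

```r
# Hypothetical toy data; the real data set has many more levels.
movie <- data.frame(
  nationality = c("Korea", "Korea", "Korea", "Korea",
                  "U.S.A", "U.S.A", "U.S.A", "U.S.A"),
  category    = factor(c("<1M", "1M", "1M", "1M",
                         "<1M", "<1M", "<1M", "1M"),
                       levels = c("<1M", "1M"))
)

# Counts of films per nationality and attendance category.
counts <- table(movie$nationality, movie$category)

# margin = 1 normalizes each row, so every nationality sums to 1.
shares <- prop.table(counts, margin = 1)
print(shares)

# Tile areas are proportional to the cell counts.
mosaicplot(counts, main = "Attendance category by nationality", color = TRUE)
```

Row proportions make groups of different sizes comparable, which raw counts do not.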
4. Modeling
Model introduction
Linear Regression
• Linear regression is one of the most frequently used techniques in statistics.
• Multiple regression is an extension of linear regression to the relationship
  among more than two variables. In simple linear regression we have one
  predictor and one response variable, but in multiple regression we have more
  than one predictor variable and one response variable.
Decision Tree
• A decision tree is a graph that represents choices and their results in the
  form of a tree. The nodes represent an event or choice, and the edges
  represent the decision rules or conditions. It is widely used in machine
  learning and data mining applications in R.
• Generally, a model is created with observed data, also called training data.
  Then a set of validation data is used to verify and improve the model.
Multiple regression
1. All variables (except attendance category) were entered.
2. Release month, number of screens, showing frequency, and genre (historical
   drama) showed a significant correlation.
BUT the VIF was too high for distributor, nationality, and genre, so the
regression was run again without these factors.
• Release month, number of screens, showing frequency, and the all-ages rating
  class showed a significant correlation.
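The regression and VIF check described above might look like the following sketch. The data are simulated, the variable names are assumptions, and the VIF is computed by hand as 1 / (1 - R²) for numeric predictors only; categorical predictors such as distributor or genre would need a generalized VIF, which this sketch does not cover.

```r
set.seed(1)
n <- 200

# Simulated predictors; screenings is deliberately collinear with screens.
screens    <- round(runif(n, 50, 1500))
screenings <- screens * 40 + rnorm(n, sd = 3000)
month      <- sample(1:12, n, replace = TRUE)
attendance <- 300 * screens + 20 * screenings + rnorm(n, sd = 50000)
movie <- data.frame(attendance, screens, screenings, month)

# Multiple regression: one response, several predictors.
fit <- lm(attendance ~ screens + screenings + month, data = movie)
print(summary(fit)$coefficients)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
# predictor j on all the other predictors.
vif <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif(movie, c("screens", "screenings", "month"))
print(round(vifs, 2))
```

A large VIF signals that a predictor is nearly a linear combination of the others, which inflates the variance of its coefficient; dropping or combining such predictors and refitting is the step the slide describes.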
Decision Tree 1 and Decision Tree 2
Predict attendance category in test data (2016)
> fit.results2 <- predict(train.dt, newdata = test)
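The slides do not say which package produced `train.dt`. The sketch below assumes `rpart` and simulated data, and reuses the names `train.dt`, `test`, and `fit.results2` from the slide purely for readability. With `rpart`, a classification tree returns class probabilities by default, so the sketch passes `type = "class"` to get predicted categories.

```r
library(rpart)  # assumed package; the slides do not name one

set.seed(42)

# Simulated films; the category depends only on the number of screenings.
make_data <- function(n) {
  screens    <- round(runif(n, 50, 1500))
  screenings <- round(screens * runif(n, 20, 60))
  category   <- factor(ifelse(screenings > 30000, "1M+", "<1M"),
                       levels = c("<1M", "1M+"))
  data.frame(screens, screenings, category)
}
train <- make_data(300)  # stands in for the 2013-2015 data
test  <- make_data(100)  # stands in for the 2016 data

# Fit a classification tree on the training data.
train.dt <- rpart(category ~ screens + screenings, data = train, method = "class")

# Predict the attendance category for unseen films.
fit.results2 <- predict(train.dt, newdata = test, type = "class")

print(table(predicted = fit.results2, actual = test$category))
```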
5. Analysis Result & Verification
• Confusion Matrix 1: the decision tree's accuracy is 89%
• Confusion Matrix 2: the decision tree's accuracy is 81%
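Accuracy figures like these are usually read off the diagonal of the confusion matrix. The sketch below shows that calculation on made-up predictions; the vectors and category labels are hypothetical and do not reproduce the slides' results.

```r
lv <- c("<1M", "1M", "2M")

# Hypothetical actual and predicted categories for ten films.
actual    <- factor(c("<1M", "<1M", "<1M", "1M", "1M",
                      "2M", "<1M", "1M", "2M", "<1M"), levels = lv)
predicted <- factor(c("<1M", "<1M", "1M", "1M", "1M",
                      "2M", "<1M", "<1M", "2M", "<1M"), levels = lv)

# Rows are predictions, columns are true categories.
cm <- table(predicted, actual)
print(cm)

# Accuracy = correctly classified films / all films.
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

When one category dominates, accuracy is worth comparing with a majority-class baseline. In the training table shown earlier, 687 of 836 films (about 82%) fall in the under-1-million category; the test-set distribution is not given in the slides.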
6. Result
• The number of screenings
• The number of screens
-> box office
Limitations and expectation
• Limitations: lack of varied data; various genres could not be analyzed
• Expectation: better prediction and analysis
Thank You
