Modeling Techniques in Predictive Analytics:
Business Problems and Solutions with R
TEXT ANALYTICS
Objective of Case Study
• To analyze how movies released over the years have changed from decade to decade, using
text analytics tools and methods.
Methodology
• We have data on movies released over the last 100 years in a single file. We read all of the
text from that file and store it as a text corpus. We then format the text and keep only the
information relevant to our analysis. We use the R programming language for the
statistical analysis.
• The Internet Movie Database (IMDb.com) is a good source of information about movies
and is freely available on the Internet. We downloaded the information as a text file.
For our example, we chose one of the smaller text files from IMDb, the tagline file.
• Text analytics, like predictive analytics, is a numbers game, but with words rather than
numbers as the raw input. We turn words into numbers for analysis.
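The "words into numbers" step can be sketched with the tm package listed under Packages Used. This is a minimal illustration, not the study's actual script: the three taglines are made up, and the particular cleaning steps (lower-casing, punctuation and stop-word removal) are assumptions about what "text formatting" covers.

```r
library(tm)

# Hypothetical taglines standing in for the IMDb tagline file.
taglines <- c(
  "A story of love and truth.",
  "The truth about life!",
  "Love changes the world."
)

# Store the raw text as a corpus, one document per tagline.
corpus <- VCorpus(VectorSource(taglines))

# Assumed text formatting: lower case, strip punctuation, drop stop words.
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Turn words into numbers: rows are terms, columns are documents,
# and each cell counts how often the term occurs in the document.
tdm <- TermDocumentMatrix(corpus)
counts <- as.matrix(tdm)
print(counts)
```

Each row of `counts` is a word and each column a tagline, so row sums give overall word frequencies.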
Data Preprocessing
This is how the unstructured text file looks.
We must process and clean this text before we can understand what it says.
• We use the file's formatting to parse the tagline file for entry into a text database.
This is how the structured data looks:
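As a rough illustration of the parsing step, the sketch below turns raw lines into one row per movie using base R. The record layout (a `# Title (Year)` header followed by tab-indented taglines) is an assumption about the tagline file, and the sample lines and column names are hypothetical.

```r
# Hypothetical raw lines; the "# Title (Year)" header followed by
# tab-indented taglines is an assumed layout, not taken from the slides.
raw <- c(
  "# First Movie (1999)",
  "\tOne tagline.",
  "\tAnother tagline.",
  "",
  "# Second Movie (2005)",
  "\tA single tagline."
)

is_header <- grepl("^# ", raw)
is_tagline <- grepl("^\t", raw)

# Assign every line to the most recent header above it.
movie_id <- cumsum(is_header)

# Split each header into a title and a four-digit release year.
header_pattern <- "^# (.*) \\(([0-9]{4})[^)]*\\).*$"
headers <- raw[is_header]
title <- sub(header_pattern, "\\1", headers)
year <- as.integer(sub(header_pattern, "\\2", headers))

# Paste the taglines of each movie into a single text field.
tagline_text <- tapply(
  trimws(raw[is_tagline]),
  movie_id[is_tagline],
  paste, collapse = " "
)

# Keep only movies that have at least one tagline.
keep <- as.integer(names(tagline_text))
movies <- data.frame(
  title = title[keep],
  year = year[keep],
  text = as.vector(tagline_text),
  stringsAsFactors = FALSE
)
print(movies)
```

Keeping the year as its own column makes it straightforward to group taglines by year or decade later on.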
Packages Used
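library(tm)            # text mining: corpus and term-document matrix
library(stringr)       # string manipulation
library(grid)          # low-level graphics layout
library(ggplot2)       # plotting
library(latticeExtra)  # horizon plots
library(cluster)       # cluster analysis
library(proxy)         # distance and similarity measures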
Visualization Using a Histogram
• To determine the range of years to consider in our study, we look at the distribution of release
dates in the movie taglines data. The histogram shows more than one hundred movies a year
from the mid-1970s through 2013 and more than one thousand movies a year from 2003 to 2013.
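A histogram of release years like the one described can be sketched with ggplot2. The data below are randomly generated placeholders, and `min_count` is an illustrative threshold rather than the cutoff used in the study.

```r
library(ggplot2)

# Hypothetical release years; in practice these would come from the
# parsed tagline data.
set.seed(1)
movies <- data.frame(
  year = sample(1974:2013, size = 500, replace = TRUE)
)

# Count movies per year to judge which years have enough data.
per_year <- table(movies$year)
min_count <- 5  # illustrative threshold, not the study's cutoff
usable_years <- as.integer(names(per_year)[per_year >= min_count])
print(range(usable_years))

# One bar per release year.
p <- ggplot(movies, aes(x = year)) +
  geom_histogram(binwidth = 1, fill = "grey40", colour = "white") +
  labs(x = "Release year", y = "Number of movies")
print(p)
```

Years with too few movies give unstable word counts, which is why the distribution is checked before choosing the study period.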
Understanding the Trend Using a Plot
• We use a horizon plot to visualize the text measures over time.
• We identify five common groups, or clusters, of words. These define the text measures that we
call LOVED, WORLDS, TRUTH, LIFE, and STORY.
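One way such text measures and a horizon plot could be produced is sketched below. The slides do not say how the clusters were formed, so hierarchical clustering with `hclust` is an assumed stand-in. The word list and counts are randomly generated, and the clusters found here will not correspond to the five named measures.

```r
library(latticeExtra)

# Hypothetical term-by-year counts (rows: words, columns: years).
set.seed(42)
years <- 1990:2009
terms <- c("love", "loved", "heart", "world", "worlds", "earth",
           "truth", "true", "real", "life", "live", "story")
counts <- matrix(rpois(length(terms) * length(years), lambda = 5),
                 nrow = length(terms),
                 dimnames = list(terms, as.character(years)))

# Group words whose yearly profiles are similar.
# Hierarchical clustering is an assumed choice, not the slides' method.
term_dist <- dist(scale(counts))
groups <- cutree(hclust(term_dist, method = "ward.D2"), k = 5)

# Text measure: total count of each cluster's words per year,
# standardized so the five measures share a common scale.
measures <- sapply(1:5, function(g) {
  colSums(counts[groups == g, , drop = FALSE])
})
colnames(measures) <- paste("cluster", 1:5)
measures <- scale(measures)

# Horizon plot: one panel per text measure over time.
measure_ts <- ts(measures, start = min(years))
print(horizonplot(measure_ts, layout = c(1, 5), colorkey = TRUE))
```

In a real analysis, each cluster would be named after a representative word, which is presumably how labels such as LOVED or TRUTH arise.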
Interpretation/Evaluation
• Story-based movies have been produced more, with fluctuations.
• Autobiographical movies have been produced more since 2000, with a prominent increase from
2000 to 2010.
• Non-fiction movies have been produced more, with a prominent increase from 1998 to 2010.
• Movies about natural geography, wildlife, and the world have been produced more, with
fluctuations: somewhat up and down until 1980, up in 1989, down until 2002, and increasing in 2010.
• Love-story movies have fluctuated considerably: up until 1979, down in 1980-81, high in 1982,
low until 1986, then high, mostly low until 2003, high until 2005, then low, then trending high in 2010.
Conclusion
• Based on our analysis, the current trend is toward non-fiction movies.
• As movie production is directly proportional to revenue, producers should prefer to invest in
non-fiction movies: the “Truth” category is considerably higher, and overall movie production
has also increased since 2000.
• According to our analysis, producers can earn higher revenue by making non-fiction movies.