TEXT ANALYTICS
Analysis of reviews fetched from IMDB for The Hobbit series
Submitted by:
Amrapalli Karan
Kamalika Some
Krishanu Mukherjee
Somenath Sit
Objective
• Web crawling of IMDB reviews for the three films of The Hobbit trilogy
• Creation of a term-document matrix (TDM) and word clouds
• Dimension reduction using Latent Semantic Analysis (LSA)
• Identification of the words influencing the ratings
• Comparison of the sentiments expressed in the reviews with the ratings given
5/2/2016 2
Web Crawling
Cleaning of TDM
• Filtered TDM: built the TDM against a dictionary of common English words.
• Final TDM: excluded a few common but uninformative words such as "hobbit", "film", "movie" and "movies".
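The two cleaning steps can be sketched in base R as follows. Everything here is hypothetical: the review texts, the short dictionary standing in for the common-English-words list and the `tokenize()` helper are placeholders, and the authors' actual code is not shown on the slides.

```r
# Hypothetical reviews; placeholders, not the scraped IMDB data
reviews <- c(
  "The Hobbit film has a great story and a good cast",
  "This movie is too long and the story is thin",
  "Great battles but the film drags"
)
# Stand-in for the "dictionary with common English words"
dictionary <- c("hobbit", "film", "movie", "movies", "great", "story",
                "good", "cast", "long", "thin", "battles", "drags")
# Common but uninformative words excluded from the final TDM
excluded <- c("hobbit", "film", "movie", "movies")

# Lower-case, replace non-letters with spaces and split into words
tokenize <- function(txt) {
  words <- strsplit(gsub("[^a-z ]", " ", tolower(txt)), "\\s+")[[1]]
  words[words != ""]
}
tokens <- lapply(reviews, tokenize)

# Filtered TDM: count only the terms that occur in the dictionary
terms <- sort(intersect(unique(unlist(tokens)), dictionary))
tdm <- sapply(tokens, function(w) as.vector(table(factor(w, levels = terms))))
dimnames(tdm) <- list(terms, paste0("review", seq_along(reviews)))

# Final TDM: drop the excluded words
final_tdm <- tdm[!rownames(tdm) %in% excluded, , drop = FALSE]
print(final_tdm)
```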
Dominating Words in TDM
[Figure: word clouds for Hobbit 2012, Hobbit 2013 and Hobbit 2014]
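A word cloud sizes each term by its frequency in the TDM. A minimal sketch of ranking terms by total count, assuming a small made-up matrix and a hypothetical `top_terms()` helper:

```r
# Hypothetical filtered TDM for one film (terms x reviews); counts are made up
tdm <- matrix(c(3, 1, 0,
                2, 2, 1,
                0, 1, 0,
                1, 0, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("story", "book", "like", "cast"),
                              paste0("review", 1:3)))

# Total frequency of each term across reviews, most frequent first
top_terms <- function(tdm, n = 10) {
  head(sort(rowSums(tdm), decreasing = TRUE), n)
}

freq <- top_terms(tdm, n = 3)
print(freq)
# A word-cloud function such as wordcloud::wordcloud(names(freq), freq)
# would size each word by these counts (package not loaded here)
```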
Dimension Reduction using LSA
[Figure: the TK, DK and SK matrices]
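TK, SK and DK are the factors of a truncated singular value decomposition of the TDM, X ≈ TK %*% diag(SK) %*% t(DK). A base-R sketch using `svd()` on a made-up 5 × 4 TDM; the choice k = 2 is an assumption. The names match the `tk`, `sk` and `dk` components returned by the `lsa` package's `lsa()` function, but the slides do not say which package was used.

```r
# Hypothetical TDM: 5 terms x 4 reviews (made-up counts)
X <- matrix(c(2, 0, 1, 0,
              1, 1, 0, 0,
              0, 2, 0, 1,
              0, 0, 3, 1,
              1, 0, 0, 2),
            nrow = 5, byrow = TRUE,
            dimnames = list(c("story", "book", "great", "bad", "long"),
                            paste0("review", 1:4)))

k <- 2        # number of latent dimensions to keep (an assumption)
s <- svd(X)   # X = u %*% diag(d) %*% t(v), singular values d in decreasing order

TK <- s$u[, 1:k, drop = FALSE]   # term vectors (terms x k)
SK <- s$d[1:k]                   # largest k singular values
DK <- s$v[, 1:k, drop = FALSE]   # document vectors (documents x k)
rownames(TK) <- rownames(X)
rownames(DK) <- colnames(X)

# Rank-k approximation of the TDM; nrow = k keeps diag() correct when k = 1
X_k <- TK %*% diag(SK, nrow = k) %*% t(DK)
print(round(X_k, 2))
```

By the Eckart–Young theorem, the Frobenius error of this rank-k approximation equals the square root of the sum of the discarded squared singular values.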
Important Variables in TK matrix
Built a model with “satisfaction” as the response variable to find out which variables have the most power in predicting the “Ratings”:
satisfaction=ifelse(Ratings<5,"Dissatisfied",
ifelse(Ratings<7,"Satisfied","Impressed!"))
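The slides do not say which model produced the variable importance. As a swapped-in, deliberately simple stand-in, this sketch scores each term by the one-way ANOVA F statistic of its counts across the three satisfaction classes. The ratings, the term counts and the `f_stat()` helper are hypothetical.

```r
# Hypothetical ratings for six reviews and their term counts (made-up data)
Ratings <- c(2, 4, 6, 5, 8, 9)
term_counts <- matrix(c(0, 0, 1, 1, 3, 2,    # "great"
                        2, 3, 0, 1, 0, 0,    # "bad"
                        1, 1, 1, 1, 1, 2),   # "story"
                      ncol = 3,
                      dimnames = list(NULL, c("great", "bad", "story")))

# Same three-level response as on the slide
satisfaction <- factor(ifelse(Ratings < 5, "Dissatisfied",
                       ifelse(Ratings < 7, "Satisfied", "Impressed!")))

# Stand-in importance score: one-way ANOVA F statistic of each term's counts
# across the satisfaction classes (higher = separates the classes better)
f_stat <- function(x) anova(lm(x ~ satisfaction))[["F value"]][1]
importance <- sort(apply(term_counts, 2, f_stat), decreasing = TRUE)
print(round(importance, 2))
```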
Dimension Reduction using LSA
• Plotted variable importance as a scree plot.
• Took the optimal number of variables (documents) to filter the DK matrix.
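Reading the cut-off off a scree plot is usually done by eye. The sketch below replaces that with an assumed rule: keep the smallest number of top-ranked variables whose importance scores cover 80% of the total. The scores and the `choose_k()` helper are made up for illustration.

```r
# Hypothetical importance scores for five variables (made-up values)
importance <- c(story = 19, book = 12, like = 6, long = 2, cast = 1)

# Smallest number of top-ranked variables whose scores cover `share` of the
# total; an assumed rule in place of eyeballing the scree-plot elbow
choose_k <- function(values, share = 0.8) {
  v <- sort(values, decreasing = TRUE)
  unname(which(cumsum(v) / sum(v) >= share)[1])
}

ranked <- sort(importance, decreasing = TRUE)
k <- choose_k(importance)
keep <- names(ranked)[seq_len(k)]
print(keep)
# plot(ranked, type = "b") would draw the scree plot itself
```

If the columns of the DK matrix were named after these variables (an assumption), `DK[, keep, drop = FALSE]` would give the filtered matrix.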
DK matrix with important variables
For Hobbit-2012
• Words such as "story", "book" and "like" have deciding power in the “Ratings”.
• People talked more about the book and the storyline.
For Hobbit-2013
• Words such as "series", "good", "great" and "story" have deciding power in the “Ratings”.
• In 2013, too, viewers were impressed only by the story, the battles, etc.
For Hobbit-2014
• Besides "like" and "good", words such as "bad" and "story" also have deciding power in the “Ratings”.
• Along with the positive words, some negative words were used here.
• "Story" and "book" have less influence than in the previous films.
Sentiment Analysis
• A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level: whether the opinion expressed in a document, a sentence or an aspect is positive, negative, or neutral.
• We performed sentiment analysis (polarity) on the movie reviews of The Hobbit and its sequels.
• We used R for the analysis.
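A minimal lexicon-based polarity scorer in base R, assuming a tiny hypothetical word list and scaling the summed scores by the square root of the word count (also an assumption). The slides say only that R was used, not which package or lexicon, so this is not the authors' scorer.

```r
# Tiny hypothetical lexicon: +1 for positive words, -1 for negative words
lexicon <- c(good = 1, great = 1, impressed = 1,
             bad = -1, boring = -1, long = -1, disappointing = -1)

# Sum the lexicon scores of a review's words, scaled by sqrt(word count)
polarity <- function(text) {
  words <- strsplit(gsub("[^a-z ]", " ", tolower(text)), "\\s+")[[1]]
  words <- words[words != ""]
  if (length(words) == 0) return(0)
  hits <- lexicon[words[words %in% names(lexicon)]]
  sum(hits) / sqrt(length(words))
}

reviews <- c("Great story, good cast", "Too long and boring", "It is a film")
scores <- vapply(reviews, polarity, numeric(1), USE.NAMES = FALSE)
label <- ifelse(scores > 0, "positive", ifelse(scores < 0, "negative", "neutral"))
print(data.frame(reviews, scores, label))
```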
Average Ratings
The Hobbit: An Unexpected Journey (2012): 3.65
The Hobbit: The Desolation of Smaug (2013): 2.52
The Hobbit: The Battle of the Five Armies (2014): 4.13
Key Findings
• For The Hobbit: An Unexpected Journey (2012), 97% of the reviews with negative ratings had negative polarity, while 78% of the reviews with positive ratings had positive polarity.
• 76.5% of the ratings were negative.
• The average polarity of the reviews was
(0.012).
Key Findings (contd.)
• For The Hobbit: The Desolation of Smaug (2013), 85% of the reviews with negative ratings had negative polarity, while 60% of the reviews with positive ratings had positive polarity.
• 89% of the ratings were negative.
• The average polarity of the reviews was
(0.004).
Key Findings (contd.)
• For The Hobbit: The Battle of the Five Armies (2014), only 3% of the reviews with negative ratings had negative polarity, while 100% of the reviews with positive ratings had positive polarity.
• 68% of the ratings were negative.
• The average polarity of the reviews was
(0.005).
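The percentages in the key findings are conditional shares: among reviews whose rating is negative, the share whose polarity is also negative, and likewise for positive ratings. A sketch on five made-up reviews; the slides do not state how ratings were labelled negative or positive, so the labels are simply taken as given.

```r
# Hypothetical per-review data for one film: rating class and review polarity
rating_class <- c("negative", "negative", "negative", "positive", "positive")
polarity     <- c(-0.20, -0.05, 0.10, 0.30, -0.10)

neg <- rating_class == "negative"
pos <- rating_class == "positive"
pct <- function(x) round(100 * mean(x), 1)

agree_neg <- pct(polarity[neg] < 0)  # % of negative ratings with negative polarity
agree_pos <- pct(polarity[pos] > 0)  # % of positive ratings with positive polarity
share_neg <- pct(neg)                # % of all ratings that were negative
avg_pol   <- mean(polarity)          # average polarity of all reviews

# Cross-tabulation of rating class against the sign of the review polarity
print(table(rating = rating_class, polarity = sign(polarity)))
```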
Polarity Comparison - Region wise
[Figure: region-wise polarity for Hobbit 2012, Hobbit 2013 and Hobbit 2014]
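A region-wise comparison amounts to averaging polarity by film and region. This sketch assumes each review carries a region (for example taken from the reviewer's location), which the slides do not describe; the data are hypothetical.

```r
# Hypothetical reviews with the reviewer's region and a polarity score
reviews <- data.frame(
  film     = c("2012", "2012", "2013", "2013", "2014", "2014"),
  region   = c("USA", "UK", "USA", "India", "UK", "India"),
  polarity = c(0.10, -0.20, -0.05, 0.15, 0.30, -0.10)
)

# Mean polarity for each region (rows) and film (columns); NA = no reviews
region_polarity <- with(reviews, tapply(polarity, list(region, film), mean))
print(round(region_polarity, 2))
```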