Comprehensive sentiment data
mining, analysis and visualization to improve
Prashast Kumar Singh
Making sense of data to help consumer-centric companies,
governments, etc. make decisions that improve their products.
Imagine you’re a mobile manufacturer, say HTC. You just
launched your flagship phone, the HTC X, which boomed
on the Internet with users posting their reactions about
But is there any way you could actually go through
hundreds of thousands of those reviews individually, and
use that data for your organization at all?
Is it possible to analyze the sentiment users expressed in
what they wrote about the HTC X?
How about analyzing not just the positive/negative
quotient in the posts, but also getting summarized
feedback on what users liked the most, and disliked the
most, about the HTC X?
The government wants to reach hundreds of thousands of
people and analyze their views. How can it connect directly with people to
answer questions like:
Government wants to know how the people are reacting to a new
•What parts of the policy do the voters like? (Example: tax cuts)
•What parts of the policy need to be changed or modified?
Getting feedback on proposed laws
•What do the people think about a proposed law?
•How can the proposal be improved?
•Analyse the negative comments.
Our approach towards a Comprehensive
sentiment analysis and
Break up a review into sentences, and parse each
sentence using the rules of English grammar.
Identify the various relationships (dependencies)
existing between all pairs of words.
Filter the relevant relationships and make a list of
relevant nouns and adjectives.
Assign scores using a self-learning scoring algorithm.
Use the generated data structures to visualize data
to provide answers to businesses’ questions.
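The first step above, breaking a review into sentences, can be sketched as follows. This is a minimal illustration, assuming a naive punctuation-based splitter; the function name split_sentences and the sample review are hypothetical, and a real system would more likely rely on the tokenizer that ships with its parser.

```python
import re
from typing import List


def split_sentences(review: str) -> List[str]:
    """Naively split a review on '.', '!' or '?' followed by whitespace.

    Assumption: this ignores abbreviations ("e.g.") and similar edge cases.
    """
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    return [part for part in parts if part]


# Hypothetical sample review, written for illustration only.
review = "The camera is great. Battery life is poor! Would I buy it again?"
sentences = split_sentences(review)
for sentence in sentences:
    print(sentence)
```

Each resulting sentence would then be handed to the grammar parser in the next step.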
Parsing is the process of assigning structural descriptions
to sequences of words in a natural language.
The Stanford typed dependencies representation was designed
to provide a simple description of the grammatical relationships
in a sentence that can easily be understood.
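To make the filtering step concrete, here is a small sketch of how noun and adjective pairs might be pulled out of typed dependencies. The triples are written by hand in the Stanford style (for example amod(battery, poor) and nsubj(great, camera)) rather than produced by a parser, and the function name noun_adjective_pairs and the adjectives set (standing in for part-of-speech tags) are assumptions made for illustration.

```python
from typing import List, Set, Tuple

# A typed dependency as (relation, governor, dependent).
Dependency = Tuple[str, str, str]


def noun_adjective_pairs(
    deps: List[Dependency], adjectives: Set[str]
) -> List[Tuple[str, str]]:
    """Return (noun, adjective) pairs found via amod and nsubj relations."""
    pairs = []
    for relation, governor, dependent in deps:
        if relation == "amod" and dependent in adjectives:
            # amod(battery, poor): the adjective directly modifies the noun.
            pairs.append((governor, dependent))
        elif relation == "nsubj" and governor in adjectives:
            # nsubj(great, camera): the noun is the subject of a predicate adjective.
            pairs.append((dependent, governor))
    return pairs


# Hand-written dependencies for "The camera is great." and "It has a poor battery."
deps = [
    ("det", "camera", "The"),
    ("nsubj", "great", "camera"),
    ("cop", "great", "is"),
    ("nsubj", "has", "It"),
    ("det", "battery", "a"),
    ("amod", "battery", "poor"),
    ("dobj", "has", "battery"),
]
pairs = noun_adjective_pairs(deps, adjectives={"great", "poor"})
print(pairs)
```

Relations such as det and cop are dropped because they carry no opinion about a component.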
We find the scores of the Adjectives present using the
SentiWordNet API. These scores are then assigned to the
corresponding Nouns and stored in Guava structures.
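The scoring step might look roughly like the sketch below. It is not the SentiWordNet API: LEXICON is a tiny hypothetical stand-in with made-up (positive, negative) values, the score is assumed to be positive minus negative, and a defaultdict of lists plays the role that a Guava multimap could play in a Java implementation.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a sentiment lexicon: adjective -> (positive, negative).
# The values are invented for illustration, not taken from SentiWordNet.
LEXICON: Dict[str, Tuple[float, float]] = {
    "great": (0.75, 0.0),
    "poor": (0.0, 0.625),
}


def adjective_score(adjective: str) -> Optional[float]:
    """Return positive minus negative score, or None for unknown adjectives."""
    entry = LEXICON.get(adjective.lower())
    if entry is None:
        return None
    positive, negative = entry
    return positive - negative


def score_nouns(pairs: List[Tuple[str, str]]) -> Dict[str, List[float]]:
    """Attach each adjective's score to the noun it describes."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for noun, adjective in pairs:
        score = adjective_score(adjective)
        if score is not None:  # skip adjectives missing from the lexicon
            scores[noun.lower()].append(score)
    return scores


noun_scores = score_nouns(
    [("camera", "great"), ("battery", "poor"), ("screen", "shiny")]
)
print(dict(noun_scores))
```

Keeping a list of scores per noun, rather than a running total, leaves the raw data available for the different charts.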
Intuitive 2D and 3D visualizations of every aspect of data,
mapping changes in sentiments about your brand,
demographics and other analytics
A few challenges:
Analyzing sentiment in data is a very
complex task for a machine because of the
many, often unpredictable soft and hard
variables that come into play when
interpreting it. The main problem is that
the sentiment of a sentence only rarely lies in
the sentence itself and is instead rooted in the
cultural context around that sentence.
This requires the algorithm to compute a vast
amount of densely interconnected information
to answer a fairly simple question in human
terms. Just a few keywords taken separately
won’t do the job.
Take a statement like: The Government is wrong in its
decision because it is a racist one.
We need to consider a lot of combinations
together to figure out WHY people think the
decision is wrong.
[Workflow diagram: Retrieve data from various social media; Load the collected data into a database; Intuitive 2D and 3D visualizations of every aspect of data; Share of Voice]
How does STARK attempt to answer a few generic scenarios?
Company A: Can you summarize what the user said about my
product, in specific detail?
STARK shows the summary of the reviews
Company A: We incorporated a new kind of camera
with a super-fast zoom. How strongly did the user talk
about the camera?
STARK processes the reviews and generates the following
meter graph for CAMERA. The meter graph shows that
the user has responded positively to the quality of
Company A: Overall, how strongly did he express his views
about my product?
STARK shows the mean sentiment distribution of the various
components, i.e., the aggregated mean sentiment shown by
all users towards each component.
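As a rough sketch of how such a mean could be computed, assuming the scores have already been grouped per component as lists of floats (the function name mean_sentiment and the sample numbers are invented for illustration, not STARK's actual output):

```python
from statistics import mean
from typing import Dict, List


def mean_sentiment(noun_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Average all scores recorded for each component, skipping empty ones."""
    return {
        component: mean(scores)
        for component, scores in noun_scores.items()
        if scores
    }


# Invented sample scores pooled across several users' reviews.
means = mean_sentiment(
    {
        "camera": [0.75, 0.5, -0.25],
        "battery": [-0.625, -0.375],
        "screen": [],
    }
)
for component, value in means.items():
    print(f"{component}: {value:+.3f}")
```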
Company A: Since we had many new things in our product
this time, I'd like to know which feature he talked
about the most.
STARK shows the percentage distribution of various
components in the review. It gives an overview of the
components that are being talked about and to
Company A: I still need one more detail. Did he talk about the
camera only positively, only negatively, or both? How
many times positively and how many times negatively?
STARK shows the sentiment distribution of the various
components. Sentiment distribution means the sentiment
shown by the user towards each component.
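Both the positive/negative breakdown and the share of mentions per component can be derived from the same per-component score lists. The sketch below assumes that a score above zero counts as a positive mention and below zero as a negative one; the function name sentiment_breakdown and the sample data are hypothetical.

```python
from typing import Dict, List


def sentiment_breakdown(noun_scores: Dict[str, List[float]]) -> Dict[str, dict]:
    """Count positive and negative mentions and the share of all mentions."""
    total = sum(len(scores) for scores in noun_scores.values())
    breakdown = {}
    for component, scores in noun_scores.items():
        breakdown[component] = {
            "positive": sum(1 for s in scores if s > 0),
            "negative": sum(1 for s in scores if s < 0),
            # Percentage of all scored mentions that concern this component.
            "share_percent": 100 * len(scores) / total if total else 0.0,
        }
    return breakdown


# Invented sample scores for illustration.
breakdown = sentiment_breakdown(
    {"camera": [0.75, 0.5, -0.25], "battery": [-0.625, -0.375]}
)
for component, stats in breakdown.items():
    print(component, stats)
```

A score of exactly zero is counted as neither positive nor negative, but it still contributes to the component's share of mentions.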
Company A: Could you finally quantify the scores assigned to
STARK shows the scatter plot and line graph of all the
Cheers to BIG DATA in a SMALL WORLD
•Prashast Kumar Singh