2. About Our Project
o With the evolution of social media, customers have started sharing their experiences and opinions of
purchased products (especially cars) across social networking sites, online portals,
blogs, consumer forums, and feedback forums.
o Such content is generally text, often called unstructured data, and it contains hidden
information and insights. It tells us a lot about a car's appeal, driving comfort, features, etc.,
which can be categorized by brand.
o The main objective of our project is therefore to extract meaningful quantitative insights from such
unstructured data for three newly purchased car brands.
o With unsupervised machine learning and appropriate data mining tasks,
the analyzed opinions help businesses understand customers' concerns, which in turn helps them
draw up actionable plans to satisfy their customers before they move to competitors.
2SWAGAT RANJAN BEHERA
3. Data Loading/Reading
o The data comprises more than 250 opinions expressed in tweets by
new purchasers of three cars: Chevrolet
Spark, Ford Fiesta, and Fiat Punto.
o Because the tweets are unstructured text, they were read
into the R environment using Natural Language Processing (NLP)
tools and text mining packages, and converted
to R data format for all further analysis.
o Once the tweet conversations were read into the tool, they were
arranged so that patterns could be extracted and categorized.
The ggplot library was used for reporting
analytical insights.
4. Data Preparation
o Because the text is unstructured, we prepared it
for modeling using the String to Word
Vector filter.
o We converted all words to lower case,
removed hashtags, other special
characters, and stop words, and set a minimum
term frequency, as shown in the adjacent figure.
o Even then, a few unwanted characters and
words remained in the data; these were removed
manually to make sure the data was ready for the next step,
modeling.
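The preparation steps above can be sketched in a few lines. This is a minimal Python illustration, not the project's R or filter-based pipeline: the stop-word list, the sample tweets, the function names, and the choice to drop the whole hashtag (symbol and tag word) are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real list is much longer).
STOP_WORDS = {"the", "a", "an", "is", "it", "my", "and", "to", "of", "in", "for", "i"}

def clean_tweet(text):
    """Lowercase a tweet, drop hashtags and special characters, remove stop words."""
    text = text.lower()
    text = re.sub(r"#\w+", " ", text)      # remove hashtags (symbol and tag word)
    text = re.sub(r"[^a-z\s]", " ", text)  # remove digits, punctuation, other symbols
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def build_vocabulary(tweets, min_freq=2):
    """Keep only terms that occur at least min_freq times across all tweets."""
    counts = Counter(tok for tweet in tweets for tok in clean_tweet(tweet))
    return {term for term, n in counts.items() if n >= min_freq}

# Made-up tweets, for illustration only.
sample_tweets = [
    "Loving my new Fiat Punto!!! #newcar",
    "The Punto looks sporty, loving it",
    "Ford Fiesta mileage is poor :(",
]
vocabulary = build_vocabulary(sample_tweets, min_freq=2)
```

With these three made-up tweets, only terms that appear at least twice survive the minimum-frequency cut. Raising `min_freq` shrinks the vocabulary further, which is the trade-off the minimum term frequency setting controls.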
Tweet mining involves extracting the
opinions expressed by users on new
car brands and indexing their sentiments:
1) Positive Comments
2) Negative Comments
3) Neutral Comments
Our analytical solution tracks tweet
information, enabling businesses to obtain
richer opinions and to make more precise
business decisions and customer
targeting than the
traditional, infrequent survey approach.
5. Modelling Approach
Sentiment Analysis using Text Mining
Capturing sentiment from customers' tweets:
o Build a keyword dictionary of positive and negative keywords
o Associate scores with keywords
o Develop a metric from keyword frequencies
The outcome contains:
o Customer tweet ID and customer sentiment score on a scale
from -2 to +5.
o A dashboard depicting the overall sentiment across the brands.
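The dictionary-based scoring described above can be illustrated with a short Python sketch. The keyword list, the individual keyword scores, the sample tweets, and the clamping of the summed score to the -2 to +5 range are all assumptions; the source gives only the scale and the three sentiment classes, not the dictionary or the exact metric.

```python
import re

# Hypothetical keyword scores; the source does not list its dictionary.
KEYWORD_SCORES = {
    "love": 2, "great": 2, "sporty": 1, "comfortable": 1,
    "poor": -1, "noisy": -1, "terrible": -2,
}
SCORE_MIN, SCORE_MAX = -2, 5  # scale stated in the text

def score_tweet(text):
    """Sum the scores of dictionary keywords found in the tweet, then clamp."""
    tokens = re.findall(r"[a-z]+", text.lower())
    raw = sum(KEYWORD_SCORES.get(tok, 0) for tok in tokens)
    # Clamping to the stated scale is an assumption, not the source's method.
    return max(SCORE_MIN, min(SCORE_MAX, raw))

def label(score):
    """Map a numeric score to one of the three sentiment classes."""
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

# Made-up tweets keyed by a hypothetical tweet ID.
tweets = {
    "t1": "Love the sporty look, great drive",
    "t2": "Terrible service and noisy cabin, poor mileage",
    "t3": "Picked up the car today",
}
results = {tweet_id: score_tweet(text) for tweet_id, text in tweets.items()}
labels = {tweet_id: label(score) for tweet_id, score in results.items()}
```

Each keyword counts once per occurrence, so a keyword repeated in a tweet moves the score further; that is one simple way to build a metric from keyword frequencies.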
6. Model Results
Sentiment insights scored for three popular hatchback
car brands are shown below:
Brand             Positive score   Sentiment
Chevrolet Spark   17%              Slightly positive
Fiat Punto        40%              Highly positive
Ford Fiesta       9%               Not much positive sentiment
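The source does not state how the per-brand percentages were computed. As one possible reading, the Python sketch below assumes the figure is the share of a brand's tweets with a score above zero; the function name, the placeholder brand names, and the scores are hypothetical and are not the project's data.

```python
from collections import defaultdict

def positive_share_by_brand(scored_tweets):
    """Return, per brand, the percentage of tweets whose score is above zero."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for brand, score in scored_tweets:
        totals[brand] += 1
        if score > 0:
            positives[brand] += 1
    return {brand: round(100 * positives[brand] / totals[brand]) for brand in totals}

# Made-up (brand, score) pairs, for illustration only.
scored = [
    ("Brand A", 2), ("Brand A", 0), ("Brand A", -1), ("Brand A", 1),
    ("Brand B", 0), ("Brand B", 3), ("Brand B", -2), ("Brand B", 0), ("Brand B", 0),
]
shares = positive_share_by_brand(scored)
```

A dashboard could then plot one bar per brand from `shares`. Other definitions, such as the mean score per brand, would give different numbers.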
7. Business Benefits
Key takeaways for businesses:
o Fiat Punto draws positive sentiment for its look and feel and its sporty features, with a 40% positive
score; in coming editions, the business can therefore target the younger generation more to
increase its sales.
o Although Chevrolet Spark scores better than Ford Fiesta, both brands need to focus
their efforts on gaining market attention through sporty and look-and-feel features. Further,
they can consider how to improve their market among other generations, such as millennials and
baby boomers.