TEXT ANALYSIS TECHNIQUES TO
ANALYZE REVIEWS OF SAMSUNG
GALAXY MEGA 5.8 I9152
SUBMITTED BY
KOUSHIK RAKSHIT
ROLL NO:-A14034
CONTENTS
1. Introduction------------------------------------------------------------------------ 3
2. Problem Statement---------------------------------------------------------------- 3
3. Key features------------------------------------------------------------------------ 3
4. Research Design------------------------------------------------------------------- 4
5. Research Methodology---------------------------------------------------------- 4
A. Insights from Web Crawling & Word Cloud--------------------------5
B. Latent Semantic Analysis (LSA) and Cluster Analysis--------------7
C. Reviews Ratings Analysis-----------------------------------------------14
D. Classification using Support Vector Machine(SVM)---------------14
E. Reviews Sentiment Analysis-------------------------------------------16
6. Business Perspective------------------------------------------------------------18
7. Appendix-------------------------------------------------------------------------19
1. INTRODUCTION:- TEXT ANALYTICS
Text mining, also referred to as text data mining and roughly equivalent to text analytics,
refers to the process of deriving high-quality information from text. High-quality
information is typically derived by discerning patterns and trends through
means such as statistical pattern learning.
2. PROBLEM STATEMENT
ANALYZING REVIEWS FOR SAMSUNG GALAXY MEGA 5.8 I9152 (BLACK, WITH
BLACK)
Reviews for the Samsung Galaxy Mega 5.8 I9152 (at least 100 reviews)
were downloaded from flipkart.com, and a thorough analysis using text
analysis techniques was carried out.
3. KEY FEATURES OF SAMSUNG GALAXY MEGA 5.8 I9152
 Wi-Fi Enabled
 Expandable Storage Capacity of 64 GB
 5.8-inch TFT Capacitive Touchscreen
 Android v4.2.2 (Jelly Bean) OS
 8 MP Primary Camera
 1.9 MP Secondary Camera
 1.4 GHz Dual Core Processor
 Full HD Recording
4. RESEARCH DESIGN
 To analyze users' responses, we had to collect primary and secondary
information from users' mobile reviews on the website http://www.flipkart.com.
 To analyze users' perception about the phone, we took 100 reviews from
the review section on Flipkart.
5. RESEARCH METHODOLOGY
To analyze the user reviews, the following research analysis procedures were undertaken:
A. Web Crawling & Word Cloud
B. Latent Semantic Analysis (LSA) and Clustering Analysis
C. Rating Analysis
D. Classification Analysis using Support Vector Machine(SVM)
E. Reviews Sentiment Analysis
A. INSIGHTS FROM WEB CRAWLING & WORD CLOUD
A tag cloud (word cloud, or weighted list in visual design) is a visual representation
for text data, typically used to depict keyword metadata (tags) on websites, or to
visualize free form text. Tags are usually single words, and the importance of each tag
is shown with font size or color. This format is useful for quickly perceiving the most
prominent terms and for locating a term alphabetically to determine its relative
prominence. When used as website navigation aids, the terms are hyperlinked to items
associated with the tag.
R packages used for the word cloud: RCurl, XML, rvest, wordcloud, tm
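The frequency counting that underlies a word cloud is simple: clean the text, drop stopwords, count terms. The report does this in R with the tm and wordcloud packages; the sketch below shows the same idea in Python (the toy reviews and the tiny stopword list are both illustrative):

```python
import re
from collections import Counter

def word_frequencies(reviews, stopwords=frozenset({"the", "a", "is", "and", "it"})):
    """Lower-case, keep alphabetic words, drop stopwords, count the rest."""
    counts = Counter()
    for review in reviews:
        for word in re.findall(r"[a-z]+", review.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts

reviews = ["Good phone, great screen!", "Screen is good, battery average."]
freq = word_frequencies(reviews)
print(freq.most_common(2))  # [('good', 2), ('screen', 2)]
```

The resulting frequency table is exactly what wordcloud() consumes: the count of each word sets its font size in the cloud.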
1. Fetching reviews from FLIPKART.COM
FLIPKART<-"http://www.flipkart.com/samsung-galaxy-mega-5-8-i9152/product-reviews/ITMEYFRTWAXZXTUT?pid=MOBDZSDJAPQXGAWN&type=all"
2. Word Cloud Creation
wordcloud(d$words,d$freq,max.words=300,colors=brewer.pal(8,"Dark2"),scale=c(3,0.5),random.order=F)
INFERENCE DRAWN:- The words that took prominence in this word cloud give a
clear idea that the mobile under discussion is well liked and may be known for its
screen size, display, camera and battery. But the cloud does not tell us whether the
product is worth buying or whether its users are satisfied. So, to gain more insight
into our data, we had to analyze the ratings (out of 5).
B. LATENT SEMANTIC ANALYSIS AND CLUSTER ANALYSIS:
In Latent Semantic Analysis we break the term-document matrix into 3 matrices:
 Word-Dimension Matrix
 Document-Dimension Matrix
 Diagonal Matrix of Singular Values
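The three-matrix decomposition described above is the singular value decomposition (SVD) of the term-document matrix; the report performs it in R with the lsa package. A minimal numerical check in Python (the toy matrix is invented):

```python
import numpy as np

# Toy term-document matrix: rows = words, columns = documents (counts invented).
tdm = np.array([
    [2.0, 0.0, 1.0],   # "good"
    [1.0, 1.0, 0.0],   # "screen"
    [0.0, 2.0, 1.0],   # "battery"
])

# SVD: tdm = U @ diag(s) @ Vt
# U    -> word-dimension matrix
# Vt.T -> document-dimension matrix
# s    -> singular values on the diagonal of the middle matrix
U, s, Vt = np.linalg.svd(tdm, full_matrices=False)
assert np.allclose(U @ np.diag(s) @ Vt, tdm)   # the product recovers the matrix
```

Truncating to the first few columns of U and Vt.T gives the low-dimensional word and document coordinates plotted in the next pages.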
Word-Dimension Matrix:-
PLOTTING X1 vs X2
Inferences:
 When we break the term-document matrix into the dimension-word vector space, it is
clearly visible that positive words like "good" and feature words like "screen", "battery"
etc. occur mostly along dimension-1.
 "Grand", "Mega" and "phone" occur mostly along dimension-2.
 "Display", "price", "money" and quality-related words occur more or less along both
dimensions.
PLOTTING X1 vs X3
Inferences:
 "Grand", "Mega" and "phone" occur mostly along dimension-1.
 Quality-related words occur more or less along both dimensions.
PLOTTING X2 vs X3
Inferences:
 "Grand" and "Mega" occur mostly along dimension-1.
 "Camera" and "samsung" occur more or less along both dimensions.
PLOTTING FOR DOCUMENT MATRIX
Inference:-
 Document nos. 71, 67 and 49 are close to dimension-1.
 Document no. 68 is close to dimension-2.
 HIERARCHICAL CLUSTERING TO DETERMINE THE OPTIMUM NO. OF
CLUSTERS
In data mining, hierarchical clustering (also called hierarchical cluster analysis or
HCA) is a method of cluster analysis which seeks to build a hierarchy of clusters.
Strategies for hierarchical clustering generally fall into two types:
 Agglomerative: This is a "bottom up" approach: each observation starts in its
own cluster, and pairs of clusters are merged as one moves up the hierarchy.
 Divisive: This is a "top down" approach: all observations start in one cluster,
and splits are performed recursively as one moves down the hierarchy.
In general, the merges and splits are determined in a greedy manner. The
results of hierarchical clustering are usually presented in a dendrogram.
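The agglomerative ("bottom up") strategy described above can be sketched in a few lines. The report does this in R (Ward linkage via hclust); below is a deliberately minimal single-linkage version in Python on invented 1-D points standing in for document coordinates:

```python
# Minimal single-linkage agglomerative clustering on illustrative 1-D points.
def agglomerative(points, n_clusters):
    clusters = [[p] for p in points]               # each point starts in its own cluster
    while len(clusters) > n_clusters:
        best = None                                # (distance, i, j) of closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]    # merge the closest pair ("bottom up")
        del clusters[j]
    return clusters

clusters = agglomerative([0.0, 0.1, 2.0, 2.1, 4.0], 3)
print(sorted(sorted(c) for c in clusters))  # [[0.0, 0.1], [2.0, 2.1], [4.0]]
```

Recording the distance at each merge is what produces the dendrogram: cutting the tree at a chosen height yields a candidate number of clusters.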
As per the plot, the optimum number of clusters could be 3, 4 or 5.
CLUSTER ANALYSIS:
 From the above LSA analysis we found 5 to be the optimum number of clusters,
which shows that there are 5 categories of reviews among the 100 reviews
in total.
 So for now we will concentrate on the 5 review clusters, which help club
together the different types of reviews from the users.
CLUSTER-1
 There are a total of 57 observations in this cluster.
 This cluster consists of words related to the price and the look of the phone.
CLUSTER-2
 There are 32 observations in this cluster.
 This cluster consists of words from reviews by customers who have had a good
experience with this mobile.
WORD CLOUD FOR CLUSTER-1
CLUSTER-3
 There are 38 observations in this cluster.
 This cluster consists of words from reviews by customers who have good faith in
the company.
WORD CLOUD FOR CLUSTER-2
CLUSTER-4
 There are only 2 observations in this cluster.
 This cluster does not throw any light on the nature of its reviews.
WORD CLOUD FOR CLUSTER-4
WORD CLOUD FOR CLUSTER-3
CLUSTER-5
 This cluster has 1449 observations.
 It consists of words related to product features & quality.
WORD CLOUD FOR CLUSTER-5
INFERENCE DRAWN FROM CLUSTERING
Apart from Cluster 1, the other clusters do not give sufficient information about the
customer base or type, and no constructive storyline can be carved out of
Clusters 2, 3, 4 and 5.
Cluster 1, on the other hand, reflects almost everything about the various features of
the phone that the customers might have liked.
C.REVIEWS RATINGS ANALYSIS
Total Reviews=100
Satisfied Reviews: 73
Dissatisfied Reviews: 27
Checking the ratings gives a better idea: most users are satisfied with this mobile.
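The split above amounts to parsing the "N stars" strings and counting ratings above a threshold, which is what the appendix code does in R with gsub(). A Python sketch of the same step (the example ratings are invented; the report's 100 real ratings gave 73 vs 27):

```python
import re

def satisfaction_split(ratings, threshold=3):
    """Turn '4 stars'-style strings into numbers; > threshold counts as satisfied."""
    values = [int(re.sub(r" star[s]?", "", r)) for r in ratings]
    satisfied = sum(v > threshold for v in values)
    return satisfied, len(values) - satisfied

print(satisfaction_split(["5 stars", "4 stars", "2 stars", "1 star", "5 stars"]))  # (3, 2)
```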
D. CLASSIFICATION USING SUPPORT VECTOR MACHINE(SVM)
In machine learning, support vector machines (SVMs, also support vector
networks) are supervised learning models with associated learning algorithms
that analyze data and recognize patterns, used for classification and regression
analysis. Given a set of training examples, each marked as belonging to one of
two categories, an SVM training algorithm builds a model that assigns new
examples into one category or the other, making it a non-probabilistic binary
linear classifier. An SVM model is a representation of the examples as points in
space, mapped so that the examples of the separate categories are divided by a
clear gap that is as wide as possible. New examples are then mapped into that
same space and predicted to belong to a category based on which side of the
gap they fall on.
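As a concrete illustration of the classifier just described, here is a tiny linear SVM trained by subgradient descent on the hinge loss. This is a simplification: the report trains its SVM in R with the e1071 package, and the two-feature review data below (counts of "good" and "bad" per review) is invented:

```python
# A tiny linear SVM trained by subgradient descent on the hinge loss.
def train_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """X: feature vectors, y: labels in {-1, +1}. Returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:   # inside the margin: hinge-loss subgradient step
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:            # outside the margin: only the regularizer acts
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def predict(w, b, x):
    """Side of the separating hyperplane: +1 or -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

X = [[2, 0], [1, 0], [0, 2], [0, 1]]   # counts of ("good", "bad") per review
y = [1, 1, -1, -1]                     # +1 = satisfied, -1 = dissatisfied
w, b = train_svm(X, y)
print(predict(w, b, [3, 0]), predict(w, b, [0, 3]))  # 1 -1
```

The learned weight vector plays the role of the word coefficients examined later in the report: a positive weight pushes a review toward one class, a negative weight toward the other.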
The generalization properties of an SVM do not depend on the dimensionality of
the space. You can bound the generalization error by a term depending on the
quotient of the radius of a ball that contains all the data and the margin realized on
that data, but not on the dimensionality of the space. Many extensions exist, but
the answer is essentially the same: The generalization does not depend on the
dimensionality.
An extended explanation is that you can generalize well even in high-dimensional
spaces because the data occupies only a low-dimensional subspace of the feature
space, and regularization results in the learner dealing only with that subspace.
You can see this for yourself by looking at the eigenvalues of the kernel matrix,
which typically decay quickly, meaning that you can project your data to a
low-dimensional subspace with negligible error.
So even if you have, for example, a Gaussian kernel, where the feature space is
infinite-dimensional, you are actually dealing with an essentially finite-dimensional
kernel feature space in which you are learning a linear decision function, which is
statistically tractable. Note that you still need to regularize, though.
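The quick eigenvalue decay claimed above is easy to verify numerically. A small sketch in Python with a Gaussian kernel on illustrative 1-D points (all values invented):

```python
import numpy as np

# Gaussian-kernel matrix for 20 illustrative 1-D points.
xs = np.linspace(0.0, 1.0, 20)
K = np.exp(-((xs[:, None] - xs[None, :]) ** 2) / 0.5)

# Eigenvalues in descending order: the spectrum decays quickly,
# so the data effectively occupies a low-dimensional subspace.
eigvals = np.linalg.eigvalsh(K)[::-1]
print(eigvals[:10].sum() / eigvals.sum() > 0.99)  # True
```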
From the above figure we can infer that, out of 100 data points, 95 contribute to the
formation of the marginal plane. In the coefficient plot, the 6 words displayed at the
head have negative coefficients and the 6 words displayed at the tail have positive
coefficients.
Using SVM, we have classified the reviews into two categories.
Since "dissatisfied" is the first level, the words with negative coefficients have a
positive impact, and vice versa.
Snapshot of Data Frame containing list of words & their
frequency count
E. REVIEWS SENTIMENT ANALYSIS
Sentiment essentially relates to feelings: attitudes, emotions and opinions.
Sentiment Analysis refers to the practice of applying Natural Language
Processing and Text Analysis techniques to identify and extract subjective
information from a piece of text. A person's opinions or feelings are for the
most part subjective, not facts, which means that accurately analyzing an
individual's opinion or mood from a piece of text can be extremely difficult.
With Sentiment Analysis from a text analytics point of view, we are essentially
looking to understand the attitude of a writer with respect to a topic in a piece
of text and its polarity: whether it is positive, negative or neutral.
In recent years there has been a steady increase in interest from brands,
companies and researchers in Sentiment Analysis and its application to
business analytics.
The business world today, as in many data analytics streams, is looking for
"business insight."
● Installed the 'qdap' package.
● We decided the threshold value of polarity to classify between satisfied
and dissatisfied on the basis of the plot on the next page.
● The tree plot was produced using the library "party".
● The output of the polarity check, i.e. the sentiment analysis, gives the clear
message that 65% of the buyers are satisfied with their purchase of the
mobile.
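The polarity-threshold rule described above can be sketched as follows. This is a deliberately crude lexicon scorer in Python: the report uses qdap's polarity() in R, the word lists and example reviews here are invented, and 0.385 mirrors the threshold chosen from the tree plot:

```python
# A crude lexicon-based polarity scorer; word lists and examples are invented.
POSITIVE = {"good", "great", "excellent", "awesome"}
NEGATIVE = {"bad", "poor", "worst", "slow"}

def polarity(review):
    """(positive hits - negative hits) / word count, in [-1, 1]."""
    words = review.lower().split()
    if not words:
        return 0.0
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return score / len(words)

def classify(review, threshold=0.385):
    """Same decision rule as the report: polarity above the threshold = satisfied."""
    return "Satisfied" if polarity(review) > threshold else "Dissatisfied"

print(classify("great phone"))            # Satisfied
print(classify("slow and poor battery"))  # Dissatisfied
```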
CONCLUSION & BUSINESS PERSPECTIVE
• The output of our text analytics techniques brings out the fact that the
Samsung Galaxy Mega 5.8 I9152 is a mobile worth buying.
• Most of the customers who bought it are extremely satisfied with the
various features it offers.
• Customer segmentation is possible, but a very clear classification is not,
as there are many features that are equally liked across clusters.
• Buyers can, of course, be classified in terms of their satisfaction level.
APPENDIX
CODES
#WEB CRAWLING & WORD CLOUD
#install.packages("RCurl")
library(RCurl)
#install.packages("XML")
library(XML)
#install.packages("rvest")
library(rvest)
library(wordcloud)
library(tm)
FLIPKART<-"http://www.flipkart.com/samsung-galaxy-core-18262/product-reviews/ITMDV6F6KYTTPGU4"
d = getURL(FLIPKART)
doc=htmlParse(d)
list=getNodeSet(doc,"//a")
list_href=sapply(list,function(x)xmlGetAttr(x,"href"))
page_link=grep("start=",list_href)
page_links<-list_href[page_link]
page_links<-unique(page_links)
crawl_candidate<-"start="
base="http://www.flipkart.com"
num<-10
doclist=list()
anchorlist=vector()
j=0
while(j<num)
{
if(j==0)
{
doclist[j+1]<-getURL(FLIPKART)
}
else
{
doclist[j+1]=getURL(paste(base,anchorlist[j+1],sep=""))
}
doc<-htmlParse(doclist[[j+1]])
anchor<-getNodeSet(doc,"//a")
anchor<-sapply(anchor,function(x)xmlGetAttr(x,"href"))
anchor<-anchor[grep(crawl_candidate,anchor)]
anchorlist=c(anchorlist,anchor)
anchorlist=unique(anchorlist)
j=j+1
}
reviews=c()
for(i in 1:10)
{
doc=htmlParse(doclist[[i]])
l=getNodeSet(doc,"//div/p/span[@class='review-text']")
l1=html_text(l)
#r=l1[nchar(l1)>200]
reviews=c(reviews,l1)
}
save(reviews,file="C:/Users/Koushik/Desktop/New folder/A/reviews.RData")
#install.packages("wordcloud")
library(wordcloud)
corpus=Corpus(VectorSource(reviews[1:100]))
corpus=tm_map(corpus,tolower)
corpus=tm_map(corpus,removePunctuation)
corpus=tm_map(corpus,removeNumbers)
corpus=tm_map(corpus,removeWords,stopwords("en"))
corpus=Corpus(VectorSource(corpus))
tdm=TermDocumentMatrix(corpus)
m=as.matrix(tdm)
v=sort(rowSums(m),decreasing=T)
d=data.frame(words=names(v),freq=v)
wordcloud(d$words,d$freq,max.words=300,colors=brewer.pal(8,"Dark2"),scale=c(3,0.5),random.order=F)
#REVIEW RATINGS
reviews=c()
ratings=c()
missingRating=data.frame(Page=0,missing=0)
for(i in 1:10){
doc=htmlParse(doclist[[i]])
l=getNodeSet(doc,"//div/p/span")
rateNodes=getNodeSet(doc,"//div[@class='fk-stars']")
rates=sapply(rateNodes,function(x)xmlGetAttr(x,"title"))
ratings=c(ratings,rates)
l1=html_text(l)
reviews=c(reviews,l1)
}
View(reviews)
View(ratings)
reviews100=reviews[1:100]
reviews100
ratings
rating=gsub(" star[s]?","",ratings)
rating=as.numeric(rating)
satisfaction=ifelse(rating>3,"satisfied","dissatisfied")
satisfaction
library(RTextTools) # create_matrix() comes from RTextTools
dtmmobile=create_matrix(reviews100,removePunctuation=T,removeNumbers=T,weighting=weightTfIdf,stemWords=TRUE)
dtmmobile=as.matrix(dtmmobile)
data=as.data.frame(dtmmobile)
data=cbind(data,satisfaction)
#data1=na.omit(data)
data=data[,colSums(data[,-length(data)])>0]
View(data)
table(data$satisfaction)
library(e1071) # svm() comes from e1071
svm=svm(satisfaction~.,data=data)
svm
#To get variable importance in prediction, SVM weights are evaluated as shown below
coef_imp=as.data.frame(t(svm$coefs)%*%svm$SV)
coef_imp1=data.frame(words=names(coef_imp),Importance=t(coef_imp))
coef_imp1=coef_imp1[order(coef_imp1$Importance),]
head(coef_imp1)
tail(coef_imp1)
View(coef_imp1)
#LSA & CLUSTERING
library(vegan)
#install.packages("RTextTools")
library(RTextTools)
library(mclust)
library(lsa)
library(cluster)
tdm=create_matrix(reviews,removeNumbers=T)
tdm_tfidf=weightTfIdf(tdm)
m=as.matrix(tdm)
m_tfidf=as.matrix(tdm_tfidf)
lsa_m=lsa(t(m),dimcalc_share(share=0.8))
lsa_m_tk=as.data.frame(lsa_m$tk)
lsa_m_dk=as.data.frame(lsa_m$dk)
lsa_mtfidf=lsa(t(m_tfidf),dimcalc_share(share=0.8))
k50=kmeans(scale(lsa_m$dk),centers=50,nstart=20)
centers50=aggregate(cbind(V1,V2,V3)~k50$cluster,data=as.data.frame(lsa_m$dk),FUN=mean)
d=dist(centers50[,-1])
hc=hclust(d,method="ward.D")
plot(hc,hang=-1)
rect.hclust(hc,h=0.3)
rect.hclust(hc,h=0.4,border="blue")
rect.hclust(hc,h=1.0,border="cyan")
rect.hclust(hc,h=1.25,border="green")
rect.hclust(hc,h=1.7,border="black")
#As per the plot, optimum values could be either 3,4 or 5
k3=kmeans(scale(lsa_m$tk),centers=3,nstart=20)
centers3=aggregate(cbind(V1,V2,V3)~k3$cluster,data=as.data.frame(lsa_m$tk),FUN=mean)
k4=kmeans(scale(lsa_m$tk),centers=4,nstart=20)
centers4=aggregate(cbind(V1,V2,V3)~k4$cluster,data=as.data.frame(lsa_m$tk),FUN=mean)
k5=kmeans(scale(lsa_m$tk),centers=5,nstart=20)
centers5=aggregate(cbind(V1,V2,V3)~k5$cluster,data=as.data.frame(lsa_m$tk),FUN=mean)
lsa_tk=lsa_m$tk
v=sort(colSums(m),decreasing=T)
wordFreq=data.frame(words=names(v),freq=v)
k5_1=wordFreq[k5$cluster==1,]
k5_2=wordFreq[k5$cluster==2,]
k5_3=wordFreq[k5$cluster==3,]
k5_4=wordFreq[k5$cluster==4,]
k5_5=wordFreq[k5$cluster==5,]
lsa_dk=as.data.frame(lsa_m$dk)
lsa_dk3=data.frame(words=rownames(lsa_dk),lsa_dk[,1:3])
plot(lsa_dk3$V1,lsa_dk3$V2)
text(lsa_dk3$V1,lsa_dk3$V2,label=lsa_dk3$words)
k50=kmeans(scale(lsa_m$tk),centers=50,nstart=20)
centers50=aggregate(cbind(V1,V2,V3)~k50$cluster,data=as.data.frame(lsa_m$tk),FUN=mean)
lsa_tk3=data.frame(words=rownames(lsa_tk),lsa_tk[,1:3])
plot(lsa_tk3$X1,lsa_tk3$X2)
text(lsa_tk3$X1,lsa_tk3$X2,label=lsa_tk3$words)
plot(lsa_tk3$X2,lsa_tk3$X3)
text(lsa_tk3$X2,lsa_tk3$X3,label=lsa_tk3$words)
plot(lsa_tk3$X3,lsa_tk3$X1)
text(lsa_tk3$X3,lsa_tk3$X1,label=lsa_tk3$words)
#SENTIMENT ANALYSIS
library(qdap) # sent_detect() and polarity() come from qdap
data1=data
satisfaction1=as.data.frame(satisfaction)
for(i in 1:100)
{
sent=sent_detect(reviews[i])
pol=polarity(sent)
data1$polarity[i]=pol$group$stan.mean.polarity
satisfaction1$polarity_val[i]=pol$group$stan.mean.polarity
if(is.na(satisfaction1$polarity_val[i]))
{satisfaction1$polarity_val[i]=pol$group$ave.polarity
data1$polarity[i]=pol$group$ave.polarity}
}
new_rate=cbind(rating,satisfaction1)
aggregate(polarity_val~rating,data=new_rate,FUN=mean)
tree=party::ctree(satisfaction~polarity_val, data=new_rate)
plot(tree)
new_rate$status=ifelse(new_rate$polarity_val>0.385,"Satisfied","Dissatisfied")
count_status1=as.data.frame(table(new_rate$status))
View(count_status1)
More Related Content

What's hot

Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIMEPredicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIMEFeng Zhu
 
Publication - The feasibility of gaze tracking for “mind reading” during search
Publication - The feasibility of gaze tracking for “mind reading” during searchPublication - The feasibility of gaze tracking for “mind reading” during search
Publication - The feasibility of gaze tracking for “mind reading” during searchA. LE
 
Reasonable confidence limits for binomial proportions
Reasonable confidence limits for binomial proportionsReasonable confidence limits for binomial proportions
Reasonable confidence limits for binomial proportionsJohn Zorich, MS, CQE
 
Jarrar.lecture notes.aai.2011s.ch6.games
Jarrar.lecture notes.aai.2011s.ch6.gamesJarrar.lecture notes.aai.2011s.ch6.games
Jarrar.lecture notes.aai.2011s.ch6.gamesPalGov
 
Visual Tools for explaining Machine Learning Models
Visual Tools for explaining Machine Learning ModelsVisual Tools for explaining Machine Learning Models
Visual Tools for explaining Machine Learning ModelsLeonardo Auslender
 
Approach to BSA/AML Rule Thresholds
Approach to BSA/AML Rule ThresholdsApproach to BSA/AML Rule Thresholds
Approach to BSA/AML Rule ThresholdsMayank Johri
 
Chris Hughes Final Year Project
Chris Hughes Final Year ProjectChris Hughes Final Year Project
Chris Hughes Final Year ProjectChris Hughes
 
Imtiaz khan data_science_analytics
Imtiaz khan data_science_analyticsImtiaz khan data_science_analytics
Imtiaz khan data_science_analyticsimtiaz khan
 
Machine-Learning: Customer Segmentation and Analysis.
Machine-Learning: Customer Segmentation and Analysis.Machine-Learning: Customer Segmentation and Analysis.
Machine-Learning: Customer Segmentation and Analysis.Siddhanth Chaurasiya
 
MIM (Mobile Instant Messaging) Classification using Term Frequency-Inverse Do...
MIM (Mobile Instant Messaging) Classification using Term Frequency-Inverse Do...MIM (Mobile Instant Messaging) Classification using Term Frequency-Inverse Do...
MIM (Mobile Instant Messaging) Classification using Term Frequency-Inverse Do...IJMREMJournal
 
Continuous Sentiment Intensity Prediction based on Deep Learning
Continuous Sentiment Intensity Prediction based on Deep LearningContinuous Sentiment Intensity Prediction based on Deep Learning
Continuous Sentiment Intensity Prediction based on Deep LearningYunchao He
 

What's hot (13)

Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIMEPredicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME
 
Moviereview prjct
Moviereview prjctMoviereview prjct
Moviereview prjct
 
Publication - The feasibility of gaze tracking for “mind reading” during search
Publication - The feasibility of gaze tracking for “mind reading” during searchPublication - The feasibility of gaze tracking for “mind reading” during search
Publication - The feasibility of gaze tracking for “mind reading” during search
 
Reasonable confidence limits for binomial proportions
Reasonable confidence limits for binomial proportionsReasonable confidence limits for binomial proportions
Reasonable confidence limits for binomial proportions
 
Jarrar.lecture notes.aai.2011s.ch6.games
Jarrar.lecture notes.aai.2011s.ch6.gamesJarrar.lecture notes.aai.2011s.ch6.games
Jarrar.lecture notes.aai.2011s.ch6.games
 
Visual Tools for explaining Machine Learning Models
Visual Tools for explaining Machine Learning ModelsVisual Tools for explaining Machine Learning Models
Visual Tools for explaining Machine Learning Models
 
Approach to BSA/AML Rule Thresholds
Approach to BSA/AML Rule ThresholdsApproach to BSA/AML Rule Thresholds
Approach to BSA/AML Rule Thresholds
 
Chris Hughes Final Year Project
Chris Hughes Final Year ProjectChris Hughes Final Year Project
Chris Hughes Final Year Project
 
Imtiaz khan data_science_analytics
Imtiaz khan data_science_analyticsImtiaz khan data_science_analytics
Imtiaz khan data_science_analytics
 
Machine-Learning: Customer Segmentation and Analysis.
Machine-Learning: Customer Segmentation and Analysis.Machine-Learning: Customer Segmentation and Analysis.
Machine-Learning: Customer Segmentation and Analysis.
 
MIM (Mobile Instant Messaging) Classification using Term Frequency-Inverse Do...
MIM (Mobile Instant Messaging) Classification using Term Frequency-Inverse Do...MIM (Mobile Instant Messaging) Classification using Term Frequency-Inverse Do...
MIM (Mobile Instant Messaging) Classification using Term Frequency-Inverse Do...
 
Continuous Sentiment Intensity Prediction based on Deep Learning
Continuous Sentiment Intensity Prediction based on Deep LearningContinuous Sentiment Intensity Prediction based on Deep Learning
Continuous Sentiment Intensity Prediction based on Deep Learning
 
General lect
General lectGeneral lect
General lect
 

Similar to Analyzing Samsung Galaxy Mega 5.8 Reviews

Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningSentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningIRJET Journal
 
Open06
Open06Open06
Open06butest
 
CUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTIONCUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTIONIRJET Journal
 
Density Based Clustering Approach for Solving the Software Component Restruct...
Density Based Clustering Approach for Solving the Software Component Restruct...Density Based Clustering Approach for Solving the Software Component Restruct...
Density Based Clustering Approach for Solving the Software Component Restruct...IRJET Journal
 
Sales_Prediction_Technique using R Programming
Sales_Prediction_Technique using R ProgrammingSales_Prediction_Technique using R Programming
Sales_Prediction_Technique using R ProgrammingNagarjun Kotyada
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics DomainDrjabez
 
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning AlgorithmsIRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning AlgorithmsIRJET Journal
 
Weka_Manual_Sagar
Weka_Manual_SagarWeka_Manual_Sagar
Weka_Manual_SagarSagar Kumar
 
Support Vector Machine Optimal Kernel Selection
Support Vector Machine Optimal Kernel SelectionSupport Vector Machine Optimal Kernel Selection
Support Vector Machine Optimal Kernel SelectionIRJET Journal
 
RESUME SCREENING USING LSTM
RESUME SCREENING USING LSTMRESUME SCREENING USING LSTM
RESUME SCREENING USING LSTMIRJET Journal
 
Software EngineeringBackground for Question 1-7 Kean Universi.docx
Software EngineeringBackground for Question 1-7 Kean Universi.docxSoftware EngineeringBackground for Question 1-7 Kean Universi.docx
Software EngineeringBackground for Question 1-7 Kean Universi.docxwhitneyleman54422
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsDinusha Dilanka
 
A tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesA tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesVimal Gupta
 
IRJET- Slant Analysis of Customer Reviews in View of Concealed Markov Display
IRJET- Slant Analysis of Customer Reviews in View of Concealed Markov DisplayIRJET- Slant Analysis of Customer Reviews in View of Concealed Markov Display
IRJET- Slant Analysis of Customer Reviews in View of Concealed Markov DisplayIRJET Journal
 
An Approach to Software Testing of Machine Learning Applications
An Approach to Software Testing of Machine Learning ApplicationsAn Approach to Software Testing of Machine Learning Applications
An Approach to Software Testing of Machine Learning Applicationsbutest
 
Methodological study of opinion mining and sentiment analysis techniques
Methodological study of opinion mining and sentiment analysis techniquesMethodological study of opinion mining and sentiment analysis techniques
Methodological study of opinion mining and sentiment analysis techniquesijsc
 

Similar to Analyzing Samsung Galaxy Mega 5.8 Reviews (20)

Stock Market Prediction Using ANN
Stock Market Prediction Using ANNStock Market Prediction Using ANN
Stock Market Prediction Using ANN
 
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningSentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
 
Open06
Open06Open06
Open06
 
Marvin_Capstone
Marvin_CapstoneMarvin_Capstone
Marvin_Capstone
 
CUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTIONCUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTION
 
Top Machine Learning Algorithms Used By AI Professionals ARTiBA.pdf
Top Machine Learning Algorithms Used By AI Professionals ARTiBA.pdfTop Machine Learning Algorithms Used By AI Professionals ARTiBA.pdf
Top Machine Learning Algorithms Used By AI Professionals ARTiBA.pdf
 
Density Based Clustering Approach for Solving the Software Component Restruct...
Density Based Clustering Approach for Solving the Software Component Restruct...Density Based Clustering Approach for Solving the Software Component Restruct...
Density Based Clustering Approach for Solving the Software Component Restruct...
 
Sales_Prediction_Technique using R Programming
Sales_Prediction_Technique using R ProgrammingSales_Prediction_Technique using R Programming
Sales_Prediction_Technique using R Programming
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics Domain
 
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning AlgorithmsIRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
 
Weka_Manual_Sagar
Weka_Manual_SagarWeka_Manual_Sagar
Weka_Manual_Sagar
 
Support Vector Machine Optimal Kernel Selection
Support Vector Machine Optimal Kernel SelectionSupport Vector Machine Optimal Kernel Selection
Support Vector Machine Optimal Kernel Selection
 
Bank loan purchase modeling
Bank loan purchase modelingBank loan purchase modeling
Bank loan purchase modeling
 
RESUME SCREENING USING LSTM
RESUME SCREENING USING LSTMRESUME SCREENING USING LSTM
RESUME SCREENING USING LSTM
 
Software EngineeringBackground for Question 1-7 Kean Universi.docx
Software EngineeringBackground for Question 1-7 Kean Universi.docxSoftware EngineeringBackground for Question 1-7 Kean Universi.docx
Software EngineeringBackground for Question 1-7 Kean Universi.docx
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning Algorithms
 
A tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesA tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbies
 
IRJET- Slant Analysis of Customer Reviews in View of Concealed Markov Display
IRJET- Slant Analysis of Customer Reviews in View of Concealed Markov DisplayIRJET- Slant Analysis of Customer Reviews in View of Concealed Markov Display
IRJET- Slant Analysis of Customer Reviews in View of Concealed Markov Display
 
An Approach to Software Testing of Machine Learning Applications
An Approach to Software Testing of Machine Learning ApplicationsAn Approach to Software Testing of Machine Learning Applications
An Approach to Software Testing of Machine Learning Applications
 
Methodological study of opinion mining and sentiment analysis techniques
Methodological study of opinion mining and sentiment analysis techniquesMethodological study of opinion mining and sentiment analysis techniques
Methodological study of opinion mining and sentiment analysis techniques
 

Analyzing Samsung Galaxy Mega 5.8 Reviews

  • 1. 1 TEXT ANALYSIS TECHNIQUES TO ANALYZE REVIEWS OF SAMSUNG GALAXY MEGA 5.8 I9152 SUBMITTED BY KOUSHIK RAKSHIT ROLL NO:-A14034
  • 2. 2 CONTENTS 1. Introduction------------------------------------------------------------------------ 3 2. Problem Statement---------------------------------------------------------------- 3 3. Key features------------------------------------------------------------------------ 3 4. Research Design------------------------------------------------------------------- 4 5. Research Methodology---------------------------------------------------------- 4 A. Insights from Web Crawling & Word Cloud--------------------------5 B. Latent Semantic Analysis (LSA) and Cluster Analysis--------------7 C. Reviews Ratings Analysis-----------------------------------------------14 D. Classification using Support Vector Machine(SVM)---------------14 E. Reviews Sentiment Analysis-------------------------------------------16 6. Business Perspective------------------------------------------------------------18 7. Appendix-------------------------------------------------------------------------19
  • 3. 3 1. INTRODUCTION:- TEXT ANALYTICS Text mining,referred to as text data mining, roughly equivalent to text analytics, refers to the process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning. 2. PROBLEM STATEMENT ANALYZING REVIEWS FOR SAMSUNG GALAXY MEGA 5.8 I9152 (BLACK, WITH BLACK) From flipkart.com reviews for Samsung Galaxy Mega 5.8 I9152 (at least 100 reviews) were downloaded and a thorough analysis using text analysis techniques was carried out. 3. KEY FEATURES OF SAMSUNG GALAXY MEGA 5.8 I9152  Wi-Fi Enabled  Expandable Storage Capacity of 64 GB  5.8-inch TFT Capacitive Touchscreen  Android v4.2.2 (Jelly Bean) OS  8 MP Primary Camera  1.9 MP Secondary Camera  1.4 GHz Dual Core Processor  Full HD Recording
  • 4. 4. RESEARCH DESIGN
     To analyze users' responses, we collected primary and secondary information from mobile reviews on http://www.flipkart.com.
     To analyze users' perception of the phone, we took 100 reviews from the review section on Flipkart.
    5. RESEARCH METHODOLOGY
    To analyze the user reviews, the following research procedures were undertaken:
    A. Web Crawling & Word Cloud
    B. Latent Semantic Analysis (LSA) and Cluster Analysis
    C. Ratings Analysis
    D. Classification Analysis using Support Vector Machine (SVM)
    E. Reviews Sentiment Analysis
    A. INSIGHTS FROM WEB CRAWLING & WORD CLOUD
    A tag cloud (word cloud, or weighted list in visual design) is a visual representation of text data, typically used to depict keyword metadata (tags) on websites or to visualize free-form text. Tags are usually single words, and the importance of each tag is shown by font size or color. This format is useful for quickly perceiving the most prominent terms and for locating a term alphabetically to determine its relative prominence. When used as website navigation aids, the terms are hyperlinked to items associated with the tag.
    R packages used for the word cloud: RCurl, XML, rvest, wordcloud, tm
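    The mechanics behind a word cloud are just term-frequency counting over cleaned text. A minimal sketch in Python (the report's actual pipeline is the R code in the Appendix; the stop-word list and toy reviews here are made up purely for illustration):

    ```python
    from collections import Counter
    import re

    def term_frequencies(reviews, stopwords):
        """Tokenize, drop stop words, and count terms -- the data behind a word cloud."""
        counts = Counter()
        for review in reviews:
            for word in re.findall(r"[a-z]+", review.lower()):
                if word not in stopwords:
                    counts[word] += 1
        return counts

    # Toy reviews and stop words (hypothetical, for illustration only).
    reviews = [
        "Good phone with a good display",
        "The display is great and the battery lasts long",
        "Battery could be better but the display is good",
    ]
    stopwords = {"the", "a", "is", "and", "with", "but", "be", "could"}
    freq = term_frequencies(reviews, stopwords)
    # In a word cloud, each term's font size is proportional to its frequency.
    print(freq.most_common(3))
    ```

    The R `wordcloud()` call used later in the report does exactly this counting (via `rowSums` on the term-document matrix) before mapping frequency to font size.
    
    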
  • 5. 1. Fetching reviews from FLIPKART.COM
    FLIPKART <- "http://www.flipkart.com/samsung-galaxy-mega-5-8-i9152/product-reviews/ITMEYFRTWAXZXTUT?pid=MOBDZSDJAPQXGAWN&type=all"
    2. Word cloud creation
    wordcloud(d$words, d$freq, max.words = 300, colors = brewer.pal(8, "Dark2"), scale = c(3, 0.5), random.order = F)
    INFERENCE DRAWN:
    The words that took prominence in this word cloud give a clear idea that the mobile under discussion is good and may be known for its screen size, display, camera and battery. However, they do not indicate whether the product is worth buying or whether users of this mobile are satisfied. So, to gain more insight into our data, we also had to analyze the review ratings (out of 5).
    B. LATENT SEMANTIC ANALYSIS AND CLUSTER ANALYSIS
    For Latent Semantic Analysis, we break the term-document matrix into 3 matrices:
  • 6.  Word-Dimension Matrix
     Document-Dimension Matrix
     Diagonal Matrix of Singular Values
    Word-Dimension Matrix: PLOTTING X1 vs X2
    Inferences:
     When we break the term-document matrix into a dimension-word vector space chart, it is clearly visible that positive words like "good" and feature words like screen, battery, etc. occur mostly along dimension 1.
     "Grand", "Mega" and "phone" occur mostly along dimension 2.
     Display, price, money and quality occur more or less in both dimensions.
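    For reference, LSA's decomposition writes the term-document matrix A as U·S·Vᵀ, where U is the word-dimension matrix, V the document-dimension matrix, and S the diagonal matrix of singular values. A hand-sized Python sketch with invented rank-1 numbers shows the bookkeeping (the report itself uses the R lsa package, whose tk/dk/sk components play the roles of U, V and S):

    ```python
    # Toy LSA factors: 3 terms x 2 documents, truncated to one latent dimension.
    # All numbers are invented for illustration.
    U = [[0.6], [0.8], [0.0]]   # word-dimension matrix (one row per term)
    S = [[5.0]]                 # diagonal matrix of singular values
    V = [[0.8], [0.6]]          # document-dimension matrix (one row per document)

    def reconstruct(U, S, V):
        """Rebuild the (approximate) term-document matrix as U * S * V^T."""
        k = len(S)
        return [[sum(U[i][d] * S[d][d] * V[j][d] for d in range(k))
                 for j in range(len(V))]
                for i in range(len(U))]

    A = reconstruct(U, S, V)
    # Each entry is (term weight) x (singular value) x (document weight),
    # e.g. A[0][0] = 0.6 * 5.0 * 0.8 = 2.4
    ```

    The X1-vs-X2 scatter plots in the report are simply the first two columns of U (for words) or of V (for documents).
    
    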
  • 7. PLOTTING X1 vs X3
    Inferences:
     "Grand", "Mega" and "phone" occur mostly along dimension 1.
     Quality-related words occur more or less in both dimensions.
    PLOTTING X2 vs X3
  • 8. Inferences:
     "Grand" and "Mega" occur mostly along dimension 1.
     "Camera" and "samsung" occur more or less in both dimensions.
    PLOTTING THE DOCUMENT MATRIX
    Inference:
     Document nos. 71, 67 and 49 are close to dimension 1.
     Document no. 68 is close to dimension 2.
    HIERARCHICAL CLUSTERING TO DETERMINE THE OPTIMUM NUMBER OF CLUSTERS
    In data mining, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types:
     Agglomerative: a "bottom-up" approach; each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
     Divisive: a "top-down" approach; all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
    In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering are usually presented in a dendrogram.
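    The agglomerative ("bottom-up") strategy just described can be sketched in a few lines. This toy Python version uses single linkage on one-dimensional points purely for illustration; the report itself uses R's hclust with Ward's method on the LSA cluster centers:

    ```python
    def agglomerative(points, k):
        """Single-linkage agglomerative clustering: start with singleton
        clusters and greedily merge the closest pair until k remain."""
        clusters = [[p] for p in points]
        while len(clusters) > k:
            best = None  # (distance, i, j) of the closest pair of clusters
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    # single linkage: distance between the two nearest members
                    d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                    if best is None or d < best[0]:
                        best = (d, i, j)
            _, i, j = best
            clusters[i] = clusters[i] + clusters[j]
            del clusters[j]
        return [sorted(c) for c in clusters]

    print(agglomerative([1, 2, 10, 11, 50], k=3))  # [[1, 2], [10, 11], [50]]
    ```

    Cutting the dendrogram at different heights (the rect.hclust calls in the Appendix) corresponds to stopping this merge loop at different values of k.
    
    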
  • 9. As per the plot, the optimum number of clusters could be 3, 4 or 5.
    CLUSTER ANALYSIS:
     From the above LSA analysis we settled on 5 as the optimum number of clusters, which suggests there are 5 categories of reviews among the 100 reviews in total.
     For now, we will concentrate on these 5 review clusters, which help to group the different types of reviews from the users.
    CLUSTER-1
  • 10.  There are a total of 57 observations in this cluster.
     This cluster consists of words related to the price and the look of the phone.
    CLUSTER-2
     There are 32 observations in this cluster.
     This cluster consists of words from reviews by customers who have had a good experience with this mobile.
    WORD CLOUD FOR CLUSTER-1
  • 11. CLUSTER-3
     There are 38 observations in this cluster.
     This cluster consists of words from reviews by customers who have good faith in the company.
    WORD CLOUD FOR CLUSTER-2
  • 12. CLUSTER-4
     There are only 2 observations in this cluster.
     This cluster does not throw any light on the nature of the reviews it contains.
    WORD CLOUD FOR CLUSTER-4
    WORD CLOUD FOR CLUSTER-3
  • 13. CLUSTER-5
     This cluster has 1449 observations.
     It consists of words related to product features and quality.
    WORD CLOUD FOR CLUSTER-5
  • 14. INFERENCE DRAWN FROM CLUSTERING
    Apart from Cluster 1, the other clusters do not give sufficient information about the customer base or customer type. Moreover, Clusters 2, 3, 4 and 5 are substantially smaller than Cluster 1, and no constructive storyline can be carved out of them, whereas Cluster 1 reflects almost everything about the various features of the phone that the customers might have liked.
    C. REVIEWS RATINGS ANALYSIS
    Total Reviews = 100
    Satisfied Reviews: 73
    Dissatisfied Reviews: 27
    Checking the ratings gives a better idea: most users are satisfied with this mobile.
    D. CLASSIFICATION USING SUPPORT VECTOR MACHINE (SVM)
    In machine learning, support vector machines (SVMs, also support vector networks) are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into the same space and predicted to belong to a category based on which side of the gap they fall on.
    The generalization properties of an SVM do not depend on the dimensionality of the space. The generalization error can be bounded by a term depending on the quotient of the radius of a ball containing all the data and the margin realized on that data, but not on the dimensionality of the space. Many extensions exist, but the answer is essentially the same: the generalization does not depend on the
  • 15. dimensionality. An extended explanation is that you can generalize well even in high-dimensional spaces because the data occupies only a low-dimensional subspace of the feature space, and regularization restricts the learner to that subspace. You can see this for yourself by looking at the eigenvalues of the kernel matrix, which typically decay quickly, meaning that you can project your data onto a low-dimensional subspace with negligible error. So even if you have, for example, a Gaussian kernel, where the feature space is infinite-dimensional, you are actually dealing with an essentially finite-dimensional kernel feature space in which you are learning a linear decision function, which is statistically tractable. Note that you do need to regularize, though.
    From the above figure we can infer that out of 100 data points, 95 contribute to the formation of the marginal plane (i.e., they are support vectors).
    The 6 words displayed at the head have negative coefficients; the 6 words displayed at the tail have positive coefficients.
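    Once a linear SVM is trained, the "gap" intuition above reduces to a simple decision rule: classify by the sign of w·x + b, with the gap width equal to 2/||w||. A Python sketch with hand-picked (not trained) weights for two hypothetical word-count features, mirroring the satisfied/dissatisfied labels; the report's actual classifier comes from R's e1071 package:

    ```python
    import math

    # Hypothetical weights for two features (say, counts of "good" and "poor").
    # In the report these come from the trained SVM; here they are hand-picked.
    w = [1.0, -1.0]
    b = 0.0

    def classify(x):
        """Linear SVM decision rule: which side of the separating hyperplane."""
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        return "satisfied" if score > 0 else "dissatisfied"

    # Width of the gap between the two classes: 2 / ||w||.
    margin_width = 2 / math.sqrt(sum(wi * wi for wi in w))

    print(classify([3, 1]))   # more "good" than "poor"
    print(classify([0, 2]))   # only "poor"
    ```

    The Appendix's `t(svm$coefs) %*% svm$SV` line recovers exactly this weight vector w from the fitted model, which is why its signed entries rank the words by importance.
    
    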
  • 16. Using SVM, we have classified the reviews into two categories. Since "dissatisfied" is the first factor level, words with negative coefficients have a positive impact, and vice versa.
    Snapshot of the data frame containing the list of words and their frequency counts.
    E. SENTIMENT ANALYSIS
    Sentiment essentially relates to feelings: attitudes, emotions and opinions. Sentiment analysis refers to the practice of applying Natural Language Processing and text analysis techniques to identify and extract subjective information from a piece of text. A person's opinions or feelings are for the most part subjective and not facts, which means that accurately analyzing an individual's opinion or mood from a piece of text can be extremely difficult. With sentiment analysis, from a text analytics point of view, we are essentially looking to get an understanding of the attitude of a writer with respect to a
  • 17. topic in a piece of text and its polarity: whether it is positive, negative or neutral. In recent years there has been a steady increase in interest from brands, companies and researchers in sentiment analysis and its application to business analytics. The business world today, as is the case in many data analytics streams, is looking for "business insight."
    ● Installing the 'qdap' package
    ● We decided the threshold value of polarity for classifying between satisfied and dissatisfied on the basis of the plot on the next page.
    ● The tree plot was produced using the "party" library.
    ● The output of the polarity check (sentiment analysis) gives the clear message that 65% of the buyers are satisfied with their purchase of the mobile.
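    A polarity score of the kind used above can be approximated by simple lexicon counting. qdap's real scorer also handles negators and amplifiers; this Python sketch, with a made-up lexicon and a zero threshold, only illustrates the idea of a (positives − negatives)/√n score:

    ```python
    import math

    # Made-up polarity lexicon; qdap ships a much larger, curated one.
    POSITIVE = {"good", "great", "excellent", "awesome", "satisfied"}
    NEGATIVE = {"bad", "poor", "worst", "disappointing", "slow"}

    def polarity(review, threshold=0.0):
        """Crude polarity score: (pos - neg) / sqrt(word count),
        thresholded into satisfied/dissatisfied."""
        words = review.lower().split()
        score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
        pol = score / math.sqrt(len(words)) if words else 0.0
        label = "satisfied" if pol > threshold else "dissatisfied"
        return pol, label

    pol, label = polarity("good phone great display")
    # score = 2 over 4 words -> polarity 2 / sqrt(4) = 1.0
    ```

    The report's threshold was chosen from a "party" tree plot rather than fixed at zero; the mechanics of scoring each review are the same.
    
    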
  • 18. CONCLUSION & BUSINESS PERSPECTIVE
    • The output of our text analytics techniques brings out the fact that the Samsung Galaxy Mega 5.8 I9152 is a mobile worth buying.
    • Most of the customers who bought it are extremely satisfied with the various features it offers.
    • Customer segmentation is possible, but a very clear classification is not, as there are many features that are equally liked across clusters.
    • Buyers can, of course, be classified in terms of their satisfaction level.
  • 19. APPENDIX: CODES
    # WEB CRAWLING & WORD CLOUD
    # install.packages("RCurl")
    library(RCurl)
    # install.packages("XML")
    library(XML)
    # install.packages("rvest")
    library(rvest)
    library(wordcloud)
    library(tm)
    FLIPKART <- "http://www.flipkart.com/samsung-galaxy-core-18262/product-reviews/ITMDV6F6KYTTPGU4"
    d = getURL(FLIPKART)
    doc = htmlParse(d)
    list = getNodeSet(doc, "//a")
    list_href = sapply(list, function(x) xmlGetAttr(x, "href"))
    page_link = grep("start=", list_href)
    page_links <- list_href[page_link]
    page_links <- unique(page_links)
    crawl_candidate <- "start="
    base = "http://www.flipkart.com"
    num <- 10
    doclist = list()
    anchorlist = vector()
    j = 0
    while (j < num) {
      if (j == 0) {
        doclist[j + 1] <- getURL(FLIPKART)
  • 21. corpus = tm_map(corpus, removeNumbers)
    corpus = tm_map(corpus, removeWords, stopwords("en"))
    corpus = Corpus(VectorSource(corpus))
    tdm = TermDocumentMatrix(corpus)
    m = as.matrix(tdm)
    v = sort(rowSums(m), decreasing = T)
    d = data.frame(words = names(v), freq = v)
    wordcloud(d$words, d$freq, max.words = 300, colors = brewer.pal(8, "Dark2"), scale = c(3, 0.5), random.order = F)
    # REVIEW RATINGS
    reviews = c()
    ratings = c()
    missingRating = data.frame(Page = 0, missing = 0)
    for (i in 1:10) {
      doc = htmlParse(doclist[[i]])
      l = getNodeSet(doc, "//div/p/span")
      rateNodes = getNodeSet(doc, "//div[@class='fk-stars']")
      rates = sapply(rateNodes, function(x) xmlGetAttr(x, "title"))
      ratings = c(ratings, rates)
      l1 = html_text(l)
      reviews = c(reviews, l1)
    }
    View(reviews)
    View(ratings)
    reviews100 = reviews[1:100]
    reviews100
    ratings
    rating = gsub(" star[s]?", "", ratings)
    rating = as.numeric(rating)
    satisfaction = ifelse(rating > 3, "satisfied", "dissatisfied")
  • 22. satisfaction
    library(RTextTools)  # provides create_matrix()
    library(e1071)       # provides svm()
    dtmmobile = create_matrix(reviews100, removePunctuation = T, removeNumbers = T, weighting = weightTfIdf, stemWords = TRUE)
    dtmmobile = as.matrix(dtmmobile)
    data = as.data.frame(dtmmobile)
    data = cbind(data, satisfaction)
    # data1 = na.omit(data)
    data = data[, colSums(data[, -length(data)]) > 0]
    View(data)
    table(data$satisfaction)
    svm = svm(satisfaction ~ ., data = data)
    svm
    # To get variable importance in prediction, SVM weights are evaluated as shown below
    coef_imp = as.data.frame(t(svm$coefs) %*% svm$SV)
    coef_imp1 = data.frame(words = names(coef_imp), Importance = t(coef_imp))
    coef_imp1 = coef_imp1[order(coef_imp1$Importance), ]
    head(coef_imp1)
    tail(coef_imp1)
    View(coef_imp1)
    # LSA & CLUSTERING
    library(vegan)
    library(mclust)
    library(lsa)
    library(cluster)
    tdm = create_matrix(reviews, removeNumbers = T)
    tdm_tfidf = weightTfIdf(tdm)
    m = as.matrix(tdm)
    m_tfidf = as.matrix(tdm_tfidf)
  • 23. lsa_m = lsa(t(m), dimcalc_share(share = 0.8))
    lsa_m_tk = as.data.frame(lsa_m$tk)
    lsa_m_dk = as.data.frame(lsa_m$dk)
    lsa_mtfidf = lsa(t(m_tfidf), dimcalc_share(share = 0.8))
    k50 = kmeans(scale(lsa_m$dk), centers = 50, nstart = 20)
    centers50 = aggregate(cbind(V1, V2, V3) ~ k50$cluster, data = as.data.frame(lsa_m$dk), FUN = mean)
    d = dist(centers50[, -1])
    hc = hclust(d, method = "ward.D")
    plot(hc, hang = -1)
    rect.hclust(hc, h = 0.3)
    rect.hclust(hc, h = 0.4, border = "blue")
    rect.hclust(hc, h = 1.0, border = "cyan")
    rect.hclust(hc, h = 1.25, border = "green")
    rect.hclust(hc, h = 1.7, border = "black")
    # As per the plot, optimum values could be either 3, 4 or 5
    k3 = kmeans(scale(lsa_m$tk), centers = 3, nstart = 20)
    centers3 = aggregate(cbind(V1, V2, V3) ~ k3$cluster, data = as.data.frame(lsa_m$tk), FUN = mean)
    k4 = kmeans(scale(lsa_m$tk), centers = 4, nstart = 20)
    centers4 = aggregate(cbind(V1, V2, V3) ~ k4$cluster, data = as.data.frame(lsa_m$tk), FUN = mean)
    k5 = kmeans(scale(lsa_m$tk), centers = 5, nstart = 20)
    centers5 = aggregate(cbind(V1, V2, V3) ~ k5$cluster, data = as.data.frame(lsa_m$tk), FUN = mean)
    lsa_tk = lsa_m$tk
    v = sort(colSums(m), decreasing = T)
    wordFreq = data.frame(words = names(v), freq = v)
    k5_1 = wordFreq[k5$cluster == 1, ]
    k5_2 = wordFreq[k5$cluster == 2, ]
    k5_3 = wordFreq[k5$cluster == 3, ]
    k5_4 = wordFreq[k5$cluster == 4, ]
    k5_5 = wordFreq[k5$cluster == 5, ]
  • 24. lsa_dk = as.data.frame(lsa_m$dk)
    lsa_dk3 = data.frame(words = rownames(lsa_dk), lsa_dk[, 1:3])
    plot(lsa_dk3$V1, lsa_dk3$V2)
    text(lsa_dk3$V1, lsa_dk3$V2, label = lsa_dk3$words)
    k50 = kmeans(scale(lsa_m$tk), centers = 50, nstart = 20)
    centers50 = aggregate(cbind(V1, V2, V3) ~ k50$cluster, data = as.data.frame(lsa_m$tk), FUN = mean)
    lsa_tk3 = data.frame(words = rownames(lsa_tk), lsa_tk[, 1:3])
    plot(lsa_tk3$X1, lsa_tk3$X2)
    text(lsa_tk3$X1, lsa_tk3$X2, label = lsa_tk3$words)
    plot(lsa_tk3$X2, lsa_tk3$X3)
    text(lsa_tk3$X2, lsa_tk3$X3, label = lsa_tk3$words)
    plot(lsa_tk3$X3, lsa_tk3$X1)
    text(lsa_tk3$X3, lsa_tk3$X1, label = lsa_tk3$words)
    # SENTIMENT ANALYSIS
    library(qdap)  # provides sent_detect() and polarity()
    data1 = data
    satisfaction1 = as.data.frame(satisfaction)
    for (i in 1:100) {
      sent = sent_detect(reviews[i])
      pol = polarity(sent)
      data1$polarity[i] = pol$group$stan.mean.polarity
      satisfaction1$polarity_val[i] = pol$group$stan.mean.polarity
      if (is.na(satisfaction1$polarity_val[i])) {
        satisfaction1$polarity_val[i] = pol$group$ave.polarity
        data1$polarity[i] = pol$group$ave.polarity
      }