A for Analytics
CONTINUING EDUCATION PROGRAMME
DMS IIT-DELHI
3/24/2013-6/23/2013
Harshad B. Madhamshettiwar
Paper submitted in the partial fulfillment of the
requirements for the Certificate of
Business Analytics and Optimization
Objective:
This paper explains why text analytics matters, from a business point of view, for any kind of business,
how sentiment analysis is used to make better decisions, and the science behind it.
Background:
Times have changed. People have become talkative and active in sharing their opinions, unlike in the past
(the era before the World Wide Web), when an individual’s opinions were shared only with family and friends. The
Web has dramatically changed the way people express their views and opinions, which can now influence
the decisions of thousands or millions of people, directly or indirectly.
Now that gut-based decisions are no longer good enough for business, it is time for a makeover: time
for data-driven decision making and the setting of fact-based goals, using statistical science and various
analytical tools.
One of the analytics methods that helps in making better decisions is text analytics. Nowadays, business
profits and strategies depend on customer feedback and demand. Companies are
focusing more on getting documented feedback from various sources such as surveys, social networking sites, and
blogs. This generates a huge amount of text data, including answers to questionnaires:
“undoubtedly the most unstructured or semi-structured data.”
After churning and cleaning, the data is converted into a source of critical information
with a large number of opportunities hidden in it, in the form of the behavior and sentiments of customers, i.e.
what the customer thinks and feels about your services and products, how accountable and reliable he or she
judges the company to be, and how he or she is promoting you, whether with good or bad reviews.
Context:
In many cases, opinions are hidden in long forum posts and blogs.
It is difficult for a human reader to find relevant sources, extract related sentences with opinions, read
them, summarize them, and organize them into usable forms. Thus, automated opinion discovery and
summarization systems are needed. Sentiment analysis, also known as opinion mining, grows out of this
need. It is a challenging natural language processing or text mining problem. [1]
This paper examines why sentiment analysis is being used so widely by companies and how it works.
Literature Review:
Text analytics reveals insights from electronic text materials, associates them so they go to the right
person and place, and provides intelligence to know what you need to do next – whether it is answering
complex search-and-retrieval questions, presenting relevant content to internal or external Web users, or
predicting which phrase will best affect sentiments.
Sentiment analysis automatically locates and extracts sentiment from online materials, such as social
networking sites, comments and blogs on the Internet, as well as internal electronic documents.
Text analytics brings together multiple approaches: [1]
• Text mining involves techniques from several areas, including the fields of computational linguistics
and information retrieval, to structure text into a numeric representation for use in traditional data mining
and predictive analysis.
• Natural language processing – a discipline from the field of artificial intelligence – combines computer
science and linguistics to identify meaningful concepts, attributes and opinions in the spoken or written
word.
• Hybrid approach (best of both worlds)
Data mining approach:
A data mining approach to sentiment analysis translates an unstructured text problem to one that makes
predictions on structured, quantitative data. The approach borrows several techniques from computational
linguistics and information retrieval communities to represent the text numerically, and then applies
traditional data mining techniques to this numeric representation. In the end, a target variable is identified
and a pattern is discovered from the training data for predicting sentiment polarity. This pattern can then
be used to predict new observations.
The first step in creating the numeric representation is to convert the entire training collection into a
document-by-term frequency matrix. Each document is parsed into individual terms, or term/part-of-
speech pairs. Then the set of all terms becomes the variables on the data set so that documents are now
represented as vectors of length equal to the number of distinct terms in the collection. These vectors are
very sparse, containing mostly zeroes – because any one document contains a very small percentage of
the terms in the collection. Once the documents are represented as vectors, the frequencies in each cell
can be weighted with a function that takes into account the distribution of the term across the collection
and relative to the levels of the target variable.
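The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three toy documents are invented, tokenization is a plain whitespace split, and the weighting function is a simple inverse document frequency, which reflects only the distribution of a term across the collection and not its relation to the target variable.

```python
import math
from collections import Counter

# Toy collection (invented for illustration only).
docs = [
    "great phone great battery",
    "poor battery poor screen",
    "great screen",
]

# Parse each document into individual terms.
tokenized = [d.split() for d in docs]

# The set of all distinct terms becomes the variables (columns).
vocabulary = sorted({term for doc in tokenized for term in doc})

# Document-by-term frequency matrix: one row per document, one column per term.
counts = [Counter(doc) for doc in tokenized]
freq_matrix = [[c[term] for term in vocabulary] for c in counts]

# Assumed weighting: inverse document frequency, so that terms appearing
# in fewer documents receive a larger weight.
n_docs = len(docs)
doc_freq = {t: sum(1 for c in counts if t in c) for t in vocabulary}
idf = {t: math.log(n_docs / doc_freq[t]) + 1.0 for t in vocabulary}
weighted = [
    [row[j] * idf[term] for j, term in enumerate(vocabulary)]
    for row in freq_matrix
]

for doc, row in zip(docs, freq_matrix):
    print(doc, "->", row)
```

Even in this tiny collection most cells are zero, which is the sparsity the text describes.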
After these document vectors are formed, a dimension reduction technique – such as the singular value
decomposition (see Taming Text with the SVD, Albright, 2004) – is typically used to represent each
document in a reduced-dimensional space of maybe 50 to 100 variables, where each variable is a linear
combination of the weighted terms that originally represented each document.
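As a rough sketch of this reduction step, the code below approximates the leading right singular vectors of a small document-by-term matrix and projects each document onto them. It is not the procedure of the cited reference: the matrix is invented, the reduction is to 2 variables rather than 50 to 100, and power iteration with deflation is used only to keep the example dependency-free. In practice a library SVD routine would be the usual choice.

```python
import math

def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def top_right_singular_vectors(A, k, iters=200):
    """Approximate the top-k right singular vectors of A
    by power iteration with deflation on the Gram matrix A^T A."""
    n_terms = len(A[0])
    gram = [[sum(row[i] * row[j] for row in A) for j in range(n_terms)]
            for i in range(n_terms)]
    vectors = []
    for c in range(k):
        # Fixed (non-random) starting vector so the run is repeatable.
        v = [1.0 / (i + 1 + c) for i in range(n_terms)]
        for _ in range(iters):
            w = matvec(gram, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(x * y for x, y in zip(v, matvec(gram, v)))
        vectors.append(v)
        # Deflate: remove the component just found before the next pass.
        gram = [[gram[i][j] - eigenvalue * v[i] * v[j] for j in range(n_terms)]
                for i in range(n_terms)]
    return vectors

# Toy weighted document-by-term matrix: 4 documents, 5 terms (invented).
A = [
    [2.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0, 1.0],
    [0.0, 0.0, 2.0, 1.0, 1.0],
]

components = top_right_singular_vectors(A, k=2)

# Each document becomes a k-dimensional vector; each new variable is a
# linear combination of the original weighted terms.
reduced = [[sum(a * b for a, b in zip(row, v)) for v in components] for row in A]

for row in reduced:
    print([round(x, 3) for x in row])
```

The first two documents share one set of terms and the last two share another, so each pair collapses onto its own reduced dimension.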
Finally, these reduced-dimensional vectors, together with the sentiment variable, can be supplied to a
predictive model. The model will attempt to learn from the training data by utilizing patterns in the
reduced-dimensional vector. This predictive model will then create a function that will predict the
sentiment for any document.
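The text does not name a particular predictive model, so the following sketch uses a nearest-centroid classifier purely as a stand-in. The reduced vectors and labels are invented, and the function names `train` and `predict` are hypothetical.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def train(vectors, labels):
    """Learn one centroid per sentiment label from the training data."""
    by_label = {}
    for vector, label in zip(vectors, labels):
        by_label.setdefault(label, []).append(vector)
    return {label: centroid(group) for label, group in by_label.items()}

def predict(model, vector):
    """Return the label whose centroid is closest to the vector."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: squared_distance(model[label], vector))

# Invented reduced-dimensional vectors with known sentiment.
train_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
train_labels = ["positive", "positive", "negative", "negative"]

model = train(train_vectors, train_labels)
prediction = predict(model, [0.7, 0.3])
print(prediction)
```

Any other classifier could take the place of the centroid rule here; the point is only that the learned function maps a reduced vector for a new document to a sentiment label.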
Benefits of the data mining approach
The data mining approach is appealing because it is based on learning patterns that are useful for making
automated, efficient predictions. The algorithms are capable of discovering unimagined and complicated
patterns that would be beyond what a human could anticipate. Frequently, a data mining approach can
beat a rule-based approach in topic classification. Of course, this is dependent on having enough training
data to build the model.
Drawback of the data mining approach
The vector-based representation of a document, which is required for data mining techniques, does not
maintain information that is potentially important to sentiment classification. For example, the vector
representation does not capture when terms are close to one another in the document, if one term precedes
another or any other contextual cues. The order of terms in a phrase can significantly affect meaning.
Consider the phrases:
“… night for a great movie”
and
“… great night for a movie”
These two phrases convey two different meanings; yet in a vector representation, the phrases have an
identical representation.
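This loss of word order is easy to check directly. The sketch below compares the two phrases as bags of words and then as sequences of adjacent word pairs (bigrams); the helper names are hypothetical.

```python
from collections import Counter

def bag_of_words(phrase):
    """Term counts only; word order is discarded."""
    return Counter(phrase.lower().split())

def bigrams(phrase):
    """Adjacent word pairs, which retain local word order."""
    tokens = phrase.lower().split()
    return list(zip(tokens, tokens[1:]))

first = "night for a great movie"
second = "great night for a movie"

same_bag = bag_of_words(first) == bag_of_words(second)
same_bigrams = bigrams(first) == bigrams(second)

print("identical bag of words:", same_bag)
print("identical bigrams:", same_bigrams)
```

The bag-of-words vectors match exactly, while the bigram lists differ: only the first phrase contains the pair ("great", "movie").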
In addition, most predictive models provide little feedback to the user as to precisely why a particular
document was classified as having positive or negative polarity. So when you attempt to understand what
positive things people said in a particular document, you frequently have to read the entire document to
discover the answer.
As a final drawback, forming the training and validation data sets is an essential component of learning a
predictive model, but it can be very time-consuming and challenging. A rating needs to be provided for
every document, and if there are attributes of documents that you wish to use to measure sentiment, you
will need to provide a rating for each of these as well. Another complication is that two different
reviewers frequently assign two different sentiment ratings to the same document. This can introduce
unexpected errors in building and measuring the performance of your model.
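One common way to quantify how far two reviewers disagree is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The paper does not prescribe this measure; it is used here as an assumption, and the ratings are invented.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    # Agreement expected if both raters labeled independently at their own rates.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented sentiment ratings for the same ten documents.
reviewer_a = ["pos"] * 5 + ["neg"] * 5
reviewer_b = ["pos", "pos", "pos", "pos", "neg",
              "neg", "neg", "neg", "neg", "pos"]

kappa = cohens_kappa(reviewer_a, reviewer_b)
print(round(kappa, 2))
```

Here the reviewers agree on 8 of 10 documents, but half of that agreement would be expected by chance, so kappa is 0.6 rather than 0.8.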
Natural language processing approach:
Natural language processing (NLP) is a field of artificial intelligence that deals with automatically
extracting meaning from natural language text. As discussed in the introduction of this paper, it’s very
challenging to get machines to understand text at the same levels as humans. Doing this with the specific
goal of extracting sentiment is even more challenging.
Natural language processing (NLP) combines computer science and linguistics to identify meaningful
concepts and attributes in the spoken or written word. In the context of text analytics, this analysis most
often applies to electronic documents.
The rule-based NLP methods use certain entities and syntactic patterns in the text to understand its
meaning.
Figure 1 below shows the steps involved in carrying out sentiment analysis with the NLP approach. [3][5]
Figure 1: Sentiment analysis by NLP approach.
Benefits of the NLP approach
The major advantage of rule-based methods is the amount of control they give rule developers over how
the analysis will be performed. Developers can use their knowledge of the domain and the language
within it to develop rules that have high precision.
[Figure 1 content: Text analytics → Sentiment analysis (NLP), covering:
• Defining problems of sentiment analysis
• Sentiment and subjectivity classification: Document-Level Sentiment Classification; Sentence-Level Subjectivity and Sentiment Classification; Opinion Lexicon Generation
• Feature-based sentiment analysis: Feature Extraction; Opinion Orientation Identification
• Opinion search and retrieval
• Opinion spam and utility of opinions: Opinion Spam; Utility of Reviews
• Sentiment analysis of comparative sentences: Problem Definition; Identification of Comparative Sentences; Extraction of Objects and Object Features in Comparative Sentences; Identification of Preferred Objects in Comparative Sentences]
Unlike statistical analysis, the results of rule-based analysis are easily interpretable. This is very important
for real-life applications where the analysts need to know exactly why a document or an attribute within a
document was tagged as positive or negative. In other words, analysts need to know exactly what
sentences, keywords or context within the document triggered the positive or negative sentiment.
Figure 2 shows an example of this. [6]
Phrases are marked in the original text based on their sentiment score as: Negative, Neutral, Positive.
The document sentiment is: +0.202
Summary
A beginner in analytics is like a child learning Alphabets for first time; it seems to be very complex in first go but then practice makes
man perfect….slowly... For us; analytics is same, its just waiting for us to learn more and keep learning and then it will become
a part of us…slowly child will become an expert...
Entities
No entities could be found.
Themes
Theme               Evidence  Sentiment
learning alphabets  4         +0.20
…slowly child       4         +0.20
beginning child     4         +0.20
Topics
Topic      Score
Education  0.72
Figure 2: Example showing different entities that were used for rule-based analysis.
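The interpretability described above can be illustrated with a minimal lexicon-and-rule scorer that returns, alongside the score, the exact phrases that triggered it. This is a sketch only: the word lists, the single negation rule, and the function name `score_with_evidence` are assumptions for illustration and do not reproduce the tool used for Figure 2.

```python
# Tiny hand-written lexicons (assumed for illustration).
POSITIVE = {"good", "great", "excellent", "love", "reliable"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}
NEGATORS = {"not", "never", "no"}

def score_with_evidence(text):
    """Return (total score, list of (phrase, polarity)) for a text."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    total = 0
    evidence = []
    for i, token in enumerate(tokens):
        if token in POSITIVE:
            polarity = 1
        elif token in NEGATIVE:
            polarity = -1
        else:
            continue
        phrase = token
        # Rule: a negator directly before a sentiment word flips its polarity.
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
            phrase = tokens[i - 1] + " " + token
        total += polarity
        evidence.append((phrase, polarity))
    return total, evidence

total, evidence = score_with_evidence(
    "The service was not good, but the delivery was great."
)
print(total, evidence)
```

Because every contribution to the score is tied to a matched phrase, an analyst can see that "not good" counted as negative and "great" as positive without rereading the whole document.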
Rule-based methods are completely unsupervised; that is, they do not require any training data. This is a
big advantage in real-life applications where training data is scarce. The non-availability of training data
is more pronounced when it comes to granular sentiment analysis (sentiment derived at the objects and
attributes level).
Another advantage of rule-based methods is their ability to refine the rules over time based on the
feedback from analysts or subject-matter experts. The more time the rule developer spends on refining the
rules, the better the results. Language evolves over time and people start using newer terms to express
their sentiments. This is especially true for social media, where the language used changes all the time. In
such cases, rule-based methods give you the flexibility needed to adjust your models accordingly.
Drawback of the NLP approach
The disadvantage of rule-based methods is that they require a lot of human involvement in developing the
rules. These methods completely rely on the domain knowledge of rule developers. It might take a few
weeks to come up with a strong rule-based model for a new domain. However, once you have a strong
rule-based model for a domain, you can reuse that model with some minor modifications for different
applications within the domain.
The importance of validation data is often underestimated while developing these models. The rules being
written must be generic enough so that they are capable of handling all possible cases. Inexperienced rule
developers tend to over-fit their rules to the sample data they are working with. Such rules might not work
well when tested on different data sets. So, rule developers must make sure they validate the rules on
different data sets before considering a model ready to deploy.
Discussion:
We now know how sentiment analytics works and that it is used across a wide range of industries.
Text analytics can be approached from two different directions:
• Discovery-driven. When you don’t know where to start, a discovery-driven approach helps identify key
patterns and attributes in the unstructured data at hand. This exploration reveals new insights, which are
then used to define the structure, such as the categories and concepts you will use.
• Domain-driven. If there is already an understanding of the data or some domain knowledge regarding
which terms and phrases are meaningful, you can start with this knowledge and find where it exists in the
materials.
Both approaches are valid, and more importantly, they complement each other. “Discovery of concepts
can be used to define a structure or taxonomy for the data. On the other hand, content that doesn’t fit into
a predefined structure can be further explored using discovery to find previously unknown information.”
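The two directions can be contrasted in a short sketch. Discovery is approximated here by counting the most frequent non-stopword terms, and the domain-driven side by matching text against a predefined taxonomy. The reviews, stopword list, taxonomy, and function names are all invented for illustration.

```python
from collections import Counter

STOPWORDS = {"the", "is", "a", "and", "was", "but", "to", "of"}

reviews = [
    "the delivery was late and the courier was rude",
    "the price is fair but delivery was slow",
    "battery life is great",
]

def discover_terms(texts, top_n):
    """Discovery-driven: surface frequent terms without prior knowledge."""
    counts = Counter(
        token for text in texts for token in text.lower().split()
        if token not in STOPWORDS
    )
    return counts.most_common(top_n)

def tag_with_taxonomy(text, taxonomy):
    """Domain-driven: find where known categories occur in the text."""
    tokens = set(text.lower().split())
    return sorted(cat for cat, terms in taxonomy.items() if tokens & terms)

# Predefined structure supplied by someone with domain knowledge (assumed).
taxonomy = {
    "delivery": {"delivery", "shipping", "courier"},
    "pricing": {"price", "cost", "expensive"},
}

print(discover_terms(reviews, 1))
print([tag_with_taxonomy(r, taxonomy) for r in reviews])
```

The third review matches no category in the taxonomy, which is the kind of content the quoted passage suggests exploring further with discovery.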
Organizations in a variety of industries – from the public and private sector, from manufacturing to
finance to health care – are using these approaches in inventive ways.
Figure 3: Industries adopting text and sentiment analytics [2]: Government and Research; Health and Life Sciences; Finance; Media and Publishing; Film Entertainment Industry; E-Business.
All these industries are using sentiment analytics because the reviews have economic impact.
Economic impact of Reviews [4]
As mentioned, many readers of online reviews say that these reviews significantly influence their
purchasing decisions. However, while these readers may have believed that they were “significantly
influenced”, perception and reality can differ. A key reason to understand the real economic impact of
reviews is that the results of such an analysis have important implications for how much effort companies
might or should want to expend on online reputation monitoring and management.
Given the rise of online commerce, it is not surprising that a body of work centered within the economics
and marketing literature studies the question of whether the polarity (often referred to as “valence”)
and/or volume of reviews available online have a measurable, significant influence on actual consumer
purchasing.
One way to acquire a good reputation is, of course, by receiving many positive reviews of oneself as a
merchant; another is for the products one offers to receive many positive reviews. For the purposes of our
discussion, we regard experiments wherein the buying is hypothetical as being out of scope; instead, we
focus on economic analyses of the behavior of people engaged in real shopping and spending real money.
The general form that most studies take is to use some form of hedonic regression to analyze the value
and the significance of different item features to some function, such as a measure of utility to the
customer, using previously recorded data. Specific economic functions that have been examined include
revenue (box-office take, sales rank on Amazon, etc.), revenue growth, stock trading volume, and
measures that auction-sites like eBay make available, such as bid price or probability of a bid or sale
being made.
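The regression idea can be sketched with an ordinary least-squares fit of one economic outcome on one review feature. This is a deliberate simplification: hedonic regressions in the literature use many item features at once, and the numbers below are invented toy values, not data from any study.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented toy values: average review rating and a revenue measure per item.
avg_rating = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue_measure = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(avg_rating, revenue_measure)
print(slope, intercept)
```

The fitted slope is the estimated change in the outcome per unit change in review polarity; a real analysis would also test whether that slope is statistically significant.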
It is important to note that some conclusions drawn from one domain often do not carry over to another;
for instance, reviews seem to be influential for big-ticket items but less so for cheaper items. But there are
also conflicting findings within the same domain. Moreover, different subsegments of the consumer
population may react differently: for example, people who are more highly motivated to purchase may
take ratings more seriously. Additionally, in some studies, positive ratings have an effect but negative
ones don’t, and in other studies the opposite effect is seen; the timing of such feedback and various
characteristics of the merchant or of the feedback itself (e.g., volume) may also be a factor.
Nonetheless, to gloss over many details for the sake of brevity: if one allows any effect — including
correlation even if said correlation is shown to be not predictive — that passes a statistical significance
test at the .05 level to be classed as “significant”, then many studies find that review polarity has a
significant economic effect.
Conclusion:
Independently, both the domain knowledge and the data mining approaches to sentiment analysis have
their strengths and weaknesses; but hopefully you will not be forced to choose between using one or the
other for your analysis. In this paper, we have shown that the two approaches complement one another.
So, while the NLP approach leverages the rule builder’s domain knowledge, text mining can also be used
by that person to improve, clarify or correct how that knowledge relates to the particular collection being
analyzed.
References:
[1] White paper: Combining Knowledge and Data Mining to Understand Sentiment – A Practical Assessment of Approaches (www.sas.com/offices)
[2] Text Analytics 101: Improve Decision-Making by Incorporating Unstructured Data – Words and Images – into Analytic Processes. Insights from a webinar in the SAS Applying Business Analytics Series, originally broadcast in April 2010.
[3] Bing Liu. Sentiment Analysis and Subjectivity. Department of Computer Science, University of Illinois at Chicago.
[4] Bo Pang (Yahoo! Research, 701 First Ave., Sunnyvale, CA 94089, U.S.A., bopang@yahoo-inc.com) and Lillian Lee (Computer Science Department, Cornell University, Ithaca, NY 14853, U.S.A., llee@cs.cornell.edu). Opinion mining and sentiment analysis.
[5] How sentiment analysis works in machines (an introduction). www.slideshare.net
[6] Web Demo Lexalytics.htm

More Related Content

What's hot

IRJET- A Survey on Graph based Approaches in Sentiment Analysis
IRJET- A Survey on Graph based Approaches in Sentiment AnalysisIRJET- A Survey on Graph based Approaches in Sentiment Analysis
IRJET- A Survey on Graph based Approaches in Sentiment Analysis
IRJET Journal
 
Methods for Sentiment Analysis: A Literature Study
Methods for Sentiment Analysis: A Literature StudyMethods for Sentiment Analysis: A Literature Study
Methods for Sentiment Analysis: A Literature Study
vivatechijri
 
A Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion MiningA Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion Mining
IJSRD
 
A Survey on Sentiment Categorization of Movie Reviews
A Survey on Sentiment Categorization of Movie ReviewsA Survey on Sentiment Categorization of Movie Reviews
A Survey on Sentiment Categorization of Movie Reviews
Editor IJMTER
 
Sentiment Analysis and Classification of Tweets using Data Mining
Sentiment Analysis and Classification of Tweets using Data MiningSentiment Analysis and Classification of Tweets using Data Mining
Sentiment Analysis and Classification of Tweets using Data Mining
IRJET Journal
 
An Improved sentiment classification for objective word.
An Improved sentiment classification for objective word.An Improved sentiment classification for objective word.
An Improved sentiment classification for objective word.
IJSRD
 
A Novel Voice Based Sentimental Analysis Technique to Mine the User Driven Re...
A Novel Voice Based Sentimental Analysis Technique to Mine the User Driven Re...A Novel Voice Based Sentimental Analysis Technique to Mine the User Driven Re...
A Novel Voice Based Sentimental Analysis Technique to Mine the User Driven Re...
IRJET Journal
 
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEWSENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
Journal For Research
 
Sentiment classification for product reviews (documentation)
Sentiment classification for product reviews (documentation)Sentiment classification for product reviews (documentation)
Sentiment classification for product reviews (documentation)
Mido Razaz
 
Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization
Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven SummarizationTowards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization
Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization
Panos Alexopoulos
 
Semantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with IdiomsSemantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with Idioms
Waqas Tariq
 
IRJET- Interpreting Public Sentiments Variation by using FB-LDA Technique
IRJET- Interpreting Public Sentiments Variation by using FB-LDA TechniqueIRJET- Interpreting Public Sentiments Variation by using FB-LDA Technique
IRJET- Interpreting Public Sentiments Variation by using FB-LDA Technique
IRJET Journal
 
Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...
Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...
Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...
TELKOMNIKA JOURNAL
 
295B_Report_Sentiment_analysis
295B_Report_Sentiment_analysis295B_Report_Sentiment_analysis
295B_Report_Sentiment_analysisZahid Azam
 
INFORMATION RETRIEVAL FROM TEXT
INFORMATION RETRIEVAL FROM TEXTINFORMATION RETRIEVAL FROM TEXT
INFORMATION RETRIEVAL FROM TEXT
ijcseit
 
Conceptual Sentiment Analysis Model
Conceptual Sentiment Analysis Model Conceptual Sentiment Analysis Model
Conceptual Sentiment Analysis Model
IJECEIAES
 
Query recommendation papers
Query recommendation papersQuery recommendation papers
Query recommendation papersAshish Kulkarni
 
IRJET- The Sentimental Analysis on Product Reviews of Amazon Data using the H...
IRJET- The Sentimental Analysis on Product Reviews of Amazon Data using the H...IRJET- The Sentimental Analysis on Product Reviews of Amazon Data using the H...
IRJET- The Sentimental Analysis on Product Reviews of Amazon Data using the H...
IRJET Journal
 
Sentiment Analysis of Feedback Data
Sentiment Analysis of Feedback DataSentiment Analysis of Feedback Data
Sentiment Analysis of Feedback Data
ijtsrd
 

What's hot (19)

IRJET- A Survey on Graph based Approaches in Sentiment Analysis
IRJET- A Survey on Graph based Approaches in Sentiment AnalysisIRJET- A Survey on Graph based Approaches in Sentiment Analysis
IRJET- A Survey on Graph based Approaches in Sentiment Analysis
 
Methods for Sentiment Analysis: A Literature Study
Methods for Sentiment Analysis: A Literature StudyMethods for Sentiment Analysis: A Literature Study
Methods for Sentiment Analysis: A Literature Study
 
A Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion MiningA Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion Mining
 
A Survey on Sentiment Categorization of Movie Reviews
A Survey on Sentiment Categorization of Movie ReviewsA Survey on Sentiment Categorization of Movie Reviews
A Survey on Sentiment Categorization of Movie Reviews
 
Sentiment Analysis and Classification of Tweets using Data Mining
Sentiment Analysis and Classification of Tweets using Data MiningSentiment Analysis and Classification of Tweets using Data Mining
Sentiment Analysis and Classification of Tweets using Data Mining
 
An Improved sentiment classification for objective word.
An Improved sentiment classification for objective word.An Improved sentiment classification for objective word.
An Improved sentiment classification for objective word.
 
A Novel Voice Based Sentimental Analysis Technique to Mine the User Driven Re...
A Novel Voice Based Sentimental Analysis Technique to Mine the User Driven Re...A Novel Voice Based Sentimental Analysis Technique to Mine the User Driven Re...
A Novel Voice Based Sentimental Analysis Technique to Mine the User Driven Re...
 
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEWSENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
 
Sentiment classification for product reviews (documentation)
Sentiment classification for product reviews (documentation)Sentiment classification for product reviews (documentation)
Sentiment classification for product reviews (documentation)
 
Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization
Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven SummarizationTowards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization
Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization
 
Semantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with IdiomsSemantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with Idioms
 
IRJET- Interpreting Public Sentiments Variation by using FB-LDA Technique
IRJET- Interpreting Public Sentiments Variation by using FB-LDA TechniqueIRJET- Interpreting Public Sentiments Variation by using FB-LDA Technique
IRJET- Interpreting Public Sentiments Variation by using FB-LDA Technique
 
Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...
Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...
Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...
 
295B_Report_Sentiment_analysis
295B_Report_Sentiment_analysis295B_Report_Sentiment_analysis
295B_Report_Sentiment_analysis
 
INFORMATION RETRIEVAL FROM TEXT
INFORMATION RETRIEVAL FROM TEXTINFORMATION RETRIEVAL FROM TEXT
INFORMATION RETRIEVAL FROM TEXT
 
Conceptual Sentiment Analysis Model
Conceptual Sentiment Analysis Model Conceptual Sentiment Analysis Model
Conceptual Sentiment Analysis Model
 
Query recommendation papers
Query recommendation papersQuery recommendation papers
Query recommendation papers
 
IRJET- The Sentimental Analysis on Product Reviews of Amazon Data using the H...
IRJET- The Sentimental Analysis on Product Reviews of Amazon Data using the H...IRJET- The Sentimental Analysis on Product Reviews of Amazon Data using the H...
IRJET- The Sentimental Analysis on Product Reviews of Amazon Data using the H...
 
Sentiment Analysis of Feedback Data
Sentiment Analysis of Feedback DataSentiment Analysis of Feedback Data
Sentiment Analysis of Feedback Data
 

Similar to NLP Ecosystem

L017358286
L017358286L017358286
L017358286
IOSR Journals
 
Evaluating sentiment analysis and word embedding techniques on Brexit
Evaluating sentiment analysis and word embedding techniques on BrexitEvaluating sentiment analysis and word embedding techniques on Brexit
Evaluating sentiment analysis and word embedding techniques on Brexit
IAESIJAI
 
Co-Extracting Opinions from Online Reviews
Co-Extracting Opinions from Online ReviewsCo-Extracting Opinions from Online Reviews
Co-Extracting Opinions from Online Reviews
Editor IJCATR
 
LSTM Based Sentiment Analysis
LSTM Based Sentiment AnalysisLSTM Based Sentiment Analysis
LSTM Based Sentiment Analysis
ijtsrd
 
A Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion MiningA Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion Mining
IJSRD
 
TEXT MINING-TAPPING HIDDEN KERNELS OF WISDOM
TEXT MINING-TAPPING HIDDEN KERNELS OF WISDOMTEXT MINING-TAPPING HIDDEN KERNELS OF WISDOM
TEXT MINING-TAPPING HIDDEN KERNELS OF WISDOM
ITC Infotech
 
Text Analysis in Research
Text Analysis in ResearchText Analysis in Research
Text Analysis in Research
Bytesview
 
Vol 7 No 1 - November 2013
Vol 7 No 1 - November 2013Vol 7 No 1 - November 2013
Vol 7 No 1 - November 2013
ijcsbi
 
Sentimental analysis of audio based customer reviews without textual conversion
Sentimental analysis of audio based customer reviews without textual conversionSentimental analysis of audio based customer reviews without textual conversion
Sentimental analysis of audio based customer reviews without textual conversion
IJECEIAES
 
Dictionary Based Approach to Sentiment Analysis - A Review
Dictionary Based Approach to Sentiment Analysis - A ReviewDictionary Based Approach to Sentiment Analysis - A Review
Dictionary Based Approach to Sentiment Analysis - A Review
INFOGAIN PUBLICATION
 
Running head DEPRESSION PREDICTION DRAFT1DEPRESSION PREDICTI.docx
Running head DEPRESSION PREDICTION DRAFT1DEPRESSION PREDICTI.docxRunning head DEPRESSION PREDICTION DRAFT1DEPRESSION PREDICTI.docx
Running head DEPRESSION PREDICTION DRAFT1DEPRESSION PREDICTI.docx
healdkathaleen
 
0 Employer Employee Scheme.pptx
0 Employer Employee Scheme.pptx0 Employer Employee Scheme.pptx
0 Employer Employee Scheme.pptx
Dr. J. D. Chandrapal
 
Hybrid Deep Learning Model for Multilingual Sentiment Analysis
Hybrid Deep Learning Model for Multilingual Sentiment AnalysisHybrid Deep Learning Model for Multilingual Sentiment Analysis
Hybrid Deep Learning Model for Multilingual Sentiment Analysis
IRJET Journal
 
A simplified classification computational model of opinion mining using deep ...
A simplified classification computational model of opinion mining using deep ...A simplified classification computational model of opinion mining using deep ...
A simplified classification computational model of opinion mining using deep ...
IJECEIAES
 
Information Retrieval on Text using Concept Similarity
Information Retrieval on Text using Concept SimilarityInformation Retrieval on Text using Concept Similarity
Information Retrieval on Text using Concept Similarity
rahulmonikasharma
 
Implementation of Semantic Analysis Using Domain Ontology
Implementation of Semantic Analysis Using Domain OntologyImplementation of Semantic Analysis Using Domain Ontology
Implementation of Semantic Analysis Using Domain Ontology
IOSR Journals
 
opinion feature extraction using enhanced opinion mining technique and intrin...
opinion feature extraction using enhanced opinion mining technique and intrin...opinion feature extraction using enhanced opinion mining technique and intrin...
opinion feature extraction using enhanced opinion mining technique and intrin...
INFOGAIN PUBLICATION
 
Sentiment analysis on_unstructured_review-1
Sentiment analysis on_unstructured_review-1Sentiment analysis on_unstructured_review-1

NLP Ecosystem

  • 1. A for Analytics
CONTINUING EDUCATION PROGRAMME, DMS IIT-DELHI, 3/24/2013-6/23/2013
Harshad B. Madhamshettiwar
Paper submitted in partial fulfillment of the requirements for the Certificate of Business Analytics and Optimization
  • 2. Objective:
This paper explains why text analytics matters to any kind of business, how sentiment analysis is used to make better decisions, and the science behind it.

Background:
Times have changed. People now share their opinions actively, unlike in the era before the World Wide Web, when an individual’s opinions reached only family and friends. The Web has dramatically changed the way people express their views and opinions, which can influence the decisions of thousands or millions of people, directly or indirectly.

Gut-based decisions no longer serve a business well; it is time for data-driven decision making and fact-based goals, using statistical science and various analytical tools.

One analytics method that helps make better decisions is text analytics. Nowadays business profits and strategies depend on customer feedback and demand. Companies increasingly focus on collecting documented feedback from sources such as surveys, social networking sites and blogs. This generates a huge amount of text data, including answers to questionnaires, “undoubtedly the most unstructured or semi-structured data.”

Once this data has been churned and cleaned, it becomes a source of critical information with many opportunities hidden in it, in the form of customers’ behavior and sentiments: what the customer thinks and feels about your services and products, how accountable and reliable the customer judges the company to be, and whether he or she is promoting you with good or bad reviews.

Context:
In many cases, opinions are hidden in long forum posts and blogs. It is difficult for a human reader to find relevant sources, extract related sentences with opinions, read them, summarize them, and organize them into usable forms.
Thus, automated opinion discovery and summarization systems are needed. Sentiment analysis, also known as opinion mining, grows out of this need. It is a challenging natural language processing or text mining problem. [1]

This paper examines why companies use sentiment analysis so widely and how it works.

Literature Review:
Text analytics reveals insights from electronic text materials, associates them so they go to the right person and place, and provides intelligence to know what you need to do next, whether that is answering complex search-and-retrieval questions, presenting relevant content to internal or external Web users, or predicting which phrase will best affect sentiments.

Sentiment analysis automatically locates and extracts sentiment from online materials, such as social networking sites, comments and blogs on the Internet, as well as internal electronic documents.

Text analytics brings together multiple approaches: [1]
• Text mining involves techniques from several areas, including the fields of computational linguistics and information retrieval, to structure text into a numeric representation for use in traditional data mining and predictive analysis.
  • 3. • Natural language processing, a discipline from the field of artificial intelligence, combines computer science and linguistics to identify meaningful concepts, attributes and opinions in the spoken or written word.
• Best of both worlds (hybrid approach)

Data mining approach:
A data mining approach to sentiment analysis translates an unstructured text problem into one that makes predictions on structured, quantitative data. The approach borrows several techniques from the computational linguistics and information retrieval communities to represent the text numerically, and then applies traditional data mining techniques to this numeric representation. In the end, a target variable is identified and a pattern for predicting sentiment polarity is discovered from the training data. This pattern can then be used to predict new observations.

The first step in creating the numeric representation is to convert the entire training collection into a document-by-term frequency matrix. Each document is parsed into individual terms, or term/part-of-speech pairs. The set of all terms then becomes the variables of the data set, so that documents are represented as vectors of length equal to the number of distinct terms in the collection. These vectors are very sparse, containing mostly zeroes, because any one document contains a very small percentage of the terms in the collection. Once the documents are represented as vectors, the frequencies in each cell can be weighted with a function that takes into account the distribution of the term across the collection and relative to the levels of the target variable.
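The document-by-term matrix described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool: the tokenizer is a naive whitespace split, the three sample documents are invented, and the weighting is plain TF-IDF, which accounts only for the distribution of a term across the collection and not for the levels of the target variable.

```python
import math
from collections import Counter

PUNCT = ".,!?;:\"'()"

def tokenize(text):
    # Naive tokenizer (assumption): split on whitespace, strip punctuation, lowercase.
    tokens = (tok.strip(PUNCT).lower() for tok in text.split())
    return [tok for tok in tokens if tok]

def document_term_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts term vocab[j] in docs[i]."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)  # mostly zeroes: the vectors are sparse
        for term, count in Counter(toks).items():
            row[index[term]] = count
        matrix.append(row)
    return vocab, matrix

def tfidf_weight(matrix):
    """Weight each raw count by log(N / document frequency) of its term."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    doc_freq = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / df) if df else 0.0 for df in doc_freq]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in matrix]

# Hypothetical three-document collection, for illustration only.
docs = ["The service was great", "The food was bad", "Great food"]
vocab, counts = document_term_matrix(docs)
weighted = tfidf_weight(counts)
```

Terms that occur in every document receive a weight of zero under this scheme, which is one simple way of down-weighting words that carry no discriminating information.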
After these document vectors are formed, a dimension reduction technique such as the singular value decomposition (see Taming Text with the SVD, Albright, 2004) is typically used to represent each document in a reduced-dimensional space of maybe 50 to 100 variables, where each variable is a linear combination of the weighted terms that originally represented each document. Finally, these reduced-dimensional vectors, together with the sentiment variable, can be supplied to a predictive model. The model will attempt to learn from the training data by utilizing patterns in the reduced-dimensional vector. This predictive model will then create a function that will predict the sentiment for any document.

Benefits of the data mining approach
The data mining approach is appealing because it is based on learning patterns that are useful for making automated, efficient predictions. The algorithms are capable of discovering unimagined and complicated patterns that would be beyond what a human could anticipate. Frequently, a data mining approach can beat a rule-based approach in topic classification. Of course, this is dependent on having enough training data to build the model.

Drawback of the data mining approach
The vector-based representation of a document, which is required for data mining techniques, does not maintain information that is potentially important to sentiment classification. For example, the vector representation does not capture when terms are close to one another in the document, whether one term precedes another, or any other contextual cues. The order of terms in a phrase can significantly affect meaning. Consider the phrases “… night for a great movie” and “… great night for a movie”. These two phrases convey two different meanings; yet in a vector representation, the phrases have an identical representation.
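The point about the two phrases can be checked directly. The sketch below counts terms with Python's `Counter` as a stand-in for one row of the term-frequency matrix. The bigram comparison at the end is an added illustration, not something the paper proposes, showing that the difference only becomes visible once word order is kept.

```python
from collections import Counter

def bag_of_words(phrase):
    # Term counts only; word order is discarded.
    return Counter(phrase.lower().split())

def bigrams(phrase):
    # Counts of adjacent word pairs, which retain local word order.
    tokens = phrase.lower().split()
    return Counter(zip(tokens, tokens[1:]))

phrase_a = "night for a great movie"
phrase_b = "great night for a movie"

same_bag = bag_of_words(phrase_a) == bag_of_words(phrase_b)
same_bigrams = bigrams(phrase_a) == bigrams(phrase_b)

print(same_bag)      # the two bags of words are identical
print(same_bigrams)  # the bigram counts differ
```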
In addition, most predictive models provide little feedback to the user as to precisely why a particular document was classified as having positive or negative polarity. So when you attempt to understand what positive things people said in a particular document, you frequently have to read the entire document to discover the answer.
  • 4. As a final drawback, forming the training and validation sets is an essential component of learning a predictive model, but it can be very time-consuming and challenging. A rating needs to be provided for every document, and if there are attributes of documents that you wish to use to measure sentiment, you will need to provide a rating for each of these as well. Another complication is that two different reviewers frequently assign two different sentiment ratings to the same document. This can introduce unexpected errors in building and measuring the performance of your model.

Natural language processing approach:
Natural language processing (NLP) is a field of artificial intelligence that deals with automatically extracting meaning from natural language text. As discussed in the introduction of this paper, it is very challenging to get machines to understand text at the same level as humans. Doing this with the specific goal of extracting sentiment is even more challenging.

NLP combines computer science and linguistics to identify meaningful concepts and attributes in the spoken or written word. In the context of text analytics, this analysis most often applies to electronic documents. The rule-based NLP methods use certain entities and syntactic patterns in the text to understand its meaning. Figure 1 below shows the steps involved in sentiment analysis by the NLP approach. [3][5]

Figure 1: Sentiment analysis by NLP approach.

Benefits of the NLP approach
The major advantage of rule-based methods is the amount of control they give rule developers over how the analysis will be performed. Developers can use their knowledge of the domain and the language within it to develop rules that have high precision.
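To make the idea of a hand-written rule concrete, here is a minimal sketch of a rule-based scorer. Everything in it is an assumption made for illustration: the lexicon words and weights, the negator list, and the single rule that a negator flips the polarity of the word immediately after it. It is not the rule set of any product cited in this paper. The function returns the matched evidence alongside the score, so the reason for a rating can be read off directly.

```python
# Hypothetical opinion lexicon and negators; a real rule developer would
# build these from domain knowledge.
LEXICON = {"great": 1.0, "good": 0.5, "reliable": 0.5, "bad": -0.5, "terrible": -1.0}
NEGATORS = {"not", "never", "no"}

def score_sentence(sentence):
    """Return (total score, evidence), where evidence lists each matched phrase and its value."""
    tokens = [tok.strip(".,!?").lower() for tok in sentence.split()]
    evidence = []
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True  # rule: a negator flips the next word only
            continue
        if tok in LEXICON:
            value = -LEXICON[tok] if negate else LEXICON[tok]
            phrase = ("not " + tok) if negate else tok
            evidence.append((phrase, value))
        negate = False
    total = sum(value for _, value in evidence)
    return total, evidence

print(score_sentence("The service was not good."))
print(score_sentence("Great product, terrible support."))
```

Because each rule is explicit, refining the model means editing the lexicon or adding a rule, with no retraining step.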
[Figure 1 content: from text analytics to sentiment analysis (NLP)]
• Defining problems of sentiment analysis
• Sentiment and subjectivity classification: document-level sentiment classification; sentence-level subjectivity and sentiment classification; opinion lexicon generation
• Feature-based sentiment analysis: feature extraction; opinion orientation identification
• Opinion search and retrieval
• Opinion spam and utility of opinions: opinion spam; utility of reviews
• Sentiment analysis of comparative sentences: problem definition; identification of comparative sentences; extraction of objects and object features in comparative sentences; identification of preferred objects in comparative sentences
  • 5. Unlike statistical analysis, the results of rule-based analysis are easily interpretable. This is very important for real-life applications where the analysts need to know exactly why a document or an attribute within a document was tagged as positive or negative. In other words, analysts need to know exactly what sentences, keywords or context within the document triggered the positive or negative sentiment. Figure 2 shows an example of this. [6]

Phrases are marked in the original text based on their sentiment score as Negative, Neutral or Positive. The document sentiment is: +0.202

Summary: A beginner in analytics is like a child learning Alphabets for first time; it seems to be very complex in first go but then practice makes man perfect….slowly... For us; analytics is same, its just waiting for us to learn more and keep learning and then it will become a part of us…slowly child will become an expert...

Entities: No entities could be found.

Themes               Evidence   Sentiment
learning alphabets   4          +0.20
…slowly child        4          +0.20
beginning child      4          +0.20

Topics      Score
Education   0.72

Figure 2: Example showing different entities that were used for rule-based analysis.

Rule-based methods are completely unsupervised; that is, they do not require any training data. This is a big advantage in real-life applications where training data is scarce. The lack of training data is more pronounced when it comes to granular sentiment analysis (sentiment derived at the level of objects and attributes).

Another advantage of rule-based methods is that the rules can be refined over time based on feedback from analysts or subject-matter experts. The more time the rule developer spends on refining the rules, the better the results.
  • 6. Language evolves over time and people start using newer terms to express their sentiments. This is especially true for social media, where the language used changes all the time. In such cases, rule-based methods give you the flexibility needed to adjust your models accordingly.

Drawback of the NLP approach
The disadvantage of rule-based methods is that they require a lot of human involvement in developing the rules. These methods rely completely on the domain knowledge of rule developers. It might take a few weeks to come up with a strong rule-based model for a new domain. However, once you have a strong rule-based model for a domain, you can reuse that model with some minor modifications for different applications within the domain.

The importance of validation data is often underestimated while developing these models. The rules being written must be generic enough to handle all possible cases. Inexperienced rule developers tend to over-fit their rules to the sample data they are working with. Such rules might not work well when tested on different data sets. So, rule developers must make sure they validate the rules on different data sets before considering a model ready to deploy.

Discussion:
We now know how sentiment analytics works across a wide range of industries. Text analytics can be approached from two different directions:
• Discovery-driven. When you don’t know where to start, a discovery-driven approach helps identify key patterns and attributes in the unstructured data at hand. This exploration reveals new insights, which are then used to define the structure, such as the categories and concepts you will use.
• Domain-driven. If there is already an understanding of the data or some domain knowledge regarding which terms and phrases are meaningful, you can start with this knowledge and find where it exists in the materials.
Both approaches are valid, and more importantly, they complement each other. “Discovery of concepts can be used to define a structure or taxonomy for the data. On the other hand, content that doesn’t fit into a predefined structure can be further explored using discovery to find previously unknown information.” Organizations in a variety of industries, from the public and private sector, from manufacturing to finance to health care, are using these approaches in inventive ways.

Figure 3: Industries adopting text and sentiment analytics [2]
[Figure 3 content: Text and Sentiment analysis, surrounded by Government and Research; Health and Life Sciences; Finance; Media and Publishing; Film Entertainment Industry; E-Business]

All these industries are using sentiment analytics because the reviews have economic impact.

Economic impact of Reviews [4]
As mentioned, many readers of online reviews say that these reviews significantly influence their purchasing decisions. However, while these readers may have believed that they were “significantly influenced”, perception and reality can differ.
  • 7. A key reason to understand the real economic impact of reviews is that the results of such an analysis have important implications for how much effort companies might or should want to expend on online reputation monitoring and management. Given the rise of online commerce, it is not surprising that a body of work centered within the economics and marketing literature studies the question of whether the polarity (often referred to as “valence”) and/or volume of reviews available online have a measurable, significant influence on actual consumer purchasing.

One way to acquire a good reputation is, of course, by receiving many positive reviews of oneself as a merchant; another is for the products one offers to receive many positive reviews. For the purposes of our discussion, we regard experiments wherein the buying is hypothetical as being out of scope; instead, we focus on economic analyses of the behavior of people engaged in real shopping and spending real money.

The general form that most studies take is to use some form of hedonic regression to analyze the value and the significance of different item features to some function, such as a measure of utility to the customer, using previously recorded data. Specific economic functions that have been examined include revenue (box-office take, sales rank on Amazon, etc.), revenue growth, stock trading volume, and measures that auction sites like eBay make available, such as bid price or probability of a bid or sale being made.

It is important to note that some conclusions drawn from one domain often do not carry over to another; for instance, reviews seem to be influential for big-ticket items but less so for cheaper items. But there are also conflicting findings within the same domain.
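As a rough illustration of the regression idea behind such studies, the sketch below fits ordinary least squares with a single feature, average review rating, against a log-sales outcome. A real hedonic regression would include many item features and significance tests. The numbers are invented purely to show the mechanics and are not taken from any study.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (single feature)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data (assumption): average star rating and log of units sold.
avg_rating = [1.0, 2.0, 3.0, 4.0, 5.0]
log_sales = [2.0, 2.5, 3.0, 3.5, 4.0]

slope, intercept = ols_fit(avg_rating, log_sales)
print(slope, intercept)
```

The fitted slope is the estimated change in the outcome per unit change in rating, which is the kind of quantity these studies then test for significance.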
Moreover, different subsegments of the consumer population may react differently: for example, people who are more highly motivated to purchase may take ratings more seriously. Additionally, in some studies, positive ratings have an effect but negative ones don’t, and in other studies the opposite effect is seen; the timing of such feedback and various characteristics of the merchant or of the feedback itself (e.g., volume) may also be a factor.

Nonetheless, to gloss over many details for the sake of brevity: if one allows any effect (including correlation, even if said correlation is shown to be not predictive) that passes a statistical significance test at the .05 level to be classed as “significant”, then many studies find that review polarity has a significant economic effect.

Conclusion:
Independently, both the domain knowledge and the data mining approaches to sentiment analysis have their strengths and weaknesses; but hopefully you will not be forced to choose between using one or the other for your analysis. In this paper, we have shown that the two approaches complement one another. So, while the NLP approach leverages the rule builder’s domain knowledge, text mining can also be used by that person to improve, clarify or correct how that knowledge relates to the particular collection being analyzed.

References:
  • 8. [1] White Paper: Combining Knowledge and Data Mining to Understand Sentiment – A Practical Assessment of Approaches (www.sas.com/offices)
[2] Text Analytics 101: Improve Decision-Making by Incorporating Unstructured Data – Words and Images – into Analytic Processes. Insights from a webinar in the SAS Applying Business Analytics Series. Originally broadcast in April 2010.
[3] Sentiment Analysis and Subjectivity. Bing Liu, Department of Computer Science, University of Illinois at Chicago.
[4] Opinion mining and sentiment analysis. Bo Pang (Yahoo! Research, 701 First Ave., Sunnyvale, CA 94089, U.S.A., bopang@yahoo-inc.com) and Lillian Lee (Computer Science Department, Cornell University, Ithaca, NY 14853, U.S.A., llee@cs.cornell.edu).
[5] How sentiment analysis works in machines (an introduction). www.slideshare.net
[6] Web Demo Lexalytics.htm