By: Pradeep Pujari
• Sentiment Analysis?
• Sentiment Analysis – General Architecture
• Little Lucene
• Sentiment Analysis and Solr
• Applications of Sentiment Analysis
• Code Walkthrough
Who am I?
Working mostly in the Search domain
Search = IR + ML + NLP
Works for
Contributing to SolrSherlock
- Open Source Project
http://solrsherlock.github.io/SolrSherlock/
What is Sentiment Analysis?
A linguistic analysis technique that identifies opinion early in a piece of text.
The movie is great.
The movie stars Mr. X
The movie is horrible.
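The three movie sentences above can be told apart with even a very small word list. The following is a minimal sketch, assuming a hand-made lexicon; the class name LexiconPolarity and both word sets are hypothetical and are not taken from the talk.

```java
import java.util.Locale;
import java.util.Set;

final class LexiconPolarity {
    // Tiny hand-made word lists, purely illustrative (not a real sentiment lexicon).
    private static final Set<String> POSITIVE = Set.of("great", "good", "excellent");
    private static final Set<String> NEGATIVE = Set.of("horrible", "bad", "awful");

    // Counts positive and negative words and maps the balance to a label.
    static String classify(String sentence) {
        int score = 0;
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (POSITIVE.contains(token)) score++;
            if (NEGATIVE.contains(token)) score--;
        }
        if (score > 0) return "positive";
        if (score < 0) return "negative";
        return "neutral";
    }

    public static void main(String[] args) {
        String[] sentences = {"The movie is great.", "The movie stars Mr. X", "The movie is horrible."};
        for (String s : sentences) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

A word list like this ignores negation and sarcasm, which is one reason the task is labelled challenging.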
[Figure: difficulty scale running from "Too easy" to "Too hard"; labels: Challenging, misclassification]
[Figure: Sentiment Analysis shown together with NLP and Cognitive Science]
Humans can easily understand emotions.
Can a machine be trained to do it?
• SA offers organizations the ability to monitor sentiment in real time and act accordingly
• Marketing managers, PR firms, campaign managers, politicians, equity investors and online shoppers are direct beneficiaries
• http://www.tweetfeel.com
• http://www.nytimes.com/interactive/us/politics/2010-twitter-candidates.html
• Document-level: supervised/unsupervised learning
• Sentence-level: supervised learning
• Feature-based sentiment analysis: all NPs (noun phrases) in the corpus and their polarity
• Sentiment lexicon acquisition: WordNet
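For the supervised, document-level case, the speaker notes mention naïve Bayes. The following is a minimal sketch of a multinomial naïve Bayes classifier with add-one smoothing; the class NaiveBayesSentiment, its method names and the tokenizer are assumptions made for illustration, not code from the talk.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class NaiveBayesSentiment {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // label -> word -> count
    private final Map<String, Integer> totalWords = new HashMap<>();              // label -> number of tokens
    private final Map<String, Integer> docCounts = new HashMap<>();               // label -> number of documents
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("\\W+");
    }

    // Records one labelled training document.
    void train(String label, String text) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : tokenize(text)) {
            if (w.isEmpty()) continue;
            counts.merge(w, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    // Returns the label with the highest log-probability for the given text.
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log(docCounts.get(label) / (double) totalDocs); // class prior
            for (String w : tokenize(text)) {
                if (w.isEmpty()) continue;
                int count = wordCounts.get(label).getOrDefault(w, 0);
                // add-one (Laplace) smoothing so unseen words do not zero out the product
                score += Math.log((count + 1.0) / (totalWords.getOrDefault(label, 0) + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

Usage would be a series of train(label, text) calls over labelled reviews followed by classify(text) on new ones; a real system would need far more training data than a handful of sentences.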
• Open-source, Java-based search engine
• Provides document indexing with arbitrary fields and fast search
• Several relevance and ranking algorithms
1. Create an index
2. Add ‘document’ representations of items
3. Construct queries
4. Ask for results (they come back scored)
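// Sketch against the Lucene 4+ API; ItemInfo, getItems() and indexFile come from the application.
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.store.*;

IndexWriterConfig config = /* configure */ ;
Directory dir = FSDirectory.open(indexFile);
IndexWriter w = new IndexWriter(dir, config);
for (ItemInfo item : getItems()) {
    Document doc = new Document();
    // TextField values are analyzed (tokenized); Store.YES keeps the raw value for display
    doc.add(new TextField("title", item.title, Field.Store.YES));
    doc.add(new TextField("tags", item.tags, Field.Store.YES));
    w.addDocument(doc);  // IndexWriter exposes addDocument(), not add()
}
w.close();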
IndexSearcher idx = getIndexSearcher();
IndexReader reader = idx.getIndexReader();
// q is the Query built in step 3; ask for the n + 1 best-scoring hits
TopDocs results = idx.search(q, n + 1);
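The four steps (create an index, add documents, construct a query, ask for scored results) can also be seen in miniature without Lucene. The following is a toy in-memory inverted index, not Lucene's implementation; the class TinyIndex and its term-frequency scoring are assumptions chosen to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class TinyIndex {
    private final List<String> docs = new ArrayList<>();                          // doc id -> original text
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>(); // term -> (doc id -> term frequency)

    // Step 2: add a document and record each of its terms in the postings lists.
    void add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) continue;
            postings.computeIfAbsent(term, k -> new HashMap<>()).merge(id, 1, Integer::sum);
        }
    }

    // Steps 3 and 4: score each document by summed term frequency and return the top n.
    List<String> search(String query, int n) {
        Map<Integer, Integer> scores = new HashMap<>();
        for (String term : query.toLowerCase(Locale.ROOT).split("\\W+")) {
            Map<Integer, Integer> hits = postings.get(term);
            if (hits == null) continue;
            for (Map.Entry<Integer, Integer> hit : hits.entrySet()) {
                scores.merge(hit.getKey(), hit.getValue(), Integer::sum);
            }
        }
        List<Integer> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a)); // highest score first
            return byScore != 0 ? byScore : Integer.compare(a, b);       // ties: earlier document first
        });
        List<String> results = new ArrayList<>();
        for (int i = 0; i < Math.min(n, ids.size()); i++) {
            results.add(docs.get(ids.get(i)));
        }
        return results;
    }
}
```

Lucene replaces the raw term-frequency sum with proper relevance models and stores its postings on disk, but the shape of the workflow is the same.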
• PyLucene provides Python bindings for Java Lucene
• Lucy is written in C, with bindings for other languages
• Lucene.NET is the .NET port
• Solr provides a search server (with a REST API) on top of Lucene
Solr?
[Figure: Solr architecture. Components shown: HTTP Request Servlet, Admin Interface, Update Servlet, Standard Request Handler, Custom Request Handler, Response Writer, Update Handler, Solr Core (config, Caching, Analysis, UIMA), Lucene]
Why Solr?
Linguistics module
Stems, lemmas and synonyms
Multi-language capability
CJKAnalyzer, UIMA analyzers
UIMA integration
UpdateProcessorChain
Why Solr?
Extract domain-specific entities and concepts
Time and cost
Solr setup – 5 mins
UIMA annotators – 5 days
Enrich text, write to a dedicated field
Applications:
Tagging entities in review text
I wasn't really in the market for another tablet, but my girlfriend ended
up getting one for me so she got me on this one. I would like to say that
this tablet reminds me of the first Motorola Droid smartphone that came
out several years back. The phone jam packed a ton of bells & whistles
into its hardware and software to give a lot of bang for your buck. This
is what it feels like amazon has done with the Kindle Fire 8.9. They have
put a lot of advanced hardware and innovative software, so for the
average user, specially someone who absorbs a lot of media, you get a
lot for the price. But just because you get a lot for the price, doesn't
mean it is without its flaws.
Applications:
Consumer feedback about products
Which product features are more relevant?
Polarity
Digital SLR with Full 1080p HD Video
There are many preprogrammed scene modes
that make this a very easy camera to use.
The picture quality is beyond belief, and
even better for the price.
[Figure callouts on the review: Price, Use case]
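Feature-based analysis of a review like the camera example can be sketched by attaching each sentence's polarity to the product features it mentions. This is a naive illustration under stated assumptions: the feature list, the opinion words and the class name FeatureSentiment are all hypothetical, and matching is plain substring search.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class FeatureSentiment {
    // Hypothetical feature and opinion vocabularies for a camera review.
    private static final List<String> FEATURES = List.of("picture quality", "price", "battery");
    private static final Set<String> POSITIVE = Set.of("easy", "better", "great");
    private static final Set<String> NEGATIVE = Set.of("poor", "worse", "horrible");

    // Returns feature -> summed polarity of the sentences that mention it.
    static Map<String, Integer> score(String review) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String sentence : review.toLowerCase(Locale.ROOT).split("[.!?]+")) {
            int polarity = 0;
            for (String token : sentence.split("[^a-z]+")) {
                if (POSITIVE.contains(token)) polarity++;
                if (NEGATIVE.contains(token)) polarity--;
            }
            for (String feature : FEATURES) {
                // naive substring match; a real system would match noun phrases
                if (sentence.contains(feature)) {
                    result.merge(feature, polarity, Integer::sum);
                }
            }
        }
        return result;
    }
}
```

Features that never appear in the review are simply absent from the returned map, which keeps "not mentioned" distinct from "mentioned neutrally".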
Why UIMA?
The UIMA framework manages components and data flow – no coding
Deploy a pipeline of analysis engines (AEs)
AEs wrap NLP algorithms
[Figure: aggregate analysis engine made of Language Detection, Sentence Annotator, POS Annotator and NER; entity labels shown: Person, Place, Organization]
Solr+UIMA
[Figure: Data, Solr UpdateRequestProcessor, UIMA AE, Lucene index, Solr QParser]
NLP+UIMA
Use POS in query understanding
boosting terms
Synonym expansion
Extract concepts/entities
Faceting using entities
Identify places in query and use spatial queries
Ideas: Sentiment Analysis App
Identify Subjective Sentences from text
Remove noisy sentences – regex, conditional probability
Graph min cut – LingPipe
Subjectivity Lexicons
Discard Facts and Objective Sentences
[Figure: a subjectivity detector splits text into Subjective and Objective; a polarity classifier follows]
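The subjectivity detector followed by a polarity classifier can be sketched as two small stages. The version below is only an illustration: it uses hypothetical word lists in place of a subjectivity lexicon or the graph min-cut approach, and the class name TwoStageSentiment is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class TwoStageSentiment {
    // Hypothetical mini-lexicons; a real system would use a subjectivity lexicon or a trained model.
    private static final Set<String> POSITIVE = Set.of("great", "good", "love");
    private static final Set<String> NEGATIVE = Set.of("horrible", "bad", "hate");

    private static int polarity(String sentence) {
        int score = 0;
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (POSITIVE.contains(token)) score++;
            if (NEGATIVE.contains(token)) score--;
        }
        return score;
    }

    // Stage 1: keep only sentences that contain at least one opinion word.
    static List<String> subjectiveSentences(String text) {
        List<String> kept = new ArrayList<>();
        for (String sentence : text.split("[.!?]+")) {
            boolean hasOpinion = false;
            for (String token : sentence.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
                if (POSITIVE.contains(token) || NEGATIVE.contains(token)) hasOpinion = true;
            }
            if (hasOpinion) kept.add(sentence.trim());
        }
        return kept;
    }

    // Stage 2: polarity is decided only if stage 1 found something subjective.
    static String classify(String text) {
        List<String> subjective = subjectiveSentences(text);
        if (subjective.isEmpty()) return "neutral";
        int total = 0;
        for (String s : subjective) total += polarity(s);
        if (total > 0) return "positive";
        if (total < 0) return "negative";
        return "mixed";
    }
}
```

Discarding objective sentences first means factual statements cannot dilute or distort the polarity score of the opinionated ones.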
Ideas: Sentiment Analysis App
Sentiment intensity: SentiWordNet
WordNet-Affect: WordNet + annotated concepts
Hybrid model with an added dictionary
Update Processor Chain
[Figure: Product Reviews enter the Update Handler with its processor chain (Remove Duplicates processor, Logging processor, Custom Transform processor, Index processor), then pass through Text Analyzers into the Lucene index. Also shown: Sentence Detection processor, Sentiment Classifier, Company Name Annotator, Sentiment Score processor]
Let’s look at the code
• Data transformation or post-processing
• UpdateRequestProcessorFactory
• LogUpdateProcessorFactory
• UIMAUpdateRequestProcessorFactory
• UpdateRequestProcessorChain
◦ Pipeline of UpdateRequestProcessors
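The idea of a pipeline of update processors can be modelled in plain Java before looking at the Solr configuration. The sketch below is not the Solr API: a document is just a field-to-value map, each processor is a function, and the names ProcessorChainSketch, trimText and addSentimentScore are hypothetical. Only the field name content_text is borrowed from the configuration shown in the talk.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

final class ProcessorChainSketch {
    // Runs every processor in order, handing each one the output of the previous one.
    static Map<String, Object> run(List<UnaryOperator<Map<String, Object>>> chain, Map<String, Object> doc) {
        Map<String, Object> current = new LinkedHashMap<>(doc);
        for (UnaryOperator<Map<String, Object>> processor : chain) {
            current = processor.apply(current);
        }
        return current;
    }

    // Example transform stage: normalise whitespace around the text field.
    static Map<String, Object> trimText(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>(doc);
        Object text = out.get("content_text");
        if (text instanceof String) {
            out.put("content_text", ((String) text).trim());
        }
        return out;
    }

    // Example enrichment stage: write a crude score to a dedicated field.
    static Map<String, Object> addSentimentScore(Map<String, Object> doc) {
        String text = String.valueOf(doc.getOrDefault("content_text", "")).toLowerCase(Locale.ROOT);
        int score = 0;
        if (text.contains("great")) score++;
        if (text.contains("horrible")) score--;
        Map<String, Object> out = new LinkedHashMap<>(doc);
        out.put("sentiment_score", score);
        return out;
    }
}
```

In Solr the same pattern is expressed by subclassing UpdateRequestProcessor and registering a factory in the chain, so that enrichment happens before the document reaches the index.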
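<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <!-- send every update through the "uima" chain; from Solr 3.2 on this parameter is named update.chain -->
    <str name="update.processor">uima</str>
  </lst>
</requestHandler>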
• Stanford NER
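<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
      </lst>
      <lst name="analysisEngine">
        <str name="defaultanalysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
      </lst>
      <!-- fields whose text is sent to the analysis engine -->
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content_text</str>
        </arr>
      </lst>
      <!-- map annotation features back onto Solr fields -->
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.DictionaryEntry</str>
          <lst name="mapping">
            <str name="feature">coveredText</str>
            <str name="field">sentiment_keyword,sentiment_type</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <!-- the slide is cut off above; a working chain normally ends with the standard log and run processors -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>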
http://lucene.apache.org/solr/
http://uima.apache.org/
http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html
http://openie.cs.washington.edu/
http://wiki.apache.org/solr/SolrUIMA
Questions ?
Thank You
Email: pradeepp@rocketmail.com

Sais svcc


Editor's Notes

  • #6 Huge explosion of sentiments today. PR firms, etc. are direct beneficiaries of SA technology.
  • #9 Classify sentences into two principal classes: subjective and objective.
  • #12 Positive, negative, neutral; naïve Bayes.
  • #21 Overall, this review is very positive about the smartphone; sentiment score.
  • #27 Using hierarchical classification, neutrality is determined first, and sentiment polarity is determined second, but only if the text is not neutral.
  • #32 No matter how you choose to import data, there is a final configuration point within Solr that allows manipulation of the imported data before it gets indexed. The UpdateRequestHandler puts documents on an update request processor chain.