Sentiment Classification with Case-Based Reasoning

Given at ICCBR conference Sep/2012, Lyon, France.



  1. Case-Based Approach to Cross-Domain Sentiment Classification
     ICCBR - Sep/2012
     Bruno Ohana, Sarah-Jane Delany, Brendan Tierney
     Dublin Institute of Technology - Ireland
  2. Outline
     ● Sentiment Classification
     ● Domain Dependence
     ● Lexicon-based methods
     ● Case-Based Approach
     ● Experiment and Results
  3. Sentiment Classification
     ● For a given piece of text, determine sentiment orientation.
     ● Positive or Negative?
       "This is by far the worst hotel experience ive ever had. the owner overbooked while i was staying there (even though i booked the room two months in advance) and made me move to another room, but that room wasnt even a hotel room!"
  4. Applications
     ● Search and Recommendation Engines.
       ○ Show only positive/negative/neutral.
     ● Market Research.
       ○ What is being said about brand X on Twitter?
     ● Ad Placement.
     ● Mediation of online communities.
  5. Domain Dependence
     Supervised Learning Methods
     ● Good performance, but:
       ○ Labeled data is expensive.
       ○ Availability for all domains is unlikely.
     ● Classifiers are domain specific.
       ○ Ex: "Kubrick" may be a good opinion predictor for film reviews, but not in other domains.
     ● (Aue & Gamon 05)
       ○ A straightforward train/test across domains yields poor results.
  6. Using a Sentiment Lexicon
     A database of terms associated with positive or negative sentiment.
     ● Manual: General Inquirer (Stone et al 67)
     ● Corpus Based (Hatzivassiloglou & McKeown 97)
     ● Lexical Induction: SentiWordNet (Esuli et al 06)
     ● Some sample sizes:
       ○ GI: 4K
       ○ SWN: 26K
     Approach:
     ● Scan the document for term occurrences; base the prediction on aggregated results for the positive/negative classes.
     ● No need for training data sets.
  7. Sentiment Classification with Lexicons
     Pipeline: POS Tagger → NegEx (negation detection) → Classifier + Sentiment Lexicon → Prediction
     Lexicon-based classification:
     ● Annotate text with POS and negation information.
     ● Identify words present in the lexicon.
       ○ Retrieve a numerical score from the lexicon indicating opinion.
     ● Aggregate the results and use a rule to make the prediction (sketch below).
       ○ Ex: max(PosScore, NegScore)
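A minimal sketch of this kind of lexicon-based classifier in Python. The toy lexicon, the tokenizer and the window-based negation handling are illustrative assumptions rather than the authors' implementation; only the max(PosScore, NegScore) decision rule comes from the slide.

    # Minimal sketch of a lexicon-based classifier (assumptions noted above).
    import re

    # Hypothetical toy lexicon: term -> (positive score, negative score)
    LEXICON = {
        "imaginative": (0.75, 0.0),
        "appealing":   (0.62, 0.0),
        "worst":       (0.0,  0.88),
        "familiar":    (0.25, 0.0),
    }

    NEGATORS = {"not", "no", "never"}  # crude stand-in for NegEx

    def classify(text, lexicon=LEXICON, window=3):
        tokens = re.findall(r"[a-z']+", text.lower())
        pos_total = neg_total = 0.0
        for i, tok in enumerate(tokens):
            if tok not in lexicon:
                continue
            pos, neg = lexicon[tok]
            # Flip polarity if a negator appears shortly before the term.
            if any(t in NEGATORS for t in tokens[max(0, i - window):i]):
                pos, neg = neg, pos
            pos_total += pos
            neg_total += neg
        # Prediction rule from the slide: max(PosScore, NegScore).
        return "positive" if pos_total >= neg_total else "negative"

    print(classify("imaginative visuals and appealing new characters"))  # -> positive
    print(classify("by far the worst hotel experience"))                 # -> negative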
  8. Sentiment Classification with Lexicons
     The computer-animated comedy "shrek" is designed to be enjoyed on different levels by different groups. for children, it offers imaginative visuals, appealing new characters mixed with a host of familiar faces, loads of action and a barrage of big laughs
     The/DT computer-animated/JJ comedy/NN " shrek/NN " is/VBZ designed/VBN to/TO be/VB enjoyed/VBN on/IN different/JJ levels/NNS by/IN different/JJ groups/NNS ./. for/IN children/NNS ,/, it/PRP offers/VBZ imaginative/JJ visuals/NNS ,/, appealing/VBG new/JJ characters/NNS mixed/VBN with/IN a/DT host/NN of/IN familiar/JJ faces/NNS ,/, loads/NNS of/IN action/NN and/CC a/DT barrage/NN of/IN big/JJ laughs/NNS
  9. Lexicon-Based Classification: Issues
     ● Performance of supervised learning methods is better.
     ● The lexicon and classifier are selected upfront.
       ○ Ex: use SWN with classifier F.
       ○ That choice can be sub-optimal.
     ● Lexicons perform differently on different domains (Ohana et al, 11).
  10. Sentiment Classification with Lexicons
      Pipeline: POS Tagger → NegEx → one of several Classifier / Sentiment Lexicon combinations → Prediction
      Classifier considerations (a configuration sketch follows):
      ● Which sentiment lexicon to use?
      ● How to apply term sentiment information to the document?
        ○ Which parts of speech to use.
        ○ Enable/disable negation detection.
        ○ How to count terms (once, every occurrence, adjusted for frequency).
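One way to make these considerations concrete is to treat each candidate as a small configuration object. The field names and default values below are assumptions for illustration, not the settings used in the paper.

    # Sketch of the classifier configuration choices listed on the slide.
    from dataclasses import dataclass

    @dataclass
    class LexiconClassifierConfig:
        lexicon: str = "SWN"                 # which sentiment lexicon to use
        pos_filter: tuple = ("JJ", "RB")     # parts of speech considered
        use_negation: bool = True            # enable/disable negation detection
        term_counting: str = "once"          # "once", "every", or "freq-adjusted"

    # Each combination is one candidate lexicon/classifier in the solution space.
    candidates = [
        LexiconClassifierConfig(lexicon="GI", term_counting="every"),
        LexiconClassifierConfig(lexicon="SWN", pos_filter=("JJ", "RB", "VB", "NN")),
    ]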
  11. Our Approach
      Build a case base using out-of-domain data where:
      ● the problem description maps to document characteristics;
      ● the solution description maps to successful combinations of lexicons/classifiers.
      Use the case base to decide which lexicon and classifier to use on a new document/domain.
  12. Experiment - Case Representation
      Problem Description (a feature-extraction sketch follows)
      ● Counts for words, tokens and sentences; average sentence size.
      ● Part-of-speech frequencies.
      ● Total syllable and monosyllable counts.
      ● Spacing ratio; word-token ratio.
      ● Stop words ratio.
      ● Unique words count.
      Solution Description
      ● Set of lexicons S = {L1, ..., Ln} that yielded a correct prediction on the input document.
      ● We use 5 different lexicons from the literature.
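A sketch of the problem description as a flat feature dictionary. The exact definitions (for example of the spacing and word-token ratios) are plausible readings of the slide, not the paper's formulas; the POS, syllable and stop-word features are indicated but not implemented here.

    # Sketch of the case problem description as surface features of the text.
    import re

    def problem_description(text):
        tokens = re.findall(r"\S+", text)
        words = [t for t in tokens if any(ch.isalpha() for ch in t)]
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        return {
            "n_words": len(words),
            "n_tokens": len(tokens),
            "n_sentences": len(sentences),
            "avg_sentence_size": len(tokens) / max(1, len(sentences)),
            "word_token_ratio": len(words) / max(1, len(tokens)),
            "spacing_ratio": text.count(" ") / max(1, len(text)),
            "unique_words": len({w.lower() for w in words}),
            # POS frequencies, syllable/monosyllable counts and the stop-word
            # ratio would be added here using a tagger, a syllable counter
            # and a stop-word list.
        }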
  13. Experiment - Data Sets
      User-generated reviews from 6 domains:
      ● English, plain text.
      ● Balanced classes.
      ● Borderline cases removed.

      Data Set      Size   Source
      Hotels        2874   Tripadvisor
      Films         2000   IMDB
      Electronics   2072   Amazon.com
      Music         5902   Amazon.com
      Books         2034   Amazon.com
      Apparel        566   Amazon.com
  14. Experiment - Case Base
      6 domains.
      ● Customer reviews in raw text.
      ● Build 6 case-bases of 5 domains each (leave one out).
      Domains: Movies, Electronics, Apparel, Hotels, Books, Music Albums.
  15. Building the Case Base
  16. Experiment - Case Bases
      Case creation: a case is kept only if at least one lexicon gives a correct prediction (sketch below).

      Left-out Domain   Case Base Size   % Positive   % Negative
      Books                  9683           53.3         46.7
      Electronics            9592           53.6         46.4
      Film                   9614           54.1         45.9
      Music                  6137           52.6         47.4
      Hotels                11516           53.5         46.5
      Apparel               11002           53.4         46.6
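A sketch of case creation under the rule above: keep a case only if at least one lexicon classifier predicts the document's label correctly, building one case base per left-out domain. The data layout and function signatures are assumptions; the feature extractor can be the problem_description sketch shown earlier.

    # Sketch of case-base creation with the leave-one-domain-out split.
    def build_case_base(reviews, lexicon_classifiers, left_out_domain, describe):
        """reviews: iterable of (domain, text, true_label) tuples.
        lexicon_classifiers: dict of lexicon name -> classify(text) function.
        describe: feature extractor, e.g. problem_description from the earlier sketch."""
        case_base = []
        for domain, text, label in reviews:
            if domain == left_out_domain:          # leave-one-domain-out
                continue
            # Solution: the lexicons whose classifier got this document right.
            solution = {name for name, clf in lexicon_classifiers.items()
                        if clf(text) == label}
            if not solution:                       # discard: no lexicon succeeded
                continue
            case_base.append({"problem": describe(text), "solution": solution})
        return case_base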
  17. Lexicons in Case Solution
  18. Experiment - Retrieval and Ranking
      ● k-NN with Euclidean distance.
      ● Ranking: select the most common lexicon among the k cases retrieved (sketch below).

      Solutions (k=3)         Ranking (Count)    Selected
      case1 = {L1, L3, L4}    L1 (3)             {L1}
      case2 = {L1, L2}        L3 (2)
      case3 = {L1, L3, L5}    L2, L4, L5 (1)
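A sketch of the retrieval and ranking step: k-NN with Euclidean distance over the problem-description features, then a count over the lexicons appearing in the retrieved solution sets, mirroring the k=3 example on the slide. Feature scaling and tie-breaking are not specified on the slide and are left as assumptions.

    # Sketch of k-NN retrieval and most-common-lexicon ranking.
    import math
    from collections import Counter

    def euclidean(a, b, keys):
        return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))

    def select_lexicon(case_base, query_problem, k=3):
        keys = sorted(query_problem)               # shared feature names
        neighbours = sorted(
            case_base,
            key=lambda c: euclidean(c["problem"], query_problem, keys))[:k]
        votes = Counter(lex for c in neighbours for lex in c["solution"])
        # Slide example (k=3): {L1,L3,L4}, {L1,L2}, {L1,L3,L5} -> L1 appears 3 times.
        return votes.most_common(1)[0][0]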
  19. Case-Based Approach
  20. Experiment Results
      Baseline results
      ● Results for the lexicon that performed best in each domain (out of 5 lexicons).
  21. Summary
      Case-Based Approach
      ● Selection of lexicon/classifier is left to the case base.
      ● Expandable.
        ○ Easy to add more lexicons, classifiers, cases.
      ● Experimental results beat the best-lexicon baseline in 4 of 6 domains.
  22. Next Steps
      Grow the Solution Search Space
      ● More lexicons, more classifiers.
      Retrieval and Ranking
      ● The current approach will not scale to a larger search space.
      ● Room to improve the case problem description.
      Case Base Creation
      ● Add negative results instead of discarding them.
  23. Thank You.
