Can I hear you? Sentiment Analysis on Medical Forums
Tanveer Ali, David Schramm, Marina Sokolova, Diana Inkpen
University of Ottawa & Children's Hospital of Eastern Ontario, Ottawa, Canada
Presentation about the following paper:

http://www.aclweb.org/anthology/I/I13/I13-1077.pdf

Speaker notes:
  • In the current study, the focus is on detecting the polarity of opinions expressed in messages posted on medical forums about hearing devices. Polarity can be binary (e.g., positive vs. negative) or multi-categorical (e.g., positive, negative, both, neutral, unknown). In this work, the subjective sentences are classified into positive, negative, and neutral/unknown. Different syntactic features are identified, i.e., patterns/rules similar to those of (Yi and Nasukawa, 2003), in order to indicate the subjectivity and polarity of the sentences. Here are examples of sentences from each category [Please read the 3 examples]. Users express their sentiments differently on forums compared to the way they express opinions in reviews or in messages on social networks. Bobicev et al. (2012) analyzed sentiments in Twitter messages using statistical features based on the occurrence of words and their correlation with the class labels of the training set. In this work, by contrast, the correlation of phrases within sentences is used for predicting subjectivity and polarity.
  • The initial collection of data contains 893 individual posts from 34 threads. They were extracted using an XPath query in the Google Chrome extension “XPathHelper”. Statistics such as average posts per person were measured in order to filter the data; for example, threads with more than 100 posts were removed, because threads with a large number of posts deviate from the main topic of discussion. Only 26 threads remained, containing 607 posts. The table also shows the 3 forums from which the data was collected. The data was split into sentences using a regular-expression-based sentence splitter. Sentences containing very few words (i.e., fewer than 4) were filtered out, as were sentences judged non-subjective in a previous annotation experiment. The remaining 3515 sentences from the 26 threads were manually annotated by two independent annotators (medical experts).
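The sentence splitting and length filtering described above can be sketched as follows. The authors' exact regular expression is not given, so the pattern here is an assumption:

```python
import re

# Split at sentence-final ., !, or ? followed by whitespace (assumed pattern,
# standing in for the authors' unspecified regex-based splitter).
SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")

def split_sentences(post):
    """Split a forum post into sentences at sentence-final punctuation."""
    return [s.strip() for s in SENT_BOUNDARY.split(post) if s.strip()]

def keep_sentence(sentence, min_words=4):
    """Keep only sentences containing at least `min_words` whitespace tokens."""
    return len(sentence.split()) >= min_words

post = "I love my new hearing aid! It works. Compression is bad for music, some say."
sentences = [s for s in split_sentences(post) if keep_sentence(s)]
```

Here "It works." is dropped because it has fewer than 4 words, mirroring the filter above.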
  • Here are more details about the manual data annotation. The 3515 sentences were manually annotated by two independent annotators into three classes (Positive, Negative, and Neutral/Unknown). Sentences that were not labeled were removed from the dataset. The annotators' agreement was 82.01%, and the kappa coefficient, which compensates for chance agreement, was 0.65.
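The kappa coefficient mentioned above can be computed with a short generic routine (a sketch, not the authors' code):

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n   # observed agreement
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)  # chance agreement
    return (p_o - p_e) / (1 - p_e)
```

With the full annotation lists, this yields values like the reported 0.65 from the raw 82.01% agreement.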
  • For the experiments, the Subjectivity Lexicon (SL) built by Wilson, Wiebe, and Hoffman (2005) was used as the main resource. The lexicon contains 8221 subjective expressions manually annotated as strongly or weakly subjective, and as positive, negative, neutral, or both. The tables show the distribution in the lexicon of adjectives, nouns, verbs, adverbs, and other parts of speech (grouped under the tag anypos), with polarity labels in the first table and subjectivity labels in the second table. The lower left corner shows a sample entry from the lexicon: type=weaksubj len=1 word=ability pos=noun stemmed=n prior_polarity=positive. This entry contains the term ability, which is a noun; its length is 1 (a single term); it is not stemmed; and it is weakly subjective and positive.
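Entries in this key=value format are straightforward to parse; a minimal sketch (not the authors' code):

```python
def parse_entry(line):
    """Parse one whitespace-separated key=value Subjectivity Lexicon line into a dict."""
    return dict(field.split("=", 1) for field in line.split())

# The sample entry shown on the slide:
entry = parse_entry(
    "type=weaksubj len=1 word=ability pos=noun stemmed=n prior_polarity=positive"
)
```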
  • The methodology was as follows. The data was pre-processed, including sentence splitting, tokenization, part-of-speech tagging, and lemmatization for nouns and verbs. Then features were extracted before applying classification algorithms. Lexicon-based features simply count how many words in a sentence match entries in the lexicon. All features were computed on individual tokens, except pronoun phrases, which were matched using different patterns, e.g., I would assume, I feel as, I could sympathize. The table shows the list of features extracted from each sentence [Please read each one]. In the lexicon, nouns have good coverage, while verbs are used in many different forms and have many meanings; e.g., "he uses a car" is neutral, whereas "he was used" has a negative sense. For all nouns and verbs, we did lemmatization with the GATE morphological plugin, which provides the root words. For nouns, the root word is the singular form of the plural noun, e.g., bottles becomes bottle. For verbs, the plugin provides the base form of the infinitive, e.g., helping becomes help, and watches becomes watch. After lemmatization, 158 more words were found in the lexicon compared to looking up the original word forms. There were still 175 words whose root word was found in the lexicon but with a different part of speech; e.g., senses was used as a noun in the data, and after lemmatization it becomes sense, which exists as a verb in the lexicon. Such words cannot be matched, as the context and meaning of the word are different. Adjectives, e.g., what a blessed relief, are good indicators of the positivity or negativity of sentences, but they are not sufficient for identifying subjectivity. Adverbs are words that modify verbs, adjectives, and other phrases or clauses, e.g., I am usually a contributing adult, and am happily sane and I say whoa how did that happen?
Adverbs make up only 4% of the lexicon, and many adverbs are identified by their characteristic "-ly" suffix. We removed the suffix -ly and then matched the resulting word in the lexicon as an adjective; e.g., badly, softly, carefully, and extremely are formed from adjectives, so handling them this way provides better results in predicting the polarity of words in their correct senses.
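The lexicon-count features and the -ly fallback described above can be sketched as follows; the three-entry lexicon here is hypothetical, standing in for the 8221-entry resource:

```python
# Hypothetical mini-lexicon mapping (word, pos) -> prior polarity; the real
# resource is the Subjectivity Lexicon of Wilson, Wiebe, and Hoffman (2005).
LEXICON = {
    ("happy", "adj"): "positive",
    ("bad", "adj"): "negative",
    ("soft", "adj"): "positive",
}

def lookup(token):
    """Look a token up as an adjective; for -ly adverbs, retry the adjective base."""
    polarity = LEXICON.get((token, "adj"))
    if polarity is None and token.endswith("ly"):
        polarity = LEXICON.get((token[:-2], "adj"))  # badly -> bad, softly -> soft
    return polarity

def polarity_counts(tokens):
    """Count the POSITIVE/NEGATIVE feature values for one tokenized sentence."""
    counts = {"POSITIVE": 0, "NEGATIVE": 0}
    for token in tokens:
        polarity = lookup(token)
        if polarity is not None:
            counts[polarity.upper()] += 1
    return counts
```

So "badly" is not found directly but matches bad (negative) after stripping -ly, as in the feature extraction described above.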
  • The data was classified using the WEKA tool (Hall et al., 2009). Three classifiers were used: Naïve Bayes, because it is known to work well with text; support vector machine (SVM), because of its high performance in many tasks; and logistic regression (Logistic-R), in order to try one more classifier based on a different approach. Logistic regression performed best. Performance was evaluated using 10-fold cross-validation, and the F1-measure is reported in the tables. The baseline used bag-of-words features. Words that appeared only once were eliminated, which halved the size of the bag-of-words vectors; these unique words do not contribute much to classification, since each appears only once, in one class. The first table shows significant improvement with polarity features over the baseline. The difference over the baseline was 10.5% for Naïve Bayes, 3.9% for SVM, and 5.7% for Logistic-R. The second table shows that, when lemmatization was included, Naïve Bayes and SVM under-performed logistic regression, their performance decreasing by 3.7% and 4.7%, respectively. However, the performance of Logistic-R increased significantly, by 4.4%, and its F1-measure reached 68.5%, which indicates the benefit of lemmatization for matching within the lexicon.
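The authors report precision, recall, and F1 from WEKA; as a reference point, per-class scores can be computed from parallel gold/predicted label lists like this (a generic sketch with illustrative labels, not the WEKA internals):

```python
def f1_per_class(gold, pred):
    """Return {class: (precision, recall, F1)} for each class present in `gold`."""
    scores = {}
    for c in set(gold):
        tp = sum(g == p == c for g, p in zip(gold, pred))  # true positives for class c
        predicted = sum(p == c for p in pred)
        actual = sum(g == c for g in gold)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores
```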
  • [Please read from slide] Our results are comparable to those of other related studies on sentiment classification of medical forums. Sokolova & Bobicev (2011) achieved a best F-score of 0.708 using SVM; similarly, Goeuriot et al. (2012), on drug reviews, achieved F-scores of 0.62 for the positive class, 0.48 for the negative class, and 0.09 for the neutral class. In general, for consumer reviews, opinion-bearing text segments are classified into positive and negative categories with precision 56%−72% (Hu & Liu, 2004). For online debates, complete texts (i.e., posts) were classified as positive or negative stance with F-scores of 39%−67% (Somasundaran & Wiebe, 2009); when those posts were enriched with preferences learned from the Web, the F-score increased to 53%−75%.
  • In conclusion, this work explored opinion extraction in medical forums about hearing problems. [Please read from slide, except the last bullet] There are several directions for future work. The lexicon could be improved, as the domain lexicon created in (Goeuriot et al., 2012) has shown better results than other dictionaries for polarity detection of sentences. Also, techniques and features presented in (Taboada et al., 2011) and (Kennedy and Inkpen, 2006), such as intensification (e.g., very good, which increases the polarity of good) and negation (e.g., not good, which reverses the polarity of good), can be used for semantic orientation or polarity detection of the sentences.
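The intensification and negation idea mentioned for future work can be sketched as follows; the word lists and weights here are illustrative, not taken from the cited papers:

```python
# Illustrative prior polarities and valence shifters (hypothetical values).
PRIOR = {"good": 1.0, "bad": -1.0}
NEGATORS = {"not", "never"}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}

def shifted_score(tokens):
    """Sum word polarities, reversing after a negator and scaling after an intensifier."""
    score, flip, scale = 0.0, 1.0, 1.0
    for token in tokens:
        if token in NEGATORS:
            flip = -1.0
        elif token in INTENSIFIERS:
            scale = INTENSIFIERS[token]
        elif token in PRIOR:
            score += flip * scale * PRIOR[token]
            flip, scale = 1.0, 1.0  # shifters apply only to the next polar word
    return score
```

Under these assumed weights, "not good" reverses the polarity of good, and "very good" strengthens it, matching the examples above.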
  • Here are some references.
  • Please email the authors if you have any questions.
Transcript:

    1. Can I hear you? Sentiment Analysis on Medical Forums. Tanveer Ali, David Schramm, Marina Sokolova, Diana Inkpen. University of Ottawa & Children's Hospital of Eastern Ontario, Ottawa, Canada.
    2. Task. Extract opinions from medical forums about hearing problems. Classify the opinionated messages into positive, negative and neutral. Example opinion statements:
       Positive: This has the beneficial effect of making the quieter sounds audible but not blowing your head off with the louder sounds.
       Negative: Someone with 50 DB hearing aid gain with a total loss of 70 DB may not know that the place is producing 107 DB since it may not appear too loud to him since he only perceives 47 DB.
       Neutral/Unknown: Now, you'll hear some people saying that compression is bad, and linearity is good, especially for music.
    3. Dataset. We collected 893 individual posts from 34 threads from three different forums about hearing loss problems. We filtered the initial 34 threads by removing the threads in which people did not discuss hearing aids. The final data contained 607 posts in 26 threads.
       Filtered dataset collection statistics:
       Forum                      Threads   Posts   Avg. posts per person
       www.hearingaidforums.com         7     185                    2.9
       www.medhelp.org                  9     105                    2.77
       www.alldeaf.com                 10     317                    1.93
       Total                           26     607                    2.53
       Manual annotation: a total of 3515 sentences from the 26 threads were manually annotated by two annotators.
    4. Data Annotation.  Annotator 1 and Annotator 2 left a large number of sentences unlabeled, i.e., 2939 and 2728, respectively; these sentences were removed from the dataset.  We consider only those sentences labeled as positive and negative. Since the data is already balanced, no balancing is performed.
       [Table: annotation statistics between the two annotators; Annotator 2 column totals: Pos 329, Neg 296, Neutral 162, No Label 2728; overall total 3515 sentences.]
        The overall percentage agreement between the annotators for the balanced dataset was 82.01% and kappa was 0.65.
    5. Subjectivity Lexicon.  We used the subjectivity lexicon built by Wilson, Wiebe, and Hoffman (2005) as a resource to extract features for supervised learning classification.
       Distribution of prior polarities in the Subjectivity Lexicon:
                 Adjective   Noun   Verb   anypos   Adverb   Total   Percent
       Positive       1171    677    380      362      128    2718     33.06
       Negative       1838   1346    869      676      183    4912     59.75
       Neutral         235    144     68      104       19     570      6.93
       Both              5      3      8        5        0      21      0.26
       Total          3249   2170   1325     1147      330    8221    100
       Percent       39.52  26.40  16.12    13.95     4.01     100
       Distribution of subjectivity clues in the Subjectivity Lexicon:
                  Strong Subj     Weak Subj      Total   Percent
       Adjective  2006 (61.74%)   1243 (38.26%)   3249     39.52
       Noun       1440 (66.36%)    730 (33.64%)   2170     26.40
       Verb        861 (64.98%)    464 (35.02%)   1325     16.12
       anypos     1043 (90.93%)    104 (9.07%)    1147     13.95
       Adverb      219 (66.36%)    111 (33.64%)    330      4.01
       Total      5569 (67.74%)   2652 (32.26%)   8221    100
       Sample entry: type=weaksubj len=1 word=ability pos=noun stemmed=n prior_polarity=positive
    6. Methodology. Sentence splitting, tokenization, part-of-speech tagging, lemmatization for nouns and verbs. Computed the features and then used the classifiers to classify sentences into positive, negative, and neutral.
       - Lexicon matching features simply count how many words in a sentence match in the lexicon.
       - All features were on individual tokens, except pronoun phrases, which were calculated using different patterns, e.g., I would assume, I feel as, I could sympathize.
       Final features considered for the experiments:
       STRONGSUBJ: # of words found as strongly subjective in the current sentence
       WEAKSUBJ:   # of words found as weakly subjective in the current sentence
       ADJECTIVE:  # of adjectives
       ADVERBS:    # of adverbs
       PRONOUN:    # of pronouns
       POSITIVE:   # of words found having prior polarity positive
       NEGATIVE:   # of words found having prior polarity negative
       NEUTRAL:    # of words found having prior polarity neutral
       PRP_PHRASE: # of phrases containing pronouns found in the current sentence
        The most common features were pronouns, followed by weak subjective clues, adjectives, and adverbs.
    7. Experiments.  For the baseline, a bag-of-words feature vector is used; unique words were not included in the bag of words.  We used 10-fold cross-validation with three classifiers, i.e., Naïve Bayes, support vector machine, and logistic regression.  More words matched the lexicon's positive entries than its negative entries, which made the classifiers' performance slightly better on the positive class in the experiments.
       Comparison of performance between different features among three classifiers (no lemmatization):
                               Naïve Bayes           SVM                   Logistic-R
                               P     R     F-1       P     R     F-1       P     R     F-1
       Polarity features only  0.656 0.650 0.644     0.661 0.641 0.625     0.649 0.645 0.641
       All features            0.595 0.584 0.565     0.641 0.618 0.596     0.657 0.657 0.656
       Baseline                0.540 0.541 0.539     0.586 0.586 0.586     0.585 0.584 0.584
       Comparison of performance between different features among three classifiers (after lemmatization):
                               Naïve Bayes           SVM                   Logistic-R
                               P     R     F-1       P     R     F-1       P     R     F-1
       Polarity features only  0.644 0.625 0.607     0.636 0.607 0.578     0.688 0.686 0.685
       All features            0.589 0.580 0.560     0.627 0.600 0.570     0.671 0.670 0.670
       Baseline                0.540 0.541 0.539     0.586 0.586 0.586     0.585 0.584 0.584
    8. Analysis.  The bag-of-words representation (BOW) is a high baseline that is hard to beat in many text classification problems.  The Subjectivity Lexicon clues for polarity (positive, negative, and neutral) showed significant improvement, 4.2% on average across the three classifiers.  Combining the Subjectivity Lexicon clues with the other features, i.e., the number of pronoun phrases, number of adjectives, etc., decreased the classification performance.  Classification of semantic orientation depends heavily on the quality of the lexicon used, rather than its size, as classification of the sentences into positive and negative reached 70% using only the polarity clues for individual words within the lexicon.
    9. Conclusion and Future Work.  We used several lexicon-based features together with rule-based features, such as pronoun phrases, for classifying the medical forum dataset for semantic orientation.  The dataset of 3515 sentences from 26 threads was manually annotated by two annotators, with 82.01% overall agreement and kappa 0.65.  Evaluations were made for the classification of the data using three different supervised-learning classifiers.  Our proposed features outperformed the bag-of-words baseline by 10.5% with Naïve Bayes, 3.9% with SVM, and 5.7% with Logistic-R.  In future work, techniques and features such as intensification and negation, presented in (Taboada et al., 2011) and (Kennedy and Inkpen, 2006), can be used for semantic orientation or polarity detection of the sentences.
    10. References
         Yi, J., Nasukawa, T., Bunescu, R., & Niblack, W. (2003, November). Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques. In Proceedings of the Third IEEE International Conference on Data Mining (ICDM 2003) (pp. 427-434).
         Sokolova, M., & Bobicev, V. (2011). Sentiments and opinions in health-related web messages. In Recent Advances in Natural Language Processing (pp. 132-139).
         Kennedy, A., & Inkpen, D. (2006). Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence, 22(2), 110-125.
         Taboada, M., Brooke, J., Tofiloski, M., Voll, K., & Stede, M. (2011). Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2), 267-307.
    11. Questions? Email the authors at: tali028@uottawa.ca, diana.inkpen@uottawa.ca
