Automatic Identification of Pro and Con Reason in Online Reviews Soo-Min and Eduard Hovy COLING ’06 Advisor: Chia-Hui Chan...
Transcript of "Automatic Identification Of Pro And Con Reason In Online Reviews"

  1. Automatic Identification of Pro and Con Reason in Online Reviews
     Soo-Min Kim and Eduard Hovy, COLING ’06
     Advisor: Chia-Hui Chang
     Presenter: Teng-Kai Fan
     Date: 2008-05-20
  2. Abstract
     - The authors present a system that automatically extracts pros and cons from online reviews.
       - Their focus is on extracting the reasons behind the opinions, which may be stated as either facts or opinions.
     - They propose a system based on a maximum entropy model for aligning pros and cons with their sentences in the review text.
  3. Outline
     - Introduction
     - Pros and Cons in Online Reviews
     - Finding Pros and Cons
     - Dataset
     - Experiments and Results
     - Conclusion
  4. Introduction
     - Many opinions are expressed on the Web in settings such as product reviews, personal blogs, and newsgroup messages.
     - This trend has raised many interesting research topics, such as subjectivity detection, semantic orientation classification, and review classification.
  5. Introduction cont.
     - Subjectivity detection:
       - The task of identifying subjective words, expressions, and sentences.
     - Semantic orientation classification:
       - The task of determining the positive or negative sentiment of words, phrases, sentences, or documents.
  6. Introduction cont.
     - The opinion reason identification problem seeks to answer the question “What are the reasons that the author of this review likes or dislikes the product?”
     - Hence, they focus on extracting pros and cons, which include not only sentences that contain opinion-bearing expressions about products and features but also sentences that give reasons.
  7. Introduction cont.
     - Labeling each sentence by hand is a time-consuming and costly task.
       - The authors propose a framework for automatically identifying reasons in online reviews and introduce a novel technique for labeling training data.
     - The experimental results show that the system identifies pros and cons with 66% precision and 76% recall.
  8. Pros and Cons in Online Reviews
     - Researchers study opinions at three different levels: word, sentence, and document.
     - They assume that the reasons in a review are closely related to the pros and cons expressed in the review.
       - Pros in a product review are sentences that describe reasons why the author of the review likes the product.
  9. Automatically Labeling Pro and Con Sentences
     - Many web sites that host product reviews, such as amazon.com and epinions.com, explicitly list pro and con phrases.
     - Hence, the automatic labeling system first collects the phrases in the pro and con fields and then searches the main review text for the sentences corresponding to those phrases.
  10. Automatically Labeling Pro and Con Sentences cont.
      - First, the system generates two sets of phrases, {P1, P2,…,Pn} and {C1, C2,…,Cn}, by extracting the pro and con fields.
        - Ex.: beautiful display.
      - Then, the system checks each sentence to find the one that covers most of the words in the phrase.
        - Ex.: I’m personally quite happy with it because of the beautiful display.
      - Last, the system annotates this sentence with the “pro” label.
      [Figure labels: Pro, Con, Main Review]
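The labeling steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the simple regex tokenizer, and the `min_coverage` threshold are all assumptions introduced here, since the slides only say that the chosen sentence "covers most of the words in the phrase".

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def best_matching_sentence(phrase, sentences, min_coverage=0.5):
    """Return the sentence covering the largest share of the phrase's words.

    min_coverage is a hypothetical cutoff: a sentence must contain at least
    this fraction of the phrase's words to count as a match.
    """
    phrase_words = set(tokenize(phrase))
    if not phrase_words:
        return None
    best, best_cov = None, 0.0
    for sentence in sentences:
        overlap = phrase_words & set(tokenize(sentence))
        coverage = len(overlap) / len(phrase_words)
        if coverage > best_cov:
            best, best_cov = sentence, coverage
    return best if best_cov >= min_coverage else None


def label_review(pros, cons, sentences):
    """Map matched sentences of the main review to a 'pro' or 'con' label."""
    labels = {}
    for label, phrases in (("pro", pros), ("con", cons)):
        for phrase in phrases:
            sentence = best_matching_sentence(phrase, sentences)
            if sentence is not None:
                labels[sentence] = label
    return labels
```

With the slide's example, the pro phrase "beautiful display" would be matched to "I’m personally quite happy with it because of the beautiful display.", and that sentence would receive the "pro" label.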
  11. Modeling with Maximum Entropy Classification
      - They use maximum entropy classification to find pro and con sentences in a given review.
      - The conditional probability of a class c given a feature vector x is
        $$P(c \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_i \lambda_i f_i(c, x)\Big)$$
        where:
        - f_i(c, x): a feature function with a Boolean value.
        - λ_i: the weight parameter for feature function f_i.
        - Z(x): the normalization factor that makes the probabilities sum to 1 over all classes.
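As a rough illustration of the standard maximum entropy form, in which P(c | x) is proportional to the exponential of the weighted sum of the active feature functions, the sketch below computes class probabilities from a weight table. The representation is an assumption made for this example: each Boolean feature function is identified by a (class, feature name) pair, and `weights` maps those pairs to their λ values. Training the weights is not shown.

```python
import math


def maxent_probabilities(active_features, weights, classes):
    """Compute P(c | x) = exp(sum_i lambda_i * f_i(c, x)) / Z(x).

    active_features: names of the Boolean features that fire for input x.
    weights: dict mapping (class, feature name) to lambda (assumed layout).
    classes: the candidate class labels.
    """
    # Unnormalized log-score per class: sum of weights of the firing features.
    scores = {
        c: sum(weights.get((c, feat), 0.0) for feat in active_features)
        for c in classes
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())  # normalization factor Z(x)
    return {c: e / z for c, e in exps.items()}
```

For instance, with a single active feature weighted +1.0 for one class and -1.0 for the other, the first class receives the higher probability and the two probabilities sum to 1.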
  12. Modeling with Maximum Entropy Classification cont.
      - To build an efficient model, the task of finding pro and con sentences is separated into two phases:
        - Identification separates pro and con candidate sentences (PR and CR) from sentences irrelevant to either of them (NR).
        - Classification sorts the candidates into pros and cons.
      [Figure labels: Identification, Classification]
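The two-phase design can be sketched as a simple cascade. The names `find_pros_and_cons`, `is_candidate`, and `is_pro` are hypothetical; in the paper each phase is a trained maximum entropy classifier, whereas here they are passed in as plain callables so the control flow is visible.

```python
def find_pros_and_cons(sentences, is_candidate, is_pro):
    """Run identification, then classification, over a review's sentences.

    is_candidate(sentence) -> bool: identification phase (PR/CR versus NR).
    is_pro(sentence) -> bool: classification phase, applied to candidates only.
    Both callables are stand-ins for trained classifiers.
    """
    results = {}
    for sentence in sentences:
        if not is_candidate(sentence):
            results[sentence] = "NR"  # irrelevant to both pros and cons
        elif is_pro(sentence):
            results[sentence] = "PR"
        else:
            results[sentence] = "CR"
    return results
```

One consequence of this arrangement is that the second classifier only ever sees sentences that passed the first phase, so it never has to model the irrelevant (NR) class.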
  13. Features
      - News corpus
      - WordNet
  14. Dataset
      - Two different sources:
        - Epinions.com for training.
        - Complaints.com for testing.
      - Dataset 1: Automatically labeled data
        - MP3 player: 3241 reviews (115029 sentences)
        - Restaurant: 7524 reviews (194391 sentences)
      - Dataset 2: Complaints.com data
        - MP3 player: 59 reviews.
        - Restaurant: 322 reviews.
  15. Experimental Results
      - Two goals:
        - Measure how well the pro and con detection model performs.
        - Measure how well the trained model performs on complaints.com.
      - Data split: 80% for training, 10% for development, and 10% for testing.
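An 80/10/10 split like the one above can be produced along these lines. This is only a sketch: the slides do not say whether the data was shuffled or whether the split was made per review or per sentence, so the shuffling, the fixed `seed`, and the function name are assumptions.

```python
import random


def split_dataset(items, seed=0):
    """Shuffle items and split them 80/10/10 into train, dev, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n = len(shuffled)
    n_train = n * 8 // 10
    n_dev = n // 10
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # remainder goes to the test set
    return train, dev, test
```

Because the remainder goes to the test set, every item lands in exactly one of the three parts even when the total is not a multiple of ten.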
  16. Experiments on Dataset 1: Identification step
  17. Experiments on Dataset 1: Classification step
  18. Experiment on Dataset 2
      - Gold standard annotation:
        - Four human annotators labeled the test sets.
      - Only the identification step is evaluated.
  19. Conclusions
      - This paper proposes a framework for identifying reasons (pros and cons) in online product reviews.
      - The authors present a novel technique that automatically labels a large set of pro and con sentences by using clue phrases.