Mining Product Opinions and Reviews on the Web

  • To read this thesis you can download this PDF:
    http://www.rn.inf.tu-dresden.de/uploads/Studentische_Arbeiten/Masterarbeit_Mattosinho_Felipe.pdf

Transcript

  • Mining Product Opinions and Reviews on the Web
      Felipe Mattosinho
  • Agenda
      • Introduction
      • Basics
      • Requirements
      • Design
      • Implementation
      • Evaluation
      • Conclusion
  • Introduction
  • Introduction: What is Opinion Mining?
      Opinion Mining:
      • is not conventional data mining
      • retrieves useful information from users' opinions
      • is a sub-area of Web Mining
      • sits at the crossroads of information retrieval (IR), information extraction (IE) and data mining (DM)
  • Introduction: Why use Opinion Mining?
      Opinion overload (source: Amazon.com)
      [Figure: Amazon.com examples; numbers shown: 748, 1279, 100, 128, 576]
      • Users are not willing to read all the opinions
      • It is difficult to find the necessary information
      • It is difficult to draw conclusions
      Structure data
      • Build intelligent applications (Web 3.0)
  • Introduction: Existing approaches
      Ranking (source: Amazon.com)
      • Facts are different from opinions
      Classification (source: CNET.com)
      • Asking for “pros” and “cons” can induce opinions
      • Tiresome task for users
  • Basics
  • Basics: Opinion Model
      • Opinions highlight strengths and weaknesses of objects under discussion (OuD)
      • An object O is modeled as O: (T, A), where T is a taxonomy of components and A is a set of attributes of O
      • The word “feature” is used for simplicity
  • Basics: Level of Sentiment Analysis
      Opinion level
      • Too coarse-grained; does not cover important information
      Sentence level
      • A better approach, but still does not cover everything
      Feature level
      • Optimal level, best coverage
  • Basics: Trends in Sentiment Analysis for Opinion Mining
      [Figure labels: Granularity Level, Higher Complexity, Lower Performance]
  • Requirements
  • Requirements: Target Audience
  • Requirements
      Functional requirements
      • Generate a feature-based summary for the user
      • The system administrator has control over core parameters and policy mechanisms
      Non-functional requirements
      • Fedseeko compatibility
      • Fault tolerance
      • Performance
      • Interoperability
  • Design
  • Design: System Architecture
      • System Management Module
      • POS Tagging Module
      • Opinion Retriever Module
      • Opinion Mining Module
  • Design: System Management Module
      • Long jobs are handled asynchronously
      • Workers run concurrently, at different times of the day or on different machines
  • Design: Opinion Retrieval Module
      • Create task description
      • Web scraping optimization
  • Design: Opinion Composition Model
      • Other words are also special (negation words, orientation inverter words, “too” words)
      • Workers run concurrently, at different times of the day or on different machines
  • Design: Opinion Sentence
      Review text:
        I needed to take pictures during my last travel to Italy. So far, I'm very happy with this camera. The picture quality is good and the zoom is powerful. One thing that I didn't like is the LCD resolution.
      POS-tagged text:
        I_PRP needed_VBD to_TO take_VB pictures_NNS during_IN my_PRP$ last_RB travel_NN to_TO Italy_NNP ._. So_RB far_RB ,_, I_PRP 'm_VBP very_RB happy_JJ with_IN this_DT camera_NN ._. The_DT picture_NN quality_NN is_VBZ good_JJ and_CC the_DT zoom_NN is_VBZ powerful_JJ ._. One_CD thing_NN that_IN I_PRP did_VBD n't_RB like_VB is_VBZ the_DT LCD_NNP resolution_NN ._.
      Tagged opinion sentences (the first sentence is dropped):
        So_RB far_RB ,_, I_PRP 'm_VBP very_RB happy_JJ with_IN this_DT camera_NN ._. The_DT picture_NN quality_NN is_VBZ good_JJ and_CC the_DT zoom_NN is_VBZ powerful_JJ ._. One_CD thing_NN that_IN I_PRP did_VBD n't_RB like_VB is_VBZ the_DT LCD_NNP resolution_NN ._.
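The tagged text on the Opinion Sentence slide uses the `word_TAG` convention with Penn Treebank tags. As a minimal sketch, assuming whitespace-separated tokens and the `.` tag as the sentence boundary, such text could be split into sentences of (word, tag) pairs. The function name `parse_tagged` is hypothetical and not taken from the slides.

```python
def parse_tagged(text):
    """Split 'word_TAG' tokens into (word, tag) pairs, grouped by sentence."""
    sentences, current = [], []
    for token in text.split():
        # rpartition splits on the LAST underscore, so '._.' becomes ('.', '.')
        word, _, tag = token.rpartition("_")
        current.append((word, tag))
        if tag == ".":  # Penn Treebank tag for sentence-final punctuation
            sentences.append(current)
            current = []
    if current:  # keep a trailing sentence that lacks final punctuation
        sentences.append(current)
    return sentences


tagged = (
    "So_RB far_RB ,_, I_PRP 'm_VBP very_RB happy_JJ with_IN this_DT camera_NN ._. "
    "The_DT picture_NN quality_NN is_VBZ good_JJ and_CC the_DT zoom_NN is_VBZ powerful_JJ ._."
)
sentences = parse_tagged(tagged)
for sentence in sentences:
    print(sentence)
```

Working on (word, tag) pairs rather than raw strings keeps later steps, such as picking nouns or adjectives, down to simple tag comparisons.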
  • Design: Feature Identification
      camera_NN (picture_NN quality_NN) zoom_NN thing_NN LCD_NNP resolution_NN wedding_NN car_NN photos_NNS dog_NN road_NN lot_NN [...]
      camera_NN (picture_NN quality_NN) flash_NN thing_NN England_NNP rehearsal_NN photos_NNS [...]
      horse_NN (picture_NN quality_NN) flash_NN farm_NN country_NN rehearsal_NN photos_NNS [...]
      camera_NN (picture_NN quality_NN) flash_NN photos_NNS [...]
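The noun lists on the Feature Identification slide suggest that candidates recurring across many reviews are kept as features. The following is only a sketch under that assumption: it counts in how many reviews each noun candidate occurs and keeps those at or above a minimum support. The function name `frequent_features`, the `min_support` parameter and the treatment of the four lines as four separate reviews are illustrative choices, not details confirmed by the slides.

```python
from collections import Counter


def frequent_features(reviews, min_support):
    """Return the candidates found in at least `min_support` (a fraction) of the reviews."""
    counts = Counter()
    for candidates in reviews:
        counts.update(set(candidates))  # count each candidate once per review
    total = len(reviews)
    return {c for c, k in counts.items() if k / total >= min_support}


# Noun candidates loosely based on the slide; the elided "[...]" parts are left out.
reviews = [
    ["camera", "picture quality", "zoom", "thing", "LCD", "resolution",
     "wedding", "car", "photos", "dog", "road", "lot"],
    ["camera", "picture quality", "flash", "thing", "England", "rehearsal", "photos"],
    ["horse", "picture quality", "flash", "farm", "country", "rehearsal", "photos"],
    ["camera", "picture quality", "flash", "photos"],
]

print(sorted(frequent_features(reviews, min_support=0.75)))
```

Lowering `min_support` admits more candidates, including non-features such as "thing", which is the kind of trade-off a threshold-based selection has to balance.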
  • Design: Feature Identification
      Pros
      • Customers use different words to refer to the same feature
      • Detects additional useful information (not part of the opinion model)
      • No manually annotated data
      Cons
      • Does not detect infrequent features
      • Detects non-features
  • Design: Search Word Orientation Algorithm
      Searched word: bad
      Seed list: good,1,nice,1
      Updated seed list: good,1,nice,1,bad,-1
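The seed-list example can be sketched as a lookup that grows the list whenever a new word is resolved. The slides do not say how the search works, so this sketch assumes a common approach: a word inherits the orientation of a known synonym and the opposite orientation of a known antonym. The `SYNONYMS` and `ANTONYMS` dictionaries are a hypothetical toy lexicon standing in for a real resource such as WordNet.

```python
# Hypothetical toy lexicon; a real system would query a lexical database instead.
SYNONYMS = {"great": ["good"], "awful": ["bad"]}
ANTONYMS = {"bad": ["good"], "good": ["bad"]}


def find_orientation(word, seeds):
    """Return +1 or -1 for `word` and add it to `seeds`; return None if unknown."""
    if word in seeds:
        return seeds[word]
    for synonym in SYNONYMS.get(word, []):
        if synonym in seeds:  # same orientation as a known synonym
            seeds[word] = seeds[synonym]
            return seeds[word]
    for antonym in ANTONYMS.get(word, []):
        if antonym in seeds:  # opposite orientation of a known antonym
            seeds[word] = -seeds[antonym]
            return seeds[word]
    return None


seeds = {"good": 1, "nice": 1}
find_orientation("bad", seeds)
print(seeds)
```

Because every resolved word is stored in the seed list, later lookups can build on it: once "bad" is known, a synonym such as "awful" can be resolved as well.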
  • Design: Opinion Words in Context
      Negation rules
      [Figure: negation examples with the words “not good”, “not bad”, “no problem”]
      “Too” rules
      • “Too” before an adjective usually denotes negative sentiment, e.g. “This camera is too small”.
  • Design: Opinion Words in Context
      Orientation inverter / sentiment inverter words
      • Find the sentiment/orientation of opinion words with unknown orientation
      • “The camera is nice, except for the initialization time which takes long”
      • “The autofocus is great, the battery life lasts long, but I find the functions a little complex.”
  • Design: Aggregating opinions for a feature
      Word positions: 0 1 2 3 4 5 6 7 8 9
      “The image quality is amazing, but the autofocus is terrible.”
      Score(image quality) = (1 / |4 - 1|) + (-1 / |9 - 1|) = 0.3333 - 0.125 = 0.2083 (positive)
      Score(autofocus) = (1 / |4 - 7|) + (-1 / |9 - 7|) = 0.3333 - 0.5 = -0.1667 (negative)
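The aggregation example reads as a distance-weighted sum: each opinion word contributes its orientation (+1 or -1) divided by its distance in words from the feature. A minimal sketch under that reading follows, with “amazing” at position 4 and “terrible” at position 9. The function name `feature_score` is hypothetical, and using the first word of a multi-word feature as its position is an assumption taken from the image quality example.

```python
def feature_score(feature_pos, opinion_words):
    """Sum orientation / distance over the opinion words of one sentence.

    `opinion_words` is a list of (position, orientation) pairs; an opinion word
    is assumed never to share its position with the feature.
    """
    return sum(o / abs(pos - feature_pos) for pos, o in opinion_words)


# Punctuation is dropped so that the words occupy positions 0 to 9.
tokens = "The image quality is amazing but the autofocus is terrible".split()
opinion_words = [(tokens.index("amazing"), 1), (tokens.index("terrible"), -1)]

image_quality = feature_score(tokens.index("image"), opinion_words)
autofocus = feature_score(tokens.index("autofocus"), opinion_words)
print(round(image_quality, 4), round(autofocus, 4))
```

The sign of the score gives the orientation for the feature: the nearby “amazing” dominates for image quality, while the nearby “terrible” dominates for autofocus.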
  • Implementation: Overview
      Core technologies
      • JRuby on Rails (JRuby 1.5.0 RC3 / Rails 2.3.8)
      Ruby gems (libraries)
      • Mechanize
      • Nokogiri
      • Ruby-aaws
      • Delayed_Job
      Java libraries
      • Rita.Wordnet
      • Stanford POS Tagger API
  • Implementation: Overview
  • Implementation: Overview
  • Evaluation
  • Evaluation: Test Environment
      • AMD Turion(tm) 64 Mobile Technology ML-32 / 1 GB RAM
      • Ubuntu 9.04, 32-bit
      Sample data
      System configuration
  • Evaluation: Effectiveness of Feature Identification
      [Chart labels: Threshold, Accuracy, Features]
  • Evaluation: Sentiment Classification Effectiveness
      • Xbox 360 has the lowest effectiveness due to wrong part-of-speech tagging
      • “Complex” sentences and domain-dependent sentences are also wrongly classified
  • Evaluation: System Efficiency
      • The lower the threshold, the higher the number of features and hence the number of sentences analyzed
      • What is the price of addressing many exceptions?
  • Evaluation: Considerations
      • Users may talk about other objects with similar features
      • Domain-dependent sentences (e.g. “The device heats very fast.”)
      Complex sentences / domain-dependent sentences / exceptions
      POS tagging errors
      Pluralization cases
      • May not refer to the same OuD (e.g. “camera” and “cameras”)
      • “This camera is GOOD”
      • “[...] the hard drive which comes with the device.”
  • Conclusion
      • POECS performs well, with a good rate of accuracy.
      • Observations show that many users write “simple”, straightforward sentences, which are covered by POECS.
      • Domain-specific annotations can help the system be more effective.
      • Human language is complex; covering many cases costs a lot of performance.
      Sample data
  • Conclusion: Future Work
      • Minimize the number of manual annotations through recognition of reusable patterns of human language
      • Cope with common unsolved problems such as:
          • Safe ways to recognize which features belong to which object
          • Global opinion knowledge to help improve local analysis (sentence or feature level)
  • Conclusion: Questions?
