CoLiOS - Corpus Linguistic Open Source


  1. Alexandru-Lucian Gînscă (1), Adrian Iftene (1), Marius Corîci (2). ConsILR Conference, 8-9 December, Bucharest, Romania, National Museum of Romanian Literature (MNLR). (1) "Al. I. Cuza" University of Iași, Faculty of Computer Science, Romania; (2) Intelligentics, Cluj-Napoca, Romania
  2. <ul><li>Motivation</li><li>Existing Sentiment Corpora</li><li>File Sources</li><li>Annotations</li><li>Annotation Process</li><li>Corpus Statistics</li><li>Evaluation Metrics Proposal</li><li>Conclusions</li></ul>
  3. <ul><li>Sentiment Analysis, or Opinion Mining, has been a hot topic for some time in the Web 2.0 era.</li><li>Building robust Sentiment Analysis systems requires resources for training and evaluating them.</li><li>Such a Sentiment Corpus is lacking for Romanian.</li><li>We intend to make it publicly available, free of charge, to individual researchers and research centers.</li></ul>
  4. <ul><li>Existing Sentiment Corpora: MPQA opinion corpus, Large Movie Review Dataset, SentiWordNet, The JDPA Sentiment Corpus, UMass Amherst Linguistics Sentiment Corpora</li><li>Languages: English, German, Italian, Chinese, Japanese</li></ul>
  5. <ul><li>Romanian online publications:<ul><li>Online newspapers (MediaFax, Romania Libera, etc.)</li><li>Blogs (Chinezu.eu, Zoso.ro, etc.)</li><li>News portals (Realitatea.net, StirileProTv.ro, etc.)</li></ul></li><li>Category: Telecommunications</li><li>Companies: Orange, Vodafone, Cosmote and so on.</li></ul>
  6. <ul><li><paragraph id=""></paragraph></li><li><sentimentGroup value="" id_group=""></sentimentGroup></li><li>-4 <= value <= 4</li><li><entity type="" sentiment="" id_entity="" id_group=""></entity></li><li>-4 <= value <= 4</li></ul>
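
To make the annotation format concrete, here is a minimal Python sketch that reads one annotated paragraph and links each entity to its sentiment group through the shared `id_group` attribute. The sample sentence, the ids, the attribute values, and the nesting of `entity` and `sentimentGroup` directly inside `paragraph` are all assumptions made for illustration; only the tag names, the attribute names, and the -4 to 4 range come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated paragraph: the sentence ("Orange has very good
# coverage"), ids and values are invented for this example.
SAMPLE = (
    '<paragraph id="1">'
    '<entity type="Company" sentiment="3" id_entity="e1" id_group="g1">Orange</entity>'
    ' are o '
    '<sentimentGroup value="3" id_group="g1">acoperire foarte bună</sentimentGroup>.'
    '</paragraph>'
)


def in_range(value, low=-4, high=4):
    """Check the -4 <= value <= 4 constraint stated on the slide."""
    return low <= value <= high


def read_paragraph(xml_text):
    """Return (entity text, entity type, group text, group value) tuples."""
    root = ET.fromstring(xml_text)

    # Index sentiment groups by their id_group attribute.
    groups = {}
    for group in root.iter("sentimentGroup"):
        value = int(group.get("value"))
        if not in_range(value):
            raise ValueError(f"sentiment value out of range: {value}")
        groups[group.get("id_group")] = (group.text, value)

    # Attach each entity to the group it points at, if that group exists.
    links = []
    for entity in root.iter("entity"):
        group = groups.get(entity.get("id_group"))
        if group is not None:
            links.append((entity.text, entity.get("type"), group[0], group[1]))
    return links


if __name__ == "__main__":
    for link in read_paragraph(SAMPLE):
        print(link)
```

Keeping the link in an attribute rather than in the nesting lets an entity and its sentiment group sit anywhere in the paragraph, which is what this reading of the format relies on.
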
  8. <ul><li>Linking sentiment groups to entities</li></ul>
  9. <ul><li>We consider the following major categories: City, Organization, Company, Country and Person; additionally, we consider categories such as Brand, Product and Publication.</li><li>For almost all major categories we consider subcategories:<ul><li>For Cities we consider Romanian, European, American and Other Cities</li><li>For Organizations we consider Parties, Faculties, Universities, Ministries, etc.</li><li>For People we consider Sportsmen, Politicians, Males, Females, etc.</li></ul></li></ul>
  10. <ul><li>11 annotators (first-year master's students in computational linguistics at FII, UAIC)</li><li>As the annotation tool we chose Serna (http://www.syntext.com/products/serna/): open source, flexible, easy to use, intuitive</li><li>Method 1: process the chosen files with our tools and automatically add annotations for named entities and for sentiments</li><li>Method 2: process only at paragraph level</li></ul>
  15. <ul><li>11 annotators</li><li>1 week span</li><li>110 files</li><li>1988 paragraphs</li><li>2044 sentiment groups</li><li>4301 entities</li><li>1101 links between entities and sentiment groups</li></ul>
  18. <ul><li>Sentiment group precision</li><li>Precision for named entities and sentiment group links</li></ul>
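
The slide names these two metrics without giving their formulas in the transcript. As a hedged illustration, the sketch below uses the standard definition of precision (items the system produced that also appear in the gold annotation, divided by all items the system produced). The way groups and links are keyed here, by offsets and by id pairs, is an assumption for the example and not necessarily the matching rule the authors used.

```python
def precision(system_items, gold_items):
    """Fraction of system items that also appear in the gold set."""
    system_items = set(system_items)
    if not system_items:
        return 0.0
    return len(system_items & set(gold_items)) / len(system_items)


# Hypothetical data: sentiment groups keyed by (paragraph id, start, end).
gold_groups = {(1, 10, 31), (1, 40, 52), (2, 0, 14)}
system_groups = {(1, 10, 31), (2, 0, 14), (2, 20, 29), (3, 5, 9)}

# Hypothetical data: links keyed by (entity id, sentiment group id).
gold_links = {("e1", "g1"), ("e2", "g1"), ("e3", "g2")}
system_links = {("e1", "g1"), ("e3", "g2")}

if __name__ == "__main__":
    print(precision(system_groups, gold_groups))  # 2 of 4 groups match: 0.5
    print(precision(system_links, gold_links))    # 2 of 2 links match: 1.0
```
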
  19. <ul><li>Relaxed precision for sentiment group value</li><li>CG = the set of correctly identified sentiment groups</li><li>V<sub>F</sub>(S<sub>SG</sub>) = the value of the sentiment group as given by the system</li><li>V<sub>G</sub>(S<sub>SG</sub>) = the value of the sentiment group from the gold file.</li></ul>
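
The formula for relaxed precision is not part of this transcript, so the following is only one plausible reading: over the correctly identified groups CG, count a group as a hit when the system value V_F is within a small tolerance of the gold value V_G. The `tolerance` parameter and its default of 1 are assumptions introduced for this sketch.

```python
def relaxed_precision(correct_groups, tolerance=1):
    """Share of groups in CG whose system value is within `tolerance`
    of the gold value. Each item is a (V_F, V_G) pair.

    The tolerance-based rule is an assumed reading of "relaxed",
    not a formula taken from the slides.
    """
    if not correct_groups:
        return 0.0
    hits = sum(1 for v_f, v_g in correct_groups if abs(v_f - v_g) <= tolerance)
    return hits / len(correct_groups)


# Hypothetical (system value, gold value) pairs on the -4..4 scale.
CG = [(3, 3), (2, 3), (-1, 2), (-4, -3)]

if __name__ == "__main__":
    print(relaxed_precision(CG))               # 3 of 4 within 1: 0.75
    print(relaxed_precision(CG, tolerance=0))  # exact matches only: 0.25
```
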
  20. <ul><li>Average deviation for sentiment group value</li><li>CG = the set of correctly identified sentiment groups</li><li>V<sub>F</sub>(S<sub>SG</sub>) = the value of the sentiment group as given by the system</li><li>V<sub>G</sub>(S<sub>SG</sub>) = the value of the sentiment group from the gold file.</li></ul>
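
The formula for average deviation is likewise missing from the transcript. A natural reading, assumed here, is the mean absolute difference between the system value V_F and the gold value V_G over the correctly identified groups CG; the authors' exact definition may differ.

```python
def average_deviation(correct_groups):
    """Mean of |V_F - V_G| over CG. Each item is a (V_F, V_G) pair.

    Mean absolute difference is an assumed interpretation of the
    metric's name, not a formula taken from the slides.
    """
    if not correct_groups:
        return 0.0
    total = sum(abs(v_f - v_g) for v_f, v_g in correct_groups)
    return total / len(correct_groups)


# Hypothetical (system value, gold value) pairs on the -4..4 scale.
pairs = [(3, 3), (2, 3), (-1, 2), (-4, -3)]

if __name__ == "__main__":
    print(average_deviation(pairs))  # (0 + 1 + 3 + 1) / 4 = 1.25
```

On the -4 to 4 scale this value lies between 0 (system and gold always agree) and 8 (always at opposite extremes).
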
  21. <ul><li>The importance of a Corpus for Sentiment Analysis for Romanian.</li><li>The annotation format and methodology.</li><li>Comparison between our proposal and existing Sentiment Corpora.</li></ul>