CoLiOS - Corpus Linguistic Open Source
    CoLiOS - Corpus Linguistic Open Source: Presentation Transcript

    • Alexandru-Lucian Gînscă ¹, Adrian Iftene ¹, Marius Corîci ². ConsILR Conference, 8 - 9 December, Bucharest, Romania, National Museum of Romanian Literature (MNLR). ¹ “Al. I. Cuza” University of Iași, Romania, Faculty of Computer Science; ² Intelligentics, Cluj-Napoca, Romania
      • Motivation
      • Existing Sentiment Corpora
      • Files Sources
      • Annotations
      • Annotation Process
      • Corpus Statistics
      • Evaluation Metrics Proposal
      • Conclusions
      • Sentiment Analysis, or Opinion Mining, has been a hot topic for some time in the Web 2.0 era.
      • Building robust Sentiment Analysis systems requires resources for training and evaluating them.
      • No such Sentiment Corpus is currently available for Romanian.
      • We intend to make our corpus publicly available, free of charge, to individual researchers and research centers.
      • Existing Sentiment Corpora: MPQA opinion corpus, Large Movie Review Dataset, SentiWordNet, The JDPA Sentiment Corpus, UMass Amherst Linguistics Sentiment Corpora
      • Languages: English, German, Italian, Chinese, Japanese
      • Romanian online publications:
            • Online newspapers (MediaFax, Romania Libera, etc.)
            • Blogs (Chinezu.eu, Zoso.ro, etc.)
            • News portals (Realitatea.net, StirileProTv.ro, etc.)
      • Category: Telecommunications
      • Companies: Orange, Vodafone, Cosmote, and others.
      • <paragraph id=""></paragraph>
      • <sentimentGroup value="" id_group=""></sentimentGroup>
      • -4 <= value <= 4
      • <entity type="" sentiment="" id_entity="" id_group=""></entity>
      • -4 <= value <= 4
      • Linking sentiment groups to entities
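
The annotation format can be illustrated with a short Python sketch. The tag and attribute names (paragraph, sentimentGroup, entity, value, id_group, id_entity, type, sentiment) and the -4 to 4 range come from the slides. Everything else is an assumption made for illustration: that both elements sit directly inside a paragraph, that an entity is linked to a sentiment group through a shared id_group, the id formats, the "Company" type label, and the English sample sentence.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated paragraph. Tag and attribute names follow the slides;
# the nesting, the ids, the type label and the sample text are assumptions.
SAMPLE = (
    '<paragraph id="p1">'
    '<entity type="Company" sentiment="3" id_entity="e1" id_group="g1">Orange</entity>'
    ' offers '
    '<sentimentGroup value="3" id_group="g1">very good coverage</sentimentGroup>'
    '.</paragraph>'
)


def linked_pairs(paragraph_xml):
    """Return (entity text, group text, group value) for every entity
    whose id_group matches the id_group of a sentiment group."""
    root = ET.fromstring(paragraph_xml)
    # Index sentiment groups by id_group, skipping groups without one.
    groups = {
        g.get("id_group"): g
        for g in root.iter("sentimentGroup")
        if g.get("id_group")
    }
    pairs = []
    for entity in root.iter("entity"):
        group = groups.get(entity.get("id_group"))
        if group is None:
            continue  # entity is not linked to any sentiment group
        value = int(group.get("value"))
        if not -4 <= value <= 4:
            raise ValueError(f"sentiment value out of range: {value}")
        pairs.append((entity.text, group.text, value))
    return pairs


print(linked_pairs(SAMPLE))
```

Indexing groups by id_group keeps the lookup simple when several entities point to the same sentiment group.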
      • We consider the following major categories: City, Organization, Company, Country, and Person; additionally, we consider categories such as Brand, Product, and Publication.
      • For almost all major categories we consider subcategories:
        • For Cities we consider Romanian, European, American, and Other Cities
        • For Organizations we consider Parties, Faculties, Universities, Ministries, etc.
        • For People we consider Sportsmen, Politicians, Males, Females, etc.
      • 11 annotators (1st-year master's students in computational linguistics at FII, UAIC)
      • As the annotation tool we chose Serna (http://www.syntext.com/products/serna/): open source, flexible, easy to use, and intuitive
      • Method 1: process the chosen files with our tools and automatically add annotations for named entities and for sentiments
      • Method 2: process only at paragraph level
      • 11 annotators
      • 1 week span
      • 110 files
      • 1988 paragraphs
      • 2044 sentiment groups
      • 4301 entities
      • 1101 links between entities and sentiment groups
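
To put these counts in proportion, the following Python sketch derives a few ratios from them. Only the raw counts come from the slide; the ratios are not reported by the authors, and reading them as averages assumes the work was spread evenly across annotators and files, which the slides do not state.

```python
# Counts reported on the slide.
ANNOTATORS = 11
FILES = 110
PARAGRAPHS = 1988
SENTIMENT_GROUPS = 2044
ENTITIES = 4301
LINKS = 1101

# Derived ratios (not reported in the presentation).
ratios = {
    "files per annotator": FILES / ANNOTATORS,
    "paragraphs per file": PARAGRAPHS / FILES,
    "sentiment groups per paragraph": SENTIMENT_GROUPS / PARAGRAPHS,
    "entities per paragraph": ENTITIES / PARAGRAPHS,
    "links per sentiment group": LINKS / SENTIMENT_GROUPS,
}

for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```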
      • Sentiment group precision
      • Precision for named entities and sentiment group links
      • Relaxed precision for sentiment group value
      • CG = the set of correctly identified sentiment groups
      • V_F(S_SG) = the value of the sentiment group as given by the system
      • V_G(S_SG) = the value of the sentiment group from the gold file.
      • Average deviation for sentiment group value
      • CG = the set of correctly identified sentiment groups
      • V_F(S_SG) = the value of the sentiment group as given by the system
      • V_G(S_SG) = the value of the sentiment group from the gold file.
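
The formulas for these two metrics did not survive in the transcript, so the Python sketch below is only one plausible reading of the definitions above, not the authors' formulation. It assumes that average deviation is the mean absolute difference between the system value and the gold value over CG, and that relaxed precision is the share of groups in CG whose system value lies within a tolerance of the gold value. The tolerance of 1, the function names, and the use of shared group ids as a stand-in for CG are all hypothetical.

```python
def correctly_identified(system_values, gold_values):
    """Stand-in for CG: group ids present in both system and gold output.
    (Assumption: real matching would compare annotated text spans.)"""
    return sorted(system_values.keys() & gold_values.keys())


def average_deviation(system_values, gold_values):
    """Mean absolute difference between system and gold values over CG."""
    cg = correctly_identified(system_values, gold_values)
    if not cg:
        return 0.0
    return sum(abs(system_values[g] - gold_values[g]) for g in cg) / len(cg)


def relaxed_precision(system_values, gold_values, tolerance=1):
    """Share of groups in CG whose system value is within `tolerance`
    of the gold value (the tolerance is a hypothetical parameter)."""
    cg = correctly_identified(system_values, gold_values)
    if not cg:
        return 0.0
    hits = sum(1 for g in cg if abs(system_values[g] - gold_values[g]) <= tolerance)
    return hits / len(cg)


# Invented example values on the -4..4 scale, keyed by group id.
system = {"g1": 3, "g2": -2, "g3": 0, "g5": 4}
gold = {"g1": 4, "g2": -2, "g3": 3, "g4": 1}

print(average_deviation(system, gold))
print(relaxed_precision(system, gold))
```

In this invented example CG contains g1, g2, and g3, with absolute differences of 1, 0, and 3, so two of the three groups fall within the assumed tolerance.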
      • The importance of a Corpus for Sentiment Analysis for Romanian.
      • The annotation format and methodology.
      • Comparison between our proposal and existing Sentiment Corpora.