2. Sentiment datasets for other languages
• AFINN by Finn Årup Nielsen
AFINN is a list of English words rated for valence with an integer
between minus five (negative) and plus five (positive). The words were
manually labeled by Finn Årup Nielsen in 2009-2011. The file
is tab-separated.
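Because the file is tab-separated, loading it takes only a few lines. The sketch below assumes each line has the form `term<TAB>score`. Some AFINN terms are multi-word phrases containing spaces, so it splits on the last tab only. `load_afinn` and `score_text` are hypothetical helper names, and the sample entries are illustrative rather than quoted from the file.

```python
from typing import Dict, Iterable


def load_afinn(lines: Iterable[str]) -> Dict[str, int]:
    """Parse AFINN-style lines of the form 'term<TAB>score' into a dict.

    Assumption: one entry per line, with the integer score after the last tab.
    Terms may contain spaces (multi-word phrases), so we split on the tab only.
    """
    lexicon: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        term, score = line.rsplit("\t", 1)
        lexicon[term] = int(score)
    return lexicon


def score_text(text: str, lexicon: Dict[str, int]) -> int:
    """Sum the valences of single-word tokens found in the lexicon.

    Naive whitespace tokenization: multi-word entries are never matched here.
    """
    return sum(lexicon.get(token, 0) for token in text.lower().split())


# Illustrative sample entries (assumed format, not quoted from the file).
sample = ["abandon\t-2", "good\t3", "does not work\t-3"]
afinn = load_afinn(sample)
print(score_text("Good good abandon", afinn))  # 3 + 3 - 2 = 4
```

Summing per-token valences is the simplest way to use an integer-rated list; real use would need a tokenizer that also matches the multi-word entries.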
• Opinion Lexicon by Hu and Liu
A list of around 6,800 English positive and negative opinion (sentiment) words.
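A list like this is typically used as two plain word sets. The sketch below assumes one word per line, with header comment lines that start with `;`. That layout is an assumption about the distributed files, and `load_word_list` and `polarity` are hypothetical names.

```python
from typing import Iterable, Set


def load_word_list(lines: Iterable[str]) -> Set[str]:
    """Read one word per line, skipping blank lines and ';' comment lines.

    Assumption: the list files begin with a comment header whose lines
    start with ';'.
    """
    words: Set[str] = set()
    for line in lines:
        word = line.strip()
        if word and not word.startswith(";"):
            words.add(word)
    return words


def polarity(word: str, positive: Set[str], negative: Set[str]) -> str:
    """Label a single word 'positive', 'negative', or 'neutral' (not listed)."""
    if word in positive:
        return "positive"
    if word in negative:
        return "negative"
    return "neutral"


# Toy inputs standing in for the two list files.
pos = load_word_list([";; header comment", "", "good", "great"])
neg = load_word_list([";; header comment", "bad"])
print(polarity("great", pos, neg), polarity("bad", pos, neg), polarity("table", pos, neg))
```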
• NRC Word-Emotion Association Lexicon by Saif Mohammad
and Peter Turney
The NRC Emotion Lexicon is a list of English words and their associations with
eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and
disgust) and two sentiments (negative and positive). The annotations were
done manually via crowdsourcing.
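Since each word is associated with any subset of the ten categories (eight emotions plus two sentiments), a word-to-set mapping is a natural in-memory form. The sketch below assumes a word-level layout of `word<TAB>category<TAB>flag`, where the flag is `1` for an association and `0` otherwise. `load_nrc` is a hypothetical name, and the sample lines are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_nrc(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each word to the set of emotions/sentiments it is associated with.

    Assumption: word-level format 'word<TAB>category<TAB>flag', where flag is
    '1' if the word is associated with the category and '0' otherwise.
    Words with no associations at all do not appear in the result.
    """
    associations: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        word, category, flag = parts
        if flag == "1":
            associations[word].add(category)
    return dict(associations)


# Illustrative sample lines in the assumed format.
sample = [
    "abandon\tfear\t1",
    "abandon\tjoy\t0",
    "abandon\tnegative\t1",
    "abandon\tsadness\t1",
]
nrc = load_nrc(sample)
print(sorted(nrc["abandon"]))  # ['fear', 'negative', 'sadness']
```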
5. Datasets
• Refined Persian Polarity corpus
• Dehdarbehbahani, I., Shakery, A., & Faili, H. (2014). Semi-supervised word
polarity identification in resource-lean languages. Neural Networks, 58, 50-59.
• Corpus of exceptions
• [working dataset]
• The corpus of exceptions was extracted from the Flexicon
database
• ناشتا
• نارنگی
• نارج
• بیدمشک
• الروبی
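The slides do not say how the exceptions were extracted. One plausible first step, shown here only as a hypothetical sketch, is to pull every lexicon entry that begins with a negative-looking prefix, so that a person can mark which entries are not actually negated forms (like the words above). The prefix tuple is an assumption inferred from the example words in these slides, and the short word list stands in for the lexicon.

```python
from typing import Iterable, List

# Hypothetical prefix list, inferred from the example words in these slides.
NEGATIVE_PREFIXES = ("نا", "بی", "غیر", "ضد", "پاد")


def exception_candidates(lexicon: Iterable[str]) -> List[str]:
    """Return lexicon words that start with a negative-looking prefix.

    These are only candidates: a person still has to decide which ones are
    real negated forms and which belong in the exceptions list.
    """
    return [word for word in lexicon if word.startswith(NEGATIVE_PREFIXES)]


words = ["ناشتا", "نارنگی", "کتاب", "بیدمشک", "ناشنوایی"]
print(exception_candidates(words))
```

`str.startswith` accepts a tuple, so all prefixes are checked in one call.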
• The exceptions list also needs to be refined
• غیر اجتماعی
• غیر اصولی
• ناشنوایی
• پادگان
9. How does the algorithm work?
[Flowchart: INPUT → Negatives list → Exception list → Affix searching; outputs: Negative / Not Negative]
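The flowchart can be read as a chain of three checks, each of which can end in Negative or Not Negative. The sketch below assumes the steps run in the order the chart lists them: a word in the negatives list is Negative, a word in the exception list is Not Negative, and anything else is Negative only if it starts with a negative affix. The affix tuple, the handling of the zero-width non-joiner, and the toy word sets are assumptions for illustration, not the author's implementation.

```python
from typing import Set

# Hypothetical negative prefixes, inferred from the example words in these
# slides; the real affix inventory used by the algorithm is not given.
NEGATIVE_PREFIXES = ("غیر", "ضد", "پاد", "نا", "بی")
ZWNJ = "\u200c"  # zero-width non-joiner, often written between prefix and stem


def classify(word: str, negatives: Set[str], exceptions: Set[str]) -> str:
    """Label a word 'Negative' or 'Not Negative' in the assumed flowchart order:
    negatives list, then exception list, then affix search."""
    word = word.strip()
    if word in negatives:
        return "Negative"  # listed as negative outright
    if word in exceptions:
        return "Not Negative"  # looks prefixed, but is not a negated form
    for prefix in NEGATIVE_PREFIXES:
        if word.startswith(prefix):
            # Drop a ZWNJ or space between prefix and stem, e.g. "غیر اجتماعی".
            stem = word[len(prefix):].lstrip(ZWNJ + " ")
            if stem:  # require an actual stem after the prefix
                return "Negative"
    return "Not Negative"


# Toy lists standing in for the real datasets.
negatives = {"بد"}
exceptions = {"نارنگی", "بیدمشک"}
for w in ["بد", "نارنگی", "غیر اجتماعی", "ضدآب", "کتاب"]:
    print(w, classify(w, negatives, exceptions))
```

Because affix search runs last, any affixed word that neither list covers falls through to the prefix check. This is why an affixed but positive word such as ضدآب comes out Negative here.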
10. Further development
• Creating a database of affixed but positive words
• بی پروا
• ضدآب
• Using Elasticsearch as the database to make the process
faster
• Using a corpus instead of FLexicon
• Using statistical approaches to increase accuracy
11. Further Research
• The datasets are still not reliable enough; they need further
refinement to increase the accuracy of the algorithm.