Persian Sentiment Analysis
Natural Language Processing
Moein Hosseini
abc@moein.xyz
Outline
● Introduction
● Research
– A Framework for Sentiment Analysis in Persian
– A Non-Parametric LDA-Based Induction Method for Sentiment Analysis
– Feature Selection Methods in Persian Sentiment Analysis
– Emotions from Farsi Texts with Mutual-Word-Counting and Word-Spotting
– Others
● Usages
– Analyzing the Political Sentiment of Tweets in Farsi
● Data Sets
Introduction
Different Levels of Analysis
● Document level
● Sentence level
● Entity and Aspect level
– Opinion: (e, a, s, h, t) = (entity, aspect, sentiment, holder, time) →
("mac pro", "openness", -10, "Moein", 1464582
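The opinion quintuple above maps naturally onto a small record type. The sketch below is only an illustration: the class name, field names, and the `polarity` helper are assumptions, and the values are copied from the slide as they appear there.

```python
from typing import NamedTuple


class Opinion(NamedTuple):
    """One aspect-level opinion: (entity, aspect, sentiment, holder, time)."""
    entity: str     # e: the thing being discussed
    aspect: str     # a: the feature of the entity the opinion targets
    sentiment: int  # s: signed strength, negative values mean a negative opinion
    holder: str     # h: who expressed the opinion
    time: int       # t: when it was expressed (value copied from the slide)


def polarity(opinion: Opinion) -> str:
    """Reduce the signed sentiment score to a coarse polarity label."""
    if opinion.sentiment < 0:
        return "negative"
    if opinion.sentiment > 0:
        return "positive"
    return "neutral"


example = Opinion("mac pro", "openness", -10, "Moein", 1464582)
```

Document-level and sentence-level analysis can be seen as dropping the `aspect` field and scoring the whole text at once.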
Different Types of Opinions
● Regular vs Comparative
● Explicit vs Implicit
Sentiment Analysis Approaches
● Machine Learning Approach
– Identify non-sentiment terms, implied sentiment
– Needs seed data; domain-dependent
● Lexicon Based Approach
– WordNet, SentiWordNet
Persian Sentiment Analysis
A Framework for Sentiment
Analysis in Persian
Published in:
Open Transactions on Information Processing
Authors:
● Basiri, Mohammad
● Nilchi, Ahmad
● Ghassem-Aghaee, Nasser
Normalization: Solve Basic Challenges
– Different forms of writing:
– Different Unicode:
– Space and Pseudo-Space:
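A minimal sketch of what such a normalization step might look like, assuming the usual Persian text issues: Arabic code points typed in place of Persian letters, and inconsistent use of the zero-width non-joiner (the "pseudo-space"). The character table and the function name `normalize` are illustrative assumptions, not the framework's actual rules.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner, the Persian "pseudo-space"

# Arabic code points that are often typed instead of their Persian counterparts.
CHAR_MAP = str.maketrans({
    "\u064a": "\u06cc",  # ARABIC LETTER YEH          -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06cc",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF          -> ARABIC LETTER KEHEH
})


def normalize(text: str) -> str:
    """Unify letter variants and clean up space / pseudo-space usage."""
    text = text.translate(CHAR_MAP)
    # Collapse runs of pseudo-spaces into a single one.
    text = re.sub(ZWNJ + "+", ZWNJ, text)
    # A pseudo-space next to real whitespace is redundant: drop it.
    text = re.sub("(?<=\\s)" + ZWNJ + "|" + ZWNJ + "(?=\\s)", "", text)
    # Collapse runs of ordinary whitespace and trim the ends.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

Running this before spell correction and stemming means later steps see one spelling per letter and one convention for joining word parts.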
Spell Correction:
● Many letters for one sound:
– One word can be written in 48 ways
● Informal words
Stemmer: Using Dolamic Stemmer
● Remove stop words
● Doesn't affect verbs
● But most sentiment words are
nouns and adjectives
Sentence Splitting: Any comment is
● A unit of text
● A collection of sentences
Polarity Detection: Translated SentiStrength
Aggregation:
● SentiStrength
● Maximum of scores
● Scaled rate
● Sum of maximums
● Dempster-Shafer
Dempster-Shafer:
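The formula on this slide did not survive, so here is a hedged sketch of Dempster's rule of combination, the standard way to merge two bodies of evidence in Dempster-Shafer theory. How the framework turns sentence scores into mass functions is not shown here; the masses and the names `combine`, `POS`, `NEG`, `ANY` below are assumptions for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule: merge two mass functions over frozenset hypotheses."""
    combined = {}
    conflict = 0.0
    for (b, p), (c, q) in product(m1.items(), m2.items()):
        common = b & c
        if common:
            # Evidence that agrees on a non-empty set supports that set.
            combined[common] = combined.get(common, 0.0) + p * q
        else:
            # Evidence pointing at disjoint sets counts as conflict K.
            conflict += p * q
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by 1 - K so the masses sum to 1 again.
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


POS = frozenset({"positive"})
NEG = frozenset({"negative"})
ANY = POS | NEG  # mass left on "could be either" (uncertainty)

# Hypothetical evidence from two sentences of the same review.
sentence_1 = {POS: 0.6, NEG: 0.1, ANY: 0.3}
sentence_2 = {POS: 0.5, NEG: 0.2, ANY: 0.3}
review = combine(sentence_1, sentence_2)
```

Because the rule is associative, a review with many sentences can be aggregated by folding `combine` over the sentence-level mass functions one at a time.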
Evaluation:
● mobile.ir
● Number of reviews: 1100
● Average number of words: 2547
● Average number of sentences: 191
Result:
A non-parametric LDA-based
induction method for sentiment
analysis
Published in:
AISP 2012 - 16th CSI International Symposium on
Artificial Intelligence and Signal Processing
Authors:
● Shams, Mohammadreza
● Shakery, Azadeh
● Faili, Heshaam
LDASA
● Build Persian Clues:
– Translate English lexicon to Persian
– Correct errors
● LDA
● Classification
Translate English lexicon to Persian
● Subjectivity Clues (8027 terms)
● Using automatic translation
– So the translated lexicon has a different size:
● Jealous: negative
● Reduce Size
– Remove frequent & infrequent words
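Pruning by frequency could be sketched as a document-frequency filter over the comment corpus. The thresholds `min_df` and `max_df_ratio`, the function name, and the toy data are hypothetical; the cut-offs actually used are not stated on the slide.

```python
from collections import Counter


def prune_lexicon(lexicon, docs, min_df=2, max_df_ratio=0.5):
    """Keep only lexicon words that are neither too rare nor too common.

    lexicon: dict mapping word -> polarity label
    docs:    list of tokenized comments (lists of words)
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each word once per comment
    max_df = max_df_ratio * len(docs)
    return {
        word: label
        for word, label in lexicon.items()
        if min_df <= df[word] <= max_df
    }


# Toy English stand-ins for translated clue words.
docs = [["a", "good"], ["a", "good"], ["a", "rare"],
        ["a", "bad"], ["a", "bad"], ["a"]]
lexicon = {"a": "negative", "good": "positive",
           "rare": "negative", "bad": "negative"}
pruned = prune_lexicon(lexicon, docs)
```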
Error Correction:
● Using WordNet
– There is no well-defined Persian WordNet
● Using concept graph
– Comments are too short for that
● Using mutual information
– Again LIKE A BOSS
● Mutual Information:
● Iterative task runs to correct errors:
– Seed and init:
● 40 most used positive
● 40 most used negative
– Correct one word polarity in each iteration
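The iterative correction can be sketched with pointwise mutual information against the seed words, in the spirit of PMI-based semantic orientation. This is an assumption about the general shape of the procedure, not the paper's exact formulation: the scoring function, the tie handling, and the toy English data are all illustrative, and the real seed sets are the 40 most used positive and negative words.

```python
import math


def pmi(w1, w2, docs):
    """Pointwise mutual information of two words over a list of word sets."""
    n = len(docs)
    c1 = sum(1 for d in docs if w1 in d)
    c2 = sum(1 for d in docs if w2 in d)
    c12 = sum(1 for d in docs if w1 in d and w2 in d)
    if c12 == 0:
        return 0.0  # simplification: no co-occurrence is treated as no evidence
    return math.log2(c12 * n / (c1 * c2))


def orientation(word, pos_seeds, neg_seeds, docs):
    """Positive if the word co-occurs more with positive seeds than negative ones."""
    return (sum(pmi(word, s, docs) for s in pos_seeds)
            - sum(pmi(word, s, docs) for s in neg_seeds))


def correct_one(lexicon, pos_seeds, neg_seeds, docs):
    """One iteration: flip the label that most contradicts its orientation."""
    worst, worst_gap = None, 0.0
    for word, label in lexicon.items():
        if word in pos_seeds or word in neg_seeds:
            continue  # seed words are trusted and never changed
        score = orientation(word, pos_seeds, neg_seeds, docs)
        gap = -score if label == "positive" else score
        if gap > worst_gap:
            worst, worst_gap = word, gap
    if worst is not None:
        flipped = "negative" if lexicon[worst] == "positive" else "positive"
        lexicon[worst] = flipped
    return worst  # None means nothing left to correct


docs = [{"good", "great"}, {"good", "great", "nice"}, {"bad", "awful"},
        {"bad", "awful", "poor"}, {"good", "nice"}, {"bad", "poor"}]
lexicon = {"great": "negative", "awful": "negative"}  # "great" was mistranslated
first = correct_one(lexicon, {"good"}, {"bad"}, docs)
second = correct_one(lexicon, {"good"}, {"bad"}, docs)
```

Calling `correct_one` in a loop until it returns `None` gives the one-word-per-iteration behaviour described above.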
● Topic Extraction: LDA
● Classification:
– Positive and Negative
● Evaluation:
– Phones, digital cameras, hotels
– 200 positive and 200 negative for each group
Evaluation:
Feature selection methods in
Persian sentiment analysis
Published in:
Natural Language Processing and Information
Authors:
● Saraee, Mohamad
● Bagheri, Ayoub
● Feature Selection for Sentiment Analysis
– Document Frequency (DF)
– Term Frequency Variance (TFV)
– Mutual Information (MI)
– Modified Mutual Information (MMI)
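Two of the listed scores are easy to sketch. Document frequency counts the documents that contain a term; for term frequency variance the code below assumes one common formulation, the sum over classes of the squared deviation of the term's per-class frequency from its mean. The paper's exact definitions are not on the slide, so the function names and the toy data are illustrative.

```python
from collections import Counter


def document_frequency(docs):
    """DF: number of documents in which each term occurs at least once."""
    df = Counter()
    for tokens, _label in docs:
        df.update(set(tokens))
    return df


def term_frequency_variance(docs):
    """TFV (assumed form): sum over classes of (tf in class - mean tf)^2."""
    per_class = {}
    for tokens, label in docs:
        per_class.setdefault(label, Counter()).update(tokens)
    vocab = set().union(*per_class.values())
    tfv = {}
    for term in vocab:
        freqs = [counts[term] for counts in per_class.values()]
        mean = sum(freqs) / len(freqs)
        tfv[term] = sum((f - mean) ** 2 for f in freqs)
    return tfv


# Toy labeled reviews: (tokens, class label).
docs = [
    (["good", "phone", "good"], "pos"),
    (["bad", "phone"], "neg"),
    (["good", "screen"], "pos"),
]
df = document_frequency(docs)
tfv = term_frequency_variance(docs)
```

A term like "phone" that is spread evenly over the classes gets a variance of zero, so it is dropped first when features are ranked by TFV.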
● Mutual Information:
        c1   c2
   f1   A    B
   f2   C    D
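The formula that accompanied the contingency counts A, B, C, D is missing, so the sketch below uses the standard pointwise form, MI(f, c) = log(A·N / ((A + B)(A + C))) with N = A + B + C + D. It assumes f1 means "feature present", f2 means "feature absent", and that the score is computed for class c1; the function name is illustrative.

```python
import math


def mutual_information(a, b, c, d):
    """MI between feature presence and class c1 from a 2x2 contingency table.

    a: documents with the feature, in class c1
    b: documents with the feature, in class c2
    c: documents without the feature, in class c1
    d: documents without the feature, in class c2
    """
    n = a + b + c + d
    if a == 0:
        return float("-inf")  # feature never co-occurs with the class
    return math.log2(a * n / ((a + b) * (a + c)))


associated = mutual_information(40, 10, 10, 40)   # feature favors class c1
independent = mutual_information(25, 25, 25, 25)  # feature tells us nothing
```

Features are then ranked by this score per class and the top ones are kept.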
● Evaluation:
Emotions from Farsi Texts with
Mutual-Word-Counting and Word-Spotting
Published in:
The 16th CSI International Symposium on
Artificial Intelligence and Signal Processing
Authors:
● Jahromi, Amir Namvar
● Homayounpour, Mohammad Mehdi
● Sentiment:
– Polarity: Positive, Negative
● Sense:
– Happy, Sad, Angry and ...
Sensing Methods:
● Word Count
– Counting
– Weighted Counting
● Word Spotting
– Labeled Word: if words of more than one emotion exist in the sentence, the emotion with the larger number of related words is selected as the final result
● Mutual Word Count
– Two similar words are counted as a single word
● Mutual Word Count and Word Spotting
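A rough sketch of the counting and weighted-counting idea: look each token up in an emotion lexicon and pick the emotion with the largest (optionally weighted) count. The lexicon is a made-up English stand-in, the function name is illustrative, and the mutual word count variant is not covered here.

```python
from collections import Counter

# Hypothetical emotion lexicon: word -> emotion label.
LEXICON = {
    "happy": "happy", "joy": "happy",
    "sad": "sad", "cry": "sad",
    "angry": "angry", "furious": "angry",
}


def detect_emotion(sentence, lexicon=LEXICON, weights=None):
    """Return the emotion with the most (weighted) lexicon hits, else 'neutral'."""
    weights = weights or {}
    counts = Counter()
    for token in sentence.lower().split():
        emotion = lexicon.get(token)
        if emotion is not None:
            # Plain counting uses weight 1; weighted counting looks it up.
            counts[emotion] += weights.get(token, 1.0)
    if not counts:
        return "neutral"  # no emotion word spotted in the sentence
    return counts.most_common(1)[0][0]
```

For example, `detect_emotion("I am happy but sad and ready to cry")` finds one "happy" word and two "sad" words, so the sentence is labeled sad; passing `weights={"happy": 3.0}` would tip the same sentence toward happy.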
Evaluation:
● 2243 sentences in four groups: happy, neutral,
sad, angry
Others
● Opinion Mining in Persian Language Using
Supervised Algorithms
● Lexicon-based sentiment analysis for
Persian text
● Sentiment classification in Persian:
Introducing a mutual information-based
method for feature selection
● A SVM-based method for sentiment analysis
in Persian language
● [Persian-language title on SVM; text not recoverable]
Usages
Analyzing the Political Sentiment of
Tweets in Farsi
Published in:
Proceedings of the Tenth International AAAI
Conference on Web and Social Media (ICWSM
2016)
Authors:
● Vaziripour, Elham
● Zappala, Daniel
● Giraud-Carrier, Christophe
● Using the Twitter Streaming API during the Iran deal negotiations
● Filtering by some terms:
– ...
● 3000 tweets labeled by native Persian
speakers
– 1,2 → negative → 37%
– 3 → neutral → 35%
– 4,5 → positive → 27%
● Using Brown clustering
● SVM
– 1000 clusters + 3 as cutoff
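The 1 to 5 annotation scale above collapses into three classes before training. A small sketch of that mapping follows; the function names and the sample ratings are hypothetical, and only the 1,2 / 3 / 4,5 grouping comes from the slide.

```python
from collections import Counter


def to_polarity(rating: int) -> str:
    """Map a 1-5 annotator rating onto the three sentiment classes."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be 1..5, got {rating!r}")
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


def class_distribution(ratings):
    """Share of each class among the labeled tweets, as fractions of 1."""
    counts = Counter(to_polarity(r) for r in ratings)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical ratings for four tweets.
shares = class_distribution([1, 2, 3, 5])
```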
● Subtopics by LDA
● Result
Data Sets
Persian SentiWordNet
Adjectives: Manual Annotation
● Positive: 968 words
● Negative: 962 words
● Neutral: 1572 words
Adjectives + Verbs + Nouns: Semi-Supervised
● Adjectives: 3588 words
● Verbs: 4073 words
● Nouns: 7325 words
Semi-supervised word polarity identification in
resource-lean languages
Authors:
● Iman Dehdarbehbahani
● Azadeh Shakery
● Heshaam Faili
Others
● [Persian-language data set names; text not recoverable]
● Persian ESD
Thanks for your attention
