SlideShare a Scribd company logo
1 of 15
Download to read offline
LEXICON-BASED SENTIMENT ANALYSIS FOR PERSIAN
TEXT
Authors:
Mohammadhassan Khodashahi
Twitter: @KhodashahiMh
Email: mh.khodashai@gmail.com
1
CONTENTS
➢ Motivation
➢ Introduction
➢ Approach
✓ Data collection
✓ Processing resource
▪ Tokenizer
▪ Sentence splitter
▪ Pos tagger
▪ Jape rules
➢ Testing
✓ Performance testing (1)
✓ Performance testing (2)
➢ Conclusion and future works
➢ References
2
MOTIVATION
• Online shopping become more preferable for customers.
• E-business has improved in different aspects.
• User feedbacks and comments on different products have an important roll on other
users’ decision.
• Sentiment analysis works on this amount of data to give the expected information
for different Enterprises to help them improved.
3
INTRODUCTION
• Definition (Wikipedia): Sentiment analysis ( opinion mining) refers to the use
of natural language processing and text mining to identify and extract
subjective information in source materials.
• sentiment analysis determine the attitude of a speaker as
• Why for Persian language?! Number of Persian websites and users
increasing, but no such research for Persian language.
4
APPROACH
• Data collection:
● Gathering 7179 Persian adjectives.
● we received a total of 8278 votes for adjectives.
● Asking Persian people to vote about their sense on each
adjectives
● Calculate the average voting for each word, to define the
sentiment of them.
5
APPROACH…
6
• Tokenizer: splits the text into very simple tokens
• Sentence splitter: Fragments the text into its sentences
• Gazetter: our Gazeteer is our gathered adjectives with their sentiment
Approach…
7
APPROACH…
• Jape Rules: To identify regular expressions we have formulated as
grammar base
❖ Word level
“‫”ﺑﯽ‬ “Bi” and “‫”ﻧﺎ‬ “Na” + noun => negative word (‫ﻣﻌﺮﻓﺖ‬ ‫ﺑﯽ‬ ,‫)ﻧﺎﻣﺮد‬
“ ‫”ن‬ + Verb => negative verb (‫ﻓﮭﻤﺪ‬ ‫)ﻧﻤﯽ‬
❖ Sentence level
“ He is not a lair”
“That film had a lot of famous actors but it couldn’t attract people’s “
8
TESTING
• Performance test (1):
Using a Persian web site : www.iran-booking.com
69%
9
10
TESTING…
• Performance test (2)
Gathering about 5000 news from most famous Persian news websites
depends on Alexa rating.
Choose 800 news randomly and asked Persian people to vote on them about
their sense.
11
PERFORMANCE TEST (2)…
1200 votes have been gathered on 800 news.
12
CONCLUSION AND FUTURE WORKS
• For the first Persian API we receive to 69% accuracy till now
How to improved the accuracy?
❖ more jape rules especially in the form of “BUT” “Although” and …
❖ Encouraging more Persian people to vote on the adjectives and phrases to
have a better lexicon data base.
❖ Working on informal form too that’s common in tweets and comments.
13
REFERENCES
• [1] A SVM-Based Method for Sentiment Analysis in Persian Language
(Mohammad Sadegh Hajmohammadi, Roliana Ibrahim)
• [2] https://gate.ac.uk/
• [3] “Majid Sezavar” http://sazvar.student.um.ac.ir/
• [4]
http://www.coreservlets.com/Apache-Tomcat-Tutorial/tomcat-7-with
-eclipse.html
• [5] http://javapapers.com/web-service/java-web-service-using-eclipse
14
THANKS FOR YOUR ATTENTION…
15

More Related Content

Similar to Lexicon-based Sentiment Analysis for Persian Text

Similar to Lexicon-based Sentiment Analysis for Persian Text (20)

Lexicon-Based Sentiment Analysis at GHC 2014
Lexicon-Based Sentiment Analysis at GHC 2014Lexicon-Based Sentiment Analysis at GHC 2014
Lexicon-Based Sentiment Analysis at GHC 2014
 
The Actionable Guide to Doing Better Semantic Keyword Research #BrightonSEO (...
The Actionable Guide to Doing Better Semantic Keyword Research #BrightonSEO (...The Actionable Guide to Doing Better Semantic Keyword Research #BrightonSEO (...
The Actionable Guide to Doing Better Semantic Keyword Research #BrightonSEO (...
 
Opinion Mining
Opinion MiningOpinion Mining
Opinion Mining
 
Fypca4
Fypca4Fypca4
Fypca4
 
Fypca4
Fypca4Fypca4
Fypca4
 
Fypca4
Fypca4Fypca4
Fypca4
 
Conversational AI from CAT I to III
Conversational AI  from CAT I to IIIConversational AI  from CAT I to III
Conversational AI from CAT I to III
 
Karen N. Johnson: Managing an Offshore Team
Karen N. Johnson: Managing an Offshore TeamKaren N. Johnson: Managing an Offshore Team
Karen N. Johnson: Managing an Offshore Team
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Introducing talent auditions: The future of attracting, assessing, & acquirin...
Introducing talent auditions: The future of attracting, assessing, & acquirin...Introducing talent auditions: The future of attracting, assessing, & acquirin...
Introducing talent auditions: The future of attracting, assessing, & acquirin...
 
Introducing talent auditions: The future of attracting, assessing, & acquirin...
Introducing talent auditions: The future of attracting, assessing, & acquirin...Introducing talent auditions: The future of attracting, assessing, & acquirin...
Introducing talent auditions: The future of attracting, assessing, & acquirin...
 
Deep Learning for Dialogue Modeling - NTHU
Deep Learning for Dialogue Modeling - NTHUDeep Learning for Dialogue Modeling - NTHU
Deep Learning for Dialogue Modeling - NTHU
 
Better Search Engine Testing
Better Search Engine TestingBetter Search Engine Testing
Better Search Engine Testing
 
Sentiment Analysis for SEO
Sentiment Analysis for SEOSentiment Analysis for SEO
Sentiment Analysis for SEO
 
Customer Insights Workshop - Consumer Text Analytics Conference
Customer Insights Workshop - Consumer Text Analytics ConferenceCustomer Insights Workshop - Consumer Text Analytics Conference
Customer Insights Workshop - Consumer Text Analytics Conference
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLP
 
Mobipedia presentation
Mobipedia presentationMobipedia presentation
Mobipedia presentation
 
[ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community
[ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community[ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community
[ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community
 
SLSP 2017 presentation - Attentional Parallel RNNs for Generating Punctuation...
SLSP 2017 presentation - Attentional Parallel RNNs for Generating Punctuation...SLSP 2017 presentation - Attentional Parallel RNNs for Generating Punctuation...
SLSP 2017 presentation - Attentional Parallel RNNs for Generating Punctuation...
 
What Questions Are Worth Answering?
What Questions Are Worth Answering?What Questions Are Worth Answering?
What Questions Are Worth Answering?
 

Recently uploaded

Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Valters Lauzums
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Stephen266013
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
dq9vz1isj
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
zifhagzkk
 
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
aqpto5bt
 
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontangobat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontang
siskavia95
 
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
jk0tkvfv
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives
23050636
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
acoha1
 

Recently uploaded (20)

Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
 
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam DunksNOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
 
Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchers
 
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024
 
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontangobat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontang
 
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshare
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancing
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
 
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI  MANAJEMEN OF PENYAKIT TETANUS.pptMATERI  MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 

Lexicon-based Sentiment Analysis for Persian Text

  • 1. LEXICON-BASED SENTIMENT ANALYSIS FOR PERSIAN TEXT Authors: Mohammadhassan Khodashahi Twitter: @KhodashahiMh Email: mh.khodashai@gmail.com 1
  • 2. CONTENTS ➢ Motivation ➢ Introduction ➢ Approach ✓ Data collection ✓ Processing resource ▪ Tokenizer ▪ Sentence splitter ▪ Pos tagger ▪ Jape rules ➢ Testing ✓ Performance testing (1) ✓ Performance testing (2) ➢ Conclusion and future works ➢ References 2
  • 3. MOTIVATION • Online shopping become more preferable for customers. • E-business has improved in different aspects. • User feedbacks and comments on different products have an important roll on other users’ decision. • Sentiment analysis works on this amount of data to give the expected information for different Enterprises to help them improved. 3
  • 4. INTRODUCTION • Definition (Wikipedia): Sentiment analysis ( opinion mining) refers to the use of natural language processing and text mining to identify and extract subjective information in source materials. • sentiment analysis determine the attitude of a speaker as • Why for Persian language?! Number of Persian websites and users increasing, but no such research for Persian language. 4
  • 5. APPROACH • Data collection: ● Gathering 7179 Persian adjectives. ● we received a total of 8278 votes for adjectives. ● Asking Persian people to vote about their sense on each adjectives ● Calculate the average voting for each word, to define the sentiment of them. 5
  • 7. • Tokenizer: splits the text into very simple tokens • Sentence splitter: Fragments the text into its sentences • Gazetter: our Gazeteer is our gathered adjectives with their sentiment Approach… 7
  • 8. APPROACH… • Jape Rules: To identify regular expressions we have formulated as grammar base ❖ Word level “‫”ﺑﯽ‬ “Bi” and “‫”ﻧﺎ‬ “Na” + noun => negative word (‫ﻣﻌﺮﻓﺖ‬ ‫ﺑﯽ‬ ,‫)ﻧﺎﻣﺮد‬ “ ‫”ن‬ + Verb => negative verb (‫ﻓﮭﻤﺪ‬ ‫)ﻧﻤﯽ‬ ❖ Sentence level “ He is not a lair” “That film had a lot of famous actors but it couldn’t attract people’s “ 8
  • 9. TESTING • Performance test (1): Using a Persian web site : www.iran-booking.com 69% 9
  • 10. 10
  • 11. TESTING… • Performance test (2) Gathering about 5000 news from most famous Persian news websites depends on Alexa rating. Choose 800 news randomly and asked Persian people to vote on them about their sense. 11
  • 12. PERFORMANCE TEST (2)… 1200 votes have been gathered on 800 news. 12
  • 13. CONCLUSION AND FUTURE WORKS • For the first Persian API we receive to 69% accuracy till now How to improved the accuracy? ❖ more jape rules especially in the form of “BUT” “Although” and … ❖ Encouraging more Persian people to vote on the adjectives and phrases to have a better lexicon data base. ❖ Working on informal form too that’s common in tweets and comments. 13
  • 14. REFERENCES • [1] A SVM-Based Method for Sentiment Analysis in Persian Language (Mohammad Sadegh Hajmohammadi, Roliana Ibrahim) • [2] https://gate.ac.uk/ • [3] “Majid Sezavar” http://sazvar.student.um.ac.ir/ • [4] http://www.coreservlets.com/Apache-Tomcat-Tutorial/tomcat-7-with -eclipse.html • [5] http://javapapers.com/web-service/java-web-service-using-eclipse 14
  • 15. THANKS FOR YOUR ATTENTION… 15