Sentiment analysis for Serbian language

Sentiment analysis of the Serbian language, using a stemmer for Serbian.

Published in: Technology

  1. Sentiment analysis of sentences in the Serbian language
     Nikola Milošević
  2. Why analyze sentiment in Serbian?
     ● Great industrial need
       – Automated market research
       – Ad websites
       – Customer satisfaction
     ● NLP tools for Serbian are not developed
       – Need for tools and resources
       – Almost no tools accessible through an API
  3. Serbian language
     ● Belongs to the Indo-European language family
     ● Slavic language
     ● Highly inflectional
     ● 3 pronunciation types
     ● 3 dialect groups
     ● Phonemic orthography ("write as you speak")
     ● Latin and Cyrillic writing systems
  4. Sentiment analysis workflow
  5. Tokenization and preprocessing
     ● Tokenization: breaking a stream of text up into words
     ● Stop-word filtering
     ● Negation handling
       – Adding the NE_ prefix to words that follow a negation
       – Applied to all words up to the next punctuation mark
     ● Handling of irregular verbs
  6. Stemming
     ● Process of reducing inflected words to their stem, base or root form
     ● Kešelj and Šipka (2008)
     ● Hand-crafted, rule-based stemmer
     ● ~300 rules
  7. Sentiment analysis
     ● Aim: build a binary sentiment analyzer
     ● Targets general Serbian language
     ● No annotated corpus for Serbian
     ● Annotation work (~1000 short texts)
     ● Supervised machine learning
  8. Naive Bayes
     ● Algorithm that learns fast
     ● Bag-of-words approach
     ● Assumption of conditional independence between words, given the class
     ● Laplace smoothing
  9. Implementation
     ● Web API with a presentation layer
     ● JSON communication
     ● Secured page for annotation
     ● Built using PHP and MySQL
     ● Web and Android
  10. Results
     ● Stemmer
       – 90% correct on news articles
       – Smallest and most precise stemmer
       – Problems: short words, irregular inflections, sound changes
     ● Sentiment analyzer
       – 80% correct
       – Problems: irony, ambiguity, small training data
  11. Future work
     ● Stemmer
       – Use the Snowball framework
       – Build a multi-step stemmer
     ● Sentiment analyzer
       – POS tagging
       – More complex negation handling
       – SVM algorithm
  12. Thank you
     ● Available from http://inspiratron.org
     ● Contact: nikola.milosevic@postgrad.manchester.ac.uk
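
Slide 5 lists the preprocessing steps but not their exact rules. The following Python sketch shows one way tokenization, stop-word filtering and NE_ negation marking could fit together, assuming the negation scope runs from a negation word to the next punctuation mark. The word lists (STOP_WORDS, NEGATIONS) are tiny hypothetical examples, and treating fused negated verb forms such as nije or neću as negation words is an assumption about what the "irregular verbs" bullet refers to.

```python
import re

# Hypothetical, tiny word lists for illustration only; a real system
# would need a full Serbian stop-word list.
STOP_WORDS = {"i", "u", "je", "da", "na", "se", "to"}
# The particle "ne" plus some fused negated forms of irregular verbs
# (assumption: this is what the "irregular verbs" bullet refers to).
NEGATIONS = {"ne", "nije", "nisam", "nisu", "neću", "nemam", "nema"}
PUNCTUATION = set(".,!?;:")

# Words (\w is Unicode-aware, so č, ć, đ, š, ž match) or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[.,!?;:]")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word and punctuation tokens."""
    return TOKEN_RE.findall(text.lower())


def preprocess(text: str) -> list[str]:
    """Tokenize, prefix negated words with NE_, and drop stop words."""
    result = []
    negated = False
    for token in tokenize(text):
        if token in PUNCTUATION:
            negated = False  # negation scope ends at punctuation
            continue
        if token in NEGATIONS:
            negated = True  # prefix the words that follow
            continue
        if token in STOP_WORDS:
            continue
        result.append("NE_" + token if negated else token)
    return result


print(preprocess("Film nije dobar, ali je muzika odlična."))
# → ['film', 'NE_dobar', 'ali', 'muzika', 'odlična']
```

Negation words are checked before stop words, so a negation particle is never discarded before it can mark the words after it.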

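Slide 6 refers to a hand-crafted, rule-based stemmer with about 300 rules (Kešelj and Šipka, 2008). Those rules are not reproduced here. The Python sketch below only illustrates a generic longest-suffix-stripping approach, using a handful of hypothetical suffixes (SUFFIXES) and a hypothetical minimum stem length (MIN_STEM_LENGTH) that leaves very short words intact, one of the problem cases listed on slide 10.

```python
# Hypothetical suffix list for illustration; NOT the ~300 rules of the
# Kešelj and Šipka stemmer.
SUFFIXES = ["ovima", "ama", "ima", "om", "em", "og", "oj", "ih", "a", "e", "i", "o", "u"]
MIN_STEM_LENGTH = 3  # hypothetical guard so very short words are not over-stemmed

# Try longer suffixes first so "ama" wins over "a".
_SUFFIXES_LONGEST_FIRST = sorted(SUFFIXES, key=len, reverse=True)


def stem(word: str) -> str:
    """Strip the longest matching suffix, keeping at least MIN_STEM_LENGTH characters."""
    word = word.lower()
    for suffix in _SUFFIXES_LONGEST_FIRST:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word  # no rule applies


for form in ["knjiga", "knjigu", "knjigama", "gradom", "gradovima", "ne"]:
    print(form, "->", stem(form))
```

With these toy rules, knjiga, knjigu and knjigama all reduce to knjig, so a bag-of-words classifier sees one feature instead of three.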
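
A minimal Python sketch of the classifier on slide 8, assuming the multinomial variant of Naive Bayes over bag-of-words tokens (for example, the output of the preprocessing and stemming steps) and two labels. Because words are treated as conditionally independent given the class, a document's score is log P(c) plus the sum of log P(w | c); Laplace (add-one) smoothing sets P(w | c) = (count(w, c) + 1) / (total words in c + |V|), so a word unseen in one class does not zero out that class. The class name NaiveBayes, the label strings and the toy training data are hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words tokens (hypothetical class name)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # alpha = 1.0 is Laplace (add-one) smoothing
        self.word_counts: dict[str, Counter] = {}  # label -> word frequencies
        self.doc_counts: Counter = Counter()  # label -> number of documents
        self.vocabulary: set[str] = set()

    def fit(self, documents: list[list[str]], labels: list[str]) -> None:
        """Count words per class; word order within a document is ignored."""
        for tokens, label in zip(documents, labels):
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocabulary.update(tokens)

    def log_score(self, tokens: list[str], label: str) -> float:
        """Return log P(label) + sum of log P(word | label) over the tokens."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        counts = self.word_counts[label]
        # Laplace-smoothed denominator: words in class + alpha * |V|
        denominator = sum(counts.values()) + self.alpha * len(self.vocabulary)
        for token in tokens:
            if token not in self.vocabulary:
                continue  # words never seen in training carry no information
            score += math.log((counts[token] + self.alpha) / denominator)
        return score

    def predict(self, tokens: list[str]) -> str:
        """Pick the label with the highest log score."""
        return max(self.doc_counts, key=lambda label: self.log_score(tokens, label))


model = NaiveBayes()
model.fit(
    [["odličan", "film"], ["sjajna", "muzika"], ["NE_dobar", "film"], ["loša", "gluma"]],
    ["positive", "positive", "negative", "negative"],
)
print(model.predict(["odličan", "muzika"]))  # → positive
```

Summing log probabilities instead of multiplying raw probabilities avoids floating-point underflow on longer texts; training is a single counting pass, which matches the "learns fast" point on the slide.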