Sentiment analysis for Serbian language

Sentiment analysis of the Serbian language, with a stemmer for Serbian.

Published in: Technology

  1. 1. Sentiment analysis of sentences in the Serbian language ● Nikola Milošević
  2. 2. Why analyze sentiment in Serbian? ● Great industrial need – Automated market research – Ads websites – Customer satisfaction ● NLP tools for Serbian are not developed – Need for tools and resources – Almost no tools accessible through an API
  3. 3. Serbian language ● Belongs to the Indo-European language family ● Slavic language ● Highly inflectional ● 3 pronunciation types ● 3 dialect groups ● Write as you speak ● Latin and Cyrillic writing systems
  4. 4. Sentiment analysis work-flow
  5. 5. Tokenization and preprocessing ● Process of breaking a stream of text up into words ● Stop-word filtering ● Negation handling – Adding the NE_ prefix after a negation – Applied to all words before the next punctuation mark ● Irregular verbs
  6. 6. Stemming ● Process for reducing inflected words to their stem, base or root form ● Kešelj and Šipka (2008) ● Hand-crafted, rule-based stemmer ● ~300 rules
  7. 7. Sentiment analysis ● Aim: build a binary sentiment classifier ● General Serbian language ● No annotated corpus for Serbian ● Annotation work (~1000 short texts) ● Supervised machine learning
  8. 8. Naive Bayes ● Algorithm that learns fast ● Bag of words approach ● Assumption of conditional independence ● Laplace smoothing
  9. 9. Implementation ● Web API with presentation layer ● JSON communication ● Secured page for annotating ● Built using PHP and MySQL ● Web & Android
  10. 10. Results ● Stemmer – 90% correct on news articles – Smallest and most precise stemmer – Problems: small words, irregular inflections, voice changes ● Sentiment analyzer – 80% correct – Problems: irony, ambiguity, small training data
  11. 11. Future work ● Stemmer – Use the Snowball framework – Build a multi-step stemmer ● Sentiment analyzer – POS tagging – Complex negation handling – SVM algorithm
  12. 12. Thank you ● Available from http://inspiratron.org ● Contact: nikola.milosevic@postgrad.manchester.ac.uk
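
The negation handling described on slide 5 (an NE_ prefix on every word between a negation and the next punctuation mark) can be sketched in Python as below. This is an illustration, not the author's PHP implementation: the `NEGATIONS` set, the tokenizing regular expression, and the choice to keep the negation word itself unprefixed are all assumptions. Fused negative verb forms such as "nisam" and "neću" are included as one guess at what the "Irregular verbs" bullet refers to.

```python
import re

# Hypothetical negation cues; the slides do not list the ones actually used.
NEGATIONS = {"ne", "nije", "nisam", "nema", "neću"}

# Words (Unicode-aware, so č, ć, š, ž, đ are covered) or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[.,!?;:]")
PUNCTUATION = set(".,!?;:")


def mark_negation(text):
    """Return lower-cased word tokens, prefixing NE_ inside a negation scope."""
    tagged = []
    negated = False
    for token in TOKEN_RE.findall(text.lower()):
        if token in PUNCTUATION:
            negated = False          # punctuation closes the negation scope
        elif token in NEGATIONS:
            negated = True           # open a scope; keep the cue itself as-is
            tagged.append(token)
        elif negated:
            tagged.append("NE_" + token)
        else:
            tagged.append(token)
    return tagged


print(mark_negation("Film nije dobar, ali je muzika odlična."))
```

With this scheme "dobar" and "NE_dobar" become distinct features, so a bag-of-words classifier can learn opposite polarities for them.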

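The classifier from slide 8 (bag of words, conditional independence, Laplace smoothing) can be sketched as a small multinomial Naive Bayes in Python. The class name `NaiveBayes`, its methods, the `alpha` parameter, and the toy training sentences are all made up for illustration. Skipping words never seen in training is also an assumption, not something the slides specify.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bags of words with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # alpha=1 is Laplace smoothing
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, tokens, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def log_score(self, tokens, label):
        # log P(label) + sum of log P(word | label), words treated as independent
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        total_words = sum(self.word_counts[label].values())
        denom = total_words + self.alpha * len(self.vocab)
        for token in tokens:
            if token not in self.vocab:
                continue  # assumption: ignore words never seen in training
            score += math.log((self.word_counts[label][token] + self.alpha) / denom)
        return score

    def classify(self, tokens):
        return max(self.doc_counts, key=lambda label: self.log_score(tokens, label))


# Toy data, invented for this example only.
nb = NaiveBayes()
nb.train(["odličan", "film"], "pos")
nb.train(["dobar", "film"], "pos")
nb.train(["loš", "film"], "neg")
nb.train(["dosadan", "film"], "neg")
print(nb.classify(["loš", "film"]))
```

Working in log space avoids numeric underflow on longer texts. The smoothing term keeps a word seen under only one label from driving the other label's probability to zero.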