Sentiment analysis for Serbian language


Sentiment analysis of the Serbian language, with a stemmer for Serbian.

Published in: Technology

Transcript

  • 1. Sentiment analysis of sentences in the Serbian language
      Nikola Milošević
  • 2. Why analyze sentiment in Serbian?
      ● Great industrial need
          – Automated market research
          – Ads websites
          – Customer satisfaction
      ● NLP tools for Serbian are not developed
          – Need for tools and resources
          – Almost no tools accessible through an API
  • 3. Serbian language
      ● Belongs to the Indo-European language family
      ● Slavic language
      ● Highly inflectional
      ● 3 pronunciation types
      ● 3 dialect groups
      ● Write as you speak
      ● Latin and Cyrillic writing systems
  • 4. Sentiment analysis workflow
  • 5. Tokenization and preprocessing
      ● Process of breaking a stream of text up into words
      ● Stop-word filtering
      ● Negation handling
          – Adding NE_ prefix after negation
          – All words before punctuation
      ● Irregular verbs
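
The preprocessing steps on slide 5 can be sketched roughly as follows. This is a minimal illustration in Python, not the author's implementation (the slides mention PHP). The word lists `STOP_WORDS` and `NEGATIONS` are tiny hypothetical samples, and dropping the negation word itself after marking its scope is an assumption the slides do not confirm.

```python
import re

# Hypothetical sample lists for illustration; the real lists are not given in the slides.
STOP_WORDS = {"i", "je", "u", "na", "da", "se", "ali"}
NEGATIONS = {"ne", "nije", "nisam", "nema"}
PUNCTUATION = set(".,!?;:")

# A token is either a run of word characters or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[.,!?;:]")


def preprocess(text):
    """Tokenize, drop stop words, and prefix words in a negation scope with NE_."""
    result = []
    negating = False
    for tok in TOKEN_RE.findall(text.lower()):
        if tok in PUNCTUATION:
            negating = False  # punctuation closes the negation scope
            continue
        if tok in NEGATIONS:
            negating = True  # assumption: the negation word itself is dropped
            continue
        if tok in STOP_WORDS:
            continue
        result.append("NE_" + tok if negating else tok)
    return result


print(preprocess("Film nije dobar, ali je muzika odlična."))
# → ['film', 'NE_dobar', 'muzika', 'odlična']
```

Marking every word up to the next punctuation mark lets a bag-of-words classifier treat "dobar" and "NE_dobar" as different features, without needing any syntactic analysis.
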
  • 6. Stemming
      ● Process for reducing inflected words to their stem, base or root form
      ● Kešelj and Šipka (2008)
      ● Hand-crafted, rule-based stemmer
      ● ~300 rules
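
To make the idea of a rule-based stemmer concrete, here is a minimal suffix-stripping sketch in Python. The handful of suffixes in `RULES` and the `MIN_STEM` threshold are hypothetical and chosen only for illustration; they are not the ~300 rules of the stemmer described on the slide.

```python
# Hypothetical rules (suffix -> replacement); illustrative only, not the real rule set.
RULES = {"ovima": "", "ama": "", "ima": "", "om": "", "e": "", "a": "", "u": "", "i": ""}

# Assumption: never strip a word down to fewer than this many characters.
MIN_STEM = 3

# Try longer suffixes first so that "ama" is preferred over "a".
SUFFIXES = sorted(RULES, key=len, reverse=True)


def stem(word):
    """Strip the longest matching suffix, leaving at least MIN_STEM characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: len(word) - len(suffix)] + RULES[suffix]
    return word


for form in ["knjiga", "knjige", "knjigom", "knjigama", "na"]:
    print(form, "->", stem(form))
# knjiga -> knjig
# knjige -> knjig
# knjigom -> knjig
# knjigama -> knjig
# na -> na
```

A minimum stem length is one simple guard for very short words, which a pure suffix-stripping approach would otherwise mangle.
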
  • 7. Sentiment analysis
      ● Aim to build a binary sentiment analyzer
      ● General Serbian language
      ● No annotated corpus for Serbian
      ● Annotation work (~1000 short texts)
      ● Supervised machine learning
  • 8. Naive Bayes
      ● Algorithm that learns fast
      ● Bag-of-words approach
      ● Assumption of conditional independence
      ● Laplace smoothing
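
The ingredients listed on slide 8 fit together as in the sketch below: a multinomial Naive Bayes classifier over bag-of-words counts with Laplace (add-one) smoothing. It is a minimal Python illustration under stated assumptions, not the author's code. The class name `NaiveBayes`, its methods, and the four training examples are invented for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # alpha = 1 gives classic add-one (Laplace) smoothing
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()  # label -> number of training texts
        self.vocab = set()

    def train(self, tokens, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Log prior plus a sum of log likelihoods; summing is valid because
            # words are assumed conditionally independent given the label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = NaiveBayes()
nb.train(["dobar", "film"], "pos")
nb.train(["odlična", "muzika"], "pos")
nb.train(["NE_dobar", "film"], "neg")
nb.train(["loš", "film"], "neg")
print(nb.predict(["dobar"]))     # → pos
print(nb.predict(["NE_dobar"]))  # → neg
```

Smoothing matters most with a small training set: without it, a single word never seen with a label would drive that label's probability to zero.
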
  • 9. Implementation
      ● Web API with presentation layer
      ● JSON communication
      ● Secured page for annotation
      ● Built using PHP and MySQL
      ● Web & Android
  • 10. Results
      ● Stemmer
          – 90% correct on news articles
          – Smallest and most precise stemmer
          – Problems: small words, irregular inflections, voice changes
      ● Sentiment analyzer
          – 80% correct
          – Problems: irony, ambiguity, small training data
  • 11. Future work
      ● Stemmer
          – Use Snowball framework
          – Build multi-step stemmer
      ● Sentiment analyzer
          – POS tagging
          – Complex negation handling
          – SVM algorithm
  • 12. Thank you
      ● Available from http://inspiratron.org
      ● Contact: nikola.milosevic@postgrad.manchester.ac.uk
