This document presents research on cross-domain sentiment analysis of the Romanian language. The author compiled a multi-domain Romanian corpus of 38,310 reviews and evaluated popular sentiment analysis models: decision trees, logistic regression, support vector machines, naive Bayes, recurrent neural networks, and BERT. BERT achieved the best performance with 93% validation accuracy, and most models outperformed approaches that rely on translated text. A robustness check with the Romanian-pretrained RoBERT model yielded comparable results. The research aims to advance sentiment analysis for the under-resourced Romanian language.
Cross-domain sentiment analysis of the natural Romanian language
1. Ștefana Cioban
Statistics-Forecasts-Mathematics Department, Faculty of Economics and
Business Administration & Interdisciplinary Centre for Data Science,
Babeș-Bolyai University, Cluj-Napoca, Romania
4. Related work
• Scarce literature on Romanian SA
• To benefit from the techniques developed for English, researchers prefer translation [1], [2]
• SA methodological applications:
  • Lexicons [4], [5], [6]
  • ML: SVM, NB, DT [7], [8], RNN [9], Transformers (Google's BERT) [10]
• Cross-domain SA [3]
5. Methodology
Multi-domain Romanian corpus from 38,310 reviews: LaRoSeDa [11] & a compilation of product and movie reviews [12]
English translation of the document | Label
this director must have been sick when he directed this film | 0
a piece of junk that doesn't have a proper wiring diagram | 0
it is a quality product, and the delivery of the order was made in a short time | 1
very satisfied with a small and powerful phone | 1
Statistic   Label   Word count
Count       38310   38310
Mean        0.5     434.52
Std         0.5     335.04
Min         0       2
25%         0       119
50%         0.5     368.5
75%         1       745
Max         1       6158
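The descriptive statistics above can be reproduced directly from the corpus; a minimal Python sketch, using the four example documents from the table as toy stand-ins for the 38,310 labelled reviews:

```python
from statistics import mean, stdev

# Toy stand-ins for the labelled reviews (0 = negative, 1 = positive)
reviews = [
    ("this director must have been sick when he directed this film", 0),
    ("a piece of junk that doesn't have a proper wiring diagram", 0),
    ("it is a quality product, and the delivery of the order was made in a short time", 1),
    ("very satisfied with a small and powerful phone", 1),
]

word_counts = [len(text.split()) for text, _ in reviews]
labels = [label for _, label in reviews]

print("count:", len(reviews))
print("label mean:", mean(labels))  # 0.5, i.e. perfectly balanced classes
print("word count mean:", mean(word_counts))
print("word count std:", stdev(word_counts))
print("word count min/max:", min(word_counts), max(word_counts))
```

On the full corpus the same computation yields the table's figures, including the 0.5 label mean that signals a balanced dataset.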
6. Methodology
• Text preprocessing
• Training and testing with the most popular models: DT, LR, SVM, NBC, RNN, Transformer (BERT)
• Evaluation: F1, precision, recall, loss, and accuracy stability
7. Findings

             Precision   Recall   F1-score   Support
0 (negative) 0.92        0.94     0.93       3834
1 (positive) 0.94        0.92     0.93       3828
Accuracy                          0.93       7662
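The reported metrics follow the standard definitions; a small sketch showing how precision, recall, and F1 derive from confusion-matrix counts (the counts here are illustrative, not the paper's):

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts: 90 true positives, 10 false positives, 6 false negatives
p, r, f = prf(tp=90, fp=10, fn=6)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```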
• BERT: best performance, with 0.1 loss after 5 epochs, 98% training accuracy, and 93% validation accuracy
• LR and SVC achieve competitive accuracy with fewer resources and faster training
• Most models outperform those trained on translated text [1], [2]
• A direction towards colloquial language in cross-disciplinary domains
9. Findings
• Robustness check:
  • Training and testing using RoBERT [13]
  • Same learning rate and batch size
  • Comparable results: ~93% validation accuracy
• Confirms the fitness of using variations of pretrained transformers for cross-domain Romanian SA
10. Conclusions
• Compilation of a free-speech dataset to serve machine learning applications and the validation of models for sentiment classification in Romanian
• Comparison between ML methods: Google's BERT as best performing
• Further research directions:
  • More annotated documents from other domains
  • Other vectorization techniques besides BOW
  • Comparison with English translations of the texts
11. References
[1] Marcu, D., Danubianu, M.: Sentiment Analysis from Students' Feedback: A Romanian High School Case Study. In: 15th International Conference on Development and Application Systems (DAS), pp. 204-209. IEEE, Suceava, Romania (2020).
[2] Russu, R.M., Vlad, O.L., Dinsoreanu, M., Potolea, R.: An Opinion Mining Approach for Romanian Language. In: 2014 IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), pp. 43-46. IEEE, Cluj-Napoca, Romania (2014).
[3] Deriu, J.M., Weilenmann, M., von Grunigen, D., Cieliebak, M.: Potential and Limitations of Cross-Domain Sentiment Classification. In: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pp. 14-24. Association for Computational Linguistics, Valencia, Spain (2017).
[4] Bobicev, V., Maxim, V., Prodan, T., Burciu, N., Anghelus, V.: Emotions in Words: Developing a Multilingual WordNet-Affect. In: Gelbukh, A. (ed.) 11th International Conference on Intelligent Text Processing and Computational Linguistics, vol. 6008, pp. 375-384. Springer, Iasi, Romania (2010).
[5] Lupea, M., Briciu, A.: Studying emotions in Romanian words using Formal Concept Analysis. Computer Speech and Language 57, 128-145 (2019).
[6] Gifu, D., Cioca, M.: Detecting Emotions in Comments on Forums. International Journal of Computers Communications & Control 9(6), 694-702 (2014).
12. References
[7] Sun, S.L., Luo, C., Chen, J.Y.: A review of natural language processing techniques for opinion mining systems. Information Fusion 36, 10-25 (2017).
[8] Nassirtoussi, A.K., Aghabozorgi, S., Teh, Y.W., Ngo, D.C.L.: Text mining for market prediction: A systematic review. Expert Systems with Applications 41(16), 7653-7670 (2014).
[9] Schuszter, I.C.: Integrating Deep Learning for NLP in Romanian Psychology. In: 2018 20th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2018), pp. 237-244. IEEE, Timisoara, Romania (2018).
[10] Google AI Blog: Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing, last accessed 2021/04/14 (2018).
[11] Tache, A.M., Gaman, M., Ionescu, R.T.: Clustering Word Embeddings with Self-Organizing Maps. Application on LaRoSeDa – A Large Romanian Sentiment Data Set. arXiv preprint arXiv:2101.04197 (2021).
[12] Katakonst: Sentiment Analysis with Tensorflow, https://github.com/katakonst/sentiment-analysis-tensorflow, last accessed 2021/04/11.
[13] Masala, M., Ruseti, S., Dascalu, M.: RoBERT – A Romanian BERT Model. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 6626-6637. Barcelona, Spain (2020).