Although many approaches and techniques have been developed, no study has reported whether they are effective when adapted to the Filipino language.
Methodology
Stages: Raw Data, POS Tagging, Subjectivity Tagging, Lexicon, Opinionated Keyword, Term Frequency & Presence Tagging, Classifier
Methodology: Raw Data
• 20 articles were collected from the Filipino editorial sections of 3 Philippine newspapers (Filipino).
• The collection of articles produced 425 sentences.
Methodology: POS Tagging
• We extracted the POS information of the sentences to determine the presence of adjectives and adverbs in a sentence.
• We used the TPOST tagger (Cheng and Rabo, 2006) to automatically tag our corpora.
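The adjective/adverb check described on this slide can be sketched as below. This is a minimal illustration, not the authors' code: it assumes the tagger output is already available as (word, tag) pairs, and the tag labels `ADJ` and `ADV` are placeholders, since the TPOST tagset is not listed in these slides.

```python
from typing import Dict, List, Tuple

# Placeholder tag labels (assumption); the real TPOST tagset may differ.
ADJECTIVE_TAGS = {"ADJ"}
ADVERB_TAGS = {"ADV"}


def pos_presence_features(tagged: List[Tuple[str, str]]) -> Dict[str, int]:
    """Return 1/0 flags: does the tagged sentence contain an adjective or an adverb?"""
    tags = {tag for _, tag in tagged}
    return {
        "has_adjective": int(bool(tags & ADJECTIVE_TAGS)),
        "has_adverb": int(bool(tags & ADVERB_TAGS)),
    }


# Hypothetical tagged sentence: "Maganda ang panahon" (The weather is beautiful).
example = [("Maganda", "ADJ"), ("ang", "DET"), ("panahon", "NOUN")]
print(pos_presence_features(example))
```

Because the flags depend entirely on the tagger output, a mis-tagged word changes the feature directly.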
Methodology: Subjectivity Tagging
• We crowd-sourced the subjectivity tags of each sentence in our training data.
• We set up a web-based system that gathered the tags from its respondents.
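The slides do not say how the collected votes were combined into one tag per sentence. One common choice is simple majority voting; the sketch below assumes that rule, and the function name and the decision to leave ties unresolved are illustrative only.

```python
from collections import Counter
from typing import List, Optional


def majority_label(votes: List[str]) -> Optional[str]:
    """Return the most-voted label, or None if there are no votes or the top labels tie."""
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave the sentence unresolved
    return ranked[0][0]


# Hypothetical votes from three respondents for one sentence.
print(majority_label(["subjective", "subjective", "objective"]))
```

With few votes per sentence, ties and narrow margins become more likely, so the resulting tags are less reliable.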
Methodology: Lexicon
• The subjectivity lexicon was generated using a dictionary-based approach.
• 3,541 entries were added to the subjectivity lexicon: 1,917 were tagged as strong subjective and the remaining 1,624 as weak subjective.
Methodology: Opinionated Keyword
• We handpicked 22 opinionated keywords that are commonly used terms in Filipino.
Methodology: Term Frequency & Presence Tagging
• We count the number of words from the subjectivity lexicon and the opinionated keywords that appear in the sentence.
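The counting step can be sketched as follows, producing both a frequency and a presence flag per word list. This is a rough illustration under stated assumptions: whitespace tokenization, lowercase matching, and the toy word sets are all hypothetical and are not taken from the authors' lexicon or keyword list.

```python
from typing import Dict, Iterable, Set


def term_features(tokens: Iterable[str], terms: Set[str], prefix: str) -> Dict[str, int]:
    """Count tokens found in `terms` (expected lowercase) and flag whether any occur."""
    count = sum(1 for tok in tokens if tok.lower() in terms)
    return {f"{prefix}_count": count, f"{prefix}_present": int(count > 0)}


# Toy entries for illustration only; not the authors' resources.
strong_lexicon = {"maganda", "masama"}
opinion_keywords = {"dapat", "sana"}

sentence = "Dapat maganda ang serbisyo".split()
features = {}
features.update(term_features(sentence, strong_lexicon, "lexicon"))
features.update(term_features(sentence, opinion_keywords, "keyword"))
print(features)
```

Keeping the lexicon and keyword counts under separate prefixes lets each feature set be switched on or off independently in the experiments.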
Methodology: Classifier
• Naïve Bayes, Bagging, Multilayer Perceptron (MP) and Random Forest Tree (RFT) classifiers were used to test the gathered features.
Experiment
Dataset: 425 entries (200 subjective, 225 objective)
Feature sets: subjectivity features; POS features + term presence; all features
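For reading the experiment results, it helps to know what a trivial classifier would score on this class split. The short calculation below assumes the reported percentages are accuracies, which the slides do not state explicitly.

```python
subjective = 200
objective = 225
total = subjective + objective

# Accuracy of a trivial classifier that always predicts the larger class.
majority_baseline = max(subjective, objective) / total
print(f"{total} sentences, majority-class baseline = {majority_baseline:.2%}")
# prints "425 sentences, majority-class baseline = 52.94%"
```

Under that assumption, any score near 53% is no better than always answering "objective".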
Experiment 1: Opinion Keyword
[Bar chart: percentage score (y-axis, 48% to 66%) per classifier (Naïve Bayes, Bagging, MP, RFT) for three feature sets: subjectivity features, POS features + term presence, all features]
Experiment 2: Subjectivity Lexicon using Strong Subjective Words
[Bar chart: percentage score (y-axis, 0% to 70%) per classifier (Naïve Bayes, Bagging, MP, RFT) for three feature sets: subjectivity features, POS features + term presence, all features]
Experiment 3: Subjectivity Lexicon using Strong & Weak Subjective Words
[Bar chart: percentage score (y-axis, 0% to 70%) per classifier (Naïve Bayes, Bagging, MP, RFT) for three feature sets: subjectivity features, POS features + term presence, all features]
Discussion
• The results of Experiments 1, 2 and 3 are roughly the same, and all the classifiers yielded average results.
• Some terms were noticeably mis-tagged by the POS tagger (e.g. ahensya (agency) was tagged as an adjective).
• Sentences may have been incorrectly tagged for subjectivity because of the small number of votes we collected.
• We used only a small dataset. These complications might have caused the presented results.
Summary and Further Works
• We were able to apply binary sentence-level subjectivity classification to the Filipino language using a feature-based approach.
• We also introduced a crowdsourcing-based approach that uses votes for subjectivity tagging.
• Confirming the results of this paper by manually tagging the POS information of each sentence and increasing the number of votes collected.
• Increasing the volume of the dataset.
• Increasing the number of lexical items in the lexicon, taking into consideration the unique features of the Filipino language, and highlighting the differences between Filipino and other languages (e.g. English).
References
• Liu, B. 2010. Sentiment analysis and subjectivity. In Handbook of Natural Language Processing, Second Edition.
• Wiebe, J., Bruce, R., and O’Hara, T. 1999. Development and use of a gold-standard data set for subjectivity classification. In Proceedings of the Association for Computational Linguistics (ACL), pp. 246-253.
• Yu, H. and Hatzivassiloglou, V. 2003. Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pp. 526-531.
• Wilson, T., Wiebe, J., and Hwa, R. 2004. Just how mad are you? Finding strong and weak opinion clauses. In Proceedings of AAAI, pp. 761-769.
• Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. 2009. The WEKA data mining software: an update. SIGKDD Explor. Newsl., 11:10-18.
• Cheng, C. and Rabo, V. 2006. TPOST: A Template-Based, n-gram Part-of-Speech Tagger for Tagalog. Journal of Research in Science, Computing and Engineering (JRSCE), Vol. 3, No. 1. DLSU-M.
• Dietterich, T. G. 2003. Machine Learning. In Nature Encyclopedia of Cognitive Science. London: Macmillan.
Feature-Based Subjectivity Classification of Filipino Text
Ralph Vincent Regalado
Charibeth Cheng
De La Salle University, Philippines