[IALP 2012] Feature-Based Subjectivity Classification of Filipino Text

Oral presentation at the 2012 International Conference on Asian Language Processing (IALP 2012), Hanoi, Vietnam.

  1. Feature-Based Subjectivity Classification of Filipino Text. Ralph Vincent J. Regalado and Charibeth K. Cheng, De La Salle University, Philippines
  2. TEXT
  3. fact [fakt] / opinion [uh-pin-yuhn]
  4. SENTIMENT ANALYSIS
  5. machine learning based [muh-sheen lur-ning beys] / lexicon based [lek-si-kon beys]
  6. Sentiment polarity classification: negative, neutral, positive
  7. subjectivity analysis [suhb-jek-tiv-i-tee uh-nal-uh-sis]
  8. Although many approaches and techniques have been developed, no reports have been made on whether these approaches are effective when adapted to the Filipino language.
  9. Methodology: Raw Data, POS Tagging, Subjectivity Tagging, Lexicon, Opinionated Keyword, Term Frequency & Presence Tagging, Classifier
  10. Methodology: Raw Data
      • 20 articles were collected from the Filipino editorial sections of 3 Philippine newspapers.
      • The collection of articles produced 425 sentences.
  11. Methodology: POS Tagging
      • We extracted the POS information of the sentences to determine the presence of adjectives and adverbs in each sentence.
      • We used the TPOST tagger (Cheng and Rabo, 2006) to automatically tag our corpora.
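A minimal sketch of turning a POS-tagged sentence into adjective/adverb features, assuming the tagger output is a list of (word, tag) pairs. The tag prefixes `JJ` and `RB`, the feature names, and the example tags are hypothetical; TPOST's actual tagset is not given in the slides.

```python
from typing import Iterable

# Hypothetical tag prefixes: TPOST's real tagset is not given in the slides.
ADJECTIVE_TAGS = ("JJ",)
ADVERB_TAGS = ("RB",)


def pos_features(tagged: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Count adjectives and adverbs in a POS-tagged sentence and flag their presence."""
    adj = adv = 0
    for _word, tag in tagged:
        if tag.startswith(ADJECTIVE_TAGS):
            adj += 1
        elif tag.startswith(ADVERB_TAGS):
            adv += 1
    return {
        "adj_count": adj,
        "adv_count": adv,
        "has_adj": int(adj > 0),
        "has_adv": int(adv > 0),
    }


# Usage with hand-assigned (hypothetical) tags:
print(pos_features([("talagang", "RB"), ("maganda", "JJ"), ("ang", "DT"), ("lungsod", "NN")]))
# {'adj_count': 1, 'adv_count': 1, 'has_adj': 1, 'has_adv': 1}
```

Because the features depend entirely on the tags, a tagging error (such as a noun tagged as an adjective) flows directly into `has_adj`.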
  12. Methodology: Subjectivity Tagging
      • We crowdsourced the subjectivity tag of each sentence in our training data.
      • We set up a web-based system that gathered the tags from its respondents.
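The slides do not say how the crowdsourced votes were combined into one tag per sentence. A simple majority vote is one plausible scheme; the sketch below assumes it, and the `min_votes` threshold and the rule of leaving ties unlabeled are hypothetical choices.

```python
from collections import Counter
from typing import Optional


def aggregate_votes(votes: list[str], min_votes: int = 3) -> Optional[str]:
    """Majority label for one sentence, or None when there are too few votes
    or the top two labels are tied. min_votes=3 is a hypothetical threshold."""
    if len(votes) < min_votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no clear majority
    return ranked[0][0]


print(aggregate_votes(["subjective", "subjective", "objective"]))  # subjective
print(aggregate_votes(["subjective", "objective"]))                # None (too few votes)
```

With few votes per sentence, a single respondent can flip the majority, which is one way a small vote count can produce incorrect tags.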
  13. Methodology: Lexicon
      • The subjectivity lexicon was generated using a dictionary-based approach.
      • 3,541 entries were added to the subjectivity lexicon: 1,917 were tagged as strong subjective and the remaining 1,624 as weak subjective.
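A minimal sketch of a lexicon that stores a strength label per entry and yields either the strong-only word set (as in Experiment 2) or the strong-plus-weak set (as in Experiment 3). The four entries and their strength labels are hypothetical illustrations, not taken from the actual lexicon.

```python
# Hypothetical sample entries; the actual 3,541-entry lexicon is not reproduced here.
LEXICON = {
    "maganda": "strong",  # beautiful
    "pangit": "strong",   # ugly
    "siguro": "weak",     # maybe
    "medyo": "weak",      # somewhat
}


def select_lexicon(lexicon: dict[str, str], include_weak: bool) -> set[str]:
    """Strong entries only (Experiment 2) or strong + weak entries (Experiment 3)."""
    allowed = {"strong", "weak"} if include_weak else {"strong"}
    return {word for word, strength in lexicon.items() if strength in allowed}


print(sorted(select_lexicon(LEXICON, include_weak=False)))  # ['maganda', 'pangit']
print(len(select_lexicon(LEXICON, include_weak=True)))      # 4
```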
  14. Methodology: Opinionated Keyword
      • We handpicked 22 opinionated keywords that are commonly used in Filipino.
  15. Methodology: Term Frequency & Presence Tagging
      • For each sentence, we counted the words from the subjectivity lexicon and from the opinionated keywords that appear in it.
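A minimal sketch of term-frequency and presence features, assuming whitespace/punctuation tokenization and single-word entries. The lexicon subset, the two stand-in keywords (`dapat`, `sana`), and the feature names are hypothetical; the actual 22 keywords are not listed in the slides.

```python
import re

# Hypothetical inputs: a lexicon subset and stand-ins for the 22 keywords.
LEXICON_WORDS = {"maganda", "pangit", "siguro"}
KEYWORDS = {"dapat", "sana"}


def term_features(sentence: str, lexicon: set[str], keywords: set[str]) -> dict[str, int]:
    """Term frequency and presence of lexicon words and opinionated keywords."""
    tokens = re.findall(r"\w+", sentence.lower())  # single-word tokens only
    lex_freq = sum(1 for t in tokens if t in lexicon)
    kw_freq = sum(1 for t in tokens if t in keywords)
    return {
        "lex_freq": lex_freq,
        "lex_present": int(lex_freq > 0),
        "kw_freq": kw_freq,
        "kw_present": int(kw_freq > 0),
    }


print(term_features("Dapat maganda at maganda ang serbisyo.", LEXICON_WORDS, KEYWORDS))
# {'lex_freq': 2, 'lex_present': 1, 'kw_freq': 1, 'kw_present': 1}
```

Frequency counts repeated words; presence only records whether at least one match occurs.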
  16. Methodology: Classifier
      • Naïve Bayes, Bagging, Multilayer Perceptron, and Random Forest Tree classifiers were used to test the gathered features.
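The reference list cites WEKA (Hall et al., 2009), which provides all four classifiers. As a self-contained illustration of the first one, the sketch below is a hand-written Bernoulli Naive Bayes over binary presence features with add-one smoothing. It is not WEKA's implementation, and the toy data are hypothetical.

```python
import math


class BernoulliNB:
    """Naive Bayes over binary (presence) features with Laplace smoothing."""

    def fit(self, X: list[list[int]], y: list[str]) -> "BernoulliNB":
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior: dict[str, float] = {}
        self.p_one: dict[str, list[float]] = {}  # P(feature j = 1 | class)
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            self.p_one[c] = [
                (sum(r[j] for r in rows) + 1) / (len(rows) + 2)  # add-one smoothing
                for j in range(n_features)
            ]
        return self

    def predict(self, x: list[int]) -> str:
        def log_score(c: str) -> float:
            s = self.log_prior[c]
            for xj, pj in zip(x, self.p_one[c]):
                s += math.log(pj if xj else 1.0 - pj)
            return s
        return max(self.classes, key=log_score)


# Toy data (hypothetical): [lex_present, kw_present]
X = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 0], [0, 1]]
y = ["subjective"] * 3 + ["objective"] * 3
model = BernoulliNB().fit(X, y)
print(model.predict([1, 0]))  # subjective
print(model.predict([0, 1]))  # objective
```

Scores are summed in log space to avoid underflow when there are many features.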
  17. Dataset: 425 entries (200 subjective, 225 objective)
      • Experiment feature sets: subjectivity features; POS features + term presence; all features.
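For reading the experiment charts, a useful reference point is the majority-class baseline: a classifier that always predicts the larger class. With 200 and 225 sentences in the two classes, it would be correct on 225 of the 425 sentences (this holds whichever label the 225 sentences carry):

```latex
\text{baseline} = \frac{\max(200,\,225)}{425} = \frac{225}{425} = \frac{9}{17} \approx 52.9\%
```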
  18. Experiment 1: Opinion Keyword. [Chart: results (%) of Naïve Bayes, Bagging, MP (Multilayer Perceptron), and RFT (Random Forest Tree) for three feature sets: subjectivity features, POS features + term presence, all features.]
  19. Experiment 2: Subjectivity Lexicon using Strong Subjective Words. [Chart: results (%) of the same classifiers and feature sets as in Experiment 1.]
  20. Experiment 3: Subjectivity Lexicon using Strong & Weak Subjective Words. [Chart: results (%) of the same classifiers and feature sets as in Experiment 1.]
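The three experiments compare the same three feature sets. The slides do not list which individual features make up each set, so the grouping below is a hypothetical illustration: each named set is a list of feature names, and "all features" is their union. Every sentence's feature dictionary is then projected onto the chosen set before classification.

```python
# Hypothetical mapping from the three feature-set names on the slides to
# illustrative feature names; the slides do not list the exact features.
SUBJECTIVITY = ["lex_freq", "kw_freq"]
POS_PLUS_PRESENCE = ["has_adj", "has_adv", "lex_present", "kw_present"]
FEATURE_SETS = {
    "subjectivity features": SUBJECTIVITY,
    "pos features + term presence": POS_PLUS_PRESENCE,
    "all features": sorted(set(SUBJECTIVITY) | set(POS_PLUS_PRESENCE)),
}


def to_vector(features: dict[str, int], feature_set: str) -> list[int]:
    """Project one sentence's feature dict onto a named feature set (missing -> 0)."""
    return [features.get(name, 0) for name in FEATURE_SETS[feature_set]]


sentence = {"lex_freq": 2, "lex_present": 1, "kw_freq": 0, "kw_present": 0, "has_adj": 1, "has_adv": 0}
for name in FEATURE_SETS:
    print(name, to_vector(sentence, name))
```

Keeping the feature order fixed per set ensures every sentence maps to vectors with the same column meaning.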
  21. Discussion
      • The results of Experiments 1, 2, and 3 are roughly the same, and all the classifiers yielded average results.
      • Some terms are incorrectly POS-tagged (e.g., ahensya (agency) is tagged as an adjective).
      • Some sentences may have been incorrectly subjectivity-tagged because of the small number of votes we collected.
  22. Discussion
      • We used only a small dataset; these complications might have caused the presented results.
  23. Summary and Further Works
      • We were able to apply sentence-level (binary) subjectivity classification to the Filipino language using a feature-based approach.
      • We also introduced a crowdsourcing-based approach that uses votes for subjectivity tagging.
      • Further work: confirming the results of this paper by manually tagging the POS information of each sentence and increasing the number of votes collected.
  24. Summary and Further Works
      • Increasing the volume of the dataset.
      • Increasing the number of lexical items in the lexicon, taking into consideration the unique features of the Filipino language, and highlighting how Filipino differs from other languages such as English.
  25. References
      • Liu, B. 2010. Sentiment analysis and subjectivity. In Handbook of Natural Language Processing, Second Edition.
      • Wiebe, J., Bruce, R., and O'Hara, T. 1999. Development and use of a gold-standard data set for subjectivity classification. In Proceedings of the Association for Computational Linguistics (ACL), pp. 246-253.
      • Yu, H. and Hatzivassiloglou, V. 2003. Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pp. 526-531.
      • Wilson, T., Wiebe, J., and Hwa, R. 2004. Just how mad are you? Finding strong and weak opinion clauses. In Proceedings of AAAI, pp. 761-769.
      • Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. 2009. The WEKA data mining software: an update. SIGKDD Explorations Newsletter, Vol. 11, pp. 10-18, November 2009.
      • Cheng, C. and Rabo, V. 2006. TPOST: A template-based, n-gram part-of-speech tagger for Tagalog. Journal of Research in Science, Computing and Engineering (JRSCE), Vol. 3, No. 1. DLSU-M, March 2006.
      • Dietterich, T. G. 2003. Machine learning. In Nature Encyclopedia of Cognitive Science. London: Macmillan.