2011 - Current Research


  1. Semi-supervised Learning of Multi-dimensional Class Bayesian Network Classifiers
     Jonathan Ortigosa-Hernández, January 28th, 2011
     Sections: Outline, MD Classification, MDSSL, Application to SA, Conclusions, Current Work
  2. Outline
     • Multi-dimensional Supervised Classification
     • Multi-dimensional Semi-supervised Learning
     • Application to Sentiment Analysis
     • Conclusions
     • Current Topics of Research
  3. Supervised Classification
     It consists of building a classifier Ψ from a given labelled training dataset D by using an induction algorithm A, with A(D) = Ψ:

         X_1         X_2         ...   X_n         C
         x_1^{(1)}   x_2^{(1)}   ...   x_n^{(1)}   c^{(1)}
         x_1^{(2)}   x_2^{(2)}   ...   x_n^{(2)}   c^{(2)}
         ...         ...         ...   ...         ...
         x_1^{(N)}   x_2^{(N)}   ...   x_n^{(N)}   c^{(N)}

     The classifier is then used to predict the value of a class variable C for any new unlabelled instance x, with Ψ(x) = c.
  4. Uni-dimensional and Multi-dimensional Classification
     • Uni-dimensional classification tries to predict a single class variable based on a dataset composed of a set of labelled examples. (Uni-dimensional class) Bayesian network classifiers (Larrañaga et al, 2005).
     • Multi-dimensional classification is the generalisation of the single-class classification task to the simultaneous prediction of a set of class variables. Multi-dimensional class Bayesian network classifiers (v.d. Gaag and d. Waal, 2006).
     • Do not confuse it with multi-class and multi-label classification.
  5. Multi-dimensional Supervised Learning
     A typical supervised training dataset:

         X_1         X_2         ...   X_n         C_1         C_2         ...   C_m
         x_1^{(1)}   x_2^{(1)}   ...   x_n^{(1)}   c_1^{(1)}   c_2^{(1)}   ...   c_m^{(1)}
         x_1^{(2)}   x_2^{(2)}   ...   x_n^{(2)}   c_1^{(2)}   c_2^{(2)}   ...   c_m^{(2)}
         ...         ...         ...   ...         ...         ...         ...   ...
         x_1^{(N)}   x_2^{(N)}   ...   x_n^{(N)}   c_1^{(N)}   c_2^{(N)}   ...   c_m^{(N)}

     Each instance of the dataset contains both the values of the attributes and m labels which characterise the attributes.
  6. Bayesian Network Classifiers
     [Figure: A (uni-dimensional) naive Bayes structure, with class node C and feature nodes X1 to X6.]
  7. Multi-dimensional Class Bayesian Network Classifiers (MDBNC)
     [Figure: A multi-dimensional naive Bayes structure, with class nodes C1 to C3 and feature nodes X1 to X6.]
  8. MDBNC Structure
     [Figure: An MDBNC structure and its division: (a) complete graph, (b) feature selection subgraph, (c) class subgraph, (d) feature subgraph. Class nodes C1 to C3, feature nodes X1 to X5.]
  9. Sub-families of MDBNC
     [Figure: Different sub-families of MDBNC: (a) multi-dimensional naive Bayes, (b) multi-dimensional tree-augmented network, (c) multi-dimensional J/K dependence Bayesian classifier (J/K = 2/3).]
  10. Sub-families of MDBNC - MDnB
     Multi-dimensional naive Bayes (MDnB):
     • The class and feature subgraphs are empty.
     • Each class variable is a parent of all the features.
  11. Sub-families of MDBNC - MDnB
     Multi-dimensional naive Bayes (MDnB):
     • It has a fixed structure, so it requires no structural learning (v.d. Gaag and d. Waal, 2006).
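The MDnB structure described on the two slides above (empty class and feature subgraphs, every class variable a parent of every feature) implies the factorisation below. It is a sketch derived only from that structural description; $m$ and $n$ follow the dataset notation used earlier in the deck, and the second line is the usual most-probable-configuration prediction rule, stated here as an assumption about how the classifier is queried.

```latex
\begin{align}
p(c_1,\dots,c_m,x_1,\dots,x_n)
  &= \prod_{j=1}^{m} p(c_j)\;\prod_{i=1}^{n} p(x_i \mid c_1,\dots,c_m) \\
(\hat{c}_1,\dots,\hat{c}_m)
  &= \arg\max_{c_1,\dots,c_m}\; \prod_{j=1}^{m} p(c_j)\;\prod_{i=1}^{n} p(x_i \mid c_1,\dots,c_m)
\end{align}
```

Because every feature is conditioned on all $m$ class variables, each conditional table grows with the product of the class cardinalities, which is one reason the sparser MDTAN and MD J/K sub-families are of interest.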
  12. Sub-families of MDBNC - MDTAN
     Multi-dimensional tree-augmented network classifier (MDTAN):
     • The class and feature subgraphs are trees.
  13. Sub-families of MDBNC - MDTAN
     Multi-dimensional tree-augmented network classifier (MDTAN):
     • A wrapper structural learning algorithm is proposed in (v.d. Gaag and d. Waal, 2006).
     • [NEW] We have recently proposed a filter approach to learn MDTAN structures.
  14. Sub-families of MDBNC - MD J/K
     Multi-dimensional J/K dependence Bayesian classifier (MD J/K):
     • The class subgraph is a J-dependence graph.
     • The feature subgraph is a K-dependence graph.
  15. Sub-families of MDBNC - MD J/K
     Multi-dimensional J/K dependence Bayesian classifier (MD J/K):
     • There was no specific structural learning algorithm for this sub-family, so we proposed one in (Ortigosa-Hernández et al, 2010).
  16. A supervised method to learn an MD J/K structure (Ortigosa-Hernández et al, 2010)
     Step 0 - Initialisation
     Establish the maximum number of parents in both the class and feature subgraphs, here J = 2 and K = 2.
     [Figure: graph over class nodes C1 to C4 and feature nodes X1 to X8.]
  17. A supervised method to learn an MD J/K structure (Ortigosa-Hernández et al, 2010)
     Step 1 - Learn the structure between the class variables ($A_C$)
     Calculate the mutual information $MI(C_i, C_j)$ for each pair of class variables.
     [Figure: graph over class nodes C1 to C4 and feature nodes X1 to X8.]
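The mutual information used in Step 1 can be estimated from co-occurrence counts. The sketch below is a minimal illustration, assuming discrete class variables stored as equal-length Python lists; the function name `mutual_information` is hypothetical and not taken from the slides.

```python
from collections import Counter
from math import log


def mutual_information(a, b):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(a)
    joint = Counter(zip(a, b))   # counts of each (value of a, value of b) pair
    count_a = Counter(a)         # marginal counts of a
    count_b = Counter(b)         # marginal counts of b
    mi = 0.0
    for (u, v), n_uv in joint.items():
        # p(u, v) * log( p(u, v) / (p(u) * p(v)) ), written with raw counts
        mi += (n_uv / n) * log(n_uv * n / (count_a[u] * count_b[v]))
    return mi


mi_identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])    # fully dependent
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])  # independent
```

The slides do not say which independence test turns these values into p-values. One common choice, given here only as an assumption, is the G-test: the statistic 2·N·MI (with MI in nats) is asymptotically chi-squared distributed with (|C_i| - 1)(|C_j| - 1) degrees of freedom.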
  18. A supervised method to learn an MD J/K structure (Ortigosa-Hernández et al, 2010)
     Step 1 - Learn the structure between the class variables ($A_C$)
     Calculate the p-values (significance of each mutual information) using an independence test.

              C1     C2     C3
         C4   0.36   0.57   0.01
         C3   0.27   0.63
         C2   0.06

     [Figure: graph over class nodes C1 to C4 and feature nodes X1 to X8.]
  19. A supervised method to learn an MD J/K structure (Ortigosa-Hernández et al, 2010)
     Step 1 - Learn the structure between the class variables ($A_C$)
     Remove the p-values greater than the threshold α = 0.1.

              C1     C2     C3
         C4   0.36   0.57   0.01
         C3   0.27   0.63
         C2   0.06

     [Figure: graph over class nodes C1 to C4 and feature nodes X1 to X8.]
  20. A supervised method to learn an MD J/K structure (Ortigosa-Hernández et al, 2010)
     Step 1 - Learn the structure between the class variables ($A_C$)
     Starting from the lowest value, add arcs to the graph subject to two conditions: no cycles, and no more than J parents per class variable.

              C1     C2     C3
         C4   x      x      0.01
         C3   x      x
         C2   0.06

     [Figure: graph over class nodes C1 to C4 and feature nodes X1 to X8.]
  21. A supervised method to learn an MD J/K structure (Ortigosa-Hernández et al, 2010)
     Step 1 - Learn the structure between the class variables ($A_C$)
     Starting from the lowest value, add arcs to the graph subject to two conditions: no cycles, and no more than J parents per class variable.

              C1     C2     C3
         C4   x      x      x
         C3   x      x
         C2   0.06

     [Figure: graph over class nodes C1 to C4 and feature nodes X1 to X8.]
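The greedy arc selection of Step 1 can be sketched as follows, using the p-values from the slides. This is an illustration rather than the authors' implementation: the function name `learn_class_subgraph` is hypothetical, and the arc orientation (the first variable of a pair is tried as parent first) is an assumption, since the slides do not state how arcs are oriented.

```python
def learn_class_subgraph(p_values, max_parents, alpha=0.1):
    """Greedily add arcs between class variables, most significant first.

    p_values maps a pair (a, b) to the p-value of its independence test.
    Returns a dict mapping each variable to its list of parents.
    """
    parents = {}
    for pair in p_values:
        for node in pair:
            parents.setdefault(node, [])

    def reaches(start, target):
        # True if a directed path start -> ... -> target already exists.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(ch for ch, ps in parents.items() if node in ps)
        return False

    # Keep only pairs at or below the threshold, lowest p-value first.
    candidates = sorted((p, pair) for pair, p in p_values.items() if p <= alpha)
    for _, (a, b) in candidates:
        # Assumed orientation: try a -> b, then b -> a.
        for parent, child in ((a, b), (b, a)):
            within_limit = len(parents[child]) < max_parents
            if within_limit and not reaches(child, parent):
                parents[child].append(parent)
                break
    return parents


# p-values from the slides (lower-triangular table, one entry per pair).
p_values = {
    ("C1", "C2"): 0.06, ("C1", "C3"): 0.27, ("C1", "C4"): 0.36,
    ("C2", "C3"): 0.63, ("C2", "C4"): 0.57, ("C3", "C4"): 0.01,
}
class_parents = learn_class_subgraph(p_values, max_parents=2)
```

With α = 0.1 only the pairs C3-C4 (0.01) and C1-C2 (0.06) survive, and they are added in that order, matching the sequence shown on the slides.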
  22. A supervised method to learn an MD J/K structure (Ortigosa-Hernández et al, 2010)
     Step 2 - Learn the structure between the class variables and the features ($A_{CF}$)
     Calculate the mutual information $MI(C_i, X_j)$ for each pair $C_i$ and $X_j$.
     [Figure: graph over class nodes C1 to C4 and feature nodes X1 to X8.]
  23. A supervised method to learn an MD J/K structure (Ortigosa-Hernández et al, 2010)
     Step 2 - Learn the structure between the class variables and the features ($A_{CF}$)
     Calculate the p-value of each mutual information.

              C1     C2     C3     C4
         X1   0.64   0.00   0.77   0.98
         X2   0.82   0.03   0.11   0.37
         X3   0.00   0.06   0.00   0.01
         X4   0.68   0.09   0.00   0.55
         X5   0.81   0.12   0.81   0.65
         X6   0.57   0.24   0.00   0.00
         X7   0.25   0.26   0.00   0.00
         X8   0.32   0.15   0.00   0.44

     [Figure: graph over class nodes C1 to C4 and feature nodes X1 to X8.]
  24. A supervised method to learn an MD J/K structure (Ortigosa-Hernández et al, 2010)
     Step 2 - Learn the structure between the class variables and the features ($A_{CF}$)
     Remove the p-values greater than α = 0.1.

              C1     C2     C3     C4
         X1   0.64   0.00   0.77   0.98
         X2   0.82   0.03   0.11   0.37
         X3   0.00   0.06   0.00   0.01
         X4   0.68   0.09   0.00   0.55
         X5   0.81   0.12   0.81   0.65
         X6   0.57   0.24   0.00   0.00
         X7   0.25   0.26   0.00   0.00
         X8   0.32   0.15   0.00   0.44

     [Figure: graph over class nodes C1 to C4 and feature nodes X1 to X8.]
  25. A supervised method to learn an MD J/K structure (Ortigosa-Hernández et al, 2010)
     Step 2 - Learn the structure between the class variables and the features ($A_{CF}$)
     Add all the remaining arcs to the structure.

              C1     C2     C3     C4
         X1   x      0.00   x      x
         X2   x      0.03   x      x
         X3   0.00   0.06   0.00   0.01
         X4   x      0.09   0.00   x
         X5   x      x      x      x
         X6   x      x      0.00   0.00
         X7   x      x      0.00   0.00
         X8   x      x      0.00   x

     [Figure: graph over class nodes C1 to C4 and feature nodes X1 to X8.]
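Step 2 amounts to a simple threshold filter over the p-value table. The sketch below reproduces it with the values from the slides; the names `select_bridge_arcs`, `P_VALUES` and `unlinked` are hypothetical and chosen only for this illustration.

```python
ALPHA = 0.1
CLASSES = ["C1", "C2", "C3", "C4"]

# p-values of MI(C_i, X_j) as shown on the slides, one row per feature.
P_VALUES = {
    "X1": [0.64, 0.00, 0.77, 0.98],
    "X2": [0.82, 0.03, 0.11, 0.37],
    "X3": [0.00, 0.06, 0.00, 0.01],
    "X4": [0.68, 0.09, 0.00, 0.55],
    "X5": [0.81, 0.12, 0.81, 0.65],
    "X6": [0.57, 0.24, 0.00, 0.00],
    "X7": [0.25, 0.26, 0.00, 0.00],
    "X8": [0.32, 0.15, 0.00, 0.44],
}


def select_bridge_arcs(p_values, classes, alpha):
    """Return the (class, feature) arcs whose p-value does not exceed alpha."""
    return [
        (c, x)
        for x, row in p_values.items()
        for c, p in zip(classes, row)
        if p <= alpha
    ]


arcs = select_bridge_arcs(P_VALUES, CLASSES, ALPHA)
linked = {feature for _, feature in arcs}
# Features left without any class parent.
unlinked = [x for x in P_VALUES if x not in linked]
```

In this example 13 arcs survive and X5 ends up with no class parent, so the threshold also acts as an implicit feature filter.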
  26. A supervised method to learn an MD J/K structure (Ortigosa-Hernández et al, 2010)
     Step 3 - Learn the structure between the features ($A_F$)
     • Calculate the conditional mutual information $MI(X_i, X_j \mid \mathrm{Pa}_C(X_j))$.
     • Calculate the p-values.
     • Remove the p-values greater than the threshold α = 0.1.
     [Figure: graph over class nodes C1 to C4 and feature nodes X1 to X8.]
  27. A supervised method to learn an MD J/K structure (Ortigosa-Hernández et al, 2010)
     Step 3 - Learn the structure between the features ($A_F$)
     Add arcs between the features subject to two conditions: no cycles between the features, and no more than K parents per feature.
     [Figure: graph over class nodes C1 to C4 and feature nodes X1 to X8.]
  28. A supervised method to learn an MDTAN structure (MDTANfi)
     It is similar to the method to learn MD J/K structures, but trees are learnt in $A_C$ and $A_F$ by means of a maximum spanning tree algorithm.
  29. Major Problem of Supervised Learning
     • In many real-world problems, obtaining data is relatively easy, while labelling is difficult, expensive or labour intensive (it is usually done by an external mechanism, e.g. human beings).
     • This problem is accentuated when using multiple target variables.
     DESIRE: Learning algorithms able to combine a large amount of unlabelled data with a small amount of labelled data when learning competitive classifiers.
  30. Multi-dimensional Semi-supervised Learning
     A typical semi-supervised training dataset:

         X_1           X_2           ...   X_n           C_1         C_2         ...   C_m
         x_1^{(1)}     x_2^{(1)}     ...   x_n^{(1)}     c_1^{(1)}   c_2^{(1)}   ...   c_m^{(1)}
         x_1^{(2)}     x_2^{(2)}     ...   x_n^{(2)}     c_1^{(2)}   c_2^{(2)}   ...   c_m^{(2)}
         ...           ...           ...   ...           ...         ...         ...   ...
         x_1^{(L)}     x_2^{(L)}     ...   x_n^{(L)}     c_1^{(L)}   c_2^{(L)}   ...   c_m^{(L)}
         x_1^{(L+1)}   x_2^{(L+1)}   ...   x_n^{(L+1)}   ?           ?           ...   ?
         x_1^{(L+2)}   x_2^{(L+2)}   ...   x_n^{(L+2)}   ?           ?           ...   ?
         ...           ...           ...   ...           ...         ...         ...   ...
         x_1^{(N)}     x_2^{(N)}     ...   x_n^{(N)}     ?           ?           ...   ?

     Semi-supervised learning fulfils this desire.
  31. The Expectation-Maximisation Algorithm
     The EM algorithm (Dempster et al, 1977):
     • Learn an initial model.
     • Repeat until convergence:
       (a) Expectation step: Using the current model, estimate the missing values of the data.
       (b) Maximisation step: Using the whole data and the previous estimations, learn a new current model.
     Any MDBNC learning algorithm can be used as the model in this algorithm.
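The loop on the EM slide can be sketched as below. Two simplifications are assumed and are not taken from the slides: the model is a plain categorical naive Bayes (`fit_nb`, a hypothetical stand-in for any MDBNC learner), and the expectation step imputes the single most probable label ("hard" EM) instead of keeping posterior probabilities. A label here can be any hashable value, including a tuple of class values.

```python
from collections import Counter, defaultdict
from math import log


def fit_nb(rows, labels, smoothing=1.0):
    """Fit a categorical naive Bayes and return a predict(row) function."""
    classes = sorted(set(labels))
    prior = Counter(labels)
    counts = defaultdict(Counter)            # (label, feature index) -> value counts
    values = [set() for _ in rows[0]]        # distinct values seen per feature
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(y, i)][v] += 1
            values[i].add(v)

    def predict(row):
        def score(y):
            s = log(prior[y] / len(labels))
            for i, v in enumerate(row):
                # Laplace-smoothed estimate of p(x_i = v | y)
                num = counts[(y, i)][v] + smoothing
                den = prior[y] + smoothing * len(values[i])
                s += log(num / den)
            return s
        return max(classes, key=score)

    return predict


def semi_supervised_em(labelled, unlabelled, max_iter=20):
    """Hard-EM loop: impute labels (E step), then refit on all data (M step)."""
    rows_l = [x for x, _ in labelled]
    y_l = [y for _, y in labelled]
    predict = fit_nb(rows_l, y_l)                         # initial model
    imputed = None
    for _ in range(max_iter):
        new_imputed = [predict(x) for x in unlabelled]    # E step
        if new_imputed == imputed:                        # labels stable: converged
            break
        imputed = new_imputed
        predict = fit_nb(rows_l + unlabelled, y_l + imputed)   # M step
    return predict, imputed


labelled = [((0, 0, 0), "neg"), ((1, 1, 1), "pos")]
unlabelled = [(0, 0, 1), (0, 1, 0), (1, 1, 0), (1, 0, 1), (0, 0, 0), (1, 1, 1)]
predict, imputed = semi_supervised_em(labelled, unlabelled)
```

Swapping `fit_nb` for a structure-learning routine gives the general scheme the slide describes, in which the model is relearnt from the completed data at every maximisation step.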
  32. Artificial Experimentation
     Study the behaviour of the proposed algorithms along several axes of variability:
     1. Complexity of the problem (generative structure) ✓
     2. Number of variables (features and class variables)
     3. Balance of the labels in the generative structure (values of the hyperparameters for the Dirichlet)
     4. Size of the labelled sample
     5. Ratio of labelled to unlabelled data
  33. Artificial Experimentation
     A preliminary experimentation on the complexity of the problem can be found at:
     http://www.sc.ehu.es/ccwbayes/members/jonathan/home/News_and_Notables/Entries/2010/11/30_IMACS_2011.html
     [Figure: Accuracy ranking (ranks 9 to 1) for different algorithms on 20 artificial datasets, α = 0.05. Algorithms shown: nB, TAN, 2DB, 3DB, MDnB, MDTAN, MD 1/1, MD 2/2, MD 2/3.]
  34. Application to Sentiment Analysis
     Sentiment Analysis (also known as Opinion Mining) is the computational study of opinions, sentiments and emotions expressed in text (Liu, 2010).
     When treating Sentiment Analysis as a classification problem, several different (but related) problems appear. For example:
     1. Subjectivity Classification. Its aim is to classify a text as subjective or objective.
     2. Sentiment Classification. It classifies an opinionated text as expressing a positive, neutral, or negative opinion.
  35. Motivation for Using Semi-supervised Learning of Multi-dimensional Classifiers
     1. Up to now, these subproblems have been studied in isolation despite being closely related, so multi-dimensional classifiers would probably be helpful.
     2. Obtaining enough labelled examples for a classifier may be costly and time consuming. This motivates us to deal with unlabelled examples in a semi-supervised framework.
  36. Hypotheses Formulated
     First Hypothesis: The explicit use of the relationships between different class variables can be beneficial to improve their recognition rates.
     Second Hypothesis: Multi-dimensional techniques can work with unlabelled data in order to improve the classification rates in Sentiment Analysis.
  37. Properties of the Dataset
     Collected by Socialware Company S.A., from the ASOMO service of mobilised opinion analysis.
     It consists of 2,542 Spanish reviews extracted from a blog:
     • 150 documents have been labelled in isolation by an expert.
     • 2,392 posts are left unlabelled.
     [Figure: The ASOMO corpus.]
  38. Properties of Each Document I
     Each document is represented as 14 features:
     • Obtained by using an open source morphological analyser (Carreras et al, 2006).
     • Each feature provides different information related to part-of-speech (PoS).

         #   Feature                 Description                                                          Example
         1   First Persons           Number of verbs in the first person.                                 Contraté ... (I hired ...)
         2   Second Persons          Number of verbs in the second person.                                Tienes ... (You have ...)
         3   Third Persons           Number of verbs in the third person.                                 Sabe ... (He/she knows ...)
         4   Relational Forms        Number of phatic expressions, i.e. expressions whose only            (1) Hola. (Hello.)
                                     function is to perform a social task.                                (2) Gracias de antemano. (Thanks in advance.)
         5   Agreement Expressions   Number of expressions that show agreement or disagreement.           (1) Estoy de acuerdo contigo. (I agree with you.)
                                                                                                          (2) No tienes razón. (You are not right.)
         6   Request                 Number of sentences that express a certain degree of request.        (1) Me gustaría saber ... (I would like to know ...)
                                                                                                          (2) Alguien podría ... (Could someone ...)

     Table: Subset of features related to the involvement of the author with other customers.
  39. Properties of Each Document II
     Each document has 3 class variables:
     • Will to Influence: {declarative sentence, soft WI, medium WI, strong WI}
     • Sentiment: {very negative, negative, neutral, positive, very positive}
     • Subjectivity: {Yes (subjective), No (objective)}
  40. Experiment 1 - Set Up I
     First Hypothesis: The explicit use of the relationships between different class variables can be beneficial to improve their recognition rates.
     The ASOMO corpus has been used to learn:
     • 3 (uni-dimensional) naive Bayes classifiers, one per class variable.
     • A (uni-dimensional) naive Bayes classifier with a compound class variable.
     • A multi-dimensional naive Bayes classifier.
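The compound class variable used by the second classifier in this set-up can be illustrated by enumerating every joint configuration of the three class variables listed above. The sketch below only counts states; the names `CLASS_VALUES`, `compound_states`, `n_compound` and `n_separate` are hypothetical.

```python
from itertools import product

# Values of the three ASOMO class variables, as listed on the slides.
CLASS_VALUES = {
    "will_to_influence": ["declarative sentence", "soft WI", "medium WI", "strong WI"],
    "sentiment": ["very negative", "negative", "neutral", "positive", "very positive"],
    "subjectivity": ["yes", "no"],
}

# A compound class variable has one state per joint configuration.
compound_states = list(product(*CLASS_VALUES.values()))
n_compound = len(compound_states)                          # 4 * 5 * 2
n_separate = sum(len(v) for v in CLASS_VALUES.values())    # 4 + 5 + 2
```

The compound variable has 40 states, so with 150 labelled documents there are fewer than four labelled examples per state on average, whereas three separate classifiers only need to distinguish 11 values in total but ignore any relationship between the class variables.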
  41. Experiment 1 - Set Up II
     First Hypothesis: The explicit use of the relationships between different class variables can be beneficial to improve their recognition rates.
     • Features from the ASOMO dataset are discretised into 3 values using equal frequency.
     • In addition to the ASOMO feature set, three state-of-the-art feature sets are used:
       - Unigrams
       - Unigrams + Bigrams
       - PoS tagging
     • Results are averaged over 20 × 5-fold cross-validation.
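Equal-frequency discretisation into 3 values, as mentioned in this set-up, places cut points so that each bin receives roughly the same number of training values. The sketch below is one simple way to do it; the function names and the toy feature counts are assumptions, not details from the slides.

```python
def equal_frequency_cut_points(values, n_bins=3):
    """Cut points that split the sorted values into n_bins groups of similar size."""
    ordered = sorted(values)
    n = len(ordered)
    # The k-th cut point is the value found k/n_bins of the way through the data.
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def discretise(value, cut_points):
    """Bin index = number of cut points the value reaches or exceeds."""
    return sum(value >= cp for cp in cut_points)


# Toy counts for one feature across nine documents (illustrative values only).
counts = [0, 1, 1, 2, 3, 5, 8, 13, 21]
cuts = equal_frequency_cut_points(counts)
bins = [discretise(v, cuts) for v in counts]
```

Count features with many repeated values (for example mostly zeros) can produce coinciding cut points and therefore empty bins, which is worth checking before training.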
  42. Experiment 1 - JOINT Accuracies
     [Figure: JOINT accuracies on the ASOMO corpus using three different feature sets in both uni-dimensional and multi-dimensional scenarios (20 × 5-fold cross-validation).]
  43. Experiment 2 - Set Up
     Second Hypothesis: Multi-dimensional techniques can work with unlabelled data in order to improve the classification rates in Sentiment Analysis.
     The ASOMO dataset has been used to learn:
     • 3 (uni-dimensional) Bayesian network classifiers: nB, TAN and 2DB.
     • 5 MDBNC: MDnB, MDTAN, MD 2/2, MD 2/3 and MD 2/4.
     • In both supervised and semi-supervised (EM algorithm) learning frameworks.
     Features from the ASOMO dataset are discretised into 3 values using equal frequency.
     Results are averaged over 20 × 5-fold cross-validation.
  44. Experiment 2 - JOINT Accuracy
     [Figure: JOINT accuracies on the ASOMO dataset in the supervised and semi-supervised learning frameworks.]
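The slides report JOINT accuracies without defining the measure. The sketch below assumes the usual multi-dimensional definition, in which an instance counts as correct only when all of its class variables are predicted correctly, and contrasts it with per-variable accuracy. The function names and the toy predictions are illustrative assumptions.

```python
def per_class_accuracy(y_true, y_pred):
    """Accuracy of each class variable taken separately."""
    n = len(y_true)
    m = len(y_true[0])
    return [
        sum(t[j] == p[j] for t, p in zip(y_true, y_pred)) / n
        for j in range(m)
    ]


def joint_accuracy(y_true, y_pred):
    """Fraction of instances whose class variables are all predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels: (Will to Influence, Sentiment, Subjectivity) per document.
y_true = [
    ("soft WI", "negative", "yes"),
    ("strong WI", "positive", "yes"),
    ("declarative sentence", "neutral", "no"),
    ("medium WI", "negative", "yes"),
]
y_pred = [
    ("soft WI", "negative", "yes"),
    ("strong WI", "negative", "yes"),
    ("declarative sentence", "neutral", "no"),
    ("soft WI", "negative", "yes"),
]
separate = per_class_accuracy(y_true, y_pred)
joint = joint_accuracy(y_true, y_pred)
```

Under this definition the joint accuracy can never exceed the lowest per-variable accuracy, which makes it the stricter of the two measures.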
  45. Experiment 2 - Will to Influence
     [Figure: Accuracies for the Will to Influence class variable on the ASOMO dataset in the supervised and semi-supervised learning frameworks.]
  46. Experiment 2 - Sentiment Polarity
     [Figure: Accuracies for the Sentiment Polarity class variable on the ASOMO dataset in the supervised and semi-supervised learning frameworks.]
  47. Experiment 2 - Subjectivity
     [Figure: Accuracies for the Subjectivity class variable on the ASOMO dataset in the supervised and semi-supervised learning frameworks.]
  48. Conclusions I - Methodology
     Multi-dimensional classification and semi-supervised learning are two different branches of machine learning.
     With this research, we have established a bridge between them, showing that:
     • Uni-dimensional approaches cannot capture the real nature of multi-dimensional problems.
     • More accurate classifiers can be found using the multi-dimensional learning approaches.
     • The use of large amounts of unlabelled data can be beneficial to improve recognition rates.
  49. Conclusions II - Application
     With respect to the Sentiment Analysis application, we have proposed a novel perspective to solve the problem.
     Experimental results demonstrate that the use of multi-dimensional classification, as well as the use of unlabelled data, can lead to more accurate classifiers.
  50. Feature Selection. Title: Semi-supervised Feature Selection in Multi-dimensional Problems. Description: Develop a methodology able to identify irrelevant and redundant features in multi-dimensional problems, for dimension reduction in a semi-supervised framework. Motivation: Feature selection tries to avoid problems related to overfitting, computational burden, etc. Up to now, there is no feature selection technique able to deal with multiple class variables. Little work has been done on semi-supervised feature selection (Cai et al, 2011).
  51. Semi-supervised Feature Selection. [Diagram: features X1, ..., Xn and class variables C1, ..., Cm.] Supervised feature selection: Feature relevance is usually evaluated by the features' correlation with the class label. The labelled sample is generally too small and insufficient for this purpose. Unsupervised feature selection: Features are evaluated by their capability of keeping certain properties of the data, e.g. variance or separability. Ignoring label information can degrade performance.
  52. Semi-supervised Feature Selection via Spectral Analysis I. Based on the cluster assumption - Unsupervised feature selection (Zhao et al, 2007). [Figure: two candidate solutions.] Unsupervised perspective: both solutions are OK. Supervised point of view: one solution is better than the other.
  53. Semi-supervised Feature Selection via Spectral Analysis II. Proposal: Use the cluster assumption to identify the relevant features, but give more weight to the features which clearly separate the labels. Drawback: This algorithm does not detect redundant features.
  54. Semi-supervised Feature Selection via BASSUM. Calculate the Markov blanket of a class variable by using G² conditional independence tests with both labelled and unlabelled data. It detects redundant features (Cai et al, 2011). The Markov blanket of a class variable A is the set of all parents, children and spouses of A in the Bayesian network.
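The Markov blanket definition above (parents, children and spouses) can be read directly off a known network structure. The sketch below assumes the Bayesian network is given as a dictionary mapping each node to the set of its parents; the function name and the toy graph are hypothetical. It illustrates the definition only. BASSUM itself has to estimate the blanket from data with independence tests, which this snippet does not do.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], node: str) -> Set[str]:
    """Parents, children and spouses (other parents of the children) of `node`."""
    # Children: every node that lists `node` among its parents.
    children = {v for v, ps in parents.items() if node in ps}
    # Spouses: the other parents of those children.
    spouses = {p for c in children for p in parents[c]} - {node}
    return set(parents.get(node, set())) | children | spouses


# Hypothetical DAG: P -> A, A -> X, S -> X, S -> Z
dag = {
    "P": set(),
    "S": set(),
    "A": {"P"},
    "X": {"A", "S"},
    "Z": {"S"},
}

print(sorted(markov_blanket(dag, "A")))
```

Here `Z` is left out of the blanket of `A`: it is a child of the spouse `S`, not a parent, child or spouse of `A`.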
  55. BASSUM Example. F3 is the class variable. F2 is in the Markov blanket S(F3), i.e. S(F3) = {F2}. We are checking whether F1 is also in S(F3), so we want to determine whether F1 ⊥ F3 | S(F3), i.e. F1 ⊥ F3 | F2. G² conditional independence test: $G^2 = 2 \sum_{i,j,k} c_{ijk} \ln \frac{c_{ijk}\, c_{++k}}{c_{i+k}\, c_{+jk}} \sim \chi^2$, with $df = (|F_1| - 1)(|F_2| - 1)\,|F_3|$. Definitions: $c_{ijk}$ ≡ number of instances that satisfy $F_1 = f_{1i}$, $F_2 = f_{2j}$, $F_3 = f_{3k}$. Marginal sums: $c_{+jk} = \sum_i c_{ijk}$, $c_{i+k} = \sum_j c_{ijk}$, $c_{ij+} = \sum_k c_{ijk}$, $c_{++k} = \sum_{i,j} c_{ijk}$. As written, the statistic conditions on the variable indexed by k; to test F1 ⊥ F3 | F2, the roles of j and k are exchanged. [Figure labels: Labelled data, Unlabelled data.]
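The G² statistic and its degrees of freedom can be computed from raw counts in a few lines. The sketch below is an illustration under stated assumptions, not the BASSUM implementation: the function `g2_test(x, y, z)` is a hypothetical helper that tests X ⊥ Y | Z for discrete variables, skips empty cells, and takes the cardinalities from the values observed in the sample. For the example on this slide it would be called as `g2_test(f1, f3, f2)`.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def g2_test(x: Sequence[Hashable], y: Sequence[Hashable],
            z: Sequence[Hashable]) -> Tuple[float, int]:
    """Return (G2, degrees of freedom) for the test X _|_ Y | Z."""
    c_xyz = Counter(zip(x, y, z))   # c_ijk
    c_xz = Counter(zip(x, z))       # c_i+k
    c_yz = Counter(zip(y, z))       # c_+jk
    c_z = Counter(z)                # c_++k

    g2 = 0.0
    for (xi, yj, zk), c in c_xyz.items():  # only non-empty cells contribute
        g2 += c * math.log(c * c_z[zk] / (c_xz[(xi, zk)] * c_yz[(yj, zk)]))
    g2 *= 2.0

    # Cardinalities are estimated from the observed values (an assumption).
    df = (len(set(x)) - 1) * (len(set(y)) - 1) * len(set(z))
    return g2, df


# Toy data: within each value of z, x and y are exactly independent.
x_ind = [0, 0, 1, 1, 0, 0, 1, 1]
y_ind = [0, 1, 0, 1, 0, 1, 0, 1]
z_ind = [0, 0, 0, 0, 1, 1, 1, 1]
print(g2_test(x_ind, y_ind, z_ind))

# Toy data: x and y are identical, so they are strongly dependent given z.
x_dep = [0, 0, 1, 1]
y_dep = [0, 0, 1, 1]
z_dep = [0, 0, 0, 0]
print(g2_test(x_dep, y_dep, z_dep))
```

The returned statistic is compared with a χ² distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.chi2.sf(g2, df)` gives the corresponding p-value.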
  56. Important Ideas. Redundancy and Relevancy. Spectral Analysis → A definition of the Markov blankets in multi-dimensional Bayesian networks is needed. [Figure: two network diagrams over features X1-X6, with class nodes C* and C1, C2, C3.] BASSUM approach → Modify a classical feature selection technique (Saeys et al, 2007) to be able to deal with multi-dimensional problems in a semi-supervised framework.
  57. Affect Analysis. Title: Application to Affect Analysis. Description: Use the methodology proposed in this presentation to deal with the problem of Affect Analysis. Collaboration: Socialware S.A. Motivation (Abbasi et al., 2008): Affect Analysis is concerned with the analysis of text containing emotions, and it tries to extract a large number of potential emotions, e.g. happiness, sadness, anger, hate, violence, excitement, etc.
  58. Plutchik's Affect Model. We want to take a step forward in this problem by taking advantage of the ability of the MDBNC to model complex relationships between the class variables. Four class variables with three possible values, {−1, 0, 1}: LOVE-REMORSE (Acceptance-Disgust), CONTEMPT-SUBMISSION (Anticipation-Surprise), AGGRESSIVENESS-AWE (Anger-Fear), OPTIMISM-DISAPPROVAL (Joy-Sadness).
  59. Questions. THANKS! jonathan.ortigosa@ehu.es