CEDI2010 - Article



A Semi-supervised Approach to Multi-dimensional Classification with Application to Sentiment Analysis

Jonathan Ortigosa-Hernández¹, Juan Diego Rodríguez¹, Leandro Alzate², Iñaki Inza¹, and José A. Lozano¹
¹ Intelligent Systems Group, Department of Computer Science and Artificial Intelligence, The University of the Basque Country, San Sebastián, Spain
² Socialware Company, Bilbao, Spain
{jonathan.ortigosa,juandiego.rodriguez,inaki.inza,ja.lozano}@ehu.es, leandro.alzate@asomo.net

Abstract

A classical supervised classification task tries to predict a single class variable based on a dataset composed of a set of labelled examples. However, in many real domains more than one variable could be considered as a class variable, so a generalisation of the single-class classification problem to the simultaneous prediction of a set of class variables should be developed. This is referred to as multi-dimensional supervised classification.

In addition, when performing classification tasks one can be concerned about the fact that, in some practical circumstances, obtaining enough labelled examples for a classifier may be costly and time consuming. Thus, it is very desirable to have learning algorithms that are able to incorporate a large number of unlabelled data with a small number of labelled data when learning classifiers. This is referred to as semi-supervised learning.

In this paper, we integrate multi-dimensional classification and semi-supervised learning by means of the multi-dimensional Bayesian network classifiers and the EM Algorithm. Results are presented on a real Sentiment Analysis dataset demonstrating that this integration can be beneficial to improve the recognition rates.

1 Introduction

Supervised classification [2] is one of the most important tasks in the pattern recognition field and it is widely used to deal with many real-life problems known as classification problems. In these tasks, a training set of instances is available, and each instance is described by a set of features and a unique known class variable. With the help of a training set, the classification process attempts to construct a description of the class variable, which in turn helps to classify a new unlabelled instance.

However, many application domains naturally consider more than one class variable. For instance, a text document or a semantic scene can be assigned to multiple topics, a gene can have multiple biological functions or a patient may suffer from multiple diseases. Problems of this kind are known as multi-dimensional classification problems [23].

Because they only consider a unique class variable, classical supervised classification approaches cannot be straightforwardly applied to the multi-dimensional scenario. Several attempts have been made to adapt single-class classifiers to multi-dimensional classifiers, but none of them exactly captures the problem characteristics. The simplest approach is to divide the multi-dimensional problem into several one-class problems (one for each class variable) and tackle them as if they were independent. However, this approach does not capture the real characteristics of the problem, because it does not explicitly model the correlations between the different class variables and hence, it does not take advantage of the information that they may provide [19]. So, within this context, multi-dimensional supervised classification [1][18][19][23], which is an approach that tries to take advantage of this correlation, appears in order to not only capture the underlying nature of the multi-dimensional problems, but also to use the relationships between the class variables to improve the recognition rates.

In addition, we are also concerned about the fact that several papers in the recent past have addressed the multi-dimensional classification task by building classifiers that rely exclusively on labelled examples [1]. However, in some practical circumstances, obtaining enough labelled examples for a classifier may be costly and time consuming, and this problem is accentuated when using a large number of target variables. Thus, the scarcity of labelled data also motivates us to deal with unlabelled examples in a semi-supervised framework when working with the multi-dimensional problem described above.

Motivated by the previous comments on the state-of-the-art of multi-dimensional classification, in this paper, the following proposals are presented: (i) the extension of multi-dimensional classification to the semi-supervised learning framework by proposing a set of semi-supervised algorithms which make use of the EM algorithm [6][15], (ii) a supervised filter learning algorithm for Multi-dimensional J/K dependences Bayesian classifiers [19], and (iii) the demonstration of the fact that using both multi-dimensional classification and semi-supervised learning can be beneficial to improve recognition rates in a real application of Sentiment Analysis.

The rest of the paper is organised as follows. Section 2 defines both the multi-dimensional supervised classification paradigm and the multi-dimensional class Bayesian network classifiers. A group of algorithms to learn different types of multi-dimensional Bayesian classifiers in a supervised framework is introduced in Section 3. Section 4 extends the supervised algorithms presented in the previous section to the semi-supervised framework. Section 5 reviews the work related to Sentiment Analysis and shows the experimental results of applying the proposed semi-supervised classification algorithms to a real database of opinion analysis. Finally, Section 6 sums up the paper with some conclusions.

2 Multi-dimensional Classification

A typical supervised classification problem consists of building a classifier from a labelled training dataset $D = \{(x^{(1)}, c^{(1)}), \ldots, (x^{(N)}, c^{(N)})\}$ in order to predict the value of a class variable $C$ given a set of features $X = (X_1, \ldots, X_n)$ of an unseen unlabelled instance $x = (x_1, \ldots, x_n)$. A generalisation of this problem to the joint prediction of several class variables has recently been proposed in the research community [1][18][19][23]. This generalisation is known as multi-dimensional supervised classification. Our purpose is to predict the value of each class variable in the class variable vector $C = (C_1, \ldots, C_m)$ given the feature vector of an unseen unlabelled instance.

2.1 Multi-dimensional Class Bayesian Network Classifiers

In order to deal with these multi-dimensional problems, we propose the use of multi-dimensional class Bayesian network classifiers (MDBNC) [18][23], which are a recent generalisation of the classical Bayesian network classifiers [13] to deal with multiple class variables.

MDBNC represent the underlying joint probability of the data by making use of directed acyclic graphs (DAG) over the class variables and over the feature variables separately, and then, by connecting the two sets of variables by means of a bi-partite directed graph. So, the DAG structure $S = (V, A)$ has the set $V$ of random variables partitioned into the set $V_C = \{C_1, \ldots, C_m\}$, $m > 1$, of class variables and the set $V_F = \{X_1, \ldots, X_n\}$, $n \geq 1$, of features. Moreover, the set of arcs $A$ can be partitioned into three sets, $A_{CF}$, $A_C$ and $A_F$, with the following properties:

  • $A_{CF} \subseteq V_C \times V_F$ is composed of the arcs between the class variables and the feature variables, so we can define the feature selection subgraph of $S$ as $S_{CF} = (V, A_{CF})$. This subgraph represents the selection of features that seems relevant for classification given the class variables.

  • $A_C \subseteq V_C \times V_C$ is composed of the arcs between the class variables, so we can define the class subgraph of $S$ induced by $V_C$ as $S_C = (V_C, A_C)$.

  • $A_F \subseteq V_F \times V_F$ is composed of the arcs between the feature variables, so we can define the feature subgraph of $S$ induced by $V_F$ as $S_F = (V_F, A_F)$.

[Figure 1: A multi-dimensional Bayesian classifier and its division [18].]

Figure 1 shows a multi-dimensional class Bayesian network classifier with 3 class variables and 5 features, and its partition into the three subgraphs.

Depending on the structure of the three subgraphs, several sub-families³ of MDBNC have been proposed in the state-of-the-art literature. These are the Multi-dimensional naive Bayes classifier (MDnB), the Multi-dimensional tree-augmented classifier (MDTAN) and the Multi-dimensional J/K dependences Bayesian classifier (MD J/K).

In addition to its structure, a MDBNC (as happens in the classical Bayesian networks) also consists of a set of parameters $\Theta$ that codify the local probability distributions.

³ In [23] and [18], the term "fully" is used instead of "multi-dimensional" to name the classifiers.

3 Supervised Learning of MDBNC

Each previously introduced sub-family has different restrictions in its structure, so a different structure learning algorithm is needed for each one. In this section, we define the MDnB, MDTAN and MD J/K sub-families and provide a structure learning algorithm for each sub-family.

3.1 MDnB

[Figure 2: An example of a MDnB structure (class variables C1 to C3, features X1 to X10).]

In the MDnB [23] (see Figure 2), both the class subgraph and the feature subgraph are empty, and the feature selection subgraph is complete. As happens in uni-dimensional naive Bayes, this classifier assumes conditional independence between each pair of features given the entire subset of class variables. When this classifier is provided with a training dataset with a determined number of class variables and features, it has a fixed structure and so, there is no need for structure learning. Therefore, learning a MDnB classifier consists of just estimating the parameters $\Theta$ of the fixed structure by using a training dataset $D$.

3.2 MDTAN

The MDTAN (see Figure 3), as proposed in [23], is the generalisation of the uni-dimensional TAN classifier [9]. In this multi-dimensional classifier, both the class subgraph and the feature subgraph are directed trees.

[Figure 3: An example of a MDTAN structure (class variables C1 and C2, features X1 to X5).]

A structure learning algorithm that learns a MDTAN from a given dataset is proposed in [23]. This algorithm follows a wrapper approach by performing a local search over the $A_{CF}$ structure. Its aim is to produce the MDTAN structure that maximises the accuracy on a given dataset. In order to obtain a MDTAN structure in each iteration, it generates a set of different $A_{CF}$ from a current $A_{CF}$ and uses mutual information to create the class subgraph and feature subgraph by building two maximum spanning trees. The algorithm stops when generating a new set of structures yields no improvement.

3.3 MD J/K

The MD J/K (see Figure 4) appears in [19] and it is the multi-dimensional generalisation of the well-known K-DB [20]. It allows each class variable $C_i$ to have a maximum of $J$ dependences with other class variables, and each predictive variable $X_i$ to have, apart from the class variables, a maximum of $K$ dependences with other predictive variables.

[Figure 4: An example of a MD 2/3 structure (class variables C1 to C4, features X1 to X10).]

As happens in the uni-dimensional K-DB version proposed by Sahami [20], this structure is also able to move through the spectrum of allowable dependence in the multi-dimensional framework, from the MDnB to the full multi-dimensional Bayesian classifier. Note that setting $J = K = 0$ we can learn a MDnB, setting $J = K = 1$ a MDTAN is learned and so on. The full multi-dimensional Bayesian classifier, which is the classifier that has the three subgraphs complete, can be learnt by setting $J = (m - 1)$ and $K = (n - 1)$, where $m$ and $n$ are the number of class variables and predictive features respectively.

Although the MD J/K structure has been proposed in the state-of-the-art literature, to the best of our knowledge, a specific MD J/K learning algorithm has not been defined by the research community. For that reason, we propose, in this section, a filter algorithm in a supervised learning framework able to learn this type of structure (see Algorithm 1), i.e. the learning algorithm uses mutual information to measure the dependency between the variables, instead of performing a search for the structure that maximises the accuracy.

In this algorithm, we do not directly use the mutual information as measured in the previous MDTAN learning algorithm [23]. This is due to the fact that the mutual information is not normalised when the cardinalities of the variables are different, so we use an independence test to determine if a dependence between two variables is strong enough to be part of the model.
Algorithm 1 MD J/K structure learning algorithm using a filter approach

  1: Learn the $A_C$ structure
       1. Calculate the α-value (significance of the mutual information) using the independence test for each pair of class variables, and rank them.
       2. Remove the α-values lower than the threshold $s_\alpha = 0.90$.
       3. Use the ranking to add arcs between the class variables fulfilling the conditions of no cycles between the class variables and no more than J parents per class.

  2: Learn the $A_{CF}$ structure
       1. Calculate the α-value (significance of the mutual information) using the independence test for each pair $C_i$ and $X_j$, and rank them.
       2. Remove the α-values lower than the threshold $s_\alpha = 0.90$.
       3. Use the ranking to add arcs from the class variables to the features.

  3: Learn the $A_F$ structure
       1. Calculate the α-value (significance of the conditional mutual information) using the conditional independence test for each pair $X_i$ and $X_j$ given $Pa_c(X_j)$, and rank them.
       2. Remove the α-values lower than the threshold $s_\alpha = 0.90$.
       3. Use the ranking to add arcs between the features fulfilling the conditions of no cycles between the features and no more than K parents per feature.

The $A_C$ and $A_{CF}$ structures are constructed as follows: we use the mutual information between each pair of random variables to create an independence test based on [12] (see pp. 155–158). This test is used to calculate the significance of each relationship. In the case of $A_C$ each pair of class variables $(C_i, C_j)$ is tested, while in the case of $A_{CF}$ each pair of a class variable and a feature $(C_i, X_j)$ is tested. If the significance of the relationship between the two variables surpasses a determined threshold, then an arc is included between the random variables. Note that, in the case of the $A_C$ structure, we have to consider some restrictions when adding a new arc. These restrictions are the allowable maximum number of parents and the absence of cycles in the structure.

In order to construct the $A_F$ structure, we use the conditional mutual information between two random variables for the independence test [12] (see pp. 166–167). In this case, we perform the test on each pair of features $X_i$ and $X_j$ given the class parents of $X_j$, $Pa_c(X_j)$. After that, the modus operandi of the algorithm is the same as in the case of the $A_C$ structure.

Before the end of this section, we want to make some comments about two design decisions we have made:

  • In order to make the algorithm more robust [20], we have decided to introduce the threshold $s_\alpha$ in steps 1.2, 2.2 and 3.2 of Algorithm 1. This decision allows more flexibility by not forcing the inclusion of dependencies that do not appear to exist when the values of $K$ and $J$ are set too high.

  • From the previous paragraphs and step 3 of Algorithm 1, one can easily deduce that the class parents of a feature do not count for the restriction of maximum $K$ parents for each predictive feature. As in the case of the uni-dimensional KDB, the class parents can either count for the restriction or be left out of it.

4 Semi-supervised Learning in Multi-dimensional Classification

In the semi-supervised learning framework [4][26], the dataset $D$ is divided into two parts: the instances $D_L = \{(x^{(1)}, c^{(1)}), \ldots, (x^{(L)}, c^{(L)})\}$ for which labels are provided, and the instances $D_U = \{(x^{(L+1)}, ?), \ldots, (x^{(N)}, ?)\}$, where the labels are not known. Therefore, we have a dataset of $N$ instances, where there are $L$ labelled examples and $(N - L)$ unlabelled examples. The aim of semi-supervised learning is to build more accurate classifiers using both labelled and unlabelled data, rather than using exclusively labelled examples as happens in supervised learning.

In order to deal with this kind of problem, the EM algorithm [6][15] has been widely used in the semi-supervised learning framework [5][16]. The aim of the EM algorithm as typically used in semi-supervised learning [16] is to find the parameters of the model that maximise the likelihood of the data, using both labelled and unlabelled instances. The iterative process works as follows: in the i-th iteration the algorithm alternates between completing the unlabelled instances by using the parameters $\Theta^{(i)}$ (E-step) and updating the parameters of the model to $\Theta^{(i+1)}$ by calculating the maximum likelihood estimator (MLE) with the whole dataset (M-step), i.e. the labelled data and the unlabelled instances that have been previously classified in the E-step. Note that the structure remains fixed in the whole iterative process.

Although good results have been achieved with the EM algorithm in uni-dimensional classification [5][16], we are concerned about the restriction of maximising just the parameters of a fixed structure in the multi-dimensional framework. Due to the fact that several class variables have to be simultaneously predicted, the structures of the MDBNC tend to be more complex than the structures of the uni-dimensional Bayesian network classifiers. For that reason, it seems more appropriate to perform a structural search. Thus, we make several changes to the EM algorithm in order to avoid fixing the structure of the model during the iterative process. The proposal is shown in Algorithm 2.

Algorithm 2 Our version of the EM Algorithm

  Input: A training dataset with both labelled and unlabelled data and an initial model $\psi^{(i=0)}$ with a fixed structure and with an initial set of parameters $\Theta^{(i=0)}$.
  1: while the model $\psi^{(i)}$ does not converge do
  2:   E-STEP: Use the current model $\psi^{(i)}$ to estimate the probability of each configuration of class variables for each unlabelled instance.
  3:   M-STEP: Learn a new model $\psi^{(i+1)}$ with structure and parameters, given the estimated probabilities in the E-STEP.
  4: end while
  Output: A classifier $\psi$ that takes an unlabelled instance and predicts the class variables.

In this version of the EM algorithm we want to find the model, both structure and parameters, that maximises the likelihood of the whole data. So, in this version, the i-th iteration of the algorithm is performed as follows: it alternates between completing the unlabelled instances by means of the previously learnt model $\psi^{(i)}$ (E-step) and learning a new model $\psi^{(i+1)}$ by using a learning algorithm with the whole dataset, both labelled and completed instances (M-step). In the semi-supervised learning research community, the input parameter $\psi^{(i=0)}$ of the EM Algorithm is usually learnt from the labelled subset $D_L$. Hence, we keep this modus operandi in this version of the algorithm. Note that our version of the EM algorithm is closer to the Bayesian structural EM algorithm proposed in [8] than to the original [6].

Using Algorithm 2, all the supervised learning approaches proposed in the previous section can be straightforwardly used. The learning algorithm is used in the M-STEP, where it learns a model using the labelled data and the unlabelled data that have been previously labelled in the E-STEP. So, by applying our EM Algorithm we have extended the MDBNC to the semi-supervised learning framework, generating a new set of semi-supervised multi-dimensional classifiers.
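The loop of Algorithm 2 can be sketched in a few lines of Python. This is only an illustration under several simplifying assumptions that are not taken from the paper: the E-step uses a hard completion (the most probable class configuration) instead of keeping the full probability of each configuration, convergence is declared when the completions stop changing rather than by monitoring the likelihood, and the M-step learner `learn_joint_nb` is a hypothetical toy that treats each observed class configuration as one compound label with Laplace-smoothed estimates. Any supervised learner from the previous section could in principle be plugged in through the `learn` argument.

```python
import math
from collections import Counter, defaultdict


def learn_joint_nb(instances):
    """Toy learner (illustrative assumption): naive Bayes over joint class
    configurations with Laplace-smoothed frequency estimates."""
    config_counts = Counter(c for _, c in instances)
    feat_counts = defaultdict(Counter)  # (config, feature index) -> value counts
    values = defaultdict(set)           # feature index -> observed values
    for x, c in instances:
        for j, v in enumerate(x):
            feat_counts[(c, j)][v] += 1
            values[j].add(v)
    n = len(instances)

    def predict(x):
        def log_score(c):
            s = math.log((config_counts[c] + 1) / (n + len(config_counts)))
            for j, v in enumerate(x):
                num = feat_counts[(c, j)][v] + 1
                den = config_counts[c] + len(values[j])
                s += math.log(num / den)
            return s
        # sorted() makes tie-breaking deterministic
        return max(sorted(config_counts), key=log_score)

    return predict


def semi_supervised_em(learn, labelled, unlabelled, max_iter=250):
    """Hard-assignment sketch of the EM loop: the model is relearnt from
    scratch in every M-step, so a structure learner may change the structure."""
    model = learn(labelled)  # initial model learnt from the labelled subset
    completed = None
    for _ in range(max_iter):
        # E-step (hard): complete each unlabelled instance with the current model
        new_completed = [(x, model(x)) for x in unlabelled]
        if new_completed == completed:  # completions are stable: stop
            break
        completed = new_completed
        # M-step: learn a new model from labelled plus completed instances
        model = learn(labelled + completed)
    return model


# Toy usage with two binary features and two class variables per instance.
labelled = [((0, 0), ("pos", "subj")), ((1, 1), ("neg", "obj"))]
unlabelled = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 1)]
classifier = semi_supervised_em(learn_joint_nb, labelled, unlabelled)
prediction = classifier((0, 0))
```

Passing the learner as a function mirrors the point made above: the M-step is just a call to a supervised learning algorithm on the completed dataset, so the same loop serves every sub-family of classifier.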
5 Sentiment Analysis

In this section, we review several works proposed for approaching Sentiment Analysis classification. After that, our proposed algorithms are applied to a real dataset obtained from the company Socialware, one of the most important companies of mobilised opinion analysis in Europe, with 3 different class variables and 14 features.

5.1 State-of-the-art literature

Sentiment Analysis (SA), which is also known as Opinion Mining, is defined as the computational study of opinions, sentiments and emotions expressed in text [14]. It mainly originated to meet the need for organisations to automatically find on the Internet the opinions or sentiments of the general public about their products and services. Although SA has been treated as multiple different uni-dimensional problems, it has indeed a multi-dimensional nature, as we show in this paper.

Treating SA as a text classification problem is the area that has been most researched in academia [14]. This approach is mainly divided into two sub-problems that have been widely studied in the last few years: sentiment classification and subjectivity classification. Both can be formulated as learning problems:

  1. The aim of Sentiment classification [11] is to learn classifiers able to classify an opinionated text, defined as a set of features, as expressing a positive, neutral or negative opinion. Some authors do not consider the class label neutral [17], while others consider the 1-5 stars rating [21].

  2. Subjectivity classification [24] can be formulated as the learning problem of determining whether a text is objective or subjective.

Due to the fact that both can be expressed as learning problems, the existing methods can be readily applied for sentiment and subjectivity classification, e.g. Bayesian network classifiers or support vector machines. However, in order to apply those learning methods straightforwardly, the text has to be transformed into a set of features $X = \{X_1, X_2, \ldots, X_n\}$. Liu, in [14], presents several approaches that have been used to represent a text as a set of features such as "terms and their frequency" or "part-of-speech tags".

Although a large amount of work has been proposed not only in engineering a suitable set of features, but also in studying both sub-problems, most existing research focuses on only one of the two sub-problems, and to the best of our knowledge none of it performs a simultaneous sentiment-subjectivity classification. Even so, several papers have noticed the need to predict both the sentiment and the subjectivity values of a given text [7][25]. For instance, [25] reported a study which tries to classify subjective sentences and also determine their opinion orientation. However, they perform both classifications in series, not simultaneously, without using the correlation between the sentiment and the subjectivity as is explicitly modelled in multi-dimensional classification.

Finally, despite the fact that SA is broadly applied to extract information from the web, where a huge amount of unlabelled data can be found, little work [10] has been carried out in learning SA problems in a semi-supervised framework.

5.2 The ASOMO dataset

Since typical benchmark data repositories in Sentiment Classification do not provide datasets with multiple class variables, we do not test our algorithms on the common collections used in the SA papers. Our experimentation is performed using a real dataset extracted from the ASOMO service of mobilised opinion analysis.

The ASOMO dataset has been collected by Socialware and it consists of 2,542 reviews written in Spanish. 150 of these documents have been labelled, in isolation, by an expert in the Socialware company and 2,392 are left as unlabelled instances.

Each document is preprocessed using an open source morphological analyser [3]. This analyser provides information related to part-of-speech, which is helpful in detecting a list of 14 morphological features, expressed as a ratio in the range [0, 1], which characterise each analysed document.

The dataset has three different class variables, which are naturally related in SA: Sentiment and Subjectivity (as mentioned before), and a third one called Will to Influence. This class variable, frequently used in the framework of ASOMO, is defined as the dimension that rates the desire of the opinion holder to provoke a certain reaction in the potential readers of the text. It has four possible values: declarative text, soft, medium and strong will to influence. Sentiment, in this dataset, has 5 different labels as occurs in the 1-5 stars ratings. So, in addition to the three classic values, it has the values "very negative" and "very positive".

5.3 Experimentation on the ASOMO dataset

In order to evaluate our semi-supervised multi-dimensional set of algorithms, the following experiment is performed: the ASOMO dataset has been used to learn three different (uni-dimensional) Bayesian network classifiers and three different sub-families of MDBNC. For uni-dimensional classification, the naive Bayes classifier (nB), the tree-augmented network classifier (TAN) and a 2-dependence Bayesian classifier (2DB) have been chosen. On the multi-dimensional side, MDnB, MDTAN and MD 2/K structures, with $K = 2, 3, \ldots, 6$, have been selected. In order to compare supervised and semi-supervised frameworks, all the structures are learnt in both scenarios. As stated before, the uni-dimensional approach cannot be straightforwardly applied to deal with multi-dimensional problems, so, when learning uni-dimensional classifiers we divide the dataset into three one-class variable tasks and tackle them as independent.

The supervised learning procedure only uses the labelled dataset (consisting of 150 documents), whilst the semi-supervised approach uses the 2,542 reviews. Our multi-dimensional extension of the EM algorithm is used in the latter approach and it terminates after finding a local likelihood maximum or after 250 iterations. Due to the fact that the features of the dataset are continuous, they are discretised into three values using equal frequency discretisation. The parameters of the models are calculated by MLE corrected with Laplace smoothing. Finally, the performance of each model has been estimated via 5 runs of 5-fold non-stratified cross validation [22].

Table 1 shows the results of the algorithms over the ASOMO dataset in both supervised and semi-supervised approaches. In addition to the joint accuracy⁴, the accuracies per each class variable are shown. The accuracies marked with an asterisk correspond to the best accuracies obtained in both supervised and semi-supervised frameworks for each class variable and for the joint accuracy metric.

⁴ This measure estimates the values of all class variables simultaneously, that is, it only counts a success if all the classes are correctly predicted, otherwise it counts an error [19].

Table 1: Accuracies on ASOMO dataset

  LABELLED DATA
  Classif.   Will to Influence   Sentiment        Subjectivity     JOINT Acc
  nB         56.80 ± 2.18        28.00 ± 3.42     82.80 ± 1.19     11.47 ± 2.76
  TAN        51.33 ± 1.76        27.20 ± 3.75     79.47 ± 1.19     09.47 ± 1.79
  2DB        56.53 ± 1.10        29.33 ± 3.40     82.80 ± 0.56     11.73 ± 2.24
  MDnB       53.60 ± 0.89        29.33 ± 1.63     83.07 ± 1.21     14.80 ± 1.66
  MDTAN      52.13 ± 2.23        26.27 ± 1.61     83.47 ± 0.73     13.33 ± 2.05
  MD 2/2     52.00 ± 1.70        28.80 ± 3.03     78.53 ± 2.51     15.07 ± 3.64
  MD 2/3     56.67 ± 2.26        29.07 ± 1.92     76.00 ± 1.05     15.87 ± 1.91
  MD 2/4     56.93 ± 1.67        27.74 ± 3.22     77.73 ± 2.65     14.80 ± 1.73
  MD 2/5     56.94 ± 2.69        27.86 ± 2.47     75.73 ± 3.18     13.73 ± 2.24
  MD 2/6     56.07 ± 3.56        31.05 ± 3.41     76.57 ± 2.10     15.47 ± 2.18

  LABELLED AND UNLABELLED DATA
  Classif.   Will to Influence   Sentiment        Subjectivity     JOINT Acc
  nB         58.67 ± 2.21 *      27.33 ± 3.86     54.13 ± 4.15     09.33 ± 2.75
  TAN        48.13 ± 3.60        28.40 ± 3.79     70.40 ± 2.69     10.53 ± 3.63
  2DB        35.20 ± 6.23        29.47 ± 1.85     83.60 ± 0.89     09.20 ± 2.51
  MDnB       53.87 ± 2.60        32.13 ± 0.99 *   84.00 ± 0.67 *   16.80 ± 2.28 *
  MDTAN      34.93 ± 6.30        28.27 ± 2.69     83.07 ± 1.12     08.57 ± 2.88
  MD 2/2     52.40 ± 2.61        26.27 ± 4.70     74.80 ± 2.33     12.52 ± 1.45
  MD 2/3     50.67 ± 4.42        27.33 ± 1.42     75.60 ± 4.53     14.40 ± 1.30
  MD 2/4     51.60 ± 3.25        29.73 ± 2.77     77.47 ± 1.97     16.00 ± 1.56
  MD 2/5     53.20 ± 3.21        26.93 ± 5.00     78.27 ± 3.96     15.47 ± 1.59
  MD 2/6     53.60 ± 1.80        28.67 ± 1.05     77.73 ± 1.21     16.53 ± 0.55

Based on the results, several conclusions can be extracted:

  1. The multi-dimensional classification approach statistically outperforms the uni-dimensional classification in terms of joint accuracy (Student's t-test with α = 0.95).

  2. The semi-supervised learning framework obtains better joint accuracies than supervised learning.

  3. The class variable Will to Influence tends to degrade its performance in semi-supervised learning, whilst the others tend to achieve better single accuracies.

  4. The MDTAN approach [23] tends to behave more like the uni-dimensional approaches than like the multi-dimensional approaches it belongs to.

In conclusion, we show in this experiment that the proposed semi-supervised multi-dimensional formulation offers a novel perspective on this kind of problem, opening new ways to deal with it. In addition, it can also be seen that the explicit use and representation of the relationships between different class variables, as well as the use of unlabelled data, can be beneficial to improve the recognition rates in SA problems, demonstrating that the SA problem has indeed a multi-dimensional underlying nature.

6 Conclusion

Multi-dimensional classification and semi-supervised learning are two different branches of machine learning. While multi-dimensional supervised classification is the generalisation of the single-class supervised classification problem to the simultaneous prediction of a set of class variables, semi-supervised learning is the learning paradigm concerned with the study of how more accurate classifiers can be learnt by adding unlabelled examples to the labelled ones. In this paper, we establish a bridge between them.

First, a supervised filter learning algorithm for MD J/K classifiers has been proposed. After that, we have extended, in addition to the MD J/K learning algorithm, the state-of-the-art of supervised MDBNC learning algorithms by means of a multi-dimensional extension of the EM Algorithm. Then, we have applied the proposed battery of semi-supervised multi-dimensional learning algorithms to a real dataset of SA, showing that they are competitive with many existing supervised learning algorithms. Besides, we have also shown that multi-dimensional learning algorithms outperform the uni-dimensional techniques when dealing with semi-supervised multi-dimensional problems. This demonstrates that SA classification is, in fact, a multi-dimensional problem.

In short, we have proposed a novel perspective on this kind of problem and demonstrated that the use of multi-dimensional classification, as well as the use of unlabelled data, can be beneficial to improve the recognition rates.

Acknowledgments

This work has been partially supported by the Saiotek, Etortek and Research Groups 2007-2012 (IT-242-07) programs (Basque Government), TIN2008-06815-C02-01, Consolider Ingenio 2010-CSD2007-00018 projects and MEC-FPU grant AP2008-00766, and COMBIOMED network in computational biomedicine (Carlos III Health Institute).

References

[1] C. Bielza, G. Li, and P. Larrañaga. Multi-dimensional classification with Bayesian networks. Technical Report UPM-FI/DIA/2010-1, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Madrid, Spain, 2010.
[2] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 2006.
[3] X. Carreras, I. Chao, L. Padró, and M. Padró. An open-source suite of language analyzers. In Proceedings of the 4th International Conference on Language Resources and Evaluation, volume 10, pages 239–242, 2006.
[4] O. Chapelle, B. Schölkopf, and A. Zien. Semi-supervised learning. The MIT Press, 2006.
[5] I. Cohen, F. G. Cozman, N. Sebe, M. C. Cirelo, and T. S. Huang. Semisupervised learning of classifiers: Theory, algorithms and their application to human-computer interaction. IEEE Trans. Pattern Anal. Mach. Intell., 26(12):1553–1567, 2004.
[6] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1–38, 1977.
[7] A. Esuli and F. Sebastiani. SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of the 5th Conference on Language Resources and Evaluation (LREC'06), pages 417–422, 2006.
[8] N. Friedman. The Bayesian structural EM algorithm. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 129–138. Morgan Kaufmann Publishers, 1998.
[9] N. Friedman, D. Geiger, and M. Goldszmidt. Bayesian network classifiers. Machine Learning, 29(2-3):131–163, 1997.
[10] M. Gamon, A. Aue, S. Corston-Oliver, and E. Ringger. Pulse: Mining customer opinions from free text. In Proceedings of the 6th International Symposium on Intelligent Data Analysis, pages 121–132, 2005.
[11] M. Koppel and J. Schler. Using neutral examples for learning polarity. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2005), 2005.
[12] S. Kullback. Information Theory and Statistics. Wiley, 1959.
[13] P. Larrañaga, J. A. Lozano, J. M. Peña, and I. Inza. Special Issue on Probabilistic Graphical Models for Classification, volume 59(3). Machine Learning, 2005.
[14] B. Liu. Sentiment analysis and subjectivity. In N. Indurkhya and F. J. Damerau, editors, Handbook of Natural Language Processing. Chapman & Hall, second edition, 2010.
[15] G. J. McLachlan and T. Krishnan. The EM Algorithm and Extensions. Wiley-Interscience, 1997.
[16] K. Nigam, A. McCallum, S. Thrun, and T. Mitchell. Text classification from labeled and unlabeled documents using EM. Machine Learning, 39(2):103–134, 2000.
[17] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP'02), pages 79–86, 2002.
[18] J. D. Rodríguez and J. A. Lozano. Multi-objective learning of multi-dimensional Bayesian classifiers. In Eighth Conference on Hybrid Intelligent Systems (HIS'08), pages 501–506, September 2008.
[19] J. D. Rodríguez and J. A. Lozano. Learning Bayesian network classifiers for multi-dimensional supervised classification problems by means of a multi-objective approach. Technical Report EHU-KZAA-TR-3-2010, Department of Computer Science and Artificial Intelligence, University of the Basque Country, San Sebastián, Spain, 2010.
[20] M. Sahami. Learning limited dependence Bayesian classifiers. In AAAI Press, editor, Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), pages 335–338, 1996.
[21] B. Snyder and R. Barzilay. Multiple aspect ranking using the Good Grief algorithm. In Proceedings of the Joint Human Language Technology/North American Chapter of the ACL Conference (HLT-NAACL), pages 300–307, 2007.
[22] M. Stone. Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society B, 36(1):111–147, 1974.
[23] L. C. van der Gaag and P. R. de Waal. Multi-dimensional Bayesian network classifiers. In Proceedings of the Third European Workshop on Probabilistic Graphical Models, pages 107–114, 2006.
[24] J. M. Wiebe, R. F. Bruce, and T. P. O'Hara. Development and use of a gold-standard data set for subjectivity classifications. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, pages 246–253, 1999.
[25] H. Yu and V. Hatzivassiloglou. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2003.
[26] X. Zhu and A. B. Goldberg. Introduction to Semi-Supervised Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2009.