Outline: Introduction · Our solution · The Bayesian network model · Results · Conclusions and future works
Link-based text classification using
Bayesian networks
Luis M. de Campos Juan M. Fernández-Luna
Juan F. Huete Andrés R. Masegosa
Alfonso E. Romero
{lci,jmfluna,jhg,andrew,aeromero}@decsai.ugr.es
Departamento de Ciencias de la Computación e Inteligencia Artificial
E.T.S.I. Informática y de Telecomunicación,
CITIC-UGR, Universidad de Granada
18071 – Granada, Spain
INEX 2009 Workshop, Brisbane
Our participation
Universidad de Granada at INEX 2009
This is the third year we participate in the XML Mining (classification) track.
As on previous occasions, we are interested in Bayesian networks.
We have provided a new solution to this problem.
Sorry, no Ad Hoc track this year.
Our participation
The problem itself
A text (XML) categorization problem with a training/test corpus (same as previous years).
Multilabel: more than one category per document (new this year!).
Links among files (training and test) given in a matrix (same as 2008).
Vectors of indexed terms (normalized tf-idf) provided. The eternal question: what about XML?
Our solution (2008)
Encyclopedia regularity: a document of category Ci tends to link to documents of the same category. Graphically verified on the training set.
In 2008 we combined a flat-text classifier (Naïve Bayes) with a Bayesian network of fixed structure, which modelled the interaction among categories using learnt probabilities P(ci | cj).
Results were modest (the worst model among the three, and improvements over our baseline were not significant).
Our starting point (2009)
We detected the same regularity on categories (no matrix plot this year).
Possible (hidden) hierarchy (for example Portal:Religion, Portal:Christianity and Portal:Catholicism).
This year we learn the interactions among categories from the data: no fixed structure, but any structure over the set of categories.
Modeling link structure I
We assume there is a global probability distribution over all these variables, and we model it with a Bayesian network.
Variables: categories Ci (39), categories of incoming links Ej (39), and terms Tk (many).
Main assumption: the terms of a document and the categories of the files that link to it are independent given the category. Symbolically:
p(dj, ej | ci) = p(dj | ci) · p(ej | ci).
Modeling link structure II
We then search for the conditional probability $p(c_i \mid d_j, e_j)$:

\begin{align*}
p(c_i \mid d_j, e_j) &= \frac{p(d_j, e_j \mid c_i)\, p(c_i)}{p(d_j, e_j)} = \frac{p(d_j \mid c_i)\, p(e_j \mid c_i)\, p(c_i)}{p(d_j, e_j)} \\
&= \frac{p(c_i \mid d_j)\, p(d_j)\, p(e_j \mid c_i)\, p(c_i)}{p(c_i)\, p(d_j, e_j)} \\
&= \frac{p(c_i \mid d_j)\, p(d_j)\, p(c_i \mid e_j)\, p(e_j)}{p(c_i)\, p(d_j, e_j)} \\
&= \frac{p(d_j)\, p(e_j)}{p(d_j, e_j)} \cdot \frac{p(c_i \mid d_j)\, p(c_i \mid e_j)}{p(c_i)}.
\end{align*}

Therefore

\[
p(c_i \mid d_j, e_j) \propto \frac{p(c_i \mid d_j)\, p(c_i \mid e_j)}{p(c_i)},
\]

and, normalizing over the binary category variable (with $\bar{c}_i$ the complement of $c_i$),

\[
p(c_i \mid d_j, e_j) = \frac{p(c_i \mid d_j)\, p(c_i \mid e_j)/p(c_i)}{p(c_i \mid d_j)\, p(c_i \mid e_j)/p(c_i) + p(\bar{c}_i \mid d_j)\, p(\bar{c}_i \mid e_j)/p(\bar{c}_i)}.
\]
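The normalized combination above can be sketched in a few lines of Python. This is only an illustration under the binary-category assumption; the function name `combine` and its argument names are ours, not from the paper.

```python
def combine(p_c_d, p_c_e, p_c):
    """Combine the content-based posterior p(c|d) and the link-based
    posterior p(c|e) for one binary category, assuming d and e are
    conditionally independent given the category.

    p_c_d : p(c | d), from the base classifier
    p_c_e : p(c | e), from the Bayesian network over link categories
    p_c   : prior p(c) of the category
    """
    num = p_c_d * p_c_e / p_c                        # term for c
    bar = (1 - p_c_d) * (1 - p_c_e) / (1 - p_c)      # term for the complement of c
    return num / (num + bar)
```

Note that when both sources agree strongly (e.g. p(c|d) = p(c|e) = 0.9 with prior 0.5), the combined posterior exceeds either individual one (about 0.99): agreement between content and links reinforces the decision.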
Modeling link structure III
p(ci | dj): the output of a probabilistic classifier (any probabilistic classifier).
p(ci | ej): the probability of belonging to Ci given the set of categories of the (known) incoming links. This is modeled by the Bayesian network.
The problem reduces to the following: [see next slide]
Modeling link structure IV
We have a vector of 39+39 binary variables for each document: 39 for the categories (1 if the document belongs to that category, 0 if not), and 39 more for the links (1 if the document is linked by documents of that category, 0 if not).
With a learning algorithm, we learn a Bayesian network from that data.
For each document to classify and for each category Ci, we compute its content probability p(ci | dj) (with the base classifier) and the probability of belonging to Ci given the categories of its known neighbours, p(ci | ej) (with the learnt Bayesian network).
We combine them using the normalized equation above.
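Building the 39+39 binary vectors can be sketched as follows. The container names (`labels`, `links`) are hypothetical; the actual corpus provides the links as a matrix.

```python
def document_vector(doc, labels, links, categories):
    """Return the 39+39 binary vector for `doc`:
    first block : 1 if doc belongs to the category, 0 if not;
    second block: 1 if doc is linked by some document of the category.

    labels[d]  -- set of category ids of document d
    links[d]   -- set of documents that link to d
    categories -- ordered list of the category ids
    """
    cat_part = [1 if c in labels.get(doc, set()) else 0 for c in categories]
    incoming = set()
    for src in links.get(doc, set()):          # collect categories of incoming links
        incoming |= labels.get(src, set())
    link_part = [1 if c in incoming else 0 for c in categories]
    return cat_part + link_part
```

One such vector per training document is the data from which the network structure is learnt.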
Learning link structure
We learn the Bayesian network using the WEKA package:
Hill-climbing algorithm (simple and fast).
BDeu metric.
At most three parents per node.
Propagation is done with Elvira (WEKA does not include propagation algorithms):
We compute p(ci) (once) and p(ci | ej) (for each document j).
Exact propagation was too slow!
Importance sampling algorithm (approximate).
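As a toy illustration of approximate propagation, here is a likelihood-weighting sampler (one member of the importance-sampling family, not necessarily the exact algorithm used in Elvira) on a hypothetical two-node network C → E with made-up conditional probability tables:

```python
import random

# Hypothetical CPTs for a toy network C -> E; the real model has 78 nodes.
P_C = 0.3                        # prior p(C = 1)
P_E_GIVEN_C = {0: 0.2, 1: 0.9}   # p(E = 1 | C)

def estimate_p_c_given_e(evidence_e, n=50000, seed=0):
    """Estimate p(C = 1 | E = evidence_e) by likelihood weighting:
    sample the non-evidence variables from their conditionals and
    weight each sample by the likelihood of the observed evidence."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        c = 1 if rng.random() < P_C else 0           # sample C from its prior
        p_e = P_E_GIVEN_C[c]
        w = p_e if evidence_e == 1 else 1.0 - p_e    # weight by p(e | c)
        num += w * c
        den += w
    return num / den
```

The exact answer here is p(C=1 | E=1) = 0.9·0.3 / (0.9·0.3 + 0.2·0.7) ≈ 0.659, and the estimate converges to it as n grows, without ever building the full joint distribution.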
Base classifiers
We have used Multinomial Naïve Bayes (binary) and the Bayesian OR gate (a model presented by our group at INEX 2007).
They are extensively described in the paper (read it if you want to learn more about these two classifiers).
Any other probabilistic classifier could be used to obtain p(ci | dj) first (any suggestions or preferences?).
Results
Initial results:

               MACC     µACC     MROC     µROC     MPRF     µPRF     MAP
N. Bayes       0.95142  0.93284  0.80260  0.81992  0.49613  0.52670  0.64097
N. Bayes + BN  0.95235  0.93386  0.80209  0.81974  0.50015  0.53029  0.64235
OR gate        0.75420  0.67806  0.92526  0.92163  0.25310  0.26268  0.72955
OR gate + BN   0.84768  0.81891  0.92810  0.92739  0.31611  0.36036  0.72508

Problem with the OR gate! The evaluation assumes dj ∈ Ci ⇔ p(ci | dj) > 0.5. This is not, in general, true for the OR gate, so some scaling procedure is needed (like the SCut strategy).

Scaled results (see paper for details):

               MACC     µACC     MROC     µROC     MPRF     µPRF     MAP
OR gate        0.92932  0.92612  0.92526  0.92163  0.45966  0.50407  0.72955
OR gate + BN   0.96607  0.95588  0.92810  0.92739  0.51729  0.55116  0.72508
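A minimal sketch of SCut-style scaling, assuming a held-out validation set per category (the exact procedure behind the scaled results is in the paper; this is only an illustration with names of our own choosing):

```python
def f1(preds, labels):
    """Plain F1 for binary predictions."""
    tp = sum(1 for p, l in zip(preds, labels) if p and l)
    fp = sum(1 for p, l in zip(preds, labels) if p and not l)
    fn = sum(1 for p, l in zip(preds, labels) if not p and l)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def scut_threshold(scores, labels):
    """For one category, pick the score threshold that maximizes F1 on
    validation data; documents with score >= threshold are assigned to
    the category, replacing the fixed p > 0.5 decision rule that the
    OR gate's uncalibrated outputs violate."""
    candidates = sorted(set(scores))
    return max(candidates,
               key=lambda t: f1([s >= t for s in scores], labels))
```

Tuning one threshold per category is what lifts the OR gate's accuracy and F-based measures while leaving the ranking-based measures (ROC, MAP) unchanged.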
Conclusions
The model is new, parametrizable (learning algorithm, parameters of the algorithm, base classifier, ...) and valuable by itself (it always improves its baseline).
Using the Bayesian network over the OR gate provides around a 10% improvement in some measures.
Good results on ROC (ranked third).
Other base classifiers? SVM with probabilistic outputs, logistic regression...
More experiments for the final version of the paper!
Thank you for your attention!
Questions, comments, criticism?
<SPAM>Expecting to defend my PhD by April 2010, searching for a PostDoc (in Europe) for 2010 on ML/IR-related stuff. Any offers?</SPAM>