Bolstered error estimation for discrete classifier applied to genomic signal processing

Marcel Brun¹, Virginia Ballarín¹
¹ Laboratorio de Procesos y Medición de Señales, Facultad de Ingeniería, UNMdP
mbrun@fi.mdp.edu.ar

Introduction
Bolstering is an error estimation technique that provides a less biased estimate than resubstitution while avoiding the large variability of leave-one-out and cross-validation [Braga & Dougherty 2004]. So far, a general model for bolstering has been provided only for continuous classification spaces, like R^n, where the idea of expanding each sample point with a circular kernel is conceptually clear and works very well in practice [Sima & Braga 2005]. On the other hand, discrete classifiers, like the ones used for image processing and genomic signal processing, present a more complex framework for the design of bolstered error estimation. In this work we define a model for bolstering based on a convolution kernel applied to both class-conditional probabilities.

Continuous Bolstering: Bolstered resubstitution for linear classification, assuming uniform circular bolstering kernels. The bolstered resubstitution error is the sum of all contributions (shaded areas) divided by the number of points.

[Figure labels: BRCA2 = 1, BRCA1 = 0]

Discrete Classification in Genomics: Can we deduce the transcriptional state of a gene, or a phenotypical feature, based on the transcriptional state of other genes? For example: "If gene X1 is active and gene X2 is suppressed, gene Y would be activated." Can we infer regulatory genetic function from cDNA microarray data, for both known and unknown functions?

[Figure: Microarray Data. Expression values (-1, 0, 1) of the genes RCH1, BCL3, FRA1, REL-B, ATF3 and IAP-1 for the cell lines ML-1, Molt4, SR, A549, MCF7 and RKO under the conditions IR, MMS and UV.]

Data Collecting:

Gene 1   Gene 2   Gene 3   Status
1        0        1        1
1        0        0        1
0        1        1        0
1        0        1        1
1        0        1        0
…        …        …        …

Frequencies and Decision:

x1x2x3   Status = 0   Status = 1   ψ
000      0            14           1
001      2            6            1
010      3            2            0
011      5            1            0
100      0            3            1
101      7            2            0
110      3            1            0
111      15           1            0

Automatic Design: Statistical analysis of the relationship between the index (target) and the status of the genes of interest (predictors) defines the optimal binary classifier. The resubstitution error is estimated by the probability of wrong classification, that is, the fraction of samples that disagree with the decision ψ (values in red in the original table). In this example it is (0 + 2 + 2 + 1 + 0 + 2 + 1 + 1)/65 = 9/65 = 13.8%. The resubstitution estimator is usually low-biased.
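The automatic design step for the frequency table can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `design_classifier` and `resubstitution_error` are hypothetical, and breaking ties in favor of label 0 is an assumption (no tie occurs in this example).

```python
# Counts per configuration x1x2x3 as (status 0, status 1), copied from the frequency table.
counts = {
    "000": (0, 14),
    "001": (2, 6),
    "010": (3, 2),
    "011": (5, 1),
    "100": (0, 3),
    "101": (7, 2),
    "110": (3, 1),
    "111": (15, 1),
}


def design_classifier(counts):
    """Assign each configuration the label with the larger count (ties go to 0, an assumption)."""
    return {x: int(n1 > n0) for x, (n0, n1) in counts.items()}


def resubstitution_error(counts, psi):
    """Fraction of the training samples whose label disagrees with psi(x)."""
    total = sum(n0 + n1 for n0, n1 in counts.values())
    wrong = sum(n0 if psi[x] == 1 else n1 for x, (n0, n1) in counts.items())
    return wrong / total


psi = design_classifier(counts)
error = resubstitution_error(counts, psi)
print(psi)
print(f"resubstitution error = {error:.3f}")  # 9/65, about 0.138
```

Because the classifier is designed and evaluated on the same 65 samples, each configuration contributes only its minority count to the error, which is why this estimate tends to be optimistic.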

Discrete Bolstering
Discrete Bolstering: Bolstered resubstitution error estimation for discrete classification, using a lattice bolstering kernel. The bolstered count for each configuration is the weighted sum of its original count and the counts of its neighbors, that is, the configurations that differ from it in exactly one variable. In this example, the class assigned to configuration 010 changes from Positive to Negative because of the new counts.

Before Bolstering: estimated error = 0.138. After Bolstering: estimated error = 0.223.

Convolution Kernel: weight 0.7 on the configuration itself and 0.1 on each of its three neighbors.

Configuration   Positive   Negative   Bolstered positive   Bolstered negative
000             0          14         0.5                  10.9
001             2          6          2.6                  5.9
010             3          2          2.9                  3
011             5          1          5.5                  1.6
100             0          3          1                    3.8
101             7          2          6.6                  2.4
110             3          1          3.9                  1.3
111             15         1          12                   1.1

Positive: number of positive samples for each observed configuration (35 observations). Negative: number of negative samples for each observed configuration (30 observations). Bolstered positive and Bolstered negative: result of the convolution for positive and negative samples.
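The bolstered counts in this example can be reproduced with a short Python sketch. It assumes that the kernel weight depends only on the Hamming distance between configurations (0.7 at distance 0, 0.1 at distance 1, 0 beyond), which matches the numbers in the figure; the names `hamming`, `bolster` and `kernel` are illustrative and not taken from the poster. The sketch recomputes only the counts and the resulting class assignments, since the poster does not spell out the exact formula behind the bolstered error value.

```python
from itertools import product

# Sample counts per configuration, copied from the figure.
positive = {"000": 0, "001": 2, "010": 3, "011": 5, "100": 0, "101": 7, "110": 3, "111": 15}
negative = {"000": 14, "001": 6, "010": 2, "011": 1, "100": 3, "101": 2, "110": 1, "111": 1}

# Kernel weight by Hamming distance (assumed form): 0.7 on the point itself,
# 0.1 on each of its three one-bit neighbors, 0 elsewhere. Weights sum to 1.
kernel = {0: 0.7, 1: 0.1}


def hamming(a, b):
    """Number of positions in which two configurations differ."""
    return sum(ca != cb for ca, cb in zip(a, b))


def bolster(counts, kernel):
    """Convolve the counts with the kernel over the binary lattice."""
    n_bits = len(next(iter(counts)))
    configs = ["".join(bits) for bits in product("01", repeat=n_bits)]
    return {
        x: sum(kernel.get(hamming(x, y), 0.0) * counts.get(y, 0) for y in configs)
        for x in configs
    }


bolstered_pos = bolster(positive, kernel)
bolstered_neg = bolster(negative, kernel)

# Configurations whose majority class differs before and after bolstering.
changed = [
    x for x in sorted(bolstered_pos)
    if (positive[x] > negative[x]) != (bolstered_pos[x] > bolstered_neg[x])
]

for x in sorted(bolstered_pos):
    print(f"{x}: positive {bolstered_pos[x]:5.1f}   negative {bolstered_neg[x]:5.1f}")
print("class changed for:", changed)
```

Since the kernel is symmetric and its weights sum to 1, the convolution redistributes the counts without changing the class totals (35 and 30).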

Results

3-variable simulated data (geometric spatial distribution), with the convolution kernel varying as a function of a parameter a.

[Figure: Convolution Kernels. Kernel value as a function of distance.]

[Figure: Estimated Error as a function of the Bolstering Kernel (N = 3, M = 58). Estimated error as a function of the parameter b. Legend: Bolstered (Best b = 0.05), Diff Oper (Best Diff = 0.01).]

Bayes error    0.282
True error     0.301
LOO error      0.293
Resub error    0.224
Bolstered      0.302

Conclusions

• Discrete bolstering can be defined in terms of convolution kernels, as in the continuous case.
• Convolution of both conditional probabilities induces changes in the amount of error computed for the estimated classifier.
• The increase or decrease in the estimated error can be made to change continuously as a function of a kernel size parameter a.
• Usually there is an optimal a which makes the bolstered error estimate similar to the true error of the estimated classifier.
• Future work is directed at choosing the optimal kernel parameter a for specific situations.

References

• Ulisses Braga-Neto and Edward R. Dougherty, “Bolstered error estimation”, Pattern Recognition, 37, pp. 1267–1281, 2004.
• Ulisses Braga-Neto and Edward R. Dougherty, “Classification”, in Genomic Signal Processing and Statistics, eds. E. R. Dougherty, I. Shmulevich, J. Chen, and Z. J. Wang, EURASIP Book Series on Signal Processing and Communication, Hindawi Publishing Corporation, 2005.
• A. Choudhary, M. Brun, J. Hua, J. Lowey, E. Suh and E. R. Dougherty, “Genetic test bed for feature selection”, Bioinformatics, 22 (7), pp. 837–842, 2006.
• Chao Sima, Ulisses Braga-Neto and Edward R. Dougherty, “Superior feature-set ranking for small samples using bolstered error estimation”, Bioinformatics, 21 (7), pp. 1046–1054, 2005.
• Phillip Stafford and Marcel Brun, “Three methods for optimization of cross-laboratory and cross-platform microarray expression data”, Nucleic Acids Research, 2007, 1–16.
• Qian Xu, Jianping Hua, Ulisses Braga-Neto, Zixiang Xiong, Edward Suh and Edward R. Dougherty, “Confidence Intervals for the True Classification Error Conditioned on the Estimated Error”, Technology in Cancer Research and Treatment, Volume 5, Number 6, December 2006.
