Marcel Brun¹, Virginia Ballarín¹
¹Laboratorio de Procesos y Medición de Señales, Facultad de Ingeniería, UNMdP
mbrun@fi.mdp.edu.ar

               Introduction
Bolstering is an error estimation technique that provides a less biased estimate than resubstitution while avoiding the large variability of leave-one-out and cross-validation [Braga & Dougherty 2004]. So far, a general model for Bolstering has been provided for continuous classification spaces, such as R^n, where the concept of expanding the sample points by a circular kernel is conceptually clear and works very well in practice [Sima & Braga 2005]. On the other hand, discrete classifiers, like the ones used for image processing and genomic signal processing, present a more complex framework for the design of Bolstered error estimation. In this work we define a model for Bolstering based on a convolution kernel applied to both conditional probabilities.

Continuous Bolstering: Bolstered resubstitution for linear classification, assuming uniform circular bolstering kernels. The bolstered resubstitution error is the sum of all contributions (shaded areas) divided by the number of points.

Discrete Classification in Genomics: Can we deduce the transcriptional state of a gene, or a phenotypical feature, based on the transcriptional state of other genes? For example: “If gene X1 is active and gene X2 is suppressed, gene Y would be activated.” Can we infer regulatory genetic function from the cDNA microarray data, for both known and unknown functions?

Cell-line  Condition  REL-B  RCH1  BCL3  FRA1  IAP-1  ATF3
ML-1       IR           -1     1     1     1      1     1
ML-1       MMS           0     0     0     0      1     0
Molt4      IR           -1     0     0     1      1     0
Molt4      MMS           0     0     1     0      1     0
SR         IR           -1     0     0     1      1     1
SR         MMS           0     0     0     0      1     0
A549       IR            0     0     0     0      0     0
A549       MMS           0     0     0     0      1     0
A549       UV            0     0     0     0      1     0
MCF7       IR           -1     0     1     1      0     0
MCF7       MMS           0     0     1     0      1     0
MCF7       UV            0     0     1     1      1     0
RKO        IR            0     1     0     1      1     1
RKO        MMS           0     0     0     0      1     0

[Figure: Microarray Data; example gene states BRCA2 = 1, BRCA1 = 0]

Data Collecting:

Gene 1  Gene 2  Gene 3  Status
  1       0       1       1
  1       0       0       1
  0       1       1       0
  1       0       1       1
  1       0       1       0
  …       …       …       …

Frequencies (number of samples with status 0 and status 1 for each configuration) and Decision (classifier output ψ):

x1x2x3    0     1     ψ
000       0    14     1
001       2     6     1
010       3     2     0
011       5     1     0
100       0     3     1
101       7     2     0
110       3     1     0
111      15     1     0

Automatic Design: Statistical analysis of the relationship between the index (target) and the status of the genes of interest (predictors) defines the optimal binary classifier. The resubstitution error is estimated by the probability of wrong classification (values in red, i.e. the smaller of the two counts for each configuration). In this example it is 9/65 = 13.8%. The resubstitution estimator is usually low-biased.
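The resubstitution figure of 9/65 can be checked directly from the frequency counts. The following is a minimal sketch: the dictionary layout and the variable names (`freq`, `psi`, `resub_error`) are illustrative choices, and breaking ties in favor of status 0 is an assumption that does not matter for this example, since no configuration is tied.

```python
# Counts per configuration x1x2x3, copied from the frequency table:
# (samples with status 0, samples with status 1).
freq = {
    "000": (0, 14), "001": (2, 6), "010": (3, 2), "011": (5, 1),
    "100": (0, 3), "101": (7, 2), "110": (3, 1), "111": (15, 1),
}

# Plug-in classifier: each configuration predicts its majority status.
# Ties go to status 0 (an assumption; no ties occur in this table).
psi = {x: int(n1 > n0) for x, (n0, n1) in freq.items()}

# Resubstitution error: samples that fall in the minority status of
# their configuration, divided by the total number of samples.
errors = sum(min(n0, n1) for n0, n1 in freq.values())
total = sum(n0 + n1 for n0, n1 in freq.values())
resub_error = errors / total

print(psi)
print(f"{errors}/{total} = {resub_error:.3f}")  # 9/65 = 0.138
```

The resulting `psi` agrees with the Decision column, and the error is computed on the same samples that designed the classifier, which is why this estimator tends to be optimistic.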

               Discrete Bolstering
Discrete Bolstering: Bolstered resubstitution error estimation for discrete classification, using a lattice bolstering kernel. The bolstered count for each configuration is a weighted sum of its original count and the counts of its neighbors. In this example, the assigned class for configuration 010 changes from Positive to Negative because of the new counts.
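One way to write this weighted sum, as a sketch only: the symbols $n_c$, $\tilde{n}_c$, $K$ and $d_H$ are notation introduced here for illustration, and reading "neighbors" as the configurations at Hamming distance one is an assumption, although it is consistent with the numbers in the example.

```latex
\tilde{n}_c(x) \;=\; \sum_{z \in \{0,1\}^N} K\bigl(d_H(x,z)\bigr)\, n_c(z),
\qquad c \in \{\text{Positive}, \text{Negative}\}

% Example kernel for N = 3: K(0) = 0.7, K(1) = 0.1, K(d) = 0 for d >= 2.
% The weights sum to 0.7 + 3 (0.1) = 1, so the total count per class is preserved.

% Configuration 010, with neighbors 000, 011 and 110:
\tilde{n}_{\text{Positive}}(010) = 0.7 \cdot 3 + 0.1\,(0 + 5 + 3) = 2.9
\qquad
\tilde{n}_{\text{Negative}}(010) = 0.7 \cdot 2 + 0.1\,(14 + 1 + 1) = 3.0
```

Before bolstering the counts for 010 are 3 against 2, so the configuration is labeled Positive; after bolstering they are 2.9 against 3.0, which is why the label flips to Negative.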

[Figure: counts on the lattice of configurations, before and after Bolstering.]

Before Bolstering: estimated error = 0.138
After Bolstering: estimated error = 0.223

Convolution Kernel: weight 0.7 on the configuration itself and 0.1 on each of its three neighbors.

                 Before Bolstering           Result of convolution
Configuration    Positive    Negative        Positive    Negative
000                 0           14             0.5         10.9
001                 2            6             2.6          5.9
010                 3            2             2.9          3
011                 5            1             5.5          1.6
100                 0            3             1            3.8
101                 7            2             6.6          2.4
110                 3            1             3.9          1.3
111                15            1            12            1.1

Number of positive samples for each observed configuration: 35 observations. Number of negative samples for each observed configuration: 30 observations.
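The convolved counts in this example can be reproduced with a few lines of code. This is a hedged sketch rather than the authors' implementation: the function names `hamming` and `bolster` are hypothetical, and the kernel is assumed to act on Hamming distance (0.7 at distance 0, 0.1 at distance 1, zero beyond). It stops at the bolstered counts and class labels and does not attempt to reproduce the bolstered error value, since the exact error formula is not spelled out here.

```python
from itertools import product

# Sample counts per configuration x1x2x3, as given in the example.
positive = {"000": 0, "001": 2, "010": 3, "011": 5,
            "100": 0, "101": 7, "110": 3, "111": 15}
negative = {"000": 14, "001": 6, "010": 2, "011": 1,
            "100": 3, "101": 2, "110": 1, "111": 1}

# Kernel weight by Hamming distance (assumed reading of the lattice kernel).
KERNEL = {0: 0.7, 1: 0.1}


def hamming(a, b):
    """Number of positions at which two bit strings differ."""
    return sum(ca != cb for ca, cb in zip(a, b))


def bolster(counts, kernel=KERNEL):
    """Convolve the counts with the kernel over the binary lattice."""
    n_bits = len(next(iter(counts)))
    configs = ["".join(bits) for bits in product("01", repeat=n_bits)]
    return {
        x: sum(kernel.get(hamming(x, z), 0.0) * counts[z] for z in configs)
        for x in configs
    }


pos_b = bolster(positive)
neg_b = bolster(negative)

for x in sorted(pos_b):
    label = "Positive" if pos_b[x] > neg_b[x] else "Negative"
    print(x, round(pos_b[x], 1), round(neg_b[x], 1), label)
```

Because the kernel weights sum to one, the totals of 35 positive and 30 negative samples are unchanged by the convolution; only their distribution over configurations is smoothed.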


               Results
Simulated data with 3 variables (geometric spatial distribution), with the convolution kernel varying as a function of a parameter a.

[Figure: Convolution Kernels. Kernel Value (0 to 1) versus Distance (0 to 3).]

[Figure: Estimated Error as function of the Bolstering Kernel, N = 3, M = 58. Estimated Error (0 to 0.7) versus Parameter b (-4 to 4). Legend: Bayes Error 0.282; True error 0.301; LOO error 0.293; Resub error 0.224; Bolstered (Best b = 0.05); Bolstered 0.302; Diff Oper (Best Diff = 0.01).]

               Conclusions
•         Discrete Bolstering can be defined in terms of convolution kernels, as in the continuous case.
•         Convolution of both conditional probabilities induces changes in the amount of error computed for the estimated classifier.
•         The increase or decrease in the estimated error can be made to change continuously as a function of a Kernel Size parameter a.
•         Usually there is an optimal a which makes the bolstered error estimator similar to the true error of the estimated classifier.
•         Future work is directed at choosing the optimal Kernel parameter a for specific situations.

               References
•         Ulisses Braga-Neto and Edward Dougherty, “Bolstered error estimation”, Pattern Recognition, 37, pp. 1267-1281, 2004.
•         Ulisses Braga-Neto and Edward R. Dougherty, “Classification”, in Genomic Signal Processing and Statistics, eds. E. R. Dougherty, I. Shmulevich, J. Chen, and Z. J. Wang, EURASIP Book Series on Signal Processing and Communication, Hindawi Publishing Corporation, 2005.
•         A. Choudhary, M. Brun, J. Hua, J. Lowey, E. Suh, and E. R. Dougherty, “Genetic test bed for feature selection”, Bioinformatics, 22 (7), pp. 837-842, 2006.
•         Chao Sima, Ulisses Braga-Neto, and Edward R. Dougherty, “Superior feature-set ranking for small samples using bolstered error estimation”, Bioinformatics, 21 (7), pp. 1046-1054, 2005.
•         Phillip Stafford and Marcel Brun, “Three methods for optimization of cross-laboratory and cross-platform microarray expression data”, Nucleic Acids Research, 2007, 1-16.
                                                                                                                                                                                                                     •         Qian Xu, Jianping Hua, Ulisses Braga-Neto, Zixiang Xiong, Edward Suh, Edward R. Dougherty, Ph.D., “Confidence Intervals for the True
                                                                                                                                                                                                                               Classification Error Conditioned on the Estimated Error”, Technology in Cancer Research and Treatment, Volume 5, Number 6, December
                                                                                                                                                                                                                               (2006)

Bolstered error estimation for discrete classifier applied to genomic signal processing

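The discrete bolstering step described in this poster replaces the count of each configuration with a weighted sum of its own count and the counts of its neighbors. A minimal Python sketch of that step follows, using the positive and negative counts and the kernel weights (0.7 for the configuration itself, 0.1 for each neighbor at Hamming distance 1) from the poster's three-gene example. The function names are hypothetical. How the poster turns the bolstered counts into its reported bolstered error is not spelled out, so `error_rate` is an assumed reading (misclassified mass divided by total mass) and its bolstered value is not claimed to match the reported figure.

```python
# Counts per configuration x1x2x3, taken from the poster's worked example.
POSITIVE = {"000": 0, "001": 2, "010": 3, "011": 5,
            "100": 0, "101": 7, "110": 3, "111": 15}   # 35 observations
NEGATIVE = {"000": 14, "001": 6, "010": 2, "011": 1,
            "100": 3, "101": 2, "110": 1, "111": 1}    # 30 observations

# Kernel weight by Hamming distance; distances not listed get weight 0.
KERNEL = {0: 0.7, 1: 0.1}


def hamming(a, b):
    """Number of positions at which two configurations differ."""
    return sum(x != y for x, y in zip(a, b))


def bolster(counts, kernel=KERNEL):
    """Convolve the counts with the kernel over the Boolean lattice."""
    return {
        c: sum(kernel.get(hamming(c, d), 0.0) * n for d, n in counts.items())
        for c in counts
    }


def classify(pos, neg):
    """Assign each configuration to the class with the larger count."""
    return {c: "Positive" if pos[c] > neg[c] else "Negative" for c in pos}


def error_rate(pos, neg, labels):
    """Assumed reading: misclassified mass divided by total mass."""
    total = sum(pos.values()) + sum(neg.values())
    wrong = sum(neg[c] if labels[c] == "Positive" else pos[c] for c in labels)
    return wrong / total


if __name__ == "__main__":
    before = classify(POSITIVE, NEGATIVE)
    print("resubstitution error:", error_rate(POSITIVE, NEGATIVE, before))

    pos_b, neg_b = bolster(POSITIVE), bolster(NEGATIVE)
    after = classify(pos_b, neg_b)
    print("class of 010 before/after:", before["010"], after["010"])
    print("bolstered error (assumed formula):", error_rate(pos_b, neg_b, after))
```

Because the kernel weights sum to 1 over a configuration and its three neighbors, the convolution preserves the total number of observations in each class; only their distribution over configurations changes, which is what allows the assigned class of configuration 010 to flip.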