Determining bronchial gene expression signature of Chronic Obstructive Pulmonary Disease
by Machine Learning Techniques
Thi K. Tran-Nguyen (1), Tongbin Zhang (2), Son Do Hai Dang (3) and Steven R. Duncan (1)
(1) Division of Pulmonary and Critical Care Medicine, Department of Medicine, The University of Alabama at Birmingham, Birmingham, AL, U.S.A.
(2) The 1st School of Medicine and School of Information and Engineering, Wenzhou Medical University, Zhejiang, China
(3) Master in Data Sciences Program, The University of Alabama at Birmingham, Birmingham, AL, U.S.A.
Introduction
Chronic Obstructive Pulmonary Disease (COPD) is the most
common group of respiratory disorders, characterized by
persistent and irreversible airflow obstruction.
Molecular phenotyping of COPD is challenging because access to
lung tissue from patients with severely impaired lung function
is limited. Bronchial brushing, by contrast, is a less invasive
method that lets clinicians and researchers sample airway
epithelial cells to better understand the cellular and molecular
changes in COPD lungs.
We therefore used a GEO dataset (GSE37147) that profiled
bronchial epithelial cells obtained by bronchoscopy from
smokers with and without COPD (smoker controls, SC). To analyze
this dataset, we applied two different machine learning (ML)
techniques to distinguish the COPD group from the SC group
using gene expression values as features.
Methods and Results
2. Random Forest to classify COPD from SC
Figure 2. Feature selection improved Random Forest predictive
performance. After running the predictor importance estimation,
we selected the 17 genes with the highest importance ranking for
the subsequent classification task. Before feature selection,
5-fold cross-validation gave a classification accuracy of 0.676
and an AUC of 0.724. After feature selection, the classification
accuracy improved to 0.744 and the AUC to 0.812.
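The evaluation steps named in Figure 2 (ranking genes by importance, keeping the top-ranked ones, and scoring a classifier by cross-validated accuracy and AUC) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: all function names are hypothetical, a nearest-centroid rule stands in for the Random Forest classifier so that the example needs only the standard library, and the toy data are invented for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple


def top_k_features(importance: Dict[str, float], k: int) -> List[str]:
    """Return the k feature names with the highest importance scores."""
    ranked = sorted(importance, key=lambda gene: importance[gene], reverse=True)
    return ranked[:k]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroids(X, y) -> Tuple[List[float], List[float]]:
    """Stand-in model (NOT Random Forest): per-class mean of each feature."""
    def mean_rows(rows):
        return [sum(col) / len(col) for col in zip(*rows)]
    class0 = [x for x, t in zip(X, y) if t == 0]
    class1 = [x for x, t in zip(X, y) if t == 1]
    return mean_rows(class0), mean_rows(class1)


def centroid_score(model, x) -> float:
    """Positive when x is closer to the class-1 centroid than to class 0."""
    c0, c1 = model
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return d0 - d1


def cross_validate(X, y, k: int = 5, seed: int = 0) -> Tuple[float, float]:
    """Pool out-of-fold scores over k folds; return (accuracy, AUC)."""
    out_of_fold = [0.0] * len(X)
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            out_of_fold[i] = centroid_score(model, X[i])
    predictions = [1 if s > 0 else 0 for s in out_of_fold]
    return accuracy(y, predictions), auc(y, out_of_fold)


# Toy data (invented): one informative feature and one constant feature.
X_toy = [[0.1 * i, 1.0] for i in range(10)] + [[10.0 + 0.1 * i, 1.0] for i in range(10)]
y_toy = [0] * 10 + [1] * 10  # 0 = SC, 1 = COPD
toy_accuracy, toy_auc = cross_validate(X_toy, y_toy, k=5)
```

With real data, `top_k_features` would be called on the importance scores of the fitted model (with k = 17, as in Figure 2), and `cross_validate` would be run once on all genes and once on the selected genes to compare the two settings.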
1. Conceptual and computing framework
[Figure 1 panel text] ‘Biomarker: ML perspective’
• To classify COPD from SC phenotypes: ‘Classification problem’. Random Forest: classification + feature selection.
• ‘Hub gene’ (influencing many other genes): ‘Regression problem’. Support Vector: regression.
• ‘Top-weighted gene as biomarker’.
• Control association profile − COPD association profile = which genes have the ‘most different association patterns’?
3. Support Vector Machine (SVM) to identify COPD hub genes
Figure 3. SVM results show the bronchial genes that are “hub genes”
in COPD compared to SC. First, SVM was used to compute the gene
expression correlation matrices separately for the COPD and SC
cohorts. For each gene, we then calculated the difference in its
correlation pattern between the COPD matrix and the SC matrix. These
values were used to identify the genes whose gene network topology
differs most between COPD and SC.
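The per-gene comparison described in Figure 3 can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: Pearson correlation is used as a stand-in for the SVM-derived association values, the per-gene score is assumed to be the sum of absolute differences between a gene's COPD and SC association profiles (the poster does not give the exact formula), and all function names and toy values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Expression = Dict[str, List[float]]  # gene name -> expression value per sample


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def association_profiles(expr: Expression) -> Dict[str, Dict[str, float]]:
    """For each gene, its correlation with every other gene in one cohort."""
    genes = list(expr)
    return {
        g: {h: pearson(expr[g], expr[h]) for h in genes if h != g}
        for g in genes
    }


def rank_by_profile_difference(copd: Expression,
                               control: Expression) -> List[Tuple[str, float]]:
    """Rank genes by how much their association profile differs between
    cohorts (assumed score: sum of absolute differences)."""
    prof_copd = association_profiles(copd)
    prof_ctrl = association_profiles(control)
    score = {
        g: sum(abs(prof_copd[g][h] - prof_ctrl[g][h]) for h in prof_copd[g])
        for g in prof_copd
    }
    return sorted(score.items(), key=lambda item: item[1], reverse=True)


# Toy data (invented): gene H tracks A, B and C in controls but moves
# against all three in COPD, so its association profile changes the most.
control_toy = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [1.0, 3.0, 5.0, 7.0],
    "H": [1.0, 2.0, 3.0, 4.0],
}
copd_toy = dict(control_toy, H=[4.0, 3.0, 2.0, 1.0])
ranking = rank_by_profile_difference(copd_toy, control_toy)
```

In this toy example the first entry of `ranking` is the gene whose correlations changed with every other gene, which is the sense in which the poster uses the term “hub gene”.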
Figure 1. We conceptualize the identification of COPD biomarkers as
two different problems that can be solved by ML computing frameworks.
Methods and Results (cont.)
4. Two different types of biomarkers highlight two different molecular processes in COPD vs SC
Figure 4. Distinct association patterns found in the COPD vs SC
bronchial gene expression correlation matrices. (Legend: positive
correlation, negative correlation.)
Conclusion
• Using two different ML techniques, we identified a bronchial gene expression
signature for COPD from data obtained by bronchoscopy.
• Many of the highest-ranked genes have been reported as COPD biomarkers
in studies assaying COPD lung tissue, suggesting that bronchial brushings
can serve as a reliable and robust surrogate tissue for assessing COPD
status without the need to sample lung tissue.
• Novel gene expression patterns in COPD airways suggest novel mechanisms
of airflow obstruction.
Acknowledgement
I would like to give special thanks to Thanh Nguyen, Ph.D. from the UAB
Informatics Institute for his ideas and support during the execution of this project.
