A Method to Facilitate Cancer Detection
and Type Classification from Gene
Expression Data using a Deep Autoencoder
and Neural Network
By Xi Chen
March 27, 2019
Gene Expression Data Properties
• Genes are expressed differently depending on factors such as cell type, environment, and disease condition.
• Gene expression data are widely available thanks to the increasing affordability of sequencing technology.
• Gene expression data are multimodal and high dimensional, with few observations relative to features (#rows << #columns).
• Gene expression data can be used for disease detection and classification, and for drug suggestion.
Gene Expression Data With Dimension Reduction
• Because gene expression data are high dimensional, dimension reduction methods such as PCA are used to obtain a smaller set of features.
• Traditional statistical and machine learning methods are then applied for tasks such as disease detection or classification.
• Problem: the resulting features are hard to explain. E.g., each PC is a linear combination of the gene expression features.
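To make the interpretability problem concrete, here is a minimal sketch showing that a principal component is a weighted sum of every input feature. The data are synthetic (four made-up "genes" driven by one hidden factor), and the power-iteration routine and its names are illustrative assumptions, not the method used in the talk.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (feature means, unit weight vector) of the first PC via power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Start vector must not be orthogonal to the dominant eigenvector.
    w = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return mean, w

def project(sample, mean, w):
    """PC score: a linear combination of ALL centered features."""
    return sum(wj * (xj - mj) for wj, xj, mj in zip(w, sample, mean))

# Synthetic data: each "gene" = baseline + loading * hidden factor.
baseline = [5.0, 3.0, 8.0, 1.0]
loading = [1.0, 2.0, -1.0, 0.5]
hidden = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
expression = [[b + a * t for b, a in zip(baseline, loading)] for t in hidden]

mean, weights = first_principal_component(expression)
score = project(expression[-1], mean, weights)
print("PC1 weights:", [round(v, 3) for v in weights])
print("PC1 score of last sample:", round(score, 3))
```

Every gene receives a nonzero weight, so the component cannot be attributed to any single gene. With thousands of genes the same holds, which is why each component is difficult to interpret biologically.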
Proposed Drug Suggestion Scheme
[Figure: 2D gene expression representation (axes: Feature 1, Feature 2), with regions labeled by drug sensitivity: Drug A, Drug B, Drug C, Drug D]
Cluster Approaches:
• K-means
• Gaussian Mixture Models
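As a rough illustration of the clustering idea, the sketch below runs plain K-means (Lloyd's algorithm) on a handful of made-up 2D points. The points, the starting centroids, and the cluster-to-drug mapping are all hypothetical; the talk does not specify an implementation.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm on 2D points; returns (centroids, labels)."""
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels

# Hypothetical 2D representations of six samples (Feature 1, Feature 2).
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2),
          (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
centroids, labels = kmeans(points, [(0.0, 0.0), (6.0, 6.0)])

# Purely illustrative mapping from cluster index to a suggested drug.
drug_for_cluster = {0: "Drug A", 1: "Drug B"}
suggestions = [drug_for_cluster[lab] for lab in labels]
print(labels, suggestions)
```

A Gaussian Mixture Model would replace the hard assignment step with per-cluster membership probabilities, which may suit samples that sit between drug-sensitivity regions.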
Problem: Current Gene Expression Data Don’t Include Drug Results
• Most gene expression data aren’t associated with well-documented medical records.
• Available records often lack drug information and patient disease outcomes.
Solving the Harder Classification Problem First, Then Inferring Whether a Cluster Approach Works
• In general, a classification problem is similar to a clustering problem; e.g., the k-Nearest Neighbors algorithm labels a sample by its proximity to other samples.
• If we can achieve highly accurate classification results from gene expression data, then clustering gene expression data may also be viable for drug suggestion.
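The link between classification and clustering can be seen in a minimal k-Nearest Neighbors sketch: the predicted label depends only on which labeled samples lie nearby in feature space. The 2D points and the "normal"/"tumor" labels below are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train_points)),
                     key=lambda i: math.dist(train_points[i], query))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical low-dimensional features for six labeled samples.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2),
                (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
train_labels = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

prediction_a = knn_predict(train_points, train_labels, (1.1, 0.9))
prediction_b = knn_predict(train_points, train_labels, (4.9, 5.1))
print(prediction_a, prediction_b)
```

If samples of the same class sit close together, as they do here, an unsupervised clustering method run on the same features would be expected to recover similar groups without the labels.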
Data Processing
[Figure: data processing; values shown: 60,483 and 14,157]
Computation Platform
Autoencoder For Feature Learning
Minimize f(Input − Output)
Training Autoencoder
Model Configuration (hidden layer sizes):
1st hidden layer: 100
2nd hidden layer: 50
3rd hidden layer: 25
4th hidden layer: 50
5th hidden layer: 100
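The sketch below wires up the 100-50-25-50-100 hidden layers and evaluates a reconstruction loss for one sample. Several details are assumptions not stated on the slide: f is taken to be mean squared error, hidden layers use tanh, the 25-unit bottleneck is treated as the learned feature vector, and the input width of 200 is a small stand-in for the real feature count. Training (adjusting the weights to minimize the loss) is omitted.

```python
import math
import random

HIDDEN_SIZES = [100, 50, 25, 50, 100]  # hidden layer widths from the slide

def init_layers(n_inputs, hidden_sizes, rng):
    """Random weights for input -> hidden layers -> output (same width as input)."""
    sizes = [n_inputs] + hidden_sizes + [n_inputs]
    layers = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(fan_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers

def forward(layers, x):
    """Return activations of every layer; tanh on hidden layers, linear output."""
    activations = [x]
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activations[-1])) + b
             for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activations.append(z if is_output else [math.tanh(v) for v in z])
    return activations

def reconstruction_loss(x, x_hat):
    """Assumed form of f(Input - Output): mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

rng = random.Random(0)
n_genes = 200  # hypothetical; far smaller than a real expression profile
layers = init_layers(n_genes, HIDDEN_SIZES, rng)
sample = [rng.random() for _ in range(n_genes)]

activations = forward(layers, sample)
code = activations[3]  # output of the 25-unit bottleneck layer
loss = reconstruction_loss(sample, activations[-1])
print(len(code), "learned features; untrained reconstruction loss:", round(loss, 4))
```

After training, the bottleneck activations would serve as the compact input to a downstream classifier, in place of the raw expression values.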
Learned Feature + Neural Network
Single Type Classification
Lung cancer: abundant and balanced data
Why Not PCA?
• PCA is a descriptive model.
• Each component is a linear combination of all the features.
• The resulting components are hard to explain.
Multiple Type Classification
Cancer Type Acronym | Full Name
LGG | Lower Grade Glioma
UVM | Uveal Melanoma
LUSC | Lung squamous cell carcinoma
GBM | Glioblastoma Multiforme
• Misclassifications are due to small sample size.
• Misclassifications are sparse, which suggests clustering potential.
Conclusion
• An autoencoder automatically generates feature representations, thus addressing the very high dimensionality of gene expression data.
• The extracted feature vector captures the non-linearity of the data.
• This approach scales to new data after training, and it can generalize to multi-class classification of different types of cancer.
• We have demonstrated the high accuracy and low FNR/FPR of this method for the majority of the abundant cancer types, and its potential for handling sub-classification within certain cancers and identifying metastatic cancers.
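For reference, accuracy, false negative rate (FNR), and false positive rate (FPR) can be computed from predictions as sketched below. The label vectors are invented to exercise the formulas and are not results from the talk.

```python
def confusion_rates(y_true, y_pred, positive=1):
    """Accuracy, FNR = FN / (FN + TP), FPR = FP / (FP + TN).

    Assumes both classes occur in y_true, so neither denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "fnr": fn / (fn + tp),
        "fpr": fp / (fp + tn),
    }

# Made-up labels: 1 = cancer, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
rates = confusion_rates(y_true, y_pred)
print(rates)
```

In a detection setting, FNR counts missed cancers and FPR counts false alarms, so reporting both alongside accuracy matters when classes are imbalanced.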
Other Projects: Deep Learning Behind The Scenes
• Almost all machine learning applications use similar approaches: Feature Engineering + Deep Learning.
• E.g., self-driving cars = CNN + DNN
• Feature engineering → CNN
• Deep learning training → DNN
• Deployment
Thank you so much!
Questions?

 
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
 
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's TipsGetting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
 

A Method to facilitate cancer detection and type classification from gene expression using a deep auto-encoder and neural network

  • 1. A Method to Facilitate Cancer Detection and Type Classification from Gene Expression Data using a Deep Autoencoder and Neural Network By Xi Chen March 27, 2019
  • 2. Gene Expression Data Properties • Genes are expressed differently depending on factors such as cell type, environment, and disease conditions. • Gene expression data are widely available because sequencing technology has become more affordable. • Gene expression data are multimodal and high-dimensional, with few observations relative to features (#rows << #columns). • Gene expression data can be used for disease detection and classification, and for drug suggestion.
  • 3. Gene Expression Data With Dimension Reduction • Because gene expression data are high-dimensional, dimension reduction methods such as PCA are used for feature selection. • Traditional statistical and machine learning methods are then applied to tasks such as disease detection or classification. • Problem: the selected features are hard to explain. For example, each principal component (PC) is a linear combination of the gene expression features.
  • 4. Proposed Drug Suggestion Scheme [Figure: 2D gene expression representation (axes: Feature 1, Feature 2) with drug sensitivity groups Drug A, Drug B, Drug C, Drug D] Clustering approaches: • K-means • Gaussian Mixture Models
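The clustering step named on this slide could be sketched roughly as follows. This is a minimal pure-Python k-means under stated assumptions: the 2D sample points, the helper names `sq_dist`, `farthest_point_init`, and `kmeans`, and the deterministic farthest-point seeding are all hypothetical choices made for a reproducible illustration. None of them come from the deck.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption): start at the first point, then
    keep adding the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on a list of (x, y) tuples; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return centroids, labels


# Hypothetical 2D representations: four well-separated groups of three samples.
samples = [
    (0.1, 0.2), (0.3, 0.1), (0.2, 0.4),
    (5.1, 0.2), (4.9, 0.3), (5.2, 0.1),
    (0.2, 5.0), (0.1, 5.3), (0.4, 4.9),
    (5.0, 5.1), (5.3, 4.8), (4.8, 5.2),
]
centroids, labels = kmeans(samples, k=4)
print(labels)
```

In the scheme on the slide, each resulting cluster would correspond to one drug sensitivity group. A Gaussian Mixture Model would replace the hard assignment step with soft, probability-weighted memberships.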
  • 5. Problem: Current Gene Expression Data Don’t Include Drug Results • Most gene expression data aren’t associated with well-documented medical records. • Available records often lack drug information and patient disease outcomes.
  • 6. Solving The Harder Classification Problem First, Then We Could Infer The Cluster Approach Works • In general, a classification problem is similar to a clustering problem; e.g., the k-Nearest Neighbors algorithm labels a sample by its proximity to labeled samples. • If gene expression data yield highly accurate classification results, clustering gene expression data may be a viable basis for drug suggestion.
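To make the kNN comparison concrete, here is a small sketch of k-Nearest Neighbors classification. It assigns a label by majority vote among the closest labeled samples, which is the same proximity idea that clustering relies on. The function name `knn_predict`, the 2D points, and the "normal"/"tumor" labels are hypothetical and are not taken from the deck.

```python
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        range(len(train_points)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_points[i], query)),
    )
    votes = Counter(train_labels[i] for i in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical learned features (2D) with known labels.
train_points = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4),
                (5.0, 5.1), (5.3, 4.8), (4.8, 5.2)]
train_labels = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

print(knn_predict(train_points, train_labels, (4.9, 5.0)))
print(knn_predict(train_points, train_labels, (0.2, 0.2)))
```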
  • 9. Autoencoder For Feature Learning [Figure: autoencoder diagram with layer sizes 100, 50, 25, 50, 100] • Training the autoencoder: minimize f(Input − Output). • Model configuration: five hidden layers (1st through 5th).
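A rough NumPy sketch of such an autoencoder is given here for illustration only; it is not the author's implementation. Several assumptions are made: the numbers 100, 50, 25, 50, 100 are read as the widths of the five hidden layers, f is taken to be the mean squared error, and the tanh activations, plain gradient descent, learning rate, input width of 200, and synthetic data are all hypothetical.

```python
import numpy as np

HIDDEN_SIZES = [100, 50, 25, 50, 100]  # assumed mapping of the slide's numbers to layers


def init_layers(sizes, rng):
    """One (weights, bias) pair per layer, with fan-in scaled Gaussian weights."""
    return [(rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(x, layers):
    """Return activations of every layer: tanh on hidden layers, linear output."""
    acts = [x]
    for i, (w, b) in enumerate(layers):
        z = acts[-1] @ w + b
        acts.append(z if i == len(layers) - 1 else np.tanh(z))
    return acts


def train_step(x, layers, lr):
    """One full-batch gradient descent step on the mean squared reconstruction error.
    Returns the loss measured before the update."""
    acts = forward(x, layers)
    err = acts[-1] - x
    loss = float(np.mean(err ** 2))
    grad = 2.0 * err / err.size  # dLoss/dOutput
    for i in reversed(range(len(layers))):
        w, b = layers[i]
        if i != len(layers) - 1:
            grad = grad * (1.0 - acts[i + 1] ** 2)  # derivative of tanh
        grad_w = acts[i].T @ grad
        grad_b = grad.sum(axis=0)
        grad = grad @ w.T  # propagate to the previous layer before updating w
        layers[i] = (w - lr * grad_w, b - lr * grad_b)
    return loss


rng = np.random.default_rng(0)
n_samples, n_genes = 64, 200  # hypothetical sizes, far smaller than real expression data
latent = rng.normal(size=(n_samples, 5))
x = np.tanh(latent @ rng.normal(size=(5, n_genes)))  # synthetic stand-in for expression values
x = (x - x.mean(axis=0)) / x.std(axis=0)             # standardize each feature

layers = init_layers([n_genes] + HIDDEN_SIZES + [n_genes], rng)
losses = [train_step(x, layers, lr=0.01) for _ in range(300)]
features = forward(x, layers)[3]  # output of the 25-unit middle layer
print(features.shape, losses[0], losses[-1])
```

The 25-unit middle layer plays the role of the learned feature vector: after training, only the encoder half is needed to map a new sample to its features.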
  • 10. Learned Feature + Neural Network
  • 11. Single Type Classification • Lung cancer: abundant and balanced data
  • 12. Why Not PCA? • PCA is a descriptive model. • Each component is a linear combination of all the features. • The components are therefore hard to explain.
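The point about linear combinations can be checked numerically. The sketch below computes the first principal component of a small random matrix via SVD and shows that its loading vector has a nonzero weight on every feature, so no single gene can be read off a component. The data and variable names are hypothetical; only `numpy.linalg.svd` is relied on.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))   # hypothetical data: 20 samples, 6 "genes"
centered = X - X.mean(axis=0)  # PCA operates on mean-centered data

# Rows of vt are the principal directions (loadings), ordered by variance explained.
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
pc1_loadings = vt[0]

# Each sample's PC1 score is a weighted sum over ALL features.
pc1_scores = centered @ pc1_loadings

print(pc1_loadings)
print(int(np.count_nonzero(pc1_loadings)), "of", X.shape[1], "features contribute to PC1")
```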
  • 13. Multiple Type Classification • Cancer type acronyms and full names: LGG = Lower Grade Glioma; UVM = Uveal Melanoma; LUSC = Lung Squamous Cell Carcinoma; GBM = Glioblastoma Multiforme. • Misclassifications are due to small sample size. • Misclassifications are sparse, which suggests clustering potential.
  • 14. Conclusion • An autoencoder automatically generates feature representations, addressing the very high dimensionality of gene expression data. • The extracted feature vector captures the non-linearity of the data. • After training, the approach scales to new data, and it generalizes to multi-class classification of different cancer types. • We have demonstrated high accuracy and low false negative and false positive rates (FNR/FPR) for the majority of the abundant cancer types, as well as the method's potential for handling sub-classification within certain cancers and identifying metastatic cancers.
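For reference, FNR and FPR for one cancer type can be computed one-vs-rest from true and predicted labels, as in the sketch below. The function name `fnr_fpr` and the label lists are hypothetical; they are not results from the deck.

```python
def fnr_fpr(y_true, y_pred, positive):
    """One-vs-rest false negative rate and false positive rate for `positive`."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1  # a positive case that was missed
        else:
            if pred == positive:
                fp += 1  # a negative case flagged as positive
            else:
                tn += 1
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fnr, fpr


# Hypothetical labels for illustration only.
y_true = ["LUSC", "LUSC", "LUSC", "LUSC", "GBM", "GBM", "LGG", "LGG"]
y_pred = ["LUSC", "LUSC", "LUSC", "GBM", "GBM", "GBM", "LGG", "LUSC"]

print(fnr_fpr(y_true, y_pred, positive="LUSC"))  # (0.25, 0.25)
```

Here one of four true LUSC samples is missed (FNR = 1/4), and one of four non-LUSC samples is wrongly called LUSC (FPR = 1/4).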
  • 15. Other Projects: Deep Learning Behind The Scenes • Almost all machine learning applications use a similar approach: feature engineering + deep learning. • E.g., self-driving cars = CNN + DNN • Feature engineering → CNN • Deep learning training → DNN • Deployment