Liangqun Lu
2018-04-25
Outline
● Background on Data Integration
○ Biological regulation
○ Omic data integration objectives
○ Data Integration Challenges
● Unsupervised methods and Application
○ Matrix factorization methods (iCluster+)
○ Bayesian methods (BCC)
○ Network-based methods (SNF)
○ Multiple Kernel Learning and Multi-Step Analysis (rMKL-LPP)
Biological regulation
● Central dogma
Gene Regulatory Network
Regulatory elements
● Receptors
● Transcription factors
● Inhibitory factors
● Cis- and trans-acting elements
Source: https://en.wikipedia.org/wiki/Gene_regulatory_network
Rich data
Single omic study
● One-dimensional (single-omic) data are
used to explain the diagnosis and
progression of complex disorders
● Information is limited
● Different layers of biological
system are relevant and
dependent
Omic data integration objectives
● Promoting precision medicine from big data
● Multiview investigation on the
completeness and complexity of the
biological system
● Discover hidden biological regularities
● Make use of complementary information
and discover biomarkers for diagnosis,
progression and treatment in human
diseases
Data Integration Challenges (Computational Perspective)
● Data integration is broad
● Data heterogeneity
● Data unification
● Data noise and bias
● Data integration and dimensionality reduction
Unsupervised classification
● Matrix factorization methods (iCluster and iCluster+)
○ Assumption: common latent variable in different data
● Bayesian methods (Bayesian consensus clustering)
○ Assumption: specified data distributions and correlation between data sources
● Network-based methods (SNF)
○ Assumption: sample relationships can be strengthened by
complementary multi-omic data
● Multiple Kernel Learning and Multi-Step Analysis (rMKL-LPP)
○ Assumption: pattern in a lower dimensional and integrative
subspace
Data Integration for subtype discovery
● Data Source
○ Gene expression; DNA methylation; gene mutation
● Procedures
○ Data fusion -- Clustering -- Evaluation
● Biological interpretation
○ Molecular alterations
○ Survival outcome
○ Response to therapies
iCluster and iCluster+
Procedure
● Data Fusion and K-means model selection
○ EM algorithm to obtain maximum
likelihood estimates
■ E-step provides a simultaneous
dimension reduction
■ M-step is to update the parameter
estimates
● Evaluation
○ Proportion of deviance -- POD (d/n^2)
○ Smaller POD indicates stronger cluster separability
○ Determine cluster number and lasso
parameter λ
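The POD criterion above can be sketched as follows. This is a minimal illustration, not the iCluster implementation: it assumes the estimated latent variables are given as an n x (K-1) array `z` (one row per sample), that the comparison matrix is built from unit-normalized rows with absolute values taken, and that `labels` holds the cluster assignment. The function name and the normalization are assumptions made for the example.

```python
import numpy as np

def proportion_of_deviance(z, labels):
    """POD = d / n^2, where d is the summed absolute deviation of the
    sample-by-sample product matrix from a perfect block-diagonal matrix."""
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    # Assumption: scale each sample's latent vector to unit length
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    b = z / norms
    # n x n matrix with entries in [0, 1]
    a = np.abs(b @ b.T)
    # Ideal structure: 1 for pairs in the same cluster, 0 otherwise
    ideal = (labels[:, None] == labels[None, :]).astype(float)
    d = np.abs(a - ideal).sum()
    n = len(labels)
    return d / n ** 2
```

A POD near 0 means the latent variables reproduce the assumed block structure almost exactly; candidate values of the cluster number and λ can be compared by this score.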
Application on breast cancer
Summaries
● The joint latent variable model is completely scalable to include additional
data types
● iCluster has been applied to discover subtypes in breast cancer and
glioblastoma multiforme (GBM)
● iCluster+ makes different modeling assumptions on data types: binary,
continuous, categorical, and sequential data
Similarity Network Fusion (SNF)
SNF data fusion
1. Calculate sample similarity W in each omic dataset
using (1)
2. Calculate normalized weight matrix P from W using (2)
3. Use K nearest neighbors (KNN) to calculate local
affinity matrix S through the formulas (3) from W. P
carries the full information about the similarity of each
patient to all others whereas S only encodes the
similarity to the K most similar patients for each
patient.
4. Network fusion process: for 2 datasets, P1, S1 and P2,
S2 can be calculated, then iteratively update P1 and P2
for t steps using (4) and (5); for more than 2 datasets,
update the Ps using (5)
5. Obtain the overall fused matrix P by averaging the
updated single Ps
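The five steps above can be sketched in NumPy. Formulas (1) to (5) are not reproduced in this transcript, so the expressions below follow the published SNF description (reference 2) and should be checked against it. Two simplifications are assumptions of this sketch: the similarity uses a single global bandwidth instead of SNF's local scaling, and P is not re-normalized after each iteration. All function names are hypothetical.

```python
import numpy as np

def affinity(x, mu=0.5):
    """Step 1 (simplified): Gaussian similarity W from squared Euclidean distances."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    sigma = mu * d2[d2 > 0].mean()  # assumption: one global bandwidth
    return np.exp(-d2 / sigma)

def full_kernel(w):
    """Step 2: P(i,j) = W(i,j) / (2 * sum_{k != i} W(i,k)) for j != i, and 1/2 on the diagonal."""
    off = w.copy()
    np.fill_diagonal(off, 0.0)
    p = off / (2.0 * off.sum(axis=1, keepdims=True))
    np.fill_diagonal(p, 0.5)
    return p

def local_kernel(w, k):
    """Step 3: keep the k most similar samples per row (the sample itself
    counts as one of them) and normalize each row to sum to 1."""
    s = np.zeros_like(w)
    idx = np.argsort(-w, axis=1)[:, :k]
    rows = np.arange(w.shape[0])[:, None]
    s[rows, idx] = w[rows, idx]
    return s / s.sum(axis=1, keepdims=True)

def snf(views, k=3, t=10):
    """Steps 4 and 5: iterate P_v <- S_v * mean(P_u, u != v) * S_v^T, then average."""
    ws = [affinity(np.asarray(x, dtype=float)) for x in views]
    ps = [full_kernel(w) for w in ws]
    ss = [local_kernel(w, k) for w in ws]
    m = len(views)
    for _ in range(t):
        new_ps = []
        for v in range(m):
            others = sum(ps[u] for u in range(m) if u != v) / (m - 1)
            new_ps.append(ss[v] @ others @ ss[v].T)
        ps = new_ps
    fused = sum(ps) / m
    return (fused + fused.T) / 2  # symmetrize the fused matrix
```

With two datasets the update reduces to P1 <- S1 P2 S1^T and P2 <- S2 P1 S2^T, which matches the two-dataset case in step 4.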
Spectral Clustering
Input: X (n x n sample similarity matrix) and the number of clusters k
Goal: find subgroups in a graph (ideally disjoint cliques)
Procedures:
1. Compute the normalized Laplacian L
2. Compute the first k eigenvectors u1, ..., uk of L
and their eigenvalues
3. Let U be the matrix containing eigenvectors u as
columns
4. Form the matrix T from U by normalizing the rows
to norm 1
5. Cluster the points with k-means into clusters C1, ...,
Ck
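The procedure above can be sketched with NumPy. This is a minimal illustration under stated assumptions: the normalized Laplacian is taken as L = I - D^{-1/2} X D^{-1/2}, "first" eigenvectors means those with the k smallest eigenvalues, and the small k-means helper with farthest-point initialization is a stand-in chosen to keep the example deterministic and self-contained.

```python
import numpy as np

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def spectral_clustering(x, k):
    x = np.asarray(x, dtype=float)
    deg = x.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Step 1: normalized Laplacian L = I - D^{-1/2} X D^{-1/2}
    lap = np.eye(len(x)) - d_inv_sqrt[:, None] * x * d_inv_sqrt[None, :]
    # Steps 2-3: eigh returns eigenvalues in ascending order; keep the first k eigenvectors
    _, vecs = np.linalg.eigh(lap)
    u = vecs[:, :k]
    # Step 4: normalize each row of U to norm 1
    t = u / np.linalg.norm(u, axis=1, keepdims=True)
    # Step 5: cluster the rows of T with k-means
    return kmeans(t, k)
```

In the SNF workflow, the input X would be the fused similarity matrix.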
Application: GBM subtype discovery
Evaluations:
1. P value in Cox log-rank test
2. Silhouette score
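As an illustration of the first evaluation, a two-group log-rank test can be sketched in plain Python. This is a simplified sketch, not the analysis used in the study: it handles only two groups (subtype comparisons with more clusters need the multi-group form), and the function and argument names are assumptions. The P value uses the chi-square distribution with 1 degree of freedom, whose upper tail equals erfc(sqrt(x/2)).

```python
import math

def logrank_two_groups(times, events, groups):
    """times: follow-up times; events: 1 = event observed, 0 = censored;
    groups: 0 or 1 for each patient. Returns (chi-square statistic, P value)."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    obs_minus_exp = 0.0
    var = 0.0
    for t in event_times:
        # Patients still at risk at time t, overall and in group 1
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        # Events at time t, overall and in group 1
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = obs_minus_exp ** 2 / var
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

A small P value indicates that the survival curves of the two groups differ more than expected by chance.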
Summaries
● SNF can construct a sample-by-sample network by integrating multiple datasets
● SNF can be extended to include more datasets and applied to more
questions
Bayesian Consensus Clustering
● An integrative statistical model that permits a separate clustering of the
objects for each data source.
● These separate clusterings adhere loosely to an overall consensus clustering
● BCC simultaneously estimates both the consensus clustering and the
source-specific clusterings
Procedures
● Dirichlet mixture model to accommodate multiple data (X)
● Probability of belonging to one cluster
● Estimation
○ Gibbs sampling procedure to approximate the posterior distribution
○ Markov chain Monte Carlo (MCMC) proceeds by iteratively sampling
● Choose K based on highest mean adjusted adherence
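The link between a source-specific clustering and the consensus clustering can be sketched as follows. This follows the dependence model as described in the BCC paper (reference 7), as recalled here: a sample keeps its consensus label in source m with probability α_m and otherwise takes one of the other K-1 labels uniformly. The adjusted adherence rescales α from [1/K, 1] to [0, 1]. Function names are hypothetical, and the formulas should be checked against the paper.

```python
def source_label_prob(label, consensus, alpha, k):
    """P(source-specific label | consensus label) for adherence alpha and k clusters."""
    if label == consensus:
        return alpha
    return (1.0 - alpha) / (k - 1)

def adjusted_adherence(alpha, k):
    """Rescale alpha so that 0 means no adherence (alpha = 1/k) and 1 means full adherence."""
    return (k * alpha - 1.0) / (k - 1.0)

def mean_adjusted_adherence(alphas, k):
    """Average adjusted adherence over data sources, the quantity compared across K."""
    return sum(adjusted_adherence(a, k) for a in alphas) / len(alphas)
```

In a full analysis the α values come from the Gibbs sampler; here they are plain inputs.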
Application on breast cancer
● RNA gene expression (GE) data
for 645 genes.
● DNA methylation (ME) data for
574 probes.
● miRNA expression (miRNA) data
for 423 miRNAs.
● Reverse phase protein array
(RPPA) data for 171 proteins.
Summaries
1. BCC model assumes a simple and general dependence between data
sources.
2. BCC models both an overall clustering and a clustering specific to each data
source, with advantages over traditional methods in terms of modeling
uncertainty and the ability to borrow information across sources.
3. BCC is suitable for multisource biomedical data and may also be used
to compare clusterings from different statistical models for a single
homogeneous dataset.
Regularized Multiple Kernel Learning Locality
Preserving Projections (rMKL-LPP)
● rMKL-LPP extends the existing multiple kernel learning for dimensionality
reduction (MKL-DR) method, in which the data are projected into a
lower-dimensional, integrative subspace.
● A regularization term is added to avoid overfitting during the optimization
procedure, and it allows using several different kernel types.
● The Locality Preserving Projections (LPP) is applied to conserve the
sum of distances for each sample’s k-Nearest Neighbors.
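The idea of using several kernels can be sketched as a weighted sum of kernel matrices. This is only an illustration of the ensemble kernel: the weights `beta` are plain inputs here, whereas rMKL-LPP learns them jointly with the projection, and the normalization of the weights to sum to 1 is an assumption of this sketch. The Gaussian kernel and the function names are also assumptions for the example.

```python
import numpy as np

def gaussian_kernel(x, gamma):
    """Gaussian (RBF) kernel matrix for the rows of x."""
    x = np.asarray(x, dtype=float)
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def ensemble_kernel(kernels, beta):
    """Weighted sum of kernel matrices with non-negative weights scaled to sum to 1."""
    beta = np.asarray(beta, dtype=float)
    if np.any(beta < 0):
        raise ValueError("kernel weights must be non-negative")
    beta = beta / beta.sum()
    return sum(b * k for b, k in zip(beta, kernels))
```

Several kernels per data type (for example, different `gamma` values) can be passed in together, so no single kernel has to be chosen in advance.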
Procedures
● Data fusion
○ rMKL-LPP
○ Optimization
○ integrated kernel matrix
● Clustering
○ K-means
○ Mean silhouette width used to optimize number of clusters
● Evaluation
○ Silhouette score and cross validation (Rand index)
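The mean silhouette width used to choose the number of clusters can be sketched in plain Python. This is a minimal sketch assuming a precomputed distance matrix `dist` (list of lists) and a list of cluster labels with at least two clusters; singleton clusters contribute a silhouette of 0, which is a common convention and an assumption here.

```python
def mean_silhouette(dist, labels):
    """Average of s(i) = (b - a) / max(a, b) over all samples, where a is the
    mean distance to the own cluster and b the smallest mean distance to another."""
    n = len(labels)
    clusters = sorted(set(labels))
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / n
```

Running k-means for several values of k and keeping the k with the highest mean silhouette width corresponds to the selection step listed above.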
Applications in 5 cancers
1. Comparison to state-of-the-art (SNF)
2. Robustness analysis
3. Comparison of clusterings to
established subtypes
4. Clinical implications from clusterings
5 cancers
1. glioblastoma multiforme (GBM) --
213 samples
2. breast invasive carcinoma (BIC) --
105 samples
3. kidney renal clear cell carcinoma
(KRCCC) -- 122 samples
4. lung squamous cell carcinoma
(LSCC) -- 106 samples
5. colon adenocarcinoma (COAD) -- 92
samples
Datasets: gene expression, DNA methylation
and miRNA expression data
1. Comparison to state-of-the-art
2. Robustness analysis
Fig. 2. Robustness of clustering for leave-one-out
datasets measured using Rand index.
Fig. 3. Robustness of clustering for leave-
one-out cross-validation applied to
reduced sized datasets measured using
Rand index.
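The Rand index used in these robustness figures compares two clusterings of the same samples by counting sample pairs on which they agree. A minimal sketch in plain Python (the function name is an assumption):

```python
from itertools import combinations

def rand_index(a, b):
    """Fraction of sample pairs that both clusterings treat the same way:
    either together in both or separated in both."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)
```

In a leave-one-out setting, the clustering of the reduced dataset would be compared with the full clustering restricted to the remaining samples; a value of 1 means the two partitions are identical up to label names.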
3. Comparison of clusterings to established subtypes
4. Clinical implications from clusterings
GBM:
● 94 of 213 were
treated with
Temozolomide
Explain better survival
Summaries
1. rMKL-LPP found subtypes with more significant log-rank test results than the
state-of-the-art method
2. Using several kernel matrices per data type can improve performance,
removes the burden of selecting the optimal kernel matrix and gives fair
stability
3. Compared to unregularized MKL-DR, rMKL-LPP remains stable even for small
datasets
4. The GBM application shows that rMKL-LPP can capture diverse information
from the different data types within one clustering
References
1. Huang, S., Chaudhary, K. & Garmire, L. X. More Is Better: Recent Progress in Multi-Omics Data
Integration Methods. Front. Genet. 8, 84 (2017).
2. Wang, B. et al. Similarity network fusion for aggregating data types on a genomic scale. Nat.
Methods 11, 333–337 (2014).
3. Shen, R., Olshen, A. B. & Ladanyi, M. Integrative clustering of multiple genomic data types using a
joint latent variable model with application to breast and lung cancer subtype analysis.
Bioinformatics 25, 2906–2912 (2009).
4. Shen, R. et al. Integrative subtype discovery in glioblastoma using iCluster. PLoS One 7, e35236
(2012).
5. Mo, Q. et al. Pattern discovery and cancer gene identification in integrated cancer genomic data.
Proc. Natl. Acad. Sci. U. S. A. 110, 4245–4250 (2013).
6. Speicher, N. K. & Pfeifer, N. Integrating different data types by regularized unsupervised multiple
kernel learning with application to cancer subtype discovery. Bioinformatics 31, i268–75 (2015).
7. Lock, E. F. & Dunson, D. B. Bayesian consensus clustering. Bioinformatics 29, 2610–2616 (2013).
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
 
The Evolution of Science Education PraxiLabs’ Vision- Presentation (2).pdf
The Evolution of Science Education PraxiLabs’ Vision- Presentation (2).pdfThe Evolution of Science Education PraxiLabs’ Vision- Presentation (2).pdf
The Evolution of Science Education PraxiLabs’ Vision- Presentation (2).pdf
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 

Data integration lab_meeting

  • 2. Outline ● Background on Data Integration ○ Biological regulation ○ Omic data integration objectives ○ Data Integration Challenges ● Unsupervised methods and Application ○ Matrix factorization methods (iCluster+) ○ Bayesian methods (BCC) ○ Network-based methods (SNF) ○ Multiple Kernel Learning and Multi-Step Analysis (rMKL-LPP)
  • 4. Gene Regulatory Network Regulatory elements ● Receptors ● Transcription factors ● Inhibitory factors ● Cis-trans elements Source: https://en.wikipedia.org/wiki/Gene_regulatory_network
  • 6. Single omic study ● One-dimensional data explains aspects of the diagnosis and progression of complex disorders ● Information is limited ● Different layers of the biological system are relevant and interdependent
  • 7. Omic data integration objectives ● Promote precision medicine from big data ● Multiview investigation of the completeness and complexity of the biological system ● Discover hidden biological regularities ● Make use of complementary information and discover biomarkers for diagnosis, progression and treatment in human diseases
  • 8. Data Integration Challenges (computational perspective) ● Data integration is broad ● Data heterogeneity ● Data unification ● Data noise and bias ● Data integration and dimensionality reduction
  • 10. Unsupervised classification ● Matrix factorization methods (iCluster and iCluster+) ○ Assumption: a common latent variable is shared across the different data types ● Bayesian methods (Bayesian consensus clustering) ○ Assumption: assumptions on data distribution and data correlation ● Network-based methods (SNF) ○ Assumption: sample relationships can be strengthened using complementary multi-omic data ● Multiple Kernel Learning and Multi-Step Analysis (rMKL-LPP) ○ Assumption: patterns lie in a lower-dimensional, integrative subspace
  • 11. Data Integration for subtype discovery ● Data Source ○ Gene expression; DNA methylation; gene mutation ● Procedures ○ Data fusion -- Clustering -- Evaluation ● Biological interpretation ○ Molecular alterations ○ Survival outcome ○ Response to therapies
  • 14. Procedure ● Data fusion and K-means model selection ○ EM algorithm to obtain maximum likelihood estimates ■ The E-step provides a simultaneous dimension reduction ■ The M-step updates the parameter estimates ● Evaluation ○ Proportion of deviance -- POD (d/n^2) ○ A smaller POD indicates stronger cluster separability ○ Used to determine the cluster number and the lasso parameter λ
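The POD measure above can be sketched in a few lines. This is a minimal illustration, not the iCluster implementation: the slide only gives POD = d/n^2, so the function name `proportion_of_deviance`, the unit-length scaling of the latent vectors and the clipping of negative similarities are all assumptions made here. d is taken as the summed absolute difference between the sample-by-sample product matrix and an ideal block structure (1 within a cluster, 0 between clusters).

```python
import numpy as np

def proportion_of_deviance(Z, labels):
    """Hypothetical POD sketch.

    Z      : (n_samples, n_latent) matrix of estimated latent variables
    labels : cluster assignment for each sample
    """
    Z = np.asarray(Z, dtype=float)
    labels = np.asarray(labels)
    # Assumption: scale each sample's latent vector to unit length,
    # so that the product matrix has ones on its diagonal.
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    Zn = Z / np.where(norms == 0, 1.0, norms)
    # Assumption: negative similarities are set to zero.
    B = np.clip(Zn @ Zn.T, 0.0, None)
    # Ideal structure: 1 for pairs in the same cluster, 0 otherwise.
    ideal = (labels[:, None] == labels[None, :]).astype(float)
    d = np.abs(B - ideal).sum()
    return d / len(labels) ** 2
```

In practice one would compute this for each candidate (K, λ) pair and prefer the combination with the smallest value.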
  • 16. Summaries ● The joint latent variable model is completely scalable to include additional data types ● iCluster has been applied to discover subtypes in breast cancer and glioblastoma multiforme (GBM) ● iCluster+ makes different modeling assumptions for different data types: binary, continuous, categorical, and count data
  • 18. SNF data fusion 1. Calculate the sample similarity matrix W in each omic dataset using (1). 2. Calculate the normalized weight matrix P from W using (2). 3. Use K nearest neighbors (KNN) to calculate the local affinity matrix S from W using (3). P carries the full information about the similarity of each patient to all others, whereas S only encodes the similarity to the K most similar patients for each patient. 4. Network fusion process: for 2 datasets, calculate P1, S1 and P2, S2, then iteratively update P1 and P2 for t steps using (4) and (5); for more than 2 datasets, update the Ps using (5). 5. Obtain the overall fused matrix P by averaging the updated Ps.
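The five SNF steps can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the SNFtool code: the function names, the Euclidean distance, the default values of `k`, `mu` and `t`, and the omission of any per-iteration renormalization are choices made here. It needs at least two views.

```python
import numpy as np

def affinity(X, k=3, mu=0.5):
    """Step 1: scaled exponential similarity W for one data view."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # Mean distance of each sample to its k nearest neighbours (self excluded).
    knn_mean = np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)
    eps = (knn_mean[:, None] + knn_mean[None, :] + D) / 3.0
    return np.exp(-D ** 2 / (mu * eps + 1e-12))

def full_kernel(W):
    """Step 2: normalized weight matrix P (diagonal 1/2, rows sum to 1)."""
    off = W.copy()
    np.fill_diagonal(off, 0.0)
    P = off / (2.0 * off.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P

def local_kernel(W, k=3):
    """Step 3: local affinity S, keeping only the k most similar samples."""
    off = W.copy()
    np.fill_diagonal(off, 0.0)
    idx = np.argsort(-off, axis=1)[:, :k]
    rows = np.arange(W.shape[0])[:, None]
    S = np.zeros_like(off)
    S[rows, idx] = off[rows, idx]
    return S / S.sum(axis=1, keepdims=True)

def snf(views, k=3, mu=0.5, t=20):
    """Steps 4-5: cross-diffuse the P matrices, then average them."""
    Ws = [affinity(X, k, mu) for X in views]
    Ps = [full_kernel(W) for W in Ws]
    Ss = [local_kernel(W, k) for W in Ws]
    m = len(Ps)
    for _ in range(t):
        updated = []
        for v in range(m):
            # Average of the other views' P matrices.
            others = sum(Ps[u] for u in range(m) if u != v) / (m - 1)
            updated.append(Ss[v] @ others @ Ss[v].T)
        Ps = updated
    fused = sum(Ps) / m
    return (fused + fused.T) / 2.0  # symmetrize the fused network
```

The fused matrix can then be passed to a graph clustering method such as spectral clustering.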
  • 19. Spectral Clustering Input: X (n x n sample similarity matrix) and the number of clusters k. Goal: find subgroups in the graph, ideally disjoint cliques. Procedure: 1. Compute the normalized Laplacian L 2. Compute the first k eigenvectors u1, ..., uk of L and their eigenvalues 3. Let U be the matrix containing the eigenvectors u1, ..., uk as columns 4. Form the matrix T from U by normalizing the rows to norm 1 5. Cluster the rows of T with k-means into clusters C1, ..., Ck
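The procedure above can be sketched with NumPy alone. The symmetric normalized Laplacian I - D^(-1/2) W D^(-1/2) and the small k-means with farthest-point initialization are assumptions made to keep the example self-contained; the function names are hypothetical, and a library k-means would normally be used instead.

```python
import numpy as np

def spectral_embedding(W, k):
    """Steps 1-4: row-normalized matrix T of the first k eigenvectors."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian (assumes every sample has degree > 0).
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    U = vecs[:, :k]                   # eigenvectors of the k smallest eigenvalues
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def assign(X, centers):
    """Index of the nearest center for every row of X."""
    return np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)

def kmeans(X, k, n_iter=50):
    """Step 5: plain Lloyd iterations with deterministic farthest-point init."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return assign(X, centers)

def spectral_clustering(W, k):
    return kmeans(spectral_embedding(W, k), k)
```

For a similarity matrix with two dense blocks and weak links between them, the two blocks come out as the two clusters.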
  • 20. Application: GBM subtype discovery Evaluations: 1. P value in Cox log-rank test 2. Silhouette score
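As a reminder of what the silhouette score measures, here is a minimal sketch using the standard distance-based definition s(i) = (b - a) / max(a, b), where a is the mean distance to the sample's own cluster and b the smallest mean distance to another cluster. The function name and the use of a precomputed distance matrix are assumptions; an evaluation on a fused similarity network would need a similarity-based variant. At least two clusters are required.

```python
import numpy as np

def silhouette_scores(D, labels):
    """Per-sample silhouette from an (n x n) distance matrix D."""
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False               # exclude the sample itself
        if not same.any():
            continue                  # singleton cluster: score stays 0
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores
```

The mean of these scores is close to 1 for compact, well-separated clusters and drops below 0 when samples sit closer to another cluster than to their own.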
  • 21. Summaries ● SNF can construct a sample-by-sample network by integrating multiple datasets ● SNF can be extended to include more datasets and applied to further questions
  • 22. Bayesian Consensus Clustering ● An integrative statistical model that permits a separate clustering of the objects for each data source. ● These separate clusterings adhere loosely to an overall consensus clustering ● BCC simultaneously estimates both the consensus clustering and the source-specific clusterings
  • 23. Procedures ● Dirichlet mixture model to accommodate multiple data (X) ● Probability of belonging to one cluster ● Estimation ○ Gibbs sampling procedure to approximate the posterior distribution ○ Markov chain Monte Carlo (MCMC) proceeds by iteratively sampling ● Choose K based on highest mean adjusted adherence
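One piece of the Gibbs sampler can be illustrated in isolation: the conditional probability of a sample's source-specific cluster given its consensus cluster. The sketch assumes the adherence form in which a source label matches the consensus label with probability alpha and falls in each other cluster with probability (1 - alpha) / (K - 1). The function name, the log-likelihood input and the restriction alpha < 1 are assumptions made for this example; the full sampler also updates the consensus labels, the adherence and the mixture parameters.

```python
import numpy as np

def source_label_probs(loglik, consensus, alpha):
    """Conditional cluster probabilities for one data source.

    loglik    : (n, K) log-likelihood of each sample under each cluster
    consensus : (n,) current consensus cluster index per sample
    alpha     : adherence of this source to the consensus, 1/K <= alpha < 1
    """
    loglik = np.asarray(loglik, dtype=float)
    consensus = np.asarray(consensus)
    n, K = loglik.shape
    # Prior weight: alpha on the consensus cluster, the rest spread evenly.
    prior = np.full((n, K), (1.0 - alpha) / (K - 1))
    prior[np.arange(n), consensus] = alpha
    logp = np.log(prior) + loglik
    logp -= logp.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)
```

A Gibbs step would then draw each source-specific label from its row of probabilities, for example with `rng.choice(K, p=row)`.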
  • 24. Application on breast cancer ● RNA gene expression (GE) data for 645 genes. ● DNA methylation (ME) data for 574 probes. ● miRNA expression (miRNA) data for 423 miRNAs. ● Reverse phase protein array (RPPA) data for 171 proteins.
  • 26. Summaries 1. The BCC model assumes a simple and general dependence between data sources. 2. BCC models both an overall clustering and a clustering specific to each data source, with advantages over traditional methods in terms of modeling uncertainty and the ability to borrow information across sources. 3. BCC is suitable for multisource biomedical data, and may also be used to compare clusterings from different statistical models for a single homogeneous dataset.
  • 27. Regularized Multiple Kernel Learning Locality Preserving Projections (rMKL-LPP) ● It is an extension of the existing multiple kernel learning with dimensionality reduction (MKL-DR) method, where the data are projected into a lower-dimensional, integrative subspace. ● A regularization term is added to avoid overfitting during the optimization procedure, and it allows several different kernel types to be used. ● Locality Preserving Projections (LPP) is applied to conserve the sum of distances for each sample's k nearest neighbors.
  • 28. Procedures ● Data fusion ○ rMKL-LPP ○ Optimization ○ Integrated kernel matrix ● Clustering ○ K-means ○ Mean silhouette width used to optimize number of clusters ● Evaluation ○ Silhouette score and cross validation (Rand index)
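The integrated kernel matrix in the data fusion step is a weighted combination of the per-data-type kernel matrices. The sketch below only shows that combination with fixed, user-supplied weights; learning the weights together with the projection is the actual rMKL-LPP optimization and is not attempted here. The function names, the Gaussian kernel and the gamma values are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the samples in the rows of X."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dist)

def combine_kernels(kernels, beta):
    """Weighted sum of kernel matrices with non-negative weights.

    The weights are rescaled to sum to 1, so the result stays on the
    same scale as the input kernels.
    """
    beta = np.asarray(beta, dtype=float)
    if np.any(beta < 0):
        raise ValueError("kernel weights must be non-negative")
    beta = beta / beta.sum()
    return sum(b * K for b, K in zip(beta, kernels))
```

Several kernels per data type (for example Gaussian kernels with different gamma values) can be passed in together, which is how the burden of choosing a single best kernel is avoided.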
  • 29. Applications in 5 cancers 1. Comparison to state-of-the-art (SNF) 2. Robustness analysis 3. Comparison of clusterings to established subtypes 4. Clinical implications from clusterings. The 5 cancers: 1. glioblastoma multiforme (GBM) -- 213 samples 2. breast invasive carcinoma (BIC) -- 105 samples 3. kidney renal clear cell carcinoma (KRCCC) -- 122 samples 4. lung squamous cell carcinoma (LSCC) -- 106 samples 5. colon adenocarcinoma (COAD) -- 92 samples. Datasets: gene expression, DNA methylation and miRNA expression data
  • 30. 1. Comparison to state-of-the-art
  • 31. 2. Robustness analysis Fig. 2. Robustness of clustering for leave-one-out datasets measured using Rand index. Fig. 3. Robustness of clustering for leave-one-out cross-validation applied to reduced sized datasets measured using Rand index.
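The Rand index used in both robustness figures is the fraction of sample pairs on which two clusterings agree, that is, pairs placed together in both or apart in both. A minimal sketch (the function name is chosen here):

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of sample pairs treated the same way by both clusterings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same samples")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

The value does not depend on how the clusters are named, so a leave-one-out clustering can be compared directly with the clustering of the whole dataset.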
  • 32. 3. Comparison of clusterings to established subtypes
  • 33. 4. Clinical implications from clusterings GBM: ● 94 of 213 patients were treated with temozolomide
  • 35. Summaries 1. rMKL-LPP found subtypes with more significant log-rank test results than the state-of-the-art method 2. Using several kernel matrices per data type can improve performance, removes the burden of selecting the optimal kernel matrix, and gives fairly stable results 3. Unlike unregularized MKL-DR, rMKL-LPP remains stable even for small datasets 4. The application to GBM shows that the method can capture diverse information within one clustering
  • 36. References 1. Huang, S., Chaudhary, K. & Garmire, L. X. More Is Better: Recent Progress in Multi-Omics Data Integration Methods. Front. Genet. 8, 84 (2017). 2. Wang, B. et al. Similarity network fusion for aggregating data types on a genomic scale. Nat. Methods 11, 333–337 (2014). 3. Shen, R., Olshen, A. B. & Ladanyi, M. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics 25, 2906–2912 (2009). 4. Shen, R. et al. Integrative subtype discovery in glioblastoma using iCluster. PLoS One 7, e35236 (2012). 5. Mo, Q. et al. Pattern discovery and cancer gene identification in integrated cancer genomic data. Proc. Natl. Acad. Sci. U. S. A. 110, 4245–4250 (2013). 6. Speicher, N. K. & Pfeifer, N. Integrating different data types by regularized unsupervised multiple kernel learning with application to cancer subtype discovery. Bioinformatics 31, i268–75 (2015). 7. Lock, E. F. & Dunson, D. B. Bayesian consensus clustering. Bioinformatics 29, 2610–2616 (2013).

Editor's Notes

  1. The main advantage of Bayesian methods in data integration is that they can make assumptions not only on different types of data sets with various distributions but also on the correlations among data sets.
  2. Model selection involves estimating the number of clusters K and the lasso parameter λ.
  3. (C) Model selection based on POD measure. A four-cluster sparse solution (λ = 0.2) was chosen.
  4. Spectral clustering is suitable for graph clustering
  5. It is an extension of the current multiple kernel learning with dimensional reduction (MKL-DR) method. MKL-DR: https://pdfs.semanticscholar.org/1cd3/bbae54b217843870fdc771d727b6043225b8.pdf
  6. Fig. 2. Robustness of clustering for leave-one-out datasets measured using Rand index. Each patient is left out once in the dimensionality reduction and clustering procedure and afterwards added to the cluster with the closest mean based on the learned projection for this data point, which is given by $\mathrm{proj}(x_i) = A^{T} K_i \beta$. The resulting cluster assignment is then compared with the clustering of the whole dataset. The error bars represent one standard deviation. Fig. 3. Robustness of clustering for leave-one-out cross-validation applied to reduced sized datasets measured using Rand index. For each cancer type, we sampled 20 times half of the patients and applied leave-one-out cross-validation as described in Section 3.4. The error bars represent one standard deviation.
  7. The results are very similar to those found by Noushmehr et al. (2010) for their identified G-CIMP positive subtype. In addition, we found the set of underexpressed genes to be highly enriched for processes associated with the immune system and inflammation [cf. Table 3 (column 2)]. Since chronic inflammation is generally related to cancer progression and is thought to play an important role in the construction of the tumor microenvironment (Hanahan and Weinberg, 2011), these downregulations might be a reason for the favorable outcome of patients from this cluster.