Refined blood-borne miRNome of human
diseases via PCA-based feature extraction

             Y-h. Taguchi
         Department of Physics,
            Chuo University

           Yoshiki Murakami
      Center for Genomic Medicine
            Kyoto University
Caution:

The main results of the collaboration with
Prof. Murakami are based upon his own
experiments (*), but these results are related
to a planned patent application. Thus, here we
decided to present our methods applied to
alternative public data.

(*) to be submitted to Journal of Hepatology
1. The concept of PCA-based feature extraction

2. What is miRNA? (will be skipped)

3. Previous Work (Dry + Wet)

4. Proposed method + Results

5. Summary & Conclusion
1. The concept of PCA-based feature extraction
 Why feature extraction?

 ・ Avoiding overfitting

 ・ Need for experimental validation:
   too many genes/proteins cannot all be tested.

 ・ Several methods require fewer state
    variables than observations.

 One of the problems:
   Feature extraction itself rarely passes a
   cross-validation test, i.e., features extracted
   from different training sets rarely coincide.
Conventional
                      Samples                          Test Set
    Group1             Group2                          Group3

    Feature            Feature
    Extraction    ≠    Extraction

    Model              Model
    Construction       Construction

              Training Set                             Validation
Proposed
                      Samples                          Test Set
    Group1             Group2                          Group3

    Feature Extraction
    (without knowledge about the classification/target variable)

    Model              Model
    Construction       Construction

              Training Set                             Validation
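The difference between the two diagrams is where feature extraction sits relative to the training/test split. The sketch below is a hypothetical cross-validation skeleton (the names `cross_validate`, `select`, `fit`, `predict` are illustrative, not from the slides): with `select_inside_folds=True` features are chosen from each labelled training set (conventional), while with `select_inside_folds=False` they are chosen once from all samples with the labels withheld (proposed, as the diagram suggests), so every fold uses the same features and only model construction is repeated.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Matrix = List[List[float]]
# Hypothetical selector signature: select(X, y) -> feature indices.
# y is None when the selection must not see class labels.
Selector = Callable[[Matrix, Optional[List[int]]], List[int]]


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle range(n) and deal it into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def project(rows: Matrix, feats: Sequence[int]) -> Matrix:
    """Keep only the selected feature columns."""
    return [[row[j] for j in feats] for row in rows]


def cross_validate(X: Matrix, y: List[int], k: int, select: Selector,
                   fit: Callable, predict: Callable,
                   select_inside_folds: bool) -> Tuple[List[List[int]], float]:
    """Return the feature set used in each fold and the overall CV accuracy.

    select_inside_folds=True  -> conventional: labelled selection per training set.
    select_inside_folds=False -> proposed: one label-free selection on all samples.
    """
    n = len(X)
    fixed = None if select_inside_folds else select(X, None)  # labels never passed
    used, correct = [], 0
    for test in kfold_indices(n, k):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        feats = select(X_tr, y_tr) if select_inside_folds else fixed
        used.append(list(feats))
        model = fit(project(X_tr, feats), y_tr)          # model construction per fold
        preds = predict(model, project([X[i] for i in test], feats))
        correct += sum(int(p == y[i]) for p, i in zip(preds, test))
    return used, correct / n
```

In the proposed setting the list of per-fold feature sets is constant by construction, which is the "stable" property claimed in the summary; in the conventional setting it can change from fold to fold.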
2. What is miRNA?

miRNA is a kind of non-coding RNA.
miRNAs are believed to suppress target gene
expression by degradation of mRNAs.
Important features:
・Typically, hundreds of kinds of miRNAs are found in
 each species (ca. ≥1000 in human).
・Each miRNA targets hundreds of genes or more.
・miRNAs mainly contribute to cell type changes
  (e.g., cancer, differentiation, diseases).
・The influence of miRNA on target gene expression is subtle
 (~10%) and context dependent.
・In spite of that,
  miRNAs contribute critically to the related processes.
3. Previous Work (Dry + Wet)

Toward the blood-borne miRNome of human
diseases, A. Keller et al., Nature Methods
(2011).
Discrimination between diseases using miRNA
in blood

Feature (miRNA) selection:
P-value (t test)

Discrimination:
SVC with several types of kernels + grid-based
parameter search
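The supervised selection step of the previous work can be sketched as ranking miRNAs by a two-sample t-test between two classes. This is a minimal sketch assuming a pooled-variance (Student) t-test, which the slide does not specify; the function name `rank_by_t_test` is hypothetical, and the SVC step is not shown. Note that, unlike the proposed method, it needs class labels.

```python
import numpy as np


def rank_by_t_test(X, y, n_select=10):
    """Rank features (columns of X) by |t| of a pooled-variance two-sample t-test.

    X: (n_samples, n_features); y: 0/1 class labels (labels ARE used here).
    Every feature shares the same degrees of freedom (n0 + n1 - 2), so sorting
    by |t| gives the same order as sorting by two-sided p-value.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    a, b = X[y == 0], X[y == 1]
    n0, n1 = len(a), len(b)
    pooled = ((n0 - 1) * a.var(axis=0, ddof=1)
              + (n1 - 1) * b.var(axis=0, ddof=1)) / (n0 + n1 - 2)
    se = np.sqrt(pooled * (1.0 / n0 + 1.0 / n1))
    diff = a.mean(axis=0) - b.mean(axis=0)
    # Constant features (se == 0) get t = 0 instead of a division warning.
    t = np.divide(diff, se, out=np.zeros_like(diff), where=se > 0)
    order = np.argsort(-np.abs(t), kind="stable")
    return order[:n_select], t
```

In a conventional cross-validation this function would be called on each training set separately, which is why the selected miRNAs can differ between folds.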
[Figure: cf. Nature Methods, 10 miRNAs; annotated value: < 0.7]
4. Proposed method + Results
                      Data
                        ⇓
                       PCA
                        ⇓
               Feature Selection
       (without classification information)
                        ⇓
                       LDA
[Figure: PCA of samples (diseases/cancers). ◯ Control, △ lung cancer; other markers: diseases, cancers]
PCA (miRNAs)
[Figure: PCA embedding of miRNAs; feature extraction selects 10 outlier miRNAs]

Why outliers?
   ⇓
They make the main contribution to the PCA
embeddings of samples.

Why 10?
   ⇓
To compare with the results of the Nature Methods paper.
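The outlier-based selection can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' exact procedure: PCA is applied with miRNAs as the observations (one row per miRNA, samples standardized column-wise), each miRNA is embedded by its scores on the first `n_pcs` components, and the `n_select` miRNAs farthest from the origin are taken as outliers. The function name and the distance criterion are hypothetical; class labels are never used.

```python
import numpy as np


def select_outlier_mirnas(X, n_pcs=2, n_select=10):
    """Pick miRNAs that are outliers in a PCA embedding of miRNAs.

    X: (n_mirnas, n_samples) expression matrix, one row per miRNA.
    Assumptions for this sketch: each sample column is standardized, miRNAs are
    embedded by their scores on the first n_pcs principal components, and the
    n_select miRNAs farthest from the origin count as outliers.
    No class labels are used anywhere.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize each sample (column)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_pcs] * s[:n_pcs]                # PC scores of each miRNA (row)
    dist = np.linalg.norm(scores, axis=1)            # distance from the origin
    return np.argsort(-dist, kind="stable")[:n_select], scores
```

Setting `n_select=10` mirrors the slide's choice of 10 miRNAs, made to allow a comparison with the 10-miRNA results of the Nature Methods paper.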
[Figure: PCA, again (samples after feature extraction). ◯ Control, △ lung cancer; other markers: diseases, cancers]
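The second PCA (samples described only by the selected miRNAs) followed by LDA on the leading PCs can be sketched as below. Assumptions not stated in the slides: a two-class Fisher LDA with the decision threshold at the midpoint of the projected class means (equal priors), class coding 0 = control and 1 = lung cancer, and `n_pcs=5` to match "up to the 5th PC". Function names are hypothetical, and no cross-validation is shown.

```python
import numpy as np


def sample_pcs(X_sel, n_pcs=5):
    """PC scores of samples computed from the selected miRNAs only.

    X_sel: (n_selected_mirnas, n_samples). Each miRNA (row) is centered across
    samples; returns an (n_samples, n_pcs) score matrix.
    """
    X_sel = np.asarray(X_sel, dtype=float)
    Z = X_sel - X_sel.mean(axis=1, keepdims=True)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (Vt[:n_pcs] * s[:n_pcs, None]).T


def fit_fisher_lda(F, y):
    """Two-class Fisher LDA; threshold at the midpoint of the projected class means."""
    F = np.asarray(F, dtype=float)
    y = np.asarray(y)
    F0, F1 = F[y == 0], F[y == 1]
    m0, m1 = F0.mean(axis=0), F1.mean(axis=0)
    # Within-class scatter matrix (sum of the two class scatter matrices)
    Sw = (F0 - m0).T @ (F0 - m0) + (F1 - m1).T @ (F1 - m1)
    w = np.linalg.lstsq(Sw, m1 - m0, rcond=None)[0]   # direction of best separation
    threshold = w @ (m0 + m1) / 2.0
    return w, threshold


def predict_lda(w, threshold, F):
    """1 (lung cancer) if the projection exceeds the threshold, else 0 (control)."""
    return (np.asarray(F, dtype=float) @ w > threshold).astype(int)
```

Using `lstsq` instead of a plain matrix inverse keeps the sketch usable when the within-class scatter matrix is close to singular, which can happen with few samples.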
Control vs Lung Cancer
             LDA with PCA
             (after feature extraction, up to the 5th PC)

                                   Actual
    Prediction          control         lung cancer
    control                56                8
    lung cancer            14               24

                    Proposed      cf. Nature Methods, 250 miRNAs
    Accuracy          0.784           0.813
    Specificity       0.800           0.844
    Sensitivity       0.750           0.781
    Precision         0.632
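The reported metrics follow directly from the four counts of the confusion matrix. The short check below recomputes them, treating lung cancer as the positive class (an assumption, but one that reproduces every value in the "Proposed" column); the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Confusion-matrix metrics with lung cancer as the positive class."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,      # correct predictions / all samples
        "specificity": tn / (tn + fp),      # controls correctly called control
        "sensitivity": tp / (tp + fn),      # lung cancers correctly called lung cancer
        "precision": tp / (tp + fp),        # predicted lung cancers that are real
    }


# Counts from the table: actual lung cancer predicted as lung cancer = 24,
# as control = 8; actual control predicted as lung cancer = 14, as control = 56.
metrics = binary_metrics(tp=24, fn=8, fp=14, tn=56)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.784
# specificity: 0.800
# sensitivity: 0.750
# precision: 0.632
```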
                      Accuracy   Specificity   Sensitivity   miRNAs
    Relatively best     0.813       0.844         0.781        250
    Relatively worst    0.867       0.867         0.844        150

[Figure: annotated value: > 0.70]
(+)/(-): comparison with the 10-miRNA results in Nature Methods
Selected miRNAs: diseases/cancers vs. normal
(+)/(-): up-/downregulated after the
transformation by PCA+LDA
(*) not selected, independent of diseases/cancers
5. Summary & Conclusion
Advantages of the proposed method
 ・ No need for classification information for
 feature selection

 ・ Independent of the training/test set division for
 feature selection (thus, stable)
