Refined blood-borne miRNome of human diseases via PCA-based feature extraction
1. Refined blood-borne miRNome of human
diseases via PCA-based feature extraction
Y-h. Taguchi
Department of Physics,
Chuo University
Yoshiki Murakami
Center for Genomic Medicine
Kyoto University
2. Caution:
The main results obtained in collaboration
with Prof. Murakami are based upon his own
experiments (*), but they are related
to a planned patent application. Thus, here we
present our method applied to
alternative public data.
(*) To be submitted to Journal of Hepatology
3. 1. The concept of PCA-based feature extraction
2. What is miRNA? (will be skipped)
3. Previous Work (Dry + Wet)
4. Proposed method + Results
5. Summary & Conclusion
4. 1. The concept of PCA-based feature extraction
Why feature extraction?
・Avoiding overfitting
・Need for experimental validation:
too many genes/proteins cannot be tested.
・Several methods require fewer state
variables than observations
One of the problems:
Feature extraction itself rarely passes a
cross-validation test.
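5. Conventional
[Diagram: Samples are divided into Group1, Group2, Group3 and a Test Set. Feature Extraction and Model Construction are performed separately for each Training Set, followed by Validation; the extracted features differ between training sets (Feature Extraction ≠ Feature Extraction).]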
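The instability of label-based feature extraction in the conventional scheme can be illustrated with a minimal sketch. It assumes a synthetic two-class expression matrix (`X`, `y`, the planted signal and the helper `top_k_by_t_statistic` are all hypothetical, not from the slides) and ranks features by a Welch t statistic within each of three training sets:

```python
import numpy as np

def top_k_by_t_statistic(X, y, k):
    """Indices of the k features with the largest |t| between two classes."""
    a, b = X[y == 0], X[y == 1]
    # Welch t statistic per feature (columns are features)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / se
    return set(np.argsort(-np.abs(t))[:k].tolist())

rng = np.random.default_rng(0)
n_samples, n_features, k = 60, 500, 10
X = rng.normal(size=(n_samples, n_features))   # hypothetical expression matrix
y = np.repeat([0, 1], n_samples // 2)          # 0 = control, 1 = disease (synthetic)
X[y == 1, :20] += 0.8                          # weak planted signal in 20 features

# Three groups; each group is held out once, the other two form the training set
groups = np.array_split(rng.permutation(n_samples), 3)
selected = []
for g in range(3):
    train = np.concatenate([groups[j] for j in range(3) if j != g])
    selected.append(top_k_by_t_statistic(X[train], y[train], k))

common = set.intersection(*selected)
print(len(common), "of", k, "features are shared by all three training sets")
```

When the shared count is well below `k`, the selected feature set depends on how the samples were divided, which is the problem the "≠" in the diagram points to.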
6. Proposed
[Diagram: Samples are divided into Group1, Group2, Group3 and a Test Set. A single Feature Extraction is performed without knowledge about the classification/target variable; Model Construction is then performed for each Training Set, followed by Validation.]
7. 2. What is miRNA?
miRNA is a kind of non-coding RNA.
miRNAs are believed to suppress target gene
expression by degradation of mRNAs.
Important features:
・Typically, hundreds of kinds of miRNAs are found for
each species (ca. ≧1000 for human).
・Each miRNA targets hundreds of genes or more.
・miRNA mainly contributes to cell type change
(e.g., cancer, differentiation, diseases)
・The influence of miRNA on target gene expression is subtle
(〜10%) and context dependent.
・In spite of that,
miRNA critically contributes to the related processes
8. 3. Previous Work (Dry + Wet)
Toward the blood-borne miRNome of human
diseases, A. Keller et al., Nature Methods
(2011).
Discrimination between diseases using miRNA
in blood
Feature (miRNA) selection:
P-value (t test)
Discrimination:
SVC with several types of kernels + grid based
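A rough sketch of this kind of workflow, using scikit-learn on synthetic data. It is not the code of the cited paper: the data, the number of selected features, the kernel list and the `C` values are assumptions, and "grid based" is read here as a cross-validated grid search over kernel and `C`, which is also an assumption.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_features = 60, 200
y = np.repeat([0, 1], n_samples // 2)          # 0 = control, 1 = disease (synthetic)
X = rng.normal(size=(n_samples, n_features))   # hypothetical miRNA expression matrix
X[y == 1, :10] += 2.0                          # planted signal in 10 features

pipeline = Pipeline([
    # For two classes the ANOVA F statistic is the squared pooled-variance
    # t statistic, so ranking by f_classif matches ranking by a t test.
    ("select", SelectKBest(f_classif, k=10)),
    ("svc", SVC()),
])
# Assumed grid: several kernel types and a few regularization strengths
param_grid = {
    "svc__kernel": ["linear", "poly", "rbf", "sigmoid"],
    "svc__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Because the selection step sits inside the pipeline, it is refit on every training fold, so the chosen miRNAs can differ from fold to fold.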
11. 4. Proposed method + Results
Data
⇓
PCA
⇓
Feature Selection
(without classification information)
⇓
LDA
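The flow above can be sketched end to end with NumPy. This is a minimal illustration on synthetic data, not the authors' implementation: the helper names, the two-component embedding used for selection, the distance-from-origin criterion, the three-group split and the equal-prior threshold are all assumptions.

```python
import numpy as np

def pca_scores(A, n_components):
    """Project the rows of A onto its leading principal components."""
    centered = A - A.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def select_outlier_features(X, k, n_components=2):
    """Embed features (columns of X) by PCA, using no labels, and keep
    the k features farthest from the origin (assumed outlier criterion)."""
    scores = pca_scores(X.T, n_components)        # one row per feature
    return np.argsort(-np.linalg.norm(scores, axis=1))[:k]

def fit_lda(Z, y):
    """Two-class Fisher discriminant: weight vector and midpoint threshold."""
    Z0, Z1 = Z[y == 0], Z[y == 1]
    m0, m1 = Z0.mean(axis=0), Z1.mean(axis=0)
    Sw = (Z0 - m0).T @ (Z0 - m0) + (Z1 - m1).T @ (Z1 - m1)  # within-class scatter
    w = np.linalg.solve(Sw, m1 - m0)
    return w, w @ (m0 + m1) / 2.0                 # threshold assumes equal priors

rng = np.random.default_rng(0)
n, p = 60, 200
y = np.repeat([0, 1], n // 2)                     # synthetic labels
X = rng.normal(size=(n, p))                       # hypothetical expression matrix
X[y == 1, :10] += 3.0                             # planted signal in 10 features

features = select_outlier_features(X, k=10)       # labels are never used here
Z = pca_scores(X[:, features], n_components=5)    # samples, up to the 5th PC

groups = np.array_split(rng.permutation(n), 3)
correct = 0
for g in range(3):
    test = groups[g]
    train = np.concatenate([groups[j] for j in range(3) if j != g])
    w, threshold = fit_lda(Z[train], y[train])
    correct += int(np.sum((Z[test] @ w > threshold).astype(int) == y[test]))
accuracy = correct / n
print(f"held-out accuracy: {accuracy:.3f}")
```

Only `fit_lda` sees the labels; the feature selection and both PCA steps are computed once on all samples, so they do not change with the training/test division.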
12. PCA (samples: diseases/cancers)
[Figure: PCA embedding of samples. Legend: ◯ Control, △ lung cancer; region labels: diseases, cancers]
13. PCA (miRNAs)
Feature extraction (miRNAs): 10 outlier miRNAs
[Figure: PCA embedding of miRNAs]
Why outliers?
⇓
Main contribution to the PCA embeddings of samples
Why 10?
⇓
To compare with the Nature Methods paper results
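One way to make "outlier miRNAs" concrete is to score each miRNA by its distance from the origin in the PCA embedding. The formulation below is only a sketch: the symbols, the centering, the use of the first two PCs and the Euclidean distance are assumptions, since the slides do not state the exact criterion.

```latex
% x_{ij}: expression of miRNA i (i = 1..N) in sample j (j = 1..M)
% v_{\ell}: \ell-th eigenvector of the M x M matrix \tilde{X}^{T}\tilde{X}
\begin{align}
  \tilde{x}_{ij} &= x_{ij} - \frac{1}{N}\sum_{i'=1}^{N} x_{i'j}
      && \text{(center each sample over miRNAs)} \\
  u_{i\ell} &= \sum_{j=1}^{M} \tilde{x}_{ij}\, v_{j\ell}
      && \text{(score of miRNA $i$ on the $\ell$-th PC)} \\
  r_i &= \sqrt{u_{i1}^{2} + u_{i2}^{2}}
      && \text{(distance from the origin)}
\end{align}
% Selected set: the 10 miRNAs with the largest r_i
```

No class label appears in any of these quantities, so the selection is the same for every training/test division.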
14. PCA, again (samples after feature extraction)
[Figure: PCA embedding of samples after feature extraction. Legend: ◯ Control, △ lung cancer; region labels: diseases, cancers]
15. Control vs Lung Cancer
LDA with PCA
(after feature extraction, up to the 5th PC)
                  Actual
Prediction     control   lung cancer
control           56          8
lung cancer       14         24

               Proposed   cf. Nature Methods,
                          250 miRNAs
Accuracy        0.784      0.813
Specificity     0.800      0.844
Sensitivity     0.750      0.781
Precision       0.632
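The four figures for the proposed method follow directly from the confusion matrix. The short check below treats lung cancer as the positive class; the variable names are chosen for illustration.

```python
# Confusion matrix from the slide: rows = prediction, columns = actual
tn, fn = 56, 8     # predicted control:     actual control, actual lung cancer
fp, tp = 14, 24    # predicted lung cancer: actual control, actual lung cancer

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 80 / 102
specificity = tn / (tn + fp)                 # 56 / 70
sensitivity = tp / (tp + fn)                 # 24 / 32
precision = tp / (tp + fp)                   # 24 / 38

for name, value in [("Accuracy", accuracy), ("Specificity", specificity),
                    ("Sensitivity", sensitivity), ("Precision", precision)]:
    print(f"{name:12s}{value:.3f}")
```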
16. Relatively Best: 0.813 0.844 0.781 (250 miRNAs)
Relatively Worst: 0.867 0.867 0.844 (150 miRNAs)
>0.70
(+)(-): Comparison with the 10-miRNA results in Nature Methods
17. Selected miRNAs: diseases/cancers vs normal
(+)/(-): up/downregulated after the
transformation by PCA+LDA
(*) not selected independently of diseases/cancers
18. 5. Summary & Conclusion
Advantages of the proposed method
・No need for classification information in
feature selection
・Feature selection is independent of the training/test set
division (thus, stable)