Meeting Cancéropoles Clara-Paca - 08092010


Published in: Education

  1. High throughput genomics data integration for cancer
     Ghislain Bidaut, CRCM Integrative Bioinformatics Team, Centre de Recherche en Cancérologie de Marseille, Inserm U891, Institut Paoli-Calmettes, Université de la Méditerranée
  2. CRCM Integrative Bioinformatics team: Projects
     • ITI (Interactome-Transcriptome Integration): Linking interactome to disease: a network-based analysis of metastatic relapse in breast cancer (Maxime Garcia)
     • MAT (Meta Analysis in the Transcriptome): Search for coexpressed genes by mining ArrayExpress and Gene Expression Omnibus (Fanny Blondin)
     • InfoCyt: high throughput bioinformatics for flow cytometry data analysis (Philippe Rouillier, Ph.D., Olivier Stahl)
     • Djeen: A high throughput multi-technological Research Information Management System for the Joomla! CMS (Olivier Stahl, Arnaud Guille)
  3. ITI: Biomarkers of five-year metastatic relapse in breast cancer (BC)
     • Discovery of biomarkers to improve adjuvant chemotherapy decisions for BC patients.
     • Two studies:
         • Wang et al. (286 patients): 76 genes
         • van't Veer et al., van de Vijver et al. (295 patients): 70 genes
     • Classical analysis: the discriminative power of the two signatures is not reproduced when crossing studies (Ein-Dor et al., 2006)
     • A strong dependence of signatures on the training set has been shown (Michiels et al., 2005)
     • 1) Microarrays do not detect driver genes (mutations)
     • 2) Curse of dimensionality
     • Network-based analysis: Chuang et al. (Nat. Biotech. 2007, cross validation), van Vliet et al. (PLoS ONE, 2007)
  4.
     • Subnetworks are scored by correlation
     • Subnetworks must meet a minimal score Sth over c datasets
     • 1) Microarrays do not detect driver genes (mutations)
     • 2) Curse of dimensionality
     • Interaction data
         • 6 PPI datasets, 137280 interactions over 13777 proteins
     • Gene expression data
         • 5 GEP datasets, 780 samples in total (including 31 IPC patients)
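The scoring and filtering rule on this slide can be sketched as follows. The transcript does not preserve the scoring formula, so this is only an illustration under stated assumptions: the score is taken to be the absolute Pearson correlation between the mean expression of a subnetwork's genes and a binary relapse outcome, and "over c datasets" is read as "in at least c datasets". The function names and the toy expression values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no information
    return cov / sqrt(vx * vy)


def subnetwork_score(expression, genes, outcome):
    """ASSUMED score: |correlation| between the per-sample mean expression
    of the subnetwork's genes and the binary outcome (1 = relapse)."""
    mean_profile = [
        sum(expression[g][i] for g in genes) / len(genes)
        for i in range(len(outcome))
    ]
    return abs(pearson(mean_profile, outcome))


def passes_threshold(scores_per_dataset, s_th, c):
    """ASSUMED rule: keep a subnetwork if its score reaches s_th
    in at least c of the datasets."""
    return sum(score >= s_th for score in scores_per_dataset) >= c


# Toy values, invented for illustration only (4 samples, 2 genes).
toy_expression = {"GRB2": [1.0, 2.0, 3.0, 4.0], "CCND1": [2.0, 2.5, 3.5, 5.0]}
toy_outcome = [0, 0, 1, 1]
toy_score = subnetwork_score(toy_expression, ["GRB2", "CCND1"], toy_outcome)
```

With scores computed independently on each expression dataset, `passes_threshold` then decides whether the subnetwork is retained.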
  5. Classification: BC metastatic relapse
     • Average accuracy on CV runs: ~77% of patients correctly classified (mixed ER+, ER-)
     • Improvements:
         • Integration with CGH data
         • Comparison with other signatures
         • Integration of subnetworks in classification
  6. Djeen: A high throughput multi-technological Research Information Management System for the Joomla! CMS
     • Figure labels: Hierarchy, Rights management, Joomla! CMS, Multitechnological, Templates, Experiment design
  7. Djeen: A high throughput multi-technological Research Information Management System for the Joomla! CMS (2)
     • User/group rights management
     • Hierarchical data organisation (Project, Experiment, Files)
     • Follows the wet-laboratory workflow
     • Multitechnological databases
     • Templates
     • Experimental design and export to instruments
     • CMS-based system
     • Doc & download: http:// / djeen
     • CRCM instance: http:// /code
     • Stahl et al. (submitted to Bioinformatics)
  8. InfoCyt: high throughput bioinformatics for flow cytometry data analysis
     • Fluorochromes coupled to surface markers (figure labels: CD98, CD8, CD44, CD4)
     • Exploration of all combinations
     • Unbiased and reproducible analysis
     • Discovery of new classes
     • Automated detection (mixture models, package FlowCore)
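To illustrate what automated detection by mixture models means, here is a minimal sketch that fits a two-component Gaussian mixture to one-dimensional intensities with expectation-maximisation and assigns each event to the more likely component. It is not the API of the R packages named in the slides; the function names, the simulated intensities, and the restriction to one marker and two populations are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(values, n_iter=50):
    """EM for a 1-D mixture of two Gaussians.
    Returns (weights, means, variances), each a list of length 2."""
    n = len(values)
    means = [min(values), max(values)]  # simple, deterministic start
    grand_mean = sum(values) / n
    grand_var = max(sum((v - grand_mean) ** 2 for v in values) / n, 1e-6)
    variances = [grand_var, grand_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each value belongs to component 1
        resp = []
        for v in values:
            p0 = weights[0] * normal_pdf(v, means[0], variances[0])
            p1 = weights[1] * normal_pdf(v, means[1], variances[1])
            total = p0 + p1
            resp.append(p1 / total if total > 0 else 0.5)
        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component collapsed; keep the last valid estimate
        # M-step: re-estimate means, variances (floored), and weights
        means = [
            sum((1 - r) * v for r, v in zip(resp, values)) / n0,
            sum(r * v for r, v in zip(resp, values)) / n1,
        ]
        variances = [
            max(sum((1 - r) * (v - means[0]) ** 2 for r, v in zip(resp, values)) / n0, 1e-6),
            max(sum(r * (v - means[1]) ** 2 for r, v in zip(resp, values)) / n1, 1e-6),
        ]
        weights = [n0 / n, n1 / n]
    return weights, means, variances


def gate(values, weights, means, variances):
    """Label each value 0 or 1 according to the more probable component."""
    labels = []
    for v in values:
        p0 = weights[0] * normal_pdf(v, means[0], variances[0])
        p1 = weights[1] * normal_pdf(v, means[1], variances[1])
        labels.append(1 if p1 > p0 else 0)
    return labels


# Simulated intensities for one marker: a "negative" and a "positive" population.
rng = random.Random(0)
intensities = [rng.gauss(1.0, 0.3) for _ in range(200)] + [rng.gauss(4.0, 0.5) for _ in range(200)]
weights, means, variances = fit_two_gaussians(intensities)
labels = gate(intensities, weights, means, variances)
```

Real cytometry data are multi-dimensional and usually need more than two components, which is where dedicated packages come in; the sketch only shows the principle of replacing a hand-drawn gate with a fitted model.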
  9. InfoCyt: high throughput bioinformatics for flow cytometry data analysis (2)
     • Projects
         • Tumor microenvironment (Colon Cancer) [J Galon, CRC]
         • Tumor microenvironment (Colon and Breast Cancer) [D Olive, CRCM]
         • T-cell development (mouse models) [M Malissen, CIML]
     • Experimental design [DJEEN]
     • Export experimental structure and instrument configuration to instrument [DJEEN] (BD® FACSDiva)
     • Experimentation
     • Data import [DJEEN]
     • High throughput analysis
         • Automated population detection (gating) by mixture models [Bioconductor package FloClust, Finak et al., 2009]
         • Population characterisation
         • Population/sample matching
         • Data integration
         • Visualisation
         • Sample classification
         • Prognosis prediction
  10. Summary & future directions
     • ITI
         • Submission to Bioinformatics
         • Selected for oral presentation at the Cancer Bioinformatics Workshop (2-4 Sept 2010, Cambridge)
         • To do: CGH, classification with subnetworks
     • InfoCyt
         • Based on the BioConductor package Flowcore
         • Current: assignment of labels to detected populations
         • To do: streamline the pipeline to perform classification on large patient cohorts
     • DJEEN
         • Manuscript under review (Bioinformatics)
         • Production version released this month (CRCM, CIML)
         • To do: links to the ITI & InfoCyt pipelines
     • Recent publications from the group
         • Garcia M. et al. (in press) Handbook of Research on Computational and Systems Biology: Interdisciplinary Applications.
         • Bidaut G. Biomedical Informatics for Cancer Research. 315-333.
         • Bidaut G & Stoeckert CJ. Proc. of the Pacific Symp. on Biocomp. 2009:356-67.
         • Bidaut G & Stoeckert CJ. Methods Enzymol. 2009;467:229-45.
  11. Acknowledgements
     • ITI
         • D. Birnbaum (IPC, CRCM)
         • F. Bertucci (IPC)
         • S. Carpentier (Ipsogen)
         • M. Chaffanet (IPC)
         • Junwen Wang (HKU)
     • InfoCyt & Djeen
         • M. Malissen (CIML)
         • S. Granjeaud (TAGC)
         • D. Olive (IPC, CRCM)
     • IB Team
         • O. Stahl (IE, InfoCyt, DJEEN)
         • M. Garcia (PhD student, ITI)
         • P. Rouillier (Post-doc, InfoCyt)
         • A. Guille (Master's student (M2), DJEEN)
  12. Data workflow
     • Input data
         • 5 GEP datasets, 780 samples in total (including 31 IPC patients)
         • 6 PPI datasets, 137280 interactions over 13777 proteins
     • Step 1: Subnetwork detection (ITI)
         • Input 1: 780 GE samples
         • Input 2: Protein-protein interaction data
     • Step 2: Statistical validation
         • Type 1 p-value: random subnetworks
         • Type 2 p-value: shuffled expression data
         • Type 3 p-value: random interactome
     • Step 3: Subnetwork intersection
         • Type 1 (p-value < 10⁻² on 1 dataset)
         • Type 2 (p-value < 10⁻³ on 1 dataset)
         • Type 3 (p-value < 10⁻¹ on 1 dataset)
     • 10-fold cross validation: training data, test data, gene signature, SVM classification (majority voting), outcome prediction
     • Figure: null score distribution on Loi et al. (axis: score)
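The statistical validation step compares an observed subnetwork score with scores obtained under a null model. A minimal sketch of one such test follows, loosely modelled on the "shuffled expression data" p-value: here the outcome labels are permuted relative to a single expression profile, which is an assumption about how the shuffling is done. The score function (absolute difference of group means), the function names, and the toy data are hypothetical stand-ins, not the method used in ITI.

```python
import random


def mean_difference_score(profile, outcome):
    """Hypothetical stand-in score: |mean(relapse) - mean(no relapse)|."""
    pos = [x for x, y in zip(profile, outcome) if y == 1]
    neg = [x for x, y in zip(profile, outcome) if y == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def shuffled_null_scores(score_fn, profile, outcome, n_perm=999, seed=0):
    """Build a null distribution by permuting outcome labels across samples."""
    rng = random.Random(seed)
    labels = list(outcome)
    null_scores = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # breaks any link between expression and outcome
        null_scores.append(score_fn(profile, labels))
    return null_scores


def empirical_p_value(observed, null_scores):
    """Fraction of null scores at least as large as the observed one,
    with the usual +1 correction so the estimate is never exactly zero."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


# Toy data, invented for illustration: 8 samples, clear group separation.
profile = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
observed = mean_difference_score(profile, outcome)
null_scores = shuffled_null_scores(mean_difference_score, profile, outcome)
p_value = empirical_p_value(observed, null_scores)
```

The other two p-value types on the slide would swap the null generator (random gene sets of the same size, or a rewired interactome) while keeping the same `empirical_p_value` comparison.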
  13. Classification: IBC
     • 65% of patients correctly classified
     • Training set:
         • Dresman et al. (36 patients, Affy U133)
         • Nguyen et al. (36 patients, Affy U133)
     • Test set:
         • 196 IPC patients
  14. Alternative analysis by interactome-transcriptome integration
     • Example: Ras pathway
     • Technological, experimental, and biological variability
     • Figure labels: Dataset 1, Dataset 2
  15. New biomarkers linked to metastasis were found
     • Apoptosis [Subnetworks 291, 5714]
     • Cell adhesion [Subnetwork 6513]
     • Cell cycle control [Subnetworks 1537, 581, 7013, 5339]
     • Immune response [Subnetworks 291, 2810, 3251]
     • Development [Subnetworks 387, 58, 3420, 7013, 60312, 3251, 375]
     • Metabolism [Subnetworks 29959, 3420, 581, 4291, 5339, 2068, 374291]

     Rank  Gene
     1     CDC2
     2     CCND1
     3     STMN2
     4     GRB2
     5     LUC7L3
     6     SF3B3
     7     TK1
     8     TSC1
     9     HNRNPA1
     10    ACTN1
     11    HSPB1
     12    MAPKAPK2
     13    AGTPBP1
     14    CYCS
     15    BAX
     16    PPFIA1
     17    SFN
     18    CRMP1
     19    PRKCI
     20    YWHAZ
  16. ITI web resource for functional exploration: ITIDB
     • Subnetwork DB
     • http:// )
     • Global exploration of subnetworks and their components (links to NCBI, sorting by discriminative score)
     • Database contains subnetworks linked to:
         • IBC (inflammatory form of breast cancer)
         • BC metastatic relapse