ITI Presentation - CRCM - 1st Scientific Department Day
  • Protein-protein interaction maps
  • No tool such as Djeen currently exists. Further extension of the information used in ITI. Public release.

Presentation Transcript

  • ITI - Interactome-Transcriptome Integration
    An integrated network approach to decipher the molecular basis of metastasis in breast cancer
    Ghislain Bidaut (IR2) - Cibi - CRCM Integrative Bioinformatics Platform
    Centre de Recherche en Cancérologie de Marseille
    Aix-Marseille-Univ, Institut Paoli-Calmettes, Inserm U1068, CNRS U7258
  • Microarray technology allowed for a better understanding of breast cancer (BC) biology
    - It enabled the detection of biomarkers of five-year metastatic relapse in BC [see van’t Veer et al. (2002), Wang et al. (2005)]
    - BC is a complex disease: 5 molecular subtypes (basal, luminal A/B, HER2+, normal-like) based on the expression of several markers (ER, PR, HER2), and several phenotypes (metastatic or not, inflammatory or not).
    - Clinical goal: treatment personalization by improving adjuvant chemotherapy decisions for BC patients, using cDNA arrays over histology.
    - Scientific goal: understanding the molecular background of such heterogeneity.
  • In particular, post-genomic approaches allowed the discovery of biomarkers of metastatic relapse in BC
    Significant studies:
    - van’t Veer et al. (295 patients) → 70-gene signature (MAMMAPRINT)
    - Wang et al. (286 patients) → 76-gene signature (specific to ER+, ER-)
    Limitations:
    - A strong dependency of the signature on the training set has been shown (Michiels et al., 2005)
    - The discriminative power of the two signatures is poorly reproduced across studies (Ein-Dor et al., 2006)
    - Instability of signatures: only 3 genes in common between the 70-gene and the 76-gene signatures (Chuang et al., 2007).
  • Analysis issues
    - Assay-related issues
      - Experimental variability
      - Cohort variability
    - Microarray-related issues
      - Curse of dimensionality (data topology)
      - Microarrays do not separate driver genes from passenger genes (no causality)
    - Solutions
      - Increase sample size (Dobbin et al., 2008): use of publicly available data
      - Add prior biological information [causal information]: network-based analysis (Chuang et al., 2007) -> Interactome-Transcriptome Integration
  • A causal approach by Interactome-Transcriptome Integration
    [Figure: Signature 1 and Signature 2 in relation to Disease, under technological, biological and experimental variability]
  • Interactome-Transcriptome Integration (ITI) Algorithm
    - Subnetworks must meet a minimal score Sth over c datasets
    - A node is added if the score increases by more than the learning rate r
    From Garcia et al., Bioinformatics (2012)
  • Subnetwork score: links clinical information and GEP
    $$S_{s,d} = \frac{n_d}{\max n_d(DS)} \, \mathrm{corr}\left( \frac{1}{n} \sum_{g \in s} e(g,d),\; cc(d) \right)$$
    $$S_s = \frac{1}{N_S} \sum_{d \in DS} S_{s,d}$$
    - Correlation between Gene Expression Profile and Clinical Condition (DMFS)
    - Normalisation by dataset sample size
    - Summation over all datasets
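As an illustration of the score defined on this slide, here is a minimal Python sketch. It rests on assumptions the slide does not state: `n` is read as the number of genes in the subnetwork, `corr` as the Pearson correlation, and the function names and data layout (`expr` mapping each gene to one value per sample) are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def dataset_score(expr, genes, condition, max_samples):
    """S_{s,d}: correlation of the mean subnetwork profile with the clinical
    condition, weighted by the dataset's share of the largest sample size.
    Assumption: every gene of the subnetwork is present in expr."""
    n_samples = len(condition)
    mean_profile = [sum(expr[g][i] for g in genes) / len(genes)
                    for i in range(n_samples)]
    return (n_samples / max_samples) * pearson(mean_profile, condition)

def subnetwork_score(datasets, genes):
    """S_s: average of S_{s,d} over all datasets.
    datasets is a list of (expr, condition) pairs."""
    max_samples = max(len(condition) for _, condition in datasets)
    scores = [dataset_score(expr, genes, condition, max_samples)
              for expr, condition in datasets]
    return sum(scores) / len(scores)
```

With this weighting, a small dataset contributes proportionally less to the final score than the largest one.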
  • ITI Global Workflow
    - Input 1: Expression data
    - Input 2: Protein-Protein Interaction data
    - Step 1: Subnetwork detection with ITI
    - Step 2: Statistical validation
      - type 1 p-value: random subnetworks
      - type 2 p-value: shuffled expression data
      - type 3 p-value: random interactome
      [Figure: null score distribution on Loi et al.; axes: score, # subnetworks]
    - Step 3: Subnetwork intersection -> gene signature
    - Patients to be tested (independent data) -> SVM classification -> metastasis prediction
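The statistical validation step compares an observed subnetwork score with a null score distribution. The slide does not give the exact procedure, so the following Python sketch is only one plausible reading of the type 2 p-value (shuffled expression data): the function names are hypothetical, and the one-sided test with a +1 correction is an assumption.

```python
import random

def permutation_null(score_fn, profile, n_permutations, seed=0):
    """Build a null distribution by shuffling the sample order of an
    expression profile and re-scoring it each time.
    score_fn is any function mapping a profile to a score."""
    rng = random.Random(seed)
    null_scores = []
    for _ in range(n_permutations):
        shuffled = list(profile)
        rng.shuffle(shuffled)
        null_scores.append(score_fn(shuffled))
    return null_scores

def empirical_p_value(observed, null_scores):
    """One-sided empirical p-value: share of null scores at least as large
    as the observed score, with a +1 correction so it is never exactly 0."""
    at_least = sum(1 for s in null_scores if s >= observed)
    return (at_least + 1) / (len(null_scores) + 1)
```

A subnetwork would then be kept only if its p-value falls below the chosen threshold.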
  • Breast Cancer Compendium (2464 tumors)
    | Dataset | NCBI accession number (if available) | Platform | # samples before filtering | # samples after filtering | Patient follow-up |
    | Desmedt | GSE7390 | U133A | 198 | 198 | Yes |
    | Finetti | | U133 Plus 2.0 | 129 | 129 | Yes |
    | Ivshina | GSE4922 | U133 Plus 2.0 | 289 | 249 | Yes |
    | Loi | GSE6532 | U133A + U133B | 327 | 293 | Yes |
    | | | U133 Plus 2.0 | 87 | 87 | |
    | Parker | GSE10886 | Agilent-011521 1A G4110A | 2 | 2 | Yes |
    | | | Agilent-012097 1A G4110B | 27 | 22 | |
    | | | Agilent 1A Oligo UNC Custom | 196 | 177 | |
    | Pawitan | GSE1456 | U133A + U133B | 159 | 159 | Yes |
    | Schmidt | GSE11121 | U133A | 200 | 200 | Yes |
    | Sotiriou | GSE2990 | U133A | 189 | 179 | Yes |
    | van de Vijver | | Agilent whole human genome | 295 | 295 | Yes |
    | Wang | GSE2034 | U133A | 286 | 286 | Yes |
    | Zhang | GSE12093 | U133A | 136 | 136 | Yes |
    | Zhou | GSE7378 | U133Av2 | 54 | 54 | Yes |
    | Total: 12 | | 7 distinct | 2572 | 2464 | All |
  • Interaction data – 13202 proteins, 70530 interactions
    | Resource | # proteins | # binary interactions | Nature |
    | HPRD [Human Protein Reference Database] (Prasad et al., 2009) | 9386 | 36577 | Y2H, in vitro, in vivo |
    | Cocite (Ramani et al., 2005) | 6349 | 15705 | In silico [Cocite algorithm] |
    | DIP [Database of Interacting Proteins] | 918 | 810 | In vitro/manually curated |
    | MINT [Molecular Interactions Database] (Ceol et al., 2009) | 5559 | 12143 | Manually curated from literature |
    | IntAct | 7471 | 25616 | Large-scale assays (Y2H, CoIP, pull-down) |
    | TOTAL | 13203 | 70530 | 3 types |
  • Analysis organization
    ITI 1.0: Non-specific analysis
    2 analyses were defined with independent testing:
    - Training with all datasets, independent testing on van de Vijver’s dataset
    - Training with Affymetrix-based data only, independent testing on van de Vijver’s dataset
  • ITI 1.0: SVM classification on BC metastatic relapse shows 61% accuracy on “van de Vijver” data
    | Training set | # genes | Accuracy | Sensitivity | Specificity |
    | All tumors | 180 | 0.58 | 0.27 | 0.74 |
    | Tumors profiled on Affymetrix | 100 | 0.61 | 0.27 | 0.78 |
    | Van de Vijver et al. (2002) (test on same cohort) | 70 genes | 0.61 (reported by Michiels et al.) | NA | NA |
    Improvements:
    - Filtering of training data
    - Training with cross-validation
  • Reducing heterogeneity with ER status specificity
    ITI 2.0: Reduced compendium: 5 datasets, 7 platforms, 764 tumors
    Samples were further filtered on:
    - No treatment
    - Metastasis information availability
    - Sample censored if follow-up < 5 years
    - Separation on ER status
    Training: 10-fold cross-validation, balancing the distribution of ‘metastatic’/‘non-metastatic’ patients within each dataset
    Testing: 2 runs with independent testing, one on Desmedt’s dataset and one on van de Vijver’s dataset
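The balanced 10-fold training split can be pictured with a small Python sketch. The slide does not describe how the balancing was done, so this is only an assumed approach: samples are grouped by (dataset, relapse status) and dealt round-robin into folds, and the function name and tuple layout are hypothetical.

```python
from collections import defaultdict

def stratified_folds(samples, n_folds=10):
    """Split samples into folds so that each (dataset, relapse status) group
    is spread as evenly as possible across folds.
    samples is a list of (sample_id, dataset, relapsed) tuples."""
    groups = defaultdict(list)
    for sample_id, dataset, relapsed in samples:
        groups[(dataset, relapsed)].append(sample_id)

    folds = [[] for _ in range(n_folds)]
    position = 0  # running counter keeps overall fold sizes within 1 of each other
    for key in sorted(groups):
        for sample_id in groups[key]:
            folds[position % n_folds].append(sample_id)
            position += 1
    return folds
```

Each fold then serves once as the held-out part while the remaining folds are used for subnetwork detection.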
  • ITI 2.0 Data workflow with cross-validation training strategy
  • P-value thresholds and data topology
    | Dataset held out | p-value threshold – n datasets | # subnetworks | # genes |
    | Desmedt (ER-) | 1e-4 – 2 | 165 | 2310 |
    | Desmedt (ER+) | 1e-4 – 2 | 6 | 175 |
    | Van De Vijver (ER-) | 1e-4 – 2 | 122 | 1481 |
    | Van De Vijver (ER+) | 1e-4 – 2 | 14 | 272 |
    - ER+: Luminal A, B (prevalence: 48-68%*)
    - ER-: Triple negative, basal, HER2+ (prevalence: 21-32%*)
    From Garcia et al. (2012) Bioinformatics.
    * Carey LA, Perou CM, Livasy CA, et al. Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study. JAMA. 295(21):2492-2502, 2006.
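  • Classifier performance on BC after 10-fold cross-validation and stratification is higher than for public signatures on ER+ and ER- subtypes
    | Status | Dataset | Signature | N | ACC | SV | SP |
    | ER- | Desmedt | 70 g | 61 | 0.442 | 1 | 0 |
    | ER- | Desmedt | 76 g | 61 | 0.377 | 0.333 | 0.411 |
    | ER- | Desmedt | ITI(165) | 61 | 0.541 | 0.407 | 0.647 |
    | ER- | van de Vijver | 70 g | 36 | 0.528 | 1 | 0.106 |
    | ER- | van de Vijver | 76 g | 36 | 0.556 | 0.471 | 0.632 |
    | ER- | van de Vijver | ITI(122) | 36 | 0.528 | 0.118 | 0.895 |
    | ER+ | Desmedt | 70 g | 129 | 0.411 | 0.714 | 0.298 |
    | ER+ | Desmedt | 76 g | 129 | 0.604 | 0.714 | 0.563 |
    | ER+ | Desmedt | ITI(6) | 129 | 0.736 | 0.257 | 0.915 |
    | ER+ | van de Vijver | 70 g | 114 | 0.623 | 0.821 | 0.520 |
    | ER+ | van de Vijver | 76 g | 114 | 0.632 | 0.564 | 0.667 |
    | ER+ | van de Vijver | ITI(14) | 114 | 0.518 | 0.256 | 0.653 |
    Signatures obtained with ITI show a stability of 11.5-32.8% for different training sets.
    From Garcia et al. (2012) Bioinformatics.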
  • New biomarkers linked to metastasis were found (ITI 1.0)
    | Gene | Rank |
    | CDC2 | 1 |
    | CCND1 | 2 |
    | STMN2 | 3 |
    | GRB2 | 4 |
    | LUC7L3 | 5 |
    | SF3B3 | 6 |
    | TK1 | 7 |
    | TSC1 | 8 |
    | HNRNPA1 | 9 |
    | ACTN1 | 10 |
    | HSPB1 | 11 |
    | MAPKAPK2 | 12 |
    | AGTPBP1 | 13 |
    | CYCS | 14 |
    | BAX | 15 |
    | PPFIA1 | 16 |
    | SFN | 17 |
    | CRMP1 | 18 |
    | PRKCI | 19 |
    | YWHAZ | 20 |
    Functions represented:
    - Apoptosis [Subnetworks 291, 5714]
    - Cell adhesion [Subnetwork 6513]
    - Cell cycle control [Subnetworks 1537, 581, 7013, 5339]
    - Immune response [Subnetworks 291, 2810, 3251]
    - Development [Subnetworks 387, 58, 3420, 7013, 60312, 3251, 375]
    - Metabolism [Subnetworks 29959, 3420, 581, 4291, 5339, 2068, 374291]
  • Novel markers (not previously linked to metastasis) have been discovered (ITI 1.0)
    Genes previously known to be associated with cancer:
    - GRB2: growth factor receptor-bound protein 2
      Biological process:
      - epidermal growth factor receptor signaling pathway (GO:0007173)
      - Ras protein signal transduction (GO:0007265)
    - TK1: thymidine kinase 1, soluble
      Biological process:
      - nucleic acid metabolic process (GO:0006139)
      - DNA replication (GO:0006260)
    Genes not previously known to be associated with cancer:
    - STMN2: stathmin-like 2
      Microtubule formation and signalling. Previously associated with Alzheimer’s disease
  • ITI web resource for functional exploration: ITIDB
    - ITIDB Subnetwork Database
    - bioinformatique.marseille.inserm.fr/iti
    - Global exploration of subnetworks and their components: link to NCBI, sorting by discriminative score
    - Database contains subnetworks linked to BC metastatic relapse
  • Subnetwork biology reveals fundamental hallmarks of cancer
    [Figure: subnetwork 291-3 and subnetwork 26973-3; annotations: mutation in metastatic BC, apoptosis regulator, cell division, cell cycle]
  • Other results From Garcia et al. (2012) Bioinformatics.
  • Clinical/Survival analysis shows improvement over existing signatures
    From Garcia et al. (2012) Bioinformatics.
  • ITI 3.0 Graph Search algorithm
    [Figures: example of an ‘elongated’ subnetwork; examples of ‘circular’ subnetworks]
    - ITI 1.0, 2.0: in-depth exploration
      - Not really adapted to graphs
      - Problem of elongated subnetworks
    - ITI 3.0: implementation of a graph ‘transversal exploration’ algorithm
  • ITI 3.0 Preliminary results. Reorganisation of the search algorithm improved classification results (test on Desmedt ER+)
    | Signature | 70 g | 76 g | ITI(6) | ITI 3.0 (39) |
    | N | 129 | 129 | 129 | 129 |
    | TN | 28 | 53 | 86 | 80 |
    | FP | 66 | 41 | 8 | 14 |
    | TP | 25 | 25 | 9 | 18 |
    | FN | 10 | 10 | 26 | 17 |
    | ACC | 0.411 | 0.604 | 0.736 | 0.760 |
    | SV | 0.714 | 0.714 | 0.257 | 0.514 |
    | SP | 0.298 | 0.563 | 0.915 | 0.851 |
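The ACC, SV and SP rows follow from the TN/FP/TP/FN counts on this slide using the standard definitions of accuracy, sensitivity and specificity. A short Python sketch (the function name is hypothetical) shows the arithmetic:

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "ACC": (tp + tn) / total,  # share of all patients classified correctly
        "SV": tp / (tp + fn),      # sensitivity: metastatic patients detected
        "SP": tn / (tn + fp),      # specificity: non-metastatic patients detected
    }

# Counts from the slide for ITI 3.0 (39) on Desmedt ER+
iti3 = classification_metrics(tp=18, fp=14, tn=80, fn=17)
print({name: round(value, 3) for name, value in iti3.items()})
```

For ITI 3.0 this gives 98/129 for accuracy, 18/35 for sensitivity and 80/94 for specificity, matching the 0.760, 0.514 and 0.851 reported in the table.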
  • Learning rate r optimization
    [Figure: median accuracy for 10-fold stratification as a function of the threshold (0.005 to 0.15); series: MEDIAN(ALL), MAX(ALL); y-axis: accuracy (0.55 to 0.73)]
    Optimal rate = 0.03
  • Learning rate r optimization Optimal rate=0.03
  • ITI 3.0 summary
    - Improvements of subnetwork search (problem of the elongated subnetwork effect)
    - Improvement on classification
    - Improvements on obtained p-values and consensus
    - Currently adding Comparative Genomic Hybridization (CGH) array data in subnetwork search
  • Other project - Djeen: A high-throughput multi-technological Research Information Management System for the Joomla! CMS
    Download: http://bioinformatique.marseille.inserm.fr/djeen
    Test Instance: http://bioinformatique.marseille.inserm.fr/djeentest
    [Figure labels: User and groups management; Joomla! CMS; Project Hierarchy; Permissions management; Simple DB scheme design; Experiment Templates, MI Design; Multitechnological]
    Stahl et al. (under preparation – to be submitted to Bioinformatics)
  • Data storage with Djeen: A user-friendly multi-technological Research Information Management System
    - Based on Joomla! CMS
    - User/Group rights management *
    - Hierarchical data organisation (Project, Experiment, Files)
    - Multitechnological databases
    - Templates
    - Version 1.5 = all technologies
    - Version 1.6 = specialized in proteomics with sample history follow-up
    Doc & Download: http://bioinformatique.marseille.inserm.fr/djeen
    CRCM instance: http://bioinformatique.marseille.inserm.fr/code
    Test Instance: http://bioinformatique.marseille.inserm.fr/djeentest
    * Taken from Stahl et al. (under preparation – to be submitted to Bioinformatics)
  • Summary – future directions
    - Increased performance when integrating transcriptome and interaction data
    - Hallmarks of cancer are revealed in subnetwork biology
    - Novel markers of metastasis in BC have been found
    Next points to be improved/developed:
    - Adding CNV information (CGH, SNPs)
    - The heterogeneous nature of BC still limits classification performance
    - Start analyzing other cancer types with ITI
    - Define a subnetwork comparison approach
    - Start integrating post-transcriptional regulation (miRNA, methylome)
    1 paper accepted: Garcia M, Millat-Carus R, Bertucci F, Finetti P, Birnbaum D, Bidaut G (2012): Interactome-Transcriptome integration for predicting distant metastasis in breast cancer. Bioinformatics 28 (5): 672-678.
    1 paper under preparation: Stahl et al. Djeen: A multi-technological Research Information Management System
    See us at Jobim 2012!
  • Acknowledgements
    http://bioinformatique.marseille.inserm.fr
    - Collaborators on ITI
      - D. Birnbaum (IPC, CRCM)
      - F. Bertucci (IPC)
      - S. Carpentier (Ipsogen)
      - M. Chaffanet (IPC)
      - Junwen Wang (HKU)
      - C. Ginestier (IPC)
    - Integrative Bioinformatics Platform
      - O. Stahl (Engineer)
      - M. Garcia (grad student)
    - Former members
      - C. Rioualen (grad student)
      - R. Millat-Carus (grad student)
      - P. Rouillier (postdoc)
  • ITI 1.0-2.0 Aggregation algorithm (‘in-depth’)
    For Each node In interactome
        testSubnetwork = Subnetwork = empty
        NeighborAcceptedCandidates = empty
        For Each neighborNode In neighbor(node)
            testSubnetwork = c(neighborNode, Subnetwork)
            Nconsensus = 0
            For Each dataset
                Sc(testSubnetwork) = Compute-subnetwork_score(testSubnetwork, dataset)
                rate = Sc(testSubnetwork) – Sc(Subnetwork)
                If Sc(testSubnetwork) > th and rate > r Then
                    Nconsensus = Nconsensus + 1
                    NeighborAcceptedCandidates = c(neighborNode, NeighborAcceptedCandidates)
                End
            End
        End
        Subnetwork = c(Subnetwork, NeighborAcceptedCandidates)
    End
  • ITI 3.0 Aggregation algorithm (‘transversal exploration’)
    For Each node In interactome
        testSubnetwork = Subnetwork = empty
        Routine constructNet(node, Subnetwork)
            testSubnetwork = c(node, Subnetwork)
            Nconsensus = 0
            For Each dataset
                Sc(testSubnetwork) = Compute-subnetwork_score(testSubnetwork, dataset)
                rate = Sc(testSubnetwork) – Sc(Subnetwork)
                If Sc(testSubnetwork) > th and rate > r Then
                    Nconsensus = Nconsensus + 1
                End
            End
            If Nconsensus >= c Then
                Subnetwork = testSubnetwork
                For Each neighborNode In neighbor(node)
                    constructNet(neighborNode, Subnetwork)
                End
            Else
                break
            End
    End
    From Garcia et al. (2011)
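The recursive routine on this last slide can be sketched in Python for a single seed node. This is an illustrative reading of the pseudocode, not the published implementation. The function and parameter names are hypothetical, the score of an empty subnetwork is assumed to be 0, and a guard against revisiting nodes already in the subnetwork is added so that cycles in the interactome cannot cause infinite recursion.

```python
def grow_subnetwork(seed, neighbors, score, n_datasets, s_th, r, c):
    """Grow one subnetwork from a seed node.
    neighbors: dict mapping a node to its interacting nodes.
    score(genes, d): subnetwork score of a gene list in dataset index d.
    A node is accepted when, in at least c datasets, the new score exceeds
    s_th and improves on the previous score by more than r."""
    subnetwork = []

    def construct(node):
        nonlocal subnetwork
        if node in subnetwork:  # added guard (assumption): skip visited nodes
            return
        candidate = subnetwork + [node]
        consensus = 0
        for d in range(n_datasets):
            new = score(candidate, d)
            old = score(subnetwork, d) if subnetwork else 0.0  # assumption
            if new > s_th and new - old > r:
                consensus += 1
        if consensus >= c:
            subnetwork = candidate
            for neighbor in neighbors.get(node, ()):
                construct(neighbor)
        # otherwise the node is rejected and this branch stops

    construct(seed)
    return subnetwork
```

Running this function once per node of the interactome would reproduce the outer loop of the pseudocode; each call terminates because a node is only expanded when it enlarges the subnetwork.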