TCGA data coordination center: Carl Schaefer and Ari Kahn (NCICB)

  1. The Cancer Genome Atlas Project, January 24, 2008
  2. Program
     • Goal: find genomic alterations that cause cancer (mutations, CNA, methylation, …)
     • Pilot project
       • $100M (NCI/NHGRI)
       • 3 years
       • 3 diseases
         • brain (glioblastoma multiforme)
         • lung (squamous)
         • ovarian (serous cystadenocarcinoma)
  3. Organization
     • Biospecimen Core Resource (BCR)
     • Genome Sequencing Centers (GSCs) (3)
     • Cancer Genome Characterization Centers (CGCCs) (7)
     • Data Coordinating Center (DCC)
     • Project Team (NCI/NHGRI)
     • Steering Committee (NCI/NHGRI & PIs)
     • External Scientific Committee
     • Working Groups
  4. PIs
     • DCC: Ari Kahn (SRA)
     • CGCC: Chuck Perou (UNC), Rick Myers (Stanford), Marc Ladanyi (MSKCC), Joe Gray (LBL), Steve Baylin (JHU), Raju Kucherlapati (Harvard/B&W), Matthew Meyerson (Broad/DFCI)
     • GSC: Rick Wilson (WashU), Eric Lander (Broad), Richard Gibbs (Baylor)
     • BCR: Robert Penny (IGC/TGEN)
  5. URLs
     • project site: http://cancergenome.nih.gov
     • gforge: http://gforge.nci.nih.gov (search for TCGA)
     • data: http://tcga-data.nci.nih.gov
     • portal: http://tcga-portal.nci.nih.gov [coming]
  6. Data Types
     Platform                                  Analysis                       Institution
     DNA sequencing                            Somatic Mutations              WashU
     DNA sequencing                            Somatic Mutations              Baylor
     DNA sequencing                            Somatic Mutations              Broad
     Illumina Infinium 550K BeadChip Array     Copy Number                    Stanford
     Agilent 44K Array                         Transcription                  UNC
     Illumina GoldenGate                       Methylation                    JHU
     Agilent 244K Array                        Copy Number                    MSKCC
     Affymetrix Exon 1.0 ST Array              Transcription                  LBL
     Agilent 244K Array                        Transcription and Copy Number  Harvard/B&W
     Affymetrix U133 Plus 2.0 & SNP Array 6.0  Transcription and Copy Number  Broad/DFCI
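The platform/analysis/institution assignments on the Data Types slide can be held as simple records and queried by analysis type. This is only an illustrative sketch: the names `DataType`, `DATA_TYPES`, and `institutions_for` are hypothetical, and the rows are copied from the slide as transcribed.

```python
from typing import NamedTuple


class DataType(NamedTuple):
    """One row of the Data Types slide (hypothetical record type)."""
    platform: str
    analysis: str
    institution: str


# Rows as listed on the slide.
DATA_TYPES = [
    DataType("DNA sequencing", "Somatic Mutations", "WashU"),
    DataType("DNA sequencing", "Somatic Mutations", "Baylor"),
    DataType("DNA sequencing", "Somatic Mutations", "Broad"),
    DataType("Illumina Infinium 550K BeadChip Array", "Copy Number", "Stanford"),
    DataType("Agilent 44K Array", "Transcription", "UNC"),
    DataType("Illumina GoldenGate", "Methylation", "JHU"),
    DataType("Agilent 244K Array", "Copy Number", "MSKCC"),
    DataType("Affymetrix Exon 1.0 ST Array", "Transcription", "LBL"),
    DataType("Agilent 244K Array", "Transcription and Copy Number", "Harvard/B&W"),
    DataType("Affymetrix U133 Plus 2.0 & SNP Array 6.0",
             "Transcription and Copy Number", "Broad/DFCI"),
]


def institutions_for(analysis: str) -> list[str]:
    """Return institutions whose analysis column mentions the given term."""
    term = analysis.lower()
    return [d.institution for d in DATA_TYPES if term in d.analysis.lower()]
```

For example, `institutions_for("Methylation")` returns only JHU, while `institutions_for("Copy Number")` also picks up the centers doing combined transcription and copy-number work.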
  7. Data Levels
     • raw
       • low-level data for a single sample, not normalized (e.g., trace file, .cel file)
     • processed
       • single-sample, normalized & interpreted (e.g., mutation call, amplification call for a locus, .snp, .chp)
     • segmented (n/a for mutation & expression)
       • single-sample, aggregation of loci into regions (e.g., amplification call for a region of a sample)
     • summary finding (aka “region of interest”)
       • cross-sample findings (e.g., minimal common region of amplification across a sample set)
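The four data levels can be modeled as a small enumeration, together with the slide's note that the segmented level does not apply to mutation and expression data. A minimal sketch follows; `DataLevel`, `applies_to`, and the analysis-type strings are illustrative assumptions, not identifiers from any TCGA software.

```python
from enum import Enum


class DataLevel(Enum):
    """The four data levels named on the slide (identifiers are hypothetical)."""
    RAW = "raw"                  # single sample, not normalized
    PROCESSED = "processed"      # single sample, normalized and interpreted
    SEGMENTED = "segmented"      # single sample, loci aggregated into regions
    SUMMARY = "summary finding"  # cross-sample findings


# Per the slide, the segmented level is n/a for mutation and expression data.
SEGMENTED_NOT_APPLICABLE = {"mutation", "expression"}


def applies_to(level: DataLevel, analysis: str) -> bool:
    """Return True if the level is meaningful for the given analysis type."""
    if level is DataLevel.SEGMENTED:
        return analysis not in SEGMENTED_NOT_APPLICABLE
    return True
```

With these assumed labels, `applies_to(DataLevel.SEGMENTED, "copy number")` is true, while the same call with `"mutation"` is false.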
  8. Flow
     • Tissue Source (MD Anderson, Henry Ford, …)
     • BCR
       • check pathology, quality/quantity
       • extract analytes
       • prepare data file
     • [Flow diagram; remaining labels: GSC; WGA; CGCC; DNA, mRNA; DNA; NCBI Trace Archive; DCC; sample data; Bulk Download; caTissue Core; caArray; caIntegrator; “tracking database”]
  9. Data Formats
     • BCR
       • XML (tags are CDEs)
       • images
     • GSC
       • Called mutations (Genboree LFF format)
       • Linking table
         • sample-trace-target
     • CGCC
       • MAGE-TAB
         • IDF: Investigation Definition Format
         • SDRF: Sample and Data Relationship Format
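MAGE-TAB files are tab-delimited text, so an SDRF can be read with a general-purpose delimited-text parser. The sketch below assumes a tiny SDRF fragment with only two columns; the function name `read_sdrf`, the sample names, and the file names are invented for illustration, and real SDRF files carry many more columns (some with repeated names, which is why rows are kept as lists rather than dictionaries).

```python
import csv
import io


def read_sdrf(text: str):
    """Parse tab-delimited SDRF text into (header, rows)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)                 # first line holds the column names
    rows = [row for row in reader if row]  # skip blank lines
    return header, rows


# Hypothetical two-sample SDRF fragment; sample and file names are invented.
EXAMPLE = (
    "Source Name\tArray Data File\n"
    "sample-001\tsample-001.cel\n"
    "sample-002\tsample-002.cel\n"
)

header, rows = read_sdrf(EXAMPLE)

# Map each source (sample) to its raw data file using column positions.
source_col = header.index("Source Name")
file_col = header.index("Array Data File")
files_by_sample = {row[source_col]: row[file_col] for row in rows}
```

Looking columns up by position via `header.index` keeps the sketch usable if further columns are added between the two shown here.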
  10. Where Does/Will the Data Go?
     • ftp site (now with a simple web wrapper: “portal #1”)
     • “tracking database”
     • repositories with caBIG APIs
       • caArray
       • caTissue CORE
       • caIntegrator
       • NCIA
     • NCBI trace archive
     • a richer “portal #2”
       • more convenient download capability
       • filtering datasets by clinical information
       • summary level data
       • genome browser view
       • gene info page
       • visualization on pathways
       • etc.
